Rendering#

In this tutorial, we’ll explain the ins and outs of rendering: the process of creating an image from 3D data.

4D-to-video rendering#

In Medusa, we try to make it as easy as possible to render 4D reconstruction data as a video. As you might have seen in the quickstart, you can use a VideoRenderer object for this. Note that this renderer is only available if you have pytorch3d installed, which is unfortunately not possible on Windows at the moment (nor on M1/M2 Macs).
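If you’re not sure whether pytorch3d is available in your environment, a quick (purely illustrative) check is a simple import guard:

# Illustrative check: VideoRenderer requires pytorch3d, which currently cannot
# be installed on Windows or on M1/M2 Macs
try:
    import pytorch3d  # noqa: F401
    print("pytorch3d found; VideoRenderer can be used")
except ImportError:
    print("pytorch3d not found; VideoRenderer is unavailable")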

from medusa.render import VideoRenderer

The class constructor takes four arguments (shading, lights, background, and loglevel), all of which have reasonable defaults. We’ll ignore the lights argument for now and discuss it later.

The shading argument can be either “flat”, which creates a faceted look, or “smooth”, which creates a smoother surface using Phong shading. We’ll use smooth shading for now, set the loglevel to ‘WARNING’ (which suppresses the progress bar that would otherwise clutter the website), and leave the background at its default ((0, 0, 0), i.e., black):

renderer = VideoRenderer(shading='smooth', loglevel='WARNING')

The renderer expects the 4D reconstruction data to be wrapped in a Data4D object (see data representation tutorial). Let’s load in the 4D reconstruction (by the ‘emoca-coarse’ model) from our default video:

from medusa.data import get_example_data4d
data_4d = get_example_data4d(load=True, model='emoca-coarse')

# We'll slice the Data4D object to keep only the first 50 frames/meshes,
# so that rendering is a bit faster for this tutorial
data_4d = data_4d[:50]

To render the 4D data to a video, you use the VideoRenderer object’s render method. This method has two mandatory arguments:

  • f_out: path where the video will be saved

  • data: the Data4D object

Additionally, this method accepts an optional argument, overlay, which specifies the colors to project onto the vertices before rendering. This can be any Textures object from pytorch3d (like TexturesVertex, for vertex colors). We’ll get into this later.
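For reference, wrapping per-vertex colors in a pytorch3d TexturesVertex object could look like the sketch below (the shapes and values here are made-up examples; in the rest of this tutorial we’ll simply pass plain tensors as overlay):

import torch
from pytorch3d.renderer import TexturesVertex

# Made-up example: random RGB colors (floats between 0 and 1) for each of
# N frames and V vertices
N, V = 50, 5023
vertex_colors = torch.rand(N, V, 3)

# TexturesVertex expects per-vertex features of shape (N, V, C)
overlay = TexturesVertex(verts_features=vertex_colors)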

Now, let’s render the reconstruction!

from IPython.display import Video  # just to show the video in the notebook/website

f_out = './viz/smooth.mp4'
renderer.render(f_out, data_4d)

# Show result
Video(f_out, embed=True)

One way to make the visualization a little nicer is by only rendering the face (rather than the full head). To do so, you can use the apply_vertex_mask method from the Data4D object with the name argument set to 'face':

data_4d.apply_vertex_mask('face')

This method removes all non-face vertices from the mesh, leaving us with 1787 vertices (instead of the original 5023):

tuple(data_4d.v.shape)
(50, 1787, 3)

Now, let’s re-render the data (which is a lot faster now too, as it has to work with fewer vertices):

renderer.render(f_out, data_4d)
Video(f_out, embed=True)

Remember the background argument of the VideoRenderer class? Instead of setting it to a constant color (like black, as before), you can also set it to the original video the reconstruction was based on! We’ll show this below (using flat shading, to show what that looks like):

from medusa.data import get_example_video  # video associated with data4d

vid = get_example_video()
renderer = VideoRenderer(shading='flat', loglevel='WARNING', background=vid)
renderer.render(f_out, data_4d)
Video(f_out, embed=True)

Overlays#

So far, we have only rendered the face as a grayish, untextured shape. We can, however, give it a uniform color, or a specific color per vertex, with the overlay argument of the render method.

The overlay argument can be a tensor with a single color per vertex (or any pytorch3d texture, but we won’t discuss that here). Colors need to be represented as RGB float values ranging from 0 to 1, so overlay should be a \(V \times 3\) tensor.

To demonstrate, let’s create an overlay that’ll make the face bright red, which corresponds to an RGB value of [1., 0., 0.]:

import torch

V = data_4d.v.shape[1]
vertex_colors = torch.zeros((V, 3), device=data_4d.device)
vertex_colors[:, 0] = 1.  # set 'red' channel to 1

tuple(vertex_colors.shape)
(1787, 3)

Let’s render the video again, but now with vertex_colors used for the overlay argument:

f_out = './viz/test.mp4'
renderer.render(f_out, data_4d, overlay=vertex_colors)

Video(f_out, embed=True)

Note that we don’t have to use the same colors for each time point! We can also create an overlay of shape \(N\) (nr of frames) \(\times V \times 3\), which specifies a color for each frame (and each vertex). To demonstrate, we’ll generate random RGB values for each frame and vertex, creating quite a trippy visualization!

N = data_4d.v.shape[0]
vertex_colors = torch.rand(N, V, 3, device=data_4d.device)
renderer.render(f_out, data_4d, overlay=vertex_colors)

Video(f_out, embed=True)

Now suppose that we would like to color the face along a more interesting feature, like the amount of local movement relative to the first frame (which we assume represents a neutral face). This is in fact quite an extensive procedure. We’ll walk you through it step by step, but afterwards we’ll show you a way to do this more easily.

First, each frame’s movement relative to the first frame (\(\delta v_{i}\)) can be computed as follows:

\begin{equation} \delta v_{i} = v_{i} - v_{0} \end{equation}

Importantly, we assume for now that we’re only interested in the local movement of the face (i.e., facial expressions) rather than the global movement (i.e., rotation and translation of the entire head). To project out the global movement, we can call the to_local method:

# To only show local deviations, we can use the to_local() method which projects out any "global" movement
data_4d.to_local()
dv = data_4d.v - data_4d.v[0, :, :]
tuple(dv.shape)
(50, 1787, 3)

The problem here is that we do not have one value per vertex to visualize, but three: the movement in the X (left-right), Y (up-down), and Z (forward-backward) direction! We could of course just visualize a single direction, but another possibility is to project the movement onto the vertex normals: the direction perpendicular to the mesh at each vertex (see image below).

[Image: vertex normals. Red lines represent the vertex normals; from Wikipedia.]

We can get the vertex normals and project the movement (dv) onto them as follows:

from medusa.geometry import compute_vertex_normals

# normals: V (1787) x 3 (XYZ)
normals = compute_vertex_normals(data_4d.v[0], data_4d.tris)
dv_proj = (normals * dv).sum(dim=2)

tuple(dv_proj.shape)
(50, 1787)

The projected data (dv_proj) now represents movement relative to the normal direction: positive values indicate that movement occurs in the same direction as the normal (i.e., “outwards”) and negative values indicate the movement occurs in the direction opposite to the normal (i.e., “inwards”).
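If you want a quick sanity check before mapping these values to colors, you could inspect their range (a purely illustrative step, not required for rendering):

# Illustrative: the extremes of inward (negative) and outward (positive) movement,
# which the colormap normalization below will center around zero
print(f"min: {dv_proj.min().item():.3f}, max: {dv_proj.max().item():.3f}")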

Now, the only step remaining is to convert the values (dv_proj) to RGB colors. Let’s say we want to show inward movement as blue and outward movement as red; we can use matplotlib for this as follows:

from matplotlib.colors import CenteredNorm
from matplotlib import colormaps

norm = CenteredNorm(vcenter=0.)  # will make sure that 0 is in the "middle" of the colormap
cmap = colormaps['bwr']  # blue-white-red colormap

# the colormap does not accept torch tensors
dv_proj = dv_proj.cpu().numpy()
dv_proj_colors = cmap(norm(dv_proj))

# convert back to torch tensor
dv_proj_colors = torch.as_tensor(dv_proj_colors, device=data_4d.device).float()

# N x V x RGBA (but the 'A' will be discarded later)
tuple(dv_proj_colors.shape)
(50, 1787, 4)

Finally, we can pass these colors (dv_proj_colors) to the renderer. Note that we also project the data back into ‘world space’ so that it aligns nicely with the background video again:

data_4d.to_world()
renderer.render(f_out, data_4d, overlay=dv_proj_colors)

Video(f_out, embed=True)

You’ll probably agree that this entire process is quite cumbersome. To make things a little easier, Medusa provides an Overlay class that takes care of much of the boilerplate needed to render such overlays.

from medusa.render import Overlay

The most important arguments when initializing an Overlay object are:

  • v: the vertex values that will be used to create colors (e.g., the dv variable from earlier)

  • cmap: a string with the Matplotlib colormap that will be used (default: 'bwr')

  • dim: dimension of v that will be plotted (0 for X, 1 for Y, 2 for Z, or ‘normals’)

If you want to project the XYZ values onto the vertex normals (by setting dim='normals'), then you also need to provide the vertices of a neutral frame (v0) and the mesh triangles (tris), as shown below:

overlay = Overlay(dv, cmap='bwr', vcenter=0., dim='normals', v0=data_4d.v[0], tris=data_4d.tris)

To get the colors, simply call the to_rgb method:

colors = overlay.to_rgb()
tuple(colors.shape)
(50, 1787, 4)

And finally render as before:

renderer.render(f_out, data_4d, overlay=colors)
Video(f_out, embed=True)

3D-to-image rendering#

A lot of the steps necessary to render each 3D mesh from the 4D sequence are abstracted away in the VideoRenderer class. If you want to know more, or want more flexibility with respect to rendering, keep on reading!

Essentially, the VideoRenderer class loops over all frames in the video, fetches the 3D mesh(es) per frame (remember, there may be more than one face, and thus more than one mesh, per frame!), renders them to images, and writes those images to disk as a continuous video file.

Rendering happens using the PytorchRenderer class, a thin wrapper around functionality from the pytorch3d package:

from medusa.render import PytorchRenderer

Note that this renderer only renders (batches of) 3D meshes to images. It is initialized with three mandatory arguments:

  • viewport (a tuple with two integers): the output size of the rendered image (width x height)

  • cam_mat (a 4x4 tensor or numpy array): the affine matrix that defines the camera pose

  • cam_type (a string): the type of camera

The cam_type is “orthographic” for any FLAME-based reconstruction data and “perspective” for Mediapipe data. The cam_mat and viewport data can be extracted from the Data4D object you intend to render:

viewport = data_4d.video_metadata['img_size']
renderer = PytorchRenderer(viewport, cam_mat=data_4d.cam_mat, cam_type='orthographic')

Now, you can pass a single 3D mesh, or a batch of meshes, to the renderer’s __call__ method (together with the mesh’s triangles), which will return an image with the mesh(es) rendered onto it:

# render the first time point
img = renderer(data_4d.v[0], data_4d.tris)

tuple(img.shape)
(1, 384, 480, 4)

As you can see, the returned image (img) is of shape \(N\) (number of images) \(\times H\) (height) \(\times W\) (width) \(\times 4\) (RGBA) with unsigned integers (0-255). Note that you can also render multiple meshes to multiple images at the same time by explicitly setting the single_image argument to False:

# render the first 16 time points
imgs = renderer(data_4d.v[:16], data_4d.tris, single_image=False)
tuple(imgs.shape)
(16, 384, 480, 4)

If you want to render these 16 meshes on top of the original video frames, we need to load the corresponding video frames into memory and “blend” the rendered images with them. To load the video frames, we can use the VideoLoader class:

from medusa.data import get_example_video
from medusa.io import VideoLoader

vid = get_example_video()
loader = VideoLoader(vid, batch_size=16)

# To only get a single batch, you can create an iterator manually and call next() on it
# (bg = background)
bg = next(iter(loader))
bg = bg.to(data_4d.device)

tuple(bg.shape)
(16, 3, 384, 480)

And to blend the rendered meshes with the video, we can call the alpha_blend method from the renderer:

img_with_bg = renderer.alpha_blend(imgs, bg)

To write the rendered images (which are torch tensors) to a video file, we can use the VideoWriter class:

from medusa.io import VideoWriter

writer = VideoWriter('./test.mp4', fps=loader.get_metadata()['fps'])
writer.write(img_with_bg)
writer.close()  # call close if you're done!

Video('./test.mp4', embed=True)

As you can see, we’re getting the same result as with the VideoRenderer approach (albeit with only 16 frames)! Although it requires more boilerplate code, this approach gives you more flexibility to render the data exactly as you want it.
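To wrap up, here is a rough sketch of how these lower-level pieces could be combined into the kind of loop that VideoRenderer performs internally. This is illustrative only (the output file name is made up, and Medusa’s actual implementation also handles details, such as multiple faces per frame, that we gloss over here):

from medusa.data import get_example_data4d, get_example_video
from medusa.io import VideoLoader, VideoWriter
from medusa.render import PytorchRenderer

data_4d = get_example_data4d(load=True, model='emoca-coarse')[:50]
vid = get_example_video()

viewport = data_4d.video_metadata['img_size']
renderer = PytorchRenderer(viewport, cam_mat=data_4d.cam_mat, cam_type='orthographic')

loader = VideoLoader(vid, batch_size=16)
writer = VideoWriter('./viz/manual_loop.mp4', fps=loader.get_metadata()['fps'])

n_frames = data_4d.v.shape[0]  # we only kept the first 50 frames in this tutorial
start = 0
for bg in loader:  # loop over batches of original video frames
    bg = bg.to(data_4d.device)
    stop = min(start + bg.shape[0], n_frames)
    if start >= stop:
        break  # no reconstructed meshes left

    # Render this batch of meshes and blend them with the corresponding frames
    imgs = renderer(data_4d.v[start:stop], data_4d.tris, single_image=False)
    imgs = renderer.alpha_blend(imgs, bg[:stop - start])
    writer.write(imgs)
    start = stop

writer.close()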