# Rendering
In this tutorial, we’ll explain the ins and outs of rendering: the process of creating an image from 3D data.
## 4D-to-video rendering
In Medusa, we try to make it as easy as possible to render 4D reconstruction data as a video. As you might have seen in the quickstart, you can use a `VideoRenderer` object for this. Note that this renderer is only available if you have pytorch3d installed, which is unfortunately not possible on Windows at the moment (nor on M1/M2 Macs).
```python
from medusa.render import VideoRenderer
```
The class constructor takes four arguments (`shading`, `lights`, `background`, and `loglevel`), all of which have reasonable defaults. We'll ignore the `lights` argument for now and discuss it later.

The `shading` argument can be either `'flat'`, which creates a faceted look, or `'smooth'`, which creates a smoother surface using Phong shading. We'll use smooth shading for now, set the loglevel to `'WARNING'` (which suppresses the progress bar that would otherwise clutter the website), and leave the background at its default (`(0, 0, 0)`, i.e., black):
```python
renderer = VideoRenderer(shading='smooth', loglevel='WARNING')
```
The renderer expects the 4D reconstruction data to be wrapped in a `Data4D` object (see the data representation tutorial). Let's load the 4D reconstruction (by the 'emoca-coarse' model) from our default video:
```python
from medusa.data import get_example_data4d

data_4d = get_example_data4d(load=True, model='emoca-coarse')

# We'll slice the Data4D object this way (resulting in only 50 frames/meshes)
# so that the rendering is a bit faster for this tutorial
data_4d = data_4d[:50]
```
To render the 4D data to a video, you use the `VideoRenderer` object's `render` method. This method has two mandatory arguments:

- `f_out`: the path where the video will be saved
- `data`: the `Data4D` object

Additionally, the method accepts an optional argument, `overlay`: the colors to project onto the vertices before rendering. This can be any `Textures` object from `pytorch3d` (like `TexturesVertex`, for vertex colors), as sketched below; we'll get into this in more detail later.
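For reference, here is a minimal sketch of what constructing such a `TexturesVertex` object could look like; the shapes and values are made up for illustration (`pytorch3d` expects per-vertex colors of shape \(N \times V \times 3\), with \(N\) the number of meshes):

```python
import torch
from pytorch3d.renderer import TexturesVertex

# Hypothetical example: a single mesh with 5023 vertices, all colored mid-gray.
# Colors are RGB floats between 0 and 1.
vertex_rgb = torch.full((1, 5023, 3), 0.5)  # 1 mesh x 5023 vertices x RGB
texture = TexturesVertex(verts_features=vertex_rgb)
```

In the rest of this tutorial, however, we'll simply pass plain tensors as the overlay, which is easier.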
Now, let’s render the reconstruction!
```python
from IPython.display import Video  # just to show the video in the notebook/website

f_out = './viz/smooth.mp4'
renderer.render(f_out, data_4d)

# Show result
Video(f_out, embed=True)
```
One way to make the visualization a little nicer is to render only the face (rather than the full head). To do so, you can use the `apply_vertex_mask` method of the `Data4D` object with the `name` argument set to `'face'`:
```python
data_4d.apply_vertex_mask('face')
```
This method basically removes all non-face vertices from the mesh, leaving us with 1787 vertices (instead of the original 5023):
```python
tuple(data_4d.v.shape)
```
```
(50, 1787, 3)
```
Now, let’s re-render the data (which is a lot faster now too, as it has to work with fewer vertices):
```python
renderer.render(f_out, data_4d)
Video(f_out, embed=True)
```
Remember the `background` argument of the `VideoRenderer` class? Instead of setting this to a constant color (like black, as before), you can also set it to the original video the reconstruction was based on! We'll show this below (using flat shading, to show what this looks like):
```python
from medusa.data import get_example_video  # video associated with data_4d

vid = get_example_video()
renderer = VideoRenderer(shading='flat', loglevel='WARNING', background=vid)
renderer.render(f_out, data_4d)
Video(f_out, embed=True)
```
## Overlays
So far, we have only rendered the face as a grayish, untextured shape. We can, however, give it a different uniform color, or a specific color per vertex, with the `overlay` argument of the `render` method.
The `overlay` argument can be a tensor with a single color per vertex (or any `pytorch3d` texture, but we won't discuss that here). Colors need to be represented as RGB float values ranging from 0 to 1, so the overlay should be a \(V \times 3\) tensor.
To demonstrate, let’s create an overlay that’ll make the face bright red, which corresponds to an RGB value of [1., 0., 0.]:
```python
import torch

V = data_4d.v.shape[1]
vertex_colors = torch.zeros((V, 3), device=data_4d.device)
vertex_colors[:, 0] = 1.  # set 'red' channel to 1

tuple(vertex_colors.shape)
```
```
(1787, 3)
```
Let's render the video again, but now with `vertex_colors` passed as the `overlay` argument:
```python
f_out = './viz/test.mp4'
renderer.render(f_out, data_4d, overlay=vertex_colors)
Video(f_out, embed=True)
```
Note that we don't have to use the same colors for each time point! We can also create an overlay of shape \(N\) (number of frames) \(\times V \times 3\), which specifies a color for each frame (and each vertex). To demonstrate, we'll generate random RGB values for each frame and vertex, creating quite a trippy visualization!
```python
N = data_4d.v.shape[0]
vertex_colors = torch.rand(N, V, 3, device=data_4d.device)
renderer.render(f_out, data_4d, overlay=vertex_colors)
Video(f_out, embed=True)
```
Now suppose that we would like to color the face according to a more interesting feature, like the amount of local movement relative to the first frame (which we assume represents a neutral face). This is in fact quite an involved procedure. We'll walk you through it step by step, but afterwards we'll show you an easier way to do this.
First, each frame’s movement relative to the first frame (\(\delta v_{i}\)) can be computed as follows:
\begin{equation} \delta v_{i} = v_{i} - v_{0} \end{equation}
Importantly, we assume for now that we're only interested in the local movement of the face (i.e., facial expressions) rather than the global movement (i.e., rotation and translation of the entire head). To project out the global movement, we can call the `to_local` method:
```python
# To only show local deviations, we use the to_local() method, which projects out any "global" movement
data_4d.to_local()
dv = data_4d.v - data_4d.v[0, :, :]

tuple(dv.shape)
```
```
(50, 1787, 3)
```
The problem here is that we do not have one value per vertex to visualize, but three: the movement in the X (left-right), Y (up-down), and Z (forward-backward) direction! We could of course just visualize a single direction, but another possibility is to project the movement onto the vertex normals: the direction perpendicular to the mesh at each vertex (see the image below).
*Red lines represent the vertex normals (image from Wikipedia).*
We can get the vertex normals and project the movement (`dv`) onto them as follows:
```python
from medusa.geometry import compute_vertex_normals

# normals: V (1787) x 3 (XYZ)
normals = compute_vertex_normals(data_4d.v[0], data_4d.tris)
dv_proj = (normals * dv).sum(dim=2)

tuple(dv_proj.shape)
```
```
(50, 1787)
```
The projected data (`dv_proj`) now represents movement along the normal direction: positive values indicate that the movement occurs in the same direction as the normal (i.e., "outwards") and negative values indicate that the movement occurs in the opposite direction (i.e., "inwards").
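To build some intuition for this sign convention, here is a tiny self-contained example (the vectors are made up for illustration): a displacement pointing in the same direction as the normal yields a positive projection, and a displacement pointing against it a negative one:

```python
import torch

normal = torch.tensor([0., 0., 1.])    # a vertex normal pointing "outwards" along Z
outward = torch.tensor([0., 0., 0.5])  # displacement along the normal
inward = torch.tensor([0., 0., -0.5])  # displacement against the normal

print((normal * outward).sum())  # tensor(0.5000)  -> positive: outward movement
print((normal * inward).sum())   # tensor(-0.5000) -> negative: inward movement
```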
Now, the only remaining step is to convert the values (`dv_proj`) to RGB colors. Let's say we want to show inward movement as blue and outward movement as red; we can use `matplotlib` for this as follows:
```python
from matplotlib.colors import CenteredNorm
from matplotlib import colormaps

norm = CenteredNorm(vcenter=0.)  # makes sure that 0 is in the "middle" of the colormap
cmap = colormaps['bwr']  # blue-white-red colormap

# the colormap does not accept torch tensors
dv_proj = dv_proj.cpu().numpy()
dv_proj_colors = cmap(norm(dv_proj))

# convert back to a torch tensor
dv_proj_colors = torch.as_tensor(dv_proj_colors, device=data_4d.device).float()

# N x V x RGBA (but the 'A' will be discarded later)
tuple(dv_proj_colors.shape)
```
```
(50, 1787, 4)
```
Finally, we can pass these colors (`dv_proj_colors`) to the renderer. Note that we also project the data back into "world space" so that it aligns nicely with the background video again:
```python
data_4d.to_world()
renderer.render(f_out, data_4d, overlay=dv_proj_colors)
Video(f_out, embed=True)
```
You'll probably agree that this entire process is quite cumbersome. To make things a little easier, Medusa provides an `Overlay` class that handles much of the boilerplate necessary to render such overlays.
```python
from medusa.render import Overlay
```
The most important arguments when initializing an `Overlay` object are:

- `v`: the vertex values that will be used to create colors (e.g., the `dv` variable from earlier)
- `cmap`: a string with the Matplotlib colormap that will be used (default: `'bwr'`)
- `dim`: the dimension of `v` that will be plotted (0 for X, 1 for Y, 2 for Z, or `'normals'`; see the sketch below)
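For example, coloring vertices by their movement along a single axis (say, Z) might look like the sketch below. Note that this assumes `v0` and `tris` are only needed when `dim='normals'` (see the next paragraph); that is an assumption, not something demonstrated in this tutorial:

```python
# Hypothetical sketch: color by movement along the Z axis only
overlay_z = Overlay(dv, cmap='bwr', vcenter=0., dim=2)
colors_z = overlay_z.to_rgb()
```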
If you want to project the XYZ values onto the vertex normals (by setting `dim='normals'`), you also need to provide the vertices of a neutral frame (`v0`) and the mesh triangles (`tris`), as shown below:
```python
overlay = Overlay(dv, cmap='bwr', vcenter=0., dim='normals', v0=data_4d.v[0], tris=data_4d.tris)
```
To get the colors, simply call the `to_rgb` method:
```python
colors = overlay.to_rgb()
tuple(colors.shape)
```
```
(50, 1787, 4)
```
And finally render as before:
```python
renderer.render(f_out, data_4d, overlay=colors)
Video(f_out, embed=True)
```
## 3D-to-image rendering
A lot of the steps necessary to render each 3D mesh from the 4D sequence are abstracted away in the `VideoRenderer` class. If you want to know more, or want more flexibility with respect to rendering, keep on reading!
Essentially, the `VideoRenderer` class loops over all frames in the video, fetches the 3D mesh(es) of each frame (remember, there may be more than one face, and thus more than one mesh, per frame!), renders them to images, and writes these images to disk as a continuous video file.
Rendering happens using the `PytorchRenderer` class, a thin wrapper around functionality from the `pytorch3d` package:
```python
from medusa.render import PytorchRenderer
```
Note that this renderer only renders (batches of) 3D meshes to images. It is initialized with three mandatory arguments:

- `viewport` (a tuple with two integers): the output size of the rendered image (width x height)
- `cam_mat` (a 4x4 tensor or numpy array): the affine matrix that defines the camera pose
- `cam_type` (a string): the type of camera
The `cam_type` is "orthographic" for any FLAME-based reconstruction data and "perspective" for Mediapipe data. The `cam_mat` and `viewport` data can be extracted from the `Data4D` object you intend to render:
```python
viewport = data_4d.video_metadata['img_size']
renderer = PytorchRenderer(viewport, cam_mat=data_4d.cam_mat, cam_type='orthographic')
```
Now, you can pass any single 3D mesh or batch of 3D meshes to the renderer's `__call__` method (together with the mesh's triangles), which returns an image with the mesh(es) rendered onto it:
```python
# render the first time point
img = renderer(data_4d.v[0], data_4d.tris)
tuple(img.shape)
```
```
(1, 384, 480, 4)
```
As you can see, the returned image (`img`) is of shape \(N\) (number of images) \(\times H\) (height) \(\times W\) (width) \(\times 4\) (RGBA) with unsigned integer values (0-255). Note that you can also render multiple meshes to multiple images at the same time by explicitly setting the `single_image` argument to `False`:
```python
# render the first 16 time points
imgs = renderer(data_4d.v[:16], data_4d.tris, single_image=False)
tuple(imgs.shape)
```
```
(16, 384, 480, 4)
```
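Since the rendered images are just (batches of) uint8 RGBA tensors, you can also save an individual frame to disk with a generic image library. The snippet below uses `imageio`, which is not part of Medusa; it is only a sketch and assumes you have `imageio` installed:

```python
import imageio

# Save the first of the 16 rendered frames as a PNG
# (move to CPU and convert to numpy first, since the renderer returns torch tensors)
imageio.imwrite('./frame_00.png', imgs[0].cpu().numpy())
```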
If you want to render these 16 meshes on top of the original video frames, you can load the corresponding frames into memory and "blend" the renders with them. To load the video frames, we can use the `VideoLoader` class:
```python
from medusa.data import get_example_video
from medusa.io import VideoLoader

vid = get_example_video()
loader = VideoLoader(vid, batch_size=16)

# To only get a single batch, you can create an iterator manually and call next() on it
# (bg = background)
bg = next(iter(loader))
bg = bg.to(data_4d.device)

tuple(bg.shape)
```
```
(16, 3, 384, 480)
```
And to blend the rendered meshes with the video frames, we can call the renderer's `alpha_blend` method:
```python
img_with_bg = renderer.alpha_blend(imgs, bg)
```
To write the rendered images (which are torch tensors) to a video file, we can use the `VideoWriter` class:
```python
from medusa.io import VideoWriter

writer = VideoWriter('./test.mp4', fps=loader.get_metadata()['fps'])
writer.write(img_with_bg)
writer.close()  # call close when you're done!

Video('./test.mp4', embed=True)
```
As you can see, we get the same result as with the `VideoRenderer` approach (albeit with only 16 frames)! Although it requires more boilerplate code, this approach gives you more flexibility to render the data exactly as you want; a rough sketch of how to extend it to a full loop over all frames is shown below.
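To give an idea of how these building blocks fit together for a full video rather than a single batch, here is a rough sketch of such a loop. It assumes one mesh per frame (so mesh indices line up with frame indices) and that `VideoWriter.write` can be called once per batch; treat it as a starting point under those assumptions, not a definitive recipe:

```python
from medusa.data import get_example_data4d, get_example_video
from medusa.io import VideoLoader, VideoWriter
from medusa.render import PytorchRenderer

# Load the full 4D reconstruction and set up the renderer as before
data_4d = get_example_data4d(load=True, model='emoca-coarse')
viewport = data_4d.video_metadata['img_size']
renderer = PytorchRenderer(viewport, cam_mat=data_4d.cam_mat, cam_type='orthographic')

loader = VideoLoader(get_example_video(), batch_size=16)
writer = VideoWriter('./full_render.mp4', fps=loader.get_metadata()['fps'])

i = 0
for bg in loader:  # iterate over all batches of background frames
    bg = bg.to(data_4d.device)
    n = bg.shape[0]  # the last batch may contain fewer than 16 frames
    imgs = renderer(data_4d.v[i:i + n], data_4d.tris, single_image=False)
    writer.write(renderer.alpha_blend(imgs, bg))
    i += n

writer.close()
```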