The Triangle is Dead, Long Live the Triangle!

Alec Jacobson

July 25, 2024


An illustration of a green triangle dead on the ground and a pink triangle rising up with a crown
The following is an edited transcript of a keynote I gave at the Symposium on Geometry Processing in 2024.

The Symposium on Geometry Processing is a yearly conference where computer scientists present advances in algorithms for analyzing, modelling, simulating and otherwise processing shapes on the computer. Shapes could be geometry in any potential sense, but this conference primarily focuses on the surfaces of 3D solid objects. And the primary representation for these surfaces on the computer is the triangle mesh: a collection of 3D points on the surface called vertices connected in triplets to form triangular patches of the surface.

A photo of a young bearded man giving a powerpoint presentation
The Symposium on Geometry Processing is the home of triangle mesh research. I've been presenting there since 2010.

The Symposium on Geometry Processing is my academic home, so it was a special honor to give a keynote there. I found an old photograph — not necessarily the most embarrassing one — of me giving my very first academic talk at SGP in 2010. I was incredibly nervous (and I remember exactly who was really mean to me in the Q&A session 👀).

The Symposium on Geometry Processing is the place where we can geek out about triangle meshes. It's where we can go into really intricate detail about the meshes that we study. The same day I gave a keynote at SGP this year, I also gave a paper talk presenting something new about meshes.

I work with triangles a lot. I've been developing libigl for close to 15 years. Libigl is a C++ library primarily for processing triangle meshes.

A photo of laptop with an elephant on the screen next to a coffee mug
I ❤️ Triangle Meshes.

I could spend hours working with triangle meshes. A few years back, I was at a coffee shop, toiling away at programming some triangle mesh processing algorithm on my laptop. When I'm really stuck debugging I tend to sigh a lot, and I must have been loud enough to bother the people next to me. When finally everything was working and I saw what I needed to see on the screen, I was so relieved and let out a big, "Yes!". The guy in the coffee shop next to me leaned over at my screen and said, "All of that for an elephant."

Today is a really wonderful time to be doing geometry processing research. The stakes are higher than ever. Digital geometric data has entered our lives in so many ways.

Over half of American children are playing games on Roblox. Whether we like it or not, children are making and selling games on Roblox, raising new economic and child-labour rights questions. The Roblox platform is also realizing some form of the "metaverse." Walmart is making moves to sell physical goods inside of Roblox, and IKEA is attempting to gamify the experience of working at IKEA.

Meanwhile, 3D printing has come a long way since a decade ago, when the computer graphics research community surged with excitement. You can buy a 3D printer today for $62.77. Companies like nTopology are securing massive investments to change the way we approach advanced manufacturing. The possibility of 3D-printed weapons that evade security measures has also changed our society and laws.

Journalists at organizations like Forensic Architecture or The New York Times use 3D reconstructions to communicate.

Digital geometry has changed the way that we communicate with each other. For example, there are virtual reality applications for demonstrating the effects of COVID-19. The New York Times frequently uses 3D reconstructions and visualizations in their interactive scroller articles to explain the latest world disaster (e.g., "The Surfside Condo Was Flawed and Failing. Here’s a Look Inside."). Independent investigators — like those at Forensic Architecture — use geometric reconstructions to question the authenticity of state or police accounts of events (e.g., "Destruction of Medical Infrastructure in Gaza").

When we look at the splashiest recent advances in 3D geometry, we're not seeing a lot of triangle meshes. For example, Neural radiance fields (NeRFs) were a dramatic leap forward in 3D reconstruction from images. InstantNGP made fitting NeRFs fast enough for convenient use. And more recently, Gaussian Splatting exchanged the neural representation for Gaussians to dramatically decrease rendering time.

Karen X Cheng's McDonald's commercial used NeRFs in a creative way (notice how the floating bits on the digital tiger/rabbit sculpture mimic the artifacts of the NeRF reconstruction at the beginning of the video).

The New York Times immediately experimented with using InstantNGP to improve its 3D portraiture. And last year, McDonald's released the commercial above, directed by Karen X Cheng.

This Gaussian Splatting demo runs in the browser. Zoom in to see the individual spokes of the bicycle.

Gaussian splat viewers have been reimplemented many times over, and web browser-based versions are particularly compelling. New representations like NeRFs and Gaussian Splatting are making a big impact.

As triangle mesh researchers, we may feel lost and kind of obsolete. What is our role now? Should we give up on triangle meshes and just rebuild our geometry processing pipelines on whatever new representation is trending? "Geometry Processing with Deep Signed Distance Fields." "Geometry Processing with Neural Radiance Fields." "Geometry Processing with Gaussian Splatting."

illustration of guitar as mesh, point cloud and sdf with boxing fists icons between
Common representations are often pitted against each other as if we need one universal winner. In reality, victory is short-lived because criteria are ad hoc and case-by-case.

Another alternative would be to double down hard on identifying the one true representation. We see this attitude manifest in papers as feature tables comparing different representations and concluding that a chosen representation is ideal because it's the only one that can do all of the tasks in the table. In reality, these criteria are often cherry-picked and incomplete.

My stance is that we should choose not to see new representations as a problem. We should embrace the diversity of different function spaces for representing geometry or the quantities defined over some geometry.

Leaving the choice of representation open will also give us the freedom to care more about what comes before and after our pipelines. In particular, I'm excited to see our community shift more focus toward tracking error and uncertainty through our geometry processing pipelines. I'll now provide some evidence that we're already moving in this direction and that we're good at using diverse function spaces to our advantage.

A bar chart showing the number of publications on neural fields over time
By now, this is an old bar chart (from "Neural Fields in Visual Computing and Beyond" [Xie et al. 2021]). The number of publications on neural fields has only increased since 2021. When considering the absolute values per year, note that these bars are broken into half-years!

Neural fields have completely exploded in the last few years. The number of publications is skyrocketing. Given the success of neural fields, and the relative decline of interest in triangle meshes, I've been bugging the geometry research students at the University of Toronto with the following line of questions: "We're doing research on triangle meshes right now. Are we still going to be using triangle meshes in five years? Okay, sure, we all agree, of course we will in five years. How about in 50 years? How about in 100 years?"

In the year 2122, will we still be using triangle meshes? The events in the film Alien were (according to some) set in 2122. This sequence in the 1979 film represents one of the first uses of computer graphics in a major motion picture. One of the people involved was Canadian computer graphics professor Brian Wyvill, who in 2011 gave a keynote at the Graphics Interface conference entitled "Announcing The Sad Death of the Triangle Mesh". I learned about that because a different Canadian computer science professor, Christopher Batty, was told about Wyvill's keynote when Batty himself gave a keynote at Graphics Interface 2015 called "The Triangle Mesh Strikes Back". So, if anybody is upset by this article, just know that there will probably be another Canadian graphics professor giving a rebuttal in a few years.

Implicit functions: the dominant alternative to triangle meshes

But what really is the alternative to triangle meshes? What are we talking about when we say using something other than a triangle mesh to represent a 3D surface? The dominant answer to that question is an implicit function.

A 2D region with math label sign of function f
The idea of an implicit function is to represent the shape as a level set of a scalar function $f$ that maps query points to values. The surface is the set of all points where this function evaluates to zero. This function categorizes space into points that are less than zero and points that are greater than zero. The surface lies at the boundary between them. Image courtesy Nicholas Sharp.

Beyond representing the surface as the zero level set of the function $f$, we can require extra criteria to make it a more powerful representation. We can say that besides being positive or negative when we're away from the surface, the magnitude of the function should also measure the distance to the surface. Equivalently, we could require that the function has a known Lipschitz bound (a mathy way of saying that we know it doesn't grow in value too quickly). This makes the implicit function $f$ a signed distance field (SDF) and really good for all sorts of geometric queries.

A 3D model of a fox represented as a signed distance field
This implicit function representing a fox is a signed distance field. The colors show isolines of the function's value evaluated in space. Image courtesy Nicholas Sharp.

A basic operation we would like to conduct on 3D shapes is rendering. In terms of a query, this means we'd like to know whether a ray shot through a pixel hits the surface, how far away it hits the surface, and where on the surface it hits. We can use the distance property of the signed distance field to march along the ray until we find an intersection.

A diagram showing how to march along a ray until it hits a surface
The "sphere marching" method intersects a ray with a signed distance field by iteratively evaluating the distance to the surface at the current point along the ray and then stepping that distance along the ray. Eventually we either hit the surface or not. Images courtesy Nicholas Sharp.

So far we haven't discussed how a signed distance field is actually stored on the computer. For simple shapes like spheres, we can just write down the mathematical formula for the signed distance field. Alternatively, we could try to build up complex shapes by composing signed distance fields of simple primitives. Or we could store signed distance values on a grid and interpolate them. We could even compute signed distances to a triangle mesh.
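Here's a sketch of those storage options side by side, using nothing beyond NumPy and SciPy; the particular shapes, resolutions, and the pointwise-minimum union are illustrative choices:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# 1. Analytic formulas for simple primitives.
def sphere_sdf(p, center, radius):
    return np.linalg.norm(p - center) - radius

def box_sdf(p, half_extents):
    q = np.abs(p) - half_extents
    return np.linalg.norm(np.maximum(q, 0.0)) + min(np.max(q), 0.0)

# 2. Composition: the pointwise minimum acts as a union of two shapes.
def union_sdf(p):
    return min(sphere_sdf(p, np.array([0.5, 0.0, 0.0]), 1.0),
               box_sdf(p - np.array([-0.5, 0.0, 0.0]), np.array([0.7, 0.7, 0.7])))

# 3. Sampled values on a regular grid, with trilinear interpolation in between.
xs = np.linspace(-2.0, 2.0, 32)
values = np.array([[[union_sdf(np.array([x, y, z])) for z in xs]
                    for y in xs] for x in xs])
grid_sdf = RegularGridInterpolator((xs, xs, xs), values)
print(grid_sdf([[0.0, 0.0, 0.0]]))  # roughly union_sdf at the origin
```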

Neural networks enter the picture when we decide to parameterize a signed distance field $f$ using a neural network: a "deep signed distance field" (Deep SDF). We pick an "architecture" for the neural network, consisting of layers of weights and non-linear activations. The input to the network is the $(x,y,z)$ coordinates of a point in space and the output is a single real number reporting or approximating the signed distance function at that point.

In the "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation" [Park et al. 2019], the deep SDF function could be written as: $$(x,y,z,\mathbf{z}_i) \rightarrow f_\theta \rightarrow s$$ where $(x,y,z)$ is the query point, $\mathbf{z}_i$ is a latent code for the $i$th shape in the training dataset, $\theta$ are the shared weights of the neural network $f$, and $s$ is the output signed distance value.

A diagram showing how to interpolate between two shapes represented by deep SDFs
Once trained, each shape of a DeepSDF network is represented by a supplemental input latent "code" $\mathbf{z}_i$. These codes span a continuous space over the training set, and if all has gone well, interpolating the codes leads to reasonable interpolation of the shapes. In this example, $\mathbf{z}_1$ is the latent code for a love-seat type chair and $\mathbf{z}_2$ is the latent code for a wooden chair. We can freeze the weights ($\theta$) and vary the $\mathbf{z}$ input code to interpolate between the two chairs.

Instead of thinking of the training process as outputting explicit representations of different shapes, we think of it as outputting a callable function that we can use inside of algorithms like sphere marching when conducting queries on the shape.

We can also train a specific neural network just to represent one shape. In this case, we don't have the latent code $\mathbf{z}_i$ as input and we indicate that the weights $\theta_i$ are optimized for a particular $i$th shape we might want to store: $$(x,y,z) \rightarrow f_{\theta_i} \rightarrow s.$$
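As a sketch of this per-shape fitting, here's a toy training loop where an analytic sphere stands in for the ground-truth signed distances we'd normally compute from a mesh; the network size, sampling range, and step counts are arbitrary:

```python
import torch

# Ground truth: the analytic SDF of a unit sphere (a stand-in for a real shape).
gt_sdf = lambda p: p.norm(dim=-1) - 1.0

# A small MLP whose weights will encode this one shape.
f_theta = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1))

opt = torch.optim.Adam(f_theta.parameters(), lr=1e-3)
for step in range(2000):
    p = torch.rand(1024, 3) * 2.0 - 1.0          # random query points in [-1, 1]^3
    loss = (f_theta(p).squeeze(-1) - gt_sdf(p)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```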

From the point of view of training over an entire class of shapes, we can think of this as overfitting the network's weights to one particular shape. It's perhaps better to think of the training process as still "generalizing": we're merely generalizing over all spatial queries $(x,y,z)$ and not also learning a distribution of shapes from a dataset.

This weight-based encoding can't easily do combinations between different shapes, but it gives us a very compact way to represent 3D shapes. Unlike storing a signed distance field on a grid, this neural network is able to adapt where it's spending its informational budget to the particular shape at hand.

A 3D model of a dragon represented as a mesh, a grid-based signed distance field, and a neural implicit function
This dragon is compressed using mesh decimation, a regular grid of signed distance values, and a neural implicit function. All representations use roughly the same number of floating point numbers. The neural implicit representation is a self-adapting and continuously parameterized form of compression. To pick weight values $\theta$, we don't have to make discrete decisions; we can just run some form of gradient descent on a reconstruction loss against ground truth values.

We can also take the same neural network structure and train weights for each shape of a large dataset of shapes like Thingi10K to exploit this representation as a form of dataset compression. Since the non-linear space of functions is fixed by the network architecture, each shape is encoded as a simple vector of weight values.

A bunch of shapes from thingi10k dataset compressed using neural implicits
We can compress the entire Thingi10K dataset using the same neural network architecture. Each shape is represented by a vector of weights. This lossy compression reduces the 38.85GB dataset to 590MB.

Neural radiance fields (NeRFs) are similar to DeepSDFs or neural implicits. They also take a query point in space as input and output something related to geometry. Typically they're not outputting a signed distance field but rather a value $\sigma$ representing a quantity more like occupancy or density of material in space and a quantity $(r,g,b)$ capturing color or reflectance of light: $$(x,y,z,\alpha,\beta) \rightarrow f_{\theta_i} \rightarrow (r,g,b,\sigma).$$ The weights $\theta_i$ represent a particular shape or scene and the additional inputs $\alpha$ and $\beta$ represent the viewing direction for reflectance calculations.
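To make the rendering side concrete, here's a sketch of the standard volume-rendering quadrature that turns $(r,g,b,\sigma)$ samples along a ray into a pixel color; the sampling strategy, positional encodings, and the network itself are omitted, and the tensor shapes are illustrative:

```python
import torch

def composite_along_ray(rgb, sigma, deltas):
    """Standard volume-rendering quadrature used by NeRF-style methods.

    rgb:    (N, 3) colors sampled along a ray
    sigma:  (N,)   densities sampled along the ray
    deltas: (N,)   distances between consecutive samples
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)   # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)  # transmittance
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(dim=0)  # composited pixel color
```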

University of Toronto PhD student Towaki Takikawa demonstrates capturing photos to train a NeRF model in his apartment. This video is part of a larger survey and course effort.

NeRFs have made it significantly easier to capture 3D scenes. Instead of representing a realistic 3D scene as a collection of triangle meshes with UV coordinates plus texture maps, we represent an entire scene by optimizing the weight values of a NeRF's neural network so that when we render the NeRF at each camera location it matches the corresponding photograph as closely as possible.

DeepSDFs and NeRFs show the power of neural fields for compression and reconstruction of 3D shapes, but we're also seeing their impact on modeling.

A diagram showing the text-to-3D pipeline

At a bird's-eye view, without getting into too many details, generative text-to-3D modeling works by taking a painfully-expensive-to-train text-to-image diffusion model like Stable Diffusion: something that boils all the images we have on the internet down into the weights of a large network, and can then take a text prompt as input and repeatedly nudge an output image toward matching that text. We take our continuous 3D representation, like a NeRF or DeepSDF with colors, and we define the rendering function as a forward process that maps from the 3D world to the image domain.

Now, if we can measure a loss that tells us how well our rendered image matches the one predicted from the text prompt, then we can pass the change in this loss function backwards to modify the 3D representation: backpropagation. This requires that we can differentiate through the rendering process, applying the chain rule so that the change in the loss with respect to the image is combined with the change in the image with respect to the 3D representation. This works because NeRFs and DeepSDFs are a continuous parameterization of a big space of possible 3D shapes.

There are important details I've glossed over, such as how to set up the loss so that you don't get a peacock pasted 10 times around your shape and so that you actually descend quickly (see for example "DreamFusion: Text-to-3D using 2D Diffusion" [Poole et al. 2022] or "Magic3D: High-Resolution Text-to-3D Content Creation" [Lin, Gao, Tang, Takikawa, Zeng, et al. 2023]).
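Stripped of those details, the optimization loop is short. In this sketch, every name (`render`, `guidance_loss`, `sample_camera`) is a placeholder for machinery the cited papers provide, not a real API:

```python
import torch

def optimize_shape(params, render, guidance_loss, sample_camera, steps=10_000, lr=1e-2):
    """Hypothetical, highly simplified text-to-3D loop.

    params:        list of tensors parameterizing the 3D representation
    render:        differentiable function mapping (params, camera) -> image
    guidance_loss: "does this image look like the text prompt?" loss
    sample_camera: returns a random viewpoint each iteration
    """
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        cam = sample_camera()            # random viewpoint
        image = render(params, cam)      # differentiable: 3D -> 2D
        loss = guidance_loss(image)      # guidance from the pretrained 2D model
        opt.zero_grad()
        loss.backward()                  # chain rule back into the 3D parameters
        opt.step()
    return params
```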

Interestingly, the actual neural network part of these implicit 3D representations is not the essential ingredient. There are a number of papers (e.g., "ReLU Fields", "Plenoxels", or "Gaussian Splatting") in the last few years that just get rid of the neural network part in favor of some other non-linear implicit function parameterization.

The key ingredients appear to be: 1. a non-linear and continuous 3D representation; and 2. a differentiable process of going from the 3D representation to a 2D image (and 3. — as anyone knows who's actually done this — good camera alignment).

Compared to the simple diagram of arrows above, the triangle mesh pipeline suddenly appears quite stilted and complicated.

A diagram showing the pipeline of photogrammetry for a guitar mesh
A cartoon view of photogrammetry using triangle meshes. We start by taking photographs and use some process to create a point cloud. From the point cloud we create an auxiliary implicit function with something like "Poisson Surface Reconstruction". That's not really going to be our output, so we run marching cubes on it, which gives us a mesh, but that mesh is way too high resolution and messy, so we decimate and remesh it. We still have to worry about the colors, so we flatten the mesh to define UV coordinates and then store our colors in an image.

After many stages and lossy conversion processes, we finally have something that looks like a production-ready 3D asset that we could ship off to a computer game. The promise of this convoluted-seeming pipeline is that, once we're done, things like rendering and animation are incredibly fast and highly optimized for our current software and hardware stacks.

But this is not easy to optimize end-to-end. Suppose we wanted to take all the variables in this whole process and throw them into gradient descent. This would be really tricky. Many of the steps involve discrete choices and chain together sub-operations that don't necessarily talk well together. If our goal wasn't to create an asset for an existing 3D video game, then why are we doing it this way?

Intrinsic Triangulations: A guiding example of embracing diverse function spaces

The triangle mesh pipeline has encouraged us to think of the space that we use for representing geometry as the same space that we use for doing computation on that geometry. We go through this painful process of creating a triangle mesh, and then we think of it not only as the true surface that we're going to do our geometry processing on, but also as the parameterization of the space of anything we do on that surface.

My historical hypothesis is that when this pipeline really came into place, both rendering and geometry processing were pretty slow and both were kind of happy with fairly coarse 3D models.

A 3D model of an armadillo
In 2007, Halo 3 was a major video game release. By today's standards, its geometry is extremely coarse. Meanwhile, "As-Rigid-As-Possible Surface Modeling" [Sorkine-Hornung & Alexa 2007], the best-known SGP paper of that year, featured models with only a few hundred or thousand triangles.

Nowadays, both rendering and geometry processing are working with much higher resolution models. Nanite in the Unreal Engine gladly digests assets with millions or maybe even billions of triangles. Geometry processing research has found ways to scale up to million-triangle meshes (e.g., "Progressive Simulation for Cloth Quasistatics" [Zhang et al. 2022]). And yet, geometry processing methods still primarily use these triangle meshes to define all of the spaces that we do computation on as well.

A triangle mesh of a cylinder
Consider a simple example like this triangle mesh of a cylinder. To represent its geometry, the triangle mesh has done a great job. Why not? Just use really long, skinny triangles along the side of the cylinder. It's perfect.

While long, skinny triangles are perfect for representing the geometry of the cylinder, if we also insist on using this mesh to represent all computation on the cylinder it means we can't represent interesting functions along the side of the cylinder. For example, we couldn't represent a wave rippling over the side of the cylinder. We've limited the function space to linear functions corresponding to the long edges.

We see the analogous problem happening with Gaussian splats. If I represent the cylinder with elongated Gaussian splats, they'll do a good job representing the side by putting very skewed or stretched Gaussians along the side.

figures from gaussian splat deformation papers
Researchers trying to do computation atop Gaussian splat representations are noticing that these long skinny Gaussians don't afford interesting functions either (e.g., Deformable 3D Gaussians, PhysGaussian, Simplicits). For example, when trying to compute displacement fields for animation the skinny Gaussians end up awkwardly poking out. They were an efficient function space for the original geometry but a poor function space for the intended deformation.

For triangle meshes, the geometry processing community has an answer: intrinsic triangulations. We call a typical triangle mesh with vertex positions in 3D an extrinsic triangulation of a surface. Intrinsic triangulations build an alternative mesh on top of an existing extrinsic triangulation.

an intrinsic triangulation of a cylinder
This intrinsic triangulation atop the extrinsic triangulation of the cylinder above is a refinement of the function space for computation that doesn't change the geometry of the surface. Each intrinsic triangulation is a kinky, piecewise flat triangulation whose edges zig-zag over the original extrinsic triangulation.

Intrinsic triangulations are not a remeshing of the surface by adding new extrinsic triangles or vertices. And yet, we can represent much more interesting functions upon them.

Intrinsic triangulations are also a great way to improve the robustness of our algorithms, even on meshes that are high resolution.

A bad geodesics on a mesh, a zoom in, and good geodesics
A University of Toronto undergraduate and I were convinced we had a bug in our heat geodesics implementation when we saw a very poor quality result on a simple test mesh (left). The mesh is very high resolution and regular (center, zoomed in). There are plenty of degrees of freedom, but all of these small triangles have a consistently bad aspect ratio, and the numerics in the algorithm for solving the PDE catastrophically fail. If we flip edges until the triangles are intrinsically well shaped, then we get the correct result, as seen on the right.
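A nice property of these flips is that the test for whether an edge should be flipped needs only edge lengths, never vertex positions. Here's a minimal sketch of the intrinsic Delaunay condition (a self-contained illustration, not any particular library's implementation):

```python
import numpy as np

def opposite_angle(a, b, c):
    """Angle opposite the edge of length c in a triangle with edge lengths a, b, c (law of cosines)."""
    return np.arccos(np.clip((a * a + b * b - c * c) / (2.0 * a * b), -1.0, 1.0))

def edge_is_intrinsically_delaunay(l_shared, tri1, tri2):
    """tri1/tri2: the other two edge lengths of each triangle adjacent to the shared edge.

    The shared edge satisfies the (intrinsic) Delaunay condition when the two
    angles opposite it sum to at most pi; otherwise flipping it improves the
    triangulation without moving a single vertex of the underlying surface.
    """
    alpha = opposite_angle(tri1[0], tri1[1], l_shared)
    beta = opposite_angle(tri2[0], tri2[1], l_shared)
    return alpha + beta <= np.pi + 1e-12
```

An intrinsic flipping algorithm repeatedly flips any interior edge failing this test, recomputing the flipped edge's length from the surrounding lengths, until every edge passes.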

Intrinsic remeshing comes at almost no extra cost. The world of intrinsic triangulations is becoming quite complete. As seen above, there is refinement and remeshing. There is also intrinsic simplification, which gives us a new way to navigate the quality-performance trade-off. If we want a solution faster while maintaining as much quality as we can, the old way was to simplify the geometry itself and then run our algorithm there, on the shared lower resolution function space. This results in much lower quality solutions than building a simplified intrinsic triangulation of the same resolution that lives atop the original surface without changing the geometry at all.

Intrinsic triangulations are an example — within the world of triangle meshes — showing the room to gain by embracing the diversity of function spaces. As new function spaces like NeRFs, DeepSDF, and Gaussian Splats flex their power for representation of geometry, we should learn from intrinsic triangulations not to blindly expect that these same representations will be ideal for computation.

Skipping triangles completely

So in the year 2122, how will we represent 3D geometry? How will we do computation on 3D data? Will we keep triangle meshes for their explicitness? Or will we move completely to Gaussian splats and NeRFs for their fuzziness, because they can represent surfaces that aren't smooth, hard surfaces?

A NeRF reconstruction of a Woolly Mammoth created by University of Toronto PhD student Selena Ling using a procedure similar to "Adaptive Shells".

My expectation is that we will move to representations that can leverage data in a meaningful way. That means moving away from rule-based representations. Looking back at the history of computer vision, this progression has been very clear.

An abstraction of a human face with springs connecting each feature
This figure from a seminal 1973 paper on human face detection (by researchers at a weapons manufacturing company 😬) shows how the relative positions of features of a human face were modeled using springs.

Rule-based models — like springs between face features — could be understood as a futile effort to model every possible interaction by hand: in vision, referred to disparagingly as hand-crafted features. Alternatively, they could be seen as an attempt to design a first-principles simulation of the underlying physics. But this is equally futile. For example, in the case of the human face, this would mean not just modeling the instantaneous elasto-dynamics and muscle activation in a face, but also the entire growth process resulting in that face. This may be a fun exercise in physics simulation for the sake of it, but it's not a promising way to do face detection.

Indeed, we have achieved astonishingly good face detection now by training models fed with billions of images of human faces. If those growth and movement physics are learned during that training, then they are latent or implicit in the model, not hard coded by rules. It was more promising to approximate the visual system than come up with a seemingly endless list of rules.

My hope for geometry processing is that by shifting to representations that separate geometric representation from computation we can similarly build better high-level applications. In particular, we should strive to build on representations that give us more insight into what comes before and after each step in the geometry processing pipeline.

Along with many others, I have worked much of my early career on making the geometry processing pipeline more robust. The idealized pipeline flushes input geometric data from acquisition through various stages of analysis and editing until consumption. In reality, we know that this pipeline suffers from various sources of "leaks".

animation of a broken pipeline with labelled leaks
Leaks in the geometry processing pipeline could come from lossy conversion processes, algorithmic insufficiencies requiring manual intervention, or catastrophic failure requiring us to go back to acquisition for new data. Data that travel through successfully can give the false impression that everything is fine: survivor bias.

One of our most successful contributions to robust geometry processing was the concept of a generalized winding number, which was conceived originally as a subroutine in the process of conducting elastic simulation on a volumetric shape. In order to run finite-element simulation on a 3D shape, we needed a tetrahedralization of its interior. Existing tetrahedralization methods — like TetGen — were powerful, but only if you told them where the inside versus the outside of the shape was: this is exactly what the generalized winding number could do. Once we could tetrahedralize any shape, we could join up with the rest of the simulation pipeline. We were plugging a hole in an existing pipeline.
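For a triangle mesh, the generalized winding number at a query point is the sum of the signed solid angles subtended by the triangles, divided by $4\pi$. Here's a direct, unaccelerated sketch (the practical "fast winding number" uses a hierarchical approximation instead of this per-triangle loop):

```python
import numpy as np

def winding_number(query, V, F):
    """Generalized winding number of a triangle mesh (V, F) at a query point.

    Sums the signed solid angle of each triangle (Van Oosterom & Strackee's
    formula) and divides by 4*pi. For a watertight mesh this is 1 inside and
    0 outside; for messy "triangle soup" it degrades gracefully to a
    confidence-like value.
    """
    w = 0.0
    for f in F:
        a, b, c = V[f[0]] - query, V[f[1]] - query, V[f[2]] - query
        la, lb, lc = np.linalg.norm(a), np.linalg.norm(b), np.linalg.norm(c)
        numerator = np.dot(a, np.cross(b, c))
        denominator = (la * lb * lc + np.dot(a, b) * lc
                       + np.dot(b, c) * la + np.dot(c, a) * lb)
        w += 2.0 * np.arctan2(numerator, denominator)
    return w / (4.0 * np.pi)
```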

Later, we realized that the generalized winding number was actually a way to skip over entire portions of geometry processing pipelines. Consider typical 3D printing software which expects as input a triangle mesh (really an .stl file) describing the surface of the shape to be printed.

If I have scanned some object into a point cloud, then I would need to first go through some process to create a triangle mesh, which I then hand off to the 3D printing software. Given this mesh, the software will decide for each slice through the object where the printer needs to squirt plastic to fill the interior of the shape. With respect to the original point cloud, this has just been a roundabout way of deciding what's inside and what's outside. Indeed, we can further generalize the winding number to point clouds to do just that.

The fast winding number inside-outside function was directly integrated into a 3D printing slicer to print point clouds directly without the need for a triangle mesh.

Much of my past work has been on making geometry processing robust. I still think this is a noble pursuit. This improves the throughput of our geometry processing pipeline. But that's not enough!

I think we should question the pipeline altogether and specifically think about which parts of the pipeline have become obsolete, remaining only for legacy reasons.

cartoon of opengl rasterization pipeline
Just as triangle meshes are themselves symptomatic of the legacy real-time rendering pipeline, so are many of the core algorithms that we work on in geometry processing research.

Consider storing colors for a 3D surface. The real-time rendering pipeline embraced texture mapping and geometry processing research responds with algorithms for automatic UV mapping. In turn, we typically think of this problem as picking new UV coordinates for each of the vertices of our mesh. Similarly, vertex shaders reposition vertices and we typically think of surface deformation as the problem of finding new positions over time for an existing triangle mesh's vertices.

Eventually, by assuming these problems should be posed this way, we also anchored our expectations that our algorithms should be solutions to tidy convex optimization problems and their performance should be roughly on the order of solving a big, sparse linear system. And then that biases the way that we judge new contributions in the field of geometry processing.

As researchers we should be more comfortable with the idea of doing gradient-based optimization, the dominant approach in machine learning.

It turns out that even traditional mesh processing optimization problems can be solved more efficiently with the Adam solver. For example, both of the optimization problems in "Fast Quasi-Harmonic Weights for Geometric Data Interpolation" [Yu et al. 2021] and "Spectral Coarsening of Geometric Operators" [Liu et al. 2019] are convex programs that could be fed into black-box solvers. But each paper shows that you can get much better performance by running some variant of the Adam solver. We should break out of the habit of framing the goal as setting up a big linear system and solving it.
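As a toy illustration of the mindset shift (not the setup from either paper), here's a regularized least-squares problem solved both by forming and solving the normal equations and by iterating Adam on the same objective:

```python
import torch

# A convex toy problem: minimize ||A x - b||^2 + lambda ||x||^2.
A = torch.randn(500, 100)
b = torch.randn(500)
lam = 1e-3

# 1. The traditional route: form and solve the normal equations (a linear system).
x_direct = torch.linalg.solve(A.T @ A + lam * torch.eye(100), A.T @ b)

# 2. The gradient-based route: iterate a first-order solver on the same energy.
x = torch.zeros(100, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(2000):
    loss = (A @ x - b).pow(2).sum() + lam * x.pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the toy is not that Adam wins on this tiny dense problem; it's that a first-order iterative solver is a perfectly reasonable tool even for energies we are used to handing off to a linear solver.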

There's also lots waiting for us (e.g., "text-to-3D" as discussed above) if we're willing to spend the time and energy waiting for large, non-linear, perhaps less precisely defined optimization problems to converge (or at least "converge" in the machine learning sense). We should get more comfortable with problems that do not come equipped with simple mathematical checks for optimality.

Please don't get me wrong. Practitioners today definitely want triangles right now. Some practitioners see the advances of NeRF and Gaussian Splats as creating unusable clutter. They are "colorful flames" radiating their own light instead of reflecting light in a scene, simultaneously super high resolution while ultimately too noisy to extract a clean surface, and lacking part-based hierarchies needed for animation.

Practitioners with these complaints really don't want the current pipeline to change and would be really happy if we could continue making production-ready assets. And sure, this is also a worthy goal to pursue.

The Digital Life project captures animals as 3D models, not just as messy static snapshots. They go through the whole process of triangulation, simplification, UV-mapping, remeshing so that edges flow along lines of action, and rigging so that the animals can be animated for use in video games or children's museum exhibits.

Ultimately, though, it's myopic to stick with the current triangle-mesh pipeline. We will limit ourselves to short-term gains. The problems may be safe, but they will be small.

A YouTube video showing the output of topology optimization being manually traced back into a CAD representation. The user is doing this so that the rest of their geometry processing pipeline can work smoothly.

Advances in topology optimization often output implicit functions which don't fit into the traditional CAD workflow. Imagine if we could instead change the downstream part of the pipeline to accept this output. Then, for example, the simulation or other verification process could not only seamlessly digest the output, but also have its criteria for success turned into a loss function that gets backpropagated to affect the optimization end-to-end.

Maxims of modern geometry processing research

It was nearly the Fourth of July in America when I gave the SGP 2024 keynote version of this material. Every year on July 4th, the Nathan's Hot Dog Eating Contest is held at Coney Island, New York. It occurred to me that this contest provides a good maxim for picking research problems: don't ask how fast you can eat a whole hot dog, ask how fast you can eat lots of hot dogs.

In 2001, Takeru Kobayashi shattered the hot dog eating contest record, raising it from 25⅛ to 51 hot dogs in 12 minutes. At the time, other contestants were simply trying to eat the next hot dog (in its bun) as quickly as possible. Instead, Kobayashi changed the pipeline: he would devour the meat and bun separately, dunking the bun in water to swallow it whole. It's a wonderful, weird example of viewing the larger problem and optimizing for that, or finding a new pattern that shatters the previous way of thinking about it.

Another way that I have been approaching problems is by asking, "am I solving a mesh problem or am I using a mesh to solve a problem?" The meshes are there for us. We don't owe them anything. We don't have to solve their problems. But we can use meshes to solve our problems.

Yet another guiding question is, "Are we looking for an exact solution to an approximate problem or an approximate solution to an exact problem?" Both questions frame research in an interesting way. Stereotypically, mathematicians approximate problems until they can be formulated precisely and if a solution is found, then it can be proven correct. In contrast, the stereotypical engineer might refine a heuristic solution to improve its performance on the actual problem facing people in reality. Both views have value, and we should find balance between them.

In the past, colleagues and I worked on the mesh Boolean problem. The inputs are triangle meshes, and we showed that if they meet certain generous criteria then we could give the precise answer to what their union or intersection is.

Boolean set operations on two triangle meshes
The grey and teal triangle meshes are unioned, intersected and subtracted exactly.

There was no arguing about whether we gave the right or the wrong answer if we accepted the framing of the problem. We could prove that it gave the correct result.

However, often what we want is not the pedantically exact result, but rather an approximate result to what we truly intended the problem to be.

Two coplanar cubes rotated slightly and intersected
If we place two mesh cubes perfectly adjacent to each other, they have zero overlap. However, if we rotate the scene (using floating point arithmetic) and pass the cubes to our exact algorithm, we see small overlaps appearing seemingly at random depending on the rotation. The green region represents the intersection of the two cubes. The algorithm is giving the "exact result to the approximate problem", but in many modeling scenarios, what we would really want is the "approximate solution to the exact problem".

Deriving or coding up an exact solution can be satisfying and feel mathematically rigorous. Relinquishing exactness to better approximate the real problem can be scientifically satisfying too. We don't have to see this point of view as less mathematical or less rigorous.

Colleagues and I have worked on tetrahedralization in the past. The leading tetrahedralization software preferred the "exact solution to the approximate problem" route: it would try its hardest to compute a tetrahedralization that exactly conformed to the triangles of an input surface mesh. The problem was that it would often fail and give no output. Instead, our approach preferred the "approximate solution to the exact problem" route. We allowed our tetrahedralization to remesh the surface as needed to guarantee that we would always output something. Once we always get some result, we can shift the discussion from binary failure rates to statistical measures of quality, conformity to the input, etc. This naturally sets up a benchmark for future improvement along various axes.

Beyond the bunny and into the uncertain

When preparing the SGP 2024 keynote, I procrastinated by looking back at every paper published at SGP 2010 (my first conference, see above). A few papers had interesting 3D models that I'd never seen before, but most papers featured only the standards: the bunny, the armadillo, the horse, etc. This is a testament to the time. There were not that many models.

For our SIGGRAPH North America 2011 submission, my advisor sent me an email with models and said something like, "Here are the models we can use." Those were the models back then.

In 2024, every result in your paper can be a different model. This is wonderful. You can go on stock websites like Thingiverse or TurboSquid and get the best model to make the point of a given research result. You can amass so many models that you can talk scientifically and statistically about the empirical performance of your algorithm versus the state of the art. You can find models that reflect the messiness of the real world, whether it's interesting high-resolution scans or meshes that come from modeling software that doesn't maintain all the nice things that keep our geometry processing algorithms working on clean data.

The messiness of geometric data will only increase. New reconstruction methods are going to flood the geometry processing pipeline with really nasty geometry. We should be ready for that. We should also admit when this messiness causes us to be unsure of an answer.

Diagram of a scanning pipeline
If we consider a scanning pipeline that starts with some raw data, creates a point cloud, runs some surface reconstruction algorithm and creates a mesh, then it's foolish to think that this mesh is a certain representation of the geometry we started with.

Meshes resulting from the reconstruction pipeline are just a particular output of a chain of algorithms that tried to make sense of the raw data that we captured. Instead, if we can track error through this process, it can turn into uncertainty accompanying answers along each step. This eventually turns into information that guides further downstream applications or directly informs people consuming the geometry.

A cartoon of a self-driving car and a point cloud
Consider a self-driving car approaching a point cloud. Will the car hit the scanned geometry? (Image courtesy Silvia Sellán)

We could reconstruct the geometry from a point cloud using a standard method like Poisson surface reconstruction. This method outputs a single shape that we could then test for collisions against.

alternative point cloud reconstructions
Based on the single answer from Poisson surface reconstruction, the car would not collide with the geometry. However, there are many other explanations for the point cloud that would result in a collision. (Image courtesy Silvia Sellán)

Recently, a project led by Silvia Sellán sought to equip the Poisson Surface Reconstruction method with statistically rich output.

probability field
By noticing the connection with Gaussian processes, we can reframe the original Poisson Surface Reconstruction as outputting the mean of a distribution. Those distributions also have a variance, which in turn allows us to compute probabilities. (Image courtesy Silvia Sellán)

We can now ask what the probability is that the geometry covers any given point in space. This is a much more meaningful output than a single geometry. We can apply this idea of uncertainty to novel representations, too.
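Concretely, that per-point probability boils down to evaluating a Gaussian cumulative distribution function. Here's a minimal sketch; the isovalue and sign convention are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np
from scipy.stats import norm

def occupancy_probability(mean, variance, isovalue=0.5):
    """Probability that a point is inside the reconstructed shape.

    Assumes the reconstruction gives a Gaussian (mean, variance) over the
    implicit value at the point, with "inside" meaning the value exceeds the
    isovalue. The isovalue/sign convention depends on how the indicator
    function is defined and is only a placeholder here.
    """
    return 1.0 - norm.cdf(isovalue, loc=mean, scale=np.sqrt(variance))
```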

One problem in the world of NeRF reconstructions occurs when you see part of the scene in only one of the input photographs. The easiest way for a NeRF to explain the blip is to put a little bit of colored density right in front of the camera for that photograph. We call these floaters. Karen X. Cheng's McDonald's ad playfully alludes to them, but generally we see them as a defect of the methodology.

The vanilla NeRF reconstruction of a tractor is obscured by floaters. By quantifying epistemic uncertainty, we can remove these floaters, yielding a clean reconstruction of just the part where the model has high certainty.

Moving on

"After AlexNet took first place in the ImageNet Challenge, everybody of a certain age went through the five stages of grief. First, there was shock and denial. It felt like the world was upended and nothing was going to be the same again. There was no denying the result, however."

— computer vision researcher, Michael Black

We see the same denial, anger, bargaining, and perhaps depression happening within parts of the geometry processing community as new representations replace triangle meshes. Eventually, the computer vision community of course accepted deep learning, and its success has been unparalleled. In geometry processing, our grief doesn't seem to be as deep, and hopefully this eases our acceptance.

One frustrating thing is that we really care about geometry applications. Our community doesn't see them simply as tasks to apply machine learning to one day and move on from the next. So, it's bittersweet to see machine learning advances make big, bold, messy progress on problems that we care so deeply about: procedural modeling, reconstruction, or even physical simulation.

In the end, the diversity of these representations in 3D is an opportunity for us. They allow us to tackle bigger portions of the geometry processing pipeline end-to-end. They open up bigger output spaces for our algorithms, including exciting things like uncertainty.

These will lead to much bigger impacts on people. This includes the professionals who care just as deeply about our applications as we do. Many of these professionals are outside of the world of visual effects, computer games, and computer graphics, where we traced the origins of our triangle mesh obsession.

Our progress in geometry processing stands to make a big impact on everyone on Earth. It's changing the way we do commerce, creation, and communication.

Acknowledgements

I would like to thank the collaborators on the projects I've referenced, including students: Benjamin Chislett, Gavin Barill, Hsueh-Ti Derek Liu, Jiayi Eris Zhang, Lily Goli, Mark Gillespie, Nicholas Sharp, Selena Ling, Silvia Sellán, Thomas Davies, Towaki Takikawa, Yixin Hu. I would also like to thank the authors of the works I've referenced, but was not involved in. Finally, I am fortunate to receive funding for my work at University of Toronto from NSERC, Sloan, Ontario ERA, Canada Research Chairs, Fields Institute, and gifts from Adobe.