
The source CUDA code can be found here. Note that stb_image.h, stb_image_write.h, and stanford-bunny.obj must be placed in the same directory as the source file.
1. Project Objectives
This project implements a particle-based dissolve animation from stanford-bunny.obj. The core visual sequence is:
- Build a static bunny silhouette
- Trigger fracture from a single ear-tip seed (propagation-based)
- Let particles disperse under normal-direction blast + Curl Noise field
- Finish with either reconstruction (`regroup`) or disappearance (`vanish`)
- Run a two-orbit camera path with near/far shot transitions
- Export a PNG sequence and encode it into a high-quality GIF
2. Overall Technical Architecture
The system uses an offline rendering pipeline:
- CPU side: OBJ parsing, triangle construction, area-weighted particle sampling
- GPU side: per-frame particle dynamics update (CUDA kernel)
- GPU side: particle projection + multi-segment trail splat accumulation (CUDA kernel)
- GPU side: background composition, bloom, vignette, gamma (CUDA kernel)
- CPU side: write per-frame PNG files
- FFmpeg: generate GIF with `palettegen` + `paletteuse`
Key idea: instead of triangle rasterization, reconstruct the mesh surface as a dense particle cloud, then apply stage-based dynamics and stylized shading.
3. Geometry Data and Particle Initialization
3.1 OBJ Parsing and Triangulation
- Parse `v` and `f` lines, including first-field index parsing for the `f v/vt/vn` format.
- Triangulate polygonal faces via fan triangulation.
- Skip degenerate triangles (very small normal length).
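The parsing and triangulation steps above can be sketched in host-side C++ as follows. This is a minimal illustration, not the project's actual parser: the helper names `parse_first_index` and `fan_triangulate` are hypothetical, and negative (relative) OBJ indices are assumed not to occur.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: extract the vertex index from one face token such as
// "7", "7/2" or "7/2/5". OBJ indices are 1-based, the result is 0-based.
// Negative (relative) OBJ indices are not handled in this sketch.
inline int parse_first_index(const std::string& token) {
    // strtol stops at the first non-digit character, i.e. at the first '/'.
    return static_cast<int>(std::strtol(token.c_str(), nullptr, 10)) - 1;
}

// Fan triangulation: (i0, i1, ..., in-1) -> (i0,i1,i2), (i0,i2,i3), ...
// This is only guaranteed to be correct for convex polygons.
inline std::vector<std::array<int, 3>> fan_triangulate(const std::vector<int>& poly) {
    std::vector<std::array<int, 3>> tris;
    for (std::size_t k = 1; k + 1 < poly.size(); ++k) {
        const std::array<int, 3> tri = {poly[0], poly[k], poly[k + 1]};
        tris.push_back(tri);
    }
    return tris;
}
```

A quad `f 1 2 3 4` therefore becomes the two triangles (0, 1, 2) and (0, 2, 3) after index conversion.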
3.2 Normalization
- Compute the bounding-box center and translate geometry near the origin.
- Uniformly scale by the maximum extent to stabilize framing and motion scale.
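A possible form of this normalization is sketched below. The `Vec3` type and the function name `normalize_mesh` are assumptions for illustration; the target size (largest dimension scaled to length 1) is also an assumption, since the text does not state the exact scale used.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

// Center the bounding box at the origin and scale uniformly so that the
// largest bounding-box dimension has length 1 (assumed target size).
inline void normalize_mesh(std::vector<Vec3>& verts) {
    if (verts.empty()) return;
    Vec3 lo = verts[0], hi = verts[0];
    for (const Vec3& v : verts) {
        lo.x = std::min(lo.x, v.x); hi.x = std::max(hi.x, v.x);
        lo.y = std::min(lo.y, v.y); hi.y = std::max(hi.y, v.y);
        lo.z = std::min(lo.z, v.z); hi.z = std::max(hi.z, v.z);
    }
    const Vec3 c = {0.5f * (lo.x + hi.x), 0.5f * (lo.y + hi.y), 0.5f * (lo.z + hi.z)};
    const float extent = std::max(hi.x - lo.x, std::max(hi.y - lo.y, hi.z - lo.z));
    const float s = extent > 0.0f ? 1.0f / extent : 1.0f;  // guard against a single point
    for (Vec3& v : verts) {
        v.x = (v.x - c.x) * s;
        v.y = (v.y - c.y) * s;
        v.z = (v.z - c.z) * s;
    }
}
```

Because the scale is uniform, the mesh keeps its proportions; only its position and overall size change.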
3.3 Area-Uniform Surface Sampling
- Compute triangle areas and build a CDF.
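- For each particle:
  - Pick a triangle with area-weighted probability
  - Use barycentric sampling
  - Position: $\mathbf{p} = u\,\mathbf{a} + v\,\mathbf{b} + w\,\mathbf{c}$, with barycentric weights $u + v + w = 1$ over the triangle vertices $\mathbf{a}, \mathbf{b}, \mathbf{c}$
  - Store surface normal, random phase `phase`, and random seed `seed`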
This yields a particle distribution that visually behaves like a true mesh surface instead of random volumetric sampling.
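The sampling procedure can be sketched as below. The names `build_area_cdf` and `sample_surface` are hypothetical, and the square-root parameterization of the barycentric weights is one standard way to get a uniform distribution inside a triangle; the original code may use a different but equivalent scheme.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 a, b, c; };

inline float triangle_area(const Triangle& t) {
    const Vec3 e1 = {t.b.x - t.a.x, t.b.y - t.a.y, t.b.z - t.a.z};
    const Vec3 e2 = {t.c.x - t.a.x, t.c.y - t.a.y, t.c.z - t.a.z};
    const Vec3 n = {e1.y * e2.z - e1.z * e2.y,
                    e1.z * e2.x - e1.x * e2.z,
                    e1.x * e2.y - e1.y * e2.x};
    return 0.5f * std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
}

// Running sum of triangle areas; cdf.back() is the total surface area.
inline std::vector<float> build_area_cdf(const std::vector<Triangle>& tris) {
    std::vector<float> cdf;
    cdf.reserve(tris.size());
    float sum = 0.0f;
    for (const Triangle& t : tris) {
        sum += triangle_area(t);
        cdf.push_back(sum);
    }
    return cdf;
}

// Draw one point: binary-search the CDF for a triangle, then pick
// barycentric weights (u, v, w) that are uniform over the triangle.
inline Vec3 sample_surface(const std::vector<Triangle>& tris,
                           const std::vector<float>& cdf,
                           std::mt19937& rng) {
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    const float target = uni(rng) * cdf.back();
    std::size_t idx = static_cast<std::size_t>(
        std::upper_bound(cdf.begin(), cdf.end(), target) - cdf.begin());
    idx = std::min(idx, tris.size() - 1);  // guard against rounding at the end
    const float su = std::sqrt(uni(rng));
    const float r2 = uni(rng);
    const float u = 1.0f - su, v = su * (1.0f - r2), w = su * r2;
    const Triangle& t = tris[idx];
    return {u * t.a.x + v * t.b.x + w * t.c.x,
            u * t.a.y + v * t.b.y + w * t.c.y,
            u * t.a.z + v * t.b.z + w * t.c.z};
}
```

With this scheme a triangle that is nine times larger receives roughly nine times as many particles, which is what keeps the density even across large and small faces.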
4. Procedural Noise and Curl Field
4.1 Hash-Noise Foundation
- Use `pcg_hash` + integer lattice hashing to construct parallel-evaluable pseudo-random value noise.
- `value_noise_3d` produces continuous noise via trilinear interpolation and Hermite smoothing.
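A minimal host-side sketch of such a noise function follows. The `pcg_hash` body is the widely circulated 32-bit PCG-style hash; whether the project uses exactly these constants is an assumption, as are the helper `lattice_value` and the way the three lattice coordinates are combined. In the CUDA source these functions would carry a `__device__` qualifier.

```cpp
#include <cmath>
#include <cstdint>

// Widely used 32-bit PCG-style hash (assumed to match the project's pcg_hash).
inline std::uint32_t pcg_hash(std::uint32_t v) {
    const std::uint32_t state = v * 747796405u + 2891336453u;
    const std::uint32_t word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

// Hypothetical helper: hash an integer lattice point to a float in [0, 1).
inline float lattice_value(int ix, int iy, int iz) {
    std::uint32_t h = pcg_hash(static_cast<std::uint32_t>(ix));
    h = pcg_hash(h ^ static_cast<std::uint32_t>(iy));
    h = pcg_hash(h ^ static_cast<std::uint32_t>(iz));
    return static_cast<float>(h >> 8) * (1.0f / 16777216.0f);  // top 24 bits / 2^24
}

inline float hermite(float t) { return t * t * (3.0f - 2.0f * t); }
inline float mix(float a, float b, float t) { return a + (b - a) * t; }

// Value noise: hash the 8 surrounding lattice corners, then blend them
// trilinearly with Hermite-smoothed weights.
inline float value_noise_3d(float x, float y, float z) {
    const float fx = std::floor(x), fy = std::floor(y), fz = std::floor(z);
    const int ix = static_cast<int>(fx), iy = static_cast<int>(fy), iz = static_cast<int>(fz);
    const float tx = hermite(x - fx), ty = hermite(y - fy), tz = hermite(z - fz);
    const float x00 = mix(lattice_value(ix, iy, iz),         lattice_value(ix + 1, iy, iz),         tx);
    const float x10 = mix(lattice_value(ix, iy + 1, iz),     lattice_value(ix + 1, iy + 1, iz),     tx);
    const float x01 = mix(lattice_value(ix, iy, iz + 1),     lattice_value(ix + 1, iy, iz + 1),     tx);
    const float x11 = mix(lattice_value(ix, iy + 1, iz + 1), lattice_value(ix + 1, iy + 1, iz + 1), tx);
    return mix(mix(x00, x10, ty), mix(x01, x11, ty), tz);
}
```

Since every evaluation depends only on its input coordinates, each GPU thread can compute noise independently with no shared state or lookup table.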
4.2 Curl Noise
- Build a 3D vector potential field $\boldsymbol{\Psi}(\mathbf{x})$, then compute its curl: $\mathbf{v} = \nabla \times \boldsymbol{\Psi}$
- Estimate derivatives with finite differences for stable and parallel-friendly evaluation.
Meaning: the curl field is approximately divergence-free, so particle motion looks like coherent advection/vortical flow instead of unstructured jitter.
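The finite-difference curl can be illustrated with the sketch below. To keep it self-contained and checkable, the potential here is an analytic stand-in (sines of the coordinates) rather than the project's noise-based potential; `potential` and `curl_fd` are hypothetical names, and the central-difference step `h` is an assumed parameter.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Stand-in vector potential. In the project this would be built from
// value noise; an analytic field is used here so the result can be verified.
inline Vec3 potential(Vec3 p) {
    return {std::sin(p.y), std::sin(p.z), std::sin(p.x)};
}

// Curl by central differences:
//   curl = (dPz/dy - dPy/dz, dPx/dz - dPz/dx, dPy/dx - dPx/dy)
inline Vec3 curl_fd(Vec3 p, float h) {
    const Vec3 xp = potential({p.x + h, p.y, p.z}), xm = potential({p.x - h, p.y, p.z});
    const Vec3 yp = potential({p.x, p.y + h, p.z}), ym = potential({p.x, p.y - h, p.z});
    const Vec3 zp = potential({p.x, p.y, p.z + h}), zm = potential({p.x, p.y, p.z - h});
    const float inv = 1.0f / (2.0f * h);
    return {((yp.z - ym.z) - (zp.y - zm.y)) * inv,
            ((zp.x - zm.x) - (xp.z - xm.z)) * inv,
            ((xp.y - xm.y) - (yp.x - ym.x)) * inv};
}
```

For this stand-in potential the exact curl is $(-\cos z, -\cos x, -\cos y)$, so the numerical result can be compared against it directly. Each curl evaluation costs six potential lookups, all independent of other particles.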
5. Fracture Dynamics
Particles are updated each frame with explicit integration of velocity and position, where d is drag and F is total force.
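One common form of such an update is sketched below. It is an assumption, not the project's confirmed scheme: the velocity is advanced by the force with unit mass, damped multiplicatively by the drag factor `d`, and the position then uses the new velocity (semi-implicit Euler). The function name `integrate` is hypothetical.

```cpp
struct Vec3 { float x, y, z; };

// Assumed update rule (unit mass, multiplicative drag d in (0, 1]):
//   v' = (v + F * dt) * d
//   x' = x + v' * dt
inline void integrate(Vec3& pos, Vec3& vel, Vec3 force, float drag, float dt) {
    vel.x = (vel.x + force.x * dt) * drag;
    vel.y = (vel.y + force.y * dt) * drag;
    vel.z = (vel.z + force.z * dt) * drag;
    pos.x += vel.x * dt;
    pos.y += vel.y * dt;
    pos.z += vel.z * dt;
}
```

With `drag` slightly below 1, velocities decay when no force acts, which keeps the particle cloud from drifting apart indefinitely.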
5.1 Ear-Tip Single-Seed Propagation Mask
Given the ear-tip source point, each particle's fracture onset depends on its distance from that point.
Interpretation: particles farther from the ear tip fracture later, producing a clear propagating wavefront.
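A distance-delayed mask with this behavior could look like the sketch below. All names and parameters (`fracture_mask`, `wave_speed`, `ramp`) are hypothetical; the project's actual mask formula is not reproduced here.

```cpp
#include <algorithm>

inline float smoothstep01(float e0, float e1, float x) {
    const float t = std::min(1.0f, std::max(0.0f, (x - e0) / (e1 - e0)));
    return t * t * (3.0f - 2.0f * t);
}

// Returns 0 before the wavefront reaches the particle and 1 once it is
// fully released. The arrival time grows linearly with distance to the seed.
inline float fracture_mask(float dist_to_seed, float t, float wave_speed, float ramp) {
    const float arrival = dist_to_seed / wave_speed;
    return smoothstep01(arrival, arrival + ramp, t);
}
```

At any fixed time the mask is non-increasing in distance, so the released region is a ball around the ear tip that grows as time advances.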
5.2 Blast Pulse and Wavefront Enhancement
- A two-peak `pulse` controls primary and secondary bursts.
- `source_gain` strengthens the near-ear region.
- `wave_front` boosts the currently advancing fracture front.
Approximate blast term:
5.3 Force Composition
The total force is the sum of:
- Normal-direction blast force (along surface normals)
- Shock radial force (outward from ear tip)
- Curl Noise flow force (smoke/fluid feel)
- Global outward expansion force (detaching from body)
- Tangential swirl force (adds rolling structure)
- End-state force:
  - `regroup`: pull back to initial surface position
  - `vanish`: gravity-driven sinking + alpha fade
5.4 Stage-Based Opacity and Regroup Stability
- `regroup`: raise opacity after dispersion; final `snap` interpolation enforces stable silhouette closure.
- `vanish`: delay fade-out to preserve longer trail readability before disappearance.
6. Camera System (Two Orbits + Near/Far Shots)
The camera is a parameterized orbital camera:
- Total orbit angle: `4π` (two full revolutions)
- Radius switching between far `~3.3` and near `~1.72`
- Near-shot windows driven by two Gaussian pulses in mid/late timeline
- Mild vertical oscillation + target offset coupling
Effect: combines global shape readability (far shots) with high-impact fracture details (close shots).
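The camera path can be sketched as a function of normalized time. The orbit angle and the far/near radii come from the description above; the pulse centers and widths (0.45, 0.80, 0.06, 0.05), the height amplitude, and the name `camera_position` are assumptions for illustration, and the target offset is omitted.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float gauss_pulse(float t, float center, float sigma) {
    const float d = (t - center) / sigma;
    return std::exp(-0.5f * d * d);
}

// t runs from 0 to 1 over the whole animation.
inline Vec3 camera_position(float t) {
    const float kPi = 3.14159265f;
    const float angle = 4.0f * kPi * t;  // two full revolutions
    // Near-shot weight: two Gaussian windows (centers and widths are assumed).
    const float near_w = std::min(1.0f, gauss_pulse(t, 0.45f, 0.06f) +
                                        gauss_pulse(t, 0.80f, 0.05f));
    const float radius = 3.3f + (1.72f - 3.3f) * near_w;    // far -> near blend
    const float height = 0.15f * std::sin(2.0f * kPi * t);  // assumed mild bob
    return {radius * std::cos(angle), height, radius * std::sin(angle)};
}
```

Because the radius is a smooth function of time, the push-in and pull-out happen without any visible cut in the exported frame sequence.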
7. Particle Rendering Model (Screen Space)
7.1 Projection
- Use camera basis vectors `right/up/forward` for view-space projection.
- Particle depth controls screen radius and base opacity.
7.2 Velocity Trails (Key Visual Enhancer)
- Project world-space velocity into screen space to get motion direction.
- Compute dynamic `tail_len` from speed magnitude.
- Use up to 4 `trail_taps` for reverse-direction multi-segment splats.
- Control each segment’s radius, alpha, and color independently (toward violet-pink at the tail).
This is the core implementation behind “longer pink-purple trails”, without relying on post-process motion blur.
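The trail construction can be sketched as follows. The struct `Tap`, the function `make_trail`, and the specific falloff curves for radius and alpha are hypothetical; only the idea of up to four taps placed against the motion direction, with length driven by speed, comes from the description above.

```cpp
#include <algorithm>
#include <cmath>

struct Tap { float x, y, radius, alpha; };

// Places up to 4 taps behind a particle at screen position (sx, sy) moving
// with screen-space velocity (vx, vy). Returns the number of taps written.
inline int make_trail(float sx, float sy, float vx, float vy,
                      float tail_scale, float max_len,
                      float base_radius, Tap out[4]) {
    const float speed = std::sqrt(vx * vx + vy * vy);
    if (speed < 1e-6f) return 0;  // static particle: no trail
    const float tail_len = std::min(max_len, tail_scale * speed);
    const float dx = vx / speed, dy = vy / speed;
    for (int k = 0; k < 4; ++k) {
        const float f = (k + 1) / 4.0f;  // 0.25 .. 1.0 along the tail
        out[k].x = sx - dx * tail_len * f;           // step against the motion
        out[k].y = sy - dy * tail_len * f;
        out[k].radius = base_radius * (1.0f - 0.6f * f);  // assumed taper
        out[k].alpha = 0.8f * (1.0f - f) + 0.05f;         // assumed fade
    }
    return 4;
}
```

Each tap is then splatted like an ordinary particle, so the trail costs a few extra splats per particle rather than a full-screen blur pass.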
7.3 Accumulation Strategy
- Accumulate all particles into an `accum` buffer with `atomicAdd` (RGB + alpha).
- Normalize color by density during composition.
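The accumulate-then-normalize idea is shown below as a single-threaded host-side sketch. In the CUDA kernel each `+=` would be an `atomicAdd` on the corresponding float component, because many threads can hit the same pixel; the names `Accum`, `splat`, and `resolve` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Accum { float r, g, b, a; };  // alpha-weighted RGB sum + alpha sum

// Add one splat sample to pixel (x, y). On the GPU, each of these four
// additions would be an atomicAdd.
inline void splat(std::vector<Accum>& buf, int width, int x, int y,
                  float r, float g, float b, float alpha) {
    Accum& px = buf[static_cast<std::size_t>(y) * width + x];
    px.r += r * alpha;
    px.g += g * alpha;
    px.b += b * alpha;
    px.a += alpha;
}

// Density-normalized color: divide the weighted sum by the total weight.
inline void resolve(const Accum& px, float out[3]) {
    if (px.a > 0.0f) {
        out[0] = px.r / px.a;
        out[1] = px.g / px.a;
        out[2] = px.b / px.a;
    } else {
        out[0] = out[1] = out[2] = 0.0f;
    }
}
```

Dividing by the accumulated alpha keeps dense regions from blowing out to white, while the alpha sum itself remains available as a density measure for blending against the background.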
8. Composition and Stylization
Implemented in `compose_frame`:
- Deep-to-light purple gradient background
- Dual-layer halo (central + offset halo)
- Foreground-over-background blending (density-based)
- Light bloom enhancement
- Vignette for subject focus
- Gamma correction (approximate sRGB)
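Three of the steps listed above (foreground-over-background blending, vignette, gamma) can be sketched per channel as below. The function names, the quadratic vignette shape, and the exponent 1/2.2 as the sRGB approximation are assumptions; the project's exact curves are not reproduced here.

```cpp
#include <algorithm>
#include <cmath>

// Blend foreground over background with coverage alpha in [0, 1].
inline float blend_over(float fg, float bg, float alpha) {
    return fg * alpha + bg * (1.0f - alpha);
}

// Vignette factor for normalized screen coordinates u, v in [0, 1]:
// 1 at the image center, darker toward the corners.
inline float vignette(float u, float v, float strength) {
    const float dx = u - 0.5f, dy = v - 0.5f;
    const float r2 = dx * dx + dy * dy;  // 0 at center, 0.5 at corners
    return std::max(0.0f, 1.0f - strength * r2);
}

// Approximate sRGB encoding with a plain 1/2.2 power curve, then quantize.
inline unsigned char to_display_byte(float linear) {
    const float c = std::min(1.0f, std::max(0.0f, linear));
    const float encoded = std::pow(c, 1.0f / 2.2f);
    return static_cast<unsigned char>(encoded * 255.0f + 0.5f);
}
```

Applying gamma last, after all blending and bloom are done in linear space, avoids the darkened edges that appear when colors are mixed in an already encoded space.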
Overall color strategy:
- Static pink-white `c_static`
- Mid-stage magenta `c_mid`
- High-energy deep purple `c_deep`
- Trail accent color `c_tail`
9. CUDA Parallelization and Memory Layout
9.1 Kernel Breakdown
- `init_particles`
- `step_particles`
- `clear_accum`
- `raster_particles`
- `compose_frame`
9.2 Data Layout
- Particle state: `float4` arrays (initial, position, velocity)
- Accumulation buffer: `float4` (RGB sum + alpha sum)
- Output frame: `uchar3` (stored as linear byte array)
9.3 Complexity
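- Simulation: $O(N)$ for $N$ particles
- Rasterization: approximately $O(N \cdot K)$, where $K$ is the average number of pixels written per particle, affected by particle radius and trail segments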
- Main bottleneck: `atomicAdd` contention in high-density regions