Lucky Iyinbor @Luckyballa profile

Lucky Iyinbor

@Luckyballa

Followers

3K

Following

2K

Statuses

511

Physics Simulation | Geometry Processing | Computer Graphics | AR/VR

Joined June 2012

Lucky Iyinbor

@Luckyballa

7 months

Found some time to add scene collisions to my Vision Pro playground, and I did not expect this level of accuracy!

203

1K

13K

Lucky Iyinbor

@Luckyballa

3 days

A few weeks ago, I implemented a paper where an algorithm was described from a CPU standpoint. By rethinking it for GPU architecture, I was able to get almost a 100x speedup compared to the metrics from that paper. Here's how I did it:

The Problem: We have vertices randomly assigned to clusters (spheres). We need to compute per-cluster statistics where each vertex contributes to its cluster's metrics. With significantly more vertices than clusters, the main challenge was figuring out how to do parallel reduction efficiently while supporting scaling to millions of vertices and centroids.

I developed 3 methods:

Method 1: Direct Atomic
- Each thread handles one vertex with a cluster assignment
- Directly updates global cluster memory with atomics
It's nice and simple, but has high memory contention, random access, and in general, atomics to device memory are quite slow.

Method 2: Threadgroup Memory
- Each thread handles one vertex with a cluster assignment
- Each thread group allocates memory for all clusters
- First accumulates in thread group memory for all clusters, then to global memory
Here we have less atomic contention and good thread workload, but we're limited to ~512 clusters (32KB threadgroup memory limit), which is fine for most cases but not scalable.

Method 3: Range-Based
- Sorts vertices by cluster ID
- Finds start and end indices of sorted vertices for each cluster
- Computes cluster distribution given a desired thread group size
- First accumulates in thread group memory for a single cluster, then updates global memory
This one is my favorite - it scales to any cluster count, has perfect memory coalescing, and we have one cluster per group, so no limits on number of clusters! The downside is that it requires sorting and some threadgroups aren't fully utilized.

In practice: Method 2 and 3 outperform the first one significantly. Method 2 is fastest but limited, Method 3 is slightly slower but scales to any size. Both methods allow you to go from 2-3 seconds stated in the paper to 30-40ms on M1 Ultra Mac.

That's it, have a good day :)
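To make the Method 3 steps concrete, here is a minimal CPU-side sketch in Python. It is not the author's shader: it only mimics the data flow (sort by cluster ID, find per-cluster ranges, split each range into fixed-size groups, reduce each group locally, then do one global update per group). The per-cluster statistic (a sum and a count of one scalar per vertex), the function names, and the `group_size` parameter are all assumptions made for illustration. `direct_accumulate` plays the role of Method 1 as a reference result.

```python
def direct_accumulate(values, cluster_ids, num_clusters):
    """Method 1 analogue: every vertex adds straight into its cluster's slot."""
    sums = [0.0] * num_clusters
    counts = [0] * num_clusters
    for v, c in zip(values, cluster_ids):
        sums[c] += v      # on a GPU this would be an atomic add to device memory
        counts[c] += 1
    return sums, counts


def cluster_ranges(sorted_ids, num_clusters):
    """Start and exclusive end index of each cluster in the sorted ID list."""
    starts = [0] * num_clusters
    ends = [0] * num_clusters   # empty clusters keep the empty range [0, 0)
    for i, c in enumerate(sorted_ids):
        if i == 0 or sorted_ids[i - 1] != c:
            starts[c] = i
        ends[c] = i + 1
    return starts, ends


def plan_groups(starts, ends, group_size):
    """One (cluster, begin, end) work item per simulated threadgroup."""
    groups = []
    for c, (s, e) in enumerate(zip(starts, ends)):
        for begin in range(s, e, group_size):
            groups.append((c, begin, min(begin + group_size, e)))
    return groups


def range_based_accumulate(values, cluster_ids, num_clusters, group_size=4):
    """Method 3 analogue: sort, find ranges, reduce per group, update globally."""
    order = sorted(range(len(values)), key=lambda i: cluster_ids[i])
    sorted_ids = [cluster_ids[i] for i in order]
    sorted_vals = [values[i] for i in order]
    starts, ends = cluster_ranges(sorted_ids, num_clusters)

    sums = [0.0] * num_clusters
    counts = [0] * num_clusters
    for c, begin, end in plan_groups(starts, ends, group_size):
        local_sum = sum(sorted_vals[begin:end])  # stand-in for a threadgroup reduction
        sums[c] += local_sum                     # a single global update per group
        counts[c] += end - begin
    return sums, counts
```

Each work item touches a contiguous slice of the sorted data and exactly one cluster, which is the property the post credits for coalesced access and the absence of a cluster-count limit. The last group of a cluster can be shorter than `group_size`, which corresponds to the under-utilized threadgroups mentioned above.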

0

2

19

Lucky Iyinbor

@Luckyballa

5 days

RT @ssh4net: A Radiance Field Loss for Fast and Simple Emissive Surface Reconstruction Ziyi Zhang, Nicolas Roussel, Thomas Müller, Tizian Z…

0

3

0

Lucky Iyinbor

@Luckyballa

5 days

@Spiritandsoul23 No framework. Using this method in my compute shader playground

Lucky Iyinbor

@Luckyballa

9 days

I want to work more on high-dimensional optimizations.
I'm bad at math, so deriving all gradients in a chain by hand is painful for me.
I'm focused entirely on GPU development, so CPU auto-grad isn't an option.
I hate using frameworks - what are my options?
The answer is this -

0

Lucky Iyinbor

@Luckyballa

6 days

Voronoi for the win 🏎️

Andrea Tagliasacchi 🇨🇦

@taiyasaki

6 days

📢📢📢 "𝐑𝐚𝐝𝐢𝐚𝐧𝐭 𝐅𝐨𝐚𝐦: Real-Time Differentiable Ray Tracing", a mesh-based 3D representation. Co-led by my PhD students Shrisudhan Govindarajan and Daniel Rebain, and w/ @kwangmoo_yi

0

13

Lucky Iyinbor

@Luckyballa

9 days

@jmeseguerdepaz Another thing I am bad at is Python, so probably this is not the best option for me ahah

0

2

Lucky Iyinbor

@Luckyballa

9 days

RT @zianwang97: 🚀 Introducing DiffusionRenderer, a neural rendering engine powered by video diffusion models. 🎥 Estimates high-quality geo…

0

130

0

Lucky Iyinbor

@Luckyballa

11 days

Another way to approach MAT is to describe it as a field.

The medial field M(x) is the radius of the medial sphere centered at projM(x), where projM(x) is the intersection of a ray from a surface point to the medial axis in the normal direction.

It can be defined as a function that satisfies these constraints:
M*(x) ≥ |Φ(x)|
M*(x) = |Φ(projM*(x))|
∇M*(x) · ∇Φ(x) = 0

This representation allows finding a projection point on the medial axis in O(1):
projM(x) = x + ∇|Φ(x)| · (M(x) - |Φ(x)|)

Key applications include faster ray marching, collision proxy building, and ambient occlusion computation.
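The projection formula can be checked by hand on the simplest shape available. The sketch below assumes a 2D disk of radius R centered at the origin, where Φ(x) = |x| - R, the medial axis is the single center point, and therefore M(x) = R everywhere inside. The function names are hypothetical, and the code handles interior points other than the center only (where the gradient of |Φ| is well defined).

```python
import math


def disk_sdf(p, radius):
    """Signed distance to a disk centered at the origin (negative inside)."""
    return math.hypot(p[0], p[1]) - radius


def grad_abs_sdf_inside(p):
    """Gradient of |Phi| for an interior point: unit vector pointing at the center."""
    r = math.hypot(p[0], p[1])
    return (-p[0] / r, -p[1] / r)


def medial_field(p, radius):
    """M(x) for the disk: every medial sphere is the disk itself, so M = R."""
    return radius


def project_to_medial_axis(p, radius):
    """projM(x) = x + grad|Phi(x)| * (M(x) - |Phi(x)|), evaluated in O(1)."""
    dist = abs(disk_sdf(p, radius))
    g = grad_abs_sdf_inside(p)
    step = medial_field(p, radius) - dist
    return (p[0] + g[0] * step, p[1] + g[1] * step)
```

For p = (0.3, 0.4) and R = 2: |Φ| = 1.5, the step is 2 - 1.5 = 0.5 along (-0.6, -0.8), which lands on the center (0, 0). The three constraints also hold here: R ≥ R - |x|, |Φ| at the projected point equals R, and M is constant so its gradient is zero.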

0

3

18

Lucky Iyinbor

@Luckyballa

12 days

@miketuritzin Isn’t it a bit specialized? Like decomposition for arbitrary 3D shapes is probably not trivial

1

0

Lucky Iyinbor

@Luckyballa

12 days

RT @miketuritzin: Ran across this great article on sampled SDFs that has *great* interactive WebGL illustrations that work really well for…

0

40

0

Lucky Iyinbor

@Luckyballa

12 days

@miketuritzin Awesome stuff

0

1

Lucky Iyinbor

@Luckyballa

13 days

RT @QianqianWang5: Introducing CUT3R! An online 3D reasoning framework for many 3D tasks directly from just RGB. For static or dynamic sce…

0

98

0

Lucky Iyinbor

@Luckyballa

13 days

I was a bit boggled by the fact that VMAS uses a shrinking ball to keep spheres inside instead of modeling it as an energy.
LSMAT shows a better way: use RBF to blend point/plane distances into a local SDF approximation, making inscription part of the optimization itself
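A rough sketch of the idea in Python, not LSMAT's actual formulation: the Gaussian kernel, the `sigma` width, the outward-normal sign convention, and the quadratic penalty are all assumptions chosen for illustration. `local_sdf` blends point-to-plane distances from oriented samples with RBF weights, and `inscription_penalty` turns "the sphere must stay inside" into a smooth energy term instead of a separate shrinking step.

```python
import math


def local_sdf(x, points, normals, sigma):
    """Gaussian-weighted blend of point-to-plane distances at query x.

    Assumes outward unit normals, so the result is negative inside the shape,
    and assumes x is close enough to the samples that the weights do not all
    underflow to zero.
    """
    num = 0.0
    den = 0.0
    for p, n in zip(points, normals):
        diff = [xi - pi for xi, pi in zip(x, p)]
        plane_dist = sum(d * ni for d, ni in zip(diff, n))
        weight = math.exp(-sum(d * d for d in diff) / (sigma * sigma))
        num += weight * plane_dist
        den += weight
    return num / den


def inscription_penalty(center, radius, points, normals, sigma):
    """Hypothetical energy: zero while the sphere stays inside the local SDF."""
    violation = radius + local_sdf(center, points, normals, sigma)
    return max(0.0, violation) ** 2
```

With samples on the plane y = 0 and normals (0, 1), a sphere centered at (0, -1) with radius 0.5 has zero penalty, while the same center with radius 1.5 pokes 0.5 through the surface and is penalized by 0.25.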

0

3

11

Lucky Iyinbor

@Luckyballa

18 days

@turtlespook Here I use the Gauss–Newton method. It has only 4 parameters per sphere, so the system matrix (approximate Hessian) is very small (4x4 per sphere). It fits nicely in threadgroup memory, which is 32 KB on Apple Silicon
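For readers who want to see why the system stays at 4x4, here is a small Python sketch of Gauss–Newton on one sphere with parameters (cx, cy, cz, r). The residual used here, distance from each sample point to the sphere surface, is an assumption for illustration; the reply does not say which energy is being minimized. The helper names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def gauss_newton_step(params, points):
    """One step: accumulate J^T J (4x4) and J^T r (4), then solve for the update."""
    cx, cy, cz, r = params
    jtj = [[0.0] * 4 for _ in range(4)]
    jtr = [0.0] * 4
    for px, py, pz in points:
        dx, dy, dz = px - cx, py - cy, pz - cz
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        res = dist - r                                  # assumed residual
        row = [-dx / dist, -dy / dist, -dz / dist, -1.0]  # d(res)/d(params)
        for i in range(4):
            jtr[i] += row[i] * res
            for j in range(4):
                jtj[i][j] += row[i] * row[j]
    delta = solve_linear(jtj, [-v for v in jtr])
    return [p + d for p, d in zip(params, delta)]


def fit_sphere(points, init, iterations=10):
    """Run a fixed number of Gauss-Newton steps from an initial guess."""
    params = list(init)
    for _ in range(iterations):
        params = gauss_newton_step(params, points)
    return params
```

The only per-sphere state that has to be accumulated is the 4x4 matrix and the 4-vector, i.e. 20 floats no matter how many sample points contribute, which is what makes keeping it in fast on-chip memory practical.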

0

2

Lucky Iyinbor

@Luckyballa

18 days

Video Depth Anything is seriously cool

0

2

20

Lucky Iyinbor

@Luckyballa

18 days

@daveseidman Agree! You can use meshes or oriented point clouds, no additional input is needed

1

0

5

Lucky Iyinbor

@Luckyballa

18 days

@Mark_Tension Thanks! You can check out the paper results here

0

7