Lucky Iyinbor Profile
Lucky Iyinbor

@Luckyballa

Followers
3K
Following
2K
Statuses
511

Physics Simulation | Geometry Processing | Computer Graphics | AR/VR

Joined June 2012
@Luckyballa
Lucky Iyinbor
7 months
Found some time to add scene collisions to my Vision Pro playground, and I did not expect this level of accuracy!
203
1K
13K
@Luckyballa
Lucky Iyinbor
3 days
A few weeks ago, I implemented a paper where an algorithm was described from a CPU standpoint. By rethinking it for GPU architecture, I was able to get almost a 100x speedup compared to the metrics from that paper. Here's how I did it:

The Problem: We have vertices randomly assigned to clusters (spheres). We need to compute per-cluster statistics where each vertex contributes to its cluster's metrics. With significantly more vertices than clusters, the main challenge was figuring out how to do parallel reduction efficiently while supporting scaling to millions of vertices and centroids.

I developed 3 methods:

Method 1: Direct Atomic
- Each thread handles one vertex with a cluster assignment
- Directly updates global cluster memory with atomics
It's nice and simple, but has high memory contention and random access, and in general, atomics to device memory are quite slow.

Method 2: Threadgroup Memory
- Each thread handles one vertex with a cluster assignment
- Each thread group allocates memory for all clusters
- First accumulates in thread group memory for all clusters, then to global memory
Here we have less atomic contention and good thread workload, but we're limited to ~512 clusters (32KB threadgroup memory limit), which is fine for most cases but not scalable.

Method 3: Range-Based
- Sorts vertices by cluster ID
- Finds start and end indices of sorted vertices for each cluster
- Computes cluster distribution given a desired thread group size
- First accumulates in thread group memory for a single cluster, then updates global memory
This one is my favorite: it scales to any cluster count, has perfect memory coalescing, and we have one cluster per group, so no limits on the number of clusters! The downside is that it requires sorting and some threadgroups aren't fully utilized.

In practice: Methods 2 and 3 outperform the first one significantly. Method 2 is fastest but limited, Method 3 is slightly slower but scales to any size.
Both methods allow you to go from the 2-3 seconds stated in the paper to 30-40 ms on an M1 Ultra Mac.

That's it, have a good day :)
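To make the range-based idea (Method 3) concrete, here is a minimal CPU-side sketch in plain Python. It is not the author's Metal code: the function names, the `group_size` parameter, and the choice of a simple per-cluster sum as the "statistic" are all assumptions made for illustration. Each chunk of a cluster's sorted range stands in for one thread group, and the local accumulator stands in for threadgroup memory.

```python
from itertools import accumulate

def cluster_ranges(cluster_ids, num_clusters):
    """Sort vertex indices by cluster ID and return (order, starts, ends).

    starts[c]:ends[c] is the slice of `order` that belongs to cluster c.
    """
    order = sorted(range(len(cluster_ids)), key=lambda i: cluster_ids[i])
    counts = [0] * num_clusters
    for c in cluster_ids:
        counts[c] += 1
    ends = list(accumulate(counts))
    starts = [e - n for e, n in zip(ends, counts)]
    return order, starts, ends

def range_based_sums(values, cluster_ids, num_clusters, group_size=4):
    """Per-cluster sums where every (hypothetical) thread group sees one cluster only."""
    order, starts, ends = cluster_ranges(cluster_ids, num_clusters)
    sums = [0.0] * num_clusters
    for c in range(num_clusters):
        # Split the cluster's range into chunks of at most group_size vertices.
        for chunk_start in range(starts[c], ends[c], group_size):
            chunk_end = min(chunk_start + group_size, ends[c])
            local = 0.0  # stands in for a threadgroup-memory accumulator
            for k in range(chunk_start, chunk_end):
                local += values[order[k]]
            sums[c] += local  # stands in for one global update per group
    return sums

def direct_sums(values, cluster_ids, num_clusters):
    """Reference result: every vertex updates its cluster directly (Method 1 style)."""
    sums = [0.0] * num_clusters
    for v, c in zip(values, cluster_ids):
        sums[c] += v
    return sums

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cluster_ids = [2, 0, 2, 1, 0, 2]
result = range_based_sums(values, cluster_ids, num_clusters=3, group_size=2)
```

On a GPU the inner loop would be a parallel reduction inside the group, and the last chunk of each cluster is usually only partly filled, which is the under-utilization mentioned above.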
Tweet media one
0
2
19
@Luckyballa
Lucky Iyinbor
5 days
RT @ssh4net: A Radiance Field Loss for Fast and Simple Emissive Surface Reconstruction Ziyi Zhang, Nicolas Roussel, Thomas Müller, Tizian Z…
0
3
0
@Luckyballa
Lucky Iyinbor
5 days
@Spiritandsoul23 No framework. Using this method in my compute shader playground
@Luckyballa
Lucky Iyinbor
9 days
I want to work more on high-dimensional optimizations. I'm bad at math, so deriving all gradients in a chain by hand is painful for me. I'm focused entirely on GPU development, so CPU auto-grad isn't an option. I hate using frameworks, so what are my options? The answer is this -
Tweet media one
0
0
0
@Luckyballa
Lucky Iyinbor
6 days
Voronoi for the win 🏎️
@taiyasaki
Andrea Tagliasacchi 🇨🇦
6 days
📢📢📢 "𝐑𝐚𝐝𝐢𝐚𝐧𝐭 𝐅𝐨𝐚𝐦: Real-Time Differentiable Ray Tracing", a mesh-based 3D representation. Co-led by my PhD students Shrisudhan Govindarajan and Daniel Rebain, and w/ @kwangmoo_yi
0
0
13
@Luckyballa
Lucky Iyinbor
9 days
@jmeseguerdepaz Another thing I am bad at is Python, so this is probably not the best option for me ahah
0
0
2
@Luckyballa
Lucky Iyinbor
9 days
RT @zianwang97: 🚀 Introducing DiffusionRenderer, a neural rendering engine powered by video diffusion models. 🎥 Estimates high-quality geo…
0
130
0
@Luckyballa
Lucky Iyinbor
11 days
Another way to approach MAT is to describe it as a field.

The medial field M(x) is the radius of the medial sphere centered at projM(x), where projM(x) is the intersection of a ray from a surface point to the medial axis in the normal direction.

It can be defined as a function that satisfies these constraints:
M*(x) ≥ |Φ(x)|
M*(x) = |Φ(projM*(x))|
∇M*(x) · ∇Φ(x) = 0

This representation allows finding a projection point on the medial axis in O(1):
projM(x) = x + ∇|Φ(x)| · (M(x) - |Φ(x)|)

Key applications include faster ray marching, collision proxy building, and ambient occlusion computation.
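As a quick sanity check of the projection formula, here is a small Python sketch on the simplest shape I can think of: a solid sphere of radius R centered at the origin. For that shape the medial axis is just the center and the medial field is the constant R inside. The shape, the constant `R`, the finite-difference gradient, and all function names are assumptions chosen for illustration, not anything from the original post.

```python
import math

R = 2.0  # radius of the example sphere (assumed test shape)

def abs_sdf(p):
    """|Phi(p)| for a sphere of radius R centered at the origin."""
    return abs(math.sqrt(sum(c * c for c in p)) - R)

def grad_abs_sdf(p, h=1e-5):
    """Central-difference gradient of |Phi| at p."""
    grad = []
    for axis in range(3):
        hi = list(p)
        lo = list(p)
        hi[axis] += h
        lo[axis] -= h
        grad.append((abs_sdf(hi) - abs_sdf(lo)) / (2.0 * h))
    return grad

def medial_field(p):
    """M(p): for a solid sphere every interior point maps to the same medial sphere."""
    return R

def project_to_medial_axis(p):
    """projM(x) = x + grad|Phi(x)| * (M(x) - |Phi(x)|), evaluated in constant time."""
    step = medial_field(p) - abs_sdf(p)
    g = grad_abs_sdf(p)
    return tuple(pc + gc * step for pc, gc in zip(p, g))

projected = project_to_medial_axis((0.5, 1.0, -0.3))
```

Any interior point (away from the center itself, where the gradient is undefined) should land on the origin up to finite-difference error, which matches the constraints listed above: M is constant, so its gradient is orthogonal to ∇Φ, and M equals |Φ| at the projected point.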
Tweet media one
0
3
18
@Luckyballa
Lucky Iyinbor
12 days
@miketuritzin Isn’t it a bit specialized? Like decomposition for arbitrary 3D shapes is probably not trivial
1
0
0
@Luckyballa
Lucky Iyinbor
12 days
RT @miketuritzin: Ran across this great article on sampled SDFs that has *great* interactive WebGL illustrations that work really well for…
0
40
0
@Luckyballa
Lucky Iyinbor
12 days
@miketuritzin Awesome stuff
0
0
1
@Luckyballa
Lucky Iyinbor
13 days
RT @QianqianWang5: Introducing CUT3R! An online 3D reasoning framework for many 3D tasks directly from just RGB. For static or dynamic sce…
0
98
0
@Luckyballa
Lucky Iyinbor
13 days
I was a bit boggled by the fact that VMAS uses shrinking ball to keep spheres inside instead of modeling it as an energy. LSMAT shows a better way: use RBF to blend point/plane distances into a local SDF approximation, making inscription part of the optimization itself
Tweet media one
0
3
11
@Luckyballa
Lucky Iyinbor
18 days
@turtlespook Here I use the Gauss–Newton method. It has only 4 parameters per sphere, so the system matrix (approximate Hessian) is very small (4x4 per sphere). Fits nicely in threadgroup memory, which is 32 KB on Apple Silicon
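To show why a 4-parameter sphere leads to a 4x4 system, here is a minimal Gauss–Newton sketch in plain Python. The residual used here (distance from each sample point to the sphere surface) is an assumption for illustration and is not the energy from the original project; `solve_linear` and `fit_sphere` are hypothetical helper names.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_sphere(points, params, iterations=20):
    """Gauss-Newton on residuals r_i = |p_i - c| - radius, parameters (cx, cy, cz, radius)."""
    cx, cy, cz, radius = params
    for _ in range(iterations):
        jtj = [[0.0] * 4 for _ in range(4)]  # approximate Hessian J^T J (4x4)
        jtr = [0.0] * 4                      # gradient term J^T r
        for px, py, pz in points:
            dx, dy, dz = px - cx, py - cy, pz - cz
            dist = math.sqrt(dx * dx + dy * dy + dz * dz)
            res = dist - radius
            row = (-dx / dist, -dy / dist, -dz / dist, -1.0)  # one Jacobian row
            for i in range(4):
                jtr[i] += row[i] * res
                for j in range(4):
                    jtj[i][j] += row[i] * row[j]
        delta = solve_linear(jtj, [-v for v in jtr])
        cx += delta[0]
        cy += delta[1]
        cz += delta[2]
        radius += delta[3]
    return cx, cy, cz, radius

# Six points on a sphere with center (1, 2, 3) and radius 2.
points = [(3.0, 2.0, 3.0), (-1.0, 2.0, 3.0), (1.0, 4.0, 3.0),
          (1.0, 0.0, 3.0), (1.0, 2.0, 5.0), (1.0, 2.0, 1.0)]
fitted = fit_sphere(points, (0.5, 1.5, 2.5, 1.0))
```

The whole per-sphere state is 16 matrix entries plus 4 right-hand-side values, which is the kind of footprint that fits comfortably in on-chip memory when one group works on one sphere.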
0
0
2
@Luckyballa
Lucky Iyinbor
18 days
Video Depth Anything is seriously cool
0
2
20
@Luckyballa
Lucky Iyinbor
18 days
@daveseidman Agree! You can use meshes or oriented point clouds, no additional input is needed
1
0
5
@Luckyballa
Lucky Iyinbor
18 days
@Mark_Tension Thanks! You can check out the paper results here
0
0
7