This is the construct. We can load anything from Chesterfields to vintage TVs. Anything we need.
Made with three.js, SD 1.5, ControlNet, and TripoSR, running on an M2 Max (generation sequences sped up for brevity).
Made a little scene editor/interior design tool with Stable Diffusion inpainting - running locally on M2 Max, using the Dreamshaper finetune by @Lykon4072
Added brushes to my Stable Diffusion isometric editor: texture, object, and wall. It can now build a room from scratch. All running locally on an M2 Max (video at 2x for brevity).
Texturing 3D Meshes with Stable Diffusion + ControlNet
I love the meshes coming out of TripoSR by @StabilityAI @tripoai but sometimes they lose some of the detail from the original image. I wanted to (re)apply a Stable Diffusion image as a texture to regain detail. Full
Updated my ControlNet-based texturing script for TripoSR meshes to go all the way around the model. Still a single generation, 12 steps in this case, at ~1.5s/step on M2 Max. Link to code below.
Got @StabilityAI Stable Zero123 running on a Mac with the original zero123 code (which I've modified to be a package). The trick to getting proper outputs turned out to be setting the z distance to a constant pi / 2.
I've just open-sourced my experimental adventure game running on llama.cpp in the terminal. It's rough and unfinished ... but functional!
Read on for some of the wild adventures you could take:
Experimenting with open-vocabulary object detection in generated scenes. Idea being to generate a scene with an image model, decompose it into semantic segments, and build the game world from that. This one is a super quick mockup of ChatGPT -> DALL-E -> YOLO World.
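Roughly what the detection end looks like - a minimal sketch assuming the ultralytics YOLO-World wrapper; the class list and file names are placeholders:

```python
# Sketch: open-vocabulary detection on a generated scene image.
# Assumes the ultralytics YOLO-World wrapper; classes/paths are placeholders.
from ultralytics import YOLOWorld

model = YOLOWorld("yolov8s-world.pt")
# Open vocabulary: prompt with whatever the image model might have drawn.
model.set_classes(["couch", "lamp", "bookshelf", "television", "rug"])

results = model.predict("generated_scene.png")
for box in results[0].boxes:
    label = results[0].names[int(box.cls)]
    print(label, box.xyxy.tolist())  # label + bbox -> a game world object
```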
Basically
@JungleSilicon
Yes, it's inpainting. Currently the objects are first painted onto the base scene (e.g., an empty room) and then composited. Working on further extracting them to allow for manipulation, etc.
LLM-driven games will shine where language is intrinsic to the gameplay. Like taking the captain's chair in a starship. Or governing by written edicts and policies. Or having to rule through your ministers.
Working on an approach for text -> multiple views on an object with just a general purpose Stable Diffusion model. Interested to see how far I can take basic inpainting, and the initial results are looking promising!
Technique in thread...
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them, and the heart that fed;
And on the pedestal these words appear:
"My name is Ozymandias, King of Kings:
Look on my works, ye Mighty, and despair!"
Sculpted by an M2 Max running SD 1.5, ControlNet, and TripoSR
Part of maturing as an engineer has been coming to terms with technical debt. You should build smart but do what it takes to ship now. Take on whatever debt is necessary. I've seen so many finely crafted codebases scuppered by changing business factors and product decisions.
Here's Llama generating a tomb adventure map on the fly when prompted with a game name. The engine delegates scene layouts to the model, so it generalizes to arbitrary settings.
First, the mesh is simplified and raycast to a depth map using Open3D. Then, ControlNet for depth is used to paint the depth map. Finally, the image result is traced back to the mesh's UV map via the raycast results to generate & apply the texture.
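The first step looks roughly like this - a sketch assuming Open3D's RaycastingScene; paths and camera parameters are placeholders:

```python
# Sketch: simplify the mesh and raycast it to a depth map with Open3D.
import numpy as np
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("triposr_mesh.obj")
mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=20_000)

scene = o3d.t.geometry.RaycastingScene()
scene.add_triangles(o3d.t.geometry.TriangleMesh.from_legacy(mesh))

rays = o3d.t.geometry.RaycastingScene.create_rays_pinhole(
    fov_deg=60, center=[0, 0, 0], eye=[0, 0, 2], up=[0, 1, 0],
    width_px=512, height_px=512)
hits = scene.cast_rays(rays)

depth = hits["t_hit"].numpy()            # hit distance per pixel, inf on miss
tri_ids = hits["primitive_ids"].numpy()  # which triangle each pixel hit
depth[~np.isfinite(depth)] = depth[np.isfinite(depth)].max()
norm = (depth - depth.min()) / max(np.ptp(depth), 1e-6)
depth_img = (255 * (1 - norm)).astype(np.uint8)  # near = bright for ControlNet
# depth_img feeds a ControlNet depth pipeline; tri_ids plus the barycentric
# coords in hits["primitive_uvs"] trace each painted pixel back to the UV map.
```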
I built something like this internally at Twitter but for video captures. You passed it a Python script where each function navigated to a page and manipulated a virtual mouse, and it output a gif for each of these little scenarios.
New release of shot-scraper, my CLI tool for taking screenshots and running JavaScript scrapers from the terminal
1.4 adds support for HTTP Basic auth, custom --scale-factor shots, additional --browser-arg arguments and a fix for --interactive mode
- Drew a tilted cube at a few rotations using the awesome 3D modeling technology of ... CSS
- Added an arrow to help guide the model
- Painted the cubes white for the mask
- Inpainted with @Lykon4072's Dreamshaper finetune with "3d model of X at various rotations" (rough sketch of that call below)
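That last step is essentially a stock diffusers inpainting call - a sketch, with the Dreamshaper inpainting model id and file names as assumptions:

```python
# Sketch: inpaint the white-masked cubes with a rotation-hinting prompt.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Model id is an assumption; any SD 1.5 inpainting checkpoint should work.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "Lykon/dreamshaper-8-inpainting", torch_dtype=torch.float16
).to("mps")  # Apple Silicon

init = Image.open("cubes_with_arrow.png").convert("RGB")  # the CSS render
mask = Image.open("cubes_mask.png").convert("RGB")        # white = repaint

result = pipe(
    prompt="3d model of a wooden chair at various rotations",
    image=init, mask_image=mask, num_inference_steps=30,
).images[0]
result.save("multiview.png")
```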
Experimenting with laying out and tiling SDXL-generated game assets. Interestingly, isometric turned out to be easier than top-down - the model seems to have difficulty with overhead views.
My kids were fighting over coloring pages so I decided to inpaint a new one with Stable Diffusion. Sharing my very sophisticated Preview dot app workflow here. The python script towards the end is basically a wrapper around diffusers that dumps iterations to the filesystem.
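The wrapper is basically this shape - a sketch using the older diffusers callback/callback_steps API (newer releases moved to callback_on_step_end); model id, prompt, and paths are placeholders:

```python
# Sketch: dump every denoising iteration to disk to cherry-pick a frame.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("mps")

def dump_step(step, timestep, latents):
    # Decode the intermediate latents and save step_000.png, step_001.png, ...
    with torch.no_grad():
        img = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
    pipe.image_processor.postprocess(img)[0].save(f"steps/step_{step:03d}.png")

page = Image.open("coloring_page.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")
out = pipe(prompt="coloring page, clean line art, friendly dragon",
           image=page, mask_image=mask,
           callback=dump_step, callback_steps=1).images[0]
```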
Ironically this plays way better in a text adventure right now, where the latency on the action is just straight tok/s and the pauses for user input fit with the medium
I'm interested in AI-driven improvised action sequences in games. Really capture the novelty of a Bourne fight or a Bond chase. Like a goon pops out of the fridge and you're fending them off with a toaster.
@ThinkWiselyMatt
@Lykon4072
Here's what I ended up with (at 2x for brevity), probably needs some tweaks, hah. It really wanted to turn it into furniture. And yeah, want to get to characters as well. The challenge there being the animation - haven't wrapped my head around that yet.
@raw_works
That's the plan, but here it's just the colored meshes from TripoSR. The textures will be generated in the background after the initial mesh loads in
Trying out the new TripoSR single image-to-3D model from @StabilityAI and @tripoai on a Mac. As promised, it is quite good, esp. for the speed. Runs in 6s on the M2 Max, could possibly be improved further. Some initial observations in the thread:
Working on a new floor generation approach, and the model desperately wants to create these intricate, symmetrical floor patterns. Kind of a cool finding, could be selectively applied for effect.
Also, every time I go back to floor gen I feel like I should do a 2D adventure.
Here's where I'm at with automating multi-view sprites from SD inpainting (at 2x). It's promising, but the quality and consistency aren't quite there, and hitting all the angles is a challenge.
Grammar sampling is now available in whisper.cpp. We ported it over from llama.cpp and it uses the same GBNF syntax. Based on early testing, it's definitely a different beast than with a generative model - some experimentation is needed to figure out how best to apply it.
@yuiseki_
1) my philosophy is local-first, so folks should be able to run privately on their own hardware but have the option to swap in hosted models as well
2) basically convenience? Also, browsers are universal. My experience is more with web apps and I'm figuring out 3D as I go
The biggest challenges I'm facing now are:
* aligning generative models to intended goals and ethical principles
* aligning my 5 and 6 year old sons to intended goals and ethical principles
I implemented JSON schema for llama.cpp grammars but I'm more interested atm in lightly annotated natural language. The thought is that it might be a better fit to models' training data and more token efficient than JSON.
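To make that concrete - a sketch with llama-cpp-python's grammar support; the grammar and model path are illustrative only:

```python
# Sketch: constrain output to annotated natural language instead of JSON.
# Assumes llama-cpp-python; the grammar and model path are illustrative.
from llama_cpp import Llama, LlamaGrammar

# Labeled lines carry the same structure as JSON with far fewer syntax tokens.
grammar = LlamaGrammar.from_string(r'''
root ::= "Name: " text "\nLook: " text "\nExits: " text "\n"
text ::= [a-zA-Z0-9 ,.'-]+
''')

llm = Llama(model_path="llama-2-13b.Q4_K_M.gguf")
out = llm("Describe one room of a tomb adventure.\n",
          grammar=grammar, max_tokens=128)
print(out["choices"][0]["text"])
```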
@ggerganov
Eventually it will. It generates code behind the scenes to lay out the rooms (and other game logic) so I'll constrain it to that eventually. For now I'm relying on the model following the pattern from the prompt, which it does more often than not actually.
Got zero123 (model for changing camera viewpoint on an object) running on a Mac M2. Required surprisingly few changes to the original CUDA-based code - it's impressive how well torch's MPS backend works. The patch is on my fork which I'll link.
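The changes were mostly of this flavor - not the actual patch, just the typical CUDA-to-MPS substitution:

```python
# Sketch: the usual device shim when porting CUDA-only research code to MPS.
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# Then replace the model.cuda() / x.cuda() calls sprinkled through the code:
# model = model.to(device); x = x.to(device)
# MPS has no float64, so downcast anywhere the original used doubles:
# x = x.to(device, dtype=torch.float32)
```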
There's a beauty and a mystery to building on base models like Llama - rather than following instructions, you're setting it up to simulate the output of some imagined process (human or machine) and working backwards from that.
With constraints on sampling and a quite minimal prompt, llama-2-13b has proven more than capable of generating a plausible game world. It only goes off the rails like 40% of the time now.
@ReedSealFoss
I plan to put it up soon, I think it's in a good spot now for an initial release. The models are all local, but eventually I envision having the option to run remote as well. The models as well as most of the world state are coordinated by a little Python server.
@pa_schembri
@Teknium1
Yeah! This approach is model-agnostic so I'm interested in experimenting with different finetunes when the engine is more mature.
I think I figured out the math to do reverse projection and go from screen coordinates to coordinates on a 3D surface. The projection matrix produces 4D vectors, homogeneous coordinates, so its inverse isn't directly applicable as you only have a 2D screen position. BUT you know
Finally figured out how to compose a transform matrix for a surface in a CSS 3D scene and translate screen coordinates to element coordinates (more in thread)
@MIncarnator
This is three.js in a desktop browser but I do have VR ambitions. For backend, I don't know much about ComfyUI but I'm trying to build this in a way where the model capabilities/workflows can be swapped out.
Did a bunch of reworking on my llama.cpp text adventure, back to something not half bad! And places now have exits. Blue is user input, green is model output (Llama 2 13B)
Used Stable Diffusion inpainting to generate this year's gift tags. Settled on "four different christmas gift tags highly detailed cozy painting", 60 iterations
@gary_doesnt_lai
@StabilityAI
Sure! The code is available but there are a few steps involved in its current state:
- the code for this demo is on a branch on my fork of zero123:
- it can be installed in a virtualenv with `pip install 'zero123-inference @ git+'`
@tripoai
@StayuKasabov
Thanks! The API looks awesome and I'm sure it would yield even better results. I love that we have the choice to run local or remote.
The reverse, screen to element coords, wasn't obvious. I could invert the matrix but the w and z values seemed to be necessary to reverse the transformation. What I learned is that a 2D projected point corresponds to a ray in 3D space, and I specifically want that ray at z=0.
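In code the trick looks something like this - a numpy sketch of the math (not the actual JS); M is assumed to be the combined matrix mapping element coordinates to clip space:

```python
# Sketch: unproject a screen point to the ray it represents, then intersect
# that ray with the element's z=0 plane to recover element coordinates.
import numpy as np

def screen_to_element(M, sx, sy):
    """M: 4x4 matrix from element coords to clip space.
    (sx, sy): normalized device coords in [-1, 1]. Returns (x, y) on z=0."""
    Minv = np.linalg.inv(M)
    # The same screen point at the near and far clip depths: two points
    # that pin down the 3D ray corresponding to the 2D position.
    p0 = Minv @ np.array([sx, sy, -1.0, 1.0])
    p1 = Minv @ np.array([sx, sy, 1.0, 1.0])
    p0, p1 = p0 / p0[3], p1 / p1[3]  # back out of homogeneous coords
    # Solve p0 + t * (p1 - p0) = 0 in z for the hit on the element's plane.
    t = -p0[2] / (p1[2] - p0[2])
    hit = p0 + t * (p1 - p0)
    return hit[0], hit[1]
```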