mok

@mokshith_v

Followers
770
Following
738
Media
60
Statuses
451
@mokshith_v
mok
2 years
Introducing Tasmania - a better YouTube video search engine. Every result links to a specific moment in the video, so you can pinpoint the exact part of a video that made it surface in the results!
24
70
716
@mokshith_v
mok
2 years
1/ Today we’re excited to unveil Sieve ( @sievedata ), a cloud native platform for processing, searching, and running all sorts of AI models on video.
53
44
218
@mokshith_v
mok
4 months
After months of feedback from early customers, we’re launching Autocrop in 1.0. With this release, developers have a dead simple way to supercharge their video platforms for today’s age of short-form social content. Some of the story behind why we built this:
@sievedata
Sieve
4 months
Meet Autocrop 1.0. An API for social video editing. Speaker tracking. Auto reframing. One developer-focused API powered by the Sieve platform.
1
5
32
1
1
77
@mokshith_v
mok
1 year
openai api is a wrapper over matrix multiplication
3
0
47
@mokshith_v
mok
3 years
Excited to announce that @sievedata is a part of @ycombinator 's W22 batch 🎉 We're building core video analytics infra for the next generation of smart apps in industries from security, to media, supply chain, and agriculture. Super stoked to be on this journey with @thpicy :)
Tweet media one
3
3
39
@mokshith_v
mok
1 year
These are new waters for us as a company but I'm finally excited to share the news! The world is transforming right beneath our feet because of generative AI. We think this partnership will bring that transformation to the world, 10x faster. Read more:
6
2
38
@mokshith_v
mok
2 months
Thrilled to be releasing the 1.0 of our Dubbing API. It’s the first AI Dubbing solution purpose-built for developers. High quality, comes with a ton of knobs, and packaged with friendly pricing. It’s another step we’re taking to building production-ready solutions on top of the
@sievedata
Sieve
2 months
Excited to launch Sieve’s Dubbing API in 1.0 — the first AI dubbing solution purpose-built for developers! ✅ Speaker Style Preservation ✅ Multiple Speaker Support ✅ 29 popular languages ✅ Customizable voice engines, and more! More details in 🧵(1/6)
3
8
123
0
1
40
@mokshith_v
mok
2 years
@DeveloperHarris this is because AWS and GCP are now just hardware renting businesses. it's up to the startups to build experiences developers truly love. and that's not a bad deal for Amazon or Google as a business since they still get a rake + outsource product development and marketing!
3
0
22
@mokshith_v
mok
3 years
Many companies tend to store petabytes of data just in case. There's way too little attention put towards how to make use of that raw data effectively. I wrote a little about it recently on Towards Data Science :)
@TDataScience
Towards Data Science
3 years
Curating a Dataset from Raw Images and Videos by @mokshith_v
0
5
16
0
4
24
@mokshith_v
mok
9 months
Transcribe 1h of audio in 21s for $0.05, with best-in-class accuracy on @sievedata 🦧. That's 𝟓𝐱 𝐜𝐡𝐞𝐚𝐩𝐞𝐫 than the next cheapest transcription API. Powered by @OpenAI 's Whisper-Large-V3, with many optimizations on top. Cheap, accurate speech-to-text is here. 🧵
Tweet media one
2
0
24
@mokshith_v
mok
6 months
pretty soon i don't think we'll have to train image or video classifiers at all for anything. was pretty fascinated with moondream (shoutout @vikhyatk ) so i made a thing that can classify any video in under 10s.
2
1
23
@mokshith_v
mok
2 years
5/ We'd also like to thank our amazing investors including @MatrixVC , , @YCombinator , AI Grant ( @natfriedman and @danielgross ), and a host of other angels.
1
0
19
@mokshith_v
mok
4 months
thanks for the book @MuxHQ , gonna start reading it every night
Tweet media one
3
1
18
@mokshith_v
mok
2 years
trying out early versions of a new thing we're working on, it's still a super rough experience but something about it tells me we're barking up the right tree
3
0
17
@mokshith_v
mok
2 years
3/ @sievedata 's API lets developers create a project with a workflow, push video to it, integrate into a web app or a dashboard, and submit feedback to improve models over time. All without setting up a separate backend or having to think about infrastructure.
Tweet media one
1
1
18
@mokshith_v
mok
1 year
it is cool to see that many AI products that took top notch research teams ~years to refine and actually ship back in 2020 can be done in ~months or ~days now by a single engineer due to awesome OSS.
0
0
17
@mokshith_v
mok
2 years
The entire backend is powered by an alpha embedding search feature in @sievedata of course and the frontend is hosted on @vercel with @nextjs . Reach out if it's something you'd like to play around with!
1
0
17
@mokshith_v
mok
2 years
2/ 80% of the internet's IP traffic is video, TikTok will have 1.8B users by the end of this year, smart cameras are being installed in every industrial setting, and we're being ushered into a new generation of creative tools.
1
0
17
@mokshith_v
mok
11 months
Building blocks all the way down. State of the art audio enhance by just combining a couple OSS models.
@sievedata
Sieve
11 months
Pretty insane that open source models when combined together can reach production grade quality on audio enhancement. Enjoy the YouTube GOAT with high fidelity.
6
12
44
1
0
15
@mokshith_v
mok
2 years
Open source models are getting really, really good. Kind of surprising to see how something like this could be built over a couple days.
@sievedata
Sieve
2 years
Excited to show off a Sieve workflow the team worked on over the weekend: AI generated Twitter bots! With a few open-source models, we built a bot that responds to your questions with a custom, realistic talking head avatar, all deployed on @sievedata .
2
3
24
2
0
15
@mokshith_v
mok
5 months
These are great improvements. We just updated Describe on @sievedata to use this latest model when `visual_detail` is set to low. Almost on par with the other VLMs we're using that are heavier :) It gets the text on his shirt + more details of the scene!
@vikhyatk
vik
5 months
New moondream release out today! Mainly focused on improved OCR and captioning. If you're using moondream for image captioning definitely worth checking this one out!
Tweet media one
14
15
241
1
1
14
@mokshith_v
mok
6 months
i’ll be honest, this started out as a toy demo project for @sievedata but seeing the cost, quality, speed trifecta here makes me think this would genuinely be useful to so many people. i'm blown away by the quality something so simple brings. great work by @gaurang_bharti !
@sievedata
Sieve
6 months
🙈 Introducing Describe Beta. The most descriptive audiovisual summaries for videos generated with AI. - Process videos up to 12x faster than realtime - Costs <$0.01 / min of video - Combines visual and audial components Try on your own videos now:
6
8
38
1
1
13
@mokshith_v
mok
5 months
🌴We recently released a feature that embodies the way we think about AI at @sievedata : The Job Tree (1/X)
2
4
14
@mokshith_v
mok
3 months
💜
Tweet media one
1
0
14
@mokshith_v
mok
2 years
What could we do to make this better? 1. Index more videos (not just 5800) 2. Take into account video metadata, comments, etc. What else?
@mokshith_v
mok
2 years
Introducing Tasmania - a better YouTube video search engine. Every result links to a specific moment in the video, so you can pinpoint the exact part of a video that made it surface in the results!
24
70
716
4
0
13
@mokshith_v
mok
2 years
4/ We've been working closely with software startups redefining their products through video AI in private. Today we're opening up Sieve for any developer to try. Thank you to all our customers who've helped us reach this milestone.
1
0
13
@mokshith_v
mok
2 years
Thank you all for the love today! A bunch of you sent us questions along with workflows you want to try building on top of @sievedata :) Can't wait to see those demos live and will probably share some cool ones with the broader community on our account.
@mokshith_v
mok
2 years
1/ Today we’re excited to unveil Sieve ( @sievedata ), a cloud native platform for processing, searching, and running all sorts of AI models on video.
53
44
218
0
0
12
@mokshith_v
mok
2 years
Gearing up for a launch next week! DM me if you're interested in tinkering with video + AI stuff, and would like an early preview 🍪
1
0
13
@mokshith_v
mok
6 months
we were inspired by @pika_labs “sound effect” feature so we made our own version. 𝐭𝐡𝐢𝐬 𝐯𝐢𝐝𝐞𝐨’𝐬 𝐬𝐨𝐮𝐧𝐝𝐬 𝐚𝐫𝐞 𝐜𝐨𝐦𝐩𝐥𝐞𝐭𝐞𝐥𝐲 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐞𝐝 𝐮𝐬𝐢𝐧𝐠 𝐀𝐈.
1
3
13
@mokshith_v
mok
9 months
man we generated this last night thinking @sama would still be at microsoft. bro keeps switching on us.
@sievedata
Sieve
9 months
We’ve recently been working with lipsyncing models and wanted to share a message on behalf of Satya. The stakes couldn’t be higher. Check out the links below to learn more and try it yourself for free!
3
3
26
1
0
12
@mokshith_v
mok
2 years
6/ This is just the beginning, we can't wait to see what you all build. Just get started for free by going to !
0
0
11
@mokshith_v
mok
2 years
Note: it's not your traditional search and isn't meant to find the most topical videos. Instead it helps you pinpoint an exact "thing" you might've been looking for within clips.
1
0
11
@mokshith_v
mok
6 months
sharing is caring 💏
@sievedata
Sieve
6 months
We’re excited to launch GPU sharing on Sieve. Split any of our GPU offerings into up to 8 components to run models more efficiently with higher GPU utilization 🫂
0
2
8
0
0
11
@mokshith_v
mok
2 years
@alexandr_wang why LLMs specifically? why not this trend of awesome joint, multimodal latent spaces that represent more than just text?
2
1
10
@mokshith_v
mok
5 months
while i don’t come from a creative background, building @sievedata has let me interact with so many people pushing the forefront of video creation with AI. it got me curious enough about the history that i wrote a blog post on it :) hope you enjoy!
0
2
11
@mokshith_v
mok
5 months
Banger video @MadeWithCapsule and banger app madeWith @gaurang_bharti
@sievedata
Sieve
5 months
Introducing Highlights Beta for developers. Find highlights in long-form video automatically. - Specify any search term (i.e. “most viral worthy”) - Auto-generate titles for the clips - Auto-score each clip based on relevance Try on your own content now:
4
3
19
1
2
11
@mokshith_v
mok
5 months
Really excited to unveil our first case study! It’s been amazing working with the @ZightApp team, especially Sachin and @nsphin , and really awesome to see them tell the story of using @sievedata for Zight’s AI capabilities.
@sievedata
Sieve
5 months
We’re excited to unveil our case study with @ZightApp , a leader in the video communication space. They use Sieve to power all of the functionality that’s a part of Zight AI, their AI suite. See what their Head of Product, @nsphin , has to say about Sieve and the partnership 👇
1
6
16
2
1
11
@mokshith_v
mok
2 years
We're hiring. Unfortunately we don't work from speedboats :( We work in glorious SF.
@levelsio
@levelsio
2 years
Live shipping from the speedboat, because 5G 🚤 ❤️ 🚤 ❤️ 🚤 ❤️ From the classic series:
19
2
247
0
0
9
@mokshith_v
mok
5 months
VoiceCraft by @PuyuanPeng is probably the first open source TTS that's passable in zero shot voice cloning. here's @fredagainagain1 's voice...cloned.
@PuyuanPeng
Puyuan Peng
5 months
Announcing 𝐕𝐨𝐢𝐜𝐞𝐂𝐫𝐚𝐟𝐭🪄 SotA for both speech editing and zero-shot text-to-speech, Outperforming VALL-E, XTTS-v2, etc. VoiceCraft works on in-the-wild data such as movies, random videos and podcasts We fully open source it at
30
140
686
1
1
9
@mokshith_v
mok
6 months
Awesome work by the team. Cool to see how much you can gain by speeding up ffmpeg ops + parallelizing over burst compute!
@sievedata
Sieve
6 months
Introducing fast-asd: a new open-source method of doing active speaker detection on video that’s 90% faster than existing solutions.
1
3
13
0
0
10
@mokshith_v
mok
1 year
google drive is my storage solution of choice for model weights
1
0
10
@mokshith_v
mok
3 months
We just updated audio enhance on Sieve. Now it uses some "non-AI stuff" like adaptive leveling and loudness normalization on top of some diffusion which to my ear sounds much better than a lot of audio enhance solutions i've seen. Simple solutions ftw.
0
1
9
@mokshith_v
mok
2 years
@gorkemyurt i don't think this is in any way specific to data / ai. PMs working at cloud providers just aren't good at building delightful developer products. whether it's ML infrastructure, web hosting, financial services, security, etc.
0
0
9
@mokshith_v
mok
2 years
And finally kudos to @gauravity_ and @thpicy for building this awesome demo!!
0
0
9
@mokshith_v
mok
2 years
people who have used opencv, librosa, moviepy, etc for projects. anything you hate a lot about them? or anything you love so much about them?
0
0
8
@mokshith_v
mok
9 months
it amazes me how many diamonds in the rough there are when it comes to open source models. here's a two year old model that does "active speaker detection" on video. pretty unoptimized code but it works great.
1
0
9
@mokshith_v
mok
3 years
Got $10k for basically free and met so many amazing people through this. @cory is awesome :)
0
1
8
@mokshith_v
mok
9 months
why is it so hard to create an azure account and why is it asking me to log in with linkedin??
Tweet media one
1
0
8
@mokshith_v
mok
9 months
something about apple notes on mac. i like to type in it more than i do in notion or anywhere else.
1
0
8
@mokshith_v
mok
1 year
best video editing tool out there. sorry everyone else working on video editing tools you stand no chance.
@FFmpeg
FFmpeg
1 year
We are superior
60
518
5K
2
0
7
@mokshith_v
mok
6 months
this is amazing. there could not be a more exciting time to be working with computer vision :)
@vikhyatk
vik
6 months
Releasing moondream2 - a small, open-source, vision language model designed to run efficiently on edge devices. Clocking in at 1.8B parameters, moondream requires less than 5GB of memory to run in 16 bit precision.
Tweet media one
71
209
2K
0
0
8
@mokshith_v
mok
4 months
dubbing is such a fun use case to do with ai. it's such a clear example of ai models seeming more like building blocks versus the end experience / product.
1
0
7
@mokshith_v
mok
5 months
team @sievedata really doing some groundbreaking work @3blue1brown . SOUND ON.
0
0
8
@mokshith_v
mok
4 months
sometimes you just know an email you've written is a masterpiece. super short, but super packed with some good shit.
0
0
8
@mokshith_v
mok
5 months
can't wait for a future where custom legal work for startups is fully automated. let's have two bots going back and forth on contract redlines!
2
0
8
@mokshith_v
mok
1 year
any good resources on the latest when it comes to GPU availability? really wish someone built a tool that gave more transparency into which regions + clouds have what sort of availability at a given moment. crazy that even cloud sales reps don't have full transparency into this.
2
0
7
@mokshith_v
mok
1 year
new york is way better than sf, but only in small doses
0
0
7
@mokshith_v
mok
10 months
the art of sales. never expected myself to appreciate it as much as engineering.
0
0
7
@mokshith_v
mok
1 year
Going model shopping on GitHub is pretty fun, especially when you see that the only inference code available is one that evaluates on the entire benchmark dataset
0
0
7
@mokshith_v
mok
3 months
fwiw im a celtics fan and i like kyrie
@sievedata
Sieve
3 months
In honor of a ☘️ sweep today, here's Kyrie telling Boston he plans to "re-sign" four years ago, now dubbed in Chinese. Developers, get ready for next week. We've been cooking 👀
1
0
45
2
0
7
@mokshith_v
mok
5 months
been trying to figure out how to say something like this for a few days and this does a pretty good job. chatbots or text prompts exist because that’s how the underlying models work. but is every user of the internet just typing into a terminal? no. so much more to be imagined.
@c_valenzuelab
Cristóbal Valenzuela
5 months
Question-answering systems and chatbots have inadvertently created a false narrative that AI can solve any problem with a simple prompt, including complex creative tasks like storytelling and filmmaking. The ability to generate video from language has been misinterpreted as the
11
23
133
0
1
7
@mokshith_v
mok
4 months
the best customers are those that offer super candid feedback, and trust you will do right by them. it's the kind of relationship that pushes product forward the most.
0
0
7
@mokshith_v
mok
1 year
I read somewhere recently that only 40% of the MAUs on Figma are designers (other portion is likely PMs, engineers, etc). What are some other examples of products with similar usage characteristics (maybe in a different domain)?
2
1
6
@mokshith_v
mok
5 months
so satisfying when something you think is so simple just wows a customer. makes you realize that 99.99% of the world doesn’t have any understanding of how half this stuff is even possible…
1
0
7
@mokshith_v
mok
4 months
The OAI / MSFT relationship is weird in that they are competitively selling the same product? I just got access to what I think is OpenAI's "limited access" voice engine within 1 day of request. MSFT calls it "personal voice".
1
0
6
@mokshith_v
mok
4 months
everyday i wake up and think about how much i appreciate black boxes and the magic they let me wield
1
0
6
@mokshith_v
mok
2 years
@mathemagic1an @BookCameo This is awesome! We made something similar and launched it yesterday that takes the Wav2Lip output to make it look even better.
1
0
6
@mokshith_v
mok
3 months
@tdsone3 @modal_labs @LatchBio @SeqeraLabs my impression is that there are more problems to solve than just the compute in bio. visualization, plotting, hyper specific optimizations around particular workflows, etc. curious to hear your take on those.
2
0
6
@mokshith_v
mok
4 months
Scarlett Johansson voice over, @3blue1brown style math videos. @gatekeep_labs make it happen!
@JayBrunet
Jay Brunet
4 months
I am impressed by 4o's subtly flirtatious voice. To duplicate that I'd use @sievedata to add context to the situation. Poor kid learning geometry shouldn't have a flirty teacher, so the voice should adjust by including Sieve analysis. Market is infinite, specialize in the nuance.
0
0
1
1
1
6
@mokshith_v
mok
3 years
Some underrated computer vision use-cases: 1. "camera in a box" solutions in factories, construction, agriculture, etc. 2. more video-first platforms like @tiktok_us + @Whatnot 3. competitive analytics (sports + esports) 4. Analyzing the world of Zoom (i.e. @Gong_io , @GoRewatch )
2
1
6
@mokshith_v
mok
5 months
pydantic is so underrated bro
0
0
6
@mokshith_v
mok
2 months
who wants to see ilya sounding chill? run up the likes and i'll go find a clip :)
2
0
6
@mokshith_v
mok
5 months
hot take: in the long run, there will be no AI company that makes it big by serving proprietary models as an API
@tibo_maker
Tibo
5 months
Every AI company is freaking out rn They're all trying to lock people with yearly plans as new tech is released every day Would you accept the lock?
Tweet media one
79
5
190
0
0
6
@mokshith_v
mok
1 year
since when did all the best sf coffee shops disincentivize sitting there with a laptop by having no wifi or outlets? function over form damn.
2
0
5
@mokshith_v
mok
2 years
Yes! Except I think video is interesting in that it's not just bounded strictly to the public web. So much utility exists in being able to process and search camera feeds, private logs generated by robot fleets, and more. Just wait for @sievedata in October :)
@amasad
Amjad Masad
2 years
Internet users consume way more video than text, yet the web's core infrastructure is still built around text. What does the web of video look like? At a minimum, we need better search engines. This is one of the most interesting domains to apply AI breakthroughs today.
55
123
1K
1
0
5
@mokshith_v
mok
6 months
The pattern of combining models is here to stay. "𝘛𝘩𝘦 𝘱𝘳𝘰𝘤𝘦𝘴𝘴 𝘪𝘴 𝘤𝘰𝘯𝘵𝘳𝘰𝘭𝘭𝘦𝘥 𝘣𝘺 𝘢𝘯 𝘓𝘓𝘔 𝘢𝘴 𝘵𝘩𝘦 𝘢𝘨𝘦𝘯𝘵, 𝘸𝘪𝘵𝘩 𝘵𝘩𝘦 𝘷𝘪𝘴𝘶𝘢𝘭 𝘭𝘢𝘯𝘨𝘶𝘢𝘨𝘦 𝘮𝘰𝘥𝘦𝘭 𝘢𝘯𝘥 𝘊𝘓𝘐𝘗 𝘴𝘦𝘳𝘷𝘪𝘯𝘨 𝘢𝘴 𝘵𝘰𝘰𝘭𝘴."
@_akhaliq
AK
6 months
VideoAgent Long-form Video Understanding with Large Language Model as Agent Long-form video understanding represents a significant challenge within computer vision, demanding a model capable of reasoning over long multi-modal sequences. Motivated by the human cognitive
Tweet media one
5
38
235
0
0
5
@mokshith_v
mok
6 months
computer vision workloads are one example of something that's GPU bursty in the sense that there are many CPU operations sometimes intertwined with their GPU counterparts (i.e. reading frames, other transforms, etc). what other ai inference workloads are like this?
0
0
5
@mokshith_v
mok
5 months
april fools is stale and all the jokes are so lame
1
0
5
@mokshith_v
mok
5 months
as one of the few real contenders to AWS, it would be cool if cloudflare could acquire huggingface
0
0
5
@mokshith_v
mok
2 years
Anyone that offers hosted ML deployment, meaning just saying "hey you give me a model, I give you a direct endpoint". Is that not just a commodity? Open to hearing if folks feel otherwise.
1
0
5
@mokshith_v
mok
6 months
yolo world v2 is everything i ever wanted when playing with object detection models back in 2016. that was the bound of my imagination. it seems trivial given all the other models we have out today but it is amazing that we have this.
0
0
5
@mokshith_v
mok
2 years
Every AWS product line can be built 10x better. A bunch of companies that sell specialized compute and get really efficient at using cloud infra + building an objectively better product. And when these products become big enough to roll their own compute in-house...rip AWS.
0
0
5
@mokshith_v
mok
2 years
every software company really is just selling compute credits with their own value-add markup
1
0
5
@mokshith_v
mok
1 year
The team at Kaiber is building something really special
@stokebuilder
builder of stoke
1 year
Join us tomorrow at 4pm PST. We reintroduce @KaiberAI and chart our path for this new chapter of creativity.
0
0
14
0
0
5
@mokshith_v
mok
10 months
bullish
@rauchg
Guillermo Rauch
10 months
<Video /> is the new <Image /> – Great DX by @muxhq
28
96
1K
0
0
5
@mokshith_v
mok
10 months
what most think to be tiny tiny corners of broader fields tend to be large lakes some folks spend their entire lives swimming in. pretty cool when you get to speak with said folks.
1
0
5
@mokshith_v
mok
4 months
the video editing landscape is such an interesting one. literally every single player now has AI top of roadmap each with their own mini-take on what's important and how to do things. this wasn't the case at the beginning of last year.
0
0
5
@mokshith_v
mok
6 months
Really cool to see the gap closing between open and closed TTS models. @elevenlabsio is still the clear winner for production applications but curious which OSS models people are actually using in production if any.
Tweet media one
1
0
5
@mokshith_v
mok
2 months
7000 languages existing sounded like such cap i needed to google it
Tweet media one
@reach_vb
Vaibhav (VB) Srivastav
2 months
Toucan TTS: MIT licensed Text to Speech in 7000 languages! 🔥 The most multilingual open-source TTS model out there ⚡ Step 1: They built a text frontend that can turn text in any language from the ISO-639-3 list into language-agnostic articulatory features. Step 2: Then,
15
227
957
1
0
5
@mokshith_v
mok
3 months
Tweet media one
@sievedata
Sieve
3 months
Some new paint 💜
Tweet media one
1
1
23
1
1
5
@mokshith_v
mok
6 months
anyone that would find really good video captions useful? something that accounts for both audio + visual components.
2
0
4
@mokshith_v
mok
9 months
Literally creating a whole new slow on the cost / speed tradeoff versus the rest of the market.
Tweet media one
1
0
4
@mokshith_v
mok
5 months
many such cases
@hieudinh_
Hieu Dinh
5 months
My Reddit post is doing well #typefully
13
11
244
0
0
4
@mokshith_v
mok
3 months
@chongzluong all my homies use @lancedb tho, only then can you achieve true sage status.
0
1
4
@mokshith_v
mok
6 months
this is the man i wish existed back in high school. no disrespect to pyimagesearch but this guy is pyimagesearch rebooted.
0
0
4
@mokshith_v
mok
5 months
❤️‍🔥
Tweet media one
@mokshith_v
mok
5 months
pydantic is so underrated bro
0
0
6
0
0
4
@mokshith_v
mok
4 months
you know facebook really has their content targeting on lock when ur cofounder sends u an instagram short like this
Tweet media one
0
0
3
@mokshith_v
mok
6 months
and it's only 100 lines of code! used moondream on @sievedata and gpt4. probably could use phi-2 or gemma with a bit more prompting for this task tho which would make this fully OSS dependent + lightweight.
0
1
4