A demo showcasing a "fast" conversational agent:
⚡️ Voice to text via @Azure
- Real-time streaming interface via WebSockets
🧠 Inference by @GroqInc llama 70b
- Speed is a bit variable at the moment (the Groq API is in alpha), but best case the 1st token can be sub-100ms
🗣️ Back to
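For reference, a minimal sketch of how you could measure that first-token latency yourself, assuming Groq's OpenAI-compatible streaming endpoint and a placeholder model id (check the provider docs for both):

```ts
// Sketch: measure time-to-first-token (TTFT) from an OpenAI-compatible
// streaming chat endpoint. Endpoint and model id are assumptions.
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";

async function timeToFirstToken(prompt: string): Promise<number> {
  const start = performance.now();
  const res = await fetch(GROQ_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
    },
    body: JSON.stringify({
      model: "llama2-70b-4096", // assumed model id, substitute as needed
      messages: [{ role: "user", content: prompt }],
      stream: true, // tokens arrive chunk by chunk instead of one body
    }),
  });
  const reader = res.body!.getReader();
  await reader.read(); // the first chunk carries the first token(s)
  await reader.cancel();
  return performance.now() - start;
}

timeToFirstToken("Say hi").then((ms) => console.log(`TTFT: ~${ms.toFixed(0)}ms`));
```

Note this includes the network round trip, so it's an upper bound on the model's own TTFT.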
@svpino
Devin - flat-out scam / staged demo
Humane - bad software execution, over-promised and under-delivered, will likely get better though
Rabbit - I think it will be a cool toy, but it will likely get left behind pretty quickly; their “action model” is a facade for hardcoded Selenium
@daniel_nguyenx
The term LAM was coined by Silvio Savarese in '23, where he describes them as LLMs capable of performing tasks autonomously rather than just responding with text. Under this definition, function calling is indeed a LAM
The term then became
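To make that definition concrete, here's a hedged sketch of function calling in the OpenAI-style chat API; the `get_weather` tool is invented for illustration. The point is that the model returns a structured call for your code to execute rather than prose:

```ts
// Sketch: OpenAI-style function calling. The model emits a structured
// tool call (the "acting" part of the LAM definition) instead of text.
const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4", // any tool-calling-capable model id works here
    messages: [{ role: "user", content: "What's the weather in SF?" }],
    tools: [{
      type: "function",
      function: {
        name: "get_weather", // hypothetical tool for illustration
        description: "Get current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    }],
  }),
});
const data = await res.json();
// e.g. { name: "get_weather", arguments: '{"city":"San Francisco"}' }
console.log(data.choices[0].message.tool_calls?.[0]?.function);
```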
Excited to share a small experiment in AI wearables: a device that transcribes conversations and identifies individual speakers, adding a much-needed contextual dimension to how we interact with LLMs. Watch the video to see it in action.
100% agreed, response latency is only one dimension of a realistic conversationalist.
- Knowing to stop when interrupted (and maintaining context of when the interruption happened)
- Knowing when someone is "thinking" between words, and not jumping the gun on a response
everyone's optimizing response latency, but that's rarely what breaks the user experience. it's interruptibility
most agents can't be interrupted, and fixing the text transcript after an interruption requires the TTS to return word-level timestamps
TTS APIs: plz return timestamps
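A minimal sketch of why those timestamps matter, with an assumed `Word` shape (real engines each have their own format): when the user barges in, the conversation history should only keep what was actually played back:

```ts
// Sketch: given word-level timestamps for the TTS audio and the moment
// playback was cut off, keep only the words the user actually heard.
interface Word {
  text: string;
  startMs: number; // offset from the start of the audio clip
  endMs: number;
}

function heardTranscript(words: Word[], interruptedAtMs: number): string {
  return words
    .filter((w) => w.endMs <= interruptedAtMs) // fully spoken before cutoff
    .map((w) => w.text)
    .join(" ");
}

// Usage: the agent planned five words but was interrupted at 900ms.
const planned: Word[] = [
  { text: "Sure,", startMs: 0, endMs: 250 },
  { text: "I", startMs: 260, endMs: 330 },
  { text: "can", startMs: 340, endMs: 480 },
  { text: "help", startMs: 490, endMs: 700 },
  { text: "with", startMs: 710, endMs: 950 },
];
console.log(heardTranscript(planned, 900)); // "Sure, I can help"
```

Without the timestamps, the transcript claims the agent said things the user never heard, and the next turn is reasoning from a false history.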
@SullyOmarr
This is indeed brutal
The most mind-boggling thing is the voice-to-voice latency; teams like @Vapi_AI have shown you can create sub-second pipelines
TBF there is also action calling involved in the Humane pipeline, but 10 seconds just doesn't quite add up, even under poor
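As rough back-of-envelope arithmetic (every number below is an illustrative assumption, not a measurement), a stitched pipeline's voice-to-voice latency is roughly the sum of its stages, which is how sub-second totals are plausible:

```ts
// Illustrative latency budget for a stitched voice pipeline.
// All numbers are assumptions for the sake of the arithmetic.
const budgetMs = {
  endOfTurnDetection: 200, // silence window before deciding the user is done
  sttFinalization: 150,    // streaming STT flushing its final transcript
  llmFirstToken: 300,      // time to first LLM token
  ttsFirstAudio: 250,      // time to first audible TTS byte
};

const total = Object.values(budgetMs).reduce((a, b) => a + b, 0);
console.log(`voice-to-voice: ~${total}ms`); // ~900ms, an order of
// magnitude under 10s even with headroom for an action call
```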
GPT can interpret data requests, write valid SQL queries and decide the best way to represent the data to an end user 👀
Little side project I've been working on
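A hedged sketch of the idea: hand the model the schema, ask for SQL plus a presentation hint, and keep the reply structured. The schema and response shape here are invented for illustration:

```ts
// Sketch: turn a natural-language data request into SQL plus a chart
// suggestion. Schema and JSON response shape are made up.
const schema = `
CREATE TABLE orders (id INT, customer TEXT, total NUMERIC, created_at DATE);
`;

async function dataRequestToSql(question: string) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [
        {
          role: "system",
          content:
            `You write SQL against this schema:\n${schema}\n` +
            `Reply with JSON only: {"sql": "...", "chart": "table|bar|line"}`,
        },
        { role: "user", content: question },
      ],
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content); // sketch: assumes clean JSON back
}

dataRequestToSql("monthly revenue this year").then(console.log);
```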
Proof that @perplexity_ai uses Google search in the background 🙄
Websites recently de-indexed by Google are also “unavailable” on @perplexity_ai
These same websites are available and indexed on every other search engine such as DDG/Bing/etc
I also suspect they only look at
@gaganghotra_
@agihippo
Here I tried a query about a website recently de-indexed from Google. Note Perplexity's response: "Unfortunately, there are no search results provided for the website godownsize.com."
@SergioRocks
Someone recently told me *NOT* to use GPT if I want to become a better programmer
This was shortly after they assumed I was a bad programmer because I've never worked in big tech 🤦♂️
@NickADobos
I built a GPT-4 version of this if you want to try it out: summarization happens on H100s running Mixtral for speed, but the final formatting is GPT-4 powered
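A hedged sketch of that two-stage split, with invented hosts and helper names: the fast open model chews through the big input, and the stronger model only touches the short result:

```ts
// Sketch of a two-stage pipeline: a fast model summarizes the bulk of
// the text; a stronger model formats the (much shorter) result.
// chatCompletion is a hypothetical helper over OpenAI-compatible APIs.
async function chatCompletion(baseUrl: string, model: string, prompt: string): Promise<string> {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.API_KEY}`,
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

async function summarizeThenFormat(longText: string): Promise<string> {
  // Stage 1: cheap and fast on the large input (host/model are assumed).
  const summary = await chatCompletion(
    "https://my-mixtral-host/v1", // hypothetical Mixtral deployment
    "mixtral-8x7b",
    `Summarize the key points:\n\n${longText}`,
  );
  // Stage 2: quality formatting on the small output.
  return chatCompletion(
    "https://api.openai.com/v1",
    "gpt-4",
    `Format this summary as clean markdown with headers:\n\n${summary}`,
  );
}
```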
@dannypostmaa
Unpopular opinion, but this is why TypeScript is so good: you can write full-stack web and mobile apps in the same language
Plus static type safety and an active ecosystem out of the box
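A small sketch of what "same language" buys you, with made-up names: one interface shared by the server handler and the client call, so the compiler catches drift between the two:

```ts
// shared.ts -- one request/response contract used on both sides
export interface CreateUserRequest {
  email: string;
  displayName: string;
}
export interface CreateUserResponse {
  id: string;
  email: string;
}

// server.ts -- the handler is type-checked against the shared contract
import type { CreateUserRequest, CreateUserResponse } from "./shared";
export function handleCreateUser(req: CreateUserRequest): CreateUserResponse {
  return { id: crypto.randomUUID(), email: req.email };
}

// client.ts -- the same types check the call site; renaming a field in
// shared.ts now breaks both sides at compile time, not in production
import type { CreateUserRequest, CreateUserResponse } from "./shared";
export async function createUser(body: CreateUserRequest): Promise<CreateUserResponse> {
  const res = await fetch("/api/users", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  return res.json();
}
```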
Alright, let's build something. I'm looking for a new apt in SF, and scheduling consecutive viewings in the same day that are near each other has been a pain point. Going to create a quick app where I can input a list of links and it will:
1. Scrape key data about the listing such
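A hedged sketch of that scraping step using cheerio; the selectors are placeholders, since every listing site's markup differs:

```ts
// Sketch: pull key fields from a listing page. Selectors are
// placeholders -- real sites need per-site extraction rules.
import * as cheerio from "cheerio";

interface Listing {
  url: string;
  title: string;
  address: string;
}

async function scrapeListing(url: string): Promise<Listing> {
  const html = await (await fetch(url)).text();
  const $ = cheerio.load(html);
  return {
    url,
    title: $("h1").first().text().trim(),
    address: $("[class*=address]").first().text().trim(), // placeholder selector
  };
}

// Usage: scrape every pasted link in parallel.
const links = ["https://example.com/listing/123"]; // placeholder input
Promise.all(links.map(scrapeListing)).then(console.log);
```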
@CustomerIO
I’ve been looking for you all my life.
Amazing product, can never go back to SendGrid or equivalent. This is the way email systems should work ❤️
Fascinating discussion on #AI and humanity's future happening right now. If you have any thoughts, please come and share them.
This is a totally open discussion.
@makowskid
@svpino
Same experience with unusable “lag” on both Windows and macOS
After using gaming mice with dedicated RF dongles, the difference is hard to forgive. It's one of those things that you don't notice until you've tried it, and then you can't go back. Much like high refresh rates or
SF-wide scavenger hunt happening now by an unknown group. Really well executed: it has a full-blown web app and seriously fun puzzles that make you run all around the city.
@petergyang
@anothercohen
I've noticed this too; using code to break the task into smaller chunks, then programmatically stitching the responses together, should yield improved results here 🤔
I've been finding myself wanting to use it for larger JSON and CSV transformations as well.
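A minimal sketch of the chunk-and-stitch idea for a large CSV: keep the header on every chunk so each request is self-contained, then concatenate the transformed rows (the prompt and model id are assumptions):

```ts
// Sketch: split a large CSV into self-contained chunks, transform each
// with an LLM, then stitch the outputs back together.
async function transformChunk(csvChunk: string, instruction: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [{
        role: "user",
        content: `${instruction}\nReturn only transformed CSV rows, no header.\n\n${csvChunk}`,
      }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content.trim();
}

async function transformCsv(csv: string, instruction: string, rowsPerChunk = 50): Promise<string> {
  const [header, ...rows] = csv.trim().split("\n");
  const chunks: string[] = [];
  for (let i = 0; i < rows.length; i += rowsPerChunk) {
    // Re-attach the header so every chunk parses on its own.
    chunks.push([header, ...rows.slice(i, i + rowsPerChunk)].join("\n"));
  }
  const parts = await Promise.all(chunks.map((c) => transformChunk(c, instruction)));
  return [header, ...parts].join("\n"); // stitch rows back under one header
}
```

Smaller chunks also keep each call well inside the context window, which is usually where single-shot transformations fall apart.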
@bitcloud
On audio, a monolithic architecture (vs stitching 3 models together) should in theory go a long way. Time to first token matters most for speaking use cases, since humans speak relatively slowly
@MatthewBerman
We've been stitching together STT -> LLM -> TTS models to achieve voice-to-voice, then plopping end-of-turn detection on top of all this to identify when the user is done speaking
An all-in-one model should in theory have tons of advantages, and has the potential to be far more
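A minimal sketch of that end-of-turn layer, assuming an upstream VAD emits a speech probability per audio frame (all thresholds are illustrative, not tuned values):

```ts
// Sketch: naive end-of-turn detection over VAD output. Declares the
// turn over only after a sustained run of silence following speech.
const FRAME_MS = 30;          // duration of each audio frame
const SPEECH_THRESHOLD = 0.5; // VAD probability above this = speech
const SILENCE_MS = 700;       // silence required before ending the turn

function createEndOfTurnDetector(onTurnEnd: () => void) {
  let silenceMs = 0;
  let heardSpeech = false;
  return (speechProbability: number) => {
    if (speechProbability >= SPEECH_THRESHOLD) {
      heardSpeech = true;
      silenceMs = 0; // user is still talking, reset the silence window
    } else if (heardSpeech) {
      silenceMs += FRAME_MS;
      if (silenceMs >= SILENCE_MS) {
        heardSpeech = false;
        onTurnEnd(); // enough silence after speech: hand off to the LLM
      }
    }
  };
}

// Usage: feed each VAD frame's speech probability into the detector.
const onFrame = createEndOfTurnDetector(() => console.log("user done speaking"));
[0.9, 0.8, 0.1, 0.05].forEach(onFrame);
```

The SILENCE_MS knob is exactly the "jumping the gun" tradeoff from earlier: too short and the agent talks over a thinking pause, too long and it feels laggy.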
@JoshWComeau
100% - someone should make a Chrome extension that lets you change the prefix (and set a default preference). The only reason to partially select the text is when you're using a different package manager.
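A hedged sketch of that extension's content script, assuming it runs on package registry pages; the selector and the npm-to-pnpm mapping are assumptions about markup and preference:

```ts
// Sketch of a content script that rewrites the install command shown
// on a package page to your preferred package manager. The <code>
// selector is an assumption about the page's markup.
const PREFERRED = "pnpm add"; // imagine this comes from the extension's options page

for (const el of document.querySelectorAll("code")) {
  const text = el.textContent ?? "";
  if (text.startsWith("npm i ") || text.startsWith("npm install ")) {
    const pkg = text.split(/\s+/).pop()!; // keep the package name, swap the prefix
    el.textContent = `${PREFERRED} ${pkg}`;
  }
}
```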