Multi-Modal AI is rapidly taking over 🔥🚀
It’s truly amazing how fast
@llama_index
incorporated a robust pipeline for multi-modal RAG capabilities.
Here’s a beginner-friendly guide to get started with multi-modal RAG using LlamaIndex 👇🧵
Using
#ChatGPT
to easily create ChatGPT plugins 🔥
#OpenAI
#GPT4
#AI
#python
#fastapi
A ChatGPT plugin consists of 3 things:
1. An HTTP server
2. An OpenAPI spec
3. A manifest file
Steps to create a plugin from OpenAI's tutorial 👇
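A minimal sketch of the HTTP-server piece, assuming the FastAPI stack from the hashtags above (the TODO endpoints are illustrative, not from OpenAI's tutorial):
```python
# Hypothetical plugin server; FastAPI auto-generates the OpenAPI spec
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI(title="TODO Plugin", version="0.1.0")

TODOS: list[str] = []

@app.get("/todos")
def list_todos() -> list[str]:
    # ChatGPT calls endpoints like this one, guided by the OpenAPI spec
    return TODOS

@app.post("/todos")
def add_todo(todo: str) -> dict:
    TODOS.append(todo)
    return {"status": "ok"}

# FastAPI serves the OpenAPI spec at /openapi.json for free;
# the manifest lives at /.well-known/ai-plugin.json
app.mount("/.well-known", StaticFiles(directory=".well-known"), name="well-known")
```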
Multi Document Agent architecture (v0) in
@llama_index
, a step beyond naive top-k RAG.
It allows answering a broader set of questions over multiple documents, which wasn't possible with basic RAG.
Let's break down the agent architecture and see how it works 👇🧵
Previously we've seen how to improve retrieval by finetuning an embedding model.
@llama_index
also supports finetuning an adapter on top of existing models, which lets us improve retrieval without updating our existing embeddings. 🚀
Let's see how it works 👇🧵
We've seen that smaller chunks are good for capturing semantic meaning and larger ones are good for providing better context.
@llama_index
AutoMergingRetriever takes it one step further by keeping the chunks in a tree structure and dynamically choosing the chunk length. 🧵👇
While splitting the raw text for Retrieval Augmented Generation (RAG), what should be the ideal length of each chunk? What’s the sweet spot?
Strike a balance between small vs large chunks using
@LangChainAI
ParentDocumentRetriever
Let's see how to use it 👇🧵
Let's talk about FLARE - Forward Looking Active RAG and how to implement it using
@llama_index
FLAREInstructQueryEngine.
Instead of doing retrieval once at the beginning, FLARE retrieves information dynamically multiple times during token generation 🚀
Details below 🧵👇
Ingestion Pipeline is a new and improved way to ingest and manage documents in
@llama_index
It supports:
- applying a series of transformations on documents
- caching those transformations
- managing ever-changing documents, etc.
Let's see how to use it 👇🧵
Finetuning the embedding model can allow for more meaningful embedding representations, leading to better retrieval performance.
@llama_index
has an abstraction for finetuning sentence transformers embedding models that makes this process quite seamless.
Let's see how it works 👇
Open source AI Diagram Generator 🔥
Uses
@llama_index
Pydantic program with partial JSON parsing and
@vercel
AI SDK to send intermediate diagrams during generation for improved UX 🚀
Repo:
Full tutorial under 2.5 minutes 👇
Extract tables from documents using
@llama_index
UnstructuredElementNodeParser and then use RecursiveRetriever to enable hybrid tabular/semantic queries and also comparisons over multiple docs.
Let's see how to use this advanced RAG technique 🧵👇
New Open Source, Full Stack RAG project 🔥🚀
Bootstrapped with
@llama_index
create-llama 🔥
It uses loads of amazing LlamaIndex goodies, e.g. Ingestion Pipeline, multi-document agents, a custom callback handler, transformations and more.
Repo:
Demo 👇
Lost in the middle problem in RAG and how
@LangChainAI
LongContextReorder addresses it.
In RAG with really long context (10+ retrieved docs), it turns out that just plugging in the docs in descending order of vector similarity score isn't the best approach.
Created FootballGPT using GPT-4
@LangChainAI
and
@ApiFootball
.
Taught GPT-4 how to use a complex API with many endpoints and numerous parameters.
This is where GPT-4's advanced reasoning capability came in handy.
Here's a demo 👇
One issue with using embeddings to retrieve relevant documents is that the results might vary with the slightest change in the wording of the query.
@LangChainAI
MultiQueryRetriever tries to address this issue with the help of LLMs.
Let's see how to use it 👇🧵
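Roughly how that looks in code (assuming an existing vectorstore `vectordb`):
```python
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

# the LLM generates several rephrasings of the query; results are deduplicated
retriever = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(),
    llm=ChatOpenAI(temperature=0),
)
docs = retriever.get_relevant_documents("How does the system handle memory?")
```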
Previously we've seen
@LangChainAI
ParentDocumentRetriever that creates smaller chunks from a document and links them back to the initial documents during retrieval.
MultiVectorRetriever is a more customizable version of that. Let's see how to use it 🧵👇
Fully local, open source chat-with-pdf app tutorial under 2.5 minutes 🔥🚀
Stack used:
@llama_index
Typescript for RAG
@ollama
@nextjs
with server actions
Phi2 and
@nomic_ai
models using Ollama
Detailed tutorial:
GitHub repo:
The "Dense X Retriever" paper shows that it significantly outperforms the traditional chunk-based retriever
@LoganMarkewich
created an awesome LlamaPack that lets you get started with this proposition-based retriever in no time using
@llama_index
🔥
Let's see how it works 👇🧵
Introducing LlamaBot 🔥🚀
An open-source Discord bot that listens to your conversations, remembers them and answers your questions across a Discord server, created using
@llama_index
(inspired by
@seldo
's LlamaBot for Slack)
Stack used: LlamaIndex, Gemini Pro,
@qdrant_engine
LlamaPacks by
@llama_index
are out 🚀🔥
In this speedrun 🏃♂️, I wanted to demonstrate how fast and easy it is to create a Gmail agent for your inbox using LlamaPacks.
Spoiler Alert⚠️
It took only 54.86 seconds 🚀 to get to the chat interface, with only 4-5 lines of code 🔥🤯
Streaming intermediate events in RAG is crucial for best user experience 🚀
Let's see how to use
@llama_index
and
@vercel
AI SDK to properly stream intermediate events to the frontend.
Full tutorial under 3 minutes 🔥
Previously I've talked about the amazing Ingestion Pipeline from
@llama_index
.
Here's how to use Redis (
@Redisinc
) as the docstore, vectorstore and cache for the pipeline.
LlamaIndex abstractions make it really easy to just use Redis for the entire pipeline 🔥👇
Created NewsBuddy 📰 using
@LangChainAI
and ChatGPT API.
NewsBuddy is your personal news assistant.
Had a lot of fun building this little project overnight. Learnt loads of new stuff about LangChain and prompt engineering.
Here's a demo of our NewsBuddy 👇
Check out this new OSS repo by
@seldo
that contains detailed, step-by-step instructions on how to build a Slack bot completely from scratch using
@llama_index
The bot listens to conversations and answers questions about them 🔥
Here's the high level architecture of the bot 👇
Using
#ChatGPT
to easily create Chrome extensions from scratch in 15 minutes 🔥
Full Step-by-Step Tutorial with prompts.
#OpenAI
#AI
We'll use ChatGPT to create a simple extension, QuikNote, that takes quick daily notes right from the browser.
Here are the steps required 👇
The issue:
- smaller chunks reflect more accurate semantic meaning after embedding
- but they sometimes lose the bigger picture and can sound out of context, making it difficult for the LLM to properly answer the user's query with limited context per chunk.
@llama_index
Here's a nice animation by the authors demonstrating how FLARE blends generation and retrieval by dynamically incorporating relevant and up-to-date information.
Source:
LlamaIndex FLAREInstructQueryEngine:
Within 24 hours, OpenAI's Sora has dazzled with some stunning videos🌟
Introducing FlixAI, A one-stop hub where I've compiled all the videos by Sora so far, alongside their prompts.
It supports semantic search, suggests similar videos, and includes videos from other models like Pika, Runway, etc.
Architecture:
- For each document, a VectorIndex is created for semantic search, and a SummaryIndex is created for summarization
- Then we create QueryEngine for both these Indices
- Next the QueryEngines are converted to QueryTools
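A sketch of that per-document setup (`nodes` and the tool descriptions are placeholders):
```python
from llama_index import SummaryIndex, VectorStoreIndex
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_index = VectorStoreIndex(nodes)  # semantic search over one document
summary_index = SummaryIndex(nodes)     # summarization over the same document

query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_index.as_query_engine(),
        metadata=ToolMetadata(name="vector_tool", description="Semantic search over this doc"),
    ),
    QueryEngineTool(
        query_engine=summary_index.as_query_engine(),
        metadata=ToolMetadata(name="summary_tool", description="Summarize this doc"),
    ),
]
```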
@LangChainAI
LongContextReorder addresses this issue by re-ordering the documents after retrieval.
It puts the most similar ones at the top, and then the next few ones at the end, and the least similar ones in the middle.
The Issue:
In the context window of LLM prompt, we put the most similar documents at the top, and least similar ones at the bottom.
But LLMs tend to ignore documents in the middle of their context.
Hence, this is where we should put the least similar ones, not at the bottom.
Recent research shows that:
- Performance is often highest when the document containing the answer to the user's question occurs at the beginning or the end of the context
@LangChainAI
ParentDocumentRetriever addresses this issue by creating embedding from the smaller chunks only as they capture better semantic meaning.
But while plugging into the LLM input, it uses the larger chunks with better context.
Unreal Engine is changing the Photorealistic Animation game with their upcoming "MetaHuman Animator".
You can use your iPhone to shoot and then reproduce facial-expression animation with insane detail and fidelity, all within minutes.
#AI
#UnrealEngine
#MetaHuman
These Tools are passed to OpenAIAgent. This is the document agent.
Each document has an agent like this that chooses to perform summarization or semantic search within each document.
Next we have a top-level Retriever-Enabled Agent.
This boss agent orchestrates across different document agents.
First it retrieves the document agents relevant to the question, then passes the input to those agents only and crafts the response from those agent outputs.
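A sketch of wiring the document agents into the boss agent, loosely following the v0 notebook (tool names and descriptions are illustrative):
```python
from llama_index import VectorStoreIndex
from llama_index.agent import FnRetrieverOpenAIAgent, OpenAIAgent
from llama_index.objects import ObjectIndex, SimpleToolNodeMapping
from llama_index.tools import QueryEngineTool, ToolMetadata

# one document agent per document, built from that document's query engine tools
doc_agent = OpenAIAgent.from_tools(query_engine_tools, verbose=True)

# wrap each document agent as a tool the boss agent can retrieve
all_tools = [
    QueryEngineTool(
        query_engine=doc_agent,
        metadata=ToolMetadata(name="doc_0_agent", description="Answers questions about document 0"),
    ),
    # ...one tool per document agent
]

# index the tools so only the relevant document agents are pulled in per query
tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
obj_index = ObjectIndex.from_objects(all_tools, tool_mapping, VectorStoreIndex)
top_agent = FnRetrieverOpenAIAgent.from_retriever(obj_index.as_retriever(similarity_top_k=3))
```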
@llama_index
First let’s start with some simple stuff.
We just want to ask questions about our images.
OpenAIMultiModal is a wrapper around OpenAI’s latest vision model that lets us do exactly that.
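A minimal sketch (the image folder and prompt are placeholders):
```python
from llama_index import SimpleDirectoryReader
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

image_documents = SimpleDirectoryReader("./images").load_data()

mm_llm = OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300)
response = mm_llm.complete(
    prompt="What do these images have in common?",
    image_documents=image_documents,
)
print(response)
```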
Meet GlowGPT.
Upload a photo and get instant feedback and suggestions from AI.
Created using
@OpenAI
ChatGPT API,
@LangChainAI
,
@Gradio
,
@huggingface
transformers and BLIP models.
Here's a demo 👇
@LoganMarkewich
@llama_index
Thanks to
@LoganMarkewich
, there's already a LlamaPack for "Dense X Retriever" that handles:
- generating the propositions
- creating the vector index
- creating the retriever (a RecursiveRetriever in this case) and the query engine
Here's how to use the pack 👇
@llama_index
@seldo
@qdrant_engine
Features:
- We can ask LlamaBot questions about what's going on across the server
- We can tell LlamaBot to start/stop listening to conversations.
- We can check current listening status, or ask the bot to forget everything from the server.
@LoganMarkewich
@llama_index
The paper also shows how to create these propositions 👇
First, GPT-4 is prompted to generate some propositions.
Then a Flan-T5-Large model is finetuned on the generated propositions.
The finetuned model is called "The Proposition-izer"
@LoganMarkewich
@llama_index
A proposition is an atomic, self-contained piece of text encapsulating a distinct factoid, written in a simple natural language format.
A single Proposition encapsulates only one contextualized atomic fact. It cannot be further split into separate propositions.
2. Then create the plugin manifest.
A plugin manifest is a JSON file that:
- contains simple metadata about the plugin
- tells ChatGPT how to show the plugin to a human
- describes the plugin to the language model
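Here's what a manifest roughly looks like (all values are illustrative):
```json
{
  "schema_version": "v1",
  "name_for_human": "TODO Plugin",
  "name_for_model": "todo",
  "description_for_human": "Manage your TODO list.",
  "description_for_model": "Plugin for managing a TODO list. Use it to add, list and delete TODOs.",
  "auth": { "type": "none" },
  "api": {
    "type": "openapi",
    "url": "https://example.com/openapi.json"
  },
  "logo_url": "https://example.com/logo.png",
  "contact_email": "support@example.com",
  "legal_info_url": "https://example.com/legal"
}
```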
🚀 GitHub Copilot JUST got way better, with the help of GPT-4. 🔥
GitHub just announced Copilot X with stunning new features like:
- Chat and voice support
- Copilot for terminal
- Answering questions from docs
- Generating pull requests
1/6
#AI
#ChatGPT
#GPT4
#Github
#Copilot
@llama_index
FLARE addresses this issue by dynamically adapting to the evolving context while it's being generated.
During generation, when low confidence tokens are generated (possible hallucination), FLARE actively performs retrieval.
Thus we use small chunks (with better semantic meaning) for vector similarity matching and return their corresponding larger chunks that have the bigger picture and more context.
Update: I've added streaming partial objects feature to the built-in
@llama_index
OpenAIPydanticProgram (Thanks
@_nerdai_
for the review)
So you can just call the 'stream_partial_objects' method of the built-in class now.
The project repo has been updated accordingly as well.
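Usage roughly looks like this (the Diagram model and prompt are made up for illustration):
```python
from typing import List, Tuple

from pydantic import BaseModel
from llama_index.program import OpenAIPydanticProgram

class Diagram(BaseModel):
    title: str
    nodes: List[str]
    edges: List[Tuple[str, str]]

program = OpenAIPydanticProgram.from_defaults(
    output_cls=Diagram,
    prompt_template_str="Generate a diagram describing: {topic}",
)

# each partial object is a best-effort parse of the JSON generated so far
for partial in program.stream_partial_objects(topic="a RAG pipeline"):
    print(partial)  # e.g. re-render the intermediate diagram in the UI
```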
To address this issue, we can just re-order the retrieved documents ourselves so that the least relevant ones end up in the middle.
Or we can use LongContextReorder from LangChain, which does it automatically.
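A sketch of the reorder step (assuming an existing `retriever` and `query`):
```python
from langchain.document_transformers import LongContextReorder

docs = retriever.get_relevant_documents(query)  # most similar first

# most relevant docs end up at the beginning and end,
# least relevant ones in the middle
reordered_docs = LongContextReorder().transform_documents(docs)
```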
@llama_index
These are the transformations we can use:
1. TextSplitter
2. NodeParser
3. MetadataExtractor
4. Any embedding model
We can also create custom transformations. Guide on this is coming soon.
Output of one transformation is the input to the next one.
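A sketch of what a custom transformation can look like (the whitespace cleanup is just an example):
```python
from llama_index.schema import TransformComponent

class TextCleaner(TransformComponent):
    """Illustrative custom transformation: normalize whitespace in each node."""

    def __call__(self, nodes, **kwargs):
        for node in nodes:
            node.text = " ".join(node.text.split())
        return nodes  # this output feeds the next transformation in the chain
```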
Thanks for reading.
I write about AI, ChatGPT, LangChain, RAG etc. and try to make complex topics as easy as possible.
Stay tuned for more ! 🔥
#ChatGPT
#LangChain
Don't miss out on these amazing new ChatGPT powered chrome extensions 🚀🔥
1. ParagraphAI - Perfectly curated writing
2. Glasp - YouTube summary
3. Merlin - ChatGPT Plus, on all sites
4. Glarity - summarize Google/Bing results
Make the most out of these AI tools.
Storing the chunks
- As we're creating embeddings for the small chunks only, we'll use a vectorstore to store those.
- The larger chunks are stored in an InMemoryStore, a key-value data structure that stays in memory while the program is running.
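A sketch of the setup (chunk sizes and collection name are illustrative):
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

vectorstore = Chroma(collection_name="children", embedding_function=OpenAIEmbeddings())
docstore = InMemoryStore()  # holds the full parent chunks

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,  # small chunks get embedded here
    docstore=docstore,        # large parents get returned from here
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=400),
    parent_splitter=RecursiveCharacterTextSplitter(chunk_size=2000),
)
retriever.add_documents(docs)  # docs: your loaded Documents
```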
@llama_index
FLARE Instruct:
This mode prompts the LLM, via few-shot prompting, to identify and insert search queries during generation.
e.g. Donald Trump attended [Search(which college did Donald Trump attend?)]
Found this amazing GPT-4 powered Chrome extension - Taxy AI
It automates repetitive browser actions by sending parts of the DOM and the user prompt to GPT-4. Then GPT-4 performs the action for you 🔥
Here's how it performs various repetitive tasks from a one-line user prompt 👇
After filling in, we try merging parent nodes.
The hypothesis is that if the ratio of a parent's retrieved children to its total children is above a threshold (which we can adjust), then we might as well return the larger parent for better context.
@llama_index
LlamaIndex has MultiModalVectorStoreIndex which creates embedding for both image and text nodes and stores them in vector stores.
For image nodes it uses CLIP and for text nodes it uses ada to get the embeddings (customizable).
Let’s create the multi-modal index
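A sketch, loosely following the LlamaIndex multi-modal examples (the Qdrant collections and data path are placeholders):
```python
import qdrant_client
from llama_index import SimpleDirectoryReader, StorageContext
from llama_index.indices.multi_modal.base import MultiModalVectorStoreIndex
from llama_index.vector_stores import QdrantVectorStore

client = qdrant_client.QdrantClient(path="qdrant_mm_db")
text_store = QdrantVectorStore(client=client, collection_name="text_collection")
image_store = QdrantVectorStore(client=client, collection_name="image_collection")
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

documents = SimpleDirectoryReader("./data").load_data()  # mixed text + images
index = MultiModalVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
```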
The first step here is parsing via the HierarchicalNodeParser.
It stores the node in a tree structure, where deeper nodes are smaller chunks and shallow nodes are larger chunks.
We can specify how many layers of nodes we want and the splitter size for each layer.
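A sketch with three layers (the chunk sizes are illustrative):
```python
from llama_index.node_parser import HierarchicalNodeParser, get_leaf_nodes

node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048, 512, 128])
nodes = node_parser.get_nodes_from_documents(documents)  # the whole tree
leaf_nodes = get_leaf_nodes(nodes)  # only the leaves get embedded
```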
@LangChainAI
ParentDocumentRetriever automatically creates the small chunks and links their parent document id.
If we want to create some additional vectors for each document, other than the smaller chunks, we can do that and then retrieve them using MultiVectorRetriever.
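A sketch of linking summary vectors back to their parent docs (`docs` and `summaries` are placeholders):
```python
import uuid

from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema import Document
from langchain.storage import InMemoryStore
from langchain.vectorstores import Chroma

vectorstore = Chroma(collection_name="summaries", embedding_function=OpenAIEmbeddings())
store = InMemoryStore()
id_key = "doc_id"

retriever = MultiVectorRetriever(vectorstore=vectorstore, docstore=store, id_key=id_key)

doc_ids = [str(uuid.uuid4()) for _ in docs]
summary_docs = [
    Document(page_content=summary, metadata={id_key: doc_ids[i]})
    for i, summary in enumerate(summaries)  # e.g. LLM-generated, one per doc
]
retriever.vectorstore.add_documents(summary_docs)  # embed the summaries
retriever.docstore.mset(list(zip(doc_ids, docs)))  # but return the full docs
```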
All nodes are stored in a docstore and only the leaf nodes are stored in a vectorstore.
At first, the vectorstore retriever is called to get the initial leaf nodes.
From here we try to auto-merge parents to find parent with the correct chunk size.
After receiving some feedback from you guys (which I really appreciate), I've made some updates to LlamaBot:
- Use GPT4 or Cohere
- Remember user mentions
- Refine prompt etc.
If you encounter any issues while using the bot, feel free to let me know or open an issue on GitHub.
@llama_index
This parser:
- extracts tables from data
- converts those tables to DataFrames
- for each of those tables, it creates 2 nodes
- one Table Node that contains the DataFrame as a string
- another IndexNode that stores the summary of that table and a reference to that Table Node
@llama_index
@llama_index
has guides on how to finetune embeddings in different ways:
- finetune the embedding model itself (only sentence transformers)
- finetune an adapter over any black-box embedding model (stay tuned for this one 🔥)
@llama_index
Next we partition the nodes using this built-in function of the Unstructured parser.
Here, BaseNodes contains the regular nodes and the IndexNodes (not the Table Nodes).
NodeMapping contains {id->Node} mapping for those remaining Table Nodes.
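Putting the whole flow together, roughly (names like `documents` are placeholders):
```python
from llama_index import VectorStoreIndex
from llama_index.node_parser import UnstructuredElementNodeParser
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import RecursiveRetriever

node_parser = UnstructuredElementNodeParser()
raw_nodes = node_parser.get_nodes_from_documents(documents)

# split the base nodes (text + IndexNodes) from the Table Nodes they point to
base_nodes, node_mappings = node_parser.get_base_nodes_and_mappings(raw_nodes)

vector_index = VectorStoreIndex(base_nodes)
recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_index.as_retriever(similarity_top_k=2)},
    node_dict=node_mappings,  # IndexNode hits resolve to the underlying tables
)
query_engine = RetrieverQueryEngine.from_args(recursive_retriever)
```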
Thanks for reading.
I write about AI, CloudNative, Kubernetes, System Design etc. and try to make complex topics as easy as possible.
Stay tuned for more.
AI won't steal your girl, but someone using FlirtGPT definitely will 😎
Don't use cheesy pick-up lines anymore 🚫
Just upload a pic of your crush and let ChatGPT generate amazing and personalized pick-up lines for you 😍
Built using
@LangChainAI
@Gradio
& BLIP models.
Demo 👇
We create any retriever as usual.
And then get the relevant documents using the get_relevant_documents() method of that retriever.
This returns the documents in the descending order of their similarity score.
@llama_index
Just like text-based RAG, where we were limited by the context length, here we're also limited by how many images we can pass.
Hence, we only want to pass the images that are related to our query.
How do you find images related to your query?
Yep, via vector embedding 🚀
@llama_index
3 Steps for finetuning embeddings:
1. Prepare the data via generate_qa_embedding_pairs()
2. Finetune the model via SentenceTransformersFinetuneEngine
3. Evaluate the model
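In code, the three steps look roughly like this (the model id and output path are illustrative):
```python
from llama_index.finetuning import (
    SentenceTransformersFinetuneEngine,
    generate_qa_embedding_pairs,
)

# 1. generate (question, context) pairs from your nodes with an LLM
train_dataset = generate_qa_embedding_pairs(train_nodes)
val_dataset = generate_qa_embedding_pairs(val_nodes)

# 2. finetune a sentence-transformers model
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en",
    model_output_path="finetuned_model",
    val_dataset=val_dataset,
)
finetune_engine.finetune()

# 3. evaluate / use the finetuned model for retrieval
embed_model = finetune_engine.get_finetuned_model()
```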
@llama_index
The linear adapter:
The query embedding is updated using this linear transformation of the adapter:
updated_q = W*q + b
We train the linear adapter on the training corpus to find the best values for the weight matrix W and bias b.
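A toy illustration of the transform itself (not the training loop; the dimension is just an example):
```python
import numpy as np

d = 1536          # embedding dimension, e.g. for ada-002
W = np.eye(d)     # weight matrix, learned during adapter finetuning
b = np.zeros(d)   # bias vector, learned during adapter finetuning

def adapt_query(q: np.ndarray) -> np.ndarray:
    # updated_q = W*q + b; document embeddings stay untouched
    return W @ q + b
```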
#AI
PROJECTS MEGA-THREAD
Thought of curating all my AI related projects and experiments in one thread so it's easier to find.
Will be updating this thread with all the AI projects I build in the future. So stay tuned 🔥
Projects were built using
@OpenAI
@LangChainAI
🧵 👇
Learnt a lot about prompt engineering and how LangChain works under the hood. Really enjoying playing with LangChain.
Created a custom chat agent for this one via extending LangChain's ConversationalChatAgent.
Also had to cut the delays in the demo as GPT-4 was quite slow.
@llama_index
@llama_index
has a FLAREInstructQueryEngine that makes it really easy to work with FLARE.
It currently implements FLARE Instruct mode, which tells the LLM to generate retrieval instructions.
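A sketch of the setup (assuming an existing `index` over your documents):
```python
from llama_index.query_engine import FLAREInstructQueryEngine

flare_query_engine = FLAREInstructQueryEngine(
    query_engine=index.as_query_engine(similarity_top_k=2),
    max_iterations=7,  # upper bound on generate/retrieve rounds
    verbose=True,      # print lookahead queries as they're emitted
)
response = flare_query_engine.query("How did the author's ideas evolve over time?")
```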
The "Dense X Retriever" paper shows that it significantly outperforms the traditional chunk-based retriever
@LoganMarkewich
created an awesome LlamaPack that lets you get started with this proposition-based retriever in no time using
@llama_index
🔥
Let's see how it works 👇🧵
@llama_index
Told you it was easy.
LlamaIndex handles all the underlying logic for converting those image_documents to a format compatible with the multi-modal LLM.
But there’s an issue !! 👇
LlamaIndex's pairwise comparison evaluation shows that, when asked, GPT-4 preferred the results produced using AutoMergingRetriever over the baseline retriever 65% of the time.
3. Then deploy the server and manifest json file.
4. After deploying the plugin server, add the plugin to ChatGPT
- Provide the domain where the plugin is hosted
- Provide auth token if needed
Thanks to LlamaIndex, creating an AutoMergingRetriever is quite straightforward.
We just need to pass the base retriever and the storage context containing the docstore of hierarchical nodes to its constructor. Then we can use it like any other retriever.
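Roughly (assuming `base_index` was built over the leaf nodes and `storage_context` holds the docstore with the full hierarchy):
```python
from llama_index.retrievers import AutoMergingRetriever

base_retriever = base_index.as_retriever(similarity_top_k=6)
retriever = AutoMergingRetriever(base_retriever, storage_context, verbose=True)

nodes = retriever.retrieve("What does the paper say about evaluation?")
```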
@llama_index
Transformations are the building blocks of Ingestion Pipeline.
Each transformation takes a list of nodes, and returns another list of nodes after making the desired modifications to them.
We define the transformations while instantiating the pipeline itself.
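A sketch of instantiating a pipeline (the particular transformation chain is illustrative):
```python
from llama_index.embeddings import OpenAIEmbedding
from llama_index.extractors import TitleExtractor
from llama_index.ingestion import IngestionPipeline
from llama_index.text_splitter import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=20),  # split into nodes
        TitleExtractor(),                                    # add title metadata
        OpenAIEmbedding(),                                   # embed each node
    ]
)
nodes = pipeline.run(documents=documents)
```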