The previous article introduced prompting-based techniques for using LLMs as text rerankers. In this article, we take a closer look at the associated challenges and at potential improvements that make these methods more ranking-aware.
Is Cosine-Similarity of Embeddings Really About Similarity?
Netflix cautions against blindly using cosine similarity as a measure of semantic similarity between learned embeddings, as it can yield arbitrary and meaningless results.
📝
RAG Does Not Work for Enterprises
Explores the challenges and requirements of implementing RAG in enterprises, proposing potential solutions such as semantic search and hybrid queries, along with an evaluation framework for validating enterprise-grade RAG solutions.
📝
Foundations of Vector Retrieval
This 185-page monograph provides a summary of major algorithmic milestones in the vector retrieval literature, with the goal of serving as a self-contained reference for new and established researchers.
📝
REAPER: Reasoning based Retrieval Planning for Complex RAG Systems
Amazon presents an LLM-based planner for generating efficient retrieval plans in conversational AI systems, offering reduced latency, higher accuracy, and easy scalability.
📝
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Salesforce presents a survey of LLM alignment methods, categorizing approaches into four main topics and identifying future research directions.
📝
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely
Microsoft categorizes data-augmented LLM queries and proposes strategies to tackle challenges in specialized domains.
📝
ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval
Introduces a multilingual dense retrieval model that achieves zero-shot transfer to other languages.
📝
👨🏽💻
A Survey on Mixture of Experts
Provides a comprehensive review of MoE models in LLMs, introducing a new taxonomy and covering algorithmic advancements, system designs, and applications.
📝
👨🏽💻
Large Language Models: A Survey
This paper surveys recent advances in large language models, including prominent models like GPT, LLaMA and PaLM, their construction, capabilities, applications, benchmarks, and open challenges.
📝
CRAG -- Comprehensive RAG Benchmark
Meta presents a factual QA benchmark with 4,409 diverse questions, mock APIs, and realistic challenges, designed to evaluate RAG systems for LLMs.
📝
👨🏽💻
In Defense of RAG in the Era of Long-Context Language Models
NVIDIA argues that long-context LLMs can be distracted by large amounts of irrelevant context, and proposes order-preserving RAG (OP-RAG), which keeps the retrieved chunks in their original document order.
📝
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Combines routing weights and hidden states from MoE LLMs to create superior embeddings without additional training.
📝
👨🏽💻
rerankers: A Lightweight Python Library to Unify Ranking Methods
Introduces a Python library that simplifies the use of various re-ranking methods in information retrieval by providing a unified, easy-to-use interface.
📝
👨🏽💻
How Can Recommender Systems Benefit from Large Language Models: A Survey
Examines the integration of LLMs into RecSys, exploring where and how LLMs can enhance various stages of the recommendation pipeline.
📝
👨🏽💻
Efficient Document Ranking with Learnable Late Interactions
Google introduces a new learnable late-interaction model for query-document relevance that outperforms existing models in accuracy while reducing latency and storage costs.
📝
BM25S: Orders of magnitude faster lexical search via eager sparse scoring
Introduces a fast Python implementation of BM25 that pre-computes scores during indexing and stores them in sparse matrices, achieving significant speedups.
📝
👨🏽💻
MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Google proposes a retrieval mechanism that reduces multi-vector retrieval to single-vector retrieval by constructing Fixed Dimensional Encodings of a multi-vector representation.
📝
HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction
Combines VectorRAG and GraphRAG techniques to improve information extraction from financial documents.
📝
FACTS About Building Retrieval Augmented Generation-based Chatbots
NVIDIA introduces a framework and 15 RAG pipeline control points for building effective enterprise chatbots, providing empirical insights on LLM performance tradeoffs.
📝
GRAG: Graph Retrieval-Augmented Generation
Enhances LLMs' generation capabilities in graph contexts by efficiently retrieving relevant textual subgraphs and integrating them through dual prompting.
📝
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications
This survey categorizes and analyzes 29 prompt engineering techniques for adapting LLMs across tasks without retraining, and highlights several open challenges.
📝
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Accelerates attention computation in LLMs by using a vector search-based approach to retrieve key-value pairs from CPU memory.
📝
👨🏽💻
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Presents a novel multilevel dynamic caching system that efficiently caches and shares intermediate states of retrieved documents in RAG for LLMs.
📝
StructuredRAG: JSON Response Formatting with Large Language Models
Connor Shorten (@CShorten30) et al. introduce a benchmark for evaluating LLMs' ability to generate structured JSON outputs, revealing varied performance across tasks and models.
📝
👨🏽💻
A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models
Comprehensively reviews RA-LLMs, covering their architectures, training methods, limitations, future directions, and applications in enhancing LLM generation capabilities.
📝
PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents
Enhances RAG models by incorporating user-centric agents, adapting retrieval and generation based on real-time user data.
📝
👨🏽💻
Large Language Models Meet NLP: A Survey
Provides a comprehensive survey of how LLMs are applied to NLP tasks, introducing a new taxonomy and discussing current progress, future frontiers, and challenges.
📝
👨🏽💻
A Guide to Similarity Measures
Provides a comprehensive guide to similarity measures used across various data science fields, offering detailed explanations and principles to understand, select, and design appropriate measures for diverse applications.
📝
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
Provides a comprehensive review of LLM-driven synthetic data generation, organizing existing studies into a unified framework of generation, curation, and evaluation.
📝
User Embedding Model for Personalized Language Prompting
Google Research proposes a User Embedding Module to compress free-text user histories into embeddings for prompting Large Language Models to improve recommendation accuracy.
📝
SPLATE: Sparse Late Interaction Retrieval
Adapts the ColBERTv2 model to map its embeddings to a sparse space, enabling efficient sparse retrieval for candidate generation in the late interaction paradigm.
📝
Searching, fast and slow, through product catalogs
Microsoft presents a fast and accurate SKU search system for CRMs combining Trie-based suggestions, TF-IDF retrieval, and language model embeddings, outperforming existing systems.
📝
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Google improves RAG systems by using a smaller model to generate multiple draft answers from partitioned document subsets, which are then verified by a larger model.
📝
Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model
Proposes a framework integrating LLMs and medical expertise to enhance exploitation of health reports for disease prediction and preventative care.
📝
👨🏽💻
TableRAG: Million-Token Table Understanding with Language Models
Enables efficient large-scale table understanding for language models, using schema and cell retrieval to work around context-length limits while reducing token consumption.
📝
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
Enhances the retrieval accuracy of LLMs for complex multi-aspect queries by leveraging activations from the multi-head attention layer as embeddings.
📝
👨🏽💻
Understanding the User: An Intent-Based Ranking Dataset
Presents a new dataset that annotates multiple intents for complex queries from TREC-DL, using LLMs and crowdsourcing.
📝
👨🏽💻
A Survey on Retrieval-Augmented Text Generation for Large Language Models
Presents a comprehensive framework for understanding RAG, outlining its core components, evaluation methods, and future research directions.
📝
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future
Examines the use of LLMs and agents in software engineering, covering six key areas and analyzing their differences, applications, and effectiveness.
📝
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models
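Jina AI presents a technique that improves chunk embeddings for retrieval by first encoding the entire document with a long-context embedding model and only then splitting it into chunks, so each chunk embedding retains document-level context.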
📝
👨🏽💻
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation
Presents a method for generating task-specific synthetic datasets using user-provided few-shot examples.
📝
👨🏽💻
Dense X Retrieval: What Retrieval Granularity Should We Use?
Proposes using propositions (compact, self-contained factual statements) as the retrieval unit, and shows that they outperform passage- and sentence-level retrieval, particularly in generalization.
📝
👨🏽💻
Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion
Microsoft Research presents a method to personalize LLMs for search via entity-based user knowledge stores derived from logs.
📝
Large Search Model: Redefining Search Stack in the Era of LLMs
Microsoft proposes using a single large language model for all search tasks instead of many specialized models, formulating tasks as text generation from prompts.
📝
RecMind: Large Language Model Powered Agent For Recommendation
Introduces an autonomous recommender agent powered by LLMs, which combines a Self-Inspiring (SI) planning algorithm with external tools to provide personalized recommendations.
📝
A recent research direction has explored directly prompting LLMs to perform unsupervised ranking using pointwise, pairwise, or listwise techniques. Some of these techniques even surpass the performance of state-of-the-art supervised systems.
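The three prompting styles differ mainly in what the LLM sees per call: one passage (pointwise), two passages (pairwise), or the whole candidate list (listwise). Below is a minimal sketch under stated assumptions: `llm` is a hypothetical callable that maps a prompt string to the model's text reply, and the prompt wording, the 0 to 10 rating scale, and the all-pairs win-count aggregation are illustrative choices, not the exact prompts or aggregation of any particular paper.

```python
import re
from typing import Callable, List

# Hypothetical LLM interface (assumption): takes a prompt string, returns the reply text.
# Wrap whatever chat/completion client you use in a function with this shape.
LLM = Callable[[str], str]


def pointwise_rank(llm: LLM, query: str, docs: List[str]) -> List[int]:
    """Pointwise: score each passage on its own, then sort by score (best first)."""
    scores = []
    for doc in docs:
        prompt = (
            f"Query: {query}\nPassage: {doc}\n"
            "Rate how relevant the passage is to the query on a scale from 0 to 10. "
            "Answer with a single number."
        )
        match = re.search(r"\d+(?:\.\d+)?", llm(prompt))
        scores.append(float(match.group()) if match else 0.0)  # unparseable -> lowest score
    # sorted() is stable, so tied passages keep their input order.
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def pairwise_rank(llm: LLM, query: str, docs: List[str]) -> List[int]:
    """Pairwise: compare every pair of passages once and sort by number of wins."""
    wins = [0] * len(docs)
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            prompt = (
                f"Query: {query}\nPassage A: {docs[i]}\nPassage B: {docs[j]}\n"
                "Which passage is more relevant to the query? Answer 'A' or 'B'."
            )
            answer = llm(prompt).strip().upper()
            if answer.startswith("A"):
                wins[i] += 1
            elif answer.startswith("B"):
                wins[j] += 1
    return sorted(range(len(docs)), key=lambda i: wins[i], reverse=True)


def listwise_rank(llm: LLM, query: str, docs: List[str]) -> List[int]:
    """Listwise: show all passages at once and parse a permutation like '[2] > [1] > [3]'."""
    numbered = "\n".join(f"[{k + 1}] {doc}" for k, doc in enumerate(docs))
    prompt = (
        f"Query: {query}\n{numbered}\n"
        "Rank the passages from most to least relevant to the query. "
        "Answer only with identifiers, e.g. [2] > [1] > [3]."
    )
    order: List[int] = []
    for tok in re.findall(r"\[(\d+)\]", llm(prompt)):
        idx = int(tok) - 1
        if 0 <= idx < len(docs) and idx not in order:  # drop invalid or repeated ids
            order.append(idx)
    # Passages the model left out are appended in their original relative order.
    order += [i for i in range(len(docs)) if i not in order]
    return order


if __name__ == "__main__":
    docs = ["Berlin is in Germany.", "Paris is the capital of France.", "Cats sleep a lot."]
    # Toy stand-in for a real model, for demonstration only.
    toy_llm = lambda prompt: "[2] > [1] > [3]"
    print(listwise_rank(toy_llm, "capital of France", docs))  # [1, 0, 2]
```

The cost profiles differ: pointwise needs n calls, this all-pairs pairwise variant needs n(n-1)/2 calls, and listwise needs a single call but is bounded by the context window, which is why practical listwise rerankers typically slide a window over long candidate lists. Pairwise answers can also depend on which passage is shown as A, so published pairwise methods often query both orders.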
Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases
Presents a system that uses RAG and a curated dataset to improve the factual accuracy of LLMs.
📝
👨🏽💻
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
Proposes a retrieval model that follows natural language instructions, enabling more flexible and user-friendly search experiences.
📝
👨🏽💻
Self-Retrieval: Building an Information Retrieval System with One Large Language Model
Proposes an end-to-end, LLM-based IR system that leverages the LLM's capabilities for indexing, retrieval, and self-assessment.
📝
👨🏽💻
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
Presents an open-source toolkit that provides a modular framework, pre-implemented RAG algorithms, benchmark datasets, and auxiliary scripts.
📝
👨🏽💻
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
Google DeepMind compares RAG with long-context (LC) LLMs, finding that LC performs better but at a higher cost, and proposes a method that dynamically routes each query to either RAG or LC.
📝
Evaluation of Retrieval-Augmented Generation: A Survey
Presents an analysis framework for systematically evaluating RAG systems in terms of retrieval accuracy, generation quality, and additional factors.
📝
👨🏽💻
Graph Retrieval-Augmented Generation: A Survey
Presents the first comprehensive survey of GraphRAG, detailing its workflow, technologies, applications, and future directions in improving information retrieval and generation.
📝
Agentic Information Retrieval
Proposes a paradigm using LLM agents to expand and transform traditional information retrieval, offering a unified, flexible approach to complex information tasks.
📝
ResumeFlow: An LLM-facilitated Pipeline for Personalized Resume Generation and Refinement
Presents a tool powered by LLMs like GPT-4 and Gemini that tailors resumes to specific job postings.
📝
👨🏽💻
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
Introduces a graph QA framework combining LLMs and GNNs with a retrieval method to enable conversing with textual graphs.
📝
👨🏽💻
BERGEN: A Benchmarking Library for Retrieval-Augmented Generation
Naver introduces a Python library for standardizing RAG experiments and reveals key insights through extensive benchmarking.
📝
👨🏽💻
Fast Exact Retrieval for Nearest-neighbor Lookup (FERN)
Proposes an algorithm for fast exact vector retrieval, inspired by kd-trees, that achieves logarithmic time complexity with 100% recall for high-dimensional vectors.
📝
👨🏽💻
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
Dynamically selects the most suitable retrieval-augmented strategy based on the predicted complexity level of the input query.
📝
👨🏽💻
A Survey of Mamba
Reviews Mamba architecture, highlighting its comparable modeling abilities to Transformers but with near-linear scalability for sequence length, and discusses its advancements, data adaptability, applications, and limitations.
📝
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
Proposes a context compression method that reinterprets document embeddings as retrieval modality features and integrates them into LMs.
📝
👨🏽💻
A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice
Examines Recommender Systems from 2017 to 2024, bridging theory and practice across various sectors, exploring advanced techniques, and addressing industry challenges.
📝
In-context Learning with Retrieved Demonstrations for Language Models: A Survey
Google Research offers a comprehensive analysis of retrieval-based ICL, highlighting key innovations and future paths to enhance demonstration relevance and diversity.
📝
Efficient Multi-Vector Dense Retrieval Using Bit Vectors
Introduces techniques like optimized bit vector filtering, SIMD-based centroid interaction, product quantization, and per-document term filtering.
📝
👨🏽💻
Seven Failure Points When Engineering a Retrieval Augmented Generation System
Investigates failure points of RAG systems through case studies, finding that robustness evolves over time rather than being designed in at the start, and proposes future research directions.
📝
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together
Presents a method for optimizing multi-stage NLP pipelines by alternating between prompt optimization and LM weight fine-tuning.
📝
👨🏽💻
AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings
Enables ranking at any granularity (e.g., sentences) using multi-vector embeddings computed at a coarser level (e.g., passages), and introduces a multi-granular contrastive loss to improve fine-grained ranking performance.
📝
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
Presents a framework for generating domain-specific datasets to evaluate RAG systems. Focuses on vertical domains and introduces new metrics to assess LLMs' knowledge usage.
📝
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
Tsinghua University improves text embeddings in smaller language models (MiniCPM, Phi-2, and Gemma) through contrastive fine-tuning.
📝
👨🏽💻
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Categorizes and examines Chain-of-X (CoX) methods, which generalize the Chain-of-Thought prompting approach to enhance LLM capabilities across various components and application tasks.
📝
Positional encoding is not the same as context: A study on positional encoding for Sequential recommendation
Huawei analyzes positional encodings in transformer-based sequential recommendation systems, proposing new encodings.
📝
👨🏽💻
Enhancing Relevance of Embedding-based Retrieval at Walmart
Walmart presents techniques to enhance embedding-based neural retrieval for its product search, addressing data quality issues and query misspellings.
📝
NV-Retriever: Improving text embedding models with effective hard-negative mining
NVIDIA presents positive-aware hard-negative mining for text embedding models, along with a model that topped the MTEB Retrieval benchmark in July 2024.
📝
👨🏽💻
EfficientRAG: Efficient Retriever for Multi-Hop Question Answering
Presents a retriever for multi-hop QA that iteratively generates queries without using LLMs, outperforming existing RAG methods on three datasets while reducing latency and cost.
📝
A Survey of Generative Search and Recommendation in the Era of Large Language Models
Presents a unified framework for the emerging generative paradigm in search and recommendation that leverages LLMs.
📝
What is the Role of Small Models in the LLM Era: A Survey
Explores the relationship between LLMs and small models, analyzing their collaborative potential and competitive advantages.
📝
👨🏽💻
Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions
Google presents a framework that optimizes LLM embeddings, reducing dimensionality without compromising performance.
📝
Foundation Models for Music: A Survey
Reviews the impact of foundation models such as LLMs and latent diffusion models on music, highlighting their potential for advancing music understanding, generation, and medical applications.
📝
👨🏽💻
A Survey on Benchmarks of Multimodal Large Language Models
Provides a comprehensive review of 180 benchmarks for evaluating Multimodal Large Language Models, categorizing them into five key areas.
📝
👨🏽💻
Corrective Retrieval Augmented Generation
Makes RAG models more robust by self-correcting inaccurate retrievals with confidence scores, web searches, and knowledge refinement, boosting generation accuracy.
📝
👨🏽💻
RouterRetriever: Exploring the Benefits of Routing over Multiple Expert Embedding Models
Introduces a flexible, multi-expert approach to information retrieval that routes queries to domain-specific experts.
📝
👨🏽💻
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
Enhances the ColBERT multi-vector model for multilingual retrieval, incorporating diverse training data and efficiency improvements.
📝
👨🏽💻
LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding
Improves the performance of existing retriever models by enriching document embeddings with contextual information.
📝
Generative Information Retrieval Evaluation
Examines the use of LLMs in two aspects of IR evaluation: leveraging LLMs as evaluation tools, and evaluating LLM-based generative IR systems, while addressing the continued need for human assessment.
📝
Exploring Query Understanding for Amazon Product Search
Drawing on a year-long study, Amazon examines the role of query understanding in its product search, explores its impact on ranking features and model evaluation, and proposes a framework that builds on query understanding.
📝
Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks
Presents a paradigm that breaks down complex RAG systems into flexible, reconfigurable modules and operators.
📝
Efficient Retrieval with Learned Similarities
Introduces Mixture-of-Logits (MoL) as a universal approximator for learned similarity functions in retrieval tasks, proposing efficient techniques for approximate top-K retrieval.
📝
It's About Time: Incorporating Temporality in Retrieval Augmented Language Models
Augments neural retrievers with temporal relevance to handle evolving information better, crucial for temporal question answering and fact checking.
📝
Re-Ranking Step by Step: Investigating Pre-Filtering for Re-Ranking with Large Language Models
Introduces a pre-filtering method for IR systems that uses LLMs and minimal human input to remove irrelevant passages before re-ranking.
📝
Golden-Retriever: High-Fidelity Agentic Retrieval Augmented Generation for Industrial Knowledge Base
Enhances RAG with reflection-based question augmentation, improving retrieval accuracy by clarifying jargon and context before document retrieval.
📝
Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs
Explores in-context learning in large language models, proposing that it combines knowledge retrieval and learning from examples.
📝
👨🏽💻
Mindful-RAG: A Study of Points of Failure in Retrieval Augmented Generation
Identifies key issues in knowledge graph-based RAG systems for LLMs, focusing on question intent and context alignment.
📝
Recommendation with Generative Models
Offers a comprehensive exploration of generative models in recommender systems, introducing a novel taxonomy and covering system design, evaluation methods, and societal implications.
📝
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Reviews data collection, processing, and evaluation methods for training and assessing multimodal LLMs, providing a data-centric perspective.
📝
👨🏽💻
Learning to Retrieve In-Context Examples for Large Language Models
Introduces an iterative framework for in-context learning with LLMs, which effectively retrieves high-quality examples to improve learning performance across a diverse set of NLP tasks.
📝
Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers
Distills complex pairwise ranking instructions into simpler pointwise instructions to improve the effectiveness of LLMs for zero-shot ranking.
📝
👨🏽💻
Context Embeddings for Efficient Answer Generation in RAG
Speeds up generation time while improving answer quality by compressing multiple contexts into a small number of embeddings, offering flexible compression rates.
📝
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
Intel presents a framework that integrates training, inference, evaluation, and more to streamline the development of RAG systems for LLMs.
📝
👨🏽💻
Multilingual E5 Text Embeddings: A Technical Report
Microsoft describes the training methodology and evaluation of its open-source multilingual E5 text embedding models, which cover over 100 languages.
📝
👨🏽💻