Elevating Chatbot Accuracy: Mastering MultiQueryRetriever and SelfQueryRetriever with Contextual Compression

Tayyab Bilal
June 21, 2024

Introduction

In the fast-paced world of machine learning, ensuring that chatbots provide accurate and contextually relevant responses is critical. At Quids, we specialize in creating intelligent solutions to combat the issue of inaccurate responses, often referred to as hallucinations. This blog post will explore two powerful Langchain retrievers—MultiQueryRetriever and SelfQueryRetriever—and how we enhance the latter with Contextual Compression. We’ll walk you through their benefits, show you how to implement them, and share how they enhance the chatbots integrated into our sister company Mamaar.ai’s Shopify shop.

The Problem: Tackling Inaccurate Responses and Hallucinations

Before diving into the solutions, it’s essential to understand the problem at hand. Hallucinations occur when a chatbot generates plausible but incorrect or nonsensical answers. This issue can significantly undermine user trust and satisfaction, as users rely on the chatbot to provide accurate and helpful information. Hallucinations often arise when the chatbot lacks sufficient context to understand the query fully or when it misinterprets the available data.

One common cause of hallucinations is the chatbot’s limited context window. Chatbots typically process a fixed amount of information at a time, and if the relevant data is not within this window, the bot may struggle to generate accurate responses. Simply loading the entire dataset into the context window isn’t a viable solution due to technical limitations such as memory constraints and processing power. Additionally, an overloaded context window can lead to increased noise, making it harder for the chatbot to identify the most pertinent information.

To address this challenge, we employ advanced retrievers to enhance the accuracy and reliability of chatbot responses. These retrievers are designed to fetch the most relevant documents or data chunks based on the user’s query, ensuring that the chatbot has access to the necessary context without overwhelming it with irrelevant information. By strategically selecting and presenting the right data, these retrievers help the chatbot generate more accurate and contextually appropriate responses, thereby reducing the occurrence of hallucinations and improving overall user satisfaction.
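As a rough intuition for what any such retriever does under the hood, here is a toy sketch (plain Python, no LangChain) of scoring chunks against a query and keeping only the top-k. Real retrievers use learned embeddings; the word-overlap `score` below is just a hypothetical stand-in for illustration:

```python
# Toy illustration: instead of stuffing every document into the context
# window, a retriever scores chunks against the query and keeps only the
# top-k most relevant ones. Word overlap stands in for embedding similarity.

def score(query: str, chunk: str) -> float:
    """Fraction of query words that appear in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that score highest against the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

chunks = [
    "Employees are entitled to 20 working days of paid annual leave.",
    "The cafeteria is open from 8 AM to 6 PM on weekdays.",
    "Sabbatical leave is available after five years of service.",
]

print(retrieve("how much annual leave do employees get", chunks, k=1))
```

Only the leave-policy chunk reaches the context window; the irrelevant cafeteria chunk never competes for space.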

Setting Up Documents and Metadata

A crucial step in leveraging Langchain Retrievers is structuring your documents and setting up meaningful metadata. Metadata acts as the contextual backbone of the documents, enhancing the retriever’s ability to discern relevant information. For example, in the following setup, documents RWP1005 and RWP1006 cover the same topic, but the “version” parameter in their metadata provides crucial information about which document would be more relevant to the user’s query. This differentiation is particularly important for SelfQueryRetriever, as we will explore later in this blog. Moreover, meaningful metadata can aid in categorizing content, improving search efficiency, and ensuring that the most accurate and contextually appropriate responses are delivered. For brevity, I have excluded the content of the documents in the code examples, but they are available at the following drive link (Blog 2 files) for you to download and test the code yourself.

from langchain_core.documents import Document


docs = [
    Document(
        page_content="""This policy document sets forth the standards for code quality at our company, aimed at ensuring that all software......""",
        metadata={
            "revision_date": "2023-06-01",
            "version": "1.0",
            "category": "Code Standards",
            "author": "Tech Department",
            "review_frequency": "Annually",
            "compliance_required": "High",
            "applicability": "Tech team",
            "confidentiality_level": "Internal",
            "document_id": "CS1001"
        },
    ),
    Document(
        page_content="""This policy document outlines the leave provisions offered by our company......""",
        metadata={
            "revision_date": "2023-06-10",
            "version": "1.2",
            "category": "Leave Policy",
            "author": "HR Department",
            "review_frequency": "Biannually",
            "eligibility": "After 6 months of employment",
            "confidentiality_level": "Internal",
            "document_id": "LP1002"
        },
    ),
    Document(
        page_content="""This policy document outlines the guidelines and procedures for remote work at our company, designed to provide employees with the flexibility to work from locations outside the traditional office environment. Our remote work policy is aimed at boosting productivity, promoting a better work-life balance, and enhancing overall employee satisfaction.

        1. Eligibility and Scope:
        - Remote work is available to all employees whose job duties are compatible with working from a location outside the office.
        - Eligibility for remote work is determined based on the role, performance history, and the operational needs of the department.

        2. Work Arrangement Types:
        - Full-time Remote: Employees can work remotely on a full-time basis, provided their role and departmental guidelines support this arrangement.
        - Hybrid Remote: Employees may choose to work part of the week in the office and part from a remote location. The specific days in the office will be coordinated with the team leader to ensure team cohesion and operational efficiency.
        - Temporary Remote: Employees may request temporary remote work arrangements for specific reasons such as health issues, caregiving responsibilities, or extraordinary circumstances like natural disasters.......""",
        metadata={
            "revision_date": "2023-06-12",
            "version": "2.0",
            "category": "Remote Work Policy",
            "author": "HR Department",
            "review_frequency": "Annually",
            "flexibility": "High",
            "confidentiality_level": "Public",
            "document_id": "RWP1005"
        },
    ),
    
    Document(
        page_content="""
        This policy document has been updated to reflect our evolving approach to remote work, with a focus on enhancing security, flexibility, and work-life integration. This version introduces new guidelines and resources to support our employees in a more dynamic remote working environment.

        1. Eligibility and Scope:
        - Remote work is now categorized into flexible, fixed, and temporary types to better accommodate diverse employee needs and job functions.
        - Eligibility criteria have been updated to include an assessment of the employee's remote work environment to ensure it meets our new ergonomic and security standards.

        2. Work Arrangement Types:
        - Flexible Remote: Employees can choose their remote workdays dynamically, subject to a maximum of three days per week, with prior approval from their supervisor.
        - Fixed Remote: For roles that are permanently remote, employees will not have an office space in the company premises but are required to live within commuting distance for occasional in-office meetings.
        - Project-based Remote: Temporary remote work arrangements are now project-based and will be reviewed at the end of each project or annually, whichever comes first.....""",
        metadata={
        "revision_date": "2024-06-12",
        "version": "3.0",
        "category": "Remote Work Policy",
        "author": "HR Department",
        "review_frequency": "Biannually",
        "flexibility": "High",
        "confidentiality_level": "Public",
        "document_id": "RWP1006"
        },
    )
]

MultiQueryRetriever: Enhancing Response Accuracy

Now that we have set up the documents, let’s dive into the MultiQueryRetriever. As the name suggests, MultiQueryRetriever is designed to handle diverse query formulations. It takes the user’s query and uses a large language model (LLM) to generate multiple variants of it. This approach is particularly useful when the original query requires information on multiple topics to be answered comprehensively. By generating multiple queries, we can fetch documents for each variant, ensuring the retrieval of the most relevant information. Here’s why MultiQueryRetriever stands out:

  • Handles Variability: By generating multiple query variants, it captures relevant documents even if the user’s query is ambiguous or phrased differently.
  • Improved Accuracy: Cross-checking results across multiple queries leads to more accurate and reliable responses.
  • Enhanced Coverage: By addressing various aspects of the query, it ensures that no critical information is overlooked.
  • Robustness: Helps in dealing with complex and multifaceted queries, making the retrieval process more comprehensive.

This powerful retriever significantly boosts the performance and reliability of chatbots, providing users with precise and contextually relevant answers.

Implementing MultiQueryRetriever

Now, let’s look at how to implement MultiQueryRetriever:

# Import necessary modules for document processing and vector database creation
import logging
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

# Set up logging to capture information-level logs for the multi-query retriever
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)


# Initialize the text splitter with a specified chunk size and overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)

# Assume 'docs' is a list of documents already loaded
# Split the documents into smaller chunks for better processing
splits = text_splitter.split_documents(docs)

# Initialize the OpenAI embeddings with the provided API key
embedding = OpenAIEmbeddings(api_key=<openai_api_key>)

# Create a Chroma vector database from the document splits using the embeddings
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)

# Initialize the OpenAI Chat model with a specified temperature and API key
llm = ChatOpenAI(temperature=0, api_key=<openai_api_key>)

# Create a MultiQueryRetriever by combining the vector database retriever with the LLM
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(), llm=llm
)

Running the query

# Define the user query
question = "How many annual and sabbatical leaves do I have?"

# Use the MultiQueryRetriever to fetch relevant documents based on the question
unique_docs = retriever_from_llm.invoke(question)

# Iterate through the fetched documents and print their content
for fetched_doc in unique_docs:
    print("########")
    print(fetched_doc.page_content)

Result

########
4. Sabbatical Leave:
    - After five years of continuous service, employees are eligible to apply for a sabbatical leave of up to six months.
    - Sabbaticals are intended for personal or professional development and must be approved by senior management based on the proposed plan.
    - Employees on sabbatical receive 50% of their regular salary and are guaranteed the same or a similar position upon return.
########
This policy document outlines the leave provisions offered by our company, designed to support the well-being and work-life balance of our employees. The following leave options are available to all employees who have completed at least six months of continuous service.
1. Annual Leave:
    - Employees are entitled to 20 working days of paid annual leave each year.
    - Leave must be scheduled at least two weeks in advance and approved by the direct supervisor to ensure continuity of operations.
    - Unused leave can be carried over to the next year, with a maximum of 5 days being transferable.
Notice that the user asked about two different topics: annual and sabbatical leave. In the source document, these sections do not appear consecutively, so a single similarity search could easily miss one of them. By generating multiple query variants and merging their results, the retriever fetched both relevant chunks in one pass.
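Conceptually, the retriever runs each variant through the base retriever and returns the de-duplicated union of the results. A minimal pure-Python sketch of that merge step (the word-overlap `base_retrieve` is a toy stand-in for vector search, and the hard-coded `variants` list stands in for the LLM-generated variants):

```python
# Conceptual sketch of the MultiQueryRetriever merge step: retrieve per
# variant, then take the de-duplicated union, preserving order.

def base_retrieve(query: str, chunks: list[str]) -> list[str]:
    """Stand-in retriever: return chunks sharing any word with the query."""
    q = set(query.lower().split())
    return [c for c in chunks if q & set(c.lower().split())]

def multi_query_retrieve(variants: list[str], chunks: list[str]) -> list[str]:
    """Union of per-variant results, de-duplicated, order preserved."""
    seen, merged = set(), []
    for variant in variants:
        for doc in base_retrieve(variant, chunks):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

chunks = [
    "annual leave: 20 paid days per year",
    "sabbatical leave: six months after five years",
    "code reviews are mandatory for all merges",
]
variants = ["how many annual leave days", "sabbatical leave eligibility"]
print(multi_query_retrieve(variants, chunks))
```

Each variant pulls in the chunk it targets, and the merge yields both relevant chunks without duplicates, mirroring the `unique_docs` behavior above.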

SelfQueryRetriever: Context-Aware Document Retrieval

Now let’s move on to the SelfQueryRetriever, a method that takes a more context-focused approach. This retriever shines when users ask questions that are better answered by fetching documents based on metadata rather than mere text similarity.

Utilizing an LLM, SelfQueryRetriever transforms user input into two components:

  1. a string for semantic lookup.
  2. a metadata filter to refine the search.

This approach is particularly useful when documents are structured such that the content alone is not sufficient to determine relevance, and decisions must be made based on metadata. 
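To make this concrete, here is an illustrative pure-Python sketch of the decomposition. The hard-coded `semantic_query` and `metadata_filter` stand in for what the LLM would extract from the user's question, and word overlap stands in for embedding similarity:

```python
# Illustrative sketch of what SelfQueryRetriever does internally: the user
# query is split into a semantic search string and a metadata filter, and
# the filter narrows the candidate set before semantic matching.

docs = [
    {"content": "work arrangement types: full-time, hybrid, temporary",
     "metadata": {"category": "Remote Work Policy", "version": "2.0"}},
    {"content": "work arrangement types: flexible, fixed, project-based",
     "metadata": {"category": "Remote Work Policy", "version": "3.0"}},
]

# What the LLM might extract from:
# "what are the work arrangement types in version 2.0 of remote work policy?"
semantic_query = "work arrangement types"
metadata_filter = {"version": "2.0"}

def self_query(semantic_query: str, metadata_filter: dict, docs: list) -> list:
    """Apply the metadata filter first, then match on content overlap."""
    filtered = [d for d in docs
                if all(d["metadata"].get(k) == v for k, v in metadata_filter.items())]
    q = set(semantic_query.lower().split())
    return [d for d in filtered if q & set(d["content"].lower().split())]

results = self_query(semantic_query, metadata_filter, docs)
print(results[0]["metadata"]["version"])  # → 2.0
```

Because the filter runs first, the version 3.0 document is never a candidate, even though its content is nearly identical.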

For this demo, we will focus on documents RWP1005 and RWP1006 as they have similar content but different version numbers in the metadata. Here are the key advantages of SelfQueryRetriever:

  • Context-Aware: Generates highly relevant queries that align closely with the user’s initial query, ensuring the retrieval of precise information.
  • Reduced Noise: Focuses on context to minimize irrelevant document retrieval, thereby enhancing the accuracy of responses.
  • Metadata-Driven: Leverages metadata to filter and select the most relevant documents, which is crucial when document content is insufficient for determining relevance.

This retriever is ideal for scenarios where the right document must be chosen based on metadata, significantly improving the chatbot’s ability to provide accurate and contextually appropriate answers.

Implementing SelfQueryRetriever

Here’s a quick implementation of SelfQueryRetriever:

# Import necessary modules for creating a vector store and embeddings
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI

# Initialize a Chroma vector store using documents and OpenAI embeddings
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings(api_key=<openai_api_key>))

# Define metadata field information for the documents
metadata_field_info = [
    AttributeInfo(
        name="revision_date",
        description="The date on which the document was last updated or revised.",
        type="string",
    ),
    AttributeInfo(
        name="version",
        description="Indicates the version number of the document, helping to track changes and updates over time.",
        type="string",
    ),
    AttributeInfo(
        name="category",
        description="Classifies the document into a specific category, which in this case is related to the type of policy or topic addressed.",
        type="string",
    ),
    AttributeInfo(
        name="author",
        description="The person or department responsible for creating or maintaining the document.",
        type="string",
    ),
    AttributeInfo(
        name="review_frequency",
        description="Specifies how often the document is formally reviewed and updated.",
        type="string",
    ),
    AttributeInfo(
        name="confidentiality_level",
        description="Indicates the level of confidentiality of the document, which dictates who can access and view the document.",
        type="string",
    ),
    AttributeInfo(
        name="document_id",
        description="A unique identifier assigned to the document for tracking and retrieval purposes.",
        type="string",
    ),
]

# Description of the document content
document_content_description = "Documents of a software company"

# Initialize the OpenAI Chat model with a specified temperature and API key
llm = ChatOpenAI(temperature=0, api_key=<openai_api_key>)

# Create a SelfQueryRetriever using the LLM, vector store, document content description, and metadata field info
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
)

Running the query:

To demonstrate the use of SelfQueryRetriever, we will query for the same content while specifying the “version” parameter in the query. This will guide the retriever to fetch information from the correct document.

Test 1
fetched_docs = retriever.invoke("what are the work arrangement types in version 2.0 of remote work policy?")
print(f"Version number: {fetched_docs[0].metadata['version']}")
print(fetched_docs[0].page_content)
Result
Version number: 2.0
This policy document outlines the guidelines and procedures for remote work at our company, designed to provide employees with the flexibility to work from locations outside the traditional office environment. Our remote work policy is aimed at boosting productivity, promoting a better work-life balance, and enhancing overall employee satisfaction.

        1. Eligibility and Scope:
        - Remote work is available to all employees whose job duties are compatible with working from a location outside the office.
        - Eligibility for remote work is determined based on the role, performance history, and the operational needs of the department.

        2. Work Arrangement Types:
        - Full-time Remote: Employees can work remotely on a full-time basis, provided their role and departmental guidelines support this arrangement.
        - Hybrid Remote: Employees may choose to work part of the week in the office and part from a remote location. The specific days in the office will be coordinated with the team leader to ensure team cohesion and operational efficiency.
        - Temporary Remote: Employees may request temporary remote work arrangements for specific reasons such as health issues, caregiving responsibilities, or extraordinary circumstances like natural disasters.

        3. **Work Hours and Availability**:
        - Remote employees must follow their standard work hours unless flexible scheduling is approved by their manager. This is to ensure alignment with team objectives and company operations.

        .
        .
        .
Test 2
fetched_docs = retriever.invoke("what are the work arrangement types in version 3.0 of remote work policy?")
print(f"Version number: {fetched_docs[0].metadata['version']}")
print(fetched_docs[0].page_content)
Result
Version number: 3.0

        This policy document has been updated to reflect our evolving approach to remote work, with a focus on enhancing security, flexibility, and work-life integration. This version introduces new guidelines and resources to support our employees in a more dynamic remote working environment.

        1. Eligibility and Scope:
        - Remote work is now categorized into flexible, fixed, and temporary types to better accommodate diverse employee needs and job functions.
        - Eligibility criteria have been updated to include an assessment of the employee's remote work environment to ensure it meets our new ergonomic and security standards.

        2. Work Arrangement Types:
        - Flexible Remote: Employees can choose their remote workdays dynamically, subject to a maximum of three days per week, with prior approval from their supervisor.
        - Fixed Remote: For roles that are permanently remote, employees will not have an office space in the company premises but are required to live within commuting distance for occasional in-office meetings.
        - Project-based Remote: Temporary remote work arrangements are now project-based and will be reviewed at the end of each project or annually, whichever comes first.

        3. **Work Hours and Availability**:
        - Introduction of "core hours" from 10 AM to 2 PM, during which all remote employees must be available regardless of their time zone.
        .
        .
        .

We can see that in both cases the retriever understood the user query, formulated a metadata filter, and matched the query against both content and metadata, ultimately returning the correct document. It is important to note, however, that this method has a caveat: it returns the entire document instead of the specific chunk needed. We will address this issue in the next section.

Boosting SelfQueryRetriever with Contextual Compression

While SelfQueryRetriever is highly effective, we saw that it returns entire documents, which can introduce unnecessary noise. To address this, we utilize Langchain’s Contextual Compression to extract only the most relevant parts of a document using an LLM. The idea is simple: instead of returning the retrieved documents as-is, we compress them based on the query context so that only the pertinent information is provided. Here, “compressing” means both reducing the content of individual documents and filtering out irrelevant documents entirely. The ContextualCompressionRetriever consists of two parts:

  1. A base retriever (in our case a SelfQueryRetriever, but this could be any retriever of your choice).
  2. A document compressor.

The ContextualCompressionRetriever passes the query to the base retriever, which fetches the initial documents. These are then processed by the document compressor, which shortens the list by condensing the contents of documents or eliminating irrelevant ones. To summarize why Contextual Compression is beneficial:

  • Selective Extraction: It extracts only the relevant sections of a document, significantly reducing irrelevant information and noise.
  • Enhanced Accuracy: By focusing on contextually relevant data, it delivers more precise and accurate responses.
  • Improved Efficiency: It minimizes the amount of data processed and presented, leading to faster and more efficient query handling.
  • Better User Experience: Users receive concise and relevant information, enhancing their overall interaction with the chatbot.

With Contextual Compression, we ensure that our chatbot not only retrieves the right documents but also delivers the most relevant content, further improving the accuracy and reliability of responses.
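As a rough mental model of what the compressor does, here is a toy pure-Python sketch that keeps only the sentences of a retrieved document that overlap with the query. The real LLMChainExtractor delegates this extraction to an LLM rather than the keyword heuristic used here:

```python
# Toy stand-in for the document compressor: keep only sentences that share
# enough content words with the query, discarding the rest of the document.

def compress(query: str, document: str, min_overlap: int = 2) -> str:
    """Keep sentences sharing at least min_overlap content words with the query."""
    q = {w for w in query.lower().split() if len(w) > 3}  # skip short stopwords
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    relevant = [s for s in sentences
                if len(q & set(s.lower().split())) >= min_overlap]
    return ". ".join(relevant) + ("." if relevant else "")

doc = ("This policy covers remote work. Work arrangement types include "
       "full-time and hybrid remote. The cafeteria opens at 8 AM.")
print(compress("what are the work arrangement types", doc))
```

Only the sentence about work arrangement types survives; the surrounding policy boilerplate is stripped before it ever reaches the chatbot’s context window.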

Implementing Contextual Compression

Let’s explore how Contextual Compression can be integrated with SelfQueryRetriever to boost its performance:
# Import necessary modules for Contextual Compression
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import OpenAI

# Initialize the OpenAI LLM with a specified temperature and API key
llm = OpenAI(temperature=0, api_key=<openai_api_key>)

# Create a compressor using LLMChainExtractor from the initialized LLM
compressor = LLMChainExtractor.from_llm(llm)

# Create a ContextualCompressionRetriever by combining the base compressor and base retriever
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

Running the query

compressed_docs = compression_retriever.invoke(
    "What are the work arrangement types in version 2.0 of the remote work policy?"
)
print(f"Version number: {compressed_docs[0].metadata['version']}")
print(compressed_docs[0].page_content)

Result

Version number: 2.0
- Work Arrangement Types:
        - Full-time Remote: Employees can work remotely on a full-time basis, provided their role and departmental guidelines support this arrangement.
        - Hybrid Remote: Employees may choose to work part of the week in the office and part from a remote location. The specific days in the office will be coordinated with the team leader to ensure team cohesion and operational efficiency.
        - Temporary Remote: Employees may request temporary remote work arrangements for specific reasons such as health issues, caregiving responsibilities, or extraordinary circumstances like natural disasters.

Conclusion

Enhancing the accuracy and reliability of chatbot responses is crucial for delivering a superior user experience. By utilizing advanced retrievers like MultiQueryRetriever and SelfQueryRetriever, along with the powerful Contextual Compression technique, we can ensure that users receive precise and contextually relevant information. These methodologies address common issues such as inaccurate responses and hallucinations, ultimately leading to more effective and trustworthy chatbots.

We hope this guide has provided you with valuable insights and practical examples to implement these tools in your own projects, driving better outcomes and higher user satisfaction. Thank you for reading, and we look forward to seeing how you leverage these techniques to enhance your chatbot solutions. Also, do check out the chatbot at the Mamaar.ai site to see these concepts in action!