Talk with your documents locally using LangChain and Llama 2

Posted By: Sumit Kumar | 23rd February 2024

In recent years, the AI industry has grown significantly, and it is still growing rapidly. AI is slowly becoming an integral part of our daily lives, raising our productivity while cutting the time and effort our tasks require. Whether we need step-by-step guidance while cooking or want to automate large industrial machines, AI is here to help.


Suppose you have an exam coming up in a week and most of the topics are still left to prepare, or you just need to check whether your answer is correct. Riffling through hundreds or thousands of pages will clearly cost you precious time. Why not just ask your book the question and have it spit out the answer? But that's impossible...

Well, no! Not when AI is around.


In a few minutes you will be able to do exactly that. We will create an AI tool that lets you feed in your document(s) and ask questions about them.

In this post, we will cover:

  1. A brief overview of Llama 2
  2. A brief overview of LangChain
  3. A brief overview of Hugging Face
  4. How to ingest your local documents
  5. How to use Llama 2 to fetch answers from the ingested documents


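1. A brief overview of Llama 2:

Llama 2 is an openly available large language model (LLM) released by Meta under its own community license, which permits both research and commercial use.

Llama 2 comes in three variants, named after their parameter counts (in billions):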

  1. Llama 2 7B
  2. Llama 2 13B
  3. Llama 2 70B
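
Since we want to run the model locally, the variant you pick largely decides how much memory you need. As a rough back-of-the-envelope sketch (it counts the weights only and uses the nominal parameter counts from the model names; the helper name weight_memory_gb is ours, not part of any library):

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# Nominal parameter counts taken from the variant names (an approximation).
PARAMS = {"Llama 2 7B": 7e9, "Llama 2 13B": 13e9, "Llama 2 70B": 70e9}


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for name, n in PARAMS.items():
    fp16 = weight_memory_gb(n, 16)  # 16-bit floating point weights
    q4 = weight_memory_gb(n, 4)     # 4-bit quantized weights
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

Actual usage is higher once activations and the context cache are added, so treat these numbers as lower bounds.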


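2. A brief overview of LangChain:

LangChain is an open-source framework designed to simplify building applications on top of large language models (LLMs). It provides building blocks such as text splitters, vector store wrappers, prompt templates, and chains, all of which we use in the steps below.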


3. A brief overview of Hugging Face:

Hugging Face is a platform where the machine learning community collaborates on models, datasets, and applications. We use it to download the Llama 2 7B chat model.
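
If you want to script the download, a minimal sketch could look like the following. It assumes the huggingface_hub package is installed, that you have logged in with an access token, and that your account has been granted access to the gated meta-llama/Llama-2-7b-chat-hf repository. The repository id and the function name are our assumptions, not something fixed by this post.

```python
def download_llama2_chat(repo_id: str = "meta-llama/Llama-2-7b-chat-hf") -> str:
    """Download a model repository from the Hugging Face Hub and return its local path.

    Assumptions: `huggingface_hub` is installed, you are logged in, and access
    to the (gated) repository has already been granted to your account.
    """
    # Imported lazily so that defining this function does not require the package.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repository into the local cache.
    return snapshot_download(repo_id=repo_id)


# Example call (downloads several gigabytes, so it is left commented out):
# local_path = download_llama2_chat()
# print("Model files are in:", local_path)
```

Calling download_llama2_chat() returns the local folder that holds the downloaded files; later calls reuse the local cache instead of downloading everything again.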


4. How to ingest your local documents:

Ingesting a document involves the following steps:


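Split text into chunks:

Chunking is the process of breaking a large piece of text into smaller segments, so that each segment can be embedded and retrieved on its own. chunk_size is the maximum length of a chunk in characters, and chunk_overlap is the number of characters that neighbouring chunks share.

    from langchain.text_splitter import Language, RecursiveCharacterTextSplitter

    # splitters for plain text and for Python source files
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    python_splitter = RecursiveCharacterTextSplitter.from_language(
        language=Language.PYTHON, chunk_size=880, chunk_overlap=200
    )
    # creating chunks of texts
    # (text_documents is the list of documents you loaded from disk beforehand)
    texts = text_splitter.split_documents(text_documents)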
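
To get a feel for what chunk_size and chunk_overlap do, here is a simplified sketch in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to cut at separators such as paragraph and line breaks); it only illustrates the sliding-window idea, and split_with_overlap is a hypothetical helper of our own.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one. That overlap is what keeps a sentence that straddles a chunk boundary from being lost to both chunks.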

Create embeddings:

An embedding is a relatively low-dimensional space into which high-dimensional vectors can be translated. Ideally, an embedding captures some of the input's semantics by placing semantically similar inputs close together in the embedding space. Embeddings can be learned and reused across models.

   from langchain.embeddings import HuggingFaceInstructEmbeddings

   # Create embeddings
   # device_type: the device the embedding model runs on, e.g. "cpu" or "cuda"
   embeddings = HuggingFaceInstructEmbeddings(
       model_kwargs={"device": device_type},
   )
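
What "close together" means can be shown with a toy example. The three-dimensional vectors below are made up by hand purely for illustration (a real embedding model produces vectors with hundreds of dimensions), and cosine similarity is one common way to compare them.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors, NOT the output of a real embedding model.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

related = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
unrelated = cosine_similarity(toy_embeddings["cat"], toy_embeddings["invoice"])
print(f"cat vs kitten:  {related:.2f}")
print(f"cat vs invoice: {unrelated:.2f}")
```

The related pair scores close to 1 and the unrelated pair close to 0, which is the property the retrieval step relies on.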

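Store the embeddings:

We are using Chroma (ChromaDB) to store the embeddings. Here texts and embeddings come from the two previous steps, and PERSIST_DIRECTORY stands for the folder where you want the database saved on disk.

   from langchain.vectorstores import Chroma
   db = Chroma.from_documents(texts, embeddings, persist_directory=PERSIST_DIRECTORY)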

5. Use Llama 2 to fetch answers from your documents:

To run question answering (QA) over our documents, we follow these steps:

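Load the vectors:

Vectors represent text or data in a numerical form that the model can understand and process. This representation is known as an embedding. Here we reopen the vector store created during ingestion and wrap it in a retriever, which returns the stored chunks most similar to a query.

   # load the vectorstore
   # PERSIST_DIRECTORY: the folder the database was saved to during ingestion
   db = Chroma(persist_directory=PERSIST_DIRECTORY, embedding_function=embeddings)

   retriever = db.as_retriever()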
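
Conceptually, a vector store plus retriever does two things: it keeps (chunk, vector) pairs, and it returns the chunks whose vectors best match the query's vector. The sketch below imitates that in plain Python. ToyVectorStore and embed are hypothetical names of our own, word counts stand in for a real embedding model, and Chroma itself does far more (persistence, indexing, metadata).

```python
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse vector of word counts."""
    return Counter(text.lower().split())


def dot(a: Counter, b: Counter) -> int:
    """Dot product of two sparse word-count vectors (shared words score higher)."""
    return sum(count * b[word] for word, count in a.items())


class ToyVectorStore:
    """In-memory list of (chunk, vector) pairs with a similarity search."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 1) -> list[str]:
        query_vector = embed(query)
        ranked = sorted(
            self._items, key=lambda item: dot(query_vector, item[1]), reverse=True
        )
        return [chunk for chunk, _ in ranked[:k]]


store = ToyVectorStore()
for chunk in [
    "llama 2 comes in 7b 13b and 70b sizes",
    "chroma stores embeddings on disk",
    "langchain connects prompts models and retrievers",
]:
    store.add(chunk)

top = store.search("where are embeddings stored", k=1)
print(top)  # ['chroma stores embeddings on disk']
```

The chunks returned by the search are what later gets pasted into the prompt as context for the model.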

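Create Prompt Template:

The template needs one placeholder for every name listed in input_variables: {context} receives the retrieved chunks, {history} the earlier conversation, and {question} the user's query.

   from langchain.memory import ConversationBufferMemory
   from langchain.prompts import PromptTemplate

   template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, \
   just say that you don't know, don't try to make up an answer.

   {context}

   {history}
   Question: {question}
   Helpful Answer:"""

   prompt = PromptTemplate(
       input_variables=["history", "context", "question"], template=template)
   # conversation memory that fills the {history} placeholder
   memory = ConversationBufferMemory(input_key="question", memory_key="history")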
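
To see what the model actually receives, it helps to fill in a template by hand. The sketch below uses Python's built-in str.format, which for simple named placeholders like these behaves the same way as PromptTemplate's default formatting. The template text here carries all three placeholders, and the sample context and question are made up for illustration.

```python
# A prompt with the three placeholders: context, history and question.
template = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "{history}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

# Made-up sample values; in the real chain these come from the retriever,
# the conversation memory and the user.
filled = template.format(
    context="Llama 2 comes in three variants: 7B, 13B and 70B.",
    history="",
    question="How many variants does Llama 2 have?",
)
print(filled)
```

If a placeholder named in input_variables is missing from the template text, its value never reaches the model, so check that the two always match.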

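Perform QA:

RetrievalQA (imported from langchain.chains) ties the pieces together. Here llm is the Llama 2 7B chat model already loaded as a LangChain LLM (the loading step is not shown in this post), and return_source_documents=True makes the chain return the retrieved chunks along with the answer.

   qa = RetrievalQA.from_chain_type(
       llm=llm, chain_type="stuff", retriever=retriever,
       # chain_type="refine",
       return_source_documents=True,
       chain_type_kwargs={"prompt": prompt, "memory": memory},
   )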

To perform interactive QA, run the process in a loop:

   # Interactive questions and answers
   while True:
       query = input("\nEnter a query: ")
       if query == "exit":
           break
       # Get the answer from the chain
       res = qa(query)
       answer, docs = res["result"], res["source_documents"]

       # Print the result
       print("\n\n> Question:")
       print(query)
       print("\n> Answer:")
       print(answer)
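
Because loading the model is slow, it can be handy to check the loop logic without it. One possible approach, sketched below, is to move the loop into a function that takes the chain and the queries as arguments, and to pass in a stand-in for the chain. run_qa_loop and fake_qa are hypothetical names of our own; the stand-in only mimics the dictionary shape that the loop above reads.

```python
from typing import Callable, Iterable


def run_qa_loop(qa: Callable[[str], dict], queries: Iterable[str]) -> list[str]:
    """Send each query to `qa` until "exit" is seen; return the answers in order."""
    answers = []
    for query in queries:
        if query == "exit":
            break
        res = qa(query)
        answers.append(res["result"])
    return answers


def fake_qa(query: str) -> dict:
    """Stand-in for the real chain: same keys, but no model behind it."""
    return {"result": f"echo: {query}", "source_documents": []}


answers = run_qa_loop(fake_qa, ["What is Llama 2?", "exit", "never asked"])
print(answers)  # ['echo: What is Llama 2?']
```

Once the logic behaves as expected, swap fake_qa for the real qa chain and feed the function queries read from input().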



