LlamaIndex

Jerry Liu Jun 22, 2023

LlamaIndex and Weaviate

Co-authors:

  • Jerry Liu (co-founder/CEO of LlamaIndex)
  • Erika Cardenas (Developer Advocate, Weaviate)

While large language models (LLMs) like GPT-4 have impressive capabilities in generation and reasoning, they have limitations in terms of their ability to access and retrieve specific facts, figures, or contextually relevant information. A popular solution to this problem is setting up a retrieval-augmented generation (RAG) system: combine the language model with an external storage provider, and create an overall software system that can orchestrate the interactions with and between these components in order to create a “chat with your data” experience.

The combination of Weaviate and LlamaIndex provides the critical components needed to easily set up a powerful and reliable RAG stack, so that you can deliver LLM-enabled experiences over your data, such as search engines, chatbots, and more. First, we can use Weaviate as the vector database that acts as the external storage provider. Next, we can use a powerful data framework such as LlamaIndex to help with data management and orchestration around Weaviate when building the LLM app.

In this blog post, we walk through an overview of LlamaIndex and some of the core data management and query modules. We then go through an initial demo notebook.

We’re kicking off a new series to guide you on how to use LlamaIndex and Weaviate for your LLM applications.

An Introduction to LlamaIndex

LlamaIndex is a data framework for building LLM applications. It provides a comprehensive toolkit for ingestion, management, and querying of your external data so that you can use it with your LLM app.

Data Ingestion

On data ingestion, LlamaIndex offers connectors to 100+ data sources, ranging from different file formats (.pdf, .docx, .pptx) to APIs (Notion, Slack, Discord, etc.) to web scrapers (Beautiful Soup, Readability, etc.). These data connectors are primarily hosted on [LlamaHub](https://llamahub.ai/). This makes it easy for users to integrate data from their existing files and applications.
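As an illustration, here is a minimal sketch of two common ingestion paths: reading local files with SimpleDirectoryReader, and pulling a connector from LlamaHub with download_loader. The folder path, Notion token, and page ID are placeholders, not values from this post.

from llama_index import SimpleDirectoryReader, download_loader

# 1) Load local files (.pdf, .docx, .pptx, ...) from a folder into Document objects
local_docs = SimpleDirectoryReader("./my_files").load_data()

# 2) Fetch a connector from LlamaHub at runtime, e.g. the Notion reader
NotionPageReader = download_loader("NotionPageReader")
notion_docs = NotionPageReader(integration_token="<NOTION_TOKEN>").load_data(
    page_ids=["<PAGE_ID>"]
)

Both paths return a list of Document objects, which is the common input format for the indexing step described next.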

Data Indexing

Once the data is loaded, LlamaIndex offers the ability to index this data with a wide variety of data structures and storage integration options (including Weaviate). LlamaIndex supports indexing unstructured, semi-structured, and structured data. A standard way to index unstructured data is to split the source documents into text “chunks”, embed each chunk, and store each chunk/embedding in a vector database.
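As a rough sketch of that standard flow, the snippet below uses LlamaIndex's default in-memory vector store; the Weaviate-backed version of the same flow is shown in the demo later in this post. The directory path is a placeholder.

from llama_index import SimpleDirectoryReader, VectorStoreIndex

# load source documents from a local folder
documents = SimpleDirectoryReader("./data").load_data()

# chunk the documents, embed each chunk, and store chunk + embedding in the index
index = VectorStoreIndex.from_documents(documents)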

Data Querying

Once your data is ingested/stored, LlamaIndex provides the tools to define an advanced retrieval / query “engine” over your data. Our retriever constructs allow you to retrieve data from your knowledge base given an input prompt. A query engine construct allows you to define an interface that can take in an input prompt, and output a knowledge-augmented response — it can use retrieval and synthesis (LLM) modules under the hood.

Some examples of query engine “tasks” are given below, in rough order from easy to advanced:

  • Semantic Search: Retrieve the top-k most similar items from the knowledge corpus by embedding similarity to the query, and synthesize a response over these contexts (a minimal sketch follows this list).
  • Structured Analytics: Convert natural language to a SQL query that can be executed.
  • Query Decomposition over Documents: Break down a query into sub-questions, each over a subset of underlying documents. Each sub-question can be executed against its own query engine.
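To make the semantic search case concrete, here is a minimal sketch that assumes index is a vector index built as in the earlier sketch; the query string and top-k value are illustrative only.

# retrieval only: fetch the top-k most similar chunks for a query
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("What is a vector database?")

# retrieval + synthesis: fetch the top-k chunks, then have the LLM write an answer over them
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What is a vector database?")
print(response)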

Demo Notebook Walkthrough

Let’s walk through a simple example of how LlamaIndex can be used with Weaviate to build a Question-Answering (QA) system over the Weaviate blogs!

The full code can be found in the Weaviate recipes repo.

The first step is to set up your Weaviate client. In this example, we connect to a local Weaviate instance at http://localhost:8080:

import weaviate
# connect to your weaviate instance
client = weaviate.Client("http://localhost:8080")

The next step is to ingest the Weaviate blog posts and parse the documents into chunks. You can choose to use one of our many web page readers to scrape any website yourself, but luckily the downloaded files are already available in the recipes repo.

from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser
# load the blogs in using the reader
blogs = SimpleDirectoryReader('./data').load_data()
# chunk up the blog posts into nodes
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(blogs)

Here, we use the SimpleDirectoryReader to load in all documents from a given directory. We then use our SimpleNodeParser to chunk up the source documents into Node objects (text chunks).

The next step is to 1) define a WeaviateVectorStore, and 2) build a vector index over this vector store using LlamaIndex.

from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import WeaviateVectorStore

# construct vector store
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="BlogPost", text_key="content")
# setting up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# set up the index
index = VectorStoreIndex(nodes, storage_context=storage_context)

Our WeaviateVectorStore abstraction creates a central interface between our data abstractions and the Weaviate service. Note that the VectorStoreIndex is initialized from both the nodes and the storage context object containing the Weaviate vector store. During the initialization phase, the nodes are loaded into the vector store.

Finally, we can define a query engine on top of our index. This query engine will perform semantic search and response synthesis, and output an answer.

query_engine = index.as_query_engine()
response = query_engine.query("What is the intersection between LLMs and search?")
print(response)

You should get an answer like the following:

The intersection between LLMs and search is the ability to use LLMs to improve search capabilities, such as retrieval-augmented generation, query understanding, index construction, LLMs in re-ranking, and search result compression. LLMs can also be used to manage document updates, rank search results, and compress search results. LLMs can be used to prompt the language model to extract or formulate a question based on the prompt and then send that question to the search engine, or to prompt the model with a description of the search engine tool and how to use it with a special `[SEARCH]` token. LLMs can also be used to prompt the language model to rank search results according to their relevance with the query, and to classify the most likely answer span given a question and text passage as input.

Next Up in this Series

This blog post shared an initial overview of the LlamaIndex and Weaviate integration. We covered an introduction to the toolkits offered in LlamaIndex and a notebook on how to build a simple QA engine over Weaviate’s blog posts. Now that we have a baseline understanding, we will build on this by sharing more advanced guides soon. Stay tuned!

What’s next

Check out Getting Started with Weaviate, and begin building amazing apps with Weaviate.

You can reach out to us on Slack or Twitter, or join the community forum.

Weaviate is open source, and you can follow the project on GitHub. Don’t forget to give us a ⭐️ while you are there!