LlamaIndex • Nov 15, 2023
Announcing LlamaIndex 0.9
Our hard-working team is delighted to announce our latest major release, LlamaIndex 0.9! You can get it right now:
pip install --upgrade llama_index
In LlamaIndex v0.9, we are taking the time to refine several key aspects of the user experience, including token counting, text splitting, and more!
As part of this, there are some new features and minor changes to current usage that developers should be aware of:
- New IngestionPipeline concept for ingesting and transforming data
- Data ingestion and transforms are now automatically cached
- Updated interface for node parsing/text splitting/metadata extraction modules
- Changes to the default tokenizer, as well as customizing the tokenizer
- Packaging/Installation changes with PyPI (reduced bloat, new install options)
- More predictable and consistent import paths
- Plus, in beta: MultiModal RAG Modules for handling text and images!
Have questions or concerns? You can report an issue on GitHub or ask a question on our Discord!
Read on for more details on our new features and changes.
IngestionPipeline — New abstraction for purely ingesting data
Sometimes, all you want is to ingest and embed nodes from data sources, for instance if your application allows users to upload new data. New in LlamaIndex V0.9 is the concept of an IngestionPipeline.
An IngestionPipeline uses a new concept of Transformations that are applied to input data.
What is a Transformation though? It could be a:
- text splitter
- node parser
- metadata extractor
- embeddings model
Here’s a quick example of the basic usage pattern:
from llama_index import Document
from llama_index.embeddings import OpenAIEmbedding
from llama_index.text_splitter import SentenceSplitter
from llama_index.extractors import TitleExtractor
from llama_index.ingestion import IngestionPipeline, IngestionCache
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TitleExtractor(),
        OpenAIEmbedding(),
    ]
)
nodes = pipeline.run(documents=[Document.example()])
Transformation Caching
Each time you run the same IngestionPipeline object, it caches a hash of the input nodes + transformations, along with the output of each transformation in the pipeline.
In subsequent runs, if there is a cache hit, that transformation is skipped and the cached result is used instead. This greatly speeds up duplicate runs, and can help improve iteration times when deciding which transformations to use.
Here’s an example of saving and loading a local cache:
from llama_index import Document
from llama_index.embeddings import OpenAIEmbedding
from llama_index.text_splitter import SentenceSplitter
from llama_index.extractors import TitleExtractor
from llama_index.ingestion import IngestionPipeline, IngestionCache
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TitleExtractor(),
        OpenAIEmbedding(),
    ]
)
# will only execute full pipeline once
nodes = pipeline.run(documents=[Document.example()])
nodes = pipeline.run(documents=[Document.example()])
# save and load
pipeline.cache.persist("./test_cache.json")
new_cache = IngestionCache.from_persist_path("./test_cache.json")
new_pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TitleExtractor(),
    ],
    cache=new_cache,
)
# will run instantly due to the cache
nodes = pipeline.run(documents=[Document.example()])
And here’s another example using Redis as a cache and Qdrant as a vector store. Running this will directly insert the nodes into your vector store and cache each transformation step in Redis.
from llama_index import Document
from llama_index.embeddings import OpenAIEmbedding
from llama_index.text_splitter import SentenceSplitter
from llama_index.extractors import TitleExtractor
from llama_index.ingestion import IngestionPipeline, IngestionCache
from llama_index.ingestion.cache import RedisCache
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client
client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="test_store")
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TitleExtractor(),
        OpenAIEmbedding(),
    ],
    cache=IngestionCache(cache=RedisCache(), collection="test_cache"),
    vector_store=vector_store,
)
# Ingest directly into a vector db
pipeline.run(documents=[Document.example()])
# Create your index
from llama_index import VectorStoreIndex
index = VectorStoreIndex.from_vector_store(vector_store)
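From here, the index can be queried like any other vector index. A minimal sketch (the question text is just an illustration, and an OpenAI API key is assumed to be configured):
# query the index built on top of the vector store
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)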
Custom Transformations
Implementing custom transformations is easy! Let’s add a transform to remove special characters from the text before calling embeddings.
The only real requirement for transformations is that they must accept a list of nodes and return a list of nodes.
import re
from llama_index import Document
from llama_index.embeddings import OpenAIEmbedding
from llama_index.text_splitter import SentenceSplitter
from llama_index.ingestion import IngestionPipeline
from llama_index.schema import TransformComponent
class TextCleaner(TransformComponent):
    def __call__(self, nodes, **kwargs):
        for node in nodes:
            node.text = re.sub(r'[^0-9A-Za-z ]', "", node.text)
        return nodes
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TextCleaner(),
        OpenAIEmbedding(),
    ],
)
nodes = pipeline.run(documents=[Document.example()])
Node Parsing/Text Splitting — Flattened and Simplified Interface
We’ve made our interface for parsing and splitting text a lot cleaner.
Before:
from llama_index.node_parser import SimpleNodeParser
from llama_index.node_parser.extractors import (
    MetadataExtractor, TitleExtractor
)
from llama_index.text_splitter import SentenceSplitter
node_parser = SimpleNodeParser(
    text_splitter=SentenceSplitter(chunk_size=512),
    metadata_extractor=MetadataExtractor(
        extractors=[TitleExtractor()]
    ),
)
nodes = node_parser.get_nodes_from_documents(documents)
After:
from llama_index.text_splitter import SentenceSplitter
from llama_index.extractors import TitleExtractor
node_parser = SentenceSplitter(chunk_size=512)
extractor = TitleExtractor()
# use transforms directly
nodes = node_parser(documents)
nodes = extractor(nodes)
Previously, the NodeParser object in LlamaIndex had become extremely bloated, holding both text splitters and metadata extractors. This caused pain for users when changing these components, and pain for us trying to maintain and develop them.
In V0.9, we have flattened the entire interface into a single TransformComponent abstraction, so that these transformations are easier to set up, use, and customize.
We’ve done our best to minimize the impact on users, but the main thing to note is that SimpleNodeParser has been removed, and the other node parsers and text splitters have been elevated to have the same features, just with different parsing and splitting techniques.
Any old imports of SimpleNodeParser will redirect to the most equivalent module, SentenceSplitter.
Furthermore, the wrapper object MetadataExtractor has been removed, in favour of using extractors directly.
Full documentation for all of this can be found in our docs.
Tokenization and Token Counting — Improved defaults and Customization
A big pain point in LlamaIndex previously was tokenization. Many components used a non-configurable gpt2 tokenizer for token counting, causing headaches for users of non-OpenAI models, and even requiring hacky workarounds for OpenAI models too!
In LlamaIndex V0.9, this global tokenizer is now configurable, and defaults to the CL100K tokenizer to match our default GPT-3.5 LLM.
The single requirement for a tokenizer is that it is a callable that takes a string and returns a list.
Some examples of configuring this are below:
from llama_index import set_global_tokenizer
# tiktoken
import tiktoken
set_global_tokenizer(
    tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
# huggingface
from transformers import AutoTokenizer
set_global_tokenizer(
    AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta").encode
)
Furthermore, the TokenCountingHandler has gotten an upgrade with better token counting, as well as using token counts from API responses directly when available.
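For example, here’s a minimal sketch of wiring a TokenCountingHandler into a ServiceContext and reading counts after building an index (the tiktoken tokenizer choice here is just an illustration):
import tiktoken
from llama_index import Document, ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager, TokenCountingHandler
# count tokens with the same tokenizer the LLM uses
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([token_counter])
)
index = VectorStoreIndex.from_documents(
    [Document.example()], service_context=service_context
)
print(token_counter.total_embedding_token_count)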
Packaging — Reduced Bloat
In an effort to modernize the packaging of LlamaIndex, V0.9 also comes with changes to installation.
The biggest change here is that LangChain is now an optional package and will not be installed by default.
To install LangChain as part of your llama-index installation, you can follow the example below. There are also other installation options depending on your needs, and we welcome further contributions to the extras in the future.
# installs langchain
pip install llama-index[langchain]
# installs tools needed for running local models
pip install llama-index[local_models]
# installs tools needed for postgres
pip install llama-index[postgres]
# combinations!
pip install llama-index[local_models,postgres]
If you were previously importing langchain modules in your code, please update your project packaging requirements appropriately.
Import Paths — More Consistent and Predictable
We are making two changes to our import paths:
- We’ve removed uncommonly used imports from the root level to make importing llama_index faster
- We now have a consistent policy for making “user-facing” concepts importable from level-1 modules.
from llama_index.llms import OpenAI, ...
from llama_index.embeddings import OpenAIEmbedding, ...
from llama_index.prompts import PromptTemplate, ...
from llama_index.readers import SimpleDirectoryReader, ...
from llama_index.text_splitter import SentenceSplitter, ...
from llama_index.extractors import TitleExtractor, ...
from llama_index.vector_stores import SimpleVectorStore, ...
We still expose some of the most commonly used modules at the root level.
from llama_index import SimpleDirectoryReader, VectorStoreIndex, ...
MultiModal RAG
Given the recent announcements of the GPT-4V API, multi-modal use cases are more accessible than ever before.
To help users use these features, we’ve started to introduce a number of new modules to help support use-cases for MultiModal RAG:
- MultiModal LLMs (GPT-4V, Llava, Fuyu, etc.)
- MultiModal Embeddings (e.g. CLIP) for joint image-text embedding/retrieval
- MultiModal RAG, combining indexes and query engines
Our documentation has a full guide to multi-modal retrieval.
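As a taste of what’s in beta, here’s a minimal sketch of prompting a multi-modal LLM over a local folder of images (the model name, folder path, and prompt are just illustrations, and the API may evolve while these modules are in beta):
from llama_index import SimpleDirectoryReader
from llama_index.multi_modal_llms import OpenAIMultiModal
# load a local folder of images as image documents
image_documents = SimpleDirectoryReader("./images").load_data()
# GPT-4V via the multi-modal LLM interface
openai_mm_llm = OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300)
response = openai_mm_llm.complete(
    prompt="Describe the images as alternative text.",
    image_documents=image_documents,
)
print(response)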
Thanks for all your support!
As an open-source project we couldn’t exist without our hundreds of contributors. We are so grateful for them and the support of the hundreds of thousands of LlamaIndex users around the world. See you on the Discord!