LlamaIndex Newsletter 2023–12–19

What’s up, Llama Followers 🦙,

We’re excited to bring you another week packed with the latest updates, features, exciting community demos, insightful tutorials, guides, and webinars. This week, don’t miss our special holiday workshop on 12/21, where we’ll dive into innovative LLM + RAG use cases with the Google Gemini team.

Got a groundbreaking project, compelling article, or captivating video? We’re all ears! Reach out to us at news@llamaindex.ai. Remember to subscribe to our newsletter via our website to get all these exciting developments straight to your inbox.

🤩 First, the highlights:

  1. Google Gemini Partnership: Now offering day 1 support for Gemini API on LlamaIndex, complete with comprehensive cookbooks for advanced RAG capabilities. Tweet.
  2. MistralAI Integrations: Introduced day-0 integrations with MistralAI LLMs and Embedding model for building RAG solutions on LlamaIndex. Notebook, Tweet.
  3. Docugami Multi-Doc Llama Dataset: Launched the Multi-Doc SEC 10Q Dataset by Taqi Jaffri, offering a range of question complexities for advanced RAG research. Docs, Tweet.
  4. Proposition-Based Retrieval: Implemented a new retrieval unit based on propositions, enhancing QA performance with LLMs. Docs, Tweet.
  5. RAG Pipeline Enhancement Guide: Introduced a guide featuring modules like Routing, Query-Rewriting, and Agent Reasoning for more complex QA over documents. Docs.

✨ Feature Releases and Enhancements:

  • We launched a partnership with Google Gemini, offering day 1 support for the Gemini API on LlamaIndex. The integration includes full-feature support for Gemini (text and multi-modal) and the Semantic Retriever API, complemented by three comprehensive cookbooks: Gemini LLM, Gemini Multi-modal, and Semantic Retriever API, unlocking advanced RAG capabilities and multi-modal integrations. A minimal usage sketch follows this list. Tweet.
  • We introduced day-0 integrations with the MistralAI LLMs (mistral-tiny, mistral-small, mistral-medium) and the MistralAI Embedding model for building RAG solutions with LlamaIndex, in both the Python and TypeScript versions; see the sketches after this list. Notebook, Tweet.
  • We launched the COVID-QA dataset on LlamaHub, a human-annotated, substantial set of 300+ QA pairs about COVID from various web articles, complete with source URLs for easy integration into RAG pipelines, offering ample scope for improvement. Docs, Tweet.
  • We launched a new multi-modal template in Create-llama, enabling image input and output generation using the latest GPT-4-vision model from OpenAI, expanding possibilities for diverse use cases. Docs, Tweet.
  • We introduced Proposition-Based Retrieval in LlamaIndex: a new retrieval unit based on propositions, from the ‘Dense X Retrieval’ paper, that enhances QA performance with LLMs by indexing propositions and linking them back to the underlying text (a sketch follows this list). Docs, Tweet.
  • We partnered with Docugami to launch a new Multi-Doc SEC 10Q Dataset by Taqi Jaffri, aimed at advancing QA datasets for RAG evaluation. This dataset offers a range of question complexities: Single-Doc, Single-Chunk RAG; Single-Doc, Multi-Chunk RAG; and Multi-Doc RAG, addressing the need for more intricate datasets in RAG research. Docs, Tweet.
  • We launched a SharePoint data loader, enabling direct integration of SharePoint files into LLM/RAG pipelines. Docs, Tweet.
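
For a concrete picture of the Gemini integration above, here is a minimal sketch, assuming llama-index 0.9.x with the google-generativeai package installed, a GOOGLE_API_KEY in the environment, and a hypothetical ./data folder of documents; the linked cookbooks are the authoritative reference.

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import Gemini

# Instantiate the Gemini LLM (defaults to the Gemini Pro text model).
llm = Gemini()

# Direct completion call.
print(llm.complete("Summarize retrieval-augmented generation in one sentence."))

# Wire Gemini into a simple RAG pipeline. Note: embeddings still default to
# OpenAI here unless you also swap in a different embedding model.
service_context = ServiceContext.from_defaults(llm=llm)
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data").load_data(),  # hypothetical folder
    service_context=service_context,
)
print(index.as_query_engine().query("What is this document about?"))
```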
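
Here is a minimal sketch of the MistralAI integration, assuming llama-index 0.9.x with the mistralai client installed and MISTRAL_API_KEY set; ./data is again a hypothetical folder, and model names should be checked against the linked notebook.

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings import MistralAIEmbedding
from llama_index.llms import MistralAI

# Hosted Mistral chat model (also available: mistral-tiny, mistral-small).
llm = MistralAI(model="mistral-medium")
# MistralAI's embedding model for the retrieval side of the pipeline.
embed_model = MistralAIEmbedding(model_name="mistral-embed")

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data").load_data(),  # hypothetical folder
    service_context=service_context,
)
print(index.as_query_engine().query("Give a two-line summary of the documents."))
```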
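
Below is a rough sketch of proposition-based retrieval via a LlamaPack; the pack name DenseXRetrievalPack and its run signature are assumptions based on the LlamaHub convention, so treat the linked docs as authoritative.

```python
from llama_index import SimpleDirectoryReader
from llama_index.llama_pack import download_llama_pack

# Download the proposition-based retrieval pack from LlamaHub
# (pack name assumed; see the linked docs for the exact identifier).
DenseXRetrievalPack = download_llama_pack("DenseXRetrievalPack", "./dense_pack")

documents = SimpleDirectoryReader("./data").load_data()  # hypothetical folder

# Building the pack extracts propositions from the documents, indexes them,
# and links each proposition back to its underlying source text.
dense_pack = DenseXRetrievalPack(documents)

# `run` queries over the proposition index and answers from the linked text.
print(dense_pack.run("What are the key findings in these documents?"))
```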

👀 Community Demos:

  • MemoryCache: Mozilla’s new experimental project that curates your online experience into a private, on-device RAG application using PrivateGPT_AI and LlamaIndex, enhancing personal knowledge management while maintaining privacy. Website, Repo.
  • OpenBB Finance showcases its enhanced chat widget feature in Terminal Pro, utilizing LlamaIndex’s data chunking combined with Cursor AI for improved large context management and accuracy. Tweet.
  • AI Chatbot Starter (from the DataStax team), a web server powered by AstraDB and LlamaIndex, allows easy setup for chatting over web documentation. It can be used as a standalone service or integrated into full-stack applications, with simple credential setup and document ingestion. Repo, Tweet.
  • Na2SQL (by Harshad): build an end-to-end SQL analyst app on Streamlit, featuring interactive database viewing, SQL query displays, and integration with LlamaIndex. Blog, Repo.
  • LionAGI (by Ocean Li): an agent framework for efficient data operations, with support for concurrent calls and JSON mode with OpenAI. Check out how to integrate it with a LlamaIndex RAG pipeline for automated AI assistants, such as an arXiv research assistant. Docs, Repo.
  • Local RAG for Windows (from Marklysze): A comprehensive resource for integrating advanced LLMs into RAG workflows using Windows Subsystem for Linux, featuring five detailed cookbooks.

🗺️ Guides:

  • Guide for enhancing RAG pipelines with a Query Understanding Layer, featuring modules like Routing, Query-Rewriting, Sub-Question creation, and Agent Reasoning, all designed to enable more complex and ‘agentic’ QA over documents (see the sketch after this list).
  • Guide to Building a Restaurant Recommendation QA System with Gemini to extract structured image data and utilize multi-modal Retrieval-Augmented Generation for enhanced query responses.
  • Guide to building Advanced RAG with Safety Guardrails, creating constrained RAG systems with the Gemini API’s semantic search and safety features plus the Google Semantic Retriever integration.
  • Guide on Qdrant’s multitenancy with LlamaIndex, covering payload-based partitioning for user data isolation in vector services.
  • Guide on using Prometheus, an open-source 13B LLM, for RAG evaluations, comparing it with GPT-4 as an evaluator, with insights on cost-effectiveness, accuracy, and scoring biases.
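
As a companion to the query-understanding guide above, here is a hedged sketch of one of those modules, sub-question decomposition, using LlamaIndex's SubQuestionQueryEngine; the folder names and tool descriptions are hypothetical, and the default OpenAI LLM and embeddings are assumed.

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

# Build one query engine per document collection (hypothetical folders).
lyft_index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./lyft").load_data())
uber_index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./uber").load_data())

tools = [
    QueryEngineTool(
        query_engine=lyft_index.as_query_engine(),
        metadata=ToolMetadata(name="lyft_docs", description="Lyft financial filings"),
    ),
    QueryEngineTool(
        query_engine=uber_index.as_query_engine(),
        metadata=ToolMetadata(name="uber_docs", description="Uber financial filings"),
    ),
]

# The sub-question engine rewrites a complex question into simpler
# sub-questions, routes each one to the right tool, and synthesizes an answer.
query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
print(query_engine.query("Compare revenue growth of Uber and Lyft."))
```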

✍️ Tutorials:

🎥 Webinars:

  • Holiday workshop on 12/21 with the Google Gemini team, diving into innovative LLM + RAG use cases.

🏢 Calling all enterprises:

Are you building with LlamaIndex? We are working hard to make LlamaIndex even more enterprise-ready, and we have sneak peeks of our upcoming products available for partners. Interested? Get in touch.