Announcing our Document Research Assistant, a collaboration with NVIDIA!
LlamaIndex

LlamaIndex 2024-03-13

Launching the first GenAI-native document parsing platform

Our mission at LlamaIndex is to connect the world’s data to the power of LLMs, and today we’re pleased to announce our latest big step towards that goal with the world’s first GenAI-native document parsing platform, LlamaParse.

We launched the first public version of LlamaParse 3 weeks ago and the response has been huge with well over 2,000 users parsing over 1 million pages! We’ve been hard at work releasing hundreds of bug fixes and new features since then, and today we’re releasing a game-changing new feature, GenAI-powered parsing instructions.

Using LLMs for world-class parsing

The key insight behind parsing instructions is that you know what kind of documents you have, so you already know what kind of output you want. Why make the parser guess when an LLM-enabled parser can take simple, natural-language instructions from you and provide radically better parsing results?

Example 1: rich table support

Since we first released LlamaParse it has featured industry-leading table extraction capabilities. Under the hood, this has been using LLM intelligence since the start. It seamlessly integrates with the advanced indexing/retrieval capabilities that the open-source framework offers, enabling users to build state-of-the-art document RAG. Now with JSON mode (see below) and parsing instructions, you can take this even further.

Example 2: parsing comic books

Parsing translated manga presents a particular challenge for a parser since a regular parser interprets the panels as cells in a table, and the reading order is right-to-left even though the book is in English, as shown in this extract from "The manga guide to calculus", by Hiroyuki Kojima:

Using LlamaParse, you can give the parser plain, English-language instructions on what to do:

The provided document is a manga comic book. 
Most pages do NOT have title. It does not contain tables. 
Try to reconstruct the dialogue happening in a cohesive way.

(You can see the full code in our demonstration notebook, including what it looks like to parse this without the instructions)

The result is a perfect parse!

# The Asagake Times

Sanda-Cho Distributor

A newspaper distributor?

Do I have the wrong map?

Example 3: mathematical equations

Another challenging format for parsing is complex mathematical equations (by coincidence, the manga we picked as an example is all about how to do mathematics):

To parse this, we take the same instructions as before and add one sentence: Output any math equation in LATEX markdown (between $$) . The result of parsing is clear LaTeX instructions, which render the equations perfectly:

Anything an LLM can do, our parser can do

You can use this kind of natural-language instruction to do all sorts of advanced pre-processing on your documents — simplify language, include sentiment analysis, translate them to another language! We can’t wait to see what you do with the power of LlamaParse.

JSON mode

Parsing instructions are definitely the headline feature, but we have dozens of other features new to LlamaParse since launch. A standout is JSON mode, a rich programmatic format perfect for when you want more precision about exactly what you want to parse out. JSON mode’s output includes

  • the full structure of the document that was parsed
  • tables, text and headings marked
  • tables are available as CSV and JSON
  • images are marked and available for extraction (see below)
  • a wealth of metadata about each node

If you are building a custom RAG strategy JSON mode gives you everything you need to build it. Check out our JSON mode examples!

Image extraction

One of the best features of JSON mode is image extraction: every page that contains images comes with a list of images, marked up with metadata including their size and position on the page, and you can retrieve these images directly and include them in your indexing to extract even more information from your complex, image-heavy documents.

Expanded document types

We launched LlamaParse with exceptional support for PDFs, and we have continued to expand its capability every day. We’ve also added support for a large array of document types:

  • Microsoft Word (.doc, .docx)
  • Microsoft PowerPoint (.pptx)
  • Rich Text Format (.rtf)
  • Apple Pages (.pages)
  • Apple Keynote (.key)
  • ePub books (.epub)
  • And dozens more!

All of these document types “just work” without any additional work on your part, and we are constantly expanding the list of supported file types. Check out this demo notebook where we demonstrate parsing a PowerPoint file.

And one more thing… unlimited parsing!

The huge demand for LlamaParse has included many people asking to go beyond our free daily limits via paid plans, and we’re happy to answer those requests. Our pricing is simple:

  • 7000 pages/week are free
  • Additional pages are $0.003/page, or $3 per 1000 pages
  • Maximum size for one document is 750 pages

And of course we retain our generous free tier of 1000 pages/day.

The public version of LlamaParse is a hosted service. If you want to extend LlamaParse capabilities to build advanced document RAG, or wish to deploy LlamaParse in a private cloud, get in touch.