Jerry Liu Sep 19, 2024

Introducing LlamaParse Premium

LlamaParse is the best document parser on the market for your context-augmented LLM application. Since we launched it in February, we’ve crossed 50 million pages processed and 1M+ downloads on PyPI. It is capable of crunching any document - PDF, PowerPoint, Excel. We’ve also launched a wide range of modes, from a fast/accurate mode optimized for efficient but accurate text+table processing, to multimodal modes that use the latest multimodal models to understand complex visual documents, like investor slide decks and product manuals.

A tradeoff is that our fast/accurate modes are fantastic for parsing long text and tables but not as good for visual content, and our multimodal mode is fantastic for visual content but not as good for text/tables.

Today you get the best of both worlds with LlamaParse Premium Mode. Premium mode leverages state-of-the-art multimodal models and heuristic text parsing techniques to extract text from the most complex documents, outperforming vanilla models like Sonnet-3.5. This lets users build context-augmented RAG/agent applications with even higher accuracy and lower hallucination rates.

Try it out today.

Key Features

LlamaParse Premium comes with the following bells and whistles:

  1. Outputs all content, from text to tables to images, into well-structured markdown
  2. Translates diagrams into Mermaid format (between ```mermaid and ``` tags)
  3. Translates equations into LaTeX
  4. Big reduction in missing content
  5. Captions all images (between [ and ] tags)
  6. Much better heading/subheading determination than Accurate mode.

Existing LlamaParse features, like using parsing instructions to “prompt” the parser, and webhooks to directly sync parsed data to your application, are all available with LlamaParse Premium.

Results

Let’s see some examples in action showcasing LlamaParse Premium mode on complex document properties: tables, diagrams, and reading order.

For some of these examples, we compare with raw GPT-4o and text mode.

Table

Current multimodal models struggle to extract long tables from images without hallucinations. LlamaParse Premium is able to bypass these hallucinations.

Here is our usual Caltrain schedule sample, where our Premium mode nailed it!

Source:

GPT-4o

LlamaParse Sonnet

Almost perfect, but the model missed some headings and hallucinated one value.

LlamaParse Premium

Diagram

LlamaParse Premium outputs diagrams in Mermaid, creating a compact representation for LLMs to understand.

This allows your RAG pipeline to answer questions about the diagrams in a document. Here is a sample corporate structure of a financial organization:

Rendered Mermaid Diagram:
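To make the diagram output concrete, here is a minimal sketch of how a downstream pipeline might pull Mermaid diagrams out of the parsed markdown. It assumes diagrams arrive as fenced blocks tagged `mermaid`; the helper name `extract_mermaid_blocks` and the sample markdown are hypothetical and hand-written for illustration, not real parser output.

```python
import re

# Built at runtime so this example does not contain a literal fence.
FENCE = "`" * 3

# Assumption: each diagram sits in a fenced block whose opening fence is tagged "mermaid".
MERMAID_BLOCK = re.compile(FENCE + r"mermaid\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_mermaid_blocks(markdown: str) -> list[str]:
    """Return the body of every mermaid-fenced block found in the markdown."""
    return [body.strip() for body in MERMAID_BLOCK.findall(markdown)]


# Hypothetical parsed output, hand-written for this example.
sample = "\n".join([
    "# Corporate structure",
    "",
    FENCE + "mermaid",
    "graph TD",
    "    Holding --> BankA",
    "    Holding --> BankB",
    FENCE,
    "",
])

diagrams = extract_mermaid_blocks(sample)
print(diagrams[0])
```

Each extracted diagram is plain text, so it can be indexed alongside the surrounding prose or passed straight to an LLM.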

Equation

LlamaParse Premium outputs equations as LaTeX between $$ symbols.

Sample input:

Outputted markdown

Rendered Markdown
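Since equations are delimited by $$, they are easy to pick out of the parsed markdown. The sketch below shows one way to do that; the helper name `extract_equations` and the sample text are hypothetical, and it assumes equations never contain a literal `$$` themselves.

```python
import re

# Non-greedy match of everything between a pair of $$ delimiters.
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def extract_equations(markdown: str) -> list[str]:
    """Return the LaTeX source of every $$-delimited equation, in order."""
    return [eq.strip() for eq in DISPLAY_MATH.findall(markdown)]


# Hypothetical parsed output, hand-written for this example.
sample = "The energy is\n\n$$E = mc^2$$\n\nand the area is $$A = \\pi r^2$$."

equations = extract_equations(sample)
print(equations)
```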

Reading Order

Multimodal models are very good at identifying document reading order out of the box, but tend to hallucinate over the text itself. On the other hand, traditional parsing approaches are fine at parsing text but fail to grasp complex order.

LlamaParse Premium gets both right. Here is the Xanax UK notice. Like all our baselines, LlamaParse Premium missed the bottom table, but it outperformed both our accurate mode (better reading order, no missing content) and GPT-4o (all of its output is actually present in the document).

Source:

Accurate mode: There are reading order issues where the different columns are mixed up.

GPT-4o: The reading order is plausible and retains the 4-column structure, but the content is hallucinated.

Sonnet 3.5: The reading order is plausible and retains the 4-column structure, but the content is hallucinated (although less than with GPT-4o).

Premium mode: Resolves both reading order and hallucination issues. Unfortunately it misses the last table.

As a result, your RAG pipeline can better answer questions over these data types compared to competing solutions.

We’ve already shown the power of good parsing for good RAG, for instance in our multimodal notebooks. We encourage you to try out LlamaParse Premium over your complex documents and see how RAG response quality compares to baseline parsing approaches over complex data.

Next Steps

LlamaParse Premium Mode operates on top of the latest multimodal models - this means that as multimodal model capabilities get better (from Sonnet-3.5 to Pixtral, o1, and more), LlamaParse Premium gets better too. We are of course still actively maintaining and improving our other parsing modes.

It is currently available at 7.5 cents a page. Note: This is a bit higher than our default parsing mode, so if you’re trying it out for the first time, try out a small document first!
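For a quick sense of what a job will cost before you run it, the per-page price is easy to turn into an estimate. This is a back-of-the-envelope sketch: the function name is hypothetical, and it ignores any free tier, credits, or volume discounts that may apply to your account.

```python
from decimal import Decimal

# Price quoted in this post for Premium mode, in cents per page.
PREMIUM_CENTS_PER_PAGE = Decimal("7.5")


def estimate_cost_cents(pages: int,
                        cents_per_page: Decimal = PREMIUM_CENTS_PER_PAGE) -> Decimal:
    """Estimated parsing cost in cents for a document with `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return cents_per_page * pages


# A 200-page document at the quoted price.
cost = estimate_cost_cents(200)
print(f"{cost} cents ({cost / 100:.2f} in whole currency units)")
```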

You can try LlamaParse Premium today. Sign up for an account and access the parsing playground here: https://cloud.llamaindex.ai/parse. You can either view the parsed results directly in our parsing playground or toggle the setting through our LlamaParse SDK.
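As a rough sketch of the SDK route, the snippet below enables Premium mode from Python. It assumes the `llama_parse` package, its `LlamaParse` class with `premium_mode` and `result_type` arguments, and an API key in the `LLAMA_CLOUD_API_KEY` environment variable; check the current SDK documentation for the exact names. The helper functions are hypothetical.

```python
import os


def premium_parser_kwargs(api_key: str) -> dict:
    """Constructor arguments for a Premium-mode parser (argument names are assumptions)."""
    return {
        "api_key": api_key,
        "result_type": "markdown",  # ask for markdown output
        "premium_mode": True,       # assumed flag that switches on Premium mode
    }


def parse_with_premium(path: str) -> str:
    """Parse one file in Premium mode and return its markdown as a single string."""
    # Imported lazily so this module loads even when llama_parse is not installed.
    from llama_parse import LlamaParse

    parser = LlamaParse(**premium_parser_kwargs(os.environ["LLAMA_CLOUD_API_KEY"]))
    documents = parser.load_data(path)
    return "\n\n".join(doc.text for doc in documents)
```

Calling `parse_with_premium("report.pdf")` on a small document first is a cheap way to check the output before parsing a large batch.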

LlamaParse Premium is integrated within LlamaCloud, our enterprise RAG platform. If you're interested in using this in an enterprise setting, come talk to us.