Yi Ding • Sep 1, 2023
ChatGPT’s Knowledge is Two Years Old: What to do if you’re building applications?
It’s official: as of today, ChatGPT’s knowledge cutoff is 2 years old.
Happy 2nd birthday to ChatGPT's knowledge cutoff! 🎂 pic.twitter.com/O1cgRPSP3l
— Yi Ding -- prod/acc (@yi_ding) September 1, 2023
Why doesn’t OpenAI just update it?
There are some fundamental reasons for this: training a new LLM is expensive (at least tens of millions of dollars) and not guaranteed to succeed. Cleaning new data sets for training is also expensive.
What should I do if I’m building an application that needs more recent data?
You may be tempted to just send ChatGPT the entire Wikipedia pages for 2022 and 2023: https://en.wikipedia.org/wiki/2022 You'll soon run into two limits:
1. There is a limit on the number of words you can send to a large language model (LLM). This is called the "context window."
2. LLM APIs charge you by the word, so the more you send, the more expensive your API calls become.
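To get a feel for both limits, here's a back-of-the-envelope sketch. It uses the common rule of thumb of roughly 4 characters per token for English text, and an illustrative price of $0.03 per 1,000 prompt tokens (actual prices vary by model and provider):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Real APIs use a tokenizer (e.g. tiktoken) for exact counts.
    return len(text) // 4

# Stand-in for a long Wikipedia-style page: ~250,000 characters.
article = "word " * 50_000

tokens = estimate_tokens(article)           # ~62,500 tokens
cost = tokens / 1000 * 0.03                 # illustrative $0.03 / 1K tokens

# 62,500 tokens blows past typical 4K-32K context windows,
# and that's nearly $2 for a single prompt.
```

Even if a model could accept the whole page, you'd be paying for every word of it on every request, which is exactly the problem retrieval is meant to solve.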
The standard technique is called "Retrieval Augmented Generation," or RAG. Boiled down very simply, it is a process of searching for the right context, giving that context to the LLM, and getting better results back.
What’s Retrieval Augmented Generation? Search, Give, Get.
— Yi Ding -- prod/acc (@yi_ding) July 28, 2023
For those of us coming from a traditional software development background, RAG can sound intimidating, but it really is a simple concept:
Search for the relevant data
Give the data to GPT
Get a better response
Of course,…
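The three steps above can be sketched in a few lines of plain Python. This is a deliberately minimal toy: it ranks documents by naive word overlap instead of embeddings and a vector database, and it stops short of the actual LLM call, which you'd make with your API of choice:

```python
def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Search: rank documents by naive word overlap with the query.
    Real RAG systems use embeddings and a vector database instead."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, context_docs: list[str]) -> str:
    """Give: pack the retrieved context into the prompt for the LLM."""
    context = "\n".join(context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

documents = [
    "The 2022 FIFA World Cup was held in Qatar.",
    "Phonics instruction teaches the relationship between letters and sounds.",
]
query = "Where was the 2022 World Cup held?"
prompt = build_prompt(query, retrieve(query, documents))
# Get: send `prompt` to the LLM for an answer grounded in your data.
```

Because only the retrieved snippet goes into the prompt, you stay inside the context window and pay for far fewer tokens than if you'd sent everything.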
At LlamaIndex we are the RAG experts, but there is a whole community of open source projects that are tackling this problem. We have integrated with over 20 open source vector databases and there are other open source tools like LangChain, Semantic Kernel, DSPy, Axilla and others (put your favorites in the comments!) that are attacking the problem in different ways.
Another technique is called fine tuning. Here, you essentially create a new custom model on top of an existing LLM. While LlamaIndex does support fine tuning, it often requires much more work and data:
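For a sense of the "more work and data" involved, here's what preparing a fine-tuning dataset can look like. The example below uses the chat-style JSONL format that OpenAI's fine-tuning API accepts (other providers use different formats), with hypothetical phonics-tutor content; a real dataset would need many such examples:

```python
import json

# Each row pairs a prompt with the desired completion. You typically
# need dozens to thousands of these for fine tuning to pay off.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a phonics tutor."},
            {"role": "user", "content": "What sound does 'ph' make?"},
            {"role": "assistant",
             "content": "'ph' makes the /f/ sound, as in 'phone'."},
        ]
    },
]

# Fine-tuning APIs usually ingest one JSON object per line (JSONL).
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Contrast this with RAG, where you can index new documents as they arrive without any retraining step.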
We are big fans of fine tuning and custom models but knowing when to use RAG and when to use fine tuning, and how to use them in combination, is essential.
— LlamaIndex 🦙 (@llama_index) August 18, 2023
Watch this space! https://t.co/vTpWauhj3C
What if I don’t need more recent data?
That’s totally OK! Not every application needs data that’s more recent than 2021. Before LlamaIndex, I worked on an open source reading education tool, and phonics has definitely not changed in the last two years. If you’re building something to write bedtime stories (❤️ Kidgeni https://kidgeni.com/) or raps (check out TextFX! https://textfx.withgoogle.com/), your application probably doesn’t need the latest data at all.
What if I just want to use ChatGPT with more recent information?
There are already a lot of chatbots that use Retrieval Augmented Generation. A few of the ones I’ve personally tried are Metaphor https://metaphor.systems/, Perplexity https://www.perplexity.ai/, and Medisearch https://medisearch.io/, and of course Google Bard and BingGPT.