Announcing our Document Research Assistant, a collaboration with NVIDIA!
LlamaIndex

LlamaIndex 2024-09-05

Introducing llama-deploy, a microservice-based way to deploy LlamaIndex Workflows

Today, we're excited to announce the release of llama-deploy, our solution for deploying and scaling your agentic Workflows built with llama-index! llama-deploy is the result of our learning on how best to deploy agentic systems since introducing llama-agents. llama-deploy launches llama-index Workflows as scalable microservices in just a few lines of code. Want to dive right in? Check out the docs, or read on to learn more.

llama-deploy , llama-agents and Workflows

In June we released llama-agents , a way of deploying agentic systems built in llama-index. Our microservice-based approach struck a chord with our community, with over 1K GitHub stars and a flurry of blog posts, tutorials and open-source contributions. Building on that successful experience, in August we released Workflows, a powerful, event-driven architecture for designing any kind of genAI system including agentic systems. The response to Workflows has been similarly positive.

Now we have dovetailed the two mechanisms, producing llama-deploy , which combines the ease of building LlamaIndex Workflows with a simple, native deployment mechanism for them. llama-deploy builds on the ideas and code of llama-agents , which has been folded into the new repo.

All the ideas developers loved about llama-agents are still there: everything is async-first, everything’s an independently-scalable microservice, and a central hub manages state and service registration. Anybody who used llama-agents will be able to pick up llama-deploy in seconds.

llama-deploy Architecture

We believe multi-agent systems need scalability, flexibility, and fault-tolerance in addition to easy deployment, and that’s what our architecture delivers. llama-deploy ’s architecture will be very familiar to anyone who worked with its predecessor, llama-agents .

In llama_deploy, there are several key components that make up the overall system

  • message queue -- the message queue acts as a queue for all services and the control plane. It has methods for publishing methods to named queues, and delegates messages to consumers.
  • control plane -- the control plane is a the central gateway to the llama_deploy system. It keeps track of current tasks and the services that are registered to the system. The control plane also performs state and session management and utilizes the orchestrator.
  • orchestrator -- The module handles incoming tasks and decides what service to send it to, as well as how to handle results from services. By default, the orchestrator is very simple, and assumes incoming tasks have a destination already specified. Beyond that, the default orchestrator handles retries, failures, and other nice-to-haves.
  • workflow services -- Services are where the actual work happens. A services accepts some incoming task and context, processes it, and publishes a result. When you deploy a workflow, it becomes a service.

Key Features of llama-deploy

  1. Seamless Deploymentllama-deploy allows you to deploy llama-index workflows with minimal changes to your existing code, simplifying the transition from development to production.
  2. Scalability: The microservices architecture of llama-deploy enables easy scaling of individual components, ensuring your system can handle growing demands.
  3. Flexibility: The hub-and-spoke design of llama-deploy empowers you to swap out components (like message queues) or add new services without disrupting the entire system.
  4. Fault Tolerance: Built-in retry mechanisms and failure handling in llama-deploy ensure your multi-agent AI system remains robust and resilient in production environments.
  5. State Management: The control plane in llama-deploy manages state across services, simplifying the development of complex, multi-step processes.
  6. Async-First: Designed with high-concurrency scenarios in mind, llama-deploy is well-suited for real-time and high-throughput applications.

Getting Started with llama-deploy

To get started with llama-deploy, you can follow the comprehensive documentation available on the project's GitHub repository. The README provides detailed instructions on installation, deployment, and usage of the library.

Here's a quick glimpse of how you can deploy a simple workflow using llama-deploy:

First, deploy core services (includes a control plane and message queue):

from llama_deploy import (
    deploy_core,
    ControlPlaneConfig,
    SimpleMessageQueueConfig,
)

async def main():
		# Deploy the workflow as a service
		await deploy_core(
		    control_plane_config=ControlPlaneConfig(),
		    message_queue_config=SimpleMessageQueueConfig(),
		)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Then, deploy your workflow:

from llama_deploy import (
    deploy_workflow,
    WorkflowServiceConfig,
    ControlPlaneConfig,
)
from llama_index.core.workflow import Workflow, StartEvent, StopEvent, step

# Define a sample workflow
class MyWorkflow(Workflow):
    @step()
    async def run_step(self, ev: StartEvent) -> StopEvent:
        arg1 = str(ev.get("arg1", ""))
        result = arg1 + "_result"
        return StopEvent(result=result)

async def main():
		# Deploy the workflow as a service
		await deploy_workflow(
		    MyWorkflow(),
		    WorkflowServiceConfig(
		        host="127.0.0.1", port=8002, service_name="my_workflow"
		    ),
		    ControlPlaneConfig(),
		)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Once deployed, you can interact with your deployment using a client:

from llama_deploy import LlamaDeployClient, ControlPlaneConfig

# client talks to the control plane
client = LlamaDeployClient(ControlPlaneConfig())

session = client.get_or_create_session("session_id")

result = session.run("my_workflow", arg1="hello_world")
print(result)  # prints "hello_world_result"

Resources

See our examples and docs and API reference for more details, including using different messaging queues, deploying using Docker and Kubernetes, deploying nested workflows, and more!

The Future of Multi-Agent AI Systems

With the release of llama-deploy, we believe we're taking a significant step forward in the world of multi-agent AI systems. By empowering developers to build, deploy, and scale complex AI applications with ease, we're paving the way for more innovative and impactful AI-powered solutions.

Some features that are directly on our roadmap

  • Streaming support
  • Improved resiliency and failure handling
  • Better config/setup options
    • yaml
    • environment variables
    • CLI

We're excited to see what the community will create with llama-deploy, and we're committed to continuously improving and expanding the library to meet the evolving needs of the AI ecosystem.