Claude officially released “Agent Build Guide” (Chinese version) | AI Toolset


This article summarizes Anthropic's experience and design principles in building agents on top of large language models (LLMs). Written by Anthropic, it covers the characteristics of successful implementations, the definition of an agent, when (and when not) to use agents, the use of frameworks, building blocks and workflows, common workflow patterns, application scenarios for agents, and practical case studies. The article emphasizes the importance of simplicity, transparency, and a well-designed agent-computer interface (ACI), and provides best practices for tool development along with prompt-engineering details for tools. Based on this, Anthropic shares how to build valuable agents and offers practical advice to developers.

December 20, 2024

Over the past year, Anthropic has worked with teams across many industries to build large language model (LLM) agents. The most successful implementations do not rely on complex frameworks or specialized libraries; instead, they are built from simple, composable patterns. In this article, Anthropic shares lessons learned from working with customers and from building agents itself, and offers developers advice on how to build effective agents.

What is an Agent?

"Agent" can be defined in several ways. Some customers define agents as fully autonomous systems that run independently over long periods and use various tools to complete complex tasks. Others use the term for more prescriptive implementations that follow predefined workflows. At Anthropic, all of these variants are categorized as agentic systems, but an important architectural distinction is drawn between workflows and agents:

  • Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
  • Agents are systems where the LLM dynamically directs its own processes and tool usage, maintaining control over how tasks are accomplished.

Below, both types of agentic systems are discussed in detail. Appendix 1 ("Agents in Practice") describes two domains in which customers have found these systems particularly valuable.

When (and when not) to use agents?

When building applications with LLMs, it is recommended to find the simplest solution possible and only add complexity when needed. This may mean not building an agentic system at all. Agentic systems often trade latency and cost for better task performance, and you should weigh whether that trade-off makes sense.

When more complexity is warranted, workflows offer predictability and consistency for well-defined tasks, while agents are the better choice when flexibility and model-driven decision-making are needed at scale. For many applications, however, optimizing a single LLM call with retrieval and in-context examples is usually enough.

When and how to use frameworks?

There are many frameworks that make agentic systems easier to implement, including:

  • LangGraph from LangChain;
  • Amazon Bedrock's AI Agent framework;
  • Rivet, a drag-and-drop GUI LLM workflow builder;
  • Vellum, another GUI tool for building and testing complex workflows.
These frameworks make it easy to get started by simplifying standard low-level tasks such as calling LLMs, defining and parsing tools, and chaining calls together. However, they add extra layers of abstraction that can obscure the underlying prompts and responses, making debugging harder. They can also tempt developers to add complexity where a simpler setup would suffice.

We recommend that developers start by using the LLM APIs directly: many common patterns can be implemented in just a few lines of code. If you do use a framework, make sure you understand the code underneath. Incorrect assumptions about what lies under the hood are a common source of customer errors.

See our official cookbook for some example implementations.

Building blocks, workflows, and agents

This section explores the common patterns of agentic systems seen in production. We start with the foundational building block, the augmented LLM, and progressively increase complexity, from simple compositional workflows to autonomous agents.

Building block: The augmented LLM

The basic building block of agentic systems is an LLM augmented with capabilities such as retrieval, tools, and memory. Today's models can actively use these capabilities: generating their own search queries, selecting appropriate tools, and deciding which information to retain.

(Figure: the augmented LLM)

We recommend focusing on two aspects of the implementation: tailoring these capabilities to your specific use case, and ensuring they provide a simple, well-documented interface for the LLM. While there are many ways to implement these augmentations, one approach is Anthropic's recently released Model Context Protocol (MCP), which lets developers integrate with a growing ecosystem of third-party tools through a simple client implementation.

For the remainder of this article, we assume each LLM call has access to these augmented capabilities.
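As a rough illustration of the augmentation loop, here is a minimal, self-contained sketch in Python. The `fake_search` retrieval backend, the tool schema, and the hardcoded decision logic are illustrative stand-ins for a real LLM API and knowledge base, not part of any official interface:

```python
# Minimal sketch of an "augmented LLM": the model sees a tool schema and
# can choose to call it. fake_search and the canned decision below are
# illustrative stand-ins, not a real API.

def fake_search(query: str) -> str:
    """Stand-in for a retrieval backend."""
    corpus = {"refund policy": "Refunds are issued within 14 days."}
    return corpus.get(query.lower(), "No results found.")

# A tool definition in the JSON-schema style common to LLM tool-use APIs.
search_tool = {
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base for a query.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def augmented_llm(user_message: str) -> str:
    """Pretend the model decided to call the search tool, then answer
    using the retrieved context (a real system lets the LLM decide)."""
    tool_result = fake_search(user_message)
    return f"Answer based on retrieved context: {tool_result}"
```

In a real system, the schema would be passed to the model provider's API and the model itself would emit the tool call; the point here is only the shape of the loop: query, retrieve, answer with context.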

Workflow: Prompt chaining

Prompt chaining decomposes a task into a sequence of steps, where each LLM call processes the output of the previous one. You can add programmatic checks ("gates") at any intermediate step to ensure the process stays on track.

(Figure: the prompt chaining workflow)

  • Applicable scenarios: This workflow is ideal when a task can be easily and cleanly decomposed into fixed subtasks. The main goal is to trade some latency for higher accuracy by making each individual LLM call an easier task.
  • Examples where prompt chaining is useful:
    • Generate marketing copy and translate it into different languages.
    • Write an outline of a document, check if the outline meets certain criteria, and then write the document based on the outline.
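The chain-with-gate pattern can be sketched in a few lines of Python. The `call_llm` function below is a canned stub standing in for real LLM calls, so the chain and its gate are runnable as-is:

```python
# Prompt-chain sketch with a programmatic "gate" between steps.
# call_llm is a stub that returns canned text so the chain is runnable;
# a real implementation would call an LLM API here.

def call_llm(prompt: str) -> str:
    if prompt.startswith("Outline:"):
        return "1. Intro\n2. Body\n3. Conclusion"
    return f"Document written from: {prompt}"

def gate(outline: str) -> bool:
    """Programmatic check: the outline must have at least 3 sections."""
    return len([ln for ln in outline.splitlines() if ln.strip()]) >= 3

def write_document(topic: str) -> str:
    outline = call_llm(f"Outline: {topic}")      # step 1
    if not gate(outline):                        # gate between steps
        raise ValueError("Outline failed the gate check")
    return call_llm(f"Write the document for this outline:\n{outline}")  # step 2
```

Each step gets an easier task than the original prompt, and the gate catches a bad outline before any downstream work is wasted.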

Workflow: Routing

Routing classifies an input and directs it to a specialized follow-up task. This workflow allows separation of concerns and lets you build more specialized prompts. Without it, optimizing for one kind of input can hurt performance on others.

(Figure: the routing workflow)

  • Applicable scenarios: Routing works well for complex tasks with distinct categories that are better handled separately, and where classification can be done accurately, whether by an LLM or by a more traditional classification model/algorithm.
  • Applicable examples
    • Direct different types of customer service queries (general questions, refund requests, technical support) to different downstream processes, tips, and tools.
    • Route simple/FAQs to smaller models like Claude 3.5 Haiku, route difficult/unusual problems to more powerful models like Claude 3.5 Sonnet, to optimize cost and speed.
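The customer-service example above can be sketched as a classifier step feeding a handler table. The category names, keyword heuristics, and handler strings below are illustrative assumptions; in practice the classifier would itself be an LLM call or a trained model:

```python
# Routing sketch: a cheap classification step picks a downstream handler.
# The keyword classifier and handler labels are illustrative stand-ins.

def classify(query: str) -> str:
    """Stand-in for an LLM or traditional classifier."""
    q = query.lower()
    if "refund" in q:
        return "refund"
    if "error" in q or "crash" in q:
        return "technical"
    return "general"

HANDLERS = {
    "refund":    lambda q: f"[refund flow] {q}",
    "technical": lambda q: f"[tech support, stronger model] {q}",
    "general":   lambda q: f"[FAQ, small fast model] {q}",
}

def route(query: str) -> str:
    return HANDLERS[classify(query)](query)
```

Because each handler owns its own prompt (and potentially its own model), tuning the refund flow cannot degrade the technical-support flow.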

Workflow: Parallelization

LLMs can sometimes work on a task simultaneously and have their outputs aggregated programmatically. This workflow comes in two key variants:

  • Sectioning: breaking a task into independent subtasks that run in parallel.
  • Voting: running the same task multiple times to get diverse outputs.

(Figure: the parallelization workflow)

  • Applicable scenarios: Parallelization is effective when subtasks can be run in parallel for speed, or when multiple perspectives are needed for more reliable results. For complex tasks with multiple considerations, LLMs generally perform better when each consideration is handled by a separate LLM call.
  • Applicable examples
    • Sectioning (Task Disassembly)
      • Guardrails, where one model instance processes user queries while another screens them for inappropriate content or requests. This tends to perform better than having the same LLM call handle both the guardrails and the core response.
      • Automated evals for assessing LLM performance on a given prompt, where each LLM call evaluates a different aspect of the model's performance.
    • Voting
      • Reviewing code for vulnerabilities, where several different prompts review the code and flag it if they find a problem.
      • Evaluating whether a given piece of content is inappropriate, with multiple prompts assessing different aspects or requiring different vote thresholds to balance false positives and false negatives.
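The voting variant can be sketched with parallel "reviewers" and a vote threshold. The heuristic reviewers below are stand-ins for independent LLM calls issued with different prompts:

```python
# Voting sketch: run the same check several times in parallel (here via
# heuristic reviewers standing in for differently-prompted LLM calls)
# and flag the input when votes cross a configurable threshold.

from concurrent.futures import ThreadPoolExecutor

REVIEWERS = [
    lambda code: "eval(" in code,      # reviewer 1: dynamic evaluation
    lambda code: "password" in code,   # reviewer 2: hardcoded secrets
    lambda code: "TODO" in code,       # reviewer 3: unfinished work
]

def flag_code(code: str, threshold: int = 1) -> bool:
    """Flag if at least `threshold` reviewers vote 'problem'."""
    with ThreadPoolExecutor() as pool:          # reviewers run in parallel
        votes = list(pool.map(lambda r: r(code), REVIEWERS))
    return sum(votes) >= threshold
```

Raising `threshold` trades recall for precision, which is exactly the false-positive/false-negative balance the voting pattern is meant to tune.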

Workflow: Orchestrator-workers

In the orchestrator-workers workflow, a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results.

(Figure: the orchestrator-workers workflow)

  • Applicable scenarios: Well suited to complex tasks where the required subtasks cannot be predicted in advance (in coding, for example, the number of files to change and the nature of the change in each file may depend on the task itself). While its flowchart looks similar to parallelization, the key difference is flexibility: subtasks are not predefined but are determined by the orchestrator based on the specific input.
  • Applicable examples
    • Coding products that make complex changes to multiple files each time.
    • Search tasks that involve gathering and analyzing information from multiple sources to identify possibly relevant information.
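The distinguishing feature, subtasks derived at runtime rather than fixed in advance, can be shown in a short sketch. The `plan` and `worker` functions stand in for separate LLM calls, and the task format is an illustrative assumption:

```python
# Orchestrator-workers sketch: a planner derives the subtasks from the
# input at runtime, workers handle them, and a synthesis step merges the
# results. plan() and worker() stand in for separate LLM calls.

def plan(task: str) -> list[str]:
    """Orchestrator: subtasks depend on the input, not a fixed template.
    Assumes tasks like 'edit files: a.py, b.py' for illustration."""
    files = task.split(":", 1)[1].split(",")
    return [f"edit {f.strip()}" for f in files]

def worker(subtask: str) -> str:
    return f"done: {subtask}"

def orchestrate(task: str) -> str:
    results = [worker(s) for s in plan(task)]
    return "; ".join(results)                   # synthesis step
```

A task naming two files yields two subtasks; naming five would yield five. That input-dependent fan-out is what separates this pattern from plain parallelization.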

Workflow: Evaluator-optimizer

In this workflow, one LLM call generates a response while another provides evaluation and feedback in a loop.

(Figure: the evaluator-optimizer workflow)

  • Applicable scenarios: This workflow is particularly effective when there are clear evaluation criteria and iterative refinement provides measurable value. There are two signs of a good fit: first, LLM responses can be demonstrably improved when a human articulates their feedback; second, the LLM itself can provide such feedback. This is analogous to the iterative process a human writer goes through when polishing a document.
  • Applicable examples
    • Literary translation, where there are nuances the translating LLM may miss initially, but an evaluating LLM can provide useful critiques.
    • Complex search tasks that require multiple rounds of searching and analysis to gather comprehensive information, where the evaluating LLM decides whether further searches are warranted.
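The generate-evaluate loop can be sketched as follows. Both `generate` and `evaluate` are stubs standing in for separate LLM calls; the feedback string and acceptance criterion are illustrative:

```python
# Evaluator-optimizer sketch: one step generates, another scores and
# feeds back, looping until the evaluator accepts or a round cap is hit.
# Both functions stand in for separate LLM calls.

def generate(prompt: str, feedback: str = "") -> str:
    draft = f"Draft for: {prompt}"
    return draft + (" [revised]" if feedback else "")

def evaluate(draft: str) -> tuple[bool, str]:
    """Return (accepted, feedback). Stub accepts any revised draft."""
    if "[revised]" in draft:
        return True, ""
    return False, "Please revise for clarity."

def refine(prompt: str, max_rounds: int = 3) -> str:
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        accepted, feedback = evaluate(draft)
        if accepted:
            return draft
    return draft                      # best effort after the round cap
```

The round cap plays the same role as an agent's iteration limit: it keeps an evaluator that never accepts from looping forever.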

Agents

As LLMs mature in key capabilities such as understanding complex inputs, reasoning and planning, using tools reliably, and recovering from errors, agents are beginning to emerge in production.

Agents begin their work with either a command from a human user or an interactive discussion with one. Once the task is clear, the agent plans and operates independently, potentially returning to the human for further information or judgment. During execution, it is crucial for the agent to obtain "ground truth" from the environment at each step (such as tool-call results or code execution) to assess its progress. The agent can then pause for human feedback when it encounters a blocker. Tasks often terminate upon completion, but stopping conditions (such as a maximum number of iterations) are commonly included to maintain control.

Agents can handle sophisticated tasks, but their implementation is often simple. They are typically just LLMs using tools in a loop based on environmental feedback. It is therefore crucial to design the tool set and its documentation clearly and thoughtfully. Best practices for tool development are detailed in Appendix 2 ("Prompt Engineering your Tools").
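That "LLM using tools in a loop" description is short enough to sketch directly. The tool names, the stubbed decision policy, and the canned tool outputs below are all illustrative assumptions; a real agent would replace `llm_decide` with an LLM API call:

```python
# Bare-bones agent loop: a (stubbed) policy picks tools based on
# environment feedback until it declares completion or hits the
# iteration cap. Tool names and canned outputs are illustrative.

TOOLS = {
    "list_files": lambda arg: "main.py test_main.py",
    "run_tests":  lambda arg: "2 passed",
}

def llm_decide(history: list[str]) -> tuple[str, str]:
    """Stand-in policy: list files, run tests, then stop."""
    if not history:
        return "list_files", ""
    if "passed" not in history[-1]:
        return "run_tests", ""
    return "done", "Tests pass."

def agent(max_iters: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_iters):               # stopping condition
        action, arg = llm_decide(history)
        if action == "done":
            return arg
        history.append(TOOLS[action](arg))   # ground truth from env
    return "stopped: iteration limit"
```

Note that every piece of state the policy sees comes from actual tool output, which is the "ground truth from the environment" point made above, and the `max_iters` cap keeps a misbehaving loop bounded.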

(Figure: the autonomous agent)

  • Applicable scenarios: Agents can be used for open-ended problems where the number of required steps is difficult or impossible to predict and a fixed path cannot be hardcoded. The LLM may run for many turns, so you must have some level of trust in its decision-making. Agents' autonomy makes them ideal for scaling tasks in trusted environments. That autonomy also means higher costs and the potential for compounding errors. Extensive testing in sandboxed environments, along with appropriate guardrails, is recommended.
  • Applicable examples: Here are some examples from our own implementations:

(Figure: high-level flow of a coding agent)

Combination and customization

These patterns are not prescriptive. They are common shapes that developers can adapt and combine to fit different use cases. As with any LLM feature, the key to success is measuring performance and iterating on the implementation. To repeat: you should only consider adding complexity when it demonstrably improves results.

Summarize

Success in the LLM space is not about building the most sophisticated system; it is about building the right system for your needs. Start with simple prompts, optimize them with comprehensive evaluation, and add a multi-step agentic system only when simpler solutions fall short.

When implementing agents, we try to follow three core principles:

  • Keep the agent's design simple.
  • Prioritize transparency by explicitly showing the agent's planning steps.
  • Carefully craft the agent-computer interface (ACI) through thorough tool documentation and testing.

Frameworks can help you get started quickly, but don't hesitate to reduce the layers of abstraction and build with basic components as you move to production. By following these principles, you can create agents that are not only powerful but also reliable, maintainable, and trusted by their users.

Acknowledgements

Written by Erik Schluntz and Barry Zhang. This work draws on our experience building agents at Anthropic and valuable insights shared by our customers, and we are deeply grateful for this.


Appendix 1: Agents in Practice

Our work with customers has revealed two particularly promising applications of AI agents that demonstrate the practical value of the patterns above. Both illustrate that agents add the most value for tasks that require both conversation and action, have clear success criteria, enable feedback loops, and integrate meaningful human oversight.

A. Customer Support

Customer support combines the familiar chatbot interface with enhanced capabilities through tool integration. It is a natural fit for more open-ended agents because:

  • Support interactions naturally follow a conversational flow while requiring access to external information and actions;
  • Tools can be integrated to pull customer data, order history, and knowledge-base articles;
  • Actions such as issuing refunds or updating tickets can be handled programmatically;
  • Success can be clearly measured through user-defined resolutions.

Some companies have demonstrated the viability of this approach through usage-based pricing models that charge only for successful resolutions, showing confidence in their agents' effectiveness.

B. Coding Agent

Software development has shown significant potential for LLM capabilities, which have evolved from code completion to autonomous problem-solving. Agents are particularly effective here because:

  • Code solutions are verifiable through automated tests;
  • The agent can iterate on solutions using test results as feedback;
  • Problems are well-defined and structured;
  • Output quality can be measured objectively.

In our own implementation, agents can now solve real GitHub issues from the SWE-bench Verified benchmark. However, while automated testing helps verify functionality, human review remains crucial to ensure solutions meet broader system requirements.

Appendix 2: Prompt engineering your tools

No matter which agentic system you are building, tools will likely be an important part of your agent. Tools enable Claude to interact with external services and APIs by specifying their exact structure and definition in our API. When Claude responds, if it plans to invoke a tool, it includes a tool use block in the API response. Tool definitions and specifications deserve just as much prompt-engineering attention as your overall prompts. This short appendix describes how to prompt-engineer your tools.

There are often several ways to specify the same action. For example, a file edit can be specified by writing a diff or by rewriting the entire file. Structured output can be returned as code inside Markdown or inside JSON. In software engineering, such differences are cosmetic and can be converted losslessly from one format to another.

However, some formats are much harder for an LLM to write than others. Writing a diff requires knowing how many lines are changing in the chunk header before the new code is written. Writing code inside JSON (compared to Markdown) requires extra escaping of newlines and quotes.
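The JSON-escaping overhead mentioned above is easy to see concretely. The snippet below compares the same two-line program emitted verbatim versus embedded in a JSON string:

```python
# Illustration of formatting "overhead": the same code costs extra
# escape characters when it must be emitted inside a JSON string
# rather than as plain verbatim text (as in a Markdown code block).

import json

code = 'print("hi")\nprint("bye")'

emitted_verbatim = code                      # newlines/quotes as-is
emitted_in_json = json.dumps({"code": code})  # \n and \" must be escaped

# The JSON form is strictly longer and contains escape sequences the
# model has to produce correctly on every single line it writes.
```

Every newline and quote in the JSON form is an extra token the model must get right, which is why formats close to naturally occurring text tend to be more reliable.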

Our suggestions for deciding on tool formats are as follows:

  • Give the model enough tokens to “think” before it writes itself into a corner.
  • Keep the format close to text that appears naturally on the internet.
  • Make sure there is no formatting “overhead”, such as having to keep an accurate count across thousands of lines of code, or string-escaping any code it writes.

One rule of thumb is to put as much effort into creating a good agent-computer interface (ACI) as you would into a human-computer interface. Here are some thoughts on how to do so:

  • Put yourself in the model's shoes. Given the description and parameters, is it obvious how to use this tool, or would it require careful thought? A good tool definition often includes example usage, edge cases, input format requirements, and clear boundaries with other tools.
  • How could you change parameter names or descriptions to make things more obvious? Think of it as writing clear instructions for a junior developer on your team. This is especially important when using many similar tools.
  • Test how the model uses your tools: run many example inputs in our Workbench to see what mistakes the model makes, and iterate.
  • Poka-yoke (mistake-proof) your tools. Change the arguments so that it is harder to make mistakes.

While building our SWE-bench agent, Anthropic actually spent more time optimizing the tools than the overall prompt. For example, we found that the model would make errors with tools that used relative file paths, especially after the agent had moved out of the root directory. To fix this, we changed the tool to always require absolute file paths, and found that the model used this approach flawlessly.
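The absolute-path fix described above amounts to validating the argument at the tool boundary and returning a clear error the model can act on. A minimal sketch, with a hypothetical `read_file` tool name:

```python
# Sketch of the mistake-proofing described above: the file tool rejects
# relative paths outright, so the model cannot drift after a directory
# change. The read_file tool name is a hypothetical example.

import os

def read_file(path: str) -> str:
    if not os.path.isabs(path):
        # Surface a clear, actionable error back to the model.
        raise ValueError(f"read_file requires an absolute path, got {path!r}")
    with open(path, encoding="utf-8") as f:
        return f.read()
```

The important design choice is that the constraint lives in the tool's contract rather than in the prompt: the model cannot silently succeed with a wrong relative path, and the error message tells it exactly how to recover.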


