
Structured Output Parser

Tell the LLM ahead of time that you want the result in a specific format.

Define the parser:

```js
import { StructuredOutputParser } from '@langchain/core/output_parsers'

const parser = StructuredOutputParser.fromNamesAndDescriptions({
  answer: '用户问题的答案',
  evidence: '你回答用户问题所依据的答案',
  confidence: '问题答案的可信度评分,格式是百分数',
})
```
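Conceptually, the parser's job is to take the raw text the model returns and turn it into a plain object with these three keys. Below is a minimal sketch of that step in isolation; the model reply is made up purely for illustration, and it assumes `parse()` strips the surrounding markdown code fence that the parser's own format instructions ask the model to include:

```js
// Hypothetical model reply, made up for illustration only.
const fakeModelReply =
  '```json\n{"answer": "Leonardo da Vinci", "evidence": "...", "confidence": "95%"}\n```'

// parse() extracts and validates the JSON, returning a plain object
// shaped like { answer: '...', evidence: '...', confidence: '...' }.
const structured = await parser.parse(fakeModelReply)
console.log(structured.answer) // "Leonardo da Vinci"
```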

Use the parser:

```js
import { ChatOllama } from '@langchain/ollama'
import { StructuredOutputParser } from '@langchain/core/output_parsers'
import { PromptTemplate } from '@langchain/core/prompts'

// Note: this prompt contains no format instructions
const prompt = PromptTemplate.fromTemplate('回答该问题 \n{question}')

// Create the chat model instance
const chatModel = new ChatOllama({
  model: 'llama3',
  temperature: 0.7,
})

const parser = StructuredOutputParser.fromNamesAndDescriptions({
  answer: '用户问题的答案',
  evidence: '你回答用户问题所依据的答案',
  confidence: '问题答案的可信度评分,格式是百分数',
})

// Create a chain: prompt -> model -> parser
const chain = prompt.pipe(chatModel).pipe(parser)

// Invoke the chain with the question
const response = await chain.invoke({
  question: '蒙娜丽莎的作者是谁?是什么时候绘制的',
})
console.log('🤖 Response:', response)
```

This throws an error: the answer generated by the LLM cannot be parsed.

To fix this, use a helper method provided by the parser: getFormatInstructions().

This helper generates a very robust set of format instructions that we can drop into the prompt.
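Calling it on the parser defined above and printing the result makes the generated instructions easy to inspect:

```js
// Print the auto-generated format instructions
console.log(parser.getFormatInstructions())
```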

Output:

You must format your output as a JSON value that adheres to a given "JSON Schema" instance.
"JSON Schema" is a declarative language that allows you to annotate and validate JSON documents.
For example, the example "JSON Schema" instance {{"properties": {{"foo": {{"description": "a list of test words", "type": "array", "items": {{"type": "string"}}}}}}, "required": ["foo"]}}
would match an object with one required property, "foo". The "type" property specifies "foo" must be an "array", and the "description" property semantically describes it as "a list of test words". The items within "foo" must be strings.
Thus, the object {{"foo": ["bar", "baz"]}} is a well-formatted instance of this example "JSON Schema". The object {{"properties": {{"foo": ["bar", "baz"]}}}} is not well-formatted.
Your output will be parsed and type-checked according to the provided schema instance, so make sure all fields in your output match the schema exactly and there are no trailing commas!
Here is the JSON Schema instance your output must adhere to. Include the enclosing markdown codeblock:
```json
{"type":"object","properties":{"answer":{"type":"string","description":"用户问题的答案"},"evidence":{"type":"string","description":"你回答用户问题所依据的答案"},"confidence":{"type":"string","description":"问题答案的可信度评分,格式是百分数"}},"required":["answer","evidence","confidence"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}
```

As you can see, this automatically generated prompt is not only clearly formatted but also very robust; it is a production-grade example of how to write this kind of prompt.

Section-by-section analysis#

First, it tells the LLM what type of output is expected:

You must format your output as a JSON value that adheres to a given "JSON Schema" instance.

Next, it uses few-shot examples to show the LLM what a JSON Schema is, which outputs will parse successfully, and which will not:

For example, the example "JSON Schema" instance {{"properties": {{"foo": {{"description": "a list of test words", "type": "array", "items": {{"type": "string"}}}}}}, "required": ["foo"]}}
would match an object with one required property, "foo". The "type" property specifies "foo" must be an "array", and the "description" property semantically describes it as "a list of test words". The items within "foo" must be strings.
Thus, the object {{"foo": ["bar", "baz"]}} is a well-formatted instance of this example "JSON Schema". The object {{"properties": {{"foo": ["bar", "baz"]}}}} is not well-formatted.

Then it reiterates how important the types are: the output must conform to the given JSON Schema instance, with every field strictly matching the schema definition, no extra properties, and no missing required properties.

It also stresses the details, for example not adding trailing commas to the JSON object, which would cause parsing to fail. These prompts are of very high quality: they call out the mistakes LLMs typically make on this task, which effectively safeguards the quality of the output.

Your output will be parsed and type-checked according to the provided schema instance, so make sure all fields in your output match the schema exactly and there are no trailing commas!

Only at the end does it give the JSON schema we specified, together with the description for each field:

Here is the JSON Schema instance your output must adhere to. Include the enclosing markdown codeblock:
```json
{"type":"object","properties":{"answer":{"type":"string","description":"用户问题的答案"},"evidence":{"type":"string","description":"你回答用户问题所依据的答案"},"confidence":{"type":"string","description":"问题答案的可信度评分,格式是百分数"}},"required":["answer","evidence","confidence"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}
```

With this series of prompts, the model can be reliably guided to produce output in the specified format.

Usage example#

```js
import { StructuredOutputParser } from '@langchain/core/output_parsers'
import { PromptTemplate } from '@langchain/core/prompts'
import { ChatOllama } from '@langchain/ollama'

// Create the parser
const parser = StructuredOutputParser.fromNamesAndDescriptions({
  answer: '用户问题的答案',
  evidence: '你回答用户问题所依据的答案',
  confidence: '问题答案的可信度评分,格式是百分数',
})

// Leave an {instructions} slot in the template for the format instructions
const pt = PromptTemplate.fromTemplate(
  '请回答问题:\n{instructions} \n{question}'
)

const model = new ChatOllama({
  model: 'llama3',
  temperature: 0.7,
})

// prompt -> model -> parser
const chain = pt.pipe(model).pipe(parser)

const res = await chain.invoke({
  question: '蒙娜丽莎的作者是谁?是什么时候绘制的',
  instructions: parser.getFormatInstructions(),
})
console.log(res)
```
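If you would rather not pass the instructions on every invoke() call, they can be baked into the template once with partial(). This is a sketch built on the same setup as above, assuming partial() is async in LangChain JS and returns a new template:

```js
// Bake the format instructions into the template once,
// so invoke() only needs the user's question.
const ptWithInstructions = await pt.partial({
  instructions: parser.getFormatInstructions(),
})

const chainWithInstructions = ptWithInstructions.pipe(model).pipe(parser)
const res2 = await chainWithInstructions.invoke({
  question: '蒙娜丽莎的作者是谁?是什么时候绘制的',
})
console.log(res2)
```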

StructuredOutputParser does not support streaming parsing: it can only run parse() once the model has produced the complete output string.
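In practice, that means calling stream() on a chain ending in this parser will not give you field-by-field partial objects. A rough sketch, under the assumption that the parser simply aggregates the streamed model output and parses it once at the end:

```js
// Rough sketch: streaming through a chain that ends in StructuredOutputParser.
// Assumption: parse() runs only on the fully aggregated output, so do not
// expect incremental partial objects from this loop.
const stream = await chain.stream({
  question: '蒙娜丽莎的作者是谁?是什么时候绘制的',
  instructions: parser.getFormatInstructions(),
})

for await (const chunk of stream) {
  // Expect essentially one chunk containing the complete parsed object.
  console.log(chunk)
}
```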