String Output Parser - wangshengliang

Why do we need parsers?

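By default, invoking a chat model in LangChain does not return a plain string. It returns an AIMessage object that carries a lot of metadata along with the generated text.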

Classroom example

import { ChatOllama } from '@langchain/ollama'
import { HumanMessage } from '@langchain/core/messages'

const chatModel = new ChatOllama({
  model: 'llama3',
  temperature: 0.7,
})

const response = await chatModel.invoke([new HumanMessage('Tell me a joke')])

console.log('Response:', response)

Response: AIMessage {
  "content": "Here's one:\n\nWhy couldn't the bicycle stand up by itself?\n\n(wait for it...)\n\nBecause it was two-tired!\n\nHope that made you smile! Do you want to hear another one?",
  "additional_kwargs": {},
  "response_metadata": {
    "model": "llama3",
    "created_at": "2025-06-20T01:35:00.765212Z",
    "done_reason": "stop",
    "done": true,
    "total_duration": 2370986375,
    "load_duration": 56247875,
    "prompt_eval_count": 14,
    "prompt_eval_duration": 1146345791,
    "eval_count": 40,
    "eval_duration": 1167879375
  },
  "tool_calls": [],
  "invalid_tool_calls": [],
  "usage_metadata": {
    "input_tokens": 14,
    "output_tokens": 40,
    "total_tokens": 54
  }
}

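Besides the text in content, the object holds useful metadata, for example:

the model name and creation time (response_metadata)
token usage (usage_metadata)
tool call information (tool_calls)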

This information is valuable to developers, but to end users it is just “noise”.
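The split between developer-facing metadata and user-facing text can be made concrete. The sketch below assumes a hand-written AIMessageLike interface that models only the fields printed above; it is a hypothetical type for illustration, not LangChain's AIMessage class. The metadata goes into a log line for the developer, and only the content goes to the user.

```typescript
// Hypothetical type: a hand-written subset of the fields printed above.
// It is NOT LangChain's AIMessage class, just enough structure for this sketch.
interface AIMessageLike {
  content: string;
  response_metadata: { model?: string; created_at?: string };
  tool_calls: unknown[];
  usage_metadata?: {
    input_tokens: number;
    output_tokens: number;
    total_tokens: number;
  };
}

// Developer-facing: summarize the metadata in one log line.
function describeMetadata(msg: AIMessageLike): string {
  const model = msg.response_metadata.model ?? "unknown model";
  const tokens = msg.usage_metadata?.total_tokens ?? 0;
  return `${model}: ${tokens} tokens, ${msg.tool_calls.length} tool call(s)`;
}

// User-facing: only the generated text matters.
function userText(msg: AIMessageLike): string {
  return msg.content;
}

// Values copied from the response printed above (content shortened).
const sample: AIMessageLike = {
  content: "Here's one: ... Because it was two-tired!",
  response_metadata: { model: "llama3", created_at: "2025-06-20T01:35:00.765212Z" },
  tool_calls: [],
  usage_metadata: { input_tokens: 14, output_tokens: 40, total_tokens: 54 },
};

console.log(describeMetadata(sample)); // llama3: 54 tokens, 0 tool call(s)
console.log(userText(sample));
```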

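Of course, we could pull response.content out of the returned object by hand, but that becomes tedious when you switch between several models or frameworks:

Different models do not return exactly the same format
Some models (such as base completion models) return a plain string directly
Writing response.content everywhere is repetitive and easy to get wrong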
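To see what “doing it by hand” looks like, here is a minimal sketch of a normalizer you might end up copying between projects. The toText name and the ModelOutput/ContentPart types are hypothetical; the only assumption about real models is that output may be a plain string, a message whose content is a string, or a message whose content is a list of parts such as { type: "text", text: "..." }.

```typescript
// Hypothetical shapes, modeled loosely on what different models/frameworks return.
// A part like { type: "text", text: "..." } mirrors multi-part message content.
type ContentPart = { type: string; text?: string };
type ModelOutput = string | { content: string | ContentPart[] };

// Hand-written normalizer: the kind of helper a parser saves you from writing.
function toText(output: ModelOutput): string {
  // Base (completion) models may return a plain string.
  if (typeof output === "string") return output;
  // Chat models usually wrap the text in a message object.
  const { content } = output;
  if (typeof content === "string") return content;
  // Multi-part content: keep only the text parts and join them.
  return content
    .filter((part) => part.type === "text" && typeof part.text === "string")
    .map((part) => part.text as string)
    .join("");
}

console.log(toText("plain string")); // plain string
console.log(toText({ content: "message content" })); // message content
console.log(
  toText({
    content: [{ type: "text", text: "Hello, " }, { type: "image_url" }, { type: "text", text: "world" }],
  }),
); // Hello, world
```

Every new output shape means another branch in a helper like this, which is exactly the maintenance burden a shared parser removes.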

This is where parsers come in.

What parsers do

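Once we have the model's output, a parser converts the raw LLM output into the form we actually need, from plain text to structured data. Common uses include:

Extracting JSON, lists, objects, and code blocks
Cleaning and formatting data
Extracting instruction parameters for Tools / Agents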
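Extraction and cleaning are mostly string work on the raw reply. As a minimal sketch, assuming the model wraps JSON in a fenced code block inside a chatty answer, a hypothetical extractJson helper (not a LangChain API) could look like this:

```typescript
// Three backticks, built at runtime so this snippet contains no literal fence.
const FENCE = "`".repeat(3);

// Hypothetical helper: pull a JSON payload out of raw model text.
// If the text contains a fenced block (optionally tagged, e.g. "json"),
// parse the block body; otherwise parse the whole trimmed text.
function extractJson(raw: string): unknown {
  const start = raw.indexOf(FENCE);
  if (start !== -1) {
    const end = raw.indexOf(FENCE, start + FENCE.length);
    if (end !== -1) {
      let body = raw.slice(start + FENCE.length, end);
      // Drop an optional language tag line such as "json".
      body = body.replace(/^[a-zA-Z]*\s*\n/, "");
      return JSON.parse(body.trim());
    }
  }
  return JSON.parse(raw.trim());
}

const reply =
  "Sure! Here is the data:\n" + FENCE + "json\n{\"name\": \"llama3\"}\n" + FENCE + "\nAnything else?";
console.log(extractJson(reply)); // { name: 'llama3' }
```

Real model output is messier than this (missing fences, trailing commas, several blocks), which is one reason to rely on a tested parser rather than ad hoc regexes.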

Parser types

String output parser
Structured output parser
List output parser
…
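The parser types differ mainly in what they return. The sketch below uses a hypothetical TextParser<T> interface, which is not LangChain's API, to show the idea: the same raw text becomes a string, a list, or an object depending on the parser. LangChain ships its own classes for these roles (such as CommaSeparatedListOutputParser and StructuredOutputParser); the code here only imitates the concept.

```typescript
// Hypothetical interface: every parser turns raw model text into some type T.
interface TextParser<T> {
  parse(text: string): T;
}

// String parser: hands the text back unchanged.
const stringParser: TextParser<string> = {
  parse: (text) => text,
};

// List parser: splits a comma-separated answer into trimmed, non-empty items.
const listParser: TextParser<string[]> = {
  parse: (text) =>
    text
      .split(",")
      .map((item) => item.trim())
      .filter((item) => item.length > 0),
};

// Structured parser: expects the text to be a JSON object.
const objectParser: TextParser<Record<string, unknown>> = {
  parse: (text) => {
    const value: unknown = JSON.parse(text);
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      throw new Error("Expected a JSON object");
    }
    return value as Record<string, unknown>;
  },
};

const raw = "red, green , blue";
console.log(stringParser.parse(raw)); // red, green , blue
console.log(listParser.parse(raw)); // [ 'red', 'green', 'blue' ]
console.log(objectParser.parse('{"joke": "two-tired"}')); // { joke: 'two-tired' }
```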

String output parser

In practice, we often care only about the text the model generates, not a large object full of metadata.

In that case, use StringOutputParser: it is the most basic and most commonly used output parser.

import { ChatOllama } from '@langchain/ollama'
import { HumanMessage } from '@langchain/core/messages'
// Import the output parser
import { StringOutputParser } from '@langchain/core/output_parsers'

// Create the chat model instance
const chatModel = new ChatOllama({
  model: 'llama3',
  temperature: 0.7,
})

// Create the parser and connect it to the model with pipe()
const parser = new StringOutputParser()
const chain = chatModel.pipe(parser)

// Invoke the chain; the result is already plain text
const response = await chain.invoke([new HumanMessage('Tell me a joke in Chinese')])

console.log('🤖 Response text:', response)

This is the simplest parser. Without StringOutputParser, what you get back is a structured object:

AIMessage {
  content: "...",
  usage_metadata: {...},
  response_metadata: {...},
  ...
}

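You would have to extract the .content field manually before using it. StringOutputParser performs this parsing and extraction automatically, so the chain returns the text content as a string and the tedious step disappears.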
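Conceptually, chatModel.pipe(parser) chains two steps: the model's output becomes the parser's input. The sketch below imitates that with a hypothetical SimpleRunnable class, a fake model, and a fake string parser. None of these names come from LangChain, and the real invoke() is asynchronous, whereas this one is synchronous for brevity.

```typescript
// Hypothetical minimal "runnable": something with invoke() and pipe().
class SimpleRunnable<In, Out> {
  private readonly fn: (input: In) => Out;

  constructor(fn: (input: In) => Out) {
    this.fn = fn;
  }

  invoke(input: In): Out {
    return this.fn(input);
  }

  // pipe() builds a new runnable that feeds this step's output into the next step.
  pipe<Next>(next: SimpleRunnable<Out, Next>): SimpleRunnable<In, Next> {
    return new SimpleRunnable((input: In) => next.invoke(this.invoke(input)));
  }
}

// Hypothetical message shape: text plus some metadata, like an AIMessage.
interface FakeAIMessage {
  content: string;
  usage_metadata: { total_tokens: number };
}

// Stand-in for the chat model: returns a message object, not a string.
const fakeModel = new SimpleRunnable<string, FakeAIMessage>((prompt) => ({
  content: `Echo: ${prompt}`,
  usage_metadata: { total_tokens: 42 },
}));

// Stand-in for StringOutputParser: keeps only the text content.
const fakeStringParser = new SimpleRunnable<FakeAIMessage, string>((msg) => msg.content);

const chain = fakeModel.pipe(fakeStringParser);
console.log(fakeModel.invoke("Tell me a joke")); // the whole message object
console.log(chain.invoke("Tell me a joke")); // Echo: Tell me a joke
```

The caller of chain never touches .content; that knowledge lives in one place, the parser step.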

Streaming example:

import { ChatOllama } from '@langchain/ollama'
import { HumanMessage } from '@langchain/core/messages'
import { StringOutputParser } from '@langchain/core/output_parsers'

// Create the model instance (ChatOllama supports streaming responses)
const chatModel = new ChatOllama({
  model: 'llama3',
  temperature: 0.7,
})

// Create the output parser and build the chain
const parser = new StringOutputParser()
const chain = chatModel.pipe(parser)

// Start a streaming call
const stream = await chain.stream([new HumanMessage('Tell me a joke in Chinese')])

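// Print the response chunk by chunk; each chunk is already a string
// because the chain ends with StringOutputParser
for await (const chunk of stream) {
  process.stdout.write(chunk)
}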
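To see why each chunk in the loop above can be written out directly, here is a self-contained sketch with no LangChain dependency. A hypothetical fakeModelStream async generator plays the model and yields message-like chunks, and parseStream plays the parser by mapping every chunk to its text as soon as it arrives. These names are illustrative assumptions, not LangChain APIs.

```typescript
// Hypothetical chunk shape: each streamed piece carries a bit of text.
interface FakeChunk {
  content: string;
}

// Stand-in for a streaming model: yields the answer a few characters at a time.
async function* fakeModelStream(answer: string, size = 4): AsyncGenerator<FakeChunk> {
  for (let i = 0; i < answer.length; i += size) {
    yield { content: answer.slice(i, i + size) };
  }
}

// Stand-in for a string parser on a stream: maps each chunk to its text
// as it arrives, so downstream code only ever sees strings.
async function* parseStream(chunks: AsyncIterable<FakeChunk>): AsyncGenerator<string> {
  for await (const chunk of chunks) {
    yield chunk.content;
  }
}

// Collect a string stream into one value (handy for logging or tests).
async function collect(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const piece of stream) {
    text += piece;
  }
  return text;
}

// Usage: handle each piece as soon as it arrives.
async function main(): Promise<void> {
  for await (const piece of parseStream(fakeModelStream("Because it was two-tired!"))) {
    console.log(piece);
  }
}

void main();
```

Because the parser transforms chunks one by one instead of waiting for the full answer, streaming still works after the parser is added to the chain.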