Create a text-to-speech request

Generates audio from the input text. The endpoint returns binary audio data, which the caller must handle.

Authorization

Authorization: Bearer <token> (required)

Authenticate by adding the header 'Authorization: Bearer {account API Key}'.
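As a minimal sketch of the authentication step, the headers can be built once and reused for every request. The helper name build_headers and the environment variable SILICONFLOW_API_KEY are assumptions chosen for illustration; neither is defined by the API.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return request headers that carry the API key as a Bearer token."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# SILICONFLOW_API_KEY is an assumed variable name; "sk-xx" is the placeholder
# key used elsewhere on this page.
headers = build_headers(os.environ.get("SILICONFLOW_API_KEY", "sk-xx"))
```

Reading the key from the environment keeps it out of source files; any other secret store would work equally well.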

In: header

Request Body

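model string required

MOSS-TTSD (text to spoken dialogue) is an open-source bilingual spoken-dialogue synthesis model that supports Chinese and English. It converts a dialogue script between two speakers into natural, expressive conversational speech. MOSS-TTSD supports voice cloning and long single-session speech generation, which makes it well suited to AI podcast production.

To improve service quality, we periodically change the models offered by this service, including but not limited to adding or retiring models and adjusting model capabilities. Where feasible, we will notify you of such changes through announcements, push messages, or other appropriate means.

Value in: "fnlp/MOSS-TTSD-v0.5"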

input string required

The dialogue text uses speaker tags to mark turns: [S1] indicates that speaker 1 is talking, and [S2] indicates that speaker 2 is talking.

Length: 1 <= length <= 128000

Example: "[S1]Hello, how are you today?[S2]I'm doing great, thanks for asking![S1]That's wonderful to hear."
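The speaker-tag convention above can be sketched as a small helper that joins turns into one input string. build_dialogue is a hypothetical name, and the length check assumes the 1 to 128000 limit counts characters, which this page does not state explicitly.

```python
def build_dialogue(turns):
    """Join (speaker, text) turns into one string tagged with [S1] / [S2]."""
    parts = []
    for speaker, text in turns:
        if speaker not in (1, 2):
            raise ValueError("only speakers 1 and 2 are supported")
        parts.append(f"[S{speaker}]{text}")
    script = "".join(parts)
    # Assumption: the documented length limit is measured in characters.
    if not 1 <= len(script) <= 128000:
        raise ValueError("input length must be between 1 and 128000")
    return script


script = build_dialogue([
    (1, "Hello, how are you today?"),
    (2, "I'm doing great, thanks for asking!"),
])
```

The resulting string can be passed as the input field of the request body.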

max_tokens integer

Maximum number of tokens to generate. Input plus output must not exceed 32k tokens.

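references array<object>

The voice and references fields are mutually exclusive. Scripted dialogue requires passing two voices through the references field. Scripted dialogue is available only with the MOSS-TTSD model.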

voice string

The voice field does not currently support two voices. To supply two voices, use the references field.

Example: "fnlp/MOSS-TTSD-v0.5:alex"

response_format string

Output format of the audio. Supported formats are mp3, opus, wav, and pcm.

Default: "mp3"

Value in: "mp3" | "opus" | "wav" | "pcm"
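Raw PCM typically carries no container header, so many players cannot open it directly. The following is a minimal sketch of wrapping such bytes in a WAV container with Python's standard wave module. The 16-bit mono layout is an assumption, because this page does not state the sample width or channel count of the pcm output, and pcm_to_wav is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container and return the WAV bytes.

    Assumption: 16-bit (2-byte) mono samples; adjust if the service differs.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

The sample_rate argument should match the sample_rate value sent in the request, otherwise playback speed and pitch will be off.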

sample_rate number

Controls the output sample rate. Supported values and defaults differ by output format. opus: 48000 Hz only. wav, pcm: 8000, 16000, 24000, 32000, or 44100 Hz, default 44100 Hz. mp3: 32000 or 44100 Hz, default 44100 Hz.

Default: 32000
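Because the accepted sample rates depend on the output format, it can help to check the combination on the client before sending a request. The sketch below encodes the rates listed above in a lookup table; SUPPORTED_RATES and check_sample_rate are hypothetical names, and the server remains the authority on what it accepts.

```python
# Sample rates per output format, copied from the sample_rate description.
SUPPORTED_RATES = {
    "opus": {48000},
    "wav": {8000, 16000, 24000, 32000, 44100},
    "pcm": {8000, 16000, 24000, 32000, 44100},
    "mp3": {32000, 44100},
}


def check_sample_rate(response_format: str, sample_rate: int) -> bool:
    """Return True if sample_rate is listed for the given output format."""
    allowed = SUPPORTED_RATES.get(response_format)
    if allowed is None:
        raise ValueError(f"unknown response_format: {response_format}")
    return sample_rate in allowed
```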

stream boolean

Whether to use streaming output.

Value in: false | true

speed number

Speaking rate of the generated audio. Valid values range from 0.25 to 4.0. The default is 1.0.

Default: 1

Format: float

Range: 0.25 <= value <= 4

gain number

Audio gain, used to adjust the output volume. Valid values range from -10.0 to 10.0. The default is 0.0.

Format: float

Range: -10 <= value <= 10
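The documented ranges for speed and gain can be enforced before a request is sent. This is a minimal sketch under the assumption that failing fast on the client is desirable; build_payload is a hypothetical helper, while the field names come from the request body described above.

```python
def build_payload(model: str, text: str, voice: str,
                  speed: float = 1.0, gain: float = 0.0,
                  response_format: str = "mp3") -> dict:
    """Assemble a request body, rejecting out-of-range speed or gain."""
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if not -10.0 <= gain <= 10.0:
        raise ValueError("gain must be between -10.0 and 10.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "speed": speed,
        "gain": gain,
        "response_format": response_format,
    }


payload = build_payload(
    "fnlp/MOSS-TTSD-v0.5",
    "[S1]Hello, how are you today?",
    "fnlp/MOSS-TTSD-v0.5:alex",
    speed=1.25,
)
```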

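model string required

Name of the model to use. To improve service quality, we periodically change the models offered by this service, including but not limited to adding or retiring models and adjusting model capabilities. Where feasible, we will notify you of such changes through announcements, push messages, or other appropriate means.

Value in: "FunAudioLLM/CosyVoice2-0.5B"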

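input string required

For natural-language instructions, place the description, followed by the special end marker "<|endofprompt|>", before the text to synthesize. Descriptions can cover emotion, speaking rate, role-play, and dialect. For fine-grained instructions, insert markers such as "[laughter]" and "[breath]" between text tokens. Vocal feature tags can also be applied to phrases. For example: Can you say it with a happy emotion? <|endofprompt|> Today is really happy, Spring Festival is coming! I'm so happy, Spring Festival is coming! [laughter] [breath].

Length: 1 <= length <= 128000

Example: "Can you say it with a happy emotion? <|endofprompt|>I'm so happy, Spring Festival is coming!"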
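The instruction format shown in the example can be sketched as a one-line helper that joins the instruction, the end marker, and the text to synthesize. with_instruction is a hypothetical name, and the spacing mirrors the example string on this page rather than any documented rule.

```python
END_OF_PROMPT = "<|endofprompt|>"


def with_instruction(instruction: str, text: str) -> str:
    """Prefix text with a natural-language instruction and the end marker."""
    return f"{instruction.strip()} {END_OF_PROMPT}{text}"


prompt = with_instruction(
    "Can you say it with a happy emotion?",
    "I'm so happy, Spring Festival is coming!",
)
```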

voice string

The voice field does not currently support two voices. To supply two voices, use the references field.

Example: "FunAudioLLM/CosyVoice2-0.5B:alex"

references array<object>

The voice and references fields are mutually exclusive.

response_format string

Output format of the audio. Supported formats are mp3, opus, wav, and pcm.

Default: "mp3"

Value in: "mp3" | "opus" | "wav" | "pcm"

sample_rate number

Default: 32000

stream boolean

Whether to use streaming output.

speed number

Speaking rate of the generated audio. Valid values range from 0.25 to 4.0. The default is 1.0.

Default: 1

Format: float

Range: 0.25 <= value <= 4

gain number

Audio gain, used to adjust the output volume. Valid values range from -10.0 to 10.0. The default is 0.0.

Format: float

Range: -10 <= value <= 10

Response Body

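Generates audio from the input text. The endpoint returns binary data, which the caller must handle. The response headers include an x-siliconcloud-trace-id field, a unique trace identifier for the request that helps with log lookup and troubleshooting.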
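Recording the trace identifier alongside each request makes later troubleshooting easier. The sketch below reads it from any header mapping; get_trace_id is a hypothetical helper, and the case-insensitive lookup is a precaution for plain dictionaries (the headers object of the requests library already matches names case-insensitively).

```python
TRACE_HEADER = "x-siliconcloud-trace-id"


def get_trace_id(headers):
    """Return the trace ID from a header mapping, or None if it is absent."""
    for name, value in headers.items():
        if name.lower() == TRACE_HEADER:
            return value
    return None
```

With the requests library this would be called as get_trace_id(response.headers) and the result written to the application log.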


response file

Format: binary

Example: "Binary audio data"

curl --location 'https://api.siliconflow.cn/v1/audio/speech' \
--header 'Authorization: Bearer sk-xx' \
--header 'Content-Type: application/json' \
--data '{
  "model": "fnlp/MOSS-TTSD-v0.5",
  "input": "你站在桥上看风景，看风景的人在楼上看你。明月装饰了你的窗子，你装饰了别人的梦",
  "voice": "fnlp/MOSS-TTSD-v0.5:alex",
  "response_format": "mp3",
  "stream": true
}'

import requests

url = "https://api.siliconflow.cn/v1/audio/speech"

payload = {
    "model": "fnlp/MOSS-TTSD-v0.5",
    "input": "你站在桥上看风景，看风景的人在楼上看你。明月装饰了你的窗子，你装饰了别人的梦",
    "voice": "fnlp/MOSS-TTSD-v0.5:alex",
    "response_format": "mp3",
    "stream": True
}

headers = {
    "Authorization": "Bearer sk-xx",  # Replace with your real API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers, stream=True)

if response.status_code == 200:
    # Write the streamed response to a file chunk by chunk
    with open("output.mp3", "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
    print("Audio saved to output.mp3")
else:
    print(f"Request failed, status code: {response.status_code}")
    print(f"Error message: {response.text}")

const axios = require('axios');
const fs = require('fs');
const url = 'https://api.siliconflow.cn/v1/audio/speech';
const data = {
    model: "fnlp/MOSS-TTSD-v0.5",
    input: "你站在桥上看风景，看风景的人在楼上看你。明月装饰了你的窗子，你装饰了别人的梦",
    voice: "fnlp/MOSS-TTSD-v0.5:alex",
    response_format: "mp3",
    stream: true
};

const config = {
    method: 'post',
    url: url,
    headers: {
        'Authorization': 'Bearer sk-xx', // Replace with your real API key
        'Content-Type': 'application/json'
    },
    data: data,
    responseType: 'stream' // Important: receive the response as a stream
};
axios(config)
    .then(function (response) {
        // Pipe the streamed data into a file
        const writer = fs.createWriteStream('output.mp3');
        response.data.pipe(writer);

        writer.on('finish', () => {
            console.log('Audio saved to output.mp3');
        });

        writer.on('error', (err) => {
            console.error('Error writing file:', err);
        });
    })
    .catch(function (error) {
        console.error('Request failed:', error.message);
        if (error.response) {
            console.error('Status code:', error.response.status);
        }
    });

200

"Binary audio data"

400

{
  "code": 20012,
  "message": "string",
  "data": "string"
}

401

"Invalid token"

403

"Forbidden"

404

"404 page not found"

429

{
  "message": "Request was rejected due to rate limiting. If you want more, please contact contact@siliconflow.cn. Details:TPM limit reached.",
  "data": "string"
}

503

{
  "code": 50505,
  "message": "Model service overloaded. Please try again later.",
  "data": "string"
}

504

"string"
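Some of the status codes above (429 for rate limiting, 503 for an overloaded model service, 504) describe conditions that may clear on their own. The sketch below retries such responses with exponential backoff. Treating exactly these three codes as retryable, the attempt count, and the delays are all assumptions, and call_with_retry is a hypothetical helper rather than part of any SDK.

```python
import time

# Assumption: these codes indicate transient conditions worth retrying.
RETRYABLE = {429, 503, 504}


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send must return an object with a status_code attribute, such as a
    requests.Response. The delay doubles after each failed attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in RETRYABLE or attempt == max_attempts:
            return response
        sleep(base_delay * 2 ** (attempt - 1))
```

With the requests library, send could be a lambda wrapping requests.post(url, json=payload, headers=headers). Errors such as 400, 401, 403, and 404 are returned immediately, since repeating the same request is unlikely to change the outcome.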