Create a Text-to-Speech Request
Generate audio from input text. The endpoint returns raw binary audio data, which the caller must handle directly.
MOSS-TTSD (text to spoken dialogue) is an open-source bilingual spoken dialogue synthesis model that supports both Chinese and English. It can transform dialogue scripts between two speakers into natural, expressive conversational speech. MOSS-TTSD supports voice cloning and long single-session speech generation, making it ideal for AI podcast production.
To improve service quality, we periodically change the models offered by this service, including but not limited to bringing models online or taking them offline and adjusting model service capabilities. Where feasible, we will notify you of such changes through announcements, message pushes, or other appropriate means.
"fnlp/MOSS-TTSD-v0.5"
The dialogue text uses speaker tags to indicate turns:
[S1]: Speaker 1 is speaking
[S2]: Speaker 2 is speaking
1 <= length <= 128000
"[S1]Hello, how are you today?[S2]I'm doing great, thanks for asking![S1]That's wonderful to hear "
The maximum number of tokens to generate. The input and output together must not exceed 32k tokens.
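The speaker-tag format described above can be assembled programmatically. The following is a minimal sketch, assuming dialogue turns are held as (speaker, text) pairs and that the length limit counts characters; the helper name `build_dialogue_input` is hypothetical and not part of the API.

```python
def build_dialogue_input(turns):
    """Join (speaker, text) turns into one [S1]/[S2]-tagged string.

    `turns` is a list of (speaker, text) pairs, where speaker is 1 or 2.
    Illustrative helper only; it is not provided by the service.
    """
    parts = []
    for speaker, text in turns:
        if speaker not in (1, 2):
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        parts.append(f"[S{speaker}]{text.strip()}")
    script = "".join(parts)
    # Documented constraint: 1 <= length <= 128000.
    # Assumption: the length is measured in characters.
    if not 1 <= len(script) <= 128000:
        raise ValueError("input length must be between 1 and 128000")
    return script


script = build_dialogue_input([
    (1, "Hello, how are you today?"),
    (2, "I'm doing great, thanks for asking!"),
])
print(script)
```

The resulting string can be passed as the "input" value of the request body.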
The voice field and the references field are mutually exclusive. To use scripted dialogue, pass two voice timbres through the references field. Scripted dialogue is only available for the MOSS model.
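Because the two fields are mutually exclusive, it can help to check a request body before sending it. This is a small sketch under stated assumptions: the key names "voice" and "references" come from the text above, the function name `check_voice_fields` is hypothetical, and the structure of a references entry is not documented here, so the example uses an opaque placeholder.

```python
def check_voice_fields(payload):
    """Raise ValueError if both "voice" and "references" are set.

    Client-side sanity check only; the server remains the authority
    on what it accepts.
    """
    has_voice = payload.get("voice") is not None
    has_references = payload.get("references") is not None
    if has_voice and has_references:
        raise ValueError('"voice" and "references" are mutually exclusive')
    return payload


ok_payload = check_voice_fields({
    "model": "fnlp/MOSS-TTSD-v0.5",
    "input": "[S1]Hello.[S2]Hi there.",
    "voice": "fnlp/MOSS-TTSD-v0.5:alex",
})
print(sorted(ok_payload))
```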
The "voice" field currently does not support two timbres. If you need to upload two timbres, please use "reference".
"fnlp/MOSS-TTSD-v0.5:alex"
The output audio format. Supported formats are mp3, opus, wav, and pcm.
"mp3"
"mp3" | "opus" | "wav" | "pcm"
Controls the output sample rate. The supported values and defaults differ by output format:
opus: supports 48000 Hz.
wav, pcm: support 8000, 16000, 24000, 32000, and 44100 Hz, with a default of 44100 Hz.
mp3: supports 32000 and 44100 Hz, with a default of 44100 Hz.
32000
Whether to stream the output.
false | true
The speed of the generated audio. Select a value from 0.25 to 4.0. The default is 1.0.
1
float, 0.25 <= value <= 4
float, -10 <= value <= 10
Corresponding model name.
"FunAudioLLM/CosyVoice2-0.5B"
For natural language instructions, add the special end marker "<|endofprompt|>" after the natural language description and before the text to be spoken. These descriptions cover aspects such as emotion, speaking speed, role-playing, and dialects. For fine-grained instructions, insert vocal bursts between text segments, using markers like "[laughter]" and "[breath]". Feature markers can also be applied to phrases. For example:
Can you say it with a happy emotion? <|endofprompt|> Today is really happy, Spring Festival is coming! I’m so happy, Spring Festival is coming! [laughter] [breath].
1 <= length <= 128000
"Can you say it with a happy emotion? <|endofprompt|>I'm so happy, Spring Festival is coming!"
The "voice" field currently does not support two timbres. If you need to upload two timbres, please use "reference".
"FunAudioLLM/CosyVoice2-0.5B:alex"
The voice field and references field are mutually exclusive.
The output audio format. Supported formats are mp3, opus, wav, and pcm.
"mp3"
"mp3" | "opus" | "wav" | "pcm"
Controls the output sample rate. The supported values and defaults differ by output format:
opus: supports 48000 Hz.
wav, pcm: support 8000, 16000, 24000, 32000, and 44100 Hz, with a default of 44100 Hz.
mp3: supports 32000 and 44100 Hz, with a default of 44100 Hz.
32000
Whether to stream the output.
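The per-format sample-rate rules listed above can be captured in a lookup table and checked before a request is sent. A minimal sketch: the names `SUPPORTED_SAMPLE_RATES`, `DEFAULT_SAMPLE_RATES`, and `resolve_sample_rate` are hypothetical, and treating 48000 Hz as the default for opus is an assumption, since the text only states that opus supports 48000 Hz.

```python
# Supported sample rates (Hz) per output format, as listed in the text.
SUPPORTED_SAMPLE_RATES = {
    "opus": (48000,),
    "wav": (8000, 16000, 24000, 32000, 44100),
    "pcm": (8000, 16000, 24000, 32000, 44100),
    "mp3": (32000, 44100),
}

# Documented defaults for wav, pcm, and mp3.
# Assumption: opus defaults to its only supported rate.
DEFAULT_SAMPLE_RATES = {"opus": 48000, "wav": 44100, "pcm": 44100, "mp3": 44100}


def resolve_sample_rate(response_format, sample_rate=None):
    """Return a valid sample rate for the format, or raise ValueError."""
    if response_format not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported format: {response_format!r}")
    if sample_rate is None:
        return DEFAULT_SAMPLE_RATES[response_format]
    if sample_rate not in SUPPORTED_SAMPLE_RATES[response_format]:
        allowed = SUPPORTED_SAMPLE_RATES[response_format]
        raise ValueError(f"{response_format} supports {allowed}, got {sample_rate}")
    return sample_rate


print(resolve_sample_rate("mp3"))
print(resolve_sample_rate("wav", 16000))
```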
The speed of the generated audio. Select a value from 0.25 to 4.0. The default is 1.0.
1
float, 0.25 <= value <= 4
float, -10 <= value <= 10
Response Body
Generates audio from the input text. The endpoint returns binary data that the caller must process directly. The response headers include the x-siliconcloud-trace-id field, a unique identifier for tracing requests that helps with log queries and troubleshooting.
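When reporting a problem, it is useful to log the trace ID alongside the request. The sketch below assumes the response headers are available as a plain mapping of header names to values; the function name `get_trace_id` is hypothetical. The lookup is case-insensitive because HTTP header names are not case-sensitive.

```python
TRACE_HEADER = "x-siliconcloud-trace-id"


def get_trace_id(headers):
    """Return the trace ID from a mapping of response headers, or None."""
    for name, value in headers.items():
        if name.lower() == TRACE_HEADER:
            return value
    return None


# Example with a hand-written header mapping (the value is a placeholder).
example_headers = {
    "Content-Type": "audio/mpeg",
    "X-Siliconcloud-Trace-Id": "placeholder-trace-id",
}
print(get_trace_id(example_headers))
```

With the `requests` library, `response.headers` can be passed to this function directly.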
binary
"Binary audio data"
curl --location 'https://api.siliconflow.cn/v1/audio/speech' \
--header 'Authorization: Bearer sk-xx' \
--header 'Content-Type: application/json' \
--data '{
"model": "fnlp/MOSS-TTSD-v0.5",
"input": "你站在桥上看风景,看风景的人在楼上看你。明月装饰了你的窗子,你装饰了别人的梦",
"voice": "fnlp/MOSS-TTSD-v0.5:alex",
"response_format": "mp3",
"stream": true
}'
import requests

url = "https://api.siliconflow.cn/v1/audio/speech"

payload = {
    "model": "fnlp/MOSS-TTSD-v0.5",
    "input": "你站在桥上看风景,看风景的人在楼上看你。明月装饰了你的窗子,你装饰了别人的梦",
    "voice": "fnlp/MOSS-TTSD-v0.5:alex",
    "response_format": "mp3",
    "stream": True
}
headers = {
    "Authorization": "Bearer sk-xx",  # Replace with your real API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers, stream=True)

if response.status_code == 200:
    # Write the streamed response to a file
    with open("output.mp3", "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
    print("Audio saved to output.mp3")
else:
    print(f"Request failed, status code: {response.status_code}")
    print(f"Error message: {response.text}")
const axios = require('axios');
const fs = require('fs');

const url = 'https://api.siliconflow.cn/v1/audio/speech';

const data = {
  model: "fnlp/MOSS-TTSD-v0.5",
  input: "你站在桥上看风景,看风景的人在楼上看你。明月装饰了你的窗子,你装饰了别人的梦",
  voice: "fnlp/MOSS-TTSD-v0.5:alex",
  response_format: "mp3",
  stream: true
};

const config = {
  method: 'post',
  url: url,
  headers: {
    'Authorization': 'Bearer sk-xx', // Replace with your real API key
    'Content-Type': 'application/json'
  },
  data: data,
  responseType: 'stream' // Important: receive the response as a stream
};

axios(config)
  .then(function (response) {
    // Write the streamed data to a file
    const writer = fs.createWriteStream('output.mp3');
    response.data.pipe(writer);
    writer.on('finish', () => {
      console.log('Audio saved to output.mp3');
    });
    writer.on('error', (err) => {
      console.error('Error writing file:', err);
    });
  })
  .catch(function (error) {
    console.error('Request failed:', error.message);
    if (error.response) {
      console.error('Status code:', error.response.status);
    }
  });
"Binary audio data"
{
  "code": 20012,
  "message": "string",
  "data": "string"
}
"Invalid token"
"Forbidden"
"404 page not found"
{
  "message": "Request was rejected due to rate limiting. If you want more, please contact contact@siliconflow.cn. Details:TPM limit reached.",
  "data": "string"
}
{
  "code": 50505,
  "message": "Model service overloaded. Please try again later.",
  "data": "string"
}
"string"
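The error bodies above come in two shapes: a JSON object with a "message" (and sometimes a "code"), or a bare string. The following sketch normalizes both into one readable message. The function name `describe_error` is hypothetical, and the source does not state which HTTP status code accompanies each body, so the helper looks only at the body text.

```python
import json


def describe_error(body_text):
    """Turn an error response body into a single readable message.

    Handles JSON objects such as {"code": ..., "message": ...},
    JSON-encoded strings such as "Invalid token", and plain text.
    """
    try:
        body = json.loads(body_text)
    except ValueError:
        # Not valid JSON: treat the body as plain text.
        return body_text.strip()
    if isinstance(body, dict):
        message = body.get("message", "")
        code = body.get("code")
        if code is not None:
            return f"code {code}: {message}"
        return message
    return str(body)


print(describe_error('{"code": 50505, "message": "Model service overloaded. Please try again later.", "data": "string"}'))
print(describe_error('"Invalid token"'))
```

In the Python request example, `response.text` could be passed to this function in the failure branch.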