Authorization: Bearer ********************{
"model": "gemini-2.5-flash",
"messages": [
{
"content": [
{
"text": "分析总结下面的图片,音频,视频各自主要讲了什么?",
"type": "text"
},
{
"image_url": {
"url": "https://ark-project.tos-cn-beijing.volces.com/images/view.jpeg"
},
"type": "image_url"
},
{
"type": "audio_url",
"audio_url": {
"url": "https://xxx.com/xxxx/audio.mp3"
}
},
{
"type": "video_url",
"video_url": {
"url": "https://xxx.com/xxxx/video.mp4"
}
}
],
"role": "user"
}
]
}curl --location --request POST 'https://api.mixrouter.com/v1/chat/completions' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "gemini-2.5-flash",
"messages": [
{
"content": [
{
"text": "分析总结下面的图片,音频,视频各自主要讲了什么?",
"type": "text"
},
{
"image_url": {
"url": "https://ark-project.tos-cn-beijing.volces.com/images/view.jpeg"
},
"type": "image_url"
},
{
"type": "audio_url",
"audio_url": {
"url": "https://xxx.com/xxxx/audio.mp3"
}
},
{
"type": "video_url",
"video_url": {
"url": "https://xxx.com/xxxx/video.mp4"
}
}
],
"role": "user"
}
]
}'{
"id": "chatcmpl-toio20251208234104228824573AWv2q6t",
"model": "gemini-2.5-flash",
"object": "chat.completion",
"created": 1765208493,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "好的,我将分别分析总结图片、音频和视频的主要内容:\n\n1. **图片 (Image):**\n * **主要内容:** 这张图片展示了一幅宁静而壮丽的自然风光。画面主体是一位划着橙色皮划艇的人,漂浮在清澈平静的蓝色湖面上。背景是连绵起伏、白雪覆盖的雄伟山脉,山脚下是茂密的墨绿色针叶林。天空湛蓝,有些许白云,整体色调以冷色系为主,给人一种清爽、开阔和宁静的感觉,充满了户外探险和亲近自然的氛围。\n\n2. **音频 (Audio):**\n * **主要内容:** 音频是一段中文普通话的语音。一位男士清晰地说到:“这些牌从小到大排列,具有某种递增的规律。”\n * **总结:** 音频内容是关于描述“牌”的排列方式,强调它们是按照从小到大、具有递增规律的方式进行排列的。这可能是在描述某个游戏规则、逻辑排序或者某种模式。\n\n3. **视频 (Video):**\n * **主要内容:** 视频的主角是一只三花猫(calico cat),它坐在钢琴凳上,模仿人类的姿势,用前爪“弹奏”着一架黑色大钢琴。钢琴谱架上摆放着乐谱。视频的灯光效果很有趣,将猫和钢琴置于聚光灯下,背景黑暗,营造出一种戏剧性和幽默感。视频伴随着钢琴演奏的音乐,让猫咪的“表演”显得活灵活现。\n * **总结:** 视频主要展示了一只可爱的猫咪在“演奏”钢琴的幽默场景,结合了钢琴音乐和特殊的灯光效果,呈现出一个拟人化且富有艺术感的趣味画面。"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 1626,
"completion_tokens": 1526,
"total_tokens": 3152,
"prompt_tokens_details": {
"cached_tokens": 0,
"text_tokens": 15,
"audio_tokens": 301,
"image_tokens": 0
},
"completion_tokens_details": {
"text_tokens": 0,
"audio_tokens": 0,
"reasoning_tokens": 1113
},
"output_tokens_details": {
"text_tokens": 0,
"audio_tokens": 0,
"reasoning_tokens": 0
},
"input_tokens": 0,
"output_tokens": 0,
"input_tokens_details": null,
"claude_cache_creation_5_m_tokens": 0,
"claude_cache_creation_1_h_tokens": 0
}
}