<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.2">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2025-02-12T14:32:16+00:00</updated><id>/feed.xml</id><title type="html">Jakevin’s Blog</title><subtitle>A notebook for tech, life, and whatever comes to mind</subtitle><entry><title type="html">How to Get a Groq API Key</title><link href="/ai/how-to-get-groq-api-key/" rel="alternate" type="text/html" title="How to Get a Groq API Key" /><published>2025-02-12T10:42:00+00:00</published><updated>2025-02-12T10:42:00+00:00</updated><id>/ai/how-to-get-groq-api-key</id><content type="html" xml:base="/ai/how-to-get-groq-api-key/"><![CDATA[<p>URL: <a href="https://console.groq.com/playground">Groq console</a></p>

<p>Groq, aka the AI philanthropist, offers quite a few extremely fast, high-quality AI models.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-01.png" width="60%" /></p>

<p>It also has the latest Qwen 2.5 and the distilled DeepSeek R1.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-05.png" width="60%" /></p>

<p>Register and sign in: <a href="https://console.groq.com/login">Login</a></p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-02.png" width="60%" /></p>

<p>Go check your email.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-06.png" width="60%" /></p>

<p>Open the console, select <code class="language-plaintext highlighter-rouge">API keys</code>, then <code class="language-plaintext highlighter-rouge">Create API Key</code>.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-03.png" width="60%" /></p>

<p>Important! Important! Important! Be sure to copy the key now; you won't be able to find it again later.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-04.png" width="60%" /></p>

<p>That completes the Groq signup. How do you use it? Read on.</p>

<h3 id="case-1">Case 1</h3>

<p>If the service you're using shows <code class="language-plaintext highlighter-rouge">https://api.openai.com/v1/chat/completions</code>,
change it to <code class="language-plaintext highlighter-rouge">https://api.groq.com/openai/v1/chat/completions</code>.</p>

<h3 id="case-2">Case 2</h3>

<p>If it shows <code class="language-plaintext highlighter-rouge">https://api.openai.com/v1</code>,
change it to <code class="language-plaintext highlighter-rouge">https://api.groq.com/openai/v1</code>.</p>

<h3 id="case-3">Case 3</h3>

<p>Key = API key. If the service asks you to fill in an API key, OpenAI's format looks like <code class="language-plaintext highlighter-rouge">sk-1122334455667788</code>;
just paste in the key you got from Groq, which looks like <code class="language-plaintext highlighter-rouge">gsk_1122334455667788</code>.</p>

<p>If you see <code class="language-plaintext highlighter-rouge">Bearer sk-1122334455667788</code>,
change it to <code class="language-plaintext highlighter-rouge">Bearer your-Groq-key</code>.</p>
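<p>Putting Cases 1–3 together, here is a minimal Python sketch (standard library only) of how such a request would be assembled; the key value and payload below are placeholders for illustration, not a working credential.</p>

```python
import json
import urllib.request

GROQ_API_KEY = "gsk_1122334455667788"  # placeholder key for illustration


def build_groq_request(payload: dict) -> urllib.request.Request:
    """Build an OpenAI-style chat request aimed at Groq's compatible endpoint."""
    return urllib.request.Request(
        "https://api.groq.com/openai/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {GROQ_API_KEY}",
        },
        method="POST",
    )


req = build_groq_request({
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
})
# urllib.request.urlopen(req) would actually send it; omitted here.
```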

<h3 id="case-4">Case 4</h3>

<p>As for model names, Groq's look like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gemma2-9b-it
llama-3.3-70b-versatile
qwen-2.5-32b
deepseek-r1-distill-qwen-32b
deepseek-r1-distill-llama-70b-specdec
</code></pre></div></div>
<p>There are many of them, and a new batch ships roughly every three months; if you use a preview model it may eventually be retired.
Keep an eye on the <a href="https://console.groq.com/docs/models">model list</a>.</p>

<p>Currently recommended: <code class="language-plaintext highlighter-rouge">gemma2-9b-it</code> <code class="language-plaintext highlighter-rouge">qwen-2.5-32b</code> <code class="language-plaintext highlighter-rouge">deepseek-r1-distill-qwen-32b</code></p>]]></content><author><name></name></author><category term="ai" /><summary type="html"><![CDATA[URL: Groq console]]></summary></entry><entry><title type="html">How to Use the Google Gemini API in OpenAI-Compatible Mode</title><link href="/google/google-ai-api-openai-compatibility/" rel="alternate" type="text/html" title="How to Use the Google Gemini API in OpenAI-Compatible Mode" /><published>2025-02-11T14:42:00+00:00</published><updated>2025-02-11T14:42:00+00:00</updated><id>/google/google-ai-api-openai-compatibility</id><content type="html" xml:base="/google/google-ai-api-openai-compatibility/"><![CDATA[<p>URL: <a href="https://aistudio.google.com/">Google AI Studio</a></p>

<p>Many services online ask you to fill in an OpenAI-compatible URL, but the Gemini API's native format is quite idiosyncratic, and not every service supports it.</p>

<p><img src="/assets/img/2025-02-13-google-ai-api-openai-compatibility-01.png" width="60%" /></p>

<p>Google AI Studio doesn't surface the OpenAI-compatible URL directly; you have to dig it out of the documentation.</p>

<p><img src="/assets/img/2025-02-13-google-ai-api-openai-compatibility-02.png" width="60%" /></p>

<h3 id="case-1">Case 1</h3>

<p>If the service you're using shows <code class="language-plaintext highlighter-rouge">https://api.openai.com/v1/chat/completions</code>,
change it to <code class="language-plaintext highlighter-rouge">https://generativelanguage.googleapis.com/v1beta/openai/chat/completions</code>.</p>

<h3 id="case-2">Case 2</h3>

<p>If it shows <code class="language-plaintext highlighter-rouge">https://api.openai.com/v1</code>,
change it to <code class="language-plaintext highlighter-rouge">https://generativelanguage.googleapis.com/v1beta/openai/</code>.</p>
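<p>The swap in Cases 1 and 2 is pure string substitution; a small hypothetical helper makes that concrete:</p>

```python
# Rewrite an OpenAI endpoint into Google's OpenAI-compatible one.
# Pure string handling; no network access needed.
OPENAI_BASE = "https://api.openai.com/v1"
GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/openai"


def to_gemini(url: str) -> str:
    if url.startswith(OPENAI_BASE):
        return GEMINI_BASE + url[len(OPENAI_BASE):]
    return url  # leave non-OpenAI URLs untouched


print(to_gemini("https://api.openai.com/v1/chat/completions"))
# → https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
```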

<h3 id="case-3">Case 3</h3>

<p>Key = API key. If the service asks you to fill in an API key, OpenAI's format looks like <code class="language-plaintext highlighter-rouge">sk-1122334455667788</code>;
just paste in the key you got from Google AI Studio.</p>

<p>If you see <code class="language-plaintext highlighter-rouge">Bearer sk-1122334455667788</code>,
change it to <code class="language-plaintext highlighter-rouge">Bearer your-Google-key</code>.</p>

<h3 id="case-4">Case 4</h3>

<p>As for model names, Google's look like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gemini-2.0-pro-exp
gemini-2.0-flash
gemini-2.0-flash-exp
gemini-2.0-flash-thinking-exp
gemini-2.0-flash-lite-preview
gemini-2.0-flash-lite-preview-02-05
</code></pre></div></div>
<p>There are many of them, and a new batch ships roughly every three months; preview models, or model names carrying a date, are sometimes retired.
Keep an eye on the <a href="https://ai.google.dev/gemini-api/docs/models/gemini?hl=zh-tw">model list</a>.</p>

<p>Currently recommended: <code class="language-plaintext highlighter-rouge">gemini-2.0-flash</code> <code class="language-plaintext highlighter-rouge">gemini-2.0-flash-exp</code> <code class="language-plaintext highlighter-rouge">gemini-2.0-pro-exp</code></p>

<p>References:</p>

<p><a href="https://ai.google.dev/gemini-api/docs/openai?hl=zh-tw#rest">OpenAI 相容性</a></p>]]></content><author><name></name></author><category term="google" /><summary type="html"><![CDATA[網址：Google AI Studio]]></summary></entry><entry><title type="html">如何取得Google AI studio API Key</title><link href="/google/how-to-get-google-ai-api-key/" rel="alternate" type="text/html" title="如何取得Google AI studio API Key" /><published>2025-02-10T14:42:00+00:00</published><updated>2025-02-10T14:42:00+00:00</updated><id>/google/how-to-get-google-ai-api-key</id><content type="html" xml:base="/google/how-to-get-google-ai-api-key/"><![CDATA[<p>網址：<a href="https://aistudio.google.com/">Google AI Studio</a></p>

<p>Select <code class="language-plaintext highlighter-rouge">Get API key</code></p>

<p><img src="/assets/img/2025-02-12-how-to-get-google-ai-api-key-01.png" width="60%" /></p>

<p>Select <code class="language-plaintext highlighter-rouge">Create API key</code></p>

<p><img src="/assets/img/2025-02-12-how-to-get-google-ai-api-key-02.png" width="60%" /></p>

<p>Pick a project; if you don't have one, you may be asked to create one.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-google-ai-api-key-03.png" width="60%" /></p>

<p>That's it. Copy the key you just received.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-google-ai-api-key-04.png" width="60%" /></p>

<p>If you forget the key, just click it and a confirmation dialog will pop up showing it.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-google-ai-api-key-05.png" width="60%" /></p>

<p>Follow-up:
<a href="https://www.jakevin.uk/google/google-ai-api-openai-compatibility/">How to Use the Google Gemini API in OpenAI-Compatible Mode</a></p>

<p>References:</p>

<p><a href="https://ai.google.dev/gemini-api/docs/api-key?hl=zh-tw">取得 Gemini API 金鑰</a></p>

<p><a href="https://emtech.cc/p/gemini-api">Gemini API 快速入門：來用 Js 做一個線上聊天吧</a></p>]]></content><author><name></name></author><category term="google" /><summary type="html"><![CDATA[網址：Google AI Studio]]></summary></entry><entry><title type="html">一起來用 Structured Outputs</title><link href="/chatgpt/structured-outputs/" rel="alternate" type="text/html" title="一起來用 Structured Outputs" /><published>2024-10-20T14:42:00+00:00</published><updated>2024-10-20T14:42:00+00:00</updated><id>/chatgpt/structured-outputs</id><content type="html" xml:base="/chatgpt/structured-outputs/"><![CDATA[<p><a href="../function-calling">一起來用 LLM Function Calling</a></p>

<p><a href="../audio-generation">一起來用 LLM Audio generation</a></p>

<p><a href="https://platform.openai.com/docs/guides/structured-outputs">OpenAI Structured Outputs</a></p>

<p>This is OpenAI's next major update to output formats, released in 2024, about a year after Function Calling.</p>

<p>I mainly build translation tools, so the examples here lean toward translation use cases.</p>

<h3 id="structured-outputs">Structured Outputs</h3>

<h4 id="情境需要將中文翻譯成指定的多個語系">情境：需要將中文，翻譯成指定的多個語系。</h4>

<p>First, a question without any tooling:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請翻譯成'zh-CN簡體中文' 'jp-JP日本語' 'en-US美式英文' 完整翻譯，不要發音、注解。"
        }
    ]
}
</code></pre></div></div>

<p>Response, 886 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "
            zh-CN简体中文：请问最近的车站怎么走？
        
            jp-JP日本語：最近の駅へはどうやって行きますか？
        
            en-US美式英文：Excuse me, how do I get to the nearest train station?",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 82,
        "completion_tokens": 55,
        "total_tokens": 137
    }
}
</code></pre></div></div>

<p>It does give the answer I wanted, but that answer is hard to take back into your code for further use, say to display only the Chinese, or only the Chinese and Japanese.</p>

<h4 id="情境使用structured-outputs將翻譯結果指定為多個語系">情境：使用Structured Outputs，將翻譯結果指定為多個語系。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，完整翻譯，不要發音、注解。"
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": "string",
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": "string",
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN"
                ]
            }
        }
    }
}
</code></pre></div></div>

<p>Spec-wise, you can see that <code class="language-plaintext highlighter-rouge">response_format</code> is the key parameter for this output.</p>

<p>Most of it is fixed boilerplate; the parts that vary are <code class="language-plaintext highlighter-rouge">json_schema.name</code> and <code class="language-plaintext highlighter-rouge">json_schema.schema</code>.</p>

<p><code class="language-plaintext highlighter-rouge">json_schema.name</code> 顧名思義就是這個 json_schema 的名稱。</p>

<p><code class="language-plaintext highlighter-rouge">json_schema.schema</code> is the important part: here you decide whether the JSON output is an object or an array!</p>

<p><code class="language-plaintext highlighter-rouge">json_schema.schema.properties</code> 開始，就是描述複雜的json回應格式了，就看每個情境來決定。</p>

<p>In my case, I need a translation result for each language.</p>

<p>The output, 1339 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "{\"zh-CN\":\"请问最近的车站怎么走？\",\"jp-JP\":\"最近の駅に行くにはどうすればいいですか？\",\"en-US\":\"How do I get to the nearest station?\"}",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 45,
        "total_tokens": 165,
    }
}

// the extracted result
{
  'zh-CN': '请问最近的车站怎么走？',
  'jp-JP': '最近の駅に行くにはどうすればいいですか？',
  'en-US': 'How do I get to the nearest station?'
}

</code></pre></div></div>

<p>As you can see, it produces exactly the shape I specified, which makes post-processing easy.</p>

<p>Interestingly, my prompt didn't specify which languages to output, yet it still worked out which languages to return.</p>
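<p>The post-processing really is one <code class="language-plaintext highlighter-rouge">json.loads</code> away; a quick sketch using the response content from above:</p>

```python
import json

# message["content"] arrives as a JSON string; parsing it gives a dict
# you can slice however you like (e.g. show only the Chinese result).
content = ('{"zh-CN":"请问最近的车站怎么走？",'
           '"jp-JP":"最近の駅に行くにはどうすればいいですか？",'
           '"en-US":"How do I get to the nearest station?"}')

translations = json.loads(content)
print(translations["zh-CN"])   # only the Chinese result
```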

<h4 id="情境使用structured-outputs將翻譯結果指定沒有在輸出格式裡的語系">情境：使用Structured Outputs，將翻譯結果指定沒有在輸出格式裡的語系。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請用'th-TH泰語'完整翻譯，不要發音、注解。"
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": "string",
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": "string",
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN"
                ]
            }
        }
    }
}
</code></pre></div></div>
<p>Here I deliberately asked for a Thai translation while making <code class="language-plaintext highlighter-rouge">zh-CN</code> the only required field of the JSON output.</p>

<p>Output, 873 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "{\"en-US\":\"How do I get to the nearest station?\",\"zh-CN\":\"请问最近的车站怎么走?\",\"jp-JP\":\"最近の駅にはどうやって行けばいいですか?\",\"th-TH\":\"บอกทางไปสถานีที่ใกล้ที่สุดหน่อยได้ไหมครับ/ค่ะ?\"}",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 129,
        "completion_tokens": 70,
        "total_tokens": 199
    }
}

// the extracted result
{
  'en-US': 'How do I get to the nearest station?',
  'zh-CN': '请问最近的车站怎么走?',
  'jp-JP': '最近の駅にはどうやって行けばいいですか?',
  'th-TH': 'บอกทางไปสถานีที่ใกล้ที่สุดหน่อยได้ไหมครับ/ค่ะ?'
}
</code></pre></div></div>

<p>Besides following my requested format, it also produced the schema fields I didn't ask for.</p>

<p>Why did the unrequested fields appear again, just like in the previous experiment?</p>

<p>The reason is here: <a href="https://platform.openai.com/docs/guides/structured-outputs/best-practices">Tips and best practices</a></p>

<p>OpenAI's guidance says to force every parameter into <code class="language-plaintext highlighter-rouge">required</code>; apparently their backend does this on its own as well.</p>

<h4 id="情境使用structured-outputs沒有指定的語言就不要輸出">情境：使用Structured Outputs，沒有指定的語言就不要輸出。</h4>

<p>According to OpenAI's docs, you can set <code class="language-plaintext highlighter-rouge">properties.parameter.type</code> to <code class="language-plaintext highlighter-rouge">["string", "null"]</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請用'th-TH泰語'完整翻譯，不要發音、注解。"
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": ["string", "null"],
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": ["string", "null"],
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN"
                ]
            }
        }
    }
}

</code></pre></div></div>

<p>Output, 1603 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "{\"zh-CN\":\"请问最近的车站怎么走?\",\"jp-JP\":null,\"en-US\":null,\"th-TH\":\"กรุณาถามว่าไปสถานีที่ใกล้ที่สุดได้อย่างไร?\"}",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 135,
        "completion_tokens": 48,
        "total_tokens": 183
    }
}
// the extracted result
{
    "zh-CN":"请问最近的车站怎么走?",
    "jp-JP":null,
    "en-US":null,
    "th-TH":"กรุณาถามว่าไปสถานีที่ใกล้ที่สุดได้อย่างไร?"
}
</code></pre></div></div>

<p>Languages outside my specified scope now come back as null.</p>
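<p>On the client side, a dict comprehension is enough to drop those nulls after parsing; a sketch using the result above:</p>

```python
import json

# Unrequested languages come back as JSON null (None after parsing);
# keep only the languages the model actually filled in.
content = ('{"zh-CN":"请问最近的车站怎么走?","jp-JP":null,"en-US":null,'
           '"th-TH":"กรุณาถามว่าไปสถานีที่ใกล้ที่สุดได้อย่างไร?"}')

parsed = json.loads(content)
filled = {lang: text for lang, text in parsed.items() if text is not None}
print(list(filled))   # → ['zh-CN', 'th-TH']
```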

<h4 id="情境使用structured-outputs但不請他翻譯我問他別的事情">情境：使用Structured Outputs，但不請他翻譯，我問他別的事情</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": "今天的台北好冷，你有沒有推薦的保暖方法？"
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": ["string", "null"],
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": ["string", "null"],
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN","jp-JP","en-US"
                ]
            }
        }
    }
}
</code></pre></div></div>

<p>Output, 2170 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "{\"en-US\":[\"Wear layers of clothing to trap heat, such as thermal shirts, sweaters, and insulating jackets.\",\"Use a warm scarf, hat, and gloves to keep extremities warm.\",\"Drink hot beverages like tea or coffee to stay warm from the inside.\",\"Consider using a space heater or electric blanket at home.\",\"Snuggle up with a warm blanket while watching movies or reading a book.\"],\"jp-JP\":[null],\"zh-CN\":\"可以尝试以下几种保暖方法：\\n\\n1. 多穿几层衣服，选择保暖的内衣、毛衣和外套。\\n2. 戴上围巾、帽子和手套，以保暖手脚部位。\\n3. 喝热饮如茶或咖啡，让身体从内部保持温暖。\\n4. 在家使用取暖器或电热毯。\\n5. 看电影或阅读时，裹上毛毯。\"}",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 97,
        "completion_tokens": 200,
        "total_tokens": 297
    }
}

</code></pre></div></div>

<p>It still returned the JSON format I asked for! My task was simple, so the LLM understood it as: come up with an answer, then return it in the schema's languages.</p>

<p>That said, the OpenAI docs note that you still need to handle the exception cases! <a href="https://platform.openai.com/docs/guides/structured-outputs/refusals">Refusals with structured outputs</a></p>

<h4 id="structured-outputs--function-calling--要你命3000">Structured Outputs + Function Calling = 要你命3000</h4>

<p>When two features that can each carry the show on their own get mixed together, do you end up with the <code class="language-plaintext highlighter-rouge">要你命3000</code> (the all-in-one super weapon) or the <code class="language-plaintext highlighter-rouge">撒尿牛丸</code> (a mashup better than the sum of its parts)? Both are Stephen Chow movie gags.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": "今天的台北好冷，我想買最大的外套。你那邊有沒有推薦的保暖方法？"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "pick_size",
                "description": "Call this if the user specifies which size they want",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "size": {
                            "type": "string",
                            "enum": [
                                "S",
                                "M",
                                "L",
                                "XL"
                            ],
                            "description": "The size of the t-shirt that the user would like to order"
                        }
                    },
                    "required": [
                        "size"
                    ],
                    "additionalProperties": false
                }
            }
        }
    ],
    "tool_choice": "auto",
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": [
                            "string",
                            "null"
                        ],
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": [
                            "string",
                            "null"
                        ],
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN",
                    "jp-JP",
                    "en-US"
                ]
            }
        }
    }
}

</code></pre></div></div>

<p>Output, 3370 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "{\"en-US\":[null],\"jp-JP\":[null],\"zh-CN\":\"在台北寒冷的天气中，除了购买最大的外套，你可以尝试以下几种保暖的方法：\\n\\n1. **分层穿衣**：穿多层衣物，如长袖衬衫、毛衣和外套，可以更好地保持体温。\\n2. **使用围巾和手套**：围巾可以保护脖子，手套则保暖双手。\\n3. **选用保暖材料**：如羊毛、羽绒或厚棉材料的衣物，能有效隔绝寒冷。\\n4. **保暖鞋袜**：穿保暖鞋和厚袜子，确保脚部的温暖。\\n5. **喝热饮**：喝热茶、热巧克力等可以让身体内部升温。\\n6. **活动身体**：适当运动可以帮助提升体温，避免长时间静坐。 \\n\\n希望这些方法能帮助你在寒冷的台北保持温暖！\"}",
                "refusal": null
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 168,
        "completion_tokens": 242,
        "total_tokens": 410
    }
}
</code></pre></div></div>

<p>Now ask it a different way:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": null,
                "tool_calls": [
                    {
                        "id": "call_WshjoFjxhiN84EFE3S8lFGHt",
                        "type": "function",
                        "function": {
                            "name": "pick_size",
                            "arguments": "{\"size\": \"XL\"}"
                        }
                    }
                ],
                "refusal": null
            },
            "logprobs": null,
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": 170,
        "completion_tokens": 30,
        "total_tokens": 200
    }
}

</code></pre></div></div>

<p>Seeeeee! There it is, the <code class="language-plaintext highlighter-rouge">要你命3000</code>!</p>

<p>Worth calling out separately: <code class="language-plaintext highlighter-rouge">choices[0].finish_reason</code></p>

<p>If this turn takes the <code class="language-plaintext highlighter-rouge">Structured Outputs</code> path, <code class="language-plaintext highlighter-rouge">finish_reason</code> is stop</p>

<p>If it takes the <code class="language-plaintext highlighter-rouge">Function Calling</code> path, <code class="language-plaintext highlighter-rouge">finish_reason</code> is tool_calls</p>
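<p>So a single handler can branch on <code class="language-plaintext highlighter-rouge">finish_reason</code>; a minimal sketch (the sample reply mirrors the tool_calls response above):</p>

```python
import json


def handle_choice(choice: dict):
    """Dispatch on finish_reason: structured output vs. tool call."""
    msg = choice["message"]
    if choice["finish_reason"] == "tool_calls":
        # Function Calling: arguments live in tool_calls as JSON strings.
        return [(c["function"]["name"], json.loads(c["function"]["arguments"]))
                for c in msg["tool_calls"]]
    # "stop": Structured Outputs puts the JSON string in content.
    return json.loads(msg["content"])


tool_reply = {
    "finish_reason": "tool_calls",
    "message": {
        "content": None,
        "tool_calls": [{"function": {"name": "pick_size",
                                     "arguments": '{"size": "XL"}'}}],
    },
}
print(handle_choice(tool_reply))   # → [('pick_size', {'size': 'XL'})]
```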

<h4 id="結論">結論</h4>

<p>Structured Outputs claims to produce exactly the shape you want, 100% of the time; my tests bore that out, with a few pleasant surprises on top.</p>

<p>This makes it practical to bring AI into your development pipeline and reliably post-process what it returns.</p>

<p>Comparing Structured Outputs and Function Calling</p>

<p>Similarities:</p>
<ol>
  <li>Both reliably produce JSON and let you define your own parameters</li>
  <li>Both increase token usage, though Function Calling is somewhat more economical than Structured Outputs</li>
  <li>Both lengthen response time, from about 880 ms to roughly 1300–1600 ms</li>
</ol>

<p>Differences:</p>
<ol>
  <li>Function Calling produces different results depending on which tool the LLM picks</li>
  <li>Processing Function Calling's output takes a few more steps</li>
  <li>Function Calling is more flexible; you can build out the tools library however you like</li>
</ol>

<p>If you use Structured Outputs and Function Calling at the same time, there's no guarantee which kind of result gets output!</p>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[Let's Use LLM Function Calling]]></summary></entry><entry><title type="html">Let's Use Audio Generation</title><link href="/chatgpt/audio-generation/" rel="alternate" type="text/html" title="Let's Use Audio Generation" /><published>2024-10-20T14:42:00+00:00</published><updated>2024-10-20T14:42:00+00:00</updated><id>/chatgpt/audio-generation</id><content type="html" xml:base="/chatgpt/audio-generation/"><![CDATA[<p><a href="../function-calling">Let's Use LLM Function Calling</a></p>

<p><a href="../structured-outputs">一起來用 LLM Structured Outputs</a></p>

<p><a href="https://platform.openai.com/docs/guides/audio/audio-generation">OpenAI Audio generation</a></p>

<p>This is a feature OpenAI released only in October 2024; the model is <code class="language-plaintext highlighter-rouge">gpt-4o-audio-preview</code>, which supports audio input and output.</p>

<p>與<a href="https://platform.openai.com/docs/guides/realtime/quickstart">Realtime API</a>不一樣，他走的還是 <code class="language-plaintext highlighter-rouge">/chat/completions</code>，並不是 WebSocket</p>

<p><a href="https://openai.com/api/pricing/">價格</a></p>

<p>Audio***
$100.00 / 1M input tokens
$200.00 / 1M output tokens</p>

<p><img src="/assets/img/2024-10-21-audio-generation-01.png" width="60%" /></p>

<p>$100 looks alarming, but it's actually the same cost as whisper-1.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>***Audio input costs approximately 6¢ per minute; Audio output costs approximately 24¢ per minute
***語音輸入約每分鐘6美分；語音輸出約每分鐘24美分。
</code></pre></div></div>
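<p>A quick sanity check on those figures: at $100 per 1M input tokens, 6¢ per minute implies audio is billed at roughly 600 input tokens per minute (and output at about 1200 per minute). This is just arithmetic backed out of the quoted prices, not an official token rate.</p>

```python
# Back out the implied tokens-per-minute from the quoted prices.
input_price_per_token = 100.00 / 1_000_000    # $100 / 1M input tokens
output_price_per_token = 200.00 / 1_000_000   # $200 / 1M output tokens

tokens_per_minute_in = 0.06 / input_price_per_token    # 6¢ per minute of audio in
tokens_per_minute_out = 0.24 / output_price_per_token  # 24¢ per minute of audio out
print(tokens_per_minute_in, tokens_per_minute_out)
```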

<p>I use this for real-time speech translation, so I'll focus on the <code class="language-plaintext highlighter-rouge">audio input, text output</code> part.</p>

<h3 id="audio-generation">Audio generation</h3>

<h4 id="情境將中文語音轉成日文文字">情境：將中文語音轉成日文文字。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-audio-preview",
    "modalities": ["text"],
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": [
                { "type": "text", text: "將語音，用'jp-JP'完整翻譯，不要發音、注解。" },
                { "type": "input_audio", "input_audio": { "data": base64str, "format": "mp3" } }
            ]
        }
    ]
}
</code></pre></div></div>

<p>Response, 2970 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "皆さん、こんばんは。私のチャンネルへようこそ。",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 100,
        "completion_tokens": 15,
        "total_tokens": 115,
    }
}
</code></pre></div></div>

<p>This usage is almost identical to the <code class="language-plaintext highlighter-rouge">Whisper API</code>, but here are the pitfalls I hit!!</p>

<ol>
  <li>Whisper supports five or six audio formats, including webm and ogg, but <code class="language-plaintext highlighter-rouge">gpt-4o-audio-preview</code> only supports mp3 and wav</li>
</ol>

<p>That means it can't be used directly in the browser; you must transcode first.</p>

<ol start="2">
  <li>When passing base64 as the audio input, you must also strip the meta header from the wav base64 data
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>base64data.split(',')[1]
</code></pre></div>    </div>
  </li>
</ol>
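<p>A short Python sketch of that preparation step; the bytes below are a stand-in for real mp3 data, not an actual recording:</p>

```python
import base64

fake_mp3_bytes = b"\xff\xfb\x90\x00fake-mp3-frames"  # stand-in for real audio

# From a raw file: base64-encode the bytes directly.
b64_from_file = base64.b64encode(fake_mp3_bytes).decode("ascii")

# From a browser data URL (e.g. FileReader.readAsDataURL): strip the
# "data:...;base64," meta header before sending it as input_audio.data.
data_url = "data:audio/mp3;base64," + b64_from_file
b64_from_data_url = data_url.split(",")[1]
```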

<h4 id="情境需要將中文語音翻譯成指定的多個語系並使用function-calling">情境：需要將中文語音，翻譯成指定的多個語系，並使用Function Calling。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-audio-preview",
    "modalities": ["text"],
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": [
                { "type": "text", text: "將語音，請翻譯成'zh-CN簡體中文' 'jp-JP日本語' 'en-US美式英文' 完整翻譯，不要發音、注解。" },
                { "type": "input_audio", "input_audio": { "data": base64str, "format": "mp3" } }
            ]
        }
    ],
     "tools": [
        {
            "type": "function",
            "function": {
                "name": "split_text",
                "description": "Identify Chinese and other languages in text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "zh-CN": {
                            "type": "string",
                            "description": "Chinese reslut"
                        },
                        "jp-JP": {
                            "type": "string",
                            "description": "Japanese reslut"
                        },
                        "en-US": {
                            "type": "string",
                            "description": "English reslut"
                        }
                    },
                    "required": [
                        "zh-CN"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}
</code></pre></div></div>

<p>Output, 4970 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_dAdfaFvPBoL4JrgL4QRphIfo",
            "type": "function",
            "function": {
              "name": "split_text",
              "arguments": "{\"zh-CN\": \"大家好，今天是10月21号，星期一，欢迎来到我们的节目。\"}"
            }
          },
          {
            "id": "call_HnffYc7jMsIZntdBBhPbgrsp",
            "type": "function",
            "function": {
              "name": "split_text",
              "arguments": "{\"jp-JP\": \"皆さん、こんにちは。 今日は10月21日、月曜日です。 私たちのプログラムへようこそ。\"}"
            }
          },
          {
            "id": "call_VfDkfyBKwKzxRHkNjLyyqnd7",
            "type": "function",
            "function": {
              "name": "split_text",
              "arguments": "{\"en-US\": \"Hello everyone, today is October 21st, Monday. Welcome to our show.\"}"
            }
          }
        ],
        "refusal": null
    },
    "usage": {
       "prompt_tokens": 188,
        "completion_tokens": 123,
        "total_tokens": 311,
    }
}
</code></pre></div></div>

<p>Function Calling is supported; as in the earlier experiments, the output still varies from run to run.</p>

<p>But Function Calling is flexible enough that it should enable plenty of interesting development.</p>
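<p>Since the model split the answer across three <code class="language-plaintext highlighter-rouge">split_text</code> calls, one language each, one obvious post-processing step is to merge their arguments back into a single translations dict; a sketch using the arguments from the reply above:</p>

```python
import json

# Each tool call carries one language; merging the parsed arguments
# reassembles the full multi-language result.
tool_calls = [
    {"function": {"name": "split_text",
                  "arguments": '{"zh-CN": "大家好，今天是10月21号，星期一，欢迎来到我们的节目。"}'}},
    {"function": {"name": "split_text",
                  "arguments": '{"jp-JP": "皆さん、こんにちは。 今日は10月21日、月曜日です。 私たちのプログラムへようこそ。"}'}},
    {"function": {"name": "split_text",
                  "arguments": '{"en-US": "Hello everyone, today is October 21st, Monday. Welcome to our show."}'}},
]

merged = {}
for call in tool_calls:
    merged.update(json.loads(call["function"]["arguments"]))
print(list(merged))   # → ['zh-CN', 'jp-JP', 'en-US']
```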

<h4 id="情境使用structured-outputs將翻譯結果指定為多個語系">情境：使用Structured Outputs，將翻譯結果指定為多個語系。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-audio-preview",
    "modalities": ["text"],
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": [
                { "type": "text", text: "將語音，用完整翻譯，不要發音、注解。" },
                { "type": "input_audio", "input_audio": { "data": base64str, "format": "mp3" } }
            ]
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": "string",
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": "string",
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN"
                ]
            }
        }
    }
}
</code></pre></div></div>

<p>The request above follows the approach from the Structured Outputs experiment article, turning the audio into the multilingual translation I want.</p>

<p>The output, after 339 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "error": {
        "message": "Invalid parameter: 'response_format' of type 'json_schema' is not supported with this model. Learn more about supported models at the Structured Outputs guide: https://platform.openai.com/docs/guides/structured-outputs",
        "type": "invalid_request_error",
        "param": null,
        "code": null
    }
}
</code></pre></div></div>

<p>Great in theory; disappointing in practice…</p>

<p><code class="language-plaintext highlighter-rouge">gpt-4o-audio-preview</code> 並不支援新的 Structured Outputs 結構</p>

<h4 id="結論">結論</h4>

<p>The traditional approach is to transcribe the audio with Whisper first, then call an LLM on the text.</p>

<p>Audio generation instead feeds the audio straight into OpenAI and runs the LLM in the same call, saving a whole step.</p>

<p>That said, from my development experience there are two big drawbacks to fix:</p>

<ol>
  <li>
    <p>No webm support is a real pain. Converting to mp3 or wav before upload not only makes the audio file larger, you also wait for the conversion.</p>
  </li>
  <li>
    <p>Structured Outputs is somehow unsupported. You are a GPT-4o family model; why can't you use it?!</p>
  </li>
</ol>

<p>I hope these two drawbacks get fixed soon, so my real-time speech translation can get a little faster!</p>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[一起來用 LLM Function Calling]]></summary></entry><entry><title type="html">一起來用 Function Calling</title><link href="/chatgpt/function-calling/" rel="alternate" type="text/html" title="一起來用 Function Calling" /><published>2024-10-20T02:42:00+00:00</published><updated>2024-10-20T02:42:00+00:00</updated><id>/chatgpt/function-calling</id><content type="html" xml:base="/chatgpt/function-calling/"><![CDATA[<p><a href="../structured-outputs">Using LLM Structured Outputs</a></p>

<p><a href="../audio-generation">一起來用 LLM Audio generation</a></p>

<p><a href="https://platform.openai.com/docs/guides/function-calling">OpenAI Function Calling</a></p>

<p>OpenAI launched this feature in June 2023; built on top of text generation, it produces relatively stable output formats.</p>

<p>I mainly build translation tools, so the examples here lean toward translation use cases.</p>

<h3 id="function-calling">Function Calling</h3>

<h4 id="情境需要將中文翻譯成指定的多個語系">情境：需要將中文，翻譯成指定的多個語系。</h4>

<p>First, ask without any tools:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請翻譯成'zh-CN簡體中文' 'jp-JP日本語' 'en-US美式英文' 完整翻譯，不要發音、注解。"
        }
    ]
}
</code></pre></div></div>

<p>The response, after 886 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "
            zh-CN简体中文：请问最近的车站怎么走？
        
            jp-JP日本語：最近の駅へはどうやって行きますか？
        
            en-US美式英文：Excuse me, how do I get to the nearest train station?",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 82,
        "completion_tokens": 55,
        "total_tokens": 137
    }
}
</code></pre></div></div>

<p>It does answer what I asked for, but that answer is hard to feed back into your code for further use, say showing only the Chinese, or only the Chinese and Japanese.</p>
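<p>To see why the free-form reply is awkward, here is what the brittle string parsing would look like. A sketch keyed to the <code class="language-plaintext highlighter-rouge">zh-CN简体中文：…</code> line format in the reply above; any drift in the model's formatting breaks it, which is exactly the problem.</p>

```javascript
// Pull one language's translation out of the free-form reply.
// Expects lines shaped like "zh-CN简体中文：translated text".
function extractLang(answer, tag) {
  const m = answer.match(new RegExp(tag + "[^：:]*[：:]\\s*(.+)"));
  return m ? m[1].trim() : null;
}
```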

<p>Now let's look at Function Calling.</p>

<h4 id="情境需要將中文翻譯成指定的多個語系並使用function-calling">情境：需要將中文，翻譯成指定的多個語系，並使用Function Calling。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請翻譯成'zh-CN簡體中文' 'jp-JP日本語' 'en-US美式英文' 完整翻譯，不要發音、注解。"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "split_text",
                "description": "Identify Chinese and other languages in text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "zh-CN": {
                            "type": "string",
                            "description": "Chinese reslut"
                        },
                        "jp-JP": {
                            "type": "string",
                            "description": "Japanese reslut"
                        },
                        "en-US": {
                            "type": "string",
                            "description": "English reslut"
                        },
                        "howManyLangs":{
                            "type": "number",
                            "description": "How many languages have been produced in total?"
                        }
                    },
                    "required": [
                        "zh-CN"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}

</code></pre></div></div>

<p>Spec-wise, compared with plain <code class="language-plaintext highlighter-rouge">messages</code> input, this is noticeably more complex, so for ordinary cases you would not reach for Function Calling.</p>

<p><code class="language-plaintext highlighter-rouge">tools</code> 工具箱，在這邊你可以定義各式個樣的工具，讓AI的回應時使用你的工具來回答。</p>

<p>Here, what I do is sort the translated sentences by language.</p>

<p>The response, after 1139 ms:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_IcRHiDV4SkZPx1Jyswmopw6u",
                "type": "function",
                "function": {
                    "name": "split_text",
                    "arguments": "{\"zh-CN\": \"请问最近的车站怎么走?\", \"jp-JP\": \"最近の駅へはどう行けばいいですか?\", \"en-US\": \"How do I get to the nearest station?\"}"
                }
            }
        ],
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 161,
        "completion_tokens": 70,
        "total_tokens": 231
    }
}

// the part of the response that matters
{
  'zh-CN': '请问最近的车站怎么走?',
  'jp-JP': '最近の駅へはどう行けばいいですか?',
  'en-US': 'How do I get to the nearest station?'
}
</code></pre></div></div>

<p>It did return the translations in the JSON shape I specified, minus <code class="language-plaintext highlighter-rouge">howManyLangs</code>, since that field was not part of my required output.</p>
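<p>Using that result in code is then straightforward: <code class="language-plaintext highlighter-rouge">arguments</code> arrives as a JSON string, so one <code class="language-plaintext highlighter-rouge">JSON.parse</code> yields a plain object (a sketch over the response shape shown above):</p>

```javascript
// Parse the first tool call's arguments back into a usable object.
// Returns null when the message carries no tool calls.
function parseToolArguments(message) {
  const call = (message.tool_calls || [])[0];
  return call ? JSON.parse(call.function.arguments) : null;
}
```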

<p>But is Function Calling always a silver bullet? Let's find out.</p>

<h4 id="情境使用function-calling計算有多少個輸出語種">情境：使用Function Calling計算有多少個輸出語種。</h4>

<p>After adding <code class="language-plaintext highlighter-rouge">howManyLangs</code> to <code class="language-plaintext highlighter-rouge">required</code>, this happens.</p>

<p>Input (partially omitted):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"required": [
    "zh-CN",
    "howManyLangs"
]
</code></pre></div></div>

<p>The response, after 1386 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_MEGQYUKN3eSIoXpS0h0KCXkK",
                "type": "function",
                "function": {
                    "name": "split_text",
                    "arguments": "{\"zh-CN\": \"请问最近的车站怎么走?\"}"
                }
            },
            {
                "id": "call_Or3CdU0zhRFsAAAEQK3gdBtR",
                "type": "function",
                "function": {
                    "name": "split_text",
                    "arguments": "{\"jp-JP\": \"最近の駅への行き方を教えてください。\"}"
                }
            },
            {
                "id": "call_kHgrVkfrIF7NbcOWDoIAPijx",
                "type": "function",
                "function": {
                    "name": "split_text",
                    "arguments": "{\"en-US\": \"How do I get to the nearest station?\"}"
                }
            }
        ],
        "refusal": null
    }
}
</code></pre></div></div>

<p>As you can see, it still did not return <code class="language-plaintext highlighter-rouge">howManyLangs</code>, and on top of that it split the function into three separate calls…</p>

<p>So post-processing in your code becomes very important!</p>
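<p>One way to post-process: since the model may split a single logical call into several <code class="language-plaintext highlighter-rouge">tool_calls</code>, merge the argument objects back together before use (a sketch over the three-call response above):</p>

```javascript
// Merge all tool_calls' argument objects into one flat object,
// so a response split into three calls still yields {zh-CN, jp-JP, en-US}.
function mergeToolCallArguments(message) {
  return (message.tool_calls || []).reduce(
    (acc, call) => Object.assign(acc, JSON.parse(call.function.arguments)),
    {}
  );
}
```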

<h4 id="情境使用function-callingtools中有多種工具">情境：使用Function Calling，tools中有多種工具</h4>

<p>Next, let's split the function into three tools: translate to Simplified Chinese, to Japanese, and to English.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請翻譯成'zh-CN簡體中文' 'jp-JP日本語' 'en-US美式英文' 完整翻譯，不要發音、注解。"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "split_chinese_text",
                "description": "Identify Chinese.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "zh-CN": {
                            "type": "string",
                            "description": "Chinese reslut"
                        }
                    },
                    "required": [
                        "zh-CN"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "split_japanese_text",
                "description": "Identify Japanese in text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "jp-JP": {
                            "type": "string",
                            "description": "Japanese reslut"
                        }
                    },
                    "required": [
                        "jp-JP"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "split_englist_text",
                "description": "Identify English in text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "en-US": {
                            "type": "string",
                            "description": "English reslut"
                        }
                    },
                    "required": [
                        "en-US"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}
</code></pre></div></div>

<p>The response, after 1666 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_NJrlogGVEXKnNNBjXnBXZZK3",
                "type": "function",
                "function": {
                    "name": "split_chinese_text",
                    "arguments": "{\"zh-CN\": \"请问最近的车站怎么走?\"}"
                }
            },
            {
                "id": "call_bzatO4BZ4dhLwD4PVjIYDcCz",
                "type": "function",
                "function": {
                    "name": "split_japanese_text",
                    "arguments": "{\"jp-JP\": \"最近の駅にはどうやって行きますか？\"}"
                }
            },
            {
                "id": "call_QVAjPv33bpXizBwxxm0v9wbU",
                "type": "function",
                "function": {
                    "name": "split_englist_text",
                    "arguments": "{\"en-US\": \"How do I get to the nearest station?\"}"
                }
            }
        ],
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 176,
        "completion_tokens": 96,
        "total_tokens": 272
    }
}
</code></pre></div></div>

<p>It really did answer with three separate function calls, which makes post-processing much easier.</p>
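<p>With one tool per language, a small dispatch table keeps the post-processing tidy. A sketch; note it reuses the <code class="language-plaintext highlighter-rouge">split_englist_text</code> spelling exactly as defined in the request above.</p>

```javascript
// Map each tool name to a handler; names must match the tools definition.
const handlers = {
  split_chinese_text: (args) => ({ lang: "zh-CN", text: args["zh-CN"] }),
  split_japanese_text: (args) => ({ lang: "jp-JP", text: args["jp-JP"] }),
  split_englist_text: (args) => ({ lang: "en-US", text: args["en-US"] }),
};

// Route every tool call to its handler; unknown names yield null.
function dispatchToolCalls(message) {
  return (message.tool_calls || []).map((call) => {
    const handler = handlers[call.function.name];
    return handler ? handler(JSON.parse(call.function.arguments)) : null;
  });
}
```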

<h4 id="情境使用function-calling只執行單一個工具">情境：使用Function Calling，只執行單一個工具</h4>

<p>Next, when you have many tools but want to force a specific one, change <code class="language-plaintext highlighter-rouge">tool_choice</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"tool_choice": {
    "type": "function", 
    "function": {
        "name": "split_chinese_text"
     }
}
</code></pre></div></div>

<p>The response, after 770 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_1loM3SzD5aGvILsHYcPCT2bf",
                "type": "function",
                "function": {
                    "name": "split_chinese_text",
                    "arguments": "{\"zh-CN\":\"请问最近的车站怎么走?\"}"
                }
            }
        ],
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 187,
        "completion_tokens": 15,
        "total_tokens": 202
    }
}
</code></pre></div></div>

<h4 id="情境任務與function-calling無關時">情境：任務與Function Calling無關時。</h4>

<p>Finally, let's see what happens when the question has nothing at all to do with translation.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": "今天的台北好冷，你有沒有推薦的保暖方法？"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "split_text",
                "description": "Identify Chinese and other languages in text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "zh-CN": {
                            "type": "string",
                            "description": "Chinese reslut"
                        },
                        "jp-JP": {
                            "type": "string",
                            "description": "Japanese reslut"
                        },
                        "en-US": {
                            "type": "string",
                            "description": "English reslut"
                        }
                    },
                    "required": [
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}

</code></pre></div></div>

<p>The response, after 3920 ms:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "在寒冷的台北，這裡有一些保暖的方法可以幫助你保持溫暖：\n\n1. **穿著層疊衣物**：多層次的穿著可以幫助保持體熱，內層可以選擇保暖內衣，中層穿著毛衣或厚襯衫，外層則選擇防風或防水的外套。\n\n2. **使用熱水袋**：在睡覺或坐在沙發上時，可以使用熱水袋來保暖，放在身邊或腳下，能感覺到很舒適。\n\n3. **喝熱飲**：熱茶、熱巧克力或湯品都能讓身體感覺更暖和，特別是生薑茶，還有助於驅寒。\n\n4. **保持活動**：適當的運動可以促進血液循環，讓身體更溫暖，即使是在室內做一些簡單的伸展運動或瑜伽也是不錯的選擇。\n\n5. **使用厚毛毯**：在家中有一條厚毛毯可以隨時裹住自己，特別是在看電視或閱讀時，能有效保暖。\n\n6. **窗戶密封**：檢查窗戶是否有漏風的地方，可以使用窩心的窗簾或窗戶密封條來減少冷空氣的進入。\n\n希望這些方法能讓你在寒冷的天氣中感覺更舒適！",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 86,
        "completion_tokens": 353,
        "total_tokens": 439
    }
}
</code></pre></div></div>

<p>It simply ignores your tool setup, because the question no longer has anything to do with text identification.</p>
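<p>So a robust client should handle both response shapes, <code class="language-plaintext highlighter-rouge">tool_calls</code> and plain <code class="language-plaintext highlighter-rouge">content</code>, before assuming structured output (a sketch):</p>

```javascript
// Classify an assistant message: structured tool calls, plain text,
// or neither (e.g. a refusal with empty content).
function interpretResponse(message) {
  if (message.tool_calls && message.tool_calls.length > 0) {
    return { kind: "tool", calls: message.tool_calls };
  }
  if (typeof message.content === "string" && message.content.length > 0) {
    return { kind: "text", text: message.content };
  }
  return { kind: "empty" };
}
```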

<h4 id="情境指定輸出的文字">情境：指定輸出的文字</h4>

<p>OpenAI's docs give a fun example: a function that sets a t-shirt size. Without a constraint, the function might come back with free-text sizes like "largest", "very large", or "extra large".</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": "我剛剛去店裡買的衣服太大了，請問能換貨嗎？"
        },
        {
            "role": "assistant",
            "content": "請問你買的是什麼尺吋?"
        },
        {
            "role": "user",
            "content": "吊牌不見了，但我記得是最大的"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "pick_tshirt_size",
                "description": "Call this if the user specifies which size t-shirt they want",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "size": {
                            "type": "string",
                            "enum": ["S", "M", "L", "XL"],
                            "description": "The size of the t-shirt that the user would like to order"
                        }
                    },
                    "required": ["size"],
                    "additionalProperties": false
                }
            }
        }
    ],
    "tool_choice": "auto"
}
</code></pre></div></div>

<p>The response with the enum:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_v0NLwZPkfhCoZWDUIW80HlY6",
                "type": "function",
                "function": {
                    "name": "pick_tshirt_size",
                    "arguments": "{\"size\":\"XL\"}"
                }
            }
        ],
        "refusal": null
    }
}
</code></pre></div></div>

<p>The response without the enum:</p>

<p><code class="language-plaintext highlighter-rouge">笑死</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_BiojkLcAsOUQ6ImBHOYex4JF",
                "type": "function",
                "function": {
                    "name": "pick_tshirt_size",
                    "arguments": "{\"size\":\"最大\"}"
                }
            }
        ],
        "refusal": null
    }
}
</code></pre></div></div>
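<p>Even with the enum, and certainly without it, it is worth validating the returned value yourself before acting on it (a sketch):</p>

```javascript
// The allowed sizes, mirroring the enum in the tool definition above.
const SIZES = ["S", "M", "L", "XL"];

// Parse the size argument and reject anything outside the enum,
// e.g. the free-text "最大" seen in the no-enum response.
function parseSizeArgument(call) {
  const args = JSON.parse(call.function.arguments);
  return SIZES.includes(args.size) ? args.size : null;
}
```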

<h4 id="總結">總結</h4>

<p>Function Calling genuinely makes AI answers much easier to process in code, but you must still add the necessary exception handling to guard against unexpected errors.</p>

<p>The experiments surfaced two things to watch out for:</p>

<ol>
  <li>
    <p>Function Calling does drive token counts up. It may not matter at first, but over multi-turn conversations the cost growth is worth watching.</p>
  </li>
  <li>
    <p>Function Calling lengthens response times, from roughly 800 ms to anywhere between 1200 and 1700 ms. Factor this in for real-time applications.</p>
  </li>
</ol>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[一起來用 LLM Function Calling]]></summary></entry><entry><title type="html">使用思維鏈，提高問題回答的正確率</title><link href="/chatgpt/llm-o1-like-reasoning-chains/" rel="alternate" type="text/html" title="使用思維鏈，提高問題回答的正確率" /><published>2024-09-19T02:42:00+00:00</published><updated>2024-09-19T02:42:00+00:00</updated><id>/chatgpt/llm-o1-like-reasoning-chains</id><content type="html" xml:base="/chatgpt/llm-o1-like-reasoning-chains/"><![CDATA[<p><a href="https://github.com/bklieger-groq/g1">Source (bklieger-groq/g1)</a></p>

<p>The original is a Python example; I rewrote it in JavaScript and pulled out the key parts.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const messages = [
    {
        role: "system",
        content: `你是一個專業的 AI 助手，會一步一步解釋你的推理過程。對每一步，請提供一個描述你在該步驟中所做工作的標題，並附上內容。決定你是否需要再進行一步，或是已經準備好給出最終答案。請在句尾回應'next_action'（'continue' 或 'final_answer'）。盡可能使用多的推理步驟，至少 3 步。意識到你作為大型語言模型（LLM）的限制，以及你能做和不能做的事情。在你的推理過程中，包含對替代答案的探索。考慮到你可能是錯的，如果你的推理有錯，錯在哪裡。徹底測試所有其他可能性。你可能是錯的。當你說你正在重新檢視時，實際上要重新檢視，並使用另一種方法來進行。不要只是說你正在重新檢視。使用至少 3 種方法來推導答案。使用最佳實踐。`,
    },
    {
        role: "user",
        content: prompt, // the user's question goes here
    },
    {
        role: "assistant",
        content: "謝謝！我現在將按照我的指示一步一步思考，從分解問題後的開始部分開始。",
    },
]

</code></pre></div></div>

<p>The system prompt above has the AI verify its own answer using three different methods.</p>

<p>After the system prompt, we ask the question and let the AI start working through its answer:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// 思維鏈程式碼

let stepCount = 1;
let totalThinkingTime = 0;

while (true) {
    const startTime = Date.now();
    const stepData = await makeApiCall(messages, 300); // one chat-completion request, capped at 300 tokens
    
    const endTime = Date.now();
    const thinkingTime = (endTime - startTime) / 1000;
    totalThinkingTime += thinkingTime;

    messages.push({ role: "assistant", content: stepData });

    if (stepData.indexOf("final_answer") &gt;= 0 || stepCount &gt; 3) {
        break;
    }
    
    stepCount += 1;
}

</code></pre></div></div>

<p>The whole logic is actually simple: keep feeding the AI's answer back for it to reflect on, until it decides the answer is correct.</p>
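<p>The loop above depends on a <code class="language-plaintext highlighter-rouge">makeApiCall</code> helper that wraps one chat-completion request. To show the termination logic on its own, here is the same loop with the API call injected, exercised with a stub; the stub stands in for a real request and is not part of the original snippet.</p>

```javascript
// Same reasoning loop as above, with makeApiCall passed in so the
// termination condition can be tested without a real API.
async function runReasoningChain(makeApiCall, messages, maxSteps = 3) {
  let stepCount = 1;
  while (true) {
    const stepData = await makeApiCall(messages, 300);
    messages.push({ role: "assistant", content: stepData });
    // Stop on "final_answer" or after the step budget is exhausted.
    if (stepData.indexOf("final_answer") >= 0 || stepCount > maxSteps) break;
    stepCount += 1;
  }
  return { stepCount, messages };
}
```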

<p>On the two questions below, popular lately for judging AI, this chain-of-thought approach gives <code class="language-plaintext highlighter-rouge">llama-3.1-8b</code>, <code class="language-plaintext highlighter-rouge">gemma2-9b-it</code>, and even <code class="language-plaintext highlighter-rouge">gemma-7b-it</code> a decent accuracy rate.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>`strawberry`單字中裡有幾個`r`
</code></pre></div></div>

<p>and</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3.11 與 3.8 這兩個數字誰比較大？
</code></pre></div></div>

<p>Below is the returned conversation log:</p>

<p><strong>Step 1: Understand the problem</strong></p>

<p>I need to count how many <code class="language-plaintext highlighter-rouge">r</code>s the word <code class="language-plaintext highlighter-rouge">strawberry</code> contains. The question is simple, but I will think it through step by step.</p>

<p><strong>Step 2: Break the string down</strong></p>

<p>I break <code class="language-plaintext highlighter-rouge">strawberry</code> into individual letters: s-t-r-a-w-b-e-r-r-y. Now I can start counting the <code class="language-plaintext highlighter-rouge">r</code>s.</p>

<p><strong>Step 3: Count the <code class="language-plaintext highlighter-rouge">r</code>s</strong></p>

<p>From the broken-down string I can see there are 3 <code class="language-plaintext highlighter-rouge">r</code>s. I can therefore conclude: the word <code class="language-plaintext highlighter-rouge">strawberry</code> contains 3 <code class="language-plaintext highlighter-rouge">r</code>s.</p>

<p><strong>Step 4: Check alternative answers</strong></p>

<p>I check whether any other answer is possible, for example whether I skipped an <code class="language-plaintext highlighter-rouge">r</code> or miscounted. Based on my count, however, the answer is 3.</p>

<p><strong>Step 5: Confirm the answer</strong></p>

<p>I verify that my answer is correct. Based on my count, the answer is 3, so I conclude: the word <code class="language-plaintext highlighter-rouge">strawberry</code> contains 3 <code class="language-plaintext highlighter-rouge">r</code>s.</p>

<p><strong>Conclusion</strong></p>

<p>Based on my count, I conclude: the word <code class="language-plaintext highlighter-rouge">strawberry</code> contains 3 <code class="language-plaintext highlighter-rouge">r</code>s.</p>

<p><strong>The final answer</strong></p>

<p><code class="language-plaintext highlighter-rouge">strawberry</code> 這個單字中有 3 個 <code class="language-plaintext highlighter-rouge">r</code>。</p>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[參考來源]]></summary></entry><entry><title type="html">GCP Cloud Storage上傳相同檔名時，強制取代</title><link href="/chatgpt/gcp-upload-file-if-match/" rel="alternate" type="text/html" title="GCP Cloud Storage上傳相同檔名時，強制取代" /><published>2024-05-06T02:42:00+00:00</published><updated>2024-05-06T02:42:00+00:00</updated><id>/chatgpt/gcp-upload-file-if-match</id><content type="html" xml:base="/chatgpt/gcp-upload-file-if-match/"><![CDATA[<h2 id="網址">網址</h2>
<p><a href="https://cloud.google.com/storage/docs/uploading-objects?hl=zh-cn#storage-upload-object-nodejs">官方文件</a></p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">options</span> <span class="o">=</span> <span class="p">{</span>
    <span class="na">destination</span><span class="p">:</span> <span class="nx">destFileName</span><span class="p">,</span>
    <span class="c1">// Optional:</span>
    <span class="c1">// Set a generation-match precondition to avoid potential race conditions</span>
    <span class="c1">// and data corruptions. The request to upload is aborted if the object's</span>
    <span class="c1">// generation number does not match your precondition. For a destination</span>
    <span class="c1">// object that does not yet exist, set the ifGenerationMatch precondition to 0</span>
    <span class="c1">// If the destination object already exists in your bucket, set instead a</span>
    <span class="c1">// generation-match precondition using its generation number.</span>
    <span class="c1">// 可選：</span>
    <span class="c1">// 如果目标對象的世代号與您的前提條件不匹配，上傳請求将被終止。如果目标對象已存在于您的數據桶中，則使用其世代号設置世代匹配前提條件。</span>
    <span class="na">preconditionOpts</span><span class="p">:</span> <span class="p">{</span><span class="na">ifGenerationMatch</span><span class="p">:</span> <span class="nx">generationMatchPrecondition</span><span class="p">},</span>
  <span class="p">};</span>

</code></pre></div></div>

<h2 id="問題">問題</h2>

<p>If you upload a file with the same name as an existing object, the request fails, because the generation number in your precondition was not updated:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"code"</span><span class="p">:</span><span class="mi">412</span><span class="p">,</span><span class="nl">"message"</span><span class="p">:</span><span class="s2">"At least one of the pre-conditions you specified did not hold."</span><span class="p">,</span><span class="nl">"errors"</span><span class="p">:[{</span><span class="nl">"message"</span><span class="p">:</span><span class="s2">"At least one of the pre-conditions you specified did not hold."</span><span class="p">,</span><span class="nl">"domain"</span><span class="p">:</span><span class="s2">"global"</span><span class="p">,</span><span class="nl">"reason"</span><span class="p">:</span><span class="s2">"conditionNotMet"</span><span class="p">,</span><span class="nl">"locationType"</span><span class="p">:</span><span class="s2">"header"</span><span class="p">,</span><span class="nl">"location"</span><span class="p">:</span><span class="s2">"If-Match"</span><span class="p">}]}</span><span class="w">
</span></code></pre></div></div>

<h2 id="解決辦法">解決辦法</h2>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="w">
</span><span class="err">preconditionOpts:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"ifMatch"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="err">//相同檔名則取代</span><span class="w">
</span><span class="p">}</span><span class="w">

</span></code></pre></div></div>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[網址 官方文件]]></summary></entry><entry><title type="html">GPT-4o vs Gemini 1.5 flash</title><link href="/chatgpt/gpt-4o-vs-gemini-15-flash/" rel="alternate" type="text/html" title="GPT-4o vs Gemini 1.5 flash" /><published>2024-05-06T02:42:00+00:00</published><updated>2024-05-06T02:42:00+00:00</updated><id>/chatgpt/gpt-4o-vs-gemini-15-flash</id><content type="html" xml:base="/chatgpt/gpt-4o-vs-gemini-15-flash/"><![CDATA[<h2 id="兩者官網">Official pages</h2>

<p><a href="https://platform.openai.com/docs/api-reference/chat/create">GPT-4o</a></p>

<p><a href="https://ai.google.dev/gemini-api/docs?hl=zh-tw">Gemini 1.5 Flash</a></p>

<h2 id="成本">成本</h2>

<table>
  <tbody>
    <tr>
      <td>GPT-4o</td>
      <td>Input: $5 / 1M tokens</td>
      <td>Output: $15 / 1M tokens</td>
    </tr>
  </tbody>
</table>

<table>
  <tbody>
    <tr>
      <td>Gemini 1.5 Flash</td>
      <td>Input: $0.25 / 1M tokens</td>
      <td>Output: $0.75 / 1M tokens</td>
    </tr>
  </tbody>
</table>

<p>On cost alone, Gemini 1.5 Flash has a truly massive advantage.</p>

<h2 id="比較">比較</h2>

<h3 id="第一題-菜單翻譯">第一題 菜單翻譯</h3>

<p>Prompt:</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-01.png" width="60%" /></p>

<p>Results:</p>

<p>GPT-4o：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-02.png" width="60%" /></p>

<p>Gemini 1.5 Flash：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-03.png" width="60%" /></p>

<p>GPT-4o looks fine at first glance, but a fair amount of the content went untranslated, there are hallucinated menu items, and the gap between the pre-tax and post-tax prices is far too large!</p>

<p>Gemini 1.5 Flash barely translated anything… but it did judge the amounts correctly, and the text recognition is all in place. You would need a second pass to make it complete.</p>

<h3 id="第二題-我是誰">第二題 我是誰</h3>

<p>Prompt:</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-04.png" width="60%" /></p>

<p>GPT-4o：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-05.png" width="60%" /></p>

<p>Gemini 1.5 Flash：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-06.png" width="60%" /></p>

<p>Both got the answer right, but GPT-4o's response is more complete.</p>

<h3 id="第三題-我是誰">第三題 我是誰</h3>

<p>Prompt:</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-07.png" width="60%" /></p>

<p>GPT-4o：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-08.png" width="60%" /></p>

<p>Gemini 1.5 Flash：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-09.png" width="60%" /></p>

<p>For Machamp, both were right again, but GPT-4o's answer is more complete.</p>

<h3 id="第四題-我是誰">第四題 我是誰</h3>

<p>Prompt:</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-10.png" width="60%" /></p>

<p>GPT-4o：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-11.png" width="60%" /></p>

<p>Gemini 1.5 Flash：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-12.png" width="60%" /></p>

<p>For this meme, GPT-4o's answer is quite good. Gemini isn't bad either, but it never actually named Electrode.</p>

<p>Some friends say the real answer is a Jigglypuff seen from above, or a Ditto. Ha!</p>

<h2 id="結論">結論</h2>

<p>GPT-4o and Gemini 1.5 Flash are both multimodal large language models, and both respond quite quickly.</p>

<p>But the cost and output quality make clear that they are different products; choose based on your use case.</p>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[兩者官網]]></summary></entry><entry><title type="html">LLM Price Check</title><link href="/chatgpt/how-to-llm-price-check/" rel="alternate" type="text/html" title="LLM Price Check" /><published>2024-05-04T02:42:00+00:00</published><updated>2024-05-04T02:42:00+00:00</updated><id>/chatgpt/how-to-llm-price-check</id><content type="html" xml:base="/chatgpt/how-to-llm-price-check/"><![CDATA[<h2 id="網址">Link</h2>
<p><a href="https://llmpricecheck.com/">LLM Price Check</a></p>

<p><img src="/assets/img/2024-05-04-how-to-llm-price-check-01.png" width="60%" /></p>

<h2 id="介紹">介紹</h2>
<p>Beyond the commercial large models from OpenAI, Google, and Anthropic, more and more SaaS companies now serve various open-source large models via API, at very competitive prices.</p>

<p>The list includes Groq, which is getting popular in Taiwan, as well as Perplexity, famous for its AI search.</p>

<p>llama3-8b and gemma-7b, roughly on par with GPT-3.5, cost only about 30% of GPT-3.5's price.</p>

<p><img src="/assets/img/2024-05-04-how-to-llm-price-check-02.png" width="60%" /></p>

<p>Another feature I use often is sorting by Quality,</p>

<p>so you can still find a platform that fits your budget while holding a quality bar.</p>

<p><img src="/assets/img/2024-05-04-how-to-llm-price-check-03.png" width="60%" /></p>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[網址 LLM Price Check]]></summary></entry></feed>