<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.2">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2025-02-12T14:32:16+00:00</updated><id>/feed.xml</id><title type="html">Jakevin’s Blog</title><subtitle>A notebook for tech, life, and whatever comes to mind</subtitle><entry><title type="html">How to Get a Groq API Key</title><link href="/ai/how-to-get-groq-api-key/" rel="alternate" type="text/html" title="How to Get a Groq API Key" /><published>2025-02-12T10:42:00+00:00</published><updated>2025-02-12T10:42:00+00:00</updated><id>/ai/how-to-get-groq-api-key</id><content type="html" xml:base="/ai/how-to-get-groq-api-key/"><![CDATA[<p>URL: <a href="https://console.groq.com/playground">Groq console</a></p>

<p>Groq, aka the AI philanthropist, offers quite a few extremely fast, high-quality AI models.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-01.png" width="60%" /></p>

<p>It also has the latest Qwen 2.5 and the distilled DeepSeek R1.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-05.png" width="60%" /></p>

<p>Register and sign in: <a href="https://console.groq.com/login">Login</a></p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-02.png" width="60%" /></p>

<p>Go check your email.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-06.png" width="60%" /></p>

<p>Open the console, select <code class="language-plaintext highlighter-rouge">API keys</code>, then <code class="language-plaintext highlighter-rouge">Create API Key</code>.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-03.png" width="60%" /></p>

<p>Important! Important! Important! Be sure to copy the key now; you won't be able to find it again later.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-groq-api-key-04.png" width="60%" /></p>

<p>That completes the Groq signup. How do you use it? Read on.</p>

<h3 id="case-1">Case 1</h3>

<p>If the service you're using shows <code class="language-plaintext highlighter-rouge">https://api.openai.com/v1/chat/completions</code>,
change it to <code class="language-plaintext highlighter-rouge">https://api.groq.com/openai/v1/chat/completions</code>.</p>

<h3 id="case-2">Case 2</h3>

<p>If it shows <code class="language-plaintext highlighter-rouge">https://api.openai.com/v1</code>,
change it to <code class="language-plaintext highlighter-rouge">https://api.groq.com/openai/v1</code>.</p>

<h3 id="case-3">Case 3</h3>

<p>Key = API key. If the service asks you to fill in an API key, OpenAI's format looks like <code class="language-plaintext highlighter-rouge">sk-1122334455667788</code>;
just paste in the key you got from Groq, which looks like <code class="language-plaintext highlighter-rouge">gsk_1122334455667788</code>.</p>

<p>If you see <code class="language-plaintext highlighter-rouge">Bearer sk-1122334455667788</code>,
change it to <code class="language-plaintext highlighter-rouge">Bearer your-Groq-key</code>.</p>
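<p>Putting Cases 1–3 together, here is a minimal Python sketch (standard library only) of how such a request would be assembled; the key value and payload below are placeholders for illustration, not a working credential.</p>

```python
import json
import urllib.request

GROQ_API_KEY = "gsk_1122334455667788"  # placeholder key for illustration


def build_groq_request(payload: dict) -> urllib.request.Request:
    """Build an OpenAI-style chat request aimed at Groq's compatible endpoint."""
    return urllib.request.Request(
        "https://api.groq.com/openai/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {GROQ_API_KEY}",
        },
        method="POST",
    )


req = build_groq_request({
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
})
# urllib.request.urlopen(req) would actually send it; omitted here.
```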

<h3 id="case-4">Case 4</h3>

<p>As for model names, Groq's look like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gemma2-9b-it
llama-3.3-70b-versatile
qwen-2.5-32b
deepseek-r1-distill-qwen-32b
deepseek-r1-distill-llama-70b-specdec
</code></pre></div></div>
<p>There are many of them, and a new batch ships roughly every three months; if you use a preview model it may eventually be retired.
Keep an eye on the <a href="https://console.groq.com/docs/models">model list</a>.</p>

<p>Currently recommended: <code class="language-plaintext highlighter-rouge">gemma2-9b-it</code> <code class="language-plaintext highlighter-rouge">qwen-2.5-32b</code> <code class="language-plaintext highlighter-rouge">deepseek-r1-distill-qwen-32b</code></p>]]></content><author><name></name></author><category term="ai" /><summary type="html"><![CDATA[URL: Groq console]]></summary></entry><entry><title type="html">How to Use the Google Gemini API in OpenAI-Compatible Mode</title><link href="/google/google-ai-api-openai-compatibility/" rel="alternate" type="text/html" title="How to Use the Google Gemini API in OpenAI-Compatible Mode" /><published>2025-02-11T14:42:00+00:00</published><updated>2025-02-11T14:42:00+00:00</updated><id>/google/google-ai-api-openai-compatibility</id><content type="html" xml:base="/google/google-ai-api-openai-compatibility/"><![CDATA[<p>URL: <a href="https://aistudio.google.com/">Google AI Studio</a></p>

<p>Many services online ask you to fill in an OpenAI-compatible URL, but the Gemini API's native format is quite idiosyncratic, and not every service supports it.</p>

<p><img src="/assets/img/2025-02-13-google-ai-api-openai-compatibility-01.png" width="60%" /></p>

<p>Google AI Studio doesn't surface the OpenAI-compatible URL directly; you have to dig it out of the documentation.</p>

<p><img src="/assets/img/2025-02-13-google-ai-api-openai-compatibility-02.png" width="60%" /></p>

<h3 id="case-1">Case 1</h3>

<p>If the service you're using shows <code class="language-plaintext highlighter-rouge">https://api.openai.com/v1/chat/completions</code>,
change it to <code class="language-plaintext highlighter-rouge">https://generativelanguage.googleapis.com/v1beta/openai/chat/completions</code>.</p>

<h3 id="case-2">Case 2</h3>

<p>If it shows <code class="language-plaintext highlighter-rouge">https://api.openai.com/v1</code>,
change it to <code class="language-plaintext highlighter-rouge">https://generativelanguage.googleapis.com/v1beta/openai/</code>.</p>
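<p>The swap in Cases 1 and 2 is pure string substitution; a small hypothetical helper makes that concrete:</p>

```python
# Rewrite an OpenAI endpoint into Google's OpenAI-compatible one.
# Pure string handling; no network access needed.
OPENAI_BASE = "https://api.openai.com/v1"
GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/openai"


def to_gemini(url: str) -> str:
    if url.startswith(OPENAI_BASE):
        return GEMINI_BASE + url[len(OPENAI_BASE):]
    return url  # leave non-OpenAI URLs untouched


print(to_gemini("https://api.openai.com/v1/chat/completions"))
# → https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
```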

<h3 id="case-3">Case 3</h3>

<p>Key = API key. If the service asks you to fill in an API key, OpenAI's format looks like <code class="language-plaintext highlighter-rouge">sk-1122334455667788</code>;
just paste in the key you got from Google AI Studio.</p>

<p>If you see <code class="language-plaintext highlighter-rouge">Bearer sk-1122334455667788</code>,
change it to <code class="language-plaintext highlighter-rouge">Bearer your-Google-key</code>.</p>

<h3 id="case-4">Case 4</h3>

<p>As for model names, Google's look like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gemini-2.0-pro-exp
gemini-2.0-flash
gemini-2.0-flash-exp
gemini-2.0-flash-thinking-exp
gemini-2.0-flash-lite-preview
gemini-2.0-flash-lite-preview-02-05
</code></pre></div></div>
<p>There are many of them, and a new batch ships roughly every three months; preview models, or model names carrying a date, are sometimes retired.
Keep an eye on the <a href="https://ai.google.dev/gemini-api/docs/models/gemini?hl=zh-tw">model list</a>.</p>

<p>Currently recommended: <code class="language-plaintext highlighter-rouge">gemini-2.0-flash</code> <code class="language-plaintext highlighter-rouge">gemini-2.0-flash-exp</code> <code class="language-plaintext highlighter-rouge">gemini-2.0-pro-exp</code></p>

<p>References:</p>

<p><a href="https://ai.google.dev/gemini-api/docs/openai?hl=zh-tw#rest">OpenAI 相容性</a></p>]]></content><author><name></name></author><category term="google" /><summary type="html"><![CDATA[網址：Google AI Studio]]></summary></entry><entry><title type="html">如何取得Google AI studio API Key</title><link href="/google/how-to-get-google-ai-api-key/" rel="alternate" type="text/html" title="如何取得Google AI studio API Key" /><published>2025-02-10T14:42:00+00:00</published><updated>2025-02-10T14:42:00+00:00</updated><id>/google/how-to-get-google-ai-api-key</id><content type="html" xml:base="/google/how-to-get-google-ai-api-key/"><![CDATA[<p>網址：<a href="https://aistudio.google.com/">Google AI Studio</a></p>

<p>Select <code class="language-plaintext highlighter-rouge">Get API key</code></p>

<p><img src="/assets/img/2025-02-12-how-to-get-google-ai-api-key-01.png" width="60%" /></p>

<p>Select <code class="language-plaintext highlighter-rouge">Create API key</code></p>

<p><img src="/assets/img/2025-02-12-how-to-get-google-ai-api-key-02.png" width="60%" /></p>

<p>Pick a project; if you don't have one, you may be asked to create one.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-google-ai-api-key-03.png" width="60%" /></p>

<p>That's it. Copy the key you just received.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-google-ai-api-key-04.png" width="60%" /></p>

<p>If you forget the key, just click it and a confirmation dialog will pop up showing it.</p>

<p><img src="/assets/img/2025-02-12-how-to-get-google-ai-api-key-05.png" width="60%" /></p>

<p>Follow-up:
<a href="https://www.jakevin.uk/google/google-ai-api-openai-compatibility/">How to Use the Google Gemini API in OpenAI-Compatible Mode</a></p>

<p>References:</p>

<p><a href="https://ai.google.dev/gemini-api/docs/api-key?hl=zh-tw">取得 Gemini API 金鑰</a></p>

<p><a href="https://emtech.cc/p/gemini-api">Gemini API 快速入門：來用 Js 做一個線上聊天吧</a></p>]]></content><author><name></name></author><category term="google" /><summary type="html"><![CDATA[網址：Google AI Studio]]></summary></entry><entry><title type="html">一起來用 Structured Outputs</title><link href="/chatgpt/structured-outputs/" rel="alternate" type="text/html" title="一起來用 Structured Outputs" /><published>2024-10-20T14:42:00+00:00</published><updated>2024-10-20T14:42:00+00:00</updated><id>/chatgpt/structured-outputs</id><content type="html" xml:base="/chatgpt/structured-outputs/"><![CDATA[<p><a href="../function-calling">一起來用 LLM Function Calling</a></p>

<p><a href="../audio-generation">一起來用 LLM Audio generation</a></p>

<p><a href="https://platform.openai.com/docs/guides/structured-outputs">OpenAI Structured Outputs</a></p>

<p>This is OpenAI's next major update to output formats, released in 2024, about a year after Function Calling.</p>

<p>I mainly build translation tools, so the examples here lean toward translation use cases.</p>

<h3 id="structured-outputs">Structured Outputs</h3>

<h4 id="情境需要將中文翻譯成指定的多個語系">情境：需要將中文，翻譯成指定的多個語系。</h4>

<p>First, a question without any tooling:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請翻譯成'zh-CN簡體中文' 'jp-JP日本語' 'en-US美式英文' 完整翻譯，不要發音、注解。"
        }
    ]
}
</code></pre></div></div>

<p>Response, 886 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "
            zh-CN简体中文：请问最近的车站怎么走？
        
            jp-JP日本語：最近の駅へはどうやって行きますか？
        
            en-US美式英文：Excuse me, how do I get to the nearest train station?",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 82,
        "completion_tokens": 55,
        "total_tokens": 137
    }
}
</code></pre></div></div>

<p>It does give the answer I wanted, but that answer is hard to take back into your code for further use, say to display only the Chinese, or only the Chinese and Japanese.</p>

<h4 id="情境使用structured-outputs將翻譯結果指定為多個語系">情境：使用Structured Outputs，將翻譯結果指定為多個語系。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，完整翻譯，不要發音、注解。"
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": "string",
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": "string",
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN"
                ]
            }
        }
    }
}
</code></pre></div></div>

<p>Spec-wise, you can see that <code class="language-plaintext highlighter-rouge">response_format</code> is the key parameter for this output.</p>

<p>Most of it is fixed boilerplate; the parts that vary are <code class="language-plaintext highlighter-rouge">json_schema.name</code> and <code class="language-plaintext highlighter-rouge">json_schema.schema</code>.</p>

<p><code class="language-plaintext highlighter-rouge">json_schema.name</code> 顧名思義就是這個 json_schema 的名稱。</p>

<p><code class="language-plaintext highlighter-rouge">json_schema.schema</code> is the important part: here you decide whether the JSON output is an object or an array!</p>

<p><code class="language-plaintext highlighter-rouge">json_schema.schema.properties</code> 開始，就是描述複雜的json回應格式了，就看每個情境來決定。</p>

<p>In my case, I need a translation result for each language.</p>

<p>The output, 1339 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "{\"zh-CN\":\"请问最近的车站怎么走？\",\"jp-JP\":\"最近の駅に行くにはどうすればいいですか？\",\"en-US\":\"How do I get to the nearest station?\"}",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 45,
        "total_tokens": 165,
    }
}

// the extracted result
{
  'zh-CN': '请问最近的车站怎么走？',
  'jp-JP': '最近の駅に行くにはどうすればいいですか？',
  'en-US': 'How do I get to the nearest station?'
}

</code></pre></div></div>

<p>As you can see, it produces exactly the shape I specified, which makes post-processing easy.</p>

<p>Interestingly, my prompt didn't specify which languages to output, yet it still worked out which languages to return.</p>
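<p>The post-processing really is one <code class="language-plaintext highlighter-rouge">json.loads</code> away; a quick sketch using the response content from above:</p>

```python
import json

# message["content"] arrives as a JSON string; parsing it gives a dict
# you can slice however you like (e.g. show only the Chinese result).
content = ('{"zh-CN":"请问最近的车站怎么走？",'
           '"jp-JP":"最近の駅に行くにはどうすればいいですか？",'
           '"en-US":"How do I get to the nearest station?"}')

translations = json.loads(content)
print(translations["zh-CN"])   # only the Chinese result
```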

<h4 id="情境使用structured-outputs將翻譯結果指定沒有在輸出格式裡的語系">情境：使用Structured Outputs，將翻譯結果指定沒有在輸出格式裡的語系。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請用'th-TH泰語'完整翻譯，不要發音、注解。"
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": "string",
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": "string",
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN"
                ]
            }
        }
    }
}
</code></pre></div></div>
<p>Here I deliberately asked for a Thai translation while making <code class="language-plaintext highlighter-rouge">zh-CN</code> the only required field of the JSON output.</p>

<p>Output, 873 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "{\"en-US\":\"How do I get to the nearest station?\",\"zh-CN\":\"请问最近的车站怎么走?\",\"jp-JP\":\"最近の駅にはどうやって行けばいいですか?\",\"th-TH\":\"บอกทางไปสถานีที่ใกล้ที่สุดหน่อยได้ไหมครับ/ค่ะ?\"}",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 129,
        "completion_tokens": 70,
        "total_tokens": 199
    }
}

// the extracted result
{
  'en-US': 'How do I get to the nearest station?',
  'zh-CN': '请问最近的车站怎么走?',
  'jp-JP': '最近の駅にはどうやって行けばいいですか?',
  'th-TH': 'บอกทางไปสถานีที่ใกล้ที่สุดหน่อยได้ไหมครับ/ค่ะ?'
}
</code></pre></div></div>

<p>Besides following my requested format, it also produced the schema fields I didn't ask for.</p>

<p>Why did the unrequested fields appear again, just like in the previous experiment?</p>

<p>The reason is here: <a href="https://platform.openai.com/docs/guides/structured-outputs/best-practices">Tips and best practices</a></p>

<p>OpenAI's guidance says to force every parameter into <code class="language-plaintext highlighter-rouge">required</code>; apparently their backend does this on its own as well.</p>

<h4 id="情境使用structured-outputs沒有指定的語言就不要輸出">情境：使用Structured Outputs，沒有指定的語言就不要輸出。</h4>

<p>According to OpenAI's docs, you can set <code class="language-plaintext highlighter-rouge">properties.parameter.type</code> to <code class="language-plaintext highlighter-rouge">["string", "null"]</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請用'th-TH泰語'完整翻譯，不要發音、注解。"
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": ["string", "null"],
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": ["string", "null"],
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN"
                ]
            }
        }
    }
}

</code></pre></div></div>

<p>Output, 1603 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "{\"zh-CN\":\"请问最近的车站怎么走?\",\"jp-JP\":null,\"en-US\":null,\"th-TH\":\"กรุณาถามว่าไปสถานีที่ใกล้ที่สุดได้อย่างไร?\"}",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 135,
        "completion_tokens": 48,
        "total_tokens": 183
    }
}
// the extracted result
{
    "zh-CN":"请问最近的车站怎么走?",
    "jp-JP":null,
    "en-US":null,
    "th-TH":"กรุณาถามว่าไปสถานีที่ใกล้ที่สุดได้อย่างไร?"
}
</code></pre></div></div>

<p>Languages outside my specified scope now come back as null.</p>
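<p>On the client side, a dict comprehension is enough to drop those nulls after parsing; a sketch using the result above:</p>

```python
import json

# Unrequested languages come back as JSON null (None after parsing);
# keep only the languages the model actually filled in.
content = ('{"zh-CN":"请问最近的车站怎么走?","jp-JP":null,"en-US":null,'
           '"th-TH":"กรุณาถามว่าไปสถานีที่ใกล้ที่สุดได้อย่างไร?"}')

parsed = json.loads(content)
filled = {lang: text for lang, text in parsed.items() if text is not None}
print(list(filled))   # → ['zh-CN', 'th-TH']
```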

<h4 id="情境使用structured-outputs但不請他翻譯我問他別的事情">情境：使用Structured Outputs，但不請他翻譯，我問他別的事情</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": "今天的台北好冷，你有沒有推薦的保暖方法？"
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": ["string", "null"],
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": ["string", "null"],
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN","jp-JP","en-US"
                ]
            }
        }
    }
}
</code></pre></div></div>

<p>Output, 2170 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "{\"en-US\":[\"Wear layers of clothing to trap heat, such as thermal shirts, sweaters, and insulating jackets.\",\"Use a warm scarf, hat, and gloves to keep extremities warm.\",\"Drink hot beverages like tea or coffee to stay warm from the inside.\",\"Consider using a space heater or electric blanket at home.\",\"Snuggle up with a warm blanket while watching movies or reading a book.\"],\"jp-JP\":[null],\"zh-CN\":\"可以尝试以下几种保暖方法：\\n\\n1. 多穿几层衣服，选择保暖的内衣、毛衣和外套。\\n2. 戴上围巾、帽子和手套，以保暖手脚部位。\\n3. 喝热饮如茶或咖啡，让身体从内部保持温暖。\\n4. 在家使用取暖器或电热毯。\\n5. 看电影或阅读时，裹上毛毯。\"}",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 97,
        "completion_tokens": 200,
        "total_tokens": 297
    }
}

</code></pre></div></div>

<p>It still returned the JSON format I asked for! My task was simple, so the LLM understood it as: come up with an answer, then return it in the schema's languages.</p>

<p>That said, the OpenAI docs note that you still need to handle the exception cases! <a href="https://platform.openai.com/docs/guides/structured-outputs/refusals">Refusals with structured outputs</a></p>

<h4 id="structured-outputs--function-calling--要你命3000">Structured Outputs + Function Calling = 要你命3000</h4>

<p>When two features that can each carry the show on their own get mixed together, do you end up with the <code class="language-plaintext highlighter-rouge">要你命3000</code> (the all-in-one super weapon) or the <code class="language-plaintext highlighter-rouge">撒尿牛丸</code> (a mashup better than the sum of its parts)? Both are Stephen Chow movie gags.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": "今天的台北好冷，我想買最大的外套。你那邊有沒有推薦的保暖方法？"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "pick_size",
                "description": "Call this if the user specifies which size they want",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "size": {
                            "type": "string",
                            "enum": [
                                "S",
                                "M",
                                "L",
                                "XL"
                            ],
                            "description": "The size of the t-shirt that the user would like to order"
                        }
                    },
                    "required": [
                        "size"
                    ],
                    "additionalProperties": false
                }
            }
        }
    ],
    "tool_choice": "auto",
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": [
                            "string",
                            "null"
                        ],
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": [
                            "string",
                            "null"
                        ],
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN",
                    "jp-JP",
                    "en-US"
                ]
            }
        }
    }
}

</code></pre></div></div>

<p>Output, 3370 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "{\"en-US\":[null],\"jp-JP\":[null],\"zh-CN\":\"在台北寒冷的天气中，除了购买最大的外套，你可以尝试以下几种保暖的方法：\\n\\n1. **分层穿衣**：穿多层衣物，如长袖衬衫、毛衣和外套，可以更好地保持体温。\\n2. **使用围巾和手套**：围巾可以保护脖子，手套则保暖双手。\\n3. **选用保暖材料**：如羊毛、羽绒或厚棉材料的衣物，能有效隔绝寒冷。\\n4. **保暖鞋袜**：穿保暖鞋和厚袜子，确保脚部的温暖。\\n5. **喝热饮**：喝热茶、热巧克力等可以让身体内部升温。\\n6. **活动身体**：适当运动可以帮助提升体温，避免长时间静坐。 \\n\\n希望这些方法能帮助你在寒冷的台北保持温暖！\"}",
                "refusal": null
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 168,
        "completion_tokens": 242,
        "total_tokens": 410
    }
}
</code></pre></div></div>

<p>Now ask it a different way:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": null,
                "tool_calls": [
                    {
                        "id": "call_WshjoFjxhiN84EFE3S8lFGHt",
                        "type": "function",
                        "function": {
                            "name": "pick_size",
                            "arguments": "{\"size\": \"XL\"}"
                        }
                    }
                ],
                "refusal": null
            },
            "logprobs": null,
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": 170,
        "completion_tokens": 30,
        "total_tokens": 200
    }
}

</code></pre></div></div>

<p>Seeeeee! There it is, the <code class="language-plaintext highlighter-rouge">要你命3000</code>!</p>

<p>Worth calling out separately: <code class="language-plaintext highlighter-rouge">choices[0].finish_reason</code></p>

<p>If this turn takes the <code class="language-plaintext highlighter-rouge">Structured Outputs</code> path, <code class="language-plaintext highlighter-rouge">finish_reason</code> is stop</p>

<p>If it takes the <code class="language-plaintext highlighter-rouge">Function Calling</code> path, <code class="language-plaintext highlighter-rouge">finish_reason</code> is tool_calls</p>
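<p>So a single handler can branch on <code class="language-plaintext highlighter-rouge">finish_reason</code>; a minimal sketch (the sample reply mirrors the tool_calls response above):</p>

```python
import json


def handle_choice(choice: dict):
    """Dispatch on finish_reason: structured output vs. tool call."""
    msg = choice["message"]
    if choice["finish_reason"] == "tool_calls":
        # Function Calling: arguments live in tool_calls as JSON strings.
        return [(c["function"]["name"], json.loads(c["function"]["arguments"]))
                for c in msg["tool_calls"]]
    # "stop": Structured Outputs puts the JSON string in content.
    return json.loads(msg["content"])


tool_reply = {
    "finish_reason": "tool_calls",
    "message": {
        "content": None,
        "tool_calls": [{"function": {"name": "pick_size",
                                     "arguments": '{"size": "XL"}'}}],
    },
}
print(handle_choice(tool_reply))   # → [('pick_size', {'size': 'XL'})]
```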

<h4 id="結論">結論</h4>

<p>Structured Outputs claims to produce exactly the shape you want, 100% of the time; my tests bore that out, with a few pleasant surprises on top.</p>

<p>This makes it practical to bring AI into your development pipeline and reliably post-process what it returns.</p>

<p>Comparing Structured Outputs and Function Calling</p>

<p>Similarities:</p>
<ol>
  <li>Both reliably produce JSON and let you define your own parameters</li>
  <li>Both increase token usage, though Function Calling is somewhat more economical than Structured Outputs</li>
  <li>Both lengthen response time, from about 880 ms to roughly 1300–1600 ms</li>
</ol>

<p>Differences:</p>
<ol>
  <li>Function Calling produces different results depending on which tool the LLM picks</li>
  <li>Processing Function Calling's output takes a few more steps</li>
  <li>Function Calling is more flexible; you can build out the tools library however you like</li>
</ol>

<p>If you use Structured Outputs and Function Calling at the same time, there's no guarantee which kind of result gets output!</p>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[Let's Use LLM Function Calling]]></summary></entry><entry><title type="html">Let's Use Audio Generation</title><link href="/chatgpt/audio-generation/" rel="alternate" type="text/html" title="Let's Use Audio Generation" /><published>2024-10-20T14:42:00+00:00</published><updated>2024-10-20T14:42:00+00:00</updated><id>/chatgpt/audio-generation</id><content type="html" xml:base="/chatgpt/audio-generation/"><![CDATA[<p><a href="../function-calling">Let's Use LLM Function Calling</a></p>

<p><a href="../structured-outputs">一起來用 LLM Structured Outputs</a></p>

<p><a href="https://platform.openai.com/docs/guides/audio/audio-generation">OpenAI Audio generation</a></p>

<p>This is a feature OpenAI released only in October 2024; the model is <code class="language-plaintext highlighter-rouge">gpt-4o-audio-preview</code>, which supports audio input and output.</p>

<p>與<a href="https://platform.openai.com/docs/guides/realtime/quickstart">Realtime API</a>不一樣，他走的還是 <code class="language-plaintext highlighter-rouge">/chat/completions</code>，並不是 WebSocket</p>

<p><a href="https://openai.com/api/pricing/">價格</a></p>

<p>Audio***
$100.00 / 1M input tokens
$200.00 / 1M output tokens</p>

<p><img src="/assets/img/2024-10-21-audio-generation-01.png" width="60%" /></p>

<p>$100 looks alarming, but it's actually the same cost as whisper-1.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>***Audio input costs approximately 6¢ per minute; Audio output costs approximately 24¢ per minute
***語音輸入約每分鐘6美分；語音輸出約每分鐘24美分。
</code></pre></div></div>
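<p>A quick sanity check on those figures: at $100 per 1M input tokens, 6¢ per minute implies audio is billed at roughly 600 input tokens per minute (and output at about 1200 per minute). This is just arithmetic backed out of the quoted prices, not an official token rate.</p>

```python
# Back out the implied tokens-per-minute from the quoted prices.
input_price_per_token = 100.00 / 1_000_000    # $100 / 1M input tokens
output_price_per_token = 200.00 / 1_000_000   # $200 / 1M output tokens

tokens_per_minute_in = 0.06 / input_price_per_token    # 6¢ per minute of audio in
tokens_per_minute_out = 0.24 / output_price_per_token  # 24¢ per minute of audio out
print(tokens_per_minute_in, tokens_per_minute_out)
```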

<p>I use this for real-time speech translation, so I'll focus on the <code class="language-plaintext highlighter-rouge">audio input, text output</code> part.</p>

<h3 id="audio-generation">Audio generation</h3>

<h4 id="情境將中文語音轉成日文文字">情境：將中文語音轉成日文文字。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-audio-preview",
    "modalities": ["text"],
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": [
                { "type": "text", text: "將語音，用'jp-JP'完整翻譯，不要發音、注解。" },
                { "type": "input_audio", "input_audio": { "data": base64str, "format": "mp3" } }
            ]
        }
    ]
}
</code></pre></div></div>

<p>Response, 2970 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "皆さん、こんばんは。私のチャンネルへようこそ。",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 100,
        "completion_tokens": 15,
        "total_tokens": 115,
    }
}
</code></pre></div></div>

<p>This usage is almost identical to the <code class="language-plaintext highlighter-rouge">Whisper API</code>, but here are the pitfalls I hit!!</p>

<ol>
  <li>Whisper supports five or six audio formats, including webm and ogg, but <code class="language-plaintext highlighter-rouge">gpt-4o-audio-preview</code> only supports mp3 and wav</li>
</ol>

<p>That means it can't be used directly in the browser; you must transcode first.</p>

<ol start="2">
  <li>When passing base64 as the audio input, you must also strip the meta header from the wav base64 data
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>base64data.split(',')[1]
</code></pre></div>    </div>
  </li>
</ol>
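<p>A short Python sketch of that preparation step; the bytes below are a stand-in for real mp3 data, not an actual recording:</p>

```python
import base64

fake_mp3_bytes = b"\xff\xfb\x90\x00fake-mp3-frames"  # stand-in for real audio

# From a raw file: base64-encode the bytes directly.
b64_from_file = base64.b64encode(fake_mp3_bytes).decode("ascii")

# From a browser data URL (e.g. FileReader.readAsDataURL): strip the
# "data:...;base64," meta header before sending it as input_audio.data.
data_url = "data:audio/mp3;base64," + b64_from_file
b64_from_data_url = data_url.split(",")[1]
```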

<h4 id="情境需要將中文語音翻譯成指定的多個語系並使用function-calling">情境：需要將中文語音，翻譯成指定的多個語系，並使用Function Calling。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-audio-preview",
    "modalities": ["text"],
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": [
                { "type": "text", text: "將語音，請翻譯成'zh-CN簡體中文' 'jp-JP日本語' 'en-US美式英文' 完整翻譯，不要發音、注解。" },
                { "type": "input_audio", "input_audio": { "data": base64str, "format": "mp3" } }
            ]
        }
    ],
     "tools": [
        {
            "type": "function",
            "function": {
                "name": "split_text",
                "description": "Identify Chinese and other languages in text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "zh-CN": {
                            "type": "string",
                            "description": "Chinese reslut"
                        },
                        "jp-JP": {
                            "type": "string",
                            "description": "Japanese reslut"
                        },
                        "en-US": {
                            "type": "string",
                            "description": "English reslut"
                        }
                    },
                    "required": [
                        "zh-CN"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}
</code></pre></div></div>

<p>Output, 4970 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_dAdfaFvPBoL4JrgL4QRphIfo",
            "type": "function",
            "function": {
              "name": "split_text",
              "arguments": "{\"zh-CN\": \"大家好，今天是10月21号，星期一，欢迎来到我们的节目。\"}"
            }
          },
          {
            "id": "call_HnffYc7jMsIZntdBBhPbgrsp",
            "type": "function",
            "function": {
              "name": "split_text",
              "arguments": "{\"jp-JP\": \"皆さん、こんにちは。 今日は10月21日、月曜日です。 私たちのプログラムへようこそ。\"}"
            }
          },
          {
            "id": "call_VfDkfyBKwKzxRHkNjLyyqnd7",
            "type": "function",
            "function": {
              "name": "split_text",
              "arguments": "{\"en-US\": \"Hello everyone, today is October 21st, Monday. Welcome to our show.\"}"
            }
          }
        ],
        "refusal": null
    },
    "usage": {
       "prompt_tokens": 188,
        "completion_tokens": 123,
        "total_tokens": 311,
    }
}
</code></pre></div></div>

<p>Function Calling is supported; as in the earlier experiments, the output still varies from run to run.</p>

<p>But Function Calling is flexible enough that it should enable plenty of interesting development.</p>
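<p>Since the model split the answer across three <code class="language-plaintext highlighter-rouge">split_text</code> calls, one language each, one obvious post-processing step is to merge their arguments back into a single translations dict; a sketch using the arguments from the reply above:</p>

```python
import json

# Each tool call carries one language; merging the parsed arguments
# reassembles the full multi-language result.
tool_calls = [
    {"function": {"name": "split_text",
                  "arguments": '{"zh-CN": "大家好，今天是10月21号，星期一，欢迎来到我们的节目。"}'}},
    {"function": {"name": "split_text",
                  "arguments": '{"jp-JP": "皆さん、こんにちは。 今日は10月21日、月曜日です。 私たちのプログラムへようこそ。"}'}},
    {"function": {"name": "split_text",
                  "arguments": '{"en-US": "Hello everyone, today is October 21st, Monday. Welcome to our show."}'}},
]

merged = {}
for call in tool_calls:
    merged.update(json.loads(call["function"]["arguments"]))
print(list(merged))   # → ['zh-CN', 'jp-JP', 'en-US']
```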

<h4 id="情境使用structured-outputs將翻譯結果指定為多個語系">情境：使用Structured Outputs，將翻譯結果指定為多個語系。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-audio-preview",
    "modalities": ["text"],
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": [
                { "type": "text", text: "將語音，用完整翻譯，不要發音、注解。" },
                { "type": "input_audio", "input_audio": { "data": base64str, "format": "mp3" } }
            ]
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "split_text",
            "schema": {
                "type": "object",
                "properties": {
                    "zh-CN": {
                        "type": "string",
                        "description": "Chinese reslut"
                    },
                    "jp-JP": {
                        "type": "string",
                        "description": "Japanese reslut"
                    },
                    "en-US": {
                        "type": "string",
                        "description": "Englisg reslut"
                    }
                },
                "required": [
                    "zh-CN"
                ]
            }
        }
    }
}
</code></pre></div></div>

<p>The request above follows the approach from the Structured Outputs experiment article, turning the audio into the multilingual translation I want.</p>

<p>The output, after 339 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "error": {
        "message": "Invalid parameter: 'response_format' of type 'json_schema' is not supported with this model. Learn more about supported models at the Structured Outputs guide: https://platform.openai.com/docs/guides/structured-outputs",
        "type": "invalid_request_error",
        "param": null,
        "code": null
    }
}
</code></pre></div></div>

<p>Great in theory; disappointing in practice…</p>

<p><code class="language-plaintext highlighter-rouge">gpt-4o-audio-preview</code> 並不支援新的 Structured Outputs 結構</p>

<h4 id="結論">結論</h4>

<p>The traditional approach is to transcribe the audio with Whisper first, then call an LLM on the text.</p>

<p>Audio generation instead feeds the audio straight into OpenAI and runs the LLM in the same call, saving a whole step.</p>

<p>That said, from my development experience there are two big drawbacks to fix:</p>

<ol>
  <li>
    <p>No webm support is a real pain. Converting to mp3 or wav before upload not only makes the audio file larger, you also wait for the conversion.</p>
  </li>
  <li>
    <p>Structured Outputs is somehow unsupported. You are a GPT-4o family model; why can't you use it?!</p>
  </li>
</ol>

<p>I hope these two drawbacks get fixed soon, so my real-time speech translation can get a little faster!</p>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[一起來用 LLM Function Calling]]></summary></entry><entry><title type="html">一起來用 Function Calling</title><link href="/chatgpt/function-calling/" rel="alternate" type="text/html" title="一起來用 Function Calling" /><published>2024-10-20T02:42:00+00:00</published><updated>2024-10-20T02:42:00+00:00</updated><id>/chatgpt/function-calling</id><content type="html" xml:base="/chatgpt/function-calling/"><![CDATA[<p><a href="../structured-outputs">Using LLM Structured Outputs</a></p>

<p><a href="../audio-generation">一起來用 LLM Audio generation</a></p>

<p><a href="https://platform.openai.com/docs/guides/function-calling">OpenAI Function Calling</a></p>

<p>OpenAI launched this feature in June 2023; built on top of text generation, it produces relatively stable output formats.</p>

<p>I mainly build translation tools, so the examples here lean toward translation use cases.</p>

<h3 id="function-calling">Function Calling</h3>

<h4 id="情境需要將中文翻譯成指定的多個語系">情境：需要將中文，翻譯成指定的多個語系。</h4>

<p>First, ask without any tools:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請翻譯成'zh-CN簡體中文' 'jp-JP日本語' 'en-US美式英文' 完整翻譯，不要發音、注解。"
        }
    ]
}
</code></pre></div></div>

<p>The response, after 886 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "
            zh-CN简体中文：请问最近的车站怎么走？
        
            jp-JP日本語：最近の駅へはどうやって行きますか？
        
            en-US美式英文：Excuse me, how do I get to the nearest train station?",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 82,
        "completion_tokens": 55,
        "total_tokens": 137
    }
}
</code></pre></div></div>

<p>It does answer what I asked for, but that answer is hard to feed back into your code for further use, say showing only the Chinese, or only the Chinese and Japanese.</p>
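<p>To see why the free-form reply is awkward, here is what the brittle string parsing would look like. A sketch keyed to the <code class="language-plaintext highlighter-rouge">zh-CN简体中文：…</code> line format in the reply above; any drift in the model's formatting breaks it, which is exactly the problem.</p>

```javascript
// Pull one language's translation out of the free-form reply.
// Expects lines shaped like "zh-CN简体中文：translated text".
function extractLang(answer, tag) {
  const m = answer.match(new RegExp(tag + "[^：:]*[：:]\\s*(.+)"));
  return m ? m[1].trim() : null;
}
```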

<p>Now let's look at Function Calling.</p>

<h4 id="情境需要將中文翻譯成指定的多個語系並使用function-calling">情境：需要將中文，翻譯成指定的多個語系，並使用Function Calling。</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請翻譯成'zh-CN簡體中文' 'jp-JP日本語' 'en-US美式英文' 完整翻譯，不要發音、注解。"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "split_text",
                "description": "Identify Chinese and other languages in text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "zh-CN": {
                            "type": "string",
                            "description": "Chinese reslut"
                        },
                        "jp-JP": {
                            "type": "string",
                            "description": "Japanese reslut"
                        },
                        "en-US": {
                            "type": "string",
                            "description": "English reslut"
                        },
                        "howManyLangs":{
                            "type": "number",
                            "description": "How many languages have been produced in total?"
                        }
                    },
                    "required": [
                        "zh-CN"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}

</code></pre></div></div>

<p>Spec-wise, compared with plain <code class="language-plaintext highlighter-rouge">messages</code> input, this is noticeably more complex, so for ordinary cases you would not reach for Function Calling.</p>

<p><code class="language-plaintext highlighter-rouge">tools</code> 工具箱，在這邊你可以定義各式個樣的工具，讓AI的回應時使用你的工具來回答。</p>

<p>Here, what I do is sort the translated sentences by language.</p>

<p>The response, after 1139 ms:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_IcRHiDV4SkZPx1Jyswmopw6u",
                "type": "function",
                "function": {
                    "name": "split_text",
                    "arguments": "{\"zh-CN\": \"请问最近的车站怎么走?\", \"jp-JP\": \"最近の駅へはどう行けばいいですか?\", \"en-US\": \"How do I get to the nearest station?\"}"
                }
            }
        ],
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 161,
        "completion_tokens": 70,
        "total_tokens": 231
    }
}

// the part of the response that matters
{
  'zh-CN': '请问最近的车站怎么走?',
  'jp-JP': '最近の駅へはどう行けばいいですか?',
  'en-US': 'How do I get to the nearest station?'
}
</code></pre></div></div>

<p>It did return the translations in the JSON shape I specified, minus <code class="language-plaintext highlighter-rouge">howManyLangs</code>, since that field was not part of my required output.</p>
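<p>Using that result in code is then straightforward: <code class="language-plaintext highlighter-rouge">arguments</code> arrives as a JSON string, so one <code class="language-plaintext highlighter-rouge">JSON.parse</code> yields a plain object (a sketch over the response shape shown above):</p>

```javascript
// Parse the first tool call's arguments back into a usable object.
// Returns null when the message carries no tool calls.
function parseToolArguments(message) {
  const call = (message.tool_calls || [])[0];
  return call ? JSON.parse(call.function.arguments) : null;
}
```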

<p>But is Function Calling always a silver bullet? Let's find out.</p>

<h4 id="情境使用function-calling計算有多少個輸出語種">情境：使用Function Calling計算有多少個輸出語種。</h4>

<p>After adding <code class="language-plaintext highlighter-rouge">howManyLangs</code> to <code class="language-plaintext highlighter-rouge">required</code>, this happens.</p>

<p>Input (partially omitted):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"required": [
    "zh-CN",
    "howManyLangs"
]
</code></pre></div></div>

<p>The response, after 1386 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_MEGQYUKN3eSIoXpS0h0KCXkK",
                "type": "function",
                "function": {
                    "name": "split_text",
                    "arguments": "{\"zh-CN\": \"请问最近的车站怎么走?\"}"
                }
            },
            {
                "id": "call_Or3CdU0zhRFsAAAEQK3gdBtR",
                "type": "function",
                "function": {
                    "name": "split_text",
                    "arguments": "{\"jp-JP\": \"最近の駅への行き方を教えてください。\"}"
                }
            },
            {
                "id": "call_kHgrVkfrIF7NbcOWDoIAPijx",
                "type": "function",
                "function": {
                    "name": "split_text",
                    "arguments": "{\"en-US\": \"How do I get to the nearest station?\"}"
                }
            }
        ],
        "refusal": null
    }
}
</code></pre></div></div>

<p>As you can see, it still did not return <code class="language-plaintext highlighter-rouge">howManyLangs</code>, and on top of that it split the function into three separate calls…</p>

<p>So post-processing in your code becomes very important!</p>
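<p>One way to post-process: since the model may split a single logical call into several <code class="language-plaintext highlighter-rouge">tool_calls</code>, merge the argument objects back together before use (a sketch over the three-call response above):</p>

```javascript
// Merge all tool_calls' argument objects into one flat object,
// so a response split into three calls still yields {zh-CN, jp-JP, en-US}.
function mergeToolCallArguments(message) {
  return (message.tool_calls || []).reduce(
    (acc, call) => Object.assign(acc, JSON.parse(call.function.arguments)),
    {}
  );
}
```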

<h4 id="情境使用function-callingtools中有多種工具">情境：使用Function Calling，tools中有多種工具</h4>

<p>Next, let's split the function into three tools: translate to Simplified Chinese, to Japanese, and to English.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "You are a translation expert. I will ask you to translate a language into setting languages"
        },
        {
            "role": "user",
            "content": "句子\"請問最近的車站怎麼走?\"，請翻譯成'zh-CN簡體中文' 'jp-JP日本語' 'en-US美式英文' 完整翻譯，不要發音、注解。"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "split_chinese_text",
                "description": "Identify Chinese.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "zh-CN": {
                            "type": "string",
                            "description": "Chinese reslut"
                        }
                    },
                    "required": [
                        "zh-CN"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "split_japanese_text",
                "description": "Identify Japanese in text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "jp-JP": {
                            "type": "string",
                            "description": "Japanese reslut"
                        }
                    },
                    "required": [
                        "jp-JP"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "split_englist_text",
                "description": "Identify English in text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "en-US": {
                            "type": "string",
                            "description": "English reslut"
                        }
                    },
                    "required": [
                        "en-US"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}
</code></pre></div></div>

<p>The response, after 1666 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_NJrlogGVEXKnNNBjXnBXZZK3",
                "type": "function",
                "function": {
                    "name": "split_chinese_text",
                    "arguments": "{\"zh-CN\": \"请问最近的车站怎么走?\"}"
                }
            },
            {
                "id": "call_bzatO4BZ4dhLwD4PVjIYDcCz",
                "type": "function",
                "function": {
                    "name": "split_japanese_text",
                    "arguments": "{\"jp-JP\": \"最近の駅にはどうやって行きますか？\"}"
                }
            },
            {
                "id": "call_QVAjPv33bpXizBwxxm0v9wbU",
                "type": "function",
                "function": {
                    "name": "split_englist_text",
                    "arguments": "{\"en-US\": \"How do I get to the nearest station?\"}"
                }
            }
        ],
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 176,
        "completion_tokens": 96,
        "total_tokens": 272
    }
}
</code></pre></div></div>

<p>It really did answer with three separate function calls, which makes post-processing much easier.</p>
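<p>With one tool per language, a small dispatch table keeps the post-processing tidy. A sketch; note it reuses the <code class="language-plaintext highlighter-rouge">split_englist_text</code> spelling exactly as defined in the request above.</p>

```javascript
// Map each tool name to a handler; names must match the tools definition.
const handlers = {
  split_chinese_text: (args) => ({ lang: "zh-CN", text: args["zh-CN"] }),
  split_japanese_text: (args) => ({ lang: "jp-JP", text: args["jp-JP"] }),
  split_englist_text: (args) => ({ lang: "en-US", text: args["en-US"] }),
};

// Route every tool call to its handler; unknown names yield null.
function dispatchToolCalls(message) {
  return (message.tool_calls || []).map((call) => {
    const handler = handlers[call.function.name];
    return handler ? handler(JSON.parse(call.function.arguments)) : null;
  });
}
```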

<h4 id="情境使用function-calling只執行單一個工具">情境：使用Function Calling，只執行單一個工具</h4>

<p>Next, when you have many tools but want to force a specific one, change <code class="language-plaintext highlighter-rouge">tool_choice</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"tool_choice": {
    "type": "function", 
    "function": {
        "name": "split_chinese_text"
     }
}
</code></pre></div></div>

<p>The response, after 770 ms:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_1loM3SzD5aGvILsHYcPCT2bf",
                "type": "function",
                "function": {
                    "name": "split_chinese_text",
                    "arguments": "{\"zh-CN\":\"请问最近的车站怎么走?\"}"
                }
            }
        ],
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 187,
        "completion_tokens": 15,
        "total_tokens": 202
    }
}
</code></pre></div></div>

<h4 id="情境任務與function-calling無關時">情境：任務與Function Calling無關時。</h4>

<p>Finally, let's see what happens when the question has nothing at all to do with translation.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": "今天的台北好冷，你有沒有推薦的保暖方法？"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "split_text",
                "description": "Identify Chinese and other languages in text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "zh-CN": {
                            "type": "string",
                            "description": "Chinese reslut"
                        },
                        "jp-JP": {
                            "type": "string",
                            "description": "Japanese reslut"
                        },
                        "en-US": {
                            "type": "string",
                            "description": "English reslut"
                        }
                    },
                    "required": [
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}

</code></pre></div></div>

<p>The response, after 3920 ms:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": "在寒冷的台北，這裡有一些保暖的方法可以幫助你保持溫暖：\n\n1. **穿著層疊衣物**：多層次的穿著可以幫助保持體熱，內層可以選擇保暖內衣，中層穿著毛衣或厚襯衫，外層則選擇防風或防水的外套。\n\n2. **使用熱水袋**：在睡覺或坐在沙發上時，可以使用熱水袋來保暖，放在身邊或腳下，能感覺到很舒適。\n\n3. **喝熱飲**：熱茶、熱巧克力或湯品都能讓身體感覺更暖和，特別是生薑茶，還有助於驅寒。\n\n4. **保持活動**：適當的運動可以促進血液循環，讓身體更溫暖，即使是在室內做一些簡單的伸展運動或瑜伽也是不錯的選擇。\n\n5. **使用厚毛毯**：在家中有一條厚毛毯可以隨時裹住自己，特別是在看電視或閱讀時，能有效保暖。\n\n6. **窗戶密封**：檢查窗戶是否有漏風的地方，可以使用窩心的窗簾或窗戶密封條來減少冷空氣的進入。\n\n希望這些方法能讓你在寒冷的天氣中感覺更舒適！",
        "refusal": null
    },
    "usage": {
        "prompt_tokens": 86,
        "completion_tokens": 353,
        "total_tokens": 439
    }
}
</code></pre></div></div>

<p>It simply ignores your tool setup, because the question no longer has anything to do with text identification.</p>
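<p>So a robust client should handle both response shapes, <code class="language-plaintext highlighter-rouge">tool_calls</code> and plain <code class="language-plaintext highlighter-rouge">content</code>, before assuming structured output (a sketch):</p>

```javascript
// Classify an assistant message: structured tool calls, plain text,
// or neither (e.g. a refusal with empty content).
function interpretResponse(message) {
  if (message.tool_calls && message.tool_calls.length > 0) {
    return { kind: "tool", calls: message.tool_calls };
  }
  if (typeof message.content === "string" && message.content.length > 0) {
    return { kind: "text", text: message.content };
  }
  return { kind: "empty" };
}
```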

<h4 id="情境指定輸出的文字">情境：指定輸出的文字</h4>

<p>OpenAI's docs give a fun example: a function that sets a t-shirt size. Without a constraint, the function might come back with free-text sizes like "largest", "very large", or "extra large".</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": "我剛剛去店裡買的衣服太大了，請問能換貨嗎？"
        },
        {
            "role": "assistant",
            "content": "請問你買的是什麼尺吋?"
        },
        {
            "role": "user",
            "content": "吊牌不見了，但我記得是最大的"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "pick_tshirt_size",
                "description": "Call this if the user specifies which size t-shirt they want",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "size": {
                            "type": "string",
                            "enum": ["S", "M", "L", "XL"],
                            "description": "The size of the t-shirt that the user would like to order"
                        }
                    },
                    "required": ["size"],
                    "additionalProperties": false
                }
            }
        }
    ],
    "tool_choice": "auto"
}
</code></pre></div></div>

<p>The response with the enum:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_v0NLwZPkfhCoZWDUIW80HlY6",
                "type": "function",
                "function": {
                    "name": "pick_tshirt_size",
                    "arguments": "{\"size\":\"XL\"}"
                }
            }
        ],
        "refusal": null
    }
}
</code></pre></div></div>

<p>The response without the enum:</p>

<p><code class="language-plaintext highlighter-rouge">笑死</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_BiojkLcAsOUQ6ImBHOYex4JF",
                "type": "function",
                "function": {
                    "name": "pick_tshirt_size",
                    "arguments": "{\"size\":\"最大\"}"
                }
            }
        ],
        "refusal": null
    }
}
</code></pre></div></div>
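<p>Even with the enum, and certainly without it, it is worth validating the returned value yourself before acting on it (a sketch):</p>

```javascript
// The allowed sizes, mirroring the enum in the tool definition above.
const SIZES = ["S", "M", "L", "XL"];

// Parse the size argument and reject anything outside the enum,
// e.g. the free-text "最大" seen in the no-enum response.
function parseSizeArgument(call) {
  const args = JSON.parse(call.function.arguments);
  return SIZES.includes(args.size) ? args.size : null;
}
```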

<h4 id="總結">總結</h4>

<p>Function Calling genuinely makes AI answers much easier to process in code, but you must still add the necessary exception handling to guard against unexpected errors.</p>

<p>The experiments surfaced two things to watch out for:</p>

<ol>
  <li>
    <p>Function Calling does drive token counts up. It may not matter at first, but over multi-turn conversations the cost growth is worth watching.</p>
  </li>
  <li>
    <p>Function Calling lengthens response times, from roughly 800 ms to anywhere between 1200 and 1700 ms. Factor this in for real-time applications.</p>
  </li>
</ol>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[一起來用 LLM Function Calling]]></summary></entry><entry><title type="html">使用思維鏈，提高問題回答的正確率</title><link href="/chatgpt/llm-o1-like-reasoning-chains/" rel="alternate" type="text/html" title="使用思維鏈，提高問題回答的正確率" /><published>2024-09-19T02:42:00+00:00</published><updated>2024-09-19T02:42:00+00:00</updated><id>/chatgpt/llm-o1-like-reasoning-chains</id><content type="html" xml:base="/chatgpt/llm-o1-like-reasoning-chains/"><![CDATA[<p><a href="https://github.com/bklieger-groq/g1">Source (bklieger-groq/g1)</a></p>

<p>The original is a Python example; I rewrote it in JavaScript and pulled out the key parts.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const messages = [
    {
        role: "system",
        content: `你是一個專業的 AI 助手，會一步一步解釋你的推理過程。對每一步，請提供一個描述你在該步驟中所做工作的標題，並附上內容。決定你是否需要再進行一步，或是已經準備好給出最終答案。請在句尾回應'next_action'（'continue' 或 'final_answer'）。盡可能使用多的推理步驟，至少 3 步。意識到你作為大型語言模型（LLM）的限制，以及你能做和不能做的事情。在你的推理過程中，包含對替代答案的探索。考慮到你可能是錯的，如果你的推理有錯，錯在哪裡。徹底測試所有其他可能性。你可能是錯的。當你說你正在重新檢視時，實際上要重新檢視，並使用另一種方法來進行。不要只是說你正在重新檢視。使用至少 3 種方法來推導答案。使用最佳實踐。`,
    },
    {
        role: "user",
        content: prompt, // the user's question goes here
    },
    {
        role: "assistant",
        content: "謝謝！我現在將按照我的指示一步一步思考，從分解問題後的開始部分開始。",
    },
]

</code></pre></div></div>

<p>The system prompt above has the AI verify its own answer using three different methods.</p>

<p>After the system prompt, we ask the question and let the AI start working through its answer:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// 思維鏈程式碼

let stepCount = 1;
let totalThinkingTime = 0;

while (true) {
    const startTime = Date.now();
    const stepData = await makeApiCall(messages, 300); // one chat-completion request, capped at 300 tokens
    
    const endTime = Date.now();
    const thinkingTime = (endTime - startTime) / 1000;
    totalThinkingTime += thinkingTime;

    messages.push({ role: "assistant", content: stepData });

    if (stepData.indexOf("final_answer") &gt;= 0 || stepCount &gt; 3) {
        break;
    }
    
    stepCount += 1;
}

</code></pre></div></div>

<p>The whole logic is actually simple: keep feeding the AI's answer back for it to reflect on, until it decides the answer is correct.</p>
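<p>The loop above depends on a <code class="language-plaintext highlighter-rouge">makeApiCall</code> helper that wraps one chat-completion request. To show the termination logic on its own, here is the same loop with the API call injected, exercised with a stub; the stub stands in for a real request and is not part of the original snippet.</p>

```javascript
// Same reasoning loop as above, with makeApiCall passed in so the
// termination condition can be tested without a real API.
async function runReasoningChain(makeApiCall, messages, maxSteps = 3) {
  let stepCount = 1;
  while (true) {
    const stepData = await makeApiCall(messages, 300);
    messages.push({ role: "assistant", content: stepData });
    // Stop on "final_answer" or after the step budget is exhausted.
    if (stepData.indexOf("final_answer") >= 0 || stepCount > maxSteps) break;
    stepCount += 1;
  }
  return { stepCount, messages };
}
```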

<p>On the two questions below, popular lately for judging AI, this chain-of-thought approach gives <code class="language-plaintext highlighter-rouge">llama-3.1-8b</code>, <code class="language-plaintext highlighter-rouge">gemma2-9b-it</code>, and even <code class="language-plaintext highlighter-rouge">gemma-7b-it</code> a decent accuracy rate.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>`strawberry`單字中裡有幾個`r`
</code></pre></div></div>

<p>and</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3.11 與 3.8 這兩個數字誰比較大？
</code></pre></div></div>

<p>Below is the returned conversation log:</p>

<p><strong>Step 1: Understand the problem</strong></p>

<p>I need to count how many <code class="language-plaintext highlighter-rouge">r</code>s the word <code class="language-plaintext highlighter-rouge">strawberry</code> contains. The question is simple, but I will think it through step by step.</p>

<p><strong>Step 2: Break the string down</strong></p>

<p>I break <code class="language-plaintext highlighter-rouge">strawberry</code> into individual letters: s-t-r-a-w-b-e-r-r-y. Now I can start counting the <code class="language-plaintext highlighter-rouge">r</code>s.</p>

<p><strong>Step 3: Count the <code class="language-plaintext highlighter-rouge">r</code>s</strong></p>

<p>From the broken-down string I can see there are 3 <code class="language-plaintext highlighter-rouge">r</code>s. I can therefore conclude: the word <code class="language-plaintext highlighter-rouge">strawberry</code> contains 3 <code class="language-plaintext highlighter-rouge">r</code>s.</p>

<p><strong>Step 4: Check alternative answers</strong></p>

<p>I check whether any other answer is possible, for example whether I skipped an <code class="language-plaintext highlighter-rouge">r</code> or miscounted. Based on my count, however, the answer is 3.</p>

<p><strong>Step 5: Confirm the answer</strong></p>

<p>I verify that my answer is correct. Based on my count, the answer is 3, so I conclude: the word <code class="language-plaintext highlighter-rouge">strawberry</code> contains 3 <code class="language-plaintext highlighter-rouge">r</code>s.</p>

<p><strong>Conclusion</strong></p>

<p>Based on my count, I conclude: the word <code class="language-plaintext highlighter-rouge">strawberry</code> contains 3 <code class="language-plaintext highlighter-rouge">r</code>s.</p>

<p><strong>The final answer</strong></p>

<p><code class="language-plaintext highlighter-rouge">strawberry</code> 這個單字中有 3 個 <code class="language-plaintext highlighter-rouge">r</code>。</p>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[參考來源]]></summary></entry><entry><title type="html">GCP Cloud Storage上傳相同檔名時，強制取代</title><link href="/chatgpt/gcp-upload-file-if-match/" rel="alternate" type="text/html" title="GCP Cloud Storage上傳相同檔名時，強制取代" /><published>2024-05-06T02:42:00+00:00</published><updated>2024-05-06T02:42:00+00:00</updated><id>/chatgpt/gcp-upload-file-if-match</id><content type="html" xml:base="/chatgpt/gcp-upload-file-if-match/"><![CDATA[<h2 id="網址">網址</h2>
<p><a href="https://cloud.google.com/storage/docs/uploading-objects?hl=zh-cn#storage-upload-object-nodejs">官方文件</a></p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">options</span> <span class="o">=</span> <span class="p">{</span>
    <span class="na">destination</span><span class="p">:</span> <span class="nx">destFileName</span><span class="p">,</span>
    <span class="c1">// Optional:</span>
    <span class="c1">// Set a generation-match precondition to avoid potential race conditions</span>
    <span class="c1">// and data corruptions. The request to upload is aborted if the object's</span>
    <span class="c1">// generation number does not match your precondition. For a destination</span>
    <span class="c1">// object that does not yet exist, set the ifGenerationMatch precondition to 0</span>
    <span class="c1">// If the destination object already exists in your bucket, set instead a</span>
    <span class="c1">// generation-match precondition using its generation number.</span>
    <span class="c1">// 可選：</span>
    <span class="c1">// 如果目标對象的世代号與您的前提條件不匹配，上傳請求将被終止。如果目标對象已存在于您的數據桶中，則使用其世代号設置世代匹配前提條件。</span>
    <span class="na">preconditionOpts</span><span class="p">:</span> <span class="p">{</span><span class="na">ifGenerationMatch</span><span class="p">:</span> <span class="nx">generationMatchPrecondition</span><span class="p">},</span>
  <span class="p">};</span>

</code></pre></div></div>

<h2 id="問題">問題</h2>

<p>If you upload a file with the same name as an existing object, the request fails, because the generation number in your precondition was not updated:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"code"</span><span class="p">:</span><span class="mi">412</span><span class="p">,</span><span class="nl">"message"</span><span class="p">:</span><span class="s2">"At least one of the pre-conditions you specified did not hold."</span><span class="p">,</span><span class="nl">"errors"</span><span class="p">:[{</span><span class="nl">"message"</span><span class="p">:</span><span class="s2">"At least one of the pre-conditions you specified did not hold."</span><span class="p">,</span><span class="nl">"domain"</span><span class="p">:</span><span class="s2">"global"</span><span class="p">,</span><span class="nl">"reason"</span><span class="p">:</span><span class="s2">"conditionNotMet"</span><span class="p">,</span><span class="nl">"locationType"</span><span class="p">:</span><span class="s2">"header"</span><span class="p">,</span><span class="nl">"location"</span><span class="p">:</span><span class="s2">"If-Match"</span><span class="p">}]}</span><span class="w">
</span></code></pre></div></div>

<h2 id="解決辦法">解決辦法</h2>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="w">
</span><span class="err">preconditionOpts:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"ifMatch"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="err">//相同檔名則取代</span><span class="w">
</span><span class="p">}</span><span class="w">

</span></code></pre></div></div>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[網址 官方文件]]></summary></entry><entry><title type="html">GPT-4o vs Gemini 1.5 flash</title><link href="/chatgpt/gpt-4o-vs-gemini-15-flash/" rel="alternate" type="text/html" title="GPT-4o vs Gemini 1.5 flash" /><published>2024-05-06T02:42:00+00:00</published><updated>2024-05-06T02:42:00+00:00</updated><id>/chatgpt/gpt-4o-vs-gemini-15-flash</id><content type="html" xml:base="/chatgpt/gpt-4o-vs-gemini-15-flash/"><![CDATA[<h2 id="兩者官網">Official pages</h2>

<p><a href="https://platform.openai.com/docs/api-reference/chat/create">GPT-4o</a></p>

<p><a href="https://ai.google.dev/gemini-api/docs?hl=zh-tw">Gemini 1.5 Flash</a></p>

<h2 id="成本">成本</h2>

<table>
  <tbody>
    <tr>
      <td>GPT-4o</td>
      <td>Input: $5 / 1M tokens</td>
      <td>Output: $15 / 1M tokens</td>
    </tr>
  </tbody>
</table>

<table>
  <tbody>
    <tr>
      <td>Gemini 1.5 Flash</td>
      <td>Input: $0.25 / 1M tokens</td>
      <td>Output: $0.75 / 1M tokens</td>
    </tr>
  </tbody>
</table>

<p>On cost alone, Gemini 1.5 Flash has a truly massive advantage.</p>

<h2 id="比較">比較</h2>

<h3 id="第一題-菜單翻譯">第一題 菜單翻譯</h3>

<p>Prompt:</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-01.png" width="60%" /></p>

<p>Results:</p>

<p>GPT-4o：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-02.png" width="60%" /></p>

<p>Gemini 1.5 Flash：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-03.png" width="60%" /></p>

<p>GPT-4o looks fine at first glance, but a fair amount of the content went untranslated, there are hallucinated menu items, and the gap between the pre-tax and post-tax prices is far too large!</p>

<p>Gemini 1.5 Flash barely translated anything… but it did judge the amounts correctly, and the text recognition is all in place. You would need a second pass to make it complete.</p>

<h3 id="第二題-我是誰">第二題 我是誰</h3>

<p>Prompt:</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-04.png" width="60%" /></p>

<p>GPT-4o：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-05.png" width="60%" /></p>

<p>Gemini 1.5 Flash：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-06.png" width="60%" /></p>

<p>Both got the answer right, but GPT-4o's response is more complete.</p>

<h3 id="第三題-我是誰">第三題 我是誰</h3>

<p>Prompt:</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-07.png" width="60%" /></p>

<p>GPT-4o：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-08.png" width="60%" /></p>

<p>Gemini 1.5 Flash：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-09.png" width="60%" /></p>

<p>For Machamp, both were right again, but GPT-4o's answer is more complete.</p>

<h3 id="第四題-我是誰">第四題 我是誰</h3>

<p>Prompt:</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-10.png" width="60%" /></p>

<p>GPT-4o：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-11.png" width="60%" /></p>

<p>Gemini 1.5 Flash：</p>

<p><img src="/assets/img/2024-05-16-gpt-4o-vs-gemini-15-flash-12.png" width="60%" /></p>

<p>For this meme, GPT-4o's answer is quite good. Gemini isn't bad either, but it never actually named Electrode.</p>

<p>Some friends say the real answer is a Jigglypuff seen from above, or a Ditto. Ha!</p>

<h2 id="結論">結論</h2>

<p>GPT-4o and Gemini 1.5 Flash are both multimodal large language models, and both respond quite quickly.</p>

<p>But the cost and output quality make clear that they are different products; choose based on your use case.</p>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[兩者官網]]></summary></entry><entry><title type="html">LLM Price Check</title><link href="/chatgpt/how-to-llm-price-check/" rel="alternate" type="text/html" title="LLM Price Check" /><published>2024-05-04T02:42:00+00:00</published><updated>2024-05-04T02:42:00+00:00</updated><id>/chatgpt/how-to-llm-price-check</id><content type="html" xml:base="/chatgpt/how-to-llm-price-check/"><![CDATA[<h2 id="網址">Link</h2>
<p><a href="https://llmpricecheck.com/">LLM Price Check</a></p>

<p><img src="/assets/img/2024-05-04-how-to-llm-price-check-01.png" width="60%" /></p>

<h2 id="介紹">介紹</h2>
<p>Beyond the commercial large models from OpenAI, Google, and Anthropic, more and more SaaS companies now serve various open-source large models via API, at very competitive prices.</p>

<p>The list includes Groq, which is getting popular in Taiwan, as well as Perplexity, famous for its AI search.</p>

<p>llama3-8b and gemma-7b, roughly on par with GPT-3.5, cost only about 30% of GPT-3.5's price.</p>

<p><img src="/assets/img/2024-05-04-how-to-llm-price-check-02.png" width="60%" /></p>

<p>Another feature I use often is sorting by Quality,</p>

<p>so you can still find a platform that fits your budget while holding a quality bar.</p>

<p><img src="/assets/img/2024-05-04-how-to-llm-price-check-03.png" width="60%" /></p>]]></content><author><name></name></author><category term="chatgpt" /><summary type="html"><![CDATA[網址 LLM Price Check]]></summary></entry></feed>