AI

Chat Completion

https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}
curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2023-05-15 \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR_API_KEY" \
  -d '{"messages":[{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},{"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},{"role": "user", "content": "Do other Azure AI services support this too?"}]}'
{
    "id":"chatcmpl-6v7mkQj980V1yBec6ETrKPRqFjNw9",
    "object":"chat.completion",
    "created":1679072642,
    "model":"gpt-35-turbo",
    "usage":{"prompt_tokens":58,
    "completion_tokens":68,
    "total_tokens":126},
    "choices":[
     {
        "message":{"role":"assistant","content":"Yes, other Azure AI services also support customer managed keys. Azure AI services offer multiple options for customers to manage keys, such as using Azure Key Vault, customer-managed keys in Azure Key Vault or customer-managed keys through Azure Storage service. This helps customers ensure that their data is secure and access to their services is controlled."},
        "finish_reason":"stop",
        "index":0
     }]
}

Parameters

Messages

  • Each message includes a role and content

  • Role: Indicates who is giving the current message. Can be system, user, assistant, tool, or function.

  • Content: The text of the message, such as the user's question or the assistant's answer

Temperature

  • What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Frequency penalty & Presence penalty

  • Frequency_penalty: Discourages the model from repeating the same words or phrases too frequently within the generated text. A higher frequency_penalty value results in less repetition.

  • Presence_penalty: Encourages the model to introduce tokens that have not yet appeared in the generated text, so the output is more likely to move on to new topics or content. A higher presence_penalty value makes this effect stronger.

Max Tokens

  • The maximum number of tokens allowed for the generated answer. By default, the number of tokens the model can return is (4096 - prompt tokens).
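
As a rough illustration, the sketch below shows how the messages array and the parameters above (temperature, frequency_penalty, presence_penalty, max_tokens) are passed in a single chat completion request with the Python SDK; the endpoint, deployment name and API version are placeholders, not real values.

import os
from openai import AzureOpenAI  # requires openai >= 1.x

# Placeholders for illustration: use your own resource name, key and api-version
client = AzureOpenAI(
    azure_endpoint="https://YOUR_RESOURCE_NAME.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2023-05-15",
)

response = client.chat.completions.create(
    model="YOUR_DEPLOYMENT_NAME",           # Azure deployment name, not the base model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
    ],
    temperature=0.2,         # lower temperature -> more focused, deterministic output
    max_tokens=200,          # cap on tokens in the generated answer
    frequency_penalty=0.5,   # discourage repeating the same words or phrases
    presence_penalty=0.5,    # encourage introducing new tokens / topics
)
print(response.choices[0].message.content)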

Tools

  • For function calling, here is an example of the tools object:

tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]

Embedding

Overview

  • An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.

  • Commonly used for:

    • Search (where results are ranked by relevance to a query string)

    • Clustering (where text strings are grouped by similarity)

    • Recommendations (where items with related text strings are recommended)

    • Anomaly detection (where outliers with little relatedness are identified)

    • Diversity measurement (where similarity distributions are analyzed)

    • Classification (where text strings are classified by their most similar label)

Example

curl https://api.openai.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "text-embedding-ada-002"
  }'
{
  "data": [
    {
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ...
        -4.547132266452536e-05,
        -0.024047505110502243
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002",
  "object": "list",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
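
To show how embedding distances are used in practice, here is a minimal search sketch with the Python SDK: it ranks a few documents by cosine similarity to a query. The document texts and the query are made up for illustration.

from openai import OpenAI

client = OpenAI()

def embed(text):
    """Return the embedding vector for a single string."""
    return client.embeddings.create(model="text-embedding-ada-002", input=text).data[0].embedding

def cosine_similarity(a, b):
    """Higher value = smaller angle between the vectors = higher relatedness."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

documents = [
    "How to rotate customer managed keys",
    "Weather forecast for Glasgow",
    "Azure Key Vault pricing tiers",
]
query = "customer managed keys"

query_vec = embed(query)
# Rank documents by relevance to the query (the search use case above)
ranked = sorted(documents, key=lambda d: cosine_similarity(embed(d), query_vec), reverse=True)
print(ranked)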

Function Call

Overview

  • You describe each function's name, description and required parameters, and provide these definitions to GPT

  • Based on the user's question, GPT decides which function should be called and returns the arguments for it

  • You then use those arguments to call your own function, and pass its result back to GPT

  • Finally, GPT outputs the answer; this flow involves multiple completion calls

Example

from openai import OpenAI
import json

client = OpenAI()

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": unit})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": unit})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": unit})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

def run_conversation():
    # Step 1: send the conversation and available functions to the model
    messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is default, but we'll be explicit
    )
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    # Step 2: check if the model wanted to call a function
    if tool_calls:
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "get_current_weather": get_current_weather,
        }  # only one function in this example, but you can have multiple
        messages.append(response_message)  # extend conversation with assistant's reply
        # Step 4: send the info for each function call and function response to the model
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                location=function_args.get("location"),
                unit=function_args.get("unit"),
            )
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )  # extend conversation with function response
        second_response = client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=messages,
        )  # get a new response from the model where it can see the function response
        return second_response
print(run_conversation())
// user asks a question
{"role": "user", "content": "What's the weather like today"}
// gpt asks for clarification
{
    'role': 'assistant',
    'content': 'Sure, I can help you with that. Could you please tell me the city and state you are in or the location you want to know the weather for?'
}
// user replies with the location
{"role": "user", "content": "I'm in Glasgow, Scotland."}
// gpt detects which function should be called and returns the arguments based on the question
{
    'role': 'assistant',
    'content': None,
    'tool_calls': [{'id': 'call_o7uyztQLeVIoRdjcDkDJY3ni',
    'type': 'function',
    'function': {'name': 'get_current_weather',
    'arguments': '{\n  "location": "Glasgow, Scotland",\n  "unit": "celsius"\n}'}}]
}
// Call the weather API, which reports a temperature of 22
// After calling the third-party data source, call gpt again with its answer
{
    "role": "tool",
    "tool_call_id": "call_o7uyztQLeVIoRdjcDkDJY3ni",
    "name": "get_current_weather",
    "content": "{\"temperature\": \"22\", \"unit\": \"celsius\", \"description\": \"Sunny\"}"
}
// gpt returns the final answer
{
    "role": "assistant",
    "content": "The weather in Glasgow is currently sunny with a temperature of 22 degrees Celsius."
}

LangChain

  • LangChain is a framework that facilitates the creation of applications using language models.

  • It provides different components (e.g. LLM models and embeddings) that allow non-AI experts to integrate existing AI language models into their applications

  • It makes it easy for developers to build complex chains from these components

  • Chains are the fundamental principle that holds various AI components in LangChain to provide context-aware responses. A chain is a series of automated actions from the user's query to the model's output. For example, developers can use a chain for:

    • Connecting to different data sources.

    • Generating unique content.

    • Translating multiple languages.

    • Answering user queries.

  • Chains are made of links. Each action that developers string together to form a chained sequence is called a link. With links, developers can divide complex tasks into multiple, smaller tasks. Examples of links include:

    • Formatting user input.

    • Sending a query to an LLM.

    • Retrieving data from cloud storage.

    • Translating from one language to another.
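
As a rough sketch of a chain built from links, the example below pipes a prompt-formatting link into an LLM link and an output parser using LangChain's expression language. The model name and prompt are assumptions for illustration; it requires the langchain-openai and langchain-core packages.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Link 1: format the user input into a chat prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that answers briefly."),
    ("user", "{question}"),
])
# Link 2: send the formatted prompt to an LLM
llm = ChatOpenAI(model="gpt-3.5-turbo")
# Link 3: parse the model output into a plain string
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"question": "Does Azure OpenAI support customer managed keys?"}))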

Customization Methodology

Fine-tuning

  • Fine-tuning entails techniques to further train a model whose weights have already been updated through prior training. Using the base model’s previous knowledge as a starting point, fine-tuning tailors the model by training it on a smaller, task-specific dataset.
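
A minimal sketch of starting a fine-tuning job with the OpenAI Python SDK is shown below; the training file name and base model are assumptions for illustration.

from openai import OpenAI

client = OpenAI()

# Upload a JSONL file of chat-formatted training examples (hypothetical file name)
training_file = client.files.create(file=open("training_data.jsonl", "rb"), purpose="fine-tune")

# Further train the base model's existing weights on the smaller, task-specific dataset
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id, job.status)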

Prompt engineering
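
  • Prompt engineering steers the model's behaviour without changing its weights, by crafting the instructions, context and examples sent with each request (e.g. a system message, few-shot examples, or output-format requirements).

As a rough illustration (the product name and example answers are made up), a few-shot prompt might look like:

messages = [
    {"role": "system", "content": "You are a support agent. Answer in one sentence and mention the product name."},
    # Few-shot example showing the desired style
    {"role": "user", "content": "Can I export my data?"},
    {"role": "assistant", "content": "Yes, Contoso CRM lets you export all your data as CSV from Settings > Export."},
    # The real question
    {"role": "user", "content": "Can I change my billing currency?"},
]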

Retrieval-Augmented Generation (RAG)

  • Retrieval-augmented generation (RAG) is an AI framework for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM’s internal representation of information.

  • First, fetch the related content, e.g. via a function call or a similarity search over embeddings (Retrieve)

  • Then, add the retrieved result to the prompt (Augment)

  • Finally, the model returns an answer grounded in that content (Generate); a sketch of the flow follows this list
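
A minimal sketch of this retrieve-augment-generate flow with the Python SDK, assuming a tiny in-memory list of text chunks (the chunk texts and the question are made up):

from openai import OpenAI

client = OpenAI()

def embed(text):
    return client.embeddings.create(model="text-embedding-ada-002", input=text).data[0].embedding

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(x * x for x in b) ** 0.5))

chunks = [
    "Azure OpenAI supports customer managed keys through Azure Key Vault.",
    "LangChain chains are built from links such as prompts and LLM calls.",
]
question = "How are customer managed keys handled?"

# (R) Retrieve: pick the chunk most related to the question
question_vec = embed(question)
best_chunk = max(chunks, key=lambda c: cosine_similarity(embed(c), question_vec))

# (A) Augment: add the retrieved content to the prompt
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {question}"

# (G) Generate: the model returns an answer grounded in the retrieved content
answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(answer.choices[0].message.content)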
