AI

Chat Completion

https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}
curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2023-05-15 \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR_API_KEY" \
  -d '{"messages":[{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},{"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},{"role": "user", "content": "Do other Azure AI services support this too?"}]}'
{
    "id":"chatcmpl-6v7mkQj980V1yBec6ETrKPRqFjNw9",
    "object":"chat.completion",
    "created":1679072642,
    "model":"gpt-35-turbo",
    "usage":{"prompt_tokens":58,
    "completion_tokens":68,
    "total_tokens":126},
    "choices":[
     {
        "message":{"role":"assistant","content":"Yes, other Azure AI services also support customer managed keys. Azure AI services offer multiple options for customers to manage keys, such as using Azure Key Vault, customer-managed keys in Azure Key Vault or customer-managed keys through Azure Storage service. This helps customers ensure that their data is secure and access to their services is controlled."},
        "finish_reason":"stop",
        "index":0
     }]
}

Parameters

Messages

  • Each message includes a role and content

  • Role: Indicates the author of the message. Can be system, user, assistant, tool, or function.

  • Content: The text of the message, e.g. the user's question or the assistant's answer
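
As a small sketch, the messages list from the curl example above can be built in Python before being serialized into the request body:

```python
# Build the messages list for a chat completion request.
# Each message carries a role (system, user, assistant, tool, or function)
# and a content string.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
    {"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
    {"role": "user", "content": "Do other Azure AI services support this too?"},
]

# Every entry must carry both keys.
assert all({"role", "content"} <= m.keys() for m in messages)
```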

Temperature

  • What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Frequency penalty & Presence penalty

  • Frequency_penalty: Discourages the model from repeating the same words or phrases too frequently within the generated text. A higher frequency_penalty value makes tokens that have already appeared often less likely to be repeated.

  • Presence_penalty: Encourages the model to introduce tokens that have not yet appeared in the generated text, steering the output toward new topics or content. A higher presence_penalty value makes the model more likely to move on to topics it has not mentioned yet.
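
A request body combining these sampling parameters might look like the following sketch (the values are purely illustrative):

```python
import json

# Illustrative request body: temperature controls randomness (0-2),
# frequency_penalty discourages repeating frequent tokens,
# presence_penalty encourages introducing new tokens/topics.
payload = {
    "messages": [{"role": "user", "content": "Summarize Azure OpenAI in one line."}],
    "temperature": 0.2,        # low value: focused, more deterministic
    "frequency_penalty": 0.5,  # penalize frequent repetition
    "presence_penalty": 0.5,   # encourage new topics
}
body = json.dumps(payload)  # this JSON string is what -d sends in the curl example
```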

Max tokens

  • The maximum number of tokens allowed for the generated answer. By default, the number of tokens the model can return will be (4096 - prompt tokens).
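
For a model with a 4096-token context window, that default completion budget can be sketched as simple arithmetic:

```python
CONTEXT_WINDOW = 4096  # e.g. the gpt-35-turbo context window

def default_max_completion_tokens(prompt_tokens: int) -> int:
    """Tokens left for the generated answer: context window minus the prompt."""
    return CONTEXT_WINDOW - prompt_tokens

# With the 58 prompt tokens from the usage block in the response above:
print(default_max_completion_tokens(58))  # 4038
```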

Tools

  • For function calling, the tools parameter takes a list of function definitions, each with a name, a description, and a JSON schema of its parameters
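
A minimal sketch of such a tool object in the OpenAI tools format; the get_current_weather function and its parameters are hypothetical:

```python
# A hypothetical function definition in the tools format:
# name, description, and a JSON Schema describing the parameters.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",  # hypothetical function
            "description": "Get the current weather for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. Hong Kong"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]
```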

Embedding

Overview

  • An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.

  • Commonly used for:

    • Search (where results are ranked by relevance to a query string)

    • Clustering (where text strings are grouped by similarity)

    • Recommendations (where items with related text strings are recommended)

    • Anomaly detection (where outliers with little relatedness are identified)

    • Diversity measurement (where similarity distributions are analyzed)

    • Classification (where text strings are classified by their most similar label)

Example
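
As an illustration of measuring relatedness by distance, here is cosine similarity over two toy embedding vectors (the numbers are made up; real embeddings have on the order of a thousand dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 means identical direction, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings": related words point in similar directions.
cat = [0.8, 0.1, 0.1]
kitten = [0.75, 0.15, 0.1]
car = [0.1, 0.1, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```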

Function Call

Overview

  • You describe each function's name, description, and required parameters, and provide these definitions to GPT

  • Based on the user's question, GPT decides which function should be called and returns the arguments for it

  • You then call your own function with those arguments and send the return value back to GPT

  • Finally, GPT outputs the answer; the whole flow involves multiple completion calls

Example
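
A sketch of the dispatch step, assuming the model has returned a tool call naming the hypothetical get_current_weather function with JSON-encoded arguments (the tool_call shape below is simplified for illustration):

```python
import json

# Hypothetical local implementation of the function offered to the model.
def get_current_weather(city: str, unit: str = "celsius") -> str:
    return f"22 degrees {unit} in {city}"  # stub; a real app would call a weather API

AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

# Simplified shape of a tool call as returned by the model (illustrative).
tool_call = {
    "name": "get_current_weather",
    "arguments": '{"city": "Hong Kong", "unit": "celsius"}',
}

# Look up the named function and call it with the parsed arguments;
# the return value would then be sent back to the model in a follow-up request.
func = AVAILABLE_FUNCTIONS[tool_call["name"]]
result = func(**json.loads(tool_call["arguments"]))
print(result)  # 22 degrees celsius in Hong Kong
```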

Langchain

  • LangChain is a framework that facilitates the creation of applications using language models.

  • It provides components (e.g. LLM wrappers and embeddings) that allow non-AI experts to integrate existing AI language models into their applications

  • Developers can easily compose complex chains from these components

  • Chains are the fundamental principle that holds various AI components in LangChain to provide context-aware responses. A chain is a series of automated actions from the user's query to the model's output. For example, developers can use a chain for:

    • Connecting to different data sources.

    • Generating unique content.

    • Translating multiple languages.

    • Answering user queries.

  • Chains are made of links. Each action that developers string together to form a chained sequence is called a link. With links, developers can divide complex tasks into multiple, smaller tasks. Examples of links include:

    • Formatting user input.

    • Sending a query to an LLM.

    • Retrieving data from cloud storage.

    • Translating from one language to another.
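
The chain-of-links idea can be illustrated without LangChain itself by composing small functions, where each function plays the role of a link (format the input, then "send" it to a stubbed LLM):

```python
from functools import reduce

def chain(*links):
    """Compose links left-to-right into a single callable chain."""
    return lambda value: reduce(lambda acc, link: link(acc), links, value)

# Toy links: one formats the user input, the next stands in for an LLM call.
format_input = lambda q: f"Answer concisely: {q.strip()}"
fake_llm = lambda prompt: f"[model reply to: {prompt}]"

pipeline = chain(format_input, fake_llm)
print(pipeline("  What is LangChain?  "))
# [model reply to: Answer concisely: What is LangChain?]
```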

Customization Methodology

Fine-tuning

  • Fine-tuning entails techniques to further train a model whose weights have already been updated through prior training. Using the base model’s previous knowledge as a starting point, fine-tuning tailors the model by training it on a smaller, task-specific dataset.
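
Training data for fine-tuning a chat model is typically supplied as JSONL, one example conversation per line; the content below is illustrative:

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"}, {"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."}]}
```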

Prompt engineering

  • Crafting and refining the instructions and examples given to the model so that it produces the desired output, without changing the model's weights

Retrieval-Augmented Generation (RAG)

  • Retrieval-augmented generation (RAG) is an AI framework for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM’s internal representation of information.

  • First, fetch related content, e.g. via function calls or an embedding-based similarity search (the "R" in RAG)

  • Next, add the retrieved results to the prompt (augmentation)

  • Finally, the model returns a response grounded in that content
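
The three steps above can be sketched end-to-end, using a toy keyword-overlap retriever in place of a real embedding search (the documents and names are illustrative):

```python
import re

# Toy document store; a real system would rank by embedding similarity.
documents = [
    "Azure OpenAI supports customer managed keys via Azure Key Vault.",
    "LangChain helps developers build chains from reusable components.",
    "Embeddings are vectors of floating point numbers.",
]

def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, docs, k: int = 1):
    """R: rank documents by naive keyword overlap with the query."""
    q_words = tokenize(query)
    scored = sorted(docs, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return scored[:k]

def build_prompt(query: str, context) -> str:
    """A: augment the prompt with the retrieved context."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

query = "What are embeddings?"
prompt = build_prompt(query, retrieve(query, documents))
# G: `prompt` would now be sent to the model for generation.
print(prompt)
```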
