Large Language Model

Lab 3: Code-Along

Agenda

  1. Load Libraries and Import Data

  2. Access Model through API

  3. Load Model

  4. Tune Model with Prompt Refinement

    • Zero shot

    • One shot and chain of thought

Load Libraries

  • openai

  • backoff

!pip install openai
!pip install backoff
Both packages were already installed in this environment (openai 1.35.15, backoff 2.2.1).
import openai
import backoff
import time
import pandas as pd
from tqdm import tqdm

Import Data

DATA_FILENAME = 'ml literacy.csv'
df = pd.read_csv(DATA_FILENAME, encoding='utf-8')
token_usage = 0 #This initializes a variable token_usage to keep track of the total number of tokens used during the process.
print(df)
                                            response   ml literacy
0               machines taught to do what humans do        Novice
1  I don't know what it means but if I had to gue...  Intermediate
2  A machine/computer's capacity to learn and rec...      Advanced
3  I'm assuming it means that a machine can learn...        Novice
4      I guess it might mean learning from machines.        Novice
5  Machine learning is artificial intelligence. I...  Intermediate
6  The process of machines learning to mimic cert...  Intermediate
7  Machine Learning is where a machine is taught ...      Advanced
8  Machine Learning is when AI uses a set of data...      Advanced
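Before spending API tokens, it can help to confirm that the file loaded as expected. This is a minimal sketch that assumes the two column names and the three capitalized labels shown in the printout above. `check_literacy_data` is a hypothetical helper and is not part of the lab.

```python
import pandas as pd

# Assumed schema, taken from the printed DataFrame above (hypothetical constants).
EXPECTED_COLUMNS = ["response", "ml literacy"]
VALID_LEVELS = {"Novice", "Intermediate", "Advanced"}


def check_literacy_data(frame: pd.DataFrame) -> pd.Series:
    """Fail fast on schema problems, then return the count of each label."""
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    # An empty response would be sent to the model as the string 'nan'.
    if frame["response"].isna().any():
        raise ValueError("Some rows have no student response.")

    unexpected = set(frame["ml literacy"].dropna()) - VALID_LEVELS
    if unexpected:
        raise ValueError(f"Unexpected labels: {sorted(unexpected)}")

    return frame["ml literacy"].value_counts()


# Usage with the notebook's DataFrame:
# print(check_literacy_data(df))
```

For the nine rows printed above, each of the three levels appears three times, so the human labels are balanced.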

Access Model through API

Get your own OpenAI API key (for a walkthrough, see the guide "How to get your ChatGPT API key (4 steps)").

import os
gpt_key = os.environ.get("OPENAI_API_KEY", "xxx")  # use your own key: set OPENAI_API_KEY or replace "xxx"

Create a client to interact with OpenAI:

from openai import OpenAI

client = OpenAI(api_key=gpt_key)

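Implement a backoff strategy to handle rate limits and transient issues, such as network timeouts or temporary server errors, when interacting with the GPT model. Each exception type that should trigger a retry has to be listed in the decorator:

@backoff.on_exception(
    backoff.expo,  # wait exponentially longer between attempts
    (openai.RateLimitError,        # rate limit exceeded
     openai.APITimeoutError,       # request timed out
     openai.APIConnectionError,    # network problem
     openai.InternalServerError),  # temporary 5xx server error
    max_tries=6,  # give up after 6 attempts instead of retrying forever
)
def completions_with_backoff(**kwargs):
    '''Call the Chat Completions API, retrying automatically on rate limits and transient errors.'''
    return client.chat.completions.create(**kwargs)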
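To show what exponential backoff with jitter does conceptually, here is a standard-library sketch. It is an illustration of the idea, not the `backoff` library's implementation. The function name and its parameters are hypothetical, and `sleep` can be swapped out so the logic can be exercised without actually waiting.

```python
import random
import time


def retry_with_expo_backoff(func, retry_on, max_tries=5, base=2.0, factor=1.0,
                            max_wait=60.0, sleep=time.sleep):
    """Call func(); on an exception listed in retry_on, wait and try again.

    Before retry k (k = 0, 1, 2, ...) the wait is drawn uniformly from
    [0, factor * base**k], capped at max_wait ("full jitter"), so many
    clients hitting the same limit do not all retry at the same moment.
    """
    for attempt in range(max_tries):
        try:
            return func()
        except retry_on:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(factor * base ** attempt, max_wait)
            sleep(random.uniform(0, ceiling))
```

Wrapping the lab's API call would look like `retry_with_expo_backoff(lambda: client.chat.completions.create(model=model, messages=messages), (openai.RateLimitError, openai.APITimeoutError))`. In practice the `backoff` decorator is more convenient, because it adds the same behavior to every call of the decorated function.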

Load Model

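gpt_org = "org-pF1Od41p8zEN8oeGTSxATXei"  # organization ID; not used below (pass organization=gpt_org to OpenAI() if needed)
gpt_host = "https://api.openai.com/v1"  # default API endpoint; not used below (pass base_url=gpt_host to OpenAI() to override)
gpt_model = "gpt-3.5-turbo"  # chat model used for every request
model = gpt_model
MAX_TOKENS = 100  # maximum number of tokens the model may generate per reply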

Tune Model with Prompt Refinement

Zero Shot

responses = [] #This initializes an empty list to store responses generated from the OpenAI GPT model.

for i, row in tqdm(df.iterrows(), total=len(df)):
    value = str(row['response']) #Retrieves the value of the 'response' column from the current row and converts it to a string.
    
    # Construct the prompt: a fixed prefix, the student's response, then the task instructions (postfix).
    prefix = "Based on the student's response provided in:"
    postfix = "evaluate and return only the student's machine learning literacy level. The assessment should categorize the student into one of the following three levels: novice, intermediate, or advanced."
    prompt = ' '.join([prefix, value, postfix])

    # Creates a list of messages containing the prompt.
    messages = [{"role": "user", "content": prompt}] 
    
    #Attempts to generate a completion using the completions_with_backoff function, passing the GPT model, messages, and maximum tokens as arguments.
    try:
        completion = completions_with_backoff(
            model=model,
            messages=messages,
            max_tokens=MAX_TOKENS
        )
    except openai.APIError as e:
        print('ERROR: while accessing the API.')
        print(f'Failed on item {i}.')
        print(e)
        print("Prompt:", prompt)
        raise  # re-raise the original exception with its traceback
    
    #Retrieves the response from the completion and appends it to the responses list.
    response = completion.choices[0].message.content
    responses.append(response)
    
    #Updates the token_usage counter with the total tokens used in the completion.
    token_usage += completion.usage.total_tokens

    # Need to wait to not exceed rate limit
    time.sleep(5)
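The same loop is reused later with only the prompt changed, so one option is to factor the prompt construction out. This is a minimal sketch that assumes a hypothetical `call_model` function, which takes the chat `messages` list and returns the reply text together with the call's token count. Passing it in as a parameter also makes it possible to test the loop without calling the API. `classify_responses` and `zero_shot_prompt` are illustrative names and are not part of the lab.

```python
import time
from typing import Callable, List, Tuple


def classify_responses(
    texts: List[str],
    build_prompt: Callable[[str], str],
    call_model: Callable[[list], Tuple[str, int]],
    pause_seconds: float = 0.0,
) -> Tuple[List[str], int]:
    """Send one prompt per student response; return (replies, total tokens)."""
    replies, tokens = [], 0
    for text in texts:
        messages = [{"role": "user", "content": build_prompt(str(text))}]
        reply, used = call_model(messages)
        replies.append(reply)
        tokens += used
        if pause_seconds:
            time.sleep(pause_seconds)  # crude rate limiting, as in the loop above
    return replies, tokens


def zero_shot_prompt(value: str) -> str:
    # Same prefix/postfix wording as the zero-shot loop above.
    prefix = "Based on the student's response provided in:"
    postfix = ("evaluate and return only the student's machine learning literacy level. "
               "The assessment should categorize the student into one of the following "
               "three levels: novice, intermediate, or advanced.")
    return " ".join([prefix, value, postfix])


# Usage sketch with the notebook's objects (completions_with_backoff, model,
# MAX_TOKENS and df are defined earlier in the lab):
# def call_openai(messages):
#     completion = completions_with_backoff(
#         model=model, messages=messages, max_tokens=MAX_TOKENS)
#     return completion.choices[0].message.content, completion.usage.total_tokens
#
# replies, used = classify_responses(
#     df["response"].tolist(), zero_shot_prompt, call_openai, pause_seconds=5)
# token_usage += used
```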

Check model output with zero shot

df['zeroshort'] = responses  # store the zero-shot replies next to the human labels
print(df)

                                            response   ml literacy  \
0               machines taught to do what humans do        Novice   
1  I don't know what it means but if I had to gue...  Intermediate   
2  A machine/computer's capacity to learn and rec...      Advanced   
3  I'm assuming it means that a machine can learn...        Novice   
4      I guess it might mean learning from machines.        Novice   
5  Machine learning is artificial intelligence. I...  Intermediate   
6  The process of machines learning to mimic cert...  Intermediate   
7  Machine Learning is where a machine is taught ...      Advanced   
8  Machine Learning is when AI uses a set of data...      Advanced   

                                           zeroshort  
0  Based on the student's response, the student d...  
1  Based on the student's response, I would categ...  
2  Based on the student's response, they demonstr...  
3                                       Intermediate  
4  Based on the student's response, they appear t...  
5  Based on the student's response, they demonstr...  
6  Based on the student's response, they demonstr...  
7  Based on the student's response, they demonstr...  
8  The student demonstrates an intermediate level...  
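Most of the zero-shot replies above are full sentences (for example "Based on the student's response, ...") rather than a bare label, even though the prompt asks for the level only. One way to make them comparable with the `ml literacy` column is to extract the level name. This is a minimal regex sketch that assumes a reply counts only when it names exactly one of the three levels. `extract_level` is a hypothetical helper.

```python
import re
from typing import Optional

# Whole-word, case-insensitive match for any of the three levels.
_LEVEL_RE = re.compile(r"\b(novice|intermediate|advanced)\b", re.IGNORECASE)


def extract_level(reply: str) -> Optional[str]:
    """Return 'Novice', 'Intermediate' or 'Advanced' if exactly one level is
    named in the reply, otherwise None (no level, or an ambiguous reply)."""
    found = {m.group(1).lower() for m in _LEVEL_RE.finditer(reply)}
    if len(found) != 1:
        return None
    return found.pop().capitalize()


# Usage (the new column name is hypothetical):
# df["zeroshot_level"] = df["zeroshort"].map(extract_level)
```

Returning `None` for replies that mention several levels (such as "between novice and intermediate") keeps ambiguous answers visible instead of silently choosing one.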

Tune Model with Prompt Refinement

One shot and chain of thought

responses = [] #This initializes an empty list to store responses generated from the OpenAI GPT model.

for i, row in tqdm(df.iterrows(), total=len(df)):
    value = str(row['response']) #Retrieves the value of the 'response' column from the current row and converts it to a string.
    
    # Build the prompt from a fixed prefix, the student's response, and a postfix assembled below.
    prefix = "Based on the student's response provided in:"
    
    # Define the base instructions
    instructions = (
        "Evaluate and return only the student's machine learning literacy level. "
        "The assessment should categorize the student into one of the following three levels: "
        "novice, intermediate, or advanced."
    )
    
    # Define the example and chain of thought for novice level
    novice_example = (
        "Novice: 'Machine learning is kind of intelligence where computers learn on their own.'"
    )
    chain_of_thought = (
        "Chain of Thought: In this example, the student's description of machine learning "
        "focuses on a broad, generalized understanding without delving into specifics about how "
        "machine learning algorithms work or are applied. The emphasis on 'intelligence' and "
        "'learning on their own' suggests a lack of detailed knowledge about the processes and "
        "techniques involved in machine learning, which is characteristic of a novice level of understanding."
    )
    
    # Define the reminder
    reminder = (
        "Remember, your task is to specify the literacy level as either "
        "novice, intermediate, or advanced without adding any additional commentary or explanation."
    )
    
    # Combine all parts into the final postfix message
    postfix = f"{instructions} To guide your evaluation, consider the following example and the associated chain of thought process: {novice_example} {chain_of_thought} {reminder}"
            
    prompt = ' '.join([prefix, value, postfix])

    # Creates a list of messages containing the prompt.
    messages = [{"role": "user", "content": prompt}] 
    
    #Attempts to generate a completion using the completions_with_backoff function, passing the GPT model, messages, and maximum tokens as arguments.
    try:
        completion = completions_with_backoff(
            model=model,
            messages=messages,
            max_tokens=MAX_TOKENS
        )
    except openai.APIError as e:
        print('ERROR: while accessing the API.')
        print(f'Failed on item {i}.')
        print(e)
        print("Prompt:", prompt)
        raise  # re-raise the original exception with its traceback
    
    #Retrieves the response from the completion and appends it to the responses list.
    response = completion.choices[0].message.content
    responses.append(response)
    
    #Updates the token_usage counter with the total tokens used in the completion.
    token_usage += completion.usage.total_tokens

    # Need to wait to not exceed rate limit
    time.sleep(5)
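The loop above packs the instructions, the example, and the reminder into a single user message. A common alternative with chat models is to give the worked example as an earlier user/assistant exchange, so the model sees the exact answer format it is expected to produce. This is a sketch of that layout, not the lab's method. The constant names and instruction wording are hypothetical, and this version omits the chain-of-thought text.

```python
from typing import Dict, List

# Hypothetical wording; the example response is the novice example used in the loop above.
SYSTEM_INSTRUCTIONS = (
    "You rate a student's machine learning literacy as novice, intermediate, "
    "or advanced. Reply with the level only."
)
EXAMPLE_RESPONSE = "Machine learning is kind of intelligence where computers learn on their own."
EXAMPLE_LABEL = "novice"


def one_shot_messages(student_response: str) -> List[Dict[str, str]]:
    """Build a chat `messages` list with one worked example before the real query."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTIONS},
        {"role": "user", "content": EXAMPLE_RESPONSE},
        {"role": "assistant", "content": EXAMPLE_LABEL},
        {"role": "user", "content": student_response},
    ]


# Usage inside the loop, replacing the single-message list:
# messages = one_shot_messages(value)
```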

Check model output with one shot and chain of thought

df['oneshot'] = responses  # store the one-shot replies in a new column
print(df)

                                            response   ml literacy  \
0               machines taught to do what humans do        Novice   
1  I don't know what it means but if I had to gue...  Intermediate   
2  A machine/computer's capacity to learn and rec...      Advanced   
3  I'm assuming it means that a machine can learn...        Novice   
4      I guess it might mean learning from machines.        Novice   
5  Machine learning is artificial intelligence. I...  Intermediate   
6  The process of machines learning to mimic cert...  Intermediate   
7  Machine Learning is where a machine is taught ...      Advanced   
8  Machine Learning is when AI uses a set of data...      Advanced   

                                           zeroshort       oneshot  
0  Based on the student's response, the student d...  Intermediate  
1  Based on the student's response, I would categ...  Intermediate  
2  Based on the student's response, they demonstr...  Intermediate  
3                                       Intermediate  Intermediate  
4  Based on the student's response, they appear t...  Intermediate  
5  Based on the student's response, they demonstr...  Intermediate  
6  Based on the student's response, they demonstr...  Intermediate  
7  Based on the student's response, they demonstr...  Intermediate  
8  The student demonstrates an intermediate level...  Intermediate
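With the predictions stored as columns, they can be scored against the human labels in `ml literacy`. In the table above, every one-shot reply is "Intermediate", so only the three rows labeled Intermediate match. This is a minimal pandas sketch. `score_predictions` is a hypothetical helper, and it compares labels case-insensitively, so free-form sentences such as the zero-shot replies would first need to be reduced to a single label.

```python
import pandas as pd


def score_predictions(frame: pd.DataFrame, truth_col: str, pred_col: str):
    """Return (accuracy, confusion table) comparing predictions to labels.

    Both columns are lower-cased and stripped of surrounding whitespace, and
    trailing periods are removed from predictions; anything that still does
    not equal the label counts as wrong.
    """
    truth = frame[truth_col].astype(str).str.strip().str.lower()
    pred = frame[pred_col].astype(str).str.strip().str.rstrip(".").str.lower()
    accuracy = (truth == pred).mean()
    confusion = pd.crosstab(truth, pred, rownames=["label"], colnames=["predicted"])
    return accuracy, confusion


# Usage with the notebook's DataFrame:
# acc, table = score_predictions(df, "ml literacy", "oneshot")
# print(acc)
# print(table)
```

The confusion table shows where the errors fall. A model that always answers "Intermediate" produces a single predicted column, which makes that failure mode obvious.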