Pandas AI

Introduction

Pandas is one of the most popular open-source Python libraries for data manipulation and analysis.
If you are familiar with Excel or SQL, you can think of Pandas as a "programmable Excel" that is much faster, more powerful, and capable of handling millions of rows of data.
Once data from an Excel file or a SQL query is loaded into a table (a DataFrame), Pandas is a powerful tool for analyzing it: calculating sums, plotting graphs, cleaning data, grouping data, and so on.
When Pandas is integrated with AI, a natural-language prompt is converted into Pandas operations that compute the result, which makes this approach suitable for building a chatbot on top of a data dashboard.
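
To make the idea concrete, here is a minimal sketch of the kind of Pandas code that a question such as "What is the total rent ratio in November 2025?" could be translated into. The rows are made-up illustrative values, the column names are borrowed from the Example section, and the code that the AI layer actually generates may well differ.

```python
import pandas as pd

# Made-up illustrative rows; column names are borrowed from the Example section.
df = pd.DataFrame(
    {
        "PtycodeCn": ["A - Alpha", "B - Beta", "A - Alpha", "B - Beta"],
        "RentedUnit": [50, 30, 55, 25],
        "AvailableLongTermUnits": [100, 60, 100, 60],
        "TakenDayTime": pd.to_datetime(
            ["2025-10-01", "2025-10-01", "2025-11-01", "2025-11-01"]
        ),
    }
)

# "in November 2025" becomes a date filter.
nov = df[(df["TakenDayTime"].dt.year == 2025) & (df["TakenDayTime"].dt.month == 11)]

# "total rent ratio" becomes a ratio of two column sums.
total_rent_ratio = nov["RentedUnit"].sum() / nov["AvailableLongTermUnits"].sum()

# Grouping produces the same kind of figure per property.
per_property = nov.groupby("PtycodeCn")[["RentedUnit", "AvailableLongTermUnits"]].sum()
per_property["RentRatio"] = (
    per_property["RentedUnit"] / per_property["AvailableLongTermUnits"]
)

print(total_rent_ratio)
print(per_property)
```

With these sample rows the November total is 80 rented units out of 160 available, so the overall ratio is 0.5.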

Example

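import os

import pandas as pd
import pandasai as pai
import pyodbc
# Import path assumed from the PandasAI v3 LiteLLM extension package.
from pandasai_litellm.litellm import LiteLLM


def chat_with_sql():
    # Configure the LLM that turns natural-language questions into code.
    # Secrets are read from environment variables rather than hard-coded;
    # the variable names LITELLM_API_KEY and MSSQL_PASSWORD are assumptions.
    llm = LiteLLM(
        model="gpt-4.1",
        api_key=os.environ["LITELLM_API_KEY"],
        base_url="https://ai.hld.com/lite-llm/",
    )
    pai.config.set({
        "llm": llm
    })

    # Connect to MSSQL
    conn_str = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=CLUSQL01;"
        "DATABASE=HLDCCSCPRDB_report;"
        "UID=HLDCCSCPRDBViewer;"
        f"PWD={os.environ['MSSQL_PASSWORD']};"
    )
    conn = pyodbc.connect(conn_str)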

    query = """
    SELECT
        PtyCode AS ptycode,
        ChnName,
        PtyCode + ' - ' + ChnName as PtycodeCn,
        SUM(TotalUnit) AS TotalUnit,
        SUM(AvailableUnits) AS AvailableUnits,
        SUM(AvailableLongTermUnits) AS AvailableLongTermUnits,
        SUM(RentedUnit) AS RentedUnit,     
        CASE 
            WHEN SUM(AvailableLongTermUnits) = 0 THEN 0 
            ELSE SUM(RentedUnit) * 1.0 / SUM(AvailableLongTermUnits) 
        END AS RentRatio,
        TakenDayTime
    FROM 
        vwDTOverallRentalDetail
    WHERE 
        CarType = 'ALL'
    GROUP BY 
        PtyCode,
        ChnName,
        TakenDayTime,
        PtyCode + ' - ' + ChnName
    """

    # Load data into pandas DataFrame, closing the connection even if the query fails
    try:
        rental_df = pd.read_sql(query, conn)
    finally:
        conn.close()
    print(rental_df)

    # Define column descriptions for better AI understanding
    columns_dict = {
        "ptycode": {"type": "string", "description": "Property code identifier (unique per property)"},
        "ChnName": {"type": "string", "description": "Chinese name of the property"},
        "PtycodeCn": {"type": "string", "description": "Combined property code and Chinese name"},
        "TotalUnit": {"type": "integer", "description": "Total number of units for this property"},
        "AvailableUnits": {"type": "integer", "description": "Number of available units for this property"},
        "AvailableLongTermUnits": {"type": "integer", "description": "Number of available long-term rental units. Used as denominator when calculating rent ratio."},
        "RentedUnit": {"type": "integer", "description": "Number of rented units. Used as numerator when calculating rent ratio."},
        "RentRatio": {"type": "float", "description": "Per-record rental ratio for this single property = RentedUnit / AvailableLongTermUnits. WARNING: Do NOT use SUM(RentRatio) for aggregate calculations. To calculate total/overall/aggregate rent ratio across multiple records, always use: SUM(RentedUnit) / SUM(AvailableLongTermUnits)"},
        "TakenDayTime": {"type": "datetime", "description": "The date when the data was recorded. Format is YYYY-MM-DD. Use this column for date filtering. Each property has one record per date."}
    }
    columns_list = [{"name": name, **props} for name, props in columns_dict.items()]

    # Wrap pandas DataFrame with pai.DataFrame for PandasAI compatibility
    pai_df = pai.DataFrame(rental_df)

    # Load existing dataset or create new one
    dataset_path = "company/rental-detail"
    try:
        df = pai.load(dataset_path)
    except Exception:
        df = pai.create(
            path=dataset_path,
            df=pai_df,
            description="""Rental detail data from vwDTOverallRentalDetail view showing property rental statistics. 
Each row represents one property on a specific date (TakenDayTime). 
Contains: total units, available units, rented units and rental ratio per property per date.
""",
            columns=columns_list
        )

    # Chat with your data

    # Data Set 1: Overall
    response1 = df.chat("What is the total rent ratio in November 2025?")
    print(response1)
    response2 = df.chat("What is the total number of rented units in November 2025?")
    print(response2)
    response3 = df.chat("What is the total number of available long-term rental units in November 2025?")
    print(response3)
    response4 = df.chat("What is the total number of total units in November 2025?")
    print(response4)
    response5 = df.chat("Comparing October 2025 and November 2025, what is the change in the total rent ratio?")
    print(response5)
    response6 = df.chat("Comparing October 2025 and November 2025, what is the change in the total number of rented units?")
    print(response6)
    response7 = df.chat("Comparing October 2025 and November 2025, what is the change in the total number of available long-term rental units?")
    print(response7)
    response8 = df.chat("Comparing October 2025 and November 2025, what is the change in the total number of total units?")
    print(response8)

    # Data Set 2: Property Level
    response9 = df.chat("What is the rent ratio for 友邦廣場 in November 2025?")
    print(response9)
    response10 = df.chat("What is the top 20 highest rent ratio properties in November 2025?")
    print(response10)


if __name__ == "__main__":
    chat_with_sql()
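
The WARNING in the RentRatio column description is there because per-row ratios cannot be summed or averaged into an overall ratio. The sketch below illustrates the difference using two rows copied from the top-20 table in the Result section (PS0045 and PC0007); the variable names are illustrative only.

```python
import pandas as pd

# Two rows copied from the top-20 table in the Result section.
rows = pd.DataFrame(
    {
        "PtycodeCn": ["PS0045 - 新港城海濤居", "PC0007 - 清暉台"],
        "RentedUnit": [130.0, 4.0],
        "AvailableLongTermUnits": [124.0, 4.0],
    }
)
rows["RentRatio"] = rows["RentedUnit"] / rows["AvailableLongTermUnits"]

# Correct aggregate: summed numerator divided by summed denominator.
ratio_of_sums = rows["RentedUnit"].sum() / rows["AvailableLongTermUnits"].sum()

# Incorrect aggregates: both ignore how many units each property contributes.
sum_of_ratios = rows["RentRatio"].sum()
mean_of_ratios = rows["RentRatio"].mean()

print(f"ratio of sums:  {ratio_of_sums:.6f}")
print(f"sum of ratios:  {sum_of_ratios:.6f}")
print(f"mean of ratios: {mean_of_ratios:.6f}")
```

The 4-unit property pulls the plain mean down to about 1.024, while the unit-weighted figure is 134 / 128 = 1.046875. Spelling this out in the column description steers the model toward the weighted calculation.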

Result

     ptycode ChnName      PtycodeCn  TotalUnit  AvailableUnits  AvailableLongTermUnits  RentedUnit  RentRatio TakenDayTime        
0     PA0004    友邦廣場  PA0004 - 友邦廣場        731             507                     224         135   0.602679   2020-01-01
1     PA0004    友邦廣場  PA0004 - 友邦廣場        731             507                     224         126   0.562500   2020-02-01
2     PA0004    友邦廣場  PA0004 - 友邦廣場        731             507                     224         124   0.553571   2020-03-01
3     PA0004    友邦廣場  PA0004 - 友邦廣場        731             507                     224         122   0.544643   2020-04-01
4     PA0004    友邦廣場  PA0004 - 友邦廣場        731             507                     224         123   0.549107   2020-05-01
...      ...     ...            ...        ...             ...                     ...         ...        ...          ...        
4240  XS0001     桂濤苑   XS0001 - 桂濤苑        115              20                      95           1   0.010526   2020-02-01  
4241  XS0001     桂濤苑   XS0001 - 桂濤苑        115              20                      95           1   0.010526   2020-09-01  
4242  XS0001     桂濤苑   XS0001 - 桂濤苑        115              20                      95           1   0.010526   2021-01-01  
4243  XS0001     桂濤苑   XS0001 - 桂濤苑        115              20                      95           1   0.010526   2021-07-01  
4244  XS0001     桂濤苑   XS0001 - 桂濤苑        115              20                      95           1   0.010526   2022-01-01  

[4245 rows x 9 columns]
Dataset loaded successfully.
0.6305782941642462
4776.0
7574.0
10831.0
-0.0016601968000066192
-17.0
-7.0
-7.0
0.7703703703703704
                        PtycodeCn  TotalRentedUnit  TotalAvailableLongTermUnits  AggregateRentRatio
0                 PS0045 - 新港城海濤居            130.0                        124.0            1.048387    
1                    PC0007 - 清暉台              4.0                          4.0            1.000000       
2                     PG0027 - 逸峯            125.0                        126.0            0.992063        
3              PM0021 - 港灣豪庭（第一期）            307.0                        315.0            0.974603 
4             PS0035 - 新港城（ＮＰＱＲ座）            185.0                        191.0            0.968586
5                    PT0035 - 逸華軒             29.0                         33.0            0.878788       
6                   PF0019 - 粉嶺中心            222.0                        257.0            0.863813      
7                   PT0040 - 豫豐花園            171.0                        200.0            0.855000      
8                    PG0023 - 嘉亨灣            592.0                        733.0            0.807640       
9                 PM0019 - 新都城ＩＩ期            407.0                        513.0            0.793372    
10               PT0016 - 嘉兆臺（１期）            161.0                        203.0            0.793103
11                  PA0004 - 友邦廣場            104.0                        135.0            0.770370
12                  PS0011 - 沙田廣場             67.0                         87.0            0.770115
13  PG0029 - GLOBAL GATEWAY TOWER             26.0                         34.0            0.764706
14                  PF0047 - 花都廣場            124.0                        164.0            0.756098
15                   PM0031 - 創豪坊             15.0                         20.0            0.750000
16                    PT0046 - 尚悅            224.0                        304.0            0.736842
17             PM0022 - 港灣豪庭（第二期）            132.0                        180.0            0.733333
18                  PT0010 - 時代廣場             27.0                         37.0            0.729730
19             PP0022 - 疊茵庭（１－６座）             86.0                        119.0            0.722689
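
The single-number outputs above can be cross-checked against each other with plain arithmetic. The sketch below assumes that they appear in the same order as the df.chat calls in the Example and that each "change" was computed as November minus October; neither assumption is stated in the output itself.

```python
# Values copied from the Result output, assumed to be in df.chat call order.
nov_rent_ratio = 0.6305782941642462
nov_rented = 4776.0
nov_available_long_term = 7574.0
change_rent_ratio = -0.0016601968000066192
change_rented = -17.0
change_available_long_term = -7.0

# The November ratio should equal rented units over available long-term units.
recomputed_nov_ratio = nov_rented / nov_available_long_term

# Assuming change = November - October, recover the October totals.
oct_rented = nov_rented - change_rented
oct_available_long_term = nov_available_long_term - change_available_long_term
recomputed_change = recomputed_nov_ratio - oct_rented / oct_available_long_term

print(recomputed_nov_ratio, recomputed_change)

# Row 11 of the top-20 table (PA0004) should match the single-property answer.
single_property_ratio = 104.0 / 135.0
print(single_property_ratio)
```

Under these assumptions the figures agree: the recomputed ratio and its month-over-month change match the reported values, and the PA0004 row reproduces the 0.7703703703703704 answer. This suggests the aggregate answers were computed as a ratio of sums.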