Lab: Data Manipulation and Analysis
Using Pandas, perform a series of data manipulation and analysis tasks on Indonesia banking sector data provided by Sectors.
To complete this exercise, you will need to use the Pandas library to perform a series of data manipulation and analysis tasks on valuation of Indonesia banking institutions. This exercise is worth 15 points.
Pre-Lab Preparation & Prerequisites
Environment Setup and API Keys
You will need to have a proper Python environment set up, with a `.env` file containing the necessary API keys.
Data Analysis w/ Pandas
Review the Pandas tutorial to familiarize yourself with the library and its capabilities.
Submission Template (Google Colab)
Use the provided Google Colab template to complete the exercises and submit them for grading (a total of 15 points).
Google Classroom
Head over to Google Classroom to submit your assignments and receive feedback.
Financial Ratios for Value Investing
When it comes to investing, the specific style of “value investing” popularized by Benjamin Graham and Warren Buffett is based on the idea of buying stocks that are undervalued relative to their intrinsic value.
One way to determine if a stock is undervalued is to look at a company’s financial ratios. These ratios can help investors understand how a company is performing financially and whether its stock is a good investment.
In particular, we’ll be looking at the Price-to-Earnings (P/E) ratio, the Price-to-Book (P/B) ratio, and the Price-to-Sales (P/S) ratio.

P/E ratio = Price per share / Earnings per share

- Where the Price per share is the current stock price,
- and Earnings per share is the company’s net income divided by the number of shares outstanding.

In a way, this measures how much investors are willing to pay for each dollar of earnings a company generates.

P/B ratio = Price per share / Book value per share

- Where the Price per share is the current stock price,
- and the Book value per share is the company’s total assets minus its total liabilities, divided by the number of shares outstanding.

The P/B ratio reflects the value that investors are willing to pay for each dollar of book value.

P/S ratio = Price per share / Sales per share

- Where the Price per share is the current stock price,
- and the Sales per share is the company’s total revenue divided by the number of shares outstanding.

The P/S ratio is particularly useful for companies that have negative earnings or are in the growth phase, as it measures how much investors are willing to pay for each dollar of sales.
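As a quick numeric illustration of the P/E formula, here is a sketch with entirely made-up figures (they do not correspond to any real bank):

```python
# Made-up figures for illustration only (in IDR)
price_per_share = 4_500.0
net_income = 30_000_000_000_000.0       # annual net income
shares_outstanding = 150_000_000_000.0

# Earnings per share = net income / shares outstanding
earnings_per_share = net_income / shares_outstanding   # 200.0

# P/E = price per share / earnings per share
pe_ratio = price_per_share / earnings_per_share        # 22.5
print(pe_ratio)
```

The P/B and P/S ratios follow the same pattern, substituting book value per share or sales per share in the denominator.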
Margin of Safety
In value investing, a lower ratio is generally considered better, as it indicates that the stock is undervalued relative to its financial performance. When a stock is trading at a higher P/E or P/B ratio relative to its peers, value investors typically consider it overvalued; this is where the concept of “margin of safety” comes into play. Margin of safety is simply the difference between the intrinsic value of a stock and its market price, and a stock with a higher P/E is considered to have a lower margin of safety, unless its growth rate is significantly higher than that of its peers.
In the following exercise, you will be working with data directly from a financial data API to extract the historical financial ratios of Indonesia’s leading banking institutions.
The data contains one row per bank per year, with columns for the ticker symbol, the company name, and the P/E, P/B, and P/S ratios.
By comparing the historical P/E ratios of different banks to their peers and their own historical averages, an analyst can get a sense of whether a bank is trading at a premium or discount to its intrinsic value, and whether it is a good value investment relative to its historical norm.
With the right data and analysis tools, a data analyst can construct a comprehensive picture of the banking sector’s valuation, and estimate a reasonable range for a stock’s intrinsic value. If the current P/E ratio is at the lower end of its historical range and the company’s growth prospects are as strong as ever, it may indicate a good value investment opportunity with a high margin of safety.
Banking Institutions’ historical valuation
For your convenience, a copy of the dataset has been provided in the `datasets` folder. Using what you’ve learned in the Pandas tutorial, you should know how to read this data into a DataFrame:
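A minimal sketch of loading the data with `pd.read_csv`. The file name and column names below are assumptions based on the exercises that follow, and the figures are made up; with the real file you would simply call `pd.read_csv("datasets/<filename>.csv")`:

```python
import pandas as pd
from io import StringIO

# Stand-in for the provided CSV file -- column names are assumptions
# based on the exercises below, and the numbers are made up
sample_csv = StringIO(
    "symbol,company_name,year,pe,pb,ps\n"
    "BBRI.JK,Bank Rakyat Indonesia,2022,12.5,2.1,3.4\n"
    "BBCA.JK,Bank Central Asia,2022,24.3,4.9,8.1\n"
)
df = pd.read_csv(sample_csv)
print(df.shape)
```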
Once you have the data loaded, perform a series of exercises that are worth 15 points in total.
1. Basic Data Exploration
- Objective: Learn to explore the basic structure of a DataFrame
- Task: Write Python code to import the dataset and display the first 5 rows of the DataFrame. Then, report its shape (number of rows and columns).
- Hint: Use the `head()` method and the `shape` attribute.
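As an illustration of the hinted methods (on a small stand-in DataFrame, not the lab dataset):

```python
import pandas as pd

# Small stand-in DataFrame -- the real one is loaded from the CSV
df = pd.DataFrame({
    "symbol": ["BBRI.JK", "BBCA.JK", "BMRI.JK"],
    "pe": [12.5, 24.3, 11.8],
})

print(df.head())  # displays up to the first 5 rows
print(df.shape)   # a (rows, columns) tuple -- an attribute, so no parentheses
```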
2. Filtering Data
- Objective: Learn to filter data based on specific conditions
- Task: Write Python code to filter and display rows where the
pe
is greater than 15 and whensymbol
is equal toBRIS.JK
. Name this new DataFramebris_high
. - Hint: Use boolean indexing
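The boolean indexing pattern looks like this (with made-up rows standing in for the lab data):

```python
import pandas as pd

# Made-up rows to illustrate the pattern
df = pd.DataFrame({
    "symbol": ["BRIS.JK", "BRIS.JK", "BBCA.JK"],
    "pe": [18.2, 12.1, 24.3],
})

# Wrap each condition in parentheses and combine them with &
bris_high = df[(df["pe"] > 15) & (df["symbol"] == "BRIS.JK")]
print(bris_high)
```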
3. Calculate Average Values
- Objective: Learn to compute statistical summaries on a DataFrame
- Task: Write Python code to calculate the median `pe`, `pb`, and `ps` values for the year 2022.
- Hint: Use the `median()` method after filtering the DataFrame for the year 2022.
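The filter-then-aggregate pattern, sketched on a stand-in DataFrame with made-up values:

```python
import pandas as pd

# Made-up rows spanning two years
df = pd.DataFrame({
    "year": [2021, 2022, 2022, 2022],
    "pe":   [10.0, 12.0, 14.0, 16.0],
    "pb":   [1.0,  2.0,  3.0,  4.0],
    "ps":   [2.0,  3.0,  4.0,  5.0],
})

# Filter to 2022 first, then take the median of the ratio columns
medians = df[df["year"] == 2022][["pe", "pb", "ps"]].median()
print(medians)
```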
4. Sort Data, Reorder and Drop Columns
- Objective: Learn to sort data and reorder columns
- Task: Write Python code to sort the DataFrame by `ps` in descending order, then reorder the columns so that `symbol` comes first, followed by `year`, `ps`, `pe`, and `pb`. Drop the `company_name` column.
- Hint: Use the `sort_values()` method, and optionally the `drop()` method to remove the `company_name` column. Recall that in `sort_values`, you can set the `ascending` parameter to `False` to sort in descending order. You may also use the `columns` parameter of the `reindex()` method to reorder the columns.
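A sketch of the sort-and-reorder pattern on a stand-in DataFrame; note that leaving `company_name` out of the `reindex` column list also drops it, so a separate `drop()` call becomes optional:

```python
import pandas as pd

# Made-up rows with the same columns as the lab dataset
df = pd.DataFrame({
    "company_name": ["Bank A", "Bank B"],
    "symbol": ["AAAA.JK", "BBBB.JK"],
    "year": [2022, 2022],
    "pe": [10.0, 12.0],
    "pb": [1.0, 2.0],
    "ps": [1.5, 3.0],
})

# Sort by ps descending, then reindex to the desired column order;
# omitting company_name from the list drops that column
result = (
    df.sort_values("ps", ascending=False)
      .reindex(columns=["symbol", "year", "ps", "pe", "pb"])
)
print(result)
```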
5. Grouping and Aggregating Data
- Objective: Learn to group data and perform aggregate operations
- Task: Write Python code to group the bank valuation data by `symbol` and calculate the average `pe`, `pb`, and `ps` values for each group. Display the results. Now, do the same but grouped by `year`, calculating the average `pe`, `pb`, and `ps` values for each year.
- Hint: Use the `groupby()` method followed by the `mean(numeric_only=True)` method.
In past versions of `pandas`, the `mean()` method would include all numeric columns by default. There is now a `FutureWarning` indicating that `numeric_only` will default to `False` in the future. We can either specify `numeric_only=True` to avoid the warning, or use the `select_dtypes()` method to keep only the numeric columns before calling `mean()`.
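Putting the hint and the note together, a minimal sketch on a stand-in DataFrame with made-up values:

```python
import pandas as pd

# Made-up rows; two entries for one bank so the average is visible
df = pd.DataFrame({
    "symbol": ["AAAA.JK", "AAAA.JK", "BBBB.JK"],
    "company_name": ["Bank A", "Bank A", "Bank B"],
    "pe": [10.0, 14.0, 20.0],
})

# numeric_only=True skips non-numeric columns such as company_name
avg_by_symbol = df.groupby("symbol").mean(numeric_only=True)
print(avg_by_symbol)
```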
Dive Deeper
The data you’ve been provided with is not hand-curated; it is sourced from the Sectors Financial Data API. If you’re feeling adventurous, you can continue with this section, which explains how the data is fetched from the API and subsequently processed into a CSV file.
These topics may feel a bit advanced, and you are not required to complete them for the lab. They are included here for those who have a personal interest in learning more about how the “data pipeline” works in the real world.
Nested Data Structures into DataFrames
The data is fetched from the API using the `requests` library, as you have seen in the Quick Start guide. To actually use `fetch_data`, we need to specify a URL, which can be constructed by reading the corresponding API documentation. For example, to fetch the historical P/E ratios of BBRI, one could do:
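The original `fetch_data` code block is not reproduced here; the following is a minimal sketch of the pattern, in which the header name, endpoint URL, and environment-variable name are assumptions to be checked against the Sectors API documentation:

```python
import os
import requests
import pandas as pd

def fetch_data(url: str) -> dict:
    """Fetch JSON from an API endpoint, authenticating with a key from the .env file."""
    # The header name and key variable are assumptions -- check the API docs
    headers = {"Authorization": os.environ.get("SECTORS_API_KEY", "")}
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()

# Hypothetical endpoint -- take the real path from the API documentation
url = "https://api.sectors.app/v1/company/report/BBRI/?sections=valuation"
# data = fetch_data(url)          # a nested JSON structure (dict containing lists of dicts)
# df = pd.json_normalize(data)    # flatten the top-level keys into columns
```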
This looks like a great deal more code than what you’ve seen so far, but it’s essentially the same concepts you’ve learned, with the addition of `pd.json_normalize` to convert the JSON response into a DataFrame.
Here’s the output:
We’ll find that the `historical_valuation` key contains the data we’re interested in. We can extract it by first using `.explode()` to expand the list of dictionaries into separate rows. Since each row now holds a single dictionary object, we can then use `pd.json_normalize` to expand that dictionary into separate columns:
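The explode-then-normalize pattern can be sketched end to end as follows (the nested structure below is a made-up stand-in for the API response):

```python
import pandas as pd

# Made-up stand-in for the nested API response
df = pd.DataFrame({
    "symbol": ["BBRI.JK"],
    "historical_valuation": [[
        {"year": 2021, "pe": 15.2},
        {"year": 2022, "pe": 12.7},
    ]],
})

# explode(): one row per dictionary in the list
exploded = df.explode("historical_valuation").reset_index(drop=True)

# json_normalize(): one column per key in each dictionary
ratios = pd.json_normalize(exploded["historical_valuation"].tolist())
result = exploded[["symbol"]].join(ratios)
print(result)
```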
That is the essence of how we fetch data from an API endpoint that sometimes, as in this case, returns nested data structures.
- `json_normalize`: converts JSON data into a DataFrame
- `explode`: expands a list of dictionaries into separate rows
We perform the above iteratively for each bank in the `banks` list, and then concatenate the DataFrames together to form the final dataset.
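That final step can be sketched as follows; the ticker list and output file name are assumptions, and each per-bank frame is a one-row stand-in for what `fetch_data` plus `explode`/`json_normalize` would produce:

```python
import pandas as pd

# Hypothetical ticker list -- the actual banks list may differ
banks = ["BBRI", "BBCA", "BMRI", "BBNI"]

frames = []
for bank in banks:
    # In the real pipeline each frame comes from fetch_data + explode +
    # json_normalize; a one-row stand-in keeps this example self-contained
    frames.append(pd.DataFrame({"symbol": [f"{bank}.JK"], "pe": [10.0]}))

# Concatenate the per-bank frames into one dataset
dataset = pd.concat(frames, ignore_index=True)
# dataset.to_csv("bank_valuation.csv", index=False)  # hypothetical output file
print(dataset)
```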