Streamlit caching techniques use st.cache_data and st.cache_resource to boost application speed by avoiding repetitive, expensive computations and resource loading.
Step 1: Setup and Conceptual Slow Functions
This step sets up a basic Streamlit script and defines two functions that simulate slow operations: one that produces data (a candidate for st.cache_data) and one that loads a heavy object or resource (a candidate for st.cache_resource).
# Step 1: Setup and Conceptual Slow Functions
import streamlit as st
import pandas as pd
import time
import numpy as np
# --- Configuration ---
COMPUTATION_TIME = 2 # Seconds to simulate slow work
RESOURCE_INIT_TIME = 3 # Seconds to simulate slow resource loading
# --- Slow Data Function (Candidate for st.cache_data) ---
def simulate_heavy_data_processing(rows, cols):
    """
    Simulates a time-consuming data processing task.
    This function should be run only when its input changes.
    """
    st.info(f"⏳ Running heavy data processing for {rows} rows...")
    time.sleep(COMPUTATION_TIME)  # Simulate expensive computation
    data = np.random.randn(rows, cols)
    df = pd.DataFrame(data, columns=[f'Col_{i}' for i in range(cols)])
    st.success("✅ Data processing complete.")
    return df

# --- Slow Resource Function (Candidate for st.cache_resource) ---
class HeavyModelResource:
    """Simulates a large, expensive-to-load machine learning model."""

    def __init__(self, name):
        st.warning(f"⏳ Initializing Heavy Model: {name}...")
        time.sleep(RESOURCE_INIT_TIME)  # Simulate loading weights/config
        self.name = name
        self.ready = True
        st.success("✅ Model resource ready.")

    def predict(self, data):
        """Simulate a quick prediction step using the loaded resource."""
        return data.shape[0] * 0.1  # Simple dummy prediction
def load_heavy_model(model_name):
    """
    Creates and returns the heavy resource object.
    Once wrapped with st.cache_resource, this runs only once per
    input value, and the result is shared across all sessions.
    """
    return HeavyModelResource(model_name)
Step 2: Applying Streamlit Caching
We apply the two primary caching decorators:
1. st.cache_data: For functions that return data (like DataFrames, lists, NumPy arrays) and should rerun only if their inputs change.
2. st.cache_resource: For functions that return a heavy resource object (like a database connection or an ML model) that should be created once and then shared across reruns and sessions.
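A key behavioral difference worth noting: st.cache_data serializes the return value and hands each caller its own copy, which protects the cache against accidental mutation, whereas st.cache_resource returns the single cached object itself, so any mutation is visible everywhere it is used.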
# Step 2: Applying Streamlit Caching
# Apply st.cache_data to the data function
@st.cache_data
def get_cached_dataframe(rows, cols):
    return simulate_heavy_data_processing(rows, cols)

# Apply st.cache_resource to the resource loading function
@st.cache_resource
def get_cached_model(model_name):
    return load_heavy_model(model_name)
# --- Streamlit App Layout ---
st.title("Streamlit Caching Demonstration 🚀")
st.markdown("Try changing the input slider or clicking the button to see the speed difference.")
# Input control that forces a full app rerun
row_count = st.slider("Select Data Rows", 1000, 10000, 5000, step=1000)
st.header("1. Data Caching (`st.cache_data`)")
# Run the cached data function
start_time_data = time.time()
df_cached = get_cached_dataframe(row_count, 5) # Input (row_count) changes cache state
end_time_data = time.time()
st.dataframe(df_cached.head(), use_container_width=True)
st.metric("Data Function Runtime", f"{end_time_data - start_time_data:.2f} s")
# When the slider changes, the function runs slow (Cache MISS).
# When the button below is clicked, the function runs fast (Cache HIT).
st.header("2. Resource Caching (`st.cache_resource`)")
# Run the cached resource function
start_time_resource = time.time()
# This will run slowly only the very first time the app starts;
# afterwards the same cached object is reused across reruns and sessions.
model_cached = get_cached_model("ResNet-v2")
end_time_resource = time.time()
st.write(f"Model Name: {model_cached.name}")
st.metric("Resource Load Runtime", f"{end_time_resource - start_time_resource:.2f} s")
# Perform a dummy prediction using the resource
st.write(f"Dummy Prediction Result: {model_cached.predict(df_cached)}")
# Button to trigger a rerun without changing cached inputs
st.button("Force App Rerun (Check Cache Hit)")
Conceptual Outcome
When running this Streamlit application:
1. First Run: Both functions will execute slowly (taking $2 \text{s}$ and $3 \text{s}$) because they are populating the cache.
2. Subsequent Reruns (without changing inputs): Clicking the "Force App Rerun" button will trigger a full script rerun, but the cached functions will skip execution. Their runtime metrics will be close to $0 \text{s}$ (the time it takes to look up the cache), demonstrating a huge speed boost.
3. Changing the Slider: Changing the row_count slider will cause get_cached_dataframe to run slowly again (Cache MISS) because its input changed, but get_cached_model will remain fast (Cache HIT) because its input ("ResNet-v2") did not change.
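Both decorators also accept optional controls over cache lifetime, and caches can be cleared programmatically. The sketch below is a minimal illustration that reuses the Step 1 functions; the ttl and max_entries values are arbitrary assumptions, not recommendations.
# Optional: cache lifetime controls and manual invalidation
@st.cache_data(ttl=600, max_entries=20)  # Entries expire after 10 min; keep at most 20
def get_expiring_dataframe(rows, cols):
    return simulate_heavy_data_processing(rows, cols)

@st.cache_resource(ttl=3600)  # Rebuild the shared resource after 1 hour
def get_expiring_model(model_name):
    return load_heavy_model(model_name)

# Clear one function's cache...
if st.button("Clear DataFrame Cache"):
    get_expiring_dataframe.clear()

# ...or wipe every st.cache_data / st.cache_resource entry at once.
if st.button("Clear All Caches"):
    st.cache_data.clear()
    st.cache_resource.clear()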
