Streamlit caching techniques use st.cache_data and st.cache_resource to boost application speed by avoiding repetitive, expensive computations and resource loading.
Step 1: Setup and Conceptual Slow Functions
This step sets up a basic Streamlit script and defines two functions that simulate slow operations: one that produces data (a candidate for st.cache_data) and one that loads a heavy object or resource (a candidate for st.cache_resource).
# Step 1: Setup and Conceptual Slow Functions
import streamlit as st
import pandas as pd
import time
import numpy as np
# --- Configuration ---
COMPUTATION_TIME = 2 # Seconds to simulate slow work
RESOURCE_INIT_TIME = 3 # Seconds to simulate slow resource loading
# --- Slow Data Function (Candidate for st.cache_data) ---
def simulate_heavy_data_processing(rows, cols):
    """
    Simulates a time-consuming data processing task.
    This function should be run only when its input changes.
    """
    st.info(f"⏳ Running heavy data processing for {rows} rows...")
    time.sleep(COMPUTATION_TIME)  # Simulate expensive computation
    data = np.random.randn(rows, cols)
    df = pd.DataFrame(data, columns=[f'Col_{i}' for i in range(cols)])
    st.success("✅ Data processing complete.")
    return df

# --- Slow Resource Function (Candidate for st.cache_resource) ---
class HeavyModelResource:
    """Simulates a large, expensive-to-load machine learning model."""

    def __init__(self, name):
        st.warning(f"⏳ Initializing Heavy Model: {name}...")
        time.sleep(RESOURCE_INIT_TIME)  # Simulate loading weights/config
        self.name = name
        self.ready = True
        st.success("✅ Model resource ready.")

    def predict(self, data):
        """Simulate a quick prediction step using the loaded resource."""
        return data.shape[0] * 0.1  # Simple dummy prediction
def load_heavy_model(model_name):
    """
    Creates and returns the heavy resource object.
    Once wrapped with st.cache_resource, this runs only once per
    input value, and the result is shared across all sessions.
    """
    return HeavyModelResource(model_name)
Step 2: Applying Streamlit Caching
We apply the two primary caching decorators:
1. st.cache_data: For functions that return data (like DataFrames, lists, NumPy arrays) and should rerun only if their inputs change.
2. st.cache_resource: For functions that return a heavy resource object (like a database connection or an ML model) that should be created once and then shared across reruns and sessions.
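A key behavioral difference worth noting: st.cache_data serializes the return value and hands each caller its own copy, which protects the cache against accidental mutation, whereas st.cache_resource returns the single cached object itself, so any mutation is visible everywhere it is used.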
# Step 2: Applying Streamlit Caching
# Apply st.cache_data to the data function
@st.cache_data
def get_cached_dataframe(rows, cols):
    return simulate_heavy_data_processing(rows, cols)

# Apply st.cache_resource to the resource loading function
@st.cache_resource
def get_cached_model(model_name):
    return load_heavy_model(model_name)
# --- Streamlit App Layout ---
st.title("Streamlit Caching Demonstration 🚀")
st.markdown("Try changing the input slider or clicking the button to see the speed difference.")
# Input control that forces a full app rerun
row_count = st.slider("Select Data Rows", 1000, 10000, 5000, step=1000)
st.header("1. Data Caching (`st.cache_data`)")
# Run the cached data function
start_time_data = time.time()
df_cached = get_cached_dataframe(row_count, 5) # Input (row_count) changes cache state
end_time_data = time.time()
st.dataframe(df_cached.head(), use_container_width=True)
st.metric("Data Function Runtime", f"{end_time_data - start_time_data:.2f} s")
# When the slider changes, the function runs slow (Cache MISS).
# When the button below is clicked, the function runs fast (Cache HIT).
st.header("2. Resource Caching (`st.cache_resource`)")
# Run the cached resource function
start_time_resource = time.time()
# This will run slowly only the very first time the app starts;
# afterwards the same cached object is reused across reruns and sessions.
model_cached = get_cached_model("ResNet-v2")
end_time_resource = time.time()
st.write(f"Model Name: {model_cached.name}")
st.metric("Resource Load Runtime", f"{end_time_resource - start_time_resource:.2f} s")
# Perform a dummy prediction using the resource
st.write(f"Dummy Prediction Result: {model_cached.predict(df_cached)}")
# Button to trigger a rerun without changing cached inputs
st.button("Force App Rerun (Check Cache Hit)")
Conceptual Outcome
When running this Streamlit application:
1. First Run: Both functions will execute slowly (taking $2 \text{s}$ and $3 \text{s}$) because they are populating the cache.
2. Subsequent Reruns (without changing inputs): Clicking the "Force App Rerun" button will trigger a full script rerun, but the cached functions will skip execution. Their runtime metrics will be close to $0 \text{s}$ (the time it takes to look up the cache), demonstrating a huge speed boost.
3. Changing the Slider: Changing the row_count slider will cause get_cached_dataframe to run slowly again (Cache MISS) because its input changed, but get_cached_model will remain fast (Cache HIT) because its input ("ResNet-v2") did not change.
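Both decorators also accept optional controls over cache lifetime, and caches can be cleared programmatically. The sketch below is a minimal illustration that reuses the Step 1 functions; the ttl and max_entries values are arbitrary assumptions, not recommendations.
# Optional: cache lifetime controls and manual invalidation
@st.cache_data(ttl=600, max_entries=20)  # Entries expire after 10 min; keep at most 20
def get_expiring_dataframe(rows, cols):
    return simulate_heavy_data_processing(rows, cols)

@st.cache_resource(ttl=3600)  # Rebuild the shared resource after 1 hour
def get_expiring_model(model_name):
    return load_heavy_model(model_name)

# Clear one function's cache...
if st.button("Clear DataFrame Cache"):
    get_expiring_dataframe.clear()

# ...or wipe every st.cache_data / st.cache_resource entry at once.
if st.button("Clear All Caches"):
    st.cache_data.clear()
    st.cache_resource.clear()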
