Build an Interactive Sales Dashboard with Taipy: A Step-by-Step Python Guide

This is the third installment in a concise series exploring the creation of data dashboards with cutting-edge Python tools, including Streamlit, Gradio, and Taipy.

The dataset is identical for each dashboard; only the storage format varies. I’ll aim to align the dashboard designs and features across all three tools as closely as possible.

The Streamlit and Gradio versions have already been covered in earlier articles. The Streamlit dashboard pulls data from a Postgres database, while the Gradio and Taipy dashboards use a CSV file. Links to those articles appear at the end of this piece.

Introducing Taipy

Taipy is an emerging Python-based web framework that gained traction a few years ago. Per its official site, Taipy is…

“…an open-source Python library designed to swiftly build production-ready front-end and back-end solutions. No web development expertise is necessary!”

Taipy targets data scientists, machine learning experts, and data engineers who may lack deep front-end development skills but are proficient in Python. Taipy simplifies front-end creation with Python, offering a significant advantage.

Taipy is free to start with. For enterprise use, including dedicated support and scalability, paid subscriptions are offered monthly or annually. More details are available on their website, linked at the end of this article.

Why Opt for Taipy Instead of Gradio or Streamlit?

As demonstrated across this and the prior articles, all three frameworks can produce comparable results, prompting the question of preference.

While Gradio shines in rapid ML demo creation and Streamlit excels in interactive data analysis, both rely on simplicity that may hinder scalability as projects expand. Taipy becomes ideal when transitioning from basic scripts or demos to robust, efficient, and sustainable applications.

Consider Taipy over Streamlit or Gradio if:

  1. Your application demands high performance.
  2. Your single script is growing overly complex and lengthy.
  3. You need multi-page apps with intricate navigation.
  4. Your project involves “what-if” scenario analysis or advanced pipeline processing.
  5. You’re developing a production tool for business users, beyond internal dashboards.
  6. You’re collaborating in a team requiring a tidy, maintainable codebase.

In essence, select Gradio for demos, Streamlit for interactive exploration, and Taipy for scalable, high-performance enterprise data solutions.

What We’re Building

We’re crafting a data dashboard. The source data will be a CSV file with 100,000 synthetic sales records.

The data’s origin is secondary—it could alternatively be a Parquet file, SQLite, Postgres, or any connectable database.

Here’s a preview of our completed dashboard design.


Completed dashboard design.


The dashboard features four primary sections.

The top row allows users to pick specific start and end dates via date selectors and choose product categories with a dropdown menu. The “Key Metrics” row offers a high-level overview of the filtered data. The Visualizations area lets users pick from three graph types to visualize the dataset. The Raw Data section provides a straightforward table view of the selected data, mirroring the underlying CSV file.

Navigating the dashboard is simple. It starts by showing stats for the entire dataset, which users can refine using the three selection options at the top. The graphs, key metrics, and raw data sections update in real time based on user selections.

The Source Data

As noted, the dashboard relies on a single comma-separated values (CSV) file for its data, comprising 100,000 synthetic sales records.

Here’s some Python code to create a dataset, leveraging the NumPy and Polars libraries. Make sure both are installed in your environment before executing the script.

# generate the 100000 record CSV file
#
import polars as pl
import numpy as np
from datetime import datetime, timedelta

def generate(nrows: int, filename: str):
    names = np.asarray(
        [
            "Laptop",
            "Smartphone",
            "Desk",
            "Chair",
            "Monitor",
            "Printer",
            "Paper",
            "Pen",
            "Notebook",
            "Coffee Maker",
            "Cabinet",
            "Plastic Cups",
        ]
    )
    categories = np.asarray(
        [
            "Electronics",
            "Electronics",
            "Office",
            "Office",
            "Electronics",
            "Electronics",
            "Stationery",
            "Stationery",
            "Stationery",
            "Electronics",
            "Office",
            "Sundry",
        ]
    )
    product_id = np.random.randint(len(names), size=nrows)
    quantity = np.random.randint(1, 11, size=nrows)
    price = np.random.randint(199, 10000, size=nrows) / 100
    # Generate random dates between 2010-01-01 and 2023-12-31
    start_date = datetime(2010, 1, 1)
    end_date = datetime(2023, 12, 31)
    date_range = (end_date - start_date).days
    # Create random dates as np.array and convert to string format
    order_dates = np.array([(start_date + timedelta(days=np.random.randint(0, date_range))).strftime('%Y-%m-%d') for _ in range(nrows)])
    # Define columns
    columns = {
        "order_id": np.arange(nrows),
        "order_date": order_dates,
        "customer_id": np.random.randint(100, 1000, size=nrows),
        "customer_name": [f"Customer_{i}" for i in np.random.randint(2**15, size=nrows)],
        "product_id": product_id + 200,
        "product_names": names[product_id],
        "categories": categories[product_id],
        "quantity": quantity,
        "price": price,
        "total": price * quantity,
    }
    # Create Polars DataFrame and write to CSV with explicit delimiter
    df = pl.DataFrame(columns)
    df.write_csv(filename, separator=',',include_header=True)  # Ensure comma is used as the delimiter
# Generate 100,000 rows of data with random order_date and save to CSV
generate(100_000, "/mnt/d/sales_data/sales_data.csv")
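A noteworthy detail in the generator is its use of NumPy fancy indexing: the random product_id array indexes directly into the names and categories arrays, so every row gets a matching name/category pair without a Python loop. A minimal, self-contained illustration (toy arrays, not the full product list above):

```python
import numpy as np

# Toy versions of the lookup arrays from the generator
names = np.asarray(["Laptop", "Desk", "Pen"])
categories = np.asarray(["Electronics", "Office", "Stationery"])

rng = np.random.default_rng(42)
product_id = rng.integers(len(names), size=5)

# Fancy indexing maps every random id to its name/category in one step,
# keeping name and category consistent for each row
print(names[product_id])
print(categories[product_id])
```

Because both arrays are indexed by the same product_id, a row assigned “Desk” always gets “Office”, which is what keeps the synthetic data internally consistent.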


Setting up and working with Taipy is straightforward, though it’s wise to create a dedicated Python environment before coding. I prefer Miniconda for this task, but you can choose any method that fits your process.

If you opt for Miniconda and haven’t installed it yet, you’ll need to set it up first.

After establishing the environment, activate it with the ‘activate’ command, then use ‘pip install’ to add the necessary Python libraries.

#create our test environment
(base) C:\Users\thoma>conda create -n taipy_dashboard python=3.12 -y

# Now activate it
(base) C:\Users\thoma>conda activate taipy_dashboard

# Install python libraries, etc ...
(taipy_dashboard) C:\Users\thoma>pip install taipy pandas


The Script

I’ll divide the code into segments and provide an explanation for each part as we go along.

from taipy.gui import Gui
import pandas as pd
import datetime

# Load CSV data
csv_file_path = r"d:\sales_data\sales_data.csv"

try:
    raw_data = pd.read_csv(
        csv_file_path,
        parse_dates=["order_date"],
        dayfirst=True,
        low_memory=False  # Suppress dtype warning
    )
    if "revenue" not in raw_data.columns:
        raw_data["revenue"] = raw_data["quantity"] * raw_data["price"]
    print(f"Data loaded successfully: {raw_data.shape[0]} rows")
except Exception as e:
    print(f"Error loading CSV: {e}")
    raw_data = pd.DataFrame()

categories = ["All Categories"]
if "categories" in raw_data.columns:  # Guard against the empty-DataFrame fallback above
    categories += raw_data["categories"].dropna().unique().tolist()

# Define the visualization options as a proper list
chart_options = ["Revenue Over Time", "Revenue by Category", "Top Products"]


This script readies sales data for our Taipy visualization application by performing the following tasks:

  1. Imports necessary external libraries and loads and processes the source data from the input CSV file.
  2. Computes derived metrics, such as revenue.
  3. Identifies key filtering options, including categories.
  4. Specifies the available visualization choices.

start_date = raw_data["order_date"].min().date() if not raw_data.empty else datetime.date(2020, 1, 1)
end_date = raw_data["order_date"].max().date() if not raw_data.empty else datetime.date(2023, 12, 31)
selected_category = "All Categories"
selected_tab = "Revenue Over Time"  # Set default selected tab
total_revenue = "$0.00"
total_orders = 0
avg_order_value = "$0.00"
top_category = "N/A"
revenue_data = pd.DataFrame(columns=["order_date", "revenue"])
category_data = pd.DataFrame(columns=["categories", "revenue"])
top_products_data = pd.DataFrame(columns=["product_names", "revenue"])

def apply_changes(state):
    filtered_data = raw_data[
        (raw_data["order_date"] >= pd.to_datetime(state.start_date)) &
        (raw_data["order_date"] <= pd.to_datetime(state.end_date))
    ]
    if state.selected_category != "All Categories":
        filtered_data = filtered_data[filtered_data["categories"] == state.selected_category]

    state.revenue_data = filtered_data.groupby("order_date")["revenue"].sum().reset_index()
    state.revenue_data.columns = ["order_date", "revenue"]
    print("Revenue Data:")
    print(state.revenue_data.head())

    state.category_data = filtered_data.groupby("categories")["revenue"].sum().reset_index()
    state.category_data.columns = ["categories", "revenue"]
    print("Category Data:")
    print(state.category_data.head())

    state.top_products_data = (
        filtered_data.groupby("product_names")["revenue"]
        .sum()
        .sort_values(ascending=False)
        .head(10)
        .reset_index()
    )
    state.top_products_data.columns = ["product_names", "revenue"]
    print("Top Products Data:")
    print(state.top_products_data.head())

    state.raw_data = filtered_data
    state.total_revenue = f"${filtered_data['revenue'].sum():,.2f}"
    state.total_orders = filtered_data["order_id"].nunique()
    state.avg_order_value = f"${filtered_data['revenue'].sum() / max(filtered_data['order_id'].nunique(), 1):,.2f}"
    state.top_category = (
        filtered_data.groupby("categories")["revenue"].sum().idxmax()
        if not filtered_data.empty else "N/A"
    )

def on_change(state, var_name, var_value):
    if var_name in {"start_date", "end_date", "selected_category", "selected_tab"}:
        print(f"State change detected: {var_name} = {var_value}")  # Debugging
        apply_changes(state)

def on_init(state):
    apply_changes(state)

import taipy.gui.builder as tgb

def get_partial_visibility(tab_name, selected_tab):
    return "block" if tab_name == selected_tab else "none"


Establishes default start and end dates along with the initial category. The starting chart will appear as Revenue Over Time. Initial placeholders and values are assigned for:

  1. total_revenue: Set to "$0.00".
  2. total_orders: Set to 0.
  3. avg_order_value: Set to "$0.00".
  4. top_category: Set to "N/A".

Empty DataFrames are initialized for:

  1. revenue_data: Columns include ["order_date", "revenue"].
  2. category_data: Columns include ["categories", "revenue"].
  3. top_products_data: Columns include ["product_names", "revenue"].

The apply_changes function is defined, activating to refresh the state when filters (e.g., date range or category) are adjusted. It updates the following:

  1. Time-based revenue trends.
  2. Revenue distribution by category.
  3. The top 10 products based on revenue.
  4. Summary statistics (total revenue, total orders, average order value, top category).
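The aggregations inside apply_changes are plain Pandas groupby operations. Here is a self-contained toy example (hypothetical values, not the real dataset) showing the same three aggregations and the idxmax call used to find the top category:

```python
import pandas as pd

# Tiny stand-in for the filtered sales data
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02"]),
    "categories": ["Electronics", "Office", "Electronics"],
    "product_names": ["Laptop", "Desk", "Monitor"],
    "order_id": [1, 2, 3],
    "revenue": [100.0, 50.0, 200.0],
})

# 1. Time-based revenue trend
revenue_by_day = df.groupby("order_date")["revenue"].sum().reset_index()

# 2. Revenue distribution by category
revenue_by_cat = df.groupby("categories")["revenue"].sum().reset_index()

# 3. Top products by revenue (head(10) in the real script)
top_products = (
    df.groupby("product_names")["revenue"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
    .reset_index()
)

# 4. Top category via idxmax, as in apply_changes
print(df.groupby("categories")["revenue"].sum().idxmax())  # Electronics
```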

The on_change function triggers whenever a user-adjustable component is modified.

The on_init function executes upon the app’s initial launch.

The get_partial_visibility function determines the CSS display attribute for UI elements depending on the active tab. (As written, the page switches charts with render= expressions instead, so this helper is defined but never called.)
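Because apply_changes only reads attributes from the state object, its filtering logic can be exercised outside the GUI. A quick sketch using SimpleNamespace as a stand-in for Taipy’s state, with toy data (hypothetical values):

```python
from types import SimpleNamespace

import pandas as pd

# Tiny stand-in for the real CSV data
raw_data = pd.DataFrame({
    "order_date": pd.to_datetime(["2022-01-01", "2022-06-01", "2023-01-01"]),
    "categories": ["Office", "Electronics", "Office"],
    "revenue": [10.0, 20.0, 30.0],
})

# The date/category filter at the heart of apply_changes
def filter_data(state):
    out = raw_data[
        (raw_data["order_date"] >= pd.to_datetime(state.start_date)) &
        (raw_data["order_date"] <= pd.to_datetime(state.end_date))
    ]
    if state.selected_category != "All Categories":
        out = out[out["categories"] == state.selected_category]
    return out

# SimpleNamespace mimics the attributes Taipy's state object exposes
state = SimpleNamespace(start_date="2022-01-01", end_date="2022-12-31",
                        selected_category="Office")
print(len(filter_data(state)))  # 1
```

This kind of stand-in makes it easy to unit-test the filtering logic without launching a browser session.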

with tgb.Page() as page:
    tgb.text("# Sales Performance Dashboard", mode="md")
    
    # Filters section
    with tgb.part(class_name="card"):
        with tgb.layout(columns="1 1 2"):  # Arrange elements in 3 columns
            with tgb.part():
                tgb.text("Filter From:")
                tgb.date("{start_date}")
            with tgb.part():
                tgb.text("To:")
                tgb.date("{end_date}")
            with tgb.part():
                tgb.text("Filter by Category:")
                tgb.selector(
                    value="{selected_category}",
                    lov=categories,
                    dropdown=True,
                    width="300px"
                )
   
    # Metrics section
    tgb.text("## Key Metrics", mode="md")
    with tgb.layout(columns="1 1 1 1"):
        with tgb.part(class_name="metric-card"):
            tgb.text("### Total Revenue", mode="md")
            tgb.text("{total_revenue}")
        with tgb.part(class_name="metric-card"):
            tgb.text("### Total Orders", mode="md")
            tgb.text("{total_orders}")
        with tgb.part(class_name="metric-card"):
            tgb.text("### Average Order Value", mode="md")
            tgb.text("{avg_order_value}")
        with tgb.part(class_name="metric-card"):
            tgb.text("### Top Category", mode="md")
            tgb.text("{top_category}")

    tgb.text("## Visualizations", mode="md")
    # Selector for visualizations with reduced width
    with tgb.part(style="width: 50%;"):  # Reduce width of the dropdown
        tgb.selector(
            value="{selected_tab}",
            lov=["Revenue Over Time", "Revenue by Category", "Top Products"],
            dropdown=True,
            width="360px",  # Reduce width of the dropdown
        )

    # Conditional rendering of charts based on selected_tab
    with tgb.part(render="{selected_tab == 'Revenue Over Time'}"):
        tgb.chart(
            data="{revenue_data}",
            x="order_date",
            y="revenue",
            type="line",
            title="Revenue Over Time",
        )

    with tgb.part(render="{selected_tab == 'Revenue by Category'}"):
        tgb.chart(
            data="{category_data}",
            x="categories",
            y="revenue",
            type="bar",
            title="Revenue by Category",
        )

    with tgb.part(render="{selected_tab == 'Top Products'}"):
        tgb.chart(
            data="{top_products_data}",
            x="product_names",
            y="revenue",
            type="bar",
            title="Top Products",
        )

    # Raw Data Table
    tgb.text("## Raw Data", mode="md")
    tgb.table(data="{raw_data}")


This code segment outlines the appearance and functionality of the entire page, divided into multiple sub-sections:

Page Definition

tgb.Page(): Serves as the primary container for the dashboard, establishing its structure and components.

Dashboard Layout

Presents the title “Sales Performance Dashboard” in Markdown format (mode="md").

Filters Section

Housed within a card-style component utilizing a 3-column layout—tgb.layout(columns="1 1 2")—to organize the filters.

Filter Elements

  • Start Date: Features a date picker tgb.date("{start_date}") to set the beginning of the date range.
  • End Date: Includes a date picker tgb.date("{end_date}") to define the end of the date range.
  • Category Filter: Offers a dropdown selector tgb.selector to filter data by categories, populated with options like "All Categories" and dataset-specific categories.

Key Metrics Section

  • Showcases summary stats via four metric cards in a 4-column layout:
Total Revenue: Reflects the total_revenue figure.
Total Orders: Indicates the total_orders count of unique orders.
Average Order Value: Displays the avg_order_value.
Top Category: Highlights the highest revenue-generating category.

Visualizations Section

  • Provides a dropdown selector for switching between visualizations (e.g., “Revenue Over Time,” “Revenue by Category,” “Top Products”), with a reduced width for a streamlined UI.

Conditional Rendering of Charts

  1. Revenue Over Time: Renders a line chart of revenue_data to track revenue trends over time.
  2. Revenue by Category: Presents a bar chart of category_data to illustrate revenue distribution across categories.
  3. Top Products: Displays a bar chart of top_products_data featuring the top 10 products by revenue.

Raw Data Table

  • Exhibits the raw dataset in a table format.
  • Updates dynamically based on applied user filters (e.g., date range, category).

Gui(page).run(
    title="Sales Dashboard",
    dark_mode=False,
    debug=True,
    port="auto",
    allow_unsafe_werkzeug=True,
    async_mode="threading"
)

This brief final section renders the page in the browser.

Executing the Code

Gather all the code snippets above and save them into a file, such as taipy-app.py. Ensure your source data file is properly located and correctly referenced in the script. Run the module like any other Python program by entering the following in a command-line terminal:

python taipy-app.py

Within a couple of seconds, a browser window should open, showcasing your data application.

Overview

In this article, I’ve aimed to deliver a thorough guide on constructing an interactive sales performance dashboard using Taipy, with a CSV file as the data source.

I highlighted Taipy as a contemporary, open-source Python framework that streamlines the development of data-driven dashboards and apps. I also offered insights into why Taipy might be preferable to the widely used frameworks, Gradio and Streamlit.

The dashboard I created enables users to filter data by date ranges and product categories, review key metrics like total revenue and top categories, explore visualizations such as revenue trends and leading products, and browse raw data with pagination.

This guide offers a complete implementation, detailing the process from generating sample data to crafting Python functions for data queries, plot creation, and user input management. This structured approach showcases how to harness Taipy’s features to build intuitive and dynamic dashboards, perfect for data engineers and scientists aiming to develop interactive data tools.

While I utilized a CSV file as the data source, adapting the code to work with alternative sources, like a relational database management system (RDBMS) such as SQLite, should be relatively simple.
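As a sketch of that adaptation, the pd.read_csv call could be swapped for pd.read_sql_query. This example uses an in-memory SQLite database with a hypothetical sales table standing in for a real database file:

```python
import sqlite3

import pandas as pd

# Hypothetical: an in-memory database stands in for a real sales.db file
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER, order_date TEXT, "
    "categories TEXT, quantity INTEGER, price REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    (1, "2023-01-01", "Electronics", 2, 9.99),
    (2, "2023-01-02", "Office", 1, 4.50),
])
conn.commit()

# Drop-in replacement for the pd.read_csv call in the dashboard script
raw_data = pd.read_sql_query(
    "SELECT * FROM sales", conn, parse_dates=["order_date"]
)
raw_data["revenue"] = raw_data["quantity"] * raw_data["price"]
print(raw_data.shape)  # (2, 6)
```

Everything downstream of the load step (filters, groupbys, charts) would work unchanged, since it only sees a Pandas DataFrame.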



