Dashboard
Superstore Sales · 2014–2017 · Linear Regression · R² 0.596
Superstore · 2014–2017 · Python · Scikit-learn
💰
4 yrs
$2,297,201
Total Historical Sales
📅
Avg
$47,858
Monthly Average
πŸ†
Peak
$118,448
Nov 2017 · Peak Month
🎯
LR
$349,127
6-Month Forecast Total
Historical Sales + 6-Month Forecast
2014–2017 actuals · Linear Regression forecast Jan–Jun 2018 · matches sales_forecast_dashboard.png
Monthly Forecast Β· 2018
Linear Regression · Best model · Cell 12 exact values
6-Month Total $349,127
Sales by Category
Total revenue 2014–2017
Sales by Region
Geographic performance · West leads
Sales Forecast
Solid = historical · Dashed = ML projection
Forecast Breakdown
Next 6 months · Linear Regression
Total Projected $349,127
📈
48 mo
$2.30M
Total 2014–2017
📦
9,994
$47,858
Monthly Average
🔝
Peak
Nov 2017
$118,448
📉
Low
Feb 2014
$4,520
Full Sales History 2014–2017
48 months · 0 missing values · 0 duplicates · Cell 5 output
Year-over-Year Comparison
Monthly sales by year: consistent Q4 growth trend
Quarterly Breakdown
Total sales per quarter: Q4 dominance every year
Seasonality Heatmap: Monthly Sales by Year
YlOrRd colour scale · Darker = higher sales · Matches seasonality_heatmap.png (Cell 14)
Average Monthly Sales
Avg across all 4 years · Nov peak visible
Monthly Relative Intensity
Normalised bar chart of seasonal pattern
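The two seasonal panels above can be approximated with a short calculation. This is a minimal sketch, not the notebook's code: the function name monthly_profile, the (year, month, sales) record layout, and the choice to normalise by dividing each monthly average by the peak month's average are all assumptions, and the toy numbers are made up.

```python
from collections import defaultdict

def monthly_profile(records):
    """records: iterable of (year, month, sales) tuples.

    Returns {month: (average_sales, relative_intensity)}, where relative
    intensity is the month's average divided by the highest monthly average
    (an assumed normalisation; the dashboard does not state its own).
    """
    by_month = defaultdict(list)
    for _year, month, sales in records:
        by_month[month].append(sales)
    averages = {m: sum(v) / len(v) for m, v in by_month.items()}
    peak = max(averages.values())
    return {m: (avg, avg / peak) for m, avg in sorted(averages.items())}

# Toy data (made up): two years, two calendar months
toy = [(2014, 1, 10.0), (2015, 1, 30.0), (2014, 11, 80.0), (2015, 11, 120.0)]
profile = monthly_profile(toy)
print(profile[1])   # (20.0, 0.2)
print(profile[11])  # (100.0, 1.0)
```

With this normalisation the peak month always scores 1.0, so the bars read as a share of the strongest month.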
🗂
Shape
9,994
Rows × 21 Columns
✅
Clean
0
Missing Values
🔁
Dupes
0
Duplicates Removed
💵
Mean
$229.86
Avg Order Value
Total Sales by Category
Horizontal bar · Technology leads · matches eda_overview.png Cell 4
Total Sales by Region
West leads · 4 regions · matches eda_overview.png Cell 4
Sales vs Profit Scatter
Each dot = 1 order · Red line = zero profit boundary · matches eda_overview.png Cell 4
Summary Statistics
df[['Sales','Profit','Discount','Quantity']].describe() · Cell 3 exact values
Stat Sales Profit Discount Qty
Model Comparison: Exact Cell 7 Output
Chronological 80/20 split · Train: 28 months · Test: 8 months · Linear Regression wins on MAPE
Model                      MAE       RMSE      R²      MAPE
Linear Regression (BEST)   $12,293   $15,092   0.596   16.8%
Random Forest              $14,237   $16,902   0.490   19.7%
Gradient Boosting          $15,586   $16,591   0.510   22.9%
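For readers who want to reproduce the four metrics, the sketch below shows a chronological split and the standard formulas for MAE, RMSE, R² and MAPE. It is an illustration under stated assumptions rather than the notebook's code: chronological_split and regression_metrics are hypothetical helper names, the 36-month length is simply 28 + 8 from the subtitle, and the actual/predicted values are made up.

```python
import math

def chronological_split(rows, train_frac=0.8):
    """Split time-ordered rows without shuffling: earliest go to train, latest to test."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

def regression_metrics(actual, predicted):
    """Standard definitions of MAE, RMSE, R² and MAPE (MAPE in percent)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}

# 36 usable months (assumed: 28 train + 8 test, as in the subtitle)
train, test = chronological_split(list(range(36)))
print(len(train), len(test))  # 28 8

# Made-up values, only to exercise the formulas
m = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(round(m["MAPE"], 2), round(m["R2"], 2))  # 5.0 0.99
```

Splitting by position rather than at random keeps every test month later than every training month, which is what a forecasting evaluation needs.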
MAPE Comparison
Lower = more accurate · Linear Regression best at 16.8%
R² Score
Higher = better fit · Linear Regression leads at 0.596
Forecast vs Actual: All 3 Models
Test period (last 8 months) predictions vs actual sales · matches forecast_vs_actual.png Cell 9
🌲
RF
13
Features Engineered
🥇
#1
Lag_12
Top Predictor Feature
🌳
n=200
Trees
RF n_estimators
depth=6
max_depth
RF max_depth param
Feature Importance: Random Forest
Horizontal bar · Lag_12 dominates · matches feature_importance.png Cell 11
13 Engineered Features: Cell 6
All temporal features used in model training
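Lag features of the kind named on this tab can be built by pairing each month with earlier values of the same series. The sketch below is an assumption-laden illustration, not the Cell 6 code: add_lag_features is a hypothetical helper, Lag_1 is included only as an example alongside Lag_12, the remaining engineered features are not reproduced, and the series is a placeholder.

```python
def add_lag_features(sales, lags=(1, 12)):
    """Return one feature row per month for which every requested lag exists.

    sales: list of monthly totals in chronological order.
    Months earlier than the largest lag are dropped because their
    lagged values are undefined.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(sales)):
        row = {f"Lag_{k}": sales[t - k] for k in lags}
        row["target"] = sales[t]
        rows.append(row)
    return rows

# Placeholder series of 48 months (values are not real sales)
monthly = [float(i) for i in range(1, 49)]
rows = add_lag_features(monthly)
print(len(rows))  # 36
print(rows[0])    # {'Lag_1': 12.0, 'Lag_12': 1.0, 'target': 13.0}
```

Dropping the first 12 months leaves 36 rows from a 48-month history, which may be why the model comparison reports 28 training and 8 test months; that link is an inference, not something the dashboard states. In pandas the same shift is usually written with Series.shift.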
🎯
Best
Lin. Reg.
Model Used (lowest MAPE)
MAE
$12,293
Mean Absolute Error
RMSE
$15,092
Root Mean Sq Error
📊
R²
0.596
R-Squared Score
Residuals Over Time
Blue = over-predicted · Pink = under-predicted · matches residual_analysis.png Cell 10
Residual Distribution
Histogram of errors · matches residual_analysis.png Cell 10
Predicted vs Actual: Linear Regression
Scatter plot · Red dashed = perfect fit line · matches residual_analysis.png Cell 10
📈
Peak
Nov 2017
$118,448 Highest Month
📉
Low
Feb 2014
$4,520 Lowest Month
⚠️
Risk
Q1
Weakest Quarter
✅
Best
Q4
Strongest Every Year
01 / Inventory
📦 Stock Up for Q4
Sales peak in November–December every year. Increase inventory 15–20% before October to avoid holiday stockouts.
02 / Promotions
🎯 Q1 Discount Strategy
January–February are consistently slowest. Run discount campaigns to stimulate demand and clear slow-moving stock.
03 / Forecasting
🤖 Monitor Lag-12 Signal
Same month last year (Lag_12) is the strongest predictor (#1 feature importance at ~55%). Year-over-year seasonal patterns dominate.
04 / Supply Chain
🚚 Plan 3 Months Ahead
Use the 6-month forecast ($349K) for proactive procurement planning to reduce lead time pressure.
Business Report: Exact Cell 15 Output
All values taken directly from notebook execution output