Spreadsheet Modeling Econ/Nrem


Spreadsheet Modeling

Author

LoweMackenzie

Published

September 17, 2024

R Packages

These are the core packages used in R to recreate some of the basic data visualization examples we explored, such as bar charts, pie charts, and pivot-style summaries. Each of these tools plays a specific role, from data import and cleaning to aggregation and plotting.

While R provides a powerful and flexible framework for building reproducible analyses and visualizations, it's important to note the trade-offs in terms of efficiency and ease of use. For basic visual tasks, such as creating pie charts, bar graphs, or pivot tables, Excel is often faster and more intuitive, especially for users familiar with its drag-and-drop interface.

As visualizations become more complex or need to be automated and repeated (e.g., across multiple datasets or time periods), R offers a more scalable and programmatic solution. The ability to script your workflow means you can quickly regenerate visuals without manually repeating steps, a major advantage over Excel for larger projects or reporting pipelines.
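
For instance, here is a minimal sketch of that kind of scripted workflow, assuming several hypothetical monthly workbooks that share the same column layout as restaurant.xlsx:

library(readxl)
library(ggplot2)

# Hypothetical monthly workbooks with the same columns as restaurant.xlsx
files <- c("restaurant_jan.xlsx", "restaurant_feb.xlsx", "restaurant_mar.xlsx")

for (f in files) {
  dat <- read_excel(f)
  p <- ggplot(dat, aes(x = `Quality Rating`)) +
    geom_bar() +
    labs(title = paste("Restaurants by Quality Rating:", f))
  # Save one chart per workbook, e.g. restaurant_jan.png
  ggsave(sub("\\.xlsx$", ".png", f), plot = p, width = 6, height = 4)
}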

In short, Excel excels at speed and simplicity for one-off tasks, while R shines in flexibility, automation, and reproducibility for more advanced or recurring analysis and visualization tasks.

#install.packages("ggplot2")
#install.packages("dplyr")
#install.packages("tidyr")
#install.packages("readxl")
library(readxl)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.3.3

Data Visualization

Import Data

Before beginning your data visualization work in R, it’s important to ensure that R can locate and access the data file you want to work with. This typically involves setting the correct file path.

In the code below, you’ll need to replace the example file path with the actual path to your file. (MUST DO!)

For example, if your file is located on your Desktop in a folder named “ExcelFiles”, and the file is named “restaurant.xlsx”, the full file path might look like:

On Windows:
"C:/Users/YourName/Desktop/ExcelFiles/restaurant.xlsx"
On Mac:
"/Users/YourName/Desktop/ExcelFiles/restaurant.xlsx"

Be sure to:

  • Use forward slashes (/) instead of backslashes in the file path (on a Mac, right-clicking the file and holding the Option key lets you copy its full pathname)
  • Check that the file name is spelled correctly and includes the .xlsx extension
  • Ensure the file is not open in Excel when R is trying to read it

Once you’ve set the correct file path, you can use the readxl package to import the data into R for analysis and visualization.

If the readxl package is not already installed on your computer, you'll need to install it first using install.packages("readxl"). After it's installed, you load the package with library(readxl).
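
For example, here is a minimal import sketch; the path is a placeholder, so substitute the location of your own file:

# Run once if readxl is not yet installed
# install.packages("readxl")
library(readxl)

# Placeholder path: replace with the location of your own restaurant.xlsx
path <- "C:/Users/YourName/Desktop/ExcelFiles/restaurant.xlsx"
file.exists(path)   # should return TRUE before you try to read the file
restaurants <- read_excel(path)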

# Fix the script below to point to the file you downloaded from Lamaku

#restaurants <- read_excel("~/Chapter 03/restaurant.xlsx")
head(restaurants)
# A tibble: 6 × 4
  Restaurant `Quality Rating` `Meal Price ($)` `Wait Time (min)`
       <dbl> <chr>                       <dbl>             <dbl>
1          1 Good                           18                 5
2          2 Very Good                      22                 6
3          3 Good                           28                 1
4          4 Excellent                      38                74
5          5 Very Good                      33                 6
6          6 Good                           28                 5

Another Way to Import Data

Files pane (bottom-right of RStudio):

Navigate through the folders to where the data file is stored, then click on the file to reveal the import option.

Then the following screen will pop up.

Import the data and you will notice that the import code runs in the console. Copy and paste it for later use, or at least take note of it, because it shows what the manual import code should look like.
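
The import code echoed to the console typically looks something like the snippet below (the path here is only an example):

library(readxl)
restaurant <- read_excel("~/Desktop/ExcelFiles/restaurant.xlsx")
View(restaurant)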

Graphs

Pie Graph

# Summarize number of restaurants by quality rating
restaurant_counts <- restaurants %>%
  group_by(`Quality Rating`) %>%
  summarise(Count = n(), .groups = "drop")

# Create pie chart
ggplot(restaurant_counts, aes(x = "", y = Count, fill = `Quality Rating`)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  theme_void() +  # removes axes and background
  labs(title = "Number of Restaurants by Quality Rating",
       fill = "Quality Rating") +
  theme(plot.title = element_text(hjust = 0.5))

Bar Graph

# Create bar plot
ggplot(restaurant_counts, aes(x = `Quality Rating`, y = Count, fill = `Quality Rating`)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Number of Restaurants by Quality Rating",
    x = "Quality Rating",
    y = "Number of Restaurants"
  ) +
  theme_minimal() +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5))

Tables

Pivot Table to Dplyr

This is what we created in Excel using a pivot table.

# Step 1: Bin the Meal Price into $10 bins
restaurants_binned <- restaurants %>%
  mutate(
    PriceBin = cut(
      `Meal Price ($)`,
      breaks = seq(0, max(`Meal Price ($)`) + 10, by = 10),
      include.lowest = TRUE,
      right = FALSE  # e.g., [10,20)
    )
  )


# Step 2: Count of restaurants by Quality Rating and Price Bin
pivot_table <- restaurants_binned %>%
  group_by(`Quality Rating`, PriceBin) %>%
  summarise(RestaurantCount = n(), .groups = "drop") %>%
  pivot_wider(names_from = PriceBin, values_from = RestaurantCount, values_fill = 0)

# View the result
print(pivot_table, width = Inf)
# A tibble: 3 × 5
  `Quality Rating` `[10,20)` `[20,30)` `[30,40)` `[40,50]`
  <chr>                <int>     <int>     <int>     <int>
1 Excellent                2        14        28        22
2 Good                    42        40         2         0
3 Very Good               34        64        46         6

Calculation in R

Median

median(restaurants$`Meal Price ($)`)
[1] 25

Mean

mean(restaurants$`Meal Price ($)`)
[1] 25.89667

Z-score

A z-score measures how many standard deviations a value is from the mean.

# Calculate z-scores for meal price
restaurants$Meal_Z <- scale(restaurants$`Meal Price ($)`)
head(restaurants$Meal_Z)
           [,1]
[1,] -0.8502343
[2,] -0.4195542
[3,]  0.2264659
[4,]  1.3031661
[5,]  0.7648160
[6,]  0.2264659
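
For reference, scale() with its defaults is equivalent to subtracting the mean and dividing by the sample standard deviation; here is a minimal sketch of the manual calculation:

# Manual z-score: (value - mean) / standard deviation
meal_price <- restaurants$`Meal Price ($)`
meal_z_manual <- (meal_price - mean(meal_price)) / sd(meal_price)
head(meal_z_manual)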

Percentile

# 90th percentile of meal price
quantile(restaurants$`Meal Price ($)`, probs = 0.90)
90% 
 38 

Quantile

# Quartiles of meal price (0th, 25th, 50th, 75th, and 100th percentiles)
quantile(restaurants$`Meal Price ($)`, probs = c(0, 0.25, 0.5, 0.75, 1))
  0%  25%  50%  75% 100% 
  10   19   25   32   48 

Correlation

# Correlation between meal price and wait time
cor(restaurants$`Meal Price ($)`, restaurants$`Wait Time (min)`)
[1] 0.4628228
Source Code
---
title: "Spreadsheet Modeling"
author: "LoweMackenzie"
date: 2024-09-17

format:
  html:
    code-fold: false        # Set to true to fold code into a dropdown
    code-tools: true       # (Optional) Adds buttons like "Show Code"
    code-summary: "Show code"  # (Optional) Custom label for dropdown
    toc: true
    toc-location: left
    page-layout: full
editor: visual
---

## R Packages

These are the core packages used in R to recreate some of the basic data visualization examples we explored, such as bar charts, pie charts, and pivot-style summaries. Each of these tools plays a specific role, from data import and cleaning to aggregation and plotting.

While R provides a powerful and flexible framework for building reproducible analyses and visualizations, it's important to note the trade-offs in terms of efficiency and ease of use. For basic visual tasks, such as creating pie charts, bar graphs, or pivot tables, Excel is often faster and more intuitive, especially for users familiar with its drag-and-drop interface.

As visualizations become more complex or need to be automated and repeated (e.g., across multiple datasets or time periods), R offers a more scalable and programmatic solution. The ability to script your workflow means you can quickly regenerate visuals without manually repeating steps, a major advantage over Excel for larger projects or reporting pipelines.

In short, Excel excels at speed and simplicity for one-off tasks, while R shines in flexibility, automation, and reproducibility for more advanced or recurring analysis and visualization tasks.

```{r}
#install.packages("ggplot2")
#install.packages("dplyr")
#install.packages("tidyr")
#install.packages("readxl")
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)


```

# Data Visualization

## Import Data

Before beginning your data visualization work in R, it's important to ensure that R can locate and access the data file you want to work with. This typically involves setting the correct file path.

In the code below, you'll need to replace the example file path with the actual path to your file. (MUST DO!)

For example, if your file is located on your Desktop in a folder named "ExcelFiles", and the file is named "restaurant.xlsx", the full file path might look like:

```         
On Windows:
"C:/Users/YourName/Desktop/ExcelFiles/restaurant.xlsx"
```

```         
On Mac:
"/Users/YourName/Desktop/ExcelFiles/restaurant.xlsx"
```

Be sure to:

-   Use forward slashes (/) instead of backslashes in the file path (on a Mac, right-clicking the file and holding the *Option* key lets you copy its full pathname)
-   Check that the file name is spelled correctly and includes the .xlsx extension
-   Ensure the file is not open in Excel when R is trying to read it

Once you've set the correct file path, you can use the readxl package to import the data into R for analysis and visualization.

If the readxl package is not already installed on your computer, you'll need to install it first using install.packages("readxl"). After it's installed, you load the package with library(readxl).

```{r include=FALSE}
restaurants <- read_excel("/Users/ashleylowe/Desktop/SpreadsheetModeling429/ExcelFiles_BusinessAnalytics/Chapter 03/restaurant.xlsx")

```

```{r}
# Fix the script below to point to the file you downloaded from Lamaku

#restaurants <- read_excel("~/Chapter 03/restaurant.xlsx")
head(restaurants)
```

## Another Way to Import Data

**Files pane (bottom-right of RStudio):**

Navigate through the folders to where the data file is stored, then click on the file to reveal the import option.

![](images/Screenshot%202025-09-19%20at%202.09.22%20PM.png)

Then the following screen will pop up.

![](images/Screenshot%202025-09-19%20at%202.09.33%20PM.png)

Import the data and you will notice that the import code runs in the console. Copy and paste it for later use, or at least take note of it, because it shows what the manual import code should look like.

## Graphs

### Pie Graph

```{r}
# Summarize number of restaurants by quality rating
restaurant_counts <- restaurants %>%
  group_by(`Quality Rating`) %>%
  summarise(Count = n(), .groups = "drop")

# Create pie chart
ggplot(restaurant_counts, aes(x = "", y = Count, fill = `Quality Rating`)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  theme_void() +  # removes axes and background
  labs(title = "Number of Restaurants by Quality Rating",
       fill = "Quality Rating") +
  theme(plot.title = element_text(hjust = 0.5))



```

### Bar Graph

```{r}
# Create bar plot
ggplot(restaurant_counts, aes(x = `Quality Rating`, y = Count, fill = `Quality Rating`)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Number of Restaurants by Quality Rating",
    x = "Quality Rating",
    y = "Number of Restaurants"
  ) +
  theme_minimal() +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5))

```

## Tables

### Pivot Table to Dplyr

This is what we created in Excel using a pivot table.

![](images/Screenshot%202025-09-16%20at%203.12.22%20PM.png)

```{r}
# Step 1: Bin the Meal Price into $10 bins
restaurants_binned <- restaurants %>%
  mutate(
    PriceBin = cut(
      `Meal Price ($)`,
      breaks = seq(0, max(`Meal Price ($)`) + 10, by = 10),
      include.lowest = TRUE,
      right = FALSE  # e.g., [10,20)
    )
  )


# Step 2: Count of restaurants by Quality Rating and Price Bin
pivot_table <- restaurants_binned %>%
  group_by(`Quality Rating`, PriceBin) %>%
  summarise(RestaurantCount = n(), .groups = "drop") %>%
  pivot_wider(names_from = PriceBin, values_from = RestaurantCount, values_fill = 0)

# View the result
print(pivot_table, width = Inf)


```

# Calculation in R

## Median

```{r}
median(restaurants$`Meal Price ($)`)
```

## Mean

```{r}
mean(restaurants$`Meal Price ($)`)
```

## Z-score

A z-score measures how many standard deviations a value is from the mean.

```{r}
# Calculate z-scores for meal price
restaurants$Meal_Z <- scale(restaurants$`Meal Price ($)`)
head(restaurants$Meal_Z)

```

## Percentile

```{r}
# 90th percentile of meal price
quantile(restaurants$`Meal Price ($)`, probs = 0.90)

```

## Quantile

```{r}
# Quartiles of meal price (0th, 25th, 50th, 75th, and 100th percentiles)
quantile(restaurants$`Meal Price ($)`, probs = c(0, 0.25, 0.5, 0.75, 1))

```

## Correlation

```{r}
# Correlation between meal price and wait time
cor(restaurants$`Meal Price ($)`, restaurants$`Wait Time (min)`)

```