Step 1: Data preparation – significant changes
How you define ‘significant’ really depends on your own operational context, and there is no single right answer. In this example, I consider a deviation of more than 2 standard deviations (sd) from the weekly average to be significant. The average is taken over the past year to smooth out weekly fluctuations.
# Flag each product whose recent volume is more than 2 sd from its average
dummy_data_2_flagged <- dummy_data_2 %>%
  mutate(Flag = case_when(
    Volume > average + 2*sd ~ 'Higher',
    Volume < average - 2*sd ~ 'Lower',
    TRUE ~ 'No significant deviation'))
With the tidyverse, we can easily calculate the deviations for each product.
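The flagging code assumes that dummy_data_2 already carries an average and an sd column for each product. A minimal sketch of how those columns could be computed follows. The data frames weekly_history and recent_week, and all the numbers in them, are hypothetical and only serve as illustration.

```r
library(dplyr)

# Hypothetical weekly history: one row per Item per week (illustrative values only)
weekly_history <- data.frame(
  Item   = rep(c("A", "B"), each = 4),
  Volume = c(10, 12, 11, 13, 100, 90, 110, 95)
)

# Weekly average and standard deviation for each product
item_stats <- weekly_history %>%
  group_by(Item) %>%
  summarise(average = mean(Volume),
            sd = sd(Volume),
            .groups = "drop")

# Hypothetical volumes for the most recent week
recent_week <- data.frame(Item = c("A", "B"), Volume = c(20, 96))

# Attach the statistics, then flag deviations of more than 2 sd
recent_flagged <- recent_week %>%
  left_join(item_stats, by = "Item") %>%
  mutate(Flag = case_when(
    Volume > average + 2 * sd ~ "Higher",
    Volume < average - 2 * sd ~ "Lower",
    TRUE ~ "No significant deviation"))
```

In this toy data, item A sells well above its usual range and is flagged 'Higher', while item B stays within 2 sd of its average.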
Step 2: Use the color of the bars to mark the deviations
As in Figure 1, we use geom_bar() to create the bar chart. To colour the bars according to the variable Flag, we specify fill = Flag under aes. In contrast to Figure 1, we do not specify a position, so the bars keep the default stacked position.
Tip: I’ve set y = reorder(Item, Volume) so that the bars are arranged in descending order of volume.
# standard ggplot bar chart
recent_volume_bars <- geom_bar(aes(y = reorder(Item, Volume),
x = Volume,
fill = Flag),
stat = 'identity',
width = 0.7)
## set colours for the bars based on deviation
flag_colours <- c('Higher' = '#2c4b01',
                  'Lower' = '#8b0000',
                  'No significant deviation' = '#23395d')
fill_colours_flag <- scale_fill_manual(values = flag_colours)
[Extra] Add labels to the bars to show last week’s volumes
label_on_bars <- geom_text(aes(label = Volume,
y = reorder(Item, Volume),
x = Volume),
position = position_stack(vjust=0.5),
color = 'white', size=6)
Step 3: Use a different geometry to indicate the weekly average volume
With geom_point, we plot points that reflect the weekly average. This gives the reader a good point of reference for the deviations!
average_point <- geom_point(aes(y = reorder(Item, Volume),
x = average),
stat = 'identity',
size = 6, alpha = 0.7,
fill = 'white', stroke = 1,
shape = 21)
Step 4: Add a legend for the points
There are now several elements in our chart, and we need a way to explain what the points on the chart mean.
Specifying fill = Flag under the aesthetics (aes) in Step 2 creates a corresponding legend that explains what each colour means. For my points, however, I don’t want the colour to vary. So instead of pointing colour to a field, which would make the colour of the points vary by that field, I pass the text I want to display in the legend. Adding colour = "Average weekly volume over past year" under aes in Step 3 therefore produces a legend entry with that text.
A couple of clean-ups are required because we are using the colour parameter for this ‘unintended’ purpose:
– Add scale_colour_manual(values = c('black')) to specify the colour you want for the circle.
– Add colour = NULL under labs() in the final chart so that the word ‘colour’ does not appear.
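# Add all the elements together (the opening ggplot() call is reconstructed from the steps above)
ggplot(dummy_data_2_flagged) +
  recent_volume_bars + fill_colours_flag + label_on_bars + average_point +
  scale_colour_manual(values = c('black')) +
  theme_deviation_chart + # define theme() as you like
  labs(title = paste('Volume sold in past week'),
       fill = "Deviation in recent week:",
       colour = NULL) # hide the word 'colour' in the legend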
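The legend trick from Step 4 can be tried in isolation. The sketch below is a stripped-down illustration, not the full chart: the data frame toy and its values are made up, and the bars are left unflagged so that only the constant colour aesthetic generates a legend.

```r
library(ggplot2)

# Hypothetical data: recent volume and weekly average per item (illustrative values only)
toy <- data.frame(Item = c("A", "B", "C"),
                  Volume = c(30, 20, 10),
                  average = c(25, 22, 9))

legend_demo <- ggplot(toy) +
  geom_bar(aes(y = reorder(Item, Volume), x = Volume),
           stat = "identity", width = 0.7) +
  # A constant string inside aes() creates a one-entry colour legend
  geom_point(aes(y = reorder(Item, Volume), x = average,
                 colour = "Average weekly volume over past year"),
             size = 6, fill = "white", stroke = 1, shape = 21) +
  scale_colour_manual(values = c("black")) + # colour of the circle outline
  labs(colour = NULL)                        # drop the legend title 'colour'
```

Printing legend_demo should show a single legend key, a white circle with a black outline, next to the text passed to colour.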
[For interested readers] You may be wondering why colour specifically had to be used, and not another aesthetic such as fill, size or alpha.
We could use the fill aesthetic instead of colour and specify scale_fill_manual(values = c('white')). However, fill is already used in our bars to indicate the deviations, and that would interfere with the ‘clean-up’. I have tried to use size but failed to ‘clean up’ the legend. I would love to hear if anyone can find a solution to this!
TL;DR: Combined bar and point chart
– Take advantage of the different elements of ggplot to tell more with one chart
– Add a legend for a geom to ggplot by using the geom’s aesthetics.
The year-on-year chart is a staple of the analyst’s toolkit, as it can reflect both recent trends and seasonality. A basic year-on-year chart is very simple in ggplot: by specifying the ‘group’ variable in geom_line, we immediately get a separate line for each year.
But with a little data preparation and a few more ggplot elements, we can improve the chart by:
(1) adding date periods to the x-axis, and
(2) shading a period of interest.
Step 1: Prepare data – formulate time periods for each week in 2021
Week numbers are not the most intuitive, so adding dates for each week in 2021 gives readers a better idea of the trends over time. To this end, we create an R data frame that maps each date in 2021 to a week of the year. In this example, the week starts on Saturday and ends on Friday.
# create a column for each date in 2021
dates2021 <- as.data.frame(x=seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by="days"))
colnames(dates2021) <- 'Date'

first_sat_2021 <- as.Date('2021-01-02') # date to start year

# Match each date to a week of 2021 (1 - 52)
weeks_2021 <- dates2021 %>%
mutate(Week = floor(as.numeric(difftime(Date, first_sat_2021, units = "weeks")))+1,
Year = 2021) %>%
filter(Week > 0)
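As a quick check of the week formula, the same calculation can be wrapped in a small helper and applied to a few dates. The function name week_of_2021 is hypothetical and used only for this illustration.

```r
first_sat_2021 <- as.Date("2021-01-02") # first Saturday of 2021

# Week number of a date, counting 7-day blocks from the first Saturday
week_of_2021 <- function(date) {
  floor(as.numeric(difftime(date, first_sat_2021, units = "weeks"))) + 1
}

week_of_2021(as.Date("2021-01-01")) # week 0: before the first Saturday, filtered out
week_of_2021(as.Date("2021-01-08")) # Friday, still week 1
week_of_2021(as.Date("2021-01-09")) # Saturday, start of week 2
week_of_2021(as.Date("2021-12-31")) # last day of the year, week 52
```

Because 1 January 2021 is a Friday, it falls before the first Saturday and gets week 0, which is why the filter(Week > 0) step removes it.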
Once each date has been mapped to a week, use group_by and summarise to format a date range for each week of the year.
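# First and last date of each week, formatted as a label such as "02 Jan - 08 Jan"
week_periods_2021 <- weeks_2021 %>%
  group_by(Week) %>%
  summarise(week_start_date = first(Date),
            week_end_date = last(Date)) %>%
  mutate(week_period = paste(
           format(as.Date(week_start_date), "%d %b"),
           format(as.Date(week_end_date), "%d %b"),
           sep = " - "),
         Year = 2021)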
Step 2: Data Preparation – Obtain volumes per week
Assuming you have the number of products sold each day, we can use summarise to easily get the volume sold each week.
# Dataset with 2 cols: Date & Volume
dummy_data_vol <- read_csv('<YOUR DATASET>') %>%
  mutate(Date = as.Date(Date, format="%d/%m/%Y"))

# Date to week mapping
# (weeks_2020 is assumed to be built the same way as weeks_2021 in Step 1)
weeks_2020_2021 <- rbind(weeks_2020, weeks_2021)

# Join the date to week mapping to your raw data, and calculate volume each week
volume_each_week <- dummy_data_vol %>%
right_join(weeks_2020_2021, by = "Date") %>%
group_by(Year, Week) %>%
summarise(volume = sum(Volume)) %>%
mutate(Year = as.factor(Year))
The code above gives us the first three columns. Join this to the week periods formatted in Step 1 to add a 4th column.
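# Add the 2021 date ranges to each week
# (the join is reconstructed from the text; week_period stays a text label such as "02 Jan - 08 Jan")
volume_each_week_dates <- volume_each_week %>%
  left_join(select(week_periods_2021, Week, week_period), by = "Week")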
Step 3: Use the grouping variable to draw a line for each year
As mentioned above, the group aesthetic in geom_line() specifies the field for which you want to draw multiple lines. For a year-on-year plot, this field is Year. In addition, colour = Year colours the lines differently. You can also set linetype = Year so that the lines are distinguished by pattern (e.g. solid, dashed, dotted).
y_min = 40000
y_max = 80000

basic_plot <- ggplot(volume_each_week_dates)+
geom_line(aes(x = Week, y = volume,
group = Year, colour = Year)) +
scale_color_manual(values = c('2020' = "#95C8D8",
'2021' = "salmon4"))+
labs(title = paste('Total Weekly Volume'),
y = 'Weekly Volume',
x = '2021 Weeks')+
ylim(y_min, y_max) +
scale_x_continuous(breaks = volume_each_week_dates$Week,
labels = volume_each_week_dates$week_period,
guide = guide_axis(check.overlap=TRUE))
In scale_x_continuous, we set labels to the week periods created in Step 1, which gives us the corresponding date-based x-axis. Dates make the plot much more intuitive for the reader than weeks 1-52!
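The linetype option mentioned in this step can be sketched as follows. The data frame toy_years and its values are hypothetical, and the choice of a dashed line for 2020 and a solid line for 2021 is only one possible styling.

```r
library(ggplot2)

# Hypothetical weekly volumes for two years (illustrative values only)
toy_years <- data.frame(
  Week = rep(1:4, times = 2),
  Year = factor(rep(c(2020, 2021), each = 4)),
  volume = c(50, 55, 53, 60, 52, 58, 61, 59) * 1000
)

linetype_plot <- ggplot(toy_years) +
  # group draws one line per year; colour and linetype both vary by year
  geom_line(aes(x = Week, y = volume, group = Year,
                colour = Year, linetype = Year)) +
  scale_linetype_manual(values = c("2020" = "dashed", "2021" = "solid"))
```

Because colour and linetype map to the same field, ggplot merges them into a single legend, which keeps the chart readable even when printed in greyscale.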
Step 4: Highlight interesting time periods from the chart
Finally, to improve the basic chart, we can add a shaded rectangle and a corresponding label to the ggplot to highlight interesting time periods and grab the reader’s attention. This is again easy to do with ggplot’s additive structure.
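One way to add such a shaded period is with annotate(), sketched below on made-up data. This is not necessarily the approach used for the original chart. The data frame toy_weeks, the shaded weeks 10 to 14, and the label text are all assumptions for illustration.

```r
library(ggplot2)

# Hypothetical weekly volumes for two years (illustrative values only)
set.seed(1)
toy_weeks <- data.frame(
  Week = rep(1:52, times = 2),
  Year = factor(rep(c(2020, 2021), each = 52)),
  volume = round(runif(104, 45000, 75000))
)

# Hypothetical period of interest: weeks 10 to 14
shade_start <- 10
shade_end <- 14

shaded_plot <- ggplot(toy_weeks) +
  geom_line(aes(x = Week, y = volume, group = Year, colour = Year)) +
  # Shaded rectangle spanning the full height of the panel
  annotate("rect", xmin = shade_start, xmax = shade_end,
           ymin = -Inf, ymax = Inf, fill = "grey50", alpha = 0.2) +
  # Label placed above the middle of the shaded period
  annotate("text", x = (shade_start + shade_end) / 2, y = 78000,
           label = "Period of interest")
```

Using annotate() rather than a geom with a data frame keeps the rectangle out of the legend, since its fill is a fixed value and not an aesthetic mapping.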