Data Analyst • Business Intelligence Expert • Research Scientist
For almost a decade, the forecast package has been a
rock-solid framework for time series forecasting. Within the
last year or so, however, its official successor, fable, has been
released; it follows tidy methods rather than base R.
More recently, modeltime was released, and it also
follows tidy methods. However, it is strictly a modeling package. For
data manipulation and visualization we will use timetk,
which is written by the same author as modeltime.
The following is a code comparison of various time series
visualizations between these frameworks: fpp2,
fpp3 and timetk.
# Load Libraries
# Load libraries
library(fpp2) # An older forecasting framework
library(fpp3) # A newer tidy forecasting framework
library(timetk) # An even newer tidy forecasting framework
library(tidyverse) # Collection of data manipulation tools
library(tidyquant) # Business Science ggplot theme
library(cowplot) # A ggplot add-on for arranging plots
forecast & fpp2, fable & fpp3, timetk & modeltime
We will use a dataset containing quarterly production values of certain commodities in Australia.
# Quarterly Australian production data as tibble
aus <- tsibbledata::aus_production %>% as_tibble()
# Check structure
aus %>% str()
## tibble [218 × 7] (S3: tbl_df/tbl/data.frame)
## $ Quarter : qtr [1:218] 1956 Q1, 1956 Q2, 1956 Q3, 1956 Q4, 1957 Q1, 1957 Q2, 1957...
## $ Beer : num [1:218] 284 213 227 308 262 228 236 320 272 233 ...
## $ Tobacco : num [1:218] 5225 5178 5297 5681 5577 ...
## $ Bricks : num [1:218] 189 204 208 197 187 214 227 222 199 229 ...
## $ Cement : num [1:218] 465 532 561 570 529 604 603 582 554 620 ...
## $ Electricity: num [1:218] 3923 4436 4806 4418 4339 ...
## $ Gas : num [1:218] 5 6 7 6 5 7 7 6 5 7 ...
# Convert tibble to time series object
aus_prod_ts <- ts(aus[, 2:7], # Choose columns
start = c(1956, 1), # Choose start date
end = c(2010, 2), # Choose end date
frequency = 4) # Choose frequency per yr
# Check it out
aus_prod_ts %>% tail()
## Beer Tobacco Bricks Cement Electricity Gas
## 2009 Q1 415 NA NA 1963 58368 196
## 2009 Q2 398 NA NA 2160 57471 238
## 2009 Q3 419 NA NA 2325 58394 252
## 2009 Q4 488 NA NA 2273 57336 210
## 2010 Q1 414 NA NA 1904 58309 205
## 2010 Q2 374 NA NA 2401 58041 236
# 3 fpp3 Method: From ts to tsibble
# Convert ts to tsibble and keep wide format
aus_prod_tbl_wide <- aus_prod_ts %>% # TS object
as_tsibble(index = "index", # Set index column
pivot_longer = FALSE) # Wide format
## 3.2 Pivot Long
# Convert ts to tsibble and pivot to long format
aus_prod_tbl_long <- aus_prod_ts %>% # TS object
as_tsibble(index = "index", pivot_longer = TRUE) # Long format
# 4 timetk Method: From tsibble/ts to tibble
## 4.1 Pivot Wide
# Convert tsibble to tibble, keep wide format
aus <- tsibbledata::aus_production %>%
tk_tbl() %>%
mutate(Quarter = as.Date(Quarter)) # Quarter index to Date (first day of each quarter)
# Quarterly Australian production data to long format
aus_long <- aus %>%
rename(date = Quarter) %>%
pivot_longer(
cols = c("Beer","Tobacco","Bricks",
"Cement","Electricity","Gas"))
When analyzing time series plots, look for the following patterns:
- Trend: A long-term increase or decrease in the data; a “changing direction”.
- Seasonality: A seasonal pattern of a fixed and known period. If the frequency is unchanging and associated with some aspect of the calendar, then the pattern is seasonal.
- Cycle: A rise and fall pattern not of a fixed frequency. If the fluctuations are not of a fixed frequency then they are cyclic.
- Seasonal vs Cyclic: Cyclic patterns are longer and more variable than seasonal patterns in general.
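To make the trend and seasonality definitions concrete, here is a minimal sketch in base R. The series is synthetic (the trend, seasonal pattern, and noise values are assumptions chosen for illustration, not taken from aus_production), and classical decomposition via `stats::decompose()` is just one of several ways to separate the components.

```r
# Synthetic quarterly series (assumed values, for illustration only):
# linear trend + fixed quarterly pattern + small noise
set.seed(1)
n_years  <- 10
trend    <- seq(100, 139, length.out = n_years * 4)
seasonal <- rep(c(10, -5, -10, 5), times = n_years)
noise    <- rnorm(n_years * 4, sd = 1)
y <- ts(trend + seasonal + noise, start = c(2000, 1), frequency = 4)

# Classical additive decomposition from base R's stats package
parts <- decompose(y, type = "additive")

# Estimated seasonal effect for Q1..Q4 (the same four values repeat every year)
round(parts$figure, 1)

# Trend estimate (centred moving average, so it is NA at both ends)
trend_est <- na.omit(parts$trend)
head(trend_est)
```

Because the seasonal pattern has a fixed period of four quarters, the decomposition can estimate one effect per quarter. A cyclic pattern has no fixed period, so it would not be captured by `parts$figure`.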
# Using fpp2
aus_prod_ts %>% # TS object
autoplot(facets=FALSE) # No facetting
Quarterly production of selected quantities using fpp2
# Using fpp3
aus_prod_tbl_long %>% # Data in long format
autoplot(value)
Quarterly production of selected quantities using fpp3
Plotting multiple series on the same axes has not been implemented
in timetk, so we use ggplot instead.
# Using ggplot
aus_long %>%
ggplot(aes(date, value, group = name, color = name)) +
geom_line()
Quarterly production of selected quantities using ggplot
# Using fpp2
aus_prod_ts %>%
autoplot(facets=TRUE) # With facetting
Quarterly production of selected quantities using fpp2
# Using fpp3
aus_prod_tbl_long %>%
ggplot(aes(x = index, y = value, group = key)) +
geom_line() +
facet_grid(vars(key), scales = "free_y") # With facetting
Quarterly production of selected quantities using fpp3
# Using timetk
aus_long %>%
plot_time_series(
.date_var = date,
.value = value,
.facet_vars = c(name), # Group by these columns
.color_var = name,
.interactive = FALSE,
.legend_show = FALSE
)
Quarterly production of selected quantities using timetk
Use seasonal plots for identifying time periods in which the patterns change.
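As a rough illustration of what a seasonal plot draws, the sketch below reshapes a monthly series into one row per year and one column per month, so that each row becomes one line in the plot. The data are synthetic (an assumed trend with a December peak), and `by_year` and `peak_month` are hypothetical names used only for this example.

```r
# Synthetic monthly series (assumed values): 5 years, upward trend,
# with a seasonal rise from October to a peak in December
set.seed(2)
months_total <- 5 * 12
y <- ts(
  10 + 0.1 * seq_len(months_total) +
    rep(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4), times = 5) +
    rnorm(months_total, sd = 0.2),
  start = c(2015, 1), frequency = 12
)

# One row per year, one column per month
by_year <- matrix(y, ncol = 12, byrow = TRUE,
                  dimnames = list(as.character(2015:2019), month.abb))

# Month with the highest value in each year
peak_month <- month.abb[apply(by_year, 1, which.max)]
peak_month

if (interactive()) {
  # Draw one line per year against month of the year
  matplot(t(by_year), type = "l", lty = 1, xaxt = "n",
          xlab = "Month", ylab = "Value")
  axis(1, at = 1:12, labels = month.abb)
}
```

If the peak month shifted from year to year, the lines would cross rather than stack, which is the kind of change in pattern a seasonal plot is meant to reveal.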
## 6.1 fpp2 Method: Plot Individual Seasons
# Monthly plot of anti-diabetic scripts in Australia
a1 <- a10 %>%
autoplot()
# Seasonal plot
a2 <- a10 %>%
ggseasonplot(year.labels.left = TRUE, # Add labels
year.labels = TRUE)
# Arrangement of plots
plot_grid(a1, a2, ncol=1, rel_heights = c(1, 1.5))
Monthly plot of anti-diabetic scripts
# Monthly plot of anti-diabetic scripts in Australia
a1 <- a10 %>%
as_tsibble() %>%
autoplot(value)
# Seasonal plot
a2 <- a10 %>%
as_tsibble() %>%
gg_season(value, labels="both") # Add labels
# Arrangement of plots
plot_grid(a1, a2, ncol=1, rel_heights = c(1, 1.5))
Monthly plot of anti-diabetic scripts
Note that seasonal plots have not been implemented in
timetk, so we use ggplot to build one:
# Convert ts to tibble
a10_tbl <- fpp2::a10 %>%
tk_tbl()
# Monthly plot of anti-diabetic scripts in Australia
a1 <- a10_tbl %>%
plot_time_series(
.date_var = index,
.value = value,
.smooth = TRUE,
.interactive = FALSE,
.title = "Monthly anti-diabetic scripts in Australia"
)
# New time-based features to group by
a10_tbl_add <- a10_tbl %>%
mutate(
month = factor(month(index, label = TRUE)), # Plot this
year = factor(year(index)) # Grouped on y-axis
)
# Seasonal plot
a2 <- a10_tbl_add %>%
ggplot(aes(x = month, y = value,
group = year, color = year)) +
geom_line() +
geom_text(
data = a10_tbl_add %>% filter(month == min(month)),
aes(label = year, x = month, y = value),
nudge_x = -0.3) +
geom_text(
data = a10_tbl_add %>% filter(month == max(month)),
aes(label = year, x = month, y = value),
nudge_x = 0.3) +
guides(color = "none") # Hide legend; color = FALSE is deprecated in ggplot2
# Arrangement of plots
plot_grid(a1, a2, ncol=1, rel_heights = c(1, 1.5))
Monthly plot of anti-diabetic scripts
Use lag plots to check for randomness.
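A lag plot is a scatterplot of each value against the value k steps earlier, so "checking for randomness" amounts to checking whether those pairs are correlated. The following base R sketch compares two synthetic series (assumed for illustration); `lag_cor` is a hypothetical helper, not a function from any of the three frameworks.

```r
set.seed(3)
n <- 200
white_noise <- rnorm(n)          # no dependence on the past
random_walk <- cumsum(rnorm(n))  # each value builds on the previous one

# Correlation between y[t] and y[t - k]: the relationship a lag plot displays
lag_cor <- function(y, k) {
  n <- length(y)
  cor(y[(k + 1):n], y[1:(n - k)])
}

round(sapply(1:4, lag_cor, y = white_noise), 2)  # close to zero: looks random
round(sapply(1:4, lag_cor, y = random_walk), 2)  # close to one: strong dependence

if (interactive()) {
  # Base R lag plot: a shapeless cloud of points suggests randomness
  lag.plot(white_noise, lags = 4, do.lines = FALSE)
}
```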
# Plot of non-seasonal oil production in Saudi Arabia
o1 <- fpp2::oil %>%
autoplot()
# Lag plot of non-seasonal oil production
o2 <- gglagplot(oil, do.lines = FALSE)
# Plot both
plot_grid(o1, o2, ncol=1, rel_heights = c(1,2))
Annual oil production in Saudi Arabia
# Plot of non-seasonal oil production
o1 <- oil %>%
as_tsibble() %>%
autoplot(value)
# Lag plot of non-seasonal oil production
o2 <- oil %>%
as_tsibble() %>%
gg_lag(y=value, geom = "point")
# Plot it
plot_grid(o1, o2, ncol=1, rel_heights = c(1,2))
Annual oil production in Saudi Arabia
# Convert to tibble and create lag columns
oil_lag_long <- oil %>%
tk_tbl(rename_index = "year") %>%
tk_augment_lags( # Add 9 lag columns of data
.value = value,
.names = "auto",
.lags = 1:9) %>%
pivot_longer( # Pivot from wide to long
names_to = "lag_id",
values_to = "lag_value",
cols = value_lag1:value_lag9) # Exclude year & value
Then create the plots:
# Time series plot
o1 <- oil %>%
tk_tbl(rename_index = "year") %>%
mutate(year = ymd(year, truncated = 2L)) %>%
plot_time_series(
.date_var = year,
.value = value,
.interactive = FALSE)
# timetk Method: Plot Multiple Lags
o2 <- oil_lag_long %>%
plot_time_series(
.date_var = value, # Use value instead of date
.value = lag_value, # Use lag value to plot against
.facet_vars = lag_id, # Facet by lag number
.facet_ncol = 3,
.interactive = FALSE,
.smooth = FALSE,
.line_alpha = 0,
.legend_show = FALSE,
.facet_scales = "fixed"
) +
geom_point(aes(colour = lag_id)) +
geom_abline(colour = "gray", linetype = "dashed")
# Plot it
plot_grid(o1, o2, ncol=1, rel_heights = c(1,2))
Annual oil production in Saudi Arabia
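The autocorrelation function (ACF) measures the linear relationship between a time series and lagged values of itself. The partial autocorrelation function (PACF) measures the linear relationship between the series and a given lag after the effects of the shorter lags in between have been removed, that is, the correlation of the residuals.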
ACF
- Visualizes how much the most recent value of the series is correlated with past values of the series (lags)
- If the data has a trend, the autocorrelations for small lags tend to be large and positive, because observations nearby in time are also nearby in size
- If the data are seasonal, the autocorrelations are larger at seasonal lags (multiples of the seasonal frequency) than at other lags
PACF
- Visualizes whether certain lags are good for modeling or not; useful for data with a seasonal pattern
- Removes the dependence of each lag on the shorter lags by using the correlations of the residuals
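To show what these two functions compute, here is a small base R sketch. It uses a simulated AR(1) series (an assumption for illustration, not the oil data), and `acf_manual` is a hypothetical helper that implements the usual sample autocorrelation formula so it can be compared with `stats::acf()`.

```r
set.seed(42)
# Simulated AR(1) series: each value depends directly only on the previous one
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 500))

# Sample autocorrelation at lag k:
# sum over t of (y[t] - mean) * (y[t - k] - mean), divided by sum of (y[t] - mean)^2
acf_manual <- function(y, k) {
  n <- length(y)
  y_bar <- mean(y)
  sum((y[(k + 1):n] - y_bar) * (y[1:(n - k)] - y_bar)) / sum((y - y_bar)^2)
}

r_manual <- sapply(1:5, acf_manual, y = y)
r_stats  <- acf(y, lag.max = 5, plot = FALSE)$acf[2:6]   # drop lag 0
p_stats  <- pacf(y, lag.max = 5, plot = FALSE)$acf[1:5]  # PACF starts at lag 1

# Manual and built-in ACF side by side, then the PACF
round(cbind(lag = 1:5, r_manual, r_stats, p_stats), 3)
```

For a series like this, the ACF decays gradually because lag 2 inherits correlation through lag 1, while the PACF should be large at lag 1 and small afterwards, since the indirect dependence has been removed.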
# ACF plot
o1 <- ggAcf(oil, lag.max = 20)
# PACF plot
o2 <- ggPacf(oil, lag.max = 20)
# Plot both
plot_grid(o1, o2, ncol = 1)
Auto Correlation Function
# Convert to tsibble
oil_tsbl <- oil %>% as_tsibble()
# ACF Plot
o1 <- oil_tsbl %>%
ACF(lag_max = 20) %>%
autoplot()
# PACF Plot
o2 <- oil_tsbl %>%
PACF(lag_max = 20) %>%
autoplot()
# Plot both
plot_grid(o1, o2, ncol = 1)
Auto Correlation Function
# Using timetk
oil %>%
tk_tbl(rename_index = "year") %>%
plot_acf_diagnostics(
.date_var = year,
.value = value,
.lags = 20,
.show_white_noise_bars = TRUE,
.interactive = FALSE
)
Auto Correlation Function
As with all things in life, there are good and bad sides to using any of these three forecasting frameworks for visualizing time series. All three have similar functionality as it relates to visualizations.
fpp2
- Code requires minimal parameters
- Uses base ts format
- Uses ggplot for visualizations
- Mostly incompatible with tidyverse for data manipulation
- No longer maintained except for bug fixes
fpp3
- Code requires minimal parameters
- Uses its own tsibble format with special indexing tools
- Uses ggplot for visualizations
- Mostly compatible with tidyverse for data manipulation; tsibble may cause issues
- Currently maintained
timetk
- Code requires multiple parameters but provides more granularity
- Uses standard tibble format
- Uses ggplot and plotly for visualizations
- Fully compatible with tidyverse for data manipulation
- Currently maintained