Shiny: Fast Data Loading with fst

I had several projects where I had to load a big dataset into my Shiny app. The loading was usually done at startup and took more than 3 minutes, and my goal was to reduce this time. When I started thinking about the problem, I realized that the whole dataset is not needed when the app starts.

Fast and flexible data loading with fst

My first idea was to use a database, e.g. RSQLite. I also liked MonetDB a lot (it is much faster than RSQLite), but I could not get it to run on the system where it was needed, so I searched for alternatives.

Then I discovered the fst R package. It stores datasets (data.frame/data.table) in the fst format, and data loading is reasonably fast. Moreover, one can access selected rows and columns of a dataset without loading the whole dataset, so it provides some of the functionality of a database. Loading only part of the dataset is much faster than loading all of it.
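To make the partial loading concrete, here is a minimal sketch using `write_fst()` and `read_fst()` on a small made-up table (the columns and the temporary file are purely illustrative). The `columns` argument restricts which columns are read, and `from`/`to` restrict a contiguous range of rows:

```r
library(fst)
library(data.table)

# Toy table standing in for a large dataset (hypothetical columns)
dt <- data.table(ID = 1:10000,
                 year = rep(2011:2020, each = 1000),
                 outcome = runif(10000),
                 extra = rnorm(10000))

path <- tempfile(fileext = ".fst")
write_fst(dt, path)

# Read only two columns; the other columns are not read from disk
cols_only <- read_fst(path, columns = c("ID", "year"), as.data.table = TRUE)

# Read only a contiguous block of rows (rows 2001 to 3000, all columns)
rows_only <- read_fst(path, from = 2001, to = 3000, as.data.table = TRUE)
```

For arbitrary, non-contiguous row selections, the `fst()` file handle is the more flexible option.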

This was exactly the functionality I needed to speed up the data loading.

My strategy in the Shiny app was then the following:

  • At startup, load only the data that the dashboard really needs at that point.
  • Data transformations that are always required are done once beforehand, in a data preparation script.
  • Meta-data, such as the possible choices for inputs, is saved in a separate .RData file called meta_data.RData.
  • Updating this meta-data is part of the data preparation process. The meta-data is small and is loaded at every start of the app.
  • If further columns or rows are needed later, they are loaded into the app at that point and added to the existing dataset.
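A data preparation script along these lines could cover the first points of this strategy. This is a hedged sketch: the file names, the derived column `outcome_log` and the contents of `meta_data` are hypothetical and only illustrate the idea:

```r
library(fst)
library(data.table)

# Stand-in for the raw data; in practice it is read from the original source
raw <- data.table(ID = 1:6,
                  year = c(2019, 2019, 2020, 2020, 2021, 2021),
                  outcome = c(1.5, 2.0, 2.5, 3.0, 3.5, 4.0),
                  region = c("N", "S", "N", "S", "N", "S"))

# Transformations that are always needed are done here once, not at app start
raw[, outcome_log := log(outcome)]

data_dir <- tempdir()  # hypothetical location; in practice the app's data folder
write_fst(raw, file.path(data_dir, "data.fst"))

# Small meta-data, e.g. the choices offered by the app's inputs
meta_data <- list(years = sort(unique(raw$year)),
                  regions = sort(unique(raw$region)),
                  variables = setdiff(names(raw), c("ID", "year")))
save(meta_data, file = file.path(data_dir, "meta_data.RData"))
```

At app start, `load("meta_data.RData")` restores `meta_data`, whose entries can then be passed, for example, as `choices` to a `selectInput()` without touching the large fst file.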

All in all, this reduced the loading time at app startup from 3 minutes to 5 seconds. Loading data later on (usually only one or two columns at a time) had no noticeable effect on the app's performance.

Some technical details

Below are some technical details on how I implemented this in my app. The following packages are required:

library(shiny)
library(fst)
library(data.table)
library(magrittr)  # for the %>% pipe used below

I initialize the data as an empty reactiveVal, plus a reactiveValues object that stores the fst file handle and the selected rows and columns of the fst file.

data = reactiveVal(NULL)
tmp_all = reactiveValues(fst = NULL, rows_fst = NULL, cols_fst = NULL)

Then I open the fst file that I saved beforehand with write_fst(). fst() only creates a reference to the file; the dataset itself is not loaded by this command.

tmp_fst = fst(my_path)
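To see that `fst()` is lazy, one can inspect the returned handle. A small sketch with a hypothetical file: the dimensions and column names come from the file's metadata, and only `$` or `[` actually read data from disk:

```r
library(fst)

# Hypothetical example file
path <- tempfile(fileext = ".fst")
write_fst(data.frame(ID = 1:5, year = c(2018, 2019, 2020, 2021, 2022),
                     outcome = c(10, 20, 30, 40, 50)), path)

ft <- fst(path)          # opens the file; no column data is read yet

n_rows <- nrow(ft)       # taken from the file's metadata
col_names <- names(ft)   # also from the metadata
years <- ft$year         # `$` reads this single column into memory

# Only the selected rows of the two selected columns are returned, as a data.frame
sel <- ft[ft$year <= 2020, c("ID", "outcome")]
```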

I specify the rows and columns I want to load and save them in tmp_all:

rows_fst = tmp_fst$year <= bis  # bis: the cut-off year, defined elsewhere in the app
cols_fst = c("ID", "year", "outcome")
tmp_all$rows_fst = rows_fst
tmp_all$cols_fst = cols_fst
tmp_all$fst = tmp_fst

Then I load the selected rows and columns as a data.table and store them in the reactiveVal:

tmp = tmp_fst[rows_fst, cols_fst] %>% setDT()
data(tmp)

I wrote a function to add variables later on. It first checks which of the requested variables are already in the dataset and loads only the missing ones:

add_variable <- function(tmp, tmp_all, new_vars) {
  # keep only the variables that are not yet in the dataset
  inputs = new_vars[!(new_vars %in% colnames(tmp))]
  if(length(inputs) > 0) {
    tmp_fst = tmp_all$fst
    rows_fst = tmp_all$rows_fst
    # read the missing columns for the same rows from the fst file
    tmp_calc = tmp_fst[rows_fst, inputs, drop = FALSE]
    # add them to the data.table by reference
    tmp[, (inputs) := tmp_calc]
  }
  return(tmp)
}

Finally, when new variables are requested, they are added to the dataset and are then available to the rest of the Shiny app:

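new_vars = c("the_new_variable")
# copy(): := modifies the data.table by reference, and reactiveVal() ignores
# a new value that is identical to the current one
tmp <- add_variable(tmp = copy(data()), tmp_all, new_vars)
data(tmp)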
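Put together, the server side could look roughly like the following sketch. It is wrapped in a Shiny module so that it can be exercised with `shiny::testServer()`; the module name `lazyDataServer`, the input `var` and the cut-off year 2021 are hypothetical, and `add_variable()` is repeated in condensed form so the snippet runs on its own. In the UI, a `selectInput()` with id `NS(id, "var")` would supply the requested variable:

```r
library(shiny)
library(fst)
library(data.table)

# Hypothetical stand-in for the prepared fst file
path <- tempfile(fileext = ".fst")
write_fst(data.frame(ID = 1:4, year = c(2019, 2020, 2021, 2022),
                     outcome = c(1, 2, 3, 4), income = c(10, 20, 30, 40)), path)

# Condensed add_variable(): load only the columns that are not yet in tmp
add_variable <- function(tmp, tmp_all, new_vars) {
  inputs <- new_vars[!(new_vars %in% colnames(tmp))]
  if (length(inputs) > 0) {
    tmp_calc <- tmp_all$fst[tmp_all$rows_fst, inputs]
    tmp[, (inputs) := tmp_calc]
  }
  tmp
}

lazyDataServer <- function(id, path) {
  moduleServer(id, function(input, output, session) {
    data <- reactiveVal(NULL)
    tmp_all <- reactiveValues(fst = NULL, rows_fst = NULL, cols_fst = NULL)

    # Startup: open the file and load only the rows and columns needed first
    tmp_fst <- fst(path)
    rows_fst <- tmp_fst$year <= 2021  # hypothetical cut-off year
    cols_fst <- c("ID", "year", "outcome")
    tmp_all$fst <- tmp_fst
    tmp_all$rows_fst <- rows_fst
    tmp_all$cols_fst <- cols_fst
    data(as.data.table(tmp_fst[rows_fst, cols_fst]))

    # Later: load a column only when it is requested via input$var
    observeEvent(input$var, {
      # copy(): := works by reference, and reactiveVal() ignores identical values
      data(add_variable(copy(data()), tmp_all, input$var))
    })
  })
}
```

Keeping the startup load and the on-demand load in one module keeps the file handle and the row selection in a single place, so every later column is read for exactly the rows loaded at startup.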

Feedback

Let me know if you have had similar problems and how you solved them. Maybe you also have some ideas for improvement.

Written on March 22, 2022