
Dockerfile image size #98

@colinleach


This is forked from issue #96.

Before the recent update to R 4.5.2, I experimented with some local builds of images:

REPOSITORY         TAG       IMAGE ID       CREATED        SIZE
rtr-452            latest    c2d4dfb04bef   4 hours ago    2.87GB
rtr-current        latest    2bb37d8512fc   4 hours ago    2.39GB
rocker/tidyverse   4.5.2     fa0e5da2d9ea   4 weeks ago    2.86GB
rocker/r-ver       4.5.2     44e5a15c1df9   4 weeks ago    947MB
rhub/r-minimal     latest    e92270504c84   3 months ago   47.3MB

In the table, rtr-current is the R 4.3.1 image we've been using for the previous 3 years, and rtr-452 is very close to the build that went live on Exercism earlier today.

These images are huge, and they cost Exercism money every month!

We're currently based on rocker/tidyverse, and that's where almost all of the size comes from. Everything else in our Dockerfile is a rounding error.
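A quick back-of-the-envelope check of that claim, using the sizes from the table above. This is only a sketch: it assumes decimal units (1 GB = 1000 MB), and the dictionary keys are just labels for the table rows.

```python
# Image sizes copied from the `docker images` table, in MB.
# Assumption: decimal units, i.e. 1 GB = 1000 MB.
SIZES_MB = {
    "rtr-452": 2870.0,
    "rocker/tidyverse:4.5.2": 2860.0,
    "rocker/r-ver:4.5.2": 947.0,
}


def added_mb(image: str, base: str) -> float:
    """Megabytes an image adds on top of the base image it builds on."""
    return SIZES_MB[image] - SIZES_MB[base]


# What our own Dockerfile adds on top of rocker/tidyverse.
ours = added_mb("rtr-452", "rocker/tidyverse:4.5.2")
# What rocker/tidyverse adds on top of plain R.
tidy = added_mb("rocker/tidyverse:4.5.2", "rocker/r-ver:4.5.2")

print(f"our Dockerfile adds {ours:.0f} MB ({ours / SIZES_MB['rtr-452']:.1%} of rtr-452)")
print(f"rocker/tidyverse adds {tidy:.0f} MB on top of rocker/r-ver")
```

With these numbers our own layers come to roughly 10 MB (about 0.3% of rtr-452), against roughly 1.9 GB for the rocker/tidyverse layers. The table's sizes are rounded to three significant figures, so the 10 MB is only good to within about 10 MB. Also note that rocker/tidyverse is layered on rocker/rstudio, so that 1.9 GB includes RStudio Server as well as the R packages.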

Exercism really can't use all of Tidyverse. In particular, ggplot2 is a great graphing package, but the test runner isn't going to do anything useful with it.

These are the imports listed in the DESCRIPTION file of the tidyverse package:

Imports: broom (>= 1.0.3), conflicted (>= 1.2.0), cli (>= 3.6.0),
        dbplyr (>= 2.3.0), dplyr (>= 1.1.0), dtplyr (>= 1.2.2), forcats
        (>= 1.0.0), ggplot2 (>= 3.4.1), googledrive (>= 2.0.0),
        googlesheets4 (>= 1.0.1), haven (>= 2.5.1), hms (>= 1.1.2),
        httr (>= 1.4.4), jsonlite (>= 1.8.4), lubridate (>= 1.9.2),
        magrittr (>= 2.0.3), modelr (>= 0.1.10), pillar (>= 1.8.1),
        purrr (>= 1.0.1), ragg (>= 1.2.5), readr (>= 2.1.4), readxl (>=
        1.4.2), reprex (>= 2.0.2), rlang (>= 1.0.6), rstudioapi (>=
        0.14), rvest (>= 1.0.3), stringr (>= 1.5.0), tibble (>= 3.1.8),
        tidyr (>= 1.3.0), xml2 (>= 1.3.3)
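To filter that list programmatically rather than by eye, here is a minimal Python sketch that flattens the wrapped Imports field and maps each package to its minimum version. The helper name `parse_imports` is hypothetical and not part of any R tooling, and it only handles the `name (>= version)` form that appears in this field.

```python
import re

# The Imports field of the tidyverse DESCRIPTION file, copied from above.
IMPORTS = """broom (>= 1.0.3), conflicted (>= 1.2.0), cli (>= 3.6.0),
        dbplyr (>= 2.3.0), dplyr (>= 1.1.0), dtplyr (>= 1.2.2), forcats
        (>= 1.0.0), ggplot2 (>= 3.4.1), googledrive (>= 2.0.0),
        googlesheets4 (>= 1.0.1), haven (>= 2.5.1), hms (>= 1.1.2),
        httr (>= 1.4.4), jsonlite (>= 1.8.4), lubridate (>= 1.9.2),
        magrittr (>= 2.0.3), modelr (>= 0.1.10), pillar (>= 1.8.1),
        purrr (>= 1.0.1), ragg (>= 1.2.5), readr (>= 2.1.4), readxl (>=
        1.4.2), reprex (>= 2.0.2), rlang (>= 1.0.6), rstudioapi (>=
        0.14), rvest (>= 1.0.3), stringr (>= 1.5.0), tibble (>= 3.1.8),
        tidyr (>= 1.3.0), xml2 (>= 1.3.3)"""

# One entry: a package name, optionally followed by "(>= version)".
ENTRY = re.compile(r"([A-Za-z][A-Za-z0-9.]*)\s*(?:\(\s*>=\s*([^)\s]+)\s*\))?")


def parse_imports(field: str) -> dict[str, str]:
    """Map each package in an Imports field to its minimum version ("" if none)."""
    flat = " ".join(field.split())  # undo the line wrapping
    deps = {}
    for entry in flat.split(","):
        m = ENTRY.fullmatch(entry.strip())
        if m is None:
            raise ValueError(f"unrecognised Imports entry: {entry!r}")
        deps[m.group(1)] = m.group(2) or ""
    return deps


deps = parse_imports(IMPORTS)
print(len(deps), "packages")
for name, version in deps.items():
    print(f"{name:15} {version}")
```

Joining on whitespace first matters here, because some entries (`forcats`, `readxl`, `rstudioapi`) wrap across lines in the middle of their version spec.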

I think I need to:

  • Work out which are useful to us and which are just bloat.
  • Try some test builds from a smaller base image, installing just the packages we need on top.
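Those two steps could be sketched like this. The `KEEP` set is purely hypothetical, since deciding what the test runner actually needs is the open question, and the generated `RUN Rscript -e 'install.packages(...)'` line assumes the smaller base image already configures a default CRAN repository (the rocker images do this, as far as I know).

```python
# Hypothetical shortlist: which tidyverse imports the test runner really needs
# is still to be worked out, so this set is purely illustrative.
KEEP = {"dplyr", "tidyr", "purrr", "stringr", "tibble", "magrittr", "lubridate"}

# Package names from the tidyverse Imports field, in their original order.
TIDYVERSE_IMPORTS = [
    "broom", "conflicted", "cli", "dbplyr", "dplyr", "dtplyr", "forcats",
    "ggplot2", "googledrive", "googlesheets4", "haven", "hms", "httr",
    "jsonlite", "lubridate", "magrittr", "modelr", "pillar", "purrr", "ragg",
    "readr", "readxl", "reprex", "rlang", "rstudioapi", "rvest", "stringr",
    "tibble", "tidyr", "xml2",
]


def partition(imports: list[str], keep: set[str]) -> tuple[list[str], list[str]]:
    """Split imports into (kept, dropped), preserving the original order."""
    kept = [p for p in imports if p in keep]
    dropped = [p for p in imports if p not in keep]
    return kept, dropped


def install_line(packages: list[str]) -> str:
    """Render a Dockerfile RUN instruction that installs only `packages`."""
    quoted = ", ".join(f'"{p}"' for p in packages)
    return f"RUN Rscript -e 'install.packages(c({quoted}))'"


kept, dropped = partition(TIDYVERSE_IMPORTS, KEEP)
print("keep:", ", ".join(kept))
print("drop:", ", ".join(dropped))
print(install_line(kept))
```

By default `install.packages()` also installs each package's hard dependencies, so dplyr alone brings in rlang, cli, vctrs and friends. The resulting image will therefore hold more than the kept list, but it should still avoid the ggplot2, googledrive and rvest end of the tree.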

I'm not totally optimistic. Installing Tidyverse locally on a Linux system (Mint 22.3 in my case) is slow and painful: the packages are compiled from source (much of it C++), and they need a strange variety of system dependencies.

Whatever, we don't finish unless we start...
