GitHub Actions: Cache#

Life span#

GitHub Actions removes cache entries that have not been accessed in over 7 days, and the total size of all caches in a repository is limited to 10 GB.

Standard Cache#

The cache key should be as specific as possible, so that the installation step that follows the cache restore can be reduced or skipped.

For Python pip install, we could use the following cache key:

- name: Get pip cache dir
  run: |
    # collect the values used in the cache key
    os_version=$(grep -i "version=" /etc/os-release | cut -c9- | tr -d '"' | tr ' ' '_')
    github_workflow_full_path="${GITHUB_WORKFLOW_REF%@*}"
    python_full_version=$(python -c 'import platform; print(platform.python_version())')
    node_major_version=$(node --version | cut -d'.' -f1 | tr -d 'v')
    # export them so that later steps can read them through the env context
    echo "os_version=$os_version" >> "$GITHUB_ENV"
    echo "github_workflow_full_path=$github_workflow_full_path" >> "$GITHUB_ENV"
    echo "python_full_version=$python_full_version" >> "$GITHUB_ENV"
    echo "node_major_version=$node_major_version" >> "$GITHUB_ENV"
    echo "PIP_CACHE_DIR=$(pip cache dir)" >> "$GITHUB_ENV"

- name: cache pip
  uses: actions/cache@v3
  with:
    # path: ${{ env.PIP_CACHE_DIR }}
    path: ${{ env.pythonLocation }}
    key: ${{ env.github_workflow_full_path }}-${{ env.os_version }}-${{ env.python_full_version }}-${{ env.node_major_version }}-${{ hashFiles('requirements/*.txt') }}

The actions/cache repository also provides some Python caching examples.
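
To see what two of the key components evaluate to, the shell expressions from the step above can be tried locally. This is only a sketch: the GITHUB_WORKFLOW_REF value (a hypothetical octo-org/octo-repo repository) and the os-release line are sample inputs chosen for illustration, not values taken from a real run.

```shell
# Sample value in the GITHUB_WORKFLOW_REF format (hypothetical repository).
GITHUB_WORKFLOW_REF="octo-org/octo-repo/.github/workflows/ci.yml@refs/heads/main"

# "%@*" removes the shortest suffix matching "@*", i.e. the trailing "@<ref>".
github_workflow_full_path="${GITHUB_WORKFLOW_REF%@*}"
echo "$github_workflow_full_path"

# Same pipeline as in the step above, fed with a sample os-release line
# instead of the real /etc/os-release file.
os_version=$(printf 'VERSION="22.04.3 LTS (Jammy Jellyfish)"\n' \
  | grep -i "version=" | cut -c9- | tr -d '"' | tr ' ' '_')
echo "$os_version"
```

Dropping the ref suffix keeps the key identical across branches of the same workflow file, while the OS version string changes whenever the runner image moves to another release.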

pip cache dir vs pip install dir#

The path parameter in actions/cache@v3 can be either:

  • ${{ env.PIP_CACHE_DIR }} if you only want to cache the pip cache dir. This skips the Python package download step, but the packages still have to be installed.
  • ${{ env.pythonLocation }} if you want to cache the whole Python installation dir, including the site-packages dir, so that the pip install step can be reduced or skipped. Installed packages are tied to the OS and the Python version, which is why the cache key must include ${{ env.os_version }} and ${{ env.python_full_version }}. In most cases, this is the best choice.

hashFiles#

Azure Pipelines has something similar to the hashFiles() function: a key segment written as a glob pattern, like requirements/*.txt, is replaced by a hash of the matching files. The pattern must not be wrapped in double quotes, otherwise it is treated as a static string.

# Azure Pipelines
- task: Cache@2
  inputs:
    key: 'python | "$(pythonFullVersion)" | "$(osVersion)" | "$(System.TeamProject)" | "$(Build.DefinitionName)" | "$(Agent.JobName)" | requirements/*.txt'
    path: ...
  displayName: ...

Alternatively, the same result can be achieved with plain bash commands:

# suppose parameters.requirementsFilePathList is a list of file paths
- script: |
    echo "REQUIREMENTS_FILE_PATH_LIST_STRING: $REQUIREMENTS_FILE_PATH_LIST_STRING"
    # join the JSON array of paths into one space-separated line
    all_files_in_one_line=$(echo "$REQUIREMENTS_FILE_PATH_LIST_STRING" | jq -r 'join(" ")')
    echo "all_files_in_one_line: $all_files_in_one_line"
    # left unquoted on purpose, so that the shell passes one argument per file to cat
    all_files_md5sum=$(cat $all_files_in_one_line | md5sum | awk '{print $1}')
    echo "all_files_md5sum: $all_files_md5sum"
    echo "##vso[task.setvariable variable=pythonRequirementsFilesHash;]$all_files_md5sum"
  displayName: Set pythonRequirementsFilesHash
  env:
    REQUIREMENTS_FILE_PATH_LIST_STRING: "${{ convertToJson(parameters.requirementsFilePathList) }}"
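
The hashing idea in the script above can be checked locally with a small sketch. The file names, the pinned versions, and the combined_hash helper are hypothetical and only serve the illustration; the jq step is left out, so the file paths are passed directly.

```shell
# Hypothetical requirement files created in a temporary directory.
tmp_dir=$(mktemp -d)
printf 'requests==2.31.0\n' > "$tmp_dir/base.txt"
printf 'pytest==7.4.0\n' > "$tmp_dir/dev.txt"

# Concatenate the given files in the given order and hash the combined content.
combined_hash() {
  cat "$@" | md5sum | awk '{print $1}'
}

hash_before=$(combined_hash "$tmp_dir/base.txt" "$tmp_dir/dev.txt")

# Changing any of the files changes the combined hash,
# which in turn produces a new cache key.
printf 'pytest==7.4.1\n' > "$tmp_dir/dev.txt"
hash_after=$(combined_hash "$tmp_dir/base.txt" "$tmp_dir/dev.txt")

echo "before: $hash_before"
echo "after:  $hash_after"
rm -rf "$tmp_dir"
```

Because the files are concatenated before hashing, the order of the file list matters: the same files in a different order can give a different hash, so the list should be kept stable between runs.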

Cache with actions/setup-python#

The actions/setup-python action has built-in functionality for caching and restoring dependencies with a cache key. This method can only cache the pip cache dir, like path: ${{ env.PIP_CACHE_DIR }} in the example above. It reduces the package download time, but the packages still need to be installed, which is much slower than caching the package installation location. At the time of writing, the cached directory (the pip cache dir) is determined by the action itself and cannot be customized.

The cache key is something like: setup-python-Linux-22.04-Ubuntu-python-3.10.13-pip-308f89683977de8773e433ddf87c874b6bd931347b779ef0ab18f37ecc4fa914 (copied from workflow run log), which is generated as per this answer.

steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
  with:
    python-version: '3.10'
    cache: 'pip' # caching pip dependencies, could be pip, pipenv, or poetry
    cache-dependency-path: requirements/*.txt
- run: pip install -r requirements.txt

If cache-dependency-path is not specified and the cache type is pip, the action searches the repo for all requirements.txt files and hashes them to generate the cache key. I haven't tested the pipenv and poetry cache types.