Installation

ETL is aimed mainly at OWID staff, but it is open to the general public. It is supported and regularly run on Linux, macOS, and Windows (via WSL). Here's how to get set up.

Warning

Some parts of ETL rely on other internal tools and resources, which makes it less suitable for external use. Still, we believe there is value in keeping this project open to the public for transparency and reproducibility.

Install dependencies

You will need Python 3.10+, basic build tools, and MySQL client libraries.
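Before installing anything, it can help to see whether the Python already on your PATH meets the 3.10+ requirement. The snippet below is a minimal sketch, not part of ETL: the helper name version_at_least_3_10 is hypothetical, and it only inspects whatever python3 resolves to in your current shell.

```shell
# Hypothetical helper: succeeds if a "major.minor[.patch]" version string is at least 3.10.
version_at_least_3_10() {
    major="${1%%.*}"      # text before the first dot
    rest="${1#*.}"        # text after the first dot
    minor="${rest%%.*}"   # text before the next dot (or all of it)
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

if command -v python3 >/dev/null 2>&1; then
    found="$(python3 -c 'import platform; print(platform.python_version())')"
    if version_at_least_3_10 "$found"; then
        echo "python3 $found is recent enough"
    else
        echo "python3 $found is too old; ETL needs 3.10+"
    fi
else
    echo "python3 not found on PATH"
fi
```

If the check reports an older version, the platform-specific steps below install a newer one.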

Tip

On macOS, we recommend using Homebrew to install dependencies.

Ensure you have the Xcode command line tools:

xcode-select --install

Then install Python 3.10+, the MySQL client, and uv. uv is our preferred Python packaging and dependency management tool.

brew install python mysql-client uv pkg-config

You then need to tell Python where to find MySQL by adding some lines to your ~/.zshrc file (or ~/.bash_profile, depending on your shell). Run brew info mysql-client to see what's needed. For example, on an M1/M2 Mac where Homebrew installs to /opt/homebrew, you would need to add:

export PATH="/opt/homebrew/opt/mysql-client/bin:$PATH"
export LDFLAGS="-L/opt/homebrew/opt/mysql-client/lib"
export CPPFLAGS="-I/opt/homebrew/opt/mysql-client/include"

On an Intel Mac, the paths will be slightly different.
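Rather than hard-coding the prefix, you can ask Homebrew where mysql-client lives and derive the three lines from that. The following is a sketch under that assumption: mysql_client_exports is a hypothetical helper, and it only prints the lines so you can review them before pasting them into your shell profile.

```shell
# Hypothetical helper: print the three export lines for a given mysql-client prefix.
mysql_client_exports() {
    prefix="$1"
    # $PATH stays literal here so it is expanded later, when the profile is sourced.
    printf 'export PATH="%s/bin:$PATH"\n' "$prefix"
    printf 'export LDFLAGS="-L%s/lib"\n' "$prefix"
    printf 'export CPPFLAGS="-I%s/include"\n' "$prefix"
}

if command -v brew >/dev/null 2>&1; then
    # brew --prefix <formula> prints the install prefix on both Intel and Apple Silicon.
    mysql_client_exports "$(brew --prefix mysql-client)"
else
    echo "brew not found; showing the Apple Silicon paths from the example above:"
    mysql_client_exports /opt/homebrew/opt/mysql-client
fi
```

Compare the printed lines with the output of brew info mysql-client before adding them to your profile.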

Finally, check that you have the correct version of Python as your default:

which python3

It should say something like /usr/local/bin/python3 or /opt/homebrew/bin/python3. If not, you will have to change the PATH variable in your shell profile (e.g. ~/.bash_profile or ~/.zshrc).

On Ubuntu 22.04, you can install most things you need with apt:

sudo apt install python3-dev python3-virtualenv python3-setuptools mysql-client libmysqlclient-dev

Then install the uv package manager with:

curl -LsSf https://astral.sh/uv/install.sh | sh

or

pip install uv

On Windows, you will need to install WSL2 to get started.

You should use Ubuntu 22.04 as your Linux distribution.

Then, enter your Linux console and follow the instructions for Ubuntu 22.04.


Extra config for staff

OWID staff who want to upsert data from ETL to the grapher database will also need access to Cloudflare R2.

First, install rclone:

brew install rclone

Then edit its config file with code ~/.config/rclone/rclone.conf. Get your personal R2 keys and use them to replace the r2_access_key_id and r2_secret_access_key placeholders in the config file.

[owid-r2]
type = s3
provider = Cloudflare
env_auth = true
access_key_id = r2_access_key_id
secret_access_key = r2_secret_access_key
region = auto
endpoint = https://078fcdfed9955087315dd86792e71a7e.r2.cloudflarestorage.com

[r2]
type = alias
remote = owid-r2:

Install pyenv

Tip

pyenv is no longer essential now that we have switched to uv as a package manager. However, we still recommend it for managing your Python versions.

Installing pyenv lets you keep multiple Python versions on your machine and switch between them easily. It also avoids issues caused by updating the system-wide Python.

Follow the instructions in the pyenv installation guide or follow the steps below.

On macOS, install pyenv using Homebrew:

brew update
brew install pyenv

For a more complete installation guide, follow this guide.

On Linux, use the automatic installer:

curl https://pyenv.run | bash

For more details, see the pyenv-installer project: https://github.com/pyenv/pyenv-installer


Add these lines to ~/.zshrc, ~/.bash_profile or ~/.bashrc:

export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
if command -v pyenv 1>/dev/null 2>&1; then
  eval "$(pyenv init --path)"
  eval "$(pyenv init -)"
fi

Restart your shell to apply the changes:

exec "$SHELL"

Verify that pyenv is installed properly:

pyenv --version

You can now use pyenv to install and manage multiple Python versions on your machine. For example, to install Python 3.12.0, run:

pyenv install 3.12.0

To set the newly installed Python version as the global default, run:

pyenv global 3.12.0

Now check that which python3 prints a path ending in .../.pyenv/shims/python3 and that python --version prints Python 3.12.0.
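The same check can be scripted. This is a minimal sketch: is_pyenv_shim is a hypothetical helper, and it assumes pyenv lives in a directory named .pyenv, as in the PYENV_ROOT export above. It would need adjusting if you chose a different location.

```shell
# Hypothetical helper: succeeds if the path points into a .pyenv/shims directory.
is_pyenv_shim() {
    case "$1" in
        */.pyenv/shims/*) return 0 ;;
        *) return 1 ;;
    esac
}

python_path="$(command -v python3 || true)"
if [ -z "$python_path" ]; then
    echo "python3 not found on PATH"
elif is_pyenv_shim "$python_path"; then
    echo "python3 is served by pyenv: $python_path"
else
    echo "python3 does not come from pyenv: $python_path"
fi
```

If the last message appears, make sure the pyenv lines in your shell profile come after anything else that modifies PATH, then restart your shell.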

Clone the project

First of all, you need to have the ETL project in your working environment. Run:

git clone https://github.com/owid/etl.git

Along with various directories and files, the project also has sub-packages in the lib/ folder: catalog, repack and datautils. These redistributable in-house libraries simplify access to data.

Check your environment

You can get started by using make to see available commands. Note that to run all make commands you should be in the project folder (as it contains the Makefile).

make help

The best way to check if your environment is healthy is to run:

make test

This will install the project, and then run all CI checks.

If make test succeeds, then you should be able to build any dataset you like, including the entire catalog. If it fails, please raise a GitHub issue (if you are OWID staff, you can also ask in the #tech-issues Slack channel).

Tip

Speed it up with multiple processes: make -j 4 test.
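The value 4 is only an example. If you would rather match the job count to your machine, the sketch below derives it from the number of online CPU cores. cpu_count is a hypothetical helper; the snippet prints the resulting command instead of running it, so you can decide whether that level of parallelism suits your setup.

```shell
# Hypothetical helper: print the number of online CPU cores, falling back to 4.
cpu_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4
}

n_jobs="$(cpu_count)"
echo "Suggested command: make -j $n_jobs test"
```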

Git hooks

The pre-commit hook is activated automatically by make .venv (and any target that depends on it). It runs make check (lint, format, type-check) before every git commit, which prevents accidentally pushing code that fails CI.

If you need to (re)activate it manually:

make install-hooks

VSCode setup

We highly recommend installing the extensions below.

Custom ETL extensions

We've built custom VS Code extensions to streamline ETL development. To install all extensions:

make vsce-sync

This includes extensions for navigating ETL steps, debugging interactively, comparing versions, and detecting outdated code patterns.

For detailed information about each extension and how to use them, see the VS Code Extensions Guide.

Additional configuration

Add this to your User settings.json (View -> Command Palette -> Preferences: Open User Settings (JSON)):

  "files.associations": {
    "*.dvc": "yaml"
  },
  "yaml.schemas": {
    "schemas/snapshot-schema.json": "**/*.dvc",
    "schemas/dataset-schema.json": ["**/*.meta.yml", "**/*.meta.override.yml"]
  },

Improve your terminal experience

Using Oh My Zsh.

We recommend using Oh My Zsh. It comes with a lot of plugins and themes that can make your life easier.

Automatic virtualenv activation

We use Python virtual environments ("venv") everywhere. It's very convenient to have a script that automatically activates the virtualenv when you enter a project folder. Add the following to your ~/.zshrc (it relies on zsh's add-zsh-hook, so it will not work as-is in ~/.bashrc):

# enters the virtualenv when I enter the folder, provided it's called either .venv or env
autoload -U add-zsh-hook
load-py-venv() {
    if [ -f .venv/bin/activate ]; then
        # enter a virtual environment that's here
        source .venv/bin/activate
    elif [ -f env/bin/activate ]; then
        source env/bin/activate
    elif [ -n "$VIRTUAL_ENV" ] && { [ -f poetry.toml ] || [ -f requirements.txt ]; }; then
        # exit a virtual environment when you enter a new project folder
        deactivate
    fi
}
add-zsh-hook chpwd load-py-venv
load-py-venv
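The hook above depends on zsh's add-zsh-hook, which bash does not provide. For bash users, a rough counterpart can be built on PROMPT_COMMAND, which bash runs before each prompt. This is a sketch rather than a tested part of our setup: the names load_py_venv_bash and _last_venv_dir are hypothetical, and the deactivate branch of the zsh version is left out for brevity.

```shell
# Bash sketch: activate a local virtualenv whenever the working directory changes.
_last_venv_dir=""
load_py_venv_bash() {
    # Do nothing unless the directory changed since the last prompt.
    if [ "$PWD" = "$_last_venv_dir" ]; then
        return 0
    fi
    _last_venv_dir="$PWD"
    if [ -f .venv/bin/activate ]; then
        # enter a virtual environment that's here
        . .venv/bin/activate
    elif [ -f env/bin/activate ]; then
        . env/bin/activate
    fi
}
# Run the check before every prompt, keeping any existing PROMPT_COMMAND.
PROMPT_COMMAND="load_py_venv_bash${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
```

Because the function remembers the last directory it saw, the activate script is sourced once per directory change rather than on every prompt.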

Some staff members also use Nushell, which supports similar hooks. Edit your $nu.config-path file, find the hooks section, and add to it an env_change stanza:

hooks:
    env_change: {
    PWD: [
        {
        condition: {|before, after| ["pyproject.toml" "requirements.txt" "setup.py"] | any {|f| $f | path exists } }
        code: "
            if ('.venv/bin/python' | path exists) {
            print -e 'Activating virtualenv'
            $env.PATH = ($env.PATH | split row (char esep) | filter {|p| $p !~ '.venv' } | prepend $\"($env.PWD)/.venv/bin\")
            } else {
            $env.PATH = ($env.PATH | split row (char esep) | filter {|p| $p !~ '.venv' })
            }
            "
        }
    ]
    }

Speed up navigation in terminal with autojump

Instead of typing cd ... to reach the right folder, you can use autojump. Assuming it is installed via Homebrew, add the following to your ~/.zshrc or ~/.bashrc:

# autojump
[[ -s "$(brew --prefix)/etc/autojump.sh" ]] && . "$(brew --prefix)/etc/autojump.sh"

and then type j etl or j grapher to jump to the right folder.