A short introduction to Airflow and a quick list of useful references.
Tested Configuration:
Linux: Ubuntu 18.04

Quickstart

I recommend the official documentation.

You may also find this tutorial a good complement: towardsdatascience

As a next step, have a look at the Airflow example pipeline or check out their “how to” guides.

Tip:

In production you will want to use the daemonized version of the Airflow scheduler. It’s easy: run airflow scheduler -D instead of airflow scheduler.

Change the SQLite default database for airflow

As a first step you will need to install some Airflow plugins. Airflow maintains a list of plugins here.
The official documentation is pretty good, but you may find these two resources useful for this step:

  1. about sql_alchemy_conn : medium
  2. don’t forget to change the executor to LocalExecutor
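
As a rough illustration of the two steps above, the sketch below builds a SQLAlchemy connection string and exposes it, together with the executor, through Airflow's `AIRFLOW__{SECTION}__{KEY}` environment variables instead of editing airflow.cfg. The PostgreSQL backend, the credentials, and the helper name `build_sql_alchemy_conn` are assumptions for illustration. On recent Airflow versions the connection setting lives in the `[database]` section rather than `[core]`, so check the variable name against your version.

```python
import os
from urllib.parse import quote_plus


def build_sql_alchemy_conn(user, password, host, port, database,
                           driver="postgresql+psycopg2"):
    """Return a SQLAlchemy URI; user and password are URL-escaped."""
    return "{driver}://{user}:{password}@{host}:{port}/{database}".format(
        driver=driver,
        user=quote_plus(user),
        password=quote_plus(password),
        host=host,
        port=port,
        database=database,
    )


# Hypothetical credentials, replace them with your own.
conn = build_sql_alchemy_conn("airflow", "p@ssw0rd/1", "localhost", 5432, "airflow")

# Same effect as setting sql_alchemy_conn and executor under [core] in airflow.cfg
# (assumption: an Airflow version that still reads sql_alchemy_conn from [core]).
os.environ["AIRFLOW__CORE__SQL_ALCHEMY_CONN"] = conn
os.environ["AIRFLOW__CORE__EXECUTOR"] = "LocalExecutor"

print(conn)
```

Variables set this way only reach Airflow processes started from the same Python process. In practice you would export them in your shell, or write the same values into airflow.cfg, before initializing the new database.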

Troubleshooting

bug with airflow scheduler

Airflow logs take a lot of space on my disk

If you decide to keep your logs on your server (you can store them remotely instead, on S3 for instance), you may run into a full disk. Airflow produces a lot of logs, and they can quickly fill your disk. One solution is to create DAGs that remove old logs. This issue was described and solved here. Note: you’ll find other interesting scripts for Airflow maintenance [here](https://github.com/teamclairvoyant/airflow-maintenance-dags/).
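
To make the idea concrete, here is a minimal sketch of the cleanup logic such a DAG could run, written as a plain function so it can be tried outside Airflow. This is not the script from the linked repository; the function name `remove_old_logs` and the 30-day default are assumptions.

```python
import os
import time


def remove_old_logs(log_dir, max_age_days=30, now=None):
    """Delete files under log_dir last modified more than max_age_days ago.

    Returns the list of deleted paths. Empty directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            # Compare the file's last modification time with the cutoff.
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

Calling `remove_old_logs(os.path.expanduser("~/airflow/logs"))` from a daily task (for instance through a PythonOperator) would keep roughly a month of logs. The path assumes the default AIRFLOW_HOME, so adjust it to your installation.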

Reference

Official tutorial : airflow
“Cannot allocate memory” : stackoverflow -> allow swap
sql_alchemy_conn issue : medium
Another interesting quickstart : rosiehoyem
remove PID of airflow scheduler : stackoverflow
Further tips : airflow maintenance dags