Running the Pipeline

The Acestor pipeline is executed through the main entry point at src/acestor/main.py.

Basic Usage

python src/acestor/main.py [OPTIONS]

Command-Line Options

Pipeline Arguments
Option	Flag	Description
`--config`	`-c`	Path to configuration file (default: `config/dengue_pipeline.yaml`)
`--date`	`-d`	Date to run predictions for in YYYY-MM-DD format (default: today’s date)
`--download-linelist`	`-dl`	Download linelist data from S3 (default: False)
`--process-linelist`	`-pl`	Process linelist data to generate daily case counts (default: False)
`--download-and-use-weather-data-from-s3`	`-ws`	Download and use weather data from S3 (default: False)
`--use-previously-downloaded-weather-data-from-s3`	`-rws`	Use previously downloaded weather data from S3 without re-downloading (default: False)
`--generate-thresholds`	`-t`	Generate alert thresholds based on historical data (default: False)
`--model-train-and-predict`	`-m`	Train the model and generate predictions (default: False)
`--generate-maps`	`-gm`	Generate map visualizations of predictions (default: False)
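
The option table above can be mirrored with Python's standard argparse module. The following is a minimal sketch only: `build_parser` is a hypothetical helper, and the actual parser in src/acestor/main.py may be declared differently. It assumes every stage switch is a plain boolean flag, as the "default: False" entries suggest.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real main.py)."""
    parser = argparse.ArgumentParser(description="Run the Acestor pipeline")
    parser.add_argument("-c", "--config", default="config/dengue_pipeline.yaml",
                        help="Path to configuration file")
    parser.add_argument("-d", "--date", default=date.today().isoformat(),
                        help="Date to run predictions for (YYYY-MM-DD)")
    # Each stage switch is a boolean flag that defaults to False.
    switches = [
        ("-dl", "--download-linelist"),
        ("-pl", "--process-linelist"),
        ("-ws", "--download-and-use-weather-data-from-s3"),
        ("-rws", "--use-previously-downloaded-weather-data-from-s3"),
        ("-t", "--generate-thresholds"),
        ("-m", "--model-train-and-predict"),
        ("-gm", "--generate-maps"),
    ]
    for short_flag, long_flag in switches:
        parser.add_argument(short_flag, long_flag, action="store_true")
    return parser


# Parse the same arguments as "python src/acestor/main.py -d 2024-12-01 -rws -t -m".
args = build_parser().parse_args(["-d", "2024-12-01", "-rws", "-t", "-m"])
print(args.date, args.generate_thresholds, args.generate_maps)
```

argparse derives attribute names from the long option, so `--generate-maps` becomes `args.generate_maps`.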

Pipeline Stages

The pipeline executes in the following sequence:

1. Download Case Data (optional): Downloads linelist data from S3 when -dl is specified
2. Process Linelist (optional): Converts raw linelist data to daily case counts when -pl is specified
3. Aggregate Case Data: Validates and aggregates case data at the configured granularity
4. Weather Data: Downloads weather data from S3 and uses it (-ws), or reuses previously downloaded data (-rws)
5. Generate Thresholds (optional): Calculates alert thresholds when -t is specified
6. Model Training & Predictions (optional): Trains models and generates predictions when -m is specified
7. Generate Maps (optional): Creates geographic visualizations when -gm is specified
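
To see which stages a given flag combination would trigger, the sequence above can be sketched as a small planning function. This is an illustration, not the pipeline's own code: `planned_stages` and its `flags` mapping are hypothetical, and it assumes that -ws takes precedence when both weather flags are given and that the weather stage is skipped when neither is given. The source does not state either behavior.

```python
from typing import Dict, List


def planned_stages(flags: Dict[str, bool]) -> List[str]:
    """Return stage names in the documented order for the given short flags.

    `flags` maps short flag names without the dash (e.g. "dl", "rws") to
    booleans; it is a hypothetical stand-in for the parsed arguments.
    """
    stages: List[str] = []
    if flags.get("dl"):
        stages.append("Download Case Data")
    if flags.get("pl"):
        stages.append("Process Linelist")
    # This stage is not marked optional in the stage list.
    stages.append("Aggregate Case Data")
    # Assumption: -ws wins over -rws if both are set.
    if flags.get("ws"):
        stages.append("Weather Data (download from S3)")
    elif flags.get("rws"):
        stages.append("Weather Data (reuse previous download)")
    if flags.get("t"):
        stages.append("Generate Thresholds")
    if flags.get("m"):
        stages.append("Model Training & Predictions")
    if flags.get("gm"):
        stages.append("Generate Maps")
    return stages


# Equivalent to "python src/acestor/main.py -rws -t -m".
for number, stage in enumerate(planned_stages({"rws": True, "t": True, "m": True}), start=1):
    print(number, stage)
```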

Example Commands

Standard run with freshly downloaded S3 weather data:

python src/acestor/main.py -ws -t -m

Use previously downloaded weather data (faster):

python src/acestor/main.py -rws -t -m

Full pipeline from linelist data:

python src/acestor/main.py -dl -pl -ws -t -m

Run with custom configuration:

python src/acestor/main.py -c config/custom_config.yaml -ws -t -m

Generate predictions with maps:

python src/acestor/main.py -ws -t -m -gm

Run predictions for a specific date:

python src/acestor/main.py -d 2024-12-01 -rws -t -m

Quick test run (using previously downloaded weather data, no maps):

python src/acestor/main.py -rws -t -m

Configuration File

The pipeline requires a YAML configuration file specifying:

root_dir: Root directory for data and outputs
region_name: Geographic region to process (e.g., “Karnataka”)
granularity: Level of spatial detail (“district” or “subdistrict”)
weather_data_path: Path to weather data
geojson_folder: Path to GeoJSON boundary files
raw_linelist_path: Path to raw linelist data (if using linelist input)
ihip_s3_location: S3 bucket location for IHIP data (if downloading from S3)
debug: Enable debug mode (processes only the last 3 years of data)

See the Input Data Specification page for detailed configuration file documentation.
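
A quick pre-flight check of the loaded configuration can catch missing keys before a long run. The sketch below works on a plain dictionary (for example, the result of loading the YAML file) and is not part of the pipeline: `check_config`, its parameters, and the example paths are hypothetical. Treating the first five keys as always required is an assumption based on the list above.

```python
from typing import Any, Dict, List

# Assumption: these keys are always needed; the remaining ones are conditional.
REQUIRED_KEYS = ["root_dir", "region_name", "granularity",
                 "weather_data_path", "geojson_folder"]
ALLOWED_GRANULARITIES = {"district", "subdistrict"}


def check_config(cfg: Dict[str, Any], uses_linelist: bool = False,
                 downloads_from_s3: bool = False) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if "granularity" in cfg and cfg["granularity"] not in ALLOWED_GRANULARITIES:
        problems.append(f"invalid granularity: {cfg['granularity']!r}")
    # Conditional keys, needed only for the corresponding inputs.
    if uses_linelist and "raw_linelist_path" not in cfg:
        problems.append("missing key: raw_linelist_path")
    if downloads_from_s3 and "ihip_s3_location" not in cfg:
        problems.append("missing key: ihip_s3_location")
    return problems


# Example values are placeholders, not paths shipped with the project.
example = {
    "root_dir": ".",
    "region_name": "Karnataka",
    "granularity": "district",
    "weather_data_path": "data/weather",
    "geojson_folder": "geojsons",
    "debug": False,
}
print(check_config(example, uses_linelist=True))
```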

Important Notes

Weather Data Options:

Use -ws for the first run or when you need fresh weather data
Use -rws for subsequent runs to save time (reuses previously downloaded data)
Weather data is cached in the weather_data_path specified in the config

Threshold and Prediction Flags:

Both -t (thresholds) and -m (predictions) are typically used together
-t generates alert thresholds based on historical data
-m trains models and generates predictions
These are separate flags to allow flexibility in pipeline execution

Date Selection:

By default, the pipeline runs predictions for today’s date
Use -d to run predictions for historical dates or specific future dates
Date must be in YYYY-MM-DD format
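
The date rules above can be expressed as a small helper that falls back to today's date and rejects anything not written as YYYY-MM-DD. This is a sketch under assumptions: `resolve_run_date` is a hypothetical name, and the pipeline may validate the -d value differently.

```python
import re
from datetime import date
from typing import Optional


def resolve_run_date(value: Optional[str] = None) -> date:
    """Return the run date: today when no value is given, else a strict YYYY-MM-DD parse."""
    if value is None:
        return date.today()
    # Enforce the exact zero-padded shape before parsing.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        raise ValueError(f"date must be in YYYY-MM-DD format, got {value!r}")
    # fromisoformat raises ValueError for impossible dates such as 2024-13-01.
    return date.fromisoformat(value)


print(resolve_run_date("2024-12-01"))
```

A function like this could also be passed as the `type=` argument of an argparse option so that malformed dates are rejected at parse time.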

GeoJSON Files

GeoJSON boundary files must be organized in the folder specified by geojson_folder in the configuration:

geojsons/
└── <StateName>/
    ├── districts/
    │   └── district_<LGD_CODE>.geojson
    └── subdistricts/
        └── subdistrict_<LGD_CODE>.geojson

See the Input Data Specification page for detailed GeoJSON requirements and format specifications.
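
The folder layout above maps directly to a path-building rule. The sketch below shows that rule with pathlib; `geojson_path` is a hypothetical helper, the LGD code "123" is a placeholder, and using the configured region name as the `<StateName>` folder is an assumption.

```python
from pathlib import Path


def geojson_path(geojson_folder: str, state_name: str,
                 granularity: str, lgd_code: str) -> Path:
    """Build the expected boundary-file path for one district or subdistrict."""
    if granularity not in ("district", "subdistrict"):
        raise ValueError(f"unsupported granularity: {granularity!r}")
    # Folder names are plural ("districts"), file prefixes singular ("district_").
    return (Path(geojson_folder) / state_name / f"{granularity}s"
            / f"{granularity}_{lgd_code}.geojson")


# "123" is a placeholder, not a real LGD code.
print(geojson_path("geojsons", "Karnataka", "district", "123").as_posix())
```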

Outputs

Pipeline outputs are saved to the results/ directory and include:

District-level predictions CSV
State-level predictions CSV
Log files in the logs/ directory
Map visualizations (when -gm is specified)