Running the Pipeline
The Acestor pipeline is executed through the main entry point at src/acestor/main.py.
Basic Usage
python src/acestor/main.py [OPTIONS]
Command-Line Options
| Flag | Description |
|---|---|
| -c | Path to configuration file (a default path is used when omitted) |
| -d | Date to run predictions for in YYYY-MM-DD format (default: today’s date) |
| -dl | Download linelist data from S3 (default: False) |
| -pl | Process linelist data to generate daily case counts (default: False) |
| -ws | Download and use weather data from S3 (default: False) |
| -rws | Use previously downloaded weather data from S3 without re-downloading (default: False) |
| -t | Generate alert thresholds based on historical data (default: False) |
| -m | Train the model and generate predictions (default: False) |
| -gm | Generate map visualizations of predictions (default: False) |
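For orientation, the short flags used on this page could be declared with `argparse` roughly as follows. This is a minimal sketch, not the actual contents of src/acestor/main.py: the `dest` names and the helper functions are hypothetical, and only the short flags and their documented defaults come from this page.

```python
import argparse
from datetime import date

# Hypothetical attribute names; only the short flags come from this page.
BOOLEAN_FLAGS = [
    ("-dl", "download_linelist"),
    ("-pl", "process_linelist"),
    ("-ws", "weather_s3"),
    ("-rws", "reuse_weather"),
    ("-t", "thresholds"),
    ("-m", "model"),
    ("-gm", "maps"),
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the short flags documented on this page."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-c", dest="config", default=None,
                        help="path to configuration file")
    # date.fromisoformat accepts YYYY-MM-DD; argparse reports a usage error
    # when the value cannot be parsed.
    parser.add_argument("-d", dest="run_date", type=date.fromisoformat,
                        default=None, help="prediction date, YYYY-MM-DD")
    for flag, dest in BOOLEAN_FLAGS:
        # store_true gives each stage flag a default of False.
        parser.add_argument(flag, dest=dest, action="store_true")
    return parser


def parse(argv):
    """Parse argv and fall back to today's date when -d is not given."""
    args = build_parser().parse_args(argv)
    if args.run_date is None:
        args.run_date = date.today()
    return args
```

Calling `parse(["-d", "2024-12-01", "-rws", "-t", "-m"])` yields a namespace in which only the reuse-weather, threshold, and model switches are set.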
Pipeline Stages
The pipeline executes in the following sequence:
1. Download Case Data (optional): Downloads linelist data from S3 when -dl is specified
2. Process Linelist (optional): Converts raw linelist data to daily case counts when -pl is specified
3. Aggregate Case Data: Validates and aggregates case data at the configured granularity
4. Weather Data: Downloads and uses S3 data (-ws) or uses previously downloaded data (-rws)
5. Generate Thresholds (optional): Calculates alert thresholds when -t is specified
6. Model Training & Predictions (optional): Trains models and generates predictions when -m is specified
7. Generate Maps (optional): Creates geographic visualizations when -gm is specified
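The stage ordering can be pictured as a fixed list filtered by the flags that were passed. The sketch below is illustrative only: `STAGES` and `planned_stages` are hypothetical names. It assumes that case aggregation always runs (it is the only stage listed without a flag) and that the weather stage is skipped when neither -ws nor -rws is given. The real pipeline may behave differently in that case.

```python
# Stage name paired with the flags that trigger it; None means "always runs".
STAGES = [
    ("Download Case Data", {"-dl"}),
    ("Process Linelist", {"-pl"}),
    ("Aggregate Case Data", None),
    ("Weather Data", {"-ws", "-rws"}),
    ("Generate Thresholds", {"-t"}),
    ("Model Training & Predictions", {"-m"}),
    ("Generate Maps", {"-gm"}),
]


def planned_stages(flags):
    """Return the stages that would run, in pipeline order, for the given flags."""
    selected = set(flags)
    return [
        name
        for name, triggers in STAGES
        if triggers is None or selected & triggers
    ]
```

For example, `planned_stages(["-rws", "-t", "-m"])` lists aggregation, weather, thresholds, and model training, in that order, regardless of the order in which the flags were supplied.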
Example Commands
Standard run with freshly downloaded S3 weather data:
python src/acestor/main.py -ws -t -m
Use previously downloaded weather data (faster):
python src/acestor/main.py -rws -t -m
Full pipeline from linelist data:
python src/acestor/main.py -dl -pl -ws -t -m
Run with custom configuration:
python src/acestor/main.py -c config/custom_config.yaml -ws -t -m
Generate predictions with maps:
python src/acestor/main.py -ws -t -m -gm
Run predictions for a specific date:
python src/acestor/main.py -d 2024-12-01 -rws -t -m
Quick test run (using previously downloaded weather data, no maps):
python src/acestor/main.py -rws -t -m
Configuration File
The pipeline requires a YAML configuration file specifying:
- root_dir: Root directory for data and outputs
- region_name: Geographic region to process (e.g., “Karnataka”)
- granularity: Level of spatial detail (“district” or “subdistrict”)
- weather_data_path: Path to weather data
- geojson_folder: Path to GeoJSON boundary files
- raw_linelist_path: Path to raw linelist data (if using linelist input)
- ihip_s3_location: S3 bucket location for IHIP data (if downloading from S3)
- debug: Enable debug mode (processes only last 3 years of data)
See the Input Data Specification page for detailed configuration file documentation.
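A quick sanity check on a loaded configuration might look like the sketch below. It works on a plain dictionary (for example, the result of parsing the YAML file) and is not part of the pipeline. `validate_config` and the choice of which keys count as required are assumptions made for illustration; the key names and the two granularity values come from the list above.

```python
# Keys treated as always required in this sketch (an assumption); the
# linelist and S3 keys are only needed for the -pl and -dl stages.
REQUIRED_KEYS = (
    "root_dir",
    "region_name",
    "granularity",
    "weather_data_path",
    "geojson_folder",
)
ALLOWED_GRANULARITY = {"district", "subdistrict"}


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append(f"missing required key: {key}")
    granularity = config.get("granularity")
    if granularity and granularity not in ALLOWED_GRANULARITY:
        problems.append(f"unsupported granularity: {granularity!r}")
    return problems


# Example values below are placeholders, not real project paths.
example = {
    "root_dir": "/data/acestor",
    "region_name": "Karnataka",
    "granularity": "district",
    "weather_data_path": "/data/acestor/weather",
    "geojson_folder": "/data/acestor/geojsons",
    "debug": False,
}
```

Running `validate_config(example)` returns an empty list, while a configuration with `granularity: "village"` would be reported as unsupported.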
Important Notes
Weather Data Options:
- Use -ws for the first run or when you need fresh weather data
- Use -rws for subsequent runs to save time (reuses previously downloaded data)
- Weather data is cached in the weather_data_path specified in the config
Threshold and Prediction Flags:
- Both -t (thresholds) and -m (predictions) are typically used together
- -t generates alert thresholds based on historical data
- -m trains models and generates predictions
- These are separate flags to allow flexibility in pipeline execution
Date Selection:
- By default, the pipeline runs predictions for today’s date
- Use -d to run predictions for historical dates or specific future dates
- Date must be in YYYY-MM-DD format
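The weather and date notes can be combined in a small wrapper that assembles the command line for a routine run. This is a sketch under stated assumptions: `has_cached_weather` and `build_command` are hypothetical helpers, and treating a non-empty directory at weather_data_path as "previously downloaded data" is a heuristic, not something the pipeline is documented to check.

```python
from pathlib import Path


def has_cached_weather(weather_data_path) -> bool:
    """Heuristic: the cache exists if the directory is present and non-empty."""
    cache = Path(weather_data_path)
    return cache.is_dir() and any(cache.iterdir())


def build_command(use_cache: bool, run_date=None, maps: bool = False) -> list:
    """Assemble an argument list for a thresholds-plus-predictions run."""
    cmd = ["python", "src/acestor/main.py"]
    if run_date is not None:
        # run_date is expected as a YYYY-MM-DD string.
        cmd += ["-d", run_date]
    # Reuse cached weather data when available, otherwise download it.
    cmd.append("-rws" if use_cache else "-ws")
    cmd += ["-t", "-m"]
    if maps:
        cmd.append("-gm")
    return cmd
```

The resulting list can be passed to `subprocess.run`, for instance `build_command(has_cached_weather(path), run_date="2024-12-01")`, where `path` is the configured weather_data_path.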
GeoJSON Files
GeoJSON boundary files must be organized in the folder specified by geojson_folder in the configuration:
geojsons/
└── <StateName>/
    ├── districts/
    │   └── district_<LGD_CODE>.geojson
    └── subdistricts/
        └── subdistrict_<LGD_CODE>.geojson
See the Input Data Specification page for detailed GeoJSON requirements and format specifications.
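The folder layout above implies a predictable path for each boundary file. The helper below is a hypothetical illustration of that naming scheme, not a function from the Acestor codebase. It uses `PurePosixPath` so the result is the same on every platform, and the LGD code in the usage note is a made-up placeholder.

```python
from pathlib import PurePosixPath


def geojson_path(geojson_folder, state_name, granularity, lgd_code):
    """Build the expected boundary-file path for a district or subdistrict."""
    if granularity not in ("district", "subdistrict"):
        raise ValueError(f"unsupported granularity: {granularity!r}")
    # Layout: <geojson_folder>/<StateName>/<granularity>s/<granularity>_<LGD_CODE>.geojson
    return (
        PurePosixPath(geojson_folder)
        / state_name
        / f"{granularity}s"
        / f"{granularity}_{lgd_code}.geojson"
    )
```

For example, `geojson_path("geojsons", "Karnataka", "district", "123")` gives `geojsons/Karnataka/districts/district_123.geojson`.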
Outputs
Pipeline outputs are saved to the results/ directory and include:
- District-level predictions CSV
- State-level predictions CSV
- Log files in logs/ directory
- Map visualizations (when -gm is specified)