Running the Pipeline
====================

The Acestor pipeline is executed through the main entry point at ``src/acestor/main.py``.

Basic Usage
-----------

.. code-block:: bash

   python src/acestor/main.py [OPTIONS]

Command-Line Options
--------------------

.. list-table:: Pipeline Arguments
   :header-rows: 1
   :widths: 30 15 55

   * - Option
     - Flag
     - Description
   * - ``--config``
     - ``-c``
     - Path to configuration file (default: ``config/dengue_pipeline.yaml``)
   * - ``--date``
     - ``-d``
     - Date to run predictions for in YYYY-MM-DD format (default: today's date)
   * - ``--download-linelist``
     - ``-dl``
     - Download linelist data from S3 (default: False)
   * - ``--process-linelist``
     - ``-pl``
     - Process linelist data to generate daily case counts (default: False)
   * - ``--download-and-use-weather-data-from-s3``
     - ``-ws``
     - Download and use weather data from S3 (default: False)
   * - ``--use-previously-downloaded-weather-data-from-s3``
     - ``-rws``
     - Use previously downloaded weather data from S3 without re-downloading (default: False)
   * - ``--generate-thresholds``
     - ``-t``
     - Generate alert thresholds based on historical data (default: False)
   * - ``--model-train-and-predict``
     - ``-m``
     - Train the model and generate predictions (default: False)
   * - ``--generate-maps``
     - ``-gm``
     - Generate map visualizations of predictions (default: False)

Pipeline Stages
---------------

The pipeline executes in the following sequence:

1. **Download Case Data** (optional): Downloads linelist data from S3 when ``-dl`` is specified
2. **Process Linelist** (optional): Converts raw linelist data to daily case counts when ``-pl`` is specified
3. **Aggregate Case Data**: Validates and aggregates case data at the configured granularity
4. **Weather Data**: Downloads and uses S3 data (``-ws``) or uses previously downloaded data (``-rws``)
5. **Generate Thresholds** (optional): Calculates alert thresholds when ``-t`` is specified
6. **Model Training & Predictions** (optional): Trains models and generates predictions when ``-m`` is specified
7. **Generate Maps** (optional): Creates geographic visualizations when ``-gm`` is specified

Example Commands
----------------

**Standard run with freshly downloaded S3 weather data:**

.. code-block:: bash

   python src/acestor/main.py -ws -t -m

**Use previously downloaded weather data (faster):**

.. code-block:: bash

   python src/acestor/main.py -rws -t -m

**Full pipeline from linelist data:**

.. code-block:: bash

   python src/acestor/main.py -dl -pl -ws -t -m

**Run with custom configuration:**

.. code-block:: bash

   python src/acestor/main.py -c config/custom_config.yaml -ws -t -m

**Generate predictions with maps:**

.. code-block:: bash

   python src/acestor/main.py -ws -t -m -gm

**Run predictions for a specific date:**

.. code-block:: bash

   python src/acestor/main.py -d 2024-12-01 -rws -t -m

**Quick test run (using previously downloaded weather data, no maps):**

.. code-block:: bash

   python src/acestor/main.py -rws -t -m

Configuration File
------------------

The pipeline requires a YAML configuration file specifying:

- ``root_dir``: Root directory for data and outputs
- ``region_name``: Geographic region to process (e.g., "Karnataka")
- ``granularity``: Level of spatial detail ("district" or "subdistrict")
- ``weather_data_path``: Path to weather data
- ``geojson_folder``: Path to GeoJSON boundary files
- ``raw_linelist_path``: Path to raw linelist data (if using linelist input)
- ``ihip_s3_location``: S3 bucket location for IHIP data (if downloading from S3)
- ``debug``: Enable debug mode (processes only last 3 years of data)

See the :doc:`data_specification` page for detailed configuration file documentation.
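As an illustration of how these keys fit together, here is a minimal configuration sketch. All paths and values below are purely illustrative placeholders, not shipped defaults; consult the :doc:`data_specification` page for the authoritative key semantics.

.. code-block:: yaml

   # Illustrative example only -- adjust every path and value for your deployment
   root_dir: /data/acestor
   region_name: "Karnataka"
   granularity: "district"
   weather_data_path: /data/acestor/weather
   geojson_folder: /data/acestor/geojsons
   raw_linelist_path: /data/acestor/linelist/raw
   ihip_s3_location: s3://example-bucket/ihip/
   debug: false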
Important Notes
---------------

**Weather Data Options:**

- Use ``-ws`` for the first run or when you need fresh weather data
- Use ``-rws`` for subsequent runs to save time (reuses previously downloaded data)
- Weather data is cached in the ``weather_data_path`` specified in the config

**Threshold and Prediction Flags:**

- Both ``-t`` (thresholds) and ``-m`` (predictions) are typically used together
- ``-t`` generates alert thresholds based on historical data
- ``-m`` trains models and generates predictions
- These are separate flags to allow flexibility in pipeline execution

**Date Selection:**

- By default, the pipeline runs predictions for today's date
- Use ``-d`` to run predictions for historical dates or specific future dates
- Date must be in YYYY-MM-DD format

GeoJSON Files
-------------

GeoJSON boundary files must be organized in the folder specified by ``geojson_folder`` in the configuration:

.. code-block:: text

   geojsons/
   └── /
       ├── districts/
       │   └── district_.geojson
       └── subdistricts/
           └── subdistrict_.geojson

See the :doc:`data_specification` page for detailed GeoJSON requirements and format specifications.

Outputs
-------

Pipeline outputs are saved to the ``results/`` directory and include:

- District-level predictions CSV
- State-level predictions CSV
- Log files in ``logs/`` directory
- Map visualizations (when ``-gm`` is specified)
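To see how the documented flags and defaults fit together, the command-line interface described above could be wired up roughly as follows. This is a hedged sketch using only the standard library's ``argparse``, not the pipeline's actual source; the flag names and defaults come from the options table, while the parser structure itself is an assumption.

```python
import argparse
from datetime import date

def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented flags (illustrative only)."""
    p = argparse.ArgumentParser(description="Acestor pipeline (sketch)")
    # Defaults below mirror the options table in this page
    p.add_argument("-c", "--config", default="config/dengue_pipeline.yaml")
    p.add_argument("-d", "--date", default=date.today().isoformat())
    p.add_argument("-dl", "--download-linelist", action="store_true")
    p.add_argument("-pl", "--process-linelist", action="store_true")
    p.add_argument("-ws", "--download-and-use-weather-data-from-s3",
                   action="store_true")
    p.add_argument("-rws", "--use-previously-downloaded-weather-data-from-s3",
                   action="store_true")
    p.add_argument("-t", "--generate-thresholds", action="store_true")
    p.add_argument("-m", "--model-train-and-predict", action="store_true")
    p.add_argument("-gm", "--generate-maps", action="store_true")
    return p

# A typical run: fresh S3 weather data, thresholds, and predictions
args = build_parser().parse_args(["-ws", "-t", "-m"])
print(args.generate_thresholds)  # True
print(args.download_linelist)    # False
```

Each stage flag is an independent boolean, which is why combinations like ``-rws -t -m`` or ``-dl -pl -ws -t -m`` compose freely on the command line.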