Input Data Specification

Case Data Format

The primary input to the Acestor pipeline is the district-level daily dengue case data, stored in datasets/cases_district_daily.csv.

File Location

datasets/cases_district_daily.csv

File Format

The file is a comma-separated values (CSV) file with the following structure:

Column Specifications

Case Data Columns

  • state.name (String): Name of the Indian state as given in the LGD (Local Government Directory) standard (e.g., “CHHATTISGARH”, “MAHARASHTRA”)

  • district.name (String): LGD standard name of the district within the state (e.g., “BALOD”, “DURG”)

  • date (String): Date of the observation in DD/MM/YYYY format (e.g., “01/05/2025”)

  • samples_tested (Integer): Number of samples tested for dengue in the district on that date

  • case (Integer): Number of confirmed dengue cases in the district on that date

  • state.ID (String): LGD standard identifier for the state, in the format state_<LGD code> (e.g., “state_22”)

  • region_id (String): LGD standard identifier for the district (region), in the format district_<LGD code> (e.g., “district_646”)
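As an illustration of the column types above, the following Python sketch parses one CSV row (as produced by csv.DictReader) into a typed record. The CaseRecord dataclass and the parse_row helper are hypothetical names used only for this example; they are not part of the Acestor pipeline. The date string is converted with the pattern %d/%m/%Y, which corresponds to DD/MM/YYYY.

```python
import csv
import datetime as dt
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical typed view of one row of cases_district_daily.csv."""
    state_name: str
    district_name: str
    date: dt.date
    samples_tested: int
    case: int
    state_id: str
    region_id: str


def parse_row(row: dict[str, str]) -> CaseRecord:
    """Convert the string fields of one CSV row into their documented types."""
    return CaseRecord(
        state_name=row["state.name"],
        district_name=row["district.name"],
        # DD/MM/YYYY: "01/05/2025" is 1 May 2025, not 5 January.
        date=dt.datetime.strptime(row["date"], "%d/%m/%Y").date(),
        samples_tested=int(row["samples_tested"]),
        case=int(row["case"]),
        state_id=row["state.ID"],
        region_id=row["region_id"],
    )


sample = (
    "state.name,district.name,date,samples_tested,case,state.ID,region_id\n"
    "CHHATTISGARH,DURG,01/05/2025,22,0,state_22,district_378\n"
)
record = parse_row(next(csv.DictReader(io.StringIO(sample))))
print(record.date, record.samples_tested)  # 2025-05-01 22
```

Parsing the date explicitly matters because DD/MM/YYYY is easily misread as MM/DD/YYYY by tools that guess date formats.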

Data Characteristics

  • Temporal Coverage: The model performs better with more data. Thresholds are calculated in one of two ways: from historical data (previous years) or from the data of the past n weeks. For the model to work accurately, the data must therefore either extend up to the current day or include data from previous years.

  • Spatial Granularity: District level

  • Geographic Scope: Indian states and their districts

  • Primary Metrics:

    • Number of samples tested per day

    • Number of confirmed dengue cases per day
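The two threshold modes described under Temporal Coverage suggest a simple pre-flight check on each district's series. The sketch below assumes the dates have already been parsed to datetime.date values; the function name coverage_summary, the max_lag_days parameter, and the "previous calendar year" rule are illustrative assumptions, not the pipeline's actual criteria.

```python
import datetime as dt
from collections.abc import Iterable


def coverage_summary(
    dates: Iterable[dt.date], today: dt.date, max_lag_days: int = 0
) -> dict[str, bool]:
    """Hypothetical check of the two data conditions for threshold calculation.

    recent:     the latest observation is at most `max_lag_days` before `today`
                (supports a threshold computed from the past n weeks).
    historical: some observation falls in a calendar year before `today.year`
                (supports a threshold computed from previous years).
    usable:     at least one of the two conditions holds.
    """
    dates = list(dates)
    if not dates:
        return {"recent": False, "historical": False, "usable": False}
    recent = (today - max(dates)).days <= max_lag_days
    historical = min(dates).year < today.year
    return {"recent": recent, "historical": historical, "usable": recent or historical}


series = [dt.date(2025, 5, 1), dt.date(2025, 5, 2)]
print(coverage_summary(series, today=dt.date(2025, 5, 2)))
# {'recent': True, 'historical': False, 'usable': True}
```

Setting max_lag_days above zero would tolerate reporting delays; whether the pipeline allows any such lag is not specified in this section.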

Example Records

state.name,district.name,date,samples_tested,case,state.ID,region_id
CHHATTISGARH,BALOD,01/05/2025,1,0,state_22,district_646
CHHATTISGARH,BALODABAZAR-BHATAPARA,01/05/2025,3,0,state_22,district_644
CHHATTISGARH,DURG,01/05/2025,22,0,state_22,district_378
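To confirm that a file matches this layout before it enters the pipeline, its header row can be compared against the documented column order. This is a minimal sketch using Python's standard csv module; EXPECTED_COLUMNS and read_case_rows are illustrative names. It reads from any text stream, so it can be tried on the example records above without the real file.

```python
import csv
import io
from typing import TextIO

# Column order as listed in the column specification and the example header.
EXPECTED_COLUMNS = [
    "state.name", "district.name", "date", "samples_tested",
    "case", "state.ID", "region_id",
]


def read_case_rows(stream: TextIO) -> list[dict[str, str]]:
    """Read all rows as dicts, failing fast if the header differs from the spec."""
    reader = csv.DictReader(stream)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


example = """state.name,district.name,date,samples_tested,case,state.ID,region_id
CHHATTISGARH,BALOD,01/05/2025,1,0,state_22,district_646
CHHATTISGARH,BALODABAZAR-BHATAPARA,01/05/2025,3,0,state_22,district_644
CHHATTISGARH,DURG,01/05/2025,22,0,state_22,district_378
"""
rows = read_case_rows(io.StringIO(example))
print(len(rows), rows[2]["district.name"])  # 3 DURG
```

For the real file, the same function can be called on open("datasets/cases_district_daily.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption, since this section does not state one.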

Data Quality Notes

  • Case Values: Case counts are non-negative integers

  • Date Format: All dates follow the DD/MM/YYYY format

  • Identifiers: Both state.ID and region_id follow the LGD standard.
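The data quality notes can be expressed as row-level checks. The sketch below is an assumption about how such checks might look, not the pipeline's own validation. In particular, it treats the identifier formats as the prefix state_ or district_ followed by ASCII digits, which matches the examples but is inferred rather than specified. The names row_problems, STATE_ID, REGION_ID, and DATE are hypothetical.

```python
import re
from datetime import datetime

NON_NEG_INT = re.compile(r"[0-9]+")
DATE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")
STATE_ID = re.compile(r"state_[0-9]+")
REGION_ID = re.compile(r"district_[0-9]+")


def row_problems(row: dict[str, str]) -> list[str]:
    """Return the quality-rule violations for one CSV row (empty list if clean)."""
    problems = []
    # Case counts must be non-negative integers ("0", "12"; not "-1" or "2.5").
    if not NON_NEG_INT.fullmatch(row["case"]):
        problems.append(f"case is not a non-negative integer: {row['case']!r}")
    # The regex enforces the DD/MM/YYYY shape; strptime then rejects impossible
    # dates such as 31/02/2025.
    if not DATE.fullmatch(row["date"]):
        problems.append(f"date is not DD/MM/YYYY: {row['date']!r}")
    else:
        try:
            datetime.strptime(row["date"], "%d/%m/%Y")
        except ValueError:
            problems.append(f"date does not exist: {row['date']!r}")
    # Identifiers are expected to look like state_<code> and district_<code>.
    if not STATE_ID.fullmatch(row["state.ID"]):
        problems.append(f"bad state.ID: {row['state.ID']!r}")
    if not REGION_ID.fullmatch(row["region_id"]):
        problems.append(f"bad region_id: {row['region_id']!r}")
    return problems


good = {
    "state.name": "CHHATTISGARH", "district.name": "BALOD", "date": "01/05/2025",
    "samples_tested": "1", "case": "0", "state.ID": "state_22",
    "region_id": "district_646",
}
print(row_problems(good))  # []
bad = dict(good, case="-1", date="2025-05-01")
print(len(row_problems(bad)))  # 2
```

Collecting all problems per row, rather than stopping at the first, makes it easier to report every defect in a file in a single pass.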