Input Data Specification
Case Data Format
The primary input to the Acestor pipeline is the district-level daily dengue case data, stored in datasets/cases_district_daily.csv.
File Location
datasets/cases_district_daily.csv
File Format
The file is a comma-separated values (CSV) file with the following structure:
Column Specifications
| Column Name | Data Type | Description |
|---|---|---|
| state.name | String | LGD Standard Name of the Indian state (e.g., “CHHATTISGARH”, “MAHARASHTRA”) |
| district.name | String | LGD Standard Name of the district within the state (e.g., “BALOD”, “DURG”) |
| date | String | Date of the observation in DD/MM/YYYY format (e.g., “01/05/2025”) |
| samples_tested | Integer | Number of samples tested for dengue on that date in the district |
| case | Integer | Number of confirmed dengue cases on that date in the district |
| state.ID | String | LGD Standard identifier for the state in the format state_lgd-code (e.g., “state_22”) |
| region_id | String | LGD Standard identifier for the district/region in the format district_lgd-code (e.g., “district_646”) |
Data Characteristics
Temporal Coverage: The model performs better with more data. Thresholds are calculated in one of two ways: from historical data, or from the past n weeks of data. The dataset must therefore either be current up to the present day or include data from previous years for the model to work accurately.
Spatial Granularity: District level
Geographic Scope: Indian states and their districts
Primary Metrics:
Number of samples tested per day
Number of confirmed dengue cases per day
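The temporal coverage requirement above can be checked before running the pipeline. The following is a minimal sketch, not part of Acestor itself: the function name `coverage_summary` and the `max_lag_days` tolerance are hypothetical, and "data of previous years" is interpreted here as the earliest observation falling in a calendar year before the current one.

```python
from datetime import date, timedelta


def coverage_summary(observation_dates, today, max_lag_days=0):
    """Summarise whether a set of observation dates meets either coverage condition.

    max_lag_days is a hypothetical tolerance: how many days behind `today`
    the latest observation may be while still counting as current.
    """
    earliest = min(observation_dates)
    latest = max(observation_dates)
    # Condition 1: the data runs up to the current day (within the tolerance).
    is_current = latest >= today - timedelta(days=max_lag_days)
    # Condition 2 (assumed reading): some data comes from an earlier calendar year.
    has_prior_years = earliest.year < today.year
    return {
        "earliest": earliest,
        "latest": latest,
        "is_current": is_current,
        "has_prior_years": has_prior_years,
        "usable": is_current or has_prior_years,
    }


summary = coverage_summary(
    [date(2024, 6, 1), date(2025, 5, 1)],
    today=date(2025, 5, 10),
)
```

In this example the latest observation is nine days old, so the data is not current, but it still qualifies because it includes an observation from the previous year.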
Example Records
state.name,district.name,date,samples_tested,case,state.ID,region_id
CHHATTISGARH,BALOD,01/05/2025,1,0,state_22,district_646
CHHATTISGARH,BALODABAZAR-BHATAPARA,01/05/2025,3,0,state_22,district_644
CHHATTISGARH,DURG,01/05/2025,22,0,state_22,district_378
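As an illustration of how these records could be read, the sketch below parses the example rows with Python's standard `csv` module, converting the date to a `datetime.date` and the two count columns to integers. The helper name `parse_rows` and the underscore-style output keys are illustrative choices, not names used by the pipeline.

```python
import csv
import io
from datetime import datetime

# The example records from this page, embedded so the snippet runs on its own.
SAMPLE = """state.name,district.name,date,samples_tested,case,state.ID,region_id
CHHATTISGARH,BALOD,01/05/2025,1,0,state_22,district_646
CHHATTISGARH,BALODABAZAR-BHATAPARA,01/05/2025,3,0,state_22,district_644
CHHATTISGARH,DURG,01/05/2025,22,0,state_22,district_378
"""


def parse_rows(handle):
    """Yield one dict per CSV row with a typed date and integer counts."""
    for row in csv.DictReader(handle):
        yield {
            "state_name": row["state.name"],
            "district_name": row["district.name"],
            # Dates are day-first: DD/MM/YYYY.
            "date": datetime.strptime(row["date"], "%d/%m/%Y").date(),
            "samples_tested": int(row["samples_tested"]),
            "case": int(row["case"]),
            "state_id": row["state.ID"],
            "region_id": row["region_id"],
        }


records = list(parse_rows(io.StringIO(SAMPLE)))
```

To read the real file, the same function can be passed a handle opened with `open("datasets/cases_district_daily.csv", newline="")`.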
Data Quality Notes
Case Values: Case counts are non-negative integers
Date Format: All dates follow the DD/MM/YYYY format
Identifiers: Both state.ID and region_id are based on the LGD Standard.
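These notes translate naturally into per-row checks. The sketch below is one possible validator, written under a few assumptions that go beyond the text: the function name `validate_row` is hypothetical, the non-negative integer rule is applied to samples_tested as well as case, and the LGD code portion of each identifier is assumed to be numeric, as in the examples.

```python
import re
from datetime import datetime

COUNT_RE = re.compile(r"[0-9]+")                    # non-negative integer
DATE_RE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")  # zero-padded DD/MM/YYYY
STATE_ID_RE = re.compile(r"state_[0-9]+")           # assumes numeric LGD codes
REGION_ID_RE = re.compile(r"district_[0-9]+")       # assumes numeric LGD codes


def validate_row(row):
    """Return a list of problems found in one CSV row (empty if none)."""
    problems = []
    # Assumption: samples_tested follows the same rule as case.
    for column in ("samples_tested", "case"):
        if not COUNT_RE.fullmatch(row.get(column) or ""):
            problems.append(f"{column}: expected a non-negative integer")
    date_text = row.get("date") or ""
    if not DATE_RE.fullmatch(date_text):
        problems.append("date: expected DD/MM/YYYY")
    else:
        try:
            # Rejects well-shaped but impossible dates such as 31/02/2025.
            datetime.strptime(date_text, "%d/%m/%Y")
        except ValueError:
            problems.append("date: not a valid calendar date")
    if not STATE_ID_RE.fullmatch(row.get("state.ID") or ""):
        problems.append("state.ID: expected state_<lgd-code>")
    if not REGION_ID_RE.fullmatch(row.get("region_id") or ""):
        problems.append("region_id: expected district_<lgd-code>")
    return problems


good_row = {
    "state.name": "CHHATTISGARH",
    "district.name": "DURG",
    "date": "01/05/2025",
    "samples_tested": "22",
    "case": "0",
    "state.ID": "state_22",
    "region_id": "district_378",
}
bad_row = dict(good_row, date="2025-05-01", case="-1")

good_problems = validate_row(good_row)
bad_problems = validate_row(bad_row)
```

The shape check on the date runs before `strptime` because `strptime` alone also accepts non-padded values such as 1/5/2025.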