
Data science for epidemiology: a case study of dengue in Brazil.

Roster, Kirstin Ingrid Oliveira

Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação 2022-12-19

  • Title:
    Data science for epidemiology: a case study of dengue in Brazil.
  • Author: Roster, Kirstin Ingrid Oliveira
  • Advisor: Rodrigues, Francisco Aparecido
  • Subjects: Machine Learning; Dengue; Causal Inference; Disease Forecasting
  • Notes: Thesis (Doctorate)
  • Description: This thesis is a collection of studies on the application of data science to problems in dengue epidemiology. We leverage machine learning models together with methods from causal inference for two important public health objectives: (i) forecasting disease prevalence to anticipate outbreaks and allocate resources, and (ii) understanding disease drivers to develop effective interventions. Using diverse data on disease prevalence, climate, and human behavior, we demonstrate how machine learning can be applied in three different contexts: first, to develop accurate predictions of infections across Brazilian cities; second, to generalize predictions to new diseases; and finally, as an intermediate step for causal inference.
    In Chapter 2, we compare machine learning algorithms for dengue prediction and assess the value of causal feature selection. We find variation in the optimal predictors in national (domain-invariant) and single-city (domain-specific) settings. Decision tree ensemble models perform best at national scale. Causal feature selection performs best according to one of four error metrics, though it is not the optimal method across all cities in single-city forecasts. This result helps us better understand the potential within-domain cost in predictive performance of causally informed models.
    In Chapter 3, we assess the generalizability of the dengue models developed in the prior chapter. Based on the hypothesis that diseases may share common time series characteristics, we test the effectiveness of knowledge transfer from endemic to novel diseases to improve predictions in low-data settings. We compare instance- and parameter-based transfer learning algorithms and evaluate performance on both synthetic and empirical data. Results suggest that transfer learning offers potential for early pandemic response and that the most predictive algorithm and transfer method depend on the similarity of the disease pairs.
    In Chapter 4, we consider the contribution of machine learning to causal inference by examining the impact of the COVID-19 pandemic on dengue in Brazil. We estimate the gap between expected and observed dengue cases using an interrupted time series design. We also decompose the gap into the impacts of climate conditions, pandemic-induced changes in reporting, human susceptibility, and human mobility. We find that there is considerable variation across the country in both overall pandemic impact on dengue and the relative importance of individual drivers. This analysis helps shed light on the data gaps caused by the COVID-19 pandemic and, more generally, on possible intervention targets to help control dengue in the future.
  • DOI: 10.11606/T.55.2022.tde-27022023-142607
  • Publisher: Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação
  • Creation/publication date: 2022-12-19
  • Format: Adobe PDF
  • Language: English
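
The description mentions estimating the gap between expected and observed dengue cases with an interrupted time series design. As a rough illustration of that idea only, the sketch below fits a linear trend to a pre-interruption series, projects it forward, and subtracts the projection from the observed counts. The function names and the case counts are hypothetical, and the plain least-squares trend is a stand-in assumption: the thesis describes machine learning models and additional drivers (climate, reporting, susceptibility, mobility) that are not reproduced here.

```python
def fit_linear_trend(y):
    """Ordinary least squares fit of y = intercept + slope * t, t = 0..len(y)-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def its_gap(pre, post):
    """Project the pre-period trend into the post period and return
    (expected counts, observed minus expected) for each post-period step."""
    intercept, slope = fit_linear_trend(pre)
    start = len(pre)  # first time index after the interruption
    expected = [intercept + slope * (start + k) for k in range(len(post))]
    gaps = [obs - exp for obs, exp in zip(post, expected)]
    return expected, gaps


# Hypothetical monthly case counts, invented purely for illustration.
pre_counts = [100, 110, 120, 130, 140, 150]   # before the interruption
post_counts = [120, 125, 130]                 # after the interruption

expected, gaps = its_gap(pre_counts, post_counts)
total_gap = sum(gaps)  # negative means fewer cases observed than expected
print(expected, gaps, total_gap)
```

A negative total gap in this toy setup means fewer cases were reported than the pre-period trend would predict. On its own the gap does not say whether transmission fell or reporting dropped, which is the kind of question the decomposition in Chapter 4 is described as addressing.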