NSF CAREER: Learning from When, Where and by Whom Data is Generated for Advancing Public Health Studies (2019-2024)

Improving disease prevention through robust and high-granularity measures of lifestyle, environmental and social factors from daily life will improve healthcare by enabling precise and focused proactive interventions. This will dramatically change the healthcare paradigm in this country and significantly reduce costs and illnesses, more so than a solely reactive focus on disease diagnosis and treatment. Public health is the study of these daily life factors and prevention efforts. New person-generated data (PGD) from Internet and mobile sources, such as mHealth, social media, wearables, and smartphone apps, offer an unprecedented opportunity to provide sub-daily, as well as local, neighborhood-level measures of lifestyle, environmental and social factors from daily life. However, the impact of these data has yet to be fully realized for public health efforts. In part, this is because existing research efforts on PGD often focus on processing the content of data in isolation and do not consider human data-sharing patterns, that is, who contributes the data, when it is contributed and from where it is contributed. By accounting for these attributes, this project aims to improve the validity and reliability of measures extracted from PGD and enable improved understanding of high-granularity health risks and outcomes. The project will also provide a highly integrated research and educational program for public health practitioners, students, and community members in the context of PGD and public health by: (1) preparing students to use computer science in today’s job landscape via a problem-based learning class; (2) increasing high-school students’ exposure to computer science in the real world with a focus on applications of computer science; and (3) disseminating scientific understanding of computer science in the public health and general community.
In conjunction, this work will improve both computer science and public health practice and research through method development and exposure of diverse community members and community-oriented professionals to the utility of data mining and machine learning.


Students who have been supported in full or in part by this award include:

  • Vishwali Mhasawade
  • Harvineet Singh

Broader Impacts

  • Hosting two high school students each summer for research projects through the NYU ARISE (Applied Research Innovations in Science and Engineering) program in 2019, 2020, 2021 and 2022
  • Designing a new Machine Learning in Public Health course at NYU School of Global Public Health
  • Organizing the first and recurring Machine Learning in Public Health workshop at NeurIPS (2020, 2021)


This material is based upon work supported by the National Science Foundation under Grant No. 1845487.
Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Last update: Apr. 21, 2022


Thorpe, L.E., Chunara, R., Roberts, T., Pantaleo, N., Irvine, C., Conderino, S., Li, Y., Hsieh, P.Y., Gourevitch, M.N., Levine, S. and Ofrane, R., 2022. Building Public Health Surveillance 3.0: Emerging Timely Measures of Physical, Economic, and Social Environmental Conditions Affecting Health. American Journal of Public Health, (0), pp.e1-e10.

Singh, H., Mhasawade, V. and Chunara, R., 2022. Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database. PLOS Digital Health, 1(4), p.e0000023. Highlighted in ACM News.

Mhasawade, V., Zhao, Y. and Chunara, R., 2021. Machine learning and algorithmic fairness in public and population health. Nature Machine Intelligence, 3(8), pp.659-666.

Zhao, Y., Wood, E.P., Mirin, N., Cook, S.H. and Chunara, R., 2021. Social determinants in machine learning cardiovascular disease prediction models: a systematic review. American Journal of Preventive Medicine, 61(4), pp.596-605. Epub 2021 Jul 27. Highlighted in the Wall Street Journal.

Mhasawade, V. and Chunara, R., 2021, July. Causal multi-level fairness. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 784-794).

Singh, H., Singh, R., Mhasawade, V. and Chunara, R., 2021, March. Fairness violations and mitigation under covariate shift. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 3-13).

Chunara, R., Zhao, Y., Chen, J., Lawrence, K., Testa, P.A., Nov, O. and Mann, D.M., 2021. Telemedicine and healthcare disparities: a cohort study in a large healthcare system in New York City during COVID-19. Journal of the American Medical Informatics Association, 28(1), pp.33-41.

Chunara, R. and Cook, S.H., 2020. Using digital data to protect and promote the most vulnerable in the fight against COVID-19. Frontiers in Public Health, 8, p.296.

Tian, Y. and Chunara, R., 2020, May. Quasi-experimental designs for assessing response on social media to policy changes. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 14, pp. 671-682). Best paper honorable mention.

Mhasawade, V., Rehman, N.A. and Chunara, R., 2020, April. Population-aware hierarchical Bayesian domain adaptation via multi-component invariant learning. In Proceedings of the ACM Conference on Health, Inference, and Learning (pp. 182-192).

Akbari, M. and Chunara, R., 2019, October. Using contextual information to improve blood glucose prediction. In Machine Learning for Healthcare Conference (pp. 91-108). PMLR.

Abdur Rehman, N., Saif, U. and Chunara, R., 2019. Deep landscape features for improving vector-borne disease prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 44-51). Selected for oral presentation.

Rehman, N.A., Relia, K. and Chunara, R., 2018, November. Creating full individual-level location timelines from sparse social media data. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 379-388).

Relia, K., Akbari, M., Duncan, D. and Chunara, R., 2018. Socio-spatial self-organizing maps: using social media to assess relevant geographies for exposure to social processes. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), pp.1-23.