Dalton Simancek

Privacy-Preserving NLP Researcher | PhD Candidate | Expertise in Large-Scale Data Analytics for Consumer Health and Smart Home Software

PhD candidate specializing in privacy-preserving NLP for healthcare, with industry experience in large-scale data analytics for consumer health and smart home software. I am seeking full-time opportunities to apply my academic research and large-scale data expertise to developing impactful ML solutions.

(248) 568-4494 · daltonsi@umich.edu

Experience

NLP Graduate Researcher

University of Michigan Medical School, NLP4HEALTH Research Group | Ann Arbor, MI

Responsible Health NLP Research and Evaluation

  • Develop privacy enhancements to transformer-based clinical text de-identification model pipelines that reduce the leakage of sensitive patient names by 70% and mitigate the impact of system false negatives at scale.
  • Present research on pseudonymization and error-handling strategies at EACL and the AMIA Annual Symposium.
  • Implement a novel NLP fairness evaluation framework utilizing patient demographic information to test de-identification performance gaps across patient subpopulations.

NLP Model Development for Patient Privacy

  • Develop a transformer-based de-identification NLP model with the lead researcher, enabling automated processing of thousands of real Electronic Health Record (EHR) notes in compliance with HIPAA regulations.
  • Manage de-identification model assets to standardize configuration files, generate logs, run tests, and streamline installation procedures for batch processing in research and production environments.
  • Integrate differential privacy and federated learning frameworks into the model training pipelines of clinical text de-identification workflows.

Cross-Functional Leadership and Mentorship

  • Supervise four student submissions to shared tasks for the 9th Social Media Mining for Health Research and Applications Workshop and Shared Tasks (SMM4H) at ACL 2024.
  • Serve as an instructor for the graduate-level Natural Language Processing for Health Data course, lecturing on fundamental ML/AI topics such as neural networks and on NLP subtasks such as information extraction.
  • Collaborate with the Michigan Medicine Data Office to further evaluate the de-identification system on unseen clinical text documents.
Aug 2022 - Present

Data Scientist

Apple - HomeKit Software Framework | Cupertino, CA

Large-Scale Data Analytics Products for Business Understanding and Quality Impact

  • Led the development and delivery of HomeKit earnings reports tracking quarterly customer adoption rates, feature usage, and accessory installations for eight company calls with thousands of Apple shareholders.
  • Spearheaded development and testing of 30+ always-on dashboards and analytics scripts, delivering real-time and daily analytics updates to dozens of engineers and executives using logging from millions of Apple devices.
  • Collaborated with the lead analytics engineer to create and validate over 50 new structured logging metrics, integrated into 10+ HomeKit software updates for millions of customer devices.

Cross-Functional Collaboration and Mentorship

  • Guided HomeKit leadership on ending HomeKit support for older iPhone and HomePod models through analyses of the user population and growth trends; the change affected less than 5% of the user base while enabling new security and framework updates.
  • Advised the HomeKit Product team on defining HomeKit user profiles from analyses of device populations, HomeKit usage, and smart home device composition, guiding product direction with novel observations about the HomeKit user population.
  • Mentored data analysts and engineers on HomeKit data pipeline nuances, focusing on opt-in populations, multi-generation software metrics, and internal development build data volatility.
Nov 2019 - Oct 2021

Software Engineering Stability Analyst

Apple - watchOS Stability | Cupertino, CA

Distributed Computing for Software Analytics Strategy

  • Independently led and maintained a backend MapReduce job for a highly available, always-on software analytics tool used by hundreds of engineers to aggregate macOS device logs into summary usage statistics.
  • Wrote dozens of custom Hadoop MapReduce jobs for large-scale batch processing on production grids to track real-time logging and bugs from specific customer device populations.
  • Developed Python scripts to automate visualization of termination rates across multiple builds from batch-processed customer device logs, reducing required analyst hours by 50%.

Data Project Leadership and Cross-Functional Collaboration

  • Served as engineering project manager for Stability at watchOS Bug Review Boards, collaborating with senior engineers and executives to guide development on concurrent OS updates using analytics-driven strategies.
  • Provided mentorship and training to watchOS team managers and engineers on best practices for monitoring stability metrics and incorporating those signals into development planning.
  • Managed stability project benchmarks by leading cross-functional teams to triage bug fixes and set strategic stability goals for watchOS and its member applications, including Health, Music, Siri, and Fitness Workouts.

Data Analytics for watchOS Software Development

  • Independently analyzed and monitored termination rates for over 10 watchOS stability issues, including kernel panics and crashes within the OS, user-facing applications, and background daemons.
  • Curated Top Issue lists ranking software bugs by aggregate statistics and impact.
  • Created weekly reports and stability briefings to set and track stability milestones across the software cycle, from early internal development through a series of developer builds and into public release.

Continuous Knowledge Sharing

  • Led the documentation working group on the Data Analytics team, setting documentation targets for each sub-team (DevOps, Web Development, and Data Science) that produced dozens of pages of team-wide documentation.
  • Documented dozens of stability metric tutorials and sample calculations for developer audiences and onboarding employees.
July 2017 - Nov 2019

Research Area Specialist | Research Assistant

University of Michigan Health System - CHEAR Unit | Ann Arbor, MI

Research Leadership

  • Wrote and was awarded a $20,000 grant from Genetic Alliance to film patients and families managing diets with Phenylketonuria (PKU).
  • Co-authored grant proposals and research manuscripts, including two publications: a 2x2 factorial design survey relating parents' intent to use antibiotics to the perceived contagiousness of "pink eye" (Clinical Pediatrics), and a qualitative study of focus groups with physicians on the role of genetic testing in state newborn screening programs (International Journal of Neonatal Screening, IJNS).

Healthcare Data Analysis and Project Management

  • Coordinated the mailing and data collection for a national survey of physicians on communicating newborn screening results to parents.
  • Administered a parent survey and analyzed the results for a 2x2 factorial design study relating perceived contagiousness to antibiotic use.
  • Co-facilitated physician focus groups concerning the role of genomics in newborn screening and annotated the transcripts following a codebook.
  • Translated the patient onboarding curriculum for a Michigan Medicine adolescent weight loss program into a series of instructional videos.

Health Data Visualization

  • Created 30+ infographics to educate physicians, patients, and parents about diagnostic testing, patient experiences, and treatment curricula related to newborn screening, diabetes, and weight management.
July 2013 - Oct 2016

Publications

  • 2024 | Yusuf HR, Belmonte D, Simancek D, Vydiswaran VGV. 712forTask7 at #SMM4H ACL 2024. Task 7: Classifying Spanish tweets annotated by humans versus machines with BETO models. https://aclanthology.org/2024.smm4h-1.35/
  • 2024 | Fraga V, Nair N, Simancek D, Vydiswaran VGV. LHS712NV at #SMM4H ACL 2024. Task 4: Using BERT to classify Reddit posts on non-medical substance use. https://aclanthology.org/2024.smm4h-1.34/
  • 2024 | Zheng Y, Gong J, Ren S, Simancek D, Vydiswaran VGV. LHS712_ADENotGood at #SMM4H ACL 2024. Task 1: Deep-LLMADEminer: A deep learning and LLM pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter. https://aclanthology.org/2024.smm4h-1.30/
  • 2024 | Simancek D, Vydiswaran VGV. Handling Name Errors of a BERT-Based Text De-Identification System: Insights from Stratified Sampling and Markov-based Pseudonymization. In: Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024). Association for Computational Linguistics; 2024 Mar. p. 1–7. https://aclanthology.org/2024.caldpseudo-1.1/
  • 2023 | Alkiek K, Simancek D, Tan J, Weissman N, Ferraro J, Vydiswaran VGV. DeiDoc: Automatic De-identification of Notes Using Long-Document Transformers. In: Poster Session. 2023.
  • 2022 | Goldenberg AJ, Ponsaran R, Gaviglio A, Simancek D, Tarini BA. Genomics and Newborn Screening: Perspectives of Public Health Programs. IJNS. 2022 Jan 28;8(1):11. doi:10.3390/ijns8010011. https://www.mdpi.com/2409-515X/8/1/11
  • 2016 | Scherer LD, Finan C, Simancek D, Finkelstein JI, Tarini BA. Effect of “Pink Eye” Label on Parents’ Intent to Use Antibiotics and Perceived Contagiousness. Clin Pediatr (Phila). 2016 Jun;55(6):543–8. doi: 10.1177/0009922815601983. http://journals.sagepub.com/doi/10.1177/0009922815601983

Education

University of Michigan Medical School

Doctor of Philosophy
Health Infrastructures and Learning Systems

Coursework and Dissertation Interests

NLP for Health Data | Deep Learning Architectures | Computational Bioinformatics | Distributed Learning | Differential Privacy | Implementation Science Frameworks

September 2015 - April 2017

University of Michigan School of Information

Master of Science
Information Analysis and Retrieval
September 2015 - April 2017

Friedrich-Alexander Universität Erlangen-Nürnberg

Post-Bachelor Fellowship | Erlangen, Germany
German, Media Studies
August 2012 - April 2013

Kalamazoo College

Bachelor of Arts
Major: English | Minor: German, Media Studies
August 2008 - June 2012

Skills

Data Analysis

Python | Statistics | ML Scripting + Coursework


Visualization

Tableau | Photoshop + Illustrator


Data Engineering

SQL (MySQL, Vertica, Impala, Oracle) | Git | PySpark/MapReduce


Storytelling

Engineering/Executive Reviews | Grant/Manuscript Writing | Documentary Filmmaking


Qualitative Research Methods

Survey Design | User Testing | Focus Groups | Interviews