Autism Spectrum Disorder (ASD) affects approximately 1 in 36 children in the United States, underscoring the urgent need for accessible early screening solutions. Current clinical assessments are time-intensive, requiring detailed observation and professional oversight. Existing AI models often depend on lab-collected data or resource-intensive signals such as eye tracking or pose estimation, which limits scalability.
The development of this ASD screening system began with a preprocessing pipeline to clean and curate a dataset of short video clips featuring children in natural environments. Raw videos were sourced from real-world gameplay recordings, in which human presence and interaction quality varied widely. Automated filters first removed non-human frames and low-quality footage: a human-detection heuristic eliminated videos without visible people, and PySceneDetect segmented the remaining videos into semantically distinct scenes to isolate consistent behaviors, as sketched below.
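As a rough illustration, the sketch below pairs a person-presence check with PySceneDetect's scene segmentation. The specific human-detection heuristic is not documented here, so OpenCV's stock HOG person detector stands in as an assumption; function names, sampling rates, and thresholds are illustrative.

```python
import cv2
from scenedetect import detect, ContentDetector

def video_has_person(path: str, sample_every: int = 30, min_hits: int = 3) -> bool:
    """Heuristic: keep a video only if a person is detected in at least
    `min_hits` sampled frames (assumed criterion, not the original one)."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    cap = cv2.VideoCapture(path)
    hits = frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % sample_every == 0:
            boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
            if len(boxes) > 0:
                hits += 1
                if hits >= min_hits:
                    break
        frame_idx += 1
    cap.release()
    return hits >= min_hits

def segment_scenes(path: str):
    """Split a video into visually consistent scenes with PySceneDetect;
    returns a list of (start, end) timecode pairs, one per scene."""
    return detect(path, ContentDetector(threshold=27.0))
```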
A manual review phase followed: annotators labeled clips as usable or unusable based on the presence of relevant social or interactive behavior, ensuring the dataset was behaviorally informative while retaining real-life variability. Clips were then labeled by diagnosis (ASD or neurotypical, NT) and grouped into gender-balanced training and testing splits, with each child limited to a maximum of three clips to prevent any individual from dominating the training set. The final model was trained using a Vision Transformer foundation model and evaluated across 20 stratified Monte Carlo cross-validation splits, with splits constructed at the child level to enforce consistent coverage.
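The specific foundation model is not named in this write-up; purely to illustrate the setup, the sketch below loads a video Vision Transformer (VideoMAE, an assumed stand-in checkpoint) with a two-class head for ASD vs. NT.

```python
from transformers import VideoMAEForVideoClassification

# "MCG-NJU/videomae-base" is an assumed checkpoint, not the project's actual model.
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base",
    num_labels=2,                      # binary screen: ASD vs. NT
    id2label={0: "NT", 1: "ASD"},
    label2id={"NT": 0, "ASD": 1},
)
```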
We implemented a custom Monte Carlo split generation pipeline with forced gender-balanced test sets:
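The original pipeline is not reproduced here; the following is a minimal sketch, assuming a clip-level DataFrame with hypothetical child_id, gender, and label columns and an assumed per-cell test size.

```python
import numpy as np
import pandas as pd

def cap_clips_per_child(df: pd.DataFrame, max_clips: int = 3, seed: int = 0) -> pd.DataFrame:
    """Keep at most `max_clips` clips per child so no individual dominates."""
    return (df.groupby("child_id", group_keys=False)
              .apply(lambda g: g.sample(min(len(g), max_clips), random_state=seed)))

def monte_carlo_splits(df: pd.DataFrame, n_splits: int = 20,
                       test_children_per_cell: int = 2, seed: int = 0):
    """Yield (train, test) frames. Splits are drawn at the child level, and the
    test set takes an equal number of children from every gender x label cell,
    forcing gender balance (cell size is an assumed parameter)."""
    rng = np.random.default_rng(seed)
    children = df.drop_duplicates("child_id")[["child_id", "gender", "label"]]
    for _ in range(n_splits):
        test_ids = []
        for _, cell in children.groupby(["gender", "label"]):
            n = min(test_children_per_cell, len(cell))
            test_ids.extend(rng.choice(cell["child_id"].to_numpy(), size=n, replace=False))
        test_mask = df["child_id"].isin(test_ids)
        yield df[~test_mask], df[test_mask]
```

Sampling fresh test children per iteration (rather than rotating fixed folds) is what makes the scheme Monte Carlo: each of the 20 splits is an independent draw under the same balance constraints.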
Models were evaluated on each split using standard binary-classification metrics:
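The exact metric list is an assumption; for a binary ASD/NT screen, accuracy, AUROC, sensitivity, and specificity are typical choices, computed per split as in this sketch.

```python
from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix

def evaluate_split(y_true, y_pred, y_score):
    """Per-split metrics; y_score is the predicted probability of ASD per sample."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auroc": roc_auc_score(y_true, y_score),
        "sensitivity": tp / (tp + fn),   # recall on the ASD class
        "specificity": tn / (tn + fp),   # recall on the NT class
    }
```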
All Splits Summary: per-split metrics were aggregated across the 20 splits to produce overall performance estimates.
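A summary of this form can be produced by averaging per-split results; the sketch below assumes the per-split dictionaries returned by evaluate_split above and reports mean and standard deviation per metric.

```python
import numpy as np

def summarize_splits(split_results: list[dict]) -> dict:
    """Reduce per-split metric dicts to (mean, std) across all splits."""
    return {k: (float(np.mean([r[k] for r in split_results])),
                float(np.std([r[k] for r in split_results])))
            for k in split_results[0]}
```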
This project reflects my research focus on accessible, scalable AI tools for early developmental screening using naturalistic, real-world video data.