Curriculum vitae
Current role
Cultural Heritage Data Scientist at Yale University
I research and develop machine learning solutions for Yale University’s cultural heritage collections, working primarily with historical documents and images. Much of this work involves building custom pipelines for parsing sensitive or complex documents and images, including object detection and handwritten text recognition (HTR) pipelines.
Previous role
Machine Learning Postdoctoral Fellow at the Smithsonian Institution Data Science Lab and the United States Holocaust Memorial Museum
I explored how machine learning can be applied ethically and reliably to archival material, particularly sensitive and multilingual records. Over four years, I not only shared my research through courses, talks, and publications, but also built tools and machine learning pipelines to help the USHMM and the Smithsonian process their collections and identify and extract important metadata from them. Many of these projects resulted in open-source datasets, machine learning models, and Python packages.
Education
- 2020: Ph.D. in Medieval History at the University of Kentucky
- 2012: M.A. in History at Florida Gulf Coast University
- 2010: B.A. in History at Florida Gulf Coast University
Books
Mattingly, William. (July 2023) Introduction to Python for Humanists. Routledge – Taylor and Francis.
Peer-reviewed articles
Mattingly, William and Stephen Davis. (December 2024) Data as Practice: Alternate Modes of Learning in the Undergraduate and Graduate Digital Humanities Classroom. Companion to DH in Practice. Routledge.
Mattingly, William. (2024) Python or R? Getting Started with Programming for Humanists. Compendium for Computational Theology.
Dikow RB, DiPietro C, Trizna MG, Bredenbeck-Corp H, Bursell MG, Ekwealor JTB, Hodel RGJ, Lopez N, Mattingly WJB, Munro J, Naples RM, Oubre C, Robarge D, Snyder S, Spillane JL, Tomerlin MJ, Villanueva LJ, White AE (2023) Developing responsible AI practices at the Smithsonian Institution. Research Ideas and Outcomes 9: e113334. https://doi.org/10.3897/rio.9.e113334.
Dikow, R. B., Ekwealor, J. T. B., Mattingly, W. J. B., Trizna, M. G., Harmon, E., Dikow, T., Arias, C. F., Hodel, R. G. J., Spillane, J., Tsuchiya, M. T. N., Villanueva, L., White, A. E., Bursell, M. G., Curry, T., Inema, C., Geronimo-Anctil, K. (2023). Let the records show: attribution of scientific credit in natural history collections. International Journal of Plant Sciences special volume in honor of Vicki Funk, 184(5), 392–404. https://doi.org/10.1086/724949.
Johnson, K. P., Burns, P. J., Stewart, J., Cook, T., Besnier, C., & Mattingly, W. J. B. (2021). The Classical Language Toolkit: An NLP Framework for Pre-Modern Languages. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations (pp. 20-29). https://doi.org/10.18653/v1/2021.acl-demo.3
Besnier, C. and Mattingly, W. (2021) Named-Entity Dataset for Medieval Latin, Middle High German and Old Norse. Journal of Open Humanities Data, 7(0), p. 23. https://doi.org/10.5334/johd.36
Select talks (abbreviated; see the Talks page for the full list)
- “Evaluating Automatic Speech Recognition for Holocaust Testimonies: A Large-Scale Analysis of Whisper Performance on the Fortunoff Video Archive”, co-presented with Christy Bailey-Tomecek at the Second Workshop on Holocaust Testimonies as Language Resources (HTRes-2026), Palma de Mallorca. (11 May 2026).
- “AI at Yale”, Yale University. (22 April 2026).
- “Reading Handwritten Archives with AI”, Yale 2026 One IT Conference. (18 March 2026).
- “Leveraging AI for Linked Open Data”, co-presented with Rob Sanderson at the Lorentz Center workshop Enriching Digital Heritage with LLMs and Linked Open Data, Leiden. (10 September 2025).
- “Semantic Searching with Vector Databases and their Applications in Quote Identification”, SCOOP: Source Codes of the Past, Institute for Advanced Study, Princeton. (12–13 June 2025).
- “NLP in the Age of LLMs”, The Chinese University of Hong Kong. (30 September 2024).
- “Building Machine Learning Pipelines at Scale”, USHMM. (19 August 2024).
- “Interrogating Digital Justice as Disaster Recovery”, DH 2024. (8 August 2024).
- “Teaching Machine Learning in the Humanities”, DH 2024 (6 August 2024).
- “Introduction to Retrieval Augmented Generation”, TAP Institute, (29 July–2 August 2024).
- “Introduction to Vector Databases”, TAP Institute, (22–26 July 2024).
- “The Role of spaCy in the World of LLMs”, TAP Institute, (15–19 July 2024).
- “From Capture to Engagement: Experiments in Using AI for Indexing, Named Entity Recognition, and More”, AI in Oral History Symposium (15 July 2024).
- “AI in the Arts and Humanities”, Society for Scholarly Publishing, (31 May 2024).
- “Creating a Typology of Places to Annotate Holocaust Testimonies Through Machine Learning”, co-presented with Christine Liu at the Holocaust Testimonies as Language Resources Conference, Turin. (21 May 2024).
- “Developing an NER Model for the Holocaust”, Holocaust Testimonies as Language Resources Conference, (20 May 2024).
- “Where did the Holocaust happen? Locating place in testimonies through machine learning”, co-presented with Christine Liu at Quantifying the Holocaust, Paris. (14–16 May 2024).
- “Developing Vector Databases for Semantically Querying Holocaust Material”, annual EHRI Colloquium. (5 May 2023).
- “AI & Prompt Design”, NISO 8-part series, (4 April–23 May 2024).
- “Text and Data Mining”, NISO 8-part series, (12 October–7 December 2023).
- “LLMs and Higher Education”, JSTOR (11 October 2023).
- “Building Named Entity Recognition Pipelines”, TAP Institute, (July 2023).
- “Building Text Classification Pipelines”, TAP Institute, (July 2023).
- “Introduction to Natural Language Processing”, TAP Institute, (July 2023).
- “Multilingual Named Entity Recognition”, TAP Institute, (June 2022).
- “Data Management and Analysis in Python with Pandas”, TAP Institute, (June 2022).
- “The Application of Machine Learning to Large Archives”, Smithsonian Institution, (30 March 2022).
- “Machine Learning and the Analysis of Historical Texts”, Virginia Tech (24 March 2022).
- “Ancient and Medieval Text Analysis”, TAP Institute, (June 2021).
- “Applying Machine Learning in the Humanities”, TAP Institute, (June 2021).
- “The Challenges of Developing NER for Holocaust and Medieval Texts”, AI4GLAM (February 2021).