Dr. William J.B. Mattingly

Applied Machine Learning

3+ years of experience applying machine learning to historical documents.

Natural Language Processing

3+ years of experience developing spaCy NLP pipelines.

Ph.D. in History and 2+ years of experience as a postdoc in the Smithsonian Institution's Data Science Lab.

Social Network Analysis

5+ years of experience conducting social network analysis.

Content Producer

5+ years of experience producing technical content on YouTube.



Introduction to Python for Humanists

This book stems from my open-source textbooks and YouTube videos, which teach Python to humanists. This is the print edition; a free online edition is also available.

Let the Records Show: Attribution of Scientific Credit in Natural History Collections

Natural history collections are essential resources for taxonomy, systematics, and ecological and climate change research. Mass digitization of these collections provides the opportunity to study broad biological patterns among specimens and their associated metadata at a scale that was previously impossible. The specimen metadata can also be used to study the contributions of the people that collected and identified these specimens. A proper accounting of these contributions impacts our understanding of the history of these collections and who played a role in their growth.

The Classical Language Toolkit: An NLP Framework for Pre-Modern Languages

This paper announces version 1.0 of the Classical Language Toolkit (CLTK), an NLP framework for pre-modern languages. The vast majority of NLP, its algorithms and software, is created with assumptions particular to living languages, thus neglecting certain important characteristics of largely non-spoken historical languages. Further, scholars of pre-modern languages often have different goals than those of living-language researchers. To fill this void, the CLTK adapts ideas from several leading NLP frameworks to create a novel software architecture that satisfies the unique needs of pre-modern languages and their researchers. Its centerpiece is a modular processing pipeline that balances the competing demands of algorithmic diversity with pre-configured defaults. The CLTK currently provides pipelines, including models, for almost 20 languages.



Dr. Mattingly can be hired as a consultant. He has worked with data scientists, archivists, medical professionals, social scientists, geographers, and historians. He specializes in data cleaning, machine learning, natural language processing, social network analysis, and text analysis. He can read, with varying degrees of competence, over ten languages. He can bring this broad linguistic knowledge and background in machine learning to your project. If you are interested in hiring him, please fill out the form below.