Applied Machine Learning
3+ years of experience applying machine learning to historical documents.
Natural Language Processing
3+ years of experience developing spaCy NLP pipelines.
Ph.D. in History and 2+ years of experience as a postdoc in the Smithsonian Institution's Data Science Lab.
Social Network Analysis
5+ years of experience doing social network analysis.
5+ years of experience producing technical content on YouTube.
LeetTopic is a transformer-based topic modeling library that lets users cluster documents and analyze the results in an auto-generated, standalone Bokeh application.
Streamlit Pandas lets users automatically generate an entire Streamlit application from a single Pandas DataFrame. Users can specify which columns to expose as interactive widgets and how each one should behave.
Natural history collections are essential resources for taxonomy, systematics, and ecological and climate change research. Mass digitization of these collections provides the opportunity to study broad biological patterns among specimens and their associated metadata at a scale that was previously impossible. The specimen metadata can also be used to study the contributions of the people who collected and identified these specimens. A proper accounting of these contributions shapes our understanding of the history of these collections and of who played a role in their growth.
This paper announces version 1.0 of the Classical Language Toolkit (CLTK), an NLP framework for pre-modern languages. The vast majority of NLP, its algorithms and software, is created with assumptions particular to living languages, thus neglecting certain important characteristics of largely non-spoken historical languages. Further, scholars of pre-modern languages often have different goals than those of living-language researchers. To fill this void, the CLTK adapts ideas from several leading NLP frameworks to create a novel software architecture that satisfies the unique needs of pre-modern languages and their researchers. Its centerpiece is a modular processing pipeline that balances the competing demands of algorithmic diversity with pre-configured defaults. The CLTK currently provides pipelines, including models, for almost 20 languages.
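The idea of a modular pipeline with pre-configured defaults can be illustrated with a small sketch. This is not the CLTK's actual API: the names `Doc`, `analyze`, `DEFAULT_PIPELINES`, and the individual process functions are hypothetical, and the Latin normalization rule (folding j to i and v to u) is a simplified assumption chosen only to show how a language-specific default can be swapped for a custom list of steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical container that each process reads from and writes to."""
    raw: str
    tokens: List[str] = field(default_factory=list)
    normalized: List[str] = field(default_factory=list)


# A process is any callable that takes a Doc and returns a Doc.
Process = Callable[[Doc], Doc]


def whitespace_tokenize(doc: Doc) -> Doc:
    """Split the raw text on whitespace."""
    doc.tokens = doc.raw.split()
    return doc


def lowercase_normalize(doc: Doc) -> Doc:
    """Generic normalization: lowercase every token."""
    doc.normalized = [t.lower() for t in doc.tokens]
    return doc


def latin_normalize(doc: Doc) -> Doc:
    """Simplified Latin normalization (assumption): lowercase, j -> i, v -> u."""
    doc.normalized = [
        t.lower().replace("j", "i").replace("v", "u") for t in doc.tokens
    ]
    return doc


# Pre-configured defaults per language code (hypothetical registry).
DEFAULT_PIPELINES: Dict[str, List[Process]] = {
    "generic": [whitespace_tokenize, lowercase_normalize],
    "lat": [whitespace_tokenize, latin_normalize],
}


def analyze(
    text: str,
    language: str = "generic",
    processes: Optional[List[Process]] = None,
) -> Doc:
    """Run the default pipeline for a language, or a caller-supplied one."""
    steps = processes if processes is not None else DEFAULT_PIPELINES[language]
    doc = Doc(raw=text)
    for step in steps:
        doc = step(doc)
    return doc


# Default pipeline for Latin, then a custom pipeline with tokenization only.
latin_doc = analyze("Iulius Caesar venit", language="lat")
custom_doc = analyze("Arma VIRUMQUE", processes=[whitespace_tokenize])
```

Keeping each step as an independent callable is what lets a researcher replace one algorithm without touching the rest, while the registry supplies sensible defaults for users who only pass a language code.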
Dr. Mattingly can be hired as a consultant. He has worked with data scientists, archivists, medical professionals, social scientists, geographers, and historians. He specializes in data cleaning, machine learning, natural language processing, social network analysis, and text analysis. He can read, with varying degrees of competence, over ten languages. He can bring this broad linguistic knowledge and background in machine learning to your project. If you are interested in hiring him, please fill out the form below.