Benjamin Lis, Columbia University
This talk will discuss the technical aspects of an EABCC-funded project to build a Python library that extracts the metadata and text of individual emails from PDFs. We will also report on our experience using it to enhance a corpus of FOIAed documents from the early days of the COVID-19 pandemic, some of them thousands of pages long.
Day Two – June 14