Benjamin Lis, Columbia University


This talk will discuss the technical aspects of an EABCC-funded project to build a Python library that extracts the metadata and text of individual emails from PDFs. We will also report on our experience using it to enhance a corpus of documents obtained through FOIA requests, some of them thousands of pages long, from the early days of the COVID-19 pandemic.
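To make the task concrete, here is a minimal sketch of the kind of processing involved. It is not the project's library or its API. It assumes the page text has already been pulled out of the PDF by some other tool, that each email begins with a "From:" line, and that headers use the field names listed in the pattern. The function names split_emails and parse_email are hypothetical.

```python
import re

# Assumption: a line beginning with "From:" marks the start of a new email.
HEADER_START = re.compile(r"^From:\s", re.MULTILINE)
# Assumption: these are the header fields that appear in the printed emails.
FIELD = re.compile(r"^(From|To|Cc|Sent|Date|Subject):\s*(.*)$")


def split_emails(text):
    """Split extracted text into chunks, each starting at a 'From:' line."""
    starts = [m.start() for m in HEADER_START.finditer(text)]
    return [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]


def parse_email(chunk):
    """Read the leading header lines into a dict; the remainder is the body."""
    headers = {}
    lines = chunk.splitlines()
    for i, line in enumerate(lines):
        m = FIELD.match(line)
        if not m:
            break  # first non-header line ends the header block
        headers[m.group(1).lower()] = m.group(2).strip()
    else:
        i = len(lines)  # every line was a header, so there is no body
    return {"headers": headers, "body": "\n".join(lines[i:]).strip()}


# Made-up sample text standing in for text extracted from a PDF.
sample = (
    "From: Alice <alice@example.org>\n"
    "Sent: Monday, March 2, 2020 9:15 AM\n"
    "To: Bob <bob@example.org>\n"
    "Subject: Testing capacity\n"
    "\n"
    "Can we talk today?\n"
    "From: Bob <bob@example.org>\n"
    "To: Alice <alice@example.org>\n"
    "Subject: RE: Testing capacity\n"
    "\n"
    "Yes, at noon.\n"
)

emails = [parse_email(chunk) for chunk in split_emails(sample)]
for e in emails:
    print(e["headers"].get("subject"), "|", e["body"])
```

Real FOIA releases are messier than this sample: headers wrap across lines, redaction boxes interrupt fields, and quoted replies repeat "From:" lines inside a message body, so a production tool would need considerably more robust heuristics than a single regular expression.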

Event Timeslots (1)

Day Two – June 14