Creating Email Archives from PDFs – The Covid-19 Corpus

Presenter:

Benjamin Lis, Columbia University

Description:

This talk will discuss the technical aspects of an EABCC-funded project to build a Python library that extracts the metadata and text of individual emails from PDFs. We will also report on our experience using it to enhance a corpus of documents obtained through FOIA requests, some of them thousands of pages long, from the early days of the COVID-19 pandemic.
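
To give a sense of the kind of problem involved, the following is a minimal sketch, not the project's library. It assumes the PDF text has already been extracted by some other tool and that each email begins with a "From:" header line; the names HEADER_RE and split_emails are hypothetical and chosen only for illustration.

```python
import re
from email.utils import parseaddr

# Hypothetical pattern: header lines as they commonly appear in printed emails.
HEADER_RE = re.compile(r"^(From|To|Cc|Sent|Date|Subject):\s*(.*)$", re.IGNORECASE)


def split_emails(text):
    """Split already-extracted PDF text into emails (illustrative heuristic).

    Assumption: every "From:" line starts a new message. Real PDFs break this
    rule often (quoted replies, wrapped headers), so treat this as a sketch.
    """
    emails = []
    current = None
    in_headers = False
    for line in text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match and match.group(1).lower() == "from":
            # A "From:" line opens a new message record.
            current = {"headers": {}, "body": []}
            emails.append(current)
            in_headers = True
        if current is None:
            continue  # skip any text before the first "From:" line
        if in_headers and match:
            current["headers"][match.group(1).lower()] = match.group(2).strip()
        elif in_headers and not line.strip():
            in_headers = False  # a blank line ends the header block
        else:
            in_headers = False
            current["body"].append(line)
    return [
        {"headers": e["headers"], "body": "\n".join(e["body"]).strip()}
        for e in emails
    ]


sample = (
    "From: Alice Example <alice@example.org>\n"
    "Sent: Monday, March 2, 2020 9:15 AM\n"
    "To: Bob <bob@example.org>\n"
    "Subject: Testing update\n"
    "\n"
    "First body line.\n"
    "Second line.\n"
    "From: Bob <bob@example.org>\n"
    "Subject: RE: Testing update\n"
    "\n"
    "Thanks.\n"
)
messages = split_emails(sample)
# parseaddr (standard library) separates the display name from the address.
sender_name, sender_address = parseaddr(messages[0]["headers"]["from"])
```

A line-based heuristic like this will misfire whenever a message body itself contains a line starting with "From:", which is one reason long, scanned FOIA releases are harder to process than this sketch suggests.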

Event Timeslots (1)

Day Two – June 14
2:00 pm - 2:45 pm

Tagged: EABCC awardee project briefing