Andrew Waugh


This presentation describes the results of email appraisal experiments run by the Public Record Office Victoria, an archive tasked with preserving government records. The intended audience is institutional repositories and tool developers.

We will cover:

* The difference between preserving the email of individuals and preserving institutional email. We highlight that the concept of original order in email is complex, and that focussing on the email collections of individual recipients within an institution, and on the way they arranged their email, may not be optimal.

* The difference between ‘positive appraisal’ (selecting important emails to preserve) and ‘negative appraisal’ (selecting the junk to be disposed of). We tie this distinction to the important ability to generalise appraisal tools and approaches (especially AI systems) between different agencies generating email and different jurisdictions.

* The usefulness of email threading as an appraisal tool. Threading reduces the number of appraisal decisions, increases the amount of information on which to base decisions, and provides a better outcome for researchers. We introduce the concept of ‘super-threading’ as a potential AI area.

* The value of eDiscovery (and similar) tools in processing email collections.

* The difficulty in developing data sets for testing email processing approaches. Privacy and sensitivity concerns have to be balanced against the representative nature of the test collections. This has important consequences for the generalisability of the approaches being developed.
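
As a rough illustration of the threading point above, the following minimal sketch groups messages into threads using the standard `Message-ID`, `In-Reply-To` and `References` headers. It is not the tooling used in the experiments; the function name `thread_messages` and the choice to skip messages lacking a `Message-ID` are assumptions made for this example, and it does not attempt the 'super-threading' idea mentioned above.

```python
from collections import defaultdict
from email import message_from_string


def thread_messages(raw_messages):
    """Group raw RFC 5322 messages into threads via their header links.

    Hypothetical helper: returns a list of threads, each a list of
    Message-ID strings belonging to the same conversation.
    """
    parent = {}  # union-find forest keyed by Message-ID

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    seen_ids = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        if not msg_id:
            continue  # assumption: messages without an identifier are skipped
        seen_ids.append(msg_id)
        find(msg_id)
        # Both headers hold whitespace-separated message identifiers.
        linked = msg.get("References", "") + " " + msg.get("In-Reply-To", "")
        for ref in linked.split():
            union(msg_id, ref)

    threads = defaultdict(list)
    for msg_id in seen_ids:
        threads[find(msg_id)].append(msg_id)
    return list(threads.values())
```

Under this approach an appraiser would make one decision per returned thread rather than one per message, and replies that cite the same missing parent still end up grouped together.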

We conclude that email appraisal at scale is a wicked problem, especially if the tools are to be shared.


Event Timeslots (1)

Day One – June 13