Automation and its implications for the archival capture of e-mail – Considering Data

This is the text of a lecture delivered to the British Government's Knowledge and Information Network in London on September 26, 2019. I have revised and expanded the text.

Think of all the correspondence that moves into, out of, and around your organization.

Think of the structure / schema into which you would like all the important items of business correspondence to be filed so that they can be found and managed. Think of the records system in which that structure / schema sits.

Who would you like to file important items of correspondence into this structure / schema: people or machines?

Experiment No. 1: People versus machines that can learn

Imagine that you have set up a trial:

  • You instruct every employee to file important correspondence into your records system, using your preferred structure / schema.
  • In parallel, you set up a group of machines to look at all inbound and outbound correspondence, select the important items, and file them into the same structure / schema as the humans.

Who would you like to win: the humans or the machines?

Who do you think would win the trial?

Most of us in the fields of records and information management want the machines to win. When the machines win, they lift a burden from our colleagues. This gives our colleagues the freedom to focus on the job they were employed to do.

We would expect the machines to win, assuming that:

  • the machines were able to learn a fairly complex structure;
  • there was a feedback loop between humans and machines, so that the machines' mistakes were pointed out to them;
  • the machines were learning machines, able to adapt their algorithms in response to feedback;
  • the experiment lasted long enough for the machines to improve over many iterations.

We do not yet have the automation required to routinely assign correspondence to a node in the kind of complex, multi-tier, enterprise-wide taxonomy / file plan / retention schedule that records managers like to use to manage records.

The type of automation projects that are currently being carried out

The automation projects that we see in information management at the time of writing are mainly based on binary questions:

  • The legal world has made progress with predictive coding projects that use machine learning to answer the binary question: "Is this content likely to be responsive to a particular dispute?";
  • In the US, NARA's Capstone Policy has motivated some US federal agencies to use machine learning to answer the binary question: "Is this e-mail needed as a record?". A similar project is being carried out by the National Archives of the Netherlands (their Dutch report is here);
  • The Better Information for Better Government program, run by the UK Cabinet Office, will soon launch a project to develop an artificial intelligence tool that can help distinguish important from unimportant government e-mails (see the Call for expressions of interest of August);
  • Graham MacDonald has been working on a methodology to support the sensitivity review of records, using machine learning to predict whether or not a given document is likely to be covered by one of the exemptions in the United Kingdom's Freedom of Information legislation (see his thesis).

We will be able to deploy machines sooner if we find binary questions for them to answer than if we wait for machines to be able to allocate content to the nodes of complex multi-level taxonomies / file plans / retention schedules.
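To make the idea of a binary question concrete, here is a minimal sketch of a machine answering "Is this e-mail needed as a record?" and improving when a human corrects it. It is a toy naive Bayes model written for illustration only; it is not the method used by any of the projects mentioned above, and the class name, method names and example e-mails are all invented for this sketch.

```python
import math
from collections import Counter


class BinaryEmailClassifier:
    """Toy naive Bayes model for one yes/no question (hypothetical example)."""

    def __init__(self):
        # Word and e-mail counts, kept separately for the "yes" and "no" answers.
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def learn(self, text, label):
        """Record one labelled e-mail: initial training or later human feedback."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def _log_score(self, text, label):
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Add-one smoothing so unseen words never give a zero probability.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for word in self.tokenize(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab) + 1))
        return score

    def is_record(self, text):
        """Answer the binary question: is this e-mail needed as a record?"""
        return self._log_score(text, True) > self._log_score(text, False)


# Invented training examples, labelled by a human.
classifier = BinaryEmailClassifier()
classifier.learn("ministerial submission on budget policy attached", True)
classifier.learn("decision agreed at the policy board meeting", True)
classifier.learn("lunch on friday anyone", False)
classifier.learn("cake in the kitchen for my birthday", False)

first = classifier.is_record("policy decision on the budget")
second = classifier.is_record("birthday lunch in the kitchen")

# Feedback loop: a human tells the machine that this e-mail is a record,
# the machine learns from it, and its answer is checked again.
new_email = "contract renewal signed with supplier"
before = classifier.is_record(new_email)
classifier.learn(new_email, True)
after = classifier.is_record(new_email)

print(first, second, before, after)
```

The point of the sketch is that a single yes/no question needs only two piles of examples and a correction mechanism, whereas assigning an item to one node of a multi-level file plan needs examples for every node.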

Records Management makes demands on humans

For most of the twentieth century, people were able to file correspondence into often very sophisticated filing structures. In the twenty-first century this is no longer true. In the twentieth century, people filed correspondence because otherwise it would not have been filed at all. In the twenty-first century, e-mail correspondence is filed automatically by automation built into e-mail systems. Any request to officials to move e-mail correspondence into another system is a request for them to re-file correspondence that has already been filed.

Automation built into e-mail systems

The automation built into the proprietary e-mail systems of the mid to late nineties was not machine learning. The machines in proprietary e-mail systems could not learn; they could only follow rules. Even now, two decades later, proprietary e-mail systems file correspondence into a very simple structure / schema.

The response of the archives and records management community to the introduction of e-mail systems has been to point out (rightly) the records management shortcomings of a system that aggregates correspondence into individual e-mail accounts without distinguishing business correspondence from personal or trivial correspondence. With a few exceptions (notably NARA in the US), the records and information management community has not accepted the structure of e-mail systems as a viable filing structure, and in many administrations (including the UK) we have continued to ask people to move important correspondence into separate systems.

Experiment No. 2: People versus machines that cannot learn

To return to the idea of a trial with which I began this talk: over the last two decades we have in effect been pitting people against machines:

  • People have been asked to file important items of correspondence into a preferred records system containing our preferred structure / schema.
  • The machines (in the form of e-mail systems) have been configured to file correspondence into a simple structure that is poorly suited to the management of records.


Who do you want to win? The automated filing or the human filing?

From a records / information management point of view, do you want the machines to win because:

  • they take the work off our colleagues;
  • their filing is highly predictable and consistent;
  • their filing takes place instantly?

Or do you want the people to win, because they file into a structure that allows a more precise application of retention and access rules?

Who do you think would win such a trial?

In theory, people have a better chance of winning this second trial than the first. Human filing could prevail if people in the organization considered the filing structure / schema so advantageous that they were prepared:

  • to make the additional effort to file correspondence into the designated records system;
  • to use the records system, rather than their e-mail account, as the main reference source for their own correspondence;
  • to forgo the option of simply relying on the inferior structure into which the e-mail system had already filed the correspondence.

Even where officials greatly value the records structure / schema, there is a high likelihood that machine filing will prevail. I remember when e-mail systems were introduced in the UK in the mid-nineties. Government departments, and the officials in them, highly valued their organizations' record-keeping systems (registered file systems in paper form). Everyone at the time wanted the registered file systems to survive and to make an orderly transition into the electronic world. But within five years of the general introduction of e-mail in the British government, all of these registered file systems had broken down and there were no replacement systems. The introduction of e-mail destroyed these systems.

Why did the automated filing of e-mail systems into a simple structure overcome the value that British officials placed on the much more sophisticated structure of their registered filing systems?

The decisive advantage of the machines (e-mail systems) was speed. They filed correspondence instantly. Automated filing by e-mail systems gave officials immediate access to correspondence as soon as it left the sender's account. This increased the speed of correspondence, which in turn increased the volume of items exchanged, which in turn increased the number of items that people were supposed to file.

The introduction of e-mail increased the volume of correspondence exponentially, making it impossible for people to re-file that correspondence into a complex corporate structure. In other words, the machines moved the goalposts. And won the game!

Put more simply:

  • Filing by humans is a viable option when the volume and speed of correspondence are low.
  • When the speed and volume of business correspondence increase exponentially, the staff resource needed to re-file it does not scale (not within public sector budgets, anyway!).

Machine Filing versus Human Filing – The Experience of the Last Twenty Years

The British Government's experience of e-mail over the past 25 years can be broken down into three phases.

In the first phase (c 1995 to c 2003), people (civil servants) were asked to print important correspondence and file it on registered files, while machines (e-mail systems) filed correspondence in e-mail accounts.

In the second phase (c 2003 to c 2010), officials were asked to file correspondence in electronic records and document management systems, while machines (e-mail systems) filed correspondence in e-mail accounts.


In the third phase, officials were asked to file correspondence in collaborative systems (such as Microsoft SharePoint) while computers (e-mail systems) continued to file correspondence in e-mail accounts.


During this twenty to twenty-five year period, progress has been made in the systems into which we asked our colleagues to file. We have moved from paper to electronic systems. We have moved from electronic records management systems with clunky corporate file plans to more user-friendly collaborative systems. The result, however, was the same in all three phases. At each stage, a pitifully small percentage of the business correspondence in e-mail accounts was moved into the records system. The automatic filing of e-mails into e-mail accounts has always thwarted attempts to persuade people to get into the habit of re-filing their important correspondence elsewhere.

The Policy Dilemma of Automated Filing in E-Mail Systems

Over the past two decades, e-mail systems have used a primitive form of rule-based automation to file e-mails into a simple structure / schema. This has led to a policy dilemma:

  • E-mail systems file e-mail correspondence efficiently, routinely and predictably into e-mail accounts, BUT the organization of correspondence into individual e-mail accounts leads to an inefficient and imprecise application of retention and access rules to that correspondence.
  • In contrast, humans are able to file important items of correspondence into a structure that allows a more precise application of retention and access rules, BUT they are likely to do so rarely and inconsistently.

The policy dilemma arises, in part, because records management best practice does not specify which of the following two policy requirements is more important:

  • the consistent capture of correspondence into a structure / schema; OR
  • a structure / schema that supports the precise application of retention and access rules.

Records management best practice does not help us choose between these two competing requirements because it demands both. Best practice requires the consistent capture of correspondence into a structure / schema that supports the precise application of retention and access rules.

We are faced with two imperfect options. We should choose the least imperfect. The least imperfect option is the option whose weaknesses we can probably correct at a later date.

We are working in a transitional phase, and the transition is towards ever greater use of automation, analytics and machine learning. If the current rate of progress in machine learning / artificial intelligence is maintained, we can predict the following:

  • In the medium term, originating organizations will be able to use machines to answer binary questions that help mitigate the worst shortcomings of e-mail accounts: distinguishing important from trivial correspondence, and personal from business correspondence.
  • In the long term, originating organizations will be able to use machine intelligence to re-file correspondence into a structure of their choosing.

Considering the Future of Machine Learning in Current Policy Decisions

When we reach the point where machine learning tools can file correspondence into whatever structure an organization wants, our policy dilemma is resolved. At that point we can have correspondence consistently assigned to whichever taxonomy, records classification and / or retention schedule an organization selects. It is reasonable to assume that we will also be able to run machine learning over old correspondence and map that correspondence to the same taxonomy / records classification / retention schedule. We can anticipate that:

  • Future machine learning tools will be able to correct, retrospectively, the weaknesses in the structure / schema of any e-mail accounts that survive.
  • Future machine learning tools will only be able to make up for the poor capture of e-mail into corporate collaboration systems and electronic records management systems if the important e-mail accounts have been preserved.

This logic dictates that we must now give a high priority to ensuring that historically important e-mail accounts survive, in the confident expectation that we will later be able to fix the flaws and inefficiencies in the structure and schema of those accounts.

This would require some protection to be introduced now for the e-mail accounts of officials playing important roles. Business correspondence stored in the e-mail accounts of key British government officials is currently not protected. British government departments subject the e-mails in e-mail accounts to scheduled deletion. The most common form of scheduled deletion is to delete the contents of an e-mail account shortly after the person leaves their post. This approach complies with The National Archives' e-mail guidance for UK government, because each department asks its officials to move important e-mail from e-mail accounts into some kind of corporate records system. However, the unintended consequence of this policy is that most business correspondence is caught by this deletion.

Protecting the e-mail accounts of officials who hold important roles can be thought of as a "protect now, process later" approach.

A protect now, process later approach means protecting historically important e-mail accounts in the knowledge that machines can process legacy material well, and that tools to filter those records, improve their metadata and / or overlay an alternative structure on them can be applied at a later date.
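As a rough illustration of what this could mean for a scheduled deletion routine, the sketch below exempts the accounts of historically important roles while ordinary accounts remain subject to deletion after their owner leaves post. Everything in it is an assumption made for the example: the field names, the 90-day grace period and the sample accounts are invented, and it does not describe any department's actual configuration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class EmailAccount:
    owner: str
    # Hypothetical flag, set by records staff for roles of historical interest.
    role_is_historically_important: bool
    # None while the owner is still in post.
    owner_left_post_on: Optional[date] = None


# Hypothetical grace period between leaving post and scheduled deletion.
DELETION_GRACE_PERIOD = timedelta(days=90)


def due_for_deletion(account: EmailAccount, today: date) -> bool:
    """Return True only for unprotected accounts that are past the grace period."""
    if account.role_is_historically_important:
        return False  # protect now, process later
    if account.owner_left_post_on is None:
        return False  # owner still in post
    return today - account.owner_left_post_on > DELETION_GRACE_PERIOD


# Invented sample accounts.
accounts = [
    EmailAccount("permanent secretary", True, date(2019, 1, 31)),
    EmailAccount("temporary contractor", False, date(2019, 1, 31)),
    EmailAccount("current caseworker", False),
]

today = date(2019, 9, 26)
to_delete = [a.owner for a in accounts if due_for_deletion(a, today)]
print(to_delete)
```

The design choice worth noting is that protection is a single check made before any deletion rule is evaluated, so later changes to the deletion schedule cannot accidentally reach a protected account.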

Such an approach would no longer require individuals to move important e-mails into a separate system for record-keeping purposes (although there may well be knowledge management or operational reasons for an organization to ask some teams or areas to move important correspondence out of e-mail systems, or to try to divert correspondence away from e-mail into other communication channels).

This approach rests on the recognition that using human effort to do badly something that machines are likely to do well at a later stage makes sense in terms of neither effectiveness nor efficiency.

GDPR Implications of a Protect Now, Process Later Approach

Protecting important e-mail accounts from deletion while machine learning capabilities are being developed is likely to result in some personal correspondence being kept alongside historically important correspondence. This has data protection implications.

The GDPR permits the archiving of records that contain personal data, provided that retaining the records is in the public interest, the necessary safeguards are in place, and the rights of data subjects are respected. Preserving the work e-mail account of an important official is likely to be in the public interest, and is likely to comply with the Data Protection Act, provided the following conditions are met:

  • the role played by the individual is of historical interest;
  • the individual could expect their account to be permanently preserved;
  • the individual has been given the opportunity to flag or remove personal correspondence;
  • access to personal correspondence is prevented except where there is an overriding legal requirement;
  • items of correspondence that are primarily personal are removed as soon as a reliable means of identifying them becomes available.


This talk recommends that government departments that use e-mail as their main channel of communication do not automatically delete correspondence from the e-mail accounts of their key staff until automated tools for processing the correspondence in those accounts have been developed. In practice, this should only mean protecting around 5% of their e-mail accounts (following the old rule of thumb that around 5% of the records of an originating body are worthy of permanent preservation).

This is not an easy sell to government departments. While the recommendation covers only about 5% of their e-mail accounts, departments are aware that this is the 5% that carries the highest potential reputational / political risk, and the 5% most likely to attract freedom of information requests.

Such a recommendation is by no means an abandonment of the records management goal of consistently assigning business correspondence to structures and schemas that support the use and reuse of correspondence and the precise application of retention and access rules. It is simply a recognition that asking officials to select important e-mails and move them into a separate system has not worked for twenty years and shows no sign of working any time soon. It is also a recognition that we need automated tools to process the material that e-mail systems have filed automatically.

Above all, this approach to protecting important e-mail accounts allows us to apply automated solutions to e-mail. It would provide both an incentive and an opportunity to deploy tools based on binary logic ("Is this e-mail important, yes or no?", "Is this e-mail personal, yes or no?") to fix the worst deficiencies of e-mail accounts from an information management point of view. These tools are not hypothetical: they are already being used in real projects. We can also hope that in the long term we will have tools that go beyond binary questions and can assign individual e-mails to a suitably detailed records classification, taxonomy and / or retention schedule.

The theories and explanations outlined in this paper were developed as part of my PhD project at Loughborough University, which provides a realistic evaluation of the UK government's filing policy for e-mail. An article from this project, 'The Decidable Deletion of Government Emails', was published in March 2019 by the Records Management Journal. An open access version of this article is available here in the Loughborough University digital repository (once in the repository, click on 'Download' to download the PDF or read it in the dedicated window).

James Lappin

