I recently recorded an IRMS podcast with Alan Pelz-Sharpe, co-author of what may be the first book on the use of artificial intelligence (AI) for information management purposes. In the interview, Alan said that he thought the transition to information management through AI would be an even bigger change than the transition from analog to digital work in the 1990s.
Keeping records is an ongoing activity. History says that once a society begins to keep records, nothing less than the end of that society causes it to stop. Keeping records is an essential part of the functioning of individual organizations and society as a whole. Fundamental changes in society lead to fundamental changes in recording practices.
This article shows how the record changed after the industrial revolution and the digital revolution, and how it is likely to change after the AI revolution. It also shows how the digital revolution transformed records management from a service function into a policy function, and how the AI revolution is set to transform it again, this time into a data science function.
If we look at this wide range of records, we will see three major trends:
The AI revolution offers records managers / information governance professionals new powers to effectively intervene within and between records systems for governance purposes. Like any power, this comes with responsibility and the need to use power wisely and safely.
This article tries to outline both the possibilities that AI offers and the questions that arise for the profession of records management and information governance.
Records before the industrial revolution
In the United Kingdom, our National Archives holds an uninterrupted set of government records dating from the turn of the 13th century, when the administration of the English kings began to keep copies of the letters and documents it sent. From the 13th century to the middle of the 19th century, we can characterize the records as follows:
At this early point in the history of record keeping, we can already see the fundamental difference between "structured" and "unstructured" data:
One of the ongoing endeavors of records management has been to ensure that records are consistently captured into a coherent structure. This endeavor is crucially important where organizations mainly create unstructured data, such as free-standing and freely movable correspondence and documents. For organizations that do their work through structured data systems, however, it is far less useful, since the structure of the database is defined from the outset and records are captured into that structure at the moment of creation.
After the Industrial Revolution
The industrial revolution brought large concentrations of workers together for the first time. At the turn of the 20th century, large concentrations of office workers were brought together in the bureaucracies of ever-growing government departments, companies, and other institutions. This led to a revolution in the management of files:
The type of documentation changed after the industrial revolution. The pre-industrial organization had recorded letters in chronological order. The 20th-century practice of creating a file for each individual piece of work led to new classes of document such as "file notes". These were documents created not primarily as direct communication from one person or office to another, but as a supplement to the file, to ensure that the file could tell the whole story of that piece of work. The increasing ability to copy documents (first through typewriters, typing pools and carbon paper, later through photocopiers) made it possible to place a copy of a document on every file to which it related.
After the digital revolution
The forerunner of the digital revolution was the computerization of various processes and workflows by organizations in the 1960s, 1970s and 1980s. This computerization was largely limited to very predictable, high-volume processes such as payroll, financial accounting, inventory control, etc. These processes were computerized by building databases with a data model that was very specifically adapted to the respective process.
The digital revolution hit the major English-speaking economies in the early 1990s when a way was found to apply a data structure to general business correspondence. The data structure in question was that contained in the email protocol, which set the format for an Internet-connected computer to send a message (an email) to another. The proliferation of email has led to the spread of computers on every employee's desktop in major economies.
Just as a quantum particle like an electron can be viewed as either a particle or a wave, emails within an email system can be viewed as follows:
In the early digital age (1990s to the present):
Records management has been far less effective in the two decades since the digital revolution than it was in the preceding four decades.
At the beginning of the digital age, the records management profession saw its task as the management of electronic documents. This rested on the assumption that the fundamental change brought by the digital revolution was a change of format, from paper to digital. We assumed that unstructured data would continue to take precedence over structured data, as it had throughout the history of record keeping before the digital revolution. We assumed that correspondence would continue to behave as unstructured, free-standing, freely moving objects that had to be captured at some point in their trajectory and integrated into a structure and a system.
The standard records management strategy in the first decade of the digital age was to configure organization-wide records systems into which documents and correspondence could be captured and integrated with other records within a records classification or structure.
This strategy failed because most correspondence exists as email within email systems. The only move an email makes is from one email system to another, and only when the sender uses a different email system from the recipient. There is no point at which the sender or any of the recipients has to file it somewhere. It is already integrated into the structure and metadata schema of the email system of both the sender and the recipient.
The only type of record in the digital age that behaves as "unstructured data", and must be deliberately captured into a structure, is the document created with word processing, presentation or spreadsheet software such as Microsoft Word, PowerPoint or Excel. These behave like the documents of the paper age. At the time they are created, they are not yet integrated into a structure, so the creator has to put them somewhere. This is why document management systems are needed.
Corporate document management systems deserve attention. They need and deserve careful management. They serve as records systems for documents created in packages such as Microsoft Word, PowerPoint and Excel. They offer many practitioners more than enough work. But we cannot base a profession on them. Our profession has less and less influence over such systems, since the largest part of the market for them belongs to just two suppliers (Microsoft and Google).
Corporate document management systems have an uncomfortable relationship with email systems. Document management systems rarely function as records systems for correspondence, whereas email systems usually do serve as a records system for documents. When a document needs to be submitted, it is usually sent by email. The email system records the date the document was sent, who sent it, who it was sent to, what message accompanied it, and what responses were received. The corporate document management system and the email system both span all of the organization's activities. Each of them holds most of the organization's documents, but the email system captures much more of the decision-making around and outside those documents.
The latest generation of collaboration systems (such as MS Teams and Slack) tries to overcome this separation between email systems and document management systems by moving team-based communication out of email and into a collaborative space. This is a better strategy than moving conversations that have taken place in the email environment into a document management environment. It has a good chance of working where individuals communicate primarily with a narrow group of people (e.g. within a project team). However, it usually works less well where individuals work across team and organizational boundaries with a changing cast of people on different topics. This latter category includes many of the people whose correspondence archivists have typically wanted to select for permanent preservation (policy makers, diplomats, etc.).
The AI Revolution
The AI revolution is taking place at the beginning of the third decade of the 21st century. It massively expands the scope of the judgments that machine intelligence can make.
Before the AI revolution, machines could only make information management decisions under certain circumstances, if each of the following three conditions was met:
The AI revolution enables machines to make judgments without having been explicitly programmed to make them. We no longer have to define every step that a machine has to take. In fact, if we used a machine learning tool to identify which emails in a body of email correspondence could be classified as "business correspondence", we would be using one set of algorithms (the machine learning model) to develop another set of algorithms (the ones that will actually differentiate between business and personal / trivial email) based on the patterns observed in the data.
The most obvious way to train a machine learning tool to identify business correspondence in an email system is to feed it a set of training emails, each labelled as either "business" or "personal / trivial". The machine learning model looks for the characteristics of the business correspondence whose values tend to differ from the values of the same characteristics in the non-business correspondence. The tool arrives at a hypothesized algorithm that sets the parameters for each data characteristic. The algorithm is typically tested by feeding it a further set of business and non-business emails to see how accurately it differentiates between the two.
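The train-then-test cycle described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a very simple naive Bayes word-count model; the six training emails, the two held-out emails, and the function names `tokenize`, `train` and `classify` are all invented for the example and do not come from any real tool.

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into words (a deliberately crude feature)."""
    return text.lower().split()

def train(labelled_emails):
    """Count how often each word appears under each label."""
    word_counts = {}          # label -> Counter of words seen under that label
    label_counts = Counter()  # label -> number of training emails
    for text, label in labelled_emails:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary

def classify(model, text):
    """Return the label with the highest log-probability score for this email."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labelled training set.
TRAINING = [
    ("please review the attached contract before the board meeting", "business"),
    ("budget approval needed for the project invoice", "business"),
    ("minutes of the project meeting attached for review", "business"),
    ("fancy lunch today at the pub", "personal/trivial"),
    ("happy birthday hope you have a great weekend", "personal/trivial"),
    ("are you coming to the pub after work", "personal/trivial"),
]
# Hypothetical held-out emails, used only to test the trained model.
HELD_OUT = [
    ("invoice attached for approval", "business"),
    ("lunch at the pub this weekend", "personal/trivial"),
]

model = train(TRAINING)
correct = sum(classify(model, text) == label for text, label in HELD_OUT)
accuracy = correct / len(HELD_OUT)
print(f"held-out accuracy: {accuracy:.0%}")
```

A real deployment would use far more training data and richer characteristics (recipients, roles, subject lines), but the shape is the same: labelled examples in, a learned set of parameters out, then a test on emails the model has not seen.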
Whereas machine-based rules before the AI revolution were based on certainty, machine learning algorithms work with probability. An informal tone in an email may increase the likelihood that the email is trivial (or personal), but it is no guarantee. By taking other data characteristics into account (the subject line of the email, the number of recipients, the roles of the recipients, the topic of the email as indicated by words in the body of the message, etc.) the algorithm can increase its confidence in classifying the email as "trivial / personal" (or as "business"). Machine learning algorithms can tell you not only what they have classified an item as, but also the percentage certainty with which that judgment was made. This can help the organization set confidence thresholds below which the machine's judgments should be reviewed by a person.
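As a rough illustration of how several characteristics combine into a percentage certainty, and how a review threshold might be applied, here is a minimal sketch. The feature names, the weights in `WEIGHTS`, and the 80% `REVIEW_THRESHOLD` are all invented for the example; a real tool would learn its weights from training data, and an organization would choose its own threshold.

```python
import math

# Hypothetical log-odds contributions towards "personal/trivial".
# Positive values push towards "personal/trivial", negative towards "business".
WEIGHTS = {
    "informal_tone": 1.2,
    "single_recipient": 0.8,
    "subject_mentions_project": -2.0,
    "recipient_is_senior_official": -1.0,
}
REVIEW_THRESHOLD = 0.80  # assumed policy: below this certainty, a person checks

def probability_trivial(features):
    """Combine the features present in an email into a probability (logistic function)."""
    log_odds = sum(WEIGHTS[name] for name in features)
    return 1 / (1 + math.exp(-log_odds))

def decide(features):
    """Return (label, certainty, needs_review) for an email described by its features."""
    p = probability_trivial(features)
    label = "personal/trivial" if p >= 0.5 else "business"
    certainty = p if label == "personal/trivial" else 1 - p
    return label, certainty, certainty < REVIEW_THRESHOLD

for features in (["informal_tone"],
                 ["informal_tone", "single_recipient"],
                 ["informal_tone", "subject_mentions_project"]):
    label, certainty, needs_review = decide(features)
    print(f"{features}: {label} ({certainty:.0%}), review={needs_review}")
```

In this sketch an informal tone on its own is not enough to clear the threshold, so the email is routed to a person; adding a second supporting characteristic lifts the certainty above it.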
The nature of records after the AI revolution
Based on the history of record keeping, here are some predictions of how records will be shaped by, and adapt to, the AI revolution:
Records Management in a time characterized by structured data
The rise of structured data poses a challenge to records management theory. That theory is largely based on the assumption that most records (including correspondence and other types of documentation) are created as free-standing objects (unstructured data) that move independently of any structure and therefore need, at some point, to be integrated into a structure.
This theory needs to be refined so that it can adapt to the reality that, since the digital revolution, even correspondence has been created and shared within a structured database. Such a refined theory would place less importance on building record structures into which records are integrated (since most records, including all email correspondence, are created in a database that already has a structure and a schema). Instead, it would emphasize the importance of creating a reasonable, pragmatic and consistent basis for applying retention and access rules across the various structures and schemas of the organization's various records systems.
AI and the ability to restructure and re-aggregate entire records systems
The most profound change brought by the AI revolution is that, for the first time, the ability to reorganize the items in a records system is not limited by the system's original metadata schema. A records management / information governance team can theoretically use any relevant classification logic (any classification scheme that relates in some way to the content of the records system) to re-aggregate the content of that system. The team can then apply retention and access rules through these new aggregations. The re-aggregation can be performed at any point in time (i.e. a record can be assigned to a new aggregation for governance purposes a second, a day, a month, a year, a decade or a century after its creation or receipt).
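The idea of re-aggregating records independently of the original schema, and then applying retention rules through the new aggregations, can be sketched as follows. Everything here is an assumption made for illustration: the three sample records, the aggregation names, the retention periods in `RETENTION_YEARS`, and the keyword rule in `classify`, which merely stands in for a trained model.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records, held in the system's original aggregation (the email account).
RECORDS = [
    {"id": 1, "account": "a.smith", "subject": "Contract renewal terms", "created": date(2012, 3, 1)},
    {"id": 2, "account": "a.smith", "subject": "Lunch on Friday?", "created": date(2012, 3, 2)},
    {"id": 3, "account": "b.jones", "subject": "Contract dispute advice", "created": date(2019, 6, 9)},
]

# Assumed retention periods, in years, for each new aggregation.
RETENTION_YEARS = {"contracts": 12, "trivial": 1}

def classify(record):
    """Stand-in for a trained model: any classification logic could be plugged in here."""
    return "contracts" if "contract" in record["subject"].lower() else "trivial"

def reaggregate(records, classifier):
    """Group records by the classifier's output, ignoring the original account structure."""
    groups = defaultdict(list)
    for record in records:
        groups[classifier(record)].append(record)
    return dict(groups)

def due_for_deletion(groups, today):
    """List the ids of records whose aggregation's retention period has expired."""
    due = []
    for name, members in groups.items():
        for record in members:
            age_years = (today - record["created"]).days / 365.25
            if age_years > RETENTION_YEARS[name]:
                due.append(record["id"])
    return sorted(due)

groups = reaggregate(RECORDS, classify)
print({name: [r["id"] for r in members] for name, members in groups.items()})
print(due_for_deletion(groups, today=date(2025, 1, 1)))
```

Note that the `account` field plays no part in the grouping, and that `reaggregate` can be run again with a different classifier at any later date without changing the underlying records.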
This raises two fundamental questions for the theory and practice of records management:
To ask these questions more concretely, let's look at them in relation to email, the big unresolved records management challenge that the digital revolution brought.
In email systems, correspondence is aggregated into email accounts, and access permissions are applied to the correspondence through those accounts. AI opens up three options for applying retention rules and access permissions to emails:
The first approach is high risk; the second is low benefit. The third approach allows incremental changes that benefit individual email account holders and their colleagues.
We should consider how approaches of the kind advocated by Dave Snowden could make the adoption of AI safer. In such approaches, machine learning classifications are first introduced alongside (or within) existing structures, and then gradually come to influence the application of access permissions and retention rules as trust in the machine learning process grows.
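One way to picture this gradual introduction is a "shadow mode", in which the machine's label is stored alongside the existing account-based rule but only changes retention once reviewers' spot checks show enough agreement. The sketch below is purely illustrative: the 7-year account default, the 1-year period for trivial email, the 95% agreement bar, and the spot-check data are all assumptions, not figures from any real scheme.

```python
def agreement_rate(spot_checks):
    """Share of spot-checked emails where the reviewer agreed with the model's label."""
    agreed = sum(1 for model_label, human_label in spot_checks if model_label == human_label)
    return agreed / len(spot_checks)

def effective_retention_years(model_label, trust, account_default=7, trust_needed=0.95):
    """Let the model's label shorten retention only once trust is high enough.

    Assumed numbers: a 7-year default applied via the email account,
    a 1-year period for trivial email, and a 95% agreement bar.
    """
    if model_label == "personal/trivial" and trust >= trust_needed:
        return 1
    return account_default  # shadow mode: the label is stored but not acted on

# Hypothetical (model label, reviewer label) pairs from two rounds of spot checks.
early = [("business", "business"), ("personal/trivial", "business"),
         ("personal/trivial", "personal/trivial"), ("business", "business")]
later = [("business", "business")] * 12 + [("personal/trivial", "personal/trivial")] * 8

for name, checks in (("early", early), ("later", later)):
    trust = agreement_rate(checks)
    years = effective_retention_years("personal/trivial", trust)
    print(f"{name}: trust={trust:.0%}, trivial email kept for {years} year(s)")
```

The design choice is that a misclassification costs nothing while trust is low, because the existing account-level rule still governs; the model only earns influence over retention as the evidence of its accuracy accumulates.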
The theories and explanations outlined in this article were developed as part of my doctoral project at Loughborough University, in which archival policy towards email is viewed from a realistic perspective. An article from this project, 'The reasonable deletion of government emails', was published in March 2019 by the Records Management Journal. An open access version of this article is available in Loughborough University's digital repository (once in the repository, click 'Download' to download the PDF, or read it in the window provided).