Records management before and after the AI revolution – Thinking Records

I recently recorded an IRMS podcast with Alan Pelz-Sharpe, co-author of what may be the first book on the use of artificial intelligence (AI) for information management purposes. In the interview, Alan said that he thought the transition to information management through AI would be an even bigger change than the transition from analog to digital work in the 1990s.

Keeping records is a continuous activity. History suggests that once a society begins to keep records, nothing short of the end of that society causes it to stop. Record keeping is essential to the functioning of individual organizations and of society as a whole. Fundamental changes in society lead to fundamental changes in record-keeping practices.

This article shows how record keeping changed after the industrial revolution and the digital revolution, and how it will change after the AI revolution. It shows how the digital revolution transformed records management from a service function into a policy function, and how the AI revolution will transform it again, this time into a data science.

If we look across this broad sweep of record-keeping history, we see three major trends:

  • The ever-increasing volume of the records created;
  • The increasing dominance of structured data systems over unstructured data;
  • The ever-growing ability to reclassify and re-aggregate all the records in a record system.

The AI revolution offers records managers / information governance professionals new powers to intervene effectively within and between records systems for governance purposes. Like any power, this brings responsibility and the need to use it wisely and safely.

This article tries to outline both the possibilities that AI offers and the questions that arise for the profession of records management and information governance.

Records before the industrial revolution

In the United Kingdom, our National Archives holds an unbroken set of government records dating from the turn of the 13th century, when the administration of the English kings began to keep copies of the letters and documents it sent. From the 13th to the middle of the 19th century, we can characterize record keeping as follows:

  • Records consist largely of correspondence. The correspondence is mostly kept in simple chronological order.
  • The volume of records is small. The royal administration is very small. The speed at which correspondence moves (on horseback over bad roads, or by water) is slow.
  • There is no need for a records management profession, because no specialist skills are required to manage chronological sequences of correspondence.
  • Some records are made not as correspondence but as entries in registers, inventories, index books or ledgers. This is the beginning of what we now call structured data.

Even at this early point in the history of record keeping, we can see the fundamental difference between "structured" and "unstructured" data:

  • Correspondence (a letter) is unstructured data because, at the time of its creation, it is independent of any system or structure and moves freely. The letter must therefore be integrated into some kind of structure alongside other pieces of correspondence before it can function fully as part of a record.
  • In contrast, an entry in a register, an inventory, an index book or a ledger is structured data, since from the moment of its creation it is already integrated into a structure with other similar entries.

One of the enduring concerns of records management has been to ensure that records are consistently captured into a coherent structure. This concern is of crucial importance when organizations mainly create unstructured data such as free-standing, freely movable correspondence and documents. For organizations that do their work through structured data systems, however, it matters far less, since the structure of the database is defined from the start and each record is captured into that structure at the moment of creation.

After the Industrial Revolution

The industrial revolution at the turn of the 19th century brought together large concentrations of workers for the first time. At the turn of the 20th century, large concentrations of office workers were brought together in the bureaucracies of ever-growing government departments, companies and other institutions. This led to a revolution in record keeping:

  • The volume of records created is now much higher. Organizations have grown in size. Correspondence moves faster (motorized transport on asphalt roads, rail, steamship and later air).
  • Records can best be characterized as documents. These documents are kept in sophisticated filing systems in which a file (or set of files) is created for each piece of work. Files arising from similar types of work are grouped into record series, each of which can usually be managed with a single access and retention rule.
  • There is a need for a records management profession, because filing systems are sophisticated and so is the set of retention rules that governs how long the records in each series are kept.
  • In organizations with demanding record-keeping requirements, records management is set up as a service. In the UK government, this service is provided by registries. Registries are inserted into the flow of correspondence between sender and recipient, so that correspondence is filed before it reaches the recipient. This has the double advantage of ensuring that the item is filed and that the recipient reads it in the context of previous correspondence on the same case / matter / project / topic.
  • An organization can classify its various record series to obtain an integrated structure for all its records.
  • The volume of structured data is also increasing, and there are more sophisticated methods for storing it, such as card indexes. This structured data sits outside the organization's main record-keeping system.

The nature of records changed after the industrial revolution. The pre-industrial organization had kept letters in chronological order. The 20th-century practice of creating a file for each piece of work gave rise to new classes of document such as "file notes". These were documents created not primarily as direct communication from one person / office to another, but as an addition to the file, to ensure that the file could tell the whole story of that piece of work. The growing ability to copy documents (first through typewriters, typing pools and carbon paper, later through photocopiers) made it possible to place a copy of a document on every file to which it related.

After the digital revolution

The forerunner of the digital revolution was the computerization of various processes and workflows by organizations in the 1960s, 1970s and 1980s. This computerization was largely limited to very predictable, high-volume processes such as payroll, financial accounting, inventory control, etc. These processes were computerized by building databases with a data model that was very specifically adapted to the respective process.

The digital revolution hit the major English-speaking economies in the early 1990s, when a way was found to apply a data structure to general business correspondence. The data structure in question was the one contained in the email protocol, which set the format in which one Internet-connected computer could send a message (an email) to another. The spread of email led to a computer on the desk of every office worker in the major economies.

Just as a quantum particle such as an electron can be viewed as either a particle or a wave, emails within an email system can be viewed as either:

  • Unstructured data – emails are separate pieces of correspondence that move from one person to another and should eventually be filed with other records arising from the same piece of work; OR
  • Structured data – an email system is a corporate database, and each new email is a new entry in that database. As with entries in other types of database, neither the sender nor any recipient needs to file the email, since it is integrated into the structure / schema of the email system from the moment it is sent / received.

In the early digital age (1990s to the present):

  • The predominant form of record system is the database. Organizations have multiple databases. Some are specific to a particular process or line of business, others are organization-wide. An email system is a database of correspondence. A content management system is a database of content made available through a website and / or an intranet. A customer relationship management system is a database of contacts with customers, and so on. Some operational and logistics databases may contain business-critical information as well as important intellectual property and know-how.
  • The volume and velocity of records increase exponentially. With the arrival of email, the time it takes for a piece of correspondence to get from sender to recipient virtually disappears.
  • There is no overall schema for organizing records. Each database has its own metadata schema / data model.
  • Records management becomes a governance / policy function, which sets out what individual employees should and should not do with the documents and data they create and receive.
  • The transfer of structured data from analog ledgers, index books, inventories, card indexes and registers into digital databases is transformative, because of the powerful new methods for processing and analyzing data that computers bring with them.
  • With the help of metadata fields, machines can "understand" data in structured systems. Computers can perform information management tasks if they are given rules that specify which actions should be triggered by which value in which metadata field.
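
As a minimal sketch of what such a metadata-driven rule can look like, the following Python assumes a hypothetical `record_type` field and made-up retention periods (none of these names or numbers come from a real system). The point is only that the machine needs an explicit rule and a clean, unambiguous field value before it can act.

```python
from datetime import date

# Hypothetical retention periods (in years), keyed on the value of a
# "record_type" metadata field. Names and numbers are illustrative only.
RETENTION_YEARS = {"invoice": 7, "payroll": 6, "contract": 12}


def retention_action(record, today):
    """Return 'destroy', 'retain' or 'review' for a single record."""
    years = RETENTION_YEARS.get(record.get("record_type"))
    if years is None:
        # No rule matches the metadata value, so the machine cannot decide.
        return "review"
    created = record["created"]
    try:
        due = created.replace(year=created.year + years)
    except ValueError:
        # Created on 29 February and the due year is not a leap year.
        due = created.replace(year=created.year + years, day=28)
    return "destroy" if today >= due else "retain"


if __name__ == "__main__":
    record = {"record_type": "invoice", "created": date(2010, 3, 1)}
    print(retention_action(record, date(2020, 1, 1)))
```

Anything the rules do not cover falls out as "review", which is exactly the limitation of purely rule-based automation: it can only act on the values it has been told about.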

Records management is far less effective in the two decades after the digital revolution than in the previous four decades.

At the beginning of the digital age, the records management profession saw its task as the management of electronic documents. This was based on the assumption that the fundamental change of the digital revolution was a change of format, from paper to digital. We assumed that unstructured data would continue to predominate over structured data, as it had throughout the history of record keeping before the digital revolution. We assumed that correspondence would continue to function as unstructured, free-standing, freely moving objects that had to be captured at some point in their trajectory and integrated into a structure and a system.

The standard records management strategy in the first decade of the digital age was to configure organization-wide records systems into which documents and correspondence could be captured and integrated with other records within a record classification / structure.

This strategy failed because most correspondence exists as email within email systems. The only move an email makes is from one email system to another, when the sender uses a different email system from the recipient. There is no point at which it has to be filed by the sender or by any recipient. It is already integrated into the structure and metadata schema of the email systems of both sender and recipient.

The only type of record in the digital age that functions as "unstructured data", and must be deliberately captured into a structure, is the document created in word processing / presentation / spreadsheet software such as Microsoft Word / PowerPoint / Excel. These documents behave like the documents of the paper age. At the time they are created they are not yet integrated into a structure, so the creator has to put them somewhere. This creates the need for document management systems.

Corporate document management systems deserve attention. They need and deserve careful management. They serve as the record system for documents created in packages such as Microsoft Word, PowerPoint and Excel. They offer many practitioners more than enough work. But we cannot base a profession on them. Our profession has less and less influence over such systems, since the largest part of the market for them belongs to just two suppliers (Microsoft and Google).

Corporate document management systems have an uncomfortable relationship with email systems. Document management systems rarely function as the record system for correspondence. Email systems, on the other hand, often serve as a record system for documents. When a document needs to be communicated, it is usually sent by email. The email system records the date the document was sent, who sent it, who it was sent to, what message accompanied it and what replies were received. Both the corporate document management system and the email system span all of the organization's activities. Each holds most of the organization's documents, but the email system holds far more of the discussion and decision-making around and beyond those documents.

The latest generation of collaboration systems (such as MS Teams and Slack) tries to overcome this separation between email systems and document management systems by moving team-based communication out of email and into a collaborative space. This is a better strategy than moving conversations that have taken place in the email environment into a document management environment. It has a good chance of working where individuals communicate primarily with a narrow group of people (for example, within a project team). However, it usually works less well where individuals work across team and organizational boundaries, with a shifting set of people, on a range of topics. This latter category includes many of the people whose correspondence archivists have typically wanted to select for permanent preservation (policy makers, diplomats, etc.).

The AI Revolution

The AI revolution is taking place at the beginning of the third decade of the 21st century. It massively expands the scope of the judgments that machine intelligence can make.

Before the AI revolution, machines could make information management decisions only when each of the following three conditions was met:

  • The machine had been explicitly programmed with how the judgment was to be made;
  • The judgment could be made on the basis of values in metadata fields;
  • The values in those metadata fields were clear and unambiguous.

The AI revolution enables machines to make judgments without having been explicitly programmed to do so. We no longer have to define every step that a machine has to take. In fact, if we used a machine learning tool to identify which emails in an email system could be classified as "business correspondence", we would be using one set of algorithms (the machine learning model) to develop another set of algorithms (those that actually distinguish business from personal / trivial email) based on the patterns observed in the data.

The most obvious way to train a machine learning tool to identify business correspondence in an email system is to feed it a set of training emails, each labeled as either "business" or "personal / trivial". The machine learning model looks for the features of the business correspondence whose values tend to differ from the values of the same features in the non-business correspondence. The tool arrives at a hypothesis algorithm that sets the parameters for each feature of the data. The algorithm is typically tested by feeding it a further set of business and non-business emails to see how accurately it distinguishes between the two.
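
To make that train-then-test loop concrete, here is a minimal sketch using a naive Bayes word-count classifier written with only the Python standard library. Naive Bayes is my choice for illustration, not a claim about what any particular product uses, and the handful of training and test emails are invented. A real deployment would need far more data and richer features (recipients, roles, subject lines and so on).

```python
import math
from collections import Counter

# Invented, hand-labeled training emails (illustrative only).
TRAINING = [
    ("please review the attached contract draft before the board meeting", "business"),
    ("budget approval needed for the project invoice", "business"),
    ("minutes of the project meeting attached", "business"),
    ("fancy lunch on friday", "personal/trivial"),
    ("happy birthday see you at the party", "personal/trivial"),
    ("anyone want coffee", "personal/trivial"),
]

# Invented held-out emails used to test the learned parameters.
HELD_OUT = [
    ("invoice for the contract attached", "business"),
    ("party on friday lunch", "personal/trivial"),
]


def tokenize(text):
    return text.lower().split()


def train(labeled_emails):
    """Learn per-label word counts from (text, label) pairs."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for text, label in labeled_emails:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def log_scores(model, text):
    """Log of prior times likelihood for each label, with add-one smoothing."""
    total = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label, n in model["label_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + vocab_size
        score = math.log(n / total)
        for word in tokenize(text):
            if word in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return scores


def classify(model, text):
    scores = log_scores(model, text)
    return max(scores, key=scores.get)


def accuracy(model, labeled_emails):
    """Fraction of labeled emails the model classifies correctly."""
    hits = sum(1 for text, label in labeled_emails if classify(model, text) == label)
    return hits / len(labeled_emails)


if __name__ == "__main__":
    model = train(TRAINING)
    print(classify(model, "invoice for the contract attached"))
    print(accuracy(model, HELD_OUT))
```

Notice that nobody wrote a rule saying "invoice means business". The word counts learned from the labeled examples play the role of the parameters, and the held-out emails play the role of the test set.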

Whereas machine-based rules before the AI revolution worked on certainty, machine learning algorithms work on probability. An informal tone in an email may increase the likelihood that the email is trivial (or personal), but it is no guarantee. By taking other features of the data into account (the subject line of the email, the number of recipients, the roles of the recipients, the topic of the email as indicated by the words in the body of the message, etc.), the algorithm can increase its confidence in classifying the email as "trivial / personal" (or as "business"). Machine learning algorithms can tell you not only how they have classified an item, but also the percentage confidence with which the judgment was made. This can help the organization set confidence thresholds below which the machine's judgments should be reviewed by people.
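
A threshold of this kind can be sketched in a few lines. The example below assumes the classifier hands back a log score per label (as a naive Bayes model would); the function names and the 0.9 threshold are hypothetical choices for illustration, not a recommendation.

```python
import math


def to_probabilities(log_scores):
    """Turn per-label log scores into probabilities that sum to 1."""
    top = max(log_scores.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = {label: math.exp(score - top) for label, score in log_scores.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


def route(log_scores, threshold=0.9):
    """Return (label, confidence, decision) for one classified item."""
    probabilities = to_probabilities(log_scores)
    label = max(probabilities, key=probabilities.get)
    confidence = probabilities[label]
    decision = "auto" if confidence >= threshold else "human_review"
    return label, confidence, decision


if __name__ == "__main__":
    confident = {"business": math.log(0.95), "personal/trivial": math.log(0.05)}
    borderline = {"business": math.log(0.6), "personal/trivial": math.log(0.4)}
    print(route(confident))
    print(route(borderline))
```

Where to set the threshold is a governance decision rather than a technical one: a lower threshold means less human review but more machine errors acted upon.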

The nature of record keeping after the AI revolution

Based on the history of record keeping, here are some predictions of how record keeping will shape and adapt to the AI revolution:

  • Records management / information governance becomes a data science that oversees the algorithms that apply record classifications and / or retention and access rules.
  • We will know that information governance has entered the AI era when access and retention rules are applied to aggregations whose records were assigned to them by a machine learning algorithm.
  • To an algorithm, everything is data. If a data set contains patterns, an algorithm can learn those patterns and use its knowledge of them to make distinctions. Machines are no longer limited to responding to highly structured metadata. Algorithms can identify patterns in any data, structured or unstructured.
  • Organizations still have multiple databases. Some algorithms may use data from one database to manage data in another (for example, using information from job descriptions in an HR database to help algorithms identify important business emails in an email system).
  • The volume and velocity of records and data will continue to increase as AI algorithms generate content (for example, through automatic replies or chat bots) as well as supporting its management.
  • Algorithms, like humans, understand data best when it is viewed in the context of its original application. Email is best understood within email systems, or within repositories that can replicate the structure and functionality of email systems. There is no longer any need to move content out of a structured database (such as an email system) into another system.
  • Organizations have the technical possibility of an overall structure / schema for organizing their records. However, this dream is likely to prove difficult to achieve, since data created in a structured data set is usually much more meaningful and manageable within the structure of that data set than outside it. Algorithms will be used more often to make data manageable within its data set than to extract data from its original data set in order to manage it under an alternative structure.
  • AI brings a number of capabilities that humans have never had before. One example is the ability to restructure an entire record system, so that access and retention rules can be applied to a completely different set of aggregations from those that existed when individuals created or received the records. Learning whether (and if so, how and when) to use this capability becomes a challenge for the records management profession.


Records Management in a time characterized by structured data

The rise of structured data poses a challenge to records management theory. This theory is largely based on the assumption that most records (including correspondence and other types of document) are created as free-standing objects (unstructured data) that move independently of any structure and therefore need, at some point, to be integrated into a structure.

This theory needs to be refined so that it can accommodate the reality that, since the digital revolution, even correspondence has been created and shared within a structured database. Such a refined theory would place less weight on building record structures into which records are integrated (since most records, including all email correspondence, are created in a database that already has a structure and schema). Instead, it would emphasize the importance of establishing a reasonable, pragmatic and consistent basis for applying retention and access rules across the various structures and schemas of the organization's various record systems.

AI and the ability to restructure and re-aggregate entire record systems

The most profound change brought by the AI revolution is that, for the first time, the ability to reorganize all the items in a record system is not limited by the system's original metadata schema. A records management / information governance team can in theory use any relevant classification logic (any classification scheme that relates in some way to the content of the record system) to re-aggregate the content of a record system. The team can then apply retention and access rules through these new aggregations. The re-aggregation can be carried out at any time: for governance purposes, a record can be assigned to a new aggregation a second, a day, a month, a year, a decade or a century after its creation or receipt.
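
The mechanics of such a re-aggregation can be sketched as follows. Everything here is hypothetical: the records, the retention periods, and in particular `stand_in_classifier`, a crude keyword test standing in for whatever machine learning model an organization might actually train.

```python
from collections import defaultdict

# Hypothetical retention periods (years) attached to the NEW aggregations.
RETENTION_YEARS = {"procurement": 7, "recruitment": 2}
DEFAULT_YEARS = 1

# Invented records, originally aggregated only by email account.
RECORDS = [
    {"id": 1, "account": "alice", "text": "Supplier tender responses attached"},
    {"id": 2, "account": "alice", "text": "Interview schedule for the candidate"},
    {"id": 3, "account": "bob", "text": "Updated tender evaluation"},
    {"id": 4, "account": "bob", "text": "Lunch?"},
]


def stand_in_classifier(record):
    """Keyword placeholder for a trained model; returns a business activity."""
    text = record["text"].lower()
    if "tender" in text or "supplier" in text:
        return "procurement"
    if "interview" in text or "candidate" in text:
        return "recruitment"
    return "unclassified"


def reaggregate(records, classify):
    """Group record ids by the aggregation the classifier assigns."""
    groups = defaultdict(list)
    for record in records:
        groups[classify(record)].append(record["id"])
    return dict(groups)


def retention_by_record(groups):
    """Apply each aggregation's retention period to the records in it."""
    return {
        record_id: RETENTION_YEARS.get(aggregation, DEFAULT_YEARS)
        for aggregation, ids in groups.items()
        for record_id in ids
    }


if __name__ == "__main__":
    groups = reaggregate(RECORDS, stand_in_classifier)
    print(groups)
    print(retention_by_record(groups))
```

The original `account` field is never consulted when the retention period is assigned, which is precisely what makes this capability both powerful and unsettling.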

This raises two fundamental questions for the theory and practice of records management:

  • What are the consequences of being able to reclassify, re-aggregate and / or re-label all of the items in a record system, for a profession that has traditionally sought to maintain stable information governance regimes, with predictable access permissions and predictable retention rules applied to predictable aggregations of records?
  • What are the implications of being able to assign retention and access rules to aggregations that did not exist at the time the records were originally created and received, where the creators / recipients of the records would not have anticipated the access and retention rules that are now being applied?

To ask these questions more concretely, let's look at them in relation to email – the big unresolved record management challenge that the digital revolution brought.

In email systems, correspondence is aggregated into email accounts, and access permissions are applied to the correspondence through those accounts. AI opens up three options for applying retention rules and access permissions to email:

  • Ignore the existing structure / schema – bypass email accounts: use AI to re-aggregate email correspondence (for example, by applying a corporate records classification), so that access permissions and / or retention rules are no longer applied through email accounts but through the classification applied to the records.
  • Preserve the existing structure / schema – manage email accounts: use AI to help manage email accounts by identifying the trivial, personal and sensitive emails within them.
  • Use the existing structure and schema as a starting point – enhance email accounts and go beyond them: use AI to classify the emails within email accounts by business activity, but continue to use email accounts as the main aggregation for applying access permissions. Once individuals get used to having their emails automatically classified against business activities, they may be willing to give selected colleagues access to the correspondence arising from selected activities in their email account.

The first approach is high risk; the second is low benefit. The third approach allows incremental change that benefits both individual email account holders and their colleagues.
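
The access model behind the third approach can be sketched very simply. The example assumes each email already carries a machine-assigned `activity` label and that account holders record grants per activity; the field names and the `grants` structure are hypothetical.

```python
def can_read(user, email, grants):
    """Decide whether a user may read one email.

    The email account remains the main aggregation: the account holder can
    always read their own mail. Colleagues can read it only if the holder
    has granted them access to that email's business activity.
    grants maps (account, activity) to a set of colleague names.
    """
    if user == email["account"]:
        return True
    allowed = grants.get((email["account"], email["activity"]), set())
    return user in allowed


if __name__ == "__main__":
    # Alice shares only her "project-x" correspondence with Bob.
    grants = {("alice", "project-x"): {"bob"}}
    project_email = {"account": "alice", "activity": "project-x"}
    personal_email = {"account": "alice", "activity": "personal"}
    print(can_read("bob", project_email, grants))
    print(can_read("bob", personal_email, grants))
```

Because access is widened one activity at a time and always by the account holder, a misclassification exposes far less than it would under the first approach.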

We should consider how ideas such as those of Dave Snowden could make the adoption of AI safer. In such approaches, machine learning classifications are first introduced alongside (or within) existing structures, and only gradually come to influence the application of access permissions and retention rules as trust in the machine learning process grows.

The theories and explanations outlined in this article were developed as part of my doctoral research project at Loughborough University, which looks at archival policy towards email from a realist perspective. An article from this project, 'The defensible deletion of government email', was published in March 2019 by the Records Management Journal. An open access version of this article is available in Loughborough University's digital repository (once in the repository, click 'Download' to download the PDF or read it in the window provided).

