In this presentation, let's see what would happen if we put Web concepts at the center of Information Governance and Management efforts.
Most current buzzwords would not exist without the Web's scientifically underpinned information model. For instance...
The whole App culture has only been possible because of an extremely well organized information space created by Web protocols.
And the world's most successful Information Management company has the Web's information model literally in its genome...
“I want to download the entire World Wide Web onto my Desktop”
Larry Page, 1996
The Google Story, David A. Vise, 2008
Remember: the Web started off in a not too large but complex organization, with information silos distributed all over the planet.
In other words, an organization much like many today.
This is a famous picture, and most people naturally focus on the NeXT workstation where the very first web server was running. But to the lower left, the printout of an ontology is also visible: the first problem space that the people at CERN tried to solve.
From these early days, Tim Berners-Lee and his colleagues evolved the concept...
... into a worldwide, all-encompassing information network, as all of us know. This has only been possible through the strict deployment of standards, many issued by the World Wide Web Consortium (W3C), which Berners-Lee directs.
|Protocol||Main Information Abstraction|
The Web's information model is based on hierarchical ordering, as found in most computer filesystems.
This strict hierarchy is "broken" by what is perhaps the Web's most important feature: links. The notion of links had been known from hypertext patterns for a long time.
Then came the Common Gateway Interface (CGI), later documented in an RFC. CGI is the mother of all dynamically generated content on the Web: it allowed page content to be generated on the fly. eCommerce, eBanking, blogs, and many other systems all build on this idea.
This suddenly raised the issue of what links were actually connecting: Berners-Lee's links connected documents only, but what if these are generated as they are requested?
The answer was an abstraction called RESOURCE.
A resource can be anything, even nothing, and it has an address, called the URI.
A resource can be something big, like information about an entire building...
... or something small, like a cell in an Excel file. As long as it has an address, it remains callable from outside.
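As a minimal sketch of this idea, the following Python snippet takes two URIs, one for a large resource and one for a very small one, and splits each into its hierarchical parts. Both URIs and the `describe` helper are invented for illustration; they do not come from any real system.

```python
from urllib.parse import urlsplit

# Hypothetical addresses, invented for illustration only.
building = "https://example.org/facilities/buildings/42"
cell = "https://example.org/reports/budget/sheets/1/cells/B7"


def describe(uri: str) -> dict:
    """Split a URI into the host and the hierarchical path segments."""
    parts = urlsplit(uri)
    return {
        "host": parts.netloc,
        "path_segments": [s for s in parts.path.split("/") if s],
    }


for uri in (building, cell):
    print(describe(uri))
```

Whether the resource is a building or a single cell, the address has the same shape, and that is all a caller needs to know.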
Resources are based on several RFCs:
There are RFCs, but a conclusive description of the resource concept and of best practices for Web-based systems came in the dissertation of Roy Fielding, Ph.D., a principal author of the HTTP/1.1 standard and co-founder of the Apache Software Foundation. He defined the REST architectural style, the predominant communication model on the Web today.
How is a resource structured?
There are three key parts: an address (the URI), an interface that receives requests and answers in hypermedia formats, and finally the internal part of the resource, the state.
And as a resource can be anything, let's assume our resource here is a record, managed by Records Management. It consists of a document (the record) and some metadata about it.
Upon receiving a request via the URI, the resource computes a representation (a media format) of its state and returns it to the caller.
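The three parts can be sketched in a few lines of Python. This is an illustration under assumptions, not an implementation of any particular records system: the class name `RecordResource`, the method `handle_get`, and the example URI are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class RecordResource:
    uri: str                                      # part 1: the address
    document: bytes                               # part 3: state (the record)
    metadata: dict = field(default_factory=dict)  # part 3: state (metadata)

    def handle_get(self) -> Tuple[int, str, str]:
        """Part 2, the interface: compute a representation of the state."""
        lines = [f"Record at {self.uri}", f"Size: {len(self.document)} bytes"]
        lines += [f"{key}: {value}" for key, value in sorted(self.metadata.items())]
        # The caller receives a status, a media type, and the representation,
        # never the stored state itself.
        return 200, "text/plain", "\n".join(lines)


record = RecordResource(
    uri="https://example.org/records/1",  # hypothetical address
    document=b"abc",
    metadata={"title": "Contract"},
)
status, media_type, body = record.handle_get()
print(status, media_type)
print(body)
```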
The metadata of a record consists of a set of properties describing its content, and the record resource itself is subject to many context and controlling settings (right hand side).
Don't think of properties as simple values only. That's an option, but in an interlinked world, they are often URIs themselves, i.e. links.
As URIs are not limited to the domain of a single application, they can point to any resource, including special resource descriptions, which takes us to the worlds of the Resource Description Framework (RDF), Linked Data, and Controlled Vocabularies.
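To make the difference between plain values and links concrete, here is a small Python sketch. The property names, the URIs, and the `is_link` helper are assumptions made up for this example; a real system would more likely use an RDF vocabulary.

```python
from urllib.parse import urlsplit

# Hypothetical metadata of a record: one literal value and two links.
record_metadata = {
    "title": "Supplier contract 2012",
    "creator": "https://example.org/people/jdoe",
    "classification": "https://example.org/vocab/retention#contracts",
}


def is_link(value: str) -> bool:
    """Treat a value as a link if it parses as an http(s) URI with a host."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Properties whose values point to other resources, possibly in other domains.
links = {name: value for name, value in record_metadata.items() if is_link(value)}
print(sorted(links))
```

A client can follow each of these links to learn more about the creator or the classification, without the record having to copy that information.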
How does that work in practice? A client (a browser or a machine) sends a request to our record resource. HTTP lets the client ask for a specific representation format (here PDF). The resource tries to satisfy this and, if successful, answers with a PDF document and a status of 200 (OK).
In fact, the resource may have done a complex interpretation of its state. It may have sent back the PDF document stored in the secure storage, perhaps enriched by some additional state originating from, for instance, resource description resources.
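The exchange can be sketched as a simplified form of HTTP content negotiation. The function `negotiate` and the sample representations are hypothetical, and the sketch deliberately ignores q-values and wildcards that real Accept headers may carry.

```python
def negotiate(accept_header: str, available: dict) -> tuple:
    """Return (status, media_type, body) for the first acceptable format."""
    # Simplification: parameters such as ";q=0.5" are dropped, and the
    # formats are tried in the order in which the client listed them.
    requested = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in requested:
        if media_type in available:
            return 200, media_type, available[media_type]
    # 406 Not Acceptable: none of the requested formats can be produced.
    return 406, None, b""


# Hypothetical representations the record resource is able to compute.
representations = {
    "application/pdf": b"%PDF-1.4 ...",
    "text/plain": b"Supplier contract 2012",
}

status, media_type, body = negotiate("application/pdf", representations)
print(status, media_type)
```

If the client asks only for a format the resource cannot produce, the sketch answers with status 406 instead of 200.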
In a purely Web-based records management system, communication from the different client groups takes place exclusively via standards and through REST practices. Machine-to-machine communication is best leveraged via the CMIS standard, which defines REST-based access, including the definition of exchange formats, that is, representations.
Heavy-usage clients such as eDiscovery engines also interact with the resources, which are exposed to the outside as a standards-based system of resources with their addresses. Consequently, everybody sees one unique and uniform address space.
Where the state is kept is absolutely transparent to the client. It only receives a representation of it anyway.
As opposed to islands, best-of-breed tools will manage the resource space, based on standards such as CMIS and the Java Content Repository (JCR). The resource space has no limits: it can span many enterprise entities and geographies; where special restrictions apply, the information model can stay the same, while access to resources is limited to, say, a single jurisdiction.
The URI of this presentation: