The Web
[A Trip Through Web's Information Space (32 stops)]
© Jürg Meier, AIIM Certified Information Professional
15 Nov 2013, for ARMA Swiss Chapter
About the Web's Information Model and its use for Records and Information Management.

Hello, I'm your ECM, I know how to do everything...

Enterprise Content Management systems try to solve everything around unstructured content, but in practice they form islands and account for many of today's Information Management challenges.
  • access and discoverability may be difficult
  • duplicates
  • wrong/insufficient services and quality thereof
  • meta-/masterdata may not be aligned, domain space reduced to single system
  • proprietary APIs
  • vendor lock-in
  • scalability limitations
ECM islands either try to do everything or are limited to a single problem domain. But most of all, they always want to keep the information asset to themselves and consider it their property.

In this presentation, let's see what would happen if we put Web concepts at the center of Information Governance and Management efforts.

Most current buzzwords would not exist without the Web's scientifically underpinned information model. For instance...

The cloud would be completely unthinkable without a sound and extremely scalable information protocol. Or what do you think are the small lines between the computers and the cloud in the picture?
Put the other way round: Big Data wouldn't even be needed without the tremendous volumes of data and information flowing over standardized Web protocols.
Big Data

The whole app culture has only been possible because of the extremely well-organized information space created by Web protocols.

And the world's most successful Information Management company has the Web's information model literally in its genome...

“I want to download the entire World Wide Web onto my Desktop”
Larry Page, 1996
The Google Story, David A. Vise, 2008

We remember: the Web started off in a not too large but complex organization, with information silos distributed all over the planet.
That is, like many organizations today.

Solving the Knowledge Management challenges of a single organization

This is a famous picture, and most people naturally focus on the NeXT workstation where the very first web server was running. But to the lower left, the printout of an ontology is also visible: the first problem space that the people at CERN tried to solve.


From these early days, Tim Berners-Lee and his colleagues evolved the concept...


... to a worldwide, all-encompassing information network, as all of us know. This has only been possible through the strict deployment of standards, many issued by the World Wide Web Consortium (W3C), where Berners-Lee serves as Director.


Some Background


  • ... but in the context of this presentation, the term is used in its original sense: enabling interchange between two otherwise separated systems.
Consequently, standards are a way to abstract over differing vendor implementations and thus reduce vendor lock-in from a user perspective.



  • What is an RFC (Request for Comments)?
    • An RFC is a memorandum authored by engineers and computer scientists
    • It gets peer reviewed
    • Some RFCs are finally adopted by the Internet Engineering Task Force (IETF) as standards; others remain informational

When talking about the Web and its information model, we mean a set of RFCs that build on the foundation of the Internet protocols, such as TCP/IP:
  Layer     Protocol  Main Information Abstraction
  Web       HTTP      Resource
  Internet  TCP/IP    Datagram

The Web Information Model

The Web's information model is based on hierarchical ordering, as found in most computer filesystems.

This strict hierarchy is "broken" by what is perhaps the Web's most important feature: links. The notion of links had long been known from hypertext.

The terms hypertext and hypermedia were introduced by American information technology pioneer Ted Nelson in 1965.
  • By introducing "networked" links, the Web's information model transcended the boundaries of a single application or system and paved the way to a global information space.

Then came something called CGI (Common Gateway Interface), later documented as an RFC. CGI is the mother of all dynamically generated content on the Web in that it allowed page content to be generated on the fly. eCommerce, eBanking, blogs and many more systems all build on this idea.

This suddenly raised the issue of what links were actually connecting: Berners-Lee's links connected documents only, but what if these are generated as they are requested?


(RFC 3875)

The answer was an abstraction called RESOURCE.

A resource can be anything, even nothing; what it always has is an address, called the URI (Uniform Resource Identifier).

A resource can be something big, like information about an entire building...
... or something small like a cell in an Excel file. As long as it has an address, it remains addressable from outside.
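
To make the idea of "big" and "small" resources concrete, here is a minimal sketch that takes two addresses apart with Python's standard library. Both URIs and their path layouts are invented for illustration; they are assumptions, not addresses from the presentation.

```python
from urllib.parse import urlsplit

# Hypothetical addresses, invented for illustration only.
building = "https://example.org/buildings/headquarters"
cell = "https://example.org/workbooks/budget/sheets/1/cells/B7"


def describe(uri: str) -> dict:
    """Split a URI into the generic parts every URI shares."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,        # how to talk to the resource
        "authority": parts.netloc,     # who is responsible for the address
        "path": parts.path,            # which resource is meant
    }


big = describe(building)
small = describe(cell)
```

Both addresses have exactly the same shape; the granularity of what sits behind them is invisible in the address scheme.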

Resources are based on several RFCs:

  • RFC 1738 (12/1994)
  • RFC 2396, 2732 (08/1998, 12/1999)
  • RFC 3305 (08/2002)
  • RFC 3986 (01/2005), which obsoletes RFC 2396 and RFC 2732

The RFCs define the pieces, but a coherent description of the resource concept and of best practices for Web-based systems came in the dissertation of Roy Fielding, Ph.D., a principal author of the HTTP/1.1 standard and co-founder of the Apache Software Foundation. He defined the REST architectural style, the predominant communication model on the Web today.


  • Representational State Transfer Architectural Style,
    (Roy Fielding, UC Irvine, 2000)

  • Constraints: client-server, stateless, cacheable, layered system, uniform interface, code on demand (optional)
  • Uniform interface of a resource:
    • Identification of resources
    • Manipulation of resources through representations
    • Self-descriptive messages
    • Hypermedia as the engine of application state (a.k.a. HATEOAS)
    • ...

How is a resource structured?

There are three key parts: an address (the URI), an interface that receives requests and answers with hypermedia formats, and finally the internal part of the resource, its state.


And as a resource can be anything, let's assume our resource here is a record, managed by Records Management. It consists of a document (the record) and some metadata about it.

Upon receiving a request via the URI, the resource computes a representation (a media format) of its state and returns it to the caller.
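
The three parts can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `RecordResource`, its fields, the example URI and the metadata keys are all hypothetical, and a real system would sit behind an HTTP server rather than a plain method call.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RecordResource:
    """Hypothetical record resource: address, interface, internal state."""
    uri: str                                   # the address
    document: bytes                            # state: the record itself
    metadata: Dict[str, str] = field(default_factory=dict)  # state: metadata

    def represent(self, media_type: str) -> Tuple[int, str, bytes]:
        """The interface: compute a representation of the state.

        Returns (HTTP status, content type, body).
        """
        if media_type == "application/pdf":
            return 200, media_type, self.document
        if media_type == "application/json":
            body = json.dumps({"uri": self.uri, "metadata": self.metadata})
            return 200, media_type, body.encode("utf-8")
        # The resource cannot produce the requested format.
        return 406, "text/plain", b"Not Acceptable"


record = RecordResource(
    uri="https://example.org/records/4711",    # invented address
    document=b"%PDF-1.4 placeholder",
    metadata={"title": "Board minutes", "retention": "10 years"},
)
status, content_type, body = record.represent("application/json")
```

The caller never touches `document` or `metadata` directly; it only ever sees what `represent` computes from them.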


The metadata of a record consists of a set of properties describing its content, and the record resource itself is subject to many context and controlling settings (right hand side).


Don't think of properties as simple values only. That's an option, but in an interlinked world, these are oftentimes URIs themselves, i.e. links.

As URIs are not limited to the domain of a single application, they can point to any resource, including special resource descriptions, which takes us to the world of the Resource Description Framework (RDF), Linked Data and Controlled Vocabularies.
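
A minimal sketch of record metadata in which some property values are links: each statement is a (subject, property, value) triple, the basic shape RDF uses. The property names come from the Dublin Core terms vocabulary; every `example.org` address and the `Link` helper class are assumptions made up for this illustration, and no RDF library is involved.

```python
class Link(str):
    """Marks a property value that is itself a URI (hypothetical helper)."""


DCT = "http://purl.org/dc/terms/"              # Dublin Core terms namespace
record = Link("https://example.org/records/4711")   # invented address

triples = [
    # A simple value: just text.
    (record, Link(DCT + "title"), "Board minutes, November 2013"),
    # Values that are links to other resources, possibly in other systems.
    (record, Link(DCT + "creator"), Link("https://example.org/people/jmeier")),
    (record, Link(DCT + "subject"), Link("https://example.org/vocabulary/governance")),
]

# Collect every value that can be followed to another resource.
outgoing_links = [value for (_, _, value) in triples if isinstance(value, Link)]
```

Here the subject value points into a controlled vocabulary, so two systems that use the same vocabulary URI mean the same thing without sharing a database.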

Resource Description
Framework (RDF)/
Linked Data

How does that work in practice? A client (browser or machine) sends a request to our record resource. HTTP lets a client ask for a specific representation format (here PDF) via the Accept header. The resource will try to satisfy this, and if successful, it will answer with a PDF document and a status of 200 (OK).
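
The negotiation step can be sketched as follows. This is a deliberately simplified reading of the HTTP Accept header: it honors q-values but ignores wildcards such as `*/*`, and the function names, the placeholder bodies and the set of available formats are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Return (media_type, q) pairs, most preferred first."""
    preferences = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type, q = fields[0], 1.0         # q defaults to 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        if media_type:
            preferences.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(preferences, key=lambda p: p[1], reverse=True)


def negotiate(accept: str, available: Dict[str, bytes]) -> Tuple[int, str, bytes]:
    """Answer with the most preferred format the resource can produce."""
    for media_type, q in parse_accept(accept):
        if q > 0 and media_type in available:
            return 200, media_type, available[media_type]
    return 406, "text/plain", b"Not Acceptable"


# Representations this hypothetical record resource can compute.
available = {
    "application/pdf": b"%PDF-1.4 placeholder",
    "text/html": b"<html><body>Record 4711</body></html>",
}
status, content_type, body = negotiate("application/pdf, text/html;q=0.8", available)
```

If none of the requested formats is available, the sketch answers with status 406 (Not Acceptable) instead of 200.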


In fact, the resource may have done a complex interpretation of its state. It may have sent back the PDF document stored in the secure storage, perhaps enriched by some additional state originating from, for instance, resource description resources.

A Resource-Based
Enterprise Content System

In a purely Web-based records management system, communication from the different client groups takes place exclusively via standards and through REST practices. Machine-to-machine communication is best served by the CMIS standard, which defines, among other bindings, REST-based access including the definition of exchange formats, that is, representations.

Heavy-usage clients such as eDiscovery engines interact with the same resources, which are exposed to the outside as a standards-based system of resources with their addresses. Consequently, everybody sees one uniform address space.

Where the state is kept is absolutely transparent to the client. It only receives a representation of it anyway.

As opposed to islands, best-of-breed tools will manage the resource space, based on standards such as CMIS and the Java Content Repository (JCR) API. The resource space does not have limits: it can span many enterprise entities and geographies; where special restrictions apply, the information model can still be the same, but access to resources is limited to, say, a single jurisdiction.
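
A minimal sketch of that last point, assuming a hypothetical `jurisdiction` control setting on each resource: the address scheme is identical everywhere, and only the access decision differs. Names, addresses and the rule itself are invented for illustration and do not come from CMIS or JCR.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Resource:
    uri: str
    jurisdiction: str      # hypothetical control setting, e.g. "CH" or "US"


def authorize(resource: Resource, caller_jurisdictions: Set[str]) -> int:
    """Return an HTTP status: 200 if access is allowed, 403 (Forbidden) if not."""
    return 200 if resource.jurisdiction in caller_jurisdictions else 403


# Same address scheme for both resources; only the control setting differs.
payroll_ch = Resource("https://example.org/records/payroll/2013", "CH")
contract_us = Resource("https://example.org/records/contracts/88", "US")

swiss_auditor = {"CH"}
results = [authorize(r, swiss_auditor) for r in (payroll_ch, contract_us)]
```

The restricted resource still has its place in the global address space; it simply answers differently depending on who asks.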

Some conclusions...

  • The Web Information Model does something simple: it introduces an addressing scheme that transcends machine and system boundaries.
  • Here it is used to expose information resources to the outside
  • The IT trends discussed here (cloud, Big Data, apps) all build on the Web's information model
  • Even if your organization is huge, it is small compared to the WWW; but still complex enough for a truly distributed resource model that will facilitate managed information exchange
  • The Web leads the way into 21st Century Information Management and Governance
Thank you for your attention.

The URI of this presentation:

Jürg Meier
