Monday, November 5, 2007

Eliminating paperwork by combining digital imaging tools, content management systems and SOA

It was once said that advances in ICT such as e-mail would reduce the cost of handling documents and be beneficial for the environment. It would result in less printed paper, right? Everything would be done digitally, right? Well, it doesn’t seem to work that way. Even though the technology exists to store most information digitally, lots of organizations still have an archive filled with all kinds of printed documents (invoices, orders, etc.). This is costly (you have to build, rent and/or buy archive space) and not really ‘CO2-neutral’, especially when the documents were delivered electronically in the first place.

One of the related problems is that the process of electronically storing documents has to comply with a growing number of standards and laws. ‘Just’ storing documents manually on a file system won’t do.
In one customer case, the following technologies are being (or will be) combined to achieve certified electronic storage of documents:

  • Scanners with OCR (Optical Character Recognition) software
  • Process orchestration using services
  • CMS (Content Management System)


How are these technologies combined?
Once documents arrive, they are scanned. The OCR software outputs an image of the document and extracts metadata into an XML file. An automated process picks up the XML file, transforms its contents to a canonical data model and invokes a custom SOAP service, passing along the file location of the associated image. The invoked service then uses a Web Service API to store the image in a CMS database and uses the XML data to create and store metadata/attributes for the document.
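
To make the flow above a bit more concrete, here is a minimal sketch in Python of the middle two steps: mapping the OCR metadata onto a canonical model and building the SOAP request that carries the image location. It is an illustration only, not the customer's implementation and not an Oracle API. The OCR field names, the FIELD_MAP mapping, the urn:example:document-storage namespace and the StoreDocument operation are all hypothetical; a real OCR product and a real service contract will look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from OCR field names to canonical attribute names.
FIELD_MAP = {
    "DocType": "documentType",
    "DocNo": "documentNumber",
    "DocDate": "documentDate",
}

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SVC_NS = "urn:example:document-storage"  # hypothetical service namespace


def to_canonical(ocr_xml: str) -> dict:
    """Map the OCR fields we know about onto canonical attribute names."""
    root = ET.fromstring(ocr_xml)
    canonical = {}
    for field in root.iter("Field"):
        name = field.get("name")
        if name in FIELD_MAP:  # unknown OCR fields are ignored
            canonical[FIELD_MAP[name]] = (field.text or "").strip()
    return canonical


def build_store_request(canonical: dict, image_path: str) -> str:
    """Build a SOAP envelope carrying the image location and the metadata."""
    ET.register_namespace("soapenv", SOAP_NS)
    ET.register_namespace("doc", SVC_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SVC_NS}}}StoreDocument")
    ET.SubElement(request, f"{{{SVC_NS}}}imageLocation").text = image_path
    attributes = ET.SubElement(request, f"{{{SVC_NS}}}attributes")
    for key, value in sorted(canonical.items()):
        attribute = ET.SubElement(attributes, f"{{{SVC_NS}}}attribute", name=key)
        attribute.text = value
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Made-up OCR output, purely for illustration.
    ocr_output = (
        "<Document>"
        '<Field name="DocType">invoice</Field>'
        '<Field name="DocNo">2007-0042</Field>'
        "</Document>"
    )
    print(build_store_request(to_canonical(ocr_output), "/scans/0042.tif"))
```

In a real setup the resulting envelope would be posted to the custom service by the orchestration layer, which is also the natural place to log each step for auditing.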

Why are these technologies complementary?
A CMS provides reliable and robust storage of electronic documents and offers multiple interfaces (both programmatic and UI-based). Process orchestration and SOA add traceable, auditable and robust processes that govern how electronic documents are stored and retrieved. Together, this enables official certification, so that the stored electronic documents also count as ‘legal’ documents.

In this particular case Oracle products are used: the SOA Suite and Content Services (part of Oracle Collaboration Suite). But the same principle can of course be implemented using other technologies available on the market.