A related problem is that electronic document storage has to comply with a growing number of standards and laws. ‘Just’ storing documents manually on a file system won’t do.
In one customer case, the following technologies are being (or will be) combined to accomplish certified electronic storage of documents:
- Scanners with OCR (Optical Character Recognition) software
- Process orchestration using services
- CMS (Content Management System)
How are these technologies combined?
Once documents arrive, they are scanned. The OCR software outputs an image of the document and extracts its metadata into an XML file. An automated process picks up the XML file, transforms its contents to a canonical data model and invokes a custom SOAP service, passing along the file location of the associated image. The invoked service then uses a Web Service API to store the image in a CMS database and uses the XML data to create and store metadata/attributes for the document.
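To make the flow above more concrete, here is a minimal sketch in Python. It is not the customer's implementation: the XML layout, the `CanonicalDocument` fields, and the `InMemoryCms` class with its `store_image` and `set_attributes` methods are all hypothetical stand-ins, and the SOAP call and the real CMS Web Service API are replaced by plain function calls.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical OCR output; the real XML layout depends on the OCR product.
SAMPLE_OCR_XML = """<document>
  <field name="id">INV-0001</field>
  <field name="type">invoice</field>
  <field name="sender">Example Corp</field>
</document>"""


@dataclass
class CanonicalDocument:
    """Assumed canonical data model for one scanned document."""
    document_id: str
    document_type: str
    image_location: str
    attributes: dict = field(default_factory=dict)


def to_canonical(ocr_xml: str, image_location: str) -> CanonicalDocument:
    """Transform the OCR metadata XML into the canonical model."""
    root = ET.fromstring(ocr_xml)
    fields = {f.get("name"): (f.text or "").strip() for f in root.findall("field")}
    return CanonicalDocument(
        document_id=fields.pop("id"),
        document_type=fields.pop("type"),
        image_location=image_location,
        attributes=fields,  # whatever remains is stored as extra metadata
    )


class InMemoryCms:
    """Stand-in for the CMS; a real client would call its Web Service API."""

    def __init__(self):
        self.images = {}
        self.metadata = {}

    def store_image(self, location: str) -> str:
        content_id = f"cms-{len(self.images) + 1}"
        self.images[content_id] = location
        return content_id

    def set_attributes(self, content_id: str, attributes: dict) -> None:
        self.metadata[content_id] = dict(attributes)


def store_document(doc: CanonicalDocument, cms: InMemoryCms) -> str:
    """Stand-in for the custom service: store the image, then its metadata."""
    content_id = cms.store_image(doc.image_location)
    cms.set_attributes(
        content_id,
        {"documentId": doc.document_id, "documentType": doc.document_type, **doc.attributes},
    )
    return content_id
```

A caller would run `to_canonical(SAMPLE_OCR_XML, "/scans/INV-0001.tif")` and hand the result to `store_document` together with a CMS client. Keeping the transformation separate from the storage call mirrors the split between the orchestrated process and the invoked service.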
Why are these technologies complementary?
A CMS provides reliable and robust storage of electronic documents and offers multiple interfaces (both programmatic and UI-based), while process orchestration and SOA implement traceable, auditable and robust processes governing the retrieval and storage of those documents. This in turn enables official certification, under which the stored documents also count as ‘legal’ documents.
In this particular case Oracle products are used: the SOA Suite and Content Services (part of Oracle Collaboration Suite). But the same principle can of course be implemented using other technologies available on the market.