Andre Pavlovic is Senior VP of Product Planning at Questys MessageVision, provider of ECM, BPM, fax, and capture technologies. Andre has over 15 years of experience in the ECM industry in the areas of product planning, management, marketing, and development. He can be contacted at [email protected].
8 things you need to know about automating document indexing
1 -- Choose Your Battles
Just because you have purchased a great new scanning/capture/data entry automation application doesn't mean that it makes sense to automate every type of document under the sun. Sure, you may feel empowered to spend the time or money required to automate the indexing of that quarterly report that is generated only four times per year, but that would be like hunting quail with a bazooka. Look at the feasibility and return on investment before jumping into a project. Take on the automation projects with the highest and fastest ROI first, and pass on the projects with a low or negative net present value.
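One way to make that comparison concrete is to compute the net present value of each candidate project and rank them. The sketch below is only an illustration: the project names, cash flows, and the 8% discount rate are made-up assumptions, not figures from any real deployment.

```python
def npv(rate, cash_flows):
    """Net present value of yearly cash flows.

    cash_flows[0] is the upfront cost at time 0 (negative);
    later entries are the net savings in years 1, 2, 3, ...
    """
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


# Hypothetical projects: upfront automation cost, then yearly labor savings.
projects = {
    "vendor invoices": [-20000, 15000, 15000, 15000],
    "quarterly report": [-5000, 200, 200, 200],
}
rate = 0.08  # assumed discount rate

# Highest NPV first.
ranked = sorted(
    ((npv(rate, flows), name) for name, flows in projects.items()),
    reverse=True,
)

for value, name in ranked:
    verdict = "pursue" if value > 0 else "pass"
    print(f"{name}: NPV = {value:,.0f} -> {verdict}")
```

With these assumed numbers, the high-volume invoice project comes out ahead and the four-times-a-year report never pays back its setup cost.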
2 -- Choose the Most Accurate Recognition Technology
Obviously, your choice may be limited by the records that you are trying to automate. However, if you do have a choice, follow this simple rule. Barcode/Patch Code recognition is the most accurate, then OCR (machine printed text recognition), then constrained ICR (handwriting recognition), and lastly unconstrained ICR.
3 -- Test For Recognition Accuracy Early
Even when using barcode recognition or OCR, the accuracy of recognition will likely be less than 100% in the long run. Make certain that you test the accuracy of the recognition component of your capture/automation design early on in the process with a relatively large sample. This will ensure that there are no surprises down the road.
Additionally, if you are in the evaluation stage of selecting an application, make sure that the supplier of the product performs the demonstration with a large sample of your documents. Avoid demonstrations that use standard documents prepared by the vendor. Why? You want to ensure that the automated indexing procedure they have developed works on your documents with a high degree of accuracy, not only on the documents they have prepared for the demo. A good trick to throw at vendors is to provide 10 samples of the document type to be automated for the demonstration. Then, at the time of the demo, give them 100 more documents (of the same type) that they have never seen before. This will give a much truer picture of the accuracy of the application and the automation process.
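To see why the sample size matters, you can score the recognized index values against hand-keyed values and put a rough confidence range around the result. The sketch below uses a Wilson score interval, which is a standard statistical formula and my own choice here, not something a capture product necessarily reports. The function names and the sample counts are hypothetical.

```python
import math


def count_correct(expected, recognized):
    """Count index values that exactly match the hand-keyed truth."""
    if len(expected) != len(recognized):
        raise ValueError("expected and recognized must be the same length")
    return sum(e == r for e, r in zip(expected, recognized))


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for the true accuracy."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Hypothetical results: a perfect 10-document demo vs. 97 of 100 unseen documents.
for correct, total in [(10, 10), (97, 100)]:
    low, high = wilson_interval(correct, total)
    print(f"{correct}/{total} correct: plausible accuracy {low:.1%} to {high:.1%}")
```

A perfect score on 10 documents is compatible with a much wider range of true accuracy than a near-perfect score on 100, which is the reason to insist on the larger, unseen sample.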
4 -- Key on Documents You Control
In many capture applications, the logic used to automate indexing and separate an individual document (set of pages) from a batch is to key off of some identification page. In most cases, it is easier to achieve full automation with a high level of accuracy if your identification page is one that you control.
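As a rough illustration of that separation logic, the sketch below starts a new document every time it sees an identification page. The page strings and the `is_id_page` test are hypothetical stand-ins; in a real capture application the test would be a barcode, patch code, or OCR zone match.

```python
def split_batch(pages, is_id_page):
    """Group a scanned batch into documents, starting a new one at each ID page."""
    documents = []
    for page in pages:
        if is_id_page(page):
            documents.append([page])      # identification page opens a new document
        elif documents:
            documents[-1].append(page)    # attach to the current document
        else:
            raise ValueError("batch does not start with an identification page")
    return documents


# Hypothetical batch: each check is followed by the invoices it paid.
batch = ["CHECK 501", "invoice A", "invoice B", "CHECK 502", "invoice C"]
packets = split_batch(batch, lambda page: page.startswith("CHECK"))
print(packets)
```

Raising an error on a batch that does not begin with an identification page is a design choice for this sketch; a production process might instead route such pages to a manual review queue.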
Assume that you need to scan and index all of your vendor bills into your DM system. Automating the indexing for these documents can be difficult since you have no control and there are many different formats of vendor invoices. For example, 1000 different vendors could mean 1000 or more different invoice formats. Creating an automated indexing process would be very time consuming in this case. Furthermore, your vendors could change formats of their invoice on you without any notice. This can result in the constant reworking of your data entry automation scheme. Additionally, automating vendor invoices is a process that typically requires human quality control which will increase your overall costs.
As an alternative, explore automating the input using records you control as the identification page. Using our vendor invoices example, you can use the checks you cut to pay the invoices as the identification page. Your bank checks, in conjunction with your accounting system's database, can typically provide an automation process that is nearly 100% accurate and fully automated. The key here is a change in the process. Rather than having each individual vendor invoice in your DM system as the process output, you would have a check packet in your system as the output. The check packet would consist of the check (or check stub) followed by all of the invoices the check paid for. If you ever need to retrieve a specific invoice, you can search your accounting system for the check number that paid it and then pull up the check packet in your DM system.
Sure, this process does add an extra step to retrieval, but it cuts down dramatically on the input process costs and would provide a greater ROI due to reduced input costs related to quality assurance and the like.
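The two-step retrieval can be sketched as a pair of lookups. Everything here is hypothetical: the dictionaries stand in for your accounting system and your DM system, and the invoice numbers, check numbers, and file paths are invented for illustration.

```python
# Stand-in for the accounting system: which check paid which invoice.
invoice_to_check = {
    "INV-1001": "CHK-501",
    "INV-1002": "CHK-501",
    "INV-2001": "CHK-502",
}

# Stand-in for the DM system: check packets indexed by check number.
check_packets = {
    "CHK-501": "packets/CHK-501.pdf",
    "CHK-502": "packets/CHK-502.pdf",
}


def find_packet(invoice_number):
    """Step 1: look up the check number. Step 2: pull the check packet."""
    check_number = invoice_to_check.get(invoice_number)
    if check_number is None:
        return None  # invoice not yet paid, or not in the accounting system
    return check_packets.get(check_number)


print(find_packet("INV-1002"))
```

Only the check number has to be captured at scan time; the invoice-level detail already lives in the accounting system.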
5 -- Quality Assurance
Any index automation process is prone to some level of error. Therefore, it is best practice to establish some level of quality assurance procedure, even if it is a very brief procedure. Even though today's scanning devices have features to detect multi-feeds and auto-threshold scanned images, you will want to verify image quality even if on a random basis.
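A random spot check can be as simple as pulling a fixed share of each batch for a person to review. The sketch below is one way to do it; the 5% rate, the function name, and the document IDs are assumptions for illustration only.

```python
import random


def pick_for_review(document_ids, rate=0.05, seed=None):
    """Pick a random share of a batch (at least one document) for manual QA."""
    ids = list(document_ids)
    if not 0 < rate <= 1:
        raise ValueError("rate must be between 0 (exclusive) and 1 (inclusive)")
    if not ids:
        return []
    count = min(len(ids), max(1, round(len(ids) * rate)))
    rng = random.Random(seed)  # pass a seed to make the selection repeatable
    return sorted(rng.sample(ids, count))


# Hypothetical batch of 200 scanned documents.
batch = [f"DOC-{n:04d}" for n in range(200)]
for doc_id in pick_for_review(batch, rate=0.05):
    print("review:", doc_id)
```

Recording the seed alongside the batch lets you reproduce which documents were checked if a question comes up later.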
6 -- Pre and Post-Verification
It is important to ensure that you track what was intended to be processed and what was actually processed. At a minimum, simple page counts and record (individual documents in the batch) counts should be employed and verified with the output. Even the most thorough index automation process can come across an unexpected file that will throw the process off.
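A minimal version of that check compares the counts recorded before scanning with the counts the process actually produced. The field names and numbers below are hypothetical; the point is only that any mismatch gets flagged before the batch is released.

```python
def verify_batch(expected, actual):
    """Compare pre-scan counts with post-process counts; return any discrepancies."""
    problems = []
    for key in ("documents", "pages"):
        if expected[key] != actual[key]:
            problems.append(f"{key}: expected {expected[key]}, got {actual[key]}")
    return problems


# Hypothetical batch: counts from the batch cover sheet vs. the indexed output.
expected = {"documents": 40, "pages": 212}
actual = {"documents": 39, "pages": 212}

issues = verify_batch(expected, actual)
if issues:
    print("Hold batch for review:")
    for issue in issues:
        print(" -", issue)
else:
    print("Batch verified.")
```

In this made-up batch the page count matches but one document is missing, which is the pattern you would expect when a separator page was not recognized and two documents were merged.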
7 -- Documentation
There are thousands of technical writers working for thousands of software companies. Some are definitely better than others. However, regardless of how good these technical writers are, boilerplate documentation is never the best fit for a specific process. Take the time to document the process (with screenshots and videos if possible) for your scanning/indexing staff. The time spent documenting the process will pay off tenfold down the road.
8 -- Outsource
Last, and certainly not least, outsource any manual indexing processes that make sense to outsource. All too often, firms spend time and money staffing people to perform tasks that are not part of the firm's distinctive competence. Outsourcing makes sense in many situations, even for small companies and small projects. Keep in mind that you can lower costs through 'hybrid' outsourcing, where only part of the process is outsourced. For example, local service bureaus can charge an arm and a leg to perform scanning and indexing services, while using a foreign firm for a small project costs too much because the records have to be shipped. For processes where the indexing can't be automated, try scanning the files yourself and outsourcing the indexing to any one of the thousands of firms in India with a simple upload of the scanned files.
You are absolutely right about testing early for recognition accuracy. You don't want inaccurate info and you don't want to have to do the indexing repeated times if you don't have to.
Posted by: Riverside Machine | May 21, 2010 at 10:40 AM
Great write up! I completely agree that Barcode/Patch Code recognition is the most accurate, then OCR (machine printed text recognition), then constrained ICR (handwriting recognition), and lastly unconstrained ICR.
Posted by: carpoint | November 17, 2010 at 08:15 AM
I totally agree with your point of outsourcing benefiting even the smaller companies or projects.
Posted by: betonvloer egaliseren | November 18, 2010 at 12:18 AM
I commend #4 and #5 respectively. Having a proper and well-arranged filing accounting system sorted out can make it easy for QA to take the necessary actions. After many times of inspecting and inventory, then and only then can it be officially documented.
Posted by: Darcy Grubaugh | February 22, 2012 at 10:42 AM