"But I'm scanning to PDF now…" Advanced Search & Capture

This final post in our series on “Reasons why scanning to PDF isn’t enough today” covers search and capture.

#6 – Advanced Searching

With a PDF, you are generally limited to searching file names at the folder level and, once inside a document, just the text it contains. So we could search the network HR folder for a particular candidate (John Doe) and see all of his documents. Once inside his file, we can search for specific text that the PDF iFilter has made searchable, much as OCR (Optical Character Recognition) does for imaged documents. The iFilter extracts the PDF’s text layer so it can be indexed and searched, letting us locate his SSN or perhaps a W-4.

This method is sufficient when we can locate a document by its name or the text within it. But what if we only wanted to see a specific document type (say, all the W-4s for all employees)? Or what about a PDF that is just an image with no text to extract, yet is still relevant to the file, such as a map, a picture, or a handwritten note relating to John Doe?

With a traditional document or content management system, you have a plethora of search and advanced search options. Not only can you rely on OCR/iFilter text searches across images of machine-printed text and your electronic documents, but you can take things one step further.

Each document can have additional metadata assigned to it, with customizable fields that let you be very specific about what the document is and what information can be used to retrieve it. For example, an HR template might have fields for Document Type (Employee Application, W-4, I-9, Performance Evaluation, Insurance, etc.), along with Last Name, First Name, SSN, Date Hired, Department, and perhaps Employee ID #, to keep it simple. With this step alone, we can now search across these field values, which opens up a lot of possibilities.

We could search across all our employees and see only the Insurance records, or only the employees hired from 1/1/2012 to 6/30/2012. We could also jump straight to one employee by combining fields, such as the SSN and the Employee Application document type, which takes us right to John Doe’s employee application in a matter of seconds. We could even combine a full-text search with the metadata search: searching for his address “123 Main Street” along with the Insurance document type and his Employee ID # takes us right to the specific page of his insurance document that lists his address, which may need updating. That page can then be emailed on the fly to the employee to confirm the information is accurate, again in a matter of seconds.
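To make the idea concrete, here is a minimal Python sketch of a metadata template plus a combined field and full-text search. It is not how any particular product works: the `HRDocument` class, the `search` function, the field names, and the sample records are all hypothetical, and each page’s text stands in for what OCR or an iFilter would have extracted.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class HRDocument:
    """A stored document: hypothetical HR template fields plus the page text
    that OCR or an iFilter would have extracted."""
    doc_type: str          # e.g. "Employee Application", "W-4", "Insurance"
    last_name: str
    first_name: str
    ssn: str
    employee_id: str
    date_hired: date
    department: str
    pages: list = field(default_factory=list)  # extracted text, one string per page


def search(docs, text: Optional[str] = None, hired_from: Optional[date] = None,
           hired_to: Optional[date] = None, **fields):
    """Return (document, page_number) pairs matching every given criterion.

    Keyword `fields` are exact matches on template fields, the hire-date bounds
    are inclusive, and `text` is a case-insensitive substring search over each
    page. page_number is None when no text criterion is given.
    """
    hits = []
    for doc in docs:
        if any(getattr(doc, name) != value for name, value in fields.items()):
            continue
        if hired_from is not None and doc.date_hired < hired_from:
            continue
        if hired_to is not None and doc.date_hired > hired_to:
            continue
        if text is None:
            hits.append((doc, None))
            continue
        for page_no, page_text in enumerate(doc.pages, start=1):
            if text.lower() in page_text.lower():
                hits.append((doc, page_no))
    return hits


# Hypothetical sample records (not real data)
docs = [
    HRDocument("Employee Application", "Doe", "John", "123-45-6789", "353123",
               date(2012, 3, 15), "Sales", ["Application for employment"]),
    HRDocument("Insurance", "Doe", "John", "123-45-6789", "353123",
               date(2012, 3, 15), "Sales",
               ["Plan summary", "Home address: 123 Main Street"]),
    HRDocument("W-4", "Roe", "Jane", "987-65-4321", "353124",
               date(2011, 8, 1), "HR", ["Form W-4"]),
]

insurance_records = search(docs, doc_type="Insurance")
hired_first_half_2012 = search(docs, hired_from=date(2012, 1, 1),
                               hired_to=date(2012, 6, 30))
application = search(docs, ssn="123-45-6789", doc_type="Employee Application")
# Full text + metadata: one hit, John Doe's Insurance document, page 2
address_page = search(docs, text="123 Main Street", doc_type="Insurance",
                      employee_id="353123")
```

The point of the sketch is the last query: the metadata narrows the candidates to one document, and the full-text match then pinpoints the page.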

There are many more options than I’ve covered in this post, but the main idea is that a content management solution provides a much more advanced way of searching your information, getting you to what you’re looking for much faster than relying on PDFs alone.

#7 – Automated Capture

The final reason in our series relates to how documents are captured. When it comes to scanning paper documents (to PDF), most organizations use either the software that came with their scanner OR a preferred low-cost alternative. In many cases, their scanning is more ad hoc than consistent or structured, and it leaves a lot to be desired in terms of functionality, speed, and the time required after the scanned item reaches the computer. More specifically, every page in a batch is treated the same way, and afterwards details such as where the file was saved and what it is named may have to be adjusted to the organization’s liking. Overall, a lot of time is spent before, during, and after the paper documents are scanned.

Most content management solutions today have some sort of automated capture tool, either built in or as an optional add-on. Laserfiche has Quick Fields, Westbrook Fortis has built-in tools, and Kofax is another well-known, very powerful capture solution.

These solutions let you set up rules or logic based on a specific document type or process, saving time on both the front and back end. Once set up, documents are simply dropped into the scanner and the technology takes over. In many cases, the document can be recognized (an invoice vs. a purchase order), and based on that alone, different rules are applied: where it ends up, how it’s named, and what data is extracted or examined to build out its metadata.
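Here is a rough Python sketch of the recognize-then-apply-rules pattern. The keyword-based `classify` function is a deliberately simple stand-in for the recognition real capture products perform, and the `CaptureRule` class, folder names, naming patterns, and regular expressions are hypothetical examples, not any vendor’s configuration format.

```python
import re
from dataclasses import dataclass


@dataclass
class CaptureRule:
    """Hypothetical per-document-type rule: where the file goes, how it is
    named, and which values are extracted to build its metadata."""
    folder: str
    name_pattern: str   # str.format template filled from the extracted fields
    extractors: dict    # field name -> regex with exactly one capture group


RULES = {
    "Invoice": CaptureRule(
        folder="AP/Invoices",
        name_pattern="INV-{invoice_number}",
        extractors={"invoice_number": r"Invoice\s*#\s*(\w+)"},
    ),
    "Purchase Order": CaptureRule(
        folder="Purchasing/POs",
        name_pattern="PO-{po_number}",
        extractors={"po_number": r"P\.?O\.?\s*#\s*(\w+)"},
    ),
}


def classify(text: str) -> str:
    """Toy recognition step: keyword matching stands in for the far more
    capable classification that commercial capture tools perform."""
    lowered = text.lower()
    if "purchase order" in lowered:
        return "Purchase Order"
    if "invoice" in lowered:
        return "Invoice"
    return "Unclassified"


def capture(text: str) -> dict:
    """Recognize a scanned document's type, then apply that type's rule."""
    doc_type = classify(text)
    rule = RULES.get(doc_type)
    if rule is None:
        # No rule for this type: route to a review folder for a person to index.
        return {"doc_type": doc_type, "folder": "Review", "name": None, "fields": {}}
    fields = {}
    for field_name, pattern in rule.extractors.items():
        match = re.search(pattern, text, re.IGNORECASE)
        fields[field_name] = match.group(1) if match else None
    # Only build a name when every field needed by the pattern was found.
    name = rule.name_pattern.format(**fields) if all(fields.values()) else None
    return {"doc_type": doc_type, "folder": rule.folder, "name": name, "fields": fields}


invoice = capture("ACME Corp\nInvoice # 10042\nAmount due: $500.00")
po = capture("Purchase Order\nP.O. # 7781\nShip to: Main Office")
```

Keeping the rules in a table keyed by document type means adding a new type is a configuration change rather than new code, which mirrors the “set up once, then drop documents in” idea.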

In the case of HR, you could set up a process around blank page separation, whereby you scan entire packets of employee files with each employee separated by a blank page. When the software recognizes a blank page, it starts a new document and can then read the first page of the new packet, which may be a cover sheet with employee information such as last name, SSN, or employee ID. This information is used to build the document’s name, its folder in the system, or other items, all “auto-magically”.
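A minimal sketch of blank-page separation in Python might look like this. It assumes each scanned page has already been converted to text and treats a page with no text as blank (real capture tools usually judge this from the image itself); the “Label: value” cover-sheet layout, the folder and naming scheme, and the function names are all hypothetical.

```python
def is_blank(page_text: str) -> bool:
    """Assumption: a page whose OCR text is empty or whitespace is blank.
    Real capture tools typically judge blankness from the image itself."""
    return not page_text.strip()


def split_batch(pages: list[str]) -> list[list[str]]:
    """Split one scanned batch into documents wherever a blank page appears.
    Blank separator pages are dropped and empty groups are skipped."""
    documents, current = [], []
    for page in pages:
        if is_blank(page):
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


def read_cover_sheet(cover: str) -> dict[str, str]:
    """Parse hypothetical 'Label: value' lines from a packet's first page."""
    fields = {}
    for line in cover.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def file_packets(pages: list[str]) -> list[dict]:
    """Name and place each packet using values read from its cover sheet."""
    filed = []
    for doc in split_batch(pages):
        info = read_cover_sheet(doc[0])
        last = info.get("Last Name", "Unknown")
        filed.append({
            "folder": f"HR/{last}_{info.get('Employee ID', 'NoID')}",
            "name": f"{last}, {info.get('First Name', '')} - Personnel File",
            "pages": doc,
        })
    return filed


batch = [
    "Last Name: Doe\nFirst Name: John\nEmployee ID: 353123",
    "Employee application",
    "Form W-4",
    "",  # blank separator page
    "Last Name: Roe\nFirst Name: Jane\nEmployee ID: 353124",
    "Form I-9",
]
packets = file_packets(batch)
```

The sketch deliberately leaves the SSN out of folder and file names, since names tend to be far more visible than metadata fields.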

You can even read barcodes. A barcode may hold a lot of information, or just a simple value that, once read, is used to retrieve information from a database. For example, a barcode might contain an employee ID (say 353123), which is then looked up in your HRIS system to populate that document’s template fields with the matching HRIS information, such as last name, first name, DOB, and address. None of this information has to be typed in by your staff, so again, time is saved.
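The lookup half of the barcode scenario can be sketched in Python with an in-memory SQLite table standing in for the HRIS. This assumes the barcode has already been decoded to its text value by the capture software; the table, column names, function names, and sample employee are hypothetical, and a real HRIS would be reached through its own database or API.

```python
import sqlite3

# Hypothetical HRIS columns copied into the document's template fields
HRIS_COLUMNS = ("employee_id", "last_name", "first_name", "dob", "address")


def make_demo_hris() -> sqlite3.Connection:
    """Stand-in for an HRIS: an in-memory SQLite table with one sample
    (made-up) employee record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id TEXT PRIMARY KEY, last_name TEXT,"
        " first_name TEXT, dob TEXT, address TEXT)"
    )
    conn.execute(
        "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
        ("353123", "Doe", "John", "1980-01-01", "123 Main Street"),
    )
    return conn


def populate_from_barcode(barcode_value: str, conn: sqlite3.Connection) -> dict:
    """Fill a document's template fields from the HRIS row whose employee ID
    matches the barcode. Assumes the barcode was already decoded to text."""
    row = conn.execute(
        f"SELECT {', '.join(HRIS_COLUMNS)} FROM employees WHERE employee_id = ?",
        (barcode_value.strip(),),
    ).fetchone()
    if row is None:
        # Unknown ID: flag the document for manual indexing instead of guessing.
        return {"employee_id": barcode_value, "needs_review": True}
    return {**dict(zip(HRIS_COLUMNS, row)), "needs_review": False}


hris = make_demo_hris()
template_fields = populate_from_barcode("353123", hris)
```

The barcode value is passed as a query parameter rather than pasted into the SQL string, so an oddly formatted scan cannot alter the query.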

Some solutions can even “learn” in real time. A new invoice from a vendor may require you to set up a process the first time, but thereafter the technology will recognize that specific document and apply the set of rules you defined. If there are only 1–100 documents like this a month, it might not be a big deal, but at 10,000–100,000 a month you can see how powerful this becomes. And in today’s era of “big data,” tools that save time and remove steps from your processes are definitely the way to go.

“We’re still processing your application…” Real-life consequences

We saw this story last night, and it goes to show just how “backed up” organizations are, in terms of both staff and the tools and resources needed to control their information. It also shows the huge personal impact that a lack of technology (in some cases) is having on the lives of many, many people.

While the video goes into more detail, the paper file backlog at the Veterans Affairs office has left 565,000 veterans waiting for their benefits. Many have conditions ranging from shrapnel wounds to PTSD and have filed claims for disability pay.

The story mentions one veteran who, after 7 months of waiting, had received only a letter stating, “We’re still processing your application for compensation.”

The VA has 4.4 million active records (all paper) across its 56 regional offices, most of them hundreds of pages long. All of them are currently processed by hand, although the VA intends to switch over to an electronic system by the end of 2015. What I’m wondering is: why does this have to take 3 years?

Who knows how many more people will be affected by this and have to wait for their benefits too? It’s clear, though, that the right technology could have solved this problem many years ago.