“But I’m scanning to PDF now…” Advanced Search & Capture


The final post in our series, “Reasons why scanning to PDF isn’t enough today,” covers search and capture.

#6 – Advanced Searching

With a PDF, you are generally limited to searching by file name at the folder level and, once inside the document, by the text itself. So we could search the network HR folder for a particular candidate (John Doe) and see all of his documents. Once inside his file, we can search for specific text that has been made searchable through the PDF iFilter, which exposes a PDF’s existing text layer to the search index (OCR, or Optical Character Recognition, plays the comparable role for imaged documents). That lets us locate his SSN or perhaps a W-4.

This method is sufficient when we can locate the document by its name or the text within it. But what if we wanted to see only a specific document type (say, all the W-4s for all employees)? Or what if a PDF relevant to the file was an image with no text to extract, such as a map, a picture, or a handwritten note that belongs in John Doe’s file?

With a traditional document or content management system, you have a wealth of options for search and advanced search. Not only can you rely on OCR/iFilter text searches for images full of machine-printed text and for your electronic documents, but you can also take that one step further.

Each document can have additional metadata assigned to it, with customizable fields that let you get very specific about what the document is and what information can be used to retrieve it. So you could have an HR template with fields for Document Type (Employee Application, W-4, I-9, Performance Evaluation, Insurance, etc.), along with Last Name, First Name, SSN, Date Hired, Department, and maybe Employee ID #, to keep it simple. With this step alone, we can now search across these field values, which opens up a lot of possibilities.

We could search all our employees and see only the files related to Insurance Records, or see only employees hired from 1/1/2012 to 6/30/2012. We could jump to an employee very precisely by combining two fields, such as SSN and the Employee Application document type, which takes us right to John Doe’s employee application in a matter of seconds. We could even combine full-text search with this metadata: searching for his address, “123 Main Street,” along with the Insurance document type and his employee ID # takes us right to the specific page of his insurance record that lists his address, which may need to be updated. This page can then be emailed on the fly to our employee to confirm the information is accurate, again in a matter of seconds.
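To make the combined searches above concrete, here is a minimal sketch in Python. It is not how any particular content management product implements search. The `Document` class, the `search` function, the field names, and the sample records are all hypothetical, chosen only to mirror the HR template described in this post.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """A stored document: template fields plus any extracted (OCR) text."""
    fields: dict
    text: str = ""


def search(docs, text=None, hired_between=None, **field_filters):
    """Return the documents that satisfy every criterion supplied."""
    results = []
    for doc in docs:
        # Exact match on each requested template field (hypothetical names).
        if any(doc.fields.get(k) != v for k, v in field_filters.items()):
            continue
        # Optional inclusive date range on the assumed "date_hired" field.
        if hired_between is not None:
            start, end = hired_between
            hired = doc.fields.get("date_hired")
            if hired is None or not (start <= hired <= end):
                continue
        # Optional case-insensitive full-text match.
        if text is not None and text.lower() not in doc.text.lower():
            continue
        results.append(doc)
    return results


# Made-up sample records for illustration only.
docs = [
    Document({"doc_type": "Insurance", "last_name": "Doe",
              "employee_id": "353123", "date_hired": date(2012, 3, 15)},
             "Home address: 123 Main Street"),
    Document({"doc_type": "W-4", "last_name": "Doe",
              "employee_id": "353123", "date_hired": date(2012, 3, 15)},
             "Form W-4"),
    Document({"doc_type": "Insurance", "last_name": "Roe",
              "employee_id": "400001", "date_hired": date(2011, 9, 1)},
             "Home address: 9 Elm Avenue"),
]

# Only insurance records, across all employees.
insurance = search(docs, doc_type="Insurance")
# Only documents for employees hired in the first half of 2012.
hired_h1_2012 = search(docs, hired_between=(date(2012, 1, 1), date(2012, 6, 30)))
# Full text combined with metadata: one employee's insurance page with his address.
doe_address = search(docs, text="123 main street",
                     doc_type="Insurance", employee_id="353123")
```

The point of the sketch is that every criterion narrows the same result set, so metadata fields and full text can be mixed freely in a single query.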

There are obviously a lot more options I left out in this post, but the main idea is that a content management solution provides a much more advanced way of searching across your information, getting you to what you are looking for much faster than relying on PDFs alone.

#7 – Automated Capture

The final reason in our series relates to how documents are captured. When it comes to scanning paper documents (to PDF), most organizations use either the software that came with their scanner or a preferred low-cost alternative. In many cases, their scanning is more ad hoc than consistent or structured, and it leaves a lot to be desired in terms of functionality, speed, and the time required after the scanned item hits the computer. More specifically, every page in a batch is treated the same way, and afterwards things like the file’s location and name may have to be adjusted to the organization’s liking. Overall, a lot of time is spent before, during, and after the paper documents are scanned.

Most content management solutions today have some sort of automated capture tool, either built in or as an optional add-on. Laserfiche has Quick Fields, Westbrook Fortis has tools built in, and Kofax is another top-of-mind, very powerful solution when it comes to capture.

These solutions allow you to set up rules or logic based on a specific document type or process, saving time on the front and back end. Once set up, documents are simply dropped into the scanner and the technology takes over. In many cases, the document can be recognized (invoice vs. purchase order), and based on this alone, different rules are applied (where it ends up, how it’s named, and what data is extracted to build out the metadata).
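As a rough illustration of "recognize the document, then apply its rules," the sketch below classifies a page by a keyword and then uses that type's rule to pick a folder, build a name, and extract metadata. Real capture tools use far more robust recognition. The `RULES` table, folder paths, naming patterns, and regular expressions here are all assumptions made up for this example, not settings from Quick Fields, Fortis, or Kofax.

```python
import re

# Hypothetical rules: a keyword that identifies the type, a destination
# folder, a naming pattern, and regexes for the metadata to extract.
RULES = {
    "Invoice": {
        "keyword": "invoice",
        "folder": "Accounting/Invoices",
        "name": "Invoice {number}",
        "extract": {"number": r"Invoice\s*#\s*(\w+)"},
    },
    "Purchase Order": {
        "keyword": "purchase order",
        "folder": "Accounting/Purchase Orders",
        "name": "PO {number}",
        "extract": {"number": r"PO\s*#\s*(\w+)"},
    },
}


def classify(text):
    """Return the first document type whose keyword appears in the text."""
    lowered = text.lower()
    for doc_type, rule in RULES.items():
        if rule["keyword"] in lowered:
            return doc_type
    return None


def process(text):
    """Apply the matching rule: choose a folder, extract metadata, build a name."""
    doc_type = classify(text)
    if doc_type is None:
        # Unrecognized pages are set aside for a person to look at.
        return {"doc_type": None, "folder": "Unsorted",
                "name": "Unidentified", "metadata": {}}
    rule = RULES[doc_type]
    metadata = {}
    for field_name, pattern in rule["extract"].items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        metadata[field_name] = match.group(1) if match else ""
    return {"doc_type": doc_type, "folder": rule["folder"],
            "name": rule["name"].format(**metadata), "metadata": metadata}


invoice = process("ACME Corp INVOICE # 10042 Total due: $500")
order = process("Purchase Order PO# 7781 for office supplies")
unknown = process("Handwritten note")
```

Keeping the rules in a table rather than in the code means a new document type is added by describing it, which is the spirit of the rule setup described above.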

In the case of HR, you could have a process set up around blank page removal, whereby you scan entire packets of employees’ files, separating each employee’s packet with a blank page. When the software recognizes a blank page, it creates a new document and perhaps then looks to the first page of the new packet, which may be a cover sheet with employee information like last name, SSN, or employee ID. This information is used to build out the document’s name, its folder in the system, or maybe other items, all “auto-magically.”
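The blank-page workflow above can be sketched in a few lines of Python. This is a simplification under stated assumptions: each scanned page is represented by its OCR text, a page counts as blank when that text is empty (real tools inspect the image itself), and the cover sheet layout ("Last Name:" and "Employee ID:" labels), folder pattern, and function names are hypothetical.

```python
import re


def split_on_blank_pages(pages):
    """Group pages into documents, starting a new one at each blank page."""
    documents, current = [], []
    for page in pages:
        if page.strip() == "":
            # Blank separator: close the current document and drop the blank.
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


def read_cover_sheet(page):
    """Pull the assumed cover sheet fields out of the first page's text."""
    patterns = {"last_name": r"Last Name:\s*(\w+)",
                "employee_id": r"Employee ID:\s*(\d+)"}
    fields = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, page)
        if match:
            fields[key] = match.group(1)
    return fields


def file_packets(pages):
    """Split a scanned batch and derive each document's folder and name."""
    filed = []
    for doc in split_on_blank_pages(pages):
        fields = read_cover_sheet(doc[0])
        last = fields.get("last_name", "Unknown")
        emp_id = fields.get("employee_id", "Unknown")
        filed.append({"folder": f"HR/{last} ({emp_id})",
                      "name": f"{last} - Employee File",
                      "page_count": len(doc)})
    return filed


# Made-up batch: two employee packets separated by one blank page.
scanned = [
    "Last Name: Doe\nEmployee ID: 353123",
    "Employee Application",
    "",
    "Last Name: Roe\nEmployee ID: 400001",
    "W-4",
    "I-9",
]
filed = file_packets(scanned)
```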

You can even do things like read a barcode, which may hold a lot more information, or a simple value that is then used to retrieve information from a database. In this case, a barcode might contain an employee ID (say, 353123 when read), which is then looked up in your HRIS to populate that document’s template fields with information such as last name, first name, DOB, address, etc. None of this information needs to be typed in by your staff, so again time is saved.
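The barcode lookup described above boils down to "read a key, fetch the record, fill the fields." The sketch below assumes the barcode has already been decoded to a string and uses an in-memory dictionary as a stand-in for the HRIS. In practice this would be a database query or an API call. The record contents and the `populate_from_barcode` name are hypothetical.

```python
# Stand-in for the HRIS: in practice, a database or web service lookup.
# The record below is made-up sample data.
HRIS = {
    "353123": {"last_name": "Doe", "first_name": "John",
               "department": "Sales"},
}


def populate_from_barcode(barcode_value, hris=HRIS):
    """Use the decoded barcode value as an employee ID and fill template fields."""
    employee_id = barcode_value.strip()
    record = hris.get(employee_id)
    if record is None:
        # No match: keep the ID and flag the document for manual indexing.
        return {"employee_id": employee_id, "needs_review": True}
    return {"employee_id": employee_id, **record, "needs_review": False}


template_fields = populate_from_barcode("353123")
unmatched = populate_from_barcode("999999")
```

Flagging unmatched IDs for review, rather than guessing, keeps a misread barcode from silently filing a document under the wrong employee.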

Some solutions even have the ability to “learn” in real time: while a new invoice from a vendor may require a process to be established the first time, thereafter the technology will recognize that specific document and apply the set of rules you outlined earlier. If there are 1-100 documents like this a month, it might not be a big deal, but at 10,000-100,000 a month you can see how powerful this can be. And in this era of “big data,” tools that save time and remove steps from your processes are definitely the way to go.

Read other posts in this series
Reasons #1 and #2 – File Names Don’t Matter & Security
Reason #3 – Network & Web Access
Reasons #4 and #5 – Auditing & Records Management
Reasons #6 and #7 – Advanced Searching & Automated Capture

Fill out our “Request Information” form to learn more about how we can help your organization today. We work with organizations across the Central Valley of California with offices in Fresno, Sacramento, Santa Ana, and Bakersfield.
