Appleby Technotes
Tips from the technology department of Appleby & Company.

Archive for July, 2012

“But I’m scanning to PDF now…” Advanced Search & Capture

The final post in our series looking at “Reasons why scanning to PDF isn’t enough today” covers search and capture.

#6 – Advanced Searching

With a PDF, you are generally limited to searching by file name at the folder level and, once inside the document, by the text itself. So we could search the network HR folder for a particular candidate (John Doe) and see all of his documents. Once inside his file, we can search for specific text that has been made searchable through the PDF iFilter (which plays a role similar to OCR, or Optical Character Recognition, for imaged documents). The iFilter extracts the document’s layer of text so it can be indexed and searched, letting us locate his SSN or perhaps a W-4.

This method is sufficient when we can locate the document by its name or the text within. But what if we wanted to see only a specific document type (say, all the W-4s for all employees), or a PDF that was an image with no text to extract yet was relevant to the file? Say a map, a picture, or perhaps a handwritten note that had relevance to this John Doe file.

With a traditional document or content management system, you have a plethora of options when it comes to search and advanced search capabilities. Not only can you rely on the OCR/iFilter text searches for images full of machine-printed text and your electronic documents, but you can take that one step further.

Each document can have additional metadata assigned to it, with customizable fields or inputs that let you get very specific about what the document is and what information can be used to retrieve it. So you could have an HR template with fields for Document Type (Employee Application, W-4, I-9, Performance Evaluation, Insurance, etc.), along with Last Name, First Name, SSN, Date Hired, Department, and maybe Employee ID # to keep it simple. With this step alone, we can run searches across these field values, which opens up a lot of possibilities.

We could search all our employees and only see the files related to Insurance Records, or only see employees hired from 1/1/2012 to 6/30/2012. We could jump to an employee very precisely by combining fields, such as SSN plus the Employee Application document type, which takes us right to John Doe’s employee application in a matter of seconds. We could even combine full text with this metadata search: searching for his address “123 Main Street” along with the Insurance document type and his employee ID # takes us right to the specific page of his insurance record that lists his address, which may need to be updated. This page can then be emailed on the fly to our employee to confirm we have accurate information, again in a matter of seconds.
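
As a rough illustration of the combined metadata and full-text search described above, here is a minimal Python sketch. The `HRDocument` class, the `search` function, and the sample records are hypothetical and not taken from any particular product; a real system would query an index rather than loop over a list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class HRDocument:
    # Hypothetical template fields, modeled on the HR example above.
    doc_type: str
    last_name: str
    first_name: str
    employee_id: str
    date_hired: date
    full_text: str = ""  # text layer from OCR or text extraction, if any


def search(documents, doc_type=None, employee_id=None,
           hired_from=None, hired_to=None, text=None):
    """Return documents matching every criterion that was supplied."""
    results = []
    for doc in documents:
        if doc_type is not None and doc.doc_type != doc_type:
            continue
        if employee_id is not None and doc.employee_id != employee_id:
            continue
        if hired_from is not None and doc.date_hired < hired_from:
            continue
        if hired_to is not None and doc.date_hired > hired_to:
            continue
        # Case-insensitive substring match stands in for a full-text index.
        if text is not None and text.lower() not in doc.full_text.lower():
            continue
        results.append(doc)
    return results


# Made-up sample data for illustration only.
documents = [
    HRDocument("Insurance", "Doe", "John", "353123", date(2012, 3, 1),
               "Address: 123 Main Street"),
    HRDocument("W-4", "Doe", "John", "353123", date(2012, 3, 1)),
    HRDocument("Insurance", "Smith", "Jane", "353124", date(2011, 9, 15),
               "Address: 9 Oak Avenue"),
]

hits = search(documents, doc_type="Insurance", employee_id="353123",
              text="123 Main Street")
```

Here `hits` holds only John Doe's insurance document. Because each criterion is optional, the same function also covers the "all Insurance Records" and "hired between two dates" searches mentioned above.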

There are obviously a lot more options I left out in this post, but the main idea is that a content management solution provides a much more advanced way of searching across your information, getting you to what you are looking for much faster than relying on PDFs alone.

#7 – Automated Capture

The final reason in our series is related to how documents are captured. When it comes to scanning paper documents (to PDF), most organizations either use software that came with their scanner OR a preferred low-cost alternative. In many cases, their scanning is more ad hoc than consistent or structured, and it leaves a lot to be desired in terms of functionality, speed, and time required after the scanned item hits the computer. More specifically, each page scanned is treated the same way (in its batch), and afterwards things like the location it was placed in and the name itself may have to be tweaked to an organization’s liking. Overall, a lot of time is spent before, during, and after the paper documents are scanned.

Most content management solutions today have some sort of automated capture tool, either built in or as an optional add-on. Laserfiche has Quick Fields, Westbrook Fortis has tools built in, and Kofax is another top-of-mind, very powerful solution when it comes to capture.

These solutions allow you to set up rules or logic based on a specific document type or process, saving time on the front and back end. Once set up, documents are simply dropped into the scanner and the technology takes over. In many cases, the document can be recognized (invoice vs. purchase order) and, based on this alone, different rules are then applied (where it ends up afterwards, how it’s named, what data is extracted or looked at to build out the metadata).

In the case of HR, you could have a process set up around blank page removal whereby you can scan entire packets of employees’ files (separating each employee by a blank page). When the software recognizes this blank page, it will create a new document and perhaps then look to the first page of this new packet, which may be a cover sheet with employee information like their last name, SSN, or employee ID. This information is used to build out the document’s name, its folder in the system, or maybe other items, all “auto-magically”.
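
The blank-page separation idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how any specific capture product works: each scanned page is represented by its recognized text (an empty string standing in for a blank page), and the cover sheet is assumed to contain hypothetical `Last Name:` and `Employee ID:` lines. Real tools detect blank pages from the image itself.

```python
def split_on_blank_pages(pages):
    """Group a scanned batch into documents, using blank pages as separators.

    `pages` is a list of strings (recognized text per page). Blank separator
    pages are dropped from the output.
    """
    documents = []
    current = []
    for page in pages:
        if not page.strip():  # treat whitespace-only text as a blank page
            if current:
                documents.append(current)
                current = []
            continue
        current.append(page)
    if current:  # the last packet may not be followed by a blank page
        documents.append(current)
    return documents


def name_from_cover_sheet(document):
    """Build a document name from 'Key: Value' lines on the first page."""
    fields = {}
    for line in document[0].splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    last_name = fields.get("last name", "UNKNOWN")
    employee_id = fields.get("employee id", "UNKNOWN")
    return f"{last_name}_{employee_id}"


# Made-up batch: two employee packets separated by a blank page.
batch = [
    "Last Name: Doe\nEmployee ID: 353123",
    "W-4 page text",
    "",
    "Last Name: Smith\nEmployee ID: 353124",
    "I-9 page text",
]
packets = split_on_blank_pages(batch)
names = [name_from_cover_sheet(p) for p in packets]
```

With this sample batch, `names` comes out as `["Doe_353123", "Smith_353124"]`, one name per packet.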

You can even do things like read a barcode, which may hold a lot more information, or just a simple value that is then used to retrieve information from a database. In this case, a barcode might have an employee ID in it (say 353123 when read), which is then looked up in your HRIS, populating that document’s template fields with all of the information from the HRIS like last name, first name, DOB, address, etc. None of this information needs to be typed in by your staff, so again time is saved.
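
A minimal sketch of that barcode-to-database lookup follows. The `HRIS_RECORDS` dictionary stands in for a real HRIS database, and `populate_template` and the `needs_review` flag are hypothetical names invented for this example; decoding the barcode image itself is assumed to have already happened.

```python
# Stand-in for an HRIS database table, keyed by employee ID (sample data).
HRIS_RECORDS = {
    "353123": {
        "last_name": "Doe",
        "first_name": "John",
        "address": "123 Main Street",
    },
}


def populate_template(barcode_value, hris_records):
    """Fill a document's template fields from the value read off a barcode.

    Unknown IDs are flagged for manual review instead of being guessed.
    """
    employee_id = barcode_value.strip()
    fields = {"employee_id": employee_id}
    record = hris_records.get(employee_id)
    if record is None:
        fields["needs_review"] = True
        return fields
    fields.update(record)  # copy every HRIS field onto the template
    fields["needs_review"] = False
    return fields


template = populate_template("353123", HRIS_RECORDS)
unmatched = populate_template("999999", HRIS_RECORDS)
```

Flagging unmatched IDs for review, rather than leaving fields silently empty, is one reasonable design choice; a real deployment might route those documents to an exception queue.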

Some solutions even have the ability to “learn” in real-time so while you may have a new invoice from a vendor that requires a process to be established, thereafter the technology will recognize that specific document and apply the set of rules you outlined earlier. So if there are 1-100 documents like this a month, it might not be a big deal, but if it was 10,000-100,000 a month you can see how this can be very powerful. And in this era of “big data” we are all in today, tools that can help save time and remove steps from your processes are definitely the way to go.

Read other posts in this series
Reasons #1 and #2 – File Names Don’t Matter & Security
Reason #3 – Network & Web Access
Reason #4 and #5 – Auditing & Records Management
Reason #6 and #7 – Advanced Searching & Automated Capture

Fill out our “Request Information” form to learn more about how we can help your organization today. We work with organizations across the Central Valley of California with offices in Fresno, Sacramento, Santa Ana, and Bakersfield.

“We’re still processing your application…” Real life consequences

We saw this story last night and it just goes to show you how “backed up” organizations are, short on both staff and the proper tools and resources to control their information. It also shows the huge personal impact the lack of technology (in some cases) is having on the lives of many, many people.

While the video goes into more detail, the paper file backlog at the Veterans Affairs office has left 565,000 veterans waiting for their benefits. Many have conditions ranging from shrapnel wounds to PTSD and have filed claims for disability pay.

The story mentions one veteran who, after 7 months of waiting, had only received a letter stating “We’re still processing your application for compensation.”

The VA has 4.4 million active records across their 56 regional offices (all paper), with most files being hundreds of pages long. They are all currently processed by hand, although the VA does intend to switch over to an electronic system by the end of 2015. Why this has to take 3 years is what I’m wondering.

Who knows how many more people will be affected by this and have to wait for their benefits too? It’s clear, though, that the right technology in place could have solved this problem many years ago.

How Box Stacks up to the Competition – 5 Limitations

If one company could take credit for pushing the “cloud computing” document management phenomenon, it would be Box. In 2010, they took the document management industry by storm, giving companies access to a polished, easy-to-use product that had low upfront costs. They are used by over 120,000 businesses worldwide, have native applications for all mobile platforms, and are one of the largest players in the market. Why would you not choose Box to be your document management system?

Well, there are a number of reasons why Box is not meant for everyone. This post is not meant to bash Box as they have a great product for sharing and collaborating on documents securely. As a document management system, however, they fall short in many aspects traditional systems have had for years.

  1. Handling of Large Files – If you have ever tried to open a large document in Box’s viewer, you will know what I mean; it cannot handle PDFs over 35MB. Even with times changing and bandwidth in many places being plentiful, waiting for the viewer to render pages can be painstaking. Most traditional document management systems store images as single pages, so you are able to quickly jump from page to page without downloading the entire document.
  2. Contextual Searches – While Box provides great searching for OCR text, folders and file names, the ability to easily associate other data with a document is extremely helpful for locating files. Tags can work, but the interface for tags is clunky and non-intuitive. It is possible to come up with complex naming schemes that can house much of the information that makes up a document, but the whole idea of Box is creating an easy to use system where everything just works.
  3. Paper Conversion – Box excels in the sharing and collaboration of electronic documents (Microsoft Word, Excel, AutoCAD, etc.). However, if you are trying to migrate all of your paper to the cloud, Box offers no easy interface for this. You will need to scan to PDF, name your files something that makes sense, and pick a folder. Contrast this with products such as Westbrook’s Fortis or Laserfiche, which have many tools to assist with this process and allow for automated naming and capture using combinations of barcodes and zonal OCR (Optical Character Recognition), as well as database lookups to compare and retrieve additional data about the document you are archiving.
  4. Document Editing – Sometimes you will need to update an existing PDF. With Box’s interface, you would have to download the full PDF, update the file, and then re-upload it. With most major document management systems, this would be a single copy or cut-and-paste action, resulting in a much more streamlined process.
  5. Bandwidth – Although this is related to how Box handles large files, it is still a very real issue for many small to medium businesses that are still on the low-end to mid-tier internet connections. If you have someone uploading a large file, prepare for internet slowdown if your office is running off of DSL or a low-end cable internet connection. While the internet is always getting faster, there are still many rural areas that just do not have faster internet options.

In conclusion, Box is great for online collaboration when your primary use involves electronic documents. We use Box and we love it for many tasks. However, if you are looking for a system that can hold your electronic documents but also excels at housing your paper documents, you may want to look beyond Box. Companies such as Westbrook Technologies and Laserfiche are industry leaders and have been around for many years. You can house their solutions both onsite and off-premises while being fully secured.


If you would like more information on how to choose a document management system that is right for you, please use our Contact Us form.

Looking at Document Management ROI – Easier Done Than Said

In business, there are generally 3 levels of ROI (return on investment). Level 1 focuses on cost savings, Level 2 on costs and benefits, and Level 3, the most powerful, on the business case.

As more and more organizations today are looking to “go paperless”, converting their paper documents and records into something easier to manage, there still needs to be an ROI to justify these projects.

Regardless of which type you need to support, the following video should help, as it outlines a simple way to come up with the ROI of a document management / enterprise content management (ECM) implementation. Yes, the title is misleading, as you DO have to gather this data based on your current environment and processes, but once you have all this information, the rest is very straightforward.

One tool we have looks at 4 common areas where both hard and soft costs are taken into account. These are the “Pre-Deployment Costs”; the video provides more depth on these items, as well as on the “Investment Costs” and “Post-Deployment Costs”, to come up with our “ROI Calculations”.

1. Labor Costs
How many people are involved? What is their salary? How much time do they spend retrieving, sorting, recreating and faxing information?
2. Storage Costs
Where is information stored? How much do we pay for this (per sq. ft.)? Is it onsite or offsite? Do staff ever need to access it? If so, what do we pay them?
3. Copying Costs
How many pages of paper are generated or printed each day? What is that cost (including toner/ink)?
4. Distribution Costs
How many faxes do we send/receive a day? What is the cost to fax a page? How much do we spend on overnight delivery and postage?
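
To make the arithmetic concrete, here is a small Python sketch of one common way to turn the four cost areas into an ROI figure. It is not the tool's actual formula, which is not spelled out here; it assumes a simple first-year ROI of (annual savings minus investment) divided by investment, and every dollar amount is a made-up placeholder.

```python
def total_annual_cost(labor, storage, copying, distribution):
    """Sum the four cost areas listed above into one annual figure."""
    return labor + storage + copying + distribution


def simple_roi(pre_deployment_cost, post_deployment_cost, investment_cost):
    """First-year ROI as a fraction: (savings - investment) / investment."""
    if investment_cost <= 0:
        raise ValueError("investment_cost must be positive")
    annual_savings = pre_deployment_cost - post_deployment_cost
    return (annual_savings - investment_cost) / investment_cost


def payback_years(pre_deployment_cost, post_deployment_cost, investment_cost):
    """Years until savings cover the investment, or None if nothing is saved."""
    annual_savings = pre_deployment_cost - post_deployment_cost
    if annual_savings <= 0:
        return None
    return investment_cost / annual_savings


# Placeholder figures for illustration only.
pre = total_annual_cost(labor=60000, storage=12000, copying=8000, distribution=5000)
post = total_annual_cost(labor=40000, storage=3000, copying=2000, distribution=1000)
investment = 30000

roi = simple_roi(pre, post, investment)
payback = payback_years(pre, post, investment)
```

With these placeholder numbers, annual costs drop from 85,000 to 46,000, so the 39,000 in savings covers the 30,000 investment in under a year.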

Best viewed in 720p resolution and fullscreen mode.

We are available to share more information about the tool we used, and we look forward to any questions you have.


“But I’m scanning to PDF now…” Auditing & Records Management

Continuing our series of 7 reasons why scanning to PDF alone isn’t enough, let’s move on to reasons #4 and #5: auditing and records management.

#4. Auditing

Today, more than ever, it’s important to be able to not just secure your data but also provide insight on what’s occurring with it.

In the “paper world” documents can be left on someone’s desk, eventually read by someone else, removed, lost, damaged, etc. Many people think that their records are more secure here vs. in an electronic format but that’s not the case. We discussed Security earlier, and with a PDF it’s rather hard to track or audit the activity of your staff or colleagues.

Let’s say we have an HR folder on the network where we store all of our PDF files and items we scanned into the computer. So long as staff have access to this folder, they can access these documents, print them, make changes or delete them (given the right), as well as other activities. If someone did delete a document in here, it’s hard to go back later and track that activity. It’s difficult to run a report showing who accessed the document, what they did with it, and what their final action with it was.

With a document management solution though, all true/false events can essentially be audited or tracked. For example, we could see not only that Jane Doe logged into the system at 2:57 p.m. on April 23, 2012, but also that once inside she attempted to access the John Smith file, and then tried to save it locally. Or that she accessed the file and made changes to a particular field. Or that she simply selected the file and deleted it. All of these activities can be tracked so that after the fact, there’s an “e-paper trail” a manager could follow. The organization now has insight into the activity surrounding their digital assets, and a report can even be generated highlighting all of this activity.
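
As a rough sketch of what such an audit trail might look like under the hood, the Python below records events and produces a filtered report. The `AuditEvent` and `AuditLog` names, the action labels, and the minute-by-minute timestamps after the login are all hypothetical; actual products store and report this information in their own ways.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    user: str
    action: str      # e.g. "login", "open", "save_local", "delete"
    target: str      # what the action was performed on
    succeeded: bool  # the true/false outcome of the attempt


class AuditLog:
    def __init__(self):
        self._events = []

    def record(self, timestamp, user, action, target, succeeded=True):
        """Append one event; events are never edited after the fact."""
        event = AuditEvent(timestamp, user, action, target, succeeded)
        self._events.append(event)
        return event

    def report(self, user=None, target=None):
        """Return matching events in chronological order."""
        matches = [
            e for e in self._events
            if (user is None or e.user == user)
            and (target is None or e.target == target)
        ]
        return sorted(matches, key=lambda e: e.timestamp)


log = AuditLog()
log.record(datetime(2012, 4, 23, 14, 57), "Jane Doe", "login", "system")
log.record(datetime(2012, 4, 23, 14, 58), "Jane Doe", "open", "John Smith file")
log.record(datetime(2012, 4, 23, 14, 59), "Jane Doe", "save_local",
           "John Smith file", succeeded=False)

trail = log.report(target="John Smith file")
```

A manager reviewing `trail` would see that the file was opened and that an attempt to save it locally was made and blocked.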

As a side note, many document management solutions have Recycle Bins (similar to Windows) where if an item is deleted, it’ll end up here. So while Jane Doe may delete a file in the example above, it won’t be permanently deleted and a manager may have access to the Recycle Bin to restore it or see what has been deleted by staff.

#5. Records Management

Another large reason why scanning to PDF today is not enough is records management (RM). Today records management is a phrase spoken and heard by many, and organizations of different sizes are becoming more and more aware of the need for a proper RM strategy to be in place.

Our rule of thumb is that you want to hang onto records as long as they are still seen as an asset, but not so long that they become a liability.

For example, in California after an employee is let go, the employer is responsible for hanging on to their respective application and related files for 7 years. Once this retention period is up and the document has met the end of its life cycle, it could technically be purged or destroyed.

In the physical realm, this is generally managed by boxes labeled by year and stored away in your warehouse or perhaps filing room. Unless an organization has a dedicated records manager, they simply follow a process of moving the records offsite for destruction at the beginning of the calendar year when those documents become eligible for disposition. This is the ideal scenario; in practice, many organizations keep records for much longer than they need to. Some take the approach of storing them forever, or in perpetuity, even though this might not be required.

With PDFs, there really is no good way to handle this outside of doing something similar electronically. In other words, you could create a folder structure in Windows based on year or retention type, although staff won’t see this as helpful and it’ll be tough to access specific records. Those in HR don’t necessarily care about record types and retention, and would prefer to navigate to an employee’s folder and see all of their records in one location, more of an employee-centric view.

With a document/records management software solution, this is now much easier to manage. As records are brought into the solution, their retention schedules can automatically be assigned with staff spending very little time on this step. So the employee files may have a 7 year retention schedule assigned as an example. After that period of time, the solution won’t ever automatically purge or delete them as this would be a risk, and it wouldn’t be helpful. Instead, a records manager or staff member can run a report or search in this system and return all the records that ARE eligible for disposition. If this report or search brings back 2,000 results, it’s very likely that these records have reached the end of their life cycle and can be destroyed properly.
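
The report-then-review approach can be sketched as follows. This is a simplified Python illustration, not any vendor's implementation: the `RETENTION_YEARS` table, the record dictionaries, and the function names are hypothetical, and the function only reports eligible records without deleting anything.

```python
from datetime import date

# Hypothetical retention schedule: record type -> years to retain.
RETENTION_YEARS = {"Employee File": 7}


def disposition_date(retention_start, years):
    """Date on which a record becomes eligible for disposition."""
    try:
        return retention_start.replace(year=retention_start.year + years)
    except ValueError:
        # Feb 29 start date landing in a non-leap year: fall back to Feb 28.
        return retention_start.replace(year=retention_start.year + years, day=28)


def eligible_for_disposition(records, today):
    """Return records whose retention period has ended. Nothing is deleted."""
    eligible = []
    for record in records:
        years = RETENTION_YEARS.get(record["record_type"])
        if years is None:
            continue  # no schedule assigned, so never report it as eligible
        if disposition_date(record["retention_start"], years) <= today:
            eligible.append(record)
    return eligible


# Made-up records; retention_start is when the retention clock began.
records = [
    {"name": "Doe, John", "record_type": "Employee File",
     "retention_start": date(2004, 5, 1)},
    {"name": "Smith, Jane", "record_type": "Employee File",
     "retention_start": date(2010, 8, 15)},
]

report = eligible_for_disposition(records, today=date(2012, 7, 1))
```

Only the first record appears in `report`; the second stays under retention until 2017. Keeping the function read-only mirrors the point above that a person, not the software, makes the final call on destruction.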

As a side note, the highest level of records management certification a software solution today can receive is DoD 5015.2. DoD 5015.2 is the de facto software standard which provides implementation and procedural guidance on the management of records in the DoD. It establishes requirements for managing classified records, and includes requirements to support the Freedom of Information Act, Privacy Act, and interoperability.

So not only can we track or audit more activity with a document/records management solution compared to just scanning to PDF alone, but we also have more control over how our records are dealt with and can remain in compliance more easily.

