Have you ever struggled to find information in your data sources (libraries/databases)? Why is it that in an accounting, payroll, product or similar system you can always find what you are looking for, provided it is there?
I believe the answer is simply that in those types of data sources you not only know what you are looking for, but your data has also been captured and processed in a way that lets you find it. Data stored in these types of databases has been captured with metadata that is commonly used within the organization to retrieve that particular record or piece of information. These data sources' fields and corresponding values (a.k.a. metadata) were preselected at the time the data source was created, specifically to serve retrieval, based on the content of the data source and how the data was going to be used. Not surprisingly, this type of data source is what is commonly referred to as “structured data”.
When it comes to a data source with “unstructured data”, this task is more difficult because the use of the data is generally not specific to a single function the way, for instance, a “Parts” database would be. Additionally, the data making up these data sources consists of electronic files containing anything from office documents and emails to images, videos and anything else captured during the normal communication and collaboration processes within an organization.
The solution to dealing with unstructured data is to provide a common structure to the data being captured by using all the data available from the source and adding any business knowledge directly from the business unit that will be using the data. Additionally, the methodology supporting the solution needs to be flexible: it must support changes to the current business need, accommodate future requirements and be able to address previously processed data. Last but not least, it must keep cost down by allowing changes to take place quickly without involving end users.
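To make the idea above more concrete, here is a minimal sketch in Python of what "providing a common structure" could look like. It is an illustration only, not a description of any particular product: the function name build_record, the BUSINESS_RULES table and the field names (department, record_type) are all hypothetical. The sketch takes what the source already offers (file name, extension, size, modification time) and adds fields that a business unit might supply, keyed here by file extension purely for simplicity.

```python
import os
from datetime import datetime, timezone

# Hypothetical business knowledge supplied by the business unit:
# which fields to attach to a file, keyed by its extension.
BUSINESS_RULES = {
    ".pdf": {"department": "Legal", "record_type": "Contract"},
    ".msg": {"department": "Sales", "record_type": "Correspondence"},
}

# Hypothetical fallback used when no rule matches the file.
DEFAULT_FIELDS = {"department": "Unassigned", "record_type": "General"}


def build_record(path, rules=BUSINESS_RULES):
    """Return one structured record for an unstructured file.

    Combines metadata available from the source (the file system)
    with business fields looked up in the rules table.
    """
    stat = os.stat(path)
    name = os.path.basename(path)
    ext = os.path.splitext(name)[1].lower()

    # Metadata taken directly from the source.
    record = {
        "file_name": name,
        "extension": ext,
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
    }

    # Business knowledge added on top of the source metadata.
    record.update(rules.get(ext, DEFAULT_FIELDS))
    return record
```

Because the rules live in a plain table rather than in the code path, a change in business need could be handled by editing the table and calling build_record again on files that were already processed, without asking end users to re-tag anything. A real system would likely draw on far richer sources (document properties, email headers, folder location) than the file extension assumed here.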






