This post has been written by BA Insight Guest Blogger, Agnes Molnar, Founder and Managing Consultant, Search Explained. Many factors have to be in place and fit together to make Search good and successful. One of them is metadata. In this post, I’m going to show you why it matters and how to get its real benefits in Enterprise Search.
First of all, let’s see where we use metadata in search. The first screenshot shows a typical Enterprise Search user experience in SharePoint 2013:
What can we see here? Query, results, Hover Panel, refiners, etc.: all based on metadata. There are a lot of operations behind the scenes, too: query rules, ranking models, result sources, etc. Again, all based on metadata.
Let’s move one step forward. The following picture shows a customized, advanced Search Based Application:
Everything you can see on this screenshot is metadata: filters applied, data displayed, inputs of charts, etc.
By now, you should get the point. If you don’t have metadata on your content, the chance that your Search can be good is tiny. Content without metadata is like files stored in a file share: findability is very poor, and the number of duplicated and multiplied documents grows exponentially. You are well on your way to ending up with a content silo.
Having metadata on the content is less than 50% technology. It’s much more about human behavior, habits and psychology. Even if we have the technology ready, people tend to fill in the properties (metadata) with incorrect values. The most common reasons:
- They are not sure what to enter into the fields.
- They have not been trained in how to use the metadata forms.
- They are not motivated to spend even a couple of minutes filling in the forms. It’s much easier and more convenient to leave them empty. And even if the metadata is required, users are not motivated to fill in the proper values. It’s much easier and more convenient to choose the first or the default value, or to enter something like “qwerty” or “123”.
Side note: People tend to tag pictures on Facebook much more than enterprise content in their everyday job. The reason is mostly psychological: on Facebook, they are motivated by getting more likes and comments. In business, we have to do the same: we have to find a way to motivate our users to use the metadata capabilities in the most effective way.
The result is: although technically we have metadata on the content, the metadata is incorrect, inconsistent and messy.
In this case, our Search application will be even worse than without metadata. Bad metadata is misleading. Inconsistent metadata is hard to track and correct. We put a lot of effort into a system that gives us more headaches than help.
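To make the “bad metadata” problem visible, it can help to audit the properties before they reach the search index. The following is only a minimal sketch in Python, not a SharePoint feature: the function name `audit_metadata`, the placeholder list and the sample field names are all assumptions made up for illustration. A flagged value is not necessarily wrong (a default can be the correct answer), it is just worth a second look.

```python
# Hypothetical list of "junk" values; a real list would come from auditing your own content.
PLACEHOLDER_VALUES = {"qwerty", "asdf", "123", "test", "n/a", "tbd"}


def audit_metadata(properties, defaults=None):
    """Return a dict mapping field name -> reason the value looks suspect.

    properties: field name -> value entered by the user
    defaults:   field name -> default value of the form (optional)
    """
    defaults = defaults or {}
    issues = {}
    for field, value in properties.items():
        text = (value or "").strip()
        if not text:
            issues[field] = "empty"
        elif text.lower() in PLACEHOLDER_VALUES:
            issues[field] = "placeholder"
        elif field in defaults and text == defaults[field]:
            # Untouched default: possibly correct, but worth reviewing.
            issues[field] = "default value"
    return issues


# Illustrative document properties (invented for this example).
doc = {"Title": "Q3 Sales Report", "Department": "qwerty", "Region": "EMEA", "Owner": ""}
print(audit_metadata(doc, defaults={"Region": "EMEA"}))
```

A report like this does not fix the human side of the problem, but it shows where the training and motivation effort should be focused.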
Manual vs. Auto-Tagging
There are two options for adding metadata to our content.
Out-of-the-box, we have Manual Tagging: users can add, modify and remove metadata manually, to the best of their intent and knowledge. This feature is great but has some limitations:
- Users make mistakes, even with the best intentions. These mistakes leave us with bad, improper metadata.
- Entering metadata manually is slow. If you add millions of documents during a migration, for example, tagging all of them properly is close to mission impossible.
- As we’ve seen above, people tend to skip tagging or to choose the default values. Getting past this requires a complex cultural change.
To get the full benefits of the tagging features, we need a more sophisticated solution, and this can be Auto-Tagging or Auto-Classification. With this method, an additional engine does the tagging, based on the rules we set up in advance. Basically, there are two different approaches in Auto-Tagging:
- Pre-defined, taxonomy-based classification. In this case, we define a domain-specific taxonomy (for manufacturing, healthcare, insurance, etc.), and the tagging happens according to it. The pre-defined taxonomy can be purchased from a taxonomy provider or prepared in-house.
- Analyzing the current content corpus and creating the taxonomy from it. This needs advanced linguistic and intelligent features in order to extract the “useful” information from the content as tags.
- Of course, we can use a “hybrid” solution, where we mix the two methods described above. Taxonomies always have to be edited and maintained by people, as machine algorithms cannot replace human intelligence and subject matter experience.
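To give a feeling for the first approach, here is a very small Python sketch of taxonomy-based tagging. It is an illustration under stated assumptions, not how any particular auto-classification product works: the `TAXONOMY` terms, the `auto_tag` function and the `min_hits` threshold are all invented for this example, and real engines use far richer rules (synonyms, stemming, weighting, hierarchy).

```python
import re

# Hypothetical mini-taxonomy: tag -> terms that signal the tag.
TAXONOMY = {
    "Healthcare": ["patient", "clinical", "diagnosis"],
    "Insurance": ["policy", "claim", "premium"],
    "Manufacturing": ["assembly line", "supplier", "quality control"],
}


def auto_tag(text, taxonomy=TAXONOMY, min_hits=2):
    """Return the tags whose terms occur at least min_hits times in text, sorted by name."""
    lowered = text.lower()
    tags = []
    for tag, terms in taxonomy.items():
        # Count whole-word (or whole-phrase) occurrences of every term for this tag.
        hits = sum(
            len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            for term in terms
        )
        if hits >= min_hits:
            tags.append(tag)
    return sorted(tags)


sample = "The claim was filed after the policy premium increased. The supplier was not involved."
print(auto_tag(sample))  # prints ['Insurance']
```

The single mention of “supplier” is not enough to tag the sample as Manufacturing, which is the point of the threshold. Even in this toy version, the quality of the tags depends entirely on the quality of the taxonomy behind them.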
As you can see, metadata is critical for providing good quality Enterprise Search (and findability!) solutions. Technically, we have options to tag our content with various types of metadata, but manual tagging can never be as good as an automated or semi-automated approach. The key to auto-tagging, however, is a proper, domain-specific taxonomy. General-purpose, good-quality auto-tagging is still like a sci-fi story.