Search orchestration is a widely used method for querying multiple search indices at once and then returning the combined results to users. Many sites, including travel sites like Kayak, Booking, Expedia, Priceline, TripIt!, and Orbitz, search multiple indices in the background, then combine the results and present them to you. The same is true of online shopping sites like Nextag, PriceGrabber, Google Shopping, Bing Shopping, and Shopzilla, which aggregate and deliver comparison pricing for you.
The concept is also used by Office 365 Home Search, where orchestrated results from three separate search indices for content “In SharePoint”, “In your OneDrive”, and in “Links and attachments in email” are presented in a single interface. Bing for Business also does this by combining results from the SharePoint Online index with the primary Bing search index. In this blog post I will discuss Search Orchestration’s application to enterprise search.
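The basic pattern described above (send one query to several indices, then gather what comes back) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three `search_*` functions are hypothetical stand-ins for real backends, and the result shape (a list of dicts) is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search backends. Each takes a query
# string and returns a list of result dicts.
def search_sharepoint(query):
    return [{"title": f"SharePoint doc about {query}"}]

def search_onedrive(query):
    return [{"title": f"OneDrive file about {query}"}]

def search_email(query):
    # Simulate a source that is unavailable.
    raise TimeoutError("mail index did not respond")

SOURCES = {
    "sharepoint": search_sharepoint,
    "onedrive": search_onedrive,
    "email": search_email,
}

def orchestrate(query, sources=SOURCES):
    """Send the query to every source in parallel and collect results per source."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # A failing source yields no results instead of
                # breaking the whole search.
                results[name] = []
    return results

results = orchestrate("budget")
```

Keeping the results grouped by source at this stage leaves the presentation decision (one integrated list or a section per source) open until later.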
When Should Search Orchestration Be Used?
- When Indexing Is Not Feasible
If your enterprise search strategy includes incorporating content from the web, subscription services, or social media, orchestration is the answer. It is impractical for organizations to try to replicate web indexes, and subscription services (Bloomberg, LinkedIn, or LexisNexis, for example) do not allow crawling. For social media, it is better to include a service like Topsy (or Twitter itself) in an orchestrated search.
- When Compliance Dictates Multiple Indices
Data that originates in certain geographical areas or contains specific types of information (for example, PII) may be required to remain on specific infrastructure or in a specific location. In these instances, an orchestrated search deployment allows this data to be searched where it resides rather than being copied into a central index.
- When Indexing is Not Effective
It may not be feasible to scale search infrastructure to support billions of files when those files have individual value rather than group value. Take OneDrive and Exchange, for example. Each user may have thousands of files and emails that are valuable to them individually, but those same files aren’t valuable or even accessible to other users. It does not make sense to include those files in the central index.
When Should Search Orchestration Not Be Used?
- When Metadata is Not Sufficient
When the varied sources of information do not supply quality metadata, a metadata generation capability is needed. Metadata generation solutions (like our AutoClassifier) integrate into the content ingestion process. In a Search Orchestration scenario, content is not ingested, so this capability is likely unavailable and the metadata gap remains unfilled.
- When Relevancy Needs Improvement
Orchestrated sources have a “fire and forget” aspect, which prevents the search application from adjusting or modifying the relevancy of returned results. Search applications will find their hands tied in terms of the relevancy of results returned from those sources.
- When Personalization is Needed
Adjusting queries and results to account for user location, department, role, and individual interest just scratches the surface of the level of personalization that users will expect. Orchestrated sources will face issues with applying personalization strategies due to the lack of direct access to the underlying index.
- When Enhanced Applications are Needed
Enhanced applications can provide additional capabilities to help end users. Examples include NLP text engines like Linguamatics and time-saving tools like our Smart Previews. An orchestrated content source would not be able to provide the document-level access required to support these tools.
Pros and Cons
- Orchestration Pros
- Can include existing sources of large datasets
- Can address compliance requirements
- Reduced search infrastructure
- Avoids indexing content not valuable to multiple users
- Can be deployed quickly
- Orchestration Cons
- May be limited in terms of UI capabilities
- Cannot generate additional metadata
- Cannot modify relevancy
- Cannot apply personalization
- Cannot enable enhanced applications
Advice
1. Spend Time Designing the Right UI
The core decision here is whether to present the orchestrated results in a single integrated result set or in a dedicated section for each source. Our advice is to push for an integrated set of results first and fall back on the dedicated section. This gives the user fewer areas to look in and prevents the orchestrated results from being an afterthought.
2. Level the Relevancy
Don’t follow a round-robin interleaving strategy. Analyze how each orchestrated source communicates the relevancy score for the results it has provided and “level” these scores across sources. This will allow you to ensure that the mix of results provided to users contains the most relevant results, regardless of the source they are returned from.
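As a rough illustration of what “leveling” could look like, the sketch below rescales each source’s scores to a common 0 to 1 range (min-max normalization) and then merges everything into one ranked list. Min-max is just one simple choice made for this example, not a prescribed method, and the field names (`title`, `score`) and sample data are assumptions.

```python
def level_scores(results_by_source):
    """Rescale each source's scores to 0..1, then merge into one ranked list."""
    merged = []
    for source, results in results_by_source.items():
        if not results:
            continue
        scores = [r["score"] for r in results]
        lo, hi = min(scores), max(scores)
        span = hi - lo
        for r in results:
            # If every score in a source is identical, treat them all as top-ranked.
            leveled = (r["score"] - lo) / span if span else 1.0
            merged.append({**r, "source": source, "leveled": leveled})
    # Highest leveled score first; ties keep their original order.
    merged.sort(key=lambda r: r["leveled"], reverse=True)
    return merged

# Hypothetical data: source "a" scores on a 0-100 scale, source "b" on 0-1.
sample = {
    "a": [{"title": "a1", "score": 92}, {"title": "a2", "score": 40}],
    "b": [
        {"title": "b1", "score": 0.9},
        {"title": "b2", "score": 0.7},
        {"title": "b3", "score": 0.2},
    ],
}
merged = level_scores(sample)
```

Without leveling, every result from source "a" would outrank every result from source "b" simply because its raw scale is larger. One known weakness of min-max is that each source’s weakest result always lands at 0, so a production deployment would likely need a more careful calibration per source.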
Read more about BA Insight’s SmartHub powered orchestration capabilities here.