Graph-based search has emerged as a major breakthrough in search technology. It goes beyond simple keyword matching to take into account entities and the relationships between them for increased relevancy. Although graph search in SharePoint today is limited to Office 365 and Delve, you can achieve similar results with SharePoint on-premises under the right circumstances. This blog post will walk you through identifying these circumstances and offer you techniques to run graph-like searches with SharePoint 2013.
What is Graph Search?
In the past couple of years, all major Internet companies have made announcements around graph search: Facebook and its Social Graph, Google and its Knowledge Graph, LinkedIn and its Economic Graph. Graph search is now being integrated into mainstream enterprise software too. For one, Microsoft recently released its Office Graph technology to power its new Delve feature in Office 365. In this article, I will show you how you can accomplish similar capabilities today with SharePoint 2013 using our Knowledge Integration Platform.
In case you haven’t had much time to read about graph search, here are the kinds of questions graph search can help you answer:
- Proposals dealing with global deployment for customers in the oil and gas industry
- Data science jobs in New York at companies where my connections have worked
- Experts with experience in big data who have worked for financial institution customers
These are not queries that SharePoint’s search engine knows how to answer natively today. However, the engine is powerful and flexible, and for the right scenarios, you can apply techniques to handle such queries within your SharePoint deployment using our Knowledge Integration Platform.
First, let’s decompose the problem a bit. In most cases, these queries are really two queries with two different sets of criteria, from which you’ll want to extract a specific subset of results.
- Let’s call the results matching criteria one “result set one” and the results matching criteria two “result set two.”
- The results of the whole query are the items from result set two which are connected to entities from result set one.
For the first example above, this means that result set one is all the oil and gas customers whereas result set two is all the proposals dealing with global deployment. And the final set of results is all of the proposals found in result set two submitted to customers found in result set one.
To reuse graph terminology, you could consider result set one to be the set of “customer” nodes to which all proposal documents of result set two are connected via a “submitted to” relationship.
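The decomposition above can be sketched in a few lines of Python. This is only an in-memory illustration of the logic, not how SharePoint evaluates a query: the `Customer` and `Proposal` classes, their fields, and the sample data are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    industry: str


@dataclass(frozen=True)
class Proposal:
    title: str
    topic: str
    submitted_to: str  # customer_id at the other end of the "submitted to" edge


def graph_like_search(customers, proposals, industry, topic):
    # Result set one: the customer nodes matching criteria one.
    result_set_one = {c.customer_id for c in customers if c.industry == industry}
    # Result set two: the proposals matching criteria two.
    result_set_two = [p for p in proposals if p.topic == topic]
    # Final results: items from set two connected to an entity in set one.
    return [p for p in result_set_two if p.submitted_to in result_set_one]


# Hypothetical sample data, for illustration only.
CUSTOMERS = [Customer("c1", "oil and gas"), Customer("c2", "retail")]
PROPOSALS = [
    Proposal("Rig rollout", "global deployment", "c1"),
    Proposal("Store rollout", "global deployment", "c2"),
    Proposal("Rig audit", "compliance", "c1"),
]

matches = graph_like_search(CUSTOMERS, PROPOSALS, "oil and gas", "global deployment")
```

Only the proposal that both deals with global deployment and was submitted to an oil and gas customer survives the final step.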
Achieving Graph-Like Searches with SharePoint 2013
For the rest of the post, I’ll use the following concrete example to show you how our Knowledge Integration Platform can help implement graph-like searches with SharePoint 2013 today.
Consider that you are indexing your customer projects and files and you want to be able to search these files based on your customer data. The typical pattern for such a query would be: give me all documents containing keywords K for customers that purchased product P or are located in North America. For our example, result set one is the set of customers and result set two is the set of documents for these customers.
Let’s look at the various techniques now. They can essentially be split into two categories: indexing-time solutions and query-time solutions.
The indexing-time approach is simple: improve the quality of the content being indexed so that we eliminate the need for two consecutive queries to find the desired results. In our case, the solution consists simply of applying the customer’s metadata (product bought, location, industry, etc.) to any indexed file related to that customer.
Once the index is properly populated, you will be able to run queries with the following pattern: <keywords to find in the document> CustomerAttribute:value. For instance: statement of work international deployment customerIndustry:"oil and gas" customerRevenue>10000000000 (that’s 10 billion if you’re counting 😉 )
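If you assemble such queries in code, a small helper keeps the pattern consistent. The sketch below is hypothetical: `kql_restriction` and `build_query` are names invented for this post, and the revenue value is written without thousands separators on the assumption that the query parser expects plain digits.

```python
def kql_restriction(prop, op, value):
    """Render one property restriction, e.g. customerIndustry:"oil and gas"."""
    if isinstance(value, str):
        # Wrap text values in straight double quotes so multi-word phrases stay together.
        value = '"' + value.replace('"', "") + '"'
    return f"{prop}{op}{value}"


def build_query(keywords, restrictions):
    """Join free-text keywords with (property, operator, value) restrictions."""
    parts = [keywords] + [kql_restriction(*r) for r in restrictions]
    return " ".join(parts)


query = build_query(
    "statement of work international deployment",
    [("customerIndustry", ":", "oil and gas"), ("customerRevenue", ">", 10_000_000_000)],
)
```

The managed properties `customerIndustry` and `customerRevenue` only return results if the index was populated with them at indexing time, which is the whole point of this approach.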
Indexing-time solutions offer the best search experience in terms of responsiveness, but at the cost of flexibility. For instance, if you wish to change your logic, you will need to re-index all your documents to repopulate the index with the right metadata.
Using our Dataset Connectors
This technique requires that the metadata to be added to indexed documents be stored in a SQL database or SharePoint list. It scales well with both the number of documents to index and the number of customers.
To implement this technique, create a dataset connector pointing to your list of customers in your SQL database or SharePoint list. Then, use one or more of the attributes from the indexed document as the key to look up the customer via the dataset connector. Once the dataset connector has found the matching customer, it will add the customer’s metadata to the list of metadata already existing for the document being indexed.
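To make the lookup step concrete, here is a minimal Python sketch of the enrichment idea. It is not the dataset connector’s actual API: the `CUSTOMER_DATASET` dictionary stands in for the SQL table or SharePoint list, and the `enrich` function, the `customerID` key, and the `customerLocation` property are names assumed for illustration.

```python
# Hypothetical stand-in for the customer list held in SQL or SharePoint.
CUSTOMER_DATASET = {
    "c1": {"customerIndustry": "oil and gas", "customerLocation": "North America"},
}


def enrich(document, dataset, key_field="customerID"):
    """Return a copy of the document metadata with the matching customer's metadata added."""
    enriched = dict(document)  # never mutate the metadata passed in
    customer = dataset.get(document.get(key_field))
    if customer is None:
        return enriched  # no matching customer: index the document unchanged
    for name, value in customer.items():
        # Keep any value the document already carries for this property.
        enriched.setdefault(name, value)
    return enriched


doc = {"title": "Statement of work", "customerID": "c1"}
indexed = enrich(doc, CUSTOMER_DATASET)
```

Using `setdefault` is a design choice for the sketch: metadata already present on the document wins over the looked-up customer value, so the lookup only adds to the existing metadata.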
The query-time approach takes the opposite route. Instead of assuming that we know in advance all the metadata, the kinds of queries we want to execute, and the relationships between entities, we index all entities to the best of our abilities and perform several queries on behalf of the user in order to return the results he or she is looking for.
Using our federator engine, the query is intercepted and pre-processed before it is executed. In our example, if a user runs a query such as statement of work international deployment customerIndustry:"oil and gas" customerRevenue>10000000000, the query is split into two queries. The part focused on the customer filters is executed first to find the matching customers, and then the query is rewritten to match only documents related to these customers. The final query executed would look like this: statement of work international deployment AND (customerID=id1 OR customerID=id2 OR customerID=id3 …)
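The split-and-rewrite logic can be sketched as follows. This is an illustration of the idea only, not the federator’s real pipeline interface: `split_query`, `rewrite_query`, and the `find_customer_ids` callback (which would run the first query against the customer entities) are assumed names, and the sketch assumes every customer filter is a property whose name starts with `customer`. It emits uppercase `AND` and `OR`, which is how KQL expects Boolean operators to be written.

```python
import re

# Matches restrictions such as customerIndustry:"oil and gas" or customerRevenue>10000000000.
CUSTOMER_FILTER = re.compile(r'\b(customer\w+)(:|>=|<=|>|<|=)("[^"]*"|\S+)')


def split_query(query):
    """Separate the free-text keywords from the customer property restrictions."""
    filters = [m.groups() for m in CUSTOMER_FILTER.finditer(query)]
    keywords = " ".join(CUSTOMER_FILTER.sub(" ", query).split())
    return keywords, filters


def rewrite_query(query, find_customer_ids):
    """Run the customer part first, then restrict the document query to those customers."""
    keywords, filters = split_query(query)
    if not filters:
        return query  # nothing to rewrite
    ids = find_customer_ids(filters)  # first query: find the matching customers
    if not ids:
        return None  # no customer matched, so no document can match either
    id_clause = " OR ".join(f"customerID={cid}" for cid in ids)
    return f"{keywords} AND ({id_clause})"


original = 'statement of work international deployment customerIndustry:"oil and gas" customerRevenue>10000000000'
# A stub lookup standing in for the real customer query.
rewritten = rewrite_query(original, lambda filters: ["id1", "id2", "id3"])
```

Returning `None` when no customer matches is one possible convention; a real pipeline stage might instead return an empty result set directly.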
Although this technique requires some programming to build the pipeline stage that pre-processes and rewrites the query, it also offers the most flexibility because you are in control of how the query is understood and rewritten. This also means that you can extend the KQL syntax with any new operator you see fit to build queries tailored to your business needs. For other examples of how queries can be rewritten, see our BA Insight Federator white paper.
As you can see, there are many ways to implement graph-like capabilities with SharePoint. Its capabilities and extensibility really allow for advanced search scenarios. And when combined with our Knowledge Integration Platform, you can unlock those capabilities within hours or days, tops. Any of the examples I presented above can be implemented in less than a week to create a search application that will delight your end users.