BA Insight Blog
Ideas and Thoughts on Enterprise Search, Microsoft SharePoint and FAST Search
 
January 12, 2011
Q: We are embarking on a search initiative this year. What best practices can I follow to ensure an optimal search architecture and topology?

Step 1
Your first step is to identify the data you want to surface through search. Avoid the temptation to index everything: as the volume of indexed data grows, relevance inevitably suffers. Unlike the Internet, the Enterprise has far fewer people willing to take the time to identify relevant content by linking to it, and those links are the key to Google's relevance ranking algorithm.

That said, you shouldn't be too restrictive either. Enterprise Search technology has evolved to the point where it is finally delivering on the promise of a single access point to all Enterprise content. SharePoint Search 2010 includes Business Connectivity Services, a tool that enables SharePoint to index database content where security is not important. For secured systems such as ERP, ECM, CRM, and custom apps, 3rd party vendors offer Enterprise Connectors that fully honor the security of the target system.

Once you have a rough sense of corpus size, you can identify your "starting point" architecture from Microsoft's recommendations in the table below. The term "starting point" is used because your search project may have SLAs that are more demanding than average, which will require adding servers to the baseline.

Number of Items - Starting point architecture
0-1 million - Limited deployment
1-10 million - Small farm topology
10-20 million - Medium shared farm topology
20-40 million - Medium dedicated farm topology
40-100 million - Large dedicated farm topology
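
As a rough illustration, the table above can be expressed as a simple lookup. This is only a sketch: the function name is made up for this example, and because the ranges in the table share their boundary values, it assumes each upper bound is inclusive. Corpus sizes beyond the table return None rather than a guess.

```python
from typing import Optional

# (upper bound in items, starting point architecture), copied from the table above.
# Assumption: each upper bound is treated as inclusive.
STARTING_POINTS = [
    (1_000_000, "Limited deployment"),
    (10_000_000, "Small farm topology"),
    (20_000_000, "Medium shared farm topology"),
    (40_000_000, "Medium dedicated farm topology"),
    (100_000_000, "Large dedicated farm topology"),
]


def starting_point_topology(item_count: int) -> Optional[str]:
    """Return the starting point architecture for a corpus size.

    Returns None when the corpus is larger than the table covers.
    """
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    for upper_bound, topology in STARTING_POINTS:
        if item_count <= upper_bound:
            return topology
    return None


if __name__ == "__main__":
    print(starting_point_topology(5_000_000))
```

Remember that the result is only the baseline. Step 2 covers the requirements that push you beyond it.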

Step 2
Step 2 is to determine whether you have requirements beyond what the architecture prescribed by Microsoft can deliver. For example, if you'll be indexing content at a remote location over a WAN, the crawl speed will be significantly reduced. If the index must be updated frequently because the data changes often, a change to the baseline topology is required.
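
A back-of-the-envelope calculation can help show when crawl speed and update frequency collide. The sketch below is not a Microsoft formula. It simply divides the number of items by an assumed crawl rate, and the function names and sample figures are placeholders that you would replace with rates measured in your own environment.

```python
def crawl_hours(item_count: int, items_per_second: float) -> float:
    """Estimate how long a crawl takes at a steady, measured crawl rate."""
    if items_per_second <= 0:
        raise ValueError("items_per_second must be positive")
    return item_count / items_per_second / 3600


def meets_freshness(item_count: int, items_per_second: float, max_hours: float) -> bool:
    """True if the crawl would finish inside the required update window."""
    return crawl_hours(item_count, items_per_second) <= max_hours


if __name__ == "__main__":
    # Placeholder figures only: the same content crawled locally and over a WAN.
    items = 2_000_000
    for label, rate in (("local", 50.0), ("WAN", 5.0)):
        hours = crawl_hours(items, rate)
        print(f"{label}: {hours:.1f} hours, fits a 24 hour window: "
              f"{meets_freshness(items, rate, 24.0)}")
```

If the estimate for the remote content does not fit the window the business expects, that is a signal the baseline topology needs more crawl capacity.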

Let's assume you'll be deploying a small farm topology with the requirement identified above. Adding a server to host an additional Crawl Application Service would enable one crawler to index local content while the second crawler indexes the content over the WAN. The additional components are depicted in yellow in the figure below.



Other factors that would require beefing up the starting point architecture include a large number of concurrent users, hardware availability, and bandwidth.

Step 3
Once you've settled on a logical topology, it's time to talk hardware. The following table details how many physical servers are required given your topology.



To get a sense of the type of hardware required for a Small Farm Topology, please refer to this great post by Hernando Silva from Microsoft. He blogs about an actual deployment at Microsoft for one of their smaller divisions. The post is here: http://blogs.msdn.com/b/enterprisesearch/archive/2010/06/15/sharepoint-search-2010-in-a-small-scale-farm-hardware.aspx


 
WRITTEN BY MARTIN MULDOON

Martin Muldoon, Director of Product Marketing


​​