Monday, April 30, 2012

Almost Excluding Specific Search Results in SharePoint 2010

Sometimes you want to hide certain content from being exposed through search in certain SharePoint web-applications, even if the user really has access to the information in the actual content source. A scenario is intranet search that is openly used, but in which you want to prevent accidental information exposure. Think of a group working together on reqruiting, where the HR manager use the search center looking for information - you wouldn't want even excerpts of confidential information to be exposed in the search results.

So you carefully plan your content sources and crawl rules to only index the least possible amount of information. Still, even with crawl rules you will often need to tweak the query scope rules to exclude content at a more fine-grained level, or even add new scopes for providing search-driven content to users. Such configuration typically involves using exclude rules on content types or content sources. This is a story of how SharePoint can throw you a search results curveball, leading to accidental information disclosure.

In this scenario, I had created a new content source JobVault for crawling the HR site-collection in another SharePoint web-application, to be exposed only through a custom shared scope. So I tweaked the rules of the existing scopes such as "All Sites" to exclude the Puzzlepart JobVault content source, and added a new JobReqruiting scope that required the JobVault content source and included the content type JobHired and excluded the content type JobFired.

So no shared scopes defined in the Search Service Application (SSA) included JobFired information, as all scopes either excluded the HR content source or excluded the confidential content type. To my surprise our SharePoint search center would find and expose such pages and documents when searching for "you're fired!!!".

Knowing that the search center by default uses the "All Sites" scope when no specific scope is configured or defined in the keyword query, it was back to the SSA to verify the scope. It was all in order, and doing a property search on Scope:"All Sites" got me the expected results with no confidential data in it. The same result for Scope:"JobReqruiting", no information exposure there either. It looked very much like a best bet, but there where no best bet keywords defined for the site-collection.


The search center culprit was the Top Federated Results web-part in our basic search site, by default showing results from the local search index very much like best bets. That was the same location as defined in the core results web-part, so why this difference?

Looking into the details of the "Local Search Results" federated location, the reason became clear: "This location provides unscoped results from the Local Search index". The keyword here is "unscoped".


The solution is to add the "All Sites" scope to the federated location to ensure that results that you want to hide are also excluded from the federated results web-part. Add it to the "Query Template" and optionally also to the "More Results Link Template" under the "Location Information" section in "Edit Federated Location".


Now the content is hidden when searching. Not through query security trimming, but through query filtering. Forgetting to add the filter somewhere can expose the information, but then only to users that have permission to see the content anyway. The results are still security trimmed, so this no actual information disclosure risk.

Note that this approach is no replacement for real information security; if that is what you need, don't crawl confidential information from an SSA that is exposed through openly available SharePoint search, even on your intranet.

No comments: