Friday, July 18, 2008

The Elusive Crawl Rule

I have been working on this rather large Enterprise Search project for a little while now. For some background, we are using Microsoft Search Server 2008 to index content where the metadata is held in databases and the full-text content is stored on network shares.

The project is now at the point where we are bringing on additional content sources, such as traditional file share content. For this content to be indexed, our default content access account needs read access to the file shares. At this point, my customer asked me where the password for the content access account was stored.

Hmm, interesting question, I thought. So we started hunting down where this information is stored.

Step 1: SQL Profiler. Create a crawl rule with a custom account and see what is logged. Lo and behold - nothing.

Step 2: Registry. Success. It turns out that the crawl rules, the content sources and basically everything else related to the indexing process are stored in the registry.

Registry root: HKLM\Software\Microsoft\Office Server\12.0\Search\Applications\<SSP GUID>\
Content sources: ...\Gather\Portal_Content\Content Sources
Crawl rules: ...\Gather\Portal_Content\Sites\...
Credential cache: ...\Gathering Manager\Secrets\...
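If you want to see this layout on your own index server, a read-only walk of the keys is enough. Below is a minimal sketch in Python using the standard-library winreg module (Windows only). The function names, the depth limit and passing the SSP GUID on the command line are my own illustrative assumptions, not anything Search Server provides. It only lists key names; it does not read values, and it certainly does not decrypt anything under Secrets, which will usually need administrative rights just to open.

```python
import sys

# Registry root from the post, relative to HKEY_LOCAL_MACHINE.
# The SSP GUID is a placeholder you supply for your own farm.
APPLICATIONS_ROOT = r"Software\Microsoft\Office Server\12.0\Search\Applications"


def ssp_key_path(ssp_guid: str) -> str:
    """Build the per-SSP root key path: ...\\Applications\\<SSP GUID>."""
    return APPLICATIONS_ROOT + "\\" + ssp_guid.strip("\\")


def walk_subkeys(path: str, max_depth: int = 3, depth: int = 0):
    """Yield (depth, subkey path) pairs below HKLM\\path, read-only.

    winreg is imported lazily so ssp_key_path() can still be used
    on platforms other than Windows.
    """
    import winreg  # standard library, Windows only

    # KEY_WOW64_64KEY makes a 32-bit Python see the 64-bit hive.
    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path, 0, access) as key:
        index = 0
        while True:
            try:
                name = winreg.EnumKey(key, index)
            except OSError:  # raised once there are no more subkeys
                break
            child = path + "\\" + name
            yield depth, child
            if depth + 1 < max_depth:
                try:
                    yield from walk_subkeys(child, max_depth, depth + 1)
                except OSError:  # e.g. access denied on a locked-down key
                    yield depth + 1, child + "\\<unreadable>"
            index += 1


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical usage: python walk_search_keys.py "<SSP GUID>"
    guid = sys.argv[1] if len(sys.argv) > 1 else ""
    root = ssp_key_path(guid) if guid else APPLICATIONS_ROOT
    for level, sub in walk_subkeys(root):
        print("  " * level + sub.rsplit("\\", 1)[-1])
```

Run without a GUID, it lists the SSP GUIDs under Applications; run with one, it shows the Gather and Gathering Manager branches a few levels deep.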

So initially I thought: why the registry? In fact, the answer makes quite a lot of sense. Since you can only have a single indexer per SSP, it makes sense to keep this information in the registry, where it can be locked down. From a security point of view, an attacker would first have to compromise the server to get at the registry, and would then still need to decrypt the secrets. Performance should also benefit, since reading these settings is a local registry read.