Tuesday, December 13, 2011

Search Configuration Best Practices

 These are the best practices I have found for configuring search in SharePoint 2010:
    1. Add a crawl component to a Search Service Application
1)    In Central Administration, in the Application Management section, click Manage service applications.
2)    On the Service Applications page, click the name of the Search Service Application to which you want to add a crawl component.
3)    On the Search Administration page, in the Search Application Topology section, click the Modify button.
Note: The SharePoint Search topology cannot be changed in Standalone installations.
4)     On the Manage Search Topology page, click New, and then click Crawl Component.
5)    In the Add Crawl Component dialog box, in the Server list, click the farm server to which you want to add the crawl component.
6)    In the Associated Crawl Database list, click the crawl database you want to associate with the new crawl component.
7)    In the Temporary Location of Index field, you can optionally enter the location on the server that will be used for creating the index files before propagating them to the query components. If you want to accept the default location, leave the contents of this field unchanged.
8)    Click OK to add the new crawl component to the job queue.
9)    On the Manage Search Topology page, click the Apply Topology Changes button to start the SharePoint timer job that will add the new crawl component to the farm on the specified server.
    2. Create or edit a content source.
On the Search Administration page, in the Crawling section of the quick navigation bar, click Content Sources.

To create a content source

1.    On the Manage Content Sources page, click New Content Source.
2.    On the Add Content Source page, in the Name section, in the Name box, type a name for the new content source.
3.    In the Content Source Type section, select the type of content that you want to crawl.
4.    In the Start Addresses section, in the Type start addresses below (one per line) box, type the URLs from which the crawler should begin crawling.
5.    In the Crawl Settings section, select the crawling behavior that you want.
6.    In the Crawl Schedules section, to specify a schedule for full crawls, select a defined schedule from the Full Crawl list. A full crawl crawls all content that is specified by the content source, regardless of whether the content has changed. To define a full crawl schedule, click Create schedule.
7.    To specify a schedule for incremental crawls, select a defined schedule from the Incremental Crawl list. An incremental crawl crawls content that is specified by the content source that has changed since the last crawl. To define a schedule, click Create schedule. You can change a defined schedule by clicking Edit schedule.
8.    To prioritize this content source, in the Content Source Priority section, on the Priority list, select Normal or High.
9.    To immediately begin a full crawl, in the Start Full Crawl section, select the Start full crawl of this content source check box, and then click OK.
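
As a rough illustration of the difference between the two crawl types described in steps 6 and 7, the following Python sketch models a content source as a mapping from URL to last-modified time. The function name `select_items_to_crawl`, the sample URLs, and the timestamp comparison are assumptions made for illustration only; they do not describe how the SharePoint crawler tracks changes internally.

```python
from datetime import datetime


def select_items_to_crawl(items, last_crawl, full):
    """Return the URLs a crawl would process, sorted for stable output.

    items      -- dict mapping URL -> last-modified datetime (hypothetical data)
    last_crawl -- datetime of the previous crawl, or None if never crawled
    full       -- True for a full crawl, False for an incremental crawl
    """
    if full or last_crawl is None:
        # A full crawl processes everything, changed or not.
        return sorted(items)
    # An incremental crawl processes only items changed since the last crawl.
    return sorted(url for url, modified in items.items() if modified > last_crawl)


# Hypothetical content source with two pages.
items = {
    "http://intranet/Pages/default.aspx": datetime(2011, 12, 1),
    "http://intranet/Pages/news.aspx": datetime(2011, 12, 12),
}
last_crawl = datetime(2011, 12, 10)

print(select_items_to_crawl(items, last_crawl, full=True))   # both pages
print(select_items_to_crawl(items, last_crawl, full=False))  # only news.aspx
```

This is why a full crawl is usually scheduled less often than an incremental one: it revisits every item regardless of whether anything changed.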

 

To edit a content source

1.    You can edit a content source to change the schedule on which the content is crawled, the crawl start addresses, the content source priority, or the name of the crawl. Crawl settings and content type cannot be changed when editing a content source.
2.    On the Manage Content Sources page, in the list of content sources, point to the name of the content source that you want to edit, click the arrow that appears, and then click Edit.
3.    After you have made the changes that you want, select the Start full crawl of this content source check box, and then click OK.
    3. Add crawl rules to include or exclude paths from crawling.
Add rules to exclude the following paths:
    • http://*/_catalogs/*
    • http://*/_layouts/*
    • http://*/Lists/*
    • http://*/Documents/*
    • http://*/Forms/*
    • http://.*?/DocLib[0-9]*/.*?
Note: For the last rule only, you must select the option “Use regular expression syntax for matching this rule”.
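
To check which URLs a rule list like this would catch before applying it, the rules can be approximated in a few lines of Python. This is only a sketch: it assumes `*` matches any run of characters, that matching is case-insensitive, and that a rule must match the whole URL. The helper names and the sample URLs are hypothetical, and SharePoint's own matching may differ in details.

```python
import re

# Exclusion rules from the list above (wildcard syntax).
WILDCARD_RULES = [
    "http://*/_catalogs/*",
    "http://*/_layouts/*",
    "http://*/Lists/*",
    "http://*/Documents/*",
    "http://*/Forms/*",
]
# The last rule uses regular expression syntax.
REGEX_RULES = [r"http://.*?/DocLib[0-9]*/.*?"]


def wildcard_to_regex(rule):
    """Translate a wildcard rule to a regex: '*' becomes '.*', the rest is literal."""
    return "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)


def is_excluded(url):
    """Return True if any exclusion rule matches the whole URL (assumed semantics)."""
    patterns = [wildcard_to_regex(rule) for rule in WILDCARD_RULES] + REGEX_RULES
    return any(re.fullmatch(p, url, re.IGNORECASE) for p in patterns)


# Hypothetical URLs for a quick check.
for url in [
    "http://intranet/_layouts/settings.aspx",
    "http://intranet/sites/hr/DocLib2/report.docx",
    "http://intranet/Pages/default.aspx",
]:
    print(url, "excluded" if is_excluded(url) else "crawled")
```

Running a handful of representative URLs through a check like this is a quick way to spot a rule that excludes more than intended.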
You can add a rule to exclude a URL as shown in the following screenshot:
  [Screenshot: Adding a rule to exclude a URL]


Add the following rule to force pages to be included as normal HTTP pages. Select the option “Include all items in this path”, and then select both “Crawl complex URLs” and “Crawl SharePoint content as normal http pages”:
    • http://*/pages/*.*
(Note: If you are using a non-publishing site, use “http://*/sitepages/*.*” instead; the previous exclude rules may also need to be modified.)
You can add the previous include rule as shown in this screenshot:
 [Screenshot: Adding a rule to include all files that have extensions]
           

Also, add the following rule to let the crawler browse directories without including those directories in the search results. Note: This rule works in combination with the previous rule, which includes all files in the search.
You can add this rule as shown in this screenshot:
 [Screenshot: Adding a rule to allow crawling inside directories without including the directories themselves]


Unfortunately, the previous configuration works well only for sites that do not have a redirection page at the root. An example of a site with a redirection page at the root is a multilingual site that uses a variation redirection page. If this is your case, do two things. First, replace the rule "http://*/Forms/*" with exclude rules for these three paths: "http://*/Forms/Thumbnails.aspx", "http://*/Forms/AllItems.aspx" and "http://*/Forms/DispForm.aspx". Second, do not use any include rules. Your rule list then becomes the following:
  • http://*/_catalogs/*
  • http://*/_layouts/*
  • http://*/Lists/*
  • http://*/Documents/*
  • http://*/Forms/Thumbnails.aspx
  • http://*/Forms/AllItems.aspx
  • http://*/Forms/DispForm.aspx
  • http://.*?/DocLib[0-9]*/.*?

 Note: For the last rule only, you must select the option “Use regular expression syntax for matching this rule”.
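
The effect of narrowing the Forms rule can be sketched by comparing the broad "http://*/Forms/*" rule against the three specific Forms rules on a few sample URLs. This is an approximation under stated assumptions: `*` matches any run of characters, matching is case-insensitive, and a rule must match the whole URL. The function names and the sample URLs are made up for illustration.

```python
import re

REGEX_RULE = r"http://.*?/DocLib[0-9]*/.*?"  # the one rule in regex syntax
COMMON = [
    "http://*/_catalogs/*",
    "http://*/_layouts/*",
    "http://*/Lists/*",
    "http://*/Documents/*",
]
BROAD = COMMON + ["http://*/Forms/*"]
NARROW = COMMON + [
    "http://*/Forms/Thumbnails.aspx",
    "http://*/Forms/AllItems.aspx",
    "http://*/Forms/DispForm.aspx",
]


def compile_rules(wildcard_rules):
    """Compile wildcard rules plus the regex rule into case-insensitive patterns."""
    compiled = []
    for rule in wildcard_rules:
        pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
        compiled.append(re.compile(pattern, re.IGNORECASE))
    compiled.append(re.compile(REGEX_RULE, re.IGNORECASE))
    return compiled


def is_excluded(url, rules):
    """Return True if any compiled rule matches the whole URL (assumed semantics)."""
    return any(rule.fullmatch(url) for rule in rules)


def no_longer_excluded(urls):
    """URLs excluded by the broad Forms rule list but not by the narrow one."""
    broad, narrow = compile_rules(BROAD), compile_rules(NARROW)
    return [u for u in urls if is_excluded(u, broad) and not is_excluded(u, narrow)]


samples = [
    "http://intranet/Shared/Forms/AllItems.aspx",
    "http://intranet/Shared/Forms/EditForm.aspx",
    "http://intranet/_layouts/settings.aspx",
]
print(no_longer_excluded(samples))
```

Under these assumptions, only form pages other than the three named ones (here, the hypothetical EditForm.aspx) change from excluded to crawled.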
Enjoy!

Thanks to Abdullah Sibai & Firas Kassoumeh
________________________________
محمد الطباع (Muhammad Altabba)
SharePoint Developer with Project Management and Team Leadership Activities