{"id":2093,"date":"2012-01-17T07:45:32","date_gmt":"2012-01-17T07:45:32","guid":{"rendered":"https:\/\/poiseddevelopers.com\/reality-tech\/?p=2093"},"modified":"2024-05-13T10:43:50","modified_gmt":"2024-05-13T10:43:50","slug":"tuning-your-crawl","status":"publish","type":"post","link":"https:\/\/poiseddevelopers.com\/reality-tech\/tuning-your-crawl\/","title":{"rendered":"Tuning Your Crawl"},"content":{"rendered":"<p>Want to tune your Search crawling? There\u2019s plenty of benefit to be had from refining how Search crawls in SharePoint. You can eliminate useless page hits, or documents that will fail crawl processing.<\/p>\n<p>It\u2019s another way to exclude sensitive documents as well, if you can find a suitable search crawl exclusion rule.<\/p>\n<p>I found out the hard way that SharePoint URLs defined in a Content Source MUST be a Web Application.<\/p>\n<p>If you only want to crawl a subsite, your recourse is to pare out all other sites using Crawl Rules.<\/p>\n<p>The Crawl Rules come in two basic flavors: simple wildcard, which is quite intuitive, and Regular Expressions. 
You can find the Crawl Rules in Central Admin under General Application Settings, Search, (your Content SSA if using FAST), Crawl Rules (visible on the left).<\/p>\n<p>Surprisingly, there is scant documentation on the Regular Expression implementation in SharePoint. Through a bit of digging and trial and error, I\u2019ve summarized the Regular Expression operators supported in SharePoint:<\/p>\n<table style=\"border-collapse: collapse; width: 60%; margin-top: 20px;\">\n<tbody>\n<tr style=\"background-color: #f2f2f2;\">\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">?<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">Conditional matching; matches optionally<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">\u201chttp:\/\/SharePoint\/List_[a-z]?.aspx\u201d<br \/>\nThe character a-z is optional<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">*<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">Matches on zero or more<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">\u201chttp:\/\/SharePoint\/List_M*\u201d<br \/>\nNo M, M, or MM\u2026 at the end<\/td>\n<\/tr>\n<tr style=\"background-color: #f2f2f2;\">\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">+<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">Matches on one or more<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">\u201chttp:\/\/SharePoint\/List_M+\u201d<br \/>\nOne or more Ms at the end<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">.<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">Match one character<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">\u201chttp:\/\/SharePoint\/List_.\u201d<br \/>\nOne character expected after 
_<\/td>\n<\/tr>\n<tr style=\"background-color: #f2f2f2;\">\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">[abc]<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">Any one of the listed characters; I use abc as an example. Ranges such as a-c work too<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">\u201chttp:\/\/SharePoint\/List_[a-z]\u201d<br \/>\nMatches on any List_ with any letter a-z<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">|<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">Exclusive OR<br \/>\nIf both sides are true, this evaluates to false.<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\"><\/td>\n<\/tr>\n<tr style=\"background-color: #f2f2f2;\">\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">()<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">Parentheses group characters for an operation<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\"><\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">{x,y}<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">Range of counts<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\"><\/td>\n<\/tr>\n<tr style=\"background-color: #f2f2f2;\">\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">{x}<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">Exact count<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\"><\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">{x,}<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 8px;\">X or more counts<\/td>\n<td style=\"border: 1px solid #dddddd; text-align: left; padding: 
8px;\"><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>For FAST, note the Crawl Rules are under your Content SSA, not the Query SSA.<\/p>\n<p>To create an exclusion rule with PowerShell, set -Type to 1 (0 = include, 1 = exclude):<\/p>\n<pre>New-SPEnterpriseSearchCrawlRule -SearchApplication FASTSearchApp -Path \"http:\/\/SharePoint\/Sites\/Secret\/*\" -Type 1<\/pre>\n<p>To output all your Crawl Rules, use this line of PowerShell:<\/p>\n<pre>Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchCrawlRule | ft<\/pre>\n<p>The cmdlet \u201cGet-SPEnterpriseSearchCrawlRule\u201d requires a Service Application object, so we simply pipe one in using the \u201cGet-SPEnterpriseSearchServiceApplication\u201d cmdlet.<\/p>\n<p>You can then pipe it to whatever you want. \u201cft\u201d is an alias for Format-Table, which is the default output, but you can just as easily pipe it to a file for automatic documentation.<\/p>\n<p>This is especially useful when playing with your crawl rules.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Want to tune your Search crawling? There\u2019s plenty of benefit to be had refining how Search crawls in SharePoint. Eliminate useless page hits, or documents that will fail crawl processing. It\u2019s another way to exclude sensitive documents as well, if you can find a suitable search crawl exclusion rule. 
I found out the hard way [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":2094,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[58],"tags":[],"class_list":["post-2093","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-fast-search"],"acf":[],"_links":{"self":[{"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/posts\/2093","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/comments?post=2093"}],"version-history":[{"count":6,"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/posts\/2093\/revisions"}],"predecessor-version":[{"id":3954,"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/posts\/2093\/revisions\/3954"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/media\/2094"}],"wp:attachment":[{"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/media?parent=2093"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/categories?post=2093"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/poiseddevelopers.com\/reality-tech\/wp-json\/wp\/v2\/tags?post=2093"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}