Using SharePoint Search to boost your PowerShell powers

Getting started:  Example Script (Zip), Example Script (preview), SharePoint Search Query Tool (optional, but handy)

Have you ever needed to pull information from one or more of your SharePoint environments so you could analyze or inspect it? That can seem daunting or painful to aggregate, especially in bulk, but SharePoint Search can be your best friend in these instances. This blog post dissects the example script linked above to show you one possible way to take advantage of this powerful pairing.

Since SharePoint Search can be targeted to crawl and index many SharePoint environments as content sources, it natively does much of the content aggregation and correlation ‘hard work’, providing you with SharePoint content catalogued by the metadata attributes it knows for each object.

Hmmm.  Now to the tricky part.  How can I ask SharePoint Search for this catalogued information via PowerShell?

Enter, the SharePoint Search API: /_api/search/

First, we need to obtain something called a form digest token.  I won’t go too far into what this is, but you can effectively consider it an authorization token the API uses to recognize that your session is valid from a starting URL.

We accomplish this with a pair of functions, one to post a request to SharePoint asking the API for a token:

Post-RESTRequest ($endpoint, $postFormDigest, $body, $credential)

And another to call the post function and process the results:

Get-SPRESTFormDigest($xreqURL)

This is stored to a variable for use when asking the API for information:

$formDig = Get-SPRESTFormDigest($global:spAdminSite)
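
For readers without the example script at hand, here is a minimal sketch of what such a pair of functions might look like. It assumes the standard /_api/contextinfo endpoint, the odata=verbose JSON shape, and Windows integrated authentication. The names Select-FormDigestValue and Get-FormDigestSketch are hypothetical and are not the functions in the script.

```powershell
# Hypothetical helper: pull the digest string out of a parsed contextinfo response.
function Select-FormDigestValue {
    param([Parameter(Mandatory)] $Response)
    # With odata=verbose the value sits under d.GetContextWebInformation.
    return $Response.d.GetContextWebInformation.FormDigestValue
}

# Hypothetical helper: POST to contextinfo and return the digest.
# Assumes the current Windows identity can authenticate to $SiteUrl.
function Get-FormDigestSketch {
    param([Parameter(Mandatory)] [string] $SiteUrl)
    $headers  = @{ Accept = 'application/json;odata=verbose' }
    $endpoint = $SiteUrl.TrimEnd('/') + '/_api/contextinfo'
    $response = Invoke-RestMethod -Uri $endpoint -Method Post `
        -Headers $headers -UseDefaultCredentials
    return Select-FormDigestValue -Response $response
}
```

Splitting the parsing from the web call keeps the parsing easy to check against a saved response without touching a live farm.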

Now, with the form digest token in hand, we can talk to the Search API … but what should we ask it for?  How should we shape the request?

For the purposes of this discussion, we would like to ask SharePoint Search for a listing of all known SharePoint sites (and webs), including some known metadata: the path, type of site object, name, GUID, size, site GUID, and its SharePoint version.  Whew! Now you might see how this could be daunting: asking multiple environments for all of this and then aggregating it together.

Fortunately, SharePoint Search will have all of this information (and much more) for everything indexed in its content sources.  We just have to ‘politely’ ask Search for the information using the correct syntax in order for it to return the pre-aggregated data for us to use.

Essentially, the RESTful request we are going to build up looks like this:

$global:spAdminSite +"/_api/search/query?querytext='*'&startrow=" + $rowNum + "&rowlimit=1000&selectproperties='Path%2cSiteName%2cTitle'&refinementfilters='contentclass:or(`"STS_Site`",`"STS_Web`")'&sortlist='Path:descending'"

::wiping sweat from brow::

That’s a bit ugly, IMHO! Let’s break that down into the pieces we wanted to ask search for, starting from blank:

$queryBuildBits = ""

Then we’ll add the location of the API (noting that the adminSite is arbitrary but you must be able to authenticate to it):

#region where to query?
$queryBuildBits += $global:spAdminSite +"/_api/search/query"
#endregion

Next, let’s tell the API what information we’d like to ask for. This query parameter is synonymous with the search box on a search page. In our example we want everything, so we’ll use a wildcard as shown here:

#region what to look for?
$queryBuildBits += "?querytext='*'"
#endregion

Now, I’ll inject an assumption that we have MANY sites to ask Search about.  In that case, we need to account for the fact that Search will only return so many results at once, so we need to request and collect the results in batches.  Here, $rowNum would start at 1, and as we process each return our logic increases it for the next batch whenever the number of results equals the batch size we specified.  We specify how many results Search returns per batch (to then measure) as follows:

#region where to start and how many?
$rowsToRet = 350
$queryBuildBits += "&startrow=" + $rowNum + "&rowlimit=" + $rowsToRet
#endregion

Next, if you recall, we want to specify the metadata attributes we care about.  That is done with the &selectproperties= portion of the API call.  First we build an array of the properties we want ($retProps), then enumerate the array to add them to our request string, and finally close out the properties with a graceful string addition (Title, including the closing single quote):

#region what to return? 
$retProps = @('Path','SiteName','WebID','contentclass','Size','SiteID','SPVersion')

$queryBuildBits += "&selectproperties='"
$retProps | %{
    $queryBuildBits += ($_ + ",");
}

$queryBuildBits += "Title'";  # the loop above already left a trailing comma
#endregion

So far, this would still give me any kind of result that SharePoint Search knew of.  However, we only wanted to ask Search for the metadata as it pertained to sites (and webs).  To narrow the results we can use a refiner and append it to our query string as shown here:

#region refine your results?
$queryBuildBits += "&refinementfilters='contentclass:or(`"STS_Site`",`"STS_Web`")'"
#endregion
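
The pieces built up so far can also be assembled in one place. The following is only a sketch under assumptions, not the script's own code: the function name New-SearchQueryUrlSketch, its parameters, and the example.com URL are all made up for illustration, and it uses the -join operator rather than repeated string concatenation.

```powershell
# Hypothetical helper: assemble the search query URL from its parts.
function New-SearchQueryUrlSketch {
    param(
        [Parameter(Mandatory)] [string] $SiteUrl,
        [int]      $StartRow   = 0,
        [int]      $RowLimit   = 350,
        [string[]] $Properties = @('Path','SiteName','Title')
    )
    $parts = @(
        "querytext='*'",
        "startrow=$StartRow",
        "rowlimit=$RowLimit",
        ("selectproperties='" + ($Properties -join ',') + "'"),
        "refinementfilters='contentclass:or(`"STS_Site`",`"STS_Web`")'"
    )
    # Join the parameters with '&' so separators only appear between them.
    return $SiteUrl.TrimEnd('/') + '/_api/search/query?' + ($parts -join '&')
}

$url = New-SearchQueryUrlSketch -SiteUrl 'https://sharepoint.example.com/' -StartRow 0 -RowLimit 2
```

Joining an array means there is never a stray leading or trailing comma or ampersand to clean up, whichever properties are requested.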

OK.  Who’s ready to talk to Search?  I know I am.  With my query string all put together, and my form digest token in hand, we can submit a RESTful query to the API via a third function I’ve provided in the script:

Get-RESTRequest ($endpoint, $getFormDigest, $body, $credential )

As with most RESTful SharePoint APIs, this will return a large block of JSON including ranking information, metrics, etc. To inspect all of this, simply make the call to the API:

(Get-RESTRequest $queryBuildBits $formDig)

BUT, we specifically want the returned results for our requested query, so we can gather that from the data tree as follows:

$srchRes = (Get-RESTRequest $queryBuildBits $formDig).d.query.PrimaryQueryResult.RelevantResults.Table.Rows.results.SyncRoot
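
Each returned row is not a flat object: under the odata=verbose JSON format, a row carries a Cells.results list of Key/Value pairs. As a sketch, assuming that shape, the rows can be flattened into ordinary objects like this. The function name ConvertTo-RowObjectSketch and the sample JSON are invented for illustration.

```powershell
# Hypothetical helper: turn one search result row into a flat object.
function ConvertTo-RowObjectSketch {
    param([Parameter(Mandatory)] $Row)
    $props = [ordered]@{}
    foreach ($cell in $Row.Cells.results) {
        $props[$cell.Key] = $cell.Value
    }
    return [pscustomobject]$props
}

# Trimmed-down sample of the assumed response shape (one row, two cells).
$sampleJson = @'
{"d":{"query":{"PrimaryQueryResult":{"RelevantResults":{"Table":{"Rows":{"results":[
  {"Cells":{"results":[
    {"Key":"Path","Value":"https://sharepoint.example.com/sites/a","ValueType":"Edm.String"},
    {"Key":"contentclass","Value":"STS_Site","ValueType":"Edm.String"}
  ]}}
]}}}}}}}
'@
$rows  = ($sampleJson | ConvertFrom-Json).d.query.PrimaryQueryResult.RelevantResults.Table.Rows.results
$sites = @($rows | ForEach-Object { ConvertTo-RowObjectSketch -Row $_ })
```

After this, $sites[0].Path and $sites[0].contentclass read like normal properties, which makes later filtering or export simpler.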

SWEET!!!!! By now, you should have the first set of results!  If you received fewer results than the batch size you specified above, you are done and can process the data accordingly.  If the batch came back full (exactly the number you specified), there may be more results waiting, so you will need to gather these results and then submit your request for the next batch.

To do so, we first gather each result in this batch into a collection. Note, I won’t go into the performance benefits in this post, but I tend to deal in exceptionally large data structures that use .NET queues as seen in this example.  You can use whatever hashtable or array structure you are comfortable with.

if( ($null -ne $srchRes) -and
    ($srchRes.SyncRoot.Count -gt 0)
){
  $srchRes.SyncRoot | %{
      $global:qSPSiteURLs.Enqueue($_)
  }
}
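
The snippet above expects $global:qSPSiteURLs to exist already. As a minimal sketch, assuming a plain non-generic .NET queue (the script may well use a different queue type), creating, filling, and draining it could look like this. The $batch values are stand-ins for search results.

```powershell
# Assumption: a non-generic queue; the script may use a different queue type.
$global:qSPSiteURLs = New-Object System.Collections.Queue

# Stand-in for one batch of search results.
$batch = @('https://sharepoint.example.com/sites/a', 'https://sharepoint.example.com/sites/b')
$batch | ForEach-Object { $global:qSPSiteURLs.Enqueue($_) }

# Later, drain the queue in the order the items were added.
while ($global:qSPSiteURLs.Count -gt 0) {
    $next = $global:qSPSiteURLs.Dequeue()
    # process $next here
}
```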

Now that we’ve gathered this batch of results to our collection, we can increment our starting row variable, then repeat the Search API call with the adjusted starting location.

if( ($srchRes.SyncRoot.count -eq $rowsToRet)
){
    $rowNum = $rowNum + $rowsToRet
}
else{
    $rowNum = 0
}

You might be asking how to kick the next request off.  In the example script provided, all of this is wrapped in a while loop that keeps running until our $rowNum variable is reset to 0:

while ($rowNum -ne 0){
    #put all logic in here 
}
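
To see the batching logic end to end without a farm, here is a self-contained sketch that swaps the Search API for a fake in-memory source. It is a variation on the approach above, not the script's code: it assumes the start row is zero-based, so it begins at 0 and tracks completion with a separate $more flag instead of using 0 as the stop signal. Get-FakeSearchBatch and the siteN values are invented for illustration.

```powershell
# Stand-in for the Search API: returns up to $RowLimit items starting at $StartRow.
function Get-FakeSearchBatch {
    param([int] $StartRow, [int] $RowLimit)
    $all = 1..7 | ForEach-Object { "site$_" }
    return @($all | Select-Object -Skip $StartRow -First $RowLimit)
}

$rowsToRet = 3
$rowNum    = 0          # assumption: zero-based start row
$collected = New-Object System.Collections.Queue
$more      = $true

while ($more) {
    $batch = @(Get-FakeSearchBatch -StartRow $rowNum -RowLimit $rowsToRet)
    $batch | ForEach-Object { $collected.Enqueue($_) }
    # A full batch suggests more rows may follow; a short one means we are done.
    $more    = ($batch.Count -eq $rowsToRet)
    $rowNum += $rowsToRet
}
```

With seven fake sites and a batch size of three, the loop makes three requests (3, 3, and 1 results) and then stops. If the total happens to be an exact multiple of the batch size, one extra request returns an empty batch and ends the loop.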

To summarize, this example has illustrated a method that has saved me countless hours and tremendous amounts of brain damage when trying to stitch together large datasets from multiple farms plus SharePoint Online.  I hope this helps to get you started thinking about how to leverage the SharePoint Search platform to do much of the ‘hard work’ in the collection and aggregation of typed objects throughout your entire content footprint.