Thursday, May 9, 2013

Understanding and Getting SharePoint 2013 Search Managed Properties to Work

Much has changed in search between SharePoint 2010 and SharePoint 2013. However, managed properties are still the mainstay of a robust search solution, allowing users to fine-tune their searches using refiners or property restrictions. SharePoint 2013 has a new focus on content-driven search, which makes understanding how managed properties work even more important. Unfortunately, the documentation on what crawled properties represent in the search schema has not improved. This leads to confusion and an unnecessary trial-and-error process to get the search schema to return the correct results.

Crawled properties are the properties generated from crawling the content in SharePoint. These are then mapped to managed properties, sometimes automatically and often by a search administrator. TechNet has a good explanation of this process at http://technet.microsoft.com/en-us/library/jj613136.aspx. In this post I will explain some of the strange behavior you may encounter when using SharePoint 2013 Search out of the box, and ways to fix it.

What happened to my Auto-Generated managed properties?

SharePoint 2010 has the ability to automatically generate managed properties. This is done by editing the crawled property category and selecting the option to auto-generate managed properties when new crawled properties are discovered. The auto-generated managed properties always matched the type of the crawled property, so if a new date time SharePoint column was discovered, the crawl process would generate a date time managed property. SharePoint 2013 has put a new twist on this. A new managed property is only auto-generated when a new site column is created and used. By “used” I mean that a value must be stored in the content database using the new site column. A second twist is that two new crawled properties are created for each site column: one in the format ows_q_{Data Type Code}_{Site Column Internal Name} and another in the format ows_{Site Column Internal Name}. For example, a date time site column named salesdate would generate the crawled properties ows_q_DATE_salesdate and ows_salesdate.

The auto-generated managed property is named {Site Column Internal Name}OWS{Data Type Code}, that is, the three parts concatenated (for example, salesdateOWSDATE). Some examples of the data type codes are DATE, BOOL, CHCS (CHOICE), and GUID. Unfortunately, the new managed property is always generated as a text data type. This is apparently by design, for reasons that are not documented. A problem arises if you want to use the auto-generated property in a property restriction within a query transformation. For example, if the site column is a date time type and you use operators like < or > with the auto-generated text managed property, the query will produce a syntax error. The auto-generated managed properties are therefore of little use when they are mapped to non-text crawled properties.

You should avoid using the auto-generated managed property. Instead, create your own managed property with the correct data type and map it to the crawled property that has only the “ows_” prefix, in this case ows_salesdate.
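The naming patterns described above can be sketched in a few lines of Python. This is only an illustration of the string formats given in this post, not something SharePoint exposes; the function name generated_names and the GeneratedNames type are hypothetical, and the set of type codes contains only the ones mentioned here.

```python
from dataclasses import dataclass

# Data type codes mentioned in this post; the real list is longer.
KNOWN_TYPE_CODES = {"DATE", "BOOL", "CHCS", "GUID"}


@dataclass(frozen=True)
class GeneratedNames:
    """Names the post says appear in the schema for one site column."""
    typed_crawled_property: str   # ows_q_{code}_{name}
    plain_crawled_property: str   # ows_{name}; map your own managed property to this one
    auto_managed_property: str    # {name}OWS{code}; always a text property


def generated_names(internal_name: str, type_code: str) -> GeneratedNames:
    """Build the three names from a site column internal name and a type code."""
    if type_code not in KNOWN_TYPE_CODES:
        raise ValueError(f"type code not covered by this sketch: {type_code}")
    return GeneratedNames(
        typed_crawled_property=f"ows_q_{type_code}_{internal_name}",
        plain_crawled_property=f"ows_{internal_name}",
        auto_managed_property=f"{internal_name}OWS{type_code}",
    )


names = generated_names("salesdate", "DATE")
print(names.typed_crawled_property)
print(names.plain_crawled_property)
print(names.auto_managed_property)
```

A helper like this can be handy when you need to find the right crawled property for a given site column in the search schema pages.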

I am not sure why the auto-generation process is set up this way, and there seems to be no option to turn it off.

Where is my title for my Microsoft Word file?

One of the biggest problems I am hearing about SharePoint 2013 search is that the title managed property for Microsoft Word files shows the first line of the document's content and not the title given to it when it was uploaded to SharePoint. In SharePoint 2010 you could turn this behavior off by setting enableOptimisticTitleOverride to zero under the HKLM\SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager registry key. This no longer works in SharePoint 2013. The workaround is to create a new managed property and map the TermTitle and Title crawled properties to it. Trying to do this on the out-of-the-box Title managed property has no effect. This behavior is very inconvenient for search administrators, and once again no reason for it has been given.

What is going on with the ContentType managed property?

Searching on the ContentType managed property in SP2010 did not work since it was not retrievable. In SP2013 the ContentType managed property can be retrieved, but using it to search for a content type name does not work. The ContentType managed property is mapped to two crawled properties, ows_ContentType and Basic:5, so SP2013 stores both the content type name and the MIME type name in this managed property.

The best way to search on the name of a content type is to use the new SPContentType managed property, which is mapped only to the ows_ContentType crawled property. Make sure to mark this managed property as Retrievable if you want to display it in the search results.
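As a rough illustration, a query against SPContentType is an ordinary KQL property restriction. The Python helper below only builds the query text; the function name kql_restriction and the content type name "Sales Report" are made up for this example, and the sketch simply refuses values containing double quotes rather than trying to escape them.

```python
def kql_restriction(managed_property: str, value: str, exact: bool = False) -> str:
    """Build a KQL property restriction such as SPContentType:"Sales Report".

    ':' matches when the property contains the value, '=' when it equals it.
    """
    if '"' in value:
        raise ValueError("value must not contain double quotes")
    operator = "=" if exact else ":"
    return f'{managed_property}{operator}"{value}"'


# "Sales Report" is a hypothetical content type name.
print(kql_restriction("SPContentType", "Sales Report"))
print(kql_restriction("SPContentType", "Sales Report", exact=True))
```

The resulting text can be typed into the search box or used in a query transformation.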

Why does my PDF file’s last modified date never change?

After uploading and crawling a PDF file, the LastModifiedTime managed property value never changes. This is caused by the managed property storing the modified value saved inside the document itself. To fix this problem you must re-order the crawled property mappings of the LastModifiedTime managed property. First, check the option “Include content from the first crawled property that is not empty, based on the specified order”. Then make the ows_Modified crawled property first in the list. Finally, do a full crawl. It appears that the LastSavedDateTime crawled property represents the last modified property in the document.
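The effect of the “first crawled property that is not empty” option can be sketched as a simple ordered lookup. This is a model of the behavior as described in this post, not SharePoint's actual implementation; the function name first_non_empty, the sample dates, and the "before" mapping order are assumptions made for illustration.

```python
from typing import Mapping, Optional, Sequence


def first_non_empty(
    crawled_values: Mapping[str, Optional[str]],
    mapping_order: Sequence[str],
) -> Optional[str]:
    """Return the value of the first mapped crawled property that is not empty."""
    for name in mapping_order:
        value = crawled_values.get(name)
        if value is not None and value.strip():
            return value
    return None


# Made-up values for one crawled PDF file.
pdf_item = {
    "LastSavedDateTime": "2012-01-15T08:30:00Z",  # date stored inside the document
    "ows_Modified": "2013-05-09T10:00:00Z",       # date from the SharePoint list item
}

# Assumed order before the fix: the in-document date wins.
before = first_non_empty(pdf_item, ["LastSavedDateTime", "ows_Modified"])
# Order after the fix: ows_Modified is first in the list.
after = first_non_empty(pdf_item, ["ows_Modified", "LastSavedDateTime"])
print(before)
print(after)
```

The same ordered lookup applies to any managed property that has this option checked, which is why the order of the mappings matters.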

Who is creating these documents?

When searching, you may want to see who created the document in your search results. Unfortunately, the Author managed property brings back a string that contains the claims identifier for the user who created the document.

This of course is not very user friendly. I looked at the CreatedBy managed property, but it does not return anything. To fix this, add ows_created_x0020_by and move Internal:105 and Internal:3 to the top of the crawled property mappings, and make sure the “Include content from the first crawled property that is not empty, based on the specified order” option is checked. Then do a full crawl.

You still need to experiment with the SharePoint 2013 search schema

The documentation on what crawled properties represent in SharePoint search is still missing after all these years. This is unfortunate because it causes so many unnecessary full crawls. How would anyone know what Internal:5 or Basic:105 represents? This prevents search administrators from building an effective search schema. One possible way of documenting these properties is to plug into the new Content Enrichment web service and record what some of the unknown crawled properties contain. Sounds like fun. Maybe I will post the results here.

11 comments:

Anonymous said...

Very informative, thanks for sharing. Also is there any chance you could include the full post detail in the RSS feed?

Mikael Svenson said...

Very good post indeed. I missed it when you first wrote it and covers a lot of what I have encountered as well.

Anonymous said...

Very nice post! Thank you for sharing.

Do you already have a solution for the enableOptimisticTitleOverride problem in SharePoint 2013?

Dominic Goulet said...

Hello,

I tried the custom title managed property and it still tries to fetch the first line for Word documents. This is pretty annoying... Thanks though for the article, it confirmed some thoughts we had.

Anonymous said...

I have been in conversations with the product team at Microsoft and I've been told that the enableOptimisticTitleOverride issue will be addressed in a cumulative update.

Tech me how to code said...

Good research work. Keep doing good work. Thanks for clearing my point.

Unknown said...

You deserve a medal for the "Where is my title for my Microsoft Word file?" section. I was troubleshooting a customized environment for two weeks. I never thought the issue was because of an OOB SharePoint fail.

Lalit said...

Can anybody please explain fix of "Where is my title for my Microsoft Word file?"

maj said...

This is great! Thank you for this info! The part about the MS Word file title saved me hours (if not days) of research.

Mahesh Ronda said...

This is really informative, more useful than any MSDN documentation around search schema.

Mahesh Ronda said...

@Steve, very nice post, more informative than the MSDN articles on the 2013 search schema. Resolved a lot of queries I had.
