Tuesday, September 8, 2009

Office 2003 files are second class citizens in SharePoint 2007

There are many ways to upload files to SharePoint 2007, and they all seem to work well. Unless you're using Office 2003 or earlier file formats.

Files can be uploaded to SharePoint in many ways, including the upload page, FrontPage Server Extensions remote procedure calls (RPC), and WebDAV through Windows Explorer. Microsoft Office, of course, uses its own proprietary methods, which may involve a combination of web service and RPC calls depending on the version of Office you are using. Many companies have upgraded to SharePoint 2007 and Office 2007; however, many files in older Office formats still exist, and companies are putting these documents into SharePoint 2007 without converting them to the Office 2007 format. This seems reasonable until you need to view and edit a document's properties.

The catch is "property demotion". SharePoint has a process of property "promotion" and "demotion". When you upload a document, its properties get "promoted" to the matching SharePoint columns. Opening the document from SharePoint then "demotes" the SharePoint column values back into the matching document properties so they can be viewed in the native application. SharePoint 2007 seems to have a problem with the demotion process for Office 2003 documents.

The main problem shows up when using the FrontPage RPC method "put document". This method makes a direct call to author.dll, uploading the file binary along with the properties you wish to populate. However, uploading Office 2003 files using this method causes an error on SharePoint 2007. After uploading a file, you can examine the metadata generated for the document in the "AllDocs" table of the corresponding content database in SQL Server. Below is an example of what is generated in the metaInfo column of the table when you upload a file.

 

Subject:SW|
vti_error0:SX|Could not process the file mosstestsearch/test.doc as a Microsoft Office document.
vti_parserversion:SR|12.0.0.6219
Keywords:SW|
vti_cachedcustomprops:VX|Subject Keywords _Author _Category _Comments
vti_modifiedby:SR|BASESMCDEV\\steve.curran
vti_title:SW|foreign
ContentType:SW|Document
ContentTypeId:SW|0x01010017483E443739384889CC3E4B64BC3B6B
_Author:SW|joe.smith
_Category:SW|
vti_error:IX|1
_Comments:SW|
vti_author:SR|BASESMCDEV\\steve.curran
testcol:SR|STUFF
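
If you want to check metaInfo dumps like the one above for parser errors, here is a minimal Python sketch. It assumes each entry has the form name:XX|value, as in the dump, and treats the two-character code before the pipe as an opaque tag. The function names and the shortened sample are made up for illustration and are not part of any SharePoint API.

```python
# A shortened copy of the metaInfo text shown above (sample for illustration only).
SAMPLE = """\
Subject:SW|
vti_error0:SX|Could not process the file mosstestsearch/test.doc as a Microsoft Office document.
vti_cachedcustomprops:VX|Subject Keywords _Author _Category _Comments
vti_title:SW|foreign
vti_error:IX|1
testcol:SR|STUFF
"""


def parse_meta_info(text):
    """Parse 'name:XX|value' lines into a dict of {name: (code, value)}."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ':' only, because values may contain colons.
        name, colon, rest = line.partition(":")
        code, bar, value = rest.partition("|")
        if not colon or not bar:
            continue  # skip anything that is not a 'name:XX|value' entry
        props[name] = (code, value)
    return props


def parser_error_messages(props):
    """Return the values of vti_error0, vti_error1, ... entries."""
    prefix = "vti_error"
    return [
        value
        for name, (_code, value) in props.items()
        if name.startswith(prefix) and name[len(prefix):].isdigit()
    ]


props = parse_meta_info(SAMPLE)
for message in parser_error_messages(props):
    print(message)
```

Run against the sample, this prints the "Could not process the file" message, which is a quick way to spot documents that the SharePoint parser rejected.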

 

You can see that when uploading Office 2003 files, SharePoint adds a metadata error entry, "vti_error0", stating "Could not process the file as a Microsoft Office document". Hmmm. I do believe this is an Office document, just not an Office 2007 document. The document's properties are promoted correctly, because I can go into the SharePoint UI and view them. For instance, vti_title maps to the title property and testcol maps to the testcol property. Unfortunately, if I open this document in Office 2007, none of the properties are visible. So why are these properties not getting demoted correctly? The key seems to lie with the "vti_cachedcustomprops" property. You can see this is a space-delimited list of property names, and that neither vti_title nor testcol appears in it. Apparently, this list is what Office 2007 uses to read properties from Office 2003 documents when opening them from SharePoint. I tested this by editing the properties in the SharePoint UI, which resulted in the following metadata being inserted into the metaInfo column of the AllDocs table.

 

vti_lmt:SW|Tue, 26 Feb 2008 19:36:47 GMT
vti_parserversion:SR|12.0.0.6219
_Category:SW|
vti_author:SR|BASESMCDEV\\steve.curran
vti_approvallevel:SR|
vti_categories:VW|
vti_assignedto:SR|
Keywords:SW|
vti_cachedcustomprops:VX|Subject  _Category testcol _Comments vti_approvallevel vti_categories vti_assignedto Keywords vti_title _Author
vti_modifiedby:SR|BASESMCDEV\\steve.curran
vti_title:SR|foreign
ContentType:SW|Document
ContentTypeId:SW|0x01010017483E443739384889CC3E4B64BC3B6B
vti_lat:SW|Tue, 26 Feb 2008 19:36:47 GMT
vti_ct:SW|Thu, 14 Feb 2008 16:29:48 GMT
vti_cachedtitle:SR|foreign
_Author:SW|joe.smith
testcol:SR|STUFF2
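
To make the difference between the two dumps concrete, here is a small Python sketch that compares the two "vti_cachedcustomprops" values. The strings are copied from the dumps above; the helper names and the list of columns to check are hypothetical. It rests on my working assumption that a column missing from this list is not demoted into the document.

```python
# vti_cachedcustomprops values copied from the two metaInfo dumps above.
AFTER_RPC_UPLOAD = "Subject Keywords _Author _Category _Comments"
AFTER_UI_EDIT = (
    "Subject  _Category testcol _Comments vti_approvallevel "
    "vti_categories vti_assignedto Keywords vti_title _Author"
)


def cached_props(value):
    """Split the space-delimited list into a set (extra spaces are ignored)."""
    return set(value.split())


def not_cached(columns, cached_value):
    """Return the columns that are absent from the cached property list."""
    cached = cached_props(cached_value)
    return [column for column in columns if column not in cached]


# Columns we care about seeing in Office 2007 (chosen for illustration).
columns = ["vti_title", "testcol", "_Author"]

print(not_cached(columns, AFTER_RPC_UPLOAD))  # ['vti_title', 'testcol']
print(not_cached(columns, AFTER_UI_EDIT))     # []
```

After the RPC upload, vti_title and testcol are missing from the list; after the edit in the SharePoint UI, every column is present.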

 

After modifying the properties via the SharePoint UI, the "vti_cachedcustomprops" property is updated with the custom columns. Now when I open the document in Office 2007, the property values show up because the columns in question are in this property list. It appears that property demotion of Office 2003 files only occurs after the properties have been modified via the object model. The proof of this can be seen in two additional cases. The first case is when you upload an Office 2003 document using the SharePoint upload page. Uploading the file without entering any properties results in the standard "vti_cachedcustomprops" property seen in the first example. The second case is adding an Office 2003 file via WebDAV by dragging it into a document library folder in Windows Explorer.

So what does the object model do? If you reflect the SPListItem.Update method and follow the code path, you come to an internal method, PrepareItemForUpdate, which is called before the AddOrUpdateItem method. Unfortunately, the PrepareItemForUpdate method is obfuscated. Dead end.

I tried getting around this by creating a "vti_cachedcustomprops" property and sending it with the other properties when uploading via RPC. Unfortunately, it is either ignored or simply overwritten.

Many companies are now complaining that Office 2003 files are broken in SharePoint 2007 when editing properties in Office 2007. I would have to agree with this assessment. Users are not aware of this "property demotion" process and open the files from SharePoint in Office 2007. They make changes to the file and never look at the properties. They then save the file back to SharePoint, and all the properties are overwritten with blank values because the existing properties were never demoted. So if a user has painstakingly categorized and indexed a document so it can be found easily via SharePoint Search, all of that work is blown away, rendering the document unretrievable in most cases. This problem could also wreak havoc on workflows, information policies, auditing and item event handlers. This is what you call a "soft data error", where data is changed with no notification, so metadata could keep being lost for a long time before anyone becomes aware of the problem.

I am not sure there is a solution other than to convert your documents to Office 2007 format. The property demotion works fine with these types of documents :-) I think the problem deserves Microsoft's immediate attention.

4 comments:

Anonymous said...

Great article! Helps me much to understand what is going on ...

Again! Thanks!!

Anonymous said...

Thank you for this great insight. I just found out about property promotion/demotion very recently, but this one file then stumped me, in that it actually prevented the upload.

And wow.. this is from 2009 and still a problem today.

Anonymous said...

Hi, I am hoping you can help me. I am looking for a definitive reference to the data type information before the metadata values, i.e.:
SR|
SW|
VW|
Obviously the first character is the data type, the second is readable / writable / system, and the last is a delimiter; however, I'm looking for a reference on the types of characters in the first column. Would you know where I can find this reference?

Joel Plaut said...

After much research, I found how to disable the Document Parser:
http://reality-tech.com/2014/01/14/working-around-the-sharepoint-document-parser-for-office-2003-documents/
