Missing Metadata in SharePoint with PDFs
I recently got a call from my end users stating that their metadata was mysteriously disappearing when they worked with PDFs stored in SharePoint.
Take the following scenario. A user creates a PDF by scanning a paper document using Adobe Acrobat Professional. They then upload that document into SharePoint and assign it the required metadata (shown below).
A few days later, the user decides they need to edit the PDF. They open it by selecting Edit Document from the drop-down menu.
Once Adobe Acrobat Professional launches, they perform an OCR on the document so they can edit it. After selecting OCR Text Recognition > Recognize Text Using OCR from the Document menu, they let it grind away for a few minutes. With the OCR complete, the user finishes the edits and simply clicks Save.
After closing Adobe Acrobat Professional, the SharePoint document library refreshes and all the metadata is gone (see below).
After researching the issue, I determined that when you perform an OCR on a document that has more than one page, Adobe Acrobat will actually delete the original document (thus removing all metadata associated with it) and create a brand spanking new file in its place (the evidence of this can be found by looking in the SharePoint recycle bin).
A similar test on a file stored on a local drive had the same result. If you right-click a PDF document, go to Properties, fill out the Summary tab, and then perform the steps above, you will also lose your metadata.
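To make the difference concrete, here is a toy sketch in Python of why a delete-and-recreate save loses metadata while an in-place save does not. It is only a model of the behavior I observed: ToyLibrary, its methods, and the "Department" column are made up for illustration and are not part of any SharePoint or Acrobat API.

```python
# Toy model only: ToyLibrary and its methods are hypothetical, not a SharePoint API.
class ToyLibrary:
    def __init__(self):
        self.items = {}        # file name -> {"content": bytes, "metadata": dict}
        self.recycle_bin = []  # deleted items end up here

    def upload(self, name, content, metadata=None):
        # A new item starts with only the metadata supplied at upload time.
        self.items[name] = {"content": content, "metadata": dict(metadata or {})}

    def overwrite(self, name, content):
        # In-place save: same item, new content, metadata untouched.
        self.items[name]["content"] = content

    def delete(self, name):
        # The item, metadata included, moves to the recycle bin.
        self.recycle_bin.append((name, self.items.pop(name)))


library = ToyLibrary()
library.upload("scan.pdf", b"page images", {"Department": "Finance"})

# What a normal save looks like: the item survives.
library.overwrite("scan.pdf", b"page images, edited")
metadata_after_overwrite = dict(library.items["scan.pdf"]["metadata"])

# What the multi-page OCR save appears to do: delete, then create a new file.
library.delete("scan.pdf")
library.upload("scan.pdf", b"page images plus OCR text")
metadata_after_recreate = dict(library.items["scan.pdf"]["metadata"])

print(metadata_after_overwrite)  # {'Department': 'Finance'}
print(metadata_after_recreate)   # {}
print(len(library.recycle_bin))  # 1
```

The new file has the same name as the old one, but as far as the library is concerned it is a different item, so nothing carries over. The old item, with its metadata, is the one sitting in the recycle bin.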
I called Adobe support about this issue, and they first responded with “Please explain how SharePoint works, we are unfamiliar with it”. After my explanation and several rounds of being put on hold, I finally got this response: “We believe the problem is caused by SharePoint.”
I tried numerous times to convince the phone support operator that it was not a SharePoint-specific problem, but I was unsuccessful.
I came up with a solution that meets the needs of my users and thought I would share it. It is not revolutionary, but it uses built-in SharePoint functionality.
If your users need to manipulate a PDF document in any way, have them follow these steps.
Check out the document to the local drafts folder
This process will actually put the PDF in the user's My Documents\SharePoint Drafts folder on their computer. Any further edits to the PDF by this user will be made to their local copy.
Once they have completed the edits, they simply check the file back in. This will move the file from their local computer back into the SharePoint library. This time, all PDF edits and metadata remain intact. (Yippee!)
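The check-out and check-in steps above can be sketched the same way. This is a toy model in Python, not real SharePoint code: every function name and the "Department" column are hypothetical, and the sketch assumes that check-in updates the existing library item in place, which is what the behavior I saw suggests.

```python
# Toy model only: every name here is hypothetical, not a SharePoint or Acrobat API.
library = {"scan.pdf": {"content": b"page images",
                        "metadata": {"Department": "Finance"},
                        "checked_out": False}}
local_drafts = {}  # stands in for My Documents\SharePoint Drafts


def check_out(name):
    # Lock the library item and copy its content to the local drafts folder.
    library[name]["checked_out"] = True
    local_drafts[name] = library[name]["content"]


def ocr_save(folder, name):
    # Mimic the observed Acrobat behavior: delete the file, then create a new one.
    old = folder.pop(name)
    folder[name] = old + b" plus OCR text"


def check_in(name):
    # Assumption: check-in updates the existing item's content in place.
    library[name]["content"] = local_drafts.pop(name)
    library[name]["checked_out"] = False


check_out("scan.pdf")
ocr_save(local_drafts, "scan.pdf")   # delete-and-recreate hits only the local copy
check_in("scan.pdf")

print(library["scan.pdf"]["content"])   # b'page images plus OCR text'
print(library["scan.pdf"]["metadata"])  # {'Department': 'Finance'}
```

The point of the sketch is that the delete-and-recreate only ever touches the local draft. The library item itself is never deleted, so its metadata has nothing to lose.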
I have mainly seen this problem occur with the OCR process and sometimes with the amend process when creating a PDF from a scanner. To be safe, I have recommended that my users follow the steps above every time they manipulate a PDF document. It is not perfect, but it works.