Post by JohnThompsonJTSoftware939 » May 27th, 2012 5:16 am
Thank you so much for the suggestion. I took a quick look at the "lite" version of beginner lesson 2, and it works fine for copying and pasting, and for extracting text. Ironically, the "lite" versions are much bigger files (perhaps they have more font information embedded?), but I don't mind, as long as I can get the text out of them.
Having dug into it a little more, apparently you can create PDFs that use font glyph IDs instead of standard character codes, so the PDFs don't contain any usable text unless a special character map (CMap) is available to convert the IDs to characters. Some text extraction tools (Adobe's or otherwise) actually run optical character recognition on the rendered glyphs to convert them back to text. It is just unbelievable that Adobe would do something so short-sighted. I wish they would just use UTF-8 encoding and be done with it.
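For anyone curious, here is a rough Python sketch of how I understand that character map to work. The CMap fragment and glyph IDs below are made up for illustration (they are not taken from the lesson PDFs), and the function names `parse_bfchar` and `decode_glyphs` are my own. It only handles the simple `beginbfchar` ... `endbfchar` entries and ignores `bfrange`, so treat it as a toy rather than a real extractor.

```python
import re

# Hypothetical ToUnicode CMap fragment. Each pair maps a glyph code (left)
# to a Unicode value encoded as UTF-16BE hex (right). Values are invented.
SAMPLE_CMAP = """
2 beginbfchar
<0003> <0020>
<002B> <0048>
endbfchar
3 beginbfchar
<0048> <0065>
<004F> <006C>
<0052> <006F>
endbfchar
"""

def parse_bfchar(cmap_text):
    """Build a {glyph_code: unicode_string} dict from bfchar sections."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        pairs = re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section)
        for src, dst in pairs:
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def decode_glyphs(hex_string, mapping, code_bytes=2):
    """Decode a hex string of fixed-width glyph codes using the mapping.

    Assumes every code is `code_bytes` wide (2 here); codes with no
    mapping become U+FFFD, which is what "no usable text" looks like.
    """
    raw = bytes.fromhex(hex_string)
    chars = []
    for i in range(0, len(raw), code_bytes):
        code = int.from_bytes(raw[i:i + code_bytes], "big")
        chars.append(mapping.get(code, "\ufffd"))
    return "".join(chars)

if __name__ == "__main__":
    cmap = parse_bfchar(SAMPLE_CMAP)
    # The glyph codes as they might appear in a page's text-showing operator.
    print(decode_glyphs("002B0048004F004F0052", cmap))
```

Without the map, the same bytes are just glyph numbers, which is why copy and paste gives garbage (or why some tools fall back to OCR).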
Thanks again!
-John