Saturday, April 7, 2012

Trouble installing perl CPAN Modules on Mac OS X

I recently ran into problems trying to install Perl CPAN modules. I searched with keywords like the following to find answers:

mac won't install CPAN modules

Finally, I decided to upgrade the Perl installation on my machine from 5.8 to 5.12. This fixed everything: I can now install Perl modules using CPAN without problems. Here is how I did it:

(ref: http://stackoverflow.com/questions/3942520/how-do-i-upgrade-my-macports-perl-installation ; note that I used MacPorts to do the installation)

sudo port uninstall -f perl5.8
sudo port install perl5 +perl5_12
sudo port -f activate perl5.12

You can check the installation by typing:

perl -v
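
To confirm that the perl found first on the PATH is really the new one, the check can be scripted. This is a minimal sketch, not part of the original steps: version_at_least is a hypothetical helper name, and it assumes version strings of the form major.minor[.patch].

```shell
# Hypothetical helper: succeeds when version $1 (major.minor[.patch])
# is at least version $2 (major.minor). Only major and minor are compared.
version_at_least() {
    have_major=${1%%.*}
    rest=${1#*.}
    have_minor=${rest%%.*}
    want_major=${2%%.*}
    rest=${2#*.}
    want_minor=${rest%%.*}
    [ "$have_major" -gt "$want_major" ] && return 0
    [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}

# Report which perl is picked up first and whether it is 5.12 or newer.
if command -v perl >/dev/null 2>&1; then
    current=$(perl -e 'printf "%vd", $^V')
    if version_at_least "$current" 5.12; then
        echo "perl $current at $(command -v perl) is new enough"
    else
        echo "perl $current at $(command -v perl) is older than 5.12"
    fi
fi
```

If this reports an older perl, a different installation (for example the system one) is probably ahead of the MacPorts one on the PATH.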

So far, it works well.

Now, I want to be able to pull content (text) from PDFs. I have come across the following modules:

(ref: http://www.perlmonks.org/?node_id=634794, http://stackoverflow.com/questions/5977969/how-to-parse-pdf-files-in-perl)

PDF::Parse
PDF::API2
PDF

I take it that these produce XML from the PDF content, which one can then parse using the following modules:

(ref: http://stackoverflow.com/questions/5977969/how-to-parse-pdf-files-in-perl)

XML::Twig
XML::Simple

I haven't started pulling PDF content yet, though. I think I will just use pdftohtml, as the link above suggests:

sudo port install pdftohtml

(I could also try xpdf)
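
Once pdftohtml is installed, a conversion to XML might look like the sketch below. I have not run this as part of the steps above: pdf_to_xml is a hypothetical wrapper name, report.pdf is a placeholder file, and the sketch assumes that pdftohtml writes its XML output next to the input using the base name passed as the last argument. The -xml flag asks for XML output and -i skips images.

```shell
# Hypothetical wrapper: convert one PDF to XML with pdftohtml.
# Assumption: the output lands in <base>.xml, where <base> is the
# input path without its .pdf extension.
pdf_to_xml() {
    pdf=$1
    base=${pdf%.pdf}
    if ! command -v pdftohtml >/dev/null 2>&1; then
        echo "pdftohtml not found; try: sudo port install pdftohtml" >&2
        return 1
    fi
    # -xml: XML output for post-processing, -i: ignore images
    pdftohtml -xml -i "$pdf" "$base"
}

# Example call (placeholder file name):
# pdf_to_xml report.pdf
```

The resulting XML file could then be fed to XML::Twig or XML::Simple.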
