I recently ran into problems installing Perl modules from CPAN. I searched for answers using the following keywords:
mac won't install CPAN modules
In the end, I decided to upgrade the Perl installation on my machine from 5.8 to 5.12. This fixed everything: I can now install Perl modules using CPAN without problems. Here is how I did it:
(ref: http://stackoverflow.com/questions/3942520/how-do-i-upgrade-my-macports-perl-installation; note that I used MacPorts to do the installation)
sudo port uninstall -f perl5.8
sudo port install perl5 +perl5_12
sudo port -f activate perl5.12
You can check the installation by typing:
perl -v
So far, it works well.
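Beyond the version banner, it can help to confirm that the perl on the PATH is the new one and that a given module actually loads. The following is a minimal sketch, not part of the steps I ran above: check_module is a hypothetical helper name, and XML::Twig is used only as an example of a module you might have installed.

```shell
# Report whether a module can be loaded by whichever perl comes first in PATH.
# "check_module" is an illustrative helper name, not a standard command.
check_module() {
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: missing"
    fi
}

# Show which perl binary is picked up and its version banner, if perl is present.
if command -v perl >/dev/null 2>&1; then
    command -v perl
    perl -v
fi

# Example module; substitute any module you expect to be installed.
check_module XML::Twig
```

If the module is reported as missing, installing it with the cpan client and running the check again should show whether the upgraded Perl is the one being used.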
Now I want to be able to pull content (text) out of PDFs. I have come across the following modules:
(ref: http://www.perlmonks.org/?node_id=634794, http://stackoverflow.com/questions/5977969/how-to-parse-pdf-files-in-perl)
PDF::Parse
PDF::API2
PDF
I take it that these produce XML from the PDF content, which can then be parsed with the following modules:
(ref: http://stackoverflow.com/questions/5977969/how-to-parse-pdf-files-in-perl)
XML::Twig
XML::Simple
I haven't started pulling PDF content yet, though. I think I will just use pdftohtml, as the link above suggests:
sudo port install pdftohtml
(I could also try xpdf)
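Once pdftohtml is installed, the conversion step might look roughly like the sketch below. I have not run this yet, so treat it as an assumption about how the pieces fit together: pdf_to_xml is a hypothetical wrapper name and report.pdf is a placeholder file name. The -xml option asks pdftohtml for XML output and -i tells it to ignore images.

```shell
# Convert a PDF to XML with pdftohtml, writing <name>.xml next to the input.
# "pdf_to_xml" is an illustrative wrapper name, not part of pdftohtml itself.
pdf_to_xml() {
    pdf="$1"
    out="${pdf%.pdf}"   # strip the .pdf suffix; pdftohtml appends .xml

    # Fail early with a clear message if the tool is not installed.
    if ! command -v pdftohtml >/dev/null 2>&1; then
        echo "pdftohtml not found"
        return 127
    fi

    pdftohtml -xml -i "$pdf" "$out"
}

# Usage (placeholder file name):
#   pdf_to_xml report.pdf     # should leave report.xml in the same directory
```

The resulting XML file could then be read with XML::Twig or XML::Simple.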