I'm using openSUSE 10.3 and would like to know of command-line tools for searching for phrases in a large number of PDF files inside a directory. On Windows XP, the Explorer search allows this but is too slow. Are there any grep tips for this?
asked 13 Jul '10, 17:35
Because the text in a PDF is usually stored as compressed data, you can't simply search it with the usual grep command. You can run strings on the file, which prints the printable ASCII sequences it finds, but this is not guaranteed to work.
There are a number of open-source tools that can be used; for instance, a script using pdftotext would be easy to implement. pdftotext is part of xpdf.
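As a rough first attempt, the strings approach might look like the sketch below. The helper name rough_pdf_grep is made up for illustration, and it will only find text that happens to be stored uncompressed in the file.

```shell
#!/usr/bin/env bash
# rough_pdf_grep PHRASE FILE   (hypothetical helper name)
# Dump the printable character runs from FILE and search them for PHRASE.
# Text held in compressed PDF streams will NOT be found this way.
rough_pdf_grep() {
    strings "$2" | grep -i -- "$1"
}

# Example call (file name is a placeholder):
#   rough_pdf_grep "search phrase" document.pdf
```

If this prints nothing for a phrase you know is in the document, the text is most likely compressed and a real text extractor is needed.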
For openSUSE 10.3, the RPM is below.
Using pdftotext and a simple bash loop
So that you can understand what each part of the script does, I've broken it down as well.
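A minimal sketch of such a loop, assuming pdftotext is installed and on the PATH. The function name search_pdfs and its arguments are illustrative only and are not part of xpdf.

```shell
#!/usr/bin/env bash
# search_pdfs PHRASE [DIR]   (hypothetical helper name)
# Print the name of every PDF directly inside DIR whose text contains PHRASE.
search_pdfs() {
    local phrase=$1 dir=${2:-.} pdf
    for pdf in "$dir"/*.pdf; do
        # If no PDF matched, the glob stays literal; skip it.
        [ -e "$pdf" ] || continue
        # "-" tells pdftotext to write the text to stdout instead of a .txt file.
        # grep: -q quiet, -i ignore case, -F treat the phrase as a fixed string.
        if pdftotext "$pdf" - 2>/dev/null | grep -qiF -- "$phrase"; then
            printf '%s\n' "$pdf"
        fi
    done
}

# Example call (directory is a placeholder):
#   search_pdfs "search phrase" ~/Documents
```

Broken down: the for loop visits each .pdf file in the directory, pdftotext converts it to plain text on stdout, grep checks that text for the phrase, and printf reports the file name when grep finds a match.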
answered 13 Jul '10, 18:07
I use pdftotext for this.
One way is:
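For a single file, a minimal sketch could look like this, assuming pdftotext is on the PATH. The wrapper name grep_pdf is made up for illustration.

```shell
#!/usr/bin/env bash
# grep_pdf PHRASE FILE   (hypothetical helper name)
# Convert FILE to text on stdout ("-") and show the matching lines,
# ignoring case (-i) and prefixing each with its line number (-n).
grep_pdf() {
    pdftotext "$2" - | grep -i -n -- "$1"
}

# Example call (file name is a placeholder):
#   grep_pdf "search phrase" report.pdf
```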
If you have a large number of PDFs to search, a simple script could be written to loop over each file.
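One possible shape for such a script is sketched below. It assumes GNU find and GNU grep (the --label option is a GNU extension) and, unlike a plain glob, also descends into subdirectories. The name grep_pdfs is hypothetical.

```shell
#!/usr/bin/env bash
# grep_pdfs PHRASE [DIR]   (hypothetical helper name)
# Search every PDF under DIR, including subdirectories, and print
# "file:matching line" for each hit.
grep_pdfs() {
    local phrase=$1 dir=${2:-.} pdf
    # -print0 / read -d '' keep file names with spaces or newlines intact.
    find "$dir" -type f -iname '*.pdf' -print0 |
    while IFS= read -r -d '' pdf; do
        # --label makes grep report the PDF's name instead of "(standard input)".
        pdftotext "$pdf" - 2>/dev/null |
            grep -i -H --label="$pdf" -- "$phrase"
    done
}

# Example call (directory is a placeholder):
#   grep_pdfs "search phrase" /path/to/pdfs
```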
answered 13 Jul '10, 18:11