Hi, I'm working on a small C++ project on Linux. I need to search ten or more PDF files and find specific data in them. How can I do this? Here is an example to make the question clearer:

Suppose I have ten textbooks, all about C++, and I need information on the topic of arrays. How can I search the PDFs and find that information?

Thanks in advance.

asked 07 Jun '10, 14:41

dili

accept rate: 0%

edited 07 Jun '10, 15:54


jeremy ♦♦

Please accept an answer so the question/answer can be finished. Or provide more details so we can help.

(20 Apr '11, 13:55) rfelsburg ♦

Not the cleanest method, but you can use:

pdftotext file.pdf -

This converts the PDF to a text stream on stdout, which you can then pipe into whatever text manipulation commands you like. To write the text to a file instead, replace - with a filename:

pdftotext file.pdf file.txt
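Since the project is in C++, here is a minimal sketch of how the same idea could be driven from code: run pdftotext on a file, read its stdout through a pipe, and collect the lines that contain a keyword. It assumes pdftotext is installed and on the PATH and that POSIX popen is available (it is on Linux). The names Match, find_matches, shell_quote and search_pdf are made up for this illustration, and the search is a plain case-sensitive substring match.

```cpp
#include <cstddef>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One matching line plus its 1-based line number in the extracted text.
struct Match {
    std::size_t line_number;
    std::string text;
};

// Scan a text stream line by line and collect every line that contains
// `keyword` (case-sensitive substring match).
std::vector<Match> find_matches(std::istream& in, const std::string& keyword) {
    std::vector<Match> matches;
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        ++n;
        if (line.find(keyword) != std::string::npos) {
            matches.push_back({n, line});
        }
    }
    return matches;
}

// Wrap a path in single quotes for /bin/sh so that spaces and other
// special characters in file names do not break the command line.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') {
            out += "'\\''";  // close quote, escaped quote, reopen quote
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Run "pdftotext <file> -" and search the text it prints on stdout.
// Returns an empty result if the command cannot be started.
std::vector<Match> search_pdf(const std::string& pdf_path,
                              const std::string& keyword) {
    const std::string cmd = "pdftotext " + shell_quote(pdf_path) + " -";
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) {
        return {};
    }
    std::string text;
    char buf[4096];
    std::size_t got = 0;
    while ((got = fread(buf, 1, sizeof buf, pipe)) > 0) {
        text.append(buf, got);
    }
    pclose(pipe);

    std::istringstream in(text);
    return find_matches(in, keyword);
}
```

To cover all ten books, call search_pdf once per file name (for example, for each argument passed to main) and print the file name together with each Match. Note that PDF text extraction can split or reorder lines, so a keyword that spans a line break will be missed by this simple approach.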



answered 07 Jun '10, 15:54


jeremy ♦♦
accept rate: 37%

Does pdftotext keep or insert tags that make it easy to find chapter, section, or topic entries?

(07 Jun '10, 18:09) pmarini

No, pdftotext simply converts a PDF to plain text. If you'd like something more in-depth, you may want to set up Lucene, ht://Dig, or some other indexing method that supports PDF.

(08 Jun '10, 01:06) jeremy ♦♦
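To give a rough idea of what "indexing" means here, below is a minimal sketch of an inverted index in C++: a map from each word to the set of documents that contain it. This is not how Lucene or ht://Dig are implemented, only an illustration of the concept. The names Index, normalize, add_document and lookup are invented for this example, and the text passed in is assumed to have been extracted from the PDFs already (for instance with pdftotext).

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// word -> names of the documents in which the word occurs
using Index = std::map<std::string, std::set<std::string>>;

// Lower-case a token and drop everything that is not a letter or digit,
// so that "Array," and "array" end up as the same index key.
std::string normalize(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (std::isalnum(c)) {
            out += static_cast<char>(std::tolower(c));
        }
    }
    return out;
}

// Split the document text on whitespace and record each word.
void add_document(Index& index, const std::string& doc_name,
                  const std::string& text) {
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        const std::string word = normalize(token);
        if (!word.empty()) {
            index[word].insert(doc_name);
        }
    }
}

// Return the documents containing `word`, or an empty set if none do.
std::set<std::string> lookup(const Index& index, const std::string& word) {
    const auto it = index.find(normalize(word));
    return it == index.end() ? std::set<std::string>{} : it->second;
}
```

Building the index costs one pass over every document, but afterwards each query is a single map lookup instead of a rescan of all the files. A real indexer adds much more on top of this (stemming, ranking, positions within the document), which is why a ready-made engine is usually the better choice.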

The Lucene engine supports PDF searching. Some FLOSS projects use the Lucene Engine:

  • OpenCms
  • regain

It uses the Solr subproject for pdf searching.

This question will probably help you :)
"How can I search PDF?" on Stack Overflow

Good luck!


answered 07 Jun '10, 16:36

guerda

accept rate: 38%

