LinuxQuestions.org
Old 06-02-2012, 07:38 PM   #1
rmknox
Member
 
Registered: May 2010
Posts: 354

Rep: Reputation: 34
Fedora - GNU grep 2.8 - xls files


Using GNU grep 2.8, I am trying to find text strings in xls files, with no success.

Maybe they use 16-bit characters?
Maybe I don't know how to write a regular expression that matches 16-bit characters?

I'm searching to find out whether the files in the directory contain the text string Consult.

grep -c Consult *.xls
returns all zeros, and yet I know that the string appears in 2 of the files.

grep -c 'Consult' *.xls - same deal.

I'm tired and forgetful. Is there a way, with escapes or whatever, to tell it to treat those characters as 16-bit characters? Or, more generally, what should I do (except perhaps take a nap)?

Dick

Interesting:
grep --binary -c 'Con' *.*
finds files, including the files I want, but
grep --binary -c 'Cons' *.*
does not.

And yet there are files that contain the word Consult.

Last edited by rmknox; 06-02-2012 at 08:48 PM.
 
Old 06-03-2012, 04:08 AM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
I'd guess it has something to do with how grep sees the binary stream, but I don't really know. You might want to try using the -a option instead.

As another option, how about using strings to extract the raw text, and grepping that? You'll have to run it in a loop to get the filenames, however.


This will print out any files that contain "Cons" as a text string, in the same format as grep -c.

Code:
for f in *.xls; do
	c=$( strings "$f" | grep -c "Cons" ) && echo "$f:$c"
done
Change the "&&" to ";" if you want it to print out the results of all the files.

If all you really want is to know which files match, and you don't care about the actual count, you can shorten it a bit:

Code:
for f in *.xls; do
	strings "$f" | grep -q "Cons" && echo "$f"
done
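If the text really is stored as 16-bit (UTF-16LE) characters, every ASCII letter is followed by a NUL byte, so a plain "Cons" pattern never matches. A minimal sketch of a variant of the loops above, assuming that is the case: delete the NUL bytes first, then grep. The function name count_in_xls is made up for illustration, and this will not help if the string is split across records or is not stored contiguously in the file.

```shell
# Sketch only: count_in_xls is a hypothetical helper, not part of grep.
# Deleting NUL bytes turns UTF-16LE "C\0o\0n\0s\0" into plain "Cons",
# so one pattern covers both 8-bit and 16-bit storage of ASCII text.
count_in_xls() {
    pattern=$1
    file=$2
    # LC_ALL=C makes tr and grep work on raw bytes; -a treats the data as text.
    LC_ALL=C tr -d '\000' < "$file" | LC_ALL=C grep -a -c -- "$pattern"
}

for f in *.xls; do
    [ -e "$f" ] || continue    # skip the unexpanded glob when no .xls files exist
    c=$(count_in_xls "Cons" "$f") && echo "$f:$c"
done
```

As with grep -c in general, the number printed is a count of matching lines, not of individual occurrences, and "lines" in a binary file are fairly arbitrary. Treat any non-zero count as "the string is in there somewhere".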
 
Old 06-03-2012, 11:47 PM   #3
rmknox
Member
 
Registered: May 2010
Posts: 354

Original Poster
Rep: Reputation: 34
David

Clearly you are a clever and knowledgeable guy - thank you much - Dick
 
Old 06-04-2012, 12:01 AM   #4
rmknox
Member
 
Registered: May 2010
Posts: 354

Original Poster
Rep: Reputation: 34
David - Darn!

I tried it and got the same result: Co is found, Cons is not.

Reminds me of some work I did on the PDP-8. DEC had 2 operating systems, OS/8 and some business-related op sys. Similar magic. It turned out that one op sys read all the sectors on the floppy in sequence and the other used alternate sectors, with the result that something that was contiguous when read on one op sys was not when read on the other.

Dick
 
Old 06-04-2012, 10:56 AM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Whatever the reason, it seems like the string just doesn't exist in the file as you think it should. What do you get when you run strings alone?
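Since a short prefix matches and the longer string does not, it may help to look at the raw bytes that follow each match. A rough sketch, assuming GNU grep (for -o and -b) and a standard od; show_after is a made-up helper name:

```shell
# Sketch only: show_after is a hypothetical helper name.
# For every occurrence of a fixed string, dump the 16 bytes starting at the
# match, so you can see what really follows it (NUL bytes, control bytes, ...).
show_after() {
    prefix=$1
    file=$2
    # -a: treat binary as text, -F: fixed string, -o -b: print "offset:match"
    # where offset is the byte position of the match itself (GNU grep).
    LC_ALL=C grep -a -F -o -b -- "$prefix" "$file" | cut -d: -f1 |
    while read -r off; do
        # -A d: decimal addresses, -c: show bytes as characters or escapes,
        # -j: skip to the match offset, -N: dump only 16 bytes.
        od -A d -c -j "$off" -N 16 "$file"
    done
}
```

For example, show_after Con somefile.xls (the file name is a placeholder) prints one short dump per match. If the byte after "n" shows up as \0 or some other non-letter, that would explain why "Cons" is never found as a contiguous string.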
 
Old 06-05-2012, 08:28 AM   #6
rmknox
Member
 
Registered: May 2010
Posts: 354

Original Poster
Rep: Reputation: 34
Hi David

strings does not find the character strings - I tried all the encodings.
But I did locate the Microsoft documentation, though I'm out of time to research it.
In case you are interested, it is located here:

http://www.microsoft.com/interop/doc...ryFormats.mspx

Strange - when I use the link icon provided on this editing page I get what appears to be a doubling of the URL. Maybe it displays correctly when not in editing mode.

thanks for your help
Dick
 
  

