grep through binary file - regex

I have a binary file that contains lines in the following form:
blabla^A2013.04.03-09:35:04^Ablabla
where ^A is the binary character 001.
I want to be able to perform a grep that will give me only what is between the ^A (not the whole line).
I know that flag -o is only for match but I don't know how to search for that binary character

You should be able to include control-A on the command line by simply typing control-A where you want it to appear. At worst, you might need to type control-V before it. You can also explore notations using bash's ANSI-C quoting such as $'\001'.

Try doing this :
grep --binary-files=text pattern file.txt
so :
$ grep --binary-files=text -oP '\^\K[^\^]+(?=\^)' file.txt
A2013.04.03-09:35:04

Related

Can I perform a 'non-global' grep and capture only the first match found for each line of input?

I understand that what I'm asking can be accomplished using awk or sed, I'm asking here how to do this using GREP.
Given the following input:
.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md
I want to use GREP to get:
.bash_profile
.config/
.oh-my-zsh/
Currently I'm trying
grep -Po '([^/]*[/]?){1}'
Which results in output:
.bash_profile
.config/
ranger/
bookmarks
.oh-my-zsh/
README.md
Is there some simple way to use GREP to only get the first matched string on each line?
I think you can grep non / letters like:
grep -Eo '^[^/]+'
On another SO site there is another similar question with solution.
You don't need grep for this at all.
cut -d / -f 1
The -o option says to print every substring which matches your pattern, instead of printing each matching line. Your current pattern matches every string which doesn't contain slashes (optionally including a trailing slash); but it's easy to switch to one which only matches this pattern at the beginning of a line.
grep -o '^[^/]*' file
Notice the addition of the ^ beginning of line anchor, and the omission of the -P option (which you were not really using anyway) as well as the silly beginner error {1}.
(I should add that plain grep doesn't support parentheses or repetitions; grep -E would support these constructs just fine, of you could switch to toe POSIX BRE variation which requires a backslash to use round or curly parentheses as metacharacters. You can probably ignore these details and just use grep -E everywhere unless you really need the features of grep -P, though also be aware that -P is not portable.)

Grep for lines not beginning with "//"

I'm trying but failing to write a regex to grep for lines that do not begin with "//" (i.e. C++-style comments). I'm aware of the "grep -v" option, but I am trying to learn how to pull this off with regex alone.
I've searched and found various answers on grepping for lines that don't begin with a character, and even one on how to grep for lines that don't begin with a string, but I'm unable to adapt those answers to my case, and I don't understand what my error is.
> cat bar.txt
hello
//world
> cat bar.txt | grep "(?!\/\/)"
-bash: !\/\/: event not found
I'm not sure what this "event not found" is about. One of the answers I found used paren-question mark-exclamation-string-paren, which I've done here, and which still fails.
> cat bar.txt | grep "^[^\/\/].+"
(no output)
Another answer I found used a caret within square brackets and explained that this syntax meant "search for the absence of what's in the square brackets (other than the caret). I think the ".+" means "one or more of anything", but I'm not sure if that's correct and if it is correct, what distinguishes it from ".*"
In a nutshell: how can I construct a regex to pass to grep to search for lines that do not begin with "//" ?
To be even more specific, I'm trying to search for lines that have "#include" that are not preceeded by "//".
Thank you.
The first line tells you that the problem is from bash (your shell). Bash finds the ! and attempts to inject into your command the last you entered that begins with \/\/. To avoid this you need to escape the ! or use single quotes. For an example of !, try !cat, it will execute the last command beginning with cat that you entered.
You don't need to escape /, it has no special meaning in regular expressions. You also don't need to write a complicated regular expression to invert a match. Rather, just supply the -v argument to grep. Most of the time simple is better. And you also don't need to cat the file to grep. Just give grep the file name. eg.
grep -v "^//" bar.txt | grep "#include"
If you're really hungup on using regular expressions then a simple one would look like (match start of string ^, any number of white space [[:space:]]*, exactly two backslashes /{2}, any number of any characters .*, followed by #include):
grep -E "^[[:space:]]*/{2}.*#include" bar.txt
You're using negative lookahead which is PCRE feature and requires -P option
Your negative lookahead won't work without start anchor
This will of course require gnu-grep.
You must use single quotes to use ! in your regex otherwise history expansion is attempted with the text after ! in your regex, the reason of !\/\/: event not found error.
So you can use:
grep -P '^(?!\h*//)' file
hello
\h matches 0 or more horizontal whitespace.
Without -P or non-gnu grep you can use grep -v:
grep -v '^[[:blank:]]*//' file
hello
To find #include lines that are not preceded by // (or /* …), you can use:
grep '^[[:space:]]*#[[:space:]]*include[[:space:]]*["<]'
The regex looks for start of line, optional spaces, #, optional spaces, include, optional spaces and either " or <. It will find all #include lines except lines such as #include MACRO_NAME, which are legitimate but rare, and screwball cases such as:
#/*comment*/include/*comment*/<stdio.h>
#\
include\
<stdio.h>
If you have to deal with software containing such notations, (a) you have my sympathy and (b) fix the code to a more orthodox style before hunting the #include lines. It will pick up false positives such as:
/* Do not include this:
#include <does-not-exist.h>
*/
You could omit the final [[:space:]]*["<] with minimal chance of confusion, which will then pick up the macro name variant.
To find lines that do not start with a double slash, use -v (to invert the match) and '^//' to look for slashes at the start of a line:
grep -v '^//'
You have to use the -P (perl) option:
cat bar.txt | grep -P '(?!//)'
For the lines not beginning with "//", you could use (^[^/]{2}.*$).
If you don't like grep -v for this then you could just use awk:
awk '!/^\/\//' file
Since awk supports compound conditions instead of just regexps, it's often easier to specify what you want to match with awk than grep, e.g. to search for a and b in any order with grep:
grep -E 'a.*b|b.*a`
while with awk:
awk '/a/ && /b/'

Grep matching two or more patterns within pattern file

I'd like to use grep to pull in matching patterns from a file only if the line contains two or more patterns contained within my pattern file. This is my rough idea of what the syntax looks like, but it doesn't work. Any pointers?
egrep -f -i pattern.txt {2,} file.txt >> output.txt
grep -E '/pattern1/' -E '/pattern2/' file
In this way you can scan multiple pattern in a single line using this command. try to google 'regular expressions with grep tutorial' and you will find the answer. also be specific when mentioning the pattern you want to search in a line.

Linux - Only find a pattern within a line, not the whole line

I want to use a regex to find a pattern in a file. That pattern may be in the middle of a line, but I don't want the whole line. I tried grep -a pattern file but this returns the entire line that contains the regex. The following is an example of what I'm trying to do. Does anyone know a way to do this?
Example:
Input: AAAAAAAAAAAAAXxXxXxXxBananasyYyYyYyYBBBBBBBCCCCCC
Regex: Xx.*yY
Ouput: XxXxXxXxBananasyYyYyYyY
you were close, you need the -o flag
grep -o 'Xx.*yY' <<<AAAAAAAAAAAAAXxXxXxXxBananasyYyYyYyYBBBBBBBCCCCCC
XxXxXxXxBananasyYyYyYyY
Use the -o option to print just the part of the line that matches the regexp
grep -o pattern file
In addition to grep -o (the simplest way), there are a couple of other options:
In bash, without relying on any particular implementation of grep:
$ regex='Xx.*yY'
$ [[ AAAAAAAAAAAAAXxXxXxXxBananasyYyYyYyYBBBBBBBCCCCCC =~ $regex ]]
$ echo ${BASH_REMATCH[0]}
XxXxXxXxBananasyYyYyYyY
Using expr, which is a little unwieldy (in part because the regular expression is implicitly anchored to the beginning of the string), but is defined by the POSIX standard so it should work on any POSIX platform, regardless of the shell used.
$ expr AAAAAAAAAAAAAXxXxXxXxBananasyYyYyYyYBBBBBBBCCCCCC : '[^X]*\(Xx.*yY\)'
XxXxXxXxBananasyYyYyYyY

Grep regular expression to find words in any order

Context: I want to find a class definition within a lot of source code files, but I do not know the exact name.
Question: I know a number of words which must appear on the line I want to find, but I do not know the order in which they will appear. Is there a quick way to look for a number of words in any order on the same line?
For situations where you need to search on a large number of words, you can use awk as follows:
awk "/word1/&&/word2/&&/word3/" *.c
(If you are a cygwin user, the command is gawk.)
If you're trying to find foo, bar, and baz, you can just do:
grep foo *.c | grep bar | grep baz
That will find anything that has all three in any order. You can use word boundaries if you use egrep, otherwise that will match substrings.
While this is not an exact answer your grep question, but you should check the "ctags" command for generating tags file from the source code. For the source code objects this should help you a much more than an simple grep. check: http://ctags.sourceforge.net/ctags.html
Using standard basic regex recursively match starting from the current directory any .c file with the indicated words (case insesitive, bash flavour):
grep -r -i 'word1\|word2\|word3' ./*.c
Using standard extended regex:
grep -r -i -E 'word1|word2|word3' ./*.c
You can also use perl regex:
grep -r -i -P 'word1|word2|word3' ./*.c
If you need to search with a single grep command (for example, you are searching for multiple pattern alternatives on stdin), you could use:
grep -e 'word1.*word2' -e 'word2.*word1' -e 'alternative-word'
This would find anything which has word1 and word2 in either order, or alternative-word.
(Note that this method gets exponentially complicated as the number of words in arbitrary order increases.)