I have a text file
$ cat test.log
SYB-01001
SYB-18913
SYB-02445
SYB-21356
I want to grep for 01001 and 18913 only whats the way to do this
I want the output to be
SYB-01001
SYB-18913
SYB-02445
I tried this but not sure whats wrong with it
grep 'SYB-(18913)|0*)' test.log
Use the -E flag for "extended regular expressions" with grep.
e.g.
grep -E 'SYB-(0|18913)' test.log
Other things to be aware of:
parentheses must match (for every opening bracket you want a closing bracket)
0* means zero or more 0 characters - in truth this will match everything
Try that :
grep 'SYB-\(18913\|0*\)' test.log
But you maybe don't want the 0* part to act like this. Maybe 0+ is better.
awk
awk '/SYB-(0|18913)/' file
Your brackets are out. Try
grep 'SYB-[18913|0*]' test.log
Related
I have a text file with the following pattern written to it:
TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"
I would like to discard the first part of each line containing
TIME[32.468ms] -(3)-.............
To test the regular expression I've tried the following:
cat myfile.txt | egrep "^TIME\[.*\]\s\s\-\(3\)\-\.+"
This identifies correctly the lines I want. Now, to delete the pattern I've tried:
cat myfile.txt | sed s/"^TIME\[.*\]\s\s\-\(3\)\-\.+"//
but it just seems to be doing the cat, since it shows the content of the complete file and no substitution happens.
What am I doing wrong?
OS: CentOS 7
With your shown samples, please try following grep command. Written and tested with GNU grep.
grep -oP '^TIME\[\d+\.\d+ms\]\s+-\(\d+\)-\.+\K.*' Input_file
Explanation: Adding detailed explanation for above code.
^TIME\[ ##Matching string TIME from starting of value here.
\d+\.\d+ms\] ##Matching digits(1 or more occurrences) followed by dot digits(1 or more occurrences) followed by ms ] here.
\s+-\(\d+\)-\.+ ##Matching spaces91 or more occurrences) followed by - digits(1 or more occurrences) - and 1 or more dots.
\K ##Using \K option of GNU grep to make sure previous match is found in line but don't consider it in printing, print next matched regex part only.
.* ##to match till end of the value.
2nd solution: Adding awk program here.
awk 'match($0,/^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+/){print substr($0,RSTART+RLENGTH)}' Input_file
Explanation: using match function of awk, to match regex ^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+ which will catch text which we actually want to remove from lines. Then printing rest of the text apart from matched one which is actually required by OP.
This awk using its sub() function:
awk 'sub(/^TIME[[][^]]*].*\.+/,"")' file
"TEXT I WANT TO KEEP"
If there is replacement, sub() returns true.
$ cut -d'"' -f2 file
TEXT I WANT TO KEEP
You may use:
s='TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"'
sed -E 's/^TIME\[[^]]*].*\.+//'
"TEXT I WANT TO KEEP"
The \s regex extension may not be supported by your sed.
In BRE syntax (which is what sed speaks out of the box) you do not backslash round parentheses - doing that turns them into regex metacharacters which do not match themselves, somewhat unintuitively. Also, + is just a regular character in BRE, not a repetition operator (though you can turn it into one by similarly backslashing it: \+).
You can try adding an -E option to switch from BRE syntax to the perhaps more familiar ERE syntax, but that still won't enable Perl regex extensions, which are not part of ERE syntax, either.
sed 's/^TIME\[[^][]*\][[:space:]][[:space:]]-(3)-\.*//' myfile.txt
should work on any reasonably POSIX sed. (Notice also how the minus character does not need to be backslash-escaped, though doing so is harmless per se. Furthermore, I tightened up the regex for the square brackets, to prevent the "match anything" regex you had .* from "escaping" past the closing square bracket. In some more detail, [^][] is a negated character class which matches any character which isn't (a newline or) ] or [; they have to be specified exactly in this order to avoid ambiguity in the character class definition. Finally, notice also how the entire sed script should normally be quoted in single quotes, unless you have specific reasons to use different quoting.)
If you have sed -E or sed -r you can use + instead of * but then this complicates the overall regex, so I won't suggest that here.
A simpler one for sed:
sed 's/^[^"]*//' myfile.txt
If the "text you want to keep" always surrounded by the quote like this and only them having the quote in the line starting with "TIME...", then:
sed -n '/^TIME/p' file | awk -F'"' '{print $2}'
should get the line starting with "TIME..." and print the text within the quotes.
Thanks all, for your help.
By the end, I've found a way to make it work:
echo 'TIME[32.468ms] -(3)-.............TEXT I WANT TO KEEP' | grep TIME | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
More generally,
grep TIME myfile.txt | sed -r ‘s/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//’
Cheers,
Pedro
I want to parse html files to extract strings between "{{_(" and ")}}" using GREP. I tried something like this:
grep '"[^{{_(|)}}$]"' *.html
but it didn't work.
Can someone help me please?
Thanks!
You may use
grep -oP '(?<={{_\().+?(?=\)}})' file
Details
-o - output only matched substrings
-P - enable the PCRE regex engine
(?<={{_\().+?(?=\)}}) match:
(?<={{_\() - a location that is immediately preceded with {{+(
.+? - any 1 or more more chars other than line break chars, as few as possible
(?=\)}}) - a location that is immediately followed with )}} .
See the regex demo.
#Wiktor Stribiżew's answer works really good. However, if you have multiple files, you would get an output like this, where the respective file name per each match is also displayed:
foo.html: content abc
foo.html: test 123
bar.html: first match
bar.html: second match
So, if you are only interested in the matching string as output, you can try sed instead
sed -n 's/.*{{_(\(.*\))}}.*/\1/p' *.html
You can also count the unique occurrence of matches and things like that...
Update:
Or just use the -h | --no-filename with the grep that #Wiktor Stribiżew has provided.
grep -h -oP '(?<={{_\().+?(?=\)}})' *.html
Or the -c flag in order to display the count of matches per each file:
grep -c -oP '(?<={{_\().+?(?=\)}})' *.html
As in the posts before with it is possible to grep the value of an HTML property.
placeholder="SOME TEXT_HERE" -> grep -> "SOME TEXT_HERE"
grep -oP '(?<=placeholder=").+?(?=")' *html
I have a text file with contents like this:
here is some super text:
this is text that should
be selected with a cool match
And this is how it all ends
blah blah...
I am trying to get the two lines (but could be more or less lines) between:
some super text:
and
And this is how
I am using grep on an ubuntu machine and a lot of the patterns I've found seem to be specific to different kinds of regex engines.
So I should end up with something like this:
grep "my regex goes here" myFileNameHere
Not sure if egrep is needed, but could use that just as easy.
You can use addresses in sed:
sed -e '/some super text/,/And this is how/!d' file
!d means "don't output if not in the range".
To exclude the border lines, you must be more clever:
sed -n -e '/some super text/ {n;b c}; d;:c {/And this is how/ {d};p;n;b c}' file
Or, similarly, in Perl:
perl -ne 'print if /some super text/ .. /And this is how/' file
To exclude the border lines again, change it to
perl -ne '$in = /some super text/ .. /And this is how/; print if $in > 1 and $in !~ /E/' file
I don't see how it could be done in grep. Using awk:
awk '/^And this is how/ {p=0}; p; /some super text:$/ {p=1}' file
Give a try to pcregrep instead of normal grep. Because normal grep won't help you to fetch multiple lines in a row.
$ pcregrep -M -o '(?s)some super text:[^\n]*\n\K.*?(?=\n[^\n]*And this is how)' file
this is text that should
be selected with a cool match
(?s) Dotall modifier allows dot to match even newline characters also.
\K Discards the previously matched characters.
From pcregrep --help
-M, --multiline run in multiline mode
-o, --only-matching=n show only the part of the line that matched
TL;DR
With your corpus, another way to solve the problem is by matching lines with leading whitespace, rather than using a flip-flop operator of some sort to match start and end lines. The following solutions work with your posted example.
GNU Grep with PCRE Compiled In
$ grep -Po '^\s+\K.*' /tmp/corpus
this is text that should
be selected with a cool match
Alternative: Use pcregrep Instead
$ pcregrep -o '^\s+\K.*' /tmp/corpus
this is text that should
be selected with a cool match
Let's say I wanted to find all 12-letter words in /usr/share/dict/words that started with c and ended with er. Off the top of my head, a workable pattern could look something like:
grep -E '^c.{9}er$' /usr/share/dict/words
It finds:
cabinetmaker
calcographer
calligrapher
campanologer
campylometer
...
But that .{9} bothers me. It feels too magical, subtracting the total length of all the anchor characters from the number defined in the original constraint.
Is there any way to rewrite this regex so it doesn't require doing this calculation up front, allowing a literal 12 to be used directly in the pattern?
You can use the -x option which selects only matches that exactly match the whole line.
grep -xE '.{12}' | grep 'c.*er'
Ideone Demo
Or use the -P option which clarifies the pattern as a Perl regular expression and use a lookahead assertion.
grep -P '^(?=.{12}$)c.*er$'
Ideone Demo
You can use awk as an alternative and avoid this calculation:
awk -v len=12 'length($1)==len && $1 ~ /^c.*?er$/' file
I don't know grep so well, but some more advanced NFA RegEx implementations provide you with lookaheads and lookbehinds. If you can figure out any means to make those available for you, you could write:
^(?=c).{12}(?<=er)$
Maybe as a perl one-liner like this?
cat /usr/share/dict/words | perl -ne "print if m/^(?=c).{12}(?<=er)$/"
One approach with GNU sed:
$ sed -nr '/^.{12}$/{/^c.*er$/p}' words
With BSD sed (Mac OS) it would be:
$ sed -nE '/^.{12}$/{/^c.*er$/p;}' words
I’m stuck in trying to grep anything just after name=, include only spaces and alphanumeric.
e.g.:
name=some value here
I get
some value here
I’m totally newb in this, the following grep match everything including the name=.
grep 'name=.*' filename
Any help is much appreciated.
As detailed here, you want a positive lookbehind clause, such as:
grep -P '(?<=name=)[ A-Za-z0-9]*' filename
The -P makes grep use the Perl dialect, otherwise you'd probably need to escape the parentheses. You can also, as noted elsewhere, append the -o parameter to print out only what is matched. The part in brackets specifies that you want alphanumerics and spaces.
The advantage of using a positive lookbehind clause is that the "name=" text is not part of the match. If grep highlights matched text, it will only highlight the alphanumeric (and spaces) part. The -o parameter will also not display the "name=" part. And, if you transition this to another program like sed that might capture the text and do something with it, you won't be capturing the "name=" part, although you can also do that by using capturing parenthess.
Try this:
sed -n 's/^name=//p' filename
It tells sed to print nothing (-n) by default, substitute your prefix with nothing, and print if the substitution occurs.
Bonus: if you really need it to only match entries with only spaces and alphanumerics, you can do that too:
sed -n 's/^name=\([ 0-9a-zA-Z]*$\)/\1/p' filename
Here we've added a pattern to match spaces and alphanumerics only until the end of the line ($), and if we match we substitute the group in parentheses and print.
gawk
echo "name=some value here" | awk -F"=" '/name=/ { print $2}'
or with bash
str="name=some value here"
IFS="="
set -- $str
echo $1
unset IFS
or
str="name=some value here"
str=${str/name=/}
grep does not extract like you expect. What you need is
grep "name=" file.txt | cut -d'=' -f1-
grep will print the entire line where it matches the pattern. To print only the pattern matched, use the grep -o option. You'll probably also need to use sed to remove the name= part of the pattern.
grep -o 'name=[0-9a-zA-Z ]' myfile | sed /^name=/d