regex: find strings that do not begin with a certain prefix [duplicate]

This question already has an answer here:
Regular expression for a string that does not start with a sequence
(1 answer)
Closed 9 years ago.
I want to find a word in strings, but only if it is not preceded by a certain prefix.
For example, I'd like to find all occurrences of APP_PERFORM_TASK, but only those that do not start with the prefix CMD_DO("
So:
CMD_DO("APP_PERFORM_TASK") <-- OK (I don't need to know about this)
BLAH("APP_PERFORM_TASK") <-- NOT OK, this should match my search.
I tried:
(?!CMD_DO\(")APP_PERFORM_TASK
But that doesn't produce the results I need. What am I doing wrong?

Here's a quick way:
Use the --invert-match flag (short form -v) to drop every line that contains CMD_DO, then pipe the result to a second grep that keeps only the lines matching BLAH:
grep -v CMD_DO dummy | grep BLAH

Try replacing the negative lookahead (?!...) with a negative lookbehind (?<!...) in your regex:
(?<!CMD_DO\(")APP_PERFORM_TASK
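Why the swap matters: at the position just before APP_PERFORM_TASK, a lookahead inspects the text that follows (which is APP_PERFORM_TASK itself, never CMD_DO("), so the original assertion can never reject anything. A lookbehind inspects the text that precedes that position. The following is a minimal sketch using Python's re module, which is an assumption here since the question does not name a tool; the two sample lines are taken from the question.

```python
import re

lines = [
    'CMD_DO("APP_PERFORM_TASK")',
    'BLAH("APP_PERFORM_TASK")',
]

# Lookahead: checks the text AFTER the current position. At the only place
# where APP_PERFORM_TASK can match, that text is APP_PERFORM_TASK itself,
# so the negative assertion always succeeds.
lookahead = re.compile(r'(?!CMD_DO\(")APP_PERFORM_TASK')

# Lookbehind: checks the text BEFORE the current position.
lookbehind = re.compile(r'(?<!CMD_DO\(")APP_PERFORM_TASK')

lookahead_hits = [s for s in lines if lookahead.search(s)]
lookbehind_hits = [s for s in lines if lookbehind.search(s)]

print(lookahead_hits)
print(lookbehind_hits)
```

Only the lookbehind version filters out the CMD_DO(" line; the lookahead version keeps both.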

Based on your comment, let's concentrate on the command-line tool grep.
Here is a grep solution without the -P switch (Perl-compatible regex):
grep 'APP_PERFORM_TASK' file | grep -v '^CMD_DO("'
Here is a grep solution using the -P switch and a negative lookbehind:
grep -P '(?<!^CMD_DO\(")APP_PERFORM_TASK' file

Try this (the ^ anchor makes the lookahead apply at the start of the line):
^(?!CMD_DO\(").*APP_PERFORM_TASK.*

To handle an input line with both the desirable and undesirable forms like:
CMD_DO("APP_PERFORM_TASK") BLAH("APP_PERFORM_TASK")
you'd need something like this in awk (using GNU awk for gensub()):
awk -v s="APP_PERFORM_TASK" 'gensub("CMD_DO\\(\"" s, "", "g") ~ s' file
i.e. get rid of all of the unwanted occurrences of the string, then test what's left.
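The same strip-then-test idea can be sketched in Python. The function name has_unprefixed and its parameters are hypothetical, chosen only for this illustration.

```python
import re

TARGET = "APP_PERFORM_TASK"


def has_unprefixed(line, target=TARGET, prefix='CMD_DO("'):
    """Return True if target occurs somewhere other than right after prefix."""
    # Remove every prefixed occurrence, then look for the target in the rest.
    stripped = re.sub(re.escape(prefix + target), "", line)
    return target in stripped


print(has_unprefixed('CMD_DO("APP_PERFORM_TASK") BLAH("APP_PERFORM_TASK")'))
print(has_unprefixed('CMD_DO("APP_PERFORM_TASK")'))
```

Like the awk version, this reports whole lines rather than the positions of the individual matches.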

An awk version
awk '/APP_PERFORM_TASK/ && !/^CMD_DO/' file

Related

Exclude pattern in a Grep using extended regex [duplicate]

This question already has answers here:
How to invert a grep expression
(5 answers)
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 5 years ago.
I have a grep that is killing me.
Let's suppose I have a file (file.xml) with the two entries below:
pos_ADF_datasource-1450-jdbc.xml
datasource-1450-jdbc.xml
Now if I run the grep below:
grep -E '(ADF)' file.txt
I get the following output:
pos_ADF_datasource-1450-jdbc.xml
Now I want to exclude ADF to get the other entry. It should be easy, but I have tried everything and cannot get it to work:
grep -E '(?<!ADF)' file.txt
I tried many variations, but I'm sure there is something I'm not considering that keeps my expression from working...
I need and want to use -E; I know it works without the extended regex!
Please, guys, enlighten me!
RESOLVED:
Thanks to Wiktor for the explanation below:
POSIX ERE does not support lookarounds. Even if you use -P, (?<!ADF) will just match any position that is not preceded by ADF.
You cannot check with an ERE regex whether a string does not contain a pattern, only whether it is not equal to, or does not start/end with, a pattern. You can only do it with a PCRE regex: grep -P '^(?!.*ADF)' file.txt
Then I figured it out with grep -Pe:
grep -Pe "^((?!.*ADF).)*-jdbc.xml$" file.xml
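The difference between the two lookarounds discussed above can be checked with a small sketch. It uses Python's re module as a stand-in for PCRE (an assumption; the thread itself only uses grep), and the two file names come from the question.

```python
import re

names = ["pos_ADF_datasource-1450-jdbc.xml", "datasource-1450-jdbc.xml"]

# '^(?!.*ADF)' succeeds only at the start of a line with no ADF anywhere in it.
no_adf = re.compile(r'^(?!.*ADF)')
without_adf = [n for n in names if no_adf.search(n)]

# '(?<!ADF)' on its own matches at any position not preceded by ADF,
# for example position 0, so it "matches" every line and filters nothing.
not_preceded = re.compile(r'(?<!ADF)')
still_all = [n for n in names if not_preceded.search(n)]

print(without_adf)
print(still_all)
```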

Match unknown substring with RegEx

How can I get an unknown substring with a regular expression? I know what comes before and after the wanted string, but I don't want the known parts in the result.
Example text:
jhgjgjgvocher_SOMETHINGHERE.dbhjjkghjkg
vocher_SOMETHINGELSE.db
I'm looking for 'SOMETHINGHERE' and 'SOMETHINGELSE' only.
vocher_ and .db are always before and after the relevant part but should not be in the result.
A working solution is:
cat test | egrep -o "vocher_.*\.db" | cut -d "_" -f2 | cut -d "." -f1
… but you know it's ugly.
Is it possible to search exactly for an unknown part with regex (in this case only the .* part), or do I need to use something like sed? Is there a better solution?

A simple solution using perl is the following:
perl -ne 'if (/vocher_(.*)\.db/){ print "$1\n";}' test_file.txt
This iterates line-by-line over the file and only prints the desired portion.

Use the following grep approach:
grep -Po '(?<=vocher_).+(?=\.db)' test
-P - allows Perl regular expressions
-o - prints only matched substrings
The output will be like below:
SOMETHINGHERE
SOMETHINGELSE
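For comparison, here is a minimal Python sketch of the same lookbehind/lookahead pattern; the sample text is the one from the question, and the variable names are only illustrative.

```python
import re

text = "jhgjgjgvocher_SOMETHINGHERE.dbhjjkghjkg\nvocher_SOMETHINGELSE.db"

# (?<=vocher_) and (?=\.db) assert the surrounding text without including it
# in the match, so only the unknown middle part is returned.
pattern = re.compile(r'(?<=vocher_).+(?=\.db)')

found = [m.group(0) for line in text.splitlines() for m in pattern.finditer(line)]
print(found)
```

Note that .+ is greedy: on a line containing several .db suffixes it extends to the last one. Using .+? instead would stop at the first.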

Extract word after a known pattern in UNIX [duplicate]

This question already has answers here:
get the next word after grep matching [duplicate]
(3 answers)
Closed 7 years ago.
I have a file called in.txt which contains a whole bunch of code, however I need to extract a user ID which is guaranteed to be of the form 'EID:nmb685', potentially with content before and/or after the guaranteed format. I want to extract the 'nmb685' using a bash script. I've tried some combinations of grep and sed but nothing has worked.

If your grep doesn't support -P but supports -o, you can combine grep and awk.
grep -o 'EID:\w\+' file|awk -F':' '{print $2}'
It can be done with awk alone, but this is more straightforward.

If your grep supports -P (the --perl-regexp option), you can use this:
grep -oP 'EID:\K\w+' file
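In the grep command above, \K discards everything matched so far, so only the word after EID: is printed. Python's standard re module has no \K, so the sketch below swaps in a capture group to get the same result; the function name extract_eids is hypothetical.

```python
import re


def extract_eids(text):
    """Return every word that directly follows 'EID:' in text."""
    # The parentheses capture only the ID; 'EID:' is matched but not returned.
    return re.findall(r'EID:(\w+)', text)


print(extract_eids("some code EID:nmb685 more code"))
```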

What is being output after the ID? Is there anything consistent that you can match against?
If you know the length of the userid you can use:
grep "EID:......" in.txt > out.txt
or, if you don't, maybe something like this (matches any run of letters/digits preceded by EID: and followed by a space):
grep "EID:[A-Za-z0-9]* " in.txt > out.txt

Not very elegant, but this works:
grep "EID:" in.txt | sed 's/\(.*EID:......\).*/\1/' | sed 's/^.*EID://'
Select all lines with the substring "EID:"
Remove everything after "EID:" plus 6 characters
Remove everything before (and including) "EID:"

Match specific length words, anchored, without doing magic math

Let's say I wanted to find all 12-letter words in /usr/share/dict/words that started with c and ended with er. Off the top of my head, a workable pattern could look something like:
grep -E '^c.{9}er$' /usr/share/dict/words
It finds:
cabinetmaker
calcographer
calligrapher
campanologer
campylometer
...
But that .{9} bothers me. It feels too magical, subtracting the total length of all the anchor characters from the number defined in the original constraint.
Is there any way to rewrite this regex so it doesn't require doing this calculation up front, allowing a literal 12 to be used directly in the pattern?

You can use the -x option, which selects only matches that span the whole line.
grep -xE '.{12}' /usr/share/dict/words | grep '^c.*er$'
Or use the -P option, which interprets the pattern as a Perl-compatible regular expression, and add a lookahead assertion.
grep -P '^(?=.{12}$)c.*er$' /usr/share/dict/words
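The lookahead variant can be tried out with a short Python sketch. The helper name matches is hypothetical, and the test words are taken from the output listed in the question.

```python
import re

# The lookahead pins the total length to 12 without consuming any characters,
# so the rest of the pattern can state the prefix and suffix directly.
pattern = re.compile(r'^(?=.{12}$)c.*er$')


def matches(word):
    """Return True for 12-character words that start with c and end with er."""
    return bool(pattern.search(word))


for word in ["cabinetmaker", "calligrapher", "computer"]:
    print(word, matches(word))
```

The literal 12 appears directly in the pattern, which is what the question asked for.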

You can use awk as an alternative and avoid this calculation:
awk -v len=12 'length($1)==len && $1 ~ /^c.*er$/' file

I don't know grep so well, but some more advanced NFA regex implementations provide lookaheads and lookbehinds. If you can make those available, you could write:
^(?=c).{12}(?<=er)$
Maybe as a Perl one-liner like this?
perl -ne 'print if m/^(?=c).{12}(?<=er)$/' /usr/share/dict/words

One approach with GNU sed:
$ sed -nr '/^.{12}$/{/^c.*er$/p}' words
With BSD sed (Mac OS) it would be:
$ sed -nE '/^.{12}$/{/^c.*er$/p;}' words

Not operator in regex [duplicate]

This question already has answers here:
Negative matching using grep (match lines that do not contain foo)
(3 answers)
Closed 12 months ago.
I have a file and I wish to grep out all the lines that do not start with a timestamp. I tried using the following regex but it did not work:
cat myFile | grep '^(?!\[0-9\]$).*$'
Any other suggestions or something that I might be doing wrong here?

Why not simply use grep -v option like this to negate:
grep -v "<pattern>" file
Let's say you want to grep all the lines in a shell script that are not commented ( do not have # at start ) then you can use:
grep -v "^\s*#" file.sh
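What -v does can be sketched in a few lines of Python: keep the lines where the pattern does not match. The function name invert_match and the sample script lines are made up for illustration.

```python
import re


def invert_match(lines, pattern):
    """Return the lines that do NOT match pattern, similar to grep -v."""
    regex = re.compile(pattern)
    return [line for line in lines if not regex.search(line)]


script = ["#!/bin/sh", "  # a comment", "echo hello", "ls -l  # trailing comment"]
print(invert_match(script, r'^\s*#'))
```

For the original question, passing a pattern that describes the timestamp (for example r'^[0-9]', assuming timestamps start with a digit) would return the lines without one.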

Try this:
cat myFile | grep -P '^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d'
This assumes your timestamp follows the pattern dddd-dd-dd dd:dd:dd (the -P switch is needed because \d is a Perl-style class); change the pattern to match your timestamp format if it differs.
Note: Unless you're chaining commands, grep pattern file is the simpler syntax.
BTW: Your use of a double negative makes me unsure whether you want the timestamp lines or the non-timestamp lines.

You don't need a not operator, just use grep as it is most easily used: finding a pattern:
grep '^[0-9]' myFile
