Extract word after a known pattern in UNIX [duplicate]

This question already has answers here:
get the next word after grep matching [duplicate]
(3 answers)
Closed 7 years ago.
I have a file called in.txt which contains a whole bunch of code. I need to extract a user ID that is guaranteed to be of the form 'EID:nmb685', possibly with other content before and/or after it on the same line. I want to extract the 'nmb685' part using a bash script. I've tried some combinations of grep and sed, but nothing has worked.

If your grep doesn't support -P but supports -o, you can combine grep and awk:
grep -o 'EID:\w\+' file | awk -F':' '{print $2}'
It could also be done with awk alone, but this is more straightforward.
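Doing it with awk alone is also possible. A minimal sketch, assuming a POSIX awk and that the ID consists of letters, digits and underscores (the characters \w matches); it prints only the first EID: match on each line, and extract_eid is a hypothetical name:

```shell
#!/bin/sh
# Print the ID that follows "EID:" on each line, using awk alone.
# match() sets RSTART (start of the match) and RLENGTH (its length);
# the offsets of 4 skip the "EID:" prefix itself.
extract_eid() {
  awk 'match($0, /EID:[A-Za-z0-9_]+/) {
         print substr($0, RSTART + 4, RLENGTH - 4)
       }' "$@"
}

# Usage: extract_eid in.txt
```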

If your grep supports -P (Perl-compatible regular expressions), you can use \K, which drops everything matched so far from the output:
grep -oP 'EID:\K\w+' file

What is being output after the ID? Is there anything consistent that you can match against?
If you know the length of the userid you can use:
grep "EID:......" in.txt > out.txt
or, if you don't, something like this (matches any run of letters and digits that is preceded by EID: and followed by a space):
grep "EID:[A-Za-z0-9]* " in.txt > out.txt
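The two grep commands above write the whole matching line to out.txt, not just the ID. If only the ID is wanted, one sketch (assuming a grep that supports -o, as GNU and BSD grep do; extract_ids is a hypothetical name) is to print just the match and cut off the prefix:

```shell
#!/bin/sh
# -o prints only the matched part of each line;
# cut then keeps the text after the first colon, i.e. the ID itself.
# \{1,\} requires at least one character after "EID:".
extract_ids() {
  grep -o 'EID:[A-Za-z0-9]\{1,\}' "$@" | cut -d: -f2
}

# Usage: extract_ids in.txt > out.txt
```

Because the character class stops at the first non-alphanumeric character, this variant does not depend on a space following the ID.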

Not very elegant, but this works:
grep "EID:" in.txt | sed 's/\(.*EID:......\).*/\1/' | sed 's/^.*EID://'
1. grep selects all lines containing the substring "EID:".
2. The first sed removes everything after "EID:" plus 6 characters.
3. The second sed removes everything up to and including "EID:".
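The three steps can also be collapsed into a single sed call. A sketch, assuming (as the pipeline above does) that the ID is exactly six characters, here restricted to letters and digits; extract_eid6 is a hypothetical name:

```shell
#!/bin/sh
# One sed call instead of grep | sed | sed.
# -n turns off automatic printing; the p flag prints a line only when the
# substitution succeeded, so lines without "EID:" produce no output.
# [A-Za-z0-9]\{6\} assumes the ID is exactly six letters/digits, like nmb685.
extract_eid6() {
  sed -n 's/.*EID:\([A-Za-z0-9]\{6\}\).*/\1/p' "$@"
}

# Usage: extract_eid6 in.txt
```

Because the leading .* is greedy, a line containing several EID: entries yields only the last one.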

Related

Extract word and digit from string using SED [duplicate]

This question already has answers here:
sed: print only matching group
(5 answers)
Closed 2 years ago.
How can I extract TIC-9890 from a branch name that looks like feature/TIC-9890/some-other-wording?
I am not a SED expert, but I managed to come up with:
echo "feature/TIC-000/random-description" |
sed -n 's/.*\(TIC-[0-9]\{1,\}\).*/\1/'
This seems to work fine if the TIC-\d+ string is in there, but returns the entire string if it is missing.
However, I need it to return null or an empty string if the match isn't present.

Add the p flag to the s command and it will work. Because -n turns off sed's automatic printing, p is needed to print the line when the substitution succeeds; lines with no match print nothing at all:
echo "feature/TIC-000/random-description" | sed -n 's/.*\(TIC-[0-9]\{1,\}\).*/\1/p'
From the sed man page:
-n, --quiet, --silent suppress automatic printing of pattern space
p Print the current pattern space.
Alternatively, as anubhava suggested in the comments, you could use grep with the -o and -E options:
echo "feature/TIC-000/random-description" | grep -oE 'TIC-[0-9]+'
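To illustrate the empty-string behaviour the question asks for, here is a small sketch that captures the sed result in a variable and tests it; ticket_from_branch is a hypothetical name:

```shell
#!/bin/sh
# Print the TIC-<digits> part of a branch name, or nothing at all when the
# branch name does not contain one (thanks to -n plus the p flag).
ticket_from_branch() {
  printf '%s\n' "$1" | sed -n 's/.*\(TIC-[0-9]\{1,\}\).*/\1/p'
}

ticket=$(ticket_from_branch "feature/TIC-9890/some-other-wording")
if [ -n "$ticket" ]; then
  echo "ticket: $ticket"
else
  echo "no ticket in branch name"
fi
```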

Opposite letters with sed -r? [duplicate]

This question already has answers here:
How can I get sed to change all of the instances of each letter only once?
(3 answers)
Closed 3 years ago.
I have a text with random alphabetic characters and I want to change every character to its opposite one.
For example, z becomes a, b becomes y, and so on.
The only way I can find to do this is:
sed -r -e 's/a/z/' -e 's/b/y/' ... 's/z/a/'
Is there a simpler way to do this?
I just want to use the -r option in sed.
Using the y command, maybe?

tr is easier, e.g. for lowercase characters:
$ z_a=$(echo {z..a} | tr -d ' '); echo adfa alfja | tr a-z $z_a
zwuz zouqz
The detour to build the z-a string is required because tr rejects ranges in reverse collating sequence order, such as z-a.
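A sketch that also handles uppercase: spelling the reversed alphabets out literally avoids the brace-expansion detour, so it runs in any POSIX shell, not only in bash. mirror_letters is a hypothetical name:

```shell
#!/bin/sh
# Map every letter to its mirror image in the alphabet (a<->z, b<->y, ...),
# preserving case; all other characters pass through unchanged.
mirror_letters() {
  tr 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' \
     'zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA'
}

echo "adfa alfja" | mirror_letters   # zwuz zouqz
```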

I know this isn't a sed solution, but this seems a simple, straightforward use of Perl (matching only [a-z] leaves spaces and other characters untouched):
perl -ple 's/([a-z])/chr(25-ord($1)+ord("a")*2)/eg' input_file
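The arithmetic in the Perl substitution maps each lowercase letter to its mirror image in the alphabet. In ASCII, ord("a") = 97 and ord("z") = 122, so 25 = ord("z") - ord("a") and, for a letter c:

```latex
\mathrm{chr}\bigl(25 - \mathrm{ord}(c) + 2\,\mathrm{ord}(\texttt{a})\bigr)
  = \mathrm{chr}\bigl(\mathrm{ord}(\texttt{a}) + \mathrm{ord}(\texttt{z}) - \mathrm{ord}(c)\bigr)
  = \mathrm{chr}\bigl(219 - \mathrm{ord}(c)\bigr)
```

For example, c = b gives 219 - 98 = 121, which is y. Any ASCII character outside a-z is mapped to something that is not a lowercase letter, so the substitution should only be applied to [a-z].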

If it has to be sed, one way is like this:
sed -r 'y/'"$(echo {a..z})"'/'"$(echo {z..a})"'/' file
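For comparison, a sketch of the same y command written out literally so that it also covers uppercase. y does plain character-for-character transliteration rather than regular-expression matching, so -r makes no difference to it; both character lists must have the same length. mirror_letters_sed is a hypothetical name:

```shell
#!/bin/sh
# y/source/dest/ replaces each character of source with the character at
# the same position in dest.
mirror_letters_sed() {
  sed 'y/abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ/zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA/' "$@"
}

echo "Hello" | mirror_letters_sed   # Svool
```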

“sed” command to remove a line that matches an exact string on first word

I've found an answer to my question here: "sed" command to remove a line that match an exact string on first word
...but only partially, because that solution only works if my query looks almost exactly like the one in the answer.
They answered:
sed -i "/^maria\b/Id" file.txt
...which deletes only lines whose first word is "maria", not lines where maria appears later in the line.
I want to chop out a specific URL in a file, for example "cnn.com". However, the file also has a bunch of localhost addresses and 0.0.0.0 entries, and some of both have a single space in front. I also don't want to chop out subdomains like ads.cnn.com. So that code "should" work, but it doesn't when I string together more commands with the -e option. My code below seems to clean things up well, except that I can't get it to remove cnn.com! My file is called raw.txt.
sed -r -e 's/^127.0.0.1//' -e 's/^ 127.0.0.1//' -e 's/^0.0.0.0//' -e 's/^ 0.0.0.0//' -e '/#/d' -e '/^cnn.com\b/d' -e '/::/d' raw.txt | sort | tr -d "[:blank:]" | awk '!seen[$0]++' | grep cnn.com
When I grep for cnn.com, I see all the cnn entries, INCLUDING the one I don't want, which is "cnn.com":
ads.cnn.com
cl.cnn.com
cnn.com <-- the one I don't want
cnn.dyn.cnn.com
customad.cnn.com
gdyn.cnn.com
jfcnn.com
kermit.macnn.com
metrics.cnn.com
projectcnn.com
smetrics.cnn.com
tiads.sportsillustrated.cnn.com
trumpincnn.com
victory.cnn.com
xcnn.com
If I use only the cnn.com deletion on its own, it seems to work:
sed -r '/^cnn.com\b/d' raw.txt | grep cnn.com
(Note: I'm not using the "-e" option here.)
Result:
ads.cnn.com
cl.cnn.com
cnn.dyn.cnn.com
customad.cnn.com
gdyn.cnn.com
jfcnn.com
kermit.macnn.com
metrics.cnn.com
projectcnn.com
smetrics.cnn.com
tiads.sportsillustrated.cnn.com
trumpincnn.com
victory.cnn.com
xcnn.com
Nothing I do seems to work when I string commands together with the "-e" option. I need some help getting my multi-expression sed command working.
Any advice?
Ubuntu 12 LTS & 16 LTS.
sed (GNU sed) 4.2.2

The . is a metacharacter in regex which means "match any one character". So you accidentally created a regex that will also catch cnnPcom or cnn com or cnn\com. While it probably works for your needs, it would be better to be more explicit:
sed -r '/^cnn\.com\b/d' raw.txt
The difference here is the \ backslash before the . period. That escapes the period metacharacter so it's treated as a literal period.
As for your lines that start with a space, you can catch those in a single regex (again escaping the period metacharacter):
sed -r '/(^[ ]*|^)127\.0\.0\.1\b/d' raw.txt
The group (^[ ]*|^) means: the line starts with any number of spaces (^[ ]*) OR (|) simply starts (^); either way it must then be followed by your match for 127.0.0.1.
To string these together, you can use the | (OR) operator inside parentheses to catch all of your matches:
sed -r '/(^[ ]*|^)(127\.0\.0\.1|cnn\.com|0\.0\.0\.0)\b/d' raw.txt
Alternatively you can use a ; semicolon to separate out the different regexes:
sed -r '/(^[ ]*|^)127\.0\.0\.1\b/d; /(^[ ]*|^)cnn\.com\b/d; /(^[ ]*|^)0\.0\.0\.0\b/d;' raw.txt
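Because sed applies the -e expressions in order to each line, a delete anchored at ^ only sees what the earlier substitutions left behind. If raw.txt mixes bare host names with lines that carry a 127.0.0.1 or 0.0.0.0 prefix and optional leading spaces (an assumption about the file's layout), one sketch is to normalise each line first and delete the exact name afterwards; clean_hosts is a hypothetical name:

```shell
#!/bin/sh
# Assumed input: hosts-style lines, possibly indented, e.g. " 0.0.0.0 cnn.com".
# Expressions run in order on every line:
#   1. strip leading blanks and an optional 127.0.0.1 / 0.0.0.0 prefix,
#   2. drop lines containing # or ::, as the question does,
#   3. delete the line only if what is left is exactly "cnn.com".
clean_hosts() {
  sed -r \
    -e 's/^[[:blank:]]*((127\.0\.0\.1|0\.0\.0\.0)[[:blank:]]+)?//' \
    -e '/#/d' \
    -e '/::/d' \
    -e '/^cnn\.com[[:blank:]]*$/d' \
    "$@"
}

# Usage: clean_hosts raw.txt | sort -u
```

Anchoring the name at both ends keeps subdomains such as ads.cnn.com and look-alikes such as xcnn.com.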

sed doesn't understand matching on strings, only regular expressions, and it's ridiculously difficult to try to get sed to act as if it does, see Is it possible to escape regex metacharacters reliably with sed. To remove a line whose first space-separated word is "foo" is just:
awk '$1 != "foo"' file
To remove lines that start with any of "foo" or "bar" is just:
awk '($1 != "foo") && ($1 != "bar")' file
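If you have more than a couple of words, list them all, build a hash table (associative array) indexed by them, and then test whether the first word of the line is an index of that table. Note that split() stores the words as the array's values (indexed 1, 2, 3, ...), so they have to be copied into the keys of a second array:
awk 'BEGIN{split("foo bar other word", tmp); for (i in tmp) badWords[tmp[i]]} !($1 in badWords)' file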
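When the list of words is long, it can live in a file instead of the BEGIN block. A sketch, assuming a hypothetical badwords.txt with one word per line; drop_first_words is also a hypothetical name:

```shell
#!/bin/sh
# Read the word list (first file) into the keys of an array, then print the
# lines of the second file whose first field is not one of those words.
# NR == FNR holds only while awk is reading the first file.
drop_first_words() {
  awk 'NR == FNR { bad[$1]; next } !($1 in bad)' "$1" "$2"
}

# Usage: drop_first_words badwords.txt file
```

One caveat of the NR == FNR idiom: if the word-list file is empty, the second file is also read as a word list and nothing is printed.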
If that's not what you want then edit your question to clarify your requirements and include concise, testable sample input and the expected output given that input.

regex: find strings that do not begin with a certain prefix [duplicate]

This question already has an answer here:
Regular expression for a string that does not start with a sequence
(1 answer)
Closed 9 years ago.
I want to find a word in strings, but only if it isn't preceded by a certain prefix.
For example, I'd like to find all occurrences of APP_PERFORM_TASK, but only if they don't have the prefix CMD_DO("
So:
CMD_DO("APP_PERFORM_TASK") <- OK (I don't need to know about this)
BLAH("APP_PERFORM_TASK") <-- NOT OK, this should match my search.
I tried:
(?!CMD_DO\(")APP_PERFORM_TASK
But that doesn't produce the results I need. What am I doing wrong?

Here's a quick way:
Use the --invert-match (also known as -v) flag to ignore CMD_DO and pipe the results to a second grep that only matches BLAH:
grep -v CMD_DO dummy | grep BLAH

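Try replacing the negative lookahead (?!...) with a negative lookbehind (?<!...) in your regex. A lookahead checks the text that follows the current position, so (?!CMD_DO\(") placed right before APP_PERFORM_TASK only asserts that the text starting there is not CMD_DO(", which is always true; a lookbehind checks the text that precedes it: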
(?<!CMD_DO\(")APP_PERFORM_TASK

Based on your comment, let's concentrate on the command-line tool grep.
Here is a grep solution without the -P switch (Perl-like regex):
grep 'APP_PERFORM_TASK' file | grep -v '^CMD_DO("'
Here is a grep solution using the -P switch and a negative lookbehind:
grep -P '(?<!^CMD_DO\(")APP_PERFORM_TASK' file

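Try this (anchored with ^ so that the lookahead is checked only at the start of the line; without the anchor the regex can simply start matching one character later):
^(?!CMD_DO\(").*APP_PERFORM_TASK.*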

To handle an input line that contains both the desirable and the undesirable forms, like:
CMD_DO("APP_PERFORM_TASK") BLAH("APP_PERFORM_TASK")
you'd need something like this in awk (using GNU awk for gensub()):
awk -v s="APP_PERFORM_TASK" 'gensub("CMD_DO\\(\""s, "", "g") ~ s' file
i.e. remove all of the unwanted occurrences of the string, then test what's left.
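For awks without gensub(), a sketch of the same idea using POSIX gsub() on a copy of the line; filter_tasks is a hypothetical name:

```shell
#!/bin/sh
# Print lines that contain APP_PERFORM_TASK outside a CMD_DO("...") wrapper.
filter_tasks() {
  awk -v s="APP_PERFORM_TASK" '
    # Work on a copy so the original line can still be printed unchanged;
    # delete every CMD_DO("APP_PERFORM_TASK occurrence from the copy.
    { line = $0; gsub("CMD_DO\\(\"" s, "", line) }
    # Print the line only if the string survives the deletion.
    index(line, s)
  ' "$@"
}

printf '%s\n' \
  'CMD_DO("APP_PERFORM_TASK")' \
  'BLAH("APP_PERFORM_TASK")' \
  'CMD_DO("APP_PERFORM_TASK") BLAH("APP_PERFORM_TASK")' | filter_tasks
```

Only the second and third sample lines are printed. The string s is inserted into the regex unescaped, which is fine here because APP_PERFORM_TASK contains no regex metacharacters.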

An awk version
awk '/APP_PERFORM_TASK/ && !/^CMD_DO/' file

Not operator in regex [duplicate]

This question already has answers here:
Negative matching using grep (match lines that do not contain foo)
(3 answers)
Closed 12 months ago.
I have a file and I wish to grep out all the lines that do not start with a timestamp. I tried using the following regex but it did not work:
cat myFile | grep '^(?!\[0-9\]$).*$'
Any other suggestions or something that I might be doing wrong here?

Why not simply use grep's -v option to negate the match:
grep -v "<pattern>" file
For example, to grep all the lines in a shell script that are not comments (that do not start with #, optionally after whitespace), you can use:
grep -v "^\s*#" file.sh

Try this:
cat myFile | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'
This assumes your timestamp has the pattern dddd-dd-dd dd:dd:dd (d being a digit), but you can change it to match your timestamp if it's something else. grep's basic and extended regular expressions don't support \d, so [0-9] is used instead.
Note: unless you're chaining commands, grep pattern file is a simpler syntax.
BTW: Your use of a double-negative makes me unsure if you want the timestamp lines or you want the non-timestamp lines.

You don't need a not operator; just use grep the way it is most easily used, to find a pattern:
grep '^[0-9]' myFile
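Since the question asks for the lines that do not start with a timestamp, the start-of-line pattern can be combined with -v. A sketch, assuming timestamps begin with a YYYY-MM-DD date; non_timestamp_lines is a hypothetical name:

```shell
#!/bin/sh
# Print lines that do NOT start with a YYYY-MM-DD date.
# -E enables the {n} repetition counts; -v inverts the match.
non_timestamp_lines() {
  grep -v -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}' "$@"
}

# Usage: non_timestamp_lines myFile
```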
