grep regular expression [duplicate] - regex

This question already has answers here:
Match empty lines in a file with 'grep'
(3 answers)
Closed 5 years ago.
Hi, I'm stuck on a script that does some text filtering.
The script counts occurrences by running:
cat file | sort | grep -v "^$" | uniq -c | sort -nr | head -20
It is not obvious to me how grep -v "^$" works.
As I understand it, -v inverts the sense of matching, but what it means to invert a pattern made only of the beginning-of-line and end-of-line anchors is not obvious to me.
I tried a few examples, but it is still not clear to me how it works (e.g. it filters spaces but not carriage returns).

It just gets rid of empty lines. "^$" matches lines whose start (^) is immediately followed by their end ($), with nothing in between, and -v prints every line that does not match.
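A minimal sketch of that behaviour, using a made-up five-line sample (the variable names are illustrative only). It suggests why a line holding a space or a carriage return survives grep -v "^$": such a line is not empty, so a whitespace-aware pattern is needed to drop it.

```shell
# Hypothetical sample: "a", an empty line, a line holding one space,
# a line holding only a carriage return, then "b".
sample=$(printf 'a\n\n \n\r\nb\n')

# -v '^$' drops only the truly empty line; -c counts the lines that remain.
kept_strict=$(printf '%s\n' "$sample" | grep -vc '^$')

# Treating whitespace-only lines (space, tab, CR) as blank drops them too.
kept_loose=$(printf '%s\n' "$sample" | grep -vc '^[[:space:]]*$')

echo "strict: $kept_strict lines kept, loose: $kept_loose lines kept"
```

With this sample the first count should be 4 and the second 2.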

Related

Extract word and digit from string using SED [duplicate]

This question already has answers here:
sed: print only matching group
(5 answers)
Closed 2 years ago.
How can I extract TIC-9890 from a
branch name that looks like feature/TIC-9890/some-other-wording?
I am not a sed expert, but I managed to come up with:
echo "feature/TIC-000/random-description" |
sed -n 's/.*\(TIC-[0-9]\{1,\}\).*/\1/'
This seems to work fine if the TIC-\d+ string is in there,
but returns the entire string if that is missing...
However, I need it to return null or empty string if the match isn't present.
You need to add the p flag to the substitution. The -n option turns off sed's automatic printing, so p is required to print the line when a substitution actually happens; lines without a match print nothing.
echo "feature/TIC-000/random-description" | sed -n 's/.*\(TIC-[0-9]\{1,\}\).*/\1/p'
From the sed man page:
-n, --quiet, --silent suppress automatic printing of pattern space
p Print the current pattern space.
Or, as per @anubhava's comment, you could use grep with the -o and -E options:
echo "feature/TIC-000/random-description" | grep -oE 'TIC-[0-9]+'
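A small sketch of the sed approach wrapped in a helper, to show the empty result when no ticket is present. The function name extract_ticket and the second branch name are made up for illustration.

```shell
# Hypothetical helper: prints the ticket id found in a branch name,
# or nothing at all when the branch name has no TIC-<digits> part.
extract_ticket() {
  printf '%s\n' "$1" | sed -n 's/.*\(TIC-[0-9]\{1,\}\).*/\1/p'
}

with_ticket=$(extract_ticket 'feature/TIC-9890/some-other-wording')
without_ticket=$(extract_ticket 'feature/no-ticket-here')

echo "with: '$with_ticket'  without: '$without_ticket'"
```

Because -n suppresses the default output and p fires only after a successful substitution, the second call should yield an empty string.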

Print specific string in a line after matched word in bash [duplicate]

This question already has answers here:
How to extract string following a pattern with grep, regex or perl [duplicate]
(8 answers)
Closed 4 years ago.
Situation
There is a file called test that consists of the following text:
this is the first line
version=1.2.3.4
this is the third line
How can I print, via bash, only:
1.2.3.4
Note: I always want to print whatever follows "version=" up to the end of the line, not search for 1.2.3.4 specifically.
Thank you
Using GNU grep:
grep -Po '^version=\K.*'
-P enables PCRE, -o prints only the matched part rather than the whole line, and \K drops everything matched before it from the reported match.
Using sed:
sed -n 's/^version=\(.*\)/\1/p'
-n disables auto-printing, then the substitution replaces the "version=[...]" line with only its tail, captured by the group. The substitution succeeds only on the second line, which triggers the p flag to print the (transformed) line.
You can use:
grep version file | cut -d\= -f2
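A sketch that runs the sed and cut approaches against the sample file from the question, written to a temporary file here. The cut variant is tightened slightly as an assumption about intent: anchoring on ^version= avoids other lines that merely contain the word, and -f2- keeps the whole value even if it contains another "=".

```shell
# Recreate the three-line sample in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'this is the first line' 'version=1.2.3.4' 'this is the third line' > "$tmp"

# sed: print only the captured text after "version=".
from_sed=$(sed -n 's/^version=\(.*\)/\1/p' "$tmp")

# grep + cut: select the line, then keep every field after the first "=".
from_cut=$(grep '^version=' "$tmp" | cut -d= -f2-)

rm -f "$tmp"
echo "sed: $from_sed  cut: $from_cut"
```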

How to keep just the first 300 characters of every line? [duplicate]

This question already has answers here:
Using sed, how do you print the first 'N' characters of a line?
(6 answers)
Closed 5 years ago.
I want to keep just the first 300 characters of every line. The obvious solution:
sed -E 's/^(.{0,300}).*/\1/'
apparently exceeds some internal regex limit:
RE error: invalid repetition count(s)
Some experimentation shows the repetition count can only go up to 255, at least on my platform (macOS). Python can handle {0,300}, but I'd prefer to do this with normal shell tools, if possible. Any ideas?
PS: Yeah, I know, if I was doing it in Python, I'd do line[:300] and ditch the regex completely.
It doesn't error out for me on GNU sed; see if cut works for you:
$ perl -e 'print "a" x 350' | sed -E 's/^(.{0,300}).*/\1/' | wc -L
300
$ perl -e 'print "a" x 350' | cut -c1-300 | wc -L
300
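A regex-free sketch of the same truncation, which sidesteps any repetition-count limit entirely. The 350-character test line is made up, and awk's substr is shown as one possible alternative to cut; note that some cut implementations count bytes rather than characters for multibyte text.

```shell
# Build a hypothetical 350-character test line without a regex.
long_line=$(printf '%350s' '' | tr ' ' 'a')

# cut selects character positions 1 through 300 of every input line.
via_cut=$(printf '%s\n' "$long_line" | cut -c1-300)

# awk's substr(string, start, length) is an equivalent alternative.
via_awk=$(printf '%s\n' "$long_line" | awk '{ print substr($0, 1, 300) }')

echo "cut: ${#via_cut} characters, awk: ${#via_awk} characters"
```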

regex: find strings that do not begin with a certain prefix [duplicate]

This question already has an answer here:
Regular expression for a string that does not start with a sequence
(1 answer)
Closed 9 years ago.
I want to find a word in strings, but only where it is not preceded by a certain prefix.
For example,
I'd like to find all occurrences of APP_PERFORM_TASK, but only where they do not follow the prefix CMD_DO("
So:
CMD_DO("APP_PERFORM_TASK") <- OK (i don't need to know about this)
BLAH("APP_PERFORM_TASK") <-- NOT OK, this should match my search.
I tried:
(?!CMD_DO\(")APP_PERFORM_TASK
But that doesn't produce the results I need. What am I doing wrong?
Here's a quick way:
Use the --invert-match (also known as -v) flag to ignore CMD_DO and pipe the results to a second grep that only matches BLAH:
grep -v CMD_DO dummy | grep BLAH
Try replacing the negative lookahead (?!...) with a negative lookbehind (?<!...) in your regex:
(?<!CMD_DO\(")APP_PERFORM_TASK
Based on your comment, let's concentrate on the command-line tool grep.
Here is a grep solution that does not need the -P switch (Perl-compatible regex):
grep 'APP_PERFORM_TASK' file | grep -v '^CMD_DO("'
Here is a grep solution using the -P switch and a negative lookbehind:
grep -P '(?<!^CMD_DO\(")APP_PERFORM_TASK' file
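Try this (anchored with ^, so the lookahead is tested at the start of the line rather than at every position):
^(?!CMD_DO\(").*APP_PERFORM_TASK.*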
To handle an input line with both the desirable and undesirable forms like:
CMD_DO("APP_PERFORM_TASK") BLAH("APP_PERFORM_TASK")
you'd need something like this in awk (using GNU awk for gensub()):
awk -v s="APP_PERFORM_TASK" 'gensub("CMD_DO\\(\"" s, "", "g") ~ s' file
i.e. get rid of all of the unwanted occurrences of the string, then test what's left.
An awk version
awk '/APP_PERFORM_TASK/ && !/^CMD_DO/' file
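A sketch comparing two of the approaches on a made-up three-line sample that includes the mixed case. The awk variant here is a portable rewrite using gsub on a copy of the line instead of GNU awk's gensub(); that substitution is an assumption of this sketch, not something taken from the answers.

```shell
sample=$(printf '%s\n' \
  'CMD_DO("APP_PERFORM_TASK")' \
  'BLAH("APP_PERFORM_TASK")' \
  'CMD_DO("APP_PERFORM_TASK") BLAH("APP_PERFORM_TASK")')

# Two-step grep: drops every line that starts with the prefix,
# which also discards the mixed third line.
by_grep=$(printf '%s\n' "$sample" | grep 'APP_PERFORM_TASK' | grep -v '^CMD_DO("')

# Portable awk: delete the prefixed occurrences from a copy of the line,
# then print the line if the word still appears in what is left.
by_awk=$(printf '%s\n' "$sample" |
  awk -v s='APP_PERFORM_TASK' '{ t = $0; gsub("CMD_DO\\(\"" s, "", t) } index(t, s)')

printf 'grep keeps:\n%s\nawk keeps:\n%s\n' "$by_grep" "$by_awk"
```

The grep pipeline should keep only the BLAH line, while the awk version should keep both the BLAH line and the mixed line, so which one fits depends on whether mixed lines can occur in the input.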

Not operator in regex [duplicate]

This question already has answers here:
Negative matching using grep (match lines that do not contain foo)
(3 answers)
Closed 12 months ago.
I have a file and I wish to grep out all the lines that do not start with a timestamp. I tried using the following regex but it did not work:
cat myFile | grep '^(?!\[0-9\]$).*$'
Any other suggestions or something that I might be doing wrong here?
Why not simply use grep's -v option to negate the match, like this:
grep -v "<pattern>" file
Let's say you want to grep all the lines in a shell script that are not commented (do not have # at the start); then you can use:
grep -v "^\s*#" file.sh
Try this:
cat myFile | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'
This assumes your timestamp has the form dddd-dd-dd dd:dd:dd; change the pattern if yours looks different. It uses [0-9] because grep's default syntax does not understand \d (that needs -P).
Note: Unless you're chaining commands, grep pattern file is a simpler syntax.
BTW: Your use of a double-negative makes me unsure if you want the timestamp lines or you want the non-timestamp lines.
You don't need a not operator; just use grep the way it is most easily used, to find a pattern:
grep '^[0-9]' myFile
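A minimal sketch of negative matching for this question, assuming the dddd-dd-dd dd:dd:dd timestamp format mentioned in one of the answers and a made-up three-line log. The variable names are illustrative only.

```shell
# Hypothetical log: two timestamped lines and one continuation line.
log=$(printf '%s\n' \
  '2021-03-04 10:15:00 service started' \
  '    at some.stack.Frame' \
  '2021-03-04 10:15:02 service stopped')

# Assumed timestamp layout, written with POSIX bracket expressions.
ts='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'

# -v inverts the match: keep only lines that do NOT start with a timestamp.
without_ts=$(printf '%s\n' "$log" | grep -Ev "$ts")

# Without -v, the same pattern selects the timestamped lines; -c counts them.
with_ts_count=$(printf '%s\n' "$log" | grep -Ec "$ts")

echo "non-timestamp lines: [$without_ts]  timestamped: $with_ts_count"
```

Keeping the pattern positive and letting -v do the negation avoids lookahead syntax, which grep only accepts with -P.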