Why does grep '\s*' not work while grep '\S*' works?

I am new to shell scripting.
I want to display the lines in a file that start with whitespace or with non-whitespace. grep '\S*' works, but grep '\s*' does not match any line.
'\s' seems to work, though.
My grep version is 3.4 and I am using WSL Ubuntu. Red coloring means the text is matched. I also tried [[:space:]], with the same result.
Can anyone help? Thanks.
test.fa includes
ctatccagcaccagatagcatcattttactttcaagcctagaaattgcac
haha
ok
acttgtatataaaccaaccgaagatgaggattgagagttcatcttggtgg
running result

* means "zero or more repetitions of the preceding expression". So \S* matches zero or more non-spaces while \s* matches zero or more spaces, and putting a ^ in front anchors the match to the start of a line (when the string being compared is a line, as is the case with grep by default).
So in your input file:
Line 1: ctatccagcaccagatagcatcattttactttcaagcctagaaattgcac
Line 2: haha
Line 3:
Line 4: ok
Line 5: acttgtatataaaccaaccgaagatgaggattgagagttcatcttggtgg
^\S* matches the following on each line:
Line 1: ctatccagcaccagatagcatcattttactttcaagcctagaaattgcac
Line 2: the null string before the leading blank
Line 3: the null string that is the whole line
Line 4: the null string before the leading blanks
Line 5: acttgtatataaaccaaccgaagatgaggattgagagttcatcttggtgg
while ^\s* matches the following on each line:
Line 1: the null string before ctatccagcaccagatagcatcattttactttcaagcctagaaattgcac
Line 2: the leading blank
Line 3: the null string that is the whole line
Line 4: the leading blanks
Line 5: the null string before acttgtatataaaccaaccgaagatgaggattgagagttcatcttggtgg
So both regexps match something on every line, and what is colored as matching is the printable (i.e. non-null and non-blank) chars from each matching string.
To display the lines that start with whitespace would be:
grep '^\s'
and to display the lines that start with non-whitespace would be:
grep '^\S'
and to display empty lines would be:
grep -v '.'
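If your grep doesn't support \s/\S then use [[:space:]]/[^[:space:]] instead if it's a POSIX grep, or [ \t]/[^ \t] in any grep, where \t stands for a literal tab character typed inside the brackets (grep itself treats \t inside a bracket expression as a backslash and the letter t, not as a tab).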
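The difference between the starred and unstarred patterns can be checked with a small experiment. This is only a sketch: the sample lines are made-up, shortened stand-ins for the ones in the question, gen is a hypothetical helper, and the POSIX classes are used so that it does not depend on \s/\S support.

```shell
# Hypothetical sample: line 2 has one leading blank, line 3 is empty,
# line 4 has two leading blanks.
gen() { printf 'ctat\n haha\n\n  ok\nactt\n'; }

# '*' permits zero repetitions, so the pattern matches on every line.
all_count=$(gen | grep -c '^[[:space:]]*')

# Without '*', one whitespace character is required at the start.
ws_lines=$(gen | grep '^[[:space:]]')

# Lines that start with a non-whitespace character.
non_ws_lines=$(gen | grep '^[^[:space:]]')

# Empty lines: those on which '.' finds no character at all.
empty_count=$(gen | grep -c -v '.')

printf 'lines matched by the starred pattern: %s\n' "$all_count"
printf 'lines starting with whitespace:\n%s\n' "$ws_lines"
printf 'lines starting with non-whitespace:\n%s\n' "$non_ws_lines"
printf 'empty lines: %s\n' "$empty_count"
```

The starred pattern selects all five lines, which is why it cannot be used to tell the two kinds of line apart.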

Related

regex: match lines that don't end with '}' and that match one of three words [duplicate]

I have this text:
NBA:red this line has a tab and ends with a curly braces}
some random text qwertyuiop
NBA:green this line must match
NBA:red this line has a tab and must match
NBA:response this line has spaces and must match
NBA:blue this line has a tab and ends with a curly braces}
some random text qwertyuiop
NBA:blue this line has spaces at the begining and ends with curly braces}
random text qwertyuiop
this line must not match}
this line must not match }
I want to match the lines that contain 'NBA:' followed by the word 'red', 'green' or 'blue', and that don't end with a curly brace '}'. This command matches only on 'NBA:' and one of the three words:
$ egrep 'NBA:(red|green|blue)' myfile.txt
NBA:red this line has a tab and ends with a curly braces}
NBA:green this line must match
NBA:red this line has a tab and must match
NBA:blue this line has a tab and ends with a curly braces}
NBA:blue this line has spaces at the begining and ends with curly braces}
But I don't know how to match the lines that don't end with '}'.
I tried this, but it doesn't work:
egrep 'NBA:(red|green|blue)*[^}]$' myfile.txt
But this works:
egrep 'NBA:(red|green|blue)' myfile.txt | egrep '[^}]$'
NBA:green this line must match
NBA:red this line has a tab and must match
I want to do it in just one command.
You were just one character off. This should work fine:
egrep 'NBA:(red|green|blue).*[^}]$'
#                          ^
#                          Note this bit.
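* doesn't mean the same thing in regex that it does in glob patterns. It means zero or more of the preceding item. In your attempt the preceding item was the group (red|green|blue), so nothing could consume the rest of the line before [^}]$; in this answer the preceding item is ., which matches any character.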
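The effect of the missing dot can be seen side by side in a short sketch. The four input lines are made-up stand-ins for the question's file, gen is a hypothetical helper, and grep -E is used in place of egrep because recent GNU grep releases warn that egrep is obsolescent.

```shell
# Hypothetical input modelled on the question's file.
gen() {
  printf '%s\n' \
    'NBA:red this line ends with a curly brace}' \
    'NBA:green this line must match' \
    'NBA:response this line must not match' \
    'this line must not match}'
}

# Original attempt: '*' repeats the group, so only ONE character may
# follow the color words before the end of the line. Nothing matches.
broken=$(gen | grep -E 'NBA:(red|green|blue)*[^}]$' || true)

# Corrected: '.*' consumes the rest of the line, and '[^}]$' then
# requires that the last character is not a closing brace.
fixed=$(gen | grep -E 'NBA:(red|green|blue).*[^}]$' || true)

printf 'broken pattern printed: [%s]\n' "$broken"
printf 'fixed pattern printed:  [%s]\n' "$fixed"
```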

Delete line containing a particular word as well as the previous line in Notepad++

I would like to get the code that deletes every line containing the string /movie/ and the previous line (the / character is included in /movie/).
Example :
Before Code :
#EXTINF:-1,Wreck-It Ralph
http://p5.giffy.be:8080/movie/RghyHCIE4i/SDrQatrZkx/104880.mp4
#EXTINF:-1,Wrinkle-Free
http://p5.giffy.be:8080/movie/RghyHCIE4i/SDrQatrZkx/105060.mp4
#EXTINF:-1,DR | TELEMICRO 5
http://p5.giffy.be:8080/RghyHCIE4i/SDrQatrZkx/99633
#EXTINF:-1,Wrong Mistake - Short Movie
http://p5.giffy.be:8080/movie/RghyHCIE4i/SDrQatrZkx/106840.mp4
#EXTINF:-1,DR | TELESISTEMA 11
http://p5.giffy.be:8080/RghyHCIE4i/SDrQatrZkx/99632
#EXTINF:-1,Wreck-It
http://p5.giffy.be:8080/movie/RghyHCIE4i/SDrQatrZkx/104707.mp4
#EXTINF:-1,DR | TELEUNIVERSO
http://p5.giffy.be:8080/RghyHCIE4i/SDrQatrZkx/99631
After Code :
#EXTINF:-1,DR | TELEMICRO 5
http://p5.giffy.be:8080/RghyHCIE4i/SDrQatrZkx/99633
#EXTINF:-1,DR | TELESISTEMA 11
http://p5.giffy.be:8080/RghyHCIE4i/SDrQatrZkx/99632
#EXTINF:-1,DR | TELEUNIVERSO
http://p5.giffy.be:8080/RghyHCIE4i/SDrQatrZkx/99631
You can use the following regular expression:
^.*?\r\n.*?\/movie\/.*?(\r\n|$)
Step-by-step:
Open Find and Replace with Ctrl+h.
Press Alt+f to focus on Find what.
Enter the above regex.
Press Alt+g to enable regular expression mode. Ensure ". matches newline" is off.
Press Alt+a to Replace All.
How it works:
^           # anchor to beginning of line
.*?         # lazily match zero or more characters
\r\n        # match carriage return and line feed
.*?         # lazily match zero or more characters
\/movie\/   # match literal /movie/
.*?         # lazily match zero or more characters
(\r\n|$)    # match carriage return and line feed or EOL
Another option is to match the first line followed by a Unicode newline sequence \R, then match the second line containing /movie/, and finally match \R at the end.
Find what:
^.*\R.*/movie/.*\R
That will match:
^ Start of a line
.* Match any character except a newline, zero or more times
\R Match a Unicode newline sequence
.*/movie/.* Match the next line, which must contain /movie/
\R Match a Unicode newline sequence
Replace with:
Leave empty

perl: regex help needed to replace text within two words with certain conditions

This question seems to be the same as "Negative lookahead with awk or sed not possible but only perl supports", but it is not the same.
In this question I want to know how to handle more conditions in my search.
I have the following text (sample.txt)
Condition 1: contains PQXY in between QWWK and KWWQ so not wanted
QWWK erly jointure PQXY In said devonshire
Drift allow green son walls years for blush.
acceptance son KWWQ
Condition 2: QWWK does not start at the beginnig of the line, so not wanted
other QWWK get him his projection ar saw fat sudden edward
sociable felicity supplied mr. September
ay now many. Alte KWWQ
Condition 3: KWWQ is not at the end of the line, so not wanted
QWWK ble formerly six but hand
r way now many. Alteration you
occasion ham for KWWQ other
Condition 4: QWWK begins at the starting and KWWQ ends at the last and there is no PQXY, so this is what wanted
QWWK n zealously arranging fr
eal park so rest we on. Ignorant d
he possession insensible sympathi KWWQ
.......
Kindly note the words QWWK, PQXY and KWWQ.
My text spans multiple lines.
I want to match the text between QWWK and KWWQ.
Condition 1: it should not contain the word PQXY in between
Condition 2: QWWK should be at the beginning of a line
Condition 3: KWWQ should be at the end of a line
In Sublime Text I match using:
(?s)(^QWWK(?:(?!QWWK).)*?KWWQ\n)
and it matches condition 4
QWWK n zealously arranging fr
eal park so rest we on. Ignorant d
he possession insensible sympathi KWWQ
So it does not match condition 1, condition 2 or condition 3.
I am trying to replace condition 4 with sometext using perl:
$ perl -0777pe 's/^QWWK(?!QWWK).*?KWWQ\n/sometext/gs' sample.txt > sample_mod.txt
But condition 4 was not replaced in sample_mod.txt.
I also tried:
$ perl -0777pe 's/\nQWWK(?!QWWK).*?KWWQ\n/sometext/gs' sample.txt > sample_mod.txt
It removes both condition 1 and condition 4.
/m alters the definition of ^ and $ to be start of line and end of line respectively.
What you asked for:
/^QWWK(?:(?!PQXY).)*KWWQ$/msg
What you probably want:
/^QWWK(?:(?!QWWK|PQXY|KWWQ).)*KWWQ$/msg
Optimized: (Reduces the number of lookarounds performed)
/
   ^ QWWK
   [^KPQ]*+
   (?: (?: K (?!WWQ)
       |   P (?!QXY)
       |   Q (?!WWK)
       )
       [^KPQ]*+
   )*+
   KWWQ $
/xmg
Through some trial and error, I came up with this regex:
/^QWWK(?!.*PQXY)(?!.*KWWQ[^\n])(.*?)KWWQ$/gms
With the /m modifier, ^ matches the beginning of any line and $ matches the end of any line.
With the /s modifier, the . metacharacter matches any character, including newline characters.
/^QWWK .../m
Find a substring that begins with QWWK at the start of a line
/... KWWQ$/m
and ends with KWWQ at the end of a line
/^QWWK(?!.*PQXY)/s
The match fails if QWWK is followed by any number of characters (including new lines) and the text PQXY.
/^QWWK ... (?!.*KWWQ[^\n]) ... /s
The match also fails if QWWK is followed by any number of characters, the text KWWQ, and any character that is not a new line.
/^QWWK(.*?)KWWQ$/s
Put any text between QWWK and KWWQ, including new lines, in a capture group. Use the non-greedy modifier ? so that the regexp will not try to capture from an early QWWK observation to the latest possible KWWQ observation.
I read the post "Multiline search replace with Perl".
I tried the following and it seems to work:
$ perl -0pe 's/^QWWK(?:(?!PQXY).)*?KWWQ\n/sometext/gms' sample.txt > sample_mod.txt
With this, only condition 4 is replaced and the others remain intact.

grep not the beginning of a line

I want to find all lines in a file containing a number, but not at the beginning of a line. I tried the following:
grep -E '[^^][1-9]?[0-9]+' test.txt
However, it does not work: this expression also matches lines that start with a number of two or more digits. As I understand it, [^^] does not mean "any symbol except the beginning of a line". Why is that, and how do I write this correctly?
Edited according to comment:
This regex should do it: it matches lines that start with one or more non-digit characters and then finds one or more digits.
^[^0-9]+?\d+
You will need to set the 'multiline' option if you check multiple lines at one time. Note that \d is not part of grep -E syntax; with GNU grep it needs -P, or use [0-9] instead.
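Your issue is the [^^] part of your regex. That is a negated character class (a ^ as the first character inside [ ] negates what is inside the brackets), so [^^] matches any single character other than a literal ^. It consumes the first digit of a number at the start of the line, which is why lines starting with numbers of two or more digits still match.
Instead, I think you are looking for ^ outside of the brackets to state 'start of the line' and then a negated character class of [^0-9] for something other than a digit at the start of the line: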
$ echo "1 line
line 2
3 line
line 4
no num" | grep '^[^0-9]'
line 2
line 4
no num
Then add .* for 'anything of any length' and [0-9] for at least one digit to filter for lines that have a digit in the line:
$ echo "1 line
line 2
3 line
line 4
no num" | grep '^[^0-9].*[0-9]'
line 2
line 4
Or, if you want to be locale aware, you can use POSIX character classes with the same result:
$ echo "1 line
line 2
3 line
line 4
no num" | grep '^[^[:digit:]].*[[:digit:]]'
line 2
line 4
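To see what [^^] really does, the bracket version and the anchored version can be compared on the same input. This is a sketch: the four lines are made up, gen is a hypothetical helper, and the line '^7' is included only to show that [^^] rejects a literal caret.

```shell
# Hypothetical input, including a line that starts with a literal caret.
gen() { printf '%s\n' '1 line' '12 line' 'line 2' '^7'; }

# [^^] is "any one character except a literal ^". In '12 line' it
# consumes the '1' and [0-9]+ then matches the '2', so a line that
# starts with a number is still selected.
bracket=$(gen | grep -E '[^^][0-9]+' || true)

# ^ outside the brackets anchors to the start of the line, [^0-9]
# demands a non-digit there, and .*[0-9] demands a digit later on.
anchored=$(gen | grep -E '^[^0-9].*[0-9]' || true)

printf 'bracket version:\n%s\n' "$bracket"
printf 'anchored version:\n%s\n' "$anchored"
```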

replace in multiline - refer to content for replacement

I need the following:
input:
NAME-LIST:
name1
<any text>
name_to_be_changed;
NAME-LIST:
name3
<any text>
name_to_be_changed;
output: replace "name_to_be_changed" by first name in the block
NAME-LIST:
name1
<any text>
name1;
NAME-LIST:
name3
<any text>
name3;
result:
I would prefer a perl one-liner :-)
I suggest a search expression similar to what Sam already posted:
(NAME-LIST:[\t ]*[\r\n]+)([^\r\n]+)([\r\n]+[^\r\n]*[\r\n]+)name_to_be_changed;
The replace string is \1\2\3\2; or $1$2$3$2;
Each pair of opening and closing round brackets specifies a marking group. There are three such marking groups in the search expression.
[\t ]* makes it possible that there are trailing spaces or tabs after fixed string NAME-LIST: at end of first line of a block.
[\r\n]+ matches 1 or more carriage returns or linefeeds. That is similar to \v as used by Sam but does not match other vertical whitespaces like formfeed.
[^\r\n]+ matches 1 or more characters which are neither a carriage return nor a linefeed. That is like . if the matching behavior for a dot is defined as matching all characters except line terminators.
[^\r\n]* matches 0 or more characters which are neither a carriage return nor a linefeed. So <any text> can also be no text at all, which means the third line can also be a blank line.
The 3 strings found by the expressions in the marking groups are backreferenced by \1, \2 and \3, or $1, $2 and $3. Only the second one is backreferenced twice, to copy the string from line 2 to line 4 while keeping the other 3 lines unchanged.
Using a perl one-liner
perl -00 -pe 's/NAME-LIST:\n(.*)\n.*\n\K.*/$1;/g' file.txt
Explanation:
Switches:
-00: Paragraph mode
-p: Creates a while(<>){...; print} loop over each record in your input file (each paragraph, because of -00).
-e: Tells perl to execute the code on command line.
First of all, thanks for your input.
Unfortunately I could not make use of either of your suggested solutions, but I have found my own:
perl -00 -pe 's/(NAME-LIST:\s+)(\w+)(.*?)\w+;/$1$2$3$2;/gs'
\s+ = 1 or more whitespace characters (space, newline, tab, ...)
\w+ = 1 or more word characters (letters, digits or underscore)
The /gs at the end is important:
g = global (do the replacement more than once, otherwise only the first name will be replaced)
s = treat the string as a single line, so that . also matches newlines