I'm looking for the correct regex to find lines with fewer than n occurrences of the TAB (\t) character.
I tried this one but it finds nothing:
^.*(?:\t.*){0,20}\r\n
Your pattern contains a .* at the start (after ^, the start of string/line anchor). It matches zero or more chars other than line break chars, as many as possible, so it can consume any number of tabs. Then, (?:\t.*){0,20} matches zero to twenty occurrences of a tab followed by zero or more chars other than line break chars, again as many as possible.
In the end, the regex does not restrict the number of tabs on a line at all.
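To see the problem concretely, here is a small sketch in Python. Python's re module is an assumption (the question does not name the tool), the limit is lowered from 20 to 2 for brevity, and the trailing \r\n is replaced with $ so a single line can be tested.

```python
import re

# Same shape as the pattern in the question, with the limit lowered to 2.
loose = re.compile(r"^.*(?:\t.*){0,2}$")

line_with_four_tabs = "a\tb\tc\td\te"

# The leading .* swallows the whole line, tabs included, so the
# {0,2} group is satisfied with zero repetitions.
loose_match = loose.match(line_with_four_tabs)
print(loose_match is not None)
```

The line has four tabs, yet the pattern still matches it in full.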
To match lines having no more than N tabs, you need
^(?!(?:[^\t\r\n]*\t){N+1}).*
where N is your occurrence count. So, if you want to match (and later remove, since you have \r\n at the end of the regex) lines having no more than 20 tabs, you can use
^(?!(?:[^\t\r\n]*\t){21}).*\R?
Details:
^ - start of string/line
(?!(?:[^\t\r\n]*\t){21}) - a negative lookahead that fails the match if, immediately to the right of the current location, there are twenty-one occurrences of zero or more chars other than CR, LF and TAB followed by a TAB char
.* - the rest of the line
\R? - an optional line break sequence (CRLF, LF or CR).
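As a rough illustration, the lookahead can be tried out in Python. This is only a sketch under a few assumptions: Python's re module does not support \R, so the optional line break is left out; the helper name at_most_n_tabs is made up for this example; and the sample text uses LF line endings.

```python
import re

def at_most_n_tabs(n):
    # The negative lookahead rejects a line as soon as n + 1 tabs
    # can be found on it, so only lines with at most n tabs match.
    return re.compile(r"^(?!(?:[^\t\r\n]*\t){%d}).*" % (n + 1), re.MULTILINE)

text = "a\tb\nc\td\te\tf\nno tabs here"

pattern = at_most_n_tabs(2)
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
```

With a limit of 2, the middle line (three tabs) is skipped while the other two lines are returned.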
Trying to send multiline Kafka log from RSYSLOG to Fluentd.
(?<date>\[.*?\]) (.*?) ((.|\n*)*)
Here is the link:
https://regex101.com/r/iFHyTi/1
But my regex treats the next timestamped entry as part of the same match. The requirement is to stop before the next timestamp starts.
You can match all subsequent lines that do not start with a timestamp-like pattern ([ followed by a date):
(?<date>\[[^][]*]) ([A-Z]+) (.*(?:\n(?!\[\d{4}-\d\d-\d\d).*)*)
Details
(?<date>\[[^][]*]) - Group "date": [, zero or more chars other than square brackets, ]
- space
([A-Z]+) - Group 2: one or more uppercase ASCII letters
- space
(.*(?:\n(?!\[\d{4}-\d\d-\d\d).*)*) - Group 3:
.* - any zero or more chars other than line break chars, as many as possible
(?:\n(?!\[\d{4}-\d\d-\d\d).*)* - zero or more sequences of
\n(?!\[\d{4}-\d\d-\d\d) - a newline (LF) char not followed by [, four digits, -, two digits, -, two digits
.* - any zero or more chars other than line break chars, as many as possible
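The behaviour described above can be checked with a short Python sketch. Several things here are assumptions: Python spells the named group as (?P<date>...), the bracket class is written as [^\]\[] to avoid ambiguity, and the sample log lines are invented for illustration (they are not taken from the question).

```python
import re

LOG_ENTRY = re.compile(
    r"(?P<date>\[[^\]\[]*\]) ([A-Z]+) (.*(?:\n(?!\[\d{4}-\d\d-\d\d).*)*)"
)

# Made-up sample: one entry with a two-line stack trace, then a second entry.
log = (
    "[2021-03-01 10:00:00,123] ERROR Something failed\n"
    "java.lang.IllegalStateException: boom\n"
    "\tat com.example.Foo.bar(Foo.java:42)\n"
    "[2021-03-01 10:00:01,456] INFO Recovered"
)

entries = [
    (m.group("date"), m.group(2), m.group(3)) for m in LOG_ENTRY.finditer(log)
]
for date, level, message in entries:
    print(date, level, repr(message))
```

The continuation lines stay attached to the first entry because the lookahead only refuses a newline that is followed by a new timestamp.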
What does a regex look like for
Input:
Rood Li-Ion 12 G6
Match:
"Rood" "Li-Ion" "G6"
1. I tried
\b[\w-]+\b /g
But that matches the "12" also!
2. I tried
/([0-9]+)?[a-zA-Zê]/
But that didn't match G6.
I want all words, even if they have a number in them, but I don't want number-only tokens to match. How is this possible? Whitespace must not be part of the match either.
"Rood Li-Ion 12 G6" shall become 3 strings of "Rood","Li-Ion","G6"
You can use
(?<!\S)(?!\d+(?!\S))\w+(?:-\w+)*(?!\S)
It matches chunks between whitespace or the start/end of the string, and only when the non-whitespace chunk does not consist solely of digits.
Also, unlike your original regex, it won't match chunks containing a run of consecutive hyphens, such as a--b.
Details
(?<!\S) - a left whitespace boundary
(?!\d+(?!\S)) - a negative lookahead that fails the match if the chunk immediately to the right consists only of digits (one or more digits followed by whitespace or the end of string)
\w+(?:-\w+)* - one or more word chars followed with zero or more repetitions of - and one or more word chars
(?!\S) - a right whitespace boundary
This should suit your needs:
\b[\w-]*[a-zA-Z][\w-]*\b
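Both suggestions can be compared side by side in Python (an assumption, as the question only shows a JavaScript-style /g flag; the variable names are made up for this sketch).

```python
import re

text = "Rood Li-Ion 12 G6"

# First answer: whitespace boundaries plus a "not digits only" check.
whitespace_bounded = re.compile(r"(?<!\S)(?!\d+(?!\S))\w+(?:-\w+)*(?!\S)")
# Second answer: require at least one ASCII letter somewhere in the token.
needs_a_letter = re.compile(r"\b[\w-]*[a-zA-Z][\w-]*\b")

first = whitespace_bounded.findall(text)
second = needs_a_letter.findall(text)
print(first, second)

# The two patterns differ on tokens with consecutive hyphens.
first_hyphens = whitespace_bounded.findall("a--b")
second_hyphens = needs_a_letter.findall("a--b")
print(first_hyphens, second_hyphens)
```

On the sample input both patterns return the same three words; they only diverge on inputs such as a--b, which the first pattern rejects and the second accepts.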
I am trying to remove the last word from each line if the line contains more than one word.
If a line has only one word, then print it as is; no need to delete it.
Say these are the lines:
address 34 address
value 1 value
valuedescription
size 4 size
From the lines above I want to remove the last word from each line except the 3rd, as it has only one word, using a regexp.
I tried the regexp below, but it removes single-word lines as well:
$_ =~ s/\s*\S+\s*+$//;
Need your help for the same.
You can use:
$_ =~ s/(?<=\w)\h+\w+$//m;
Explanation:
(?<=\w): Lookbehind to assert that we have at least one word char before last word
\h+: Match 1+ horizontal whitespaces
\w+: match a word with 1+ word characters
$: End of line
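For readers without Perl at hand, the same idea can be sketched in Python. This is an adaptation, not the original code: Python's re module has no \h, so [ \t]+ stands in for horizontal whitespace, and the whole sample text is processed in one call with re.MULTILINE.

```python
import re

text = "address 34 address\nvalue 1 value\nvaluedescription\nsize 4 size"

# (?<=\w) requires a word char before the whitespace, so a line that
# consists of a single word has nothing to remove.
trimmed = re.sub(r"(?<=\w)[ \t]+\w+$", "", text, flags=re.MULTILINE)
print(trimmed)
```

The third line is left unchanged because there is no whitespace preceded by a word character on it.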
Try this regex:
^(?=(?:\w+ \w+)).*\K\b\w+
Replace each match with a blank string
OR
^((?=(?:\w+ \w+)).*\b)\w+
and replace each match with \1
Explanation(1st Regex):
^ - asserts the start of the line
(?=(?:\w+ \w+)) - positive lookahead to check if the string has 2 words present in it
.* - If the above condition satisfies, then match 0+ occurrences of any character(except newline) until the end of the line
\K - forget everything matched so far
\b - backtrack to find the last word boundary
\w+ - matches the last word
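The capture-group variant can be tried in Python, which is an assumption here (the \K variant is not shown because Python's re module does not support \K). Note that this approach keeps the whitespace that preceded the removed word.

```python
import re

text = "address 34 address\nvalue 1 value\nvaluedescription\nsize 4 size"

# Group 1 holds everything up to the last word boundary; the final
# \w+ is dropped by replacing the whole match with group 1.
result = re.sub(r"^((?=(?:\w+ \w+)).*\b)\w+", r"\1", text, flags=re.MULTILINE)
print(repr(result))
```

If the trailing space is unwanted, the lines can be stripped afterwards.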
A single word with no whitespace matches your regex because you've used \s* both before and after the \S+, and \s* matches an empty string.
You could use $_ =~ s/^(.*\S)\s+(\S+)$/$1/;
[Explanation: Match the RegEx if the line contains some number of characters ending with a non-whitespace (stored in $1), followed by 1 or more white-space characters, followed by 1 or more non-white-space characters. If there is a match, replace it all with the first part ($1).]
Though you might want to trim leading/trailing whitespace if you think it might contain any - depends on what you want to happen in those cases.
Using Regex find/replace in Notepad++, how can I remove all spaces from a line if the line starts with 'CHAPTER'?
Example Text:
CHAPTER A B C
Once upon a time.
What I want to end up with:
CHAPTERABC
Once upon a time.
Incorrect code is something like:
(?<=CHAPTER)( )(?<=\r\n)
So 'CHAPTER' needs to stay and the search should stop at the first line break.
You may use a \G based regex to only match a line that starts with CHAPTER and then match only consecutive non-whitespace and whitespace chunks up to the linebreak while omitting the matched non-whitespace chunks and removing only the horizontal whitespace:
(?:^CHAPTER|(?!^)\G)\S*\K\h+
Details:
(?:^CHAPTER|(?!^)\G) - CHAPTER at the start of a line (^CHAPTER) or (|) the end of the previous successful match ((?!^)\G; as \G can also match the start of a line, we use the restricting negative lookahead)
\S* - zero or more non-whitespace symbols
\K - a match reset operator forcing the regex engine to omit the text matched so far (thus, we do not remove CHAPTER or any of the non-whitespace chunks)
\h+ - horizontal whitespace (1 or more occurrences) only
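Outside Notepad++, the same result can be reached with a different technique. The sketch below uses Python, whose re module supports neither \G nor \K, so it swaps in a two-step approach: match each whole CHAPTER line, then strip spaces and tabs inside that line with a callback. The function name strip_chapter_spaces is hypothetical.

```python
import re

def strip_chapter_spaces(text):
    # Find each line starting with CHAPTER, then delete spaces and tabs
    # inside that line only; other lines are left untouched.
    return re.sub(
        r"^CHAPTER.*$",
        lambda m: re.sub(r"[ \t]+", "", m.group(0)),
        text,
        flags=re.MULTILINE,
    )

result = strip_chapter_spaces("CHAPTER A B C\nOnce upon a time.")
print(result)
```

This is not a translation of the \G pattern, only an alternative that yields the same output on the example text.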
In the following expression:
if (($$_ =~ /^.+:\s*\#\s*abcd\s+XYZ/)
Where is $$_ taken from?
The right side of the expression means to match one or more characters followed by a colon, followed by zero or more spaces, followed by #, followed by one or more spaces, followed by 'abcd', followed by zero or more spaces, followed by 'XYZ'?
You have the last "one or more" and "zero or more" reversed from what the regex actually does.
$$_ dereferences the scalar reference in $_.
Concerning 2., your explanation of the regex is not entirely correct.
/^.+:\s*#\s*abcd\s+XYZ/
means one or more characters (starting at the beginning of the string) followed by a colon, followed by zero or more whitespace characters, followed by one hash character, followed by zero or more whitespace characters, followed by 'abcd', followed by one or more whitespace characters, followed by 'XYZ'.
As for pt. 2:
Line beginning with (^) one or more characters (.+), colon (:), zero or more whitespace characters (\s*), a hash (\#), zero or more whitespace characters (\s*), the string "abcd" (abcd), one or more whitespace characters (\s+), then the string "XYZ" (XYZ).
(emphasis added on discrepancies.) Do note that there is no anchor on the end of line ($), thus this only concerns the beginning.
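The reading above can be verified with a few test strings. The sketch uses Python rather than Perl (an assumption made so the example is easy to run; the $$_ dereference has no counterpart here), and the sample strings are invented.

```python
import re

pattern = re.compile(r"^.+:\s*#\s*abcd\s+XYZ")

# Expected outcome for each made-up sample string.
samples = {
    "key: # abcd XYZ": True,           # typical case
    "key:#abcd XYZ": True,             # \s* allows zero whitespace
    "key: # abcdXYZ": False,           # \s+ needs at least one whitespace
    ": # abcd XYZ": False,             # .+ needs at least one char before the colon
    "key: # abcd XYZ and more": True,  # no $ anchor, so trailing text is fine
}

results = {s: pattern.search(s) is not None for s in samples}
for s, matched in results.items():
    print(repr(s), matched)
```

The last sample shows the effect of the missing end anchor: anything may follow XYZ.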
Have a look at this site
Here is the given explanation of your regex:
Token   Meaning
^       Matches beginning of input. If the multiline flag is set to true, also matches immediately after a line break character.
.+      Matches any single character except newline characters. The + quantifier causes this item to be matched 1 or more times (greedy).
:       :
\s*     Matches a single white space character. The * quantifier causes this item to be matched 0 or more times (greedy).
\#      #
\s*     Matches a single white space character. The * quantifier causes this item to be matched 0 or more times (greedy).
abcd    abcd
\s+     Matches a single white space character. The + quantifier causes this item to be matched 1 or more times (greedy).
XYZ     XYZ