How to clear lines after the last regex match - regex

I have a huge log of records that I need to turn into a table.
Each line has a record, preceded by date and time, something like this:
27/11/2019 16:35 - i don't need this
28/11/2019 17:25 - don't need this either
30/11/2019 11:33 - stuff i'm looking for
01/12/2019 08:11 - stuff that i'm also looking for
03/11/2019 09:39 - don't need this
I want to remove from the file all the lines that I don't need.
I'm able to clear most of the lines that I don't want if I use the following regex and substitution patterns (in Notepad++, with the ". matches newline" option enabled):
.+?(?<datetime>[\d\/]+\s[\d:]+)\s-\s(?<mystuff>stuff[^\n]+)
'${datetime};${mystuff}
However, I can't clear the lines after the last match. How could I do so?

You may use
Find What: ^(?:.+?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?)
Replace With: (?{1}$1;$2)
Details
^ - start of a line
(?:.+?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?) - match either
.+? - any 1+ chars, as few as possible
([\d/]+\h[\d:]+) - Group 1: one or more digits or /, a horizontal whitespace, one or more digits or :
\h-\h - a horizontal whitespace, - and a hor. whitespace
(stuff.*) - Group 2: stuff and the rest of the line
| - or
.* - any 0+ chars other than linebreak chars
\R? - an optional line break sequence.
The (?{1}$1;$2) replacement pattern only replaces with $1;$2 if Group 1 matches.
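Outside Notepad++, the same filtering can be sketched in Python. This is only an illustration under a few assumptions: the sample log from the question is used as input, the pattern is applied one line at a time, and `[ \t]` stands in for `\h`, which Python's `re` module does not support. The names `log`, `wanted` and `rows` are made up for the example.

```python
import re

# Sample log from the question.
log = """27/11/2019 16:35 - i don't need this
28/11/2019 17:25 - don't need this either
30/11/2019 11:33 - stuff i'm looking for
01/12/2019 08:11 - stuff that i'm also looking for
03/11/2019 09:39 - don't need this"""

# Group 1: date and time, Group 2: the record starting with "stuff".
wanted = re.compile(r"^([\d/]+[ \t][\d:]+)[ \t]-[ \t](stuff.*)$")

rows = []
for line in log.splitlines():
    m = wanted.match(line)
    if m:  # lines that do not match are simply dropped
        rows.append(f"{m.group(1)};{m.group(2)}")

print("\n".join(rows))
```

Because every line is tested on its own, the lines after the last match are dropped in the same way as the lines before it.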

Related

RegEx string to find two strings and delete the rest of the text in the file including lines that don't contain the strings [duplicate]

I need to do a find-and-delete on a text file with Notepad++.
I want to use a regex to find variations of thban..... The variable always has at most 5 characters after it (see the dots).
My search string hits the last line, but it matches the whole line. I just want the word preserved.
Once this works, I also want to keep the words containing C3.....
The rest of the text file can be deleted.
It should also be case-insensitive.
(?!thban\w+).*\r?\n?
THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**
This should result in
THBANES900 C3950
THBAN
THBANES901 C3850
THBANMP900
thbanes900
Maybe just capture those words of interest instead of replacing everything else? In Notepad++ search for pattern:
^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+
See the Online Demo
^ - Start string anchor.
.*\b - Any character other than newline zero or more times up to a word boundary.
( - Open 1st capture group.
thban\S{0,5} - Match "thban" and zero to 5 non-whitespace chars.
) - Close 1st capture group.
(?: - Open non-capturing group.
.* - Any character other than newline zero or more times.
( - Open 2nd capture group.
\sC3\w+ - A whitespace character, then "C3" and one or more word characters.
) - Close 2nd capture group.
)? - Close non-capturing group and make it optional.
.* - Any character other than newline zero or more times.
$ - End string anchor.
| - Alternation (OR).
.+ - Any character other than newline once or more.
Replace with:
$1$2
After this, you may end up with empty lines, which you can swiftly remove using the built-in option. I'm unaware of the English terms for these buttons.
I'm also not sure what the case-sensitivity checkbox is called in English (it should be "Match case"), but make sure it is not ticked so that the search ignores case.
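For checking the pattern outside the editor, here is a rough Python sketch of the same capture-and-replace idea. It assumes the sample text from the question; `re.I` plays the role of the unticked case checkbox and `re.M` makes `^` and `$` work per line. The helper `keep_words` is a hypothetical name.

```python
import re

# Sample text from the question.
text = """THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**"""

pattern = re.compile(r"^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+", re.I | re.M)

def keep_words(m):
    # Equivalent of "$1$2": groups that did not take part count as empty.
    return (m.group(1) or "") + (m.group(2) or "")

result = pattern.sub(keep_words, text)

# Drop the empty lines that lines without "thban" would leave behind.
result = "\n".join(line for line in result.splitlines() if line)
print(result)
```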
You may use
Find What: (?|\b(thban\S{0,5})|\s(C3\w+))|(?s:.)
Replace With: (?1$1\n:)
Details
(?| - start of a branch reset group:
\b(thban\S{0,5}) - Group 1: a word boundary, then thban and any 0 to 5 non-whitespace chars
| - or
\s(C3\w+) - a whitespace char, and then Group 1: C3 and one or more word chars
) - end of the branch reset group
| - or
(?s:.) - any one char (including line break chars)
The replacement is
(?1 - if Group 1 matched,
$1\n - Group 1 value with a newline
: - else, replace with empty string
) - end of the conditional replacement pattern
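Python's built-in `re` module has no branch reset group `(?|...)` and no conditional replacement, so a direct port is not possible. A loose sketch of the same idea, assuming the question's sample text, is to collect the wanted words with `findall` and join them with newlines. The names `token` and `words` are invented for the example.

```python
import re

# Sample text from the question.
text = """THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**"""

# Two ordinary groups instead of one branch-reset group.
token = re.compile(r"\b(thban\S{0,5})|\s(C3\w+)", re.I)

# findall returns (group1, group2) tuples; the group that did not match is "".
words = [a or b for a, b in token.findall(text)]
print("\n".join(words))
```

As with the Notepad++ replacement, every kept word ends up on its own line and everything else is discarded.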

How to replace text starting with pattern?

I have this text opened in notepad++ 7.5.2 text editor :
- Note 1 message={"one":1}]
- Note 5 message={"two":2}]
- Note 2 message={"three":3}]
- Note 7 message={"four":4}]
For each line, I want to keep only the text between the curly brackets { }, including the brackets themselves. I tried the regex - Note.* message= on https://regex101.com/ and it works. I am able to find the matching lines in Notepad++, but I am not able to replace the matches with nothing.
How do I do the replacement?
You may use
^- Note.* message=(.*)]$
Replace with $1. See the regex demo.
Details
^ - start of a line
- Note - - Note text
.* - any 0+ chars other than line break chars, as many as possible
message= - message= text
(.*) - Capturing group 1 ($1): any 0+ chars other than line break chars, as many as possible
] - a ] char
$ - end of a line.
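The same find-and-replace can be tried in Python as a quick check. This is a minimal sketch that assumes the four sample lines from the question; `re.M` is needed so that `^` and `$` match at every line, and `\1` is Python's spelling of `$1`.

```python
import re

# Sample lines from the question.
text = """- Note 1 message={"one":1}]
- Note 5 message={"two":2}]
- Note 2 message={"three":3}]
- Note 7 message={"four":4}]"""

# Keep only Group 1: everything between "message=" and the final "]".
cleaned = re.sub(r"^- Note.* message=(.*)]$", r"\1", text, flags=re.M)
print(cleaned)
```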

regular expression to capture groups in selected lines

I have the multi-line string below (in Python) and am looking for a regex to extract src, dst and severity. So in the example below, group 1 would be '10.4.180.25', group 2 '34.23.21.10' and group 3 'critical'.
src: 10.4.180.25
dst: 34.23.21.10
natsrc: 20.160.129.5
natdst: 34.33.21.10
... more lines
severity: critical
... more lines
If I try a regex like /src: (\b\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\b)\ndst: (\b\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\b)\n/ with the gm flags, it finds src and dst but not severity, which is a few lines down (lines omitted for clarity). Is there a way to do it without including all of the lines between src, dst and severity?
You need to actually match any number of lines that do not start with severity after what your pattern matches. Besides, you may shorten the pattern by using the {3} limiting quantifier in order not to repeat \.\d{1,3} so many times. Note that between a whitespace and a digit the word boundary is implicit; it is already there, so there is no need to use \b.
Use
src:\s*(\d{1,3}(?:\.\d{1,3}){3})\ndst:\s*(\d{1,3}(?:\.\d{1,3}){3})(?:\n(?!severity).+)*?\nseverity:\s*(.+)
See the regex demo
Details
src: - a literal substring
\s* - 0+ whitespaces
(\d{1,3}(?:\.\d{1,3}){3}) - Group 1: IP-like pattern
\n - a newline
dst:\s* - dst: with 0+ whitespaces after it
(\d{1,3}(?:\.\d{1,3}){3}) - Group 2: IP-like pattern
(?:\n(?!severity).+)*? - 0+ sequences (as few as possible) of
\n(?!severity) - a newline not followed with severity
.+ - the whole line
\nseverity:\s* - a newline, severity: substring and 0+ whitespaces
(.+) - Group 3: 1 or more chars up to the end of the line
Note you do not need any DOTALL modifier with this regex.
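Since the question is about Python, here is a short sketch of how the pattern could be used with `re.search`. The two "more lines" entries in the sample are placeholders invented for the example, as the question elides the real ones.

```python
import re

# Sample from the question; the "other" and "trailing" lines are made-up
# stand-ins for the lines the question leaves out.
text = """src: 10.4.180.25
dst: 34.23.21.10
natsrc: 20.160.129.5
natdst: 34.33.21.10
other: more lines
severity: critical
trailing: more lines"""

pattern = re.compile(
    r"src:\s*(\d{1,3}(?:\.\d{1,3}){3})\n"
    r"dst:\s*(\d{1,3}(?:\.\d{1,3}){3})"
    r"(?:\n(?!severity).+)*?"      # skip lines that do not start with "severity"
    r"\nseverity:\s*(.+)"
)

m = pattern.search(text)
if m:
    src, dst, severity = m.groups()
    print(src, dst, severity)
```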
You can use a lazy (non-greedy) quantifier to do this:
src: (\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\ndst: (\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})[\s\S]*?severity: (.+)?\n
I have updated the regex so it actually works now!
It searches for the same bit you have, but then, as there are many lines between the dst: line and the severity: line, we need to skip all these lines.
To match any number of lines up to the line beginning with severity:, we need to match any characters, including newlines. To do this, we can use a character class: [\s\S]. This means match any character which is whitespace or is not whitespace, i.e. all characters. We then make it lazy with *? so that it matches only as many characters as needed to get to the severity: line, so this bit is [\s\S]*?severity:.
Now that we are at the severity: line, we want to match and return the characters up to the end of that line (up to the newline \n character). This is done with the similar (.+)?\n syntax, but with a plus as we want to match one or more characters. Also, as we want to return this bit, we need to put it in parentheses.
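To see why the `?` after `[\s\S]*` matters, here is a small sketch comparing the lazy and greedy forms. The two-record input is invented for the illustration (the question only shows one record), and the IP part uses escaped dots.

```python
import re

# Hypothetical input with two records, to show where each variant stops.
text = """src: 10.4.180.25
dst: 34.23.21.10
natsrc: 20.160.129.5
severity: critical
src: 10.4.180.26
dst: 34.23.21.11
severity: low
"""

ip = r"(\d{1,3}(?:\.\d{1,3}){3})"

# Lazy: [\s\S]*? stops at the first "severity:" line after dst.
lazy = re.search(rf"src: {ip}\ndst: {ip}[\s\S]*?severity: (.+)\n", text)
# Greedy: [\s\S]* runs to the end and backtracks to the last "severity:" line.
greedy = re.search(rf"src: {ip}\ndst: {ip}[\s\S]*severity: (.+)\n", text)

print(lazy.group(3))    # severity of the first record
print(greedy.group(3))  # severity of the last record
```

Note that both variants require a newline after the severity value, so they would not match if severity: were the very last line without a trailing newline.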

Regex multiple conditions

I recently started using regex, but I can't seem to figure out this problem:
https://xxxx.yyyy.com/en
For this URL I want to create a regex that is only valid when all conditions below are true:
does not contain 'xxxx'
does contain /en$ or /en/
I managed to validate the 2 separate conditions, but can't seem to put them together
\/en\/|\/en$|^(?!.*(xxxx)).*$
Can you please help?
Thanks!
You may use
/^(?!.*xxxx).*\/en(?:$|\/)/
See the regex demo
Details
^ - start of string
(?!.*xxxx) - there can't be xxxx after any 0+ chars other than line break chars
.* - any 0 or more chars other than line break chars, as many as possible
\/en - /en substring
(?:$|\/) - end of string or /
So, if you want to replace xxxx with more than one term, use
/^(?!.*(?:stage|acc)).*\/en(?:$|\/)/
Note that you may force the engine to match them as whole words if you add word boundaries:
/^(?!.*\b(?:stage|acc)\b).*\/en(?:$|\/)/
If you need a full string match, add .* at the end of the pattern.
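The question does not name a language, so purely as an illustration, this is how the first pattern behaves in Python. The test URLs other than the one from the question, and the helper name `is_valid`, are made up for the sketch.

```python
import re

# No "xxxx" anywhere, and "/en" followed by end of string or "/".
rule = re.compile(r"^(?!.*xxxx).*/en(?:$|/)")

def is_valid(url: str) -> bool:
    return rule.search(url) is not None

for url in [
    "https://xxxx.yyyy.com/en",       # contains xxxx
    "https://zzzz.yyyy.com/en",       # ends with /en
    "https://zzzz.yyyy.com/en/page",  # contains /en/
    "https://zzzz.yyyy.com/english",  # /en is followed by other text
]:
    print(url, is_valid(url))
```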
Using only lookarounds:
^(?!.*xxxx)(?=.*\/en(?:$|\/)).*
^ // start of line
(?!.*xxxx) // look ahead and don't match anything then 'xxxx'
(?= // look ahead and match
.*\/en // anything then '/en'
(?:$|\/) // end of line OR a slash
) // end of look ahead
.* // match all (can be omitted if testing lines)
Flags: global, multiline
Steps: 188
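As a rough check of the lookaround-only version with the multiline flag, the following Python sketch runs it over several URLs at once, one per line. Only the first URL comes from the question; the others are invented test data.

```python
import re

urls = """https://xxxx.yyyy.com/en
https://zzzz.yyyy.com/en
https://zzzz.yyyy.com/en/page
https://zzzz.yyyy.com/de"""

# re.M makes ^ and $ match at every line, like the "multiline" flag above.
rule = re.compile(r"^(?!.*xxxx)(?=.*/en(?:$|/)).*", re.M)

matches = rule.findall(urls)
print(matches)
```

Both conditions are checked from the start of each line without consuming anything; the final `.*` is what actually returns the line.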

RegEx - How to select the second comma and everything after it

I'm using UltraEdit. I have a text file that contains strings like this
Workspace\\Trays\\Dialogs\\Components, Expand, kThisComputerOnly, P_BOOLEAN },
WebCommonDialog Sign_Out, Left, kThisComputerOnly, P_INTEGER_RANGE(0, 4096) },
ThreeDTextDlg, x, kThisComputerOnly, P_INTEGER_RANGE(0, 4096) },
Preferences\\Graphics, CtxDbgMaxGLVersionMajor, kThisComputerOnly, P_INTEGER },
UltraEdit allows PERL, UNIX and UltraEdit style RegEx.
I need to select the second comma and everything to the end of the line and delete it.
Using regexpal.com I've tried several different approaches but can't figure it out.
/,\s.+/ selects the first comma
/[,]\s.+/ same as above
I can't figure out how to select the second comma and beyond.
I have also searched StackOverflow and found several examples but couldn't change them to work for me.
Thanks.
You may use a Perl regex option with the following pattern:
^([^,]*,[^,]*),.*
and replace with \1.
See the regex demo.
Details:
^ - start of string
([^,]*,[^,]*) - Group 1 (later referred to with \1 backreference from the replacement pattern):
[^,]* - any 0+ chars other than a comma (to prevent overflowing across lines, add \n\r into the negated character class - [^,\n\r]*)
, - a comma
[^,]* - any 0+ chars other than a comma
, - a comma
.* - any 0+ chars other than line break chars as many as possible
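To try the pattern outside UltraEdit, here is a minimal Python sketch. It assumes two of the sample lines from the question, uses the `[^,\n\r]` variant so that a match cannot run across lines, and needs `re.M` so that `^` matches at each line start.

```python
import re

# Two of the sample lines from the question.
text = """WebCommonDialog Sign_Out, Left, kThisComputerOnly, P_INTEGER_RANGE(0, 4096) },
ThreeDTextDlg, x, kThisComputerOnly, P_INTEGER_RANGE(0, 4096) },"""

# Group 1 keeps everything up to (not including) the second comma.
trimmed = re.sub(r"^([^,\n\r]*,[^,\n\r]*),.*", r"\1", text, flags=re.M)
print(trimmed)
```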