Regex to remove all except XML

I need help with a regex for Notepad++ that matches everything except the XML.
The regex I'm using:
(!?\<.*\>) <-- I want the opposite of this (for the first three lines)
The example code:
[20173003] This text is what I want to delete [<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.
[20173003] This is another text to delete [<Person><Name>Bar</Name><Surname>Foo</Surname></Person>]
[20173003] This text too... [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], delete me!
[20173003] But things like this make the regex to fail < [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>
Expected result:
<Person><Name>Foo</Name><Surname>Bar</Surname></Person>
<Person><Name>Bar</Name><Surname>Foo</Surname></Person>
<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>
<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>
Thanks in advance!

This is not perfect, but it should work with your input, which looks quite simple and well-structured.
If you need to handle just a single unnested <Person> tag, you may use a simple (<Person>.*?</Person>)|. regex (it matches and captures any <Person> element into Group 1, and matches any other char) and replace with the conditional replacement pattern (?{1}$1\n:) (it reinserts the <Person> element followed by a newline, or replaces the match with an empty string).
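The simple variant can be checked outside Notepad++. The sketch below is Python, not Notepad++: Python's re module has no (?{1}...) replacement syntax, so a replacement function plays that role here. The name keep_person is made up for illustration.

```python
import re

# Same alternation as the Notepad++ pattern: Group 1 is a <Person> element,
# the second branch consumes any other single character (newlines included).
PATTERN = re.compile(r"(<Person>.*?</Person>)|.", re.DOTALL)

def keep_person(text):
    # Emulates (?{1}$1\n:): keep Group 1 plus a newline, drop everything else.
    return PATTERN.sub(lambda m: m.group(1) + "\n" if m.group(1) else "", text)

sample = (
    "[20173003] This text is what I want to delete "
    "[<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.\n"
    "[20173003] But things like this make the regex to fail < "
    "[<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>\n"
)
result = keep_person(sample)
print(result, end="")
```

The stray < and > in the second sample line are dropped by the single-character branch, because neither of them starts a <Person> element.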
To make it a bit more generic, you may capture the opening and corresponding closing XML tags with a recursion-based Boost regex, and the appropriate conditional replacement pattern:
Find What: (<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>)|.
Replace With: (?{1}$1\n:)
. matches newline: ON
Regex Details:
(<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>) - Capturing group 1 (that will be later recursed with the (?1) subroutine call) matching
<(\w+)[^>]*> - any opening tag with its name captured into Group 2
(?:(?!</?\2\b).|(?1))* - zero or more occurrences of:
(?!</?\2\b). - any char (.) that does not start an opening or closing tag with the Group 2 name, i.e. < + an optional / + the tag name as a whole word
| - or
(?1) - the whole Group 1 subpattern is recursed (repeated)
</\2> - the corresponding closing tag
| - or
. - any single char.
Replacement pattern:
(?{1} - if Group 1 matches:
$1\n - replace with its contents + a newline
: - else replace with an empty string
) - end of the replacement pattern.
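Python's standard re module has no (?1) recursion, so the generic pattern cannot be ported as is. As a rough stand-in, the sketch below swaps the recursion for a depth counter over tags that share the opener's name. It is a different technique that aims at the same result on input like the sample, not a translation of the Boost regex; the name extract_elements is hypothetical, and self-closing tags are not handled.

```python
import re

OPEN_TAG = re.compile(r"<(\w+)[^>]*>")

def extract_elements(text):
    """Return top-level balanced elements, in order of appearance."""
    found = []
    pos = 0
    while True:
        opening = OPEN_TAG.search(text, pos)
        if opening is None:
            return found
        name = re.escape(opening.group(1))
        # Opening or closing tags that carry the same name as the opener.
        same_name = re.compile(rf"<(/?){name}\b[^>]*>")
        depth = 1
        end = None
        for tag in same_name.finditer(text, opening.end()):
            depth += -1 if tag.group(1) else 1
            if depth == 0:
                end = tag.end()
                break
        if end is None:
            # No matching close: skip this "<", like the "|." branch would.
            pos = opening.start() + 1
        else:
            found.append(text[opening.start():end])
            pos = end

line = (
    "[20173003] But things like this make the regex to fail < "
    "[<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>"
)
print("\n".join(extract_elements(line)))
```

Joining the returned elements with newlines gives the same shape as the $1\n replacement.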

Related

replaceAll regex to remove last - from the output

I get close to the expected output, but not quite. I am using replaceAll with a regex; the sample code is below.
final String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
System.out.println(label.replaceAll(
"([^-]+)-([^-]+)-(.+)-([^-]+)-([^-]+)", "$3"));
I want this output:
abc-nyd-request-xyxpt
but I am getting:
abc-nyd-request-xyxpt-
Here is the code: https://ideone.com/UKnepg
You may use this .replaceFirst solution:
String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
String result = label.replaceFirst("(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$", "$1");
//=> "abc-nyd-request-xyxpt"
RegEx Details:
(?:[^-]*-){2}: Match 2 repetitions of zero or more non-hyphen characters followed by a hyphen
(.+?): Match 1+ of any characters and capture in group #1
(?:--1)?: Match optional --1
-: Match a -
[^-]+: Match a non-hyphenated string
$: End
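For readers who want to try the pattern without a Java toolchain, here is a small Python port. It is only a sketch: it assumes re.sub with count=1 is an acceptable stand-in for Java's replaceFirst, and it writes the group reference as \1 where Java uses $1.

```python
import re

label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9"

# count=1 mirrors Java's replaceFirst; \1 is Python's spelling of $1.
pattern = r"(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$"
trimmed = re.sub(pattern, r"\1", label, count=1)
print(trimmed)  # abc-nyd-request-xyxpt
```

The lazy (.+?) keeps growing until the optional --1, a hyphen, and a final hyphen-free run reach the end of the string, which is why the trailing - no longer ends up in the group.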
The following works for your example case
([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)
https://regex101.com/r/VNtryN/1
Group 3 must end with a non-hyphen character ([^-]), so it cannot capture a trailing -. The -+ that follows accepts one or more hyphens, which is what lets it match the double --.
With your shown samples and attempts, please try the following regex. It creates one capturing group that can be used in the replacement; replace with $1 in your function.
^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*
Explanation:
^(?:.*?-){2} ##From the start of the value, a non-capturing group lazily matches up to the nearest - and is repeated 2 times.
([^-]*(?:-[^-]*){3}) ##The first and only capturing group: a run of non-hyphen characters, then 3 repetitions of - followed by a run of non-hyphen characters. This is the required output.
--.* ##Matches -- and everything after it up to the end of the value.
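The last two patterns can be compared on the sample label with a short Python check. This is a sketch under the assumption that re.sub behaves like Java's replaceAll for these inputs; the dictionary keys are made-up labels, and \3 and \1 stand in for $3 and $1.

```python
import re

label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9"

candidates = {
    # Five-group pattern: group 3 must end in a non-hyphen character.
    "group3": (r"([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)", r"\3"),
    # Anchored pattern: one capturing group of four hyphen-separated parts,
    # which must be followed by "--".
    "anchored": (r"^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*", r"\1"),
}
results = {name: re.sub(rx, repl, label) for name, (rx, repl) in candidates.items()}
for name, value in results.items():
    print(name, value)
```

Both give abc-nyd-request-xyxpt for this label. The anchored pattern depends on the wanted part having exactly four segments, so it is the less general of the two.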

Exact text (not entire lines) matched by pattern in Notepad++ (npp) [duplicate]

If I have a big text and I need to keep only the matched content, how can I do that?
For example, if I have a text like this:
asdas8Isd8m8Td8r
asdia8y8dasd
asd8is88n8gd
asd8t8od8lsdas
as9ea9ad8r1n88r8e87g6765ejasdm8x
If I use the regex [0-9]([a-z]) to capture every letter that follows a digit and replace with \1, every (number)(letter) becomes (letter). But what if I also want to delete the rest and keep only the matched letters?
That is, converting the text to:
ImTr
y
ing
tol
earnregex
How can I replace the text with the captured groups and delete the rest?
And what if I want to delete everything except the full matches?
In that case, converting the text to:
8I8m8T8r
8y8d
8i8n8g
8t8o8l
9e9a9r1n8r7g5e8x
Can I match everything that is not [0-9]([a-z])?
Thanks! :D
You may use the following regex:
(?i-s)[0-9]([a-z])|.
Replace with (?{1}$1:).
To delete everything except the full matches (digit plus letter), use the (?{1}$0:) replacement with the same regex.
Details:
(?i-s) - an inline modifier turning on case insensitive mode and turning off the DOTALL mode (. does not match a newline)
[0-9]([a-z]) - an ASCII digit and any ASCII letter captured into Group 1 (later referred to with $1 or \1 backreference from the string replacement pattern)
| - or
. - any char but a line break char.
Replacement details
(?{1} - start of the conditional replacement: if Group 1 matched then...
$1 - the contents of Group 1 (or the whole match if $0 backreference is used)
: - else... nothing
) - end of the conditional replacement pattern.
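Both replacements can be emulated in Python to see what they keep. This is a sketch, not Notepad++ behavior: Python's re has no conditional replacement syntax, so a replacement function stands in for it. The names keep_letters and keep_pairs are made up, and the text is assumed to use \n line endings.

```python
import re

# (?i-s) in Notepad++: case-insensitive, and "." does not match line breaks.
# Python's "." already skips "\n" unless re.DOTALL is given.
PATTERN = re.compile(r"[0-9]([a-z])|.", re.IGNORECASE)

def keep_letters(text):
    # Emulates (?{1}$1:): keep only the captured letter.
    return PATTERN.sub(lambda m: m.group(1) or "", text)

def keep_pairs(text):
    # Emulates (?{1}$0:): keep the whole digit+letter match.
    return PATTERN.sub(lambda m: m.group(0) if m.group(1) else "", text)

# Three of the sample lines from the question.
sample = "asdas8Isd8m8Td8r\nasd8is88n8gd\nasd8t8od8lsdas"
print(keep_letters(sample))
print(keep_pairs(sample))
```

Line breaks survive in both cases because neither branch of the pattern can match them, so the substitution leaves them in place.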

Notepad++ regex to extract usernames from this list

I have this list below:
scrapeDate,username,full_name,is_private,follower_count,following_count,media_count,biography,hasProfilePic,external_url,email,contact_phone_number,address_street,category,businessJoinDate,businessCountry,businessAds,countryCode,cityName,isverified
07/05/2020 05:37 AM,maplethenorwich,Maple the Norwich,False,0,0,0,,False,,,,,,,,,,,No
07/05/2020 05:37 AM,baby_yoda_militia,Baby Yoda,False,0,0,0,,False,,,,,,,,,,,No
07/05/2020 05:37 AM,caciquegoldendoodle,CaciqueGoldenDoodle,False,0,0,0,,False,,,,,,,,,,,No
07/05/2020 05:37 AM,ja_watts,Julie Anna Watts,False,0,0,0,,False,,,,,,,,,,,No
07/05/2020 05:37 AM,lets_go_zumba_and_travel,Mrsirenetakamoto,False,0,0,0,,False,,,,,,,,,,,No
07/05/2020 05:37 AM,bunnyslash,Bunnyslash,False,0,0,0,,False,,,,,,,,,,,No
I would like to get the Usernames only as below:
maplethenorwich
baby_yoda_militia
caciquegoldendoodle
ja_watts
lets_go_zumba_and_travel
bunnyslash
I've tried ^(?:[^,\r\n]*,){3}([^,\r\n]+).* but it gets me "False".
I hope somebody can help me find the right regex to extract only the usernames.
You may try:
.*?,(.*?),.*
Explanation of the above regex:
.*? - Lazily matches everything except a newline.
, - Matches , literally.
(.*?) - The first capturing group, lazily matching the username, i.e. the second value in the CSV row.
,.* - Matches a comma, then greedily matches the rest of the line. If you don't want to remove the rest of the line, leave this part out, capture the group above, and write the captured values to a new file or wherever you need them.
$1 - In the replacement, replace all the matched text with just the captured group using $1.
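The same find-and-replace can be tried in Python as a quick check. This sketch assumes two of the data rows from the question as input and uses \1 where Notepad++ uses $1.

```python
import re

rows = (
    "07/05/2020 05:37 AM,maplethenorwich,Maple the Norwich,False,0,0,0,,False,,,,,,,,,,,No\n"
    "07/05/2020 05:37 AM,baby_yoda_militia,Baby Yoda,False,0,0,0,,False,,,,,,,,,,,No\n"
)

# Replace each whole row with its second field; "." stops at "\n" by default,
# which corresponds to leaving ". matches newline" unchecked in Notepad++.
usernames = re.sub(r".*?,(.*?),.*", r"\1", rows)
print(usernames, end="")
```

The header row would be reduced to the word username in the same way, so it may need to be removed separately.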
You are repeating the group 3 times using quantifier {3}, but there is no need to repeat it because you want the second value.
^(?:[^,\r\n]*,){3}([^,\r\n]+).*
 ^^^          ^^^^
You can omit the quantifier and the non capturing group as there is nothing to repeat.
^[^,\r\n]*,([^,\r\n]+).*
^ Start of the line
[^,\r\n]*, Match 0+ times any char except a comma or newline, then match ,
( Capture group 1
[^,\r\n]+ Match 1+ times any char except a comma or newline
) Close group 1
.* Match the rest of the line
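A short Python check makes the difference between the two patterns visible. It is a sketch on a single row from the question; the variable names tried and fixed are made up.

```python
import re

row = "07/05/2020 05:37 AM,ja_watts,Julie Anna Watts,False,0,0,0,,False,,,,,,,,,,,No"

# Asker's pattern: {3} skips three fields, so group 1 is the fourth one.
tried = re.match(r"^(?:[^,\r\n]*,){3}([^,\r\n]+).*", row).group(1)

# Without the quantifier, one field is skipped and group 1 is the second one.
fixed = re.match(r"^[^,\r\n]*,([^,\r\n]+).*", row).group(1)

print(tried)  # False
print(fixed)  # ja_watts
```

Changing the quantifier to {1} would behave the same as dropping it, which is why the non-capturing group is not needed at all.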
