notepad++ regex divide two lists - regex

I have the list below:
21870172299%3Akvm6wcmcVYaoQ2J%3A2 340282366841710300949128111982633033733
21200717504%3AUhGubOhpHPtBKLk%3A6 340282366841710300949128111984034029824
21256096197%3AMGYmtB2uoj4er5i%3A1 340282366841710300949128111984541030820
11665946937%3AHBBkUBzcy3cvbtb%3A5 340282366841710300949128111986242038268
21719881031%3AH3t9c4b7re6cs5%3A24 340282366841710300949128111986284030213
21697692027%3A1S0fM2Jp6Ivsxo9%3A5 340282366841710300949128111986299030036
20424141770%3AFPiScGMuAVBPGvk%3A7 340282366841710300949128111987613032298
I would like to use a regular expression to split this into two lists. Example:
list1:
21870172299%3Akvm6wccVYaoQ2J%3A2
21200717504%3AUhGubOpHPtBKLk%3A6
21256096197%3AMGYmtBuoj4er5i%3A1
11665946937%3AHBBkUBcy3cvbtb%3A5
21719881031%3AH3t9c4b7re6cs5%3A24
21697692027%3A1S0fMJp6Ivsxo9%3A5
20424141770%3AFPiSGMuAVBPGvk%3A7
list2:
340282366841710300949128111982633033733
340282366841710300949128111984034029824
340282366841710300949128111984541030820
340282366841710300949128111986242038268
340282366841710300949128111986284030213
340282366841710300949128111986299030036
340282366841710300949128111987613032298
I have tried an online regex tester (regex101), but my attempts failed.
Kindly help me split this list.
Thank you.

Copy this text and paste it twice into your text file, one copy below the other.
Select the first block of data.
Check the "In selection" option, use the pattern (^\S+).+ and replace it with \1, that is, with the first capturing group.
Pattern explanation: ^ matches the beginning of a line, \S+ matches one or more non-whitespace characters, .+ matches one or more of any character, and (...) stores the matched text in the first capturing group.
Similarly, select the second block of data and use the pattern ^\S+\s+(.+), again replacing with \1.
\s+ matches one or more whitespace characters. Again, check the "In selection" check box.
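Outside Notepad++, the same two substitutions can be sketched in Python. This is only an illustration and not part of the answer above: the variable names and the two-line sample are assumptions, and re.MULTILINE is set so that ^ matches at the start of every line.

```python
import re

# Two sample lines from the question (the remaining lines behave the same way).
text = (
    "21870172299%3Akvm6wcmcVYaoQ2J%3A2 340282366841710300949128111982633033733\n"
    "21200717504%3AUhGubOhpHPtBKLk%3A6 340282366841710300949128111984034029824"
)

# Keep only the first non-whitespace token of each line.
list1 = re.sub(r"(^\S+).+", r"\1", text, flags=re.MULTILINE).splitlines()

# Keep only what follows the first run of whitespace on each line.
list2 = re.sub(r"^\S+\s+(.+)", r"\1", text, flags=re.MULTILINE).splitlines()

print(list1)
print(list2)
```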

Related

RegEx to match all sets of items that have part of specific value

I'm trying to use RegEx to filter all sets of items that have part of a specific value in a capture group that I have defined.
I have to check if the fifth capture group contains at least part of a specific text.
My string:
First Item;Second Item;Third Item;Fourth Item;First Word;Sixth Item?First Item;Second Item;Third Item;Fourth Item;Second Word;Sixth Item?First Item;Second Item;Third Item;Fourth Item;Can't Capture This Set;Sixth Item
RegEx that works for exact word:
(?:^|\?)([^;]+);([^;]+);([^;]+);([^;]+);(Second Word);([^;\?$]+)
The problem is that I need this RegEx to match when only part of the text in that field is given.
Not working:
(?:^|\?)([^;]+);([^;]+);([^;]+);([^;]+);(.*Word.*);([^;\?$]+)
Thanks!
Use [^;]* instead of .* because you have semi-colons as field delimiters:
(?:^|\?)([^;]+);([^;]+);([^;]+);([^;]+);([^;]*Word[^;]*);([^;?]+)
([^;]*Word[^;]*) matches zero or more characters other than semi-colons, then the literal text Word, then zero or more characters other than semi-colons, so the match can never spill over into a neighbouring field.
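To see the effect of the [^;]* version, here is a small Python sketch. It assumes the sample string from the question is one continuous line; the names data, pattern and fifth_fields are made up for the example.

```python
import re

# The sample string from the question, joined onto one line.
data = (
    "First Item;Second Item;Third Item;Fourth Item;First Word;Sixth Item"
    "?First Item;Second Item;Third Item;Fourth Item;Second Word;Sixth Item"
    "?First Item;Second Item;Third Item;Fourth Item;Can't Capture This Set;Sixth Item"
)

pattern = re.compile(
    r"(?:^|\?)([^;]+);([^;]+);([^;]+);([^;]+);([^;]*Word[^;]*);([^;?]+)"
)

# Collect the fifth field of every set whose fifth field contains "Word".
fifth_fields = [m.group(5) for m in pattern.finditer(data)]
print(fifth_fields)
```

The third set is skipped because its fifth field does not contain Word.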

Regex: Find multiple matching strings in all lines

I'm trying to match multiple strings in a single line using regex in Sublime Text 3.
I want to match all values and replace them with null.
Part of the string that I'm matching against:
"userName":"MyName","hiScore":50,"stuntPoints":192,"coins":200,"specialUser":false
List of strings that it should match:
"MyName"
50
192
200
false
Result after replacing:
"userName":null,"hiScore":null,"stuntPoints":null,"coins":null,"specialUser":null
Is there a way to do this without using sed or any other substitution method, but just by matching the wanted pattern in regex?
You can use this find pattern:
:(.*?)(,|$)
And this replace pattern:
:null\2
The first group matches any character (dot) zero or more times (asterisk); the trailing question mark makes the quantifier lazy, so it matches as little as possible. The second group matches either a comma or the end of the string. In the replace pattern, I substitute the first group with null (as desired) and leave the text matched by the second group unchanged.
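The same find and replace pair also works with Python's re module, shown here as a quick sketch on the sample line from the question (the variable names are only for illustration):

```python
import re

line = '"userName":"MyName","hiScore":50,"stuntPoints":192,"coins":200,"specialUser":false'

# Group 1 is the value (lazy); group 2 is the comma or end of string that follows it.
result = re.sub(r":(.*?)(,|$)", r":null\2", line)
print(result)
```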
Here is an alternative to amaurs' answer that leaves the comma out of the match, so the replacement does not have to put it back:
:\K(.*?)(?=,|$)
And this replacement pattern:
null
This works like amaurs' answer, but the match starts after the colon (using \K to reset the match starting point) and runs until a comma or the end of the line (using a positive lookahead).
I have tested and this works in Sublime Text 2 (so should work in Sublime Text 3)
Another slightly better alternative to this is:
(?<=:).+?(?=,|$)
This uses a positive lookbehind instead of resetting the match starting point.
Another good alternative (so far the most efficient here):
:\K[^,]*
This may help.
Find: (?<=:)[^,]*
Replace: null
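If you ever need this outside the editor: Python's built-in re module does not support \K, but the lookbehind form carries over unchanged. A minimal sketch on the sample line from the question, with made-up variable names:

```python
import re

line = '"userName":"MyName","hiScore":50,"stuntPoints":192,"coins":200,"specialUser":false'

# (?<=:) requires a colon just before the match without consuming it,
# and [^,]* takes everything up to the next comma (or the end of the line).
nulled = re.sub(r"(?<=:)[^,]*", "null", line)
print(nulled)
```

Like the editor versions, this assumes that no value contains a comma or a colon.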

Regex expressions to match text between first comma and the comma before the first number

I have a csv file with all UK areas (43000 rows).
However, even though the fields are separated with commas, they are not enclosed with anything, hence if the field has commas within its contents, import to a database fails.
Fortunately, there is only one field that has commas within its content.
I need a regular expression that I could use to select this field on all rows.
Here is an example of data:
Aberaman,Rhondda, Cynon, Taf (Rhondda, Cynon, Taff),51.69N,03.43W,SO0101
Aberangell,Powys,52.67N,03.71W,SH8410
This should look like:
Aberaman,"Rhondda, Cynon, Taf (Rhondda, Cynon, Taff)",51.69N,03.43W,SO0101
Aberangell,"Powys",52.67N,03.71W,SH8410
So I need to basically select the second field, which is between the first comma and the comma just before the first number.
I will use sublime text 2 to perform this regex search.
Sublime Text 2 supports \K.
Regex:
^[^,]*,\K(.*?)(?=,\d)
Replacement string:
"\1"
Explanation:
^ Asserts that we are at the start of a line.
[^,]* Matches any character other than a comma, zero or more times.
, Matches a literal comma.
\K Discards the previously matched characters from the final match.
(.*?)(?=,\d) Matches any character zero or more times, up to a point that is followed by a comma and a digit. The ? after * makes the match reluctant (lazy).
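For a scripted import, the same idea can be sketched in Python. Python's built-in re module has no \K, so this version swaps in an extra capturing group for the first field and puts it back in the replacement; the names rows and quoted are only for illustration.

```python
import re

rows = (
    "Aberaman,Rhondda, Cynon, Taf (Rhondda, Cynon, Taff),51.69N,03.43W,SO0101\n"
    "Aberangell,Powys,52.67N,03.71W,SH8410"
)

# Group 1 keeps the first field and its comma (standing in for \K);
# group 2 is the second field, which ends before the first ",<digit>".
quoted = re.sub(r"^([^,]*,)(.*?)(?=,\d)", r'\1"\2"', rows, flags=re.MULTILINE)
print(quoted)
```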
You can try with capturing groups. Simply substitute it with $1"$2"$3 or \1"\2"\3
^(\w+,)([^\d]*)(,.*)$
You can do it in Notepad++ as well.
Find what: ^(\w+,)([^\d]*)(,.*)$
Replace with: $1"$2"$3
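The three-group version translates directly to Python as well. This is a rough sketch that handles one row at a time; quote_second_field is a hypothetical helper name. Note that \w+ assumes the first field consists of word characters only, so a place name containing a space would be left untouched.

```python
import re

FIELD_PATTERN = re.compile(r"^(\w+,)([^\d]*)(,.*)$")

def quote_second_field(row: str) -> str:
    """Wrap the second field in double quotes; rows that do not match are returned unchanged."""
    return FIELD_PATTERN.sub(r'\1"\2"\3', row)

print(quote_second_field("Aberaman,Rhondda, Cynon, Taf (Rhondda, Cynon, Taff),51.69N,03.43W,SO0101"))
print(quote_second_field("Aberangell,Powys,52.67N,03.71W,SH8410"))
```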
A regex which should be able to solve your problem is:
^.*?,(.*?),\d+
This matches:
anything (non-greedy) up to the first comma (which will not be included in the result),
then anything (non-greedy) up to the next comma that is followed by a number (this part is captured in a group).
The additional condition is that there has to be a number right after that comma.
So your group is in $1.

perl style regex to match nth item in a list

Trying to match the third item in this list:
/text word1, word2, some_other_word, word_4
I tried using this perl style regex to no avail:
([^, ]*, ){$m}([^, ]*),
I want to match ONLY the third word, nothing before or after, and no commas or whitespace. I need it to be a regex, this is not in a program but UltraEdit for a word file.
What can I use to match some_other_word (Or anything third in the list.)
Based on some input by the community members I made the following change to make the logic of the regex pattern clearer.
/^(?:(?:.(?<!,))+,){2}\s*(\w+).*/x
Explanation
/^ # 1.- Match start of line.
(?:(?:.(?<!,))+ # 2.- Match but don't capture a sequence of characters not containing a comma ...
,) # 3.- followed by a comma
{2} # 4.- (exactly two times)
\s* # 5.- Match any optional whitespace
(\w+) # 6.- Match and capture a sequence of the characters represented by \w, at least one character long.
.* # 7.- Match anything after that if necessary.
/x
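The question is about UltraEdit, but to check the pattern quickly it can be tried in Python, where re.VERBOSE plays the role of the /x flag. This is only a sketch: THIRD_ITEM and sample are made-up names, and the trailing .* is dropped because it does not affect the captured group.

```python
import re

THIRD_ITEM = re.compile(
    r"""
    ^                  # start of line
    (?:                # non-capturing group:
      (?:.(?<!,))+     #   one or more characters that are not commas
      ,                #   followed by a comma
    ){2}               # exactly two times
    \s*                # optional whitespace
    (\w+)              # capture the third item
    """,
    re.VERBOSE,
)

sample = "/text word1, word2, some_other_word, word_4"
match = THIRD_ITEM.match(sample)
third = match.group(1) if match else None
print(third)
```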
This is the one suggested previously.
/(?:\w+,?\s*){3}(\w+)/
Try group 1 of this regex:
^(?:.*?,){2}\s*(.*?)\s*(,|$)
See a live demo using your sample, plus an edge case, input showing capture in group 1.
A single regex cannot return just that one item here, because the string contains several occurrences of the same pattern and regular expressions have no option to select a particular occurrence. Instead, match every occurrence and pick the one you need from the returned array.
,\s?([^,]+)
See it in action; the captured group of the second match is what you need.
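In a scripting language the "take it from the returned array" step looks roughly like this. It is a Python sketch with illustrative names; the index 1 assumes the list always has at least three items.

```python
import re

line = "/text word1, word2, some_other_word, word_4"

# Every item that follows a comma, in order of appearance.
items_after_first = re.findall(r",\s?([^,]+)", line)

# The first item has no comma in front of it, so the third item is at index 1.
third_item = items_after_first[1]
print(third_item)
```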

Regex match for text

I am trying to create a regex to match the content between numbered lists, e.g. with the following content:
1) Text for part 1
2) Text for part 2
3) Text for part 3
The following PCRE should work, assuming you haven't got anything formatted like "1)" or the like inside of the sections:
\d+\)\s*(.*?)\s*(?=\d+\)|$)
Explanation:
\d+\) gives a number followed by a ).
\s* matches the leading whitespace.
(.*?) captures the contents non-greedily.
\s* matches the trailing whitespace.
(?=\d+\)|$) ensures that the match is followed by either the start of a new section or the end of the text.
Note, it doesn't enforce that they must be ascending or anything like that, so it'd match the following text as well:
4) Hello there 1) How are you? 5) Good.
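The pattern is written for PCRE, but it uses nothing that Python's re module lacks, so here is a quick sketch of both cases. SECTION, numbered and out_of_order are names chosen for the example, and each section is assumed to fit on one line, since . does not match a newline by default.

```python
import re

SECTION = re.compile(r"\d+\)\s*(.*?)\s*(?=\d+\)|$)")

# The list from the question.
numbered = "1) Text for part 1\n2) Text for part 2\n3) Text for part 3"
print(SECTION.findall(numbered))

# Numbers that are not ascending are matched just the same.
out_of_order = "4) Hello there 1) How are you? 5) Good."
print(SECTION.findall(out_of_order))
```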
I'd suggest the following (PCRE):
(?:\d+\)\s*(.*?))*$
The inner part \d+\)\s* matches the list number and the closing parenthesis, followed by optional whitespace.
(.*?) matches the list text, but in a non-greedy manner (otherwise, it would also match the next list item).
The enclosing (?: )*$ then matches the above zero or more times, until the end of the input.
Keep in mind that the text after the number and the closing parenthesis might be anything. This would find your substrings:
\d\).+?(?=\d\)|$)
EDIT:
To get rid of the leading whitespace and return only the text without the number, get group 1 from the following match:
\d\)\s*(.+?)(?=\d\)|$)
To get the number in group(1) and the text in group(2), use this:
(\d)\)\s*(.+?)(?=\d\)|$)