Regex how to get groups within json expression - regex

Say I have following JSON:
{"MyPerson":{"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}},"AnotherComplexObject":{"Another":"Yes","Me":"True"},"Count":1,"Start":2}
I remove starting { and ending } and get:
"MyPerson":{"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}},"AnotherComplexObject":{"Another":"Yes","Me":"True"},"Count":1,"Start":2
Now, what regex would I use to get "complex objects" out, for example in that JSON I would want to get these two results:
{"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}}
{"Another":"Yes","Me":"True"}
The closest I've come to a solution is the regex {[^}]*}, but it stops at the first } and so fails to include the closing } that follows "Number":15.

# String
# "MyPerson":{"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}},"AnotherComplexObject":{"Another":"Yes","Me":"True"},"Count":1,"Start":2
/({[^}]+}})/
# http://rubular.com/r/rupVEn9yZo
# Match Groups
# 1. {"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}}
/({[^}]+})/
# http://rubular.com/r/H5FaoH18c8
# Match Groups
# Match 1
# 1. {"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}
# Match 2
# 1. {"Another":"Yes","Me":"True"}
/({[^}]+}})[^{]+({[^}]+})/
# http://rubular.com/r/zmcyjvoR1y
# Match Groups
# 1. {"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}}
# 2. {"Another":"Yes","Me":"True"}
# String
# {"MyPerson":{"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}},"AnotherComplexObject":{"Another":"Yes","Me":"True"},"Count":1,"Start":2}
/[^{]+({[^}]+}})[^{]+({[^}]+})/
# http://rubular.com/r/qCxN1Rk9Ka
# Match Groups
# 1. {"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}}
# 2. {"Another":"Yes","Me":"True"}
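The patterns above depend on the exact nesting depth of this sample, so they may break once an object nests deeper or a string value contains a brace. As a hedged alternative (not the answerer's method), here is a minimal Python sketch that parses the JSON with the standard json module and keeps only the top-level values that are objects. The function name complex_objects is made up for illustration, and the sketch assumes the original string still has its outer braces.

```python
import json

raw = (
    '{"MyPerson":{"Firstname":"First","Lastname":"Last",'
    '"Where":{"Street":"Street","Number":15}},'
    '"AnotherComplexObject":{"Another":"Yes","Me":"True"},'
    '"Count":1,"Start":2}'
)


def complex_objects(text):
    """Return each top-level value that is a JSON object, re-serialized compactly."""
    data = json.loads(text)
    return [
        json.dumps(value, separators=(",", ":"))  # compact form, no added spaces
        for value in data.values()
        if isinstance(value, dict)
    ]


results = complex_objects(raw)
for item in results:
    print(item)
```

Because the parser tracks nesting itself, the same function should work for any depth, which a single character-class regex cannot guarantee.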

Related

Combine Regex searches into one match

I am trying to set up a Regex which should combine two searches into one full match.
My demo String is:
Name Klein Vorname Marvin
The Regex should find: Marvin Klein
The names can be different. Does anybody know a way how I can get this to work?
This is how far I already got: ^(?=^Name)(?=.*$)
Thanks!
In Python, using named groups and \w to match runs of word characters:
import re
demo_str = "Name Klein Vorname Marvin"
pattern = r"""
(?:[\w]+) # Skip the first string ('Name')
\s+? # One or more whitespace characters
(?P<last>[\w]+) # Named group 'last'
\s+? # One or more whitespace characters
(?:[\w]+) # Skip the second string ('Vorname')
\s+? # One or more whitespace characters
(?P<first>[\w]+) # Named group 'first'
"""
res = re.search(pattern, demo_str, flags=re.X) # re.X (verbose mode) ignores whitespace and # comments in the pattern
res.group("last") # 'Klein'
res.group("first") # 'Marvin'
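If the goal is the reordered string itself, a substitution with backreferences is one possible alternative to a search. This is only a sketch: it assumes the input always has the literal labels Name and Vorname in that order, and the helper name swap_names is hypothetical.

```python
import re

# Named groups capture the two values; the labels are matched literally.
NAME_PATTERN = re.compile(r"^Name\s+(?P<last>\w+)\s+Vorname\s+(?P<first>\w+)$")


def swap_names(text):
    """Rewrite 'Name <last> Vorname <first>' as '<first> <last>'."""
    return NAME_PATTERN.sub(r"\g<first> \g<last>", text)


swapped = swap_names("Name Klein Vorname Marvin")
print(swapped)
```

Input that does not fit the pattern is returned unchanged, since sub only rewrites what it matches.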
EDIT: You can reformat to get your output pretty quickly. I don't know if positioning was expected to be part of the pattern.
" ".join([res.group("first"), res.group("last")]) # 'Marvin Klein'
Or, numerically:
" ".join([res.group(2), res.group(1)]) # 'Marvin Klein'

Regular expression enforcing at least one in two groups

I have to parse a string using regular expressions in which at least one group in a set of two is required. I cannot figure out how to write this case.
To illustrate the problem, consider parsing this case:
String: aredhouse theball bluegreencar the
Match: ✓ ✓ ✓ ✗
Items are separated by spaces
Each item is composed of an article, a colour and an object defined by groups in the following expression (?P<article>the|a)?(?P<colour>(red|green|blue|yellow)*)(?P<object>car|ball|house)?\s*
An item may have an 'article' but must have a 'colour', an 'object', or both.
Is there a way of making 'article' optional but require at least one 'colour' or 'object' using regular expressions?
Here is the coded Go version of this example, however I guess this is generic regexp question that applies to any language.
This works with your test cases.
/
(?P<article>the|a)? # optional article
(?: # non-capture group, mandatory
(?P<colour>(?:red|green|blue|yellow)+) # 1 or more colors
(?P<object>car|ball|house) # followed by 1 object
| # OR
(?P<colour>(?:red|green|blue|yellow)+) # 1 or more colors
| # OR
(?P<object>car|ball|house) # 1 object
) # end group
/x
It can be reduced to:
/
(?P<article>the|a)? # optional article
(?: # non-capture group, mandatory
(?P<colour>(?:red|green|blue|yellow)+) # 1 or more colors
(?P<object>car|ball|house)? # followed by optional object
| # OR
(?P<object>car|ball|house) # 1 object
) # end group
/x
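The patterns above reuse the group names colour and object in several branches, which Python's re module rejects (it reports a group name redefinition). As a hedged sketch of one alternative, the version below keeps each name once and enforces "at least one colour or object" with a lookahead placed right after the optional article. The constant and function names are made up for illustration, and items are assumed to be separated by whitespace.

```python
import re

COLOUR = r"(?:red|green|blue|yellow)"
OBJECT = r"(?:car|ball|house)"

ITEM = re.compile(
    rf"(?P<article>the|a)?"      # optional article
    rf"(?={COLOUR}|{OBJECT})"    # lookahead: a colour or an object must follow
    rf"(?P<colour>{COLOUR}*)"    # zero or more colours
    rf"(?P<object>{OBJECT})?"    # optional object
)


def parse_items(text):
    """Return one match object (or None) per whitespace-separated item."""
    return [ITEM.fullmatch(token) for token in text.split()]


matches = parse_items("aredhouse theball bluegreencar the")
matched = [m is not None for m in matches]
print(matched)
```

Using fullmatch on each token means a bare article such as the fails as a whole item instead of matching an empty string somewhere inside it.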
In regex, a few quantifiers specify how many times a character or group may repeat:
* - zero or more
+ - one or more
? - zero or one
With these applied, your regex looks like this:
(?P<article>(the|a)?)(?P<colour>(red|green|blue|yellow)+)(?P<object>(car|ball|house)+)\s*
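Zero or one article.
One or more colours.
Finally, one or more objects.
Note that this makes both a colour and an object mandatory, so an item with only one of them, such as theball, does not match.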

Capturing the same regular expression over multiple lines

I want to capture a series of file names, each listed on its own line. I have figured out how to capture the file name on the first line, but not how to repeat the capture on the subsequent lines.
# Input
# data/raw/file1
# data/raw/file2
# Output
# data/interim/file1
# data/interim/file2
Current Attempt
The regular expression I currently have is
# Input\n(# (.*))
And my inner capture group properly captures data/raw/file1.
Desired Output
What I want is to grab all of the files in between # Input and # Output, so in this example, data/raw/file1 and data/raw/file2.
Go with \G magic:
(?:^#\s+Input|\G(?!\A))\R*(?!#\s+Output)#\s*(.*)|[\s\S]*
Regex breakdown
(?: # Start of non-capturing group (a)
^#\s+Input # Match a line beginning with `# Input`
| # Or
\G(?!\A) # Continue from previous successful match point
) # End of NCG (a)
\R* # Match any kind of newline characters
(?!#\s+Output) # Which are not followed by such a line `# Output`
#\s*(.*) # Start matching a path line and capture path
| # If previous patterns didn't match....
[\s\S]* # Otherwise consume the rest of the input so the engine does not retry at every remaining position
PHP code:
$re = '~(?:^#\s+Input|\G(?!\A))\R*(?!#\s+Output)#\s*(.*)|[\s\S]*~m';
$str = '# Input
# data/raw/file1
# data/raw/file2
# Output
# data/interim/file1
# data/interim/file2';
preg_match_all($re, $str, $matches, PREG_PATTERN_ORDER, 0);
// Print the entire match result
print_r(array_filter($matches[1]));
Output:
Array
(
[0] => data/raw/file1
[1] => data/raw/file2
)
Using the s modifier, preg_match, and preg_split you can get each result on its own.
preg_match('/# Input\n(# (?:.*?))# Output/s', '# Input
# data/raw/file1
# data/raw/file2
# Output
# data/interim/file1
# data/interim/file2', $match);
$matched = array_map('trim', preg_split('/# /', $match[1], -1, PREG_SPLIT_NO_EMPTY)); // trim removes the newline left at the end of each entry
print_r($matched);
Demo: https://3v4l.org/dAcRp
Regex demo: https://regex101.com/r/5tfJGM/1/
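Python's built-in re module supports neither \G nor \R, so the first pattern cannot be carried over as written. The following is a minimal sketch of the two-stage idea from the second answer instead: isolate the block between the two marker lines, then collect one path per line. The function name files_between is hypothetical, and the sketch assumes \n line endings.

```python
import re

text = """# Input
# data/raw/file1
# data/raw/file2
# Output
# data/interim/file1
# data/interim/file2"""


def files_between(source):
    """Return the paths listed between the '# Input' and '# Output' lines."""
    # Stage 1: lazily capture everything between the two marker lines.
    block = re.search(r"^# Input\n(.*?)^# Output$", source, flags=re.M | re.S)
    if block is None:
        return []
    # Stage 2: each remaining line looks like '# <path>'; capture the path.
    return re.findall(r"^#\s*(.+)$", block.group(1), flags=re.M)


paths = files_between(text)
print(paths)
```

Splitting the work in two keeps each pattern short, at the cost of scanning the captured block a second time.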

Restructure CSV data with Notepad++, Regex

I have a CSV file with following headers and (sample) data:
StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA # StreetB,1 NameA,DirectionA,Lat,Long
StreetC # StreetD,1 NameA,DirectionA,Lat,Long
...
StreetE # StreetF,1 NameA,DirectionB,Lat,Long
StreetG # StreetH,1 NameA,DirectionB,Lat,Long
...
StreetI # StreetJ,2 NameB,DirectionC,Lat,Long
StreetK # StreetL,2 NameB,DirectionC,Lat,Long
...
StreetM # StreetN,2 NameB,DirectionD,Lat,Long
StreetO # StreetP,2 NameB,DirectionD,Lat,Long
.
.
.
I want to use regex (currently in Notepad++) to get the following results:
1 NameA - DirectionA=[[StreetA # StreetB,[Lat,Long]], [StreetC # StreetD,[Lat,Long]], ...]
1 NameA - DirectionB=[[StreetD # StreetE,[Lat,Long]], [StreetF # StreetG,[Lat,Long]], ...]
2 NameB - DirectionC=[[StreetH # StreetI,[Lat,Long]], [StreetJ # StreetK,[Lat,Long]], ...]
2 NameB - DirectionD=[[StreetL # StreetM,[Lat,Long]], [StreetN # StreetO,[Lat,Long]], ...]
.
.
.
With the Regex and Substitution,
RgX: ^([^,]*),([^,]*),([^,]*),(.*)
Sub: $2 - $3=[$1,[\4]]
Demo: https://regex101.com/r/gS9hD6/1
I have gotten this far:
1 NameA - DirectionA=[StreetA # StreetB,[Lat,Long]]
1 NameA - DirectionA=[StreetC # StreetD,[Lat,Long]]
1 NameA - DirectionB=[StreetE # StreetF,[Lat,Long]]
1 NameA - DirectionB=[StreetG # StreetH,[Lat,Long]]
2 NameB - DirectionC=[StreetI # StreetJ,[Lat,Long]]
2 NameB - DirectionC=[StreetK # StreetL,[Lat,Long]]
2 NameB - DirectionD=[StreetM # StreetN,[Lat,Long]]
2 NameB - DirectionD=[StreetO # StreetP,[Lat,Long]]
In a new regex, I tried splitting the above result on "=", but didn't know where to go from there.
I think one way to get the desired results would be to keep first unique instance of what's before "=", replace new line with "," and enclose it with a [..] to make it an array form.
Edit:
There are about 10k stops (total), but only about 100 unique routes.
Edit 2: (maybe I am asking for too many changes now)
For first regex:
What if I want to use "\n" instead of "="?
At beginning of 2nd regex replacement,
What if I have only RouteName and StopName columns, like this: 1 NameA - DirectionA=[StreetA # StreetB, ...]?
Similarly, what if I only have RouteName and Coordinates, like this:
1 NameA - DirectionA=[[Lat,Long]]?
Steps
1. First replacement:
Find what: ^([^,]*),([^,]*),([^,]*),(.*)
Replace with: \2 - \3=[[\1,[\4]]]
Replace All
2. Second replacement:
Find what: ^[\S\s]*?^([^][]*=)\[\[.*\]\]\K\]\R\1\[(.*)\]$
Replace with: , \2]
Replace All
3. Repeat step 2 until there are no more occurrences.
This means that if there are 100 instances (Stops) for the same key (Route - Direction pair), you will have to click Replace All 7 times (ceiling(log2(N))).
Description
I modified your regex in step 1 to add an extra pair of brackets that will enclose the whole set.
For step 2, it finds a pair of lines for the same Direction, appending the last to the previous one.
^[\S\s]*?^([^][]*=) #Group 1: captures "1 NameA - DirA="
\[\[.*\]\] #matches the set of Stops - "[[StA # StB,[Lat,Long]], ..."
\K #keeps the text matched so far out of the match
\]\R #closing "]" and newline
\1 #match next line (if the same route)
\[(.*)\]$ #and capture the Stop (Group 2)
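For comparison, the same grouping can be done outside Notepad++ in a single pass. The sketch below swaps the regex technique for a plain dictionary keyed on "RouteName - Travel_Direction"; it is not what the answer above does. The function name group_stops and the shortened sample are assumptions, and the column names are taken from the header shown in the question.

```python
import csv
import io
from collections import defaultdict

sample = """StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA # StreetB,1 NameA,DirectionA,Lat,Long
StreetC # StreetD,1 NameA,DirectionA,Lat,Long
StreetE # StreetF,1 NameA,DirectionB,Lat,Long
StreetI # StreetJ,2 NameB,DirectionC,Lat,Long
"""


def group_stops(csv_text):
    """Collect stops per 'route - direction' key and format one line per key."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = f"{row['RouteName']} - {row['Travel_Direction']}"
        stop = f"[{row['StopName']},[{row['Latitude']},{row['Longitude']}]]"
        groups[key].append(stop)
    # Dictionaries keep insertion order, so keys appear in first-seen order.
    return [f"{key}=[{', '.join(stops)}]" for key, stops in groups.items()]


lines = group_stops(sample)
for line in lines:
    print(line)
```

With about 10k stops and 100 routes this avoids the repeated Replace All passes, since every row is appended to its group exactly once.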
Try this one.
I checked it in a mobile notepad app and it ran without errors.
Find what:
(s.+#\s\w+),(\d{1,} \w+),(\w+),(.+)
Replace with:
\2 - \3=[[\1,[\4],...]]

Replacing data in CSV file with Regex

I have a CSV file (exported data from iWork Numbers) which contains a list of users with information. What I want to do is replace ;;;;;;;;; with ; on all lines except "Last login".
By doing so and importing the file to Numbers again the data will (hopefully) be divided in rows like this:
User 1 | Points: 1 | Registered: 2012-01-01 | Last login 2012-02-02
User 2 | Points: 2 | Registered: 2012-01-01 | Last login 2012-02-02
How the CSV file looks:
;User1;;;;;;;;;
;Points: 1;;;;;;;;;
;Registered: 2012-01-01;;;;;;;;;
;Last login: 2012-02-02;;;;;;;;;
;User2;;;;;;;;;
;Points: 2;;;;;;;;;
;Registered: 2012-01-01;;;;;;;;;
;Last login: 2012-02-02;;;;;;;;;
So my question is what Regex code should I type in the Find and Replace fields?
Thanks in advance!
See the regex in action:
Find : ^(;(?!Last).*)(;{9})
Replace: $1;
Output will be:
;User1;
;Points: 1;
;Registered: 2012-01-01;
;Last login: 2012-02-02;;;;;;;;;
;User2;
;Points: 2;
;Registered: 2012-01-01;
;Last login: 2012-02-02;;;;;;;;;
Explanation
Find:
^ # Match start of the line
( # Start of the 1st capture group
;(?!Last) # Match a semicolon (;), only if not followed by 'Last' word.
.* # Match the rest of the line greedily, then backtrack so the 9 trailing semicolons are left for the 2nd group
) # End of the 1st capture group
( # Start of the 2nd capture group
;{9} # Match exactly 9 semicolons
) # End of the 2nd capture group
Replace:
$1; # Leave 1st capture group as is and append a semicolon.
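The same find-and-replace can be tried in Python as a quick check. This is a sketch under two assumptions: the data uses \n line endings, and every non-"Last" line ends in exactly nine semicolons as in the sample. The helper name collapse_semicolons is hypothetical. Python writes the backreference as \1 instead of $1 and needs re.M so that ^ matches at every line start.

```python
import re

TAIL = ";" * 9  # the run of nine semicolons that ends every exported line
fields = ["User1", "Points: 1", "Registered: 2012-01-01", "Last login: 2012-02-02"]
sample = "\n".join(f";{field}{TAIL}" for field in fields)


def collapse_semicolons(text):
    """Replace the nine trailing semicolons with one, except on 'Last...' lines."""
    return re.sub(r"^(;(?!Last).*)(;{9})", r"\1;", text, flags=re.M)


cleaned = collapse_semicolons(sample)
print(cleaned)
```

The negative lookahead runs right after the leading semicolon, so the "Last login" line never matches and keeps all of its trailing semicolons.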