Restructure CSV data with Notepad++, Regex

I have a CSV file with following headers and (sample) data:
StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA # StreetB,1 NameA,DirectionA,Lat,Long
StreetC # StreetD,1 NameA,DirectionA,Lat,Long
...
StreetE # StreetF,1 NameA,DirectionB,Lat,Long
StreetG # StreetH,1 NameA,DirectionB,Lat,Long
...
StreetI # StreetJ,2 NameB,DirectionC,Lat,Long
StreetK # StreetL,2 NameB,DirectionC,Lat,Long
...
StreetM # StreetN,2 NameB,DirectionD,Lat,Long
StreetO # StreetP,2 NameB,DirectionD,Lat,Long
.
.
.
I want to use regex (currently in Notepad++) to get the following results:
1 NameA - DirectionA=[[StreetA # StreetB,[Lat,Long]], [StreetC # StreetD,[Lat,Long]], ...]
1 NameA - DirectionB=[[StreetD # StreetE,[Lat,Long]], [StreetF # StreetG,[Lat,Long]], ...]
2 NameB - DirectionC=[[StreetH # StreetI,[Lat,Long]], [StreetJ # StreetK,[Lat,Long]], ...]
2 NameB - DirectionD=[[StreetL # StreetM,[Lat,Long]], [StreetN # StreetO,[Lat,Long]], ...]
.
.
.
With the Regex and Substitution,
RgX: ^([^,]*),([^,]*),([^,]*),(.*)
Sub: $2 - $3=[$1,[\4]]
Demo: https://regex101.com/r/gS9hD6/1
I have gotten this far:
1 NameA - DirectionA=[StreetA # StreetB,[Lat,Long]]
1 NameA - DirectionA=[StreetC # StreetD,[Lat,Long]]
1 NameA - DirectionB=[StreetE # StreetF,[Lat,Long]]
1 NameA - DirectionB=[StreetG # StreetH,[Lat,Long]]
2 NameB - DirectionC=[StreetI # StreetJ,[Lat,Long]]
2 NameB - DirectionC=[StreetK # StreetL,[Lat,Long]]
2 NameB - DirectionD=[StreetM # StreetN,[Lat,Long]]
2 NameB - DirectionD=[StreetO # StreetP,[Lat,Long]]
In a new regex, I tried splitting the above result on "=", but didn't know where to go from there.
I think one way to get the desired results would be to keep the first unique instance of what's before "=", replace each newline with "," and enclose the result in [..] to turn it into an array.
Edit:
There are about 10k stops (total), but only about 100 unique routes.
Edit 2: (maybe I am asking for too many changes now)
For the first regex:
What if I want to use "\n" instead of "="?
At the beginning of the 2nd regex replacement:
What if I have only RouteName and StopName columns, like this:
1 NameA - DirectionA=[StreetA # StreetB, ...]?
Similarly, what if I only have RouteName and Coordinates, like this:
1 NameA - DirectionA=[[Lat,Long]]?

Steps
1. First replacement:
Find what: ^([^,]*),([^,]*),([^,]*),(.*)
Replace with: \2 - \3=[[\1,[\4]]]
Replace All
2. Second replacement:
Find what: ^[\S\s]*?^([^][]*=)\[\[.*\]\]\K\]\R\1\[(.*)\]$
Replace with: , \2]
Replace All
3. Repeat step 2 until there are no more occurrences.
This means that if there are 100 instances (Stops) for the same key (Route - Direction pair), you will have to click Replace All 7 times (ceiling(log2(N))).
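The pass count can be checked with a small model. This is only a sketch: it assumes each Replace All merges adjacent pairs of lines that share a key, so the number of lines per key roughly halves on every pass. The function name passes_needed is made up for illustration.

```python
import math


def passes_needed(stops_per_key):
    """Count Replace All passes, assuming each pass merges adjacent pairs of lines."""
    lines = stops_per_key
    passes = 0
    while lines > 1:
        lines = math.ceil(lines / 2)  # every pair of lines collapses into one
        passes += 1
    return passes


# 100 -> 50 -> 25 -> 13 -> 7 -> 4 -> 2 -> 1
print(passes_needed(100), math.ceil(math.log2(100)))
```

Under that assumption the loop and the closed form ceiling(log2(N)) agree for N = 100.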
Description
I modified your regex in step 1 to add an extra pair of brackets that will enclose the whole set.
For step 2, it finds a pair of lines for the same Direction, appending the last to the previous one.
^[\S\s]*?^([^][]*=) #Group 1: captures "1 NameA - DirA="
\[\[.*\]\] #matches the set of Stops - "[[StA # StB,[Lat,Long]], ..."
\K #keeps the text matched so far out of the match
\]\R #closing "]" and newline
\1 #match next line (if the same route)
\[(.*)\]$ #and capture the Stop (Group 2)
regex101 Demo for step 1
regex101 Demo for step 2
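If leaving Notepad++ is an option, the same grouping can be done in one pass with a short script instead of repeated replacements. The following is a minimal sketch, not the method from the answer above: it assumes the header names shown in the question, and the function name group_stops and the sample data are made up for illustration.

```python
import csv
import io

# Sample data in the shape given in the question (placeholder values).
SAMPLE = """StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA # StreetB,1 NameA,DirectionA,Lat,Long
StreetC # StreetD,1 NameA,DirectionA,Lat,Long
StreetE # StreetF,1 NameA,DirectionB,Lat,Long
StreetG # StreetH,1 NameA,DirectionB,Lat,Long
"""


def group_stops(csv_text):
    """Group stops by 'RouteName - Travel_Direction', keeping first-seen order."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = f"{row['RouteName']} - {row['Travel_Direction']}"
        stop = f"[{row['StopName']},[{row['Latitude']},{row['Longitude']}]]"
        groups.setdefault(key, []).append(stop)
    # One output line per key, with all of its stops wrapped in an outer [..].
    return [f"{key}=[{', '.join(stops)}]" for key, stops in groups.items()]


for line in group_stops(SAMPLE):
    print(line)
```

A dictionary keyed on the route and direction pair does the "keep the first unique instance" part, so no repeated passes are needed regardless of how many stops share a key.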

Try this one.
I checked it in a mobile notepad app and got no errors.
Find what:
(s.+#\s\w+),(\d{1,} \w+),(\w+),(.+)
Replace with:
\2 - \3=[[\1,[\4],...]]

Related

Combine Regex searches into one match

I am trying to set up a Regex which should combine two searches into one full match.
My demo String is:
Name Klein Vorname Marvin
The Regex should find: Marvin Klein
The names can be different. Does anybody know how I can get this to work?
This is how far I already got: ^(?=^Name)(?=.*$)
Thanks!
In Python, using named groups and a general match for word characters:
import re
demo_str = "Name Klein Vorname Marvin"
pattern = r"""
(?:[\w]+) # Skip the first string ('Name')
\s+? # One or more whitespaces
(?P<last>[\w]+) # Named group 'last'
\s+? # One or more whitespaces
(?:[\w]+) # Skip the second string ('Vorname')
\s+? # One or more whitespaces
(?P<first>[\w]+) # Named group 'first'
"""
res = re.search(pattern, demo_str, flags=re.X) # re.X (verbose mode) allows whitespace and comments in the pattern
res.group("last") # 'Klein'
res.group("first") # 'Marvin'
EDIT: You can reformat to get your output pretty quickly. I don't know if positioning was expected to be part of the pattern.
" ".join([res.group("first"), res.group("last")]) # 'Marvin Klein'
Or, numerically:
" ".join([res.group(2), res.group(1)]) # 'Marvin Klein'

Regular expression to replace txt in Sublime text editor

I have thousands of rows to edit, so I need to use regex.
The rows look something like this:
T1,1,Example Text1 Text
T2,2,Example Text 2 Text
T3,3,Example Text 3 Text3
I want to convert this data to the following:
{pid:T1,sid:1,name:"Example Text1 Text"},
{pid:T2,sid:2,name:"Test Text1 Text"},
{pid:T3,sid:3,name:"Content Text1 Text"},
How can I do this?
I've tried replacing the first and last characters using ^ and $,
but I also want to convert ",1," to ",sid:1," and "Example Text" to "name:'Example Text'".
Any help would be appreciated.
Find ^(T\d+,)(\d+,)(.*)$ and replace with {pid:\1sid:\2name:"\3"},
Make sure that the search is set to use Regex.
Matching Regex Breakdown:
^ # Start of Line
( # Capture Group #1 (for Tx)
T # "T"
\d+ # 1 or more digits (T1, T2, T27, etc.)
, # ","
)
( # Capture Group #2 (for the `sid`)
\d+ # 1 or more digits (for the `sid`)
, # ","
)
( # Capture Group #3 (for the string)
.* # String (name)
)
$ # End of Line
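To try the same find/replace outside the editor, here is a minimal Python sketch. It assumes Python's re module with re.MULTILINE so that ^ and $ match at every line; the function name to_records and the sample text are made up for illustration.

```python
import re

# Same pattern as above; MULTILINE makes ^ and $ apply to each line.
ROW = re.compile(r"^(T\d+,)(\d+,)(.*)$", flags=re.MULTILINE)


def to_records(text):
    """Rewrite each 'Tn,n,name' row as {pid:..,sid:..,name:".."},"""
    return ROW.sub(r'{pid:\1sid:\2name:"\3"},', text)


sample = "T1,1,Example Text1 Text\nT2,2,Example Text 2 Text\nT3,3,Example Text 3 Text3"
print(to_records(sample))
```

Groups 1 and 2 keep their trailing commas, which is why the replacement string does not add any between the fields.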

Removing everything except a "part" of the string

Here is the string, a full example:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
('1416851067', '1416851067', '174.66.174.101', '98.148.244.151', 'maihym', 393885, '1473193487'),
('1416851094', '1416851094', '2607:5300:60:6097::', '92.157.2.230', 'xeosse26', 393886, '737537953'),
I'd like to remove -EVERYTHING- from it except: facebook:jens.pettersson.7568
(the username slot)
And where facebook:jens.pettersson.7568 is actually 'facebook:jens.pettersson.7568', I'd like it to appear as:
facebook:jens.pettersson.7568 (see the white space there?)
Then sort my list where all 361k lines line up like so:
x x xx xcx xzx xyx xtz
All with spaces, in technically 1 line, if possible.
Or, if removing everything else and just collecting the one field I need would suffice, I could do the sorting manually, I suppose.
I'm going to read between the lines and guess that what you want is this:
BEFORE:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
^ this is username
AFTER:
facebook:humpy_electro
You could handle that with the following regex:
s/(?:[^,]*,){4}[\s'"]*([^'",]*).*/facebook:$1, /
i.e.
(?: # begin non-capturing group
[^,]*, # zero or more non-comma characters, followed by a comma
){4} # end non-capturing group, and repeat 4 times
# this skips the first 4 columns of data
[\s'"]* # matches any whitespace and the first quote
( # begin capturing group 1
[^'",]* # capture all non-comma characters until the end quote
) # end capturing group 1
.* # match rest of line
# REPLACE WITH
facebook: # literal text
$1 # capturing group 1
, # comma and a trailing space (not shown here)
And voila.
This turns this:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
('1416851067', '1416851067', '174.66.174.101', '98.148.244.151', 'maihym', 393885, '1473193487'),
('1416851094', '1416851094', '2607:5300:60:6097::', '92.157.2.230', 'xeosse26', 393886, '737537953'),
Into this
facebook:humpy_electro, facebook:lagbugdc, facebook:maihym, facebook:xeosse26,
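The substitution can also be tried in Python. This is a sketch under a few assumptions: the pattern is applied one line at a time, the results are joined with spaces to get the single-line output, and the names USERNAME and usernames_on_one_line are made up for illustration.

```python
import re

# Skip four comma-terminated fields, then capture the fifth without its quotes.
USERNAME = re.compile(r"""(?:[^,]*,){4}[\s'"]*([^'",]*).*""")

ROWS = """
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
('1416851067', '1416851067', '174.66.174.101', '98.148.244.151', 'maihym', 393885, '1473193487'),
('1416851094', '1416851094', '2607:5300:60:6097::', '92.157.2.230', 'xeosse26', 393886, '737537953'),
"""


def usernames_on_one_line(text):
    """Replace each non-empty row with 'facebook:<username>,' and join with spaces."""
    rows = [line for line in text.splitlines() if line.strip()]
    return " ".join(USERNAME.sub(r"facebook:\1,", row) for row in rows)


result = usernames_on_one_line(ROWS)
print(result)
```

Joining per-line results is needed here because .* does not match newlines, so the substitution alone would leave each username on its own line.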
I got a solution from a friend; it takes two steps. First step: replace ^((.*? '){4}) with nothing. Second step: replace '((.*?$){1}) with nothing.

Regex how to get groups within json expression

Say I have following JSON:
{"MyPerson":{"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}},"AnotherComplexObject":{"Another":"Yes","Me":"True"},"Count":1,"Start":2}
I remove starting { and ending } and get:
"MyPerson":{"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}},"AnotherComplexObject":{"Another":"Yes","Me":"True"},"Count":1,"Start":2
Now, what regex would I use to get "complex objects" out, for example in that JSON I would want to get these two results:
{"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}}
{"Another":"Yes","Me":"True"}
The closest I've come to a solution is the regex {[^}]*}, but it fails to include the closing } after the "Number":15 part.
# String
# "MyPerson":{"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}},"AnotherComplexObject":{"Another":"Yes","Me":"True"},"Count":1,"Start":2
/({[^}]+}})/
# http://rubular.com/r/rupVEn9yZo
# Match Groups
# 1. {"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}}
/({[^}]+})/
# http://rubular.com/r/H5FaoH18c8
# Match Groups
# Match 1
# 1. {"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}
# Match 2
# 1. {"Another":"Yes","Me":"True"}
/({[^}]+}})[^{]+({[^}]+})/
# http://rubular.com/r/zmcyjvoR1y
# Match Groups
# 1. {"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}}
# 2. {"Another":"Yes","Me":"True"}
# String
# {"MyPerson":{"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}},"AnotherComplexObject":{"Another":"Yes","Me":"True"},"Count":1,"Start":2}
/[^{]+({[^}]+}})[^{]+({[^}]+})/
# http://rubular.com/r/qCxN1Rk9Ka
# Match Groups
# 1. {"Firstname":"First","Lastname":"Last","Where":{"Street":"Street","Number":15}}
# 2. {"Another":"Yes","Me":"True"}
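The regexes above depend on the exact nesting depth of this sample. As an alternative, and a different technique from the regex approach, the JSON can be parsed and the top-level values that are objects picked out. This is a minimal Python sketch using the standard json module; the variable names are made up for illustration.

```python
import json

raw = (
    '{"MyPerson":{"Firstname":"First","Lastname":"Last",'
    '"Where":{"Street":"Street","Number":15}},'
    '"AnotherComplexObject":{"Another":"Yes","Me":"True"},'
    '"Count":1,"Start":2}'
)

data = json.loads(raw)

# Keep only top-level values that are themselves objects, re-serialized compactly.
complex_objects = [
    json.dumps(value, separators=(",", ":"))
    for value in data.values()
    if isinstance(value, dict)
]

for obj in complex_objects:
    print(obj)
```

Parsing first means the nesting depth of "Where" no longer matters, and scalar entries such as "Count" and "Start" are skipped by the type check.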

Replacing data in CSV file with Regex

I have a CSV file (exported data from iWork Numbers) which contains a list of users with information. What I want to do is to replace ;;;;;;;;; with ; on all lines except "Last login".
By doing so and importing the file to Numbers again the data will (hopefully) be divided in rows like this:
User 1 | Points: 1 | Registered: 2012-01-01 | Last login 2012-02-02
User 2 | Points: 2 | Registered: 2012-01-01 | Last login 2012-02-02
How the CSV file looks:
;User1;;;;;;;;;
;Points: 1;;;;;;;;;
;Registered: 2012-01-01;;;;;;;;;
;Last login: 2012-02-02;;;;;;;;;
;User2;;;;;;;;;
;Points: 2;;;;;;;;;
;Registered: 2012-01-01;;;;;;;;;
;Last login: 2012-02-02;;;;;;;;;
So my question is what Regex code should I type in the Find and Replace fields?
Thanks in advance!
See the regex in action:
Find : ^(;(?!Last).*)(;{9})
Replace: $1;
Output will be:
;User1;
;Points: 1;
;Registered: 2012-01-01;
;Last login: 2012-02-02;;;;;;;;;
;User2;
;Points: 2;
;Registered: 2012-01-01;
;Last login: 2012-02-02;;;;;;;;;
Explanation
Find:
^ # Match start of the line
( # Start of the 1st capture group
;(?!Last) # Match a semicolon (;), only if not followed by 'Last' word.
.* # Match everything
) # End of the 1st capture group
( # Start of the 2nd capture group
;{9} # Match exactly 9 semicolons
) # End of the 2nd capture group
Replace:
$1; # Leave 1st capture group as is and append a semicolon.
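The same find/replace can be checked in Python. This is a minimal sketch that assumes re.MULTILINE so ^ matches at each line start; the function name collapse_semicolons and the sample rows are made up for illustration.

```python
import re

# Same pattern as above; the lookahead skips lines that start with ';Last'.
TRAILING = re.compile(r"^(;(?!Last).*)(;{9})", flags=re.MULTILINE)


def collapse_semicolons(text):
    """Replace the nine trailing semicolons with one, except on 'Last login' lines."""
    return TRAILING.sub(r"\1;", text)


NINE = ";" * 9
sample = "\n".join([
    ";User1" + NINE,
    ";Points: 1" + NINE,
    ";Registered: 2012-01-01" + NINE,
    ";Last login: 2012-02-02" + NINE,
])

print(collapse_semicolons(sample))
```

The greedy .* first consumes the whole line and then backtracks just far enough for ;{9} to match, so only the trailing run is replaced.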