Replacing data in CSV file with Regex

I have a CSV file (exported from iWork Numbers) that contains a list of users with their information. What I want to do is replace ;;;;;;;;; with ; on all lines except the "Last login" lines.
After doing so and importing the file into Numbers again, the data will (hopefully) be divided into rows like this:
User 1 | Points: 1 | Registered: 2012-01-01 | Last login 2012-02-02
User 2 | Points: 2 | Registered: 2012-01-01 | Last login 2012-02-02
How the CSV file looks:
;User1;;;;;;;;;
;Points: 1;;;;;;;;;
;Registered: 2012-01-01;;;;;;;;;
;Last login: 2012-02-02;;;;;;;;;
;User2;;;;;;;;;
;Points: 2;;;;;;;;;
;Registered: 2012-01-01;;;;;;;;;
;Last login: 2012-02-02;;;;;;;;;
So my question is what Regex code should I type in the Find and Replace fields?
Thanks in advance!

See the regex in action:
Find : ^(;(?!Last).*)(;{9})
Replace: $1;
Output will be:
;User1;
;Points: 1;
;Registered: 2012-01-01;
;Last login: 2012-02-02;;;;;;;;;
;User2;
;Points: 2;
;Registered: 2012-01-01;
;Last login: 2012-02-02;;;;;;;;;
Explanation
Find:
^ # Match start of the line
( # Start of the 1st capture group
;(?!Last) # Match a semicolon (;), but only if it is not followed by the word 'Last'.
.* # Match the rest of the line (greedy; it backtracks to leave the trailing semicolons)
) # End of the 1st capture group
( # Start of the 2nd capture group
;{9} # Match exactly 9 semicolons
) # End of the 2nd capture group
Replace:
$1; # Leave 1st capture group as is and append a semicolon.
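If you would rather script the replacement than use a Find and Replace dialog, here is a minimal Python sketch of the same pattern. The function name `collapse_semicolons` and the sample text are made up for illustration; Python's `re` spells the backreference `\1` rather than `$1`, and `re.MULTILINE` is needed so that `^` matches at the start of every line.

```python
import re

# Same pattern as in the answer; MULTILINE makes ^ match at each line start.
PATTERN = re.compile(r"^(;(?!Last).*)(;{9})", re.MULTILINE)


def collapse_semicolons(text: str) -> str:
    """Replace the 9 trailing semicolons with one, except on 'Last ...' lines."""
    # \1 keeps the first capture group; the literal ; replaces the second group.
    return PATTERN.sub(r"\1;", text)


if __name__ == "__main__":
    tail = ";" * 9  # the nine trailing semicolons from the export
    sample = "\n".join([
        ";User1" + tail,
        ";Points: 1" + tail,
        ";Registered: 2012-01-01" + tail,
        ";Last login: 2012-02-02" + tail,
    ])
    print(collapse_semicolons(sample))
```

The "Last login" line is left untouched because the negative lookahead fails right after its leading semicolon, and the `^` anchor prevents the match from starting anywhere else on that line.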

Related

Capturing the same regular expression over multiple lines

I want to capture a series of file names, each listed on its own line. I have figured out how to capture the file name on the first line, but I haven't figured out how to repeat the capture on the subsequent lines.
# Input
# data/raw/file1
# data/raw/file2
# Output
# data/interim/file1
# data/interim/file2
Current Attempt
The regular expression I currently have is
# Input\n(# (.*))
And my inner capture group properly captures data/raw/file1.
Desired Output
What I want is to grab all of the files in between # Input and # Output, so in this example, data/raw/file1 and data/raw/file2.

Go with \G magic:
(?:^#\s+Input|\G(?!\A))\R*(?!#\s+Output)#\s*(.*)|[\s\S]*
Live demo
Regex breakdown
(?: # Start of non-capturing group (a)
^#\s+Input # Match a line beginning with `# Input`
| # Or
\G(?!\A) # Continue from previous successful match point
) # End of NCG (a)
\R* # Match any kind of newline characters
(?!#\s+Output) # Which are not followed by such a line `# Output`
#\s*(.*) # Start matching a path line and capture path
| # If the previous alternative didn't match...
[\s\S]* # ...then consume everything else up to the end, so the engine stops retrying at later positions
PHP code:
$re = '~(?:^#\s+Input|\G(?!\A))\R*(?!#\s+Output)#\s*(.*)|[\s\S]*~m';
$str = '# Input
# data/raw/file1
# data/raw/file2
# Output
# data/interim/file1
# data/interim/file2';
preg_match_all($re, $str, $matches, PREG_PATTERN_ORDER, 0);
// Print the entire match result
print_r(array_filter($matches[1]));
Output:
Array
(
[0] => data/raw/file1
[1] => data/raw/file2
)

Using the s modifier, preg_match, and preg_split you can get each result on its own.
preg_match('/# Input\n(# (?:.*?))# Output/s', '# Input
# data/raw/file1
# data/raw/file2
# Output
# data/interim/file1
# data/interim/file2', $match);
$matched = preg_split('/# /', $match[1], -1, PREG_SPLIT_NO_EMPTY);
print_r($matched);
Demo: https://3v4l.org/dAcRp
Regex demo: https://regex101.com/r/5tfJGM/1/
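For anyone doing this in Python: the built-in `re` module does not support `\G` or `\R`, so the sketch below follows the two-step idea instead (isolate the block first, then pull out each line). The function name `paths_between` and its `start`/`end` parameters are hypothetical, and it assumes every line is prefixed with "# " exactly as in the question.

```python
import re


def paths_between(text: str, start: str = "Input", end: str = "Output") -> list[str]:
    """Return the paths listed between '# <start>' and '# <end>' marker lines."""
    # Step 1: grab everything between the two marker lines (DOTALL lets . span newlines).
    block = re.search(
        rf"^# {re.escape(start)}\n(.*?)^# {re.escape(end)}$",
        text,
        re.DOTALL | re.MULTILINE,
    )
    if block is None:
        return []
    # Step 2: capture what follows '# ' on each line inside that block.
    return re.findall(r"^# (.*)$", block.group(1), re.MULTILINE)


if __name__ == "__main__":
    sample = (
        "# Input\n"
        "# data/raw/file1\n"
        "# data/raw/file2\n"
        "# Output\n"
        "# data/interim/file1\n"
        "# data/interim/file2\n"
    )
    print(paths_between(sample))
```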

Regular expression to replace txt in Sublime text editor

I have thousands of rows to edit, so I need to use regex.
The rows look like this:
T1,1,Example Text1 Text
T2,2,Example Text 2 Text
T3,3,Example Text 3 Text3
I want to convert this data to the following format:
{pid:T1,sid:1,name:"Example Text1 Text"},
{pid:T2,sid:2,name:"Test Text1 Text"},
{pid:T3,sid:3,name:"Content Text1 Text"},
How can I do this?
I've tried replacing the first and last characters using ^ and $,
but I also want to convert ",1," to ",sid:1," and "Example Text" to "name:'Example Text'".
Any help would be appreciated.

Find ^(T\d+,)(\d+,)(.*)$ and replace with {pid:\1sid:\2name:"\3"},
Make sure that the search is set to use Regex.
Matching Regex Breakdown:
^ # Start of Line
( # Capture Group #1 (for Tx)
T # "T"
\d+ # 1 or more digits (T1, T2, T27, etc.)
, # ","
)
( # Capture Group #2 (for the `sid`)
\d+ # 1 or more digits (for the `sid`)
, # ","
)
( # Capture Group #3 (for the string)
.* # String (name)
)
$ # End of Line
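The same find/replace pair also works outside Sublime Text. Below is a small Python sketch, assuming the rows always follow the `Tn,number,text` shape shown in the question; the function name `rows_to_objects` is invented for the example.

```python
import re

# Same pattern as above; MULTILINE lets ^ and $ anchor on every row.
ROW = re.compile(r"^(T\d+,)(\d+,)(.*)$", re.MULTILINE)


def rows_to_objects(text: str) -> str:
    """Rewrite 'T1,1,Some text' rows as '{pid:T1,sid:1,name:"Some text"},'."""
    # Groups 1 and 2 already end with a comma, so none is added here.
    return ROW.sub(r'{pid:\1sid:\2name:"\3"},', text)


if __name__ == "__main__":
    sample = "T1,1,Example Text1 Text\nT2,2,Example Text 2 Text\nT3,3,Example Text 3 Text3"
    print(rows_to_objects(sample))
```

Rows that do not match the pattern are left as they are, which makes it easy to spot lines that need manual attention.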

Restructure CSV data with Notepad++, Regex

I have a CSV file with following headers and (sample) data:
StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA # StreetB,1 NameA,DirectionA,Lat,Long
StreetC # StreetD,1 NameA,DirectionA,Lat,Long
...
StreetE # StreetF,1 NameA,DirectionB,Lat,Long
StreetG # StreetH,1 NameA,DirectionB,Lat,Long
...
StreetI # StreetJ,2 NameB,DirectionC,Lat,Long
StreetK # StreetL,2 NameB,DirectionC,Lat,Long
...
StreetM # StreetN,2 NameB,DirectionD,Lat,Long
StreetO # StreetP,2 NameB,DirectionD,Lat,Long
.
.
.
I want to use regex (currently in Notepad++) to get the following results:
1 NameA - DirectionA=[[StreetA # StreetB,[Lat,Long]], [StreetC # StreetD,[Lat,Long]], ...]
1 NameA - DirectionB=[[StreetD # StreetE,[Lat,Long]], [StreetF # StreetG,[Lat,Long]], ...]
2 NameB - DirectionC=[[StreetH # StreetI,[Lat,Long]], [StreetJ # StreetK,[Lat,Long]], ...]
2 NameB - DirectionD=[[StreetL # StreetM,[Lat,Long]], [StreetN # StreetO,[Lat,Long]], ...]
.
.
.
With the Regex and Substitution,
RgX: ^([^,]*),([^,]*),([^,]*),(.*)
Sub: $2 - $3=[$1,[\4]]
Demo: https://regex101.com/r/gS9hD6/1
I have gotten this far:
1 NameA - DirectionA=[StreetA # StreetB,[Lat,Long]]
1 NameA - DirectionA=[StreetC # StreetD,[Lat,Long]]
1 NameA - DirectionB=[StreetE # StreetF,[Lat,Long]]
1 NameA - DirectionB=[StreetG # StreetH,[Lat,Long]]
2 NameB - DirectionC=[StreetI # StreetJ,[Lat,Long]]
2 NameB - DirectionC=[StreetK # StreetL,[Lat,Long]]
2 NameB - DirectionD=[StreetM # StreetN,[Lat,Long]]
2 NameB - DirectionD=[StreetO # StreetP,[Lat,Long]]
In a new regex, I tried splitting the above result on "=", but didn't know where to go from there.
I think one way to get the desired results would be to keep first unique instance of what's before "=", replace new line with "," and enclose it with a [..] to make it an array form.
Edit:
There are about 10k stops (total), but only about 100 unique routes.
Edit 2: (maybe I am asking for too many changes now)
For the first regex:
What if I want to use "\n" instead of "="?
At the beginning of the second regex replacement:
What if I have only the RouteName and StopName columns, like this: 1 NameA - DirectionA=[StreetA # StreetB, ...]?
Similarly, what if I only have RouteName and coordinates, like this: 1 NameA - DirectionA=[[Lat,Long]]?

Steps
1. First replacement:
Find what: ^([^,]*),([^,]*),([^,]*),(.*)
Replace with: \2 - \3=[[\1,[\4]]]
Replace All
2. Second replacement:
Find what: ^[\S\s]*?^([^][]*=)\[\[.*\]\]\K\]\R\1\[(.*)\]$
Replace with: , \2]
Replace All
3. Repeat step 2 until there are no more occurrences.
This means that if there are 100 instances (Stops) for the same key (Route - Direction pair), you will have to click Replace All 7 times (ceiling(log2(N)), with N = 100).
Description
I modified your regex in step 1 to add an extra pair of brackets that will enclose the whole set.
For step 2, it finds a pair of lines for the same Direction, appending the last to the previous one.
^[\S\s]*?^([^][]*=) #Group 1: captures "1 NameA - DirA="
\[\[.*\]\] #matches the set of Stops - "[[StA # StB,[Lat,Long]], ..."
\K #keeps the text matched so far out of the match
\]\R #closing "]" and newline
\1 #match next line (if the same route)
\[(.*)\]$ #and capture the Stop (Group 2)
regex101 Demo for step 1
regex101 Demo for step 2
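If repeating Replace All becomes tedious, the grouping can also be done with a short script instead of regex. This is a different technique from the steps above, not a port of them: a rough Python sketch that reads the CSV and collects the stops per "Route - Direction" key. It assumes the five-column header shown in the question and no quoted commas; the function name `group_stops` is made up.

```python
import csv
import io


def group_stops(csv_text: str) -> list[str]:
    """Group stops by 'RouteName - Travel_Direction' in first-seen order."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    groups: dict[str, list[str]] = {}
    for row in reader:
        if len(row) < 5:
            continue  # ignore blank or malformed rows
        stop, route, direction, lat, lon = row[:5]
        groups.setdefault(f"{route} - {direction}", []).append(f"[{stop},[{lat},{lon}]]")
    # One output line per key: key=[[stop,[lat,long]], [stop,[lat,long]], ...]
    return [f"{key}=[{', '.join(stops)}]" for key, stops in groups.items()]


if __name__ == "__main__":
    sample = (
        "StopName,RouteName,Travel_Direction,Latitude,Longitude\n"
        "StreetA # StreetB,1 NameA,DirectionA,Lat,Long\n"
        "StreetC # StreetD,1 NameA,DirectionA,Lat,Long\n"
        "StreetE # StreetF,1 NameA,DirectionB,Lat,Long\n"
    )
    print("\n".join(group_stops(sample)))
```

A dictionary keeps the routes in the order they first appear, so the output order follows the input file.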

Try this one.
I checked it in a mobile notepad app and it ran without errors.
Find what:
(s.+#\s\w+),(\d{1,} \w+),(\w+),(.+)
Replace with:
\2 - \3=[[\1,[\4],...]]

Removing everything except a "part" of the string

Here is the string, a full example:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
('1416851067', '1416851067', '174.66.174.101', '98.148.244.151', 'maihym', 393885, '1473193487'),
('1416851094', '1416851094', '2607:5300:60:6097::', '92.157.2.230', 'xeosse26', 393886, '737537953'),
I'd like to remove -EVERYTHING- from it except: facebook:jens.pettersson.7568
(the username slot)
And where facebook:jens.pettersson.7568 is actually 'facebook:jens.pettersson.7568', I'd like it to appear as:
facebook:jens.pettersson.7568 (see the white space there?)
Then sort my list where all 361k lines line up like so:
x x xx xcx xzx xyx xtz
All separated by spaces, on what is technically one line, if possible.
Or, if just removing the rest and collecting the one field I need would suffice, I could do the sorting manually, I suppose.

I'm going to read between the lines and guess that what you want is this:
BEFORE:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
^ this is username
AFTER:
facebook:humpy_electro
You could handle that with the following regex:
s/(?:[^,]*,){4}[\s'"]*([^'",]*).*/facebook:$1, /
i.e.
(?: # begin non-capturing group
[^,]*, # zero or more non-comma characters, followed by a comma
){4} # end non-capturing group, and repeat 4 times
# this skips the first 4 columns of data
[\s'"]* # matches any whitespace and the first quote
( # begin capturing group 1
[^'",]* # capture all non-comma characters until the end quote
) # end capturing group 1
.* # match rest of line
# REPLACE WITH
facebook: # literal text
$1 # capturing group 1
, # comma and a trailing space (not shown here)
And voila.
This turns this:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
('1416851067', '1416851067', '174.66.174.101', '98.148.244.151', 'maihym', 393885, '1473193487'),
('1416851094', '1416851094', '2607:5300:60:6097::', '92.157.2.230', 'xeosse26', 393886, '737537953'),
Into this
facebook:humpy_electro, facebook:lagbugdc, facebook:maihym, facebook:xeosse26,
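The same substitution can be run from a script. Here is a minimal Python sketch, assuming the username is always the fifth column and that none of the earlier columns contain a comma; the name `extract_usernames` is hypothetical. It applies the pattern line by line and joins the results, which gives the single-line output shown above.

```python
import re

# Same pattern as the s/// expression above, without the sed-style delimiters.
USERNAME = re.compile(r"""(?:[^,]*,){4}[\s'"]*([^'",]*).*""")


def extract_usernames(text: str) -> str:
    """Return 'facebook:<username>, ' for every non-empty line, joined on one line."""
    return "".join(
        USERNAME.sub(r"facebook:\1, ", line)
        for line in text.splitlines()
        if line.strip()  # skip empty lines
    )


if __name__ == "__main__":
    sample = (
        "('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),\n"
        "('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),\n"
    )
    print(extract_usernames(sample))
```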

I got it, with help from a friend. It takes two steps. First step: find ^((.? '){4}) and replace it with nothing. Second step: find '((.?$){1}) and replace it with nothing.

Regex on Splitting String

I am trying to split the following string into the proper output using regex. Answers do not have to be in Perl; a general regex is fine:
Username is required.
Multi-string name is optional.
Followed by Uselessword, which is there but should not be part of the output.
Followed by an optional number.
Followed by an IP in angle brackets < > (required).
String = username optional multistring name uselessword 45 <100.100.100.100>
Output should be:
Match 1 = username
Match 2 = optional multistring name
Match 3 = 45
Match 4 = 100.100.100.100

This sort of thing is easier to handle using multiple regexes. Here is an example:
my @arr = (
    'username optional multistring name uselessword 45 <100.100.100.100>',
    'username 45 <100.100.100.100>'
);
for (@arr) {
    ## four groups: username, optional name, number, IP
    ## you can use anchor ^ $ here
    if (/(\S+) (.+?) (\d+) <(.+?)>/) {
        print "$1\n$2\n$3\n$4\n";
    }
    ## three groups: username, number, IP (an empty line stands in for the missing name)
    ## you can use anchor ^ $ here
    elsif (/(\S+) (\d+) <(.+?)>/) {
        print "$1\n$2\n\n$3\n";
    }
    print "==========\n";
}
The first if block looks for four groups in the input, and the second block looks for three groups.
If you need to, you can use [ ]+ to handle multiple spaces between the groups.
You can also restrict the optional group (.+?) to the characters you expect, usually with a character class such as [bla].
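Since the question says the answer need not be in Perl, here is a rough Python sketch of the same two-regex idea, with the anchors added. The names `FULL`, `SHORT` and `parse_line` are invented for illustration. As in the code above, the first pattern does not strip "uselessword": it ends up at the end of the second group.

```python
import re

# Four groups: username, optional multi-word name, number, IP.
FULL = re.compile(r"^(\S+) (.+?) (\d+) <(.+?)>$")
# Three groups: username, number, IP (no multi-word name present).
SHORT = re.compile(r"^(\S+) (\d+) <(.+?)>$")


def parse_line(line: str):
    """Return (username, name, number, ip), or None if neither pattern matches."""
    m = FULL.match(line)
    if m:
        return m.groups()
    m = SHORT.match(line)
    if m:
        user, number, ip = m.groups()
        return (user, "", number, ip)  # empty string stands in for the missing name
    return None


if __name__ == "__main__":
    for line in (
        "username optional multistring name uselessword 45 <100.100.100.100>",
        "username 45 <100.100.100.100>",
    ):
        print(parse_line(line))
```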
