Notepad++ and regex to put a bunch of words into [ ], each word in quotation marks and separated by commas - regex

I have a list of about a thousand rows like this:
"Categories": "Action, Adventure, Comedy, Fantasy",
"Categories": "Action, Adventure",
"Categories": "Action, Adventure, Comedy, Drama,Fantasy, Martial Arts, Mystery, Supernatural",
"Categories": "Action,Adventure, Comedy, Fantasy,Psychological, School Life, Supernatural",
and I'd like to turn it into this:
"Categories": ["Action", "Adventure", "Comedy", "Fantasy"]
"Categories": ["Action", "Adventure"]
"Categories": ["Action", "Adventure", "Comedy", "Drama", "Fantasy", "Mystery", "Supernatural"]
"Categories": ["Action", "Adventure", "Comedy", "Fantasy", "Psychological", "Supernatural"]
I've tried a bunch of regular expressions, such as
("Categories":) "(\b.*?), (\b.*?), (.*), (.*), (\w+?)",
and I am still stuck, because I am still green at this stuff.
Please help me solve this with regex, and thank you for the answer.

In two steps:
step 1: you replace the string with an array of strings when there is more than one item
search: "Categories":\s*\K("[^",]*+[^"]+")
replace: [$1]
step 2: you replace all the commas in the string
search: (\G(?!^)|"Categories":\s*\[")[^",]+?\K\s*,\s*
replace: ", "

Try:
pattern: ("Categories":) ("[^"]*")
substitute with: $1[$2]
bye

How to swap two words in Visual Studio Code with a find and replace?

The project I'm working on has a number of yaml files, where all the instances of lat: and long: need to be swapped, since the data is incorrectly labeled.
So for instance, the following:
- lat: "-82.645672"
long: '44.941747'
title: "Item 1"
- lat: "-82.645744"
long: '44.940731'
title: "Item 2"
- lat: "-82.645744"
long: '44.940731'
title: "Item 3"
- lat: "-82.646599"
long: '44.941441'
title: "Item 4"
Would need to look like this:
- long: "-82.645672"
lat: '44.941747'
title: "Item 1"
- long: "-82.645744"
lat: '44.940731'
title: "Item 2"
- long: "-82.645744"
lat: '44.940731'
title: "Item 3"
- long: "-82.646599"
lat: '44.941441'
title: "Item 4"
I'm struggling to figure out how to swap these two words globally. I looked at the plugins that are available, but they only seem to work on the current file you're editing, and only when you highlight a couple of words (e.g. this one: https://marketplace.visualstudio.com/items?itemName=davidmart.swap-word). I was looking into using regex as a possible solution, but I can only find ways to reorder words on the same line. Is there a regex that can be used in a find and replace to swap two words, and that can be applied to all files in a project?
To swap words across files (see the end of this answer for swapping words in one file easily):
Try this regex:
^(-\s+)(lat)(.*)(\n\s*)(long)
and replace with:
$1$5$3$4$2
See regex101 demo.
This works perfectly fine for me in the find/replace widget but not in the search/replace across files panel. Why? See this "resolved" issue: issue: regex search and replace.
The issue seems to indicate it was provisionally "fixed" but it doesn't appear that it has been.
I was going to open a new issue but found this from earlier this week: issue: capture groups don't work when regex includes newline. So hopefully it will be fixed this iteration.
I am happy to report that this bug has been fixed in the Insiders Build 2019-09-16!
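While the across-files panel was still broken, one workaround was to run the same multi-line swap outside the editor. A minimal Python sketch, assuming the YAML files sit under the current directory and end in .yml or .yaml:

import pathlib
import re

# Same pattern as above: swap the two words when "lat:" starts a list
# item and "long:" follows on the next line.
pattern = re.compile(r'^(-\s+)(lat)(.*)(\n\s*)(long)', re.MULTILINE)

for path in list(pathlib.Path('.').rglob('*.yml')) + list(pathlib.Path('.').rglob('*.yaml')):
    text = path.read_text(encoding='utf-8')
    path.write_text(pattern.sub(r'\g<1>\g<5>\g<3>\g<4>\g<2>', text), encoding='utf-8')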
To swap words in a single file only, you can use this extension I wrote: Find and Transform and this keybinding:
{
  "key": "alt+s",                       // whatever keybinding you want
  "command": "findInCurrentFile",
  "args": {
    "find": "(lat)|(long)",
    "replace": "${1:+long}${2:+lat}",   // swap here
    "isRegex": true
  }
}
There is no reason you couldn't make that swap 3+ words in whatever sequence you want.
${1:+long} is a conditional which says if there is a capture group 1, replace it with the text long.
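The same trick works in any regex flavor that lets you test which group matched. A small Python sketch of the idea, using a replacement callback instead of the conditional syntax:

import re

text = """- lat: "-82.645672"
  long: '44.941747'"""

# Exactly one of the two groups matches per hit, so the callback can
# branch on it and emit the other word.
swapped = re.sub(r'\b(lat)\b|\b(long)\b',
                 lambda m: 'long' if m.group(1) else 'lat',
                 text)
print(swapped)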
You can just use the plain replace feature.
If you are using Windows, the shortcut is Ctrl+H.
Ctrl+H, replace lat with dummy
Ctrl+H, replace long with lat
Ctrl+H, replace dummy with long
With the Replace Rules extension
"replacerules.rules": {
"Swap lat-long 1": {
"find": ["lat","long"],
"replace": ["XYZ","ABC"]
},
"Swap lat-long 2": {
"find": ["XYZ","ABC"],
"replace": ["long","lat"]
}
},
"replacerules.rulesets": {
"Swap lat-long": {
"rules": [
"Swap lat-long 1",
"Swap lat-long 2"
]
}
}
Then execute command: Replace Rules: Run Ruleset...
Dude.
Remember the algorithm to swap 2 strings?
temp=str1
str1=str2
str2=temp
replace "long" with "TEMP".
replace "lat" with "long".
replace "TEMP" with "lat".
That's it.
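The same three passes, sketched in Python; the TEMP placeholder is assumed not to occur anywhere in the data:

text = '- lat: "-82.645672"\n  long: \'44.941747\''

# Three plain (non-regex) replacements, exactly like doing it by hand
# with Ctrl+H.
text = text.replace('long', 'TEMP')
text = text.replace('lat', 'long')
text = text.replace('TEMP', 'lat')
print(text)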

Converting text log file with data in quotes to individual columns, using RegEx

I have a text log file, and everything I want to capture within this file is in quotes (each value on a separate line).
"rows": [
{"values": [
"word",
"120.134.12.43",
"34780",
"33.334.115.100",
"9834-5202011",
"221",
"NYC-LOG-01.test",
"something.test.com",
"something.test.com\/",
"internet-communications-and-telephony",
"983439849389483",
"unknown, United States"
]},
{"values": [
"ssl",
"20.311.3.21",
"3443",
"40.51.96.219",
"93140-9834811",
"211",
"nyc-log-01.test",
"a.jones.com",
"a.jones.com\/",
"news",
"3434231343434356",
"Somewhere, California, United States, 12345"
I want to capture the data after (but not including) each line that says "values". The first row says "rows", but this does not appear again. I would like to export the result to a CSV file. Each txt file has 12 rows of data I want to capture, but it would be nice if I could increase this too.
Something like this?
import csv
import json

with open('data.json') as data_file:
    data = json.load(data_file)

# Write one CSV row per "values" list.
with open('output.csv', 'w', newline='') as output_file:
    writer = csv.writer(output_file)
    for entry in data['rows']:
        writer.writerow(entry['values'])
You are converting your data from JSON to CSV. Be aware that JSON is already a very easily parsed format, so this conversion is not necessarily needed or even a good idea.
Assuming your input is called data.json:
{"rows": [
{"values": [
"word",
"120.134.12.43",
"34780",
"33.334.115.100",
"9834-5202011",
"221",
"NYC-LOG-01.test",
"something.test.com",
"something.test.com\/",
"internet-communications-and-telephony",
"983439849389483",
"unknown, United States"
]},
{"values": [
"ssl",
"20.311.3.21",
"3443",
"40.51.96.219",
"93140-9834811",
"211",
"nyc-log-01.test",
"a.jones.com",
"a.jones.com\/",
"news",
"3434231343434356",
"Somewhere, California, United States, 12345"
]
}]}
Your data looks like it is in JSON format (you're just missing the surrounding {}). If so, the simplest approach would be to use a parser such as jq:
$ jq -r '.rows[].values | @csv' input-file
"word","120.134.12.43","34780","33.334.115.100","9834-5202011","221","NYC-LOG-01.test","something.test.com","something.test.com/","internet-communications-and-telephony","983439849389483","unknown, United States"
"ssl","20.311.3.21","3443","40.51.96.219","93140-9834811","211","nyc-log-01.test","a.jones.com","a.jones.com/","news","3434231343434356","Somewhere, California, United States, 12345"
Or you could use the json module that ships with Python
$ python -c 'import csv, json, sys; csv.writer(sys.stdout).writerows(row["values"] for row in json.load(sys.stdin)["rows"])' < filename
Finally, a terrible approach (but one that does what you want for this particular input) could be:
sed -n '/^ *\(".*"\),*$/{ s/^ *//; H; }; /^ *\]/{ s/.*//; x; s/\n//g; p; }' filename

Regular expression in PostgreSQL

I have the table mytable with the column images, where I store strings representing JSON objects.
This column contains invalid strings in some records: the text itself is stored fine, but the query fails when I try to cast it to JSON. Example:
`SELECT images::JSON->0 FROM mytable WHERE <any filter>`
If all elements of the JSON value are good, that query works successfully, but if some string has a " in the wrong place (to be specific, in this case in the title key), the error happens.
Good strings are like this:
[
  {
    "imagen": "http://www.myExample.com/asd1.png",
    "amazon": "http://amazonExample.com/asd1.jpg",
    "title": "A title 1."
  },
  {
    "imagen": "http://www.myExample.com/asd2.png",
    "amazon": "http://amazonExample.com/asd2.jpg",
    "title": "A title 2."
  },
  {
    "imagen": "http://www.myExample.com/asd3.png",
    "amazon": "http://amazonExample.com/asd3.jpg",
    "title": "A title 3."
  }
]
Bad strings are like this:
[
  {
    "imagen": "http://www.myExample.com/asd1.png",
    "amazon": "http://amazonExample.com/asd1.jpg",
    "title": "A "title" 1."
  },
  {
    "imagen": "http://www.myExample.com/asd2.png",
    "amazon": "http://amazonExample.com/asd2.jpg",
    "title": "A title 2."
  },
  {
    "imagen": "http://www.myExample.com/asd3.png",
    "amazon": "http://amazonExample.com/asd3.jpg",
    "title": "A title 3."
  }
]
Please pay attention to the difference in the title keys.
I need a regular expression to convert bad strings into good ones in PostgreSQL.
It would be very complicated, if possible at all, to do this in one regexp, but it is very easy to do in two or more.
For example, replace all the double quotes with \" and then replace {\" with {", \":\" with ":", \",\" with ",", and \"} with "}. The quotes that are not escaped are the ones that break the JSON.
Alternatively, replace "(?=[^}:]*"[\s]*}) (quotes in title only) with \" and then replace ":\" with ":". See details: https://regex101.com/r/pB6rD9/1
Crafting a replace that would be able to do this in one go requires lookbehinds, and I suppose that PostgreSQL does not support them.
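If cleaning the data outside the database is an option, the same idea is easy to express with a replacement callback. A minimal Python sketch, assuming the stray quotes only ever appear inside the title values and that each title sits on its own line:

import re

def fix_title_quotes(text):
    # The greedy .* grabs everything up to the last quote on the line,
    # so any stray quotes inside the title end up in group 1 and get
    # re-escaped.
    def escape(m):
        return '"title": "{}"'.format(m.group(1).replace('"', r'\"'))
    return re.sub(r'"title":\s*"(.*)"', escape, text)

bad = '"title": "A "title" 1."'
print(fix_title_quotes(bad))   # "title": "A \"title\" 1."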

RegEx:Replace: Dequote integers

So, I have a file with a large JSON array of objects, and unfortunately, every field is wrapped in double quotes. Two fields in particular (Latitude and Longitude) need to have the quotes removed.
I just want to use RegEx within an editors find/replace feature to remove the quotes...but I am struggling to come up with the RegEx.
This is very specific, I am just hoping there is a RegEx guru out there that could point me in the right direction on how to free the 37 and the -122 below from their quoted prisons.
{
  "ClubId": "TestWith01",
  "ClubName": "TestWith01",
  "_DistrictNumber": "K05",
  "MeetingDay1": "2nd & 4th MO",
  "MeetingTime1": "6:30 PM",
  "MeetingDay2": "",
  "URL": "http://www.someurl.com",
  "Latitude": "37",
  "Longitude": "-122",
  "MeetingAddress": {
    "Address1": "Sample With Quotes",
    "Address2": "",
    "Address3": "",
    "City": "Treasure Island",
    "State": "FL",
    "PostalCode": "33706",
    "Country": "United States"
  }
},
result = subject.replace(/"(-?\d+)"/g, "$1");
This should dequote any value that is an optional minus followed by one or more digits. You did not specify your language, so I guessed JavaScript.
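Note that this will also strip the quotes from any other purely numeric field (PostalCode, for example). If you only want to touch the two coordinate fields, the pattern can key on the field names; a Python sketch of that narrower replace:

import re

text = '"Latitude": "37",\n"Longitude": "-122",\n"PostalCode": "33706",'

# Only dequote values that belong to Latitude/Longitude; other numeric
# fields such as PostalCode keep their quotes.
fixed = re.sub(r'("(?:Latitude|Longitude)":\s*)"(-?\d+(?:\.\d+)?)"', r'\1\2', text)
print(fixed)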

Split line with perl

I have a multiline credits string that is missing a few commas:
rendező: Joe Carnahan forgatókönyvíró: Brian Bloom, Michael Brandt, Skip Woods zeneszerző: Alan Silvestri operatőr: Mauro Fiore producer: Stephen J. Cannell, Jules Daly, Ridley Scott szereplő(k): Liam Neeson (John 'Hannibal' Smith ezredes) Bradley Cooper (Templeton 'Szépfiú' Peck hadnagy) szinkronhang: Gáti Oszkár (John 'Hannibal' (Smith magyar hangja)) Rajkai Zoltán (Templeton 'Faceman' Peck magyar hangja)
This makes it impossible to simply split the line by commas:
@credits = split /, */, $line;
I want to split after each comma, and where there is no comma between credits, split between the credits anyway, for example:
rendező: Joe Carnahan
forgatókönyvíró: Brian Bloom
Michael Brandt
Skip Woods
zeneszerző: Alan Silvestri
operatőr: Mauro Fiore
producer: Stephen J. Cannell
Jules Daly
Ridley Scott
szereplő(k): Liam Neeson (John 'Hannibal' Smith ezredes)
Bradley Cooper (Templeton 'Szépfiú' Peck hadnagy)
szinkronhang: Gáti Oszkár (John 'Hannibal' (Smith magyar hangja))
Rajkai Zoltán (Templeton 'Faceman' Peck magyar hangja)
Thanks
So you can split by a comma-space in most cases, but otherwise by a space character preceded by a right parenthesis. This would be:
/, |(?<=\)) /
Or, perhaps (?) more clearly:
/,[[:space:]]|(?<=\))[[:space:]]/
The pipe character makes for a disjunctive match between what's on either side of it. But there's also parsing out the roles, and the entire string is full of non-ASCII characters.
Script:
use strict;
use warnings;
use utf8;
use Data::Dump 'dump';

my $big_string = q/rendező: ... hangja)/;

my @credits = map {
    my ($title, $names) = /([[:alpha:]()]+): (.+)/;
    my @names = split /,[[:space:]]|(?<=\))[[:space:]]/, $names;
    my $credit = { $title => \@names };
} split / (?=[[:alpha:]()]+:)/, $big_string;

binmode STDOUT, ':utf8';
print dump \@credits;
Output:
[
{ rendező => ["Joe Carnahan"] },
{
forgatókönyvíró => ["Brian Bloom", "Michael Brandt", "Skip Woods"],
},
{ zeneszerző => ["Alan Silvestri"] },
{ operatőr => ["Mauro Fiore"] },
{
producer => ["Stephen J. Cannell", "Jules Daly", "Ridley Scott"],
},
{
"szerepl\x{151}(k)" => [
"Liam Neeson (John 'Hannibal' Smith ezredes)",
"Bradley Cooper (Templeton 'Sz\xE9pfi\xFA' Peck hadnagy)",
],
},
{
szinkronhang => [
"G\xE1ti Oszk\xE1r (John 'Hannibal' (Smith magyar hangja))",
"Rajkai Zolt\xE1n (Templeton 'Faceman' Peck magyar hangja)",
],
},
]
Notes:
An array of hashrefs is used to preserve the order of the list.
The utf8 pragma will make the [:alpha:] construct utf8-aware.
Given Perl >= v5.10, the utf8::all pragma can replace utf8 and also remove the need to call binmode prior to output.
Lookarounds ((?=), (?<=), etc.) can be tricky; see perlre and this guide for good information on them.
I think you can try to set up a regular expression.
You can substitute any 'word:' with '\nword:', and in the same way you can substitute ',' with ',\n'.
To get a look at regular expressions, check this page:
http://www.troubleshooters.com/codecorn/littperl/perlreg.htm
The two rules should be something similar to:
$str =~ s/ (?=[[:alpha:]()]+:)/\n/g;   # newline before each 'word:'
$str =~ s/,\s*/,\n/g;                  # newline after each comma
it's just a guess... not really aware of Perl syntax