Regex always confuses me, and even super simple syntax is hard to Google. I am using regex strictly in find and replace; I don't need it in any programming language, I just want to save time editing a lot of data :)
I have a huge JSON file. These are only two records, but they are good enough for this example.
[
{
name: 'John',
team: 'Wolves',
team_id: 1,
number: 24
},
{
name: 'Kevin',
team: 'Rockets',
team_id: 1,
number: 6
}
]
Inside my JSON I need to put double quotes around pretty much every key and value; quoting the numbers is optional.
I need to get rid of the single quotes, then put double quotes around everything.
The final result should look like this:
[
{
"name": "John",
"team": "Wolves",
"team_id": "1",
"number": "24"
},
{
"name": "Kevin",
"team": "Rockets",
"team_id": "1",
"number": "6"
}
]
Again, quoting the numbers is optional, but it would be nice to know how to double-quote them.
Extra: I vaguely remember doing something like this a while back, but I can't find where I got that information. This would make a nice reference. Does anyone have good links on the basics of regex? I just want to save time when working with a lot of data. Thanks.
Try something along the lines of this:
Find: (\w+):\s*('?)([^']+?)\2(?=[\n,])
Replace with: "\1": "\3"
Demo: http://regex101.com/r/pX9xX6
Edit:
Just tested in Sublime, seems to work fine.
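Outside an editor, the same find/replace can be checked with Python's re.sub. A minimal sketch, assuming the two sample records above and that every value is followed by a comma or a line break (the lookahead (?=[\n,]) requires it); json.loads at the end confirms the result is valid JSON:

```python
import json
import re

# The two sample records from the question.
source = """[
{
name: 'John',
team: 'Wolves',
team_id: 1,
number: 24
},
{
name: 'Kevin',
team: 'Rockets',
team_id: 1,
number: 6
}
]
"""

# Group 1: the key. Group 2: an optional opening single quote.
# Group 3: the value (lazy), followed by the same quote again (\2).
# The lookahead only accepts a value that ends at a comma or a newline.
pattern = r"(\w+):\s*('?)([^']+?)\2(?=[\n,])"

# Rewrite each pair as "key": "value"; the original comma is left in place.
converted = re.sub(pattern, r'"\1": "\3"', source)
print(converted)

records = json.loads(converted)
print(records[0])  # {'name': 'John', 'team': 'Wolves', 'team_id': '1', 'number': '24'}
```

Because group 2 is optional, numbers get quoted too; drop the numeric lines from the search if you want to keep them as numbers.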
Well, the exact syntax depends on the tool. If you were using vim, for instance:
:%s/'\([^']*\)'/"\1"/g
and
:%s/^\([ ^I]*\)\([^ ^I]*\):/\1"\2":/
would probably do the trick, although you'd want to check manually for any quote characters inside the quoted values.
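For comparison, here is a rough Python translation of those two Vim commands, applied to the first sample record. It is a sketch, not a drop-in tool: the character classes add \n explicitly because Vim's [^...] never matches a line break while Python's does. Numbers stay unquoted, which is still valid JSON:

```python
import json
import re

source = """[
{
name: 'John',
team: 'Wolves',
team_id: 1,
number: 24
}
]
"""

# Like :%s/'\([^']*\)'/"\1"/g : turn every 'value' into "value".
# \n is excluded so a match can never jump to the next line (as in Vim).
step1 = re.sub(r"'([^'\n]*)'", r'"\1"', source)

# Like :%s/^\([ ^I]*\)\([^ ^I]*\):/\1"\2":/ : quote the key at the start of
# each line, keeping any leading spaces or tabs. re.MULTILINE makes ^ match
# at every line start, as it does in Vim.
step2 = re.sub(r"^([ \t]*)([^ \t\n]*):", r'\1"\2":', step1, flags=re.MULTILINE)

print(step2)
print(json.loads(step2))
```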
Related
I'm trying to build a regex query for a database and it's got me stumped. If I have a string with a varying number of elements in an ordered structure, how can I find out whether it matches another string exactly OR some exact substring of it read from the left (a prefix)?
For example, I have these strings:
1. Canada.Ontario.Toronto.Downtown
2. Canada.Ontario
3. Canada.EasternCanada.Ontario.Toronto.Downtown
4. England.London
5. France.SouthFrance.Nice
They are structured from the most general location to the most specific, left to right. However, the number of elements varies: some specify country.region.state and so on, and some just country.town. I need to match not only the words but also their order.
So if I want to match "Canada.Ontario.Toronto.Downtown", I want to get both #1 and #2 and nothing else. How would I do that? Basically, I want to run through the string and fail as soon as a different character comes up, while still allowing a substring that ends "early", like #2, to match.
I've tried making groups and using "?", like (canada)?.?(Ontario)?.? etc., but it doesn't work in all situations because it can also match nothing.
Edit as requested:
MongoDB database collection:
[
{
"_id": "doc1",
"context": "Canada.Ontario.Toronto.Downtown",
"useful_data": "Some Data"
},
{
"_id": "doc2",
"context": "Canada.Ontario",
"useful_data": "Some Data"
},
{
"_id": "doc3",
"context": "Canada.EasternCanada.Ontario.Toronto.Downtown",
"useful_data": "Some Data"
},
{
"_id": "doc4",
"context": "England.London",
"useful_data": "Some Data"
},
{
"_id": "doc5",
"context": "France.SouthFrance.Nice",
"useful_data": "Some Data"
},
{
"_id": "doc6",
"context": "",
"useful_data": "Some Data"
}
]
The user provides "Canada", "Ontario", "Toronto", and "Downtown" in that order, and I need to use those values to query doc1 and doc2 and no others. So I need a regex pattern to put in here: collection.find({"context": {$regex: <pattern here>}}). If it's not possible, I'll have to restructure the data and use different methods to find those docs.
At each dot, start a nested optional group for the next term, and add start and end anchors:
^Canada(\.Ontario(\.Toronto(\.Downtown)?)?)?$
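Since the user supplies the terms in order, the nested pattern can be built in code. A minimal Python sketch; the helper name prefix_pattern is hypothetical, re.escape protects any term containing regex characters such as a dot, and the contexts dict copies the collection above:

```python
import re


def prefix_pattern(terms):
    """Build ^T1(\\.T2(\\.T3(...)?)?)?$ from terms ordered general to specific."""
    nested = ""
    # Start from the most specific term so each group wraps the next one.
    for term in reversed(terms[1:]):
        nested = r"(\." + re.escape(term) + nested + ")?"
    return "^" + re.escape(terms[0]) + nested + "$"


# The "context" values from the collection in the question.
contexts = {
    "doc1": "Canada.Ontario.Toronto.Downtown",
    "doc2": "Canada.Ontario",
    "doc3": "Canada.EasternCanada.Ontario.Toronto.Downtown",
    "doc4": "England.London",
    "doc5": "France.SouthFrance.Nice",
    "doc6": "",
}

pattern = prefix_pattern(["Canada", "Ontario", "Toronto", "Downtown"])
print(pattern)  # ^Canada(\.Ontario(\.Toronto(\.Downtown)?)?)?$

matches = [doc_id for doc_id, context in contexts.items() if re.search(pattern, context)]
print(matches)  # ['doc1', 'doc2']
```

With pymongo, the resulting string would go into collection.find({"context": {"$regex": pattern}}); MongoDB uses PCRE-style regexes, which read this pattern the same way.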
I have created a programming language, KAGSA, and I need to create a syntax highlighter for it. I started with a VS Code highlighter and wrote everything, but I have a problem with the regex for multi-line strings and multi-line comments. This is the code:
Comments:
"comments": {
"patterns": [{
"name": "comment.line.shebang.kagsa",
"match": "//..*|/\\*(.*?|\n)*\\*/|//|/\\**\\*"
}]
},
The problem is with the /*Comment*/ comment.
Strings:
"strings": {
"name": "string.quoted.double.kagsa",
"patterns": [{
"name": "string.quoted.double.kagsa",
"match": "'(.*?)'|\"(.*?)\"|``(.*?|\n)*``"
}]
},
My problem is with ``String``.
And the colors I get:
https://i.stack.imgur.com/NPbS0.png
You have this issue because a match rule is applied to a single line at a time, so it can't match string literals or comments that span several lines.
I found a similar question. As Gama11 said in his answer:
Try to use a begin/end pattern instead of a simple match.
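A minimal sketch of that idea for the two rules in the question, written as Python dicts and serialized with json.dumps so the backslash escaping in the grammar file comes out right. It assumes "comments" and "strings" are entries in the grammar's repository, as the question's snippets suggest; the scope names comment.block.kagsa, comment.line.double-slash.kagsa and string.quoted.other.kagsa are assumptions following common TextMate naming:

```python
import json
import re

# A begin/end rule opens where "begin" matches and stays open over the
# following lines until "end" matches, unlike a single-line "match".
block_comment = {
    "name": "comment.block.kagsa",
    "begin": r"/\*",
    "end": r"\*/",
}
line_comment = {
    "name": "comment.line.double-slash.kagsa",
    "match": r"//.*",
}
backtick_string = {
    "name": "string.quoted.other.kagsa",
    "begin": "``",
    "end": "``",
}
quoted_string = {
    "name": "string.quoted.double.kagsa",
    "match": "'(.*?)'|\"(.*?)\"",  # single-line strings can keep "match"
}

repository = {
    "comments": {"patterns": [block_comment, line_comment]},
    "strings": {"patterns": [backtick_string, quoted_string]},
}

# Sanity check that every pattern compiles; these simple patterns mean the
# same in Python's re as in the Oniguruma engine VS Code uses.
for rule in (block_comment, line_comment, backtick_string, quoted_string):
    for key in ("match", "begin", "end"):
        if key in rule:
            re.compile(rule[key])

# json.dumps doubles the backslashes ("/\\*"), as the .json grammar needs.
print(json.dumps(repository, indent=2))
```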
I just started working with Elasticsearch. By "started working" I mean I have to query an already running Elasticsearch database. Is there good documentation of the regex syntax it follows? I know about the page on the official site, but it's not very helpful.
The more specific problem is that I want to query for lines of the sort:
10:02:37:623421|0098-TSOT {TRANSITION} {ID} {1619245525} {securityID} {} {fromStatus} {NOT_PRESENT} {toStatus} {WAITING}
or
01:01:36:832516|0058-CT {ADD} {0} {3137TTDR7} {23} {COM} {New} {0} {0} {52} {1}
and more with a similar structure. I don't want a generalized regex. If possible, could someone give me a regular expression for each of these that works in Elasticsearch?
I noticed that the query also matches when the regexp matches only a substring; this is what I ran:
query = {"query":
{"regexp":
{
"message": "[0-9]{2}"
}
},
"sort":
[
{"#timestamp":"asc"}
]
}
But it won't match anything if I use:
query = {"query":
{"regexp":
{
"message": "[0-9]{2}:.*"
}
},
"sort":
[
{"#timestamp":"asc"}
]
}
I want to write regexes that are more specific, and different for each of the two examples given near the top.
It turns out my message is stored in tokenized form instead of raw form, and : is one of the delimiters of Elasticsearch's default tokenizer. As a result, I can't use a regexp query on the whole message, because the regexp is matched against each token individually.
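One option is to run the regexp against an untokenized copy of the field. Below is a minimal Python sketch, assuming the index has a keyword sub-field named message.keyword (the default dynamic mapping adds one to string fields, but with ignore_above 256, so longer values are not indexed there; check your mapping). Elasticsearch's regexp query uses Lucene regular expressions, which must match the whole term and do not support ^ or $, so the patterns have no anchors, escape the reserved characters {, } and |, and spell out [0-9] rather than relying on shorthands like \d. re.fullmatch only imitates the whole-term matching for a local check:

```python
import re

# No anchors: a Lucene regexp always has to match the entire term.
TIME = r"[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{6}"
FIELDS = r"( \{[A-Za-z0-9_]*\})*"  # zero or more " {value}" blocks; value may be empty

TSOT_PATTERN = TIME + r"\|[0-9]{4}-TSOT" + FIELDS
CT_PATTERN = TIME + r"\|[0-9]{4}-CT" + FIELDS


def regexp_query(pattern, field="message.keyword"):
    # "message.keyword" is an assumed untokenized sub-field of "message".
    return {"query": {"regexp": {field: pattern}}}


tsot_line = ("10:02:37:623421|0098-TSOT {TRANSITION} {ID} {1619245525} {securityID} "
             "{} {fromStatus} {NOT_PRESENT} {toStatus} {WAITING}")
ct_line = "01:01:36:832516|0058-CT {ADD} {0} {3137TTDR7} {23} {COM} {New} {0} {0} {52} {1}"

# Local check only: each pattern should accept its own line and reject the other.
for name, pattern in (("TSOT", TSOT_PATTERN), ("CT", CT_PATTERN)):
    print(name, bool(re.fullmatch(pattern, tsot_line)), bool(re.fullmatch(pattern, ct_line)))

print(regexp_query(TSOT_PATTERN))
```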
Here's an example of the things I need to match in a request that I have stored as text:
[{"id":"896","name":"TinyAuras","author_id":"654","author":"Kurisu</span></strong></span></a>","githubFolder":"https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj","count":9,"countByChampion":{"":9,"total":9},"description":"(Beta) Aura/Buff/Debuff Tracker","udate":"1451971516","createdDays":375,"image":"https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg","strudate":"2016-07-22 19:40","champions":null,"forum_link":"165574","assembly_compiles":true,"voted":false,"voted_champions":[]},
I want to select that link up to the last slash (basically the GitHub folder, not the actual .csproj file).
I have a file full of thousands of those and I'm trying to extract all of those links and put them in a text file.
Here is what I have so far, as a Perl regex:
(?<=githubFolder":").*(?=\/.+\.csproj") but that ends up selecting much more than I need, running past the first match. Any suggestions?
In other words, I want everything right before the .csproj file name.
So in my example I want to extract:
https://github.com/xKurisu/TinyAuras/blob/master/
This regex:
"githubFolder":"([^"]*/)[^"/]*"
selects:
https://github.com/xKurisu/TinyAuras/blob/master/
in your example.
However, it would likely be better to use an actual JSON parser, as Jim D.'s answer suggests, so you won't have to worry about spacing and special characters.
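To run that regex over a whole file, here is a minimal Python sketch using re.findall, which returns group 1 for every match. The sample record is trimmed from the question's example, and the input/output paths passed to extract_to_file are hypothetical:

```python
import re

# Group 1 is everything up to and including the last "/" in the value;
# [^"]* cannot run past the closing quote, so each match stays in one value.
GITHUB_FOLDER = re.compile(r'"githubFolder":"([^"]*/)[^"/]*"')


def extract_folders(text):
    """Return the folder part of every githubFolder value in text."""
    return GITHUB_FOLDER.findall(text)


def extract_to_file(in_path, out_path):
    """Write one folder per line to out_path (both paths are hypothetical)."""
    with open(in_path, encoding="utf-8") as src:
        folders = extract_folders(src.read())
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(folders) + "\n")


# A record trimmed from the question's example.
sample = ('[{"id":"896","name":"TinyAuras",'
          '"githubFolder":"https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj",'
          '"count":9}]')
print(extract_folders(sample))  # ['https://github.com/xKurisu/TinyAuras/blob/master/']
```

Like the regex itself, this assumes no space after the colon and no escaped quotes inside the value.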
While the accepted answer will likely get the job done here, I just want to point out that it is hard to get 100% accurate results from old-school Linux tools when working with JSON, and for that reason it is best practice to use an actual JSON parser to extract your content.
One simple reason is that strings are JSON-encoded, so you need to decode them somehow to ensure you get the correct result. Another is that JSON is not a regular language; it is context-free, so in general you need something more powerful than regular expressions.
One I am familiar with is jq, and the array of JSON objects can be parsed as the OP desires like this:
$ jq -r ' .[] | .githubFolder ' foo
https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj
https://github.com/xKurisu/"GiantAuras"/blob/master/GiantAuras.csproj
$
where file foo is
[
{
"id": "896",
"name": "TinyAuras",
"author_id": "654",
"author": "Kurisu</span></strong></span></a>",
"githubFolder": "https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj",
"count": 9,
"countByChampion": {
"": 9,
"total": 9
},
"description": "(Beta) Aura/Buff/Debuff Tracker",
"udate": "1451971516",
"createdDays": 375,
"image": "https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg",
"strudate": "2016-07-22 19:40",
"champions": null,
"forum_link": "165574",
"assembly_compiles": true,
"voted": false,
"voted_champions": []
},
{
"id": "888",
"name": "\"GiantAuras\"",
"author_id": "666",
"author": "Astaire</span></strong></span></a>",
"githubFolder": "https://github.com/xKurisu/\"GiantAuras\"/blob/master/GiantAuras.csproj",
"count": 90,
"countByChampion": {
"": 777,
"total": 42
},
"description": "(Stable) Aura/Buff/Debuff Tracker",
"udate": "1451971517",
"createdDays": 399,
"image": "https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg",
"strudate": "2016-07-22 19:40",
"champions": null,
"forum_link": "165574",
"assembly_compiles": true,
"voted": false,
"voted_champions": []
}
]
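The jq command above prints the full .csproj path, while the question asks for the folder only. As a sketch of the same parser-based approach in Python, using the standard json module instead of jq and cutting each value after its last "/" (the helper name github_folders is hypothetical):

```python
import json


def github_folders(records):
    """Return each githubFolder value cut after its last "/"."""
    folders = []
    for record in records:
        url = record.get("githubFolder")
        if url:
            # rsplit keeps everything before the last "/"; add the slash back.
            folders.append(url.rsplit("/", 1)[0] + "/")
    return folders


# Two records shaped like the file foo above (only the relevant keys kept).
text = r'''[
  {"name": "TinyAuras",
   "githubFolder": "https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj"},
  {"name": "\"GiantAuras\"",
   "githubFolder": "https://github.com/xKurisu/\"GiantAuras\"/blob/master/GiantAuras.csproj"}
]'''

for folder in github_folders(json.loads(text)):
    print(folder)
```

The parser decodes \"GiantAuras\" to "GiantAuras" before the split, which is exactly the case where a [^"]* based regex stops early.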
Here is the regexp:
("githubFolder":".*)\/(.*\.csproj)
1. "githubFolder":"https://github.com/removed/removed/blob/master/stophere/this.csproj
1.1. Group: "githubFolder":"https://github.com/removed/removed/blob/master/stophere
1.2. Group: this.csproj
You can test it here: http://www.regexe.com
This pattern: (http|https):\/\/github\.com\/[\w\/]+\/ selects all directories that start with github.com in your example.
Try this RegEx:
githubFolder":"([a-zA-Z:\/.]+\/)
It captures the link up to the last slash.
I'm currently in a pickle modifying a document. Let's say, for example, I have this chunk of text:
"id": "EFM",
"type": "Casual",
"hasBeenAssigned": false,
"hasRandomAssigned": false
},
I currently have roughly 73-80 occurrences of:
"id" : "somethingdifferent",
Using a regular expression in Notepad++, how can I select the entire string:
"id" : "",
but only change the contents between the second set of quotes?
Edit
An oversight made me leave this information out:
"equipedOutfit": {
"id": "MkIV",
"type": "Outfit",
"hasBeenAssigned": false,
"hasRandonAssigned": false
},
"equipedWeapon": {
"id": "EFM",
"type": "Casual",
"hasBeenAssigned": false,
"hasRandonAssigned": false
},
The text I'm looking to select is:
"id" : "EFM",
You can use a regex like this:
("id": ").*?"
With a replacement string:
$1whatever"
^^^^^^^^--- replace 'whatever' with whatever you want
Update: since you updated your question, I'm updating the answer. If you only want to replace "id": "EFM", just search for that exact text and use the replacement string you want.
"id":\s*"\K[^"]*
You can use \K here and replace the match with whatever you want. See demo:
https://regex101.com/r/sS2dM8/29
EDIT:
If you want only EFM then use
"id"\s*:\s*"\KEFM(?=")
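Python's built-in re module does not support \K, so a script has to capture the prefix and write it back instead. A minimal sketch on the equipedOutfit/equipedWeapon chunk from the question; NEW_ID is a placeholder for whatever replacement you want, and \g<1> is Python's spelling of Notepad++'s $1:

```python
import re

# The equipedOutfit / equipedWeapon chunk from the question.
text = '''"equipedOutfit": {
"id": "MkIV",
"type": "Outfit",
"hasBeenAssigned": false,
"hasRandonAssigned": false
},
"equipedWeapon": {
"id": "EFM",
"type": "Casual",
"hasBeenAssigned": false,
"hasRandonAssigned": false
},'''

# No \K in Python's re: capture the prefix in group 1 and put it back.
# Every id value: [^"]* stops at the closing quote, which is left untouched.
all_ids = re.sub(r'("id"\s*:\s*")[^"]*', r"\g<1>NEW_ID", text)

# Only the EFM id, mirroring "id"\s*:\s*"\KEFM(?=")
efm_only = re.sub(r'("id"\s*:\s*")EFM(?=")', r"\g<1>NEW_ID", text)

print(all_ids)
print(efm_only)
```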
Find what: ("id"\s?:\s?").*(")
Replace with: \1somethingdifferent\2
Options:
Regular expression, Wrap around