I am facing a situation that drives me nuts.
I am setting up an update server which uses a json file.
Don't ask why or how, it sucks and is my only possibility to achieve it.
I have been trying and researching for HOURS (many) because I went ballistic and wanted to crack this on my own. But I have to realize I got stuck and need help.
So sorry for this chunk but I think it is somewhat important to see...
The file is a one liner and repeating the following sequence with changing values (of course).
"plugin_name_foo_bar": {"buildDate": "bla", "dependencies": [{"name": "bla", "optional": true, "version": "1.00"}], "developers": [{"developerId": "bla", "email": "bla#gmail.com", "name": "Bla bla2nd"}], "excerpt": "some text {excerpt} !bla.png|thumbnail,border=1! ", "gav": "bla", "labels": ["report", "scm-related"], "name": "plugin_name_foo_bar", "previousTimestamp": "bla", "previousVersion": "1.0", "releaseTimestamp": "bla", "requiredCore": "1", "scm": "github.com", "sha1": "ynnBM2jWo25ZLDdP3ybBOnV/Pio=", "title": "bla", "url": "http://bla.org", "version": "1.0", "wiki": "https://bla.org"}, "Exclusion": {"buildDate": "bla", "dependencies": [],
and the next plugin block is glued straight afterwards.
What I want to do now is search for "plugin_name_foo_bar": {" since this is the unique identifier that starts a new plugin description block.
I want to replace the first sha1 value occurring after it. That's where I keep failing. I always grab the first, last or any occurrence in the entire file and not the one in that block :(
"title" is the unique identifier after the sha1 value.
So I tried to make the .* less greedy but it ain't working out.
My last attempt was heading towards:
sed -i 's/("name": "plugin_name_foo_bar.*sha1": ")([a-zA-Z0-9!##\$%^&*()\[\]]*)(", "title"\)/\1blablabla\2/1' default.json
to find the sha1 value of that plugin but still no joy. I hope someone knows - preferably a simpler approach - before I now continue with trial and error until I have to puke and freakout.
I am working with SED on Windows, so Unix approach might help me to figure out how to achieve this in batch but please make it as one-liner if possible. Scripts are a real pain to convert.
And I just need SED and no other solution with other tools like AWK. That is absolutely out of discussion.
Any help is appreciated :)
Cheers
Jan
Don't use regex (sed) to parse JSON; use a proper JSON parser instead, or JavaScript directly, as I do here:
Using JavaScript and Node.js in a script:
The file /tmp/file.json is:
{
"plugin_name_foo_bar" : {
"excerpt" : "some text {excerpt} !bla.png|thumbnail,border=1! ",
"dependencies" : [
{
"name" : "bla",
"version" : "1.00",
"optional" : true
}
],
"title" : "bla",
"previousTimestamp" : "bla",
"releaseTimestamp" : "bla",
"sha1" : "ynnBM2jWo25ZLDdP3ybBOnV/Pio=",
"labels" : [
"report",
"scm-related"
],
"buildDate" : "bla",
"version" : "1.0",
"previousVersion" : "1.0",
"name" : "plugin_name_foo_bar",
"scm" : "github.com",
"url" : "http://bla.org",
"gav" : "bla",
"developers" : [
{
"email" : "bla#gmail.com",
"developerId" : "bla",
"name" : "Bla bla2nd"
}
],
"wiki" : "https://bla.org",
"requiredCore" : "1"
},
"Exclusion" : {
"dependencies" : [],
"buildDate" : "bla"
}
}
The script script.js :
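// require() reads and parses a .json file in one step
var js = require('/tmp/file.json')
js.plugin_name_foo_bar.sha1 = "xxx"
// Serialize explicitly: console.log(js) alone prints Node's object notation, not JSON
console.log(JSON.stringify(js))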
Usage :
nodejs script.js
As sputnick points out parsing is a little beyond what sed's meant for. Still, sed's Turing-complete and bludgeoning it into doing what you want can satisfy that {sad,masoch}istic urge so many of us feel from time to time.
This one's even easy.
sed '
s/"sha1": /\n/g
s/\("name": "plugin_name_foo_bar"[^\n]*\n"\)[^"]*/\1thenewsha/
s/\n/"sha1": /g
'
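To see why the three substitutions above hit only the sha1 of the wanted block, here is a rough Python sketch of the same idea. It is an illustration only: the input line is a shortened, made-up version of the one-liner from the question, and sed itself is still what runs the real job.

```python
import re

# Shortened, made-up stand-in for the one-line file from the question.
line = (
    '"plugin_name_foo_bar": {"name": "plugin_name_foo_bar", '
    '"sha1": "old1=", "title": "a"}, '
    '"Exclusion": {"name": "Exclusion", "sha1": "old2=", "title": "b"}'
)

# Step 1: turn every '"sha1": ' into a newline, a character that cannot
# otherwise occur inside a one-line file.
step1 = line.replace('"sha1": ', "\n")

# Step 2: [^\n]* cannot cross a marker, so after the plugin name it stops
# at the first marker; the value right behind that marker is replaced.
step2 = re.sub(
    r'("name": "plugin_name_foo_bar"[^\n]*\n")[^"]*',
    r"\1thenewsha",
    step1,
    count=1,
)

# Step 3: put the original key text back in place of every marker.
result = step2.replace("\n", '"sha1": ')
print(result)
```

The negated class plays the role a lazy quantifier would play in other regex flavours, which is why the trick works in sed.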
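For the Windows command line, with escaped quotes, replacing in place and using extended regular expressions (note that sed has no lazy quantifiers, so the .+? below still matches greedily and, on a one-line file, can run on to the last sha1 after the plugin name):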
sed -i -r "s/(plugin_name_foo_bar.+?sha1\": \")[^\"]+\"/\1abcdefghijkl\"/" default.json
sed -r "s/(plugin_name_foo_bar[^!]+sha1.: .)[^\"]+/\1abcdefghijkl/" file
I have created a programming language, KAGSA, and I need to write a syntax highlighter for it. I started with a VS Code highlighter and everything else works, but I have a problem with the regexes for strings and comments that span more than one line. This is the code:
Comments:
"comments": {
"patterns": [{
"name": "comment.line.shebang.kagsa",
"match": "//..*|/\\*(.*?|\n)*\\*/|//|/\\**\\*"
}]
},
The problem is with the /*Comment*/ comment.
and string code :
"strings": {
"name": "string.quoted.double.kagsa",
"patterns": [{
"name": "string.quoted.double.kagsa",
"match": "'(.*?)'|\"(.*?)\"|``(.*?|\n)*``"
}]
},
My problem is with ``String``
and this is the coloring I get:
the output color: https://i.stack.imgur.com/NPbS0.png
You have this issue because match doesn't work for multiline string literals.
I found a similar problem.
As said by Gama11 in his answer:
Try to use a begin / end pattern instead of a simple match.
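As a rough sketch of what begin / end rules could look like for the two multi-line constructs in the question: Python is used here only to build and print the JSON fragment, and the scope names (comment.block.kagsa, comment.line.double-slash.kagsa, string.quoted.other.kagsa) are illustrative choices, not taken from the question.

```python
import json

# Hypothetical rules. "begin" and "end" are matched separately, so the
# scope in "name" covers everything between them, line breaks included.
repository = {
    "comments": {
        "patterns": [
            {"name": "comment.block.kagsa", "begin": r"/\*", "end": r"\*/"},
            {"name": "comment.line.double-slash.kagsa", "match": "//.*$"},
        ]
    },
    "strings": {
        "patterns": [
            {"name": "string.quoted.other.kagsa", "begin": "``", "end": "``"},
        ]
    },
}

# json.dumps doubles the backslashes, which is the form the grammar file needs.
grammar_fragment = json.dumps(repository, indent=2)
print(grammar_fragment)
```

The single-line quoted strings from the question can stay as plain match rules, since they never span lines.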
I'd like to extract the name content (David) and the url content (www.stackoverflow.com) from the following json file.
I have several questions:
How do I extract a string that starts with " and ends with "?
How do I force the regular expression to start after an expression that is not itself part of the match?
{
"id" : "1234",
"name" : "David",
"request" : {
"url" : "www.stackoverflow.com",
"method" : "POST",
"bodyPatterns" : [ {
"matchesXPath" : "example"
}, {
"matchesXPath" : "example/123"
}, {
"matchesXPath" : {
"expression" : "example/123/123/text()",
"equalTo" : "bbbb"
}
} ]
}
}
Note: a proper parser is the recommended way to do this in the long term. For a simple, occasional task, a regex might do.
This regex does the job:
"name"\s*:\s*"(?'name'[^"]+)".*"url"\s*:\s*"(?'url'[^"]+)"
Test here. Groups name and url contain your data.
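For anyone trying the pattern outside the linked tester, here is a minimal Python sketch. Two things differ from the regex as written above and are my adaptations: Python spells named groups (?P<name>...) instead of (?'name'...), and the middle .* is made lazy and given re.DOTALL so it can cross the line breaks in the file. The input is a trimmed copy of the JSON from the question.

```python
import re

# Trimmed copy of the JSON from the question.
text = '''{
  "id" : "1234",
  "name" : "David",
  "request" : {
    "url" : "www.stackoverflow.com",
    "method" : "POST"
  }
}'''

# re.DOTALL lets "." match newlines, so ".*?" can span the lines
# between the "name" key and the "url" key.
pattern = re.compile(
    r'"name"\s*:\s*"(?P<name>[^"]+)".*?"url"\s*:\s*"(?P<url>[^"]+)"',
    re.DOTALL,
)

match = pattern.search(text)
print(match.group("name"), match.group("url"))
```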
I do not recommend solving this with a regular expression. Such ad-hoc parsing solutions tend to be error-prone, overly complicated, hard to extend and turn on you when you least expect it.
Instead, I recommend using a proper json parser, depending on the language you use. For plain shell, jq is a good choice. With that, specifying the path to the property becomes trivial:
jq '.request.url' file.json
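If jq is not at hand, the same path lookup can be sketched with Python's standard json module. The document string below is a shortened copy of the file from the question, inlined so the example is self-contained.

```python
import json

# Shortened copy of the file from the question, inlined for the example.
document = '''{
  "id": "1234",
  "name": "David",
  "request": {
    "url": "www.stackoverflow.com",
    "method": "POST",
    "bodyPatterns": [{"matchesXPath": "example"}]
  }
}'''

data = json.loads(document)

# Once parsed, the values are plain dictionary lookups.
name = data["name"]
url = data["request"]["url"]
print(name, url)
```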
Here's an example of what I need to match, from a request that I have stored as text:
[{"id":"896","name":"TinyAuras","author_id":"654","author":"Kurisu</span></strong></span></a>","githubFolder":"https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj","count":9,"countByChampion":{"":9,"total":9},"description":"(Beta) Aura/Buff/Debuff Tracker","udate":"1451971516","createdDays":375,"image":"https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg","strudate":"2016-07-22 19:40","champions":null,"forum_link":"165574","assembly_compiles":true,"voted":false,"voted_champions":[]},
I want to select that link only up to the last folder (basically the GitHub folder, not the actual csproj).
I have a file full of thousands of those and I'm trying to extract all of those links and put them in a text file.
Here is what I have so far for perl regex:
(?<=githubFolder":").*(?=\/.+\.csproj") but that ends up selecting more than I need after the first match. Any suggestions?
The issue is that I want everything right before the .csproj file name.
So in my example I want to extract:
https://github.com/xKurisu/TinyAuras/blob/master/
This regex:
"githubFolder":"([^"]*/)[^"/]*"
selects:
https://github.com/xKurisu/TinyAuras/blob/master/
in your example.
However, it would likely be better to use an actual json parser as Jim D.'s answer suggests so you won't have to worry about spacing and special characters.
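A small Python sketch of why the negated character class stops where the question's .* did not. The first entry is cut down from the question's example; the second entry (someone/Other) is made up purely to show the behaviour on more than one match.

```python
import re

# First entry is taken from the question; the second one is invented.
line = (
    '[{"id":"896","name":"TinyAuras",'
    '"githubFolder":"https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj",'
    '"count":9},'
    '{"id":"897","name":"Other",'
    '"githubFolder":"https://github.com/someone/Other/blob/master/Other.csproj",'
    '"count":1}]'
)

# [^"] can never leave the quoted value, so each match stays in one entry.
folders = re.findall(r'"githubFolder":"([^"]*/)[^"/]*"', line)

# The pattern from the question: the greedy .* runs on to the last .csproj.
greedy = re.search(r'(?<=githubFolder":").*(?=/.+\.csproj")', line)

print(folders)
print(greedy.group(0))
```

The greedy version swallows everything between the first URL and the last one, which matches the "selecting more than I need" symptom described in the question.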
While the accepted answer will likely get the job done here, I just want to point out that the old-school Linux tools are not easy to use to get 100% accurate results when working with JSON, and for that reason it is best practice to use an actual JSON parser to extract your content.
One simple reason is that strings are JSON-encoded, so you will need to decode them somehow to ensure you get the correct result. Another is that JSON is not a regular language; it is context-free. In general you will need something more powerful than regular expressions.
One I am familiar with is jq, and the array of JSON objects can be parsed as the OP desires like this:
$ jq -r ' .[] | .githubFolder ' foo
https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj
https://github.com/xKurisu/"GiantAuras"/blob/master/GiantAuras.csproj
$
where file foo is
[
{
"id": "896",
"name": "TinyAuras",
"author_id": "654",
"author": "Kurisu</span></strong></span></a>",
"githubFolder": "https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj",
"count": 9,
"countByChampion": {
"": 9,
"total": 9
},
"description": "(Beta) Aura/Buff/Debuff Tracker",
"udate": "1451971516",
"createdDays": 375,
"image": "https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg",
"strudate": "2016-07-22 19:40",
"champions": null,
"forum_link": "165574",
"assembly_compiles": true,
"voted": false,
"voted_champions": []
},
{
"id": "888",
"name": "\"GiantAuras\"",
"author_id": "666",
"author": "Astaire</span></strong></span></a>",
"githubFolder": "https://github.com/xKurisu/\"GiantAuras\"/blob/master/GiantAuras.csproj",
"count": 90,
"countByChampion": {
"": 777,
"total": 42
},
"description": "(Stable) Aura/Buff/Debuff Tracker",
"udate": "1451971517",
"createdDays": 399,
"image": "https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg",
"strudate": "2016-07-22 19:40",
"champions": null,
"forum_link": "165574",
"assembly_compiles": true,
"voted": false,
"voted_champions": []
}
]
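The jq command above prints the full path including the .csproj file. As a hedged sketch of getting just the folder part the question asks for, here is the same idea with Python's json module, using a cut-down copy of foo that keeps only the two relevant keys. Splitting on the last slash is my choice for the example, not something the answer prescribes.

```python
import json

# Cut-down copy of file foo: only the keys needed for the example.
foo = r'''[
  {"name": "TinyAuras",
   "githubFolder": "https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj"},
  {"name": "\"GiantAuras\"",
   "githubFolder": "https://github.com/xKurisu/\"GiantAuras\"/blob/master/GiantAuras.csproj"}
]'''

folders = []
for entry in json.loads(foo):
    url = entry["githubFolder"]  # already decoded: \" has become "
    # Keep everything up to and including the last slash.
    folders.append(url.rsplit("/", 1)[0] + "/")

for folder in folders:
    print(folder)
```

The second entry is the case a quote-based regex trips over: the parser decodes the escaped quotes before any string handling happens.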
Here is the regexp:
("githubFolder":".*)\/(.*\.csproj)
1. "githubFolder":"https://github.com/removed/removed/blob/master/stophere/this.csproj
1.1. Group: "githubFolder":"https://github.com/removed/removed/blob/master/stophere
1.2. Group: this.csproj
you can test it here: http://www.regexe.com
This pattern: (http|https):\/\/github\.com\/[\w\/]+\/ selects every directory path that starts with github.com in your example.
Try this RegEx:
githubFolder":"([a-zA-Z:\/.]+\/)
It will capture the link up to the last slash.
I've run into a pickle while modifying a document. Let's say, for example, I have this chunk of text:
"id": "EFM",
"type": "Casual",
"hasBeenAssigned": false,
"hasRandomAssigned": false
},
I currently have roughly 73 - 80 occurrences of:
"id" : "somethingdifferent",
Using a regular expression in Notepad++, how can I select the entire string:
"id" : "",
but only change the contents between the second set of quotes?
Edit
An oversight made me leave this information out:
"equipedOutfit": {
"id": "MkIV",
"type": "Outfit",
"hasBeenAssigned": false,
"hasRandonAssigned": false
},
"equipedWeapon": {
"id": "EFM",
"type": "Casual",
"hasBeenAssigned": false,
"hasRandonAssigned": false
},
The text I am looking to select is:
"id" : "EFM",
You can use a regex like this:
("id": ").*?"
With a replacement string:
$1whatever"
^^^^^^^^--- replace 'whatever' with whatever you want
Working demo
Update: as you updated your question, I'm updating the answer. If you want only to replace "id": "EFM" then you have just to look for that text only and put the replacement string you want.
"id":\s*"\K[^"]*
You can use \K here and replace with whatever you want. See demo.
https://regex101.com/r/sS2dM8/29
EDIT:
If you want only EFM then use
"id"\s*:\s*"\KEFM(?=")
Find what: ("id"\s?:\s?").*(")
Replace with: \1somethingdifferent\2
Options:
Regular expression, Wrap around
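The find/replace pairs in the answers above can be tried outside the editor with a short Python sketch. Python's re module has no \K, so a capture group plus \1 stands in for it here; that substitution is my adaptation. The sample text is shortened from the question.

```python
import re

# Shortened from the question's example.
text = '''"equipedOutfit": {
    "id": "MkIV",
    "type": "Outfit"
},
"equipedWeapon": {
    "id": "EFM",
    "type": "Casual"
},'''

# Replace the value of every "id" key; the lazy .*? stops at the
# closing quote of each value.
all_ids = re.sub(r'("id": ").*?"', r'\1whatever"', text)

# Replace only the id whose current value is exactly EFM.
only_efm = re.sub(r'("id"\s*:\s*")EFM(?=")', r"\1whatever", text)

print(all_ids)
print(only_efm)
```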
I would like to create a new syntax rule in Sublime that searches for a string pattern so that the pattern is highlighted. The pattern I am looking for is IPC or TST, so I was using the following Sublime syntax rule:
{ "name": "a3",
"scopeName": "source.a3",
"fileTypes": ["a3"],
"patterns": [
{ "name": "IPC",
"match": "\\b\\w(IPC|TST)\\w\\b "
}
],
"uuid": "c76f733d-879c-4c1d-a1a2-101dfaa11ed8"
}
But for some reason or another, it doesn't work at all.
Could someone point me out in the right direction?
Thanks in advance
After looking around and testing a lot, I found the issue. Apparently, apart from identifying the pattern, I also have to assign it a colour, and for that I have to use "captures". The rule is as follows:
{ "name": "IPC colour",
"match": "\\b(IPC|TST)\\b",
"captures": {
"1": { "name": "meta.preprocessor.diagnostic" }
}
},
Where "name": "meta.preprocessor.diagnostic" indicates the kind of colour assigned to the matched pattern.
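Besides the added captures, the match pattern itself also changed between the question and this answer, and that may matter too. A quick Python check of the two patterns follows; Python's re is not the engine Sublime uses, so treat this only as an approximation of how \b and \w behave, and the sample string is made up.

```python
import re

# Pattern from the question: needs one extra word character on each side
# of IPC/TST, plus a trailing space.
original = re.compile(r"\b\w(IPC|TST)\w\b ")

# Pattern from the answer: IPC or TST as a whole word.
fixed = re.compile(r"\b(IPC|TST)\b")

sample = "run IPC then TST, skip XIPCY "  # made-up test string

original_hits = [m.group(0) for m in original.finditer(sample)]
fixed_hits = [m.group(0) for m in fixed.finditer(sample)]

print(original_hits)
print(fixed_hits)
```

Under these assumptions the original pattern only fires on longer words that merely contain IPC or TST, while the corrected one matches the standalone keywords.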
regards!