Regex to find a character in only part of a string

I need help building a regex to find the [ characters in a text file.
Here is a sample of the text. It is JSON, but I can't treat it as JSON because of a limitation of the program I'm using.
{
"event":[
"ONIMBOTMESSAGEADD"
],
"data[BOT][123][BOT_ID]":[
"123"
]
}
I need a regex that matches the line "data[BOT][123][BOT_ID]":[ and finds every [ on it. The objective is to replace each [ with an underscore (and drop each ]) so I would end up with something like this:
{
"event":[
"ONIMBOTMESSAGEADD"
],
"data_BOT_123_BOT_ID":[
"123"
]
}
I can't just remove all special characters because that would destroy the JSON structure.
I found a way to select each of the lines that need to be corrected with the rule below, but I don't know how to apply a second rule to the result.
pattern = (("data\[[a-zA-Z]+]\[[0-9]+]\[([a-zA-Z]+_[a-zA-Z]+)\]":\[)|("data\[[A-Z]+]\[([A-Z]+(_|)[A-Z]+)\]":\[)|("data\[[A-Z]+]\[([A-Z]+(_|)[A-Z]+(_|)[A-Z]+)\]":\[))
Any ideas on how to solve it? Thank you in advance.

Replacing the odd data[...] key with a plain data key:
jq '.["data"] = .[keys[0]] | del(.[keys[1]])' file
{
"event": [
"ONIMBOTMESSAGEADD"
],
"data": [
"123"
]
}
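The question's real sticking point was applying a second rule to the result of the first. In most regex APIs that is done with a replacement callback: match the whole key, then rewrite only inside the match. A minimal Python sketch, assuming the affected keys always start with "data followed by one or more [...] groups (the names KEY_RE, flatten_key and fix_keys are hypothetical):

```python
import re

# A quoted key made of "data" plus one or more [...] groups,
# e.g. "data[BOT][123][BOT_ID]". Group 1 holds the bracketed part.
KEY_RE = re.compile(r'"data((?:\[[^\]"]*\])+)"')


def flatten_key(match: re.Match) -> str:
    # Second "rule", applied only inside the matched key:
    # each "[" becomes "_" and each "]" is dropped.
    suffix = match.group(1).replace("[", "_").replace("]", "")
    return f'"data{suffix}"'


def fix_keys(text: str) -> str:
    # re.sub calls flatten_key once per matched key.
    return KEY_RE.sub(flatten_key, text)


sample = """{
"event":[
"ONIMBOTMESSAGEADD"
],
"data[BOT][123][BOT_ID]":[
"123"
]
}"""

print(fix_keys(sample))
```

The [ that opens each JSON array sits outside the quoted key, so it is never part of a match and the structure stays intact.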

Related

How to get comments and string in regex?

I have created a programming language, KAGSA, and I need a syntax highlighter for it. I started with a VS Code highlighter and wrote everything, but I have a problem with the regexes for strings that span more than one line and for comments that span more than one line. This is the code:
Comments:
"comments": {
"patterns": [{
"name": "comment.line.shebang.kagsa",
"match": "//..*|/\\*(.*?|\n)*\\*/|//|/\\**\\*"
}]
},
The problem is with the /*Comment*/ comment.
And the string code:
"strings": {
"name": "string.quoted.double.kagsa",
"patterns": [{
"name": "string.quoted.double.kagsa",
"match": "'(.*?)'|\"(.*?)\"|``(.*?|\n)*``"
}]
},
My problem is with ``String``.
And here are the colors I get (screenshot): https://i.stack.imgur.com/NPbS0.png
You have this issue because match doesn't work for multiline string literals.
I found a similar problem.
As said by Gama11 in his answer:
Try to use a begin / end pattern instead of a simple match.
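A sketch of that suggestion for the two problem cases, keeping the single-line rules as match rules. TextMate grammars apply a match regex to one line at a time, whereas a begin/end pair opens a region that can continue over several lines. The scope names comment.block.kagsa, comment.line.double-slash.kagsa and string.quoted.other.kagsa are assumptions following the usual TextMate naming; any scope your theme colors will do:

```json
"comments": {
  "patterns": [
    {
      "name": "comment.block.kagsa",
      "begin": "/\\*",
      "end": "\\*/"
    },
    {
      "name": "comment.line.double-slash.kagsa",
      "match": "//.*$"
    }
  ]
},
"strings": {
  "patterns": [
    {
      "name": "string.quoted.double.kagsa",
      "match": "'(.*?)'|\"(.*?)\""
    },
    {
      "name": "string.quoted.other.kagsa",
      "begin": "``",
      "end": "``"
    }
  ]
},
```

Everything between begin and end gets the rule's name, however many lines it covers, which a single-line match cannot do.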

I want to apply the regular expression used in gitleaks in secretlint

I am now trying to migrate from gitleaks to a tool called secretlint.
Originally, there was a warning in the generic-api-key rule when executing gitleaks, but after moving to secretlint, the warning no longer occurs.
Specifically, I copied the regular expression from the gitleaks.toml provided by gitleaks into the secretlint configuration file .secretlintrc.json, following the format of @secretlint/secretlint-rule-pattern provided by secretlint.
[[rules]]
id = "generic-api-key"
description = "Generic API Key"
regex = '''(?i)((key|api[^Version]|token|secret|password|auth)[a-z0-9_ .\-,]{0,25})(=|>|:=|\|\|:|<=|=>|:).{0,5}['\"]([0-9a-zA-Z\-_=]{8,64})['\"]'''
entropy = 3.7
secretGroup = 4
keywords = [
"key",
"api",
"token",
"secret",
"password",
"auth",
]
to
{
"rules": [
{
"id": "@secretlint/secretlint-rule-pattern",
"options": {
"patterns": [
{
"name": "Generic API key",
"pattern": "/(?i)((key|api[^Version]|token|secret|password|auth)[a-z0-9_ .\\-,]{0,25})(=|>|:=|\\|\\|:|<=|=>|:).{0,5}['\"]([0-9a-zA-Z\\-_=]{8,64})['\"]/"
}
]
}
}
]
}
I suspect I'm not migrating the regex correctly; if anyone can tell me where I'm going wrong, I'd appreciate it.
The main issue is that the inline (?i) modifier is not supported by the JavaScript regex engine. You must use the normal i flag after the second regex delimiter (/.../i).
Also, the api[^Version] is a typical user error. If you meant to say api not followed with Version, you need api(?!Version).
So you can use
"pattern": "/((key|api(?!Version)|token|secret|password|auth)[\\w .,-]{0,25})([=>:]|:=|\\|\\|:|<=|=>).{0,5}['\"]([\\w=-]{8,64})['\"]/i"
Note that I "shrank" [A-Za-z0-9_] into a single \w; they are equivalent here. Note also that the - char does not need escaping when used at the end (or start) of a character class.
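Both points can be checked quickly in Python, whose re module happens to accept a leading (?i) as well; the sketch below passes re.IGNORECASE instead, to mirror the /i flag. The test strings such as API_KEY = "abcd1234efgh" are made up for illustration:

```python
import re

# gitleaks' original fragment: "api" plus ONE character that is not
# V, e, r, s, i, o or n (in either case, because of IGNORECASE).
char_class = re.compile(r"api[^Version]", re.IGNORECASE)
# The fix from the answer: "api" not immediately followed by "Version".
lookahead = re.compile(r"api(?!Version)", re.IGNORECASE)

for word in ["api_key", "apiSecret", "apiVersion", "api"]:
    print(word, bool(char_class.search(word)), bool(lookahead.search(word)))

# The answer's full pattern; re.IGNORECASE plays the role of the trailing /i.
GENERIC_API_KEY = re.compile(
    r"""((key|api(?!Version)|token|secret|password|auth)[\w .,-]{0,25})"""
    r"""([=>:]|:=|\|\|:|<=|=>).{0,5}['"]([\w=-]{8,64})['"]""",
    re.IGNORECASE,
)

for line in ['API_KEY = "abcd1234efgh"', 'apiVersion = "abcd1234efgh"']:
    m = GENERIC_API_KEY.search(line)
    # Group 4 holds the candidate secret, matching secretGroup = 4 in gitleaks.toml.
    print(line, "->", m.group(4) if m else None)
```

apiSecret is where the two fragments differ: with the i flag, [^Version] also rejects the S, whereas the lookahead only rejects the literal word Version.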

Regex to match text between two delimiters?

Here's an example of the things I need to match in a request that I have stored as text:
[{"id":"896","name":"TinyAuras","author_id":"654","author":"Kurisu</span></strong></span></a>","githubFolder":"https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj","count":9,"countByChampion":{"":9,"total":9},"description":"(Beta) Aura/Buff/Debuff Tracker","udate":"1451971516","createdDays":375,"image":"https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg","strudate":"2016-07-22 19:40","champions":null,"forum_link":"165574","assembly_compiles":true,"voted":false,"voted_champions":[]},
I want to select that link only up to the last folder (basically the GitHub folder, not the actual .csproj).
I have a file with thousands of these and I want to extract all of the links into a text file.
Here is the Perl regex I have so far:
(?<=githubFolder":").*(?=\/.+\.csproj") but it ends up selecting more than I need after the first match. Any suggestions?
In other words, I want everything right before this.csproj.
So in my example I want to extract:
https://github.com/xKurisu/TinyAuras/blob/master/
This regex:
"githubFolder":"([^"]*/)[^"/]*"
selects:
https://github.com/xKurisu/TinyAuras/blob/master/
in your example.
However, it would likely be better to use an actual json parser as Jim D.'s answer suggests so you won't have to worry about spacing and special characters.
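The regex above can be tried with Python's re.findall, which returns group 1 for every match. The second record below is made up to show that, with many records on one line, each match stays inside its own value:

```python
import re

# Two records on a single line, as in the question's file.
text = (
    '[{"id":"896","githubFolder":"https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj","count":9},'
    '{"id":"999","githubFolder":"https://github.com/example/Other/blob/master/Other.csproj","count":1}]'
)

# [^"]* cannot cross a quote, so a match cannot run into the next record;
# group 1 ends at the last "/" inside the value.
folders = re.findall(r'"githubFolder":"([^"]*/)[^"/]*"', text)

# One folder per line, ready to write to a text file.
print("\n".join(folders))
```

This is the difference from the original .* attempt, which is free to run past the closing quote into later records.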
While the accepted answer will likely get the job done here, I just want to point out that old-school Linux tools are not easy to use to get 100% accurate results when working with JSON, and for that reason it is best practice to use an actual JSON parser to extract your content.
One simple reason is that strings are JSON-encoded, so you will need to decode them somehow to ensure you get the correct result. Another is that JSON is not a regular language; it is context-free. In general you will need something more powerful than regular expressions.
One I am familiar with is jq, and the array of JSON objects can be parsed as the OP desires like this:
$ jq -r ' .[] | .githubFolder ' foo
https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj
https://github.com/xKurisu/"GiantAuras"/blob/master/GiantAuras.csproj
$
where file foo is
[
{
"id": "896",
"name": "TinyAuras",
"author_id": "654",
"author": "Kurisu</span></strong></span></a>",
"githubFolder": "https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj",
"count": 9,
"countByChampion": {
"": 9,
"total": 9
},
"description": "(Beta) Aura/Buff/Debuff Tracker",
"udate": "1451971516",
"createdDays": 375,
"image": "https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg",
"strudate": "2016-07-22 19:40",
"champions": null,
"forum_link": "165574",
"assembly_compiles": true,
"voted": false,
"voted_champions": []
},
{
"id": "888",
"name": "\"GiantAuras\"",
"author_id": "666",
"author": "Astaire</span></strong></span></a>",
"githubFolder": "https://github.com/xKurisu/\"GiantAuras\"/blob/master/GiantAuras.csproj",
"count": 90,
"countByChampion": {
"": 777,
"total": 42
},
"description": "(Stable) Aura/Buff/Debuff Tracker",
"udate": "1451971517",
"createdDays": 399,
"image": "https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg",
"strudate": "2016-07-22 19:40",
"champions": null,
"forum_link": "165574",
"assembly_compiles": true,
"voted": false,
"voted_champions": []
}
]
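For readers without jq, the same parser-based approach can be sketched with Python's standard json module. Unlike the jq command, this sketch also trims the .csproj file name with rsplit, which is an addition to the answer; the data is the foo file above, trimmed to the fields used:

```python
import json

# Same records as the file "foo" above, trimmed to the fields used here.
raw = r'''[
  {"name": "TinyAuras",
   "githubFolder": "https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj"},
  {"name": "\"GiantAuras\"",
   "githubFolder": "https://github.com/xKurisu/\"GiantAuras\"/blob/master/GiantAuras.csproj"}
]'''

records = json.loads(raw)  # decodes the \" escapes, as jq does
# Keep everything up to and including the last "/" (drop the file name).
folders = [rec["githubFolder"].rsplit("/", 1)[0] + "/" for rec in records]
for folder in folders:
    print(folder)
```

The second record shows why decoding matters: the escaped quotes come out as plain " characters, which a regex over the raw text would have to handle by hand.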
Here is the regexp:
("githubFolder":".*)\/(.*\.csproj)
1. "githubFolder":"https://github.com/removed/removed/blob/master/stophere/this.csproj
1.1. Group: "githubFolder":"https://github.com/removed/removed/blob/master/stophere
1.2. Group: this.csproj
You can test it here: http://www.regexe.com
This pattern: (http|https):\/\/github\.com\/[\w\/]+\/ selects every directory path that starts with github.com in your example.
Try this RegEx:
githubFolder":"([a-zA-Z:\/.]+\/)
It captures the link up to the last slash in a group.

Find pattern with regex in Sublime text 2.02

I would like to create a new syntax rule in Sublime so that a string pattern is highlighted. The pattern I am looking for is IPC or TST, so I was using the following Sublime syntax rule:
{ "name": "a3",
"scopeName": "source.a3",
"fileTypes": ["a3"],
"patterns": [
{ "name": "IPC",
"match": "\\b\\w(IPC|TST)\\w\\b "
}
],
"uuid": "c76f733d-879c-4c1d-a1a2-101dfaa11ed8"
}
But for some reason it doesn't work at all.
Could someone point me in the right direction?
Thanks in advance
After looking around and testing a lot, I found the issue: apart from identifying the pattern, I had to assign the colour, and to do that I had to use "captures", with the rule as follows:
{ "name": "IPC colour",
"match": "\\b(IPC|TST)\\b",
"captures": {
"1": { "name": "meta.preprocessor.diagnostic" }
}
},
Here "name": "meta.preprocessor.diagnostic" sets the kind of colour assigned to the matched pattern.
regards!
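The rule in the answer also changes the regex itself. A quick check with Python's re module (assuming its \b and \w behave like Sublime's regex engine for plain ASCII text) suggests the original pattern could never match a standalone IPC or TST, because it demands a word character on each side and a trailing space:

```python
import re

# Regex from the question (note the \w on each side and the trailing space).
original = re.compile(r"\b\w(IPC|TST)\w\b ")
# Regex from the answer.
corrected = re.compile(r"\b(IPC|TST)\b")

for line in ["run IPC now", "TST", "xIPCy z"]:
    print(repr(line), bool(original.search(line)), bool(corrected.search(line)))
```

The original only fires on IPC or TST buried inside a longer word followed by a space, such as xIPCy in the last test string.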

Regex: Match Numbers inside a bracket

OK, here is an example of the text I have:
"data": [
{
"post_id": "164902600239452_10202071734744222",
"actor_id": 164902600239452,
"target_id": null,
"likes": {
"href": "https://www.facebook.com/browse/likes/?id=10202071734744222",
"count": 2,
"sample": [
678063648,
100000551340876,
100000805495404,
100000905843684,
],
"friends": [
],
"user_likes": false,
"can_like": true
},
"comments": {
"can_remove": false,
"can_post": true,
"count": 0,
"comment_list": [
]
},
"message": "Down to the FINAL 3 SEATS for It Factor LIVE 2013... WHO will snag them before we close registration on October 15th???\n\nLearn more now at http://www.ItFactorLIVE.com/"
}, ]
I want to match only the numbers inside the brackets after the "sample":
"sample": [
678063648,
100000551340876,
100000805495404,
100000905843684,
],
so that I end up with this
678063648
100000551340876
100000805495404
100000905843684
Could somebody please help me with the correct regex to make that happen?
OK - I have looked at the solution that @hwnd suggested, as well as the link you gave to the "real" data, and came up with the following:
\d+(?=,*\s+(?:\d|\]))
You can see at http://regex101.com/r/pL3gW2 that this matches every string of digits in the sample that is inside square brackets.
The key difference from @hwnd's solution is the * added after the comma, which makes the comma after the digits optional: this allows the expression to match the last set of numbers before the closing ]. Without it, the match skipped the last number inside the brackets.
It's been said before: there are powerful JSON parsers available in almost any language / platform. Look into them.
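A short Python check of the lookahead regex above on an excerpt of the question's text, next to a two-step alternative that is not from either answer: isolate the "sample" array first, then take every number inside it. The two-step version assumes the array contains no nested ]. Note that the excerpt has trailing commas, so Python's json.loads would reject it; that is one case where a regex is the practical tool:

```python
import re

# Excerpt of the text from the question (trailing commas: not strict JSON).
text = '''"likes": {
"count": 2,
"sample": [
678063648,
100000551340876,
100000805495404,
100000905843684,
],
"friends": [
],
"user_likes": false
}'''

# The lookahead regex from the answer above.
lookahead_hits = re.findall(r"\d+(?=,*\s+(?:\d|\]))", text)

# Two-step alternative: capture the body of the "sample" array,
# then pull every run of digits out of that body only.
block = re.search(r'"sample":\s*\[([^\]]*)\]', text)
two_step_hits = re.findall(r"\d+", block.group(1)) if block else []

print("\n".join(two_step_hits))
```

Both approaches pick out the same four numbers here; the two-step version does not care what follows each number, only where the array starts and ends.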
See if this works for you:
pattern = (\d+)(?=(?:(?!\[).)*\])