Regex to match text between two delimiters? - regex

Here's an example of what I need to match, taken from a request that I have stored as text:
[{"id":"896","name":"TinyAuras","author_id":"654","author":"Kurisu</span></strong></span></a>","githubFolder":"https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj","count":9,"countByChampion":{"":9,"total":9},"description":"(Beta) Aura/Buff/Debuff Tracker","udate":"1451971516","createdDays":375,"image":"https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg","strudate":"2016-07-22 19:40","champions":null,"forum_link":"165574","assembly_compiles":true,"voted":false,"voted_champions":[]},
I want to select that link up to the last slash (basically the GitHub folder, not the actual csproj).
I have a file full of thousands of those and I'm trying to extract all of those links and put them in a text file.
Here is what I have so far as a Perl regex:
(?<=githubFolder":").*(?=\/.+\.csproj")
but that ends up selecting more than I need after the first match. Any suggestions?
The issue is, I want everything right before this.csproj.
So in my example I want to extract:
https://github.com/xKurisu/TinyAuras/blob/master/

This regex:
"githubFolder":"([^"]*/)[^"/]*"
selects:
https://github.com/xKurisu/TinyAuras/blob/master/
in your example.
However, it would likely be better to use an actual json parser as Jim D.'s answer suggests so you won't have to worry about spacing and special characters.
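As a minimal Python sketch of the capture-group regex above, assuming the file has been read into a string (the variable names and the shortened two-record sample are illustrative only):

```python
import re

# Shortened sample with two records; in practice this would be the file contents.
text = (
    '[{"id":"896","githubFolder":"https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj","count":9},'
    '{"id":"897","githubFolder":"https://github.com/example/Other/blob/master/Other.csproj","count":1}]'
)

# [^"]* cannot cross a closing quote, so each match stays inside one value.
# Group 1 ends at the last slash; the trailing [^"/]* consumes the file name.
pattern = re.compile(r'"githubFolder":"([^"]*/)[^"/]*"')

# With a single capture group, findall returns just the captured folder URLs.
folders = pattern.findall(text)
for folder in folders:
    print(folder)
```

Unlike the greedy .* in the question, the negated character class stops each match at the end of its own string value, so later records are not swallowed.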

While the accepted answer will likely get the job done here, I just want to point out that the old-school Linux tools do not make it easy to get 100% accurate results when working with JSON. For that reason, it is best practice to use an actual JSON parser to extract your content.
One simple reason is that strings are JSON-encoded, so you will need to decode them somehow to ensure you get the correct result. Another is that JSON is not a regular language; it is context-free. In general, you will need something more powerful than regular expressions.
One I am familiar with is jq, and the array of JSON objects can be parsed as the OP desires like this:
$ jq -r ' .[] | .githubFolder ' foo
https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj
https://github.com/xKurisu/"GiantAuras"/blob/master/GiantAuras.csproj
$
where file foo is
[
{
"id": "896",
"name": "TinyAuras",
"author_id": "654",
"author": "Kurisu</span></strong></span></a>",
"githubFolder": "https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj",
"count": 9,
"countByChampion": {
"": 9,
"total": 9
},
"description": "(Beta) Aura/Buff/Debuff Tracker",
"udate": "1451971516",
"createdDays": 375,
"image": "https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg",
"strudate": "2016-07-22 19:40",
"champions": null,
"forum_link": "165574",
"assembly_compiles": true,
"voted": false,
"voted_champions": []
},
{
"id": "888",
"name": "\"GiantAuras\"",
"author_id": "666",
"author": "Astaire</span></strong></span></a>",
"githubFolder": "https://github.com/xKurisu/\"GiantAuras\"/blob/master/GiantAuras.csproj",
"count": 90,
"countByChampion": {
"": 777,
"total": 42
},
"description": "(Stable) Aura/Buff/Debuff Tracker",
"udate": "1451971517",
"createdDays": 399,
"image": "https://cdn.joduska.me/forum/uploads/assemblydb/image-default.jpg",
"strudate": "2016-07-22 19:40",
"champions": null,
"forum_link": "165574",
"assembly_compiles": true,
"voted": false,
"voted_champions": []
}
]
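The same parser-based idea can be sketched in Python with the standard json module. This is only an illustration: the sample below is trimmed to two fields per record, and stripping the file name with rsplit is one assumed way to get the folder the question asks for.

```python
import json

# Two-record sample mirroring the file foo above (fields trimmed for brevity).
# A raw string keeps the \" escapes intact so the JSON parser decodes them.
raw = r'''
[
  {"name": "TinyAuras",
   "githubFolder": "https://github.com/xKurisu/TinyAuras/blob/master/TinyAuras.csproj"},
  {"name": "\"GiantAuras\"",
   "githubFolder": "https://github.com/xKurisu/\"GiantAuras\"/blob/master/GiantAuras.csproj"}
]
'''

records = json.loads(raw)

# Split once at the last slash to drop the file name, then put the slash back.
folders = [rec["githubFolder"].rsplit("/", 1)[0] + "/" for rec in records]
for folder in folders:
    print(folder)
```

Because the parser decodes the escaped quotes, the second record is handled correctly, which a pattern built on [^"]* would not do.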

Here is the regexp:
("githubFolder":".*)\/(.*\.csproj)
1. "githubFolder":"https://github.com/removed/removed/blob/master/stophere/this.csproj
1.1. Group: "githubFolder":"https://github.com/removed/removed/blob/master/stophere
1.2. Group: this.csproj
You can test it here: http://www.regexe.com

This pattern: (http|https):\/\/github\.com\/[\w\/]+\/ selects every directory path that starts with github.com in your example.

Try this RegEx:
githubFolder":"([a-zA-Z:\/.]+\/)
It will group the link up to the last slash.

Related

Regex - Exception alternatives

I need an alternative for exceptions in regex.
I have the following log:
"value": "ef51be4506d7d287abc8c26ea6c495f6", "u_jira_status": "", "u_quarter_closed": "", "file_hash": "ef51be4506d7d287abc8c26ea6c495f6", "escalation": "0", "upon_approval": "proceed", "correlation_id": "", "cyber_kill_change": "ef51be4506d7d287abc8c26ea6c495f6", "sys_id": "ef51be4506d7d287abc8c26ea6c495f6", "u_business_service": "", "destination_ip": "ef51be4506d7d287abc8c26ea6c495f6", u'test': u'9db92f08db4f951423c87d84f39619ef'
I need to detect every MD5 in the log, but only the value, for example something like this: 9db92f08db4f951423c87d84f39619ef
I'm using the following regex
[\"'](?![^\"']*?(?:value|id))[^\"']*(?:\":\s\"|':\su')\b([a-fA-F\d]{32})\b
This regex does the job, but I cannot use it because the software I'm currently using (XSOAR) does not allow "!" or "<", so that's where my regex breaks.
I'm looking for an alternative way to exclude the key "value" and any key that contains "id".
Also, can someone help me fix my regex? It currently matches both the field name and the value, for example:
"file_hash": "ef51be4506d7d287abc8c26ea6c495f6
What I'm looking for is just the value, something like this
ef51be4506d7d287abc8c26ea6c495f6
Thanks for the help.
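One possible way to avoid lookarounds entirely is to capture both the key and the value with plain groups and do the exclusion outside the regex. The Python sketch below assumes the matches can be post-processed in code, which may or may not be possible in the tool in question; the helper name md5_values and the shortened log are hypothetical.

```python
import re

# Shortened version of the log from the question.
log = (
    '"value": "ef51be4506d7d287abc8c26ea6c495f6", "u_jira_status": "", '
    '"file_hash": "ef51be4506d7d287abc8c26ea6c495f6", '
    '"sys_id": "ef51be4506d7d287abc8c26ea6c495f6", '
    "u'test': u'9db92f08db4f951423c87d84f39619ef'"
)

# Group 1 is the key, group 2 the 32-hex-digit value.
# No "!" or "<" appears anywhere in the pattern.
pair = re.compile(r"""["'](\w+)["']:\s*u?["']([a-fA-F0-9]{32})["']""")

def md5_values(text):
    """Return MD5 values whose key is not 'value' and does not contain 'id'."""
    return [
        value
        for key, value in pair.findall(text)
        if key != "value" and "id" not in key
    ]

hashes = md5_values(log)
print(hashes)
```

Taking group 2 alone also addresses the second part of the question, since the field name is matched but never returned.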

How to match a string exactly OR exact substring from beginning using Regular Expression

I'm trying to build a regex query for a database and it's got me stumped. If I have a string with a varying number of elements that has an ordered structure how can I find if it matches another string exactly OR some exact sub string when read from the left?
For example I have these strings
Canada.Ontario.Toronto.Downtown
Canada.Ontario
Canada.EasternCanada.Ontario.Toronto.Downtown
England.London
France.SouthFrance.Nice
They are structured by most general location to specific, left to right. However, the number of elements varies with some specifying a country.region.state and so on, and some just country.town. I need to match not only the words but the order.
So if I want to match "Canada.Ontario.Toronto.Downtown" I would want to both get #1 and #2 and nothing else. How would I do that? Basically running through the string and as soon as a different character comes up it's not a match but still allow a sub string that ends "early" to match like #2.
I've tried making groups and using "?" like (canada)?.?(Ontario)?.? etc but it doesn't seem to work in all situations since it can match nothing as well.
Edit as requested:
Mongodb Database Collection:
[
{
"_id": "doc1",
"context": "Canada.Ontario.Toronto.Downtown",
"useful_data": "Some Data"
},
{
"_id": "doc2",
"context": "Canada.Ontario",
"useful_data": "Some Data"
},
{
"_id": "doc3",
"context": "Canada.EasternCanada.Ontario.Toronto.Downtown",
"useful_data": "Some Data"
},
{
"_id": "doc4",
"context": "England.London",
"useful_data": "Some Data"
},
{
"_id": "doc5",
"context": "France.SouthFrance.Nice",
"useful_data": "Some Data"
},
{
"_id": "doc6",
"context": "",
"useful_data": "Some Data"
}
]
User provides "Canada", "Ontario", "Toronto", and "Downtown" values in that order, and I need to use them to query doc1 and doc2 and no others. So I need a regex pattern to put in here: collection.find({"context": {$regex: <pattern here>}}). If it's not possible, I'll just have to restructure the data and use different methods of finding those docs.
At each dot, start a nested optional group for the next term, and add start and end anchors:
^Canada(\.Ontario(\.Toronto(\.Downtown)?)?)?$
See live demo.
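Since the user supplies the terms, the nested pattern can be generated rather than written by hand. The Python sketch below is one way to do that; the function name prefix_pattern is hypothetical, and it assumes at least one term is given.

```python
import re

def prefix_pattern(terms):
    # Build the nesting from the innermost (last) term outwards, so each
    # optional group wraps everything that may follow it.
    nested = ""
    for term in reversed(terms[1:]):
        nested = r"(\." + re.escape(term) + nested + ")?"
    # The first term is mandatory; the anchors force a whole-string match.
    return "^" + re.escape(terms[0]) + nested + "$"

pattern = prefix_pattern(["Canada", "Ontario", "Toronto", "Downtown"])
print(pattern)

contexts = [
    "Canada.Ontario.Toronto.Downtown",
    "Canada.Ontario",
    "Canada.EasternCanada.Ontario.Toronto.Downtown",
    "England.London",
    "France.SouthFrance.Nice",
    "",
]
matched = [c for c in contexts if re.search(pattern, c)]
print(matched)
```

The resulting string is what would go in place of <pattern here> in the query above. re.escape keeps any regex metacharacters in user input from being interpreted as pattern syntax.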

Regex to get value from JSON field

Considering the json below I need to get the value text from grand_total. The most close regular expression that I achieved was:
"grand_total":(.*?)\}
{
"data": [{
"grand_total": {
"digital": "4:41",
"hours": 4,
"minutes": 41,
"text": "4 hrs 41 mins",
"total_seconds": 16880.662732
}
}],
"end": "2019-09-04T02:59:59Z",
"start": "2019-09-03T03:00:00Z"
}
In s (dotall) mode, an expression similar to the following might extract the desired value,
"grand_total":\s*{.*?"text"\s*:\s*"([^"]*)"
and you can likely call that using $1.
If you wish to explore/simplify/modify the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
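For comparison, here is a short Python sketch that applies the expression above with re.DOTALL (Python's equivalent of s mode) and also reads the same field with the json module. The brace is escaped as \{ for safety, and the variable names are illustrative only.

```python
import json
import re

raw = '''{
  "data": [{
    "grand_total": {
      "digital": "4:41",
      "hours": 4,
      "minutes": 41,
      "text": "4 hrs 41 mins",
      "total_seconds": 16880.662732
    }
  }],
  "end": "2019-09-04T02:59:59Z",
  "start": "2019-09-03T03:00:00Z"
}'''

# re.DOTALL lets .*? cross the line breaks between "grand_total" and "text".
pattern = re.compile(r'"grand_total":\s*\{.*?"text"\s*:\s*"([^"]*)"', re.DOTALL)
match = pattern.search(raw)
text_via_regex = match.group(1) if match else None

# The parser-based route: follow the keys instead of matching characters.
text_via_json = json.loads(raw)["data"][0]["grand_total"]["text"]

print(text_via_regex)
print(text_via_json)
```

The json route does not depend on key order or whitespace, which the regex does.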

Replace specific part of a string

I'm currently in a pickle while modifying a document. Let's say, for example, I have this chunk of text:
"id": "EFM",
"type": "Casual",
"hasBeenAssigned": false,
"hasRandomAssigned": false
},
I currently have roughly 73-80 occurrences of:
"id" : "somethingdifferent",
Using a regular expression in Notepad++, how can I select the entire string:
"id" : "",
but only change the contents between the second set of quotes?
Edit
An oversight made me leave this information out:
"equipedOutfit": {
"id": "MkIV",
"type": "Outfit",
"hasBeenAssigned": false,
"hasRandonAssigned": false
},
"equipedWeapon": {
"id": "EFM",
"type": "Casual",
"hasBeenAssigned": false,
"hasRandonAssigned": false
},
The text I'm looking for is:
"id" : "EFM",
You can use a regex like this:
("id": ").*?"
With a replacement string:
$1whatever"
^^^^^^^^--- replace 'whatever' with whatever you want
Working demo
Update: as you updated your question, I'm updating the answer. If you only want to replace "id": "EFM", then you just have to search for that exact text and supply the replacement string you want.
"id":\s*"\K[^"]*
You can use \K here and replace with whatever you want. See demo.
https://regex101.com/r/sS2dM8/29
EDIT:
If you want only EFM then use
"id"\s*:\s*"\KEFM(?=")
Find what: ("id"\s?:\s?").*(")
Replace with: \1somethingdifferent\2
Options:
Regular expression, Wrap around
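Outside Notepad++, the same capture-group replacement can be sketched in Python. Python's re module does not support \K, so this sketch keeps the prefix and the closing quote in groups instead; the replacement text NEW is a placeholder.

```python
import re

doc = '''"equipedOutfit": {
"id": "MkIV",
"type": "Outfit"
},
"equipedWeapon": {
"id": "EFM",
"type": "Casual"
},'''

# Replace the value of every "id" key: group 1 is the prefix up to the opening
# quote, group 2 the closing quote. [^"]* cannot run past the closing quote.
any_id = re.sub(r'("id"\s*:\s*")[^"]*(")', r'\g<1>NEW\g<2>', doc)

# Replace only the entries whose value is exactly EFM.
only_efm = re.sub(r'("id"\s*:\s*")EFM(")', r'\g<1>NEW\g<2>', doc)

print(any_id)
print(only_efm)
```

The \g<1> form is used in the replacement so that the group reference stays unambiguous even if the new text starts with a digit.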

Regex find and replace all words

Regex always confuses me, plus super simple syntax is hard to Google. I am using regex here strictly with find and replace; there is no need for a programming language, I just want to save time editing a lot of data :)
I have a huge json file, these are only two pieces of data, but it's good for this example.
[
{
name: 'John',
team: 'Wolves',
team_id: 1,
number: 24
},
{
name: 'Kevin',
team: 'Rockets',
team_id: 1,
number: 6
}
]
Inside my JSON I need to put double quotes around pretty much every key and value; numbers are optional.
I need to get rid of the single quotes, then put double quotes around everything.
Final result looking like this.
[
{
"name": "John",
"team": "Wolves",
"team_id": "1",
"number": "24"
},
{
"name": "Kevin",
"team": "Rockets",
"team_id": "1",
"number": "6"
}
]
Again, numbers are optional but it would be nice to know how to double quote those.
Extra: I vaguely remember doing something like this awhile back, but can't find where I found that information. This would be a nice reference. Does anyone have any good links to the basics of regex, I just want to save time when working with a lot of data. Thanks.
Try something along the lines of this:
(\w+):\s*('?)([^']+?)\2(?=[\n,]) and replace by "\1": "\3"
Demo: http://regex101.com/r/pX9xX6
Edit:
Just tested in Sublime, seems to work fine.
Well, the exact syntax depends on the tool. If you were using vim, for instance:
:%s/'\([^']*\)'/"\1"/g
and
:%s/^\([ ^I]*\)\([^ ^I]*\):/\1"\2":/
would probably do the trick, although you'd want to do a manual check for any quoted quotes.
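For anyone who does end up scripting it, the find-and-replace pattern from the first answer can be tried in Python as a rough sketch. It assumes, as in the sample, that values contain no commas, apostrophes, or embedded quotes; parsing the result with json.loads is a quick check that the output is valid JSON.

```python
import json
import re

source = """[
{
name: 'John',
team: 'Wolves',
team_id: 1,
number: 24
},
{
name: 'Kevin',
team: 'Rockets',
team_id: 1,
number: 6
}
]"""

# Group 1 is the key, group 2 an optional opening single quote, group 3 the
# value. \2 requires the matching closing quote (or nothing for numbers), and
# the lookahead stops the value at a comma or line break.
pattern = re.compile(r"(\w+):\s*('?)([^']+?)\2(?=[\n,])")
converted = pattern.sub(r'"\1": "\3"', source)

print(converted)
data = json.loads(converted)  # raises ValueError if the result is not valid JSON
```

Numbers come out as quoted strings here, matching the desired result shown in the question.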