Notepad++: how to remove all text before and after a string - regex

I want to keep just the country code (countryCd) from each line of this text. What is the regular expression for this?
{"name": "Canada", "countryCd": "CA", "code": 393},
{"name": "Syria", "countryCd": "SR", "code": 3535},
{"name": "Germany", "countryCd": "GR", "code": 3213}
The expected result would be
CA
SR
GR

Kind of a hack (see @Toto's comment), but it works for your requirements:
.*"([A-Z]{2})".*
Replace it with $1; see a demo on regex101.com. (Side note: isn't Germany usually GER?)
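For reference, here is a minimal Ruby sketch of the same replacement, assuming the three sample lines sit in one string. Ruby's . does not match line breaks by default (comparable to leaving Notepad++'s ". matches newline" unchecked), so gsub rewrites each line separately. Ruby writes the backreference in the replacement string as \1 rather than $1.

```ruby
text = <<~'DATA'
  {"name": "Canada", "countryCd": "CA", "code": 393},
  {"name": "Syria", "countryCd": "SR", "code": 3535},
  {"name": "Germany", "countryCd": "GR", "code": 3213}
DATA

# Greedy .* on both sides: each line is replaced by its quoted
# two-letter uppercase value. '\1' (single-quoted) refers to group 1.
codes = text.gsub(/.*"([A-Z]{2})".*/, '\1')
puts codes
```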

In Notepad++ I would do a find and replace with:
.*?"countryCd": "([^"]+)".*
And replace that with:
\1
That way, if for some reason your country code is not exactly 2 letters, it is still captured correctly. [^"] is a negated character class, meaning any character that isn't ", and the + makes it match at least 1 character. I find that negated character classes usually express what is actually intended.
In this case you want to capture whatever is in the quotes after countryCd, and this will do the trick.
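To illustrate the point about code length, here is a small Ruby sketch comparing this pattern with the [A-Z]{2} pattern from the other answer. The Germany line with the three-letter code "GER" is a hypothetical input added for illustration.

```ruby
text = <<~'DATA'
  {"name": "Canada", "countryCd": "CA", "code": 393},
  {"name": "Germany", "countryCd": "GER", "code": 3213}
DATA

# Lazy .*? up to the countryCd key, then capture every non-quote character.
by_key = text.gsub(/.*?"countryCd": "([^"]+)".*/, '\1')
# The two-letter pattern finds no match on the GER line, so that line is left untouched.
two_letter = text.gsub(/.*"([A-Z]{2})".*/, '\1')

puts by_key
puts two_letter
```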

Related

Regex multiple exclusion and match for different patterns

I want to exclude some specific words and, if those words don't match, match an MD5 hash, for example.
Here is a small log as an example:
"value": "ef51be4506d7d287abc8c26ea6c495f6", "u_jira_status": "", "u_quarter_closed": "", "file_hash": "ef51be4506d7d287abc8c26ea6c495f6", "escalation": "0", "upon_approval": "proceed", "correlation_id": "", "cyber_kill_change": "ef51be4506d7d287abc8c26ea6c495f6", "sys_id": "ef51be4506d7d287abc8c26ea6c495f6", "u_business_service": "", "destination_ip": "ef51be4506d7d287abc8c26ea6c495f6", u'test': u'9db92f08db4f951423c87d84f39619ef'
As you can see, there are multiple values that should match; only "value" and "id" should be excluded.
Here is the regex I am using so far:
([^value|^id](\":\s\"|':\su')\b)[a-fA-F\d]{32}\b
There are two forms the text after the excluded word can take:
"something": "hash"
'something': u'hash'
With the previous regex, the result is the following.
The result excludes value and id as expected, and "file_hash", "destination_ip" and 'test' match as expected, but there is a value called "cyber_kill_change" that does not match for some reason.
The matches are:
h": "ef51be4506d7d287abc8c26ea6c495f6
p": "ef51be4506d7d287abc8c26ea6c495f6
t': u'9db92f08db4f951423c87d84f39619ef
instead of just the MD5 (the same applies to all 3 matches), for example:
9db92f08db4f951423c87d84f39619ef
Can someone explain to me how to match correctly, please?
Note
For the exclusions I cannot use something like this:
(?<!value|id)
The < and ! are not accepted by the software where I want to add the regex.
If it helps, I am trying to use this regex in XSOAR; here is some documentation of the permitted syntax.
"cyber_kill_change" ends with the character 'e', which is one of the characters in "value", so it is excluded as well. The problem is the brackets []: they create a "character class", so each character inside is matched on its own as a single character, not as part of a word. [^value|^id] therefore means "one character that is not v, a, l, u, e, |, ^, i or d". Likewise (note that | is a literal character inside brackets):
[value|id] = (v|a|l|u|e|\||i|d)
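To see the character-class behaviour concretely, here is a small Ruby check using the question's original pattern and the sample hash. Only the single character just before the separator is tested against the class, regardless of which key it belongs to.

```ruby
cls = /[^value|^id]/
original = /([^value|^id](":\s"|':\su')\b)[a-fA-F\d]{32}\b/
md5 = "ef51be4506d7d287abc8c26ea6c495f6"

# A class tests ONE character: 'h' (end of file_hash) passes, 'e' does not.
h_ok = cls.match?("h")
e_ok = cls.match?("e")
# So the whole pattern accepts file_hash but rejects cyber_kill_change.
file_hash_ok = original.match?(%Q{"file_hash": "#{md5}"})
cyber_ok = original.match?(%Q{"cyber_kill_change": "#{md5}"})

p [h_ok, e_ok, file_hash_ok, cyber_ok]
```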
To match the exact words, use a group, (value|id). You may try this expression:
((?<!(value|id))(\":\s\"|':\su')\b)[a-fA-F\d]{32}\b
I used CyrilEx Regex Tester to check the expression and got the expected result.
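The question says XSOAR does not accept (?<!...), so the following is only a Ruby illustration of the lookbehind idea in a flavor that supports it. Two changes relative to the expression above are assumptions of this sketch: the lookbehind uses a plain alternation without an inner group, and the MD5 is wrapped in the only capturing group so that scan returns just the hash rather than the ": " prefix. The log is a trimmed copy of the sample. Any key ending in value or id is skipped, so sys_id is excluded too.

```ruby
log = %q{"value": "ef51be4506d7d287abc8c26ea6c495f6", "file_hash": "ef51be4506d7d287abc8c26ea6c495f6", "cyber_kill_change": "ef51be4506d7d287abc8c26ea6c495f6", "sys_id": "ef51be4506d7d287abc8c26ea6c495f6", "destination_ip": "ef51be4506d7d287abc8c26ea6c495f6", u'test': u'9db92f08db4f951423c87d84f39619ef'}

# (?<!value|id): the separator must not come right after "value" or "id".
# ([a-fA-F\d]{32}) is the only capturing group, so scan yields just the hashes.
md5_after_key = /(?<!value|id)(?:":\s"|':\su')([a-fA-F\d]{32})\b/
hashes = log.scan(md5_after_key).flatten
p hashes
```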

Matching groups of things, separated by specific token

So, here's what I'm trying to do, although I've been struggling with that for some time.
Let's say we have this input:
{{something|a}} text text {{another|one|with|more|items}}
What I'm trying to achieve:
[
["something", "a"],
["another", "one", "with", "more", "items"]
]
The simple way would be something like:
"{{something|a}} text text {{another|one|with|more|items}}".scan(/([^\|\{\}]+)/)
But this yields - quite predictably so - all the results in a single array (also note that I do not want "text text" in the results, just the items IN the curly braces):
[["something"], ["a"], [" text text "], ["another"], ["one"], ["with"], ["more"], ["items"]]
I then tried this (see script here):
\{\{(([^\|\{\}]+)\|?)+\}\}
But I must be doing something wrong.
Any help will be appreciated! :)
You can't get all captured values of a repeated capturing group in Ruby.
There are always as many captures as the capturing groups in the pattern.
Thus, you need to throw in some more code to get the expected output:
s = '{{something|a}} text text {{another|one|with|more|items}}'
p s.scan(/{{(.*?)}}/).flatten.map{ |x| x.split("|") }
# => [["something", "a"], ["another", "one", "with", "more", "items"]]
See the Ruby demo.
Note: the {{(.*?)}} pattern matches {{, then zero or more characters other than line breaks, as few as possible, and then }}. .flatten turns the result into an array of strings, and x.split("|") inside the map call splits each captured value on |.
NOTE: if there can be line breaks in between {{ and }}, add /m modifier, /{{(.*?)}}/m. Or, unroll the pattern for better efficiency: /{{[^}]*(?:}(?!})[^}]*)*}}/ (see Rubular demo).
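A quick Ruby comparison of the two multi-line variants on input with a line break inside the braces (the line break is added here for illustration). The unrolled pattern is wrapped in a capture group in this sketch so that scan returns only the inside of the braces, as the lazy version does.

```ruby
s = "{{something|a}} text text {{another|one|with|more|\nitems}}"

# Lazy dot with /m (in Ruby, /m lets . match line breaks).
lazy = s.scan(/\{\{(.*?)\}\}/m).flatten.map { |x| x.split("|") }
# Unrolled loop: runs of non-} characters, allowing a lone } not followed by another }.
unrolled = s.scan(/\{\{([^}]*(?:\}(?!\})[^}]*)*)\}\}/).flatten.map { |x| x.split("|") }

p lazy      # => [["something", "a"], ["another", "one", "with", "more", "\nitems"]]
p unrolled  # same result
```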

Regex: match only line where numbers are located

Really tired of this regex. So many combinations.... I believe I need another brain :-)
Here is my problem; if someone can help, I'd highly appreciate it.
I have these 6 lines in a JSON response:
...some JSON code here
"note" : "",
"note" : "Here is my note",
"note" : "My note $%$",
"note" : "Created bug 14569 in the system",
"note" : "Another beautiful note",
"note" : "##$%##%dgdeg"
...continuation of the JSON code
With the help of Regex, how do I match number 14569 only?
I have tried these regexes:
"note"([\s\:\"a-zA-Z])*([0-9]*) - 6 matches (I only need one)
"note"([\s\:\"a-zA-Z])*(^[0-9]*) - no matches
"note"([\s\:\"a-zA-Z])*([0-9]*+?) - pattern error
"note"([\s\:\"a-zA-Z])*(^[0-9]*+#?) - no match
Thanks for your help!
Updated for Matt. Below is my full JSON object:
"response": {
"notes": [{
"note" : "",
"note" : "Here is my note",
"note" : "My note $%$",
"note" : "Created bug 14569 in the system",
"note" : "Another beautiful note",
"note" : "##$%##%dgdeg"
}]
}
You could try this regex:
"note"\s*:\s*".*?([0-9]++).*"
It will give you the number in group 1 of the match.
If you don't want to match numbers that are part of a word (e.g. "bug11") then surround the capture group with word boundary assertions (\b):
"note"\s*:\s*".*?\b([0-9]++)\b.*"
Regex101 demo
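A quick Ruby check of the word-boundary version (Ruby supports the possessive ++ quantifier), assuming each note is a separate string. The "bug11" entry is a hypothetical extra line that shows the effect of \b.

```ruby
notes = [
  '"note" : "",',
  '"note" : "Here is my note",',
  '"note" : "My note $%$",',
  '"note" : "Created bug 14569 in the system",',
  '"note" : "Another beautiful note",',
  '"note" : "##$%##%dgdeg"',
  '"note" : "bug11"' # hypothetical: digits glued to a word
]

# Group 1 holds the number; \b...\b rejects digits that are part of a word.
pattern = /"note"\s*:\s*".*?\b([0-9]++)\b.*"/
numbers = notes.map { |line| line[pattern, 1] }.compact
p numbers # => ["14569"]
```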
If all that you care about is that the line includes a number, then that is all you need to look for.
/[0-9]/ # matches if the string includes a digit
Or, as you want to capture the number:
/([0-9]+)/ # matches (and captures) one or more digits
This is a common error that I see when beginners build regular expressions. They want to build a regex that matches the whole string - when, actually, they only need to match the bit of the string that they want to match.
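The "match only the bit you need" approach, sketched in Ruby under the assumption that each note line is a separate string: first keep the lines that contain a digit, then capture the digits.

```ruby
notes = [
  '"note" : "",',
  '"note" : "Here is my note",',
  '"note" : "My note $%$",',
  '"note" : "Created bug 14569 in the system",',
  '"note" : "Another beautiful note",',
  '"note" : "##$%##%dgdeg"'
]

# Only ask whether the line contains a digit; don't describe the whole line.
with_digit = notes.grep(/[0-9]/)
# Then capture the run of digits itself.
numbers = with_digit.map { |line| line[/([0-9]+)/, 1] }

p with_digit # => ["\"note\" : \"Created bug 14569 in the system\","]
p numbers    # => ["14569"]
```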
Update:
It might help to explain why some of your other attempts failed.
"note"([\s\:\"a-zA-Z])*([0-9]*) - 6 matches (I only need one)
The * means "match zero or more of the previous item", effectively making the item optional. This matches all lines as they all contain zero or more digits.
"note"([\s\:\"a-zA-Z])*(^[0-9]*) - no matches
The ^ means "the next item needs to be at the start of the string". You don't have digits at the start of your string.
"note"([\s\:\"a-zA-Z])*([0-9]*+?) - pattern error
Yeah. You're just adding random punctuation here, aren't you? In flavors that support it, *+ is a possessive quantifier, but following it with ? is not valid, hence the pattern error.
"note"([\s\:\"a-zA-Z])*(^[0-9]*+#?) - no match
This fails for the same reason as the previous attempt where you use ^ - the digits aren't at the start of the string. Also, the # has no special meaning in a regex, so #? means "zero or one # characters".
If you have JSON, why don't you parse the JSON and then grep through the result?
use JSON 'decode_json';
my $data = decode_json( $json_text );     # assumes an array of objects with a "note" key
my @rows = map { /\b(\d+)\b/ ? $1 : () }  # keep only the number
           map { $_->{note} } @$data;
This might work:
(?m-s)^[^"\r\n]*?"note"\h*:\h*"[^"\r\n]*?\d+[^"\r\n]*".*
https://regex101.com/r/ujDBa9/1
Explained
(?m-s) # Multi-line, no dot-all
^ # BOL
[^"\r\n]*? # Not a double quote yet
"note" \h* : \h* # Finally, a note
" [^"\r\n]*? \d+ [^"\r\n]* " # Is a number embedded within the quotes ?
.* # Ok, get the rest of the line
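The pattern above is written for PCRE-style engines. A Ruby translation needs a few adjustments: Ruby has no s flag (its ^ always matches at line starts and . excludes newlines unless /m is given), and in Ruby \h means a hex digit, so this sketch spells horizontal whitespace as [ \t]. The input is the full JSON from the question.

```ruby
json = <<~'JSON'
  "response": {
  "notes": [{
  "note" : "",
  "note" : "Here is my note",
  "note" : "My note $%$",
  "note" : "Created bug 14569 in the system",
  "note" : "Another beautiful note",
  "note" : "##$%##%dgdeg"
  }]
  }
JSON

# Same structure as the (?m-s) pattern, with \h replaced by [ \t].
line_with_number = /^[^"\r\n]*?"note"[ \t]*:[ \t]*"[^"\r\n]*?\d+[^"\r\n]*".*/
matches = json.scan(line_with_number)
p matches # => ["\"note\" : \"Created bug 14569 in the system\","]
```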

what is the regexp to accept

I have a last name in a JSON request and I need to build a schema for the JSON.
I have the schema as:
"lastName": {
"type": "string",
"required": true,
"pattern":"^[a-zA-Z0-9'. ]{1,40}$"
}
But we got a defect saying last names can contain the following:
Last names: apostrophe, hyphen, period (O’Rourke; Smith-Jones; St. Pierre).
I fixed the apostrophe, period and space, but I don't know how to add the hyphen.
Please let me know how to fix this.
The hyphen can be put at the end of the list, which makes it clear that it's not a character range:
[.....-]
Note: I wouldn't accept special characters at the beginning of the name.
Escape it with a backslash (it can then be placed anywhere in the character class; inside a JSON schema string the backslash itself must also be escaped, i.e. \\-):
^[\-a-zA-Z0-9'. ]
or place it at the end (where it cannot be mistakenly parsed as a range separator):
^[a-zA-Z0-9'. -]
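A small Ruby check of the hyphen-last class against the sample names. Ruby's ^ and $ match at line boundaries, so this sketch anchors with \A and \z instead; the class and the {1,40} length come from the schema above. Note that the question writes O’Rourke with a typographic apostrophe (U+2019), which the ASCII ' in the class does not match.

```ruby
# Hyphen last in the class, so it is a literal "-" rather than a range.
LAST_NAME = /\A[a-zA-Z0-9'. -]{1,40}\z/

samples = ["O'Rourke", "Smith-Jones", "St. Pierre", "O\u2019Rourke"]
results = samples.map { |name| [name, LAST_NAME.match?(name)] }.to_h
results.each { |name, ok| puts "#{name}: #{ok}" }
```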

how to use a regular expression to extract json fields?

Beginner RegExp question. I have lines of JSON in a text file, each with slightly different fields, but there are 3 fields I want to extract from each line if it has them, ignoring everything else. How would I use a regex (in EditPad or anywhere else) to do this?
Example:
"url":"http://www.netcharles.com/orwell/essays.htm",
"domain":"netcharles.com",
"title":"Orwell Essays & Journalism Section - Charles' George Orwell Links",
"tags":["orwell","writing","literature","journalism","essays","politics","essay","reference","language","toread"],
"index":2931,
"time_created":1345419323,
"num_saves":24
I want to extract url, title and tags.
/"(url|title|tags)":"((\\"|[^"])*)"/i
I think this is what you're asking for; an explanation follows. This regular expression (delimited by /, which you probably won't have to type in EditPad) matches:
"
A literal ".
(url|title|tags)
Any of the three literal strings "url", "title" or "tags". In regular expressions, parentheses create groups by default, and the pipe character alternates, like a logical 'or'. To match these characters literally, you'd have to escape them.
":"
Another literal string.
(
The beginning of another group. (Group 2)
(
Another group (3)
\\"
The literal string \" (a backslash followed by a quote). The backslash has to be escaped, because otherwise it would escape the next character instead of matching a literal backslash.
|
or...
[^"]
Any single character except a double quote. The brackets denote a character class (set): a list of characters to match. Any given class matches exactly one character in the string. A caret (^) at the beginning of a class negates it, so the class matches any character that is not listed.
)
End of group 3...
*
The asterisk causes the previous expression (in this case, group 3) to be repeated zero or more times, so the matcher consumes anything that could be inside the double quotes of a JSON string.
)"
The end of group 2, and a literal ".
I've done a few non-obvious things here that may come in handy:
Group 2, read back through its capture or backreference, holds the actual string assigned to the field. This is useful when getting the actual value.
The i at the end of the expression makes it case insensitive.
Group 1 contains the name of the captured field.
EDIT: I see that tags is an array, so the regular expression needs an update.
Your new Regex is:
/"(url|title|tags)":("(\\"|[^"])*"|\[("(\\"|[^"])*"(,"(\\"|[^"])*")*)?\])/i
All I've done here is alternate the string regular expression I had been using ("((\\"|[^"])*)") with a regular expression for finding arrays (\[("(\\"|[^"])*"(,"(\\"|[^"])*")*)?\]). Not so easy to read, is it? Well, substituting the letter S for our string regex, we can rewrite the array part as:
\[(S(,S)*)?\]
Which matches a literal opening bracket (hence the backslashes), optionally followed by a comma separated list of strings, and a closing bracket. The only new concept I've introduced here is the question mark (?), which is itself a type of repetition. Commonly referred to as 'making the previous expression optional', it can also be thought of as exactly 0 or 1 matches.
With the same S notation, here's the whole dirty regular expression:
/"(url|title|tags)":(S|\[(S(,S)*)?\])/i
If it helps, here's a view of it in action.
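A Ruby sketch running the final expression over the sample lines (the tags array is shortened here). Because the expression contains many groups, scan returns all of them for each match, so the sketch keeps only group 1 (the field name) and group 2 (the raw value, quotes or brackets included).

```ruby
text = <<~'JSON'
  "url":"http://www.netcharles.com/orwell/essays.htm",
  "domain":"netcharles.com",
  "title":"Orwell Essays & Journalism Section - Charles' George Orwell Links",
  "tags":["orwell","writing","literature"],
  "index":2931,
JSON

field = /"(url|title|tags)":("(\\"|[^"])*"|\[("(\\"|[^"])*"(,"(\\"|[^"])*")*)?\])/i
# Keep group 1 (name) and group 2 (whole value) from each match.
fields = text.scan(field).map { |groups| groups.first(2) }
fields.each { |name, value| puts "#{name} => #{value}" }
```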
This question is a bit old, but I was browsing on my PC and found this expression. I posted it as a Gist; it could be useful to others.
EDIT:
# Expression was tested with PHP and Ruby
# This regular expression finds a key-value pair in JSON formatted strings
# Match 1: Key
# Match 2: Value
# https://regex101.com/r/zR2vU9/4
# http://rubular.com/r/KpF3suIL10
(?:\"|\')(?<key>[^"]*)(?:\"|\')(?=:)(?:\:\s*)(?:\"|\')?(?<value>true|false|[0-9a-zA-Z\+\-\,\.\$]*)
# test document
[
{
"_id": "56af331efbeca6240c61b2ca",
"index": 120000,
"guid": "bedb2018-c017-429E-b520-696ea3666692",
"isActive": false,
"balance": "$2,202,350",
"object": {
"name": "am",
"lastname": "lang"
}
}
]
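A Ruby run of the expression above against the test document (its header comment says it was tested with PHP and Ruby). Because the pattern uses named groups, scan returns [key, value] pairs. Note that a key whose value is an object, such as "object", comes back with an empty value.

```ruby
PAIR = /(?:\"|\')(?<key>[^"]*)(?:\"|\')(?=:)(?:\:\s*)(?:\"|\')?(?<value>true|false|[0-9a-zA-Z\+\-\,\.\$]*)/

doc = <<~'JSON'
  [
    {
      "_id": "56af331efbeca6240c61b2ca",
      "index": 120000,
      "guid": "bedb2018-c017-429E-b520-696ea3666692",
      "isActive": false,
      "balance": "$2,202,350",
      "object": {
        "name": "am",
        "lastname": "lang"
      }
    }
  ]
JSON

# With named groups, each scan result is [key, value].
pairs = doc.scan(PAIR)
pairs.each { |key, value| puts "#{key} => #{value.inspect}" }
```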
The JSON string you'd like to extract a field value from:
{"fid":"321","otherAttribute":"value"}
The following regex extracts exactly the "fid" field value "321":
(?<=\"fid\":\")[^\"]*
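The same lookbehind in Ruby, as a quick check on the sample string: the fixed-width lookbehind positions the match right after "fid":", and [^"]* takes the value up to the next quote.

```ruby
json = '{"fid":"321","otherAttribute":"value"}'
# String#[] with a regexp returns the matched text (nil if there is no match).
fid = json[/(?<="fid":")[^"]*/]
puts fid # => 321
```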
Please try below expression:
/"(url|title|tags)":("([^""]+)"|\[[^[]+])/gm
Explanation:
1st Capturing Group (url|title|tags): matches and captures one of the literal words 'url', 'title' or 'tags' (case-sensitive).
2nd Capturing Group ("([^""]+)"|\[[^[]+]):
1st Alternative "([^""]+)" matches everything between a pair of double quotes, including the quotes.
2nd Alternative \[[^[]+] matches everything between [ and ], including the brackets.
I have tested it here.
I adapted regex to work with JSON in my own library. The algorithm's behavior is detailed below.
First, stringify the JSON object. Then, you need to store the starts and lengths of the matched substrings. For example:
"matched".search("ch") // yields 3
For a JSON string, this works exactly the same, unless you are explicitly searching for JSON syntax characters (think :, {, }), in which case I'd recommend transforming your JSON object before running the regex.
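A minimal Ruby sketch of only the first step described above, recording [start, length] for every regex hit in the stringified JSON. The sample JSON is made up for illustration, and the parent-chain reconstruction is not shown. Inside scan's block, $~ holds the current MatchData.

```ruby
json = '{"a":"matched","b":{"c":"match"}}'

# Record where each hit starts and how long it is.
spans = []
json.scan(/match/) { spans << [$~.begin(0), $~[0].length] }
p spans # => [[6, 5], [25, 5]]

# Ruby's counterpart of JavaScript's "matched".search("ch").
idx = "matched".index("ch")
p idx # => 3
```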
Next, you need to reconstruct the JSON object. The algorithm I authored does this by detecting JSON syntax by recursively going backwards from the match index. For instance, the pseudo code might look as follows:
find the next key preceding the match index, call this theKey
then find the number of all occurrences of this key preceding theKey, call this theNumber
using the number of occurrences of all keys with same name as theKey up to position of theKey, traverse the object until keys named theKey has been discovered theNumber times
return this object called parentChain
With this information, it is possible to use regex to filter a JSON object to return the key, the value, and the parent object chain.
You can see the library and code I authored at http://json.spiritway.co/
if your json is
{"key1":"abc","key2":"xyz"}
then the regex below matches the entry for a given key (here key2), from the key name up to the next comma or closing brace; substitute whichever key you want:
"key2(.*?)(?=,|}|$)
you can verify it here - regex101.com
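For reference, this is what the pattern returns in Ruby (the sketch escapes } inside the lookahead for clarity). Group 1 holds the rest of the entry after the key name, including the colon and quotes, so the bare value still needs trimming.

```ruby
json = '{"key1":"abc","key2":"xyz"}'
m = json.match(/"key2(.*?)(?=,|\}|$)/)
puts m[0] # => "key2":"xyz"
puts m[1] # => ":"xyz"
```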
Why does it have to be a Regular Expression object?
Here we can just use a Hash object first and then go search it.
mh = {"url":"http://www.netcharles.com/orwell/essays.htm","domain":"netcharles.com","title":"Orwell Essays & Journalism Section - Charles' George Orwell Links","tags":["orwell","writing","literature","journalism","essays","politics","essay","reference","language","toread"],"index":2931,"time_created":1345419323,"num_saves":24}
The output of which would be
=> {:url=>"http://www.netcharles.com/orwell/essays.htm", :domain=>"netcharles.com", :title=>"Orwell Essays & Journalism Section - Charles' George Orwell Links", :tags=>["orwell", "writing", "literature", "journalism", "essays", "politics", "essay", "reference", "language", "toread"], :index=>2931, :time_created=>1345419323, :num_saves=>24}
Not that I want to avoid using Regexp, but don't you think it would be easier to take it a step at a time until you're getting the data you want to search through further? Just my humble opinion.
mh.values_at(:url, :title, :tags)
The output:
["http://www.netcharles.com/orwell/essays.htm", "Orwell Essays & Journalism Section - Charles' George Orwell Links", ["orwell", "writing", "literature", "journalism", "essays", "politics", "essay", "reference", "language", "toread"]]
Taking the pattern that FrankieTheKneeman gave you:
pattern = /"(url|title|tags)":"((\\"|[^"])*)"/i
we can search the mh hash by converting it to a JSON string (to_json needs require 'json').
/#{pattern}/.match(mh.to_json)
The output:
=> #<MatchData "\"url\":\"http://www.netcharles.com/orwell/essays.htm\"" 1:"url" 2:"http://www.netcharles.com/orwell/essays.htm" 3:"m">
Of course, this is all done in Ruby, which is not one of your tags, but I hope it's relevant.
But oops! match only returns the first hit, so with that pattern we can't get all three at once; I'll do them one at a time for the sake of illustration.
pattern = /"(title)":"((\\"|[^"])*)"/i
/#{pattern}/.match(mh.to_json)
#<MatchData "\"title\":\"Orwell Essays & Journalism Section - Charles' George Orwell Links\"" 1:"title" 2:"Orwell Essays & Journalism Section - Charles' George Orwell Links" 3:"s">
pattern = /"(tags)":"((\\"|[^"])*)"/i
/#{pattern}/.match(mh.to_json)
=> nil
Sorry about that last one: tags holds an array, not a string, so the pattern doesn't match and it will have to be handled differently.
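One way to get more than one field with that same string pattern is String#scan, which returns every match instead of only the first. A minimal Ruby sketch, with the hash trimmed to the three fields of interest; as above, tags is still not found because its value is an array.

```ruby
require 'json'

mh = {
  "url": "http://www.netcharles.com/orwell/essays.htm",
  "title": "Orwell Essays & Journalism Section - Charles' George Orwell Links",
  "tags": ["orwell", "writing", "literature"]
}
pattern = /"(url|title|tags)":"((\\"|[^"])*)"/i

# scan yields [name, value, last_char] per match; keep name and value.
found = mh.to_json.scan(pattern).map { |name, value, _last_char| [name, value] }.to_h
p found.keys # => ["url", "title"]
```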