Regex to parse json formatted string - regex

I have a JSON formatted string which I am trying to parse using a regex. I would like to parse each key value pair for later use in grafana (the regex itself is used in logstash).
The test string looks like this:
{
"version":"1.1",
"nameId":"test",
"productId":"B2",
"total customers":99,
"full_description":"asdf"
}
I am using the following regex, but if the value is a number (not wrapped in " "), the trailing comma is captured as part of the value. For example, the value group for the key "total customers" is "99," and not just "99".
(?i)["'](?<key>[^"]*)["'](?:\:)["'\{\[]?([\r\n]?\t+\")?(?<value>\w(?:\s[a-zA-Z0-9_=]\.?)+\w+#(?:(?:\w[a-z\d\-]+\w)\.)+[a-z]{2,10}|true|false|[\w+-.$\s=-]*)(",[\r\n])?(?2)?(?J)(?<value>(?&value))?
What do I have to add to the regex expression in order to parse JSON-values which are numbers?

In the pattern, the character class [\w+-.$\s=-] contains the range +-. rather than the three literal characters +, - and .
That range covers ASCII code points 43 to 46 (+ , - .), and code point 44 is the unwanted comma.
Because the character class already ends with a literal -, you can omit the middle -.
The pattern also contains some superfluous escapes and capture groups and is more complicated than it needs to be. An updated pattern with just 2 capture groups could look like this:
(?i)["'](?<key>[^"]*)["']:["'{\[]?(?:[\r\n]?\t+")?(?<value>\w(?:\s[a-zA-Z\d_=]\.?)+\w+#(?:\w[a-z\d-]+\w\.)+[a-z]{2,10}|true|false|[\w+.$\s=-]*)(?:",[\r\n])?(?2)?(?J)(?<value>(?&value))?
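To see why the range is the culprit, here is a minimal Python sketch. It only exercises the final character class from the pattern, not the full expression (which relies on PCRE features such as (?J) and (?2) that Python's re module does not support). The variable names are illustrative.

```python
import re

# Inside a character class, "+-." is a range from "+" (43) to "." (46).
range_chars = [chr(code) for code in range(ord("+"), ord(".") + 1)]
print(range_chars)  # ['+', ',', '-', '.']

# Only the last alternative of the value group is reproduced here.
buggy = re.compile(r"[\w+-.$\s=-]*")   # range +-. also admits ","
fixed = re.compile(r"[\w+.$\s=-]*")    # literal "+", ".", and "-" at the end

sample = "99,"
buggy_value = buggy.match(sample).group(0)
fixed_value = fixed.match(sample).group(0)
print(buggy_value)  # 99,
print(fixed_value)  # 99
```

Dropping the middle hyphen is enough: the trailing - in the class is still a literal, so negative numbers keep matching.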
Regex demo

Related

RegEx Parse Tool to extract digits from string

Using Alteryx, I have a field called Address which consists of fields like A32C, GH2X, ABC19E. So basically where digits are pinned between sets of letters. I am trying to use the RegEx tool to extract the digits out into a new column called ADDRESS1.
I have Address set to Field to Parse. Output method Parse.
My regular expression is typed in as:
(?:[[alpha]]+)(/d+)(?:[[alpha]]+)
And then I have (/d+) outputting to ADDRESS1. However, when I run this it parses 0 records. What am I doing wrong?
To match a digit, use [0-9] or \d. To match a letter, use [[:alpha:]].
Use
[[:alpha:]]+(\d+)[[:alpha:]]+
See the regex demo.
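For readers who want to check the pattern outside Alteryx, here is a rough Python equivalent. Python's re module does not support POSIX classes such as [[:alpha:]], so [A-Za-z] is substituted here as an assumption about what counts as a letter in the Address field.

```python
import re

# Sample values taken from the question.
addresses = ["A32C", "GH2X", "ABC19E"]

# [A-Za-z] stands in for [[:alpha:]], which Python's re does not support.
pattern = re.compile(r"[A-Za-z]+(\d+)[A-Za-z]+")

digits = []
for address in addresses:
    m = pattern.fullmatch(address)
    # Group 1 holds the digits between the two runs of letters.
    digits.append(m.group(1) if m else None)

print(digits)  # ['32', '2', '19']
```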
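You can try this:
// Lookbehind and lookahead require a letter on each side, so only the digits are matched.
const regex = /(?<=[A-Z])\d+(?=[A-Z])/g;
const values = 'A32CZ, GH2X, ABC19E';
const result = values.match(regex);
console.log(result); // ['32', '2', '19']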

Value matching in regex and Openrefine

I am trying to use the value.match command in OpenRefine 2.6 for splitting two columns based on a 4 number date.
A sample of the text is:
"first sentence, second sentence, third sentences, 2009"
What I do is going to "Add column based on this column" and insert
value.match(\d{4})
but I get the error
Parsing error at offset 12: Missing number, string, identifier, regex,
or parenthesized expression
any idea of the possible solution?
You need to fix 3 things to get this working:
1) As Wiktor says you need to start & end the regular expression with a forward slash /
2) The 'match' function requires you to match the whole string in the cell, not just the fragment you need - so your regular expression needs to match the whole string
3) To extract part of a string with 'match' you need capture groups in your regular expression, that is, put ( ) around the part of the regular expression you want to extract. The captured values are returned in an array, so you need to take the string out of the array to store it in a cell
So you'll need something like:
value.match(/.*(\d{4})/)[0]
To get the four digit year from the end of the string
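The whole-string requirement and the capture group can be illustrated in Python. This is only an analogue, not GREL: re.fullmatch is used here because it also insists on matching the entire string, which mirrors the behaviour described in point 2.

```python
import re

value = "first sentence, second sentence, third sentences, 2009"

# A bare \d{4} cannot match the whole cell, so nothing is returned.
partial = re.fullmatch(r"\d{4}", value)

# .* consumes everything before the year; the group captures the year itself.
whole = re.fullmatch(r".*(\d{4})", value)
year = whole.group(1)

print(partial)  # None
print(year)     # 2009
```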

Remove set of string from a string, multiple occurences

Want to completely remove any part of my string that has
\"AddedDate\":\"\\/Date(1480542000000-0600)\\/\"
The 1480526460000-0600 is not hardcoded, it could be any set of numbers (JSON dates).
Try this regex \"AddedDate\":\"\\\/Date\(\d+(?:-\d+)?\)\\\/\" and replace each match with an empty string. If the regex engine doesn't support \d, replace it with [0-9]. This matches dates of the form x or x-x, where x is one or more digits.
If you want to match exactly 13 digits in the first part of the date and 4 in the second, use \"AddedDate\":\"\\\/Date\(\d{13}(?:-\d{4})?\)\\\/\"
EDIT: For new format use \\\"AddedDate\\\":\\\"\\\\\/Date\(\d+(?:-\d+)?\)\\\\\/\\\" it should work.
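As a rough illustration in Python, the sketch below removes every occurrence from a made-up sample string. It assumes the text contains a literal backslash followed by a slash on both sides of Date(...), mirroring the opening delimiter; with a different level of escaping, the number of backslashes in the pattern has to change accordingly.

```python
import re

# Hypothetical sample with two occurrences; "\/" is a literal backslash + slash.
text = r'a "AddedDate":"\/Date(1480542000000-0600)\/" b "AddedDate":"\/Date(1480526460000)\/" c'

# \\ matches one literal backslash; the offset part (-0600) is optional.
pattern = re.compile(r'"AddedDate":"\\/Date\(\d+(?:-\d+)?\)\\/"')

# re.sub replaces every match, so multiple occurrences are handled in one call.
cleaned = pattern.sub("", text)
print(cleaned)
```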

1 to 5 of the same groups in REGEX

For a string such as:
abzyxcabkmqfcmkcde
Notice the substrings between ab and c (zyx and kmqf). To capture the first one:
ab([a-z]{3,5})c
Is it possible to match both of the groups from the sample string? Actually, there should be 1 to 5 groups.
Note: python style regex.
You can verify that a given string conforms to the 1-5 repetitions of ab([a-z]{3,5})c using this regex
(?:ab([a-z]{3,5})c){1,5}
or this one if there are characters expected between the groups
(?:ab([a-z]{3,5})c.*?){1,5}
You will only be able to extract the last matching group from that string, however, not any of the previous ones. To get the earlier ones you need to use hsz's approach
Just match all results - i.e. with g flag:
/ab([a-z]{3,5})c/g
or some method like in Python:
re.findall(pattern, string, flags=0)
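Both points can be checked on the sample string with a short Python sketch: a repeated group keeps only its last capture, while re.findall returns one capture per match.

```python
import re

s = "abzyxcabkmqfcmkcde"

# The repeated group validates the structure but retains only the last capture.
repeated = re.match(r"(?:ab([a-z]{3,5})c){1,5}", s)
last_only = repeated.group(1)

# findall scans the string and collects group 1 from every match.
all_groups = re.findall(r"ab([a-z]{3,5})c", s)

print(last_only)   # kmqf
print(all_groups)  # ['zyx', 'kmqf']
```

If the limit of 1 to 5 groups matters, one option is to check len(all_groups) after the call.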

Regular expression string followed by numbers

I am writing a regular expression to extract phrases like #Question1# or #Question125# from html string like
Patitent name #Question1#, Patient was suffering from #Question2#, Patient's gender is #Question3#, patient has #Question4# drinking for the last month. His DOB is #Question5#
The first half of the expression is simple just #Question, but I also need to match for a series of digits with unspecified length, and the whole string ends with #.
Once I find the matching phrase, how do I extract only the digits from the string? For example, for #Question312# I just want to get 312 out.
Any suggestion?
The regexp you are looking for is
/#Question[0-9]+#/
If you need to extract the number you can just wrap the [0-9]+ part in parentheses
/#Question([0-9]+)#/
making it a group. How you use a captured group depends on the specific regexp implementation (e.g. python, perl, javascript ...). For example in python you can replace all those questions with corresponding answers from a list with
import re

answers = ["Andrea", "Griffini"]
text = "My first name is #Question1# and my last name is #Question2#"
# Replace each #QuestionN# with the N-th answer (1-based).
print(re.sub("#Question([0-9]+)#",
             lambda x: answers[int(x.group(1)) - 1],
             text))
I think what you are looking for is:
#Question[0-9]+#
#Question
Any character in this class: [0-9], one or more repetitions
#
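To pull out only the digits, as asked in the question, a capture group around [0-9]+ is enough. The sketch below uses Python and a shortened, made-up version of the sample text; html and numbers are illustrative names.

```python
import re

# Shortened sample in the spirit of the question's text.
html = ("Patient name #Question1#, Patient was suffering from #Question2#, "
        "Patient's gender is #Question3#, follow-up is #Question312#")

# With one capture group, findall returns just the captured digits.
numbers = [int(n) for n in re.findall(r"#Question([0-9]+)#", html)]
print(numbers)  # [1, 2, 3, 312]
```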