Parse labeled param strings with Regex - regex

Can anyone help me with this one?
My objective here is to grab some info from a text file, present it to the user, and ask for values to replace that info in order to generate a new output. So I thought of using regular expressions.
My variables would be of the format: {#<num>[|<value>]}.
Here are some examples:
{#1}
{#2|label}
{#3|label|help}
{#4|label|help|something else}
So after some research and experimenting, I came up with this expression: \{\#(\d{1,})(?:\|{1}(.+))*\}
which works pretty well in most cases, except on something like this:
{#1} some text {#2|label} some more text {#3|label|help}
In this case variables 2 and 3 are matched as a single occurrence rather than as two separate matches...
I've already tried to use a lookahead for the trailing } of the expression, but I didn't manage to get it to work.
I'm going to use this expression in C#, should that help anyone...

I like the results from this one:
\{\#(\d+)(?:|\|(.+?))\}
This returns 3 groups (counting the whole match as the first). The second group is the number (1, 2, 3) and the third group is the arguments ('label', 'label|help').
I prefer to remove the * in favor of | in order to capture all the arguments after the first pipe in the last grouping.

A regular expression which can be used would be something like
\{\#(\d+)(?:\|([^|}]+))*\}
This will prevent reading over any closing }.
Another possible solution (with slightly different behaviour) would be to use a non-greedy matcher (.+?) instead of the greedy version (.+).
Note: I also removed the {1} and replaced {1,} with + which are equivalent in your case.
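As a rough illustration of the negated-character-class idea, here is a minimal sketch in Python rather than C#. The function name parse_placeholders and the extra capturing group wrapped around the repeated arguments are my own assumptions, not part of the answer above. A repeated capturing group only keeps its last repetition in Python, so the sketch captures the whole argument run and splits it on the pipe afterwards:

```python
import re

# Same idea as \{\#(\d+)(?:\|([^|}]+))*\}, but the whole "|a|b|c" run is
# captured as one group so that every argument can be recovered.
PLACEHOLDER = re.compile(r"\{#(\d+)((?:\|[^|}]+)*)\}")


def parse_placeholders(text):
    """Return a list of (number, [arguments]) for every {#n|...} in text."""
    result = []
    for m in PLACEHOLDER.finditer(text):
        number = int(m.group(1))
        # group(2) is "" or something like "|label|help"
        args = m.group(2).split("|")[1:] if m.group(2) else []
        result.append((number, args))
    return result


sample = "{#1} some text {#2|label} some more text {#3|label|help}"
print(parse_placeholders(sample))
```

Because neither | nor } can be consumed by an argument, each placeholder ends at its own closing brace and the three variables come back as three separate matches.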

Try this:
\{\#(\d+)(?:\|[^|}]+)*\}
In C#:
MatchCollection matches = Regex.Matches(mystring,
    @"\{\#(\d+)(?:\|[^|}]+)*\}");
It prevents the label and help from eating the | or }.
matches[0].Value => {#1}
matches[0].Groups[0].Value => {#1}
matches[0].Groups[1].Value => 1
matches[1].Value => {#2|label}
matches[1].Groups[0].Value => {#2|label}
matches[1].Groups[1].Value => 2
matches[2].Value => {#3|label|help}
matches[2].Groups[0].Value => {#3|label|help}
matches[2].Groups[1].Value => 3

Related

Regex- to get part of String

I have the string below and I need to get all the values between Pizzahut: and |.
ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|
I have the regular expression .scan(/(?<=Pizzahut:)([.*\s\S]+)(?=\|)/) but it fetches
"j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|"
The result should be: j34532jdhgj,3242237,67688873rg
You can use
s='ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|'
p s.scan(/Pizzahut:([^|]+)/).flatten
# => ["j34532jdhgj", "3242237", "67688873rg"]
It does not look likely that Pizzahut appears as part of another word, but if that is possible, use a version with a word boundary, /\bPizzahut:([^|]+)/.
The Pizzahut:([^|]+) pattern matches Pizzahut: and then captures into Group 1 one or more chars other than a pipe (with ([^|]+)).
Note that String#scan returns the captures only if a pattern contains a capturing group, so you do not need to use lookarounds.
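To see why the original lookaround pattern over-matches while the negated character class does not, here is a small sketch in Python (my choice of language for illustration; the variable names are made up). The greedy [\s\S]+ runs to the end of the string and only backs off to the last pipe, whereas [^|]+ cannot cross a pipe at all:

```python
import re

s = ("ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|"
     "Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|")

# Greedy version, close to the pattern in the question: one long match.
greedy = re.findall(r"(?<=Pizzahut:)([\s\S]+)(?=\|)", s)

# Bounded version: each value stops at the next pipe.
bounded = re.findall(r"\bPizzahut:([^|]+)", s)

print(len(greedy))
print(",".join(bounded))
```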
I'm not sure why you're jumping to a regex solution here; that input string clearly looks structured to me, and you would probably do better by splitting it on the delimiters to convert it into a more convenient data structure.
Something like this:
input = "ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg"
converted_input = input
  .split('|') #=> ["ABC:2fg45rdvsg", "Pizzahut:j34532jdhgj", ... ]
  .map { |pair| pair.split(':') } #=> [["ABC", "2fg45rdvsg"], ["Pizzahut", "j34532jdhgj"], ... ]
  .group_by(&:first) #=> {"ABC"=>[["ABC", "2fg45rdvsg"]], "Pizzahut"=>[["Pizzahut", "j34532jdhgj"], ... ], "Dominos"=>[["Dominos", "3424232"]], ... }
  .transform_values { |v| v.flat_map(&:last) }
(The above series of transformations is just one possible way; you could probably come up with a dozen similar alternative steps to convert this input into the same hash shown below! For example, by using reduce or even the CSV library.)
Which gives you the final result:
converted_input = {
  "ABC" => ["2fg45rdvsg"],
  "Pizzahut" => ["j34532jdhgj", "3242237", "67688873rg"],
  "Dominos" => ["3424232"],
  "Wendys" => ["3462783"]
}
Now that the data is formatted conveniently, obtaining data like your original request becomes trivial:
converted_input["Pizzahut"].join(',') #=> "j34532jdhgj,3242237,67688873rg"
(Although quite likely it would be more suitable to leave it as an Array, not a comma-separated String!!)

Backreference without matching it in the find result

Consider the text structure
(Title)[#1Title-link]
(Chapter1)[#Chapter1-link]
(Chapter2)[#Chapter2-link]
(Chapter3)[#Chapter3-link]
How can I backreference [#1Title-link] without matching it in the find result? I'm trying to change
(Chapter1)[#Chapter1-link] => (Chapter1)[#1Title-link-Chapter1-link]
(Chapter2)[#Chapter2-link] => (Chapter2)[#1Title-link-Chapter2-link]
(Chapter3)[#Chapter3-link] => (Chapter3)[#1Title-link-Chapter3-link]
I tried to find
(\(Title\)\[(.*?)])([\s\S]*?\[)#(\D.*?\])
and then replace it with
$1$3$2-$4
but the problem is that it only matches once per find, and I have lots of chapters, so it is too inefficient to replace them one by one.
Hard-coding the title is no good either, because I have multiple files with that same structure.
Is this possible in regex? Any solution or alternative is welcome.
You can first do a search to get the correct substitution string and then do a subsequent replace operation with that substitution string. You did not specify what language you were using, so here is the code in Python (where that back reference to group 1 is \1 rather than the more usual $1):
import re
text = """(Title)[#1Title-link]
(Chapter1)[#Chapter1-link]
(Chapter2)[#Chapter2-link]
(Chapter3)[#Chapter3-link]"""
m = re.search(r'(?:\(Title\)\[#([^\]]*)\])', text)
assert(m) # that we have a match
substitution = m.group(1)
text = re.sub(r'\[#Chapter([^\]]*)\]', r'[#' + substitution + r'-Chapter\1' + ']', text)
print(text)
Prints:
(Title)[#1Title-link]
(Chapter1)[#1Title-link-Chapter1-link]
(Chapter2)[#1Title-link-Chapter2-link]
(Chapter3)[#1Title-link-Chapter3-link]

Regex Multiple rows [duplicate]

I'm trying to get the list of all digits preceding a hyphen in a given string (let's say in cell A1), using a Google Sheets regex formula :
=REGEXEXTRACT(A1, "\d-")
My problem is that it only returns the first match... how can I get all matches?
Example text:
"A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"
My formula returns 1-, whereas I want to get 1-2-2-2-2-2-2-2-2-2-3-3- (either as an array or concatenated text).
I know I could use a script or another function (like SPLIT) to achieve the desired result, but what I really want to know is how I could get a re2 regular expression to return such multiple matches in a "REGEX.*" Google Sheets formula.
Something like the "global - Don't return after first match" option on regex101.com
I've also tried removing the undesired text with REGEXREPLACE, with no success either (I couldn't get rid of other digits not preceding a hyphen).
Any help appreciated!
Thanks :)
You can actually do this in a single formula using regexreplace to surround all the values with a capture group instead of replacing the text:
=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))
Basically, it surrounds every instance of \d- with a capture group; REGEXEXTRACT then neatly returns all the captures. If you want to join them back into a single string, just use JOIN to pack them into a single cell, as in the formula above.
You may create your own custom function in the Script Editor:
function ExtractAllRegex(input, pattern, groupId) {
  return [Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId])];
}
Or, if you need to return all matches in a single cell joined with some separator:
function ExtractAllRegex(input, pattern, groupId, separator) {
  return Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId]).join(separator);
}
Then, just call it like =ExtractAllRegex(A1, "\d-", 0, ", ").
Description:
input - current cell value
pattern - regex pattern
groupId - Capturing group ID you want to extract
separator - text used to join the matched results.
Edit
I came up with a more general solution:
=regexreplace(A1,"(.)?(\d-)|(.)","$2")
It replaces any text except the second group match (\d-) with just the second group $2.
"(.)?(\d-)|(.)"
  1    2    3
Groups are in ()
---------------------------------------
"$2" -- means return the group number 2
Learn regular expressions: https://regexone.com
Try this formula:
=regexreplace(regexreplace(A1,"[^\-0-9]",""),"(\d-)|(.)","$1")
It will handle string like this:
"A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"
with output:
1-2-2-2-3-
I wasn't able to get the accepted answer to work for my case. I'd like to do it that way, but needed a quick solution and went with the following:
Input:
1111 days, 123 hours 1234 minutes and 121 seconds
Expected output:
1111 123 1234 121
Formula:
=split(REGEXREPLACE(C26,"[a-z,]"," ")," ")
The shortest possible regex:
=regexreplace(A1,".?(\d-)|.", "$1")
Which returns 1-2-2-2-2-2-2-2-2-2-3-3- for "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq".
Explanation of regex:
.? -- optional character
(\d-) -- capture group 1 with a digit followed by a dash (use (\d+-) to capture multiple digits)
| -- logical or
. -- any character
the replacement "$1" uses just the capture group 1, and discards anything else
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
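The same keep-only-group-1 replacement can be tried outside Sheets. A minimal sketch in Python (an assumption on my part: this uses Python's re module, not RE2, and relies on Python 3.5+ replacing an unmatched group with an empty string), compared against a plain "find all" for reference:

```python
import re

text = ("A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;"
        "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;"
        "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq")

# Every character is consumed by one of the two alternatives; only the
# text captured by group 1 survives the replacement.
kept = re.sub(r".?(\d-)|.", r"\1", text)

# Direct global extraction, for comparison.
found = "".join(re.findall(r"\d-", text))

print(kept)
print(kept == found)
```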
This seems to work and I have tried to verify it.
The logic is
(1) Replace letter followed by hyphen with nothing
(2) Replace any digit not followed by a hyphen with nothing
(3) Replace everything which is not a digit or hyphen with nothing
=regexreplace(A1,"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]","")
Result
1-2-2-2-2-2-2-2-2-2-3-3-
Analysis
I had to step through these procedurally to convince myself that this was correct. According to this reference when there are alternatives separated by the pipe symbol, regex should match them in order left-to-right. The above formula doesn't work properly unless rule 1 comes first (otherwise it reduces all characters except a digit or hyphen to null before rule (1) can come into play and you get an extra hyphen from "Patho-jour").
Here are some examples of how I think it must deal with the text
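The effect of the alternative order can be checked with a short sketch in Python (my own illustration on a shortened sample; Python's re also tries alternatives left to right, which I am assuming matches the Sheets behaviour described above). With rule 1 moved to the end, the letter before a hyphen is removed on its own and the hyphen is left behind:

```python
import re

text = "A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/Gemmo"

# Rule 1 (letter followed by hyphen) first, as in the formula above.
rule1_first = re.sub(r"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]", "", text)

# Same alternatives with rule 1 last: "o" in "Patho-" is eaten by the
# single-letter rule, so the hyphen after it survives.
rule1_last = re.sub(r"[0-9][^-]|[a-zA-Z;/é]|[a-zA-Z]-", "", text)

print(rule1_first)
print(rule1_last)
```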
The solution that adds capture groups with REGEXREPLACE and then calls REGEXEXTRACT works here too, but there is a catch.
=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))
If the cell you are extracting from contains special regex characters such as a parenthesis "(" or a question mark "?", this solution won't work.
In my case, I was trying to list all the variable names contained in the cell, which were written like this: "{example_name}". The full content of the cell had special characters that made the regex formula break. Once I removed those special characters, I could list all the captured groups as the solution describes.
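The catch can be reproduced outside Sheets. The sketch below is a Python imitation of the REGEXEXTRACT(A1, REGEXREPLACE(...)) trick (the function name is hypothetical, and Python's re stands in for RE2): the cell text itself becomes the pattern, so any parenthesis in the text is read as a group instead of a literal character and the pattern no longer matches its own source.

```python
import re


def extract_via_generated_pattern(text):
    """Wrap every digit-hyphen in a group, then use the text as the pattern."""
    generated = re.sub(r"(\d-)", r"(\1)", text)
    m = re.search(generated, text)
    return list(m.groups()) if m else None


plain = "A1-Nutrition;A2-ActPhysiq"
with_parens = "A1-Nutri(tion);A2-ActPhysiq"

print(extract_via_generated_pattern(plain))        # works
print(extract_via_generated_pattern(with_parens))  # pattern fails to match
print(re.findall(r"\d-", with_parens))             # direct extraction is unaffected
```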
There are two general ('Excel' / 'native' / non-Apps Script) solutions to return an array of regex matches in the style of REGEXEXTRACT:
Method 1)
insert a delimiter around matches, remove junk, and call SPLIT
Regexes work by iterating over the string from left to right, and 'consuming'. If we are careful to consume junk values, we can throw them away.
(This gets around the problem faced by the currently accepted solution, which is that as Carlos Eduardo Oliveira mentions, it will obviously fail if the corpus text contains special regex characters.)
First we pick a delimiter, which must not already exist in the text. The proper way to do this is to parse the text to temporarily replace our delimiter with a "temporary delimiter", like if we were going to use commas "," we'd first replace all existing commas with something like "<<QUOTED-COMMA>>" then un-replace them later. BUT, for simplicity's sake, we'll just grab a random character such as  from the private-use unicode blocks and use it as our special delimiter (note that it is 2 bytes... google spreadsheets might not count bytes in graphemes in a consistent way, but we'll be careful later).
=SPLIT(
LAMBDA(temp,
MID(temp, 1, LEN(temp)-LEN(""))
)(
REGEXREPLACE(
"xyzSixSpaces:[ ]123ThreeSpaces:[ ]aaaa 12345",".*?( |$)",
"$1"
)
),
""
)
We just use a lambda to define temp="match1match2match3", then use that to remove the last delimiter into "match1match2match3", then SPLIT it.
Taking COLUMNS of the result will prove that the correct result is returned, i.e. {" ", " ", " "}.
This is a particularly good function to turn into a Named Function, and call it something like REGEXGLOBALEXTRACT(text,regex) or REGEXALLEXTRACT(text,regex), e.g.:
=SPLIT(
LAMBDA(temp,
MID(temp, 1, LEN(temp)-LEN(""))
)(
REGEXREPLACE(
text,
".*?("&regex&"|$)",
"$1"
)
),
""
)
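For readers who want to try the delimiter idea outside Sheets, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name all_matches, the choice of U+E000 as the private-use delimiter, and Python's re in place of RE2. Empty pieces are filtered out after the split, which plays the role of trimming the trailing delimiter.

```python
import re

DELIM = "\uE000"  # private-use code point, assumed not to occur in the text


def all_matches(text, regex):
    """Consume junk lazily up to each match (or the end), keep only the match."""
    marked = re.sub(
        r".*?(" + regex + r"|$)",
        lambda m: m.group(1) + DELIM,  # group 1 is the match, or "" at the end
        text,
        flags=re.DOTALL,
    )
    return [piece for piece in marked.split(DELIM) if piece]


print(all_matches("A1-Nutrition;A2-ActPhysiq;H3-Endocrino", r"\d-"))
```

Since the text is never turned into a pattern, special regex characters in the text are harmless here.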
Method 2)
use recursion
With LAMBDA (which lets you define a function as in any other programming language), you can use some tricks from the well-studied lambda calculus and functional programming: you have access to recursion. Defining a recursive function is confusing because there's no easy way for it to refer to itself, so you have to use a trick/convention:
trick for recursive functions: to actually define a function f which needs to refer to itself, instead define a function that takes itself as a parameter and returns the function you actually want; pass this 'convention' function to the Y-combinator to turn it into an actual recursive function
The plumbing which makes such a function work is called the Y-combinator. Here is a good article to understand it if you have some programming background.
For example to get the result of 5! (5 factorial, i.e. implement our own FACT(5)), we could define:
Named Function Y(f)=LAMBDA(f, (LAMBDA(x,x(x)))( LAMBDA(x, f(LAMBDA(y, x(x)(y)))) ) ) (this is the Y-combinator and is magic; you don't have to understand it to use it)
Named Function MY_FACTORIAL(n)=
Y(LAMBDA(self,
  LAMBDA(n,
    IF(n=0, 1, n*self(n-1))
  )
))(n)
result of MY_FACTORIAL(5): 120
The Y-combinator makes writing recursive functions look relatively easy, like an introduction to programming class. I'm using Named Functions for clarity, but you could just dump it all together at the expense of sanity...
=LAMBDA(Y,
Y(LAMBDA(self, LAMBDA(n, IF(n=0,1,n*self(n-1))) ))(5)
)(
LAMBDA(f, (LAMBDA(x,x(x)))( LAMBDA(x, f(LAMBDA(y, x(x)(y)))) ) )
)
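If the spreadsheet syntax obscures what is going on, the same combinator can be written in Python. This is only a sketch for intuition (the names Y and factorial are mine); it mirrors the LAMBDA definition above term by term, including the inner LAMBDA(y, ...) wrapper that delays evaluation:

```python
# Y takes a "function that receives itself" and returns the recursive function.
Y = lambda f: (lambda x: x(x))(lambda x: f(lambda y: x(x)(y)))

# The factorial body never refers to its own name, only to `self`.
factorial = Y(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

print(factorial(5))
```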
How does this apply to the problem at hand? Well a recursive solution is as follows:
in pseudocode below, I use 'function' instead of LAMBDA, but it's the same thing:
// code to get around the fact that you can't have 0-length arrays
function emptyList() {
  return {"ignore this value"}
}
function listToArray(myList) {
  return OFFSET(myList,0,1)
}
function allMatches(text, regex) {
  return allMatchesHelper(emptyList(), text, regex)
}
function allMatchesHelper(resultsToReturn, text, regex) {
  currentMatch = REGEXEXTRACT(...)
  if (currentMatch succeeds) {
    textWithoutMatch = SUBSTITUTE(text, currentMatch, "", 1)
    return allMatchesHelper(
      {resultsToReturn,currentMatch},
      textWithoutMatch,
      regex
    )
  } else {
    return listToArray(resultsToReturn)
  }
}
Unfortunately, the recursive approach is quadratic order of growth (because it's appending the results over and over to itself, while recreating the giant search string with smaller and smaller bites taken out of it, so 1+2+3+4+5+... = big^2, which can add up to a lot of time), so may be slow if you have many many matches. It's better to stay inside the regex engine for speed, since it's probably highly optimized.
You could of course avoid using Named Functions by doing temporary bindings with LAMBDA(varName, expr)(varValue) if you want to use varName in an expression. (You can define this pattern as a Named Function =cont(varValue) to invert the order of the parameters to keep code cleaner, or not.)
Whenever I use varName = varValue, write that instead.
to see if a match succeeds, use ISNA(...)
It would look something like:
Named Function allMatches(text, regex):
UNTESTED:
LAMBDA(helper,
  OFFSET(
    helper({"ignore"}, text, regex),
    0,1)
)(
  Y(LAMBDA(helperItself,
    LAMBDA(results, partialText,
      LAMBDA(currentMatch,
        IF(ISNA(currentMatch),
          results,
          LAMBDA(textWithoutMatch,
            helperItself({results,currentMatch}, textWithoutMatch)
          )(
            SUBSTITUTE(partialText, currentMatch, "", 1)
          )
        )
      )(
        REGEXEXTRACT(partialText, regex)
      )
    )
  ))
)
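The recursive idea is easier to follow in an ordinary language. Below is a minimal sketch in Python, not a translation of the untested formula: the function name is hypothetical, re.search stands in for REGEXEXTRACT, and str.replace with a count of 1 stands in for SUBSTITUTE(text, match, "", 1). An empty match is treated as failure so the recursion always terminates.

```python
import re


def all_matches_recursive(text, regex, results=()):
    """Extract the first match, remove it from the text, and recurse."""
    m = re.search(regex, text)
    if m is None or m.group(0) == "":
        return list(results)
    # Remove only the first occurrence of the matched text.
    remaining = text.replace(m.group(0), "", 1)
    return all_matches_recursive(remaining, regex, results + (m.group(0),))


print(all_matches_recursive("A1-Nutrition;A2-ActPhysiq;H3-Endocrino", r"\d-"))
```

Note that removing a match glues its neighbours together, which could in principle create a match that was not in the original text; the delimiter method does not have that problem.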

Trying to build a regular expression to check pattern

a) Start and end with a number
b) Hyphen should start and end with a number
c) Comma should start and end with a number
d) Range of number should be from 1-31
[Edit: Need this rule in the regex, thanks Ed-Heal!]
e) If a number starts with a hyphen (-), it cannot end with any other character other than a comma AND follow all rules listed above.
E.g. 2-2,1 OR 2,2-1 is valid while 1-1-1-1 is not valid
E.g.
a) 1-5,5,15-29
b) 1,28,1-31,15
c) 15,25,3 [Edit: Replaced 56 with 3, thanks for pointing it out Brian!]
d) 1-24,5-6,2-9
Tried this but it passes even if the string starts with a comma:
/^[0-9]*(?:-[0-9]+)*(?:,[0-9]+)*$/
How about this? This will check rules a, b and c, at least, but does not check rule d.
/^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$/
If you need to ensure that all the numbers are in the range 1-31, then the expression will get a whole lot uglier:
/^([1-9]|[12][0-9]|3[01])(-([1-9]|[12][0-9]|3[01]))?(,([1-9]|[12][0-9]|3[01])(-([1-9]|[12][0-9]|3[01]))?)*$/
Note that your example c contains a number, 56, that does not fall within the range 1-31, so it will not pass the second expression.
try this
^\d+(-\d+)?(,\d+(-\d+)?)*$
Here are my workings
Numbers:
0|([1-9][0-9]*) call this expression A. Note that this expression treats zero as a special case and prevents numbers starting with a zero, e.g. 0000001234
Number or a range:
A|(A-A) call this expression B (i.e. (0|([1-9][0-9]*))|((0|([1-9][0-9]*))-(0|([1-9][0-9]*))))
Comma operator
B(,B)*
Putting this together should do the trick and we get
((0|([1-9][0-9]*))|((0|([1-9][0-9]*))-(0|([1-9][0-9]*))))(,((0|([1-9][0-9]*))|((0|([1-9][0-9]*))-(0|([1-9][0-9]*)))))*
You can abbreviate this with \d for [0-9]
The other approaches have not restricted the allowed range of numbers. This allows 1 through 31 only, and seems simpler than some of the monstrosities people have come up with ...
^([12][0-9]?|3[01]?|[4-9])([-,]([12][0-9]?|3[01]?|[4-9]))*$
There is no check for sensible ranges; adding that would make the expression significantly more complex. In the end you might be better off with a simpler regex and implementing sanity checks in code.
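As a sketch of the "simpler regex plus checks in code" route, the Python below validates only the shape with a regex and leaves the 1-31 limit to ordinary integer comparisons. The function name is_valid_day_list is made up, and the sketch assumes that a reversed range such as 2-1 is acceptable, since the question lists 2,2-1 as valid.

```python
import re

# One item is a number or a number-number range; leading zeros are rejected.
ITEM = r"[1-9]\d?(?:-[1-9]\d?)?"
STRUCTURE = re.compile(rf"{ITEM}(?:,{ITEM})*")


def is_valid_day_list(s):
    """Check the shape with a regex, then the 1..31 limit in code."""
    if not STRUCTURE.fullmatch(s):
        return False
    return all(1 <= int(n) <= 31 for n in re.findall(r"\d+", s))


for candidate in ["1-5,5,15-29", "15,25,3", "1-1-1-1", ",1", "15,25,56"]:
    print(candidate, is_valid_day_list(candidate))
```

A start-before-end check, if it were wanted, would be one more comparison in the loop rather than another layer of alternation in the pattern.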
I propose the following regex:
(?<number>[1-9]|[12]\d|3[01]){0}(?<thing>\g<number>-\g<number>|\g<number>){0}^(\g<thing>,)*\g<thing>$
It looks awful but it isn't :) In fact the construction (?<name>...){0} allows us to define a named regex and to say that it doesn't match where it is defined. Thus I defined a pattern for numbers called number and a pattern for what I called a thing i.e. a range or number called thing. Next I know that your expression is a sequence of those things, so I use the named regex thing to build it with the construct \g<thing>. It gives (\g<thing>,)*\g<thing>. That's easy to read and understand. If you allow whitespaces to be non significant in your regex, you could even indent it like this:
/(?<number>[1-9]|[12]\d|3[01]){0}
(?<thing>\g<number>-\g<number>|\g<number>){0}
^(\g<thing>,)*\g<thing>$/x
I tested it with Ruby 1.9.2. Your regex engine should support named groups to allow that kind of clarity.
irb(main):001:0> s1 = '1-5,5,15-29'
=> "1-5,5,15-29"
irb(main):002:0> s2 = '1,28,1-31,15'
=> "1,28,1-31,15"
irb(main):003:0> s3 = '15,25,3'
=> "15,25,3"
irb(main):004:0> s4 = '1-24,5-6,2-9'
=> "1-24,5-6,2-9"
irb(main):005:0> r = /(?<number>[1-9]|[12]\d|3[01]){0}(?<thing>\g<number>-\g<number>|\g<number>){0}^(\g<thing>,)*\g<thing>$/
=> /(?<number>[1-9]|[12]\d|3[01]){0}(?<thing>\g<number>-\g<number>|\g<number>){0}^(\g<thing>,)*\g<thing>$/
irb(main):006:0> s1.match(r)
=> #<MatchData "1-5,5,15-29" number:"29" thing:"15-29">
irb(main):007:0> s2.match(r)
=> #<MatchData "1,28,1-31,15" number:"15" thing:"15">
irb(main):008:0> s3.match(r)
=> #<MatchData "15,25,3" number:"3" thing:"3">
irb(main):009:0> s4.match(r)
=> #<MatchData "1-24,5-6,2-9" number:"9" thing:"2-9">
irb(main):010:0> '1-1-1-1'.match(r)
=> nil
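Engines without \g<name> subexpression calls can get much of the same readability by composing the pattern from named string fragments. A minimal sketch in Python, whose standard re module has no \g<name> calls (the constant and function names are my own, not from the answer):

```python
import re

# "number": 1..31, longer alternatives first.
NUMBER = r"(?:[12]\d|3[01]|[1-9])"
# "thing": a number or a number-number range.
THING = rf"{NUMBER}(?:-{NUMBER})?"
# The whole input: things separated by commas.
DAY_LIST = re.compile(rf"{THING}(?:,{THING})*")


def matches_day_list(s):
    return DAY_LIST.fullmatch(s) is not None


for candidate in ["1-5,5,15-29", "1,28,1-31,15", "15,25,3", "1-24,5-6,2-9", "1-1-1-1"]:
    print(candidate, matches_day_list(candidate))
```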
Using the same logic as in my previous answer, but limiting the range:
A becomes [1-9]|[12]\d|3[01]
B becomes ([1-9]|[12]\d|3[01])|(([1-9]|[12]\d|3[01])-([1-9]|[12]\d|3[01]))
Overall expression
(([1-9]|[12]\d|3[01])|(([1-9]|[12]\d|3[01])-([1-9]|[12]\d|3[01])))(,(([1-9]|[12]\d|3[01])|(([1-9]|[12]\d|3[01])-([1-9]|[12]\d|3[01]))))*
A more compact regex for this could be:
^(?'int'[12]\d|3[01]|[1-9])((,\g'int')|(-\g'int'(?=$|,)))*$

A regex to match between delimiters except when there is a colon that is not between double quotes?

This one is a little bit complicated and I'm not sure if it can be done.
The regex needs to match everything between a , (comma) or [] (square brackets).
It must not match if there is a :
And now the tricky part.
If the : is between " " it can match.
I managed to create a regex that does everything except the last.
(?<=[[,])[^:]+?(?=[],])
So this is what it needs to match.
[ ItemName:Data, More Data, With a number "as: " item name]
I'm going to keep testing. Lets see if someone solves it.
It sounds like you're trying to specify a language that's really too complicated to parse using only regular expressions. Here's a pattern that matches what you've described, but probably won't work perfectly. It doesn't use lookbehinds, so you need to select the first match group to get the contents.
/[\[,](("[^"\]]*"|[^:\[])*?)[\]\,]/
/[\[,]          # Opening bracket or comma.
 (("[^"\]]*"    # Anything not including the closing bracket, in quotes...
   |[^:\[]      # or not including the colon...
  )*?)          # repeated any number of times.
 [\]\,]/x       # Closing bracket or comma.
An example usage in Python:
import re
pattern = re.compile(r"""[\[,](("[^"\]]*"|[^:\[])*?)[\]\,]""")
for match in pattern.finditer('[1 2 3] [4 5] [6 : 7], "8 : 9", '):
    print(match.group(1))
Producing output:
1 2 3
4 5
"8 : 9"
I have a good deal of experience using (Perl) regexps in practice, so let me share it. If you are handling complex cases like this, it is almost always best to do it step by step, unless you are in special circumstances (for example, speed of execution is crucial).
So in this case I would simply do it in two steps. First explode the data into chunks, i.e. something like (depending on your language)
split(/[][,]/)
and then accept or remove individual parts. In this case just remove parts which match this expression
/^([^"]*:.*|.*:[^"])$/
i.e. parts which include a colon not surrounded by double quotes.
Clearly this does not solve all the cases, like With a number "as: " : "item" name, but I agree with Jeremy that if you are trying to implement a language with a complicated syntax, then it might not be the right thing to just throw a few regexps at it without deeper analysis (i.e. answering what exactly it should accept in weird cases like [ 1:1, 2":"2,3":":3,4":":":"4,5":":"5], ...) and using an appropriate approach to solve it (a recursive syntax parser).
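The two-step approach could look roughly like this in Python. This is a sketch under my own assumptions, not the exact filter above: the function name is hypothetical, and instead of the rejection regex it strips quoted segments from each chunk and then looks for a leftover colon. Like the split above, it assumes that delimiters never occur inside quotes.

```python
import re


def fields_without_bare_colon(text):
    """Split on [ ] and , then drop chunks with a colon outside double quotes."""
    kept = []
    for part in re.split(r"[\[\],]", text):
        chunk = part.strip()
        if not chunk:
            continue
        # Remove "quoted" segments; any colon still present is a bare one.
        outside_quotes = re.sub(r'"[^"]*"', "", chunk)
        if ":" not in outside_quotes:
            kept.append(chunk)
    return kept


sample = '[ ItemName:Data, More Data, With a number "as: " item name]'
print(fields_without_bare_colon(sample))
```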