Regexp to find key-value pairs separated by a colon

I'm having trouble coming up with a regular expression for a string in the given form:
123123<key:value><key:value>,21313<key:value><key:value>
where the key:value pairs are optional, but no single key:value pair may contain two colons.
I've gotten this far:
^((\d+)(<(.+?):(.+?)>)*)(,\d+)(<(.+?):(.+?)>)*$
Some valid strings:
123131
123131, 123131, 1213313
12313<key:value>
232133<key:value><key:value>,232133<key:value><key:value>
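A quick way to see where the attempt above falls short is to run it against the listed valid strings. This is a small Python sketch using the standard re module; the pattern is copied verbatim, and the sample list is just the four examples from the question.

```python
import re

# The asker's attempt, copied verbatim from the question.
ATTEMPT = re.compile(r"^((\d+)(<(.+?):(.+?)>)*)(,\d+)(<(.+?):(.+?)>)*$")

samples = [
    "123131",
    "123131, 123131, 1213313",
    "12313<key:value>",
    "232133<key:value><key:value>,232133<key:value><key:value>",
]

for s in samples:
    # (,\d+) is mandatory and occurs exactly once, so a single number,
    # a list with several commas, or a space after the comma all fail.
    print(f"{s!r}: {bool(ATTEMPT.match(s))}")
```

Only the last sample matches, because the (,\d+) group is required and cannot repeat.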

Try this:
^((\d+)(<(.+?):(.+?)>){0,2})(,\s*((\d+)(<(.+?):(.+?)>){0,2}))*$
Depending on which groups you don't want to capture, you can change ( ) to (?: ).
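Running this answer's pattern against the question's samples in Python (standard re module) shows that it accepts all four. It also shows that the lazy .+? still lets a second colon into one pair, because the value part keeps expanding until it reaches the >. The input "12313<key:val:ue>" is a made-up negative case, not from the question.

```python
import re

# Pattern from the answer, copied verbatim.
ANSWER = re.compile(
    r"^((\d+)(<(.+?):(.+?)>){0,2})(,\s*((\d+)(<(.+?):(.+?)>){0,2}))*$"
)

samples = [
    "123131",
    "123131, 123131, 1213313",
    "12313<key:value>",
    "232133<key:value><key:value>,232133<key:value><key:value>",
]

for s in samples:
    print(f"{s!r}: {bool(ANSWER.match(s))}")  # True for all four

# Hypothetical negative case: two colons inside one pair.
# Group 4 is the first key, group 5 the first value.
m = ANSWER.match("12313<key:val:ue>")
print(m.group(4), m.group(5))  # key val:ue
```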

Try using this:
^(\d+(<.+:.+>){1,2})(,\d+(<.+:.+>){1,2})*$
Hope it helps.
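The same kind of check for this pattern (Python re module, samples from the question) suggests why it was not accepted. {1,2} makes at least one <key:value> pair mandatory after every number, so bare numbers are rejected even though the question says the pairs are optional.

```python
import re

# Pattern from this answer, copied verbatim.
ANSWER = re.compile(r"^(\d+(<.+:.+>){1,2})(,\d+(<.+:.+>){1,2})*$")

samples = [
    "123131",
    "123131, 123131, 1213313",
    "12313<key:value>",
    "232133<key:value><key:value>,232133<key:value><key:value>",
]

results = {s: bool(ANSWER.match(s)) for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")  # False, False, True, True
```

Because .+ is greedy and unrestricted, this pattern would also accept a pair such as <a:b:c>.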

Thanks a lot for your responses, but none of them seem to do exactly what I'm looking for. I think the easiest thing may be to follow OrangeDog's suggestion, considering maintainability as well...
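If a single pattern is still wanted, here is one possible tightening. It is an assumption, not something proposed in the thread, and OrangeDog's suggestion itself isn't shown here. Keys and values use the class [^:<>,]+ instead of .+?, so a pair can no longer hold a second colon (or a comma or stray angle bracket). Any number of pairs is allowed rather than at most two, plus optional whitespace after each comma to cover the spaced sample. This Python sketch (standard re module; parse is a hypothetical helper name) validates the whole line first and then pulls the pairs out with findall, which keeps each pattern small.

```python
import re

# Assumed grammar (not from the thread):
#   record := digits ( "<" key ":" value ">" )*
#   line   := record ( "," optional-whitespace record )*
# Keys and values may not contain ':', '<', '>' or ','.
PAIR = r"<[^:<>,]+:[^:<>,]+>"
RECORD = rf"\d+(?:{PAIR})*"
LINE = re.compile(rf"{RECORD}(?:,\s*{RECORD})*")
PAIR_GROUPS = re.compile(r"<([^:<>,]+):([^:<>,]+)>")


def parse(line):
    """Return [(number, [(key, value), ...]), ...], or None if invalid."""
    if not LINE.fullmatch(line):
        return None
    records = []
    for record in line.split(","):
        record = record.strip()
        number = re.match(r"\d+", record).group()
        records.append((number, PAIR_GROUPS.findall(record)))
    return records


print(parse("232133<key:value><key:value>,232133<a:b>"))
print(parse("12313<key:val:ue>"))  # None: two colons in one pair
```

Whether keys may contain spaces or other punctuation is not specified in the question, so the character class is the part most likely to need adjusting.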

Related

How do I make a regex match a query string?

I'm working on a simple router, and I need to be able to identify if there's a query-like structure at the end of a URL address.
What I came up with:
(\?([^&=]+)=([^&=]+)&?)+$
It simply does not work! It works for a single pair, e.g. xxx?foo=bar, but not for two: xxx?foo=bar&greeting=hello won't match.
What am I doing wrong? And also: Is there a better solution to accomplish this?
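Your repeated group starts with \?, so every repetition needs its own question mark, and only the first pair has one. Instead, match one key-value pair, ([^&=]+)=([^&=]+), preceded by a question mark, \?([^&=]+)=([^&=]+), followed by zero or more key-value pairs, each preceded by an ampersand, (?:&([^&=]+)=([^&=]+))*. Put together: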
\?([^&=]+)=([^&=]+)(?:&([^&=]+)=([^&=]+))*$
Demo: https://regex101.com/r/OHdRHS/1
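A short Python check (standard re and urllib.parse modules) illustrates two things. The original pattern fails on two pairs, while the corrected one matches. However, a repeated capture group only keeps its last repetition, so with three or more pairs the middle ones are not available as groups. For the "better solution" part of the question, and assuming the router is in Python, urllib.parse already splits a query string into pairs.

```python
import re
from urllib.parse import parse_qsl, urlsplit

original = re.compile(r"(\?([^&=]+)=([^&=]+)&?)+$")
fixed = re.compile(r"\?([^&=]+)=([^&=]+)(?:&([^&=]+)=([^&=]+))*$")

print(bool(original.search("xxx?foo=bar")))                 # True
print(bool(original.search("xxx?foo=bar&greeting=hello")))  # False

m = fixed.search("xxx?foo=bar&greeting=hello&lang=en")
# Groups 3 and 4 hold only the LAST repeated pair; greeting=hello is lost.
print(m.groups())  # ('foo', 'bar', 'lang', 'en')

# Library alternative: split off the query string, then parse it into pairs.
query = urlsplit("xxx?foo=bar&greeting=hello&lang=en").query
print(parse_qsl(query))  # [('foo', 'bar'), ('greeting', 'hello'), ('lang', 'en')]
```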

Separate values out of semicolon delimited string with regular expression

I need help with a regular expression.
My searches haven't turned up anything useful so far. My string looks like:
E32;E223;E0;A1023
I would like to get the values E32 and E223 and E0 and A1023.
What is the best regex syntax for it?
Any help would be appreciated.
Thanks,
You may use this: [^;]+ or \w+.
This will give you every semicolon-separated token and skip empty ones. (\w+ also works here, since these tokens contain only letters and digits.)
Edit: you can, and should, also use "E32;E223;E0;A1023".split(";") as falsetru mentioned in the comments (provided your language supports it, which it probably does).
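In Python (standard re module, sample string from the question), the three suggestions give the same tokens. The string "E32;;E0" is a made-up input showing the one difference: findall skips empty tokens, while split keeps them as empty strings.

```python
import re

text = "E32;E223;E0;A1023"

print(re.findall(r"[^;]+", text))  # ['E32', 'E223', 'E0', 'A1023']
print(re.findall(r"\w+", text))    # same result for this input
print(text.split(";"))             # same result, no regex needed

# Difference with an empty field: findall skips it, split keeps it.
gappy = "E32;;E0"
print(re.findall(r"[^;]+", gappy))  # ['E32', 'E0']
print(gappy.split(";"))             # ['E32', '', 'E0']
```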

Combining regex groups into one group or excluding a character from a match

I have this string, and I need to get the datetime out of it by using regex. I have little to no experience with regex and am stuck.
As an example, take this string: Vic-nc_20150406_0100
I want to get the following result: 201504060100
How am I to accomplish this? So far I've come up with this expression: ([0-9]{8})_([0-9]{4}), although the result is two groups (20150406 and 0100).
Another expression I've come up with is ([0-9]{8}_[0-9]{4}), now the result is 20150406_0100.
I either need to combine the groups or filter out the [_] somehow. Can anybody help me out?
Thanks in advance!
If you want to do a replacement, just concatenate the values of the two groups.
Find: (\d{8})_(\d{4})
Replace: \1\2 or $1$2, depending on your programming language.
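Assuming Python's re module, where the replacement syntax is \1\2, the two readings of this answer look like this. A substitution keeps everything outside the match, so if only the datetime is wanted, concatenating the groups of a search result is the more direct route.

```python
import re

name = "Vic-nc_20150406_0100"
pattern = re.compile(r"(\d{8})_(\d{4})")

# Option 1: search, then concatenate the two groups.
m = pattern.search(name)
print(m.group(1) + m.group(2))  # 201504060100

# Option 2: substitute \1\2; only the matched part is rewritten,
# so the "Vic-nc_" prefix is kept.
print(pattern.sub(r"\1\2", name))  # Vic-nc_201504060100
```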

Extract text between two given strings

Hopefully someone can help me out. I've been all over Google.
I'm doing zone OCR on some documents and want to extract some text with regex. It is always like this:
"Til: Name Name Name org.nr 12323123".
I want to extract the name part. It can be 1-4 names, but "Til:" and "org.nr" are always before and after it.
Anyone?
If you can't use capturing groups (check your documentation) you can try this:
(?<=Til:).*?(?=org\.nr)
This solution uses lookbehind and lookahead assertions, which are not supported by every regex flavour. Where they are supported, this regex returns only the part you want: the assertions only check that their patterns are present and do not include them in the match.
Use the pattern:
Til:(.*)org\.nr
Then take the first capture group (index 1; index 0 is the whole match) to get the content between the parentheses.
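Both answers can be tried in Python, whose re module supports fixed-width lookbehind such as (?<=Til:). Either way the result includes the surrounding spaces, hence the .strip(). One caveat: the greedy (.*) in the second pattern runs to the last occurrence of org.nr if a line happens to contain several.

```python
import re

line = "Til: Name Name Name org.nr 12323123"

# Lookaround version: the match itself is the text between the markers.
m = re.search(r"(?<=Til:).*?(?=org\.nr)", line)
print(repr(m.group()))  # ' Name Name Name '

# Capturing-group version: group 1 holds the text, group 0 the whole match.
m = re.search(r"Til:(.*)org\.nr", line)
print(m.group(1).strip())  # Name Name Name
```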

Regex, how to select all items outside of selection group

I'm a Regex noob and am pretty sure I'm not going about this in the most efficient way - wanted to get some advice.
I have a Regex expression ((\w+\b.*?){100}){1} which selects the first 100 words of my string, the length of which varies.
What I want is to select the entire string except for the first 100 words.
Is there syntax I can add to my current expression to do this, or am I better off trying to select the rest of the text directly?
Also, if anyone has any good resources for improving my regex knowledge, I'd be very grateful. Thus far I've found http://gskinner.com/RegExr/ to be very helpful.
Thanks in advance!
If you use the following pattern, you can refer to everything after the first 100 words as group 3, written $3.
This one will treat hyphenated words as one word.
(\w+(-\w+|\b).*?){100}(.*)
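A Python sketch (standard re module) of the pattern above on a hypothetical 105-word input, checking that group 3 holds only the words after the hundredth. The re.DOTALL flag is an addition not in the answer, included on the assumption that the real text may span several lines; without it, . stops at line breaks.

```python
import re

# Pattern from the answer. Group 1 = one word plus trailing text,
# group 2 = the hyphen/word-boundary alternative, group 3 = everything after.
# re.DOTALL is an assumed addition so that . can cross line breaks.
REST_AFTER_100 = re.compile(r"(\w+(-\w+|\b).*?){100}(.*)", re.DOTALL)

# Hypothetical input: 105 words w0 .. w104 separated by single spaces.
text = " ".join(f"w{i}" for i in range(105))

m = REST_AFTER_100.match(text)
print(repr(m.group(3)))  # ' w100 w101 w102 w103 w104'
```

If hyphen handling doesn't matter, " ".join(text.split()[100:]) is a non-regex alternative, although it normalises whitespace.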