Regex for KeyValue pattern


I have to check whether a string follows one of these patterns:
Field1=value1
Field1=value1,Field2=value2
7645a=fds23,Field2=dsd$
The literal words 'field1' and 'value1' don't matter. What matters is that the string has the form something=something and, if there is more than one pair, that the pairs are separated by commas.
I came up with the following regex:
((\w+)[^=])=((\w+)[^=])
"Match any one or more word except if it has =, then there should be an = and then match any one or more word except if it has =".
The thing is, it also matches the comma, but I think that is because of \w. I don't think this is correct.
I'm using https://regexr.com/ to check for the correct regular expression.

If you need to match symbols like $, then don't use \w. This satisfies all your conditions:
(?:([^,=\n]+)=([^,=\n]+))(?:,([^,=\n]+)=([^,=\n]+))*
Explanation:
(?: // Begin non-capturing group (first key=value pair)
( // Begin capturing group (key)
[^,=\n]+ // Match one or more characters that aren't comma, equals, or new line
) // End capturing group (key)
= // Equals
( // Begin capturing group (value)
[^,=\n]+ // Match one or more characters that aren't comma, equals, or new line
) // End capturing group (value)
) // End non-capturing group (first key=value pair)
(?: // Begin non-capturing group (additional key=value pairs)
, // Starts with comma (otherwise entire group fails)
( // Begin capturing group (key)
[^,=\n]+ // Match one or more characters that aren't comma, equals, or new line
) // End capturing group (key)
= // Equals
( // Begin capturing group (value)
[^,=\n]+ // Match one or more characters that aren't comma, equals, or new line
) // End capturing group (value)
) // End non-capturing group (additional key=value pairs)
* // Match 0 or more of the additional key value pairs
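A minimal Java sketch of how the pattern above might be used for validation (the class and method names are hypothetical). It uses Matcher.matches(), which must consume the whole input, so the pattern needs no ^ and $ anchors. A repeated capture group only remembers its last repetition, so groups 3 and 4 hold only the final pair. For that reason the sketch re-scans a valid string with a single-pair pattern to collect every key and value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValueCheck {
    // The answer's pattern: one key=value pair, then zero or more ",key=value" pairs.
    private static final Pattern LIST = Pattern.compile(
            "(?:([^,=\\n]+)=([^,=\\n]+))(?:,([^,=\\n]+)=([^,=\\n]+))*");
    // A single pair, used to collect every key and value from a validated string.
    private static final Pattern PAIR = Pattern.compile("([^,=\\n]+)=([^,=\\n]+)");

    // matches() must consume the entire input, so no ^...$ anchors are needed.
    static boolean isValid(String s) {
        return LIST.matcher(s).matches();
    }

    // Returns the pairs in input order, or an empty map if the string is invalid.
    static Map<String, String> parse(String s) {
        Map<String, String> result = new LinkedHashMap<>();
        if (!isValid(s)) {
            return result;
        }
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Field1=value1",
            "Field1=value1,Field2=value2",
            "7645a=fds23,Field2=dsd$",
            "Field1=value1,",
            "a=b=c"
        };
        for (String s : inputs) {
            System.out.println(s + " -> valid=" + isValid(s) + " pairs=" + parse(s));
        }
    }
}
```

With find() instead of matches(), the same pattern would also accept input such as "Field1=value1,", because find() only needs to match the valid prefix.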

Related

How to match in a single/common Regex Group matching or based on a condition

I would like to extract two different test strings /i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
and
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8
with a single regex, capturing the desired part of each in group 1.
Using the regex ^.[i,na,fm,d]+\/(.+([,\/])?(\/|.+=.+,\/).+\/[,](live.([^,]).).+_)?.+(640).*$ I get the desired result for the second string: int/2021/11/25/,live_20211125_215206_
However, the first string does not match in group 1. The expected extraction for the first string is int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45
Any pointers would be appreciated.
Thanks!

If you want both values in group 1, you can use:
^/(?:[id]|na|fm)/([^/\s]*/\d{4}/\d{2}/\d{2}/\S*?)(?:/,|[^_]+_)640(?:\D|$)
The pattern matches:
^ Start of string
/ Match literally
(?:[id]|na|fm) Match one of i, d, na or fm
/ Match literally
( Capture group 1
[^/\s]*/ Match zero or more chars other than / or a whitespace char, then match /
\d{4}/\d{2}/\d{2}/ Match a date like pattern
\S*? Match optional non whitespace chars, as few as possible
) Close group 1
(?:/,|[^_]+_) Match either /, or 1+ chars other than _ and then match _
640 Match literally
(?:\D|$) Match either a non-digit or assert the end of the string
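A minimal Java sketch that applies the same pattern to the two sample paths and returns group 1 (the class name is hypothetical). java.util.regex supports the lazy \S*? and the (?:\D|$) alternation used here. The backslashes are doubled for Java string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PathPrefix {
    static final String FIRST =
            "/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,"
            + "960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive";
    static final String SECOND =
            "/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,"
            + "live_20211125_215206_sendeton_480x270-50p-700kbit,"
            + "live_20211125_215206_sendeton_960x540-50p-1600kbit,"
            + "live_20211125_215206_sendeton_1280x720-50p-3200kbit,"
            + "live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8";

    // The answer's pattern, with backslashes doubled for a Java string literal.
    private static final Pattern P = Pattern.compile(
            "^/(?:[id]|na|fm)/([^/\\s]*/\\d{4}/\\d{2}/\\d{2}/\\S*?)(?:/,|[^_]+_)640(?:\\D|$)");

    // Returns group 1, or null when the path does not match.
    static String prefix(String path) {
        Matcher m = P.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(prefix(FIRST));  // int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45
        System.out.println(prefix(SECOND)); // int/2021/11/25/,live_20211125_215206_
    }
}
```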

We can't know all the rules for how the strings you are matching are constructed, but for just the two example strings provided:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	var re = regexp.MustCompile(`(?m)(\/i/int/\d{4}/\d{2}/\d{2}/.*)(?:\/,|_[\w_]+)640`)
	var str = `
/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8`
	match := re.FindAllStringSubmatch(str, -1)
	for _, val := range match {
		fmt.Println(val[1])
	}
}

Pattern match for (length)%code with before length

I have a pattern like x%c, where x is a single-digit integer and c is an alphanumeric code of length x. % is just a separator between the length and the code.
For instance, 2%74 is valid since 74 has 2 digits. Similarly, 1%8 and 4%3232 are also valid.
I have tried a regex of the form ^([0-9])(%)([A-Z0-9]){\1}, where I am trying to limit the length of the code by the value of group 1. Apparently it does not work, since the group is treated as a string, not a number.
If I change the above regex to ^([0-9])(%)([A-Z0-9]){2}, it works for 2%74, but it is of no use, since the length has to be controlled by the first group, not by a fixed digit.
If it is not possible with a regex, is there a better approach in Java?

One way is to use 2 capture groups, convert the first group to an int, and compare it with the number of characters in the second group.
\b(\d+)%(\d+)\b
\b Word boundary
(\d+) Capture group 1, match 1+ digits
% Match literally
(\d+) Capture group 2, match 1+ digits
\b Word boundary
For example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String regex = "\\b(\\d+)%(\\d+)\\b";
        Pattern pattern = Pattern.compile(regex);
        String[] strings = { "2%74", "1%8", "4%3232", "5%123456", "6%0" };
        for (String s : strings) {
            Matcher matcher = pattern.matcher(s);
            if (matcher.find()) {
                // Compare the declared length (group 1) with the actual code length (group 2).
                if (Integer.parseInt(matcher.group(1)) == matcher.group(2).length()) {
                    System.out.println("Match for " + s);
                } else {
                    System.out.println("No match for " + s);
                }
            }
        }
    }
}
Output
Match for 2%74
Match for 1%8
Match for 4%3232
No match for 5%123456
No match for 6%0
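Because x is a single digit, the set of allowed lengths is finite, so a pure regex is possible after all: enumerate one alternative per length. This is a minimal Java sketch under stated assumptions. The class name is hypothetical, x is assumed to range from 1 to 9, and the code is assumed to use only the characters of the question's [A-Z0-9] class. The class also accepts letters in the code, which the \d+ version above does not.

```java
import java.util.regex.Pattern;

final class LengthPrefixedCode {
    // Assumption: x is 1..9 and the code uses only A-Z and 0-9.
    private static final Pattern CODE = buildPattern();

    // Builds ^(?:1%[A-Z0-9]{1}|2%[A-Z0-9]{2}|...|9%[A-Z0-9]{9})$
    private static Pattern buildPattern() {
        StringBuilder sb = new StringBuilder("^(?:");
        for (int len = 1; len <= 9; len++) {
            if (len > 1) {
                sb.append('|');
            }
            sb.append(len).append("%[A-Z0-9]{").append(len).append('}');
        }
        sb.append(")$");
        return Pattern.compile(sb.toString());
    }

    static boolean isValid(String s) {
        return CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2%74", "1%8", "4%3232", "3%A1B", "5%123456", "6%0", "2%7"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```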

Regex: Deal \r\n as normal word

I'm doing a small project that counts the functions in C++ files (.cpp).
I used the following Regex as "function pattern":
/[a-z|A-Z]+\s*::\s*~?[a-z|A-Z]+\(.*\)/gm
It works in most cases, but fails when there are line breaks inside the ().
void CXYZRScanPanel::OnPrepareScanning()
{
    //This one is ok.
}
void CXYZRScanPanel::OnPrepareScanning(int k)
{
    //This one is ok.
}
void CXYZRScanPanel::OnPrepareScanning(int k,
                                       int j)
{
    //This one fails.
}
I'm wondering whether there is anything "stronger" than .* that can also match \r\n.
Thanks for any help.
If there is no such thing, I will probably remove all \r\n within () before doing the search.

You could write the pattern using a negated character class, [^()], which matches any char except ( and ), including a newline.
Note that inside a character class, | is not alternation but a literal |, so [a-z|A-Z] also matches |. You can omit it.
[a-zA-Z]+\s*::\s*~?[a-zA-Z]+(\([^()]*\))
The pattern matches:
[a-zA-Z]+ Match 1+ chars in the range a-zA-Z
\s*::\s* Match :: between optional whitespace chars
~? Match an optional ~ char
[a-zA-Z]+ Match 1+ chars in the range a-zA-Z
( Capture group 1
\([^()]*\) Match zero or more chars other than ( and ) between parentheses
) Close group 1
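A minimal Java sketch of counting definitions with this pattern (the class name and sample text are hypothetical). In Java, [^()]* matches \r and \n, and \s covers both as well, so the definition with a wrapped parameter list is found too. Note that the pattern also matches qualified calls inside function bodies, such as std::max(a, b), so the count is an approximation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CppFunctionCounter {
    // The answer's pattern; [^()]* also spans line breaks inside the parentheses.
    private static final Pattern FUNC =
            Pattern.compile("[a-zA-Z]+\\s*::\\s*~?[a-zA-Z]+(\\([^()]*\\))");

    static final String SAMPLE =
            "void CXYZRScanPanel::OnPrepareScanning()\r\n{\r\n}\r\n"
            + "void CXYZRScanPanel::OnPrepareScanning(int k)\r\n{\r\n}\r\n"
            + "void CXYZRScanPanel::OnPrepareScanning(int k,\r\n    int j)\r\n{\r\n}\r\n";

    // Counts non-overlapping matches of the pattern in the source text.
    static int count(String source) {
        Matcher m = FUNC.matcher(source);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(SAMPLE)); // 3
    }
}
```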

regex match longest substring with equal first and last char

/(\w)(\w*)\1/
For the string "mgntdygtxrvxjnwksqhxuxtrv", I match "txrvxjnwksqhxuxt" (using Ruby), but not the even longer valid substring "tdygtxrvxjnwksqhxuxt".

For a given string, here are two ways to find the longest substring that begins and ends with the same character.
Suppose
str = "mgntdygtxrvxjnwksqhxuxtrv"
Use a regular expression
r = /(.)(?=(.*\1))/
str.gsub(r).map { $1 + $2 }.max_by(&:length)
#=> "tdygtxrvxjnwksqhxuxt".
When, as here, the regular expression contains capture groups, it can be more convenient to use String#gsub without a second argument or block (in which case it returns an enumerator, which can be chained) than String#scan, whose documentation notes: "If the pattern contains groups, each individual result is itself an array containing one entry per group." Here gsub performs no substitutions; it merely generates the matches of the regular expression.
The regular expression can be made self-documenting by writing it in free-spacing mode.
r = /
(.) # match any char and save to capture group 1
(?= # begin a positive lookahead
(.*\1) # match >= 0 characters followed by the contents of capture group 1
) # end the positive lookahead
/x # free-spacing regex definition mode
The following intermediate calculation is performed:
str.gsub(r).map { $1 + $2 }
#=> ["gntdyg", "ntdygtxrvxjn", "tdygtxrvxjnwksqhxuxt", "txrvxjnwksqhxuxt",
# "xrvxjnwksqhxux", "rvxjnwksqhxuxtr", "vxjnwksqhxuxtrv", "xjnwksqhxux",
# "xux"]
Notice that this does not enumerate all substrings beginning and ending with the same character (because .* is greedy). It does not generate, for example, the substring "xrvx".
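The same lookahead trick carries over to java.util.regex. It supports backreferences inside lookaheads and keeps groups captured inside a successful lookahead. A minimal sketch that keeps the longest candidate (the class name is hypothetical). Because the pattern needs a later occurrence of the same character, it returns an empty string when no character repeats.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongestSameEnds {
    // (.) consumes one char; the lookahead captures, without consuming, the
    // longest stretch up to that char's last later occurrence (greedy .*).
    private static final Pattern P = Pattern.compile("(.)(?=(.*\\1))");

    static String longest(String s) {
        String best = "";
        Matcher m = P.matcher(s);
        while (m.find()) {
            String candidate = m.group(1) + m.group(2);
            if (candidate.length() > best.length()) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longest("mgntdygtxrvxjnwksqhxuxtrv")); // tdygtxrvxjnwksqhxuxt
    }
}
```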
Do not use a regular expression
v = str.each_char.with_index.with_object({}) do |(c,i),h|
  if h.key?(c)
    h[c][:size] = i - h[c][:start] + 1
  else
    h[c] = { start: i, size: 1 }
  end
end.max_by { |_,h| h[:size] }.last
str[v[:start], v[:size]]
#=> "tdygtxrvxjnwksqhxuxt"
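A Java translation of the non-regex idea, sketched with a hypothetical class name. It records the first index of each character, and at every later occurrence of that character it compares the span from that first index. It is a single pass, and like the Ruby version it falls back to a one-character substring when nothing repeats.

```java
import java.util.HashMap;
import java.util.Map;

final class LongestSameEndsNoRegex {
    static String longest(String s) {
        Map<Character, Integer> first = new HashMap<>(); // char -> first index seen
        int bestStart = 0;
        int bestLen = s.isEmpty() ? 0 : 1;
        for (int i = 0; i < s.length(); i++) {
            // putIfAbsent returns the earlier index, or null (and stores i) if new.
            Integer start = first.putIfAbsent(s.charAt(i), i);
            if (start != null && i - start + 1 > bestLen) {
                bestStart = start;
                bestLen = i - start + 1;
            }
        }
        return s.substring(bestStart, bestStart + bestLen);
    }

    public static void main(String[] args) {
        System.out.println(longest("mgntdygtxrvxjnwksqhxuxtrv")); // tdygtxrvxjnwksqhxuxt
    }
}
```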

Split string on commas ignoring commas, brackets, braces in parenthesis, quotes

I am attempting to split a comma-separated list. I want to ignore commas that are inside parentheses, brackets, braces and quotes, using a regex. More precisely, I am trying to do this with PostgreSQL's POSIX regexp_split_to_array.
My knowledge of regex is not great, but by searching Stack Overflow I found a partial solution: I can split the string as long as it does not contain nested parentheses, brackets or braces. Here is the regex:
,(?![^()]*+\))(?![^{}]*+})(?![^\[\]]*+\])(?=(?:[^"]|"[^"]*")*$)
Test case:
0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3], "text, text (test)", {a1:1, a2:3, a3:{a1=1, s2=2}, a4:"asasad, sadsas, asasdasd"}
The problem is that in, e.g., (1,2,(1,2)), the first two commas are matched because of the nested parentheses.

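Even though regex is not the best way to go, here is a solution with recursive matching. Note that it relies on (?R) recursion and atomic groups (?>...), which PCRE supports but PostgreSQL's regular expressions do not: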
(?>(?>\([^()]*(?R)?[^()]*\))|(?>\[[^[\]]*(?R)?[^[\]]*\])|(?>{[^{}]*(?R)?[^{}]*})|(?>"[^"]*")|(?>[^(){}[\]", ]+))(?>[ ]*(?R))*
If we break it down, there is a group with some stuff inside, followed by more of the same kind of matching, separated by optional spaces.
(?> <---- start matching
... <---- some stuff inside
) <---- end matching
(?>
[ ]* <---- optional spaces
(?R) <---- match the entire thing again
)* <---- can be repeated
From your example 0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3],..., we want to match:
0
(1,2)
(1,2,(1,2)) [1,2,3,[1,2]]
[1,2,3]
...
For the third match, the stuff inside will match (1,2,(1,2)) and [1,2,3,[1,2]], which are separated by a space.
The stuff inside is a series of options:
(?>
(?>...)| <---- will match balanced ()
(?>...)| <---- will match balanced []
(?>...)| <---- will match balanced {}
(?>...)| <---- will match "..."
(?>...) <---- will match anything else without space or comma
)
Here are the options:
\( <---- literal (
[^()]* <---- any number of chars except ( or )
(?R)? <---- match the entire thing optionally
[^()]* <---- any number of chars except ( or )
\) <---- literal )
\[ <---- literal [
[^[\]]* <---- any number of chars except [ or ]
(?R)? <---- match the entire thing optionally
[^[\]]* <---- any number of chars except [ or ]
\] <---- literal ]
{ <---- literal {
[^{}]* <---- any number of chars except { or }
(?R)? <---- match the entire thing optionally
[^{}]* <---- any number of chars except { or }
} <---- literal }
" <---- literal "
[^"]* <---- any number of chars except "
" <---- literal "
[^(){}[\]", ]+ <---- one or more chars except comma, or space, or these: (){}[]"
Note that this does not match the comma-separated list itself, but the items in it. Excluding comma and space in the last option above makes it stop matching at a comma or space (except for the spaces explicitly allowed between repeated matches).
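java.util.regex has no (?R) recursion, so in Java a small scanner is a simpler option. This is a different technique from the recursive regex above, shown as a hedged sketch with a hypothetical class name. It tracks nesting depth and quote state and splits only on top-level commas. It assumes well-formed input: brackets are balanced and quotes are never escaped. Because it shares one depth counter, it does not check that each closing bracket matches its opener.

```java
import java.util.ArrayList;
import java.util.List;

final class TopLevelSplit {
    static final String SAMPLE = "0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3], "
            + "\"text, text (test)\", {a1:1, a2:3, a3:{a1=1, s2=2}, a4:\"asasad, sadsas, asasdasd\"}";

    // Splits on commas that are not nested inside (), [], {} or "...".
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // shared nesting level for (), [] and {}
        boolean inQuotes = false; // brackets and commas inside quotes are ignored
        for (char c : s.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes) {
                if (c == '(' || c == '[' || c == '{') {
                    depth++;
                } else if (c == ')' || c == ']' || c == '}') {
                    depth--;
                } else if (c == ',' && depth == 0) {
                    parts.add(current.toString().trim());
                    current.setLength(0);
                    continue; // drop the top-level separator itself
                }
            }
            current.append(c);
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        split(SAMPLE).forEach(System.out::println);
    }
}
```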
