Regular Expression to parse group of strings with quotes separated by space - regex

Given a single-line string (no line breaks), I want to split it into groups that are separated by spaces and may contain quoted parts. A space is allowed inside a group only if it is within quotes. E.g.
a="1234" gg b=5678 c="1 2 3"
The result should have 4 groups:
a="1234"
gg
b=5678
c="1 2 3"
So far I have this
/[^\s]+(=".*?"|=".*?[^s]+|=[^\s]+|=)/g
but this cannot capture the second group "gg". I can't check if there is space before and after the text, as this will include the string that has space within quotes.
Any help will be greatly appreciated! Thanks.
Edited
This is for javascript

In JavaScript, you may use the following regex:
/\w+(?:=(?:"[^"]*"|\S+)?)?/g
See the regex demo.
Details
\w+ - 1+ letters, digits or/and _
(?:=(?:"[^"]*"|\S+)?)? - an optional sequence of:
= - an equal sign
(?:"[^"]*"|\S+)? - an optional sequence of:
"[^"]*" - a ", then 0+ chars other than " and then "
| - or
\S+ - 1+ non-whitespace chars
JS demo:
var rx = /\w+(?:=(?:"[^"]*"|\S+)?)?/g;
var s = 'a="1234" gg b=5678 c="1 2 3" d=abcd e=';
console.log(s.match(rx));

If I did not misunderstand what you are asking, this is what you are looking for:
\w+=(?|"([^"]*)"|(\d+))|(?|[a-z]+)
Think of the alternation (|) as a fallback: put the more specific alternatives in front of the more generic ones. Note that the branch reset group (?|...) is a PCRE feature and is not supported by JavaScript's RegExp.
Alternatively, you can remove the second ?| so that the bare word is captured in a separate group (group 2), which you can then check.
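Since the question targets JavaScript, where (?|...) is unavailable, the same fallback idea can be sketched with one capture group per alternative and a check for whichever group participated. This is only an illustration: the names pairRx and parsePairs are made up for the example, and String.prototype.matchAll is assumed to be available (modern browsers, Node 12+).

```javascript
// One capture group per alternative: m[2] holds a quoted value, m[3] an unquoted one.
// Both stay undefined when the token is a bare word such as "gg".
const pairRx = /(\w+)(?:=(?:"([^"]*)"|(\S*)))?/g;

function parsePairs(line) {
  const result = [];
  for (const m of line.matchAll(pairRx)) {
    // Pick whichever value group took part in the match
    const value = m[2] !== undefined ? m[2] : m[3];
    result.push({ key: m[1], value: value === undefined ? null : value });
  }
  return result;
}

console.log(parsePairs('a="1234" gg b=5678 c="1 2 3"'));
```

A bare key is reported with a null value here, which is a design choice for the sketch rather than anything the answers prescribe.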

Related

Regex ignoring matches between square brackets

Hi, I'm trying to create a regex to help separate a string into a series of object fields. However, I'm having issues where the individual field values are themselves lists and are therefore comma-separated internally.
string = "field1:1234,field2:[[1, 3],[3,4]], field3:[[1, 3],[3,4]]"
I want the regex to identify only the commas before "field2" and "field3", ignoring the ones separating the list values (e.g. between 1 and 3, ] and [, 3 and 4).
I've tried using non-capturing groups to ignore the character after the commas (e.g. (,)([?!a-z]) ) but given I'm running this in Kotlin I don't think non-capturing and group separation is useful.
Is there a way to ignore string values between specified characters? E.g. ignore anything between "[[" and "]]" would work here.
Any help appreciated.
You can tweak the existing Java recursion mimicking regex to extract all the matches you need:
val rx = """\w+:(?:(?=\[)(?:(?=.*?\[(?!.*?\1)(.*\](?!.*\2).*))(?=.*?\](?!.*?\2)(.*)).)+?.*?(?=\1)[^\[]*(?=\2$)|\w+)""".toRegex()
val matches = rx.findAll(string).map{it.value}.joinToString("\n")
See the regex demo. Quick details:
\w+ - one or more letters, digits, underscores
: - a colon
(?: - start of a non-capturing group matching either
(?=\[)(?:(?=.*?\[(?!.*?\1)(.*\](?!.*\2).*))(?=.*?\](?!.*?\2)(.*)).)+?.*?(?=\1)[^\[]*(?=\2$) - a substring between two paired [ and ]
| - or
\w+ - one or more word chars
) - end of the non-capturing group.
See the Kotlin demo:
val string = "field1:1234,field2:[[1, 3],[3,4]], field3:[[1, 3],[3,4]]"
val rx = """\w+:(?:(?=\[)(?:(?=.*?\[(?!.*?\1)(.*\](?!.*\2).*))(?=.*?\](?!.*?\2)(.*)).)+?.*?(?=\1)[^\[]*(?=\2$)|\w+)""".toRegex()
print( rx.findAll(string).map{it.value}.joinToString("\n") )
Output:
field1:1234
field2:[[1, 3],[3,4]]
field3:[[1, 3],[3,4]]
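As a contrast to the regex above, the same split can be done without a regex by tracking bracket depth and splitting only on commas seen at depth 0. This is a different technique from the one in the answer, sketched in JavaScript rather than Kotlin; splitTopLevel is a hypothetical helper name, and the input is assumed to have balanced brackets.

```javascript
// Split on commas that sit outside every [...] pair by counting bracket depth.
function splitTopLevel(input) {
  const parts = [];
  let depth = 0;
  let current = '';
  for (const ch of input) {
    if (ch === '[') depth++;
    else if (ch === ']') depth--;
    if (ch === ',' && depth === 0) {
      // Top-level comma: close the current field
      parts.push(current.trim());
      current = '';
    } else {
      current += ch;
    }
  }
  if (current.trim() !== '') parts.push(current.trim());
  return parts;
}

console.log(splitTopLevel('field1:1234,field2:[[1, 3],[3,4]], field3:[[1, 3],[3,4]]').join('\n'));
```

The loop is easier to adapt to other delimiters or deeper nesting than the lookahead-based pattern, at the cost of a few more lines.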

golang regex get the string including the search character

I am extracting a piece of string from a string (link):
https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8
The desired output should be 100000/100000/100095-000-A_
I am using the regex ^.*?(/[i,na,fm,d]([,/]?)(/am/ptweb/|.+=.+,))([^_]*).*?$ in the Golang flavor, and I can only get group 4 with the following output: 100000/100000/100095-000-A
However I want the underscore after A.
Bit stuck on this, any help on this is appreciated.
You can use
(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)
See the regex demo.
Details:
(/(i|na|fm|d)(/am/ptweb/|.+=.+,)) - Group 1:
/ - a / char
(i|na|fm|d) - Group 2: i, na, fm or d
(/am/ptweb/|.+=.+,) - Group 3: /am/ptweb/ or one or more chars as many as possible (other than line break chars), =, one or more chars as many as possible (other than line break chars) and a , char
([^_]*_?) - Group 4: zero or more chars other than _ and then an optional _.
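To check the group numbering described above, here is a small sketch that applies the same pattern to the URL from the question. It is written in JavaScript rather than Go, on the assumption that this particular pattern behaves the same in both flavors; extractId is a hypothetical helper name.

```javascript
// Built from a string so the forward slashes in the pattern need no escaping.
const idRx = new RegExp('(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)');

function extractId(url) {
  const m = url.match(idRx);
  // Group 4 holds everything up to and including the first underscore
  return m ? m[4] : null;
}

const link = 'https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8';
console.log(extractId(link)); // 100000/100000/100095-000-A_
```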
You can match the underscore after the A like:
^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$
See a regex demo
A few notes about the pattern that you tried:
This notation is a character class [i,na,fm,d] which should be a grouping (?:[id]|na|fm)
In this group ([,/]?) you optionally capture either , or / so in theory it could match a string that has /i//am/ptweb/
The last part .*?$ does not have to be non greedy as it is the last part of the pattern
This part [^_]* can also match spaces and newlines
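The first note, about the character class versus the grouping, is easy to verify in isolation. The sketch below uses JavaScript for illustration (character classes work the same way in Go); the variable names are made up for the example.

```javascript
// [i,na,fm,d] is a set of single characters: i , n a f m d
const charClass = /^[i,na,fm,d]$/;
// (?:[id]|na|fm) matches one of the strings i, d, na or fm
const grouping = /^(?:[id]|na|fm)$/;

console.log(charClass.test('na')); // false: two characters, the class matches only one
console.log(charClass.test(','));  // true: the comma is a member of the class
console.log(grouping.test('na'));  // true
console.log(grouping.test(','));   // false
```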

Using regex replacement in Sublime 3

I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output : BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (and many others) seem to work.
To turn N_BBP_c_46137_n into BBP (according to the comment, the entire long name should be replaced by only BBP), you can use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N_ preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ times a non whitespace char
In the replacement use the first capturing group $1
Regex demo
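The capture-group replacement can be tried outside the editor as well. The sketch below uses JavaScript's String.prototype.replace, where $1 also refers to the first capture group; shortenName is a hypothetical helper name, and Sublime Text itself uses a different regex engine, so this only illustrates the idea.

```javascript
// Keep only the captured BBP part and drop the rest of the name.
function shortenName(name) {
  return name.replace(/\bN_(BBP)_\S*/, '$1');
}

console.log(shortenName('N_BBP_c_46137_n')); // BBP
```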
You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
See the regex demo.
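The ${1} form matters when the inserted text starts with a digit, because $1 followed by a digit becomes ambiguous. JavaScript does not support the ${1} syntax, so the sketch below uses a replacer function instead to get the same effect; replaceMiddle and the value 2024 are made up for the example.

```javascript
// Replace the part between N_ and _c_<digits>_n, keeping both ends.
function replaceMiddle(name, newValue) {
  // A replacer function avoids any $1-followed-by-digit ambiguity
  return name.replace(/(N_)[^_]*(_c_\d+_n)/, (match, head, tail) => head + newValue + tail);
}

console.log(replaceMiddle('N_BBP_c_46137_n', '2024')); // N_2024_c_46137_n
```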

Regex: Regular expression to pick the nth parameter of the response

Consider the example below:
AT+CEREG?
+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"
The desired result is to pick the nth parameter of the response:
n=1 => "4s"
n=2 => 123
n=8 =>
n=10 => 10110100
In my case, I am enquiring some details from an LTE modem and above is the type of response I receive.
I have created this regex which captures the (n+1)th member under group 2 including the last member, however, I can't seem to work out how to pick the 1st parameter in the approach I have taken.
(?:([^,]*,)){5}([^,].*?(?=,|$))?
Could you suggest an alternative method or complete/correct mine?
You may start matching from : (or +CEREG: if it is a static piece of text) and use
:\s*(?:[^,]*,){min}([^,]*)
where min is n-1, with n being the 1-based position of the expected value.
See the regex demo. This solution is std::regex compatible.
Details
: - a colon
\s* - 0+ whitespaces
(?:[^,]*,){min} - min occurrences of any 0+ chars other than , followed with ,
([^,]*) - Capturing group 1: 0+ chars other than ,.
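The pattern can be built for any n by substituting n-1 for min. The sketch below does this in JavaScript purely for illustration (the answer targets std::regex, and this pattern uses only constructs common to both); nthParam is a hypothetical helper name.

```javascript
// Return the nth (1-based) comma-separated parameter after the first colon,
// or null when the response has fewer than n parameters.
function nthParam(response, n) {
  const rx = new RegExp(':\\s*(?:[^,]*,){' + (n - 1) + '}([^,]*)');
  const m = response.match(rx);
  return m ? m[1] : null;
}

const reply = '+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"';
console.log(nthParam(reply, 1));  // "4s"
console.log(nthParam(reply, 2));  // 123
console.log(nthParam(reply, 8));  // (empty string)
console.log(nthParam(reply, 10)); // "10110100"
```

With this pattern the surrounding double quotes are kept in the captured value.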
A boost::regex solution might look neater since you may easily capture substrings inside double quotes or substrings consisting of chars other than whitespace and commas using a branch reset group:
:\s*(?:[^,]*,){0}(?|"([^"]*)"|([^,\s]+))
See the regex demo
Details
:\s*(?:[^,]*,){min} - same as in the first pattern
(?| - start of a branch reset group where each alternative branch shares the same IDs:
"([^"]*)" - a ", then Group 1 holding any 0+ chars other than " and then a " is just matched
| - or
([^,\s]+)) - (still Group 1): one or more chars other than whitespace and ,.

Regex: cutting a portion of string between two characters (for smarty)

I need to cut a portion of a string delimited by two characters.
(From the second ":" to the "|")
EG, I have this string (without quotation marks):
"Materiale : Pelle naturale, Colore : Pelle | Rosso"
and I must remove " Pelle | " in the results.
Note that "Pelle" is just an example, but it could be a lot of different words.
Can someone help me?
Thank you
You can use
{'/((?:[^:]+:){2})[^|]+\|/'|preg_replace:'$1':$value}
See the regex demo
The regex means:
((?:[^:]+:){2}) - matches and captures into Group 1 two sequences of:
[^:]+ - 1+ symbols other than :
: - a literal :
[^|]+\| - 1+ characters other than | (with [^|]+) and then |.
In the replacement pattern, we just restore the Group 1 with the $1 backreference that gets the text captured by the first group.
The negated character class is a very handy construct in regex.
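To see what the replacement does to the sample string, here is the same pattern applied with JavaScript's String.prototype.replace. This is only an illustration of the regex; the Smarty template above relies on PHP's preg_replace, and removeBeforePipe is a hypothetical helper name.

```javascript
// Keep everything up to and including the second ":" (group 1),
// then drop the text up to and including the next "|".
function removeBeforePipe(value) {
  return value.replace(/((?:[^:]+:){2})[^|]+\|/, '$1');
}

console.log(removeBeforePipe('Materiale : Pelle naturale, Colore : Pelle | Rosso'));
// Materiale : Pelle naturale, Colore : Rosso
```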