I'm trying to write a regex to filter out parameters of a handlebars call:
example call:
117-tooltip classes=(concat (concat "productTile__product-availability " classes) " tooltip--small-icon productAvailability__tooltip") bla=(concat "test" "test2")
what my matches should be:
classes=(concat (concat "productTile__product-availability " classes) " tooltip--small-icon productAvailability__tooltip")
bla=(concat "test" "test2")
what my matches currently are:
(concat (concat "productTile__product-availability " classes) " tooltip--small-icon productAvailability__tooltip")
(concat "test" "test2")
my regex:
\((?>[^()]|(?R))*\)
I need to extend it so the structure must be something=(...(...)..) with an unknown number of matching parentheses.
How do I extend the regex so that the x= part is also included in the match?
You can use a regex subroutine:
(\w+)=(\(((?>[^()]++|(?2))*)\))
Details:
(\w+) - Capturing group 1: one or more word chars
= - a = char
(\(((?>[^()]++|(?2))*)\)) - Group 2 (needed for the regex subroutine to work):
\( - ( char
((?>[^()]++|(?2))*) - Group 3: zero or more repetitions of one or more chars other than ( and ) or the whole Group 2 pattern recursed
\) - a ) char.
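The same subroutine idea can be tried outside a regex tester. Below is a minimal sketch in Ruby, which is my assumption since the question names no language. Ruby's engine has no (?2); it writes a subexpression call as \g<name> instead. The group names key and val and the variable names are made up for this illustration.

```ruby
# Sketch only: \g<val> re-runs the pattern of the group named "val" (recursion).
call = '117-tooltip classes=(concat (concat "productTile__product-availability " classes) ' \
       '" tooltip--small-icon productAvailability__tooltip") bla=(concat "test" "test2")'

# key : one or more word chars before "="
# val : a balanced (...) block; the atomic group (?>...) keeps the engine from
#       backtracking into text it has already consumed
param_re = /(?<key>\w+)=(?<val>\((?>[^()]+|\g<val>)*\))/

# With capture groups, scan yields one [key, val] pair per match
params = call.scan(param_re).map { |key, val| "#{key}=#{val}" }
puts params
```

Each printed line is one key=(...) parameter, with nested parentheses kept together.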
I would use:
\b\w+=.*?(?=\s+\w+=|$)
The idea behind this pattern is to match a key= followed by all content leading up to, but not including, either the next key, or the end of the input.
Explanation:
\b\w+= match a KEY=
.*? match all content up to, but not including, what the lookahead asserts
(?=\s+\w+=|$) assert that what follows is one or more whitespace characters followed by KEY=, OR the end of the input
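Here is a small Ruby sketch of this lookahead approach (Ruby is my assumption; the question names no language). It also shows the trade-off: the pattern does not count parentheses, so it relies on no value containing whitespace followed by word=. The tricky string is an invented input.

```ruby
call = '117-tooltip classes=(concat (concat "productTile__product-availability " classes) ' \
       '" tooltip--small-icon productAvailability__tooltip") bla=(concat "test" "test2")'

# KEY= followed by as little as possible, up to the next " KEY=" or the end
key_value = /\b\w+=.*?(?=\s+\w+=|$)/

params = call.scan(key_value)
puts params

# Limitation: a value that itself contains " word=" is cut at that point
tricky = 'a=(concat "x y=z") b=(q)'
cut = tricky.scan(key_value)
p cut
```

For the call from the question this yields the two expected parameters; for the invented input the first value is split in the middle of its quoted string.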
I have a string containing placeholders which I want to replace with other strings, but I would also like to split the string whenever I encounter a placeholder.
So, by splitting I mean that
"This {0} is an example {1} with a placeholder"
should become:
parts[0] -> "This"
parts[1] -> "{0}"
parts[2] -> "is an example"
parts[3] -> "{1}"
parts[4] -> "with a placeholder"
and then the next step would be to replace the placeholders (this part is simple):
parts[0] -> "This"
parts[1] -> value[0]
parts[2] -> "is an example"
parts[3] -> value[1]
parts[4] -> "with a placeholder"
I know how to match and replace the placeholders (e.g. ({\d+})), but I have no clue how to tell the regex to "match non-placeholders" and "match placeholders" at the same time.
My idea was something like: (?!{\d+})+ | ({\d+}) but it's not working. I am doing this in JavaScript if Regex flavor is important.
If I can also replace the placeholders with a value in one step it would be neat, but I can also do this after I split.
You might write the pattern as:
{\d+}|\S.*?(?=\s*(?:{\d+}|$))
The pattern matches:
{\d+} Match { 1+ digits and }
| Or
\S.*? Match a non whitespace char followed by any character as few as possible
(?= Positive lookahead
\s* Match optional whitespace chars
(?:{\d+}|$) Match either { 1+ digits and } or assert the end of the string
) Close the lookahead
To get an array with those values:
const regex = /{\d+}|\S.*?(?=\s*(?:{\d+}|$))/gm;
const str = `This {0} is an example {1} with a placeholder`;
console.log(str.match(regex))
If you put parentheses (a capturing group) around the separator, split includes the matched separators in the output:
let parts = str.split(/ *({\d+}) */);
If the separator occurs at the start or end of the string, filter out the resulting empty strings.
If your goal is just to replace, it can be done in one step using replace and a callback:
str = str.replace(/{(\d+)}/g, (m0, m1) => value[m1]);
m0 is the full match and m1 holds the capture of the first group.
The g (global) flag makes replace act on every match in the string, not just the first.
I am looking for a regex substitution to transform N white spaces at the beginning of a line to N &nbsp;. So this text:
list:
  - first
should become:
list:
&nbsp;&nbsp;- first
I have tried:
str = "list:\n  - first"
str.gsub(/(?<=^) */, "&nbsp;")
which returns:
list:
&nbsp;- first
which is missing one &nbsp;. How to improve the substitution to get the desired output?
You could make use of the \G anchor and \K to reset the starting point of the reported match.
To match all leading single spaces:
(?:\R\K|\G)
(?: Non capture group
\R\K Match a newline and clear the match buffer
| Or
\G Assert the position at the end of the previous match
) Close non capture group and match a space
To match only the single leading spaces in the example string:
(?:^.*:\R|\G)\K
In parts, the pattern matches:
(?: Non capture group
^.*:\R Match a line that ends with : and match a newline
| Or
\G Assert the position at the end of the previous match, or at the start of the string
) Close non capture group
\K Forget what is matched so far and match a space
Example
re = /(?:^.*:\R|\G)\K /
str = 'list:
  - first'
result = str.gsub(re, '&nbsp;')
puts result
Output
list:
&nbsp;&nbsp;- first
I would write
"list:\n  - first".gsub(/^ +/) { |s| '&nbsp;' * s.size }
#=> "list:\n&nbsp;&nbsp;- first"
See String#*
Use gsub with a block:
str = "list:\n  - first"
output = str.gsub(/(?<=^|\n)[ ]+/) {|m| m.gsub(" ", "&nbsp;") }
puts output
This prints:
list:
&nbsp;&nbsp;- first
The pattern (?<=^|\n)[ ]+ matches one or more spaces at the start of a line. Each match is passed to the block, which replaces every space in it with &nbsp;.
You can use a short /(?:\G|^) / regex with a plain text replacement pattern:
result = text.gsub(/(?:\G|^) /, '&nbsp;')
Details:
(?:\G|^) - start of a line or string or the end of the previous match
- a space.
See a Ruby demo:
str = "list:\n  - first"
result = str.gsub(/(?:\G|^) /, '&nbsp;')
puts result
# =>
# list:
# &nbsp;&nbsp;- first
If you need to match any whitespace, replace the space in the pattern with \s. To match only horizontal whitespace, use [ \t] or [[:blank:]]; note that in Ruby \h matches a hexadecimal digit, not horizontal whitespace.
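For mixed indentation, here is a minimal sketch of the same \G-based substitution with an explicit [ \t] class. It assumes that only spaces and tabs count as indentation and that each one should become one &nbsp;; the sample text and variable names are invented.

```ruby
# Sketch: replace every leading space or tab with one "&nbsp;"
text = "list:\n \t- first item"

# (?:\G|^) : start of a line, or the position where the previous match ended
# [ \t]    : exactly one space or tab
indented = text.gsub(/(?:\G|^)[ \t]/, '&nbsp;')
puts indented
```

Spaces further along the line are left alone, because once a non-indentation character is reached neither \G nor ^ can match again on that line.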
I have a file with text like this:
"Title" = "Body"
And I would like to remove both " before the =, to leave it like this:
Title = "Body"
So far I managed to select the first block of text with:
.+(=)
That selects everything up to the =, but I can't find how to replace (or delete) both " characters.
Any suggestions?
You could use a capture group in the replacement, and match the double quotes to be removed while asserting an equals sign at the right.
Find what:
"([^"]+)"(?=\h*=)
" Match literally
([^"]+) Capture group 1, match 1+ times any char other than "
" Match literally
(?=\h*=) Positive lookahead, assert an = sign at the right
Replace with:
$1
To match the whole pattern from the start to the end of the line, you might also use 2 capture groups and use those in the replacement.
^"([^"]+)"(\h*=\h*"[^"]+")$
In the replacement use $1$2
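Outside an editor, the first pattern can be tried in a script. This is a minimal sketch in Ruby, an assumed choice since the question only mentions a find-and-replace dialog. [ \t] stands in for \h because Ruby's \h means a hexadecimal digit, and the second sample line is invented.

```ruby
lines = [
  '"Title" = "Body"',
  '"Other title"="Another body"'
]

# "([^"]+)"   : a quoted key, captured without its quotes
# (?=[ \t]*=) : only when optional spaces/tabs and then "=" follow
unquoted = lines.map { |line| line.sub(/"([^"]+)"(?=[ \t]*=)/, '\1') }
puts unquoted
```

The quoted value after the = is untouched because the lookahead fails there.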
You can use
(?:\G(?!^)|^(?=.*=))[^"=\v]*\K"
Replace with an empty string.
Details:
(?:\G(?!^)|^(?=.*=)) - end of the previous successful match (\G(?!^)) or (|) start of a line that contains = somewhere on it (^(?=.*=))
[^"=\v]* - any zero or more chars other than ", = and vertical whitespace
\K - omit the text matched
" - a " char (matched, consumed and removed)
Consider the example below:
AT+CEREG?
+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"
The desired behaviour is to pick the n-th field of the response:
n=1 => "4s"
n=2 => 123
n=8 =>
n=10 => 10110100
In my case, I am enquiring some details from an LTE modem and above is the type of response I receive.
I have created this regex, which captures the (n+1)th member (including the last one) in group 2. However, I can't work out how to pick the 1st parameter with this approach.
(?:([^,]*,)){5}([^,].*?(?=,|$))?
Could you suggest an alternative method or complete/correct mine?
You may start matching from : (or +CEREG: if it is a static piece of text) and use
:\s*(?:[^,]*,){min}([^,]*)
where min is n-1, that is, the number of comma-terminated values to skip before the expected one.
This solution is std::regex compatible.
Details
: - a colon
\s* - 0+ whitespaces
(?:[^,]*,){min} - min occurrences of any 0+ chars other than , followed with ,
([^,]*) - Capturing group 1: 0+ chars other than ,.
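To make the role of min concrete, here is a small sketch that builds the pattern for a 1-based n. It is written in Ruby purely for illustration (an assumption; the answer itself targets std::regex), and the helper name nth_field is hypothetical.

```ruby
response = '+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"'

# Returns the n-th (1-based) comma-separated value after the colon,
# or nil when the response has fewer than n values.
def nth_field(text, n)
  skip = n - 1 # this is "min": how many "value," chunks to step over
  match = text.match(/:\s*(?:[^,]*,){#{skip}}([^,]*)/)
  match && match[1]
end

[1, 2, 8, 10].each { |n| puts "n=#{n} => #{nth_field(response, n).inspect}" }
```

Quoted values keep their double quotes with this pattern, and an empty field such as n=8 comes back as an empty string.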
A boost::regex solution might look neater since you may easily capture substrings inside double quotes or substrings consisting of chars other than whitespace and commas using a branch reset group:
:\s*(?:[^,]*,){0}(?|"([^"]*)"|([^,\s]+))
Details
:\s*(?:[^,]*,){min} - same as in the first pattern (the example above uses {0}, which selects the first value)
(?| - start of a branch reset group where each alternative branch shares the same IDs:
"([^"]*)" - a ", then Group 1 holding any 0+ chars other than " and then a " is just matched
| - or
([^,\s]+)) - (still Group 1): one or more chars other than whitespace and ,.
I want to keep only the last term of a string separated by dots
Example:
My string is:
abc"val1.val2.val3.val4"zzz
Expected string after I apply the regex:
abc"val4"zzz
In other words, I want to keep only the last of the dot-separated terms.
The most relevant I tried was
val json="""abc"val1.val2.val3.val4"zzz"""
val sortie="""(([A-Za-z0-9]*)\.([A-Za-z0-9]*){2,10})\.([A-Za-z0-9]*)""".r.replaceAllIn(json, a=> a.group(3))
the result was:
abc".val4"zzz
Can you suggest a different regex solution, please?
Thanks
You may use
val s = """abc"val1.val2.val3.val4"zzz"""
val res = "(\\w+\")[^\"]*\\.([^\"]*\")".r replaceAllIn (s, "$1$2")
println(res)
// => abc"val4"zzz
Pattern details:
(\\w+\") - Group 1 capturing 1+ word chars and a "
[^\"]* - 0+ chars other than "
\\. - a dot
([^\"]*\") - Group 2 capturing 0+ chars other than " and then a ".
The $1 is the backreference to the first group and $2 inserts the text inside Group 2.
Maybe without Regex at all:
scala> json.split("\"").map(_.split("\\.").last).mkString("\"")
res4: String = abc"val4"zzz
This assumes you want each "token" (separated by ") to become the last dot-separated inner token.