Relevant Regular Expression in scala


I want to keep only the last term of a dot-separated string.
Example:
My string is:
abc"val1.val2.val3.val4"zzz
Expected string after I apply the regex:
abc"val4"zzz
In other words, I want to keep only the content after the last dot (.) inside the quotes.
The closest attempt I made was:
val json="""abc"val1.val2.val3.val4"zzz"""
val sortie="""(([A-Za-z0-9]*)\.([A-Za-z0-9]*){2,10})\.([A-Za-z0-9]*)""".r.replaceAllIn(json, a=> a.group(3))
the result was:
abc".val4"zzz
Can you suggest a different regex solution, please?
Thanks

You may use
val s = """abc"val1.val2.val3.val4"zzz"""
val res = "(\\w+\")[^\"]*\\.([^\"]*\")".r replaceAllIn (s, "$1$2")
println(res)
// => abc"val4"zzz
See the Scala demo
Pattern details:
(\\w+\") - Group 1 capturing 1+ word chars and a "
[^\"]* - 0+ chars other than "
\\. - a dot
([^\"]*\") - Group 2 capturing 0+ chars other than " and then a ".
The $1 is the backreference to the first group and $2 inserts the text inside Group 2.
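For comparison, the same pattern can be sketched in JavaScript, the language used by other questions on this page. This is an illustrative transliteration rather than part of the original answer, and the variable names are made up for the example.

```javascript
// Same pattern as the Scala answer, written as a JavaScript regex literal
// (no double escaping is needed inside /.../).
const input = 'abc"val1.val2.val3.val4"zzz';

// Group 1: word chars plus the opening quote.
// [^"]*\. : greedy run up to the LAST dot before the closing quote.
// Group 2: the remaining text plus the closing quote.
const lastTermRe = /(\w+")[^"]*\.([^"]*")/g;

const output = input.replace(lastTermRe, '$1$2');
console.log(output); // abc"val4"zzz
```

Because [^"]* is greedy, the engine backtracks only as far as the last dot, which is why every earlier term is dropped.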

Maybe without Regex at all:
scala> json.split("\"").map(_.split("\\.").last).mkString("\"")
res4: String = abc"val4"zzz
This assumes you want each "token" (separated by ") to become the last dot-separated inner token.
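The assumption above is easy to see in a small sketch. The version below is a JavaScript rendering of the same split/map/join idea; lastTerms is a hypothetical helper name, not something from the answer.

```javascript
// Hypothetical helper: keep only the last dot-separated term of every
// segment between double quotes (mirrors the split/map/mkString one-liner).
function lastTerms(s) {
  return s
    .split('"')
    .map((segment) => segment.split('.').pop())
    .join('"');
}

console.log(lastTerms('abc"val1.val2.val3.val4"zzz')); // abc"val4"zzz

// The assumption matters: dots OUTSIDE the quotes are trimmed as well.
console.log(lastTerms('a.b"x.y"c.d')); // b"y"d
```

If text outside the quotes may legitimately contain dots, the regex-based replacement is the safer choice.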

Related

How to split string into parts using Regex

I have a string containing placeholders which I want to replace with other strings, but I would also like to split the string whenever I encounter a placeholder.
So, by splitting I mean that
"This {0} is an example {1} with a placeholder"
should become:
parts[0] -> "This"
parts[1] -> "{0}"
parts[2] -> "is an example"
parts[3] -> "{1}"
parts[4] -> "with a placeholder"
and then the next step would be to replace the placeholders (this part is simple):
parts[0] -> "This"
parts[1] -> value[0]
parts[2] -> "is an example"
parts[3] -> value[1]
parts[4] -> "with a placeholder"
I know how to match and replace the placeholders (e.g. ({\d+})), but I have no clue how to tell the regex engine to "match non-placeholders" and "match placeholders" at the same time.
My idea was something like: (?!{\d+})+ | ({\d+}) but it's not working. I am doing this in JavaScript if Regex flavor is important.
If I can also replace the placeholders with a value in one step it would be neat, but I can also do this after I split.

You might write the pattern as:
{\d+}|\S.*?(?=\s*(?:{\d+}|$))
The pattern matches:
{\d+} Match { 1+ digits and }
| Or
\S.*? Match a non whitespace char followed by any character as few as possible
(?= Positive lookahead
\s* Match optional whitespace chars
(?:{\d+}|$) Match either { 1+ digits and } or assert the end of the string
) Close the lookahead
Regex demo
To get an array with those values:
const regex = /{\d+}|\S.*?(?=\s*(?:{\d+}|$))/gm;
const str = `This {0} is an example {1} with a placeholder`;
console.log(str.match(regex))

If you wrap the separator in a capturing group (parentheses), the matched separators are included in the output of split:
let parts = str.split(/ *({\d+}) */);
See this demo at tio.run. If a separator occurs at the start or end of the string, filter out the resulting empty strings.
If your goal is just to replace, it can be done in one step using replace and a callback:
str = str.replace(/{(\d+)}/g, (m0, m1) => value[m1]);
Another demo at tio.run - m0 is the full match, m1 holds the capture of the first group.
The g (global) flag makes replace act on every match in the string, not just the first.
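The two steps (split, then substitute) can be combined in a short sketch. splitAndFill and the sample values are hypothetical names introduced for illustration; the braces are escaped here so the pattern also works under the u flag.

```javascript
// Hypothetical helper: split a template on {n} placeholders, keeping the
// placeholders as separate parts, then substitute values by index.
function splitAndFill(template, values) {
  return template
    .split(/ *(\{\d+\}) */)          // capturing group keeps the separators
    .filter((part) => part !== '')   // drop empties from leading/trailing placeholders
    .map((part) => {
      const m = /^\{(\d+)\}$/.exec(part);
      return m ? values[Number(m[1])] : part;
    });
}

const template = 'This {0} is an example {1} with a placeholder';
console.log(template.split(/ *(\{\d+\}) */));
// [ 'This', '{0}', 'is an example', '{1}', 'with a placeholder' ]
console.log(splitAndFill(template, ['A', 'B']));
// [ 'This', 'A', 'is an example', 'B', 'with a placeholder' ]
```

Keeping the parts as an array is useful when the values are not strings (DOM nodes or components, for instance); when plain text is enough, the single replace call with a callback is shorter.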

Replace N spaces at the beginning of a line with N characters

I am looking for a regex substitution to transform N spaces at the beginning of a line into N &nbsp; entities. So this text:
list:
  - first
should become:
list:
&nbsp;&nbsp;- first
I have tried:
str = "list:\n  - first"
str.gsub(/(?<=^) */, "&nbsp;")
which returns:
list:
&nbsp;- first
which is missing one &nbsp;. How can I improve the substitution to get the desired output?

You could make use of the \G anchor and \K to reset the starting point of the reported match.
To match every leading space individually (note that the pattern ends with a literal space):
/(?:\R\K|\G) /
(?: Non-capturing group
\R\K Match a newline and reset the start of the reported match
| Or
\G Assert the position at the end of the previous match
) Close the non-capturing group
(a literal space) Match a single space
See a regex demo and a Ruby demo.
To match only the leading spaces in the example string, one at a time (again, the pattern ends with a literal space):
/(?:^.*:\R|\G)\K /
In parts, the pattern matches:
(?: Non-capturing group
^.*:\R Match a line that ends with : and match a newline
| Or
\G Assert the position at the end of the previous match, or at the start of the string
) Close the non-capturing group
\K Forget what is matched so far
(a literal space) Match a single space
See a regex demo and a Ruby demo.
Example
re = /(?:^.*:\R|\G)\K /
str = 'list:
  - first'
result = str.gsub(re, '&nbsp;')
puts result
Output
list:
&nbsp;&nbsp;- first

I would write
"list:\n  - first".gsub(/^ +/) { |s| '&nbsp;' * s.size }
#=> "list:\n&nbsp;&nbsp;- first"
See String#*
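The block form above (match the whole run, then repeat the replacement once per space) carries over to other languages. Here is a minimal JavaScript sketch of the same idea, assuming the replacement text is the HTML entity &nbsp;; replaceLeadingSpaces is a hypothetical helper name.

```javascript
// Hypothetical helper: replace each leading space of every line with
// `replacement`, preserving the count (N spaces -> N replacements).
function replaceLeadingSpaces(text, replacement) {
  // ^ with the m flag anchors at every line start; " +" grabs the whole run.
  return text.replace(/^ +/gm, (run) => replacement.repeat(run.length));
}

const indented = 'list:\n  - first';
console.log(replaceLeadingSpaces(indented, '&nbsp;'));
// list:
// &nbsp;&nbsp;- first
```

Matching the whole run once and expanding it in the callback avoids the need for \G or \K, which JavaScript's regex engine does not provide.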

Use gsub with a block as the callback:
str = "list:\n  - first"
output = str.gsub(/(?<=^|\n)[ ]+/) {|m| m.gsub(" ", "&nbsp;") }
This prints:
list:
&nbsp;&nbsp;- first
The pattern (?<=^|\n)[ ]+ matches one or more spaces at the start of a line. This match then gets passed to the block, which replaces each space, one at a time, with &nbsp;.

You can use a short /(?:\G|^) / regex with a plain text replacement pattern:
result = text.gsub(/(?:\G|^) /, '&nbsp;')
See the regex demo. Details:
(?:\G|^) - start of a line or string or the end of the previous match
(a literal space) - a space.
See a Ruby demo:
str = "list:\n  - first"
result = str.gsub(/(?:\G|^) /, '&nbsp;')
puts result
# =>
# list:
# &nbsp;&nbsp;- first
If you need to match any whitespace, replace the literal space with a \s pattern. Or use \h if you need to only match horizontal whitespace.

Regex matching parentheses with =

I'm trying to write a regex to filter out parameters of a handlebars call:
example call:
117-tooltip classes=(concat (concat "productTile__product-availability " classes) " tooltip--small-icon productAvailability__tooltip") bla=(concat "test" "test2")
what my matches should be:
classes=(concat (concat "productTile__product-availability " classes) " tooltip--small-icon productAvailability__tooltip")
bla=(concat "test" "test2")
what my matches currently are:
(concat (concat "productTile__product-availability " classes) " tooltip--small-icon productAvailability__tooltip")
(concat "test" "test2")
my regex:
\((?>[^()]|(?R))*\)
I need to extend it so the structure must be something=(...(...)..) with an unknown number of matching parentheses.
How do I need to extend the regex to get the x= part also into it?

You can use a regex subroutine:
(\w+)=(\(((?>[^()]++|(?2))*)\))
See the regex demo. Details:
(\w+) - Capturing group 1: one or more word chars
= - a = char
(\(((?>[^()]++|(?2))*)\)) - Group 2 (needed for the regex subroutine to work):
\( - ( char
((?>[^()]++|(?2))*) - Group 3: zero or more repetitions of one or more chars other than ( and ) or the whole Group 2 pattern recursed
\) - a ) char.
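Engines without recursion or subroutine support (JavaScript's built-in regex engine, for example) cannot run the pattern above. Below is a minimal sketch of the same idea that uses a regex only to find key=( and a manual depth counter to find the matching ). It assumes parentheses never appear inside quoted strings, and extractParams is a hypothetical helper name.

```javascript
// Hypothetical helper: collect every key=( ... ) with balanced parentheses.
function extractParams(input) {
  const results = [];
  const keyRe = /(\w+)=\(/g;
  let m;
  while ((m = keyRe.exec(input)) !== null) {
    let depth = 1;              // we are just past the opening parenthesis
    let i = keyRe.lastIndex;
    while (i < input.length && depth > 0) {
      if (input[i] === '(') depth++;
      else if (input[i] === ')') depth--;
      i++;
    }
    if (depth === 0) {
      results.push(input.slice(m.index, i));
      keyRe.lastIndex = i;      // resume the search after the closing parenthesis
    }
  }
  return results;
}

const call =
  '117-tooltip classes=(concat (concat "productTile__product-availability " classes) ' +
  '" tooltip--small-icon productAvailability__tooltip") bla=(concat "test" "test2")';
console.log(extractParams(call));
```

For the example call this yields the two parameters listed in the question, each including its key= prefix.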

I would use:
\b\w+=.*?(?=\s+\w+=|$)
Demo
The idea behind this pattern is to match a key= followed by all content leading up to, but not including, either the next key, or the end of the input.
Explanation:
\b\w+= match a KEY=
.*? lazily match all content up to, but not including,
(?=\s+\w+=|$) a position that is followed by one or more
whitespace characters and the next KEY=, OR by
the end of the input
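This pattern uses no recursion, so it also runs in engines such as JavaScript's. The sketch below (with made-up, shortened input strings) shows it working and also where the lookahead-based approach gives up: it does not count parentheses, so whitespace followed by word= inside a value ends the match early.

```javascript
const keyValueRe = /\b\w+=.*?(?=\s+\w+=|$)/g;

const shortCall = '117-tooltip classes=(concat "a " classes) bla=(concat "test" "test2")';
console.log(shortCall.match(keyValueRe));
// [ 'classes=(concat "a " classes)', 'bla=(concat "test" "test2")' ]

// Limitation: " y=" inside the quoted string looks like the next key.
const tricky = 'a=(concat "x y=1") b=(q)';
console.log(tricky.match(keyValueRe));
// [ 'a=(concat "x', 'y=1")', 'b=(q)' ]
```

If values can contain key-like text, a balanced-parentheses approach is the more robust option.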

Regex for parse name with one or more words after double number and before 2 or more spaces

Problem:
How can I create a regex to extract a name like "DISNAY LAND 2.0 GCP" from an array of lines in Scala, where each line looks like this:
DE1ALAT0002 32.4756 -86.4393 106.1 ZQ DISNAY LAND 2.0 GCP  23456
// For use in code:
val regex = """(?:[\d\.\d]){2}\s*(?:[\d.\d])\s*(ZQ)\s*([A-Z])""".r // my attempt
val getName = row match {
case regex(name) => name
case _ =>
}
The only things I know for sure are:
1) the number of spaces between values varies
2) the useful value "DISNAY LAND 2.0 GCP" comes after a decimal number and the letters "ZQ"
3) the words of the name are separated by a single space, and the name may consist of one or many words
4) the name is followed by two or more spaces
Sorry if this is a duplicate question, but after a long search I did not find the right solution.
Many thanks for any answers.

You may use an .unanchored pattern like
\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)
See the regex demo. Details
\d\.\d+ - 1 digit, . and then 1+ digits
\s+ - 1+ whitespaces
ZQ - ZQ substring
\s+ - 1+ whitespaces (here, the left-hand side context definition ends, now, starting to capture the value we need to return)
(\S+(?:\s\S+)*) - Capturing group 1:
\S+ - 1 or more non-whitespace chars
(?:\s\S+)* - a non-capturing group that matches 0 or more sequences of a single whitespace (\s) and then 1+ non-whitespace chars (so, up to the double whitespace or end of string).
Scala demo:
val regex = """\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)""".r.unanchored
// The name must be followed by two or more spaces (or the end of the string).
val row = "DE1ALAT0002 32.4756 -86.4393 106.1 ZQ DISNAY LAND 2.0 GCP  23456"
val getName = row match {
  case regex(name) => name
  case _ => "" // no match
}
print(getName)
Output: DISNAY LAND 2.0 GCP
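The capture stops only because the name is followed by at least two whitespace characters. A small JavaScript sketch makes that dependency visible (JavaScript regexes are unanchored by default, so no equivalent of .unanchored is needed). The exact column spacing in the first row is an assumption made for illustration.

```javascript
const nameRe = /\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)/;

// Columns separated by two or more spaces (spacing here is an assumption).
const row = 'DE1ALAT0002  32.4756  -86.4393  106.1  ZQ  DISNAY LAND 2.0 GCP   23456';
const found = row.match(nameRe);
console.log(found ? found[1] : null); // DISNAY LAND 2.0 GCP

// If the gap after the name shrinks to one space, the next column is absorbed.
const collapsed = 'DE1ALAT0002 32.4756 -86.4393 106.1 ZQ DISNAY LAND 2.0 GCP 23456';
console.log(collapsed.match(nameRe)[1]); // DISNAY LAND 2.0 GCP 23456
```

This is worth checking when the input passes through anything that normalizes whitespace before the regex is applied.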

Regular Expression to parse group of strings with quotes separated by space

Given a single-line string (no line breaks), I want to extract groups of characters separated by spaces. A group may contain a quoted part, and a space is allowed inside a group only when it is within quotes. E.g.
a="1234" gg b=5678 c="1 2 3"
The result should have 4 groups:
a="1234"
gg
b=5678
c="1 2 3"
So far I have this
/[^\s]+(=".*?"|=".*?[^s]+|=[^\s]+|=)/g
but this cannot capture the second group "gg". I can't check if there is space before and after the text, as this will include the string that has space within quotes.
Any help will be greatly appreciated! Thanks.
Edited
This is for javascript

In JavaScript, you may use the following regex:
/\w+(?:=(?:"[^"]*"|\S+)?)?/g
See the regex demo.
Details
\w+ - 1+ letters, digits or/and _
(?:=(?:"[^"]*"|\S+)?)? - an optional sequence of:
= - an equal sign
(?:"[^"]*"|\S+)? - an optional sequence of:
"[^"]*" - a ", then 0+ chars other than " and then "
| - or
\S+ - 1+ non-whitespace chars
JS demo:
var rx = /\w+(?:=(?:"[^"]*"|\S+)?)?/g;
var s = 'a="1234" gg b=5678 c="1 2 3" d=abcd e=';
console.log(s.match(rx));

If I understood you correctly, this is what you are looking for:
\w+=(?|"([^"]*)"|(\d+))|(?|[a-z]+)
Think of alternation (|) as a fallback mechanism: put the more specific alternatives before the more generic ones.
Alternatively, you can remove the second ?| so that the bare word is captured in a separate group (Group 2), which you can then check.
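The (?|...) branch reset syntax comes from PCRE and is not available in JavaScript, which the question targets. A comparable effect can be sketched with ordinary capturing groups and a check for which group participated; parseTokens and the returned object shape are hypothetical choices made for this example.

```javascript
// Groups: 1 = key, 2 = quoted value, 3 = unquoted value, 4 = bare word.
const tokenRe = /(\w+)=(?:"([^"]*)"|(\S*))|(\w+)/g;

// Hypothetical helper: turn a line into { key, value } records.
function parseTokens(line) {
  return Array.from(line.matchAll(tokenRe), (m) =>
    m[4] !== undefined
      ? { key: m[4], value: null }                               // bare word such as gg
      : { key: m[1], value: m[2] !== undefined ? m[2] : m[3] }   // key=value
  );
}

console.log(parseTokens('a="1234" gg b=5678 c="1 2 3"'));
// [ { key: 'a', value: '1234' }, { key: 'gg', value: null },
//   { key: 'b', value: '5678' }, { key: 'c', value: '1 2 3' } ]
```

The more specific key=value alternative is listed first, so the bare-word branch only applies when no = follows the word.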
