Swift Regex for extracting words between parentheses

Hello, I want to extract the text between ().
For example:
(some text) some other text -> some text
(some) some other text -> some
(12345) some other text -> 12345
The string between the parentheses should be at most 10 characters long.
(TooLongStri) -> nothing matched, because it has 11 characters
What I currently have is:
let regex = try! NSRegularExpression(pattern: "\\(\\w+\\)", options: [])
regex.enumerateMatchesInString(text, options: [], range: NSMakeRange(0, (text as NSString).length))
{
(result, _, _) in
let match = (text as NSString).substringWithRange(result!.range)
if (match.characters.count <= 10)
{
print(match)
}
}
which works nicely, but the matches are:
(some text) some other text -> (some text)
(some) some other text -> (some)
(12345) some other text -> (12345)
and the <= 10 check is off because the parentheses are counted as well.
How can I change the code above to solve that? I would also like to remove the if (match.characters.count <= 10) check by extending the regex to hold the length info.

You can use
"(?<=\\()[^()]{1,10}(?=\\))"
See the regex demo
The pattern:
(?<=\\() - asserts the presence of a ( before the current position and fails the match if there is none
[^()]{1,10} - matches 1 to 10 characters other than ( and ) (replace [^()] with \w if you need to only match alphanumeric / underscore characters)
(?=\\)) - checks if there is a literal ) after the current position, and fails the match if there is none.
If you can adjust your code to get the value at Range 1 (capture group) you can use a simpler regex:
"\\(([^()]{1,10})\\)"
See the regex demo. The value you need is inside Capture group 1.
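To check both patterns against the examples from the question, here is a minimal sketch in JavaScript rather than Swift (an assumption made so the snippet runs as-is in a recent Node.js or browser; the sample text and variable names are illustrative). Regex literals need only a single backslash, unlike the doubled ones in the Swift string literals above.

```javascript
// Sample input built from the question's examples (illustrative).
const text = "(some text) some other text (12345) (TooLongStri)";

// Lookaround version: the parentheses are asserted, not consumed,
// so the whole match is already the inner text.
const lookaround = /(?<=\()[^()]{1,10}(?=\))/g;
const viaLookaround = text.match(lookaround) || [];

// Capture-group version: the match includes the parentheses,
// so read group 1 of every match instead.
const withGroup = /\(([^()]{1,10})\)/g;
const viaGroup = Array.from(text.matchAll(withGroup), (m) => m[1]);

console.log(viaLookaround); // [ 'some text', '12345' ]
console.log(viaGroup);      // [ 'some text', '12345' ]
```

Both variants skip (TooLongStri), because the {1,10} quantifier cannot reach the closing parenthesis within 10 characters.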

This will work
\((?=.{0,10}\)).+?\)
Regex Demo
This will also work
\((?=.{0,10}\))([^)]+)\)
Regex Demo
Regex Breakdown
\( #Match the bracket literally
(?=.{0,10}\)) #Lookahead to check there are between 0 to 10 characters till we encounter another )
([^)]+) #Match anything except )
\) #Match ) literally

Related

How to split string into parts using Regex

I have a string containing placeholders which I want replace with other strings, but I would also like to split the string whenever I encounter a placeholder.
So, by splitting I mean that
"This {0} is an example {1} with a placeholder"
should become:
parts[0] -> "This"
parts[1] -> "{0}"
parts[2] -> "is an example"
parts[3] -> "{1}"
parts[4] -> "with a placeholder"
and then the next step would be to replace the placeholders (this part is simple):
parts[0] -> "This"
parts[1] -> value[0]
parts[2] -> "is an example"
parts[3] -> value[1]
parts[4] -> "with a placeholder"
I know how to match and replace the placeholders (e.g. ({\d+})), but no clue how to tell regex to "match non placeholders" and "match placeholders" at the same time.
My idea was something like: (?!{\d+})+ | ({\d+}) but it's not working. I am doing this in JavaScript if Regex flavor is important.
If I can also replace the placeholders with a value in one step it would be neat, but I can also do this after I split.
You might write the pattern as:
{\d+}|\S.*?(?=\s*(?:{\d+}|$))
The pattern matches:
{\d+} Match { 1+ digits and }
| Or
\S.*? Match a non whitespace char followed by any character as few as possible
(?= Positive lookahead
\s* Match optional whitespace chars
(?:{\d+}|$) Match either { 1+ digits and } or assert the end of the string
) Close the lookahead
Regex demo
To get an array with those values:
const regex = /{\d+}|\S.*?(?=\s*(?:{\d+}|$))/gm;
const str = `This {0} is an example {1} with a placeholder`;
console.log(str.match(regex))
If you put parentheses (a capturing group) around the separator, the matched separators are included in the output:
let parts = str.split(/ *({\d+}) */);
See this demo at tio.run. If the separator occurs at the start or end of the string, just filter out the empty matches.
If your goal is just to replace, it can be done in one step using replace and a callback:
str = str.replace(/{(\d+)}/g, (m0, m1) => value[m1]);
Another demo at tio.run. m0 is the full match, m1 holds the capture of the first group.
The g (global) flag makes replace process every match in the string, not just the first.
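Putting the split and the one-step replace side by side, a minimal sketch could look like this. The values array and the splitParts helper are hypothetical names introduced for illustration, and the braces are escaped here so the patterns also stay valid if the u flag is added.

```javascript
// Hypothetical replacement values, indexed by placeholder number.
const values = ["apple", "banana"];
const str = "This {0} is an example {1} with a placeholder";

// Splitting on a capturing group keeps the placeholders in the result.
// filter() drops the empty strings produced when a placeholder sits at
// the very start or end of the input.
const splitParts = (s) => s.split(/ *(\{\d+\}) */).filter((p) => p !== "");

const parts = splitParts(str);
const edge = splitParts("{0} comes first");

// One-step replacement: m1 is the captured index inside the braces.
const replaced = str.replace(/\{(\d+)\}/g, (m0, m1) => values[m1]);

console.log(parts);    // [ 'This', '{0}', 'is an example', '{1}', 'with a placeholder' ]
console.log(edge);     // [ '{0}', 'comes first' ]
console.log(replaced); // This apple is an example banana with a placeholder
```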

RegEx match with lookaheads

I have been playing with regex and wonder whether I can achieve the following with it.
My test string looks something like this:
1. \"name\":\"asdsaD\"
2. target(name=adsada, name = asdasd , name=asds dfd ad,cccc=dsaasdas)
The result I want is to match \"name\":\"asdsaD\", name=adsada, name = asdasd and name=asds dfd ad
The pattern I tried: \\*"{0,1}(?:name)\\*"{0,1}\s{0,}[:=]\s{0,}\\*"{0,1}(.*?)([),]|\\*"{0,1})
https://regex101.com/r/6e58Cb/1
It does not match the way I want it to. When I change .*? to .*,
it matches the whole rest of the second line -> name=adsada, name = asdasd , name=asds dfd ad,cccc=dsaasdas)
You could try something like this, a little simpler, and works at least on your given example:
name[\\"\:\s=]+([a-zA-Z0-9\-\_]+)
Basically, it just finds "name" followed by any run of backslashes, quotes, colons, whitespace or equals signs, and captures the word that follows.
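To see what this simpler pattern captures on the sample input, here is a small JavaScript sketch (the language and variable names are choices made for illustration; String.raw keeps the backslashes of the first line literal). The redundant escapes inside the character class are dropped, which does not change what it matches.

```javascript
// The two sample lines from the question, with literal backslashes.
const input = [
  String.raw`1. \"name\":\"asdsaD\"`,
  "2. target(name=adsada, name = asdasd , name=asds dfd ad,cccc=dsaasdas)",
].join("\n");

// "name", then separators, then one word captured in group 1.
const simple = /name[\\":\s=]+([a-zA-Z0-9_-]+)/g;
const captured = Array.from(input.matchAll(simple), (m) => m[1]);

console.log(captured); // [ 'asdsaD', 'adsada', 'asdasd', 'asds' ]
```

Note that the capture stops at the first space, so the last value comes back as asds rather than asds dfd ad, and group 1 holds only the value, not the name= part.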
You could make use of a capturing group with a backreference \1 to what is captured, to get consistent matches for the backslash and the double quotes on both sides:
((?:\\")?)name\1\h*[:=]\h*\1[^:=,]+\1
((?:\\")?) Capture group 1, optionally match \"
name\1 Match name followed by a backreference to group 1
\h*[:=]\h* Match either : or = between 0+ horizontal whitespace chars
\1[^:=,]+\1 Match 1+ times any char except : = and , between 2 backreferences
Regex demo | Java demo
For example
String regex = "((?:\\\\\")?)name\\1\\h*[:=]\\h*\\1[^:=,]+\\1";
String string = "1. \\\"name\\\":\\\"asdsaD\\\"\n"
+ "2. target(name=adsada, name = asdasd , name=asds dfd ad,cccc=dsaasdas)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
\"name\":\"asdsaD\"
name=adsada
name = asdasd
name=asds dfd ad

Notepad++ REGEX Masking / Sanitise Data

There's a requirement to sanitise the Production file and then hand it over to a third party. The structure of the data (the number of characters and digits) should remain the same.
<ADD1<4, Privet Drive, Scotland, EC12 5FL, UK<
In the above example, we need to mask digits with 9, and letters with X or x (based on case).
The target data should be:
<ADD1<9, Xxxxxx Xxxxx, Xxxxxxxx, XX99 9XX, XX<
NP++ supposedly uses the boost::regex engine and, apparently, the Boost-extended replacement format string.
This means you can put a conditional within the replacement string to test
which group matched, then replace accordingly.
syntax: (?1yes:no) means: if group 1 matched, use yes, else use no
syntax: (?{1}yes:no) same
If it does use boost::regex, use the following.
Update: this masks only between <ADD1< and <
find (?:(?!^)\G|<ADD1<)[^a-zA-Z0-9<]*\K(?:([A-Z])|([a-z])|\d)
replace (?1X:(?2x:9))
Note - select the replacement string format as Boost Extended
if it is not the default.
https://regex101.com/r/pJCsZa/1
Regex info
(?:
(?! ^ )
\G # Start match where last left off
| # or,
<ADD1< # New start
)
[^a-zA-Z0-9<]* # Optional non-letter or digit or <
\K # Ignore matched characters up to here
(?: # What's left, a letter or a digit
( [A-Z] ) # (1)
| ( [a-z] ) # (2)
| \d
)
You should be able to do a series of replacements here. Do each replacement by searching in regex mode, and then use the appropriate replacement:
[A-Z] -> replace with X
[a-z] -> replace with x
[0-9] -> replace with 9
I suggest highlighting the entire address text and then doing the replacement.
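To check the expected output outside Notepad++, the same masking rule can be sketched in JavaScript. This is an illustration under assumptions, not the Notepad++ mechanism: maskChar and maskAddress are hypothetical helper names, and a replace callback stands in for both the conditional replacement string and the manual selection of the address text.

```javascript
// Map one character to its mask: X for upper case, x for lower case,
// 9 for anything else it is called with (only digits reach that branch).
const maskChar = (ch) => (/[A-Z]/.test(ch) ? "X" : /[a-z]/.test(ch) ? "x" : "9");

// Mask only the text that follows <ADD1< up to the next <,
// leaving the tag itself and all punctuation and spaces intact.
const maskAddress = (line) =>
  line.replace(/(<ADD1<)([^<]*)/g, (whole, tag, body) =>
    tag + body.replace(/[A-Za-z0-9]/g, maskChar)
  );

const masked = maskAddress("<ADD1<4, Privet Drive, Scotland, EC12 5FL, UK<");
console.log(masked); // <ADD1<9, Xxxxxx Xxxxx, Xxxxxxxx, XX99 9XX, XX<
```

Restricting the inner replace to the captured body is what keeps the ADD1 tag unmasked. Running the three plain replacements over a whole line would also turn the tag into XXX9.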

Relevant Regular Expression in scala

I want to keep only the last term of a string separated by dots
Example:
My string is:
abc"val1.val2.val3.val4"zzz
Expected string after applying the regex:
abc"val4"zzz
Which means I want to drop the dot-separated content on the left-hand side and keep only the last term.
The most relevant I tried was
val json="""abc"val1.val2.val3.val4"zzz"""
val sortie="""(([A-Za-z0-9]*)\.([A-Za-z0-9]*){2,10})\.([A-Za-z0-9]*)""".r.replaceAllIn(json, a=> a.group(3))
the result was:
abc".val4"zzz
Can you suggest a different regex solution, please?
Thanks
You may use
val s = """abc"val1.val2.val3.val4"zzz"""
val res = "(\\w+\")[^\"]*\\.([^\"]*\")".r replaceAllIn (s, "$1$2")
println(res)
// => abc"val4"zzz
See the Scala demo
Pattern details:
(\\w+\") - Group 1 capturing 1+ word chars and a "
[^\"]* - 0+ chars other than "
\\. - a dot
([^\"]*\") - Group 2 capturing 0+ chars other than " and then a ".
The $1 is the backreference to the first group and $2 inserts the text inside Group 2.
Maybe without Regex at all:
scala> json.split("\"").map(_.split("\\.").last).mkString("\"")
res4: String = abc"val4"zzz
This assumes you want each "token" (separated by ") to become the last dot-separated inner token.
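Both approaches can be tried quickly outside Scala. The sketch below is a JavaScript port (the language and the lastDotTerm helper are assumptions made for illustration); its second call shows what the assumption about tokens means in practice.

```javascript
const json = 'abc"val1.val2.val3.val4"zzz';

// Regex approach: keep the prefix up to the opening quote (group 1) and
// the text after the last dot up to the closing quote (group 2).
const viaRegex = json.replace(/(\w+")[^"]*\.([^"]*")/, "$1$2");

// Split approach: every quote-separated token is reduced to its last
// dot-separated term, including tokens outside the quotes.
const lastDotTerm = (s) =>
  s.split('"').map((token) => token.split(".").pop()).join('"');

const viaSplit = lastDotTerm(json);
const sideEffect = lastDotTerm('a.b"c.d"e');

console.log(viaRegex);   // abc"val4"zzz
console.log(viaSplit);   // abc"val4"zzz
console.log(sideEffect); // b"d"e
```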

How do I match the contents of parenthesis in a scala regular expression

I'm trying to get at the contents of a string like this (2.2,3.4) with a scala regular expression to obtain a string like the following 2.2,3.4
This will get me the string with parenthesis and all from a line of other text:
"""\(.*?\)"""
But I can't seem to find a way to get just the contents of the parenthesis.
I've tried: """\((.*?)\)""" """((.*?))""" and some other combinations, without luck.
I've used this one in the past in other Java apps: \\((.*?)\\), which is why I thought the first attempt in the line above """\((.*?)\)""" would work.
For my purposes, this looks something like:
var points = "pointA: (2.12, -3.48), pointB: (2.12, -3.48)"
var parenth_contents = """\((.*?)\)""".r;
val center = parenth_contents.findAllIn(points(0));
var cxy = center.next();
val cx = cxy.split(",")(0).toDouble;
Use Lookahead and Lookbehind
You can use this regex:
(?<=\()\d+\.\d+,\d+\.\d+(?=\))
Or, if you don't need precision inside the parentheses:
(?<=\()[^)]+(?=\))
See demo 1 and demo 2
Explanation
The lookbehind (?<=\() asserts that what precedes is a (
\d+\.\d+,\d+\.\d+ matches the string
or, in Option 2, [^)]+ matches any chars that are not a closing parenthesis
The lookahead (?=\)) asserts that what follows is a )
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
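A quick way to compare the two lookaround patterns is to run them against both inputs from the question. The sketch below uses JavaScript instead of Scala (an assumption made so it runs as-is; the variable names are illustrative).

```javascript
const strict = /(?<=\()\d+\.\d+,\d+\.\d+(?=\))/g; // digits.digits,digits.digits only
const loose = /(?<=\()[^)]+(?=\))/g;              // anything up to the closing )

const simple = "(2.2,3.4)";
const points = "pointA: (2.12, -3.48), pointB: (2.12, -3.48)";

const strictOnSimple = simple.match(strict); // [ '2.2,3.4' ]
const strictOnPoints = points.match(strict); // null: the space and minus sign do not fit
const looseOnPoints = points.match(loose);   // [ '2.12, -3.48', '2.12, -3.48' ]

// Turning the first pair into numbers; Number() ignores the leading space.
const [cx, cy] = looseOnPoints[0].split(",").map(Number);

console.log(strictOnSimple, strictOnPoints, looseOnPoints);
console.log(cx, cy); // 2.12 -3.48
```

For the points string from the question, only the second pattern matches, since the first one allows neither whitespace after the comma nor a sign.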
Maybe try this out:
val parenth_contents = "\\(([^)]+)\\)".r
parenth_contents: scala.util.matching.Regex = \(([^)]+)\)
val parenth_contents(r) = "(123, abc)"
r: String = 123, abc
An even simpler regex for matching every occurrence of the parentheses together with the content inside them:
(\([^)]+\)+)
1st Capturing Group (\([^)]+\)+)
\( matches the character ( literally (case sensitive)
Match a single character not present in the list below [^)]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
) matches the character ) literally (case sensitive)
\)+ matches the character ) literally (case sensitive)
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
https://regex101.com/r/MMNRRo/1
\((.*?)\) works - you just need to extract the matched group. The easiest way to do that is to use the unapplySeq method of scala.util.matching.Regex:
scala> val wrapped = raw"\((.*?)\)".r
wrapped: scala.util.matching.Regex = \((.*?)\)
val wrapped(r) = "(123,abc)"
r: String = 123,abc