RegEx match with lookaheads - regex

I have been playing with regex and want to know whether I can achieve the following with a single regex.
So my test string goes something like this ->
1. \"name\":\"asdsaD\"
2. target(name=adsada, name = asdasd , name=asds dfd ad,cccc=dsaasdas)
My desired result is to match \"name\":\"asdsaD\", name=adsada, name = asdasd and name=asds dfd ad
The solution I tried -> \\*"{0,1}(?:name)\\*"{0,1}\s{0,}[:=]\s{0,}\\*"{0,1}(.*?)([),]|\\*"{0,1})
https://regex101.com/r/6e58Cb/1
It does not match the way I want it to. I tried changing .*? to .*, but then
it matches the whole of the second line -> name=adsada, name = asdasd , name=asds dfd ad,cccc=dsaasdas)

You could try something like this, a little simpler, and works at least on your given example:
name[\\"\:\s=]+([a-zA-Z0-9\-\_]+)
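Basically, it finds "name" followed by one or more equals signs, colons, backslashes, quotes or whitespace characters, and captures the word that follows. Note that the capture stops at the first space, so name=asds dfd ad yields only asds.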
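A minimal Java sketch of the simpler pattern above, run against the test string from the question. The class and method names are made up for illustration, and the redundant escapes inside the character classes are dropped (an assumption that does not change what the pattern matches).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleNameCapture {
    // Regex: name[\\":\s=]+([a-zA-Z0-9_-]+)
    static final Pattern NAME =
            Pattern.compile("name[\\\\\":\\s=]+([a-zA-Z0-9_-]+)");

    // Collects the text captured by group 1 for every match.
    static List<String> captures(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = NAME.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "1. \\\"name\\\":\\\"asdsaD\\\"\n"
                + "2. target(name=adsada, name = asdasd , name=asds dfd ad,cccc=dsaasdas)";
        // The last value is cut at the first space because the
        // capture class does not include whitespace.
        System.out.println(captures(input));
    }
}
```

If values containing spaces must be captured whole, the backreference approach in the next answer is the better fit.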

You could make use of a capturing group with a backreference \1 to what is captured to get consistent matches for the backslash and the double quotes on both sides:
((?:\\")?)name\1\h*[:=]\h*\1[^:=,]+\1
((?:\\")?) Capture group 1, optionally match \"
name\1 Match name followed by a backreference to group 1
\h*[:=]\h* Match either : or = between 0+ horizontal whitespace chars
\1[^:=,]+\1 Match 1+ times any char except : = and , between 2 backreferences
Regex demo | Java demo
For example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String regex = "((?:\\\\\")?)name\\1\\h*[:=]\\h*\\1[^:=,]+\\1";
        String string = "1. \\\"name\\\":\\\"asdsaD\\\"\n"
                + "2. target(name=adsada, name = asdasd , name=asds dfd ad,cccc=dsaasdas)";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            // group(0) is the whole match
            System.out.println(matcher.group(0));
        }
    }
}
Output
\"name\":\"asdsaD\"
name=adsada
name = asdasd
name=asds dfd ad

Related

How to split string into parts using Regex

I have a string containing placeholders which I want replace with other strings, but I would also like to split the string whenever I encounter a placeholder.
So, by splitting I mean that
"This {0} is an example {1} with a placeholder"
should become:
parts[0] -> "This"
parts[1] -> "{0}"
parts[2] -> "is an example"
parts[3] -> "{1}"
parts[4] -> "with a placeholder"
and then the next step would be to replace the placeholders (this part is simple):
parts[0] -> "This"
parts[1] -> value[0]
parts[2] -> "is an example"
parts[3] -> value[1]
parts[4] -> "with a placeholder"
I know how to match and replace the placeholders (e.g. ({\d+})), but no clue how to tell regex to "match non placeholders" and "match placeholders" at the same time.
My idea was something like: (?!{\d+})+ | ({\d+}) but it's not working. I am doing this in JavaScript if Regex flavor is important.
If I can also replace the placeholders with a value in one step it would be neat, but I can also do this after I split.
You might write the pattern as:
{\d+}|\S.*?(?=\s*(?:{\d+}|$))
The pattern matches:
{\d+} Match { 1+ digits and }
| Or
\S.*? Match a non whitespace char followed by any character as few as possible
(?= Positive lookahead
\s* Match optional whitespace chars
(?:{\d+}|$) Match either { 1+ digits and } or assert the end of the string
) Close the lookahead
Regex demo
To get an array with those values:
const regex = /{\d+}|\S.*?(?=\s*(?:{\d+}|$))/gm;
const str = `This {0} is an example {1} with a placeholder`;
console.log(str.match(regex))
If you use parentheses around the separator in split, the matched separators are included in the output:
let parts = str.split(/ *({\d+}) */);
See this demo at tio.run. If a separator occurs at the start or end of the string, just filter out the resulting empty strings.
If your goal is just to replace, it can be done in one step using replace and a callback:
str = str.replace(/{(\d+)}/g, (m0, m1) => value[m1]);
Another demo at tio.run - m0 is the full match, m1 holds the capture of the first group.
The g (global) flag makes replace act on every match in the string, not just the first.
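For comparison, here is a rough sketch of the same two steps (tokenizing, then replacing) in Java rather than JavaScript. It reuses the first pattern above with the braces escaped, which Java requires. The class and method names are hypothetical, and the function-based replaceAll needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSplit {
    // {\d+}  or  a run of text up to the next placeholder / end of input
    static final Pattern TOKEN =
            Pattern.compile("\\{\\d+\\}|\\S.*?(?=\\s*(?:\\{\\d+\\}|$))");
    static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d+)\\}");

    // Splits the template into alternating text and placeholder parts.
    static List<String> parts(String template) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(template);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Replaces each {n} with values[n] in one pass.
    static String fill(String template, String[] values) {
        return PLACEHOLDER.matcher(template).replaceAll(
                r -> Matcher.quoteReplacement(values[Integer.parseInt(r.group(1))]));
    }

    public static void main(String[] args) {
        String template = "This {0} is an example {1} with a placeholder";
        System.out.println(parts(template));
        System.out.println(fill(template, new String[] {"A", "B"}));
    }
}
```

Matcher.quoteReplacement keeps any $ or backslash in the inserted values from being read as a group reference.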

Regular Expression how to stop at first match

My regex pattern looks like this:
(?<=_)(.*?)(?=_)
I want to replace only the first match between two underscores (it is not always AVVI, it can be something different, and the same goes for AVVIDI):
T_AVVI_EINZELPOSTEN_TEST -> T_AVVIDI_EINZELPOSTEN_TEST
My regex pattern matches both AVVI and EINZELPOSTEN. How can I modify my regex to find only the first match, AVVI?
code:
private Identifier addPrefix(final Identifier identifier) {
    if (isExcluded(identifier)) {
        return identifier;
    }
    Pattern p = Pattern.compile("(?<=_)(.*?)(?=_)");
    Matcher m = p.matcher(identifier.getText());
    return Identifier.toIdentifier(m.replaceAll(prefix));
}
You can do this with .replaceFirst, a start anchor and a capture group:
String line = identifier.getText()
    .replaceFirst("^([^_]*_)[^_]*(?=_)", "$1AVVIDI");
RegEx Demo
RegEx Breakup:
^: Start
([^_]*_): Capture group #1 to match 0 or more chars that are not _ followed by a _
[^_]*: Match 0 or more chars that are not _
(?=_): Positive lookahead to assert presence of _ at next position
$1AVVIDI: to replace with value in capture group #1 followed by text AVVIDI
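A self-contained sketch of the replaceFirst approach, with the replacement passed in as a parameter instead of the hard-coded AVVIDI. The class and method names are hypothetical. Quoting the replacement is an added precaution, not part of the original answer.

```java
import java.util.regex.Matcher;

class ReplaceFirstSegment {
    // Replaces only the segment between the first and second underscore.
    static String replaceFirstSegment(String text, String replacement) {
        return text.replaceFirst(
                "^([^_]*_)[^_]*(?=_)",
                // "$1" re-inserts the prefix up to and including the first "_";
                // quoteReplacement protects "$" and "\" in the new value.
                "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstSegment("T_AVVI_EINZELPOSTEN_TEST", "AVVIDI"));
    }
}
```

Input with fewer than two underscores is returned unchanged, because the lookahead (?=_) cannot succeed.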

Multiline Regex with opening and closing word

I must admit I'm a beginner when it comes to regular expressions.
I have an app written in C# that looks for certain Regex expressions in text files. I'm not sure how to explain my problem so I will go straight to example.
My text:
DeviceNr : 30
DeviceClass = ABC
UnitNr = 1
Reference = 29
PhysState = ENABLED
LogState = OPERATIVE
DevicePlan = 702
Manufacturer = CDE
Model = EFG
ready
DeviceNr : 31
DeviceClass = ABC
UnitNr = 9
Reference = 33
PhysState = ENABLED
LogState = OPERATIVE
Manufacturer = DDD
Model = XYZ
Description = something here
ready
I need to match a multiline block that starts with the word "DeviceNr", ends with "ready", and contains both "DeviceClass = ABC" and "Model = XYZ". I can only assume that these lines appear in this exact order; I cannot assume anything about what is between them, not even the number of other lines. I tried the regex below, but it matched the whole text instead of only the DeviceNr : 31 block:
DeviceNr : ([0-9]+)(?:.*?\n)*? DeviceClass = ABC(?:.*?\n)*? Model = XYZ(?:.*?\n)*?ready\n\n
If you know that "DeviceClass = ABC" and "Model = XYZ" are present and in that order, you can use a lookahead assertion on a per-line basis: first match all lines that do not start with, for example, DeviceClass =, then match the line that does, and repeat the same approach for Model and ready.
^\s*DeviceNr : ([0-9]+)(?:\r?\n(?!\s*DeviceClass =).*)*\r?\n\s*DeviceClass = ABC\b(?:\r?\n(?!\s*Model =).*)*\r?\n\s*Model = XYZ\b(?:\r?\n(?!\s*ready).*)*\r?\n\s*ready\b
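^ Start of string (with the multiline option enabled it also matches at the start of each line, which is needed when the wanted block is not the first one in the text)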
\s*DeviceNr : ([0-9]+) Match DeviceNr : and capture 1+ digits 0-9 in group 1
(?: Non capture group
\r?\n(?!\s*DeviceClass =).* Match a newline, assert that the next line does not start with DeviceClass = (after optional whitespace), then match the rest of that line
)* Close non capture group and optionally repeat, as you don't know how many lines there are
\r?\n\s*DeviceClass = ABC\b Match a newline, optional whitespace chars and DeviceClass = ABC
(?:\r?\n(?!\s*Model =).*)*\r?\n\s*Model = XYZ\b The previous approach also for Model =
(?:\r?\n(?!\s*ready).*)*\r?\n\s*ready\b And the same approach for ready
Regex demo
Note that \s can also match a newline. If you want to prevent that, you can also use [^\S\r\n] to match a whitespace char without a newline.
Regex demo
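The per-line lookahead pattern can be tried outside C# as well. The following is a rough Java sketch, assuming the MULTILINE flag so that ^ matches at each line start. The shortened SAMPLE text and the class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeviceBlockMatch {
    // Shortened, made-up sample: only the second block has Model = XYZ.
    static final String SAMPLE =
            "DeviceNr : 30\n DeviceClass = ABC\n Model = EFG\nready\n"
            + "DeviceNr : 31\n DeviceClass = ABC\n UnitNr = 9\n Model = XYZ\n"
            + " Description = something here\nready\n";

    static final Pattern BLOCK = Pattern.compile(
            "^\\s*DeviceNr : ([0-9]+)"
            // skip lines until one starts with "DeviceClass ="
            + "(?:\\r?\\n(?!\\s*DeviceClass =).*)*\\r?\\n\\s*DeviceClass = ABC\\b"
            // skip lines until one starts with "Model ="
            + "(?:\\r?\\n(?!\\s*Model =).*)*\\r?\\n\\s*Model = XYZ\\b"
            // skip lines until one starts with "ready"
            + "(?:\\r?\\n(?!\\s*ready).*)*\\r?\\n\\s*ready\\b",
            Pattern.MULTILINE);

    // Returns the DeviceNr of the first matching block, or null if none matches.
    static String firstDeviceNr(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDeviceNr(SAMPLE));
    }
}
```

The first block is rejected because the line-skipping group stops at its Model = EFG line, where Model = XYZ then fails to match.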
The issue is that you want to match 'DeviceNr : 31' followed by 'DeviceClass = ABC' (possibly with some intervening characters), followed by 'Model = XYZ' (again possibly with some intervening characters), followed by 'ready' (again possibly with some intervening characters), while making sure that none of those intervening characters are actually the start of another 'DeviceNr' section.
So to match arbitrary intervening characters with the above enforcement, we can use the following regex expression that uses a negative lookahead assertion:
(?:(?!DeviceNr)[\s\S])*?
(?: - Start of a non-capturing group
(?!DeviceNr) - Asserts that the next characters of the input are not 'DeviceNr'
[\s\S] - Matches a whitespace or non-whitespace character, i.e. any character
) end of the non-capturing group
*? non-greedily match 0 or more characters as long as the next input does not match 'DeviceNr'
Then it's a simple matter to use the above regex repeatedly as follows:
DeviceNr : (\d+)\n(?:(?!DeviceNr)[\s\S])*?DeviceClass = ABC\n(?:(?!DeviceNr)[\s\S])*?Model = XYZ\n(?:(?!DeviceNr)[\s\S])*?ready
See Regex Demo
Capture Group 1 will have the DeviceNr value.
Important Note
The above regex is quite expensive in terms of the number of steps required for execution since it must check the negative lookahead assertion at just about every character position once it has matched DeviceNr : (\d+).
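As a sketch of how the repeated negative-lookahead gap behaves, here is the same pattern assembled in Java from a reusable GAP fragment. The sample text is a shortened, made-up version of the question's data with \n line endings, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedDeviceMatch {
    static final String SAMPLE =
            "DeviceNr : 30\n DeviceClass = ABC\n Model = EFG\nready\n"
            + "DeviceNr : 31\n DeviceClass = ABC\n UnitNr = 9\n Model = XYZ\n"
            + " Description = something here\nready\n";

    // Any characters, lazily, but never stepping onto the start of "DeviceNr".
    static final String GAP = "(?:(?!DeviceNr)[\\s\\S])*?";

    static final Pattern BLOCK = Pattern.compile(
            "DeviceNr : (\\d+)\\n" + GAP
            + "DeviceClass = ABC\\n" + GAP
            + "Model = XYZ\\n" + GAP
            + "ready");

    // Returns the DeviceNr values of all blocks that match.
    static List<String> matchingDeviceNrs(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matchingDeviceNrs(SAMPLE));
    }
}
```

Because GAP refuses to consume the D of the next DeviceNr, a match that starts in block 30 cannot reach the Model = XYZ line of block 31.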

Relevant Regular Expression in scala

I want to keep only the last term of a string separated by dots
Example:
My string is:
abc"val1.val2.val3.val4"zzz
Expected string after I apply the regex:
abc"val4"zzz
In other words, I want to drop the content to the left of the last dot (.) and keep only what follows it.
The closest attempt I made was
val json="""abc"val1.val2.val3.val4"zzz"""
val sortie="""(([A-Za-z0-9]*)\.([A-Za-z0-9]*){2,10})\.([A-Za-z0-9]*)""".r.replaceAllIn(json, a=> a.group(3))
the result was:
abc".val4"zzz
Can you tell me if you have different solution for regex please?
Thanks
You may use
val s = """abc"val1.val2.val3.val4"zzz"""
val res = "(\\w+\")[^\"]*\\.([^\"]*\")".r replaceAllIn (s, "$1$2")
println(res)
// => abc"val4"zzz
See the Scala demo
Pattern details:
(\\w+\") - Group 1 capturing 1+ word chars and a "
[^\"]* - 0+ chars other than "
\\. - a dot
([^\"]*\") - Group 2 capturing 0+ chars other than " and then a ".
The $1 is the backreference to the first group and $2 inserts the text inside Group 2.
Maybe without Regex at all:
scala> json.split("\"").map(_.split("\\.").last).mkString("\"")
res4: String = abc"val4"zzz
This assumes you want each "token" (separated by ") to become the last dot-separated inner token.
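Scala's Regex is a wrapper around java.util.regex, so the regex answer above can be checked directly in Java. This is a small sketch with a hypothetical class and method name.

```java
class LastDottedTerm {
    // Keeps the leading word and opening quote (group 1), drops everything up to
    // the last dot inside the quotes, and keeps the final term plus the closing
    // quote (group 2).
    static String keepLastTerm(String s) {
        return s.replaceAll("(\\w+\")[^\"]*\\.([^\"]*\")", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(keepLastTerm("abc\"val1.val2.val3.val4\"zzz"));
    }
}
```

The greedy [^\"]* first runs to the closing quote and then backtracks to the last dot, which is why only val4 survives.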

Swift Regex for extracting words between parenthesis

Hello, I want to extract the text between parentheses ().
For example :
(some text) some other text -> some text
(some) some other text -> some
(12345) some other text -> 12345
The maximum length of the string between the parentheses should be 10 characters.
(TooLongStri) -> nothing matched because it has 11 characters
What I currently have is:
let regex = try! NSRegularExpression(pattern: "\\(\\w+\\)", options: [])
regex.enumerateMatchesInString(text, options: [], range: NSMakeRange(0, (text as NSString).length))
{
(result, _, _) in
let match = (text as NSString).substringWithRange(result!.range)
if (match.characters.count <= 10)
{
print(match)
}
}
This works nicely, but the matches are:
(some text) some other text -> (some text)
(some) some other text -> (some)
(12345) some other text -> (12345)
and the <= 10 check is off because the parentheses are counted as well.
How can I change the code above to solve that? I would also like to remove the if (match.characters.count <= 10) check by extending the regex to hold the length info.
You can use
"(?<=\\()[^()]{1,10}(?=\\))"
See the regex demo
The pattern:
(?<=\\() - asserts the presence of a ( before the current position and fails the match if there is none
[^()]{1,10} - matches 1 to 10 characters other than ( and ) (replace [^()] with \w if you need to only match alphanumeric / underscore characters)
(?=\\)) - checks that there is a literal ) after the current position, and fails the match if there is none.
If you can adjust your code to get the value at Range 1 (capture group) you can use a simpler regex:
"\\(([^()]{1,10})\\)"
See the regex demo. The value you need is inside Capture group 1.
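To illustrate the capture-group variant outside Swift, here is a minimal Java sketch that collects group 1 for every match. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenText {
    // "(" + 1 to 10 characters that are not parentheses (group 1) + ")"
    static final Pattern SHORT_PAREN = Pattern.compile("\\(([^()]{1,10})\\)");

    // Returns the text between parentheses when it is 1 to 10 characters long.
    static List<String> extract(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = SHORT_PAREN.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract("(some text) some other text"));
        System.out.println(extract("(TooLongStri)"));
    }
}
```

The length limit lives in the {1,10} quantifier, so no separate length check on the result is needed.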
This will work
\((?=.{0,10}\)).+?\)
Regex Demo
This will also work
\((?=.{0,10}\))([^)]+)\)
Regex Demo
Regex Breakdown
\( #Match the bracket literally
(?=.{0,10}\)) #Lookahead to check there are between 0 to 10 characters till we encounter another )
([^)]+) #Match anything except )
\) #Match ) literally