Swift and regex: CPU goes haywire for some strings

I want to match a localization line with a regex. Everything works fine except when trying to match this string. You can put the code in a playground to see that it never finishes, or in a blank project to see the CPU go to 100% and get stuck at the 'let match' line. The interesting thing is that if you delete the last word, it works. I don't know whether it also happens with Chinese or other non-Latin characters; this is Greek.
let lineContent = "\"key\" = \" Χρήση παλιάς συνόμευση\";"
if let r = try? NSRegularExpression(pattern: "\"(.*)+\"(^|[ ]*)=(^|[ ]*)\"(.*)+\";", options: NSRegularExpressionOptions()) {
    let match = r.matchesInString(lineContent, options: NSMatchingOptions(), range: NSMakeRange(0, lineContent.characters.count))
    match.count
}
Later edit: the character type doesn't actually matter; the number of words does. This string on the right-hand side also fails: 'jhg jhgjklkhjkh hhhhh hhh'

You have nested quantifiers in (.*)+ that will lead to catastrophic backtracking (I recommend reading that article). When a subexpression fails, the regex engine backtracks to test another alternative. With nested quantifiers, the number of attempts grows exponentially with the length of the subject string: the engine tests every repetition count of (.*)+ and, for each, every repetition count of .*.
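To get a feel for the growth, here is a rough model in JavaScript rather than a trace of any real engine. It counts the ways a run of n characters can be divided into consecutive non-empty chunks, which is the kind of search space a nested quantifier such as (.+)+ opens up when the overall match fails. The function name countSplits is made up for this sketch, and real engines apply optimizations this model ignores.

```javascript
// Count the ways a run of n characters can be divided into consecutive
// non-empty chunks. A backtracking engine facing a nested quantifier such as
// (.+)+ may have to try each division before it can report a failure.
function countSplits(n) {
  if (n === 0) return 1; // nothing left: exactly one way to finish
  let ways = 0;
  // The first chunk takes `first` characters; the rest is split recursively.
  for (let first = 1; first <= n; first++) {
    ways += countSplits(n - first);
  }
  return ways;
}

for (const n of [5, 10, 20]) {
  console.log(n, countSplits(n)); // 16, 512, 524288
}
```

The count doubles with every extra character (2^(n-1)), which is why a few more words in the string are enough to make the match appear to hang.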
To avoid catastrophic backtracking, make the pattern as specific as you can:
"\"([^\"]+)\"[ ]*=[ ]*\"([^\"]*)\";"
\"([^\"]+)\" matches:
An opening "
[^\"]+ One or more characters other than quotes. Change the + to * to allow empty strings.
A closing "
Code
let lineContent = "\"key\" = \" Χρήση παλιάς συνόμευση\";"
if let r = try? NSRegularExpression(pattern: "\"([^\"]+)\"[ ]*=[ ]*\"([^\"]*)\";", options: NSRegularExpressionOptions()) {
    let match = r.matchesInString(
        lineContent,
        options: NSMatchingOptions(),
        // NSRange counts UTF-16 code units, so use utf16.count rather than characters.count
        range: NSMakeRange(0, lineContent.utf16.count)
    )
    // Print each capture group of the first match, if there is one
    if let first = match.first {
        for index in 1..<first.numberOfRanges {
            print((lineContent as NSString).substringWithRange(first.rangeAtIndex(index)))
        }
    }
}
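For comparison, the same specific pattern can be tried in JavaScript, whose RegExp syntax accepts it without the string escaping. This is only a sketch: parseLocalizationLine is a hypothetical helper name, and it assumes one "key" = "value"; pair per line.

```javascript
// Hypothetical helper: parse one `"key" = "value";` localization line.
// Returns null when the line does not have that shape.
function parseLocalizationLine(line) {
  // Negated character classes leave the engine only one way to match
  // each quoted part, so there is nothing to backtrack into.
  const pattern = /"([^"]+)"[ ]*=[ ]*"([^"]*)";/;
  const m = pattern.exec(line);
  return m ? { key: m[1], value: m[2] } : null;
}

console.log(parseLocalizationLine('"key" = " Χρήση παλιάς συνόμευση";'));
console.log(parseLocalizationLine('"key" = "missing terminator"')); // null
```

The second call returns immediately with null, where the nested-quantifier pattern would keep retrying.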

As already mentioned in the comments, the (.*)+ is causing catastrophic backtracking, which explains the high CPU usage (and, in general, the failure to match).
Instead of using a pattern like
\"(.*)+\"
use a negated character set, since you're matching everything between the double quotes:
\"([^\"]+)\"

As per the comment above - replace the nested (.*)+ with a lazy version - (.*?).

Related

How to extract a text in HTML tag? [duplicate]

I have found very similar posts, but I can't quite get my regular expression right here.
I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".
My cow always gives milk
would return
"always gives"
Here is the expression I have pieced together so far:
(?=cow).*(?=milk)
However, this returns the string "cow always gives".
A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).
You want a regular match here, to consume the cow portion. To capture the portion in between, you use a capturing group (just put the portion of the pattern you want to capture inside parentheses):
cow(.*)milk
No lookaheads are needed at all.
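A small JavaScript sketch of the difference, using the sentence from the question (the variable names are only for illustration):

```javascript
const cowSentence = "My cow always gives milk";

// The lookahead (?=cow) consumes nothing, so .* starts matching at "cow" itself.
const withLookaheads = cowSentence.match(/(?=cow).*(?=milk)/)[0];

// Matching cow normally and capturing what follows leaves it out of group 1.
const withGroup = cowSentence.match(/cow(.*)milk/)[1];

console.log(JSON.stringify(withLookaheads)); // "cow always gives "
console.log(JSON.stringify(withGroup));      // " always gives "
```

Note that group 1 still carries the spaces around the words; trim it or put the spaces into the pattern if they are not wanted.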
Regular expression to get a string between two strings in JavaScript
The most complete solution, which works in the vast majority of cases, is a capturing group with a lazy dot matching pattern. However, a dot . in a JavaScript regex does not match line break characters, so what works in all cases is a construct such as [^] or [\s\S] / [\d\D] / [\w\W].
ECMAScript 2018 and newer compatible solution
In JavaScript environments supporting ECMAScript 2018, the s modifier allows . to match any char including line break chars, and the regex engine supports lookbehinds of variable length. So, you may use a regex like
var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional
In both cases, the current position is checked for cow followed by one or more (first regex) or zero or more (second regex) whitespace characters, then any 0+ chars, as few as possible, are matched and consumed (= added to the match value), and then milk is checked for (preceded by one or more, or zero or more, whitespace characters respectively).
Scenario 1: Single-line input
This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.
cow (.*?) milk
cow is found first, then a space, then any 0+ chars other than line break chars, as few as possible as *? is a lazy quantifier, are captured into Group 1 and then a space with milk must follow (and those are matched and consumed, too).
Scenario 2: Multiline input
cow ([\s\S]*?) milk
Here, cow and a space are matched first, then any 0+ chars as few as possible are matched and captured into Group 1, and then a space with milk are matched.
Scenario 3: Overlapping matches
If you have a string like >>>15 text1>>>67 text2>>> and you need to get 2 matches in-between >>>+number+whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g as this will only find 1 match due to the fact the >>> before 67 is already consumed upon finding the first match. You may use a positive lookahead to check for the text presence without actually "gobbling" it (i.e. appending to the match):
/>>>\d+\s(.*?)(?=>>>)/g
See the online regex demo yielding text1 and text2 as Group 1 contents found.
Also see How to get all possible overlapping matches for a string.
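The two patterns from Scenario 3 can be compared side by side. This sketch assumes an environment with String#matchAll; the variable names are illustrative only.

```javascript
const arrows = ">>>15 text1>>>67 text2>>>";

// Consuming the closing >>> leaves nothing for the next match to start from.
const consuming = Array.from(arrows.matchAll(/>>>\d+\s(.*?)>>>/g), m => m[1]);

// The lookahead only checks for >>>, so the next match can begin with it.
const peeking = Array.from(arrows.matchAll(/>>>\d+\s(.*?)(?=>>>)/g), m => m[1]);

console.log(consuming); // [ 'text1' ]
console.log(peeking);   // [ 'text1', 'text2' ]
```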
Performance considerations
Lazy dot matching pattern (.*?) inside regex patterns may slow down script execution if very long input is given. In many cases, unroll-the-loop technique helps to a greater extent. Trying to grab all between cow and milk from "Their\ncow\ngives\nmore\nmilk", we see that we just need to match all lines that are not exactly milk, thus, instead of cow\n([\s\S]*?)\nmilk we can use:
/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm
See the regex demo (if there can be \r\n, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm). With this small test string, the performance gain is negligible, but with very large text, you will feel the difference (especially if the lines are long and line breaks are not very numerous).
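As a quick check that the unrolled pattern captures the same text as the lazy one, here is a sketch on the small test string from above. It makes no timing claims; it only compares the captured groups.

```javascript
const lines = "Their\ncow\ngives\nmore\nmilk";

// Lazy version: expands one character at a time until \nmilk follows.
const lazyMatch = /cow\n([\s\S]*?)\nmilk/.exec(lines);

// Unrolled version: takes whole lines, stopping before a line that is exactly "milk".
const unrolledMatch = /cow\n(.*(?:\n(?!milk$).*)*)\nmilk/m.exec(lines);

console.log(JSON.stringify(lazyMatch[1]));     // "gives\nmore"
console.log(JSON.stringify(unrolledMatch[1])); // "gives\nmore"
```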
Sample regex usage in JavaScript:
// Single/first match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);
// Multiple matches: get multiple matches with a global modifier and
// trim the results if the length of the leading/trailing delimiters is known
var s = "My cow always gives milk, their cow also gives milk";
console.log(s.match(/cow (.*?) milk/g).map(function(x) {return x.slice(4, -5);}));
// or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m=rx.exec(s)) !== null) {
    result.push(m[1]);
}
console.log(result);
Using the modern String#matchAll method
const s = "My cow always gives milk, their cow also gives milk";
const matches = s.matchAll(/cow (.*?) milk/g);
console.log(Array.from(matches, x => x[1]));
Here's a regex which will grab what's between cow and milk (without leading/trailing space):
var srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
var newtext = srctext.replace(re, "$2");
An example: http://jsfiddle.net/entropo/tkP74/
You need to capture the .*
You can (but don't have to) make the .* nongreedy
There's really no need for the lookahead.
> /cow(.*?)milk/i.exec('My cow always gives milk');
["cow always gives milk", " always gives "]
The chosen answer didn't work for me...hmm...
Just add space after cow and/or before milk to trim spaces from " always gives "
/(?<=cow ).*(?= milk)/
I find regex to be tedious and time consuming given the syntax. Since you are already using javascript it is easier to do the following without regex:
const text = 'My cow always gives milk'
const start = `cow`;
const end = `milk`;
const middleText = text.split(start)[1].split(end)[0]
console.log(middleText) // prints " always gives " (with the surrounding spaces)
You can use the method match() to extract a substring between two strings. Try the following code:
var str = "My cow always gives milk";
var subStr = str.match("cow(.*)milk");
console.log(subStr[1]);
Output:
always gives
See a complete example here : How to find sub-string between two strings.
I was able to get what I needed using Martinho Fernandes' solution below. The code is:
var test = "My cow always gives milk";
var testRE = test.match("cow(.*)milk");
alert(testRE[1]);
You'll notice that I am reading testRE as an array. That is because match() returns an array: index 0 holds the whole match and index 1 holds the first capture group. The output from:
My cow always gives milk
Changes into:
always gives
Just use the following regular expression:
(?<=My cow\s).*?(?=\smilk)
If the data is on multiple lines then you may have to use the following,
/My cow ([\s\S]*)milk/gm
My cow always gives
milk
You can use destructuring to only focus on the part of your interest.
So you can do:
let str = "My cow always gives milk";
let [, result] = str.match(/\bcow\s+(.*?)\s+milk\b/) || [];
console.log(result);
In this way you ignore the first part (the complete match) and only get the capture group's match. The addition of || [] may be interesting if you are not sure there will be a match at all. In that case match would return null, which cannot be destructured, so we fall back to [] instead, and then result will be undefined.
The additional \b ensures the surrounding words "cow" and "milk" are really separate words (e.g. not "milky"). Also \s+ is needed to avoid that the match includes some outer spacing.
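A short sketch of both outcomes, wrapping the destructuring in a function whose name (betweenCowAndMilk) is made up for the example:

```javascript
// Returns the text between the whole words "cow" and "milk",
// or undefined when the pattern does not match.
function betweenCowAndMilk(str) {
  // `|| []` turns a null result into an empty array that can be destructured.
  const [, result] = str.match(/\bcow\s+(.*?)\s+milk\b/) || [];
  return result;
}

console.log(betweenCowAndMilk("My cow always gives milk"));      // always gives
console.log(betweenCowAndMilk("My cow always gives milky tea")); // undefined
```

The second call shows the effect of \b: "milky" does not count as the word "milk", so there is no match.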
The method match() searches a string for a match and returns an Array object.
// Original string
var str = "My cow always gives milk";
// Using index [0] would return
// "cow always gives milk"
str.match(/cow(.*)milk/)[0]
// Using index [1] would return
// " always gives "
str.match(/cow(.*)milk/)[1]
Task
Extract the substring between two strings (excluding the two strings themselves)
Solution
let allText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum";
let textBefore = "five centuries,";
let textAfter = "electronic typesetting";
var regExp = new RegExp(`(?<=${textBefore}\\s)(.+?)(?=\\s+${textAfter})`, "g");
var results = regExp.exec(allText);
if (results && results.length > 1) {
    console.log(results[0]);
}
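When the delimiters come from variables, as in the solution above, any regex metacharacters in them would be interpreted as pattern syntax. A hedged sketch of one way to guard against that: escapeRegExp and textBetween are illustrative helper names, not part of the answer, and a capturing group is used instead of lookbehind so it does not depend on ES2018 support.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the text between `before` and `after`, without the surrounding
// whitespace, or null when either delimiter is missing.
function textBetween(haystack, before, after) {
  const rx = new RegExp(
    escapeRegExp(before) + "\\s*([\\s\\S]*?)\\s*" + escapeRegExp(after)
  );
  const m = rx.exec(haystack);
  return m ? m[1] : null;
}

console.log(textBetween("price (USD) is 42 [approx]", "(USD)", "[approx]")); // is 42
console.log(textBetween("My cow always gives milk", "cow", "milk"));         // always gives
```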

How to search for only whole words in a Swift String

I have this NSRegularExpression search. searchString passes in a String which I would like to search for in baseString and highlight. However, at the moment, if I search for the word 'I', the 'i' in the word 'hide', for example, appears highlighted.
I've seen that I can use \b to search for only whole words, but I can't see where to add it to the expression so that only whole words are highlighted.
Another example: if my baseString contains 'His story is history' and I use searchString to search for 'his', it will also highlight the 'his' in 'history'.
let regex = try! NSRegularExpression(pattern: searchString as! String,options: .caseInsensitive)
for match in regex.matches(in: baseString!, options: NSRegularExpression.MatchingOptions(), range: NSRange(location: 0, length: (baseString?.characters.count)!)) as [NSTextCheckingResult] {
    attributed.addAttribute(NSBackgroundColorAttributeName, value: UIColor.yellow, range: match.range)
}
You can easily create a regex pattern from your searchString:
let baseString = "His story is history"
let searchString = "his" //This needs to be a single word
let attributed = NSMutableAttributedString(string: baseString)
//Create a regex pattern matching with word boundaries
let searchPattern = "\\b"+NSRegularExpression.escapedPattern(for: searchString)+"\\b"
let regex = try! NSRegularExpression(pattern: searchPattern, options: .caseInsensitive)
for match in regex.matches(in: baseString, range: NSRange(0..<baseString.utf16.count)) {
    attributed.addAttribute(NSBackgroundColorAttributeName, value: UIColor.yellow, range: match.range)
}
Some comments:
Assuming baseString and searchString are non-Optional String in the code above, if not, make them so as soon as possible, before searching.
An empty OptionSet is represented by [], so options: NSRegularExpression.MatchingOptions() in your code can be simplified to options: []. It is also the default value of the options: parameter of the matches method, so you have no need to specify it.
NSRegularExpression takes and returns ranges based on UTF-16 representation of String. You should not use characters.count to make NSRange, use utf16.count instead.
The return type of matches(in:range:) is declared as [NSTextCheckingResult], you have no need to cast it.
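The same idea (escape the search string, then wrap it in \b) can be sketched in JavaScript to see which offsets get matched. The helper names are made up for the example, and note that \b in JavaScript only treats ASCII letters, digits and underscore as word characters, so this is not a drop-in equivalent of NSRegularExpression.

```javascript
// Escape regex metacharacters so the search string is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Offsets of case-insensitive whole-word occurrences of `word` in `base`.
function findWholeWord(base, word) {
  const rx = new RegExp("\\b" + escapeRegExp(word) + "\\b", "gi");
  return Array.from(base.matchAll(rx), m => m.index);
}

// Same search without word boundaries, for comparison.
function findAnywhere(base, word) {
  const rx = new RegExp(escapeRegExp(word), "gi");
  return Array.from(base.matchAll(rx), m => m.index);
}

console.log(findWholeWord("His story is history", "his")); // [ 0 ]
console.log(findAnywhere("His story is history", "his"));  // [ 0, 13 ]
```

The offsets are UTF-16 code unit positions, the same unit NSRange uses.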
Update
I thought of a better solution than my previous answer so I updated it. The original answer will follow for anyone that prefers so.
"(?<=[^A-Za-z0-9]|^)[A-Za-z0-9]+(?=[^A-Za-z0-9]|$)"
Breaking down this expression: (?<=[^A-Za-z0-9]|^) checks for a non-alphanumeric character or the start of the line ^ before the word I want to match. [A-Za-z0-9]+ matches alphanumeric characters and, because of the +, requires at least one. (?=[^A-Za-z0-9]|$) checks for another non-alphanumeric character or the end of the line $ after the matched word. Therefore this expression matches any whole alphanumeric word. To exclude digits and match only letters, simply remove 0-9 from the expression, like
"(?<=[^A-Za-z]|^)[A-Za-z]+(?=[^A-Za-z]|$)"
For usage replace the center matching expression with the word to match like:
"(?<=[^A-Za-z]|^)\(searchString)(?=[^A-Za-z]|$)"
Old Answer
I tried using this before; it finds every string separated by whitespace. It should do what you need:
"\\s[a-zA-Z0-9]*\\s"
Change [a-zA-Z0-9]* to match what you are searching for; in your case, fit your original search string into it like
let regex = try! NSRegularExpression(pattern: "\\s\(searchString)\\s" ,options: .caseInsensitive)
As an added answer, \\s will include the whitespace before and after the word. I added a check to exclude the whitespace if it becomes more useful, the pattern is like:
"(?<=\\s)[A-Za-z0-9]*(?=\\s)"
similarly, replace [A-Za-z0-9]* which searches for all words with the search string you need.
Note, (?<=\\s) checks for whitespace before the word but does not include it, (?=\\s) checks for whitespace after, also not including it. This will work better in most scenarios compared to my original answer above since there is no extra whitespace.
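To see how the whitespace-based lookarounds differ from the updated pattern, here is a sketch in JavaScript. It assumes lookbehind support (ECMAScript 2018) and uses + instead of * in the whitespace version to avoid empty matches.

```javascript
const sentence = "His story is history";

// Requires whitespace on both sides, so the first and last words are missed.
const spaced = sentence.match(/(?<=\s)[A-Za-z0-9]+(?=\s)/g);

// Also accepts the start and end of the string as word edges.
const bounded = sentence.match(
  /(?<=[^A-Za-z0-9]|^)[A-Za-z0-9]+(?=[^A-Za-z0-9]|$)/g
);

console.log(spaced);  // [ 'story', 'is' ]
console.log(bounded); // [ 'His', 'story', 'is', 'history' ]
```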

how to create regular expression for this sentence?

I have the following statement: {$("#aprilfoolc").val("HoliWed27"); $("#UgadHieXampp").val("ugadicome");}, and I want to extract the strings from it as a combination. I have written the following regex, but it is not working.
please help!
(?=[\$("#]?)[\w]*(?<=[")]?)
Your lookaround assertions are using character classes by mistake, and you've confused lookbehind and lookahead. Try the following:
(?<=\$\(")\w*(?="\))
You could use this simpler one :
'{$("#aprilfoolc").val("HoliWed27");}'.match(/\$\(\"#(\w+)\"[^"]*"(\w+)"/)
This returns
["$("#aprilfoolc").val("HoliWed27"", "aprilfoolc", "HoliWed27"]
where the strings you want are at indexes 1 and 2.
This construction
(?=[\$*"#]?)
is a lookahead whose content is optional (the character set is followed by a ?), so it succeeds at every position and constrains nothing. This kind of defeats the next part,
[\w]
which matches word characters only, so none of the characters in the lookahead's set can ever be part of the match. Similarly, this part
(?<=[")])
contributes nothing either: a match made only of \w characters can never end in " or ), and since this portion is optional as well (that ? at the end again), the lookbehind simply succeeds everywhere.
It's a bit unclear what you are after. Strings inside double quotes, yes, but in the first one you want to skip the hash -- why? Given your input and desired output, this ought to work:
\w+(?=")
Also possible:
/\("[#]?(.*?)"\)/
import re
s='{$("#aprilfoolc").val("HoliWed27");}'
f = re.findall(r'\("[#]?(.*?)"\)',s)
for m in f:
    print(m)
If, for some reason, you want to capture both groups at once:
/\("#(.*?)"\).*?\("(.*?)"\)/
import re
s='{$("#aprilfoolc").val("HoliWed27");}'
f = re.findall(r'\("#(.*?)"\).*?\("(.*?)"\)',s)
for m in f:
    print(m[0], m[1])
In JavaScript:
var s='{$("#aprilfoolc").val("HoliWed27")';
var re=/\("#(.*?)"\).*?\("(.*?)"\)/;
alert(s.match(re));
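For the full statement from the question, which contains two calls, a global pattern with String#matchAll can collect every id/value pair. This is a sketch that assumes each call has exactly the shape $("#id").val("value") with word characters only inside the quotes; the variable names are illustrative.

```javascript
const statement =
  '{$("#aprilfoolc").val("HoliWed27"); $("#UgadHieXampp").val("ugadicome");}';

// Group 1: the id after the #, group 2: the value passed to val().
const callPattern = /\$\("#(\w+)"\)\.val\("(\w+)"\)/g;

const pairs = Array.from(statement.matchAll(callPattern), m => [m[1], m[2]]);
console.log(pairs);
// [ [ 'aprilfoolc', 'HoliWed27' ], [ 'UgadHieXampp', 'ugadicome' ] ]
```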