Regex Works in Perl, Doesn't as NSRegularExpression - regex

I have this regex:
([^\ \t\r\n\[\{\(\-])?'(?(1)|(?=\s | s\b))
with the following substitution:
$1’
This seems to work in Perl. Given a phrase like "King Solomon's Mines," it will change it to King Solomon’s Mines, but it throws an EXC_BAD_ACCESS error as an NSRegularExpression. This test suite suggests that the syntax is valid in PHP and Python but not JavaScript, and that the (?(1) part is the culprit.
Example Swift code:
let string = "King Solomon's Mines"
var anError: NSError? = nil
let pattern = "([^\\ \\t\\r\\n\\[\\{\\(\\-])?'(?(1)|(?=\\s | s\\b))"
let regex = NSRegularExpression(pattern: pattern, options: .CaseInsensitive, error: &anError)
let range = NSMakeRange(0, countElements(string))
let template = "$1’"
let newString = regex.stringByReplacingMatchesInString(string, options: nil, range: range, withTemplate: template)
The let regex declaration is where a Playground will get the bad access error. Do I need to modify the regex to get it working in Swift?
(Edit: forgot to put in the escapes for the backslashes. I had that in my code.)

I don't think that the conditional test (?(1)...|...) is available, but it is not really a problem since it is not needed. This pattern does the same:
let pattern = "'(?:(?<=[^\\s[{(-]')|(?=\\s | s\\b))"
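Note: if it doesn't work, try escaping the opening square bracket in the character class (\\[ in the Swift string literal), since ICU, the engine behind NSRegularExpression, can treat an unescaped [ there as the start of a nested set.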
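A minimal sketch of how the lookbehind version might be used with the current Swift API. It assumes the spaces inside the original alternation were insignificant Perl-style whitespace, so they are dropped here, and it escapes every bracket in the character class to be safe. Because this pattern has no capture group, the template is just the curly quote rather than $1’. The function name curlApostrophes is hypothetical.

```swift
import Foundation

// Hypothetical helper: replaces a straight apostrophe with ’ wherever the
// lookaround pattern matches.
func curlApostrophes(_ input: String) -> String {
    // Assumption: the spaces in the original alternation were not meant
    // literally, so they are omitted. All brackets in the class are escaped.
    let pattern = "'(?:(?<=[^\\s\\[\\{\\(\\-]')|(?=\\s|s\\b))"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else {
        return input  // leave the text untouched if the pattern fails to compile
    }
    // NSRange is measured in UTF-16 code units, so build it from the string.
    let range = NSRange(input.startIndex..., in: input)
    // No capture group here, so the template is only the replacement quote.
    return regex.stringByReplacingMatches(in: input, options: [], range: range, withTemplate: "’")
}

print(curlApostrophes("King Solomon's Mines"))
```

A leading quote that is followed by neither whitespace nor a lone s (as in 'tis) is left alone, which appears to be what the original conditional was meant to do.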

Related

Regular expression to extract href url

I want to extract the links from a String with regular expressions. I found a similar post here and I tried this code
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>.*?</a>")
let range = NSMakeRange(0, text.characters.count)
let htmlLessString: String = regex.stringByReplacingMatches(in: text,
                                                            options: [],
                                                            range: range,
                                                            withTemplate: "")
but the proposed regular expression deleted all the content of the href tag. My string looks like
SOME stirng some text I need to keep and other text
and the expected result is
SOME stirng https://com.mywebsite.com/yfgvh/f23/fsd some text I need to keep and other text
the perfect result is
SOME stirng some text I need to keep (https://com.mywebsite.com/yfgvh/f23/fsd) and other text
Do you have an idea if it's possible to achieve this?
Of course it deletes the href content: you are replacing the matches with an empty string.
Your sample string does not match the pattern because the closing tag </a> is missing.
The pattern "<a[^>]+href=\"(.*?)\"[^>]*>" matches up to the closing angle bracket of the opening tag.
The captured group is located at index 1 of the match. This code prints all extracted links:
let text = "<a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\">"
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>")
// NSRange counts UTF-16 code units, so use utf16.count rather than the character count
let range = NSMakeRange(0, text.utf16.count)
let matches = regex.matches(in: text, range: range)
for match in matches {
    // Group 1 holds the href value
    let htmlLessString = (text as NSString).substring(with: match.range(at: 1))
    print(htmlLessString)
}
I'm not a regular Swift developer, but did you try using the withTemplate option of stringByReplacingMatches, like this?
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>")
let range = NSMakeRange(0, text.utf16.count)
// $1 is the href value, $2 is the link text
let htmlLessString: String = regex.stringByReplacingMatches(in: text,
                                                            options: [],
                                                            range: range,
                                                            withTemplate: "$2 ($1)")
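A self-contained sketch of the template approach, for illustration only. The sample input is an assumption (it includes the closing </a> tag that the pattern requires), and the helper name inlineLinks is hypothetical.

```swift
import Foundation

// Hypothetical helper: rewrites <a href="URL" ...>TEXT</a> as "TEXT (URL)".
func inlineLinks(_ html: String) -> String {
    // Group 1 captures the href value, group 2 the link text.
    let pattern = "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return html }
    let range = NSRange(html.startIndex..., in: html)
    return regex.stringByReplacingMatches(in: html, options: [], range: range, withTemplate: "$2 ($1)")
}

// Assumed sample input, with a properly closed anchor tag.
let sample = "SOME string <a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\">some text I need to keep</a> and other text"
print(inlineLinks(sample))
```

Swapping the template for "$1" would keep only the URL instead. A regex like this is only reasonable for simple, well-formed snippets; arbitrary HTML generally needs a real parser.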
This regex seems to work in this case: href="(.*)" .*">(.*)<\/a>(.*). Group 1 holds your URL, group 2 the text between <a> and </a>, and group 3 the text after </a>. To get the group contents out as strings you may want a helper such as this extension, since NSRegularExpression only reports match ranges: http://samwize.com/2016/07/21/how-to-capture-multiple-groups-in-a-regex-with-swift/

Matching but not capture a string in Swift Regex

I'm trying to search for a single plain quote mark (') in a given String and replace it with a single curved quote mark (’). I have tested several patterns, but every time the match also includes the adjacent text. For example, in the string "I'm", along with the ' mark it also gets the "I" and the "m".
(?:\\S)'(?:\\S)
Is there a way to achieve this, or does the Swift implementation of regex not support non-capturing groups?
EDIT:
Example
let startingString = "I'm"
let myPattern = "(?:\\S)(')(?:\\S)"
let mySubstitutionText = "’"
let result = (applyReg(startingString, pattern: myPattern, substitutionText: mySubstitutionText))
func applyReg(startingString: String, pattern: String, substitutionText: String) -> String {
    var newStr = startingString
    if let regex = try? NSRegularExpression(pattern: pattern, options: .CaseInsensitive) {
        let regStr = regex.stringByReplacingMatchesInString(startingString, options: .WithoutAnchoringBounds, range: NSMakeRange(0, startingString.characters.count), withTemplate: substitutionText)
        newStr = regStr
    }
    return newStr
}
In regex, you can use lookarounds to achieve this behavior:
let myPattern = "(?<=\\S)'(?=\\S)"
Lookarounds do not consume the text they match, they just return true or false so that the regex engine could decide what to do with the currently matched text. If the condition is met, the regex pattern is evaluated further, and if not, the match is failed.
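As a rough illustration of the lookaround approach with the current Swift API (the helper name curlInnerQuotes is hypothetical): only the quote itself is part of the match, so the template can be the bare replacement character.

```swift
import Foundation

// Hypothetical helper: curls a straight quote that has a non-space
// character on both sides. The neighbours are checked, not consumed.
func curlInnerQuotes(_ input: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: "(?<=\\S)'(?=\\S)") else {
        return input
    }
    let range = NSRange(input.startIndex..., in: input)
    return regex.stringByReplacingMatches(in: input, options: [], range: range, withTemplate: "’")
}

print(curlInnerQuotes("I'm sure it's fine"))
```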
However, using capturing seems quite valid here, do not discard that approach.
Put your quote in a capture group in itself
(?:\\S)(')(?:\\S)
For example, when matching against "I'm", the overall match is still "I'm", but capture group 1 holds only "'".
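If the capturing route is preferred for a replacement, one possible variant (an assumption, not something stated in the answer) is to capture the neighbours instead and put them back through the template. The helper name curlWithCaptures is hypothetical.

```swift
import Foundation

// Hypothetical helper: captures the characters on either side of the quote
// and reinserts them via $1 and $2 around the curly quote.
func curlWithCaptures(_ input: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: "(\\S)'(\\S)") else {
        return input
    }
    let range = NSRange(input.startIndex..., in: input)
    return regex.stringByReplacingMatches(in: input, options: [], range: range, withTemplate: "$1’$2")
}

print(curlWithCaptures("I'm"))
```

Because the neighbours are consumed by the match, two quotes that share a neighbour (as in a'b'c) are not both replaced in a single pass; lookarounds avoid that limitation.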

How to construct Regex pattern Swift

I am trying to construct Regex but it doesn't work. Can anyone help?
I have a string from which I want to remove the following characters:
*_-+=#:><&[]\n
I also want to remove all text between (/ and )
Code is below:
if let regex = try? NSRegularExpression(pattern: "&[^*_-=;](\\)*;", options: .CaseInsensitive) {
    let modString = regex.stringByReplacingMatchesInString(testString, options: .WithTransparentBounds, range: NSMakeRange(0, testString.characters.count), withTemplate: "")
    print(modString)
}
You can use
"\\(/[^)]*\\)|[*\r\n_+=#:><&\\[\\]-]"
The \\(/[^)]*\\) alternative deals with all text between (/ and ) and [*_+=#:><&\\[\\]-] will match all the single characters you need to match.
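Note that the hyphen in your regex must either be escaped (\\- in the Swift string literal) or placed at the start or end of the character class. Your regex did not work because it created an invalid range: in [^*_-=;], the _-= part is read as a range from _ to =, and _ (U+005F) comes after = (U+003D).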
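A small sketch of the suggested pattern in use with the current Swift API. The helper name stripMarkers and the sample input are hypothetical, and \\r\\n is written here as regex escapes rather than as literal control characters.

```swift
import Foundation

// Hypothetical helper: removes "(/ ... )" spans and the listed single characters.
func stripMarkers(_ input: String) -> String {
    // The hyphen sits at the end of the class, so it is a literal, not a range.
    let pattern = "\\(/[^)]*\\)|[*\\r\\n_+=#:><&\\[\\]-]"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return input }
    let range = NSRange(input.startIndex..., in: input)
    return regex.stringByReplacingMatches(in: input, options: [], range: range, withTemplate: "")
}

print(stripMarkers("a*b_c (/gone)d\n[e]-f"))
```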

Swift and regex, cpu goes haywire for some strings

I want to match a localization line with regex. Everything works fine except when trying to match this string. You can put the code in a playground to see that it doesn't stop, or in a blank project to see the CPU go to 100% and get stuck at the 'let match' line. The interesting thing is that if you delete the last word, it works. I don't know if it works with Chinese or other unusual characters; this is Greek.
let lineContent = "\"key\" = \" Χρήση παλιάς συνόμευση\";"
if let r = try? NSRegularExpression(pattern: "\"(.*)+\"(^|[ ]*)=(^|[ ]*)\"(.*)+\";", options: NSRegularExpressionOptions()) {
let match = r.matchesInString(lineContent, options: NSMatchingOptions(), range: NSMakeRange(0, lineContent.characters.count))
match.count
}
Later edit: it's actually not the character type that matters but the number of words. This string on the right side also doesn't work: 'jhg jhgjklkhjkh hhhhh hhh'
You have nested quantifiers in (.*)+ that will lead to catastrophic backtracking (I recommend reading that article). The problem is that when a subexpression fails, the regex engine backtracks to try another alternative. With nested quantifiers, the number of attempts grows exponentially with the length of the subject string: the engine tries every repetition count of (.*)+ and, for each one, every possible length of .*.
To avoid it, make the pattern as specific as you can:
"\"([^\"]+)\"[ ]*=[ ]*\"([^\"]*)\";"
\"([^\"]+)\" Matches
    An opening "
    [^\"]+ One or more characters other than quotes. Change the + to * to allow empty strings.
    A closing "
Code
let lineContent = "\"key\" = \" Χρήση παλιάς συνόμευση\";"
if let r = try? NSRegularExpression(pattern: "\"([^\"]+)\"[ ]*=[ ]*\"([^\"]*)\";", options: NSRegularExpressionOptions()) {
    let match = r.matchesInString(
        lineContent,
        options: NSMatchingOptions(),
        range: NSMakeRange(0, lineContent.characters.count)
    )
    for index in 1..<match[0].numberOfRanges {
        print((lineContent as NSString).substringWithRange(match[0].rangeAtIndex(index)))
    }
}
As already mentioned in the comments, the (.*)+ is causing catastrophic backtracking, which leads to the high CPU usage (and, in general, a failure to match).
Instead of using a pattern like
\"(.*)+\"
since you're matching everything between the double quotes, use a negated character class:
\"([^\"]+)\"
As per the comment above - replace the nested (.*)+ with a lazy version - (.*?).
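For reference, a sketch of the negated-character-class pattern written against the current Swift API, assuming the goal is to pull the key and value out of one line. The function name parseLocalizationLine is hypothetical.

```swift
import Foundation

// Hypothetical helper: splits a line of the form "key" = "value"; into its parts.
func parseLocalizationLine(_ line: String) -> (key: String, value: String)? {
    // [^\"]+ and [^\"]* cannot run past a quote, so there is only one way
    // to match each quoted part and no nested repetition to backtrack over.
    let pattern = "\"([^\"]+)\"[ ]*=[ ]*\"([^\"]*)\";"
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: line, options: [], range: NSRange(line.startIndex..., in: line)),
          let keyRange = Range(match.range(at: 1), in: line),
          let valueRange = Range(match.range(at: 2), in: line) else {
        return nil  // the line does not have the expected shape
    }
    return (String(line[keyRange]), String(line[valueRange]))
}

if let entry = parseLocalizationLine("\"key\" = \" Χρήση παλιάς συνόμευση\";") {
    print(entry.key, entry.value)
}
```

Using firstMatch and optional binding also avoids indexing into an empty match array when a line does not match.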

Javascript RegExp find with condition but without showing them

I'm trying to find the words between the brackets.
var str = "asdfasdfkjh {{word1}} asdf fff fffff {{word2}} asdfasdf";
var pattern = /{{\w*}}/g;
str.match(pattern); // ["{{word1}}","{{word2}}"]
This almost does it, but the results include the brackets, and I don't want them.
Sure, I could remove them by calling the native replace on the results, but I want the regexp to do that.
I've also tried:
var pattern = /(?:{{)(\w*)(?:}})/g
but I can't get it to work. Could you help me?
Edit: I should add that the words are dynamic.
solution:
Based on Tim Piezcker's answer, I came up with this solution:
var arr = [],
    re = /{{(\w*)}}/g, item;
while (item = re.exec(str))
    arr.push(item[1]);
In most regex flavors, you could use lookaround assertions:
(?<={{)\w*(?=}})
Unfortunately, JavaScript engines before ES2018 don't support lookbehind assertions, so you can't rely on them there.
But the regex you proposed can be used by accessing the first capturing group:
var pattern = /{{(\w*)}}/g;
var match = pattern.exec(subject);
if (match != null) {
result = match[1];
}
A quick and dirty solution would be /[^{]+(?=\}\})/, but it will cause a bit of a mess if the leading braces are omitted, and will also match {word1}}. If I remember correctly, JavaScript does not support look-behind, which is a bit of a shame in this case.