Swift Regex doesn't work - regex

I am using the following extension method to get an array of the NSRanges at which a substring occurs:
extension String {
    func nsRangesOfString(findStr:String) -> [NSRange] {
        let ranges: [NSRange]
        do {
            // Create the regular expression.
            let regex = try NSRegularExpression(pattern: findStr, options: [])
            // Use the regular expression to get an array of NSTextCheckingResult.
            // Use map to extract the range from each result.
            ranges = regex.matches(in: self, options: [], range: NSMakeRange(0, self.characters.count)).map {$0.range}
        }
        catch {
            // There was a problem creating the regular expression
            ranges = []
        }
        return ranges
    }
}
However, I can't figure out why it sometimes doesn't work. Here are two similar cases; one works and the other doesn't.
This one works:
self (String):
"וצפן (קרי: יִצְפֹּ֣ן) לַ֭יְשָׁרִים תּוּשִׁיָּ֑ה מָ֝גֵ֗ן לְהֹ֣לְכֵי תֹֽם׃"
findStr:
"קרי:"
And this one doesn't:
self (String):
"לִ֭נְצֹר אָרְח֣וֹת מִשְׁפָּ֑ט וְדֶ֖רֶךְ חסידו (קרי: חֲסִידָ֣יו) יִשְׁמֹֽר׃"
findStr:
"קרי:"
(An alternative, more robust method would also be an acceptable answer.)

NSRange values are specified in terms of UTF-16 code units (which
is what NSString uses internally), so the length must be
self.utf16.count:
ranges = regex.matches(in: self, options: [],
range: NSRange(location: 0, length: self.utf16.count))
.map {$0.range}
In the case of your second string we have
let s2 = "לִ֭נְצֹר אָרְח֣וֹת מִשְׁפָּ֑ט וְדֶ֖רֶךְ חסידו (קרי: חֲסִידָ֣יו) יִשְׁמֹֽר׃"
print(s2.characters.count) // 46
print(s2.utf16.count) // 74
and that's why the pattern is not found with your code.
Starting with Swift 4, you can also compute an NSRange for the entire string as
NSRange(self.startIndex..., in: self)
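Putting both points together, a Swift 4+ version of the question's extension might look like the sketch below. It assumes, as the original does, that findStr is a regex pattern; to search for it literally, you could first pass it through NSRegularExpression.escapedPattern(for:).

```swift
import Foundation

extension String {
    /// NSRanges of every match of `findStr` (treated as a regex pattern).
    /// The search range is built with NSRange(_:in:), so it is measured in
    /// UTF-16 code units, which is what NSRegularExpression expects.
    func nsRangesOfString(findStr: String) -> [NSRange] {
        guard let regex = try? NSRegularExpression(pattern: findStr) else {
            return []  // invalid pattern
        }
        let fullRange = NSRange(startIndex..., in: self)
        return regex.matches(in: self, options: [], range: fullRange).map { $0.range }
    }
}
```

To get back to Swift strings, each returned NSRange can be converted with Range(_:in:), for example String(s2[Range(r, in: s2)!]).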

Related

How to use regex to split string into groups of identical characters?

I have a string like this:
var string = "AAAAAAABBBCCCCCCDD"
and would like to use regular expressions to split it into an array in which each run of identical characters forms one element:
Array: "AAAAAAA", "BBB", "CCCCCC", "DD"
This is what I have so far, but I can't get it to work:
var array = [String]()
var string = "AAAAAAABBBCCCCCCDD"
let pattern = "\\ b([1,][a-z])\\" // mistake?!
let regex = try! NSRegularExpression(pattern: pattern, options: [])
array = regex.matchesInString(string, options: [], range: NSRange(location: 0, length: string.count))
You can achieve that using this function from this answer:
func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            String(text[Range($0.range, in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}
Passing (.)\\1+ as regex and AAAAAAABBBCCCCCCDD as text like this:
let result = matches(for: "(.)\\1+", in: "AAAAAAABBBCCCCCCDD")
print(result) // ["AAAAAAA", "BBB", "CCCCCC", "DD"]
You can achieve that with a "back reference"; compare the documentation of
NSRegularExpression:
\n
Back Reference. Match whatever the nth capturing group matched. n must be a number ≥ 1 and ≤ total number of capture groups in the pattern.
Example (using the utility method from Swift extract regex matches):
let string = "AAAAAAABBBCCCCCCDDE"
let pattern = "(.)\\1*"
let array = matches(for: pattern, in: string)
print(array)
// ["AAAAAAA", "BBB", "CCCCCC", "DD", "E"]
The pattern matches an arbitrary character followed by zero or more
occurrences of the same character. If you are only interested in
runs of repeated word characters, use
let pattern = "(\\w)\\1*"
instead.
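The difference between \\1+ (used in the first answer) and \\1* matters when a character appears only once. The sketch below reuses a self-contained copy of the matches(for:in:) helper (with its error handling reduced to try?) and shows that the lone "E" is only kept with *:

```swift
import Foundation

// Self-contained copy of the matches(for:in:) helper: text of every match.
func matches(for regex: String, in text: String) -> [String] {
    guard let re = try? NSRegularExpression(pattern: regex) else { return [] }
    let results = re.matches(in: text, options: [],
                             range: NSRange(text.startIndex..., in: text))
    return results.compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

let input = "AAAAAAABBBCCCCCCDDE"
// \1+ requires at least one repetition, so a single "E" is not matched.
let plus = matches(for: "(.)\\1+", in: input)
// \1* allows zero repetitions, so every run is returned, including "E".
let star = matches(for: "(.)\\1*", in: input)
print(plus)
print(star)
```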

Regex not working for empty string - Swift

My function should handle any regex and return true or false. It worked well... until now:
func test(_ input: String) -> Bool {
    let pattern = ".{7}" //allow exactly 7 numbers
    let regex = try! NSRegularExpression(pattern: pattern, options: [NSRegularExpression.Options.caseInsensitive])
    let leftover = regex.stringByReplacingMatches(in: input, options: [], range: NSMakeRange(0, input.characters.count), withTemplate: "")
    if leftover.isEmpty {
        return true
    }
    return false
}
print(test("123456")) //false
print(test("1234567")) //true
print(test("12345678")) //false
print(test("")) //true - I expect false
I understand why test("") returns true. But how can I fix my regex so that it returns false?
Sometimes I use the regex .*, and my function should handle that one too, so I can't just add a check like this:
if input.isEmpty {
return false
}
If input is the empty string then leftover will be the empty string
as well, and therefore your function returns true. Another case where
your approach fails is
print(test("12345671234567")) // true (expected: false)
An alternative is to use the range(of:) method of String with the .regularExpression option. Then check if the matched range is the entire string.
In order to match 7 digits (and not 7 arbitrary characters), the
pattern should be \d{7}.
func test(_ input: String) -> Bool {
let pattern = "\\d{7}"
return input.range(of: pattern, options: [.regularExpression, .caseInsensitive])
== input.startIndex..<input.endIndex
}
One solution is to require the regex to match the entire string: add ^ and $ to anchor it to the start and the end of the string.
let pattern = "^.{7}$" // exactly 7 characters (use \\d{7} to allow only digits)
let regex = try! NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
let numberOfOccurences = regex.numberOfMatches(in: input, options: [], range: NSMakeRange(0, input.utf16.count))
return (numberOfOccurences != 0)
In theory, we should check whether numberOfOccurences is exactly 1 before returning true, but because the pattern is anchored at both the start and the end of the string, there can be at most one match.
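If the function has to work for arbitrary patterns (including .*), one option is to wrap whatever pattern is passed in as "\\A(?:" + pattern + ")\\z", so that it can only match the whole input. This is a sketch; matchesEntirely is a hypothetical name, and the non-capturing group keeps the numbering of the caller's own capture groups unchanged:

```swift
import Foundation

// Hypothetical helper: true only if `pattern` matches the *whole* of `input`.
// \A and \z anchor at the absolute start and end of the input.
func matchesEntirely(_ pattern: String, _ input: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: "\\A(?:" + pattern + ")\\z") else {
        return false  // invalid pattern
    }
    // Search range in UTF-16 code units, as NSRegularExpression expects.
    let fullRange = NSRange(input.startIndex..., in: input)
    return regex.firstMatch(in: input, options: [], range: fullRange) != nil
}
```

With this helper, matchesEntirely("\\d{7}", "") is false, while matchesEntirely(".*", "") is true, because .* legitimately matches the empty string.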

Swift 3 conversion encompassing multiple issues

I have started converting to Swift 3 while removing NS classes as much as possible, but ran into a snag with this code:
var S: String = ADataItem.description_text;
// FRegexBufUI_Image is of type NSRegularExpression
let matches: [NSTextCheckingResult] = FRegexBufUI_Image.matches(in: S, options: NSRegularExpression.MatchingOptions(), range: NSRange(location: 0, length: S.characters.count));
if matches.count > 0 {
for m in 0 ..< matches.count {
S = S.substring(with: match.rangeAt(m));
I get the error:
Cannot convert value of type 'NSRange' (aka '_NSRange') to expected argument type 'Range<String.Index>' (aka 'Range<String.CharacterView.Index>')
I think the problem may be that I am now mixing Swift data types/classes with NS ones.
What is the cleanest solution here? Is it simply casting NSRange to Range, or is there a way to go fully Swift when I also need regular expressions?
A Swift Range and an NSRange are different things. It looks like the function is expecting a Swift range which you can create using the ..< operator. Instead of
NSRange(location: 0, length: S.characters.count)
write
0 ..< S.characters.count
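Note that the two are not identical in semantics: an NSRange takes a start location and a length (counted in UTF-16 code units), while a Swift Range uses a lower and an upper bound (the upper bound is excluded). Also note that String's substring(with:) expects a Range<String.Index>, not a Range<Int>, and matches(in:options:range:) still takes an NSRange, so an integer range cannot be passed to either of them directly.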
The easiest way is to bridge the string to NSString:
let matches = FRegexBufUI_Image.matches(in: S, options: NSRegularExpression.MatchingOptions(), range: NSRange(location: 0, length: (S as NSString).length))
for match in matches { // don't use ugly C-style index based loops
    // match.range is the whole match; use match.rangeAt(n) for capture group n
    let substring = (S as NSString).substring(with: match.range)
}
If you don't want to use mixed types, implement this String extension, which converts an NSRange to a Range<String.Index>:
extension String {
    func range(from nsRange: NSRange) -> Range<String.Index>? {
        guard
            let from16 = utf16.index(utf16.startIndex, offsetBy: nsRange.location, limitedBy: utf16.endIndex),
            let to16 = utf16.index(from16, offsetBy: nsRange.length, limitedBy: utf16.endIndex),
            let from = String.Index(from16, within: self),
            let to = String.Index(to16, within: self)
        else { return nil }
        return from ..< to
    }

    func substring(withNSRange range: NSRange) -> String {
        // Falls back to the whole string if the NSRange cannot be converted.
        guard let swiftRange = self.range(from: range) else { return self }
        return self.substring(with: swiftRange)
    }
}
and use it:
for match in matches { // don't use ugly C-style index based loops
    // match.range is the whole match; use match.rangeAt(n) for capture group n
    let substring = S.substring(withNSRange: match.range)
}
Edit:
In Swift 4+ the extension has become obsolete. There is a convenience initializer to create a Range<String.Index> from an NSRange:
for match in matches { // don't use ugly C-style index based loops
    // match.range is the whole match; use match.range(at: n) for capture group n
    let stringRange = Range(match.range, in: S)!
    let substring = String(S[stringRange])
}
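For Swift 4 or later, a sketch that never touches NSString when reading the matches might look like this (allMatches is a hypothetical helper name). Because the search range is built with NSRange(_:in:) and converted back with Range(_:in:), it also handles strings containing emoji or other characters that take two UTF-16 code units:

```swift
import Foundation

// Hypothetical helper: text of every whole match of `pattern` in `text`.
func allMatches(of pattern: String, in text: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern)
    // NSRange(_:in:) measures the search range in UTF-16 code units.
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        // Range(_:in:) converts the NSRange back to String.Index bounds.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}
```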

How to use regex with Swift?

I am making an app in Swift and I need to extract 8 digits from a string.
Here's the string:
index.php?page=index&l=99182677
My pattern is:
&l=(\d{8,})
And here's my code:
var yourAccountNumber = "index.php?page=index&l=99182677"
let regex = try! NSRegularExpression(pattern: "&l=(\\d{8,})", options: NSRegularExpressionOptions.CaseInsensitive)
let range = NSMakeRange(0, yourAccountNumber.characters.count)
let match = regex.matchesInString(yourAccountNumber, options: NSMatchingOptions.Anchored, range: range)
First, I don't understand what NSMatchingOptions means; in Apple's official documentation I don't get all the .Anchored, .ReportProgress, etc. options. Could anyone enlighten me on this?
Then, when I print(match), the variable seems to be empty ([]).
I am using Xcode 7 Beta 3, with Swift 2.0.
ORIGINAL ANSWER
Here is a function you can leverage to get captured group texts:
import Foundation
extension String {
    func firstMatchIn(string: NSString!, atRangeIndex: Int!) -> String {
        var error : NSError?
        let re = NSRegularExpression(pattern: self, options: .CaseInsensitive, error: &error)
        let match = re.firstMatchInString(string, options: .WithoutAnchoringBounds, range: NSMakeRange(0, string.length))
        return string.substringWithRange(match.rangeAtIndex(atRangeIndex))
    }
}
And then:
var result = "&l=(\\d{8,})".firstMatchIn(yourAccountNumber, atRangeIndex: 1)
The 1 in atRangeIndex: 1 will extract the text captured by the (\d{8,}) capture group.
NOTE1: If you plan to extract 8, and only 8, digits after &l=, you do not need the , in the limiting quantifier, as {8,} means 8 or more. Change it to {8} if you plan to capture exactly 8 digits.
NOTE2: NSMatchingAnchored is something you should avoid if your expected result is not at the beginning of the search range. See the documentation:
Specifies that matches are limited to those at the start of the search range.
NOTE3: Speaking of "simplest" things, I'd advise avoiding look-arounds whenever you do not need them. Look-arounds usually come at some cost to performance, and if you are not going to capture overlapping text, I'd recommend using capture groups.
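To illustrate NOTE2 in current Swift, this sketch runs the same pattern on the question's string once with .anchored and once with no options:

```swift
import Foundation

let text = "index.php?page=index&l=99182677"
let regex = try! NSRegularExpression(pattern: "&l=(\\d{8})")
let fullRange = NSRange(text.startIndex..., in: text)

// .anchored: a match must begin at the start of the search range.
// "&l=" is not at index 0, so nothing is found.
let anchored = regex.firstMatch(in: text, options: .anchored, range: fullRange)

// No options ([]): the whole search range is scanned.
let unanchored = regex.firstMatch(in: text, options: [], range: fullRange)

// Text of capture group 1, converted back from NSRange to a String range.
let accountNumber = unanchored
    .flatMap { Range($0.range(at: 1), in: text) }
    .map { String(text[$0]) }
print(accountNumber ?? "no match")
```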
UPDATE FOR SWIFT 2
I have come up with a function that will return all matches with all capturing groups (similar to preg_match_all in PHP). Here is a way to use it for your scenario:
func regMatchGroup(regex: String, text: String) -> [[String]] {
    do {
        var resultsFinal = [[String]]()
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text,
            options: [], range: NSMakeRange(0, nsString.length))
        for result in results {
            var internalString = [String]()
            for i in 0..<result.numberOfRanges {
                internalString.append(nsString.substringWithRange(result.rangeAtIndex(i)))
            }
            resultsFinal.append(internalString)
        }
        return resultsFinal
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return [[]]
    }
}
// USAGE:
let yourAccountNumber = "index.php?page=index&l=99182677"
let matches = regMatchGroup("&l=(\\d{8,})", text: yourAccountNumber)
if (matches.count > 0) // If we have matches....
{
print(matches[0][1]) // Print the first one, Group 1.
}
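In Swift 5, the same idea can be written without NSString and without an index-based loop. This is a sketch (allGroups is a hypothetical name); it throws on an invalid pattern instead of returning [[]], because with [[]] the usage code above would see matches.count > 0 and then crash on matches[0][1]:

```swift
import Foundation

// Hypothetical Swift 5 counterpart of regMatchGroup: every match, every group.
// Groups that did not take part in a match are returned as "".
func allGroups(_ pattern: String, in text: String) throws -> [[String]] {
    let regex = try NSRegularExpression(pattern: pattern)
    let matches = regex.matches(in: text, options: [],
                                range: NSRange(text.startIndex..., in: text))
    return matches.map { match in
        (0..<match.numberOfRanges).map { i in
            // Range(_:in:) yields nil for a group whose location is NSNotFound.
            Range(match.range(at: i), in: text).map { String(text[$0]) } ?? ""
        }
    }
}
```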
It may be easier just to use the NSString method instead of NSRegularExpression.
var yourAccountNumber = "index.php?page=index&l=99182677"
println(yourAccountNumber) // index.php?page=index&l=99182677
let regexString = "(?<=&l=)\\d{8,}+"
let options :NSStringCompareOptions = .RegularExpressionSearch | .CaseInsensitiveSearch
if let range = yourAccountNumber.rangeOfString(regexString, options:options) {
let digits = yourAccountNumber.substringWithRange(range)
println("digits: \(digits)")
}
else {
print("Match not found")
}
The (?<=&l=) is a look-behind: it means "preceded by, but not part of the match".
In detail:
Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)
In general, worrying about the performance of a look-behind without measurements to back it up is premature optimization. That being said, there may be other valid reasons for and against look-arounds in regular expressions.
ICU User Guide: Regular Expressions
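In current Swift, the same look-behind approach can be written with range(of:options:) and the .regularExpression option. This sketch uses {8} on the assumption that exactly 8 digits are wanted:

```swift
import Foundation

let accountURL = "index.php?page=index&l=99182677"
// (?<=&l=) is a look-behind: "&l=" must precede the digits but is not part of the match.
let digits = accountURL
    .range(of: "(?<=&l=)\\d{8}", options: .regularExpression)
    .map { String(accountURL[$0]) }
print(digits ?? "Match not found")
```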
For Swift 2, you can use this extension of String:
import Foundation
extension String {
    func firstMatchIn(string: NSString!, atRangeIndex: Int!) -> String {
        do {
            let re = try NSRegularExpression(pattern: self, options: NSRegularExpressionOptions.CaseInsensitive)
            let match = re.firstMatchInString(string as String, options: .WithoutAnchoringBounds, range: NSMakeRange(0, string.length))
            return string.substringWithRange(match!.rangeAtIndex(atRangeIndex))
        } catch {
            return ""
        }
    }
}
You can get the account-number with:
var result = "&l=(\\d{8,})".firstMatchIn(yourAccountNumber, atRangeIndex: 1)
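Replace NSMatchingOptions.Anchored with NSMatchingOptions() (no options). .Anchored limits matches to those at the start of the search range, and "&l=" is not at the start of your string.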

Swift Regex matching fails when source contains unicode characters

I'm trying to do a simple regex match using NSRegularExpression, but I'm having some problems matching the string when the source contains multibyte characters:
let string = "D 9"
// The following matches (any characters)(SPACE)(numbers)(any characters)
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
let slen : Int = string.lengthOfBytesUsingEncoding(NSUTF8StringEncoding)
var error: NSError? = nil
var regex = NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions.DotMatchesLineSeparators, error: &error)
var result = regex?.stringByReplacingMatchesInString(string, options: nil, range: NSRange(location:0,
length:slen), withTemplate: "First \"$1\" Second: \"$2\"")
The code above returns "D" and "9" as expected
If I now change the first line to include a UK 'Pound' currency symbol as follows:
let string = "£ 9"
Then the match doesn't work, even though the ([\\s\\S]*) part of the expression should still match any leading characters.
I understand that the £ symbol takes two bytes, but shouldn't the leading wildcard match take care of that?
Can anyone explain what is going on here please?
It can be confusing. The first parameter of stringByReplacingMatchesInString() is mapped from NSString in
Objective-C to String in Swift, but the range: parameter is still
an NSRange. Therefore you have to specify the range in the units
used by NSString (the number of UTF-16 code units):
var result = regex?.stringByReplacingMatchesInString(string,
options: nil,
range: NSRange(location:0, length:(string as NSString).length),
withTemplate: "First \"$1\" Second: \"$2\"")
Alternatively, you can use count(string.utf16)
instead of (string as NSString).length.
Full example:
let string = "£ 9"
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
var error: NSError? = nil
let regex = NSRegularExpression(pattern: pattern,
options: NSRegularExpressionOptions.DotMatchesLineSeparators,
error: &error)!
let result = regex.stringByReplacingMatchesInString(string,
options: nil,
range: NSRange(location:0, length:(string as NSString).length),
withTemplate: "First \"$1\" Second: \"$2\"")
println(result)
// First "£" Second: "9"
I've run into this a couple of times, and Martin's answer helped me understand the problem. Here's a quick version of the solution that worked for me.
If your regular expression function includes a range parameter built like this:
NSRange(location: 0, length: yourString.count)
You can change it to this:
NSRange(location: 0, length: yourString.utf16.count)
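The sketch below (written for current Swift) shows why the byte count from the question fails: "£" is two bytes in UTF-8 but a single UTF-16 code unit, so a length based on UTF-8 bytes is longer than the string as NSString sees it. Building the range with NSRange(_:in:) avoids having to pick the right count by hand:

```swift
import Foundation

let string = "£ 9"
// "£" (U+00A3) is 2 bytes in UTF-8 but one UTF-16 code unit.
let utf8Length = string.utf8.count    // 4: too long for an NSRange over this string
let utf16Length = string.utf16.count  // 3: the length NSString/NSRegularExpression use

let regex = try! NSRegularExpression(pattern: "([\\s\\S]*) ([0-9]*)(.*)",
                                     options: .dotMatchesLineSeparators)
let result = regex.stringByReplacingMatches(in: string, options: [],
                                            range: NSRange(string.startIndex..., in: string),
                                            withTemplate: "First \"$1\" Second: \"$2\"")
print(result)  // First "£" Second: "9"
```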