How to find which group is matched in NSRegularExpression - regex

I have a regex with multiple capture groups separated by the | operator. How can I find out which capture group matched? The only way I can think of, for this example, is counting the number of characters in the match.
var string = "1234567897"
var pattern = "(^\\d{9}$)|(^\\d{10}$)|(^\\d{13}$)|(^[a-zA-Z]{2}\\d{9}[a-zA-Z]{2}$)"
var myRegex = NSRegularExpression(pattern: pattern, options: nil, error: nil)!
if let myMatch = myRegex.firstMatchInString(string, options: nil,
range: NSRange(location: 0, length: string.utf16Count)) {
println((string as NSString).substringWithRange(myMatch.rangeAtIndex(0)))
}

I wrote some code that works for my example. I am sure it could be written better, but it works for now.
Swift 2.3
var string = "123456789"
var pattern = "(^\\d{9}$)|(^\\d{10}$)|(^\\d{13}$)|(^[a-zA-Z]{2}\\d{9}[wW]{2}$)"
var myRegex = try! NSRegularExpression(pattern: pattern, options: [])
if let myMatch = myRegex.firstMatchInString(string, options: NSMatchingOptions.init(rawValue: 0), range: NSRange(location: 0, length: string.utf16.count)) {
var matchedGroup = 0
for var i in 1..<myMatch.numberOfRanges {
if myMatch.rangeAtIndex(i).length != 0 {
matchedGroup = i
break
}
}
print(matchedGroup)
print((string as NSString).substringWithRange(myMatch.rangeAtIndex(0))) //whatever the range you want to print
}
Swift 3
var string = "123456789"
var pattern = "(^\\d{9}$)|(^\\d{10}$)|(^\\d{13}$)|(^[a-zA-Z]{2}\\d{9}[wW]{2}$)"
var myRegex = try! NSRegularExpression(pattern: pattern, options: [])
if let myMatch = myRegex.firstMatch(in: string, options: NSRegularExpression.MatchingOptions.init(rawValue: 0), range: NSRange(location: 0, length: string.utf16.count)) {
var matchedGroup = 0
for var i in 1..<myMatch.numberOfRanges {
if myMatch.rangeAt(i).length != 0 {
matchedGroup = i
break
}
}
print(matchedGroup)
print((string as NSString).substring(with: myMatch.rangeAt(0))) //whatever the range you want to print
}
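For newer Swift versions, the same idea can be sketched as follows. This is a minimal sketch and not part of the original answer: the function name firstMatchedGroup is hypothetical, and it compares each group's location against NSNotFound (the location reported for a group that did not take part in the match) instead of testing the length, so a group that matched an empty string would still be detected.

```swift
import Foundation

// Hypothetical helper: returns the 1-based index of the first capture group
// that took part in the match, or nil if the pattern is invalid or nothing matched.
func firstMatchedGroup(of pattern: String, in text: String) -> Int? {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return nil }
    // Build the range from String indices so it is measured in UTF-16 units.
    let fullRange = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange) else { return nil }
    // Index 0 is the whole match, so capture groups start at 1.
    for index in 1..<match.numberOfRanges where match.range(at: index).location != NSNotFound {
        return index
    }
    return nil
}

let pattern = "(^\\d{9}$)|(^\\d{10}$)|(^\\d{13}$)|(^[a-zA-Z]{2}\\d{9}[a-zA-Z]{2}$)"
print(firstMatchedGroup(of: pattern, in: "1234567897") ?? 0)  // prints 2 (ten digits)
```

Returning an optional keeps "no match" distinct from any real group index.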

Related

RegExp Find and Replace

I'm trying to solve a problem in Swift 3, without success.
I need to change this string:
< iframe class="giphy-embed" src="//giphy.com/embed/akEhceCKfMyKA"></iframe>
into this one:
< img class="giphy-embed" src="https://media.giphy.com/media/akEhceCKfMyKA/giphy.gif"></img>
The node name must change from 'iframe' to 'img'. I must also keep part of the link and change its beginning and end, from
"//giphy.com/embed/akEhceCKfMyKA"
to "https://media.giphy.com/media/akEhceCKfMyKA/giphy.gif"
Do you have a solution using a regular expression?
Thanks a lot.
Here's how you do it in Swift. NSRegularExpression still works with NSString / NSMutableString, so it's easier if you convert the string beforehand.
let str = "< iframe class=\"giphy-embed\" src=\"//giphy.com/embed/akEhceCKfMyKA\"></iframe>"
let mutableStr = NSMutableString(string: str)
let regex = try! NSRegularExpression(pattern: "<\\s*(iframe).+src=\"(.+?)\".+", options: [])
if let match = regex.firstMatch(in: str, options: [], range: NSMakeRange(0, mutableStr.length)) {
let components = mutableStr.substring(with: match.rangeAt(2)).components(separatedBy: "/")
let newURL = "https://media.giphy.com/media/" + components.last! + "/giphy.gif"
mutableStr.replaceCharacters(in: match.rangeAt(2), with: newURL)
mutableStr.replaceCharacters(in: match.rangeAt(1), with: "img") // the question asks for "img", not "image"
}
let newStr = mutableStr as String
Find:
< iframe (class="giphy-embed" src=")\/\/giphy\.com\/embed\/([A-Za-z]*)"><\/iframe>
Replace
< img $1https://media.giphy.com/media/$2/giphy.gif"></img>
See demo
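The find/replace pair above can be sketched in Swift with a replacement template. This is only a sketch under stated assumptions: the function name replaceGiphyEmbeds is hypothetical, the optional whitespace after "<" is accepted with \s*, and the id group is widened to letters and digits, which the original pattern does not do.

```swift
import Foundation

// Hypothetical helper: rewrites every Giphy iframe embed into an img tag.
func replaceGiphyEmbeds(in html: String) -> String {
    // Group 1: the attributes up to the opening quote of src. Group 2: the GIF id.
    // Assumption: ids consist of ASCII letters and digits only.
    let pattern = "<\\s*iframe (class=\"giphy-embed\" src=\")//giphy\\.com/embed/([A-Za-z0-9]+)\"></iframe>"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return html }
    let range = NSRange(html.startIndex..., in: html)
    // $1 and $2 in the template refer to the capture groups above.
    return regex.stringByReplacingMatches(
        in: html,
        options: [],
        range: range,
        withTemplate: "< img $1https://media.giphy.com/media/$2/giphy.gif\"></img>")
}

let source = "< iframe class=\"giphy-embed\" src=\"//giphy.com/embed/akEhceCKfMyKA\"></iframe>"
print(replaceGiphyEmbeds(in: source))
```

Because stringByReplacingMatches handles every match in one pass, no manual back-to-front loop over the matches is needed.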
Thanks to Code Different for his help!
I needed to change a few things to replace all occurrences in my string.
Here is my code:
let mutableStr = NSMutableString(string: str)
let regex = try! NSRegularExpression(pattern: "<\\s*(iframe).+src=\"(.+?)\".+(/iframe)", options: [])
let matches = regex.matches(in: str, options: [], range: NSMakeRange(0, mutableStr.length))
var k = matches.count - 1
while k >= 0 {
let match = matches[k]
let components = mutableStr.substring(with: match.rangeAt(2)).components(separatedBy: "/")
let newURL = "https://media.giphy.com/media/" + components.last! + "/giphy.gif"
mutableStr.replaceCharacters(in: match.rangeAt(3), with: "/img")
mutableStr.replaceCharacters(in: match.rangeAt(2), with: newURL)
mutableStr.replaceCharacters(in: match.rangeAt(1), with: "img")
k -= 1
}
print(mutableStr as String)
And it works perfectly!

Swift: Splitting Strings by RegEx [duplicate]

I am attempting to use a regular expression to replace all occurrences of UK car registrations within a string.
The following Swift code works perfectly when the string matches the regex exactly, as below.
var myString = "DD11 AAA"
var stringlength = countElements(myString)
var ierror: NSError?
var regex:NSRegularExpression = NSRegularExpression(pattern: "^([A-HK-PRSVWY][A-HJ-PR-Y])\\s?([0][2-9]|[1-9][0-9])\\s?[A-HJ-PR-Z]{3}$", options: NSRegularExpressionOptions.CaseInsensitive, error: &ierror)!
var modString = regex.stringByReplacingMatchesInString(myString, options: nil, range: NSMakeRange(0, stringlength), withTemplate: "XX")
print(modString)
The result is XX
However, the following does not work and the string is not modified:
var myString = "my car reg 1 - DD11 AAA my car reg 2 - AA22 BBB"
var stringlength = countElements(myString)
var ierror: NSError?
var regex:NSRegularExpression = NSRegularExpression(pattern: "^([A-HK-PRSVWY][A-HJ-PR-Y])\\s?([0][2-9]|[1-9][0-9])\\s?[A-HJ-PR-Z]{3}$", options: NSRegularExpressionOptions.CaseInsensitive, error: &ierror)!
var modString = regex.stringByReplacingMatchesInString(myString, options: nil, range: NSMakeRange(0, stringlength), withTemplate: "XX")
print(modString)
The result is my car reg 1 - DD11 AAA my car reg 2 - AA22 BBB
Can anyone give me any pointers?
You need to remove the ^ and $ anchors.
The ^ means start of string and $ means end of string (or line, depending on the options). That's why your first example works: in the first test string, the start of the string is really followed by your pattern and ends with it.
In the second test string, the pattern is found in the middle of the string, thus the ^... can't apply. If you would just remove the ^, the $ would apply on the second occurrence of the registration number and the output would be my car reg 1 - DD11 AAA my car reg 2 - XX.
let myString = "my car reg 1 - DD11 AAA my car reg 2 - AA22 BBB"
let regex = try! NSRegularExpression(pattern: "([A-HK-PRSVWY][A-HJ-PR-Y])\\s?([0][2-9]|[1-9][0-9])\\s?[A-HJ-PR-Z]{3}", options: NSRegularExpression.Options.caseInsensitive)
let range = NSMakeRange(0, myString.count)
let modString = regex.stringByReplacingMatches(in: myString, options: [], range: range, withTemplate: "XX")
print(modString)
// Output: "my car reg 1 - XX my car reg 2 - XX"
Let's use a String extension to wrap this up in Swift 3 syntax:
extension String {
mutating func removingRegexMatches(pattern: String, replaceWith: String = "") {
do {
let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let range = NSRange(location: 0, length: count)
self = regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: replaceWith)
} catch { return }
}
}
var phoneNumber = "+1 07777777777"
phoneNumber.removingRegexMatches(pattern: "\\+\\d{1,4} (0)?")
Results in 7777777777 (thus removing country code from phone number)
Swift 4.2 Updated
let myString = "my car reg 1 - DD11 AAA my car reg 2 - AA22 BBB"
if let regex = try? NSRegularExpression(pattern: "([A-HK-PRSVWY][A-HJ-PR-Y])\\s?([0][2-9]|[1-9][0-9])\\s?[A-HJ-PR-Z]{3}", options: .caseInsensitive) {
let modString = regex.stringByReplacingMatches(in: myString, options: [], range: NSRange(location: 0, length: myString.count), withTemplate: "XX")
print(modString)
}
Update for Swift 2.1:
var myString = "my car reg 1 - DD11 AAA my car reg 2 - AA22 BBB"
if let regex = try? NSRegularExpression(pattern: "([A-HK-PRSVWY][A-HJ-PR-Y])\\s?([0][2-9]|[1-9][0-9])\\s?[A-HJ-PR-Z]{3}", options: .CaseInsensitive) {
let modString = regex.stringByReplacingMatchesInString(myString, options: .WithTransparentBounds, range: NSMakeRange(0, myString.characters.count), withTemplate: "XX")
print(modString)
}
Warning
Do not use NSRange(location: 0, length: myString.count), as all the examples quoted above do.
Use NSRange(myString.startIndex..., in: myString) instead!
.count counts Characters (grapheme clusters), so a sequence such as \r\n counts as one character. This can produce a shortened, and thus invalid, NSRange that does not cover the whole string.
(NSString's .length should work.)
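A small sketch of the difference, using a made-up sample string that contains both a \r\n sequence and an emoji:

```swift
import Foundation

// Sample string chosen for illustration only.
let text = "a\r\nb🏈"

let characterCount = text.count      // grapheme clusters: "a", "\r\n", "b", "🏈"
let utf16Count = text.utf16.count    // UTF-16 code units, which is what NSRange measures
let fullRange = NSRange(text.startIndex..., in: text)

print(characterCount, utf16Count, fullRange.length)  // prints "4 6 6"
```

A range of length 4 would stop before the emoji, so a regex applied to it would never see the end of the string.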
With pattern: "^ ... $" you have specified that the pattern is anchored
to the start and end of the string, in other words, the entire string
must match the pattern. Just remove ^ and $ from the pattern
and you'll get the expected result.
Simple extension:
extension String {
func replacingRegex(
matching pattern: String,
findingOptions: NSRegularExpression.Options = .caseInsensitive,
replacingOptions: NSRegularExpression.MatchingOptions = [],
with template: String
) throws -> String {
let regex = try NSRegularExpression(pattern: pattern, options: findingOptions)
let range = NSRange(startIndex..., in: self)
return regex.stringByReplacingMatches(in: self, options: replacingOptions, range: range, withTemplate: template)
}
}
✅ Advantages over other answers
Exposes the thrown error to the caller
Exposes the finding options to the caller, with a default for ease of use
Exposes the replacing options to the caller, with a default for ease of use
Fixes the range BUG 🐞 in the original answer
A notice to all answers that use .count:
This will cause problems when the target range contains characters encoded as surrogate pairs.
Please fix your answers by using .utf16.count instead.
Here's Ryan Brodie's answer with this fix. It works with Swift 5.5.
private extension String {
mutating func regReplace(pattern: String, replaceWith: String = "") {
do {
let regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive, .anchorsMatchLines])
let range = NSRange(location: 0, length: self.utf16.count)
self = regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: replaceWith)
} catch { return }
}
}
Update: taking @coyer's concerns into account:
private extension String {
mutating func regReplace(pattern: String, replaceWith: String = "") {
do {
let regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive, .anchorsMatchLines])
let range = NSRange(self.startIndex..., in: self)
self = regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: replaceWith)
} catch { return }
}
}
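Also, to @Martin R:
It is okay to keep ^ and $ in the regex as long as you enable .anchorsMatchLines in the regex options, which makes them match at the start and end of each line rather than of the whole string. I already applied this option in the code blocks above. Note that this only helps when each match sits on a line of its own; a match in the middle of a line still requires removing the anchors.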
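To see what .anchorsMatchLines changes, here is a minimal sketch. The helper name countMatches and the simplified pattern are assumptions made for illustration; the pattern is a stand-in for the full registration regex.

```swift
import Foundation

// Hypothetical helper: counts matches of a pattern over the whole string.
func countMatches(of pattern: String, in text: String,
                  options: NSRegularExpression.Options) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else { return 0 }
    return regex.numberOfMatches(in: text, options: [], range: NSRange(text.startIndex..., in: text))
}

// Simplified stand-in for the registration pattern, anchored with ^ and $.
let anchored = "^[A-Z]{2}\\d{2} [A-Z]{3}$"
let onePerLine = "DD11 AAA\nAA22 BBB"
let inline = "reg 1 - DD11 AAA reg 2 - AA22 BBB"

let plain = countMatches(of: anchored, in: onePerLine, options: [])                        // 0
let perLine = countMatches(of: anchored, in: onePerLine, options: [.anchorsMatchLines])    // 2
let midLine = countMatches(of: anchored, in: inline, options: [.anchorsMatchLines])        // 0
print(plain, perLine, midLine)
```

The last case is the one from the question: the registrations sit in the middle of a line, so the option alone does not make the anchored pattern match.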

Swift splitting "abc1.23.456.7890xyz" into "abc", "1", "23", "456", "7890" and "xyz"

In Swift on OS X I am trying to chop up the string "abc1.23.456.7890xyz" into these strings:
"abc"
"1"
"23"
"456"
"7890"
"xyz"
but when I run the following code I get the following:
=> "abc1.23.456.7890xyz"
(0,3) -> "abc"
(3,1) -> "1"
(12,4) -> "7890"
(16,3) -> "xyz"
which means that the application correctly found "abc", the first token "1", but then the next token found is "7890" (missing out "23" and "456") followed by "xyz".
Can anyone see how the code can be changed to find ALL of the strings (including "23" and "456")?
Many thanks in advance.
import Foundation
import XCTest

public class StackOverflowTest: XCTestCase {
    public func testRegex() {
        do {
            let patternString = "([^0-9]*)([0-9]+)(?:\\.([0-9]+))*([^0-9]*)"
            let regex = try NSRegularExpression(pattern: patternString, options: [])
            let string = "abc1.23.456.7890xyz"
            print("=> \"\(string)\"")
            let range = NSMakeRange(0, string.characters.count)
            regex.enumerateMatchesInString(string, options: [], range: range) {
                (textCheckingResult, _, _) in
                if let textCheckingResult = textCheckingResult {
                    for nsRangeIndex in 1 ..< textCheckingResult.numberOfRanges {
                        let nsRange = textCheckingResult.rangeAtIndex(nsRangeIndex)
                        let location = nsRange.location
                        if location < Int.max {
                            let startIndex = string.startIndex.advancedBy(location)
                            let endIndex = startIndex.advancedBy(nsRange.length)
                            let value = string[startIndex ..< endIndex]
                            print("\(nsRange) -> \"\(value)\"")
                        }
                    }
                }
            }
        } catch {
        }
    }
}
It's all about your regex pattern. You want to find a series of contiguous letters or digits. Try this pattern instead:
let patternString = "([a-zA-Z]+|\\d+)"
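The reason the original pattern loses "23" and "456" is that a repeated capture group only keeps its last repetition, so (?:\.([0-9]+))* reports "7890" alone. With the simpler pattern, each token is its own match. A minimal sketch in current Swift syntax (the function name tokens is hypothetical):

```swift
import Foundation

// Hypothetical helper: returns every run of letters or digits as a separate token.
func tokens(in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: "([a-zA-Z]+|\\d+)", options: []) else {
        return []
    }
    // Work on the NSString view so match ranges and substring offsets agree (UTF-16 units).
    let nsText = text as NSString
    let range = NSRange(location: 0, length: nsText.length)
    return regex.matches(in: text, options: [], range: range)
        .map { nsText.substring(with: $0.range) }
}

print(tokens(in: "abc1.23.456.7890xyz"))
// prints ["abc", "1", "23", "456", "7890", "xyz"]
```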
An alternative, 'Swifty' way:
let str = "abc1.23.456.7890xyz"
let chars = str.characters.map{ $0 }
enum CharType {
case Number
case Alpha
init(c: Character) {
self = .Alpha
if isNumber(c) {
self = .Number
}
}
func isNumber(c: Character)->Bool {
return "1234567890".characters.map{ $0 }.contains(c)
}
}
var tmp = ""
tmp.append(chars[0])
var type = CharType(c: chars[0])
for i in 1..<chars.count {
let c = CharType(c: chars[i])
if c != type {
tmp.append(Character("."))
}
tmp.append(chars[i])
type = c
}
tmp.characters.split(".", maxSplit: Int.max, allowEmptySlices: false).map(String.init)
// ["abc", "1", "23", "456", "7890", "xyz"]

Swift 2.1+ return String array, with emojis \\w+ expression

The problem is that "\w+" works fine with plain text only. The goal is to have runs of emoji returned as words too, rather than skipped as if they were whitespace.
Example:
"This is some text 🏈🏈".regex("\\w+")
Desired output:
["This","is","some","text","🏈🏈"]
Code:
extension String {
    func regex (pattern: String) -> [String] {
        do {
            let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions(rawValue: 0))
            let nsstr = self as NSString
            let all = NSRange(location: 0, length: nsstr.length)
            var matches : [String] = [String]()
            regex.enumerateMatchesInString(self, options: NSMatchingOptions(rawValue: 0), range: all) {
                (result : NSTextCheckingResult?, _, _) in
                if let r = result {
                    let result = nsstr.substringWithRange(r.range) as String
                    matches.append(result)
                }
            }
            return matches
        } catch {
            return [String]()
        }
    }
}
The code above gives the following output:
"This is some text 🏈🏈".regex("\\w+")
// Yields: ["This", "is", "some", "text"]
// Note the 🏈🏈 are missing.
Is it a coding issue, regex issue, or both? Other answers seem to show the same problem.
func matchesForRegexInText(regex: String!, text: String!) -> [String] {
do {
let regex = try NSRegularExpression(pattern: regex, options: [])
let nsString = text as NSString
let results = regex.matchesInString(text,
options: [], range: NSMakeRange(0, nsString.length))
return results.map { nsString.substringWithRange($0.range)}
} catch let error as NSError {
print("invalid regex: \(error.localizedDescription)")
return []
}
}
let string = "This is some text 🏈🏈"
let matches = matchesForRegexInText("\\w+", text: string)
// Also yields ["This", "is", "some", "text"]
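My mistake:
\w+ matches runs of word characters (letters, digits and underscore), and emoji are not word characters, so they are skipped just like whitespace. Matching runs of anything that is not a space or a tab gives the desired result:
"This is some text \t 🏈🏈".regex("[^ |^\t]+")
// Gives the correct answer: ["This", "is", "some", "text", "🏈🏈"]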
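Note that inside a character class, | and a non-leading ^ are literal characters, so the class above also excludes those two symbols. A possibly simpler alternative is \S+, which matches runs of non-whitespace. A minimal sketch in current Swift syntax, with the method renamed to the hypothetical regexMatches to avoid clashing with the extension above:

```swift
import Foundation

extension String {
    // Hypothetical helper: returns all substrings matching the pattern,
    // or an empty array if the pattern is invalid.
    func regexMatches(_ pattern: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }
        let nsSelf = self as NSString
        let all = NSRange(location: 0, length: nsSelf.length)
        return regex.matches(in: self, options: [], range: all)
            .map { nsSelf.substring(with: $0.range) }
    }
}

let words = "This is some text \t 🏈🏈".regexMatches("\\S+")
print(words)  // prints ["This", "is", "some", "text", "🏈🏈"]
```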

Matching strings in Swift where characters are different but contain equal Unicode scalars

I want to match a string with a regex in Swift. I am following the approach described here.
Typically this would work like this (as evaluated in a Xcode playground):
var str1 = "hello"
var str2 = "bye"
var regex1 = "[abc]"
str1.rangeOfString(regex1, options:.RegularExpressionSearch) != nil // false - there is no match
str2.rangeOfString(regex1, options:.RegularExpressionSearch) != nil // true - there is a match
So far so good. Now let us take two strings which contain characters consisting of more than one Unicode scalar like so (as evaluated in a Xcode playground):
var str3 = "✔️"
var regex2 = "[✖️]"
"✔️" == "✖️" // false - the strings are not equal
str3.rangeOfString(regex2, options:.RegularExpressionSearch) != nil // true - there is a match!
I wouldn't expect a match when I try to find "✖️" in "✔️", but because "\u{2714}"+"\u{FE0F}" == "✔️" and "\u{2716}"+"\u{FE0F}" == "✖️", then "\u{FE0F}" is found in both and that gives a match.
How would you perform the match?
Digging into the link provided by @stribizhev, I have come up with this (as evaluated in an Xcode playground):
var str1 = "hello"
var str2 = "bye"
var str3 = "✔️"
var regex1 = "[abc]"
var regex2 = "[✖️]"
let matcher1 = try! NSRegularExpression(pattern: regex1, options: NSRegularExpressionOptions.CaseInsensitive)
let matcher2 = try! NSRegularExpression(pattern: regex2, options: NSRegularExpressionOptions.CaseInsensitive)
matcher1.numberOfMatchesInString(str1, options: NSMatchingOptions.ReportCompletion, range: NSMakeRange(0, str1.characters.count)) != 0 // false - does not match
matcher1.numberOfMatchesInString(str2, options: NSMatchingOptions.ReportCompletion, range: NSMakeRange(0, str2.characters.count)) != 0 // true - matches
matcher2.numberOfMatchesInString(str3, options: NSMatchingOptions.ReportCompletion, range: NSMakeRange(0, str3.characters.count)) != 0 // false - does not match
This is for Xcode 7.1 and Swift 2.1.
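One caveat about the code above: characters.count for "✔️" is 1 while its UTF-16 length is 2, so the search range covers only U+2714 and never reaches U+FE0F. With a range spanning the whole string, the character class would still match. The following is a sketch in current Swift syntax, not the original author's method, showing two alternatives that do not rely on a shortened range: a non-capturing group, which matches the two scalars as a sequence, and a plain Character comparison. The helper name fullRangeMatchCount is hypothetical.

```swift
import Foundation

let check = "\u{2714}\u{FE0F}"   // ✔️
let cross = "\u{2716}\u{FE0F}"   // ✖️

// Hypothetical helper: counts matches over the full UTF-16 range of the text.
func fullRangeMatchCount(_ pattern: String, in text: String) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return 0 }
    return regex.numberOfMatches(in: text, options: [], range: NSRange(text.startIndex..., in: text))
}

// A character class is a set of scalars, so U+FE0F alone is enough to match.
let classCount = fullRangeMatchCount("[\(cross)]", in: check)     // 1
// A group matches U+2716 followed by U+FE0F as a sequence.
let groupCount = fullRangeMatchCount("(?:\(cross))", in: check)   // 0
// Swift compares whole grapheme clusters here, with no regex involved.
let containsCross = check.contains(Character(cross))              // false

print(classCount, groupCount, containsCross)
```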