How to use regex to split a string into groups of identical characters?

I have a string like this:
var string = "AAAAAAABBBCCCCCCDD"
and would like to split it into an array of this format (same characters --> same group) using regular expressions:
Array: "AAAAAAA", "BBB", "CCCCCC", "DD"
This is what I have so far, but to be honest I cannot really get it working.
var array = [String]()
var string = "AAAAAAABBBCCCCCCDD"
let pattern = "\\ b([1,][a-z])\\" // mistake?!
let regex = try! NSRegularExpression(pattern: pattern, options: [])
array = regex.matchesInString(string, options: [], range: NSRange(location: 0, length: string.count))

You can achieve that using this function from this answer:
import Foundation
func matches(for regex: String, in text: String) -> [String] {
do {
let regex = try NSRegularExpression(pattern: regex)
let results = regex.matches(in: text,
range: NSRange(text.startIndex..., in: text))
return results.map {
String(text[Range($0.range, in: text)!])
}
} catch let error {
print("invalid regex: \(error.localizedDescription)")
return []
}
}
Pass (.)\\1+ as the regex and AAAAAAABBBCCCCCCDD as the text, like this (note that + makes the pattern match only runs of two or more identical characters):
let result = matches(for: "(.)\\1+", in: "AAAAAAABBBCCCCCCDD")
print(result) // ["AAAAAAA", "BBB", "CCCCCC", "DD"]

You can achieve that with a "back reference"; compare the NSRegularExpression documentation:
\n  Back Reference. Match whatever the nth capturing group matched. n must be a number ≥ 1 and ≤ total number of capture groups in the pattern.
Example (using the utility method from Swift extract regex matches):
let string = "AAAAAAABBBCCCCCCDDE"
let pattern = "(.)\\1*"
let array = matches(for: pattern, in: string)
print(array)
// ["AAAAAAA", "BBB", "CCCCCC", "DD", "E"]
The pattern matches an arbitrary character, followed by zero or more
occurrences of the same character. If you are only interested in
repeating word characters use
let pattern = "(\\w)\\1*"
instead.
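The two answers differ only in the quantifier after the back reference: \1+ needs at least one repeat, so a character that appears only once in a row is dropped, while \1* also accepts a lone character. A small self-contained sketch to compare them; the helper is a compact variant of the matches(for:in:) function shown above:

```swift
import Foundation

// Compact variant of the matches(for:in:) helper: returns every full match of `regex` in `text`.
func matches(for regex: String, in text: String) -> [String] {
    guard let re = try? NSRegularExpression(pattern: regex) else { return [] }
    let results = re.matches(in: text, range: NSRange(text.startIndex..., in: text))
    return results.compactMap { (result: NSTextCheckingResult) -> String? in
        guard let range = Range(result.range, in: text) else { return nil }
        return String(text[range])
    }
}

let input = "AAAAAAABBBCCCCCCDDE"
// (.)\1+ : a character followed by ONE or more copies of itself (runs of length >= 2).
let atLeastTwo = matches(for: "(.)\\1+", in: input)
// (.)\1* : a character followed by ZERO or more copies of itself (every run).
let everyRun = matches(for: "(.)\\1*", in: input)
print(atLeastTwo)  // ["AAAAAAA", "BBB", "CCCCCC", "DD"]
print(everyRun)    // ["AAAAAAA", "BBB", "CCCCCC", "DD", "E"]
```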

Related

NSRegularExpression to extract subset of text in Swift 3

I am trying to use NSRegularExpression(pattern: regex) in Swift 3 to extract 10.32.15.235 from the string \"IPAddress\":\"10.32.15.235\",\"WAN\".
However, I'm getting an error using this function from this answer:
func matches(for regex: String, in text: String) -> [String] {
do {
let regex = try NSRegularExpression(pattern: regex)
let nsString = text as NSString
let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
return results.map { nsString.substring(with: $0.range)}
} catch let error {
print("invalid regex: \(error.localizedDescription)")
return []
}
}
With this call:
let pattern = "IPAddress\\\":\\\"(.+?)\\"
let IPAddressString = self.matches(for: pattern, in: stringData!)
print(IPAddressString)
However, the error part of the function is called with this error:
invalid regex: The value “IPAddress\":\"(.+?)\” is invalid.
Can you help me fix the regular expression for Swift 3?
Thanks
Note that if you have valid JSON, you can use a JSON parser in Swift instead.
To fix your current regex approach, you may use
let pattern = "(?<=IPAddress\":\")[^\"]+"
Pattern details
(?<=IPAddress\":\") - a positive lookbehind that matches a position in the string right after the IPAddress":" substring
[^\"]+ - a negated character class matching 1 or more chars other than "
See the regex demo.
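To see the look-behind pattern in action, here is a minimal sketch that applies it with firstMatch(in:range:). The input is assumed to be exactly the fragment quoted in the question, and the variable names are illustrative:

```swift
import Foundation

// Input mirrors the question's fragment: "IPAddress":"10.32.15.235","WAN"
let stringData = "\"IPAddress\":\"10.32.15.235\",\"WAN\""
// Regex: (?<=IPAddress":")[^"]+
//  (?<=IPAddress":") : look-behind, must precede the match but is not part of it
//  [^"]+             : one or more characters other than a double quote
let pattern = "(?<=IPAddress\":\")[^\"]+"

var ipAddress: String?
if let regex = try? NSRegularExpression(pattern: pattern),
   let match = regex.firstMatch(in: stringData, range: NSRange(stringData.startIndex..., in: stringData)),
   let range = Range(match.range, in: stringData) {
    ipAddress = String(stringData[range])
}
print(ipAddress ?? "no match")  // 10.32.15.235
```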

Extracting words from inside sentence using regex in swift

I want a regex to extract Starboy and The Weeknd / Daft Punk out of this string:
The Weeknd / Daft Punk - text=\"Starboy\" song_spot=\"M\" MediaBaseId=\"2238986\" itunesTrackId=\"0\" amgTrackId=\"-1\" amgArtistId=\"0\" TAID=\"744880\" TPID=\"43758958\" cartcutId=\"08
So far this is my attempt
do {
let input = "The Weeknd / Daft Punk - text=\"Starboy\" song_spot=\"M\" MediaBaseId=\"2238986\" itunesTrackId=\"0\" amgTrackId=\"-1\" amgArtistId=\"0\" TAID=\"744880\" TPID=\"43758958\" cartcutId=\"0893584001\""
let regex = try NSRegularExpression(pattern: "text=\"(.*)", options: NSRegularExpression.Options.caseInsensitive)
let matches = regex.matches(in: input, options: [], range: NSRange(location: 0, length: input.utf16.count))
if let match = matches.first {
let range = match.range(at:1)
if let swiftRange = Range(range, in: input) {
let name = input[swiftRange]
print(name)
}
}
} catch {
print("Regex was bad!")
}
But this gives me the whole rest of the string:
Starboy" song_spot="M" MediaBaseId="2238986" itunesTrackId="0" amgTrackId="-1" amgArtistId="0" TAID="744880" TPID="43758958" cartcutId="0893584001"
If you need to capture all text up to the sequence " - text=" followed by any word(s) between quote marks, you can use the regex ".*(?=( - text=\"[\\w\\s]+\"))", and to capture any word(s) after the sequence text=" you can use the regex "(?<=text=\")([\\w\\s]+)". If you want to capture both ranges, just join them with "|" as follows:
let string = """
The Weeknd / Daft Punk - text=\"Starboy\" song_spot=\"M\" MediaBaseId=\"2238986\" itunesTrackId=\"0\" amgTrackId=\"-1\" amgArtistId=\"0\" TAID=\"744880\" TPID=\"43758958\" cartcutId=\"08
"""
let pattern = ".*(?=( - text=\"[\\w\\s]+\"))|(?<=text=\")([\\w\\s]+)"
do {
let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let matches = regex.matches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count))
for match in matches {
if let range = Range(match.range, in: string) {
let name = string[range]
print(name)
}
}
} catch {
print("Regex was bad!")
}
This will print
The Weeknd / Daft Punk
Starboy
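As an alternative to the look-around version, a single match with two capture groups can return both parts at once. This is a sketch that assumes the artist always comes first, followed by " - text=", and that the title contains no double quote; the input is shortened from the question's string:

```swift
import Foundation

let input = "The Weeknd / Daft Punk - text=\"Starboy\" song_spot=\"M\" MediaBaseId=\"2238986\""
// Group 1: everything (lazily) from the start of the string up to " - text=".
// Group 2: everything between the quotes that follow text=.
let pattern = "^(.*?) - text=\"([^\"]*)\""

var artist: String?
var title: String?
if let regex = try? NSRegularExpression(pattern: pattern),
   let match = regex.firstMatch(in: input, range: NSRange(input.startIndex..., in: input)) {
    if let r = Range(match.range(at: 1), in: input) { artist = String(input[r]) }
    if let r = Range(match.range(at: 2), in: input) { title = String(input[r]) }
}
print(artist ?? "-")  // The Weeknd / Daft Punk
print(title ?? "-")   // Starboy
```

Capture groups avoid the look-ahead/look-behind pair and keep both values tied to the same match.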

Make sure regex matches the entire string with Swift regex

How can I check whether a WHOLE string matches a regex? In Java there is the String.matches(regex) method.
You need to use anchors, ^ (start of string anchor) and $ (end of string anchor), with range(of:options:range:locale:), passing the .regularExpression option:
import Foundation
let phoneNumber = "123-456-789"
let result = phoneNumber.range(of: "^\\d{3}-\\d{3}-\\d{3}$", options: .regularExpression) != nil
print(result)
Or you may pass an array of options, [.regularExpression, .anchored], where .anchored anchors the pattern at the start of the string only; you can then omit ^, but $ is still required to anchor the match at the end of the string:
let result = phoneNumber.range(of: "\\d{3}-\\d{3}-\\d{3}$", options: [.regularExpression, .anchored]) != nil
See the online Swift demo
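One pitfall with plain ^ and $ is alternation: ^cat|dog$ means "^cat" OR "dog$", so it matches part of "catdog". A minimal sketch of a helper that wraps the pattern in a non-capturing group so both anchors apply to every alternative; the name isFullMatch is hypothetical:

```swift
import Foundation

// Hypothetical helper: true only if the ENTIRE string matches `pattern`.
// Wrapping the pattern in ^(?:...)$ anchors every alternative.
func isFullMatch(_ text: String, pattern: String) -> Bool {
    return text.range(of: "^(?:\(pattern))$", options: .regularExpression) != nil
}

print(isFullMatch("123-456-789", pattern: "\\d{3}-\\d{3}-\\d{3}"))   // true
print(isFullMatch("123-456-7890", pattern: "\\d{3}-\\d{3}-\\d{3}"))  // false
print(isFullMatch("catdog", pattern: "cat|dog"))                      // false
// Without the group, the anchors bind to only one alternative each:
print("catdog".range(of: "^cat|dog$", options: .regularExpression) != nil)  // true
```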
Also, using NSPredicate with MATCHES is an alternative here:
The left hand expression equals the right hand expression using a regex-style comparison according to ICU v3 (for more details see the ICU User Guide for Regular Expressions).
MATCHES actually anchors the regex match both at the start and end of the string (note this might not work in all Swift 3 builds):
let pattern = "\\d{3}-\\d{3}-\\d{3}"
let predicate = NSPredicate(format: "self MATCHES [c] %@", pattern)
let result = predicate.evaluate(with: "123-456-789")
What you are looking for is range(of:options:range:locale:); you can then compare the result of range(of:options:) with the whole range of the string you are checking.
Example:
let phoneNumber = "(999) 555-1111"
let wholeRange = phoneNumber.startIndex..<phoneNumber.endIndex
if let match = phoneNumber.range(of: "\\(?\\d{3}\\)?\\s\\d{3}-\\d{4}", options: .regularExpression), wholeRange == match {
print("Valid number")
}
else {
print("Invalid number")
}
//Valid number
Edit: You can also use NSPredicate and test your string with its evaluate(with:) method.
let pattern = "^\\(?\\d{3}\\)?\\s\\d{3}-\\d{4}$"
let predicate = NSPredicate(format: "self MATCHES [c] %@", pattern)
if predicate.evaluate(with: "(888) 555-1111") {
print("Valid")
}
else {
print("Invalid")
}
Adapted from Swift extract regex matches, with a few edits:
import Foundation
func matches(for regex: String, in text: String) -> Bool {
do {
let regex = try NSRegularExpression(pattern: regex)
let nsString = text as NSString
let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
return !results.isEmpty
} catch let error {
print("invalid regex: \(error.localizedDescription)")
return false
}
}
Example usage from link above:
let string1 = "19320"
let matched1 = matches(for: "^[1-9]\\d*$", in: string1)
print(matched1) // true (will match)
let string2 = "a19320"
let matched2 = matches(for: "^[1-9]\\d*$", in: string2)
print(matched2) // false (will not match)

How to parse a string of hex into ascii equivalent in Swift 2

In Swift 2, what is the best way to turn strings of hex characters into their ASCII equivalent?
Given
let str1 = "0x4d 0x4c 0x4e 0x63"
let str2 = "4d 4c 4e 63"
let str3 = "4d4c4e63"
let str4 = "4d4d 4e63"
let str5 = "4d,4c,4e,63"
we would like to run a function (or String extension) that returns 'MLNc', which is the ASCII equivalent of the hex strings.
Pseudo Code:
Strip out all "junk": commas, spaces, etc.
Get "2 character chunks" and then convert these characters into the int equivalent with strtoul
build an array of characters and merge them into a string
Partial Implementation
func hexStringtoAscii(hexString : String) -> String {
let hexArray = split(hexString.characters) { $0 == " "}.map(String.init)
let numArray = hexArray.map{ strtoul($0, nil, 16) }.map{Character(UnicodeScalar(UInt32($0)))}
return String(numArray)
}
Is this partial implementation on the right track? And if so, what is the best way to handle the chunking?
Using regular expression matching is one possible method to extract the
"hex numbers" from the string.
What you are looking for is an optional "0x", followed by exactly
2 hex digits. The corresponding regex pattern is "(0x)?([0-9a-f]{2})".
Then you can convert each match to a Character and finally concatenate
the characters to a String, quite similar to your "partial implementation". Instead of strtoul() you can use the UInt32
initializer
init?(_ text: String, radix: Int = default)
which is new in Swift 2.
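In current Swift the same initializer still exists (it now accepts any StringProtocol value). A minimal sketch of how it behaves on a single two-digit chunk; the variable names are illustrative:

```swift
// UInt32(_:radix:) parses text in the given base and returns nil for invalid input.
let value = UInt32("4d", radix: 16)    // 77
let invalid = UInt32("zz", radix: 16)  // nil
// Turn the number into a Unicode scalar (failable) and then into a Character.
let character = value.flatMap { Unicode.Scalar($0) }.map { Character($0) }
print(character.map { String($0) } ?? "?")  // M
```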
The pattern has two "capture groups" (enclosed in parentheses):
the first one matches the optional "0x", and the second one matches
the two hex digits, whose range can be retrieved with
rangeAtIndex(2).
This leads to the following implementation which can handle all
your sample strings:
func hexStringtoAscii(hexString : String) -> String {
let pattern = "(0x)?([0-9a-f]{2})"
let regex = try! NSRegularExpression(pattern: pattern, options: .CaseInsensitive)
let nsString = hexString as NSString
let matches = regex.matchesInString(hexString, options: [], range: NSMakeRange(0, nsString.length))
let characters = matches.map {
Character(UnicodeScalar(UInt32(nsString.substringWithRange($0.rangeAtIndex(2)), radix: 16)!))
}
return String(characters)
}
(See Swift extract regex matches for an explanation for the conversion to NSString.)
Note that this function is quite lenient: it just searches for
2-digit hex strings and ignores all other characters, so this
would be accepted as well:
let str6 = "4d+-4c*/4e😈🇩🇪0x63"
Update for Swift 5.1:
func hexStringtoAscii(_ hexString : String) -> String {
let pattern = "(0x)?([0-9a-f]{2})"
let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let nsString = hexString as NSString
let matches = regex.matches(in: hexString, options: [], range: NSMakeRange(0, nsString.length))
let characters = matches.map {
Character(UnicodeScalar(UInt32(nsString.substring(with: $0.range(at: 2)), radix: 16)!)!)
}
return String(characters)
}
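For comparison, a sketch of the same function that skips the NSString bridge and slices the Swift String directly with Range(_:in:), the same conversion the matches(for:in:) helper uses. It keeps the regex from the answer and uses optional binding instead of force unwraps; the name hexStringToASCII is hypothetical:

```swift
import Foundation

// Same regex as above, but slices the Swift String via Range(_:in:).
func hexStringToASCII(_ hexString: String) -> String {
    // Optional "0x" prefix (group 1) followed by exactly two hex digits (group 2).
    let regex = try! NSRegularExpression(pattern: "(0x)?([0-9a-f]{2})", options: .caseInsensitive)
    let fullRange = NSRange(hexString.startIndex..., in: hexString)
    var result = ""
    for match in regex.matches(in: hexString, options: [], range: fullRange) {
        // Convert group 2's NSRange into a String range, parse it as base 16,
        // and append the corresponding Unicode scalar.
        guard let digitsRange = Range(match.range(at: 2), in: hexString),
              let value = UInt32(hexString[digitsRange], radix: 16),
              let scalar = Unicode.Scalar(value) else { continue }
        result.unicodeScalars.append(scalar)
    }
    return result
}

print(hexStringToASCII("0x4d 0x4c 0x4e 0x63"))  // MLNc
```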

How to use regex with Swift?

I am making an app in Swift and I need to extract 8 digits from a string.
Here's the string:
index.php?page=index&l=99182677
My pattern is:
&l=(\d{8,})
And here's my code:
var yourAccountNumber = "index.php?page=index&l=99182677"
let regex = try! NSRegularExpression(pattern: "&l=(\\d{8,})", options: NSRegularExpressionOptions.CaseInsensitive)
let range = NSMakeRange(0, yourAccountNumber.characters.count)
let match = regex.matchesInString(yourAccountNumber, options: NSMatchingOptions.Anchored, range: range)
First, I don't understand what NSMatchingOptions means; in Apple's official documentation I don't get all the .Anchored, .ReportProgress, etc. options. Could anyone enlighten me on this?
Then, when I print(match), the variable seems to contain nothing ([]).
I am using Xcode 7 Beta 3, with Swift 2.0.
ORIGINAL ANSWER
Here is a function you can leverage to get captured group texts:
import Foundation
extension String {
func firstMatchIn(string: NSString!, atRangeIndex: Int!) -> String {
var error : NSError?
let re = NSRegularExpression(pattern: self, options: .CaseInsensitive, error: &error)
let match = re.firstMatchInString(string, options: .WithoutAnchoringBounds, range: NSMakeRange(0, string.length))
return string.substringWithRange(match.rangeAtIndex(atRangeIndex))
}
}
And then:
var result = "&l=(\\d{8,})".firstMatchIn(yourAccountNumber, atRangeIndex: 1)
The 1 in atRangeIndex: 1 will extract the text captured by (\d{8,}) capture group.
NOTE1: If you plan to extract 8, and only 8 digits after &l=, you do not need the , in the limiting quantifier, as {8,} means 8 or more. Change to {8} if you plan to capture just 8 digits.
NOTE2: NSMatchingAnchored is something you should avoid if your expected result is not at the beginning of the search range. See the documentation:
Specifies that matches are limited to those at the start of the search range.
NOTE3: Speaking about "simplest" things, I'd advise avoiding look-arounds whenever you do not need them. Look-arounds usually come at some cost to performance, and if you are not going to capture overlapping text, I'd recommend using capture groups.
UPDATE FOR SWIFT 2
I have come up with a function that will return all matches with all capturing groups (similar to preg_match_all in PHP). Here is a way to use it for your scenario:
func regMatchGroup(regex: String, text: String) -> [[String]] {
do {
var resultsFinal = [[String]]()
let regex = try NSRegularExpression(pattern: regex, options: [])
let nsString = text as NSString
let results = regex.matchesInString(text,
options: [], range: NSMakeRange(0, nsString.length))
for result in results {
var internalString = [String]()
for var i = 0; i < result.numberOfRanges; ++i{
internalString.append(nsString.substringWithRange(result.rangeAtIndex(i)))
}
resultsFinal.append(internalString)
}
return resultsFinal
} catch let error as NSError {
print("invalid regex: \(error.localizedDescription)")
return [] // no matches, so callers checking matches.count > 0 do not index into an empty group list
}
}
// USAGE:
let yourAccountNumber = "index.php?page=index&l=99182677"
let matches = regMatchGroup("&l=(\\d{8,})", text: yourAccountNumber)
if (matches.count > 0) // If we have matches....
{
print(matches[0][1]) // Print the first one, Group 1.
}
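The Swift 2 function above no longer compiles in current Swift (C-style for loops and rangeAtIndex were removed). A sketch of an equivalent in Swift 5 syntax, assuming the same output shape of [whole match, group 1, group 2, ...] per match; the name allMatchGroups is hypothetical, and groups that did not participate in a match are returned as "":

```swift
import Foundation

// Hypothetical Swift 5 port of regMatchGroup: one [String] per match,
// element 0 is the whole match, element i is capture group i.
func allMatchGroups(for pattern: String, in text: String) -> [[String]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []  // invalid pattern: report no matches
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).map { (result: NSTextCheckingResult) -> [String] in
        (0..<result.numberOfRanges).map { (index: Int) -> String in
            // Range(_:in:) is nil for a group that did not take part (location NSNotFound).
            guard let range = Range(result.range(at: index), in: text) else { return "" }
            return String(text[range])
        }
    }
}

let groups = allMatchGroups(for: "&l=(\\d{8,})", in: "index.php?page=index&l=99182677")
if let first = groups.first {
    print(first[1])  // 99182677
}
```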
It may be easier just to use the NSString method instead of NSRegularExpression.
var yourAccountNumber = "index.php?page=index&l=99182677"
print(yourAccountNumber) // index.php?page=index&l=99182677
let regexString = "(?<=&l=)\\d{8,}+"
let options: NSStringCompareOptions = [.RegularExpressionSearch, .CaseInsensitiveSearch]
if let range = yourAccountNumber.rangeOfString(regexString, options: options) {
let digits = yourAccountNumber.substringWithRange(range)
print("digits: \(digits)")
}
else {
print("Match not found")
}
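In current Swift the same approach is spelled range(of:options:) and the result is sliced with a subscript. A minimal sketch wrapped in a function so it is easy to reuse; the name accountNumber(in:) is hypothetical:

```swift
import Foundation

// Hypothetical helper: Swift 5 spelling of the rangeOfString approach above.
func accountNumber(in text: String) -> String? {
    // (?<=&l=) : look-behind, "&l=" must precede the digits but is not returned.
    // \d{8,}+  : eight or more digits, possessive (no backtracking).
    let pattern = "(?<=&l=)\\d{8,}+"
    guard let range = text.range(of: pattern, options: [.regularExpression, .caseInsensitive]) else {
        return nil
    }
    return String(text[range])
}

print(accountNumber(in: "index.php?page=index&l=99182677") ?? "Match not found")  // 99182677
```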
(?<=&l=) means "preceded by, but not part of the match".
In detail:
Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)
In general, worrying about the performance of a look-behind without measurements to back it up is just premature optimization. That being said, there may be other valid reasons for and against look-arounds in regular expressions.
ICU User Guide: Regular Expressions
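The length restriction quoted above is enforced when the pattern is compiled. A small sketch, assuming NSRegularExpression is backed by ICU as documented: an unbounded look-behind makes the initializer throw, while the fixed-length one from this answer compiles:

```swift
import Foundation

// Unbounded look-behind (\d+ inside it): ICU rejects it, so try? yields nil.
let unbounded = try? NSRegularExpression(pattern: "(?<=&l=\\d+)x")
// Fixed-length look-behind, as used in this answer: compiles fine.
let fixed = try? NSRegularExpression(pattern: "(?<=&l=)\\d{8,}+")
print(unbounded == nil)  // true
print(fixed != nil)      // true
```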
For Swift 2, you can use this extension of String:
import Foundation
extension String {
func firstMatchIn(string: NSString!, atRangeIndex: Int!) -> String {
do {
let re = try NSRegularExpression(pattern: self, options: NSRegularExpressionOptions.CaseInsensitive)
guard let match = re.firstMatchInString(string as String, options: .WithoutAnchoringBounds, range: NSMakeRange(0, string.length)) else {
return "" // no match: avoid crashing on a force unwrap
}
return string.substringWithRange(match.rangeAtIndex(atRangeIndex))
} catch {
return ""
}
}
}
You can get the account-number with:
var result = "&l=(\\d{8,})".firstMatchIn(yourAccountNumber, atRangeIndex: 1)
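Replace NSMatchingOptions.Anchored with NSMatchingOptions() (no options): .Anchored only allows a match that starts at the very beginning of the search range, and &l= is not at the beginning of your string.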
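A sketch of the question's code in current Swift that shows the difference directly. It also builds the NSRange from the String itself, which counts UTF-16 units as NSRegularExpression expects, instead of using a character count:

```swift
import Foundation

let yourAccountNumber = "index.php?page=index&l=99182677"
let regex = try! NSRegularExpression(pattern: "&l=(\\d{8,})", options: .caseInsensitive)
let fullRange = NSRange(yourAccountNumber.startIndex..., in: yourAccountNumber)

// .anchored: a match may only start at the beginning of the search range (index 0).
let anchoredMatches = regex.matches(in: yourAccountNumber, options: .anchored, range: fullRange)
// []: no matching options, so the match may start anywhere in the range.
let plainMatches = regex.matches(in: yourAccountNumber, options: [], range: fullRange)

var accountNumber: String?
if let match = plainMatches.first, let range = Range(match.range(at: 1), in: yourAccountNumber) {
    accountNumber = String(yourAccountNumber[range])  // text of capture group 1
}
print(anchoredMatches.count)    // 0
print(accountNumber ?? "none")  // 99182677
```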