How to Split String Using Regex Expressions - regex
I have a string "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography". I want to split this string using the regular expression [0-9][0-9][0-9][A-Z][A-Z][A-Z] so that the function returns the array:
Array =
["323 ECO Economics Course ", "451 ENG English Course", "789 Mathematical Topography"]
How would I go about doing this using Swift?
Edit
My question is different from the one linked to. I realize that you can split a string in Swift using myString.components(separatedBy: "splitting string"). The issue is that that question doesn't address how to make the splitting string a regular expression. I tried using mystring.components(separatedBy: "[0-9][0-9][0-9][A-Z][A-Z][A-Z]", options: .regularExpression) but that didn't work.
How can I make the separatedBy: portion a regular expression?
You can use regex "\\b[0-9]{1,}[a-zA-Z ]{1,}" and this extension from this answer to get all ranges of a string using literal, caseInsensitive or regularExpression search:
extension StringProtocol {
    func ranges<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var startIndex = self.startIndex
        while startIndex < endIndex,
            let range = self[startIndex...].range(of: string, options: options) {
            result.append(range)
            startIndex = range.lowerBound < range.upperBound ? range.upperBound :
                index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
}
let inputString = "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography"
let courses = inputString.ranges(of: "\\b[0-9]{1,}[a-zA-Z ]{1,}", options: .regularExpression).map { inputString[$0].trimmingCharacters(in: .whitespaces) }
print(courses) // ["323 ECO Economics Course", "451 ENG English Course", "789 Mathematical Topography"]
Swift had no native regular expressions when this answer was written (a native Regex type only arrived with Swift 5.7), but Foundation provides NSRegularExpression.
import Foundation
let toSearch = "323 ECO Economics Course 451 ENG English Course 789 MAT Mathematical Topography"
let pattern = "[0-9]{3} [A-Z]{3}"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
// NSRegularExpression works with Objective-C NSString, which is UTF-16 encoded
let matches = regex.matches(in: toSearch, range: NSMakeRange(0, toSearch.utf16.count))
// the combination of zip, dropFirst and map to optional here is a trick
// to be able to map on [(result1, result2), (result2, result3), (result3, nil)]
let results = zip(matches, matches.dropFirst().map { Optional.some($0) } + [nil]).map { current, next -> String in
    let range = current.rangeAt(0)
    let start = String.UTF16Index(range.location)
    // if there's a next, use its starting location as the ending of our match
    // otherwise, go to the end of the searched string
    let end = next.map { $0.rangeAt(0) }.map { String.UTF16Index($0.location) } ?? String.UTF16Index(toSearch.utf16.count)
    return String(toSearch.utf16[start..<end])!
}
dump(results)
Running this will output
▿ 3 elements
- "323 ECO Economics Course "
- "451 ENG English Course "
- "789 MAT Mathematical Topography"
I needed something like this that works more like JS's String.prototype.split(pat: RegExp) or Rust's String.splitn(pat: Pattern<'a>), but with a regex. I ended up with this:
extension NSRegularExpression {
    convenience init(_ pattern: String) {...}
    /// An array of substrings of the given string, separated by this regular expression, restricted to returning at most n items.
    /// If n substrings are returned, the last substring (the nth substring) will contain the remainder of the string.
    /// - Parameter str: String to be matched
    /// - Parameter n: If `n` is specified and n != -1, it will be split into n elements else split into all occurrences of this pattern
    func splitn(_ str: String, _ n: Int = -1) -> [String] {
        // NSRange counts UTF-16 code units, so lengths must come from utf16, not utf8
        let range = NSRange(location: 0, length: str.utf16.count)
        let matches = self.matches(in: str, range: range)
        var result = [String]()
        if (n != -1 && n < 2) || matches.isEmpty { return [str] }
        if let first = matches.first?.range {
            if first.location == 0 { result.append("") }
            if first.location != 0 {
                let _range = NSRange(location: 0, length: first.location)
                result.append(String(str[Range(_range, in: str)!]))
            }
        }
        for (cur, next) in zip(matches, matches[1...]) {
            let loc = cur.range.location + cur.range.length
            if n != -1 && result.count + 1 == n {
                let _range = NSRange(location: loc, length: str.utf16.count - loc)
                result.append(String(str[Range(_range, in: str)!]))
                return result
            }
            let len = next.range.location - loc
            let _range = NSRange(location: loc, length: len)
            result.append(String(str[Range(_range, in: str)!]))
        }
        if let last = matches.last?.range, !(n != -1 && result.count >= n) {
            let lastIndex = last.length + last.location
            if lastIndex == str.utf16.count { result.append("") }
            if lastIndex < str.utf16.count {
                let _range = NSRange(location: lastIndex, length: str.utf16.count - lastIndex)
                result.append(String(str[Range(_range, in: str)!]))
            }
        }
        return result
    }
}
Passes the following tests
func testRegexSplit() {
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn("My . Love"), ["My", "Love"])
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn("My . Love . "), ["My", "Love", ""])
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn(" . My . Love"), ["", "My", "Love"])
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn(" . My . Love . "), ["", "My", "Love", ""])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX"), ["", "My", "", "Love", ""])
}
func testRegexSplitWithN() {
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 1), ["xXMyxXxXLovexX"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", -1), ["", "My", "", "Love", ""])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 2), ["", "MyxXxXLovexX"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 3), ["", "My", "xXLovexX"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 4), ["", "My", "", "LovexX"])
}
func testNoMatches() {
    XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove", 1), ["MyLove"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove"), ["MyLove"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove", 3), ["MyLove"])
}
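For comparison, here is a minimal sketch of the same JS-style split using the standard library's native Regex instead of NSRegularExpression. It assumes Swift 5.7 or later (and, on Apple platforms, macOS 13 / iOS 16 or later); the function name jsStyleSplit is made up for this illustration and is not part of any library.

```swift
// Hypothetical helper, not part of the answer above.
// Requires Swift 5.7+ (macOS 13 / iOS 16 or later on Apple platforms).
func jsStyleSplit(_ text: String, on pattern: String) throws -> [String] {
    // Build the regex at run time from a pattern string.
    let separator = try Regex(pattern)
    // Keep empty pieces so leading and trailing separators produce "" entries,
    // mirroring the behaviour the tests above expect from splitn.
    return text
        .split(separator: separator, omittingEmptySubsequences: false)
        .map { String($0) }
}

if let parts = try? jsStyleSplit("My . Love", on: "\\s*[.]\\s+") {
    print(parts)
}
```

The standard-library split also takes a maxSplits argument, but it counts the number of splits rather than the number of returned items, so it is not a drop-in replacement for the n parameter of splitn.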
Update to @tomahh's answer for the latest Swift (5):
import Foundation
let toSearch = "323 ECO Economics Course 451 ENG English Course 789 MAT Mathematical Topography"
let pattern = "[0-9]{3} [A-Z]{3}"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
let matches = regex.matches(in: toSearch, range: NSRange(toSearch.startIndex..<toSearch.endIndex, in: toSearch))
// the combination of zip, dropFirst and map to optional here is a trick
// to be able to map on [(result1, result2), (result2, result3), (result3, nil)]
let results = zip(matches, matches.dropFirst().map { Optional.some($0) } + [nil]).map { current, next -> String in
    // NSRange offsets count UTF-16 code units, so convert them with Range(_:in:) instead of offsetting by Characters
    let start = Range(current.range, in: toSearch)!.lowerBound
    let end = next.flatMap { Range($0.range, in: toSearch) }?.lowerBound ?? toSearch.endIndex
    return String(toSearch[start..<end])
}
dump(results)
▿ 3 elements
- "323 ECO Economics Course "
- "451 ENG English Course "
- "789 MAT Mathematical Topography"
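The idea shared by these answers (cut the string at the start of every match so that each piece keeps its leading code) can be wrapped in a reusable function. The following is only a sketch: the name splitKeepingDelimiters is hypothetical, and it assumes the pattern never produces empty matches.

```swift
import Foundation

// Hypothetical helper: cuts `text` at the start of every match of `pattern`,
// so each returned piece begins with the text that matched.
func splitKeepingDelimiters(_ text: String, pattern: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern)
    let fullRange = NSRange(text.startIndex..., in: text)
    // Start index of each match, converted from NSRange (UTF-16 offsets) to String.Index.
    let starts = regex.matches(in: text, range: fullRange)
        .compactMap { Range($0.range, in: text)?.lowerBound }
    guard let first = starts.first else { return [text] }

    var pieces: [String] = []
    // Keep any text that precedes the first match.
    if first > text.startIndex {
        pieces.append(String(text[..<first]))
    }
    // Each piece runs from one match start to the next one, or to the end of the string.
    let ends = Array(starts.dropFirst()) + [text.endIndex]
    for (start, end) in zip(starts, ends) {
        pieces.append(String(text[start..<end]))
    }
    return pieces
}

let catalog = "323 ECO Economics Course 451 ENG English Course 789 MAT Mathematical Topography"
let courses = (try? splitKeepingDelimiters(catalog, pattern: "[0-9]{3} [A-Z]{3}")) ?? []
print(courses)
// ["323 ECO Economics Course ", "451 ENG English Course ", "789 MAT Mathematical Topography"]
```

Converting with Range(_:in:) rather than offsetting by Characters keeps the slicing correct when the text contains characters outside the Basic Multilingual Plane, such as emoji.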
Related
Extracting words from inside sentence using regex in swift
I want a regex to extract Starboy and The Weeknd / Daft Punk out of this string:
The Weeknd / Daft Punk - text=\"Starboy\" song_spot=\"M\" MediaBaseId=\"2238986\" itunesTrackId=\"0\" amgTrackId=\"-1\" amgArtistId=\"0\" TAID=\"744880\" TPID=\"43758958\" cartcutId=\"08
So far this is my attempt:
do {
    let input = "The Weeknd / Daft Punk - text=\"Starboy\" song_spot=\"M\" MediaBaseId=\"2238986\" itunesTrackId=\"0\" amgTrackId=\"-1\" amgArtistId=\"0\" TAID=\"744880\" TPID=\"43758958\" cartcutId=\"0893584001\""
    let regex = try NSRegularExpression(pattern: "text=\"(.*)", options: NSRegularExpression.Options.caseInsensitive)
    let matches = regex.matches(in: input, options: [], range: NSRange(location: 0, length: input.utf16.count))
    if let match = matches.first {
        let range = match.range(at:1)
        if let swiftRange = Range(range, in: input) {
            let name = input[swiftRange]
            print(name)
        }
    }
} catch {
    print("Regex was bad!")
}
But this gives me the entire remainder of the string:
Starboy" song_spot="M" MediaBaseId="2238986" itunesTrackId="0" amgTrackId="-1" amgArtistId="0" TAID="744880" TPID="43758958" cartcutId="0893584001"
If you need to capture all text up to the sequence - text= followed by any word(s) between quote marks, you can use the regex ".*(?=(text=\"[\\w\\s]+\"))", and to capture any word(s) after the sequence text=" you can use the regex "(?<=text=\")([\\w\\s]+)". If you want to capture both ranges, just put "|" between them as follows:
let string = """
The Weeknd / Daft Punk - text=\"Starboy\" song_spot=\"M\" MediaBaseId=\"2238986\" itunesTrackId=\"0\" amgTrackId=\"-1\" amgArtistId=\"0\" TAID=\"744880\" TPID=\"43758958\" cartcutId=\"08
"""
let pattern = ".*(?=( - text=\"[\\w\\s]+\"))|(?<=text=\")([\\w\\s]+)"
do {
    let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
    let matches = regex.matches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count))
    for match in matches {
        if let range = Range(match.range, in: string) {
            let name = string[range]
            print(name)
        }
    }
} catch {
    print("Regex was bad!")
}
This will print:
The Weeknd / Daft Punk
Starboy
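An alternative to lookarounds is to use two capture groups and read them with range(at:). The following is a rough sketch rather than the answerer's method; the function name artistAndTitle and the assumption that the line always has the shape `<artist> - text="<title>"` are mine.

```swift
import Foundation

// Hypothetical helper: returns the artist and title from a line shaped like
// `<artist> - text="<title>" ...`, or nil when the line does not match.
func artistAndTitle(in line: String) -> (artist: String, title: String)? {
    // Group 1: everything before " - text=" (lazy); group 2: the quoted value.
    let pattern = "^(.*?) - text=\"([^\"]*)\""
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: line, range: NSRange(line.startIndex..., in: line)),
          let artistRange = Range(match.range(at: 1), in: line),
          let titleRange = Range(match.range(at: 2), in: line)
    else { return nil }
    return (String(line[artistRange]), String(line[titleRange]))
}

let input = "The Weeknd / Daft Punk - text=\"Starboy\" song_spot=\"M\" MediaBaseId=\"2238986\""
if let result = artistAndTitle(in: input) {
    print(result.artist) // The Weeknd / Daft Punk
    print(result.title)  // Starboy
}
```

Using `[^\"]*` for the title stops at the closing quote, which avoids the greedy `(.*)` problem described in the question.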
Without using NSRegularExpression, How can I get all matches of my string regular expression?
Swift 3 introduced String.range(of:options:). With this function, it is possible to match part of a string without creating an NSRegularExpression object, for example:
let text = "it is need #match my both #hashtag!"
let match = text.range(of: "(?:^#|\\s#)[\\p{L}0-9_]*", options: .regularExpression)!
print(text[match]) // " #match" (the match includes the leading space)
But is it possible to match both occurrences of the regexp (that is, #match and #hashtag), instead of only the first?
let text = "it is need #match my both #hashtag!"
// create an object to store the ranges found
var ranges: [Range<String.Index>] = []
// create an object to store your search position
var start = text.startIndex
// create a while loop to find your regex ranges
while let range = text.range(of: "(?:^#|\\s#)[\\p{L}0-9_]*", options: .regularExpression, range: start..<text.endIndex) {
    // append your range found
    ranges.append(range)
    // and change the startIndex of your string search
    start = range.lowerBound < range.upperBound ? range.upperBound : text.index(range.lowerBound, offsetBy: 1, limitedBy: text.endIndex) ?? text.endIndex
}
ranges.forEach({print(text[$0])})
This will print #match and #hashtag, each preceded by the space that \s matched.
If you need to use it more than once in your code you should add this extension to your project:
extension StringProtocol {
    func ranges<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var start = startIndex
        while start < endIndex, let range = self[start...].range(of: string, options: options) {
            result.append(range)
            start = range.lowerBound < range.upperBound ? range.upperBound : index(after: range.lowerBound)
        }
        return result
    }
}
Usage:
let text = "it is need #match my both #hashtag!"
let pattern = "(?<!\\S)#[\\p{L}0-9_]*"
let ranges = text.ranges(of: pattern, options: .regularExpression)
let matches = ranges.map{text[$0]}
print(matches) // ["#match", "#hashtag"]
Search array element in string swift 3.0
I want to search a string for an element of an array of strings. Like this:
let array:[String] = ["dee", "kamal"]
let str:String = "Hello all how are you, I m here for deepak."
So I want str.contains("dee") == true. Is there any way to do this kind of search in a string?
You can do it in one line by composing a regular expression pattern "(item1|item2|item3)":
let array = ["dee", "kamal"]
let str = "Hello all how are you, I m here for deepak."
let success = str.range(of: "(" + array.joined(separator: "|") + ")", options: .regularExpression) != nil
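One caveat with joining the array directly into a pattern is that any element containing regex metacharacters (".", "+", "(" and so on) changes the meaning of the pattern. A minimal sketch that escapes each element first follows; the function name containsAny is hypothetical.

```swift
import Foundation

// Hypothetical helper: true when `text` contains at least one of `words`.
func containsAny(of words: [String], in text: String) -> Bool {
    // Escape each word so characters like "." or "+" are matched literally,
    // and drop empty strings, which would otherwise match everywhere.
    let escaped = words
        .filter { !$0.isEmpty }
        .map { NSRegularExpression.escapedPattern(for: $0) }
    guard !escaped.isEmpty else { return false }
    let pattern = escaped.joined(separator: "|")
    return text.range(of: pattern, options: .regularExpression) != nil
}

let found = containsAny(of: ["dee", "kamal"], in: "Hello all how are you, I m here for deepak.")
print(found) // true
```

When no regex features are needed, `words.contains { text.contains($0) }` gives the same answer without building a pattern at all.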
You should iterate over the array and for each element, call str.contains:
for word in array {
    if str.contains(word) {
        print("\(word) is part of the string")
    } else {
        print("Word not found")
    }
}
You can do it like this:
array.forEach { (item) in
    let isContains: Bool = str.contains(item)
    print(isContains)
}
Number of occurrences of substring in string in Swift
My main string is "hello Swift Swift and Swift" and the substring is "Swift". I need to get the number of times the substring "Swift" occurs in that string. This code can determine whether the pattern exists:
var string = "hello Swift Swift and Swift"
if string.rangeOfString("Swift") != nil {
    println("exists")
}
Now I need to know the number of occurrences.
A simple approach would be to split on "Swift", and subtract 1 from the number of parts:
let s = "hello Swift Swift and Swift"
let tok = s.components(separatedBy:"Swift")
print(tok.count-1)
This code prints 3.
Edit: Before Swift 3 syntax the code looked like this:
let tok = s.componentsSeparatedByString("Swift")
Should you want to count characters rather than substrings:
extension String {
    func count(of needle: Character) -> Int {
        return reduce(0) {
            $1 == needle ? $0 + 1 : $0
        }
    }
}
Optimising dwsolberg's solution to count faster. Also faster than componentsSeparatedByString.
extension String {
    /// stringToFind must be at least 1 character.
    func countInstances(of stringToFind: String) -> Int {
        assert(!stringToFind.isEmpty)
        var count = 0
        var searchRange: Range<String.Index>?
        while let foundRange = range(of: stringToFind, options: [], range: searchRange) {
            count += 1
            searchRange = Range(uncheckedBounds: (lower: foundRange.upperBound, upper: endIndex))
        }
        return count
    }
}
Usage:
// return 2
"aaaa".countInstances(of: "aa")
If you want to ignore accents, you may replace options: [] with options: .diacriticInsensitive like dwsolberg did.
If you want to ignore case, you may replace options: [] with options: .caseInsensitive like ConfusionTowers suggested.
If you want to ignore both accents and case, you may replace options: [] with options: [.caseInsensitive, .diacriticInsensitive] like ConfusionTowers suggested.
If, on the other hand, you want the fastest comparison possible and you can guarantee some canonical form for composed character sequences, then you may consider the option .literal, which will only perform exact matches.
Swift 5 Extension
extension String {
    func numberOfOccurrencesOf(string: String) -> Int {
        return self.components(separatedBy:string).count - 1
    }
}
Example use:
let string = "hello Swift Swift and Swift"
let numberOfOccurrences = string.numberOfOccurrencesOf(string: "Swift")
// numberOfOccurrences = 3
I'd recommend an extension to String in Swift 3 such as:
extension String {
    func countInstances(of stringToFind: String) -> Int {
        var stringToSearch = self
        var count = 0
        while let foundRange = stringToSearch.range(of: stringToFind, options: .diacriticInsensitive) {
            stringToSearch = stringToSearch.replacingCharacters(in: foundRange, with: "")
            count += 1
        }
        return count
    }
}
It's a loop that finds and removes each instance of stringToFind, incrementing the count on each go-round. Once stringToSearch no longer contains any stringToFind, the loop breaks and the count is returned. Note that I'm using .diacriticInsensitive so it ignores accents (for example, résumé and resume would both be found). You might want to add or change the options depending on the types of strings you want to find.
I needed a way to count substrings that may contain the start of the next matched substring. Leveraging dwsolberg's extension and String's range(of:options:range:locale:) method, I came up with this String extension:
extension String {
    /**
     Counts the occurrences of a given substring by calling String's `range(of:options:range:locale:)` method multiple times.
     - Parameter substring : The string to search for, optional for convenience
     - Parameter allowOverlap : Bool flag indicating whether the matched substrings may overlap. Count of "🐼🐼" in "🐼🐼🐼🐼" is 2 if allowOverlap is **false**, and 3 if it is **true**
     - Parameter options : String compare-options to use while counting
     - Parameter range : An optional range to limit the search, default is **nil**, meaning search whole string
     - Parameter locale : Locale to use while counting
     - Returns : The number of occurrences of the substring in this String
     */
    public func count(
        occurrencesOf substring: String?,
        allowOverlap: Bool = false,
        options: String.CompareOptions = [],
        range searchRange: Range<String.Index>? = nil,
        locale: Locale? = nil) -> Int {
        guard let substring = substring, !substring.isEmpty else { return 0 }
        var count = 0
        let searchRange = searchRange ?? startIndex..<endIndex
        var searchStartIndex = searchRange.lowerBound
        let searchEndIndex = searchRange.upperBound
        while let rangeFound = range(of: substring, options: options, range: searchStartIndex..<searchEndIndex, locale: locale) {
            count += 1
            if allowOverlap {
                searchStartIndex = index(rangeFound.lowerBound, offsetBy: 1)
            } else {
                searchStartIndex = rangeFound.upperBound
            }
        }
        return count
    }
}
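To make the difference between the two modes concrete, here is a compact sketch of the same search loop as a free function, followed by the counts it yields. The name countOccurrences is hypothetical, and the options, range and locale parameters of the extension above are left out for brevity.

```swift
import Foundation

// Hypothetical free function: a stripped-down version of the loop above.
func countOccurrences(of needle: String, in haystack: String, allowOverlap: Bool = false) -> Int {
    guard !needle.isEmpty else { return 0 }
    var count = 0
    var searchStart = haystack.startIndex
    while let found = haystack.range(of: needle, range: searchStart..<haystack.endIndex) {
        count += 1
        // Overlapping: resume one Character after the start of this match.
        // Non-overlapping: resume after the whole match.
        searchStart = allowOverlap ? haystack.index(after: found.lowerBound) : found.upperBound
    }
    return count
}

print(countOccurrences(of: "aa", in: "aaaa"))                     // 2
print(countOccurrences(of: "aa", in: "aaaa", allowOverlap: true)) // 3
```

With overlap allowed, "aa" is found at offsets 0, 1 and 2 of "aaaa"; without it, the search skips past each match and finds only the ones at offsets 0 and 2.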
Why not just use some length maths?
extension String {
    func occurences(of search:String) -> Int {
        guard search.count > 0 else {
            preconditionFailure()
        }
        let shrunk = self.replacingOccurrences(of: search, with: "")
        return (self.count - shrunk.count)/search.count
    }
}
Try this:
var mainString = "hello Swift Swift and Swift"
var count = 0
mainString.enumerateSubstrings(in: mainString.startIndex..<mainString.endIndex, options: .byWords) { (subString, subStringRange, enclosingRange, stop) in
    if case let s? = subString{
        if s.caseInsensitiveCompare("swift") == .orderedSame{
            count += 1
        }
    }
}
print(count)
For the sake of completeness, and because there is a regex tag, this is a solution with a regular expression:
let string = "hello Swift Swift and Swift"
let regex = try! NSRegularExpression(pattern: "swift", options: .caseInsensitive)
let numberOfOccurrences = regex.numberOfMatches(in: string, range: NSRange(string.startIndex..., in: string))
The option .caseInsensitive is optional.
My solution. It might be better to use String.Index instead of an Int range, but I think it is a bit easier to read this way.
extension String {
    func count(of char: Character, range: (Int, Int)? = nil) -> Int {
        let range = range ?? (0, self.count)
        return self.enumerated().reduce(0) {
            guard ($1.0 >= range.0) && ($1.0 < range.1) else { return $0 }
            return ($1.1 == char) ? $0 + 1 : $0
        }
    }
}
A solution that uses higher-order functions:
func subStringCount(str: String, substr: String) -> Int {
    { $0.isEmpty ? 0 : $0.count - 1 } ( str.components(separatedBy: substr))
}
Unit tests:
import XCTest
class HigherOrderFunctions: XCTestCase {
    func testSubstringWhichIsPresentInString() {
        XCTAssertEqual(subStringCount(str: "hello Swift Swift and Swift", substr: "Swift"), 3)
    }
    func testSubstringWhichIsNotPresentInString() {
        XCTAssertEqual(subStringCount(str: "hello", substr: "Swift"), 0)
    }
}
Another way, using RegexBuilder in iOS 16+ and Swift 5.7+:
import RegexBuilder
let text = "hello Swift Swift and Swift"
let match = text.matches(of: Regex{"Swift"})
print(match.count) // prints 3
Using this as a function:
func countSubstrings(string : String, subString : String)-> Int{
    return string.matches(of: Regex{subString}).count
}
print(countSubstrings(string: text, subString: "Swift")) //prints 3
Using this as an extension:
extension String {
    func countSubstrings(subString : String)-> Int{
        return self.matches(of: Regex{subString}).count
    }
}
print(text.countSubstrings(subString: "Swift")) // prints 3
extract the first word from a string - regex
I have the following string:
str1 = "cat-one,cat2,cat-3";
OR
str1 = "catone,cat-2,cat3";
OR
str1 = "catone";
OR
str1 = "cat-one";
The point here is that words may or may not have "-"s in them. Using regex, how could I extract the 1st word? Appreciate any help on this. Thanks, L
It's pretty easy, just include allowed characters in brackets: ^([\w\-]+)
An approach not using a regex: assuming the first word is always delimited by a comma ",", you can do this:
var str1 = "cat-one";
var i = str1.indexOf(",");
var firstTerm = i == -1 ? str1 : str1.substring(0, i);
Edit: Assumed this was a JavaScript question, for some reason.
If someone would one day like to do it in Swift, here you go with an extension:
extension String {
    func firstWord() -> String? {
        var error : NSError?
        let internalExpression = NSRegularExpression(pattern: "^[a-zA-Z0-9]*", options: .CaseInsensitive, error: &error)!
        let matches = internalExpression.matchesInString(self, options: nil, range:NSMakeRange(0, countElements(self)))
        if (matches.count > 0) {
            let range = (matches[0] as NSTextCheckingResult).range
            return (self as NSString).substringWithRange(range)
        }
        return nil
    }
}
To use it just write:
myString.firstWord()
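The extension above uses early Swift 1.x APIs (countElements, matchesInString) that no longer compile. A possible equivalent for current Swift is sketched below. It is an assumption-laden rewrite rather than the original author's code: it swaps in the ^([\w\-]+) pattern from the earlier answer so that hyphenated words such as "cat-one" are kept whole, and it returns nil when the string does not start with a word character.

```swift
import Foundation

extension String {
    // Sketch for current Swift; pattern taken from the earlier answer, without the capture group.
    func firstWord() -> String? {
        // .regularExpression makes range(of:) treat the argument as a regex.
        guard let match = self.range(of: "^[\\w\\-]+", options: .regularExpression) else {
            return nil
        }
        return String(self[match])
    }
}

print("cat-one,cat2,cat-3".firstWord() ?? "none") // cat-one
```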