Regular expression to extract href url - regex

I want to extract the links from a String with regular expressions. I found a similar post here and I tried this code
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>.*?</a>")
let range = NSMakeRange(0, text.characters.count)
let htmlLessString :String = regex.stringByReplacingMatches(in: text,
options: [],
range:range ,
withTemplate: "")
but the proposed regular expression deleted the entire anchor element, including the href URL. My string looks like
SOME stirng some text I need to keep and other text
and the expected result is
SOME stirng https://com.mywebsite.com/yfgvh/f23/fsd some text I need to keep and other text
the perfect result is
SOME stirng some text I need to keep (https://com.mywebsite.com/yfgvh/f23/fsd) and other text
Do you have an idea if it's possible to achieve this?

Of course it deletes the href content: you are calling stringByReplacingMatches with an empty string as the template.
Your sample string does not match the pattern because the closing tag </a> is missing.
The pattern "<a[^>]+href=\"(.*?)\"[^>]*>" matches up to the closing angle bracket of the opening tag, after the link.
The captured group is located at index 1 of the match. This code prints all extracted links:
let text = "<a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\">"
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>")
// NSRegularExpression works on UTF-16 offsets, so measure the string in UTF-16 units
let range = NSMakeRange(0, text.utf16.count)
let matches = regex.matches(in: text, range: range)
for match in matches {
    // Capture group 1 holds the URL
    let htmlLessString = (text as NSString).substring(with: match.range(at: 1))
    print(htmlLessString)
}

I'm not a regular Swift developer, but did you try using the withTemplate option of stringByReplacingMatches, like this?
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>")
let range = NSMakeRange(0, text.utf16.count)
// $1 is the URL, $2 is the link text
let htmlLessString: String = regex.stringByReplacingMatches(in: text,
    options: [],
    range: range,
    withTemplate: "$2 ($1)")
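To make the template idea concrete, here is a minimal sketch that produces the "perfect result" from the question. The helper name inlineLinks is hypothetical, and the sample string is an assumption: the anchor markup in the question was lost, so it is reconstructed here from the URL and rel attribute used in the other answer.

```swift
import Foundation

// Hypothetical helper: rewrites each anchor as "link text (url)".
func inlineLinks(in html: String) -> String {
    // Group 1 = href value, group 2 = text between <a ...> and </a>
    let pattern = "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return html
    }
    // Build the NSRange from the String so the UTF-16 length is correct
    let range = NSRange(html.startIndex..., in: html)
    return regex.stringByReplacingMatches(in: html, options: [], range: range, withTemplate: "$2 ($1)")
}

// Assumed input: the question's sentence with a complete anchor element
let sample = "SOME stirng <a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\">some text I need to keep</a> and other text"
print(inlineLinks(in: sample))
```

Swapping the template to "$1 $2" should give the other ordering the question mentions, with the URL first.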

This regex seems to work in this case: href="(.*)" .*">(.*)<\/a>(.*). Group 1 holds your URL, group 2 the text between <a> and </a>, and group 3 the text after </a>. NSRegularExpression only hands groups back as NSRange values (through the match's range(at:) method), so an extension like this one makes it easier to read them out as strings: http://samwize.com/2016/07/21/how-to-capture-multiple-groups-in-a-regex-with-swift/
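Groups can also be read straight from each NSTextCheckingResult with range(at:). Below is a rough sketch of that; captureGroups is a hypothetical helper (not the linked extension), and the example.com URL is placeholder data.

```swift
import Foundation

// Hypothetical helper: returns, for every match, the whole match followed by its capture groups.
func captureGroups(of pattern: String, in text: String) -> [[String]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).map { match in
        (0..<match.numberOfRanges).map { index -> String in
            // A group that did not participate has no valid range; report it as ""
            guard let r = Range(match.range(at: index), in: text) else { return "" }
            return String(text[r])
        }
    }
}

// Placeholder input for illustration
let groups = captureGroups(of: "href=\"(.*?)\"[^>]*>(.*?)</a>",
                           in: "<a href=\"https://example.com/x\" rel=\"r\">label</a> tail")
print(groups)
```

Index 0 of each inner array is the whole match, so the URL sits at index 1 and the link text at index 2.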

Related

regex help. modify regex to exclude content within curly braces

In javascript, my regex is:
let regEx = new RegExp("([A-Za-z])(?!([^<]+)?>)", "gi");
My string is:
<span class="customer-key">{aaa}</span>bbb {ccc}
In the above example, the regex matches "aaa", "bbb" and "ccc".
I would like to update my regex to EXCLUDE anything WITHIN curly braces, so that it ONLY matches "bbb".
How can I update the regex to do so? thanks!
Try this (your regex matches letters one at a time, so mine does too):
let regEx = new RegExp("([A-Za-z])(?!([^<]+)?>)(?!([^{]+)?})", "gi");
let str= '<span class="customer-key">{aaa}</span>bbb {ccc}'
let s =str.match(regEx);
console.log(s)
If you want to get bbb , one option could be to use the dom, find the textnode(s) and remove the content between curly braces:
const htmlString = `<span class="customer-key">{aaa}</span>bbb {ccc}`;
let div = document.createElement('div');
div.innerHTML = htmlString;
div.childNodes.forEach(x => {
  if (x.nodeType === Node.TEXT_NODE) {
    console.log(x.textContent.replace(/{[^}]+}/g, ''));
  }
});
Note that parsing html with a regex is not advisable.
If you want to get bbb from your example string, another option could be to match what you don't want to keep and to replace that with an empty string.
const regex = /\s*<[^>]+>\s*|\s*{[^}]+}\s*/gm;
const str = `<span class="customer-key">{aaa}</span>bbb {ccc}`;
const result = str.replace(regex, '');
console.log(result);
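For readers following the Swift threads on this page, the same "replace what you don't want" idea might look roughly like this with Foundation's regex-based string replacement. The braces are escaped as a precaution, on the assumption that the engine may treat unescaped braces as special.

```swift
import Foundation

// Remove tags and {...} groups, together with any whitespace around them.
let pattern = "\\s*<[^>]+>\\s*|\\s*\\{[^\\}]+\\}\\s*"
let input = "<span class=\"customer-key\">{aaa}</span>bbb {ccc}"
let result = input.replacingOccurrences(of: pattern, with: "", options: .regularExpression)
print(result)
```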

How to search for only whole words in a Swift String

I have this NS search expression. searchString passes in a String which I would like to search for in the baseString and highlight. However at the moment if I search for the word 'I' an 'i' in the word 'hide' for example appears highlighted.
I've seen that I can use \b to search for only whole words but I can't see where I add this into the expression. So that only whole words are highlighted.
Another example: if my baseString contains 'His story is history' and I use searchString to search for 'his', it will also highlight the 'his' in 'history'.
let regex = try! NSRegularExpression(pattern: searchString as! String,options: .caseInsensitive)
for match in regex.matches(in: baseString!, options: NSRegularExpression.MatchingOptions(), range: NSRange(location: 0, length: (baseString?.characters.count)!)) as [NSTextCheckingResult] {
attributed.addAttribute(NSBackgroundColorAttributeName, value: UIColor.yellow, range: match.range)
}
You can easily create a regex pattern from your searchString:
let baseString = "His story is history"
let searchString = "his" //This needs to be a single word
let attributed = NSMutableAttributedString(string: baseString)
//Create a regex pattern matching with word boundaries
let searchPattern = "\\b"+NSRegularExpression.escapedPattern(for: searchString)+"\\b"
let regex = try! NSRegularExpression(pattern: searchPattern, options: .caseInsensitive)
for match in regex.matches(in: baseString, range: NSRange(0..<baseString.utf16.count)) {
attributed.addAttribute(NSBackgroundColorAttributeName, value: UIColor.yellow, range: match.range)
}
Some comments:
Assuming baseString and searchString are non-Optional String in the code above, if not, make them so as soon as possible, before searching.
An empty OptionSet is represented by [], so options: NSRegularExpression.MatchingOptions() in your code can be simplified to options: []. It is also the default value for the options: parameter of the matches method, so you do not need to specify it.
NSRegularExpression takes and returns ranges based on UTF-16 representation of String. You should not use characters.count to make NSRange, use utf16.count instead.
The return type of matches(in:range:) is declared as [NSTextCheckingResult], you have no need to cast it.
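The UTF-16 point is easy to see with a string that contains an emoji. This is a small sketch with made-up sample text: the thumbs-up counts as one Character but two UTF-16 code units, so a range built from the character count stops one unit short and the last word is cut off.

```swift
import Foundation

let text = "👍 his"
let regex = try! NSRegularExpression(pattern: "\\bhis\\b", options: .caseInsensitive)

// Character count is 5, UTF-16 count is 6
let shortRange = NSRange(location: 0, length: text.count)
let fullRange = NSRange(location: 0, length: text.utf16.count)

// With the short range the regex only sees "👍 hi", so the word is missed
let missed = regex.matches(in: text, options: [], range: shortRange)
let found = regex.matches(in: text, options: [], range: fullRange)
print(missed.count, found.count)
```

NSRange(text.startIndex..., in: text) is another way to get the full UTF-16 range without counting by hand.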
Update
I thought of a better solution than my previous answer so I updated it. The original answer will follow for anyone that prefers so.
"(?<=[^A-Za-z0-9]|^)[A-Za-z0-9]+(?=[^A-Za-z0-9]|$)"
Breaking down this expression: (?<=[^A-Za-z0-9]|^) checks for a non-alphanumeric character or the start of the line (^) before the word I want to match. [A-Za-z0-9]+ matches alphanumeric characters, and the + requires at least one. (?=[^A-Za-z0-9]|$) checks for another non-alphanumeric character or the end of the line ($) after the matched word. Therefore this expression matches any whole alphanumeric word. To match only letters, simply remove 0-9 from the expression, like
"(?<=[^A-Za-z]|^)[A-Za-z]+(?=[^A-Za-z]|$)"
For usage replace the center matching expression with the word to match like:
"(?<=[^A-Za-z]|^)\(searchString)(?=[^A-Za-z]|$)"
Old Answer
I tried using this before; it finds every string delimited by whitespace. It should do what you need:
"\\s[a-zA-Z0-9]*\\s"
Change [a-zA-Z0-9]* to match what you are searching for; in your case, fit your original search string into it like
let regex = try! NSRegularExpression(pattern: "\\s\(searchString)\\s" ,options: .caseInsensitive)
As an added answer, \\s will include the whitespace before and after the word. I added a check to exclude the whitespace if it becomes more useful, the pattern is like:
"(?<=\\s)[A-Za-z0-9]*(?=\\s)"
similarly, replace [A-Za-z0-9]* which searches for all words with the search string you need.
Note, (?<=\\s) checks for whitespace before the word but does not include it, (?=\\s) checks for whitespace after, also not including it. This will work better in most scenarios compared to my original answer above since there is no extra whitespace.
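One thing to watch with whitespace lookarounds: a word at the very start or end of the string has no whitespace on that side, so it is not matched. The sketch below compares the two approaches on the question's sentence; countMatches is a hypothetical helper.

```swift
import Foundation

// Hypothetical helper: number of case-insensitive matches of `pattern` in `text`.
func countMatches(of pattern: String, in text: String) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else { return 0 }
    return regex.numberOfMatches(in: text, options: [], range: NSRange(text.startIndex..., in: text))
}

let sentence = "His story is history"
// Requires whitespace on both sides, so the leading "His" is skipped
let viaWhitespace = countMatches(of: "(?<=\\s)his(?=\\s)", in: sentence)
// Word boundaries also match at the start and end of the string
let viaBoundary = countMatches(of: "\\bhis\\b", in: sentence)
print(viaWhitespace, viaBoundary)
```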

Matching but not capture a string in Swift Regex

I'm trying to search for a single plain quote mark (') in a given String and replace it with a single curved quote mark (’). I have tested several patterns, but every time the match also includes the adjacent text. For example, in the string "I'm", it picks up the "I" and the "m" along with the ' mark.
(?:\\S)'(?:\\S)
Is there a way to achieve this, or does the Swift implementation of regex not support non-capturing groups?
EDIT:
Example
let startingString = "I'm"
let myPattern = "(?:\\S)(')(?:\\S)"
let mySubstitutionText = "’"
let result = (applyReg(startingString, pattern: myPattern, substitutionText: mySubstitutionText))
func applyReg(startingString: String, pattern: String, substitutionText: String) -> String {
var newStr = startingString
if let regex = try? NSRegularExpression(pattern: pattern, options: .CaseInsensitive) {
let regStr = regex.stringByReplacingMatchesInString(startingString, options: .WithoutAnchoringBounds, range: NSMakeRange(0, startingString.characters.count), withTemplate: startingString)
newStr = regStr
}
return newStr
}
In regex, you can use lookarounds to achieve this behavior:
let myPattern = "(?<=\\S)'(?=\\S)"
Lookarounds do not consume the text they match, they just return true or false so that the regex engine could decide what to do with the currently matched text. If the condition is met, the regex pattern is evaluated further, and if not, the match is failed.
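As a quick illustration of the lookaround pattern in a replacement, here is a minimal sketch; curlApostrophes is a hypothetical helper name. Because the lookarounds match no text themselves, only the quote mark is replaced.

```swift
import Foundation

// Hypothetical helper: replaces a straight quote that sits between two non-space characters.
func curlApostrophes(in text: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: "(?<=\\S)'(?=\\S)") else { return text }
    let range = NSRange(text.startIndex..., in: text)
    return regex.stringByReplacingMatches(in: text, options: [], range: range, withTemplate: "’")
}

print(curlApostrophes(in: "I'm"))
// Quotes at the edges have no non-space neighbour on one side and are left alone
print(curlApostrophes(in: "'quoted'"))
```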
However, using capturing seems quite valid here, do not discard that approach.
Put your quote in a capture group in itself
(?:\\S)(')(?:\\S)
For example, when matching against "I'm", the overall match is still "I'm" (the non-capturing groups are part of the match), but capture group 1 contains only the ' mark.
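If you go the capturing route for a replacement, one option is to capture the neighbours instead and put them back in the template. This is a sketch under that assumption; replaceAll is a hypothetical helper, and the pattern differs from the one above in that the neighbours, not the quote, are captured.

```swift
import Foundation

// Hypothetical helper: regex replacement over the whole string.
func replaceAll(_ pattern: String, in text: String, with template: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return text }
    let range = NSRange(text.startIndex..., in: text)
    return regex.stringByReplacingMatches(in: text, options: [], range: range, withTemplate: template)
}

// $1 and $2 restore the characters around the curved quote
let captured = replaceAll("(\\S)'(\\S)", in: "I'm", with: "$1’$2")
print(captured)

// Caveat: the captured neighbours are consumed, so the second of two close quotes is skipped
let skipped = replaceAll("(\\S)'(\\S)", in: "a'b'c", with: "$1’$2")
print(skipped)
```

The second call shows the trade-off: lookarounds do not consume the neighbours, so they would replace both quotes in "a'b'c".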

How to construct Regex pattern Swift

I am trying to construct Regex but it doesn't work. Can anyone help?
I have a string, which I want to remove the following characters:
*_-+=#:><&[]\n
I also want to remove all text between (/ and ).
The code is below:
if let regex = try? NSRegularExpression(pattern: "&[^*_-=;](\\)*;", options: .CaseInsensitive) {
let modString = regex.stringByReplacingMatchesInString(testString, options: .WithTransparentBounds, range: NSMakeRange(0, testString.characters.count), withTemplate: "")
print(modString)
}
You can use
"\\(/[^)]*\\)|[*\r\n_+=#:><&\\[\\]-]"
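The \\(/[^)]*\\) alternative matches all text between (/ and ), and [*\r\n_+=#:><&\\[\\]-] matches each of the single characters you need to remove.
Note that a literal hyphen in a character class must either be escaped (\\- inside a Swift string literal) or placed at the start or end of the class. Your regex did not work because it created an invalid range: in [^*_-=;], the sequence _-= is read as a range from _ (U+005F) to = (U+003D), and the start of that range comes after its end.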
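For reference, a minimal sketch of plugging that pattern into current Swift syntax. The strip helper and the sample input are made up for illustration.

```swift
import Foundation

// Pattern from the answer: drop "(/ ... )" blocks and the listed single characters
let pattern = "\\(/[^)]*\\)|[*\r\n_+=#:><&\\[\\]-]"

// Hypothetical helper: returns `text` with every match removed.
func strip(_ text: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return text }
    let range = NSRange(text.startIndex..., in: text)
    return regex.stringByReplacingMatches(in: text, options: [], range: range, withTemplate: "")
}

print(strip("a*b_c(/drop)d-e"))
```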

Regex Works in Perl, Doesn't as NSRegularExpression

I have this regex:
([^\ \t\r\n\[\{\(\-])?'(?(1)|(?=\s | s\b))
with the following substitution:
$1’
This seems to work in Perl: given a phrase like "King Solomon's Mines", it produces King Solomon’s Mines. As an NSRegularExpression, however, it throws an EXC_BAD_ACCESS error. This test suite suggests that the syntax is valid in PHP and Python but not JavaScript, and that the (?(1) part is the culprit.
Example Swift code:
let string = "King Solomon's Mines"
var anError: NSError? = nil
let pattern = "([^\\ \\t\\r\\n\\[\\{\\(\\-])?'(?(1)|(?=\\s | s\\b))"
let regex = NSRegularExpression(pattern: pattern, options: .CaseInsensitive, error: &anError)
let range = NSMakeRange(0, countElements(string))
let template = "$1’"
let newString = regex.stringByReplacingMatchesInString(string, options: nil, range: range, withTemplate: template)
The let regex declaration is where a Playground will get the bad access error. Do I need to modify the regex to get it working in Swift?
(Edit: forgot to put in the escapes for the backslashes. I had that in my code.)
I don't think that the conditional test (?(1)...|...) is available, but it is not really a problem since it is not needed. This pattern does the same:
let pattern = "'(?:(?<=[^\\s[{(-]')|(?=\\s | s\\b))"
Note: if it doesn't work, try to double escape the opening square bracket in the character class.
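Here is a rough sketch of that conditional-free pattern in current Swift syntax. Two things are assumptions on my part: the bracket, brace, parenthesis and hyphen inside the character class are all escaped (as in the question's own pattern), and the literal spaces around | are dropped on the guess that they were only formatting. The helper name fixApostrophes is hypothetical.

```swift
import Foundation

// Quote preceded by a character that is not whitespace or an opening bracket/hyphen,
// or followed by whitespace or a trailing "s".
let pattern = "'(?:(?<=[^\\s\\[\\{\\(\\-]')|(?=\\s|s\\b))"

// Hypothetical helper: swaps matching straight quotes for curved ones.
func fixApostrophes(in text: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else { return text }
    let range = NSRange(text.startIndex..., in: text)
    return regex.stringByReplacingMatches(in: text, options: [], range: range, withTemplate: "’")
}

print(fixApostrophes(in: "King Solomon's Mines"))
```

Since the quote itself is the whole match, the template no longer needs $1.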