How can I replace the last word using Regex? - regex

I have a String extension:
func replaceLastWordWithUsername(_ username: String) -> String {
    let pattern = "#*[A-Za-z0-9]*$"
    do {
        Log.info("Replacing", self, username)
        let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpression.Options.caseInsensitive)
        // NSRange is measured in UTF-16 code units, so use utf16.count rather than the character count
        let range = NSMakeRange(0, self.utf16.count)
        return regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: username)
    } catch {
        return self
    }
}
let oldString = "Hey jess"
let newString = oldString.replaceLastWordWithUsername("#jessica")
newString now equals Hey #jessica #jessica. The expected result should be Hey #jessica

I think it's because the * quantifier matches zero or more times, as many times as possible.
This lets the pattern also match the empty string at the very end of the input, in addition to the word at the end, resulting in two replacements.
As mentioned by @Code Different, if you use let pattern = "\\w+$" instead, it will only match if there is at least one character, eliminating the empty match.
"Word1 Word2"
       ^ some characters and then end
            ^ 0 characters and then end
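The two matches can be made visible with a short sketch. It is written in JavaScript rather than Swift, on the assumption that both engines treat a trailing zero-length match the same way for this pattern; allMatches is a hypothetical helper, not part of the original code.

```javascript
// Hypothetical helper: list every match a global regex finds in a string.
function allMatches(regex, text) {
  return Array.from(text.matchAll(regex), (m) => m[0]);
}

const input = "Hey jess";

// `*` allows a zero-length match, so the engine finds "jess" and then
// an empty match at the very end of the string: two matches in total.
const starMatches = allMatches(/#*[A-Za-z0-9]*$/g, input);
console.log(starMatches); // [ 'jess', '' ]

// `+` requires at least one character, so only the last word matches.
const plusMatches = allMatches(/\w+$/g, input);
console.log(plusMatches); // [ 'jess' ]

// With a single match there is a single replacement.
const replaced = input.replace(/\w+$/, "#jessica");
console.log(replaced); // Hey #jessica
```

Each match is replaced once, which is why the original pattern inserts the username twice.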

Use this regex:
(?<=\s)\S+$
Sample: https://regex101.com/r/kGnQEM/1
/(?<=\s)\S+$/g
Positive Lookbehind (?<=\s)
Assert that the Regex below matches
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\S+ matches any non-whitespace character (equal to [^\r\n\t\f\v ])
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
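A minimal sketch of this pattern in JavaScript (chosen for illustration; the question itself uses Swift). replaceLastWord is a hypothetical name. Because the lookbehind requires whitespace before the word, a string consisting of a single word is left unchanged.

```javascript
// Replace the last whitespace-preceded word of `text` with `replacement`.
// A function is passed to replace() so `replacement` is inserted literally.
function replaceLastWord(text, replacement) {
  return text.replace(/(?<=\s)\S+$/, () => replacement);
}

console.log(replaceLastWord("Hey jess", "#jessica")); // Hey #jessica
console.log(replaceLastWord("jess", "#jessica"));     // jess (no whitespace before the word, so no match)
```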

Just change your pattern:
let pattern = "\\w+$"
\w matches any word character, i.e. letters, digits, and the underscore ([A-Za-z0-9_] for ASCII text)
+ means one or more

Related

match everything but a given string and do not match single characters from that string

Let's start with the following input.
Input = 'blue, blueblue, b l u e'
I want to match everything that is not the string 'blue'. Note that blueblue should not match, but single characters should (even if present in match string).
From this, if I replace the matches with an empty string, it should return:
Result = 'blueblueblue'
I have tried with [^\bblue\b]+
but this matches the last four single characters 'b', 'l','u','e'
Another solution:
(?<=blue)(?:(?!blue).)+(?=blue|$)|^(?:(?!blue).)+(?=blue|$)
Regex demo
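As a rough check of the lookbehind-based pattern above, here is a sketch in JavaScript, assuming an engine with lookbehind support (for example a recent Node.js). The variable names are illustrative only.

```javascript
// Matches every run of text that contains no "blue" and sits either at the
// start of the string or directly after a "blue".
const notBlue = /(?<=blue)(?:(?!blue).)+(?=blue|$)|^(?:(?!blue).)+(?=blue|$)/g;

const input = "blue, blueblue, b l u e";
const output = input.replace(notBlue, "");
console.log(output); // blueblueblue

// Text before the first "blue" is handled by the second alternative.
const output2 = "red blue green".replace(notBlue, "");
console.log(output2); // blue
```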
If your regex engine supports \K (which resets the start of the reported match), then we can try:
/blue\K|.*?(?=blue|$)/gm
This pattern says to match:
blue match "blue"
\K but then forget that match
| OR
.*? match anything else until reaching
(?=blue|$) the next "blue" or the end of the string
Edit:
In JavaScript, we can try the following replacement:
var input = "blue, blueblue, b l u e";
var output = input.replace(/blue|.*?(?=blue|$)/g, (x) => x != "blue" ? "" : "blue");
console.log(output);

How to match in a single/common Regex Group matching or based on a condition

I would like to extract two different test strings /i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
and
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8
with a single RegEx and in Group-1.
By using this RegEx ^.[i,na,fm,d]+\/(.+([,\/])?(\/|.+=.+,\/).+\/[,](live.([^,]).).+_)?.+(640).*$ I can get the second string to match the desired result int/2021/11/25/,live_20211125_215206_
but the first string does not match in Group-1. The expected extraction for the first test string is int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45
Any pointers on this is appreciated.
Thanks!
If you want both values in group 1, you can use:
^/(?:[id]|na|fm)/([^/\s]*/\d{4}/\d{2}/\d{2}/\S*?)(?:/,|[^_]+_)640(?:\D|$)
The pattern matches:
^ Start of string
/ Match literally
(?:[id]|na|fm) Match one of i d na fm
/ Match literally
( Capture group 1
[^/\s]*/ Match any char except a / or a whitespace char, then match /
\d{4}/\d{2}/\d{2}/ Match a date like pattern
\S*? Match optional non whitespace chars, as few as possible
) Close group 1
(?:/,|[^_]+_) Match either /, or 1+ chars other than _ and then match _
640 Match literally
(?:\D|$) Match either a non-digit or assert the end of the string
See a regex demo and a go demo.
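The pattern contains no lookarounds, so it can also be tried outside Go. Below is a sketch in JavaScript; the two sample paths are abridged versions of the strings in the question (an assumption made to keep the example short), and extractGroup1 is a hypothetical helper.

```javascript
// String.raw keeps the backslashes intact when building the pattern.
const pattern = new RegExp(
  String.raw`^/(?:[id]|na|fm)/([^/\s]*/\d{4}/\d{2}/\d{2}/\S*?)(?:/,|[^_]+_)640(?:\D|$)`
);

// Return capture group 1, or null when the path does not match.
function extractGroup1(path) {
  const match = pattern.exec(path);
  return match ? match[1] : null;
}

// Abridged sample inputs (shortened from the question's strings).
const first = "/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,.mp4.csmil/master.m3u8";
const second = "/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,.mp4.csmil/master.m3u8";

console.log(extractGroup1(first));  // int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45
console.log(extractGroup1(second)); // int/2021/11/25/,live_20211125_215206_
```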
We can't know all the rules for how the strings you are matching are constructed, but for just the two example strings provided:
package main
import (
	"fmt"
	"regexp"
)
func main() {
	var re = regexp.MustCompile(`(?m)(\/i/int/\d{4}/\d{2}/\d{2}/.*)(?:\/,|_[\w_]+)640`)
	var str = `
/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8`
	match := re.FindAllStringSubmatch(str, -1)
	for _, val := range match {
		fmt.Println(val[1])
	}
}

Why does the regex [a-zA-Z]{5} return true for non-matching string?

I defined a regular expression to check if the string only contains alphabetic characters and with length 5:
use regex::Regex;
fn main() {
    let re = Regex::new("[a-zA-Z]{5}").unwrap();
    println!("{}", re.is_match("this-shouldn't-return-true#"));
}
The text I use contains many illegal characters and is longer than 5 characters, so why does this return true?
You have to put it inside ^...$ to match the whole string and not just parts:
use regex::Regex;
fn main() {
    let re = Regex::new("^[a-zA-Z]{5}$").unwrap();
    println!("{}", re.is_match("this-shouldn't-return-true#"));
}
As explained in the docs:
Notice the use of the ^ and $ anchors. In this crate, every expression is executed with an implicit .*? at the beginning and end, which allows it to match anywhere in the text. Anchors can be used to ensure that the full text matches an expression.
Your pattern returns true because it matches any 5 consecutive alphabetic characters anywhere in the input; in your case it can match inside both 'shouldn't' and 'return'.
Change your regex to: ^[a-zA-Z]{5}$
^ start of string
[a-zA-Z]{5} matches 5 alpha chars
$ end of string
This will match a string only if the string has a length of 5 chars and all of the chars from start to end fall in range a-z and A-Z.
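The same effect can be seen in other engines whose match test searches anywhere in the input. Here is a minimal sketch in JavaScript (not Rust), where RegExp.prototype.test behaves like an unanchored search:

```javascript
const text = "this-shouldn't-return-true#";

const unanchored = /[a-zA-Z]{5}/;   // finds 5 letters anywhere in the input
const anchored = /^[a-zA-Z]{5}$/;   // the whole input must be exactly 5 letters

console.log(unanchored.test(text));     // true
console.log(unanchored.exec(text)[0]);  // shoul (the first run of 5 letters)
console.log(anchored.test(text));       // false
console.log(anchored.test("hello"));    // true
```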

Regex Express Return All Chars before a '/' but if there are 2 '/' Return all before that

I have been trying to write a regex that returns the following in these situations.
XX -> XX
XXX -> XXX
XX/XX -> XX
XX/XX/XX -> XX/XX
XXX/XXX/XX -> XXX/XXX
I tried the following regexes, but they do not work.
^[^/]+ => https://regex101.com/r/xvCbNB/1
=========
([A-Z])\w+ => https://regex101.com/r/xvCbNB/2
They are close but are not there.
Any Help would be appreciated.
You want to get all text from the start till the last occurrence of a specific character or till the end of string if the character is missing.
Use
^(?:.*(?=\/)|.+)
See the regex demo and the regex graph:
Details
^ - start of string
(?:.*(?=\/)|.+) - a non-capturing group that matches either of the two alternatives; if the first one matches, the second won't be tried:
.*(?=\/) - any 0+ chars other than line break chars, as many as possible, up to but excluding the last /
| - or
.+ - any 1+ chars other than line break chars, as many as possible.
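A quick way to check the pattern against the five cases from the question is the sketch below, written in JavaScript as an assumption about the target language; beforeLastSlash is a hypothetical helper name.

```javascript
// Return everything before the last "/", or the whole string if there is none.
function beforeLastSlash(text) {
  const match = text.match(/^(?:.*(?=\/)|.+)/);
  return match ? match[0] : "";
}

for (const sample of ["XX", "XXX", "XX/XX", "XX/XX/XX", "XXX/XXX/XX"]) {
  console.log(`${sample} -> ${beforeLastSlash(sample)}`);
}
// XX -> XX
// XXX -> XXX
// XX/XX -> XX
// XX/XX/XX -> XX/XX
// XXX/XXX/XX -> XXX/XXX
```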
It will be easier to use a replace here to match / followed by non-slash characters before end of line:
Search regex:
/[^/]*$
Replacement String:
""
If you're looking for a regex match then use this regex:
^(.*?)(?:/[^/]*)?$
Any special reason it has to be a regular expression? How about just splitting the string at the slashes, remove the last item and rejoin:
function removeItemAfterLastSlash(string) {
  const list = string.split(/\//);
  // No slash at all: return the input unchanged
  if (list.length === 1) {
    return string;
  }
  list.pop();
  return list.join("/");
}
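Or look for the last slash and remove everything from it onward:
function removeItemAfterLastSlash(string) {
  const index = string.lastIndexOf("/");
  if (index === -1) {
    return string;
  }
  // Strings have slice(), not splice(); keep everything before the last slash
  return string.slice(0, index);
}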

Regex catch word at the start and end of a UITextView

I'm trying to catch when a word is used in a UITextView. I've got it working for words in the interior of the view.
The problem is when the word is first or last in the view.
My code so far:
private func filteredTermFor(_ word: String) -> String {
    let punctuationFilter = "([\\A|\\W|\\d|\\z| ])"
    let wordInParens = "(\(word))"
    return punctuationFilter + wordInParens + punctuationFilter
}
I checked and found I should use ^ for the start of input and $ for the end of input. When I add either of these, for example:
"([^|\\A|\\W|\\d|\\z| ])"
they don't seem to have any effect when the word in question is the first or last in the view.
*For the sake of being verbose with my question, the return value from the function above is being used as searchTerm in this:
func highlightedTextInString(with searchTerm: String, targetString: String) -> NSAttributedString? {
    let attributedString = NSMutableAttributedString(string: targetString)
    do {
        let regex = try NSRegularExpression(pattern: searchTerm, options: .caseInsensitive)
        let range = NSRange(location: 0, length: targetString.utf16.count)
        for match in regex.matches(in: targetString, options: .withTransparentBounds, range: range) {
            let fontColor = UIColor.red
            attributedString.addAttribute(NSForegroundColorAttributeName, value: fontColor, range: match.range)
        }
        return attributedString
    } catch _ {
        print("Error creating regular expression")
        return nil
    }
}
** Edit **
Since this was marked as a duplicate:
The question this was reported as a duplicate of does not cover edge cases where the word is typed next to a punctuation mark or digit without spaces.
For example:
.word , word9 , ?word?
Note that ([^|\\A|\\W|\\d|\\z| ]) is a capturing group ((...)) containing a character class that matches a single char defined inside it. The ^ after [ makes the class a negated one, and it matches any char but the one(s) defined in the set. So, [^|\\A|\\W|\\d|\\z| ] matches a single char other than | (it is no longer an alternation operator inside a character class), A (the \ in front is not considered, is omitted), a non-word char, a digit, z and space. It effectively matches _ and any letters other than A and z.
You state that the words you need to match may occur within word boundaries or digits.
You may use
return "(?<![^\\W\\d])(\(word))(?![^\\W\\d])"
See the regex demo.
Here, "(?<![^\\W\\d])" is a negative lookbehind that matches a location that is NOT immediately preceded by a character other than a non-word char and a digit. This sounds cumbersome, but the main point here is that [^\W\d] matches the same text as \w excluding digits (\w matches letters, digits, and _). So, "(?<![^\\W\\d])" makes sure there is a start of string or a non-letter and non-_ char right before the word. If you want to allow a word to match after _, just use (?<!\\p{L}) (where \p{L} matches any Unicode letter).
The "(?![^\\W\\d])" is a negative lookahead that makes sure there is an end of string or a non-letter and non-_ char (there can be punctuation, symbols and digits) immediately to the right of the word. Again, if you want to match a word when it is followed by _, you may replace this lookahead with "(?!\\p{L})" (just no letter after the word is allowed).
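The behaviour of the two lookarounds can be sketched in JavaScript (an illustration only; the answer targets NSRegularExpression, and JavaScript's \w is ASCII-only, so non-ASCII letters may behave differently). escapeRegExp, wordPattern and countMatches are hypothetical helpers.

```javascript
// Escape regex metacharacters so the search word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the pattern: no letter or underscore directly before or after the word.
function wordPattern(word) {
  return new RegExp(`(?<![^\\W\\d])(${escapeRegExp(word)})(?![^\\W\\d])`, "gi");
}

function countMatches(word, text) {
  return Array.from(text.matchAll(wordPattern(word))).length;
}

console.log(countMatches("word", ".word , word9 , ?word?"));                 // 3
console.log(countMatches("word", "Word at the start and at the end: word")); // 2
console.log(countMatches("word", "swordfish, wordy, _word"));                // 0
```

The second call shows that the word is found at both the very start and the very end of the text, which was the case the original character-class approach missed.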