RegEx capture specific string - regex

Hi everyone, I have a problem. I'm trying to use regex to get all lines that do not end with
_0.jpg, _s.jpg, _m.jpg or _l.jpg
Example Lines:
9Uikt/ifehr54mg__0.jpg
9Uikt/idg4hdmg2_s.jpg
9Uikt/igdffgggfmg4_m.jpg
9Uikt/img3teg3gegg7_l.jpg
9Uikt/imgerhw45h70.jpg
9Uikt/imggq4ge37s.jpg
9Uikt/img3f37m.jpg
9Uikt/img34g3f7l.jpg
9Uikt/imgf3f34t4t73l_.jpg
9Uikt/imgf3f34t4t73l_2.jpg
The bold ones I am trying to get.
Between 9Uikt/ and .jpg any character can occur, except the characters that are not allowed in file names
"*:<>?/\|
I have tried this code
.*(_(?![0-9][a-zA-Z])).*\.jpg

You can use
^(?!.*_[0sml]\.jpg$).+\.jpg$
See the regex demo
Details
^ - start of string anchor
(?!.*_[0sml]\.jpg$) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
.* - any zero or more chars other than line break chars, as many as possible
_ - an underscore
[0sml] - a 0, s, m, l char
\. - a dot
jpg - jpg string
$ - end of string anchor
.+\.jpg$ - any one or more chars other than line break chars, as many as possible, .jpg string and end of string.
Or, if you can afford a lookbehind:
.*(?<!_[0sml])\.jpg$
See this regex demo. Details:
.* - any zero or more chars other than line break chars, as many as possible
(?<!_[0sml]) - no _ and 0, s, m or l char immediately on the left is allowed
\.jpg$ - .jpg at the end of string.
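As a quick check, both patterns can be run against the sample names from the question. This is only a sketch: it assumes the example is one file name per line, split at each .jpg, and that the runtime supports lookbehind (recent Node.js and browsers do). The variable names are made up for illustration.

```javascript
// Sample names from the question, assumed to be one per line.
const lines = [
  "9Uikt/ifehr54mg__0.jpg",
  "9Uikt/idg4hdmg2_s.jpg",
  "9Uikt/igdffgggfmg4_m.jpg",
  "9Uikt/img3teg3gegg7_l.jpg",
  "9Uikt/imgerhw45h70.jpg",
  "9Uikt/imggq4ge37s.jpg",
  "9Uikt/img3f37m.jpg",
  "9Uikt/img34g3f7l.jpg",
  "9Uikt/imgf3f34t4t73l_.jpg",
  "9Uikt/imgf3f34t4t73l_2.jpg",
];

// Negative lookahead variant: reject names ending in _0/_s/_m/_l + .jpg.
const lookahead = /^(?!.*_[0sml]\.jpg$).+\.jpg$/;
// Negative lookbehind variant: .jpg must not be preceded by _0/_s/_m/_l.
const lookbehind = /.*(?<!_[0sml])\.jpg$/;

const keptByLookahead = lines.filter((line) => lookahead.test(line));
const keptByLookbehind = lines.filter((line) => lookbehind.test(line));

// Both variants keep the last six names and drop the first four.
console.log(keptByLookahead);
console.log(keptByLookbehind);
```

Note that neither regex carries the g flag here: calling test() repeatedly on a global regex advances lastIndex and can give surprising results.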

Your problem statement is perhaps a little light on what other combinations you might want to reject. From what I can see you don't want to match '7_' in which case the negative lookahead is the thing to do
A Perl regex might look like
if(m/img(?!7_).*jpg/) {print}
Note that sed and awk don't support negative lookahead, but you did not say how you were executing the regex.

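You can simply use this regex, which I think is simpler to read and understand. Note that it matches the names ending in _ plus one word character before .jpg, so the lines you want are the ones reported as not matched.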
\w+\/?.*?_\w\.jpg
const testCases = [
  "9Uikt/img7_0.jpg",
  "9Uikt/img7_s.jpg",
  "9Uikt/img7_m.jpg",
  "9Uikt/img7_l.jpg",
  "9Uikt/img70.jpg",
  "9Uikt/img7s.jpg",
  "9Uikt/img7m.jpg",
  "9Uikt/img7l.jpg",
  "9Uikt/img7l_.jpg"
];
const re = /\w+\/?.*?_\w\.jpg/gi;
testCases.forEach(tc => {
  if (tc.match(re)) {
    console.log("Matched: " + tc);
  } else {
    console.log("Not matched: " + tc);
  }
});

Related

How do I make this regular expression not match anything after forward slash /

I have this regular expression:
/^www\.example\.(com|co(\.(in|uk))?|net|us|me)\/?(.*)?[^\/]$/g
It matches:
www.example.com/example1/something
But doesn't match
www.example.com/example1/something/
But the problem is that it also matches the following, which I do not want:
www.example.com/example1/something/otherstuff
I just want it to stop when a slash is encountered after "something". If there is no slash after "something", it should continue matching any character, except line breaks.
I am new to regex, so I get confused easily by those characters.
You may use this regex:
^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)(?:\/[^\/]+){2}$
RegEx Demo
This will match following URL:
www.example.co.uk/example1/something
You can use
^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)\/([^\/]+)\/([^\/]+)$
See the regex demo
The (.*)? part in your pattern matches any zero or more chars, so it won't stop even after encountering two slashes. The \/([^\/]+)\/([^\/]+) part in the new pattern will match two parts after slash, and capture each part into a separate group (in case you need to access those values).
Details:
^ - start of string
www\.example\. - www.example. string
(?:com|co(?:\.(?:in|uk))?|net|us|me) - com, co.in, co.uk, co, net, us, me strings
\/ - a / char
([^\/]+) - Group 1: one or more chars other than /
\/ - a / char
([^\/]+) - Group 2: one or more chars other than /
$ - end of string.
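A small sketch of how the two capture groups could be read in JavaScript, using the URLs from the question. The helper name splitUrl is hypothetical, not part of any library.

```javascript
// Pattern from the answer above: host, then exactly two slash-separated parts.
const urlPattern =
  /^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)\/([^\/]+)\/([^\/]+)$/;

// Hypothetical helper: returns the two path parts, or null if the URL is rejected.
function splitUrl(url) {
  const m = urlPattern.exec(url);
  return m ? [m[1], m[2]] : null;
}

const urls = [
  "www.example.com/example1/something",
  "www.example.com/example1/something/",
  "www.example.com/example1/something/otherstuff",
  "www.example.co.uk/example1/something",
];

for (const url of urls) {
  console.log(url, "->", splitUrl(url));
}
```

Only the first and last URLs should yield ["example1", "something"]; the trailing slash and the extra path segment are both rejected because [^\/]+ cannot cross a slash and $ must follow the second part.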

Regex With Conditional - Not Desired Output

Was actually glossing over a question and found myself struggling to perform something really simple.
If a string contains % I want to use a particular regex, else I want to use a different one.
I tried the following: https://regex101.com/r/UvFZpo/1/
Regex: (%)(?(1)[^$]+|[^%]+).
Test string: abc%
But I'm not getting the expected results.
I was expecting to see abc% matched as it contains %.
If the string was, abc$, I'd expect it to use the second expression.
Where am I going wrong?
Regex parses strings from left to right, position by position.
Once your pattern matches %, its index is at the end of string, hence, it fails since there are no more chars to be matched by the subsequent [^$]+ pattern.
You can use a mere alternation here:
^(?:([^$]*%[^$]*)|([^%]+))$
See the regex demo
If the string contains %, the Group 1 will be populated, else, Group 2 will.
Details
^ - start of string
(?:([^$]*%[^$]*)|([^%]+)) - either of the two alternatives:
([^$]*%[^$]*) - Group 1: any 0+ chars other than $, as many as possible, then %, then any 0+ chars other than $, as many as possible
| - or
([^%]+) - Group 2: any 1+ chars other than %, as many as possible
$ - end of string.
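JavaScript has no conditional construct like (?(1)...), so the alternation is the natural form there anyway. A minimal sketch of checking which group was populated; the function name classify and its return labels are made up for illustration.

```javascript
// Alternation from the answer above. Inside a character class, $ is a literal.
const pattern = /^(?:([^$]*%[^$]*)|([^%]+))$/;

// Hypothetical helper: reports which alternative matched the whole string.
function classify(s) {
  const m = pattern.exec(s);
  if (!m) return "no match";
  // A group that did not participate in the match is undefined in JavaScript.
  return m[1] !== undefined ? "group 1" : "group 2";
}

console.log(classify("abc%")); // contains %, no $
console.log(classify("abc$")); // no %
console.log(classify("a%b$")); // contains both % and $
```

The third call is worth noting: a string containing both % and $ is rejected by both alternatives.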

Kotlin / Regex - Replace a group of pattern with a repeating character

I would like to mask the email passed in the maskEmail function. I'm currently facing a problem wherein the asterisk * is not repeating when I'm replacing groups 2 and 4 of my pattern.
Here is my code:
fun maskEmail(email: String): String {
    return email.replace(Regex("(\\w)(\\w*)\\.(\\w)(\\w*)(#.*\\..*)$"), "$1*.$3*$5")
}
Here is the input:
tom.cat#email.com
cutie.pie#email.com
captain.america#email.com
Here is the current output of that code:
t*.c*#email.com
c*.p*#email.com
c*.a*#email.com
Expected output:
t**.c**#email.com
c****.p**#email.com
c******.a******#email.com
Edit:
I know this could be done easily with for loop but I would need this to be done in regex. Thank you in advance.
For your problem, you need to match each character in the email address that is not the first character in a word and occurs before the #. You can do that with a negative lookbehind for a word break and a positive lookahead for the # symbol:
(?<!\b)\w(?=.*?#)
The matched characters can then be replaced with *.
Note we use a lazy quantifier (?) on the .* to improve efficiency.
Demo on regex101
Note also, as pointed out by @CarySwoveland, you can replace (?<!\b) with \B, i.e.
\B\w(?=.*?#)
Demo on regex101
As pointed out by @Thefourthbird, this can be made more efficient still by replacing the .*? with [^\r\n#]*, i.e.
\B\w(?=[^\r\n#]*#)
Demo on regex101
Or, if you're only matching single strings, just [^#]*:
\B\w(?=[^#]*#)
Demo on regex101
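For comparison, here is a sketch of the same replacement in JavaScript using the last pattern. It keeps the # separator exactly as it appears in the sample input, and assumes each call receives a single address.

```javascript
// Replace every word character that is not at the start of a word
// and that still has a # somewhere to its right.
function maskEmail(email) {
  return email.replace(/\B\w(?=[^#]*#)/g, "*");
}

const samples = [
  "tom.cat#email.com",
  "cutie.pie#email.com",
  "captain.america#email.com",
];

for (const email of samples) {
  console.log(email + ": " + maskEmail(email));
}
```

The g flag matters here: without it, replace() would only mask the first matching character.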
I suggest keeping the first char of the string and each dot together with the char after it, and replacing with * every other char that is followed by a # somewhere later in the string:
((?:\.|^).)?.(?=.*#)
Replace with $1*. See the regex demo. This will handle emails that happen to contain chars other than just word (letter/digit/underscore) and . chars.
Details
((?:\.|^).)? - an optional capturing group matching a dot or start of string position and then any char other than a line break char
. - any char other than a line break char...
(?=.*#) - if followed with any 0 or more chars other than line break chars as many as possible and then #.
Kotlin code (with a raw string literal used to define the regex pattern so as not to have to double escape the backslash):
fun maskEmail(email: String): String {
    return email.replace(Regex("""((?:\.|^).)?.(?=.*#)"""), "$1*")
}
See a Kotlin test online:
val emails = arrayOf<String>("captain.am-e-r-ica#email.com","my-cutie.pie+here#email.com","tom.cat#email.com","cutie.pie#email.com","captain.america#email.com")
for(email in emails) {
    val masked = maskEmail(email)
    println("${email}: ${masked}")
}
Output:
captain.am-e-r-ica#email.com: c******.a*********#email.com
my-cutie.pie+here#email.com: m*******.p*******#email.com
tom.cat#email.com: t**.c**#email.com
cutie.pie#email.com: c****.p**#email.com
captain.america#email.com: c******.a******#email.com

A regular expression for matching a group followed by a specific character

So I need to match the following:
1.2.
3.4.5.
5.6.7.10
((\d+)\.(\d+)\.((\d+)\.)*) works fine for the very first line, but the problem is that there can be one line or several.
\n only appears if there is more than one line.
As a string, I get it like this: "1.2.\n3.4.5.\n1.2."
So my issue is: if there is only one line, \n must not be at the end, but if there is more than one line, \n must be at the end of each line except the very last.
Here is the pattern I suggest:
^\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*$
Demo
Here is a brief explanation of the pattern:
^ from the start of the string
\d+ match a number
(?:\.\d+)* followed by dot, and another number, zero or more times
\.? followed by an optional trailing dot
(?:\n followed by a newline
\d+(?:\.\d+)*\.?)* and another path sequence, zero or more times
$ end of the string
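A minimal sketch of using this pattern as a whole-string validator in JavaScript. The function name isValidList is hypothetical. It relies on JavaScript's behaviour that, without the m flag, $ only matches at the very end of the input, so a trailing \n is rejected.

```javascript
// Pattern from the answer above: one dotted number sequence per line,
// lines separated by a single \n, no newline after the last line.
const listPattern = /^\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*$/;

// Hypothetical helper wrapping the test.
function isValidList(text) {
  return listPattern.test(text);
}

console.log(isValidList("1.2."));                // single line
console.log(isValidList("1.2.\n3.4.5.\n1.2."));  // several lines
console.log(isValidList("5.6.7.10"));            // trailing dot is optional
console.log(isValidList("1.2.\n"));              // newline after the last line
console.log(isValidList("1.2.\n\n3.4."));        // empty line in between
```

The first three calls should return true and the last two false. In engines where $ also matches before a final newline (Perl, Python), the fourth case would need \z or an equivalent end anchor instead.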
You might check if there is a newline at the end using a positive lookahead (?=.*\n):
(?=.*\n)(\d+)\.(\d+)\.((\d+)\.)*
See a regex demo
Edit
You could use an alternation to either match when on the next line there is the same pattern following, or match the pattern when not followed by a newline.
^(?:\d+\.\d+\.(?:\d+\.)*(?=.*\n\d+\.\d+\.)|\d+\.\d+\.(?:\d+\.)*(?!.*\n))
Regex demo
^ Start of string
(?: Non capturing group
\d+\.\d+\. Match 1+ digits and a dot, two times
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?=.*\n\d+\.\d+\.) Positive lookahead, assert what follows is a newline starting with the pattern
| Or
\d+\.\d+\. Match 1+ digits and a dot, two times
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?!.*\n) Negative lookahead, assert what follows is not a newline
) Close non capturing group
(\d+\.*)+\n* will match the text you provided. If you need to make sure the final line also ends with a . then (\d+\.)+\n* will work.
Most programming languages offer the m flag, which is the multiline modifier. Enabling it lets $ match at the end of each line as well as at the end of the string.
The solution below only appends the $ to your current regex and sets the m flag. This may vary depending on your programming language.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
    regex = /((\d+)\.(\d+)\.((\d+)\.)*)$/gm,
    match;
while (match = regex.exec(text)) {
  console.log(match);
}
You could simplify the regex to /(\d+\.){2,}$/gm, then split the full match based on the dot character to get all the different numbers. I've given a JavaScript example below, but getting a substring and splitting a string are pretty basic operations in most languages.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
    regex = /(\d+\.){2,}$/gm;
/* Slice is used to drop the dot at the end, otherwise resulting in
 * an empty string on split.
 *
 * "1.2.3.".split(".") //=> ["1", "2", "3", ""]
 * "1.2.3.".slice(0, -1) //=> "1.2.3"
 * "1.2.3".split(".") //=> ["1", "2", "3"]
 */
console.log(
  text.match(regex)
      .map(match => match.slice(0, -1).split("."))
);
For more info about regex flags/modifiers have a look at: Regular Expression Reference: Mode Modifiers

How do I match from the last occurring set of characters in an OR statement?

I have this string I want to test against:
<<Hello>>
<<I am Going->To>>
expected matches:
Hello
To
and I'm using this pattern:
(?<=->|<<)(?:.+)(?=\>{2}|->)
What I want is that it matches a string after -> and ending before >>. Unless the -> doesn't exist, then I want to match it with << instead.
But the or statement I have written - (?<=->|<<) - starts matching << immediately. I hoped it would look in order through the entire string but unfortunately it looks at both alternatives at the same time, which does make more sense.
How would I approach this?
Try Regex: (?!.*->)(?<=->|<<)(?:.+)(?=>>)
Demo
You might use:
(?<=<<)(?:(?!->).)+(?=>>)|(?<=->).*?(?=>>)
Regex demo
Explanation
(?<=<<) Positive lookbehind, assert what is directly on the left is <<
(?:(?!->).)+ Match any char 1+ times, as long as -> does not start at that char
(?=>>) Positive lookahead, assert what is directly on the right is >>
| Or
(?<=->) Positive lookbehind, assert what is directly on the left is ->
.*? Match any char non greedy
(?=>>) Positive lookahead, assert what is directly on the right is >>
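A short sketch of running this alternation over the test string from the question in JavaScript. It assumes a runtime with lookbehind support (recent Node.js and browsers); the function name extract is made up for illustration.

```javascript
// Alternation from the answer above:
// either the text between << and >> when it contains no ->,
// or the text between -> and >>.
const pattern = /(?<=<<)(?:(?!->).)+(?=>>)|(?<=->).*?(?=>>)/g;

// Hypothetical helper: returns all matches as an array (empty if none).
function extract(text) {
  return text.match(pattern) || [];
}

const input = "<<Hello>>\n<<I am Going->To>>";
console.log(extract(input));
```

For the second line, the first alternative fails because the tempered dot stops at -> and no >> follows there, so the engine moves on until the lookbehind for -> succeeds.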
You may use
.*(?:->|<<)(.+)[>-]>
and get Group 1 value. See the regex demo.
Details
.* - match any 0+ chars other than line break chars, as many as possible
(?:->|<<) - match either -> or <<
(.+) - from the current location, capture into Group 1 any 1+ chars other than line break chars, as many as possible
[>-]> - match > or - followed with >.
C#:
var result = Regex.Match(s, @".*(?:->|<<)(.+)[>-]>")?.Groups[1].Value;