Regex to capture most of alternating pattern - regex

I have the following file names, which should all be matched by a regex:
6505208533_95d2834be5_b#2x.jpg
6505208533_95d2834be5_b~ipad.jpg
6505208533_95d2834be5_b~ipad#2x.jpg
6505218557_8407260688_b#2x.png
6505218557_8407260688_b~ipad.png
6505218557_8407260688_b~ipad#2x.png
6505237749_b71c648be2_b#2x.jpg
6505237749_b71c648be2_b~ipad.jpg
6505237749_b71c648be2_b~ipad#2x.jpg
The following regex should capture all file name suffixes: ~ipad#2x, #2x and ~ipad.
(.+)(#2x|~ipad|~ipad#2x)\.(?:jpg|png)
However, it does NOT capture ~ipad#2x. How can I solve this?

You should use the lazy quantifier after .+:
(.+?)
instead of:
(.+)
Otherwise .+ is greedy: it grabs as much of the name as it can, including "~ipad", and leaves only "#2x" for the suffix group.

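It is also clearer to list the longest suffix first. Alternation tries its branches from left to right, so with "~ipad#2x" last the engine first matches "~ipad", fails on the following \., and only then backtracks to the combined suffix. Reordering alone is not enough, though: with a greedy (.+) the prefix still swallows "~ipad" and only "#2x" is captured, so keep the lazy quantifier as well: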
(.+?)(~ipad#2x|#2x|~ipad)\.(?:jpg|png)
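A minimal sketch of the difference, using Python's re module (the helper name suffix is made up for illustration). It compares the question's greedy pattern with the lazy, longest-first pattern on three of the file names from the question:

```python
import re

# The question's original pattern: greedy prefix, combined suffix listed last.
GREEDY = re.compile(r"(.+)(#2x|~ipad|~ipad#2x)\.(?:jpg|png)")
# Lazy prefix with the longest suffix listed first.
LAZY = re.compile(r"(.+?)(~ipad#2x|#2x|~ipad)\.(?:jpg|png)")

def suffix(pattern, filename):
    """Return the captured suffix group, or None if the name does not match."""
    m = pattern.fullmatch(filename)
    return m.group(2) if m else None

names = [
    "6505208533_95d2834be5_b#2x.jpg",
    "6505208533_95d2834be5_b~ipad.jpg",
    "6505208533_95d2834be5_b~ipad#2x.jpg",
]
for name in names:
    # Show what each pattern captures as the suffix.
    print(name, suffix(GREEDY, name), suffix(LAZY, name))
```

Only the last name differs: the greedy pattern reports "#2x", the lazy one reports "~ipad#2x".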

Related

Regex to match path containing one of two strings

I need a regex to match one of two strings in the third path segment, i.e. in pseudocode:
/content/au/(boomer or millenial)/...
Example matches
/content/au/boomer
/content/au/boomer/male/31
/content/au/millenial/female/29/M
/content/au/millenial/male/18/UM
Example non-matches
/content/au
/content/nz/millenial/male/18/UM
/content/au/genz/male
I've tried this, but to no avail:
^/content/au/(?![^/]*/(?:millenial|boomer))([^/]*)
Don't use a look ahead; just use the plain alternation millenial|boomer then a word-boundary:
^/content/au/(?:millenial|boomer)\b(?:/.*)?
You should probably spell millennial correctly too (two "n"s, not one).
What's with the negative lookahead? This is a simple, if not trivial, positive match.
^/content/au/(?:millenial|boomer)(?:/|$)
The final group says the match needs to be followed by a slash or nothing, so as to exclude paths which begin with one of the alternatives, but contain additional text.
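As a rough check of the pattern with the trailing (?:/|$) group, here is a small Python sketch run against the examples from the question. The helper name is_match and the extra path /content/au/boomerang are made up for illustration:

```python
import re

# Third segment must be exactly "millenial" or "boomer":
# it has to be followed by a slash or by the end of the string.
SEGMENT = re.compile(r"^/content/au/(?:millenial|boomer)(?:/|$)")

def is_match(path):
    """True if the path starts with /content/au/ plus one of the two segments."""
    return SEGMENT.match(path) is not None

should_match = [
    "/content/au/boomer",
    "/content/au/boomer/male/31",
    "/content/au/millenial/female/29/M",
    "/content/au/millenial/male/18/UM",
]
should_not_match = [
    "/content/au",
    "/content/nz/millenial/male/18/UM",
    "/content/au/genz/male",
    "/content/au/boomerang",  # longer segment that merely starts with "boomer"
]
for path in should_match + should_not_match:
    print(path, is_match(path))
```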
You can use the following regex:
content/au/(?:boomer|millenial)

Regex expression to exclude both prefix and suffix

I'm trying to build an expression which will match all text EXCLUDING text with prefix 'abc' AND suffix 'def' (text which only has the prefix OR the suffix is ok).
I've tried the following:
^(?!([a][b][c]])).*(?!([d][e][f])$), but it doesn't match text which meets only one of the criteria (i.e. abc.xxx fails, as well as xxx.def, though they should pass)
I understand the answer is related to 'look behind' but I'm still not quite sure how to achieve this behavior
I've also tried the following:
^(?<!([a][b][c])).*(?!([d][e][f])$), but again, with no luck
^((abc.*\.(?!def))|((?!abc).*\.def))$
I think there can be a simpler solution, but this one will work as you wanted it.
[a][b][c] can be simplified to abc, the same goes for def.
The first part of the pattern matches abc.*\. without def at the end.
The second part matches .*\.def without the prefix abc.
Keep it simple and combine it into a single lookahead to check both conditions:
^(?!abc.*def$).*
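A minimal Python sketch of the single-lookahead pattern (the helper name allowed and the sample strings other than abc.xxx are assumptions for illustration). The lookahead runs once at the start of the string and rejects only text that both starts with abc and ends with def:

```python
import re

# Reject only strings that start with "abc" AND end with "def".
NOT_BOTH = re.compile(r"^(?!abc.*def$).*")

def allowed(text):
    """True unless the text has both the abc prefix and the def suffix."""
    return NOT_BOTH.match(text) is not None

for sample in ["abc.xxx", "xxx.def", "abc.def", "xyz", "abcdef"]:
    print(sample, allowed(sample))
```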

Regex expression to extract everything inside brackets

I need to extract content inside brackets () from the following string in C++;
#82=IFCCLASSIFICATIONREFERENCE($,'E05.11.a','Rectangular',#28);
I tried following regex but it gives an output with brackets intact.
std::regex e2 ("\\((.*?)\\)");
if (std::regex_search(sLine,m,e2)){
}
Output should be:
$,'E05.11.a','Rectangular',#28
The result you are looking for should be in the first matching subexpression, i.e. in the [m[1].first, m[1].second) interval.
This is because your regex also matches the enclosing parentheses, but you specified a grouping subexpression, i.e. (.*?), which holds only the text between them.
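The difference between the whole match and the first subexpression can be sketched in Python's re module (used here only for brevity; group(0) and group(1) play the roles of m[0] and m[1] in std::regex, and the variable names are made up):

```python
import re

line = "#82=IFCCLASSIFICATIONREFERENCE($,'E05.11.a','Rectangular',#28);"

# Same pattern as in the question: literal brackets around a lazy capture group.
m = re.search(r"\((.*?)\)", line)

whole = m.group(0)   # entire match, brackets included
inner = m.group(1)   # first capture group, brackets excluded

print(whole)
print(inner)
```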
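Use lookarounds: "(?<=\\()[^)]*?(?=\\))". Watch out, as this won't work for nested parentheses. Note also that the default ECMAScript grammar of std::regex supports lookahead but not lookbehind, so the (?<=...) part needs an engine that supports it, such as Boost.Regex or PCRE.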
You can also use backreferences.
(?<=\().*(?=\))
Try this; I only tested it in one tester, but it worked. It looks for any characters after a ( and before a ), without including the brackets themselves.

How to distinguish between saved segment and alternative?

From the following text...
Acme Inc.<SPACE>12345<SPACE or TAB>bla bla<CRLF>
... I need to extract company name + zip code + rest of the line.
Since either a TAB or a SPACE character can separate the second from the third tokens, I tried using the following regex:
FIND:^(.+) (\d{5})(\t| )(.+)$
REPLACE:\1\t\2\t\3
However, the contents of the alternative part are put in \3, so the result is this:
Acme Inc.<TAB>12345<TAB><TAB or SPACE here>$
How can I tell the (Perl) regex engine that (\t| ) is an alternative instead of a token to be saved in RAM?
Thank you.
You want:
^(.+?) (\d{5})[\t ](.+)$
Since you are matching one character or the other, you can use a character class instead. Also, I made your first quantifier non-greedy (+? instead of +) to reduce the amount of backtracking the engine has to do to find the match.
In general, if you want a group to group without capturing anything, add ?: right after its opening parenthesis, like:
^(.+?) (\d{5})(?:\t| )(.+)$
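To illustrate how the non-capturing group changes the numbering, here is a small sketch in Python rather than Perl (the helper name to_tabs is made up). With (?:\t| ) the separator no longer takes a group number, so \3 is the rest of the line:

```python
import re

# Groups: 1 = company name, 2 = five-digit zip code, 3 = rest of the line.
# The separator (?:\t| ) groups the alternatives but is not numbered.
LINE = re.compile(r"^(.+?) (\d{5})(?:\t| )(.+)$")

def to_tabs(line):
    """Rewrite 'name zip<space or tab>rest' as tab-separated fields."""
    return LINE.sub(r"\1\t\2\t\3", line)

print(repr(to_tabs("Acme Inc. 12345 bla bla")))
print(repr(to_tabs("Acme Inc. 12345\tbla bla")))
```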
Use non-capturing parentheses:
^(.+) (\d{5})(?:\t| )(.+)$
One way is to use \s instead of ( |\t) which will match any whitespace char.
See Backslash-sequences for how Perl defines "whitespace".

How to match a string that does not end in a certain substring?

How can I write a regular expression that matches strings which do not end with a certain substring?
In my project, all classes whose names don't end with a string such as "controller" or "map" should inherit from a base class. How can I do this using a regular expression?
But using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isn't any match!
Do a search for all filenames matching this:
(?<!controller|map|anythingelse)$
(Remove the |anythingelse if no other keywords, or append other keywords similarly.)
If you can't use negative lookbehinds (the (?<!..) bit), do a search for filenames that do not match this:
(?:controller|map)$
And if that still doesn't work (might not in some IDEs), remove the ?: part and it probably will - that just makes it a non-capturing group, but the difference here is fairly insignificant.
If you're using something where the full string must match, then you can just prefix either of the above with ^.* to do that.
Update:
In response to this:
but using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isn't any match!
Not quite sure what you're attempting with the public/class stuff there, so try this:
public.*class.*(?<!controller|map)$
The . is a regex char that means "anything except newline", and the * means zero or more times.
If this isn't what you're after, edit the question with more details.
Depending on your regex implementation, you might be able to use a lookbehind for this task. This would look like
(?<!SomeText)$
This matches any lines NOT having "SomeText" at their end. If you cannot use that, the expression
^(?!.*SomeText$).*$
matches any line not ending with "SomeText" as well (empty lines included, since .* can match nothing).
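A minimal Python sketch of the lookahead form, applied to the "controller" and "map" endings from the question. Python's re module only accepts fixed-width lookbehind, so (?<!controller|map) would be rejected there; the lookahead variant has no such restriction. The helper name needs_base_class and the sample names are made up for illustration:

```python
import re

# Match any name that does NOT end with "controller" or "map".
NOT_ENDING = re.compile(r"^(?!.*(?:controller|map)$).*$")

def needs_base_class(name):
    """True if the name does not end with one of the excluded suffixes."""
    return NOT_ENDING.match(name) is not None

for name in ["orderservice", "usercontroller", "sitemap", "mapper"]:
    print(name, needs_base_class(name))
```

The match is case-sensitive, as in the patterns above; passing re.IGNORECASE to re.compile would be one way to relax that.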
You could write a regex that contains two groups: the first consists of one or more characters before controller or map, the second contains controller or map and is optional. The first group has to be lazy, otherwise it consumes the whole string and the optional group never gets a chance to match.
^(.+?)(controller|map)?$
With that you can match your string and, if the regex API you use has a group() method, check group(2): if it is empty, the string does not end with controller or map.
Check if the name does not match [a-zA-Z]*controller or [a-zA-Z]*map.
Finally, I did it this way:
public.*class.*[^(controller|map|spec)]$
It worked.
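One caveat on that last pattern: [^(controller|map|spec)] is a negated character class, so it only checks that the final character is not one of the listed letters or symbols, not that the line avoids those endings. A hedged Python sketch comparing it with a lookahead version (the sample lines and the LOOKAHEAD pattern are assumptions for illustration, not the poster's code):

```python
import re

# The pattern from the post: the bracket expression tests one character only.
CHAR_CLASS = re.compile(r"public.*class.*[^(controller|map|spec)]$")
# Lookahead variant that tests the whole ending.
LOOKAHEAD = re.compile(r"^(?!.*(?:controller|map|spec)$)public.*class.*$")

def accepted(pattern, line):
    """True if the pattern finds a match in the line."""
    return pattern.search(line) is not None

for line in ["public class order", "public class usermap", "public class sky"]:
    print(line, accepted(CHAR_CLASS, line), accepted(LOOKAHEAD, line))
```

"public class order" ends in "r", which is one of the characters in the class, so the first pattern rejects it even though it does not end with any of the three words.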