Combine two regexes - regex

I have two regular expressions which work perfectly fine when used independently.
First: ^.+\.((jpeg)|(gif)|(png))\s*$ : searches for .jpeg, .gif or .png at the end of the URL
Second: ^.+(javax.faces.resource|rfRes).* : searches for two URL patterns
I want to combine the two expressions so that they match when "the URL ends in any of the image extensions" OR "the URL has javax.faces.resource or rfRes in its path".
I tried joining them with the | operator, as shown below:
^.+\.((jpeg)|(gif)|(png))\s*$ | ^.+(javax.faces.resource|rfRes).*
but it's not working.
Can anybody please help me join the two regexes?

You have extra spaces around the | operator:
Your original regex (note the spaces around the middle |):
^.+\.((jpeg)|(gif)|(png))\s*$ | ^.+(javax.faces.resource|rfRes).*
Fixed regex (spaces removed):
^.+\.((jpeg)|(gif)|(png))\s*$|^.+(javax.faces.resource|rfRes).*
Your solution will try to match "the end of the string and then a space," or "a space and then the beginning of the string." Remember, whitespace is significant in regexes.

The spaces in your combined expression are erroneous. You are requiring a space after end of line or before beginning of line, which is impossible in line-oriented input.
As a further improvement, you can remove the superfluous "anything" parts of the match, as well as a good number of redundant parentheses.
javax\.faces\.resource|rfRes|\.(jpeg|gif|png)\s*$
Notice also the proper quoting of literal full stop characters (a lone . matches any character).
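To see the three patterns side by side, here is a minimal sketch using Python's re module. The question does not say which regex engine is in use, so Python is an assumption, and the sample URLs are made up for illustration.

```python
import re

# Pattern from the question: the spaces around "|" are literal characters.
WITH_SPACES = re.compile(r"^.+\.((jpeg)|(gif)|(png))\s*$ | ^.+(javax.faces.resource|rfRes).*")
# Same pattern with the spaces removed.
FIXED = re.compile(r"^.+\.((jpeg)|(gif)|(png))\s*$|^.+(javax.faces.resource|rfRes).*")
# Simplified pattern: no leading "anything", escaped dots, fewer parentheses.
SIMPLIFIED = re.compile(r"javax\.faces\.resource|rfRes|\.(jpeg|gif|png)\s*$")

# Hypothetical sample URLs (not from the original question).
urls = [
    "http://example.com/img/logo.png",
    "http://example.com/javax.faces.resource/style.css",
    "http://example.com/app/rfRes/script.js",
    "http://example.com/index.html",
]

def matches(pattern, url):
    """Return True if the pattern matches anywhere in the URL."""
    return pattern.search(url) is not None

for url in urls:
    print(url, matches(WITH_SPACES, url), matches(FIXED, url), matches(SIMPLIFIED, url))
```

With these inputs the version containing spaces matches nothing, while the fixed and simplified versions agree with each other.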

Related

Regular Expression to Match List of File Extensions

I would like a regular expression that matches a list of file extensions delimited with a pipe |, such as doc|xls|pdf. The list could also be just a single extension such as pdf, or a wildcard * or ?. I would also like to exclude a | at the start or the end of the list, and not match the \<>/:" characters.
I have tried the following but it doesn't account for a single * wildcard.
^([^|\\<>\/:"]|[^\\<>:"])[^\/\\<>:"]*[^|\/\\<>:"]$
I have been on one of the online testers but can't seem to get over the final hurdle. If someone could point me in the right direction I would be most grateful.
You can construct this from smaller building blocks. A single extension, excluding the characters you mention, would be:
[^\\<>/:"]+
We should probably also exclude | since that's our delimiter:
[^\\<>/:"|]+
This can automatically match wildcards as well, since they're not forbidden.
To construct the |-separated list from those is then easy:
[^\\<>/:"|]+
followed by an arbitrary number of the same thing with a | before that:
[^\\<>/:"|]+(\|[^\\<>/:"|]+)*
And if you want a complete string to match this, add the ^ and $ anchors:
^[^\\<>/:"|]+(\|[^\\<>/:"|]+)*$
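A quick way to check the anchored pattern is to run it against a few inputs. The sketch below assumes Python's re module (the question does not name an engine); the helper name is_valid_extension_list and the sample strings are hypothetical.

```python
import re

# Final pattern from the answer, anchored at both ends.
EXT_LIST = re.compile(r'^[^\\<>/:"|]+(\|[^\\<>/:"|]+)*$')

def is_valid_extension_list(text):
    """Return True if text is a |-separated list of allowed extensions."""
    return EXT_LIST.match(text) is not None

# Hypothetical inputs chosen to exercise each rule.
samples = ["doc|xls|pdf", "pdf", "*", "?", "|doc", "doc|", "doc||pdf", "do:c"]
for sample in samples:
    print(repr(sample), is_valid_extension_list(sample))
```

The first four samples are accepted; a leading, trailing or doubled |, or a forbidden character such as :, is rejected.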

PowerGREP - regular expression

I have an Apache log, and each line of the file looks like:
script.php?variable1=value1&variable2=value2&variable3=value3&.........................
I need to extract this part of the string:
variable1=value1&variable2=value2
and ignore the rest of the line. How can I do this in PowerGREP?
I tried:
variable1=(.*)&variable2=(.*)&
But I get the rest of the line after value2.
Please help me, sorry for my English.
Contrary to what Ed Cottrell wrote about his second example, the first one works better (i.e. correctly): if the subexpression for value2 is made non-greedy and nothing follows it in the pattern, it matches as few characters as possible, i.e. none.
If you don't mind having the & after value2 included in the match, you could also refine your attempt by making the subexpression for value2 non-greedy, so that it only extends to the next &:
variable1=(.*)&variable2=(.*?)&
Replace . with [^&] and drop the final &, like this:
variable1=(.*)&variable2=([^&]*)
. will match anything it can (any character except for the newline character, basically). [^&], on the other hand, matches only characters that are not &.
For even better results and faster performance, you can also replace the first . in the same way and add ? (the non-greedy qualifier), like so:
variable1=([^&]*?)&variable2=([^&]*?)
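The behaviour of the variants discussed in this thread can be compared directly. This is a sketch in Python's re module rather than PowerGREP, so engine details may differ; the log line is a made-up example in the shape given in the question.

```python
import re

# Hypothetical log line modelled on the one in the question.
line = "script.php?variable1=value1&variable2=value2&variable3=value3&variable4=value4"

patterns = {
    "greedy (original attempt)": r"variable1=(.*)&variable2=(.*)&",
    "lazy value2, trailing &": r"variable1=(.*)&variable2=(.*?)&",
    "negated class": r"variable1=(.*)&variable2=([^&]*)",
    "lazy negated class": r"variable1=([^&]*?)&variable2=([^&]*?)",
}

# Map each label to (whole match, captured value2).
results = {}
for name, pattern in patterns.items():
    m = re.search(pattern, line)
    results[name] = (m.group(0), m.group(2))
    print(f"{name}: match={m.group(0)!r} value2={m.group(2)!r}")
```

In this run the negated-class pattern returns exactly variable1=value1&variable2=value2, while the pattern ending in a lazy group captures an empty value2, since nothing after the group forces it to grow.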

Regex Group not starting with

I'm having trouble combining two regexes into one (used to parse .ini files).
I've got this one (I suggest trying these examples in Rubular to understand):
^(?<key>[^=;\r\n]+)=((?<value>\"*.*;*.*\"[^;\r\n]*);?(?<comment>.*)[^\r\n]*)
to match :
This="isnot;acomment"
This="isa";comment
This="isa;special";case
And I've got this one :
^(?<key>[^=;\r\n]+)=(?<value>[^;\r\n]*);?(?<comment>[^\r\n]*)
to match
This=isasimplecase
This=isasimple;comment
I'm trying to merge the two regexes, but I can't manage to say "if my value group does not start with \", use the second one; otherwise use the first one".
Right now I've got this:
^(?<key>[^=;\r\n]+)=(((?<value>\"*.*;*.*\"[^;\r\n]*);?(?<comment>.*)[^\r\n]*)|(?<value>[^;\r\n]*);?(?<comment>[^\r\n]*))
But it creates two more unnamed groups for the simple unquoted case. I was thinking it might help to add "the first character of the value group in the simple case must not be \"", but I didn't manage to do it.
PS: I suggest using Rubular to better understand my problem. Sorry if I wasn't clear enough.
How about this?
^(?<key>[^=;\r\n]+)=(?<value>"[^"]*"|[^;\n\r]*);?(?<comment>.*)
(?<key>[^=;\r\n]+) Matches the part before the = symbol.
"[^"]*" Matches a string within double quotes, e.g. "foobar". If the value does not start with ", the regex engine moves on to the next alternative, [^;\n\r]*, which matches up to the first ;, \n or \r character. Whichever alternative matches is stored in the named group value.
;? Optional semicolon.
(?<comment>.*) Remaining characters are stored into the comment group.
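The merged pattern can be tried against the five sample lines from the question. This sketch assumes Python, whose re module spells named groups (?P<name>...) rather than (?<name>...); the pattern is otherwise unchanged, and the helper name parse is hypothetical.

```python
import re

# Same pattern as in the answer, with Python's (?P<name>...) group syntax.
INI_LINE = re.compile(
    r'^(?P<key>[^=;\r\n]+)=(?P<value>"[^"]*"|[^;\n\r]*);?(?P<comment>.*)'
)

def parse(line):
    """Return (key, value, comment) for an ini-style line, or None."""
    m = INI_LINE.match(line)
    if m is None:
        return None
    return m.group("key"), m.group("value"), m.group("comment")

# The sample lines given in the question.
lines = [
    'This="isnot;acomment"',
    'This="isa";comment',
    'This="isa;special";case',
    "This=isasimplecase",
    "This=isasimple;comment",
]
for line in lines:
    print(line, "->", parse(line))
```

Because the quoted alternative is tried first, a ; inside quotes stays in the value, and only a ; after the value starts the comment.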

Find and trim part of what is found using regular expression

I'm a newbie at writing regular expressions.
I have a file name like TST0101201304-123.txt, and my goal is to get the numbers between '-' and '.txt'.
So I wrote the pattern -([0-9]*)\.txt. It gets me the numbers I want, but the match also includes the hyphen '-' and the trailing '.txt', so the result for the example above is '-123.txt'.
So my question is:
Is there a way in regular expressions to get only part of the matched string, like a submatch of the match, without needing to trim it in my Unix shell script?
I found this answer but it is getting the same result:
Regexp: Trim parts of a string and return what ever is left
Tip: To test my regular expressions I used this website
You can use lookbehind and lookahead
(?<=-)[0-9]*(?=[.]txt)
Don't know if it would work in unix
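For comparison, here is a small sketch of both approaches in Python's re module. Python is an assumption here, since the question targets a Unix shell script, and lookaround support varies between tools.

```python
import re

filename = "TST0101201304-123.txt"

# Lookbehind/lookahead: the hyphen and ".txt" must be present,
# but they are not part of the match itself.
lookaround = re.search(r"(?<=-)[0-9]*(?=[.]txt)", filename)

# Capture group: the whole match still includes "-" and ".txt",
# but group 1 holds only the digits.
captured = re.search(r"-([0-9]*)\.txt", filename)

print(lookaround.group(0))  # 123
print(captured.group(0))    # -123.txt
print(captured.group(1))    # 123
```

Where lookaround is not available, reading the capture group instead of the whole match gives the same digits.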
Different regex-engines are different. Since you're using expr match, you need to make two changes:
expr match expects a regex that matches the entire string; so, you need to add .* at the beginning of yours, to cover everything before the hyphen.
expr match uses POSIX Basic Regular Expressions (BREs), which use \( and \) for grouping (and capturing) rather than merely ( and ).
But, conveniently, when you give expr match a regex that contains a capture-group, its output is the content of that capture-group; you don't need to do anything else special. So:
$ expr match TST0101201304-123.txt '.*-\([0-9]*\)\.txt'
123
sed is your friend.
echo TST0101201304-123.txt | sed -e 's/.*-\([0-9]*\)\.txt/\1/'
should get you what you want.