Regular expression to match either of 2 patterns [duplicate]

I have two regular expressions which work perfectly fine when used independently.
First : ^.+\.((jpeg)|(gif)|(png))\s*$ : matches .jpeg, .gif or .png at the end of the URL
Second : ^.+(javax.faces.resource|rfRes).* : matches two URL patterns
I want to combine the two expressions so that they match when "the URL ends in any of the image extensions" OR "the URL has javax.faces.resource or rfRes in its path".
I tried joining them with the | operator, like below:
^.+\.((jpeg)|(gif)|(png))\s*$ | ^.+(javax.faces.resource|rfRes).*
but it's not working.
Can anybody please help me join the above two regexes?

You have extra spaces around the | operator.
Your original regex:
^.+\.((jpeg)|(gif)|(png))\s*$ | ^.+(javax.faces.resource|rfRes).*
Fixed regex:
^.+\.((jpeg)|(gif)|(png))\s*$|^.+(javax.faces.resource|rfRes).*
Your solution will try to match "the end of the string and then a space," or "a space and then the beginning of the string." Remember, whitespace is significant in regexes.

The spaces in your combined expression are erroneous. You are requiring a space after end of line or before beginning of line, which is impossible in line-oriented input.
As a further improvement, you can remove the superfluous "anything" parts of the match, as well as a good number of redundant parentheses.
javax\.faces\.resource|rfRes|\.(jpeg|gif|png)\s*$
Notice also the proper quoting of literal full stop characters (a lone . matches any character).
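
To see the effect of the stray spaces, here is a minimal Python sketch. Python is my choice here, since the question does not name a language, and the sample URLs are made up for illustration. It compares the pattern with spaces, the pattern without spaces, and the simplified pattern.

```python
import re

# Pattern with the stray spaces around "|", as posted in the question.
with_spaces = re.compile(
    r'^.+\.((jpeg)|(gif)|(png))\s*$ | ^.+(javax.faces.resource|rfRes).*')
# Same pattern with the spaces removed.
fixed = re.compile(
    r'^.+\.((jpeg)|(gif)|(png))\s*$|^.+(javax.faces.resource|rfRes).*')
# Simplified pattern: no leading "anything", dots escaped.
simplified = re.compile(r'javax\.faces\.resource|rfRes|\.(jpeg|gif|png)\s*$')

# Hypothetical sample URLs (not from the question).
urls = [
    "/app/images/logo.png",
    "/app/javax.faces.resource/style.css",
    "/app/rfRes/script.js",
    "/app/index.html",
]

def matches(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    return pattern.search(url) is not None

# For each URL: (with_spaces, fixed, simplified)
results = {
    url: (matches(with_spaces, url), matches(fixed, url), matches(simplified, url))
    for url in urls
}

for url, flags in results.items():
    print(url, flags)
```

The version with spaces can never match on single-line input, because it asks for a literal space after the end of the string or before its beginning.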

Related

Regular Expression to Match List of File Extensions

I would like to have a regular expression that will match a list of file extensions delimited with a pipe |, such as doc|xls|pdf. This list could also be just a single extension such as pdf, or it could be a wildcard * or ?. I would also like to exclude the | at the start or the end of the list, and also not match the \<>/:" characters.
I have tried the following but it doesn't account for a single * wildcard.
^([^|\\<>\/:"]|[^\\<>:"])[^\/\\<>:"]*[^|\/\\<>:"]$
I have been on one of the online testers but can't seem to get over the final hurdle. If someone could point me in the right direction I would be most grateful.
You can construct this from smaller building blocks. A single extension, excluding the characters you mention, would be:
[^\\<>/:"]+
We should probably also exclude | since that's our delimiter:
[^\\<>/:"|]+
This can automatically match wildcards as well, since they're not forbidden.
To construct the |-separated list from those is then easy:
[^\\<>/:"|]+
followed by an arbitrary number of the same thing with a | before that:
[^\\<>/:"|]+(\|[^\\<>/:"|]+)*
And if you want a complete string to match this, add the ^ and $ anchors:
^[^\\<>/:"|]+(\|[^\\<>/:"|]+)*$
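
As a rough check of the final pattern, here is a small Python sketch. Python and the helper name is_valid_extension_list are my own assumptions, since the question does not name a language. It tries the pattern against a few lists.

```python
import re

# Final pattern from the steps above: one run of allowed characters,
# followed by zero or more "|"-prefixed runs of the same kind.
EXT_LIST = re.compile(r'^[^\\<>/:"|]+(\|[^\\<>/:"|]+)*$')

def is_valid_extension_list(text):
    """Hypothetical helper: True if text is a well-formed extension list."""
    return EXT_LIST.match(text) is not None

for candidate in ["doc|xls|pdf", "pdf", "*", "?", "|doc", "doc|", "doc||pdf", "do:c"]:
    print(repr(candidate), is_valid_extension_list(candidate))
```

A single wildcard is accepted because * and ? are not in the forbidden set, while a leading, trailing, or doubled | is rejected because every | must be followed by at least one allowed character.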

Regex Group not starting with

I'm having trouble combining 2 regexes into one (used to parse .ini files).
I've got this one (I suggest using rubular with these examples to understand it):
^(?<key>[^=;\r\n]+)=((?<value>\"*.*;*.*\"[^;\r\n]*);?(?<comment>.*)[^\r\n]*)
to match :
This="isnot;acomment"
This="isa";comment
This="isa;special";case
And I've got this one :
^(?<key>[^=;\r\n]+)=(?<value>[^;\r\n]*);?(?<comment>[^\r\n]*)
to match
This=isasimplecase
This=isasimple;comment
And I'm trying to merge the 2 regexes, but I can't manage to say "if my value group starts with \" use the first one, otherwise use the second one".
Right now I've got this:
^(?<key>[^=;\r\n]+)=(((?<value>\"*.*;*.*\"[^;\r\n]*);?(?<comment>.*)[^\r\n]*)|(?<value>[^;\r\n]*);?(?<comment>[^\r\n]*))
But it creates 2 extra unnamed groups for the simple, unquoted case. I was thinking of adding a condition like "the first character of the value group in the simple case must not be \"", but I didn't manage to do it.
PS: I suggest using rubular to better understand my problem. Sorry if I wasn't clear enough.
How about this?
^(?<key>[^=;\r\n]+)=(?<value>"[^"]*"|[^;\n\r]*);?(?<comment>.*)
(?<key>[^=;\r\n]+) Matches the part before the = symbol.
"[^"]*" Matches a string within double quotes, e.g. "foobar". If the value does not start with ", the regex engine moves on to the next alternative, [^;\n\r]*, which matches up to the first ; or newline or \r character. The matched characters are stored in a named group called value.
;? Optional semicolon.
(?<comment>.*) Remaining characters are stored into the comment group.
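
For a quick check against the five sample lines from the question, here is a Python sketch of the same pattern. Two assumptions: Python is my choice of language, and Python spells named groups (?P<name>...) rather than the (?<name>...) form used on rubular. The helper name parse_line is made up for illustration.

```python
import re

# Same pattern as above, with named groups in Python's (?P<name>...) spelling.
INI_LINE = re.compile(
    r'^(?P<key>[^=;\r\n]+)=(?P<value>"[^"]*"|[^;\n\r]*);?(?P<comment>.*)')

def parse_line(line):
    """Hypothetical helper: return (key, value, comment), or None if no match."""
    m = INI_LINE.match(line)
    if m is None:
        return None
    return m.group("key"), m.group("value"), m.group("comment")

samples = [
    'This="isnot;acomment"',
    'This="isa";comment',
    'This="isa;special";case',
    'This=isasimplecase',
    'This=isasimple;comment',
]
for line in samples:
    print(line, "->", parse_line(line))
```

Because the quoted alternative comes first, a ; inside double quotes stays part of the value, and only a ; after the closing quote starts the comment.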

Regular Expression on Strings

I wrote this regular expression in http://www.regexr.com/
Regular Expression: (^A.*\..\s)\|((\sS.*:\sA.*,\sN.....\s))\|(\sN.+)/g
Text:
AT1G01010.1 | Symbols: ANAC001, NAC001 | NAC domain containing protein 1
| chr1:3760-5630 FORWARD LENGTH=429
I'm able to detect the 1st String|2nd String| 3rd String| in the above text.
I would like to eliminate the 2nd part (" Symbols: ANAC001, NAC001 ") in the above text using the regular expression. Could anyone help? Or I need a regular expression to detect only the 1st and 3rd String.
Consider the following regex since you are already using the beginning of string ^ anchor.
^(A[^|]+)\s\|[^|]+\|\s*([^|]+)\s\|
What exactly are you trying to do? The regular expression you provided is searched against the whole text and returns the part that matches, so the regex is treated as a whole. If you want to grab just the 1st part and the 3rd part, then you need to run two separate regexes on the same text and merge the results together.
Try a non-capturing group (?:...):
(^A.*\..\s)\|(?:\sS.*:\sA.*,\sN.....\s)\|(\sN.+)
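
To compare the two suggested patterns, here is a minimal Python sketch. Python is my choice of language, and I am assuming the sample text is a single line (it appears wrapped in the question).

```python
import re

# Sample text from the question, assumed to be one line.
text = ("AT1G01010.1 | Symbols: ANAC001, NAC001 | "
        "NAC domain containing protein 1 | chr1:3760-5630 FORWARD LENGTH=429")

# Pattern that captures field 1 and field 3 and skips field 2 uncaptured.
skip_second = re.compile(r'^(A[^|]+)\s\|[^|]+\|\s*([^|]+)\s\|')
m = skip_second.search(text)
first, third = m.group(1), m.group(2)
print(first)   # 1st field
print(third)   # 3rd field

# Original pattern with the middle group made non-capturing via (?:...).
non_capturing = re.compile(r'(^A.*\..\s)\|(?:\sS.*:\sA.*,\sN.....\s)\|(\sN.+)')
groups = non_capturing.search(text).groups()
print(len(groups))  # only two capture groups remain
```

Note that in the second pattern the last group ends in .+, so on a single line it runs to the end of the text rather than stopping at the next |.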

Find and trim part of what is found using regular expression

I'm a newbie in writing regular expressions
I have a file name like this TST0101201304-123.txt and my target is to get the numbers between '-' and '.txt'
So I wrote this pattern: -([0-9]*)\.txt. It gets me the numbers that I want, but it also retrieves the hyphen '-' and the last part of the string, '.txt', so the result in the example above is '-123.txt'.
So my question is:
Is there a way in regular expressions to get only part of the matched string, like a submatch of the match without the need to trim it in my shell script code for unix?
I found this answer but it is getting the same result:
Regexp: Trim parts of a string and return what ever is left
Tip: To test my regular expressions I used this website
You can use lookbehind and lookahead
(?<=-)[0-9]*(?=[.]txt)
Don't know if it would work in unix
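In an engine that supports lookaround, the difference between the whole match and a submatch can be sketched as follows. Python is used here only as an illustration; the question itself is about a Unix shell script, where support depends on the tool.

```python
import re

filename = "TST0101201304-123.txt"

# Lookbehind/lookahead: the hyphen and ".txt" are checked but not consumed,
# so the whole match is just the digits.
lookaround = re.search(r'(?<=-)[0-9]*(?=[.]txt)', filename)
digits_from_lookaround = lookaround.group(0)

# Capture group: the whole match includes "-" and ".txt",
# while group(1) holds only the digits.
captured = re.search(r'-([0-9]*)\.txt', filename)
whole_match = captured.group(0)
digits_from_group = captured.group(1)

print(digits_from_lookaround, whole_match, digits_from_group)
```

Either approach gives the digits alone; the capture group is the more portable of the two, since many regex engines lack lookbehind.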
Different regex-engines are different. Since you're using expr match, you need to make two changes:
expr match expects a regex that matches the entire string; so, you need to add .* at the beginning of yours, to cover everything before the hyphen.
expr match uses POSIX Basic Regular Expressions (BREs), which use \( and \) for grouping (and capturing) rather than merely ( and ).
But, conveniently, when you give expr match a regex that contains a capture-group, its output is the content of that capture-group; you don't need to do anything else special. So:
$ expr match TST0101201304-123.txt '.*-\([0-9]*\)\.txt'
123
sed is your friend.
echo "$filename" | sed -e 's/.*-\([0-9]*\)\.txt$/\1/'
should get you what you want: the whole name is replaced by the captured digits.