Regular Expression on Strings - regex

I wrote this regular expression in http://www.regexr.com/
Regular Expression: (^A.*\..\s)\|((\sS.*:\sA.*,\sN.....\s))\|(\sN.+)/g
Text:
AT1G01010.1 | Symbols: ANAC001, NAC001 | NAC domain containing protein 1
| chr1:3760-5630 FORWARD LENGTH=429
I'm able to detect the 1st string | 2nd string | 3rd string in the above text.
I would like to eliminate the 2nd part (" Symbols: ANAC001, NAC001 ") from the above text using the regular expression. Could anyone help? Alternatively, I need a regular expression that detects only the 1st and 3rd strings.

Consider the following regex since you are already using the beginning of string ^ anchor.
^(A[^|]+)\s\|[^|]+\|\s*([^|]+)\s\|
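As a rough illustration of how this pattern behaves, here is a minimal Python sketch. It assumes the sample text is a single line, and the helper name first_and_third is made up for this example. Group 1 holds the first field and group 2 the third; the "Symbols" field is matched but never captured.

```python
import re

# Sample line from the question, assumed here to be one line of text.
text = ("AT1G01010.1 | Symbols: ANAC001, NAC001 | "
        "NAC domain containing protein 1 | chr1:3760-5630 FORWARD LENGTH=429")

# Pattern from the answer: group 1 is the first field, the second field is
# matched by [^|]+ without being captured, and group 2 is the third field.
pattern = re.compile(r"^(A[^|]+)\s\|[^|]+\|\s*([^|]+)\s\|")

def first_and_third(line):
    """Return (first_field, third_field), or None if the line does not match."""
    m = pattern.search(line)
    return (m.group(1), m.group(2)) if m else None

print(first_and_third(text))
# ('AT1G01010.1', 'NAC domain containing protein 1')
```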

What exactly are you trying to do? The regular expression you provided searches the whole text and returns the portion that matches, so the regex is treated as a whole. If you want to grab just the 1st and 3rd parts, one option is to run two separate regexes on the same text and merge the results.

Try making the second group non-capturing with ?:
(^A.*\..\s)\|(?:\sS.*:\sA.*,\sN.....\s)\|(\sN.+)

Related

regular expression to match either or 2 patterns [duplicate]

I have two regular expressions which work perfectly fine when used independently.
First: ^.+\.((jpeg)|(gif)|(png))\s*$ : searches for .jpeg, .gif or .png at the end of the URL
Second: ^.+(javax.faces.resource|rfRes).* : searches for two URL patterns
I want to combine the two expressions so that they match when "the URL ends in any of the image extensions" OR "the URL has javax.faces.resource or rfRes in its path".
I tried using the | operator to join both, as shown below:
^.+\.((jpeg)|(gif)|(png))\s*$ | ^.+(javax.faces.resource|rfRes).*
but it's not working.
Can anybody please help in joining the above two regexes?
You have extra spaces around the | operator:
Your original regex:
^.+\.((jpeg)|(gif)|(png))\s*$ | ^.+(javax.faces.resource|rfRes).*
Fixed regex (no spaces around the |):
^.+\.((jpeg)|(gif)|(png))\s*$|^.+(javax.faces.resource|rfRes).*
Your solution will try to match "the end of the string and then a space," or "a space and then the beginning of the string." Remember, whitespace is significant in regexes.
The spaces in your combined expression are erroneous. You are requiring a space after end of line or before beginning of line, which is impossible in line-oriented input.
As a further improvement, you can remove the superfluous "anything" parts of the match, as well as a good number of redundant parentheses.
javax\.faces\.resource|rfRes|\.(jpeg|gif|png)\s*$
Notice also the proper quoting of literal full stop characters (a lone . matches any character).
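To see the simplified pattern in action, here is a small Python sketch. The URLs and the helper name is_resource_url are invented for illustration. It relies on re.search scanning the whole string, which is why the leading ^.+ and trailing .* can be dropped.

```python
import re

# Simplified pattern from the answer above. re.search looks for a match
# anywhere in the string, so no leading ^.+ or trailing .* is needed.
pattern = re.compile(r"javax\.faces\.resource|rfRes|\.(jpeg|gif|png)\s*$")

def is_resource_url(url):
    """True if the URL ends in an image extension or contains a resource path."""
    return pattern.search(url) is not None

# Hypothetical URLs, for illustration only.
urls = [
    "/app/images/logo.png",                 # image extension at the end
    "/app/javax.faces.resource/style.css",  # resource path
    "/app/rfRes/script.js",                 # resource path
    "/app/index.html",                      # neither
    "/app/photo.png.bak",                   # .png is not at the end
]

for url in urls:
    print(url, is_resource_url(url))
```

The first three URLs should match and the last two should not.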

regular expression replace removes first and last character when using $1

I have string like this:
&breakUp=Mumbai;city,Puma;brand&
where Mumbai;city and Puma;brand are filters (let's say) separated by a comma (,). I have to add more filters, like Delhi;State.
I am using the following regular expression to find the above string:
&breakUp=.([\w;,]*).&
and the following replacement expression:
&breakUp=$1,Delhi;State&
It is finding the string correctly but while replacing it is removing the first and last character and giving the following result:
&breakUp=umbai;city,Puma;bran,Delhi;State&
How to resolve this?
Also, if I have no filters, I don't want that first comma. For example,
&breakUp=&
should become
&breakUp=Delhi;State&
How to do it?
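My guess is that your expression is almost fine, but there are two extra . characters in it. Each . matches any single character, so the first and last characters of the filter list are consumed outside the capture group. Removing them gives: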
&breakUp=([\w;,]*)&
To skip the empty case &breakUp=&, we can likely apply this expression, which requires at least one character between = and &:
&breakUp=([^&]+)&
Your problem seems to be the leading and trailing periods: each one matches any character.
Try using this regex:
&breakUp=([\w;,]*)&
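The question also asks how to avoid the leading comma when there are no filters yet. One possible approach, sketched here in Python, is to build the replacement in a function instead of a fixed $1 template. The function name add_filter is hypothetical, and the asker's actual tool may not support callback replacements.

```python
import re

# Pattern from the answers above: the stray dots are removed, so the group
# captures the whole filter list, which may be empty.
pattern = re.compile(r"&breakUp=([\w;,]*)&")

def add_filter(query, new_filter):
    """Append new_filter to the breakUp list, adding a comma only if needed."""
    def repl(m):
        existing = m.group(1)
        # Only insert the separating comma when filters already exist.
        joined = f"{existing},{new_filter}" if existing else new_filter
        return f"&breakUp={joined}&"
    return pattern.sub(repl, query)

print(add_filter("&breakUp=Mumbai;city,Puma;brand&", "Delhi;State"))
# &breakUp=Mumbai;city,Puma;brand,Delhi;State&
print(add_filter("&breakUp=&", "Delhi;State"))
# &breakUp=Delhi;State&
```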

Finding a pattern with optional end using regular expression

I am looking for a single regular expression to extract a block of text that may be followed by an optional end marker. The challenge here is to use just one regular expression.
The input is as follows:
Anchor: This is the text I want to extract A/C : 2015-5-20
Anchor: This is the text I want to extract
I am currently using the following regular expression
Anchor:(?<extact>.*)(A\/C)
This matches only lines where the A/C block is present.
If I make the A/C block optional using a ?, as in Anchor:(?<extact>.*)(A\/C)?, the match gets too long: the greedy .* consumes the rest of the line, including the A/C part.
Any ideas on how to solve this elegantly with a single regex? An additional constraint is that I want to have a named group in the regex (here, extact).
You can find the sample code on regex101: https://regex101.com/r/wH5iQ4/1
Anchor:(?<extact>.*?)\s*(?=A\/C|$)
You can make use of a lookahead here. See the demo:
https://regex101.com/r/wH5iQ4/3
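A quick way to check the lookahead approach outside regex101 is the Python sketch below. Two things differ from the answer and are assumptions of this example: Python's re module spells named groups (?P<name>...), and an extra \s* after "Anchor:" trims the leading space from the captured text. The helper name extract_anchor_text is made up.

```python
import re

# Lazy .*? stops as soon as the lookahead sees either "A/C" or end of line.
# (?P<extact>...) is Python's spelling of the named group (?<extact>...).
pattern = re.compile(r"Anchor:\s*(?P<extact>.*?)\s*(?=A/C|$)")

def extract_anchor_text(line):
    """Return the text after 'Anchor:' up to an optional 'A/C' block."""
    m = pattern.search(line)
    return m.group("extact") if m else None

lines = [
    "Anchor: This is the text I want to extract A/C : 2015-5-20",
    "Anchor: This is the text I want to extract",
]

for line in lines:
    print(repr(extract_anchor_text(line)))
# Both lines print 'This is the text I want to extract'
```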

Regular expression replace double and single quotes with nothing

I am using a sphinx search module on a site I am developing and there is the option to enter regular expressions to be replaced with specified characters.
The available options are Match Expression, Replace Expression and Replace Char (these are input fields in a CMS admin panel, so unfortunately I'm unsure of the actual function used behind the scenes). My understanding is that the search finds any text matching Match Expression, and within it replaces whatever matches Replace Expression with the value of Replace Char. So it's a sort of find and replace on matched terms.
Some examples that work:
Example 1
Match Expression: /[a-zA-Z0-9]*-[a-zA-Z0-9]*/
Replace Expression: /-/
Replace Char: empty
Matched text: SX500-123, GLX-11A, GLZX-VXV, GLZ/123, GLZV 123, CNC-PWR1
Result text: SX500123, GLX11A, GLZXVXV, GLZ/123, GLZV-123-123, CNCPWR1
More examples here: http://mirasvit.com/doc/ssp/2.3.2/ssp/global/long_tail
What I want to do is strip any single or double quotes or apostrophes from a search query.
Example inputs: "examination papers",'examination papers','examination' "papers",pa"pers,pa'pers
Desired outputs: examination papers,examination papers,papers,papers,papers
I have tried just replacing the - with a " in the examples listed above for now but even this hasn't worked.
Any help would be greatly appreciated! Thank you
You can use these expressions:
Match Expression - /["'][\w\s]+["']|\w+["']\w+/
This will match the following text:
"examination papers",'examination papers','examination' "papers",pa"pers,pa'pers
Then you can use this regex to replace your quotes:
Replace Expression - /["']/
Replace Char - empty
So, your output will be:
examination papers,examination papers,examination papers,papers,papers
As context for this answer: I understand from the tool you are using that the match expression gathers a result set, to which a second regex (the replace expression) is then applied, replacing the content it matches with the replace char.
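The two-stage behaviour described above can be imitated in Python to sanity-check the expressions. This is only a sketch of the assumed behaviour (replace expression applied inside each matched span), not the search module's actual code, and apply_rule is a hypothetical name.

```python
import re

# Expressions from the answer above, written without the surrounding slashes.
MATCH_EXPR = r"""["'][\w\s]+["']|\w+["']\w+"""
REPLACE_EXPR = r"""["']"""
REPLACE_CHAR = ""

def apply_rule(text):
    """Apply REPLACE_EXPR only inside the spans found by MATCH_EXPR."""
    # Assumed behaviour: each matched span is rewritten independently.
    return re.sub(
        MATCH_EXPR,
        lambda m: re.sub(REPLACE_EXPR, REPLACE_CHAR, m.group(0)),
        text,
    )

queries = '''"examination papers",'examination papers','examination' "papers",pa"pers,pa'pers'''
print(apply_rule(queries))
# examination papers,examination papers,examination papers,papers,papers
```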

Find and trim part of what is found using regular expression

I'm a newbie at writing regular expressions.
I have a file name like TST0101201304-123.txt, and my goal is to get the numbers between '-' and '.txt'.
So I wrote the pattern -([0-9]*)\.txt. It finds the numbers that I want, but it also returns the hyphen '-' and the trailing '.txt', so the result for the example above is '-123.txt'.
So my question is:
Is there a way in regular expressions to get only part of the matched string, like a submatch of the match without the need to trim it in my shell script code for unix?
I found this answer but it is getting the same result:
Regexp: Trim parts of a string and return what ever is left
Tip: To test my regular expressions I used this website
You can use lookbehind and lookahead
(?<=-)[0-9]*(?=[.]txt)
Don't know if it would work in unix
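For comparison, here is how the two approaches look in an engine that supports lookaround, using Python's re module as a stand-in (the question itself is about a Unix shell, so this is only an illustration). It shows that the capture group in the original pattern already isolates the digits, while the lookaround version excludes the hyphen and ".txt" from the match itself.

```python
import re

filename = "TST0101201304-123.txt"

# Lookaround version from the answer: the hyphen and ".txt" are asserted
# but are not part of the match.
lookaround = re.search(r"(?<=-)[0-9]*(?=[.]txt)", filename)

# Capture-group version of the question's pattern: the whole match is
# "-123.txt", but group 1 holds only the digits.
captured = re.search(r"-([0-9]*)\.txt", filename)

print(lookaround.group(0))  # 123
print(captured.group(0))    # -123.txt
print(captured.group(1))    # 123
```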
Regex engines differ. Since you're using expr match, you need to make two changes:
1. expr match anchors the regex at the beginning of the string; so, you need to add .* at the beginning of yours, to cover everything before the hyphen.
2. expr match uses POSIX Basic Regular Expressions (BREs), which use \( and \) for grouping (and capturing) rather than merely ( and ).
But, conveniently, when you give expr match a regex that contains a capture-group, its output is the content of that capture-group; you don't need to do anything else special. So:
$ expr match TST0101201304-123.txt '.*-\([0-9]*\)\.txt'
123
sed is your friend.
echo "$filename" | sed -e 's/.*-\([0-9]*\)\.txt$/\1/'
should get you what you want, assuming $filename holds the file name.