RegEx - double condition to find some string

RegEx - double condition to find some string - regex

I'd like to find word RADU3_ or RADU3- in a sentence that begins with xlink:href= and ends with .svg
How to do this?
I've tried following, but does not give the result I'm expecting.
(?=\wxlink:href=|\wsvg\b)|\bRADU3_|\bRADU3-
Just last line in example is good result (RADU3_)
ProductionGraphics\GP1**RADU3-**11_HeatingFurnaceF1.svg
PB:ExpressionText id="RADU3_FUEL GAS _SUM_EX" PBD:LinkUses
xlink:href="C:\ProcBookImport\MaintenanceGraphics\RADU3_AI.svg"
Example...

Not sure exactly how you want to use it but the below pattern finds the string. I put the RADU3 part in a group where I matches RADU3 followed by - or _ ([_-])
(xlink:href=.*)(RADU3[_-]*)(.*\.svg)
Edit, handle multiple occurences
If a string might contain the pattern several times then use ? to allow a group to repeat itself
(RADU3[_-]*?)(.*?\.svg?)
The above could be used in a replace expression like
\1someotherword\3
Where \2 is the second group that is replaced

If you want to make sure that the string starts with xlink:href= and ends with \.svg you could use anchors to assert the start ^ and the end $ of the string.
Use 1 capturing group to make sure xlink:href= comes before RADU3 followed by an underscore or a hyphen. Then you could match it and in the replacement use that capturing group follwed by your replacement.
You could use a positive lookahead to assert that the string ends with \.svg
That will match:
^(xlink:href=.*)\bRADU3[_-](?=.*\.svg$)
^ Assert the start of the string
(xlink:href=.*) Capturing group, match up until the last occurence of ..
\bRADU3[_-] Word boundary to prevent matching part of a larger word. Match RADU3 followed by an underscore or hyphen
(?=.*\.svg$) Positive lookahead to assert the string ends with .svg
See the regex demo

It sounds like you only want the word (substring) if it is in a specific context?
In your case, you can restart the regex midways if you want to have starting and ending conditions (multiple conditions) for a string, but at the same time only want to use these conditions as "if-statements" and not as part of the result.
The following uses this method, and utilizes restarts (\K) in order to only extract the substring you are looking for.
# The string has to start with "xlink:href="
xlink:href=
# Fetch everything up to our match, and the restart the regex
.*\K
# The strings we are looking for
(RADU3[-_])
# String has to end with ".svg"
(?=(.*\.svg))
If you want the entire string matching our rules you are looking for something like this:
#The string has to start with "xlink:href"
^(xlink:href=).*
# The strings we are looking for
(RADU3[-_])
# String has to end with ".svg"
(\w+\.svg)
#Get everything after .svg too
.*
If you only want the ending " after the .svg, you'd want to modify the last part where I just take everything after .svg
You can play around with what I have come up with at regex101 (no affiliation, just love their site): https://regex101.com/r/g0v07V/3/

Related

How to select a part of a string in a set, but not the string exactly in RegEx

I am trying to select part of a URL /example/privacy-policy and /example/123/privacy-policy but not /privacy-policy
I currently have this ^[^\/privacy\-policy].\/privacy\-policy.$
But it seems to not work still. Ideally, it would be able to find privacy-policy anywhere in the string without directly matching the root /privacy-policy
Thank you very much!

^.*(?=\/privacy-policy)
This matches everything (.*) from the beginning of the string (^) up until the string /privacy-policy appears. (This is called a positive lookahead)

If you only want to match privacy-policy at the end of the string, but not
when is it /privacy-policy at the start of the string and using a lookbehind assertion is supported:
(?<!^\/)\bprivacy-policy$
Explanation
(?<!^\/) Negative lookbehind, assert not the start of the string followed by / directly to the left
\bprivacy-policy Match literally preceded by a word boundary to prevent a partial match
$ End of string
See a regex demo.

If you don't want privacy-policy as the root of your url, but you still want to match it, you can force the regex to look for at least one more symbol before the backslash that preceeds privacy-policy:
(.+\/)privacy-policy
Then if you want to get the part that comes before privacy-policy, you can reference Group 1.
Is this what you're looking for?

How can I match all lines with a certain pattern, except when a certain substring is present?

I have multiple lines that have a bit of code that has a format that follow a very simple pattern: &G3FRM.GetRecord("<TAG>".GetField("<TAG>").Value. For example, I might have the following:
&G3FRM.GetRecord("PAGEREC").GetField("GSHOURS").Value
&G3FRM.GetRecord("RSCH_SETUP").GetField("Y_NIH_MNTHLY_CAP").Value
&G3FRM.GetRecord("PAYMENT").GetField("Y_HRS_TOTAL").Value
I need to match anything that has &G3FRM.GetRecord, that doesn't have PAGEREC as the first string/tag, and is then followed by the rest of the pattern. These statements can appear at the beginning, middle or end of any given line, and there could even be multiple matches in a single line.
This is the Regex pattern that I have tried:
&G3FRM\.GetRecord\("(?!PAGEREC)"\)\.GetField\("\w+"\)\.Value
As far as I understand, this is matching some literals (&G3FRM.GetRecord(") and is then looking for any string that doesn't match PAGEREC, using a negative lookahead. It certainly excludes any of the matches that have PAGEREC, but it also excludes everything else, so I know that I'm missing something.
So, I have a bunch of lines that I've cherry-picked that could look something like this:
Local string &rqst_dept_descr = %This.GetDepartmentDescription(&G3FRM.GetRecord("PAGEREC").GetField("GSREQUESTING_DEPT").Value);
Local string &hoursHTML = GetHTMLText(HTML.G_FORM_ROW_VALUE, "Hours", &G3FRM.GetRecord("PAYMENT").GetField("GSHOURS").Value);
Local string &off_cycle_deposit = &G3FRM.GetRecord("PAGEREC").GetField("GSOFFCYCLE_DIR_DEP").Value;
&G3FRM.GetRecord("POSITION").GetField("GSCOMMISSIONTIPS").Value = "Y";
SQLExec(SQL.Y_HAS_CONTRACT_DATA_IN_RANGE, &G3FRM.GetRecord("PAGEREC").GetField("EMPLID").Value, &G3FRM.GetRecord("PAYMENT").GetField("CONTRACT_NUM").Value, &G3FRM.GetRecord("PAYMENT").GetField("EFFDT").Value, &G3FRM.GetRecord("PAYMENT").GetField("EFFDT").Value, &HasContractData);
In this example, it should exclude the first line, since it only has the pattern I don't want. It should include the second line, exclude the third, include the fourth, and include the fifth (even though it does have one example of the excluded pattern, it has multiples that I do want).

You may use this regex:
&G3FRM\.GetRecord\("(?!PAGEREC\b)\w+"\)\.GetField\("\w+"\)\.Value
Note use of \w+ after negative lookahead to allow it to match a word that must not be PAGEREC1. I have added \b in your lookahead condition to make sure we don't match partial words.
RegEx Demo
In your regex &G3FRM\.GetRecord\("(?!PAGEREC)"\)\.GetField\("\w+"\)\.Value your negative lookahead condition is correct but regex is not matching anything between 2 double quotes so your regex will only match e.g. &G3FRM.GetRecord("").GetField("GSHOURS").Value.

regex match combined with something before a string if exists

I tried to get the sub-strings from a string
such like:
test strings:
cat_zoo_New_York_US
dog_zoo_South_Carolina
dolphin_zoo_Montreal_Canada
pokemon_home_d_K2-155
returned sub strings:
cat, New_York
dog, South_Carolina
dolphin, Montreal
pokemon, d
the Regex pattern I have tried is
([\w]+)(?:(_zoo_|_home_))(((?!(_US|_Canada|_K2-155))\w)+)
which I don't think is very concise and it returns other sub-strings besides what I need. Do you have any other suggestions?
Thanks!
Some updates
after #The fourth bird's answer #03/15/2018.
First of all, I like the idea of utilizing both ([^_]+) and the (?:) for different part of the sample strings.But let me extend a little more of the sample strings.
cat_zoo_New_York_US
dog_zoo_South_Carolina
yellow_dolphin_zoo_Montreal_Canada
pokemon_home_d_K2-155
pokemon_home_zoo_d_K2-155
I actually want to use the anchor strings such as 'zoo','home' or 'home_zoo' to separate the characters before and after, together with matching(and discarding) the last part of the country(or whatever specified place ID), which makes this question a bit less general(I like the idea of using _,but let me make it more tricky to learn better).
two questions here
what is the function of (?=) and .* here in
(?=(?:_US|_Canada|_K2-155|$)).*$? It seems if I use
(?:_US|_Canada|_K2-155|$), it is still ok...
since I extended a little bit on the anchor string to let it support
_, I used:
(.*?)(?:_*)(?:home_zoo|zoo|home)(?:_*)(.*?)(?:_*)(?:US|Canada|K2-155|$)
It seems ok, but if I use:
(.*?)(?:_*)(?:home|zoo|home_zoo)(?:_*)(.*?)(?:_*)(?:US|Canada|K2-155|$)
It will firstly match home for the last sample string. Is there a
greedy algorithm to catch this without specify the order of the pattern
string?
Well again, I don't like to make a long list of anchor strings, but I don't have other ideas make it more general without doing so.
Thanks again!

You could try it like this:
^([^_]+)_[^_]+_(.*?)(?=(?:_US|_Canada|_K2-155|$)).*$
This will capture 2 groups. You could for example use this in a replacement with group1, group2.
First capture the first part ending on an underscore in group 1 like cat_. Then match the second part ending with an underscore like zoo_ or home_.
From that point capture in a group until you encounter one of your values using a lookahead (?= or the end of the string.
That would match:
^ Begin of the string
([^_]+) Match in a capturing group not an _ one or more times (group 1)
_[^_]+_ match _ then not an _ one or more times followed by _
(.*?) Capture in a group any character zero or more times greedy (group 2)
(?= Positive lookahead that asserts what is on the right side is
(?: Non capturing group
_US|_Canada|_K2-155|$ your values or end of the string
) Close group
) Close group
.*$ Match any character zero or more times till the end of the string
Edit: After the updated question, perhaps this will suit your requirements:
^(.*?)_(?:home_zoo|zoo|home)(.*?)(?=(?:_US|_Canada|_K2-155|$))
This will match any charcter zero or more times non greedy (.*?), then an underscore and a non capturing group (?:home|zoo|home_zoo) to separate the characters before and after.

Well, I tried a more straightforward approach. If your data is more complex than the sample that you gave above, this may fail. Otherwise, for the above text, it works fine.
Here is the expression that I used:
^([^_]*)_[^_]*_(.*)_.*$
1 23 45 67
Basically what I did was:
Group the first char stream, which does not contain _, starting at the beginning of the line.
Then there is an _ following the above group
Follows an arbitrary length string, which does not have _'s in it
Then comes an _
Group the next arbitrary length string
Comes and _ afterwards
Rest of the string
replace it with \1, \2 (first group, second group).
You can find a fiddle here
If you are using vim, you can also achieve the same thing in vim with the following command:
:%s/^[^_]*_\([^_]*\)_\(.*\)_.*$/\1, \2/g
UPDATE
^([^_]*)_[^_]*_(((?:South_)|(?:New_))*[^_]*)((?:_US)|(?:_Canada)|(?:_K2-155))*$
You can find the new fiddle (here)[https://regex101.com/r/qQ2dE4/273]
What is the difference between this one and the previous one?
Now, I cheat a little, as such that I look for adjectives, which modify the state name, like South_ or New_. You can add more here, like East_, West_, Old_ or whatever if there is a case in your date.
There are cases where country is skipped in data. Plus looks like that last token on the very last line does not follow up a pattern. So, I explicitly listed those options in the expression, like US, Canada etc. You may need to add more exceptional cases in here as well.

Regex match till end of text

I'm using Regex to match whole sentences in a text containing a certain string. This is working fine as long as the sentence ends with any kind of punctuation. It does not work however when the sentence is at the end of the text without any punctuation.
This is my current expression:
[^.?!]*(?<=[.?\s!])string(?=[\s.?!])[^.?!]*[.?!]
Works for:
This is a sentence with string. More text.
Does not work for:
More text. This is a sentence with string
Is there any way to make this word as intended? I can't find any character class for "end of text".

End of text is matched by the anchor $, not a character class.
You have two separate issues you need to address: (1) the sentence ending directly after string, and (2) the sentence ending sometime after string but with no end-of-sentence punctuation.
To do this, you need to make the match after string optional, but anchor that match to the end of the string. This also means that, after you recognize an (optional) end-of-sentence punctuation mark, you need to match everything that follows, so the end-of-string anchor will match.
My changes: Take everything after string in your original regex and surround it in (?:...)? - the (?:...) being a "non-remembered" group, and the ? making the entire group optional. Follow that with $ to anchor the end of the string.
Within that optional group, you also need to make the end-of-sentence itself optional, by replacing the simple [.?!] with (?:[.?!].*)? - again, the (?:...) is to make a "non-remembered" group, the ? makes the group optional - and the .* allows this to match as much as you want after the end-of-sentence has been found.
[^.?!]*(?<=[.?\s!])string(?:(?=[\s.?!])[^.?!]*(?:[.?!].*)?)?$

The symbol for end-of-text is $ (and, the symbol for beginning-of-text, if you ever need it, is ^).
You probably won't get what you're looking for with by just adding the $ to your punctuation list though (e.g., [.?!$]); you'll find it works better as an alternative choice: ([.?!]|$).

Your regex is way too complex for what you want to achieve.
To match only a word just use
"\bstring\b"
It will match start, end and any non-alphanum delimiters.
It works with the following:
string is at the start
this is the end string
this is a string.
stringing won't match (you don't want a match here)

You should add the language in the question for more information about using.
Here is my example using javascript:
var reg = /^([\w\s\.]*)string([\w\s\.]*)$/;
console.log(reg.test('This is a sentence with string. More text.'));
console.log(reg.test('More text. This is a sentence with string'));
console.log(reg.test('string'))
Note:
* : Match zero or more times.
? : Match zero or one time.
+ : Match one or more times.
You can change * with ? or + if you want more definition.

Name validation - Adding a check to this regex to stop entering just identical characters

I'm trying to add another feature to a regex which is trying to validate names (first or last).
At the moment it looks like this:
/^(?!^mr$|^mrs$|^ms$|^miss$|^dr$|^mr-mrs$)([a-z][a-z'-]{1,})$/i
https://regex101.com/r/pQ1tP2/1
The idea is to do the following
Don't allow just adding a title like Mr, Mrs etc
Ensure the first character is a letter
Ensure subsequent characters are either letters, hyphens or apostrophes
Minimum of two characters
I have managed to get this far (shockingly I find regex so confusing lol).
It matches things like O'Brian or Anne-Marie etc and is doing a pretty good job.
My next additions I've struggled with though! trying to add additional features to the regex to not match on the following:
Just entering the same characters i.e. aaa bbbbb etc
Thanks :)

I'd add another negative lookahead alternative matching against ^(.)\1*$, that is, any character, repetead until the end of the string.
Included as is in your regex, it would make that :
/^(?!^mr$|^mrs$|^ms$|^miss$|^dr$|^mr-mrs$|^(.)\1*$)([a-z][a-z'-]{1,})$/i
However, I would probably simplify your negative lookahead as follows :
/^(?!(mr|ms|miss|dr|mr-mrs|(.)\2*)$)([a-z][a-z'-]{1,})$/i
The modifications are as follow :
We're evaluating the lookahead at the start of the string, as indicated by the ^ preceding it : no need to repeat that we match the start of the string in its clauses
Each alternative match the end of the string. We can put the alternatives in a group, which will be followed by the end-of-string anchor
We have created a new group, which we have to take into account in our back-reference : to reference the same group, it now must address \2 rather than \1. An alternative in certain regex flavours would have been to use a non-capturing group (?:...)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

RegEx - double condition to find some string - regex

Related

How to select a part of a string in a set, but not the string exactly in RegEx

How can I match all lines with a certain pattern, except when a certain substring is present?

regex match combined with something before a string if exists

Regex match till end of text

Name validation - Adding a check to this regex to stop entering just identical characters

Categories

Resources