non-greedy search for redundant values in string - regex

Basically I have this string and I want to get only a distinct image filename.
/mPastedImg_Time1469244713469.png&gtxResourceFileName=mPastedImg_Time1469244713469.png&amp
I have this regex code but it does not seem to work.
[^\/]*?_Time[0-9]{13}\.\w{3,4}\&
My expected output is:
mPastedImg_Time1469244713469.png
But the actual output is:
mPastedImg_Time1469244713469.png&gtxResourceFileName=mPastedImg_Time1469244713469.png&

To find the unique filename in a string, you can use this regex,
([^\/&= ]+_Time[0-9]{13}\.\w{3,4})(?!.*\1)
Here, ([^\/&= ]+_Time[0-9]{13}\.\w{3,4}) captures the filename you need, and the negative lookahead (?!.*\1) rejects any occurrence that appears again later in the string, so only the last occurrence of each filename matches and the duplicates are dropped. Because the negated character class excludes only /, &, = and the space, it also matches the Chinese characters in your filenames, which you also wanted to capture.
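As a minimal sketch of how this behaves, assuming Python's re module (the question does not name an engine), the pattern can be run against the string from the question. The variable names are illustrative only.

```python
import re

# Sample string from the question.
text = ("/mPastedImg_Time1469244713469.png"
        "&gtxResourceFileName=mPastedImg_Time1469244713469.png&amp")

# Group 1 captures a filename; (?!.*\1) rejects it if the same text
# occurs again later, so only the last occurrence of each name matches.
pattern = re.compile(r"([^/&= ]+_Time[0-9]{13}\.\w{3,4})(?!.*\1)")

unique_names = pattern.findall(text)
print(unique_names)  # ['mPastedImg_Time1469244713469.png']
```

With a single capturing group, findall returns the group contents, so the result is a list with one entry per distinct filename.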

Your pattern produces 2 matches, and the second one is longer than you expect because the negated character class [^\/] matches any character that is not a forward slash, including & and =.
What you might do is make the first character class more restrictive, so that it lists what you allow (for example [a-zA-Z]), and make sure that you don't use a global match that returns all matches, but take just one match:
[a-zA-Z]*_Time[0-9]{13}\.\w{3,4}
Note that you don't have to match the ampersand at the end of the pattern.
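A small sketch, assuming Python's re module (the engine is not stated in the question), that shows the two matches produced by the original pattern and the single match produced by the more restrictive one. The variable names are made up for illustration.

```python
import re

# Sample string from the question.
text = ("/mPastedImg_Time1469244713469.png"
        "&gtxResourceFileName=mPastedImg_Time1469244713469.png&amp")

# The question's pattern: [^/]*? may run across '&' and '=', so the
# second match also swallows the parameter name.
original_matches = re.findall(r"[^/]*?_Time[0-9]{13}\.\w{3,4}&", text)
print(original_matches)

# Restrictive prefix, no trailing '&', and only the first match is taken.
first = re.search(r"[a-zA-Z]*_Time[0-9]{13}\.\w{3,4}", text)
print(first.group(0))  # mPastedImg_Time1469244713469.png
```

re.search stops at the first match, which plays the role of a non-global match here.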

I think you were quite close, but you made it more complex than it needs to be.
If you know that the name will always start with mPastedImg_Time, then use that to the fullest.
What about simply doing it like this:
mPastedImg_Time[0-9]{13}\.\w{3,4}

How can I match all lines with a certain pattern, except when a certain substring is present?

I have multiple lines containing a bit of code that follows a very simple pattern: &G3FRM.GetRecord("<TAG>").GetField("<TAG>").Value. For example, I might have the following:
&G3FRM.GetRecord("PAGEREC").GetField("GSHOURS").Value
&G3FRM.GetRecord("RSCH_SETUP").GetField("Y_NIH_MNTHLY_CAP").Value
&G3FRM.GetRecord("PAYMENT").GetField("Y_HRS_TOTAL").Value
I need to match anything that has &G3FRM.GetRecord, that doesn't have PAGEREC as the first string/tag, and is then followed by the rest of the pattern. These statements can appear at the beginning, middle or end of any given line, and there could even be multiple matches in a single line.
This is the Regex pattern that I have tried:
&G3FRM\.GetRecord\("(?!PAGEREC)"\)\.GetField\("\w+"\)\.Value
As far as I understand, this is matching some literals (&G3FRM.GetRecord(") and is then looking for any string that doesn't match PAGEREC, using a negative lookahead. It certainly excludes any of the matches that have PAGEREC, but it also excludes everything else, so I know that I'm missing something.
So, I have a bunch of lines that I've cherry-picked that could look something like this:
Local string &rqst_dept_descr = %This.GetDepartmentDescription(&G3FRM.GetRecord("PAGEREC").GetField("GSREQUESTING_DEPT").Value);
Local string &hoursHTML = GetHTMLText(HTML.G_FORM_ROW_VALUE, "Hours", &G3FRM.GetRecord("PAYMENT").GetField("GSHOURS").Value);
Local string &off_cycle_deposit = &G3FRM.GetRecord("PAGEREC").GetField("GSOFFCYCLE_DIR_DEP").Value;
&G3FRM.GetRecord("POSITION").GetField("GSCOMMISSIONTIPS").Value = "Y";
SQLExec(SQL.Y_HAS_CONTRACT_DATA_IN_RANGE, &G3FRM.GetRecord("PAGEREC").GetField("EMPLID").Value, &G3FRM.GetRecord("PAYMENT").GetField("CONTRACT_NUM").Value, &G3FRM.GetRecord("PAYMENT").GetField("EFFDT").Value, &G3FRM.GetRecord("PAYMENT").GetField("EFFDT").Value, &HasContractData);
In this example, it should exclude the first line, since it only has the pattern I don't want. It should include the second line, exclude the third, include the fourth, and include the fifth (even though it does have one example of the excluded pattern, it has multiples that I do want).
You may use this regex:
&G3FRM\.GetRecord\("(?!PAGEREC\b)\w+"\)\.GetField\("\w+"\)\.Value
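Note the \w+ after the negative lookahead: it consumes the tag, which the lookahead has already checked is not PAGEREC. I have added \b to your lookahead condition so that only the exact word PAGEREC is rejected, and tags that merely start with it (for example PAGEREC1) still match.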
In your regex &G3FRM\.GetRecord\("(?!PAGEREC)"\)\.GetField\("\w+"\)\.Value the negative lookahead condition is correct, but nothing in the regex matches the text between the two double quotes, so it will only match an empty tag, e.g. &G3FRM.GetRecord("").GetField("GSHOURS").Value.
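The corrected pattern can be checked with a short sketch, assuming Python's re module. The first two lines are taken from the question; the third is shortened from the question's SQLExec line, and the PAGEREC1 tag in the fourth is hypothetical, added only to show the effect of \b.

```python
import re

pattern = re.compile(
    r'&G3FRM\.GetRecord\("(?!PAGEREC\b)\w+"\)\.GetField\("\w+"\)\.Value'
)

lines = [
    'Local string &off_cycle_deposit = &G3FRM.GetRecord("PAGEREC").GetField("GSOFFCYCLE_DIR_DEP").Value;',
    '&G3FRM.GetRecord("POSITION").GetField("GSCOMMISSIONTIPS").Value = "Y";',
    # Shortened mixed line: one excluded and one wanted occurrence.
    'SQLExec(&G3FRM.GetRecord("PAGEREC").GetField("EMPLID").Value, &G3FRM.GetRecord("PAYMENT").GetField("EFFDT").Value);',
    # Hypothetical tag that merely starts with PAGEREC; \b lets it match.
    '&G3FRM.GetRecord("PAGEREC1").GetField("EMPLID").Value',
]

# One list of matches per input line.
results = [pattern.findall(line) for line in lines]
for line_matches in results:
    print(line_matches)
```

The first line yields no matches, while the mixed line yields only the PAYMENT occurrence, so filtering on a non-empty result keeps exactly the lines the question asks for.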

Regex matching extra characters

I am using an online regex tester to evaluate my expression.
My test string: "Little" Timmy (tim) McGraw
my regex:
^[()"]|.["()]
It looks like I'm properly catching the characters I want but my matches are including whatever character comes just before the match. I'm not sure what, or if anything, I'm doing wrong to be catching the preceding characters like that? The goal is to capture characters we don't want in the name field of one of our systems.
Brief
Your current regex ^[()"]|.["()] says the following:
^[()"]|.["()] Match either of the following:
  ^[()"] Match the following:
    ^ Assert position at the start of the line
    [()"] Match any character present in the list ()"
  .["()] Match the following:
    . Match any character (this is the issue you were having)
    ["()] Match any character present in the list "()
Code
You can actually shorten your regex to just [()"].
Ultimately, however, it would be much easier to create a negated set that determines which characters are valid rather than those that are invalid. This approach would get you something like [^\w ]. This means match anything not present in the set. So match any non-word and non-space characters (in your sample string this will match the symbols ()" since they are not in the set).
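To compare the three patterns side by side, here is a minimal sketch assuming Python's re module (the question used an online tester, so the engine is an assumption). The variable names are illustrative.

```python
import re

# Test string from the question.
name = '"Little" Timmy (tim) McGraw'

# Original pattern: the '.' drags the preceding character into each match.
with_extra = re.findall(r'^[()"]|.["()]', name)
print(with_extra)    # ['"', 'e"', ' (', 'm)']

# Shortened pattern: only the unwanted characters themselves.
only_bad = re.findall(r'[()"]', name)
print(only_bad)      # ['"', '"', '(', ')']

# Negated set: anything that is neither a word character nor a space.
not_allowed = re.findall(r'[^\w ]', name)
print(not_allowed)   # ['"', '"', '(', ')']
```

The negated set gives the same result on this sample but will also flag symbols that the explicit list does not mention.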

Regex: ignore characters that follow

I'd like to know how I can ignore characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they do not work because they leave those characters available for other matches, whereas I want them to be just... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then I want to "extract" only sample text and the same with testing purposes. That is, I want the group to match to be ""sample text"", but then I want the full match to be sample text. I partially achieved that with the use of the \K option:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
Which ignores the first "" (or ") from the full match but takes it into account when matching the group. How can I ignore the following "" (")?
Note: positive lookahead does not work: it does not ignore characters from the following matches, it just does not include them in the current match.
Thanks a lot.
I hope I understood your question correctly: you want to match the whole string including the quotes, but extract (or replace it with) only the text inside the quotes, right?
You can typically use the regex replace functionality to extract just a part of the match.
This is the regex expression:
""?(.*?)""?
And this is the replace expression:
$1
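A short sketch of this approach, assuming Python's re module. Python writes the group reference in the replacement as \1, where engines such as .NET or JavaScript use $1.

```python
import re

# Sample string from the question.
text = ('This is a ""sample text"" just for "testing purposes": '
        'not to be used anywhere else.')

# One or two quotes, a lazily captured body, then one or two quotes.
pattern = re.compile(r'""?(.*?)""?')

# Extract only the text inside the quotes (group 1).
inner = pattern.findall(text)
print(inner)  # ['sample text', 'testing purposes']

# Replace each quoted span with its inner text.
stripped = pattern.sub(r'\1', text)
print(stripped)
```

The full match still includes the quotes, so they are consumed and cannot take part in a later match; only group 1 is kept.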

Match pattern anywhere in string?

I want to match the following pattern:
Exxxx49 (where x is a digit 0-9)
For example, E123449abcdefgh, abcdefE123449987654321 are both valid. I.e., I need to match the pattern anywhere in a string.
I am using:
^*E[0-9]{4}49*$
But it only matches E123449.
How can I allow any amount of characters in front or after the pattern?
Remove the ^ and $ to search anywhere in the string.
In your case the * are probably not what you intended; E[0-9]{4}49 should suffice. This will find an E, followed by four digits, followed by a 4 and a 9, anywhere in the string.
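A minimal sketch of the unanchored search, assuming Python's re module (the question does not name a language). The third input string is hypothetical, added to show a non-match.

```python
import re

pattern = re.compile(r"E[0-9]{4}49")

samples = [
    "E123449abcdefgh",          # from the question
    "abcdefE123449987654321",   # from the question
    "E12349",                   # hypothetical: only four digits before the 9
]

# search() scans the whole string, so no anchors or wildcards are needed.
found = [pattern.search(s) for s in samples]
for s, m in zip(samples, found):
    print(s, "->", m.group(0) if m else None)
```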
I would go for
^.*E[0-9]{4}49.*$
EDIT:
since it fulfills all the requirements stated by the OP.
"[match] Exxxx49 (where x is digit 0-9)"
"allow for any amount of characters in front or after pattern"
It will match
^.* everything from, including the beginning of the line
E[0-9]{4}49 the requested pattern
.*$ everything after the pattern, including the end of the line
Your original regex had a regex pattern syntax error at the first *. Fix it and change it to this:
.*E\d{4}49.*
This pattern is for matching with APIs that must match the entire input, like Java's matches(), since you did not specify a language.
.* matches any sequence of characters (line terminators excluded by default). Because it surrounds the pattern, the regex matches the entire string whenever the pattern occurs somewhere in it.
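The difference between whole-string matching and searching can be sketched as follows. This assumes Python, whose re.fullmatch is used here as a stand-in for the whole-input behavior described above; it is not the Java API itself.

```python
import re

s = "abcdefE123449987654321"  # sample from the question

# fullmatch() must consume the entire string, so the bare pattern fails.
bare_full = re.fullmatch(r"E\d{4}49", s)
print(bare_full)  # None

# Surrounding the pattern with .* lets it cover the whole string.
wrapped_full = re.fullmatch(r".*E\d{4}49.*", s)
print(wrapped_full is not None)  # True

# search() needs no wrapping and returns just the pattern itself.
found = re.search(r"E\d{4}49", s)
print(found.group(0))  # E123449
```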
Just simply use this:
E[0-9]{4}49
"How do I allow for any amount of characters in front or after pattern?" and "but it only matches E123449":
Use the global flag, /E\d{4}49/g, if the language supports it
OR
Try a capturing group, (E\d{4}49)+, which is formed by enclosing the pattern in parentheses (...)
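Python has no /g flag, so as a sketch under that assumption, re.findall plays the role of a global match and returns every occurrence. The input string below is hypothetical.

```python
import re

# Hypothetical input containing the pattern twice.
text = "E123449abc and xyzE567849"

# findall() returns all non-overlapping matches, like a global match.
all_matches = re.findall(r"E\d{4}49", text)
print(all_matches)  # ['E123449', 'E567849']
```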

Exclude strings of pattern "abba"

For example, I want to exclude 'fitting', 'hollow', 'trillion'
but not 'hello' or 'pattern'
I already got the following to work
(.)(.)\2\1
which matches 'hollow' or 'fitting', but I have trouble negating this.
the closest thing I get is
^.(?!(.)(.)\2\1)
which excludes 'fitting' and 'hollow' but not 'trillion'
It's a little different from what you have. Your current regex checks for palindromicity (?) only starting at the second character. Since you want to check the whole string, you need to change it a little to:
^(?!.*(.)(.)\2\1)
The first anchor will ensure that the check is made only at the beginning (otherwise, the regex can claim a match at the end of the string).
Then the .* within the negative lookahead will enable the check to be done anywhere within the string. If there's any match, fail the entire match.
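A minimal sketch of the anchored lookahead, assuming Python's re module and using the words from the question. The last two lines illustrate the point about the anchor by running the same lookahead without it.

```python
import re

# Succeeds (with an empty match) only if no "abba" pattern occurs anywhere.
no_abba = re.compile(r"^(?!.*(.)(.)\2\1)")

words = ["fitting", "hollow", "trillion", "hello", "pattern"]
allowed = {word: bool(no_abba.search(word)) for word in words}
print(allowed)

# Without the anchor, search() retries at later positions and finds one
# where no "abba" follows, so even 'fitting' is reported as a match.
unanchored = re.search(r"(?!.*(.)(.)\2\1)", "fitting")
print(unanchored is not None)  # True
```

Because the match itself is empty, the result is best used as a yes/no test rather than as extracted text.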
Your pattern doesn't exclude trillion because of the leading ^. which requires exactly one character between the start of the string and the abba pattern. Your first two cases have exactly that one character (h and f). So if you change it to ^..(?!(.)(.)\2\1), it will work for trillion.
So in general, the regex (anchored at the start so that the check is not retried at later positions) will be:
^(?!.*(.)(.)\2\1)
    ^^ any number of characters (other than \n)