Regex stop at the first double quote - regex

This is my string
'<input type="hidden" name="mode" value="<?=$modeValue?>"/>'
I am trying to extract 1) name="mode" and 2) value="<?=$modeValue?>"
from it.
I ran two regexes to find them:
/name\s*=\s*['\"].*['\"]/ for name="mode" and /value\s*=\s*['\"].*['\"]/ for value="<?=$modeValue?>"
But the first regex fails to return just name="mode".
Instead I get name="mode" value="$modeValue".
However, I succeeded in getting value="<?=$modeValue?>".
What is wrong with my regex for name="mode"?
My observation: I think I have to make the regex stop at the first " it encounters. Does anyone know how to do this? I am running out of time...

A small change and your regex is good to go: add a ? after the .*
name\s*=\s*['\"].*?['\"]
Why your regex was not working the way you wanted:
Quantifiers are greedy by default, so .* tries to match as many characters as it can, running past the first closing quote to the last one in the string.
Adding ? makes the quantifier lazy, which means it now matches as few characters as it can and stops at the first closing quote.
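To see the difference concretely, here is a minimal sketch in Python (the question looks like PHP, but greedy and lazy quantifiers behave the same way in Python's re module). The variable names are my own, not from the question.

```python
import re

# The input string from the question.
html = '<input type="hidden" name="mode" value="<?=$modeValue?>"/>'

# Greedy: .* runs to the end of the string, then backs up to the LAST quote.
greedy = re.search(r"""name\s*=\s*['"].*['"]""", html).group(0)

# Lazy: .*? grows one character at a time and stops at the FIRST quote.
lazy = re.search(r"""name\s*=\s*['"].*?['"]""", html).group(0)

print(greedy)
print(lazy)
```

The greedy pattern swallows the value attribute as well, while the lazy one returns only the name attribute.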
In case you want to join both regexes together (in either attribute order):
(name=\".*?\")\s*(value=\".*?\")|(value=\".*?\")\s*(name=\".*?\")

You can create capturing groups to match both,
(name=\".*?\")\s*(value=\".*?\")
Demo:
https://regex101.com/r/z9dDE2/1
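As a rough illustration of the capturing-group approach, the same pattern can be tried in Python. This is only a sketch: it assumes name comes before value, and the variable names are mine.

```python
import re

# The input string from the question.
html = '<input type="hidden" name="mode" value="<?=$modeValue?>"/>'

# Two lazy capture groups: one for the name attribute, one for value.
match = re.search(r'(name=".*?")\s*(value=".*?")', html)
name_attr, value_attr = match.groups()

print(name_attr)
print(value_attr)
```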

Related

Regex to remove a whole phrase from the match

I am trying to remove a whole phrase from my regex(PCRE) matches
if given the following strings
test:test2:test3:test4:test5:1.0.department
test:test2:test3:test4:test5:1.0.foo.0.bar
user.0.display
"test:test2:test3:test4:test5:1.0".division
I want to write regex that will return:
.department
.foo.0.bar
user.0.display
.division
Now I thought a good way to do this would be to match everything and then remove test:test2:test3:test4:test5:1.0 and "test:test2:test3:test4:test5:1.0", but I am struggling to do this.
I tried the following:
\b(?!(test:test2:test3:test4:test5:1\.0)|("test:test2:test3:test4:test5:1\.0"))\b.*
but this seems to just remove the first test from each and that's all. Could anyone help with where I am going wrong, or suggest a better approach?
I suggest searching for the following pattern:
"?test:test2:test3:test4:test5:1\.0"?
and replacing it with an empty string.
The quotation marks on both ends are made optional with the ? quantifier (0 or 1 times).
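A quick way to check the search-and-replace idea is a short Python sketch (the question is about PCRE, but this particular pattern uses nothing engine-specific). The names prefix and cleaned are my own.

```python
import re

# The four sample strings from the question.
lines = [
    'test:test2:test3:test4:test5:1.0.department',
    'test:test2:test3:test4:test5:1.0.foo.0.bar',
    'user.0.display',
    '"test:test2:test3:test4:test5:1.0".division',
]

# Optional quote, the literal phrase, optional quote.
prefix = re.compile(r'"?test:test2:test3:test4:test5:1\.0"?')

# Replace every occurrence with the empty string.
cleaned = [prefix.sub('', line) for line in lines]

for line in cleaned:
    print(line)
```

Lines that do not contain the phrase, like user.0.display, pass through unchanged.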

pattern for get all tags

I have this sample code:
<ul><li>aaa</li><li>bbb</li><li>ccc</li></ul>
I need to get the aaa, bbb, ccc tags, and I wrote this pattern:
/<a .* class=\"tag\">(.*?)<\/a>/
But this returns the wrong results.
What's happening, and how can I resolve it?
You made your second .* non-greedy, but not your first. Because of this greedy matching, it was matching everything from the opening <a right through to the end of the third opening <a. The simple fix is to make the first non-greedy too:
<a .*? class=\"tag\">(.*?)<\/a>
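Here is a small Python sketch of the difference. The anchor tags in the question's sample did not survive, so the markup below is a made-up stand-in (the href values in particular are pure assumption).

```python
import re

# Hypothetical markup: these <a> tags are an assumption for illustration only.
html = ('<a href="/t/aaa" class="tag">aaa</a>'
        '<a href="/t/bbb" class="tag">bbb</a>'
        '<a href="/t/ccc" class="tag">ccc</a>')

# First .* is greedy: it runs to the last class="tag"> in the string.
greedy_first = re.findall(r'<a .* class="tag">(.*?)</a>', html)

# Both quantifiers lazy: each match stays inside one anchor.
lazy_both = re.findall(r'<a .*? class="tag">(.*?)</a>', html)

print(greedy_first)
print(lazy_both)
```

With the greedy first quantifier only the last tag is captured; with both lazy, all three are.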
That said, depending on what you have available in your language of choice, and whether or not you're ever expecting an (even very slightly) different HTML string, an HTML parser might be a better choice.
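For what the parser route could look like, here is a minimal sketch using Python's standard html.parser module on the sample list from the question. The class name TagCollector and its attributes are my own invention, and a real page would likely need more careful handling.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the text found inside every <li> element."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False

    def handle_data(self, data):
        # Only keep text that appears between <li> and </li>.
        if self.in_li:
            self.items.append(data)


collector = TagCollector()
collector.feed("<ul><li>aaa</li><li>bbb</li><li>ccc</li></ul>")
collector.close()
print(collector.items)
```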

regex optional part in prefix, but do not include it in matches if it present

The problem is easier to see in code than to describe. I have the following regex:
(?<=First(Second)?)\w{5}
and the following sample data:
FirstSecondText1
FirstText2
I only want the matches Text1 and Text2. I get three, though: Secon is also matched, and I don't want that.
I played around but can't seem to get it to work.
You need an additional negative lookahead:
(?<=First(Second)?)(?!Second)\w{5}
If you want to avoid using Second twice, you could do it without lookaround and take the result of the first capturing group:
First(?:Second)?(\w{5})
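The capturing-group version is also the portable one: engines such as Python's re only accept fixed-width lookbehind, so the lookbehind pattern will not compile there. A minimal Python sketch of the group approach, with variable names of my own choosing:

```python
import re

# The two sample inputs from the question.
samples = ["FirstSecondText1", "FirstText2"]

# Consume the prefix (with optional "Second") and capture only the 5 word chars.
pattern = re.compile(r"First(?:Second)?(\w{5})")

results = [pattern.search(s).group(1) for s in samples]
print(results)
```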
You can try this regex: (?<=First(Second)?)\w{5}$. All you have to do is add a $ at the end so that the regex does not match the text Secon. This works as long as you are sure that the pattern you want comes at the end of the input text (or at the end of each line, if you match in multiline mode). In this case it is \w{5}$

Regex matching in ColdFusion OR condition

I am attempting to write a CF component that will parse wikiCreole text. I am having trouble getting the correct matches with some of my regular expressions though. I feel like if I can just get my head around the first one the rest will just click. Here is an example:
The following is sample input:
You can make things **bold** or //italic// or **//both//** or //**both**//.
Character formatting extends across line breaks: **bold,
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
My first attempt was:
<cfset out = REreplace(out, "\*\*(.*?)\*\*", "<strong>\1</strong>", "all") />
Then I realized that it would not match where the closing ** is not given, and that the match should end where there are two carriage returns.
So I tried this:
<cfset out = REreplace(out, "\*\*(.*?)[(\*\*)|(\r\n\r\n)]", "<strong>\1</strong>", "all") />
and it is close, but for some reason it gives you this:
You can make things <strong>bold</strong>* or //italic// or <strong>//both//</strong>* or //<strong>both</strong>*//.
Character formatting extends across line breaks: <strong>bold,</strong>
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
Any ideas?
PS: If anyone has any suggestions for better tags, or a better title for this post I am all ears.
The [...] represents a character class, so this:
[(\*\*)|(\r\n\r\n)]
is effectively the same as this:
[()*|\r\n]
i.e. it matches a single character (a "*", a "|", a parenthesis, a carriage return, or a linefeed), and the "|" isn't an alternation.
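The character-class behaviour is easy to confirm with a short Python sketch (Python's re treats brackets the same way here; the test strings are made up):

```python
import re

# Inside [...] the parentheses and the | are ordinary characters.
char_class = re.compile(r"[(\*\*)|(\r\n\r\n)]")
hits = char_class.findall("a*b|c(d)")

# A real alternation needs a group instead of brackets.
alternation = re.compile(r"(?:\*\*|\r\n\r\n)")
pairs = alternation.findall("a**b")

print(hits)
print(pairs)
```

The class matches one character at a time, which is why a single star was consumed and the second one was left behind in the output.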
Another problem is that you replace the double linefeed. Even if your match succeeded you would end up merging paragraphs. You need to either restore it or not consume it in the first place. I'd use a positive lookahead to do the latter.
In Perl I'd write it this way:
$string =~ s/\*\*(.*?)(?:\*\*|(?=\n\n))/<strong>$1<\/strong>/sg;
Taking a wild guess, the ColdFusion probably looks like this:
REreplace(out, "\*\*(.*?)(?:\*\*|(?=\r\n\r\n))", "<strong>\1</strong>", "all")
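Since the ColdFusion line is a guess, here is the same idea as a Python sketch that can actually be run. It assumes plain "\n" line endings rather than "\r\n", and the sample text is a shortened version of the input in the question.

```python
import re

text = ("You can make things **bold** or //**both**//.\n"
        "Line breaks are fine: **bold,\n"
        "this is still bold.\n"
        "\n"
        "Not bold.")

# Close on ** or, failing that, just before a blank line.
# The lookahead (?=\n\n) checks for the paragraph break without consuming it.
# re.DOTALL lets . cross single line breaks, like the /s flag in the Perl version.
bold = re.compile(r"\*\*(.*?)(?:\*\*|(?=\n\n))", re.DOTALL)

result = bold.sub(r"<strong>\1</strong>", text)
print(result)
```

Because the paragraph break is only looked at and never consumed, the two paragraphs stay separated after the replacement.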
You really should change your
(.*?)
to something like
[^*]*?
to match any character except the *. I don't know if that is the problem, but it could be that the any-character . is eating one of your stars. It is also a generally accepted "best practice", when trying to balance matching delimiters like the double star or HTML start/end tags, to explicitly exclude them from your match set for the inner text.
*Disclaimer, I didn't test this in ColdFusion for the nuances of the regex engine - but the idea should hold true.
I know this is an older question but in response to where Ryan Guill said "I tried the $1 but it put a literal $1 in there instead of the match" for ColdFusion you should use \1 instead of $1
I always use a regex web-page. It seems like I start from scratch every time I used regex.
Try using '$1' instead of \1 for this one - the replace is slightly different... but I think the pattern is what you need to get working.
Getting closer with this:
\*\*(.*?)\*\*|//(.*?)//
The tricky part is the //** or **//
Ok, first checking for **//both//**, then //**both**//, then **bold**, then //italic//:
\*\*//(.*?)//\*\*|//\*\*(.*?)\*\*//|\*\*(.*?)\*\*|//(.*?)//
I find this app immensely helpful when I'm doing anything with regex:
http://www.gskinner.com/RegExr/desktop/
Still doesn't help with your actual issue, but could be useful going forward.

Regex greedy issue

I'm sure this one is easy, but I've tried a ton of variations and still can't match what I need. The thing is being too greedy and I can't get it to stop being greedy.
Given the text:
test=this=that=more text follows
I want to just select:
test=
I've tried the following regex
(\S+)=(\S.*)
(\S+)?=
[^=]{1}
...
Thanks all.
here:
// matches "test=, test"
(\S+?)=
or
// matches "test=, test" too
(\S[^=]+)=
You should consider using the second version over the first. Given your string "test=this=that=more text follows", version 1 matches t, checks whether the next character is =, fails, extends the match by one character, and checks again, repeating until it has test and the = finally matches.
Version 2 consumes every character that is not = in a single greedy pass and stops at the first =. Note that it needs at least two characters before the =, and that [^=] also matches whitespace. You can see the efficiency gains in larger searches like multi-line or whole-document matches.
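Both versions can be checked with a short Python sketch. The second input string is my own addition, included only to show where the two patterns differ.

```python
import re

text = "test=this=that=more text follows"

# Version 1: lazy \S+? stops at the first = it can reach.
lazy = re.search(r"(\S+?)=", text)

# Version 2: one non-space, then everything that is not =, then =.
negated = re.search(r"(\S[^=]+)=", text)

print(lazy.group(0), lazy.group(1))
print(negated.group(0), negated.group(1))

# Made-up input with a space before the key: [^=] also matches whitespace,
# so version 2 starts earlier than version 1 here.
spaced = "a b=c"
lazy_spaced = re.search(r"(\S+?)=", spaced).group(0)
negated_spaced = re.search(r"(\S[^=]+)=", spaced).group(0)
print(lazy_spaced, "|", negated_spaced)
```

On the question's string both give test= with test in the group. They only diverge when whitespace appears before the first equals sign.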
You probably want something like
^(\S+?=)
The caret ^ anchors the regex to the beginning of the string. The ? after the + makes the + non-greedy.
You might be looking for the lazy quantifiers *?, +?, ??, and {n,m}?
You should be able to use this:
(\S+?)=(\S.*)
Lazy quantifiers work, but they also can be a performance hit because of backtracking.
Consider that what you really want is "a bunch of non-equals, an equals, and a bunch more non-equals."
([^=]+)=([^=]+)
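As a quick sanity check of the negated-class pattern, here is a minimal Python sketch on the string from the question (variable names are mine):

```python
import re

text = "test=this=that=more text follows"

# A run of non-equals, an equals sign, and another run of non-equals.
match = re.search(r"([^=]+)=([^=]+)", text)
key, value = match.groups()

print(key)
print(value)
```

Each group stops at the next equals sign on its own, so no lazy quantifier is needed.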
Your example [^=]{1} matches only a single non-equals character.
If you want only "test=", I think that a simple:
^(\w+=)
should be fine if you are sure that the string "test=" will always start the line.
The real problem is when the string is like this:
this=that= more test= text follows
If you use the regex above, the result is "this=", and if you modify it with a repetition quantifier at the end, like this:
^(\w+=)*
you get an oversized "this=that=", so I can only think of the trivial:
[th\w+=]*test=
Bye.