Regex greedy issue - regex

I'm sure this one is easy, but I've tried a ton of variations and still can't match what I need. The pattern is too greedy and I can't get it to stop being greedy.
Given the text:
test=this=that=more text follows
I want to just select:
test=
I've tried the following regex
(\S+)=(\S.*)
(\S+)?=
[^=]{1}
...
Thanks all.

here:
// matches "test=", capturing "test"
(\S+?)=
or
// also matches "test=", capturing "test"
(\S[^=]+)=
You should consider using the second version over the first. Given your string "test=this=that=more text follows", version 1 takes one character, checks whether the next one is =, and if not extends the match by one character and checks again, repeating until it reaches test=. (The greedy form (\S+)= is the one that first grabs test=this=that=more and then backtracks, settling on test=this=that=.)
Version 2 consumes the run of non-= characters in one pass, matches test=, and stops. Note that it needs at least two characters before the = and that [^=] also matches whitespace. You can see the efficiency gains in larger searches like multi-line or whole-document matches.
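To see what each variant actually returns on the sample string, here is a minimal sketch. Python's re module is an assumption (the question does not say which engine is in use), and the greedy (\S+)= is included only for comparison.

```python
import re

text = "test=this=that=more text follows"

# Greedy: \S+ first runs to the end of "more", then backs off to the last "=".
greedy_match = re.search(r"(\S+)=", text).group(0)

# Lazy: \S+? grows one character at a time and stops at the first "=".
lazy_match = re.search(r"(\S+?)=", text).group(0)

# Negated class: [^=]+ cannot cross an "=", so it also stops at the first one.
negated_match = re.search(r"(\S[^=]+)=", text).group(0)

print(greedy_match)   # test=this=that=
print(lazy_match)     # test=
print(negated_match)  # test=
```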

You probably want something like
^(\S+?=)
The caret ^ anchors the regex to the beginning of the string. The ? after the + makes the + non-greedy.

You might be looking for lazy quantifiers: *?, +?, ??, and {n,m}?

You should be able to use this:
(\S+?)=(\S.*)

Lazy quantifiers work, but they also can be a performance hit because of backtracking.
Consider that what you really want is "a bunch of non-equals, an equals, and a bunch more non-equals."
([^=]+)=([^=]+)
Your example [^=]{1} only matches a single non-equals character.
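As a rough illustration of the negated-class approach, the sketch below runs it on the question's sample string. Python's re module is an assumption, and the variable names are made up for the example.

```python
import re

text = "test=this=that=more text follows"

# "A bunch of non-equals, an equals, and a bunch more non-equals."
pair = re.search(r"([^=]+)=([^=]+)", text)
key, value = pair.groups()

# If only the leading "test=" is wanted, the first half is enough.
first_key = re.match(r"[^=]+=", text).group(0)

# [^=]{1} is just one non-equals character.
single = re.search(r"[^=]{1}", text).group(0)

print(key, value)  # test this
print(first_key)   # test=
print(single)      # t
```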

If you want only "test=", I think a simple:
^(\w+=)
should be fine, as long as you are sure that "test=" always starts the line.
The real problem is when the string looks like this:
this=that= more test= text follows
If you use the regex above, the result is "this=", and if you add a repetition quantifier at the end, like this:
^(\w+=)*
you get the whole "this=that=", so the only thing I could come up with is the trivial:
[th\w+=]*test=
Note that [th\w+=] is a character class here, not a group.
Bye.
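The three patterns from this answer can be checked against the second sample string with a short sketch like this one (Python's re module is an assumption, not something the answer specifies):

```python
import re

line = "this=that= more test= text follows"

# Anchored, single key: stops after the first "word=".
first_only = re.search(r"^(\w+=)", line).group(0)

# Anchored and repeated: swallows every consecutive "word=" at the start.
repeated = re.search(r"^(\w+=)*", line).group(0)

# The "trivial" pattern: the leading character class ends up matching
# nothing here, so the result is the literal "test=".
trivial = re.search(r"[th\w+=]*test=", line).group(0)

print(first_only)  # this=
print(repeated)    # this=that=
print(trivial)     # test=
```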

Related

Matching multiple letters and special characters in regex

I am trying to catch strings around the acronym ADJ. The strings look like this:
·NOM·JJ·ADJ+CASE_DEF_GEN
·NOM·JJ·ADJ+CASE_DEF_ACC
·NOM·JJ·ADJ+CASE_INDEF_GEN
·NOM·DT+JJ·DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN
·NOM·JJ·ADJ+CASE_INDEF_GEN
·NOM·JJ·ADJ+NSUFF_FEM_SG+CASE_INDEF_GEN
·NOM·DT+JJ·DET+ADJ+NSUFF_FEM_SG+CASE_DEF_ACC
So far I have this:
/[A-Z·\+#_]*?[·\+]ADJ[·\+][A-Z_·\+#]*?/g
But it only matches from the beginning of each string up to "ADJ+", for example ·NOM·DT+JJ·DET+ADJ+.
Since the rest of each string after ADJ has the same composition as the part before ADJ, I thought this /[A-Z·\+#_]*?[·\+]/g should work, but it doesn't.
How do I get it to match the rest of the string?
My guess is that you want to check whether the string contains ADJ. If so, maybe we could simplify the expression to something like:
([A-Z·+#_]*)\bADJ\b([A-Z·+#_]*)
That *? quantifier after the +ADJ+ phrase is satisfied with the empty string right after it, since the ? makes the quantifier before it match "the minimum number of times possible" and for * that is zero times.
So drop that ?; it serves no purpose when you want the rest of the line:
perl -wE'$_=q(-XADJX-JJ+ADJ-REST-);
($before, $after) = /(.*?)[+\-]ADJ[+\-](.*)/;
say for $before,$after'
Removing the ? at the end makes the pattern match the whole string:
/[A-Z·\+#_]*?[·\+]ADJ[·\+][A-Z_·\+#]*/g
I am not entirely sure why you needed a ? after the *.
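The effect of the trailing ? can be reproduced with a small sketch. It uses Python's re module (an assumption, since the question shows a /.../g literal) and one of the sample strings from the question.

```python
import re

s = "·NOM·DT+JJ·DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN"

# Trailing *? is satisfied by zero characters, so the match ends at "ADJ+".
lazy_tail = re.search(r"[A-Z·\+#_]*?[·\+]ADJ[·\+][A-Z_·\+#]*?", s).group(0)

# Trailing * is greedy and consumes the rest of the tag string.
greedy_tail = re.search(r"[A-Z·\+#_]*?[·\+]ADJ[·\+][A-Z_·\+#]*", s).group(0)

print(lazy_tail)    # ·NOM·DT+JJ·DET+ADJ+
print(greedy_tail)  # ·NOM·DT+JJ·DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN
```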

Regex to remove a whole phrase from the match

I am trying to remove a whole phrase from my regex(PCRE) matches
if given the following strings
test:test2:test3:test4:test5:1.0.department
test:test2:test3:test4:test5:1.0.foo.0.bar
user.0.display
"test:test2:test3:test4:test5:1.0".division
I want to write regex that will return:
.department
.foo.0.bar
user.0.display
.division
Now I thought a good way to do this would be to match everything and then remove test:test2:test3:test4:test5:1.0 and "test:test2:test3:test4:test5:1.0" but I am struggling to do this
I tried the following
\b(?!(test:test2:test3:test4:test5:1\.0)|("test:test2:test3:test4:test5:1\.0"))\b.*
but this seems to just remove the first test from each and that's all. Could anyone help with where I am going wrong, or suggest a better approach?
I suggest searching for the following pattern:
"?test:test2:test3:test4:test5:1\.0"?
and replacing with an empty string.
The quotation marks on both ends are made optional with a ? (1 or 0 times) quantifier.
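A minimal sketch of that search-and-replace, written with Python's re module as a stand-in for PCRE (an assumption; the pattern itself uses nothing engine-specific):

```python
import re

lines = [
    'test:test2:test3:test4:test5:1.0.department',
    'test:test2:test3:test4:test5:1.0.foo.0.bar',
    'user.0.display',
    '"test:test2:test3:test4:test5:1.0".division',
]

# Optional quote, the literal phrase (with the dot escaped), optional quote.
prefix = re.compile(r'"?test:test2:test3:test4:test5:1\.0"?')

# Replace every occurrence of the phrase with the empty string.
cleaned = [prefix.sub("", line) for line in lines]

for line in cleaned:
    print(line)
```

Lines that do not contain the phrase, such as user.0.display, pass through unchanged.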

Get all matches for a certain pattern using RegEx

I am not really a RegEx expert and hence asking a simple question.
I have a few parameters that I need to use which are in a particular pattern
For example
$$DATA_START_TIME
$$DATA_END_TIME
$$MIN_POID_ID_DLAY
$$MAX_POID_ID_DLAY
$$MIN_POID_ID_RELTM
$$MAX_POID_ID_RELTM
And these will be replaced at runtime in a string with their values (a SQL statement).
For example I have a simple query
select * from asdf where asdf.starttime = $$DATA_START_TIME and asdf.endtime = $$DATA_END_TIME
Now when I try to use the RegEx pattern
\$\$[^\W+]\w+$
I do not get all the matches (I only get the last match).
I am trying to test my usage here https://regex101.com/r/xR9dG0/2
If someone could correct my mistake, I would really appreciate it.
Thanks!
This will do the job:
/\$\$\w+/g
Just some clarifications on why your regex does what it does:
\$\$[^\W+]\w+$
An unescaped $ means end of string, so your pattern only matches something at the end of the string; that's why it only gets the last match.
The character class [^\W+] doesn't really make sense. A class starting with [^ negates the characters inside it, \W means non-word characters, and a + inside a class is a literal +. So you are saying: match one character that is not a non-word character and not a + sign, which is just a single word character. I guess that was not what you wanted.
To match the next word just \w+ will do it. And the global modifier /g ensures that you will not stop on the first match.
This should work, based on what you said you wanted to match. It won't match $$lower_case_strings, if that's what you wanted; if not, add the "i" flag as well.
/\${2}[A-Z_]+/g
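The difference between the original pattern and the two suggested ones can be seen in a short sketch. It uses Python's re module, which is an assumption (the question tests on regex101); re.findall plays the role of the /g flag.

```python
import re

query = ("select * from asdf where asdf.starttime = $$DATA_START_TIME "
         "and asdf.endtime = $$DATA_END_TIME")

# Original pattern: the trailing $ pins the match to the end of the string.
anchored = re.findall(r"\$\$[^\W+]\w+$", query)

# Suggested patterns: no anchor, so every parameter is found.
all_params = re.findall(r"\$\$\w+", query)
upper_only = re.findall(r"\${2}[A-Z_]+", query)

print(anchored)    # ['$$DATA_END_TIME']
print(all_params)  # ['$$DATA_START_TIME', '$$DATA_END_TIME']
print(upper_only)  # ['$$DATA_START_TIME', '$$DATA_END_TIME']
```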

Regex — only zero or one 's'

I have a name, "foo bar", and in any string, foo, foos, bar and bars should be matched.
I thought this should work like this: (foo|bar)s?. I tried some other regexes as well, but they all were like this. How can I do this?
(foo|bar)s? is correct...
You should use a boundary like \b(foo|bar)s?\b. Else it would also match hihellofoos.
Your question seems to reflect perplexity over why you found a match in foosss. Note the difference between finding a match in a string, and matching the whole string.
You have several ways of dealing with this, and the right choice depends on your application.
Anchor the regex to the whole input line or input: ^(foo|bar)s?$
Anchor the regex to one word: \b(foo|bar)s?\b
Some APIs (but not preg_match) have a separate function to match the whole string.
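The options above can be compared in a small sketch. Python is used here as an assumption (the answers mention preg_match, i.e. PHP); re.fullmatch is an example of the kind of separate whole-string function mentioned in the last point.

```python
import re

pattern = r"(foo|bar)s?"

# Searching finds a match *inside* "foosss": the leading "foos".
found_inside = re.search(pattern, "foosss").group(0)

# Matching the whole string fails, because of the extra "ss".
whole_string = re.fullmatch(pattern, "foosss")

# Word boundaries reject both "foosss" and "hihellofoos".
sample = "foo foos bar bars hihellofoos foosss"
words = [m.group(0) for m in re.finditer(r"\b(foo|bar)s?\b", sample)]

print(found_inside)  # foos
print(whole_string)  # None
print(words)         # ['foo', 'foos', 'bar', 'bars']
```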

Little vim regex

I have a bunch of strings that look like this: '../DisplayPhotod6f6.jpg?t=before&tn=1&id=130', and I'd like to take out everything after the question mark, to look like '../DisplayPhotod6f6.jpg'.
s/\(.\.\.\/DisplayPhoto.\{4,}\.jpg\)*'/\1'/g
This regex is capturing some but not all occurrences. Can you see why?
\.\{4,} is trying to match 4 or more . characters. What it looks like you wanted is "match 4 or more of any character" (.\{4,}) but "match 4 or more non-. characters" ([^.]\{4,}) might be more accurate. You'll also need to change the lone * at the end of the pattern to .* since the * is currently applying to the entire \(\) group.
I think the easiest way to go about this is:
s/?.*$/'/g
This says: replace the question mark and everything after it with a single quote.
I would use macros, which are sometimes simpler than a regexp (and interactive):
qa
/DisplayPhoto<Enter>
f?dt'
n
q
And then some @a, or 20000@a to go through all lines.
The following regexp: /(\.\.\/DisplayPhoto.*\.jpg)/gi
tested against the following examples:
../DisplayPhotocef3.jpg?t=before&tn=1&id=54
../DisplayPhotod6f6.jpg?t=before&tn=1&id=130
yields:
../DisplayPhotocef3.jpg
../DisplayPhotod6f6.jpg
%s/\('\.\.\/DisplayPhoto\w\{4,}\.jpg\).*'/\1'/g
Some notes:
% will cause the swap to work on all lines.
\w instead of '.', in case there are some malformed file names.
Replace '.' at the start of your matching regex with ' which is exactly what it should be matching.
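For checking the substitution outside Vim, here is a rough Python equivalent. It is only a sketch under a few assumptions: Python's re syntax replaces Vim's escaped \( \) and \{ \}, the tail uses [^']* rather than .* so that it cannot run past the closing quote, and the first sample string has been given quotes to match the format described in the question.

```python
import re

lines = [
    "'../DisplayPhotocef3.jpg?t=before&tn=1&id=54'",
    "'../DisplayPhotod6f6.jpg?t=before&tn=1&id=130'",
]

# Group 1 keeps the opening quote and the file name; the rest, up to and
# including the closing quote, is replaced by a single quote.
pattern = re.compile(r"('\.\./DisplayPhoto\w{4,}\.jpg)[^']*'")
stripped = [pattern.sub(r"\1'", line) for line in lines]

for line in stripped:
    print(line)
```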