matching in between a long sentence with keywords - regex

target sentence:
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system;$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host;$(SolDir)..\..\ABC\ccc\1234\components\fds\ab_cdef_1.0\host; $(SolDir)..\..\ABC\ccc\1234\somethingelse;
How should I construct my regex to extract the items that contain "..\..\ABC\ccc\1234\ccc_am_system"?
Basically, I want to extract all of these folders, and maybe more; they are all under \ABC\ccc\1234\ccc_am_system:
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host\abc;
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host\123\123\123\123;
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host;
My current regex doesn't work and I can't figure out why:
\$.*ccc\\1234\.*;

Your problem is most likely that * is a greedy operator. It's greedily matching more than you intend it to. In many regex dialects, *? is the reluctant operator. I would first try using it like this:
\$.*?ccc\\1234.*?;
You can read up a bit more on greedy vs reluctant operators in this question.
If that doesn't work, you can try to be more specific about the characters you match than the dot (.). For example, you can match every non-semicolon character with an expression like this: [^;]*. You could use that idea this way:
\$[^;]*ccc\\1234[^;]*;
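As a rough illustration of the difference, here is a minimal Python sketch. Python's re module is an assumption, since the question does not say which engine is in use. It runs the original greedy pattern and the negated-class pattern against the target sentence, with \\ccc_am_system appended to the keyword (my addition) to narrow the results to that folder.

```python
import re

# Target sentence from the question, as raw strings so backslashes stay literal.
target = (
    r"$(SolDir)..\..\ABC\ccc\1234\ccc_am_system;"
    r"$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host;"
    r"$(SolDir)..\..\ABC\ccc\1234\components\fds\ab_cdef_1.0\host; "
    r"$(SolDir)..\..\ABC\ccc\1234\somethingelse;"
)

# Greedy: .* grabs as much as it can, so everything from the first $
# to the last ; ends up in a single match.
greedy = re.findall(r"\$.*ccc\\1234.*;", target)

# Negated class: [^;]* can never cross a semicolon, so every match
# stays inside one item of the list.
per_item = re.findall(r"\$[^;]*ccc\\1234\\ccc_am_system[^;]*;", target)

print(len(greedy))
for item in per_item:
    print(item)
```

Because [^;]* cannot step over a semicolon, the items under components and somethingelse are skipped rather than swallowed into a neighbouring match.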

The below regex would store the captured strings inside group 1.
(\$.*?ccc\\1234\\.*?;)
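You need to make the * quantifier do a shortest (non-greedy) match by adding ? right after it. Also, the \.* in your pattern matches a literal dot zero or more times, which is not what you intended; to match a backslash followed by anything, write \\.* instead.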

I found this to be the best:
\$(.[^\$;])*ccc\\1234(.[^\$;])*;
It doesn't allow any overmatch whatsoever. If I use ?, it still matches $ or ; more than once for some reason, but with the above expression that will never be the case. Still, thanks to all those who took the time to answer my question.

Related

How to extract characters from a string with optional string afterwards using Regex?

I am in the process of learning Regex and have been stuck on this case. I have a url that can be in two states EXAMPLE 1:
spotify.com/track/1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA
OR EXAMPLE 2:
spotify.com/track/1HYcYZCOpaLjg51qUg8ilA
I need to extract the 1HYcYZCOpaLjg51qUg8ilA ID
So far I am using this: (?<=track\/)(.*)(?=\?)? which works well for Example 2 but it includes the ?si=Nf5w1q9MTKu3zG_CJ83RWA when matching with Example 1.
BUT if I remove the ? at the end of the expression then it works for Example 1 but not Example 2! Doesn't that mean that last group (?=\?) is optional and should match?
Where am I going wrong?
Thanks!
I searched a handful of "Questions that may already have your answer" suggestions from SO, and didn't find this case, so I hope asking this is okay!
The capturing group in your regular expression is trying to match anything (.) as much as possible due to the greediness of the quantifier (*).
When you use:
(?<=track\/)(.*)(?=\?)
only 1HYcYZCOpaLjg51qUg8ilA from the first example is captured, as there is no question mark in your second example.
When using:
(?<=track\/)(.*)(?=\??)
You are effectively making the positive lookahead optional, so the capturing group will try to match as much as possible (including the question mark), so that 1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA and 1HYcYZCOpaLjg51qUg8ilA are matched, which is not the desired output.
Rather than matching anything, it is perhaps more appropriate for you to match alphanumerical characters \w only.
(?<=track\/)(\w*)(?=\??)
Alternatively, if you are expecting other characters, let's say a hyphen - or an underscore _, you may use a character class.
(?<=track\/)([a-zA-Z0-9_-]*)(?=\??)
Or you might want to capture everything except a question mark ? with a negated character class.
(?<=track\/)([^?]*)(?=\??)
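A small Python sketch of the point above, assuming Python's re module (the question does not name an engine). It compares the greedy (.*) group with the negated character class on both example URLs.

```python
import re

urls = [
    "spotify.com/track/1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA",
    "spotify.com/track/1HYcYZCOpaLjg51qUg8ilA",
]

# Greedy group: .* runs to the end of the string, and the optional
# lookahead is then satisfied by matching nothing.
greedy = re.compile(r"(?<=track/)(.*)(?=\??)")

# Negated class: [^?]* stops at the first question mark, if there is one.
negated = re.compile(r"(?<=track/)([^?]*)(?=\??)")

greedy_ids = [greedy.search(u).group(1) for u in urls]
negated_ids = [negated.search(u).group(1) for u in urls]

print(greedy_ids)
print(negated_ids)
```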
As pointed out by gaganso, a look-behind is not necessary in this situation (or indeed the lookahead), however it is indeed a good idea to start playing around with them. The look-around assertions do not actually consume the characters in the string. As you can see here, the full match for both matches only consists of what is captured by the capture group. You may find more information here.
This should work:
track\/(\w+)
Please see here.
Since track is part of both the strings, and the ID is formed from alphanumeric characters, the above regex which matches the string "track/" and captures the alphanumeric characters after that string, should provide the required ID.
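For what it's worth, the capture-group approach could be wrapped like this in Python. The function name extract_track_id is made up for the example, and Python's re module is an assumption.

```python
import re
from typing import Optional

# "track/" is matched literally; the ID is whatever run of word
# characters follows it, captured in group 1.
TRACK_ID = re.compile(r"track/(\w+)")

def extract_track_id(url: str) -> Optional[str]:
    """Return the track ID from a URL, or None if there is no track segment."""
    match = TRACK_ID.search(url)
    return match.group(1) if match else None

print(extract_track_id("spotify.com/track/1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA"))
print(extract_track_id("spotify.com/track/1HYcYZCOpaLjg51qUg8ilA"))
```

Since ? is not a word character, \w+ stops on its own and no lookahead is needed.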
Regex: (\w+(?=\?))|(\w+$)
See the demo for the regex: https://regexr.com/3s4gv
This first tries to find a word that is immediately followed by '?'; if that is unsuccessful, it falls back to the last word in the string.

Ant regex expression

Quite a simple one in theory but can't quite get it!
I want a regex in ant which matches anything as long as it has a slash on the end.
Below is what I expect to work
<regexp id="slash.end.pattern" pattern="*/"/>
However this throws back
java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*/
^
I have also tried escaping this to \*, but that matches a literal *.
Any help appreciated!
Your original regex pattern didn't work because * is a special character in regex that is only used to quantify other characters.
The pattern (.)*/$, which you mentioned in your comment, will match any string of characters not containing newlines, however it uses a possibly unnecessary capturing group. .*/$ should work just as well.
If you need to match newline characters, the dot . won't be enough. You could try something like [\s\S]*/$
On that note, it should be mentioned that you might not want to use $ in this pattern. Suppose you have the following string:
abc/def/
Should this be evaluated as two matches, abc/ and def/? Or is it a single match containing the whole thing? Your current approach creates a single match. If instead you would like to search for strings of characters and then stop the match as soon as a / is found, you could use something like this: [\s\S]*?/.
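To make the difference concrete, here is a quick sketch in Python rather than Ant. Ant hands the pattern to java.util.regex, so treat this only as an approximation; for these particular patterns I would expect the two engines to agree.

```python
import re

text = "abc/def/"

# A bare leading * has nothing to quantify, so compiling it fails
# here too (Python reports "nothing to repeat").
try:
    re.compile("*/")
    bare_star_compiles = True
except re.error:
    bare_star_compiles = False

# Greedy and anchored with $: one match covering the whole string.
whole = re.findall(r".*/$", text)

# Lazy and unanchored: the match stops at the first slash each time.
pieces = re.findall(r"[\s\S]*?/", text)

print(bare_star_compiles, whole, pieces)
```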

Regular Expressions, getting digit after second occurrence of dot

I want to get the number after the second dot in a string like this:
4.5.3. Some kind of question ? but the input string might also look like this: 41.53.32. Some kind of question ? So I'm aiming for 3 in the first example and 32 in the second example.
I'm trying to do it with
(?<=(\.\d\.))[0-9]+
and it works on the first example, but when I try to change it to (?<=(\.\d+\.))[0-9]+
it doesn't work at all.
If there is always a dot after the final number then you can use the following expression:
\d+(?=\.(?:[^\d]|$))
This will match one or more digits \d+ which are followed by a dot . and then something that is either not a digit [^\d] or the end-of-string $, i.e. (?=\.(?:[^\d]|$)).
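A quick check of this expression against the two sample strings, sketched in Python (an assumption; the question does not say which engine is used):

```python
import re

# One or more digits, followed by a dot that is itself followed by
# a non-digit or by the end of the string.
pattern = re.compile(r"\d+(?=\.(?:[^\d]|$))")

samples = [
    "4.5.3. Some kind of question ?",
    "41.53.32. Some kind of question ?",
]

found = [pattern.search(s).group() for s in samples]
print(found)
```

Note that this relies on the dot after the final number; it does not count dots, so it would also pick up a lone number such as "7." at the start of a string.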
If you use Perl or PHP, you can try this pattern:
(?:\d+\.){2}\K\d+
The simplest complete answer is probably something like this:
(?<=^(?:[^.]*\.){2})\d+
If you're at all worried about performance, this one will be slightly faster:
^(?:[^.]*\.){2}(\d+)
This one will capture the desired value in capturing group 1.
If you are using an engine that doesn't support variable-length lookbehind, you'll need to use the second version.
If you wish, you can replace [^.] with \d, to only match digits.
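Python's built-in re module is one of the engines that requires fixed-width lookbehind (and it has no \K), so there the capturing-group version is the one to use. A minimal sketch, with a made-up helper name number_after_second_dot:

```python
import re
from typing import Optional

# Skip exactly two "anything up to and including a dot" chunks from the
# start of the string, then capture the digits that follow.
AFTER_SECOND_DOT = re.compile(r"^(?:[^.]*\.){2}(\d+)")

def number_after_second_dot(text: str) -> Optional[str]:
    """Return the digits right after the second dot, or None if absent."""
    match = AFTER_SECOND_DOT.search(text)
    return match.group(1) if match else None

print(number_after_second_dot("4.5.3. Some kind of question ?"))
print(number_after_second_dot("41.53.32. Some kind of question ?"))
```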
(\d+\.\d+\.)\K\d+
This matches digits, a dot, digits, a dot, and then digits; \K drops the first section (the group) from the reported match.
The following regex should work for your use case:
(?:(?:.*\.)?){2}(\d+)

How come this RegEx isn't working quite right?

I have this RegEx here:
/^function(\d)$/
It matches function(5) but not function(55). How come?
The other posters are correct about the +, but what language are you using to parse the regular expression? Shouldn't you have to escape the ()? Otherwise they form a capturing group around the digit(s) instead of matching literal parentheses.
I would think you would need...
/^function\(\d+\)$/
/^function(\d+)$/
You need to add the + so that \d matches one or more digits. + is greedy, so it matches as many digits as possible. (Assuming that is what you are after, note that it would match
function(3242345235234235235234234234535325234235235234523) as well as function(55).)
Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with less matches of the preceding item, up to the point where the preceding item is matched only once.
referring to +
http://www.regular-expressions.info/reference.html
Because you only gave it one \d. If you want to match more than one digit, tell it so.
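Both points raised above (the missing + and the unescaped parentheses) can be seen in a short Python sketch. Python is an assumption here, since the question does not name a language, and other engines may treat the slashes and parentheses differently.

```python
import re

# Unescaped parentheses form a capturing group, so this pattern expects
# the text "function" followed directly by exactly one digit.
unescaped = re.compile(r"^function(\d)$")

# Escaped parentheses are literal characters, and \d+ allows one or
# more digits between them.
escaped = re.compile(r"^function\(\d+\)$")

for text in ["function5", "function(5)", "function(55)"]:
    print(text, bool(unescaped.search(text)), bool(escaped.search(text)))
```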

How to continue a match in Regex

price:(?:(?:\d+)?(?:\.)?\d+|min)-?(?:(?:\d+)?(?:\.)?\d+|max)?
This Regex matches the following examples correctly.
price:1.00-342
price:.1-23
price:4
price:min-900.00
price:.10-.50
price:45-100
price:453.23-231231
price:min-max
Now I want to improve it to match these cases.
price:4.45-8.00;10.45-14.50
price:1.00-max;3-12;23.34-12.19
price:1.00-2.50;min-12;23.34-max
Currently the match stops at the semi colon. How can I get the regex to repeat across the semi-colon dividers?
Final Solution:
price:(((\d*\.)?\d+|min)-?((\d*\.)?\d+|max)?;?)+
Add an optional ; at the end, and make the whole pattern to match one or more:
price:((?:(?:\d+)?(?:\.)?\d+|min)-?(?:(?:\d+)?(?:\.)?\d+|max)?;?)+
(?:\d+)? is the same thing as \d*, and (?:\.)? can just be \.?. Simplified, your original regex is:
price:(?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?
You have two choices. You can either do price([:;]range)* where range is the regex you have for matching number ranges, or be more precise about the punctuation but have to write out range twice and do price:range(;range)*.
price([:;]range)* -- shorter but allows first ':' to be ';'
price:range(;range)* -- longer but gets colon vs semi-colon correct
Pick one of these two regexes:
price(?:[:;](?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?)*
price:(?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?(?:;(?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?)*
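If the regex is built in code, the price:range(;range)* shape does not have to be typed out twice. Here is a sketch in Python (an assumption, as the question does not name a language); NUM, RANGE and PRICE are names invented for the example, and NUM uses the (?:\d*\.)?\d+ form for a number.

```python
import re

# Building blocks; these names are illustrative only.
NUM = r"(?:\d*\.)?\d+"                        # 4, .10, 453.23
RANGE = rf"(?:{NUM}|min)(?:-(?:{NUM}|max))?"  # 1.00-342, min-max, 4

# price:range(;range)* with the range fragment reused.
PRICE = re.compile(rf"price:{RANGE}(?:;{RANGE})*")

examples = [
    "price:1.00-342", "price:.1-23", "price:4", "price:min-900.00",
    "price:.10-.50", "price:45-100", "price:453.23-231231", "price:min-max",
    "price:4.45-8.00;10.45-14.50",
    "price:1.00-max;3-12;23.34-12.19",
    "price:1.00-2.50;min-12;23.34-max",
]

for example in examples:
    print(example, bool(PRICE.fullmatch(example)))
```

fullmatch is used so that a string only counts when the whole of it fits the pattern, not just a prefix.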
First there are some issues with your regular expression: to match xx.yyy instead of the expression (?:\d+)?(?:\.)?\d+ you can use this (?:\d*\.)?\d+. This can only match in one way so it avoids unnecessary backtracking.
Also currently your regular expression matches things like price:minmax and price:1.2.3 which I assume you do not want to match.
The simple way to repeat your match is to add a semi-colon and then repeat your regular expression verbatim.
You can do it like this, though, to avoid writing out the entire regular expression twice:
price:(?:(?:(?:\d*\.)?\d+|min)(?:-(?:(?:\d*\.)?\d+|max))?(?:;|$))*
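A rough check of this pattern in Python (the Rubular demo is Ruby, so take this as an approximation of the same behaviour). PRICE_LIST is a name invented for the example.

```python
import re

PRICE_LIST = re.compile(
    r"price:(?:(?:(?:\d*\.)?\d+|min)(?:-(?:(?:\d*\.)?\d+|max))?(?:;|$))*"
)

candidates = [
    "price:4.45-8.00;10.45-14.50",
    "price:1.00-max;3-12;23.34-12.19",
    "price:1.00-2.50;min-12;23.34-max",
    "price:minmax",   # no dash between min and max
    "price:1.2.3",    # too many dots
    "price:",         # zero ranges
    "price:4;",       # trailing semicolon
]

# fullmatch requires the entire string to fit the pattern.
results = {c: bool(PRICE_LIST.fullmatch(c)) for c in candidates}
for candidate, ok in results.items():
    print(candidate, ok)
```

Because the group is repeated with * and each range may end in ; or at the end of the string, a bare "price:" and a trailing semicolon are accepted as well. If that is not wanted, the pattern would need tightening.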
See it in action on Rubular.
price:((?:(?:\d+)?(?:\.)?\d+|min)-?(?:(?:\d+)?(?:\.)?\d+|max)?;?)+
I'm not sure what's up with all of the ?'s (I know the syntax, I just don't know why you're using it so much), but that should do it for you.