Regex match two words or at least one - regex

I have a problem with my regex. I have two kinds of input strings, as follows:
2.3.8.2.2.1.2.3.4.12345 = WORDS: "String to capture"
2.3.8.2.2.1.2.3.4.12345 = ""
Regex:
1\.2\.3\.4\.(\d+) = WORDS: (?|"([^"]*)|([^:]*))
https://regex101.com/r/kQ3wT5/10 - matching
https://regex101.com/r/kQ3wT5/9 - Not matching
This regex matches only the first string and not the second, where the value is an empty string. The regex has to match in both scenarios. One more thing: I really don't want to use a "global" match.
Please help me with this.

You need to make WORDS:<space> optional by enclosing it with an optional non-capturing group:
1\.2\.3\.4\.(\d+) = (?:WORDS: )?(?|"([^"]*)|([^:]*))
See the regex demo.
The (?:WORDS: )? matches 1 or 0 occurrences (due to the ? quantifier) of the WORDS: substring followed by a space.
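A minimal Python sketch of the optional non-capturing group. Python's built-in `re` module does not support the branch-reset group `(?|...)` used above, so this sketch assumes the value is always quoted and keeps only that alternative. The names `PATTERN` and `parse` are illustrative and not from the question.

```python
import re

# Python's `re` has no branch-reset group (?|...), so only the quoted
# alternative is kept here (an assumption of this sketch).
PATTERN = re.compile(r'1\.2\.3\.4\.(\d+) = (?:WORDS: )?"([^"]*)')

def parse(line):
    """Return (number, quoted text) or None when the line does not match."""
    m = PATTERN.search(line)
    return m.groups() if m else None

print(parse('2.3.8.2.2.1.2.3.4.12345 = WORDS: "String to capture"'))
print(parse('2.3.8.2.2.1.2.3.4.12345 = ""'))
```

Because `(?:WORDS: )?` may match nothing, both input shapes go through the same pattern and the second group is simply empty for the second line.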

Related

Regex match 10 characters after second pattern

I would like to match 10 characters after the second pattern:
My String:
www.mysite.de/ep/3423141549/ep/B104RHWZZZ?something
What I want to be matched:
B104RHWZZZ
What the regex currently matches:
B104RHWZZZ?something
Currently, my Regex looks like this:
(?<=\/ep\/)(?:(?!\/ep\/).)*$
Could someone help me to change the regex that it only matches 10 characters after the second "/ep/" ("B104RHWZZZ")?
It depends on which characters you allow to match. If you want to allow 10 non-whitespace characters other than / or ?, then you could use:
(?<=\/ep\/)[^\/?\s]{10}(?=[^\/\s]*$)
Explanation
(?<=\/ep\/) Assert /ep/ directly to the left
[^\/?\s]{10} Match 10 times any non whitespace character except for / and ?
(?=[^\/\s]*$) Assert no more occurrence of / to the right
Regex demo
Or matching 1+ chars other than / ? & instead of exactly 10:
(?<=\/ep\/)[^\/?&\s]+(?=[^\/\s]*$)
Regex demo
This would match the string as matching group 1:
ep\/\w+\/ep\/(\w+)
https://regex101.com/r/9tUjxG/1
While lookarounds can make this expression more sophisticated so that you won't need capture groups, they make (in my experience) the expression harder to read, understand and maintain/extend.
That's why I would always keep regexes as simple as possible.

How to exclude a specific string with REGEX? (Perl)

For example, I have these strings
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLETEA1B
APPLEWINE3B
APPLEWINE1C
I want all of these strings except those that have TEA or WINE1C in them.
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLEWINE3B
I've already tried the following, but it didn't work:
^APPLE(?!.*(?:TEA|WINE1C)).*$
Any help is appreciated as I'm also kinda new to this.
If you indeed have multiple strings as you claim, there's no need to jam all that into one regex pattern.
/^APPLE/ && !/TEA|WINE1C/
If you have a single string, the best approach is probably to split it into lines (split /\n/), but you could also use a single regex match:
/^APPLE(?!.*(?:TEA|WINE1C)).*/mg
You can use
^APPLE(?!.*TEA)(?!.*WINE1C).*
See the regex demo.
Details:
^ - start of string
APPLE - a fixed string
(?!.*TEA) - no TEA allowed anywhere to the right of the current location
(?!.*WINE1C) - no WINE1C allowed anywhere to the right of the current location
.* - any zero or more chars other than line break chars as many as possible.
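As a quick check, the pattern above can be applied to the question's sample strings in Python. This is only a sketch, and the variable names are illustrative.

```python
import re

names = ["APPLEJUCE1A", "APPLETREE2B", "APPLECAKE3C",
         "APPLETEA1B", "APPLEWINE3B", "APPLEWINE1C"]

# Two independent negative lookaheads: neither TEA nor WINE1C may
# appear anywhere after APPLE.
KEEP = re.compile(r"^APPLE(?!.*TEA)(?!.*WINE1C).*")

kept = [n for n in names if KEEP.match(n)]
print(kept)
```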
If you only want to reject strings that contain both of them (a case which is not in the current example data):
^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*
Explanation
^ Start of string
APPLE match literally
(?! Negative lookahead
.*(WINE1C|TEA) Capture either one of the values in group 1
.* Match 0+ characters
(?!\1)(?:TEA|WINE1C) Match either one of the values as long as it is not the same as previously matched in group 1
) Close the lookahead
.* Match the rest of the line
Regex demo
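A hedged Python sketch of the backreference variant. The strings containing both words are made up for illustration, since the question's data has no such case, and `allowed` is an illustrative name.

```python
import re

# Reject only strings that contain BOTH words: group 1 captures one of
# them, and (?!\1) forces the later word to be the other one.
ONLY_ONE = re.compile(r"^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*")

def allowed(name):
    return ONLY_ONE.match(name) is not None

# The last two inputs are hypothetical; they do not come from the question.
for name in ["APPLEJUCE1A", "APPLETEA1B", "APPLETEAWINE1C", "APPLEWINE1CTEA"]:
    print(name, allowed(name))
```

Note that with this pattern a string containing only TEA or only WINE1C is still accepted, unlike the two-lookahead version.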

Regex pattern for localization

I am trying to find a regex pattern to fix a localize issue.
The usual delimiters are ".", "," or "_", which I have stored in an array of delimiters.
I'm trying to find a pattern that matches any of these delimiters in a number that also ends with one or more 0s.
For example: 3,000 or 3,0 or 3.0 or 3.00
You could try positive lookahead
If indeed your data always has one or more 0 after any delimiter, using a positive lookahead ( (?=0+) in this case) might be what you are looking for...
More precisely, for the numbers you gave:
/([_.,](?=0+))/g
should do the trick!
You could try it out and experiment with regex here!
We could likely start with an expression similar to:
\d+[.,](\d+)?[0]
and add additional boundaries to it, if we like so.
For instance, if we wish to capture the delimiters, we would be adding a capturing group:
\d+([.,])(\d+)?[0]
Demo
Or if we wish to remove delimiters, we would expand it to:
(\d+)([.,])(\d+)?([0])
and replace it with:
$1$3$4
Demo
Test
const regex = /(\d+)([.,])(\d+)?([0])/gm;
const str = `3,000
3,0
3.0
3.00`;
const subst = `$1$3$4`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log(result);
You could add the delimiters to a character class [_.,] and use word boundaries \b to prevent the number from being part of a larger word.
If they are the only values, you might also use anchors to assert the start ^ and the end $ of the string.
\b\d+[_.,]\d*0\b
That will match:
\b Word boundary
\d+ Match 1+ digits
[_.,] Match any of the listed in the character class
\d*0 Match 0+ digits followed by a zero
\b Word boundary
Regex demo
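A minimal Python sketch of the word-boundary pattern, run against the question's examples. The value 3,5 is an added counter-example and is not from the question, and `ends_with_zero` is an illustrative name.

```python
import re

# Digits, one delimiter, then optional digits that must end in 0.
NUMBER = re.compile(r"\b\d+[_.,]\d*0\b")

def ends_with_zero(text):
    return NUMBER.search(text) is not None

# "3,5" is a hypothetical counter-example added for this sketch.
for value in ["3,000", "3,0", "3.0", "3.00", "3,5"]:
    print(value, ends_with_zero(value))
```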

Regex Optional Match

I have this regex pattern which I made myself (I'm a noob though, and made it through following tutorials):
^([a-z0-9\p{Greek}].*)\s(Ε[0-9\p{Greek}]+|Θ)\s[\(]([a-z1-9\p{Greek}]+.*)[\)]\s-\s([a-z0-9\p{Greek}]+$)
And I'm trying to match the following sentences:
ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ Ε2 (Ε.Β.Δ.) - ΔΗΜΗΤΡΙΟΥ
ΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ 1 Θ (ΑΜΦ) - ΜΑΣΤΟΡΟΚΩΣΤΑΣ
ΕΙΣΑΓΩΓΗ ΣΤΗΝ ΠΛΗΡΟΦΟΡΙΚΗ Θ (ΑΜΦ) - ΒΟΛΟΓΙΑΝΝΙΔΗΣ
And so on.
This pattern splits the string into 4 parts.
For example, for the string:
ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ Ε2 (Ε.Β.Δ.) - ΔΗΜΗΤΡΙΟΥ
The first match is: ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ (Subject's Name)
Second match is: Ε2 (Class)
Third match is: Ε.Β.Δ. (Room)
And the fourth match is: ΔΗΜΗΤΡΙΟΥ (Teacher)
Now in some entries E*/Θ is not defined, and I want to get the 3 matches without the E*/Θ. How should I modify my pattern so that (Ε[0-9\p{Greek}]+|Θ) is an optional match?
I tried ? so far, but because my pattern has \s both before and after that group, it requires two whitespace characters to get 3 matches, and my string has only one.
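I think you need to do two things:
1. Make .* lazy (i.e. .*?)
2. Enclose \s(Ε[0-9\p{Greek}]+|Θ) in an optional non-capturing group: (?:\s(Ε[0-9\p{Greek}]+|Θ))?
The regex will look like
^([a-z0-9\p{Greek}].*?)(?:\s(Ε[0-9\p{Greek}]+|Θ))?\s[\(]([a-z1-9\p{Greek}]+.*)[\)]\s-\s([a-z0-9\p{Greek}]+)$
(The changes are the ? after the first .*, the (?: ... )? wrapper around the class group, and the $ moved outside the last group.)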
See demo
If you do not make the first .* lazy, it will eat up the second group that is optional. Making it lazy will ensure that if there is some text that can be matched by the second capturing group, it will be "set".
Note you call capture groups matches, which is wrong. Matches are whole texts matched by the entire regular expression and captures are just substrings matched by parts of regexp enclosed in unescaped round brackets. See more on capture groups at regular-expressions.info.
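A hedged Python sketch of the lazy first group combined with the optional group. Python's built-in `re` does not support `\p{Greek}`, so this sketch assumes the range `\u0370-\u03FF` (the Greek and Coptic block) as a stand-in. The two Greek letters of the class group are written as escapes to avoid confusion with Latin lookalikes. The entry without a class is a made-up variation of one of the question's lines, and `split_entry` is an illustrative name.

```python
import re

# Assumption: \u0370-\u03FF stands in for \p{Greek}, which `re` lacks.
G = r"\u0370-\u03FF"

PATTERN = re.compile(
    rf"^([a-z0-9{G}].*?)"                 # subject, lazy so it stops early
    rf"(?:\s(\u0395[0-9{G}]+|\u0398))?"   # optional class: Epsilon+chars or Theta
    rf"\s\(([a-z1-9{G}]+.*)\)"            # room, inside parentheses
    rf"\s-\s([a-z0-9{G}]+)$"              # teacher
)

def split_entry(line):
    """Return the four captures; the class is None when it is absent."""
    m = PATTERN.match(line)
    return m.groups() if m else None

print(split_entry("ΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ 1 Θ (ΑΜΦ) - ΜΑΣΤΟΡΟΚΩΣΤΑΣ"))
# Hypothetical entry with no class, adapted from the question's data:
print(split_entry("ΕΙΣΑΓΩΓΗ ΣΤΗΝ ΠΛΗΡΟΦΟΡΙΚΗ (ΑΜΦ) - ΒΟΛΟΓΙΑΝΝΙΔΗΣ"))
```

When the class is missing, the second capture comes back as `None` and the other three are unaffected.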
You can use something like:
(Ε[0-9\p{Greek}]+|Θ)?
The whole group will be optional because of the trailing ?.

Finding a match one after another

How do I find multiple matches that are (and can only be) separated from each other by whitespaces?
I have this regular expression:
/([0-9]+)\s*([A-Za-z]+)/
And I want each of the matches (not groups) to be surrounded by whitespace or another match. If the condition is not fulfilled, the match should not be returned.
This is valid: 1min 2hours 3days
This is not: 1min, 2hours 3days (1min and 2hours should not be returned)
Is there a simpler way of finding a continuous sequence of matches (in Java preferably) than repeating the whole regex before and after the main one, checking if there is a whitespace, start/end of the string or another match?
I believe this pattern will meet your requirements (provided that only a single space character separates your alphanumeric tokens):
(?<=^|[\w\d]\s)([\w\d]+)(?=\s|$)
    ^^^^^^^^^^  ^^^^^^^    ^^^^
       (2)        (1)       (3)
1. A capture group that contains an alphanumeric string.
2. A look-behind assertion: To the left of the capture group must be a) the beginning of the line or b) an alphanumeric character followed by a single space character.
3. A look-ahead assertion: To the right of the capture group must be a) a space character or b) the end of the line.
See regex101.com demo.
Here is some sample data that I included in the demo, with the strings captured on each line:
1min 2hours 3days (captures 1min, 2hours and 3days)
1min, 2hours 3days (captures 3days only)
42min 4hours 2days (captures 42min, 4hours and 2days)
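For comparison, a Python sketch of the same lookaround idea. Python's built-in `re` requires fixed-width look-behind, so the `(?<=^|[\w\d]\s)` form is rewritten here as `(?:^|(?<=\w\s))`, which is this sketch's adaptation rather than the answer's exact pattern. `[\w\d]` is shortened to `\w`, which already includes digits. The function name is illustrative.

```python
import re

# (?:^|(?<=\w\s)) replaces the variable-width look-behind, which
# Python's `re` rejects; the rest follows the pattern above.
TOKEN = re.compile(r"(?:^|(?<=\w\s))(\w+)(?=\s|$)")

def tokens(text):
    """Return the tokens that sit between line edges and clean spaces."""
    return TOKEN.findall(text)

print(tokens("1min 2hours 3days"))
print(tokens("1min, 2hours 3days"))
```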
String text = "1min 2hours 3days";
boolean match = text.matches("(?:\\s*[0-9]+\\s*[A-Za-z]+\\s*)*");
This looks for the token pattern from your example. The * after the group then allows zero or more occurrences of that pattern in the text, and ?: means the group is not captured.
This will also return true for an empty string. If you don't want the empty string to match, change the final * into +.
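A rough Python equivalent of the whole-string check, using `re.fullmatch` in place of Java's `String.matches` (both require the entire string to match). This sketch uses the `+` variant so that the empty string is rejected; `is_valid` is an illustrative name.

```python
import re

# One or more "number + unit" tokens, separated by optional whitespace.
SEQUENCE = re.compile(r"(?:\s*[0-9]+\s*[A-Za-z]+\s*)+")

def is_valid(text):
    return SEQUENCE.fullmatch(text) is not None

print(is_valid("1min 2hours 3days"))
print(is_valid("1min, 2hours 3days"))
print(is_valid(""))
```

This only answers whether the whole string is a clean sequence; it does not return the individual tokens.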
I've managed to solve my problem by splitting the string using string.split("\\s+") and then matching the results against the pattern /([0-9]+)\s*([A-Za-z]+)/.
There is an error here: the '*' makes the whitespace optional, so the pattern also matches when there is no whitespace between the number and the unit.
/([0-9]+)\s*([A-Za-z]+)/
Change to
/(\d+)\s+(\w+)/g
This will return an array of matches, each made of digits followed by word characters. There is no need to always write '[0-9]': '\d' says the same thing, and '\w' covers '[A-Za-z]' (plus digits and the underscore). More can be found in a regular expression cheat sheet.