Regex: repeated pattern capture - delimit the match where the next pattern begins - non-capturing group and negative lookahead example

I want to match up to the end of my text, and for that I have to match all the characters, including the line breaks.
But I must exclude the beginning of the next capture!
What I want is to end the match where the next pattern begins.
I tried to replace
[^-]
with something like
(?!-{2}\\*{3})
It doesn't work!
I want to capture the number, and I want to capture the whole paragraph (some text) between the headers (--*** x ***).

Using this regex seems to work:
--\*{3}([\d]*)\*{3}(((?!-).*\n)*)
1st Capturing group: The digit inside the stars.
2nd Capturing group: The text between the "headers"
3rd Capturing group: The last line of the paragraph.
A link with the regex tested:
https://regex101.com/r/xJ0gC6/1
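The three groups can be checked with Python's re module. This is only a sketch: the sample text and the helper name parse_first_attempt are made up for illustration, since the original input is not shown in the post.

```python
import re

# Hypothetical sample input; the original text is not shown in the post.
SAMPLE = "--***1***\nline a\nline b\n--***2***\nline c\n"

FIRST_ATTEMPT = re.compile(r"--\*{3}([\d]*)\*{3}(((?!-).*\n)*)")

def parse_first_attempt(text):
    """Return (number, paragraph, last_line) for every header found."""
    return [m.groups() for m in FIRST_ATTEMPT.finditer(text)]

for number, paragraph, last_line in parse_first_attempt(SAMPLE):
    print(repr(number), repr(paragraph), repr(last_line))
```

Group 3 keeps only the last iteration of the repeated group. Note also that (?!-) stops the paragraph at any line that starts with a single -, not only at a real header.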

I found exactly what I wanted! :)
--\*{3}([^!*]*)\*{3}((?:(?!-{2}\*{3})(?:\n|.))*)
I must group what I want and what I don't want.
For that I must use a 'non-capture group' and a 'negative lookahead':
(?!nowant)(?:want)
Then I must use a 'non-capture group' to aggregate the matching:
(?:(?!nowant)(?:want))
Then I add the quantifier '*':
(?:(?!nowant)(?:want))*
And finally, I add a 'capture group':
((?:(?!nowant)(?:want))*)
So here is the regex:
((?:(?!-{2}\*{3})(?:\n|.))*)
You can see the complete regex here:
https://regex101.com/r/xJ0gC6/2
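As a rough check of the final pattern, here is a small Python sketch. The sample text and the helper name split_blocks are assumptions for illustration; the sample deliberately contains a line starting with a single dash.

```python
import re

# Hypothetical sample input, including a line that starts with "-".
SAMPLE = "--***1***\nfirst line\n- a dash line\n--***2***\nother\n"

# (?!-{2}\*{3}) is checked before every consumed character, so the
# repetition stops exactly where the next "--***" header begins.
HEADER_BLOCK = re.compile(r"--\*{3}([^!*]*)\*{3}((?:(?!-{2}\*{3})(?:\n|.))*)")

def split_blocks(text):
    """Return a list of (header, body) tuples, with bodies stripped."""
    return [(m.group(1), m.group(2).strip()) for m in HEADER_BLOCK.finditer(text)]

print(split_blocks(SAMPLE))
```

Because the lookahead only rejects the full header prefix, a body line that merely begins with - is kept.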

Related

How can I remove something from the middle of a string with regex?

I have strings which look like this:
/xxxxx/xxxxx-xxxx-xxxx-338200.html
With my regex:
(?<=-)(\d+)(?=\.html)
It matches just the numbers before .html.
Is it possible to write a regex that matches everything that surrounds the numbers (matches the .html part and the part before the numbers)?
In your current pattern you already use a capturing group. In that case you could also match what comes before and after the digits instead of using lookarounds:
-(\d+)\.html
To get what comes before and after the digits, you could use 2 capturing groups:
^(.*-)\d+(\.html)$
In the replacement use the 2 groups.
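A minimal Python sketch of that replacement, assuming the goal is to drop the digits and keep both sides. The helper name drop_number is hypothetical.

```python
import re

def drop_number(path):
    """Remove the digits between the last "-" and ".html", keeping both sides."""
    # \g<1> and \g<2> re-insert the two captured groups around the removed digits.
    return re.sub(r"^(.*-)\d+(\.html)$", r"\g<1>\g<2>", path)

print(drop_number("/xxxxx/xxxxx-xxxx-xxxx-338200.html"))
```

Strings that do not match the pattern are returned unchanged.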
This should do the job:
.*-\d+\.html
Explanation: .* matches anything up to the last -; then -\d+ matches a - followed by a sequence of digits, and \.html matches the literal .html (where \. represents the character .).
To capture groups, just do (.*-)(\d+)(\.html). This will put everything before the number in a group, the number in another group and everything after the number in another group.
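The three-group variant can be tried in Python like this. It is a sketch only, and split_around_number is a made-up helper name.

```python
import re

def split_around_number(path):
    """Return (before, number, after), or None when the pattern does not match."""
    m = re.search(r"(.*-)(\d+)(\.html)", path)
    return m.groups() if m else None

print(split_around_number("/xxxxx/xxxxx-xxxx-xxxx-338200.html"))
```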

Match everything after end of a line and before start of another line

I am trying to match everything between 2 words:
1. AM at the end of a line
2. DR at the beginning of a line
Date:11/18/2016:9:39 AM
NIP CR/JUPITER, WHITE/GIN
DR Size:1200mb
With the expected outcome -> NIP CR/JUPITER, WHITE/GIN
I was able to get this done using a combination of lookbehind and lookahead, (?<=(?:AM|PM))[\s\S]*?(?=DR), however this regex does not work in some scenarios, like the one below:
Date:11/18/2016:9:39 AM
NIP CR/DRAIN, WHITE/GIN
DR Size:1200mb
The second example has DR in DRAIN. You could add a newline before DR, and perhaps also a word boundary \b after DR, to prevent it from matching as part of a larger word:
(?<=(?:AM|PM))[\s\S]*?(?=\nDR)
But you could also move the newline into the positive lookbehind:
(?<=(?:AM|PM)\n).*(?=\nDR\b)
You might also match AM followed by a newline, capture the next line in a capturing group followed by matching a newline and DR:
AM\n(.*)\nDR
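To compare these patterns on both inputs, here is a short Python sketch. It assumes the lines are separated by a plain \n (not \r\n), and the helper name middle_line is hypothetical.

```python
import re

EASY = "Date:11/18/2016:9:39 AM\nNIP CR/JUPITER, WHITE/GIN\nDR Size:1200mb"
TRICKY = "Date:11/18/2016:9:39 AM\nNIP CR/DRAIN, WHITE/GIN\nDR Size:1200mb"

ORIGINAL = re.compile(r"(?<=(?:AM|PM))[\s\S]*?(?=DR)")    # stops at the DR in DRAIN
LOOKAROUND = re.compile(r"(?<=(?:AM|PM)\n).*(?=\nDR\b)")  # newline moved into the lookbehind
CAPTURE = re.compile(r"AM\n(.*)\nDR")                     # capture-group alternative

def middle_line(text):
    """Return the line between the AM/PM line and the DR line, or None."""
    m = LOOKAROUND.search(text)
    return m.group(0) if m else None

for sample in (EASY, TRICKY):
    print(repr(ORIGINAL.search(sample).group(0)), "->", repr(middle_line(sample)))
```

Both lookbehinds are fixed-width, which Python's re module requires.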
RegEx 1
This RegEx might help: it creates one group ($1) that contains your target line.
AM\n(.+)\nDR
RegEx 2
Another approach is to target the second line directly, as in this RegEx:
[A-Z]{3}\s[A-Z]+\/[A-Z]+.+
You can also use group () and call it using $1:
([A-Z]{3}\s[A-Z]+\/[A-Z]+.+)
RegEx 3
This RegEx creates one group by adding additional boundaries to the pattern:
([A-Z]{3}\s[A-Z]+\/[A-Z,]+\s[A-Z,\/]+)

I want a regex that accepts only a list of words that are separated by a comma or a space

My problem is that I have a text field and I want the user to type a list of days only, and not accept any other word. For example:
monday tuesday saturday
or monday,tuesday,saturday
This is what I wrote:
"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday|\b"
But this didn't work and I don't know why. I'm a regex beginner and I need some help, thank you.
^((monday|tuesday|wednesday|thursday|friday|saturday|sunday)[, ])*(monday|tuesday|wednesday|thursday|friday|saturday|sunday)$
The ^ will anchor the pattern to match the start of the value, and the $ anchors at the end of the value. The combination of those two means the pattern will only match if the entire value matches. Without the anchors, the pattern would match anything which contains the pattern.
The pattern is saying that it must be zero or more dayname-followed-by-space-or-comma, followed by a dayname.
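The effect of the anchors can be seen in a small Python sketch. The names DAY, ANCHORED, UNANCHORED and is_day_list are made up for this example.

```python
import re

DAY = r"(monday|tuesday|wednesday|thursday|friday|saturday|sunday)"
ANCHORED = re.compile(rf"^({DAY}[, ])*{DAY}$")
UNANCHORED = re.compile(rf"({DAY}[, ])*{DAY}")  # same pattern without ^ and $

def is_day_list(value):
    """True only when the whole value is a list of day names."""
    return ANCHORED.search(value) is not None

for value in ("monday tuesday saturday", "monday,tuesday,saturday", "monday pizza"):
    print(value, is_day_list(value), UNANCHORED.search(value) is not None)
```

Without the anchors, "monday pizza" still matches because the pattern finds "monday" somewhere inside the value.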
In your pattern the last pipe | of the alternation should be a closing parenthesis to close the group and you are not taking a comma or a space into account.
\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday|\b
If you are not referring to the capturing groups in your code or tool, you could make them non-capturing by using (?: instead of (.
You might update your pattern to use the anchors ^ and $ to assert the start and the end of the string. Then match 1 day and repeat 0+ times matching another day preceded by a comma or a space.
^(?:mon|tues|wednes|thurs|fri|satur|sun)day(?:[, ](?:mon|tues|wednes|thurs|fri|satur|sun)day)*$
If you want to allow only the specified formats and, for example, not monday tuesday,saturday (which mixes a space AND a comma), you could capture the space or comma the first time and then reuse it with the backreference \1:
^(?:mon|tues|wednes|thurs|fri|satur|sun)day(?:([, ])(?:mon|tues|wednes|thurs|fri|satur|sun)day)?(?:\1(?:mon|tues|wednes|thurs|fri|satur|sun)day)*$
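The difference between the two patterns shows up on mixed separators. Below is a Python sketch; the names MIXED_OK, SAME_SEP and uses_one_separator are hypothetical.

```python
import re

DAY = r"(?:mon|tues|wednes|thurs|fri|satur|sun)day"
MIXED_OK = re.compile(rf"^{DAY}(?:[, ]{DAY})*$")
# Group 1 remembers the first separator; \1 forces every later one to be the same.
SAME_SEP = re.compile(rf"^{DAY}(?:([, ]){DAY})?(?:\1{DAY})*$")

def uses_one_separator(value):
    """True when the value is a day list that uses a single separator kind."""
    return SAME_SEP.search(value) is not None

for value in ("monday tuesday saturday", "monday,tuesday,saturday", "monday tuesday,saturday"):
    print(value, MIXED_OK.search(value) is not None, uses_one_separator(value))
```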

Regex Optional Match

I have this regex pattern which I made myself (I'm a noob though, and made it through following tutorials):
^([a-z0-9\p{Greek}].*)\s(Ε[0-9\p{Greek}]+|Θ)\s[\(]([a-z1-9\p{Greek}]+.*)[\)]\s-\s([a-z0-9\p{Greek}]+$)
And I'm trying to match the following sentences:
ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ Ε2 (Ε.Β.Δ.) - ΔΗΜΗΤΡΙΟΥ
ΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ 1 Θ (ΑΜΦ) - ΜΑΣΤΟΡΟΚΩΣΤΑΣ
ΕΙΣΑΓΩΓΗ ΣΤΗΝ ΠΛΗΡΟΦΟΡΙΚΗ Θ (ΑΜΦ) - ΒΟΛΟΓΙΑΝΝΙΔΗΣ
And so on.
This pattern splits the string into 4 parts.
For example, for the string:
ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ Ε2 (Ε.Β.Δ.) - ΔΗΜΗΤΡΙΟΥ
The first match is: ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ (Subject's Name)
Second match is: Ε2 (Class)
Third match is: Ε.Β.Δ. (Room)
And the fourth match is: ΔΗΜΗΤΡΙΟΥ (Teacher)
Now in some entries E*/Θ is not defined, and I want to get the 3 matches without the E*/Θ. How should I modify my pattern so that (Ε[0-9\p{Greek}]+|Θ) is an optional match?
I tried ? so far, but because I have a \s both before and after that group, the pattern requires 2 whitespace characters to get the 3 matches, and I only have one in my string.
I think you need to do two things:
Make .* lazy (i.e. .*?)
Enclose \s(Ε[0-9\p{Greek}]+|Θ) in an optional non-capturing group: (?:\s(Ε[0-9\p{Greek}]+|Θ))?
The regex will look like
^([a-z0-9\p{Greek}].*?)(?:\s(Ε[0-9\p{Greek}]+|Θ))?\s[\(]([a-z1-9\p{Greek}]+.*)[\)]\s-\s([a-z0-9\p{Greek}]+)$
If you do not make the first .* lazy, it will eat up the second group that is optional. Making it lazy will ensure that if there is some text that can be matched by the second capturing group, it will be "set".
Note that you call capture groups matches, which is not accurate. A match is the whole text matched by the entire regular expression, while captures are the substrings matched by the parts of the regex enclosed in unescaped parentheses. See more on capture groups at regular-expressions.info.
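The lazy quantifier plus the optional group can be tried in Python with one substitution: the built-in re module has no \p{Greek}, so this sketch uses the range \u0370-\u03FF (the Greek and Coptic block) as a stand-in. That range, the entry without a class, and the helper name parse_row are assumptions of the example, not part of the original pattern or data.

```python
import re

# Stand-in for \p{Greek}; Python's built-in re module does not support \p{...}.
GREEK = r"\u0370-\u03FF"

ROW = re.compile(
    rf"^([a-z0-9{GREEK}].*?)"           # subject name, lazy
    rf"(?:\s(Ε[0-9{GREEK}]+|Θ))?"       # optional class, with its leading space
    rf"\s[\(]([a-z1-9{GREEK}]+.*)[\)]"  # room, inside parentheses
    rf"\s-\s([a-z0-9{GREEK}]+)$"        # teacher
)

def parse_row(line):
    """Return (subject, class or None, room, teacher), or None if no match."""
    m = ROW.search(line)
    return m.groups() if m else None

print(parse_row("ΠΡΟΓΡΑΜΜΑΤΙΣΤΙΚΕΣ ΕΦΑΡΜ ΣΤΟ ΔΙΑΔΙΚΤΥΟ Ε2 (Ε.Β.Δ.) - ΔΗΜΗΤΡΙΟΥ"))
print(parse_row("ΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ 1 Θ (ΑΜΦ) - ΜΑΣΤΟΡΟΚΩΣΤΑΣ"))
# Hypothetical entry with the class omitted: the second capture is None.
print(parse_row("ΕΙΣΑΓΩΓΗ ΣΤΗΝ ΠΛΗΡΟΦΟΡΙΚΗ (ΑΜΦ) - ΒΟΛΟΓΙΑΝΝΙΔΗΣ"))
```

Because the space belongs to the optional group, an entry without a class needs only the single space before the opening parenthesis.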
You can use something like:
(Ε[0-9\p{Greek}]+|Θ)?
The ? makes the whole group optional.

perl style regex to match nth item in a list

Trying to match the third item in this list:
/text word1, word2, some_other_word, word_4
I tried using this perl style regex to no avail:
([^, ]*, ){$m}([^, ]*),
I want to match ONLY the third word, nothing before or after, and no commas or whitespace. I need it to be a regex; this is not in a program but in UltraEdit, for a Word file.
What can I use to match some_other_word (Or anything third in the list.)
Based on some input by the community members I made the following change to make the logic of the regex pattern clearer.
/^(?:(?:.(?<!,))+,){2}\s*(\w+).*/x
Explanation
/^ # 1.- Match start of line.
(?:(?:.(?<!,))+ # 2.- Match but don't capture a sequence of characters not containing a comma ...
,) # 3.- followed by a comma
{2} # 4.- (exactly two times)
\s* # 5.- Match any optional space
(\w+) # 6.- Match and capture a sequence of the characters represented by \w, at least one character long.
.* # 7.- Match anything after that if necessary.
/x
This is the one suggested previously.
/(?:\w+,?\s*){3}(\w+)/
Try group 1 of this regex:
^(?:.*?,){2}\s*(.*?)\s*(,|$)
See a live demo using your sample input, plus an edge case, showing the capture in group 1.
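The same pattern can be generalised to any position by building the repeat count. This is a Python sketch for checking the idea outside the editor; nth_item is a hypothetical helper, not something UltraEdit provides.

```python
import re

LINE = "/text word1, word2, some_other_word, word_4"

def nth_item(line, n):
    """Return the n-th (1-based) comma-separated item, or None if absent."""
    # Skip n - 1 comma-terminated chunks, then capture up to the next comma or the end.
    pattern = rf"^(?:.*?,){{{n - 1}}}\s*(.*?)\s*(,|$)"
    m = re.search(pattern, line)
    return m.group(1) if m else None

print(nth_item(LINE, 3))
```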
A regex can't return only the nth match, because your string contains more than one occurrence of the same pattern and regular expressions have no selective-return option. So pick whichever element you need from the returned array of matches.
,\s?([^,]+)
See it in action; the captured group of the 2nd match is what you need.
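Picking the element from the list of matches could look like this in Python. It is a sketch, and third_word is a made-up helper name.

```python
import re

LINE = "/text word1, word2, some_other_word, word_4"

def third_word(line):
    """Return the third list item by taking the second ',item' match."""
    # Every match starts at a comma, so the first item is never in this list.
    items = re.findall(r",\s?([^,]+)", line)
    return items[1] if len(items) > 1 else None

print(third_word(LINE))
```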