Content in between parentheses regex

I am quite new to regular expressions and may need some help.
I would like to get all expressions that are in between parentheses and end with a comma followed by a few digits.
For example: asdasd alfalfa (asasdasd, 2002) asdasd fasted (asdasd) sfasadas (asdd,2333)
I already got the last part of this problem working by writing ,\s?\d+\) but I am struggling to achieve a solution for the first part of my problem.
Anyone got an idea how to get this working?

Try this:
\([^,)]+, ?\d+\)
See live demo.
An important trick here is the closing bracket ) in the character class [^,)]+, which prevents the match from spanning multiple pairs of brackets, i.e. matching from the opening bracket of an earlier bracketed term to the closing bracket of a later bracketed term.
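To see why the closing bracket inside the character class matters, here is a minimal Python sketch that runs the pattern above against the example string from the question and compares it with a naive greedy variant. The naive pattern \(.+, ?\d+\) is only an illustration of the failure mode and does not come from the question.

```python
import re

text = "asdasd alfalfa (asasdasd, 2002) asdasd fasted (asdasd) sfasadas (asdd,2333)"

# Pattern from the answer: [^,)]+ cannot cross a comma or a closing bracket.
bounded = re.findall(r"\([^,)]+, ?\d+\)", text)

# Hypothetical naive variant: .+ is greedy and crosses bracket pairs.
naive = re.findall(r"\(.+, ?\d+\)", text)

print(bounded)  # ['(asasdasd, 2002)', '(asdd,2333)']
print(naive)    # a single match running from the first "(" to the last ")"
```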
If you only want the term before the comma (unclear because your example and your question text don't align on this point), convert the opening bracket to a lookbehind and the latter part of the regex to a lookahead:
(?<=\()[^,)]+(?=, ?\d+\))
See live demo.
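A short Python sketch of the lookaround version, again on the example string from the question. The capture-group variant next to it is an alternative I am adding for comparison; it is not part of the answer, but it should return the same terms.

```python
import re

text = "asdasd alfalfa (asasdasd, 2002) asdasd fasted (asdasd) sfasadas (asdd,2333)"

# Lookaround version: the bracket, comma and digits are checked but not consumed.
terms = re.findall(r"(?<=\()[^,)]+(?=, ?\d+\))", text)

# Alternative with a capture group: findall returns only the group content.
terms_via_group = re.findall(r"\(([^,)]+), ?\d+\)", text)

print(terms)            # ['asasdasd', 'asdd']
print(terms_via_group)  # ['asasdasd', 'asdd']
```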

Related

Notepad++ Regex Remove Character from Markdown Formatted Footnote

This is a follow-up question to what was solved yesterday:
Notepad++ Regex Replace Makeshift Footnotes format With Proper Markdown format
I managed to find a Regex to remove the offending semicolons in the main text area, but only by cutting out the text and pasting back the result, which can only be done one at a time.
I'm not sure how this can be done, but the expert can tell me.
So I have footnote references in markdown format. Two instances of the same thing:
[^1]:
[^2]:
.
.
.
[^99]:
I might not have 99 in a document but I wanted to show I need to match two digits here again.
As I said, there are two instances of these numbered references in the text. One in the main text pointing to the footnote and the footnote at the end of the document.
What I need is to delete the semi-colons from the main text and leave the
[^3]:
[^15]:
etc.
references at the end intact.
Because the main text references come after a word or at the end of a sentence (usually before the sentence-ending period), there is never a case where a reference would start a sentence (even if they seem to appear there once or twice because of word wrap).
I provided the exact opposite of my needs here:
Click here for Regex101 website link
I put in the exact opposite of what I want because I already knew of the
^
sign to match anything that is at the front of the line.
Now I would like to negate this, if possible, so that I would delete the semi-colons in the main text, not down at the bottom.
Of course, it is likely that my approach is not good and you'll come up with a completely different approach. Especially because there doesn't seem to be a NOT operator in Regex, if I read correctly.
I repeat: the Regex101 example with the match and substitution is exactly the opposite of what I want.
I am not sure if you can play around in the substitution line to get the desired negative effect.
I could probably have asked for removing the first occurrence of semi-colons, but I thought the important part of tackling the problem is that the items not to be matched are always at the start of the line, and the others are not.
Thanks for any suggestions
In Notepad++ you might use a negative lookbehind asserting that the start of the line is not directly to the left, and use \K to clear the match buffer, so that only the colon is matched and can be replaced by an empty string.
(?<!^)\[\^\d{1,2}]\K:
Explanation
(?<!^) Negative lookbehind, assert not the start of the line directly to the left
\[\^ Match [^
\d{1,2} Match 1 or 2 digits
] Match ] literally
\K Forget what is matched so far
: Match a colon
Regex demo
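Outside Notepad++ the same idea can be sketched in Python. Python's standard re module has no \K, so this sketch swaps it for a capture group that is written back in the replacement; the sample text is made up for illustration. re.MULTILINE is needed so that ^ inside the lookbehind means "start of a line" rather than "start of the whole string".

```python
import re

# Made-up sample: two in-text references and their footnotes at the bottom.
text = (
    "Some claim[^1]: and more text.\n"
    "Another sentence[^12]:.\n"
    "\n"
    "[^1]: First footnote\n"
    "[^12]: Second footnote\n"
)

# (?<!^) rejects references that start a line.
# The capture group stands in for \K, which the standard re module lacks.
pattern = re.compile(r"(?<!^)(\[\^\d{1,2}\]):", re.MULTILINE)
cleaned = pattern.sub(r"\1", text)

print(cleaned)
```

The footnote definitions at the start of a line keep their colon, while the in-text references lose it.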

Regex matching any substring only after a delimiter

How to define a regex that will match an exact string like this:
{VAR}
Or anything in this format:
{VAR:10}
{VAR:something}
... meaning if ":" is present, then accept anything to the right of it until the closing }.
So, it should NOT match this:
{VAR2} or {VAR3}, etc.
Right now I am using 2 separate regexes to search for the above 2 scenarios and would like to use one.
It might be a bit crude, but I believe this is what you are looking for:
^{VAR(}|(:[A-Z0-9]+?)})$
This will take the start of the string as {VAR, then check if it either finds one of a } or a : followed by a series of numbers or uppercase characters until it finds a }.
Edit:
I didn't know that OP was looking for Anything after the character, so I have written a new version that will take any character.
^{VAR(}|(:.+)})$
Here is also a Demo of it.
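A minimal Python check of this anchored pattern against the cases from the question. The helper name is_var_token is hypothetical, and the braces are escaped here as a precaution; otherwise the pattern is the one shown above.

```python
import re

# Braces are escaped for safety; the logic mirrors ^{VAR(}|(:.+)})$.
VAR_PATTERN = re.compile(r"^\{VAR(\}|(:.+)\})$")


def is_var_token(s: str) -> bool:
    """Return True if the whole string is {VAR} or {VAR:<anything>}."""
    return VAR_PATTERN.match(s) is not None


for candidate in ["{VAR}", "{VAR:10}", "{VAR:something}", "{VAR2}", "{VAR3}"]:
    print(candidate, is_var_token(candidate))
```

Because of the ^ and $ anchors, this validates a whole string; it will not find {VAR} tokens embedded in longer text.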
First, I would suggest reading up on lookaheads, lookbehinds, and the if-then-else construct. I recommend the article at www.regular-expressions.info for a detailed explanation.
The regex below should capture what you require
{VAR(?(?=:)[^}]*}|})
Demo
The pattern starts with {VAR, as this is constant for all of your scenarios.
We then leverage the if-then-else construct and a positive look ahead to check for a colon (?(?=:) and match everything up to and including the closing brace if a colon exists with [^}]*}. If a colon does not exist, the portion after the or | will only match a closing brace }.
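Not every engine supports a lookahead as the condition of an if-then-else. Python's standard re module, for instance, only accepts a group reference there. As a sketch under that constraint, the same behaviour can be approximated with an optional group instead of the conditional; this is a substitute technique, not the pattern from the answer, and the sample string is made up.

```python
import re

# The optional ":..." part replaces the lookahead conditional.
VAR_ANYWHERE = re.compile(r"\{VAR(?::[^}]*)?\}")

sample = "a {VAR} b {VAR:10} c {VAR2} d {VAR:something} e {VAR3}"
found = VAR_ANYWHERE.findall(sample)

print(found)  # ['{VAR}', '{VAR:10}', '{VAR:something}']
```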
Try Regex: {VAR}|(?<={VAR:)\w+(?=})
Demo

Regex Match between brackets (...)

I'm trying to grab 2 items from a simple line.
[Title](Description)
EDIT: It is actually a URL that I want to display; I called it a description because I want it displayed, not actually parsed.
[Trivium](https://www.youtube.com/user/trivium)
Grabbing between the brackets (...) doesn't seem to work at all for me. I've googled and found several variations with no luck, Thanks in advance :)
EDIT:
Tried the following:
[(.+?)]\((.*)\)
[(.+?)]\([^\(\r\n]*\)
[(.+?)]((.+?))
and a couple more I can't find again
The first regex you listed almost has it right. Try using this regex instead:
\[.+?\]\((.*)\)
As @PM 77-1 pointed out, you need to escape the square brackets by placing a backslash in front of them. The reason for this is that square brackets are special regex metacharacters, or characters which have a special meaning. They tell the regex engine to match one character out of the class of characters contained inside them.
Your original regex [(.+?)]\((.*)\) is actually doing this:
[(.+?)] match exactly one character that is (, ., +, ? or )
\((.*)\) match (anything), i.e. anything contained in parentheses
So this regex would match .(stuff) but would not match [Title](Description), which is what you really want.
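A quick way to check what the unescaped [(.+?)] really does is to test it in isolation. The Python sketch below treats it as what it is, a character class, and then runs the full unescaped pattern against the input from the question.

```python
import re

bracket_class = re.compile(r"[(.+?)]")

# Each of these single characters is a member of the class.
members = [c for c in "(.+?)" if bracket_class.fullmatch(c)]

# A class matches exactly one character, and letters are not members.
two_dots = bracket_class.fullmatch("..")
letter = bracket_class.fullmatch("T")

# The unescaped pattern therefore finds nothing in the intended input.
miss = re.search(r"[(.+?)]\((.*)\)", "[Title](Description)")

print(members)                 # ['(', '.', '+', '?', ')']
print(two_dots, letter, miss)  # None None None
```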
Here is a link where you can test out the working regex:
Regex 101
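The working regex above captures only the part inside the parentheses. Since the question asks for two items, here is a hedged Python sketch that keeps a capture group around the title as well; that extra group is my addition, not part of the answer.

```python
import re

# Group 1: text between [ and ]; group 2: text between ( and ).
LINK = re.compile(r"\[(.+?)\]\((.*)\)")

m = LINK.search("[Trivium](https://www.youtube.com/user/trivium)")
title, target = m.groups()

print(title)   # Trivium
print(target)  # https://www.youtube.com/user/trivium
```

Note that (.*) is greedy, so on a line containing several links it would run to the last closing parenthesis; something like ([^)]*) may be safer in that case.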

Number Regular Expression Help

I am learning regular expressions and I am trying to create one that will validate either a whole number or a decimal.
I have created this regular expression:
^(\d+)|([\d+][\.{1}][\d+])$
It almost works, but it says a number like:
12.
12..
12..67
are matches.
I thought
([\d+][\.{1}][\d+])
meant it had to have one or more numbers, followed by a dot (and only one), followed by one or more numbers.
Can someone explain what I am doing wrong?
As a learning process I'm interested in what I am doing wrong rather than what is another way of doing it. I tried following the syntax examples but I have missed something.
The problem is here:
([\d+][\.{1}][\d+])
With the square brackets you are creating character classes. That means:
[\d+] matches a digit or a +, exactly once.
[\.{1}] matches a . or a { or a 1 or a }, exactly once.
To get the behaviour you expect remove the square brackets
(\d+\.{1}\d+)
This will match at least one digit, a . followed by one or more digits
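The difference between the two forms is easy to verify. In this Python sketch the bracketed version accepts a three-character string that is clearly not a number, while the version without brackets behaves as intended. The test strings are chosen for illustration.

```python
import re

with_classes = re.compile(r"[\d+][\.{1}][\d+]")
without_classes = re.compile(r"\d+\.{1}\d+")

# Three character classes: each one matches exactly one character.
odd = with_classes.fullmatch("+{+")
too_long = with_classes.fullmatch("12.67")

# Quantified tokens: digits, one dot, digits.
good = without_classes.fullmatch("12.67")
trailing_dot = without_classes.fullmatch("12.")
double_dot = without_classes.fullmatch("12..67")

print(bool(odd), bool(too_long))                      # True False
print(bool(good), bool(trailing_dot), bool(double_dot))  # True False False
```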
The other problem here is that the ^ belongs only to the first alternative of your expression and the $ belongs only to the last alternative. So you should put brackets around the complete alternation
^((\d+)|(\d+\.{1}\d+))$
If you don't need the match in a capturing group you can remove the brackets around the single alternatives
^(\d+|\d+\.{1}\d+)$
As a last point, as Jens noted:
{1} is redundant; \.{1} is the same as \.
Then we are at
^(\d+|\d+\.\d+)$
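The anchoring issue can be demonstrated with a small Python sketch. The ungrouped pattern below is the question's regex with the square brackets already removed, and the test inputs are chosen for illustration.

```python
import re

# Ungrouped: ^ binds only to the first branch, $ only to the second.
ungrouped = re.compile(r"^(\d+)|(\d+\.\d+)$")
# Grouped: both anchors apply to the whole alternation.
grouped = re.compile(r"^(\d+|\d+\.\d+)$")

# "12.." is accepted because "^\d+" alone matches the leading "12".
prefix_hit = ungrouped.search("12..").group(0)
# "abc1.5" is accepted because "\d+\.\d+$" alone matches the trailing "1.5".
suffix_hit = ungrouped.search("abc1.5").group(0)

results = {s: bool(grouped.search(s)) for s in ["12", "12.5", "12.", "12..", "12..67", "abc1.5"]}

print(prefix_hit, suffix_hit)  # 12 1.5
print(results)
```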
You can try with:
^(\d+(\.\d+)?)$
Your regex is nearly there, you just need to remove the square brackets and group the alternation so that ^ and $ apply to both branches:
^((\d+)|(\d+\.{1}\d+))$
That should work for what you want.

How to continue a match in Regex

price:(?:(?:\d+)?(?:\.)?\d+|min)-?(?:(?:\d+)?(?:\.)?\d+|max)?
This Regex matches the following examples correctly.
price:1.00-342
price:.1-23
price:4
price:min-900.00
price:.10-.50
price:45-100
price:453.23-231231
price:min-max
Now I want to improve it to match these cases.
price:4.45-8.00;10.45-14.50
price:1.00-max;3-12;23.34-12.19
price:1.00-2.50;min-12;23.34-max
Currently the match stops at the semi colon. How can I get the regex to repeat across the semi-colon dividers?
Final Solution:
price:(((\d*\.)?\d+|min)-?((\d*\.)?\d+|max)?;?)+
Add an optional ; at the end, and make the whole pattern to match one or more:
price:((?:(?:\d+)?(?:\.)?\d+|min)-?(?:(?:\d+)?(?:\.)?\d+|max)?;?)+
(?:\d+)? is the same thing as \d*, and (?:\.)? can just be \.?. Simplified, your original regex is:
price:(?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?
You have two choices. You can either do price([:;]range)* where range is the regex you have for matching number ranges, or be more precise about the punctuation but have to write out range twice and do price:range(;range)*.
price([:;]range)* -- shorter but allows first ':' to be ';'
price:range(;range)* -- longer but gets colon vs semi-colon correct
Pick one of these two regexes:
price(?:[:;](?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?)*
price:(?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?(?:;(?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?)*
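When the regex is built in code, the price:range(;range)* form does not require typing the range twice by hand. A minimal Python sketch, assuming the simplified number fragment \d*\.?\d+ from this answer; the names NUM, RANGE and PRICE are hypothetical.

```python
import re

# Building blocks, composed with f-strings so the range is written once.
NUM = r"\d*\.?\d+"
RANGE = rf"(?:{NUM}|min)(?:-(?:{NUM}|max))?"
PRICE = re.compile(rf"price:{RANGE}(?:;{RANGE})*")

examples = [
    "price:1.00-342",
    "price:.1-23",
    "price:4",
    "price:min-max",
    "price:4.45-8.00;10.45-14.50",
    "price:1.00-max;3-12;23.34-12.19",
    "price:1.00-2.50;min-12;23.34-max",
]
all_ok = all(PRICE.fullmatch(e) for e in examples)

print(all_ok)  # True
```

fullmatch is used so that a partial match stopping at the first semi-colon does not count as success.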
First there are some issues with your regular expression: to match xx.yyy instead of the expression (?:\d+)?(?:\.)?\d+ you can use this (?:\d*\.)?\d+. This can only match in one way so it avoids unnecessary backtracking.
Also currently your regular expression matches things like price:minmax and price:1.2.3 which I assume you do not want to match.
The simple way to repeat your match is to add a semi-colon and then repeat your regular expression verbatim.
You can do it like this, though, to avoid writing out the entire regular expression twice:
price:(?:(?:(?:\d*\.)?\d+|min)(?:-(?:(?:\d*\.)?\d+|max))?(?:;|$))*
See it in action on Rubular.
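The claims about price:minmax and price:1.2.3 can be checked with a short Python sketch that compares the question's original pattern with the stricter one above. Both patterns are copied from this thread; the variable names are mine.

```python
import re

original = re.compile(
    r"price:(?:(?:\d+)?(?:\.)?\d+|min)-?(?:(?:\d+)?(?:\.)?\d+|max)?"
)
stricter = re.compile(
    r"price:(?:(?:(?:\d*\.)?\d+|min)(?:-(?:(?:\d*\.)?\d+|max))?(?:;|$))*"
)

# Inputs that should probably be rejected.
for bad in ["price:minmax", "price:1.2.3"]:
    print(bad, bool(original.fullmatch(bad)), bool(stricter.fullmatch(bad)))

# Multi-range inputs from the question.
multi = [
    "price:4.45-8.00;10.45-14.50",
    "price:1.00-max;3-12;23.34-12.19",
    "price:1.00-2.50;min-12;23.34-max",
]
multi_ok = all(stricter.fullmatch(m) for m in multi)
print(multi_ok)  # True
```

One caveat worth knowing: because the repeated group uses * and ends in (?:;|$), the stricter pattern also accepts a bare price: and a trailing semi-colon, so add a further check if those should be rejected.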
price:((?:(?:\d+)?(?:\.)?\d+|min)-?(?:(?:\d+)?(?:\.)?\d+|max)?;?)+
I'm not sure what's up with all of the ?'s (I know the syntax, I just don't know why you're using it so much), but that should do it for you.