Paired characters in regular expression - c++

I expect this is very easy, but I can't work out how to match optional character pairs in a regex. I've never had to write regular expressions before.
I want to match "=N", "=B", "=R" or "=Q" in a character string, optionally; but if one of these letters appears, it must be paired with the equal sign. So =?[NBRQ]? won't work for me, because someone could type 'N' without the accompanying equal sign. It must be "=N", "=B", "=R", "=Q", or nothing at all.

If you need to make more than one regex production optional, enclose them in parentheses, capturing or non-capturing:
(=[NBRQ])?
The above would match an optional =N, =B, =R, or =Q. Since the question mark appears after parentheses, the entire group is optional, not its individual parts.
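A minimal sketch of the difference, assuming C++11 `std::regex` with its default ECMAScript grammar; the function names are hypothetical. It checks the optional part on its own with `regex_match`; in a real pattern the group would sit right after whatever precedes it.

```cpp
#include <regex>
#include <string>

// Accepts "", "=N", "=B", "=R" or "=Q". The ? applies to the whole group,
// so '=' and the letter are optional only as a pair.
bool isOptionalPair(const std::string& s) {
    static const std::regex paired("(=[NBRQ])?");
    return std::regex_match(s, paired);
}

// The pattern from the question: each part is optional on its own,
// so a bare letter such as "N" (or a bare "=") is also accepted.
bool isNaive(const std::string& s) {
    static const std::regex naive("=?[NBRQ]?");
    return std::regex_match(s, naive);
}
```

For example, `isOptionalPair("N")` is false while `isNaive("N")` is true. Writing `(?:=[NBRQ])?` gives the same result without creating a capture.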

Related

htaccess regular expression explanation

I have been tasked with changing an .htaccess file. Unfortunately, I know very little about regular expressions, and so most of the file is unreadable for me. In particular, I have these two REs...
1: ^(?!((www|web3|web4|web5|web6|cm|test)\.mydomain\.com)|(?:(?:\d+\.){3}(?:\d+))$).*$
2: ^/([^/][^/])/([^/][^/])/([^/]+)/Job-Posting/$ /Misc/jobposting\.asp\?country=$1&state=$2&city=$3
For the first one, I understand the first half, more or less. It's trying to match against something that ISN'T www.mydomain.com, or web3.mydomain.com, etc., and it may match that zero or one times. What I'm not clear on is what the second half does. My research suggests that ?: implies some sort of flag, but I didn't see any example that explained exactly what that means. Please explain what this part means, and provide an example that would match it.
For the second one, the comments say it applies to a URL containing /US/NY/Rochester/Job-Posting/. From this I infer that ^/ means one character, but again, I couldn't find that in my research so far. What is the formal definition of ^/? What is the significance of putting it in square brackets, [^/]?
If I can get a handle on these two RE I should be able to adapt them to my needs. Your help is much appreciated.
?: doesn't match anything in particular; it modifies the behavior of the parentheses. The ?: makes the parentheses non-capturing, so the group cannot be referenced in the rule (it gets no $1-style back-reference). Non-capturing parentheses are good to use when you don't need to reference the captured text, because the engine doesn't have to 'remember' it, which saves resources.
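To see that a non-capturing group gets no numbered slot, here is a small sketch with C++11 `std::regex`, where `match_results::size()` counts the whole match plus one entry per capturing group. `matchSlots` is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Number of sub-match slots after a full match: slot 0 is the whole match,
// plus one slot per capturing group. (?:...) groups still match text but
// never get a slot. Returns 0 when the input does not match.
std::size_t matchSlots(const std::string& pattern, const std::string& input) {
    std::smatch m;
    std::regex_match(input, m, std::regex(pattern));
    return m.size();
}
```

With the input "10.20", the pattern `(\d+)\.(\d+)` yields 3 slots, while `(?:\d+)\.(\d+)` yields 2: only the second number can be referenced.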
The code in question:
(?:(?:\d+\.){3}(?:\d+))
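matches one or more digits followed by a period, repeated three times, then one or more digits. This will match IP addresses (e.g. 127.0.0.1), but it will also match 123456.1.1.3456789, so you might want to restrict the number of digits allowed: (?:(?:\d{1,3}\.){3}(?:\d{1,3})). Note the escaped period; an unescaped . would match any character. Even this version accepts out-of-range values such as 999.999.999.999, so take it with a grain of salt.
Putting the two halves together: the (?!...) at the start is a negative lookahead, not an optional group. Rule 1 therefore matches any host that does not start with one of the listed mydomain.com names and is not an IP address.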
Info on non-capturing groups.
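A sketch using C++11 `std::regex`, whose ECMAScript grammar supports both `(?:...)` and `(?!...)`, to compare the loose and tightened IP patterns and to run rule 1 as a whole. It tests plain host strings only; how the web server feeds the host into the rule is not modeled. Function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Loose version from the .htaccess rule: any run of digits between the dots.
bool looseIp(const std::string& s) {
    static const std::regex re(R"((?:(?:\d+\.){3}(?:\d+)))");
    return std::regex_match(s, re);
}

// Tightened version: 1 to 3 digits per part, with the dot escaped.
// It still accepts out-of-range values such as 999.999.999.999.
bool tightIp(const std::string& s) {
    static const std::regex re(R"((?:(?:\d{1,3}\.){3}(?:\d{1,3})))");
    return std::regex_match(s, re);
}

// Rule 1 verbatim: the negative lookahead rejects hosts starting with one of
// the listed names, and whole-string IP addresses; .*$ then takes the rest.
bool rule1Matches(const std::string& host) {
    static const std::regex re(
        R"(^(?!((www|web3|web4|web5|web6|cm|test)\.mydomain\.com)|(?:(?:\d+\.){3}(?:\d+))$).*$)");
    return std::regex_match(host, re);
}
```

For instance, `rule1Matches("shop.mydomain.com")` is true, while `rule1Matches("www.mydomain.com")` and `rule1Matches("127.0.0.1")` are false.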
The second item revolves around using square brackets as a character set. Square brackets match any one of the characters listed inside them, and a ^ right after the opening bracket negates the set. So [ad02] will match any of the four characters a, d, 0 or 2, while [^ad02] will match any character that is not a, d, 0 or 2. So, inside brackets, [^/] means any character that is not /. Outside brackets ^ means something different: it anchors the match to the start of the string, so the leading ^/ in rule 2 means "a / at the very beginning".
One of the tricky things about square brackets is how many characters they match. [^/] matches exactly one character, and so does [ad02]: no matter how many characters are in the set, the brackets match a single character, and any quantifier after them applies to that one character. So [^/]{3} matches any 3 characters that are not forward slashes, while [^/]{2} matches a 2-character string with the same restriction. The [^/][^/] in rule 2 is equivalent to [^/]{2}.
For more info on character sets see Character Classes or Character Sets
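A sketch of what rule 2's pattern and its $1/$2/$3 substitution do, written with C++11 `std::regex` rather than Apache. It only mimics the regex itself; it does not reproduce mod_rewrite's own rules, such as how the path is presented to the pattern in an .htaccess file. `rewriteJobUrl` is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Mimics rule 2: two non-slash chars, two non-slash chars, then one or more
// non-slash chars, followed by /Job-Posting/. Captures become query parameters.
std::string rewriteJobUrl(const std::string& path) {
    static const std::regex rule(R"(^/([^/][^/])/([^/][^/])/([^/]+)/Job-Posting/$)");
    std::smatch m;
    if (!std::regex_match(path, m, rule)) {
        return path;  // leave non-matching paths unchanged
    }
    return "/Misc/jobposting.asp?country=" + m[1].str() +
           "&state=" + m[2].str() + "&city=" + m[3].str();
}
```

"/US/NY/Rochester/Job-Posting/" becomes "/Misc/jobposting.asp?country=US&state=NY&city=Rochester", whereas "/USA/NY/Rochester/Job-Posting/" is left alone because "USA" is three characters, not two.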

Regex for extracting qmake variables

I'm trying to write the QRegExp for extracting variable names from qmake project code (*.pro files).
Variable usage has two forms:
$$VAR
$${VAR}
So, my regular expression must handle both cases.
I'm trying to write the expression like this:
\$\$\{?(\w+)\}?
But it does not work as expected: for the string $$VAR I get a $$V match, with greedy matching disabled (QRegExp::setMinimal(true)). As I understood it, greedy mode can lead to wrong results in my case.
So, what am I doing wrong?
Or should I just use greedy mode and not worry about this behavior? :)
P.S. Variable names can't contain spaces or other "special" symbols, only letters.
You do not need to disable greedy matching. If greedy matching is disabled, the minimal match that satisfies your expression is returned. In your example, there's no need to match the AR, because $$V satisfies your expression.
So turn minimal mode back off (setMinimal(false), i.e. greedy matching) and use
\$\$(\w+|\{\w+\})
This matches two dollar signs, followed by either a run of word characters or a run of word characters between braces. If you can trust your data not to contain unmatched braces, your original expression should work just as well in greedy mode.
In ASCII terms, \w is equivalent to [A-Za-z0-9_], so it matches digits, upper- and lowercase letters, and the underscore (QRegExp's \w also accepts non-ASCII letters and digits). If you want to restrict this to letters only, use [A-Za-z] instead.
Since variable names cannot contain special characters, there's no danger of matching too much, unless a variable can be followed directly by more regular characters, in which case the boundary is ambiguous.
For instance, if the data contains a string like Buy our new $$Varbuster!, where $$Var is supposed to be the variable, there is no regular expression that will separate the variable from the rest of the string.
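A sketch of the answer's pattern using C++11 `std::regex` in place of QRegExp; this is a swap to keep the example self-contained, not Qt's API. `std::regex` quantifiers are greedy by default, which is the mode the answer recommends. `qmakeVariables` is a hypothetical helper that strips the braces from the `$${VAR}` form after matching.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects qmake variable names written as $$VAR or $${VAR}.
// Greedy \w+ consumes the whole name instead of stopping after one letter.
std::vector<std::string> qmakeVariables(const std::string& text) {
    static const std::regex var(R"(\$\$(\w+|\{\w+\}))");
    std::vector<std::string> names;
    for (std::sregex_iterator it(text.begin(), text.end(), var), end; it != end; ++it) {
        std::string name = (*it)[1].str();
        if (name.front() == '{') {
            name = name.substr(1, name.size() - 2);  // drop the surrounding braces
        }
        names.push_back(name);
    }
    return names;
}
```

For the input `SOURCES += $$PWD/main.cpp $${TARGET}.o` this returns "PWD" and "TARGET".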

regex mandatory character within character class

I'm having a bit of trouble creating this regular expression. I'm not sure how to make the comma required but still part of an optional class.
^[0-9]+[,[0-9]+]?$
I'm trying to do:
starts with number(s)
optionally has
comma AND
additional number(s)
What I can't figure out is how to make the comma and the second set of numbers optional, while still requiring the comma if the second set of numbers exists.
Could someone explain how this would be done?
Use a group, denoted with a pair of parentheses:
^[0-9]+(,[0-9]+)?$
The question mark quantifier then applies to the whole group, not just the previous atom.
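Very close. For the second part you need "zero or one of", so (,[[:digit:]]+)? (a POSIX class such as [:digit:] only works inside a bracket expression, and not every engine supports it).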
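A quick check of the grouped pattern with C++11 `std::regex`, whose ECMAScript grammar also accepts POSIX classes inside brackets; the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Digits, optionally followed by a comma AND more digits. The ? applies to
// the whole (,[0-9]+) group, so a comma can never appear on its own.
bool commaNumber(const std::string& s) {
    static const std::regex re("^[0-9]+(,[0-9]+)?$");
    return std::regex_match(s, re);
}

// Same idea with the POSIX class, which must sit inside a bracket expression.
bool commaNumberPosix(const std::string& s) {
    static const std::regex re("^[[:digit:]]+(,[[:digit:]]+)?$");
    return std::regex_match(s, re);
}
```

Both accept "123" and "123,45" and reject "123,", ",45" and "123,45,6".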

Regex to check if a string contains at least A-Za-z0-9 but not an &

I am trying to check if a string contains at least A-Za-z0-9 but not an &.
My experience with regexes is limited, so I started with the easy part and got:
.*[a-zA-Z0-9].*
However, I am having trouble combining this with the "does not contain an &" part.
I was thinking along the lines of ^(?=.*[a-zA-Z0-9].*)(?![&()]).* but that does not seem to do the trick.
Any help would be appreciated.
I'm not sure if this what you meant, but here is a regular expression that will match any string that:
contains at least one alphanumeric character
does not contain a &
This expression ensures that the entire string is always matched (the ^ and $ at beginning and end), and that none of the characters matched are a "&" sign (the [^&]* sections):
^[^&]*[a-zA-Z0-9][^&]*$
However, it might be clearer in code to simply perform two checks, if you are not limited to a single expression.
Also, check out the \w class. Depending on the engine, it might be the better choice for catching alphanumeric characters if you want to allow non-ASCII characters; note that \w also matches the underscore.
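A sketch of both approaches, assuming C++11 `std::regex` and single-line input (`.` does not match newlines there). The lookahead variant also shows why the question's attempt fails: `(?![&()])` only inspects the first character, while `(?!.*&)` scans the whole string. The question also excluded ( and ); add them to the negated classes if needed. Function names are hypothetical.

```cpp
#include <regex>
#include <string>

// The answer's single pattern: no '&' anywhere, at least one ASCII
// letter or digit somewhere in the string.
bool singlePattern(const std::string& s) {
    static const std::regex re("^[^&]*[a-zA-Z0-9][^&]*$");
    return std::regex_match(s, re);
}

// Lookahead version: both conditions are checked against the whole string.
bool lookaheadPattern(const std::string& s) {
    static const std::regex re("^(?=.*[a-zA-Z0-9])(?!.*&).*$");
    return std::regex_match(s, re);
}

// Two separate checks, as the answer suggests; often easier to read.
bool twoChecks(const std::string& s) {
    static const std::regex alnum("[a-zA-Z0-9]");
    return s.find('&') == std::string::npos && std::regex_search(s, alnum);
}
```

All three accept "abc" and "  x  " and reject "a&b", "!!!" and the empty string.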

Regular expression for parsing string inside ""

<A "SystemTemperatureOutOfSpec" >
What regular expression would extract the string inside the double quotes? In the sample above it is 'SystemTemperatureOutOfSpec'.
In JavaScript, this regexp:
/"([^"]*)"/
Example:
> /"([^"]*)"/.exec('<A "SystemTemperatureOutOfSpec" >')[1]
"SystemTemperatureOutOfSpec"
Similar patterns should work in a bunch of other programming languages.
Try this (a C#-style string literal, with the inner quotes escaped):
string Exp = "\"([^\"]*)\"";
I am not sure I understand your question well but if you need to match everything between double quotes, here it is: /(?<=").*?(?=")/s
(?<=<A\s")(?<content>.*)(?="\s>)
Regular expressions don't get much easier than this, so you should be able to solve it by yourself. Here's how you go about doing that:
The first step is to try to define as precisely as possible what you want to find. Let's start with this: you want to find a quote, followed by some number of characters other than a quote, followed by a quote. Is that correct? If so, our pattern has three parts: "a quote", "some characters other than a quote", and "a quote".
Now all we need to do is figure out what the regular expressions for those patterns are.
A quote
For "a quote", the pattern is literally ". Regular expressions have special characters you have to be aware of (*, ., etc.). Anything that's not a special character matches itself, and " is not special. For a complete list of special characters for your language, see the documentation.
Characters other than a quote
So now the question is, how do we match "characters other than a quote"? That calls for a character class, loosely called a range: square brackets around a list of allowed characters. If the list begins with ^, it is a list of characters that are not allowed. We want any character other than a quote, so that means [^"].
"Some"
That range just means any one of the characters in the range, but we want "some". "Some" usually means either zero-or-more, or one-or-more. You can place * after a part of an expression to mean zero-or-more of that part. Likewise, use + to mean one-or-more (and ? means zero-or-one). There are a few other variations, but that's enough for this problem.
So, "some characters other than a quote" is the range [^"] (any character other than a quote) followed by * (zero-or-more). Thus, [^"]*
Putting it all together
This is the easy part: just combine all the pieces. A quote, followed by some characters other than a quote, followed by a quote, is "[^"]*".
Capturing the interesting part
The pattern we have will now match your string. What you want, however, is just the part inside the quotes. For that you need a "capturing group", which is denoted by parentheses. To capture part of a regular expression, put it in parentheses. So, if we want to capture everything but the opening and closing quotes, the pattern becomes "([^"]*)".
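The finished pattern, translated to C++11 `std::regex` as a minimal sketch; `quotedText` is a hypothetical name and returns only the first quoted string. A custom raw-string delimiter (`re`) is used because the pattern itself contains `)"`.

```cpp
#include <regex>
#include <string>

// Returns the text between the first pair of double quotes, or "" if none.
// Pattern: a quote, then any run of non-quote characters (captured), then a quote.
std::string quotedText(const std::string& s) {
    static const std::regex re(R"re("([^"]*)")re");
    std::smatch m;
    return std::regex_search(s, m, re) ? m[1].str() : std::string();
}
```

On `<A "SystemTemperatureOutOfSpec" >` this returns SystemTemperatureOutOfSpec. Because the pattern uses `*` rather than `+`, an empty pair of quotes also matches and yields an empty string.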
And that's how you learn regular expressions. Break your problem down into a precise statement composed of short sequences of characters, figure out the regular expression for each sequence, then put it all together.
The pattern in this answer may not be the perfect answer for you. There are some edge cases to worry about. For example, you may only want to match a quote following a non-word character, or only quotes at the beginning or end of a word. That's all possible, but it is highly dependent on your exact problem. Figuring out how to do that is just as easy, though: decide what you want, then look at the documentation to see how to accomplish it.
Spend one day practicing on regular expressions and you'll never have to ask anyone for help with regular expressions for the rest of your career. They aren't hard, but they do require concentrated study.
Are you sure you need regular expression matching here? Looking at your "string", you might be better off using an XML parser.