regex mandatory character within character class - regex

I'm having a bit of trouble creating this regular expression. I'm not sure how to make the comma required while keeping it inside an optional part of the pattern.
^[0-9]+[,[0-9]+]?$
I'm trying to do:
starts with number(s)
optionally has
comma AND
additional number(s)
What I can't figure out is how to make the comma and 2nd set of numbers optional, but, if the second set of numbers exists then the comma is required.
Could someone explain how this would be done?

Use a group, denoted with a pair of parentheses:
^[0-9]+(,[0-9]+)?$
The question mark quantifier then applies to the whole group, not just the previous atom.
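To see the group in action, here is a minimal Python sketch using the standard `re` module. The helper name `is_valid` and the sample strings are made up for illustration and are not part of the original answer.

```python
import re

# Pattern from the answer: digits, then an optional ",digits" group.
PATTERN = re.compile(r"^[0-9]+(,[0-9]+)?$")


def is_valid(text: str) -> bool:
    """Return True for digits optionally followed by a comma and more digits."""
    return PATTERN.match(text) is not None


# Illustrative inputs (hypothetical): valid forms first, then broken ones.
for sample in ["123", "123,456", "123,", ",456", "1,2,3"]:
    print(f"{sample!r}: {is_valid(sample)}")
```

Because the `?` applies to the whole `(,[0-9]+)` group, a trailing comma with no digits after it is rejected, and so is a second comma.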

Very close. For the second part, you need "zero or one of", so (,[[:digit:]]+)? (note that the POSIX class [:digit:] is only valid inside a bracket expression).

Related

Dynamic class operations within a regular expression

I am trying to write a regex which excludes certain characters from a class based on the current content of capturing groups. The specific task that made me look for such a thing was to match lowercase letters in alphabetical order.
I searched through Rex's page (https://www.rexegg.com/regex-class-operations.html) to see if there was any way to change the class' content, but was unable to find anything.
Take the following attempt as a brief example: ([a-z])[a-z--[\1]]
Though it's not a correct regular expression, it demonstrates the concept. The idea is that it would match two letters that are not the same.
Note: the expression shown follows a Python-like syntax, and can also be written as:
([a-z])[a-z&&[^\1]] or ([a-z])(?![\1])[a-z]
But I am going to use the Python syntax.
In the examples above the nested brackets are optional (in certain engines), but for the ultimate goal they are necessary. The pattern I am trying to match the ordered letters with would be something like this:
^(?:([a-z])([a-z--[a-(?(2)\2|\1)]])*+)?$
The first character class matches a letter, which is immediately captured by the first group, meaning that the letter will be excluded from the class containing the conditional. The first time the second group tries to match, the condition inside the conditional evaluates to false, since there has not been a second capture yet, so it "matches" the first group's content, which should result in the exclusion of the first letter from the class. In later steps the second group will be set, meaning that all the letters between 'a' and the most recently captured letter will be excluded.
I know, it seems complicated. Maybe refactoring the pattern will help, take a look at this one:
^(?:([a-z])([(?(2)\2|\1)-z])*+)?$
This example makes no use of set operations, but the idea is roughly the same. The first group matches a letter, then the class inside the second group matches anything between the captured letter and 'z', which is noted by the [(?(2)\2|\1)-z] part. The conditional is there to ensure that the lower boundary of the character interval is the most recently captured character.
This could also be written using subroutine calls, but I doubt it would solve the problem. The issue might be that the classes are precompiled (and so are subroutines), so they cannot change during the matching process.
Are you guys aware of a workaround or an engine that supports such operations? I am interested in the dynamic class operation itself rather than a different way to match alphabetically ordered letters.
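For reference, here is a small Python sketch of the lookahead form of the "two different letters" idea mentioned above. It does not provide a dynamic class; it only shows, under the assumption that Python's standard `re` module is the engine, that a backreference works outside brackets while `\1` inside brackets is read as a numeric character escape. The variable names are made up for illustration.

```python
import re

# Two lowercase letters that differ: the lookahead rejects a repeat of group 1.
different_pair = re.compile(r"([a-z])(?!\1)[a-z]")

# Inside brackets, Python's re treats \1 as the character with code 1,
# not as a reference to group 1, so the class content is fixed at compile time.
literal_class = re.compile(r"[\1]")

print(bool(different_pair.fullmatch("ab")))
print(bool(different_pair.fullmatch("aa")))
print(bool(literal_class.fullmatch("\x01")))
```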

Paired characters in regular expression

I expect this is very easy, but I can't work out how to match optional character pairs in regex. Regular expressions are not something I have ever had to do before.
I want to be able to match "=N","=B","=R" or "=Q" in a character string, optionally -- but if they appear, they must appear paired with the equal sign. So =?[NBRQ]? won't work for me, because someone could type 'N' without the accompanying equal sign. So it must be "=N","=B", "=R" or "=Q" or nothing at all.
If you need to make more than one regex production optional, enclose them in parentheses, capturing or non-capturing:
(=[NBRQ])?
The above would match an optional =N, =B, =R, or =Q. Since the question mark appears after parentheses, the entire group is optional, not its individual parts.
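A quick way to check this is a short Python sketch. The surrounding pattern `[a-h][18]` (a destination square) is purely an assumption added so the optional group has something to attach to; only `(=[NBRQ])?` comes from the answer.

```python
import re

# Hypothetical context: a square such as "e8", then the optional "=X" suffix.
MOVE = re.compile(r"[a-h][18](=[NBRQ])?")


def is_move(text: str) -> bool:
    """Return True when the whole string is a square plus an optional =N/=B/=R/=Q."""
    return MOVE.fullmatch(text) is not None


for sample in ["e8", "e8=Q", "e8Q", "e8=", "e8=K"]:
    print(f"{sample!r}: {is_move(sample)}")
```

The letter alone (`e8Q`) and the equal sign alone (`e8=`) both fail, because the group is taken as a unit or not at all.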

Regex matching order numbers e.g DE + 10 numbers or AT +10 numbers

I'm actually trying to match some order numbers in a string.
The string could look like this:
SDSFwsfcwqrewrPL0000018604ergerzergdsfa
or
FwsfcwqrewrAT0000018604ergerzergdsfaD
and I need to match the "PL0000018604" or "AT0000018604".
Currently I'm using something like this, and it works:
.?(AT[0-9]{10})|(BE[0-9]{10})|(FR[0-9]{10})|(IT[0-9]{10})
But the more order prefixes we get, the longer the expression will be.
It is always 2 uppercase chars followed by 10 digits and I want to specify the different uppercase chars.
Is there any shorter version?
Thanks for your help :)
If the prefixes must be specific, there's not much of a way to make the pattern shorter. Though, you can collect all the prefixes at the front of the expression so you only have to have the numeric part once.
for example:
(AT|BE|FR|IT)[0-9]{10}
Depending on how you call it, if you need the whole expression to be captured as a group (versus simply matching, which is what the question asked about), you can add parentheses around the whole expression. That doesn't change what is matched, but it will change what is returned by whatever function uses the expression.
((AT|BE|FR|IT)[0-9]{10})
And, of course, if you also want the number part to be captured as a separate group, you can add more parentheses:
((AT|BE|FR|IT)([0-9]{10}))
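As a sketch of how the groups come back in practice, the Python snippet below builds the same pattern from a list of prefixes, so adding a prefix means editing the list rather than the expression. The names `PREFIXES` and `find_order` are hypothetical, and the prefix list is the one from the answer (so `PL` is not included).

```python
import re

# Assumed prefix list; extend it as new order prefixes appear.
PREFIXES = ["AT", "BE", "FR", "IT"]

# Equivalent to ((AT|BE|FR|IT)([0-9]{10})), assembled from the list.
ORDER = re.compile(
    "((" + "|".join(re.escape(p) for p in PREFIXES) + ")([0-9]{10}))"
)


def find_order(text: str):
    """Return (whole, prefix, digits) for the first order number, or None."""
    match = ORDER.search(text)
    return match.groups() if match else None


print(find_order("FwsfcwqrewrAT0000018604ergerzergdsfaD"))
print(find_order("SDSFwsfcwqrewrPL0000018604ergerzergdsfa"))
```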

htaccess regular expression explanation

I have been tasked with changing an .htaccess file. Unfortunately, I know very little about regular expressions, and so most of the file is unreadable for me. In particular, I have these two REs...
1: ^(?!((www|web3|web4|web5|web6|cm|test)\.mydomain\.com)|(?:(?:\d+\.){3}(?:\d+))$).*$
2: ^/([^/][^/])/([^/][^/])/([^/]+)/Job-Posting/$ /Misc/jobposting\.asp\?country=$1&state=$2&city=$3
For the first one, I understand the first half, more or less. It's trying to match against something that ISN'T www.mydomain.com, or web3.mydomain.com, etc., and that it may match that zero or one times. What I'm not clear on is what the second half of that does. My research suggests that ?: implies some sort of flag, but I didn't see any example that explained what exactly that meant. Please explain what this part means, as well as provide an example that would match it.
For the second one, the comments say this is applicable for a URL containing /US/NY/Rochester/Job-Posting/. From this I can infer that ^/ means one character, but again, I couldn't find that in my research so far. What is the formal definition of ^/ ? What is the significance of putting it into square brackets [^/] ?
If I can get a handle on these two RE I should be able to adapt them to my needs. Your help is much appreciated.
?: doesn't match anything in particular; it modifies the behavior of the parentheses. The ?: means the parentheses are non-capturing, and thus cannot be referenced in the rule. Non-capturing parentheses are good to use when you don't need to reference the captured text, because the system doesn't have to 'remember' the text, which saves resources.
The code in question:
(?:(?:\d+\.){3}(?:\d+))
matches one or more digits followed by a period, three times, then one or more digits. This will match IP addresses (e.g. 127.0.0.1). It will also match 123456.1.1.3456789, so you might want to restrict the number of digits allowed: (?:(?:\d{1,3}\.){3}(?:\d{1,3})), though I haven't tested this, so take it with a grain of salt.
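The difference between the loose and the restricted form can be checked with a short Python sketch. The variable names are made up for illustration; the third pattern leaves the dot unescaped to show why `\.` matters, since a bare `.` matches any character.

```python
import re

# Original form: any number of digits per part.
loose = re.compile(r"(?:(?:\d+\.){3}(?:\d+))")

# Restricted form: one to three digits per part, with the dot escaped.
tight = re.compile(r"(?:(?:\d{1,3}\.){3}(?:\d{1,3}))")

# Same restriction but with an unescaped dot, which matches any character.
unescaped = re.compile(r"(?:(?:\d{1,3}.){3}(?:\d{1,3}))")

for sample in ["127.0.0.1", "123456.1.1.3456789", "1a2b3c4"]:
    print(
        sample,
        bool(loose.fullmatch(sample)),
        bool(tight.fullmatch(sample)),
        bool(unescaped.fullmatch(sample)),
    )
```

Note that even the restricted form only limits the digit count; it still accepts values such as 999.999.999.999.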
Info on non capturing groupings.
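The second item revolves around using square brackets as a character set. Square brackets match anything noted inside them, with a leading ^ negating the match. So [ad02] will match any of the four characters a, d, 0 or 2, while [^ad02] will match any character that is not a, d, 0, or 2. So [^/] means any single character that is not /. Outside square brackets ^ has a different meaning: it anchors the match to the start of the string, so the leading ^/ in your rule matches a literal / at the very beginning.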
One of the tricky things about square brackets is the number of items they will match. [^/] will match one character, but so does [ad02]. It doesn't matter how many characters you have in the set, it still obeys the modifiers on the brackets. So [^/]{3} will match any series of 3 characters that does not contain a forward slash, while [^/]{2} will match a 2 character string with the same restriction.
For more info on character sets see Character Classes or Character Sets
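To make the second rule concrete, here is a Python sketch that applies the same pattern to the sample URL from the question and fills in the target by hand. This only imitates what the rewrite rule does with $1, $2 and $3; the function name `rewrite` is hypothetical and the real substitution is performed by the web server, not by this code.

```python
import re

# Pattern from rule 2: two non-slash chars, two non-slash chars, one or more.
RULE = re.compile(r"^/([^/][^/])/([^/][^/])/([^/]+)/Job-Posting/$")


def rewrite(path: str):
    """Return the rewritten target for a matching path, or None."""
    match = RULE.match(path)
    if match is None:
        return None
    country, state, city = match.groups()  # correspond to $1, $2, $3
    return f"/Misc/jobposting.asp?country={country}&state={state}&city={city}"


print(rewrite("/US/NY/Rochester/Job-Posting/"))
print(rewrite("/USA/NY/Rochester/Job-Posting/"))
```

The second call returns None because `[^/][^/]` matches exactly two characters, so a three-letter country segment does not fit.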

getting at least 1 of 2 zero or more sets with a regular expression

How would I write a regular expression that allows for zero or more of one group, and zero or more of another group, but at least one of the two groups has to exist?
Specifically, I want to get a spreadsheet like reference, so it should get A1:B5 (for a whole region), A:A (for a whole column), or 5:5 (for a whole row).
I first tried
[A-Za-z]*[\d]*:[A-Za-z]*[\d]*
but this wouldn't be sufficient because then simply typing : or B6: would also satisfy that criteria.
Any help would be appreciated.
You can do this with grouping...
/((how)|(now))+/
If you want to match a range but not a cell reference, you could just enumerate the ways to do that:
([A-Z]:[A-Z])|(\d+:\d+)|([A-Z]\d+:[A-Z]\d+)
One way would be an explicit alternation:
(?:[a-zA-Z]+|\d+|[a-zA-Z]+\d+):(?:[a-zA-Z]+|\d+|[a-zA-Z]+\d+)
If your engine supports lookbehind, however, you could use that:
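(?>[a-zA-Z]*\d*(?<=.)):(?>[a-zA-Z]*\d*(?<=.))
This says "zero or more letters, followed by zero or more numbers, which must end in at least one character (.)". The atomic grouping (?>...) keeps the engine from backtracking into the group once it has matched. Note that the lookbehind (?<=.) can still see a character that precedes the group, so an empty half that follows another character (such as the empty second half of B6:, which follows the colon) passes the check; test the pattern against the inputs you need to reject.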
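The explicit alternation can be exercised with a small Python sketch. It uses `fullmatch` so the whole string must be a reference; with an unanchored search the alternation could stop early (for example at `A1:B` inside `A1:B5`). The second pattern, which puts a lookahead in front of each half, is my own assumption and not taken from the answers; the names `ALTERNATION`, `LOOKAHEAD` and `is_reference` are hypothetical.

```python
import re

# Explicit alternation from the answer: letters, digits, or letters+digits per side.
ALTERNATION = re.compile(
    r"(?:[a-zA-Z]+|\d+|[a-zA-Z]+\d+):(?:[a-zA-Z]+|\d+|[a-zA-Z]+\d+)"
)

# Assumed variant: each half must start with a letter or digit, so it is non-empty.
LOOKAHEAD = re.compile(
    r"(?=[a-zA-Z\d])[a-zA-Z]*\d*:(?=[a-zA-Z\d])[a-zA-Z]*\d*"
)


def is_reference(text: str, pattern: re.Pattern = ALTERNATION) -> bool:
    """Return True when the whole string is a range such as A1:B5, A:A or 5:5."""
    return pattern.fullmatch(text) is not None


for sample in ["A1:B5", "A:A", "5:5", ":", "B6:"]:
    print(sample, is_reference(sample), is_reference(sample, LOOKAHEAD))
```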