regular expression help: break on dash...or not - regex

Doing URL rewrites in IIS8, and need help with a regular expression.
In a URL like this:
mysite.com/wichita-2014-Honda-Ridgeline
The following expression works great
^wichita-([^/-]+)-([^/-]+)-([^/-]+)$
That splits down to an easy
/index.php?year={R:0}&make={R:1}&model={R:2}
Works great. It breaks on
mysite.com/wichita-2014-Honda-CR-V
Note the new dash in "CR-V". The solution so far has been to have an exception rule preceding the more general rule:
^wichita-([^/-]+)-([^/-]+)-CR-V$
I've tried
^wichita-([^/-]+)-([^/-]+)-[a-zA-Z0-9\-]*$
But that returns three matches and R:0 is the whole URL! Obviously, that's not what I want.
I have to imagine there's a better way. I just haven't found it yet, and I'm not strong with regular expressions.
So, to sum up: if possible, I'm looking for a regular expression that:
Matches on three terms: year, make, model (just to give them names)
Those three search terms are separated by a dash.
Term 3 might ALSO contain a dash (or a space, expressed in the URL as a plus sign), which should be taken as part of the term and not a separator.
Help please?

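I'd use ^wichita-([^/-]+)-([^/-]+)-(.+)$ and map the capture groups R:1, R:2, and R:3 to /index.php?year={R:1}&make={R:2}&model={R:3}. In IIS URL Rewrite, {R:0} is always the entire matched string, so the capture groups start at {R:1}.
The .+ captures everything after the first two capture groups, dashes and plus signs included, into an 'anything goes' last bucket.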
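As a quick check outside IIS, the pattern can be exercised in Python. This is only a sketch: the rewrite helper is a hypothetical name, and it assumes the input is the URL path without the leading slash, which is how the examples above are written.

```python
import re

# Pattern from the answer above: two dash-free terms, then "anything goes".
PATTERN = re.compile(r"^wichita-([^/-]+)-([^/-]+)-(.+)$")


def rewrite(path):
    """Return the rewritten URL, or None when the path does not match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # m.group(0) is the whole match (like {R:0}); groups 1..3 are the terms.
    year, make, model = m.groups()
    return f"/index.php?year={year}&make={make}&model={model}"


if __name__ == "__main__":
    for path in ("wichita-2014-Honda-Ridgeline", "wichita-2014-Honda-CR-V"):
        print(path, "->", rewrite(path))
```

Note that (.+) also accepts a slash; if that matters, ([^/]+) is a stricter last group.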

Related

Which regex could match those values?

I'm developing new specific syntax. Within it there are two kinds of code:
I: = or + or - (one or several plus, minus or equal signs in a row);
Regex for that is /[+=-]+/.
II: 6:+ or 15:- or 999:= (any integer, followed by one plus, minus or equal sign);
Regex for that is /\d+:[+=-]/.
In one entry there may be any amount of any of these tokens.
Each new entry has to be surrounded by brackets: [code here].
Kinds of code in brackets may stand next to each other: [=6:+-] or [15:-++=3:+] etc.
Empty entries are not allowed.
So, I can't make a regex to match proper entries!
I've tried this one /\[([=+-]*(\d+:[=+-])?[=+-]*)\]/, but it matches [] as well, which is an error.
MATCH any of those
[=] [---] [+=-] [=+-] [17:=] [==+-] [6:=-] [+5:=-]
[==-=+] [+=====-] [15:-++=3:+] [=======] [+=-+==-] [---==--] [==-=+==] [=--==--]
NO MATCH
[] [=:1] [:2+] [3-:]
I don't know what flavor of regex you're using, but this should work for pretty much all of them:
\[((?:[+=-]+|[+=-]?\d+:[+=-]+)+)\]
It uses the | alternation operator, so each repetition matches either a run of +, = and - signs or a number followed by a colon and one or more signs.
Also, since you want [+5:=-] to match, I added a [+=-]? in front of the number to allow for that.
EDIT:
The outer group is repeated, so an entry may contain several tokens of either kind. This, however, may be trivial as there is nothing to distinguish between separate parts of code.
OMG, it could be way more simple:
\[(?:(?:\d+:)?[+=-])+\]
I can't believe I was so stupid.
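Both patterns can be checked against the sample entries from the question. The sketch below uses Python's re module; the accepts helper and the constant names are made up for illustration.

```python
import re

# The two patterns proposed above.
ALTERNATION = re.compile(r"\[((?:[+=-]+|[+=-]?\d+:[+=-]+)+)\]")
COMPACT = re.compile(r"\[(?:(?:\d+:)?[+=-])+\]")

# Sample entries copied from the question.
SHOULD_MATCH = (
    "[=] [---] [+=-] [=+-] [17:=] [==+-] [6:=-] [+5:=-] "
    "[==-=+] [+=====-] [15:-++=3:+] [=======] [+=-+==-] [---==--] [==-=+==] [=--==--]"
).split()
SHOULD_FAIL = "[] [=:1] [:2+] [3-:]".split()


def accepts(pattern, entry):
    """True when the whole entry, brackets included, matches the pattern."""
    return pattern.fullmatch(entry) is not None


if __name__ == "__main__":
    for pattern in (ALTERNATION, COMPACT):
        bad = [e for e in SHOULD_MATCH if not accepts(pattern, e)]
        bad += [e for e in SHOULD_FAIL if accepts(pattern, e)]
        print(pattern.pattern, "unexpected results:", bad)
```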

Multiple spaces, multiple commas and multiple hypens in alphanumeric regex

I am very new to regular expressions, and I am stuck in a situation where I want to apply a regex to a JSF input field.
Where
alphanumeric
multiple spaces
multiple dots (.)
multiple hyphens (-)
are allowed, and each value must be at least 1 and at most 5 characters long.
And for multiple values - they must be separated by comma (,)
So a Single value can be:
3kd-R
or
k3
or
-4
And multiple values (must be comma separated):
kdk30,3.K-4,ER--U,2,.I3,
With the help of Stack Overflow, so far I have only been able to achieve this:
(^[a-zA-Z0-9 ]{5}(,[a-zA-Z0-9 ]{5})*$)
Something like
^[-.a-zA-Z0-9 ]{1,5}(,[-.a-zA-Z0-9 ]{1,5})*$
Changes made
[-.a-zA-Z0-9 ] Added - and . to the character class so that those are matched as well.
{1,5} Quantifier, ensures that it is matched minimum 1 and maximum 5 characters
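A small Python sketch of how the pattern behaves on the values from the question (is_valid is a hypothetical helper). One thing it makes visible: the multi-value sample in the question ends with a comma, and this pattern rejects a trailing comma because every comma must be followed by another 1 to 5 character value.

```python
import re

# Pattern from the answer above: 1-5 allowed characters per value,
# with further values separated by single commas.
VALUES = re.compile(r"^[-.a-zA-Z0-9 ]{1,5}(,[-.a-zA-Z0-9 ]{1,5})*$")


def is_valid(text):
    """True when the whole field is a comma-separated list of valid values."""
    return VALUES.match(text) is not None


if __name__ == "__main__":
    for sample in ("3kd-R", "k3", "-4", "kdk30,3.K-4,ER--U,2,.I3", "kdk30,", "toolong"):
        print(repr(sample), is_valid(sample))
```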
You've done pretty well. You need to add the hyphen and the dot to that first character class. Note: because the hyphen denotes ranges within a character class, you need to position it where it cannot possibly specify a range. That does not mean putting it where the range would merely be invalid, e.g., 7-., but where it positionally cannot form a range at all, i.e., first or last. So your first character class would look something like this:
[a-zA-Z 0-9.-]{1,5} or [-a-zA-Z0-9 .]{1,5}
So, we've just defined what one segment looks like. That pattern can reoccur zero or more times. Of course, there are many ways to do that, but I would favor a regex subroutine because this allows code reuse. Now if the specs change or you're testing and realize you have to tweak that segment pattern, you only need to change it in one place.
Subroutines are not supported in BRE or ERE, but most widely-used modern regex engines support them (Perl, PCRE, Ruby, Delphi, R, PHP). They are very simple to use and understand. Basically, you just need to be able to refer to it (sound familiar? refer-back? back-reference?), so this means we need to capture the regex we wish to repeat. Then it's as simple as referring back to it, but instead of \1 which refers to the captured value (data), we want to refer to it as (?1), the capturing expression. In doing so, we've logically defined a subroutine:
([a-zA-Z 0-9.-]{1,5})(,(?1))*
So, the first group basically defines our subroutine and the second group consists of a comma followed by the same segment-definition expression we used for the first group, and that is optional ('*' is the zero-or-more quantifier).
If you operate on large quantities of data where efficiency is a consideration, don't capture when you don't have to. If your sole purpose for using parentheses is to alternate (e.g., \b[bB](asset|eagle)\b hound) or to quantify, as in our second group, use the (?: ... ) notation, which signifies to the regex engine that this is a non-capturing group. Without going into great detail, there is a lot of overhead in maintaining the match locations--not that it's complex, per se, just potentially highly repetitive. Regex engines will match, store the information, then when the match fails, they "give up" the match and try again starting with the next matching substring. Each time they match your capture group, they're storing that information again. Okay, I'm off the soapbox now. :-)
So, we're almost there. I say "almost" because I don't have all the information. But if this should be the sole occupant of the "subject" (line, field, etc.--the data sample you're evaluating), you should anchor it to "assert" that requirement. The caret '^' is beginning of subject, and the dollar '$' is end of subject, so by encapsulating our expression in ^ ... $ we are asserting that the subject matches in its entirety, front-to-back. These assertions have zero length; they consume no data, only assert a relative position. You can operate on them, e.g., s/^/  / would indent your entire document two spaces. You haven't really substituted the beginning of line with two spaces, but you're able to operate on that imaginary, zero-length location. (Do some research on zero-length assertions [aka zero-width assertions, or look-arounds] to uncover a powerful feature of modern regex. For example, in the previous regex if I wanted to make sure I did not insert two spaces on blank lines: s/^(?!$)/  /)
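The two substitutions mentioned above can be tried in Python, where re.MULTILINE plays the role of per-line anchoring. This is a sketch; the function names and sample text are invented for illustration.

```python
import re


def indent_all(text):
    """Insert two spaces at the zero-width start-of-line position of every line."""
    return re.sub(r"^", "  ", text, flags=re.MULTILINE)


def indent_nonblank(text):
    """Same, but the negative lookahead (?!$) skips lines that are empty."""
    return re.sub(r"^(?!$)", "  ", text, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = "first\n\nsecond"
    print(repr(indent_all(sample)))
    print(repr(indent_nonblank(sample)))
```

Neither call replaces any character of the input; both only insert text at positions the assertions identify.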
Also, you didn't say if you need to capture the results to do something with them. My impression was it's validation only, so that's not necessary. However, if it is needed, you can wrap the entire expression in capturing parentheses: ^( ... )$.
I'm going to provide a final solution that does not assume you need to capture but does assume the entire subject should consist of this value:
^([a-zA-Z 0-9.-]{1,5})(?:,(?1))*$
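Not every engine has (?1); Python's built-in re module, for one, does not. In such an engine the same "define the segment once" idea can be approximated by building the pattern from a string, as in this sketch (SEGMENT, LIST_RE and is_valid_list are illustrative names, not part of the answer above).

```python
import re

# The segment definition lives in exactly one place...
SEGMENT = r"[a-zA-Z 0-9.-]{1,5}"

# ...and is spliced into the full pattern twice, standing in for (?1).
# The repeated part uses a non-capturing group, as recommended above.
LIST_RE = re.compile(rf"^{SEGMENT}(?:,{SEGMENT})*$")


def is_valid_list(text):
    """True when text is one or more valid segments separated by commas."""
    return LIST_RE.match(text) is not None


if __name__ == "__main__":
    print(LIST_RE.pattern)
    print(is_valid_list("3kd-R,k3,-4"), is_valid_list("abcdef"))
```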
I know I went on a bit, but you said you were new to regex, so wanted to provide some detail. I hope it wasn't too much detail.
By the way, an excellent resource with tutorials is regular-expressions dot info, and a wonderful regex development and testing tool is regex101 dot com. And I can never say enough about Stack Overflow!

Regular Expression to extract multiple parts when some string parts are absent

I am trying to create a regular expression that will capture several sections of a string. This is the expression I have created:
([0-9]{6}[-*][0-9xX]{7}).*([0-9]{1,3}-[0-9]{1,3}-[0-9]{1,3}).*([FPTSUCD])=?([01][*-])
The string that this runs against can appear in two different styles:
# 141803-6310114 #3-0-2 T0-jL
Or
]#0-7-4 C1-vU
When I use the first string I get all the parts I need.
141803-6310114
3-0-2
T
0-
When I use the second string I get no matches. This second string is basically the same as the first but without this part “141803-6310114”. I would like the expression to work with both strings but for the number sequence to be optional. Can anyone advise on what the expression should look like to do this?
This will get you the parts in both cases:
(?:(\d{6}[-*][\dxX]{7}))?[^\d]*(\d{1,3}-\d{1,3}-\d{1,3}) ([FPTSUCD])=?([01][*-])
I made the first group optional (?) and changed the "eat all" between the first two groups to an "eat all non-digits", plus some other cleanup to make it more readable (at least to me ;)).
Regards
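To see the optional first group in action, here is a short Python sketch run against the two sample strings from the question (extract is a hypothetical helper). When the leading number sequence is absent, the first group simply comes back empty (None in Python).

```python
import re

# Pattern from the answer above, unchanged.
PARTS = re.compile(
    r"(?:(\d{6}[-*][\dxX]{7}))?[^\d]*(\d{1,3}-\d{1,3}-\d{1,3}) ([FPTSUCD])=?([01][*-])"
)


def extract(line):
    """Return the four captured parts as a tuple, or None if nothing matches."""
    m = PARTS.search(line)
    return m.groups() if m else None


if __name__ == "__main__":
    print(extract("# 141803-6310114 #3-0-2 T0-jL"))
    print(extract("]#0-7-4 C1-vU"))
```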

Mod Rewrite RegEx To Match Only If Previous Subset Matched

I am trying to make what I think is a simple regex for use with mod_rewrite.
I've tried various expressions, many of which I thought were promising, but all of which ultimately failed for one reason or another. They all also seem to fail once I add start/end string delimiters.
For example, ^user/(\d{1,10})(?=/)$ was one I tried, but among other things, it seems to group the trailing slash, and I only want to group the digits. I think I need to use a positive lookbehind, but I'm having difficulty because it's looking behind at a group.
What I am trying to match is strings that 1) begin with "user/" and 2) possibly end with (\d{1,10})/ (1 to 10 digits followed by a single slash)
Should Match:
user/
user/123/
user/1234567890/
Should not match:
user
user//
user/-4/
user/35.5/
user/123
user/123//
user/123/5/
user/12345678901/
Edit: Sorry about the formatting; I do not understand how to format anything via this markdown. Those examples are preceded by 4 spaces which I thought should make a code block, but obviously I thought wrong.
^user/(?:([0-9]{1,10})/)?$ should work just fine.
This: ^user(?=/)(/\d{1,10})?/$ Edit: if you want to group digits, ^user(?=/)(?:/(\d{1,10}))?/$
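Both suggestions can be checked against the lists in the question with a few lines of Python. This is a sketch outside mod_rewrite, so it only exercises the regex itself; parse is a made-up helper that returns the digits, an empty string for a bare user/, or None when the path is rejected.

```python
import re

# The digit-capturing variants from the two answers above.
FIRST = re.compile(r"^user/(?:([0-9]{1,10})/)?$")
SECOND = re.compile(r"^user(?=/)(?:/(\d{1,10}))?/$")

SHOULD_MATCH = ["user/", "user/123/", "user/1234567890/"]
SHOULD_FAIL = [
    "user", "user//", "user/-4/", "user/35.5/",
    "user/123", "user/123//", "user/123/5/", "user/12345678901/",
]


def parse(pattern, path):
    """Return the captured digits ('' if absent), or None when path is rejected."""
    m = pattern.search(path)
    if m is None:
        return None
    return m.group(1) or ""


if __name__ == "__main__":
    for path in SHOULD_MATCH + SHOULD_FAIL:
        print(f"{path!r:22} {parse(FIRST, path)!r:14} {parse(SECOND, path)!r}")
```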

Regex href match a number

Well, here I am back at regex and my poor understanding of it. Spent more time learning it and this is what I came up with:
/(.*)
I basically want the number in this string:
510973
Is my regex almost good? My original was:
"/<a href=\"travis.php?theTaco(.*)\">(.*)<\/a>/";
But sometimes it returned me huge strings. So, I just want to get numbers only.
I searched through other posts but there is such a large amount of unrelated material, please give an example, resource, or a link directing to a very related question.
Thank you.
Try using an HTML parser provided by the language you are using.
Reason why your first regex fails:
[0-9999999] is not what you think. It is the same as [0-9], which matches one digit. To match a number you need [0-9]+. Also, .* is greedy and will try to match as much as it can. You can use .*? to make it non-greedy. Since you are trying to match a number again, use [0-9]+ again instead of .*. Also, if the two numbers you are capturing will be the same, you can just match the first and use a back reference \1 for the second one.
And there are a few regex meta-characters which you need to escape, such as . and ?.
Try:
<a href=\"travis\.php\?theTaco=([0-9]+)\">\1<\/a>
To capture a number, you don't use a range like [0-99999], you capture by digit. Something like [0-9]+ is more like what you want for that section. Also, escaping is important like codaddict said.
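The escaped pattern with the \1 back-reference can be tried in Python as follows. The sample markup is an assumption, since the question's original HTML is not shown; the \" and \/ escapes from the PHP-style string are not needed inside a Python raw string, so they are dropped here.

```python
import re

# Escaped dot and question mark, [0-9]+ for the number, and \1 requiring
# the link text to repeat the captured number.
LINK = re.compile(r'<a href="travis\.php\?theTaco=([0-9]+)">\1</a>')

# Assumed example markup; the real page may differ.
SAMPLE = '<p><a href="travis.php?theTaco=510973">510973</a></p>'


def find_ids(html):
    """Return every number found in a matching link, as strings."""
    return LINK.findall(html)


if __name__ == "__main__":
    print(find_ids(SAMPLE))
```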
Others have already mentioned some issues regarding your regex, so I won't bother repeating them.
There are also issues regarding how you specified what it is you want. You can simply match via
/theTaco=(\d+)/
and take the first capturing group. You have not given us enough information to know whether this suits your needs.
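For comparison, the HTML-parser route suggested earlier might look like this in Python, using only the standard library. The class and function names are invented for the sketch, and it assumes the number always sits in the theTaco query parameter of an a tag's href.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class TacoLinkParser(HTMLParser):
    """Collects numeric theTaco values from the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Let the URL parser split off and decode the query string.
        query = parse_qs(urlsplit(href).query)
        for value in query.get("theTaco", []):
            if value.isascii() and value.isdigit():
                self.ids.append(value)


def taco_ids(html):
    """Return all numeric theTaco values found in the given HTML."""
    parser = TacoLinkParser()
    parser.feed(html)
    parser.close()
    return parser.ids


if __name__ == "__main__":
    print(taco_ids('<a href="travis.php?theTaco=510973">510973</a>'))
```

This is more code than /theTaco=(\d+)/, but it does not depend on attribute order, quoting style, or whitespace in the markup.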