Regex Jersey Rest Service - regex

I have the following regex in Jersey, which works:
/artist_{artistUID: [1-9][0-9]*}
However, if I do
/{artistUID: [artist_][1-9][0-9]*}
it does not. I do not understand how the regexes are built, and I cannot find any good documentation for it. What I want to do is something like this:
/{artistUID: ([uartist_]|[artist_])[1-9][0-9]*}
to recognize terms like "artist_123" and "uartist_123" and store them in the artistUID value.

You can use an alternation group, (...|...), rather than a character class, [...], which matches exactly one of the characters listed inside it.
Use
/{artistUID: (uartist|artist)_[1-9][0-9]*}
Or to make it shorter, use a ? quantifier after u to make it optional:
/{artistUID: u?artist_[1-9][0-9]*}
See the regex demo
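The difference between the character class and the alternation can be checked quickly outside Jersey. This is a minimal sketch in Python, assuming that these particular constructs behave the same way in Python's re module as in the Java regex engine behind Jersey path templates; the variable names are only illustrative.

```python
import re

# The character class [artist_] matches exactly ONE character from the set
# {a, r, t, i, s, _}, so this pattern means "one such character, then digits".
char_class = re.compile(r"[artist_][1-9][0-9]*")

# The alternation and the optional-u form both match the whole literal prefix.
alternation = re.compile(r"(uartist|artist)_[1-9][0-9]*")
optional_u = re.compile(r"u?artist_[1-9][0-9]*")

for value in ("artist_123", "uartist_123", "_123"):
    print(
        value,
        bool(char_class.fullmatch(value)),
        bool(alternation.fullmatch(value)),
        bool(optional_u.fullmatch(value)),
    )
```

The character-class version accepts only a single-character prefix such as _123 or a123, which is why the second template in the question never matches artist_123.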

Related

regex to find domain without those instances being part of subdomain.domain

I'm new to regex. I need to find instances of example.com in an .SQL file in Notepad++ without those instances being part of subdomain.example.com.
From this answer, I've tried using ^((?!subdomain))\.example\.com$, but this does not work.
I tested this in Notepad++ and at https://regex101.com/r/kS1nQ4/1, but it doesn't work.
Help appreciated.
Simple
^example\.com$
with g,m,i switches will work for you.
https://regex101.com/r/sJ5fE9/1
If the matching should be done somewhere in the middle of the string you can use negative look behind to check that there is no dot before:
(?<!\.)example\.com
https://regex101.com/r/sJ5fE9/2
Without access to example text, it's a bit hard to guess what you really need, but the regular expression
(^|\s)example\.com\>
will find example.com where it is preceded by nothing or by whitespace, and followed by a word boundary. (You could still get a false match on example.com.pk because the period is a word boundary. Provide better examples in your question if you want better answers.)
If you specifically want to use a lookaround: the negative lookahead you used (as the name implies) specifies what the regex should not match at this point. So (?!subdomain\.)example trivially always matches, because the text at that position is example, not subdomain., which means the negative lookahead can never fail.
You might be better served by a lookbehind:
(?<!subdomain\.)example\.com
Demo: https://regex101.com/r/kS1nQ4/3
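To see how the lookbehind behaves, here is a minimal sketch in Python. The sample text is made up for illustration, and Notepad++ uses a different regex engine, so treat this as a check of the idea rather than of Notepad++ itself.

```python
import re

# Hypothetical sample text with a bare domain and two prefixed forms.
text = "example.com subdomain.example.com www.example.com"

# Negative lookbehind: reject matches directly preceded by "subdomain."
not_after_subdomain = re.compile(r"(?<!subdomain\.)example\.com")

# Stricter variant from the earlier answer: reject any preceding dot.
not_after_dot = re.compile(r"(?<!\.)example\.com")

starts_a = [m.start() for m in not_after_subdomain.finditer(text)]
starts_b = [m.start() for m in not_after_dot.finditer(text)]
print(starts_a)
print(starts_b)
```

The first pattern still accepts www.example.com, while the second keeps only the occurrence that has no dot in front of it, so the choice depends on whether other prefixes should count as matches.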
Here's a solution that takes into account the protocols/prefixes,
/^(www\.)?(http:\/\/www\.)?(https:\/\/www\.)?example\.com$/

Why /^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$/i does not work as expected

I have this regex for email validation (assume only x#y.com, abc#defghi.org, something#anotherhting.edu are valid)
/^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$/i
But #abc.edu and abc#xyz.eduorg are both valid as to the regex above. Can anyone explain why that is?
My approach:
there should be at least one character or number before #
then there comes #
there should be at least one character or number after # and before .
the string should end with either edu, com, or org.
Try this
/^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$/i
and it should become clear - you need to group those alternatives, otherwise you can match any string that has 'edu' in it, or any string that ends with org. To put it another way, your version matches any of these patterns
^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)
(edu)
(org)$
It's worth pointing out that the original poster is using this as a regex learning exercise. This would be a terrible regex for actual production use! It's a thorny problem - see Using a regular expression to validate an email address for a lot more depth.
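The effect of the missing group can be reproduced with the inputs from the question. This is a minimal sketch in Python, assuming a search-style (unanchored) call, which is how many regex test functions behave; the variable names are illustrative.

```python
import re

# Original pattern: the | splits it into three independent alternatives.
ungrouped = re.compile(
    r"^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$", re.IGNORECASE
)

# Fixed pattern: the alternatives are confined to one group, and the
# second character class has the + it was missing.
grouped = re.compile(
    r"^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$", re.IGNORECASE
)

samples = ["x#y.com", "abc#defghi.org", "#abc.edu", "abc#xyz.eduorg"]
for s in samples:
    print(s, bool(ungrouped.search(s)), bool(grouped.search(s)))
```

With the ungrouped pattern all four samples are accepted, because any string containing edu satisfies the second alternative; the grouped pattern accepts only the first two.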
Your grouping parentheses are incorrect:
/^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$/i
Can also just use one case as you're using the i modifier:
/^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$/i
N.B. you were also missing a + from the second set, I assume this was just a typo...
What you have written is the equivalent of matching something that:
Begins with [a-zA-Z0-9]+#[a-zA-Z0-9].com
contains edu
or ends with org
What you were looking for was:
/^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$/i
Your regex looks ok.
I guess you are using a find (search) function instead of a match function.
Without knowing what you use it is a bit difficult to say, but in Python you would write
import re
pattern = re.compile(r'^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$')
pattern.match('#abc.edu')   # returns None; use match to validate an input
pattern.search('#abc.edu')  # returns a match: search scans ahead and finds the edu
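Try this:
[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)+$
You forgot the + quantifier, which you need if you want to match any combination of (com|edu|org). (Note that this + also accepts a suffix such as eduorg; drop it if exactly one suffix is allowed.)
Update: as I see it, the second [a-zA-Z0-9] was missing a + too.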

Validate incomplete Regex

Let's say we have a Regex, in my case it's one I found to match UK car registration plates:
^([A-Z]{3}\s?(\d{3}|\d{2}|d{1})\s?[A-Z])|([A-Z]\s?(\d{3}|\d{2}|\d{1})\s?[A-Z]{3})|(([A-HK-PRSVWY][A-HJ-PR-Y])\s?([0][2-9]|[1-9][0-9])\s?[A-HJ-PR-Z]{3})
A typical UK car registration is
HG53CAY
This is matched correctly by the regex, but what i'd like to do is find a way to match any prefix substring of this, so the following would all be valid:
H, HG, HG5, HG53, HG53C, HG53CA, HG53CAY
Is there a suggested way to achieve this?
Firstly I'd rewrite your regexp to look like this:
^([A-Z]{3}\s?(\d{1,3})\s?[A-Z])|([A-Z]\s?(\d{1,3})\s?[A-Z]{3})|(([A-HK-PRSVWY][A-HJ-PR-Y])\s?([0][2-9]|[1-9][0-9])\s?[A-HJ-PR-Z]{3})
as the \d{3}|\d{2}|d{1} parts are redundant (and the last one is missing its backslash) and should be written \d{1,3}.
Rewriting the regexp like
^([A-Z]{0,3}\s?(\d{0,3})\s?[A-Z]?)|([A-Z]\s?(\d{0,3})\s?[A-Z]{0,3})|(([A-HK-PRSVWY][A-HJ-PR-Y]?)\s?([0]?[2-9]?|[1-9]?[0-9]?)\s?[A-HJ-PR-Z]{0,3})
should have the desired effect of allowing matching of only the beginning of a registration, but unfortunately it's no longer guaranteed that the full registration will be a valid one, as I had to make most characters optional.
You could possibly try something like this
^(([A-Z]{3})|[A-Z]{1,2}$)\s?((\d{1,3})|$)...
to make it require either that each part is complete, or that it is incomplete but followed by "end of string", represented by the $ in the regexp.
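One way to realise the "complete part, or incomplete part followed by end of string" idea is to nest optional groups, so that a later part can only appear once the earlier parts are present. This is a minimal sketch in Python, assuming only the format from the third alternative of the question's regex (two letters, two digits, three letters) and ignoring the optional spaces; the names PLATE_PREFIX and is_plate_prefix are hypothetical.

```python
import re

PLATE_PREFIX = re.compile(
    r"[A-HK-PRSVWY]"             # first letter is always required
    r"(?:[A-HJ-PR-Y]"            # optional second letter
    r"(?:\d"                     # a lone first digit, or ...
    r"|(?:0[2-9]|[1-9][0-9])"    # ... both digits of a valid number
    r"[A-HJ-PR-Z]{0,3}"          # followed by up to three letters
    r")?)?"
)


def is_plate_prefix(text: str) -> bool:
    """Return True if text is a non-empty prefix of a plate in this format."""
    return PLATE_PREFIX.fullmatch(text) is not None


for candidate in ["H", "HG", "HG5", "HG53", "HG53C", "HG53CA", "HG53CAY"]:
    print(candidate, is_plate_prefix(candidate))
```

Because the digits are validated as soon as both are present, an input such as HG01 is rejected early, which the version with every character made optional cannot guarantee.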

Regular Expression to find multiple instances of %%{ANYTHING}%%

SomeRandomText=%EXAMPLE1%,MoreRandomText=%%ONE%%!!%%TWO%%,YetMoreRandomText=%%THREE%%%FOUR%!!%FIVE%\%%SIX%%
I'm in need of a regular expression which can pull out anything which is wrapped in '%%'- so this regular expression would match only the following:
%%ONE%%
%%TWO%%
%%THREE%%
%%SIX%%
I've tried lots of different methods, and am sure there is a way to achieve this, but I'm struggling so far. I mostly end up with a pattern that matches everything from the first %% to the last %% in the string, which is not what I want. I think I need something like forward lookups (lookaheads), but I'm struggling to implement them.
You need a non-greedy match, using the ? modifier:
%%.*?%%
See it working online: rubular
This can also be done by restricting what is allowed between the %s.
%%[^%]*%%
This is more widely supported than non-greedy matching; however, note that it won't match %%A%B%%. If necessary, that can be handled with some modifications:
%%([^%]|%[^%])*%%
Or equivalently
%%(%?[^%])*%%
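The patterns above can be compared on the sample string from the question. This is a minimal sketch in Python; the variable names and the short x%%A%B%%y test string are illustrative, and the group is written as non-capturing so that findall returns whole matches.

```python
import re

text = (
    r"SomeRandomText=%EXAMPLE1%,MoreRandomText=%%ONE%%!!%%TWO%%,"
    r"YetMoreRandomText=%%THREE%%%FOUR%!!%FIVE%\%%SIX%%"
)

greedy = re.findall(r"%%.*%%", text)       # runs from the first %% to the last
lazy = re.findall(r"%%.*?%%", text)        # non-greedy: stops at the next %%
negated = re.findall(r"%%[^%]*%%", text)   # no % allowed between delimiters

# Variant that tolerates a single % inside the delimiters.
single_percent = re.findall(r"%%(?:[^%]|%[^%])*%%", "x%%A%B%%y")
plain_negated = re.findall(r"%%[^%]*%%", "x%%A%B%%y")

print(lazy)
print(negated)
print(len(greedy))
print(single_percent, plain_negated)
```

On this input the lazy and negated-class patterns give the same four matches, while the greedy pattern collapses them into a single long match.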

Regex to extract part of a url

I'm being lazy tonight and don't want to figure this one out. I need a regex to match 'jeremy.miller' and 'scottgu' from the following inputs:
http://codebetter.com/blogs/jeremy.miller/archive/2009/08/26/talking-about-storyteller-and-executable-requirements-on-elegant-code.aspx
http://weblogs.asp.net/scottgu/archive/2009/08/25/clean-web-config-files-vs-2010-and-net-4-0-series.aspx
Ideas?
Edit
Chris Lutz did a great job of meeting the requirements above. What if these were the inputs so you couldn't use 'archive' in the regex?
http://codebetter.com/blogs/jeremy.miller/
http://weblogs.asp.net/scottgu/
Would this be what you're looking for?
'/([^/]+)/archive/'
Captures the piece before "archive" in both cases. Depending on regex flavor you'll need to escape the /s for it to work. As an alternative, if you don't want to match the archive part, you could use a lookahead, but I don't like lookaheads, and it's easier to match a lot and just capture the parts you need (in my opinion), so if you prefer to use a lookahead to verify that the next part is archive, you can write one yourself.
EDIT: As you update your question, my idea of what you want is becoming fuzzier. If you want a new regex to match the second cases, you can just pluck the appropriate part off the end, with the same / conditions as before:
'/([^/]+)/$'
If you specifically want either the text jeremy.miller or scottgu, regardless of where they occur in a URL, but only as "words" in the URL (i.e. not scottgu2), try this, once again with the / caveat:
'/(jeremy\.miller|scottgu)/'
As yet a third alternative, if you want the field after the domain name, unless that field is "blogs", it's going to get hairy, especially with the / caveat:
'http://[^/]+/(?:blogs/)?([^/]+)/'
This will match the domain name, an optional blogs field, and then the desired field. The (?:) syntax is a non-capturing group, which means it's just like regular parentheses, but won't capture the value, so the only value captured is the value you want. Support for (?:) can vary depending on your particular regex flavor. I don't know what language you're asking for, but I predominantly use Perl, so this regex should pretty much do it if you're using PCRE. If you're using something different, look into non-capturing groups.
Wow. That's a lot of talking about regexes. I need to shut up and post already.
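As a quick check of the last pattern, here is a minimal sketch in Python, where / needs no escaping and (?:...) is supported. The name FIELD is hypothetical, and the URLs are the four from the question.

```python
import re

# Domain, an optional "blogs/" field, then the captured field.
FIELD = re.compile(r"http://[^/]+/(?:blogs/)?([^/]+)/")

urls = [
    "http://codebetter.com/blogs/jeremy.miller/archive/2009/08/26/"
    "talking-about-storyteller-and-executable-requirements-on-elegant-code.aspx",
    "http://weblogs.asp.net/scottgu/archive/2009/08/25/"
    "clean-web-config-files-vs-2010-and-net-4-0-series.aspx",
    "http://codebetter.com/blogs/jeremy.miller/",
    "http://weblogs.asp.net/scottgu/",
]

names = [FIELD.match(url).group(1) for url in urls]
print(names)
```

Since the pattern does not mention archive, it handles both the original inputs and the shorter ones from the edit.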
Try this one:
/\/([\w\.]+)\/archive/