PATINDEX returning 0 on matching expression - regex

As far as I can tell, T-SQL can check regular expressions via PATINDEX.
My task is simple: I get a telecom type (phone, for example). A mapping table stores a regex that validates this type, so before saving I'd like to check the value against this regex.
Easily done:
-- Check the regular expression for phone
DECLARE @regExp nvarchar(255);
SELECT @regExp = tt.ValidationRegex
FROM Core.TelecomType tt
WHERE tt.Code = @DEFAULT_PHONE_TYPE;
Sadly, this always returns 0, even when I add wildcards like this:
SET @regExp = CONCAT('%', @regExp, '%');
SET @regExp = CONCAT(@regExp, '%');
SET @regExp = CONCAT('%', @regExp);
On RegExr and on the Oracle side, the values seem to match, so is this a problem with T-SQL? If so, is there a workaround?
Thanks in advance
Matthias

Nope, sorry. PATINDEX doesn't let you match a regular expression; it works with the same kinds of patterns that are used with LIKE.
A quote from the documentation:
PATINDEX works just like LIKE, so you can use any of the wildcards. You do not have to enclose the pattern between percents. PATINDEX('a%', 'abc') returns 1 and PATINDEX('%a', 'cba') returns 3.
Unlike LIKE, PATINDEX returns a position, similar to what CHARINDEX does.
If you need regular expression matching in T-SQL, you'll have to rely on a custom CLR function/procedure. You can also check this to see if you can use it.
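To see why a regex stored in the table yields 0, here is a rough Python emulation of LIKE-style matching. It is only a sketch: `like_to_regex` and `patindex` are hypothetical helpers, the phone pattern is an invented example, and collation and case rules of the real PATINDEX are ignored. The point is that characters such as ^, +, ? and $ are plain literals in a LIKE pattern.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate LIKE-style wildcards (%, _, [set]) into a Python regex body."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        end = -1
        if ch == "[":
            # Require at least one member, so "[]" and "[^]" stay literal.
            start = i + 3 if pattern[i + 1:i + 2] == "^" else i + 2
            end = pattern.find("]", start)
        if ch == "%":
            out.append(".*")           # % = any run of characters
        elif ch == "_":
            out.append(".")            # _ = exactly one character
        elif end != -1:
            # [abc], [a-z], [^0-9]: keep the set, escape \ and [ inside it
            body = pattern[i + 1:end].replace("\\", "\\\\").replace("[", "\\[")
            out.append("[" + body + "]")
            i = end
        else:
            out.append(re.escape(ch))  # everything else is a literal
        i += 1
    return "".join(out)

def patindex(pattern: str, text: str) -> int:
    """Return the 1-based start of the match, or 0 (rough emulation only)."""
    floating = pattern.startswith("%")
    body = like_to_regex(pattern.lstrip("%"))
    if floating and not body:
        return 1                       # a bare % matches from the start
    anchor = "" if floating else r"\A"
    match = re.search(anchor + "(?:" + body + r")\Z", text, re.DOTALL)
    return match.start() + 1 if match else 0

# Hypothetical validation regex, as it might be stored in the mapping table.
phone_regex = r"^\+?[0-9 ]+$"
phone = "+49 123 456"

print(patindex("a%", "abc"))                     # LIKE wildcard works
print(patindex(phone_regex, phone))              # regex read as LIKE pattern
print(bool(re.search(phone_regex, phone)))       # the same pattern as a regex
```

Read as a LIKE pattern, the regex would only match a value that literally contains ^, \, +, ? and $ around a single digit or space, which is why adding % on either side does not help.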

Related

Regular expression that contains one expression yet doesn't contain the other

We are currently matching "service_hub*queue"
I want to ignore the case "service_hub_scout_dead_queue" and yet still match everything else.
What is the regular expression for that ?
This JavaScript solution gives an array with the matches:
var myText = 'service_hub_anything_queue Add service_hub_scout_dead_queue something service_hub_someting_queue else';
var myMatches = myText.match(/service_hub(?!_scout_dead_)\w+queue/g);
If you are rather interested in what follows a match
var mySplit = ('dummy'+myText).split(/service_hub(?!_scout_dead_)\w+queue/g).filter(function(txt,i) {return (i>0);})
I put 'dummy' in front and then filter away the first part so that it works both when the string starts with a valid tag and when it does not.
Using negative lookbehind: "service_hub_.*?(?<!_scout_dead)_queue"
This appears to be widely supported by popular regex engines; I've tested with Java (or Scala, rather) just to make sure it works.
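For comparison, here is a small Python sketch of both ideas using the sample text from the JavaScript answer. One assumption differs from the lookbehind pattern quoted above: it uses \w*? instead of .*? and is applied to one queue name at a time, because .*? can run across spaces and glue two names together when scanning free text.

```python
import re

text = ("service_hub_anything_queue Add service_hub_scout_dead_queue "
        "something service_hub_someting_queue else")

# Negative lookahead: refuse to start a match that continues with _scout_dead_.
LOOKAHEAD = re.compile(r"service_hub(?!_scout_dead_)\w+queue")
matches = LOOKAHEAD.findall(text)

# Negative lookbehind: refuse an ending _queue that is preceded by _scout_dead.
LOOKBEHIND = re.compile(r"service_hub_\w*?(?<!_scout_dead)_queue")
names = ["service_hub_anything_queue", "service_hub_scout_dead_queue"]
accepted = [name for name in names if LOOKBEHIND.fullmatch(name)]

print(matches)
print(accepted)
```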

regular expression match domain

I need a regular expression to match the following domains as follows:
http://www.cnn.com/fred = www.cnn.com
cnn.com = cnn.com
www.cnn.com:8080 = www.cnn.com
I have the following regular expression (using pcre):
([^/]+://)?([^:/]+)
The above works fine in cases 2 and 3; however, in case 1 I still have the http:// prepended to the matched string. Is there a regular expression option I can use to skip the http part?
Many thanks in advance.
This one should suit your needs:
^(?:(?:f|ht)tps?://)?([^/:]+)
The first group will contain what you're looking for.
This looks like the closest I could get to what I want. It's not perfect, but it seems to get the job done:
www?([^/:]+)
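A quick way to check the first answer's pattern against the three inputs from the question is a short Python sketch like the one below. `extract_host` is a hypothetical helper name; the pattern itself is copied from the answer, with a non-capturing group for the scheme so that group 1 holds only the host.

```python
import re

# Optional http/https/ftp/ftps scheme, then everything up to the first / or :
HOST = re.compile(r"^(?:(?:f|ht)tps?://)?([^/:]+)")

def extract_host(value: str):
    """Return the host part of value, or None if nothing matches."""
    match = HOST.match(value)
    return match.group(1) if match else None

for sample in ("http://www.cnn.com/fred", "cnn.com", "www.cnn.com:8080"):
    print(sample, "=", extract_host(sample))
```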

MATLAB 2012 regular expression

I have a set of strings that I'd like to parse in MATLAB 2012 that all have the following format:
string-int-int-int-int-string
I'd like to pluck out the third integer (the rest are 'don't cares'), but I haven't used MATLAB in ages and need to refresh on regular expressions. I tried using the regular expression '(.*)-(.*)-(.*)-\d-(.*)' but no dice. I did check out the MATLAB regexp page, but wasn't able to figure out how to apply that information to this case.
Anyone know how I might get the desired result? If so, could you explain what the expression you're using is doing to get that result so that others might be able to apply the answer to their unique situation?
Thanks in advance!
str = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt';
[tok] = regexp(str, '^.+?-.+?-.+?-(\d+?)-.+?-.+?', 'tokens');
tok{:}
ans =
'1000'
Update
Explanation, upon request.
^ - "Anchor", or match beginning of string.
.+? - Wildcard match, one or more, non-greedy.
- - Literal dash/hyphen.
(\d+?) - Digits match, one or more, non-greedy, captured into a token.
^.*?-.*?-.*?-(\d+)-.*?-.*?$
OR
^(?:[^-]*?-){3}(\d+)(?:.*?)$
Group1 now contains your required data
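The same idea, sketched in Python for anyone who wants to try the pattern outside MATLAB. `third_integer` is a hypothetical helper, and the sketch assumes the leading string itself contains no hyphens; a plain split on "-" is shown as an alternative under the same assumption.

```python
import re

# Skip three hyphen-terminated fields, then capture the digits of the fourth.
THIRD_INT = re.compile(r"^(?:[^-]*-){3}(\d+)-")

def third_integer(name: str):
    """Return the third integer of string-int-int-int-int-string, or None."""
    match = THIRD_INT.match(name)
    return int(match.group(1)) if match else None

name = "XyzStr-1-2-1000-56789-ILoveStackExchange.txt"
print(third_integer(name))

# Without a regex: the third integer is the fourth hyphen-separated field.
print(int(name.split("-")[3]))
```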

Article spinner with 2 tiers

I made an article spinner that used regex to find words in this syntax:
{word1|word2}
And then split them up at the "|", but I need a way to make it support tier 2 brackets, such as:
{{word1|word2}|{word3|word4}}
What my code does when presented with such a line, is take "{{word1|word2}" and "{word3|word4}", and this is not as intended.
What I want is when presented with such a line, my code breaks it up as "{word1|word2}|{word3|word4}", so that I can use this with the original function and break it into the actual words.
I am using c#.
Here is pseudocode of how it might look:
Check string for regex match to "{{word1|word2}|{word3|word4}}" pattern
If found, store each one as "{word1|word2}|{word3|word4}" in MatchCollection (mc1)
Split the word at the "|" but not the one inside the brackets, and select a random one (aka, "{word1|word2}" or "{word3|word4}")
Store the new results aka "{word1|word2}" and "{word3|word4}" in a new MatchCollection (mc2)
Now search the string again, this time looking for "{word1|word2}" only and ignore the double "{{" "}}"
Store these in mc2.
I can not split these up normally
Here is the regex I use to search for "{word1|word2}":
Regex regexObj = new Regex(@"\{.*?\}", RegexOptions.Singleline);
MatchCollection m = regexObj.Matches(originalText); //How I store them
Hopefully someone can help, thanks!
Edit: I solved this using a recursive method. I was building an article spinner btw.
That is not parsable using a regular expression; instead you have to use a recursive descent parser. Map it to JSON by replacing:
{ with [ (and } with ])
| with ,
wordX with "wordX" (regex \w+)
Then your input
{{word1|word2}|{word3|word4}}
becomes valid JSON
[["word1","word2"],["word3","word4"]]
and will map directly to PHP arrays when you call json_decode.
In C#, the same should be possible with JavaScriptSerializer.
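The JSON mapping described in this answer can be tried out in a few lines of Python. This is a sketch only: `spintax_to_lists` is a hypothetical name, and it assumes every alternative is a single \w+ word (no spaces or punctuation), as in the example input.

```python
import json
import re

def spintax_to_lists(text: str):
    """Turn {a|b} syntax into nested Python lists by rewriting it as JSON."""
    # Quote every word first, while the braces and pipes are still distinct.
    quoted = re.sub(r"\w+", lambda m: json.dumps(m.group(0)), text)
    # Then swap the delimiters: { -> [, } -> ], | -> ,
    as_json = quoted.translate(str.maketrans("{}|", "[],"))
    return json.loads(as_json)

print(spintax_to_lists("{{word1|word2}|{word3|word4}}"))
```

From the nested lists, picking a random branch at each level is a short recursive function.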
I'm really not completely sure WHAT you're asking for, but I'll give it a go:
If you want to get {word1|word2}|{word3|word4} out of any occurrence of {{word1|word2}|{word3|word4}} but not {word1|word2} or {word3|word4}, then use this:
@"\{(\{[^}]*\}\|\{[^}]*\})\}"
...which will match {{word1|word2}|{word3|word4}}, but with {word1|word2}|{word3|word4} in the first matching group.
I'm not sure if this will be helpful or even if it's along the right track, but I'll try to check back every once in a while for more questions or clarifications.
s = "{Spinning|Re-writing|Rotating|Content spinning|Rewriting|SEO Content Machine} is {fun|enjoyable|entertaining|exciting|enjoyment}! try it {for yourself|on your own|yourself|by yourself|for you} and {see how|observe how|observe} it {works|functions|operates|performs|is effective}."
print spin(s)
If you want to use the [square|brackets|syntax], use this line in the process function:
'/\[(((?>[^\[\]]+)|(?R))*)\]/x',
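Python's built-in re module has no (?R) recursion, so a nested spinner needs a different trick there. The sketch below is not the spin function referred to above; it is a hypothetical stand-in that repeatedly resolves the innermost {a|b} groups until no braces are left, which handles any nesting depth.

```python
import random
import re

# A group with no braces inside it is an innermost group.
INNERMOST = re.compile(r"\{([^{}]*)\}")

def spin(text: str, rng=random) -> str:
    """Replace every {a|b|c} group, innermost first, with one random option."""
    while True:
        text, replaced = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split("|")), text)
        if replaced == 0:
            return text

print(spin("{{word1|word2}|{word3|word4}} is {fun|enjoyable}!"))
```

Passing a seeded random.Random instance as rng makes the output repeatable, which helps when testing.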

Is there a regular expression for a comma separated list of discrete values?

I use the following regular expression to validate a comma separated list of values.
^Dog|Cat|Bird|Mouse(, (Dog|Cat|Bird|Mouse))*$
The values are also listed in a drop down list in Excel cell validation, so the user can select a single value from the drop down list, or type in multiple values separated by commas.
The regular expression does a good job of preventing the user from entering anything but the approved values, but it doesn't prevent the user from entering duplicates. For example, the user can enter "Dog" and "Dog, Cat", but the user can also enter "Dog, Dog".
Is there any way to prevent duplicates using a similar single regular expression? In other words I need to be able to enforce a discrete list of approved comma separated values.
Thanks!
Use a backreference and a negative lookahead:
^(Dog|Cat|Bird|Mouse)(, (?!\1)(Dog|Cat|Bird|Mouse))*$
EDIT: This won't work with cases such as "Cat, Dog, Dog" ... You'll need to come up with a hybrid solution for such instances - I don't believe there is a single regex that can handle that.
Here's another technique. You need to check two things, first, that it DOES match this:
(?:(?:^|, )(Dog|Cat|Bird|Mouse))+$
(That's just a slightly shorter version of your original regex)
Then, check that it DOES NOT match this:
(Dog|Cat|Bird|Mouse).+?\1
E.g.
var valid = string.match( /(?:(?:^|, )(Dog|Cat|Bird|Mouse))+$/ ) &&
!string.match( /(Dog|Cat|Bird|Mouse).+?\1/ );
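Here is the same two-step check as a Python sketch, plus one possible way to fold the duplicate test into a single pattern with a negative lookahead. The names `is_valid` and `COMBINED_RE` are hypothetical, and the word boundaries (\b) are an addition of mine so that only whole values count as repeats.

```python
import re

ALLOWED = r"(?:Dog|Cat|Bird|Mouse)"

# Step 1: the whole string must be a ", "-separated list of allowed values.
LIST_RE = re.compile(rf"{ALLOWED}(?:, {ALLOWED})*")
# Step 2: no allowed value may appear again later in the string.
DUPLICATE_RE = re.compile(rf"\b({ALLOWED})\b.*\b\1\b")

def is_valid(value: str) -> bool:
    return bool(LIST_RE.fullmatch(value)) and not DUPLICATE_RE.search(value)

# Both steps in one pattern: the leading lookahead rejects any repeated value.
COMBINED_RE = re.compile(
    rf"(?!.*\b({ALLOWED})\b.*\b\1\b){ALLOWED}(?:, {ALLOWED})*")

for sample in ("Dog", "Dog, Cat", "Dog, Dog", "Cat, Dog, Dog", "Dog, Fish"):
    print(sample, is_valid(sample), bool(COMBINED_RE.fullmatch(sample)))
```

Whether the combined form works in Excel validation depends on the regex engine behind it, since it needs both lookahead and backreferences.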
J-P, I tried editing your sample regular expressions so that I could look for duplicates in any comma separated string. Something like this:
var valid = string.match( /(?:(?:^|, )([a-z]*))+$/ ) &&
!string.match( /([a-z]*).+?\1/ );
Unfortunately, I failed. The Force is weak with me. ;)
Thanks again for your help.
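A likely reason the generic attempt fails: ([a-z]*) is allowed to match the empty string, and an empty capture followed by \1 matches almost anywhere, so the duplicate test fires on nearly every input. For arbitrary values it may be simpler to skip the regex and compare the items directly, as in this Python sketch. `has_duplicates` is a hypothetical helper, and treating "Dog" and "dog" as the same value is an assumption.

```python
import re

def has_duplicates(value: str) -> bool:
    """True if any comma-separated item occurs more than once (ignoring case)."""
    items = [item.strip().lower() for item in value.split(",")]
    return len(items) != len(set(items))

# Regex alternative for single-word items: \w+ requires at least one character,
# and \b keeps the backreference from matching inside a longer word.
GENERIC_DUPLICATE = re.compile(r"\b(\w+)\b.*\b\1\b", re.IGNORECASE)

for sample in ("dog, cat", "dog, cat, dog", "Dog, dog", "cat, catalog"):
    print(sample, has_duplicates(sample), bool(GENERIC_DUPLICATE.search(sample)))
```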
What about using some kind of expression like this:
(Dog|Cat|Bird|Mouse){1}
Then you can write a value from the array only once. It is then easy to add zero-or-more repetition, commas, spaces, etc.
I know I'm necroposting, but I found this in my search, so I'll leave it here in case anyone finds it useful.