Regex CHECK constraint not working with SQL server - regex

im trying to reject all inputs not in the format "03 xxxx xxxx" so i created a table like
create table records
(
....
num varchar(255) NOT NULL,
...
CONSTRAINT num_check CHECK (num like '03 [0-9]{4} [0-9]{4}')
)
which should (i think?) accept for example "03 1234 1234". but if i try to add this via sql manager i get an error with the message:
"the INSERT statement conflicted with the CHECK constraint "num_check" "
at first i thought my Regex was off but ive tried it in a few other places and it accepts the example above.
any ideas?

like does not work with regular expressions, it has its own, much simpler wildcard patterns, which only support %, _ , [a-z], and [^a-z]. That's it. {4} would not works, just like most regex features.
You should be able to use:
like '03 [0-9][0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
Another option, a little less repetitive:
declare #digitChar nvarchar(12)
set #digitChar = '[0-9]'
Where clause:
like '03 ' + replicate(#digitChar,4) + ' ' + replicate(#digitChar,4)
Example: http://sqlfiddle.com/#!3/d41d8/3251

Related

Having difficulty in pattern matching Postal Codes for an oracle regexp_like command

The Problem:
All I'm trying to do is come up with a pattern matching string for my regular expression that lets me select Canadian postal codes in this format: 'A1A-2B2' (for example).
The types of data I am trying to insert:
Insert Into Table
(Table_Number, Person_Name, EMail_Address, Street_Address, City, Province, Postal_Code, Hire_Date)
Values
(87, 'Tommy', 'mobster#gmail.com', '123 Street', 'location', 'ZY', 'T4X-1S2', To_Date('30-Aug-2020 08:50:56');
This is a slightly modified/generic version to protect some of the data. All of the other columns enter just fine/no complaints. But the postal code it does not seem to like when I try to run a load data script.
The Column & Constraint in question:
Postal_Code varchar2(7) Constraint Table_Postal_Code Null
Constraint CK_Postal_Code Check ((Regexp_like (Postal_Code, '^\[[:upper:]]{1}[[:digit:]]{1}[[:upper:]][[:punct:]]{1}[[:digit:]]{1}[[:upper:]](1}[[:digit:]]{1}$')),
My logic here: following the regular expression documentation:
I have:
an open quote
a exponent sign to indicate start of string
Backslash (I think to interpet a string literal)
-1 upper case letter, 1 digit, 1 uppercase , 1 :punct: to account for the hypen, 1 digit, 1 upper case letter, 1 digit
$ to indicate end of string
Close quote
In my mind, something like this should work, it accounts for every single letter/character and the ranges they have to be in. But something is off regarding my formatting of this pattern matching string.
The error I get is:
ORA-02290: check constraint (user.CK_POSTAL_CODE) violated
(slightly modified once more to protect my identity)
Which tells me that the data insert statement is tripping off my check constraint and thats about it. So its as issue with the condition of the constraint itself - ie string I'm using to match it. My instructor has told me that insert data is valid, and doesn't need any fix-up so I'm at a loss.
Limits/Rules: The Hyphen has to be there/matched to my understanding of the problem. They are all uppercase in the dataset, so I don't have to worry about lowercase for this example.
I have tried countless variations of this regexp statement to see if anything at all would work, including:
changing all those uppers to :alpha: , then using 'i' to not check for case sensitivity for the time being
removing the {1} in case that was redudant
using - (backslash hyphen) , to turn into a string literal maybe
using only Hyphen by itself
even removing regexp altogether and trying a LIKE [A-Z][0-9][A-Z]-[0-9][A-Z][0-9] etc
keeping the uppers , turning :digit:'s to [0-9] to see if that would maybe work
The only logical thing I can think of now is: the check constraint is actually working fine and tripping off when it matches my syntax. But I didn't write it clearly enough to say "IGNORE these cases and only get tripped/activated if it doesn't meet these conditions"
But I'm at my wits end and asking here as a last resort. I wouldn't if I could see my mistake eventually - but everything I can think of, I probably tried. I'm sure its some tiny formatting rule I just can't see (I can feel it).Thank you kindly to anyone who would know how to format a pattern matching string like this properly.
It looks like you may have been overcomplicating the regex a bit. The regex below matches your description based on the first set of bullets you lined out:
REGEXP_LIKE (postal_code, '^[A-Z]\d[A-Z]-\d[A-Z]\d$')
I see two problems with that regexp.
Firstly, you have a spurious \ at the start. It serves you no purpose, get rid of it.
Secondly, the second-from last {1} appears in your code with mismatched brackets as (1}. I get the error ORA-12725: unmatched parentheses in regular expression because of this.
To be honest, you don't need the {1}s at all: they just tell the regular expression that you want one of the previous item, which is exactly what you'd get without them.
So you can fix the regexp in your constraint by getting rid of the \ and removing the {1}s, including the one with mismatched parentheses.
Here's a demo of the fixed constraint in action:
SQL> CREATE TABLE postal_code_test (
2 Postal_Code varchar2(7) Constraint Table_Postal_Code Null
3 Constraint CK_Postal_Code Check ((Regexp_like (Postal_Code, '^[[:upper:]][[:digit:]][[:upper:]][[:punct:]][[:digit:]][[:upper:]][[:digit:]]$'))));
Table created.
SQL> INSERT INTO postal_code_test (postal_code) VALUES ('T4X-1S2');
1 row created.
SQL> INSERT INTO postal_code_test (postal_code) VALUES ('invalid');
INSERT INTO postal_code_test (postal_code) VALUES ('invalid')
*
ERROR at line 1:
ORA-02290: check constraint (user.CK_POSTAL_CODE) violated
You do not need the backslash and you have (1} instead of {1}.
You can simplify the expression to:
Postal_Code varchar2(7)
Constraint Table_Postal_Code Null
Constraint CK_Postal_Code Check (
REGEXP_LIKE(Postal_Code, '^[A-Z]\d[A-Z][[:punct:]]\d[A-Z]\d$')
)
or:
Constraint CK_Postal_Code Check (
REGEXP_LIKE(
Postal_Code,
'^[A-Z][0-9][A-Z][[:punct:]][0-9][A-Z][0-9]$'
)
)
or:
Constraint CK_Postal_Code Check (
REGEXP_LIKE(
Postal_Code,
'^[[:upper:]][[:digit:]][[:upper:]][[:punct:]][[:digit:]][[:upper:]][[:digit:]]$'
)
)
or (although the {1} syntax is redundant here):
Constraint CK_Postal_Code Check (
REGEXP_LIKE(
Postal_Code,
'^[[:upper:]]{1}[[:digit:]]{1}[[:upper:]]{1}[[:punct:]]{1}[[:digit:]]{1}[[:upper:]]{1}[[:digit:]]{1}$'
)
)
fiddle
removing regexp altogether and trying a LIKE [A-Z][0-9][A-Z]-[0-9][A-Z][0-9] etc
That will not work as the LIKE operator does not match regular expression patterns.

Replace pair of % in oracle

please, I have in Oracle table this texts (as 2 records)
"Sample text with replace parameter %1%"
"You reached 90% of your limit"
I need replace %1% with specific text from input parameter in Oracle Function. In fact, I can have more than just one replace parameters. I have also record with "Replace this %12% with real value"
This functionality I have programmed:
IF poc > 0 THEN
FOR i in 1 .. poc LOOP
p := get_param(mString => mbody);
mbody := replace(mbody,
'%' || p || '%', parameters(to_number(p, '99')));
END LOOP;
END IF;
But in this case I have problem with text number 2. This functionality trying replace "90%" also and I then I get this error:
ORA-06502: PL/SQL: numeric or value error: NULL index table key value
It's a possible to avoid try replace "90%"? Many thanks for advice.
Best regards
PS: Oracle version: 10g (OCI Version: 10.2)
Regular expressions can work here. Try the following and build them into your script.
SELECT REGEXP_REPLACE( 'Sample text with replace parameter %1%',
'\%[0-9]+\%',
'db_size' )
FROM DUAL
and
SELECT REGEXP_REPLACE( 'Sample text with replace parameter 1%',
'\%[0-9]+\%',
'db_size' )
FROM DUAL
The pattern is pretty simple; look for patterns where a '%' is followed by 1 or more numbers followed by a '%'.
The only issue here will be if you have more than one replacement to make in each string and each replacement is different. In that case you will need to loop round the string each time replacing the next parameter. To do this add the position and occurrence parameters to REGEXP_REPLACE after the replacement string, e.g.
REGEXP_REPLACE( 'Sample text with replace parameter %88888888888%','\%[0-9]+\%','db_size',0,1 )
You are getting the error because at parameters(to_number(p, '99')). Can you please check the value of p?
Also, if the p=90 then then REPLACE will not try to replace "90%". It will replace "%90%". How have you been sure that it's trying to replace "90%"?

REGEXP_LIKE to matchtwo words in a sentence in Oracle 11g

sentence :"STANDARD domain WARNING encountered in the PROCESS"
I want to identify all the sentences which have the words STANDARD and WARNING in it using REGEXP_LIKE. Also the search has to be case insensitive.
I would want to replace the following code with REGEXP_LIKE:
Select * from table where upper(sentence) like 'STANDARD%WARNING%'
You can specify the 'i' match parameter for case insensitivity:
Select * from table where REGEXP_LIKE (sentence, '\bstandard(?=\b).*\bwarning\b', 'i')
Regex is powerful but not so good with performance.
Try this -
Select * from table
where upper(sentence) like '%STANDARD%' or
upper(sentence) like '%WARNING%'
Easy to read and serve the purpose.

How can I replace just the last occurrence in a string with a Perl regex?

Ok, here is my test (this is not production code, but just a test to illustrate my problem)
my $string = <<EOS; # auto generated query
SELECT
users.*
, roles.label AS role_label
, hr_orders.position_label
, deps.label AS dep_label
, top_deps.label AS top_dep_label
FROM
users
LEFT JOIN $conf->{systables}->{roles} AS roles ON users.id_role = roles.id
LEFT JOIN (
SELECT
id_user
, MAX(dt) AS max_dt
FROM
hr_orders
WHERE
fake = 0
AND
IFNULL(position_label, ' ') <> ' '
GROUP BY id_user
) last_hr_orders ON last_hr_orders.id_user = users.id
LEFT JOIN hr_orders ON hr_orders.id_user = last_hr_orders.id_user AND hr_orders.dt = last_hr_orders.max_dt
$join
WHERE
$filter
ORDER BY
$order
$limit
EOS
my $where = "WHERE\nusers.fake = -1 AND ";
$string =~ s{where}{$where}i;
print "result: \n$string";
Code, which generates the query, ends with simple s{where}{$where}i, which replaces EVERY occurence of where.
I want to replace top-level WHERE (last occurence of WHERE?) with 'WHERE users.fake = -1' (actually, with more complex pattern, but it doesn't matter).
Any ideas?
Why do you want to build your sql queries by hard-coding strings and then making replacements on them? Wouldn't something like
my $proto_query = <<'EOQ'
select ... where %s ...
EOQ
my $query = sprintf $proto_query, 'users.fake = -1 AND ...';
or (preferably, as it avoids a lot of issues your initial approach and the above has) using a module such as Data::Phrasebook::SQL make a lot of things easier?
If you really wanted to go for string substitutions, you're probably looking for something like
my $foo = "foo bar where baz where moo";
$foo =~ s/(.*)where/$1where affe and/;
say $foo; # "foo bar where baz where affe and moo"
That is, capturing as much as you can until you can't capture any more without not having a "where" immediately follow what you captured, and then inserting whatever you captured captured again, plus whatever modifications you want to make.
However, note that this has various limitations if you're using that to mangle SQL queries. To do things right, you'd have to actually understand the SQL at some level. Consider, for example, select ... where user.name = 'where'.
apparently, what I need was Look-ahead regex feature
my regex is
s{where(?!.*where)}{$where}is;
The right way to parse SQL queries is to do it using a parser and not using regex.
see SQL::Statement::Structure - parse and examine structure of SQL queries

Finding and removing Non-ASCII characters from an Oracle Varchar2

We are currently migrating one of our oracle databases to UTF8 and we have found a few records that are near the 4000 byte varchar limit.
When we try and migrate these record they fail as they contain characters that become multibyte UF8 characters.
What I want to do within PL/SQL is locate these characters to see what they are and then either change them or remove them.
I would like to do :
SELECT REGEXP_REPLACE(COLUMN,'[^[:ascii:]],'')
but Oracle does not implement the [:ascii:] character class.
Is there a simple way doing what I want to do?
I think this will do the trick:
SELECT REGEXP_REPLACE(COLUMN, '[^[:print:]]', '')
If you use the ASCIISTR function to convert the Unicode to literals of the form \nnnn, you can then use REGEXP_REPLACE to strip those literals out, like so...
UPDATE table SET field = REGEXP_REPLACE(ASCIISTR(field), '\\[[:xdigit:]]{4}', '')
...where field and table are your field and table names respectively.
I wouldn't recommend it for production code, but it makes sense and seems to work:
SELECT REGEXP_REPLACE(COLUMN,'[^' || CHR(1) || '-' || CHR(127) || '],'')
The select may look like the following sample:
select nvalue from table
where length(asciistr(nvalue))!=length(nvalue)
order by nvalue;
In a single-byte ASCII-compatible encoding (e.g. Latin-1), ASCII characters are simply bytes in the range 0 to 127. So you can use something like [\x80-\xFF] to detect non-ASCII characters.
There's probably a more direct way using regular expressions. With luck, somebody else will provide it. But here's what I'd do without needing to go to the manuals.
Create a PLSQL function to receive your input string and return a varchar2.
In the PLSQL function, do an asciistr() of your input. The PLSQL is because that may return a string longer than 4000 and you have 32K available for varchar2 in PLSQL.
That function converts the non-ASCII characters to \xxxx notation. So you can use regular expressions to find and remove those. Then return the result.
The following also works:
select dump(a,1016), a from (
SELECT REGEXP_REPLACE (
CONVERT (
'3735844533120%$03  ',
'US7ASCII',
'WE8ISO8859P1'),
'[^!#/\.,;:<>#$%&()_=[:alnum:][:blank:]]') a
FROM DUAL);
I had a similar issue and blogged about it here.
I started with the regular expression for alpha numerics, then added in the few basic punctuation characters I liked:
select dump(a,1016), a, b
from
(select regexp_replace(COLUMN,'[[:alnum:]/''%()> -.:=;[]','') a,
COLUMN b
from TABLE)
where a is not null
order by a;
I used dump with the 1016 variant to give out the hex characters I wanted to replace which I could then user in a utl_raw.cast_to_varchar2.
I found the answer here:
http://www.squaredba.com/remove-non-ascii-characters-from-a-column-255.html
CREATE OR REPLACE FUNCTION O1DW.RECTIFY_NON_ASCII(INPUT_STR IN VARCHAR2)
RETURN VARCHAR2
IS
str VARCHAR2(2000);
act number :=0;
cnt number :=0;
askey number :=0;
OUTPUT_STR VARCHAR2(2000);
begin
str:=’^'||TO_CHAR(INPUT_STR)||’^';
cnt:=length(str);
for i in 1 .. cnt loop
askey :=0;
select ascii(substr(str,i,1)) into askey
from dual;
if askey < 32 or askey >=127 then
str :=’^'||REPLACE(str, CHR(askey),”);
end if;
end loop;
OUTPUT_STR := trim(ltrim(rtrim(trim(str),’^'),’^'));
RETURN (OUTPUT_STR);
end;
/
Then run this to update your data
update o1dw.rate_ipselect_p_20110505
set NCANI = RECTIFY_NON_ASCII(NCANI);
Try the following:
-- To detect
select 1 from dual
where regexp_like(trim('xx test text æ¸¬è© ¦ “xmx” number²'),'['||chr(128)||'-'||chr(255)||']','in')
-- To strip out
select regexp_replace(trim('xx test text æ¸¬è© ¦ “xmxmx” number²'),'['||chr(128)||'-'||chr(255)||']','',1,0,'in')
from dual
You can try something like following to search for the column containing non-ascii character :
select * from your_table where your_col <> asciistr(your_col);
I had similar requirement (to avoid this ugly ORA-31061: XDB error: special char to escaped char conversion failed. ), but had to keep the line breaks.
I tried this from an excellent comment
'[^ -~|[:space:]]'
but got this ORA-12728: invalid range in regular expression .
but it lead me to my solution:
select t.*, regexp_replace(deta, '[^[:print:]|[:space:]]', '#') from
(select '- <- strangest thing here, and I want to keep line break after
-' deta from dual ) t
displays (in my TOAD tool) as
replace all that ^ => is not in the sets (of printing [:print:] or space |[:space:] chars)
Thanks, this worked for my purposes. BTW there is a missing single-quote in the example, above.
REGEXP_REPLACE (COLUMN,'[^' || CHR (32) || '-' || CHR (127) || ']', ' '))
I used it in a word-wrap function. Occasionally there was an embedded NewLine/ NL / CHR(10) / 0A in the incoming text that was messing things up.
Answer given by Francisco Hayoz is the best. Don't use pl/sql functions if sql can do it for you.
Here is the simple test in Oracle 11.2.03
select s
, regexp_replace(s,'[^'||chr(1)||'-'||chr(127)||']','') "rep ^1-127"
, dump(regexp_replace(s,'['||chr(127)||'-'||chr(225)||']','')) "rep 127-255"
from (
select listagg(c, '') within group (order by c) s
from (select 127+level l,chr(127+level) c from dual connect by level < 129))
And "rep 127-255" is
Typ=1 Len=30: 226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
i.e for some reason this version of Oracle does not replace char(226) and above.
Using '['||chr(127)||'-'||chr(225)||']' gives the desired result.
If you need to replace other characters just add them to the regex above or use nested replace|regexp_replace if the replacement is different then '' (null string).
Please note that whenever you use
regexp_like(column, '[A-Z]')
Oracle's regexp engine will match certain characters from the Latin-1 range as well: this applies to all characters that look similar to ASCII characters like Ä->A, Ö->O, Ü->U, etc., so that [A-Z] is not what you know from other environments like, say, Perl.
Instead of fiddling with regular expressions try changing for the NVARCHAR2 datatype prior to character set upgrade.
Another approach: instead of cutting away part of the fields' contents you might try the SOUNDEX function, provided your database contains European characters (i.e. Latin-1) characters only. Or you just write a function that translates characters from the Latin-1 range into similar looking ASCII characters, like
å => a
ä => a
ö => o
of course only for text blocks exceeding 4000 bytes when transformed to UTF-8.
As noted in this comment, and this comment, you can use a range.
Using Oracle 11, the following works very well:
SELECT REGEXP_REPLACE(dummy, '[^ -~|[:space:]]', '?') AS dummy FROM DUAL;
This will replace anything outside that printable range as a question mark.
This will run as-is so you can verify the syntax with your installation.
Replace dummy and dual with your own column/table.
Do this, it will work.
trim(replace(ntwk_slctor_key_txt, chr(0), ''))
I'm a bit late in answering this question, but had the same problem recently (people cut and paste all sorts of stuff into a string and we don't always know what it is).
The following is a simple character whitelist approach:
SELECT est.clients_ref
,TRANSLATE (
est.clients_ref
, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890#$%^&*()_+-={}|[]:";<>?,./'
|| REPLACE (
TRANSLATE (
est.clients_ref
,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890#$%^&*()_+-={}|[]:";<>?,./'
,'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
)
,'~'
)
,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890#$%^&*()_+-={}|[]:";<>?,./'
)
clean_ref
FROM edms_staging_table est