Postgres set varchar field to regular expression of itself - regex

I'm trying to normalise a data field by removing a fairly common postfix. I've got as far as using the substring() function in postgres, but can't quite get it to work. For example, if I want to strip the postfix 'xyz' from any values that have it;
UPDATE my_table SET my_field=substring(my_field from '#"%#"xyz' for '#');
But this is having some weird effects that I cant pin down. Any thoughts? Many thanks as always.

update my_table
set my_field = regexp_replace(my_field, 'xyz$', '')
where my_field ~ 'xyz$';
This will also change the value 'xyz' into an empty string. I don't know if you want that (or if the suffix can exists "on it's own".
The where clause is not strictly necessary but will make the update more efficient because only those rows are updated that actually meet the criteria.

UPDATE my_table
SET my_field = left(my_field, -3)
WHERE my_field LIKE '%xyz';
For several reasons:
If you don't want to change every single row, always add a WHERE clause to your UPDATE. Even if only some rows are actually changed by the expression. An UPDATE from the same value to the same value is still an UPDATE and will produce dead rows and table bloat and trigger triggers ...
Use left() in combination with LIKE.
left() with a negative second parameter effectively trims the number of character from the end of the string. left() was introduced with PostgreSQL 9.1. I quote the manual here:
When n is negative, return all but last |n| characters.
Always pick LIKE over a regular expression (~) if you can. LIKE is not as versatile, but much faster. (SIMILAR TO is rewritten as regular expression internally). Details in this related answer on dba.SE.
If you want to make sure that a minimum of characters remains:
WHERE my_field LIKE '_%xyz'; -- prepend as many _ as you want chars left
substring() would work like this (one possibility):
substring(my_field, '^(.*)xyz$');

Related

Having difficulty in pattern matching Postal Codes for an oracle regexp_like command

The Problem:
All I'm trying to do is come up with a pattern matching string for my regular expression that lets me select Canadian postal codes in this format: 'A1A-2B2' (for example).
The types of data I am trying to insert:
Insert Into Table
(Table_Number, Person_Name, EMail_Address, Street_Address, City, Province, Postal_Code, Hire_Date)
Values
(87, 'Tommy', 'mobster#gmail.com', '123 Street', 'location', 'ZY', 'T4X-1S2', To_Date('30-Aug-2020 08:50:56');
This is a slightly modified/generic version to protect some of the data. All of the other columns enter just fine/no complaints. But the postal code it does not seem to like when I try to run a load data script.
The Column & Constraint in question:
Postal_Code varchar2(7) Constraint Table_Postal_Code Null
Constraint CK_Postal_Code Check ((Regexp_like (Postal_Code, '^\[[:upper:]]{1}[[:digit:]]{1}[[:upper:]][[:punct:]]{1}[[:digit:]]{1}[[:upper:]](1}[[:digit:]]{1}$')),
My logic here: following the regular expression documentation:
I have:
an open quote
a exponent sign to indicate start of string
Backslash (I think to interpet a string literal)
-1 upper case letter, 1 digit, 1 uppercase , 1 :punct: to account for the hypen, 1 digit, 1 upper case letter, 1 digit
$ to indicate end of string
Close quote
In my mind, something like this should work, it accounts for every single letter/character and the ranges they have to be in. But something is off regarding my formatting of this pattern matching string.
The error I get is:
ORA-02290: check constraint (user.CK_POSTAL_CODE) violated
(slightly modified once more to protect my identity)
Which tells me that the data insert statement is tripping off my check constraint and thats about it. So its as issue with the condition of the constraint itself - ie string I'm using to match it. My instructor has told me that insert data is valid, and doesn't need any fix-up so I'm at a loss.
Limits/Rules: The Hyphen has to be there/matched to my understanding of the problem. They are all uppercase in the dataset, so I don't have to worry about lowercase for this example.
I have tried countless variations of this regexp statement to see if anything at all would work, including:
changing all those uppers to :alpha: , then using 'i' to not check for case sensitivity for the time being
removing the {1} in case that was redudant
using - (backslash hyphen) , to turn into a string literal maybe
using only Hyphen by itself
even removing regexp altogether and trying a LIKE [A-Z][0-9][A-Z]-[0-9][A-Z][0-9] etc
keeping the uppers , turning :digit:'s to [0-9] to see if that would maybe work
The only logical thing I can think of now is: the check constraint is actually working fine and tripping off when it matches my syntax. But I didn't write it clearly enough to say "IGNORE these cases and only get tripped/activated if it doesn't meet these conditions"
But I'm at my wits end and asking here as a last resort. I wouldn't if I could see my mistake eventually - but everything I can think of, I probably tried. I'm sure its some tiny formatting rule I just can't see (I can feel it).Thank you kindly to anyone who would know how to format a pattern matching string like this properly.
It looks like you may have been overcomplicating the regex a bit. The regex below matches your description based on the first set of bullets you lined out:
REGEXP_LIKE (postal_code, '^[A-Z]\d[A-Z]-\d[A-Z]\d$')
I see two problems with that regexp.
Firstly, you have a spurious \ at the start. It serves you no purpose, get rid of it.
Secondly, the second-from last {1} appears in your code with mismatched brackets as (1}. I get the error ORA-12725: unmatched parentheses in regular expression because of this.
To be honest, you don't need the {1}s at all: they just tell the regular expression that you want one of the previous item, which is exactly what you'd get without them.
So you can fix the regexp in your constraint by getting rid of the \ and removing the {1}s, including the one with mismatched parentheses.
Here's a demo of the fixed constraint in action:
SQL> CREATE TABLE postal_code_test (
2 Postal_Code varchar2(7) Constraint Table_Postal_Code Null
3 Constraint CK_Postal_Code Check ((Regexp_like (Postal_Code, '^[[:upper:]][[:digit:]][[:upper:]][[:punct:]][[:digit:]][[:upper:]][[:digit:]]$'))));
Table created.
SQL> INSERT INTO postal_code_test (postal_code) VALUES ('T4X-1S2');
1 row created.
SQL> INSERT INTO postal_code_test (postal_code) VALUES ('invalid');
INSERT INTO postal_code_test (postal_code) VALUES ('invalid')
*
ERROR at line 1:
ORA-02290: check constraint (user.CK_POSTAL_CODE) violated
You do not need the backslash and you have (1} instead of {1}.
You can simplify the expression to:
Postal_Code varchar2(7)
Constraint Table_Postal_Code Null
Constraint CK_Postal_Code Check (
REGEXP_LIKE(Postal_Code, '^[A-Z]\d[A-Z][[:punct:]]\d[A-Z]\d$')
)
or:
Constraint CK_Postal_Code Check (
REGEXP_LIKE(
Postal_Code,
'^[A-Z][0-9][A-Z][[:punct:]][0-9][A-Z][0-9]$'
)
)
or:
Constraint CK_Postal_Code Check (
REGEXP_LIKE(
Postal_Code,
'^[[:upper:]][[:digit:]][[:upper:]][[:punct:]][[:digit:]][[:upper:]][[:digit:]]$'
)
)
or (although the {1} syntax is redundant here):
Constraint CK_Postal_Code Check (
REGEXP_LIKE(
Postal_Code,
'^[[:upper:]]{1}[[:digit:]]{1}[[:upper:]]{1}[[:punct:]]{1}[[:digit:]]{1}[[:upper:]]{1}[[:digit:]]{1}$'
)
)
fiddle
removing regexp altogether and trying a LIKE [A-Z][0-9][A-Z]-[0-9][A-Z][0-9] etc
That will not work as the LIKE operator does not match regular expression patterns.

Partially match integers in PostgreSQL queries

So in my PostgreSQL 10 I have a column of type integer. This column represents a code of products and it should be searched against another code or part of the code. The values of the column are made of three parts, a five-digit part and two two-digit parts. Users can search for only the first part, the first-second or first-second-third.
So, in my column I have , say 123451233 the user searches for 12345 (the first part). I want to be able to return the 123451233. Same goes if the users also searches for 1234512 or 123451233.
Unfortunately I cannot change the type of column or break the one column into three (one for every part). How can I do this? I cannot use LIKE. Maybe something like a regex for integers?
Thanks
Consider to use simple arithmetic.
log(value)::int + 1 returns the number of digits in integer part of the value and using this:
value/(10^(log(value)::int-log(search_input)::int))::int
returns value truncated to the same digits number as search_input so, finally
search_input = value/(10^(log(value)::int-log(search_input)::int))::int
will make the trick.
It is more complex literally but also could be more efficient then strings manipulations.
PS: But having index like create index idx on your_table(cast(your_column as text)); search like
select * from your_table
where cast(your_column as text) like search_input || '%';
is the best case IMO.
You do not need regex functions. Cast the integer to text and use the function left(), example:
create table my_table(code int); -- or bigint
insert into my_table values (123451233);
with input_data(input_code) as (
values('1234512')
)
select t.*
from my_table t
cross join input_data
where left(code::text, length(input_code)) = input_code;
code
-----------
123451233
(1 row)

How to remove the space between the minus sign and number's in informatica

i have a issue where the there is a amount field which has data like
(- 98765.00),minus{spaces]{numbers} ?, i need to remove the space between the minus and the number and get is as (-98765.00), how do i do it in expression transformation.
field datatype is decimal (8,2).
Thanks,
Kiran
output_port: TO_DECIMAL(REPLACECHR(FALSE,input_port,' ',''))
REPLACECHR replaces the blanks with empty character, essentially removing them. The first argument can be TRUE/FALSE to specify case sensitive or not, but it is not important in this case.
You can use REG_REPLACE function to replace space
To achieve this you need to follow below steps,
* Create two variable ports
* REG_REPLACE - function requires string column, so you need to convert the decimal column to string column using TO_CHAR function
First variable port(string) - TO_CHAR(column_name)
* In previous port data is converted to string, now convert it again to decimal and apply REG_REPLACE function
Second variable port(decimal) - to_decimal(reg_replace(first_variable_port,'s+',''))
s - determines the white spaces in informatica regular expression
See the below image,
same number which you provided is used. Use the same data type and function
Debugger gives the exact result by removing white space in the below image,
May be you have the issue with other transformations which you are passing through. Debug and verify the data once.
Hope you got it, any issues feel free to ask
To have enjoy informatica, have a fun on https://etlinfromatica.wordpress.com/
If my understanding is correct, you need to replace both the spaces and the brackets. Here's the expression:
TO_DECIMAL(
REPLACECHR(0,
REPLACECHR(0, '(- 98765.00)', ' ', '') -- this part does the space replacement
, '()', '') -- this part replaces the brackets
)

DB2: find field value where first character is a lower case letter

I am trying to pick out a value in a field where the first character is a lower case letter. This is difficult since DB2 does not permit regular expressions. My current attempt is:
select * from mytable
where field1 like lcase('_%')
where I was hoping the underscore followed by percent wildcard would find any character in the first position, and then wrap the lcase() around that to ensure it is lower case. the result is that any and every value gets selected, so the lcase() is not performing what I want it to do, and in hindsight is used to cast to lowercase.
With that in mind, how to I ensure that the result of
('_%')
is lowercase only?
Thanks very much
i would use something like:
... where substr(field1,1,1) <> upper(substr(field1,1,1))
solution with 'a'...'z' will not work with characters different from latin characterset (e.g. cyrilic etc)
Why not
where field1 >= 'a' and field1 < '{'
This will even make use of an appropriate index, if any.
Be warned, however, that this won't work when your DB instance does lexicongraphic ordering. I am not sure if the latter is a DB attribute or a session attribute, however.
Another, more general way (especially when considering non ASCII letters) would be to check if the length of the field is > 0 and the lowercased substring consisting of the first character equals the substring consisting of the first character while the uppercased first character does not equal the first character. (Look up the functions in the DB2 reference, I have mine not ready at the moment.)
DB2 DOES allow Regular expressions with xQuery. For example:
with cteGender(VALUE) as
(
values
('M'),('F'),('U'),('S'),(' M'),('f')
),
cteResult(VALUE,RESULT_BOOLEAN) as
(
select '"' || VALUE || ‘"',
xmlquery('fn:matches($VALUE,''^[MFU]{1}$'')') from cteGender
)
select VALUE, RESULT_BOOLEAN,
xmlcast(RESULT_BOOLEAN as integer) RESULT_INTEGER from cteResult;
I took this example from: http://www.idug.org/p/bl/et/blogid=278&blogaid=187 That article explain very well how to use xQuery.
DB2 does not have SQL functions for Regular Expressions, but with xQuery you can do that. But if you really want SQL functions for RegEx, please visit this site: https://www.ibm.com/developerworks/jp/data/library/db2/j_d-regularexpression/ (In Japanese, but the code can be understood)
For more information about RegEx in DB2 please visit: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.xml.doc/doc/xqrregexp.html

regular expression clob field

I have a question related to an regular expression in oracle 10.
Assuming I have a value like 123456;12345;454545 stored in a clob field, is there a way via an regular expression to only filter on the second pattern (12345) knowing that the value can be more then 5 digits but always occurs after the first semicolon and always has a trailing semicolon at the end?
Thanks a lot for your support in that matter,
Have a nice day,
This query should give you your desired output.
SELECT REGEXP_REPLACE(REGEXP_SUBSTR('123456;12345;454545;45634',';[0-9]+;'),';')
FROM dual;
You can get filter any pattern using this query just change 2 to any value, but it should be less than or equal to the number of elements in the string
with tab(value) as
(select '123456;12345;454545' from dual)
select regexp_substr(value, '[^;]+', 1, 2) from tab;
easily by one call:
select regexp_replace('123456;12345;454545','^[0-9]+;([0-9]+);.*$','\1')
from dual;
perhaps, regexp expression can be modified in a way of more good-looking or your business logic, but the idea, I think, is clear.
select regexp_replace(regexp_substr(Col_name,';\d+;'),';','') from your_table;