postgres string compare - regex

I am using PostgreSQL 8.3 (Greenplum). I am trying to compare two tables on a single column called col_name. What I need is a partial string comparison of the column values in both tables. The values have the form xx.yyy.zzz. I want to keep only the first part, 'xx', and discard everything after it, '.yyy.zzz', so that two rows are compared only on the text before the first period. The 'xx' part can vary in length.
I am using the following logic, but I can't see why it is not working:
select
distinct x.col_name,
x.col_num
from table_A x
left outer join table_b y
on
regexp_matches((x.col_name,'^(?:([^.]+)\.?){1}',1),(y.col_name,'^(?:([^.]+)\.?){1}', 1))
and x.col_num=y.col_num;
I am getting this error:
ERROR: function regexp_matches(record, record) does not exist
LINE 36: regexp_matches((x.col_name,'^(?:([^.]+).?){1}', 1),(y....
^
HINT: No function matches the given name and argument types. You may need to add explicit type casts.
********** Error **********
ERROR: function regexp_matches(record, record) does not exist
SQL state: 42883
Hint: No function matches the given name and argument types. You may need to add explicit type casts.
Character: 917
Can anyone help me out?
Thanks!

You can use the split_part function. Split the string to parts using '.' as the delimiter and compare the first components.
See documentation
So your query would be:
select
distinct x.col_name,
x.col_num
from table_A x
left outer join table_b y
on split_part(x.col_name, '.', 1) = split_part(y.col_name, '.', 1)
and x.col_num=y.col_num;
Your original query produces an error because the arguments passed to regexp_matches are not what the function expects.
The signature is regexp_matches(string text, pattern text [, flags text]), but your first argument is (x.col_name,'^(?:([^.]+)\.?){1}',1), which is a row value (a record), not a string; the same applies to the second argument. That is why the error mentions regexp_matches(record, record).
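To check the comparison logic outside the database, here is a minimal Python sketch of what split_part(col_name, '.', 1) does on each side of the join. The helper names first_component and same_prefix are made up for illustration and are not part of PostgreSQL.

```python
def first_component(value: str, delimiter: str = ".") -> str:
    """Return the text before the first delimiter.

    Mirrors split_part(value, '.', 1): if the delimiter is absent,
    the whole string is returned.
    """
    return value.split(delimiter, 1)[0]


def same_prefix(left: str, right: str) -> bool:
    """Compare two values only up to the first period."""
    return first_component(left) == first_component(right)


if __name__ == "__main__":
    print(first_component("xx.yyy.zzz"))
    print(same_prefix("abc.1.2", "abc.9"))
```

The prefix can be any length, since the split happens at the first period rather than at a fixed position.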

Related

Extract string after last match strings [duplicate]

I'm using BigQuery and I want to extract the part of a string that follows the last occurrence of a specific substring; in my case, the substring is sc.
I have a string like this:
www.xxss.com?psct=T-EST2%20.coms&.com/u[sc'sc(mascscin', sc'.c(scscossccnfiscg.scjs']-/ci=1(sctitis)
My expected result is:
titis)
Is this possible?
In general, across all RDBMSs, the index of the last occurrence of a match in a string is easy to compute by first reversing the string; then we only need to look for the first match.
Update: BigQuery
Follow the documentation for REGEXP_EXTRACT in the String Functions documentation for BigQuery
NOTE: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax.
However, this problem can be solved without RegEx.
BigQuery supports array processing and has a SPLIT function, so you could split on the lookup string and keep only the last element (OFFSET is zero-based, so after reversing the array the last element is at OFFSET(0)):
SELECT ARRAY_REVERSE(SPLIT( !YOUR COLUMN HERE! , "sc"))[OFFSET(0)]
The following adaptation of my original submission may still work:
SELECT REVERSE(SUBSTR(REVERSE(@text), 1, STRPOS(REVERSE(@text), "cs") -1))
For those who have a similar requirement in MS SQL Server, the following syntax can be used.
Other RDBMSs can use a similar query; you will have to use the appropriate platform functions to achieve the result.
DECLARE @text varchar(200) = 'www.xxss.com?psct=T-EST2%20.coms&.com/u[sc''sc(mascscin'', sc''.c(scscossccnfiscg.scjs'']-/ci=1(sctitis)'
SELECT REVERSE(LEFT(REVERSE(@text), CharIndex('cs', REVERSE(@text),1) -1))
Produces: titis)
You could achieve a similar result by obtaining the last index of 'sc' as above and using that value in SUBSTRING; however, for that to work you would need to re-compute the length. This solution instead uses LEFT and then reverses the result, which saves one function call.
Step through this:
Reverse the value:
SELECT REVERSE(@text)
Results in:
)sititcs(1=ic/-]'sjcs.gcsifnccssocscs(c.'cs ,'nicscsam(cs'cs[u/moc.&smoc.02%2TSE-T=tcsp?moc.ssxx.www
Now we find the first index of 'cs'.
Note: we have to reverse the sequence of the lookup string as well!
SELECT CharIndex('cs', REVERSE(@text),1)
Result: 7
Select the characters before this index:
Note: CharIndex returns a 1-based index that points at the match itself, so we subtract 1 to keep only the characters before it.
SELECT LEFT(REVERSE(@text), CharIndex('cs', REVERSE(@text),1) -1)
Finally, we reverse the result:
SELECT REVERSE(LEFT(REVERSE(@text), CharIndex('cs', REVERSE(@text),1) -1))
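The reverse, find, cut, reverse steps can be reproduced in a few lines of Python to check the logic against the sample string. This is only a sketch: after_last is a hypothetical helper name, and returning the whole input when the marker is missing is an assumption, not what the SQL above does.

```python
TEXT = "www.xxss.com?psct=T-EST2%20.coms&.com/u[sc'sc(mascscin', sc'.c(scscossccnfiscg.scjs']-/ci=1(sctitis)"


def after_last(text: str, marker: str) -> str:
    """Return the part of text that follows the last occurrence of marker."""
    reversed_text = text[::-1]
    # The marker must be reversed too, exactly as 'sc' becomes 'cs' in the SQL.
    index = reversed_text.find(marker[::-1])  # 0-based, -1 when absent
    if index == -1:
        return text  # assumption: no marker means "keep everything"
    # Characters before the match in the reversed string, reversed back.
    return reversed_text[:index][::-1]


if __name__ == "__main__":
    print(after_last(TEXT, "sc"))
```

Python's str.find is 0-based, so the match position is one less than the 1-based value CharIndex reports, and no extra -1 is needed when slicing.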
I guess you could use 'sc' as the separator and, if the string length is constant, define the string length in your query (wildcard):
STRING_SPLIT ( string , separator )

Partially match integers in PostgreSQL queries

So in my PostgreSQL 10 I have a column of type integer. This column represents a code of products and it should be searched against another code or part of the code. The values of the column are made of three parts, a five-digit part and two two-digit parts. Users can search for only the first part, the first-second or first-second-third.
So, in my column I have, say, 123451233. The user searches for 12345 (the first part), and I want to be able to return 123451233. The same goes if the user searches for 1234512 or 123451233.
Unfortunately I cannot change the type of column or break the one column into three (one for every part). How can I do this? I cannot use LIKE. Maybe something like a regex for integers?
Thanks
Consider using simple arithmetic.
floor(log(value))::int + 1 returns the number of digits in the integer part of value (floor matters here: a plain ::int cast rounds, so a value such as 99999 would be counted as having six digits). Using this:
value/(10^(floor(log(value))::int-floor(log(search_input))::int))::int
returns value truncated to the same number of digits as search_input, so, finally,
search_input = value/(10^(floor(log(value))::int-floor(log(search_input))::int))::int
will do the trick.
It looks more complex, but it could also be more efficient than string manipulation.
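As a rough illustration of the arithmetic, the Python sketch below truncates the stored code to the number of digits in the search value and compares the two. It assumes positive integers, counts digits with integer division rather than a logarithm to sidestep floating-point rounding, and uses made-up helper names (digit_count, matches_prefix).

```python
def digit_count(n: int) -> int:
    """Number of decimal digits in a positive integer."""
    count = 1
    while n >= 10:
        n //= 10
        count += 1
    return count


def matches_prefix(value: int, search_input: int) -> bool:
    """True when the leading digits of value equal search_input."""
    extra = digit_count(value) - digit_count(search_input)
    if extra < 0:
        # The search term has more digits than the stored code.
        return False
    # Integer division drops the trailing `extra` digits of value.
    return value // 10 ** extra == search_input


if __name__ == "__main__":
    for term in (12345, 1234512, 123451233, 12346):
        print(term, matches_prefix(123451233, term))
```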
PS: But with an index such as create index idx on your_table(cast(your_column as text)); a search like
select * from your_table
where cast(your_column as text) like search_input || '%';
is the best option, IMO.
You do not need regex functions. Cast the integer to text and use the function left(), for example:
create table my_table(code int); -- or bigint
insert into my_table values (123451233);
with input_data(input_code) as (
values('1234512')
)
select t.*
from my_table t
cross join input_data
where left(code::text, length(input_code)) = input_code;
code
-----------
123451233
(1 row)

How to remove the space between the minus sign and the number in Informatica

I have an issue where an amount field contains data like
(- 98765.00), i.e. minus{spaces}{numbers}. I need to remove the space between the minus sign and the number to get (-98765.00). How do I do this in an Expression transformation?
The field datatype is decimal (8,2).
Thanks,
Kiran
output_port: TO_DECIMAL(REPLACECHR(FALSE,input_port,' ',''))
REPLACECHR replaces the blanks with an empty string, essentially removing them. The first argument (TRUE/FALSE) specifies whether the match is case sensitive, which does not matter in this case.
You can use the REG_REPLACE function to remove the space.
To achieve this, follow the steps below:
* Create two variable ports
* REG_REPLACE requires a string column, so you need to convert the decimal column to a string using the TO_CHAR function
First variable port (string) - TO_CHAR(column_name)
* The previous port converts the data to a string; now apply REG_REPLACE and convert the result back to decimal
Second variable port (decimal) - to_decimal(reg_replace(first_variable_port,'\s+',''))
\s - matches whitespace in Informatica regular expressions
I tested this with the same number you provided, using the same data type and function, and the debugger shows the expected result with the white space removed.
Maybe the issue is in one of the other transformations the data passes through. Debug and verify the data.
I hope this helps; feel free to ask if you have any issues.
For more on Informatica, see https://etlinfromatica.wordpress.com/
If my understanding is correct, you need to replace both the spaces and the brackets. Here's the expression:
TO_DECIMAL(
REPLACECHR(0,
REPLACECHR(0, '(- 98765.00)', ' ', '') -- this part does the space replacement
, '()', '') -- this part replaces the brackets
)
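For readers who want to check the expected result outside Informatica, here is a small Python sketch of the same idea: drop the spaces and the brackets, then convert to a decimal. The function name clean_amount is hypothetical, and the input is assumed to arrive as a string.

```python
from decimal import Decimal


def clean_amount(raw: str) -> Decimal:
    """Remove spaces and parentheses, then parse the remainder as a decimal."""
    # Same character set the nested REPLACECHR calls strip: ' ', '(' and ')'.
    cleaned = "".join(ch for ch in raw if ch not in " ()")
    return Decimal(cleaned)


if __name__ == "__main__":
    print(clean_amount("(- 98765.00)"))
```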

postgresql: How to concatenate two regexp_matches()

I'm trying to extract both ints and chars from names such as 123A America, 234B Britania.
I only want the number and the attached letter (i.e. 123A).
I'm using regexp_matches(name, '(\d+)(\D)') and it results as:
{123,A},
{456,B}
I thought of using concatenation, getting the first and the second element of the array with two separate function calls:
(regexp_matches(name, '(\d+)(\D)' )) [1] || (regexp_matches(name, '(\d+)(\D)' )) [2]
But it generates an error:
ERROR: functions and operators can take at most one set argument
How can I get the two element as one string?
You don't have to get the two items you're searching for as different sets, just get them as a single set. Remove the )( between \d+ and \D and that will return a set containing the entire string you're looking for.
Results in this -
regexp_matches('123A America, 234B Britania', '(\d+\D)' )
This will only find the first match. To get all matching substrings, use the g flag -
regexp_matches('123A America, 234B Britania', '(\d+\D)', 'g')
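The effect of merging the two groups into one pattern can be checked quickly in Python, whose re module accepts the same \d+\D pattern. This is only a sketch of the matching behaviour, not of regexp_matches itself: search stands in for the call without the g flag and findall for the call with it.

```python
import re

NAMES = "123A America, 234B Britania"
PATTERN = re.compile(r"\d+\D")  # digits followed by one non-digit character

# First match only, comparable to calling regexp_matches without 'g'.
first = PATTERN.search(NAMES).group(0)

# Every match, comparable to calling regexp_matches with the 'g' flag.
all_matches = PATTERN.findall(NAMES)

if __name__ == "__main__":
    print(first)
    print(all_matches)
```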
Good answer by @Scott S. However, if you can't achieve what you need within one capture group, the solution is to write a function, assign the regexp result to a variable, and then use it.
CREATE OR REPLACE FUNCTION do_something(_input character varying)
RETURNS character varying AS
$BODY$
DECLARE
matches text[];
BEGIN
matches := regexp_matches(_input, '^([0-9]{1,}_[^_]{1,})_[a-z]{1,}(.*)$','i');
return substring(matches[1], 0, 24)||matches[2];
END
$BODY$
LANGUAGE plpgsql;

How to compare Unicode characters in SQL server?

Hi I am trying to find all rows in my database (SQL Server) which have character é in their text by executing the following queries.
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\xE9]%',question) > 0;
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\u00E9]%',question) > 0;
But I found two problems: (a) the two queries return different numbers of rows, and (b) they return rows which do not contain the specified character.
Is the way I am constructing the regular expression and comparing the Unicode correct?
EDIT:
The question column is stored using datatype nvarchar.
The following query gives the correct result though.
SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%';
Why not use SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%'?
NB: LIKE and PATINDEX do not accept regular expressions.
In the SQL Server pattern syntax, [\xE9] means match any single character within the specified set, i.e. match \, x, E or 9. So any of the following strings would match that pattern:
"Elephant"
"axis"
"99.9"
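To make the point concrete, the Python sketch below emulates how the bracket expression is read: as a set of four literal characters rather than an escape for é. The helper name matches_bracket_set is made up, and the comparison here is case sensitive, which is an assumption; under a case-insensitive collation, lower-case e and x would match as well.

```python
# The four literal characters SQL Server sees inside [\xE9]: backslash, x, E, 9.
LITERAL_SET = set("\\xE9")


def matches_bracket_set(text: str, charset: set = LITERAL_SET) -> bool:
    """True when text contains at least one character from charset."""
    return any(ch in charset for ch in text)


if __name__ == "__main__":
    for sample in ("Elephant", "axis", "99.9", "café"):
        print(sample, matches_bracket_set(sample))
```

A string that does contain é but none of those four characters is not matched, which is why the PATINDEX queries return unrelated rows.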