I have a comma-separated values (CSV) file that needs to be loaded into a table using Informatica. The first column contains a concatenated value, joined by '~', which needs to be split into 7 different columns.
Is there any way to do it?
I think there are four ways to do this, and each has pros and cons.
First, read the file using a Source Qualifier with ~ as the delimiter and write it out to a CSV file, so that all columns are delimited by commas. Then use this file as the source for your next process. Pro: easy to build. Cons: it is a two-step process, so it can take more time. Also, if your data contains , or ~ you need to enclose those values in double quotes.
Use a shell script to replace ~ with a comma, so that all columns are delimited by commas, then use this file as the source for your next process. Con: the script needs to be careful not to replace a ~ that occurs inside the data.
You can use one Source Qualifier (reading the file as comma-separated) and then split the first column into 7 parts using an INSTR/SUBSTR combination.
Step 1 - first find the positions of ~ in the first column, like this:
v_pos1 = InStr( col1, '~', 1, 1)
v_pos2 = InStr( col1, '~', 1, 2)
v_pos3 = InStr( col1, '~', 1, 3)
...
Step 2 - extract each part, guarding against missing delimiters:
o_val1 = iif(v_pos1 = 0, col1, SubStr(col1, 1, v_pos1 - 1))
o_val2 = iif(v_pos1 = 0, '', iif(v_pos2 = 0, SubStr(col1, v_pos1 + 1), SubStr(col1, v_pos1 + 1, v_pos2 - v_pos1 - 1)))
o_val3 = iif(v_pos2 = 0, '', iif(v_pos3 = 0, SubStr(col1, v_pos2 + 1), SubStr(col1, v_pos2 + 1, v_pos3 - v_pos2 - 1)))
...
You can also use the newly created LOOP transformation. This is an extension and needs a bit of research.
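As a way to check the INSTR/SUBSTR logic from the third option outside Informatica, here is a minimal Python sketch of the same position arithmetic. The helper names instr and split_fixed are made up for this illustration, and it assumes the last part simply takes whatever remains after the sixth '~'.

```python
def instr(text: str, sep: str, occurrence: int) -> int:
    """Return the 1-based position of the Nth occurrence of sep, or 0 if absent (mimics INSTR)."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(sep, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def split_fixed(col1: str, sep: str = "~", parts: int = 7) -> list[str]:
    """Split col1 into exactly `parts` values using only position arithmetic."""
    # positions[0] is a virtual separator before the string;
    # positions[k] plays the role of v_posk for k >= 1.
    positions = [0] + [instr(col1, sep, k) for k in range(1, parts)]
    values = []
    for k in range(1, parts + 1):
        start = positions[k - 1]
        if k > 1 and start == 0:
            # The separator that would precede this part does not exist.
            values.append("")
            continue
        end = positions[k] if k < parts else 0
        if end == 0:
            # No closing separator: take the rest of the string.
            values.append(col1[start:])
        else:
            values.append(col1[start:end - 1])
    return values


if __name__ == "__main__":
    print(split_fixed("1~2~3~4~5~6~7"))
    print(split_fixed("a~bb~c"))
```

Rows with fewer than seven parts come back padded with empty strings, which matches the '' default used in the iif() guards.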
Given a name_loc column of text like the following:
{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}
I'm trying to extract the names, ideally separated by commas:
Charlie, Wrigley, Ana
I've gotten this far:
SELECT SUBSTRING(CAST(name_loc AS VARCHAR) from '"([^ –]+)')
FROM table;
which returns
Charlie
How can I extend this query to extract all names?
You can do this with a combination of regexp_matches (to extract the names), array_agg (to regroup all matches in a row) and array_to_string (to format the array as you'd like, e.g. with a comma separator):
WITH input(name_loc) AS (
VALUES ('{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}')
, ('{"Other - somewhere"}') -- added this to show multiple rows do not get merged
)
SELECT array_to_string(names, ', ')
FROM input
CROSS JOIN LATERAL (
SELECT array_agg(name)
FROM regexp_matches(name_loc, '"(\w+)', 'g') AS f(name)
) AS f(names);
array_to_string
Charlie, Wrigley, Ana
Other
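For comparison, the same extract-then-join idea can be sketched in Python, used here only to check the pattern outside the database. The function name leading_names is hypothetical. re.findall plays the role of regexp_matches with the 'g' flag, and str.join that of array_to_string.

```python
import re


def leading_names(name_loc: str) -> str:
    """Collect the word that follows each opening double quote and join with ', '."""
    # '"(\w+)' only matches a quote directly followed by word characters,
    # so closing quotes (followed by ',' or '}') are skipped.
    return ", ".join(re.findall(r'"(\w+)', name_loc))


if __name__ == "__main__":
    row = '{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}'
    print(leading_names(row))
```

Like the SQL version, this only captures the first word of each element, so a name containing a space would be cut short.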
My two cents, though I'm rather new to PostgreSQL and had to copy the first piece from @Marth's answer:
WITH input(name_loc) AS (
VALUES ('{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}')
, ('{"Other - somewhere"}')
)
SELECT REGEXP_REPLACE(name_loc, '{?(,)?"(\w+)[^"]+"}?','\1\2', 'g') FROM input;
regexp_replace
Charlie,Wrigley,Ana
Other
Your string literal happens to be a valid array literal.
(Maybe not by coincidence? And the column should be type text[] to begin with?)
If that's the reliable format, there is a safe and simple solution:
SELECT t.id, x.names
FROM tbl t
CROSS JOIN LATERAL (
SELECT string_agg(split_part(elem, ' – ', 1), ', ') AS names
FROM unnest(t.name_loc::text[]) elem
) x;
Or:
SELECT id, string_agg(split_part(elem, ' – ', 1), ', ') AS names
FROM (SELECT id, unnest(name_loc::text[]) AS elem FROM tbl) t
GROUP BY id;
Steps
1. Unnest the array with unnest() in a LATERAL CROSS JOIN, or directly in the SELECT list. See: What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
2. Take the first part with split_part(). I chose ' – ' as delimiter, not just ' ', to allow for names with nested space like "Anne Nicole". See: Split comma separated column data into additional columns
3. Aggregate results with string_agg(). I added no particular order as you didn't specify one. See: Concatenate multiple result rows of one column into one, group by another column
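The steps above can be mirrored in a short Python sketch, purely to illustrate the logic. It assumes the array has already been parsed into a list of strings (parsing the PostgreSQL array literal is not shown), and the function name first_parts is hypothetical.

```python
def first_parts(elements: list[str], delimiter: str = " – ", joiner: str = ", ") -> str:
    """Take the text before the first delimiter in each element and join the results."""
    # split(delimiter, 1)[0] corresponds to split_part(elem, delimiter, 1);
    # join corresponds to string_agg(..., ', ').
    return joiner.join(elem.split(delimiter, 1)[0] for elem in elements)


if __name__ == "__main__":
    row = ["Charlie – White Plains, NY", "Wrigley – Minneapolis, MN", "Ana – Decatur, GA"]
    print(first_parts(row))
```

Because the delimiter includes the surrounding spaces, a name such as "Anne Nicole" stays intact.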
I am a newbie with regex and would like to know if this is possible.
Is it possible to locate the token position of a substring in a string like the sample text below?
AA|BBBBBBBBBB|XXXX||XXXX||FFFFFFFFFFF
Requesting the position of the 1st occurrence of 'XXXX' I must get '3', requesting the 2nd occurrence of 'XXXX' I must get '5', and requesting the 3rd occurrence of 'XXXX' I must get '0', because there is no 3rd occurrence.
Can this be done using just regex?
Thanks in advance.
PS: If it is possible, I will implement this solution on DB2 v7r2 using REGEX functions, to replace a UDF I wrote a long time ago in PLSQL to do this job.
This isn't how I'd normally use regex...
But it can get the job done...
create variable mysource varchar(50)
default('AA|BBBBBBBBBB|XXXX||XXXX||FFFFFFFFFFF');
select
regexp_count(
substring(mysource
, 1
,regexp_instr(mysource
,'XXXX'
,1
,2 --occurrence
,1)
)
,'\|')
from sysibm.sysdummy1;
REGEXP_COUNT
5
Might need to concat a '|' to the end of the source if it's possible for the pattern to fall in the last position.
EDIT
Ok, here's a completely different way...using a recursive common table expression (RCTE)
Note that the solution is easiest if you ensure that the text ends with a delimiter...
create variable mysource varchar(50)
default('AA|BBBBBBBBBB|XXXX||XXXX||FFFFFFFFFFF|');
And the code..
with splitstring (pos, data, remain) as (
select 1
, substring(mysource,1,locate('|', mysource) -1 )
, substring(mysource,locate('|', mysource) + 1 )
from sysibm.sysdummy1
union all
select pos + 1
, substring(remain,1,locate('|', remain) -1 )
, substring(remain,locate('|', remain) + 1 )
from splitstring
where locate('|', remain) > 0 -- stop once no delimiter remains
)
, matches as (
select row_number() over (order by pos) as occur
,pos
from splitString
where data = 'XXXX'
)
select coalesce(pos,0) as pos
from sysibm.sysdummy1
left join matches
on occur = 2 ;
Results
POS
5
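To make the intended behavior concrete, here is a small Python sketch of the same "token position of the Nth occurrence" logic, written only as a reference for checking results. The function name token_position is hypothetical.

```python
def token_position(source: str, token: str, occurrence: int, delimiter: str = "|") -> int:
    """Return the 1-based field number of the Nth field equal to token, or 0 if there is none."""
    seen = 0
    for pos, field in enumerate(source.split(delimiter), start=1):
        if field == token:
            seen += 1
            if seen == occurrence:
                return pos
    return 0


if __name__ == "__main__":
    sample = "AA|BBBBBBBBBB|XXXX||XXXX||FFFFFFFFFFF"
    for n in (1, 2, 3):
        print(n, token_position(sample, "XXXX", n))
```

Unlike the regexp_instr version, this compares whole fields, so a token that merely appears inside a longer field is not counted.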
I would like to eliminate all duplicate words in a comma separated list.
I've tried with:
SELECT
REGEXP_REPLACE(
'1234,234,1234,1234,928,1234,123,1234,Abcd,1234,1234',
'([^,\w]+)(,[ ]*[\1])+') AS r
FROM dual
It should return
1234,234,928,123,Abcd
But in fact it returns
1234,234,234,234
Also tried with ([^,\w]+)(,[ ]*\1)+ but with '1234,1234,1234' it returns (null)
Also tried with
SELECT
REGEXP_REPLACE(
'1234,234,1234,1234,928,1234,123,1234,Abcd,1234,1234',
'([^,\w]+)(,[ ]*[\1])+', '\1') AS r
FROM dual
and following replacements, even '\1\2' but none of them is giving the desired result.
Please, any ideas?
I know this isn't exactly the method you were asking for, but it still achieves the same result:
WITH DATA AS
( SELECT '1234,234,1234,1234,928,1234,123,1234,Abcd,1234,1234' str FROM dual)
SELECT DISTINCT trim(regexp_substr(str, '[^,]+', 1, LEVEL)) str
FROM DATA
CONNECT BY instr(str, ',', 1, LEVEL - 1) > 0
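For reference, the split-then-deduplicate idea looks like this as a minimal Python sketch. The function name dedupe_csv is hypothetical. One difference worth noting: this sketch keeps the first-occurrence order and returns a single string, whereas the query above returns one row per distinct value with no guaranteed order.

```python
def dedupe_csv(text: str, sep: str = ",") -> str:
    """Remove duplicate items from a delimited list, keeping first-occurrence order."""
    items = (item.strip() for item in text.split(sep))
    # dict preserves insertion order, so fromkeys drops later duplicates only.
    return sep.join(dict.fromkeys(items))


if __name__ == "__main__":
    print(dedupe_csv("1234,234,1234,1234,928,1234,123,1234,Abcd,1234,1234"))
```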
I'm trying to split a string using a string as a delimiter, in an Oracle stored procedure. I can use instr easily, but I'm trying to learn how to do this with regex, as I understand that it is powerful and efficient.
After reading some articles, I thought I could do this (expected result was "Hello"):
select regexp_substr('Hello My Delimiter World', '( My Delimiter )+', 1, 1)
from dual
Result:
My Delimiter
and (expected result was "World"):
select regexp_substr('Hello My Delimiter World', '( My Delimiter )+', 1, 2)
from dual
Result:
null
What is the correct regexp_substr for this requirement?
EDIT: I'm looking for something like the below. In a single pass, it selects the sub-string within the string:
E.g. select regexp_substr('Hello World', '[^ ]+', 1, 2) from dual But this sample only works with a single character.
Try these methods.
This gets the first element as you originally asked for:
SQL> with tbl(str) as (
select 'Hello My Delimiter World' from dual
)
SELECT REGEXP_SUBSTR( str ,'(.*?)( My Delimiter |$)', 1, 1, NULL, 1 ) AS element
FROM tbl;
ELEME
-----
Hello
This version parses the whole string. NULL elements added to show it works with missing elements:
SQL> with tbl(str) as (
select ' My Delimiter Hello My Delimiter World My Delimiter  My Delimiter test My Delimiter ' from dual
)
SELECT LEVEL AS element,
REGEXP_SUBSTR( str ,'(.*?)( My Delimiter |$)', 1, LEVEL, NULL, 1 ) AS element_value
FROM tbl
CONNECT BY LEVEL <= regexp_count(str, ' My Delimiter ')+1;
ELEMENT ELEMENT_VALUE
---------- --------------------
1
2 Hello
3 World
4
5 test
6
6 rows selected.
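The element-by-element behavior above can be cross-checked with a short Python sketch that splits on a multi-character delimiter using a regex. The function names split_on and nth_element are hypothetical. Note that Python returns empty strings where Oracle shows NULL.

```python
import re


def split_on(text: str, delimiter: str) -> list[str]:
    """Split text on a literal multi-character delimiter, keeping empty elements."""
    # re.escape makes sure the delimiter is treated as plain text, not as a pattern.
    return re.split(re.escape(delimiter), text)


def nth_element(text: str, delimiter: str, n: int):
    """Return the Nth (1-based) element, or None when there is no such element."""
    parts = split_on(text, delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else None


if __name__ == "__main__":
    print(nth_element("Hello My Delimiter World", " My Delimiter ", 1))
    print(nth_element("Hello My Delimiter World", " My Delimiter ", 2))
```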
I'm trying to do a regex match to return a substring between a start and end point.
Given the following table:
WITH test AS (SELECT 'ABCD_EFGH_THIS_IJKL' AS thetext FROM DUAL
UNION SELECT 'ABAB CDCD EG BCD' FROM DUAL)
SELECT *
FROM test
I would want to return the results:
'THIS'
NULL
So it would match THIS in the first string, and nothing in the second string.
For this it is safe to assume that ABCD_EFGH_ precedes the text I want to match, and _ follows the text I want to match.
Thanks for any help!
EDIT: This needs to work on 10g. Sorry for not making that clear turbanoff.
Use REGEXP_SUBSTR with its subexpression argument, which is available from 11g:
WITH test AS (SELECT 'ABCD_EFGH_THIS_IJKL' AS thetext FROM DUAL
UNION SELECT 'ABAB CDCD EG BCD' FROM DUAL)
SELECT REGEXP_SUBSTR( TEST.THETEXT, 'ABCD_EFGH_([^_]*).*', 1, 1, 'i', 1)
FROM test
Edit
This can be done without using regular expressions.
WITH test AS (SELECT 'ABCD_EFGH_THIS_IJKL' AS thetext FROM DUAL
UNION SELECT 'ABAB CDCD EG BCD' FROM DUAL)
select TEST.thetext
, instr(TEST.thetext, 'ABCD_EFGH_') + length('ABCD_EFGH_') START_POS
, instr(TEST.thetext, '_', length('ABCD_EFGH_') + 1) END_POS
, substr
(TEST.thetext
,instr(TEST.thetext, 'ABCD_EFGH_') + length('ABCD_EFGH_') --START_POS
,instr(TEST.thetext, '_', length('ABCD_EFGH_') + 1) - (instr(TEST.thetext, 'ABCD_EFGH_') + length('ABCD_EFGH_')) --END_POS - START_POS
) RESULT
FROM test
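The INSTR/SUBSTR arithmetic above boils down to "find the prefix, then find the next terminator after it". A minimal Python sketch of that logic, with a hypothetical function name text_between, can serve as a reference when checking the expected results:

```python
def text_between(text: str, prefix: str = "ABCD_EFGH_", terminator: str = "_"):
    """Return the text between prefix and the next terminator, or None if either is missing."""
    start = text.find(prefix)
    if start == -1:
        return None
    start += len(prefix)  # first character after the prefix (START_POS)
    end = text.find(terminator, start)  # next terminator after the prefix (END_POS)
    if end == -1:
        return None
    return text[start:end]


if __name__ == "__main__":
    for value in ("ABCD_EFGH_THIS_IJKL", "ABAB CDCD EG BCD"):
        print(repr(text_between(value)))
```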