Oracle joins vs Informatica joins

I have two tables, A (master) and B (detail).
In Oracle: A right join B --> matched records from both + non-matched records from the right (B)
A left join B --> matched records from both + non-matched records from the left (A)
So what is the equivalent output in Informatica?

Oracle and Informatica outer joins work in a similar way.
In Oracle, 'master' right join 'detail' means all matching rows + non-matching rows from 'detail'.
In the Informatica Joiner Transformation, check the Join Type in the properties:
A 'Master' outer join means all matching rows + non-matching rows from 'detail' (B).
A 'Detail' outer join means all matching rows + non-matching rows from 'master' (A).
A 'Full' outer join means all matching rows + non-matching rows from 'detail' (B) + non-matching rows from 'master' (A).

Master outer join: In Master outer join, all rows from the Detail source are returned by the join and only matching rows from the Master source are returned.
Detail outer join: In detail outer join, only matching rows are returned from the Detail source, and all rows from the Master source are returned.
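Full outer join: In a full outer join, all records from both sources are returned. Master outer and detail outer joins are each equivalent to a SQL left outer join with the preserved source on the left: master outer corresponds to detail LEFT JOIN master, and detail outer to master LEFT JOIN detail.
Normal join: In a normal join, only matching rows are returned from both sources.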
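To make the four join types concrete, here is a minimal Python sketch that applies each of them to two tiny in-memory tables. The sample rows, the `id` key and the `joiner` helper are hypothetical. The code only models which rows each join type keeps; it says nothing about how the Joiner Transformation is actually implemented.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample data: A is the master source, B the detail source,
# joined on the "id" column.
MASTER: List[Dict[str, object]] = [{"id": 1, "a": "m1"}, {"id": 2, "a": "m2"}]
DETAIL: List[Dict[str, object]] = [{"id": 2, "b": "d2"}, {"id": 3, "b": "d3"}]

# Each output row is reduced to (master id, detail id); None plays the role of NULL.
Pair = Tuple[Optional[object], Optional[object]]
JOIN_TYPES = ("normal", "master_outer", "detail_outer", "full_outer")


def joiner(master: List[Dict[str, object]], detail: List[Dict[str, object]],
           join_type: str) -> List[Pair]:
    """Return the (master_id, detail_id) pairs kept by an Informatica-style join type."""
    if join_type not in JOIN_TYPES:
        raise ValueError(f"unknown join type: {join_type}")
    pairs: List[Pair] = []
    matched_master, matched_detail = set(), set()
    for m in master:
        for d in detail:
            if m["id"] == d["id"]:  # the join condition
                pairs.append((m["id"], d["id"]))
                matched_master.add(m["id"])
                matched_detail.add(d["id"])
    # Master outer (and full): also keep unmatched DETAIL rows, like A RIGHT JOIN B.
    if join_type in ("master_outer", "full_outer"):
        pairs += [(None, d["id"]) for d in detail if d["id"] not in matched_detail]
    # Detail outer (and full): also keep unmatched MASTER rows, like A LEFT JOIN B.
    if join_type in ("detail_outer", "full_outer"):
        pairs += [(m["id"], None) for m in master if m["id"] not in matched_master]
    return pairs


for jt in JOIN_TYPES:
    print(jt, joiner(MASTER, DETAIL, jt))
```

With this data, master outer keeps detail row 3 (like A RIGHT JOIN B), detail outer keeps master row 1 (like A LEFT JOIN B), and full outer keeps both.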


Vertica REGEXP_SUBSTR use /g flag

I am trying to extract all occurrences of a word before '=' in a string. I tried the regex '/\w+(?=\=)/g', but it returns null. When I remove the leading '/' and the trailing '/g', it returns only one occurrence, which is why I need the global flag. Any suggestions?
As Wiktor pointed out, by default a REGEXP_SUBSTR() call returns only the first match. But you can get the second, third, fourth, etc.
When a regular expression is embedded in SQL, you need to treat it differently from the way you would in Perl, for example. The pattern is just the pattern; modifiers go in a separate argument, you can't use $n to get the n-th captured subexpression, and you need to proceed in a specific way to get the n-th match of a pattern.
The trick is to CROSS JOIN your queried table with an in-line index table consisting of as many consecutive integers as the number of occurrences you expect, plus a few more for safety. Vertica's REGEXP_SUBSTR() accepts additional parameters (start position, occurrence, modifiers and captured subexpression) that make this possible. See this example:
WITH
-- one exemplary input row; concatenating substrings for
-- readability
input(s) AS (
SELECT 'DRIVER={Vertica};COLUMNSASCHAR=1;CONNECTIONLOADBALANCE=True;'
||'CONNSETTINGS=set+search_path+to+public;DATABASE=sbx;'
||'LABEL=dbman;PORT=5433;PWD=;SERVERNAME=127.0.0.1;UID=dbadmin;'
)
,
-- an index table to CROSS JOIN with ... maybe you need more integers ...
loop_idx(i) AS (
SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 10
)
,
-- the query containing the REGEXP_SUBSTR() call
find_token AS (
SELECT
i -- the index from the in-line index table, needed
-- for ordering the outermost SELECT
, REGEXP_SUBSTR (
s -- the input string
, '(\w+)=' -- the pattern - a word followed by an equal sign; capture the word
, 1 -- start from pos 1
, i -- the i-th occurrence of the match
, '' -- no modifiers to regexp
, 1 -- the first and only sub-pattern captured
) AS token
FROM input CROSS JOIN loop_idx -- the CROSS JOIN with the in-line index table
)
-- the outermost query filtering the non-matches - the empty strings - away...
SELECT
token
FROM find_token
WHERE token <> ''
ORDER BY i
;
The result is one row per match:
token
DRIVER
COLUMNSASCHAR
CONNECTIONLOADBALANCE
CONNSETTINGS
DATABASE
LABEL
PORT
PWD
SERVERNAME
UID
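To check the expected output outside Vertica, the following minimal Python sketch mimics the same idea. It asks for the i-th occurrence of the pattern (\w+)= for i = 1..10, which is the role of the loop_idx table, and then drops the non-matches. The helper `nth_capture` is hypothetical and only imitates the occurrence and captured-subexpression arguments of REGEXP_SUBSTR(). In plain Python, `re.findall` alone returns all matches at once, which is what the /g flag does in Perl or JavaScript.

```python
import re
from typing import Optional

S = ("DRIVER={Vertica};COLUMNSASCHAR=1;CONNECTIONLOADBALANCE=True;"
     "CONNSETTINGS=set+search_path+to+public;DATABASE=sbx;"
     "LABEL=dbman;PORT=5433;PWD=;SERVERNAME=127.0.0.1;UID=dbadmin;")
PATTERN = re.compile(r"(\w+)=")  # a word followed by '='; capture the word


def nth_capture(s: str, occurrence: int, group: int = 1) -> Optional[str]:
    """Hypothetical stand-in for REGEXP_SUBSTR(s, pattern, 1, occurrence, '', group)."""
    for i, m in enumerate(PATTERN.finditer(s), start=1):
        if i == occurrence:
            return m.group(group)
    return None  # there is no i-th match


# "CROSS JOIN" with the indexes 1..10, then filter out the non-matches.
tokens = [t for t in (nth_capture(S, i) for i in range(1, 11)) if t]
print(tokens)
```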
You can do all sorts of things in modern SQL, but you need to stick to SQL and to the relational paradigm, that's all...
Happy playing ...
Marco

Notepad++ complex conditional search regex

I have a database SQL query followed by a bunch of statements that collect statistics. I'd like to search the SQL for specific joins, find all corresponding collect statistics statements, and then strip the extraneous characters from them so that I end up with a useful set of statements.
Input:
select tbd.cola , tba.a, tbx.b,
tbc.r,
tbx.c ,
case when yada ya then tbx.c + xyz else 'daddy' end as nicecol
, tbx.g
from
tbd join tba on tbd.cola = tba.colb
left join
tbx on tbx.colp= tba.colp left join
tbc on tbc.colfff=tbx.colm join......
/*this is followed by a bunch of statements in format */
---- "collect stats column (cola,colbxx)
on tbd ( medium strong )"
---- "collect stats column (colfff) on tbc ( not
strong )"
---- "collect stats column ( colddsdsd) on tbc ( very strong )"
----"collect stats col (yada,secretxxx,xxx) on tbx ( strong ) "
Note that the spacing follows this logic:
(\s*medium|not|very\s*strong\s*)
The same applies to
---- "collect stats column
In other words, the spacing between all the words varies.
There is no consistent spacing pattern, and
the statements arbitrarily span multiple lines or are squeezed onto a single line.
What I'd like to do is:
Search for the column names being joined, e.g. tbd.cola = tba.colb.
Then look for these column names in the collect statistics statements. In our case,
cola, colp, colm and colfff are the join column names, which come from
tbd join tba on tbd.cola = tba.colb
left join
tbx on tbx.colp= tba.colp left join
tbc on tbc.colfff=tbx.colm
We search for these in the collect stats statements, and the following qualify:
---- "collect stats column (cola,colbxx) on tbd ( medium strong )"
---- "collect stats column (colfff) on tbc ( not strong )"
Next, the statements have to be "purified" so that the extraneous characters and text around them are removed. The desired output format is:
collect stats column (cola,colbxx) on tbd;
collect stats column (colfff) on tbc ;
That is, remove the ---- " prefix (pattern [-]+?") and
replace the trailing ( <string with or without spaces, with variable spaces around it> )" part, of the form ( not strong )", with ;
My approach is a multistep process. I managed the third part using
"\s*([^"]+ strong\s*)\)
so that part is done, but I am looking for a conditional select approach here. I need help with the first two.
There is no need to use boundaries to select the collect stats statements: I could select that part with the mouse and then run a regex on the selection only.
The logic would be to:
1. Search for the join\s*tablename.column\s*\=\s*tablename.column pattern (the = is escaped as \=).
2. Collect all matching column names into a buffer.
3. Create boundaries, or physically select the part where the collect statistics statements begin.
4. Run the selected column list through the collect stats statements to see which ones qualify. If there is a column combination like collect stats column (cola,colbxx) and only cola is a join column, that statement is also selected, since one of its columns is a join column.
5. Finally, run the last regex ("\s*([^"]+ strong\s*)\)) on the shortlisted collect statistics statements to rid them of extraneous characters.
We can break this operation into two components. The first is the conditional search: search for the joined column names in the collect statistics area and copy the results into another work area (a new file). Then we run the last step above on that file.
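Outside Notepad++, the steps above could be scripted. The following is a minimal Python sketch, not a Notepad++ solution. It collects the column names from every alias.col = alias.col condition (not only those after ON), keeps the collect stats statements whose column list contains at least one of them, and normalizes each kept statement to the desired output. The sample text is abbreviated from the question. The two regexes, and the statement shape they expect (---- "collect stats col... (cols) on table ( ... )"), are illustrative assumptions based on that sample.

```python
import re
from typing import List

# Abbreviated sample from the question (select list omitted).
SQL_TEXT = """
tbd join tba on tbd.cola = tba.colb
left join
tbx on tbx.colp= tba.colp left join
tbc on tbc.colfff=tbx.colm
---- "collect stats column (cola,colbxx)
on tbd ( medium strong )"
---- "collect stats column (colfff) on tbc ( not
strong )"
---- "collect stats column ( colddsdsd) on tbc ( very strong )"
----"collect stats col (yada,secretxxx,xxx) on tbx ( strong ) "
"""

# Steps 1-2: column names on both sides of "alias.col = alias.col".
JOIN_COND = re.compile(r"\b\w+\.(\w+)\s*=\s*\w+\.(\w+)")
# Step 3: one collect stats statement (keyword, column list, table, trailing "( ... )").
STATS = re.compile(
    r'-+\s*"\s*(collect\s+stats\s+col\w*)\s*\(([^)]*)\)\s*on\s+(\w+)\s*\([^)]*\)\s*"'
)


def qualifying_stats(text: str) -> List[str]:
    join_cols = {c for pair in JOIN_COND.findall(text) for c in pair}
    result: List[str] = []
    for keyword, cols, table in STATS.findall(text):
        col_list = [c.strip() for c in cols.split(",")]
        # Step 4: keep the statement if ANY of its columns is a join column.
        if join_cols.intersection(col_list):
            keyword = " ".join(keyword.split())  # collapse variable spacing
            # Step 5: emit the "purified" form.
            result.append(f"{keyword} ({','.join(col_list)}) on {table};")
    return result


for stmt in qualifying_stats(SQL_TEXT):
    print(stmt)
```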
OK, I found something! It works for the example you gave, but I can't have anticipated every possibility, so let me know whether it works for you.
It uses two substitutions. Make sure you have checked "Regular expression" and the box next to it (saying something like ". matches newline").
First substitution :
Replace this :
join\s+\w+\s+on\s+\w+\.(\w+)\b\s*=\s*\w+\.(\w+)\b(?=.*-+\s+"([^"]+(?:\1|\2)[^"]+)(\s)+\([^)]+\)")|.
By this :
\3\4
Second substitution :
Replace this :
(collect.*?)\s+(on\s\w+)\s
By this :
\1 \2;\n
Explanations
The regex is based on an alternation. The first part is
join\s+\w+\s+on\s+\w+\.(\w+)\b\s*=\s*\w+\.(\w+)\b(?=.*-+\s+"([^"]+(?:\1|\2)[^"]+)(\s)+\([^)]+\)")
join\s+\w+\s+on\s+\w+\.(\w+)\b\s*=\s*\w+\.(\w+)\b matches a string of the form join tbname on tbname.cola = tbname.colb. Note that the spaces around the = are optional, and the names cola and colb are captured for later use.
(?=.*-+\s+"([^"]+(?:\1|\2)[^"]+)(\s)+\([^)]+\)") allows the preceding match only if, later in the file, there is a string like ---- "[...] [cola OR colb] [...] ([...])". In other words, a string beginning with multiple -, then one or more spaces and a ", ending with a pair of () and a ", and containing either cola or colb (or both).
The regex tries to match at each position in the file, and wherever the first part fails, the second part of the alternation, . (any character), matches instead. So in the end the whole file is matched. Characters matched by . are replaced with nothing, because the capturing groups are empty, while each qualifying join clause is replaced, through \3\4, by the collect stats text captured in the lookahead.
The second substitution is just a reformatting of the lines kept.
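The same two substitutions can be tried outside Notepad++. This minimal Python sketch assumes that Python's re module with re.DOTALL behaves like Notepad++ with ". matches newline" checked for these particular patterns (both engines support backreferences inside a lookahead). The sample text is a shortened version of the question's input. Since Python 3.5, re.sub substitutes an empty string for a group that did not participate in the match, so the . branch simply deletes every character it matches.

```python
import re

TEXT = '''tbd join tba on tbd.cola = tba.colb
left join
tbc on tbc.colfff=tbx.colm
---- "collect stats column (cola,colbxx) on tbd ( medium strong )"
---- "collect stats column (colfff) on tbc ( not strong )"
---- "collect stats column ( colddsdsd) on tbc ( very strong )"
'''

# First substitution: a join clause whose columns appear in a later collect stats
# statement is replaced by that statement's text (\3) plus one whitespace char (\4);
# every other character is matched by "." and deleted.
FIRST = (r'join\s+\w+\s+on\s+\w+\.(\w+)\b\s*=\s*\w+\.(\w+)\b'
         r'(?=.*-+\s+"([^"]+(?:\1|\2)[^"]+)(\s)+\([^)]+\)")|.')
# Second substitution: put each kept statement on its own line, ending with ";".
SECOND = r'(collect.*?)\s+(on\s\w+)\s'

step1 = re.sub(FIRST, r'\3\4', TEXT, flags=re.DOTALL)
step2 = re.sub(SECOND, r'\1 \2;\n', step1, flags=re.DOTALL)
print(step2)
```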
Notes
I could do it with a single substitution, but it would be much uglier.
It might seem strange that I had to erase the text that needs to be kept and write it back at the end. The reason is that Notepad++ does not allow variable-length lookbehinds.
Depending on the size of your file, the first substitution might take much longer than it does for the example. I don't know how Notepad++ reacts when it takes too long; it might crash. If so, we will have to split the process into multiple smaller substitutions.

postgres string compare

I am using PostgreSQL 8.3 (Greenplum). I am trying to compare two tables on a single column called col_name, using a partial string comparison on the values in both tables. The values look like xx.yyy.zzz. I want to extract the first part, 'xx', and drop everything after it ('.yyy.zzz'), so that two rows are compared only on the text up to the first period. The xx part can vary in length.
I am using the following logic, but I can't see why it is not working:
select
distinct x.col_name,
x.col_num
from table_A x
left outer join table_b y
on
regexp_matches((x.col_name,'^(?:([^.]+)\.?){1}',1),(y.col_name,'^(?:([^.]+)\.?){1}', 1))
and x.col_num=y.col_num;
I am getting this error:
ERROR: function regexp_matches(record, record) does not exist
LINE 36: regexp_matches((x.col_name,'^(?:([^.]+).?){1}', 1),(y....
^
HINT: No function matches the given name and argument types. You may need to add explicit type casts.
********** Error **********
ERROR: function regexp_matches(record, record) does not exist
SQL state: 42883
Hint: No function matches the given name and argument types. You may need to add explicit type casts.
Character: 917
Can anyone help me out?
Thanks!
You can use the split_part function: split the string into parts using '.' as the delimiter and compare the first components.
See the PostgreSQL documentation on string functions.
So your query would be:
select
distinct x.col_name,
x.col_num
from table_A x
left outer join table_b y
on split_part(x.col_name, '.', 1) = split_part(y.col_name, '.', 1)
and x.col_num=y.col_num;
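Your original query produces an error because you pass invalid arguments to the regexp_matches function.
The signature is regexp_matches(string text, pattern text [, flags text]), but your first argument is (x.col_name,'^(?:([^.]+)\.?){1}',1), which is a row constructor (a record), not a string, and the same applies to the second argument. Even with valid arguments, regexp_matches returns a set of text arrays rather than a boolean, so it could not serve as a join condition on its own.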
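To see what the join key looks like, note that Python's `value.split(".", 1)[0]` yields the same first component as split_part(value, '.', 1) for values like these, including a value with no period (the whole string comes back). The sample rows and both helper functions below are hypothetical. The sketch only mimics the LEFT OUTER JOIN ... ON split_part(...) = split_part(...) AND col_num = col_num condition in memory, and it also returns the matched table_b value so the matching is visible; it is not a PostgreSQL client.

```python
from typing import List, Optional, Tuple

# Hypothetical rows: (col_name, col_num)
TABLE_A = [("ab.yyy.zzz", 1), ("abcd.q.r", 2), ("x.y.z", 3)]
TABLE_B = [("ab.other.part", 1), ("abcd", 2), ("x.y.z", 4)]


def first_part(value: str) -> str:
    """Mimics split_part(value, '.', 1): the text before the first '.', or all of it."""
    return value.split(".", 1)[0]


def left_join(a: List[Tuple[str, int]], b: List[Tuple[str, int]]
              ) -> List[Tuple[str, int, Optional[str]]]:
    """LEFT OUTER JOIN on first_part(col_name) and col_num.

    Returns (a.col_name, a.col_num, b.col_name) tuples.
    """
    out: List[Tuple[str, int, Optional[str]]] = []
    for name_a, num_a in a:
        matches = [name_b for name_b, num_b in b
                   if first_part(name_a) == first_part(name_b) and num_a == num_b]
        # Unmatched table_A rows are still returned, with None standing in for NULL.
        out += [(name_a, num_a, m) for m in matches] or [(name_a, num_a, None)]
    return out


for row in left_join(TABLE_A, TABLE_B):
    print(row)
```

The third row stays unmatched even though both first parts are "x", because the col_num values differ.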

Regex non-capturing parenthesis issue

I have a database query that looks like this:
select * from students join (select * from teachers) join (select * from workers
I need to tokenize this string on 'select'.
I am trying the regex (select)(.*?)((?:select)|$), but it matches only twice.
Any pointers on how to achieve this?
I need these three output tokens:
select * from students join (
select * from teachers) join (
select * from workers
I think this regex will work:
select.*?(?=select|$)
The regex matches the word select, then any text (not including new lines) up until right before the next select or the end of the string.
Demonstration here: http://regex101.com/r/sR3gV1
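Here is a small Python check of both patterns, assuming Python's re behaves like the engine used in the question for these simple patterns. It also shows why the original pattern matches only twice: its third group consumes the next select, so the following match cannot start there, whereas the lookahead (?=select|$) only peeks at it.

```python
import re

QUERY = "select * from students join (select * from teachers) join (select * from workers"

# Original attempt: the trailing group swallows the next "select".
original = re.findall(r"(select)(.*?)((?:select)|$)", QUERY)

# Suggested pattern: the lookahead stops before "select" without consuming it.
tokens = re.findall(r"select.*?(?=select|$)", QUERY)

print(len(original))
for t in tokens:
    print(t)
```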
If you are trying to extract the select queries from the string, you can use this regex, assuming you are not selecting from multiple tables (i.e. not doing select * from x,y,z):
(select.*?from\\s+\\w+)
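The doubled backslashes in this pattern are presumably string-literal escaping (as in a Java string), so the regex the engine sees is (select.*?from\s+\w+). A minimal Python check with a raw string and the same sample query shows that it returns each select ... from table fragment rather than the whole token:

```python
import re

QUERY = "select * from students join (select * from teachers) join (select * from workers"

# Raw string: the regex engine sees \s and \w, i.e. what "\\s" and "\\w" produce in Java.
fragments = re.findall(r"(select.*?from\s+\w+)", QUERY)
print(fragments)
```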

How to compare Unicode characters in SQL server?

I am trying to find all rows in my SQL Server database that contain the character é by running the following queries:
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\xE9]%',question) > 0;
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\u00E9]%',question) > 0;
But I found two problems: (a) the two queries return different numbers of rows, and (b) they return rows that do not contain the specified character.
Is the way I am constructing the regular expression and comparing the Unicode character correct?
EDIT:
The question column is stored using the nvarchar data type.
The following query does give the correct result, though:
SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%';
Why not use SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%'?
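NB: LIKE and PATINDEX do not accept regular expressions.
In the SQL Server pattern syntax, [\xE9] means "match any single character within the specified set", i.e. match \, x, E or 9. Similarly, [\u00E9] matches \, u, 0, E or 9, which is a different set, so the two queries return different counts. Any of the following strings would match the first pattern: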
"Elephant"
"axis"
"99.9"
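To see why both counts are off, here is a minimal Python sketch of how the bracket wildcard inside %[...]% behaves: the pattern matches if the text contains any one of the listed characters. The helper `contains_any` is hypothetical and compares case-sensitively, whereas SQL Server's result also depends on the column's collation.

```python
def contains_any(text: str, bracket_contents: str) -> bool:
    r"""Hypothetical model of PATINDEX(N'%[<bracket_contents>]%', text) > 0.

    Each character between the brackets is a literal member of the set,
    so the contents \xE9 form the set {'\', 'x', 'E', '9'}, not the character 'é'.
    Case-sensitive; SQL Server's behavior also depends on the collation.
    """
    return any(ch in bracket_contents for ch in text)


for word in ["Elephant", "axis", "99.9", "caf\u00e9"]:
    print(word,
          contains_any(word, r"\xE9"),    # like patindex(N'%[\xE9]%', ...)
          contains_any(word, r"\u00E9"),  # like patindex(N'%[\u00E9]%', ...)
          "\u00e9" in word)               # like LIKE N'%é%'
```

"axis" matches the first pattern but not the second, which is one way the counts diverge, while "café" matches neither pattern and is only found by the plain LIKE N'%é%' test.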