PostgreSQL regex - split column to array


I have a table music:
author               | music
---------------------+-------
Kevin Clein          | a
Gucio G. Gustawo     | b
R. R. Andrzej        | c
John McKnight Burman | d
How can I split a column that contains two different separators (space and dot), and how can I split the name and surname correctly to get a result like this:
author               | name     | surname
---------------------+----------+-----------------
Kevin Clein          | Kevin    | Clein
Gucio G. Gustawo     | Gucio G. | Gustawo
R. R. Andrzej        | R. R.    | Andrzej
John McKnight Burman | John     | McKnight Burman
I have tried something like that so far:
WITH ad AS (
    SELECT author,
           s[1] AS name,
           s[2] AS surname
    FROM (SELECT music.*,
                 regexp_split_to_array(music.author, E'\\s[.]') AS s
          FROM music) t
)
SELECT * FROM ad;

I've created a possible solution for you. Be aware that it may not cover every case, and that you will need an extra table to handle the rules problem. By "rule" I mean what I said in the comments:
how to decide which part is the name and which is the surname.
So, to solve your problem, I created another table listing the names that should be treated as surnames.
The test case scenario:
create table surname (
    id SERIAL NOT NULL primary key,
    sample varchar(100)
);
--Test case inserts
insert into surname (sample) values ('McKnight'), ('McGregory'), ('Willian'), ('Knight');
create table music (
    id SERIAL NOT NULL primary key,
    author varchar(100)
);
insert into music (author) values
    ('Kevin Clein'),
    ('Gucio G. Gustawo'),
    ('R. R. Andrzej'),
    ('John McKnight Burman'),
    ('John Willian Smith'),
    ('John Williame Smith');
And my proposed solution:
select author,
       trim(replace(author, surname, '')) as name,
       surname
from (
    select author,
           case when position(s.sample in m.author) > 0
                then (regexp_split_to_array(m.author, '\s(?=' || s.sample || ')'))[2]::text
                else trim(substring(author from '\s\w+$'))
           end as surname
    from music m left join surname s
         on m.author like '%' || s.sample || '%'
    where case when position(s.sample in m.author) > 0
               then (regexp_split_to_array(m.author, '\s(?=' || s.sample || ')'))[2]::text
               else trim(substring(author from '\s\w+$')) end is not null
) as x
The output will be:
AUTHOR                 NAME            SURNAME
------------------------------------------------------------
Kevin Clein            Kevin           Clein
Gucio G. Gustawo       Gucio G.        Gustawo
R. R. Andrzej          R. R.           Andrzej
John McKnight Burman   John            McKnight Burman
John Willian Smith     John            Willian Smith
John Williame Smith    John Williame   Smith
See it working here: http://sqlfiddle.com/#!15/c583f/2
In the surname table, insert every name that should be treated as a surname.
You may want to wrap the query that evaluates the case expression in a sub-query, so that the where clause can use the resulting column instead of repeating the whole case expression.
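For readers who want to check the rule outside the database, here is a minimal Python sketch of the same idea: split at the whitespace in front of a known surname, otherwise treat the last word as the surname. It is not part of the original answer. KNOWN_SURNAMES and split_author are hypothetical names, and unlike the SQL join the sketch stops at the first matching surname.

```python
import re

# Hypothetical stand-in for the rows of the `surname` table in the answer.
KNOWN_SURNAMES = ["McKnight", "McGregory", "Willian", "Knight"]


def split_author(author, known_surnames=KNOWN_SURNAMES):
    """Return (name, surname) for one author string."""
    for sample in known_surnames:
        # Split at the whitespace directly before a known surname,
        # like '\s(?=' || sample || ')' in the SQL above.
        parts = re.split(r"\s(?=" + re.escape(sample) + ")", author, maxsplit=1)
        if len(parts) == 2:
            return parts[0].strip(), parts[1].strip()
    # Fallback: the last space-separated word is the surname.
    name, _, surname = author.rpartition(" ")
    return name.strip(), surname.strip()
```

Calling split_author('John McKnight Burman') gives ('John', 'McKnight Burman'), while 'John Williame Smith' falls through to the last-word rule because 'Williame' does not contain 'Willian'.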

Related

Need to delete a substring in one column if equal to string in another

I have a table with street and city fields; however, most street values also have the city appended at the end. I have identified those records, but now I would like to delete the city from the street. Here is the query I used to identify the bad records. I am just learning SQL and would appreciate any help.
SELECT *
FROM mytable
WHERE Mailing_Street like CONCAT('%',SUBSTRING(mailing_city,1,Len(mailing_city)))
I am looking for
street
123 Main St Anytown
to be updated to
street
123 Main St
where city = Anytown

Group by similar strings in SAS within a column

I have the following table:
Name
----
John Smith
John Smth
Jane Lee
Jane Line
Timothy Brown
Timmothy Brown
Agnes James
Aaron James
Using SAS, how can I group these strings on a large scale to identify those that are similar, so that I can get this table:
Name
----
John Smith
John Smth
Timothy Brown
Timmothy Brown

There are many ways in SAS to perform comparisons of strings. A simple example is using SOUNDEX to find two strings that sound alike.
data have;
  input Name $char20.;
datalines;
John Smith
John Smth
Jane Lee
Jane Line
Timothy Brown
Timmothy Brown
Agnes James
Aaron James
;
proc sql;
  create table want as
  select
    A.name
  , B.name as name2
  , soundex(A.name) as sxname
  , soundex(B.name) as sxname2
  from have a
  cross join
    have b
  where a.name lt b.name
  having sxname = sxname2
  ;
quit;
Other techniques would use a matching criterion based on a metric such as Levenshtein edit distance, which can be computed with COMPLEV. You can also learn more about SPEDIS.
Search for "How to perform a fuzzy match using SAS functions" and you will find plenty to chew on. Keep an eye out for papers by Charles Patridge.
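To make the edit-distance idea concrete, here is a minimal Python sketch of plain Levenshtein distance used to pair up similar names. It is only an illustration of the metric, not a reimplementation of COMPLEV or SPEDIS. The function names and the max_distance=1 threshold are assumptions chosen for this sample data.

```python
from itertools import combinations


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similar_pairs(names, max_distance=1):
    """Return every pair of names whose edit distance is within the threshold."""
    return [(a, b) for a, b in combinations(names, 2)
            if levenshtein(a, b) <= max_distance]
```

With the eight names from the question and max_distance=1, this keeps only the John Smith / John Smth and Timothy Brown / Timmothy Brown pairs. Comparing every pair is quadratic in the number of names, so a large table would need some form of blocking first.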

Sitecore query syntax: Select all female descendents whose parent isn't called Jack

Given a Sitecore content tree of Males and Females (each sex with own template) representing a family tree, how would I select all Female descendents of an item where the parent wasn't called Jack using Sitecore query?
Context: My context item is one of Bob's children. My query shouldn't return Bob himself. Bob also has hundreds of brothers with thousands of descendants that I really don't want appearing in my results.
Bob
Sarah
Jim
Julie
John
Sue
Jack
Anne
Jack
Claire
Mary
The query should return: Sarah, Julie, Sue and Mary but not Anne or Claire.
I can select all female descendents of Bob with:
..//*[##templateid='{insert female template id here}']
But how do I add the parent name != Jack clause?

If you had a "family root" node that did not represent a person by itself, you could do this:
/path/to/family root//*[##name != 'Jack']/*[##templateid = '{template id}']
In your case, you want only a certain person's descendants to be returned. The person themselves should not be included in the result set. In that case, the approach from your comment is the way to go:
..//*[../##name != 'Jack' AND ##templateid = '{template id}']
The results of both queries will include Mary since her direct parent is not called Jack.

How to join two tables in informatica without using JOINER transformation?

If we use a Joiner, it takes too much time.
We have table A and flat file B. Table A has the fields NAME, DEPT, and SALARY.
File B has the fields NAME and DEPT. We have to match NAME between table A and file B, and update the DEPT field in file B based on the DEPT value in table A.
Table A
NAME     DEPT  SALARY
John     WSS   10000
Micheal  LSS   50000
Flat File B
NAME     DEPT
JOHN
JOHN
Micheal
Micheal
Output (after update) Table B
NAME     DEPT
JOHN     WSS
JOHN     WSS
Micheal  LSS
Micheal  LSS

There are some ways to improve performance in your case:
If both of your tables are located in the same database, you should implement the join inside the Source Qualifier. It's the most effective way.
If you want to use a Joiner transformation, you have to verify that the smallest input (the smallest table) is marked as Master.
It's also worth sorting the input and checking the "Sorted Input" option in your Joiner transformation.

First, import your flat file B as a source:
Flat File B
NAME     DEPT
JOHN
JOHN
Micheal
Micheal
Then use a Lookup transformation on table A:
Table A
NAME     DEPT  SALARY
John     WSS   10000
Micheal  LSS   50000
Drag the NAME column from the source into the Lookup transformation
and set the lookup condition
so that the table A NAME equals the flat file NAME (NAME = NAME).
Then drag NAME and DEPT into an Expression transformation,
and connect that to the target.
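Conceptually, the lookup described above is a keyed lookup: build a cache of table A once, then probe it for each row of file B. The Python sketch below only illustrates that data flow and is not Informatica code. The fill_dept name is hypothetical, and the upper-casing of NAME is an assumption made because the sample data has John in the table and JOHN in the file.

```python
def fill_dept(file_rows, table_rows):
    """Fill DEPT in each flat-file row from the matching table row."""
    # Build the lookup cache once, keyed on the upper-cased NAME
    # (assumption: the match should ignore case, as in John vs JOHN).
    lookup = {row["NAME"].upper(): row["DEPT"] for row in table_rows}
    return [
        # Keep the existing DEPT value when the name is not in the cache.
        {"NAME": row["NAME"],
         "DEPT": lookup.get(row["NAME"].upper(), row.get("DEPT"))}
        for row in file_rows
    ]
```

Because the cache is built once and each probe is a hash lookup, the cost grows with the number of file rows rather than with the product of the two inputs.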

Rearranging the order of the text in a character string in SAS?

I have a data set with a character variable called "name". It contains the full name of a person like this:
"firstname middlename lastname".
I want to have the data rearranged so that it becomes:
"lastname, firstname middlename".
I'm not that hardcore in SAS functions, but I have used some of the few I know.
(My code can be seen below).
In the first try (test2) I don't get the result I want. I get:
"lastName , firstName middleName" and not
"lastName, firstName middleName". My problem is the space before the comma.
So I thought I would solve my problem by making a new last name variable containing the comma at the end (in test2_new). But I still don't get what I want: SAS puts three dots at the end instead of a comma.
I hope someone with more SAS skills than me can answer my question.
Kind Regards
Maria

data have ;
  input #1 text & $64. ;
datalines ;
Susan Smith
David A Jameson
Bruce Thomas Forsyth
;
run ;
data want ;
  set have ;
  lastname = scan(text,-1,' ') ;
  firstnames = substr(text,1,length(text)-length(lastname)) ;
  newname = catx(', ',lastname,firstnames) ;
run ;
Which gives
text                  lastname  firstnames    newname
Susan Smith           Smith     Susan         Smith, Susan
David A Jameson       Jameson   David A       Jameson, David A
Bruce Thomas Forsyth  Forsyth   Bruce Thomas  Forsyth, Bruce Thomas

Perl regular expressions are a useful tool here, particularly PRXCHANGE. The SAS Support website provides a good example of how to reverse first and last names; here is a slight modification of that code. I've only catered for people with either 2 or 3 names, but it should be fairly simple to expand this if necessary. My code is based on the HAVE dataset created in the answer from @Chris J.
data want;
  set have;
  if countw(text)=2 then text = prxchange('s/(\w+) (\w+)/$2, $1/', -1, text);
  else if countw(text)=3 then text = prxchange('s/(\w+) (\w+) (\w+)/$3, $1 $2/', -1, text);
run;
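The same capture-group substitution can be tried outside SAS. Below is a minimal Python sketch using re.sub with the two patterns from the data step above. The last_name_first name is hypothetical. Note that str.split counts whitespace-separated words only, whereas COUNTW with its default delimiters also splits on some punctuation.

```python
import re


def last_name_first(full_name):
    """Rewrite 'first [middle] last' as 'last, first [middle]'."""
    words = full_name.split()
    if len(words) == 2:
        # Two names: swap them around a comma.
        return re.sub(r"(\w+) (\w+)", r"\2, \1", full_name)
    if len(words) == 3:
        # Three names: move the last one to the front.
        return re.sub(r"(\w+) (\w+) (\w+)", r"\3, \1 \2", full_name)
    # Any other word count is left unchanged, as in the SAS step.
    return full_name
```

For example, last_name_first('David A Jameson') returns 'Jameson, David A'.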
