PostgreSQL substring using regex

I need to run a PostgreSQL query to get names from a database, and I need to sort these names alphabetically.
The names that I am getting from the database are as follows:
(123) Jone Lee
(22) Hans Hee
2 Dean Alloni
Alen Khan
I need the output to be:
Alen Khan
2 Dean Alloni
(22) Hans Hee
(123) Jone Lee
I tried the following PostgreSQL queries:
select name from table order by substring(name, E'\\W+\ +(.*)');
select name from table order by substring(name, E'\\(?\\w+?\\)?\ +?(.*)');
My problem is that if the name is Alen Khan, it only returns Khan, so I get:
Khan
Dean Alloni
Hans Hee
Jone Lee
Any help would be appreciated.
Kind regards

select name
from table
order by substring(name, E'[a-zA-Z]+')
Edit, as per the OP's comment:
select name
from table order by regexp_replace(name, '[^a-zA-Z]', '', 'g')
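The effect of ordering by regexp_replace(name, '[^a-zA-Z]', '', 'g') can be checked outside the database. The following is a minimal Python sketch, not PostgreSQL code: the helper name letters_only_key is made up for illustration, and Python's default string comparison stands in for the database collation, which may order mixed-case strings differently.

```python
import re

def letters_only_key(name: str) -> str:
    # Mirrors regexp_replace(name, '[^a-zA-Z]', '', 'g'):
    # every character that is not an ASCII letter is removed.
    return re.sub(r"[^a-zA-Z]", "", name)

# Sample names taken from the question.
names = ["(123) Jone Lee", "(22) Hans Hee", "2 Dean Alloni", "Alen Khan"]

# Sort by the stripped key, but keep the original strings in the result.
ordered = sorted(names, key=letters_only_key)
print(ordered)
```

Because only the sort key is stripped, the selected name column still shows the original text, numbers and parentheses included.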

This will sort by the string's last word:
select name from table
order by (string_to_array(trim(name),' '))[ array_upper(string_to_array(trim(name),' '),1) ]

Related

Power BI How to remove duplicate rows?

In my report view, I have a table where each row appears twice, once for each available position. I want to show only one row per employee, with their latest position. How can I accomplish this?
Name        Project       Date        Position
John Smith  PowerProject  01-01-2021  Engineer
John Smith  PowerProject  01-01-2021  Senior Engineer
Sort on the date. Group on the name. Choose All Rows as the function, change the code from _ to Table.Last(_), then expand.
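The sort, group, keep-last sequence can be illustrated outside Power Query. This is a rough Python sketch of the same idea, not M code: the function name latest_per_name is hypothetical, and the DD-MM-YYYY date format is an assumption, since the sample dates do not reveal the order of day and month.

```python
from datetime import datetime

# Sample rows taken from the question.
rows = [
    {"Name": "John Smith", "Project": "PowerProject", "Date": "01-01-2021", "Position": "Engineer"},
    {"Name": "John Smith", "Project": "PowerProject", "Date": "01-01-2021", "Position": "Senior Engineer"},
]

def latest_per_name(rows):
    # Step 1: sort on the date (assumed DD-MM-YYYY). sorted() is stable,
    # so rows with equal dates keep their original relative order.
    by_date = sorted(rows, key=lambda r: datetime.strptime(r["Date"], "%d-%m-%Y"))
    # Step 2: group on the name and keep only the last row seen per group,
    # which plays the role of Table.Last(_) on each group.
    last = {}
    for row in by_date:
        last[row["Name"]] = row
    return list(last.values())

result = latest_per_name(rows)
print(result)
```

When two rows share the same date, as in the sample, the one that comes later in the input wins, so the source order matters for ties.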

Group by similar strings in SAS within a column

I have the following table:
Name
----
John Smith
John Smth
Jane Lee
Jane Line
Timothy Brown
Timmothy Brown
Agnes James
Aaron James
Using SAS, how can I group these strings on a large scale to identify those that are similar, so that I can get this table:
Name
----
John Smith
John Smth
Timothy Brown
Timmothy Brown
There are many ways in SAS to perform comparisons of strings. A simple example is using SOUNDEX to find two strings that sound alike.
data have;
input Name $char20.;
datalines;
John Smith
John Smth
Jane Lee
Jane Line
Timothy Brown
Timmothy Brown
Agnes James
Aaron James
;
proc sql;
create table want as
select
A.name
, B.name as name2
, soundex(A.name) as sxname
, soundex(B.name) as sxname2
from have a
cross join
have b
where a.name lt b.name
having sxname = sxname2
;
quit;
Other techniques would use a matching criterion based on a metric such as Levenshtein edit distance, which can be computed with COMPLEV. You can also learn more about SPEDIS.
Search for "How to perform a fuzzy match using SAS functions" and you will find plenty to chew on. Keep an eye out for papers by Charles Patridge.
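To make the edit-distance idea concrete, here is a small Python sketch of plain Levenshtein distance applied to the names from the question. It is an illustration of the metric, not SAS code: the threshold MAX_DISTANCE is a hypothetical value chosen for this sample, and real data would likely need tuning or a length-relative cutoff.

```python
from itertools import combinations

def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance where an insertion,
    # a deletion, and a substitution each cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

names = [
    "John Smith", "John Smth", "Jane Lee", "Jane Line",
    "Timothy Brown", "Timmothy Brown", "Agnes James", "Aaron James",
]

MAX_DISTANCE = 1  # hypothetical threshold for "similar"

# Compare every unordered pair once, like the cross join with a.name < b.name.
similar_pairs = [
    (a, b) for a, b in combinations(names, 2)
    if levenshtein(a, b) <= MAX_DISTANCE
]
print(similar_pairs)
```

Comparing all pairs grows quadratically with the number of names, so on a large table it usually helps to restrict comparisons to candidates that share a cheap key first.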

How to join two tables in informatica without using JOINER transformation?

If we use a Joiner, it takes too much time.
We have table A and flat file B. Table A has the following fields: NAME, DEPT, SALARY.
File B has the following fields: NAME and DEPT. We have to match NAME between table A and file B, and update the DEPT field in file B based on the value of DEPT in table A.
Table A
NAME DEPT SALARY
John WSS 10000
Micheal LSS 50000
Flat File B
NAME DEPT
JOHN
JOHN
Micheal
Micheal
Output (after update), Table B
NAME DEPT
JOHN WSS
JOHN WSS
Micheal LSS
Micheal LSS
There are some ways to improve performance in your case:
If both of your sources are located in the same database, implement the join inside the Source Qualifier. That is the most effective way.
If you want to use a Joiner transformation, verify that the smallest input (the smallest table) is marked as Master.
It is also worth sorting the input and checking the "Sorted Input" option in your Joiner transformation.
First, import your flat file B as a source:
Flat File B
NAME DEPT
JOHN
JOHN
Micheal
Micheal
Then use a Lookup transformation on table A:
Table A
NAME DEPT SALARY
John WSS 10000
Micheal LSS 50000
Drag the NAME column from the source into the Lookup transformation
and set the lookup condition
so that the table A name equals the flat file name (NAME = NAME).
Then drag NAME and DEPT into an Expression transformation,
and from there into the target.
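Conceptually, the lookup approach builds a keyed cache from table A once and then probes it for every row of the flat file. The Python sketch below shows that idea only; it is not Informatica configuration. The uppercase normalization is an assumption added because the sample data has "John" in the table and "JOHN" in the file, and the variable names are made up for illustration.

```python
# Sample data taken from the question.
table_a = [
    {"NAME": "John", "DEPT": "WSS", "SALARY": 10000},
    {"NAME": "Micheal", "DEPT": "LSS", "SALARY": 50000},
]
file_b = [
    {"NAME": "JOHN", "DEPT": ""},
    {"NAME": "JOHN", "DEPT": ""},
    {"NAME": "Micheal", "DEPT": ""},
    {"NAME": "Micheal", "DEPT": ""},
]

# Build the lookup cache once: name -> department.
# Assumption: names should match case-insensitively.
dept_by_name = {row["NAME"].upper(): row["DEPT"] for row in table_a}

# Probe the cache for each file row; keep the existing DEPT when no match is found.
output = [
    {"NAME": row["NAME"], "DEPT": dept_by_name.get(row["NAME"].upper(), row["DEPT"])}
    for row in file_b
]
for row in output:
    print(row["NAME"], row["DEPT"])
```

Each file row costs one dictionary probe instead of a comparison against every table row, which is the intuition behind preferring a cached lookup when one side is small.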

Rearranging the order of the text in a character string in SAS?

I have a data set with a character variable called "name". It contains the full name of a person like this:
"firstname middlename lastname".
I want to have the data rearranged so that it becomes:
"lastname, firstname middlename".
I'm not that hardcore in SAS functions, but I have used some of the few I know.
(My code can be seen below).
In the first try (test2) I don't get the result I want - I get:
"lastName , firstName middleName" and not
"lastName, firstName middleName" - my problem is the comma.
So I thought that I would solve my problem by making a new last name variable containing the comma at the end (in test2_new). But I don't get what I want: SAS puts three dots at the end, not a comma.
I hope a person with more SAS skills than me can answer my question.
Kind Regards
Maria
data have ;
input #1 text & $64. ;
datalines ;
Susan Smith
David A Jameson
Bruce Thomas Forsyth
;
run ;
data want ;
set have ;
lastname = scan(text,-1,' ') ;
firstnames = substr(text,1,length(text)-length(lastname)) ;
newname = catx(', ',lastname,firstnames) ;
run ;
Which gives
text                  lastname  firstnames    newname
Susan Smith           Smith     Susan         Smith, Susan
David A Jameson       Jameson   David A       Jameson, David A
Bruce Thomas Forsyth  Forsyth   Bruce Thomas  Forsyth, Bruce Thomas
Perl regular expressions are a useful tool here, particularly PRXCHANGE. The SAS Support website provides a good example of how to reverse first and last name; here is a slight modification of that code. I've only catered for people with either 2 or 3 names, but it should be fairly simple to expand this if necessary. My code is based on the HAVE dataset created in the answer from @Chris J.
data want;
set have;
if countw(text)=2 then text = prxchange('s/(\w+) (\w+)/$2, $1/', -1, text);
else if countw(text)=3 then text = prxchange('s/(\w+) (\w+) (\w+)/$3, $1 $2/', -1, text);
run;
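For comparison, the same rearrangement can be written without a fixed limit on the number of names by splitting off the last word. This is a minimal Python sketch of the logic rather than SAS code; the function name last_name_first is hypothetical, and it assumes the last whitespace-separated word is always the surname.

```python
def last_name_first(full_name: str) -> str:
    # Split on any run of whitespace, which also drops leading/trailing blanks.
    parts = full_name.split()
    if len(parts) < 2:
        # A single word (or an empty string) has nothing to rearrange.
        return full_name.strip()
    # Last word becomes the surname; everything before it stays in order.
    return f"{parts[-1]}, {' '.join(parts[:-1])}"

# Sample names taken from the HAVE dataset above.
for name in ["Susan Smith", "David A Jameson", "Bruce Thomas Forsyth"]:
    print(last_name_first(name))
```

Joining the pieces after trimming is what avoids the stray blank before the comma that the question describes.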

SAS: Remove observations from data set if they match an observation in another data set

I'm just learning SAS. This is a pretty simple question -- I'm probably overthinking it.
I have a data set called people_info and one of the variables is SocialSecurityNum. I have another table called invalid_ssn with a single variable: unique and invalid SocialSecurityNum observations.
I would like to have a DATA step (or PROC SQL step) that outputs to invalid_people_info if the SocialSecurityNum of the person (observation) matches one of the values in the invalid_ssn table. Otherwise, it will output back to people_info.
What's the best way to do this?
Edit: More info, to clarify...
people_info looks like this:
name SocialSecurityNum
joe 123
john 456
mary 876
bob 657
invalid_ssn looks like this:
SocialSecurityNum
456
876
What I want is for people_info to change (in place) and look like this:
name SocialSecurityNum
joe 123
bob 657
and a new table, called invalid_people_info to look like this:
name SocialSecurityNum
john 456
mary 876
The data step shown by Hong Ooi is great, but you could also do this with proc sql without the need to sort first and also without actually doing a full merge.
proc sql noprint;
create table invalid_people_info as
select *
from people_info
where socialsecuritynum in (select distinct socialsecuritynum from invalid_ssn)
;
create table people_info as
select *
from people_info
where socialsecuritynum not in (select distinct socialsecuritynum from invalid_ssn)
;
quit;
This simply selects all rows where ssn is (not) in the distinct list of invalid ssn's.
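The selection logic is a plain membership test, which the following Python sketch reproduces on the sample data from the question. It is an illustration only, not SAS code, and representing the tables as a list of tuples and a set is a simplification.

```python
# Sample data taken from the question: (name, SocialSecurityNum).
people_info = [("joe", 123), ("john", 456), ("mary", 876), ("bob", 657)]

# A set gives the same effect as "select distinct" plus fast membership tests.
invalid_ssn = {456, 876}

# Rows whose SSN is in the invalid list go to the new table...
invalid_people_info = [p for p in people_info if p[1] in invalid_ssn]
# ...and people_info is rebuilt from the rows whose SSN is not in it.
people_info = [p for p in people_info if p[1] not in invalid_ssn]

print(invalid_people_info)
print(people_info)
```

As in the two SQL statements, the invalid rows are extracted before people_info is overwritten; reversing the order would lose them.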
Your requirement isn't clear. Do you want to remove all the invalid SSNs from people_info and put them into a new dataset? If so, this should work. You'll have to sort your datasets by SocialSecurityNum first.
data people_info invalid_people_info;
merge people_info (in=a) invalid_ssn (in=b);
by SocialSecurityNum;
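/* Only route rows that exist in people_info; an SSN found only in invalid_ssn is skipped. */
if a and b then output invalid_people_info;
else if a then output people_info;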
run;