Renaming columns of dataframe with values from another dataframe - python-2.7

I have a dataframe that roughly looks like this:
name1 name2 name3
123 456 678
123 456 678
123 456 678
and another dataframe that looks like this:
name2 abc
name3 cdf
name1 fgh
Is there any way I can make the first dataframe column names like this:
fgh abc cdf
123 456 678
123 456 678
123 456 678
Thanks.

Use rename with a Series as the mapping, built with set_index so that column A becomes the index:
print (df2)
A B
0 name2 abc
1 name3 cdf
2 name1 fgh
df1 = df1.rename(columns=df2.set_index('A')['B'])
print (df1)
fgh abc cdf
0 123 456 678
1 123 456 678
2 123 456 678
Detail:
print (df2.set_index('A')['B'])
A
name2 abc
name3 cdf
name1 fgh
Name: B, dtype: object
Or use a dictionary created with zip:
df1 = df1.rename(columns=dict(zip(df2.A, df2.B)))
Detail:
print (dict(zip(df2.A, df2.B)))
{'name3': 'cdf', 'name1': 'fgh', 'name2': 'abc'}
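A minimal, self-contained sketch of both approaches, assuming the frames are called df1 and df2 and the lookup columns are A and B as above. The extra column name4 is hypothetical and was added only to show that rename leaves a column unchanged when the mapping has no entry for it.

```python
import pandas as pd

# Sample data from the question; 'name4' is a hypothetical extra column
# that has no entry in the lookup table.
df1 = pd.DataFrame({"name1": [123] * 3, "name2": [456] * 3,
                    "name3": [678] * 3, "name4": [0] * 3})
df2 = pd.DataFrame({"A": ["name2", "name3", "name1"],
                    "B": ["abc", "cdf", "fgh"]})

# Mapping as a Series (index = old name, value = new name)
by_series = df1.rename(columns=df2.set_index("A")["B"])

# Mapping as a plain dict
by_dict = df1.rename(columns=dict(zip(df2["A"], df2["B"])))

print(list(by_series.columns))
print(list(by_dict.columns))
```

Both calls return a new frame, so the result has to be assigned back to df1 if the original should be replaced.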

You can use Series.get and assign the result back (here s is assumed to be a Series mapping the old names to the new ones):
df.columns=s.get(df.columns)
df
Out[223]:
s1 fgh abc cdf
0 123 456 678
1 123 456 678
2 123 456 678

Summation for more than once of the dataset using proc sql

My data
data mydata;
input
Category $
Item
type
amount;
datalines;
A 1 100 11111
A 2 900 11111
A 3 123 11111
B 1 113 11111
B 2 900 11111
C 1 111 11111
C 2 900 11111
;
My attempt
proc sql;
create table want as
select *, sum(amount and item <> 900) as without900, sum(amount) as total from mydata
group by category
;
quit;
Result
Category Item type amount without900 total
A 3 123 11111 3 33333
A 1 100 11111 3 33333
A 2 900 11111 3 33333
B 2 900 11111 2 22222
B 1 113 11111 2 22222
C 2 900 11111 2 22222
C 1 111 11111 2 22222
Expected result
Category Item type amount without900 total
A 3 123 11111 22222 33333
A 1 100 11111 22222 33333
A 2 900 11111 22222 33333
B 2 900 11111 11111 22222
B 1 113 11111 11111 22222
C 2 900 11111 11111 22222
C 1 111 11111 11111 22222
I know this can easily be achieved by creating another table and then using a left join. I wonder how to achieve the expected result using as few PROC SQL steps as possible. Thank you very much.
You are comparing item to 900 when you should be comparing type. In addition, amount and item <> 900 is a logical expression that evaluates to 0 or 1, so summing it counts rows instead of adding amounts. The conditional sum can be accomplished with a CASE expression inside SUM.
Example
data mydata;
input Category $ Item type amount;
datalines;
A 1 100 11111
A 2 900 11111
A 3 123 11111
B 1 113 11111
B 2 900 11111
C 1 111 11111
C 2 900 11111
;
proc sql;
create table want as
select
*
, sum(case when type ne 900 then amount end) as without900
, sum(amount) as total
from
mydata
group by
category
;
quit;
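The same conditional sum can be checked outside SAS. The sketch below uses Python's built-in sqlite3 module, which is an assumption of this example and not part of the answer. SQLite does not remerge group statistics onto the detail rows the way PROC SQL does with select * and group by, so the per-category sums are computed in a subquery and joined back explicitly.

```python
import sqlite3

rows = [
    ("A", 1, 100, 11111), ("A", 2, 900, 11111), ("A", 3, 123, 11111),
    ("B", 1, 113, 11111), ("B", 2, 900, 11111),
    ("C", 1, 111, 11111), ("C", 2, 900, 11111),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE mydata (category TEXT, item INTEGER, type INTEGER, amount INTEGER)"
)
con.executemany("INSERT INTO mydata VALUES (?, ?, ?, ?)", rows)

# CASE returns NULL for type 900 rows, and SUM ignores NULLs,
# so only the remaining amounts are added up.
query = """
SELECT m.category, m.item, m.type, m.amount, g.without900, g.total
FROM mydata AS m
JOIN (
    SELECT category,
           SUM(CASE WHEN type <> 900 THEN amount END) AS without900,
           SUM(amount) AS total
    FROM mydata
    GROUP BY category
) AS g ON g.category = m.category
ORDER BY m.category, m.item
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
con.close()
```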

Selecting first group of rows in case second group is duplicate

I have data like below (Dataset name - Have)
ID NAME AMOUNT PREFER
ABC Test1 123 Pref1
ABC Test1 456 Pref1
ABC Test1 789 Pref1
ABC Test1 123 Pref2
ABC Test1 456 Pref2
ABC Test1 789 Pref2
and I want only the first group as output:
ID NAME AMOUNT PREFER
ABC Test1 123 Pref1
ABC Test1 456 Pref1
ABC Test1 789 Pref1
What I have tried so far is a simple data step like this:
Data want;
set have;
by ID PREFER;
if first.PREFER;
run;
This will give me
ID NAME AMOUNT PREFER
ABC Test1 123 Pref1
ABC Test1 123 Pref2
Please suggest something in Data Step or Proc SQL
It sounds as though you probably want something like this:
data have;
input ID $ NAME $ AMOUNT PREFER $;
cards;
ABC Test1 123 Pref1
ABC Test1 456 Pref1
ABC Test1 789 Pref1
ABC Test1 123 Pref2
ABC Test1 456 Pref2
ABC Test1 789 Pref2
run;
data want;
set have;
by id;
retain t_prefer;
if first.id then t_prefer = prefer;
if prefer = t_prefer;
drop t_prefer;
run;
The trick is to use a retain statement so that a copy of the value of prefer from the first row per id is carried over between iterations of the data step, and you can then output only rows with that value of prefer.
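The retain logic can be illustrated outside SAS with a short Python sketch. The tuple layout and the function name keep_first_prefer are assumptions made for this illustration; like the BY statement, it expects the rows to be ordered by ID already.

```python
from itertools import groupby

# Rows from the question as (ID, NAME, AMOUNT, PREFER) tuples
have = [
    ("ABC", "Test1", 123, "Pref1"),
    ("ABC", "Test1", 456, "Pref1"),
    ("ABC", "Test1", 789, "Pref1"),
    ("ABC", "Test1", 123, "Pref2"),
    ("ABC", "Test1", 456, "Pref2"),
    ("ABC", "Test1", 789, "Pref2"),
]

def keep_first_prefer(rows):
    """Keep rows whose PREFER equals the PREFER of the first row of that ID."""
    kept = []
    for _id, group in groupby(rows, key=lambda r: r[0]):  # like BY id
        group = list(group)
        t_prefer = group[0][3]  # value carried over, like the retained variable
        kept.extend(r for r in group if r[3] == t_prefer)
    return kept

want = keep_first_prefer(have)
for row in want:
    print(row)
```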
You could just keep track of which group number the current record is in.
data want;
set have;
by ID PREFER;
if first.id then group=0;
group+first.prefer;
if group=1;
run;

Add seq number by group SAS

I need to assign a sequence number by group. My attempt numbers the rows within a single group (1, 2, 3, etc.), but I need the numbering to depend on two grouping variables, as in the example below:
Have:
Var1 Var2 Var3
101 aaa 202
101 aaa 202
101 bbb 203
101 ccc 206
101 ddd 207
102 aaa 222
102 aaa 222
102 bbb 223
Want:
Obs var1 var2 var3 seq
1 101 aaa 202 1
2 101 aaa 202 1
3 101 bbb 203 2
4 101 ccc 206 3
5 101 ddd 207 4
6 102 aaa 222 1
7 102 aaa 222 1
8 102 bbb 223 2
If you sort your data it is quite simple:
proc sort data=sashelp.class out=class;
by sex age;
run;
data class;
set class;
by sex age;
if first.sex then
seqn = 0;
if first.age then
seqn + 1;
run;
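The answer demonstrates the idea on sashelp.class. As a cross-check, here is a sketch of the same first-dot logic in Python, applied to the data from the question with var1 and var2 playing the roles of sex and age. The function name add_seq is hypothetical, and the rows are assumed to be sorted by var1 and var2 already.

```python
# Rows from the question as (var1, var2, var3) tuples
have = [
    (101, "aaa", 202), (101, "aaa", 202), (101, "bbb", 203),
    (101, "ccc", 206), (101, "ddd", 207),
    (102, "aaa", 222), (102, "aaa", 222), (102, "bbb", 223),
]

def add_seq(rows):
    """Append a counter that restarts per var1 and increases per new var2."""
    out = []
    prev_var1 = prev_var2 = None
    seq = 0
    for var1, var2, var3 in rows:
        first_var1 = var1 != prev_var1
        first_var2 = first_var1 or var2 != prev_var2
        if first_var1:   # like: if first.sex then seqn = 0;
            seq = 0
        if first_var2:   # like: if first.age then seqn + 1;
            seq += 1
        out.append((var1, var2, var3, seq))
        prev_var1, prev_var2 = var1, var2
    return out

want = add_seq(have)
for row in want:
    print(row)
```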

SAS retrieving values from row

I want to retrieve row-level values (the loans associated with each account number) from a SAS table.
Please find an example below.
Input
Account Number Loans
123 abc, def, ghi
456 jkl, mnopqr, stuv
789 w, xyz
Output
Account Numbers Loans
123 abc
123 def
123 ghi
456 jkl
456 mnopqr
456 stuv
789 w
789 xyz
Loans are separated by commas and they don't have a fixed length.
Use countw() to count the number of values on a line and scan() to pick them out.
Both take an optional final argument that specifies the delimiter, which in your case is a comma.
data Loans (keep= AccountNo Loan);
infile datalines truncover;
Input @1 AccountNo 3. @5 LoanList $250.;
if length(LoanList) gt 240 then put 'WARNING: You might need to extend Loans';
label AccountNo = 'Account Number' Loan = 'Loans';
do loanNo = 1 to countw(LoanList, ',');
Loan = scan(LoanList, loanNo, ',');
output;
end;
datalines;
123 abc, def, ghi
456 jkl, mnopqr, stuv
789 w, xyz
;
proc print data=Loans label noobs;
run;
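For comparison, the count-and-pick idea behind countw() and scan() can be sketched in a few lines of Python. This is only an illustration of the logic, not a SAS replacement; the variable names are assumptions, and the strip() call removes the blank that follows each comma.

```python
# Input rows from the question as (account number, comma-separated loans)
accounts = [
    (123, "abc, def, ghi"),
    (456, "jkl, mnopqr, stuv"),
    (789, "w, xyz"),
]

# One output row per loan
loans = [
    (account_no, loan.strip())
    for account_no, loan_list in accounts
    for loan in loan_list.split(",")
]

for row in loans:
    print(row)
```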
The reverse operation requires different techniques.
To enable by AccountNo processing, we must first construct a SAS dataset from the input and then read that back in with a set statement.
data Loans;
infile datalines;
input @1 AccountNo 3. @5 Loan $25.;
datalines;
123 15-abc
123 15-def
123 15-ghi
456 99-jkl
456 99-mnopqr
456 99-stuv
789 77-w
789 77-xyz
;
data LoanLists;
set Loans;
by AccountNo;
Now make Loanlist long enough and override SAS's default behaviour of re-initialising all variables for every observation (= row of data).
format Loanlist $250.;
retain Loanlist;
Collect all loans for an account, separating them with a comma and a blank.
if first.AccountNo then Loanlist = Loan;
else Loanlist = catx(', ',Loanlist,Loan);
if length(LoanList) gt 240 then put 'WARNING: you might need to extend LoanList';
Keep only the full list per account.
if last.AccountNo;
drop Loan;
proc print;
run;
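The collect-per-account step can also be sketched in Python, purely as an illustration of what the retain and catx combination produces. The variable names are assumptions; unlike the BY statement, the dictionary does not need the rows to be sorted by account number.

```python
# Rows from the example as (account number, loan)
loans = [
    (123, "15-abc"), (123, "15-def"), (123, "15-ghi"),
    (456, "99-jkl"), (456, "99-mnopqr"), (456, "99-stuv"),
    (789, "77-w"), (789, "77-xyz"),
]

# Gather the loans of each account in order of appearance
collected = {}
for account_no, loan in loans:
    collected.setdefault(account_no, []).append(loan)

# Join with comma and blank, like catx(', ', ...)
loan_lists = [(account_no, ", ".join(items)) for account_no, items in collected.items()]

for row in loan_lists:
    print(row)
```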

How to split character and numerical separately in R

I have a dataframe which looks like this:
df= data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
And I want to split this into a data frame with 3 columns so that the output looks like:
name1 name2 name3
1 Alex 100.00
12 Rina Faso 92.31
113 john 00.00
I have tried stringr() and grep() with limited success. The lack of a delimiter makes it a lot more difficult.
You could try
library(tidyr)
res <- extract(df, name, into=c('name1', 'name2', 'name3'),
'(\\d+)([^0-9]+)([0-9.]+)', convert=TRUE)
res
# name1 name2 name3
#1 1 Alex 100.00
#2 2 Rina Faso 92.31
#3 3 john 50.00
str(res)
# 'data.frame': 3 obs. of 3 variables:
#$ name1: int 1 2 3
#$ name2: Factor w/ 3 levels "Alex","john",..: 1 3 2
# $ name3: num 100 92.3 50
Update
Based on 'df' from @DavidArenburg's post
res <- extract(df, name, into=c('name1', 'name2', 'name3'),
'(\\d+)([^0-9]+)([0-9.]+)', convert=TRUE)
res
# name1 name2 name3
#1 121 Réunion 13.76
#2 2 Côte d'Ivoire 22.40
#3 3 john 50.00
Try with str_match from stringr:
str_match(df$name, "^([0-9]*)([A-Za-z ]*)([0-9\\.]*)")
# [,1] [,2] [,3] [,4]
# [1,] "1Alex100.00" "1" "Alex" "100.00"
# [2,] "2Rina Faso92.31" "2" "Rina Faso" "92.31"
# [3,] "3john50.00" "3" "john" "50.00"
So as.data.frame(str_match(df$name, "^([0-9]*)([A-Za-z ]*)([0-9\\.]*)")[,-1]) should give you the desired result.
You could also do it like this:
> df <- data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
> x <- do.call(rbind.data.frame, strsplit(as.character(df$name), "(?<=[A-Za-z])(?=\\d)|(?<=\\d)(?=[A-Za-z])", perl=T))
> colnames(x) <- c("name1", "name2", "name3")
> print(x, row.names=FALSE)
name1 name2 name3
1 Alex 100.00
12 Rina Faso 92.31
113 john 00.00
With base R it can also be done. It is a bit uglier, but it works with special characters too:
with(df, cbind(sub("\\D.*", "", name),
gsub("[0-9.]", "", name),
gsub(".*[A-Za-z]", "", name)))
# [,1] [,2] [,3]
# [1,] "1" "Alex" "100.00"
# [2,] "2" "Rina Faso" "92.31"
# [3,] "3" "john" "50.00"
An example on special characters
df = data.frame(name= c("121Réunion13.76","2Côte d'Ivoire22.40","3john50.00"))
with(df, cbind(sub("\\D.*", "", name),
gsub("[0-9.]", "", name),
gsub(".*[A-Za-z]", "", name)))
# [,1] [,2] [,3]
# [1,] "121" "Réunion" "13.76"
# [2,] "2" "Côte d'Ivoire" "22.40"
# [3,] "3" "john" "50.00"
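Why special characters matter can be seen by comparing an ASCII-only letter class such as [A-Za-z ] with the non-digit class \D. The sketch below uses Python's re module as a stand-in, which is an assumption of this example; R's regex engines may differ in details, but the character classes behave the same way for these inputs.

```python
import re

names = ["121Réunion13.76", "2Côte d'Ivoire22.40", "3john50.00"]

# ASCII letters and blanks only: matching stops at the first accented letter
ascii_pattern = re.compile(r"^([0-9]*)([A-Za-z ]*)([0-9.]*)")

# Anything that is not a digit counts as part of the name
general_pattern = re.compile(r"^(\d+)(\D+)([0-9.]+)$")

ascii_result = [ascii_pattern.match(n).groups() for n in names]
general_result = [general_pattern.match(n).groups() for n in names]

for name, a, g in zip(names, ascii_result, general_result):
    print(name, a, g)
```

With the ASCII-only class the first two names are cut off after the first letter and the number is lost, while the \D version recovers all three parts.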
Base R solutions that are not ugly:
proto=data.frame(name1=numeric(),name2=character(),name3=numeric())
strcapture("(\\d+)(\\D+)(.*)",as.character(df$name),proto)
name1 name2 name3
1 1 Alex 100.00
2 12 Rina Faso 92.31
3 113 john 0.00
read.table(text=gsub("(\\d+)(\\D+)(.*)","\\1|\\2|\\3",df$name),sep="|")
V1 V2 V3
1 1 Alex 100.00
2 12 Rina Faso 92.31
3 113 john 0.00
You could use the package unglue:
df <- data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
library(unglue)
unglue_unnest(df, name, "{name1}{name2=\\D+}{name3}", convert = TRUE)
#> name1 name2 name3
#> 1 1 Alex 100.00
#> 2 12 Rina Faso 92.31
#> 3 113 john 0.00