Is there a way to sort a data frame by one of its columns in R? I tried the code below, but the result comes back as a character vector instead of a data frame.
> asd <- data.frame(a = c("fsd","sdfsd"))
> asd <- with(asd, asd[order(a) , ])
> asd
[1] "fsd" "sdfsd"
Can we get the result as a data frame?
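Try this. Index the rows with order(). For a single-column data frame such as asd, also pass drop = FALSE (asd[order(asd$a), , drop = FALSE]); otherwise R simplifies the result to a vector.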
a <- data.frame(x=LETTERS[1:5],y=c(5:1))
a[order(a$x),]
a[order(a$y),]
> a[order(a$x),]
x y
1 A 5
2 B 4
3 C 3
4 D 2
5 E 1
> a[order(a$y),]
x y
5 E 1
4 D 2
3 C 3
2 B 4
1 A 5
I have a pandas dataframe in which two columns contain names of people. The columns are related: the same name means the same person. I want to assign category codes that are valid across the whole "name" space.
For example, my data frame is
df = pd.DataFrame({"P1":["a","b","c","a"], "P2":["b","c","d","c"]})
>>> df
P1 P2
0 a b
1 b c
2 c d
3 a c
I want it to be replaced by the corresponding category codes, such that
>>> df
P1 P2
0 1 2
1 2 3
2 3 4
3 1 3
The categories are in fact derived from the concatenated array ["a","b","c","d"] and applied to the individual columns separately. How can I achieve this?
Use:
print (df.stack().rank(method='dense').astype(int).unstack())
P1 P2
0 1 2
1 2 3
2 3 4
3 1 3
EDIT:
For a more general solution I used another approach, because the first one has a problem with duplicates in the index:
df = pd.DataFrame({"P1":["a","b","c","a"],
"P2":["b","c","d","c"],
"A":[3,4,5,6]}, index=[2,2,3,3])
print (df)
A P1 P2
2 3 a b
2 4 b c
3 5 c d
3 6 a c
cols = ['P1','P2']
df[cols] = (pd.factorize(df[cols].values.ravel())[0]+1).reshape(-1, len(cols))
print (df)
A P1 P2
2 3 1 2
2 4 2 3
3 5 3 4
3 6 1 3
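As a rough illustration of what the factorize approach above does, here is a plain-Python sketch that needs no pandas. The helper name factorize_shared is hypothetical and is not part of any library. It walks the values row by row (the same order ravel() produces) and gives each new value the next 1-based code, so both columns share one mapping.

```python
def factorize_shared(columns):
    """Assign 1-based codes shared by all columns, in order of first appearance.

    `columns` maps column name -> list of values; all lists have equal length.
    Values are visited row by row, mirroring the order of a row-major ravel().
    """
    codes = {}  # value -> code, filled as new values are encountered
    n_rows = len(next(iter(columns.values())))
    encoded = {name: [] for name in columns}
    for i in range(n_rows):
        for name in columns:
            value = columns[name][i]
            if value not in codes:
                codes[value] = len(codes) + 1
            encoded[name].append(codes[value])
    return encoded, codes


data = {"P1": ["a", "b", "c", "a"], "P2": ["b", "c", "d", "c"]}
encoded, mapping = factorize_shared(data)
print(encoded)  # {'P1': [1, 2, 3, 1], 'P2': [2, 3, 4, 3]}
print(mapping)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

Note that codes assigned by first appearance agree with the dense-rank result here only because the names happen to appear in alphabetical order. With other data the two approaches can number the names differently, although each stays consistent across both columns.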
You can do
In [465]: pd.DataFrame((pd.factorize(df.values.ravel())[0]+1).reshape(df.shape),
columns=df.columns)
Out[465]:
P1 P2
0 1 2
1 2 3
2 3 4
3 1 3
I have a list consisting of several strings, as shown below: list=['abc','cde','fgh']. I want to put all the values into one particular cell of a dataframe. I am trying it with this code:
df1.ix[1,3]=[list]
df1.to_csv('test.csv', sep=',')
I want all the values inserted as they are, ['abc','cde','fgh'], at position 1,3 of the dataframe. I don't want to convert the list to a string or any other format. But it gives me an error. What am I doing wrong here?
I think you can use:
df1.ix[1,3] = L
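Also, it is not recommended to name a variable list, because that shadows the Python built-in of the same name. (Note that .ix was removed in pandas 1.0, so this code needs an older pandas version.)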
Sample:
df1 = pd.DataFrame({'a':[1,2,3], 'b':[1,2,3], 'c':[1,2,3], 'd':[1,2,3]})
print (df1)
a b c d
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
L = ['abc','cde','fgh']
df1.ix[1,3]= L
print (df1)
a b c d
0 1 1 1 1
1 2 2 2 [abc, cde, fgh]
2 3 3 3 3
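To show why naming a variable list is discouraged, here is a small standard-library sketch. The function name shadow_demo is made up for this illustration. Inside the function the name list refers to the data, so calling it as the built-in constructor fails.

```python
L = ["abc", "cde", "fgh"]   # a neutral name keeps the built-in usable
chars = list("abc")         # the built-in list() still works here


def shadow_demo():
    # Rebinding the name hides the built-in inside this scope.
    list = ["abc", "cde", "fgh"]
    try:
        list("abc")  # now this tries to call a list object
    except TypeError as exc:
        return str(exc)
    return None


message = shadow_demo()
print(chars)    # ['a', 'b', 'c']
print(message)  # the TypeError text reported by Python
```

The same thing happens at module level: once list is rebound, every later call to list(...) in that scope fails until the name is deleted or the session is restarted.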
I think you meant to use 1:3 not 1,3
consider the pd.DataFrame df
df = pd.DataFrame(dict(A=list('xxxxxxx')))
use loc (or ix, which exists only in pandas versions before 1.0)
df.loc[1:3, 'A'] = ['abc', 'cde', 'fgh']
or
df.ix[1:3, 'A'] = ['abc', 'cde', 'fgh']
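which sets rows 1 through 3 of column A to 'abc', 'cde' and 'fgh' (label slices with loc include both endpoints).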
I'm new to Python pandas and haven't found an answer to this in the documentation. I have an existing dataframe and I've added a new column Y. I want to set the value of column Y to 'abc' in all rows in which column Z = 'xyz'. In SQL this would be a simple
update table set colY = 'abc' where colZ = 'xyz'
Is there a similar way to do this update in pandas?
Thanks!
You can use loc, or numpy.where if you also need to set a different value for the rows that do not match:
df.loc[df.Z == 'xyz', 'Y'] = 'abc'
Sample:
import pandas as pd
import numpy as np
df = pd.DataFrame({'X':[1,2,3],
'Z':['xyz',5,6],
'C':[7,8,9]})
print (df)
C X Z
0 7 1 xyz
1 8 2 5
2 9 3 6
df.loc[df.Z == 'xyz', 'Y'] = 'abc'
print (df)
C X Z Y
0 7 1 xyz abc
1 8 2 5 NaN
2 9 3 6 NaN
df['Y1'] = np.where(df.Z == 'xyz', 'abc', 'klm')
print (df)
C X Z Y Y1
0 7 1 xyz abc abc
1 8 2 5 NaN klm
2 9 3 6 NaN klm
You can also take the fallback values from another column:
df['Y2'] = np.where(df.Z == 'xyz', 'abc', df.C)
print (df)
C X Z Y Y2
0 7 1 xyz abc abc
1 8 2 5 NaN 8
2 9 3 6 NaN 9
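For readers who think in terms of the SQL update, the following plain-Python sketch spells out the row-by-row logic of the three assignments above without using pandas. It only illustrates the semantics and is not how pandas does it internally. The rows are assumed to be simple dictionaries, and None stands in for NaN.

```python
rows = [
    {"X": 1, "Z": "xyz", "C": 7},
    {"X": 2, "Z": 5, "C": 8},
    {"X": 3, "Z": 6, "C": 9},
]

for row in rows:
    matches = row["Z"] == "xyz"
    # Like the loc assignment: only matching rows get a value, the rest stay missing.
    row["Y"] = "abc" if matches else None
    # Like where with a constant: non-matching rows get a fixed fallback.
    row["Y1"] = "abc" if matches else "klm"
    # Like where with a column: non-matching rows take the value from C.
    row["Y2"] = "abc" if matches else row["C"]

for row in rows:
    print(row)
```

The pandas versions express the same three rules over whole columns at once, which is why no explicit loop appears in them.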
Define:
dats <- list( df1 = data.frame(A=sample(1:3), B = sample(11:13)),
df2 = data.frame(AA=sample(1:3), BB = sample(11:13)))
such that
> dats
$df1
A B
1 2 12
2 3 11
3 1 13
$df2
AA BB
1 1 13
2 2 12
3 3 11
I would like to change all variable names from upper case to lower case. I can do this with a loop, but somehow I cannot get this lapply call to work:
dats <- lapply(dats, function(x)
names(x)<-tolower(names(x)))
which results in:
> dats
$df1
[1] "a" "b"
$df2
[1] "aa" "bb"
while the desired result is:
> dats
$df1
a b
1 2 12
2 3 11
3 1 13
$df2
aa bb
1 1 13
2 2 12
3 3 11
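If you don't use return at the end of a function, the value of the last evaluated expression is returned. Here that is the value of the assignment, i.e. tolower(names(x)), not the modified data frame. So you need to return x.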
dats <- lapply(dats, function(x) {
names(x)<-tolower(names(x))
x})
Given a list of values in R, what is a nice way to filter values in a list by a given predicate function?
It's not entirely clear whether you have a proper list object in R, or another type of object such as a data.frame or vector. Assuming you have a true list object, we can combine lapply and subset to do what you want. If you don't have a list, then there's no need for lapply.
set.seed(1)
#Fake data
dat <- list(a = data.frame(x = sample(1:10, 20, TRUE))
, b = data.frame(x = sample(1:10, 20, TRUE)))
#Apply the subset function over the list
lapply(dat, subset, x < 3)
$a
x
10 1
12 2
$b
x
4 2
7 1
14 2
18 2
#Example two
lapply(dat, subset, x %in% c(1,7,9))
$a
x
6 9
8 7
9 7
10 1
13 7
$b
x
3 7
7 1
9 9
15 9
16 7