R: How to generate data frames from all the possible combination of three columns? - combinations

I have the following data frame:
df1 <- data.frame(x1=c(1,2,3,4), x2=c(10,20,30,40), x3=c(100,200,300,400))
And I want to generate al the possible data frames that can be created from combining d1$x1, df1$x2 and df1$x3 in different orders so 4^3 different dataframes, e.g:
x1 x2 x3
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400`
x1 x2 x3
1 1 40 400
2 2 30 300
3 3 20 200
4 4 10 100
and so on. For each of them I want to compute the following function:
my.function <- function(x1, x2, x3) {
sum(0.3*x1^2+0.3*x2^2+0.4*x3)/nrow(x1)
}
I did this, but it's clearly wrong:
res1 <- rep(NA, nrow(df1)^3)
for(i in 1:nrow(df1)){
for(j in 1:nrow(df1)){
for(k in 1:nrow(df1)){
x1.1 <- as.vector(c(df1[-i, 1], df1[i, 1]))
x2.1 <- as.vector(c(df1[-k, 2], df1[k, 2]))
x3.1 <- as.vector(c(df1[-j, 3], df1[j, 3]))
res1[nrow(df1)^2*(i-1) + nrow(df1)*(j-1)+k] <- m.function(x1.1, x2.1, x3.1)
}
}
}
I tried to find a similar problem of mine without much luck, could you please help me?
Thank you so much!!!

Related

comparison among input numbers

i want have comparison among input numbers .general shape of input numbers is like this :
n
x1 x2
x3 x4
x5 x6
x7 x8
n indicates how many input line we have and on each line we have 2 number with space ,just 2 number,
the first part of line is price and the second is quality
for example like this :
4
1 5
7 9
5 6
20 30
in line two '1' is price and '5' is quality
if i can find more quality with lower price i print 'yes' if not print 'no'
i write this but i can not change list to dictionary
x = int(input())
count =0
y= []
while count!=x:
y.append(input())
count+=1
z= []
for i in y:
z.append(i.split())
def dictionary(x):
d={x[0]:x[1]}
return d
for h in z:
a=dictionary(h)
for example i want
2
1 10
7 3
'yes'
or another example :
4
1 5
7 9
5 6
20 30
'no'
i hope someone answer this
n = int(input())
q,p = map(int, input().split())
for _ in range(n-1):
q1,p1 = map(int, input().split())
if q1 > q and p1 < p:
print("yes")
break
else:
print("no")
This works. The code is really simple.

if else loop not working

I am a beginner in R studio, so hopefully someone can help me with this problem. The case: I want to make an if else loop. I made the following code for an l times m matrix:
for (i in 1:l){
for (j in 1:m){
if (is.na(quantilereturns[i,j]) < quantile(quantilereturns[,j], c(.1), na.rm=TRUE)) {
quantilereturns[i,j]
} else { (0) }
}
}
Summary: I want to make a matrix with values that are smaller than the quantile of a certain vector in the matrix quantilereturns. So when they are smaller than the 10% quantile they get their original value otherwise it will be a zero.
The code doesn't give any errors, but it doesn't change the values in the matrix either.
Can someone help me?
You need to assign the result to a cell of the matrix. I will take the matrix of a recent other thread as an example:
a <- c(4, -9, 2)
b <- c(-1, 3, -8)
c <- c(5, 2, 6)
d <- c(7, 9, -2)
matrix <- cbind(a,b,c,d)
d <- dim(matrix)
rows <- d[1]
columns <- d[2]
print("Before")
print(matrix)
for (i in 1:rows) {
for (j in 1:columns) {
if (is.na(matrix[i,j]) >= quantile(matrix[,j], c(.1), na.rm=TRUE)) {
matrix[i,j] <- 0
}
}
}
print("After")
print(matrix)
this gives
[1] "Before"
a b c d
[1,] 4 -1 5 7
[2,] -9 3 2 9
[3,] 2 -8 6 -2
[1] "After"
a b c d
[1,] 0 0 5 0
[2,] 0 0 2 0
[3,] 0 0 6 0
So the essential line you are looking for is matrix[i,j] <- 0

How to shuffle data in pandas? [duplicate]

What's a simple and efficient way to shuffle a dataframe in pandas, by rows or by columns? I.e. how to write a function shuffle(df, n, axis=0) that takes a dataframe, a number of shuffles n, and an axis (axis=0 is rows, axis=1 is columns) and returns a copy of the dataframe that has been shuffled n times.
Edit: key is to do this without destroying the row/column labels of the dataframe. If you just shuffle df.index that loses all that information. I want the resulting df to be the same as the original except with the order of rows or order of columns different.
Edit2: My question was unclear. When I say shuffle the rows, I mean shuffle each row independently. So if you have two columns a and b, I want each row shuffled on its own, so that you don't have the same associations between a and b as you do if you just re-order each row as a whole. Something like:
for 1...n:
for each col in df: shuffle column
return new_df
But hopefully more efficient than naive looping. This does not work for me:
def shuffle(df, n, axis=0):
shuffled_df = df.copy()
for k in range(n):
shuffled_df.apply(np.random.shuffle(shuffled_df.values),axis=axis)
return shuffled_df
df = pandas.DataFrame({'A':range(10), 'B':range(10)})
shuffle(df, 5)
Use numpy's random.permuation function:
In [1]: df = pd.DataFrame({'A':range(10), 'B':range(10)})
In [2]: df
Out[2]:
A B
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
In [3]: df.reindex(np.random.permutation(df.index))
Out[3]:
A B
0 0 0
5 5 5
6 6 6
3 3 3
8 8 8
7 7 7
9 9 9
1 1 1
2 2 2
4 4 4
Sampling randomizes, so just sample the entire data frame.
df.sample(frac=1)
As #Corey Levinson notes, you have to be careful when you reassign:
df['column'] = df['column'].sample(frac=1).reset_index(drop=True)
In [16]: def shuffle(df, n=1, axis=0):
...: df = df.copy()
...: for _ in range(n):
...: df.apply(np.random.shuffle, axis=axis)
...: return df
...:
In [17]: df = pd.DataFrame({'A':range(10), 'B':range(10)})
In [18]: shuffle(df)
In [19]: df
Out[19]:
A B
0 8 5
1 1 7
2 7 3
3 6 2
4 3 4
5 0 1
6 9 0
7 4 6
8 2 8
9 5 9
You can use sklearn.utils.shuffle() (requires sklearn 0.16.1 or higher to support Pandas data frames):
# Generate data
import pandas as pd
df = pd.DataFrame({'A':range(5), 'B':range(5)})
print('df: {0}'.format(df))
# Shuffle Pandas data frame
import sklearn.utils
df = sklearn.utils.shuffle(df)
print('\n\ndf: {0}'.format(df))
outputs:
df: A B
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
df: A B
1 1 1
0 0 0
3 3 3
4 4 4
2 2 2
Then you can use df.reset_index() to reset the index column, if needs to be:
df = df.reset_index(drop=True)
print('\n\ndf: {0}'.format(df)
outputs:
df: A B
0 1 1
1 0 0
2 4 4
3 2 2
4 3 3
A simple solution in pandas is to use the sample method independently on each column. Use apply to iterate over each column:
df = pd.DataFrame({'a':[1,2,3,4,5,6], 'b':[1,2,3,4,5,6]})
df
a b
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 6 6
df.apply(lambda x: x.sample(frac=1).values)
a b
0 4 2
1 1 6
2 6 5
3 5 3
4 2 4
5 3 1
You must use .value so that you return a numpy array and not a Series, or else the returned Series will align to the original DataFrame not changing a thing:
df.apply(lambda x: x.sample(frac=1))
a b
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 6 6
From the docs use sample():
In [79]: s = pd.Series([0,1,2,3,4,5])
# When no arguments are passed, returns 1 row.
In [80]: s.sample()
Out[80]:
0 0
dtype: int64
# One may specify either a number of rows:
In [81]: s.sample(n=3)
Out[81]:
5 5
2 2
4 4
dtype: int64
# Or a fraction of the rows:
In [82]: s.sample(frac=0.5)
Out[82]:
5 5
4 4
1 1
dtype: int64
I resorted to adapting #root 's answer slightly and using the raw values directly. Of course, this means you lose the ability to do fancy indexing but it works perfectly for just shuffling the data.
In [1]: import numpy
In [2]: import pandas
In [3]: df = pandas.DataFrame({"A": range(10), "B": range(10)})
In [4]: %timeit df.apply(numpy.random.shuffle, axis=0)
1000 loops, best of 3: 406 µs per loop
In [5]: %%timeit
...: for view in numpy.rollaxis(df.values, 1):
...: numpy.random.shuffle(view)
...:
10000 loops, best of 3: 22.8 µs per loop
In [6]: %timeit df.apply(numpy.random.shuffle, axis=1)
1000 loops, best of 3: 746 µs per loop
In [7]: %%timeit
for view in numpy.rollaxis(df.values, 0):
numpy.random.shuffle(view)
...:
10000 loops, best of 3: 23.4 µs per loop
Note that numpy.rollaxis brings the specified axis to the first dimension and then let's us iterate over arrays with the remaining dimensions, i.e., if we want to shuffle along the first dimension (columns), we need to roll the second dimension to the front, so that we apply the shuffling to views over the first dimension.
In [8]: numpy.rollaxis(df, 0).shape
Out[8]: (10, 2) # we can iterate over 10 arrays with shape (2,) (rows)
In [9]: numpy.rollaxis(df, 1).shape
Out[9]: (2, 10) # we can iterate over 2 arrays with shape (10,) (columns)
Your final function then uses a trick to bring the result in line with the expectation for applying a function to an axis:
def shuffle(df, n=1, axis=0):
df = df.copy()
axis = int(not axis) # pandas.DataFrame is always 2D
for _ in range(n):
for view in numpy.rollaxis(df.values, axis):
numpy.random.shuffle(view)
return df
This might be more useful when you want your index shuffled.
def shuffle(df):
index = list(df.index)
random.shuffle(index)
df = df.ix[index]
df.reset_index()
return df
It selects new df using new index, then reset them.
I know the question is for a pandas df but in the case the shuffle occurs by row (column order changed, row order unchanged), then the columns names do not matter anymore and it could be interesting to use an np.array instead, then np.apply_along_axis() will be what you are looking for.
If that is acceptable then this would be helpful, note it is easy to switch the axis along which the data is shuffled.
If you panda data frame is named df, maybe you can:
get the values of the dataframe with values = df.values,
create an np.array from values
apply the method shown below to shuffle the np.array by row or column
recreate a new (shuffled) pandas df from the shuffled np.array
Original array
a = np.array([[10, 11, 12], [20, 21, 22], [30, 31, 32],[40, 41, 42]])
print(a)
[[10 11 12]
[20 21 22]
[30 31 32]
[40 41 42]]
Keep row order, shuffle colums within each row
print(np.apply_along_axis(np.random.permutation, 1, a))
[[11 12 10]
[22 21 20]
[31 30 32]
[40 41 42]]
Keep colums order, shuffle rows within each column
print(np.apply_along_axis(np.random.permutation, 0, a))
[[40 41 32]
[20 31 42]
[10 11 12]
[30 21 22]]
Original array is unchanged
print(a)
[[10 11 12]
[20 21 22]
[30 31 32]
[40 41 42]]
Here is a work around I found if you want to only shuffle a subset of the DataFrame:
shuffle_to_index = 20
df = pd.concat([df.iloc[np.random.permutation(range(shuffle_to_index))], df.iloc[shuffle_to_index:]])

Counting values to get a matrix in Stata

I have a variable age, 13 variables x1 to x13, and 802 observations in a Stata dataset. age has values ranging 1 to 9. x1 to x13 have values ranging 1 to 13.
I want to know how to count the number of 1 .. 13 in x1 to x13 according to different values of age. For example, for age 1, in x1 to x13, count the number of 1,2,3,4,...13.
I first change x1 to x13 as a matrix by using
mkmat x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13, matrix (a)
Then, I want to count using the following loop:
gen count = 0
quietly forval i = 1/802 {
quietly forval j = 1/13 {
replace count = count + inrange(a[r'i', x'j'], 0, 1), if age==1
}
}
I failed.
I am still somewhat uncertain as to what you like to achieve. But if I am understanding you correctly, here is one way to do it.
First, a simple data that has age ranging from one to three, and four variables x1-x4, each with values of integers ranging between 5 and 7.
clear
input age x1 x2 x3 x4
1 5 6 6 6
1 7 5 6 5
2 5 7 6 6
3 5 6 7 7
3 7 6 6 6
end
Then we create three count variables (n5, n6 and n7) that counts the number of 5s, 6s, and 7s for each subject across x1-x4.
forval i=5/7 {
egen n`i'=anycount(x1 x2 x3 x4),v(`i')
}
Below is how the data looks like now. To explain, the first "1" under n5 indicates that there is only one "5" for the subject across x1-x4.
+----------------------------------------+
| age x1 x2 x3 x4 n5 n6 n7 |
|----------------------------------------|
1. | 1 5 6 6 6 1 3 0 |
2. | 1 7 5 6 5 2 1 1 |
3. | 2 5 7 6 6 1 2 1 |
4. | 3 5 6 7 7 1 1 2 |
5. | 3 7 6 6 6 0 3 1 |
+----------------------------------------+
It sounds to me like your ultimate goal is to have sums calculated separately for each value in age. Assuming this is true, let's create a 3x3 matrix to store such results.
mat A=J(3,3,.) // age (1-3) and values (5-7)
mat rown A=age1 age2 age3
mat coln A=value5 value6 value7
forval i=5/7 {
forval j=1/3 {
qui su n`i' if age==`j'
loca k=`i'-4 // the first column for value5
mat A[`j',`k']=r(sum)
}
}
The matrix looks like this. To explain, the first "3" under value5 indicates that for all children of the age of 1, the value 5 appears a total of three times across x1-x4
A[3,3]
value5 value6 value7
age1 3 4 1
age2 1 2 1
age3 1 4 3
With Aspen's example, you could do this:
gen id = _n
reshape long x, i(id)
tab age x
Note that your sample code doesn't loop over different ages and there is an incorrect comma in the count command. I won't try to fix the code, as there are many more direct methods, one of which is above. tabulate has an option to save the table as a matrix.
Here is another solution closer to the original idea. Warning: code not tested.
matrix count = J(9, 13, 0)
forval i = 1/9 {
forval j = 1/13 {
forval J = 1/13 {
qui count if age == `i' & x`J' == `j'
matrix count[`i', `j'] = count[`i', `j'] + r(N)
}
}
}

matlab - selecting rows when importing

I have a list of values (product codes, like '1123','4356'...), call it LIST, and I want to select from a matrix M only the correspondent rows. I.e., the first col of the matrix M contains codes, the other cols the data, and I have an additional vector LIST that contains the codes to select.
Ex.
LIST MATRIX I WANT
[123; [000 1 2 3 ; [123 3 5 6 ;
456] 123 3 5 6 ; 456 1 4 6 ]
000 5 6 7 ;
456 1 4 6 ]
Efficient way to do it?
list = [123; 456];
mat = [000 1 2 3; 123 3 5 6; 000 5 6 7; 456 1 4 6];
iwant = [123 3 5 6 ; 456 1 4 6];
[a,b]=ismember(list,mat);
iwant2 = mat(b,:);
iwant==iwant2