SFrame: replacing specific rows in a column - python-2.7

Sorry, this is probably a simple question.
I have an SFrame that looks like this:
A B C
0 1 2
0 2 3
1 2 3
1 3 4
2 3 1
2 3 3
. . .
I also have another SFrame that looks like this:
A B C
0 1 4
0 2 5
I want to update the rows of the first SFrame whose A and B values match those in the second SFrame, so that they take the new C values:
A B C
0 1 4
0 2 5
1 2 3
1 3 4
2 3 1
2 3 3
. . .
The replacement could cover all columns of the first SFrame, or just one column (an SArray).
I tried the following:
sfr['C'][sfr['A']==0] = sfr2['C']
or just
sfr[sfr['A']==0] = sfr2
but I got the following error message:
TypeError: 'SArray' object does not support item assignment
When I replace the SArray C with one of the same length, it works. The problem is the different lengths of the SFrames.

For now, I have found a simple solution myself.
I create a list of all the values that I want in the first SFrame after the replacement, convert this list to an SArray, and add it as a new column (the number of columns is not important to me).
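
The list-building step of that workaround can be sketched in plain Python. Lists of (A, B, C) tuples stand in for the SFrame rows here, which is an assumption made so the sketch runs without the SFrame library, and the helper name replace_c is hypothetical. The resulting list is what would then be converted to an SArray and attached as the new column.

```python
def replace_c(rows, updates):
    """Return the C values of `rows`, replaced wherever (A, B) matches a row in `updates`.

    Both arguments are lists of (A, B, C) tuples standing in for SFrame rows.
    """
    # Map each (A, B) key of the smaller table to its new C value.
    new_c = {(a, b): c for a, b, c in updates}
    # Keep the old C unless the key has a replacement.
    return [new_c.get((a, b), c) for a, b, c in rows]


sfr_rows = [(0, 1, 2), (0, 2, 3), (1, 2, 3), (1, 3, 4), (2, 3, 1), (2, 3, 3)]
sfr2_rows = [(0, 1, 4), (0, 2, 5)]
new_column = replace_c(sfr_rows, sfr2_rows)
print(new_column)
```

Because the lookup is keyed on (A, B), the two tables do not need to have the same length.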

How Domain maps map indexes to target locales array in multi-dimension case

I couldn't find how the domain map maps the indices of a multi-dimensional domain to a multi-dimensional array of target locales.
1.) How is the (one-dimensional) target locales array arranged in a multi-dimensional fashion, matching the rank of the distribution, in order to map the indices?
2.) The documentation states that in the multi-dimensional case the computation should be done in every dimension. For the domain {1..8, 1..8} ==> dom
assume dom is block-distributed over 6 target locales.
Steps in the mapping:
1. For the 1st dimension (1..8), do the computation:
if low <= idx <= high, then locid is
floor((idx-low)*N / (high-low+1)), which gives me an index, say i.
2. Repeat the same for the 2nd dimension, which gives me an index, say j.
Now I have a tuple (i, j).
How is this mapped to the 2-dimensional target locales array?
What does the domain map do to change the 1D target locales array to the rank of the distribution?
Is it something like a reshape function?
Please let me know if this lacks sufficient information.
The specific details about how a domain's indices are mapped to a program's locales are not defined by the Chapel language itself, but rather by the implementation of the domain map used to declare the domain. In the comments under your question, you mention that you're referring to the Block distribution, so I'll focus on that in my answer (documented here), but note that any other domain map could take a different approach.
The Block distribution takes an optional targetLocales argument which permits you to specify the set of locales to be targeted, as well as their virtual topology. For instance, if I declare and populate a few arrays of locales:
var grid1: [1..3, 1..2] locale,  // a 3 x 2 array of locales
    grid2: [1..2, 1..3] locale;  // a 2 x 3 array of locales
for i in 1..3 {
  for j in 1..2 {
    grid1[i,j] = Locales[(2*(i-1) + j-1)%numLocales];
    grid2[j,i] = Locales[(3*(j-1) + i-1)%numLocales];
  }
}
I can then pass them in as the targetLocales arguments to a few instances of a Block-distributed domain:
use BlockDist;
config const n = 8;
const D = {1..n, 1..n},
      D1 = D dmapped Block(D, targetLocales=grid1),
      D2 = D dmapped Block(D, targetLocales=grid2);
Each domain will distribute its n rows to the first dimension of its targetLocales grid and its n columns to the second dimension. We can see the results of this distribution by declaring arrays of integers over these domains and assigning them in parallel to make each element store its owning locale's ID, as follows:
var A1: [D1] int,
    A2: [D2] int;
forall a in A1 do
  a = here.id;
forall a in A2 do
  a = here.id;
writeln(A1, "\n");
writeln(A2, "\n");
When running on six or more locales (./a.out -nl 6), the output is as follows, revealing the underlying grid structure:
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
2 2 2 2 3 3 3 3
2 2 2 2 3 3 3 3
2 2 2 2 3 3 3 3
4 4 4 4 5 5 5 5
4 4 4 4 5 5 5 5

0 0 0 1 1 1 2 2
0 0 0 1 1 1 2 2
0 0 0 1 1 1 2 2
0 0 0 1 1 1 2 2
3 3 3 4 4 4 5 5
3 3 3 4 4 4 5 5
3 3 3 4 4 4 5 5
3 3 3 4 4 4 5 5
For a 1-dimensional targetLocales array, the documentation says:
If the rank of targetLocales is 1, a greedy heuristic is used to reshape the array of target locales so that it matches the rank of the distribution and each dimension contains an approximately equal number of indices.
For example, if we distribute to a 1-dimensional 4-element array of locales:
var grid3: [1..4] locale;
for i in 1..4 do
  grid3[i] = Locales[(i-1)%numLocales];
var D3 = D dmapped Block(D, targetLocales=grid3);
var A3: [D3] int;
forall a in A3 do
  a = here.id;
writeln(A3);
we can see that the target locales form a square, as expected:
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
2 2 2 2 3 3 3 3
2 2 2 2 3 3 3 3
2 2 2 2 3 3 3 3
2 2 2 2 3 3 3 3
The documentation is intentionally vague about how a 1D targetLocales argument will be reshaped if it's not a perfect square, but we can find out what's done in practice by using the targetLocales() query on the domain. Also, note that if no targetLocales array is supplied, the entire Locales array (which is 1D) is used by default. As an illustration of both these things, if the following code is run on six locales:
var D0 = D dmapped Block(D);
writeln(D0.targetLocales());
we get:
LOCALE0 LOCALE1
LOCALE2 LOCALE3
LOCALE4 LOCALE5
illustrating that the current heuristic matches our explicit grid1 declaration above.
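
To connect the per-dimension formula quoted in the question with the outputs above, here is a small Python model of the mapping. It is a sketch under stated assumptions, not the Block implementation: it applies floor((idx-low)*N / (high-low+1)) in each dimension and assumes locale IDs are laid out row-major on the grid, mirroring how grid1 and grid2 were populated. The function names are hypothetical.

```python
def block_index(idx, low, high, n):
    """Per-dimension formula from the question: floor((idx-low)*n / (high-low+1))."""
    return (idx - low) * n // (high - low + 1)


def owner(i, j, grid_shape, low=1, high=8):
    """Locale ID owning index (i, j), assuming a row-major layout of locale IDs."""
    rows, cols = grid_shape
    r = block_index(i, low, high, rows)  # position along the grid's 1st dimension
    c = block_index(j, low, high, cols)  # position along the grid's 2nd dimension
    return r * cols + c


def ownership(grid_shape, n=8):
    """Owner IDs for every index of the n x n domain {1..n, 1..n}."""
    return [[owner(i, j, grid_shape, 1, n) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


for row in ownership((3, 2)):  # models the 3 x 2 grid1 case
    print(*row)
```

Passing (2, 3) or (2, 2) as the grid shape models the grid2 and reshaped grid3 cases in the same way.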

Assigning value to multiindexed pandas dataframe based on mix of integer and labels indexing

I have a dataframe with multiindexed columns. I want to select on the first level based on the column name, and then return all columns but the last one, and assign a new value to all these elements.
Here's a sample dataframe:
In [1]: mydf = pd.DataFrame(np.random.random_integers(low=1,high=5,size=(4,9)),
columns = pd.MultiIndex.from_product([['A', 'B', 'C'], ['a', 'b', 'c']]))
Out[1]:
   A        B        C
   a  b  c  a  b  c  a  b  c
0  4  1  2  1  4  2  1  1  3
1  4  4  1  2  3  4  2  2  3
2  2  3  4  1  2  1  3  2  3
3  1  3  4  2  3  4  1  5  1
I want to be able to assign to these elements, for example:
In [2]: mydf.loc[:,('A')].iloc[:,:-1]
Out[2]:
   A
   a  b
0  4  1
1  4  4
2  2  3
3  1  3
If I wanted to modify one column only, I know how to select it properly with a tuple so that the assigning works:
In [3]: mydf.loc[:,('A','a')] = 0
In [4]: mydf.loc[:,('A','a')]
Out[4]:
0 0
1 0
2 0
3 0
Name: (A, a), dtype: int32
So that worked well.
Now the following doesn't work...
In [5]: mydf.loc[:,('A')].ix[:,:-1] = 6 - mydf.loc[:,('A')].ix[:,:-1]
In [6]: mydf.loc[:,('A')].iloc[:,:-1] = 6 - mydf.loc[:,('A')].iloc[:,:-1]
Sometimes I will, and sometimes I won't, get the warning that a value is trying to be set on a copy of a slice from a DataFrame. But in both cases it doesn't actually assign.
I've tried pretty much everything I could think of, and I still can't figure out how to mix label and integer indexing in order to set the values correctly.
Any idea please?
Versions:
Python 2.7.9
Pandas 0.16.1
This is not directly supported, as .loc MUST have labels and NOT positions. In theory .ix could support this with multi-index slicers, but there are the usual complications of figuring out what is 'meant' by the user (e.g. is it a label or a position).
In [63]: df = pd.DataFrame(np.random.random_integers(low=1,high=5,size=(4,9)),
columns = pd.MultiIndex.from_product([['A', 'B', 'C'], ['a', 'b', 'c']]))
In [64]: df
Out[64]:
   A        B        C
   a  b  c  a  b  c  a  b  c
0  4  4  4  4  3  2  5  1  4
1  1  2  1  3  2  1  1  4  5
2  3  2  4  4  2  2  3  1  4
3  5  1  1  3  1  1  5  5  5
so we compute the indexer for the 'A' block; np.r_ turns this slice into an actual indexer; then we select the element (e.g. 0 in this case). This feeds into .iloc.
In [65]: df.iloc[:,np.r_[df.columns.get_loc('A')][0]] = 0
In [66]: df
Out[66]:
   A        B        C
   a  b  c  a  b  c  a  b  c
0  0  4  4  4  3  2  5  1  4
1  0  2  1  3  2  1  1  4  5
2  0  2  4  4  2  2  3  1  4
3  0  1  1  3  1  1  5  5  5
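
The same idea can be stretched to the original goal of assigning to every column under 'A' except the last. The sketch below is a variation on the answer's approach, not its exact code: it uses np.arange instead of np.r_ to turn the result of get_loc into integer positions, and the helper name flip_all_but_last and its total parameter are hypothetical. It assumes get_loc returns a slice or boolean mask for a first-level label, and uses deterministic sample values rather than random ones.

```python
import numpy as np
import pandas as pd


def flip_all_but_last(df, top, total=6):
    """Return a copy of df where every column under `top` except the last holds total - value."""
    out = df.copy()
    # get_loc on a first-level label covers the whole block of columns;
    # indexing an integer range with it yields plain column positions.
    positions = np.arange(out.shape[1])[out.columns.get_loc(top)][:-1]
    # Purely positional assignment, so no label/position ambiguity arises.
    out.iloc[:, positions] = total - out.iloc[:, positions].to_numpy()
    return out


columns = pd.MultiIndex.from_product([['A', 'B', 'C'], ['a', 'b', 'c']])
mydf = pd.DataFrame(np.arange(36).reshape(4, 9) % 5 + 1, columns=columns)
result = flip_all_but_last(mydf, 'A')
print(result)
```

Working on a copy and returning it avoids the chained-indexing pattern that triggers the "set on a copy of a slice" warning.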

pandas - Perform computation against a reference record within groups

For each row of data in a DataFrame I would like to compute the number of unique values in columns A and B for that particular row and a reference row within the group identified by another column ID. Here is a toy dataset:
d = {'ID' : pd.Series([1,1,1,2,2,2,2,3,3])
,'A' : pd.Series([1,2,3,4,5,6,7,8,9])
,'B' : pd.Series([1,2,3,4,11,12,13,14,15])
,'REFERENCE' : pd.Series([1,0,0,0,0,1,0,1,0])}
data = pd.DataFrame(d)
The data looks like this:
In [3]: data
Out[3]:
A B ID REFERENCE
0 1 1 1 1
1 2 2 1 0
2 3 3 1 0
3 4 4 2 0
4 5 11 2 0
5 6 12 2 1
6 7 13 2 0
7 8 14 3 1
8 9 15 3 0
Now, within each group defined using ID I want to compare each record with the reference record and I want to compute the number of unique A and B values for the combination. For instance, I can compute the value for data record 3 by taking len(set([4,4,6,12])) which gives 3. The result should look like this:
A B ID REFERENCE CARDINALITY
0 1 1 1 1 1
1 2 2 1 0 2
2 3 3 1 0 2
3 4 4 2 0 3
4 5 11 2 0 4
5 6 12 2 1 2
6 7 13 2 0 4
7 8 14 3 1 2
8 9 15 3 0 4
The only way I can think of implementing this is using for loops that loop over each grouped object and then each record within the grouped object and computes it against the reference record. This is non-pythonic and very slow. Can anyone please suggest a vectorized approach to achieve the same?
I would create a new column that combines A and B into a tuple, and then do a groupby. Then use groups = dict(list(groupby)) and get the length of each frame using len().
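
One loop-free alternative, which is a different technique from the tuple and groupby suggestion above, is to merge each group's reference row back onto the data and count distinct values across the four resulting columns. This is a sketch that assumes exactly one REFERENCE == 1 row per ID; the helper name add_cardinality and the _REF suffix are hypothetical.

```python
import pandas as pd


def add_cardinality(data):
    """Add CARDINALITY: distinct values among A, B and the group's reference A, B."""
    # One reference row per ID is assumed; pull out its A and B.
    ref = data.loc[data['REFERENCE'] == 1, ['ID', 'A', 'B']]
    # A left merge keeps the original row order and attaches A_REF / B_REF.
    merged = data.merge(ref, on='ID', how='left', suffixes=('', '_REF'))
    out = data.copy()
    # Row-wise count of distinct values, assigned positionally.
    out['CARDINALITY'] = merged[['A', 'B', 'A_REF', 'B_REF']].nunique(axis=1).to_numpy()
    return out


data = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'A': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'B': [1, 2, 3, 4, 11, 12, 13, 14, 15],
    'REFERENCE': [1, 0, 0, 0, 0, 1, 0, 1, 0],
})
result = add_cardinality(data)
print(result)
```

Under the len(set(...)) definition from the question, record 8 compares the values 9, 15, 8 and 14, which are four distinct values.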

How to reshape long to wide data in Stata?

I have the following data:
id tests testvalue
1 A 4
1 B 5
1 C 3
1 D 3
2 A 3
2 B 3
3 C 3
3 D 4
4 A 3
4 B 5
4 A 1
4 B 3
I would like to change the above long data format into following wide data.
id testA testB testC testD index
1 4 5 3 3 1
2 3 3 . . 2
3 . . 3 4 3
4 3 5 . . 4
4 1 3 . . 5
I am trying
reshape wide testvalue, i(id) j(tests)
It gives an error because the values of tests are not unique within id.
What would be the solution to this problem?
You need to create an extra identifier to make replicates distinguishable.
clear
input id str1 tests testvalue
1 A 4
1 B 5
1 C 3
1 D 3
2 A 3
2 B 3
3 C 3
3 D 4
4 A 3
4 B 5
4 A 1
4 B 3
end
bysort id tests: gen replicate = _n
reshape wide testvalue, i(id replicate) j(tests) string
See also here for documentation.

Function that given these 2 values produces this third value?

I'm trying to write a function that when given 2 arguments, the 2 leftmost columns, produces the third column as a result:
0 0 0
1 0 3
2 0 2
3 0 1
0 1 1
1 1 0
2 1 3
3 1 2
0 2 2
1 2 1
2 2 0
3 2 3
0 3 3
1 3 2
2 3 1
3 3 0
I know there will be a modulus involved but I can't quite figure it out.
I'm trying to figure out if 4 people are sitting at a table, given the person and target, from the person's perspective which seat is the target sitting in?
Thanks
If a and b are the positions of the two persons, their "distance" is:
(4+b-a) % 4
This also shows that the fourth block in your example is wrong.
Assuming that last block of numbers is wrong, I think you're looking for c = (4 + b - a) % 4 (for columns a b c).
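
As a quick check, the formula can be written as a small Python function and compared against the rows listed in the question. The function name relative_seat and the seats parameter are hypothetical; the table below is copied from the question as it appears here.

```python
def relative_seat(a, b, seats=4):
    """Third column for first-column value a and second-column value b."""
    # Adding `seats` keeps the left operand non-negative, which matters in
    # languages where % can return a negative result.
    return (seats + b - a) % seats


table = [
    (0, 0, 0), (1, 0, 3), (2, 0, 2), (3, 0, 1),
    (0, 1, 1), (1, 1, 0), (2, 1, 3), (3, 1, 2),
    (0, 2, 2), (1, 2, 1), (2, 2, 0), (3, 2, 3),
    (0, 3, 3), (1, 3, 2), (2, 3, 1), (3, 3, 0),
]
mismatches = [(a, b, c) for a, b, c in table if relative_seat(a, b) != c]
print(mismatches)
```

Run against the sixteen rows as they are listed in the question here, the mismatch list comes out empty, so the formula agrees with every row shown, including the last block.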