Can I use lists in R as a proxy for a data frame with columns of unequal length? - list

My understanding is that a data frame in R has to be rectangular: it is not possible to have a data frame with unequal column lengths. Can I use lists in R to achieve this? What are the pros and cons of such an approach?

You can use lists to store whatever you want, even dataframes or other lists! You can indeed assign different length vectors, or even completely different objects. It gives you the same functionality as dataframes in that you can index using the dollar sign:
> fooList <- list(a=1:12, b=1:11, c=1:10)
> fooList$a
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> fooDF <- data.frame(a=1:10, b=1:10, c=1:10)
> fooDF$a
[1] 1 2 3 4 5 6 7 8 9 10
But numeric indexing is different:
> fooList[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> fooDF[,1]
[1] 1 2 3 4 5 6 7 8 9 10
as well as the structure and printing method:
> fooList
$a
[1] 1 2 3 4 5 6 7 8 9 10 11 12
$b
[1] 1 2 3 4 5 6 7 8 9 10 11
$c
[1] 1 2 3 4 5 6 7 8 9 10
> fooDF
    a  b  c
1   1  1  1
2   2  2  2
3   3  3  3
4   4  4  4
5   5  5  5
6   6  6  6
7   7  7  7
8   8  8  8
9   9  9  9
10 10 10 10
Simply put, a data frame behaves like a table (internally it is a list of equal-length columns), whereas a plain list is more of a general container.
A list is meant to keep all sorts of things together, and a data frame is the usual data format (a subject/case for each row and a variable for each column). It is what most analyses expect, it lets you index the scores of a single subject, and it is easier to transform.
However, if you have columns of unequal length, then I doubt each row represents a subject/case in your data. In that case I guess you don't need much of the functionality of data frames.
If each row does represent a subject/case, then you should use NA for any missing values and use a data frame.

Related

Sorted list of random repeated numbers to sorted list of repeated and continuous numbers in Google Sheets

I think the best way to show the problem is with an example. Column A is what I have now, and column B is what I want.
A   B
1   1
1   1
2   2
2   2
5   3
5   3
5   3
8   4
8   4
9   5
9   5
14  6
14  6
17  7
17  7
17  7
Update: Based on your comment, use this formula
=ArrayFormula(IF(ISNUMBER(A1:A), VLOOKUP(A1:A, {UNIQUE(A1:A), ArrayFormula(RANK(UNIQUE(A1:A), UNIQUE(A1:A), 1))}, 2, 0), ""))
Previous answer: Have you already used the SORT formula?
Try =SORT(A1:A, 1, 1) in cell B1
Assuming your data is in column A, rows 2 through 10, enter in B2:
=arrayformula(1/COUNTIF($A$2:$A$10,$A$2:$A$10))
and in C2:
=sumproduct(($B$1:$B1)*($A$1:$A1<A2))+1
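Both formulas compute what is usually called a dense rank: each value is replaced by its position among the distinct values in the column. As a rough illustration of that logic outside Sheets, here is a minimal Python sketch; the function name `dense_rank` is made up for this example, and the input is column A from the question.

```python
def dense_rank(values):
    """Map each value to its 1-based position among the distinct sorted values."""
    # sorted(set(...)) plays the role of UNIQUE plus RANK in the Sheets formula.
    rank_of = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    # The dictionary lookup plays the role of VLOOKUP.
    return [rank_of[v] for v in values]


column_a = [1, 1, 2, 2, 5, 5, 5, 8, 8, 9, 9, 14, 14, 17, 17, 17]
print(dense_rank(column_a))
```

The input does not need to be sorted for this to work, since the ranks are looked up per value.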

How do I add a Header for Rows in a 2D Array?

I need to output a 2D array that has a label header for the columns and the rows.
The columns are easy: I just output a string above the table. But I cannot figure out how to add the word ROWS in vertical letters at the beginning of the table.
It has to look like this:
          C o l u m n s
     |  1  2  3  4  5  6
----------------------------------
   1 |  2  3  4  5  6  7
R  2 |  3  4  5  6  7  8
O  3 |  4  5  6  7  8  9
W  4 |  5  6  7  8  9 10
S  5 |  6  7  8  9 10 11
   6 |  7  8  9 10 11 12
I cannot figure out how to get the rows label.
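One possible approach is to print one letter of the label at the start of each data row, picked by row index, and a space on rows that have no letter. The question does not say which language is being used, so the following is only a Python sketch of the idea; `render_table`, its parameters and the cell formula `r + c` (read off the sample output) are assumptions made for illustration.

```python
def render_table(size=6, label="ROWS"):
    """Return the table as a list of text lines, with `label` written vertically."""
    # Centre the vertical label against the data rows; other rows get a space.
    top = (size - len(label)) // 2
    side = [" "] * size
    for i, ch in enumerate(label):
        if 0 <= top + i < size:
            side[top + i] = ch

    # The row prefix is "<letter> <row number, width 2> |", six characters wide.
    header = " " * 5 + "|" + "".join(f"{c:3d}" for c in range(1, size + 1))
    lines = ["C o l u m n s".center(len(header)), header, "-" * len(header)]
    for r in range(1, size + 1):
        cells = "".join(f"{r + c:3d}" for c in range(1, size + 1))
        lines.append(f"{side[r - 1]} {r:2d} |{cells}")
    return lines


print("\n".join(render_table()))
```

The same indexing trick carries over to any language with formatted output: the label letter for row `r` is just a lookup into a small array built before the loop.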

Sort rows in a dataframe based on highest values in the whole dataframe

I have a dataframe that has probability values for 3 category columns [A, B, C]. I want to sort the rows of this dataframe so that the row containing the highest probability value in the whole dataframe (irrespective of the column) is at the top, followed by the row with the second-highest probability value, and so on.
Can someone help me with this?
In [15]: df = pd.DataFrame(np.random.randint(1, 10, (10,3)))
In [16]: df
Out[16]:
   0  1  2
0  9  2  8
1  6  6  9
2  2  4  9
3  2  1  2
4  2  5  3
5  3  4  9
6  8  7  3
7  6  4  1
8  3  3  8
9  7  2  7
In [17]: df.loc[df.apply(np.max, axis=1).sort_values(ascending=False).index]
Out[17]:
   0  1  2
5  3  4  9
2  2  4  9
1  6  6  9
0  9  2  8
8  3  3  8
6  8  7  3
9  7  2  7
7  6  4  1
4  2  5  3
3  2  1  2
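The session above relies on the default 0..n-1 index. As a hedged variant, the sketch below orders rows purely by position, so it should also work when the index carries other labels. The helper name `sort_rows_by_max` and the sample probabilities are invented for this example, and all columns are assumed to be numeric.

```python
import numpy as np
import pandas as pd


def sort_rows_by_max(df):
    """Return df with rows ordered by their largest value, highest first."""
    row_max = df.max(axis=1).to_numpy()
    # Negate to sort descending; a stable sort keeps tied rows in original order.
    order = np.argsort(-row_max, kind="stable")
    # iloc takes positions, which is exactly what argsort returns.
    return df.iloc[order]


probs = pd.DataFrame(
    {"A": [0.2, 0.7, 0.1], "B": [0.5, 0.2, 0.1], "C": [0.3, 0.1, 0.8]},
    index=["x", "y", "z"],
)
print(sort_rows_by_max(probs))
```

Here the row maxima are 0.5, 0.7 and 0.8, so the rows come back in the order z, y, x.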

Why is srand(time(NULL)) working smoothly even though I repeatedly reset it?

I have a function that creates a vector of size N, and shuffles it:
void rand_vector_generator(int N) {
    srand(time(NULL));
    vector <int> perm(N);
    for (unsigned k=0; k<N; k++) {
        perm[k] = k;
    }
    random_shuffle(perm.begin(),perm.end());
}
I'm calling this from my main function with the loop:
for(int i=0; i<20; i++)
    rand_vector_generator(10);
I expected this not to give me sufficient randomness in my shuffling, because I'm calling srand(time(NULL)); with every function call and the seed barely changes from one call to the next. My understanding is that I should call srand(time(NULL)); once, not multiple times, so that the seed doesn't "reset".
This thread somewhat affirms what I was expecting the result to be.
Instead, I get:
6 0 3 5 7 8 4 1 2 9
0 8 6 4 2 3 7 9 1 5
8 2 4 9 5 0 6 7 1 3
0 6 1 8 7 4 5 2 3 9
2 5 1 0 3 7 6 4 8 9
4 5 3 0 1 7 2 9 6 8
8 5 2 9 7 0 6 3 4 1
8 4 9 3 1 5 7 0 6 2
3 7 6 0 9 8 2 4 1 5
8 5 2 3 7 4 6 9 1 0
5 4 0 1 2 6 8 7 3 9
2 5 7 9 6 0 4 3 1 8
5 8 3 7 0 2 1 6 9 4
7 4 9 5 1 8 2 3 0 6
1 9 2 3 8 6 0 7 5 4
0 6 4 3 1 2 9 7 8 5
9 3 8 4 7 5 1 6 0 2
1 9 6 5 3 0 2 4 8 7
7 5 1 8 9 3 4 0 2 6
2 9 6 5 4 0 3 7 8 1
These vectors seem pretty randomly shuffled to me. What am I missing? Does the srand call somehow exist on a different scope than the function call so it doesn't get reset every call? Or am I misunderstanding something more fundamental here?
According to the standard, the source of randomness used by the two-argument std::random_shuffle is implementation-defined (std::rand is often used, but this is not guaranteed), while std::shuffle only uses the generator you pass to it. Try it on another compiler? Another platform?
If you want to make sure std::rand is used, make your code use it explicitly (for example, with a lambda expression; the callable receives n and must return a value in [0, n)):
random_shuffle(perm.begin(), perm.end(), [](int n){ return std::rand() % n; });
On a somewhat unrelated note, time() has a precision of one whole second and your code runs much faster than that (I would hope), so those repeated calls to srand() keep resetting the generator to the same seed.
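The effect of reseeding with the same value can be reproduced in any seeded generator. The sketch below uses Python's random module purely as an analogy to srand/rand, not as a model of any particular C++ implementation; `same_second` is a made-up stand-in for the value time(NULL) would return within one second.

```python
import random


def shuffled_after_reseed(n, seed):
    """Create a freshly seeded generator, then shuffle 0..n-1 with it."""
    rng = random.Random(seed)  # plays the role of srand(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


same_second = 1_700_000_000  # hypothetical time(NULL) value

# Reseeding with the same value before every shuffle repeats the permutation.
runs = [shuffled_after_reseed(10, same_second) for _ in range(3)]
print(runs[0] == runs[1] == runs[2])  # True

# Seeding once and reusing the generator lets its state advance between calls.
shared = random.Random(same_second)
for _ in range(3):
    perm = list(range(10))
    shared.shuffle(perm)
    print(perm)
```

If the shuffle really drew from the generator being reseeded, the question's twenty outputs would look like the first case, which is why the varied output suggests random_shuffle is not using std::rand there.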

Reversed iterator from top to bottom, right to left, instead of completely reversed

I have a std::vector that goes in the following order:
0 3 6
1 4 7
2 5 8
Using a reverse iterator I get the following output:
8 5 2
7 4 1
6 3 0
The output I need is:
6 3 0
7 4 1
8 5 2
How would I go about getting the required ordering? At first it seemed I could use the reverse iterator position minus the vertical size, but that gets tricky when my vector is something like this:
0 6 12
1 7 13
2 8 14
3 9 15
4 10 16
5 11 17...
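One way to think about it is as index arithmetic rather than iterator reversal: if the vector is laid out column by column with a known number of rows, the element at row r and column c sits at index c * rows + r, so the required output walks the rows top to bottom and the columns right to left. The Python sketch below only illustrates that arithmetic; the function name `mirrored_rows` is invented here, and the column-by-column layout and known row count are assumptions taken from the examples in the question.

```python
def mirrored_rows(values, rows):
    """Read a column-major sequence row by row, with columns right to left."""
    cols = len(values) // rows
    return [
        # Element (r, c) of a column-major layout lives at index c * rows + r.
        [values[c * rows + r] for c in reversed(range(cols))]
        for r in range(rows)
    ]


for row in mirrored_rows(list(range(9)), rows=3):
    print(*row)
```

The same two nested loops translate directly to a std::vector with operator[], and they work unchanged for the 6-row example because only `rows` changes.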