How to add bins created by xtabs with zero values - tabulate

I'm creating frequency tables of my data in order to compare them. However, further comparison is not possible because the tables produced by xtabs have different dimensions. How do I force xtabs to use a specified set of bins, filling in the missing ones with zeros rather than leaving them out? For example, using xtabs(~lcat20+AgeBin1, data = datasource), my result was:
      AgeBin1
lcat20 0 1 2 3 4
   100 1 0 0 0 0
   160 5 1 0 0 0
   180 2 3 0 0 0
   200 1 2 0 0 0
lcat20=120 and lcat20=140 are missing because that dataset does not have any samples in those sizes. How do I fill in these missing categories? I appreciate any help.

Related

Retrieve multiple ArrayFire subarrays from min/max data points

I have an array with sections of touching values in it. For example:
0 0 1 0 0 0 0 0 0 0
0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 2 2 0 0
0 0 0 0 0 0 0 2 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 3 0 0 0 0 0 0 0
0 0 3 0 0 0 0 0 0 0
0 0 3 0 0 0 0 0 0 0
From this, I created a set of af::arrays: minX, maxX, minY, maxY. These define the box that encloses each group.
So for this example:
minX would be: [1,5,2] // 1 for label(1), 5 for label(2) and 2 for label(3)
maxX would be: [3,7,2] // 3 for label(1), 7 for label(2) and 2 for label(3)
minY would be: [0,3,7] // 0 for label(1), 3 for label(2) and 7 for label(3)
maxY would be: [1,4,9] // 1 for label(1), 4 for label(2) and 9 for label(3)
So if you take the i'th element from each of those arrays, you can get the upperleft/lowerright bounds of a box that encloses the corresponding label.
I would like to use these values to pull out subarrays from this larger array. My goal is to put the values enclosed in the boxes into a flat list. In GPU memory, I have also calculated how many entries I need for each box using the max/min X/Y values. So in this example, the flat list should be:
result=[0 1 0 1 1 1 2 2 2 0 0 2 3 3 3]
where the first 6 entries are from the box
______
|0 1 0 |
|1 1 1 |
------
the second 6 entries are from the box
______
|2 2 2 |
|0 0 2 |
------
and the final three entries are from the box
___
| 3 |
| 3 |
| 3 |
---
I cannot figure out how to index into this af::array with min/max values that reside in GPU memory (I do not want to transfer them to the CPU). I tried to see whether gfor/seq would work for me, but it appears that af::seq cannot use array data, and I could not get anything I tried with af::index to work either.
I am able to change how I represent min/max (I could store indices for upper left/lower right) but my main goal is to do this efficiently on the GPU without moving data back and forth between the GPU and CPU.
How can this be achieved efficiently with ArrayFire?
Thank you for your help
How did you get there so far? Which language are you using?
I guess you could tile the results along the 3rd dimension to handle each region separately and end up with min/max vectors in GPU memory.

Efficient calculation method for RCPSP (resource-constrained project scheduling problem)

The problem I am solving is scheduling tasks with limited resources.
My approach is to use a two-dimensional array to track which resources are free.
I wonder how I can make the calculation more efficient, because it currently takes too long.
Using a binary tree is likely to be difficult. After the calculation, there is a step that randomly exchanges indexes as part of the search process.
For example:
Factory's capacity: 4
A(2,2) B(3,2) C(1,1) // task(processing time, required area)
Schedule: A-B-C. 1 means that there is space left, and 0 means that there is no space.
A task can only be allocated if the space required is continuously present.
The x-axis represents time and the y-axis represents capacity.
Initial      After A    After B    After C
1 1 1 ...    1 1 1      0 0 0 1    0 0 0 1
1 1 1 ...    1 1 1      0 0 0 1    0 0 0 1
1 1 1 ...    0 0 1      0 0 1 1    0 0 1 1
1 1 1 ...    0 0 1      0 0 1 1    0 0 0 1
The good news is that your problem looks very similar to the well-known job shop scheduling problem. The bad news is that job shop scheduling is NP-hard.

sas search value across column with array and extract values of next 12 columns

I want to count the number of 'noncure' occurrences across different columns, subject to some conditions, at different position dates. How do I search for a run of 12 consecutive '1's across columns?
[UPDATE]
I've modified my dataset and think this is the best way to populate out my desired results.
This is a sample of my raw data
data have;
/* acct is a character variable, so it needs the $ modifier */
input acct $ flg1-flg25;
datalines;
AA 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1
;
run;
The numbers on flg represent months - eg flg1 = jan10, flg2 = feb10 & so on.
To get noncure, certain conditions have to be fulfilled.
flg(i) has to be 0
noncure only happens if there is a minimum of 12 consecutive flg of '1' in the future
an account can have more than 1 noncure incidents
The computation of noncure should look like this (Refer to image for a better view - highlighted in green)
AA 1 1 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
noncure1 is 1 because flg1 is 0 and the next run of 12 consecutive 1s starts at flg9
noncure2 is 1 because flg2 is 0 and the next run of 12 consecutive 1s starts at flg9
noncure4 is 0 because flg4 is not 0
noncure23 is 0 because even though flg23 is 0, no run of 12 consecutive 1s follows it (there is only a single '1', at flg25)
I'm having problems searching for my first instance of consecutive 12 '1' at flg(i).
I was thinking of doing an array to populate out position of consecutive 12 (eg nc_pos) then do i to nc_pos - something along the lines of
nc_pos = <search for 12 consecutive occurrence of '1' from flg(i)> **I don't know the code for this**
if flg(i) = 0 then do i to nc_pos;
noncure_tag = 1;
obs_pos = i;
FYI I have few hundred thousand accounts with a total of 84 months and their starting positions are different (eg flg1 could be null and the first 0 or 1 may appear at flg3).
My final output should look something like the image file labelled TARGET highlighted in yellow.
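The rules above can be sketched as a single right-to-left pass over the flags. This is only an illustration of the logic, written in C++ rather than SAS; the function name noncure_flags and the run_len parameter are hypothetical, and it assumes that any value other than 1 (including a missing value) breaks a run. The same loop could be ported to a DATA step using an array over flg1-flg25 with a descending DO loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: flg holds the monthly flags (0 or 1) for one account.
// noncure[i] is 1 when flg[i] is 0 and some later position starts a run of
// at least run_len consecutive 1s; otherwise it is 0.
std::vector<int> noncure_flags(const std::vector<int>& flg, int run_len = 12) {
    const std::size_t n = flg.size();
    std::vector<int> noncure(n, 0);
    int run = 0;             // length of the run of 1s starting just right of i
    bool later_run = false;  // true once a long enough run starts after i
    for (std::size_t i = n; i-- > 0;) {
        if (flg[i] == 0 && later_run) noncure[i] = 1;
        // Extend or reset the run so that it now starts at position i.
        run = (flg[i] == 1) ? run + 1 : 0;
        if (run >= run_len) later_run = true;
    }
    return noncure;
}
```

Because each flag is visited once, the cost is linear in the number of months per account, which matters with a few hundred thousand accounts.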

How do I generate all vectors of size n where each element may contain 1 of m different values?

Sorry if this is a duplicate, but I did not find any answers which match mine.
Consider that I have a vector which contains 3 values. I want to construct another vector of a specified length from this vector. For example, let's say that the length n=3 and the vector contains the following values 0 1 2. The output that I expect is as follows:
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 1 2
0 2 0
0 2 1
0 2 2
1 0 0
1 0 1
1 0 2
1 1 0
1 1 1
1 1 2
1 2 0
1 2 1
1 2 2
2 0 0
2 0 1
2 0 2
2 1 0
2 1 1
2 1 2
2 2 0
2 2 1
2 2 2
My current implementation simply constructs nested for loops based on n and generates the expected output. I want to be able to construct output vectors of different lengths and with different values in the input vector.
I have looked at possible implementations using next_permutation, but unfortunately passing a length value does not seem to work.
Are there algorithms that do this efficiently in time and space? I might have to compute this for up to n=17 and an input vector of size around 6.
Below is my implementation for n=3. Here, enc is the vector which contains the input.
#include <iostream>
#include <vector>
using namespace std;

// Enumerates every index triple over enc (n = 3). bw is not used here.
vector<vector<int> > combo_3(const vector<double>& enc, int bw) {
    vector<vector<int> > possibles;
    const int m = static_cast<int>(enc.size());
    for (int inner = 0; inner < m; inner++) {
        for (int inner1 = 0; inner1 < m; inner1++) {
            for (int inner2 = 0; inner2 < m; inner2++) {
                cout << inner << " " << inner1 << " " << inner2 << endl;
                int arr[] = {inner, inner1, inner2};
                vector<int> current(arr, arr + sizeof(arr) / sizeof(arr[0]));
                possibles.push_back(current);
            }
        }
    }
    return possibles;
}
What you are doing is simple counting. Think of your output vector as a list of a list of digits (a vector of a vector). Each digit may have one of m different values where m is the size of your input vector.
This is not permutation generation. Generating every permutation means generating every possible ordering of an input vector, which is not what you're looking for at all.
If you think of this as a counting problem the answer may become clearer to you. For example, how would you generate all base 10 numbers with 5 digits? In that case, your input vector has size 10, and each vector in your output list has length 5.
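A minimal sketch of the counting idea, assuming each output vector holds indices into the input vector (as in the question's own code): treat the vector as an n-digit number in base m and keep adding one, with the rightmost digit changing fastest. The name for_each_tuple and its callback parameter are hypothetical. A callback is used instead of storing every result because the total count is m^n, which for n=17 and m=6 is roughly 1.7e13 vectors.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: calls visit(digits) once for every vector of length n
// whose elements are indices in [0, m), in counting (lexicographic) order.
void for_each_tuple(std::size_t n, std::size_t m,
                    const std::function<void(const std::vector<int>&)>& visit) {
    if (m == 0 && n > 0) return;  // no values to choose from
    std::vector<int> digits(n, 0);
    while (true) {
        visit(digits);
        if (n == 0) return;  // only the empty vector exists
        // Add one to the base-m number, carrying from right to left.
        std::size_t pos = n;
        while (pos > 0) {
            --pos;
            if (static_cast<std::size_t>(digits[pos]) + 1 < m) {
                ++digits[pos];
                break;
            }
            digits[pos] = 0;      // this digit wraps; carry into the next one
            if (pos == 0) return; // carried out of the leftmost digit: done
        }
    }
}
```

Calling for_each_tuple(3, 3, ...) visits the 27 rows listed in the question, starting with 0 0 0 and ending with 2 2 2. To get actual values rather than indices, look up enc[digits[k]] inside the callback.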

iterate through a 3d matrix, finding all solutions

I need to check every possible solution to a certain problem.
I have a matrix[x][y][z] that represents possible nodes to travel through. I have already finished a method that should give me a set of solutions (it disables a single path every iteration and recalculates the entire solution; the disable priority is based on the travel capacity of the last solution).
But I need to see how effective my method is in terms of the total time taken to calculate a set. For this I need a method that calculates the solution for every combination of these paths.
Currently there is only a single layer (L1) between the 2 main layers, where 0 is a free path and 1 is an inaccessible path.
This is the starting layout, in which I can toggle the values on layer L1 from 0 to 1 to disable a path, and which is the basis of my shortest-path search algorithm.
L0  0 0 0 0 0    L1  0 1 0 1 0    L2  0 0 0 0 0
    0 1 0 1 0        1 1 1 1 1        0 1 0 1 0
    0 0 0 0 0        0 1 0 1 0        0 0 0 0 0
    0 1 0 1 0        1 1 1 1 1        0 1 0 1 0
    0 0 0 0 0        0 1 0 1 0        0 0 0 0 0
How can I iterate through every possible combination of disabling the free paths when the dimensions of the matrix are not fixed (they are user-defined at compile time and can be changed at any time)? There are 2^n combinations, where n is the number of free paths on all intermediate layers.
(A quick explanation in C or C++ would be best; even pseudocode is good.) Since there are currently 9 free paths to combine, there should be 2^9 = 512 combinations to test. I haven't written any brute-force algorithms before, so I have no idea how to make one.
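One common way to enumerate all 2^n on/off combinations is to count an integer mask from 0 to 2^n - 1 and let bit b of the mask decide whether free cell b is disabled. The sketch below assumes the intermediate layer(s) have been flattened into a single std::vector<int>, so it does not depend on the matrix dimensions; the names for_each_combination and evaluate are hypothetical, and evaluate stands in for whatever runs the path search on the modified layer. It also assumes fewer than 64 free cells, which brute force requires in practice anyway.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: layer is a flattened copy of the intermediate layer(s),
// with 0 = free path and 1 = blocked. evaluate is called once per combination.
void for_each_combination(std::vector<int> layer,
                          const std::function<void(const std::vector<int>&)>& evaluate) {
    // Record which cells are free in the starting layout.
    std::vector<std::size_t> free_cells;
    for (std::size_t i = 0; i < layer.size(); ++i)
        if (layer[i] == 0) free_cells.push_back(i);

    const std::size_t n = free_cells.size();  // assumption: n < 64
    const std::uint64_t total = std::uint64_t{1} << n;
    for (std::uint64_t mask = 0; mask < total; ++mask) {
        // Bit b of mask set means free cell b is disabled in this combination.
        for (std::size_t b = 0; b < n; ++b)
            layer[free_cells[b]] = static_cast<int>((mask >> b) & 1u);
        evaluate(layer);
    }
}
```

With the L1 layout shown above (9 free cells), evaluate is called 512 times: the first call sees the original layer and the last sees every free cell blocked.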