In SAS: How to consolidate non zero values in rows by group

I have a dataset consisting of the variables ObservationNumber, MeasurementNumber, SubjectID, and many dummy variables.
I would like to consolidate all non-zero values into one row per SubjectID and MeasurementNumber.
Have:
ObsNum MeasurementNum SubjectID Dummy0 Dummy1 ... Dummy999
----------------------------------------------------...---------------
01 1 1 0 1 ... 0
02 2 1 0 1 ... 0
03 3 1 0 1 ... 0
04 4 1 0 0 ... 0
05 5 1 - - ... -
06 6 1 0 0 ... 0
07 1 2 1 0 ... 0
08 2 2 0 0 ... 0
09 3 2 0 1 ... 0
10 4 2 1 0 ... 0
11 4 2 0 1 ... 0
12 5 2 0 0 ... 1
13 6 2 0 0 ... 0
14 6 2 0 0 ... 1
15 6 2 0 0 ... 0
16 6 2 0 0 ... 0
17 6 2 0 1 ... 0
18 6 2 0 0 ... 0
19 6 2 0 0 ... 0
20 6 2 0 0 ... 0
21 6 2 1 0 ... 0
22 1 3 1 0 ... 0
23 2 3 0 1 ... 0
24 3 3 0 0 ... 1
25 4 3 - - ... -
26 5 3 0 0 ... 0
27 6 3 0 0 ... 0
28 1 4 - - ... -
29 2 4 0 0 ... 0
30 3 4 0 1 ... 0
31 4 4 1 0 ... 0
32 4 4 0 1 ... 0
33 4 4 0 0 ... 1
34 5 4 0 0 ... 1
35 6 4 0 1 ... 0
36 6 4 0 0 ... 1
Want:
MeasurementNum SubjectID Dummy0 Dummy1 ... Dummy999
----------------------------------------------------...---------------
1 1 0 1 ... 0
2 1 0 1 ... 0
3 1 0 1 ... 0
4 1 0 0 ... 0
5 1 - - ... -
6 1 0 0 ... 0
1 2 1 0 ... 0
2 2 0 0 ... 0
3 2 0 1 ... 0
4 2 1 1 ... 0
5 2 0 0 ... 1
6 2 1 1 ... 1
1 3 1 0 ... 0
2 3 0 1 ... 0
3 3 0 0 ... 1
4 3 - - ... -
5 3 0 0 ... 0
6 3 0 0 ... 0
1 4 - - ... -
2 4 0 0 ... 0
3 4 0 1 ... 0
4 4 1 1 ... 1
5 4 0 0 ... 1
6 4 0 1 ... 1
Each SubjectID has six measurements, in which a set of dummy variables is measured with outcome 0, 1 or missing. If a missing value occurs, all dummy variables for the respective observation are missing, and only one observation will be present in the dataset for that MeasurementNumber.
I have tried to use the UPDATE statement, but it does not seem to be able to deal with '0' and '-'.
Is there a direct way of condensing all dummy variables in this dataset for each SubjectID, grouped by MeasurementNumber?

Use Proc MEANS with BY and OUTPUT statements.
data have;
  rownum = 0;
  do rowid = 1 to 1000;
    subjectid + 1;
    do measurenum = 1 to 6;
      do repeat = 1 to ceil(4 * ranuni(123));
        array flags flag1-flag999;
        do _n_ = 1 to dim(flags);
          flags(_n_) = ranuni(123) < 0.10;
          if subjectid < 7 and measurenum = subjectid then flags(_n_) = .;
        end;
        rownum + 1;
        output;
      end;
    end;
  end;
  keep rownum measurenum subjectid flag:;
run;
proc means noprint data=have;
  by subjectid measurenum;
  var flag:;
  output max=;
run;
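To make the logic of the MAX statistic explicit, here is a minimal Python sketch of the same group-wise consolidation. It is only an illustration, not a translation of the SAS step: the function name `consolidate` and the tuple layout of the rows are assumptions, and `None` stands in for a SAS missing value.

```python
def consolidate(rows):
    """Collapse rows to one per (subject, measurement), taking the max of each flag.

    rows: iterable of (subject_id, measurement_num, flags), where flags is a
    list of 0, 1 or None (None plays the role of a missing value).
    """
    out = {}
    for subject, measure, flags in rows:
        key = (subject, measure)
        if key not in out:
            out[key] = list(flags)
            continue
        # Like MAX in PROC MEANS, ignore missing values unless every value is missing.
        out[key] = [
            b if a is None else a if b is None else max(a, b)
            for a, b in zip(out[key], flags)
        ]
    return out


rows = [
    (2, 4, [1, 0, 0]),
    (2, 4, [0, 1, 0]),
    (1, 5, [None, None, None]),
]
merged = consolidate(rows)
```

Because the flags are only 0 or 1, the maximum within a group is 1 exactly when at least one row in the group has a 1, which is the consolidation asked for.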

Related

time series sliding window with occurrence counts

I am trying to get a count between two timestamped values:
for example:
time letter
1 A
4 B
5 C
9 C
18 B
30 A
30 B
I am dividing time into windows of size (1 + 30) / 30,
then I want to know how many A, B and C there are in each time window of size 1:
timeseries A B C
1 1 0 0
2 0 0 0
...
30 1 1 0
This should give me a table of 30 rows and 3 columns (A, B, C) of occurrences.
The problem is that the data takes too long to break down, because the code iterates through the whole master table for every window to slice the data, even though the data is already sorted.
master = mytable
minimum = master.timestamp.min()
maximum = master.timestamp.max()
window = (minimum + maximum) / maximum
wstart = minimum
wend = minimum + window
concurrent_tasks = []
while ( wstart <= maximum ):
    As = 0
    Bs = 0
    Cs = 0
    for d, row in master.iterrows():
        ttime = row.timestamp
        if ((ttime >= wstart) & (ttime < wend)):
            #print (row.channel)
            if (row.channel == 'A'):
                As = As + 1
            elif (row.channel == 'B'):
                Bs = Bs + 1
            elif (row.channel == 'C'):
                Cs = Cs + 1
    concurrent_tasks.append([m_id, As, Bs, Cs])
    wstart = wstart + window
    wend = wend + window
Could you help me make this perform better? I want to use the map function, and I want to prevent Python from looping through the whole table for every window.
This is part of a big data job and it is taking days to finish.
thank you

There is a faster approach - pd.get_dummies():
In [116]: pd.get_dummies(df.set_index('time')['letter'])
Out[116]:
A B C
time
1 1 0 0
4 0 1 0
5 0 0 1
9 0 0 1
18 0 1 0
30 1 0 0
30 0 1 0
If you want to "compress" (group) it by time:
In [146]: pd.get_dummies(df.set_index('time')['letter']).groupby(level=0).sum()
Out[146]:
A B C
time
1 1 0 0
4 0 1 0
5 0 0 1
9 0 0 1
18 0 1 0
30 1 1 0
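or using sklearn.feature_extraction.text.CountVectorizer (note that pd.SparseDataFrame was removed in pandas 1.0, so this variant needs an older pandas; on current versions pd.DataFrame.sparse.from_spmatrix is the replacement constructor):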
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(token_pattern=r"\b\w+\b", stop_words=None)
r = pd.SparseDataFrame(cv.fit_transform(df.groupby('time')['letter'].agg(' '.join)),
index=df['time'].unique(),
columns=df['letter'].unique(),
default_fill_value=0)
Result:
In [143]: r
Out[143]:
A B C
1 1 0 0
4 0 1 0
5 0 0 1
9 0 0 1
18 0 1 0
30 1 1 0
If we want to list all times from 1 to 30:
In [153]: r.reindex(np.arange(r.index.min(), r.index.max()+1)).fillna(0).astype(np.int8)
Out[153]:
A B C
1 1 0 0
2 0 0 0
3 0 0 0
4 0 1 0
5 0 0 1
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 1
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
14 0 0 0
15 0 0 0
16 0 0 0
17 0 0 0
18 0 1 0
19 0 0 0
20 0 0 0
21 0 0 0
22 0 0 0
23 0 0 0
24 0 0 0
25 0 0 0
26 0 0 0
27 0 0 0
28 0 0 0
29 0 0 0
30 1 1 0
or using Pandas approach:
In [159]: pd.get_dummies(df.set_index('time')['letter']) \
...: .groupby(level=0) \
...: .sum() \
...: .reindex(np.arange(r.index.min(), r.index.max()+1), fill_value=0)
...:
Out[159]:
A B C
time
1 1 0 0
2 0 0 0
3 0 0 0
4 0 1 0
5 0 0 1
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 1
10 0 0 0
... .. .. ..
21 0 0 0
22 0 0 0
23 0 0 0
24 0 0 0
25 0 0 0
26 0 0 0
27 0 0 0
28 0 0 0
29 0 0 0
30 1 1 0
[30 rows x 3 columns]
UPDATE:
Timing:
In [163]: df = pd.concat([df] * 10**4, ignore_index=True)
In [164]: %timeit pd.get_dummies(df.set_index('time')['letter'])
100 loops, best of 3: 10.9 ms per loop
In [165]: %timeit df.set_index('time').letter.str.get_dummies()
1 loop, best of 3: 914 ms per loop
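For readers who want to see the single-pass idea without pandas, here is a rough standard-library sketch. It assumes integer timestamps and a fixed window width; the name `window_counts` and its parameters are made up for this illustration. Each event is assigned to its window by integer division, so the data is scanned once instead of once per window.

```python
from collections import Counter


def window_counts(events, start, end, width=1):
    """Count letters per time window in one pass over the events.

    events: iterable of (time, letter) pairs with integer times in [start, end].
    Returns the sorted letters and one row of counts per window.
    """
    events = list(events)
    n_windows = (end - start) // width + 1
    labels = sorted({letter for _, letter in events})
    counts = [Counter() for _ in range(n_windows)]
    for t, letter in events:
        # The window index follows directly from the timestamp; no rescanning.
        counts[(t - start) // width][letter] += 1
    return labels, [[c[label] for label in labels] for c in counts]


events = [(1, "A"), (4, "B"), (5, "C"), (9, "C"), (18, "B"), (30, "A"), (30, "B")]
labels, table = window_counts(events, start=1, end=30)
```

Here `table[0]` is the row for time 1 and `table[29]` the row for time 30, matching the 30 x 3 layout asked for in the question.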

SAS sgplot: different symbols and colours by group

The following code produces the picture below.
As you can see, the group statement results in different colours for the data points.
Question: How can I also have different symbols for the two groups?
proc sgplot data=test;
scatter x=time y=Y / group=group;
run;
group time Y
0 0 10085.472039
0 0 10085.472039
0 0 10085.472039
0 1 9950.3642122
0 2 9817.0663279
0 4 9555.8037259
0 6 9301.4941325
0 8 9053.9525066
0 8 9053.9525066
0 8 9053.9525066
1 0 2954.7558871
1 0 2954.7558871
1 0 2954.7558871
1 1 2987.6191302
1 2 3020.8478832
1 4 3088.4182255
1 6 3157.4999815
1 8 3228.1269586
1 8 3228.1269586
1 8 3228.1269586
0 0 3929.2678194
0 0 3929.2678194
0 0 3929.2678194
0 1 3903.7639936
0 2 3878.4257063
0 4 3828.2414563
0 6 3778.7065572
0 8 3729.8126068
0 8 3729.8126068
0 8 3729.8126068
1 0 2694.5952697
1 0 2694.5952697
1 0 2694.5952697
1 1 2580.159876
1 2 2470.5843807
1 4 2265.1962804
1 6 2076.8827929
1 8 1904.2244475
1 8 1904.2244475
1 8 1904.2244475

Using PROC GPLOT with SYMBOL statements, as described at http://www.ats.ucla.edu/stat/sas/faq/gr2grps_new.htm:
symbol1 v=star c=red h=1;
symbol2 v=triangle c=blue h=1;
proc gplot data=test;
plot y*time=group;
run;
quit;

generating combinations of combinations

I'm trying to generate code which will take the components (i.e, a-f) of various combination permutations (combo) one, two, three, or four units long using these six components and provide various non duplicating combinations of combinations (combo.combo) which contain all of the components (i.e., [ab + cdef and ac + bde + f] but not [ae + bc + df and aef + bc + d]).
It would be nice if this code could allow me to 1) input the number of components, 2) input the min and max unit length per combo, 3) input the min and max number of combos per combo.combo, and 4) randomize the output list of combo.combos.
Maybe start with some kind of iteration loop to generate each version of the 720 possible component combinations (a-f) and then start pruning that list based on the set limiting parameters? I've got some working knowledge of python and will get started, but any tips or suggestions are most welcome.
combo.combo a b c d e f
a.bcdef 1 1 1 1 1 1
ab.cdef 1 1 1 1 1 1
abc.def 1 1 1 1 1 1
abcd.ef 1 1 1 1 1 1
abcde.f 1 1 1 1 1 1
a.b.cdef 1 1 1 1 1 1
a.bc.def 1 1 1 1 1 1
a.bcd.ef 1 1 1 1 1 1
a.bcde.f 1 1 1 1 1 1
ab.c.def 1 1 1 1 1 1
I've found a lot of code which will generate combination permutations, but not combinations of combinations. I've included a binary matrix for the combination components, but I am stuck on where to proceed from here, or whether this matrix is a false start (although it is a helpful visual aid).
combo a b c d e f
a 1 0 0 0 0 0
b 0 1 0 0 0 0
c 0 0 1 0 0 0
d 0 0 0 1 0 0
e 0 0 0 0 1 0
f 0 0 0 0 0 1
ab 1 1 0 0 0 0
ac 1 0 1 0 0 0
ad 1 0 0 1 0 0
ae 1 0 0 0 1 0
af 1 0 0 0 0 1
bc 0 1 1 0 0 0
bd 0 1 0 1 0 0
be 0 1 0 0 1 0
bf 0 1 0 0 0 1
cd 0 0 1 1 0 0
ce 0 0 1 0 1 0
cf 0 0 1 0 0 1
de 0 0 0 1 1 0
df 0 0 0 1 0 1
ef 0 0 0 0 1 1
abc 1 1 1 0 0 0
abd 1 1 0 1 0 0
abe 1 1 0 0 1 0
abf 1 1 0 0 0 1
acd 1 0 1 1 0 0
ace 1 0 1 0 1 0
acf 1 0 1 0 0 1
ade 1 0 0 1 1 0
adf 1 0 0 1 0 1
aef 1 0 0 0 1 1
bcd 0 1 1 1 0 0
bce 0 1 1 0 1 0
bcf 0 1 1 0 0 1
bde 0 1 0 1 1 0
bdf 0 1 0 1 0 1
bef 0 1 0 0 1 1
cde 0 0 1 1 1 0
cdf 0 0 1 1 0 1
cef 0 0 1 0 1 1
def 0 0 0 1 1 1
abcd 1 1 1 1 0 0
abce 1 1 1 0 1 0
abcf 1 1 1 0 0 1
abde 1 1 0 1 1 0
abdf 1 1 0 1 0 1
abef 1 1 0 0 1 1
acde 1 0 1 1 1 0
acdf 1 0 1 1 0 1
acef 1 0 1 0 1 1
adef 1 0 0 1 1 1
bcde 0 1 1 1 1 0
bcdf 0 1 1 1 0 1
bcef 0 1 1 0 1 1
bdef 0 1 0 1 1 1
cdef 0 0 1 1 1 1

The approach which first comes to mind is this:
1. generate all the combinations using the given components (which you already did :) )
2. treat the resulting combinations as a new set of components (so instead of a, b,...,f your set will contain a, ab, abc, ...)
3. generate all the combinations from the second set
4. from the new set of combinations only keep those which apply to your condition (it's not very clear from your example what the constraint is)
This, of course, has sky-high exponential complexity, since you'll have to backtrack twice and step 3 has way more possibilities.
It's very possible that there's a more efficient algorithm, starting from the constraint ("non duplicating combinations of combinations which contain all of the components").
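One way to start from the constraint is sketched below, under the assumption that a valid combo.combo is a set partition: every component appears in exactly one combo. This is a different technique from the generate-and-prune steps above. The function name `combo_combos` and its parameters are hypothetical, and the components are assumed to be distinct. Each partition is produced exactly once by always placing the first remaining component into the next combo.

```python
import itertools
import random


def combo_combos(components, min_len=1, max_len=4, min_combos=2, max_combos=3):
    """Yield partitions of the components as strings such as 'ab.cdef'."""

    def extend(remaining, blocks):
        if not remaining:
            if len(blocks) >= min_combos:
                yield ".".join("".join(block) for block in blocks)
            return
        if len(blocks) == max_combos:
            return  # components are left over but no further combo is allowed
        first, rest = remaining[0], remaining[1:]
        # The first remaining component anchors the next combo, which avoids
        # emitting the same partition with its combos in a different order.
        for extra in range(min_len - 1, max_len):
            for chosen in itertools.combinations(rest, extra):
                left = [c for c in rest if c not in chosen]
                yield from extend(left, blocks + [(first,) + chosen])

    yield from extend(list(components), [])


result = list(combo_combos("abcdef"))
random.shuffle(result)  # randomize the order of the output list
```

With the limits opened up completely (combo lengths 1 to 6, 1 to 6 combos) this enumerates all 203 set partitions of six components, so the search space is far smaller than the double backtracking suggests.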

Find all puddles on the square (algorithm)

The problem is defined as follows:
You're given a square. The square is paved with flat flagstones of size 1m x 1m. Grass surrounds the square. Flagstones may be at different heights. It starts raining. Determine where puddles will form and compute how much water they will contain. Water doesn't flow through the corners. Any area of grass can soak up any volume of water at any time.
Input:
width height
width*height non-negative numbers describing a height of each flagstone over grass level.
Output:
Volume of water from puddles.
width*height signs describing places where puddles will be created and places won't.
. - no puddle
# - puddle
Examples
Input:
8 8
0 0 0 0 0 1 0 0
0 1 1 1 0 1 0 0
0 1 0 2 1 2 4 5
0 1 1 2 0 2 4 5
0 3 3 3 3 3 3 4
0 3 0 1 2 0 3 4
0 3 3 3 3 3 3 0
0 0 0 0 0 0 0 0
Output:
11
........
........
..#.....
....#...
........
..####..
........
........
Input:
16 16
8 0 1 0 0 0 0 2 2 4 3 4 5 0 0 2
6 2 0 5 2 0 0 2 0 1 0 3 1 2 1 2
7 2 5 4 5 2 2 1 3 6 2 0 8 0 3 2
2 5 3 3 0 1 0 3 3 0 2 0 3 0 1 1
1 0 1 4 1 1 2 0 3 1 1 0 1 1 2 0
2 6 2 0 0 3 5 5 4 3 0 4 2 2 2 1
4 2 0 0 0 1 1 2 1 2 1 0 4 0 5 1
2 0 2 0 5 0 1 1 2 0 7 5 1 0 4 3
13 6 6 0 10 8 10 5 17 6 4 0 12 5 7 6
7 3 0 2 5 3 8 0 3 6 1 4 2 3 0 3
8 0 6 1 2 2 6 3 7 6 4 0 1 4 2 1
3 5 3 0 0 4 4 1 4 0 3 2 0 0 1 0
13 3 6 0 7 5 3 2 21 8 13 3 5 0 13 7
3 5 6 2 2 2 0 2 5 0 7 0 1 3 7 5
7 4 5 3 4 5 2 0 23 9 10 5 9 7 9 8
11 5 7 7 9 7 1 0 17 13 7 10 6 5 8 10
Output:
103
................
..#.....###.#...
.......#...#.#..
....###..#.#.#..
.#..##.#...#....
...##.....#.....
..#####.#..#.#..
.#.#.###.#..##..
...#.......#....
..#....#..#...#.
.#.#.......#....
...##..#.#..##..
.#.#.........#..
......#..#.##...
.#..............
................
I tried different approaches: flood fill from the max value, then from the min value, but it doesn't work for every input or it requires complicated code. Any ideas?
I'm interested in an algorithm with complexity O(n^2) or O(n^3).

Summary
I would be tempted to try and solve this using a disjoint-set data structure.
The algorithm would be to iterate over all heights in the map performing a floodfill operation at each height.
Details
For each height x (starting at 0):
1. Connect all flagstones of height x to their neighbours if the neighbour height is <= x (storing connected sets of flagstones in the disjoint set data structure)
2. Remove any sets that are connected to the grass
3. Mark all flagstones of height x in the still remaining sets as being puddles
4. Add the total count of flagstones in the remaining sets to a total t
At the end, t gives the total volume of water.
Worked Example
0 0 0 0 0 1 0 0
0 1 1 1 0 1 0 0
0 1 0 2 1 2 4 5
0 1 1 2 0 2 4 5
0 3 3 3 3 3 3 4
0 3 0 1 2 0 3 4
0 3 3 3 3 3 3 0
0 0 0 0 0 0 0 0
Connect all flagstones of height 0 into sets A,B,C,D,E,F
A A A A A 1 B B
A 1 1 1 A 1 B B
A 1 C 2 1 2 4 5
A 1 1 2 D 2 4 5
A 3 3 3 3 3 3 4
A 3 E 1 2 F 3 4
A 3 3 3 3 3 3 A
A A A A A A A A
Remove flagstones connecting to the grass, and mark remaining as puddles
1
1 1 1 1
1 C 2 1 2 4 5 #
1 1 2 D 2 4 5 #
3 3 3 3 3 3 4
3 E 1 2 F 3 4 # #
3 3 3 3 3 3
Count remaining set size t=4
Connect all of height 1
G
C C C G
C C 2 D 2 4 5 #
C C 2 D 2 4 5 #
3 3 3 3 3 3 4
3 E E 2 F 3 4 # #
3 3 3 3 3 3
Remove flagstones connecting to the grass, and mark remaining as puddles
2 2 4 5 #
2 2 4 5 #
3 3 3 3 3 3 4
3 E E 2 F 3 4 # # #
3 3 3 3 3 3
t=4+3=7
Connect all of height 2
A B 4 5 #
A B 4 5 #
3 3 3 3 3 3 4
3 E E E E 3 4 # # #
3 3 3 3 3 3
Remove flagstones connecting to the grass, and mark remaining as puddles
4 5 #
4 5 #
3 3 3 3 3 3 4
3 E E E E 3 4 # # # #
3 3 3 3 3 3
t=7+4=11
Connect all of height 3
4 5 #
4 5 #
E E E E E E 4
E E E E E E 4 # # # #
E E E E E E
Remove flagstones connecting to the grass, and mark remaining as puddles
4 5 #
4 5 #
4
4 # # # #
After doing this for heights 4 and 5 nothing will remain.
A preprocessing step to create lists of all locations with each height should mean that the algorithm is close to O(n^2).
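A compact Python sketch of this disjoint-set idea follows. It is one possible reading of the steps above rather than the answerer's own code: the function name `puddles`, the extra "grass" node and the trick of multiplying the trapped count by the gap to the next height are assumptions made for this illustration. The input is taken as a list of rows of non-negative integer heights.

```python
def puddles(grid):
    """Return (volume, picture) for a grid of flagstone heights."""
    rows, cols = len(grid), len(grid[0])
    grass = rows * cols                      # extra node standing for the grass
    parent = list(range(grass + 1))
    size = [1] * (grass + 1)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]    # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]

    by_height = {}                           # preprocessing: cells grouped by height
    for r in range(rows):
        for c in range(cols):
            by_height.setdefault(grid[r][c], []).append((r, c))

    heights = sorted(by_height)
    wet = [[False] * cols for _ in range(rows)]
    volume = active = 0
    for i, h in enumerate(heights):
        for r, c in by_height[h]:
            active += 1
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if not (0 <= nr < rows and 0 <= nc < cols):
                    union(r * cols + c, grass)           # border cell drains
                elif grid[nr][nc] <= h:
                    union(r * cols + c, nr * cols + nc)
        for r, c in by_height[h]:
            wet[r][c] = find(r * cols + c) != find(grass)
        # Cells at or below h that are not connected to the grass hold water
        # up to the next height level.
        trapped = active - (size[find(grass)] - 1)
        if i + 1 < len(heights):
            volume += trapped * (heights[i + 1] - h)
    return volume, ["".join("#" if w else "." for w in row) for row in wet]


example = [
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 2, 1, 2, 4, 5],
    [0, 1, 1, 2, 0, 2, 4, 5],
    [0, 3, 3, 3, 3, 3, 3, 4],
    [0, 3, 0, 1, 2, 0, 3, 4],
    [0, 3, 3, 3, 3, 3, 3, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
volume, picture = puddles(example)
```

Sets are never split once merged, so instead of physically removing the sets that touch the grass, the sketch only tracks the size of the set containing the grass node and subtracts it from the number of flagstones processed so far.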

This seems to be working nicely. The idea is a recursive function that checks whether there is an "outward flow" that allows the water to escape to the edge. Flagstones that do not have such an escape will puddle. I tested it on your two input files and it works quite nicely. I copied the output for these two files for you (1 means the water drains away, 0 means a puddle). Pardon my nasty use of global variables and what not, I figured it was the concept behind the algorithm that mattered, not good style :)
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

int SIZE_X;  // number of rows (height)
int SIZE_Y;  // number of columns (width)
bool **result;
int **INPUT;

// True if water standing at level `value` can reach the grass from (x, y)
// by moving only through flagstones that are no higher than `value`.
bool flowToEdge(int x, int y, int value, vector<char>& visited) {
    if(x < 0 || x == SIZE_X || y < 0 || y == SIZE_Y) return true;  // stepped onto the grass
    if(visited[(x * SIZE_Y) + y]) return false;
    if(value < INPUT[x][y]) return false;  // blocked by a higher flagstone
    visited[(x * SIZE_Y) + y] = true;
    bool left = flowToEdge(x-1, y, value, visited);
    bool right = flowToEdge(x+1, y, value, visited);
    bool up = flowToEdge(x, y+1, value, visited);
    bool down = flowToEdge(x, y-1, value, visited);
    return (left || up || down || right);
}

int main() {
    ifstream myReadFile;
    myReadFile.open("test.txt");
    // The input gives the width first, then the height; x indexes rows here.
    myReadFile >> SIZE_Y;
    myReadFile >> SIZE_X;
    INPUT = new int*[SIZE_X];
    result = new bool*[SIZE_X];
    for(int i = 0; i < SIZE_X; i++) {
        INPUT[i] = new int[SIZE_Y];
        result[i] = new bool[SIZE_Y];
        for(int j = 0; j < SIZE_Y; j++) {
            int someInt;
            myReadFile >> someInt;
            INPUT[i][j] = someInt;
            result[i][j] = false;
        }
    }
    for(int i = 0; i < SIZE_X; i++) {
        for(int j = 0; j < SIZE_Y; j++) {
            // Fresh visited map for every starting flagstone
            vector<char> visited(SIZE_X * SIZE_Y, 0);
            result[i][j] = flowToEdge(i, j, INPUT[i][j], visited);
        }
    }
    // Print 1 where the water drains to the grass and 0 where it puddles
    for(int i = 0; i < SIZE_X; i++) {
        cout << endl;
        for(int j = 0; j < SIZE_Y; j++)
            cout << result[i][j];
    }
    cout << endl;
}
The 16 by 16 file:
1111111111111111
1101111100010111
1111111011101011
1111000110101011
1011001011101111
1110011111011111
1100000101101011
1010100010110011
1110111111101111
1101101011011101
1010111111101111
1110011010110011
1010111111111011
1111110110100111
1011111111111111
1111111111111111
The 8 by 8 file
11111111
11111111
11011111
11110111
11111111
11000011
11111111
11111111
You could optimize this algorithm easily and considerably by doing several things. First, returning true immediately upon finding a route would speed it up considerably. You could also connect it globally to the current set of results, so that any given point would only have to find a path to an already known flow point, and not all the way to the edge.
As for the work involved: each of the n flagstones may have to examine every other flagstone. With the optimizations we should be able to get this much lower than n^2 for most cases, but it is still an n^3 algorithm in the worst case, although constructing such a case would be very difficult (with proper optimization logic... dynamic programming for the win!)
EDIT:
The modified code works for the following circumstances:
8 8
1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 1
1 0 1 1 1 1 0 1
1 0 1 0 0 1 0 1
1 0 1 1 0 1 0 1
1 0 1 1 0 1 0 1
1 0 0 0 0 1 0 1
1 1 1 1 1 1 1 1
And these are the results:
11111111
10000001
10111101
10100101
10110101
10110101
10000101
11111111
Now when we remove that 1 at the bottom we want to see no puddling.
8 8
1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 1
1 0 1 1 1 1 0 1
1 0 1 0 0 1 0 1
1 0 1 1 0 1 0 1
1 0 1 1 0 1 0 1
1 0 0 0 0 1 0 1
1 1 1 1 1 1 0 1
And these are the results
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

J (Tacit) Sieve Of Eratosthenes

I'm looking for a J code to do the following.
Suppose I have a list of random integers (sorted),
2 3 4 5 7 21 45 49 61
I want to start with the first element and remove any multiples of the element in the list then move on to the next element cancel out its multiples, so on and so forth.
Thus the output
I'm looking at is 2 3 5 7 61. Basically a Sieve Of Eratosthenes. Would appreciate if someone could explain the code as well, since I'm learning J and find it difficult to get most codes :(
Regards,
babsdoc

It's not exactly what you ask but here is a more idiomatic (and much faster) version of the Sieve.
Basically, what you need is to check which number is a multiple of which. You can get this from the table of modulos: |/~
l =: 2 3 4 5 7 21 45 49 61
|/~ l
0 1 0 1 1 1 1 1 1
2 0 1 2 1 0 0 1 1
2 3 0 1 3 1 1 1 1
2 3 4 0 2 1 0 4 1
2 3 4 5 0 0 3 0 5
2 3 4 5 7 0 3 7 19
2 3 4 5 7 21 0 4 16
2 3 4 5 7 21 45 0 12
2 3 4 5 7 21 45 49 0
Every pair of multiples gives a 0 in the table. Now, we are not interested in the 0s that correspond to self-modulos (2 mod 2, 3 mod 3, etc.; the 0s on the diagonal), so we have to remove them. One way to do this is to add 1s in their place, like so:
=/~ l
1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1
(=/~l) + (|/~l)
1 1 0 1 1 1 1 1 1
2 1 1 2 1 0 0 1 1
2 3 1 1 3 1 1 1 1
2 3 4 1 2 1 0 4 1
2 3 4 5 1 0 3 0 5
2 3 4 5 7 1 3 7 19
2 3 4 5 7 21 1 4 16
2 3 4 5 7 21 45 1 12
2 3 4 5 7 21 45 49 1
This can be also written as (=/~ + |/~) l.
From this table we get the final list of numbers: every number whose column contains a 0, is excluded.
We build this list of exclusions simply by multiplying by column. If a column contains a 0, its product is 0 otherwise it's a positive number:
*/ (=/~ + |/~) l
256 2187 0 6250 14406 0 0 0 18240
Before doing the last step, we'll have to improve this a little. There is no reason to perform long multiplications since we are only interested in 0s and not-0s. So, when building the table, we'll keep only 0s and 1s by taking the "sign" of each number (this is signum: *):
* (=/~ + |/~) l
1 1 0 1 1 1 1 1 1
1 1 1 1 1 0 0 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 1 1
1 1 1 1 1 0 1 0 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
so,
*/ * (=/~ + |/~) l
1 1 0 1 1 0 0 0 1
From the list of exclusions, you just copy (#) the numbers to your final list:
l #~ */ * (=/~ + |/~) l
2 3 5 7 61
or,
(]#~[:*/[:*=/~+|/~) l
2 3 5 7 61
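For readers who do not yet read J fluently, the same table logic can be sketched in Python. This is only an illustrative restatement; the function name `sieve_by_table` is made up. A number is kept when no other value in the list divides it, which is what a column without a 0 means in the table above.

```python
def sieve_by_table(numbers):
    """Keep the numbers that no other value in the list divides."""
    kept = []
    for n in numbers:
        # One column of the modulo table: n mod d for every d in the list.
        # Equal values (the diagonal) are skipped, like adding =/~ in the J version.
        if all(n % d != 0 for d in numbers if d != n):
            kept.append(n)
    return kept


primes_like = sieve_by_table([2, 3, 4, 5, 7, 21, 45, 49, 61])
```

Like the J table, this compares every pair of numbers, so it does work proportional to the square of the list length.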

Tacit iteration is usually done with the conjunction Power. When the test for completion needs to be something other than hitting a fixpoint, the Do While construction works well.
In this solution filterMultiplesOfHead is applied repeatedly until no unprocessed numbers remain, that is, until every number has either been used as a head or been filtered out. Numbers already used are accumulated in a partial answer. When the list to be processed is empty, the partial answer is the result, after stripping off the boxing used to segregate processed from unprocessed data.
filterMultiplesOfHead=: {. (((~: >.)@ %~) # ]) }.
appendHead=: (>@[ , {.@>@])/
pass=: appendHead ; filterMultiplesOfHead@>@{:
prep=: a: , <
unfinished=: [: -. a: -: {:
sieve=: [: ; [: pass^:unfinished^:_ prep
sieve 2 3 4 5 7 21 45 49 61
2 3 5 7 61
prep 2 3 4 7 9 10
┌┬────────────┐
││2 3 4 7 9 10│
└┴────────────┘
appendHead prep 2 3 4 7 9 10
2
filterMultiplesOfHead 2 3 4 7 9 10
3 7 9
pass^:2 prep 2 3 4 7 9 10
┌───┬─┐
│2 3│7│
└───┴─┘
sieve 1-.~/:~~.>:?.$~100
2 3 7 11 29 31 41 53 67 73 83 95 97
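The iteration itself can be spelled out in Python as a rough sketch. The names `sieve`, `done` and `todo` are chosen for this illustration; `done` plays the role of the partial answer in the first box and `todo` the unprocessed list in the second.

```python
def sieve(numbers):
    """Repeatedly take the head of the list and drop its multiples from the rest."""
    done, todo = [], list(numbers)
    while todo:                       # corresponds to the `unfinished` test
        head, tail = todo[0], todo[1:]
        done.append(head)             # corresponds to appendHead
        todo = [n for n in tail if n % head != 0]   # filterMultiplesOfHead
    return done


survivors = sieve([2, 3, 4, 5, 7, 21, 45, 49, 61])
```

After two passes over 2 3 4 7 9 10 the state would be done = [2, 3] and todo = [7], the same intermediate result as pass^:2 shows above.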
