How does LZW work for encoding and decoding? - compression

I need help understanding how LZW works: how it encodes data using the dictionary, and how the data is decoded back. I need this for my compression work.

Here's a simple example. Imagine you are trying to compress AAABBC.
Encoding
We'll assume that the original dictionary has all the letters A-Z with codes 1-26.
Here are all the iterations of the encoding:
i   word   x   word+x   Output   To dictionary
------------------------------------------------
0          A   A
1   A      A   AA       1        AA: 27
2   A      A   AA
3   AA     B   AAB      27       AAB: 28
4   B      B   BB       2        BB: 29
5   B      C   BC       2        BC: 30
6                       3
The output is: 1 27 2 2 3
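The encoding steps in the table can be sketched in Python. This is a minimal illustration, not a production encoder: the function name lzw_encode is made up for this example, and the initial dictionary (A-Z mapped to 1-26) is the same assumption the example makes.

```python
import string


def lzw_encode(text):
    # Assumed initial dictionary: A-Z with codes 1-26, as in the example.
    dictionary = {ch: i for i, ch in enumerate(string.ascii_uppercase, start=1)}
    next_code = len(dictionary) + 1  # first free code is 27

    word = ""
    output = []
    for x in text:
        if word + x in dictionary:
            # Keep extending the current word while it is still known.
            word += x
        else:
            # Emit the code for the longest known word, then learn word+x.
            output.append(dictionary[word])
            dictionary[word + x] = next_code
            next_code += 1
            word = x
    if word:
        # Flush whatever is left at the end of the input.
        output.append(dictionary[word])
    return output


print(lzw_encode("AAABBC"))  # [1, 27, 2, 2, 3]
```

The final flush corresponds to row 6 of the table, where the remaining word C is written out as code 3.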
Decoding
We'll assume, again, that the original dictionary has all the letters A-Z with codes 1-26.
i   x    element   word   Output   To dictionary
-------------------------------------------------
0   1    A         A      A
1   27   AA        A      AA       AA: 27
2   2    B         AA     B        AAB: 28
3   2    B         B      B        BB: 29
4   3    C         B      C        BC: 30
Output: A AA B B C, which concatenates to AAABBC, the original string.
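The decoding side can be sketched the same way. Again this is only an illustration under the same assumptions (initial dictionary A-Z as 1-26; the name lzw_decode is hypothetical). The interesting case is a code that is not in the dictionary yet, such as 27 in row 1: the decoder rebuilds it as the previous word plus that word's first letter.

```python
import string


def lzw_decode(codes):
    # Assumed initial dictionary: codes 1-26 map to A-Z, as in the example.
    dictionary = {i: ch for i, ch in enumerate(string.ascii_uppercase, start=1)}
    next_code = len(dictionary) + 1  # first free code is 27

    if not codes:
        return ""

    # The first code is always a single known symbol.
    word = dictionary[codes[0]]
    output = [word]
    for x in codes[1:]:
        if x in dictionary:
            element = dictionary[x]
        elif x == next_code:
            # The encoder used an entry it had only just created,
            # so it must be the previous word plus its own first letter.
            element = word + word[0]
        else:
            raise ValueError(f"invalid code: {x}")
        output.append(element)
        # Rebuild the same entry the encoder added at this step.
        dictionary[next_code] = word + element[0]
        next_code += 1
        word = element
    return "".join(output)


print(lzw_decode([1, 27, 2, 2, 3]))  # AAABBC
```

The decoder never receives the dictionary; it reconstructs entries 27 to 30 in the same order the encoder created them.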
You can play around with it yourself to get a hang of it here:
LZW encoding
LZW decoding

Related

Search for a value in multiple columns and return value from Column A

Sheet Example:
A        B  C   D   E   F
Jonas    1  6   11  16  21
Joaquin  2  7   12  17  22
William  3  8   13  18  23
Mark     4  9   14  19  24
Stuart   5  10  15  20  25
Search value example:
19
Expected Return:
Mark
Formula indicated:
https://stackoverflow.com/a/55119579/11462274
=QUERY(Clients!A1:F, "select A where B="&B1&"
or C="&B1&"
or D="&B1&"
or E="&B1&"
or F="&B1&"", 1)
But the result is:
Jonas
Stuart
Why is Jonas returned when there is no value 19 in row 1?
Additional info:
If I have values in columns B through CC, is this still the recommended method? I ask because I would have to write out an enormous number of conditions, one for each of these columns.
try:
=INDEX(TEXTJOIN(", ", 1, IF(B1:F5=I1, A1:A5, )))
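Try using an INDEX(MATCH()) formula. Note that the example below only searches row 4 (B4:F4), and MATCH returns a position within that row, so it returns Mark here only because 19 is the fourth value in row 4.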
=INDEX(A1:A5,MATCH(19,B4:F4,0))

SAS IF then statement

Hello, for whatever reason my IF-THEN statement will not work in this code. What I am trying to do is: if the salary is less than or equal to 30,000, set a new variable income to 'low'. Here is what I have so far.
data newdd2;
input subject group$ year salary : comma7. @@;
IF (salary <= 30,000) THEN income = 'low';
datalines;
1 A 2 53,900 2 B 2 37,400 3 A 1 49,500
4 C 2 43,900 5 B 3 38,400 6 A 3 39,500
7 A 3 53,600 8 B 2 37,700 9 C 1 49,900
10 C 2 43,300 11 B 3 57,400 12 B 3 39,500
13 B 1 33,900 14 A 2 41,400 15 C 2 49,500
16 C 1 43,900 17 B 1 39,400 18 A 3 39,900
19 A 2 53,600 20 A 2 37,700 21 C 3 42,900
22 C 2 43,300 23 B 1 57,400 24 C 3 69,500
25 C 2 33,900 26 A 2 35,300 27 A 2 47,500
28 C 2 43,900 29 B 3 38,400 30 A 1 32,500
31 A 3 53,600 32 B 2 37,700 33 C 1 41,600
34 C 2 43,300 35 B 3 57,400 36 B 3 39,500
37 B 2 33,900 38 A 2 41,400 39 C 2 79,500
40 C 1 43,900 41 C 1 29,500 42 A 3 39,900
43 A 2 53,600 44 A 2 37,500 45 C 3 42,900
46 C 2 43,300 47 B 1 47,400 48 C 3 59,500
run;
The error I keep getting is (The work dataset may be incomplete). However, I am sure that my code is correct. I've tried a number of things without success. Thanks in advance.
You cannot use a comma in a numeric literal; write 30000 instead of 30,000:
IF (salary <= 30000) THEN income = 'low';

SAS find in order variables

I need to find which ids have the grades A-B-C in order. Please see the table below for an example:
id term grade subj num
10 2002 D 332 1
10 2002 A 333 2
11 2005 C 232 1
11 2005 A 232 2
11 2005 B 232 3
11 2005 C 232 4
15 2010 A 130 1
15 2010 B 130 2
15 2010 C 130 3
20 2000 B 500 1
20 2000 A 500 2
20 2000 C 500 3
What I need from this table is ids 11 and 15.
The output should look like this:
id term subj
11 2005 232
15 2010 130
So I need to list the ids whose grade was 'A', then changed to 'B', then changed to 'C'.
The sequence doesn't have to start at num = 1; it could start at 1, 2, 3, etc. But the grades must appear in the order A, then B, then C.
I don't need to see id 20, because its grades are not in that order when sorted by num.
If all you are looking for is a simple 'A'-'B'-'C' sequence, then the LAG() function is sufficient. That is what I show in the example below. If you are looking for more sequences (e.g. 'A'-'B', 'B'-'C', 'A'-'B'-'C'-'D'), a slightly more complex solution is needed. If so, I'll edit the answer accordingly.
Below is a test program showing the implementation:
DATA d1;
INPUT
id :8.
term :8.
grade :$2.
subj :8.
num :8.
;
DATALINES;
10 2002 D 332 1
10 2002 A 333 2
11 2005 C 232 1
11 2005 A 232 2
11 2005 B 232 3
11 2005 C 232 4
15 2010 A 130 1
15 2010 B 130 2
15 2010 C 130 3
;
RUN;
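DATA d2 (
KEEP = id term subj
);
SET d1;
/* Call LAG on its own lines so the queues are updated on every row */
grade_previous_1 = LAG1(grade);
grade_previous_2 = LAG2(grade);
/* Assumes rows are sorted by id and num: also compare the id from two rows back so a match cannot span two ids */
id_previous_2 = LAG2(id);
IF (grade = 'C' AND grade_previous_1 = 'B' AND grade_previous_2 = 'A' AND id = id_previous_2);
RUN;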
Note that the LAG functions must be evaluated on their own lines and stored in variables, as shown above - don't fold them into the IF conditions or they won't always get executed. That is, don't say:
IF (grade = 'C' AND LAG1(grade) = 'B' AND LAG2(grade) = 'A');
That actually works in this example but in general it's better to get into the habit of calling LAG() outside of IF conditions and storing results in temporary variables.
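For readers who want to see the windowing logic outside SAS, here is a rough Python sketch of the same idea: slide over three consecutive rows and test for A, B, C. It is not a translation of the DATA step. The function name ids_with_abc and the tuple layout are made up for this illustration, and it assumes the rows are already sorted by id and num and that all three rows must share the same id.

```python
def ids_with_abc(rows):
    """rows: (id, term, grade, subj, num) tuples, assumed sorted by id and num."""
    matches = []
    for i in range(2, len(rows)):
        # The two previous rows play the role of LAG2 and LAG1.
        first, second, third = rows[i - 2], rows[i - 1], rows[i]
        same_id = first[0] == second[0] == third[0]
        grades = (first[2], second[2], third[2])
        if same_id and grades == ("A", "B", "C"):
            matches.append((third[0], third[1], third[3]))  # id, term, subj
    return matches


# Sample rows copied from the question, including id 20.
rows = [
    (10, 2002, "D", 332, 1), (10, 2002, "A", 333, 2),
    (11, 2005, "C", 232, 1), (11, 2005, "A", 232, 2),
    (11, 2005, "B", 232, 3), (11, 2005, "C", 232, 4),
    (15, 2010, "A", 130, 1), (15, 2010, "B", 130, 2),
    (15, 2010, "C", 130, 3),
    (20, 2000, "B", 500, 1), (20, 2000, "A", 500, 2),
    (20, 2000, "C", 500, 3),
]
print(ids_with_abc(rows))  # [(11, 2005, 232), (15, 2010, 130)]
```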

Pandas Multi-indexing from a Flatten DataFrame

I want to reshape a flat DataFrame into one with multi-indexed columns.
The DataFrame looks like this:
goods category month stock
a c1 1 5
a c1 2 0
a c1 3 0
a c2 1 0
a c2 2 10
a c2 3 0
b c1 1 30
b c1 2 0
b c1 3 10
b c2 1 0
b c2 2 40
b c2 3 0
And I would like it to look like this:
        stock
goods      a       b
category  c1  c2  c1  c2
month
1          5   0  30   0
2          5  10  30  40
3          5  10  10  40
I tried a few things with groupby and stack, but I couldn't find a good way. Does anyone know how to do this?
With unstack (to use this you first have to set the multi-index):
In [48]: df.set_index(['goods', 'category', 'month']).unstack([0,1])
Out[48]:
        stock
goods      a       b
category  c1  c2  c1  c2
month
1          5   0  30   0
2          0  10   0  40
3          0   0  10   0
Alternative with pivot_table (but be aware, if you have multiple values with the same combination of goods/category/month, they will be averaged by default (another function can be specified)):
In [54]: df.pivot_table(columns=['goods', 'category'], index='month', values='stock')
Out[54]:
goods      a       b
category  c1  c2  c1  c2
month
1          5   0  30   0
2          0  10   0  40
3          0   0  10   0
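To make the averaging caveat concrete, here is a small sketch with a deliberately duplicated goods/category/month combination. The data is invented for illustration and is not the frame from the question. It compares the default aggregation with aggfunc="sum".

```python
import pandas as pd

# Invented sample data: goods "a", category "c1", month 1 appears twice.
df = pd.DataFrame({
    "goods": ["a", "a", "a", "b"],
    "category": ["c1", "c1", "c2", "c1"],
    "month": [1, 1, 1, 1],
    "stock": [5, 7, 10, 30],
})

# Default aggregation is the mean, so the duplicated cell becomes (5 + 7) / 2.
mean_table = df.pivot_table(index="month", columns=["goods", "category"], values="stock")

# Passing aggfunc="sum" adds the duplicates instead.
sum_table = df.pivot_table(
    index="month", columns=["goods", "category"], values="stock", aggfunc="sum"
)

print(mean_table.loc[1, ("a", "c1")])  # 6.0
print(sum_table.loc[1, ("a", "c1")])   # 12
```

With duplicates like these, the set_index plus unstack approach raises an error instead of aggregating, so pivot_table is the one to reach for when the index is not unique.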

All possible combinations of elements

I'd like to know a possible algorithm to calculate all possible combinations, without repetitions, starting from length=1 until length=N of N elements.
Example:
Elements: 1, 2, 3.
Output:
1
2
3
12
13
23
123
Look at the binary representation of the numbers 0 to 2^n - 1.
n = 3
i   Binary   Combination
    CBA
0   000
1   001      A
2   010      B
3   011      A B
4   100      C
5   101      A C
6   110      B C
7   111      A B C
So you just have to enumerate the numbers 1 to 2^n - 1 and look at the binary representation to know which elements to include. If you want them ordered by the number of elements, sort them afterwards or generate the numbers in that order (there are several examples on SO).
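A minimal Python sketch of the bitmask enumeration, assuming the elements are given as a list (the function name combinations_by_bitmask is made up for this example). Bit i of the counter decides whether element i is included, and a stable sort by length afterwards gives the ordering from the question.

```python
def combinations_by_bitmask(elements):
    n = len(elements)
    result = []
    # Skip 0, which would be the empty combination.
    for mask in range(1, 2 ** n):
        # Include element i when bit i of the mask is set.
        combo = [elements[i] for i in range(n) if mask & (1 << i)]
        result.append(combo)
    return result


combos = combinations_by_bitmask([1, 2, 3])
# Stable sort: shorter combinations first, ties keep enumeration order.
combos.sort(key=len)
print(combos)  # [[1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]
```

In real Python code, itertools.combinations called once per length produces the same result without the sort; the bitmask version is shown because it is the technique described above and ports easily to other languages.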