How can I match two columns of data by name? - openoffice-calc

I have two sets of data that look something like this:
Bill | 7
Sam | 13
Chuck | 9
and
Bill | 6
Sam | 3
Beth | 6
and I want:
Beth | 0 | 6
Bill | 7 | 6
Chuck | 9 | 0
Sam | 13 | 3
I don't even care if the data ends up looking like this:
Bill | 7 | Bill | 6
| | Beth | 6
Sam | 13 | Sam | 3
Chuck | 9 | Chuck | 0
I just would like to match up the names.

I've never seen the layout you are asking for used in "real life practice".
To work with the data, I would use an operating system tool to combine the source files
(for example: copy file1 + file2 newfile.csv; the .csv extension lets OOo Calc recognize the file easily).
In Calc you can then sort or filter to show each person's data together, or sum and calculate with it.
If you want standard operations, like a SUM per person, check out the pivot table feature.
HTH
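
If scripting outside Calc is an option, the matching the question asks for is a full outer join on the name. Here is a minimal Python sketch of that idea; the names `first`, `second` and `match_by_name` are made up for illustration, and filling gaps with 0 is an assumption taken from the desired output in the question.

```python
# Sample data copied from the question; the variable names are illustrative.
first = {"Bill": 7, "Sam": 13, "Chuck": 9}
second = {"Bill": 6, "Sam": 3, "Beth": 6}


def match_by_name(a, b, missing=0):
    """Return (name, value_a, value_b) rows for every name in either input, sorted by name."""
    names = sorted(set(a) | set(b))
    # dict.get supplies the placeholder when a name is absent from one side
    return [(name, a.get(name, missing), b.get(name, missing)) for name in names]


rows = match_by_name(first, second)
for name, left, right in rows:
    print(f"{name} | {left} | {right}")
```

The rows could then be written to a .csv file and opened in Calc.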

Related

Can bitwise operators be used to quickly find a value from a permutation?

Let's say that I have two metrics: Age (1-20) and size (SMALL, MEDIUM, LARGE) and I want to find what the relative oldness (Young, Adult, Senior, Very Old) is? (Think "1 year old mouse vs 1 year old elephant," the oldness is different depending on size.)
Age | SMALL | MEDIUM | LARGE
1 | Young | Young | Young
2 | Young | Young | Young
3 | Adult | Young | Young
4 | Adult | Young | Young
5 | Senior | Young | Young
6 | Very Old | Adult | Young
7 | Very Old | Adult | Young
8 | Very Old | Adult | Young
9 | Very Old | Senior | Young
10 | Very Old | Senior | Adult
11 | Very Old | Senior | Adult
12 | Very Old | Senior | Adult
13 | Very Old | Very Old | Adult
14 | Very Old | Very Old | Adult
15 | Very Old | Very Old | Senior
16 | Very Old | Very Old | Senior
17 | Very Old | Very Old | Senior
18 | Very Old | Very Old | Senior
19 | Very Old | Very Old | Senior
20 | Very Old | Very Old | Very Old
I could simply use a relational table like this to find the relative oldness given the age and size:
size age oldness
-------------------------
small 1 young
medium 8 adult
large 17 senior
This requires calculating and storing every permutation, and then searching against all of those permutations. Admittedly, databases are good at this. But if I want to add another category, VERY LARGE, the number of permutations I have to calculate, store and search against goes way up.
I could see a system working where size is just an age multiplier. For instance, if Small = x5, Medium = x2 and Large = x1, then the age 2 becomes 10, 4 and 2 respectively. You could then simplify the logic as "if multiplied age <= 10, oldness = young." However, the data might not always fit such a nice model. For instance, above, Larges are only young up to 9, not 10.
I feel like octals and bitwise operators might be a solution here, but I'm struggling to make it work. I'm thinking about chmod and how a permission like 755 holds 9 different pieces of information, and that you can ask simple questions like "can the current user execute?" by just doing 755 & 100. I think that perhaps something like this could help me, but I haven't been able to crack this nut.
Any ideas?
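
One possible direction, offered as a sketch rather than an answer from the thread: a bitmask such as the chmod example answers yes/no questions about independent flags, whereas the table here maps age ranges to labels. Under that reading it may be enough to store only the age at which each band starts for every size and binary-search that short list. The names `LABELS`, `BAND_STARTS` and `oldness` are hypothetical, and the cut-offs are read off the table in the question.

```python
from bisect import bisect_right

LABELS = ["Young", "Adult", "Senior", "Very Old"]

# First age at which Adult, Senior and Very Old begin, per size
# (taken from the table in the question; three numbers per size).
BAND_STARTS = {
    "SMALL": [3, 5, 6],
    "MEDIUM": [6, 9, 13],
    "LARGE": [10, 15, 20],
}


def oldness(size, age):
    """Return the oldness label by counting how many band starts are <= age."""
    return LABELS[bisect_right(BAND_STARTS[size], age)]


for size, age in [("SMALL", 1), ("MEDIUM", 8), ("LARGE", 17)]:
    print(size, age, oldness(size, age))
```

Adding a VERY LARGE category would then mean adding one more three-number entry instead of twenty more rows.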

PowerBI Sort Columns in Matrix Visual

I have a Matrix visual in Microsoft PowerBI with Australian 'States' as rows and 'Months Ago' as columns.
By default the Matrix shows my columns from 0 months ago to 12. I would like it to show from 12 months ago on the left to 0 months ago on the right.
+-------------------+-----------------------------+-------+
| | Months Ago | |
+-------------------+-----------------------------+-------+
| State | 0 | 1 | 2 | 3 | 4 | 5 | Total |
+-------------------+----+----+----+----+----+----+-------+
| Queensland | 10 | 10 | 10 | 10 | 10 | 10 | 60 |
+-------------------+----+----+----+----+----+----+-------+
| New South Wales | | | | | | | |
+-------------------+----+----+----+----+----+----+-------+
| Victoria | | | | | | | |
+-------------------+----+----+----+----+----+----+-------+
| South Australia | | | | | | | |
+-------------------+----+----+----+----+----+----+-------+
| Western Australia | | | | | | | |
+-------------------+----+----+----+----+----+----+-------+
Currently I am only given the option to sort by the value-type fields (i.e. revenue etc.).
Is there any option to sort/order the Column Headers?
I don't think there is an option to sort column headers directly.
However, you can change the default sort order of the Months Ago column, and that order will then be used wherever the column appears.
You can add a custom column MonthSrt = 12 - [Months Ago] in the query editor.
(It won't work in DAX because of a known issue.)
Then you can select the Months Ago column and sort it by MonthSrt.
The custom sort will be applied when you use the Months Ago column in visuals.
You can also make groups (one group per item) and give them a logical number.
The order will then change automatically in the matrix.
The following solution worked for me to display the dates in descending order in a matrix:
how to sort column dates in descending order of matrix in power bi

Power Bi create comma separated list string where ids match

I'm really new to Power Bi. I created the following table:
ID | POST_ID |
0 | 11 |
0 | 12 |
0 | 13 |
0 | 18 |
0 | 21 |
1 | 14 |
1 | 15 |
2 | 16 |
2 | 17 |
2 | 19 |
2 | 20 |
Now I need to pass these ids to an API as a comma separated list, so I want to transform the table to:
ID | POST_ID |
0 | 11,12,13,18,21 |
1 | 14,15 |
2 | 16,17,19,20 |
But I can't manage to do this. I assume it must be fairly easy to do? I have no clue where to start. I've been messing around in the query editor for a few hours now, and googling hasn't brought me much help so far either!
Thanks in advance!
Here is a solution
let
    // sample table from the question
    t = #table({"ID", "POST_ID"}, {{0, 11},{0, 12},{0, 13},{0, 18},{0, 21},{1, 14},{1, 15},{2, 16},{2, 17},{2, 19},{2, 20}}),
    // group by ID and join each group's POST_ID values into one comma-separated text
    group = Table.Group(t, {"ID"}, {{"val_list", each Text.Combine(List.Transform([POST_ID], Number.ToText), ",")}})
in
    group
But... are you sure you need a text string in the end?
If you pass the result to an API, you may be better off passing a list of numbers.
In that case your code should look like this:
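let
    // sample table from the question
    t = #table({"ID", "POST_ID"}, {{0, 11},{0, 12},{0, 13},{0, 18},{0, 21},{1, 14},{1, 15},{2, 16},{2, 17},{2, 19},{2, 20}}),
    // group by ID and keep each group's POST_ID values as a list of numbers
    group = Table.Group(t, {"ID"}, {{"val_list", each [POST_ID]}})
in
    group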
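
To check the expected result outside Power BI, the same group-then-join logic can be sketched in Python. This is only an illustration of what the two queries above compute, not part of the Power Query solution; `group_post_ids`, `as_lists` and `as_text` are hypothetical names.

```python
from collections import defaultdict

# (ID, POST_ID) pairs from the table in the question.
rows = [(0, 11), (0, 12), (0, 13), (0, 18), (0, 21),
        (1, 14), (1, 15),
        (2, 16), (2, 17), (2, 19), (2, 20)]


def group_post_ids(pairs):
    """Collect POST_ID values per ID, keeping the original row order."""
    grouped = defaultdict(list)
    for id_, post_id in pairs:
        grouped[id_].append(post_id)
    return dict(grouped)


# Variant 2 of the answer: one list of numbers per ID.
as_lists = group_post_ids(rows)
# Variant 1 of the answer: the same lists joined into comma-separated text.
as_text = {id_: ",".join(str(p) for p in posts) for id_, posts in as_lists.items()}

for id_, text in as_text.items():
    print(id_, "|", text)
```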

How to delete variables which occur in column x but not in column y?

How can I delete duplicates which occur in column x but not in column y?
My dataset is as follows:
+-------+---+---+
| year | x | y |
+-------+---+---+
| 2001 | 1 | 2 |
| 2001 | 2 | 3 |
| 2001 | 2 | 3 |
| 2001 | 4 | 6 |
| 2001 | 5 | 9 |
| 2001 | 4 | 2 |
| 2001 | 4 | 9 |
+-------+---+---+
What I want is to remove the entries which occur in column y from the ones in column x.
My result would be: 1,4,5
I am currently learning Stata and I would love to know a good source for all possible commands, if one exists, so that I can learn better on my own. At the moment I have trouble finding good sources.
In Stata what you call columns are always called variables.
See http://www.statalist.org/forums/help#stata for general advice on how to present data examples in Stata questions. (The comments on CODE delimiters don't apply here.)
This may help. I didn't understand the role of year in your problem.
clear
input year x y
2001 1 2
2001 2 3
2001 2 3
2001 4 6
2001 5 9
2001 4 2
2001 4 9
end
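* give both variables a common stub so that reshape can stack them
rename x Datax
rename y Datay
gen long obs = _n
reshape long Data, i(obs) j(which) string
* within each value "y" sorts after "x": drop the whole group if its last row came from y
bysort Data (which) : drop if which[_N] == "y"
list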
     +---------------------------+
     | obs   which   year   Data |
     |---------------------------|
  1. |   1       x   2001      1 |
  2. |   4       x   2001      4 |
  3. |   7       x   2001      4 |
  4. |   6       x   2001      4 |
  5. |   5       x   2001      5 |
     +---------------------------+
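
For readers who want to check the logic outside Stata, the filter above amounts to keeping the rows whose x value never appears anywhere in y. Here is a minimal Python sketch of that reading of the question; it is an illustration, not a translation of the Stata commands, and the names `y_values`, `kept` and `distinct_x` are made up.

```python
# (year, x, y) rows from the example data.
rows = [
    (2001, 1, 2),
    (2001, 2, 3),
    (2001, 2, 3),
    (2001, 4, 6),
    (2001, 5, 9),
    (2001, 4, 2),
    (2001, 4, 9),
]

# Every value that occurs in y at least once.
y_values = {y for _, _, y in rows}

# Keep the (year, x) pairs whose x value is not among the y values.
kept = [(year, x) for year, x, _ in rows if x not in y_values]

# Distinct surviving x values, as listed in the question.
distinct_x = sorted({x for _, x in kept})
print(distinct_x)
```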
All possible commands aren't documented in a single place. Someone could write new commands all the time and they would not be documented anywhere except their help files. Did you mean that? Nor are all existing commands documented in one place: many are user-written and most of those are just documented by their help files.
Most of the official commands in Stata as supplied by StataCorp are documented in the manuals. Literally, there are also undocumented commands (I am not inventing this: see help undocumented) and there are also nondocumented commands that exist, known about because StataCorp mention them in talks or emails. To be as positive as possible: start with the manuals, bundled with your copy of Stata as .pdf files.

Generating a new variable by selection from multiple variables

I have some data on diseases and age of diagnosis. Each participant was asked what diseases they have had and at what age that disease was diagnosed.
There are a set of variables disease1-28 with a numeric code for each disease and another set age1-28 with the age at diagnosis in years. The diseases are placed in successive variables in the order recalled; the age of diagnosis is placed in the appropriate age variable.
I would like to generate a new variable for each of several diseases giving the age of diagnosis of that disease: e.g. asthma_age_at_diagnosis
Can I do this without having 28 replace statements?
Example of the data:
+-------------+----------+----------+----------+------+------+------+
| Participant | Disease1 | Disease2 | Disease3 | Age1 | Age2 | Age3 |
+-------------+----------+----------+----------+------+------+------+
| 1 | 123 | 3 | . | 30 | 2 | . |
| 2 | 122 | 123 | 5 | 23 | 51 | 44 |
| 3 | 5 | . | . | 50 | . | . |
+-------------+----------+----------+----------+------+------+------+
I give a general heads-up that a question of this form without any code of your own is often considered off-topic for Stack Overflow. Still, the Stata users around here are the people answering Stata questions (surprise) and we usually indulge questions like this if interesting and well-posed.
I'd advise a different data structure, period. With your example data
clear
input Patient Disease1 Disease2 Disease3 Age1 Age2 Age3
1 123 3 . 30 2 .
2 122 123 5 23 51 44
3 5 . . 50 . .
end
You can reshape
reshape long Disease Age, i(Patient) j(Order)
drop if missing(Disease)
list, sep(0)
     +---------------------------------+
     | Patient   Order   Disease   Age |
     |---------------------------------|
  1. |       1       1       123    30 |
  2. |       1       2         3     2 |
  3. |       2       1       122    23 |
  4. |       2       2       123    51 |
  5. |       2       3         5    44 |
  6. |       3       1         5    50 |
     +---------------------------------+
With the data in this form you can now answer lots of questions easily. I don't see that a whole bunch of new variables would make many analyses easier. Another way to see this is that you have hinted that the order in which diseases are coded is arbitrary; that being so, wiring that into the data structure is ill-advised. Even if order is important, it is still accessible as part of the dataset (variable Order).
Hint: If you still want separate variables for some purposes, look at separate.
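
To see why the long layout makes the original request easy, here is a rough Python sketch of the same idea: stack the disease/age pairs, then pick out the age for one disease code. It is an illustration only, not Stata code. The names `to_long` and `age_at_diagnosis` are made up, and using code 123 for asthma is purely an assumption, since the question does not say which code means asthma.

```python
# Wide rows from the example: (patient, [Disease1..3], [Age1..3]).
# None stands in for Stata's missing value ".".
wide = [
    (1, [123, 3, None], [30, 2, None]),
    (2, [122, 123, 5], [23, 51, 44]),
    (3, [5, None, None], [50, None, None]),
]


def to_long(rows):
    """Stack each (disease, age) pair into its own row and skip missing diseases."""
    long_rows = []
    for patient, diseases, ages in rows:
        for order, (disease, age) in enumerate(zip(diseases, ages), start=1):
            if disease is not None:
                long_rows.append((patient, order, disease, age))
    return long_rows


def age_at_diagnosis(long_rows, code):
    """Map patient -> age for one disease code (first recorded occurrence wins)."""
    result = {}
    for patient, _order, disease, age in long_rows:
        if disease == code:
            result.setdefault(patient, age)
    return result


long_rows = to_long(wide)
ASTHMA_CODE = 123  # hypothetical: the real code for asthma is not given in the question
asthma_age = age_at_diagnosis(long_rows, ASTHMA_CODE)
print(asthma_age)
```

Patients without the disease simply do not appear in the result, which plays the role of a missing value in the new variable.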