Automating Chart Data generation for a Daily Expenses per Tag - google-visualization

Using Google Spreadsheets. Here's an attempt to replicate my table needs:
2012-08-30 food $15 expensive dinner
2012-08-30 food $10 pizza!
2012-08-30 other $30 that damn painting
2012-09-02 home $40 can't remember
2012-09-02 other $5 toilet paper
2012-09-02 home $2 buying new flowers
I can already do two things with it, but both are so far from optimal that they are barely useful.
First, Using SUMIF:
food $25 <- SUMIF(B:B;"food";C:C)
other $35
home $42
Then, combining it with ARRAYFORMULA:
food home other
2012-08-30 $25 $0 $30 <- ARRAYFORMULA(SUMIF(A:A&B:B;2012-08-30&"food";C:C))
2012-09-02 $0 $42 $5
See where this can become too big? Well...
I want to make two charts out of this. Of course, the main one is the second:
A pie, from the first example. With the SUMIF, I need to explicitly write "food" there (or reference it, whatever). Could it be filled in automagically for every tag found?
While I can live without this, it may be the answer to the second, main question:
A plot or timeline, from the ARRAYFORMULA. It should trace each tag over time.
Is this even possible? If not, any suggestions? I'm keen to start scripting if needed (and worth it). Or move away from Google. Or from spreadsheets altogether (as a last resort). Python maybe? Ruby?
Or maybe I'll just leave it as it is, if it's tooooo much trouble.

Using QUERY, you can generate the first table (with headers) using:
=QUERY(B:C;"select B, sum(C) where B != '' group by B label B 'Category', sum(C) 'Total'";0)
and this should be fairly easily plotted as a pie chart. You can select a range for the chart that is much longer than the current table to accommodate growing data, and the pie chart will conveniently ignore blank rows.
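Since the question also floats Python as an option, here is a minimal sketch of what the first QUERY computes (a sum per tag), using the sample rows from the question. The function name `totals_by_tag` is made up for illustration; the real data would come from the sheet rather than a hard-coded list.

```python
from collections import defaultdict

# Sample rows copied from the question: (date, tag, amount, note)
rows = [
    ("2012-08-30", "food", 15, "expensive dinner"),
    ("2012-08-30", "food", 10, "pizza!"),
    ("2012-08-30", "other", 30, "that damn painting"),
    ("2012-09-02", "home", 40, "can't remember"),
    ("2012-09-02", "other", 5, "toilet paper"),
    ("2012-09-02", "home", 2, "buying new flowers"),
]


def totals_by_tag(rows):
    """Sum the amount per tag, skipping rows with an empty tag."""
    totals = defaultdict(int)
    for _date, tag, amount, _note in rows:
        if tag:  # mirrors "where B != ''" in the QUERY
            totals[tag] += amount
    # Sort by tag name so the result has a stable order
    return dict(sorted(totals.items()))


print(totals_by_tag(rows))
```

Every tag found in the data shows up automatically, which is the part SUMIF could not do on its own.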
The second table can be generated using:
=QUERY(A:C;"select A, sum(C) where A is not null group by A pivot B";0)
and you can experiment with various chart types to achieve the desired visualisation.
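For comparison, the pivot in that second QUERY can be sketched in Python as a nested dictionary keyed by date and then by tag. This is only an illustration under the assumption that the rows are available as (date, tag, amount) tuples; `pivot_by_date_and_tag` is a hypothetical name. As with the QUERY, combinations that never occur are simply absent rather than zero.

```python
# Sample rows from the question: (date, tag, amount)
rows = [
    ("2012-08-30", "food", 15),
    ("2012-08-30", "food", 10),
    ("2012-08-30", "other", 30),
    ("2012-09-02", "home", 40),
    ("2012-09-02", "other", 5),
    ("2012-09-02", "home", 2),
]


def pivot_by_date_and_tag(rows):
    """Return (sorted tag list, {date: {tag: total}}) for the given rows."""
    tags = sorted({tag for _date, tag, _amount in rows})
    table = {}
    for date, tag, amount in rows:
        per_date = table.setdefault(date, {})
        per_date[tag] = per_date.get(tag, 0) + amount
    return tags, dict(sorted(table.items()))


tags, table = pivot_by_date_and_tag(rows)
print(tags)
for date, per_tag in table.items():
    # Missing date/tag combinations print as blanks, like the pivot output
    print(date, [per_tag.get(tag, "") for tag in tags])
```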
Edit:
To provide a table that populates with zeros instead of blanks, as per your comment: assuming the upper left (blank) cell of the table is I1, then in I2:
=SORT(UNIQUE(A:A))
and in J1:
=TRANSPOSE(SORT(UNIQUE(B:B)))
and then in J2:
=ArrayFormula(IF(I2:I*LEN(J1:1);MMULT(I2:I=TRANSPOSE(A:A);(J1:1=B:B)*C:C);IFERROR(1/0)))
Note this will populate CONTINUE functions to the far bottom and far right of the spreadsheet, over-writing everything in their path. So probably best to have a sheet dedicated to this table.
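The MMULT trick may be easier to follow outside the spreadsheet. A rough Python sketch of the same idea, using the sample rows from the question: the left matrix marks which rows belong to each date, the right matrix holds each row's amount in the column of its tag, and their product is the date-by-tag table with zeros where nothing was spent. The helper `matmul` is written out by hand here only to keep the sketch dependency-free.

```python
# Sample rows from the question: (date, tag, amount)
rows = [
    ("2012-08-30", "food", 15),
    ("2012-08-30", "food", 10),
    ("2012-08-30", "other", 30),
    ("2012-09-02", "home", 40),
    ("2012-09-02", "other", 5),
    ("2012-09-02", "home", 2),
]

dates = sorted({d for d, _t, _a in rows})  # like SORT(UNIQUE(A:A))
tags = sorted({t for _d, t, _a in rows})   # like SORT(UNIQUE(B:B))

# dates x rows: 1 where the row belongs to that date (I2:I = TRANSPOSE(A:A))
left = [[1 if d == row_date else 0 for row_date, _t, _a in rows] for d in dates]

# rows x tags: the amount where the row has that tag ((J1:1 = B:B) * C:C)
right = [[amount if t == row_tag else 0 for t in tags] for _d, row_tag, amount in rows]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


table = matmul(left, right)
print(tags)
for d, totals in zip(dates, table):
    print(d, totals)
```

The result matches the hand-built table in the question, including the $0 cells.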

In general, Spreadsheets are not Databases, and this is a task for which you are fast approaching needing a database. However, as luck would have it (depending on how you look at it, anyhow), Google Spreadsheets actually do have some database-like access APIs, so you can probably do what you want:
http://googleajaxsearchapi.blogspot.com/2008/03/introducing-latest-ajax-api-google.html
https://developers.google.com/chart/interactive/docs/querylanguage
http://blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/
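As a rough illustration of that database-like access, a query in the language described at the second link can be passed to a spreadsheet through its URL. The sketch below only builds such a URL; it assumes the `gviz/tq` endpoint with the `tq` and `tqx=out:csv` parameters, a sheet that is readable by whoever fetches the URL, and a placeholder key. `build_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_query_url(spreadsheet_key, query, sheet=None):
    """Build a Visualization API query URL that asks for CSV output."""
    params = {"tqx": "out:csv", "tq": query}
    if sheet is not None:
        params["sheet"] = sheet  # optional: name of the tab to query
    return (
        "https://docs.google.com/spreadsheets/d/"
        f"{spreadsheet_key}/gviz/tq?{urlencode(params)}"
    )


# "SPREADSHEET_KEY" is a placeholder, not a real document.
url = build_query_url("SPREADSHEET_KEY", "select B, sum(C) group by B")
print(url)
```

Fetching the URL (for example with `urllib.request`) should then return the grouped rows as CSV, provided the sharing settings allow it.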

Related

Extract list from range Google Sheets

I have data from workplaces with several different work areas. I need to extract, for each workplace, a list of its available work areas. I have an example of an attempt that comes really close to what I want. I use this formula, but with more data it will take a long time: =IF(D2=$G$1, "Yes", "No"). I want to automate this with formulas, but I don't know where to start.
Try the formula below. Put it in cell G1, then drag down as needed.
=TRANSPOSE(IFERROR(FILTER($D$2:$D$16,$A$2:$A$16=F2,$D$2:$D$16<>""),""))
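In case the logic of that formula is unclear, here is a small Python sketch of what FILTER does for one workplace: keep the work areas whose workplace matches and whose area cell is not empty, and return an empty list when nothing matches (the IFERROR part). The sample data and the name `areas_for` are invented for illustration; the question's real sheet is not shown.

```python
# Hypothetical sample data: (workplace from column A, work area from column D)
rows = [
    ("Plant 1", "Welding"),
    ("Plant 1", "Painting"),
    ("Plant 2", "Assembly"),
    ("Plant 1", ""),          # blank area cells are skipped
    ("Plant 2", "Packaging"),
]


def areas_for(workplace, rows):
    """List the non-empty work areas recorded for one workplace."""
    return [area for wp, area in rows if wp == workplace and area != ""]


for workplace in ("Plant 1", "Plant 2", "Plant 3"):
    # One output row per workplace, areas spread across the row (TRANSPOSE)
    print(workplace, areas_for(workplace, rows))
```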

AWS GroundTruth text labeling - hide columns in the data, and checking quality of answers

I am new to SageMaker. I have a large csv dataset which I would like labelled:
sentence_id,sentence,pre_agreed_label
148392,A sentence,0
383294,Another sentence,1
For each sentence, I would like a) a yes/no binary classification in response to a question, and b) on a scale of 1-3, how obvious the classification was. I need the sentence id to map to other parts of the dataset, and will use the pre-agreed labels to assess accuracy.
I have identified SageMaker GroundTruth labelling jobs as a possible way to do this. Is this the best way? In trying to set it up I have run into a few problems.
The first problem is that I can't find a way to display only the sentence column to the labellers, hiding the sentence_id and pre_agreed_label columns.
The second is that there is either single labelling or multi labelling, but I would like a way to have two sets of single-selection labels:
Select one for binary classification:
Yes
No
Select one for difficulty of classification:
Easy
Medium
Hard
It seems as though this can be done using custom HTML, but I don't know how to do this; the template it gives you doesn't even render.
Finally, having not used mechanical turk before, are there ways of ensuring people take the work seriously and don't just select random answers? I can see there's an option to have x number of people answer the same question, but is there also a way to put in an obvious question to which we already have a 'pre_agreed_label' every nth question, and kick people off the task if they get it wrong? There also appears to be a maximum of $1.20 per task which seems odd.
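Whatever the labelling service itself offers, the check against known answers can at least be sketched as a post-processing step on the collected results. The Python below is not a Ground Truth feature; it assumes the answers have already been exported as (worker_id, sentence_id, label) tuples, and the names `worker_accuracy`, `workers_to_flag` and the 0.8 threshold are all made up for illustration.

```python
def worker_accuracy(answers, gold):
    """Share of correct answers per worker, counted only on sentences
    that have a pre-agreed label in `gold` ({sentence_id: label})."""
    seen, correct = {}, {}
    for worker, sentence_id, label in answers:
        if sentence_id not in gold:
            continue  # no known answer, cannot be scored
        seen[worker] = seen.get(worker, 0) + 1
        correct[worker] = correct.get(worker, 0) + (label == gold[sentence_id])
    return {worker: correct[worker] / seen[worker] for worker in seen}


def workers_to_flag(answers, gold, min_accuracy=0.8):
    """Workers whose accuracy on the known sentences is below the threshold."""
    scores = worker_accuracy(answers, gold)
    return sorted(w for w, score in scores.items() if score < min_accuracy)


# Pre-agreed labels from the question's sample rows
gold = {148392: 0, 383294: 1}
# Hypothetical exported answers; sentence 999 has no known label
answers = [
    ("w1", 148392, 0), ("w1", 383294, 1),
    ("w2", 148392, 1), ("w2", 383294, 1), ("w2", 999, 0),
]
print(worker_accuracy(answers, gold))
print(workers_to_flag(answers, gold))
```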

How to model data for race results in PowerBi?

I have data where a constant set of athletes compete in the same race every month. Each gets a position: 1st, 2nd, etc.
I was wondering which visualization to choose to see the position results for each race through time. I was thinking of a Sankey diagram in which each destination column represents a single race's results, always ordered from the top down as 1st, 2nd, and so on. See below:
You can see that Blue got 2nd place in Race 1 and 2nd place in Race 2. Also, Purple got 1st in Race 1 but had a bad lunch before the race and didn't do so well.
I haven't been able to adapt existing resources to a Sankey in this way.
Is this possible?
Is there another visualization that can accomplish the same idea?
How should the data be structured for this chart to work?
Thanks so much!
You can certainly do this with the Sankey Chart visual:
However, you'll probably need to drag and drop the bars to get the order you want and manually set the colors how you want (not great if you need this fully automated).
This is how I set up the data:
Edit:
A simple line chart will be easier to automate.
The data format is more intuitive too.
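One plausible layout for the line chart is a long table with one row per athlete per race (race on the axis, athlete as the legend, position as the value). The Python sketch below only illustrates that shape and how it maps to one line per athlete; it is an assumption about the layout, not a copy of the original setup. Blue's positions come from the question, while Purple's Race 2 position and the athlete Green are invented.

```python
# Long format: one row per (race, athlete, position).
# Blue's results are from the question; the other values are made up.
results = [
    ("Race 1", "Purple", 1),
    ("Race 1", "Blue", 2),
    ("Race 1", "Green", 3),
    ("Race 2", "Green", 1),
    ("Race 2", "Blue", 2),
    ("Race 2", "Purple", 3),
]


def series_per_athlete(results):
    """Group the long table into one ordered (race, position) line per athlete."""
    series = {}
    for race, athlete, position in sorted(results):
        series.setdefault(athlete, []).append((race, position))
    return series


for athlete, line in series_per_athlete(results).items():
    print(athlete, line)
```

Adding a new race then only means appending rows, with no manual reordering of the visual.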

How can I resolve INDEX MATCH errors caused by discrepancies in the spelling of names across multiple data sources?

I've set up a Google Sheets workbook that synthesizes data from a few different sources via manual input, IMPORTHTML and IMPORTRANGE. Once the data is populated, I'm using INDEX MATCH to filter and compare the information and to RANK each data set.
Since I have multiple data inputs, I'm running into a persistent issue of names not being written exactly the same between sources, even though they're the same person. First names are the primary culprit (i.e. Mary Lou vs Marylou vs Mary-Lou vs Mary Louise) but some last names with special symbols (umlauts, accents, tildes) are also causing errors. When Sheets can't recognize a match, the INDEX MATCH and RANK functions both break down.
I'm wondering how to better unify the data automatically so my Sheet understands that each occurrence is actually the same person (or "value").
Since you can't edit the results of an IMPORTHTML directly, I've set up "helper columns" and used functions like TRIM and SPLIT to try and fix instances as I go, but it seems like there must be a simpler path.
It feels like IFS could work but I can't figure how to integrate it. Also thinking this may require a script, which I'm just beginning to study.
Here's a simplified example of what I'm trying to achieve and the corresponding errors: Sample Spreadsheet
The first tab is attempting to pull and RANK data from tabs 2 and 3. Sample formulas from the Summary tab, row 3 (Amelia Rose):
Cell B3: =INDEX('Q1 Sales'!B:B, MATCH(A3,'Q1 Sales'!A:A,0))
Cell C3: =RANK(B3,$B$2:B,1)
Cell D3: =INDEX('Q2 Sales'!B:B, MATCH(A3,'Q2 Sales'!A:A,0))
Cell E3: =RANK(D3,$D$2:D,1)
I'd be grateful for any insight on how to best index 'Q2Sales'!B3 as the correct value for 'Summary'!D3. Thanks in advance - the thoughtful answers on Stack Overflow have gotten me this far!
To counter every possible scenario, do it like this:
=ARRAYFORMULA(IFERROR(VLOOKUP(LOWER(REGEXREPLACE(A2:A, "-|\s", )),
{REGEXEXTRACT(LOWER(REGEXREPLACE('Q2 Sales'!A2:A, "-|\s", )),
TEXTJOIN("|", 1, LOWER(REGEXREPLACE(A2:A, "-|\s", )))), 'Q2 Sales'!B2:B}, 2, 0)))
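Since the question mentions that a script might be needed, here is a rough Python sketch of what the formula does: strip hyphens and whitespace, lowercase everything, find which normalized Summary name occurs inside each normalized source name, and look the value up by that key. The function names and the sample figures are invented for illustration.

```python
import re


def normalize(name):
    """Drop hyphens and whitespace, then lowercase (REGEXREPLACE + LOWER)."""
    return re.sub(r"-|\s", "", name).lower()


def lookup_sales(summary_names, source):
    """Map each Summary name to the value of the first matching source name.

    `source` is {name as spelled in the other tab: value}. Names without a
    match map to None, like the blank that IFERROR leaves.
    """
    keys = [normalize(n) for n in summary_names if n]
    # Alternation of all normalized Summary names (the TEXTJOIN("|", ...) part)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    by_key = {}
    for name, value in source.items():
        match = pattern.search(normalize(name))  # the REGEXEXTRACT part
        if match:
            by_key.setdefault(match.group(0), value)  # keep the first match
    return {n: by_key.get(normalize(n)) for n in summary_names}


# Hypothetical sample data
summary = ["Mary Lou Smith", "Amelia Rose", "Zoe Day"]
q2_sales = {"Mary-Lou Smith": 120, "amelia rose": 95, "Bob Stone": 50}
print(lookup_sales(summary, q2_sales))
```

Note that this approach still treats "Mary Louise" as a different person from "Mary Lou", and it does not touch accents or umlauts; those would need an extra step such as an alias table or Unicode normalization.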

SAS gmap and label centering

I have a problem for which I could not find an easy solution, and I am looking for some ideas or tips.
I am working with SAS on a project whose result should be a map of Europe, where the countries are colored according to a certain algorithm. I use the maps.europe data and the %annomac and %maplabel macros to label the countries.
This works pretty well, except for Portugal and Spain: because these countries have islands far from the mainland coast, the centroid that %maplabel calculates for the country is not in the center of the country:
Unfortunately, I can only cut Portugal out of the map completely, but not just the islands.
I have already tried this method:
Cutting out the parts of the map that contain the islands via gproject. This delivered results I cannot explain (it showed only some parts of Europe, even when I set the parameters extremely wide).
and now I am a bit stuck.
I have already thought about these ideas:
Combining maps.europe with maps.spain and maps.portugal after deleting the islands from them, but I am not sure how to do that so that the labeling and everything else still works for the combined data.
Is it possible to set the label points for Portugal and Spain manually and overwrite the data from the %maplabel macro?
Or is there an even easier solution?
Thanks for your help and best regards
stephan
I'm not familiar with those macros, but given how GMAP works, I would indeed override the annotate dataset. You may want to read up on how annotate datasets work, but in general:
The GMAP statement will have an option, annotate= and some dataset. Find that dataset, let's say it's called ANNODS.
Then look at that dataset. Identify a row that has function=text and label=PORTUGAL. That is the row you need to modify the x/y coordinates of in order to move the label around (x1 and y1). You might need to play around with this some to get the right coordinates.
Then run the PROC GMAP, and you should have a newly moved-over Portugal.