I am attempting to use collect_list to collect arrays (and maintain order) from two different data frames.
Test_Data and Train_Data have the same format.
from pyspark.sql import functions as F
from pyspark.sql import Window
w = Window.partitionBy('Group').orderBy('date')
# Train_Data has 4 data points
# Test_Data has 7 data points
# desired target array: [1, 1, 2, 3]
# desired MarchMadInd array: [0, 0, 0, 1, 0, 0, 1]
sorted_list_diff_array_lens = train_data.withColumn('target',
F.collect_list('target').over(w)
)\
test_data.withColumn('MarchMadInd', F.collect_list('MarchMadInd').over(w))\
.groupBy('Group')\
.agg(F.max('target').alias('target'),
F.max('MarchMadInd').alias('MarchMadInd')
)
I realize the syntax is incorrect with "test_data.withColumn", but I want to take the MarchMadInd array from test_data and the target array from train_data. The desired output would look like the following:
{"target":[1, 1, 2, 3], "MarchMadInd":[0, 0, 0, 1, 0, 0, 1]}
Context: this is for a DeepAR time series model (using AWS) that requires dynamic features to include the prediction period, but the target should be historical data.
The solution involves using a join as recommended by pault.
1. Create a dataframe with dynamic features of length equal to the Training + Prediction period.
2. Create a dataframe with target values of length equal to just the Training period.
3. Use a LEFT JOIN (with the dynamic feature data on the LEFT) to bring these dataframes together.
Now, using collect_list will create the desired result.
Amazon did a great job with the monitoring in OpsWorks (see screenshot). You can point at any time in any of the area charts and see all values for all charts at that time.
Is it possible to achieve something similar with the Google Visualisation API?
I also have multiple (stacked) area charts and it's a pain to point at each datapoint to get the exact value. Some of them are overlapping or very close together.
You can't trigger the tooltips in all of the charts at the same time, but if you disable the built-in tooltips, you can achieve something similar by building out your tooltips in HTML and populating them manually in an "onmouseover" event handler:
function mouseOverHandler (e) {
// use e.row, e.column to find data and populate your tooltips
}
function mouseOutHandler (e) {
// clear the tooltips
}
google.visualization.events.addListener(chart1, 'onmouseover', mouseOverHandler);
google.visualization.events.addListener(chart1, 'onmouseout', mouseOutHandler);
google.visualization.events.addListener(chart2, 'onmouseover', mouseOverHandler);
google.visualization.events.addListener(chart2, 'onmouseout', mouseOutHandler);
// etc...
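As a rough sketch of what the handler body could do, the helper below collects the values of every chart at one row index and turns them into one tooltip line per chart. The name buildSharedTooltip and the plain-array layout of `charts` are assumptions made for illustration; in a real page the values would come from each chart's DataTable, and the resulting lines would be written into your own HTML tooltip elements.

```javascript
// Hypothetical helper (not part of the Google Visualization API):
// build one line of tooltip text per chart for a shared row index.
function buildSharedTooltip(charts, row) {
  // Assumption: the event may carry no row (for example when the pointer
  // is not over a data point), so return nothing in that case.
  if (row === null || row === undefined) {
    return [];
  }
  return charts.map(function (chart) {
    var parts = chart.series.map(function (s) {
      return s.label + ': ' + s.values[row];
    });
    return chart.title + ' - ' + parts.join(', ');
  });
}

// Plain-array stand-ins for the data behind chart1 and chart2.
var charts = [
  { title: 'Memory', series: [{ label: 'used', values: ['3 GiB', '4 GiB'] }] },
  { title: 'CPU', series: [
    { label: 'user', values: ['10%', '20%'] },
    { label: 'system', values: ['5%', '7%'] }
  ] }
];

// Inside mouseOverHandler, e.row would be passed as the second argument.
var tooltipLines = buildSharedTooltip(charts, 1);
```

Because every chart shares the same handler, hovering over any one of them yields the values for all of them at that row.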
In your stacked area chart (assuming you do not replace the tooltips with a custom solution), you can set the focusTarget option to 'category' to make all values at a given x-axis value show up in the tooltip (works only within one chart, not across charts).
You can also cheat by putting all three charts in the same chart element with a little trickery (and some limitations). For instance, you can make the chart like this:
Here is the code for that (dummy data):
function drawVisualization() {
// Some raw data (not necessarily accurate)
var data = new google.visualization.DataTable();
data.addColumn('number', 'time');
data.addColumn('number', 'used');
data.addColumn('number', 'cached');
data.addColumn('number', 'free');
data.addColumn('number', 'user');
data.addColumn('number', 'system');
data.addColumn('number', 'io wait');
data.addColumn('number', '1 min');
data.addColumn('number', '5 min');
data.addColumn('number', '15 min');
data.addRows([
[1, {v:0.1, f:'10%'},{v:0.55, f:'45%'},{v:1, f:'45%'},{v:1.01, f:'0.15 GiB'},{v:1.83, f:'12.45 GiB'},{v:1.18, f:'2.7 GiB'},{v:2.28166561658701, f:'28.2%'},{v:2.38024858239246, f:'38.0%'},{v:2.42249842488051, f:'42.2%'}],
[2, {v:0.2, f:'20%'},{v:0.6, f:'40%'},{v:1, f:'40%'},{v:1.54, f:'8.1 GiB'},{v:1.47, f:'7.05 GiB'},{v:1.77, f:'11.55 GiB'},{v:2.53503269167234, f:'53.5%'},{v:2.74904576834128, f:'74.9%'},{v:2.4119751725877, f:'41.2%'}],
[3, {v:0.3, f:'30%'},{v:0.65, f:'35%'},{v:1, f:'35%'},{v:1.13, f:'1.95 GiB'},{v:1.15, f:'2.25 GiB'},{v:1.75, f:'11.25 GiB'},{v:2.73464579773048, f:'73.5%'},{v:2.85218912536736, f:'85.2%'},{v:2.80811037750353, f:'80.8%'}],
[4, {v:0.4, f:'40%'},{v:0.7, f:'30%'},{v:1, f:'30%'},{v:1.27, f:'4.05 GiB'},{v:1.86, f:'12.9 GiB'},{v:1.1, f:'1.5 GiB'},{v:2.86045009159487, f:'86.0%'},{v:2.92068159800651, f:'92.1%'},{v:2.54208355770477, f:'54.2%'}],
[5, {v:0.5, f:'50%'},{v:0.75, f:'25%'},{v:1, f:'25%'},{v:1.23, f:'3.45 GiB'},{v:1.12, f:'1.8 GiB'},{v:1.88, f:'13.2 GiB'},{v:2.89980619585711, f:'90.0%'},{v:2.8728120099814, f:'87.3%'},{v:2.75583720451997, f:'75.6%'}],
[6, {v:0.6, f:'60%'},{v:0.8, f:'20%'},{v:1, f:'20%'},{v:1.5, f:'7.5 GiB'},{v:1.78, f:'11.7 GiB'},{v:1.26, f:'3.9 GiB'},{v:2.84876005903125, f:'84.9%'},{v:2.66203284604438, f:'66.2%'},{v:2.63657004427344, f:'63.7%'}],
[7, {v:0.7, f:'70%'},{v:0.85, f:'15%'},{v:1, f:'15%'},{v:1.91, f:'13.65 GiB'},{v:1.26, f:'3.9 GiB'},{v:1.69, f:'10.35 GiB'},{v:2.71244021344925, f:'71.2%'},{v:2.78368423479417, f:'78.4%'},{v:2.69819140918026, f:'69.8%'}],
[8, {v:0.8, f:'80%'},{v:0.9, f:'10%'},{v:1, f:'10%'},{v:1.48, f:'7.2 GiB'},{v:1.51, f:'7.65 GiB'},{v:1.41, f:'6.15 GiB'},{v:2.50454251895529, f:'50.5%'},{v:2.59031474717769, f:'59.0%'},{v:2.33299806251049, f:'33.3%'}],
[9, {v:0.9, f:'90%'},{v:0.95, f:'5%'},{v:1, f:'5%'},{v:1.18, f:'2.7 GiB'},{v:1.53, f:'7.95 GiB'},{v:1.97, f:'14.55 GiB'},{v:2.24595415946281, f:'24.6%'},{v:2.24103507627355, f:'24.1%'},{v:2.22381828511115, f:'22.4%'}],
[10, {v:1, f:'100%'},{v:1, f:'0%'},{v:1, f:'0%'},{v:1.66, f:'9.9 GiB'},{v:1.61, f:'9.15 GiB'},{v:1.2, f:'3 GiB'},{v:2.1229770797314, f:'12.3%'},{v:2.13527478770454, f:'13.5%'},{v:2.14757249567768, f:'14.8%'}],
]);
// Create and draw the visualization.
var ac = new google.visualization.AreaChart(document.getElementById('visualization'));
ac.draw(data, {
title : 'Monthly Coffee Production by Country',
isStacked: false,
width: 600,
height: 400,
areaOpacity: 0.0,
focusTarget: 'category',
series: { 0: {areaOpacity: 0.5}, 1: {areaOpacity: 0.5}, 2: {areaOpacity: 0.5} },
vAxis: { ticks: [{v:0, f:""}, {v:0.5, f:"7.5 GiB"}, {v:1, f:"15.0 GiB"}, {v:1.5, f:"50%"}, {v:2, f:"100%"}, {v:2.5, f:"50%"}, {v:3, f:"100%"}, ] }
});
}
Basically, I put all 3 series on the same chart by putting them all as percentages of 1/3rd of the chart. So the first series is from 0-1, the second from 1-2, and the third from 2-3. I then used liberal quantities of {v:, f:} notation to make them look like different numbers (for the GiB particularly), and used the ticks option to make the axis look like it has 3 scales. Finally, I set focusTarget: 'category' so all lines get selected when you mouseover any of them.
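The banding described above can also be generated instead of typed by hand. The following is only a sketch: toBandCell is a made-up helper name, and it assumes each band has a known maximum so that a raw value can be expressed as a fraction of its band.

```javascript
// Hypothetical helper: place a raw value into band `bandIndex` of a shared
// axis, where band 0 spans 0-1, band 1 spans 1-2, and band 2 spans 2-3.
// `label` is the text shown to the user in place of the scaled number.
function toBandCell(value, bandMax, bandIndex, label) {
  return { v: bandIndex + value / bandMax, f: label };
}

// A data cell: 7.5 GiB out of a 15 GiB maximum, drawn in the middle band.
var cell = toBandCell(7.5, 15, 1, '7.5 GiB');

// The same helper can produce axis ticks, since ticks use {v:, f:} as well.
var tick = toBandCell(0.5, 1, 2, '50%');
```

Here `cell` comes out as {v: 1.5, f: '7.5 GiB'} and `tick` as {v: 2.5, f: '50%'}, which is the same shape as the hand-written rows and ticks in the example.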
You can format colors and even add dummy series to add thicker black lines between the series if you want to make them look more 'distinct'. You can also do some tricky stuff with dummy series, white areas, and 100% opacity to potentially add background colors to higher areas. But the general concept is as outlined above, and depending on what you are going for, it could work.
I have a column chart that shows power consumption in current and previous years. Now this consumption comes from different sources, so I would like to chart these values in multiple columns that show stacked values.
I am using Google Charts, but setting the isStacked parameter to true in the options array just stacks every single value for a specific row. What I want to achieve is something like this:
That is, rows with multiple stacked columns. Is this even possible with Google Chart API?
This is doable with a kludge:
function drawVisualization() {
// Create and populate the data table.
var data = google.visualization.arrayToDataTable([
['Country', 'Cars', 'Trucks'],
['', null, null],
['US', 15, 15],
['Canada', 17, 17],
['Europe', 13, 13],
['Mexico', 16, 16],
['Asia', 20, 20],
['', null, null],
['US', 15, 15],
['Canada', 17, 17],
['Europe', 13, 13],
['Mexico', 16, 16],
['Asia', 20, 20],
['', null, null],
['US', 15, 15],
['Canada', 17, 17],
['Europe', 13, 13],
['Mexico', 16, 16],
['Asia', 20, 20],
['', null, null],
['US', 15, 15],
['Canada', 17, 17],
['Europe', 13, 13],
['Mexico', 16, 16],
['Asia', 20, 20],
['', null, null],
]);
// Create and draw the visualization.
new google.visualization.ColumnChart(document.getElementById('visualization')).
draw(data,
{isStacked:true, width:800, hAxis: {showTextEvery:1, slantedText:true}}
);
}
This is probably not a great way of doing this. Basically, you are trying to make a single chart show too much information and would be far better off splitting up your data into multiple charts. For instance, you can use domain roles, or you can just use a line chart to compare the difference between trucks and cars (which you can't currently do because of the different baselines). You can also compare different countries on just cars or trucks as a standard column chart (not stacked). You can use chart interaction to allow users to pick what data they want to see. It looks like you are trying to recreate an existing, non-interactive Excel chart with an interactive technology, so the best thing is likely to rethink how the chart should be used now that interaction is available.
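If you do go with the kludge, the blank spacer rows do not have to be written out manually. Below is a minimal sketch, assuming the data is already grouped into arrays of rows; buildSpacedRows is a hypothetical name, and its output is meant to be passed to google.visualization.arrayToDataTable.

```javascript
// Hypothetical helper: join groups of rows into one table, with an empty
// spacer row (blank label, null values) before, between, and after groups.
function buildSpacedRows(header, groups) {
  // One '' label followed by a null for every value column in the header.
  var blank = [''].concat(header.slice(1).map(function () { return null; }));
  var rows = [header, blank.slice()];
  groups.forEach(function (group) {
    group.forEach(function (row) { rows.push(row); });
    rows.push(blank.slice());
  });
  return rows;
}

var spacedRows = buildSpacedRows(
  ['Country', 'Cars', 'Trucks'],
  [
    [['US', 15, 15], ['Canada', 17, 17]],
    [['US', 15, 15]]
  ]
);
```

With isStacked set to true, each spacer row draws nothing and so leaves a visible gap between the groups of stacked columns.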
I had some code that set my V axis scale from 0 - 4. However I deleted it and now I cannot remember how I got it working again. See below for my chart code, and the code I think I used before.
This is what I think I used before...
vAxis: {
viewWindowMode:'explicit',
viewWindow: {
max:100,
min:99.8
}
}
Below is my chart
// Create a line chart, passing some options
var LineChart = new google.visualization.ChartWrapper({
'chartType': 'LineChart',
'containerId': 'chart_div',
'options': {
//'width': 300,
'height': 300,
'legend': 'top',
'backgroundColor': '#eeeeee',
'colors': [ '#8ea23f'],
'pointSize': 5,
'title': 'Selected Site and Species abundance over time'
},
'view':{'columns':[0,2]},
});
There you go:
'vAxis': {'title': 'Something Here',
'minValue': 0,
'maxValue': 4},
There are two approaches to take, depending on what you need. The first one (shown in Tom's answer) sets alternative min and max values for the data set sent to the chart, e.g. the chart interprets the maximum data value it must accommodate as MAX(vAxis.maxValue, data set max value). If the data set goes outside the bounds of vAxis.minValue/maxValue, those options will essentially be ignored. Also, the chart's actual axis range is only based on the min/max values: the displayed range will include the min/max, but might go beyond them in order to produce clean intervals between axis labels.
If you need to explicitly limit the axis to a specific range, where your min and max values are the absolute limits you want displayed, then you use the vAxis.viewWindow.min/max options.
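To make the difference concrete, here is a small sketch of the two behaviours. The function accommodatedRange is a hypothetical model of the rule described above, not the library's own code, and the axis the chart actually draws may extend further to get clean label intervals. The second object uses the viewWindow options from the question, with the 0 - 4 range the asker wanted.

```javascript
// Hypothetical model of the first approach: minValue/maxValue only widen
// the range the chart must accommodate, they never clip the data.
function accommodatedRange(values, vAxis) {
  var dataMin = Math.min.apply(null, values);
  var dataMax = Math.max.apply(null, values);
  return {
    min: vAxis.minValue === undefined ? dataMin : Math.min(vAxis.minValue, dataMin),
    max: vAxis.maxValue === undefined ? dataMax : Math.max(vAxis.maxValue, dataMax)
  };
}

// Data that exceeds maxValue: the upper bound follows the data, not the option.
var softRange = accommodatedRange([1, 2, 3, 5], { minValue: 0, maxValue: 4 });

// Second approach: hard limits, so the axis shows exactly 0 to 4.
var hardLimitOptions = {
  vAxis: {
    viewWindowMode: 'explicit',
    viewWindow: { min: 0, max: 4 }
  }
};
```

In this example `softRange` is {min: 0, max: 5}, whereas the viewWindow version would cut the value 5 off at the top of the chart.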
https://developers.google.com/chart/interactive/docs/gallery/linechart?hl=ja#Configuration_Options
How can I specify different colors for a single line dependent on the particular range? For example, one line should be blue from 0 to 10 and red from 10 to 20.
Short answer: you can't.
Long answer: you can build a lengthy workaround, but each step will present different issues that you may find worse than a single-colored line.
Run a loop through your data so that each color is in a different column for each series. This is the quickest way.
For instance, if you have data:
var data = google.visualization.arrayToDataTable([
['Month', 'Data'],
['2004/05', 123],
['2005/06', 234],
['2006/07', 345],
['2007/08', 456],
['2008/09', 789]
]);
And you want to split it into 3 colors:
< 300
300-600
> 600
You can write a script to make your data look like this:
var data = google.visualization.arrayToDataTable([
['Month', 'Red', 'Green', 'Blue'],
['2004/05', 123, null, null],
['2005/06', 234, null, null],
['2006/07', null, 345, null],
['2007/08', null, 456, null],
['2008/09', null, null, 789]
]);
The above will give you colored points for each different range. If you actually want the lines connecting them to change color as soon as they cross a certain threshold, you have to calculate the intercept of 300 with whatever day, and add that to the series. Same with 600. You'd also have to change the "Month" series to actually be date values so you can set the point in between correctly. Of course, those intermediate points will show up too, which is a different headache...
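The script mentioned above might look something like the following sketch. Both function names are made up for illustration. splitByThresholds moves each value into the column of its range and treats a value equal to a threshold as belonging to the higher range, which is an assumption since the ranges above do not say. crossingX is the straight-line interpolation you would need to find where a segment crosses a threshold, assuming numeric x values.

```javascript
// Hypothetical helper: spread a single data column over one column per range.
// `thresholds` must be ascending, and `labels` needs one more entry than it.
function splitByThresholds(rows, thresholds, labels) {
  var header = [rows[0][0]].concat(labels);
  var body = rows.slice(1).map(function (row) {
    var value = row[1];
    var band = 0;
    while (band < thresholds.length && value >= thresholds[band]) {
      band++;
    }
    var out = [row[0]];
    for (var i = 0; i < labels.length; i++) {
      out.push(i === band ? value : null);
    }
    return out;
  });
  return [header].concat(body);
}

// Hypothetical helper: x position where the straight line from (x0, y0)
// to (x1, y1) reaches `threshold`. Assumes y0 and y1 differ.
function crossingX(x0, y0, x1, y1, threshold) {
  return x0 + (threshold - y0) * (x1 - x0) / (y1 - y0);
}

var splitRows = splitByThresholds(
  [
    ['Month', 'Data'],
    ['2004/05', 123],
    ['2005/06', 234],
    ['2006/07', 345],
    ['2007/08', 456],
    ['2008/09', 789]
  ],
  [300, 600],
  ['Red', 'Green', 'Blue']
);

var crossing = crossingX(0, 200, 10, 400, 300);
```

The resulting `splitRows` array matches the hand-written table above and can be handed to arrayToDataTable. In the last line, a segment rising from 200 to 400 over x = 0 to 10 crosses 300 at x = 5.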
You can also fiddle around with domains but those won't help you with the coloring (but will help you to connect the different points to the same series).