I am trying to graph a series of data sets, each with different x values, using Google Charts. It seems, however, that the line and scatter charts only support a single set of x values shared by all series.
The data sets I have are basically csv strings like these:
Set1:
x1, x2, x3, x4
y1, y2, y3, y4
Set2:
x5, x6, x7, x8
y5, y6, y7, y8
The x's may be the same in each series, or they may differ.
I would prefer to just be able to throw the x and y values into the chart instead of going through all the data to find the x values and then make sure each set has the same x values.
Is it possible?
Yes. You can use the same 'X' value multiple times.
var data = new google.visualization.DataTable();
data.addColumn('number', 'X');
data.addColumn('number', 'Set 1');
data.addColumn('number', 'Set 2');
data.addRows([
[x1, y1, null],
[x2, y2, null],
[x3, y3, null],
[x4, y4, null],
[x5, null, y5],
[x6, null, y6],
[x7, null, y7],
[x8, null, y8],
]);
Alternatively, you can make them all part of the same series (if you want one color). So long as you aren't connecting them with lines, you won't be able to tell the difference. You can have the same X value a dozen times if you'd like, with separate points for Y for each one.
I would write a function to turn your CSV values into a format like the above.
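A conversion function along those lines could look like the sketch below. It is written in Python on the assumption that the rows are prepared before they reach the page (for example on a server) and passed to data.addRows() as JSON. The name series_to_rows and the sample numbers are made up for illustration; the same loop translates directly to JavaScript.

```python
import csv
import io
import json


def series_to_rows(series_csv):
    """Turn two-line CSV strings (x line, then y line) into DataTable-style rows.

    Each row is [x, y_for_set_1, y_for_set_2, ...] with None (null in JSON)
    in every column that does not belong to the series the point came from.
    """
    n = len(series_csv)
    rows = []
    for idx, text in enumerate(series_csv):
        # Only the first two CSV lines are used: x values, then y values.
        xs, ys = list(csv.reader(io.StringIO(text), skipinitialspace=True))[:2]
        for x, y in zip(xs, ys):
            row = [float(x)] + [None] * n
            row[idx + 1] = float(y)
            rows.append(row)
    return rows


# Hypothetical sample data standing in for Set1 and Set2.
set1 = "1, 2, 3, 4\n10, 20, 30, 40"
set2 = "2, 5, 6, 7\n15, 25, 35, 45"
rows = series_to_rows([set1, set2])
print(json.dumps(rows))
```

The x value 2 appears in both sets here and simply produces two rows, which is exactly the repeated-X layout shown above.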
Input File
AAAAAA this is some content.
This is AAAAAA some more content BBBBBB. BBBBBB BBBBBB
This is yet AAAAAA some more BBBBBB BBBBBB BBBBBB content.
I can accomplish this partially with this code:
awk '/AAAAAA/{gsub("AAAAAA", "x"++i)}1' test.txt > test1.txt
awk '{for(x=1;x<=NF;x++)if($x~/BBBBBB/){sub(/BBBBBB/,"y"++i)}}1' test1.txt
Output:
x1 this is some content.
This is x2 some more content y1. y2 y3
This is yet x3 some more y4 y5 y6 content.
Is there any way to get this output?
Expected Output:
x1 this is some content.
This is x2 some more content y1. y2 y3
This is yet x3 some more y1 y2 y3 content.
Another one:
$ awk '{sub("AAAAAA","x"(++x)); y=0; while(sub("BBBBBB","y"(++y)));}1' file
x1 this is some content.
This is x2 some more content y1. y2 y3
This is yet x3 some more y1 y2 y3 content.
You may use this single awk:
awk '{
   j=0
   for (x=1; x<=NF; x++)
      if ($x ~ /^A{6}/)
         sub(/^A{6}/, "x" (++i), $x)
      else if ($x ~ /^B{6}/)
         sub(/^B{6}/, "y" (++j), $x)
} 1' file
x1 this is some content.
This is x2 some more content y1. y2 y3
This is yet x3 some more y1 y2 y3 content.
You just need to reset i to 0 after each line has been processed in the second command:
awk '/AAAAAA/{gsub("AAAAAA", "x"++i)}1' test.txt > test1.txt
awk '{for(x=1;x<=NF;x++)if($x~/BBBBBB/){sub(/BBBBBB/,"y"++i)}{i=0}}1' test1.txt
Output:
x1 this is some content.
This is x2 some more content y1. y2 y3
This is yet x3 some more y1 y2 y3 content.
Here is an alternate awk that is easily extended to add as many tags as you wish:
awk 'BEGIN{ rep["AAAAAA"]="x"; cnts["AAAAAA"]=1; reset["AAAAAA"]=0
            rep["BBBBBB"]="y"; cnts["BBBBBB"]=1; reset["BBBBBB"]=1
            # and so on...
}
{
    for (e in rep) {
        cnts[e]=(reset[e]) ? reset[e] : cnts[e]
        while ( sub(e,rep[e] (cnts[e]++) ) )
            ; # empty statement since work is inside the while
    }
} 1' file
Prints:
x1 this is some content.
This is x2 some more content y1. y2 y3
This is yet x3 some more y1 y2 y3 content.
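For comparison, the counting rule that all of these answers implement (one AAAAAA counter that runs across the whole file, one BBBBBB counter that restarts on every line) can be sketched outside awk as well. The Python below is only an illustration of that rule; the function name renumber is made up.

```python
import itertools
import re


def renumber(lines):
    """Replace AAAAAA with x1, x2, ... across all lines, and BBBBBB with
    y1, y2, ... restarting from y1 on every line."""
    x_counter = itertools.count(1)  # shared across the whole input
    result = []
    for line in lines:
        y_counter = itertools.count(1)  # restarts on every line
        line = re.sub(r"AAAAAA", lambda m: f"x{next(x_counter)}", line)
        line = re.sub(r"BBBBBB", lambda m: f"y{next(y_counter)}", line)
        result.append(line)
    return result


sample = [
    "AAAAAA this is some content.",
    "This is AAAAAA some more content BBBBBB. BBBBBB BBBBBB",
    "This is yet AAAAAA some more BBBBBB BBBBBB BBBBBB content.",
]
for out_line in renumber(sample):
    print(out_line)
```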
Why is the result of the differentiation not 2*x0 in the following code:
In [54]: import sympy
In [55]: x = [sympy.Symbol('x%d' % i, real=True) for i in range(3)]
In [56]: x
Out[56]: [x0, x1, x2]
In [57]: sympy.diff('x0*x0 + x1*x1 + x2*x2',x[0])
Out[57]: 0
First, the creation of multiple numbered symbols is simpler with
x = sympy.symbols('x0:3', real=True) # returns (x0, x1, x2)
Second, the SymPy function to turn a string into a SymPy expression is sympify. This function is called automatically when you provide input as a string; however, this gives you no control over the interpretation of the string, and "unexpected results" are likely.
In this case, SymPy is not sure that "x0" appearing in the string is the same as x0 you created earlier. After all, your x0 has the additional property of being real, and the symbol from the string has no such assumptions on it. It's Symbol('x0') vs Symbol('x0', real=True); not a match.
This is one of many reasons why passing a string directly to a SymPy function is a bad idea. Use sympify explicitly, and read about its parameters, which control the parsing of input. Specifically, the locals parameter is a dictionary mapping names in the string to objects you already have in SymPy, which is precisely what is needed here.
locals = {'x{}'.format(i): x[i] for i in range(3)} # {'x0': x0, 'x1': x1, 'x2': x2}
expr = sympy.sympify('x0*x0 + x1*x1 + x2*x2', locals=locals)
Now you can differentiate expr with respect to any symbols and get expected results
[expr.diff(sym) for sym in x] # [2*x0, 2*x1, 2*x2]
(Another benefit of having an expression before trying diff is that you can invoke diff as a method of the expression, saving the trouble of typing sympy. prefix.)
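The mismatch described above can be checked directly. The following is a minimal sketch, assuming SymPy is installed; the variable names are chosen only for illustration.

```python
import sympy

x0_real = sympy.Symbol('x0', real=True)   # the symbol created with an assumption
x0_plain = sympy.Symbol('x0')             # what parsing a bare string produces

# Same name, different assumptions: SymPy treats these as different symbols.
print(x0_real == x0_plain)                # False

# Parsing without locals yields an expression in the plain symbol,
# so differentiating with respect to the real symbol gives 0.
parsed = sympy.sympify('x0*x0')
d_wrong = parsed.diff(x0_real)
print(d_wrong)                            # 0

# Mapping the name to the existing symbol makes the two agree.
fixed = sympy.sympify('x0*x0', locals={'x0': x0_real})
d_right = fixed.diff(x0_real)
print(d_right)                            # 2*x0
```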
In your declarations, you should use sympy.symbols, which is the reference method (per the documentation and tutorial) for declaring variables.
x = [sympy.symbols('x%d' % i, real=True) for i in range(3)]
On top of this, based on my own experiments, you must pass either strings for both arguments, as in:
sympy.diff('x0*x0 + x1*x1 + x2*x2',str(x[0]))
or symbolic expressions on both sides:
sympy.diff(x[0]*x[0] + x[1]*x[1] + x[2]*x[2], x[0])
Trying to convert the following durations into seconds
x <- "1005d 16h 09m 57s"
x1 <- "16h 09m 57s"
x2 <- "06h 09m 57s"
x3 <- "09m 57s"
x4 <- "57s"
I've modified the answer from Jthorpe in this post Convert factor of format Hh Mm Ss to time duration.
days <- as.numeric(gsub('^*([0-9]+)d.*$','\\1',x3))
hours <- as.numeric(gsub('^.*([0-9][0-9])h.*$','\\1',x3))
minutes <- as.numeric(gsub('^.*([0-9][0-9])m.*$','\\1',x4))
seconds <- as.numeric(gsub('^.*([0-9][0-9])s.*$','\\1',x4))
duration_seconds <- seconds + 60*minutes + 60*60*hours + 24*60*60*days
However, this is only working with x, but not x1-x4. Now, I know I can probably use if logic to get around the issue, but is there a better way?
Thanks in advance.
We can replace the whitespace (\\s+) with + using gsub, then replace 'd', 'h', 'm' and 's' with the corresponding multipliers using gsubfn, and finally loop through the output and evaluate each string as an arithmetic expression.
library(gsubfn)
v2 <- gsubfn("[a-z]", list(d="*24*60*60", h = "*60*60", m = "*60",
s="*1"), gsub("\\s+", "+", v1))
unname(sapply(v2, function(x) eval(parse(text=x))))
#[1] 86890197 58197 22197 597 57
data
v1 <- c(x, x1, x2, x3, x4)
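Use:
ifelse(is.na(your_exp), 0, your_exp)
so that whenever the result of your expression is NA it becomes 0, and otherwise its value is kept. Note that ifelse needs both the "yes" and the "no" argument; with only ifelse(is.na(your_exp), 0) it fails as soon as the value is not NA.
E.g.:
na0 <- function(v) ifelse(is.na(v), 0, v)  # replace NA with 0, keep everything else
days <- na0(as.numeric(gsub('^([0-9]+)d.*$','\\1',x1)))  # as.numeric warns when the unit is absent
hours <- na0(as.numeric(gsub('^.*([0-9][0-9])h.*$','\\1',x1)))
minutes <- na0(as.numeric(gsub('^.*([0-9][0-9])m.*$','\\1',x1)))
seconds <- na0(as.numeric(gsub('^.*([0-9][0-9])s.*$','\\1',x1)))
Output (after duration_seconds <- seconds + 60*minutes + 60*60*hours + 24*60*60*days):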
> duration_seconds
[1] 58197
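Both answers rest on the same arithmetic: each number is multiplied by the number of seconds in its unit, and a missing unit contributes nothing. As a cross-check of that arithmetic, here is a short sketch in Python rather than R; the function name duration_to_seconds and the UNIT_SECONDS table are made up for illustration.

```python
import re

# Seconds per unit suffix.
UNIT_SECONDS = {"d": 24 * 60 * 60, "h": 60 * 60, "m": 60, "s": 1}


def duration_to_seconds(text):
    """Sum every '<number><unit>' pair found in text; absent units add nothing."""
    total = 0
    for amount, unit in re.findall(r"(\d+)\s*([dhms])", text):
        total += int(amount) * UNIT_SECONDS[unit]
    return total


durations = ["1005d 16h 09m 57s", "16h 09m 57s", "06h 09m 57s", "09m 57s", "57s"]
print([duration_to_seconds(d) for d in durations])
# [86890197, 58197, 22197, 597, 57]
```

Because the pattern only collects the units that are actually present, no NA handling is needed for the shorter strings.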
I have a dataset in which one column has values in the format [A-Z][A-Z][0-1][0-9][0-1][0-1][0-1][0-9][0-9], e.g. AC1200019.
Now I want to convert this format to [A-Z][A-Z][-][0-1][0-9][-][0-1][0-1][0-1][-][0-9][0-9], e.g. AC-12-000-19.
([A-Z][A-Z])([0-1][0-9])([0-1][0-1][0-1])([0-9][0-9])
Try this. Replace with $1-$2-$3-$4 or \\1-\\2-\\3-\\4. See the demo:
https://regex101.com/r/uK9cD8/5
Try
gsub('^([A-Z]{2})([0-1][0-9])([0-1]{3})([0-9]{2})', '\\1-\\2-\\3-\\4', str1)
#[1] "AC-12-000-19"
data
str1 <- 'AC1200019'
Assuming every value in the column has the same number of characters, here is a simple version.
library(stringr)
x <- data.frame(X1 = c("AC1510018", "AC1200019", "BT1801007"))
paste(str_sub(x$X1,1,2), str_sub(x$X1,3,4),
str_sub(x$X1,5,7), str_sub(x$X1,8,9) , sep= "-")
I like the dplyr suite, so here is a version using dplyr and tidyr:
library(dplyr)
library(tidyr)
x %>%
separate(X1, into = c("X2", "X3", "X4", "X5"), sep = c(2,4,7)) %>%
unite("X1", X2, X3, X4, X5, sep="-")
or
x %>%
transmute(X2 = paste(str_sub(X1,1,2), str_sub(X1,3,4),
str_sub(X1,5,7), str_sub(X1,8,9) , sep= "-"))
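The capture-group replacement used in the regex answers is not specific to R. As an illustration only, the same pattern and backreferences can be sketched in Python; the name add_dashes is made up, and unlike the gsub call above this version also anchors the end of the string with $.

```python
import re

# Four capture groups: two letters, two digits, three digits, two digits.
PATTERN = re.compile(r"^([A-Z]{2})([01][0-9])([01]{3})([0-9]{2})$")


def add_dashes(code):
    """Insert dashes between the four groups; non-matching input is returned unchanged."""
    return PATTERN.sub(r"\1-\2-\3-\4", code)


for value in ["AC1510018", "AC1200019", "BT1801007"]:
    print(add_dashes(value))
```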
I am trying to plot yearly data on a geochart. I would like the most recent data on top, but for whatever reason, the earliest year is always on top in the actual visualization.
I have tried re-ordering the table to have the latest years as the first entries in the data with no effect.
I thought that maybe it was happening because I used a view to filter my data, but the filter is not reordering the items with the older ones first (so that shouldn't impact how it is displayed).
I do not want to filter out data since I use transparency to display all points. Here is some sample code that displays the same problem:
function drawVisualization() {
    var data = new google.visualization.DataTable();
    data.addColumn('number', 'Latitude');
    data.addColumn('number', 'Longitude');
    data.addColumn('number', 'Color');
    data.addColumn('number', 'Output (MW)');
    data.addRows([
        [35, 135, 2, 334],
        [35, 135, 1, 100],
        [35.1, 135.1, 1, 100],
        [35.1, 135, 1, 100],
        [35, 135.1, 1, 100],
        [34.9, 134.9, 1, 100],
        [34.9, 135, 1, 100],
        [35, 135.1, 1, 100],
    ]);
    var geochart = new google.visualization.GeoChart(
        document.getElementById('visualization'));
    geochart.draw(data, {
        colorAxis: {
            'minValue': 1,
            'maxValue': 2,
            'values': [1, 2],
            'colors': ['black','red'],
        },
        'markerOpacity': 0.5,
        'region': 'JP'
    });
}
I can change the values in column 2 or 3 (0-indexed), or I can change the order of the entries in the data table, but I keep getting the same result. I have a feeling it always puts larger markers at the back so you can still see the smaller ones, but I'm wondering if there is any authoritative reference on this, or any way to get around it.
This is what it looks like no matter what I do:
What I want it to look like is as follows (manipulated the SVG manually to adjust the Z-order):
I played around with it for a bit, and I think you're right: it's automatically z-indexing the markers in size-order. If I read your intent correctly, you are looking to show some subset of years, and you want the markers to be z-indexed by years. I think you can accomplish that with some custom filtering: sort your data by location and year, then for every location, filter out every year with a smaller size than any of the newer years. Something like this should work:
// order by location and year (descending)
var rows = data.getSortedRows([0, 1, {column: 2, desc: true}]);
// parse the rows backwards, removing all years where a location has a newer year with a larger size value
// we don't need to parse row 0, since that will always be the latest year for some location
var size, lat, long;
for (var i = rows.length - 1; i > 0; i--) {
    size = data.getValue(rows[i], 3);
    lat = data.getValue(rows[i], 0);
    long = data.getValue(rows[i], 1);
    // walk back through earlier rows for the same location (compare against row j, not row i)
    for (var j = i - 1; j >= 0 && lat == data.getValue(rows[j], 0) && long == data.getValue(rows[j], 1); j--) {
        if (size < data.getValue(rows[j], 3)) {
            rows.splice(i, 1);
            break;
        }
    }
}
var view = new google.visualization.DataView(data);
view.setRows(rows);
Here's a working example based on your code: http://jsfiddle.net/asgallant/36AmD/
You are correct that the order of the markers is determined by the size, with the larger markers drawn first so they end up below the smaller markers, which is a convenience for most applications. If you wish to hide 'later' markers based on order, you'll have to do that another way, perhaps by hiding the rows of data.
Is there a reason it makes sense to hide data if it covers 'earlier' data? Perhaps an option could be added to disable this automatic reordering, especially if transparent colors are used to allow you to see through.
Try this; it helped me in a project:
setTimeout(function () {
$('.google-visualization-table').css("z-index", "1");
}, 500);