Line chart with a lot of data: skip labels - chart.js

I'd like to draw a line chart with data for a year. The total number of points will be about 264.
I don't want a label on the x-axis for every point. Say I have about 22 points in a month; I'd like to show 3 labels per month.
What is the most elegant way to do that?

Keep one array that stores all the values you have (useful if you later want to update the chart, display other values, or reduce AJAX calls). Then build a second array that holds only the data you want to display. You need a function that copies and filters the full array.
Here I use every fifth element:
let delta = 5;
let displayedData = [];
for (let i = 0; i < allData.length; i += delta) {
    displayedData.push(allData[i]);
}
You can calculate the delta in a different way or use a completely different approach to get your data.
Side note: don't use .filter() here, because it loops through all of your data. Use direct index access, as I did above.
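As a rough sketch of the idea above: the stride loop can be wrapped in a small function, and if you would rather keep all 264 points and thin only the labels, a second helper can blank out the labels you do not want. Both helper names (sampleEvery, thinLabels) and the sample data are made up for illustration, and whether your Chart.js version renders an empty string as "no label" is an assumption you should verify.

```javascript
// Hypothetical helper: the stride loop from the answer, wrapped in a function.
function sampleEvery(allData, delta) {
  const displayed = [];
  for (let i = 0; i < allData.length; i += delta) {
    displayed.push(allData[i]);
  }
  return displayed;
}

// Hypothetical helper: keep every `step`-th label and blank out the rest,
// so that every data point can still be plotted.
function thinLabels(labels, step) {
  return labels.map((label, i) => (i % step === 0 ? label : ""));
}

// Made-up example: 22 points in a month, about 3 labels per month.
const days = Array.from({ length: 22 }, (_, i) => `day ${i + 1}`);
const step = Math.ceil(days.length / 3); // 8

console.log(thinLabels(days, step).filter((label) => label !== ""));
// [ 'day 1', 'day 9', 'day 17' ]
console.log(sampleEvery(days, 5));
// [ 'day 1', 'day 6', 'day 11', 'day 16', 'day 21' ]
```

The first helper shrinks the dataset itself, as in the answer; the second leaves the data untouched and changes only what is printed on the axis.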

Related

How to get y axis range in Stata

Suppose I am using some twoway graph command in Stata. Without any action on my part Stata will choose some reasonable values for the ranges of both y and x axes, based both upon the minimum and maximum y and x values in my data, but also upon some algorithm that decides when it would be prettier for the range to extend instead to a number like '0' instead of '0.0139'. Wonderful! Great.
Now suppose that after (or while) I draw my graph, I want to slap some very important text onto it, and I want to be choosy about precisely where the text appears. Having the minimum and maximum values of the displayed axes would be useful: how can I get these min and max numbers? (Either before or while calling the graph command.)
NB: I am not asking how to set the y or x axis ranges.

This issue has been a bit of a headache for me for quite some time, and I believe there is no good solution out there yet, so I wanted to write up two ways in which I was able to solve a problem similar to the one described in the post. Specifically, I used them to solve the issue of gray shading for part of a graph.
Define a global macro in the code generating the axis labels
This is the less elegant way to do it, but it works well. Locate the tickset_g.class file in your ado path. The graph twoway command uses this file to draw the axes of any graph. There, in the draw program, I defined a global macro that takes the value of the omin and omax locals after they have been set to the minimum of the axis range and the data range (the command that does this is local omin = min(.scale.min,omin), and analogously for the max), since the latter sometimes exceeds the former. You could also define the global further up in that code block to get only the axis extent. You can then access the axis range through the globals after the graph command (and use something like addplot to add to the previously drawn graph).
Two caveats apply to this approach. First, using global macros is, as far as I understand, bad practice and can be dangerous; I used names with the prefix userwritten, which I was sure wouldn't be used by any program. Second, depending on your organization's policies, you may not have the administrator privileges needed to alter this file. However, it is the simpler way. If you prefer a more elegant approach along the lines of what Nick Cox suggested, then you can:
Use the undocumented gdi natscale command to define your own axis labels
The gdi commands are the internal commands used to generate what you see as graph output (cf. https://www.stata.com/meeting/dcconf09/dc09_radyakin.pdf). The tickset_g.class file uses the gdi natscale command to generate the nice numbers on the axes. Basic documentation is available with help _natscale. Basically, you enter the minimum and maximum (e.g. from a summarize return) and a suggested number of steps, and the command returns a min, max, and delta to be used in the x|ylabel option (there are several possible ways, all rather straightforward once you have those numbers, so I won't spell them out for brevity). You would have to adjust this approach if you use a scale transformation.
Hope this helps!

I like Nick's suggestion, but if you're really determined, it seems that you can find these values by inspecting the output after you set trace on. Here's some inefficient code that seems to do exactly what you want. Three notes:
1. When I import the log file I get this message:
Note: Unmatched quote while processing row XXXX; this can be due to a formatting problem in the file or because a quoted data element spans multiple lines. You should carefully inspect your data after importing. Consider using option bindquote(strict) if quoted data spans multiple lines or option bindquote(nobind) if quotes are not used for binding data.
2. Sometimes the data fall outside of the min and max range values that are chosen for the graph's axis labels (but you can easily test for this).
3. The log linesize is actually important to my code below because the key values must fall on the same line as the strings that I use to identify the helpful rows.
* start a log (critical step for my solution)
cap log close _all
set linesize 255
log using "log", replace text
* make up some data:
clear
set obs 3
gen xvar = rnormal(0,10)
gen yvar = rnormal(0,.01)
* turn trace on, run the -twoway- call, and then turn trace off
set trace on
twoway scatter yvar xvar
set trace off
cap log close _all
* now read the log file in and find the desired info
import delimited "log.log", clear
egen my_string = concat(v*)
keep if regexm(my_string,"forvalues yf") | regexm(my_string,"forvalues xf")
drop if regexm(my_string,"delta")
split my_string, parse("=") gen(new)
gen axis = "vertical" if regexm(my_string,"yf")
replace axis = "horizontal" if regexm(my_string,"xf")
keep axis new*
duplicates drop
loc my_regex = "(.*[0-9]+)\((.*[0-9]+)\)(.*[0-9]+)"
gen min = regexs(1) if regexm(new3,"`my_regex'")
gen delta = regexs(2) if regexm(new3,"`my_regex'")
gen max_temp= regexs(3) if regexm(new3,"`my_regex'")
destring min max_temp delta, replace
gen max = min + delta* int((max_temp-min)/delta)
*here is the info you want:
list axis min delta max

AmCharts4: am4core.percent() not working as intended

I have the following graph:
"1" is a LineSeries, and "2" is a ColumnSeries. I set the width of the ColumnSeries like this:
series.columns.template.width = am4core.percent(90);
But as you can see, the columns are far away from 90% width.
Interestingly, without the LineSeries, it looks like this, which is what I want it to look like:
Furthermore, if I write a very high value (80 000) instead of 90, I get the desired columns:
I noticed that the DateAxis behaves differently (different times are shown), but I cannot see where this is coming from.
Also, such a high percentage value is not a solution, because it produces different widths on different graphs.

After experimenting, I found that the only way to avoid this is to give the columns their own DateAxis.
If, for example, you want a line plus columns, you can add one DateAxis for the line and another DateAxis for the columns.
If you have multiple column series, you can either use one DateAxis for all of them or give each column series its own. The latter makes the columns "stack", so they sit at exactly the same positions if both datasets use the same dates.
Furthermore, doing so means that you have to disable labels on all DateAxes except one; otherwise the labels on the x-axis will stack.
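A minimal sketch of that setup might look like the following. It assumes that am4core and am4charts are the amCharts 4 namespaces already loaded on the page and that chart is an existing XYChart; the function name and the data field names ("date", "lineValue", "columnValue") are made up for illustration.

```javascript
// Sketch only: `am4core` and `am4charts` are assumed to be the amCharts 4
// namespaces, and `chart` an existing XYChart. Field names are hypothetical.
function addLineAndColumns(chart, am4core, am4charts) {
  // One DateAxis for the line, a separate one for the columns.
  const lineAxis = chart.xAxes.push(new am4charts.DateAxis());
  const columnAxis = chart.xAxes.push(new am4charts.DateAxis());
  // Show labels on one axis only; otherwise they are drawn twice.
  columnAxis.renderer.labels.template.disabled = true;

  chart.yAxes.push(new am4charts.ValueAxis());

  const line = chart.series.push(new am4charts.LineSeries());
  line.xAxis = lineAxis;
  line.dataFields.dateX = "date";
  line.dataFields.valueY = "lineValue";

  const columns = chart.series.push(new am4charts.ColumnSeries());
  columns.xAxis = columnAxis;
  columns.dataFields.dateX = "date";
  columns.dataFields.valueY = "columnValue";
  columns.columns.template.width = am4core.percent(90);

  return { lineAxis, columnAxis, line, columns };
}
```

Passing the namespaces in as parameters is only a convenience for this sketch; in a real page you would normally use the global or imported am4core and am4charts directly.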

AmCharts4: Multiple data arrays

In my application, I can view multiple charts at the same time. Those charts can be displayed either separately or overlaid. Overlaid means that the charts all share the same chart div, like this:
The problem is: if I want to add the data, I need to build a single array out of all the data points, looking like this:
let data=[
{
date:...,
data1:...,
data2:...,
.....
},
{
....
}
]
But when I load the data from the server, I get separate data arrays that I have to merge. That means I have to look at every single data point and create a new entry in the merged array if there is no data point at the given date yet.
Doing this with thousands of data points uses far too many resources. My question is: is it possible to supply multiple data arrays instead of only one?

Using series.data instead of chart.data solved the problem for me!
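As a hedged illustration of what that can look like: each series gets its own array through series.data, so nothing has to be merged by date. The sketch assumes am4charts is the amCharts 4 charts namespace and that chart is an XYChart which already has a DateAxis and a ValueAxis; the function name and the shape of datasets are hypothetical.

```javascript
// Sketch only: `am4charts` is assumed to be the amCharts 4 charts namespace.
// `datasets` has a made-up shape: [{ name, points: [{ date, value }] }, ...].
function addSeriesWithOwnData(chart, am4charts, datasets) {
  return datasets.map((dataset) => {
    const series = chart.series.push(new am4charts.LineSeries());
    series.name = dataset.name;
    series.dataFields.dateX = "date";
    series.dataFields.valueY = "value";
    // Each series carries its own array instead of reading from chart.data.
    series.data = dataset.points;
    return series;
  });
}
```

With this layout the arrays loaded from the server can be handed over as they arrive, one per series.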

spotfire plot list of elements

I have a data table that has this format:
and I want to plot temperature against time. Any idea how to do that?

This can be done in a TERR data function. I don't know how comfortable you are with integrating Spotfire and TERR; there is an intro video here, for instance (the demo starts at about minute 7):
https://www.youtube.com/watch?v=ZtVltmmKWQs
With that in mind, I wrote the script without loading any library, so it is quite verbose and explicit, but hopefully simpler to follow step by step. I am sure there is a more elegant way, and there are better ways of making it flexible with column names, but this is a start.
Your input will be a data table (dt, the original data) and the output a new data table (dt.out, the transformed data). All column names (and some values) are addressed explicitly in the script (so if you change them it won't work).
#remove the []
dt$Values=gsub('\\[|\\]','',dt$Values)
#separate into two different data frames, one for time and one for temperature
dt.time=dt[dt$Description=='time',]
dt.temperature=dt[dt$Description=='temperature',]
#split the columns we want to separate into a list of vectors
dt2.time=strsplit(as.character(dt.time$Values),',')
dt2.temperature=strsplit(as.character(dt.temperature$Values),',')
#rearrange times
names(dt2.time)=dt.time$object
dt2.time=stack(dt2.time) #stack vectors
dt2.time$id=c(1:nrow(dt2.time)) #assign running id for merging later
colnames(dt2.time)[colnames(dt2.time)=='values']='time'
#rearrange temperatures
names(dt2.temperature)=dt.temperature$object
dt2.temperature=stack(dt2.temperature) #stack vectors
dt2.temperature$id=c(1:nrow(dt2.temperature)) #assign running id for merging later
colnames(dt2.temperature)[colnames(dt2.temperature)=='values']='temperature'
#merge time and temperature
dt.out=merge(dt2.time,dt2.temperature,by=c('id','ind'))
colnames(dt.out)[colnames(dt.out)=='ind']='object'
dt.out$time=as.numeric(dt.out$time)
dt.out$temperature=as.numeric(dt.out$temperature)
Gaia

Because all of the example rows you've shown here contain exactly four list items and you haven't specified otherwise, I'll assume that all of the data fits this format.
With this assumption, it becomes pretty trivial, albeit a little messy, to split the values out into columns using the RXReplace() expression function.
You can create four calculated columns, each with an expression like:
Int(RXReplace([values],"\\[([\\d\\-]+),([\\d\\-]+),([\\d\\-]+),([\\d\\-]+)]","\\1",""))
The third argument "\\1" determines which number in the list to extract. Backslashes are doubled ("escaped") per the requirements of the RXReplace() function.
Note that this example assumes the numbers are all whole numbers. If you have decimals, you'd need to adjust each "phrase" of the regular expression to ([\\d\\-\\.]+), and you'd need to wrap the expression in Real() rather than Int() (if you leave this wrapper out, the result will be a String type, which could cause confusion later on when working with the data).
Once you have the four columns, you'll be able to unpivot to get the data easily.
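To make the capture-group logic easier to try outside Spotfire, here is the same idea sketched in JavaScript. This is not Spotfire expression syntax: extractNth is a made-up helper, the pattern already uses the decimal-friendly variant, and it assumes exactly four comma-separated numbers with no spaces. In a JavaScript regex literal the backslashes are single, unlike in the RXReplace() string.

```javascript
// Hypothetical helper: return the n-th number (1-based) from a bracketed,
// comma-separated list of exactly four values, or null if the text does not match.
function extractNth(values, n) {
  const pattern = /\[([\d\-.]+),([\d\-.]+),([\d\-.]+),([\d\-.]+)\]/;
  const match = pattern.exec(values);
  // match[1]..match[4] are the four capture groups, like "\\1".."\\4" above.
  return match ? Number(match[n]) : null;
}

console.log(extractNth("[10,-3,2.5,7]", 3)); // 2.5
```

Changing n plays the role of swapping "\\1" for "\\2", "\\3", or "\\4" in the four calculated columns.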

Maths Operations on Columns from Different Data Frames

I have two data frames, imported through Pandas from Fama French and Yahoo. I am trying to compare column values from the two data frames (more specifically, subtract one from the other), but a value error occurs whenever I try doing so. The data frames have different indexing and I don't know how to take this factor into account (I'm quite new to python & pandas).
Here is the code in question:
start, end = dt.datetime.now()-dt.timedelta(days=60*30), dt.datetime.now()
f = data.DataReader('F-F_Research_Data_Factors', 'famafrench', start, end)[0]
s = data.get_data_yahoo('aapl', start, end)
s = s.resample('M', how='last')
s['returns'] = s['Adj Close'].pct_change()
Ideally, I would like to create a series with row values = f['RF'] - s['returns']
Any help would be much appreciated.

Convert f.index to month-end dates so that the two indexes line up, then subtract:
f.index = f.index.to_datetime() + pd.offsets.MonthEnd()
f['RF'] - s['returns']

Ask yourself: how could you possibly define a difference between two matrices of different sizes?
The first thing to do is to match the two data frames on a common value (say, the date). Then you will be able to do any operation you want.
