DolphinDB reactive state engine: Calculate the metrics of the last 10 records in a column - state

When processing stream data with the reactive state engine, I want to count how many of the last ten rows have the same value as the current row. Is there a way to do this?

You can combine a user-defined aggregate function with the moving function.
(1) Define an aggregate function that counts the elements of the window equal to its last element (the current row, which is itself included in the count):
defg ncount(x){
    return sum(x == x.tail())
}
(2) Pass ncount to the moving function and use the resulting expression in the metrics parameter of the engine:
metrics = [<time>, <moving(ncount, price, 10)>]
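To make the windowing logic concrete, here is a minimal sketch in plain JavaScript (not DolphinDB code). The names movingCount and prices are made up for illustration, and the window size is 3 to keep the sample short. It also counts partial windows at the start as-is, which may differ from how the engine treats the first rows.

```javascript
// Plain-JavaScript simulation of the idea behind moving(ncount, price, 10).
// This is an illustration only, not DolphinDB code.

// Count how many elements of the window equal its last element.
function ncount(window) {
  const last = window[window.length - 1];
  return window.filter((v) => v === last).length;
}

// Apply ncount over a trailing window ending at each row.
// Assumption: early rows use whatever partial window is available.
function movingCount(values, size) {
  return values.map((_, i) => {
    const start = Math.max(0, i - size + 1);
    return ncount(values.slice(start, i + 1));
  });
}

const prices = [5, 7, 5, 5, 7, 9];
const counts = movingCount(prices, 3);
console.log(counts); // [ 1, 1, 2, 2, 1, 1 ]
```

Each count includes the current row itself, so the smallest possible result is 1.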

aws cloudwatch metrics - AVG over a range

I want to graph the average of CDCLatencySource and CDCLatencyTarget across a few ARNs.
The CDCLatencySource metrics are m1, m2, m3, m4.
The CDCLatencyTarget metrics are m5, m6, m7, m8.
So I added another row, AVG([m1,m4]), for the source and did the same for the target.
But it looks like it averages only m1 and m4, not the whole range.
What am I missing?
You will need to include all metrics, so for your CDCLatencySource it would be AVG([m1,m2,m3,m4]).
Similarly for CDCLatencyTarget the value would be AVG([m5,m6,m7,m8])
The functions do not accept ranges; they take each metric ID individually in the list passed to the function.
More information is available in the CloudWatch metric math documentation.
From the docs:
AVG: The AVG of a single time series returns a scalar representing the average of all the data points in the metric. The AVG of an array of time series returns a single time series. Missing values are treated as 0.
Thus you need to provide the full array of time series:
AVG([m1,m2,m3,m4])
AVG([m5,m6,m7,m8])
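As a rough illustration of why [m1,m4] is a two-element list rather than a range, here is a small JavaScript sketch of a point-by-point average across series. avgSeries is a hypothetical helper written for this example, not a CloudWatch API, and the sample values are invented.

```javascript
// Hypothetical helper: average several time series point by point,
// treating missing points as 0, as the quoted documentation describes.
function avgSeries(seriesList) {
  const length = Math.max(...seriesList.map((s) => s.length));
  return Array.from({ length }, (_, i) =>
    seriesList.reduce((acc, s) => acc + (s[i] ?? 0), 0) / seriesList.length
  );
}

// Invented sample data standing in for four latency metrics.
const m1 = [1, 1];
const m2 = [10, 10];
const m3 = [10, 10];
const m4 = [1, 1];

const onlyEnds = avgSeries([m1, m4]);            // averages just two series
const allFour = avgSeries([m1, m2, m3, m4]);     // averages all four series
console.log(onlyEnds, allFour); // [ 1, 1 ] [ 5.5, 5.5 ]
```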

spotfire plot list of elements

I have a data table that has this format:
and I want to plot temperature against time. Any idea how to do that?
This can be done in a TERR data function. I don't know how comfortable you are integrating Spotfire with TERR, but there is an intro video here, for instance (the demo starts at about minute 7):
https://www.youtube.com/watch?v=ZtVltmmKWQs
With that in mind, I wrote the script without loading any library, so it is quite verbose and explicit, but hopefully simpler to follow step by step. I am sure there is a more elegant way, and there are better ways of making it flexible with column names, but this is a start.
Your input will be a data table (dt, the original data) and the output a new data table (dt.out, the transformed data). All column names (and some values) are addressed explicitly in the script (so if you change them it won't work).
#remove the []
dt$Values=gsub('\\[|\\]','',dt$Values)
#separate into two different data frames, one for time and one for temperature
dt.time=dt[dt$Description=='time',]
dt.temperature=dt[dt$Description=='temperature',]
#split the columns we want to separate into a list of vectors
dt2.time=strsplit(as.character(dt.time$Values),',')
dt2.temperature=strsplit(as.character(dt.temperature$Values),',')
#rearrange times
names(dt2.time)=dt.time$object
dt2.time=stack(dt2.time) #stack vectors
dt2.time$id=c(1:nrow(dt2.time)) #assign running id for merging later
colnames(dt2.time)[colnames(dt2.time)=='values']='time'
#rearrange temperatures
names(dt2.temperature)=dt.temperature$object
dt2.temperature=stack(dt2.temperature) #stack vectors
dt2.temperature$id=c(1:nrow(dt2.temperature)) #assign running id for merging later
colnames(dt2.temperature)[colnames(dt2.temperature)=='values']='temperature'
#merge time and temperature
dt.out=merge(dt2.time,dt2.temperature,by=c('id','ind'))
colnames(dt.out)[colnames(dt.out)=='ind']='object'
dt.out$time=as.numeric(dt.out$time)
dt.out$temperature=as.numeric(dt.out$temperature)
Gaia
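For readers who want to see the reshaping idea outside TERR, here is a minimal JavaScript sketch of the same transformation. The input shape is an assumption inferred from the column names used in the script (object, Description, Values), since the original table image is not available, and the sample rows are invented. It pairs values by position within each object, which matches the script as long as each object's time and temperature lists have the same length.

```javascript
// Assumed input shape: one row per object and description, with the
// values stored as a bracketed, comma-separated string. Sample data is invented.
const rows = [
  { object: "A", Description: "time", Values: "[0,1,2]" },
  { object: "A", Description: "temperature", Values: "[20.5,21,22]" },
  { object: "B", Description: "time", Values: "[0,1]" },
  { object: "B", Description: "temperature", Values: "[18,19]" },
];

// Strip the brackets and split the list into numbers.
const parse = (s) => s.replace(/\[|\]/g, "").split(",").map(Number);

// Build one output row per (object, position) with time and temperature.
function toLong(input) {
  const byObject = new Map();
  for (const r of input) {
    if (!byObject.has(r.object)) byObject.set(r.object, {});
    byObject.get(r.object)[r.Description] = parse(r.Values);
  }
  const out = [];
  for (const [object, cols] of byObject) {
    cols.time.forEach((t, i) =>
      out.push({ object, time: t, temperature: cols.temperature[i] })
    );
  }
  return out;
}

const long = toLong(rows);
console.log(long);
```

The result has one row per measurement, which is the shape a time-versus-temperature plot needs.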
Because all of the example rows you've shown here contain exactly four list items and you haven't specified otherwise, I'll assume that all of the data fits this format.
With this assumption, it becomes pretty trivial, albeit a little messy, to split the values out into columns using the RXReplace() expression function.
You can create four calculated columns, each with an expression like:
Int(RXReplace([values],"\\[([\\d\\-]+),([\\d\\-]+),([\\d\\-]+),([\\d\\-]+)]","\\1",""))
The third argument "\\1" determines which number in the list to extract. Backslashes are doubled ("escaped") per the requirements of the RXReplace() function.
Note that this example assumes the numbers are all whole numbers. If you have decimals, you'd need to adjust each "phrase" of the regular expression to ([\\d\\-\\.]+), and you'd need to wrap the expression in Real() rather than Int() (if you leave this part out, the result will be a String type, which could cause confusion later on when working with the data).
Once you have the four columns, you'll be able to unpivot to get the data easily.
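The capture-group idea can be tried outside Spotfire. Below is a small JavaScript sketch of the same regular expression; note that JavaScript writes the back-reference as $1 where RXReplace() uses "\\1". The extract helper and the sample string are made up for illustration.

```javascript
// Same pattern as in the RXReplace() expression: four integer groups
// inside square brackets, separated by commas.
const pattern = /\[([\d\-]+),([\d\-]+),([\d\-]+),([\d\-]+)]/;

// Hypothetical helper: return the n-th number (1-based) of a four-item list.
function extract(values, n) {
  return parseInt(values.replace(pattern, "$" + n), 10);
}

const sample = "[10,-3,7,42]";
const second = extract(sample, 2);
const fourth = extract(sample, 4);
console.log(second, fourth); // -3 42
```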

Cumulative sum of AWS Cloudwatch Metric

AWS Cloudwatch receives a count of 1 every time I start an image download. I am downloading 1,000s of images (on a cluster of EC2 instances) and would like to track the total progress.
I can't find any documentation on how to plot the cumulative sum of a metric. The AWS Cloudwatch Math Expressions looked promising, but they do not have an integrate function.
Currently, I can plot the sum of the started image downloads but only for periods, as seen below. Ideally, I'd like to plot the integral of this plot:
You can get a cumulative sum over the current time range by applying the SUM() function to the original metric (whose data points are counts of 1). Remember, you're looking for a single number in the end, so it's not much of a graph, but you need to turn the single-value sum back into a time series.
Define m1 as your metric. This is the metric you will want to use SUM() on.
Define an expression e1 as m1/m1. This results in a time series with every value equal to 1. This is what will allow you to convert that SUM back to a time series.
Define an expression e2 as SUM(m1) / e1. This is, effectively, the cumulative sum of m1 divided by one for every data-point in the original time-series. It will be a horizontal line on the graph, which will have every point on that horizontal line being the cumulative sum of metric m1. This is required because Cloudwatch can only plot a time-series on the chart, not a single value.
Make m1 and e1 invisible. You need them, but you don't need to see them.
Finally, change the chart type from Line to Number, since you only wanted the cumulative sum anyway.
The reason you can't use SUM() directly is that it returns a single value. By dividing by a time series containing all 1's, the entire graph becomes the result of the SUM(). Then, changing the chart to a Number effectively hides all the math and presents only the "final result".
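The arithmetic behind the m1, e1, e2 steps can be sketched in plain JavaScript. This only mimics the math on an invented array; it is not CloudWatch code. The sample avoids zero values, because 0/0 is not a number in JavaScript.

```javascript
// Invented sample standing in for metric m1 (download counts per period).
const downloads = [3, 1, 5, 2];

// SUM(m1): a single number for the whole range.
const total = downloads.reduce((a, b) => a + b, 0);

// e1 = m1 / m1: a series of ones with the same length as m1.
const ones = downloads.map((v) => v / v);

// e2 = SUM(m1) / e1: the single total repeated at every data point.
const flat = ones.map((one) => total / one);

console.log(total, flat); // 11 [ 11, 11, 11, 11 ]
```

Every point of the resulting series carries the same total, which is why the chart shows a horizontal line.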
It looks like RUNNING_SUM() has been added, which does what you need:
Graph with RUNNING_SUM
You can find RUNNING_SUM() under "Add math"->"All functions"
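For intuition, a running sum replaces each data point with the total of all points up to and including it. Here is a minimal JavaScript sketch of that computation on an invented series; runningSum is a made-up helper, not the CloudWatch implementation.

```javascript
// Hypothetical helper: cumulative total at each position of the series.
function runningSum(series) {
  let total = 0;
  return series.map((v) => (total += v));
}

// Invented sample: downloads started in each period.
const cumulative = runningSum([3, 1, 5, 2]);
console.log(cumulative); // [ 3, 4, 9, 11 ]
```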
You are correct. All Amazon CloudWatch metrics are for a defined period.
The maximum period for a metric is one day, so this is not suitable for a cumulative counter that you wish to continue beyond one day.
You would need to find an alternate method of storing the count, such as an Amazon DynamoDB table. Use an atomic counter via UpdateItem to increment the count.
You can also use a very long period.
Change your stat to SUM, and set your metric's period to 7 days. You'll get a time series of 1 point with the cumulative sum of all the downloads.
If you give each download a unique dimension value, you can keep your queries separate.

CouchDB: How to use array keys in Map functions when using Reduce?

I would like to write a MapReduce function in CouchDB where the Map function emitted keys as arrays, but the reduce function used only one of the values in the map key. for example:
The Map function:
function(doc) {
  if (doc.type_ === 'survey') {
    emit([doc.timeRecorded_, doc.imei_], 1);
  }
}
The Reduce function:
function(k, v) {
  // How to handle only the doc.imei_ as the value?
  // Or, alternatively, how to filter based on timeRecorded_ somewhere other than the map function?
  return sum(v);
}
timeRecorded_ is an epoch number, so there will be no duplicates (except by chance). If I were to aggregate on it, it would need to be rounded to a 'day' value. Alternatively, the data could be prepared in such a way that timeRecorded_ is already rounded in the source data (maybe renamed to dateRecorded_).
A well-known pattern for this problem is to split the date into an array (e.g. [year, month, day, hour, minute]; intervals could be different but the order is to be kept) and use the array as the key in the map function.
Therefore you'll be able to reduce rows according to the group_level you need (e.g. "by year", "by month", "by day", "by hour", "by minute", etc.).
Source: http://blog.couchbase.com/understanding-grouplevel-view-queries-compound-keys
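Here is a minimal sketch of that pattern that runs outside CouchDB. It assumes timeRecorded_ is in epoch milliseconds (an assumption; the question does not say), passes emit in as a parameter so the map function can be called standalone, and uses a made-up reduceAtLevel helper to mimic what a sum reduce returns when queried with a given group_level. The sample documents are invented.

```javascript
// Map function: split the timestamp into [year, month, day] and append the IMEI.
// Assumption: timeRecorded_ is epoch milliseconds. In CouchDB, emit is a
// global; it is a parameter here only so the sketch runs standalone.
function mapSurvey(doc, emit) {
  if (doc.type_ === "survey") {
    const d = new Date(doc.timeRecorded_);
    emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), doc.imei_], 1);
  }
}

// Hypothetical stand-in for a sum reduce queried with group_level=N:
// rows are grouped by the first N elements of the key and their values summed.
function reduceAtLevel(rows, groupLevel) {
  const totals = new Map();
  for (const { key, value } of rows) {
    const k = JSON.stringify(key.slice(0, groupLevel));
    totals.set(k, (totals.get(k) || 0) + value);
  }
  return [...totals].map(([k, value]) => ({ key: JSON.parse(k), value }));
}

// Invented sample documents.
const docs = [
  { type_: "survey", timeRecorded_: Date.UTC(2020, 0, 15, 8), imei_: "A" },
  { type_: "survey", timeRecorded_: Date.UTC(2020, 0, 15, 9), imei_: "A" },
  { type_: "survey", timeRecorded_: Date.UTC(2020, 0, 16, 8), imei_: "B" },
  { type_: "other", timeRecorded_: Date.UTC(2020, 0, 16, 8), imei_: "B" },
];

const emitted = [];
docs.forEach((doc) => mapSurvey(doc, (key, value) => emitted.push({ key, value })));

const byDay = reduceAtLevel(emitted, 3);        // grouped by [year, month, day]
const byDayAndImei = reduceAtLevel(emitted, 4); // grouped by [year, month, day, imei]
console.log(JSON.stringify(byDay), JSON.stringify(byDayAndImei));
```

Putting the IMEI last in the key keeps the date parts usable for coarser group levels, while the full key still separates devices within a day.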

DAX, IF function, use two types of data types (numeric and text)

I wonder if you can use two kinds of data types in the DAX IF formula.
I want to calculate the EPS value. If it's positive, I want to return the value.
If it's negative, I want to show "(d)" as deficit.
Example code:
=IF([EPS]<=0;"(d)";
IFERROR([EPS];BLANK()))
But I only get the following error message:
"The second and third arguments of function IF have different data types"
Is there a workaround for this, or any ideas on how I can combine the text and numeric data types in the IF function?
In Power Pivot for Excel 2016 and in Power BI Desktop, the IF() function can have multiple return types. In 2013, this is not currently possible.
Try =IF([EPS]<=0,"(d)")
Note that no value has been specified for value_if_false. Therefore, the function returns the default, which is BLANK.