Adding values from multiple .rrd files - rrdtool

Problem =====>
Basically, there are three .rrd files, generated for three departments.
From each we fetch three values (MIN, MAX, CURRENT) and print them in a 3x3 format. A Python script does that.
e.g.:
Dept1: Min=10 Max=20 Cur=15
Dept2: Min=0 Max=10 Cur=5
Dept3: Min=10 Max=30 Cur=25
Now I want to add the values together (Min, Max, Cur) and print them on one line.
e.g.:
Dept: Min=20 Max=60 Cur=45
Issue I am facing =====>
No matter what CDEF I write, I break the graph. :(
This is the part I hate, as I do not get any error message.
As far as I understand (please correct me if I am wrong), I cannot store the value anywhere in my program, as a graph is returned.
What would be a proper way to add the values in this situation?
Please let me know if my description of the problem lacks detail.

You can do this with a VDEF over a CDEF'd sum.
DEF:a=dept1.rrd:ds0:AVERAGE
DEF:b=dept2.rrd:ds0:AVERAGE
DEF:maxa=dept1.rrd:ds0:MAXIMUM
DEF:maxb=dept2.rrd:ds0:MAXIMUM
CDEF:maxall=maxa,maxb,+
CDEF:all=a,b,+
VDEF:maxalltime=maxall,MAXIMUM
VDEF:alltimeavg=all,AVERAGE
PRINT:maxalltime:Max=%f
PRINT:alltimeavg:Avg=%f
LINE:all#ff0000:AllDepartments
However, you should note that, apart from at the highest granularity, the Min and Max totals will be wrong! This is because max(a+b) != max(a) + max(b). If you don't calculate the min/max aggregate at storage time, the granularity will be gone at display time.
For example, if a = (1, 2, 3) and b = (3, 2, 1), then max(a) + max(b) = 6; however the maximum at any point in time is in fact 4. The same issue applies to using min(a) + min(b).
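To see the pitfall concretely, here is a quick check in Python (the values are illustrative only, not taken from the RRDs above):
a = [1, 2, 3]
b = [3, 2, 1]
print(max(a) + max(b))                   # 6: sum of the per-series maxima
print(max(x + y for x, y in zip(a, b)))  # 4: the true maximum of the summed series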

Name of List with maximum value

I am new to Python, using it with Grasshopper.
I have 5 lists, each with 8760 items, for which I have found the max value at each index, but I also need to know which list the value came from at any given index.
A simple example will explain this better.
For 2 lists
A = [5,10,15,20,25]
B = [4,9,16,19,26]
Max value per index = [5,10,16,20,26]
What I want is something like
Max value per index = [5(A), 10(A), 16(B), 20(A), 26(B)]
Or something along those lines. I am not sure whether it's possible.
I would really appreciate the help. Thank you.
This can be adapted to N lists.
[(max(t), t.index(max(t))) for t in zip(A, B)]
The .index(max(t)) gets the position within each tuple at which the maximum occurs.
The output for your example is
[(5, 0), (10, 0), (16, 1), (20, 0), (26, 1)]
Of course, if A and B share the same value at an index, then the reported position will be the first one found, i.e. A.
See https://docs.python.org/3.3/library/functions.html for a description of the very useful zip built-in function.
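If you want the originating list's name rather than a numeric position, a small variation works. This is a hedged sketch; the names tuple is an assumption, not part of the original answer:
names = ("A", "B")  # assumed labels, one per input list
labelled = ["%d(%s)" % (max(t), names[t.index(max(t))]) for t in zip(A, B)]
print(labelled)  # ['5(A)', '10(A)', '16(B)', '20(A)', '26(B)']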

PowerBI empty values row not displayed

I have a confusing mystery...
A simple DIVIDE formula works correctly. However, blank rows are not displayed.
I attempted a different method using IF, and now the blank row is correctly displayed.
However, this line is only displayed if I include the IF formula (which gives a zero value I don't want).
Formula 1:
Completion % =
DIVIDE(SUM(Courses[Completed]),SUM(Courses[Attended]),BLANK())
Formula 2:
Completion % with IF =
IF(SUM(Courses[Attended])=0,0,DIVIDE(SUM(Courses[Completed]),SUM(Courses[Attended])))
(Screenshots: with only the DIVIDE formula, the row is missing; including the IF formula, the row is displayed.)
It appears that Power BI is capable of showing this row without error, but only if I include the additional IF formula. I'm guessing it's because there is now a value (0) to display.
However, I want to be able to show all courses, including those that have no values, without the inaccurate zero value.
I don't understand why the table doesn't include these lines. Can anyone explain/help?
The point is very simple: by default, Power BI shows only elements for which there is at least one non-blank measure.
Under the hood, the DIVIDE operator executes the following:
IF(ISBLANK(B), BLANK(), A / B)
You can change its behaviour by defining the optional third parameter in order to show 0 instead of BLANK:
DIVIDE(A, B, 0) will be translated into the following:
IF(ISBLANK(B), 0, A / B)
Proposed solution
Those mentioned above might all be possible solutions to your problem; however, my personal suggestion is to simply enable the option "Show items with no data" in your visualization.
While DIVIDE(A, B, 0) will return zero when B is zero or blank, I think a blank A will still return a blank.
One possibility is to simply append +0 (or prepend 0+) to your measure so that it always returns a numeric value.
DIVIDE ( SUM ( Courses[Completed] ), SUM ( Courses[Attended] ) ) + 0

Generating rolling z-scores of panel data in Stata

I have an unbalanced panel data set (countries and years). For simplicity, let's say I have one variable, x, that I am measuring. The panel data are sorted first by country (a 3-digit numeric country code) and then by year. I would like to write a .do file that generates a new variable, z_x, containing the standardized values of the variable x. The variable should be standardized by subtracting the mean of the preceding (exclusive) m time periods and then dividing by the standard deviation of those same time periods. If this is not possible, it should return a missing value.
Currently, the code I am using to accomplish this is the following (edited now for clarity):
xtset weocountrycode year
sort weocountrycode year
local win_len = 5 // Defining rolling window length.
quietly: rolling sd_x=r(sd) mean_x=r(mean), window(`win_len') saving(stats_x, replace): sum x
use stats_x, clear
rename end year
save, replace
use all_data_PROCESSED_FINAL.dta, clear
quietly: merge 1:1 weocountrycode year using stats_x
replace sd_x = . if x[_n-`win_len'+1] == . | weocountrycode[_n-`win_len'+1] != weocountrycode[_n] // This and the next line delete values that rolling computes when I actually want missing values.
replace mean_x = . if x[_n-`win_len'+1] == . | weocountrycode[_n-`win_len'+1] != weocountrycode[_n]
gen z_x = (x - mean_x[_n-1])/sd_x[_n-1] // calculate z-score
UPDATE:
My struggle with rolling is that when rolling is set up to use a window length 5 rolling mean, it automatically does window length 1, 2, 3 and 4 means for the first, second, third and fourth entries (when there are not 5 preceding entries available to average). In fact, it does this in general: if the first non-missing value is on entry 5, it will do a length 1 rolling average on entry 5, length 2 rolling average on entry 6, ..., and then finally start doing length 5 moving averages on entry 9. My issue is that I do not want this, so I would like to avoid performing these calculations. Until now, I have only been able to figure out how to delete them after they are done, which is both inefficient and bothersome.
I tried adding an if clause to the -rolling- statement:
quietly: rolling sd_x=r(sd) mean_x=r(mean) if x[_n-`win_len'+1] != . & weocountrycode[_n-`win_len'+1] != weocountrycode[_n], window(`win_len') saving(stats_x, replace): sum x
But it did not fix the problem and the output is "weird" in the sense that
1) If `win_len' is equal to, say, 10, there are 15 missing values in the resulting z_x variable, instead of 9.
2) Even though there are "extra" missing values in z_x, the observations still start out as window length 1 means, then window length 2 means, etc. which makes no sense to me.
Which leads me to believe I fundamentally don't understand 1) what -rolling- is doing and 2) how an if clause works in the context of -rolling-.
Does this help?
Thanks!
I'm not sure I understand completely, but I'll try to answer based on what I think your problem is, and based on a comment by @NickCox.
You say:
... when rolling is set up to use a window length 5 rolling mean... if the first non-missing value is on entry 5, it will do a length 1 rolling average on entry 5, length 2 rolling average on entry 6, ...
This is expected. help rolling states:
The window size refers to calendar periods, not the number of observations. If there are missing data (for example, because of weekends), the actual number of observations used by command may be less than window(#).
It's not actually doing a "length 1 rolling average", but I'll get to that later.
Below are some examples to see what rolling does:
clear all
set more off
*-------------------------- example data -----------------------------
set obs 92
gen dat = _n - 1
format dat %tq
egen seq = fill(1 1 1 1 2 2 2 2)
tsset dat
tempfile main
save "`main'"
list in 1/12, separator(4)
*------------------- Example 1. None missing ------------------------
rolling mean=r(mean), window(4) stepsize(4) clear: summarize seq, detail
list in 1/12, separator(0)
*------- Example 2. All but one value, missing in first window ------
use "`main'", clear
replace seq = . in 1/3
list in 1/8
rolling mean=r(mean), window(4) stepsize(4) clear: summarize seq, detail
list in 1/12, separator(0)
*------------- Example 3. All missing in first window --------------
use "`main'", clear
replace seq = . in 1/4
list in 1/8
rolling mean=r(mean), window(4) stepsize(4) clear: summarize seq, detail
list in 1/12, separator(0)
Note I use the stepsize option to make things much easier to follow. Because the date variable is in quarters, I set window(4) and stepsize(4) so rolling is just computing averages by year. I hope that's easy to see.
Example 1 does as expected. No problem here.
Example 2 on the other hand, should be more interesting for you. We've said that what matters are calendar periods, so the mean is computed for the whole year (four quarters), even though it contains missings. There are three missings and one non-missing. summarize is computing the mean over the whole year, but summarize ignores missings, so it just outputs the mean of non-missings, which in this case is just one value.
Example 3 has missings for all four quarters of the year. Therefore, summarize outputs . (missing).
Your problem, as I understand it, is that when you face a situation like Example 2, you'd like the output to be missing. This is where I think Nick Cox's advice comes in. You could try something like:
rolling mean=r(mean) N=r(N), window(4) stepsize(4) clear: summarize seq, detail
replace mean = . if N != 4
list in 1/12, separator(0)
This says: if the number of non-missings in the window (r(N), also computed by summarize) is not the same as the window size, then replace the mean with missing.
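For comparison, the exclusive-window z-score described in the question can be sketched in Python with pandas. This is a minimal sketch, assuming a DataFrame df with the question's weocountrycode, year and x columns; note that, unlike -rolling-, it counts observations rather than calendar periods:
import pandas as pd

def rolling_zscore(df, m=5):
    # Sort so rolling windows run within each country over time.
    df = df.sort_values(["weocountrycode", "year"]).copy()
    g = df.groupby("weocountrycode")["x"]
    # shift(1) makes the window exclusive of the current period;
    # min_periods=m yields missing until m prior periods exist.
    mean_x = g.transform(lambda s: s.shift(1).rolling(m, min_periods=m).mean())
    sd_x = g.transform(lambda s: s.shift(1).rolling(m, min_periods=m).std())
    df["z_x"] = (df["x"] - mean_x) / sd_x
    return df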

For loop using a t-stat function to create a list

I am using the following function to calculate the t-stat for data in a data frame (x):
wilcox.test.all.genes <- function(x, s1, s2) {
  x1 <- x[s1]
  x2 <- x[s2]
  x1 <- as.numeric(x1)
  x2 <- as.numeric(x2)
  wilcox.out <- wilcox.test(x1, x2, exact=F, alternative="two.sided", correct=T)
  out <- as.numeric(wilcox.out$statistic)
  return(out)
}
I need to write a for loop that will iterate a specific number of times. For each iteration, the columns need to be shuffled, the above function performed and the maximum t-stat value saved to a list.
I know that I can use the sample() function to shuffle the columns of the data frame, and the max() function to identify the maximum t-stat value, but I can't figure out how to put them together to achieve a workable code.
You are trying to generate empirical p-values, corrected for the multiple comparisons you are making because of the multiple columns in your data. First, let's simulate an example data set:
# Simulate data
n.row = 100
n.col = 10
set.seed(12345)
group = factor(sample(2, n.row, replace=T))
data = data.frame(matrix(rnorm(n.row*n.col), nrow=n.row))
We calculate the Wilcoxon test for each column, replicating this many times while permuting the class membership of the observations. This gives us an empirical null distribution of the test statistic.
# Re-calculate columnwise test statistics many times while permuting class labels
perms = replicate(500, apply(data[sample(nrow(data)), ], 2, function(x) wilcox.test(x[group==1], x[group==2], exact=F, alternative="two.sided", correct=T)$stat))
Calculate the null distribution of the maximum test statistic by collapsing across the multiple comparisons.
# For each permuted replication, calculate the max test statistic across the multiple comparisons
perms.max = apply(perms, 2, max)
By simply sorting the results, we can now determine the p=0.05 critical value.
# Identify critical value
crit = sort(perms.max)[round((1-0.05)*length(perms.max))]
We can also plot our distribution along with the critical value.
# Plot
dev.new(width=4, height=4)
hist(perms.max)
abline(v=crit, col='red')
Finally, comparing a real test statistic to this distribution will give you an empirical p-value, corrected for multiple comparisons by controlling the family-wise error rate at p<0.05. For example, let's pretend a real test stat was 1600. We could then calculate the p-value like:
> length(which(perms.max>1600))/length(perms.max)
[1] 0.074

Time based rotation

I'm trying to figure out the best way of doing the following:
I have a list of values: L
I'd like to pick a subset of this list, of size N, and get a different subset (if the list has enough members) every X minutes.
I'd like the values to be picked sequentially, or randomly, as long as all the values get used.
For example, I have a list: [google.com, yahoo.com, gmail.com]
I'd like to pick N (2 for this example) values and rotate those values every X (60 for now) minutes:
minute 0-59: [google.com, yahoo.com]
minute 60-119: [gmail.com, google.com]
minute 120-179: [google.com, yahoo.com]
etc.
Random picking is also fine, i.e:
minute 0-59: [google.com, gmail.com]
minute 60-119: [yahoo.com, google.com]
Note: The time epoch should be 0 when the user sets the rotation up, i.e, the 0 point can be at any point in time.
Finally, I'd prefer not to store a set of "used" values or anything like that, if possible; i.e., I'd like this to be as simple as possible.
Random picking is actually preferred to sequential, but either is fine.
What's the best way to go about this? Python/Pseudo-code or C/C++ is fine.
Thank you!
You can use the itertools standard module to help:
import itertools
import random
import time
a = ["google.com", "yahoo.com", "gmail.com"]
combs = list(itertools.combinations(a, 2))
random.shuffle(combs)
for c in combs:
    print(c)
    time.sleep(3600)
EDIT: Based on your clarification in the comments, the following suggestion might help.
What you're looking for is a maximal-length sequence of integers within the range [0, n). You can generate this in Python using something like:
def modseq(n, p):
    r = 0
    for i in range(n):
        r = (r + p) % n
        yield r
Given an integer n and a prime number p that is not a factor of n (choosing p greater than n guarantees this), you will get a sequence of all the integers from 0 to n-1:
>>> list(modseq(10, 13))
[3, 6, 9, 2, 5, 8, 1, 4, 7, 0]
From there, you can filter this list to include only the integers that have the desired number of 1 bits set (see Best algorithm to count the number of set bits in a 32-bit integer? for suggestions). Then choose the elements from your set based on which bits are set to 1. In your case, you would pass n as 2^N, where N is the number of elements in your set.
This sequence is deterministic given a time T (from which you can find the position in the sequence), a number N of elements, and a prime P.
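Putting those pieces together, here is a hedged sketch of the filtering and selection step, reusing modseq from above (the example list a, the prime 13, and the subset_for helper are illustrative assumptions, not part of the original answer):
import time

a = ["google.com", "yahoo.com", "gmail.com"]
N = len(a)  # number of elements in the set
K = 2       # desired subset size

# Walk the full-period sequence over [0, 2**N) and keep only the
# bitmasks with exactly K bits set; each mask names one subset.
masks = [m for m in modseq(2 ** N, 13) if bin(m).count("1") == K]

def subset_for(t, period_seconds=3600):
    # Deterministically map a time t (in seconds) to a position in the cycle.
    mask = masks[(t // period_seconds) % len(masks)]
    return [x for i, x in enumerate(a) if mask & (1 << i)]

print(subset_for(int(time.time())))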