Correct way to standardize some climate indices - cdo-climate

I am working on the relationship between climate extremes and conflict from 2010 to 2020, and I am wondering about the correct way to standardize the following variables:
The Warm Spell Duration Index (WSDI), where the 90th-percentile threshold is calculated over a 1960-2009 reference period using daily maximum temperature.
The Cold Spell Duration Index (CSDI), where the 10th-percentile threshold is calculated over a 1960-2009 reference period using daily minimum temperature.
Their standardization should account for the variability specific to each of my observational units. I thought of using the same reference period (1960-2009) to calculate the mean and standard deviation, but I am not sure whether this is correct, since that period is already used to calculate the percentile thresholds. The question may seem strange, but I can't settle on a methodology. I hope someone can help me, and thank you for reading!
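For reference, a minimal CDO sketch of the z-score approach being considered, standardizing an annual WSDI series against its own 1960-2009 statistics per grid cell (the file names are placeholders, and the input is assumed to hold one annual index value per year; the same recipe applies to the CSDI):

cdo selyear,1960/2009 wsdi_1960-2020.nc wsdi_ref.nc    # reference-period slice
cdo timmean wsdi_ref.nc wsdi_mean.nc                   # 1960-2009 mean per cell
cdo timstd  wsdi_ref.nc wsdi_std.nc                    # 1960-2009 std dev per cell
cdo sub wsdi_1960-2020.nc wsdi_mean.nc wsdi_anom.nc    # anomalies vs reference mean
cdo div wsdi_anom.nc wsdi_std.nc wsdi_z.nc             # z-scores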

Related

I need to CALCULATE() the total number of hours throughout the years, but what I get is a ridiculous number

I need to calculate the total number of hours from the column 'Time Report'[HH], regardless of the date. The problem is that the result is much higher than it actually is (4774 vs 68K; the measure is named REALESV2).
Also, you can see how the curve drops after December :(
[picture showing just 2022]
[picture showing 2022 and 2023]
Thank you for your help!
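For what it's worth, a hedged DAX sketch of a measure that totals the hours while ignoring date filters; 'Time Report' and [HH] come from the question, everything else is an assumption (in particular, if your dates live in a separate calendar table, you would need ALL over that table instead):

Total Hours (all dates) =
CALCULATE (
    SUM ( 'Time Report'[HH] ),   // assumes [HH] is a numeric hours column
    ALL ( 'Time Report' )        // remove filters coming from this table, e.g. the date
)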

Test Daily Timeframe strategy on strictly defined historical 15min data

I have a strategy that calculates entry/exit criteria on the Daily timeframe, but I use the 15min timeframe to actually trade against those daily values. It works fine, EXCEPT when I try to backtest farther back in time than the current 15min chart will show.
As a simple example, is there a way in Pine Script to:
Calculate an EMA 20 cross above EMA 200 on a DAILY timeframe
Backtest trades on a 15min timeframe that go long when this occurs, or short on the opposite cross.
I can make that work, but I cannot, for example, go back to Jan 1, 2014 through June 1, 2014 on 15min historical data and backtest the same criteria, because TradingView just resets the timeframe to Weekly to show data that far back.
Any ideas are greatly appreciated, as I currently do this in Excel and it takes a while per symbol.
Thanks
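For the first part (reading the daily EMAs from a 15min chart), a minimal Pine Script v5 sketch; the lengths and entry labels are illustrative, and note this does not work around TradingView's 15min history limit:

//@version=5
strategy("Daily EMA cross on 15m", overlay=true)

// Pull the daily EMAs onto the 15min chart; lookahead off to avoid lookahead bias.
// The developing daily bar still updates intrabar; use emaFast[1]/emaSlow[1] for confirmed values.
emaFast = request.security(syminfo.tickerid, "D", ta.ema(close, 20), lookahead=barmerge.lookahead_off)
emaSlow = request.security(syminfo.tickerid, "D", ta.ema(close, 200), lookahead=barmerge.lookahead_off)

if ta.crossover(emaFast, emaSlow)
    strategy.entry("Long", strategy.long)
if ta.crossunder(emaFast, emaSlow)
    strategy.entry("Short", strategy.short)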

Calculating Year-over-Year Change using ISO Year and ISO Week in Power BI

My backend table has the data at a week level. It contains the current ISO year and current ISO week, as well as the previous year's ISO year and week number that the current year's data should be compared with.
For each signup_iso_year-signup_iso_week combination, there exists only one iso_prev_year-iso_prev_yearweek combination.
The iso_prev_year, iso_prev_yearweek columns account for the offset that might occur due to certain years having 53 weeks instead of 52.
[picture: data table]
(I can't embed images, so I have added a table here as well, although it has much less information than the image in 'data table').
| Number_of_signups | signup_iso_year | signup_iso_week | iso_prev_year | iso_prev_yearweek | Country | grade_level   |
|-------------------|-----------------|-----------------|---------------|-------------------|---------|---------------|
| 5                 | 2020            | 18              | 2019          | 18                | IN      | middle school |
| 7                 | 2020            | 18              | 2019          | 18                | US      | high school   |
| 6                 | 2021            | 17              | 2018          | 18                | IN      | middle school |
| 8                 | 2021            | 17              | 2018          | 18                | US      | high school   |
I want to calculate the Year-over-Year change in number_of_signups using the signup_iso_year, signup_iso_week, iso_prev_year, iso_prev_yearweek columns.
I have already tried to create a calculated column that contains the sum of number_of_signups from the previous year, but since every combination of country, grade_level, subject, email_type might not exist in the previous or current year, some of the values get lost, giving incorrect results.
The answer I am looking for is a Power BI measure that can give me the YoY change based on signup_iso_year and signup_iso_week.
Edit: I should have mentioned this before, but I forgot. The table contains data from 2018 to the current day, so the data size is quite large. Also, I need this YoY measure for a time series visual, which means I can't assign ISO year/ISO week values for the previous year using simple MAX/MIN functions. It needs to pick values from the iso_prev_year and iso_prev_yearweek columns, but since the EARLIER function can't be used in a measure, I am not able to figure out how to do that.
That is why I had tried to create a calculated column and use the EARLIER function to compute the previous year's number_of_signups. But because of the other columns present in the data (country, grade_level, subject, email_type), discrepancies occurred between the actual number_of_signups and the calculated previous_year_number_of_signups: not every combination of these columns exists for each week, so some data is missed when calculating previous_year_number_of_signups.
Edit 2: I was asked to include examples of what the expected result would look like, so I'm adding some pictures.
[picture: YoY at overall level, country level, grade level]
[picture: YoY at country+grade level]
If I understand your requirement correctly, you need a measure like the one below. Remember, this may not be exactly what you need, but it should help you reach your required output.
prev_signup =
VAR iso_prev_year = MIN ( your_table_name[iso_prev_year] )
VAR iso_prev_year_week = MIN ( your_table_name[iso_prev_yearweek] )
RETURN
    CALCULATE (
        SUM ( your_table_name[Number_of_signups] ),
        FILTER (
            ALL ( your_table_name ),
            your_table_name[signup_iso_year] = iso_prev_year
                && your_table_name[signup_iso_week] = iso_prev_year_week
        )
    ) + 0
You can also do some transformation in the Power Query Editor and join the table to itself using your key columns. That way, you can bring the previous year's value into the same row; the rest is just comparing two columns from your table to calculate the YoY change.
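Building on the measure above, a hedged sketch of the final YoY calculation (it assumes the prev_signup measure and the placeholder table name your_table_name; adjust the percentage convention as needed):

YoY Change =
VAR curr = SUM ( your_table_name[Number_of_signups] )  // signups in the current filter context
VAR prev = [prev_signup]                               // matching ISO week of the previous year
RETURN
    DIVIDE ( curr - prev, prev )                       // blank-safe division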

Why does my Power BI average/day calculation become inaccurate over a longer timespan?

I am trying to create a Power BI calculation of the average number of times per day that a code was tripped. The DAX calculation that I have
Count Trips average per Day =
AVERAGEX(
KEEPFILTERS(VALUES('ruledata (2)'[Date].[Day])),
CALCULATE([Count Trips])
)
works fine when the average is calculated over a couple of days. I double-checked this by hand and can confirm that it is accurate for at least two weeks. However, once the time range increases to include months, the average starts getting ridiculous and begins displaying average trips/day values that are much higher than the highest number of trips on any single day. I have confirmed that the data in the graph is correct.
So I know that the values should reflect what is in the graph. In this two-month example, the DAX formula calculated the average to be 149.03, but the actual average/day should have been 82.8.
In general, is there some sort of error in the DAX formula I am using that could cause this?
I guess that 'ruledata (2)'[Date].[Day] is the day of the month. If so, the average will be wrong: when you average over, e.g., March and April, there are only 31 distinct day-of-month values, so you divide the total trips by 31 rather than by 61 (31 + 30). Use 'ruledata (2)'[Date].[Date] instead.
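With that change, the measure keeps the same structure as the original, with only the column swapped ('ruledata (2)' and [Count Trips] come from the question):

Count Trips average per Day =
AVERAGEX (
    KEEPFILTERS ( VALUES ( 'ruledata (2)'[Date].[Date] ) ),  // iterate full dates, not day-of-month
    CALCULATE ( [Count Trips] )
)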

Stata: Groupwise regressions and ranking

I am currently developing a sentiment index using Google search frequencies taken from Google Trends.
I am using Stata 12 on Windows.
My approach is as follows:
I downloaded approximately 150 business-related search queries from Google Trends, covering Jan 2004 to Dec 2013.
I now want to construct an index using the 30 queries that are, at each point in time, most relevant to the market I observe.
To achieve that, I want to use monthly expanding, backward-looking regressions of each query on the market.
Thus I need to regress the 150 items one-by-one on the market 120 times (12 months x 10 years), using different time windows, and then extract the 30 queries with the most negative t-statistics.
To exemplify the procedure: to construct the sentiment index for January 2010, I would regress the query terms on the market over the period from Jan 2004 to December 2009 and then extract the 30 queries with the most negative t-statistics.
Now I am looking for a way to make this as automated as possible. I guess I should be able to run the 150 items at once, and I can specify the time window using the time stamps. Using Excel commands to create a do-file with all the regression commands in it (which would be quite large), I could probably set up the regressions relatively efficiently (although it depends on how much Stata can handle; does anyone have experience with that?).
What I would need to make the data extraction much easier is a command I can use to rank the regression results according to their t-statistics. Does someone have an efficient approach to this, or general advice?
If you are using Stata: once you run a ttest, you can type return list and you will see the scalars Stata stores. Once you run a loop, you can store these values in a number of different ways; check out the post command.
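A hedged sketch of how the loop-plus-post approach could look for one window of the regressions described in the question (the variable names query1-query150, market, and the monthly date variable mdate are assumptions):

* Run one regression per query over a fixed window and post the t-statistics.
tempname results
postfile `results' str32 varname double tstat using tstats, replace
foreach v of varlist query1-query150 {
    quietly regress market `v' if inrange(mdate, tm(2004m1), tm(2009m12))
    post `results' ("`v'") (_b[`v'] / _se[`v'])   // t-statistic of the query's coefficient
}
postclose `results'

* Rank and list the 30 most negative t-statistics.
preserve
use tstats, clear
sort tstat
list varname tstat in 1/30
restore

Wrapping the inner part in an outer loop over end dates would give the 120 expanding windows.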