Understanding & Converting ThinkScripts CompoundValue Function - technical-indicator

I'm currently converting a ThinkScript indicator to C#; however, I've run into this CompoundValue function and I'm unsure how to convert it.
The documentation reads:
Calculates a compound value according to following rule: if a bar
number is greater than length then the visible data value is returned,
otherwise the historical data value is returned. This function is used
to initialize studies with recursion.
Example Use:
declare lower;
def x = CompoundValue(2, x[1] + x[2], 1);
plot FibonacciNumbers = x;
My interpretation:
Based on the description and example, it appears we are passing a calculation in x[1] + x[2], and it is performing this calculation on the current bar and the previous bar (based on the first param of 2). I'm unsure what the parameter 1 is for.
My Question:
Please explain what this function is actually doing. If possible, please illustrate how this method works using pseudo-code.

For the TLDR; crowd, some simple code that hopefully explains what the CompoundValue() function is trying to do, and which might help in converting its functionality:
# from: Chapter 12. Past/Future Offset and Prefetch
# https://tlc.thinkorswim.com/center/reference/thinkScript/tutorials/Advanced/Chapter-12---Past-Offset-and-Prefetch
# According to this tutorial, thinkScript uses the highest offset, overriding
# all lower offsets in the script - WOW
declare lower;
# recursive addition using x[1] is overridden by 11 in the plot for
# Average(close, 11) below; SO `x = x[1] + 1` becomes `x = x[11] + 1`
def x = x[1] + 1;
# using CompoundValue, though, we can force the use of the *desired* value
# arguments are:
# - length: the number of bars for this variable's offset (`1` here)
# - "visible data": value to use IF VALUES EXIST for a bar (a calculation here)
# - "historical data": value to use IF NO VALUE EXISTS for a bar (`1` here)
def y = CompoundValue(1, y[1] + 1, 1);
# *plotting* this Average statement will change ALL offsets to 11!
plot Average11 = Average(close, 11);
# `def`ing the offset DOES NOT change other offsets, so no issue here
# (if the `def` setup DID change the offsets, then `x[1]` would
# become `x[14]`, as 14 is higher than 11. However, `x[1]` doesn't change.)
def Average14 = Average(close, 14);
plot myline = x;
plot myline2 = y;
# add some labels to tell us what thinkScript calculated
def numBars = HighestAll(BarNumber());
AddLabel(yes, "# Bars on Chart: " + numBars, Color.YELLOW);
AddLabel(yes, "x # bar 1: " + GetValue(x, numBars), Color.ORANGE);
AddLabel(yes, "x # bar " + numBars + ": " + x, Color.ORANGE);
AddLabel(yes, "y # bar 1: " + GetValue(y, numBars), Color.LIGHT_ORANGE);
AddLabel(yes, "y # bar " + numBars + ": " + y, Color.ORANGE);
Now, some, er, lots of details...
First, a quick note on "offset" values:
thinkScript, like other trading-related languages, uses an internal looping system. This is like a for loop, iterating through all the "periods" or "bars" on a chart (eg, 1 bar = 1 day on a daily chart; 1 bar = 1 minute on a 1 minute intraday chart, etc). Every line of code in thinkScript is run for each and every bar in the chart or length of time specified in the script.
As noted by the OP, x[1] represents an offset of one bar before the current bar the loop is processing. x[2] represents two bars before the current bar, and so on. Additionally, it's possible to offset into the future by using negative numbers: x[-1] means one bar ahead of the current bar, for example.
These offsets work similarly to the for loop in C#, except they're backwards: x[0] in C# would represent the current x value, as it would in thinkScript; however, moving forward in the loop, x[1] would be the next value, and x[-1] wouldn't exist because, well, there is no past value before 0. (In general, of course! One can definitely loop with negative numbers in C#. The point is that positive offset indices in thinkScript represent past bars, while negative offset indices in thinkScript represent future bars - not the case in C#.)
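The offset convention above can be sketched in a few lines of Python. The helper name `get_value` and the sample prices are hypothetical; the sketch only illustrates that a positive offset looks back and a negative offset looks ahead, relative to the bar the implicit loop is processing.

```python
import math

def get_value(series, bar_index, offset):
    """Return the value `offset` bars before `bar_index` (thinkScript-style).

    Positive offsets look into the past, negative offsets into the future.
    Returning NaN for out-of-range lookups is an assumption of this sketch,
    not documented thinkScript behaviour.
    """
    target = bar_index - offset
    if 0 <= target < len(series):
        return series[target]
    return math.nan

closes = [10.0, 11.0, 12.0, 13.0]  # hypothetical closing prices, oldest first

current = 2  # index of the bar currently being processed
print(get_value(closes, current, 0))   # current bar: 12.0
print(get_value(closes, current, 1))   # one bar back: 11.0
print(get_value(closes, current, -1))  # one bar ahead: 13.0
```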
Also important here is the concept of "length": in thinkScript, length parameters represent the distance you want to go - like the offset, but a range instead of one specific bar. In my example code above, I used the statement plot Average11 = Average(close, 11); In this case, the 11 parameter means averaging the close over a period of 11 bars, ie, offsets close[0] through close[10].
Now, to explain the CompoundValue() function's purpose:
The Chapter 12. Past/Future Offset and Prefetch thinkScript tutorial explains that thinkScript actually overrides smaller offset or length values with the highest value in a script. What that means is that if you have two items defined as follows:
def x = x[1] + 1;
plot Average11 = Average(close, 11);
thinkScript will actually override the x[1] offset with the higher length used in the Average statement - therefore causing x[1] to become x[11]!
Yikes! That means that the specified offsets, except the highest offset, mean nothing to thinkScript! So, wait a minute - does one have to use all the same offsets for everything, then? No! This is where CompoundValue() comes in...
That same chapter explains that CompoundValue() allows one to specify an offset for a variable that won't be changed, even if a higher offset exists.
The CompoundValue() function, with parameter labels, looks like this:
CompoundValue(length, "visible data", "historical data")
As the OP noted, this isn't really particularly clear. Here's what the parameters represent:
length: the offset number of bars for this variable.
In our example, def x = x[1] + 1, there is a 1 bar offset, so our statement starts as CompoundValue(length=1, ...). If instead, it was a larger offset, say 14 bars, we'd put CompoundValue(length=14, ...)
"visible data": the value or calculation thinkScript should perform if DATA IS AVAILABLE for the current bar.
Again, in our example, we're using a calculation of x[1] + 1, so CompoundValue(length=1, "visible data"=(x[1] + 1), ...). (Parentheses around the equation aren't necessary, but may help with clarity.)
"historical data": the value to use if NO DATA IS AVAILABLE for the current bar.
In our example, if no data is available, we'll use a value of 1.
Now, in thinkScript, parameter labels aren't required if the arguments are in order and/or defaults are supplied. So, we could write this CompoundValue statement like this without the labels:
def y = CompoundValue(1, y[1] + 1, 1);
or like this with the labels:
def y = CompoundValue(length=1, "visible data"=(y[1] + 1), "historical data"=1);
(Note that parameter names containing spaces have to be surrounded by double quotes. Single-word parameter names don't need the quotes. Also, I've placed parens around the equation just for the sake of clarity; this is not required.)
In summary: CompoundValue(...) is needed to ensure a variable uses the actual desired offset/number of bars in a system (thinkScript) that otherwise overrides the specified offsets with a higher number if present.
If all the offsets in a script are the same, or if one is using a different programming system, then CompoundValue() can simply be broken down into its appropriate calculations or values, eg def x = x[1] + 1 or, alternatively, an if/else statement that fills in the values desired at whatever bars or conditions are needed.
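For a port to another language, the documented rule quoted in the question (a bar number greater than length returns the "visible data" value, otherwise the "historical data" value) can be written as a plain loop. This is a hedged Python sketch of def y = CompoundValue(1, y[1] + 1, 1), assuming bars are numbered from 1; the function name `counter_study` is hypothetical.

```python
def counter_study(num_bars, length=1, historical=1):
    """Emulate `def y = CompoundValue(1, y[1] + 1, 1)` bar by bar."""
    y = []
    for bar_number in range(1, num_bars + 1):
        if bar_number > length:
            # Enough earlier bars exist: use the "visible data" formula y[1] + 1.
            y.append(y[-1] + 1)
        else:
            # Too early on the chart: fall back to the "historical data" value.
            y.append(historical)
    return y

print(counter_study(5))  # [1, 2, 3, 4, 5]
```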

Please let me provide two equivalent working versions of the code in thinkscript itself. We use this approach to prove equivalence by subtracting the equivalent outputs from each other - the result should be 0.
# The original Fibonacci code with a parameter "length" added.
# That parameter is the first parameter of the CompoundValue function.
declare lower;
def length = 2;
def x = CompoundValue(length, x[1] + x[2], 1);
# plot FibonacciNumbers = x;
# Equivalent code using the `if` statement:
def y;
if(BarNumber() > length){
    # Visible data. This is within the guarded branch of the if statement.
    # Historical data y[1] (1 bar back) and y[2] (2 bars back) is available
    y = y[1] + y[2];
}else{
    # Not enough historical data so we use the special case satisfying the
    # original rule.
    y = 1;
}
plot FibonacciNumbersDiff = y - x;
Thinkscript "recursion" is a somewhat inflated term. The function name CompoundValue is not very helpful so it may create confusion.
The version using the if statement is more useful in general because when walking through the time series of bars, we often need a program structure with multiple nested if statements - this cannot be done with the CompoundValue function. Please see my other articles which make use of this in the context of scanning.
In Java, using the same structure, it looks like this:
int size = 100;
int length = 2;
// Note: int overflows once index reaches 47; use a smaller size or BigInteger for real use.
int[] values = new int[size];
for(int index = 1; index < size; index++){
    if(index > length){
        values[index] = values[index - 1] + values[index - 2];
    }else{
        values[index] = 1;
    }
}
The fundamental difference is the for loop which is not present in the thinkscript code. thinkscript provides the loop in a kind of inversion of control where it executes user code multiple times, once for each bar.
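The inversion of control can be made explicit with a small driver that owns the loop and calls user code once per bar. This is a minimal Python sketch, not how thinkscript is implemented; `run_study` and its parameters are hypothetical names, and bars are assumed to be numbered from 1.

```python
def run_study(num_bars, length, visible, historical):
    """Call `visible` or use `historical` once per bar, like the hidden loop.

    `visible` receives the values computed so far (oldest first), so
    values[-1] plays the role of x[1] and values[-2] the role of x[2].
    """
    values = []
    for bar_number in range(1, num_bars + 1):
        if bar_number > length:
            values.append(visible(values))
        else:
            values.append(historical)
    return values

# def x = CompoundValue(2, x[1] + x[2], 1);
fib = run_study(10, 2, lambda v: v[-1] + v[-2], 1)
print(fib)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

The user-supplied lambda never sees the loop, which is the same division of labour thinkscript imposes on a study.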

Related

Can't understand how it works (ThinkScript code)

I'm currently converting a ThinkScript indicator to python, however, I've run into this piece of code and I'm kinda confused on how it works:
input rollingPeriodMinutes = 60;
def factor = (SecondsFromTime(Market_Open_Time) / (60 * rollingPeriodMinutes) / 100);
def rolloverTime = if factor == Round(factor) then 1 else 0;
rec H1 = compoundValue(1, if !rolloverTime then if high > H1[1] then high else H1[1] else high, high);
rec H = compoundValue(1, if rolloverTime then H1[1] else H[1], high);
I can't really understand what is stored at the end in the variable "H". Can you help me understand?
Any help is really appreciated! Thanks
input rollingPeriodMinutes = 60;
declares (defines) and sets a variable, rollingPeriodMinutes, to a default value of 60. The input declaration indicates that the user will be able to alter this value in the thinkorswim settings for this script.
def factor = (SecondsFromTime(Market_Open_Time) / (60 * rollingPeriodMinutes) / 100);
declares and sets a variable, factor to a calculated value. This uses the rollingPeriodMinutes value, above, as well as the SecondsFromTime function and a Market_Open_Time variable that must have been set elsewhere in the script.
def rolloverTime = if factor == Round(factor) then 1 else 0;
declares and sets a variable, rolloverTime to a boolean based on the if statement. This uses the factor variable above (1 is true and 0 is false in thinkscript).
rec H1 = compoundValue(1, if !rolloverTime then if high > H1[1] then high else H1[1] else high, high);
rec H = compoundValue(1, if rolloverTime then H1[1] else H[1], high);
rec is actually the same as def and has been obsoleted. Previously, it specifically declared a recursive variable; now one would just use def regardless. See the notes below for more information.
CompoundValue is an easy statement in thinkscript, but complicated to understand from the Learning Center reference.
In short, the declarations for H and H1 are saying: going back 1 bar, if data is present, then use the if expression to determine a value; else, if no data is present, then use the high value.
Broken out, the algorithm for H1 (where high is a reserved word for the high price for a given bar) could look like:
let numBarsBack = 1
if (data is present for the bar at numBarsBack) then
    if (!rolloverTime == true) then
        if high > (H1 value one bar previous) then H1 = high
        else H1 = (H1 value one bar previous)
    else (if rolloverTime == true) then H1 = high
else (if data is not present for the bar at numBarsBack) then H1 = high
*** See my complete description of how CompoundValue works in thinkscript at the SO question "Understanding & Converting ThinkScripts CompoundValue Function".***
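Since the goal is a Python port, here is a hedged sketch of the two recursive declarations. It assumes CompoundValue(1, expr, high) yields high on the first bar and expr on every later bar; the function name `rolling_highs`, the `rollover` list (standing in for rolloverTime) and the sample data are all hypothetical.

```python
def rolling_highs(highs, rollover):
    """Translate the H1 / H recursion bar by bar.

    highs    -- per-bar high prices, oldest first
    rollover -- per-bar booleans, True where rolloverTime is 1
    """
    h1, h = [], []
    for i, (high, roll) in enumerate(zip(highs, rollover)):
        if i == 0:
            # No previous bar: both series start from the bar's high.
            h1.append(high)
            h.append(high)
            continue
        # H1: running high of the current period, reset at each rollover.
        h1.append(high if roll else max(high, h1[i - 1]))
        # H: previous value of H1, latched whenever a rollover occurs.
        h.append(h1[i - 1] if roll else h[i - 1])
    return h1, h

highs = [5, 7, 6, 4, 9, 8]  # hypothetical data
rollover = [False, False, False, True, False, False]
h1, h = rolling_highs(highs, rollover)
print(h1)  # [5, 7, 7, 4, 9, 9]
print(h)   # [5, 5, 5, 7, 7, 7]
```

Under this reading, H holds the high of the most recently completed rolling period, while H1 tracks the high of the period still in progress.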
Notes:
SecondsFromTime, according to the current thinkscript Learning Center reference looks like:
SecondsFromTime ( int fromTime);
Description
Returns the number of seconds from the specified time (24-hour clock notation) in the EST timezone. Note that this function always returns zero when chart's aggregation period is greater than or equal to 1 day.
Input parameters
Parameter Default value Description
fromTime - Defines time from which seconds are counted, in the HHMM format, 24-hour clock notation.
The Learning Center reference for rec says this:
rec
Notice: this is an article about an obsolete thinkScript® notation. Although rec variables are still supported by thinkScript®, they can be completely replaced by def.
Syntax
rec
Description
Enables you to reference a historical value of a variable that you are calculating in the study or strategy itself. Rec is short for "recursion".
Example
rec C = C[1] + volume;
plot CumulativeVolume = C;
This example plots the cumulative volume starting from the beginning of the time period.
and, finally:
Remember that thinkscript code is executed for every bar in a selected period. Ie, if you're looking at 10 days with a daily period, there will be a bar for each of the 10 days; and the script will run a loop, repeating the code for each of those 10 bars. As a result, the variables will have appropriate values for each bar.
Although the OP is wanting to convert a script to Python, if someone comes here interested in how thinkscript works, there are tricks to keep a value constant for an entire script (though this section of code does not include examples for that). For information on how to do this in thinkscript, see my answer to SO question "thinkscript - How to create a variable that retains its value".

How to perform rolling window calculations without SSC packages

Goal: perform rolling window calculations on panel data in Stata with variables PanelVar, TimeVar, and Var1, where the window can change within a loop over different window sizes.
Problem: no access to SSC for the packages that would take care of this (like rangestat)
I know that
by PanelVar: gen Var1_1 = Var1[_n]
produces a copy of Var1 in Var1_1. So I thought it would make sense to try
by PanelVar: gen Var1SumLag = sum(Var1[(_n-3)/_n])
to produce a rolling window calculation for _n-3 to _n for the whole variable. But it fails to produce the results I want, it just produces zeros.
You could use sum(Var1) - sum(Var1[_n-3]), but I also want to be able to make the rolling window left justified (summing future observations) as well as right justified (summing past observations).
Essentially I would like to replicate Python's ".rolling().agg()" functionality.
In Stata _n is the index of the current observation. The expression (_n - 3) / _n yields -2 when _n is 1 and increases slowly with _n but is always less than 1. As a subscript applied to extract values from observations of a variable it always yields missing values given an extra rule that Stata rounds down expressions so supplied. Hence it reduces to -2, -1 or 0: in each case it yields missing values when given as a subscript. Experiment will show you that given any numeric variable say numvar references to numvar[-2] or numvar[-1] or numvar[0] all yield missing values. Otherwise put, you seem to be hoping that the / yields a set of subscripts that return a sequence you can sum over, but that is a long way from what Stata will do in that context: the / is just interpreted as division. (The running sum of missings is always returned as 0, which is an expression of missings being ignored in that calculation: just as 2 + 3 + . + 4 is returned as 9 so also . + . + . + . is returned as 0.)
A fairly general way to do what you want is to use time series operators, and this is strongly preferable to subscripts as (1) doing the right thing with gaps (2) automatically working for panels too. Thus after a tsset or xtset
L0.numvar + L1.numvar + L2.numvar + L3.numvar
yields the sum of the current value and the three previous and
L0.numvar + F1.numvar + F2.numvar + F3.numvar
yields the sum of the current value and the three next. If any of these terms is missing, the sum will be too; a work-around for that is to return say
cond(missing(L3.numvar), 0, L3.numvar)
More general code will require some kind of loop.
Given a desire to loop over lags (negative) and leads (positive) some code might look like this, given a range of subscripts as local macros i <= j
* example i and j
local i = -3
local j = 0
gen double wanted = 0
forval k = `i'/`j' {
if `k' < 0 {
local k1 = -(`k')
replace wanted = wanted + L`k1'.numvar
}
else replace wanted = wanted + F`k'.numvar
}
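For comparison with the Python behaviour the question wants to replicate, the same lag/lead window can be sketched in plain Python. The function name `window_sum` is hypothetical, and the sketch assumes a single panel with evenly spaced observations and no gaps; missing values are represented by None.

```python
def window_sum(values, i, j):
    """Sum values[t+i .. t+j] for each position t (i <= j; negative = lag).

    A window that reaches outside the data, or that contains None, yields
    None, mirroring how one missing term makes the Stata sum missing.
    """
    n = len(values)
    out = []
    for t in range(n):
        lo, hi = t + i, t + j
        if lo < 0 or hi >= n:
            out.append(None)
            continue
        window = values[lo:hi + 1]
        out.append(None if None in window else sum(window))
    return out

data = [1, 2, 3, 4, 5, 6]
print(window_sum(data, -3, 0))  # [None, None, None, 10, 14, 18]
print(window_sum(data, 0, 3))   # [10, 14, 18, None, None, None]
```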
Alternatively, use Mata.
EDIT There's a simpler method, to use tssmooth ma to get moving averages and then multiply up by the number of terms.
tssmooth ma wanted1=numvar, w(3 1)
tssmooth ma wanted2=numvar, w(0 1 3)
replace wanted1 = 4 * wanted1
replace wanted2 = 4 * wanted2
Note that in contrast to the method above tssmooth ma uses whatever is available at the beginning and end of each panel. So, the first moving average, the average of the first value and the three previous, is returned as just the first value at the beginning of each panel (when the three previous values are unknown).
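The edge behaviour described here can be sketched in plain Python. The function name `window_mean_available` is hypothetical and this is not how tssmooth ma is implemented; it only illustrates averaging over whatever observations exist, for a single panel with no gaps.

```python
def window_mean_available(values, lags, leads):
    """Mean over [t - lags, t + leads], clipped to the data that exists."""
    n = len(values)
    out = []
    for t in range(n):
        window = values[max(0, t - lags):min(n, t + leads + 1)]
        out.append(sum(window) / len(window))
    return out

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
means = window_mean_available(data, lags=3, leads=0)
print(means)                   # [1.0, 1.5, 2.0, 2.5, 3.5, 4.5]
print([4 * m for m in means])  # scaled by 4: a true sum only where the window is full
```

Under this reading, multiplying by 4 recovers the window sum only where all four terms exist; at the first position the scaled value is 4.0 while the sum of the available terms is 1.0.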

thinkscript if function useless in important case

The thinkscript if function fails to branch as expected in an important case. The following test case can be used to reproduce this severe bug / defect.
In a nutshell, an if statement may normally be used to prevent a function call from being executed if one of its function parameters is invalid. We show that this is not the case. In fact, both branches are executed, including the branch not meeting the if condition.
This absolutely defeats the purpose of the test of the if condition, the test that every if statement in every language has.
Following is some sample code that shows the problem on a chart. The result can be seen by clicking on the "i" message icon blinking in the left top corner of the chart:
Folding: 'from' cannot be greater than 'to': 1 > -1.
# Get the current offset from the right edge from BarNumber()
# BarNumber(): The current bar number. On a chart, we can see that the number increases
# from left 1 to number of bars e.g. 140 at the right edge.
def barNumber = BarNumber();
def barCount = HighestAll(barNumber);
# rightOffset: 0 at the right edge, i.e. at the rightmost bar,
# increasing from right to left.
def rightOffset = barCount - barNumber;
# This script gets the minimum value from data in the offset range between startIndex
# and endIndex. It serves as a functional but not direct replacement for the
# GetMinValueOffset function where a dynamic range is required. Expect it to be slow.
script getMinValueBetween {
    input data = low;
    input startIndex = 0;
    input endIndex = 0;
    plot minValue = fold index = startIndex to endIndex with minRunning = Double.POSITIVE_INFINITY do Min(GetValue(data, index), minRunning);
}
# Call this only once at the last bar.
script buildConditions {
    input startIndex = 1;
    input endIndex = -1;
    # Since endIndex < startIndex, getMinValueBetween() should never
    # be executed. However it is executed nevertheless.
    plot minValue = if (endIndex > startIndex) then getMinValueBetween(low, startIndex, endIndex) else close[startIndex];
}
plot scan;
if (rightOffset == 0) {
    scan = buildConditions();
} else {
    scan = 0;
}
declare lower;
The question has the answer in its first sentence.
One might contemplate using the if statement (vs the if function). However, that is broken, as demonstrated in "thinkscript if statement failure".
At least as of April 2021, the documentation for the if reserved word says:
... while the if-expression always calculates both then and else branches, the if-statement only calculates the branch defined by whether the condition is true or false.
(bolding and italics mine)
Definitely confusing and unexpected behavior!
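The difference between an eagerly evaluated if-expression and a branching if-statement can be illustrated by analogy in Python. This is only a stand-in for the documented behaviour, not a model of thinkscript internals; `if_expression` and `min_between` are hypothetical names.

```python
def if_expression(condition, then_value, else_value):
    """Function-style `if`: the caller computes both branch values
    before this function even runs."""
    return then_value if condition else else_value

calls = []

def min_between(start, end):
    calls.append((start, end))  # record that this branch was evaluated
    if start > end:
        raise ValueError(f"'from' cannot be greater than 'to': {start} > {end}")
    return 0.0

start, end = 1, -1

# Eager: min_between runs (and fails) although the condition is false.
try:
    if_expression(end > start, min_between(start, end), 42.0)
except ValueError as err:
    print("eager:", err)

# Lazy: Python's conditional expression evaluates only the chosen branch.
result = min_between(start, end) if end > start else 42.0
print("lazy:", result)
```

The eager call reproduces the same kind of failure as the fold error above, while the lazy form never touches the invalid range.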

C++ Trapezoidal Integration Function Returning Negative Numbers when it shouldn't

I am using the following function written in C++, whose purpose is to take the integral of one array of data (y) with respect to another (x)
#include <fstream>

// Define function to perform numerical integration by the trapezoidal rule
double trapz (double xptr[], double yptr[], int Npoints)
{
    // The trapzDiagFile object and associated output file are how I monitor what data the for loop actually sees.
    std::ofstream trapzDiagFile;
    trapzDiagFile.open("trapzDiagFile.txt",std::ofstream::out | std::ofstream::trunc);
    double buffer = 0.0;
    for (int n = 0; n < (Npoints - 1); n++)
    {
        buffer += 0.5 * (yptr[n+1] + yptr[n]) * (xptr[n+1] - xptr[n]);
        // Only points 0 .. Npoints-2 are logged; the last point is used above but never written.
        trapzDiagFile << xptr[n] << "," << yptr[n] << std::endl;
    }
    trapzDiagFile.close();
    return buffer;
}
I validated this function for the simple case where x contains 100 uniformly spaced points from 0 to 1, and y = x^2, and it returned 0.33334, as it should.
But when I use it for a different data set, it returns -3.431, which makes absolutely no sense. If you look in the attached image file, the integral I am referring to is the area under the curve between the dashed vertical lines.
It's definitely a positive number.
Moreover, I used the native trapz command in MATLAB on the same set of numbers and that returned 1.4376.
In addition, I translated the above C++ trapz function into MATLAB, line for line as closely as possible, and again got 1.4376.
I feel like there's something C++ related I'm not seeing here. If it is relevant, I am using minGW-w64.
Apologies for the vagueness of this post. If I knew more about what kind of issue I am seeing, it would be easier to be concise about it.
Plot of the dataset for which the trapz function (my homemade C++ version) returns -3.431:
Please check the value of xptr[Npoints - 1]. It may be less than xptr[Npoints - 2], and was not included in the values that you output.
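To see why a single bad final x value can flip the sign, here is a small Python sketch using the same arithmetic as the C++ loop. The data are hypothetical (the asker's data set is not available), and the stale last value is only one possible cause.

```python
def trapz(x, y):
    """Trapezoidal rule, same arithmetic as the C++ loop."""
    return sum(0.5 * (y[n + 1] + y[n]) * (x[n + 1] - x[n])
               for n in range(len(x) - 1))

def is_non_decreasing(x):
    """True when every x value is at least as large as the one before it."""
    return all(b >= a for a, b in zip(x, x[1:]))

y = [1.0, 1.0, 1.0, 1.0]
x_good = [0.0, 1.0, 2.0, 3.0]
# Same data, but the last slot holds a stale value, e.g. an array that
# was not filled all the way to Npoints.
x_bad = [0.0, 1.0, 2.0, -10.0]

print(trapz(x_good, y))  # 3.0
print(trapz(x_bad, y))   # -10.0: the last interval contributes 1.0 * (-12.0)
print(is_non_decreasing(x_good), is_non_decreasing(x_bad))  # True False
```

A monotonicity check like `is_non_decreasing` on the full x array, including the last element, would catch this before integrating.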

Solving a linear equation in one variable

What would be the most efficient algorithm to solve a linear equation in one variable given as a string input to a function? For example, for input string:
"x + 9 - 2 - 4 + x = - x + 5 - 1 + 3 - x"
The output should be 1.
I am considering using a stack and pushing each string token onto it as I encounter spaces in the string. If the input was in polish notation then it would have been easier to pop numbers off the stack to get to a result, but I am not sure what approach to take here.
It is an interview question.
Solving the linear equation is (I hope) extremely easy for you once you've worked out the coefficients a and b in the equation a * x + b = 0.
So, the difficult part of the problem is parsing the expression and "evaluating" it to find the coefficients. Your example expression is extremely simple, it uses only the operators unary -, binary -, binary +. And =, which you could handle specially.
It is not clear from the question whether the solution should also handle expressions involving binary * and /, or parentheses. I'm wondering whether the interview question is intended:
to make you write some simple code, or
to make you ask what the real scope of the problem is before you write anything.
Both are important skills :-)
It could even be that the question is intended:
to separate those with lots of experience writing parsers (who will solve it as fast as they can write/type) from those with none (who might struggle to solve it at all within a few minutes, at least without some hints).
Anyway, to allow for future more complicated requirements, there are two common approaches to parsing arithmetic expressions: recursive descent or Dijkstra's shunting-yard algorithm. You can look these up, and if you only need the simple expressions in version 1.0 then you can use a simplified form of Dijkstra's algorithm. Then once you've parsed the expression, you need to evaluate it: use values that are linear expressions in x and interpret = as an operator with lowest possible precedence that means "subtract". The result is a linear expression in x that is equal to 0.
If you don't need complicated expressions then you can evaluate that simple example pretty much directly from left-to-right once you've tokenised it[*]:
x
x + 9
// set the "we've found minus sign" bit to negate the first thing that follows
x + 7 // and clear the negative bit
x + 3
2 * x + 3
// set the "we've found the equals sign" bit to negate everything that follows
3 * x + 3
3 * x - 2
3 * x - 1
3 * x - 4
4 * x - 4
Finally, solve a * x + b = 0 as x = - b/a.
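The left-to-right evaluation with the two "bits" can be sketched in Python as follows. This is one possible reading of the steps above, not a definitive implementation; the function name `solve_left_to_right` is hypothetical, and the tokens are assumed to be '+', '-', '=', 'x' or non-negative integers, with at most one sign before each term.

```python
from fractions import Fraction

def solve_left_to_right(tokens):
    """Accumulate a*x + b = 0 from a token list, then return x = -b/a."""
    a = b = 0
    negate_next = False   # the "we've found a minus sign" bit
    after_equals = False  # the "we've found the equals sign" bit
    for tok in tokens:
        if tok == '+':
            negate_next = False
        elif tok == '-':
            negate_next = True
        elif tok == '=':
            after_equals, negate_next = True, False
        else:
            # A term is negated by a minus sign or by sitting right of '=';
            # both together cancel out.
            sign = -1 if negate_next != after_equals else 1
            if tok == 'x':
                a += sign
            else:
                b += sign * tok
            negate_next = False  # the minus applies to one term only
    if a == 0:
        raise ValueError("no unique solution: the x terms cancel")
    return Fraction(-b, a)

tokens = ['x', '+', 9, '-', 2, '-', 4, '+', 'x',
          '=', '-', 'x', '+', 5, '-', 1, '+', 3, '-', 'x']
print(solve_left_to_right(tokens))  # 1
```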
[*] example tokenisation code, in Python:
def tokenize(text):
    acc = None
    for idx, ch in enumerate(text):
        if ch in '1234567890':
            if acc is None: acc = 0
            acc = 10 * acc + int(ch)
            continue
        if acc is not None:
            yield acc
            acc = None
        if ch in '+-=x':
            yield ch
        elif ch == ' ':
            pass
        else:
            raise ValueError('illegal character "%s" at %d' % (ch, idx))
    # flush a number that ends the input
    if acc is not None:
        yield acc
Alternative example tokenisation code, also in Python, assuming there will always be spaces between tokens as in the example. This leaves token validation to the parser:
return input.split()
OK, some simple pseudo-code that you could use to solve this problem:
function(stringToParse){
    arrayoftokens = stringToParse.match(RegexMatching);
    foreach(arrayoftokens as token)
    {
        //now step through the tokens and determine what they are
        //and store the necessary information.
    }
    //Use the above information to do the arithmetic.
    //count the number of times a variable appears positive and negative
    //do the arithmetic.
    //add up the numbers both positive and negative.
    //return the result.
}
The first thing is to parse the string to identify the various tokens (numbers, variables and operators), so that an expression tree can be formed by giving the operators their proper precedence.
Regular expressions can help, but that's not the only method (grammar parsers like boost::spirit are good too, and you can even roll your own: it's all "find and recurse").
The tree can then be reduced by evaluating the nodes whose operations involve only constants, and by grouping the variable-related operations and executing them accordingly.
This goes on recursively until you are left with one variable-related node and one constant node.
At that point the solution is calculated trivially.
These are basically the same principles that lead to the production of an interpreter or a compiler.
Consider:
from operator import add, sub
def ab(expr):
    a, b, op = 0, 0, add
    for t in expr.split():
        if t == '+': op = add
        elif t == '-': op = sub
        elif t == 'x': a = op(a, 1)
        else : b = op(b, int(t))
    return a, b
Given an expression like 1 + x - 2 - x... this converts it to a canonical form ax+b and returns a pair of coefficients (a,b).
Now, let's obtain the coefficients from both parts of the equation:
le, ri = equation.split('=')
a1, b1 = ab(le)
a2, b2 = ab(ri)
and finally solve the trivial equation a1*x + b1 = a2*x + b2:
x = (b2 - b1) / (a1 - a2)
Of course, this only solves this particular example, without operator precedence or parentheses. To support the latter you'll need a parser, presumably a recursive descent one, which would be the simplest to code by hand.
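As an illustration of that last point, here is a hedged sketch of a small recursive descent parser that handles parentheses, unary minus and multiplication by constants, representing every value as a pair (a, b) for a*x + b. The grammar, the class name `Parser` and the helpers `tokenize` and `solve` are assumptions of this sketch; it does not check for leftover tokens and silently skips characters it does not recognise.

```python
import re
from fractions import Fraction

def tokenize(text):
    # Integers, the variable x, operators and parentheses.
    return re.findall(r"\d+|[x+\-*()]", text)

class Parser:
    """Recursive descent over linear expressions; values are (a, b) for a*x + b."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):  # expr := term (('+' | '-') term)*
        a, b = self.term()
        while self.peek() in ('+', '-'):
            sign = 1 if self.take() == '+' else -1
            a2, b2 = self.term()
            a, b = a + sign * a2, b + sign * b2
        return a, b

    def term(self):  # term := factor ('*' factor)*
        a, b = self.factor()
        while self.peek() == '*':
            self.take()
            a2, b2 = self.factor()
            if a and a2:
                raise ValueError("x * x is not linear")
            a, b = a * b2 + a2 * b, b * b2
        return a, b

    def factor(self):  # factor := '-' factor | '(' expr ')' | 'x' | integer
        tok = self.take()
        if tok == '-':
            a, b = self.factor()
            return -a, -b
        if tok == '(':
            value = self.expr()
            if self.take() != ')':
                raise ValueError("expected ')'")
            return value
        if tok == 'x':
            return 1, 0
        if tok is not None and tok.isdigit():
            return 0, int(tok)
        raise ValueError(f"unexpected token {tok!r}")

def solve(equation):
    left, right = equation.split('=')
    a1, b1 = Parser(tokenize(left)).expr()
    a2, b2 = Parser(tokenize(right)).expr()
    return Fraction(b2 - b1, a1 - a2)  # from a1*x + b1 = a2*x + b2

print(solve("2 * (x + 3) = 10 - (x - 2)"))  # 2
```

Using Fraction keeps non-integer solutions exact; an equation whose x terms cancel raises ZeroDivisionError in this sketch.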