I am a beginning thinkscript programmer and I am learning the syntax of thinkscript pretty fast. However, I am having trouble with the if statements. I understand you can have one statement inside of an if block but is it possible to have multiple statements in an if block?
Not: if (condition) then (this) else (that);
but: if (condition) then { (this); (that);};
thinkScript essentially has three forms of if usage. All three forms require an else branch as well. One form allows for setting or plotting one or more values. The other two only allow one value to be set or plotted.
if statement: can set one or more values, for plot or def variables, within the brackets.
def val1;
plot val2;
if (cond) {
val1 = <value>;
val2 = <value>;
} else {
# commonly used options:
# sets the variable to be Not a Number
val1 = Double.NaN;
# sets the variable to what it was in the previous bar
# commonly used for recursive counting or retaining a past value across bars
val2 = val2[1];
}
if expression: all-in-one expression for setting a value. Can only set one value based on a condition, but can be used within other statements. This version is used commonly for recursively counting items across bars and for displaying different colors, say, based on a condition.
def val1 = if <condition> then <value if true> else <value if false>;
if function: similar to the above, but more compact, this is thinkScript(r)'s version of a ternary conditional statement. It's difference is that the true and false values must be double values. Therefore, it can't be used in setting colors, say, or other items that don't represent double values.
def var1 = if(<condition>, <value if true>, <value if false>);
The following example, modified from the thinkScript API doc for the if function, demonstrates using all three versions. I've added a 2nd variable to demonstrate how the if statement can set multiple values at once based on the same condition:
# using version 3, "if function"
plot Maximum1 = if(close > open, close, open);
# using version 2, "if expression"
plot Maximum2 = if close > open then close else open;
# using version 1, "if statement", with two variables, a `plot` and a `def`
plot Maximum3;
def MinimumThing;
if close > open {
Maximum3 = close;
MimimumThing = open;
} else {
Maximum3 = open;
MinimumThing = close;
}
As a side note, though the example doesn't show it, one can use the def keyword as well as the plot keyword for defining variables' values with these statements.
It is possible. Just mistaken plot variables for conditional statements since they, in turn, use conditional statements to draw out the graphics.
Related
Goal: I have a bunch of keywords I'd like to categorise automatically based on topic parameters I set. Categories that match must be in the same column so the keyword data can be filtered.
e.g. If I have "Puppies" as a first topic, it shouldn't appear as a secondary or third topic otherwise the data cannot be filtered as needed.
Example Data: https://docs.google.com/spreadsheets/d/1TWYepApOtWDlwoTP8zkaflD7AoxD_LZ4PxssSpFlrWQ/edit?usp=sharing
Video: https://drive.google.com/file/d/11T5hhyestKRY4GpuwC7RF6tx-xQudNok/view?usp=sharing
Parameters Tab: I will add words in columns D-F that change based on the keyword data set and there will often be hundreds, if not thousands, of options for larger data sets.
Categories Tab: I'd like to have a formula or script that goes down the columns D-F in Parameters and fills in a corresponding value (in Categories! columns D-F respectively) based on partial match with column B or C (makes no difference to me if there's a delimiter like a space or not. Final data sheet should only have one of these columns though).
Things I've Tried:
I've tried a bunch of things. Nested IF formula with regexmatch works but seems clunky.
e.g. this formula in Categories! column D
=IF(REGEXMATCH($B2,LOWER(Parameters!$D$3)),Parameters!$D$3,IF(REGEXMATCH($B2,LOWER(Parameters!$D$4)),Parameters!$D$4,""))
I nested more statements changing out to the next cell in Parameters!D column (as in , manually adding $D$5, $D$6 etc) but this seems inefficient for a list thousands of words long. e.g. third topic will get very long once all dog breed types are added.
Any tips?
Functionality I haven't worked out:
if a string in Categories B or C contains more than one topic in the parameters I set out, is there a way I can have the first 2 to show instead of just the first one?
e.g. Cell A14 in Categories, how can I get a formula/automation to add both "Akita" & "German Shepherd" into the third topic? Concatenation with a CHAR(10) to add to new line is ideal format here. There will be other keywords that won't have both in there in which case these values will just show up individually.
Since this data set has a bunch of mixed breeds and all breeds are added as a third topic, it would be great to differentiate interest in mixes vs pure breeds without confusion.
Any ideas will be greatly appreciated! Also, I'm open to variations in layout and functionality of the spreadsheet in case you have a more creative solution. I just care about efficiently automating a tedious task!!
Try using custom function:
To create custom function:
1.Create or open a spreadsheet in Google Sheets.
2.Select the menu item Tools > Script editor.
3.Delete any code in the script editor and copy and paste the code below into the script editor.
4.At the top, click Save save.
To use custom function:
1.Click the cell where you want to use the function.
2.Type an equals sign (=) followed by the function name and any input value — for example, =DOUBLE(A1) — and press Enter.
3.The cell will momentarily display Loading..., then return the result.
Code:
function matchTopic(p, str) {
var params = p.flat(); //Convert 2d array into 1d
var buildRegex = params.map(i => '(' + i + ')').join('|'); //convert array into series of capturing groups. Example (Dog)|(Puppies)
var regex = new RegExp(buildRegex,"gi");
var results = str.match(regex);
if(results){
// The for loops below will convert the first character of each word to Uppercase
for(var i = 0 ; i < results.length ; i++){
var words = results[i].split(" ");
for (let j = 0; j < words.length; j++) {
words[j] = words[j][0].toUpperCase() + words[j].substr(1);
}
results[i] = words.join(" ");
}
return results.join(","); //return with comma separator
}else{
return ""; //return blank if result is null
}
}
Example Usage:
Parameters:
First Topic:
Second Topic:
Third Topic:
Reference:
Custom Functions
I've added a new sheet ("Erik Help") with separate formulas (highlighted in green currently) for each of your keyword columns. They are each essentially the same except for specific column references, so I'll include only the "First Topic" formula here:
=ArrayFormula({"First Topic";IF(A2:A="",,IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) & IFERROR(CHAR(10)®EXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))))})
This formula first creates the header (which can be changed within the formula itself as you like).
The opening IF condition leaves any row in the results column blank if the corresponding cell in Column A of that row is also blank.
JOIN is used to form a concatenated string of all keywords separated by the pipe symbol, which REGEXEXTRACT interprets as OR.
IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) will attempt to extract any of the keywords from each concatenated string in Columns B and C. If none is found, IFERROR will return null.
Then a second-round attempt is made:
& IFERROR(CHAR(10)®EXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>"")))))
Only this time, REGEXREPLACE is used to replace the results of the first round with null, thus eliminating them from being found in round two. This will cause any second listing from the JOIN clause to be found, if one exists. Otherwise, IFERROR again returns null for round two.
CHAR(10) is the new-line character.
I've written each of the three formulas to return up to two results for each keyword column. If that is not your intention for "First Topic" and "Second Topic" (i.e., if you only wanted a maximum of one result for each of those columns), just select and delete the entire round-two portion of the formula shown above from the formula in each of those columns.
The thinkscript if statement fails to branch as expected in some cases. The following test case can be used to reproduce this bug / defect.
It is shared via Grid containing chart and script
To cut the long story short, a possible workaround in some cases is to use the if-expression which is a function, which may be slower, potentially leading to Script execution timeout in scans.
This fairly nasty bug in thinkscript prevents me from writing some scans and studies the way I need to.
Following is some sample code that shows the problem on a chart.
input price = close;
input smoothPeriods = 20;
def output = Average(price, smoothPeriods);
# Get the current offset from the right edge from BarNumber()
# BarNumber(): The current bar number. On a chart, we can see that the number increases
# from left 1 to number of bars e.g. 140 at the right edge.
def barNumber = BarNumber();
def barCount = HighestAll(barNumber);
# rightOffset: 0 at the right edge, i.e. at the rightmost bar,
# increasing from right to left.
def rightOffset = barCount - barNumber;
# Prepare a lookup table:
def lookup;
if (barNumber == 1) {
lookup = -1;
} else {
lookup = 53;
}
# This script gets the minimum value from data in the offset range between startIndex
# and endIndex. It serves as a functional but not direct replacement for the
# GetMinValueOffset function where a dynamic range is required. Expect it to be slow.
script getMinValueBetween {
input data = low;
input startIndex = 0;
input endIndex = 0;
plot minValue = fold index = startIndex to endIndex with minRunning = Double.POSITIVE_INFINITY do Min(GetValue(data, index), minRunning);
}
# Call this only once at the last bar.
script buildValue {
input lookup = close;
input offsetLast = 0;
# Do an indirect lookup
def lookupPosn = 23;
def indirectLookupPosn = GetValue(lookup, lookupPosn);
# lowAtIndirectLookupPosn is assigned incorrectly. The if statement APPEARS to be executed
# as if indirectLookupPosn was 0 but indirectLookupPosn is NOT 0 so the condition
# for the first branch should be met!
def lowAtIndirectLookupPosn;
if (indirectLookupPosn > offsetLast) {
lowAtIndirectLookupPosn = getMinValueBetween(low, offsetLast, indirectLookupPosn);
} else {
lowAtIndirectLookupPosn = close[offsetLast];
}
plot testResult = lowAtIndirectLookupPosn;
}
plot debugLower;
if (rightOffset == 0) {
debugLower = buildValue(lookup);
} else {
debugLower = 0;
}
declare lower;
To prepare the chart for the stock ADT, please set custom time frame:
10/09/18 to 10/09/19, aggregation period 1 day.
The aim of the script is to find the low value of 4.25 on 08/14/2019.
I DO know that there are various methods to do this in thinkscript such as GetMinValueOffset().
Let us please not discuss alternative methods of achieving the objective to find the low, alternatives for the attached script.
Because I am not asking for help achieving the objective. I am reporting a bug, and I want to know what goes wrong and perhaps how to fix it. In other words, finding the low here is just an example to make the script easier to follow. It could be anything else that one wants a script to compute.
Please let me describe the script.
First it does some smoothing with a moving average. The result is:
def output;
Then the script defines the distance from the right edge so we can work with offsets:
def rightOffset;
Then the script builds a lookup table:
def lookup;
script getMinValueBetween {} is a little function that finds the low between two offset positions, in a dynamic way. It is needed because GetMinValueOffset() does not accept dynamic parameters.
Then we have script buildValue {}
This is where the error occurs. This script is executed at the right edge.
buildValue {} does an indirect lookup as follows:
First it goes into lookup where it finds the value 53 at lookupPosn = 23.
With 53, if finds the low between offset 53 and 0, by calling the script function getMinValueBetween().
It stores the value in def lowAtIndirectLookupPosn;
As you can see, this is very simple indeed - only 38 lines of code!
The problem is, that lowAtIndirectLookupPosn contains the wrong value, as if the wrong branch of the if statement was executed.
plot testResult should put out the low 4.25. Instead it puts out close[offsetLast] which is 6.26.
Quite honestly, this is a disaster because it is impossible to predict which of any if statement in your program will fail or not.
In a limited number of cases, the if-expression can be used instead of the if statement. However the if-expression covers only a subset of use cases and it may execute with lower performance in scans. More importantly,
it defeats the purpose of the if statement in an important case because it supports conditional assignment but not conditional execution. In other words, it executes both branches before assigning one of two values.
Can someone explain to me why only the first IF ELSE statement in my code works? I am trying to combine multiple variables into one.
DATA BCMasterSet2;
SET BCMasterSet;
drop PositiveLymphNodes1;
if PositiveLymphNodes1 = "." then PositiveLymphNodes =
put(PositiveLymphNodes2, 2.);
else PositiveLymphNodes = PositiveLymphNodes1;
if PositiveLymphNodes2 = "." then new_posLymph = put(PositiveLymphNodes,
2.);
else new_posLymph = PositiveLymphNodes2;
RUN;
Here is a nice screenshot of what the incorrect output looks like:OUTPUT
Thanks!
Hard to say without seeing all of your data, but I have a suspicion: is positivelymphnodes1 character or numeric? Is it ever actually equal to "."?
If you are trying to say "if PositiveLymphNodes1 is missing", then you can say that this way:
if missing(positivelymphnodes1) then ...
You can also do the same thing using coalesce or coalescec (the latter is character, the former numeric, in its return value). It chooses the first nonmissing argument. - so if the first argument is missing, it chooses the second.
positiveLymphNodes = coalescec(PositiveLymphNodes1, put(positiveLymphNodes2,2.));
new_posLymph = coalescec(positiveLymphNodes2, put(positiveLymphNodes,2.));
I would be curious why you're using put only in one place and not the other - use it in both or neither, I would suggest.
I will present the simplified version of what I want to do. I know how to do it easily in SAS but not in Stata.
Let's say I am trying to create a "poor" binary variable = 1 if an observation is classified as poor and 0 otherwise. I want to have two classifications, one is based on real income, and another based on real consumption (these are variables in the dataset).
The SAS macro would be
%MACRO poverty_bin(type=, measure=)
DATA dataset;
SET dataset;
IF &measure. <= poverty_line THEN poor&type. = 1 ELSE poor&type. = 0;
RUN;
%MEND
%poverty_bin(type=con, measure=real_consumption);
%poverty_bin(type=inc, measure=real_income);
which should create two binary variables poor_con and poor_inc.
I have no idea how to do this in Stata. I tried doing something like this just to see if nested foreach is what I'm looking for:
foreach x of newlist con inc {
foreach y of newlist real_income real_consumption{
display "`x' and `y'"
}
}
But it gives an error message saying "variable real_income already defined"
The error message you cite implies that earlier code you do not show us created a variable real_income.
I do not know SAS but I can tell you that given a numeric variable x
gen y = x <= 42
will create a variable y with value 1 if x <= 42 and 0 otherwise.
For another such variable, use another similar statement. In Stata and perhaps any other language, setting up a nested loop or defining a program instead of making two statements directly seems overkill. For a number of new variables much larger than 2, that might not be true.
foreach v in x y {
gen new`v' = `v' <= 42
}
For completely arbitrary existing names, new names and thresholds it is likely to be easier to write out statements individually.
This is documented. See for example 13.2.2 in [U] or this FAQ.
I am using Stata 14. I have US states and corresponding regions as integer.
I want create a string variable that represents the region for each observation.
Currently my code is
gen div_name = "A"
replace div_name = "New England" if div_no == 1
replace div_name = "Middle Atlantic" if div_no == 2
.
.
replace div_name = "Pacific" if div_no == 9
..so it is a really long code.
I was wondering if there is a shorter way to do this where I can automate assigning values rather than manually hard coding them.
You can define value labels in one line with label define and then use decode to create the string variable. See the help for those commands.
If the correspondence was defined in a separate dataset you could use merge. See e.g. this FAQ
There can't be a short-cut here other than typing all the names at some point or exploiting the fact that someone else typed them earlier into a file.
With nine or so labels, typing them yourself is quickest.
Note that you type one statement more than you need, even doing it the long way, as you could start
gen div_name = "New England" if div_no == 1