Concurrency - how to assign variable from separate task? - concurrency

Consider the following scenario:
|---------------------|------------------|
| Task 0 | Task 1 |
|---------------------|------------------|
| x = 1; | y = 1; |
|---------------------|------------------|
| a = y; | b = x; |
|---------------------|------------------|
What is the value of a and b at the end of the concurrent tasks? I have no idea how to approach this problem.
Each assignment executes atomically. Within tasks, the statements occur in order. Before execution, both x and y are set to 0.

You can't be certain of the values after both tasks have run even though the assignments are atomic. This is because you are still dealing with two tasks running concurrently and the values have a dependency on an interleaving of statements which can't be guaranteed. But you can certainly produce a set of possible value pairs.
For example, say Task 0 runs to completion before Task 1 is able to set y. We have {a = 0, b = 1}. The same can be said when Task 1 runs to completion before Task 0 is able to set x giving {a = 0, b = 1}, {a = 1, b = 0}. The simple example leaves only one other possibility since they both can't be 0 so the final set would be:
[{a = 0, b = 1}, {a = 1, b = 0}, {a = 1, b = 1}]
How to assign variable from separate task?
The easiest way to avoid concurrency issues is to not create them. Let the tasks return the values of x and y and isolate the mutations of a and b in sequential code. The tasks have a dependency on one another so why not pull that dependency out of the task where it's safe?

Related

Let a variable equal multiple values in an if-statement [duplicate]

This question already has an answer here:
Generating a new variable using conditional statements
(1 answer)
Closed 3 years ago.
I am doing data clean-up in Stata and I need to recode a variable to equal 1 if a whole set of other variables are equal to 1, 6, or 7.
I can do this using the code below:
replace anyadl = 1 if diffdress==1 | diffdress==6 | diffdress==7 | ///
diffwalk==1 | diffwalk==6 | diffwalk==7 | ///
diffbath==1 | diffbath==6 | diffbath==7 | ///
diffeat==1 | diffeat==6 | diffeat==7 | ///
diffbed==1 | diffbed==6 | diffbed==7 | ///
difftoi==1 | difftoi==6 | difftoi==7
However, this is very inefficient to type out and it is easy to make errors.
Is there a simpler way to do this?
For example, something along the following lines:
replace anyadl = 1 if diff* == (1 | 6 | 7)
Your fantasy syntax wouldn't do what you want even if it were legal, as for example 1|6|7 would be evaluated as 1. That is, in Stata 1 OR 6 OR 7 is in effect true OR true OR true, so true, and thus 1, given the rules non-zero is true as input and true is 1 as output. The expression is 1|6|7 is legal; it's the wildcard in an equality or inequality that isn't.
Stepping back, your code is producing an indicator (some people say dummy) variable with values 1 or missing. In practice such a variable is much more useful if created with values 0 and 1 (and in some instances missing too).
generate anyad1 = 0
foreach v in dress walk bath eat bed toi {
replace anyad1 = 1 if inlist(diff`v', 1, 6, 7)
}
is one approach. In general, note both inlist(foo, 1, 6, 7) and inlist(1, foo, bar, bazz) as useful constructs.
Reading:
This paper on generating indicators
This one on useful functions
This one on inlist() and inrange()
FAQ on true and false in Stata

Ignore missing values when generating new variable

I want to create a new variable in Stata, that is a function of 3 different variables, X, Y and Z, like:
gen new_var = (((X)*3) + ((Y)*2) + ((Z)*4))/7
All observations have missing values for one or two of the variables.
When I run the aforementioned command, all it generates are missing values, because no observation has values for all 3 of the variables. I would like Stata to complete the function ignoring the missing variables.
I tried the following commands without success:
gen new_var= (cond(missing(X*3),., X) + cond(missing(Y*2),., Y))/7
gen new_var= (!missing(X*3+Y*2+Z*4)/7)
gen new_var= (max(X , Y, Z)/7) if missing(X , Y, Z)
The egen command does not allow complicated functions; otherwise rowtotal() could work.
EDIT:
To clarify, "ignoring missing variables" means that even if any one of the component variables is not missing, then apply the function to only that variable and produce a value for the new variable. The new variable should have missing values only when all three component variables are missing.
I am going to guess that "ignoring missing values" means "treating them as zeros". If you have some other idea, you should make it explicit.
That could be
gen new_var = (cond(missing(X), 0, 3 * X) ///
+ cond(missing(Y), 0, 2 * Y) ///
+ cond(missing(Z), 0, 4 * Z)) / 7
Let's look at your solutions and explain why they are all wrong either in general or usually.
(cond(missing(X*3),., X) + cond(missing(Y*2),., Y))/7
It is sufficient is note that if it's true that X is missing, then cond() yields missing, as then X * 3 is missing too. The same kind of remark applies to terms involving Y and Z. So you're replacing any missing values by missing values, which is no gain.
!missing(X*3+Y*2+Z*4)/7
Given the information that at least one of X Y Z is always missing, then this always evaluates to 0/7 or 0. Even if X Y Z were all non-missing, then it would evaluate to 1/7. That is a long way from the sum you want. missing() always yields 1 or 0, and its negation thus 0 or 1.
(max(X, Y, Z)/7) if missing(X , Y, Z)
The maximum of X, Y, Z will be the right answer if and only if one of the values is not missing and the other two are missing. max() ignores missings to the extent possible (even though in other contexts missings are treated as if arbitrarily large positive numbers).
If you just want to "ignore missing values" without "treating them as zeros", the following will work:
clear
set obs 10
generate X = rnormal(5, 2)
generate Y = rnormal(10, 5)
generate Z = rnormal(1, 10)
replace X = . in 2
replace Y = . in 5
replace Z = . in 9
generate new_var = (((X)*3) + ((Y)*2) + ((Z)*4)) / 7 if X != . | Y != . | Z != .
list
+---------------------------------------------+
| X Y Z new_var |
|---------------------------------------------|
1. | 3.651024 3.48609 -24.1695 -11.25039 |
2. | . 14.14995 8.232919 . |
3. | 3.689442 9.812483 1.154064 5.044221 |
4. | 2.500493 13.02909 5.25539 7.797317 |
5. | 4.19431 . 6.584174 . |
6. | 7.221717 13.92533 5.045283 9.956708 |
7. | 5.746871 14.26329 3.828253 8.725744 |
8. | 1.396223 16.2358 19.01479 16.10277 |
9. | 4.633088 13.95751 . . |
10. | 2.521546 4.490258 -3.396854 .422534 |
+---------------------------------------------+
Alternatively, you could also use the inlist() function:
generate new_var = (((X)*3) + ((Y)*2) + ((Z)*4)) / 7 if !inlist(., X, Y, Z)

Django ORM calculations between records

Is it possible to perform calculations between records in a Django query?
I know how to perform calculations across records (e.g. data_a + data_b). Is there way to perform say the percent change between data_a row 0 and row 4 (i.e. 09-30-17 and 09-30-16)?
+-----------+--------+--------+
| date | data_a | data_b |
+-----------+--------+--------+
| 09-30-17 | 100 | 200 |
| 06-30-17 | 95 | 220 |
| 03-31-17 | 85 | 205 |
| 12-31-16 | 80 | 215 |
| 09-30-16 | 75 | 195 |
+-----------+--------+--------+
I am currently using Pandas to perform these type of calculations, but would like eliminate this additional step if possible.
I would go with a Database cursor raw SQL
(see https://docs.djangoproject.com/en/2.0/topics/db/sql/)
combined with a Lag() window function as so:
result = cursor.execute("""
select date,
data_a - lag(data_a) over (order by date) as data_change,
from foo;""")
This is the general idea, you might need to change it according to your needs.
There is no row 0 in a Django database, so we'll assume rows 1 and 5.
The general formula for calculation of percentage as expressed in Python is:
((b - a) / a) * 100
where a is the starting number and b is the ending number. So in your example:
a = 100
b = 75
((b - a) / a) * 100
-25.0
If your model is called Foo, the queries you want are:
(a, b) = Foo.objects.filter(id__in=[id_1, id_2]).values_list('data_a', flat=True)
values_list says "get just these fields" and flat=True means you want a simple list of values, not key/value pairs. By assigning it to the (a, b) tuple and using the __in= clause, you get to do this as a single query rather than as two.
I would wrap it all up into a standalone function or model method:
def pct_change(id_1, id_2):
# Get a single column from two rows and return percentage of change
(a, b) = Foo.objects.filter(id__in=[id_1, id_2]).values_list('data_a', flat=True)
return ((b - a) / a) * 100
And then if you know the row IDs in the db for the two rows you want to compare, it's just:
print(pct_change(233, 8343))
If you'd like to calculate progressively the change between row 1 and row 2, then between row 2 and row 3, and so on, you'd just run this function sequentially for each row in a queryset. Because row IDs might have gaps we can't just use n+1 to compute the next row. Instead, start by getting a list of all the row IDs in a queryset:
rows = [r.id for r in Foo.objects.all().order_by('date')]
Which evaluates to something like
rows = [1,2,3,5,6,9,13]
Now for each elem in list and the next elem in list, run our function:
for (index, row) in enumerate(rows):
if index < len(rows):
current, next_ = row, rows[index + 1]
print(current, next_)
print(pct_change(current, next_))

Execute two commands in single-line If -statement

I want to execute two commands as part of a single-line If -statement:
Though below snippet runs without error, variable $F_Akr_Completed is not set to 1, but the MsgBox() is displayed properly (with "F is 0").
$F_Akr_Completed = 0
$PID_Chi = Run($Command)
If $F_Akr_Completed = 0 And Not ProcessExists($PID_Chi) Then $F_Akr_Completed = 1 And MsgBox(1,1,"[Info] Akron parser completed. F is " & $F_Akr_Completed)
Any idea why there is no syntax-error reported when it's not functional?
There is no error, because
If x = x Then x And x
is a valid statement, and x And x is a logical expression. There are many ways you can do this, e.g.:
If Not ($F_Akr_Completed And ProcessExists($PID_Chi)) Then $F_Akr_Completed = 1 + 0 * MsgBox(1,1,"[Info] Akron parser completed. F is " & 1)
But that is a bad style of coding. AutoIt is a mostly verbose language and I recommend to seperate multiple statements.
You can also assign values using the ternary operator:
$F_Akr_Completed = (Not ($F_Akr_Completed And ProcessExists($PID_Chi))) ? 1 : 0
which is the same as
$F_Akr_Completed = Int(Not ($F_Akr_Completed And ProcessExists($PID_Chi)))

Stata: Counting number of consecutive occurrences of a pre-defined length

Observations in my data set contain the history of moves for each player. I would like to count the number of consecutive series of moves of some pre-defined length (2, 3 and more than 3 moves) in the first and the second halves of the game. The sequences cannot overlap, i.e. the sequence 1111 should be considered as a sequence of the length 4, not 2 sequences of length 2. That is, for an observation like this:
+-------+-------+-------+-------+-------+-------+-------+-------+
| Move1 | Move2 | Move3 | Move4 | Move5 | Move6 | Move7 | Move8 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| 1 | 1 | 1 | 1 | . | . | 1 | 1 |
+-------+-------+-------+-------+-------+-------+-------+-------+
…the following variables should be generated:
Number of sequences of 2 in the first half =0
Number of sequences of 2 in the second half =1
Number of sequences of 3 in the first half =0
Number of sequences of 3 in the second half =0
Number of sequences of >3 in the first half =1
Number of sequences of >3 in the second half = 0
I have two potential options of how to proceed with this task but neither of those leads to the final solution:
Option 1: Elaborating on Nick’s tactical suggestion to use strings (Stata: Maximum number of consecutive occurrences of the same value across variables), I have concatenated all “move*” variables and tried to identify the starting position of a substring:
egen test1 = concat(move*)
gen test2 = subinstr(test1,"11","X",.) // find all consecutive series of length 2
There are several problems with Option 1:
(1) it does not account for cases with overlapping sequences (“1111” is recognized as 2 sequences of 2)
(2) it shortens the resulting string test2 so that positions of X no longer correspond to the starting positions in test1
(3) it does not account for variable length of substring if I need to check for sequences of the length greater than 3.
Option 2: Create an auxiliary set of variables to identify the starting positions of the consecutive set (sets) of the 1s of some fixed predefined length. Building on the earlier example, in order to count sequences of length 2, what I am trying to get is an auxiliary set of variables that will be equal to 1 if the sequence of started at a given move, and zero otherwise:
+-------+-------+-------+-------+-------+-------+-------+-------+
| Move1 | Move2 | Move3 | Move4 | Move5 | Move6 | Move7 | Move8 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+-------+-------+-------+-------+-------+-------+-------+-------+
My code looks as follows but it breaks when I am trying to restart counting consecutive occurrences:
quietly forval i = 1/42 {
gen temprow`i' =.
egen rowsum = rownonmiss(seq1-seq`i') //count number of occurrences
replace temprow`i'=rowsum
mvdecode seq1-seq`i',mv(1) if rowsum==2
drop rowsum
}
Does anyone know a way of solving the task?
Assume a string variable concatenating all moves all (the name test1 is hardly evocative).
FIRST TRY: TAKING YOUR EXAMPLE LITERALLY
From your example with 8 moves, the first half of the game is moves 1-4 and the second half moves 5-8. Thus there is for each half only one way to have >3 moves, namely that there are 4 moves. In that case each substring will be "1111" and counting reduces to testing for the one possibility:
gen count_1_4 = substr(all, 1, 4) == "1111"
gen count_2_4 = substr(all, 5, 4) == "1111"
Extending this approach, there are only two ways to have 3 moves in sequence:
gen count_1_3 = inlist(substr(all, 1, 4), "111.", ".111")
gen count_2_3 = inlist(substr(all, 5, 4), "111.", ".111")
In similar style, there can't be two instances of 2 moves in sequence in each half of the game as that would qualify as 4 moves. So, at most there is one instance of 2 moves in sequence in each half. That instance must match either of two patterns, "11." or ".11". ".11." is allowed, so either includes both. We must also exclude any false match with a sequence of 3 moves, as just mentioned.
gen count_1_2 = (strpos(substr(all, 1, 4), "11.") | strpos(substr(all, 1, 4), ".11") ) & !count_1_3
gen count_2_2 = (strpos(substr(all, 5, 4), "11.") | strpos(substr(all, 5, 4), ".11") ) & !count_2_3
The result of each strpos() evaluation will be positive if a match is found and (arg1 | arg2) will be true (1) if either argument is positive. (For Stata, non-zero is true in logical evaluations.)
That's very much tailored to your particular problem, but not much worse for that.
P.S. I didn't try hard to understand your code. You seem to be confusing subinstr() with strpos(). If you want to know positions, subinstr() cannot help.
SECOND TRY
Your last code segment implies that your example is quite misleading: if there can be 42 moves, the approach above can not be extended without pain. You need a different approach.
Let's suppose that the string variable all can be 42 characters long. I will set aside the distinction between first and second halves, which can be tackled by modifying this approach. At its simplest, just split the history into two variables, one for the first half and one for the second and repeat the approach twice.
You can clone the history by
clonevar work = all
gen length1 = .
gen length2 = .
and set up your count variables. Here count_4 will hold counts of 4 or more.
gen count_4 = 0
gen count_3 = 0
gen count_2 = 0
First we look for move sequences of length 42, ..., 2. Every time we find one, we blank it out and bump up the count.
qui forval j = 42(-1)2 {
replace length1 = length(work)
local pattern : di _dup(`j') "1"
replace work = subinstr(work, "`pattern'", "", .)
replace length2 = length(work)
if `j' >= 4 {
replace count4 = count4 + (length1 - length2) / `j'
}
else if `j' == 3 {
replace count3 = count3 + (length1 - length2) / 3
}
else if `j' == 2 {
replace count2 = count2 + (length1 - length2) / 2
}
}
The important details here are
If we delete (repeated instances of) a pattern and measure the change in length, we have just deleted (change in length) / (length of pattern) instances of that pattern. So, if I look for "11" and found that the length decreased by 4, I just found two instances.
Working downwards and deleting what we found ensures that we don't find false positives, e.g. if "1111111" is deleted, we don't find later "111111", "11111", ..., "11" which are included within it.
Deletion implies that we should work on a clone in order not to destroy what is of interest.