How to use a list of indices to partition a list of observed data in PyMC3?

I have a list of observed data score and a list of indices ind. Every element of ind is either 0, 1, or 2. score and ind have the same length, and ind partitions score into three sets: if ind[i] is k, then score[i] is in set k.
I would like to fit three normal distributions to the data, one normal for set 0, one normal for set 1, and one normal for set 2. My PyMC3 code to set up the model is:
with pm.Model():
    mean = pm.Uniform('mean', 0, 1, shape=3)
    sd = pm.Uniform('sd', 0, 1, shape=3)
    mean_i = pm.Deterministic('mean_i', mean[ind])
    sd_i = pm.Deterministic('sd_i', sd[ind])
    obs = pm.Normal('obs', mu=mean_i, sd=sd_i, observed=score)
But mean_i seems to have the wrong shape: the traceplots show it to have three elements, rather than just a single element as I expected. And the expression mean[ind] looks wrong: how does PyMC3 know that it should use ind in a way that aligns it with score?
How can I do this?

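You can do this by indexing mean and sd with ind directly in the likelihood:
import pymc3 as pm
with pm.Model():
    mean = pm.Uniform('mean', 0, 1, shape=3)
    sd = pm.Uniform('sd', 0, 1, shape=3)
    # mean[ind] and sd[ind] select the group parameter for each observation
    obs = pm.Normal('obs', mu=mean[ind], sd=sd[ind], observed=score)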
For future reference you can also ask questions here
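On the question of how ind lines up with score: indexing a length-3 vector with ind produces one entry per observation, in the same order as score. Below is a minimal plain-Python sketch of that lookup. The numbers are made up for illustration, and the list comprehension stands in for the array expression mean[ind]; it is not PyMC3 code.

```python
# Hypothetical values, for illustration only.
mean = [0.2, 0.5, 0.8]                    # one mean per group (0, 1, 2)
ind = [0, 2, 1, 0, 2]                     # group label of each observation
score = [0.21, 0.79, 0.48, 0.18, 0.83]    # observed data, same length as ind

# Plain-Python stand-in for the array expression mean[ind]:
# element i of the result is the mean of the group that score[i] belongs to.
mean_i = [mean[k] for k in ind]

print(mean_i)                      # [0.2, 0.8, 0.5, 0.2, 0.8]
print(len(mean_i) == len(score))   # True
```

Because the result has the same length and order as score, each observation is paired with the parameters of its own group.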

Related

Simple way to comprehend a list from a secondary list, only for ascending values

I have data in a pandas dataframe that consists of values that increase to a point, and then start decreasing. I am wondering how to simply extract the values up to the point at which they stop increasing.
For example,
d = {'values' : [1, 2, 3, 3, 2, 1]}
df = pd.DataFrame(data=d)
desired result = [1, 2, 3]
This is my attempt, which I thought would check to see if the current list index is larger than the previous, then move on:
result = [i for i in df['values'] if df['values'][i-1] < df['values'][i]]
which returns
[1, 2, 2, 1]
I'm unsure what is happening for that to be the result.
Edit:
Utilizing the .diff() function, suggested by Andrej, combined with list comprehension, I get the same result. (the numpy np.isnan() is used to include the first element of the difference list, which is NaN).
result = [i for i in df['values']
          if df['values'].diff().iloc[i] > 0
          or np.isnan(df['values'].diff().iloc[i])]
result = [1, 2, 2, 1]
You can use .diff() to get the difference between consecutive values. While the values are increasing, the difference is positive. As the next step, take the .cumsum() of these differences and find the index of its maximum:
print(df.loc[: df["values"].diff().cumsum().idxmax()])
Prints:
   values
0       1
1       2
2       3
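As for why the comprehension in the question returns [1, 2, 2, 1]: it iterates over the values themselves and then uses each value as an index. The sketch below reproduces that with a plain list and adds a small helper that keeps only the strictly increasing prefix. The name increasing_prefix is hypothetical, and this is an alternative to the .diff() approach, not a replacement for it.

```python
values = [1, 2, 3, 3, 2, 1]

# The comprehension from the question: i is a value, yet it is used as an index.
buggy = [i for i in values if values[i - 1] < values[i]]
print(buggy)  # [1, 2, 2, 1]

def increasing_prefix(seq):
    """Return the leading run of strictly increasing values (hypothetical helper)."""
    result = []
    for v in seq:
        # Stop at the first value that does not exceed the previous one.
        if result and v <= result[-1]:
            break
        result.append(v)
    return result

print(increasing_prefix(values))  # [1, 2, 3]
```

With a DataFrame, the helper could be called as increasing_prefix(df['values'].tolist()).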

Chapel domains : differences between `low/high` and `first/last` methods

Chapel domains have two sets of methods
domain.low, domain.high
and
domain.first, domain.last
What are the various cases where these return different results (i.e., when is domain.first != domain.low or domain.last != domain.high)?
First, note that these queries are supported not just on domains, but also on ranges (a simpler type representing an integer sequence upon which many domains, and their domain queries, are based). For that reason, my answer will initially focus on ranges for simplicity, before returning to dense rectangular domains (which are defined using a range per dimension).
As background, first and last on a range are designed to specify the indices that you'll get when iterating over that range. In contrast, low and high specify the minimal and maximal indices that define the range.
For a simple range, like 1..10, first and low will be the same, evaluating to 1, while last and high will both evaluate to 10.
The way you iterate through a range in reverse order in Chapel is by using a negative stride like 1..10 by -1. For this range, low and high will still be 1 and 10 respectively, but first will be 10 and last will be 1 since the range represents the integers 10, 9, 8, ..., 1.
Chapel also supports non-unit strides, and they can also result in differences. For example for the range 1..10 by 2, low and high will still be 1 and 10 respectively, and first will still be 1 but last will be 9 since this range only represents the odd values between 1 and 10.
The following program demonstrates these cases along with 1..10 by -2 which I'll leave as an exercise for the reader (you can also try it online (TIO)):
proc printBounds(r) {
  writeln("For range ", r, ":");
  writeln(" first = ", r.first);
  writeln(" last = ", r.last);
  writeln(" low = ", r.low);
  writeln(" high = ", r.high);
  writeln();
}
printBounds(1..10);
printBounds(1..10 by -1);
printBounds(1..10 by 2);
printBounds(1..10 by -2);
Dense rectangular domains are defined using a range per dimension. Queries like low, high, first, and last on such domains return a tuple of values, one per dimension, corresponding to the results of the queries on the respective ranges. As an example, here's a 4D domain defined in terms of the ranges above (TIO):
const D = {1..10, 1..10 by -1, 1..10 by 2, 1..10 by -2};
writeln("low = ", D.low);
writeln("high = ", D.high);
writeln("first = ", D.first);
writeln("last = ", D.last);

Google scripts if statement with conditional formatting on last row only

I am trying to use a Google script to set a cell in the last row only of the 2nd column in a Google sheet to green color if it is:
1. <0, and
2. not equal to #N/A
Partial preferred approach
I have the following if statement (without using a loop):
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Sheet1');
lastrow = sheet.getLastRow()
if (sheet.getRange(lastrow, 2, 1, 1) >0.00 && sheet.getRange(lastrow, 2, 1, 1) !='#N/A') {
  sheet.getRange(lastrow, 2, 1, 1).setFontColor('green');
}
However, this is not working. It is simply not assigning the color green to the font.
Not the preferred approach
I could do this using a loop, based on this answer, and loop over all rows in the column one at a time:
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Sheet1');
lastrow = sheet.getLastRow()
var oValues = sheet.getRange(2, 2, lastrow, 1).getValues();
for (var i = 0; i < oValues.length; i++) {
  if (oValues[i] >0.00) {
    sheet.getRange(i, 2, 1, 1).setFontColor('green');
  }
}
However, the disadvantage is that this approach is formatting all the rows in the column. I only need to format the last row.
Is there a way to avoid looping over all rows and just check if the last row meets 1. and 2. from above?
How about this answer?
Modification points :
In your script, sheet.getRange(lastrow, 2, 1, 1) > 0.00 and sheet.getRange(lastrow, 2, 1, 1) !='#N/A' compare the Range object itself, not the value of the cell. To compare the value of a cell, use sheet.getRange(lastrow, 2, 1, 1).getValue().
The condition "<0 and not equal to #N/A" can be written as if (value < 0 && value) {}.
The cell in the last row of the 2nd column can be retrieved with sheet.getRange(sheet.getLastRow(), 2).
In your case, you can also use sheet.getRange(sheet.getLastRow(), 2, 1, 1).
The modified script reflecting these points is as follows.
Modified script :
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Sheet1');
var range = sheet.getRange(sheet.getLastRow(), 2); // Last row of 2nd column
var value = range.getValue(); // Value of last row of 2nd column
if (value < 0 && value) { // <0 and not equal to #N/A
  // range.setBackground("green"); // This line sets the background color of the cell.
  range.setFontColor("green"); // This line sets the font color of the cell.
}
If I have misunderstood your question, please tell me and I will modify the answer.

DEAP toolbox: to consider different types and ranges of genes in mutation and crossover operators

I am working on a genetic algorithm implementation and I'm using DEAP toolbox.
I've written code that initializes chromosomes whose first gene is a float in the range [0.01, 2048], whose second gene is a float in the range [0.0001, 10], and whose last three genes are boolean. This is my code:
toolbox.register("attr_flt1", random.uniform, 0.01, 2048)
toolbox.register("attr_flt2", random.uniform, 0.0001, 10)
toolbox.register("attr_bool", random.randint, 0, 1)
toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.attr_flt1, toolbox.attr_flt2, toolbox.attr_bool, toolbox.attr_bool, toolbox.attr_bool),
                 n=1)
Here is a sample of the created population:
[1817.2852738610263, 6.184224906600851, 0, 0, 1], [1145.7253307024512, 8.618185266721435, 1, 0, 1], ...
Now I want to apply mutation and crossover to my chromosomes while respecting the differences in gene types and ranges.
Currently I get an error because, after applying the crossover and mutation operators, a value of 0 is produced for the first gene of a chromosome, which is invalid for my evaluation function.
Can anyone help me code selection, mutation, and crossover with the DEAP toolbox so that the new population stays within the ranges defined above?
If you use the mutation operator mutPolynomialBounded (documented here), then you can specify the interval for each gene.
With the bounds you indicated, perhaps using something such as
from deap import tools
eta = 0.5  # crowding degree: a higher eta keeps the mutant closer to its parent
indpb = 0.1  # independent probability of each gene being mutated
low = [0.01, 0.0001, 0, 0, 0]  # lower bound for each gene
up = [2048, 10, 1, 1, 1]  # upper bound for each gene
# Pass the function itself; register() binds the keyword arguments for later calls
toolbox.register('mutate', tools.mutPolynomialBounded, eta=eta, low=low, up=up, indpb=indpb)
as a mutation function will solve your error.
This way, first gene is in the interval [0.01, 2048], the second gene is in the interval [0.0001, 10] and the last three genes are in the interval [0, 1].
If you also want the last three genes to be either 0 or 1 (but not a float in between), then you might have to implement your own mutation function. For instance, the following function will select random values for each gene using your requirements
def mutRandom(individual, indpb):
    # With probability indpb, redraw every gene from its own generator
    if random.random() < indpb:
        individual[0] = toolbox.attr_flt1()
        individual[1] = toolbox.attr_flt2()
        for i in range(2, 5):
            individual[i] = toolbox.attr_bool()
    return individual,
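If redrawing genes from scratch is too disruptive, a type-aware mutation can instead perturb each gene in place. The following is a minimal sketch, not a DEAP built-in: mut_mixed and rel_sigma are hypothetical names, and the step size is an arbitrary assumption. Float genes take a Gaussian step clipped to the bounds from the question, and boolean genes are flipped.

```python
import random

# Bounds taken from the question; genes 2..4 are 0/1 flags.
LOW = [0.01, 0.0001]
UP = [2048.0, 10.0]

def mut_mixed(individual, indpb, rel_sigma=0.1):
    """Hypothetical mutation: clipped Gaussian step for floats, bit flip for booleans."""
    for i in range(2):
        if random.random() < indpb:
            # Step size is a fraction of the gene's range (an assumption, tune as needed).
            sigma = rel_sigma * (UP[i] - LOW[i])
            value = individual[i] + random.gauss(0.0, sigma)
            # Clip so the gene never leaves [LOW[i], UP[i]].
            individual[i] = min(max(value, LOW[i]), UP[i])
    for i in range(2, 5):
        if random.random() < indpb:
            individual[i] = 1 - individual[i]  # flip 0 <-> 1
    return individual,  # DEAP mutation operators return a one-element tuple

random.seed(0)
child, = mut_mixed([1817.28, 6.18, 0, 0, 1], indpb=0.5)
print(child)
```

It could then be registered in the same way as the other operators, for example toolbox.register('mutate', mut_mixed, indpb=0.1).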

Bijective mapping of integers

English is not my native language: sorry for my mistakes. Thank you in advance for your answers.
I'm learning C++ and I'm trying to check to what extent two sets with the same number of integers--in whatever order--are bijective.
Example :
int ArrayA [4] = { 0, 0, 3, 4 };
int ArrayB [4] = { 4, 0, 0, 3 };
ArrayA and ArrayB are bijective.
My implementation is naive.
int i, x=0;
std::sort(std::begin(ArrayA), std::end(ArrayA));
std::sort(std::begin(ArrayB), std::end(ArrayB));
for (i=0; i<4; i++) if (ArrayA[i]!=ArrayB[i]) x++;
If x == 0, then the two sets are bijective. Easy.
My problem is the following: I would like to count the number of bijections between the sets, and not only the whole property of the relationship between ArrayA and ArrayB.
Example :
int ArrayA [4] = { 0, 0, 0, 1 };
int ArrayB [4] = { 3, 1, 3, 0 };
Are the sets bijective as a whole? No. But there are 2 bijections (0 and 0, 1 and 1).
With my code, the output would be 1 bijection. Indeed, if we sort the arrays, we get:
ArrayA = 0, 0, 0, 1;
ArrayB = 0, 1, 3, 3.
A side-by-side comparison shows only a bijection between 0 and 0.
Then, my question is:
Do you know a method to map elements between two equally-sized sets and count the number of bijections, whatever the order of the integers?
Solved!
The answer given by Ivaylo Strandjev works:
Sort the sets,
Use the std::set_intersection function,
Profit.
You need to count the number of elements that are contained in both sets. This is called set intersection, and it can be done with a standard function: std::set_intersection, declared in the <algorithm> header. Keep in mind that you still need to sort the two arrays first.
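To make the sort-then-intersect idea concrete, here is a language-neutral sketch written in Python rather than C++. It walks two sorted lists with two indices and counts matching pairs, so duplicates are matched one-to-one (a value occurring m times in one list and n times in the other contributes min(m, n) matches, as std::set_intersection does). The function name count_common is hypothetical.

```python
def count_common(a, b):
    """Count elements shared by two lists, matching duplicates one-to-one."""
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1          # a[i] has no partner in b, skip it
        elif b[j] < a[i]:
            j += 1          # b[j] has no partner in a, skip it
        else:
            matches += 1    # equal elements: count the pair and advance both
            i += 1
            j += 1
    return matches

print(count_common([0, 0, 0, 1], [3, 1, 3, 0]))  # 2
print(count_common([0, 0, 3, 4], [4, 0, 0, 3]))  # 4
```

The first call reproduces the count of 2 from the question's second example; the second shows that fully matching arrays give a count equal to their length.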