If I have a data series and a set of constraints and want to predict the most likely values, what is the right algorithm or approach? For example, given a data table as follows:
The first three rows illustrate typical data values. Imagine we have dozens or hundreds of such rows. The constraints on the system are as follows:
G1 + G2 + G3 + G4 = D1 + D2 + D3
G1 + G2 = D1 - C1
G3 = D2 + C1 - C2
G4 = D3 + C2
So, given D1, D2, and D3, we need to predict G1, G2, G3, G4, C1, and C2. Note that there may not be enough information to solve the system by linear programming alone, so some kind of trend analysis or probability distribution might be needed.
What is the right algorithm or approach to solve a problem like this?
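For what it's worth, a minimal numpy sketch (with hypothetical values for D1, D2, and D3) shows why the constraints alone cannot pin down a unique answer: the first equation is just the sum of the other three, so there are only three independent equations for six unknowns, and any statistical approach has to choose among a three-dimensional family of feasible solutions.
import numpy as np

# Hypothetical observed values, for illustration only
D1, D2, D3 = 10.0, 8.0, 6.0

# Unknowns ordered as x = [G1, G2, G3, G4, C1, C2]
A = np.array([
    [1, 1, 1, 1,  0,  0],   # G1 + G2 + G3 + G4 = D1 + D2 + D3
    [1, 1, 0, 0,  1,  0],   # G1 + G2 = D1 - C1
    [0, 0, 1, 0, -1,  1],   # G3 = D2 + C1 - C2
    [0, 0, 0, 1,  0, -1],   # G4 = D3 + C2
], dtype=float)
b = np.array([D1 + D2 + D3, D1, D2, D3])

print(np.linalg.matrix_rank(A))            # 3, not 4: one equation is redundant
x, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimum-norm solution, one of infinitely many
print(x)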
Below are the details of Table1, which has a few columns: shop, shelf, and product. Each row records that a particular shop has a particular shelf on which a product is placed.
The table below shows a small sample of the dataset.
shop shelf product
a a1 p1xxxxx
a a2 p2xxxxx
a a3 p1xxxxx
a a4 p1xxxxx
b b1 p1xxxxx
b b2 p2xxxxx
b b3 p3xxxxx
b b4 p1xxxxx
b b5 p2xxxxx
b b6 p1xxxxx
c c1 p3xxxxx
c c2 p3xxxxx
c c3 p2xxxxx
c c5 p2xxxxx
c c6 p3xxxxx
My aim is to count the rows where a particular product, "p1", is placed.
For a small volume of data the formula below works fine, but when I run the same formula on a large volume of data it returns the total count of the product for every row.
count =
IF (
    CALCULATE (
        COUNTROWS ( Table1 ),
        SEARCH ( "*p1*", Table1[product], , 0 )
    ) = BLANK (),
    0,
    CALCULATE (
        COUNTROWS ( Table1 ),
        SEARCH ( "*p1*", Table1[product], , 0 )
    )
)
The first screenshot below is for a small volume of data, where the formula works fine.
But for a large volume of data the formula doesn't work: instead it gives the total count of that particular product on every row. The second screenshot shows this for reference.
This post is also related to another question about the same dataset.
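As an aside, a minimal pandas sketch (Python rather than DAX, with the sample rows above typed in by hand) shows the count the formula is meant to produce, which may help pin down the expected result:
import pandas as pd

# Hypothetical reconstruction of the Table1 sample shown above
table1 = pd.DataFrame({
    'shop':    ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b',
                'c', 'c', 'c', 'c', 'c'],
    'shelf':   ['a1', 'a2', 'a3', 'a4', 'b1', 'b2', 'b3', 'b4', 'b5', 'b6',
                'c1', 'c2', 'c3', 'c5', 'c6'],
    'product': ['p1xxxxx', 'p2xxxxx', 'p1xxxxx', 'p1xxxxx', 'p1xxxxx',
                'p2xxxxx', 'p3xxxxx', 'p1xxxxx', 'p2xxxxx', 'p1xxxxx',
                'p3xxxxx', 'p3xxxxx', 'p2xxxxx', 'p2xxxxx', 'p3xxxxx'],
})

# Rows whose product contains "p1", counted per shop (shop c has none)
counts = table1[table1['product'].str.contains('p1')].groupby('shop').size()
print(counts)  # a: 3, b: 3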
I have a grid feature class that varies in size and shape. My test shapefile is a 3x4 grid. I need to create an alphanumeric sequence that goes in a specific order but can be scaled for a different size grid. Below is the order the grid is in:
A4 | B4 | C4
A3 | B3 | C3
A2 | B2 | C2
A1 | B1 | C1
To use this alphanumeric sequence, the list will need to be printed in a specific order: starting from the bottom left of the table, moving to the right, and then returning to the leftmost value on the next row up:
A1, B1, C1, A2, B2, C2, A3, B3, C3, A4, B4, C4
I had this:
from itertools import product
from string import ascii_uppercase, digits

for x, y in product(ascii_uppercase, digits):
    print('{}{}'.format(x, y))
It generates a sequence like: A0 through A9, then B0 through B9, and so forth.
However, I also need larger grids, so the script will have to compensate and allow the sequence to use double digits once the grid is more than 9 rows high, i.e. A10, B10, C10.
I then tried to make two lists and combine them, but I ran into the problem of joining them in the sequence I need.
w = 3
h = 4
alpha = []
numeric = []
for letter in ascii_uppercase[:w]:
    alpha.append(letter)
for num in range(1, h+1):
    numeric.append(num)
I assume I might not need to make a numeric list, but I don't know how to do it. I know slightly more than just the basics of Python and have created some more complex scripts, but this is really puzzling me! I feel like I am so close but missing something really simple in both of my samples above. Thank you for any help you can give me!
Solved; here is what I have, for others who might need it:
from string import ascii_uppercase

w = 9
h = 20
alpha = []
numeric = []
for letter in ascii_uppercase[:w]:
    alpha.append(letter)
for num in range(1, h+1):
    numeric.append(num)

longest_num = len(str(max(numeric)))
for y in numeric:
    for x in alpha:
        print('{}{:0{}}'.format(x, y, longest_num))
I didn't need the code formatted as a table since I was going to perform a field calculation in ArcMap.
After you compute numeric, also do:
longest_num = len(str(max(numeric)))
and change your format statement to:
'{}{:0{}}'.format(x, y, longest_num)
This ensures that when you get to double digits you get the following result:
A12 | B12 | C12
A11 | B11 | C11
...
A02 | B02 | C02
A01 | B01 | C01
To actually print the grid, however, you need to change your code:
longest_num = len(str(max(numeric)))
for y in reversed(numeric):
    print(" | ".join('{}{:0{}}'.format(x, y, longest_num)
                     for x in alpha))
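For reference, a minimal sketch that builds the same bottom-left-to-top-right sequence in one pass, using itertools.product with the row number as the slow-changing key (w, h, and the padding width are taken from the question):
from itertools import product
from string import ascii_uppercase

w, h = 3, 4
width = len(str(h))  # pad numbers so e.g. A01 sorts before A10 on larger grids
labels = ['{}{:0{}}'.format(x, y, width)
          for y, x in product(range(1, h + 1), ascii_uppercase[:w])]
print(', '.join(labels))  # A1, B1, C1, A2, B2, C2, A3, B3, C3, A4, B4, C4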
I need to do something like this:
d1 == min(d2,d3)
where d is a decision variable. I need to use Pyomo. In CPLEX the solution is achieved by the function minl; how can I do this in Pyomo, or in an equivalent linear form?
I searched for a solution on Google and found that I could assert that d1 must be less than or equal to d2 and d3. But this does not fit my problem, because if d2 and d3 are both equal to 1, that only gives d1 <= 1, while I need d1 == 1.
Thanks for any replies.
When the d variables are binary variables,
d1 = min(d2,d3)
is really the same as multiplication
d1 = d2*d3
This is often linearized as
d1 <= d2
d1 <= d3
d1 >= d2+d3-1
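In Pyomo, a minimal sketch of that linearization (assuming d1, d2, and d3 are binary, as in the question) might look like this:
from pyomo.environ import ConcreteModel, Var, Binary, Constraint

m = ConcreteModel()
m.d1 = Var(domain=Binary)
m.d2 = Var(domain=Binary)
m.d3 = Var(domain=Binary)

# d1 <= d2 and d1 <= d3 force d1 to 0 when either d2 or d3 is 0
m.min_upper1 = Constraint(expr=m.d1 <= m.d2)
m.min_upper2 = Constraint(expr=m.d1 <= m.d3)
# d1 >= d2 + d3 - 1 forces d1 to 1 when both d2 and d3 are 1
m.min_lower = Constraint(expr=m.d1 >= m.d2 + m.d3 - 1)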
I am trying in Sympy to use a standard engineering method of simplifying an equation, where you know one variable is much larger or smaller than another. For instance, given the equation
C1*R2*s+C1*R2+R1+R2
and knowing that
R1 >> R2
the equation can be simplified to
C1*R2*s+C1*R2+R1.
What you'd normally do by hand is divide by R1, giving
C1*R2*s/R1+C1*R2/R1+1+R2/R1
then anyplace you see R2/R1 by itself you can set it to zero and then multiply by R1. I've not been able to figure out how this would be done in Sympy. Obviously it's easy to do the division step, but I haven't been able to figure out how to do the search-and-replace step: just using subs gives you
R1
which isn't the right answer. factor, expand, and collect don't seem to get me anywhere.
Using replace instead of subs works here.
import sympy as sp

C1, C2, R1, R2 = sp.symbols('C1, C2, R1, R2', real=True)
s = sp.symbols('s')
expr = C1*R2*s + C1*R2 + R1 + R2
print('original expression:', expr)
expr_approx = (R1 * ((expr/R1).expand().replace(R2/R1, 0))).simplify()
print('approximate expression:', expr_approx)
original expression: C1*R2*s + C1*R2 + R1 + R2
approximate expression: C1*R2*s + C1*R2 + R1
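If this comes up often, the same trick can be wrapped in a small helper (a hypothetical convenience function, not part of Sympy) that drops any term in which the small symbol appears divided by the large one:
import sympy as sp

def neglect(expr, small, large):
    # Divide by the large symbol, zero out lone small/large ratios,
    # then scale back up; mirrors the manual steps described above.
    scaled = (expr / large).expand().replace(small / large, 0)
    return (large * scaled).simplify()

C1, R1, R2 = sp.symbols('C1 R1 R2', real=True)
s = sp.symbols('s')
print(neglect(C1*R2*s + C1*R2 + R1 + R2, R2, R1))  # C1*R2*s + C1*R2 + R1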
I am working on a structural equation model (SEM) with 47 observed variables and 6 latent variables, of which 5 observed variables and one latent variable are endogenous. The data have no missing values and the sample size is 4,634.
I ran sem in Stata using the following command:
sem (I -> i1 i2 i3 i4 i5_1) ///
(N -> n1 n2 n3 n4) ///
(S -> s1 s2 s3 s4 s5 s6 s7 s8 s9) ///
(T -> t1 t2 t3 t4) ///
(SES -> se1 se2 se3 se4 se5 se6 se7 se8 se9 se10 ///
se11 se12 se13 se14 se15 se16 se17 se18 se19 se20) ///
(CS -> c1 c2 c3 c4 c5) ///
(CS <- I N S T SES)
It returned the following error message:
initial values are not feasible
Why am I receiving this message? How can I deal with this error?
I would start by looking at each measurement model separately and seeing if there are problems there, i.e.:
sem ( i1 i2 i3 i4 i5_1 <- I)
sem ( n1 n2 n3 n4 <- N)
etc.
My guess would be that the model for SES might prove to be the problem.
Edit:
Based on your comment we now know that the measurement models converge in isolation. The next step would be to check each of the measurement models to see whether they make sense: Does each of the loadings have the expected sign? Are there loadings that are unexpectedly large or small? If you see that, you need to figure out why. This just requires staring at your data and looking at graphs and correlation tables.
If there is no problem with your measurement models, then the next step would be to look at the structural part. Obviously you cannot do the same trick as with the measurement models; that is, you cannot estimate the structural part without the measurement models. The structural part contains latent variables, and it is the measurement models that define what they are. So without measurement models, the structural model is not identified.
What I would do instead is simplify your model and then add complexity until you run into problems. For example, I might start with:
sem (I -> i1 i2 i3 i4 i5_1) ///
(CS -> c1 c2 c3 c4 c5) ///
(CS <- I)
Then continue with:
sem (I -> i1 i2 i3 i4 i5_1) ///
(N -> n1 n2 n3 n4) ///
(CS -> c1 c2 c3 c4 c5) ///
(CS <- I N)
etc.
That way you can find which latent variable causes trouble. My first move would be to look at the measurement model of that variable and at its scale. By default sem "borrows" the scale of one of the observed variables by setting that variable's loading to 1. Is that variable in some sense "weird"? Similarly, I would look at the scale of your endogenous latent variable CS. If they are weird, you can constrain the loading of another variable with a more reasonable scale to 1, or you can "standardize" the latent variable by constraining its variance to 1.