I want to count the number of "No" values in these ranges, F:R, BC:BN and CX:DI, with an array formula, so that if anyone submits a new response containing "No" in these ranges, it gets counted.
I tried using this formula:
=ARRAYFORMULA(IF(ROW(E:E)=1,"NC",IF(LEN(E:E), IF(IFERROR(REGEXEXTRACT(TRANSPOSE(QUERY(TRANSPOSE(COUNTIFS(OR(DV:EG="No",BW:CH="No",U:AG="No"))),, 999^99)), "♦"))="♦", 1, 0), )))
But it didn't work. I also tried this formula:
=ARRAYFORMULA(IF(ROW(A:A)=1,"NC",IF(LEN(A:A)=0,IFERROR(1/0),COUNTIFS(F:R,"No")+COUNTIFS(BC:BN,"No")+COUNTIFS(CX:DI,"No"))))
But it counted all the values in the whole range instead of per row.
I need it to count the "No" values row by row, so that at the end of every row, under NC, it shows the number of "No" values in the ranges F:R, BC:BN and CX:DI.
Here is a spreadsheet containing the data:
https://docs.google.com/spreadsheets/d/1SksZv0h82j5oEZBj2AN5anDFr80AYNR5ettSwkpUKys/edit#gid=0
=ARRAYFORMULA({"NC"; IF(LEN(A2:A),
MMULT(IFERROR(LEN(REGEXEXTRACT({F2:R,BC2:BN,CX2:DI}, "No"))/
LEN(REGEXEXTRACT({F2:R,BC2:BN,CX2:DI}, "No")), 0),
TRANSPOSE(COLUMN(A1:AK1)^0)), )})
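How the formula works: REGEXEXTRACT(..., "No") succeeds for any cell containing the text "No", LEN(x)/LEN(x) turns a successful match into 1, and IFERROR turns the #N/A from a failed match (or a blank cell) into 0. MMULT then multiplies that 0/1 grid by a column of 37 ones (COLUMN(A1:AK1)^0, one per selected column: 13 + 12 + 12), which sums each row. The same row-by-row logic can be sketched in Python as below. This is only an illustration: the helper names col_index and count_no_per_row and the list-of-lists row layout are hypothetical, and this version counts cells exactly equal to "No" (like COUNTIFS) rather than cells that merely contain "No" (the regex would also count "None").

```python
def col_index(letters: str) -> int:
    """Convert a spreadsheet column label such as "F" or "CX" to a 1-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


# The three ranges from the question, as inclusive (first, last) column labels.
RANGES = [("F", "R"), ("BC", "BN"), ("CX", "DI")]


def count_no_per_row(rows, ranges=RANGES, target="No"):
    """Return one count per row: how many cells in the given ranges equal target."""
    # Convert each inclusive label range to 0-based Python slice bounds.
    spans = [(col_index(first) - 1, col_index(last)) for first, last in ranges]
    counts = []
    for row in rows:
        # Blank rows (empty column A) get no count, like IF(LEN(A2:A), ..., ).
        if not row or row[0] == "":
            counts.append(None)
            continue
        counts.append(sum(cell == target
                          for start, stop in spans
                          for cell in row[start:stop]))
    return counts
```

Each row is assumed to be a list whose index 0 is column A, so row[5:18] covers F through R.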
I am working with a matrix in Python; let's call it X.
I know how to get the dimensions of the matrix using X.shape, but I am specifically interested in using the number of rows of the matrix in a for loop, and I don't know how to get this value in a data type suitable for a loop.
For example, imagine this simple situation:
a = np.matrix([[1,2,3],[4,5,6]])
for i in 1:(number of rows of a)
print i
How can I get that "number of rows of a" automatically?
X.shape[0] == number of rows in X
A quick search of the NumPy documentation will lead you to shape. It returns a tuple of the array's dimensions.
In your case, the first dimension (axis 0) is the number of rows and the second (axis 1) is the number of columns. You can access each one as you would any tuple element:
import numpy as np

a = np.matrix([[1, 2, 3], [4, 5, 6]])

# a.shape[1]: number of columns
for i in range(a.shape[1]):
    print(f"column {i}")

# a.shape[0]: number of rows
for i in range(a.shape[0]):
    print(f"row {i}")
This will print:
column 0
column 1
column 2
row 0
row 1
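To address the data-type part of the question directly: shape is a tuple of plain Python integers, so a.shape[0] can go straight into range(). The sketch below also shows two equivalent ways to get the row count, tuple unpacking and len(), which returns the size of the first axis. It uses np.array rather than np.matrix, since NumPy's documentation no longer recommends the matrix class, but these calls behave the same on both.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# shape is a tuple (rows, columns); unpack it into two ints
n_rows, n_cols = a.shape
print(type(n_rows))  # a plain Python int, so range() accepts it

# Loop over row indices and print each row
for i in range(n_rows):
    print("row", i, a[i])

# len() on a 2-D array gives the size of the first axis, i.e. the rows
print(len(a))
```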
Hey guys, does anyone know how to generate a number of rows based on a count value, without using a Java transformation, in Informatica 9.6 (for a flat file)? Please help me with that.
You can create an auxiliary table with n rows for each possible count value between 1 and N:
1
2
2
3
3
3
...
N
N
...
N   (N rows with the last value, N)
Join this table to the source data using the n count value as the key and you will get n copies of each source row.
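The idea can be illustrated outside Informatica with a small Python sketch: build the auxiliary table once for counts 1..N, then inner-join each source row to it on the count value, so a row with count n matches exactly n auxiliary rows. The field names (id, count) and helper names are hypothetical; in the mapping itself the join would be done with a Joiner transformation rather than Python.

```python
def build_aux_table(max_count):
    """Auxiliary table from the answer: value k appears in k rows, for k = 1..max_count."""
    return [k for k in range(1, max_count + 1) for _ in range(k)]


def replicate_rows(source_rows, aux_table):
    """Inner-join source rows to the auxiliary table on the count value."""
    out = []
    for row in source_rows:
        for value in aux_table:
            # Join condition: source.count = aux.value
            if value == row["count"]:
                # One output row per matching auxiliary row
                out.append(dict(row))
    return out


aux = build_aux_table(3)
src = [{"id": "a", "count": 2}, {"id": "b", "count": 3}, {"id": "c", "count": 1}]
result = replicate_rows(src, aux)
```

Rows whose count is 0 or larger than N find no match and drop out of an inner join, so N has to cover the largest count in the data.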
This may be a trivial question, but as an R user coming to Stata I have so far failed to find the right search terms for the answer. I want to do the following steps:
Do a bunch of tests (e.g. lrtest results in a foreach loop)
Extract the p-value from each test and save them in a list of some kind
Have a list I can do further operations on (e.g. perform multiple comparison correction)
So I am wondering how to extract p-values (or similar) from command results and how to save them into a vector-like object that I can work with. Here is some R code that does something similar:
myData <- data.frame(a=rnorm(10), b=rnorm(10), c=rnorm(10)) ## generate some data
pValue <- c()
for (variableName in c("b", "c")) {
  myModel <- lm(as.formula(paste("a ~", variableName)), data=myData) ## fit model
  pValue <- c(pValue, coef(summary(myModel))[2, "Pr(>|t|)"]) ## extract p-value and save in vector
}
pValue * 2 ## do amazing multiple comparison correction
To me it seems like Stata has much less of a 'programming' mindset to it than R. If you have any general Stata literature recommendations for an R user who can program, that would also be appreciated.
Here is an approach that saves the p-values in a matrix; you can then manipulate the matrix, for example with Mata or with Stata's standard matrix commands.
matrix storeMyP = J(2, 1, .) //create empty matrix with 2 (as many variables as we are looping over) rows, 1 column
matrix list storeMyP //look at the matrix
loc n = 0 //count the iterations
foreach variableName of varlist b c {
    loc n = `n' + 1 //each iteration, adjust the count
    reg a `variableName'
    test `variableName' //this does an F-test, but for one variable it's equivalent to a t-test (check -help test-; there is a lot this can do)
    matrix storeMyP[`n', 1] = `r(p)' //save the p-value in the matrix
}
matrix list storeMyP //look at your p-values
matrix storeMyP_2 = 2*storeMyP //replicating your example above
What's going on here is that Stata automatically stores certain quantities after estimation and test commands. When a help file says a command stores values in r(), you can refer to them wrapped in Stata's macro quotes, as in `r(p)' above.
You may also want to convert the matrix column(s) into variables with svmat storeMyP; see help svmat for more info.
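Doubling the p-values, as in the R snippet and in storeMyP_2, is a Bonferroni correction for two tests. For reference, here is a minimal Python sketch of the general form: multiply each p-value by the number of tests and cap the result at 1, which is what R's p.adjust(p, method = "bonferroni") does. The function name bonferroni is just illustrative.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a list of p-values: min(1, m * p) for m tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


# p-values collected from a loop of tests would go here
adjusted = bonferroni([0.01, 0.3])
```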
Let's say we have the following data frame in R:
df <- data.frame(sample=rnorm(1,0,1),params=I(list(list(mean=0,sd=1,dist="Normal"))))
df <- rbind(df,data.frame(sample=rgamma(1,5,5),params=I(list(list(shape=5,rate=5,dist="Gamma")))))
df <- rbind(df,data.frame(sample=rbinom(1,7,0.7),params=I(list(list(size=7,prob=0.7,dist="Binomial")))))
df <- rbind(df,data.frame(sample=rnorm(1,2,3),params=I(list(list(mean=2,sd=3,dist="Normal")))))
df <- rbind(df,data.frame(sample=rt(1,3),params=I(list(list(df=3,dist="Student-T")))))
The first column contains a random draw from a probability distribution, and the second column stores a list with its parameters and name.
The data frame df looks like:
sample params
1 0.85102972 0, 1, Normal
2 0.67313218 5, 5, Gamma
3 3.00000000 7, 0.7, ....
4 0.08488487 2, 3, Normal
5 0.95025523 3, Student-T
Q1: How can I get the list of distribution names for all records? df$params$dist does not work. For a single record it is easy, for example the third one: df$params[[3]]$dist
Q2: Is there an alternative way of storing data like this, something like a multi-dimensional data frame? I do not want to add a column for each parameter, because that would fill the data frame with missing values.
It's probably more natural to store information like this in a pure list structure, than in a data frame:
distList <- list(normal = list(sample=rnorm(1,0,1),params=list(mean=0,sd=1,dist="Normal")),
gamma = list(sample=rgamma(1,5,5),params=list(shape=5,rate=5,dist="Gamma")),
binom = list(sample=rbinom(1,7,0.7),params=list(size=7,prob=0.7,dist="Binomial")),
normal2 = list(sample=rnorm(1,2,3),params=list(mean=2,sd=3,dist="Normal")),
tdist = list(sample=rt(1,3),params=list(df=3,dist="Student-T")))
Then, to extract just the distribution name from each record, we can use sapply to loop over the list and pull out that piece:
sapply(distList,function(x) x[[2]]$dist)
normal gamma binom normal2 tdist
"Normal" "Gamma" "Binomial" "Normal" "Student-T"
If you absolutely must store this information in a data frame, one way of doing so springs to mind. You're currently using a params column in your data frame to store the parameters associated with the distributions. Perhaps a better way of doing this would be to (i) identify the maximum number of parameters that you'll need for any distribution, (ii) store the distribution names in a field called df$distribution, and (iii) store the parameters in dedicated parameter columns, the meaning of which will have to be decided upon based on the type of distribution.
For instance, any row with df$distribution = 'Normal' should have df$param1 = <mean> and df$param2 = <sd>. A row with df$distribution = 'Student' should have df$param1 = <degrees of freedom> and df$param2 = NA. Something like the following:
dg <- data.frame(sample=rnorm(1, 0, 1), distribution='Normal',
param1=0, param2=1)
dg <- rbind(dg, data.frame(sample=rgamma(1, 5, 5),
distribution='Gamma', param1=5, param2=5))
dg <- rbind(dg, data.frame(sample=rt(1, 3), distribution='Student',
param1=3, param2=NA))
It's ugly, but it will give you what you want. And don't worry about the missing values; missing values are a fact of life when dealing with non-trivial data frames. They can be dealt with easily in R by appropriate use of things like na.rm and complete.cases().
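The "fixed parameter columns" layout can be built generically: find the longest parameter list, then pad the shorter ones with a missing-value marker. The Python sketch below uses None where R would use NA; the function name to_wide is hypothetical, the column names param1, param2, ... follow the dg example, and the sample values are placeholders copied from the printed df rather than fresh random draws.

```python
def to_wide(records):
    """Turn (sample, distribution, [params...]) records into rows with param1..paramK.

    K is the largest number of parameters across all records; shorter
    parameter lists are padded with None (the analogue of R's NA).
    """
    k = max(len(params) for _, _, params in records)
    rows = []
    for sample, dist, params in records:
        row = {"sample": sample, "distribution": dist}
        for i in range(k):
            row[f"param{i + 1}"] = params[i] if i < len(params) else None
        rows.append(row)
    return rows


records = [
    (0.85102972, "Normal", [0, 1]),  # mean, sd
    (0.67313218, "Gamma", [5, 5]),   # shape, rate
    (0.95025523, "Student", [3]),    # degrees of freedom
]
wide = to_wide(records)
```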
Based on the data frame you have above,
sapply(df$params,"[[","dist")
(or lapply if you prefer) would work.
I would probably put at least the names of the distributions in their own column, even if you want the parameters to be in variable-length lists.
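For comparison, both sapply calls above do the same thing: walk over the records and pull out one named field from each. In Python terms, the distList structure would be a dict of dicts and that step becomes a dict comprehension. This is only an analogy to the R code; the sample values are placeholders copied from the printed df.

```python
# Each record holds a sample and a dict of parameters, mirroring distList in R.
dist_list = {
    "normal": {"sample": 0.85102972, "params": {"mean": 0, "sd": 1, "dist": "Normal"}},
    "gamma": {"sample": 0.67313218, "params": {"shape": 5, "rate": 5, "dist": "Gamma"}},
    "binom": {"sample": 3.0, "params": {"size": 7, "prob": 0.7, "dist": "Binomial"}},
    "normal2": {"sample": 0.08488487, "params": {"mean": 2, "sd": 3, "dist": "Normal"}},
    "tdist": {"sample": 0.95025523, "params": {"df": 3, "dist": "Student-T"}},
}

# Analogue of sapply(distList, function(x) x[[2]]$dist): one name per record
dist_names = {name: rec["params"]["dist"] for name, rec in dist_list.items()}
print(dist_names)
```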