I'm trying to complete something which should be quite simple but for the life of me, I can't work it out.
I'm trying to calculate the difference between 2 rows that share the same 'Scan type'.
I have attached a photo showing sample data from production. We run a scan and depending on the results of the scan, it's assigned a color.
I want to find the difference in Scan IDs between each Red scan.
Using the attached sample data, I would expect a difference of 0 for ID 3, a difference of 1 for ID 4, and a difference of 10 for ID 14.
I have (poorly) written something that works based on the maximum value from the scan id.
I have also tried following a few posts to see if I can get it to work.
var _curid = MAX(table1[scanid])
var _curclueid = MAX(table1[scanid])
var _calc = CALCULATE(SUM(table1[scanid]), FILTER(ALLSELECTED(table1[scanid]), table1[scanid]))
return if(_curid - _calc = _curid, 0, _curid - _calc)
Edit:
Forgot to mention I have checked these threads:
57699052
61464745
56703516
57710425
Try the following DAX and if it helps then accept it as the answer.
Create a calculated column that returns the ID where the colour is Red as follows:
Column = IF('Table'[Colour] = "Red", 'Table'[ID])
Create another column as following:
Column 2 =
VAR Colr = 'Table'[Colour]
VAR SCAN = 'Table'[Scan ID]
VAR Prev_ID =
    CALCULATE(MAX('Table'[Column]),
        FILTER('Table', 'Table'[Colour] = Colr && 'Table'[Scan ID] < SCAN))
RETURN
    'Table'[Column] - Prev_ID
Output:
EDIT:
If you want your first value (ID 3) to be 0, then replace the line after RETURN with the following line:
IF(ISBLANK(Prev_ID) && 'Table'[Colour] = "Red", 0, 'Table'[Column] - Prev_ID )
This will give you the following result:
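As a cross-check outside Power BI, the same "difference to the previous scan of the same colour" logic can be sketched in plain Python. The sample rows and the helper name diff_to_previous are hypothetical; only the Red scans at IDs 3, 4 and 14 are taken from the question. This is a minimal sketch, not a replacement for the DAX columns above.

```python
# Hypothetical sample rows (ID, colour); Red scans sit at IDs 3, 4 and 14,
# matching the expected results quoted in the question.
scans = [(1, "Green"), (2, "Amber"), (3, "Red"), (4, "Red"), (5, "Green"),
         (9, "Amber"), (14, "Red")]

def diff_to_previous(rows, colour):
    """Return {id: difference to the previous id with the same colour}."""
    result = {}
    previous = None
    for scan_id, row_colour in sorted(rows):
        if row_colour != colour:
            continue
        # The first matching scan has no predecessor, so its difference is 0.
        result[scan_id] = 0 if previous is None else scan_id - previous
        previous = scan_id
    return result

red_diffs = diff_to_previous(scans, "Red")
print(red_diffs)  # {3: 0, 4: 1, 14: 10}
```

The first Red scan is given 0 explicitly, which mirrors the ISBLANK(Prev_ID) branch in the edited DAX.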
Hi I have desperately been trying to work this out and have referred to several posts but am still not getting the correct answer!
I have a bunch of providers of different provider types. I calculate an average cost change for each provider (from more granular payment data). I then want to find the standard deviation of these provider-level changes within each provider type.
This is where I've got up to with the DAX. It gives the same standard deviation across all provider types rather than the required output.
group_test =
var tab1 = SUMMARIZECOLUMNS(ProvData[Provider Type],ProvData[Provider Code], "prov_avg",AVERAGEX(core_data, sum(PayData[Payment1])-sum(PayData[Payment2]))/SUM(PayData[Payment1]))
var sd_type = SELECTCOLUMNS(SUMMARIZE(tab1,[Provider Type],[Provider Code], "test", STDEVX.S(tab1,[prov_avg])), "sd_type", [test])
var tab2 = ADDCOLUMNS(tab1, "sd_type", sd_type)
return tab2
I want my final table to look like this
Provider Code   Provider type   Prov_avg   sd_type
1               a               x          sd for a
2               a               y          sd for a
3               b               z          sd for b
Thanks in advance for any help
Add a column to your table:
stdColumn =
var prov_Code = ProvData[Provider Code]
var prov_type = ProvData[Provider Type]
var stdValue = CALCULATE (STDEV.S([prov_avg]), FILTER(ProvData, prov_Code = ProvData[Provider Code] && prov_type = ProvData[Provider Type]))
return stdValue
So what we do is calculate the stdev based on the filter given on Code & Type.
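To make the intended result concrete, here is a minimal Python sketch of the grouping, assuming the goal is one sample standard deviation per provider type, repeated on every provider row of that type, as the question describes. The provider rows and values are hypothetical, and the names by_type and sd_by_type are made up for illustration.

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical provider-level averages: (provider code, provider type, prov_avg).
providers = [
    (1, "a", 0.10),
    (2, "a", 0.30),
    (3, "b", 0.05),
    (4, "b", 0.25),
    (5, "b", 0.45),
]

# Collect the provider-level averages for each provider type.
by_type = defaultdict(list)
for _code, ptype, avg in providers:
    by_type[ptype].append(avg)

# Sample standard deviation per type (needs at least two providers per type).
sd_by_type = {ptype: stdev(values) for ptype, values in by_type.items()}

# Attach the type-level value to every provider row.
result = [(code, ptype, avg, sd_by_type[ptype]) for code, ptype, avg in providers]
for row in result:
    print(row)
```

Note that the grouping key here is the provider type only. Grouping on provider code as well would leave a single value per group, for which a sample standard deviation is undefined.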
I have a single table of data named RDSLPDSL. I am trying to calculate two columns based on two measures I am creating from the table.
Count of RDSL Marker for 1 =
CALCULATE(
    COUNT('RDSLPDSL'[RDSL Marker]),
    'RDSLPDSL'[RDSL Marker] IN { 1 }
)
I am using the above code as a measure to count only the rows that have 1 in the RDSL Marker column.
RDSL % = DIVIDE([Count of RDSL Marker for 1], COUNTROWS(RDSLPDSL))
Then I created a column using the above code to divide the rows with 1 by the total number of rows in the table.
I am doing the same for another column with PDSL. It is as follows:
Count of PDSL Marker for 1 =
CALCULATE(
    COUNT('RDSLPDSL'[PDSL Marker]),
    'RDSLPDSL'[PDSL Marker] IN { 1 }
)
PDSL % = DIVIDE([Count of PDSL Marker for 1], COUNTROWS(RDSLPDSL))
But when I do this calculation, I am getting an error for circular dependency detected and not getting the final output even though the same code worked for the previous column.
I tried COUNTAX directly instead of using CALCULATE but that brings up the same error too.
I also tried using measures instead of custom column which seems to remove the error but the output is not what I expect and is incorrect.
Any help for the same would be highly appreciated.
I have a Google sheet with information on it and I am trying to automate it a bit. I need a formula which changes a cell value to Yes if there are specific strings in the column of another sheet. I have tried a couple different things using IF and importrange but it's just not working.
I have created a sample sheet to show what I am trying to do:
Test Sheet 1
Test Sheet 2
I would like column C of Sheet 1 to change to Yes if columns A and B of both sheets match and column C of Sheet 2 contains "Reloaded" or "Yes".
try:
=ARRAYFORMULA(IF(REGEXMATCH(VLOOKUP(A2:A&B2:B, {
IMPORTRANGE("1YMBUYC6JgQke-2YWs_VZx9zqlmOdhV8WYvhTpTVxBYM", "Sheet1!A2:A")&
IMPORTRANGE("1YMBUYC6JgQke-2YWs_VZx9zqlmOdhV8WYvhTpTVxBYM", "Sheet1!B2:B"),
IMPORTRANGE("1YMBUYC6JgQke-2YWs_VZx9zqlmOdhV8WYvhTpTVxBYM", "Sheet1!C2:C")}, 2, 0),
"Yes|Reloaded")=TRUE, "Yes", ))
UPDATE:
=ARRAYFORMULA(IFERROR(IF((D2:D="User Task")*(REGEXMATCH(VLOOKUP(B2:B, {
IMPORTRANGE("1YMBUYC6JgQke-2YWs_VZx9zqlmOdhV8WYvhTpTVxBYM", "Sheet1!B2:B"),
IMPORTRANGE("1YMBUYC6JgQke-2YWs_VZx9zqlmOdhV8WYvhTpTVxBYM", "Sheet1!C2:C")}, 2, 0),
"Yes|Reloaded")=TRUE), "Yes", )))
Here you go:
={
"Complete";
ARRAYFORMULA(
IF(
(IMPORTRANGE("1YMBUYC6JgQke-2YWs_VZx9zqlmOdhV8WYvhTpTVxBYM", "Sheet1!A2:A") = A2:A)
* (IMPORTRANGE("1YMBUYC6JgQke-2YWs_VZx9zqlmOdhV8WYvhTpTVxBYM", "Sheet1!B2:B") = B2:B)
* (REGEXMATCH(IMPORTRANGE("1YMBUYC6JgQke-2YWs_VZx9zqlmOdhV8WYvhTpTVxBYM", "Sheet1!C2:C"), "Reloaded|Yes")),
"Yes",
""
)
)
}
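The matching rule behind these formulas can be sketched in plain Python, which may help when checking results by hand. The rows below are hypothetical stand-ins for the two sheets, and the names status_by_key and complete_flag are made up for illustration. It is a sketch of the lookup logic only, not of IMPORTRANGE itself.

```python
import re

# Hypothetical rows standing in for the two sheets.
# sheet1 holds (column A, column B); sheet2 holds (column A, column B, column C).
sheet1 = [("Alice", "Task 1"), ("Bob", "Task 2"), ("Cara", "Task 3")]
sheet2 = [("Alice", "Task 1", "Reloaded"), ("Bob", "Task 2", "No"), ("Dan", "Task 9", "Yes")]

# Index sheet 2 by the combined A and B key, like the A&B concatenation in VLOOKUP.
status_by_key = {(a, b): c for a, b, c in sheet2}
pattern = re.compile(r"Yes|Reloaded")

def complete_flag(a, b):
    """Return "Yes" when the key exists in sheet 2 and its status matches."""
    status = status_by_key.get((a, b), "")
    return "Yes" if pattern.search(status) else ""

flags = [complete_flag(a, b) for a, b in sheet1]
print(flags)  # ['Yes', '', '']
```

Keying on the (A, B) pair looks rows up by value, as the VLOOKUP version does, rather than comparing the two ranges position by position.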
I have a dataset with 13 features and 1 label column with only two outcomes: Income <= 50k or > 50k.
I am trying to compare the distribution of values for each feature across the entire dataset with the distribution of the same feature for only the >50k cases, to see whether the distribution changes for that subset.
If I do:
filtertable = table[table[column] == criteria]
that works well to get the subset
However when used inside a function:
def comparacion(tabla, columna, criterio):
    completa = {}
    criteria = {}
    datos = tabla[tabla[columna] == criterio]  # <- here is the problem
    datos = tabla.drop(columna, axis=1)
    titulos = datos.columns
    for tit in titulos:
        completa[tit] = (tabla[tit].value_counts().astype(float)) / len(tabla[tit])
        criteria[tit] = (datos[tit].value_counts().astype(float)) / len(datos[tit])
    return completa, criteria
For some reason the filtering does not work. Any ideas what the problem could be?
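Judging only from the code as posted, one likely cause is the second assignment to datos: it calls drop on tabla, the unfiltered table, so the filtered result from the previous line is overwritten. A minimal sketch of a corrected version follows, assuming the intent is to drop the label column from the filtered rows. The miniature dataset and its column names are hypothetical.

```python
import pandas as pd

def comparacion(tabla, columna, criterio):
    """Return value shares per feature for the full table and for the filtered subset."""
    # Filter first, then drop the label column from the *filtered* frame.
    datos = tabla[tabla[columna] == criterio].drop(columna, axis=1)
    completa = {}
    criteria = {}
    for tit in datos.columns:
        # Share of each value: count divided by the number of rows.
        completa[tit] = tabla[tit].value_counts() / len(tabla)
        criteria[tit] = datos[tit].value_counts() / len(datos)
    return completa, criteria

# Hypothetical miniature dataset; the column names are made up for illustration.
tabla = pd.DataFrame({
    "education": ["HS", "BSc", "BSc", "MSc"],
    "income": ["<=50k", "<=50k", ">50k", ">50k"],
})
completa, criteria = comparacion(tabla, "income", ">50k")
print(criteria["education"])
```

With this change, criteria only contains values that occur in the >50k rows, while completa still describes the whole table.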
I am trying to change all date values in a spreadsheet's Date column where the year is earlier than 1900, to today's date, so I have a slice.
EDIT: previous lines of code:
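import datetime as dt
import pandas as pd
df = pd.read_excel(filename)  # ,usecols=['NAME','DATE','EMAIL']
# regex to remove weird characters (regex=True is required in pandas 2.0 and later)
df['DATE'] = df['DATE'].str.replace(r'[^a-zA-Z0-9\._/-]', '', regex=True)
df['DATE'] = pd.to_datetime(df['DATE'])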
sample row in dataframe: name, date, email
[u'Public, Jane Q.\xa0' u'01/01/2016\xa0' u'jqpublic#email.com\xa0']
This line of code works.
df["DATE"][df["DATE"].dt.year < 1900] = dt.datetime.today()
Then, all date values are formatted:
df["DATE"] = df["DATE"].map(lambda x: x.strftime("%m/%d/%y"))
But I get an error:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I have read the documentation and other posts, where using .loc is suggested
The following is the recommended solution:
df.loc[row_indexer,col_indexer] = value
but df["DATE"].loc[df["DATE"].dt.year < 1900] = dt.datetime.today() gives me the same error, except that the line number is actually the line number after the last line in the script.
I just don't understand what the documentation is trying to tell me as it relates to my example.
I started messing around with pulling out the slice and assigning to a separate dataframe, but then I'm going to have to bring them together again.
You are producing a view when you select df["DATE"], then apply the selector [df["DATE"].dt.year < 1900] to it and try to assign to the result.
df["DATE"][df["DATE"].dt.year < 1900] is the view that pandas is complaining about.
Fix it with loc like this:
df.loc[df.DATE.dt.year < 1900, "DATE"] = dt.datetime.today()
My thought would be that you could do
df.loc[df.DATE.dt.year < 1900, "DATE"] = dt.datetime.today()
df.loc[:, "DATE"] = df.DATE.map(lambda x: x.strftime("%m/%d/%y"))
Not at a computer so I can't test but I think that should do it.
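For anyone who wants to try the .loc approach end to end, here is a small self-contained sketch. The two-row frame is hypothetical, and DATE_TEXT is a made-up column name: writing the formatted strings to a separate column is a choice made here so that DATE keeps its datetime dtype, whereas the code above overwrites DATE.

```python
import datetime as dt
import pandas as pd

# Hypothetical data: one date before 1900 and one ordinary date.
df = pd.DataFrame({
    "NAME": ["Public, Jane Q.", "Doe, John"],
    "DATE": pd.to_datetime(["1899-12-31", "2016-01-01"]),
})

# One .loc call selects rows and column together, so the assignment
# targets df itself rather than an intermediate object.
too_old = df["DATE"].dt.year < 1900
df.loc[too_old, "DATE"] = pd.Timestamp(dt.datetime.today())

# Format into a new column so that DATE keeps its datetime dtype.
df["DATE_TEXT"] = df["DATE"].dt.strftime("%m/%d/%y")
print(df)
```

The key difference from df["DATE"][mask] = value is that the row mask and the column label go into a single indexing operation.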