I have a situation where I need to find the count of a few Boolean fields, but only when their values are true.
Input XML:
<PersonInfo>
<ArrayOfPersonInfo>
<CertAsAdultFlag>true</CertAsAdultFlag>
<DeceasedFlag>true</DeceasedFlag>
<WantedFlag>false</WantedFlag>
<CPSORFlag>true</CPSORFlag>
<ConditonalReleaseFlag>false</ConditonalReleaseFlag>
<ProbationFlag>true</ProbationFlag>
<MissingFlag>true</MissingFlag>
<ATLFlag>true</ATLFlag>
<CCWFlag>false</CCWFlag>
<VictimIDTheftFlag>true</VictimIDTheftFlag>
</ArrayOfPersonInfo>
</PersonInfo>
I need the count of these flags, counting only those whose value is 'true'.
Here is what I tried and was unsuccessful with:
<xsl:variable name="AlertCount" select="
count(
PersonInfo/ArrayOfPersonInfo[
CPSORFlag[.='true'] | CertAsAdultFlag[.='true'] |
DeceasedFlag[.='true'] | WantedFlag[.='true'] |
ConditonalReleaseFlag[.='true'] | MissingFlag[.='true'] |
ATLFlag[.='true'] | ProbationFlag[.='true'] | CCWFlag[.='true'] |
VictimIDTheftFlag[.='true'] | CHRIFlag[.='true'] |
CivilWritFlag[.='true'] | MentalPetitionFlag[.='true'] |
ProtectionOrderFlag[.='true'] | juvWantedFlag[.='true'] |
WeaponsFlag[.='true'] | WorkCardFlag[.='true']
]
)
"/>
I really need help with this from someone as I've been trying hard to get through it. Thanks in advance.
<xsl:variable name="AlertCount" select="count(PersonInfo//*[. = 'true'])" />
Here's why yours does not work:
The square brackets in your approach create a predicate over a node-set.
That node-set is the union of all the listed child nodes that fulfil their condition. A non-empty node-set evaluates to true, an empty one to false.
Consequently, your count() would always be 1 if any of the children were true, and always 0 if all of them were false.
In other words: you selected one <ArrayOfPersonInfo> node. If it fulfilled the condition (having any number of children with 'true' as their value), it was counted; otherwise it was not.
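To see the difference concretely, compare these two expressions against the sample input above, which contains seven 'true' flags (a sketch; element names taken from the question):
<!-- predicate form: counts <ArrayOfPersonInfo> elements, so this yields 1 -->
count(PersonInfo/ArrayOfPersonInfo[CertAsAdultFlag[.='true'] | WantedFlag[.='true']])
<!-- path form: counts the matching flag elements themselves, so this yields 7 -->
count(PersonInfo/ArrayOfPersonInfo/*[. = 'true'])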
After clarification in the comments ("I need to worry only about the flags I mentioned in the above XML"):
<xsl:variable name="AlertCount" select="
count(
PersonInfo//*[
self::CPSORFlag or
self::CertAsAdultFlag or
self::DeceasedFlag or
self::WantedFlag or
self::ConditonalReleaseFlag or
self::MissingFlag or
self::ATLFlag or
self::ProbationFlag or
self::CCWFlag or
self::VictimIDTheftFlag or
self::CHRIFlag or
self::CivilWritFlag or
self::MentalPetitionFlag or
self::ProtectionOrderFlag or
self::juvWantedFlag or
self::WeaponsFlag or
self::WorkCardFlag
][. = 'true']
)
" />
I have an import query (table a) and an imported Excel file (table b) with records I am trying to match up.
I am looking for a method to replicate this type of SQL in M:
SELECT a.loc_id, a.other_data, b.stk
FROM a INNER JOIN b on a.loc_id BETWEEN b.from_loc AND b.to_loc
Table A
| loc_id | other data |
-------------------------
| 34A032B1 | ... |
| 34A3Z011 | ... |
| 3DD23A41 | ... |
Table B
| stk | from_loc | to_loc |
--------------------------------
| STKA01 | 34A01 | 34A30ZZZ |
| STKA02 | 34A31 | 34A50ZZZ |
| ... | ... | ... |
Goal
| loc_id | other data | stk |
----------------------------------
| 34A032B1 | ... | STKA01 |
| 34A3Z011 | ... | STKA02 |
| 3DD23A41 | ... | STKD01 |
All of the other queries I can find along these lines use numbers, dates, or times in the BETWEEN clause, and seem to work by exploding the (from, to) range into all possible values and then filtering out the extra rows. However, I need to use string comparisons, and exploding those into all possible values would be infeasible.
Among the various solutions I could find, the closest I've come is adding a custom column on table a:
Table.SelectRows(
table_b,
(a) => Value.Compare([loc_id], table_b[from_loc]) = 1
and Value.Compare([loc_id], table_b[to_loc]) = -1
)
This does return all the columns from table_b; however, when expanding the column, the values are all null.
Your description is not very specific ("after 34A01 could be any string..."), which makes it hard to tell how your series progresses.
But maybe you can just test how a value "sorts" using Power Query's native string comparison.
Add a custom column with Table.SelectRows:
= try Table.SelectRows(TableB, (t)=> t[from_loc]<=[loc_id] and t[to_loc] >= [loc_id])[stk]{0} otherwise null
To reproduce with your examples:
let
TableB=Table.FromColumns(
{{"STKA01","STKA02"},
{"34A01","34A31"},
{"34A30ZZZ","34A50ZZZ"}},
type table[stk=text,from_loc=text,to_loc=text]),
TableA=Table.FromColumns(
{{"34A032B1","34A3Z011","3DD23A41"},
{"...","...","..."}},
type table[loc_id=text, #"other data"=text]),
//determine where it sorts and return the stk
#"Added Custom" = Table.AddColumn(#"TableA", "stk", each
try Table.SelectRows(TableB, (t)=> t[from_loc]<=[loc_id] and t[to_loc] >= [loc_id])[stk]{0} otherwise null)
in
#"Added Custom"
Note: if the above algorithm is too slow, there may be faster methods of obtaining these results
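If performance does become an issue, one common speed-up worth trying (an assumption on my part, not benchmarked against your data) is to buffer TableB once with Table.Buffer, so the lookup table isn't re-evaluated for every row of TableA:
//hold TableB in memory for the row-by-row lookups
BufferedB = Table.Buffer(TableB),
#"Added Custom" = Table.AddColumn(TableA, "stk", each
    try Table.SelectRows(BufferedB, (t)=> t[from_loc] <= [loc_id] and t[to_loc] >= [loc_id])[stk]{0} otherwise null)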
I am trying to use Django's StrIndex to find all rows whose value is a substring of a given string.
Eg:
my table contains:
+----------+------------------+
| user | domain |
+----------+------------------+
| spam1 | spam.com |
| badguy+ | |
| | protonmail.com |
| spammer | |
| | spamdomain.co.uk |
+----------+------------------+
but the query
SpamWord.objects.annotate(idx=StrIndex(models.Value('xxxx'), 'user')).filter(models.Q(idx__gt=0) | models.Q(domain='spamdomain.co.uk')).first()
matches <SpamWord: *#protonmail.com>
The generated query is: SELECT `spamwords`.`id`, `spamwords`.`user`, `spamwords`.`domain`, INSTR('xxxx', `spamwords`.`user`) AS `idx` FROM `spamwords` WHERE (INSTR('xxxx', `spamwords`.`user`) > 0 OR `spamwords`.`domain` = 'spamdomain.co.uk')
It should be <SpamWord: *#spamdomain.co.uk>
This is happening because
INSTR('xxxx', '') => 1
(and also INSTR('xxxxasd', 'xxxx') => 1, which is correct)
How can I write this query in order to get entry #5 (spamdomain.co.uk)?
The order of the parameters of StrIndex [Django-doc] is swapped. The first parameter is the haystack, the string in which you search, and the second one is the needle, the substring you are looking for.
You thus can annotate with:
from django.db.models import Q, Value
from django.db.models.functions import StrIndex
SpamWord.objects.annotate(
idx=StrIndex('user', Value('xxxx'))
).filter(
Q(idx__gt=0) | Q(domain='spamdomain.co.uk')
).first()
Alternatively, just filter out the rows where user is empty:
(~models.Q(user='') & models.Q(idx__gt=0)) | models.Q(domain='spamdomain.co.uk')
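Putting the empty-user guard together with the original annotation gives a complete queryset (a sketch, assuming the model shown above):
from django.db.models import Q, Value
from django.db.models.functions import StrIndex

SpamWord.objects.annotate(
    idx=StrIndex(Value('xxxx'), 'user')  # haystack is the literal 'xxxx'; the needle comes from the user column
).filter(
    (~Q(user='') & Q(idx__gt=0)) | Q(domain='spamdomain.co.uk')
).first()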
I have grouped elements using this Muenchian grouping expression:
count(. | key('products-by-category', CodiceAttivita)[1]) = 1
Now I need to check the number of results in each group, and if it is more than 1, show a block of elements.
I thought of doing something like this:
<xsl:if test=" [count(. | key('products-by-category', CodiceAttivita)[1]) = 1] > 1">
But it doesn't work.
How can I fix it?
Thank you
Part of the XML is:
<Riepilogo>
<IVA>
<AliquotaIVA>4.00</AliquotaIVA>
<Imposta>5830.98</Imposta>
</IVA>
<Ammontare>145879.00</Ammontare>
<ImportoParziale>145774.50</ImportoParziale>
<TotaleAmmontareResi>0.00</TotaleAmmontareResi>
<CodiceAttivita>253000</CodiceAttivita>
</Riepilogo>
<Riepilogo>
<IVA>
<AliquotaIVA>10.00</AliquotaIVA>
<Imposta>645.66</Imposta>
</IVA>
<Ammontare>6587.00</Ammontare>
<ImportoParziale>6456.60</ImportoParziale>
<CodiceAttivita>433100</CodiceAttivita>
</Riepilogo>
<Riepilogo>
<IVA>
<AliquotaIVA>22.00</AliquotaIVA>
<Imposta>618.34</Imposta>
</IVA>
<Ammontare>3254.85</Ammontare>
<ImportoParziale>2810.65</ImportoParziale>
<CodiceAttivita>253000</CodiceAttivita>
</Riepilogo>
What I need is to group by CodiceAttivita and handle the case where more than one <Riepilogo> carries the same CodiceAttivita value.
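A hedged sketch of the usual fix, assuming the key is declared as <xsl:key name="products-by-category" match="Riepilogo" use="CodiceAttivita"/>: the Muenchian expression only selects one representative per group, so to test a group's size you count the full node-set returned by key() rather than wrapping the grouping test in brackets:
<xsl:key name="products-by-category" match="Riepilogo" use="CodiceAttivita"/>

<xsl:for-each select="Riepilogo[count(. | key('products-by-category', CodiceAttivita)[1]) = 1]">
  <!-- we are positioned on one representative per group; count all rows sharing this CodiceAttivita -->
  <xsl:if test="count(key('products-by-category', CodiceAttivita)) &gt; 1">
    <!-- block shown only when the group contains more than one Riepilogo -->
  </xsl:if>
</xsl:for-each>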
I have a table with numeric values and blank records. I'm trying to count the rows that are not blank and greater than 20.
+--------+
| VALUES |
+--------+
| 2 |
| 0 |
| 13 |
| 40 |
| |
| 1 |
| 200 |
| 4 |
| 135 |
| |
| 35 |
+--------+
I've tried different options but keep getting the following error: "Cannot convert value '' of type Text to type Number". I understand that the blank cells are treated as text, and thus my filter (> 20) doesn't work. Converting blanks to "0" is not an option, as I need to use the same values later to calculate the average and median.
CALCULATE(
COUNTROWS(Table3),
VALUE(Table3[VALUES]) > 20
)
Or this one, which returns "10" as a result:
=CALCULATE(
COUNTROWS(ALLNOBLANKROW(Table3[VALUES])),
VALUE(Table3[VALUES]) > 20
)
The final result in the example table should be: 4
Would be grateful for any help!
First, the VALUE function expects a string. It converts strings like "123" into the integer 123, so let's not use it here.
The easiest approach is with an iterator function like COUNTX.
CountNonBlank = COUNTX(Table3, IF(Table3[Values] > 20, 1, BLANK()))
Note that we don't need a separate case for BLANK() (null) here since BLANK() > 20 evaluates as False.
There are tons of other ways to do this. Another iterator solution would be:
CountNonBlank = COUNTROWS(FILTER(Table3, Table3[Values] > 20))
You can use the same FILTER inside of a CALCULATE, but that's a bit less elegant.
CountNonBlank = CALCULATE(COUNT(Table3[Values]), FILTER(Table3, Table3[Values] > 20))
Edit
I don't recommend the CALCULATE version. If you have more columns with more conditions, just add them to your FILTER. E.g.
CountNonBlank =
COUNTROWS(
FILTER(Table3,
Table3[Values] > 20
&& Table3[Text] = "xyz"
&& Table3[Number] <> 0
&& Table3[Date] <= DATE(2018, 12, 31)
)
)
You can also express OR logic with || in place of the && used for AND.
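Since the question mentions computing the average and median on the same column later, it may help to know (a hedged aside, not part of the original answer) that DAX aggregation functions skip blanks natively, so no conversion of blanks to 0 is needed:
AvgOver20 = AVERAGEX(FILTER(Table3, Table3[Values] > 20), Table3[Values])
MedianAll = MEDIAN(Table3[Values])  // blank rows are ignored, not treated as 0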
Not sure why I'm having a difficult time with this; it seems so simple considering it's fairly easy to do in R or pandas. I wanted to avoid using pandas, though, since I'm dealing with a lot of data, and I believe toPandas() loads all the data into the driver's memory in pyspark.
I have 2 dataframes: df1 and df2. I want to filter df1 (removing all rows) where df1.userid = df2.userid AND df1.group = df2.group. I wasn't sure if I should use filter(), join(), or sql. For example:
df1:
+------+----------+--------------------+
|userid| group | all_picks |
+------+----------+--------------------+
| 348| 2|[225, 2235, 2225] |
| 567| 1|[1110, 1150] |
| 595| 1|[1150, 1150, 1150] |
| 580| 2|[2240, 2225] |
| 448| 1|[1130] |
+------+----------+--------------------+
df2:
+------+----------+---------+
|userid| group | pick |
+------+----------+---------+
| 348| 2| 2270|
| 595| 1| 2125|
+------+----------+---------+
Result I want:
+------+----------+--------------------+
|userid| group | all_picks |
+------+----------+--------------------+
| 567| 1|[1110, 1150] |
| 580| 2|[2240, 2225] |
| 448| 1|[1130] |
+------+----------+--------------------+
EDIT:
I've tried many join() and filter() combinations; I believe the closest I got was:
cond = [df1.userid == df2.userid, df2.group == df2.group]
df1.join(df2, cond, 'left_outer').select(df1.userid, df1.group, df1.all_picks) # Result has 7 rows
I tried a bunch of different join types, and I also tried different cond values:
cond = ((df1.userid == df2.userid) & (df2.group == df2.group)) # result has 7 rows
cond = ((df1.userid != df2.userid) & (df2.group != df2.group)) # result has 2 rows
However, it seems like the joins are adding additional rows rather than removing them.
I'm using python 2.7 and spark 2.1.0
Left anti join is what you're looking for:
df1.join(df2, ["userid", "group"], "leftanti")
but the same thing can be done with left outer join:
(df1
.join(df2, ["userid", "group"], "leftouter")
.where(df2["pick"].isNull())
.drop(df2["pick"]))