I have group elements using this expression:
count(. | key('products-by-category', CodiceAttivita)[1]) = 1
Now I need to compare the number of results and, if it is more than 1, show a block of elements.
I thought of doing something like this:
<xsl:if test=" [count(. | key('products-by-category', CodiceAttivita)[1]) = 1] > 1">
But it doesn't work.
How can I fix it?
Thank you
Part of xml is
<Riepilogo>
  <IVA>
    <AliquotaIVA>4.00</AliquotaIVA>
    <Imposta>5830.98</Imposta>
  </IVA>
  <Ammontare>145879.00</Ammontare>
  <ImportoParziale>145774.50</ImportoParziale>
  <TotaleAmmontareResi>0.00</TotaleAmmontareResi>
  <CodiceAttivita>253000</CodiceAttivita>
</Riepilogo>
<Riepilogo>
  <IVA>
    <AliquotaIVA>10.00</AliquotaIVA>
    <Imposta>645.66</Imposta>
  </IVA>
  <Ammontare>6587.00</Ammontare>
  <ImportoParziale>6456.60</ImportoParziale>
  <CodiceAttivita>433100</CodiceAttivita>
</Riepilogo>
<Riepilogo>
  <IVA>
    <AliquotaIVA>22.00</AliquotaIVA>
    <Imposta>618.34</Imposta>
  </IVA>
  <Ammontare>3254.85</Ammontare>
  <ImportoParziale>2810.65</ImportoParziale>
  <CodiceAttivita>253000</CodiceAttivita>
</Riepilogo>
What I need is to group by CodiceAttivita and detect the case where several elements share the same CodiceAttivita value.
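To make the intended check concrete, here is a minimal sketch in Python rather than XSLT. It counts how many Riepilogo elements share each CodiceAttivita and lists the codes that occur more than once. The `<Root>` wrapper and the variable names are assumptions added for illustration, and the sample is trimmed to the one field that matters. In XSLT 1.0 the corresponding test would presumably be `count(key('products-by-category', CodiceAttivita)) > 1`, assuming the key is declared with `use="CodiceAttivita"`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Sample wrapped in a hypothetical <Root> element so it parses as one document.
XML = """<Root>
  <Riepilogo><CodiceAttivita>253000</CodiceAttivita></Riepilogo>
  <Riepilogo><CodiceAttivita>433100</CodiceAttivita></Riepilogo>
  <Riepilogo><CodiceAttivita>253000</CodiceAttivita></Riepilogo>
</Root>"""

root = ET.fromstring(XML)

# Size of each group, keyed by the CodiceAttivita value.
group_sizes = Counter(r.findtext("CodiceAttivita") for r in root.iter("Riepilogo"))

# Codes shared by more than one Riepilogo: the case that should trigger the extra block.
shared_codes = [code for code, n in group_sizes.items() if n > 1]

print(group_sizes)
print(shared_codes)
```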
I am trying to find number of unique occurrences of some text within one xml tag and display the same in one of the columns in a csv file.
My Xml file looks somewhat like this:
<start>
<tag1> ..... </tag1>
<main>
<number> 685 </number>
<text> hi..some text...[]para01 |...</text>
</main>
<main>
<number> 67 </number>
<text> hi..some text...[]para01 |..</text>
</main>
<main>
<number> 75 </number>
<text> hi..some text...[]para02 |...</text>
</main>
<main> .......
I want to find the number of times each text after bracket is occurring (within each main tag), example para01 is seen 2 times, para02 1 time and so on.
What I tried:
tree = ET.parse(file)
root=tree.getroot()
with open(csvfile, 'a') as f:
    writer=csv.writer(f, delimiter=', ')
    writer.writerow(['number', 'para', 'count'])
    lis = []
    for child in root.findall('main'):
        num = child.find('number').text
        para = re.findall(r"\[] (.*?)\| ", child.find('text').text)
        lis.append("" .join(para))
        res = dict((i, lis.count(i)) for i in lis)
        for key, value in res.items():
            r.append([key, value])
        r = [num, para, [key, value])
        writer.writerow(r)
However I seem to be getting result very different from what I want:
number para. Count
685 para01 para01:2
67 para01 para01:2
75 para02 para02:1
What can I change in my code to get the above output?
Putting aside the csv aspect, to get to your target rows, try something like this:
trgt_txts = root.findall(".//text")
trgt_nums = root.findall(".//number")
txts = []
nums = []
for txt in trgt_txts:
    target = txt.text.split(']')[1].split(' |')[0]
    txts.append(target)
for n in trgt_nums:
    nums.append(n.text.strip())
for par in set(txts):
    for p, n in zip(txts, nums):
        if p == par:
            print(n, par, par + ':' + str(txts.count(par)))
Output:
75 para02 para02:1
685 para01 para01:2
67 para01 para01:2
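If document order and the CSV part matter, a variant with collections.Counter may be easier to follow. This is only a sketch under a few assumptions: the sample XML is inlined, an in-memory buffer stands in for the real CSV file, and the names `pairs`, `counts` and `rows` are made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

XML = """<start>
  <main><number> 685 </number><text> hi..some text...[]para01 |...</text></main>
  <main><number> 67 </number><text> hi..some text...[]para01 |..</text></main>
  <main><number> 75 </number><text> hi..some text...[]para02 |...</text></main>
</start>"""

root = ET.fromstring(XML)

# First pass: collect (number, para) pairs in document order.
pairs = []
for main in root.findall("main"):
    number = main.findtext("number").strip()
    para = main.findtext("text").split("]")[1].split("|")[0].strip()
    pairs.append((number, para))

# Second pass: count each para once, then build one row per <main>.
counts = Counter(para for _, para in pairs)
rows = [[number, para, f"{para}:{counts[para]}"] for number, para in pairs]

buffer = io.StringIO()  # stand-in for the real CSV file
writer = csv.writer(buffer)
writer.writerow(["number", "para", "count"])
writer.writerows(rows)
print(buffer.getvalue())
```

Counting once with Counter avoids calling list.count() inside the loop, and the rows come out in the same order as the main elements.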
To populate missing data with a fixed range of values
I would like to check how to populate column aktype with a range of values (the range of values for the same pidlink are always fixed at 11 types of values listed below) for those cells with missing values. I have about 17,000+ observations that are missing.
The range of values are as follows:
A
B
C
D
E
G
H
I
J
K
L
I have tried the following command but it does not work:-
foreach x of varlist aktype=1/11 {
replace aktype = "A" in 1 if aktype==""
replace aktype = "B" in 2 if aktype==""
replace aktype = "C" in 3 if aktype==""
replace aktype = "D" in 4 if aktype==""
replace aktype = "E" in 5 if aktype==""
replace aktype = "G" in 6 if aktype==""
replace aktype = "H" in 7 if aktype==""
replace aktype = "I" in 8 if aktype==""
replace aktype = "J" in 9 if aktype==""
replace aktype = "K" in 10 if aktype==""
replace aktype = "L" in 11 if aktype==""
}
Would appreciate it if you could advise on the right command to use. Many thanks!
I would generate a variable AK that has letters A-K in positions 1-11 (and 12-22, and 23-33, and so on). Then replace missing values with the value of this variable AK.
* generate data
clear
set obs 20
generate aktype = ""
replace aktype = "foo" in 1/1
replace aktype = "bar" in 10/12
* generate variable with letters A-K
generate AK = char(65 + mod(_n - 1, 11))
* fill missing values
replace aktype = AK if missing(aktype)
list
This yields the following.
. list
+-------------+
| aktype AK |
|-------------|
1. | foo A |
2. | B B |
3. | C C |
4. | D D |
5. | E E |
|-------------|
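One caveat: the list in the question skips F and ends at L, whereas char(65 + mod(_n - 1, 11)) produces A through K. A lookup over the listed values avoids that mismatch. The sketch below shows the idea in Python rather than Stata, and it keeps the same assumption as the code above, namely that a row's position modulo 11 decides which value it gets. The names `AKTYPES` and `fill_missing` are invented for the example.

```python
from itertools import cycle

# The 11 values listed in the question (note that "F" is not among them).
AKTYPES = ["A", "B", "C", "D", "E", "G", "H", "I", "J", "K", "L"]


def fill_missing(values):
    """Replace empty strings with the listed value for that row's slot in the 11-cycle."""
    return [v if v != "" else fill for v, fill in zip(values, cycle(AKTYPES))]


# Same shape as the toy data above: 20 rows, "foo" in row 1, "bar" in rows 10-12.
column = ["foo"] + [""] * 8 + ["bar"] * 3 + [""] * 8
filled = fill_missing(column)
print(filled)
```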
This first addresses the comment "it does not work".
Generally, in this kind of forum you should always be specific and say exactly what happens, namely where the code breaks down and what the result is (e.g. what error message you get). If necessary, add why that is not what is wanted.
Specifically, in this case Stata would get no further than
foreach x of varlist aktype=1/11
which is illegal (as well as unclear to Stata programmers).
You can loop over a varlist. In this case looping over a single variable aktype is legal. (It is usually pointless, but that's style, not syntax.) So this is legal:
foreach x of varlist aktype
By the way, you define x as the loop argument, but never refer to it inside the loop. That isn't illegal, but it is unusual.
You can also loop over a numlist, e.g.
foreach x of numlist 1/11
although
forval x = 1/11
is a more direct way of doing that. All this follows from the syntax diagrams for the commands concerned, where whatever is not explicitly allowed is forbidden.
On occasions when you need to loop over a varlist and a numlist you will need to use different syntax, but what is best depends on the precise problem.
Now second to the question: I can't see any kind of rule in the question for which values get assigned A through L, so can't advise positively.
Observations in my data set contain the history of moves for each player. I would like to count the number of consecutive series of moves of some pre-defined length (2, 3 and more than 3 moves) in the first and the second halves of the game. The sequences cannot overlap, i.e. the sequence 1111 should be considered as a sequence of the length 4, not 2 sequences of length 2. That is, for an observation like this:
+-------+-------+-------+-------+-------+-------+-------+-------+
| Move1 | Move2 | Move3 | Move4 | Move5 | Move6 | Move7 | Move8 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| 1 | 1 | 1 | 1 | . | . | 1 | 1 |
+-------+-------+-------+-------+-------+-------+-------+-------+
…the following variables should be generated:
Number of sequences of 2 in the first half =0
Number of sequences of 2 in the second half =1
Number of sequences of 3 in the first half =0
Number of sequences of 3 in the second half =0
Number of sequences of >3 in the first half =1
Number of sequences of >3 in the second half = 0
I have two potential options of how to proceed with this task but neither of those leads to the final solution:
Option 1: Elaborating on Nick’s tactical suggestion to use strings (Stata: Maximum number of consecutive occurrences of the same value across variables), I have concatenated all “move*” variables and tried to identify the starting position of a substring:
egen test1 = concat(move*)
gen test2 = subinstr(test1,"11","X",.) // find all consecutive series of length 2
There are several problems with Option 1:
(1) it does not account for cases with overlapping sequences (“1111” is recognized as 2 sequences of 2)
(2) it shortens the resulting string test2 so that positions of X no longer correspond to the starting positions in test1
(3) it does not account for variable length of substring if I need to check for sequences of the length greater than 3.
Option 2: Create an auxiliary set of variables to identify the starting positions of the consecutive set (sets) of 1s of some fixed predefined length. Building on the earlier example, in order to count sequences of length 2, what I am trying to get is an auxiliary set of variables that equal 1 if a sequence of that length started at a given move, and zero otherwise:
+-------+-------+-------+-------+-------+-------+-------+-------+
| Move1 | Move2 | Move3 | Move4 | Move5 | Move6 | Move7 | Move8 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+-------+-------+-------+-------+-------+-------+-------+-------+
My code looks as follows but it breaks when I am trying to restart counting consecutive occurrences:
quietly forval i = 1/42 {
gen temprow`i' =.
egen rowsum = rownonmiss(seq1-seq`i') //count number of occurrences
replace temprow`i'=rowsum
mvdecode seq1-seq`i',mv(1) if rowsum==2
drop rowsum
}
Does anyone know a way of solving the task?
Assume a string variable all that concatenates all moves (the name test1 is hardly evocative).
FIRST TRY: TAKING YOUR EXAMPLE LITERALLY
From your example with 8 moves, the first half of the game is moves 1-4 and the second half moves 5-8. Thus there is for each half only one way to have >3 moves, namely that there are 4 moves. In that case each substring will be "1111" and counting reduces to testing for the one possibility:
gen count_1_4 = substr(all, 1, 4) == "1111"
gen count_2_4 = substr(all, 5, 4) == "1111"
Extending this approach, there are only two ways to have 3 moves in sequence:
gen count_1_3 = inlist(substr(all, 1, 4), "111.", ".111")
gen count_2_3 = inlist(substr(all, 5, 4), "111.", ".111")
In similar style, there can't be two instances of 2 moves in sequence in each half of the game as that would qualify as 4 moves. So, at most there is one instance of 2 moves in sequence in each half. That instance must match either of two patterns, "11." or ".11". ".11." is allowed, so either includes both. We must also exclude any false match with a sequence of 3 moves, as just mentioned.
gen count_1_2 = (strpos(substr(all, 1, 4), "11.") | strpos(substr(all, 1, 4), ".11") ) & !count_1_3
gen count_2_2 = (strpos(substr(all, 5, 4), "11.") | strpos(substr(all, 5, 4), ".11") ) & !count_2_3
The result of each strpos() evaluation will be positive if a match is found and (arg1 | arg2) will be true (1) if either argument is positive. (For Stata, non-zero is true in logical evaluations.)
That's very much tailored to your particular problem, but not much worse for that.
P.S. I didn't try hard to understand your code. You seem to be confusing subinstr() with strpos(). If you want to know positions, subinstr() cannot help.
SECOND TRY
Your last code segment implies that your example is quite misleading: if there can be 42 moves, the approach above can not be extended without pain. You need a different approach.
Let's suppose that the string variable all can be 42 characters long. I will set aside the distinction between first and second halves, which can be tackled by modifying this approach. At its simplest, just split the history into two variables, one for the first half and one for the second and repeat the approach twice.
You can clone the history by
clonevar work = all
gen length1 = .
gen length2 = .
and set up your count variables. Here count_4 will hold counts of 4 or more.
gen count_4 = 0
gen count_3 = 0
gen count_2 = 0
First we look for move sequences of length 42, ..., 2. Every time we find one, we blank it out and bump up the count.
qui forval j = 42(-1)2 {
    replace length1 = length(work)
    local pattern : di _dup(`j') "1"
    replace work = subinstr(work, "`pattern'", "", .)
    replace length2 = length(work)
    if `j' >= 4 {
        replace count_4 = count_4 + (length1 - length2) / `j'
    }
    else if `j' == 3 {
        replace count_3 = count_3 + (length1 - length2) / 3
    }
    else if `j' == 2 {
        replace count_2 = count_2 + (length1 - length2) / 2
    }
}
The important details here are
If we delete (repeated instances of) a pattern and measure the change in length, we have just deleted (change in length) / (length of pattern) instances of that pattern. So, if I look for "11" and find that the length decreased by 4, I just found two instances.
Working downwards and deleting what we found ensures that we don't find false positives, e.g. if "1111111" is deleted, we don't find later "111111", "11111", ..., "11" which are included within it.
Deletion implies that we should work on a clone in order not to destroy what is of interest.
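The same delete-and-measure idea can be sketched outside Stata. The Python function below is an illustration only: the name `count_runs` and the dictionary layout are assumptions, and the key 4 stands for "4 or more". Calling it on each half of the history reproduces the counts expected in the question's 8-move example.

```python
def count_runs(history):
    """Count non-overlapping runs of "1" of length 2, 3 and 4+ by deleting the longest first."""
    counts = {2: 0, 3: 0, 4: 0}  # key 4 stands for "4 or more"
    work = history  # strings are immutable, so the caller's value is never modified
    for j in range(len(history), 1, -1):
        before = len(work)
        work = work.replace("1" * j, "")
        # Change in length divided by pattern length = number of runs just deleted.
        counts[min(j, 4)] += (before - len(work)) // j
    return counts


history = "1111..11"  # the example row, with "." for missing moves
first_half = count_runs(history[:4])
second_half = count_runs(history[4:])
print(first_half, second_half)
```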
I need an XSLT 1.0 test expression that will indicate whether the elements of the current node t are perfectly interleaved, like this
<t>
<cat />
<dog />
<horse />
<cat />
<dog />
<horse />
</t>
or has some other order, such as
<t>
<cat />
<cat />
<dog />
<dog />
<horse />
<horse />
</t>
or
<t>
<cat />
<dog />
<cat/>
<horse/>
<cat/>
<horse />
</t>
If the first, there can be any number of such tuples. If the second, there can be any number (including zero) of each kind of child and in any order.
The special case of one cat, one dog, one horse can test true or false, whichever makes the algorithm easier.
I do know beforehand the names of the three elements.
EDIT. At Dimitre's request, let me try saying it another, maybe simpler, way.
The context node has any number of children, but each child has one of only three names. Before processing these children, I need to test whether they appear in a repeating pattern, such as A B C A B C A B C, or C A B C A B, or any other combination of repeated triplets, triplets in which each of the three appears once (A B C A B C tests true, A B B A B B tests false).
Provided that the order of the tuple is fixed, this template will return true for all cases where there are 1 or more tuples and false otherwise. Note that xsl:sequence and the gt operator require XSLT 2.0:
<xsl:template match="t">
  <xsl:sequence
    select="
      count(*) gt 2 and
      count(*) = count(*[
        self::cat and position() mod 3 = 1 or
        self::dog and position() mod 3 = 2 or
        self::horse and position() mod 3 = 0])"/>
</xsl:template>
If the order of the tuple can vary, this template will return true for all cases where there are 1 or more tuples that are ordered the same as the first instance of the tuple and false otherwise
<xsl:template match="t">
  <xsl:variable name="cat.pos" select="(count(cat[1]/preceding-sibling::*) + 1) mod 3"/>
  <xsl:variable name="dog.pos" select="(count(dog[1]/preceding-sibling::*) + 1) mod 3"/>
  <xsl:variable name="horse.pos" select="(count(horse[1]/preceding-sibling::*) + 1) mod 3"/>
  <xsl:sequence
    select="
      count(*) gt 2 and
      count(*) = count(*[
        self::cat and position() mod 3 = $cat.pos or
        self::dog and position() mod 3 = $dog.pos or
        self::horse and position() mod 3 = $horse.pos])"/>
</xsl:template>
test="name(*[last()])=name(*[3])
and name(*[1])!=name(*[2])
and name(*[2])!=name(*[3])
and name(*[1])!=name(*[3])
and not(*[position() > 3][name()!=name(preceding-sibling::*[3])])"
returns true if the interleave is perfect (or if there are only three items).
Edits: Added the first condition to ensure the final tuple is complete and the three middle conditions to ensure the repeated tuple includes each of the three items (i.e., does not include duplicates).
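For checking the logic outside a stylesheet, the conditions in that test expression can be written as a small Python function over the list of child element names. This is a rough equivalent, not a translation of the XPath, and the function name is made up for the example.

```python
def is_perfectly_interleaved(names):
    """True if the names are whole repetitions of one triplet of three distinct names."""
    # The final triplet must be complete (the XPath compares the last child with the third).
    if len(names) < 3 or len(names) % 3 != 0:
        return False
    # The first triplet must contain each of the three names exactly once.
    if len(set(names[:3])) != 3:
        return False
    # Every later child must have the same name as the child three positions before it.
    return all(name == names[i - 3] for i, name in enumerate(names) if i >= 3)


print(is_perfectly_interleaved(["cat", "dog", "horse", "cat", "dog", "horse"]))
print(is_perfectly_interleaved(["cat", "cat", "dog", "dog", "horse", "horse"]))
print(is_perfectly_interleaved(["cat", "dog", "cat", "horse", "cat", "horse"]))
```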
I have a situation where I need to count how many of several Boolean fields have the value true.
Input XML:
<PersonInfo>
<ArrayOfPersonInfo>
<CertAsAdultFlag>true</CertAsAdultFlag>
<DeceasedFlag>true</DeceasedFlag>
<WantedFlag>false</WantedFlag>
<CPSORFlag>true</CPSORFlag>
<ConditonalReleaseFlag>false</ConditonalReleaseFlag>
<ProbationFlag>true</ProbationFlag>
<MissingFlag>true</MissingFlag>
<ATLFlag>true</ATLFlag>
<CCWFlag>false</CCWFlag>
<VictimIDTheftFlag>true</VictimIDTheftFlag>
</ArrayOfPersonInfo>
</PersonInfo>
I need to find the count of these flags with the condition if they are 'true'.
Here is what I tried and was unsuccessful with:
<xsl:variable name="AlertCount" select="
count(
PersonInfo/ArrayOfPersonInfo[
CPSORFlag[.='true'] | CertAsAdultFlag[.='true'] |
DeceasedFlag[.='true'] | WantedFlag[.='true'] |
ConditonalReleaseFlag[.='true'] | MissingFlag[.='true'] |
ATLFlag[.='true'] | ProbationFlag[.='true'] | CCWFlag[.='true'] |
VictimIDTheftFlag[.='true'] | CHRIFlag[.='true'] |
CivilWritFlag[.='true'] | MentalPetitionFlag[.='true'] |
ProtectionOrderFlag[.='true'] | juvWantedFlag[.='true'] |
WeaponsFlag[.='true'] | WorkCardFlag[.='true']
]
)
"/>
I really need help with this from someone as I've been trying hard to get through it. Thanks in advance.
<xsl:variable name="AlertCount" select="count(PersonInfo//*[. = 'true'])" />
Here's why yours does not work:
The square brackets in your approach create a predicate over a node set.
That node-set was the union of all mentioned child nodes that fulfilled their condition. A non-empty node-set evaluates to true, an empty one to false.
In consequence your count() would always be 1 if any of the children were true and always be 0 if all of them were false.
In other words: You selected one <ArrayOfPersonInfo> node. If it fulfilled a condition (having any number of children with 'true' as their value) it was counted, otherwise not.
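The difference between the two selections can be illustrated outside XPath. The sketch below uses Python's ElementTree on a shortened copy of the input (three flags only, an assumption to keep it brief). The first list mimics the predicate on the parent, the second selects the matching children themselves.

```python
import xml.etree.ElementTree as ET

XML = """<PersonInfo>
  <ArrayOfPersonInfo>
    <CertAsAdultFlag>true</CertAsAdultFlag>
    <DeceasedFlag>true</DeceasedFlag>
    <WantedFlag>false</WantedFlag>
  </ArrayOfPersonInfo>
</PersonInfo>"""

root = ET.fromstring(XML)  # root is <PersonInfo>

# Like the original attempt: keep the parent if ANY child is 'true'.
parents = [p for p in root.findall("ArrayOfPersonInfo")
           if any(child.text == "true" for child in p)]

# Like the fix: select the 'true' children themselves.
flags = [child for p in root.findall("ArrayOfPersonInfo")
         for child in p if child.text == "true"]

print(len(parents), len(flags))
```

The first count can never exceed the number of ArrayOfPersonInfo elements, however many flags are true.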
After clarification in the comments ("I need to worry only about the flags I mentioned in the above XML"):
<xsl:variable name="AlertCount" select="
count(
PersonInfo//*[
self::CPSORFlag or
self::CertAsAdultFlag or
self::DeceasedFlag or
self::WantedFlag or
self::ConditonalReleaseFlag or
self::MissingFlag or
self::ATLFlag or
self::ProbationFlag or
self::CCWFlag or
self::VictimIDTheftFlag or
self::CHRIFlag or
self::CivilWritFlag or
self::MentalPetitionFlag or
self::ProtectionOrderFlag or
self::juvWantedFlag or
self::WeaponsFlag or
self::WorkCardFlag
][. = 'true']
)
" />