Maybe bug in rapidxml - but I'm not sure how to fix - c++

I noticed that rapidxml parses the illegal <<element/> as an element named <element, instead of producing an error.
I think the problem is the definition of lookup_node_name. The comment is
// Node name (anything but space \n \r \t / > ? \0)
What I understand from the w3.org specification is that a name can have letters, numbers, and a few other characters.
I'm not sure what will be a correct fix. Any suggestions?

From looking at the rapidxml code, lookup_node_name is a lookup table of valid name characters, and as the comment says, only a specific few are prohibited.
I'd try adding '<' to the list of prohibited characters by changing the lookup entry for ASCII char 0x3C from 1 to 0. That is, on the line covering chars 0x30..0x3F, change it from this...
// 0 1 2 3 4 5 6 7 8 9 A B C D E F
...
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, // 3
to this:
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, // 3
That may work for you, but I haven't tried it. I see you've tried to contact the developer via sourceforge, which is probably the best approach...
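To double-check which table entry the change touches, here is a small Python sketch that rebuilds a table of the same shape from the prohibited characters listed in the comment. The names PROHIBITED and build_table are made up for illustration; this is only a model of the 0/1 table described above, not rapidxml's code.

```python
# Hypothetical model of a 256-entry name-character table:
# 1 = allowed in a node name, 0 = prohibited.
PROHIBITED = " \n\r\t/>?\0"

def build_table(prohibited):
    """Return a list of 256 flags, indexed by character code."""
    return [0 if chr(code) in prohibited else 1 for code in range(256)]

original = build_table(PROHIBITED)
patched = build_table(PROHIBITED + "<")

# Row "3" of the table covers character codes 0x30..0x3F.
print(original[0x30:0x40])
print(patched[0x30:0x40])

# List every entry that differs between the two tables.
changed = [hex(code) for code in range(256) if original[code] != patched[code]]
print(changed)
```

If the model matches the real table, the only differing entry should be the one for '<' (0x3C), which is the column marked C in row 3.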

how to print the length of the longest sequence in a vector of 1s and 0s where adjacent values differ, and the number of such sequences c++

Let's say I have a vector v with random 1 and 0.
std::vector<int> v = {1,0,1,0,0,1,0,1};
I want to find out the max sequence with the property v[i] != v[i-1]. Basically the numbers need to be different. In this example the max sequence is 4 (1, 0, 1, 0) from position v[0] to v[3]. There is also (0,1,0,1) from position v[4] to v[7]. There are 2 max sequences so the final output should look like this:
4 2
Where 4 is the max sequence and 2 the numbers of max sequences.
Let's take another example:
std::vector<int> v2 = {1,0,1,1,1,0,1,0,1,0};
The output here should be:
6 1
The max sequence starts from v[4] to v[9]. There is only one max sequence so it will print 1 this time.
I tried to solve this using a for loop:
n - number of integers in the vector
k - length of the current sequence of differing values
maxk - length of the max sequence
many - how many max sequences there are
for(int i{1}; i < n; i++) {
    if(v[i] != v[i-1]) {
        k++;
        if(k > maxk) {
            maxk = k;
        }
    }
    else {
        if(k == maxk) {
            many++;
        }
        else {
            many = 1;
        }
        k = 1;
    }
}
But if you give it a vector like {1, 0, 0} it will not work. Can someone give me a tip on how this problem can be solved? Sorry for my bad English.
First, sequence isn't the right word. A sequence can jump past elements. You mean a subarray.
Second, you talk about arrays with 0 and 1 in them, then give an example with 2. Do you want to not count subarrays with 2? Or count them? In other words, if the input is [1, 2, 2], are you expecting an answer of 1 1 or 2 1?
That said, just make an array of where the best current subarray begins. For your first example that array would look like this:
1, 0, 1, 0, 0, 1, 0, 1
0, 0, 0, 0, 4, 4, 4, 4
And then a linear scan finds that you have a group of 4 starting at index 0, and another group of 4 starting at index 4.
For your next example,
1, 0, 1, 1, 1, 0, 1, 0, 1, 0
0, 0, 0, 3, 4, 4, 4, 4, 4, 4
And you have a group of 3 starting at index 0, 1 starting at 3, and 6 starting at 4. So we've found the 1 group of 6.
For your last example, what you'd get would depend on the answer you want.
I'll leave coding this to you.
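For reference, the "array of start indices" idea above can be sketched as follows. This is a minimal sketch in Python rather than C++ (the same linear scan ports over directly); the function name longest_alternating is made up, and it assumes any two unequal neighbours count as "different".

```python
from collections import Counter

def longest_alternating(v):
    """Return (length, count) of the longest subarrays whose neighbours differ."""
    if not v:
        return 0, 0
    # starts[i] = index where the alternating subarray containing v[i] begins.
    starts = [0] * len(v)
    for i in range(1, len(v)):
        starts[i] = starts[i - 1] if v[i] != v[i - 1] else i
    # Every distinct start index labels one group; its size is the group length.
    sizes = Counter(starts)
    best = max(sizes.values())
    return best, sum(1 for size in sizes.values() if size == best)

print(longest_alternating([1, 0, 1, 0, 0, 1, 0, 1]))        # (4, 2)
print(longest_alternating([1, 0, 1, 1, 1, 0, 1, 0, 1, 0]))  # (6, 1)
print(longest_alternating([1, 0, 0]))                       # (2, 1)
```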

Combinations in ROOT

What does the Combinations function do in ROOT/C++?
I only found this documentation
https://root.cern.ch/doc/master/namespaceROOT_1_1VecOps.html#a6d1d00c2ccb769cc48c6813dbeb132db
But I am still not sure what it does exactly.
Can someone provide an example showing how the answers in the documentation examples are computed?
Here is an example of what Combinations is doing:
Suppose you have a vector v{1., 2., 3., 4.,}
1, 2, 3, and 4 are the elements of the vector v
and 0, 1, 2, 3 are the indices of those elements.
If we write
Combinations (v, 2)
we get
{{ 0, 0, 0, 1, 1, 2} , { 1, 2, 3, 2, 3, 3}}.
That comes from looking at the different combinations of two vector elements, which are:
1, 2
1, 3
1, 4
2, 3
2, 4
3, 4
These have the corresponding indices:
0 1
0 2
0 3
1 2
1 3
2 3
Then, the left-side column makes the first vector in the answer and the right side column makes the second vector shown in the answer.
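The same index bookkeeping can be mimicked in plain Python with itertools.combinations. This is only an illustration of how the two index vectors arise, under the assumption that the pairs are produced in the order listed above; it does not call ROOT.

```python
from itertools import combinations

v = [1.0, 2.0, 3.0, 4.0]

# All pairs of distinct indices (i, j) with i < j, in lexicographic order.
index_pairs = list(combinations(range(len(v)), 2))

# Split the pairs into a "left column" and a "right column".
first, second = (list(column) for column in zip(*index_pairs))

print(first)   # [0, 0, 0, 1, 1, 2]
print(second)  # [1, 2, 3, 2, 3, 3]

# Use the indices to recover the element pairs themselves.
print([(v[i], v[j]) for i, j in zip(first, second)])
```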

finding how many times a sequence repeats in a data frame using python

Is there a way to find how many times a sequence repeats in a dataframe?
Let's say I have a dataframe with a large number of 1s and 3s and I want to see how many times the sequence [3,1,3,3,1] repeats.
Here's an example list: 3,1,3,3,1,3,3,1,3,3,1,3,1,1,1,1,3,1,3,1,1,3,3,3
Here's an example of what I'm trying to do (the section being checked is in brackets):
the first part would be true: [3,1,3,3,1],3,3,1,3,3,1,3,1,1,1,1,3,1,3,1,1,3,3,3
the second part would be false: 3,1,3,3,1,[3,3,1,3,3],1,3,1,1,1,1,3,1,3,1,1,3,3,3
and the third part would be false: 3,1,3,3,1,3,3,1,3,3,[1,3,1,1,1],1,3,1,3,1,1,3,3,3
I want to analyze one section at a time, each as long as the sequence I'm trying to find, in the order they appear in the dataframe.
My data is in a datetime format, but I can change that.
Thanks for all your help. I really appreciate everything everybody does on this site.
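import numpy as np
my_list = np.array([3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3])
target = np.array([3, 1, 3, 3, 1])
n = len(target)
# Trim the trailing partial chunk so the array reshapes evenly, then compare row by row.
(my_list[:len(my_list) // n * n].reshape(-1, n) == target).all(axis=1)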
Another option splits the list into sequential chunks and compares each chunk to the target, element by element.
from itertools import zip_longest  # called izip_longest in Python 2
my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
target = [3, 1, 3, 3, 1]
n = len(target)
>>> sum(all(a == b for a, b in zip_longest(target, my_list[(i * n):((i + 1) * n)]))
...     for i in range(len(my_list) // n))
1
Below is an alternative method that converts the integers to strings and then compares the strings.
target = ",".join(str(number) for number in target)
>>> target
'3,1,3,3,1'
>>> sum(",".join(str(number) for number in my_list[(i * n):(i * n + n)]) == target
...     for i in range(len(my_list) // n))
1
To give some more intuition on what is going on, the list is chunked five elements at a time and then those elements are joined as a string. These strings are then compared to the target string which was similarly converted, and the number of matches are then summed.
>>> [",".join(str(number) for number in my_list[(i * n):(i * n + n)])
...  for i in range(len(my_list) // n)]
['3,1,3,3,1', '3,3,1,3,3', '1,3,1,1,1', '1,3,1,3,1']
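Step 1
Convert the list of integers into a string.
Step 2
Use re.findall() to find all occurrences of target_string in my_list_string. Note that this counts non-overlapping matches at any position rather than only in consecutive chunks of five, so it prints 2 for the example list.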
import re
my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
target = [3, 1, 3, 3, 1]
my_list_string = ''.join(str(e) for e in my_list)
target_string = ''.join(str(e) for e in target)
print(len(re.findall(target_string, my_list_string)))
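The answers above count slightly different things, so it may help to see the two extremes side by side. The following is a minimal sketch in plain Python (the function names are made up): one function only checks consecutive, non-overlapping chunks as described in the question, the other checks every starting position, including overlapping matches.

```python
def count_chunk_matches(values, target):
    """Count matches among consecutive chunks of len(target) elements."""
    n = len(target)
    return sum(values[i:i + n] == target
               for i in range(0, len(values) - n + 1, n))

def count_window_matches(values, target):
    """Count matches at every starting position (overlaps allowed)."""
    n = len(target)
    return sum(values[i:i + n] == target
               for i in range(len(values) - n + 1))

my_list = [3, 1, 3, 3, 1, 3, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3]
target = [3, 1, 3, 3, 1]

print(count_chunk_matches(my_list, target))   # 1
print(count_window_matches(my_list, target))  # 3 (starting at indices 0, 3 and 6)
```

Which count is "right" depends on whether sections may start anywhere or only at multiples of the target length.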

tiered while loops with multiple outputs

I am trying to learn about arrays. I know that python has lists, not arrays, but the idea is the same. I have a list of lists setup like an array, and I am trying to modify them for random art fun, but I can only get one resulting random number out of this piece of code.
##prior list creation code making a large list of zeros called "array"###
while 3 not in array:
    i= random.randint(1,9)
    j= random.randint(1,28)
    if (i%3)!=0 and (j%7)!=0:
        if 5 in array:
            array[i][j]=3
            return array
        elif 3 in array:
            array[i][j]=4
            return array
        else:
            array[i][j]=5
            amy=(i, j)
            return array
    continue
##the resulting list called "array" does not change any zero to any number except one to "5"##
I have cut the code that made an array filled with zeros. The only number that will show up is the 5... ideally, I would have each number only show up once with each run, but in different spots
What am I doing wrong? I don't fully understand arrays, so that might be it, but I'm having trouble searching what I think the problem might be. Any help you can provide would be great!
Edit:
Sorry about forgetting, the array is the proper size to hold the data (9 rows by 28 columns), and it isn't throwing any errors or exceptions... that should have been in there before I posted.
It's hard to tell exactly what you are asking, but I think you just wanted to randomly assign the numbers 3, 4, and 5 somewhere inside your matrix with the condition that i is not divisible by 3 and j is not divisible by 7. If that's what you want, then this should do it:
import random
array = [ [ 0 for j in range(28) ] for i in range(9) ]
for n in [3, 4, 5]:
    while True:
        i = random.randint(1, 9)
        if i % 3 != 0:
            break
    while True:
        j = random.randint(1, 28)
        if j % 7 != 0:
            break
    array[i][j] = n
print('\n'.join(str(a) for a in array))
So let's talk about how to get to this answer. First off, we want a way to generate random numbers until a condition happens. In most languages, this would involve a do-while loop, but Python doesn't have those. However, we can make something that is equivalent to a do-while loop using just a while loop:
# This says to run this loop *forever*
while True:
    # do something here
    pass # This means "do nothing" in Python
    if condition:
        # This says if the previously mentioned condition
        # is True then we will stop executing the
        # currently containing loop.
        break
So this construct that I just showed is a building block which you can use to make a loop that runs until a condition is met.
Let's see how that fits in your original example. We want a random row index that is not divisible by 3. random.randint(1, 9) returns a random integer in the closed range [1, 9] (both ends included), but it doesn't guarantee that the number is not divisible by 3, so we need to enforce that constraint ourselves. One way to accomplish that is to simply generate a new number if the constraint is not met. Conveniently, rejecting multiples of 3 also rejects 9, which would be out of range for a 9-row list.
So now we can use the previously discussed loop construct to build a loop that runs until we have a number that is not divisible by 3:
while True:
    i = random.randint(1, 9)
    if i % 3 != 0:
        break
I'm sure you can do the logical replacement in the previous loop to see how it fits the other example.
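If it helps, the retry loop can be wrapped in a small helper so the same code serves both the row and the column index. This is just a sketch; the name randint_not_divisible is made up, and the loop would never finish if every value in the range were divisible by the divisor.

```python
import random

def randint_not_divisible(low, high, divisor):
    """Draw integers from [low, high] (inclusive) until one is not a multiple of divisor."""
    while True:
        value = random.randint(low, high)
        if value % divisor != 0:
            return value

i = randint_not_divisible(1, 9, 3)    # row index: one of 1, 2, 4, 5, 7, 8
j = randint_not_divisible(1, 28, 7)   # column index: 1..27, skipping multiples of 7
print(i, j)
```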
So now we can talk a little bit more about where you went wrong in your original code, and how you can prevent from making those same mistakes in the future.
Let's talk about this line first:
while 3 not in array:
First of all, when developing code, it's a very good idea to try things out at the Python interactive interpreter prompt. This is especially true when you are trying out a feature of the language for the first time. As I showed in my comment, 3 in array is always False for a list of lists, so the condition 3 not in array in your while loop is always True. The reason for that is clear if you try it out in the interpreter:
>>> 3 in [[3],[3],[3]]
False
The in operator only looks one level deep into a list. Also, it's always a good idea to start small when testing things out interactively. Notice I'm using a list of only 3 elements with nested lists containing only 1 element each instead of your original example of a 9 element list with nested lists containing 28 elements.
Now, another approach that we could have taken to make your loop condition change over time would be to make a "recursive" version of the in operator. Alternatively, we could have just hard coded it to expect a list that contains lists. I'm going to take this second approach because it is simpler, and I don't know if you are already familiar with recursion, and this is already a long explanation.
def contains2d(outer, element):
    """Expects a 2D-array-like list. Returns True if any of the inner
    sub-lists within the outer list contain element.
    """
    return any([element in inner for inner in outer])
If we try this function out, we'll see that it behaves as you originally expected the in operator to behave:
>>> contains2d([[3],[3],[3]], 3)
True
>>> contains2d([[0],[0],[3]], 3)
True
>>> contains2d([[0],[0],[0]], 3)
False
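For completeness, the "recursive" alternative mentioned earlier could look roughly like this. It is a sketch with a made-up name (contains_nested), and it assumes the nesting consists of plain Python lists.

```python
def contains_nested(container, element):
    """Return True if element appears at any depth inside nested lists."""
    for item in container:
        if item == element:
            return True
        # Descend into sub-lists; anything else is compared directly above.
        if isinstance(item, list) and contains_nested(item, element):
            return True
    return False

print(3 in [[3], [3], [3]])                   # False: `in` looks one level deep
print(contains_nested([[3], [3], [3]], 3))    # True
print(contains_nested([[0], [[0, [3]]]], 3))  # True, even when nested deeper
print(contains_nested([[0], [0], [0]], 3))    # False
```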
So now let's talk about your misunderstanding of the continue keyword. A while loop repeats automatically. You don't need to use continue inside of a while loop for it to repeat. The continue keyword is only used to skip the rest of the body of a loop. This is generally used if you have some special case for which you don't want to do the normal loop processing. Here's a reasonable example of how you might use continue:
>>> for i in range(10):
...     if i % 3 == 0:
...         continue
...     print(i)
...
1
2
4
5
7
8
Notice that it really doesn't make any sense to put continue at the end of a loop:
while True:
    print("This loop runs forever!")
    # The following continue is useless
    continue
Alright, so now that we have contains2d, we can start to think about what we want. In your example, you say you want the array variable to contain 3, 4, and 5 at the end of your loop. Again, let's start small and see if the condition is True under the desired circumstances. We know from earlier that contains2d([[0],[0],[3]], 3) == True, so that alone is an insufficient loop termination criterion. Remember, we want the condition to be true only when all three numbers are present. So that means we need to use the and operator:
>>> contains2d([[0],[0],[3]], 3) and contains2d([[0],[0],[3]], 4) and contains2d([[0],[0],[3]], 5)
False
>>> contains2d([[4],[0],[3]], 3) and contains2d([[4],[0],[3]], 4) and contains2d([[4],[0],[3]], 5)
False
>>> contains2d([[4],[5],[3]], 3) and contains2d([[4],[5],[3]], 4) and contains2d([[4],[5],[3]], 5)
True
Note that this condition is very ugly and hard to write. Ideally, we'd probably like to refactor it. One way to do that is to use the built-in all function. I'll let you experiment with it on your own, but here's the end result:
>>> all([contains2d([[4],[5],[3]], e) for e in [3,4,5]])
True
That's much shorter and much more clear. So, moving on, we want to run this loop as long as that condition is not True:
while not all([contains2d(array, e) for e in [3,4,5]]):
    # ...
The next two lines are actually just fine, but I would format them according to PEP-8:
while not all([contains2d(array, e) for e in [3,4,5]]):
    i = random.randint(1, 9)
    j = random.randint(1, 28)
    # ...
If we replace the in conditions with our contains2d function, then we're almost getting to a working solution:
while not all([contains2d(array, e) for e in [3,4,5]]):
    i = random.randint(1, 9)
    j = random.randint(1, 28)
    if i % 3 != 0 and j % 7 != 0:
        if contains2d(array, 5):
            # ...
        elif contains2d(array, 3):
            # ...
        else:
            # ...
The assignments within the if conditions are perfectly fine as well. However, a return inside of a loop exits both the loop and the entire containing function. (Outside of a function, Python rejects return with a SyntaxError, so your original loop must live inside one.) Returning right after the first assignment is one reason only the 5 ever shows up. That's not what you want to do here, so let's just drop all of those return statements:
while not all([contains2d(array, e) for e in [3,4,5]]):
    i = random.randint(1, 9)
    j = random.randint(1, 28)
    if i % 3 != 0 and j % 7 != 0:
        if contains2d(array, 5):
            array[i][j] = 3
        elif contains2d(array, 3):
            array[i][j] = 4
        else:
            array[i][j] = 5
This program is very close to being right. However, the innermost if block is using some odd logic. If you run this program using my initialization and output, you'll see something like this:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4]
[0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 5, 4]
[0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4]
[0, 4, 4, 4, 4, 4, 4, 0, 3, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4]
It actually took me a minute to figure out what was going wrong, but the key observation is that the first time through the loop for which you get a successful pair of i and j, you'll take the else branch, randomly placing a 5 in the nested list. The next successful time you get inside the if, you'll take the first branch, which randomly places a 3 inside the nested list. However, here's where things go sideways. On the next successful iteration, you'll still take the first branch, so you'll randomly place another 3 inside the nested list. You will keep doing this for several more iterations. Eventually, you will overwrite the previously written 5. At that point, the first condition will no longer hold and instead you will start taking the second branch, which randomly places a 4 in the nested list. This branch will continue to be taken since 5 is no longer present in the nested list. Eventually, you will overwrite all of the previously written 3s with 4s in the nested list. At that point, the second branch condition will no longer hold, and you will land on the else branch again, writing a 5 to the nested list. Finally, the first condition will hold again, so on the next successful iteration, you'll write a 3 again, and as long as you don't get unlucky, that will miss the 5 and you'll now have met the termination criteria for the outer while loop, leaving you with a bunch of 4s and just a single 3 and a single 5.
Maybe this is what you wanted to do, but it didn't seem that way from your question, so let's assume you really just wanted one of each of the numbers present in the nested list. If that's the case, we can easily correct the previous program. We just need to fix the conditions for each of the if cases. We only want to take the first branch when a 5 is present, but a 3 is not present. Thus that gives us if contains2d(array, 5) and not contains2d(array, 3):. Furthermore, we only want to take the second branch if 5 and 3 are both already present. Thus that gives us elif contains2d(array, 5) and contains2d(array, 3):. Finally, we only want to take the last branch if a 5 is not present. Thus we must change the else to another elif giving us elif not contains2d(array, 5):. Putting this all together gives us:
array = [ [ 0 for j in range(28) ] for i in range(9) ]
while not all([contains2d(array, e) for e in [3,4,5]]):
    i = random.randint(1, 9)
    j = random.randint(1, 28)
    if i % 3 != 0 and j % 7 != 0:
        if contains2d(array, 5) and not contains2d(array, 3):
            array[i][j] = 3
        elif contains2d(array, 5) and contains2d(array, 3):
            array[i][j] = 4
        elif not contains2d(array, 5):
            array[i][j] = 5
print('\n'.join(str(a) for a in array))
This actually works like my original answer. However, it's not very satisfactory because the logic inside the if block is quite complicated. There's actually a very rigid sequence of events that must happen. Whenever you think of a sequence of things, you should think of a list. In this case, the sequence goes like this: we assign 5, then we assign 3, then finally we assign 4. That can be represented as this list: [5, 3, 4]. If we reorder things a bit, we can get the following program:
array = [ [ 0 for j in range(28) ] for i in range(9) ]
for n in [5, 3, 4]:
    while not contains2d(array, n):
        i = random.randint(1, 9)
        j = random.randint(1, 28)
        if i % 3 != 0 and j % 7 != 0:
            array[i][j] = n
print('\n'.join(str(a) for a in array))
This particular program does have a flaw in that it's possible for one of the later values to overwrite one of the earlier values, and thus the postcondition of all the numbers being present could possibly not hold if your random numbers happen to collide. In fact, my initial answer has this same issue. I'll leave fixing that as an exercise for you to figure out.
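If you get stuck on that exercise, one possible direction (a sketch, not the only fix) is to list every allowed cell first and then draw distinct cells with random.sample, which never returns the same item twice. The function name place_once and its parameters are made up for illustration.

```python
import random

def place_once(rows=9, cols=28, values=(3, 4, 5), seed=None):
    """Return a rows x cols grid of zeros with each value placed in its own cell."""
    rng = random.Random(seed)  # seed is optional; pass one for repeatable output
    grid = [[0] * cols for _ in range(rows)]
    # Every cell whose row is not divisible by 3 and column not divisible by 7.
    allowed = [(i, j) for i in range(1, rows) for j in range(1, cols)
               if i % 3 != 0 and j % 7 != 0]
    # sample() picks distinct cells, so no value can overwrite another.
    for (i, j), value in zip(rng.sample(allowed, len(values)), values):
        grid[i][j] = value
    return grid

print('\n'.join(str(row) for row in place_once()))
```

The trade-off is that the whole list of allowed cells is built up front, which is fine for a 9 by 28 grid.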

thrust::exclusive_scan_by_key unexpected behavior

int data[ 10 ] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
int keys[ 10 ] = { 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 };
thrust::exclusive_scan_by_key( keys, keys + 10, data, data );
Going by the examples on the Thrust site, I expected 0,0,1,1,2,2,3,3,4,4, but got 0,0,0,0,0,0,0,0,0,0 instead. Is it a bug, or is there something somewhere that defines this behavior?
More importantly, assuming this is not a bug, is there a way to achieve this effect easily?
I don't think you understand what scan_by_key does. From the documentation:
"Specifically, consecutive iterators i and i+1 in the range [first1, last1) belong to the same segment if binary_pred(*i, *(i+1)) is true, and belong to different segments otherwise"
scan_by_key requires that your key array mark distinct segments using contiguous values:
keys: 0 0 0 1 1 1 0 0 0 1 1 1
seg#: 0 0 0 1 1 1 2 2 2 3 3 3
thrust compares adjacent keys to determine segments.
Your keys are producing a segment map like this:
int keys[ 10 ] = { 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 };
seg#: 0 1 2 3 4 5 6 7 8 9
Since you are doing an exclusive scan, the correct answer to such a segment map (regardless of the data) would be all zeroes.
It's not entirely clear what "this effect" is that you want to achieve, but you may want to do back-to-back stable sort by key operations, reversing the sense of keys and values, to rearrange this data to group the segments (i.e. keys 1 and 2) together.
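To make the segment behavior concrete, here is a plain Python emulation of the semantics described above. It is a sketch for illustration only and does not use Thrust; the function names are made up. The second function follows the sort-then-scan idea, assuming the goal is a running count per key value written back in the original order.

```python
def exclusive_scan_by_key(keys, values):
    """Exclusive prefix sum that restarts whenever the key differs from the previous key."""
    out, running = [], 0
    for idx, (key, value) in enumerate(zip(keys, values)):
        if idx == 0 or key != keys[idx - 1]:
            running = 0          # a new segment starts here
        out.append(running)
        running += value
    return out

def scan_grouped_by_key(keys, values):
    """Stable-sort by key, scan each group, then scatter results back to the original order."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # sorted() is stable
    scanned = exclusive_scan_by_key([keys[i] for i in order],
                                    [values[i] for i in order])
    result = [0] * len(keys)
    for position, original_index in enumerate(order):
        result[original_index] = scanned[position]
    return result

keys = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
data = [1] * 10
print(exclusive_scan_by_key(keys, data))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(scan_grouped_by_key(keys, data))    # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```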