I'm trying to sort coordinates in a vector based on whether they are enveloped or dominated by other coordinates. For example, the coordinate [1, 2, 1, 1] is enveloped, or dominated, by [4, 2, 1, 2] even though the 2nd and 3rd values of both coordinates are equal.
Highlights of the program (complete program online at rextester.com):
int input[18][4] = { { 4 , 3 , 3 , 3 } , { 1 , 5 , 4 , 1 } , { 2 , 4 , 5 , 4 } ,
{ 3 , 1 , 2 , 5 } , { 4 , 2 , 1 , 2 } , { 1 , 3 , 3 , 1 } ,
{ 2 , 3 , 3 , 3 } , { 3 , 1 , 2 , 3 } , { 5 , 2 , 1 , 2 } ,
{ 1 , 4 , 4 , 1 } , { 1 , 1 , 2 , 1 } , { 1 , 2 , 1 , 1 } ,
{ 2 , 1 , 2 , 4 } , { 2 , 2 , 1 , 2 } , { 3 , 1 , 1 , 2 } ,
{ 2 , 1 , 2 , 3 } , { 1 , 1 , 1 , 1 } , { 2 , 1 , 1 , 2 } };
struct Coordinate
{
Coordinate(){}
Coordinate( int (&val)[4] );
bool operator<( const Coordinate& otherCoord ) const;
void print() const;
int value[4];
};
void print( const std::vector<Coordinate>& coord );
int main()
{
std::vector<Coordinate> coord;
coord.assign( input , input + 18 );
print( coord );
std::sort( coord.begin() , coord.end() );
print( coord );
}
The program output, however, is not what I expected:
[ 1 , 1 , 1 , 1 ]
[ 1 , 4 , 4 , 1 ]
[ 2 , 1 , 1 , 2 ]
[ 2 , 1 , 2 , 3 ]
[ 3 , 1 , 1 , 2 ]
[ 1 , 3 , 3 , 1 ]
[ 2 , 2 , 1 , 2 ]
[ 2 , 1 , 2 , 4 ]
[ 1 , 2 , 1 , 1 ]
[ 1 , 1 , 2 , 1 ]
[ 4 , 3 , 3 , 3 ]
[ 5 , 2 , 1 , 2 ] // <-- ???
[ 3 , 1 , 2 , 3 ]
[ 2 , 3 , 3 , 3 ]
[ 4 , 2 , 1 , 2 ] // <-- ???
[ 3 , 1 , 2 , 5 ]
[ 2 , 4 , 5 , 4 ]
[ 1 , 5 , 4 , 1 ]
For example [ 5 , 2 , 1 , 2 ] envelopes or dominates [ 4 , 2 , 1 , 2 ] yet appears before it as shown in the program output.
What you are asking for is lexicographical ordering, which basically amounts to saying the comparison (x1, y1) < (x2, y2) is equivalent to (x1 < x2 || (x1 == x2 && y1 < y2)).
The body of your Coordinate::operator< can be modified as follows:
for( int i = 0; i < 4; ++i ) {
if( value[i] > otherCoord.value[i] )
return false;
if (value[i] < otherCoord.value[i] )
return true;
}
return false;
We return false at the end because we are performing a strict less-than comparison. When we reach that line we know that all the elements of both coordinates are identical, so if we returned true we would have implemented <= instead.
However, I would propose that you update this code to use more modern C++, namely std::vector and std::array. This is especially nice because the default operator< for a std::array performs lexicographical ordering for you. (Additionally, you don't have to worry about pointer math, because you get to use iterators.)
Here is your new class:
template<size_t N>
struct Coordinate
{
Coordinate(){}
Coordinate( std::array<int, N> _val);
bool operator<( const Coordinate& otherCoord ) const;
void print() const;
std::array<int, N> value;
};
And here's how you'd implement operator<:
template<size_t N>
bool Coordinate<N>::operator<( const Coordinate<N>& otherCoord ) const
{
return value < otherCoord.value;
}
And finally main:
int main()
{
std::vector<Coordinate<4>> coords;
coords.assign( input.begin(), input.end() );
print(coords);
std::sort(coords.begin(), coords.end());
print( coords );
}
Prefer the template for Coordinate so that you can make coordinates of arbitrary dimensionality at compile time. Right now there are a lot of magic numbers required to make it all work.
Here's a live demo
I've found the answer and I'm posting here for posterity's sake.
The sorting criterion must define a strict weak ordering, which is characterised by the following four properties:
- Irreflexivity: comp(x, x) is always false
- Asymmetry: if comp(x, y) is true, then comp(y, x) is false
- Transitivity: if comp(x, y) and comp(y, z), then comp(x, z)
- Transitivity of incomparability: if x and y are incomparable (neither compares less than the other) and y and z are incomparable, then x and z are incomparable
Accordingly I've re-implemented operator< as follows.
Note: implementation intentionally suboptimal for sake of clarity. (Comparisons should ideally be done once and cached.)
bool Coordinate::operator<( const Coordinate& otherCoord ) const
{
int ltCount = 0;
int gtCount = 0;
for( int i = 0; i < 4; ++i )
{
if( value[i] < otherCoord.value[i] ) ++ltCount;
if( value[i] > otherCoord.value[i] ) ++gtCount;
}
if( ltCount == 4 ) return true; // Strictly less
if( gtCount == 4 ) return false; // Strictly greater
// Neither strictly less nor greater. Create an ordering (based on the first differing coordinate)
for( int i = 0; i < 4; ++i )
{
if( value[i] == otherCoord.value[i] ) continue;
return( value[i] < otherCoord.value[i] );
}
return false; // Reached only when the two coordinates are equal.
}
Related
I am building a tool that is essentially an alternative to lcov. I am trying to make the default branch coverage it generates have as little noise as possible. One source of branch noise seems to be initializer lists:
#include <vector>
#include <string>
using namespace std;
struct A {
vector<string> reference_tokens;
};
int main() {
A a;
vector<string> rt = {"a", "b", "c"};
a.reference_tokens = {rt[0]};
return 0;
}
When I generate coverage for this snippet, I get:
9 : : struct A {
10 : : vector<string> reference_tokens;
11 : : };
12 : :
13 : 1 : int main() {
14 : 2 : A a;
15 : :
16 [ + - ][ + - ]: 6 : vector<string> rt = {"a", "b", "c"};
[ + - ][ + - ]
17 : :
18 [ + - ][ + - ]: 2 : a.reference_tokens = {rt[0]};
[ + + ][ - - ]
19 : 1 : return 0;
20 : : }
Now, I realize GCC inserts branches for handling exceptions. However, if I filter out exceptional branches, I'm still left with:
9 : : struct A {
10 : : vector<string> reference_tokens;
11 : : };
12 : :
13 : 1 : int main() {
14 : 2 : A a;
15 : :
16 : 6 : vector<string> rt = {"a", "b", "c"};
17 : :
18 [ + + ][ - - ]: 2 : a.reference_tokens = {rt[0]};
19 : 1 : return 0;
20 : : }
I'm not sure what these 4 branches [ + + ][ - - ] are for. They are not exceptional branches according to gcov, and it seems they always come 1:1. For example, a bigger initializer list will result in [ + + ][ + + ][ + + ][ - - ][ - - ][ - - ] non-exceptional branches.
So my question is... what are these branches? Are they reachable? Are they noise that can be safely removed?
I am using Arduino and I want to pass a parameter by reference.
My parameter was an array of booleans, like this:
boolean isCodeHaveEnd(boolean (&code)) {
boolean TCode[18] = { 1 , 1 , 1 , 1 , 1 ,1
, 0 , 1 , 0 , 0 , 0 , 0
, 0 , 1 , 0 , 0 , 0 , 0} ;
boolean XCode[18] ;
for (size_t i = 108; i < 125; i++) {
XCode[i] = code[i] ;
}
return equal(TCode, XCode, 18) ;
}
Whatever I tried, I faced the same error:
src/main.cpp:109:33: error: invalid types 'boolean {aka bool}[size_t {aka unsigned int}]' for array subscript
XCode[i] = code[i] ;
Problem solved: the parameter must be declared as a reference to an array of known size (the original declaration was a reference to a single boolean, hence the subscript error); I also added const since the array is only read:
boolean isCodeHaveEnd(const boolean (&code)[126]) {
boolean TCode[18] = { 1 , 1 , 1 , 1 , 1 ,1
, 0 , 1 , 0 , 0 , 0 , 0
, 0 , 1 , 0 , 0 , 0 , 0} ;
boolean XCode[18] ;
for (size_t i = 108; i < 126; i++) {
XCode[i - 108] = code[i] ; // XCode has only 18 elements, so offset the index
}
return equal(TCode, XCode, 18) ;
}
I am trying to initialise an array of structs in a std::array. I know that the following is a way of initialising a std::array with integers.
std::array<int, 5> arr { {1, 2, 3, 4, 5} };
Scenario:
But, say I have an array of structs like this
struct MyStruct {
const char *char_val_1;
const char *char_val_2;
int int_val_1;
double d_val_1;
} my_struct_obj[] = {
{ "a1b1" , "a2b1" , 1 , 1.1 },
{ "a1b2" , "a3b1" , 2 , 1.2 },
{ "a1b3" , "a4b1" , 3 , 1.3 },
{ "a1b4" , "a5b1" , 4 , 1.4 },
{ "a1b5" , "a6b1" , 5 , 1.5 },
{ "a1b6" , "a7b1" , 6 , 1.6 },
{ "a1b7" , "a8b1" , 7 , 1.7 },
{ "a1b8" , "a9b1" , 8 , 1.8 },
{ "a1b9" , "a10b1" , 9 , 1.9 },
};
Question:
How can I create a std::array of MyStructs, each initialised with a different set of values?
Just like for integers, provide initializers for each value:
std::array<MyStruct, 9> my_struct_arr = {{
{ "a1b1" , "a2b1" , 1 , 1.1 },
{ "a1b2" , "a3b1" , 2 , 1.2 },
{ "a1b3" , "a4b1" , 3 , 1.3 },
{ "a1b4" , "a5b1" , 4 , 1.4 },
{ "a1b5" , "a6b1" , 5 , 1.5 },
{ "a1b6" , "a7b1" , 6 , 1.6 },
{ "a1b7" , "a8b1" , 7 , 1.7 },
{ "a1b8" , "a9b1" , 8 , 1.8 },
{ "a1b9" , "a10b1" , 9 , 1.9 },
}};
I'm looking for a function that returns a map[string]interface{}, where interface{} can be a slice, a map[string]interface{}, or a value.
My use case is to parse WKT geometry like the following and retrieve the point values. Example for a donut polygon:
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
The regex (I voluntarily used \d, which matches only integers, for readability):
(POLYGON \(
(?P<polygons>\(
(?P<points>(?P<point>(\d \d), ){3,})
(?P<last_point>\d \d )\),)*
(?P<last_polygon>\(
(?P<points>(?P<point>(\d \d), ){3,})
(?P<last_point>\d \d)\))\)
)
I have a function (copied from SO) that retrieves some information, but it doesn't handle nested groups or lists of groups well:
func getRegexMatchParams(reg *regexp.Regexp, url string) (paramsMap map[string]string) {
match := reg.FindStringSubmatch(url)
paramsMap = make(map[string]string)
for i, name := range reg.SubexpNames() {
if i > 0 && i <= len(match) {
paramsMap[name] = match[i]
}
}
return paramsMap
}
It seems that the group point gets only 1 point.
example on playground
[EDIT] The result I want is something like this:
map[string]interface{}{
"polygons": map[string]interface{} {
"points": []interface{}{
{map[string]string{"point": "0 0"}},
{map[string]string{"point": "0 10"}},
{map[string]string{"point": "10 10"}},
{map[string]string{"point": "10 0"}},
},
"last_point": "0 0",
},
"last_polygon": map[string]interface{} {
"points": []interface{}{
{map[string]string{"point": "3 3"}},
{map[string]string{"point": "3 7"}},
{map[string]string{"point": "7 7"}},
{map[string]string{"point": "7 3"}},
},
"last_point": "3 3",
}
}
So I can use it further for different purposes like querying databases and validate that last_point = points[0] for each polygon.
Try adding some whitespace handling to the regex.
Also note that this engine won't retain all the capture group values that are
within a quantified outer grouping like (a|b|c)+, where that group will only contain the last a, b, or c it finds.
And, your regex can be reduced to this
(POLYGON\s*\((?P<polygons>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\)(?:\s*,\s*|\s*\)))+)
https://play.golang.org/p/rLaaEa_7GX
The original:
(POLYGON\s*\((?P<polygons>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\),)*(?P<last_polygon>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\))\s*\))
https://play.golang.org/p/rZgJYPDMzl
See below for what the groups contain.
( # (1 start)
POLYGON \s* \(
(?P<polygons> # (2 start)
\( \s*
(?P<points> # (3 start)
(?P<point> # (4 start)
\s*
( \d+ \s+ \d+ ) # (5)
\s*
,
){3,} # (4 end)
) # (3 end)
\s*
(?P<last_point> \d+ \s+ \d+ ) # (6)
\s* \),
)* # (2 end)
(?P<last_polygon> # (7 start)
\( \s*
(?P<points> # (8 start)
(?P<point> # (9 start)
\s*
( \d+ \s+ \d+ ) # (10)
\s*
,
){3,} # (9 end)
) # (8 end)
\s*
(?P<last_point> \d+ \s+ \d+ ) # (11)
\s* \)
) # (7 end)
\s* \)
) # (1 end)
Input
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
Output
** Grp 0 - ( pos 0 , len 65 )
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
** Grp 1 - ( pos 0 , len 65 )
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
** Grp 2 [polygons] - ( pos 9 , len 30 )
(0 0, 0 10, 10 10, 10 0, 0 0),
** Grp 3 [points] - ( pos 10 , len 23 )
0 0, 0 10, 10 10, 10 0,
** Grp 4 [point] - ( pos 27 , len 6 )
10 0,
** Grp 5 - ( pos 28 , len 4 )
10 0
** Grp 6 [last_point] - ( pos 34 , len 3 )
0 0
** Grp 7 [last_polygon] - ( pos 39 , len 25 )
(3 3, 3 7, 7 7, 7 3, 3 3)
** Grp 8 [points] - ( pos 40 , len 19 )
3 3, 3 7, 7 7, 7 3,
** Grp 9 [point] - ( pos 54 , len 5 )
7 3,
** Grp 10 - ( pos 55 , len 3 )
7 3
** Grp 11 [last_point] - ( pos 60 , len 3 )
3 3
Possible Solution
It's not impossible. It just takes a few extra steps.
(As an aside, isn't there a library for WKT that can parse this for you ?)
Now, I don't know your language capabilities, so this is just a general approach.
1. Validate the form you're parsing.
This will validate and return all polygon sets as a single string in All_Polygons group.
Target POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
POLYGON\s*\((?P<All_Polygons>(?:\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))(?:\s*,\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))*)\s*\)
** Grp 1 [All_Polygons] - ( pos 9 , len 55 )
(0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3)
2. If 1 was successful, set up a loop match using the output of All_Polygons string.
Target (0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3)
(?:\(\s*(?P<Single_Poly_All_Pts>\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,})\s*\))
This step is the equivalent of a find-all type of match. It should match successive values of all the points of a single polygon, returned in the Single_Poly_All_Pts group string.
This will give you these 2 separate matches, which can be put into a temp array having 2 value strings:
** Grp 1 [Single_Poly_All_Pts] - ( pos 1 , len 27 )
0 0, 0 10, 10 10, 10 0, 0 0
** Grp 1 [Single_Poly_All_Pts] - ( pos 31 , len 23 )
3 3, 3 7, 7 7, 7 3, 3 3
3. If 2 was successful, set up a loop match using the temp array output of step 2.
This will give you the individual points of each polygon.
(?P<Single_Point>\d+\s+\d+)
Again, this is a loop match (or a find-all type of match). For each array element (polygon), this will produce the individual points.
Target[element 1] 0 0, 0 10, 10 10, 10 0, 0 0
** Grp 1 [Single_Point] - ( pos 0 , len 3 )
0 0
** Grp 1 [Single_Point] - ( pos 5 , len 4 )
0 10
** Grp 1 [Single_Point] - ( pos 11 , len 5 )
10 10
** Grp 1 [Single_Point] - ( pos 18 , len 4 )
10 0
** Grp 1 [Single_Point] - ( pos 24 , len 3 )
0 0
And,
Target[element 2] 3 3, 3 7, 7 7, 7 3, 3 3
** Grp 1 [Single_Point] - ( pos 0 , len 3 )
3 3
** Grp 1 [Single_Point] - ( pos 5 , len 3 )
3 7
** Grp 1 [Single_Point] - ( pos 10 , len 3 )
7 7
** Grp 1 [Single_Point] - ( pos 15 , len 3 )
7 3
** Grp 1 [Single_Point] - ( pos 20 , len 3 )
3 3
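The three steps above can be sketched in Go with regexp's FindAllString / FindAllStringSubmatch; the function name parsePolygon and the simplified ring/point patterns are illustrative, not from the original post:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Step 1: validate the overall POLYGON (...) form and capture the inside.
	outer = regexp.MustCompile(`^POLYGON\s*\((.+)\)$`)
	// Step 2: each ring is a parenthesized group.
	ring = regexp.MustCompile(`\(([^)]+)\)`)
	// Step 3: each point is two integers separated by whitespace.
	point = regexp.MustCompile(`\d+\s+\d+`)
)

// parsePolygon returns the points of each ring, or nil if the input
// does not look like a polygon.
func parsePolygon(wkt string) [][]string {
	m := outer.FindStringSubmatch(wkt)
	if m == nil {
		return nil
	}
	var rings [][]string
	for _, r := range ring.FindAllStringSubmatch(m[1], -1) {
		rings = append(rings, point.FindAllString(r[1], -1))
	}
	return rings
}

func main() {
	rings := parsePolygon("POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))")
	for _, pts := range rings {
		// Each ring closes on itself: last_point == points[0].
		fmt.Println(pts, "closed:", pts[0] == pts[len(pts)-1])
	}
}
```

This also makes the last_point = points[0] validation from the question a simple slice comparison per ring.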
I have the following data frame.
d = pd.DataFrame({'one' : [0,1,1,1,0,1],'two' : [0,0,1,0,1,1]})
d
one two
0 0 0
1 1 0
2 1 1
3 1 0
4 0 1
5 1 1
I want a cumulative sum which resets at zero.
The desired output should be:
pd.DataFrame({'one' : [0,1,2,3,0,1],'two' : [0,0,1,0,1,2]})
one two
0 0 0
1 1 0
2 2 1
3 3 0
4 0 1
5 1 2
I have tried using groupby, but it does not work for the entire table.
df2 = d.apply(lambda x: x.groupby((~x.astype(bool)).cumsum()).cumsum())
print(df2)
Output:
one two
0 0 0
1 1 0
2 2 1
3 3 0
4 0 1
5 1 2
pandas
def cum_reset_pd(df):
csum = df.cumsum()
return (csum - csum.where(df == 0).ffill()).astype(df.dtypes)
cum_reset_pd(d)
one two
0 0 0
1 1 0
2 2 1
3 3 0
4 0 1
5 1 2
numpy
def cum_reset_np(df):
v = df.values
z = np.zeros_like(v)
j, i = np.where(v.T)
r = np.arange(1, i.size + 1)
p = np.where(
np.append(False, (np.diff(i) != 1) | (np.diff(j) != 0))
)[0]
b = np.append(0, np.append(p, r.size))
z[i, j] = r - b[:-1].repeat(np.diff(b))
return pd.DataFrame(z, df.index, df.columns)
cum_reset_np(d)
one two
0 0 0
1 1 0
2 2 1
3 3 0
4 0 1
5 1 2
Why go through this trouble? Because it's quicker!
This one doesn't use pandas; it uses NumPy and list comprehensions instead:
import numpy as np
d = {'one': [0,1,1,1,0,1], 'two': [0,0,1,0,1,1]}
out = {}
for key in d.keys():
l = d[key]
indices = np.argwhere(np.array(l)==0).flatten()
indices = np.append(indices, len(l))
out[key] = np.concatenate([np.cumsum(l[indices[n-1]:indices[n]]) \
for n in range(1, indices.shape[0])]).ravel()
print(out)
First, I find all occurrences of 0 (the positions to split the lists at), then I calculate the cumsum of the resulting sublists and insert them into a new dict.
This should do it:
d = {'one' : [0,1,1,1,0,1],'two' : [0,0,1,0,1,1]}
one = d['one']
two = d['two']
i = 0
new_one = []
for item in one:
if item == 0:
i = 0
else:
i += item
new_one.append(i)
j = 0
new_two = []
for item in two:
if item == 0:
j = 0
else:
j += item
new_two.append(j)
d['one'], d['two'] = new_one, new_two
df = pd.DataFrame(d)
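The two near-identical loops above can be folded into a single helper that handles any number of columns; a sketch (the name cumsum_reset is mine):

```python
import pandas as pd

def cumsum_reset(df):
    """Per-column cumulative sum that restarts at 0 whenever a 0 appears."""
    out = {}
    for col in df.columns:
        total, vals = 0, []
        for v in df[col]:
            total = 0 if v == 0 else total + v
            vals.append(total)
        out[col] = vals
    return pd.DataFrame(out, index=df.index)

d = pd.DataFrame({'one': [0, 1, 1, 1, 0, 1], 'two': [0, 0, 1, 0, 1, 1]})
print(cumsum_reset(d))
```

This keeps the same O(n) single pass per column while removing the duplicated loop bodies.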