unexpected results with word2vec algorithm - c++

I implemented word2vec in C++.
I found the original syntax to be unclear, so I figured I'd re-implement it, using all the benefits of C++ (std::map, std::vector, etc.).
This is the method that actually gets called every time a sample is trained (l1 denotes the index of the first word, l2 the index of the second word, label indicates whether it is a positive or negative sample, and neu1e acts as the accumulator for the gradient):
void train(int l1, int l2, double label, std::vector<double>& neu1e)
{
// Calculate the dot-product between the input words weights (in
// syn0) and the output word's weights (in syn1neg).
auto f = 0.0;
for (int c = 0; c < m_numberOfFeatures; c++)
f += syn0[l1][c] * syn1neg[l2][c];
// This block does two things:
// 1. Calculates the output of the network for this training
// pair, using the expTable to evaluate the output layer
// activation function.
// 2. Calculates the error at the output, stored in 'g', by
// subtracting the network output from the desired output, and
// finally multiplies this by the learning rate.
auto z = 1.0 / (1.0 + exp(-f));
auto g = m_learningRate * (label - z);
// Multiply the error by the output layer weights.
// (I think this is the gradient calculation?)
// Accumulate these gradients over all of the negative samples.
for (int c = 0; c < m_numberOfFeatures; c++)
neu1e[c] += (g * syn1neg[l2][c]);
// Update the output layer weights by multiplying the output error
// by the hidden layer weights.
for (int c = 0; c < m_numberOfFeatures; c++)
syn1neg[l2][c] += g * syn0[l1][c];
}
This method gets called by
void train(const std::string& s0, const std::string& s1, bool isPositive, std::vector<double>& neu1e)
{
auto l1 = m_wordIDs.find(s0) != m_wordIDs.end() ? m_wordIDs[s0] : -1;
auto l2 = m_wordIDs.find(s1) != m_wordIDs.end() ? m_wordIDs[s1] : -1;
if(l1 == -1 || l2 == -1)
return;
train(l1, l2, isPositive ? 1 : 0, neu1e);
}
which in turn gets called by the main training method.
Full code can be found at
https://github.com/jorisschellekens/ml/tree/master/word2vec
With complete example at
https://github.com/jorisschellekens/ml/blob/master/main/example_8.hpp
When I run this algorithm, the top 10 words 'closest' to 'father' are:
father
Khan
Shah
forgetful
Miami
rash
symptoms
Funeral
Indianapolis
impressed
This is the method to calculate the nearest words:
std::vector<std::string> nearest(const std::string& s, int k) const
{
// calculate distance
std::vector<std::tuple<std::string, double>> tmp;
for(auto &t : m_unigramFrequency)
{
tmp.push_back(std::make_tuple(t.first, distance(t.first, s)));
}
// sort
std::sort(tmp.begin(), tmp.end(), [](const std::tuple<std::string, double>& t0, const std::tuple<std::string, double>& t1)
{
return std::get<1>(t0) < std::get<1>(t1);
});
// take top k
std::vector<std::string> out;
for(int i=0; i<k; i++)
{
out.push_back(std::get<0>(tmp[tmp.size() - 1 - i]));
}
// return
return out;
}
This seems weird.
Is something wrong with my algorithm?

Are you sure that you get the "nearest" words (and not the farthest)?
...
// take top k
std::vector<std::string> out;
for(int i=0; i<k; i++)
{
out.push_back(std::get<0>(tmp[tmp.size() - 1 - i]));
}
...
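If distance() really returns a distance (smaller means closer), the top-k loop should take elements from the front of the ascending sort instead. A minimal sketch of that fix:
// Take the k smallest distances, i.e. the front of the ascending sort.
std::vector<std::string> out;
for(int i=0; i<k; i++)
{
    out.push_back(std::get<0>(tmp[i]));
}
If distance() is actually a similarity (larger means closer), the original loop would already be correct.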


Combining subranges of a vector efficiently
The Process
I have some numerical data stored in a vector, v. Vector v is composed of many subranges of valid/invalid data with unpredictable lengths according to some predicate, e.g. being above some threshold value. After filtering, these valid ranges are represented by a second vector, f, which contains std::pair<size_t, size_t> elements indicating the start index of the range and the index one past the end of the range.
For example, filtering the vector { 1, 5, 3, 12, 10, 21, 19, 14, 5, 9, 3, 7, 2 } for data above a threshold of 10 would return { {3, 8} }
The Data
The data I am using originates from real world measurements of the output power of a laser as it is cycled on and off. The transfer from off to on, and vice versa, is not instantaneous, and noise during the transition can make it difficult to determine the exact start point/end point.
The data produced is treated as immutable and no alterations are applied to v.
The Filter
In addition to the data to be filtered and a threshold value, the filter takes a value, x, representing the number of valid/invalid elements it should encounter before determining that there has been a transition from a valid subrange to an invalid one, or vice versa.
For example, using the same vector as above, { 1, 5, 3, 12, 10, 21, 19, 14, 5, 9, 3, 7, 2 }, but a threshold of 8 and x = 2:
The filter reaches index 3, recognizing 12 > 8.
It continues x more indices, checking that they are also above the threshold before recognizing a transition has occurred.
The start point is set to 3.
The reverse happens for the transition from above the threshold to below.
The filter reaches index 8, recognizing 5 < 8.
However, at index 9, v[9] = 9 > 8.
As there haven't been x values below the threshold, the valid subrange continues.
At index 10 the count starts again, this time finding a valid transition.
The end point is set to 10 (One past the end).
The Problem
By only retaining the information about the start and end points of the valid ranges I avoid keeping a copy of all the valid data.
At a later point, I then perform some transformation on the data such as taking the average of each range (nice and simple), or averaging the valid data into a maximum number of n points (which causes my problem).
How can I smoothly iterate through the valid indices of v across subranges?
My first thought was to look at the Ranges library provided by the C++ standard; however, I'm very inexperienced with <ranges>, and my simple experiments with it have probably led me further from a workable answer than where I started, through added confusion.
I am currently using Visual Studio 2022 and compiling for c++20.
Compiled using:
g++ -Wall -Wextra -pedantic -O3 -std=c++20 example.cpp
example.cpp
#include <vector>
#include <utility>
#include <limits>
std::vector<std::pair<size_t, size_t>>
filter( const std::vector<double>& data,
const double threshold,
const size_t x ) {
std::vector<std::pair<size_t, size_t>> range_indices;
// continuous_range indicates if currently in a continuous, VALID range.
bool continuous_range{ false };
// range_start/end track indices of most recent valid range
// count helps distinguish between noise & transitions
// from invalid to valid ranges or vice versa.
size_t range_start{ 0 }, range_end{ 0 }, count{ 0 };
for ( size_t i{ 0 }; i < data.size(); ++i ) {
/* Some logic to decide which switch branch
* Possible values:
* 0: data[i] < threshold & !continuous_range
* - In non-valid data range, reset count.
* 1: data[i] >= threshold & !continuous_range
* - Found new valid range if count >= x, else incr count
* 2: data[i] < threshold & continuous_range
* - Left a valid range if count >= x, else incr count
* 3: data[i] >= threshold & continuous_range
* - Within continuous range, reset count.
*/
size_t branch = data[i] >= threshold ? 2 : 1;
branch += continuous_range ? 1 : -1;
switch ( branch ) {
case 0:
count = 0;
break;
case 1:
count++;
continuous_range = count >= x;
if ( continuous_range ) {
range_start = i - count + 1;
count = 0;
}
break;
case 2:
count++;
// If count == x, no longer in cont. range
continuous_range = !(count >= x);
// If not in cont. range
if ( !continuous_range ) {
// 1 past the end
range_end = i - count + 1;
range_indices.push_back(
std::pair<size_t, size_t>{ range_start, range_end }
);
count = 0;
}
break;
case 3:
count = 0;
break;
}
}
// Handle case where valid range includes the final datapoint.
if ( continuous_range && range_start > range_end ) {
// One past the end of the data.
range_indices.emplace_back(range_start, data.size());
}
return range_indices;
}
double
vector_max( const std::vector<double>& v ) {
double max{ std::numeric_limits<double>::lowest() };
for ( const auto& d : v ) {
if ( max < d ) { max = d; }
}
return max;
}
double
mean( const std::vector<double>& data,
const size_t start, const size_t end ) {
if ( data.empty() ) {
return std::numeric_limits<double>::signaling_NaN();
}
if ( start >= end || end > data.size() ) {
return std::numeric_limits<double>::signaling_NaN();
}
double sum{ 0.0 };
for ( size_t i{ start }; i < end; ++i ) {
sum += data[i];
}
return sum / (end - start);
}
std::vector<double>
avg_range( const std::vector<double>& data,
const std::vector<std::pair<size_t, size_t>>& valid_ranges ) {
std::vector<double> avg_data;
avg_data.reserve(valid_ranges.size());
for ( const auto& [first, last] : valid_ranges ) {
avg_data.emplace_back(mean(data, first, last));
}
return avg_data;
}
std::vector<double>
avg_npoints( const std::vector<double>& data,
const std::vector<std::pair<size_t, size_t>>& valid_ranges,
const size_t n ) {
/*
* Some method to iterate through the valid ranges in data
* using valid_indices so they appear as one continuous range.
* Then average the valid data into n points.
*/
}
int main() {
/*
* I would put data here, except in reality the code handles anywhere
* from a few 100k to a few million datapoints so I'm not sure what to
* provide instead.
*/
std::vector<double> data;
const auto indices = filter(data, 0.8 * vector_max(data), 2);
const auto range_avgs = avg_range(data, indices);
const auto npoint_avgs = avg_npoints(data, indices, 1000);
}
You can indeed do this quite elegantly with ranges. Here is a short example:
#include <ranges>
#include <span>
#include <vector>
// Store your subranges as
using Sub = std::span<double>;
// and return your filtered result as
std::vector<Sub> filter(std::vector<double> const& data, ...);
int main()
{
std::vector<double> data;
const auto subs = filter(data, ...);
// A view of the vector of spans, flattened into a single sequence
auto view = std::views::join(subs);
}
The spans can be created from a pair of iterators to the data vector, or an iterator and a count, so that will require some modifications to your filter algorithm.
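For instance, keeping the existing {start, one-past-end} index pairs, a small adapter could build the spans afterwards. This is a sketch under that assumption (to_spans is my name, not part of the answer):
#include <span>
#include <utility>
#include <vector>
using Sub = std::span<const double>;
// Convert the {start, one-past-end} index pairs into spans over the
// (immutable) data vector.
std::vector<Sub> to_spans(const std::vector<double>& data,
                          const std::vector<std::pair<size_t, size_t>>& idx) {
    std::vector<Sub> subs;
    subs.reserve(idx.size());
    for (const auto& [first, last] : idx)
        subs.emplace_back(data.begin() + first, data.begin() + last);
    return subs;
}
// Usage (needs <ranges>):
// auto subs = to_spans(data, indices);
// auto view = std::views::join(subs);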
I guess the ranges library offers ways to write your code in a much simpler way. However, you already have the code to filter and if we just consider the question
How can I smoothly iterate through the valid indices of v across subranges?
Then the answer is rather simple and requires only a few additions to your code.
First I used an alias
using indices_t = std::vector<std::pair<size_t, size_t>>;
Next, your way to find the max can be simplified by using std::max_element:
double vector_max( const std::vector<double>& v ) {
return *std::max_element(v.begin(),v.end());
}
(assumes the vector is not empty)
Then you can write a function that takes a callable as parameter and calls it with all elements inside the intervals:
template <typename F>
void apply_to_intervals(F f,const std::vector<double>& v,const indices_t& indices) {
for (const auto& interv : indices) {
for (auto i = interv.first; i < interv.second; ++i){
f(v[i]);
}
}
}
That's really all you need to smoothly iterate over the filtered elements.
For example to print them:
void print(const std::vector<double>& v, const indices_t& indices) {
apply_to_intervals([](double x) {std::cout << x << "\n";},v,indices);
}
To calculate the average:
auto avg_range(const std::vector<double>& v,const indices_t& indices) {
double sum = 0;
size_t count = 0;
auto averager = [&](double x) {
sum += x;
++count;
};
apply_to_intervals(averager,v,indices);
return sum / count;
}
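With the same helper, the n-point averaging from the question can also be sketched. This is my attempt (assuming buckets of roughly equal size are acceptable), not code from the answer:
// Average the valid data into at most n points: first count the valid
// elements, then spread them over n buckets of nearly equal size.
std::vector<double> avg_npoints(const std::vector<double>& v,
                                const indices_t& indices, size_t n) {
    size_t total = 0;
    apply_to_intervals([&](double) { ++total; }, v, indices);
    if (total == 0) return {};
    if (n > total) n = total;
    std::vector<double> sums(n, 0.0);
    std::vector<size_t> counts(n, 0);
    size_t k = 0; // running index over the flattened valid sequence
    apply_to_intervals([&](double x) {
        size_t bucket = k * n / total; // maps [0,total) onto [0,n)
        sums[bucket] += x;
        ++counts[bucket];
        ++k;
    }, v, indices);
    // Every bucket is non-empty since n <= total.
    for (size_t b = 0; b < n; ++b) sums[b] /= counts[b];
    return sums;
}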
Complete Code

How to avoid using nested loops in cpp?

I am working on digital sampling for a sensor. I have the following code to compute the highest amplitude and the corresponding time.
struct LidarPoints{
float timeStamp;
float Power;
};
std::vector<LidarPoints> measurement; // To store Lidar points of current measurement
Currently power and energy are the same (because of the delta function) and the vector is arranged in ascending order of time. I would like to change this to a step function. The pulse duration is a constant 10 ns.
uint32_t pulseDuration = 5;
The problem is to find any overlap between the samples and, if there is any, to add up the amplitudes.
I currently use the following code:
for(auto i = 0; i < measurement.size(); i++){
for(auto j = i+1; j < measurement.size(); j++){
if((measurement[j].timeStamp - measurement[i].timeStamp) < pulseDuration){
measurement[i].Power += measurement[j].Power;
measurement[i].timeStamp = (measurement[i].timeStamp + measurement[j].timeStamp)/2.0f;
}
}
}
Is it possible to code this without two for loops, since I cannot afford the amount of time taken by the nested loops?
You can take advantage that the vector is sorted by timeStamp and find the next pulse with binary search, thus reducing the complexity from O(n^2) to O(n log n):
#include <vector>
#include <algorithm>
#include <numeric>
#include <iterator>
auto it = measurement.begin();
auto end = measurement.end();
while (it != end)
{
// next timestamp as in your code
auto timeStampLower = it->timeStamp + pulseDuration;
// next value in measurement with a timestamp >= timeStampLower
auto lower_bound = std::lower_bound(it, end, timeStampLower, [](const LidarPoints& a, float b) {
return a.timeStamp < b;
});
// sum over [timeStamp, timeStampLower)
float sum = std::accumulate(it, lower_bound, 0.0f, [] (float a, const LidarPoints& b) {
return a + b.timeStamp;
});
auto num = std::distance(it, lower_bound);
// num should be >= 1 since the vector is sorted and pulseDuration is positive
// you should uncomment next line to catch unexpected error
// Expects(num >= 1); // needs GSL library
// assert(num >= 1); // or standard C if you don't want to use GSL
// average over [timeStamp, timeStampLower)
it->timeStamp = sum / num;
// advance it
it = lower_bound;
}
https://en.cppreference.com/w/cpp/algorithm/lower_bound
https://en.cppreference.com/w/cpp/algorithm/accumulate
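For completeness, here is a minimal self-contained harness around the fragment (my own sketch: the struct and names come from the question, the sample values are made up, and like the answer it only averages the timestamps):
#include <algorithm>
#include <iterator>
#include <numeric>
#include <vector>
struct LidarPoints {
    float timeStamp;
    float Power;
};
int main() {
    // Sorted by timeStamp, as the question states.
    std::vector<LidarPoints> measurement{
        {0.f, 1.f}, {2.f, 2.f}, {4.f, 1.f}, {20.f, 3.f}, {23.f, 1.f}};
    const float pulseDuration = 10.0f;
    auto it = measurement.begin();
    auto end = measurement.end();
    while (it != end) {
        auto timeStampLower = it->timeStamp + pulseDuration;
        auto lower_bound = std::lower_bound(it, end, timeStampLower,
            [](const LidarPoints& a, float b) { return a.timeStamp < b; });
        float sum = std::accumulate(it, lower_bound, 0.0f,
            [](float a, const LidarPoints& b) { return a + b.timeStamp; });
        it->timeStamp = sum / std::distance(it, lower_bound);
        it = lower_bound;
    }
    // The first pulse collapses to timeStamp (0+2+4)/3 = 2, the second to 21.5.
}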
Also please note that my algorithm will produce a different result than yours, because you don't really compute the average over multiple values with measurement[i].timeStamp = (measurement[i].timeStamp + measurement[j].timeStamp)/2.0f.
Also to consider (I am by far not an expert in the field, so I am just throwing out the idea; it's up to you to know whether it's valid or not): with your code you just "squash" together close measurements, instead of having a vector of measurements with periodic times. It might be what you intend or not.
Disclaimer: not tested beyond "it compiles". Please don't just copy-paste it. It could be incomplet and incorrekt. But I hope I gave you a direction to investigate.
Due to jitter and other timing complexities, instead of simple summation you need to switch to numerical integration (e.g. trapezoidal integration).
If your values are in ascending order of timeStamp, adding else break to the if statement shouldn't affect the result but should be a lot quicker.
for(auto i = 0; i < measurement.size(); i++){
for(auto j = i+1; j < measurement.size(); j++){
if((measurement[j].timeStamp - measurement[i].timeStamp) < pulseDuration){
measurement[i].Power += measurement[j].Power;
measurement[i].timeStamp = (measurement[i].timeStamp + measurement[j].timeStamp)/2.0f;
} else {
break;
}
}
}

multiply numbers on all paths and get a number with minimum number of zeros

I have an m*n table in which each entry has a value.
The start position is at the top left corner, and I can go right or down until I reach the lower right corner.
I want a path such that, if I multiply the numbers on that path, I get a number with the minimum number of zeros on its right side (i.e. trailing zeros).
example:
1 2 100
5 5 4
possible paths :
1*2*100*4=800
1*2*5*4= 40
1*5*5*4= 100
Solution: 1*2*5*4 = 40, because 40 has 1 zero but the other paths have 2 zeros.
The easiest way is to use DFS and calculate all paths, but that's not efficient.
I'm looking for an optimal substructure for solving it using dynamic programming.
After thinking for a while I came up with this equation:
T(i,j) = CountZeros(T(i-1,j)*table[i,j]) < CountZeros(T(i,j-1)*table[i,j]) ?
T(i-1,j)*table[i,j] : T(i,j-1)*table[i,j]
Code:
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
using namespace std;
using Table = vector<vector<int>>;
const int rows = 2;
const int cols = 3;
Table memo(rows, vector<int>(cols, -1));
int CountZeros(int number)
{
if (number < 0)
return numeric_limits<int>::max();
int res = 0;
while (number != 0)
{
if (number % 10 == 0)
res++;
else break;
number /= 10;
}
return res;
}
int solve(int i, int j, const Table& table)
{
if (i < 0 || j < 0)
return -1;
if (memo[i][j] != -1)
return memo[i][j];
int up = solve(i - 1, j, table)*table[i][j];
int left = solve(i, j - 1, table)*table[i][j];
memo[i][j] = CountZeros(up) < CountZeros(left) ? up : left;
return memo[i][j];
}
int main()
{
Table table =
{
{ 1, 2, 100 },
{ 5, 5, 4 }
};
memo[0][0] = table[0][0];
cout << solve(1, 2, table);
}
But it is not optimal (for example, in the above example it gives 100).
Any idea for a better optimal substructure? Can I solve it with dynamic programming?
Let's reconsider the Bellman optimality equation for your task. I consider this as a systematic approach to such problems (whereas I often don't understand DP one-liners). My reference is the book of Sutton and Barto.
The state your system is in can be described by a triple of integer numbers (i,j,r) (modeled as a std::array<int,3>). Here, i and j denote column and row in your rectangle M = m_{i,j}, whereas r denotes the multiplication result.
Your actions in state (i,j,r) are given by going right, with which you end in state (i, j+1, r*m_{i,j+1}) or by going down which leads to the state (i+1, j, r*m_{i+1,j}).
Then, the Bellman equation is given by
v(i,j,r) = min{ NullsIn(r*m_{i+1,j}) - NullsIn(r) + v(i+1, j, r*m_{i+1,j}),
                NullsIn(r*m_{i,j+1}) - NullsIn(r) + v(i, j+1, r*m_{i,j+1}) }
The rationale behind this equation is the following: NullsIn(r*m_{i+1,j}) - NullsIn(r) denotes the zeros you have to add when you take one of the two actions, i.e. the instant penalty. v(i+1, j, r*m_{i+1,j}) denotes the zeros in the state you get to when you take this action. Now one wants to take the action which minimizes the sum of both contributions.
All you further need is a function int NullsIn(int) which returns the number of (trailing) zeros in a given integer. Here is my attempt:
int NullsIn(int r)
{
int ret=0;
for(int j=10; j<=r; j*=10)
{
if((r/j) * j == r)
++ret;
}
return ret;
}
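A quick sanity check with the numbers from the question (my own example, with NullsIn from above):
#include <cassert>
int main() {
    assert(NullsIn(800) == 2); // 1*2*100*4 = 800 has two trailing zeros
    assert(NullsIn(40) == 1);  // 1*2*5*4 = 40 has one trailing zero
    assert(NullsIn(7) == 0);   // no trailing zeros
}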
For convenience I further defined a NullsDifference function:
int NullsDifference(int r, int m)
{
return NullsIn(r*m) - NullsIn(r);
}
Now, one has to do a backwards iteration starting from the initial state in the right bottom element of the matrix.
int backwardIteration(std::array<int,3> state, std::vector<std::vector<int> > const& m)
{
static std::map<std::array<int,3>, int> memoization;
auto it=memoization.find(state);
if(it!=memoization.end())
return it->second;
int i=state[0];
int j=state[1];
int r=state[2];
int ret=0;
if(i>0 && j>0)
{
int inew=i-1;
int jnew=j-1;
ret=std::min(NullsDifference(r, m[inew][j]) + backwardIteration({inew,j,r*m[inew][j]}, m),
NullsDifference(r, m[i][jnew]) + backwardIteration({i,jnew,r*m[i][jnew]}, m));
}
else if(i>0)
{
int inew=i-1;
ret= NullsDifference(r, m[inew][j]) + backwardIteration({inew,j,r*m[inew][j]}, m);
}
else if(j>0)
{
int jnew=j-1;
ret= NullsDifference(r, m[i][jnew]) + backwardIteration({i,jnew,r*m[i][jnew]}, m);
}
memoization[state]=ret;
return ret;
}
This routine is called via
int main()
{
int ncols=2;
int nrows=3;
std::vector<std::vector<int> > m={{1,2,100}, {5,5,4}};
std::array<int,3> initialState = {ncols-1, nrows -1, m[ncols-1][nrows - 1]};
std::cout << "Minimum number of zeros: " << backwardIteration(initialState, m) << std::endl;
}
For your array, it prints out the desired 1 for the number of zeros.
Here is a live demo on Coliru.
EDIT
Here is an important thing: in production, you usually don't call backwardIteration as I did, because it takes an exponentially increasing number of recursive calls. Rather, you start in the top left and call it, then store the result. Next you go right and down and each time call backwardIteration, where you now use the previously stored result. And so on.
In order to do this, one needs a memoization concept within the function backwardIteration, which returns the already stored result instead of invoking another recursive call.
I've added memoization in the function call above. Now you can loop through the array from top left to bottom right in any way you like -- but preferably take small steps, such as row-by-row, column-by-column, or rectangle-by-rectangle.
In fact, this and only this is the spirit of Dynamic Programming.

Optimal way to find shared elements between combination pairs

I have a list of ordered items of type A, each of which contains a subset from a list of items B. For each pair of items in A, I would like to find the number of items B that they share (intersect).
For example, if I have this data:
A1 : B1
A2 : B1 B2 B3
A3 : B1
Then I would get the following result:
A1, A2 : 1
A1, A3 : 1
A2, A3 : 1
The problem I'm having is making the algorithm efficient. The size of my dataset is about 8.4K items of type A. This means 8.4K choose 2 = 35275800 combinations. The algorithm I'm using is simply going through each combination pair and doing a set intersection.
The gist of what I have so far is below. I am storing the counts as a key in a map, with the value as a vector of A pairs. I'm using a graph data structure to store the data, but the only 'graph' operation I'm using is get_neighbors() which returns the B subset for an item from A. I happen to know that the elements in the graph are ordered from index 0 to 8.4K.
void get_overlap(Graph& g, map<int, vector<A_pair> >& overlap) {
map<int, vector<A_pair> >::iterator it;
EdgeList el_i, el_j;
set<int> intersect;
size_t i, j;
VertexList vl = g.vertices();
for (i = 0; i < vl.size()-1; i++) {
el_i = g.get_neighbors(i);
for (j = i+1; j < vl.size(); j++) {
el_j = g.get_neighbors(j);
intersect.clear(); // reset between pairs; otherwise results accumulate
set_intersection(el_i.begin(), el_i.end(), el_j.begin(), el_j.end(), inserter(intersect, intersect.begin()));
int num_overlap = intersect.size();
it = overlap.find(num_overlap);
if (it == overlap.end()) {
vector<A_pair> temp;
temp.push_back(A_pair(i, j));
overlap.insert(pair<int, vector<A_pair> >(num_overlap, temp));
}
else {
vector<A_pair> temp = it->second;
temp.push_back(A_pair(i, j));
overlap[num_overlap] = temp;
}
}
}
}
I have been running this program for nearly 24 hours, and the ith element in the for loop has reached iteration 250 (I'm printing each i to a log file). This, of course, is a long way from 8.4K (although I know that as the iterations go on, the number of comparisons will shrink, since j = i+1). Is there a more optimal approach?
Edit: To be clear, the goal here is ultimately to find the top k overlapped pairs.
Edit 2: Thanks to @Beta and others for pointing out optimizations. In particular, updating the map directly (instead of copying its contents and resetting the map value) drastically improved the performance. It now runs in a matter of seconds.
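For reference, the direct map update mentioned in Edit 2 presumably amounts to something like the following one-liner (my reconstruction, not the poster's exact code):
// std::map::operator[] default-constructs an empty vector on first
// access, so the find/copy/insert branches collapse into one line and,
// crucially, the stored vector is never copied:
overlap[num_overlap].push_back(A_pair(i, j));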
I think you may be able to make things faster by pre-computing a reverse (edge-to-vertex) map. This would allow you to avoid the set_intersection call, which performs a bunch of costly set insertions. I am missing some declarations to make fully functional code, but hopefully you will get the idea. I am assuming that EdgeList is some sort of int vector:
void get_overlap(Graph& g, map<int, vector<A_pair> >& overlap) {
map<int, vector<A_pair> >::iterator it;
EdgeList el_i, el_j;
set<int> intersect;
size_t i, j;
VertexList vl = g.vertices();
// compute reverse map
map<int, set<int>> reverseMap;
for (i = 0; i < vl.size()-1; i++) {
el_i = g.get_neighbors(i);
for (auto e : el_i) {
const auto findIt = reverseMap.find(e);
if (end(reverseMap) == findIt) {
reverseMap.emplace(e, set<int>{ static_cast<int>(i) });
} else {
findIt->second.insert(i);
}
}
}
for (i = 0; i < vl.size()-1; i++) {
el_i = g.get_neighbors(i);
for (j = i+1; j < vl.size(); j++) {
el_j = g.get_neighbors(j);
int num_overlap = 0;
for (auto e: el_i) {
auto findIt = reverseMap.find(e);
if (end(reverseMap) != findIt) {
if (findIt->second.count(j) > 0) {
++num_overlap;
}
}
}
it = overlap.find(num_overlap);
if (it == overlap.end()) {
overlap.emplace(num_overlap, vector<A_pair>({ A_pair(i, j) }));
}
else {
it->second.push_back(A_pair(i,j));
}
}
}
}
I didn't do the precise performance analysis, but inside the double loop you replace "at most 4N comparisons" plus some costly set insertions (from set_intersection) with N*log(M)*log(E) comparisons, where N is the average number of edges per vertex, M is the average number of vertices per edge, and E is the number of edges, so it could be beneficial depending on your data set.
Also, if your edge indexes are compact, then you can use a simple vector rather than a map to represent the reverse map, which removes the log(E) performance cost.
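For instance, assuming the edge indices are dense in [0, numEdges) (numEdges is a hypothetical count, not from the question), the reverse map could become:
// A plain vector indexed by edge id instead of a std::map.
std::vector<std::set<int>> reverseMap(numEdges);
for (size_t i = 0; i < vl.size(); ++i)
    for (auto e : g.get_neighbors(i))
        reverseMap[e].insert(i);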
One question, though. Since you're talking about vertices and edges, don't you have the additional constraint that edges always have 2 vertices? This could simplify some computations.

how to elminate the "doubled" elements of a vector in c++

I'm using HoughLines to detect lines in a frame. The line information is saved in a cv::vector<cv::Vec2f>, which I handle as a two-dimensional array. I'm interested in the second dimension, the angle of the line. I want to keep only the lines whose angles differ by more than 1.5 rad. Here is what I did:
.............................
cv::vector<cv::Vec2f> lineQ;
..............................
// ordering the vector based on the angle value in rad
for ( int i = 0 ; i< lineQ.size()-1; i++){
for(int j= i+1;j<lineQ.size();j++){
if(lineQ[i][1] > lineQ[j][1]){
tmp = lineQ[i];
lineQ[i] = lineQ[j];
lineQ[j] = tmp;
}
}
}
Now I want to compare the vector elements with each other based on the angle:
cv::vector<cv::Vec2f> line;
for ( int i = 0 ; i< lineQ.size()-1; i++){
for ( int j= i+1; j<lineQ.size(); j++){
if(fabs(lineQ[i][1] - lineQ[j][1])>1.5){
line.push_back(lineQ[i]);
}
}
}
This works for 2 lines, but when I have 3 with, let's say, 1.3 rad as an angle difference, the size of line is more than 2. I thought of using erase, but this changes the size of my vector!
One option is to supply a soft "equals" to std::unique_copy:
std::unique_copy(lineQ.begin(), lineQ.end(), std::back_inserter(line),
[](const cv::Vec2f & a, const cv::Vec2f & b) {
return b[1] - a[1] <= 1.5;
});
Sidenote: You can also avoid the effort of writing your own sort (bubble sort is just about the worst choice) and use the standard library. Something like this ought to work:
std::sort(lineQ.begin(), lineQ.end(),
[](const cv::Vec2f & a, const cv::Vec2f & b) {
return a[1] < b[1];
});
(The above code assumes C++11, which most of us have by now. If you're stuck on an earlier version, you can write a couple of functor classes instead.)
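For illustration, here is a self-contained sketch of the sort + unique_copy combination. It uses std::array<float, 2> as a stand-in for cv::Vec2f so that it compiles without OpenCV; the threshold semantics are the same as above:
#include <algorithm>
#include <array>
#include <iostream>
#include <iterator>
#include <vector>
using Vec2f = std::array<float, 2>; // stand-in for cv::Vec2f (rho, theta)
int main() {
    std::vector<Vec2f> lineQ{{10.f, 0.2f}, {20.f, 1.9f}, {15.f, 0.3f}};
    // Sort by angle (the second component).
    std::sort(lineQ.begin(), lineQ.end(),
              [](const Vec2f& a, const Vec2f& b) { return a[1] < b[1]; });
    // Keep a line only if its angle differs by more than 1.5 rad
    // from the last line kept.
    std::vector<Vec2f> line;
    std::unique_copy(lineQ.begin(), lineQ.end(), std::back_inserter(line),
                     [](const Vec2f& a, const Vec2f& b) {
                         return b[1] - a[1] <= 1.5f;
                     });
    for (const auto& l : line)
        std::cout << l[0] << " " << l[1] << "\n"; // prints the two kept lines
}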