How to find the set difference of two vectors with no repetitions? - c++

I am writing C++ code to return the elements of vector v1 that are not in vector v2, with no repetitions.
std::set_difference(v1.begin(), v1.end(), v2.begin(), v2.end(),
std::inserter(diff, diff.begin()));
However, when my input v1,v2 are
v1=[137 138 139 140 141 142 143 144 148 150 157 158 161]
v2=[138 157 150 140 137 158 141 139 143 148]
the output is unexpectedly
diff=[ 137 139 140 141 142 143 144 148 150 161]
while the result I expect is
diff=[ 142 144 161]
What should I correct in my function? Thanks.

v2 needs to be sorted, as does v1 (which it already is). The function set_difference assumes both ranges are sorted.
Because of that assumption, the algorithm only has to walk each vector once, and only needs to compare the elements at the current cursor of each vector. This is a significant saving in time and space over an algorithm that has to work with arbitrary, unsorted inputs.
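To make the single-pass walk concrete, here is a rough sketch of the idea. It is an illustration only, not the actual standard library implementation, and the name sortedDifference is made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: elements of a that are not in b.
// Both inputs are assumed to be sorted in ascending order.
std::vector<int> sortedDifference(const std::vector<int>& a,
                                  const std::vector<int>& b)
{
    std::vector<int> out;
    std::size_t i = 0, j = 0;   // one cursor per vector
    while (i < a.size()) {
        if (j == b.size() || a[i] < b[j]) {
            out.push_back(a[i]);   // if b is sorted, it can no longer contain a[i]
            ++i;
        } else if (b[j] < a[i]) {
            ++j;                   // b[j] is too small to matter, skip it
        } else {
            ++i;                   // equal: drop the match from both sides
            ++j;
        }
    }
    return out;
}
```

If this walk is fed the unsorted v2 from the question, the cursor in v2 runs past values it still needed, which is exactly how elements such as 139 and 140 end up in the result.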

Copies the elements from the sorted range [first1, last1) which are
not found in the sorted range [first2, last2) to the range beginning
at d_first
You must sort both vectors before taking their difference.
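A minimal sketch of that fix, assuming it is acceptable to work on sorted copies of the inputs. The wrapper name differenceOf is hypothetical, and the std::unique step is only there to honour the "no repetitions" requirement when v1 itself contains repeated values.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical wrapper: takes copies so the caller's vectors stay untouched.
std::vector<int> differenceOf(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    // Drop repeated values in a so the result has no repetitions.
    a.erase(std::unique(a.begin(), a.end()), a.end());

    std::vector<int> diff;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(diff));
    return diff;
}
```

Called with the v1 and v2 from the question, this should give the three expected values 142, 144 and 161.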

Look at the method:
std::set_difference(v1.begin(), v1.end(), v2.begin(), v2.end(),
std::inserter(diff, diff.begin()));
It's called set_difference for a reason :)
Just use set containers instead of your vector ones. They will make sure that your data is sorted and that the algorithm is successful.
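As a sketch of the set-based variant (the function name differenceAsSet is made up here, and it assumes the data still arrives as vectors): std::set keeps its elements sorted and unique, so the precondition of set_difference holds automatically.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical helper: copies both inputs into sets, then takes the difference.
std::set<int> differenceAsSet(const std::vector<int>& a,
                              const std::vector<int>& b)
{
    std::set<int> sa(a.begin(), a.end());   // sorted and duplicate-free
    std::set<int> sb(b.begin(), b.end());
    std::set<int> diff;
    std::set_difference(sa.begin(), sa.end(), sb.begin(), sb.end(),
                        std::inserter(diff, diff.begin()));
    return diff;
}
```

The trade-off is the extra copying and node allocation; for large inputs, sorting the vectors in place is usually cheaper.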


Fast read of large text file to 1D structure in C++

I need to read a batch of text files of up to 20mb in size, fast.
The text files come in the format below. The numbers need to be read as double, because some of the other files have three decimal places of precision:
0 0 29 175 175 175 175 174
0 1 29 175 175 175 175 174
0 2 29 28 175 175 175 174
0 3 29 28 175 175 175 174
0 4 29 29 175 175 175 174
I would like to store the last six numbers of each line in a single 1D structure, skipping the first two columns. In effect, each column is transposed and the transposed columns are concatenated horizontally:
29 29 29 29 29 175 175 28 28 29 175 175 175 175 175...
Here is my attempt, which is too slow for my purposes.
void MyClass::GetFromFile(std::string filename, int headerLinestoSkip, int ColumnstoSkip, int numberOfColumnsIneed)
{
    std::ifstream file(filename);
    std::string file_line;
    double temp;
    // one temporary vector per column that is kept
    std::vector<std::vector<double>> temp_vector(numberOfColumnsIneed);
    SkipLines(file, headerLinestoSkip);
    while(getline(file, file_line, '\n'))
    {
        std::istringstream ss(file_line);
        // read and discard the leading columns
        for(int i=0; i<ColumnstoSkip; i++)
            ss >> temp;
        // store each remaining value in its column
        for(int i=0; i<numberOfColumnsIneed; i++)
        {
            ss >> temp;
            temp_vector[i].push_back(temp);
        }
    }
    // concatenate the columns into the 1D member vector
    for(int i=0; i<numberOfColumnsIneed; i++)
        this->ClassMemberVector.insert(this->ClassMemberVector.end(), temp_vector[i].begin(), temp_vector[i].end());
}
I have read that memory mapping the file may be helpful, but my attempts at getting the data into the 1D structure I need have not been successful. An example from someone would be very much appreciated!
With 20mb and short lines as you show, that's approx 500 000 lines. Knowing this, there are several factors that could slow down your code:
I/O: with current hardware and OS performance, I can't imagine that this plays a role here;
parsing/conversion: you read each line, build a string stream out of it, and then extract the numbers. This could add overhead, especially on C++ implementations where stream extraction is slower than the old sscanf(). I may be wrong, but again I'm not sure that this overhead would be so huge;
memory allocation for your vectors: this is definitely the first place to look. A vector has a size and a capacity. Each time you add an item beyond the capacity, the vector has to be reallocated, which can mean moving all of its content again and again.
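To get a feel for the reallocation point, a small experiment like the following can count how often a vector's capacity changes while it is being filled. This is only a sketch; the function name countReallocations is made up, and the exact growth policy depends on the standard library implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: push n doubles and count how often capacity() changes.
std::size_t countReallocations(std::size_t n, bool reserveFirst)
{
    std::vector<double> v;
    if (reserveFirst)
        v.reserve(n);                       // one allocation up front
    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<double>(i));
        if (v.capacity() != lastCapacity) { // the vector had to grow
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserveFirst set, the count stays at zero; without it, every growth step copies or moves everything stored so far.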
I'd strongly advise you to run your code under a profiler to identify the bottleneck. Manual timing will be difficult here because your loop contains all the potential problems, and each iteration is certainly too quick for std::chrono to measure the different parts of the loop with sufficient accuracy.
If you can't use a profiler, I'd suggest computing a rough estimate of the number of lines from the file size and taking half of it. Then reserve the corresponding capacity in each temp_vector[i] up front. If you see a clear improvement, you're on the right track and can then fine-tune this approach. If not, edit your question with your new findings and post a comment on this answer.
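A minimal sketch of the pre-reserve idea, written as a free function rather than the member function from the question. The name readColumns and the expectedLines parameter are assumptions for illustration; the hint only affects speed, so a wrong guess does not change the result.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader: skips skipCols leading columns, keeps the next keepCols,
// and returns the kept values column after column in one flat vector.
std::vector<double> readColumns(std::istream& in, int skipCols, int keepCols,
                                std::size_t expectedLines)
{
    std::vector<std::vector<double>> columns(keepCols);
    for (auto& c : columns)
        c.reserve(expectedLines);          // reserve once instead of growing repeatedly

    std::string line;
    double value;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        for (int i = 0; i < skipCols; ++i)
            ss >> value;                   // read and discard
        for (int i = 0; i < keepCols; ++i)
            if (ss >> value)
                columns[i].push_back(value);
    }

    std::vector<double> flat;
    flat.reserve(expectedLines * keepCols);
    for (const auto& c : columns)
        flat.insert(flat.end(), c.begin(), c.end());
    return flat;
}
```

For the hint, something like the file size divided by an assumed average line length (roughly 40 bytes for 20 MB and 500 000 lines) would do, halved if you want to stay on the conservative side.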

Assertion triggered in CGAL when computing polygon union

I'm writing an application in Python that applies a watermark to a font file. The glyphs are non-convex polygons with holes and consist of bezier splines defined in PostScript. The watermark needs to merge with the glyphs, not overlap. I was unable to find a library to do this in Python so I'm using CGAL & C++. I have it working nicely on most glyphs, but it's mysteriously failing on others.
Initially, I was very impressed with CGAL. It looked very comprehensive and sophisticated and seemed to provide all the functionality I needed, but it suffers from one fatal flaw - and I almost can't believe it - the library contains no error handling. No error codes, no exceptions, just assertions - but only in the debug build. In the release build you get a nice segmentation fault instead. Furthermore, the assertions reveal nothing about the nature of the problem.
The program fails only on certain glyphs. The letter 'e' works fine, but it crashes on 'a'. The crash occurs when I attempt to compute the union of the glyph with the watermark. There is nothing visibly wrong with the 'a' glyph. My application parses the PostScript correctly and renders it fine on a QGraphicsView.
This program needs to run on a server and its output sent directly to the customer so it must be reliable. I can't recover from an assertion failure or a segfault, so what can I do?
Even if I get it working reliably, how can I trust that it will never fail? If there was some kind of error handling in place, I could just skip the few glyphs that it fails on, leaving them unwatermarked - not ideal, but acceptable. I just don't understand what the authors of this library were thinking; they went to such tremendous effort to make the most comprehensive geometry library available only to ensure that it's completely unfit for purpose.
Currently, it's looking like I'm going to have to modify the code myself to handle the error in a sensible way, but this just seems so ridiculous.
I'm sorry if I come off as impatient, but I'm way past my deadline and my client isn't going to care about or understand these excuses.
The assertion failure is occurring on line 2141 of multiset.h:
CGAL_multiset_precondition (comp_f(object, nodeP->object) != LARGER);
It happens when I call join() on a BezierPolygonSet. My types are as follows:
typedef CGAL::CORE_algebraic_number_traits NtTraits;
typedef NtTraits::Rational Rational;
typedef NtTraits::Algebraic Algebraic;
typedef CGAL::Cartesian<Rational> RatKernel;
typedef CGAL::Cartesian<Algebraic> AlgKernel;
typedef RatKernel::Point_2 BezierRatPoint;
typedef CGAL::Arr_Bezier_curve_traits_2<RatKernel, AlgKernel, NtTraits> Traits;
typedef Traits::Point_2 BezierPoint;
typedef Traits::Curve_2 BezierCurve;
typedef CGAL::Gps_traits_2<Traits> BezierTraits;
typedef BezierTraits::X_monotone_curve_2 BezierXMonotoneCurve;
typedef BezierTraits::General_polygon_2 BezierPolygon;
typedef BezierTraits::General_polygon_with_holes_2 BezierPolygonWithHoles;
typedef CGAL::Gps_default_dcel<BezierTraits> BezierDcelTraits;
typedef CGAL::General_polygon_set_2<BezierTraits, BezierDcelTraits> BezierPolygonSet;
Any help would be much appreciated. Thanks.
I have a module called Geometry which wraps the CGAL code and exposes a bunch of geometric primitives (Point, Curve, LineSegment, CubicBezier, Path) and the functions:
PathList toPathList(const PolyList& polyList);
PolyList toPolyList(const PathList& paths);
The Path class has a method called computeUnion, which looks like this:
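PathList Path::computeUnion(const PathList& paths1, const PathList& paths2) {
    PolyList polyList1 = toPolyList(paths1);
    PolyList polyList2 = toPolyList(paths2);
    cgal_wrap::BezierPolygonSet polySet;
    // Add every polygon from both lists to the set
    for (auto i : polyList1) {
        polySet.join(i);
    }
    for (auto i : polyList2) {
        polySet.join(i);
    }
    // Extract the merged polygons
    PolyList polyList;
    polySet.polygons_with_holes(std::back_inserter(polyList));
    return toPathList(polyList);
}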
The error occurs when I call join(). The polygons are created from paths like so:
PolyList toPolyList(const PathList& paths) {
cgal_wrap::Traits traits;
cgal_wrap::Traits::Make_x_monotone_2 fnMakeXMonotone = traits.make_x_monotone_2_object();
cgal_wrap::RatKernel ratKernel;
cgal_wrap::RatKernel::Equal_2 fnEqual = ratKernel.equal_2_object();
PolyList polyList; // The final polygons with holes
cgal_wrap::BezierPolygon outerPoly;
std::list<cgal_wrap::BezierPolygon> holes;
std::list<cgal_wrap::BezierXMonotoneCurve> monoCurves;
bool first = true;
cgal_wrap::BezierRatPoint firstPoint;
// For each path in the list
for (auto i = paths.begin(); i != paths.end(); ++i) {
const Path& path = *i;
cgal_wrap::BezierRatPoint prevEndPoint;
// For each curve in the path
for (auto j = path.begin(); j != path.end(); ++j) {
const Curve& curve = **j;
std::list<cgal_wrap::BezierRatPoint> points;
if (curve.type() == LineSegment::type) {
const LineSegment& lseg = dynamic_cast<const LineSegment&>(curve);
cgal_wrap::BezierRatPoint A = lseg.A();
if (j != path.begin()) {
if (A != prevEndPoint) {
A = prevEndPoint;
else if (curve.type() == CubicBezier::type) {
const CubicBezier& bezier = dynamic_cast<const CubicBezier&>(curve);
cgal_wrap::BezierRatPoint A = bezier.A();
if (j != path.begin()) {
if (A != prevEndPoint) {
A = prevEndPoint;
bool bClosesCurve = false;
if (!first && Point(points.back()) == Point(firstPoint)) {
bClosesCurve = true;
prevEndPoint = points.back();
cgal_wrap::BezierCurve cgalCurve(points.begin(), points.end());
std::list<CGAL::Object> monoObjs;
fnMakeXMonotone(cgalCurve, std::back_inserter(monoObjs));
// Append the x-monotone curves to the list
cgal_wrap::BezierXMonotoneCurve monoCurve;
for (auto o = monoObjs.begin(); o != monoObjs.end(); ++o) {
if (CGAL::assign(monoCurve, *o)) {
if (!first) {
// If this curve closes the current chain, thereby creating a new polygon
if (bClosesCurve) {
// Add the new polygon to the list
cgal_wrap::BezierPolygon subPoly(monoCurves.begin(), monoCurves.end());
if (subPoly.orientation() == CGAL::COUNTERCLOCKWISE) {
if (!outerPoly.is_empty()) {
polyList.push_back(cgal_wrap::BezierPolygonWithHoles(outerPoly, holes.begin(), holes.end()));
outerPoly = subPoly;
else {
first = true;
else {
// This is the first curve in the chain - store its source point
firstPoint = cgalCurve.control_point(0);
first = false;
polyList.push_back(cgal_wrap::BezierPolygonWithHoles(outerPoly, holes.begin(), holes.end()));
return polyList;
Notice that I'm careful to ensure that the polygon boundaries have no gaps by setting the first point of curve n+1 to the last point of curve n in case they were slightly different. I was hoping this would solve the problem, but it didn't. I can't think of any other things that might make the shapes invalid.
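The gap check described here could also be run as a separate validation pass before anything is handed to CGAL. The following is only a sketch of that idea: Pt and Span are made-up stand-ins for the real Geometry and CGAL types, and it assumes exact integer coordinates like those in the glyph data.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the question's types: only the end points matter here.
struct Pt { long x, y; };
struct Span { Pt start, end; };

inline bool samePoint(const Pt& a, const Pt& b)
{
    return a.x == b.x && a.y == b.y;
}

// True if every span starts where the previous one ended and the last span
// returns to the start of the first one.
bool isClosedChain(const std::vector<Span>& spans)
{
    if (spans.empty())
        return false;
    for (std::size_t i = 1; i < spans.size(); ++i)
        if (!samePoint(spans[i - 1].end, spans[i].start))
            return false;
    return samePoint(spans.back().end, spans.front().start);
}
```

A check like this only rules out gaps; it says nothing about orientation or self-intersections, which can also make a boundary invalid.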
Here is the successful merging of the 'e' glyph with the watermark (an X).
Here is what the 'a' glyph looks like. The merging fails on this glyph.
Here are the curves that make up the 'a' glyph after parsing it from PostScript. There doesn't appear to be anything wrong with it. As I said, it looks okay when rendered. The error probably occurs during the translation from this data into the CGAL types. The line segments get translated into BezierCurves with 2 control points. I will investigate further.
LineSegment[(344, 0), (409, 0)]
CubicBezier[(409, 0), (403, 24), (400, 68), (400, 161)]
LineSegment[(400, 161), (400, 324)]
CubicBezier[(400, 324), (400, 437), (330, 485), (232, 485)]
CubicBezier[(232, 485), (180, 485), (121, 472), (66, 437)]
LineSegment[(66, 437), (94, 385)]
CubicBezier[(94, 385), (127, 405), (167, 424), (224, 424)]
CubicBezier[(224, 424), (283, 424), (326, 392), (326, 320)]
LineSegment[(326, 320), (326, 290)]
LineSegment[(326, 290), (236, 287)]
CubicBezier[(236, 287), (188, 285), (150, 280), (118, 264)]
CubicBezier[(118, 264), (70, 242), (38, 199), (38, 136)]
CubicBezier[(38, 136), (38, 45), (102, -10), (188, -10)]
CubicBezier[(188, -10), (247, -10), (293, 18), (330, 53)]
LineSegment[(330, 53), (344, 0)]
LineSegment[(326, 234), (326, 114)]
CubicBezier[(326, 114), (304, 91), (260, 52), (201, 52)]
CubicBezier[(201, 52), (147, 52), (113, 88), (113, 140)]
CubicBezier[(113, 140), (113, 171), (127, 198), (154, 213)]
CubicBezier[(154, 213), (175, 224), (202, 230), (243, 231)]
LineSegment[(243, 231), (326, 234)]
Here are the 'a' glyph curves after translation into CGAL curves. Notice that they exactly match the curves before translation implying that none of them had to be split into X-monotone subcurves; they must have all been X-monotone already.
Outer boundary:
2 344 0 409 0 [1] | 344 0 --> 409 0
4 409 0 403 24 400 68 400 161 [1] | 409 0 --> 400 161
2 400 161 400 324 [1] | 400 161 --> 400 324
4 400 324 400 437 330 485 232 485 [1] | 400 324 --> 232 485
4 232 485 180 485 121 472 66 437 [1] | 232 485 --> 66 437
2 66 437 94 385 [1] | 66 437 --> 94 385
4 94 385 127 405 167 424 224 424 [1] | 94 385 --> 224 424
4 224 424 283 424 326 392 326 320 [1] | 224 424 --> 326 320
2 326 320 326 290 [1] | 326 320 --> 326 290
2 326 290 236 287 [1] | 326 290 --> 236 287
4 236 287 188 285 150 280 118 264 [1] | 236 287 --> 118 264
4 118 264 70 242 38 199 38 136 [1] | 118 264 --> 38 136
4 38 136 38 45 102 -10 188 -10 [1] | 38 136 --> 188 -10
4 188 -10 247 -10 293 18 330 53 [1] | 188 -10 --> 330 53
2 330 53 344 0 [1] | 330 53 --> 344 0
2 326 234 326 114 [1] | 326 234 --> 326 114
4 326 114 304 91 260 52 201 52 [1] | 326 114 --> 201 52
4 201 52 147 52 113 88 113 140 [1] | 201 52 --> 113 140
4 113 140 113 171 127 198 154 213 [1] | 113 140 --> 154 213
4 154 213 175 224 202 230 243 231 [1] | 154 213 --> 243 231
2 243 231 326 234 [1] | 243 231 --> 326 234
This polygon causes the assertion failure when added to a BezierPolygonSet. Any ideas?
You can customize error handling; see the manual.
Please attach a complete standalone program. Nothing looks wrong to me with statements you listed.
You can approximate the Bezier curves with polylines and process the whole things using polylines. If the problem is with the handling of Bezier curves, then this would solve it. If this is acceptable, it will also be more efficient.
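As a rough illustration of the polyline approach, a cubic Bezier can be sampled at evenly spaced parameter values. This sketch uses plain doubles and a made-up Point2 struct instead of CGAL types, and uniform sampling is just the simplest choice; an adaptive scheme would place points more economically.

```cpp
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };   // hypothetical plain point type, not a CGAL type

// Sample a cubic Bezier curve at segments + 1 evenly spaced parameter values.
// segments must be at least 1.
std::vector<Point2> flattenCubic(Point2 p0, Point2 p1, Point2 p2, Point2 p3,
                                 std::size_t segments)
{
    std::vector<Point2> polyline;
    polyline.reserve(segments + 1);
    for (std::size_t i = 0; i <= segments; ++i) {
        const double t = static_cast<double>(i) / static_cast<double>(segments);
        const double u = 1.0 - t;
        // Bernstein weights of the four control points
        const double b0 = u * u * u;
        const double b1 = 3.0 * u * u * t;
        const double b2 = 3.0 * u * t * t;
        const double b3 = t * t * t;
        polyline.push_back({b0 * p0.x + b1 * p1.x + b2 * p2.x + b3 * p3.x,
                            b0 * p0.y + b1 * p1.y + b2 * p2.y + b3 * p3.y});
    }
    return polyline;
}
```

The first and last samples coincide with the curve's end points, so consecutive curves of a glyph outline still join up after flattening. The resulting points could then be fed to a polygon type with straight edges instead of the Bezier traits.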
We have fixed a bug in the CGAL component that handles Bezier curves, namely, Arr_Bezier_curve_traits_2.h.

Read matrix to 2D array in C/C++

What is the simplest way to read/input a matrix of numbers into an array in C++?
This is the file content (dimensions are unknown):
283 278 284 290 290 286 273 266 266 266 261 252 246
382 380 379 381 382 379 384 387 385 382 376 365 357
285 282 281 279 276 273 272 264 255 255 247 243 237
196 190 186 183 183 180 179 186 191 195 195 188 187
245 237 226 220 221 222 225 228 234 245 252 264 272
283 278 284 290 290 286 273 266 266 266 261 252 246
I've tried a lot of suggested code, but none of it seems to work for me... :(
I want to do the following with the matrix:
MATRIX[i][j] = MATRIX[i][j] + rand()-RAND_MAX/2;
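What should I put in the if block to read the matrix?
#include <cstdio>
#include <fstream>
#include <iostream>
int main() {
    std::ifstream pFile;
    pFile.open("test.txt");
    if (pFile.is_open()) { /* what goes here to read the matrix? */ }
    else {
        printf("Error reading the file!\n");
        return 1;
    }
}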
First, as others suggested, use a std::vector<std::vector<int>>. It will make things a lot simpler.
#include <vector>
typedef std::vector<int> IntVector;
typedef std::vector<IntVector> IntVector2D;
So our type is IntVector2D. (I defined the one-dimensional vector to be used later)
Next, we want to set up a loop that reads one line at a time, parses the line, and stores the int's found on the line in one row of the matrix. To do that, we can read the line into a string, and use istringstream to do the parsing.
Last, for each line stored, we apply the changes to each item on the row according to your random number function.
So here is a sample of all of this put together:
#include <fstream>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <iostream>
#include <cstdlib>

typedef std::vector<int> IntVector;
typedef std::vector<IntVector> IntVector2D;

using namespace std;

// random number function to apply
int ApplyRand(int num)
{ return num + rand() - RAND_MAX/2; }

// Print the matrix, one row per line
void OutputMatrix(const IntVector2D& m)
{
    cout << "\n";
    IntVector2D::const_iterator it = m.begin();
    while (it != m.end())
    {
        copy(it->begin(), it->end(), ostream_iterator<int>(cout, " "));
        cout << "\n";
        ++it;
    }
}

// Transform the numbers in the matrix
void TransformMatrix(IntVector2D& m)
{
    IntVector2D::iterator it = m.begin();
    while (it != m.end())
    {
        transform(it->begin(), it->end(), it->begin(), ApplyRand);
        ++it;
    }
}

int main()
{
    IntVector2D matrix;
    ifstream pFile("test.txt");
    string s;
    while ( std::getline(pFile, s) )
    {
        // create empty row on back of matrix
        matrix.push_back(IntVector());
        IntVector& vBack = matrix.back();
        // create an istringstream to parse
        istringstream ss(s);
        // parse the data, adding each number to the last row of the matrix
        copy(istream_iterator<int>(ss), istream_iterator<int>(), back_inserter(vBack));
    }
    // output the matrix
    OutputMatrix(matrix);
    // Apply rand to each number
    TransformMatrix(matrix);
    // output the updated matrix
    OutputMatrix(matrix);
}
283 278 284 290 290 286 273 266 266 266 261 252 246
382 380 379 381 382 379 384 387 385 382 376 365 357
285 282 281 279 276 273 272 264 255 255 247 243 237
196 190 186 183 183 180 179 186 191 195 195 188 187
245 237 226 220 221 222 225 228 234 245 252 264 272
283 278 284 290 290 286 273 266 266 266 261 252 246
-16059 2362 -9765 10407 3076 -373 -4632 13241 10845 8347 -10417 12014 7144
826 -6042 -15513 -13007 -4059 -11177 -10563 16395 -1394 -12099 -15854 -15726 -3644
1323 2615 3616 3791 -10660 5616 -1340 -4581 -14259 3784 9531 10159 889
-6293 12510 7614 15122 14133 1470 -11540 -1056 -8481 12065 -9320 9352 11448
16524 16611 3880 -3304 -7439 -6420 11371 -15377 -3833 -13103 6059 -14277 -15823
14006 -7065 -7157 3171 6555 11349 7695 -227 -9388 8253 -772 -1125 14964
Note the use of std::copy to extract the items from the istringstream, and back_inserter, which is responsible for calling push_back on the last row of the matrix.
Also, the usage of std::transform allows us to call a function on each element, "transforming" the element from the original value to the changed value using ApplyRand as the function to do this transformation.
Here's a simple way to read in a matrix of unknown size using vectors. The advantage of vectors over arrays if you don't know the dimensions that you're working with is that you don't need to worry about resizing your data structure if you run out of space.
std::vector<std::vector<int> > matrix;
std::string line;
int value;
// read in matrix
std::ifstream file("path/to/file.txt");
while(std::getline(file, line)) {
    std::vector<int> row;
    std::istringstream iss(line);
    // collect every integer on this line
    while(iss >> value){
        row.push_back(value);
    }
    matrix.push_back(row);
}
First of all, we declare a vector of vectors to keep our matrix in.
Note that the advantage of vectors over arrays if you don't know the dimensions that you're working with is that you don't need to worry about resizing your data structure if you run out of space. This is why we use vectors instead of arrays.
After that, we use stringstream to read all integers from input.
The while loop continues as long as there is another line to read (the stream returned by getline() tests false once no lines are left). In each step we read one whole line, however long it is, then separate the line's integers with a string stream and put them in a vector. Finally, we add that vector to our 2D matrix vector.
I wrote this code:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <fstream>
using namespace std;

int main () {
    ifstream fin("input.txt");
    string s;
    vector <vector <int> > matrix;
    // read the file line by line; each line becomes one row
    while (getline(fin, s)) {
        stringstream input(s);
        int temp;
        vector <int> currentLine;
        while (input >> temp)
            currentLine.push_back(temp);
        matrix.push_back(currentLine);
    }
    // print the matrix
    for (unsigned int i = 0; i < matrix.size(); i++) {
        for (unsigned int j = 0; j < matrix[i].size(); j++)
            cout << matrix[i][j] << " ";
        cout << endl;
    }
    return 0;
}
The output is exactly what you want. Note that in my console the first line had scrolled out of view and I had to scroll up to see it, but be sure it's there. Give it a try.

Direct-inclusion sorting

What is the other name for direct-inclusion sorting, and what is the algorithm for it?
I have been searching the Internet, but I cannot find a straight answer. I found this algorithm for straight insertion sort, and some books say it is the same as direct-inclusion sorting, but I doubt it because the book is in Russian. So I want to confirm whether that is true or whether I have made a translation error.
Code in C++:
#include <iostream>
#include <utility>
using namespace std;

int main(int argc, char* argv[])
{
    int arr[8] = {27, 412, 71, 81, 59, 14, 273, 87},i,j;
    for (j=1; j<8; j++){
        if (arr[j] < arr[j-1]) {
            // Work with i so that the value of j is not changed
            i = j;
            // Swap neighbours until the right place is found
            do {
                swap(arr[i], arr[i-1]);
                i--;
                // guard against running off the start of the array
                if (i == 0)
                    break;
            }
            while (arr[i] < arr[i-1]) ;
        }
        // print the array after each pass
        for (i=0;i<8;i++)
            cout << arr[i]<< ' ';
        cout << '\n';
    }
    return 0;
}
27 412 71 81 59 14 273 87
27 71 412 81 59 14 273 87
27 71 81 412 59 14 273 87
27 59 71 81 412 14 273 87
14 27 59 71 81 412 273 87
14 27 59 71 81 273 412 87
14 27 59 71 81 87 273 412
The posted code is Insertion sort.
Most implementations will copy an out-of-order element to a temporary variable and then work backwards, moving elements up until the correct open spot is found to "insert" the current element. That's what the pseudocode in the Wikipedia article shows.
Some implementations just bubble the out-of-order element backwards while it's less than the element to its left. That's what the inner do...while loop in the posted code shows.
Both methods are valid ways to implement Insertion sort.
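For comparison, here is a sketch of the first, shift-based variant: the out-of-order element is held in a temporary while larger elements move up. The function name and the use of std::vector are choices made for this example, not taken from the posted code.

```cpp
#include <cstddef>
#include <vector>

// Shift-based insertion sort: the out-of-order element is held in a temporary
// while larger elements move one slot to the right.
void insertionSort(std::vector<int>& a)
{
    for (std::size_t j = 1; j < a.size(); ++j) {
        const int key = a[j];
        std::size_t i = j;
        while (i > 0 && key < a[i - 1]) {
            a[i] = a[i - 1];   // shift the larger element up
            --i;
        }
        a[i] = key;            // drop the key into the gap
    }
}
```

Both variants visit the same positions; the shift-based one simply does one assignment per step instead of a full swap.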
The code you posted does not look like an algorithm for insertion sort, since you are repeatedly swapping two neighboring elements.
Your code looks much more like some kind of bubble sort.
Here is a list of common sorting algorithms:
"Straight insertion" and "direct inclusion" sound like pretty much the same thing, so I guess they are probably different names for the same algorithm.
Possibly the "straight" prefix indicates that only one container is used. However, if two neighboring elements are swapped, I would not call it insertion sort, since no "insert" is done at all.
Given the fact that the term "direct inclusion sort" yields no google hits at all, and "direct insertion sorting" only 27 hits, the first three of which are this post here and two identically phrased blog posts, I doubt that this term has any widely accepted meaning. So the part of your question about
some book its saying they are the same with direct direct-inclusion sorting
is hard to answer, unless we find a clear definition of what direct-inclusion sorting actually is.

How do I read one number at a time and store it in an array, skipping duplicates?

I'm trying to read numbers from a file into an array, discarding duplicates. For instance, say the following numbers are in a file:
41 254 14 145 244 220 254 34 135 14 34 25
Though the number 34 occurs twice in the file, I would only like to store it once in the array. How would I do this?
(fixed, but I guess a better term would be a 64 bit Unsigned int) (was using numbers above 255)
vector<int64_t> v;
copy(istream_iterator<int64_t>(cin), istream_iterator<int64_t>(), back_inserter(v));
set<int64_t> s;
vector<int64_t> ov; ov.reserve(v.size());
for( auto i = v.begin(); i != v.end(); ++i ) {
    // insert() reports in .second whether the value was new to the set
    if ( s.insert(*i).second )
        ov.push_back(*i);
}
// ov contains only unique numbers in the same order as the original input file.
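The same idea can be done in a single pass without the intermediate vector. This is a sketch under the assumption that the values fit in a 64-bit signed integer; the name readUnique is made up, and std::unordered_set is swapped in for std::set because only membership matters here, not ordering.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: reads whitespace-separated integers from any input
// stream and keeps the first occurrence of each value, in input order.
std::vector<std::int64_t> readUnique(std::istream& in)
{
    std::unordered_set<std::int64_t> seen;
    std::vector<std::int64_t> unique;
    std::int64_t value;
    while (in >> value) {
        if (seen.insert(value).second)   // .second is true only for new values
            unique.push_back(value);
    }
    return unique;
}

// Convenience overload for trying it out on a string.
std::vector<std::int64_t> readUnique(const std::string& text)
{
    std::istringstream in(text);
    return readUnique(in);
}
```

To read from a file instead, pass a std::ifstream opened on that file as the stream argument.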