Declaring an object variable makes all the previous instructions slower - c++

I ran into an annoying problem and I have no idea what is causing it. I hope you can help me find a solution.
Framework: I implemented a sparse_matrix class using the CSR representation and I used this object as the basis for a recommendation system. The class is defined as follows:
class sparse_matrix_csr
{
public:
sparse_matrix_csr();
sparse_matrix_csr(const std::vector<int> &row_indices);
sparse_matrix_csr(const std::vector<int> &row_indices, const size_t nb_columns);
// other member functions omitted
private:
std::vector<int> _row_indices;
std::vector<int> _row_start_indices;
std::vector<int> _column_indices;
std::vector<double> _values;
bool _column_sorted_by_index;
};
The _row_indices vector contains the indices of the rows of the matrix. _row_start_indices contains the index of the first element of a given row in the _column_indices, which contains the column indices, and _values vector, which contains the elements of the matrix. In particular, the constructor sparse_matrix_csr(const std::vector<int> &row_indices, const size_t nb_columns) is implemented as follows:
sparse_matrix_csr::sparse_matrix_csr(const std::vector<int> &row_indices,
const size_t nb_columns):
_row_indices(row_indices),
_row_start_indices(row_indices.size()),
_column_indices(row_indices.size() * nb_columns, 0),
_values(row_indices.size() * nb_columns, 0.0),
_column_sorted_by_index(true)
{
for (size_t i = 0; i < _row_start_indices.size(); ++i)
_row_start_indices[i] = i * nb_columns;
}
This constructor takes in the indices of the rows of the sparse matrix and the number of elements that will be contained in each row. In fact, in the application I am considering, I have some matrices that are sparse only with respect to the rows.
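A minimal sketch of the layout this constructor produces, assuming every stored row holds exactly nb_columns entries. The free functions row_starts and flat_index are hypothetical helpers for illustration only; they are not members of sparse_matrix_csr.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: start offset of each stored row when every row
// holds exactly nb_columns entries (mirrors the loop in the constructor).
std::vector<int> row_starts(std::size_t nb_rows, std::size_t nb_columns)
{
    std::vector<int> starts(nb_rows);
    for (std::size_t i = 0; i < nb_rows; ++i)
        starts[i] = static_cast<int>(i * nb_columns);
    return starts;
}

// Hypothetical helper: position in _column_indices / _values of the k-th
// entry of the row stored at position row_pos.
std::size_t flat_index(const std::vector<int> &starts,
                       std::size_t row_pos, std::size_t k)
{
    return static_cast<std::size_t>(starts[row_pos]) + k;
}
```

For 3 rows of 4 entries each, row_starts(3, 4) yields the offsets 0, 4, 8, and entry 2 of the second stored row sits at flat index 6.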
Problem: The algorithm is structured as follows
// Instruction block 1
{
// Do something
}
sparse_matrix_csr mat(list_indices, nb_columns);
// Instruction block 2
{
// Do something
}
If I run the first block of instructions alone (commenting out all that follows), my algorithm runs smoothly. However, if I uncomment the second part of the algorithm, the first part slows down a lot. I have traced the slow-down to the declaration of mat. However, I cannot explain this retroactive effect on the first part of the algorithm. The full algorithm is reported at the end of my question.
My considerations: I have never observed such a retroactive effect and therefore I am a little confused. One possibility I have considered is a memory management problem that slows down the earlier part of the algorithm. I am currently working on matrices that have around 2e6 elements, i.e. around 4e6 values stored in 2 vectors. I read in another post on Stack Overflow that the data contained in a std::vector is allocated on the heap. Is that always true, even if I initialize a vector with a given size as I do in the sparse matrix constructors above?
If you need some clarification, do not hesitate!
Thanks in advance,
Pierpaolo
Full algorithm:
void collaborative_filtering_mpi (std::string ratings_file, std::string targets_file,
std::string output_file, int k_neighbors, double shrinkage_factor,
bool output_debug_data)
{
std::cout << "COLLABORATIVE FILTERING ALGORITHM - k = " << k_neighbors
<< " , d = " << shrinkage_factor << std::endl;
stopwatch sw_total;
sw_total.start();
//-------------------------------------
// STEP 1: Read data from input files
//-------------------------------------
std::cout << "1) Read input files: ";
stopwatch sw;
sw.start();
//-------------------------------
// 1.1) Read user rating matrix
//-------------------------------
// Initialization
sparse_matrix_csr user_item_rating_matrix(ratings_file, true, true, true); // file is sorted, skip header
//-------------------------------------------------
// 1.2) Read targets (user, item) to be predicted
//-------------------------------------------------
// Initialization
sparse_matrix_csr targets(targets_file, true, false, false); // do not read ratings column
sw.stop();
double time_input = sw.get_duration();
std::cout << time_input << std::endl;
//-----------------------------------------
// STEP 2: Pre-computations
//-----------------------------------------
std::cout << "2) Pre-computations: ";
sw.start();
//-----------------------------------------------------------------------
// 2.1) Sort user_item_rating_matrix and compute relevant sizes of the problem
//-----------------------------------------------------------------------
// Sort: if file is sorted, sort should do nothing.
user_item_rating_matrix.sort_columns_by_index();
// Compute list of users and list of items
std::vector<int> list_users;
user_item_rating_matrix.rows(list_users);
std::vector<int> list_items;
user_item_rating_matrix.columns(list_items);
// [DEBUG]: print user_rating_matrix
if (output_debug_data)
{
std::ofstream ofs_debug("data_debug/debug_user_rating_matrix.txt");
ofs_debug << user_item_rating_matrix;
ofs_debug.close();
}
//-------------------------------------------------------------
// 2.2) Compute inverse user rating matrix (indexed by column)
//-------------------------------------------------------------
// Initialize item_user_rating_matrix: it is a sparse matrix with
// items on the rows, users on the columns and rating as values.
// This variable will be helpful when computing similarities.
sparse_matrix_csr item_user_rating_matrix;
// Compute item_user_rating_matrix by transposing the user_item_rating_matrix
transpose (user_item_rating_matrix, item_user_rating_matrix);
// [DEBUG]: print item_user_matrix_on_file
if (output_debug_data)
{
std::ofstream ofs_debug("data_debug/debug_item_user_matrix.txt");
ofs_debug << item_user_rating_matrix;
ofs_debug.close();
}
//-----------------------------------------------
// 2.3) sort targets and compute relevant sizes
//-----------------------------------------------
// Compute list of target items
std::vector<int> list_target_items;
targets.columns(list_target_items);
// [DEBUG]: print targets
if (output_debug_data)
{
std::ofstream ofs_debug("data_debug/debug_targets.txt");
ofs_debug << targets;
ofs_debug.close();
}
//--------------------------------------------------------------
// 2.4) Compute difference between list_items and list_targets
//--------------------------------------------------------------
std::vector<int> list_non_target_items;
compute_difference_vector (list_items, list_target_items, list_non_target_items);
// [DEBUG]
if (output_debug_data)
{
std::ofstream ofs_debug("data_debug/debug_difference_vector.txt");
ofs_debug << "list_items - size: " << list_items.size() << std::endl;
for (std::vector<int>::const_iterator iter = list_items.begin(); iter != list_items.end(); ++iter)
ofs_debug << (*iter) << ",";
ofs_debug << std::endl << std::endl;
ofs_debug << "list_target_items - size: " << list_target_items.size() << std::endl;
for (std::vector<int>::const_iterator iter = list_target_items.begin(); iter != list_target_items.end(); ++iter)
ofs_debug << (*iter) << ",";
ofs_debug << std::endl << std::endl;
ofs_debug << "list_non_target_items - size: " << list_non_target_items.size() << std::endl;
for (std::vector<int>::const_iterator iter = list_non_target_items.begin(); iter != list_non_target_items.end(); ++iter)
ofs_debug << (*iter) << ",";
ofs_debug << std::endl << std::endl;
ofs_debug.close();
}
//--------------------------------------------
// 2.5) Compute average rating for each user
//--------------------------------------------
dictionary<int, double> average_rating_vector;
compute_average_rating(user_item_rating_matrix, average_rating_vector);
if (output_debug_data)
{
std::ofstream ofs_debug("data_debug/debug_average_rating_vector.txt");
for (dictionary<int, double>::const_iterator iter = average_rating_vector.begin();
iter != average_rating_vector.end(); ++iter)
ofs_debug << (*iter).get_key() << ": " << (*iter).get_value() << std::endl;
ofs_debug.close();
}
sw.stop();
std::cout << sw.get_duration() << std::endl;
//-------------------------------------
// STEP 3: Similarity matrix
//-------------------------------------
std::cout << "3) Compute similarity matrix: ";
sw.start();
// Initialize similarity_matrix with target items on the rows.
sparse_matrix_csr similarity_matrix(list_target_items,
list_target_items.size() + list_non_target_items.size());
// compute similarity matrix
compute_similarity_matrix_mpi(similarity_matrix,
item_user_rating_matrix,
average_rating_vector,
list_target_items,
list_non_target_items,
shrinkage_factor);
// [DEBUG]: print similarity matrix sorted by similarity
if (output_debug_data)
{
std::ofstream ofs("data_debug/debug_similarity_matrix.txt");
ofs << similarity_matrix;
ofs.close();
}
sw.stop();
std::cout << sw.get_duration() << std::endl;
//---------------------------------------------------------------
// STEP 4: Find top-K similar elements with positive similarity
//---------------------------------------------------------------
std::cout << "4) Find top-K similar elements:" << std::endl;
sw.start();
if (k_neighbors > 0)
{
//---------------------------------------------------
// 4.1) Sort similarity matrix by rating (row-wise)
//---------------------------------------------------
std::cout << " .... Sort similarity matrix by rating: ";
sw.start();
// Sort all the rows of the similarity_matrix by similarity.
// If two items have the same rating, sort them in descending order of item.
//similarity_matrix.sort_columns_by_value();
similarity_matrix.sort_columns_by_value();
// [DEBUG]: print similarity matrix sorted by similarity
if (output_debug_data)
{
std::ofstream ofs ("data_debug/debug_similarity_matrix_sorted_by_rating.txt");
ofs << similarity_matrix;
ofs.close();
}
sw.stop();
std::cout << sw.get_duration() << std::endl;
//--------------------------------------------------------
// 4.2) Cut the useless columns of the similarity matrix
//--------------------------------------------------------
std::cout << " .... Reduce similarity matrix: ";
sw.start();
sparse_matrix_csr similarity_matrix_reduced(list_target_items,
k_neighbors);
reduce_similarity_matrix_mpi (similarity_matrix,
similarity_matrix_reduced,
k_neighbors);
// [DEBUG]: print similarity matrix sorted by similarity
if (output_debug_data)
{
std::ofstream ofs ("data_debug/debug_similarity_matrix_reduced.txt");
ofs << similarity_matrix_reduced;
ofs.close();
}
sw.stop();
std::cout << sw.get_duration() << std::endl;
//---------------------------------------------------
// 4.3) Sort the reduced similarity matrix by items
//---------------------------------------------------
std::cout << " .... Sort similarity matrix by index: ";
sw.start();
// Sort all the rows of the similarity_matrix by index.
similarity_matrix_reduced.sort_columns_by_index();
// [DEBUG]: print similarity matrix sorted by similarity
if (output_debug_data)
{
std::ofstream ofs ("data_debug/debug_similarity_matrix_sorted.txt");
ofs << similarity_matrix_reduced;
ofs.close();
}
sw.stop();
std::cout << sw.get_duration() << std::endl;
//-----------------------------------------
// STEP 5: Compute predictions for targets
//-----------------------------------------
std::cout << "5) Compute predictions: ";
sw.start();
compute_predicted_ratings_mpi (targets,
user_item_rating_matrix,
similarity_matrix_reduced);
sw.stop();
std::cout << sw.get_duration() << std::endl;
}
else
{
//---------------------------------------------------
// 4.3) Sort the reduced similarity matrix by items
//---------------------------------------------------
std::cout << " .... Sort similarity matrix by index: ";
sw.start();
// Sort all the rows of the similarity_matrix by index.
// similarity_matrix.sort_columns_by_index();
similarity_matrix.sort_columns_by_index();
// [DEBUG]: print similarity matrix sorted by similarity
if (output_debug_data)
{
std::ofstream ofs ("data_debug/debug_similarity_matrix_sorted.txt");
ofs << similarity_matrix;
ofs.close();
}
sw.stop();
std::cout << sw.get_duration() << std::endl;
//-----------------------------------------
// STEP 5: Compute predictions for targets
//-----------------------------------------
std::cout << "5) Compute predictions: ";
sw.start();
compute_predicted_ratings_mpi (targets,
user_item_rating_matrix,
similarity_matrix);
sw.stop();
std::cout << sw.get_duration() << std::endl;
}
//-----------------------------------------------------
// STEP 6: Print the prediction matrix in output file
//-----------------------------------------------------
std::cout << "6) Print predictions on file: ";
sw.start();
std::ofstream ofs_output(output_file);
targets.print(ofs_output);
sw.stop();
double time_output = sw.get_duration();
std::cout << time_output << std::endl;
sw_total.stop();
double time_total = sw_total.get_duration();
std::cout << ">> Total computation time: " << time_total << std::endl;
std::cout << ">> Total computation time - no input/output: " << (time_total - time_input - time_output) << std::flush;
}
UPDATE - 05 February 2015: I don't know what happened, but the code is now running fine on my 64-bit machine. However, it is still very slow on a 32-bit VM that I need to use to run my code. I measured the size of a sparse_matrix_csr object using sizeof and it occupies 52 bytes. I think this might be causing a cache problem (credit goes to @Dark Falcon). Do you have any ideas on how I could solve this problem and make my algorithm run efficiently on the 32-bit VM?
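A side note on the sizeof measurement, as a minimal sketch: it assumes the default allocator, which on the common standard library implementations keeps vector elements in dynamically allocated storage. sizeof only counts the fixed-size bookkeeping stored inside the object, so it does not grow with the number of elements. The helper names below are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: bytes of element storage currently reserved by a
// vector. This storage is not included in sizeof(v).
template <typename T>
std::size_t element_storage_bytes(const std::vector<T> &v)
{
    return v.capacity() * sizeof(T);
}

// Hypothetical helper: fixed-size footprint of the vector object itself,
// independent of how many elements it holds.
template <typename T>
std::size_t object_bytes(const std::vector<T> &v)
{
    return sizeof(v);
}
```

For a std::vector<double> holding 2e6 values, element_storage_bytes reports at least 16e6 bytes, while object_bytes stays at the same small constant as for an empty vector.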
UPDATE - 06 February 2015: Well, the problem comes and goes. I tried changing the implementation of the sparse_matrix_csr class, wrapping the data in a shared_ptr in the following way:
class sparse_matrix_csr
{
public:
// public methods omitted
private:
class sparse_matrix_csr_data
{
public:
sparse_matrix_csr_data() {}
sparse_matrix_csr_data(const std::vector<int> &row_indices): _row_indices(row_indices) {}
sparse_matrix_csr_data(const std::vector<int> &row_indices, const size_t nb_columns);
sparse_matrix_csr_data(const std::string file, bool file_is_sorted, bool skip_header, bool read_values);
// data
std::vector<int> _row_indices;
std::vector<int> _row_start_indices;
std::vector<int> _column_indices;
std::vector<double> _values;
bool _column_sorted_by_index;
};
std::shared_ptr<sparse_matrix_csr_data> _data;
};
This modification did not improve things. I am currently having problems both on the 32-bit VM and the 64-bit laptop. I have no idea about what is causing the program to be so slow.

Related

Trouble when using Efficient_Ransac in CGAL

I want to use the Efficient Ransac implementation of CGAL, but whenever I try to set my own parameters, the algorithm doesn't detect any shape anymore.
This work is related to the Polyfit implementation in CGAL. I want to fine tune the plane detection to see the influence it has on the algorithm. When I use the standard call to ransac.detect(), it works perfectly. However, when I want to set my own parameters it just doesn't find any plane, even if I set them manually to the default values.
Here is my code, strongly related to this example
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/IO/read_xyz_points.h>
#include <CGAL/IO/Writer_OFF.h>
#include <CGAL/property_map.h>
#include <CGAL/Surface_mesh.h>
#include <CGAL/Shape_detection/Efficient_RANSAC.h>
#include <CGAL/Polygonal_surface_reconstruction.h>
#ifdef CGAL_USE_SCIP
#include <CGAL/SCIP_mixed_integer_program_traits.h>
typedef CGAL::SCIP_mixed_integer_program_traits<double> MIP_Solver;
#elif defined(CGAL_USE_GLPK)
#include <CGAL/GLPK_mixed_integer_program_traits.h>
typedef CGAL::GLPK_mixed_integer_program_traits<double> MIP_Solver;
#endif
#if defined(CGAL_USE_GLPK) || defined(CGAL_USE_SCIP)
#include <CGAL/Timer.h>
#include <fstream>
typedef CGAL::Exact_predicates_inexact_constructions_kernel Kernel;
typedef Kernel::Point_3 Point;
typedef Kernel::Vector_3 Vector;
// Point with normal, and plane index
typedef boost::tuple<Point, Vector, int> PNI;
typedef std::vector<PNI> Point_vector;
typedef CGAL::Nth_of_tuple_property_map<0, PNI> Point_map;
typedef CGAL::Nth_of_tuple_property_map<1, PNI> Normal_map;
typedef CGAL::Nth_of_tuple_property_map<2, PNI> Plane_index_map;
typedef CGAL::Shape_detection::Efficient_RANSAC_traits<Kernel, Point_vector, Point_map, Normal_map> Traits;
typedef CGAL::Shape_detection::Efficient_RANSAC<Traits> Efficient_ransac;
typedef CGAL::Shape_detection::Plane<Traits> Plane;
typedef CGAL::Shape_detection::Point_to_shape_index_map<Traits> Point_to_shape_index_map;
typedef CGAL::Polygonal_surface_reconstruction<Kernel> Polygonal_surface_reconstruction;
typedef CGAL::Surface_mesh<Point> Surface_mesh;
int main(int argc, char ** argv)
{
Point_vector points;
// Loads point set from a file.
const std::string &input_file = argv[1];
//const std::string input_file(input);
std::ifstream input_stream(input_file.c_str());
if (input_stream.fail()) {
std::cerr << "failed open file \'" <<input_file << "\'" << std::endl;
return EXIT_FAILURE;
}
std::cout << "Loading point cloud: " << input_file << "...";
CGAL::Timer t;
t.start();
if (!input_stream ||
!CGAL::read_xyz_points(input_stream,
std::back_inserter(points),
CGAL::parameters::point_map(Point_map()).normal_map(Normal_map())))
{
std::cerr << "Error: cannot read file " << input_file << std::endl;
return EXIT_FAILURE;
}
else
std::cout << " Done. " << points.size() << " points. Time: " << t.time() << " sec." << std::endl;
// Shape detection
Efficient_ransac ransac;
ransac.set_input(points);
ransac.add_shape_factory<Plane>();
std::cout << "Extracting planes...";
t.reset();
// Set parameters for shape detection.
Efficient_ransac::Parameters parameters;
// Set probability to miss the largest primitive at each iteration.
parameters.probability = 0.05;
// Detect shapes with at least 100 points.
parameters.min_points = 100;
// Set maximum Euclidean distance between a point and a shape.
parameters.epsilon = 0.01;
// Set maximum Euclidean distance between points to be clustered.
parameters.cluster_epsilon = 0.01;
// Set maximum normal deviation.
// 0.9 < dot(surface_normal, point_normal);
parameters.normal_threshold = 0.9;
// Detect shapes.
ransac.detect(parameters);
//ransac.detect();
Efficient_ransac::Plane_range planes = ransac.planes();
std::size_t num_planes = planes.size();
std::cout << " Done. " << num_planes << " planes extracted. Time: " << t.time() << " sec." << std::endl;
// Stores the plane index of each point as the third element of the tuple.
Point_to_shape_index_map shape_index_map(points, planes);
for (std::size_t i = 0; i < points.size(); ++i) {
// Uses the get function from the property map that accesses the 3rd element of the tuple.
int plane_index = get(shape_index_map, i);
points[i].get<2>() = plane_index;
}
//////////////////////////////////////////////////////////////////////////
std::cout << "Generating candidate faces...";
t.reset();
Polygonal_surface_reconstruction algo(
points,
Point_map(),
Normal_map(),
Plane_index_map()
);
std::cout << " Done. Time: " << t.time() << " sec." << std::endl;
//////////////////////////////////////////////////////////////////////////
Surface_mesh model;
std::cout << "Reconstructing...";
t.reset();
if (!algo.reconstruct<MIP_Solver>(model)) {
std::cerr << " Failed: " << algo.error_message() << std::endl;
return EXIT_FAILURE;
}
const std::string& output_file(input_file+"_result.off");
std::ofstream output_stream(output_file.c_str());
if (output_stream && CGAL::write_off(output_stream, model))
std::cout << " Done. Saved to " << output_file << ". Time: " << t.time() << " sec." << std::endl;
else {
std::cerr << " Failed saving file." << std::endl;
return EXIT_FAILURE;
}
//////////////////////////////////////////////////////////////////////////
// Also stores the candidate faces as a surface mesh to a file
Surface_mesh candidate_faces;
algo.output_candidate_faces(candidate_faces);
const std::string& candidate_faces_file(input_file+"_candidate_faces.off");
std::ofstream candidate_stream(candidate_faces_file.c_str());
if (candidate_stream && CGAL::write_off(candidate_stream, candidate_faces))
std::cout << "Candidate faces saved to " << candidate_faces_file << "." << std::endl;
return EXIT_SUCCESS;
}
#else
int main(int, char**)
{
std::cerr << "This test requires either GLPK or SCIP.\n";
return EXIT_SUCCESS;
}
#endif // defined(CGAL_USE_GLPK) || defined(CGAL_USE_SCIP)
When launched, I have the following message:
Loading point cloud: Scene1/test.xyz... Done. 169064 points. Time: 0.428 sec.
Extracting planes... Done. 0 planes extracted. Time: 8.328 sec.
Generating candidate faces... Done. Time: 0.028 sec.
Reconstructing... Failed: at least 4 planes required to reconstruct a closed surface mesh (only 1 provided)
Whereas I get this when calling the ransac detection function without parameters:
Loading point cloud: Scene1/test.xyz... Done. 169064 points. Time: 0.448 sec.
Extracting planes... Done. 18 planes extracted. Time: 3.088 sec.
Generating candidate faces... Done. Time: 94.536 sec.
Reconstructing... Done. Saved to Scene1/test.xyz_result.off. Time: 30.28 sec.
Can someone help me setting my own parameters for the ransac shape detection?
However, when I want to set my own parameters it just doesn't find any
plane, even if I set them manually to the default values.
Just to be sure: "setting them manually to the default values" is not what you are doing in the code you shared.
Default values are documented as:
1% of the total number of points for min_points, which should be around 1700 points in your case, not 100
1% of the bounding box diagonal for epsilon and cluster_epsilon. For that obviously I don't know if that is what you used (0.01) as I don't have access to your point set, but if you want to reproduce default values, you should use the CGAL::Bbox_3 object at some point
If you use these values, there's no reason why it should behave differently than with no parameters given (if it does not work, then please let me know because there may be a bug).
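One way to compute values matching the defaults described above, sketched in plain C++ without CGAL types so that it stays self-contained. Vec3, DefaultLikeParameters and default_like_parameters are hypothetical names introduced for illustration; in a CGAL program the bounding box would come from a CGAL::Bbox_3 instead of the manual min/max loop.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 3D point type.
struct Vec3 { double x, y, z; };

// Hypothetical container for the three size-dependent parameters.
struct DefaultLikeParameters {
    std::size_t min_points;
    double epsilon;
    double cluster_epsilon;
};

// 1% of the point count for min_points, 1% of the bounding box diagonal
// for epsilon and cluster_epsilon.
DefaultLikeParameters default_like_parameters(const std::vector<Vec3> &pts)
{
    DefaultLikeParameters p = {0, 0.0, 0.0};
    if (pts.empty())
        return p;

    Vec3 lo = pts.front();
    Vec3 hi = pts.front();
    for (const Vec3 &q : pts) {
        lo.x = std::min(lo.x, q.x); hi.x = std::max(hi.x, q.x);
        lo.y = std::min(lo.y, q.y); hi.y = std::max(hi.y, q.y);
        lo.z = std::min(lo.z, q.z); hi.z = std::max(hi.z, q.z);
    }
    const double dx = hi.x - lo.x;
    const double dy = hi.y - lo.y;
    const double dz = hi.z - lo.z;
    const double diagonal = std::sqrt(dx * dx + dy * dy + dz * dz);

    p.min_points = pts.size() / 100;
    p.epsilon = 0.01 * diagonal;
    p.cluster_epsilon = 0.01 * diagonal;
    return p;
}
```

The resulting numbers would then be copied into the corresponding fields of Efficient_ransac::Parameters before calling detect(parameters).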

Unable to display <CL_DEVICE_MAX_WORK_ITEM_SIZE> in OpenCL and C++

I am new to OpenCL and I have a problem with displaying the <CL_DEVICE_MAX_WORK_ITEM_SIZES> as a whole number/value. Instead I get a memory address.
Initially, I tried to declare a string and integer output variable to display the maximum work item size. But eventually I found out that the work item size returns a size_t data type instead. So I created a vector to store the size_t variable but it outputs a memory address instead.
And also, my display shows the Device Number in the reverse order (shows Device #1 then Device #0 - this is used to select a device for the later part of my program)
Any help would be appreciated. Thank you.
int main()
{
std::vector<cl::Platform> platforms; // available platforms
std::vector<cl::Device> devices; // devices available to a platform
std::string outputString; // string for output
std::vector<::size_t> maxWorkItems[3];
unsigned int i, j; // counters
std::string choice; // user input choice
cl::Platform::get(&platforms);
std::cout << "Do you want to use a CPU or GPU device: ";
std::cin >> choice;
if (choice == "CPU" || choice == "cpu")
{
// for each platform
for (i = 0; i < platforms.size(); i++)
{
// get all CPU devices available to the platform
platforms[i].getDevices(CL_DEVICE_TYPE_ALL, &devices);
for (j = 0; j < devices.size(); j++)
{
cl_device_type type;
devices[j].getInfo(CL_DEVICE_TYPE, &type);
if (type == CL_DEVICE_TYPE_CPU) {
std::cout << "\tDevice #" << j << std::endl;
//outputs the device type
std::cout << "\tType: " << "CPU" << std::endl;
// get and output device name
outputString = devices[j].getInfo<CL_DEVICE_NAME>();
std::cout << "\tName: " << outputString << std::endl;
// get and output device vendor
outputString = devices[j].getInfo<CL_DEVICE_VENDOR>();
std::cout << "\tVendor: " << outputString << std::endl;
//get and output compute units
std::cout << "\tNumber of compute units: " << devices[j].getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>() << std::endl;
//get and output workgroup size
std::cout << "\tMaximum work group size: " << devices[j].getInfo<CL_DEVICE_MAX_WORK_GROUP_SIZE>() << std::endl;
//get and output workitem size
maxWorkItems[0] = devices[j].getInfo<CL_DEVICE_MAX_WORK_ITEM_SIZES>();
std::cout << "\tMaximum work item size: " << maxWorkItems << std::endl;
//get and output local memory size
std::cout << "\tLocal memory size: " << devices[j].getInfo<CL_DEVICE_LOCAL_MEM_SIZE>() << std::endl;
std::cout << std::endl;
}
}
}
}
Below is the undesired output of my code:
The max work item size is in hexadecimal format, and the device numbers are in reverse order.
The CL_DEVICE_MAX_WORK_ITEM_SIZES property is of array type, specifically, size_t[]. You shouldn't be expecting a scalar value, but an array of CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS elements. With the OpenCL C++ wrapper, you're on the right track with the vector, but you've now declared an array of 3 vectors:
std::vector<::size_t> maxWorkItems[3];
You in fact just want the one vector that will hold all the returned values:
std::vector<::size_t> maxWorkItems;
The property query becomes:
maxWorkItems = devices[j].getInfo<CL_DEVICE_MAX_WORK_ITEM_SIZES>();
Then you should be able to query the max work items in each dimension using maxWorkItems[0], maxWorkItems[1], etc.
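A minimal sketch of printing such a vector one dimension at a time, in plain C++ so that it does not depend on an OpenCL installation. The helper format_sizes is a hypothetical name; it only assumes the query result has been stored in a std::vector of size_t as above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins the per-dimension limits into one string,
// e.g. three dimensions become "a x b x c".
std::string format_sizes(const std::vector<std::size_t> &sizes)
{
    std::ostringstream out;
    for (std::size_t d = 0; d < sizes.size(); ++d) {
        if (d > 0)
            out << " x ";
        out << sizes[d];
    }
    return out.str();
}
```

The output line in the question would then stream format_sizes(maxWorkItems) instead of maxWorkItems itself.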

Parsing an edge list into a vector of structs

I am having a hard time parsing an edge list from a text file in c++. The edge list is in the following format:
*Edgeslist
1 6487
2 6488 6489 6490 6491 6492 6493 6494 6495 6496
3 6497 6498 6499 6500 6501 6502 6503 6504 6505
4 6506 6507 6508
5 6509 6510 6511
6 6512 6513 6514 6515
7 6516
8 6517 6518
9 6519 6520
10 6521 6522 6523 6524 6525 6526 6527 6528 6529 6530 6531 6532 6533 6534 6535
11 6566
My vector is a vector of structs that is defined here
struct Edge{
int character;
int edges[16];
};
The first number of each line should be read into the character integer and the rest should be read into the edges array. I have tried a few for loops, and currently working on a lengthy while loop with if statements for each number of possible integers to go into the array (max of 15 integers per line after the first number). Here is a part of my implementation so you can see what I am attempting.
while(std::getline(input, line))
{
int a, b, c, d, e, f, g, h, i, j, k, l, m, n, o;
std::stringstream ss(line);
if ( ss >> a)
{
std::cout << "1 " << a << "\n";
}
if ( ss >> a >> b)
{
std::cout << "2 " << a << " " << b << "\n";
}
if ( ss >> a >> b >> c)
{
std::cout << "3 " << a << " " << b << " " << c << "\n";
}
if ( ss >> a >> b >> c >> d)
{
std::cout << "4 " << a << " " << b << " " << c << " " << d << "\n";
}
I'll end it there but it does go on for awhile until it covers every possible line.
At the moment I am just trying to figure out the basic logic to parse this text file.
You have tagged this as C++.
I would recommend you add an initializer if you must continue with pod ...
struct Edge
{
int character;
int edges[16];
// more data attributes
// use ctor to initialize these values
Edge(void) :
character (0)
// edges[16]
{
for (int i=0; i<16; ++i)
edges[i] = 0;
}
// use dtor to clear them
~Edge(void)
{
for (int i=0; i<16; ++i)
edges[i] = 0;
character = 0;
// ...
}
};
I suspect you will also need a count of how many edges have been installed so far (or perhaps call it nextIn).
The fundamentally important signature of C++ code is the preferred use of objects-defined-by-a-class. I recommend you consider:
struct Edge
{
int character; // poor name choice
std::vector<int> edges; // << use vector, not array
// use ctor to initialize these values
Edge(void) :
character (0)
// edges // default ctor does what you need
{
}
~Edge(void) {
// edges default dtor does what you need
character = 0;
}
};
The std::vector reduces your work to read arbitrary counts of values.
// Typical input:
// 3 6497 6498 6499 6500 6501 6502 6503 6504 6505
// 4 6506 6507 6508
#include <iostream>
#include <iomanip>
#include <sstream>
#include <vector>
struct Edge
{
int character; // <<< poor name choice
std::vector<int> edges; // <<< use vector, not array
// use ctor to initialize these values
Edge(void) :
character (0)
// edges default ctor does what you need
{
}
~Edge(void) {
// edges default dtor does what you need
character = 0;
}
bool ok(void) {
/*tbd - count errors? size check? */
return(true);
};
void load(std::string line)
{
// typical input line
// 3 6497 6498 6499 6500 6501 6502 6503 6504 6505
// 4 6506 6507 6508
std::stringstream ss(line+' ');
// padding at end ---------^----because ss.eof() sooner than I expected
//debug only
//std::cout << " in: (" << std::setw(3) << line.size() << ")
// << line << std::endl;
// process one work buff
do {
ss >> character; // read 1st int of line
if (ss.eof()) break;
if (ss.fail()) {
// maybe invalid integer format
std::cerr << "bad input: " << line << std::endl;
// tbd - error count?
break;
}
// process 1 or more entries for edge.vector from line
do {
int edgeVal = 0;
ss >> edgeVal;
if (ss.eof()) break;
if (ss.fail()) {
// maybe invalid integer format
std::cerr << "bad input: " << line << std::endl;
// tbd - error count?
break;
}
// additional edgeVal validations?
edges.push_back(edgeVal); // fill in one value to edge vector
// add validation here if edges.size() has an upper limit
// tbd - error count?
} while (1); // // process 1 or more entries to vector from line
} while(1); // one work buff
// debug only
dump();
} // void load(std::string line)
// for debug
void dump()
{
std::cout << "dump: (" << std::setw(3) << edges.size()
<< ") " << character << " ";
for (size_t i=0; i<edges.size(); ++i)
std::cout << edges[i] << " ";
std::cout << std::endl;
}
}; // struct Edge()
int t237(void)
{
std::vector<Edge> edgeVec;
// file processing at outer scope
do {
std::string line; // work buff
(void)std::getline(std::cin, line);
if(std::cin.eof()) break;
std::stringstream ss(line);
Edge temp; // a work buff
temp.load(line); // <<< load method for Edge (part of Edge)
// not sure where to put all the Edge objects
// temporarily, use edgeVec;
if (temp.ok()) // add flag check that edgeVec had no errors
edgeVec.push_back(temp);
else
/*tbd*/{}; // error in temp ... discard it? report it?
} while (1);
// tbd - how return vector and file status
return (0);
}
---- update
ss.eof() occurring before I expected ... added "padding at end"
added dump() debug method, added debug cout of input line
minimal testing complete
You should split your string into substrings at whitespaces. Details are explained here.
After that, you just convert your substrings to the appropriate type.
std::stringstream ss(line);
ss >> character;
unsigned int n=0;
while(n < 16 && ss >> edges[n]) // stop before overrunning edges[16]
{
++n;
}
(One could make this a little shorter, but that would make it less readable.)
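The same extraction loop can also fill a std::vector, which removes the fixed upper bound on the number of edges. This is a minimal sketch: ParsedEdge and parse_edge_line are hypothetical names, and it assumes that lines which do not start with an integer (such as the *Edgeslist header) should simply be rejected.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: first number of the line plus all following ones.
struct ParsedEdge {
    int character = 0;
    std::vector<int> edges;
};

// Parses one line of the edge list. Returns false (leaving out untouched)
// when the line does not start with an integer.
bool parse_edge_line(const std::string &line, ParsedEdge &out)
{
    std::istringstream ss(line);
    ParsedEdge result;
    if (!(ss >> result.character))
        return false;

    int value = 0;
    while (ss >> value)          // stops at end of line or first non-integer
        result.edges.push_back(value);

    out = result;
    return true;
}
```

Called from a std::getline loop, each successfully parsed line can be pushed onto a std::vector<ParsedEdge>.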

Losing fields when filling C++ std::map

I have a problem in using the std::map correctly. The class Example is a class with an ID, a label, a vector of keypoints and a descriptor matrix. The class Examples is a map for retrieving an example given its ID. The examples are read from files on disk, stored in the map, then used later.
Even if it is conceptually very simple, I am not able to fill the map properly.
I have the following class:
class Example
{
public:
std::string id;
std::string label;
std::vector<cv::KeyPoint> keypoints;
cv::Mat descriptors;
Example(std::string id_, std::string label_)
: id(id_), label(label_)
{
// ... nothing ...
}
string to_string() const
{
stringstream ss;
ss << "#" << id
<< " (" << label << ")"
<< " - #keypoints " << keypoints.size()
<< ", descr " << descriptors.rows << " x " << descriptors.cols;
return ss.str();
} // to_string
}; // class Example
ostream& operator <<(ostream & out, const Example &ex)
{
out << ex.to_string();
return out;
} // operator <<
And this one:
// OLD: class Examples : public std::map<std::string, Example*> {
class Examples {
// New line after Martini's comment
std::map<std::string, Example*> _map;
[...]
void fill() {
// create an example
Example *example = new Example(id, label);
// inputstream in
// Read all the keypoints
cv::KeyPoint p;
for(int i=0; ... ) {
in.read(reinterpret_cast<char *>(&p), sizeof(cv::KeyPoint));
example->keypoints.push_back(p); // push_back copies p
} // for
// ... misc code
cv::Mat descr(n_keypoints, d_size, cv_type, cv::Scalar(1));
// ... read Mat from inputstream in, then assign it to the example
example->descriptors = descr;
// SEE THE OUTPUT BELOW
std::cout << "INSERT THIS: " << (*example) << std::endl;
_map.insert(std::pair<string,Example*>(id, example));
std::cout << "READ THIS: " << *(get_example(id)) << std::endl;
// ... other code
} // fill
// Code modified after Martini's comment.
Example* get_example(const std::string &id) const {
std::map<std::string, Example*>::const_iterator it = _map.find(id);
if( it == _map.end()) {
// ... manage error
// ... then exit
} // if
return it->second;
} // get_example
} // class Examples
The output from the insert/get lines is:
INSERT THIS: #122225084 (label) - #keypoints 711, descr 711 x 128
READ THIS: #122225084 (label) - #keypoints 0, descr 0 x 0
In the insert I had a pointer to an example with 711 keypoints and a 711x128 descriptor matrix. If I read the example using its ID right after the insert, I get a pointer to an example with 0 keypoints and an empty matrix.
What am I doing wrong?
Looking at your code, one possible explanation is that the map already contains an element with the same key; in that case insert leaves the existing element untouched. To diagnose this, first print the pointer value before and after you add the object (something like this):
std::cout << "INSERT THIS: " << (void *)example << " " << (*example) << std::endl;
_map.insert(std::pair<string,Example*>(id, example));
std::cout << "READ THIS: " << (void *)get_example(id) << " " << *(get_example(id)) << std::endl;
Another way is to check the result of insert:
if( !_map.insert(std::pair<string,Example*>(id, example)).second )
std::cout << "ERROR: example:" << id << " is already there";
If you want to overwrite the element unconditionally, you can use operator[]:
_map[ id ] = example;
If there really are duplicates, you will get a memory leak (you are getting one anyway), so I would strongly recommend storing smart pointers in your map.
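A minimal sketch of both points, assuming a simplified stand-in for the Example class: Sample, SampleMap, add_if_absent and add_or_replace are hypothetical names, and OpenCV is left out so the snippet compiles on its own. It shows that insert does not overwrite an existing key, and that storing std::unique_ptr releases the rejected or replaced object automatically.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the question's Example class.
struct Sample {
    std::string id;
    int n_keypoints;
};

typedef std::map<std::string, std::unique_ptr<Sample> > SampleMap;

// Inserts only if the key is new. Returns false when the key already exists;
// the map is left unchanged and the rejected Sample is freed by unique_ptr.
bool add_if_absent(SampleMap &m, const std::string &id, int n_keypoints) {
    std::unique_ptr<Sample> s(new Sample{id, n_keypoints});
    return m.insert(std::make_pair(id, std::move(s))).second;
}

// Overwrites unconditionally. Assigning to the unique_ptr deletes the old Sample.
void add_or_replace(SampleMap &m, const std::string &id, int n_keypoints) {
    m[id] = std::unique_ptr<Sample>(new Sample{id, n_keypoints});
}
```

With raw pointers, the rejected object in the first function and the replaced object in the second would both leak unless deleted by hand.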

C++ Structs in arrays

Am I doing this right? I want a map with an integer as key and a struct as value. What is the easiest way to, say, get the object at key 1? How do I retrieve the value of isIncluded? In the last two lines of the code I tried doing it, but then I realized I don't really know how to retrieve the values of structs in a numbered Map array.
Do I need to call cells.get(1) and assign the result to a new temporary struct to get its values?
/** Sets up the cells map with the initial state of all cells and their info. */
void setExcludedCells (int dimension)
{
// Sets initial state for cells
cellInfo getCellInfo;
getCellInfo.isIncluded = false;
getCellInfo.north = 0;
getCellInfo.south = 0;
getCellInfo.west = 0;
getCellInfo.east = 0;
for (int i = 1; i <= (pow(dimension, 2)); i++)
{
cells.put(i, getCellInfo);
}
cout << "Cells map initialized. Set [" << + cells.size() << "] cells to excluded: " << endl;
cells.get(getCellInfo.isIncluded);
cells.get(1);
}
The Map is declared as a private instance variable like this:
struct cellInfo {
bool isIncluded;
int north; // If value is 0, that direction is not applicable (border of grid).
int south;
int west;
int east;
};
Map<int, cellInfo> cells; // Keeps track of included/excluded cells
From the documentation for Map, it appears that .get() returns a ValueType.
You would use it thus:
// Display item #1
std::cout << cells.get(1).isIncluded << "\n";
std::cout << cells.get(1).north << "\n";
Or, since the lookup is relatively expensive, you could copy it to a local variable:
// Display item #1 via initialized local variable
cellInfo ci = cells.get(1);
std::cout << ci.isIncluded << " " << ci.north << "\n";
// Display item #2 via assigned-to local variable
ci = cells.get(2);
std::cout << ci.isIncluded << " " << ci.north << "\n";
My best advice is to use the standard library's std::map data structure instead:
// Expensive way with multiple lookups:
std::cout << cells[1].isIncluded << " " << cells[1].north << "\n";
// Cheap way with one lookup and no copies
const cellInfo& ci = cells[1];
std::cout << ci.isIncluded << " " << ci.north << "\n";
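As a rough sketch of what the std::map version could look like end to end (make_cells and is_included are hypothetical helper names, and the struct is copied from the question): note that operator[] inserts a default-constructed element when the key is missing and cannot be used on a const map, so read-only lookups are better done with find or at.

```cpp
#include <map>

struct cellInfo {
    bool isIncluded;
    int north; // 0 means that direction is not applicable (border of grid)
    int south;
    int west;
    int east;
};

// Builds a map with keys 1..dimension*dimension, all cells initially excluded.
std::map<int, cellInfo> make_cells(int dimension) {
    std::map<int, cellInfo> cells;
    const cellInfo excluded = {false, 0, 0, 0, 0};
    for (int i = 1; i <= dimension * dimension; ++i)
        cells[i] = excluded;
    return cells;
}

// Read-only lookup: one search, no copy, and no accidental insertion.
// A missing key is reported as "not included".
bool is_included(const std::map<int, cellInfo> &cells, int id) {
    std::map<int, cellInfo>::const_iterator it = cells.find(id);
    return it != cells.end() && it->second.isIncluded;
}
```

To modify a cell in place, cells[1].isIncluded = true; works directly on the stored struct, so no temporary copy is needed.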