Write a bimap to binary file and then read it - c++

I would like to know how to write a very large bimap (180 million to 3,000 million entries) to a binary file and then read it back to perform some operations. The code below creates the bimap, opens two streams for writing and reading binary data, and inserts a few elements.
#include <string>
#include <iostream>
#include <utility>
#include <fstream>
#include <boost/bimap.hpp>
#include <boost/bimap/unordered_set_of.hpp>
#include <boost/bimap/unordered_multiset_of.hpp>
namespace bimaps = boost::bimaps;
typedef boost::bimap<bimaps::unordered_set_of<unsigned long long int>,
bimaps::unordered_multiset_of<unsigned long long int > > bimap_reference;
typedef bimap_reference::value_type position;
bimap_reference numbers;
int main()
{
std::ofstream outfile ("bmap",std::ofstream::binary);
std::ifstream infile ("bmap",std::ifstream::binary);
numbers.insert(position(123456, 100000));
numbers.insert(position(234567, 80000));
numbers.insert(position(345678, 100000));
numbers.insert(position(456789, 80000));
//want to write the file
//want to read the file
// So that I can perform the following operation
using ritr = bimap_reference::right_const_iterator;
std::pair<ritr, ritr> range = numbers.right.equal_range(80000);
auto itr = range.first;
std::cout<<"first: "<<itr->first<<std::endl;
if(itr != numbers.right.end() && itr->second ==80000){
for (itr = range.first; itr != range.second; ++itr)
{
std::cout<<"numbers:"<<itr->second<<"<->"<<itr->first<<std::endl;
}
}
else {
std::cout<<"Not found:"<<std::endl;
}
return 0;
}
I want to write the bimap to the file and then read it back to perform the operation above. How can I do this?

To write a bimap to a binary file and read it back, Boost.Serialization is very helpful. You need to include
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
as header files. Then open file streams for writing and reading, use boost::archive::binary_oarchive to write and boost::archive::binary_iarchive to read back, and link against the library by compiling with -lboost_serialization. The full code is given below.
#include <cassert>
#include <string>
#include <iostream>
#include <utility>
#include <fstream>
#include <boost/bimap.hpp>
#include <boost/bimap/unordered_set_of.hpp>
#include <boost/bimap/unordered_multiset_of.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
namespace bimaps = boost::bimaps;
typedef boost::bimap<bimaps::unordered_set_of<unsigned long long int>,
bimaps::unordered_multiset_of<unsigned long long int > > bimap_reference;
typedef bimap_reference::value_type position;
bimap_reference numbers;
int main()
{
// insert elements into bimap and write to a binary file
{
numbers.insert(position(123456, 100000));
numbers.insert(position(234567, 80000));
numbers.insert(position(345678, 100000));
numbers.insert(position(456789, 80000));
std::ofstream ofs("data", std::ios::binary); // binary mode, to match the binary archive
boost::archive::binary_oarchive oa(ofs);
oa << const_cast<const bimap_reference&>(numbers);
const bimap_reference::left_iterator left_iter = numbers.left.find(123456);
oa << left_iter;
const bimap_reference::right_iterator right_iter = numbers.right.find(100000);
oa << right_iter;
}
// load the bimap back to memory
{
std::ifstream ifs("data", std::ios::binary);
boost::archive::binary_iarchive ia(ifs);
ia >> numbers;
assert( numbers.size() == 4 ); // aborts if the bimap was not restored
bimap_reference::left_iterator left_iter;
ia >> left_iter;
assert( left_iter->first == 123456 );
bimap_reference::right_iterator right_iter;
ia >> right_iter;
assert( right_iter->first == 100000 );
}
// then perform the following operation
using ritr = bimap_reference::right_const_iterator;
std::pair<ritr, ritr> range = numbers.right.equal_range(80000);
auto itr = range.first;
if(itr != range.second){ // check before dereferencing: the range may be empty
std::cout<<"first: "<<itr->first<< " <-> " << itr->second<<std::endl;
for (itr = range.first; itr != range.second; ++itr)
{
std::cout<<"numbers:"<<itr->second<<"<->"<<itr->first<<std::endl;
}
}
else {
std::cout<<"Not found:"<<std::endl;
}
return 0;
}
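If the archive format is not a requirement, one dependency-free alternative is to stream the raw (left, right) pairs to disk and re-insert them into the bimap on load. The following is only a sketch of that idea, not the Boost archive format: the names Pair, write_pairs and read_pairs are made up for illustration, and it assumes the file is written and read on the same platform (same endianness and integer width).

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: one (left, right) entry of the bimap.
using Pair = std::pair<std::uint64_t, std::uint64_t>;

// Write each pair as two raw 64-bit values.
// Returns false if the file could not be opened or a write failed.
bool write_pairs(const std::string& path, const std::vector<Pair>& pairs) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    for (const Pair& p : pairs) {
        out.write(reinterpret_cast<const char*>(&p.first), sizeof p.first);
        out.write(reinterpret_cast<const char*>(&p.second), sizeof p.second);
    }
    out.flush();
    return static_cast<bool>(out);
}

// Read pairs back until the end of the file (or the first short read).
std::vector<Pair> read_pairs(const std::string& path) {
    std::vector<Pair> pairs;
    std::ifstream in(path, std::ios::binary);
    Pair p;
    while (in.read(reinterpret_cast<char*>(&p.first), sizeof p.first) &&
           in.read(reinterpret_cast<char*>(&p.second), sizeof p.second)) {
        pairs.push_back(p);
    }
    return pairs;
}
```

On load, each pair would be inserted with numbers.insert(position(p.first, p.second)). For billions of entries you would probably write straight from the bimap's iterators and insert while reading, rather than going through an intermediate vector.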

Related

push_back all contents in vector<char> to combine them as the first element of vector<string>

I'm trying to parse a string with spaces into several strings and store them in a list of strings that contain no spaces. I do not know how long the input string will be, and I have the following code:
#include <bits/stdc++.h>
#include <sstream>
using namespace std;
vector<string> myWords;
vector<char> myBuffer;
int main() {
string mySentence;
getline(cin, mySentence);
int j = 0;
for (int i = 0; i < mySentence.length(); i++) {
if (mySentence[i] != ' ') myBuffer.push_back(mySentence[i]);
else {
myWords.push_back(myBuffer);
myBuffer.clear();
j++;
}
}
return 0;
}
The error I'm getting is at myWords.push_back(myBuffer);. How do I get around this?
The problem is that you are trying to push a std::vector<char> where a std::string is expected. So simply change the type of myBuffer to a std::string:
#include <iostream>
#include <string>
#include <vector>
int main() {
std::string mySentence;
std::getline(std::cin, mySentence);
std::vector<std::string> myWords;
std::string myBuffer;
for (std::size_t i = 0; i < mySentence.length(); i++) {
if (mySentence[i] != ' ')
myBuffer.push_back(mySentence[i]);
else {
myWords.push_back(myBuffer);
myBuffer.clear();
}
}
if (!myBuffer.empty()) {
myWords.push_back(myBuffer);
}
// use myWords as needed...
return 0;
}
That being said, using a std::istringstream would be much simpler, as operator>> reads whitespace-delimited values from a stream for you:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
int main() {
std::string mySentence;
std::getline(std::cin, mySentence);
std::vector<std::string> myWords;
std::string myBuffer;
std::istringstream iss(mySentence);
while (iss >> myBuffer) {
myWords.push_back(myBuffer);
}
// use myWords as needed...
return 0;
}
Alternatively, let the standard library handle the reading and pushing for you:
#include <algorithm>
#include <iostream>
#include <string>
#include <sstream>
#include <iterator>
#include <vector>
int main() {
std::string mySentence;
std::getline(std::cin, mySentence);
std::vector<std::string> myWords;
std::istringstream iss(mySentence);
std::copy(
std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(myWords)
);
// use myWords as needed...
return 0;
}

How to read CSV file and assign to Eigen Matrix?

I am trying to read a large CSV file into an Eigen matrix. The code below has a problem: it does not detect the \n at the end of each line, so it cannot create multiple rows in the matrix (it reads the entire file as a single row). I am not sure what is wrong with the code. Can anyone suggest a fix?
I am also looking for an efficient way to read a CSV file with 10k rows and 1k columns, and I am not sure the code below is the most efficient approach. Any comments would be much appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <fstream>
#include <istream> //DataFile.fail() function
#include <vector>
#include <set>
#include <string>
using namespace std;
#include <Eigen/Core>
#include <Eigen/Dense>
using namespace Eigen;
void readCSV(istream &input, vector< vector<string> > &output)
{
int a = 0;
int b = 0;
string csvLine;
// read every line from the stream
while( std::getline(input, csvLine) )
{
istringstream csvStream(csvLine);
vector<string> csvColumn;
MatrixXd mv;
string csvElement;
// read every element from the line that is seperated by commas
// and put it into the vector or strings
while( getline(csvStream, csvElement, ' ') )
{
csvColumn.push_back(csvElement);
//mv.push_back(csvElement);
b++;
}
output.push_back(csvColumn);
a++;
}
cout << "a : " << a << " b : " << b << endl; //a doen't detect '\n'
}
int main(int argc, char* argv[])
{
cout<< "ELM" << endl;
//Testing to load dataset from file.
fstream file("Sample3.csv", ios::in);
if(!file.is_open())
{
cout << "File not found!\n";
return 1;
}
MatrixXd m(3,1000);
// typedef to save typing for the following object
typedef vector< vector<string> > csvVector;
csvVector csvData;
readCSV(file, csvData);
// print out read data to prove reading worked
for(csvVector::iterator i = csvData.begin(); i != csvData.end(); ++i)
{
for(vector<string>::iterator j = i->begin(); j != i->end(); ++j)
{
m(i,j) = *j;
cout << *j << ", ";
}
cout << "\n";
}
}
I have also attached a sample CSV file: https://onedrive.live.com/redir?resid=F1507EBE7BF1C5B!117&authkey=!AMzCnpBqxUyF1BA&ithint=file%2ccsv
Here's something you can actually copy-paste
Writing your own "parser"
Pros: lightweight and customizable
Cons: customizable
#include <Eigen/Dense>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
using namespace Eigen;
template<typename M>
M load_csv (const std::string & path) {
std::ifstream indata;
indata.open(path);
std::string line;
std::vector<double> values;
std::size_t rows = 0; // uint is not a standard type
while (std::getline(indata, line)) {
std::stringstream lineStream(line);
std::string cell;
while (std::getline(lineStream, cell, ',')) {
values.push_back(std::stod(cell));
}
++rows;
}
return Map<const Matrix<typename M::Scalar, M::RowsAtCompileTime, M::ColsAtCompileTime, RowMajor>>(values.data(), rows, values.size()/rows);
}
Usage:
MatrixXd A = load_csv<MatrixXd>("C:/Users/.../A.csv");
Matrix3d B = load_csv<Matrix3d>("C:/Users/.../B.csv");
VectorXd v = load_csv<VectorXd>("C:/Users/.../v.csv");
Using the armadillo library's parser
Pros: supports other formats as well, not just csv
Cons: extra dependency
#include <armadillo>
template <typename M>
M load_csv_arma (const std::string & path) {
arma::mat X;
X.load(path, arma::csv_ascii);
return Eigen::Map<const M>(X.memptr(), X.n_rows, X.n_cols);
}
Read the CSV file into your vector < vector > as you please (e.g. Lucas's answer). Instead of the vector< vector<string> > construct, use a vector< vector<double> > or even better a simple vector< double >. To assign the vector of vectors to an Eigen matrix efficiently using vector< vector< double > >, use the following:
Eigen::MatrixXcd mat(rows, cols);
for(int i = 0; i < rows; i++)
mat.row(i) = Eigen::Map<Eigen::VectorXd> (csvData[i].data(), cols).cast<complex<double> >();
If you opted to use the vector< double > option, it becomes:
Eigen::MatrixXcd mat(rows, cols);
// csvData holds the values row by row, so map it as a cols x rows matrix and transpose
mat = Eigen::Map<Eigen::MatrixXd> (csvData.data(), cols, rows).cast<complex<double> >().transpose();
This will read from a csv file correctly:
std::ifstream indata;
indata.open(filename);
std::string line;
while (getline(indata, line))
{
std::stringstream lineStream(line);
std::string cell;
while (std::getline(lineStream, cell, ','))
{
//Process cell
}
}
Edit: Also, since your csv is full of numbers, make sure to use std::stod or the equivalent conversion once you expect to treat them as such.
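To make the std::stod point concrete, here is a small standard-library-only sketch that turns the loop above into rows of doubles. The function name parse_csv is made up for illustration, and stripping a trailing '\r' is an assumption about files saved with Windows line endings, not something confirmed for the sample file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parse comma-separated text into rows of doubles.
// std::stod throws std::invalid_argument on a non-numeric cell.
std::vector<std::vector<double>> parse_csv(std::istream& in) {
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate CRLF line endings by dropping the trailing '\r'.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;  // skip blank lines
        std::vector<double> row;
        std::stringstream lineStream(line);
        std::string cell;
        while (std::getline(lineStream, cell, ',')) {
            row.push_back(std::stod(cell));
        }
        rows.push_back(std::move(row));
    }
    return rows;
}
```

The result can then be copied into a matrix row by row once the number of rows and columns is known.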

Access elements of boost tokenizer

I'm trying to assign columns of a file using boost to a std::map. I would like to assign element 0 from each line to the index and element 2 to the value. Is there a way to do this without an iterator? The addr_lookup line does not work.
#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <boost/tokenizer.hpp>
#include <boost/lexical_cast.hpp>
int main()
{
std::ifstream myfile("core_info_lowbits.tab", std::ios_base::in);
std::string line;
typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
boost::char_separator<char> sep(" ");
std::map<std::string, unsigned int> addr_lookup;
while ( std::getline (myfile,line) )
{
tokenizer tokens(line, sep);
//Line below does not work
addr_lookup[*tokens.begin()] = boost::lexical_cast<unsigned int> (*(tokens.begin()+2));
for (tokenizer::iterator tok_iter=tokens.begin();
tok_iter != tokens.end(); ++tok_iter)
std::cout << *tok_iter << std::endl;
}
}
You are trying to advance the iterator using +, which is not possible because the tokenizer's iterator is a forward iterator, not a random-access one.
Use:
tokenizer::iterator it1 = tokens.begin();
tokenizer::iterator it2 = it1;
++it2; ++it2;
addr_lookup[*it1] = boost::lexical_cast<unsigned int> (*it2);
Or simply,
tokenizer::iterator it1 = tokens.begin();
tokenizer::iterator it2 = it1;
std::advance(it2,2);
addr_lookup[*it1] = boost::lexical_cast<unsigned int> (*it2);
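Note that neither version checks whether the line actually has three tokens. A rough sketch of a bounds-checked lookup is shown below. It uses std::forward_list as a stand-in for any container with forward iterators so that it compiles without Boost; nth_token is a made-up helper name, and I would expect it to work the same way with a tokenizer, though I have not tested that here.

```cpp
#include <cstddef>
#include <forward_list>
#include <string>

// Return the n-th element (0-based) of a range that only offers forward
// iterators, or `fallback` if the range has fewer than n + 1 elements.
template <typename Range>
std::string nth_token(const Range& tokens, std::size_t n,
                      const std::string& fallback = "") {
    auto it = tokens.begin();
    auto end = tokens.end();
    // Step one element at a time: forward iterators support ++ but not +.
    for (; n > 0 && it != end; --n) ++it;
    return it != end ? *it : fallback;
}
```

For example, with a std::forward_list<std::string> holding "a", "b", "c", nth_token(list, 2) gives "c" and nth_token(list, 5, "none") gives "none".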

vector<string>::iterator - how to find position of an element

I am using the following code to find a string in a std::vector of strings. How can I get the position of a particular element?
Code:
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
int main() {
vector<string> vec;
vector<string>::iterator it;
vec.push_back("H");
vec.push_back("i");
vec.push_back("g");
vec.push_back("h");
vec.push_back("l");
vec.push_back("a");
vec.push_back("n");
vec.push_back("d");
vec.push_back("e");
vec.push_back("r");
it=find(vec.begin(),vec.end(),"r");
//it++;
if(it!=vec.end()){
cout<<"FOUND AT : "<<*it<<endl;
}
else{
cout<<"NOT FOUND"<<endl;
}
return 0;
}
Output:
FOUND AT : r
Expected Output:
FOUND AT : 9
You can use std::distance for that:
auto pos = std::distance(vec.begin(), it);
For an std::vector::iterator, you can also use arithmetic:
auto pos = it - vec.begin();
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
using namespace std;
int main() {
vector<string> vec;
vector<string>::iterator it;
vec.push_back("H");
vec.push_back("i");
vec.push_back("g");
vec.push_back("h");
vec.push_back("l");
vec.push_back("a");
vec.push_back("n");
vec.push_back("d");
vec.push_back("e");
vec.push_back("r");
it=find(vec.begin(),vec.end(),"a");
//it++;
int pos = distance(vec.begin(), it);
if(it!=vec.end()){
cout<<"FOUND "<< *it<<" at position: "<<pos<<endl;
}
else{
cout<<"NOT FOUND"<<endl;
}
return 0;
}
Use the following:
if(it != vec.end())
std::cout<< "Found At :" << (it-vec.begin()) ;
Use this statement to get the index directly (it equals vec.size() when the element is not found):
auto pos = find(vec.begin(), vec.end(), "r") - vec.begin();
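Putting the suggestions together, a small helper could look like the sketch below. The name index_of and the choice of -1 for "not found" are my own conventions for illustration, not anything from the standard library.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Return the 0-based index of `value` in `vec`, or -1 if it is absent.
std::ptrdiff_t index_of(const std::vector<std::string>& vec,
                        const std::string& value) {
    auto it = std::find(vec.begin(), vec.end(), value);
    return it == vec.end() ? -1 : std::distance(vec.begin(), it);
}
```

With the vector from the question, index_of(vec, "r") returns 9, which matches the expected output.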

Infinite loop on 'getline' in Visual Studio 2010

I am working through some C++ exercises in Visual Studio 2010, and I keep running into an infinite loop when I try to terminate the standard input stream with "CTRL-Z" while using the getline() function. Here is the relevant bit of code....
// find all the lines that refer to each word in the input
map<string, vector<int> >
xref(istream& in,
vector<string> find_words(const string&) = split)
{
string line;
int line_number = 0;
map<string, vector<int> > ret;
// read the next line
while (getline(in, line)) {
++line_number;
// break the input line into words
vector<string> words = find_words(line);
// remember that each word occurs on the current line
for (vector<string>::const_iterator it = words.begin();
it != words.end(); ++it)
ret[*it].push_back(line_number);
}
return ret;
}
...instead of kicking me out of the while loop, the program goes into an infinite loop printing a random integer. I'm pretty sure this is something specific to the Windows environment that I'm missing. Here's the entire code...
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>
#include "split.h"
using std::find_if;
using std::string;
using std::vector;
using std::isspace;
// `true' if the argument is whitespace, `false' otherwise
bool space(char c)
{
return isspace(c);
}
// `false' if the argument is whitespace, `true' otherwise
bool not_space(char c)
{
return !isspace(c);
}
vector<string> split(const string& str)
{
typedef string::const_iterator iter;
vector<string> ret;
iter i = str.begin();
while (i != str.end()) {
// ignore leading blanks
i = find_if(i, str.end(), not_space);
// find end of next word
iter j = find_if(i, str.end(), space);
// copy the characters in `[i,' `j)'
if (i != str.end())
ret.push_back(string(i, j));
i = j;
}
return ret;
}
#include <map>
#include <iostream>
#include <string>
#include <vector>
#include "split.h"
using std::cin; using std::cout;
using std::endl; using std::getline;
using std::istream; using std::string;
using std::vector; using std::map;
// find all the lines that refer to each word in the input
map<string, vector<int> >
xref(istream& in,
vector<string> find_words(const string&) = split)
{
string line;
int line_number = 0;
map<string, vector<int> > ret;
// read the next line
while (getline(in, line)) {
++line_number;
// break the input line into words
vector<string> words = find_words(line);
// remember that each word occurs on the current line
for (vector<string>::const_iterator it = words.begin();
it != words.end(); ++it)
ret[*it].push_back(line_number);
}
return ret;
}
int main()
{
// call `xref' using `split' by default
map<string, vector<int> > ret = xref(cin);
// write the results
for (map<string, vector<int> >::const_iterator it = ret.begin();
it != ret.end(); ++it) {
// write the word
cout << it->first << " occurs on line(s): ";
// followed by one or more line numbers
vector<int>::const_iterator line_it = it->second.begin();
cout << *line_it; // write the first line number
++line_it;
// write the rest of the line numbers, if any
while (line_it != it->second.end()) {
cout << ", " << *line_it;
++line_it;
}
// write a new line to separate each word from the next
cout << endl;
}
return 0;
}
I think instead of trying to make this work, I'd start by writing code I could understand (and for me to understand it, the code has to be fairly simple):
#include <map>
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <iterator>
#include "infix_iterator.h"
typedef std::map<std::string, std::vector<unsigned> > index;
namespace std {
ostream &operator<<(ostream &os, index::value_type const &i) {
os << i.first << ":\t";
std::copy(i.second.begin(), i.second.end(),
infix_ostream_iterator<unsigned>(os, ", "));
return os;
}
}
void add_words(std::string const &line, size_t num, index &i) {
std::istringstream is(line);
std::string temp;
while (is >> temp)
i[temp].push_back(num);
}
int main() {
index i;
std::string line;
size_t line_number = 0;
while (std::getline(std::cin, line))
add_words(line, ++line_number, i);
std::copy(i.begin(), i.end(),
std::ostream_iterator<index::value_type>(std::cout, "\n"));
return 0;
}
As (more or less) usual, this needs the infix_ostream_iterator I've posted elsewhere.
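For readers who do not have that header, the following is a minimal stand-in that should be enough to compile the code above. It is a sketch written for this example and not necessarily identical to the version referred to; the join_numbers helper is only there to show the behaviour.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Output iterator that writes the delimiter between items,
// not after the last one (unlike std::ostream_iterator).
template <typename T>
class infix_ostream_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    infix_ostream_iterator(std::ostream& os, const char* delimiter)
        : os_(&os), delimiter_(delimiter) {}

    // Assigning through the iterator writes the item to the stream.
    infix_ostream_iterator& operator=(const T& item) {
        if (!first_) *os_ << delimiter_;
        *os_ << item;
        first_ = false;
        return *this;
    }
    // Dereference and increment are no-ops, as for std::ostream_iterator.
    infix_ostream_iterator& operator*() { return *this; }
    infix_ostream_iterator& operator++() { return *this; }
    infix_ostream_iterator& operator++(int) { return *this; }

private:
    std::ostream* os_;
    std::string delimiter_;
    bool first_ = true;
};

// Illustrative helper: join line numbers with ", " between them.
std::string join_numbers(const std::vector<unsigned>& v) {
    std::ostringstream os;
    std::copy(v.begin(), v.end(), infix_ostream_iterator<unsigned>(os, ", "));
    return os.str();
}
```

Saved as infix_iterator.h (without the helper), this should let the program above print each word followed by its line numbers without a trailing separator.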