Boost Tokenizer strange symbols at start


#include <iostream>
#include <optional>
#include <string>
#include <boost/tokenizer.hpp>
int main() {
std::string a("http://website/some-path/,file1,file2");
char *ptr = (char *)a.c_str();
boost::char_separator<char> delim(",");
std::vector<std::string> pths{};
boost::tokenizer<boost::char_separator<char>> tokens(
std::string(ptr), delim);
std::optional<std::string> pref = std::nullopt;
for (const auto& tok : tokens) {
if (!pref) {
pref = tok;
std::cerr << "prfix is set: " << tok << std::endl;
continue;
}
pths.push_back(*pref + tok);
}
for(auto &t : pths) {
std::cout << t << std::endl;
}
}
My output:
prfix is set: �site/some-path/
�site/some-path/file1
�site/some-path/file2
The question is, what is wrong with the above? If I work with std::regex, it is fine.
EDIT: the scenario with *ptr is the one I actually had: the original string was passed to a function as char *, hence the above. This is to answer the comment by @273K.

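boost::tokenizer does not copy its input; it only stores iterators into the container you pass. In the question, std::string(ptr) is a temporary that is destroyed as soon as the declaration of tokens ends, so the loop reads memory that is already gone, which explains the garbage at the start. Tokenizing a string that outlives the tokenizer fixes that, and a lot can be simplified at the same time: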
#include <boost/tokenizer.hpp>
#include <iostream>
#include <optional>
#include <string>
#include <vector>
auto generate(std::string const& a) {
boost::tokenizer tokens(a, boost::char_separator<char>{","});
std::optional<std::string> prefix;
std::vector<std::string> result;
for (const auto& tok : tokens) {
if (!prefix)
prefix = tok;
else
result.push_back(*prefix + tok);
}
return result;
}
int main() {
for (auto& t : generate("http://website/some-path/,file1,file2"))
std::cout << t << std::endl;
}
Prints
http://website/some-path/file1
http://website/some-path/file2

Related

Can't iterate through all the words in the file.txt

I have a txt file that contains references to two other txt files, i.e. main.txt contains eg1.txt and eg2.txt. I have to read the content of those files, find the occurrences of every word, and return a string with each word and the documents it was present in (0 being eg1.txt and 1 being eg2.txt). My program compiles, but I can't get past the first word I encounter. It gives the right result for that word (word: 0 1), since the word is present in both files and in the first position, but it doesn't return the other words. Could someone please help me find the error? Thank you.
string func(string filename) {
map<string, set<int> > invInd;
string line, word;
int fileNum = 0;
ifstream list (filename, ifstream::in);
while (!list.eof()) {
string fileName;
getline(list, fileName);
ifstream input_file(fileName, ifstream::in); //function to iterate through file
if (input_file.is_open()) {
while (getline(input_file, line)) {
stringstream ss(line);
while (ss >> word) {
if (invInd.find(word) != invInd.end()) {
set<int>&s_ref = invInd[word];
s_ref.insert(fileNum);
}
else {
set<int> s;
s.insert(fileNum);
invInd.insert(make_pair<string, set<int> >(string(word) , s));
}
}
}
input_file.close();
}
fileNum++;
}

Basically your function works. It is a little bit complicated, but it works.
After removing some syntax errors, the main problem is that you return nothing from your function. There is also no output statement.
Let me show you the corrected function, which produces some output.
#include <string>
#include <map>
#include <iostream>
#include <fstream>
#include <set>
#include <sstream>
#include <utility>
using namespace std;
void func(string filename) {
map<string, set<int> > invInd;
string line, word;
int fileNum = 0;
ifstream list(filename, ifstream::in);
while (!list.eof()) {
string fileName;
getline(list, fileName);
ifstream input_file(fileName, ifstream::in); //function to iterate through file
if (input_file.is_open()) {
while (getline(input_file, line)) {
stringstream ss(line);
while (ss >> word) {
if (invInd.find(word) != invInd.end()) {
set<int>& s_ref = invInd[word];
s_ref.insert(fileNum);
}
else {
set<int> s;
s.insert(fileNum);
invInd.insert(make_pair(string(word), s));
}
}
}
input_file.close();
}
fileNum++;
}
// Show the output
for (const auto& [word, fileNumbers] : invInd) {
std::cout << word << " : ";
for (const int fileNumber : fileNumbers) std::cout << fileNumber << ' ';
std::cout << '\n';
}
return;
}
int main() {
func("files.txt");
}
This works, I tested it. But maybe you want to return the findings to your main function. Then you should write:
#include <string>
#include <map>
#include <iostream>
#include <fstream>
#include <set>
#include <sstream>
#include <utility>
using namespace std;
map<string, set<int> > func(string filename) {
map<string, set<int> > invInd;
string line, word;
int fileNum = 0;
ifstream list(filename, ifstream::in);
while (!list.eof()) {
string fileName;
getline(list, fileName);
ifstream input_file(fileName, ifstream::in); //function to iterate through file
if (input_file.is_open()) {
while (getline(input_file, line)) {
stringstream ss(line);
while (ss >> word) {
if (invInd.find(word) != invInd.end()) {
set<int>& s_ref = invInd[word];
s_ref.insert(fileNum);
}
else {
set<int> s;
s.insert(fileNum);
invInd.insert(make_pair(string(word), s));
}
}
}
input_file.close();
}
fileNum++;
}
return invInd;
}
int main() {
map<string, set<int>> data = func("files.txt");
// Show the output
for (const auto& [word, fileNumbers] : data) {
std::cout << word << " : ";
for (const int fileNumber : fileNumbers) std::cout << fileNumber << ' ';
std::cout << '\n';
}
}
Please enable C++17 in your compiler (the structured bindings in the output loop need it).
And please see below a brushed-up solution: a little cleaner and more compact, with comments and better variable names.
#include <string>
#include <map>
#include <iostream>
#include <fstream>
#include <set>
#include <sstream>
#include <utility>
using WordFileIndicator = std::map<std::string, std::set<int>>;
WordFileIndicator getWordsWithFiles(const std::string& fileNameForFileLists) {
// Here we will store the resulting output
WordFileIndicator wordFileIndicator{};
// Open the file and check, if it could be opened
if (std::ifstream istreamForFileList{ fileNameForFileLists }; istreamForFileList) {
// File number Reference
int fileNumber{};
// Read all filenames from the list of filenames
for (std::string fileName{}; std::getline(istreamForFileList, fileName) and not fileName.empty();) {
// Open the files to read their content. Check, if the file could be opened
if (std::ifstream ifs{ fileName }; ifs) {
// Add word and associated file number to set
for (std::string word{}; ifs >> word; )
wordFileIndicator[word].insert(fileNumber);
}
else std::cerr << "\n*** Error: Could not open '" << fileName << "'\n\n";
// Continue with next file
++fileNumber;
}
}
else std::cerr << "\n*** Error: Could not open '" << fileNameForFileLists << "'\n\n";
return wordFileIndicator;
}
// Some test code
int main() {
// Get result. All words and in which files they exist
WordFileIndicator data = getWordsWithFiles("files.txt");
// Show the output
for (const auto& [word, fileNumbers] : data) {
std::cout << word << " : ";
for (const int fileNumber : fileNumbers) std::cout << fileNumber << ' ';
std::cout << '\n';
}
}
Using std::unordered_map and std::unordered_set instead would likely be faster for large inputs, at the cost of losing the sorted output order.

Please make sure your code is composed of many small functions. This improves readability, makes it easier to reason about what the code does, and lets parts of the code be reused in other contexts.
Here is a demo of how it can look and why it is better to have small functions:
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>
struct FileData
{
std::filesystem::path path;
int index;
};
bool operator==(const FileData& a, const FileData& b)
{
return a.index == b.index && a.path == b.path;
}
bool operator!=(const FileData& a, const FileData& b)
{
return !(a == b);
}
using WordLocations = std::unordered_map<std::string, std::vector<FileData>>;
template<typename T>
void mergeWordsFrom(WordLocations& loc, const FileData& fileData, T b, T e)
{
for (; b != e; ++b)
{
auto& v = loc[*b];
if (v.empty() || v.back() != fileData)
v.push_back(fileData);
}
}
void mergeWordsFrom(WordLocations& loc, const FileData& fileData, std::istream& in)
{
return mergeWordsFrom(loc, fileData, std::istream_iterator<std::string>{in}, {});
}
void mergeWordsFrom(WordLocations& loc, const FileData& fileData)
{
std::ifstream f{fileData.path};
return mergeWordsFrom(loc, fileData, f);
}
template<typename T>
WordLocations wordLocationsFromFileList(T b, T e)
{
WordLocations loc;
FileData fileData{{}, 0};
for (; b != e; ++b)
{
++fileData.index;
fileData.path = *b;
mergeWordsFrom(loc, fileData);
}
return loc;
}
WordLocations wordLocationsFromFileList(std::istream& in)
{
return wordLocationsFromFileList(std::istream_iterator<std::filesystem::path>{in}, {});
}
WordLocations wordLocationsFromFileList(const std::filesystem::path& p)
{
std::ifstream f{p};
f.exceptions(std::ifstream::badbit);
return wordLocationsFromFileList(f);
}
void printLocations(std::ostream& out, const WordLocations& locations)
{
for (auto& [word, filesData] : locations)
{
out << std::setw(10) << word << ": ";
for (auto& file : filesData)
{
out << std::setw(3) << file.index << ':' << file.path << ", ";
}
out << '\n';
}
}
int main()
{
auto locations = wordLocationsFromFileList("files.txt");
printLocations(std::cout, locations);
}
https://wandbox.org/permlink/nBbqYV986EsqvN3t

How to use boost::hash to get the file content hash?

Is it possible to use the boost::hash function to generate a file content hash with a fixed length, like MD5?
Is there a quick solution for this?
If not, what is the simplest way?

No, Boost doesn't implement MD5. Use a crypto/hash library for this.
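Crypto++ is nice in my experience.
OpenSSL implements all the popular digests. Here's a sample that uses OpenSSL (note that the MD5_Init/MD5_Update/MD5_Final functions are deprecated since OpenSSL 3.0 in favor of the EVP interface):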
#include <openssl/md5.h>
#include <iostream>
#include <iomanip>
// Print the MD5 sum as hex-digits.
void print_md5_sum(unsigned char* md) {
for(unsigned i=0; i <MD5_DIGEST_LENGTH; i++) {
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(md[i]);
}
}
#include <string>
#include <vector>
#include <fstream>
int main(int argc, char *argv[]) {
using namespace std;
vector<string> const args(argv+1, argv+argc);
for (auto& fname : args) {
MD5_CTX ctx;
MD5_Init(&ctx);
ifstream ifs(fname, std::ios::binary);
char file_buffer[4096];
while (ifs.read(file_buffer, sizeof(file_buffer)) || ifs.gcount()) {
MD5_Update(&ctx, file_buffer, ifs.gcount());
}
unsigned char digest[MD5_DIGEST_LENGTH] = {};
MD5_Final(digest, &ctx);
print_md5_sum(digest);
std::cout << "\t" << fname << "\n";
}
}

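Boost has such functionality, at least now. Note that it lives in the detail namespace of Boost.Uuid, so it is presumably not intended as a public API: https://www.boost.org/doc/libs/master/libs/uuid/doc/uuid.html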
#include <iostream>
#include <algorithm>
#include <iterator>
#include <string>
#include <boost/uuid/detail/md5.hpp>
#include <boost/algorithm/hex.hpp>
using boost::uuids::detail::md5;
std::string toString(const md5::digest_type &digest)
{
const auto charDigest = reinterpret_cast<const char *>(&digest);
std::string result;
boost::algorithm::hex(charDigest, charDigest + sizeof(md5::digest_type), std::back_inserter(result));
return result;
}
int main ()
{
std::string s;
while(std::getline(std::cin, s)) {
md5 hash;
md5::digest_type digest;
hash.process_bytes(s.data(), s.size());
hash.get_digest(digest);
std::cout << "md5(" << s << ") = " << toString(digest) << '\n';
}
return 0;
}

replacing a string in a vector without positioning

In the code I am working on now, I have a vector that loads itself from a txt file. I was trying to see if there is a way to replace certain words in the vector without needing a position or anything.
For example, if the txt contained a list of animals and I wanted to change bird to book, how would I do that without needing the position of the letters?
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
using namespace std;
vector <string> test;
int main()
{
string file;
fstream fout( "Vector.txt" );
while ( !fout.eof())
{
getline(fout,file);
test.push_back(file);
}
fout.close();
for( int i = 0; i < test.size(); i++)
{
cout << test[i] << endl;
}
system("pause");
}
txt contains:
dog
cat
bird
hippo
wolf

Use std::transform().
std::string bird2book(const string &str)
{
if (str == "bird")
return "book";
return str;
}
std::transform(test.begin(), test.end(), test.begin(), bird2book);

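You can use std::replace. Pass std::string values rather than two string literals: with literals the call only compiles when both have the same length, because both arguments must deduce to the same type.
std::replace(test.begin(), test.end(), std::string("bird"), std::string("book"));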

Try this:
typedef std::istream_iterator<string> isitr;
ifstream fin("Vector.txt");
vector <string> test{ isitr{fin}, isitr{} }; // vector contains strings
map<string,string> dict{ // replacements dictionary
{"bird", "book"}, {"cat", "kitten"}
};
for(auto& x: test) // x must be a reference
{
auto itr = dict.find(x);
if(itr != dict.end()) // if a match was found
x = itr->second; // replace x with the found replacement
// (this is why x must be a reference)
}
for(const auto& x: test)
cout << x << " ";

Use STL!! It's our power. Everything you need:
#include <iostream>
#include <algorithm>
#include <iterator>
#include <vector>
#include <string>
#include <fstream>
#include <map>
int main()
{
std::vector<std::string> words;
const std::map<std::string, std::string> words_to_replace{
{ "bird", "book" }, { "cat", "table" }
};
auto end = words_to_replace.cend();
// istream_iterator binds to a non-const lvalue stream, so the ifstream needs a name
std::ifstream input{ "file.txt" };
std::transform(
std::istream_iterator<std::string>{ input },
std::istream_iterator<std::string>(),
std::back_inserter(words),
[&](const std::string& word) {
auto word_pos = words_to_replace.find(word);
return (word_pos != end) ? word_pos->second : word;
});
std::copy(words.cbegin(), words.cend(),
std::ostream_iterator<std::string>(std::cout, "\n"));
std::cout << std::endl;
}

string iterator incompatible for reading eachline

I have an std::ostringstream.
I would like to iterate for each line of this std::ostringstream.
I use boost::tokenizer :
std::ostringstream HtmlStream;
.............
typedef boost::tokenizer<boost::char_separator<char> > line_tokenizer;
line_tokenizer tok(HtmlStream.str(), boost::char_separator<char>("\n\r"));
for (line_tokenizer::const_iterator i = tok.begin(), end = tok.end(); i != end; ++i)
{
std::string str = *i;
}
On the line
for (line_tokenizer::const_iterator i = tok.begin(), end = tok.end(); i != end;
I have an assert error with "string iterator incompatible".
I have read about this error on Google and on Stack Overflow too, but I have difficulty finding my error.
Could anyone help me, please?
Thanks a lot,
Best regards,
Nixeus

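The assertion most likely fires because HtmlStream.str() returns a temporary std::string, while boost::tokenizer only stores iterators into it. Once the declaration of tok ends, the temporary is destroyed and the iterators dangle. Storing the result of str() in a named variable that outlives the tokenizer avoids that.
Beyond that, I like to make it non-copying for efficiency/error reporting: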
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include <iostream>
#include <vector>
int main()
{
auto const& s = "hello\r\nworld";
std::vector<boost::iterator_range<char const*>> lines;
boost::split(lines, s, boost::is_any_of("\r\n"), boost::token_compress_on);
for (auto const& range : lines)
{
std::cout << "at " << (range.begin() - s) << ": '" << range << "'\n";
};
}
Prints:
at 0: 'hello'
at 7: 'world'
This is more efficient than most of the alternatives shown. Of course, if you need more parsing capabilities, consider Boost Spirit:
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::string const s = "hello\r\nworld";
std::vector<std::string> lines;
{
using namespace boost::spirit::qi;
auto f(std::begin(s)),
l(std::end(s));
bool ok = parse(f, l, *(char_-eol) % eol, lines);
}
for (auto const& range : lines)
{
std::cout << "'" << range << "'\n";
};
}

how to split a string value that contains characters and numbers

I have a std::string s = "n8Name4Surname". How can I get the name and the surname as two separate strings? Thanks.

One way to do this is using Boost.Tokenizer. See this example:
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/foreach.hpp>
int main()
{
using namespace std;
using namespace boost;
string text="n8Name4Surname.";
char_separator<char> sep("0123456789");
tokenizer<char_separator<char> > tokens(text, sep);
string name, surname;
int count = 0;
BOOST_FOREACH(const string& s, tokens)
{
if(count == 1)
{
name = s;
}
if(count == 2)
{
surname = s;
}
++count;
}
}
EDIT
If you put the results in a vector, it's even less code:
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/foreach.hpp>
#include <algorithm>
#include <iterator>
#include <vector>
int main()
{
using namespace std;
using namespace boost;
string text="n8Name4Surname.";
char_separator<char> sep("0123456789");
tokenizer<char_separator<char> > tokens(text, sep);
vector<string> names;
tokenizer<char_separator<char> >::iterator iter = tokens.begin();
++iter;
if(iter != tokens.end())
{
copy(iter, tokens.end(), back_inserter(names));
}
}

You can detect numerical characters in the string using the function isdigit(mystring.at(position)), then extract the substrings between those positions.
See:
http://www.cplusplus.com/reference/clibrary/cctype/isdigit/

Use Boost tokenizer with the digits 0-9 as delimiters. Then, throw away the string containing "n". It's overkill, I realize...

Simple STL approach:
#include <string>
#include <vector>
#include <iostream>
int main()
{
std::string s= "n8Name4Surname";
std::vector<std::string> parts;
const char digits[] = "0123456789";
std::string::size_type from=0, to=std::string::npos;
do
{
from = s.find_first_of(digits, from);
if (std::string::npos != from)
from = s.find_first_not_of(digits, from);
if (std::string::npos != from)
{
to = s.find_first_of(digits, from);
if (std::string::npos == to)
parts.push_back(s.substr(from));
else
parts.push_back(s.substr(from, to-from));
from = to; // could be npos
}
} while (std::string::npos != from);
for (int i=0; i<parts.size(); i++)
std::cout << i << ":\t" << parts[i] << std::endl;
return 0;
}

Mandatory Boost Spirit sample:
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
int main()
{
std::string s= "n8Name4Surname";
std::string::const_iterator b(s.begin()), e(s.end());
std::string ignore, name, surname;
using namespace boost::spirit::qi;
rule<std::string::const_iterator, space_type, char()>
digit = char_("0123456789"),
other = (char_ - digit);
if (phrase_parse(b, e, *other >> +digit >> +other >> +digit >> +other, space, ignore, ignore, name, ignore, surname))
{
std::cout << "name = " << name << std::endl;
std::cout << "surname = " << surname << std::endl;
}
return 0;
}
