Comparing vector of strings to a string - c++

I haven't coded this bit up yet, because I'm not sure of which is the best method to tackle this.
For starters, what the program does now is simply put the names of all the files in the same directory as the program into an array of strings and then print that array out.
What I want to do is sort these by file extension. There will be a list of particular extensions for the user to choose from, after which all files with that extension in the folder will be returned to the user.
I'm just not sure how to go about that. The first thing that comes to mind is to iterate through the vector and compare each string to another string with the desired extension, and if there is a match, push that string into another vector specific to that file extension. There are only 5 extensions I'm looking for, so it's not like I would have to make a whole ton of new vectors.
Alternatively, I thought it might make sense to never populate the original vector: take the user's request first, then iterate through the files and push all files with matching extensions into a specific vector. Once done, if they choose another option, the vector will simply be cleared and re-populated with the new file names.
Any tips on how to go about actually doing the comparison? I'm not that good with C++ syntax. Also, would it be wise to use a different type of container?
Thanks a lot for any and all advice you guys are willing to throw my way, it's greatly appreciated!
#include <iostream>
#include <filesystem>
#include <vector>
using namespace std;
using namespace std::tr2::sys;
void scan( path f, unsigned i = 0 )
{
string indent(i,'\t');
cout << indent << "Folder = " << system_complete(f) << endl;
directory_iterator d( f );
directory_iterator e;
vector<string>::iterator it1;
std::vector<string> fileNames;
for( ; d != e; ++d )
{
fileNames.push_back(d->path());
//print out contents without use of an array
/*cout << indent <<
d->path() << (is_directory( d->status() ) ? " [dir]":"") <<
endl;*/
//if I want to go into subdirectories
/*if( is_directory( d->status() ) )
scan( f / d->path(), i + 1 );*/
}
for(it1 = fileNames.begin(); it1 != fileNames.end(); it1++)
{
cout << *it1 << endl;
}
}
int main()
{
path folder = "..";
cout << folder << (is_directory( folder ) ? " [dir]":"") << endl;
scan( folder );
}

You don't mean 'sort', you mean 'filter'. Sort means something else entirely.
Your second option seems the best; why do the extra work with two vectors?
As for the comparison, the difficulty is that the thing you are looking for is at the end of the string, and most searching functions operate from the start of the string. But there is a handy thing in C++ called a reverse iterator which scans a string backwards from the end, not forwards from the start. You call rbegin() and rend() to get a string's reverse iterators. Here's a comparison function using reverse iterators.
#include <algorithm>
#include <string>
// return true if file ends with ext, false otherwise
bool ends_with(const std::string& file, const std::string& ext)
{
return file.size() >= ext.size() && // file must be at least as long as ext
// check strings are equal starting at the end
std::equal(ext.rbegin(), ext.rend(), file.rbegin());
}
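As a rough sketch of how that comparison could drive the filtering step, the snippet below copies every matching name into a fresh vector. The helper name filter_by_extension is made up for illustration, and ends_with is repeated only so the snippet compiles on its own.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Same check as the ends_with function above, repeated so this sketch is self-contained.
bool ends_with(const std::string& file, const std::string& ext)
{
    return file.size() >= ext.size() &&
           std::equal(ext.rbegin(), ext.rend(), file.rbegin());
}

// Hypothetical helper: copy every name that ends with ext into a new vector.
std::vector<std::string> filter_by_extension(const std::vector<std::string>& names,
                                             const std::string& ext)
{
    std::vector<std::string> matches;
    std::copy_if(names.begin(), names.end(), std::back_inserter(matches),
                 [&ext](const std::string& name) { return ends_with(name, ext); });
    return matches;
}
```

Calling filter_by_extension(fileNames, ".txt") each time the user picks an extension would give the "clear and re-populate" behaviour from the second option without keeping one vector per extension.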

Related

My program returning the set_intersection value of two text files containing 479k words each is really slow. Is it my code?

I wrote a program to compare two text files containing all of the words in the dictionary (one forwards and one backwards). The idea is that when the text file containing all of the backwards words is compared with the forwards words, any matches will indicate that those words can be spelled both forwards and backwards and will return all palindromes as well as any words that spell both a word backwards and forwards.
The program works and I've tested it on three different file sizes. The first set contain only two words, just for testing purposes. The second contains 10,000 English words (in each text file), and the third contains all English words (~479k words). When I run the program calling on the first set of text files, the result is almost instantaneous. When I run the program calling on the set of text files containing 10k words, it takes a few hours. However, when I run the program containing the largest files (479k words), it ran for a day and returned only about 30 words, when it should have returned thousands. It didn't even finish and was nowhere near finishing (and this was on a fairly decent gaming PC).
I have a feeling it has to do with my code. It must be inefficient.
There are two things that I've noticed:
When I run: cout << "token: " << *it << std::endl; it runs endlessly on a loop forever and never stops. Could this be eating up processing power?
I commented out sorting because all my data is already sorted. I noticed that the second I did this, the program running 10,000 word text files sped up.
However, even after doing these things there seemed to be no real change in speed in the program calling on the largest text files. Any advice? I'm kinda new at this. Thanks~
*Please let me know if you'd like a copy of the text files and I'd happily upload them. Thanks
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <boost/tokenizer.hpp>
typedef boost::char_separator<char> separator_type;
using namespace std;
using namespace boost;
int main()
{
fstream file1; //fstream variable files
fstream file2; //fstream variable files
string dictionary1;
string dictionary2;
string words1;
string words2;
dictionary1 = "Dictionary.txt";
// dictionary1 = "Dictionarytenthousand.txt";
// dictionary1 = "Twoworddictionary.txt"; //this dictionary contains only two words separated by a comma as a test
dictionary2 = "Backwardsdictionary.txt";
// dictionary2 = "Backwardsdictionarytenthousand.txt";
// dictionary2 = "Backwardstwoworddictionary.txt"; //this dictionary contains only two words separated by a comma as a test
file1.open(dictionary1.c_str()); //opening Dictionary.txt
file2.open(dictionary2.c_str()); //opening Backwardsdictionary.txt
if (!file1)
{
cout << "Unable to open file1"; //terminate with error
exit(1);
}
if (!file2)
{
cout << "Unable to open file2"; //terminate with error
exit(1);
}
while (getline(file1, words1))
{
while (getline(file2, words2))
{
boost::tokenizer<separator_type> tokenizer1(words1, separator_type(",")); //separates string in Twoworddictionary.txt into individual words for compiler (comma as delimiter)
auto it = tokenizer1.begin();
while (it != tokenizer1.end())
{
std::cout << "token: " << *it << std::endl; //test to see if tokenizer works before program continues
vector<string> words1Vec; // vector to store Twoworddictionary.txt strings in
words1Vec.push_back(*it++); // adds elements dynamically onto the end of the vector
boost::tokenizer<separator_type> tokenizer2(words2, separator_type(",")); //separates string in Backwardstwoworddictionary.txt into individual words for compiler (comma as delimiter)
auto it2 = tokenizer2.begin();
while (it2 != tokenizer2.end())
{
std::cout << "token: " << *it2 << std::endl; //test to see if tokenizer works before program continues
vector<string> words2Vec; //vector to store Backwardstwoworddictionary.txt strings in
words2Vec.push_back(*it2++); //adds elements dynamically onto the end of the vector
vector<string> matchingwords(words1Vec.size() + words2Vec.size()); //vector to store elements from both dictionary text files (and ultimately to store the intersection of both, i.e. the matching words)
//sort(words1Vec.begin(), words1Vec.end()); //set intersection requires its inputs to be sorted
//sort(words2Vec.begin(), words2Vec.end()); //set intersection requires its inputs to be sorted
vector<string>::iterator it3 = set_intersection(words1Vec.begin(), words1Vec.end(), words2Vec.begin(), words2Vec.end(), matchingwords.begin()); //finds the matching words from both dictionaries
matchingwords.erase(it3, matchingwords.end());
for (vector<string>::iterator it4 = matchingwords.begin(); it4 < matchingwords.end(); ++it4) cout << *it4 << endl; // returns matching words
}
}
}
}
file1.close();
file2.close();
return 0;
}
Stop writing using namespace. Type the extra std:: and boost:: prefixes.
Have code do one thing. Your code isn't doing what you claim it does, probably because you are doing 4 things at once and getting confused.
Then glue the code together.
std::getline supports arbitrary delimiters. Use it with ','.
Write code that converts a file into a vector of strings.
std::vector<std::string> getWords(std::string filename);
then test it works. You are doing this wrong in your code posted above, in that you are making length 1 vectors and tossing them.
That will remove about half of your code.
Next, for set_intersection, use std::back_inserter and an empty vector as your output. Like (blah begin, blah end, foo begin, foo end, std::back_inserter(vec3)). It will call push_back with each result.
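In code:
// needs <algorithm>, <fstream>, <iterator>, <string>, <vector>
std::vector<std::string> loadWords(const std::string& filename)
{
    std::ifstream file(filename);
    std::vector<std::string> retval;
    // read one comma-separated word at a time
    for (std::string str; std::getline(file, str, ','); )
        retval.push_back(str);
    return retval;
}
std::vector<std::string> intersect(const std::string& file1, const std::string& file2)
{
    auto v1 = loadWords(file1);
    auto v2 = loadWords(file2);
    // std::set_intersection requires both inputs to be sorted
    std::sort(begin(v1), end(v1));
    std::sort(begin(v2), end(v2));
    std::vector<std::string> v3;
    std::set_intersection(begin(v1), end(v1), begin(v2), end(v2), std::back_inserter(v3));
    return v3;
}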
and done.
Also stop it with the C++03 loops.
for(auto& elem:vec)
std::cout<<elem<<'\n';
is far clearer and less error prone than manually futzing with iterators.
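To try the idea without any files, here is a small sketch that works on an in-memory, comma-separated string. It is an illustration under assumptions, not the asker's program: the names split_words and reversible_words are made up, and the backwards list is built by reversing each word in memory instead of reading a second file.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Split comma-separated text into words using std::getline with ',' as the delimiter.
std::vector<std::string> split_words(const std::string& text)
{
    std::vector<std::string> words;
    std::istringstream in(text);
    for (std::string word; std::getline(in, word, ','); )
        words.push_back(word);
    return words;
}

// Return every word whose reversal is also in the list (palindromes included).
std::vector<std::string> reversible_words(const std::string& text)
{
    std::vector<std::string> forward = split_words(text);
    std::vector<std::string> backward = forward;
    for (std::string& word : backward)
        std::reverse(word.begin(), word.end());

    // Both ranges must be sorted before std::set_intersection is called.
    std::sort(forward.begin(), forward.end());
    std::sort(backward.begin(), backward.end());

    std::vector<std::string> result;
    std::set_intersection(forward.begin(), forward.end(),
                          backward.begin(), backward.end(),
                          std::back_inserter(result));
    return result;
}
```

Each list is loaded and sorted once and then intersected in a single pass, instead of re-tokenizing and intersecting inside nested loops.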

Searching for files in a directory by name using Visual Studio C++

I'm trying to create a program where I can search for some files in a directory on my PC, using Visual Studio C++.
As I'm not very experienced with that, I found this code (below) in another answer but couldn't find any explanation to the code.
I'm having a hard time figuring it out and would strongly appreciate any help possible.
If there's another way of doing this I would be pleased to know how.
Thank you!
"
Now you can get file names. Just compare a file name.
while ((dirp = readdir(dp)) != NULL) {
std::string fname = dirp->d_name;
if(fname.find("abc") != std::string::npos)
files.push_back(fname);
}
Also you can use scandir function which can register filter function.
static int filter(const struct dirent* dir_ent)
{
if (!strcmp(dir_ent->d_name, ".") || !strcmp(dir_ent->d_name, ".."))
return 0;
std::string fname = dir_ent->d_name;
if (fname.find("abc") == std::string::npos) return 0;
return 1;
}
int main()
{
struct dirent **namelist;
std::vector<std::string> v;
std::vector<std::string>::iterator it;
n = scandir( dir_path , &namelist, *filter, alphasort );
for (int i=0; i<n; i++) {
std::string fname = namelist[i]->d_name;
v.push_back(fname);
free(namelist[i]);
}
free(namelist);
return 0;
}
"
A better way of doing this would probably be using the new std::filesystem library. directory_iterators allow you to go through the contents of a directory. Since they are just iterators, you can combine them with standard algorithms like std::find_if to search for a particular entry:
#include <filesystem>
#include <algorithm>
namespace fs = std::filesystem;
void search(const fs::path& directory, const fs::path& file_name)
{
auto d = fs::directory_iterator(directory);
auto found = std::find_if(d, end(d), [&file_name](const auto& dir_entry)
{
return dir_entry.path().filename() == file_name;
});
if (found != end(d))
{
// we have found what we were looking for
}
// ...
}
We first create a directory_iterator d for the directory in which we want to search. We then use std::find_if() to go through the contents of the directory and search for an entry that matches the filename we are looking for. std::find_if() expects a function object as last argument that is applied to every visited element and returns true if the element matches what we are looking for. std::find_if() returns the iterator to the first element for which this predicate function returns true, otherwise it returns the end iterator. Here, we use a lambda as predicate that returns true when the filename component of the path of the directory entry we're looking at matches the wanted filename. Afterwards, we compare the iterator returned by std::find_if() to the end iterator to see if we have found an entry or not. In case we did find an entry, *found will evaluate to a directory_entry representing the respective file system object.
Note that this will require a recent version of Visual Studio 2017. Don't forget to set the language standard to /std:c++17 or /std:c++latest in the project properties (C++/Language).
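If you want every entry whose name contains some text (like the "abc" check in the quoted code) rather than one exact match, a range-based for loop over the directory_iterator is probably enough. This is only a sketch: find_containing is a made-up name, it does not recurse into subdirectories, and it assumes C++17.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: collect every entry in directory whose file name contains needle.
std::vector<fs::path> find_containing(const fs::path& directory, const std::string& needle)
{
    std::vector<fs::path> matches;
    for (const fs::directory_entry& entry : fs::directory_iterator(directory))
    {
        // filename() drops the leading directories, so only the last component is searched
        if (entry.path().filename().string().find(needle) != std::string::npos)
            matches.push_back(entry.path());
    }
    return matches;
}
```

Note that fs::directory_iterator throws fs::filesystem_error if the directory cannot be opened, so a caller may want to wrap the call in a try block.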
Both methods use the find function of a std::string:
fname.find("abc")
This looks for "abc" in the fname string. If it's found it returns the index it starts at, otherwise it returns std::string::npos, so they both check for that substring.
You may want to see if you have an exact match, using == instead. It depends.
If an appropriate filename is found, it's pushed back into a vector.
Your main function has
std::vector<std::string>::iterator it;
which it doesn't use.
I suspect that came over with some copy/paste.
You can use a range based for loop to see what's in your vector:
for(const std::string & name : v)
{
std::cout << name << '\n';
}
The filter function also checks against "." and ".." since these have special meanings - current dir and up one dir.
At that point, the C API has returned a char *, so they use strcmp, rather than std::string methods.
Edit:
n = scandir( dir_path , &namelist, *filter, alphasort );
uses n which you haven't declared.
Try
int n = scandir( dir_path , &namelist, *filter, alphasort );
Also, that uses dir_path which needs declaring somewhere.
For a quick fix, try
const char * dir_path = "C:\\";
(or whatever path you want, watching out for escaping backslashes with an extra backslash).
You probably want to pass this in as an arg to main.

Remove specific format of string from a List?

I'm writing a program for an Arduino that takes information in a sort of NMEA format, read from a .txt file and stored in a List< String >. I need to strip out strings that begin with certain prefixes ($GPZDA, $GPGSA, $GPGSV) because these are useless to me. I only need $GPRMC and $GPGGA, which contain a basic time stamp and the location, which is all I'm using anyway. I'm looking to use as few external libraries (SPRINT, BOOST) as possible, as the DUE doesn't have a fantastic amount of space as-is.
All I really need is a method to remove lines from the LIST<STRING> that don't start with a specific prefix. Any ideas?
The method I'm currently using seems to have replaced the whole output with one specific string yet kept the file length/size the same (1676 and 2270, respectively). These outputs are produced by two while statements that put the two input files into List<STRING>.
Below is a small snippet from what I'm trying to use, which is supposed to sort the file into the correct order (working: the lines are currently ordered by their numerical value, which works well for the time, the second field in the string). However, ".unique();" appears to have taken each "unique" value and replaced all the others with it, so now I have a 1676-line list that basically goes 1,1,1,2,2,2,3,3,4... 1676 ???
while (std::getline(GPS1,STRLINE1)){
ListOne.push_back("GPS1: " + STRLINE1 + "\n");
ListOne.sort();
ListOne.unique();
std::cout << ListOne.back() << std::endl;
GPSO1 << ListOne.back();
}
Thanks
If I understand correctly, you want some sort of whitelist of prefixes.
You could use remove_if to drop everything else, with a small function that checks whether one of the prefixes fits (using mismatch, like here), for example:
#include <iostream>
#include <algorithm>
#include <string>
#include <list>
using namespace std;
int main() {
    list<string> l = {"aab", "aac", "abb", "123", "aaw", "wws"};
    list<string> whiteList = {"aa", "ab"};
    // remove_if moves the kept items to the front; new_end marks where the leftovers start
    auto new_end = remove_if(l.begin(), l.end(), [&whiteList](const string& item)
    {
        for (const auto& s : whiteList)
        {
            // four-iterator mismatch (C++14) stays in bounds when item is shorter than s
            auto res = mismatch(s.begin(), s.end(), item.begin(), item.end());
            if (res.first == s.end()) {
                return false; // found allowed prefix
            }
        }
        return true;
    });
    l.erase(new_end, l.end()); // actually drop the rejected items
    for (const auto& item : l) {
        cout << item << endl;
    }
    return 0;
}
(demo)
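Applied to the NMEA case, the same idea might look like the sketch below. It uses std::list's own remove_if member, which erases in place, and needs nothing beyond the standard library. The names starts_with and keep_prefixed are made up for illustration.

```cpp
#include <algorithm>
#include <list>
#include <string>
#include <vector>

// True when line begins with prefix; compare() never reads past the end of line.
bool starts_with(const std::string& line, const std::string& prefix)
{
    return line.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: erase, in place, every line that starts with none of the prefixes.
void keep_prefixed(std::list<std::string>& lines, const std::vector<std::string>& prefixes)
{
    lines.remove_if([&prefixes](const std::string& line) {
        return std::none_of(prefixes.begin(), prefixes.end(),
                            [&line](const std::string& p) { return starts_with(line, p); });
    });
}
```

Something like keep_prefixed(ListOne, {"$GPRMC", "$GPGGA"}) would then be called once, after the whole file has been read, rather than inside the read loop.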

How does one extract the sequence of parsed options using Boost Program Options?

I'm building a graph generator using Boost Graph and Program Options. There are, for example, two types of components C and W, each with 1 source, 1 sink and some additional parameters to specify topology in between. I'd like to be able to stitch them together in the sequence provided by the order of the command line arguments.
For example:
./bin/make_graph -c4,5,1 -w3,3 -c3,1,2
Should create a graph resembling the following:
C -- W -- C
But:
./bin/make_graph -c4,5,1 -c3,1,2 -w3,3
Should create a graph resembling the following:
C -- C -- W
Using boost::program_options, I was unable to determine how to extract the exact order since it "composes" the options of the same string_key into a map with value_type == vector< string > (in my case).
By iterating over the map, the order is lost. Is there a way to not duplicate the parsing, but have a function called (perhaps a callback) every time an option is parsed? I couldn't find documentation in this direction. Any other suggestions?
To convince you that I'm not making this up, here's what I have so far:
namespace bpo = boost::program_options;
std::vector<std::string> args_cat, args_grid, args_web;
bpo::options_description desc("Program options:");
desc.add_options()
.operator ()("help,h","Displays this help message.")
.operator ()("caterpillar,c",bpo::value< std::vector<std::string> >(&args_cat)->default_value( std::vector<std::string>(1,"4,7,2"), "4,7,2" ),"Caterpillar tree with 3 parameters")
.operator ()("grid,g",bpo::value< std::vector<std::string> >(&args_grid)->default_value( std::vector<std::string>(1,"3,4"), "3,4" ),"Rectangular grid with 2 parameters")
.operator ()("web,w",bpo::value< std::vector<std::string> >(&args_web)->default_value( std::vector<std::string>(1,"3,4"), "3,4" ),"Web with 2 parameters")
;
bpo::variables_map ops;
bpo::store(bpo::parse_command_line(argc,argv,desc),ops);
bpo::notify(ops);
if((argc < 2) || (ops.count("help"))) {
std::cout << desc << std::endl;
return;
}
//TODO: remove the following scope block after testing
{
typedef bpo::variables_map::iterator OptionsIterator;
OptionsIterator it = ops.options.begin(), it_end = ops.options.end();
while(it != it_end) {
std::cout << it->first << ": ";
BOOST_FOREACH(std::string value, it->second) {
std::cout << value << " ";
}
std::cout << std::endl;
++it;
}
return;
}
I realize that I could also include the type as a parameter and solve this problem trivially, e.g.:
./bin/make_graph --component c,4,5,1 --component w,3,3 --component c,3,1,2
but that's moving in the direction of writing a parser/validator myself (maybe even without using Boost Program Options):
./bin/make_graph --custom c,4,5,1,w,3,3,c,3,1,2
./bin/make_graph c,4,5,1,w,3,3,c,3,1,2
How would you guys recommend I do this in an elegant way?
Thanks in advance!
PS: I've searched on SO for "[boost] +sequence program options" and "[boost-program-options] +order" (and their variants) before posting this, so I apologize in advance if this turns out to be a duplicate.
Since posting the question, I did some digging and have a "hack" that works with the existing examples I had above.
bpo::parsed_options p_ops = bpo::parse_command_line(argc,argv,desc);
typedef std::vector< bpo::basic_option<char> >::iterator OptionsIterator;
OptionsIterator it = p_ops.options.begin(), it_end = p_ops.options.end();
while(it != it_end) {
std::cout << it->string_key << ": ";
BOOST_FOREACH(std::string value, it->value) {
std::cout << value << " ";
}
std::cout << std::endl;
++it;
}
The reason I call it a hack is because it accesses all arguments as strings, and one would have to extract the types from it much like bpo::variables_map does with the .as<T>() member function. EDIT: It also accesses a member of the options struct directly.
How about this:
./bin/make_graph c,4,5,1 c,3,1,2 w,3,3
Where "c,4,5,1", "c,3,1,2" and "w,3,3" are positional arguments which are stored (in order) in a std::vector<std::string> (just like --input-file in this tutorial). Then use Boost.Tokenizer or boost::algorithm::split to extract the subtokens from each argument string.
If the graphs can be complex, you should consider making it possible for the user to specify an input file that contains the graph parameters. Boost.Program_Options can parse a user config file that uses the same syntax as the command line options.
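Here is a rough sketch of the splitting step for one positional argument. To keep it dependency-free it uses std::getline with ',' instead of Boost.Tokenizer or boost::algorithm::split, and the ComponentSpec struct and parse_component function are hypothetical names, not part of Boost.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one positional argument such as "c,4,5,1".
struct ComponentSpec {
    char type;                // first field, e.g. 'c' or 'w'
    std::vector<int> params;  // remaining fields, in the order given
};

ComponentSpec parse_component(const std::string& arg)
{
    std::istringstream in(arg);
    std::string field;
    // the first comma-separated field must be the one-letter component type
    if (!std::getline(in, field, ',') || field.size() != 1)
        throw std::invalid_argument("expected a one-letter component type: " + arg);

    ComponentSpec spec{field[0], {}};
    while (std::getline(in, field, ','))
        spec.params.push_back(std::stoi(field));  // std::stoi throws on non-numeric input
    return spec;
}
```

Running parse_component over the positional-argument vector in order preserves the C -- W -- C sequence from the command line.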

Read file and extract certain part only

ifstream toOpen;
openFile.open("sample.html", ios::in);
if(toOpen.is_open()){
while(!toOpen.eof()){
getline(toOpen,line);
if(line.find("href=") && !line.find(".pdf")){
start_pos = line.find("href");
tempString = line.substr(start_pos+1); // i dont want the quote
stop_pos = tempString .find("\"");
string testResult = tempString .substr(start_pos, stop_pos);
cout << testResult << endl;
}
}
toOpen.close();
}
What I am trying to do is extract the "href" value. But I cant get it works.
EDIT:
Thanks to Tony's hint, I use this:
if(line.find("href=") != std::string::npos ){
// Process
}
it works!!
I'd advise against trying to parse HTML like this. Unless you know a lot about the source and are quite certain about how it'll be formatted, chances are that anything you do will have problems. HTML is an ugly language with an (almost) self-contradictory specification that (for example) says particular things are not allowed -- but then goes on to tell you how you're required to interpret them anyway.
Worse, almost any character can (at least potentially) be encoded in any of at least three or four different ways, so unless you scan for (and carry out) the right conversions (in the right order) first, you can end up missing legitimate links and/or including "phantom" links.
You might want to look at the answers to this previous question for suggestions about an HTML parser to use.
As a start, you might want to take some shortcuts in the way you write the loop over lines in order to make it clearer. Here is the conventional "read line at a time" loop using C++ iostreams:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
int main ( int, char ** )
{
std::ifstream file("sample.html");
if ( !file.is_open() ) {
std::cerr << "Failed to open file." << std::endl;
return (EXIT_FAILURE);
}
for ( std::string line; (std::getline(file,line)); )
{
// process line.
}
}
As for the inner part that processes the line, there are several problems.
It doesn't compile. I suppose this is what you meant with "I cant get it works". When asking a question, this is the kind of information you might want to provide in order to get good help.
There is confusion between variable names temp and tempString etc.
string::find() returns std::string::npos, a large positive integer (the size_type is unsigned), to indicate that nothing was found, so the condition is true unless a match is found starting at character position 0, which is exactly the case where you probably do want to enter the if block.
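The last point can be seen in a small sketch. The two function names are made up for illustration; the first one reproduces the mistake on purpose.

```cpp
#include <string>

// Wrong on purpose: treats the position returned by find() as a yes/no answer.
// npos (not found) converts to true, while position 0 (found at the start) converts to false.
bool contains_wrong(const std::string& line, const std::string& needle)
{
    return static_cast<bool>(line.find(needle));
}

// Right: compare the returned position against std::string::npos.
bool contains(const std::string& line, const std::string& needle)
{
    return line.find(needle) != std::string::npos;
}
```

For a line that starts with href= the first version says "no", and for a line without any href= it says "yes", which is the opposite of what the original condition intended.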
Here is a simple test content for sample.html.
<html>
<a href="foo.pdf"/>
</html>
Sticking the following inside the loop:
if ((line.find("href=") != std::string::npos) &&
(line.find(".pdf" ) != std::string::npos))
{
const std::size_t start_pos = line.find("href");
std::string temp = line.substr(start_pos+6);
const std::size_t stop_pos = temp.find("\"");
std::string result = temp.substr(0, stop_pos);
std::cout << "'" << result << "'" << std::endl;
}
I actually get the output
'foo.pdf'
However, as Jerry pointed out, you might not want to use this in a production environment. If this is a simple homework or exercise on how to use the <string>, <iostream> and <fstream> libraries, then go ahead with such a procedure.
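For such an exercise, the extraction could be wrapped in a small function that searches for the closing quote rather than relying on the fixed offset of 6. This is only a sketch with the same limitations as above: extract_href is a made-up name, and it handles just the first double-quoted href on a line.

```cpp
#include <string>

// Returns the text between the quotes of the first href="..." on the line,
// or an empty string when there is no complete href="..." attribute.
std::string extract_href(const std::string& line)
{
    const std::string key = "href=\"";
    const std::size_t key_pos = line.find(key);
    if (key_pos == std::string::npos)
        return "";

    const std::size_t start = key_pos + key.size();  // first character after the opening quote
    const std::size_t stop = line.find('"', start);  // closing quote
    if (stop == std::string::npos)
        return "";

    return line.substr(start, stop - start);
}
```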