Can we remove two substrings from a C++ string simultaneously? - c++

Assume I have a C++ string /dev/class/xyz1/device/vendor/config. As a part of my work, I am required to remove substrings "device" and "config" from the above string.
I know I can accomplish it by using "erase" call twice. But, I was wondering if this can be achieved in a single call. Any string class library call or boost call to achieve this?

Other than Regular Expressions, I'm not aware of any other method.
However, think about why you want to do this. Doing it in a single call won't make it noticeably faster, as the same work still has to be executed one way or the other.
On the other hand, having a command for each word would increase code-readability, which always should be high-priority.
If you need this often and want to save lines, you could easily write such a function yourself and put it into a library of your custom utility functions. The function could take the input string and a std::vector of strings, or any other collection of strings, to remove from it.

It's not entirely clear how specific the algorithm should be. But, for the case given, the following would have minimum copying and do the mutation "atomically" (as in: either both or no substrings removed):
#include <boost/algorithm/string.hpp>
#include <string>
namespace ba = boost::algorithm;
void mutate(std::string& the_string) {
if (ba::ends_with(the_string, "/config")) {
auto pos = the_string.find("/device/");
if (std::string::npos != pos) {
the_string.resize(the_string.size() - 7); // cut `/config`
the_string.erase(pos, 7); // cut `/device`
}
}
}
#include <iostream>
int main() {
std::string s = "/dev/class/xyz1/device/vendor/config";
std::cout << "before: " << s << "\n";
mutate(s);
std::cout << "mutated: " << s << "\n";
}
Prints
before: /dev/class/xyz1/device/vendor/config
mutated: /dev/class/xyz1/vendor


How do I make an alphabetized list of all distinct words in a file with the number of times each word was used?

I am writing a program using Microsoft Visual C++. In the program I must read in a text file and print out an alphabetized list of all distinct words in that file with the number of times each word was used.
I have looked up different ways to alphabetize a string but they do not work with the way I have my string initialized.
// What is inside my text file
Any experienced programmer engaged in writing programs for use by others knows
that, once his program is working correctly, good output is a must. Few people
really care how much time and trouble a programmer has spent in designing and
debugging a program. Most people see only the results. Often, by the time a
programmer has finished tackling a difficult problem, any output may look
great. The programmer knows what it means and how to interpret it. However,
the same cannot be said for others, or even for the programmer himself six
months hence.
string lines;
getline(input, lines); // Stores what is in file into the string
I expect an alphabetized list of words with the number of times each word was used. So far, I do not know how to begin this process.
It's rather simple, std::map automatically sorts based on key in the key/value pair you get. The key/value pair represents word/count which is what you need. You need to do some filtering for special characters and such.
EDIT: std::stringstream is a nice way of splitting a std::string on whitespace, since whitespace is the default delimiter. Using stream >> word you get whitespace-separated words. However, this might not be enough because of punctuation. For example, "Often," has a comma that we need to filter out. Therefore I used std::replace_if, which replaces punctuation characters and digits with spaces.
Now a new problem arises. In your example you have "must.Few", which will be returned as one word. After replacing . with a space we have "must Few". So I'm using another stringstream on the filtered "word" to make sure I have only words in the final result.
In the second loop you will notice if(word == "") continue;, this can happen if the string is not trimmed. If you look at the code you will find out that we aren't trimming after replacing puncts and digits. That is, "Often," will be "Often " with trailing whitespace. The trailing whitespace causes the second loop to extract an empty word. This is why I added the condition to ignore it. You can trim the filtered result and then you wouldn't need this check.
Finally, I have added ignorecase boolean to check if you wish to ignore the case of the word or not. If you wish to do so, the program will simply convert the word to lowercase and then add it to the map. Otherwise, it will add the word the same way it found it. By default, ignorecase = true, if you wish to consider case, just call the function differently: count_words(input, false);.
Edit 2: In case you're wondering, the statement counts[word] will automatically create key/value pair in the std::map IF there isn't any key matching word. So when we call ++: if the word isn't in the map, it will create the pair, and increment value by 1 so you will have newly added word. If it exists already in the map, this will increment the existing value by 1 and hence it acts as a counter.
The program:
#include <iostream>
#include <map>
#include <sstream>
#include <cstring>
#include <cctype>
#include <string>
#include <iomanip>
#include <algorithm>
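std::string to_lower(const std::string& str) {
std::string ret;
for (char c : str)
// cast to unsigned char: tolower is undefined for negative char values
ret.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
return ret;
}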
std::map<std::string, size_t> count_words(const std::string& str, bool ignorecase = true) {
std::map<std::string, size_t> counts;
std::stringstream stream(str);
while (stream.good()) {
// wordW may have multiple words connected by special chars/digits
std::string wordW;
stream >> wordW;
// filter special chars and digits
std::replace_if(wordW.begin(), wordW.end(),
[](unsigned char c) { return std::ispunct(c) || std::isdigit(c); }, ' ');
// now wordW may have multiple words separated by whitespace, extract them
std::stringstream word_stream(wordW);
while (word_stream.good()) {
std::string word;
word_stream >> word;
// ignore empty words
if (word == "") continue;
// add to count.
ignorecase ? counts[to_lower(word)]++ : counts[word]++;
}
}
return counts;
}
void print_counts(const std::map<std::string, size_t>& counts) {
for (auto pair : counts)
std::cout << std::setw(15) << pair.first << " : " << pair.second << std::endl;
}
int main() {
std::string input = "Any experienced programmer engaged in writing programs for use by others knows \
that, once his program is working correctly, good output is a must.Few people \
really care how much time and trouble a programmer has spent in designing and \
debugging a program.Most people see only the results.Often, by the time a \
programmer has finished tackling a difficult problem, any output may look \
great.The programmer knows what it means and how to interpret it.However, \
the same cannot be said for others, or even for the programmer himself six \
months hence.";
auto counts = count_words(input);
print_counts(counts);
return 0;
}
I have tested this with Visual Studio 2017 and here is part of the output:
a : 5
and : 3
any : 2
be : 1
by : 2
cannot : 1
care : 1
correctly : 1
debugging : 1
designing : 1
As others have already noted, a std::map handles the counting you care about quite easily.
Iostreams already have a tokenizer to break an input stream up into words. In this case, though, we want to treat only letters as characters that can make up words. A stream uses a locale to make that sort of decision, so to change how it's done, we need to define a locale that classifies characters as we see fit.
struct alpha_only: std::ctype<char> {
alpha_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
// everything is white space
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// except lower- and upper-case letters, which are classified accordingly:
// std::fill excludes its end pointer, so go one past 'z' and 'Z'
std::fill(&rc['a'], &rc['z'] + 1, std::ctype_base::lower);
std::fill(&rc['A'], &rc['Z'] + 1, std::ctype_base::upper);
return &rc[0];
}
};
With that in place, we tell the stream to use our ctype facet, then simply read words from the file and count them in the map:
std::cin.imbue(std::locale(std::locale(), new alpha_only));
std::map<std::string, std::size_t> counts;
std::string word;
while (std::cin >> word)
++counts[to_lower(word)];
...and when we're done with that, we can print out the results:
for (auto w : counts)
std::cout << w.first << ": " << w.second << "\n";
I'd probably start by inserting all of those words into an array of strings. Then start with the first index of the array and compare it with all of the other indexes; whenever you find a match, add 1 to a counter. After you have gone through the array, display the word you were searching for and how many matches there were, then move on to the next element, compare that with all of the other elements, display, and so on. Or, if you make a parallel array of integers that holds the number of matches, you could do all the comparisons in one pass and all the displays in another.
EDIT:
Everyone else's answers seem more elegant because of the map's inherent sorting. My answer works more like a parser that tokenizes first and sorts the tokens afterwards. It is therefore mainly useful as a tokenizer or lexer, whereas the map-based answers are only good for sorted data.
First you probably want to read in the text file. You can use a streambuf iterator to read the whole file into a string.
You will now have a string called content, which is the content of your file. Next you will want to iterate, or loop, over the contents of this string. To do that you'll want to use an iterator. There should be a string outside of the loop that stores the current word. You will iterate over the content string, and each time you hit a letter character, you will add that character to your current word string. Then, once you hit a space character, you will take that current word string and push it back into the wordStringV vector. (Note: this means that non-letter characters are ignored and that only spaces denote word separation, so two words separated only by a line break end up joined.)
Now that we have a vector of all of our words as strings, we can use std::sort to sort the vector in alphabetical order. (Note: uppercase letters compare less than lowercase ones, so capitalized words are sorted first.) Then we will iterate over our vector of word strings and convert them into Word objects (this is a little heavyweight) that store the word string and its number of appearances. We will push these Word objects into a Word vector, but if we discover a repeated word string, instead of adding it to the Word vector, we'll grab the previous entry and increment its appearance count.
Finally, once this is all done, we can iterate over our Word object vector and output the word followed by its appearances.
Full Code:
#include <vector>
#include <fstream>
#include <iostream>
#include <streambuf>
#include <algorithm>
#include <string>
class Word //define word object
{
public:
Word(){appearances = 1;}
~Word(){}
int appearances;
std::string mWord;
};
bool isLetter(const char x)
{
return((x >= 'a' && x <= 'z') || (x >= 'A' && x <= 'Z'));
}
int main()
{
std::string srcFile = "myTextFile.txt"; //what file are we reading
std::ifstream ifs(srcFile);
std::string content( (std::istreambuf_iterator<char>(ifs) ),
( std::istreambuf_iterator<char>() )); //read in the file
std::vector<std::string> wordStringV; //create a vector of word strings
std::string current = ""; //define our current word
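for(auto it = content.begin(); it != content.end(); ++it) //iterate over our input
{
const char currentChar = *it; //make life easier
if(currentChar == ' ')
{
if(!current.empty()) //skip empty tokens produced by consecutive spaces
wordStringV.push_back(current);
current = "";
continue;
}
else if(isLetter(currentChar))
{
current += *it;
}
}
if(!current.empty()) //the last word is not followed by a space
wordStringV.push_back(current);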
std::sort(wordStringV.begin(), wordStringV.end()); //operator< is the default ordering
std::vector<Word> wordVector;
for(auto it = wordStringV.begin(); it != wordStringV.end(); ++it) //iterate over wordString vector
{
std::vector<Word>::iterator wordIt;
//see if the current word string has appeared before...
for(wordIt = wordVector.begin(); wordIt != wordVector.end(); ++wordIt)
{
if((*wordIt).mWord == *it)
break;
}
if(wordIt == wordVector.end()) //...if not create a new Word obj
{
Word theWord;
theWord.mWord = *it;
wordVector.push_back(theWord);
}
else //...otherwise increment the appearances.
{
++((*wordIt).appearances);
}
}
//print the words out
for(auto it = wordVector.begin(); it != wordVector.end(); ++it)
{
Word theWord = *it;
std::cout << theWord.mWord << " " << theWord.appearances << "\n";
}
return 0;
}
Side Notes
Compiled with g++ version 4.2.1 with target x86_64-apple-darwin, using the compiler flag -std=c++11.
If you don't like iterators you can instead index into the string:
for(std::size_t i = 0; i < content.size(); ++i)
{
const char currentChar = content[i];
}
If you do not care about capitalization, apply std::tolower (from <cctype>) in the current += *it; statement, i.e. current += std::tolower(*it);.
Also, you seem like a beginner and this answer might have been too heavyweight, but you're asking for a basic parser and that is no easy task. I recommend starting by parsing simpler strings like math equations. Maybe make a calculator app.

Remove specific format of string from a List?

I'm writing a program for an Arduino that reads data in a kind of NMEA format from a .txt file and stores it in a list<string>. I need to strip out the strings that begin with certain prefixes ($GPZDA, $GPGSA, $GPGSV) because they are useless to me. I only need $GPRMC and $GPGGA, which contain a basic time stamp and the location, and that is all I'm using anyway. I'd like to use as few external libraries (SPRINT, BOOST) as possible, as the DUE doesn't have a fantastic amount of space as it is.
All I really need is a method to remove lines from the list<string> that don't start with a specific prefix. Any ideas?
The method I'm currently using seems to have replaced the whole output with one specific string while keeping the file length the same (1676 and 2270 lines, respectively). These outputs are produced by two while statements that read the two input files into a list<string> each.
Below is a small snippet of what I'm trying to use. It is supposed to sort the file into the correct order, and that part works: the lines are currently ordered by their numerical value, which suits the time, the second field in the string. However, unique() appears to have taken each unique value and replaced all the others with it, so now I have a 1676-line list that basically goes 1,1,1,2,2,2,3,3,4... up to 1676.
while (std::getline(GPS1,STRLINE1)){
ListOne.push_back("GPS1: " + STRLINE1 + "\n");
ListOne.sort();
ListOne.unique();
std::cout << ListOne.back() << std::endl;
GPSO1 << ListOne.back();
}
Thanks
If I understand correctly, you want some sort of whitelist of prefixes.
You could use remove_if to filter on them, with a small function that checks whether one of the prefixes fits (using std::mismatch), for example:
#include <iostream>
#include <algorithm>
#include <string>
#include <list>
using namespace std;
int main() {
list<string> l = {"aab", "aac", "abb", "123", "aaw", "wws"};
list<string> whiteList = {"aa", "ab"};
// the member function list::remove_if erases the rejected items in place
l.remove_if([&whiteList](const string& item)
{
for(auto &s : whiteList)
{
// only compare when item is at least as long as the prefix
if (item.size() >= s.size() &&
mismatch(s.begin(), s.end(), item.begin()).first == s.end()){
return false; //found allowed prefix
}
}
return true;
});
for (auto it = l.begin(); it != l.end(); ++it){
cout<< *it << endl;
}
return 0;
}

Fastest string find from the list of string

I have a list of strings and I have to find whether a string is present in that list or not. I want to use the logic in a low-latency pricing engine, so I want really fast logic for it.
I thought of storing these strings in a map as keys and then using the find() or count() function.
Can anyone suggest any other more efficient logic for the same?
Probably std::unordered_set is an appropriate choice for your needs. You would then use find() to check if a string is present or not. Something like the example code here:
#include <iostream>
#include <string>
#include <unordered_set>
int main() {
std::unordered_set<std::string> myset{ "red", "green", "blue" };
std::cout << "color? ";
std::string input;
std::cin >> input;
auto pos = myset.find(input);
if (pos != myset.end())
std::cout << *pos << " is in myset\n";
else
std::cout << "not found in myset\n";
}
To understand how std::unordered_set works, please see hash set.
One more way I just thought of:
Put the list of strings into a single semicolon-separated string and then search it with strstr or std::string::find.
e.g.
List of strings: <ABC, DEF, GHI, JKL, MNO, PQRS, LMNOPQR, STUVW, XY, Z>
czEIDHolder = "ABC;DEF;GHI;JKL;MNO;PQRS;LMNOPQR;STUVW;XY;Z;"
if string_to_search = "PQRS"
make string_to_search = string_to_search + ";"
strstr(czEIDHolder, string_to_search) OR
czEIDHolder.find(string_to_search)
Note that a trailing delimiter alone also matches suffixes of longer entries (searching for "PQR;" finds "LMNOPQR;"), so a leading delimiter is needed as well for exact matches.

C++ boost/regex regex_search

Consider the following string content:
string content = "{'name':'Fantastic gloves','description':'Theese gloves will fit any time period.','current':{'trend':'high','price':'47.1000'}";
I have never used regex_search and I have been searching around for ways to use it - I still do not quite get it. From that random string (it's from an API) how could I grab two things:
1) the price - in this example it is 47.1000
2) the name - in this example Fantastic gloves
From what I have read, regex_search would be the best approach here. I plan on using the price as an integer value, I will use regex_replace in order to remove the "." from the string before converting it. I have only used regex_replace and I found it easy to work with, I don't know why I am struggling so much with regex_search.
Key notes:
Content is contained inside ' '
Content id and value are separated by :
Content/value pairs are separated by ,
The values of the ids name and price will vary.
My first thought was to locate, for instance, price, then move 3 characters ahead (':') and gather everything until the next ' - however, I am not sure whether I am completely off track here.
Any help is appreciated.
boost::regex would not be needed. Regular expressions are used for more general pattern matching, whereas your example is very specific. One way to handle your problem is to break the string up into individual tokens. Here is an example using boost::tokenizer:
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <map>
int main()
{
std::map<std::string, std::string> m;
std::string content = "{'name':'Fantastic gloves','description':'Theese gloves will fit any time period.','current':{'trend':'high','price':'47.1000'}";
boost::char_separator<char> sep("{},':");
boost::tokenizer<boost::char_separator<char>> tokenizer(content, sep);
std::string id;
for (auto tok = tokenizer.begin(); tok != tokenizer.end(); ++tok)
{
// Since "current" is a special case I added code to handle that
if (*tok != "current")
{
id = *tok++;
m[id] = *tok;
}
else
{
id = *++tok;
m[id] = *++tok; // trend
id = *++tok;
m[id] = *++tok; // price
}
}
std::cout << "Name: " << m["name"] << std::endl;
std::cout << "Price: " << m["price"] << std::endl;
}
As the string you are attempting to parse appears to be JSON (JavaScript Object Notation), consider using a specialized JSON parser.
You can find a comprehensive list of JSON parsers in many languages including C++ at http://json.org/. Also, I found a discussion on the merits of several JSON parsers for C++ in response to this SO question.

Comparing vector of strings to a string

I haven't coded this bit up yet, because I'm not sure of which is the best method to tackle this.
For starters, what the program does now is simply put the names of all the files in the same directory as the program into an array of strings and then print that array out.
What I want to do is sort these by file extension. There will be a list of particular extensions for the user to choose from, after which all files with that extension in the folder will be returned to the user.
I'm just not sure how to go about that. The first thing that comes to mind is to iterate through the vector and compare each string to another string with the desired extension, and if there is match then push that string into another vector that is specific for that file extension. There are only 5 extensions I'm looking for so it's not like I would have to make a whole ton of new vectors for each extension.
Alternatively, I thought it might also make sense never to populate the original vector, and instead take the user's request first, then iterate through the files and push all files with matching extensions into a specific vector. Once done, if they choose another option, the vector will simply be cleared and re-populated with the new file names.
Any tips on how to go about actually doing the comparison? I'm not that good with C++ syntax. Also, would it be wise to use a different type of container?
Thanks a lot for any and all advice you guys are willing to throw my way, it's greatly appreciated!
#include <iostream>
#include <filesystem>
#include <vector>
using namespace std;
using namespace std::tr2::sys;
void scan( path f, unsigned i = 0 )
{
string indent(i,'\t');
cout << indent << "Folder = " << system_complete(f) << endl;
directory_iterator d( f );
directory_iterator e;
vector<string>::iterator it1;
std::vector<string> fileNames;
for( ; d != e; ++d )
{
fileNames.push_back(d->path());
//print out contents without use of an array
/*cout << indent <<
d->path() << (is_directory( d->status() ) ? " [dir]":"") <<
endl;*/
//if I want to go into subdirectories
/*if( is_directory( d->status() ) )
scan( f / d->path(), i + 1 );*/
}
for(it1 = fileNames.begin(); it1 != fileNames.end(); it1++)
{
cout << *it1 << endl;
}
}
int main()
{
path folder = "..";
cout << folder << (is_directory( folder ) ? " [dir]":"") << endl;
scan( folder );
}
You don't mean 'sort', you mean 'filter'. Sort means something else entirely.
Your second option seems the best, why do the extra work with two vectors?
As for the comparison, the difficulty is that the thing you are looking for is at the end of the string, and most searching functions operate from the start of the string. But there is a handy thing in C++ called a reverse iterator which scans a string backwards from the end, not forwards from the start. You call rbegin() and rend() to get a string's reverse iterators. Here's a comparison function using reverse iterators.
#include <algorithm>
#include <string>
// return true if file ends with ext, false otherwise
bool ends_with(const std::string& file, const std::string& ext)
{
return file.size() >= ext.size() && // file must be at least as long as ext
// check strings are equal starting at the end
std::equal(ext.rbegin(), ext.rend(), file.rbegin());
}