How to replace multiple sets of keywords in a string? - c++

I have a file of strings that I read in, and I have to replace certain values in them with other values. The number of possible replacements is variable: the patterns and their replacements are themselves read from a file. Currently I store them in a vector<pair<string,string>>. However, I run into issues:
Example:
Input string: abcd.eaef%afas&333
Delimiter patterns:
. %%%
% ###
& ###
Output I want: abcd%%%eaef###afas###333
Output I get: abcd#########eaef###afas###333
The problem is that it ends up replacing the % sign (or any other symbol) that was itself inserted as a replacement for something else, which it should not do.
My code is (relevant portions):
std::string& replace(std::string& s, const std::string& from, const std::string& to){
if(!from.empty())
for(size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos; pos += to.size()) s.replace(pos, from.size(), to);
return s;
}
string line;
vector<pair<string, string>> myset;
while(getline(delimiterfile, line)){
istringstream is(line);
string delim, pattern;
if(is >> delim >> pattern){
myset.push_back(make_pair(delim, pattern));
} else {
throw runtime_error("Invalid pattern pair!");
}
}
while(getline(input, line)){
string temp = line;
for(auto &item : myset){
replace(temp, item.first, item.second);
}
output << temp << endl;
}
Can someone please tell me what I'm messing up and how to fix it?

In pseudo-code a simple replacement algorithm could look something like this:
string input = getline();
string output; // The string containing the replacements
for (each char in input)
{
if (char == '.')
output += "%%%";
// TODO: Other replacements
else
output += char;
}
If you implement the above code, once it's done the variable output will contain the string with all replacements made.
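The pseudo-code above can be generalized to the variable list of patterns from the question. The following is a minimal sketch, not a definitive implementation: the name replace_all_once is hypothetical, and it assumes that when several patterns match at the same position, the one listed first wins. Because the output is built in a separate string, replacement text is never scanned again.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: scan the input once, left to right. At each position,
// try every pattern in order; on the first match, append its replacement to
// the output and skip past the matched text. Otherwise copy one character.
std::string replace_all_once(
    const std::string& input,
    const std::vector<std::pair<std::string, std::string>>& rules)
{
    std::string output;
    std::size_t pos = 0;
    while (pos < input.size()) {
        bool matched = false;
        for (const auto& rule : rules) {
            const std::string& from = rule.first;
            // compare() clips the range to the end of input, so a pattern
            // longer than the remaining text simply does not match.
            if (!from.empty() && input.compare(pos, from.size(), from) == 0) {
                output += rule.second;
                pos += from.size();
                matched = true;
                break;
            }
        }
        if (!matched) {
            output += input[pos];
            ++pos;
        }
    }
    return output;
}
```

With the rules from the question (". %%%", "% ###", "& ###"), calling replace_all_once("abcd.eaef%afas&333", myset) yields abcd%%%eaef###afas###333. If one pattern is a prefix of another, list the longer one first.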

I would suggest you use stringstream. This way you will be able to achieve what you are looking for very easily.

Related

How can I erase first and last character in string?

This function returns an array of strings with a list of files in a folder. It looks like this:
"folder//xyz.txt"
How can I make it look like this?
folder//xyz.txt
It's the same, but without the quotes.
vector<string> list_of_files(string folder_name)
{
vector<string> files;
string path = folder_name;
for (const auto& entry : fs::directory_iterator(path))
{
stringstream ss;
ss << entry.path(); //convert entry.path() to string
string str = ss.str();
files.push_back(ss.str());
}
return files;
}
Erasing the first and last characters of a string is easy:
if (str.size() >= 1)
str.erase(0, 1); // from 1st char (#0), len 1; bit verbose as not designed for this
if (str.size() >= 1)
str.pop_back(); // chop off the end
Your quotes come from inserting the path into a stream: operator<< for a path writes it with std::quoted, which helps prevent bugs caused by spaces in the path further down the line.
Fortunately, you don't need any of this: as explored in the comments, the stringstream is entirely unnecessary; the path already converts to a string if you ask it to:
vector<string> list_of_files(string folder_name)
{
vector<string> files;
for (const auto& entry : fs::directory_iterator(folder_name))
files.push_back(entry.path().string());
return files;
}
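If the quoted strings come from code you cannot change, the erase/pop_back approach can be wrapped in a small helper. This is only a sketch: strip_quotes is a hypothetical name, and it assumes the only thing to undo is one pair of surrounding double quotes. Since std::quoted also escapes embedded quotes and backslashes, this is not a full inverse of the stream insertion.

```cpp
#include <string>

// Hypothetical helper: remove one pair of surrounding double quotes, if
// present. Strings that are not wrapped in quotes are returned unchanged.
std::string strip_quotes(std::string s)
{
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"') {
        s.erase(0, 1);  // drop the leading quote
        s.pop_back();   // drop the trailing quote
    }
    return s;
}
```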

split a string using multiple delimiters (including delimiters) in c++

I have a string which I input as follows
using namespace std;
string s;
getline(cin, s);
I input
a.b~c.d
I want to split the string at . and ~ but also want to store the delimiters. The split elements will be stored in a vector.
Final output should look like this
a
.
b
~
c
.
d
I saw a solution here but it was in java.
How do I achieve this in c++?
This solution is copied verbatim from this answer except for the commented lines:
std::stringstream stringStream(inputString);
std::string line;
while(std::getline(stringStream, line))
{
std::size_t prev = 0, pos;
while ((pos = line.find_first_of(".~", prev)) != std::string::npos) // only look for . and ~
{
if (pos > prev)
wordVector.push_back(line.substr(prev, pos-prev));
wordVector.push_back(line.substr(pos, 1)); // add delimiter
prev = pos+1;
}
if (prev < line.length())
wordVector.push_back(line.substr(prev, std::string::npos));
}
I haven't tested the code, but the basic idea is you want to store the delimiter character in the result as well.
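For reference, the same loop can be wrapped in a function so that it compiles on its own. This is a sketch under assumptions: the name split_keep_delims and the delims parameter are not from the answer above, and every delimiter is assumed to be a single character.

```cpp
#include <string>
#include <vector>

// Hypothetical wrapper around the loop above: split `line` at any character
// found in `delims`, storing each delimiter as its own element.
std::vector<std::string> split_keep_delims(const std::string& line,
                                           const std::string& delims)
{
    std::vector<std::string> parts;
    std::size_t prev = 0, pos;
    while ((pos = line.find_first_of(delims, prev)) != std::string::npos) {
        if (pos > prev)
            parts.push_back(line.substr(prev, pos - prev));  // text before the delimiter
        parts.push_back(line.substr(pos, 1));                // the delimiter itself
        prev = pos + 1;
    }
    if (prev < line.size())
        parts.push_back(line.substr(prev));                  // trailing text, if any
    return parts;
}
```

Calling split_keep_delims("a.b~c.d", ".~") gives the seven elements a, ., b, ~, c, ., d.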

How to extract specific elements from a string?

I am trying to extract the first numbers from each block of numbers from the next string.
string s = "f 1079//2059 1165//2417 1164//2414 1068//1980";
In this example I need to extract 1079, 1165, 1164 and 1068
I have tried with getline and substr but I have not been able to.
You can use <regex> (the C++ regular expression library) with the pattern (\\d+)//. It locates the digits that come before a double slash, and the parentheses make those digits available on their own as a submatch.
Here is the usage:
string s = "f 1079//2059 1165//2417 1164//2414 1068//1980";
std::regex pattern("(\\d+)//");
auto match_iter = std::sregex_iterator(s.begin(), s.end(), pattern);
auto match_end = std::sregex_iterator();
for (;match_iter != match_end; match_iter++)
{
const std::smatch& m = *match_iter;
std::cout << m[1].str() << std::endl; // sub-match for token in parentheses, the 1079, 1165, ...
// m[0]: whole match, "1079//"
// m[1]: first submatch, "1079"
}
I usually reach for istringstream for this kind of thing:
std::string input = "f 1079//2059 1165//2417 1164//2414 1068//1980";
std::istringstream is(input);
char f;
if (is >> f)
{
int number, othernumber;
char slash1, slash2;
while (is >> number >> slash1 >> slash2 >> othernumber)
{
// Process 'number'...
}
}
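The same idea as a self-contained function might look like the sketch below. The name first_numbers is hypothetical, and the check that both separator characters really are slashes is an addition that the snippet above does not make.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read the leading tag character, then repeatedly read
// "number // othernumber" groups and keep the first number of each group.
std::vector<int> first_numbers(const std::string& input)
{
    std::vector<int> result;
    std::istringstream is(input);
    char tag;
    if (is >> tag) {
        int number, othernumber;
        char slash1, slash2;
        while (is >> number >> slash1 >> slash2 >> othernumber) {
            if (slash1 == '/' && slash2 == '/')  // added sanity check
                result.push_back(number);
        }
    }
    return result;
}
```

For the string from the question this returns 1079, 1165, 1164 and 1068.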
Here is an attempt with getline and substr that works:
auto extractValues(const std::string& source)
-> std::vector<std::string>
{
auto target = std::vector<std::string>{};
auto stream = std::stringstream{ source };
auto currentPartOfSource = std::string{};
while (std::getline(stream, currentPartOfSource, ' '))
{
auto positionOfSlashes = currentPartOfSource.find("//");
if (positionOfSlashes != std::string::npos)
{
target.push_back(currentPartOfSource.substr(0, positionOfSlashes));
}
}
return target;
}
Alternatively, you can extract the tokens by splitting, although this may involve some string copies.
Consider a split_by function like
std::vector<std::string> split_by(const std::string& str, const std::string& delim);
Possible implementations are in Split a string in C++?
Split the string by spaces first, then split each token by // and take the first item.
std::vector<std::string> tokens = split_by(s, " ");
std::vector<std::string> words;
std::transform(tokens.begin() + 1, tokens.end(), // drop first "f"
std::back_inserter(words),
[](const std::string& s){ return split_by(s, "//")[0]; });
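Since split_by is only declared above, here is one possible implementation together with the transform step. It is a sketch under assumptions: the behaviour chosen for an empty delimiter (return the whole string as one token) and the wrapper name first_fields are not from the answer.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// One possible split_by: cut `str` at every occurrence of `delim`.
// Always returns at least one element, so indexing [0] is safe.
std::vector<std::string> split_by(const std::string& str, const std::string& delim)
{
    std::vector<std::string> parts;
    if (delim.empty()) {  // assumption: no delimiter means no splitting
        parts.push_back(str);
        return parts;
    }
    std::size_t start = 0, pos;
    while ((pos = str.find(delim, start)) != std::string::npos) {
        parts.push_back(str.substr(start, pos - start));
        start = pos + delim.size();
    }
    parts.push_back(str.substr(start));
    return parts;
}

// Hypothetical wrapper: drop the leading "f" token, then keep the part of
// each remaining token that precedes "//".
std::vector<std::string> first_fields(const std::string& s)
{
    std::vector<std::string> tokens = split_by(s, " ");
    std::vector<std::string> words;
    std::transform(tokens.begin() + 1, tokens.end(),
                   std::back_inserter(words),
                   [](const std::string& t) { return split_by(t, "//")[0]; });
    return words;
}
```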

Program gets "Expression: string subscript out of range"

#include <iostream>
#include <string>
using namespace std;
string Latin(string words)
{
string strWord, strSentence = "";
int length = 0, index = 0;
while (words[index] != '\0')
{
if(words.find(' ', index) != -1)
{
length = words.find(' ', index);
length -= index;
strWord = words.substr(index,length);
strWord.insert(length, "ay");
strWord.insert(length, 1, words[index]);
strWord.erase(0,1);
index += length +1;
}
else
{
strWord = words.substr(index);
length = strWord.length();
strWord.insert(length, "ay");
strWord.insert(length,1,words[index]);
strWord.erase(0,1);
index = words.length();
}
strSentence += (strWord + " ");
}
return strSentence;
}
int main()
{
string str;
getline(cin,str);
str = Latin(str);
cout<<str<<endl;
return 0;
}
I get an error that says "Expression: string subscript out of range".
I have no clue what to do. I am new to this. The program is supposed to take a line of words from the user and translate them into Pig Latin. Any help would be greatly appreciated.
Unless I really wanted to make my own life difficult, I'd do this quite a bit differently. First, I'd use a std::stringstream to break the input string into words to process. Then, I'd use std::rotate to move the first character of the string to the end. Finally, I'd wrap that all in std::transform to manage applying the function to each word in succession.
std::string line;
std::getline(std::cin, line);
std::stringstream buffer(line);
std::stringstream result;
std::transform(std::istream_iterator<std::string>(buffer),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(result, " "),
[](std::string s) {
std::rotate(s.begin(), s.begin() + 1, s.end());
s += "ay";
return s;
});
Of course, this doesn't know the special rules for things like words that start with vowels or letter pairs like sh or ch, but it looks like that's outside the scope of the task at hand.
For more on std::rotate, I recommend watching some of Sean Parent's videos.
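Wrapped into a function with the headers it needs, the snippet above might look like the following sketch. The name pig_latin is hypothetical. Note that the ostream_iterator writes the separator after every word, so the result ends with a trailing space, just as the loop in the question does.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical wrapper: move the first letter of each whitespace-separated
// word to the end and append "ay". No special handling of vowels or digraphs.
std::string pig_latin(const std::string& line)
{
    std::istringstream buffer(line);
    std::ostringstream result;
    std::transform(std::istream_iterator<std::string>(buffer),
                   std::istream_iterator<std::string>(),
                   std::ostream_iterator<std::string>(result, " "),
                   [](std::string s) {
                       // Words read with operator>> are never empty, so
                       // s.begin() + 1 is always a valid iterator here.
                       std::rotate(s.begin(), s.begin() + 1, s.end());
                       s += "ay";
                       return s;
                   });
    return result.str();
}
```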

Need a regular expression to extract only letters and whitespace from a string

I'm building a small utility method that parses a line (a string) and returns a vector of all the words. The istringstream code I have below works fine except for when there is punctuation so naturally my fix is to want to "sanitize" the line before I run it through the while loop.
I would appreciate some help in using the regex library in C++ for this. My initial solution was to use substr() and go to town, but that seems complicated, as I'd have to iterate over the line, test each character to see what it is, and then perform some operation.
vector<string> lineParser(Line * ln)
{
vector<string> result;
string word;
string line = ln->getLine();
istringstream iss(line);
while(iss)
{
iss >> word;
result.push_back(word);
}
return result;
}
You don't need regular expressions just for punctuation:
// Replace all punctuation with space character.
// (A lambda is used because std::ptr_fun was removed in C++17.)
std::replace_if(line.begin(), line.end(),
[](unsigned char c) { return std::ispunct(c) != 0; },
' '
);
Or if you want everything but letters and numbers turned into space:
std::replace_if(line.begin(), line.end(),
[](unsigned char c) { return !std::isalnum(c); },
' '
);
While we are here:
Your while loop is broken and will push the last value into the vector twice.
It should be:
while(iss)
{
iss >> word;
if (iss) // True only if the read of a word succeeded.
{
result.push_back(word); // Only push_back() if the state is not bad.
}
}
Or the more common version:
while(iss >> word) // Loop is only entered if the read of the word worked.
{
result.push_back(word);
}
Or you can use the stl:
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(result)
);
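Putting the two pieces together (sanitize first, then read word by word) could look like the sketch below. The name parse_words is hypothetical, and it takes a plain std::string instead of the Line* from the question so that it stands alone.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-alone variant of lineParser: turn punctuation into
// spaces, then split on whitespace.
std::vector<std::string> parse_words(std::string line)
{
    // Cast through unsigned char: passing a negative char to std::ispunct
    // is undefined behaviour.
    std::replace_if(line.begin(), line.end(),
                    [](unsigned char c) { return std::ispunct(c) != 0; },
                    ' ');

    std::vector<std::string> result;
    std::istringstream iss(line);
    std::string word;
    while (iss >> word)  // loop body runs only after a successful read
        result.push_back(word);
    return result;
}
```

One consequence of this approach is that apostrophes also become spaces, so a word like don't is split into don and t.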
[^A-Za-z\s] should do what you need if you replace the matching characters with nothing. It removes all characters that are not letters or whitespace. Use [^A-Za-z0-9\s] if you want to keep digits too.
You can use online tools like this one: http://gskinner.com/RegExr/ to test your patterns (Replace tab). Some modifications may be required depending on the regex library you are using.
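With the standard <regex> library, that pattern can be applied through std::regex_replace. A minimal sketch, assuming the default ECMAScript grammar; the function name keep_letters_and_spaces is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: delete every character that is neither an ASCII
// letter nor whitespace.
std::string keep_letters_and_spaces(const std::string& line)
{
    // The backslash is doubled because this is a C++ string literal.
    static const std::regex unwanted("[^A-Za-z\\s]");
    return std::regex_replace(line, unwanted, "");
}
```

Because the characters are deleted rather than replaced with a space, text such as a,b becomes ab. Replace with " " instead of "" if punctuation should act as a word separator.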
I'm not positive, but I think this is what you're looking for:
#include<iostream>
#include<regex>
#include<string>
#include<vector>
int
main()
{
std::string line("some words: with some punctuation.");
std::regex words("[\\w]+");
std::sregex_token_iterator i(line.begin(), line.end(), words);
std::vector<std::string> list(i, std::sregex_token_iterator());
for (auto j = list.begin(), e = list.end(); j != e; ++j)
std::cout << *j << '\n';
}
Output:
some
words
with
some
punctuation
The simplest solution is probably to create a filtering
streambuf to convert all non alphanumeric characters to space,
then to read using std::copy:
class StripPunct : public std::streambuf
{
std::streambuf* mySource;
char myBuffer;
protected:
virtual int underflow()
{
int result = mySource->sbumpc();
if ( result != EOF ) {
if ( !::isalnum( result ) )
result = ' ';
myBuffer = result;
setg( &myBuffer, &myBuffer, &myBuffer + 1 );
}
return result;
}
public:
explicit StripPunct( std::streambuf* source )
: mySource( source )
{
}
};
std::vector<std::string>
LineParser( std::istream& source )
{
StripPunct sb( source.rdbuf() );
std::istream src( &sb );
return std::vector<std::string>(
(std::istream_iterator<std::string>( src )),
(std::istream_iterator<std::string>()) );
}