Replace words in a string without skipping whitespaces - c++

I've got a string which contains a sentence. I have to search and replace a specific word in that string. In my case I have a vector of lines and another vector of words to replace.
Here's my function that generates a file with the final text:
void Generator::generate_file(const std::string& fileName){
    std::string inBuffer, outBuffer;
    std::stringstream ss;
    std::ofstream outFile;
    outFile.open(fileName);
    for (const auto& inIT : userCode){
        //userCode is a vector which contains lines of text
        ss.str(inIT);
        ss.clear();
        outBuffer = "";
        while (ss >> inBuffer){
            for (const auto& keyIT : keywords){
                //keywords is a vector which contains words to replace
                if (keyIT == inBuffer)
                    inBuffer = "REPLACED";
            }
            outBuffer += inBuffer + " ";
        }
        outFile << outBuffer << std::endl;
    }
    outFile.close();
}
The problem with this function is that it discards all the original whitespace: every word is simply followed by a single space. I need the original whitespace in the output file. What should I do to achieve that?
Below you can see an example of how it works:
userCode:
userCode[0] = "class UrlEncoder(object): class";
userCode[1] = " def __init__(self, alphabet=DEFAULT_ALPHABET,\n block_size=DEFAULT_BLOCK_SIZE):";
Displaying the userCode vector:
class UrlEncoder(object):
def __init__(self, alphabet=DEFAULT_ALPHABET, block_size=DEFAULT_BLOCK_SIZE):
After executing my function it looks like this:
REPLACED UrlEncoder(object):
REPLACED __init__(self, alphabet=DEFAULT_ALPHABET, block_size=DEFAULT_BLOCK_SIZE):
As you can see, it replaced the keywords correctly, but unfortunately it dropped the tab character.

The main issue is the way the stream extraction operator >> works: it skips and discards any leading whitespace characters before reading the next formatted input. Assuming you want to stick with ss >> inBuffer for grabbing input, you need some way to capture that leading whitespace before each extraction.
For example,
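string eatwhite(const string &str, size_t pos)
{
    // pos past the end (e.g. tellg() returned -1 after the last word): nothing to copy
    if (pos >= str.size()) return "";
    size_t endwhite = str.find_first_not_of(" \t\n", pos);
    // whitespace that runs to the end of the string is kept too
    if (endwhite == string::npos) endwhite = str.size();
    return str.substr(pos, endwhite - pos);
}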
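Now call eatwhite once before the first >> and again after each extraction, so the whitespace that >> would skip is copied into outBuffer:
string outBuffer = eatwhite(ss.str(), ss.tellg());   // leading whitespace
while (ss >> inBuffer)
{
    for (const auto& keyIT : keywords)
    {
        //...
    }
    // whitespace following the word just read
    string whitesp = eatwhite(ss.str(), ss.tellg());
    outBuffer += inBuffer + whitesp;
}
outFile << outBuffer << endl;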
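Put together, the approach can be tried on a single line as below. This is a minimal sketch, assuming keywords is a std::vector<std::string> as in the question; replaceKeywords is a hypothetical helper name, and eatwhite is repeated (with the end-of-string handling) so the block compiles on its own.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Returns the run of whitespace that starts at pos (empty if there is none).
std::string eatwhite(const std::string& str, std::size_t pos)
{
    if (pos >= str.size()) return "";
    std::size_t endwhite = str.find_first_not_of(" \t\n", pos);
    if (endwhite == std::string::npos) endwhite = str.size();
    return str.substr(pos, endwhite - pos);
}

// Hypothetical helper: replaces every whitespace-separated token that equals
// one of the keywords with "REPLACED", copying the original whitespace as-is.
std::string replaceKeywords(const std::string& line,
                            const std::vector<std::string>& keywords)
{
    std::istringstream ss(line);
    std::string inBuffer;
    std::string outBuffer = eatwhite(line, 0);   // leading whitespace, e.g. a tab
    while (ss >> inBuffer) {
        for (const auto& key : keywords) {
            if (key == inBuffer) {
                inBuffer = "REPLACED";
                break;
            }
        }
        // tellg() is the position just after the word, or -1 once the
        // stream has reached end-of-input; treat -1 as "end of line".
        std::streamoff off = ss.tellg();
        std::size_t pos = off < 0 ? line.size() : static_cast<std::size_t>(off);
        outBuffer += inBuffer + eatwhite(line, pos);
    }
    return outBuffer;
}
```

Inside the question's loop, outFile << replaceKeywords(inIT, keywords) << endl; would then keep tabs and runs of spaces intact.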

Related

C++ read large file and save it to a string and remove specific random words

I am trying to find a fast way to read a semi-large file (130 MB) into a string and remove the unwanted words from it, like this:
File.txt:
0x0239183 (10): Hello
0x0039123 (1): Test
...
The only word the program should keep is the one after the colon (not counting the space), for example "Hello" and "Test" in this case.
I tried with this code:
fstream f(legitfiles.c_str(), fstream::in );
string s;
while(getline( f, s, '\0')){
    size_t space_pos = s.find(" ");
    if (space_pos != std::string::npos) {
        s = s.substr(space_pos + 1);
    }
}
cout << s << endl;
f.close();
But when I run the program, the only word that gets removed is the first word of the first line.
Output File.txt:
(10): Hello
0x0039123 (1): Test
...
You can try this:
while (getline(f, s)) {
    // rfind returns npos when a line has no space; npos + 1 wraps to 0,
    // so such a line is printed whole
    size_t space_pos = s.rfind(" ") + 1;
    cout << s.substr(space_pos) << endl;
}
Note that this std::getline call uses its default delimiter '\n', so the file is processed one line at a time, and std::string::rfind locates the last space on each line, which is where the substring starts.
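Since the question wanted the result saved in a string rather than printed, here is a minimal sketch of the same loop that appends each kept word to a std::string. keepLastWords is a hypothetical name, and it assumes the word to keep is always the text after the last space on its line.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: keeps only the text after the last space on each line
// and joins the results with '\n'.
std::string keepLastWords(std::istream& in)
{
    std::string line, out;
    while (std::getline(in, line)) {
        // npos + 1 wraps to 0, so a line without spaces is kept whole
        std::size_t pos = line.rfind(' ') + 1;
        out.append(line, pos, std::string::npos);
        out += '\n';
    }
    return out;
}
```

Passing a std::ifstream opened on File.txt processes the real file; a std::istringstream works for quick experiments.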

Reading in only letters from a text file

I am trying to read a poem from a text file that contains commas, spaces, periods, and newline characters. I am trying to use getline to read in each separate word, without reading any of the commas, spaces, periods, or newline characters. As I read each word, I capitalize every letter and then call my insert function to insert the word into a binary search tree as a separate node. I do not know the best way to separate the words. I have been able to separate them by spaces, but the commas, periods, and newline characters keep being read in.
Here is my text file:
Roses are red,
Violets are blue,
Data Structures is the best,
You and I both know it is true.
The code I am using is this:
string inputFile;
cout << "What is the name of the text file?";
cin >> inputFile;
ifstream fin;
fin.open(inputFile);
//Input once
string input;
getline(fin, input, ' ');
for (int i = 0; i < input.length(); i++)
{
    input[i] = toupper(input[i]);
}
//check for duplicates
if (tree.Find(input, tree.Current, tree.Parent) == true)
{
    tree.Insert(input);
    countNodes++;
    countHeight = tree.Height(tree.Root);
}
Basically I am using the getline(fin,input, ' ') to read in my input.
I was able to figure out a solution. I read an entire line of text into the variable line, then went through it character by character, keeping only the letters and storing them in word. Then I called my insert function to insert the node into my tree.
// needs <cstring> for strlen and <cctype> for isalpha/toupper
const int MAXWORDSIZE = 50;
const int MAXLINESIZE = 1000;
char word[MAXWORDSIZE], line[MAXLINESIZE];
int lineIdx, wordIdx, lineLength;
//get the first line
fin.getline(line, MAXLINESIZE - 1);
while (fin)
{
    lineLength = strlen(line);
    for (lineIdx = 0; lineIdx < lineLength;)
    {
        //skip over non-alphas, and check for end of line null terminator
        while (line[lineIdx] != '\0' && !isalpha((unsigned char)line[lineIdx]))
            ++lineIdx;
        //make sure not at the end of the line
        if (line[lineIdx] != '\0')
        {
            //copy alphas to word c-string, leaving room for the terminator
            //(a word longer than MAXWORDSIZE - 1 letters is split in two)
            wordIdx = 0;
            while (isalpha((unsigned char)line[lineIdx]) && wordIdx < MAXWORDSIZE - 1)
            {
                word[wordIdx] = toupper((unsigned char)line[lineIdx]);
                wordIdx++;
                lineIdx++;
            }
            //make it a c-string with the null terminator
            word[wordIdx] = '\0';
            //THIS IS WHERE YOU WOULD INSERT INTO THE BST OR INCREMENT FREQUENCY COUNTER IN THE NODE
            if (tree.Find(word) == false)
            {
                tree.Insert(word);
                totalNodes++;
                //output word
                //cout << word << endl;
            }
            else
            {
                tree.Counter();
            }
        }
    }
    //get the next line; the loop ends once getline fails
    fin.getline(line, MAXLINESIZE - 1);
}
This is a good time for a technique I've posted a few times before: define a ctype facet that treats everything but letters as white space (searching for imbue will show several examples).
From there, it's a matter of std::transform with istream_iterators on the input side, a std::set for the output (which also discards duplicates), and a lambda to capitalize each word.
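A minimal sketch of that facet technique, assuming C++11 and the default "C" locale's idea of what a letter is. letters_only and uniqueUpperWords are hypothetical names; the lambda upper-cases the whole word, since the question capitalizes every letter.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <istream>
#include <iterator>
#include <locale>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// A ctype<char> facet whose classification table marks every character as
// whitespace except letters, so operator>> stops at commas, periods, etc.
struct letters_only : std::ctype<char> {
    letters_only() : std::ctype<char>(get_table()) {}

    static const mask* get_table()
    {
        static std::vector<mask> table(table_size, static_cast<mask>(space));
        for (std::size_t c = 0; c < table_size; ++c)
            if (std::isalpha(static_cast<int>(c)))
                table[c] = static_cast<mask>(alpha);
        return table.data();
    }
};

// Hypothetical helper: reads letter-only words, upper-cases them, and
// collects them in a std::set, which drops duplicates.
std::set<std::string> uniqueUpperWords(std::istream& in)
{
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new letters_only));
    std::set<std::string> words;
    std::transform(std::istream_iterator<std::string>(in),
                   std::istream_iterator<std::string>(),
                   std::inserter(words, words.end()),
                   [](std::string w) {
                       for (char& ch : w)
                           ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
                       return w;
                   });
    return words;
}
```

Each word in the returned set could then be passed to tree.Insert; the set already guarantees no duplicates.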
You can make a custom getline function for multiple delimiters:
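std::istream &getline(std::istream &is, std::string &str, std::string const& delims)
{
    str.clear();
    bool extracted = false;
    // the 3rd parameter type and the delimiter test inside the loop
    // are the main differences from std::getline
    for (char c; is.get(c); ) {
        extracted = true;
        if (delims.find(c) != std::string::npos)
            break;
        str.push_back(c);
    }
    // like std::getline, reaching end of input after reading something is not
    // a failure, so a last word with no trailing delimiter is still returned
    if (extracted && is.eof())
        is.clear(std::ios_base::eofbit);
    return is;
}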
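And use it, skipping the empty strings produced when two delimiters are adjacent (for example ",\n" after "red"):
while (getline(fin, input, " \n,."))
    if (!input.empty()) { /* upper-case input and insert it into the tree */ }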
You can use std::regex to select your tokens.
Depending on the size of your file, you can read it either line by line or entirely into a std::string.
To read the whole file into a string you can use:
std::ifstream t("file.txt");
std::string sin((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());
and this will collect every run of characters that contains no commas, periods, or whitespace:
std::regex word_regex("[^,.\\s]+");
auto what =
    std::sregex_iterator(sin.begin(), sin.end(), word_regex);
auto wend = std::sregex_iterator();
std::vector<std::string> v;
for (; what != wend; ++what) {
    std::smatch match = *what;
    v.push_back(match.str());
}
Since you only want letters, [[:alpha:]]+ is an even simpler pattern: each match is a maximal run of letters, so commas, periods, spaces, and newlines never end up in a token.
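A self-contained sketch of the letters-only regex variant, assuming the text is already in a std::string (for example read with the istreambuf_iterator idiom above); letterWords is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of letters in text. Punctuation,
// spaces and newlines can never be part of a match, so they are skipped.
std::vector<std::string> letterWords(const std::string& text)
{
    static const std::regex word_regex("[[:alpha:]]+");
    std::vector<std::string> words;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), word_regex);
         it != std::sregex_iterator(); ++it) {
        words.push_back(it->str());
    }
    return words;
}
```

Upper-casing and inserting into the tree can then be done on each element of the returned vector.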

General CSV Parser with multiple EOL characters

I'm trying to change this function to also handle CSV files with \r line endings. I can't figure out how to get getline() to take that into account.
vector<vector<string>> Parse::parseCSV(string file)
{
    // input fstream instance
    ifstream inFile;
    inFile.open(file);
    // check for error
    if (inFile.fail()) { cerr << "Cannot open file" << endl; exit(1); }
    vector<vector<string>> data;
    string line;
    while (getline(inFile, line))
    {
        stringstream inputLine(line);
        char delimeter = ',';
        string word;
        vector<string> brokenLine;
        while (getline(inputLine, word, delimeter)) {
            word.erase(remove(word.begin(), word.end(), ' '), word.end()); // remove all white spaces
            brokenLine.push_back(word);
        }
        data.push_back(brokenLine);
    }
    inFile.close();
    return data;
};
This is a possible duplicate of "Getting std::ifstream to handle LF, CR, and CRLF?"; the top answer there is particularly good.
If you know every line ends with a \r, you can specify it as the getline delimiter: getline(input, data, '\r'), where input is an input stream, data is a string, and the third parameter is the character to split on. You could also try something like the following after the start of the first while loop:
// after the start of the first while loop
// (replaces the original stringstream inputLine(line); declaration)
stringstream inputLine;
size_t pos = line.find('\r');
if (pos != string::npos) {
    // put a '\n' where the first '\r' was
    inputLine << line.substr(0, pos);
    inputLine << "\n";
    inputLine << line.substr(pos + 1);
} else {
    inputLine << line;
}
// the rest of your code here
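Along the lines of the linked question, a more general option is a getline replacement that accepts any of the three line endings. The sketch below is a minimal illustration (not the code from that answer); getlineAnyEol is a hypothetical name, and it assumes the stream is opened in text mode with no other translation going on.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical replacement for std::getline: reads one line and accepts
// "\n", "\r" or "\r\n" as the line terminator (the terminator is discarded).
std::istream& getlineAnyEol(std::istream& is, std::string& line)
{
    line.clear();
    char c;
    while (is.get(c)) {
        if (c == '\n')
            return is;
        if (c == '\r') {
            if (is.peek() == '\n')
                is.get();                 // swallow the LF of a CRLF pair
            return is;
        }
        line.push_back(c);
    }
    // End of input: like std::getline, succeed if the last line had characters.
    if (!line.empty())
        is.clear(std::ios_base::eofbit);
    return is;
}
```

In parseCSV, while (getlineAnyEol(inFile, line)) would then take the place of the outer while (getline(inFile, line)), and the rest of the function can stay as it is.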

How to extract a substring from a string in C++?

I've looked through thousands of questions and answers about what I'm going to ask, but I still haven't found a way to do what I'm going to explain.
I have a text file from which I have to extract information about several things, all of them with the following format:
"string1":"string2"
These pairs are surrounded by other information; the text file is something like this:
LINE 1
XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string1":"string2"XXXXXXXXXXXXXXXXXXXXXXXXXX"string3":"string4"XXXXXXXXXXXXXXXXXXXXXXXXXXXX...('\n')
LINE 2
XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string5":"string6"XXXXXXXXXXXXXXXXXXXXXXXXXX"string7":"string8"XXXXXXXXXXXXXXXXXXXXXXXXXXXX...
XXX represents irrelevant information I do not need, and theEntireString (string used in the code example) stores all the information of a single line, not all the information of the text file.
I first have to find string1 and then store the content of string2, without the quotes, in another string. The problem is that I have to stop when I reach the closing quote, and I don't know exactly how to do this. I suppose I have to use the functions find() and substr(), but despite trying repeatedly, I did not succeed.
What I have done is something like this:
string extractInformation(string theEntireString)
{
string s = "\"string1\":\"";
string result = theEntireString.find(s);
return result;
}
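But I suppose this way the string would also hold the closing quote and the rest of the line.
find only gives you the position of the match; to get the resulting string you need substr. Assuming theEntireString holds exactly one "key":"value" pair with nothing around it, try this:
string start, end;
// key: from after the first quote up to the quote before ':'
start = theEntireString.substr(1, theEntireString.find(":") - 2);
// value: from after the quote following ':' up to (not including) the final quote
end = theEntireString.substr(theEntireString.find(":") + 2, theEntireString.size() - theEntireString.find(":") - 3);
That should solve your problem for that simple case.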
Assuming neither the key nor the value contains a quotation mark, the following will output the value after the ":". You can also use it in a loop to repeatedly extract the value field if you have multiple key-value pairs in the input string, provided that you keep a record of the position of the last found instance and pass it as p, the position to start searching from.
#include <iostream>
#include <string>
using namespace std;
// Returns the value of "key":"value", searching from position p; "" if not found.
string extractInformation(size_t p, const string& key, const string& theEntireString)
{
    string s = "\"" + key + "\":\"";
    auto p1 = theEntireString.find(s, p);
    if (p1 == string::npos)
        return "";
    p1 += s.size();                               // first character of the value
    auto p2 = theEntireString.find('"', p1);      // closing quote of the value
    if (p2 == string::npos)
        return "";
    return theEntireString.substr(p1, p2 - p1);
}
int main() {
    string data = "\"key\":\"val\" \"key1\":\"val1\"";
    string res = extractInformation(0, "key", data);
    string res1 = extractInformation(0, "key1", data);
    cout << res << "," << res1 << endl;
}
Outputs:
val,val1
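Building on the loop idea above, here is a minimal sketch that walks one line and collects every "key":"value" pair with find and substr, without knowing the keys in advance. It assumes the separator is exactly ":" with no spaces and that no quote appears inside a key or value; extractAllPairs is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns every "key":"value" pair in text, in order.
std::vector<std::pair<std::string, std::string>> extractAllPairs(const std::string& text)
{
    std::vector<std::pair<std::string, std::string>> pairs;
    const std::string sep = "\":\"";   // closing quote of key, colon, opening quote of value
    std::size_t pos = 0;
    while ((pos = text.find(sep, pos)) != std::string::npos) {
        std::size_t valueStart = pos + sep.size();
        std::size_t valueEnd = text.find('"', valueStart);
        if (valueEnd == std::string::npos)
            break;                     // unterminated value: stop scanning
        // opening quote of the key is the last '"' before the separator
        std::size_t keyStart = pos == 0 ? std::string::npos : text.rfind('"', pos - 1);
        if (keyStart != std::string::npos)
            pairs.emplace_back(text.substr(keyStart + 1, pos - keyStart - 1),
                               text.substr(valueStart, valueEnd - valueStart));
        pos = valueEnd + 1;            // continue after the value's closing quote
    }
    return pairs;
}
```

Calling it once per line read with std::getline gives all pairs of the file.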
Two steps, assuming theEntireString is exactly "string1":"string2" with nothing around it:
First we find the position of the : and split the string into two parts:
string first = theEntireString.substr(0, theEntireString.find(":"));
string second = theEntireString.substr(theEntireString.find(":") + 1);
Now we remove the surrounding quotes:
string final_first(first.begin() + 1, first.end() - 1);
string final_second(second.begin() + 1, second.end() - 1);
You don't need any string operations at all. Assuming the XXX parts never contain a '"', you can read both strings directly from the file:
ifstream file("input.txt");
// skip to the next '"', read s1 up to its closing '"', expect the :" separator, then read s2 up to its closing '"'
for( string s1,s2; getline( getline( file.ignore( numeric_limits< streamsize >::max(), '"' ), s1, '"' ) >> Char<':'> >> Char<'"'>, s2, '"' ); )
    cout << "S1=" << s1 << " S2=" << s2 << endl;
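The little helper function Char (declare it before the loop above) is:
template< char C >
std::istream& Char( std::istream& in )
{
    // read the next non-whitespace character and fail the stream unless it is C
    char c;
    if( in >> c && c != C )
        in.setstate( std::ios_base::failbit );
    return in;
}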
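A self-contained version of this stream-based approach, wrapped in a function so it can be tried on a string stream. It repeats the Char helper so it compiles on its own and assumes, as above, that the XXX filler never contains a double quote; readPairs is a hypothetical name.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads the next non-whitespace character and fails the stream unless it is C.
template <char C>
std::istream& Char(std::istream& in)
{
    char c;
    if (in >> c && c != C)
        in.setstate(std::ios_base::failbit);
    return in;
}

// Hypothetical helper: collects every "s1":"s2" pair from the stream.
std::vector<std::pair<std::string, std::string>> readPairs(std::istream& in)
{
    std::vector<std::pair<std::string, std::string>> pairs;
    std::string s1, s2;
    while (in.ignore(std::numeric_limits<std::streamsize>::max(), '"')  // up to the key's opening quote
           && std::getline(in, s1, '"')                                  // key, up to its closing quote
           && in >> Char<':'> >> Char<'"'>                               // ':' and the value's opening quote
           && std::getline(in, s2, '"'))                                 // value, up to its closing quote
        pairs.emplace_back(s1, s2);
    return pairs;
}
```

Because Char reads with >>, whitespace around the colon (as in "string3" :"string4") is tolerated.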
#include <regex>
#include <iostream>
#include <string>
using namespace std;
const string text = R"(
XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string1":"string2"XXXXXXXXXXXXXXXXXXXXXXXXXX"string3" :"string4" XXXXXXXXXXXXXXXXXXXXXXXXXXXX...
XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string5": "string6"XXXXXXXXXXXXXXXXXXXXXXXXXX"string7" : "string8" XXXXXXXXXXXXXXXXXXXXXXXXXXXX...
)";
int main() {
    // "key", optional whitespace, ':', optional whitespace, "value"
    const regex pattern{R"~("([^"]*)"\s*:\s*"([^"]*)")~"};
    for (auto it = sregex_iterator(begin(text), end(text), pattern); it != sregex_iterator(); ++it) {
        cout << it->format("First: $1, Second: $2") << endl;
    }
}
Output:
First: string1, Second: string2
First: string3, Second: string4
First: string5, Second: string6
First: string7, Second: string8
Running (with clang and libc++): http://coliru.stacked-crooked.com/a/f0b5fd383bc227fc
This is how raw string literals look in an editor that understands them: http://bl.ocks.org/anonymous/raw/9442865/

how to read a line in C++ in a certain pattern and store it in a string?

I want to read each line of a text file, which looks something like this:
1190/2132 123/23123 45
I want to read the whole line and then store its parts in three separate strings, to build a tree later. I am using fgets right now, but I'm getting errors when putting the result into a string. How should I do it?
Try this (needs <string> and <sstream>):
std::string line;
while(std::getline(file, line))
{
    std::stringstream linestream(line);
    std::string word1, word2, word3;
    linestream >> word1 >> word2 >> word3;
    // Store words
}
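A self-contained sketch of the same loop that returns the fields, assuming each line has exactly three whitespace-separated fields as in the example; readTriples is a hypothetical name, and lines with fewer than three fields are skipped.

```cpp
#include <array>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits lines such as "1190/2132 123/23123 45"
// at whitespace into three strings each.
std::vector<std::array<std::string, 3>> readTriples(std::istream& file)
{
    std::vector<std::array<std::string, 3>> rows;
    std::string line;
    while (std::getline(file, line)) {
        std::istringstream linestream(line);
        std::array<std::string, 3> row;
        if (linestream >> row[0] >> row[1] >> row[2])   // all three fields present
            rows.push_back(row);
    }
    return rows;
}
```

The returned rows can then be fed to whatever builds the tree.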
You've tagged the question C++, but you say you're using fgets, so I'm not sure which one you want.
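Using C stdio functions (here str1, str2 and str3 are assumed to be char arrays of 64 bytes; the %63s widths stop fscanf from writing past them):
fscanf(file, "%63s %63s %63s", str1, str2, str3);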
Using C++ streams:
input_stream >> str1 >> str2 >> str3;
This may work:
string a, b, c;
getline(cin, a, '/');
getline(cin, b, ' ');
//the body only runs if a third string was read
if(cin >> c){}
What you need to get it to work:
#include <fstream> so that you can open the text file with an input file stream.
#include <iostream> if you also want to display some information on the screen (optional).
The code part:
Pseudo code:
define a character array of length K, where K can be defined as a macro
open an input file stream
test whether it is open; if it is, read and parse the file line by line until EOF
if it is not open, return -1.
The Code
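// needs <cstring> (strtok), <fstream> and <iostream>
#define MAX_LNTXT_LENGTH_CPTIMGIDX 1024  // K: maximum line length (value assumed here)

// delim: the set of delimiter characters, e.g. " /" (parameter added here;
// the original referred to an undeclared _delim)
template <typename dataType>
int fileread(const char* filename, const char* delim, dataType& data /* some object saving the read info. */)
{
    char lntxt[MAX_LNTXT_LENGTH_CPTIMGIDX];
    std::ifstream inSR(filename);
    if (inSR.is_open())
    {
        // If the file is open, read it line by line until EOF
        while (inSR.getline(lntxt, MAX_LNTXT_LENGTH_CPTIMGIDX))
        {
            // strtok splits the line at any of the characters in delim
            for (char* strTk = std::strtok(lntxt, delim); strTk != NULL; strTk = std::strtok(NULL, delim))
            {
                // Your code to process strTk, i.e. some arithmetic operation
                // or store it in other variables or objects (such as data).
            }
        }
        inSR.close();
        return 0;
    }
    else
    {
        std::cout << "The file " << filename << " can not be opened.";
        return -1;
    }
}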