Wordwrap function fix to preserve whitespace between words - C++

Some time ago I found a snippet that word-wraps text to a given line length without breaking up words. It worked well enough, but now that I have started using it in an edit control, I have noticed that it collapses runs of multiple whitespace characters between words into a single space. I am wondering how to fix it, or whether to drop it completely if wstringstream is not suitable for the task. Maybe someone out there has a similar function?
void WordWrap2(const std::wstring& inputString, std::vector<std::wstring>& outputString, unsigned int lineLength)
{
std::wstringstream iss(inputString);
std::wstring line;
std::wstring word;
while(iss >> word)
{
if (line.length() + word.length() > lineLength)
{
outputString.push_back(line+_T("\r"));
line.clear();
}
if( !word.empty() ) {
if( line.empty() ) line += word; else line += L" " + word;
}
}
if (!line.empty())
{
outputString.push_back(line+_T("\r"));
}
}
The line delimiter should remain \r.

Instead of reading a word at a time, and adding words until you'd exceed the desired line length, I'd start from the point where you want to wrap, and work backwards until you find a white-space character, then add that entire chunk to the output.
#include <cwctype>
#include <string>
#include <vector>
void WordWrap2(const std::wstring& inputString,
               std::vector<std::wstring>& outputString,
               unsigned int lineLength) {
    size_t last_pos = 0;
    size_t pos;
    for (pos = lineLength; pos < inputString.length(); pos += lineLength) {
        // Walk back from the wrap point to the nearest white-space character.
        // iswspace is used because the string holds wchar_t, not char.
        while (pos > last_pos && !std::iswspace(inputString[pos]))
            --pos;
        outputString.push_back(inputString.substr(last_pos, pos - last_pos));
        last_pos = pos;
        // Skip the white space that falls on the line break.
        while (std::iswspace(inputString[last_pos]))
            ++last_pos;
    }
    outputString.push_back(inputString.substr(last_pos));
}
As it stands, this will fail if it encounters a single word that's longer than the line length you've specified (in such a case, it probably should just break in the middle of the word, but it currently doesn't).
I've also written it to skip over whitespace between words when it falls on a line break. If you really don't want that, just eliminate the:
        while (std::iswspace(inputString[last_pos]))
            ++last_pos;
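For what it's worth, here is a minimal sketch of how the same "search backwards from the wrap point" idea could be adapted to keep every whitespace character and to split words that are longer than a line. The name WordWrapKeepSpaces is made up for this example, and the policy is an assumption on my part: each line ends right after the last whitespace character that fits, so trailing whitespace counts toward the line length, and "\r" is appended as the question asks.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical variant of WordWrap2 (not from the original answer):
// keeps every input character, splits over-long words, appends L"\r".
void WordWrapKeepSpaces(const std::wstring& input,
                        std::vector<std::wstring>& output,
                        std::size_t lineLength) {
    if (lineLength == 0) return;  // nothing sensible to do
    std::size_t start = 0;
    while (input.length() - start > lineLength) {
        // Search backwards from the end of the window for whitespace.
        const std::size_t end = start + lineLength;
        std::size_t brk = end;
        while (brk > start && !std::iswspace(input[brk - 1]))
            --brk;
        if (brk == start)  // no whitespace in the window: split the word
            brk = end;
        output.push_back(input.substr(start, brk - start) + L"\r");
        start = brk;
    }
    if (start < input.length())
        output.push_back(input.substr(start) + L"\r");
}
```

Concatenating the output lines (minus the "\r") gives back the original string, which is probably what you want for an edit control.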

If you don't want to lose space characters, you need to add the following line before doing any reads:
iss >> std::noskipws;
But then using >> with a string as the right-hand operand won't work well with respect to spaces: once the stream is positioned at a whitespace character, the extraction reads nothing and fails.
You'll have to resort to reading chars and managing them yourself in an ad hoc manner.
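As a rough sketch of what reading chars could look like: the helper below (SplitKeepingSpaces is a name I made up) reads one wchar_t at a time with std::noskipws set and groups the characters into alternating runs of whitespace and non-whitespace. How you then assemble those runs into lines is up to you.

```cpp
#include <cwctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text into alternating runs of whitespace and
// non-whitespace, so that concatenating the tokens reproduces the input.
std::vector<std::wstring> SplitKeepingSpaces(const std::wstring& text) {
    std::wistringstream iss(text);
    iss >> std::noskipws;  // extracting a wchar_t no longer skips whitespace
    std::vector<std::wstring> tokens;
    wchar_t ch;
    while (iss >> ch) {
        const bool space = std::iswspace(ch) != 0;
        // Start a new token when switching between whitespace and text.
        if (tokens.empty() ||
            (std::iswspace(tokens.back().front()) != 0) != space)
            tokens.emplace_back();
        tokens.back() += ch;
    }
    return tokens;
}
```

For example, L"a  bc " comes back as the four tokens "a", "  ", "bc" and " ", so no whitespace is lost.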

Related

how do you split a string embedded in a delimiter in C++?

I understand how to split a string by a delimiter in C++, but how do you split a string embedded in delimiters? For example, how do you split "~!hello~! random junk... ~!world~!" by the string "~!" into an array of ["hello", " random junk...", "world"]? Are there any C++ standard library functions for this, or if not, is there an algorithm that could achieve it?
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<string> split(string s, const string& delimiter){
    vector<string> res;
    string word;
    size_t pos = s.find(delimiter); // size_t, so the comparison with string::npos is exact
    while (pos != string::npos) {
        word = s.substr(0, pos); // The word that comes before the delimiter
        res.push_back(word); // Push the word to our final vector
        s.erase(0, pos + delimiter.length()); // Delete the word and the delimiter, then repeat until no delimiter is left
        pos = s.find(delimiter); // Update pos to hold the position of the next delimiter in our string
    }
    res.push_back(s); // Push the last word, which comes after the final delimiter
    return res;
}
int main() {
    string s = "~!hello~!random junk... ~!world~!";
    vector<string> words = split(s, "~!");
    size_t n = words.size();
    for (size_t i = 0; i < n; i++)
        std::cout << words[i] << std::endl;
    return 0;
}
The above program finds all the words that occur before, in between and after the delimiter that you specify, including empty ones (here, the empty strings before the first delimiter and after the last). With minor changes you can make the function suit your needs, for example if you don't want the word that occurs before the first delimiter or after the last one.
Apart from that, the function splits the string according to the delimiter you provide.
I hope this answers your question!
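One possible version of those "minor changes", as a sketch rather than the definitive way to do it: the function below (splitNonEmpty is a name invented for this example) scans with find() from a moving start position instead of erasing from the front, and drops empty tokens, which gives exactly the three strings asked for in the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant of split(): skips empty tokens and does not
// modify or copy the input string.
std::vector<std::string> splitNonEmpty(const std::string& s,
                                       const std::string& delimiter) {
    std::vector<std::string> res;
    if (delimiter.empty()) {          // find("") would never advance
        if (!s.empty()) res.push_back(s);
        return res;
    }
    std::size_t start = 0;
    while (start <= s.size()) {
        std::size_t pos = s.find(delimiter, start);
        if (pos == std::string::npos) pos = s.size();  // last token
        if (pos > start) res.push_back(s.substr(start, pos - start));
        start = pos + delimiter.size();
    }
    return res;
}
```

Calling splitNonEmpty("~!hello~! random junk... ~!world~!", "~!") yields "hello", " random junk... " and "world"; the spaces inside the middle token are kept as they are.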

Parsing coordinates from file using c++

I have a question about parsing coordinates from a file into my C++ program.
The content of the file "file.txt" consists of one line: "1,2"
The 1 needs to be the X coordinate, the ',' is the delimiter, and the 2 is the Y coordinate.
The output of my program is: "1".
It looks like my program only puts the string in front of the delimiter in the vector and then thinks it has reached the end of the file.
How can I solve this problem?
You can find my code down here. Thanks in advance!
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <sstream>
char data[220];
void parseString(std::string string);
int main(int argc, char **argv) {
std::ifstream indata("file.txt");
std::vector <std::string> buffer(5);
int i = 0;
while(indata.good())
{
indata.getline(data, 220);
parseString(data);
++i;
}
return 0;
}
void parseString(std::string string){
std::string delimiter = ",";
size_t pos = 0;
std::string token;
std::vector<std::string> tempVector(2);
int i = 0;
while ((pos = string.find(delimiter)) != std::string::npos) {
token = string.substr(0, pos);
tempVector[i] = token;
string.erase(0, pos + delimiter.length());
}
for(std::string S : tempVector){
std::cout << S << std::endl;
}
}
Here is where the problem comes from:
while ((pos = string.find(delimiter)) != std::string::npos) {
token = string.substr(0, pos);
tempVector[i] = token;
string.erase(0, pos + delimiter.length());
}
After the first pass through the while loop, you have erased the first part, i.e. "1,", which leaves only "2". The loop then stops because no delimiter is left, so the remaining "2" is never stored in tempVector (and i is never incremented). That's why you only get 1.
You can simply put string data into a std::istringstream, then you can parse data easily by using >>:
std::istringstream iss(data); // e.g. data = "1,2"
int first_int, second_int;
char delimiter;
iss >> first_int >> delimiter >> second_int;
//     |            |            |
//     1           ','           2
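To make that a bit more concrete, here is a small sketch that wraps the extraction in a function and checks that it actually worked. The name parseCoordinate and the exact rules (whitespace allowed around the tokens, nothing else allowed after the second number) are my own assumptions, not something the question specifies.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: parses "x,y" into two ints. Returns false when the
// text does not have that shape. On failure, x and y may have been modified.
bool parseCoordinate(const std::string& text, int& x, int& y) {
    std::istringstream iss(text);
    char delimiter = '\0';
    if (!(iss >> x >> delimiter >> y) || delimiter != ',')
        return false;          // missing number or wrong delimiter
    iss >> std::ws;            // skip trailing whitespace, if any
    return iss.eof();          // reject trailing garbage such as "1,2xxx"
}
```

With this, "1,2" and " -10 , 25 " are accepted, while "1", "1;2" and "1,2xxx" are rejected.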
The root of the problem is that your requirements are underspecified. For example:
Can you assume that every coordinate is just from 0 to 9? Or are there coordinates with more digits?
Can there be negative coordinates? Should you be able to handle a minus character? Is a plus character allowed, i.e. something like "-1,+1"?
Where is whitespace allowed?
Do you have to handle errors such as when the file is empty or there is no ',' at all, or if there are multiple commas, or if one of the supposed numbers does not consist of digits?
Are you allowed to ignore everything after correct input, i.e. something like "1,2xxx"?
For the simplest of requirements imaginable here, you could just do:
if (data[1] == ',') {
int x = data[0] - '0';
int y = data[2] - '0';
}
But that's apparently not good enough. So you do have more complex requirements, and I think you should put more thought into them. Only then will you be able to produce a really correct program.
As a final word, mind that user input is always a very complex thing, and it's generally hard to think about and cover each and every corner case, but everyone likes programs which handle user input correctly and intuitively and report errors in the most precise way possible, don't we? :)

Segmentation Fault searching for End of Line

I'm writing code that counts the number of lines and characters in a file.
#include <fstream>
#include <iostream>
#include <stdlib.h>
using namespace std;
int main(int argc, char* argv[])
{
ifstream read(argv[1]);
char line[256];
int nLines=0, nChars=0, nTotalChars=0;
read.getline(line, 256);
while(read.good())
{
nChars=0;
int i=0;
while(line[i]!='\n')
{
if ((int)line[i]>32) {nChars++;}
i++;
}
nLines++;
nTotalChars= nTotalChars + nChars;
read.getline(line, 256);
}
cout << "The number of lines is "<< nLines << endl;
cout << "The number of characters is "<< nTotalChars << endl;
}
The line while(line[i]!='\n') seems to be the cause of the following error
Segmentation fault (core dumped)
I can't figure out what's wrong. The internet tells me that I'm checking for the end of a line correctly as far as I can tell.
Your code will not find '\n' because it is discarded from the input sequence. From the documentation of getline:
The delimiting character is the newline character [...]: when found in the input sequence, it is extracted from the input sequence, but discarded and not written to s.
You should be searching for '\0':
while(line[i])
{
if ((int)line[i]>32) {nChars++;}
i++;
}
getline does not store the '\n', so the loop:
while(line[i]!='\n')
{
if ((int)line[i]>32) {nChars++;}
i++;
}
will not stop at the end of the line. It keeps going until i runs past the end of the array, and reading there is what causes the segmentation fault.
You do not have an end-of-line character in the line, so you should check for the null character ('\0', the end of the string) instead of the end of line. Also make sure that you do not go past the size of your buffer (256 in your case).
I think a for loop would be safer (this assumes line is a std::string rather than a char array):
for ( unsigned int i = 0; i < line.size(); i++ ) {
//whatever
}
There are several problems with your code, but for starters, you
shouldn't be reading lines into a char[]. If you use
std::string, then you don't have to worry about reading
partial lines, etc.
Then there's the fact that getline extracts the '\n' from
the file, but does not store it, so your code (even modified
to use std::string) will never see a '\n' in the buffer. If
you're using string, you iterate from line.begin() to
line.end(); if you're using a char[], you iterate over the
number of bytes returned by read.gcount(), called after the
call to getline. (It's very difficult to get this code right
using a char[] unless you assume that no text file in the
world contains a '\0'.)
Finally, if the last line doesn't end with a '\n' (a frequent
case under Windows), you won't process it. If you're using
std::string, you can simply write:
std::getline( read, line );
while ( read ) {
// ...
std::getline( read, line );
}
or even:
while ( std::getline( read, line ) ) {
++ nLines;
for ( std::string::const_iterator current = line.begin();
current != line.end();
++ current ) {
// process character *current in line...
}
}
(The latter is ubiquitous, even if it is ugly.)
With char[], you have to replace this with:
while ( read.getline( buffer, sizeof(buffer) ) || read.gcount() != 0 ) {
    // gcount() also counts the '\n' that was extracted but not stored.
    int l = read.gcount();
    if ( read ) {
        ++ nLines;
    } else {
        if ( read.eof() ) {
            ++ nLines;      // Last line did not end with a '\n'
        } else {
            read.clear();   // Line longer than buffer...
        }
    }
    for ( int i = 0; i != l; ++ i ) {
        // process character buffer[i] in line...
    }
}
One final question: what is (int)line[i] > 32 supposed to
mean? Did you want !isspace( line[i] ) &&
!iscntrl( line[i] )? (That's not at all what it does, of
course.)

Need a regular expression to extract only letters and whitespace from a string

I'm building a small utility method that parses a line (a string) and returns a vector of all the words. The istringstream code I have below works fine except for when there is punctuation so naturally my fix is to want to "sanitize" the line before I run it through the while loop.
I would appreciate some help in using the regex library in C++ for this. My initial solution was to use substr() and go to town, but that seems complicated, as I'll have to iterate over and test each character to see what it is and then perform some operations.
vector<string> lineParser(Line * ln)
{
vector<string> result;
string word;
string line = ln->getLine();
istringstream iss(line);
while(iss)
{
iss >> word;
result.push_back(word);
}
return result;
}
You don't need regular expressions just for punctuation:
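// Replace all punctuation with space character.
// Needs <algorithm> and <cctype>; the lambda replaces std::ptr_fun, which was removed in C++17.
std::replace_if(line.begin(), line.end(),
                [](unsigned char c) { return std::ispunct(c) != 0; },
                ' '
               );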
Or if you want everything but letters and numbers turned into space:
std::replace_if(line.begin(), line.end(),
                [](unsigned char c) { return std::isalnum(c) == 0; },
                ' '
               );
While we are here:
Your while loop is broken and will push the last value into the vector twice.
It should be:
while(iss)
{
    iss >> word;
    if (iss) // The read succeeded; if it had failed, the iss state would be bad.
    {
        result.push_back(word); // Only push_back() if the state is not bad.
    }
}
Or the more common version:
while(iss >> word) // Loop is only entered if the read of the word worked.
{
result.push_back(word);
}
Or you can use the STL:
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(result)
);
[^A-Za-z\s] should do what you need if you replace the matching characters with nothing. It removes all characters that are not letters or whitespace. Use [^A-Za-z0-9\s] if you want to keep numbers too.
You can use online tools like this one: http://gskinner.com/RegExr/ to test your patterns (Replace tab). Some modifications may be required depending on the regex library you are using.
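In C++ this pattern could be applied with std::regex_replace, roughly as sketched below. The function name wordsOnly is made up, and note one deliberate change: I replace the matches with a space instead of nothing, on the assumption that "a,b" should give two words and not "ab". The backslash has to be doubled inside a regular C++ string literal.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: replaces everything except letters and whitespace
// with a space, then splits the result on whitespace.
std::vector<std::string> wordsOnly(const std::string& line) {
    static const std::regex notLetterOrSpace("[^A-Za-z\\s]");
    const std::string cleaned = std::regex_replace(line, notLetterOrSpace, " ");
    std::istringstream iss(cleaned);
    std::vector<std::string> result;
    std::string word;
    while (iss >> word)   // the loop is only entered if the read worked
        result.push_back(word);
    return result;
}
```

Replacing with an empty string also works if you really want the punctuation to vanish without separating the text around it.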
I'm not positive, but I think this is what you're looking for:
#include <iostream>
#include <regex>
#include <string>
#include <vector>
int
main()
{
std::string line("some words: with some punctuation.");
std::regex words("[\\w]+");
std::sregex_token_iterator i(line.begin(), line.end(), words);
std::vector<std::string> list(i, std::sregex_token_iterator());
for (auto j = list.begin(), e = list.end(); j != e; ++j)
std::cout << *j << '\n';
}
This prints:
some
words
with
some
punctuation
The simplest solution is probably to create a filtering
streambuf to convert all non alphanumeric characters to space,
then to read using std::copy:
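#include <cctype>
#include <cstdio>
#include <istream>
#include <iterator>
#include <streambuf>
#include <string>
#include <vector>
class StripPunct : public std::streambuf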
{
std::streambuf* mySource;
char myBuffer;
protected:
virtual int underflow()
{
int result = mySource->sbumpc();
if ( result != EOF ) {
if ( !::isalnum( result ) )
result = ' ';
myBuffer = result;
setg( &myBuffer, &myBuffer, &myBuffer + 1 );
}
return result;
}
public:
explicit StripPunct( std::streambuf* source )
: mySource( source )
{
}
};
std::vector<std::string>
LineParser( std::istream& source )
{
StripPunct sb( source.rdbuf() );
std::istream src( &sb );
return std::vector<std::string>(
(std::istream_iterator<std::string>( src )),
(std::istream_iterator<std::string>()) );
}

Read a string line by line using c++

I have a std::string with multiple lines and I need to read it line by line.
Please show me how to do it with a small example.
Ex: I have a string string h;
h will be:
Hello there.
How are you today?
I am fine, thank you.
I need to extract Hello there., How are you today?, and I am fine, thank you. somehow.
#include <sstream>
#include <string>
#include <iostream>
int main() {
std::istringstream f("line1\nline2\nline3");
std::string line;
while (std::getline(f, line)) {
std::cout << line << std::endl;
}
}
There are several ways to do that.
You can use std::string::find in a loop for '\n' characters and substr() between the positions.
You can use std::istringstream and std::getline( istr, line ) (Probably the easiest)
You can use boost::tokenizer
This may help you:
http://www.cplusplus.com/reference/iostream/istream/getline/
If you'd rather not use streams:
#include <cstdio>
#include <string>
using std::string;
int main() {
string out = "line1\nline2\nline3";
size_t start = 0;
size_t end;
while (1) {
string this_line;
if ((end = out.find("\n", start)) == string::npos) {
if (!(this_line = out.substr(start)).empty()) {
printf("%s\n", this_line.c_str());
}
break;
}
this_line = out.substr(start, end - start);
printf("%s\n", this_line.c_str());
start = end + 1;
}
}
I was looking for some standard implementation for a function which can return a particular line from a string. I came across this question and the accepted answer is very useful. I also have my own implementation which I would like to share:
// CODE: A
std::string getLine(const std::string& str, int line)
{
    size_t pos = 0;
    if (line < 0)
        return std::string();
    while (line-- > 0)
    {
        pos = str.find("\n", pos);
        if (pos == std::string::npos) // fewer lines than requested
            return std::string();
        ++pos;                        // step past the newline
    }
    if (pos >= str.length())
        return std::string();
    size_t end = str.find("\n", pos);
    return str.substr(pos, (end == std::string::npos ? std::string::npos : (end - pos + 1)));
}
But I have replaced my own implementation with the one shown in the accepted answer, as it uses a standard function and should be less bug-prone.
// CODE: B
std::string getLine(const std::string& str, int lineNo)
{
    std::string line;
    std::istringstream stream(str);
    while (lineNo-- >= 0)
        if (!std::getline(stream, line))
            return std::string(); // fewer lines than requested
    return line;
}
There is a behavioral difference between the two implementations. CODE: B removes the newline from each line it returns. CODE: A doesn't remove the newline.
My intention in posting an answer to this inactive question is to let others see possible implementations.
NOTE:
I didn't want any kind of optimization and wanted to perform a task given to me in a Hackathon!