The way to strip newline characters - c++

Let's say we have a long string with multiple newline characters:
char const* some_text = "part1\n\npart2\npart3";
Now the task is to replace every '\n' character with a space if it appears only once between text parts, and to leave the '\n' characters untouched if more than one appears in a row. In other words:
"123\n456" => "123 456"
"123\n\n456" => "123\n\n456"
"123\n\n456\n789" => "123\n\n456 789"
What is the best way to do this?

The following regular expression detects single occurrences of newlines:
([^\n]|^)\n([^\n]|$)
It has three parts:
([^\n]|^)   no newline before (either another character or the beginning of the string)
\n          the newline itself
([^\n]|$)   no newline after (either another character or the end of the string)
You can use that regular expression with std::regex_replace to replace those single newlines with spaces, keeping the matched characters before and after the newline by putting $1 and $2 in the replacement string:
std::string testString("\n123\n\n456\n789");
std::regex e("([^\n]|^)\n([^\n]|$)");
std::cout << std::regex_replace(testString, e, "$1 $2") << std::endl;
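One caveat worth checking: the pattern above consumes the character after the newline, so when two single newlines are separated by just one character (as in "a\nb\nc"), that character is no longer available as the "before" character of the second newline, and the second newline may be left in place. A possible variant, sketched here under the assumption that ^ and $ match only at the start and end of the whole string (the non-multiline ECMAScript default), moves the trailing test into a lookahead so nothing after the newline is consumed. The function name join_single_newlines is made up for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the answer above): replaces each isolated
// '\n' with a space and leaves runs of two or more newlines untouched.
std::string join_single_newlines(const std::string& text)
{
    // ([^\n]|^)    captured character before the newline, or start of string
    // \n           the newline to replace
    // (?=[^\n]|$)  lookahead: a non-newline or the end follows (not consumed)
    static const std::regex single_newline("([^\n]|^)\n(?=[^\n]|$)");
    return std::regex_replace(text, single_newline, "$1 ");
}
```

Because only the character before the newline is captured, the replacement string needs $1 and a space, but no $2.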

Since the question is tagged C++, I'll treat it as such. This could obviously be solved with a regex, but the task as described is simple enough not to require one.
std::string s = "your\n\nstring\nhere\n";
std::size_t n = 0;
while ((n = s.find('\n', n)) != std::string::npos)
{
    // s[s.length()] is '\0' (since C++11), so reading s[n + 1] is safe
    // even when the newline is the last character.
    bool const prev_is_newline = n > 0 && s[n - 1] == '\n';
    bool const next_is_newline = s[n + 1] == '\n';
    if (!prev_is_newline && !next_is_newline)
        s[n] = ' ';
    ++n;
}

This function may work for your case: it manually checks for single \n characters and replaces each one with a space. There may be better options, such as regex_replace.
void rep(char ch[])
{
    int cnt = 0;
    int i;
    for(i=0; ch[i]!='\0'; i++)
    {
        if(ch[i]=='\n')
            cnt++;
        else if(cnt==1)
        {
            ch[i-1]=' ';
            cnt=0;
        }
        else
            cnt=0;
    }
    if(cnt==1)
        ch[i-1]=' ';
}
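Note that rep modifies its argument in place, so it cannot be called directly on the string literal from the question (char const* some_text). A minimal sketch of one way around that, assuming that copying the text into a std::string is acceptable. rep_copy is a hypothetical wrapper name, and rep is repeated here only so that the block stands alone:

```cpp
#include <string>

// Same logic as rep() above: a run of exactly one '\n' becomes a space.
void rep(char ch[])
{
    int cnt = 0;  // length of the current run of newlines
    int i;
    for (i = 0; ch[i] != '\0'; i++)
    {
        if (ch[i] == '\n')
            cnt++;
        else
        {
            if (cnt == 1)
                ch[i - 1] = ' ';
            cnt = 0;
        }
    }
    if (cnt == 1)  // a single newline at the very end
        ch[i - 1] = ' ';
}

// Hypothetical wrapper: copies the read-only text, then edits the copy.
std::string rep_copy(char const* text)
{
    std::string buffer(text);
    rep(&buffer[0]);  // &buffer[0] is a writable, null-terminated array
    return buffer;
}
```

With this wrapper, rep_copy(some_text) returns the converted text and leaves the literal untouched.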

Related

How to get the number of words from a line from a text file

I am trying to get the number of words from a line in a text file. I used .getline() in order to extract a line from the entire text file. The code is:
#include <iostream>
#include <iomanip>
#include <fstream>
int main()
{
    char const* filename = "duck.txt";
    std::ifstream ifs{ filename };
    constexpr size_t MAX_LINE_LEN{ 2048 };
    char line[MAX_LINE_LEN];
    int lineCount = 0;
    int totalWordCount = 0;
    int totalByteCount = 0;
    while (ifs.getline(line, MAX_LINE_LEN-1))
    {
        int lineWord = 0;
        char* q = &line[0];
        if (ifs.eof())
        {
            lineCount--;
            totalByteCount--;
        }
        while (*q != '\0')
        {
            q++;
            totalByteCount++;
        }
        totalByteCount++;
        if (*q == '\0')
        {
            lineCount++;
        }
        int i = 0;
        int j = 0;
        while (line[i] != '\0' && line[j] != '\0')
        {
            while (line[i] == ' ')
            {
                i++;
            }
            j = i;
            while (line[j] != ' ')
            {
                j++;
            }
            lineWord++;
            j = i;
        }
        totalWordCount += lineWord;
    }
    std::cout << "Total Lines: " << lineCount << '\n' << "Total Words: " << totalWordCount << '\n' << "Total Bytes: " << totalByteCount;
}
But the only important part is:
int i = 0;
int j = 0;
while (line[i] != '\0' && line[j] != '\0')
{
    while (line[i] == ' ')
    {
        i++;
    }
    j = i;
    while (line[j] != ' ')
    {
        j++;
    }
    lineWord++;
    j = i;
}
totalWordCount += lineWord;
I'm trying to read the line character by character until I reach a non-whitespace character, whose subscript I assign to i. Then I set j to the subscript of the first whitespace found after line[i]. If j finds a whitespace, there is a word. If j reaches '\0', the line has ended and I leave the while loop. When I compile and run this, the program displays nothing. What am I doing wrong? Also, I can't add any more header files.
In your original code:
int i = 0;
int j = 0;
while (line[i] != '\0' && line[j] != '\0')
{
    while (line[i] == ' ') // you do not check for the end-of-string character?
    {
        i++;
    }
    j = i; // here i = j = beginning of word.
           // since you rewind j to the beginning of the word below,
           // your program keeps repeating this loop endlessly
    while (line[j] != ' ') // this loop never tests for '\0', so it can run
                           // past the end of the line until it seg-faults
    {
        j++;
    }
    lineWord++;
    j = i; // BUG here! You're 'rewinding' j to the beginning of the word,
           // so you loop back and keep counting the same word over and over.
}
totalWordCount += lineWord;
What is the purpose of i and j? Wouldn't the code be simpler and easier to read and maintain using a single pointer, or a single index? This kind of algorithm is where a pointer excels, though, as the only pointer arithmetic needed is increment.
As in:
const char* p = &line[0];
int word_count = 0;
line[MAX_LINE_LEN - 1] = 0; // making sure the code below stays within boundaries.
for(;;)
{
    // skip to next word
    while (*p && *p == ' ') ++p; // stay within the string by testing for zero.
    if (!*p)
        break; // done!
    // since p now points to the beginning of a word, we've got one
    ++word_count;
    // skip to end of word
    while (*p && *p != ' ') ++p;
}
This is all fine for most cases, but there could be some unusual typos in the text, like "hello, world !", where the stray punctuation mark would be counted as a word. There is also the problem of horizontal tabs, which could also be erroneously counted as words.
To cover these cases, you should test for valid word characters instead of testing for spaces, which is a rather vague concept.
Without using library calls, you'd need to define what constitutes spaces and punctuation, either with a constant or with a function.
Substituting the test for space with a more targeted test, using isalnum() (declared in <cctype>) to check for alphabetic or numeric characters:
for(;;)
{
    // skip to next word
    while (*p && !std::isalnum(*p & 0xFF)) ++p;
    if (!*p)
        break; // done!
    ++word_count;
    // skip to end of word
    while (*p && std::isalnum(*p & 0xFF)) ++p;
}
Beware of the isalnum(), isalpha(), ... family of functions: they take their argument as an int whose value must be representable as an unsigned char (or be EOF). The mask ensures that characters in the 128-255 range are not sign-extended and are passed correctly as positive values.
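If library calls really are off the table, the character test can be written by hand, as suggested above. A minimal sketch, assuming plain ASCII text and that only letters and digits count as word characters; is_word_char and count_words are hypothetical names:

```cpp
// Hypothetical helper: true for ASCII letters and digits only.
bool is_word_char(char c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

// Counts maximal runs of word characters in a null-terminated string.
int count_words(const char* p)
{
    int word_count = 0;
    for (;;)
    {
        while (*p && !is_word_char(*p)) ++p;  // skip separators
        if (!*p)
            break;                            // end of string
        ++word_count;                         // p is at the start of a word
        while (*p && is_word_char(*p)) ++p;   // skip to the end of the word
    }
    return word_count;
}
```

With this definition, "hello, world !" counts as 2 words and tabs act as separators. An apostrophe is a separator too, so "don't" counts as two words; whether that is acceptable depends on how a word is defined for the task.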

Remove excess spaces from string in C++

I have written a program for removing excess spaces from a string.
#include <iostream>
#include <string>
void RemoveExcessSpaces(std::string &s) {
    for (int i = 0; i < s.length(); i++) {
        while (s[i] == ' ')s.erase(s.begin() + i);
        while (s[i] != ' ' && i < s.length())i++;
    }
    if (s[s.length() - 1] == ' ')s.pop_back();
}
int main() {
    std::string s(" this is string ");
    RemoveExcessSpaces(s);
    std::cout << "\"" << s << "\"";
    return 0;
}
One thing is not clear to me. The line while (s[i] == ' ')s.erase(s.begin() + i); should remove every space in the string, so I expected the output thisisstring, but I got the correct output, which is this is string.
Could you explain why the program didn't remove the one space between this and is, and why I got the correct output?
Note: I cannot use auxiliary strings.
That is because when your second while loop reaches the space between two words (for example between this and is), control passes to the increment part of your for loop, which increases i. After that, i points to the character following that space, so the space itself is never examined by the erase loop. That is why one space remains between this and is.
Your second while loop stops when s[i] == ' '. But then your for loop increments i, so that s[i] is skipped. This happens for the first space character after each word.
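For comparison, the same result can be reached without relying on the for loop's increment to skip a character. A minimal sketch using std::unique from <algorithm>, which works in place and so needs no auxiliary string; the function name remove_excess_spaces is an assumption, not taken from the question:

```cpp
#include <algorithm>
#include <string>

// Hypothetical alternative: collapses runs of spaces and trims both ends.
void remove_excess_spaces(std::string& s)
{
    // Keep only the first space of every run of consecutive spaces.
    auto new_end = std::unique(s.begin(), s.end(),
                               [](char a, char b) { return a == ' ' && b == ' '; });
    s.erase(new_end, s.end());

    // At most one space can remain at each end; drop it if present.
    if (!s.empty() && s.front() == ' ')
        s.erase(s.begin());
    if (!s.empty() && s.back() == ' ')
        s.pop_back();
}
```

Here the decision to keep one space per run is explicit in the predicate rather than a side effect of the loop increment.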

Fastest way to count words of string

How could I make this algorithm faster and shorten this code, which counts the words in a given string?
int number_of_words(std::string &s) {
    int count = 0;
    for (int i = 0; i < s.length(); i++) {
        // skip spaces
        while (s[i] == ' ' && i < s.length())
            i++;
        if (i == s.length())
            break;
        // word found
        count++;
        // inside word
        while (s[i] != ' ' && i < s.length())
            i++;
    }
    return count;
}
Your code is quite all right speed-wise. But if you want to make it shorter, you can use the standard find_first_not_of() and find_first_of() functions, as in the code below, which solves your task.
I assumed that all your words are separated only by spaces. If other separators are needed, you can pass something like " \r\n\t" instead of ' ' in both calls in my code.
One small optimization is possible in your code: after the first while loop we are on a non-space character, so we can add a ++i; line for free before the second loop. Similarly, after the second while loop we are on a space character, so we can add one more ++i; line after it. This gives a tiny speed gain by avoiding two extra checks inside the while loops.
#include <cstddef>
#include <iostream>
#include <string>
int number_of_words(std::string const & s) {
    int cnt = 0;
    std::size_t pos = 0;
    // Find the start of the next word; stop when only spaces (or nothing) remain.
    while ((pos = s.find_first_not_of(' ', pos)) != s.npos) {
        ++cnt;
        // Jump to the first space after the word (npos if the word ends the string).
        pos = s.find_first_of(' ', pos);
    }
    return cnt;
}
int main() {
    std::cout << number_of_words(" abc def ghi ") << std::endl;
}
Output:
3
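The small optimization described above, applied to the hand-written loop, can be sketched as follows. This illustrates the idea rather than a measured speed-up; number_of_words_fewer_checks is a hypothetical name, and words are assumed to be separated by spaces only:

```cpp
#include <cstddef>
#include <string>

// Hypothetical variant of the question's loop with the two "free" increments.
int number_of_words_fewer_checks(const std::string& s)
{
    int count = 0;
    const std::size_t len = s.length();
    std::size_t i = 0;
    while (i < len)
    {
        while (i < len && s[i] == ' ')  // skip spaces
            ++i;
        if (i == len)
            break;
        ++count;  // s[i] is known to be a non-space character here,
        ++i;      // so step past it without testing it again
        while (i < len && s[i] != ' ')  // rest of the word
            ++i;
        if (i < len)  // s[i] is known to be a space here,
            ++i;      // so step past it without testing it again
    }
    return count;
}
```

Each word then saves two character comparisons, which is why the gain is described as tiny.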

Remove extra whitespace from c++ issue

I found this code snippet online.
void ShortenSpace(string &s)
{
    // n is the length of the original string
    int n = s.length();
    // i tracks the next write position; j traverses the string
    int i = 0, j = -1;
    // flag that is set to true when a space is found
    bool spaceFound = false;
    // Handles leading spaces
    while (++j < n && s[j] == ' ');
    // read all characters of the original string
    while (j < n)
    {
        // if the current character is a non-space
        if (s[j] != ' ')
        {
            // if a space precedes '.', ',' or '?', overwrite that space
            if ((s[j] == '.' || s[j] == ',' ||
                 s[j] == '?') && i - 1 >= 0 &&
                 s[i - 1] == ' ')
                s[i - 1] = s[j++];
            else
                // copy the current character to index i
                // and increment both i and j
                s[i++] = s[j++];
            // set the space flag to false when any
            // non-space character is found
            spaceFound = false;
        }
        // if the current character is a space
        else if (s[j++] == ' ')
        {
            // if a space is seen for the first time after a word
            if (!spaceFound)
            {
                s[i++] = ' ';
                spaceFound = true;
            }
        }
    }
    // Remove trailing spaces
    if (i <= 1)
        s.erase(s.begin() + i, s.end());
    else
        s.erase(s.begin() + i - 1, s.end());
}
The problem is that if the input is "test (multiple spaces) test (multiple spaces) test.", it removes the last period and produces output like "test test test".
It removes the whitespace correctly, but it is somehow mishandling the punctuation, which I do not want removed. I'm still a beginner in C++, so I am having a hard time figuring out why.
Because it indiscriminately deletes the last character.
The last conditional should check if the last character is a white space as well:
// Trim string to result
if (i <= 1 || s[i-1] != ' ')
    s.erase(s.begin() + i, s.end());
else
    s.erase(s.begin() + i - 1, s.end());
I corrected the comment as well, since this step does not trim trailing white space so much as the leftover characters that remain after the manipulation. The algorithm compacts the string in place, so whatever lies past index i is stale and has to be erased. If you were to leave out this last conditional, the output would be:
test test test. test. for input test test test.
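To see the corrected final step in isolation, here is a minimal sketch. trim_to_result is a hypothetical helper, and it assumes that i is the write index left by the compaction loop (one past the last character written, with i <= s.size()):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the corrected conditional above.
// Precondition (assumed): i <= s.size().
void trim_to_result(std::string& s, std::size_t i)
{
    if (i <= 1 || s[i - 1] != ' ')
        s.erase(s.begin() + i, s.end());      // last written char is kept
    else
        s.erase(s.begin() + i - 1, s.end());  // also drop one trailing space
}
```

For example, with s holding "a b.b." and i equal to 4, the string becomes "a b."; the period survives because s[i - 1] is not a space.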

Remove spaces from string not taking effect

I'm trying to remove all characters and spaces except letters. But the "erase spaces" part doesn't take effect, it will only take effect if I comment out the remove characters part.
for (int i = 0; i < s.size(); i++)
{
    if (!(s[i] >= 'a' && s[i] <= 'z' || s[i] >= 'A' && s[i] <= 'Z'))
    {
        s[i] = '\0';
    }
}
s.erase(remove(s.begin(), s.end(), ' '), s.end());
You're replacing all the non-alphabetic characters with NULs, then removing all the spaces. Since NULs are not spaces, this latter step does nothing. If you change the assignment in the loop to
s[i] = ' ';
you would instead replace them with spaces, which would then be removed by the erase(remove(...)) call.
If you want to make the code more readable, you could replace the complex if with
if (!isalpha(s[i]))
or you could even replace the whole thing with
s.erase(remove_if(s.begin(), s.end(), [](char ch){ return !isalpha(ch); }), s.end());
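Put together as a compilable sketch, with the headers those calls need. The function name keep_letters is made up for this example; the lambda takes unsigned char because passing a negative char value to std::isalpha is undefined behavior:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of s with every non-letter removed.
std::string keep_letters(std::string s)
{
    // remove_if moves the letters to the front and returns the new logical
    // end; erase then drops everything from there to the real end.
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char ch) { return !std::isalpha(ch); }),
            s.end());
    return s;
}
```

In the default "C" locale, keep_letters("Hello, World 42") yields "HelloWorld": spaces, punctuation and digits are removed in a single pass, with no intermediate replacement step.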
So you replaced the characters you don't want with '\0'.
Then you removed all ' ' characters.
That last stage presumably should involve '\0'…
For the benefit of future readers: in C++20, we have unified erasure, so we can simply use
std::erase_if(s, [](unsigned char c) { return !std::isalpha(c); });
(See "Do I need to cast to unsigned char before calling toupper(), tolower(), et al.?" for why unsigned char should be used.)