How can I remove a newline from inside a string in C++? - c++

I am trying to take text input from the user and compare it to a list of values in a text file. The values are these:
That line at the end is the cursor, not a straight line, but it doesn't matter. Anyway, I sort by word and produce the values, then check the values. A semicolon is the separator between words. All the data is basic, just to get the code working first. The important thing is that all the pieces of data have newlines after them. No matter what I try, I can't get rid of the newlines completely. Looking at the ASCII values shows why: my efforts remove only the newline, but not the carriage return. This is fine most of the time, but when comparing values they won't be equal, because the one with the carriage return is treated as longer. Here are the important parts of the code:
int pos = 0;
while (pos != std::string::npos)
{
std::string look = lookContents.substr(pos+1, lookContents.find("\n", pos + 1) - pos);
//look.erase(std::remove(look.begin(), look.end(), '\n'), look.end());
//##
for (int i = 0; i < look.length(); i++)
{
std::cout << (int)(look[i]) << " ";
}
std::cout << std::endl;
std::cout << look << ", " << words[1] << std::endl;
std::cout << look.compare(0,3,words[1]) << std::endl;
std::cout << pos << std::endl;
//##
//std::cout << look << std::endl;
if (look == words[1])
{
std::cout << pos << std::endl;
break;
}
pos = lookContents.find("\n", pos + 1);
}
Everything between the //## markers is just error checking. Here's what it outputs when I type look b:2
As you can see, the values have the ASCII codes 13 and 10 at the end, which are what is used to create newlines. 13 is carriage return and 10 is newline. The last one has its 10 removed earlier in the code so the code doesn't do an extra loop on an empty substring. My efforts to remove the newline, including the commented out erase function, either only remove the 13, or remove both the 10 and 13 but corrupt later data like this:
Also, you can see that using cout to print look and words[1] at the same time causes look to just not appear for some reason. Printing it by itself works fine though. I realise I could fix this by just using that compare function in the code to check all but the last characters, but this feels like a temporary fix. Any solutions?

My efforts remove only the new line, but not the carriage return
The newline and carriage return are both control characters.
To remove all the control characters from the string, you can use std::remove_if along with std::iscntrl:
#include <cctype>
#include <algorithm>
//...
lookContents.erase(std::remove_if(lookContents.begin(), lookContents.end(),
[&](char ch)
{ return std::iscntrl(static_cast<unsigned char>(ch));}),
lookContents.end());
Once you have all the control characters removed, then you can process the string without having to check for them.
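One caveat worth noting: erasing every control character from lookContents also removes the '\n' characters that the loop in the question uses as separators. If the line breaks are still needed, a possible alternative is to split first and strip only a trailing '\r' from each line. The following is a minimal sketch under that assumption; splitLines is a hypothetical helper name, not something from the question.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text into lines. std::getline consumes the
// '\n', and any trailing '\r' left over from a "\r\n" ending is dropped.
std::vector<std::string> splitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back(); // remove the carriage return only
        }
        lines.push_back(line);
    }
    return lines;
}
```

Each element can then be compared with == against the user's word without the hidden carriage return making it one character longer.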

Related

Text file as input in C++ program will not work unless the text is copy and pasted

I have a very strange bug in my code that is a little hard to explain. Let me begin with what the program does: basically, the C++ program takes input text (from a file named "input.txt" in the same directory) and uses Markov Chains to generate some artificial output text that resembles the style of the input text and prints it to the terminal.
It works when I copy and paste the text of 'Alice in Wonderland' (http://paulo-jorente.de/text/alice_oz.txt) directly into "input.txt", but if I add any words or characters to the beginning or end of the contents of the text file, then the code stops running (or runs infinitely). However, this does not happen if I add text anywhere in the middle of the contents of the text file.
If you would like to test it yourself, try running the code with Alice in Wonderland copied into "input.txt". Then, after it runs successfully, go to input.txt and type some random characters or words after the last of the text from 'Alice' ("...home again!") and try to run it again; it will fail.
Here is the code:
#include <ctime>
#include <iostream>
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>
#include <map>
using namespace std;
class markovTweet{
string fileText;
map<string, vector<string> > dictionary;
public:
void create(unsigned int keyLength, unsigned int words) {
ifstream f("input.txt");
if(f.good()){
fileText.assign((istreambuf_iterator<char>(f)), istreambuf_iterator<char>());
}else{
cout << "File cannot be read. Ensure there is a file called input.txt in this directory." << "\n" << endl;
return;
}
if(fileText.length() < 1){
return;
}
cout << "\n" << "file imported" << "\n";
createDictionary(keyLength);
cout << "\n" << "createDictionary" << "\n" << "\n";
createText(words - keyLength);
cout << "\n" << "text created, done" << endl;
}
private:
void createText(int w) {
string key, first, second;
size_t next;
map<string, vector<string> >::iterator it = dictionary.begin();
advance( it, rand() % dictionary.size() );
key = (*it).first;
cout << key;
while(true) {
vector<string> d = dictionary[key];
if(d.size() < 1) break;
second = d[rand() % d.size()];
if(second.length() < 1) break;
cout << " " << second;
if(--w < 0) break;
next = key.find_first_of( 32, 0 );
first = key.substr( next + 1 );
key = first + " " + second;
}
cout << "\n";
}
void createDictionary(unsigned int kl) {
string w1, key;
size_t wc = 0, pos, next;
next = fileText.find_first_not_of( 32, 0 );
if(next == string::npos) return;
while(wc < kl) {
pos = fileText.find_first_of(' ', next);
w1 = fileText.substr(next, pos - next);
key += w1 + " ";
next = fileText.find_first_not_of(32, pos + 1);
if(next == string::npos) return;
wc++;
}
key = key.substr(0, key.size() - 1);
while(true) {
next = fileText.find_first_not_of(32, pos + 1);
if(next == string::npos) return;
pos = fileText.find_first_of(32, next);
w1 = fileText.substr(next, pos - next);
if(w1.size() < 1) break;
if(find( dictionary[key].begin(), dictionary[key].end(), w1) == dictionary[key].end() )
dictionary[key].push_back(w1);
key = key.substr(key.find_first_of(32) + 1) + " " + w1;
}
}
};
int main() {
markovTweet t;
cout << "\n" << "Artificially generated tweet using Markov Chains based off of input.txt: " << "\n" << "\n";
//lower first number is more random sounding text, second number is how long output is.
t.create(4, 30);
return 0;
}
This is a very strange bug and any help that you can offer is much appreciated! Thanks!
This might be something to think about regarding std::map's time complexity for its operator[]().
Using operator[] : “[]” can also be used to insert elements in map. Similar to above functions and returns the pointer to the newly constructed element. Difference is that this operator always constructs a new element i.e even if a value is not mapped to key, default constructor is called and assigns a “null” or “empty” value to the key. Size of map is always increased by 1.
Time complexity : log(n) where n is size of map
courtesy from: geeksforgeeks
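As a quick check on the quoted description, here is a small sketch (the function name is illustrative). As far as the standard specifies it, operator[] returns a reference to the mapped value and inserts a default-constructed value only when the key is not already present, so a repeated lookup of an existing key does not grow the map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Returns the map size after operator[] is used twice with the same key.
std::size_t sizeAfterTwoLookups()
{
    std::map<std::string, std::vector<std::string> > dictionary;
    dictionary["a b"].push_back("c"); // key absent: an empty vector is inserted first
    dictionary["a b"].push_back("d"); // key present: the existing vector is reused
    return dictionary.size();         // one key, holding two words
}
```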
In your class's createDictionary() function try adding this line of code in the 2nd while loop:
{
//...code
if (find(dictionary[key].begin(), dictionary[key].end(), w1) == dictionary[key].end()) {
dictionary[key].push_back(w1);
std::cout << dictionary.size() << std::endl;
//code...
}
When I copied the text from the file it was generating 62037 entries into your dictionary or hashmap. It takes roughly 20 - 30 seconds to run and finish.
When I added the text " Good Bye! " to the end of the file, saved it and ran the program/debugger it generated 62039 entries. Again it took about 20-30 seconds to run.
Then I added the text "Hello World " to the beginning of the file, saved it and ran the program/debugger and it generated 62041 entries. Again it took about 20-30 seconds to run.
However, there were a couple of times during this process, that it generated that many entries into your map, but the code was still going through the loop... The one time it was around 620xx - 640xx. I don't know what was causing it to generate that many keys... but like I said, there were a couple of times that it quit printing the values, but was still iterating through the same while loop, yet the size of the map wasn't increasing...
This happened the first time that I entered the text at the beginning of the file after trying it with the appended text at the end. This is when I decided to print out the size of your map and noticed that I was getting this infinite loop... Then I stopped the debugger went back to the text file and kept the inserted text at the beginning, but deleted the appended text at the end making sure to leave a single space at the end of the text.
This time when I ran the program/debugger, It worked correctly and it generated 62039 entries. Again it took about 20-30 seconds to run. After, the first successful run with the inserted text at the beginning is when I added the text at the end, and it ran fine. I then even tried to have "Hello World!" followed by a newline by using enter into the text file and having "Good Bye!" preceded by one as well and it still worked fine.
Yes, there is something causing a bug, but I don't know exactly what is causing it. However, I believe I have traced it to this while loop and its exit conditions. It should have broken out of this loop and gone into the createText function, but it never did; the conditions
if (next == std::string::npos) return;
and
if (w1.size() < 1) break;
somehow were not being met.
The time complexity is okay, however, it's not the best but it's also not the worst as there are approximately 62-63k entries running in O(log n) time. This also doesn't include counting the space complexity which does need to be taken into consideration.
It could be that during one run you might be getting stack-overflow which is causing the infinite loop and the next time you run it, it might not. I don't think it has anything to do with adding in text into the text file directly except that it will increase the size of your map in O(log N) time and increase the space complexity as well.
Regardless of what you add to this text file, the way your program is written it reads the entire contents of the file, character by character through istreambuf_iterator, into a single string, fileText. After this string is constructed, it holds approximately 336940 characters.
Hopefully, this information can guide you in narrowing down where the bug is in your program and determining what is actually causing it. It truly is hard to narrow down this culprit.
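One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the text does not end with a space, find_first_of returns std::string::npos for the last word, and the following pos + 1 wraps around to 0, so the search starts again from the beginning of the text and neither exit condition is ever reached. That would also fit the observation that the map stopped growing while the loop kept running. The sketch below mirrors only the search pattern of the second loop in createDictionary(); scanWrapsToStart is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true when the scan would restart from index 0 instead of ending,
// i.e. when the last word is not followed by a space.
bool scanWrapsToStart(const std::string& text)
{
    std::size_t pos = text.find_first_of(' ', 0);
    while (true) {
        std::size_t next = text.find_first_not_of(' ', pos + 1);
        if (next == std::string::npos) return false; // clean exit, as intended
        pos = text.find_first_of(' ', next);
        if (pos == std::string::npos) {
            // npos is the largest size_t value, so pos + 1 wraps to 0
            return pos + 1 == 0;
        }
    }
}
```

If this is indeed the cause, checking pos against std::string::npos before the next iteration (and stopping there) would be one way to end the loop.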

Storing data in char array causing corruption around variable

I am working on a C++ project and I am having an issue.
Below is my code
tempfingerprint = libssh2_hostkey_hash(session, LIBSSH2_HOSTKEY_TYPE_RSA);
char temp[48];
memset(temp, 0, sizeof(temp));
for (i = 0; i < 16; i++)
{
//fingerprintstream << (unsigned char)tempfingerprint[i] << ":";
if (temp[0] == 0)
{
sprintf(temp, "%02X:", (unsigned char)tempfingerprint[i]);
}
else
{
//sprintf(temp, "%s:%02X", temp, (unsigned char)tempfingerprint[i]);
char characters[3];
memset(characters, 0, sizeof(characters));
//If less than 16, then add the colon (:) to the end otherwise don't bother as we're at the end of the fingerprint
sprintf(characters, "%02X:", (unsigned char)tempfingerprint[i]);
strcat(temp, characters);
}
}
//Remove the end colon as its not needed. 48 Will already be null terminated, so the previous will contain the last colon
temp[47] = 0;
return string(temp);
When I run my app, I get the following error from visual studio
Run-Time-Check Failure #2 - Stack around the variable 'temp' was corrupted.
I've run the same code on Linux through Valgrind and no errors were shown, so I'm not sure what the problem is with Windows.
Here's an approach based on what Paul McKenzie is talking about (though he might implement it differently) and on what it looks like you were trying to do with the stream:
#include <iostream>
#include <sstream>
#include <iomanip> // output format modifiers
using namespace std;
int main()
{
stringstream fingerprintstream;
// set up the stream to print uppercase hex with 0 padding if required
fingerprintstream << hex << uppercase << setfill('0');
// print out the first value without a ':'
fingerprintstream << setw(2) << 0;
for (int i = 1; i < 16; i++) // starting at 1 because first has already been handled.
{
// print out the rest prepending the ':'
fingerprintstream << ":" << setw(2) << i;
}
// print results
std::cout << fingerprintstream.str();
return 0;
}
Output:
00:01:02:03:04:05:06:07:08:09:0A:0B:0C:0D:0E:0F
Just realized what I think OP ran up against with the garbage output. When you output a number, << will use the appropriate conversion to get text, but if you output a character << prints the character. So fingerprintstream << (unsigned char)tempfingerprint[i]; takes the binary value at tempfingerprint[i] and, thanks to the cast, tries to render it as a character. Rather than "97", you will get (assuming ASCII) "a". A large amount of what you try to print will give nonsense characters.
Example: If I change
fingerprintstream << ":" << setw(2) << i;
to
fingerprintstream << ":" << setw(2) << (unsigned char)i;
the output becomes
0?:0?:0?:0?:0?:0?:0?:0?:0?:0?:0 :0
:0?:0?:0
:0?:0?
Note the tab and the line feeds.
I need to know the definition of tempfingerprint to be sure, but you can probably solve the garbage output problem by removing the cast.
Based on new information, tempfingerprint is const char *, so tempfingerprint[i] is a char and will be printed as a character.
We want a number, so we have to force the sucker to be an integer.
static_cast<unsigned int>(tempfingerprint[i]&0xFF)
the &0xFF masks out everything but the last byte, eliminating sign extension of negative numbers into huge positive numbers when displayed unsigned.
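Putting the stream setup and the masking cast together, a minimal sketch might look like the following. formatFingerprint is a hypothetical helper, and the caller is assumed to know the buffer length (16 bytes for the hash in the question's loop).

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats `length` raw bytes as uppercase hex pairs
// separated by ':', for example "01:AB:FF".
std::string formatFingerprint(const char* bytes, std::size_t length)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < length; ++i) {
        if (i != 0) out << ':'; // separator goes before every byte but the first
        // Mask to one byte and widen to an integer so << prints a number,
        // not a character.
        out << std::setw(2) << static_cast<unsigned int>(bytes[i] & 0xFF);
    }
    return out.str();
}
```

Because the separator is written before each byte rather than after it, there is no trailing colon to remove and no fixed-size buffer to overflow.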
There are, as far as I see, two issues in the code which lead to exceeding array boundaries:
First, with char temp[48] you reserve exactly 48 characters for storing the result. However, after strcat(temp, characters) is called for the 16th value, temp holds 16*3 digits and colons plus one terminating '\0' character, i.e. 49 characters (not 48). Note that strcat automatically appends the string terminator.
Second, you define char characters[3], which reserves room for two digits and the colon but not for the terminating '\0' character. Hence, sprintf(characters, "%02X:", ...) will exceed the bounds of characters, as sprintf also appends the string terminator.
So, if you do not want to rewrite your code in general, changing your definitions to char temp[49] and char characters[4] will solve the problem.

Why do I obtain this strange character?

Why does my C++ program create the strange character shown below in the pictures? The picture on the left with the black background is from the terminal. The picture on the right with the white background is from the output file. Before, it was a "\v" now it changes to some sort of astrological symbol or symbol to denote males. 0_o This makes no sense to me. What am I missing? How can I have my program output just a backslash v?
Please see my code below:
// SplitActivitiesFoo.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <iostream>
#include <vector>
#include <fstream>
using namespace std;
int main()
{
string s = "foo:bar-this-is-more_text#\venus \"some more text here to read.\"";
vector<string> first_part;
fstream outfile;
outfile.open("out.foobar");
for (int i = 0; i < s.size(); ++i){
cout << "s[" << i << "]: " << s[i] << endl;
outfile << s[i] << endl;
}
return 0;
}
Also, assume that I do not want to modify my string 's' in this case. I want to be able to parse each character of the string and work around the strange character somehow. This is because in the actual program the string will be read in from a file, parsed, and then sent to another function. I guess I could figure out a way to programmatically add backslashes...
How can I have my program output just a backslash v?
If you want a backslash, then you need to escape it: "#\\venus".
This is required because a backslash denotes that the next character should be interpreted as something special (note that you were already using this when you wanted double-quotes). So the compiler has no way of knowing you actually wanted a backslash unless you tell it.
A literal backslash character therefore has the syntax \\. This is the case in both string literals ("\\") and character literals ('\\').
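Another option, assuming a C++11 or later compiler, is a raw string literal, in which backslashes and double quotes are taken literally. A minimal sketch (rawExample is just an illustrative name):

```cpp
#include <cassert>
#include <string>

// Returns text containing a literal backslash followed by 'v'.
// Inside R"( ... )" no escape sequences are processed.
std::string rawExample()
{
    return R"(#\venus "quoted")";
}
```

This produces the same characters as the escaped form "#\\venus \"quoted\"".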
Why does my C++ program create the strange character shown below in the picture?
Your string contains the \v control character (vertical tab), and the way it's displayed is dependent on your terminal and font. It looks like your terminal is using symbols from the traditional MSDOS code page.
I found an image for you here, which shows exactly that symbol for the vertical tab (vt) entry at value 11 (0x0b):
Also, assume that I do not want to modify my string 's' in this case. I want to be able to parse each character of the string and work around the strange character somehow.
Well, I just saw you add the above part to your question. Now you're in difficult territory. Because your string literal does not actually contain the character v or any backslashes. It only appears that way in code. As already said, the compiler has interpreted those characters and substituted them for you.
If you insist on printing v instead of a vertical tab for some crazy reason that is hopefully not related to an XY Problem, then you can construct a lookup-table for every character and then replace undesirables with something else:
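char lookup[256];
std::iota( lookup, lookup + 256, 0 ); // Using iota from <numeric>
lookup['\v'] = 'v';
for (int i = 0; i < s.size(); ++i)
{
// Index through unsigned char so a negative char value cannot go out of bounds
const char c = lookup[static_cast<unsigned char>(s[i])];
cout << "s[" << i << "]: " << c << endl;
outfile << c << endl;
}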
Now, this won't print the backslashes. To undo the string further check out std::iscntrl. It's locale-dependent, but you could utilise it. Or just something naive like:
const char *lookup[256] = { 0 };
lookup['\f'] = "\\f";
lookup['\n'] = "\\n";
lookup['\r'] = "\\r";
lookup['\t'] = "\\t";
lookup['\v'] = "\\v";
lookup['\"'] = "\\\"";
// Maybe add other controls such as 0x0E => "\\x0e" ...
for (int i = 0; i < s.size(); ++i)
{
// Index through unsigned char so a negative char value cannot go out of bounds
const char * x = lookup[static_cast<unsigned char>(s[i])];
if( x ) {
cout << "s[" << i << "]: " << x << endl;
outfile << x << endl;
} else {
cout << "s[" << i << "]: " << s[i] << endl;
outfile << s[i] << endl;
}
}
Be aware there is no way to correctly reconstruct the escaped string as it originally appeared in code, because there are multiple ways to escape characters. Including ordinary characters.
Most likely the terminal that you are using cannot render the vertical tab code "\v", and so prints something else. On my terminal it prints:
foo:bar-this-is-more_text#
enus "some more text here to read."
To print the "\v", change your code to:
string s = "foo:bar-this-is-more_text#\\venus \"some more text here to read.\"";
What am I missing? How can I have my program output just a backslash v?
You are escaping the letter v. To print backslash and v, escape the backslash.
That is, print double backslash and a v.
\\v

cout partially print without endl

I'm printing a bunch of strings as follows:
cout<<count<<"|"<<newTime.time<<"|"<<newCat<<"|"<<newCon<<endl;
in which count is a counter, newTime.time is a string of time, and newCat and newCon are both strings.
The output looks like the following:
06:02:11:20:08|DB Mgr|Sending query: “SELECT * FROM users”
Apparently, it left out the count and "|". However, if I change the code into
cout<<count<<"|"<<endl;
cout<<newTime.time<<"|"<<newCat<<"|"<<newCon<<endl;
the output turns into
2|
06:02:11:20:08|DB Mgr|Sending query: “SELECT * FROM users”
At first I wondered whether this was a buffering problem. I changed endl to flush, but the problem still exists.
Thanks for any help.
It sounds like your time string may have a carriage return \r in it. If that's the case, then outputting using your first method will still output the count and separator, but the \r will return to the start of the line and begin overwriting it.
Your second method will not overwrite the count since it's on the previous line (a \r will have little visible effect if you're already at the start of the line).
If you're running on a UNIX-like platform, you can pipe the output through something like od -xcb (a hex dump filter) to see if there is a \r in the output.
Alternatively, if you have a string in your code, you can see if it contains a carriage return with something like:
std::string s = "whatever";
size_t pos = s.find ('\r');
if (pos != std::string::npos) {
// carriage return was found.
}
By way of example, the following program:
#include <iostream>
#include <string>
int main (void) {
std::string s1 = "strA";
std::string s2 = "\rstrB";
std::string s3 = "strC";
std::cout << s1 << '|' << s2 << '|' << s3 << '\n';
std::cout << "=====\n";
std::cout << s1 << '|' << '\n';
std::cout << s2 << '|' << s3 << '\n';
std::cout << "=====\n";
size_t pos = s2.find ('\r');
if (pos != std::string::npos)
std::cout << "CR found at " << pos << '\n';
return 0;
}
seems to output the following:
strB|strC
=====
strA|
strB|strC
=====
CR found at 0
but in fact that first line is actually:
strA|(\r)strB|strC
where (\r) is the carriage return.
And keep in mind you rarely need endl - it's effectively a \n with a flush which is not really necessary in most cases. You can just get away with using \n and let the automated flushing take care of itself.
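If a stray \r does turn out to be the cause, one way to deal with it is to strip carriage returns from the string before printing. This is a minimal sketch, assuming the time string should never legitimately contain a carriage return; withoutCarriageReturns is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: returns a copy of s with every '\r' removed
// (the erase-remove idiom).
std::string withoutCarriageReturns(std::string s)
{
    s.erase(std::remove(s.begin(), s.end(), '\r'), s.end());
    return s;
}
```

Printing withoutCarriageReturns(newTime.time) in place of newTime.time would then keep the count and separator visible on the same line.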

How to use wildcard for strings (matching and replacing)?

I want to search a string in C++ for a pattern in which ? stands for any single letter.
Think of a word like abcdefgh. I want an algorithm that, given the input ?c, finds bc (with ? matching any letter), and that, given ?e?, finds def.
Do you have any ideas?
How about using boost::regex? Or std::regex, if you're using a C++11-enabled compiler.
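A rough sketch of the std::regex route, assuming ? is the only wildcard and everything else should match literally. findWildcard is a hypothetical helper; it translates the pattern into an ECMAScript regular expression in which ? becomes '.' (which matches any single character except a line terminator).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` that matches
// `pattern`, where '?' stands for any single character, or "" if none does.
std::string findWildcard(const std::string& pattern, const std::string& text)
{
    static const std::string special = R"(^$\.*+?()[]{}|)";
    std::string expr;
    for (char c : pattern) {
        if (c == '?') {
            expr += '.';   // wildcard: any one character
        } else {
            if (special.find(c) != std::string::npos)
                expr += '\\'; // escape regex metacharacters
            expr += c;     // everything else matches literally
        }
    }
    std::smatch m;
    return std::regex_search(text, m, std::regex(expr)) ? m.str() : std::string();
}
```

With the word from the question, findWildcard("?c", "abcdefgh") yields bc and findWildcard("?e?", "abcdefgh") yields def.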
If you just want to support ?, that's pretty easy: when you encounter a ? in the pattern, just skip ahead over one byte of input (or check for isalpha, if you really meant you only want to match letters).
Edit: Assuming the more complex problem (finding a match starting at any position in the input string), you could use code something like this:
#include <string>
size_t match(std::string const &pat, std::string const &target) {
if (pat.size() > target.size())
return std::string::npos;
size_t max = target.size()-pat.size()+1;
for (size_t start =0; start < max; ++start) {
size_t pos;
for (pos=0; pos < pat.size(); ++pos)
if (pat[pos] != '?' && pat[pos] != target[start+pos])
break;
if (pos == pat.size())
return start;
}
return std::string::npos;
}
#ifdef TEST
#include <iostream>
int main() {
std::cout << match("??cd?", "aaaacdxyz") << "\n";
std::cout << match("?bc", "abc") << "\n";
std::cout << match("ab?", "abc") << "\n";
std::cout << match("ab?", "xabc") << "\n";
std::cout << match("?cd?", "cdx") << "\n";
std::cout << match("??cd?", "aaaacd") << "\n";
std::cout << match("??????", "abc") << "\n";
return 0;
}
#endif
If you only want to signal a yes/no based on whether the whole pattern matches the whole input, you do pretty much the same thing, but with the initial test for != instead of >, and then basically remove the outer loop.
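That whole-input variant could be sketched as follows (matchesWhole is an illustrative name; the comparison logic is the same as in match above, without the outer loop):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true only if the whole pattern matches the whole target,
// with '?' in the pattern standing for any single character.
bool matchesWhole(const std::string& pat, const std::string& target)
{
    if (pat.size() != target.size())
        return false; // lengths must agree for a whole-string match
    for (std::size_t pos = 0; pos < pat.size(); ++pos)
        if (pat[pos] != '?' && pat[pos] != target[pos])
            return false;
    return true;
}
```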
Or, if you insist on "wildcards" in the form you show, the term to search for is "glob" (at least on Unix-like systems).
The C-centric API is found in glob.h on Unix-like systems and consists of two calls, glob and globfree, documented in section 3 of the manual.
Switching to full regular expressions will allow you to use a more C++ approach, as shown in the other answers.