C++ substr() problems when string contains special characters - c++

I'm trying to split a c++ string into a number of substrings (NUM_LINES) each with the length of CHAR_PER_LINE.
for(int i = 0; i < NUM_LINES; i++) {
lines[i] = totalstring.substr(i*CHAR_PER_LINE,CHAR_PER_LINE);
}
Works fine as long as there's no special character in the string. Otherwise substr() gets me a string that isn't CHAR_PER_LINE characters long, but stops right before a special character and exits the loop.
Any hints?
ok, edit:
1) I'm definitely not reaching the end of my string. If my totalstring.length() is 1000 and I have a special character in the first line (that is the first CHAR_PER_LINE (30) chars of the string) the loop exits.
2) Special characters I had problems with are for instance 'ö' and '–' (the long one)
EDIT 2:
std::string text = "aaaabbbbccccdödd";
std::string line[4];
for(int i = 0; i < 4; i++)
line[i] = text.substr(i*4,4);
for(int i = 0; i < 4; i++)
std::cout << line[i] << "\n";
This example works. I get a '%' for the ö.
So the problem wasn't substr(). Sorry. I'm using Cairo to create a gui and it seems my Cairo output is causing the troubles, not substr().

How about a hint of what special characters you're talking about?
My guess is that you reached the end of the string.

The STL doesn't care about special characters. If there are multibyte sequences (e.g. UTF-8), std::string treats them as a sequence of single-byte characters. If you need proper Unicode handling, do not rely on the built-in substr or length.
You can, however, use std::wstring (from your posting it isn't clear whether you're already using it, but I guess not) - it holds wchar_t characters - large enough for the native character set of your target platform.
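To illustrate the byte-versus-character distinction, here is a minimal sketch that counts UTF-8 code points by skipping continuation bytes. The function name countCodePoints is made up for this example, and it assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points in a std::string.
// Continuation bytes have the bit pattern 10xxxxxx and are not counted.
// Assumes valid UTF-8; no validation is attempted.
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 string "dödd", size() reports 5 bytes while this function reports 4 code points, which is why a substr boundary can fall in the middle of the two-byte 'ö'.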

What's happening is that you're running off the end of the string on the last line. The loop isn't exiting after skipping characters. It exits precisely when it should, and the last line contains the right number of characters; it's just that some of them are garbage, so your diagnostic printout shows the line as short.
The only way the loop could be exited early is if an exception were thrown.
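For reference, substr(pos, count) clamps count to the remaining length and throws std::out_of_range only when pos is greater than size(). A small sketch of a guarded version of the question's loop body follows; the helper name lineAt is an assumption made for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns line number `index` of width `width`,
// or an empty string when the line would start past the end of `text`.
std::string lineAt(const std::string& text, std::size_t index, std::size_t width)
{
    const std::size_t pos = index * width;
    if (pos > text.size()) {
        return std::string();       // substr would throw std::out_of_range here
    }
    return text.substr(pos, width); // count is clamped to the remaining length
}
```

With a six-character string and a width of 4, line 1 comes back as the two remaining characters rather than four, and any later line comes back empty instead of throwing.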

Related

C++, how to remove char from a string

I have to remove some chars from a string, but I have some problems. I found this piece of code online, but it does not work well: it removes the chars, but it also removes the white space.
string messaggio = "{Questo e' un messaggio} ";
char chars[] = {'Ì', '\x1','\"','{', '}',':'};
for (unsigned int i = 0; i < strlen(chars); ++i)
{
messaggio.erase(remove(messaggio.begin(), messaggio.end(), chars[i]), messaggio.end());
}
Can someone tell me how this part of code works and why it even removes the white spaces?
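Because you use strlen on your chars array. This function stops ONLY when it encounters a \0, and you inserted none... So you're reading memory past the end of your array, which is bad: whatever bytes happen to follow the array (possibly a space) get removed too, and it could even provoke a SEGFAULT.
Use sizeof(chars) for the loop bound instead. Keep the erase call, though: std::remove only shifts the kept characters to the front and returns the new logical end; it does not shorten the string.
A correction could be:
char chars[] = {'I', '\x1','\"','{', '}',':'};
for (unsigned int i = 0; i < sizeof(chars); ++i)
{
messaggio.erase(std::remove(messaggio.begin(), messaggio.end(), chars[i]), messaggio.end());
}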
Wisblade's answer is more or less correct; it just lacks some technical details.
As mentioned, strlen searches for the terminating character '\0'.
Since chars does not contain such a character, this code invokes undefined behavior (a read past the end of the buffer).
Undefined behavior means anything can happen: the code may work, may crash, or may give invalid results.
So the first step is to drop strlen and use a different means to get the size of the array.
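One way to avoid computing the array size at all is a range-based for loop, which visits exactly the elements of the array. This is only a sketch: the function name stripChars is made up, and the character list is kept ASCII-only on purpose.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper; removes every listed character from the message.
std::string stripChars(std::string message)
{
    const char chars[] = {'\x1', '"', '{', '}', ':'};
    for (char c : chars) {  // visits exactly the 5 elements, no terminator needed
        // std::remove compacts the kept characters; erase trims the leftover tail.
        message.erase(std::remove(message.begin(), message.end(), c), message.end());
    }
    return message;
}
```

Called on the message from the question, this keeps the spaces and strips only the braces.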
There is also another problem. Your code uses a non-ASCII character: 'Ì'.
I assume that you are using Windows and Visual Studio. By default the MSVC compiler assumes that the file is encoded using your system locale and uses the same locale to generate the executable. Windows by default uses a single-byte encoding specific to your language (to be compatible with very old software). Only in that case does your code have a chance to work. On platforms or configurations with a multibyte encoding, like UTF-8, this code can't work even after Wisblade's fix.
Wisblade's fix can take this form (note that I changed the order of the loops; the iteration over the characters to remove is now the inner loop):
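#include <algorithm>
#include <iterator>
#include <string>

// Returns true for characters that should be stripped from the message.
bool isCharToRemove(char ch)
{
constexpr char skipChars[] = {'Ì', '\x1','\"','{', '}',':'};
return std::find(std::begin(skipChars), std::end(skipChars), ch) != std::end(skipChars);
}
std::string removeMagicChars(std::string message)
{
// Erase-remove idiom: remove_if compacts the kept characters, erase trims the tail.
message.erase(
std::remove_if(message.begin(), message.end(), isCharToRemove),
message.end());
return message;
}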
Let me know if you need solution which can handle more complex text encoding.

want to optimize this string manipulation program c++

I've just solved this problem:
http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem&problem=3139
Here's my solution:
https://ideone.com/pl8K3K
int main(void)
{
string s, sub;
int f,e,i;
while(getline(cin, s)){
f=s.find_first_of("[");
while(f< s.size()){
e= s.find_first_of("[]", f+1);
sub = s.substr(f, e-f);
s.erase(f,e-f);
s.insert(0, sub);
f=s.find_first_of("[", f+1);
}
for(i=0; i<s.size(); i++){
while((s[i]==']') || (s[i]=='[')) s.erase(s.begin()+i);
}
cout << s << endl;
}
return 0;
}
I get TLE (time limit exceeded), and I want to know which operation in my code is too expensive and how to optimize it.
Thanks in advance.
If I am reading your problem correctly, you need to rethink your design. There is no need for functions to search, no need for erase, substr, etc.
First, don't think about the [ or ] characters right now. Start out with a blank string and add characters to it from the original string. That is the first thing that speeds up your code. A simple loop is what you should start out with.
Now, while looping, when you actually do encounter those special characters, all you need to do is change the "insertion point" in your output string to either the beginning of the string (in the case of [) or the end of the string (in the case of ]).
So the trick is to not only build a new string as you go along, but also change the point of insertion into the new string. Initially, the point of insertion is at the end of the string, but that will change if you encounter those special characters.
If you are not aware, you can build a string not by just using += or +, but also using the std::string::insert function.
So for example, you always build your output string this way:
out.insert(out.begin() + curInsertionPoint, original_text[i]);
curInsertionPoint++;
The out string is the string you're building, the original_text is the input that you were given. The curInsertionPoint will start out at 0, and will change if you encounter the [ or ] characters. The i is merely a loop index into the original string.
I won't post any more than this, but you should get the idea.
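For readers who want to see the idea end to end, here is a minimal sketch of the moving insertion point described above. The function name rebuild is an assumption made for this example; out, original_text and curInsertionPoint follow the names used in the answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds the output in one pass over the input.
// '[' moves the insertion point to the front, ']' moves it to the end.
std::string rebuild(const std::string& original_text)
{
    std::string out;
    std::size_t curInsertionPoint = 0;
    for (char c : original_text) {
        if (c == '[') {
            curInsertionPoint = 0;
        } else if (c == ']') {
            curInsertionPoint = out.size();
        } else {
            out.insert(out.begin() + curInsertionPoint, c);
            ++curInsertionPoint;
        }
    }
    return out;
}
```

Note that inserting near the front of a std::string still shifts the characters after it, so long runs typed after a '[' remain costly; a container with cheap insertion in the middle, such as std::list<char>, may be needed if this version is still too slow.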

capitalizing characters, how toupper works?

I need to use a character array and take the characters in the array and capitalize and lower case them as necessary. I was looking at the toupper and its example, but I'm confused about how this works. Looking from the example given on cplusplus.com I wrote
int main(){
int i = 0;
char str[] = "This is a test.";
while(str[i]){
putchar(toupper(str[i]));
i++;
}
for(int i = 0; i < 15; i++){
cout << str[i];
}
}
and there are two things I don't understand about this. The first is that without the cout at the bottom, the program prints out THIS IS A TEST. Does putchar print to the screen? (the use of putchar is not explained on the example). But my second more important question is why does the cout at the bottom still print out This is a test.? Does it not change the chars in str[]? Is there another way I should be doing this (keeping in mind I need to use character arrays)?
Yes, putchar() prints a character to the program's standard output. That is its purpose. It is the source of the uppercase output.
The cout at the bottom of the program prints the original string because you never modified it. The toupper() function doesn't -- indeed can't -- modify its argument. Instead, it returns the uppercased char.
putchar writes a single character to output: http://www.cplusplus.com/reference/cstdio/putchar/
As a result, the first while loop converts each character from str one at a time to upper case and outputs them. HOWEVER, it does not change the contents of str - this explains the lower case output from the second loop.
Edit:
I've expanded the first loop:
// Loop until we've reached the end of the string 'str'
while(str[i]){
// Convert str[i] to upper case, but then store that elsewhere. Do not modify str[i].
char upperChar = toupper(str[i]);
// Output our new character to the screen
putchar(upperChar);
i++;
}

Why it works when I type multiple characters in a character variable?

I am a new C++ user.
My code is as following:
#include <iostream>
using namespace std;
int main()
{
int option = 1;
char abstract='a';
while(option == 1){
char temp;
cin>> temp;
abstract = temp;
cout << abstract;
option = 1;
if(abstract == '!'){
option = 0;
}
}
return 0;
}
When I type something like abcdefg, all the characters appear on the screen. Why? Is it just because of the compiler?
In fact, only one character at a time is stored in your char. cin>>temp; reads a single char at a time since more characters would not fit there. The loop simply reads and prints one character after the other.
As a visualization hint, try echoing your characters with cout<<abstract<<endl;. You will see a single character per line/iteration.
Your terminal does not restrict the number of characters typed in; that's why you can type as many as you want. Each cin >> temp reads only one of those characters because temp is of type char, and the rest stay in the input buffer for the next iteration. If you need to know how many characters were typed, read the whole line into a std::string and check its length.
Because of the while loop, which processes each character in turn. Not sure what you expected to happen.
Print it out with delimiters to see that there's never more than a single character printed per iteration:
cout << "'" << abstract << "'";
The terminal window itself is responsible for reading the characters and echoing them back to the screen. Your C++ program asks the terminal for characters and, in this sort of program at least, has no effect on how those characters are displayed.

C : Using substr to parse a text file

I just need a little help with file parsing. We have to parse a file that has 6 string entries per row in the format:
"string1", "string2", "string3", "string4", "string5", "string6"
My instructor recently gave us a little piece of code as a "hint," and I'm supposed to use it. Unfortunately, I can't figure out how to get it to work. Here's my file parsing function.
void parseData(ifstream &myFile, Book bookPtr[])
{
string bookInfo;
int start, end;
string bookData[6];
getline(myFile, bookInfo);
start = -2;
myFile.open("Book List.txt");
for (int j = 0; j < 6; j++)
{
start += 3;
end = bookInfo.find('"', start);
bookData[j] = bookInfo.substr(start, end-start);
start = end;
}
}
So I'm trying to read the 6 strings into an array of strings. Can someone please help walk me through the process?
start = -2;
for (int j = 0; j < 6; j++)
{
start += 3;
end = bookInfo.find('"', start);
bookData[j] = bookInfo.substr(start, end-start);
start = end;
}
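In the sample line the separator between two entries, ", ", is four characters: closing quote, comma, space, opening quote. The step of 3 in the hint therefore fits a separator without the space ("," is three characters); if your file really has the space, step by 4 instead.
At entry to the loop body start points at the previous closing quote. (On the first pass it is faked as -2, the closing quote of an imaginary "-1th" element, so that the same step lands on index 1, just after the opening quote at index 0.)
So we advance from the previous closing quote to the first character after the following opening quote: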
start += 3;
Then we use std::string::find to find the closing quote:
end = bookInfo.find('"', start);
The second argument is the position at which the search starts, so all characters before start are ignored.
We then have the two quote positions, start..end, so we use substr to extract the string:
bookData[j] = bookInfo.substr(start, end-start);
And we then update start for the next loop to be the last closing quote:
start = end
Please, for your own sake, create a minimal example. It starts with a string like the line you gave as an example and ends with the different parts in an array. Leave the loading from a file out for now; getline() seems to work for you, doesn't it? Then, do not declare every variable you might want to use at the beginning of a function. This is not ancient C, where you simply had to do that or introduce additional {} blocks. There is another odd thing, and that is the Book bookPtr[]. This is in fact just a Book* bookPtr, i.e. you are not passing an array to the function but just a pointer. Don't fall for this misleading syntax, it's a lie! In any case, you don't seem to be using that pointer at all.
Concerning the splitting of a line into strings, one approach is to locate pairs of double quotes. Everything in between is one of the strings, everything without is irrelevant. The string class has a find() function which optionally takes a starting position. Starting position is always one behind the previously found position.
Your code above seems to assume that exactly one double quote, a comma, a space and another double quote separate two strings. This isn't 100% clear; I would also be prepared to handle multiple spaces or no space at all. Also, is the comma guaranteed? Are the double quotes guaranteed? Anyway, keep it simple. Unless you get a better spec for the input, just assume that only the parts between the quotes matter.
Then, what exactly works and what doesn't? You need to ask more specific questions and give more detailed information. The code above doesn't look broken per se, although there are a few things a bit off. For example, you don't typically pass ifstreams to a function, but use the istream base class. In your case, you read a line from that file and then open another file using the same fstream object, which doesn't make sense to me, since you don't use it after that. If you only needed that stream locally, you would create and open it there (handling errors of course!) and pass in the filename as parameter only.
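The quote-pair approach described above can be sketched as follows. The function name extractQuoted is made up for this example, and it assumes the fields themselves never contain a double quote.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns everything found between pairs of double quotes.
// Whatever sits between the pairs (commas, any number of spaces) is ignored.
std::vector<std::string> extractQuoted(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type open = line.find('"');
    while (open != std::string::npos) {
        const std::string::size_type close = line.find('"', open + 1);
        if (close == std::string::npos) {
            break;  // unmatched opening quote: stop rather than guess
        }
        fields.push_back(line.substr(open + 1, close - open - 1));
        open = line.find('"', close + 1);  // continue one past the closing quote
    }
    return fields;
}
```

Because it never relies on a fixed separator width, the same function handles "a","b" and "a",   "b" alike; the caller can then check that exactly six fields came back for each row.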