How to deal with garbage characters in a string? - c++

Suppose I have a string that contains a necessary numeric character, but it is not terminated by '\0'; it has garbage characters instead. Actually, the string has garbage characters after the number. So how do I deal with the garbage characters while storing that numeric character in another string or variable?

So how to deal with the garbage character while storing that numerical character in another string or variable?
Only copy a substring. Example:
std::string example = "garbage1garbage";
char numerical = example[7];
We got the numerical character excluding the garbage entirely.

If the text to be converted is in a std::string, then you can extract a number from the front as follows:
#include <iostream>
#include <sstream>
#include <string>
...
std::string input = "128734garbage";
std::istringstream iss{input};
int num;
if (iss >> num)
    ...use_num...
else
    std::cerr << "wasn't able to parse an int from input\n";
Just change int to double, uint64_t, ... - whatever suits your data.
If you have only a pointer to the text and know it's not null-terminated, just getting the text into a std::string is problematic. You could instead use a function that converts text to a number, but stops at the first invalid character. std::stol et al, and the other unsigned and floating point variants linked from the same reference page, are good candidates for that.
From your "another string or variable" - the above addresses storing into a numeric variable. You can then create a new std::string from the number using std::to_string, or a std::ostringstream, if that's what you want to do. This will standardise the output format though, so input like say "1E4" might end up looking like say 10000.0. Alternatively, with the stol-type functions you can use the end position they report (the pos index from std::stol, or the end pointer from std::strtol) to work out the length of the numeric part, and use std::string::substr() to extract the leading number as a new std::string object.
You should also be aware that the distinction between number and garbage is not always what you might expect. For example "0XBEFHJQ" might be split by some of the above functions as 0xBEF hex and HJQ garbage.

Related

C++ Atoi can't handle special characters

I'm using atoi to get the number after removing all the letters from the string. But my string contains special characters, as seen below, and because of this my atoi call exits with an error. What should I do to solve this?
#include <cctype>
#include <cstdlib>
#include <iostream>
#include <string>
using namespace std;
int main() {
    std::string playerPickS = "Klöver 12"; // string with special characters
    size_t i = 0;
    for (; i < playerPickS.length(); i++) { if (isdigit(playerPickS[i])) break; }
    playerPickS = playerPickS.substr(i, playerPickS.length() - i); // keep the text from the first digit on
    cout << atoi(playerPickS.c_str()); // convert the remaining text to an integer
}
This is what I believe is the error. I only get it when using those special characters; that's why I think they are my problem.
char can be signed or unsigned, but isdigit without a locale overload (the <cctype> version) expects a value representable as unsigned char, or EOF (usually -1). In your encoding 'ö' has a negative value. You can cast it to unsigned char first: isdigit(static_cast<unsigned char>(playerPickS[i])), or use the locale-aware variant.
atoi stops scanning when it finds something that's not a digit (roughly speaking). So, to get it to do what you want, you have to feed it something that at least starts with the string you want to convert.
From the documentation:
[atoi] Discards any whitespace characters until the first non-whitespace character is found, then takes as many characters as possible to form a valid integer number representation and converts them to an integer value. The valid integer value consists of the following parts:
(optional) plus or minus sign
numeric digits
So, now you know how atoi works, you can pre-process your string appropriately before passing it in. Good luck!
Edit: If your call to isdigit is failing to yield the desired result, the clue lies here:
The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.
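So you need to check for that yourself before you call it. Casting playerPickS[i] to unsigned char will work. Casting to unsigned int will not: a negative char would become a huge value that is still not representable as unsigned char.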

convert uint8_t array to string in c++

This can be marked solved. The problem was the print macro: ESP_LOGx can't print C++ strings directly (it takes printf-style arguments, so pass qrData.c_str()).
I'm trying to convert a uint8_t array to a string in C++.
The array is defined in a header file like this:
uint8_t mypayload[1112];
Printing the array itself works, so I'm sure it's not empty.
Now I'm trying to convert it to a string:
string qrData;
std::string qrData(reinterpret_cast<char const*>(mypayload), sizeof mypayload);
I also tried:
qrData = (char*)mypayload;
printing the string results in 5 random chars.
Does anybody have a hint where I made a mistake?
The only correct comment so far is from Some programmer dude, so all credit goes to him.
The comment from Ian4264 is flat-out wrong. Of course you can do a reinterpret_cast.
Please read about the constructors of std::string. You are using constructor number 4. The description is:
4) Constructs the string with the first count characters of character string pointed to by s. s can contain null characters. The length of the string is count. The behavior is undefined if [s, s + count) is not a valid range.
So, even if the array contains 0 characters (the C-style string "terminator"), all bytes of the uint8_t array will be copied. And if you print the string, it will print ALL characters, even the non-printable characters after the '\0'.
Those may be your "random" characters, because the bytes after your "terminator" most probably contain uninitialized values.
You should consider using constructor number 5:
5) Constructs the string with the contents initialized with a copy of the null-terminated character string pointed to by s. The length of the string is determined by the first null character. The behavior is undefined if [s, s + Traits::length(s)) is not a valid range.
And if you need to add bytes later, that is also possible: a std::string can grow dynamically.
BTW: you define "std::string qrData" twice, which will not compile.
Since you know the size of your data in another variable, why are you using sizeof? It will give you the size of the array, not the size of your data.
This should give you the right result, assuming no other errors in your code
std::string qrData(reinterpret_cast<char const*>(mypayload), data->payload_len);
Incidentally in the code you quoted why is qrData declared twice? That seems a bit suspicious.
qrData = (const char*)mypayload;
std::string only accepts a const char* here, not a uint8_t*.
String s = String((char *)data, len); //esp32

How to save text file to struct with string in C++

I want to save the content of a file to a struct. I've tried to use seekg and read to write to it, but it isn't working.
My file is something like:
johnmayer24ericclapton32
I want to store the name, the last name and the age in a struct like this:
typedef struct test_struct{
string name;
string last_name;
int age;
} test_struct;
Here is my code
int main(){
test_struct ts;
ifstream data_base;
data_base.open("test_file.txt");
data_base.seekg(0, ios_base::beg);
data_base.read(ts, sizeof(test_struct));
data_base.close();
return 0;
}
It doesn't compile because read won't accept ts as an argument. Is there another way (or any way) of doing it?
Serialization/Deserialization of strings is tricky.
For binary data, the convention is to output the length of the string first, then the string data.
https://isocpp.org/wiki/faq/serialization#serialize-binary-format
String data is tricky because you have to unambiguously know when the string’s body stops. You can’t unambiguously terminate all strings with a '\0' if some string might contain that character; recall that std::string can store '\0'. The easiest solution is to write the integer length just before the string data. Make sure the integer length is written in “network format” to avoid sizeof and endian problems (see the solutions in earlier bullets).
That way, when reading the data back in, you know the length of the string to expect, so you can preallocate the string and then read exactly that much data from the stream.
If your data is a non-binary (text) format it's a little trickier:
https://isocpp.org/wiki/faq/serialization#serialize-text-format
String data is tricky because you have to unambiguously know when the string’s body stops. You can’t unambiguously terminate all strings with a '\n' or '"' or even '\0' if some string might contain those characters. You might want to use C++ source-code escape-sequences, e.g., writing '\' followed by 'n' when you see a newline, etc. After this transformation, you can either make strings go until end-of-line (meaning they are deliminated by '\n') or you can delimit them with '"'.
If you use C++-like escape-sequences for your string data, be sure to always use the same number of hex digits after '\x' and '\u'. I typically use 2 and 4 digits respectively. Reason: if you write a smaller number of hex digits, e.g., if you simply use stream << "\x" << hex << unsigned(theChar), you’ll get errors when the next character in the string happens to be a hex digit. E.g., if the string contains '\xF' followed by 'A', you should write "\x0FA", not "\xFA".
If you don’t use some sort of escape sequence for characters like '\n', be careful that the operating system doesn’t mess up your string data. In particular, if you open a std::fstream without std::ios::binary, some operating systems translate end-of-line characters.
Another approach for string data is to prefix the string’s data with an integer length, e.g., to write "now is the time" as 15:now is the time. Note that this can make it hard for people to read/write the file, since the value just after that might not have a visible separator, but you still might find it useful.
Text-based serialization/deserialization convention varies but one field per line is an accepted practice.
You'll have to develop a specific algorithm, since there is no separator character between the "fields".
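#include <cstdlib>
#include <iostream>
#include <string>
int main()
{
    static const std::string input_text = "johnmayer24ericclapton32";
    static const std::string alphabet = "abcdefghijklmnopqrstuvwxyz";
    static const std::string decimal_digit = "0123456789";
    std::string::size_type position = 0;
    std::string artist_name;
    position = input_text.find_first_not_of(alphabet);
    if (position != std::string::npos && position > 0)
    {
        // position is the index of the first non-letter, which is also the name's length
        artist_name = input_text.substr(0, position);
    }
    else
    {
        std::cerr << "Artist name not found.";
        return EXIT_FAILURE;
    }
    std::cout << artist_name << '\n';  // johnmayer
    return EXIT_SUCCESS;
}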
Similarly, you can extract the number, then use std::stoi to convert the numeric string to an int.
Edit 1: Splitting the name
Since there is no separator character between the first and last name, you may want to have a list of possible first names and use that to find out where the first name ends and the surname starts.

Converting integer to string in c++

This is the code I wrote to convert integer to string.
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;
int b=5;
s.push_back((char)b);
cout<<s<<endl;
}
I expected the output to be 5, but it prints a blank space.
I know there is another way of doing it using stringstream, but I want to know what is wrong with this method.
Character codes for digits are not equal to the integer the digit represents on typical systems.
It is guaranteed that the character codes for decimal digits are consecutive (N3337 2.3 Character sets, Paragraph 3), so you can add '0' to convert a one-digit number to a character.
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;
int b=5;
s.push_back((char)(b + '0'));
cout<<s<<endl;
}
You are interpreting the integer 5 as a character. In ASCII encoding, 5 is the Enquiry (ENQ) control character, as you can look up in an ASCII table.
The character '5', on the other hand, is represented by the decimal number 53.
As others said, you can't convert an integer to a string the way you are doing it.
IMHO, the best way to do it is using the C++11 method std::to_string.
Your example would translate to:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;
int b=5;
s = to_string(b);
cout<<s<<endl;
}
The problem in your code is that you are converting the integer 5 to ASCII (=> ENQ ASCII code, which is not "printable").
To convert it to ASCII properly, you have to add the ASCII code of '0' (48), so:
char ascii = b + '0';
However, to convert an integer to std::string use:
std::stringstream ss; //from <sstream>
ss << 5;
std::string s = ss.str ();
I always use this helper function in my projects:
template <typename T>
std::string toString (T arg)
{
std::stringstream ss;
ss << arg;
return ss.str ();
}
Also, you can use stringstream.
std::to_string doesn't work for me on GCC
If we were writing C++ from scratch in 2016, maybe we would make this work. However, as it chose to be (mostly) backward compatible with a fairly low-level language like C, 'char' is in fact just a number that string/printing algorithms interpret as a character, but most of the language doesn't treat it specially. That includes the cast. So by writing (char) you're only converting an int (typically 32 bits) to an 8-bit char, which may be signed or unsigned.
Then you interpret it as a character when you print it, since printing functions do treat it specially. But the value 5 is not printed as '5'. The correspondence is conventional and completely arbitrary; the first codes were reserved for special control codes, which are probably obsolete by now. As Hoffman pointed out, the value 5 is the code for Enquiry (whatever it means), while to print '5' the character has to contain the value 53. To print a proper space you'd need 32. It has no meaning other than someone decided, decades ago, that this was as good as anything, and the convention stuck.
If you need to know for other characters and values, what you need is an "ASCII table". Just google it, you'll find plenty.
You'll notice that digits, and letters of the same case, are next to each other in the order you expect, so there is some logic to it at least. Beware, however, that it's often not intuitive anyway: uppercase letters come before lowercase ones, for instance, so 'A' < 'a'.
I guess you're starting to see why it's better to rely on dedicated system functions for strings!

Mistake using scanf

Could you tell me what the mistake is in the following code?
char* line="";
printf("Write the line.\n");
scanf("%s",line);
printf(line,"\n");
I'm trying to get a line as input from the console, but every time I use scanf the program crashes. I don't want to use anything from std; I totally want to avoid cin and cout. I'm just trying to learn how to take a full line as input using scanf().
Thank you.
You need to allocate the space for the input string, as scanf() cannot do that itself:
char line[1024];
printf("Write the line.\n");
scanf("%s",line);
printf("%s\n", line);
However this is dangerous as it's possible to overflow the buffer and is therefore a security concern. Use std::string instead:
std::string line;
std::cout << "Write the line." << std::endl;
std::cin >> line;
std::cout << line << std::endl;
or:
std::getline (std::cin, line);
Space is not allocated for line. You need to do something like
char *line = malloc(SOME_VALUE);
or
char line[SOME_VALUE];
Currently line is just a pointer to a string literal, and overwriting a string literal results in undefined behaviour.
scanf() doesn't match lines.
%s matches a single word.
#include <stdio.h>
int main() {
char word[101];
scanf("%100s", word);
printf("word <%s>\n", word);
return 0;
}
input:
this is a test
output:
word <this>
To match the whole line, use %100[^\n], which means up to 100 chars that aren't a newline.
#include <stdio.h>
int main() {
char word[101];
scanf("%100[^\n]", word);
printf("word <%s>\n", word);
return 0;
}
You are trying to change a string literal, which in C results in undefined behavior; in C++ string literals are const, so you are trying to write into const memory.
To overcome it, you might want to allocate a char[] and assign it to line - or if it is C++ - use std::string and avoid a lot of pain.
You should allocate enough memory for line:
char line[100];
for example.
The %s conversion specifier in a scanf call expects its corresponding argument to point to a writable buffer of type char [N] where N is large enough to hold the input.
You've initialized line to point to the string literal "". There are two problems with this. First is that attempting to modify the contents of a string literal results in undefined behavior. The language definition doesn't specify how string literals are stored; it only specifies their lifetime and visibility, and some platforms stick them in a read-only memory segment while others put them in a writable data segment. Therefore, attempting to modify the contents of a string literal on one platform may crash outright due to an access violation, while the same thing on another platform may work fine. The language definition doesn't mandate what should happen when you try to modify a string literal; in fact, it explicitly leaves that behavior undefined, so that the compiler is free to handle the situation any way it wants to. In general, it's best to always assume that string literals are unwritable.
The other problem is that the array containing the string literal is only sized to hold 1 character, the 0 terminator. Remember that C-style strings are stored as simple arrays of char, and arrays don't automatically grow when you add more characters.
You will need to either declare line as an array of char or allocate the memory dynamically:
char line[MAX_INPUT_LEN];
or
char *line = malloc(INITIAL_INPUT_LEN);
The virtue of allocating the memory dynamically is that you can resize the buffer as necessary.
For safety's sake, you should specify the maximum number of characters to read; if your buffer is sized to hold 21 characters, then write your scanf call as
scanf("%20s", line);
If there are more characters in the input stream than what line can hold, scanf will write those extra characters to the memory following line, potentially clobbering something important. Buffer overflows are a common malware exploit and should be avoided.
Also, %s won't get you the full line; it'll read up to the next whitespace character, even with the field width specifier. You'll either need to use a different conversion specifier like %[^\n] or use fgets() instead.
The pointer line, which is supposed to point to the start of the character array that will hold the string read, is actually pointing to a string literal (an empty string) whose contents are not modifiable. This leads to undefined behaviour, manifested as a crash in your case.
To fix this change the definition to:
char line[MAX]; // set suitable value for MAX
and read at most MAX-1 characters into line.
Change:
char* line="";
to
char line[max_length_of_line_you_expect];
scanf is trying to write more characters than line has room for. Try reserving more characters than the longest line you expect, as has been pointed out by the answers above.