bulletproof use of from_chars() - c++

I have some literal strings which I want to convert to integer and even double. The base is 16, 10, 8, and 2.
At the moment I am puzzled by the behavior of std::from_chars(): I try to convert, and the error code in the returned from_chars_result reports success even though the input was not fully converted, as shown here:
#include <iostream>
#include <string_view>
#include <charconv>
using namespace std::literals::string_view_literals;
int main()
{
auto const buf = "01234567890ABCDEFG.FFp1024"sv;
double d;
auto const out = std::from_chars(buf.begin(), buf.end(), d, std::chars_format::hex);
if(out.ec != std::errc{} || out.ptr != buf.end())
{
std::cerr << buf << '\n'
<< std::string(std::distance(buf.begin(), out.ptr), ' ') << "^- here\n";
auto const ec = std::make_error_code(out.ec);
std::cerr << "err: " << ec.message() << '\n';
return 1;
}
std::cout << d << '\n';
}
gives:
01234567890ABCDEFG.FFp1024
^- here
err: Success
For convenience also at coliru.
In my use case, I'll check the character set before but, I'm not sure about the checks to make it bulletproof. Is this behavior expected (maybe my English isn't sufficient, or I didn't read carefully enough)? I've never seen such checks on iterators on blogs etc.
The other question is related to different base like 2 and 8. Base of 10 and 16 seems to be supported - what would be the way for the other two bases?
Addendum/Edit:
Bulletproof here means that I can have nasty things in the string. The obvious thing for me is that 'G' is not a hex character. But I would have expected an appropriate error code in some way! The comparison out.ptr != buf.end() I've never seen in blogs (or I didn't read the right ones :)
If I enter a crazy long hex float, at least a numerical result out of range comes up.
By bulletproof I also mean that I can find such impossible strings by length, for example, so that I can save myself the call to from_chars() - for float/doubles and integers (here I would 'strlen' compare digits10 from std::numeric_limits).

The from_chars utility is designed to convert the first number it finds in the string and to return a pointer to the point where it stopped. This allows you to parse strings like "42 centimeters" by first converting the number and then parsing the rest of the string yourself for what comes after it.
The comparison out.ptr != buf.end() I've never seen in blogs (or I didn't read the right ones :)
If you know that the entire string should be a number, then checking that the pointer in the result points to the end of the string is the normal way to ensure that from_chars read the entire string.

Related

Bit manipulation on character string

Can we apply bit manipulation on a character string?
If so, is it always possible to retrieve back a character string from the manipulated string?
I was hoping to use the XOR operator on two strings by converting them to binary and then back to character string.
I took up some code from another StackOverflow question but it only solves half the problem
std::string TextToBinaryString(string words)
{
string binaryString = "";
for (char& _char : words)
{
binaryString +=std::bitset<8>(_char).to_string();
}
return binaryString;
}
I don't know how to convert this string of ones and zeroes back to a string of characters.
I did read about std::stoi in some Google search results as a solution, but was not able to understand them.
The manipulation that I wish to do is
std::string message("Hello World");
int n = message.size();
bin_string = TextToBinaryString(message)
std::string left,right;
bin_string.copy(left,n/2,0);
bin_string.copy(right,n,n/2);
std::string result = left^right;
I know I can hardcode this by picking up every entry and applying the operation but it is the conversion of the binary string back to characters that are making me scratch my head.
EDIT: I am trying to implement a cipher framework called a Feistel cipher (sorry, I should have made that clear before). It uses the property of XOR that XORing something with the same value twice cancels out, e.g. (A^B)^B=A. I wanted to output the ciphered gibberish in the middle. Hence the query.
Can we apply bit manipulation on a character string?
Yes.
A character is an integer type, so you can do anything to them you can do to any other integer. What happened when you tried?
If so, is it always possible to retrieve back a character string from the manipulated string?
No. It is sometimes possible to recover the original string, but some manipulations are not reversible.
XOR, the particular operation you asked about, is self-reversing, so it works in that case but not in general.
A cheesy example (depends on ASCII character set, don't do this in real code for converting case, etc. etc.)
#include <iostream>
#include <string>
int main() {
std::string s("a");
std::cout << "original: " << s << '\n';
s[0] ^= 0x20;
std::cout << "modified: " << s << '\n';
s[0] ^= 0x20;
std::cout << "restored: " << s << '\n';
}
shows (on an ASCII-compatible system)
original: a
modified: A
restored: a
Note that I'm not converting "a" into "1100001" first, then using XOR (somehow) to zero bit 5, giving "1000001", and then converting that back into "A". Why would I?
This part of your question suggests you don't understand the difference between values and representations: the character is always stored in binary. You can also always treat it as if it is stored in octal, or in decimal, or in hexadecimal - the choice of base only affects how we write (or print) the value, and not what the value is in itself.
Writing a Feistel cipher where the plaintext and key are the same length is trivial:
std::string feistel(std::string const &text, std::string const &key)
{
std::string result;
std::transform(text.begin(), text.end(), key.begin(),
std::back_inserter(result),
[](char a, char b) { return a^b; }
);
return result;
}
This doesn't work at all if the key is shorter, though - looping round the key appropriately is left as an exercise for the reader.
Oh, and printing the encoded string is unlikely to work nicely (unless your key is helpfully just a sequence of space characters, as above).
You probably want something like this:
#include<string>
#include<cassert>
using namespace std;
std::string someBitmanipulation(string words)
{
for (char& thechar : words)
{
thechar ^= 0x5A; // xor with 0x5A
}
return words; // words is a by-value copy, so return the modified copy
}
int main()
{
std::string original{ "ABC" };
// xor each char of original with 0x5A and put the result into manipulated
auto manipulated = someBitmanipulation(original);
// check if manipulating the manipulated string is the same as the original string
assert(original == someBitmanipulation(manipulated));
}
You don't need std::bitset at all.
Now change thechar ^= 0x5A; to say thechar |= 0x5A; and see what happens.

strlen() not working well with special characters

When trying to determine the length of a low-level character string with the strlen function of <cstring>, I have noticed that it does not work properly when the string contains Spanish characters that do not exist in English, such as the opening exclamation mark ¡, accents, or the letter ñ. Each of these is counted as two characters, and setting the locale does not fix it.
#include <cstring>
#include <iostream>
int main() {
const char * s1 = "Hola!";
const char * s2 = "¡Hola!";
std::cout << s1 << " has " << strlen(s1) << " elements, but " << s2
<< " has " << strlen(s2) << " instead of 6" << std::endl;
}
This is university coursework on low-level strings, so using libraries such as <string> is not allowed.
strlen gives you the number of non-zero char objects in the buffer pointed to by its argument, up to the first zero char. Your system is apparently using a character encoding (most likely UTF-8) where these problematic characters take up more than one byte (that is, more than one char object).
How to solve this depends on what you're trying to do. For certain operations (such as determining the size of a buffer needed to store the string), the result from strlen is 100% correct, as it's exactly what you need. For most other purposes, welcome to the vast world of character/byte/code-point/whatever nuances. You might want to read up on text encodings, Unicode etc. http://utf8everywhere.org/ might be a good site to start.
You've mentioned this is a university assignment: based on what the teaching goal is, you might need to implement some form of UTF en/de-coding, or just steer clear of non-ASCII characters.

How do I check argv[1] doesn't have any alphabetical characters?

This has been driving my entire C++ class nuts, none of us has been able to find a solid solution to this problem.
We are passing information to our program through the terminal, via argv[1]. We would call our program as ./main 3 and the program would run 3 times.
The problem comes when we are validating the input, we are trying to cover all of our bases and for most of them we are good, like an alphabetical character entered, a negative number, 0, etc. But what keeps passing through is an int followed by a str for example ./main 3e or ./main 1.3. I've tried this
( Ashwin's answer caught my eye ) but it doesn't seem to work or at least I can't implement it in my code.
This is my code now:
int main(int argc, char * argv[]){
if (!argv[1]) exit(0);
int x = atoi(argv[1]);
if (!x or x <= 0) exit(0);
// I would like to add another exit(0); for when the input mixes numbers and letters or doubles.
for (int i = 0; i < x; i++){
// rest of the main func.
}
Despite the title, it sounds like what you really want to do is check whether every single character in the input argument is a digit. You can achieve this by iterating over it, checking that every element is a digit using std::isdigit.
Here's a sketch using the std::all_of algorithm:
size_t len = strlen(argv[1]);
bool ok = std::all_of(argv[1], argv[1] + len,
[](unsigned char c) { return std::isdigit(c); } );
You can add an extra check for the first element being '0' if needed.
If you want to convert a string to a number and verify that the entire string was numeric, you can use strtol instead of atoi. As an additional bonus, strtol correctly checks for overflow and gives you the option of specifying whether or not you want hexadecimal/octal conversions.
Here's a simple implementation, with all the errors noted (printing error messages from a function like this is not a good idea; I just did it for compactness). A better option might be to return an error enum instead of the bool, but this function returns a std::pair<bool, int>: either (false, <undefined>) or (true, value):
std::pair<bool, int> safe_get_int(const char* s) {
char* endptr;
bool ok = false;
errno = 0; /* So we can check ERANGE later */
long val = strtol(s, &endptr, 10); /* Don't allow hex or octal. */
if (endptr == s) /* Includes the case where s is just whitespace */
std::cout << "You must specify some value." << '\n';
else if (*endptr != '\0')
std::cout << "Argument must be an integer: " << s << '\n';
else if (val < 0)
std::cout << "Argument must not be negative: " << s << '\n';
else if (errno == ERANGE || val > std::numeric_limits<int>::max())
std::cout << "Argument is too large: " << s << '\n';
else
ok = true;
return std::make_pair(ok, ok ? int(val) : 0);
}
In general, philosophical terms, when you have an API like strtol (or, for that matter, fopen) which will check for errors and deny the request if an error occurs, it is better programming style to "try and then check the error return", than "attempt to predict an error and only try if it looks ok". The second strategy, "check before use", is plagued with bugs, including security vulnerabilities (not in this case, of course, but see TOCTOU for a discussion). It also doesn't really help you, because you will have to check for error returns anyway, in case your predictor was insufficiently precise.
Of course, you need to be confident that the API in question does not have undefined behaviour on bad input, so read the official documentation. In this case, atoi does have UB on bad input, but strtol does not. (atoi: "If the value cannot be represented, the behavior is undefined."; contrast with strtol)

How can I read accented characters in C++ and use them with isalnum?

I am programming in French and, because of that, I need to use accented characters. I can output them by using
#include <locale> and setlocale(LC_ALL, ""), but there seems to be a problem when I read accented characters. Here is simple example I made to show the problem :
#include <locale>
#include <iostream>
using namespace std;
const string SymbolsAllowed = "+-*/%";
int main()
{
setlocale(LC_ALL, ""); // makes accents printable
// Translation : Please write a string with accented characters
// 'é' is shown correctly :
cout << "Veuillez écrire du texte accentué : ";
string accentedString;
getline(cin, accentedString);
// Accented char are not shown correctly :
cout << "Accented string written : " << accentedString << endl;
for (unsigned int i = 0; i < accentedString.length(); ++i)
{
char currentChar = accentedString.at(i);
// The program crashes while testing if currentChar is alphanumeric.
// (error image below) :
if (!isalnum(currentChar) && !strchr(SymbolsAllowed.c_str(), currentChar))
{
cout << endl << "Character not allowed : " << currentChar << endl;
system("pause");
return 1;
}
}
cout << endl << "No unauthorized characters were written." << endl;
system("pause");
return 0;
}
Here is an output example before the program crashes :
Veuillez écrire du texte accentué : éèàìù
Accented string written : ʾS.?—
I noticed the debugger from Visual Studio shows that I have written something different than what it outputs :
[0] -126 '‚' char
[1] -118 'Š' char
[2] -123 '…' char
[3] -115 '' char
[4] -105 '—' char
The error shown seems to tell that only characters between -1 and 255 can be used but, according to the ASCII table the value of the accented characters I used in the example above do not exceed this limit.
Here is a picture of the error dialog that pops up : Error message: Expression: c >= -1 && c <= 255
Can someone please tell me what I am doing wrong or give me a solution for this? Thank you in advance. :)
char is a signed type on your system (indeed, on many systems) so its range of values is -128 to 127. Characters whose codes are between 128 and 255 look like negative numbers if they are stored in a char, and that is actually what your debugger is telling you:
[0] -126 '‚' char
That's -126, not 126. In other words, 130 or 0x82.
isalnum and friends take an int as an argument, which (as the error message indicates) is constrained to the values EOF (-1 on your system) and the range 0-255. -126 is not in this range. Hence the error. You could cast to unsigned char, or (probably better, if it works on Windows), use the two-argument std::isalnum in <locale>
For reasons which totally escape me, Windows seems to be providing console input in CP-437 but processing output in CP-1252. The high half of those two code pages is completely different. So when you type é, it gets sent to your program as 130 (0x82) from CP-437, but when you send that same character back to the console, it gets printed according to CP-1252 as a low single quote ‚ (which looks a lot like a comma, but isn't). So that's not going to work. You need to get input and output to be on the same code page.
I don't know a lot about Windows, but you can probably find some useful information in the MS docs. That page includes links to Windows-specific functions which set the input and output code pages.
Intriguingly, the accented characters in the source code of your program appear to be CP-1252, since they print correctly. If you decide to move away from code page 1252 -- for example, by adopting Unicode -- you'll have to fix your source code as well.
With the is* and to* functions, you really need to cast the input to unsigned char before passing it to the function:
if (!isalnum((unsigned char)currentChar) && !strchr(SymbolsAllowed.c_str(), currentChar)) {
While you're at it, I'd advise against using strchr as well, and switch to something like this:
std::string SymbolsAllowed = "+-*/%";
if (... && SymbolsAllowed.find(currentChar) == std::string::npos)
While you're at it, you should probably forget that you ever even heard of the exit function. You should never use it in C++. In the case here (exiting from main) you should just return. Otherwise, throw an exception (and if you want to exit the program, catch the exception in main and return from there).
If I were writing this, I'd do the job somewhat differently in general though. std::string already has a function to do most of what your loop is trying to accomplish, so I'd set up symbolsAllowed to include all the symbols you want to allow, then just do a search for anything it doesn't contain:
// Add all the authorized characters to the string:
for (int a = 0; a <= std::numeric_limits<unsigned char>::max(); ++a) // int counter, so 255 is included without wrapping
if (isalnum(a) || isspace(a)) // you probably want to allow spaces?
symbolsAllowed += static_cast<char>(a);
// ...
auto pos = accentedString.find_first_not_of(symbolsAllowed);
if (pos != std::string::npos) {
std::cout << "Character not allowed: " << accentedString[pos];
return 1;
}

Trying to ignore all whitespace up to the first character (desperately needing a simple nudge)

I'll be flat out honest, this is a small snippet of code I need to finish my homework assignment. I know the community is very suspicious of helping students, but I've been racking my head against the wall for the past 5 hours and literally have accomplished nothing on this assignment. I've never asked for help on any assignments, but none have given me this much trouble.
All I'm having trouble with is getting the program to strip the leading whitespace out. I think I can handle the rest. I'm not asking for a solution to my overall assignment, just a nudge on this one particular section.
I'll post the full assignment text here, but I am NOT posting it to try to get a full solution, I'm only posting it so others can see the conditions I have to work with.
"This homework will give you more practice in writing functions and also how numbers are read into a variable. You need to write a function that will read an unsigned integer into a variable of type unsigned short int. This will have a maximum value of 65535, and the function needs to take care of illegal numbers. You can not use "cin >>", inside the function.
The rules for numeric input are basically as follows:
1) skip all leading white spaces
2) first character found must be numeric else an error will occur
3) numeric characters are then processed one at a time and combine with number
4) processing stops when non-numeric found
We will follow these rules and also add error handling and overflow. If an illegal entry is made before a numeric than an error code of "1" will be sent back, if overflow occurs, that is number bigger then 65535, then error code of "2" will be sent back. If no error then "0" is sent back.
Make sure the main function will continue to loop until the user enters a “n” or “N” for NO, the main should test the error code returned from the function called “ReadInt” and display appropriate error messages or display the number if there is no error. Take care in designing the “ReadInt” function, it should be value returning and have a reference parameter. The function needs to process one character at a time from the input buffer and deal with it in a correct fashion. Once the number has been read in, then make sure the input buffer is empty, otherwise the loop in main may not work correct. I know this is not how the extraction works, but lets do it this way.
You do not need to turn in an algorithm with this assignment, but I would advise you to write one. And the debugger may prove helpful as well. You are basically rewriting the extraction operator as it works on integers."
A majority of my code won't make sense as I've been deleting things and adding things like crazy to try everything I can think of.
#include <iostream>
#include <CTYPE.h>
using namespace std;
int ReadInt (unsigned short int &UserIn);
int main()
{
int Error;
unsigned short int UserInput;
char RepeatProgram;
do
{
Error=ReadInt(UserInput);
if (Error==0)
cout << "Number is " << UserInput << endl;
else if (Error==1)
cout << "Illegal Data Entry\n";
else if (Error==2)
cout << "Numerical overflow, number too big\n";
cout << "Continue? n/N to quit: ";
cin >> RepeatProgram;
cout << endl;
} while (RepeatProgram!='N' && RepeatProgram!='n');
}
int ReadInt (unsigned short int &UserIn)
{
int Err=0;
char TemporaryStorage;
long int FinalNumber=0;
cout << "Enter a number: ";
//cin.ignore(1000, !' '); this didn't work
cin.get(TemporaryStorage);
cout << TemporaryStorage;//I'm only displaying this while I test my ideas to see if they are working or not, before I move onto the the next step
cout << endl;
return Err;
}
I really appreciate any help I may get and hope I don't give the impression that I'm looking for a full free solution to the whole problem. I want to do this on my own, I'm just lost at the beginning.
As a preface, I want to state that this is a question made by a student, but unlike most of their type, it is a quality question that merits a quality answer, so I'll try to do it ;). I won't try to just answer your concrete question, but also to show you other slight problems in your code.
First of all, let's analyze your code step by step. More or less like what a debugger would do. Take your time to read this carefully ;)...
#include <iostream>
#include <CTYPE.h>
Includes headers <iostream> and <ctype.h> (the uppercase works because of some flaws/design-decisions of NTFS in Windows). I'd recommend changing the second line to #include <cctype> instead.
using namespace std;
This is okay for any beginner/student, but don't make a habit of it! For the purposes of "purity", I will explicitly use std:: throughout this answer, as if this line didn't exist.
int ReadInt (unsigned short int &UserIn);
Declares a function ReadInt that takes a reference UserIn to type unsigned short int and returns an object of type int.
int main()
{
Special function main; no parameters, returns int. Begin function.
int Error;
unsigned short int UserInput;
char RepeatProgram;
Declares variables Error, UserInput, and RepeatProgram with respective types int, unsigned short int, and char.
do
{
Do-while block. Begin.
Error=ReadInt(UserInput);
Assign the return value of ReadInt (type int), called with argument UserInput bound to a parameter of type unsigned short int&, to variable Error of type int.
if (Error==0)
std::cout << "Number is " << UserInput << std::endl;
If Error is zero, then print out UserInput to standard output.
else if (Error==1)
std::cout << "Illegal Data Entry\n";
else if (Error==2)
std::cout << "Numerical overflow, number too big\n";
Otherwise, if an error occurs, report it to the user by means of std::cout.
std::cout << "Continue? n/N to quit: ";
std::cin >> RepeatProgram;
Query the user if he/she wants to continue or quit. Store the input character in RepeatProgram of type char.
std::cout << std::endl;
Redundant, unless you want to add padding, which is probably your purpose. Actually, you're better off doing std::cout << '\n', but that doesn't matter too much.
} while (RepeatProgram!='N' && RepeatProgram!='n');
Matching expression for the do-while block above. Repeat execution of the given block if RepeatProgram is neither a lowercase nor an uppercase letter N.
}
End function main. Implicit return value is zero.
int ReadInt (unsigned short int &UserIn)
{
Function ReadInt takes a reference UserIn to unsigned short int and returns an object of type int. Begin function.
int Err=0;
char TemporaryStorage;
long int FinalNumber=0;
Declares variables Err, TemporaryStorage, and FinalNumber of respective types int, char, and long int. Variables Err and FinalNumber are both initialized to 0. But, just one thing. Didn't the assignment say that the resulting number should be stored in an unsigned short int? So, this would be better...
unsigned short int FinalNumber = 0;
Now...
std::cout << "Enter a number: ";
//std::cin.ignore(1000, !' '); this didn't work
Eh? What's this supposed to be? (Error: Aborting debugger because this makes no sense!) I'm expecting that you just forgot the // before the comment, right? Now, what do you expect !' ' to evaluate to other than '\0'? istream::ignore(n, ch) will discard characters from the input stream until either n characters have been discarded, ch is found, or the End-Of-File is reached.
A better approach would be...
do
std::cin.get(TemporaryStorage);
while(std::isspace(static_cast<unsigned char>(TemporaryStorage)));
Now...
std::cin.get(TemporaryStorage);
This line can be discarded with the above approach ;).
Right. Now, we're getting into the part where you obviously banged your head against all solid objects known to mankind. Let me help you a bit there. We have this situation. With the above code, TemporaryStorage will hold the first character that is not whitespace after the do-while loop. So, we have three things left. First of all, check that at least one digit is in the input, otherwise return an error. Now, while the input is made up of digits, translate characters into integers, and multiply then add to get the actual integer. Finally, and this is the most... ahem... strange part, we need to avoid any overflows.
if (!std::isdigit(TemporaryStorage)) {
Err = 1;
return Err;
}
while (std::isdigit(static_cast<unsigned char>(TemporaryStorage))) {
// Make a slot for another digit and add it, computing in a wider type.
// '0' - '0' = 0, '1' - '0' = 1...
unsigned long Widened = FinalNumber * 10UL + (TemporaryStorage - '0');
// Compare before narrowing: a wrap-around test on the 16-bit value would miss inputs such as 100000
if (Widened > 65535) {
Err = 2;
return Err;
}
FinalNumber = static_cast<unsigned short int>(Widened);
std::cin.get(TemporaryStorage);
}
// We've got the number, yay!
UserIn = FinalNumber;
The code is self-explanatory. Please comment if you have any doubts with it.
std::cout << TemporaryStorage;//I'm only displaying this while I test my ideas to see if they are working or not, before I move onto the the next step
cout << endl;
return Err;
Should I say something here? Anyway, I already did. Just remember to take those std::cout lines out before showing your work ;).
}
End function ReadInt.
You can skip leading whitespace from a stream using std::ws. For example:
std::cin >> std::ws;
This use of >> just invokes the manipulator std::ws on the stream. To meet the teacher's requirements you can invoke it directly:
std::ws(std::cin);
Formatted input automatically skips whitespace. Note that you should also always check whether input was successful:
if (std::cin.get(TemporaryStorage)) {
...
}