I have the following code, which works well on my Ubuntu system:
#include <algorithm>
#include <cctype>
#include <string>
// ... other functions
bool IsHexPrefixed(const std::string& input) {
    return input.substr(0, 2) == "0x";
}
std::string StripHexPrefix(const std::string& input) {
    return IsHexPrefixed(input) ? input.substr(2, input.length()) : input;
}
bool IsHexString(const std::string& input) {
    std::string stripped_string_ = StripHexPrefix(input);
    return std::all_of(stripped_string_.begin(), stripped_string_.end(), ::isxdigit);
}
// ... some other functions
On Windows 10, whether I run it from cmd, VSCode, or Visual Studio 2019, I get a pop-up reporting a Debug Assertion Failed error.
The line that triggers the error is the std::all_of() call in the IsHexString() function.
I tried catching exceptions to find out where the error comes from, but found nothing. I also tried setting a breakpoint, but that did not reveal the cause either.
What could be the reason for this error?
EDIT:
The string that I passed to IsHexString() function is 000002C479F17CC0.
The reason is just what the assertion says. The behavior of isxdigit is undefined if its argument is not representable as unsigned char and is not equal to EOF (see notes here).
Since it takes an int argument, it's highly likely your string contains chars with values in the range 128-255 (probably because it contains non-ASCII text). Where char is signed, as it is by default with Visual C++, those chars get promoted to negative integers.
The linked cppreference page also has a workaround to avoid promotion issues that you could apply to your case:
std::all_of(stripped_string_.begin(), stripped_string_.end(),
[](unsigned char c){ return std::isxdigit(c); });
Another possibility is that the StripHexPrefix function corrupts your string, causing the problem above.
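Putting the workaround together with the functions from the question, a minimal sketch might look like the following. It reuses the helper names from the question (IsHexPrefixed, StripHexPrefix, IsHexString) and only changes how each character reaches std::isxdigit. The use of compare() and substr(2) is my own simplification, not something the question or the cppreference page prescribes.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

bool IsHexPrefixed(const std::string& input) {
    // compare() checks the first two characters without building a temporary
    return input.compare(0, 2, "0x") == 0;
}

std::string StripHexPrefix(const std::string& input) {
    return IsHexPrefixed(input) ? input.substr(2) : input;
}

bool IsHexString(const std::string& input) {
    const std::string stripped = StripHexPrefix(input);
    // Taking each character as unsigned char keeps the value handed to
    // std::isxdigit in the range 0..255, even where plain char is signed.
    return std::all_of(stripped.begin(), stripped.end(),
                       [](unsigned char c) { return std::isxdigit(c) != 0; });
}
```

Note that std::all_of returns true for an empty range, so this sketch treats "" and "0x" as hex strings. Add an explicit emptiness check if that is not what you want.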
The source code itself isn't really in question. It's the compiler's reaction to it. Here's the troublesome snippet:
int XtnUtil::IsExtension(const char * filename, const char * xtn)
{
char* fx = FindExtension(filename); // Get a pointer to the filename extension
if (!fx) return 0; // Bail out if it's not there
if (xtn[0] == '.') xtn++; // Make sure we're looking at the alpha part
return (stricmp(fx, xtn) ? 0 : 1); // TRUE if they're equal
}
I've also used _stricmp instead of stricmp. In either case, the compiler gives me a particularly uninformative message:
It seems to say "Don't use _stricmp, use _stricmp instead." I tried it with and without the underscore and also tried the POSIX equivalent, strcasecmp() but Visual Studio doesn't seem to know that function at all.
For the moment, I've moved past this by simply writing my own function named mystricmp() which is kind of distasteful but seems to work. Right now I'm mostly interested in why the compiler gave me such a funky message, and what would I be able to do about it if the function I had to hand-write weren't trivial?
I'm using this code:
std::string word;
std::ifstream f((file_name + ".txt").c_str());
while (f >> word) {
good_input = true;
for (int i = 0; i < word.length(); ++i) {
if (ispunct(word.at(i))) {
word.erase(i--, 1);
}
else if (isupper(word.at(i))){
word.at(i) = tolower(word.at(i));
}
}
Every time I read the word "doesn't" from a text file, I get this error:
Debug Assertion Failed!
Program: directory\SortingWords(Length).exe
File: minkernel\crts\ucrt\src\appcrt\convert\istype.cpp
Line: 36
Expression: c >= -1 && c <= 255
For more information please visit... [etc.]
When I click "abort", my program exits with code 3. Don't know if that's helpful?
It looks like it's got something to do with the apostrophe, maybe? This code works fine for all other words in my document up until this one. It also works great with documents that don't include apostrophes, even though they include plenty of other punctuation...
I tried changing the encoding of the text file (simply made with notepad), but that didn't help. Generally found lots of complaints about apostrophes but no working answers. Can anyone help me figure out what's going on?
As documentation for ispunct says:
The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF.
Visual C++ is nice enough to add an almost explicit message for this error if you link to the debug runtime (this is often the case with undefined behaviour - with the release runtime, it just crashes or behaves strangely; with the debug runtime, you get an error dialog box).
In theory, this means that in the character set used by your environment, ' is not representable as an unsigned char, i.e. its character code is too big or too low.
In practice, this seems very unlikely and perhaps even impossible on Windows. It is much more likely that your file doesn't really contain an apostrophe but a character that merely looks like one, e.g. an accent: ´
Here's how you can reproduce the problem in a simple manner:
#include <ctype.h>
int main()
{
ispunct('\'');
ispunct('´'); // undefined behaviour (crash or error message with Visual C++)
}
isupper has the same problem.
You can use those functions safely with static_cast, e.g.:
if (ispunct(static_cast<unsigned char>(word.at(i))))
Of course, now ispunct will return zero for the character. If you really need to cover ´, you have to do so explicitly, for example with a helper function like this:
bool extended_ispunct(int c)
{
    // Ask ispunct about the unsigned char value, then add the extra character by hand
    return ispunct(static_cast<unsigned char>(c)) || c == '´';
}
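Applied to the loop from the question, the cast could be used as in the sketch below. CleanWord is a hypothetical helper name, not something from the original program, and building a new string instead of erasing in place is just one way to avoid the index bookkeeping.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: drops punctuation and lowercases everything else.
std::string CleanWord(const std::string& word) {
    std::string result;
    result.reserve(word.size());
    for (char raw : word) {
        // Convert once to unsigned char so that every <cctype> call below
        // receives a value in the range 0..255.
        const unsigned char c = static_cast<unsigned char>(raw);
        if (std::ispunct(c)) {
            continue;  // skip punctuation
        }
        result.push_back(static_cast<char>(std::tolower(c)));
    }
    return result;
}
```

With the default "C" locale, bytes above 127 are neither punctuation nor uppercase, so a look-alike apostrophe passes through unchanged instead of triggering the assertion.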
When I try using the isdigit() function with a Chinese character, it reports an assert in Visual Studio 2013 in Debug mode, but there is no problem in Release mode.
If this function is meant to determine whether its argument is a digit, why doesn't it simply return 0 for a Chinese character?
This is my code:
string testString = "abcdefg12345中文";
int count = 0;
for (const auto &c : testString) {
if (isdigit(c)) {
++count;
}
}
and this is the assert :
You broke the contract of isdigit(int), which only accepts values in the range its documentation states:
The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.
Your standard library implementation is being kind and asserting, rather than going on to blow stuff up.
There is an alternative, locale-aware isdigit(charT ch, const locale&) that you may be able to use here.
I suggest performing some further research on how "characters" work in computers, particularly with regard to encoding more "exotic" [1] character sets.
[1] From the perspective of computer history. Of course, to you, it is the less exotic alternative!
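As a rough illustration of the locale-aware overload mentioned above, the sketch below counts digits in a wide string with std::isdigit from <locale>. CountDigits is a hypothetical helper, and defaulting to std::locale::classic() is my assumption; pass whichever locale suits your data.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: counts the characters that the given locale
// classifies as digits.
std::size_t CountDigits(const std::wstring& text,
                        const std::locale& loc = std::locale::classic()) {
    std::size_t count = 0;
    for (wchar_t c : text) {
        // This overload is a template on the character type, so no cast
        // to unsigned char is needed (or appropriate) here.
        if (std::isdigit(c, loc)) {
            ++count;
        }
    }
    return count;
}
```

For example, CountDigits(L"abcdefg12345\u4E2D\u6587") counts only the five ASCII digits under the classic locale.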
The isdigit() and related functions / macros in <ctype.h> expect an int converted from an unsigned char, or EOF, which on most systems means a value in the range 0-255 (or -1 for EOF). So any value not in the range -1…255 is incorrect.
Problem 1: You are passing in a char, which on your system has range -128…+127. Solution to this problem is simple:
if (isdigit(static_cast<unsigned char>(c)))
This won't crash, however, it's not quite correct for Chinese characters.
Problem 2: Non-ASCII characters should probably use iswdigit() instead. This will correctly handle Chinese characters:
wstring testString = L"abcdefg12345中文";
int count = 0;
for (const auto &c : testString) {
if (iswdigit(c)) {
++count;
}
}
I have come across something weird in Visual Studio C++ 2013 Community Edition which is either a compiler bug or I'm writing invalid code that does compile without warnings.
Consider the following snippet:
#include <string>
#include <iostream>
int main()
{
std::wstring text;
wchar_t firstChar = 'A';
text = firstChar + L"B";
std::wcout << text << std::endl;
return 0;
}
I expect the output to be "AB" but what I get is random garbage.
My question: is the operator+ between the wchar_t and wchar_t[] ill-defined, or is this a compiler/library bug?
In the case that it is ill-defined, should the compiler have issued a warning?
Note: I am not looking for a solution/workaround. Changing L"B" to std::wstring(L"B") fixes the problem. What I want to know is if I did something wrong or if the compiler did.
Note 2: I also get a garbage result in Visual Studio C++ 2010 and an online compiler which supposedly uses g++, and no compilation error, so I'm leaning towards that code being invalid, even though I would expect a compiler error.
firstChar + L"B" doesn't concatenate the character and the string. The literal L"B" decays to a pointer to its first element, and the addition advances that pointer by the promoted value of firstChar (65 for 'A' in ASCII). The result points far outside the two-element array, so this is undefined behavior: std::basic_string then tries to copy from that address until it finds an L'\0'.
std::basic_string has multiple overloaded operators that give operations like this their intuitive meaning, which is why it works once the literal is wrapped in std::wstring.
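To make the difference concrete, here is a small sketch of a concatenation that does go through std::wstring. Prepend is a hypothetical helper name used only for illustration.

```cpp
#include <string>

// Hypothetical helper: builds a wide string from one character followed by
// a null-terminated wide C string.
std::wstring Prepend(wchar_t first, const wchar_t* rest) {
    // Start from a std::wstring holding the single character, so that the
    // append below is string concatenation rather than pointer arithmetic.
    std::wstring result(1, first);
    result += rest;
    return result;
}
```

Prepend(L'A', L"B") yields L"AB". Writing firstChar + std::wstring(L"B") works for the same reason: at least one operand is a std::wstring, so the overloaded operator+ is selected.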
Before you get started; yes I know this is a duplicate question and yes I have looked at the posted solutions. My problem is I could not get them to work.
bool invalidChar (char c)
{
return !isprint((unsigned)c);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
I tested this method on "Prusæus, Ægyptians," and it did nothing.
I also tried replacing isprint with isalnum.
The real problem occurs when, in another section of my program, I convert string->wstring->string. The string->wstring conversion balks if there are Unicode chars in the string.
Ref:
How can you strip non-ASCII characters from a string? (in C#)
How to strip all non alphanumeric characters from a string in c++?
Edit:
I would still like to remove all non-ASCII chars regardless, but in case it helps, here is where I am crashing:
// Convert to wstring
wchar_t* UnicodeTextBuffer = new wchar_t[ANSIWord.length()+1];
wmemset(UnicodeTextBuffer, 0, ANSIWord.length()+1);
mbstowcs(UnicodeTextBuffer, ANSIWord.c_str(), ANSIWord.length());
wWord = UnicodeTextBuffer; //CRASH
Error Dialog
MSVC++ Debug Library
Debug Assertion Failed!
Program: //myproject
File: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c
Line: //Above
Expression:(unsigned)(c+1)<=256
Edit:
Further compounding the matter: the .txt file I am reading in from is ANSI encoded. Everything within should be valid.
Solution:
bool invalidChar (char c)
{
return !(c>=0 && c <128);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
If someone else would like to copy/paste this, I can check this question off.
EDIT:
For future reference: try using the __isascii and iswascii functions.
At least one problem is in your invalidChar function. It should be:
return !isprint( static_cast<unsigned char>( c ) );
Casting a char to an unsigned is likely to give some very, very big
values if the char is negative (UINT_MAX+1 + c). Passing such a
value to isprint is undefined behavior.
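The arithmetic can be checked with a small sketch. The two helper names are hypothetical, and signed char is used instead of plain char so that the result does not depend on whether char is signed on a given compiler.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "the value 230 mentioned below assumes 8-bit bytes");

// Hypothetical helper: what (unsigned)c does. The value is converted
// straight to unsigned int, so -26 wraps around to UINT_MAX + 1 - 26.
unsigned int CastToUnsigned(signed char c) {
    return static_cast<unsigned int>(c);
}

// Hypothetical helper: what static_cast<unsigned char>(c) does. The value is
// first reduced to 0..255 and only then widened to int.
int CastViaUnsignedChar(signed char c) {
    return static_cast<unsigned char>(c);
}
```

For the byte 0xE6 ('æ' in Latin-1), which is -26 as a signed char, the first helper returns UINT_MAX - 25 while the second returns 230, a value that isprint can accept.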
Another solution that doesn't require defining two functions, using a lambda expression instead (available since C++11):
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), [](char c){return !(c>=0 && c <128);}), str.end());
}
I think it looks cleaner
isprint depends on the locale, so the character in question must be printable in the current locale.
If you want strictly ASCII, check the range for [0..127]. If you want printable ASCII, check the range and isprint.
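A minimal sketch of both checks follows. IsAscii, IsPrintableAscii, and KeepOnly are hypothetical names chosen for illustration; they are not taken from the question.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Strictly ASCII: the byte value lies in [0..127].
bool IsAscii(char c) {
    return static_cast<unsigned char>(c) < 128;
}

// Printable ASCII: range check first, then isprint on the unsigned char value.
bool IsPrintableAscii(char c) {
    return IsAscii(c) && std::isprint(static_cast<unsigned char>(c)) != 0;
}

// Erase-remove idiom: drops every character the predicate rejects.
void KeepOnly(std::string& str, bool (*keep)(char)) {
    str.erase(std::remove_if(str.begin(), str.end(),
                             [keep](char c) { return !keep(c); }),
              str.end());
}
```

KeepOnly(str, IsAscii) removes the bytes that encode 'æ' and 'Æ' but keeps control characters such as tabs, while KeepOnly(str, IsPrintableAscii) removes those as well.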