I'm trying to format a binary array (char* memblock) as a hex string.
When I use the following:
fprintf(out, "0x%02x,", memblock[0]);
I get the following output:
0x7f,
When I try to use boost::format on an ofstream like so:
std::ofstream outFile (path, std::ios::out); //also tried with binary
outFile << (boost::format("0x%02x")%memblock[0]);
I get a weird output like this (seen in Vi): 0x0^?.
What gives?
Given that the character for 0x7f is CTRL-? (DEL), it looks like it's outputting memblock[0] as a character rather than as a hex value, despite your format string.
This actually makes sense based on what I've read in the documentation. Boost::format is a type-safe library where the format specifiers dictate how a variable will be output, but they are limited by the actual type of said variable, which takes precedence.
The documentation states (my bold):
Legacy printf format strings: %spec where spec is a printf format specification.
spec passes formatting options, like width, alignment, numerical base used for formatting numbers, as well as other specific flags. But the classical type-specification flag of printf has a weaker meaning in format.
It merely sets the appropriate flags on the internal stream, and/or formatting parameters, but does not require the corresponding argument to be of a specific type. e.g. : the specification 2$x, meaning "print argument number 2, which is an integral number, in hexa" for printf, merely means "print argument 2 with stream basefield flags set to hex" for format.
And, presumably, having the basefield flag set to hex doesn't make a lot of sense when you're printing a char, so it's ignored. Additionally, from that documentation (though paraphrased a little):
The type-char does not impose the concerned argument to be of a restricted set of types, but merely sets the flags that are associated with this type specification. A type-char of p or x means hexadecimal output but simply sets the hex flag on the stream.
This is also verified more specifically by the text from this link:
My colleagues and I have found, though, that when a %d descriptor is used to print a char variable the result is as though a %c descriptor had been used - printf and boost::format don't produce the same result.
The Boost documentation linked to above also explains that the zero-padding 0 modifier works on all types, not just integral ones, which is why you're getting the second 0 in 0x0^? (the ^? is a single character).
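For a side-by-side illustration, here is a minimal sketch (assuming the usual <cstdio>, <iostream>, and boost/format.hpp includes, and a byte value of 0x7f):

char c = 0x7f;
printf("0x%02x\n", c);                       // printf converts the char to int: prints 0x7f
std::cout << boost::format("0x%02x\n") % c;  // format inserts the char itself: prints 0x0 followed by the DEL character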
In many ways, this is similar to the problem of trying to output a const char * in C++ so that you see a pointer. The following code:
#include <iostream>
int main() {
const char *xyzzy = "plugh";
std::cout << xyzzy << '\n';
std::cout << (void*)xyzzy << '\n';
return 0;
}
will produce something like:
plugh
0x4009a0
because the standard libraries know that C-style strings are a special case but, if you tell them it's a void pointer, they'll give you a pointer as output.
A solution in your specific case may be just to cast your char to an int or some other type that intelligently handles the %x format specifier:
outFile << (boost::format("0x%02x") % static_cast<int>(memblock[0]));
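One caveat worth adding (not part of the original answer): if char is signed on your platform and the byte value is above 0x7f, casting straight to int sign-extends, and you would see something like 0xffffff80 instead of 0x80. Casting through unsigned char first avoids that:

outFile << (boost::format("0x%02x") % static_cast<int>(static_cast<unsigned char>(memblock[0])));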
Related
In C++ Primer, 5th Edition, I saw this, and I tried to use it.
It didn't work: the program's output gave a weird symbol, but signed was totally blank, and the compiler also gave some warnings. But C++ Primer and so many websites say it should work... so I don't think they are giving wrong information. Did I do something wrong?
I am a newbie, btw :)
But C++ primer ... said it should work
No, it doesn't. The quote from C++ Primer doesn't use std::cout at all. The output that you see doesn't contradict what the book says.
So I don't think they give the wrong information
No¹.
did I do something wrong?
It seems that you've possibly misunderstood what the value of a character means, or possibly misunderstood how character streams work.
Character types are integer types (but not all integer types are character types). The values of unsigned char are 0..255 (on systems where the size of a byte is 8 bits). Each² of those values represents some textual symbol. The mapping from a set of values to a set of symbols is called a "character set" or "character encoding".
std::cout is a character stream. << is the stream insertion operator. When you insert a character into a stream, the behaviour is not to show the numerical value. Instead, the behaviour is to show the symbol that the value is mapped to³ in the character set that your system uses. In this case, it appears that the value 255 is mapped to whatever strange symbol you saw on the screen.
If you wish to print the numerical value of a character, what you can do is convert to a non-character integer type and insert that to the character stream:
unsigned char c = 255;  // e.g. the result of unsigned char c = -1
int i = c;              // convert to int (or use static_cast<int>(c) inline)
std::cout << i;         // prints 255, not a symbol
¹ At least, there's no wrong information regarding your confusion. The quote is a bit inaccurate and outdated in the case of c2. Before C++20, the value was "implementation defined" rather than "undefined". Since C++20, the value is actually defined, and it is 0, which is the null terminator character that signifies the end of a string. If you try to print this character, you'll see no output.
² This was a bit of a lie for simplicity's sake. Some characters are not visible symbols. For example, there is the null terminator character as well as other control characters. The situation becomes even more complex in the case of variable-width encodings such as the ubiquitous Unicode, where a symbol may consist of a sequence of several chars. In such an encoding, an individual char cannot necessarily be interpreted correctly without the other chars that are part of the sequence.
³ And this behaviour should feel natural once you grok the purpose of character types. Consider the following program:
unsigned char c = 'a';
std::cout << c;
It would be highly confusing if the output would be a number that is the value of the character (such as 97 which may be the value of the symbol 'a' on the system) rather than the symbol 'a'.
For extra meditation, think about what this program might print (and feel free to try it out):
char c = 57;
std::cout << c << '\n';
int i = c;
std::cout << i << '\n';
c = '9';
std::cout << c << '\n';
i = c;
std::cout << i << '\n';
This is due to the behaviour of the << operator on the char type and the character stream cout. Note that << is known as formatted output, meaning it does some implicit formatting.
We can say that the value of a variable is not the same as its representation in certain contexts. For example:
#include <iostream>

int main() {
    bool t = true;
    std::cout << t << std::endl; // Prints 1, not "true"
}
Think of it this way: why would we need char if it still behaved like a number when printed; why not just use int or unsigned? In essence, we have different types precisely so that they have different behaviors, which can be deduced from those types.
So the underlying numeric value of a char is probably not what we are looking for when we print one.
Check this for example:
#include <iostream>

int main() {
    unsigned char c = -1;
    int i = c;
    std::cout << i << std::endl; // Prints 255
}
If I recall correctly, you're not far in the Primer from the topic of built-in type conversions; things will become clearer once you get to know those rules better. Anyway, I'm sure you will benefit greatly from looking into this article, especially the "Printing chars as integers via type casting" part.
I have a string containing hexadecimal values (two characters representing a byte). I would like to use std::stringstream to make the conversion as painless as possible, so I came up with the following code:
std::string a_hex_number = "e3";
{
    unsigned char x;
    std::stringstream ss;
    ss << std::hex << a_hex_number;
    ss >> x;
    std::cout << x << std::endl;
}
To my great surprise, this prints out "e"... Of course, I don't give up so easily, so I modify the code to be:
{
    unsigned short y;
    std::stringstream ss;
    ss << std::hex << a_hex_number;
    ss >> y;
    std::cout << y << std::endl;
}
This, as expected, prints out 227 ...
I looked at http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/ and http://www.cplusplus.com/reference/ios/hex/ but I just could not find a reference which tells me why this behaviour occurs. (Yes, I feel that it is right, because when extracting into a character it should take one character, but I am a little confused that std::hex is ignored for characters.) Is this situation mentioned somewhere?
(http://ideone.com/YHt7Fz)
Edit: I am specifically interested in whether this behaviour is mentioned anywhere in the C++ standard.
If I understand correctly, you're trying to convert a string in hex to an unsigned char. So for starters, since this is "input", you should be using std::istringstream:

std::istringstream ss( a_hex_number );
ss >> std::hex >> variable;

Beyond that, you want the input to parse the string as an integral value. Streams do not consider character types as numeric values; they read a single character into them (after skipping leading white space). To get a numeric value, you should input to an int, and then convert that to unsigned char. Characters don't have a base, so std::hex is irrelevant for them. (The same thing holds for strings, for example, and even for floating point.)
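A minimal sketch of that approach, reusing a_hex_number from the question (the variable names are mine):

#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::string a_hex_number = "e3";

    std::istringstream ss( a_hex_number );
    int value = 0;
    ss >> std::hex >> value;                          // parse "e3" as an integral value in base 16

    unsigned char x = static_cast<unsigned char>( value );
    std::cout << value << std::endl;                  // prints 227
    std::cout << static_cast<int>( x ) << std::endl;  // prints 227 again, now held in a char-sized type
}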
With regards to the page you cite: the page doesn't mention inputting into a character type (strangely enough, because it does talk about all other types, including some very special cases). The documentation for the std::hex manipulator is also weak: in the running text, it only says that "extracted values are also expected to be in hexadecimal base", which isn't really correct; in the table, however, it clearly talks about "integral values". In the standard, this is documented in §27.7.2.2.3. (The >> operators for character types are not member functions, but free functions, so they are defined in a different section.) What we are missing, however, is a good document which synthesizes these sorts of things: whether the >> operator is a member or a free function doesn't really affect the user much; you want to see all of the >> overloads available, with their semantics, in one place.
Let's put it simply: the variable's type is 'stronger' than 'hex'. That's why 'hex' is ignored for a 'char' variable.
Longer story:
'hex' modifies the internal state of the stringstream object, telling it how to treat subsequent operations on integers. However, this does not apply to chars.
When you print out a character (i.e. unsigned char), it's printed as a character, not as a number.
I am writing and reading string and int values using a file-backed QSettings object.
When I later try to read the values from a different process, the values are read as strings instead of int.
This is the code I am using to write values:
QSettings settings("TestQSettings.ini", QSettings::IniFormat);
settings.setValue("AAA",QString("111"));
settings.setValue("BBB",222);
This is the file created:
[General]
AAA=111
BBB=222
This is the code I am using to read values:
QVariant qvar = settings.value("AAA");
std::cout << "AAA type " << qvar.type() << std::endl;
qvar = settings.value("BBB");
std::cout << "BBB type " << qvar.type() << std::endl;
If I run this code from the same process:
AAA type 10
BBB type 2
If I run this code from a different process:
AAA type 10
BBB type 10
I know it's possible to convert the types after they have been read.
Unfortunately, that solution would require modifying legacy cross-platform Windows code which I prefer not to touch, for example code making multiple calls to RegQueryValueEx().
Is it possible to store and read the type information for strings and integers?
For example, Strings will have quotes "" and integers will not:
[General]
AAA="111"
BBB=222
This problem is present on both Qt 4 and Qt 5, on Linux.
Whoa whoa, are you using .ini files or the registry?
With .ini files it's obviously impossible to know what the type was, since it's all a string. You can attempt conversion of the variant to an integer (don't use canConvert!), and assume it's an integer if it converts into one.
With the registry, QSettings will work as you expect it to.
I really don't see what the problem is. Don't use .ini files if you wish to retain type information. You'd face exactly the same problems if you wrote the code by hand in a platform-dependent manner.
You can explicitly write quoted strings into the .ini files, and check for presence of quotes when reading them back. If the quotes are not present, you can try conversion to an integer.
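A rough sketch of how those two ideas might look in code (the helper names and the exact quoting convention are mine, not from this answer):

// Write strings with explicit quotes; leave other types alone.
void writeIniValue(QSettings& settings, const QString& key, const QVariant& value)
{
    if (value.type() == QVariant::String)
        settings.setValue(key, "\"" + value.toString() + "\"");
    else
        settings.setValue(key, value);
}

// Read a value back: quoted means string, otherwise attempt an integer conversion.
QVariant readIniValue(const QSettings& settings, const QString& key)
{
    const QString raw = settings.value(key).toString();
    if (raw.size() >= 2 && raw.startsWith('"') && raw.endsWith('"'))
        return raw.mid(1, raw.size() - 2);   // strip the quotes, keep it a string

    bool ok = false;
    const int asInt = raw.toInt(&ok);        // attempt conversion instead of canConvert()
    return ok ? QVariant(asInt) : QVariant(raw);
}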
I solved this problem for a component which needs to save and restore variants of arbitrary type, without knowing what its clients expect. The solution was to store the variant's typeName() alongside each value:
void store(QSettings& settings, const QString& key, const QVariant& value)
{
settings.setValue(key+"value", value);
settings.setValue(key+"type", value.typeName());
}
When reading back, we also read the type name, and convert() the variant if it's not already the correct type, before returning it.
QVariant retrieve(const QSettings& settings, const QString& key)
{
    auto value = settings.value(key+"value");
    const auto typeName = settings.value(key+"type").toString();
    const bool wasNull = value.isNull(); // NOTE 1
    const auto t = QMetaType::type(typeName.toUtf8()); // NOTE 2
    if (value.userType() != t && !value.convert(t) && !wasNull) {
        // restore value that was cleared by the failed convert()
        value = settings.value(key+"value");
        qWarning() << "Failed to convert value" << value << "to" << typeName;
    }
    return value;
}
Notes
The wasNull variable is in there because of this niggle of convert():
Warning: For historical reasons, converting a null QVariant results in a null value of the desired type (e.g., an empty string for QString) and a result of false.
In this case, we need to ignore the misleading return value, and keep the successfully-converted null variant of the correct type.
It's not clear that UTF-8 is the correct encoding for QMetaType names (perhaps local 8-bit is assumed?); my types are all ASCII, so I just use toLatin1() instead, which might be faster. If it were an issue, I'd use QString::fromLatin1 in the store() method (instead of implicit char* to QString conversion), to ensure a clean round-trip.
If the type name is not found, t will be QMetaType::UnknownType; that's okay, because convert() will then fail, and we'll return the unconverted variant (or a null). It's not great, but it's a corner case that won't happen in normal usage, and my system will recover reasonably quickly.
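For illustration, calling the pair might look like this (the key string is just an example; note that store() simply appends "value" and "type" to whatever key you pass):

QSettings settings("TestQSettings.ini", QSettings::IniFormat);
store(settings, "BBB/", 222);              // writes BBB/value and BBB/type
QVariant v = retrieve(settings, "BBB/");   // v.type() is QVariant::Int, even in another process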
Turns out the solution was very simple.
When values are written to the INI file, the type is known, so I append the postfix "\"STRING to the value right before calling setValue().
When values are read back from the INI file, I check whether string values have the above postfix. If they do, I chop the postfix off; if they don't, I assume they are integers instead of strings.
Works like a charm!
Thanks to you all, and especially @Kuba Ober for practically handing me the solution.
I need to read input from the user. The input value may be a string or an int.
If the value is an int, the program inserts the value into my object.
Else, if the value is a string, it should check the value of that string; if it's "end", the program ends.
Halda h; // my object
string t;
int tint;
bool end = false;
while (end != true)
{
    if (scanf("%d", &tint) == 1)
    {
        h.insert(tint);
    }
    else if (scanf("%s", t) == 1)
    {
        if (t == "end")
            end = true;
        else if (t == "next")
            if (h.empty() == false)
                printf("%d\n", h.pop());
            else
                printf("-1\n");
    }
}
The problem is that scanning the string doesn't seem to work properly.
I've tried changing it to if (cin >> t) and that worked well.
But I need to get it to work with scanf.
The specifier %s in the scanf() format expects a char*, not a std::string.
From the C11 Standard (the C++ Standard refers to it for the C standard library):
Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
Anyway, here there's no real reason to prefer the C way; use the C++ facilities. And when you do use the C library, use it safely: only read characters up to a given limit (as fgets does, or scanf with a width specifier). Otherwise you can overflow the buffer, which again leads to undefined behavior, and to visible errors only if you're lucky.
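For example, a bounded read into a plain char buffer might look like this (a sketch, not the asker's code):

char buf[64];
if (scanf("%63s", buf) == 1) {  // the width of 63 leaves room for the terminating '\0'
    std::string t(buf);         // copy into a std::string if one is needed
}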
That's a really bad way to check for end-of-input. Either use an integer or use a string.
If you choose string, make provisions to convert from string to int.
My logic would be to first check whether the token can be converted to an integer. If it can't be (for example, if it's a float or a double or some other string), ignore it and move on. If it can be, insert it into the Halda object.
Sidenote: Do not use scanf() and printf() when you're working with C++.
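Here is a sketch of that logic using C++ streams and std::stoi; the Halda interface (insert, empty, pop) is assumed from the question's code:

#include <iostream>
#include <stdexcept>
#include <string>

int main() {
    Halda h;            // assumed: provides insert(int), empty(), pop()
    std::string t;
    while (std::cin >> t) {
        if (t == "end")
            break;
        if (t == "next") {
            std::cout << (h.empty() ? -1 : h.pop()) << '\n';
            continue;
        }
        try {
            h.insert(std::stoi(t));      // the token converts to an integer: insert it
        } catch (const std::exception&) {
            // not an integer (or out of range): ignore and move on
        }
    }
}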
Assuming string refers to std::string, this program doesn't have defined behavior. You can't really use a std::string with scanf(). You could set up a buffer inside the std::string and read into that, but the string wouldn't change its size. You are probably better off using streams with std::string (well, in my opinion you are always better off using streams).
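For completeness, the buffer-inside-the-string idea mentioned above could look like this sketch; it relies on the contiguous storage std::string guarantees since C++11, and the manual resize() is exactly the wart being described:

#include <cstdio>
#include <cstring>
#include <string>

int main() {
    std::string t(64, '\0');              // pre-size the string's buffer
    if (std::scanf("%63s", &t[0]) == 1)   // read into the string's storage
        t.resize(std::strlen(t.c_str())); // the size doesn't change by itself; fix it by hand
}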
My understanding is that string is a member of the std namespace, so why does the following occur?
#include <iostream>
int main()
{
    using namespace std;
    string myString = "Press ENTER to quit program!";
    cout << "Come up and C++ me some time." << endl;
    printf("Follow this command: %s", myString);
    cin.get();
    return 0;
}
Each time the program runs, myString prints a seemingly random string of 3 characters, such as in the output above.
C++23 Update
We now finally have std::print as a way to use std::format for output directly:
#include <print>
#include <string>
int main() {
    // ...
    std::print("Follow this command: {}", myString);
    // ...
}
This combines the best of both approaches.
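If you're on C++20 rather than C++23, the same formatting is available through std::format combined with std::cout (a small sketch):

#include <format>
#include <iostream>
#include <string>

int main() {
    std::string myString = "Press ENTER to quit program!";
    std::cout << std::format("Follow this command: {}", myString);
}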
Original Answer
It's compiling because printf isn't type safe, since it uses variable arguments in the C sense[1]. printf has no option for std::string, only a C-style string. Using something else in place of what it expects definitely won't give you the results you want. It's actually undefined behaviour, so anything at all could happen.
The easiest way to fix this, since you're using C++, is printing it normally with std::cout, since std::string supports that through operator overloading:
std::cout << "Follow this command: " << myString;
If, for some reason, you need to extract the C-style string, you can use the c_str() method of std::string to get a const char * that is null-terminated. Using your example:
#include <iostream>
#include <string>
#include <stdio.h>
int main()
{
    using namespace std;
    string myString = "Press ENTER to quit program!";
    cout << "Come up and C++ me some time." << endl;
    printf("Follow this command: %s", myString.c_str()); //note the use of c_str
    cin.get();
    return 0;
}
If you want a function that is like printf, but type safe, look into variadic templates (C++11, supported on all major compilers as of MSVC12). You can find an example of one here. There's nothing I know of implemented like that in the standard library, but there might be in Boost, specifically boost::format.
[1]: This means that you can pass any number of arguments, but the function relies on you to tell it the number and types of those arguments. In the case of printf, that means a string with encoded type information like %d meaning int. If you lie about the type or number, the function has no standard way of knowing, although some compilers have the ability to check and give warnings when you lie.
Please don't use printf("%s", your_string.c_str());
Use cout << your_string; instead. Short, simple and typesafe. In fact, when you're writing C++, you generally want to avoid printf entirely -- it's a leftover from C that's rarely needed or useful in C++.
As to why you should use cout instead of printf, the reasons are numerous. Here's a sampling of a few of the most obvious:
As the question shows, printf isn't type-safe. If the type you pass differs from that given in the conversion specifier, printf will try to use whatever it finds on the stack as if it were the specified type, giving undefined behavior. Some compilers can warn about this under some circumstances, but some compilers can't/won't at all, and none can under all circumstances.
printf isn't extensible. You can only pass primitive types to it. The set of conversion specifiers it understands is hard-coded in its implementation, and there's no way for you to add more. Most well-written C++ uses those primitive types primarily to implement types oriented toward the problem being solved, and iostreams let you make such types printable yourself by overloading operator<< (see the sketch at the end of this answer).
It makes decent formatting much more difficult. For an obvious example, when you're printing numbers for people to read, you typically want to insert thousands separators every few digits. The exact number of digits and the characters used as separators varies, but cout has that covered as well. For example:
std::locale loc("");
std::cout.imbue(loc);
std::cout << 123456.78;
The nameless locale (the "") picks a locale based on the user's configuration. Therefore, on my machine (configured for US English) this prints out as 123,456.78. For somebody who has their computer configured for (say) Germany, it would print out something like 123.456,78. For somebody with it configured for India, it would print out as 1,23,456.78 (and of course there are many others). With printf I get exactly one result: 123456.78. It is consistent, but it's consistently wrong for everybody everywhere. Essentially the only way to work around it is to do the formatting separately, then pass the result as a string to printf, because printf itself simply will not do the job correctly.
Although they're quite compact, printf format strings can be quite unreadable. Even among C programmers who use printf virtually every day, I'd guess at least 99% would need to look things up to be sure what the # in %#x means, and how that differs from what the # in %#f means (and yes, they mean entirely different things).
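To illustrate the extensibility point above, here is a minimal sketch of a user-defined type that std::cout can print directly, which printf has no conversion specifier for:

#include <iostream>

struct Point {
    double x;
    double y;
};

// Teach iostreams how to print a Point; printf cannot be extended this way.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

int main() {
    Point p{1.5, -2.0};
    std::cout << "p = " << p << '\n';   // prints: p = (1.5, -2)
}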
Use myString.c_str() if you want a C-style string (const char*) to use with printf.
Use std::printf and c_str()
example:
std::printf("Follow this command: %s", myString.c_str());
You can use snprintf to determine the number of characters needed and then allocate a buffer of the right size.
int length = std::snprintf(nullptr, 0, "There can only be %i\n", 1 );
char* str = new char[length+1]; // one more character for null terminator
std::snprintf( str, length + 1, "There can only be %i\n", 1 );
std::string cppstr( str );
delete[] str;
This is a minor adaptation of an example on cppreference.com.
printf accepts a variable number of arguments. Those can only have Plain Old Data (POD) types. Code that passes anything other than POD to printf compiles only because the compiler assumes you got your format string right. %s means that the corresponding argument is supposed to be a pointer to a char. In your case it is a std::string, not a const char*. printf does not know this, because the argument's type information is lost and is supposed to be recovered from the format parameter. When that std::string argument is treated as a const char*, the resulting pointer points to some irrelevant region of memory instead of your desired C string. For that reason your code prints out gibberish.
While printf is an excellent choice for printing formatted text (especially if you intend to have padding), it can be dangerous if you haven't enabled compiler warnings. Always enable warnings, because mistakes like this then become easy to avoid. There is no reason to use the clumsy std::cout mechanism if the printf family can do the same task in a much faster and prettier way. Just make sure you have enabled all warnings (-Wall -Wextra) and you will be good. In case you use your own custom printf-style implementation, you should declare it with the __attribute__ mechanism, which enables the compiler to check the format string against the parameters provided.
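As a sketch of that attribute in use (GCC/Clang syntax; the function name here is made up):

#include <cstdarg>
#include <cstdio>

// A custom printf-style logger; the attribute lets the compiler check the
// format string against the arguments at every call site.
__attribute__((format(printf, 1, 2)))
void log_message(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::vfprintf(stderr, fmt, args);
    va_end(args);
}

int main() {
    log_message("%d items\n", 42);     // fine
    // log_message("%s items\n", 42);  // would trigger a -Wformat warning
}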
The main reason is probably that a C++ string is a struct that includes a current-length value, not just the address of a sequence of chars terminated by a 0 byte. Printf and its relatives expect to find such a sequence, not a struct, and therefore get confused by C++ strings.
Speaking for myself, I believe that printf has a place that can't easily be filled by C++ syntactic features, just as table structures in HTML have a place that can't easily be filled by divs. As Dijkstra wrote later about the goto, he didn't intend to start a religion and was really only arguing against using it as a kludge to make up for poorly designed code.
It would be quite nice if the GNU project would add the printf family to their g++ extensions.
printf is actually pretty good to use if size matters. If you are running a program where memory is an issue, printf is a very good and underrated solution. cout essentially shifts bits around to make room for the string, while printf just takes in parameters and prints them to the screen. If you compile a simple hello-world program, the printf version can come in at less than 60,000 bits, as opposed to the cout version, which would take over 1 million bits.
For your situation, I'd suggest using cout simply because it is much more convenient to use, although I would argue that printf is a good thing to know.
Here's a generic way of doing it (note that the abbreviated auto parameters below require C++20).
#include <string>
#include <stdio.h>
auto print_helper(auto const & t) {
    return t;
}

auto print_helper(std::string const & s) {
    return s.c_str();
}

std::string four() {
    return "four";
}

template<class ... Args>
void print(char const * fmt, Args&& ...args) {
    printf(fmt, print_helper(args) ...);
}

int main() {
    std::string one {"one"};
    char const * three = "three";
    print("%c %d %s %s, %s five", 'c', 3 + 4, one + " two", three, four());
}