Binary data as command line argument - c++

I have a simple c++ program (and a similar one for c) that just prints out the first argument
#include <iostream>
int main(int argc, char** argv)
{
if(argc > 1)
std::cout << ">>" << argv[1] << "<<\n";
}
I can pass binary data as an argument (I have tried this in bash), like so:
$ ./a.out $(printf "1\x0123")
>>1?23<<
If I try to pass a null there, I get
$ ./a.out $(printf "1\x0023")
bash: warning: command substitution: ignored null byte in input
>>123<<
Clearly bash(?) does not allow this
But is it possible to send a null as a command line argument this way?
Do either c or c++ put any restrictions on this?
Edit: I am not using this in day-to-day c++, this question is just out of curiosity

This answer is written in C, but can be compiled as C++ and works the same in both. I quote from the C11 standard; there are equivalent definitions in the C++ standards.
There isn't a good way to pass null bytes to a program's arguments
C11 §5.1.2.2.1 Program startup:
If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.
C11 §7.1.1 Definitions of terms
A string is a contiguous sequence of characters terminated by and including the first null character.
That means that each argument passed to main() in argv is a null-terminated string. There is no reliable data after the null byte at the end of the string — searching there would be accessing out of bounds of the string.
So, as noted at length in the comments to the question, it is not possible in the ordinary course of events to get null bytes to a program via the argument list because null bytes are interpreted as being the end of each argument.
By special agreement
That doesn't leave much wriggle room. However, if both the calling/invoking program and the called/invoked program agree on the convention, then, even with the limitations imposed by the standards, you can pass arbitrary binary data, including arbitrary sequences of null bytes, to the invoked program — up to the limits on the length of an argument list imposed by the implementation.
The convention has to be along the lines of:
All arguments (except argv[0], which is ignored, and the last argument, argv[argc-1]) consist of a stream of non-null bytes followed by a null.
If you need adjacent nulls, you have to provide empty arguments on the command line.
If you need trailing nulls, you have to provide empty arguments as the last arguments on the command line.
This could lead to a program such as (null19.c):
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void hex_dump(const char *tag, size_t size, const char *buffer);
int main(int argc, char **argv)
{
if (argc < 2)
{
fprintf(stderr, "Usage: %s arg1 [arg2 '' arg4 ...]\n", argv[0]);
exit(EXIT_FAILURE);
}
size_t len_args = 0;
for (int i = 1; i < argc; i++)
len_args += strlen(argv[i]) + 1;
char buffer[len_args];
size_t offset = 0;
for (int i = 1; i < argc; i++)
{
size_t arglen = strlen(argv[i]) + 1;
memmove(buffer + offset, argv[i], arglen);
offset += arglen;
}
assert(offset != 0);
offset--;
hex_dump("Argument list", offset, buffer);
return 0;
}
static inline size_t min_size(size_t x, size_t y) { return (x < y) ? x : y; }
static void hex_dump(const char *tag, size_t size, const char *buffer)
{
printf("%s (%zu):\n", tag, size);
size_t offset = 0;
while (size != 0)
{
printf("0x%.4zX:", offset);
size_t count = min_size(16, size);
for (size_t i = 0; i < count; i++)
printf(" %.2X", buffer[offset + i] & 0xFF);
putchar('\n');
size -= count;
offset += count;
}
}
This could be invoked using:
$ ./null19 '1234' '5678' '' '' '' '' 'def0' ''
Argument list (19):
0x0000: 31 32 33 34 00 35 36 37 38 00 00 00 00 00 64 65
0x0010: 66 30 00
$
The first argument is deemed to consist of 5 bytes — four digits and a null byte. The second is similar. The third through sixth arguments each represent a single null byte (it gets painful if you need large numbers of contiguous null bytes), then there is another string of five bytes (three letters, one digit, one null byte). The last argument is empty but ensures that there is a null byte at the end. If omitted, the output would not include that final terminal null byte.
$ ./null19 '1234' '5678' '' '' '' '' 'def0'
Argument list (18):
0x0000: 31 32 33 34 00 35 36 37 38 00 00 00 00 00 64 65
0x0010: 66 30
$
This is the same as before except there is no trailing null byte in the data. The two examples in the question are easily handled:
$ ./null19 $(printf "1\x0123")
Argument list (4):
0x0000: 31 01 32 33
$ ./null19 1 23
Argument list (4):
0x0000: 31 00 32 33
$
This works strictly within the standard assuming only that empty strings are recognized as valid arguments. In practice, those arguments are already contiguous in memory so it might be possible on many platforms to avoid the copying phase into the buffer. However, the standard does not stipulate that the argument strings are laid out contiguously in memory.
If you need multiple arguments with binary data, you can modify the convention. For example, you could take a control argument of a string which indicates how many subsequent physical arguments make up one logical binary argument.
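A minimal sketch of that modified convention, written as C++ for illustration. The helper name join_binary_arg and its parameters are hypothetical; it assumes the caller has already read the control argument and knows how many physical arguments belong to the logical one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: rebuilds one logical binary argument from `count`
// physical arguments starting at argv[first]. A null byte is placed between
// consecutive pieces, so an empty piece contributes a lone null byte.
std::string join_binary_arg(char **argv, int first, int count)
{
    assert(argv != nullptr && first >= 0 && count >= 0);
    std::string out;
    for (int i = 0; i < count; ++i)
    {
        if (i != 0)
            out.push_back('\0');   // separator implied by the convention
        out += argv[first + i];    // copies up to, not including, the terminator
    }
    return out;
}
```

The result is a std::string because it tracks its own length, so the embedded null bytes survive.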
All this relies on the programs interpreting the argument list as agreed. It is not really a general solution.

Related

How should I fix valgrind's uninitialised value error?

I have written a small application which works at some point with binary data. In unit tests, I compare this data with the expected one. When an error occurs, I want the test to display the hexadecimal output such as:
Failure
Expected: string_to_hex(expected, 11)
Which is: "01 43 02 01 00 65 6E 74 FA 3E 17"
To be equal to: string_to_hex(writeBuffer, 11)
Which is: "01 43 02 01 00 00 00 00 98 37 DB"
In order to display that (and to compare binary data in the first place), I used the code from Stack Overflow, slightly modifying it for my needs:
std::string string_to_hex(const std::string& input, size_t len)
{
static const char* const lut = "0123456789ABCDEF";
std::string output;
output.reserve(2 * len);
for (size_t i = 0; i < len; ++i)
{
const unsigned char c = input[i];
output.push_back(lut[c >> 4]);
output.push_back(lut[c & 15]);
}
return output;
}
When checking for memory leaks with valgrind, I found a lot of errors such as this one:
Use of uninitialised value of size 8
at 0x11E75A: string_to_hex(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long)
I'm not sure I understand it. First, everything seems to be initialized, including, unless I'm mistaken, output. Moreover, there is no mention of size 8 in the code; the value of len varies from test to test, while valgrind reports the same size 8 every time.
How should I fix this error?
So this is one of those cases where passing a char pointer to a buffer filled with arbitrary binary data into the evil implicit constructor of std::string truncates the string at the first \0, so string_to_hex ends up reading past the end of the shortened string. The straightforward approach would be to pass a raw pointer, but a better solution is to start using array_view, span, or a similar utility class that provides index validation, at least in debug builds, for both input and lut.
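A minimal sketch of the raw-pointer approach, assuming the caller knows the buffer length. The name bytes_to_hex is hypothetical; the body is the question's loop with the std::string parameter replaced by a pointer and a length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Variant of string_to_hex that takes the raw bytes and their length
// directly, so no std::string is ever built from a bare char pointer.
std::string bytes_to_hex(const char *data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    static const char *const lut = "0123456789ABCDEF";
    std::string output;
    output.reserve(2 * len);
    for (std::size_t i = 0; i < len; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        output.push_back(lut[c >> 4]);
        output.push_back(lut[c & 15]);
    }
    return output;
}
```

If the function has to keep taking a std::string, constructing it as std::string(buffer, len) rather than std::string(buffer) also keeps the bytes after the first \0.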

Reading value from buffer efficiently

I have a std::vector<char> buffer in memory with a number at a specific offset, e.g.
00 00 00 00 00 00 00 00 00 33 2E 31 34 99 99 99 .........3.14™™™
I know the end and start offset to read the double/float value, but right now I'm copying the relevant part with std::copy() into a std::string and then calling std::stod. My question is: how can I make this faster?
There must be a way to avoid the copy.. for instance: can I point a stream to a specific offset in another buffer? Or something similar perhaps
If the numbers were delimited, then using strtod directly on the buffer like Let_Me_Be suggests is efficient. However, since the numbers are not delimited, you cannot use strtod directly.
If the buffer is zero (or eof) terminated, then you can simply modify it, by adding the terminator after the number, and then restore the original character, like bolov suggested. Since the end offset is part of the number, there's always at least the terminator after it, so offset_end won't overflow. The following code assumes that offset_end is one past the last character. If it's the last character, then simply use + 1.
auto original = data[offset_end];
data[offset_end] = '\0';
auto result = strtod(&data[offset_start], nullptr);
data[offset_end] = original;
Even, if the buffer is not terminated, you can still do that, but only if the number is not at the very end. If it is, or if you don't know where the buffer ends, or the buffer is const, then your current solution is as efficient as it gets.
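Putting the two cases together, a sketch could look like the following. The function name parse_double_at is hypothetical, end is taken to be one past the last character of the number, and the buffer is assumed to be writable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: parses the number stored in data[start, end).
double parse_double_at(std::vector<char> &data, std::size_t start, std::size_t end)
{
    assert(start < end && end <= data.size());
    if (end == data.size())
    {
        // The number sits at the very end, so there is no byte to overwrite:
        // fall back to copying it into a terminated string.
        const std::string copy(data.data() + start, end - start);
        return std::strtod(copy.c_str(), nullptr);
    }
    const char original = data[end];
    data[end] = '\0';                                   // temporary terminator
    const double result = std::strtod(&data[start], nullptr);
    data[end] = original;                               // restore the buffer
    return result;
}
```

Only the end-of-buffer case pays for a copy; everywhere else the buffer is patched and restored in place.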
If you know the offset then simply:
vector<char> data;
// ... snip ...
char *endp = nullptr;
double result = strtod(&data[offset],&endp);
Note: This assumes that the number is followed by non-numeric characters (or end of string).

Weird result (`\210`) when printing the end of a char array

My code is like this:
int main(int argc, char *argv[])
{
char ca[] = {'0'};
cout << *ca << endl;
cout << *(ca+1) << endl;
cout << ca[1] << endl;
cout << (char)(0) << endl;
return 0;
}
The result is like this:
0
\210
\210
^#
From this thread, I learned that ^# is actually the same as \0. However, \210 does not seem to be, as hexdump shows:
bash-3.2$ ./playground | hexdump -C
00000000 30 0a 88 0a 88 0a 00 0a |0.......|
00000008
It can be seen clearly that \210 is 88 instead of 00.
As I understood it, ca+1 should point to a null terminator, which is \0. So why does cout << *(ca+1) << endl; give me \210 as the result?
Because you have to manually add the null terminator when declaring a character array. If you make it a string (such as in char myString[] = "hi"), then it will add a null terminator. But if you make it an array, with the braces, it will not.
As for the 0x88 byte, it just happened to be the next byte in RAM for whatever reason.
In any valid C program, string literals are always null terminated. Here you are initializing the individual elements of a character array with list-initialization syntax, not from a string literal. As this is a fixed-size array declared within the same function, you can confirm this with the sizeof operator.
Doing sizeof(ca) should give you 1, i.e. a one-character array. However, if you had written something like char ca[] = "0"; then sizeof(ca) would give you 2, i.e. the character '0' and the null terminator. As aaaaaa123456789 mentioned, the output you are getting is just whatever byte happens to be next in memory. If you run this at a different time, you may see different output, or your program may crash; reading an invalid location can cause any kind of runtime anomaly.
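The sizeof check can be written down directly. This is only a sketch; the second array name cs and the helper is_terminated are made up for the comparison.

```cpp
#include <cassert>
#include <cstddef>

// Brace initialisation with a single character: one element, no terminator.
const char ca[] = {'0'};
// String-literal initialisation: the character plus the implicit '\0'.
const char cs[] = "0";

static_assert(sizeof(ca) == 1, "brace-initialised array holds only '0'");
static_assert(sizeof(cs) == 2, "string literal adds a terminating null");

// Hypothetical helper: true when the last element of the array is '\0'.
// It only looks inside the array, so it never reads out of bounds.
template <std::size_t N>
bool is_terminated(const char (&arr)[N])
{
    return arr[N - 1] == '\0';
}
```

Reading ca[1], as the question does, is outside the array, which is why the printed byte is arbitrary.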

how values are stored in char

I am adding values into the combo box as a string. Below is my code.
Platform Windows XP and I am using Microsoft Visual Studio 2003
language C++
error encountered -> "Run-Time Check Failure #2 - Stack around the variable 'buffer' was corrupted."
If I increase the size of the buffer to say 4 and above then I won't get this error.
My question is not related to how to fix that error, but I am wondering why I got this error if buffer size = 2.
By my logic, a buffer size of 2 should be enough: buffer[0] stores the value and buffer[1] the null terminating character.
Now since char can store values from 0 to 255 , I thought this should be ok as my inserted values are from 1 to 63 and then from 183 to 200.
CComboBox m_select_combo;
const unsigned int max_num_of_values = 63;
m_select_combo.AddString( "ALL" );
for( unsigned int i = 1; i <= max_num_of_values ; ++i )
{
char buffer[2];
std::string prn_select_c = itoa( i, buffer, 10 );
m_select_combo.AddString( prn_select_c.c_str() );
}
const unsigned int max_num_of_high_sats = 202 ;
for( unsigned int i = 183; i <= max_num_of_high_sats ; ++i )
{
char buffer[2];
std::string prn_select_c = itoa( i, buffer, 10 );
m_select_combo.AddString( prn_select_c.c_str() );
}
Could you guys please give me an idea as to what I'm not understanding?
itoa() zero-terminates its output, so when you call itoa(63, char[2], 10) it writes three characters: 6, 3 and the terminating \0. But your buffer is only two characters long.
itoa() function is best avoided in favour of snprintf() or boost::lexical_cast<>().
You should read the documentation for itoa.
Consider the following loop:
for( unsigned int i = 183; i <= max_num_of_high_sats ; ++i )
{
char buffer[2];
std::string prn_select_c = itoa( i, buffer, 10 );
m_select_combo.AddString( prn_select_c.c_str() );
}
The first iteration converts the integer 183 to the 3 character string "183", plus a terminating null character. That's 4 bytes, which you are trying to cram into a two byte array. The docs tell you specifically to make sure your buffer is large enough to hold any value; in this case it should be at least the number of digits in max_num_of_high_sats long, plus one for the terminating null.
You might as well make it large enough to hold the maximum value you can store in an unsigned int, which would be 11 (eg. 10 digits for 4294967295 plus a terminating null).
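As a sketch of the snprintf() route with a buffer sized for any unsigned int, something like this could replace the itoa() call. The name to_decimal is hypothetical, and the buffer size is derived from numeric_limits rather than hard-coded as 11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical replacement for the itoa() call in the loops above.
std::string to_decimal(unsigned int value)
{
    // digits10 is the number of decimal digits that always fit; the largest
    // value needs one more digit, and one byte is for the terminating null.
    char buffer[std::numeric_limits<unsigned int>::digits10 + 2];
    const int written = std::snprintf(buffer, sizeof buffer, "%u", value);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buffer);
    return std::string(buffer, static_cast<std::size_t>(written));
}
```

Unlike itoa(), snprintf() is told the buffer size, so it cannot write past the end even if the size estimate is wrong.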
The itoa function converts an int to a C-style string, using the base given by the 3rd parameter.
For example, it is like printing the int 63 with printf: you need two ASCII bytes, one to store the character 6 and the other to store the character 3, and the third byte should be the null terminator. In your case the largest int has three digits, so you need 4 bytes in the string.
You are converting an integer to ASCII, that is what itoa does. If you have a number like 183 that is four chars as a string, '1', '8', '3', '\0'.
Each character takes one byte, for example character '1' is the value 0x31 in ASCII.

Using bitwise operators in C++ to change 4 chars to int

What I must do is open a file in binary mode that contains stored data that is intended to be interpreted as integers. I have seen other examples such as Stackoverflow-Reading “integer” size bytes from a char* array. but I want to try taking a different approach (I may just be stubborn, or stupid :/). I first created a simple binary file in a hex editor that reads as follows.
00 00 00 47 00 00 00 17 00 00 00 41
This (should) equal 71, 23, and 65 if the 12 bytes were divided into 3 integers.
After opening this file in binary mode and reading 4 bytes into an array of chars, how can I use bitwise operations to make char[0] bits be the first 8 bits of an int and so on until the bits of each char are part of the int.
My integer = 00 00 00 00
+ ^ ^ ^ ^
Chars Char[0] Char[1] Char[2] Char[3]
00 00 00 47
So my integer(hex) = 00 00 00 47 = numerical value of 71
Also, I don't know how the endianness of my system comes into play here, so is there anything that I need to keep in mind?
Here is a code snippet of what I have so far, I just don't know the next steps to take.
std::fstream myfile;
myfile.open("C:\\Users\\Jacob\\Desktop\\hextest.txt", std::ios::in | std::ios::out | std::ios::binary);
if(myfile.is_open() == false)
{
std::cout << "Error" << std::endl;
}
char* mychar;
std::cout << myfile.is_open() << std::endl;
mychar = new char[4];
myfile.read(mychar, 4);
I eventually plan on dealing with reading floats from a file and maybe a custom data type eventually, but first I just need to get more familiar with using bitwise operations.
Thanks.
You want the bitwise left shift operator:
typedef unsigned char u8; // in case char is signed by default on your platform
unsigned num = ((u8)chars[0] << 24) | ((u8)chars[1] << 16) | ((u8)chars[2] << 8) | (u8)chars[3];
What it does is shift the left argument a specified number of bits to the left, adding zeros from the right as stuffing. For example, 2 << 1 is 4, since 2 is 10 in binary and shifting one to the left gives 100, which is 4.
This can be written in a more general loop form:
unsigned num = 0;
for (int i = 0; i != 4; ++i) {
num |= (u8)chars[i] << (24 - i * 8); // += could have also been used
}
The endianness of your system doesn't matter here; you know the endianness of the representation in the file, which is constant (and therefore portable), so when you read in the bytes you know what to do with them. The internal representation of the integer in your CPU/memory may be different from that of the file, but the logical bitwise manipulation of it in code is independent of your system's endianness; the least significant bits are always at the right, and the most at the left (in code). That's why shifting is cross-platform -- it operates at the logical bit level :-)
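For illustration, the shifting can be wrapped in a small function and checked against the bytes from the question. The name read_be32 is hypothetical; the sketch accumulates into an unsigned 32-bit value so the shifts never touch a sign bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: combines four bytes, most significant byte first,
// which is how the integers are laid out in the question's file.
std::uint32_t read_be32(const char *bytes)
{
    assert(bytes != nullptr);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        value = (value << 8) | static_cast<unsigned char>(bytes[i]);
    }
    return value;
}
```

Calling it on the 12 bytes at offsets 0, 4 and 8 gives 71, 23 and 65 on any host, whatever its byte order.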
Have you thought of using Boost.Spirit to make a binary parser? You might hit a bit of a learning curve when you start, but if you want to expand your program later to read floats and structured types, you'll have an excellent base to start from.
Spirit is very well-documented and is part of Boost. Once you get around to understanding its ins and outs, it's really mind-boggling what you can do with it, so if you have a bit of time to play around with it, I'd really recommend taking a look.
Otherwise, if you want your binary to be "portable" - i.e. you want to be able to read it on a big-endian and a little-endian machine, you'll need some sort of byte-order mark (BOM). That would be the first thing you'd read, after which you can simply read your integers byte by byte. Simplest thing would probably be to read them into a union (if you know the size of the integer you're going to read), like this:
union U
{
unsigned char uc_[4];
unsigned long ui_;
};
Read the data into the uc_ member, swap the bytes around if you need to change endianness, and read the value from the ui_ member. There's no shifting to be done, except for the swapping if you want to change endianness.
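A sketch of the same read-then-swap idea, with two substitutions that are mine rather than the answer's: std::memcpy instead of the union, because reading a union member other than the one last written is not defined behaviour in standard C++ (many compilers allow it as an extension), and std::uint32_t instead of unsigned long, which is 8 bytes on many 64-bit platforms. All function names are hypothetical, and the file is assumed to be big-endian as in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverses the byte order of a 32-bit value.
std::uint32_t swap32(std::uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Returns true when the host stores the least significant byte first.
bool host_is_little_endian()
{
    const std::uint32_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// Hypothetical helper: loads four big-endian bytes read from the file.
std::uint32_t load_be32(const unsigned char *bytes)
{
    assert(bytes != nullptr);
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);   // raw copy, host byte order
    return host_is_little_endian() ? swap32(value) : value;
}
```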
HTH
rlc