Having trouble getting the output of Linux 'dd' command in C++ program

I'm trying to use the sudo dd if=/dev/sda ibs=1 count=64 skip=446 command to get the partition table information from the master boot record. I'm trying to read the command's output into a string so that I can parse it, but all I'm getting is the following: � !. What I'm expecting is:
80 01 01 00 83 FE 3F 01 3F 00 00 00 43 7D 00 00
00 00 01 02 83 FE 3F 0D 82 7D 00 00 0C F1 02 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
My current code looks like this, and is just taken from here: How to execute a command and get output of command within C++ using POSIX?
#include <iostream>
#include <stdexcept>
#include <stdio.h>
#include <string>
using namespace std;
string exec(const char* cmd) {
    char buffer[128];
    string result = "";
    FILE* pipe = popen(cmd, "r");
    if (!pipe) throw std::runtime_error("popen() failed!");
    try {
        while (!feof(pipe)) {
            if (fgets(buffer, 128, pipe) != NULL)
                result += buffer;
        }
    } catch (...) {
        pclose(pipe);
        throw;
    }
    pclose(pipe);
    return result;
}
int main() {
    string s = exec("sudo dd if=/dev/sda ibs=1 count=64 skip=446");
    cout << s;
}
Obviously I'm doing something wrong, but I can't figure out the problem. How do I get the proper output into my string?

while (!feof(pipe)) {
This is your first bug. feof() only reports end-of-file after a read has already failed, so this loop makes one extra iteration.
result += buffer;
This is your second bug. buffer is a char array, which decays to a char * in this context. As you know, a char * in a string context is typically interpreted as a C-style string that's terminated by a '\0' byte.
Notice that you expect to read a bunch of 00 bytes. Well, after the char array decays to a char *, everything up to the first 00 byte gets appended to your result, rather than exactly the 128 bytes read. And if there were no 00 bytes among those 128 bytes, you'll probably end up appending some random garbage as an extra bonus, with a small possibility of a crash.
if (fgets(buffer, 128, pipe) != NULL)
This is your third bug. fgets() stops reading at a newline, so if the data happens to include a 0A byte, the '\n' character, this is not going to read 128 bytes.
cout << s;
This is your fourth bug. Since the data will (after all the other bugs are fixed) presumably contain binary stuff, your terminal is unlikely to have much success displaying various bytes, especially bytes 00 through 1F.
To fix your code you will need to:
Correctly handle the end-of-file condition.
Correctly read binary data. fgets(), et al, are completely unsuitable for the task. If you insist on using C file structures, your only reasonable option is to use fread().
Correctly assemble a std::string from a blob of binary data. Merely appending a char buffer to it, crossing your fingers, and hoping for the best will not work. You will most likely need to use the two-argument std::string constructor that takes a beginning and an ending iterator, or the append() overload that takes a pointer and a byte count.
Display binary data correctly, instead of just dumping the entire blob to std::cout, just like that. The most common approach is a std::hex manipulator, and diligent up-conversion of each char to an int, as an unsigned value.
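Putting all four fixes together, one possible sketch (dd writes its status report to stderr, which popen() does not capture; it's silenced here so it doesn't clutter the terminal):

#include <iostream>
#include <iomanip>
#include <stdexcept>
#include <stdio.h>
#include <string>

std::string exec(const char* cmd) {
    char buffer[128];
    std::string result;
    FILE* pipe = popen(cmd, "r");
    if (!pipe) throw std::runtime_error("popen() failed!");
    size_t n;
    // fread() returns 0 at end-of-file, which ends the loop cleanly
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0)
        result.append(buffer, n); // append exactly n bytes, 00 bytes included
    pclose(pipe);
    return result;
}

int main() {
    std::string s = exec("sudo dd if=/dev/sda ibs=1 count=64 skip=446 2>/dev/null");
    for (unsigned char c : s)
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<int>(c) << ' ';
    std::cout << '\n';
}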

Related

Read binary file into struct and also problems with endianness

I want to read a binary file image.dd into struct teststruct *test;. Basically there are two problems:
1. Wrong order because of little/big endian.
printf("%02x", test->magic); just gives me 534b554c instead of 4c554b53 (maybe this has something to do with the "main problem" in the next part). It's just "one value". As an example, printf("%c", test->magic); gives L instead of LUKS.
2. No output with test->version.
uint16_t version; in struct teststruct gives no output. That is, I call printf("%x ", test->version); and there is no result.
This is exampleh.h which contains struct:
#ifndef _EXAMPLEH_H
#define _EXAMPLEH_H
#define MAGIC_L 6
struct teststruct {
    char magic[MAGIC_L];
    uint16_t version;
};
#endif
This is the main code:
using namespace std;
#include <stdint.h>
#include <string.h>
#include <iostream>
#include <fstream>
#include "exampleh.h"
struct teststruct *test;
int main() {
    FILE *fp = fopen("C:\\image.dd", "rb"); // open file in binary mode
    if (fp == NULL) {
        fprintf(stderr, "Can't read file");
        return 0;
    }
    fread(&test, sizeof(test), 1, fp);
    //printf("%x ", test->magic); //this works, but in the wrong order because of little/big endian
    printf("%x ", test->version); //no output at all
    fclose(fp);
    return 0;
}
Here are the first 112 bytes of image.dd:
4C 55 4B 53 BA BE 00 01 61 65 73 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 78 74 73 2D 70 6C 61 69
6E 36 34 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 73 68 61 32 35 36 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 40
You must allocate the structure and read data into it, instead of reading into the pointer directly. If you are going to read only one structure, you don't need a pointer at all.
printf("%x ", test->magic); invokes undefined behavior, because a pointer (automatically converted from the array) is passed where an unsigned int is required.
In this case, the observed behavior comes about as follows:
Firstly, fread(&test,sizeof(test),1,fp); read the first few bytes of the file into the pointer itself, as a pointer value.
Then, printf("%02x", test->magic); printed the first 4-byte integer from the file: test->magic is (converted to) a pointer to the array placed at the top of the structure, and the address of the array is the same as the address of the structure itself, so the address read from the file is what got printed. A further stroke of luck is that the 4-byte integer and the pointer are fetched from the same place among the function arguments.
Finally, you didn't get any output from printf("%x ", test->version); because the address read from the file unfortunately lies in a region that is not readable, and dereferencing it caused a segmentation fault.
Fixed code:
using namespace std;
#include <stdint.h>
#include <string.h>
#include <iostream>
#include <fstream>
#include "exampleh.h"
struct teststruct test; // allocate structure directly instead of pointer
int main() {
    FILE *fp = fopen("C:\\image.dd", "rb"); // open file in binary mode
    if (fp == NULL) {
        fprintf(stderr, "Can't read file");
        return 0;
    }
    fread(&test, sizeof(test), 1, fp); // now the structure is read instead of the pointer
    for (int i = 0; i < 6; i++) {
        printf("%02x", (unsigned char)test.magic[i]); // print using a proper combination of format and data
    }
    printf(" ");
    printf("%x ", test.version); // also use . instead of ->
    fclose(fp);
    return 0;
}
struct teststruct *test; is a null pointer, since globals are zero-initialized. You never allocate memory for it, so test->version is UB.
fread(&test,sizeof(test),1,fp); is also wrong, this will read a pointer, not the content of the struct.
An easy fix is to change test to be a struct teststruct and not a pointer to it.
using namespace std;
#include <stdint.h>
#include <string.h>
#include <iostream>
#include <fstream>
#include "exampleh.h"
struct teststruct test; //not a pointer anymore
int main() {
    FILE *fp = fopen("C:\\image.dd", "rb"); // open file in binary mode
    if (fp == NULL) {
        fprintf(stderr, "Can't read file");
        return 0;
    }
    fread(&test, sizeof(test), 1, fp);
    //printf("%x ", test.magic); //this works, but in the wrong order because of little/big endian
    printf("%x ", test.version); //now prints a value (subject to byte order)
    fclose(fp);
    return 0;
}
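Either fix resolves problem 2, but not problem 1. With the dump above (6-byte magic 4C 55 4B 53 BA BE, then version bytes 00 01), a little-endian x86 will report version as 0x100 rather than 1 if the on-disk integers are big-endian. A sketch of assembling multi-byte fields from individual bytes, which works regardless of host byte order (assuming big-endian on-disk values, as the 00 01 suggests):

#include <stdint.h>
#include <stdio.h>

int main() {
    unsigned char buf[8]; // 6-byte magic + 2-byte version
    FILE *fp = fopen("C:\\image.dd", "rb");
    if (fp == NULL) {
        fprintf(stderr, "Can't read file");
        return 0;
    }
    if (fread(buf, 1, sizeof buf, fp) != sizeof buf) {
        fprintf(stderr, "Short read");
        fclose(fp);
        return 0;
    }
    fclose(fp);
    // big-endian assembly: high byte first, independent of host byte order
    uint16_t version = (uint16_t)((buf[6] << 8) | buf[7]);
    printf("version = %u\n", (unsigned)version); // prints 1 for the dump above
    return 0;
}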

The size of these structs is different in a file but the same in program memory

Consider the following POD struct:
struct MessageWithArray {
    uint32_t raw;
    uint32_t myArray[10];
    //MessageWithArray() : raw(0), myArray{ 10,20,30,40,50,60,70,80,90,100 } { };
};
Running the following:
#include <type_traits>
#include <iostream>
#include <fstream>
#include <string>
struct MessageWithArray {
    uint32_t raw;
    uint32_t myArray[10];
    //MessageWithArray() : raw(0), myArray{ 10,20,30,40,50,60,70,80,90,100 } { };
};
//https://stackoverflow.com/questions/46108877/exact-definition-of-as-bytes-function
template <class T>
char* as_bytes(T& x) {
    return &reinterpret_cast<char&>(x);
    // or:
    // return reinterpret_cast<char*>(std::addressof(x));
}
int main() {
    MessageWithArray msg = { 0, {0,1,2,3,4,5,6,7,8,9} };
    std::cout << "Size of MessageWithArray struct: " << sizeof(msg) << std::endl;
    std::cout << "Is a POD? " << std::is_pod<MessageWithArray>() << std::endl;
    std::ofstream buffer("message.txt");
    buffer.write(as_bytes(msg), sizeof(msg));
    return 0;
}
Gives the following output:
Size of MessageWithArray struct: 44
Is a POD? 1
A hex dump of the "message.txt" file looks like this:
00 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00
03 00 00 00 04 00 00 00 05 00 00 00 06 00 00 00
07 00 00 00 08 00 00 00 09 00 00 00
Now if I uncomment the constructor (so that MessageWithArray has a zero-argument constructor), MessageWithArray becomes a non-POD struct. Then I use the constructor to initialize instead. This results in the following changes in the code:
....
struct MessageWithArray {
    .....
    MessageWithArray() : raw(0), myArray{ 10,20,30,40,50,60,70,80,90,100 }{ };
};
....
int main(){
    MessageWithArray msg;
    ....
}
Running this code, I get:
Size of MessageWithArray struct: 44
Is a POD? 0
A hex dump of the "message.txt" file looks like this:
00 00 00 00 0D 0A 00 00 00 14 00 00 00 1E 00 00
00 28 00 00 00 32 00 00 00 3C 00 00 00 46 00 00
00 50 00 00 00 5A 00 00 00 64 00 00 00
Now, I'm not so interested in the actual hex values; what I'm curious about is why there is one more byte in the non-POD struct dump than in the POD struct dump, when sizeof() declares they are the same number of bytes. Is it possible that, because the constructor makes the struct non-POD, something hidden has been added to the struct? sizeof() should be an accurate compile-time check, correct? Is something possibly escaping measurement by sizeof()?
Specifications: I am running this in an empty project in Visual Studio 2017 version 15.7.5, Microsoft Visual C++ 2017, on a Windows 10 machine.
Intel Core i7-4600M CPU
64-bit Operating System, x64-based processor
EDIT: I decided to initialize the struct to avoid undefined behaviour, and because the question is still valid with the initialization. Initializing it to values that don't include 10 preserves the behaviour I observed initially, because the data in the array never contained any 10s (even when it was garbage and random).
It has nothing to do with POD-ness.
Your ofstream is opened in text mode (rather than binary mode). On Windows that means every \n gets converted to \r\n.
In the second case there happened to be one 0x0A ('\n') byte in the struct, which became 0x0D 0x0A (\r\n). That's why you see an extra byte.
Also, using uninitialized variables in the first case leads to undefined behaviour, which in this case didn't manifest itself.
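The fix is a single flag when opening the stream; with it, the two dumps come out byte-for-byte identical:

std::ofstream buffer("message.txt", std::ios::binary);
buffer.write(as_bytes(msg), sizeof(msg));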
The other answer explains the problem with writing binary data to a stream opened in text mode; however, this code is fundamentally wrong. There is no need to dump anything: the proper way to check the sizes of those structures and verify that they are equal is a static_assert:
struct MessageWithArray {
    uint32_t raw;
    uint32_t myArray[10];
};
struct NonPodMessageWithArray {
    uint32_t raw;
    uint32_t myArray[10];
    NonPodMessageWithArray() : raw(0), myArray{ 10,20,30,40,50,60,70,80,90,100 } {}
};
static_assert(sizeof(MessageWithArray) == sizeof(NonPodMessageWithArray));
online compiler

Why does RegSetValueEx work even when I break the rule about accounting for NUL termination in the length?

I've got a simple program that adds calc.exe to startup:
#include <windows.h>
#include <tchar.h>
int main(){
    _tprintf(TEXT("Adding calc.exe to SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run...\n"));
    HKEY hRegRunKey;
    LPCTSTR lpKeyName = TEXT("SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run");
    LPCTSTR lpKeyValue = TEXT("Calculator");
    LPCTSTR lpProgram = TEXT("C:\\WINDOWS\\system32\\calc.exe");
    DWORD cchProgram = _tcslen(lpProgram);
    _tprintf(TEXT("Path: %s. \n"), lpProgram);
    _tprintf(TEXT("Length: %d. \n"), cchProgram);
    if(RegOpenKeyEx(HKEY_LOCAL_MACHINE, lpKeyName, 0, KEY_SET_VALUE, &hRegRunKey) == ERROR_SUCCESS){
        if(RegSetValueEx(hRegRunKey, lpKeyValue, 0, REG_SZ, (const BYTE *)lpProgram, cchProgram * sizeof(TCHAR)) != ERROR_SUCCESS){
            _tprintf(TEXT("ERROR: Can't set key value.\n"));
            exit(1);
        }else{
            _tprintf(TEXT("Key has been added successfully.\n"));
        }
    }
    Sleep(5000);
    RegCloseKey(hRegRunKey);
}
For me the world of C/C++/Win32 API is still full of mysteries... so I have a few questions.
1. When I define a string, is it automatically null-terminated?
LPCTSTR lpProgram = TEXT("C:\\WINDOWS\\system32\\calc.exe");
or should it be done like this:
LPCTSTR lpProgram = TEXT("C:\\WINDOWS\\system32\\calc.exe\0");
2. In my code, is the final argument to RegSetValueEx set to the correct value?
From MSDN - RegSetValueEx function page:
cbData [in] The size of the information pointed to by the lpData
parameter, in bytes. If the data is of type REG_SZ, REG_EXPAND_SZ, or
REG_MULTI_SZ, cbData must include the size of the terminating null
character or characters.
cchProgram is set to 28 characters, without null termination. On my system (because of UNICODE, I think?) cchProgram * sizeof(TCHAR) = 56.
Shouldn't I set it to 58 to add null termination?
When I run this program as it is above, without any modifications, and check the Calculator value in the registry via Modify Binary Data, I get:
43 00 3A 00 5C 00 57 00 C.:.\.W.
49 00 4E 00 44 00 4F 00 I.N.D.O.
57 00 53 00 5C 00 73 00 W.S.\.s.
79 00 73 00 74 00 65 00 y.s.t.e.
6D 00 33 00 32 00 5C 00 m.3.2.\.
63 00 61 00 6C 00 63 00 c.a.l.c.
2E 00 65 00 78 00 65 00 ..e.x.e.
00 00 ..
It's 58 bytes including the null terminator. I'm confused :/
UPDATE
Accounting for a NUL character by adding 1 to the string length when calculating cbData yields exactly the same result as without adding it:
cchProgram * sizeof(TCHAR) produces the same data entry as (cchProgram + 1) * sizeof(TCHAR)
Providing a value smaller than the string length doesn't add a NUL byte and copies only the given number of bytes.
27 * sizeof(TCHAR) as cbData produces:
43 00 3A 00 5C 00 57 00 C.:.\.W.
49 00 4E 00 44 00 4F 00 I.N.D.O.
57 00 53 00 5C 00 73 00 W.S.\.s.
79 00 73 00 74 00 65 00 y.s.t.e.
6D 00 33 00 32 00 5C 00 m.3.2.\.
63 00 61 00 6C 00 63 00 c.a.l.c.
2E 00 65 00 78 00 ..e.x.
I am on some old XP, service pack god knows what; I don't know how other versions of Windows would handle it.
1: Yes, it will be null terminated without the need for \0.
Double quoted strings (") are literal constants whose type is in fact a null-terminated array of characters. So string literals enclosed between double quotes always have a null character ('\0') automatically appended at the end.
2: _tcslen() doesn't include the null terminator. You can add sizeof(TCHAR) to add it.
The reason the value still works is probably because Windows tries to be robust even when given incorrect input. It is probably automatically appending the null terminator for you. However, because the documentation says you must include the null terminator it may not always append it.
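Concretely, the fix in the question's code is to count the terminator into cbData (the same change the UPDATE above experimented with):

if(RegSetValueEx(hRegRunKey, lpKeyValue, 0, REG_SZ, (const BYTE *)lpProgram,
                 (cchProgram + 1) * sizeof(TCHAR)) != ERROR_SUCCESS){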
When I define a string, is it automatically null-terminated?
String literals are null-terminated, yes. "Hello" is actually {'H', 'e', 'l', 'l', 'o', '\0'}.
In my code, is the final argument to RegSetValueEx set to the correct value?
You're right that you need the null terminator. An easier way, if the string literal is short, would be sizeof(TEXT("C:\\WINDOWS\\system32\\calc.exe")): sizeof("Hello") is 6, so it includes the null terminator. In most cases, though, you'll have the string in a variable and will have to add one to the length you get from character-counting functions, since they don't include the terminator.
Ben Voigt made an excellent point below: a const TCHAR program[] = TEXT("text"); can be used the same way as a literal in the call (sizeof(program)), but it's a lot more maintainable, since there's one less place in the code to change; that's a must for any actual project, and even a really small test can grow.
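In numbers, for this particular string (assuming UNICODE is defined, so TCHAR is wchar_t at 2 bytes each):

const TCHAR program[] = TEXT("C:\\WINDOWS\\system32\\calc.exe");
DWORD cbProgram = sizeof(program); // 29 characters including the terminator, times 2 bytes = 58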
Finally, there are two things you should get out of your head early:
Hungarian notation: Don't do it. It's outdated and rather pointless.
TCHAR: Just use wide strings with any Windows API functions you can.
What you're doing absolutely right is checking function calls for errors. You wouldn't believe how many problems asked about can be solved by checking for failure and using GetLastError when the documentation says to.
Since you asked how you're supposed to use C++ facilities, here's one way, with a couple changes that make more sense for using C++:
#include <windows.h>
#include <iostream>
#include <string>
int main(){
    //R means raw string literal. Note one backslash
    std::cout << R"(Adding calc.exe to SOFTWARE\Microsoft\Windows\CurrentVersion\Run...)" << '\n';
    const WCHAR keyName[] = LR"(SOFTWARE\Microsoft\Windows\CurrentVersion\Run)";
    std::cout << "Enter program name: ";
    std::wstring keyValue;
    if (!std::getline(std::wcin, keyValue)) {/*error*/}
    std::cout << "Enter full program path: ";
    std::wstring program;
    if (!std::getline(std::wcin, program)) {/*error*/}
    std::wcout << "Path: " << program << ".\n";
    std::cout << "Length: " << program.size() << ".\n";
    HKEY runKey;
    if(RegOpenKeyExW(HKEY_LOCAL_MACHINE, keyName, 0, KEY_SET_VALUE, &runKey)) {/*error*/}
    if(RegSetValueExW(runKey, keyValue.c_str(), 0, REG_SZ,
                      reinterpret_cast<const BYTE *>(program.c_str()),
                      static_cast<DWORD>((program.size() + 1) * sizeof(wchar_t)))) {
        std::cout << "ERROR: Can't set key value.\n";
        return 1;
    }
    if (RegCloseKey(runKey)) {/*error*/}
    std::cout << "Key has been added successfully.\n";
    std::cout << "Press enter to continue...";
    std::cin.get();
}
A better way to do this using C++ idioms would be to at least have a RegKey RAII class that calls RegCloseKey in its destructor and saves you the work. At the very least, it could be used like this:
RegKey key(HKEY_LOCAL_MACHINE, keyName, KEY_SET_VALUE);
RegSetValueExW(key, ...); //could have implicit or explicit conversion, fill in the ...
//RegCloseKey called when key goes out of scope
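A minimal sketch of such a RegKey class (the name and interface are illustrative, not an existing library API):

#include <windows.h>
#include <stdexcept>

class RegKey {
public:
    RegKey(HKEY root, const wchar_t* subkey, REGSAM access) {
        if (RegOpenKeyExW(root, subkey, 0, access, &handle_) != ERROR_SUCCESS)
            throw std::runtime_error("RegOpenKeyExW failed");
    }
    ~RegKey() { RegCloseKey(handle_); }        // closed even on early return or exception
    RegKey(const RegKey&) = delete;            // one owner per handle
    RegKey& operator=(const RegKey&) = delete;
    operator HKEY() const { return handle_; }  // implicit conversion so Reg* calls accept it
private:
    HKEY handle_ = nullptr;
};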

Examining Output of raw files C++

Hi, I am reading in a binary image file; below is a short example of the first few lines of output from the hd ... |more command on Linux. The image is a binary graphic, so the only pixel colours are black or white. It is a 1024 by 1024 image; however, the size comes out to be 2097152 bytes.
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000dfbf0 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 ff 00 |................|
000dfc00 ff 00 ff 00 ff 00 00 00 00 00 00 00 00 00 00 00 |................|
000dfc10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
This is the code I am using to read it in, found in another thread on SO:
ifstream file (argv[1], ios::in | ios::binary | ios::ate);
ifstream::pos_type fileSize;
char* fileContents;
if(file.is_open())
{
    fileSize = file.tellg();
    fileContents = new char[fileSize];
    file.seekg(0, ios::beg);
    if(!file.read(fileContents, fileSize))
    {
        cout << "fail to read" << endl;
    }
    file.close();
    cout << fileSize << endl;
}
The code works. However, when I run this for loop:
for (int i = 0; i < 2097152; i++)
    printf("%hd", fileContents[i]);
the only things printed out are zeros, and no 1s. Why is this? Are my parameters in printf not correctly specifying the pixel size? I know for a fact that there are 1s in the image representing the white areas. Also, how do I figure out how many bytes represent a pixel in this image?
Your printf() is wrong. %hd means short, while fileContents[i] is a char; on all modern systems I'm familiar with, this is a size mismatch. Use an array of short instead, since you have twice as many bytes as pixels.
Also, stop using printf() and use std::cout, avoiding all type mismatch problems.
Since 2097152/1024 is exactly 2048 which is in turn 2*1024, I would assume each pixel is 2 bytes.
The other problem is probably in the printf. I'm not sure what %hd is, I would use %02x myself and cast the data to int.
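For a quick look at what the buffer actually holds, here is a sketch of that %02x suggestion applied to the question's fileContents, using std::cout as the first answer recommends; the cast through unsigned char avoids sign extension on bytes like ff:

#include <cstddef>
#include <iomanip>
#include <iostream>

// assumes fileContents and fileSize from the reading code in the question
void dumpHex(const char* fileContents, std::size_t fileSize) {
    for (std::size_t i = 0; i < fileSize; ++i)
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<int>(static_cast<unsigned char>(fileContents[i])) << ' ';
    std::cout << '\n';
}

On this image that prints 00 for black and ff (not 01) for white, which is also why the %hd loop never showed any 1s.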

Accessing specific binary information based on binary format documentation

I have a binary file and documentation of the format the information is stored in. I'm trying to write a simple program using c++ that pulls a specific piece of information from the file but I'm missing something since the output isn't what I expect.
The documentation is as follows:
Half-word   Field Name      Type    Units     Range         Precision
10          Block Divider   INT*2   N/A       -1            N/A
11-12       Latitude        INT*4   Degrees   -90 to +90    0.001
There are other items in the file obviously but for this case I'm just trying to get the Latitude value.
My code is:
#include <cstdlib>
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char* argv[])
{
    char* dataFileLocation = "testfile.bin";
    ifstream dataFile(dataFileLocation, ios::in | ios::binary);
    if(dataFile.is_open())
    {
        char* buffer = new char[32768];
        dataFile.seekg(10, ios::beg);
        dataFile.read(buffer, 4);
        dataFile.close();
        cout << "value is " << (int)(buffer[0] & 255);
    }
}
The result of which is "value is 226" which is not in the allowed range.
I'm quite new to this; here's what my intent was when writing the above code:
Open file in binary mode
Seek to the 11th byte from the start of the file
Read in 4 bytes from that point
Close the file
Output those 4 bytes as an integer.
If someone could point out where I'm going wrong I'd sure appreciate it. I don't really understand the (buffer[0] & 255) part (I took that from some example code), so layman's terms for that would be greatly appreciated.
Hex Dump of the first 100 bytes:
testfile.bin 98,402 bytes 11/16/2011 9:01:52
-0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F
00000000- 00 5F 3B BF 00 00 C4 17 00 00 00 E2 2E E0 00 00 [._;.............]
00000001- 00 03 FF FF 00 00 94 70 FF FE 81 30 00 00 00 5F [.......p...0..._]
00000002- 00 02 00 00 00 00 00 00 3B BF 00 00 C4 17 3B BF [........;.....;.]
00000003- 00 00 C4 17 00 00 00 00 00 00 00 00 80 02 00 00 [................]
00000004- 00 05 00 0A 00 0F 00 14 00 19 00 1E 00 23 00 28 [.............#.(]
00000005- 00 2D 00 32 00 37 00 3C 00 41 00 46 00 00 00 00 [.-.2.7.<.A.F....]
00000006- 00 00 00 00 [.... ]
Since the documentation lists the field as an integer but shows the precision to be 0.001, I would assume that the actual value is the stored value multiplied by 0.001. The integer range would be -90000 to 90000.
The 4 bytes must be combined into a single integer. There are two ways to do this, big endian and little endian, and which you need depends on the machine that wrote the file. x86 PCs for example are little endian.
int little_endian = buffer[0] | buffer[1]<<8 | buffer[2]<<16 | buffer[3]<<24;
int big_endian = buffer[0]<<24 | buffer[1]<<16 | buffer[2]<<8 | buffer[3];
The &255 is used to remove the sign extension that occurs when you convert a signed char to a signed integer. Use unsigned char instead and you probably won't need it.
Edit: I think "half-word" refers to 2 bytes, so you'll need to skip 20 bytes instead of 10.
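Putting that together with the question's code, a sketch of reading the latitude (assuming half-words are 2 bytes and numbered from 1, so half-words 11-12 sit at byte offsets 20-23; whether the field is big or little endian is best verified against a known latitude value):

#include <cstdint>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream dataFile("testfile.bin", std::ios::in | std::ios::binary);
    if(!dataFile.is_open())
        return 1;
    unsigned char buffer[4]; // unsigned char makes the & 255 trick unnecessary
    dataFile.seekg(20, std::ios::beg); // half-word 11 starts at byte (11 - 1) * 2 = 20
    dataFile.read(reinterpret_cast<char*>(buffer), 4);
    // big-endian assembly; swap the shifts if the file turns out to be little endian
    std::int32_t raw = static_cast<std::int32_t>(
        (std::uint32_t)buffer[0] << 24 | (std::uint32_t)buffer[1] << 16 |
        (std::uint32_t)buffer[2] << 8  | (std::uint32_t)buffer[3]);
    std::cout << "value is " << raw * 0.001 << '\n'; // apply the 0.001 precision
}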