C++: convert memory into a data structure

When debugging an application, I found a structure in memory that I am 100% certain consists of only 4 strings. However, I am not sure how to convert it into a data structure so that I can use the structure's pointer address to access the values. Here is what the structure looks like in memory (as an example, let's say it is CONSISTENTLY located at memory address 0x123456).
The data structure consists of 4 separate strings:
string 1 = ad
string 2 = dgdhkkkkkkhkk
string 3 = ggghhjk
string 4 = dgcfoh
I have tried creating a struct like this:
struct reversedConnectionDat_t
{
char * data1;
char * data2;
char * data3;
char * data4;
};
and this is how I tried to access the data:
reversedConnectionDat_t * storeDat = (reversedConnectionDat_t*)0x123456;
printf("%s\n", storeDat->data3);
But it does not seem to work. Am I not reading the strings from memory properly?
(Also, the strings will sometimes differ from the example above; e.g. sometimes string 1 will be 7 characters long and string 3 only 2 characters long.)

You have a pointer to a structure of pointers, so even if you point the structure at the correct memory address, the pointers inside the structure do not point at your strings: the bytes at that address are character data, not addresses. The characters need to live in the structure itself. I would try setting up your structure like this:
struct reversedConnectionDat_t
{
char data1 [3];
char data2 [50];
char data3 [50];
char data4 [50];
};
BTW, I didn't count the characters; I just guessed at the sizes, but you get the idea.
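To make the fixed-array suggestion concrete, here is a minimal sketch. It assumes the guessed sizes above (3 and 50 bytes) and that the strings really are stored inline; it also copies the bytes with memcpy rather than casting the raw address, which sidesteps alignment questions. readConnectionDat and fieldToString are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Layout guessed from the answer above (sizes are assumptions): each string
// lives in a fixed-size, '\0'-padded char array inside the struct itself.
struct reversedConnectionDat_t
{
    char data1[3];
    char data2[50];
    char data3[50];
    char data4[50];
};

// Copies the raw bytes at 'src' into a struct. memcpy has no alignment
// requirement; this assumes 'src' really holds sizeof(reversedConnectionDat_t) bytes.
reversedConnectionDat_t readConnectionDat(const void *src)
{
    reversedConnectionDat_t out;
    std::memcpy(&out, src, sizeof out);
    return out;
}

// Converts one fixed-size field to std::string, stopping at the first '\0'
// or at the end of the array if the field is completely full.
template <std::size_t N>
std::string fieldToString(const char (&field)[N])
{
    const void *nul = std::memchr(field, '\0', N);
    std::size_t len = nul ? static_cast<std::size_t>(static_cast<const char *>(nul) - field) : N;
    return std::string(field, len);
}
```

If a field is completely full, there is no terminator inside it, which is why fieldToString stops at the end of the array instead of calling strlen.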

I think you've mis-identified that data structure. I suspect that what you have is three independent buffers, each of which can hold one or more null-terminated strings.
The first buffer is 68 bytes long and contains "ad\0dgdhkkkkkkhkk\0" (followed by enough \0 characters to fill the buffer).
It's possible that this buffer is really only 64 bytes long, and that the four bytes after it are used for some other data element.
The second buffer looks to be 64 bytes long, containing a single string and padded with \0 characters to fill out the 64 bytes.
It's impossible to say how long the third buffer is. All we know is that it's long enough to hold the string "dgcfoh\0". I'd guess that the buffer is 64 bytes long, but I'd be willing to revise that opinion if I got more data.
I think the structure you want is:
struct s
{
char data1[68]; // buffer holds one or more null-terminated strings
char data2[64];
char data3[64];
};
Based on the scant information you've given us, that's what I'd start with. Then you need a way to parse a buffer of null-terminated strings, that is, to get the two individual strings out of the first buffer. That's a pretty easy bit of C code.
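Parsing the first buffer could look like the following sketch. It assumes, as above, that unused space is filled with '\0', so runs of zero bytes are treated as padding; a genuinely empty string inside the buffer would therefore be skipped. splitNullTerminated is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a fixed-size buffer holding one or more null-terminated strings
// (padded with '\0' to the end) into separate std::strings.
// Runs of '\0' are treated as padding, so empty strings are not returned.
std::vector<std::string> splitNullTerminated(const char *buf, std::size_t size)
{
    std::vector<std::string> result;
    std::size_t i = 0;
    while (i < size) {
        if (buf[i] == '\0') {      // skip terminators and padding
            ++i;
            continue;
        }
        std::size_t start = i;
        while (i < size && buf[i] != '\0')
            ++i;                   // stop at '\0' or at the end of the buffer
        result.emplace_back(buf + start, i - start);
    }
    return result;
}
```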

I don't see anything wrong with your code except the magic number 0x123456: casting it to your struct type only works if the memory there actually matches the struct. Are you sure the data at that address is laid out the way your struct defines it? If it isn't, accessing storeDat->data3 will most likely lead to a segfault, unless you do something like the following or you are very lucky.
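#include <cstdint>
#include <cstdlib>
#include <iostream>
struct R {
    const char *a;
    const char *b;
};
int main()
{
    R *r1 = static_cast<R *>(std::malloc(sizeof(R)));
    r1->a = "12333"; // pointing to a string literal
    r1->b = "12331"; // pointing to a string literal
    // store the struct's address (the value of r1, not &r1) as an integer
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(r1);
    // casting the integer back gives a usable pointer to the same struct
    R *r2 = reinterpret_cast<R *>(address);
    std::cout << r2->b << '\n';
    std::free(r1);
    return 0;
}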
P.S. I'm not a good programmer, but I was curious to answer, as I thought it might be of some help. Sorry if I couldn't understand your problem properly.

You are using pointers (char*), so the size of your structure is the size of 4 pointers. If you want the strings themselves to be in the structure, you should use fixed-size arrays (char[]).
This will only work if your strings' sizes match the array sizes.
IMO the best way is to get the data into a char array, then find the null terminators ('\0') and set your pointers to point to the start of each string (at the start, and right after each of the first 3 null terminators).
char* pointerToMem = something; //your strings data
yourStruct.str1 = pointerToMem;
while(*pointerToMem != '\0')
{
pointerToMem++;
}
yourStruct.str2 = pointerToMem + 1;
This is how you can make the struct of pointers work. This code is not optimal and you should not use it as is, but it shows how you can get the strings from memory: to have a C string you only need the address of the first character and a null terminator at the end.
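A more complete version of the loop above, as a sketch: it assumes the four strings are stored back to back, each followed by a single '\0', and it checks the buffer size so a missing terminator cannot make it read past the end. StringViews4 and mapStrings are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Non-owning view: each pointer refers to a string inside someone else's buffer.
struct StringViews4
{
    const char *str[4];
};

// Points str[0..3] at the first four null-terminated strings stored back to
// back in 'mem'. Returns false if the buffer ends before four terminators are found.
bool mapStrings(const char *mem, std::size_t size, StringViews4 &out)
{
    const char *p = mem;
    const char *end = mem + size;
    for (int k = 0; k < 4; ++k) {
        const void *nul = std::memchr(p, '\0', static_cast<std::size_t>(end - p));
        if (nul == nullptr)
            return false;                          // string k is not terminated
        out.str[k] = p;                            // start of string k
        p = static_cast<const char *>(nul) + 1;    // skip past its '\0'
    }
    return true;
}
```

The pointers do not own anything; they stay valid only as long as the underlying buffer does.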

Related

Write and read struct to file

I'm having problems trying to save a struct to a new PE section and then read it back.
The struct looks like:
#pragma pack(push, 1)
typedef struct _SCRIPT_STRUCT
{
DWORD dwKeySize;
unsigned char *lpKeyBuffer;
DWORD dwScriptSize;
unsigned char *lpScriptBuffer;
} SCRIPT, *PSCRIPT;
#pragma pack(pop)
lpKeyBuffer is an array of random byte values (0-255) and lpScriptBuffer contains an RC4-encrypted script (in Lua, if that matters).
I think the struct is successfully written in the new section created but I can't read the buffers.
(Writing):
SCRIPT tScript;
tScript.dwKeySize = KEY_SIZE;
tScript.lpKeyBuffer = new unsigned char[tScript.dwKeySize];
GenerateKey(tScript.lpKeyBuffer, KEY_SIZE);
tScript.dwScriptSize = szScript.size();
tScript.lpScriptBuffer = new unsigned char[tScript.dwScriptSize];
memcpy(tScript.lpScriptBuffer, szScript.c_str(), tScript.dwScriptSize);
tScript.lpScriptBuffer = (unsigned char*)szScript.c_str();
rc4_encryption(tScript.lpScriptBuffer, tScript.dwScriptSize, tScript.lpKeyBuffer, tScript.dwKeySize);
DWORD dwScriptStructSize = sizeof(DWORD) + tScript.dwKeySize + sizeof(DWORD) + tScript.dwScriptSize;
char lpStructBuffer[dwScriptStructSize];
ZeroMemory(lpStructBuffer, dwScriptStructSize);
memcpy(lpStructBuffer, &tScript, dwScriptStructSize);
//CreateFile, create new section, etc
SetFilePointer(hCurrent, LastHeader->PointerToRawData, NULL, FILE_BEGIN);
WriteFile(hCurrent, lpStructBuffer, dwScriptStructSize, &dwRead, 0);
(Reading):
SCRIPT tScript;
memcpy(&tScript, lpScriptBuffer, dwSectionSize);
tScript.lpKeyBuffer = new unsigned char[tScript.dwKeySize];
tScript.lpKeyBuffer[tScript.dwKeySize] = 0x00;
tScript.lpScriptBuffer = new unsigned char[tScript.dwScriptSize];
tScript.lpScriptBuffer[tScript.dwScriptSize] = 0x00;
printf("dwScriptSize = %lu\n", tScript.dwScriptSize);
printf("dwKeySize = %lu\n", tScript.dwKeySize);
rc4_encryption(tScript.lpScriptBuffer, tScript.dwScriptSize, tScript.lpKeyBuffer, tScript.dwKeySize);
printf("script: %s\n", tScript.lpScriptBuffer);
The DWORD outputs are correct but the last printf shows strange symbols.
You can't save pointers to a file from one process, and read them from another process and expect it to work. Pointers are generally unique per process, especially to dynamically allocated data. When you read a pointer from a file from another process (even if it's the same program) that pointer will no longer point to the allocated data, it will just be a "random" stray pointer, and dereferencing it (like you do when printing it as a string) will lead to undefined behavior.
You need to save the string data separately from the structure. Reading it back is then easy, since you have the sizes of the variable-length data and you know where (relative to the structure) the data is saved.
The comment from shrike made me think and take a closer look at the code you present. It's not complete, so this is all guesswork, but the actual problem might be something different from what I described above.
Let's take a look at a few lines from your "reading" code (which doesn't actually show any reading):
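One way to save the data instead of the pointers is a length-prefixed layout: [key size][key bytes][script size][script bytes]. The sketch below is an assumption, not the question's format: it uses std::uint32_t in place of DWORD, native byte order, and a std::vector<unsigned char> as the section contents (the bytes you would hand to WriteFile). appendBlob and readBlob are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Appends a 32-bit length (native byte order, an assumption) followed by the bytes.
void appendBlob(std::vector<unsigned char> &out, const std::vector<unsigned char> &blob)
{
    std::uint32_t size = static_cast<std::uint32_t>(blob.size());
    const unsigned char *p = reinterpret_cast<const unsigned char *>(&size);
    out.insert(out.end(), p, p + sizeof size);
    out.insert(out.end(), blob.begin(), blob.end());
}

// Reads one length-prefixed blob starting at 'pos' and advances 'pos'.
// Returns false if the input is too short.
bool readBlob(const std::vector<unsigned char> &in, std::size_t &pos,
              std::vector<unsigned char> &blob)
{
    std::uint32_t size;
    if (in.size() - pos < sizeof size)
        return false;
    std::memcpy(&size, in.data() + pos, sizeof size);
    pos += sizeof size;
    if (in.size() - pos < size)
        return false;
    blob.assign(in.begin() + pos, in.begin() + pos + size);
    pos += size;
    return true;
}
```

Because only sizes and bytes are stored, nothing in the section depends on addresses from the writing process.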
SCRIPT tScript;
memcpy(&tScript, lpScriptBuffer, dwSectionSize);
tScript.lpKeyBuffer = new unsigned char[tScript.dwKeySize];
tScript.lpKeyBuffer[tScript.dwKeySize] = 0x00;
tScript.lpScriptBuffer = new unsigned char[tScript.dwScriptSize];
tScript.lpScriptBuffer[tScript.dwScriptSize] = 0x00;
Now, assuming you read the structure into a character buffer lpScriptBuffer (similar to how you use one when writing, though it isn't really needed), you still have the same problem with the pointers that I described above, but there is another issue: you are reassigning the pointers to point to newly allocated memory. That in itself is fine, but you never initialize that memory (and with the code you show, you can't, though that's beside the point). Uninitialized memory has indeterminate contents, seemingly random and most likely not valid text. Using uninitialized memory is, like dereferencing stray pointers, undefined behavior.
There is also yet another issue: you are writing out of bounds of the memory you allocate. As you hopefully know, array indexes are zero-based, so if you allocate size bytes, the valid indexes are 0 through size - 1 inclusive.
Since you allocate e.g. tScript.dwKeySize bytes of memory then the top index is tScript.dwKeySize - 1 but then you use tScript.dwKeySize as index, which is out of bounds and again will lead to undefined behavior.
You need to allocate tScript.dwKeySize + 1 bytes instead, if the size doesn't already include the string terminator.
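Both fixes (initializing the memory and allocating room for the terminator) can be seen in this small sketch. It swaps new[] for std::vector<unsigned char> so the memory is released automatically; copyWithTerminator is a hypothetical name, and the source bytes would be the decrypted data you read from the section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies 'size' bytes into a new buffer with room for one extra '\0', so the
// result can safely be printed with %s.
std::vector<unsigned char> copyWithTerminator(const unsigned char *src, std::size_t size)
{
    std::vector<unsigned char> buf(size + 1);   // size bytes + terminator
    if (size > 0)
        std::memcpy(buf.data(), src, size);     // initialize the memory before use
    buf[size] = 0x00;                           // in bounds: valid indexes are 0..size
    return buf;
}
```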

How do you read from a memory buffer c++

I am fairly new at C++ and am trying to understand how memory manipulation works. I am used to Java and Python and haven't really been exposed to this.
I am working on a project that has the following structure that doesn't quite make sense to me.
typedef struct
{
size_t size;
char *data;
} data_buffer;
This structure basically acts as a buffer, with a pointer to the data stored within the buffer and the size of the buffer to allow the program to know how large the buffer is when reading from it.
An example of how the program uses the buffer:
data_buffer buffer = {0};
//Manipulate data here so it contains pertinent information
CFile oFile;
oFile.Write(buffer.data, buffer.size);
The program mostly uses 3rd party code to read the data found within the buffer, so I am having trouble finding an example of how this is done. My main question is how do I read the contents of the buffer, given only a pointer to a character and a size? However, I would also like to understand how this actually works. From what I understand, memory is written to, with a pointer to where it starts and the size of the memory, so I should be able to just iterate through the memory locations, grabbing each character from memory and tagging it onto whatever structure I choose to use, like a CString or a string. Yet, I don't understand how to iterate through memory. Can someone help me understand this better? Thanks.
There is no reason you cannot use a std::string or CString to manipulate that data. (Use higher level constructs when they are available to you.)
To get the data into a std::string, use the constructor or assignment operator:
std::string s( buffer.data, buffer.size );
You can even stick it in a std::stringstream so you can treat the data buffer like a file:
std::istringstream ss( s );
int n;
ss >> n;
Things work similarly for the MFC string class.
To get the data from a string, you'll need to copy it over. Ideally, you'll be able to allocate the data's memory. Assuming you have data written into a stringstream:
std::ostringstream ss;
ss << name << "," << employee_number;
You can then allocate the space you need using the function that creates the data_buffer object:
function_that_creates_a_data_buffer( buffer, ss.str().size() );
If there is no such function (there ought to be!) you must malloc() or new it yourself, as appropriate:
buffer.size = ss.str().size();
buffer.data = (char*)malloc( buffer.size );
Now just copy it:
ss.str().copy( buffer.data, buffer.size );
If your buffer needs a null-terminator (I have so far assumed it doesn't), make sure to add one to the size you allocate and set the last character to zero.
buffer.size = ss.str().size() + 1;
buffer.data = new char[ buffer.size ];
ss.str().copy( buffer.data, buffer.size );
buffer.data[ buffer.size-1 ] = 0;
Make sure to look at the documentation for the various classes you will use.
Hope this helps.
A variable of type char* is actually a pointer to memory. Your struct contains data which is of type char* so it is a pointer to memory. (I suggest writing char* data instead of char *data, to help keep this clear.)
So you can use it as a starting point to look at your data. You can use another pointer to walk over the buffer.
char* bufferInspectorPointer;
bufferInspectorPointer = buffer.data;
bufferInspectorPointer will now point to the first byte of the buffer's data and
*bufferInspectorPointer
will return the contents of the byte.
bufferInspectorPointer++
will advance the pointer to the next byte in the buffer.
You can do arithmetic with pointers in C++, so
bufferInspectorPointer - buffer.data
will tell you how many bytes you have covered. You can compare it to buffer.size to see how far you have left to go.
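Putting those pieces together, a pointer walk over the question's data_buffer might look like this sketch; countByte is a hypothetical example of doing something with each byte.

```cpp
#include <cassert>
#include <cstddef>

// The buffer struct from the question.
typedef struct
{
    std::size_t size;
    char *data;
} data_buffer;

// Walks the buffer one byte at a time with a second pointer and counts how
// many bytes equal 'target'. (ptr - buffer.data) is the number of bytes covered so far.
std::size_t countByte(const data_buffer &buffer, char target)
{
    std::size_t count = 0;
    const char *ptr = buffer.data;
    while (static_cast<std::size_t>(ptr - buffer.data) < buffer.size) {
        if (*ptr == target)   // read the byte the pointer currently refers to
            ++count;
        ++ptr;                // advance to the next byte
    }
    return count;
}
```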
Since you tagged this as C++ I'd recommend using algorithms. You can get your iterators by using buffer.data as start and buffer.data + buffer.size as end. So to copy the memory into a std::string you'd do something like so:
std::string str(buffer.data, buffer.data + buffer.size);
Or perhaps to append onto a string:
str.reserve(str.size() + buffer.size);
std::copy(buffer.data, buffer.data + buffer.size, std::back_inserter(str));
Of course you can always choose a different end, as long as it's not past buffer.data + buffer.size.
They are using a char array so that you can access each byte of the data buffer, since sizeof(char) is 1 by definition.
Reading the contents of the data buffer depends on the application. If you know how the internal data is encoded, you can write an unpacking function that selects chunks of the char array and converts them to the target variables.
For example, let's say the data buffer is actually a list of 4-byte integers.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (int argc, char const* argv[])
{
    //how the data buffer was probably filled
    int *a = (int *)malloc(10*sizeof(int));
    int i;
    for(i=0;i<10;i++) {
        a[i] = i;
    }
    char *data = (char *)a;
    //how we could read from the data buffer
    int *b = (int *)malloc(10*sizeof(int));
    char *p = data;
    for(i=0;i<10;i++) {
        //copy all sizeof(int) bytes; (int)*p would read only the first byte
        memcpy(&b[i], p, sizeof(int));
        printf("got value %d\n",b[i]);
        p += sizeof(int);
    }
    free(a);
    free(b);
    return 0;
}
Note: That being said, since this is C++, it would be much safer if we could avoid using char pointers and work with strings or vectors. Other answers have explored other options of how to handle such buffers properly in C++.
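A C++ version of the same unpacking, as a sketch: it assumes the ints were written in this machine's native byte order and size, and uses std::vector with memcpy so no char-to-int cast is involved. unpackInts is a hypothetical name.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Reinterprets a byte buffer as consecutive ints in native byte order.
// memcpy copies each int's bytes, so the buffer needs no special alignment.
std::vector<int> unpackInts(const std::vector<char> &bytes)
{
    std::vector<int> values(bytes.size() / sizeof(int)); // trailing partial bytes are ignored
    if (!values.empty())
        std::memcpy(values.data(), bytes.data(), values.size() * sizeof(int));
    return values;
}
```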

memcpy behaving in an unexpected way

Here is my sample code:
int function1(unsigned char *out, int length){
unsigned long crypto_out_len = 16;
unsigned char crypto_out[16] = {0};
.......
//produces 16 bytes output & stores in crypto_out
crypto_function(crypto_out, crypto_out_len);
//lets say crypto_output contents after are : "abcdefghijklmnop"
.......
memcpy(out, crypto_out,length);
return 0;
}
function2(){
unsigned char out[10] = {0};
function1(out, 10);
std::pair<unsigned char *,int> map_entry;
map_entry.first = out;
map_entry.second = 10;
}
Now, map_entry.first should contain : "abcdefghij", right?
But it contains "abcdefghij#$%f1^", some garbage associated with it. How should I avoid such unexpected behaviour so that map_entry.first should contain exactly "abcdefghij".
Since you haven't pasted the whole code, I can't be 100% sure, but I think I know what's wrong. memcpy() is behaving correctly here; the problem is in how out is read afterwards.
In this case, out is a 10-character string without a null terminator. You assign it to unsigned char* that contains no length information, and I suspect you simply don't use the number ten when you are referring to map_entry.first.
If you print it as an unsigned char* or construct a std::string from it, it is treated as a null-terminated string, so it is read up to the first null character. Since out doesn't contain one, the read runs past the end of out and picks up whatever bytes happen to follow it on the stack, which is the garbage you see. (Strictly speaking, reading past the end of the array is undefined behavior.)
What you need to do is make sure that either the string is null-terminated or that you always refer to it with the correct length. For the former, you'd want to make out 11 bytes long and leave the last byte as 0:
void function2(){
unsigned char out[11] = {0};
function1(out, 10);
std::pair<unsigned char *,int> map_entry;
map_entry.first = out;
map_entry.second = 10;
}
Please also note that functions expecting a null-terminated string stop at the first null character they encounter. If your crypto_function() may output zero bytes in the middle of the data, the string will be truncated at that point.
For the latter, you'd have to use functions that actually allow you to specify the string length, and always pass the length of 10 to those. If you always work with it like this, you don't have to worry about zero bytes from crypto_function().
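For the length-based approach, std::string has a constructor that takes a pointer and a length; it copies exactly that many bytes and ignores any '\0'. A minimal sketch, with bytesToString as a hypothetical helper (you would pass map_entry.first and map_entry.second):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a std::string from exactly 'length' bytes. Unlike the const char*
// constructor, this overload does not look for a '\0', so it neither runs
// past the end of 'out' nor stops early at a zero byte.
std::string bytesToString(const unsigned char *out, std::size_t length)
{
    return std::string(reinterpret_cast<const char *>(out), length);
}
```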
You are confusing char[] with strings. out does contain your expected data, but it's not 0-terminated, so if you try to display it as a string it may look like it contains extra data. If the data really is a string, you need to 0-terminate it correctly.

Difference between using character pointers and character arrays

Basic question.
char new_str[]="";
char * newstr;
If I have to concatenate some data into it or use string functions like strcat/substr/strcpy, what's the difference between the two?
I understand I have to allocate memory to the char * approach (Line #2). I'm not really sure how though.
Are const char * and string literals the same?
I need to know more on this. Can someone point to some nice exhaustive content/material?
An excellent source to clear up the confusion is Peter van der Linden's Expert C Programming: Deep C Secrets. The key point is that arrays and pointers are not the same: they differ in how they are addressed in memory.
With an array, char new_str[];, the compiler gives new_str a fixed location, e.g. 0x1234, so indexing new_str with [] is simple. For new_str[4], the code at run time takes the address where new_str resides, 0x1234, adds the index (0x1234 + 0x4), and retrieves the value there.
With a pointer, the compiler gives the symbol char *newstr an address, e.g. 0x9876, but accessing the data uses indirect addressing. Suppose newstr was allocated with newstr = malloc(10);. The compiler knows where newstr itself lives (0x9876), but what newstr points to is variable. At run time, the code first fetches the value stored at 0x9876, which is another address (the one malloc returned), e.g. 0x8765, and only then fetches the data from 0x8765.
char new_str[] and char *newstr can often be used interchangeably, because in most expressions an array name decays into a pointer to its zeroth element. That explains why you can write either newstr[5] or *(newstr + 5), and likewise why new_str[1] and *(new_str + 1) are equivalent even though new_str was declared as an array.
In summary, the real difference between the two is how they are accessed in memory.
Get the book, read it, live it and breathe it. It's a brilliant book! :)
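A few compile-time checks make the array/pointer distinction visible. This sketch is only an illustration; the names arr, ptr, and sameElement are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr char arr[] = "hello";   // an array: 6 bytes, 5 chars + '\0'
const char *const ptr = arr;      // a pointer: the array decays to &arr[0]

// sizeof sees the whole array...
static_assert(sizeof(arr) == 6, "array size includes the terminator");
// ...but for a pointer it only sees the pointer itself.
static_assert(sizeof(ptr) == sizeof(const char *), "pointer size, not string size");

// a[i] is defined as *(a + i), so indexing reads the same byte either way.
bool sameElement(std::size_t i)
{
    return arr[i] == *(arr + i) && ptr[i] == *(ptr + i) && arr[i] == ptr[i];
}
```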
Please go through this article below:
Also note that with a char array like yours, char new_str[], new_str always refers to the base of the array; it cannot be incremented. You can, however, use subscripts to access other characters in the array, e.g. new_str[3].
With a pointer to char, on the other hand, the pointer itself can be incremented (newstr++) to reach the next character in the array.
Also I would suggest this article for more clarity.
This is a character array:
char buf [1000];
So, for example, this makes no sense:
buf = &some_other_buf;
This is because buf, though it behaves like a pointer in many expressions, already refers to the only place that makes sense for it.
char *ptr;
On the other hand, ptr is only a pointer, and may point somewhere. Most often, it's something like this:
ptr = buf; // #1: point to the beginning of buf, same as &buf[0]
or maybe this:
ptr = malloc (1000); // #2: allocate heap and point to it
or:
ptr = "abcdefghijklmn"; // #3: string constant
In the first two cases, *ptr can be written to. In the third case, modifying a string literal is undefined behavior, and many environments place string constants in read-only memory.
*ptr++ = 'h'; // writes into #1: buf[0], #2: first byte of heap, or
// #3 overwrites "a"
strcpy (ptr, "ello"); // finishes writing hello and adds a NUL
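The difference is that one is a pointer and the other is an array. For instance, sizeof applied to an array gives the size of the whole array, while sizeof applied to a pointer gives only the size of the pointer. You may be interested in peeking here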
If you're using C++ as your tags indicate, you really should be using the C++ strings, not the C char arrays.
The string type makes manipulating strings a lot easier.
If you're stuck with char arrays for some reason, the line:
char new_str[] = "";
allocates 1 byte of space and puts a null terminator character into it. It's subtly different from:
char *new_str = "";
since that may give you a reference to non-writable memory (in C++11 and later, initializing a non-const char* from a string literal is not allowed at all). The statement:
char *new_str;
on its own gives you a pointer but nothing for it to point to. If it's local to a function, its value is indeterminate.
What people tend to do (in C rather than C++) is to do something like:
char *new_str = malloc (100); // (remember that this has to be freed) or
char new_str[100];
to get enough space.
If you use the str... functions, you're responsible for ensuring that the char array has enough space, lest you get all sorts of weird and wonderful debugging practice. If you use real C++ strings, a lot of the grunt work is done for you.
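To illustrate the difference for concatenation, here is a sketch of a bounds-checked strcat on a fixed char array next to the std::string equivalent. appendC and appendCpp are hypothetical names, and the capacity check is one possible policy (refuse rather than truncate).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// C style: the caller owns the buffer and must make sure it is big enough.
// This helper checks the remaining capacity before calling strcat.
bool appendC(char *dest, std::size_t capacity, const char *src)
{
    std::size_t used = std::strlen(dest);
    std::size_t extra = std::strlen(src);
    if (used + extra + 1 > capacity)   // +1 for the terminating '\0'
        return false;                  // would overflow: refuse instead
    std::strcat(dest, src);            // safe here: room was checked above
    return true;
}

// C++ style: std::string grows as needed, so there is no capacity to track.
std::string appendCpp(std::string dest, const std::string &src)
{
    dest += src;
    return dest;
}
```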
The type of the first is char[1], the second is char *. Different types.
Allocate memory for the latter with malloc in C, or new in C++.
char foo[] = "Bar"; // Allocates 4 bytes and fills them with
// 'B', 'a', 'r', '\0'.
The size here is implied from the initializer string.
The contents of foo are mutable. You can change foo[i] for example where i = 0..3.
OTOH if you do:
char *foo = "Bar";
The compiler now places the string literal "Bar" in static storage, typically in read-only memory, and it must not be modified.
foo[i] = 'X'; // is now undefined.
char new_str[]="abcd";
This defines a character array (a string) of 5 bytes (one byte per character plus one for the null terminator). It stores the string 'abcd' in memory, and we can access it using the variable new_str.
char *new_str="abcd";
This specifies a string 'abcd' is stored somewhere in the memory and the pointer new_str points to the first character of that string.
To differentiate them in the memory allocation side:
// With a char array, "hello" is copied into the array (on the stack, if s is local)
char s[] = "hello";
// With a char pointer, "hello" is stored in the read-only data segment and s points to it
char *s = "hello";
// To allocate a string on the heap, malloc 6 bytes (5 characters plus the NUL byte) and copy into it
char *s = malloc(6);
strcpy(s, "hello"); // s = "hello"; would leak the buffer and point s at the literal
If you're using C++, why not use std::string for all your string needs, especially anything involving concatenation? This will save you a lot of problems.

Dereferencing Variable Size Arrays in Structs

Structs seem like a useful way to parse a binary blob of data (i.e., a file or network packet). This is fine and dandy until you have variable-size arrays in the blob. For instance:
struct nodeheader{
int flags;
int data_size;
char data[];
};
This allows me to find the last data character:
nodeheader b;
cout << b.data[b.data_size-1];
Problem being, I want to have multiple variable length arrays:
struct nodeheader{
int friend_size;
int data_size;
char data[];
char friend[];
};
I'm not manually allocating these structures. I have a file like so:
char file_data[1024];
nodeheader* node = &(file_data[10]);
I'm trying to parse a binary file (more specifically, a class file). I've written an implementation in Java (which was my class assignment), and now I'm doing a personal version in C++ and was hoping to get away without writing 100 lines of code. Any ideas?
Thanks,
Stefan
You cannot have multiple variable sized arrays. How should the compiler at compile time know where friend[] is located? The location of friend depends on the size of data[] and the size of data is unknown at compile time.
This is a very dangerous construct, and I'd advise against it. You can only include a variable-length array in a struct when it is the LAST element, and when you do so, you have to make sure you allocate enough memory, e.g.:
nodeheader *nh = (nodeheader *)malloc(sizeof(nodeheader) + max_data_size);
What you want to do is just use regular dynamically allocated arrays:
#include <stdlib.h>
struct nodeheader
{
    char *data;
    size_t data_size;
    char *friend_data; // 'friend' is a reserved keyword in C++
    size_t friend_size;
};
nodeheader AllocNodeHeader(size_t data_size, size_t friend_size)
{
    nodeheader nh;
    nh.data = (char *)malloc(data_size); // check for NULL return
    nh.data_size = data_size;
    nh.friend_data = (char *)malloc(friend_size); // check for NULL return
    nh.friend_size = friend_size;
    return nh;
}
void FreeNodeHeader(nodeheader *nh)
{
    free(nh->data);
    nh->data = NULL;
    free(nh->friend_data);
    nh->friend_data = NULL;
}
You can't, at least not in the simple way that you're attempting. The unsized array at the end of a structure is basically an offset to the end of the structure, with no built-in way to find its end.
All the fields are converted to numeric offsets at compile time, so they need to be calculable at that time.
The answers so far are seriously over-complicating a simple problem. Mecki is right about why it can't be done the way you are trying to do it, however you can do it very similarly:
struct nodeheader
{
    int friend_size;
    int data_size;
};
struct nodefile
{
    nodeheader *header;
    char *data;
    char *friend_data; // 'friend' is a reserved keyword in C++
};
char file_data[1024];
// .. file in file_data ..
nodefile file;
file.header = (nodeheader *)&file_data[0];
file.data = (char *)&file.header[1];                   // data starts right after the header
file.friend_data = &file.data[file.header->data_size]; // friend data follows the data
For what you are doing, you need an encoder/decoder for the format. The decoder takes the raw data and fills out your structure (in your case allocating space for a copy of each section of the data), and the encoder writes raw binary.
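A minimal encoder/decoder pair, as a sketch under assumptions: the layout [friend_size][data_size][data bytes][friend bytes] follows the field order in the question, sizes are 32-bit big-endian (the byte order Java class files use for multi-byte values), and Node, encodeNode, and decodeNode are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded form: each section owns a copy of its bytes.
struct Node
{
    std::vector<unsigned char> data;
    std::vector<unsigned char> friend_buf; // 'friend' is a C++ keyword
};

// Big-endian 32-bit helpers.
static void putU32(std::vector<unsigned char> &out, std::uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>(v >> shift));
}

static bool getU32(const std::vector<unsigned char> &in, std::size_t &pos, std::uint32_t &v)
{
    if (in.size() - pos < 4)
        return false;
    v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | in[pos++];
    return true;
}

// Encoder: [friend_size][data_size][data bytes][friend bytes]
std::vector<unsigned char> encodeNode(const Node &n)
{
    std::vector<unsigned char> out;
    putU32(out, static_cast<std::uint32_t>(n.friend_buf.size()));
    putU32(out, static_cast<std::uint32_t>(n.data.size()));
    out.insert(out.end(), n.data.begin(), n.data.end());
    out.insert(out.end(), n.friend_buf.begin(), n.friend_buf.end());
    return out;
}

// Decoder: reads both sizes, then copies each section. Returns false on truncated input.
bool decodeNode(const std::vector<unsigned char> &in, Node &n)
{
    std::size_t pos = 0;
    std::uint32_t friendSize, dataSize;
    if (!getU32(in, pos, friendSize) || !getU32(in, pos, dataSize))
        return false;
    if (in.size() - pos < static_cast<std::size_t>(dataSize) + friendSize)
        return false;
    auto it = in.begin() + static_cast<std::ptrdiff_t>(pos);
    n.data.assign(it, it + dataSize);
    n.friend_buf.assign(it + dataSize, it + dataSize + friendSize);
    return true;
}
```

Copying each section into its own vector means the decoded Node no longer depends on the raw buffer staying alive.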
(Was 'Use std::vector')
Edit:
On reading feedback, I suppose I should expand my answer. You can effectively fit two variable length arrays in your structure as follows, and the storage will be freed for you automatically when file_data goes out of scope:
struct nodeheader {
std::vector<unsigned char> data;
std::vector<unsigned char> friend_buf; // 'friend' is a keyword!
// etc...
};
nodeheader file_data;
Now file_data.data.size() etc. gives you the length, and &file_data.data[0] gives you a raw pointer to the data if you need it.
You'll have to fill file_data from the file piecemeal: read the length of each buffer, call resize() on the destination vector, then read in the data. (There are ways to do this slightly more efficiently, but in the context of disk file I/O I'm assuming it doesn't matter.)
Incidentally, the OP's technique is incorrect even for his 'fine and dandy' case with only one VLA at the end:
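The read-length, resize, read-data sequence might look like this sketch. The 32-bit native-order length prefix is an assumption about the file format, and readSection is a hypothetical name; any std::istream works, e.g. a std::ifstream opened with std::ios::binary.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream is handy for testing with in-memory bytes
#include <vector>

// Reads one length-prefixed section: a 32-bit length in native byte order
// (an assumed format), then that many bytes. Returns false if the stream ends early.
bool readSection(std::istream &in, std::vector<unsigned char> &dest)
{
    std::uint32_t length = 0;
    if (!in.read(reinterpret_cast<char *>(&length), sizeof length))
        return false;
    dest.resize(length);                       // make room first...
    if (length == 0)
        return true;
    return static_cast<bool>(                  // ...then read straight into the vector
        in.read(reinterpret_cast<char *>(dest.data()), length));
}
```

A real parser should also sanity-check the length against the file size before calling resize(), since a corrupt length could request a huge allocation.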
char file_data[1024];
nodeheader* node = &(file_data[10]);
There's no guarantee that file_data is properly aligned for the nodeheader type. Prefer to obtain file_data by malloc() - which guarantees to return a pointer aligned for any type - or else (better) declare the buffer to be of the correct type in the first place:
struct biggestnodeheader {
int flags;
int data_size;
char data[ENOUGH_SPACE_FOR_LARGEST_HEADER_I_EVER_NEED];
};
biggestnodeheader file_data;
// etc...
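Another way around the alignment problem, swapped in here rather than taken from the answer, is to memcpy the fixed-size header out of the byte buffer instead of casting a pointer into it. A sketch, assuming the header is just the two ints from the question (nodeheader_fixed and readHeaderAt are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumed fixed-size part of the header from the question (no flexible array).
struct nodeheader_fixed
{
    int flags;
    int data_size;
};

// Copies the header out of an arbitrary offset in a byte buffer. memcpy has
// no alignment requirement, unlike casting &file_data[10] to a struct pointer.
// 'data' is set to the variable-length bytes right after the fixed part.
bool readHeaderAt(const char *file_data, std::size_t file_size, std::size_t offset,
                  nodeheader_fixed &hdr, const char *&data)
{
    if (offset > file_size || file_size - offset < sizeof hdr)
        return false;
    std::memcpy(&hdr, file_data + offset, sizeof hdr);
    data = file_data + offset + sizeof hdr;
    if (hdr.data_size < 0 ||
        static_cast<std::size_t>(hdr.data_size) > file_size - offset - sizeof hdr)
        return false;                          // data would run past the buffer
    return true;
}
```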