VOLUME_BITMAP_BUFFER - Accessing the BYTE Buffer? - c++

As the title suggests, I'm doing some low-level volume access (as close to C code as possible) to make a disk clone program. I want to set all the free clusters to zero so that simple compression keeps the image small.
I've been beating my head against a wall forever trying to figure out why I can't get FSCTL_GET_VOLUME_BITMAP working properly... so if possible please don't link me to any external reading, as I've probably already been there and it's either been C#, a dead link, or lacked the explanation I am looking for.
I want to understand the buffer itself more than I need the actual code.
The simplest way I can ask is what is the proper way to read from an array with a length of [1] in C/C++ like the one used by VOLUME_BITMAP_BUFFER?
The only way I can even assign anything to it is by recreating it with my own huge buffer, and I still end up with errors, even after locking the volume in Recovery mode. As a side note, I do have all the permissions needed to access the raw disk.
I know I'm likely missing some fundamental basic in C++ that would allow me to read from the memory it's stored in, but I just can't figure it out without getting errors.
In case I happen to be looping through the bytes wrong which is causing my error, I added how I was doing it...although that still leaves me with the Buffer question.
I know you can call multiple times, but I have to assume it's not 8 bytes at a time.
Something like this (pardon my code..I typed this on my phone so it likely has errors)...I tried adding any relevant cause of failure in case, but the buffer is the real question.
#define BYTE_MASK 0x80
#define BITS_PER_BYTE 8
void foo() {
const int BUFFER_SIZE = 268435456;
struct {
LARGE_INTEGER StartingLcn;
LARGE_INTEGER BitmapSize;
BYTE Buffer[BUFFER_SIZE];
} volBuff;
// I want to use VOLUME_BITMAP_BUFFER
/* Part of a larger loop checking for errors and more data
BYTE Mask = 1;
BOOL b = DeviceIoControl(vol, FSCTL_GET_VOLUME_BITMAP, &lcnStart, sizeof(STARTING_LCN_INPUT_BUFFER), &volBuff, sizeof(volBuff), &dwRet, NULL);
*/
for (x = 0; x < (bmpLen / BITS_PER_BYTE);) {
if ((volBuff.Buffer[x] & Mask) != 0) {
NotFree++;
} else {
FreeSpc++;
}
// I did try not dividing the size
if (Mask == BYTE_MASK) {
Mask = 1;
x++;
} else {
Mask = (Mask << 1);
}
}
return;
}
I've literally put an entire project on hold for I don't even know how long just being stubborn at this point...and I can't find any answer that actually explains the buffer, forget the rest of the process.
If someone wants to be more thorough I won't complain after all my attempts, but the buffer is driving me crazy at this point.
Any help would be greatly appreciated.

I can safely assume that the one answer I was given:
"'...array with a length of [1]...' There is no way in Standard C++ of accessing the additional bytes. You can either: (a) pray that your compiler can do this (via an extension) or (b) write a C module (where this is well defined) that you can call from C++." - Richard Critton
is as correct an answer as I can expect after my extensive attempts to make this work any other way, especially since I was only able to make my own array work using standard C and not C++ directly.
I wanted to put a close on this since my computer is out of use for a bit.
If the problem continues after I dig through some examples for defragmenting in C that I FINALLY came across I'll ask a more direct question with full code to support it.
That answer was enough to remove the wall I had hit and get me thinking again. I thank you for that.
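For anyone landing here with the same question, here is a minimal sketch of one way to read the bits without ever indexing past Buffer[0]. It is not the asker's final code. BitmapHeader is a hypothetical stand-in with the same shape as VOLUME_BITMAP_BUFFER so the example compiles anywhere; in a real program the raw block would be the output buffer handed to DeviceIoControl. It assumes the whole bitmap fit into that block and that the lowest bit of each byte describes the lowest-numbered cluster.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in with the same shape as VOLUME_BITMAP_BUFFER.
struct BitmapHeader {
    std::int64_t StartingLcn;  // first cluster described by the bitmap
    std::int64_t BitmapSize;   // number of clusters (bits), not bytes
    unsigned char Buffer[1];   // first byte of the variable-length bitmap
};

struct ClusterCounts {
    std::uint64_t inUse;
    std::uint64_t free;
};

// `raw` must point at a block holding the header fields followed by at least
// (BitmapSize + 7) / 8 bitmap bytes (assumption: the whole bitmap fit).
ClusterCounts countClusters(const unsigned char* raw) {
    // Copy only the fixed-size fields out of the raw block.
    BitmapHeader header = {};
    std::memcpy(&header, raw, offsetof(BitmapHeader, Buffer));

    // Address the bitmap through the raw allocation, not through Buffer[1].
    const unsigned char* bits = raw + offsetof(BitmapHeader, Buffer);

    ClusterCounts counts = {0, 0};
    for (std::int64_t i = 0; i < header.BitmapSize; ++i) {
        // Assumed bit order: least significant bit first within each byte.
        const bool used = ((bits[i / 8] >> (i % 8)) & 1u) != 0;
        if (used) {
            ++counts.inUse;
        } else {
            ++counts.free;
        }
    }
    return counts;
}
```

The point is that the "array of 1" is only a marker for where the variable-length data starts. If you allocate the block yourself (malloc or a std::vector of bytes) and compute the bitmap pointer from its start, every byte you touch lies inside memory you own.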

Related

How to speed up program execution

This is a very simple question, but unfortunately, I am stuck and do not know what to do. My program is a simple program that keeps on accepting 3 numbers and outputs the largest of the 3. The program keeps on running until the user inputs a character.
As the title says, my question is how I can make this execute faster ( There will be a large amount of input data ). Any sort of help which may include using a different algorithm or using different functions or changing the entire code is accepted.
I'm not very experienced in C++ Standard, and thus do not know about all the different functions available in the different libraries, so please do explain your reasons and if you're too busy, at least try and provide a link.
Here is my code
#include<stdio.h>
int main()
{
int a,b,c;
while(scanf("%d %d %d",&a,&b,&c) == 3) /* also stops at end of input, where scanf returns EOF (non-zero) */
{
if(a>=b && a>=c)
printf("%d\n",a);
else if(b>=a && b>=c)
printf("%d\n",b);
else
printf("%d\n",c);
}
return 0;
}
Its working is very simple. The while loop will continue to execute until the user inputs a character. As I've explained earlier, the program accepts 3 numbers and outputs the largest. There is no other part of this code, this is all. I've tried to explain it as much as I can. If you need anything more from my side, please ask, ( I'll try as much as I can ).
I am compiling on an internet platform using CPP 4.9.2 ( That's what is said over there )
Any sort of help will be highly appreciated. Thanks in advance
EDIT
The input is made by a computer, so there is no delay in input.
Also, I will accept answers in c and c++.
UPDATE
I would also like to ask if there are any general library functions or algorithms, or any other sort of advice ( certain things we must do and what we must not do ) to follow to speed up execution ( Not just for this code, but in general ). Any help would be appreciated. ( and sorry for asking such an awkward question without giving any reference material )
Your "algorithm" is very simple and I would write it with the use of the max() function, just because it is better style.
But anyway...
What will take the most time is the scanf. This is your bottleneck. You should write your own read function which reads a huge block with fread and processes it. You may consider doing this asynchronously - but I wouldn't recommend this as a first step (some async implementations are indeed slower than the synchronous implementations).
So basically you do the following:
Read a huge block from file into memory (this is disk IO, so this is the bottleneck)
Parse that block and find your three integers (watch out for the block borders! the first two integers may lie within one block and the third lies in the next - or the block border splits your integer in the middle, so let your parser just catch those things)
Do your comparisons - those run extremely fast compared to the disk IO, so there is no need to improve them
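The three steps above could be sketched roughly like this. It is only an illustration, not a tuned implementation: maxOfTriples and its blockSize parameter are names made up for this example, it assumes plain decimal integers separated by whitespace, and it does no overflow checking. The parser state lives outside the block loop, which is what lets a number or a triple straddle a block border.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads whitespace-separated decimal integers from `in`
// in large fread() blocks and returns the maximum of every complete triple.
// Parsing stops at the first character that is not part of a number.
std::vector<int> maxOfTriples(std::FILE* in, std::size_t blockSize = 1 << 16) {
    std::vector<char> block(blockSize);
    std::vector<int> maxima;
    int vals[3];
    int count = 0;       // values collected for the current triple
    long long cur = 0;   // number currently being assembled
    bool inNumber = false;
    bool negative = false;

    // Finish the number in progress (if any); emit a maximum per full triple.
    auto finishNumber = [&]() {
        if (inNumber) {
            vals[count++] = static_cast<int>(negative ? -cur : cur);
            if (count == 3) {
                maxima.push_back(std::max(vals[0], std::max(vals[1], vals[2])));
                count = 0;
            }
        }
        cur = 0;
        inNumber = false;
        negative = false;
    };

    std::size_t n;
    while ((n = std::fread(block.data(), 1, block.size(), in)) > 0) {
        for (std::size_t i = 0; i < n; ++i) {
            const char ch = block[i];
            if (ch >= '0' && ch <= '9') {
                cur = cur * 10 + (ch - '0');
                inNumber = true;
            } else if (ch == '-' && !inNumber) {
                negative = true;
            } else if (ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r') {
                finishNumber();
            } else {
                finishNumber();
                return maxima;  // any other character ends the input
            }
        }
    }
    finishNumber();  // the last number may not be followed by whitespace
    return maxima;
}
```

Called as maxOfTriples(stdin), this replaces the scanf loop. The results could then be formatted into one large output buffer and written with a single fwrite, for the same reason.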
Unless you have a guarantee that the three input numbers are all different, I'd worry about making the program get the correct output. As noted, there's almost nothing to speed up, other than input and output buffering, and maybe speeding up decimal conversions by using custom parsing and formatting code, instead of the general-purpose scanf and printf.
Right now if you receive input values a=5, b=5, c=1, your code will report that 1 is the largest of those three values. Change the > comparisons to >= to fix that.
You can minimize the number of comparisons by remembering previous results. You can do this with:
int d;
if (a >= b)
if (a >= c)
d = a;
else
d = c;
else
if (b >= c)
d = b;
else
d = c;
[then output d as your maximum]
That does exactly 2 comparisons to find a value for d as max(a,b,c).
Your code uses at least two and maybe up to 4.

C++: Determining whether a variable contains no data

I've been messing around in C++ a little bit but I'm still pretty new. I searched around a little bit and even using the keywords of exactly the problem I am trying to tackle yields no results. Basically I am just trying to figure out how to tell if a variable has no data. I have a file that my program reads and it searches for a specific character within that file and basically uses delimiters to determine where to store the actual data in a variable. Now I added some comments in the file saying that it should not be edited which has caused me some problems. So I pretty much want to count the number of comments, but I'm not sure how to do it because the way I had it set up was resulting in huge numbers being returned. So I figured I would attempt to fix it with a simple if statement to see if there was any data in the array while it was running the loop, and if there was then simply add +1 to my variable. Needless to say it did not work. Here's the code. And if you know a better way of doing this, by all means please do share.
size_t arySearchData[20];
size_t commentLines[20];
size_t foundDelimiter;
size_t foundComment;
int commentsNum;
foundDelimiter = lineText.find("]");
foundComment = lineText.find("#");
if (foundComment != std::string::npos) {
commentLines[20] = int(foundComment);
if (foundComment = <PROBLEM>){
commentsNum++;
}
}
So it successfully gets the two comments in my file and recognizes that they are located at the first index(0) in each line but when I tried to have it just do commentsNum++ in my first if statement it just comes up with tons of random numbers, and I am not sure why. So as I said my problem is within the second if statement, I need a void or just a better way to solve this. Any help would be greatly appreciated.
And yes I do realize I could just determine if there 'was' data in the there rather than being void or null but then it would have to be specific and if the comment (#) had a space before it, then it would render my method of reading the file useless as the index will have changed.
A variable in C++ always contains data; it just may not be initialised.
int i;
It will have some value, what it is can't be determined until you do something like
i = 1337;
Until you do that, the value of i will be whatever happened to be in the memory location that i has been assigned to.
The compiler may pick up on the fact that you are trying to use a variable that you have not actually given a value yourself, but this will normally just be a warning, as there is nothing wrong as such with doing so.
You do not initialize commentsNum. Try this:
int commentsNum = 0;
In C++, variables other than those with static storage duration are left with indeterminate values unless you initialize them. This follows the underlying philosophy of "you don't pay for things you don't use", so the language doesn't zero that memory by default. Static variables, however, have their memory allocated at link time. Unlike runtime initialization, which would be needed for local variables, link-time allocation and initialization incur low cost.
Hence I would recommend setting int commentsNum = 0;
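As a rough sketch of how the counting could look once the counter is initialized: isCommentLine and countCommentLines are names invented for this example, and treating a line as a comment when its first non-blank character is '#' is an assumption about the file format. That check also covers the case from the question where a comment is preceded by spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if the first non-blank character of the line is
// '#'. Assumes that is what marks a comment in the file being read.
bool isCommentLine(const std::string& line) {
    const std::size_t first = line.find_first_not_of(" \t");
    return first != std::string::npos && line[first] == '#';
}

// Counts comment lines. The counter starts at a known value, so the result
// cannot pick up whatever happened to be in memory.
int countCommentLines(const std::vector<std::string>& lines) {
    int commentsNum = 0;
    for (const std::string& line : lines) {
        if (isCommentLine(line)) {
            ++commentsNum;
        }
    }
    return commentsNum;
}
```

With this shape there is no need to test whether a variable "has no data": std::string::npos already signals that nothing was found, and the counter is only incremented when something was.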

From where starts the process' memory space and where does it end?

On Windows platform, I'm trying to dump memory from my application where the variables lie. Here's the function:
void MyDump(const void *m, unsigned int n)
{
const unsigned char *p = reinterpret_cast<const unsigned char *>(m);
char buffer[16];
unsigned int mod = 0;
for (unsigned int i = 0; i < n; ++i, ++mod) {
if (mod % 16 == 0) {
mod = 0;
std::cout << " | ";
for (unsigned short j = 0; j < 16; ++j) {
switch (buffer[j]) {
case 0xa:
case 0xb:
case 0xd:
case 0xe:
case 0xf:
std::cout << " ";
break;
default: std::cout << buffer[j];
}
}
std::cout << "\n0x" << std::setfill('0') << std::setw(8) << std::hex << (long)i << " | ";
}
buffer[i % 16] = p[i];
std::cout << std::setw(2) << std::hex << static_cast<unsigned int>(p[i]) << " ";
if (i % 4 == 0 && i != 1)
std::cout << " ";
}
}
Now, how can I know at which address my process memory space starts, where all the variables are stored? And how do I know how long the area is?
For instance:
MyDump(0x0000 /* <-- Starts from here? */, 0x1000 /* <-- This much? */);
Best regards,
nhaa123
The short answer to this question is you cannot approach this problem this way. The way processes are laid out in memory is very much compiler and operating system dependent, and there is no easy way to determine where all of the code and variables lie. To accurately and completely find all of the variables, you'd need to write large portions of a debugger yourself (or borrow them from a real debugger's code).
But, you could perhaps narrow the scope of your question a little bit. If what you really want is just a stack trace, those are not too hard to generate: How can one grab a stack trace in C?
Or if you want to examine the stack itself, it is easy to get a pointer to the current top of the stack (just declare a local variable and then take its address). The easiest way to get the bottom of the stack is to declare a variable in main, store its address in a global variable, and use that address later as the "bottom" (this is easy but not really 'clean').
Getting a picture of the heap is a lot lot lot harder, because you need extensive knowledge of the internal workings of the heap to know which pieces of it are currently allocated. Since the heap is basically "unlimited" in size, that's quite a lot of data to print if you just print all of it, even the unused parts. I don't know of a way to do this, and I would highly recommend you not waste time trying.
Getting a picture of static global variables is not as bad as the heap, but also difficult. These live in the data segments of the executable, and unless you want to get into some assembly and parsing of executable formats, just avoid doing this as well.
Overview
What you're trying to do is absolutely possible, and there are even tools to help, but you'll have to do more legwork than I think you're expecting.
In your case, you're particularly interested in "where the variables lie." The system heap API on Windows will be an incredible help to you. The reference is really quite good, and though it won't be a single contiguous region the API will tell you where your variables are.
In general, though, not knowing anything about where your memory is laid out, you're going to have to do a sweep of the entire address space of the process. If you want only data, you'll have to do some filtering of that, too, because code and stack nonsense are also there. Lastly, to avoid seg-faulting while you dump the address space, you may need to add a segfault signal handler that lets you skip unmapped memory while you're dumping.
Process Memory Layout
What you will have, in a running process, is multiple disjoint stretches of memory to print out. They will include:
Compiled code (read-only),
Stack data (local variables),
Static Globals (e.g. from shared libraries or in your program), and
Dynamic heap data (everything from malloc or new).
The key to a reasonable dump of memory is being able to tell which range of addresses belongs to which family. That's your main job, when you're dumping the program. Some of this, you can do by reading the addresses of functions (1) and variables (2, 3 and 4), but if you want to print more than a few things, you'll need some help.
For this, we have...
Useful Tools
Rather than just blindly searching the address space from 0 to 2^64 (which, we all know, is painfully huge), you will want to employ OS and compiler developer tools to narrow down your search. Someone out there needs these tools, maybe even more than you do; it's just a matter of finding them. Here are a few of which I'm aware.
Disclaimer: I don't know many of the Windows equivalents for many of these things, though I'm sure they exist somewhere.
I've already mentioned the Windows system heap API. This is a best-case scenario for you. The more things you can find in this vein, the more accurate and easy your dump will be. Really, the OS and the C runtime know quite a bit about your program. It's a matter of extracting the information.
On Linux, memory types 1 and 3 are accessible through utilities like /proc/pid/maps. In /proc/pid/maps you can see the ranges of the address space reserved for libraries and program code. You can also see the protection bits; read-only ranges, for instance, are probably code, not data.
For Windows tips, Mark Russinovich has written some articles on how to learn about a Windows process's address space and where different things are stored. I imagine he might have some good pointers in there.
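To make the /proc/pid/maps idea a bit more concrete, here is a small Linux-only sketch (it does not apply to Windows). Region, parseMapsLine and readOwnRegions are names invented for this example, and the parser only extracts the address range and the protection bits from each line.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// One mapped range of the address space (illustrative type).
struct Region {
    unsigned long long start;
    unsigned long long end;
    std::string perms;  // e.g. "r-xp" for read + execute, private
};

// Parses the leading "start-end perms" part of one maps line.
bool parseMapsLine(const std::string& line, Region& out) {
    unsigned long long start = 0, end = 0;
    char perms[5] = {0};
    if (std::sscanf(line.c_str(), "%llx-%llx %4s", &start, &end, perms) != 3) {
        return false;
    }
    out.start = start;
    out.end = end;
    out.perms = perms;
    return true;
}

// Reads the mappings of the current process; returns an empty vector on
// systems without /proc/self/maps.
std::vector<Region> readOwnRegions() {
    std::vector<Region> regions;
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        Region r;
        if (parseMapsLine(line, r)) {
            regions.push_back(r);
        }
    }
    return regions;
}
```

A dump routine could then walk only the ranges whose perms start with 'r', and skip ranges with an 'x' if it wants data rather than code.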
Well, you can't, not really... at least not in a portable manner. For the stack, you could do something like:
void* ptr_to_start_of_stack = 0;
int main(int argc, char* argv[])
{
int item_at_approximately_start_of_stack;
ptr_to_start_of_stack = &item_at_approximately_start_of_stack;
// ...
// ... do lots of computation
// ... a function called here can do something similar, and
// ... attempt to print out from ptr_to_start_of_stack to its own
// ... approximate start of stack
// ...
return 0;
}
In terms of attempting to look at the range of the heap, on many Unix-like systems you could use the sbrk() function (specifically sbrk(0)) to get a pointer to the current end of the heap (the program break). Typically the heap grows upward from the end of the data segment, while the stack grows downward from near the top of the address space.
That said, this is a really bad idea. Not only is it platform dependent, but the information you can get from it is really not as useful as good logging. I suggest you familiarize yourself with Log4Cxx.
Good logging practice, in addition to the use of a debugger such as GDB, is really the best way to go. Trying to debug your program by looking at a full memory dump is like trying to find a needle in a haystack, and so it really is not as useful as you might think. Logging where the problem might logically be, is more helpful.
AFAIK, this depends on OS, you should look at e.g. memory segmentation.
Assuming you are running on a mainstream operating system, you'll need help from the operating system to find out which addresses in your virtual memory space have mapped pages. For example, on Windows you'd use VirtualQueryEx(). The memory dump you'll get can be as large as two gigabytes, so it isn't likely that you'll discover anything recognizable quickly.
Your debugger already allows you to inspect memory at arbitrary addresses.
You can't, at least not portably. And you can't make many assumptions either.
Unless you're running this on CP/M or MS-DOS.
But with modern systems, the wheres and hows of where your data and code are located, in the generic case, aren't really up to you.
You can play linker games and such to get better control of the memory map for your executable, but you won't have any control over, say, any shared libraries you may load, etc.
There's no guarantee that any of your code, for example, is even in a contiguous space. The virtual memory system and loader can place code pretty much where they want. Nor is there any guarantee that your data is anywhere near your code. In fact, there's no guarantee that you can even READ the memory space where your code lives. (Execute, yes. Read, maybe not.)
At a high level, your program is split into 3 sections: code, data, and stack. The OS places these where it sees fit, and the memory manager controls what and where you can see stuff.
There are all sorts of things that can muddy these waters.
However.
If you want.
You can try having "markers" in your code. For example, put a function at the start of your file called "startHere()" and then one at the end called "endHere()". If you're lucky, for a single file program, you'll have a contiguous blob of code between the function pointers for "startHere" and "endHere".
Same thing with static data. You can try the same concept if you're interested in that at all.

Really strange problem with just in time debugging and filestreams

I developed a small program which was working fine until I made a really minor change in some unrelated part of the code. From that point onwards the program throws an unhandled win32 exception and Microsoft Visual Studio Just in time debugger kicks in.
I am using codeblocks and my compiler is the gcc compiler. What is frustrating is that the program works fine if I choose to debug from codeblocks with the gdb. This is what does not make sense to me.
Since I can not debug with gdb to see what's wrong (because it runs fine in debugging mode), I put printfs here and there to find the root of it all. I isolated in one function but it just does not make sense.
bool FileReader::readBitmap(int fileNum)
{
char check;
int dataOffset;
int dataSize;
string fileName;
//used for quick int to string conversion
std::ostringstream stringstream;
stringstream<<fileNum;
string fileNumber = stringstream.str();
fileName = "img"+fileNumber+".bmp";
ifstream stream(fileName.c_str(),ios::in|ios::binary);
stream.read(&check,1);
//checking if it is a bitmap file
if(check != 'B')
return false;
stream.read(&check,1);
if(check != 'M')
return false;
stream.seekg(BMPBPP);
stream.read(&check,1);
//if it is not a monochrome bitmap
if(((int)check) != 1)
return false;//quit
//get the dataoffset
stream.seekg(DATAOFFSET);
stream.read(&check,1);
dataOffset = (int)check;
//get the data size in bytes
stream.seekg(DATASIZEINBYTES);
stream.read(&check,1);
dataSize = (int)check;
//if this is the first image we read
if(firstImageRead)
{
//allocate the image buffer
imgBuffer = (char*) malloc(dataSize);
//and make sure it does not get re-allocated
firstImageRead = false;
}
//get the actual bitmap data
stream.seekg(dataOffset);
stream.read(imgBuffer,dataSize);
stream.close();
return true;
}
-BIG- EDIT: Trying to find what the problem could be I moved the ifstream from the function to being a private member of the class. And the function now does EXACTLY the same only that it uses stream.open() to open the file.
Now it works with no problems. So the problem lies somehow ... in the ifstream being initialized every time inside the function, as opposed to just being used inside the function. Still ... does not make sense and this should not have occurred.
I am really intrigued to find what the problem was here?
Honestly does anyone have any idea what this could be attributed to?
A few points to investigate:
Is firstImageRead initialized to true?
The rest of the code doesn't know how big imgBuffer is, so further processing is probably reading beyond the end of the buffer. How does the rest of your code determine how much data to read from imgBuffer?
If dataSize of any image is bigger than for the first one, imgBuffer will be too small.
If the character you read at position DATASIZEINBYTES happens to be negative, you will try to malloc() about 2GB.
Sidenote: Is it correct, that you read only one byte for the image size? Are the images that small?
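On that last point, a sketch of reading a whole 4-byte field instead of a single char might look like the following. readLE32 is a name invented for this example, and it assumes DATAOFFSET and DATASIZEINBYTES point at 4-byte little-endian fields, as they do in the standard BMP headers. Going through unsigned char also avoids the negative-value problem mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a 32-bit little-endian unsigned value at
// `offset`. Returns false (and leaves `value` untouched) if the stream does
// not contain four bytes there.
bool readLE32(std::istream& in, std::streamoff offset, std::uint32_t& value) {
    unsigned char bytes[4];
    in.seekg(offset);
    in.read(reinterpret_cast<char*>(bytes), 4);
    if (!in) {
        return false;
    }
    // Assemble the value byte by byte so the result does not depend on the
    // byte order of the machine running the code.
    value = static_cast<std::uint32_t>(bytes[0]) |
            (static_cast<std::uint32_t>(bytes[1]) << 8) |
            (static_cast<std::uint32_t>(bytes[2]) << 16) |
            (static_cast<std::uint32_t>(bytes[3]) << 24);
    return true;
}
```

In readBitmap this would be used along the lines of readLE32(stream, DATASIZEINBYTES, dataSize) with dataSize declared as std::uint32_t, and the function would return false when the read fails.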
It would help if you showed the exception and callstack of the failure.
My guess would be there is a sharing violation, you're opening the file again before it was closed. In the cases where you exit early, the file is not being closed.
Are you sure the file exists and that you are reading from the stream ok?
What about that global 'imgBuffer', make sure nothing is calling free on it between several calls to FileReader::readBitmap()
Hello and thanks for your input, but unfortunately the file does exist and the only time this buffer is freed is at the end of the program. It has to be this function, since if I empty it and just return true;, the program runs fine ... but has no input (no file read).
Make sure you are allocating enough space for the buffer. Remember that the malloc() is only called the first time you call this function. If dataSize is say 1000 the first time, and if you on the next file want to read 2000 bytes, dataSize will be 2000, but the buffer is only allocated for 1000 bytes.
I am just answering my own question since I found what the problem was. As most people guessed it was indeed a pointer index going out of bounds by just +1. What made it really hard to spot was that the debugger pointed me to a totally different direction.
That also explains why adding one more private member in the class 'fixed' the problem. It allocated more memory for the fileReader object, and writing out of bounds wrote on the memory occupied by the additional private member and did not cause an unhandled exception.
What do we learn from all this? Be very very careful when setting indices .... since well this is not the first time this happened to me :)

MD5 Code Coverage

I'm currently implementing an MD5 hash algorithm based on the RSA Data Security code. In the UpdateData method there is a section which reads:
mCount[0] += (length << 3);
if (mCount[0] < (length << 3))
{
mCount[1]++;
}
I'm trying at the moment to understand how the if statement could ever evaluate to true (the mCount[0] value is initialised to 0). Any help would be greatly appreciated.
Thanks
It can happen if there is an overflow of the mCount[0] variable.
unsigned int i = 4294967295;//2^32-1
unsigned int j = 1;
i += j;
assert(i < j);
The block of code you mentioned is probably called multiple times, depending on how much data there is to process. So mCount[0] will eventually overflow.
This is for carry propagation, the sum of length*8 is stored in two 32-bit words (here, mCount is likely an array of unsigned int) mCount[1]:mCount[0].
lo += a
if (lo < a) hi++; // true if overflow occurs: lo + a >= 2^32
is equivalent to 64-bit operation:
(hi:lo) += (0:a)
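A small self-contained illustration of that equivalence, written as a sketch rather than as the actual MD5 source: addBitCount and combined are names invented for this example, and lengthBytes is assumed to be below 2^29 so that the shift by 3 does not itself lose bits.

```cpp
#include <cassert>
#include <cstdint>

// Adds lengthBytes * 8 to a 64-bit bit counter stored as two 32-bit words:
// count[0] is the low word, count[1] the high word.
void addBitCount(std::uint32_t count[2], std::uint32_t lengthBytes) {
    const std::uint32_t bits = lengthBytes << 3;
    count[0] += bits;      // unsigned arithmetic wraps around modulo 2^32
    if (count[0] < bits) { // wrapped: the sum is smaller than what was added
        ++count[1];        // carry into the high word
    }
}

// Reassembles the two words into one 64-bit value for comparison.
std::uint64_t combined(const std::uint32_t count[2]) {
    return (static_cast<std::uint64_t>(count[1]) << 32) | count[0];
}
```

Feeding it five chunks of 2^28 bytes, for example, gives combined(count) equal to 5 * 2^31, with the if branch taken on the second and fourth call.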
This will happen whenever mCount[0] is negative before the addition and if the addition itself does not overflow.
I know this is not technically an answer to your programming question, but I honestly think the most valuable advice I can give you is that you should use a widely-used and well-vetted MD5 (or any other crypto) algorithm, DO NOT roll your own. How can I put this delicately... this advice is doubly true if you are asking questions about integer math. The road to hell is littered with the bodies of people who tried to implement tricky crypto themselves without fully understanding what they were doing, and ended up leaving gaping security holes in the process. Be smart, use somebody else's debugged implementation, use your own valuable time to implement parts of the system you can't get from someplace else.
As Eric and Brian point out, this happens when mCount[0] overflows, which happens every 512 MB (2^29 bytes), so if you hash files/data streams larger than 512 MB you will see this code trigger.
Using two 32-bit counters allows for 2^61 bytes of input before the counter truly overflows.