A few months ago I read a book on security practices, and it suggested the following method for protecting our classes from being overwritten by e.g. buffer overflows:
first, define a magic number and a fixed-size array (it can be a simple integer too)
use that array to hold the magic number, and place one copy at the top and one at the bottom of our class
a function compares these two values; if they are equal to each other and to the static magic value, the class is OK and it returns true, otherwise the class is corrupt and it returns false
place this function at the start of every other class method, so the validity of the class is checked on every function call
it is important to place these arrays at the very start and the very end of the class
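In code, I imagine it would look something like this (a minimal sketch; the names and the use of a single uint32_t canary are my guesses, not the book's exact code):

#include <cstdint>

class FileEncryptor {
    static const uint32_t kMagic = 0xDEADBEEF;  // the "magic number"
    uint32_t front_canary_ = kMagic;            // very first data member
    // ... the real members of the class go here ...
    uint32_t back_canary_ = kMagic;             // very last data member

    bool IsIntact() const {
        // Both canaries must still equal the static magic value (and each other).
        return front_canary_ == kMagic && back_canary_ == kMagic;
    }

public:
    void DoWork() {
        if (!IsIntact()) {
            // handle corruption: throw, log, abort...
            return;
        }
        // ... normal method body ...
    }
};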
At least this is as I remember it. I'm coding a file encryptor for learning purposes, and I'm trying to make this code exception safe.
So, in which scenarios is this method useful, when should I use it, or is it something totally useless to rely on? Does it depend on the compiler or the OS?
PS: I forgot the name of the book mentioned in this post, so I cannot check it again; if any of you know which one it was, please tell me.
What you're describing sounds like a canary, but implemented within your program rather than by the compiler. Compiler-inserted canaries are usually on by default when using gcc or g++ (plus a few other buffer overflow countermeasures).
If you're doing mutable operations on your class and you want to make sure you don't have side effects, I don't know if having a magic number is very useful. Why rely on a homebrew validity check when there are methods out there that are more likely to be successful?
Checksums: I think it'd be more useful for you to hash the unencrypted text and add that to the end of the encrypted file. When decrypting, remove the hash and compare the hash(decrypted text) with what it should be.
I think most, if not all, widely used encryptors/decryptors store some sort of checksum in order to verify that the data has not changed.
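As a rough illustration (FNV-1a here is just a stand-in for whatever hash you pick, and for real tamper protection you would want a cryptographic MAC rather than a plain hash):

#include <cstdint>
#include <string>

// Illustrative 64-bit FNV-1a hash of the plaintext.
uint64_t Fnv1a(const std::string& data) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (unsigned char c : data) {
        h ^= c;
        h *= 0x100000001b3ULL;
    }
    return h;
}

// On encryption: append Fnv1a(plaintext) to the output file.
// On decryption: strip the stored hash and compare it with a freshly computed one.
bool VerifyDecryption(const std::string& decrypted, uint64_t stored_hash) {
    return Fnv1a(decrypted) == stored_hash;
}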
This type of canary will partially protect you against a very specific type of overflow attack. You can make it a little more robust by randomizing the canary value every time you run the program.
If you're worried about buffer overflow attacks (and you should be if you are ever parsing user input), then go ahead and do this. It probably doesn't cost too much in speed to check your canaries every time. There will always be other ways to attack your program, and there might even be careful buffer overflow attacks that get around your canary, but it's a cheap measure to take so it might be worth adding to your classes.
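The per-run randomization mentioned above could be as simple as this (a minimal sketch; std::random_device and the variable name are my own choices):

#include <cstdint>
#include <random>

// Pick a fresh canary value once per run instead of using a compile-time
// constant, so an attacker cannot simply hard-code the expected value.
static const uint32_t kRuntimeCanary = std::random_device{}();

// The class constructor would then initialize front_canary_ and back_canary_
// from kRuntimeCanary instead of a fixed magic number.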
Related
I know that there are implementations of memcpy which copy memory in reverse order to optimize for some processors. At one time, the bug "Strange sound on mp3 flash website" was connected with that. Well, it was an interesting story, but my question is about another function.
I am wondering whether there is a memset function anywhere in the world that fills the buffer starting from the end. It is clear that in theory nothing prevents such an implementation; what interests me is whether this has actually been done in practice by someone, somewhere. I would be especially grateful for a link to a library with such a function.
P.S. I understand that in terms of application programming it makes absolutely no difference whether the buffer is filled in ascending or descending order. However, it is important for me to find out whether any "reverse" implementation of this function has ever existed. I need it for an article I am writing.
The Linux kernel's memset for the SuperH architecture has this property:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/sh/lib/memset.S?id=v4.14
Presumably it's done this way because the store instruction exists in pre-decrement form (mov.l Rm,@-Rn) but not in post-increment form. See:
http://shared-ptr.com/sh_insns.html
If you want something that's not technically kernel internals on a freestanding implementation, but an actual hosted C implementation that application code could get linked to, musl libc also has an example:
https://git.musl-libc.org/cgit/musl/tree/src/string/memset.c?id=v1.1.18
Here, the C version of memset (used on many but not all target archs) does not actually fill the whole buffer backwards, but rather starts from both the beginning and end in a manner that reduces the number of conditional branches and makes them all predictable for very small memsets. See the commit message where it was added for details:
https://git.musl-libc.org/cgit/musl/commit/src/string/memset.c?id=a543369e3b06a51eacd392c738fc10c5267a195f
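To give a rough idea of the approach (this is my own simplified illustration of "start from both ends", not musl's actual code):

#include <cstddef>

// Small sizes are covered entirely by unconditional head/tail stores, so the
// only branches are the size-threshold checks, which are easy to predict.
void* memset_sketch(void* dest, int c, size_t n) {
    unsigned char* s = static_cast<unsigned char*>(dest);
    unsigned char v = static_cast<unsigned char>(c);
    if (n == 0) return dest;
    s[0] = s[n - 1] = v;                 // handles n = 1..2
    if (n <= 2) return dest;
    s[1] = s[n - 2] = v;                 // together with the next line, handles n = 3..6
    s[2] = s[n - 3] = v;
    if (n <= 6) return dest;
    // A real implementation would switch to aligned word-sized stores here.
    for (size_t i = 3; i < n - 3; ++i) s[i] = v;
    return dest;
}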
Some of the arch-specific asm versions of memset also have this property:
https://git.musl-libc.org/cgit/musl/tree/src/string/x86_64/memset.s?id=v1.1.18
I'm writing a fairly straightforward function that sends an array over to a file descriptor. However, in order to send the data, I need to append a one byte header.
Here is a simplified version of what I'm doing and it seems to work:
#include <cstdint>
#include <cstring>

void SendData(uint8_t* buffer, size_t length) {
    uint8_t buffer_to_send[length + 1];          // variable-length array: the line cpplint flags
    buffer_to_send[0] = MY_SPECIAL_BYTE;         // the one-byte header
    memcpy(buffer_to_send + 1, buffer, length);  // followed by the payload
    // more code to send the buffer_to_send goes here...
}
Like I said, the code seems to work fine, however, I've recently gotten into the habit of using the Google C++ style guide since my current project has no set style guide for it (I'm actually the only software engineer on my project and I wanted to use something that's used in industry). I ran Google's cpplint.py and it caught the line where I am creating buffer_to_send and threw some comment about not using variable length arrays. Specifically, here's what Google's C++ style guide has to say about variable length arrays...
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Variable-Length_Arrays_and_alloca__
Based on their comments, it appears I may have found the root cause of seemingly random crashes in my code (which occur very infrequently, but are nonetheless annoying). However, I'm a bit torn as to how to fix it.
Here are my proposed solutions:
Make buffer_to_send essentially a fixed length array of a constant length. The problem that I can think of here is that I have to make the buffer as big as the theoretically largest buffer I'd want to send. In the average case, the buffers are much smaller, and I'd be wasting about 0.5KB doing so each time the function is called. Note that the program must run on an embedded system, and while I'm not necessarily counting each byte, I'd like to use as little memory as possible.
Use new and delete or malloc/free to dynamically allocate the buffer. The issue here is that the function is called frequently and there would be some overhead in terms of constantly asking the OS for memory and then releasing it.
Use two successive calls to write() in order to pass the data to the file descriptor. That is, the first write would pass only the one byte, and the next would send the rest of the buffer. While seemingly straightforward, I would need to research the code a bit more (note that I got this code handed down from a previous engineer who has since left the company I work for) in order to guarantee that the two successive writes occur atomically. Also, if this requires locking, then it essentially becomes more complex and has more performance impact than case #2.
Note that I cannot make the buffer_to_send a member variable or scope it outside the function since there are (potentially) multiple calls to the function at any given time from various threads.
Please let me know your opinion and what my preferred approach should be. Thanks for your time.
You can fold the two successive calls to write() in your option 3 into a single call using writev().
http://pubs.opengroup.org/onlinepubs/009696799/functions/writev.html
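A sketch of what that could look like (MY_SPECIAL_BYTE is taken from your snippet, and handling of short writes and errors is omitted):

#include <sys/uio.h>   // writev, struct iovec
#include <cstddef>
#include <cstdint>

ssize_t SendData(int fd, const uint8_t* buffer, size_t length) {
    uint8_t header = MY_SPECIAL_BYTE;                // defined elsewhere in the original code
    struct iovec iov[2];
    iov[0].iov_base = &header;                       // the one-byte header...
    iov[0].iov_len  = 1;
    iov[1].iov_base = const_cast<uint8_t*>(buffer);  // ...followed by the payload
    iov[1].iov_len  = length;
    return writev(fd, iov, 2);                       // both pieces go out in a single call
}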
I would choose option 1. If you know the maximum length of your data, then allocate that much space (plus one byte) on the stack using a fixed size array. This is no worse than the variable length array you have shown, because you must always have enough space left on the stack anyway; otherwise you simply wouldn't be able to handle your maximum length (at worst, your code would randomly crash on larger buffer sizes). At the time this function is called, nothing else will be using that further space on your stack, so it is safe to allocate a fixed size array.
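Roughly like this (MAX_PAYLOAD is a placeholder for whatever your real upper bound is, and MY_SPECIAL_BYTE comes from your snippet):

#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t MAX_PAYLOAD = 512;              // replace with your real maximum

bool SendData(const uint8_t* buffer, size_t length) {
    if (length > MAX_PAYLOAD) return false;      // reject instead of overflowing
    uint8_t buffer_to_send[MAX_PAYLOAD + 1];     // fixed size, no VLA
    buffer_to_send[0] = MY_SPECIAL_BYTE;
    std::memcpy(buffer_to_send + 1, buffer, length);
    // ... send length + 1 bytes of buffer_to_send here ...
    return true;
}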
I remember I saw somewhere (probably on GitHub) an example like this in a setter:
void MyClass::setValue(int newValue)
{
    if (value != newValue) {
        value = newValue;
    }
}
For me it doesn't make a lot of sense, but I wonder if it gives any performance improvement.
It makes no sense for scalar types, but it may make sense for some user-defined types (since the type can be really "big", or its assignment operator can do some "hard" work).
The deeper the instruction pipeline (and it only gets deeper and deeper, on the Intel platform at least), the higher the cost of a branch misprediction.
"When a branch mispredicts, some instructions from the mispredicted path still move through the pipeline. All work performed on these instructions is wasted since they would not have been executed had the branch been correctly predicted."
So yes, adding an if in the code can actually hurt performance. The write would be L1 cached, possibly for a long time. If the write has to be visible, then the operation would have to be interlocked to start with.
The only way you can really tell is by actually testing the different alternatives (benchmarking and/or profiling the code). Different compilers, different processors and different calling code will make a big difference.
In general, and for "simple" data types (int, double, char, pointers, etc), it won't make sense. It will just make the code longer and more complex for the processor [at least if the compiler does what you ask of it - it may realize that "this doesn't make any sense, let's remove this check" - I wouldn't rely on that tho' - compilers are often smarter than you, but making life more difficult for the compiler almost never leads to better code].
Edit: Additionally, it only makes GOOD sense to compare things that can be easily compared. If it's difficult to compare the data in the case where they are equal (for example, long strings take a lot of reads from both strings if they are equal, or if they begin the same and differ only in the last few characters), then there is very little saving. The same applies for a class with a bunch of members that are almost all the same, but one or two fields are not, and so on. On the other hand, if you have a "customer data" class that has an integer customer ID that must be unique, then comparing just the customer ID will be "cheap", but copying the customer name, address, phone number(s), and other data on the customer will be expensive. [Of course, in this case, why is it not a (smart) pointer or reference?] End Edit.
If the data is "shared" between different processors (multiple threads accessing the same data), then it may help a little bit [in particular if this value is often read, and often written with the same value as before]. This is because "kicking out" the old value from the other processor's caches is expensive, and you only want to do that if you ACTUALLY change something.
And of course, it only makes ANY sense to worry about performance when you are working on code that you know is absolutely on the bleeding edge of the performance hot-path. Anywhere else, making the code as easily readable and as clear and concise as possible is always the best choice - this will also, typically, make the compiler more able to determine what is actually going on and ensure best optimization results.
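If you do want to measure it, a rough starting point could look like this (only a sketch: the results vary wildly with compiler flags, the CPU, and the mix of values you write, so don't read too much into any single number):

#include <chrono>
#include <cstdio>

struct Plain   { int value = 0; void set(int v) { value = v; } };
struct Checked { int value = 0; void set(int v) { if (value != v) value = v; } };

template <typename T>
long long RunBench(int iterations) {
    T obj;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        obj.set(i & 1);                          // alternating values: the stored value changes almost every call
    auto end = std::chrono::steady_clock::now();
    std::printf("last value: %d\n", obj.value);  // keep the loop from being optimized away entirely
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

int main() {
    const int n = 100000000;
    std::printf("plain:   %lld us\n", RunBench<Plain>(n));
    std::printf("checked: %lld us\n", RunBench<Checked>(n));
}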
This pattern is common in Qt, where the API is highly based on signals & slots. This pattern helps to avoid infinite looping in the case of cyclic connections.
In your case, where signals aren't present, this code only kills performance, as pointed out by @remus-rusanu and @mats-petersson.
I am trying to break up a long "main" program in order to be able to modify it, and also perhaps to unit-test it. It uses some huge data, so I hesitate:
What is best: to have function calls, with possibly extremely large (memory-wise) data being passed,
(a) by value, or
(b) by reference
(by extremely large, I mean maps and vectors of vectors of some structures and small classes... even images... that can be really large)
(c) Or to have private data that all the functions can access? That may also mean that main_processing() or something could have a vector of all of them, while some functions will only have an item... With the advantage of functions being testable.
My question, though, has to do with optimization: while I am trying to break this monster into baby monsters, I also do not want to run out of memory.
It is not very clear to me how many copies of data I am going to have, if I create local variables.
Could someone please explain?
Edit: this is not a generic "how to break down a very large program into classes". This program is part of a large solution, that is already broken down into small entities.
The executable I am looking at, while fairly large, is a single entity, with non-divisible data. So the data will either be all created as member variable in a single class, which I have already created, or it will (all of it) be passed around as argument around functions.
Which is better ?
If you want unit testing, you cannot "have private data that all the functions can access" because then, all of that data would be a part of each test case.
So, you must think about each function, and define exactly on which part of the data it works. As for function parameters and return values, it's very simple: use pass-by-value for small objects, and pass-by-reference for large objects.
You can use a guesstimate for the threshold that separates small and large. I use the rule "8 is small, anything more is large" but what is good for my system cannot be equally good for yours.
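For example (the types and names here are made up purely for illustration, not taken from your code):

#include <cstddef>
#include <vector>

struct Pixel { unsigned char r, g, b; };
using Image = std::vector<std::vector<Pixel>>;

// Large object, read only: pass by const reference - no copy is made.
std::size_t CountPixels(const Image& image) {
    std::size_t n = 0;
    for (const auto& row : image) n += row.size();
    return n;
}

// Large object, modified in place: pass by non-const reference.
void Clear(Image& image) { image.clear(); }

// Small object: pass by value.
bool IsBright(Pixel p) { return p.r + p.g + p.b > 3 * 128; }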
This seems more like a general question about OOP. Split up your data into logically grouped concepts (classes), and place the code that works with those data elements with the data (member functions), then tie it all together with composition, inheritance, etc.
Your question is too broad to give more specific advice.
I have legacy code that receives some proprietary data, parses it and creates a bunch of static char arrays (embedded in the class representing the message) to represent null-terminated strings. Afterwards pointers to the strings are passed all around and finally serialized to some buffer.
Profiling shows that str*() methods take a lot of time.
Therefore I would like to use memcpy() where possible. To achieve that I need a way to associate a length with a pointer to a null-terminated string. I thought about:
Using std::string looks less efficient, since it requires memory allocation and thread synchronization.
I can use std::pair<pointer to string, length>. But in this case I need to maintain length "manually".
What do you think?
Use std::string.
"Profiling shows that str*() methods take a lot of time"
Sure they do ... operating on any array takes a lot of time.
"Therefore I would like to use memcpy() where possible. To achieve that I need a way to associate a length with a pointer to a null-terminated string. I thought about:"
memcpy is not really any slower than strcpy. In fact, if you perform a strlen to identify how much you are going to memcpy, then strcpy is almost certainly faster.
"Using std::string looks less efficient, since it requires memory allocation and thread synchronization"
It may look less efficient, but a lot of better minds than yours or mine have worked on it.
"I can use std::pair. But in this case I need to maintain the length 'manually'."
That's one way to save yourself time on the length calculation. Obviously you need to maintain the length manually. This is effectively how Windows BSTRs work (though the length is stored in memory immediately before the actual string data). std::string, for example, already does this...
"What do you think?"
I think your question is asked terribly. There is no real question asked, which makes answering next to impossible. I advise you to actually ask specific questions in the future.
Use std::string. That advice has already been given, but let me explain why:
One, it uses a custom memory allocation scheme. Your char* strings are probably malloc'ed. That means they are worst-case aligned, which really isn't needed for a char[]. std::string doesn't suffer from needless alignment. Furthermore, common implementations of std::string use the "Small String Optimization", which eliminates a heap allocation altogether and improves locality of reference: the string size will be on the same cache line as the char[] itself.
Two, it keeps the string length, which is indeed a speed optimization. Most str* functions are slower because they don't have this information up front.
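For example (my own sketch, not your original code), serializing a field then needs no strlen or strcat pass over the data:

#include <cstddef>
#include <cstring>
#include <string>

// Copy one field (including its terminating '\0') into an output buffer.
// std::string already knows its size, so a single memcpy is enough.
size_t Serialize(const std::string& field, char* out, size_t capacity) {
    if (field.size() + 1 > capacity) return 0;          // not enough room
    std::memcpy(out, field.data(), field.size() + 1);   // data() is null-terminated since C++11
    return field.size() + 1;
}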
A second option would be a rope class, e.g. from SGI. This can be more efficient by eliminating some string copies.
Your post doesn't explain where the str*() function calls are coming from; passing around char * certainly doesn't invoke them. Identify the sites that actually do the string manipulation and then try to find out if they're doing so inefficiently. One common pitfall is that strcat first needs to scan the destination string for the terminating 0 character. If you call strcat several times in a row, you can end up with a O(N^2) algorithm, so be careful about this.
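A sketch of that pitfall and the usual fix (my own example; keeping a pointer to the current end of the output avoids rescanning it on every append):

#include <cstddef>
#include <cstring>

// O(N^2): every strcat() rescans everything written so far to find the '\0'.
void BuildMessageSlow(char* out, const char* const* parts, int count) {
    out[0] = '\0';
    for (int i = 0; i < count; ++i)
        std::strcat(out, parts[i]);
}

// O(N): remember where the previous append ended and copy from there.
void BuildMessageFast(char* out, const char* const* parts, int count) {
    char* end = out;
    for (int i = 0; i < count; ++i) {
        size_t len = std::strlen(parts[i]);
        std::memcpy(end, parts[i], len);
        end += len;
    }
    *end = '\0';
}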
Replacing strcpy by memcpy doesn't make any significant difference; strcpy doesn't do an extra pass to find the length of the string, it's simply (conceptually!) a character-by-character copy that stops when it encounters the terminating 0. This is not much more expensive than memcpy, and always cheaper than strlen followed by memcpy.
The way to gain performance on string operations is to avoid copies where possible; don't worry about making the copying faster, instead try to copy less! And this holds for all string (and array) implementations, whether it be char *, std::string, std::vector<char>, or some custom string / array class.
What do I think? I think that you should do what everyone else obsessed with pre-optimization does. You should find the most obscure, unmaintainable, yet intuitively (to you anyway) high-performance way you can and do it that way. Sounds like you're onto something with your pair<char*,len> with malloc/memcpy idea there.
Whatever you do, do NOT use pre-existing, optimized wheels that make maintenance easier. Being maintainable is simply the least important thing imaginable when you're obsessed with intuitively measured performance gains. Further, as you well know, you're quite a bit smarter than those who wrote your compiler and its standard library implementation. So much so that you'd be seriously silly to trust their judgment on anything; you should really consider rewriting the entire thing yourself, because it would perform better.
And ... the very LAST thing you'll want to do is use a profiler to test your intuition. That would be too scientific and methodical, and we all know that science is a bunch of bunk that's never gotten us anything; we also know that personal intuition and revelation is never, ever wrong. Why waste the time measuring with an objective tool when you've already intuitively grasped the situation's seemingliness?
Keep in mind that I'm being 100% honest in my opinion here. I don't have a sarcastic bone in my body.