As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Can someone please suggest an approach for finding security flaws in a given piece of code, for example a socket program? Any good examples or book recommendations are welcome.
Thanks & Regards,
Mousey
The lowest-hanging fruit in this category is to simply search the source for functions that are commonly misused or are difficult to use safely, such as:
strcpy
strcat
sprintf
gets
Then start looking at ones that are not inherently bad but could be misused. In particular, anything that writes to a buffer can be hazardous if misused:
memcpy
memmove
recv/read
send/write
the entire printf family should always have a constant for the format string
NOTE: all of these (except gets) can be used correctly, so don't think it's a flaw just because the function is used, instead take a look at how it is used. Also note that gets is always a flaw.
NOTE2: this list is not exhaustive, do a little research about commonly misused functions and how they can be avoided.
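As an illustration of the points above, here is a minimal sketch of a bounded replacement for strcpy that also keeps the format string constant. The helper name copy_bounded is hypothetical and not from any library; it only relies on the standard snprintf.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes),
   always NUL-terminating when dst_size > 0.
   Returns 0 on success, -1 on bad arguments or if src did not fit. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    /* The format string is a constant; src is only ever treated as data,
       so a '%' inside src cannot be interpreted as a conversion. */
    int n = snprintf(dst, dst_size, "%s", src);

    /* snprintf reports the length it wanted to write; a value >= dst_size
       means the output was truncated. */
    if (n < 0 || (size_t)n >= dst_size)
        return -1;
    return 0;
}
```

Whether truncation should be an error or silently accepted depends on the program; treating it as an error here is just one reasonable choice.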
As far as tools, I recommend things like valgrind and splint
One major topic that wasn't covered in Evan's answer is integer overflows. Here are some examples:
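wchar_t *towcs(const char *s)
{
    size_t l = strlen(s)+1;
    mbstate_t mbs = {0};
    /* l*sizeof *w is the unchecked multiplication discussed below */
    wchar_t *w = malloc(l*sizeof *w), *w2;
    if (!w || (l=mbsrtowcs(w, &s, l, &mbs))==(size_t)-1) {
        free(w);
        return 0;
    }
    /* l excludes the terminator, so keep room for it when shrinking */
    return (w2=realloc(w, (l+1)*sizeof *w)) ? w2 : w;
}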
Here, a giant string (>1gig on 32-bit) will make multiplication by the size (I'm assuming 4) overflow, resulting in a tiny allocation and subsequent writes past the end of it.
Another more common example:
uint32_t cnt;
fread(&cnt, 1, 4, f);
cnt=ntohl(cnt);
struct record *buf = malloc(cnt * sizeof *buf);
This sort of code turns up in reading file/network data quite a lot, and it's subject to the same sort of overflows.
Basically, any arithmetic performed on values obtained from an untrusted source, which will eventually be used as an allocation size or array offset, needs to be checked. You can either do it the cheap way (impose arbitrary limits on the value read that keep it well outside the range that could overflow), or you can test for overflow at each step. Instead of:
foo = malloc((x+1)*sizeof *foo);
You need to do:
if (x<=SIZE_MAX-1 && x+1<=SIZE_MAX/sizeof *foo) foo = malloc((x+1)*sizeof *foo);
else goto error;
A simple grep for malloc/realloc with arithmetic operators in its argument will find many such errors (but not ones where the overflow already occurred a few lines above, etc.).
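The per-step check above can be wrapped in a small helper so it is not repeated at every call site. This is only a sketch; the name alloc_array_checked and its parameter layout are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate (count + extra) elements of elem_size
   bytes each. Returns NULL instead of letting the size computation wrap. */
void *alloc_array_checked(size_t count, size_t extra, size_t elem_size)
{
    /* Step 1: count + extra must not wrap around. */
    if (elem_size == 0 || count > SIZE_MAX - extra)
        return NULL;
    size_t n = count + extra;

    /* Step 2: n * elem_size must not wrap around. */
    if (n > SIZE_MAX / elem_size)
        return NULL;

    return malloc(n * elem_size);
}
```

With this, the earlier example becomes foo = alloc_array_checked(x, 1, sizeof *foo), followed by the usual NULL check.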
Here's a book recommendation: Writing Secure Code. Demonstrates not only how to write secure code, but also common pitfalls and practices that expose security holes. It's slightly dated (my copy says it was published in 2002), but the security concepts it teaches are still quite applicable even 8 years later.
Some source code constructs you can keep an eye out for are:
Functions that don't do bounds checking. Evan covered it pretty well.
Input validation & sanitization, or lack thereof.
NULL pointer dereferencing
fork()s, execve()s, pipe()s, system() called with non-static parameters (or worse, with user input).
Objects shared between threads with inappropriate storage durations (pointers to automatic variables or even "dead" objects in thread-local storage).
When dealing with file manipulation, make sure correct variable types are used for the return results of functions. Make sure they're checked for errors. Make no assumptions about the implementation - permissions of created files, uniqueness of filenames, etc.
Poor sources of randomness (for encryption, communication, etc.) should be avoided.
Simple or obvious mistakes (perhaps out of carelessness) should be fixed anyway. You never know what's exploitable, unless it is.
Also, are the data protected? Well, if you don't care, that's fine. :-)
Some tools that you can consider are:
valgrind : exposes memory flaws, which in large applications are usually critical.
splint : a static checker
fuzzing frameworks
RATS : a free, open-source tool. Its authors' company was acquired by Fortify.
I took a security class where we used a commercial product called Fortify 360, which did static analysis of C++ code. We ran it against an old-old-old version of OpenSSL, and it found loads of stuff, and provided guidance to rectify the flaws (which, by the way, the latest version of OpenSSL had resolved).
At any rate, it is a useful commercial tool.
Some of the OpenBSD folk just recently published a presentation on their coding practices.
Closed 9 years ago.
I'm currently working on some high-performance, stability-critical frameworks, which will likely deploy on x86_64, ia64, and potentially ARM platforms. The library is so far done in C. The current standard is C99, though we are interested in experimenting with features of C11.
Initially, the choice to avoid C++ was because we wanted to prevent developers from using classes due to their inherent inefficiencies, such as larger memory footprints, vtables, and inheritance. We also wanted to keep structs free of member functions. In other words, C was chosen over C++ deliberately to prevent the use of certain features in C++.
However, we recently did a double-take after further investigating some of C++'s features. It certainly seems to have some benefits, mainly type safety and generics.
What I would like to know is:
1- What, exactly, does this type-safety mean for the programmer and compiler?
2- What are the benefits of C++'s type safety, and how can we avoid the pitfalls of unsafe typing with C?
1- What, exactly, does this type-safety mean for the programmer and compiler?
Type safety protects you from debugging silly mistakes, such as adding degrees and radians together or trying to multiply a string by an integer. I wouldn't worry about the effects on the compiler. Having programmed in both type-safe languages (C++) and non-type-safe ones (Perl, C), I would say that I normally spend less time debugging "computer internal" things in the type-safe languages (again, adding strings and integers) but spend more time chasing type values and definitions and converting between them.
2- What are the benefits of C++'s type safety, and how can we avoid the pitfalls of unsafe typing with C?
Type safety is a level of protection that allows the compiler to check that what you are doing is sane. For an individual this is less important than in a group setting because while you know that your "GetNumberOfStudents" function outputs a string instead of an integer, your co-workers may not. The bigger advantage of C++ over C is that you can separate the way you store your data from the way you retrieve your data, so that "GetListOfAllCustomers" won't change for the people using the function if you decide to internally change your data structures.
Short answer: If you're willing to trade developer time and hardware comprehension time for performance and compactness, I would lean towards C. If you're willing to trade a small amount of performance and aren't memory bound, to lessen developer time, I would lean towards C++. I program in C# for all my data analysis and C for all my embedded software work.
Templates in C++ can make it practical to write code that yields better performance than the same code in C. For example, you can do things like generate unrolled loops tuned at compile time to match your problem size coherently to your target's cache size.
Pointer arithmetic and casts in C can potentially lead to buffer over-runs, dangling references and memory leaks. C++ features like smart pointers and container classes can greatly reduce the incidence of these kinds of bugs.
Powerful idioms like RAII are directly supported by C++ language features, and make it much easier to do things like write multi-threaded concurrency correctly without introducing race conditions or deadlocks.
Rich types that represent critical data attributes like unit values and dimensions enable the compiler to catch unit conversion errors at compile time. It is possible to do checked conversions with structs and typedefs in C, but a typical C implementation would catch many unit errors only at run time, if at all, whereas a C++ implementation can be both safer and faster than the equivalent functionality written in C.
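For comparison, the struct-and-typedef approach mentioned for C might look roughly like the sketch below. The type and function names are invented for illustration. Wrapping each unit in its own struct at least makes the compiler reject code that passes degrees where radians are expected, although it lacks the operator overloading and compile-time dimension arithmetic that C++ templates allow.

```c
/* Hypothetical unit wrappers: distinct struct types are not
   interchangeable, so mixing them up is a compile-time error. */
typedef struct { double value; } degrees_t;
typedef struct { double value; } radians_t;

/* The only sanctioned way to turn degrees into radians. */
radians_t degrees_to_radians(degrees_t d)
{
    radians_t r = { d.value * (3.14159265358979323846 / 180.0) };
    return r;
}

/* Addition is defined only for two values of the same unit. */
radians_t radians_add(radians_t a, radians_t b)
{
    radians_t sum = { a.value + b.value };
    return sum;
}
```

A call such as radians_add(some_degrees, some_radians) fails to compile because the struct types are incompatible, whereas the same mistake with plain double arguments would go unnoticed.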
These are just some simple examples, but there is much, much more to say about the pitfalls and limitations of C, and the ways that C++ language features may be used to write code that is fast, correct and resilient to change.
Thanks to everyone's contributions here, our team decided to leave the code in C.
To answer the questions posed, here is what I found:
1- Type safety means that type compatibility is checked at compile time. In C, variables are mostly type safe. The reason that C is not regarded as type safe is generally because va_list, which is used in many common operations in C, especially stdio, is not type safe. Another reason that C has a reputation of being unsafe is that it allows implicit conversions. To the programmer, type safety means catching type mistakes at compile time. To the compiler, it means checking type assignments and implicit conversions at compile time, and reacting more strictly than in a non-type-safe scenario. As far as I could find, it does not make any real differences in the compiled binary.
2- C++'s type safety mainly serves to catch invalid implicit conversions at compile time, hopefully making the programmer's life easier. However, by compiling with gcc and -Wconversion (we usually use -Wall) we get this feedback in the form of warnings rather than failures to build, so the benefits are relatively small as long as we pay close attention to our compiler output.
With C, unsafe typing issues can be virtually eliminated by good coding practice and review.
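To make the va_list point from item 1 concrete, here is a small sketch of a variadic function. The name sum_ints is hypothetical. The callee has no way to verify what the caller actually passed; it simply trusts count and assumes every argument is an int.

```c
#include <stdarg.h>

/* Hypothetical variadic helper: sums `count` arguments, all of which
   are ASSUMED to be int. Nothing in the language enforces this. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* read blindly as int */
    va_end(args);

    return total;
}
```

A call such as sum_ints(2, 1.5, 2) is accepted by the compiler but has undefined behavior, because a double is read back as an int. Compilers can warn about this kind of mismatch for the printf family only because they special-case its format strings.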
Closed 10 years ago.
Note: For this question I will mainly be referring to C++, however this may apply to other languages.
Note: Please assume there is no recursion.
People often say that an exceptionally large function should be "broken up" into several smaller functions, but is this logical? What if I know for a fact that I will never reuse one of those smaller functions? Then splitting seems like a waste of memory and performance, and you may have to jump around the code more when reading it.
Also, what if you are only going to use a (hypothetically large) function once? Should you just insert the function body at the place where it would be called, for the same reasons as before (memory, performance, and less jumping around when reading)? So... to make a function or not to make a function, that is the question.
TO ALL
*EDIT*
I am still going through all the answers, however from what I have read so far I have formed a hypothesis.
Would it be correct to say: split up functions during development, but do what I suggest in the question before deployment, that is, write functions you use only once during development but insert their bodies before deployment?
This really depends on the context.
When we say the size of a function, we actually mean the semantic distance of lines inside the function. We prefer that one function should do only one thing. If your function only does one thing and semantic distance is small inside it, then it is OK to have large function.
However, it is not good practice to make a function do a lot of things. It is better to refactor such functions into a few smaller ones with good naming and good placement of code, so that the reader of the code does not need to jump around.
Don't worry too much about performance and memory. Your compiler should take care of the bulk of that for you, especially for very thin functions.
My goal is typically to ensure that the given function call can be replaced entirely in the reader's memory--the developer can treat the abstraction purely. Take this:
// Imagine here that these are real variable/function names as written by a
// lazy coder. I have seen code like this in the wild.
void someFunc(int arg1, int arg2) {
    int val3 = doFirstPart(arg1, field1);
    int val4 = doSecondPart(arg2, val3);
    queue.push(val4);
}
The refactoring of doFirstPart and doSecondPart buys you very little, and likely makes things harder to understand. The problem here isn't method extraction, though: The problem is poor naming and abstraction! You will have to read doFirstPart and doSecondPart or the point of the whole function is lost.
Consider this, instead:
void pushLatestRateAndValue(int rate, int value) {
    int rateIndex = calculateRateIndex(rate, latestRateTable);
    int valueIndex = calculateValueIndex(rateIndex, value);
    queue.push(valueIndex);
}
In this contrived example, you don't have to read calculateRateIndex or calculateValueIndex unless you really want to dig deep--you know exactly what it does just by reading it.
Aside from that, it may be a matter of personal style. I know that some coders prefer to extract every business "statement" into a different function, but I find that a little hard to read. My personal preference is to look for an opportunity to extract a function from any function longer than one "screenful" (~25 lines), which has the advantage of keeping the entire function visible at once, and also because 25 lines happens to be my personal mental limit of short-term memory and temporary understanding.
There are many good arguments for not making a routine longer than roughly what will fit on one page. One that most people don't think about is that, unless you deploy debug symbols (which most people don't do), a stack trace coming in from the field is a lot easier to analyze and turn into a hypothesis about a cause when the routines it refers to are small than when the error turns out to be occurring somewhere in that 2,000-line whale of a method that you never got around to splitting up.
Closed 10 years ago.
There are a myriad of discussion threads on the subject of cross-platform Unicode string usage, but it seems there is a wide range of opinion, without addressing some specific concerns that have been vexing me on a project I'm working on:
I have a large cross platform C++ code base that goes back almost twenty years. It contains a hodge-podge of all manner of string implementations, including:
char*
Pascal-style strings
std::string
several custom cross-platform classes with overlapping functionality
CFString
all manner of constant strings
This code base is in the process of being rewritten to entirely use Unicode strings and implementing a strong MVC architecture, with the hope that the model will be fully portable (Mac OS / IOS / Android / Windows 7 & 8 / Unix).
While persistent data is being written as XML/UTF-8, there are some dilemmas regarding string usage in run-time objects:
I'd like to create a class that cleanly hides the implementation of storage, allocation and common string operations. Through the miracle of C++ operator and assignment overloading I'm hoping to be able to substitute a class instance to replace all the different string parameters that functions can accept. This would allow for an incremental conversion of the code base.
We are constantly scanning / parsing / analyzing strings, and I worry that using a strictly UTF-8 underlying implementation for persistent objects might have performance issues. If not, would the modern std::string found in Microsoft's VC++ and GNU's G++ be a simple underlying implementation?
The Mac OS / IOS versions ultimately need to have their strings "converted" to CFString. The CF functions are rich and highly optimized. I'm thinking it would be a good strategy to have my own class create CFStrings by providing CF with a buffer (for example, CFStringCreateWithCharactersNoCopy or CFStringCreateMutableWithExternalCharactersNoCopy). Seems as if this could reduce the amount of conversion/allocation CFString would normally require after fetching data from the model — ALTHOUGH perhaps in a proper MVC implementation the Controller/View shouldn't have access to actual strings owned by the model?
Does C++ 11 change the picture for any of these cross-platform string issues?
I would've guessed that these issues should have been solved long ago — but from reviewing the responses on this site (and others) I can't see that it has.
I'd like to create a class that cleanly hides the implementation of storage, allocation and common string operations. Through the miracle of C++ operator and assignment overloading I'm hoping to be able to substitute a class instance to replace all the different string parameters that functions can accept. This would allow for an incremental conversion of the code base.
Sounds like std::string with an added cast operator to const char*, so you won't have to call c_str(). Which will mean you have to use char and UTF-8 for storage, as opposed to UTF-16 or similar.
We are constantly scanning / parsing / analyzing strings, and I worry that using a strictly UTF-8 underlying implementation for persistent objects might have performance issues. If not, would the modern std::string found in Microsoft's VC++ and GNU's G++ be a simple underlying implementation?
This depends on several other factors. On the one hand, UTF-8 might be inefficient if your input contains a lot of non-ASCII data and you have to analyze it one code point at a time. In that case, UTF-16 or even UTF-32 might be more reasonable, as you won't have as many case distinctions to reassemble code points from multiple string elements. On the other hand, performance greatly depends on whether you can pass strings by reference or have to create a copy, particularly when calling a function. So some modifications to your existing code base might be necessary in order to avoid too many copies.
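To give a feel for the "case distinctions" involved, here is a simplified sketch of decoding a single code point from UTF-8. The function name utf8_decode is made up for illustration, and the sketch deliberately skips some validation (it does not reject overlong encodings or surrogate values), so treat it as a demonstration of the per-byte branching rather than a production decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: reads one code point from s (at most len bytes).
   Returns the number of bytes consumed, or 0 on malformed or truncated
   input. Simplified: overlong forms and surrogates are NOT rejected. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    uint32_t cp;
    size_t need;                      /* continuation bytes expected */

    if (len == 0)
        return 0;

    /* The lead byte decides how many bytes make up this code point. */
    if (s[0] < 0x80)                { cp = s[0];        need = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; need = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; need = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; need = 3; }
    else
        return 0;                     /* stray continuation or invalid lead */

    if (len < need + 1)
        return 0;                     /* sequence is cut off */

    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                 /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    *out = cp;
    return need + 1;
}
```

Every code point goes through this branching, which is the overhead the paragraph above refers to. Whether it matters in practice depends on how much of the scanning can work on raw bytes (for example, searching for ASCII delimiters) without decoding at all.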
The Mac OS / IOS versions ultimately need to have their strings "converted" to CFString. The CF functions are rich and highly optimized. I'm thinking it would be a good strategy to have my own class create CFStrings by providing CF with a buffer (for example, CFStringCreateWithCharactersNoCopy or CFStringCreateMutableWithExternalCharactersNoCopy). Seems as if this could reduce the amount of conversion/allocation CFString would normally require after fetching data from the model — ALTHOUGH perhaps in a proper MVC implementation the Controller/View shouldn't have access to actual strings owned by the model?
When you create strings without copying the data buffer, you have to ensure that the buffer lives as long as the string is accessed. This might be true in some cases, but not in all. In general the problems are very similar to those you have with a char* backed by a std::string, which is the reason why c_str() is an explicit function call and not an automatic cast. By doing such a conversion, you have to guarantee that the original object stays allocated. In general, I'd pass const std::string& to views, so they won't accidentally change strings owned by the model. If they need to retain or modify the string, they'll have to copy it.
Does C++ 11 change the picture for any of these cross-platform string issues?
C++ 11 provides a number of new smart pointer implementations, which allow you more control over how long a string object remains allocated. So you could for example use a shared_ptr<string> as the data storage of your class, to obtain automatic reference counting and deallocation of strings. This would give you a higher level of abstraction, but might be even farther away from what your current code base does, so I'm not sure whether this will make porting any easier for you.
Closed 10 years ago.
I have been involved in numerous C++ projects, mainly in the application domain pertaining to VoIP protocols. Now I have to move to L3/L2 protocol development projects, where I found that C is the preferred language of choice for the L2/L3/L4 developers.
Now I am wondering why, except for device-firmware-related applications, protocols are developed using a stone-age-era language. Why don't people take advantage of OOP techniques? Would it be prudent to try to convince them to switch to C++? Most of the developers working in the team are C experts and not comfortable with C++.
There are several reasons for continuing using C.
There are existing C projects. Who will pay for converting them into C++?
C++ compiler (of a good quality) is not available on every platform.
Psychological reason. When you pass and return objects by value, temp objects are created left and right. This is not ok for small systems. People do not really understand that passing and returning references completely solves this problem. There are other similar issues.
And finally. What is wrong with C? It works! (Do not fix what is not broken).
It is possible to write code in C++ that performs as well as C, but this requires better understanding, training, and code control discipline. The common perception is that these flaws are unavoidable.
If you think of C as simply a "stone-age language," then I think you misunderstand why people continue to use it. I like and use both C and C++. I like them both for different reasons, and for different kinds of problems.
The C language presents a model of the computer that is both (mostly) complete and very easy to understand, with very few surprises. C++ presents a very complex model, and requires the programmer to understand a lot of nuance to avoid nasty surprises. The C++ compiler does a lot of stuff automatically (calling constructors, destructors, stack unwinding, etc.). This is usually nice, but sometimes it interferes with tracking down bugs. In general, I find that it's very easy to shoot yourself in the foot with both C and C++, but I find the resulting foot-surgery is much easier to do in C, simply because it's a simpler language model.
The C model of a computer is about as close to assembly as you can get while still being reasonably portable. The language does almost nothing automatically, and lets you do all kinds of crazy memory manipulations. This allows for unsafe programming, but it also allows for very optimized programming in an environment with very few surprises. It's very easy to tell exactly what a line of code does in C. That is not true in C++, where the compiler can create and destroy temporary objects for you. I've had C++ code where it took profiling to reveal that automatic destructors were eating a ton of cycles. This never happens in C, where a line of code has very few surprises. This is less of an issue today than it was in the past; C++ compilers have gotten a lot better at optimizing many of their temporaries away. It can still be an issue, though, especially in an embedded environment where memory (including stack space) is often tight.
Finally, code written in C++ often compiles slowly. The culprits are usually templates, but eliminating templates often makes your C++ code look a lot like C. And, I really cannot overstate how much this can affect productivity. It kills productivity when your debug-fix-recompile-test cycle is limited by the compilation time. Yes, I know and love pre-compiled headers, but they only do so much.
Don't get the impression that I'm anti-C++ here. I like and use the language. It's nice to have classes, smart pointers, std::vector, std::string, etc. But there's a reason that C is alive and kicking.
For a different perspective, and one that is firmly anti-C++, you should at least skim over Linus Torvalds's perspective on C++. His arguments are worth thinking about, even if you disagree with them.
Closed 12 years ago.
Recently I had a discussion with my boss (a long-time C developer) who discouraged me from using C++ streams and told me to stick to "good old" printf & friends. Now I can understand why he is saying this, and believe me, I did not follow his advice.
But still this is bugging me: are there things in C that are still better in some cases than newer C++ implementations of the same or a similar thing? By better I mean, for example, performance, stability, or even code readability/maintainability. And if so, can someone give me examples? I'm mainly talking about differences like printf vs. streams, not about features like inheritance or OOP for that matter. The reason I'm asking all this is that I consider myself a C++ developer and as such I always try to code the C++ way.
C printf()-style output is typically faster than C++ ostream output. But of course it can't handle all the types that C++ output can. That's the only advantage I'm aware of - typically, because of aggressive inlining, C++ can be a lot faster than C.
There is one thing that C programmers sometimes point out and that is worth considering: If you stay away from macros, then it's mostly obvious what a line of C code does. Take for example this:
x = y;
In C, this is an assignment and only an assignment. The value of y is (after a possible conversion) copied into x.
In C++ this could literally mean anything.
A simple assignment,
a user defined conversion operator in y which deletes the internet and returns a value that is of the same type as x
There is a constructor which makes an object of x's type from y, after melting down a nuclear power plant. This value is assigned to x.
There is a user-defined assignment operator which allows assignment from a bunch of other types, for which y has a conversion operator or which are in some other way obtainable from y. The assignment operator has a bug which might create a black hole, because it's part of the LHC operation software.
more of the above.
To make it even more interesting, every single operation might throw an exception in C++, which means that every line must be written in a way that it can roll back what it changed, which is sometimes hard when you can't say what a line actually does. And to make it worse, your program might crash instantly, because the exception happens while the assignment is called during an exception unwind. In C++ things tend to become "vertically complex", which poses its own requirements on the capabilities and the communication skills of the developers.
When you're writing C++, write C++. When you're writing C, write C. Whoever says different is probably uncomfortable with the differences, or thinks of C++ as a "better C". That isn't the case; C++ is its own language with its own features, and is mostly C-compatible for the sole purpose of easing conversion.
As far as performance goes, I used to be a USACO competitor. I quickly found that 98% of one of my programs' runtime was spent using C++ IOStreams. Changing to fscanf reduced the overhead by a factor of ten. Performance-wise, there's no contest at all.
I think C style is better when you need raw memory management. It is a bit cumbersome to do that with C++ constructs and you don't have realloc() for example.
Someone who down voted that, probably never tried to explore the topic.
I'm surprised how people can't imagine themselves in different positions. I'm not saying that everybody should use C style constructs. I'm saying that C style is better when you NEED raw memory management. Someone has to write all those secure classes/libraries (including standard library, garbage collectors, memory pools). Your experience in which you never need it does not cover all cases.
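As a small illustration of the realloc point, here is a sketch of a growable int buffer in C. The int_buf type and int_buf_push function are hypothetical names chosen for this example. realloc can often extend the existing block in place, which is the behavior that has no direct counterpart in new/delete.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable buffer of ints. Zero-initialize before use. */
typedef struct {
    int   *data;
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
} int_buf;

/* Append v, doubling the capacity with realloc when the buffer is full.
   Returns 0 on success, -1 on failure (the buffer is left unchanged). */
int int_buf_push(int_buf *b, int v)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 8;

        /* Guard against wrap-around in the doubling and in the byte count. */
        if (new_cap < b->cap || new_cap > SIZE_MAX / sizeof *b->data)
            return -1;

        /* Keep the old pointer until realloc succeeds, so nothing leaks. */
        int *p = realloc(b->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;

        b->data = p;
        b->cap  = new_cap;
    }
    b->data[b->len++] = v;
    return 0;
}
```

Usage would be: declare int_buf b = {0};, call int_buf_push(&b, value) as needed, and free(b.data) when done.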
Another situation is when you write a library. With C you get a clean symbol table, which can easily be bound to many other programming languages. With C++ you will have name mangling, which makes the library harder (but not impossible) to use in a non-C++ environment.
I couldn't give you a conclusive answer; however, I found this rather dated comparison interesting.
http://unthought.net/c++/c_vs_c++.html
I don't think using printf-style functions over iostreams is generally justified.
iostreams greatly reduce development and debugging time, and are much less error-prone (e.g. think of buffer overflows, wrong % type specifiers, wrong number of arguments ... and the biggest problem is that the compiler can't help you at all).
And if you don't use endl when it isn't needed, cout isn't that much slower than printf.
So generally you should go with C++ iostreams, and only if profiling shows that critical sections take too much time because of iostream calls, optimize those sections with C-style functions. But make sure to use the safer versions of the functions, like snprintf instead of sprintf.
Examples:
Consider that you have an int foo variable, which you printf in a number of places. Later during development, you realize you need foo to be a double instead. Now you have to change the type specifiers in every printf-style call which uses foo. And if you miss one single line, welcome to the land of undefined behaviour.
Recently I had a case where my program crashed because I missed a simple comma, and because of the great printf-style command, my compiler didn't help me: printf("i will crash %s" /*,*/ "here");. This wouldn't have happened with iostreams either.
And of course you can't extend the behaviour of printf and friends to work with your own classes like you can with iostreams.
Good old C! Ah, the pre-ANSI days... <sarcasm>I certainly miss having practically no type checking on arguments and return values or having the compiler assume anything untyped is an int and not an error.</sarcasm>
Seriously, though - there is a fairly good argument against using exceptions as error handling. I read a fairly decent argument against exceptions for system level work and mostly I think the problem is that you can't simply read a block of code and know it won't throw in C++, whereas you can read most C and say "all the errors (at this level) are trapped" or "the ones that aren't don't matter".
Where using C++ features might be problematic:
portability: IMHO C is still more portable
mixed language programming: calling a C function from another language is almost never problematic, with C++ you quickly get in trouble because of name mangling etc.
performance issues: features like templates may lead to code bloat, temporary object creation may have a huge impact too, etc...
maintainability: since C++ is more complex than C, restrict your use to language features you expect the person who later maintains your code to be capable of handling.
However, some/most of C++ features are quite handy and useful if used with care.
Remember the saying "With C++ it's harder to shoot yourself in the knee, but if you do, it will cost you the entire leg".
I sometimes prefer pointers and memcpy over iterators and std::copy when I don't need generic code.
Same for iostreams, they are convenient and extensible but there are a lot of situations when [f|s]printf / scanf are as simple.