Why does istream::operator>> accept char pointers/arrays? - c++

char someArray[n];
std::cin >> someArray; // potential buffer overrun
I've seen code like the above numerous times on the C++ forums I frequent. Is there a good reason for this not to be treated as a compile time error? or at the very least, a warning?

An underlying premise with C (and C++) is that the coder should know what they're doing. Otherwise they'd be coding in BASIC :-)
It's not permitted to be an error since it's allowed per the standard, just like gets and scanf("%s") are allowed in C, despite the fact they're a problem waiting to happen.
The code you've posted is bad and has no place in serious software, but it's fine for "toy" programs or testing things. You just need to be aware of its problems (and it sounds very much like you are aware of them).

If C++ had been all been invented in one fell swoop, it probably wouldn't exist at all -- if you wanted to read a string, you'd have to read it into a std::string, and that would be the end of it.
Unfortunately, C++ was used for quite a while before std::string was standardized (or invented at all). Both operator>> and istream::getline (not to be mistaken for std::getline) were invented during that time. When they were invented, there was little (or no) real alternative, so they worked with arrays of char.
Today, of course, there are alternatives, and it's best to just avoid these unless you get stuck writing code with some ancient compiler that doesn't support the superior alternatives.

Related

What's the deal with the CRT SECURE warnings/errors on visual studio?

I encountered this issue a dozen, if not a million times already: I compile a c++ program on visual studio and get a dozen, if not a million warnings and/or errors suggesting that I am doing something very dangerous and that there is no way my compiler will let me do that. the warnings/errors tell me that I am using a deprecated function and that I should consider using some other safer function that may or may not do the same thing as this one, but I have no idea what this one does in the first place since I did not write it.
After some research (I do it everytime, I am not a quick learner) I find out I am not the first one facing this particular problem, and I can coerce my compiler to work with this program with the proper macro definition (for the future readers who don't care about my question but want to compile their program, you have to define _CRT_SECURE_NO_DEPRECATE, don't you ever dare following visual studio's advice and using the allegedly safe function).
I have often read in the manual or on this very website, along with the answer, the fact that I should not do that if I don't know precisely what I am doing.
I must confess: I have no idea what I am doing, and I would be very grateful if someone would accept to explain it to me.
So here are my questions:
What are those functions that are unsafe? Why do they exist in the first place?
What is unsafe about them?
Why are they so often found in perfectly honourable libraries?
I have come to the understanding that there is no safe and portable alternative to those functions: why is it so? How about we have some people think about it and try to define a way to do it, and everyone would accept to do it that way, and we would call it standard maybe?
To tackle your questions in order:
They exist in the first place because the standard wrote them in such a way. Standards authors are human so don't think of everything and this left some security weaknesses in the C API. You can find a list of these deprecated functions at http://msdn.microsoft.com/en-us/library/ms235384.aspx.
Many of the functions are unsafe as they allow such things as buffer overruns to occur but other security vulnerabilities may be exposed depending on the function.
Honourable libraries generally try for some cross platform compatibility so I suspect will try to stick to stand C rather than using compiler specific functions and extensions.
The "perfect" standard will probably never exist as in my first point :) Some of the C API problems can be avoided using C++ but that's a big hammer to crack a small nut and brings security vulnerabilities of its own.

Suitability of D for writing a Tracing JIT Compiler?

I'd like to write an interpreter and tracing JIT for a programming language I'm designing. I already have many years of experience programming in C++, but I've been wondering if perhaps newer alternatives might be better. One of the things I found most frustrating, back in my C++ days, was having to use header files to deal with the clunky one-pass compiler model. The problem is that not all languages are equally suited for this purpose. For my tracing JIT, I need to be able to write executable code into memory and have the interpreter call to that code. I will also need the generated code to be able to call back into host functions.
I started looking at Go and saw that the language had pointers but no pointer arithmetic. This immediately struck me as a huge issue. I may well want to write my own allocator and garbage collector. I will need to closely control the way my language objects are laid out in memory and be able to get the address of specific fields and write to them. Unless there's ways to deal with this, it kind of seems like Go fails to be low-level enough for my purposes.
The D language seems promising. It has pointer arithmetic and a clear outline of the ABI needed to call in and out of D. I've heard lots of good things about it. It also has garbage collection which is nice for compiler writing, but I still have a few things I'm not sure about:
Does D have standard libs that will allow me to mark chunks of memory as executable?
If I allocate a big chunk of memory that I want to manage myself, with my own GC, and have a bunch of pointers going into there, will this pose problems with D's garbage collector?
How well does D interoperate with C code, in your experience? Is loading C dynamic libraries and calling into them fairly easy?
Finally, there's the whole support aspect. For those who have used D on linux here, how good is the toolchain? Any issues? Has anyone written a JIT compiler in D, and if so, how was the experience?
I believe so, see core.memory.GC if I remember right.
No, it shouldn't. Just call malloc or whatever you need, and make sure the GC doesn't see it.
Yes, it's pretty easy to interoperate with C code.
Caveat: You probably don't want to rely on the GC either, since it's not 'precise' (i.e. can and does leak memory if you're unlucky). But for small blocks of data it's usually fine.
Go does allow pointer arithmetic, but you must import the unsafe package to do so (or use a C function). Pointer arithmetic is a common source of bugs, and Go has other mechanisms, like slices, which provide safe ways to do some of the same activities that require pointer arithmetic in C. With unsafe you can cast any pointer to a uintptr and back, and uintptr is an ordinary numeric type, which allows you do do arithmetic.
I started looking at Go and saw that the language had pointers but no pointer arithmetic. This immediately struck me as a huge issue.
You, obviously, haven't tried the language. It works pretty well without any "pointer arithmetic". If you really need to bend rules, there is always "unsafe" package that will allow you to do anything.
I may well want to write my own allocator and garbage collector. I will need to closely control the way my language objects are laid out in memory and be able to get the address of specific fields and write to them.
I haven't written allocatior or garbage collector myself, but you can take address of a field of a structure. All Go data structures are simple and easy to control and reason about. See http://research.swtch.com/godata for short introduction. Also size and aligment guarantees are part of the language http://golang.org/ref/spec#Size_and_alignment_guarantees. If nothing else, you could always jump into C or asm.
IMHO, you should try to implement some small task to see if Go fits your requirements. Feel free to ask questions at http://groups.google.com/group/golang-nuts.
Alex
There is already a JIT compiler, very serious one, done in D. I highly recommend taking a look at http://lycus.org/ , more specifically pages about the MCI project - http://github.com/lycus/mci . MCI documentation will give you some more information. As you will see, MCI is more than just a JIT, it has its own (better than anything else I have seen) IR, optimizer, verifier, etc...

Is it bad practice to use C features in C++?

For example printf instead of cout, scanf instead of cin, using #define macros, etc?
I wouldn't say bad as it will depend on the personal choice. My policy is when there is a type-safe alternatives is available in C++, use them as it will reduce the errors in the code.
It depends on which features. Using define macros in C++ is strongly frowned upon, and for a good reason. You can almost always replace a use of a define macro with something more maintainable and safe in C++ (templates, inline functions, etc.)
Streams, on the other hand, are rightly judged by some people to be very slow and I've seen a lot of valid and high-quality C++ code using C's FILE* with its host of functions instead.
And another thing: with all due respect to the plethora of stream formatting possibilities, for stuff like simple debug printouts, IMHO you just can't beat the succinctness of printf and its format string.
You should definitely use printf in place of cout. The latter does let you make most or all of the formatting controls printf allows, but it does so in a stateful way. I.e. the current formatting mode is stored as part of the (global) object. This means bad code can leave cout in a state where subsequent output gets misformatted unless you reset all the formatting every time you use it. It also wreaks havoc with threaded usage.
I would say the only ones that are truly harmful to mix are the pairings between malloc/free and new/delete.
Otherwise it's really a style thing...and while the C is compatible with the C++, why would you want to mix the two languages when C++ has everything you need without falling back?
There are better solutions for most cases, but not all.
For example, people quite often use memcpy. I would almost never do that (except in really low-level code). I always use std::copy, even on pointers.
The same counts for the input/output routines. But it’s true that sometimes, C-style printf is substantially easier to use than cout (especially in logging). If Boost.Format isn’t an option then sure, use C.
#define is a different beast entirely. It’s not really a C-only feature, and there are many legitimate uses for it in C++. (But many more that aren’t.)
Of course you’d never use it to define constants (that’s what const is for), nor to declare inline functions (use inline and templates!).
On the other hand, it is often useful to generate debugging assertions and generally as a code generation tool. For example, I’m unit-testing class templates and without extensive use of macros, this would be a real pain in the *ss. Using macros here isn’t nice but it saves literally thousands of lines of code.
For allocations, I would avoid using malloc/free altogether and just stick to new/delete.
Not really, printf() is quite faster than cout, and the c++ iostream library is quite large. It depends on the user preference or the program itself (is it needed? etc). Also, scanf() is not suitable to use anymore, I prefer fgets().
What can be used or not only depends on the compiler that will be used. Since you are programming in c++, in my opinion, to maximize compatibility it is better to use what c++ provides instead of c functions unless you do not have any other choices.
Coming from a slightly different angle, I'd say it's bad to use scanf in C, never mind C++. User input is just far to variable to be parsed reliably with scanf.
I'd just post a comment to another reply, but since I can't... C's printf() is better than C++'s iostream because of internationalization. Want to translate a string and put the embedded number in a different place? Can't do it with an ostream. printf()'s format specification is a whole little language unto itself, interpreted at runtime.

How portable is code using wmemset()?

Currently our code uses a for-loop for filling a buffer holding a Unicode string with some Unicode character value (of type wchar_t). There's wmemset() function in Visual C++ using which we could replace a loop with a single function call in that code. However we're concerned about portability - we'd like to leave code as portable as possible and so introducing non-portable or poorly portable stuff is a bad idea.
Will using wmemset() hurt portability and to what extent?
It's mentioned in the C++ standard cwchar (Table 48) at least and hence should be pretty standard. So I guess it should not hurt portability

How to spot undefined behavior

Is there any way to know if you program has undefined behavior in C++ (or even C), short of memorizing the entire spec?
The reason I ask is that I've noticed a lot of cases of programs working in debug but not release being due to undefined behavior. It would be nice if there were a tool to at least help spot UB, so we know there's the potential for problems.
Good coding standards. Protect you from yourself. Here are some ideas:
The code must compile at the highest warning level... without warnings. (In other words, your code must not set off any warnings at all when set to the highest level.) Turn on the error on warning flag for all projects.
This does mean some extra work when you use other peoples' libraries since they may not have done this. You will also find there are some warnings which are pointless... turn those off individually as your team decides.
Always use RAII.
Never use C style casts! Never! - I think there's like a couple rare cases when you have to break this but you will probably never find them.
If you must reinterpret_cast or cast to void then use a wrapper to make sure you're always casting to/from the same type. In other words, wrap your pointer/object in a boost::any and cast a pointer to it into whatever you need and on the other side do the same. Why? Because you will always know what type to reinterpret_cast from and the boost::any will enforce that you've cast to the correct type after that. It's the safest you can get.
Always initialize your variables at the point of declaration (or in constructor initializers when in a class).
There are more but those are some very important ones to start with.
Nobody can memorize the standard. What we intermediate to advanced C++ programmers do is use constructs we know are safe and protect ourselves from our human nature... and we don't use constructs that are not safe unless we have to and then we take extra care to make sure the danger is all wrapped up in a nice safe interface that is tested to hell and back.
One important thing to remember which is universal across all languages is to:
make your constructs easy to use correctly and difficult to use incorrectly
It's not possible to detect undefined behavior in all cases. For example, consider x = x++ + 1;. If you're familiar with the language, you know it's UB. Now, *p = (*p)++ + 1; is obviously also UB, but what about *q = (*p)++ + 1;? That's UB if q == p, but other than that it's defined (if awkward-looking). In a given program, it might well be possible to prove that p and q will never be equal when reaching that line, but that can't be done in general.
To help spot UB, use all of the tools you've got. Good compilers will warn for at least the more obvious cases, although you may have to use some compiler options for best coverage. If you have further static analysis tools, use them.
Code reviews are also very good for spotting such problems. Use them, if you've got more than one developer available.
Static code analysis tools such as PC-Lint can help a lot here
Well, this article covers most aspects..
I think you can use one tool from coverity to spot bugs which are going to lead to undefined behavior.
I guess you could use theorem provers (i only know Coq) to be sure your program does what you want.
clang tries hard to produce warnings when undefined behavior is encountered.
I'm not aware of any software tool to detect all forms of UB. Obviously using your compiler's warnings and possibly lint or another static code checker can help a lot.
The other thing that helps a lot is simply experience: The more you program the language, the more you'll see constructs that appear suspect and be able to catch them earlier in the process.
Unfortunately, there is no way way to detect all UB. You'd have to solve the Halting Problem to do that.
The best you can do is to know as many of the rules as possible, look it up when you're in doubt, and check with other programmers (through pair programming, code reviews or just SO questions)
Compiling with as many warnings as possible, and under multiple compilers can help. And running the code through static analysis tools such as Valgrind can detect many issues.
But ultimately, no tool can detect it all.
An additional problem is that many programs actually have to rely on UB. Some API's require it, and just assume that "it works on all sane compilers". OpenGL does that in one or two cases. The Win32 API won't even compile under a standards compliant compiler.
So even if you had a magic UB-detecting tool, it would still be tripped up by the cases that aren't under your control.
Simple: Don't do things that you don't know that you can do.
When you are unsure or have a fishy feeling, check the reference
A good compiler, such as the Intel C++ compiler, should be able to spot 99% of cases of undefined behaviour. You'll need to investigate the flags and switches to use. As ever, read the manual.