I'm focusing on MicroPython, specifically the dynamic-native-modules branch.
This feature will, in the future, allow you to compile a C/C++ function into a native .obj, package it together with a .py interface, and get a huge speed boost.
Awesome! But the issue is that if you're using an RTOS, which doesn't have virtual memory, then any executing native code can access any part of the address space, including peripherals, the RTOS's state, etc.
You don't want the user to be able to do something like this:
void user_func()
{
    /* point to arbitrary memory, potentially the reset registers, flash erase . . . you get the point */
    int *a = (int *)0x1234;
    *a = 0x10110000; // DESTROY!!!
}
Even the following should be disallowed:
void user_func()
{
    int a;
    *(int *)(&a - 1000) = 0x10010111;
}
SOLUTIONS?
Create your own version of gcc (for each binary format)
Decompile .obj files and detect use of pointers (for each binary format)
FEEDBACK TO COMMENTS
I get that it may be impossible to stop a malicious user, but that's not the #1 worry. We want to stop well-meaning but accidental code. If it's not possible to stop every single case, that's OK.
If we can prohibit/detect explicit pointer accesses and simply provide warnings regarding array use, that is still very valuable.
WARNING: YOU'RE USING AN ARRAY! MAKE SURE YOU DON'T GO OUT-OF-BOUNDS
Your best chance is a GCC plugin which looks at the frontend-generated GENERIC or GIMPLE IRs and implements the policies you want. Depending on the policies and the source code you want to accept, this could be a lot of work and very difficult.
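For illustration, here is a minimal plugin skeleton; it is only a sketch (the build command, the choice of the PLUGIN_FINISH_UNIT event, and the comment about where the real checks would go are assumptions based on the generic plugin API, and the hard part, walking GIMPLE to flag pointer misuse, is not shown):

// Minimal GCC plugin skeleton (sketch).  Build roughly like:
//   g++ -shared -fPIC -fno-rtti \
//       -I"$(gcc -print-file-name=plugin)/include" \
//       -o policy_plugin.so policy_plugin.cc
// and load with: gcc -fplugin=./policy_plugin.so ...
#include <gcc-plugin.h>
#include <plugin-version.h>

// GCC refuses to load plugins that do not declare GPL compatibility.
int plugin_is_GPL_compatible;

static void finish_unit_callback(void * /*gcc_data*/, void * /*user_data*/)
{
    // A real checker would instead register a GIMPLE pass from plugin_init
    // and inspect each statement for raw pointer dereferences, casts from
    // integers to pointers, and so on.
}

int plugin_init(struct plugin_name_args *plugin_info,
                struct plugin_gcc_version *version)
{
    // Refuse to run inside a GCC we were not built against.
    if (!plugin_default_version_check(version, &gcc_version))
        return 1;

    register_callback(plugin_info->base_name, PLUGIN_FINISH_UNIT,
                      finish_unit_callback, /*user_data=*/nullptr);
    return 0;
}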
If you want a purely syntax-based or type-based approach (simply rejecting all pointer arithmetic), Clang with its ASTs is easier to work with than GCC.
There could be a way of doing what you want, as long as you only need to protect against honest mistakes, not deliberate attempts to break things.
First, you embrace C++ Core Guidelines and use support from the tools: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#S-tools
For your purposes, it will stop your code from using raw pointers and pointer arithmetic in non-library code. Period. Core guidelines do more than that, but this is what is related to your question.
Then, you do a more or less simple grep on the code to make sure it is not using raw pointers outside the facilities blessed by the Guidelines.
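To make that concrete, here is a small sketch of the style such a policy accepts versus the style a checker (or a grep) would flag; the function and names are illustrative, not from the question:

#include <array>
#include <cstddef>

// Accepted style: bounds-checked containers, no raw pointers, no pointer arithmetic.
int sum_first(const std::array<int, 8>& values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count && i < values.size(); ++i)
        total += values.at(i);   // .at() fails loudly instead of scribbling on memory
    return total;
}

// Flagged style (the kind of thing a checker or a simple grep would reject):
//   int* p = reinterpret_cast<int*>(0x1234);   // raw pointer to an arbitrary address
//   *(p + 10) = 0x10110000;                    // pointer arithmetic + write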
In practice, it is not reasonably possible.
You might consider changing the compilation (e.g. with a GCC plugin, as mentioned in Florian Weimer's answer) to check every access to arrays, but that makes the generated code significantly slower. Your native hardware is slow enough; you don't want to make it even slower.
And Python is not exactly the right source language: its dynamic typing would make it quite slow already.
Perhaps you might consider a statically typed language, like OCaml (combined with a JIT or AOT compilation library like GCCJIT, etc.). The type system (and its inference) increases the safety (and the speed) of the generated code, and with a lot of hard work (several years, probably worth a PhD, since you'd need to do new research) you might improve the type inference to even infer the occurrences where the array index is never out of bounds (so that an index boundary check is not even needed).
In most cases, upgrading the hardware (to a Raspberry Pi class board, with an MMU and probably the ability to run a real OS with some virtual memory) is the most pragmatic approach.
PS. Be aware of Rice's theorem. Most static source analysis cannot work reliably and well (be sound and complete, in technical terms).
What I have is:
a hex file with the bytes of a C struct in it, ordered in big-endian
the struct definition as a *.h file
the struct information as DWARF2 debug info
My application has to be written in C/C++. Intermediate scripts using, for example, Python would be OK.
What I have to do is read the bytes of the hex file and cast them into the struct type on a system that is little-endian.
And during this process, I will have to reverse the bytes of each struct member.
The obvious solution would be to write a conversion function that does byte swapping for each struct member, but since the struct has multiple layers and ~1200 members that change faster than I can update my conversion function, writing it by hand is no solution.
So I could generate the conversion function automatically by:
Finding and parsing the types inside multiple *.h files
Iterating over the members of all struct types and generating swaps for them (without some sort of reflection API, not that easy)
Loading the struct via the conversion function.
Since this solution seems like quite a bit of work, I was wondering if there is an easier way, like telling the compiler to swap it or using the debug info somehow.
Does anybody know a trick that might help in this case?
Thanks and greetings!
Remark:
Changing any of the processes leading to this / changing the input conditions or delegating responsibilities to the other developers involved is not possible.
Changing something about the hex file as an input is not possible. This file comes out of some other system that will not change to fix this problem here.
Padding, datatype sizes etc. are identical. This is ensured by other measures, too. So endianness is definitely the only problem. This is also why I see no reason against using DWARF2 info to identify the bytes of every struct member.
I agree that the layout of the struct is very bad. But it has its reasons for being that way and, to be short, I can't / am not allowed to change that anyway because of process reasons and backwards compatibility.
To give some more scope:
The software that all of this is used in is deployed to multiple different embedded devices (multiple types). The hex file contains the calibration information of the software and is thus stored in a specific system that can only output this hex file.
I am now porting the software to a little-endian device and I have to use the hex file given by the "main" branch of the software, which is big-endian, as an input.
There is no way to tell a C or C++ compiler to swap bytes from LE to BE or vice versa automatically. You really have to do it yourself. If your data structs are really huge, probably the best way is to implement automatic conversion code generation.
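As an illustration of what such generated code could look like, here is a hedged sketch; the struct and member names are made up, and the real generator would derive them from the headers or the DWARF info:

#include <cstdint>

// Big-endian-to-host swaps for the scalar sizes that occur in the struct.
static std::uint16_t bswap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}
static std::uint32_t bswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Hypothetical calibration struct standing in for the real ~1200-member one.
struct CalRecord {
    std::uint16_t gain;
    std::uint32_t offset;
    std::uint32_t limits[4];
};

// Generated per-member conversion: file byte order (big-endian) -> host order.
void convert_CalRecord(CalRecord& c) {
    c.gain   = bswap16(c.gain);
    c.offset = bswap32(c.offset);
    for (auto& v : c.limits)
        v = bswap32(v);
}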
The problem, as far as I understand it, is tricky but tractable. As far as I understand, data extraction won't be running on an embedded device, so it won't be resource constrained. I say - embrace the runtime inefficiency that desktop hardware allows, and go for easy to debug instead.
Instead of thinking of the source file as "almost what I need modulo a couple of minor adjustments", think of it as "generic binary file with an open ended, evolving schema". The schema description is the DWARF data.
What I would do: start a Python project. Use the pyelftools PyPI module to parse the DWARF. Scroll for the compile units (CUs). In each CU, scroll through the top level entries (DIEs). Look for a DW_TAG_structure_type DIE with a specific value of DW_AT_name (I hope the struct name is known in advance). Then go through the DW_TAG_member sub-DIEs. DW_AT_data_member_location will give you the offset, letting you work around the padding. Look at DW_AT_type to detect the member type (you'd have to resolve the DIE reference for that). Recurse into struct- and array-type members as necessary.
From that, generate a format string for the struct.unpack method - it can read big-endian ints seamlessly. Then use struct.pack to format it into whatever format the C++ consumer expects.
This depends on you being able to track the data file to the DWARF info of the generating executable, exactly the same build. I hope the processes of the organization allow for that.
Recent versions of GCC allow the declaration of the desired endianness irrespective of the target platform for a source code section using the pragma scalar_storage_order or a specific type using an attribute with the same identifier. The main catch: g++ does not support this. Also, this won't work in all cases. For example, taking a pointer to a member with transparent endianness conversion leads to an error. Unless you're okay with sticking to C for struct access (it all depends on your current codebase), this is not an option.
The persistence layout is based on the original struct layout - so be it. However, a more explicit approach of serializing the structs should be preferred for exactly the reason you bring this up. Besides the endianness issue, struct packing also affects compatibility and should be explicitly specified. For persistence, a packing of 1 would be optimal. For in-memory data structures, that alignment is far from optimal in terms of performance and concurrency characteristics. Also, different platforms might have incompatible data types (e.g. sizeof(long) on 64-bit Linux/Windows - LP64 vs. LLP64). So, keeping the persistence layout separate from in-memory data structures tends to have a long list of advantages and therefore usually outweighs the disadvantage of having to maintain the serialization code separately. Particularly if portability is a major concern.
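A small sketch of what that separation can look like (the names, types and packing are illustrative; the byte-order conversion discussed earlier would also live in this load step):

#include <cstdint>

#pragma pack(push, 1)              // fixed, compiler-independent layout on disk
struct SettingsOnDisk {
    std::uint16_t gain;
    std::uint32_t offset;
};
#pragma pack(pop)

struct Settings {                  // in-memory type: natural alignment, free to evolve
    std::uint32_t offset;
    std::uint16_t gain;
};

// Explicit, member-by-member conversion; endianness handling would go here too.
Settings load(const SettingsOnDisk& d) {
    return Settings{ d.offset, d.gain };
}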
You could take advantage of C/C++-based reflection libraries or implement one yourself. In the case of C, this will definitely require macros (e.g. Metaresc). In the case of C++, you might actually get away with your original struct definitions (e.g. Boost.PFR - Precise and Flat Reflection).
If reflection is not an option, you could generate the serialization code either by parsing the headers or debug symbols. Generally, parsing C/C++ is more complex. By moving the structs involved into dedicated headers, you might get away with a simple C/C++ parser. To make things easier, you could simplify parsing by processing the gdb output of ptype based on debug symbols. Or, you could parse debug symbols directly. With a scripting language like Python, both approaches should be feasible (pygccxml and pyelftools come to mind).
Rather than sticking to generating the serialization code as part of the build process, you could generate that code once and require updates whenever the structs change in the future. That's what I would do in a multi-platform scenario. Doing that would also spare you the pain of implementing a perfect parser that can deal with all kinds of C/C++ input, it would only have to be good enough for one-time generation.
Say I have a C++ project which has been working well for years.
Say also this project might (need to verify) contain undefined behaviour.
So maybe the compiler was kind to us and doesn't make the program misbehave even though there is UB.
Now imagine I want to add some features to the project, e.g. add the Crypto++ library to it.
But the actual code I add, say from Crypto++, is legitimate.
Here I read:
Your code, if part of a larger project, could conditionally call some 3rd party code (say, a shell extension that previews an image type in a file open dialog) that changes the state of some flags (floating point precision, locale, integer overflow flags, division by zero behavior, etc). Your code, which worked fine before, now exhibits completely different behavior.
But I can't gauge exactly what the author means. Is he saying that even by adding, say, the Crypto++ library to my project, although the Crypto++ code I add is itself legitimate, my project can suddenly start behaving incorrectly?
Is this realistic?
Any links which can confirm this?
It is hard for me to explain to the people involved that just adding a library might increase risks. Maybe someone can help me formulate how to explain this?
When source code invokes undefined behaviour, it means that the standard gives no guarantee on what could happen. It can work perfectly in one compilation run, but simply compiling it again with a newer version of the compiler or of a library could make it break. Or changing the optimisation level on the compiler can have the same effect.
A common example of that is reading one element past the end of an array. Suppose you expect it to be null and, by chance, the next memory location contains a 0 under normal conditions (say it is an error flag). It will work without problem. But suppose now that on another compilation run, after changing something totally unrelated, the memory organization is slightly changed and the next memory location after the array is no longer that flag (which kept a constant value) but a variable taking other values. Your program will break and will be hard to debug, because if that variable is used as a pointer, you could overwrite memory in random places.
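A tiny sketch of that failure mode; the layout assumption spelled out in the comment is exactly the accident such code silently relies on:

#include <cstdio>

static int table[4]   = {7, 13, 42, 5};
static int error_flag = 0;   // today this happens to sit right after `table`

int sum_until_zero() {
    int sum = 0;
    // Bug: when none of the elements is zero, the loop reads table[4],
    // one past the end -- undefined behaviour.  It "works" only while the
    // next object in memory happens to contain 0.
    for (int i = 0; table[i] != 0; ++i)
        sum += table[i];
    return sum;
}

int main() {
    std::printf("%d\n", sum_until_zero());  // prints 67 today; no guarantees tomorrow
}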
TL/DR: If one version works but you suspect UB in it, the only correct way is to consistently remove all possible UB from the code before any change. Alternatively, you can keep the working version untouched, but beware, you could have to change it later...
Over the years, C has mutated into a weird hybrid of a low-level language and a high-level language, where code provides a low-level description of a way of performing a task, and modern compilers then try to convert that into a high-level description of what the task is and then implement efficient code to perform that task (possibly in a way very different from what was specified). In order to facilitate the translation from the low-level sequence of steps into the higher-level description of the operations being performed, the compiler needs to make certain assumptions about the conditions under which those low-level steps will be performed. If those assumptions do not hold, the compiler may generate code which malfunctions in very weird and bizarre ways.
Complicating the situation is the fact that there are many common programming constructs which might be legal if certain parts of the rules were a little better thought-out, but which as the rules are written would authorize compilers to do anything they want. Identifying all the places where code does things which arguably should be legal, and which have historically worked correctly 99.999% of the time, but might break for arbitrary reasons can be very difficult.
Thus, one may wish for the addition of a new library not to break anything, and most of the time one's wish might come true, but unfortunately it's very difficult to know whether any code may have lurking time bombs within it.
So we've all heard the don't-use-register line, the reasoning being that trying to out-optimize a compiler is a fool's errand.
register, from what I know, doesn't actually state anything about CPU registers, just that a given variable can't be referenced indirectly. I'll hazard a guess that it's often referred to as obsolete because compilers can detect a lack of addressing automatically thus making such optimizations transparent.
But if we're firm on that argument, can't it be levelled at every optimization-driven keyword in C? Why do we use inline and C99's restrict for example?
I suppose that some things like aliasing make deducing some optimizations hard or even impossible, so where is the line drawn before we start venturing into Sufficiently Smart Compiler territory?
Where should the line be drawn in C and C++ between spoon-feeding a compiler optimization information and assuming it knows what it's doing?
EDIT: Jens Gustedt pointed out that my conflating of C and C++ isn't right since two of the keywords have semantic differences and one doesn't exist in standard C++. I had a good link about register in C++ which I'll add if I find it...
I would agree that register and inline are somewhat similar in this respect. If the compiler can see the body of the callee while compiling a call site, it should be able to make a good decision on inlining. The use of the inline keyword in both C and C++ has more to do with the mechanics of making the body of the function visible than with anything else.
restrict, however, is different. When compiling a function, the compiler has no idea of what the call sites are going to be. Being able to assume no aliasing can enable optimizations that would otherwise be impossible.
inline is used in the scenario where you implement a non-templated function within a header and then include it from multiple compilation units.
This ensures that the compiler creates just one instance of the function, as though it were inlined, so you do not get a link error for a multiply-defined symbol. It does not, however, require the compiler to actually inline it.
There are GNU attributes, e.g. __attribute__((always_inline)), that force inlining, but that is a language extension.
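A minimal sketch of the header-only pattern described above (the file and function names are illustrative):

// clamp.h -- included from any number of .cpp files.
#ifndef CLAMP_H
#define CLAMP_H

// `inline` here is about linkage, not about forcing inlining: every translation
// unit that includes this header gets a definition, and the linker merges them,
// so there is no "multiply defined symbol" error.
inline int clamp_to_byte(int v) {
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}

#endif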
register doesn't even say that you can't reference the variable indirectly (at least in C++). It said that in the original C, but that has been dropped.
Whether trying to out-optimize the compiler is a fool's errand depends on the optimization. Not many compilers, for example, will convert sin(x) * sin(x) + cos(x) * cos(x) into 1.
Today, most compilers ignore register, and no one uses it, because compilers have become good enough at register allocation to do a better job than you can with register. In fact, respecting register would typically make the generated code slower. This is not the case for inline or restrict: in both cases, there exist techniques, at least theoretically, which could result in the compiler doing a better job than you can. Such techniques are not widespread, however, and (as far as I know, at least) have a very high compile time overhead, with in some cases compile times which grow exponentially with the size of the program (which makes them more or less unusable on most real programs—compile times which are measured in years really aren't acceptable).
As to where to draw the line... it changes in time. When I first started programming in C, register made a significant difference, and was widely used. Today, no. I imagine that in time, the same may happen with inline or restrict—some experimental compilers are very close with inline already.
This is a flame-bait question but I will dive in anyway.
Compilers are a lot better at optimising than your average programmer. There was a time when I programmed on a 25MHz 68030 and I got some advantage from the use of register because the compiler's optimizer was so poor. But that was back in 1990.
I see inline as just as bad as register.
In general, measure first before you modify. If you find that your code performs so poorly you want to use register or inline, take a deep breath, stand back and look for a better algorithm first.
In recent times (i.e. the last 5 years) I have gone through code bases and removed inline functions galore with no perceptible change in performance being visible. Code size, however, always benefits from the removal of inline methods. That isn't a big issue for your standard x86-style monster multicore marvel of the modern age but it does matter if you work in the embedded space.
It is a moving target, because compiler technology is improving. (Well, sometimes it is more changing than improving, but that has some of the same effect of rendering your optimization attempts moot, or worse.)
Generally, you should not guess at whether an optimization keyword or other optimization technique is good or not. One has to learn quite a bit about how computers work, including the particular platform you are targeting, and how compilers work.
So a rule about using various optimization techniques is to ask: do I know the compiler will not do the best job here? Am I willing to commit to that for a while—will the compiler remain stable while this code is in use, and am I willing to rewrite the code when the compiler changes the situation? Typically, you have to be an experienced and knowledgeable software engineer to know when you can do better than the compiler. It also helps if you can talk to the compiler developers.
This means people cannot give you an answer here that has a definite guideline. It depends on what compiler you are using, what your project is, what your resources are, and what your goals are, and so on.
Although some people say not to try to out-optimize the compiler, there are various areas of software engineering where people do better than a compiler and in which it is worth the expense of paying people for this.
The difference is as follows:
register is a very local optimization (i.e. inside one function). Register allocation is a relatively solved problem, both through smarter compilers and through a larger number of registers (mostly the former, though x86-64, say, has more registers than x86, and both have more than, say, an 8-bit processor).
inline is harder, as it is an inter-procedural optimization. However, as it involves a relatively small depth of recursion and a small number of procedures (if the inlined procedure is too big, there is no sense in inlining it), it may be safely left to the compiler.
restrict is much harder. To fully know that two pointers don't alias, you would need to analyse the whole program (including libraries, the system, plug-ins etc.) - and even then run into problems. However, the information is clearer to the programmer AND it is part of the specification.
Consider very simple code:
void my_memcpy(void *dst, const void *src, size_t size) {
    for (size_t i = 0; i < size; i++) {
        ((char *)dst)[i] = ((const char *)src)[i];
    }
}
Is there a benefit to making this code efficient? Yes - memcpy tends to be very useful (say, for a copying GC). Can this code be vectorized (here - moved by words, say 128 bits at a time instead of 8)? The compiler would have to deduce that dst and src do not alias in any way and that the regions they point to are independent. size may depend on user input or runtime behaviour or other elements, which makes the analysis practically impossible - similar problems to the Halting Problem; in general we cannot analyse everything without running it. Or the function might be part of the C library (I assume shared libraries) and called by other programs, hence not all call sites are even known at compile time. Without such an analysis, vectorizing could make the program exhibit different behaviour with the optimization on. On the other hand, the programmer might ensure that they are different objects simply by knowing the (even higher-level) design, instead of needing a bottom-up analysis.
restrict can also be part of the documentation, as it might be that the programmer wrote the procedure in a way that cannot handle two aliasing pointers. For example, if we want to copy memory between aliasing locations, the above code is incorrect.
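For illustration, here is the same copy loop with the no-aliasing contract spelled out. restrict is a C99 keyword, so this C++ sketch uses the common compiler extension __restrict__ (GCC/Clang; MSVC spells it __restrict):

#include <cstddef>

// The caller now promises that dst and src never overlap, so the compiler is
// free to vectorize the loop (copying a word or a SIMD register at a time).
void my_memcpy(void* __restrict__ dst, const void* __restrict__ src, std::size_t size) {
    char* d = static_cast<char*>(dst);
    const char* s = static_cast<const char*>(src);
    for (std::size_t i = 0; i < size; i++)
        d[i] = s[i];
}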
So to sum up - a Sufficiently Smart Compiler would not be able to deduce restrict (unless we move to compilers that understand the meaning of the code) without knowing the whole program. Even then it would be close to undecidable. However, for local optimization the compilers are already sufficiently smart. My guess is that a Sufficiently Smart Compiler with whole-program analysis would nevertheless be able to deduce it in many interesting cases.
PS. By local I mean single function. So local optimization cannot assume anything about arguments, global variables etc.
One thing that hasn't been mentioned is that many non-x86 compilers aren't nearly as good at optimizing as gcc and other "modern" C-compilers are.
For instance, the compilers for PIC are absolutely terrible at optimizing. Also, the optimizer for cicc (the CUDA compiler), though much better, still seems to miss a lot of fairly simple optimizations.
For these cases, I've found optimization hints like register, inline, and #pragma unroll to be extremely useful.
From what I have seen back in the days when I was more involved with C/C++, these are merely orders given directly to the compiler. A compiler may try to inline a function even if it is not given a direct order to do so. That really depends on the compiler and may even raise some cross-compiler issues. As an example, Visual Studio provides different levels of optimization, which correspond to different levels of "intelligence" of the compiler. I have read that member functions defined inside the class are implicitly inline, giving the compiler a hint to minimize function call overhead. In any case, these directives are extremely helpful when you are using a less intelligent compiler, while with a smarter compiler the same optimizations may already be obvious to it.
Also, be sure that these keywords are guaranteed to be safe. Some compiler optimizations may not work with some libraries such as OpenGL (as I have seen it myself). So in cases where you feel that compiler optimization may be harmful, you can use these keywords to make sure it is done the way you want it to.
Compilers such as g++ these days optimize code very well. You might as well search for optimizations elsewhere, maybe in the methods and algorithms you use, or by using TBB or CUDA to make your code parallel.
I'd like to write an interpreter and tracing JIT for a programming language I'm designing. I already have many years of experience programming in C++, but I've been wondering if perhaps newer alternatives might be better. One of the things I found most frustrating, back in my C++ days, was having to use header files to deal with the clunky one-pass compiler model. The problem is that not all languages are equally suited for this purpose. For my tracing JIT, I need to be able to write executable code into memory and have the interpreter call to that code. I will also need the generated code to be able to call back into host functions.
I started looking at Go and saw that the language had pointers but no pointer arithmetic. This immediately struck me as a huge issue. I may well want to write my own allocator and garbage collector. I will need to closely control the way my language objects are laid out in memory and be able to get the address of specific fields and write to them. Unless there's ways to deal with this, it kind of seems like Go fails to be low-level enough for my purposes.
The D language seems promising. It has pointer arithmetic and a clear outline of the ABI needed to call in and out of D. I've heard lots of good things about it. It also has garbage collection which is nice for compiler writing, but I still have a few things I'm not sure about:
Does D have standard libs that will allow me to mark chunks of memory as executable?
If I allocate a big chunk of memory that I want to manage myself, with my own GC, and have a bunch of pointers going into there, will this pose problems with D's garbage collector?
How well does D interoperate with C code, in your experience? Is loading C dynamic libraries and calling into them fairly easy?
Finally, there's the whole support aspect. For those who have used D on linux here, how good is the toolchain? Any issues? Has anyone written a JIT compiler in D, and if so, how was the experience?
I believe so, see core.memory.GC if I remember right.
No, it shouldn't. Just call malloc or whatever you need, and make sure the GC doesn't see it.
Yes, it's pretty easy to interoperate with C code.
Caveat: You probably don't want to rely on the GC either, since it's not 'precise' (i.e. can and does leak memory if you're unlucky). But for small blocks of data it's usually fine.
Go does allow pointer arithmetic, but you must import the unsafe package to do so (or use a C function). Pointer arithmetic is a common source of bugs, and Go has other mechanisms, like slices, which provide safe ways to do some of the same activities that require pointer arithmetic in C. With unsafe you can cast any pointer to a uintptr and back, and uintptr is an ordinary numeric type, which allows you to do arithmetic.
I started looking at Go and saw that the language had pointers but no pointer arithmetic. This immediately struck me as a huge issue.
You, obviously, haven't tried the language. It works pretty well without any "pointer arithmetic". If you really need to bend rules, there is always "unsafe" package that will allow you to do anything.
I may well want to write my own allocator and garbage collector. I will need to closely control the way my language objects are laid out in memory and be able to get the address of specific fields and write to them.
I haven't written an allocator or garbage collector myself, but you can take the address of a field of a structure. All Go data structures are simple and easy to control and reason about. See http://research.swtch.com/godata for a short introduction. Also, size and alignment guarantees are part of the language: http://golang.org/ref/spec#Size_and_alignment_guarantees. If nothing else, you could always jump into C or asm.
IMHO, you should try to implement some small task to see if Go fits your requirements. Feel free to ask questions at http://groups.google.com/group/golang-nuts.
Alex
There is already a JIT compiler, very serious one, done in D. I highly recommend taking a look at http://lycus.org/ , more specifically pages about the MCI project - http://github.com/lycus/mci . MCI documentation will give you some more information. As you will see, MCI is more than just a JIT, it has its own (better than anything else I have seen) IR, optimizer, verifier, etc...
I am about to delve into kernel land. My question relates to the programming language. I have seen that most tutorials are written in C. I currently program in C++ and assembly. I also studied C before C++, but I didn't use it a lot. Would it be possible to program in kernel mode using simplistic C++ without using any advanced constructs? Basically, I am trying to avoid the minor differences that exist between the two languages (like no bool in C, no automatic returning of 0 from main - really minor differences). I won't be using templates, classes and the like. So would it be possible to program in kernel mode using simplistic C++ without any major annoyances?
Even if not officially supported, you can use C++ as the development language for Windows kernel development.
You should be aware of the following things:
you MUST define the new and delete operators to map to ExAllocatePoolWithTag and ExFreePool (a sketch follows below).
try to avoid virtual functions. It seems it is not possible to control the location of the object's vtable, and this may have unexpected results if it is in a pageable section and your code is called at IRQL >= DISPATCH_LEVEL.
if you still need to use virtual method tables, then lock the .rdata segment before using it at IRQL >= DISPATCH_LEVEL.
Apart from these kinds of limitations, you can use C++ for your driver development.
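As a hedged sketch of the first point, something along these lines; the pool type, the pool tag and the exact set of overloads are assumptions a real driver would adapt:

#include <ntddk.h>

// Route global new/delete to the kernel pool, as the first point above requires.
// Note: returning NULL on failure instead of throwing is the usual kernel
// convention, since C++ exceptions are not available here.
void* __cdecl operator new(size_t size)
{
    return ExAllocatePoolWithTag(NonPagedPool, size, 'XppC');  // 'XppC' is an arbitrary tag
}

void __cdecl operator delete(void* p)
{
    if (p)
        ExFreePool(p);
}

// MSVC may also emit calls to the sized form (C++14).
void __cdecl operator delete(void* p, size_t)
{
    if (p)
        ExFreePool(p);
}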
Adding two links if you want to do C++ in the WDK. It's a one-time setup effort.
The NT Insider:Guest Article: C++ in an NT Driver
The NT Insider:Global Relief Effort - C++ Runtime Support for the NT DDK
I have seen kernel code use lots of auto-locks/smart-pointers; although they make the code neat, I feel they have a learning curve for beginners to fully understand, and if abused, lots of construct/destruct code slows things down.
If you write your code carefully, knowing what exactly stands behind each definition, operator, call, etc, then there should be no problem writing kernel code in C++. The Microsoft document mentioned in the comments above is a good reading precisely because it describes situations in which C++ isn't as transparent as C or doesn't provide similar important guarantees and from that you know what to avoid.
Microsoft has written a guide. Basically they tell us to steer clear of anything but using C++'s relaxed rules of variable declarations...sigh. Anything else and you're on your own. Anyway it can't be all that bad but here are some examples of what you need to remember:
Memory allocated in the paged pool can get paged out. If you try to access it when IRQL is above PASSIVE_LEVEL you're screwed (or at least you will be every once in a while when your customer complains about your driver BSODding their system)! Test your driver on a low memory system under load!
The non-paged pool is limited; you most likely cannot allocate all your needs from it.
The stack is much smaller than in user mode (~12-24 KB).
Anything you do involving the floating point path in the kernel must be protected by KeSaveFloatingPointState and KeRestoreFloatingPointState (see the sketch at the end of this answer).
C++ exceptions: No
Read the guide for more. Now if you can make sure that the generated code follows the rules, go ahead and use C++.
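For the floating-point rule above, a minimal sketch of the save/restore bracket (x86-flavoured; the function and the arithmetic are placeholders):

#include <ntddk.h>

// Bracket any kernel floating-point work with save/restore, as required above.
NTSTATUS ComputeWithFloat(float input, float* result)
{
    KFLOATING_SAVE saved;
    NTSTATUS status = KeSaveFloatingPointState(&saved);
    if (!NT_SUCCESS(status))
        return status;              // FP state could not be saved; do no FP work

    *result = input * 0.5f;         // stand-in for the real floating-point math

    KeRestoreFloatingPointState(&saved);
    return STATUS_SUCCESS;
}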