LLVM | check if a value is signed [duplicate] - c++

The LLVM project does not distinguish between signed and unsigned integers as described here. There are situations where you need to know whether a particular variable should be interpreted as signed or unsigned though, for instance when it is extended to a larger size or when it is used in a division. My solution is to keep separate type information for every variable that describes whether it is an integer or a cardinal type.
However, I am wondering, isn't there a way to "attribute" a type in LLVM that way? I was looking for some sort of "user data" that could be added to a type but there seems to be nothing. This would have to happen somehow when the type is created since equal types are generated only once in LLVM.
My question therefore is:
Is there a way to track whether an integer variable should be interpreted as signed or unsigned within the LLVM infrastructure, or is the only way indeed to keep separate information like I do?
Thanks

First of all, make sure you really need to insert extra type metadata, since Clang already handles signed integer operations appropriately, for example by emitting sdiv and srem rather than udiv and urem.
Additionally, it's possible to use that to implement some lightweight type inference based on how the variables are accessed in the IR. Note that an operation like add doesn't need signedness information, since it is based on the two's-complement representation.
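To illustrate why only some operations carry signedness, here is a minimal C++ sketch (not LLVM API code; all function names are made up for this example). It takes the same 8-bit pattern and shows that addition produces the same bits either way, while division and widening depend on the interpretation, much like udiv/sdiv and zext/sext in the IR. It assumes a two's-complement target, which C++20 guarantees and earlier compilers provide in practice.

```cpp
#include <cstdint>

// Addition: the resulting bits are the same for signed and unsigned operands,
// so one operation is enough. The result wraps modulo 256.
constexpr std::uint8_t add_bits(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// Division with the operands read as unsigned (comparable to udiv).
constexpr std::uint8_t div_unsigned(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a / b);
}

// Division with the same bits read as signed (comparable to sdiv).
constexpr std::uint8_t div_signed(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(static_cast<std::int8_t>(a) /
                                     static_cast<std::int8_t>(b));
}

// Widening that fills the new high bits with zeros (comparable to zext).
constexpr std::uint32_t zero_extend(std::uint8_t a) {
    return a;
}

// Widening that copies the sign bit into the new high bits (comparable to sext).
constexpr std::uint32_t sign_extend(std::uint8_t a) {
    return static_cast<std::uint32_t>(
        static_cast<std::int32_t>(static_cast<std::int8_t>(a)));
}
```

For the pattern 0xF0 (240 unsigned, -16 signed), dividing by 2 gives 0x78 unsigned but 0xF8 signed, and widening gives 0x000000F0 versus 0xFFFFFFF0.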
Otherwise, I think that the best way to do that is to modify the front-end (Clang) to add some custom DWARF debug info. Here is a link that might get you started.
UPDATE:
If your goal is to implement static analysis directly on LLVM IR, this paper offers a thorough discussion:
Navas, J.A., Schachte, P., Søndergaard, H., Stuckey, P.J.:
Signedness-agnostic program analysis: Precise integer bounds for low-level code. In: Jhala, R., Igarashi, A. (eds.) APLAS 2012. LNCS,
vol. 7705, pp. 115–130. Springer, Heidelberg (2012)

Related

C++ data type greater than 64 bits

I'm trying to write an RSA implementation and need to work with numbers that are 100 bits and larger. Are there any C++ data types that can allow for this?
If by "C++ data type" you mean "primitive integral type guaranteed by the standard", then no.
If by "C++ data type" you mean "primitive integral type that actually exists on my platform", then maybe, but you'd have to tell us what your platform is, and you haven't.
If by "C++ data type" you mean "any type usable in C++", then the answer is trivially of course, since any platform will be able to fit std::array<uint32_t, 4>. You'll have to write some code to use that like a regular integral type, though.
The more general solution is to use a big integer, arbitrary precision or multiprecision library. For example Boost.multiprecision, but you can find lots of others now you know the correct search terms.
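As a rough idea of what "write some code" means for the std::array<uint32_t, 4> approach, here is a minimal sketch of 128-bit addition. The U128 alias and the add function are names invented for this example, and the limb order (least significant first) is an assumption. A real RSA implementation would also need multiplication, modular reduction and more, which is what the libraries provide.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit unsigned value: four 32-bit limbs, least significant first.
using U128 = std::array<std::uint32_t, 4>;

// Adds two 128-bit values limb by limb, propagating the carry.
// Overflow past 128 bits wraps around, like built-in unsigned arithmetic.
U128 add(const U128& a, const U128& b) {
    U128 result{};
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < result.size(); ++i) {
        const std::uint64_t sum = std::uint64_t{a[i]} + b[i] + carry;
        result[i] = static_cast<std::uint32_t>(sum);  // keep the low 32 bits
        carry = sum >> 32;                            // carry is 0 or 1
    }
    return result;
}
```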
Note
Maarten Bodewes makes a good point about security that I completely ignored by just answering the C++ part. You didn't say that your RSA implementation has any security requirements at all, but just in case ...
If you do care about it actually being safe to use in some real application, consider that
100 bits is probably much too weak, and
you may have more security concerns than just the correctness of the algorithm (such as side-channel attacks and malicious input attacks).
These are well outside the scope of this (or any other individual) question, but they deserve some thought and research. Using a multiprecision library intended specifically for cryptographic use is the minimal first step to getting this right.
If you are using GCC on a 64-bit target, you could use the __int128_t type in C++ to hold 16 bytes of data (i.e. a 128-bit integer). As mentioned by @Batsheba, you could instead use the Boost multiprecision library (it comes with /multiprecision/cpp_int.hpp) if you have any trouble using __int128_t.

Is there a fixed-width bool type in standard C++?

As far as I could find, the width of the bool type is implementation-defined. But are there any fixed-width boolean types, or should I stick to, e.g., a uint8_t to represent a fixed-width bool?
[EDIT]
I made this python script that auto-generates a C++ class which can hold the variables I want to be able to send between a micro controller and my computer. The way it works is that it also keeps two arrays holding a pointer to each one of these variables and the sizeof each one of them. This gives me the necessary information to easily serialize and deserialize each one of these variables. For this to work however the sizeof, endianness, etc of the variable types have to be the same on both sides since I'm using the same generated code on both sides.
I don't know if this will be a problem yet, but I don't expect it to be. I have already worked with this (32bit ARM) chip before and haven't had problems sending integer and float types in the past. However it will be a few days until I'm back and can try booleans out on the chip. This might be a bigger issue later, since this code might be reused on other chips later.
So my question is: is there a fixed-width bool type defined in the standard libraries, or should I just use a uint8_t to represent the boolean?
There is not. Just use uint8_t if you need to be sure of the size. Any integer type can easily be treated as boolean in C-related languages. See https://stackoverflow.com/a/4897859/1105015 for a lengthy discussion of how bool's size is not guaranteed by the standard to be any specific value.
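A minimal sketch of what that could look like for a serialized packet follows. The StatusPacket struct and the to_wire/from_wire helpers are hypothetical names for this example, not part of any library. The static_asserts document the size assumptions so that a platform where they fail is caught at compile time.

```cpp
#include <cstdint>

// Hypothetical wire format: every flag occupies exactly one byte.
struct StatusPacket {
    std::uint8_t motor_enabled;  // 0 = false, anything else = true
    std::uint8_t fault;
};

static_assert(sizeof(std::uint8_t) == 1, "uint8_t is expected to be one byte");
static_assert(sizeof(StatusPacket) == 2, "no padding expected between the flags");

// Normalize to 0 or 1 when writing, so both sides see the same byte.
inline std::uint8_t to_wire(bool value) { return value ? 1u : 0u; }

// Treat any non-zero byte as true when reading.
inline bool from_wire(std::uint8_t byte) { return byte != 0; }
```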

Ada to C++: Pass an unsigned 64-bit value

I need to pass 2 pieces of data from an Ada program to some C++ code for processing.
Data - double.
Time - unsigned 64 bits.
I was able to make a procedure in Ada that worked with my C++ method using a Long_Float (double in C++) and Integer (int in C++, obviously not 64-bits though).
I used the following code (code not on me so syntax might be slightly off):
procedure send_data (this : in hidden_ptr; data : in Long_Float; time : in Integer);
pragma import (CPP, send_data, "MyClass::sendData");
Now that that's working, I'm trying to expand the time to the full 64-bits and ideally would like to have an unsigned long long on the C++ side. I don't see any types in Ada that match that so I created my own type:
type U64 is mod 2 ** 64;
When using that type with my send_data method I get an error saying there are no possible ways to map that type to a C++ type (something along those lines, again don't have the code or exact error phrase on me).
Is there a way to pass a user-defined type in Ada to C++? Perhaps there's another type in Ada I can use as an unsigned 64-bit value that would work? Is there a way to pass the address of my U64 type as a parameter to the C++ method instead if that's easier? I'm using the green hills adamulti compiler v3.5 (very new to ada, not sure if that info helps or not). Examples would be greatly appreciated!
As an addendum to @KeithThompson's comment/answer...
Ada's officially supported C interface types are in Interfaces.C, and there's no extra-long int or unsigned in there (in the 2005 version. Is the 2012 version official yet?).
You did the right thing to work around it. If your compiler doesn't support that, more drastic measures will have to be taken.
The first obvious thing to try is to pass the 64-bit int by reference. That will require changes on the C side though.
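On the C++ side, the by-reference change might look roughly like the sketch below. MyClass here is only a stand-in for the asker's class, and the member names are invented. The Ada import would have to be changed to pass an address or access value to match, and whether your compiler accepts that is something to verify.

```cpp
#include <cstdint>

// Stand-in for the receiving class; only the signature matters here.
class MyClass {
public:
    // The timestamp arrives as a pointer, so only an address crosses the
    // language boundary instead of a 64-bit value.
    void sendData(double data, const std::uint64_t* time) {
        last_data_ = data;
        last_time_ = *time;  // read the full 64-bit value through the pointer
    }

    double last_data() const { return last_data_; }
    std::uint64_t last_time() const { return last_time_; }

private:
    double last_data_ = 0.0;
    std::uint64_t last_time_ = 0;
};
```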
I know C/C++ TIME structs tend to be 64-bit values, defined as unioned structs. So what you could do is define an Ada record to mimic one of those C structs, make sure it gets laid out the way C would (eg: with record representation and size clauses), and then make that object what your imported routine uses for its parameter.
After that you'll have to pull nasty parameter tricks. For example, you could try changing the Ada import's side of the parameter to a 64-bit float, unchecked_converting the actual parameter into a 64-bit float, and trying to pass it that way. The problem there is that a lot of CPUs pass floats in different registers than ints. So that likely won't work, and if it does it most certainly isn't portable.
There may well be other ways to fake it out, if you figure out how your compiler's CPP calling convention works. For example, if it uses two adjacent 32-bit registers to pass 64-bit ints, you could split your Ada 64-bit int into two and pass it in two parameters on the Ada side. If it passes 64-bit values by reference, you could just tell the Ada side that you're passing a pointer to your U64. Again, these solutions won't be portable, but they would get you going.
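If you go the split-in-two route, the C++ side has to reassemble the value. A minimal sketch, with made-up helper names, assuming the high half is passed first:

```cpp
#include <cstdint>

// Rebuilds a 64-bit value from two 32-bit halves received as separate parameters.
constexpr std::uint64_t join_halves(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// The inverse operations, shown for completeness.
constexpr std::uint32_t high_half(std::uint64_t value) {
    return static_cast<std::uint32_t>(value >> 32);
}

constexpr std::uint32_t low_half(std::uint64_t value) {
    return static_cast<std::uint32_t>(value);  // truncates to the low 32 bits
}
```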
Put_Line("First = " & Interfaces.C.long'Image(Interfaces.C.long'First));
Put_Line("Last = " & Interfaces.C.long'Image(Interfaces.C.long'Last));
result:
First = -9223372036854775808
Last = 9223372036854775807

Is there any tool for C++ which will check for common unspecified behavior?

Often one makes assumptions about a particular platform one is coding on, for example that signed integers use two's complement storage, or that (0xFFFFFFFF == -1), or things of that nature.
Does a tool exist which can check a codebase for the most common violations of these kinds of things (for those of us who want portable code but don't have strange non-two's-complement machines)?
(My examples above are specific to signed integers, but I'm interested in other errors (such as alignment or byte order) as well)
There are various levels of compiler warnings that you may wish to have switched on, and you can treat warnings as errors.
If there are other assumptions you know you make at various points in the code you can assert them. If you can do that with static asserts you will get failure at compile time.
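A few such assertions might look like the sketch below. The particular assumptions (8-bit bytes, 32-bit int, two's complement) are only examples, so pick the ones your code actually relies on. Byte order cannot be tested with static_assert before C++20, so it is checked with a small run-time helper whose name is invented here.

```cpp
#include <climits>
#include <cstdint>
#include <cstring>

// Compile-time checks: the build fails on a platform where these do not hold.
static_assert(CHAR_BIT == 8, "code assumes 8-bit bytes");
static_assert(sizeof(int) == 4, "code assumes 32-bit int");
// -1 is all ones only in two's complement; the other historical
// representations give 1 or 2 for this expression.
static_assert((-1 & 3) == 3, "code assumes two's complement");

// Run-time check of byte order: inspects the first byte of a known value.
inline bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}
```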
I know that Clang is very actively developing a static analyzer (as a library).
The goal is to catch errors at analysis time; however, the exact extent of the errors caught is not that clear to me yet. The library is called "Checker" and T. Kremenek is responsible for it; you can ask about it on the clang-dev mailing list.
I don't have the impression that there is any kind of reference about the checks being performed, and I don't think it's mature enough yet to be a production tool (given the rate of changes going on), but it may be worth a look.
Maybe a static code analysis tool? I used one a few years ago and it reported errors like this. It was not perfect and still limited but maybe the tools are better now?
update:
Maybe one of these:
What open source C++ static analysis tools are available?
update2:
I tried FlexeLint on your example (you can try it online using the Do-It-Yourself Example on http://www.gimpel-online.com/OnlineTesting.html) and it complains about it but perhaps not in a way you are looking for:
5 int i = -1;
6 if (i == 0xffffffff)
diy64.cpp 6 Warning 650: Constant '4294967295' out of range for operator '=='
diy64.cpp 6 Info 737: Loss of sign in promotion from int to unsigned int
diy64.cpp 6 Info 774: Boolean within 'if' always evaluates to False [Reference: file diy64.cpp: lines 5, 6]
Very interesting question. I think it would be quite a challenge to write a tool to flag these usefully, because so much depends on the programmer's intent and assumptions.
For example, it would be easy to recognize a construct like:
x &= -2; // round down to an even number
as being dependent on twos-complement representation, but what if the mask is a variable instead of a constant "-2"?
Yes, you could take it a step further and warn of any use of a signed int with bitwise &, any assignment of a negative constant to an unsigned int, and any assignment of a signed int to an unsigned int, etc., but I think that would lead to an awful lot of false positives.
[ sorry, not really an answer, but too long for a comment ]
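For the specific x &= -2 case, one way to sidestep the question (a sketch, not something a tool would do for you) is to do the masking on an unsigned value, where the result does not depend on how negative numbers are represented. The function name is made up for this example.

```cpp
// Rounds an unsigned value down to an even number by clearing the lowest bit.
// ~1u is all ones except bit 0 regardless of the signed representation.
constexpr unsigned round_down_to_even(unsigned x) {
    return x & ~1u;
}
```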

What is the uintptr_t data type?

What is uintptr_t and what can it be used for?
First thing, at the time the question was asked, uintptr_t was not in C++. It's in C99, in <stdint.h>, as an optional type. Many C++03 compilers do provide that file. It's also in C++11, in <cstdint>, where again it is optional, and which refers to C99 for the definition.
In C99, it is defined as "an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer".
Take this to mean what it says. It doesn't say anything about size.
uintptr_t might be the same size as a void*. It might be larger. It could conceivably be smaller, although such a C++ implementation approaches perverse. For example on some hypothetical platform where void* is 32 bits, but only 24 bits of virtual address space are used, you could have a 24-bit uintptr_t which satisfies the requirement. I don't know why an implementation would do that, but the standard permits it.
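The guarantee quoted above, and nothing more, can be written as a small round trip. The helper name is invented for this sketch.

```cpp
#include <cstdint>

// Converts a pointer to an integer and back. For a valid void pointer, the
// result is guaranteed to compare equal to the original pointer.
inline void* round_trip(void* p) {
    const std::uintptr_t as_integer = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>(as_integer);
}
```

Note that the guarantee is about the round trip only. What the integer value looks like in between is up to the implementation.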
uintptr_t is an unsigned integer type that is capable of storing a data pointer (whether it can hold a function pointer is unspecified), which typically means that it's the same size as a pointer.
It is optionally defined in C++11 and later standards.
A common reason to want an integer type that can hold an architecture's pointer type is to perform integer-specific operations on a pointer, or to obscure the type of a pointer by providing it as an integer "handle".
It's an unsigned integer type that is typically the size of a pointer. Whenever you need to do something unusual with a pointer, for example inverting all its bits (don't ask why), you cast it to uintptr_t, manipulate it as an ordinary integer, then cast it back.
There are already many good answers to "what is uintptr_t data type?". I will try to address the "what it can be used for?" part in this post.
Primarily for bitwise operations on pointers. Remember that in C++ one cannot perform bitwise operations on pointers. For reasons see Why can't you do bitwise operations on pointer in C, and is there a way around this?
Thus in order to do bitwise operations on pointers one would need to cast pointers to type uintptr_t and then perform bitwise operations.
Here is an example of a function that I just wrote to do bitwise exclusive or of 2 pointers to store in a XOR linked list so that we can traverse in both directions like a doubly linked list but without the penalty of storing 2 pointers in each node.
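#include <cstdint>

template <typename T>
T* xor_ptrs(T* t1, T* t2)
{
    // Pointers have no bitwise operators, so XOR their integer representations.
    return reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(t1) ^ reinterpret_cast<std::uintptr_t>(t2));
}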
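To show how such a function would be used when walking the list, here is a small sketch. The Node struct and the step helper are hypothetical and not taken from the original post. The stored link is not a real pointer and must never be dereferenced, and converting an arbitrary integer back to a pointer is implementation-defined, so treat this as something that works on mainstream compilers rather than a portable guarantee.

```cpp
#include <cstdint>

template <typename T>
T* xor_ptrs(T* t1, T* t2) {
    return reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(t1) ^
                                reinterpret_cast<std::uintptr_t>(t2));
}

// Hypothetical XOR-list node: link holds (previous XOR next).
struct Node {
    int value;
    Node* link;
};

// Given the node we came from, recover the node on the other side.
inline Node* step(Node* current, Node* came_from) {
    return xor_ptrs(current->link, came_from);
}
```

With three nodes a, b and c in a row, b.link would be set to xor_ptrs(&a, &c); stepping from b while coming from a then yields &c, and coming from c yields &a.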
Running the risk of getting another Necromancer badge, I would like to add one very good use for uintptr_t (or even intptr_t) and that is writing testable embedded code.
I write mostly embedded code targeted at various arm and currently tensilica processors. These have various native bus width and the tensilica is actually a Harvard architecture with separate code and data buses that can be different widths.
I use a test driven development style for much of my code which means I do unit tests for all the code units I write. Unit testing on actual target hardware is a hassle so I typically write everything on an Intel based PC either in Windows or Linux using Ceedling and GCC.
That being said, a lot of embedded code involves bit twiddling and address manipulations. Most of my Intel machines are 64 bit. So if you are going to test address manipulation code you need a generalized object to do math on. Thus the uintptr_t give you a machine independent way of debugging your code before you try deploying to target hardware.
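As an example of the kind of address math meant here, this sketch rounds an address up to an alignment boundary. The function names are invented for the example, and the alignment is assumed to be a power of two. Because the arithmetic is done in uintptr_t, the same code compiles on a 64-bit PC and a 32-bit target.

```cpp
#include <cstdint>

// Rounds an address up to the next multiple of alignment.
// Assumes alignment is a power of two.
constexpr std::uintptr_t align_address_up(std::uintptr_t address,
                                          std::uintptr_t alignment) {
    return (address + alignment - 1) & ~(alignment - 1);
}

// Pointer wrapper: does the math on the integer form, then converts back.
inline void* align_pointer_up(void* p, std::uintptr_t alignment) {
    return reinterpret_cast<void*>(
        align_address_up(reinterpret_cast<std::uintptr_t>(p), alignment));
}
```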
Another issue is that on some machines, or even in some memory models on some compilers, function pointers and data pointers have different widths. On those machines the compiler may not even allow casting between the two classes, but uintptr_t should be able to hold either.
-- Edit --
As @chux pointed out, this is not part of the standard, and functions are not objects in C. However, it usually works, and since many people don't even know about these types I usually leave a comment explaining the trickery. Other searches on SO for uintptr_t will provide further explanation. Also, we do things in unit testing that we would never do in production, because breaking things is good.