I am porting an application from Fortran to Java. I was wondering how to convert an EQUIVALENCE when it is between two different datatypes.
If I type cast, I may lose data; or should I pass it as a byte array?
You have to fully understand the old FORTRAN code. EQUIVALENCE shares memory WITHOUT converting the values between different datatypes. Perhaps the programmer was conserving memory by overlapping arrays that weren't used at the same time and the EQUIVALENCE can be ignored. Perhaps they were doing something very tricky, based on the binary representation of a particular platform, and you will need to figure out what they were doing.
There is extremely little reason to use EQUIVALENCE in modern Fortran. In most cases where bits need to be transferred from one type to another without conversion, the TRANSFER intrinsic function should be used instead.
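For illustration, here is a minimal C++ sketch (my own, not code from the question) of what such a bit-for-bit reinterpretation looks like; std::memcpy plays the role here that TRANSFER plays in Fortran, and java.nio.ByteBuffer would be the analogous tool on the Java side:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main() {
        float f = 1.0f;
        std::uint32_t bits;
        // Copy the raw bytes of the float into an integer, with no value
        // conversion - exactly the effect EQUIVALENCE/TRANSFER gives.
        static_assert(sizeof f == sizeof bits, "sizes must match");
        std::memcpy(&bits, &f, sizeof bits);
        std::printf("%f reinterpreted as bits: 0x%08X\n", f,
                    static_cast<unsigned>(bits));  // prints 0x3F800000
    }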
From http://www.fortran.com/F77_std/rjcnf0001-sh-8.html#sh-8.2 :
An EQUIVALENCE statement is used to specify the sharing of storage units by two or more entities in a program unit. This causes association of the entities that share the storage units.
If the equivalenced entities are of different data types, the EQUIVALENCE statement does not cause type conversion or imply mathematical equivalence. If a variable and an array are equivalenced, the variable does not have array properties and the array does not have the properties of a variable.
So, consider the reason it was EQUIVALENCE'd in the Fortran code and decide from there how to proceed. There's not enough information in your question to assess the intention or best way to convert it.
I've been trying to wrap my head around how C/C++ code is represented in machine code and I'm having trouble understanding what data types actually are apart from a designation of memory length.
Types are also associated with:
a set of values that all variables of that type can represent;
the layout in memory of that type (e.g. the meaning, if any, attached to each bit or byte that represents a variable);
the set of operations that can act on a variable;
the behaviour of those operations.
Types are not necessarily represented directly in machine code. A compiler emits a set of instructions and data (in a way that varies between target platforms) that manipulate memory and machine registers. The type of each variable, in C source, gives information to the compiler about what memory to allocate for it, and the compiler makes decisions for mapping between expressions (in C statements) and usage of registers and machine instructions to give the required effects.
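As a small illustration (a sketch, not tied to any particular compiler): the same '+' in the source leads the compiler to reserve different amounts of memory and emit different instructions depending on the declared types, even though no type tags exist in the machine code itself:

    #include <cstdio>

    int main() {
        int    i = 1;    // typically 4 bytes; '+' becomes an integer ALU add
        double d = 1.0;  // typically 8 bytes; '+' becomes a floating-point add
        i = i + i;
        d = d + d;
        std::printf("sizeof(int)=%zu sizeof(double)=%zu\n", sizeof i, sizeof d);
    }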
In C/C++, addition or subtraction on a pointer is defined only if the resulting pointer lies within the original pointed-to complete object. Moreover, comparison of two pointers can only be performed if the two pointed-to objects are subobjects of a single complete object.
What are the reasons of such limitations?
I supposed that the segmented memory model (see here, §1.2.1) could be one of the reasons, but since compilers can actually define a total order on all pointers, as demonstrated by this answer, I doubt this.
The reason is to keep the possibility to generate reasonable code. This applies to systems with a flat memory model as well as to systems with more complex memory models. If you forbid the (not very useful) corner cases, like adding or subtracting out of arrays, and do not demand a total order on pointers between objects, you can skip a lot of overhead in the generated code.
The limitations imposed by the standard allow the compiler to make assumptions about pointer arithmetic and use them to improve the quality of the code. This covers both computing things statically in the compiler instead of at runtime and choosing which instructions and addressing modes to use. As an example, consider a program with two pointers p1 and p2. If the compiler can derive that they point to different data objects, it can safely assume that no operation based on following p1 will ever affect the object pointed to by p2. This allows the compiler to reorder loads and stores based on p1 without considering loads and stores based on p2, and the other way around.
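A sketch of the kind of code this enables (the function and variable names are mine, for illustration):

    // If the compiler can prove p1 and p2 never point into the same object,
    // it may keep *p1 in a register across the store through p2.
    int load_store_reorder(int *p1, int *p2) {
        int a = *p1;  // first load of *p1
        *p2 = 42;     // store that provably cannot touch *p1
        int b = *p1;  // may be folded to 'b = a', avoiding a second load
        return a + b;
    }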
There are architectures where program and data spaces are separated, and it's simply impossible to subtract two arbitrary pointers. A pointer to a function or to const static data will be in a completely different address space than a normal variable.
Even if you arbitrarily supplied a ranking between different address spaces, there's a possibility that the ptrdiff_t type would need to be a larger size, and the process of comparing or subtracting two pointers would be greatly complicated. That's a bad idea in a language that is designed for speed.
You only prove that the restriction could be removed - but miss that it would come with a cost (in terms of memory and code) - which was contrary to the goals of C.
Specifically the difference needs to have a type, which is ptrdiff_t, and one would assume it is similar to size_t.
In a segmented memory model you (normally) indirectly have a limitation on the sizes of objects - assuming that the answers in: What's the real size of `size_t`, `uintptr_t`, `intptr_t` and `ptrdiff_t` type on 16-bit systems using segmented addressing mode? are correct.
Thus, at least for differences, removing that restriction would not only add extra instructions to ensure a total order - for an unimportant corner case (as in the other answer) - but would also spend double the amount of memory for differences etc.
C was designed to be more minimalistic and not to force the compiler to spend memory and code on such cases. (In those days memory limitations mattered more.)
Obviously there are also other benefits - like the possibility to detect errors when mixing pointers from different arrays. Similarly, mixing iterators for two different containers is undefined in C++ (with some minor exceptions) - and some debug implementations detect such errors.
The rationale is that some architectures have segmented memory, and pointers to different objects may point at different memory segments. The difference between the two pointers would then not necessarily be something meaningful.
This goes back all the way to pre-standard C. The C rationale doesn't mention this explicitly, but it hints at this being the reason, if we look where it explains the rationale why using a negative array index is undefined behavior (C99 rationale 5.10 6.5.6, emphasis mine):
In the case of p-1, on the other hand, an entire object would have to be allocated prior to the array of objects that p traverses, so decrement loops that run off the bottom of an array can fail. This restriction allows segmented architectures, for instance, to place objects at the start of a range of addressable memory.
Since the C standard intends to cover the majority of processor architectures, it should also cover this one:
Imagine an architecture (I know one, but wouldn't name it) where pointers are not just plain numbers, but are like structures or "descriptors". Such a structure contains information about the object it points into (its virtual address and size) and the offset within it. Adding to or subtracting from a pointer produces a new structure with only the offset field adjusted; producing a structure with an offset greater than the size of the object is prohibited by the hardware. There are other restrictions (such as how the initial descriptor is produced or what the other ways to modify it are), but they are not relevant to the topic.
In most cases where the Standard classifies an action as invoking Undefined Behavior, it has done so because:
1. There might be platforms where defining the behavior would be expensive. Segmented architectures could behave weirdly if code tries to do pointer arithmetic that extends beyond object boundaries, and some compilers may evaluate p > q by testing the sign of q-p.
2. There are some kinds of programming where defining the behavior would be useless. Many kinds of code can get by just fine without relying upon forms of pointer addition, subtraction, or relational comparison beyond those given by the Standard.
3. People writing compilers for various purposes should be capable of recognizing cases where quality compilers intended for such purposes should behave predictably, and handling such cases when appropriate, whether or not the Standard compels them to do so.
Both #1 and #2 are very low bars, and #3 was thought to be a "gimme". Although it has become fashionable for compiler writers to show off their cleverness by finding ways of breaking code whose behavior was defined by quality implementations intended for low-level programming, I don't think the authors of the Standard expected compiler writers to perceive a huge difference between actions which were required to behave predictably, and those where nearly all quality implementations were expected to behave identically but where it might conceivably be useful to let some arcane implementations do something else.
I would like to answer this by inverting the question. Instead of asking why pointer addition and most of the arithmetic operations are not allowed, why do pointers allow only adding or subtracting an integer, post- and pre-increment and decrement, and comparison (or subtraction) of pointers pointing into the same array? It has to do with the logical consequences of the arithmetic operations.
Adding/subtracting an integer n to a pointer p gives me the address of the nth element from the currently pointed-to element, in either the forward or reverse direction. Similarly, subtracting p1 and p2 pointing into the same array gives me the count of elements between the two pointers.
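A short sketch of both operations:

    #include <cstdio>

    int main() {
        double arr[5] = {0, 1, 2, 3, 4};
        double *p = &arr[1];
        double *q = p + 2;                    // two elements forward
        std::printf("*q = %g\n", *q);         // prints 3
        std::printf("q - p = %td\n", q - p);  // element count: prints 2
    }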
The fact (or design decision) that the pointer arithmetic operations are defined consistently with the type of the variable being pointed to is a real stroke of genius. Any operation other than the permitted ones defies logical reasoning about the program and therefore is not allowed.
I have heard quite a lot about storing external data in a pointer, for example in short string optimization (SSO).
For example: when we want to overload << for our SSO class, depending on the length of the string we want to print either the data behind the pointer or the in-place string.
Instead of creating a bool flag, we could encode this flag inside the pointer itself. If I am not mistaken, this is possible thanks to the PC architecture, which adds padding to prevent unaligned memory access.
But I have yet to see it in an example. How could we detect such a flag, when bitwise operations such as & (to check whether the most or least significant bit is set to 1 as a flag) are not allowed on pointers? Also, wouldn't this mess up dereferencing the pointer?
All answers are appreciated.
It is quite possible to do such things (unlike what others have said). Most modern architectures (x86-64, for example) impose alignment requirements that allow you to use the fact that the least significant bits of a pointer may be assumed to be zero, and make use of that storage for other purposes.
Let me pause for a second and say that what I'm about to describe is considered 'undefined behavior' by the C and C++ standards. You are going off the rails in a non-portable way by doing what I describe, but there are more standards governing the rules of a computer than the C++ standard (such as the processor's assembly reference and architecture docs). Caveat emptor.
With the assumption that we're working on x86_64, let us say that you have a class/structure that starts with a pointer member:
struct foo {
bar * ptr;
/* other stuff */
};
By the x86 architectural constraints, that pointer in foo must be aligned on an 8-byte boundary. In this trivial example, you can assume that every pointer to a struct foo is therefore an address divisible by 8, meaning the lowest 3 bits of a foo * will be zero.
In order to take advantage of such a constraint, you must play some casting games to allow the pointer to be treated as a different type. There are a bunch of different ways of performing the casting, ranging from the old C method (not recommended) of casting it to and from a uintptr_t, to cleaner methods of wrapping the pointer in a union. In order to access either the pointer or the ancillary data, you need to logically 'and' the datum with a bitmask that zeros out the part you don't want.
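A minimal sketch of that masking (the names are mine; this relies on the alignment assumption above and is formally non-portable):

    #include <cassert>
    #include <cstdint>

    struct bar { int x; };

    // Assume bar* values are 8-byte aligned, so the low 3 bits are free.
    const std::uintptr_t TAG_MASK = 0x7;

    std::uintptr_t pack(bar *p, unsigned tag) {
        std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
        assert((bits & TAG_MASK) == 0 && tag <= TAG_MASK);
        return bits | tag;                   // stash the tag in the low bits
    }

    bar *get_ptr(std::uintptr_t v) {
        return reinterpret_cast<bar *>(v & ~TAG_MASK);  // strip the tag bits
    }

    unsigned get_tag(std::uintptr_t v) {
        return static_cast<unsigned>(v & TAG_MASK);     // keep only the tag
    }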
As an example of this explanation, I wrote an AVL tree a few years ago that sinks the balance book-keeping data into a pointer, and you can take a look at that example here: https://github.com/jschmerge/structures/blob/master/tree/avl_tree.h#L31 (everything you need to see is contained in the struct avl_tree_node at the line I referenced).
Swinging back to a topic you mentioned in your initial question... Short string optimization isn't implemented quite the same way. The implementations of it in Clang's and GCC's standard libraries differ somewhat, but both boil down to using a union to overload a block of storage with either a pointer or an array of bytes, playing some clever tricks with the string's internal length field to differentiate whether the data is a pointer or a local array. For more details, this blog post explains it rather well: https://shaharmike.com/cpp/std-string/
"encode this flag inside pointer itself"
No, you are not allowed to do this in either C or C++.
The behaviour on setting (let alone dereferencing) a pointer to memory you don't own is undefined in either language.
Sadly, what you want to achieve has to be done at the assembler level, where the distinction between a pointer and an integer is sufficiently blurred.
So I already know there are 'blocks' or units of memory called... bytes? And different variables take up different numbers of bytes. But my real question is: when you create a new program, say in the compiler, does the memory start storing at address one? And using a pointer, can you see what fills which blocks of memory? Also, is this RAM? Sorry for so much wondering; I'm trying to get a grasp of the lower-level part of C++ to get a hint of how memory is stored and such, thanks.
Objects in C++ occupy memory, and if you can obtain the address of an object, you can inspect that memory. It's completely unspecified where and how that memory comes about; it's supposed to be provided by "the platform", i.e. the compiler knows how to generate machine code that interacts with the system's notion of memory in such a way that every object fits into some memory. You also have platform-provided services (malloc and operator new) to give you memory directly for your own use.
Since this question is likely to be closed fast (it fits in well with the original idea of SO, but not with current "policy") I'm adding this answer quickly so that I can continue writing it. I disagree that strongly with current policy, for this particular kind of case. So…
About the topic.
Memory management is an extremely large topic. However, your questions about it, e.g. "does the memory start storing at address one", concern the very basics. And this is a small topic, possible to answer.
The C++ memory model.
/ Bytes.
As seen from the inside of a C++ program, memory is a not necessarily contiguous sequence of bytes. A byte is in this context the smallest addressable unit of electronic memory (or more generally of computer main memory, if other technologies should become popular), and corresponds to C++ char. The C++11 standard describes it thusly, in its §1.7/1:
“A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined”
Essential facts about C++ bytes:
A byte is at least 8 bits.
In practice it’s either 8 bits or 16 bits. The latter size is used on some digital signal processors, e.g. from Texas Instruments.
The number of bits per byte is given by CHAR_BIT.
This macro symbol is defined by the <limits.h> C header. It yields a value that can be used at compile time. An alternative way to designate that value is std::numeric_limits<unsigned char>::digits, after including the <limits> C++ header.
unsigned char is commonly used as a byte type.
All three variants of char, namely plain char, unsigned char and signed char, are guaranteed to map to byte, but there is no dedicated standard C++ byte type.
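A quick sketch for inspecting these values on your own platform (using <climits>, the C++ wrapper for <limits.h>):

    #include <climits>    // CHAR_BIT
    #include <iostream>
    #include <limits>

    int main() {
        std::cout << "bits per byte: " << CHAR_BIT << '\n';
        std::cout << "same, via numeric_limits: "
                  << std::numeric_limits<unsigned char>::digits << '\n';
    }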
/ Locations.
A value of a built-in type such as double typically occupies a small number of bytes, contiguous in memory. The C++ standard, in its §1.7/3, refers to that, the bytes of a basic value, as a memory location. The essential fact about locations is that two threads can update separate memory locations without interfering with each other, but this is not guaranteed if they update separate bytes in the same memory location.
The sizeof operator produces the number of bytes of a value of a specified type.
By definition, in C++11 in §5.3.3/1, sizeof(char) is 1.
/ Addresses.
To quote the C++11 standard’s §1.7/1, “Every byte has a unique address.”.
The standard doesn’t define address further, but in practice, on modern machines the addresses that a C++ program deals with are bitpatterns of a fixed size, typically 32 or 64 bits.
When a C++ program deals directly with addresses it must do so via pointers, which are addresses with associated types. As a special case the pointer type void* represents untyped addresses, and as such must be able to store the largest address bitpatterns. Thus, on a modern machine, CHAR_BIT*sizeof(void*) is in practice the number of bits of an address as seen from inside a C++ program.
Pointer values (addresses) are only guaranteed comparable via the built-in ==, < etc. if they point within the same array, extended with a hypothetical extra item at the end. However, the standard library offers a more general pointer comparison. C++ §20.8.5/8:
“For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not.”
Thus, depending on the machine, addresses as seen from C++ either are integer values or can be mapped to integer values. But this does not mean that they can be mapped to int. Depending on the C++ implementation, type int may be too small to hold addresses.
There are very few guarantees about what direction addresses increase in; e.g. there is no guarantee that subsequent variable declarations give you locations with increasing addresses. However, there is such a guarantee for non-static data members that (C++03) have no intervening access specifier or (C++11) have the same access, e.g. public. C++11 §9.2/14:
“Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object.”
There is also such a guarantee for items of an array.
The literal 0, used where a pointer value is expected, denotes the null pointer of the relevant type. For the built-in relational operators C++ supports comparing a non-0 pointer to 0 via == and !=, but does not support magnitude comparisons. For absolute safety pointer comparisons can be done via e.g. std::less etc., as noted above.
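A sketch of such a safe comparison:

    #include <functional>
    #include <iostream>

    int main() {
        int a = 0, b = 0;                  // two unrelated objects
        const void *p = &a, *q = &b;
        // p < q is not guaranteed meaningful here;
        // std::less is guaranteed to yield a total order.
        std::less<const void *> before;
        std::cout << (before(p, q) ? "p orders before q"
                                   : "q orders before p") << '\n';
    }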
/ Objects.
An object is “a region of storage”, according to C++11 §1.8/1. That paragraph also notes that an object “has a type”, which determines how the bits in the memory region are interpreted. In order to create an object you can simply declare a variable (a variable is an object with a name) or e.g. use a new-expression.
Worth noting:
A region, in the formal sense of the C++ standard, is not necessarily contiguous.
As far as I can determine this fact is only implicit in the standard, in that an object can be a sub-object, which can be an object of a class with virtual inheritance (sharing a common base class sub-object), in a context of multiple inheritance, where that object – by definition a region of storage – is necessarily spread out in memory.
Dave Abrahams once contended that the intent was to support C++ implementations where objects could be spread around also in situations other than multiple virtual inheritance, but as far as I know no C++ implementations do that. In particular, a variable or any other most derived object (an object that isn't part of some other object) o is in practice a contiguous region of bytes, with all the bytes contained in the sizeof(o) bytes extending from and including the object's start address.
/ Arrays.
An array, in the sense of an array created via the [] notation, is a contiguous sequence of objects of some fixed type T. Each item (object in the array) has an associated index, starting at 0 for the first item and increasing contiguously. To refer to the first item of an array a, you can use square bracket notation and write a[0].
If the first item has start address a, then item number n has start address a + n*sizeof(T).
In other words, addresses increase in the same direction as the item indices, with item 0 placed lowest in memory.
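A sketch verifying that formula on a concrete array:

    #include <cassert>

    int main() {
        double a[4] = {};
        const char *base = reinterpret_cast<const char *>(a);
        for (int n = 0; n < 4; ++n)
            // &a[n] coincides with the start address plus n*sizeof(double)
            assert(reinterpret_cast<const char *>(&a[n])
                   == base + n * sizeof(double));
    }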
Operating system processes.
A C++ program can run on just about any kind of computer, from the smallest embedded chips to the largest supercomputers. At the small-computer end of the scale there is not necessarily any operating system or memory management hardware, with the program accessing the computer's physical memory and other hardware directly. But on e.g. a typical cell phone or desktop computer the program will be executed in an operating system process that isolates the program from direct access to the computer.
In particular, the addresses that an OS process sees and manages are not necessarily physical memory addresses. Instead they may be just logical addresses, which, transparently to your C++ code, are very efficiently mapped to physical addresses. Among other things this allows you to run two or more instances of your program at the same time without their memory addressing clashing - because the instances' logical addresses are mapped to different parts of physical memory.
Practical note: as a security measure, unless otherwise specified a C++ program for Windows, created with Microsoft's tools, will have parts placed at different logical addresses in different instances, to make it more difficult for malware to exploit known locations. Thus you can't even rely on fixed logical addresses. And so where objects will be placed, and so on, is not just compiler dependent and operating system dependent, but can depend on the particular instance of the program…
Still you have the guarantees discussed above, namely …
increasing addresses for sub-objects with the same access (e.g. public) within the same outer object, and
increasing addresses in the direction of higher indices in an array.
malloc and operator new are the library calls for allocating memory in a C++ program. It is important to note that they aren't provided by the platform; they are provided by the standard library. All that is specified in the C++ standard is that these calls should return a memory address that is allocated for the program.
The platform usually has a different API for allocating memory from the OS; e.g. on Linux there are the mmap() and brk() system calls, and on Windows there is the VirtualAlloc() system call. malloc and operator new use these system-specific syscalls to request memory from the OS, and then suballocate it to the program. In the OS kernel itself, these system calls usually modify MMU entries (on architectures that use an MMU).
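As a Linux-only sketch of requesting memory directly from the OS (the size and flags are illustrative):

    #include <cstddef>
    #include <cstdio>
    #include <sys/mman.h>

    int main() {
        const std::size_t size = 1 << 20;   // ask the kernel for 1 MiB
        void *mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }
        // ... malloc/operator new would suballocate from regions like this ...
        munmap(mem, size);                  // hand the region back to the OS
    }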
I am using the boost::units library to enforce physical consistency in a scientific project. I have read and tried several examples from the boost documentation. I am able to create my dimensions, units and quantities. I did some calculations; it works very well. It is exactly what I expected, except that...
In my project, I deal with time series which have several different units (temperature, concentration, density, etc.) based on six dimensions. In order to allow safe and easy unit conversions, I would like to add a member to each channel class representing the dimensions and units of the time series. And data treatment (import, conversion, etc.) is user-driven, therefore dynamic.
My problem is the following: because of the boost::units structure, quantities within a homogeneous system but with different dimensions have different types. Therefore you cannot directly declare a member such as:
boost::units::quantity channelUnits;
The compiler will complain that you have to specify the dimensions using template chevrons. But if you do so, you will not be able to store different types of quantities (say, quantities with different dimensions).
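To make the problem concrete, here is a sketch (using the SI system) where the two quantities below have unrelated C++ types, so neither can be stored in a member declared with the other's type:

    #include <boost/units/quantity.hpp>
    #include <boost/units/systems/si/length.hpp>
    #include <boost/units/systems/si/time.hpp>

    using namespace boost::units;

    int main() {
        quantity<si::length> d = 2.0 * si::meters;   // one type...
        quantity<si::time>   t = 3.0 * si::seconds;  // ...a different type
        // d = t;  // does not compile: no common base class to assign through
    }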
Then, I looked for the boost::units::quantity declaration to find out if there is a base class that I can use in a polymorphic way. But I haven't found one; instead I discovered that boost::units heavily uses template metaprogramming, which is not an issue in itself but does not fit my dynamic needs, since everything is resolved at compile time, not at run time.
After more reading, I tried to wrap different quantities in a boost::variant object (nice to meet it for the very first time).
typedef boost::variant<
boost::units::quantity<dim1>,
...
> channelUnitsType;
channelUnitsType channelUnits;
I performed some tests and it seems to work. But I am not confident with boost::variant and the visitor pattern.
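For reference, a minimal visitor sketch over such a variant (assuming length and time quantities as the alternatives):

    #include <boost/units/io.hpp>
    #include <boost/units/quantity.hpp>
    #include <boost/units/systems/si/length.hpp>
    #include <boost/units/systems/si/time.hpp>
    #include <boost/variant.hpp>
    #include <iostream>

    using namespace boost::units;

    typedef boost::variant<quantity<si::length>,
                           quantity<si::time> > channelUnitsType;

    // Prints whichever quantity the variant currently holds.
    struct print_visitor : boost::static_visitor<void> {
        template <class Q>
        void operator()(const Q &q) const { std::cout << q << '\n'; }
    };

    int main() {
        channelUnitsType u = quantity<si::length>(2.5 * si::meters);
        boost::apply_visitor(print_visitor(), u);  // prints "2.5 m"
    }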
My questions are the following:
Is there another - maybe best - way to have run-time type resolution?
Is dynamic_cast one of them? Unit conversions will not happen very often, and only a few data items are concerned.
If boost::variant is a suitable solution, what are its drawbacks?
Going deeper in my problem I read two articles providing tracks for a solution:
Kostadin Damevski, Expressing Measurements Units in Interfaces for Scientific Component Software;
Lingxiao Jiang, A Practical Type System for Validating Dimensional Units Correctness of C Programs.
The first gives good ideas for the interface implementation. The second gives a complete overview of what you must cope with.
I keep in mind that boost::units is a complete and efficient way to get dimensional consistency at compile time without runtime overhead. Anyway, for runtime dimensional consistency involving dimension changes you do need a dynamic structure that boost::units does not provide. So here I am: designing a units class that will exactly fit my needs. More work to achieve, more satisfaction at the end...
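As a sketch of the direction such a class could take (entirely my own illustration, with the six base dimensions stored as runtime exponents):

    #include <array>
    #include <cstddef>
    #include <stdexcept>

    // A runtime quantity: a value plus integer exponents over six dimensions.
    struct DynamicQuantity {
        double value;
        std::array<int, 6> dims;  // e.g. length, mass, time, temperature, ...

        DynamicQuantity operator*(const DynamicQuantity &o) const {
            DynamicQuantity r = {value * o.value, dims};
            for (std::size_t i = 0; i < r.dims.size(); ++i)
                r.dims[i] += o.dims[i];     // dimensions multiply: exponents add
            return r;
        }
        DynamicQuantity operator+(const DynamicQuantity &o) const {
            if (dims != o.dims)             // consistency checked at run time
                throw std::runtime_error("dimension mismatch");
            return {value + o.value, dims};
        }
    };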
About the original questions:
boost::variant works well for this job (it provides the dynamism boost::units is missing). Furthermore, it can be serialized out of the box. Thus it is an effective approach. But it adds a layer of abstraction for a simple - I am not saying trivial - task that could be done by a single class.
Casting is achieved by boost::get<> (or a visitor) instead of dynamic_cast<>.
boost::any could be easier to implement, but serialization then becomes hard.
I have been thinking about this problem and came up with the following conclusion:
1. Implement type erasure (pros: nice interfaces; cons: memory overhead)
It looks impossible to store a general quantity with a common dimension without overhead; that breaks one of the design principles of the library. Even type erasure won't help here.
2. Implement a convertible type (pros: nice interfaces; cons: operational overhead)
The only way I see without storage overhead is to choose a conventional (possibly hidden) system that all units are converted to and from. There is no memory overhead, but there is a multiplication overhead in almost all queries to the values, a tremendous number of conversions, and some loss of precision for high exponents (think of converting from the Avogadro number to powers of 10).
3. Allow implicit conversions (pros: nice interfaces; cons: harder to debug, unexpected operational overheads)
Another option, mostly in the practical side to alleviate the problem is to allow implicit conversion at the interface level, see here: https://groups.google.com/d/msg/boost-devel-archive/JvA5W9OETt8/5fMwXWuCdDsJ
4. Template/generic code (pros: no runtime or memory overhead, conceptually correct, philosophy follows that of the library; cons: harder to debug, ugly interfaces, possible code bloat, lots of template parameters everywhere)
If you ask the library designers, they will probably tell you that you need to make your functions generic. This is possible, but it complicates the code. For example:
template<class Length>
auto square(Length l) -> decltype(l*l){return l*l;}
I use C++11 to simplify the example here (it is possible to do it in C++98), and also to show that this is becoming easier to do in C++11 (and even simpler in C++14 with decltype(auto)).
I know that this is not the type of code you had in mind, but it is consistent with the design of the library. You may think: well, how do I restrict this function to physical lengths and not something else? Well, the answer is that you don't need to do this; however, if you insist, in the worst case...
template<class Length,
         typename std::enable_if<std::is_same<typename boost::units::get_dimension<Length>::type,
                                              boost::units::length_dimension>::value, int>::type = 0>
auto square(Length l) -> decltype(l*l) { return l*l; }
(In better cases decltype will do the SFINAE job.)
In my opinion, option 4. and possibly combined with 3. is the most elegant way ahead.
References:
https://www.boost.org/doc/libs/1_69_0/boost/units/get_dimension.hpp