Data conversion for ARM platform (from x86/x64) - c++

We have developed win32 application for x86 and x64 platform. We want to use the same application on ARM platform. Endianness will vary for ARM platform i.e. ARM platform uses Big endian format in general. So we want to handle this in our application for our device.
For e.g. // In x86/x64, int nIntVal = 0x12345678
In ARM, int nIntVal = 0x78563412
How values will be stored for the following data types in ARM?
double
char array i.e. char chBuffer[256]
int64
Please clarify this.
Regards,
Raphel

Endianess only matters for register <-> memory operations.
In a register there is no endianess. If you put
int nIntVal = 0x12345678
in your code it will have the same effect on any endianess machine.
all IEEE formats (float, double) are identical in all architectures, so this does not matter.
You only have to care about endianess in two cases:
a) You write integers to files that have to be transferable between the two architectures.
Solution: Use the hton*, ntoh* family of converters, use a non-binary file format (e.g. XML) or a standardised file format (e.g. SQLite).
b) You cast integer pointers.
int a = 0x1875824715;
char b = a;
char c = *(char *)&a;
if (b == c) {
// You are working on Little endian
}
The latter code by the way is a handy way of testing your endianess at runtime.
Arrays and the likes if you use write, fwrite falimies of calls to transfer them you will have no problems unless they contain integers: then look above.
int64_t: look above. Only care if you have to store them binary in files or cast pointers.

(Sergey L., above, says, taht you mostly don't have to care for the byte order. He is right, with at least 1 exception: I assumed you want to convert binary data from one platform to the other ...)
http://en.wikipedia.org/wiki/Endianness has a good overview.
In short:
Little endian means, the least significant byte is stored first (at the lowest address)
Big endian means the most significant byte is stored first
The order in which array elements are stored is not affected (but the byte order in array elements, of course)
So
char array is unchanged
int64 - byte order is reversed compared to x86
With regard to the floating point format, consider http://en.wikipedia.org/wiki/Endianness#Floating-point_and_endianness. Generally it seems to obey the same rules of endianness as the integer format, but there is are exceptions for older ARM platforms. (I've no first hand experience of that).
Generally I'd suggest, test your conversion of primitive types by controlled experiments first.
Also consider, that compilers might use different padding in structs (a topic you haven't addressed yet).
Hope this helps.

In 98% cases you don't need to care about endianness. Unless you need to transfer some data between systems of different endiannness, or read/write some endian-sensitive file format, you should not bother with it. And even in those cases, you can write your code to perform properly when compiled under any endianness.
From Rob Pike's "The byte order fallacy" post:
Let's say your data stream has a little-endian-encoded 32-bit integer.
Here's how to extract it (assuming unsigned bytes):
i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
If it's big-endian, here's how to extract it:
i = (data[3]<<0) | (data[2]<<8) | (data[1]<<16) | (data[0]<<24);
Both these snippets work on any machine, independent of the machine's
byte order, independent of alignment issues, independent of just about
anything. They are totally portable, given unsigned bytes and 32-bit
integers.

The arm is little endian, it has two big endian variants depending on architecture, but it is better to just run native little endian, the tools and volumes of code out there are more fully tested in little endian mode.
Endianness is just one factor in system engineering if you do your system engineering it all works out, no fears, no worries. Define your interfaces and code to that design. Assuming for example that one processors endianness automatically results in having to byteswap is a bad assumption and will bite you eventually. You will end up having to swap an even number of times to undo other bad assumptions that cause a swap (ideally swapping zero times of course rather than 2 or 4 or 6, etc times). If you have any endian concerns at all when writing code you should write it endian independent.
Since some ARMs have BE32 (word invariant) and the newer arms BE8 (byte invariant) you would have to do even more work to try to make something generic that also tries to compensate for little intel, little arm, BE32 arm and BE8 arm. Xscale tends to run big endian natively but can be run as little endian to reduce the headaches. You may be assuming that because an ARM clone is big endian then all are big endian, another bad assumption.

Related

How to write convertible code, 32 bit/64 bit?

A c++ specific question. So i read a question about what makes a program 32 bit/64 bit, and the anwser it got was something like this (sorry i cant find the question, was somedays ago i looked at it and i cant find it again:( ): As long as you dont make any "pointer assumptions", you only need to recompile it. So my question is, what are pointer assumtions ? To my understanding there is 32 bit pointer and 64 bit pointers so i figure it is something to do with that . Please show the diffrence in code between them. Any other good habits to keep in mind while writing code, that helps it making it easy to convert between the to are also welcome :) tho please share examples with them
Ps. I know there is this post:
How do you write code that is both 32 bit and 64 bit compatible?
but i tougth it was kind of to generall with no good examples, for new programmers like myself. Like what is a 32 bit storage unit ect. Kinda hopping to break it down a bit more (no pun intended ^^ ) ds.
In general it means that your program behavior should never depend on the sizeof() of any types (that are not made to be of some exact size), neither explicitly nor implicitly (this includes possible struct alignments as well).
Pointers are just a subset of them, and it probably also means that you should not try to rely on being able to convert between unrelated pointer types and/or integers, unless they are specifically made for this (e.g. intptr_t).
In the same way you need to take care of things written to disk, where you should also never rely on the size of e.g. built in types, being the same everywhere.
Whenever you have to (because of e.g. external data formats) use explicitly sized types like uint32_t.
For a well-formed program (that is, a program written according to syntax and semantic rules of C++ with no undefined behaviour), the C++ standard guarantees that your program will have one of a set of observable behaviours. The observable behaviours vary due to unspecified behaviour (including implementation-defined behaviour) within your program. If you avoid unspecified behaviour or resolve it, your program will be guaranteed to have a specific and certain output. If you write your program in this way, you will witness no differences between your program on a 32-bit or 64-bit machine.
A simple (forced) example of a program that will have different possible outputs is as follows:
int main()
{
std::cout << sizeof(void*) << std::endl;
return 0;
}
This program will likely have different output on 32- and 64-bit machines (but not necessarily). The result of sizeof(void*) is implementation-defined. However, it is certainly possible to have a program that contains implementation-defined behaviour but is resolved to be well-defined:
int main()
{
int size = sizeof(void*);
if (size != 4) {
size = 4;
}
std::cout << size << std::endl;
return 0;
}
This program will always print out 4, despite the fact it uses implementation-defined behaviour. This is a silly example because we could have just done int size = 4;, but there are cases when this does appear in writing platform-independent code.
So the rule for writing portable code is: aim to avoid or resolve unspecified behaviour.
Here are some tips for avoiding unspecified behaviour:
Do not assume anything about the size of the fundamental types beyond that which the C++ standard specifies. That is, a char is at least 8 bit, both short and int are at least 16 bits, and so on.
Don't try to do pointer magic (casting between pointer types or storing pointers in integral types).
Don't use a unsigned char* to read the value representation of a non-char object (for serialisation or related tasks).
Avoid reinterpret_cast.
Be careful when performing operations that may over or underflow. Think carefully when doing bit-shift operations.
Be careful when doing arithmetic on pointer types.
Don't use void*.
There are many more occurrences of unspecified or undefined behaviour in the standard. It's well worth looking them up. There are some great articles online that cover some of the more common differences that you'll experience between 32- and 64-bit platforms.
"Pointer assumptions" is when you write code that relies on pointers fitting in other data types, e.g. int copy_of_pointer = ptr; - if int is a 32-bit type, then this code will break on 64-bit machines, because only part of the pointer will be stored.
So long as pointers are only stored in pointer types, it should be no problem at all.
Typically, pointers are the size of the "machine word", so on a 32-bit architecture, 32 bits, and on a 64-bit architecture, all pointers are 64-bit. However, there are SOME architectures where this is not true. I have never worked on such machines myself [other than x86 with it's "far" and "near" pointers - but lets ignore that for now].
Most compilers will tell you when you convert pointers to integers that the pointer doesn't fit into, so if you enable warnings, MOST of the problems will become apparent - fix the warnings, and chances are pretty decent that your code will work straight away.
There will be no difference between 32bit code and 64bit code, the goal of C/C++ and other programming languages are their portability, instead of the assembly language.
The only difference will be the distrib you'll compile your code on, all the work is automatically done by your compiler/linker, so just don't think about that.
But: if you are programming on a 64bit distrib, and you need to use an external library for example SDL, the external library will have to also be compiled in 64bit if you want your code to compile.
One thing to know is that your ELF file will be bigger on a 64bit distrib than on a 32bit one, it's just logic.
What's the point with pointer? when you increment/change a pointer, the compiler will increment your pointer from the size of the pointing type.
The contained type size is defined by your processor's register size/the distrib your working on.
But you just don't have to care about this, the compilation will do everything for you.
Sum: That's why you can't execute a 64bit ELF file on a 32bit distrib.
Typical pitfalls for 32bit/64bit porting are:
The implicit assumption by the programmer that sizeof(void*) == 4 * sizeof(char).
If you're making this assumption and e.g. allocate arrays that way ("I need 20 pointers so I allocate 80 bytes"), your code breaks on 64bit because it'll cause buffer overruns.
The "kitten-killer" , int x = (int)&something; (and the reverse, void* ptr = (void*)some_int). Again an assumption of sizeof(int) == sizeof(void*). This doesn't cause overflows but looses data - the higher 32bit of the pointer, namely.
Both of these issues are of a class called type aliasing (assuming identity / interchangability / equivalence on a binary representation level between two types), and such assumptions are common; like on UN*X, assuming time_t, size_t, off_t being int, or on Windows, HANDLE, void* and long being interchangeable, etc...
Assumptions about data structure / stack space usage (See 5. below as well). In C/C++ code, local variables are allocated on the stack, and the space used there is different between 32bit and 64bit mode due to the point below, and due to the different rules for passing arguments (32bit x86 usually on the stack, 64bit x86 in part in registers). Code that just about gets away with the default stacksize on 32bit might cause stack overflow crashes on 64bit.
This is relatively easy to spot as a cause of the crash but depending on the configurability of the application possibly hard to fix.
Timing differences between 32bit and 64bit code (due to different code sizes / cache footprints, or different memory access characteristics / patterns, or different calling conventions ) might break "calibrations". Say, for (int i = 0; i < 1000000; ++i) sleep(0); is likely going to have different timings for 32bit and 64bit ...
Finally, the ABI (Application Binary Interface). There's usually bigger differences between 64bit and 32bit environments than the size of pointers...
Currently, two main "branches" of 64bit environments exist, IL32P64 (what Win64 uses - int and long are int32_t, only uintptr_t/void* is uint64_t, talking in terms of the sized integers from ) and LP64 (what UN*X uses - int is int32_t, long is int64_t and uintptr_t/void* is uint64_t), but there's the "subdivisions" of different alignment rules as well - some environments assume long, float or double align at their respective sizes, while others assume they align at multiples of four bytes. In 32bit Linux, they align all at four bytes, while in 64bit Linux, float aligns at four, long and double at eight-byte multiples.
The consequence of these rules is that in many cases, bith sizeof(struct { ...}) and the offset of structure/class members are different between 32bit and 64bit environments even if the data type declaration is completely identical.
Beyond impacting array/vector allocations, these issues also affect data in/output e.g. through files - if a 32bit app writes e.g. struct { char a; int b; char c, long d; double e } to a file that the same app recompiled for 64bit reads in, the result will not be quite what's hoped for.
The examples just given are only about language primitives (char, int, long etc.) but of course affect all sorts of platform-dependent / runtime library data types, whether size_t, off_t, time_t, HANDLE, essentially any nontrivial struct/union/class ... - so the space for error here is large,
And then there's the lower-level differences, which come into play e.g. for hand-optimized assembly (SSE/SSE2/...); 32bit and 64bit have different (numbers of) registers, different argument passing rules; all of this affects strongly how such optimizations perform and it's very likely that e.g. SSE2 code which gives best performance in 32bit mode will need to be rewritten / needs to be enhanced to give best performance 64bit mode.
There's also code design constraints which are very different for 32bit and 64bit, particularly around memory allocation / management; an application that's been carefully coded to "maximize the hell out of the mem it can get in 32bit" will have complex logic on how / when to allocate/free memory, memory-mapped file usage, internal caching, etc - much of which will be detrimental in 64bit where you could "simply" take advantage of the huge available address space. Such an app might recompile for 64bit just fine, but perform worse there than some "ancient simple deprecated version" which didn't have all the maximize-32bit peephole optimizations.
So, ultimately, it's also about enhancements / gains, and that's where more work, partly in programming, partly in design/requirements comes in. Even if your app cleanly recompiles both on 32bit and 64bit environments and is verified on both, is it actually benefitting from 64bit ? Are there changes that can/should be done to the code logic to make it do more / run faster in 64bit ? Can you do those changes without breaking 32bit backward compatibility ? Without negative impacts on the 32bit target ? Where will the enhancements be, and how much can you gain ?
For a large commercial project, answers to these questions are often important markers on the roadmap because your starting point is some existing "money maker"...

Endianness swap without ntohs

I am writing an ELF analyzer, but I'm having some trouble converting endianness properly. I have functions to determine the endianness of the analyzer and the endiannness of the object file.
Basically, there are four possible scenarios:
A big endian compiled analyzer run on a big endian object file
nothing needs converted
A big endian compiled analyzer run on a little endian object file
the byte order needs swapped, but ntohs/l() and htons/l() are both null macros on a big endian machine, so they won't swap the byte order. This is the problem
A little endian compiled analyzer run on a big endian object file
the byte order needs swapped, so use htons() to swap the byte order
A little endian compiled analyzer run on a little endian object file.
nothing needs converted
Is there a function I can use to explicitly swap byte order/change endianness, since ntohs/l() and htons/l() take the host's endianness into account and sometimes don't convert? Or do I need to find/write my own swap byte order function?
I think it's worth raising The Byte Order Fallacy article here, by Rob Pyke (one of Go's author).
If you do things right -- ie you do not assume anything about your platforms byte order -- then it will just work. All you need to care about is whether ELF format files are in Little Endian or Big Endian mode.
From the article:
Let's say your data stream has a little-endian-encoded 32-bit integer. Here's how to extract it (assuming unsigned bytes):
i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
If it's big-endian, here's how to extract it:
i = (data[3]<<0) | (data[2]<<8) | (data[1]<<16) | (data[0]<<24);
And just let the compiler worry about optimizing the heck out of it.
In Linux there are several conversion functions in endian.h, which allow to convert between arbitrary endianness:
uint16_t htobe16(uint16_t host_16bits);
uint16_t htole16(uint16_t host_16bits);
uint16_t be16toh(uint16_t big_endian_16bits);
uint16_t le16toh(uint16_t little_endian_16bits);
uint32_t htobe32(uint32_t host_32bits);
uint32_t htole32(uint32_t host_32bits);
uint32_t be32toh(uint32_t big_endian_32bits);
uint32_t le32toh(uint32_t little_endian_32bits);
uint64_t htobe64(uint64_t host_64bits);
uint64_t htole64(uint64_t host_64bits);
uint64_t be64toh(uint64_t big_endian_64bits);
uint64_t le64toh(uint64_t little_endian_64bits);
Edited, less reliable solution. You can use union to access the bytes in any order. It's quite convenient:
union {
short number;
char bytes[sizeof(number)];
};
Do I need to find/write my own swap byte order function?
Yes you do. But, to make it easy, I refer you to this question: How do I convert between big-endian and little-endian values in C++? which gives a list of compiler specific byte order swap functions, as well as some implementations of byte order swap functions.
The ntoh functions can swap between more than just big and little endian. Some systems are also 'middle endian' where the bytes are scrambled up rather than just ordered one way or another.
Anyway, if all you care about are big and little endian, then all you need to know is if the host and the object file's endianess differ. You'll have your own function which unconditionally swaps byte order and you'll call it or not based on whether or not host_endianess()==objectfile_endianess().
If I would think about a cross-platform solution that would work on windows or linux, I would write something like:
#include <algorithm>
// dataSize is the number of bytes to convert.
char le[dataSize];// little-endian
char be[dataSize];// big-endian
// Fill contents in le here...
std::reverse_copy(le, le + dataSize, be);

C++ : why bool is 8 bits long?

In C++, I'm wondering why the bool type is 8 bits long (on my system), where only one bit is enough to hold the boolean value ?
I used to believe it was for performance reasons, but then on a 32 bits or 64 bits machine, where registers are 32 or 64 bits wide, what's the performance advantage ?
Or is it just one of these 'historical' reasons ?
Because every C++ data type must be addressable.
How would you create a pointer to a single bit? You can't. But you can create a pointer to a byte. So a boolean in C++ is typically byte-sized. (It may be larger as well. That's up to the implementation. The main thing is that it must be addressable, so no C++ datatype can be smaller than a byte)
Memory is byte addressable. You cannot address a single bit, without shifting or masking the byte read from memory. I would imagine this is a very large reason.
A boolean type normally follows the smallest unit of addressable memory of the target machine (i.e. usually the 8bits byte).
Access to memory is always in "chunks" (multiple of words, this is for efficiency at the hardware level, bus transactions): a boolean bit cannot be addressed "alone" in most CPU systems. Of course, once the data is contained in a register, there are often specialized instructions to manipulate bits independently.
For this reason, it is quite common to use techniques of "bit packing" in order to increase efficiency in using "boolean" base data types. A technique such as enum (in C) with power of 2 coding is a good example. The same sort of trick is found in most languages.
Updated: Thanks to a excellent discussion, it was brought to my attention that sizeof(char)==1 by definition in C++. Hence, addressing of a "boolean" data type is pretty tied to the smallest unit of addressable memory (reinforces my point).
The answers about 8-bits being the smallest amount of memory that is addressable are correct. However, some languages can use 1-bit for booleans, in a way. I seem to remember Pascal implementing sets as bit strings. That is, for the following set:
{1, 2, 5, 7}
You might have this in memory:
01100101
You can, of course, do something similar in C / C++ if you want. (If you're keeping track of a bunch of booleans, it could make sense, but it really depends on the situation.)
I know this is old but I thought I'd throw in my 2 cents.
If you limit your boolean or data type to one bit then your application is at risk for memory curruption. How do you handle error stats in memory that is only one bit long?
I went to a job interview and one of the statements the program lead said to me was, "When we send the signal to launch a missle we just send a simple one bit on off bit via wireless. Sending one bit is extremelly fast and we need that signal to be as fast as possible."
Well, it was a test to see if I understood the concepts and bits, bytes, and error handling. How easy would it for a bad guy to send out a one bit msg. Or what happens if during transmittion the bit gets flipped the other way.
Some embedded compilers have an int1 type that is used to bit-pack boolean flags (e.g. CCS series of C compilers for Microchip MPU's). Setting, clearing, and testing these variables uses single-instruction bit-level instructions, but the compiler will not permit any other operations (e.g. taking the address of the variable), for the reasons noted in other answers.
Note, however, that std::vector<bool> is allowed to use bit-packing, i.e. to store the bits in smaller units than an ordinary bool. But it is not required.

Is there a relation between integer and register sizes?

Recently, I was challenged in a recent interview with a string manipulation problem and asked to optimize for performance. I had to use an iterator to move back and forth between TCHAR characters (with UNICODE support - 2bytes each).
Not really thinking of the array length, I made a curial mistake with not using size_t but an int to iterate through. I understand it is not compliant and not secure.
int i, size = _tcslen(str);
for(i=0; i<size; i++){
// code here
}
But, the maximum memory we can allocate is limited. And if there is a relation between int and register sizes, it may be safe to use an integer.
E.g.: Without any virtual mapping tools, we can only map 2^register-size bytes. Since TCHAR is 2 bytes long, half of that number. For any system that has int as 32-bits, this is not going to be a problem even if you dont use an unsigned version of int. People with embedded background used to think of int as 16-bits, but memory size will be restricted on such a device. So I wonder if there is a architectural fine-tuning decision between integer and register sizes.
The C++ standard doesn't specify the size of an int. (It says that sizeof(char) == 1, and sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long).
So there doesn't have to be a relation to register size. A fully conforming C++ implementation could give you 256 byte integers on your PC with 32-bit registers. But it'd be inefficient.
So yes, in practice, the size of the int datatype is generally equal to the size of the CPU's general-purpose registers, since that is by far the most efficient option.
If an int was bigger than a register, then simple arithmetic operations would require more than one instruction, which would be costly. If they were smaller than a register, then loading and storing the values of a register would require the program to mask out the unused bits, to avoid overwriting other data. (That is why the int datatype is typically more efficient than short.)
(Some languages simply require an int to be 32-bit, in which case there is obviously no relation to register size --- other than that 32-bit is chosen because it is a common register size)
Going strictly by the standard, there is no guarantee as to how big/small an int is, much less any relation to the register size. Also, some architectures have different sizes of registers (i.e: not all registers on the CPU are the same size) and memory isn't always accessed using just one register (like DOS with its Segment:Offset addressing).
With all that said, however, in most cases int is the same size as the "regular" registers since it's supposed to be the most commonly used basic type and that's what CPUs are optimized to operate on.
AFAIK, there is no direct link between register size and the size of int.
However, since you know for which platform you're compiling the application, you can define your own type alias with the sizes you need:
Example
#ifdef WIN32 // Types for Win32 target
#define Int16 short
#define Int32 int
// .. etc.
#elif defined // for another target
Then, use the declared aliases.
I am not totally aware, if I understand this correct, since some different problems (memory sizes, allocation, register sizes, performance?) are mixed here.
What I could say is (just taking the headline), that on most actual processors for maximum speed you should use integers that match register size. The reason is, that when using smaller integers, you have the advantage of needing less memory, but for example on the x86 architecture, an additional command for conversion is needed. Also on Intel you have the problem, that accesses to unaligned (mostly on register-sized boundaries) memory will give some penality. Off course, on todays processors things are even more complex, since the CPUs are able to process commands in parallel. So you end up fine tuning for some architecture.
So the best guess -- without knowing the architectore -- speeedwise is, to use register sized ints, as long you can afford the memory.
I don't have a copy of the standard, but my old copy of The C Programming Language says (section 2.2) int refers to "an integer, typically reflecting the natural size of integers on the host machine." My copy of The C++ Programming Language says (section 4.6) "the int type is supposed to be chosen to be the most suitable for holding and manipulating integers on a given computer."
You're not the only person to say "I'll admit that this is technically a flaw, but it's not really exploitable."
There are different kinds of registers with different sizes. What's important are the address registers, not the general purpose ones. If the machine is 64-bit, then the address registers (or some combination of them) must be 64-bits, even if the general-purpose registers are 32-bit. In this case, the compiler may have to do some extra work to actually compute 64-bit addresses using multiple general purpose registers.
If you don't think that hardware manufacturers ever make odd design choices for their registers, then you probably never had to deal with the original 8086 "real mode" addressing.

Any way to read big endian data with little endian program?

An external group provides me with a file written on a Big Endian machine, and they also provide a C++ parser for the file format.
I only can run the parser on a little endian machine - is there any way to read the file using their parser without add a swapbytes() call after each read?
Back in the early Iron Age, the Ancients encountered this issue when they tried to network primitive PDP-11 minicomputers with other primitive computers. The PDP-11 was the first little-Endian computer, while most others at the time were big-Endian.
To solve the problem, once and for all, they developed the network byte order concept (always big-Endia), and the corresponding network byte order macros ntohs(), ntohl(), htons(), and htonl(). Code written with those macros will always "get the right answer".
Lean on your external supplier to use the macros in their code, and the file they supply you will always be big-Endian, even if they switch to a little-Endian machine. Rewrite the parser they gave you to use the macros, and you will always be able to read their file, even if you switch to a big-Endian machine.
A truly prodigious amount of programmer time has been wasted on this particular problem. There are days when I think a good argument could be made for hanging the PDP-11 designer who made the little-Endian feature decision.
Try persuading the parser team to include the following code:
int getInt(char* bytes, int num)
{
int ret;
assert(num == 4);
ret = bytes[0] << 24;
ret |= bytes[1] << 16;
ret |= bytes[2] << 8;
ret |= bytes[3];
return ret;
}
it might be more time consuming than a general int i = *(reinterpret_cast<*int>(&myCharArray)); but will always get the endianness right on both big and small endian systems.
In general, there's no "easy" solution to this. You will have to modify the parser to swap the bytes of each and every integer read from the file.
It depends upon what you are doing with the data. If you are going to print the data out, you need to swap the bytes on all the numbers. If you are looking through the file for one or more values, it may be faster to byte swap your comparison value.
In general, Greg is correct, you'll have to do it the hard way.
the best approach is to just define the endianess in the file format, and not say it's machine dependent.
the writer will have to write the bytes in the correct order regardless of the CPU it's running on, and the reader will have to do the same.
You could write a parser that wraps their parser and reverses the bytes, if you don't want to modify their parser.
Be conscious of the types of data being read in. A 4-byte int or float would need endian correction. A 4-byte ASCII string would not.
In general, no.
If the read/write calls are not type aware (which, for example fread and fwrite are not) then they can't tell the difference between writing endian sensitive data and endian insensitive data.
Depending on how the parser is structured you may be able to avoid some suffering, if the I/O functions they use are aware of the types being read/written then you could modify those routines apply the correct endian conversions.
If you do have to modify all the read/write calls then creating just such a routine would be a sensible course of action.
Your question somehow conatins the answer: No!
I only can run the parser on a little endian machine - is there any way to read the file using their parser without add a swapbytes() call after each read?
If you read (and want to interpret) big endian data on a little endian machine, you must somehow and somewhere convert the data. You might do this after each read or after the whole file has been read (if the data read does not contain any information on how to read further data) - but there is no way in omitting the conversion.