lseek with 2 gb file in windows [duplicate] - c++

Possible Duplicate:
Question about file seeking position
I am facing a problem with lseek(). It returns failure when we try to access a 2GB+ file on Windows (32-bit machine). Is there any limit on how far lseek() can set the file pointer in the file we are using?
The offset value is 2154654555.
Compiler Details
c:\Program Files\Inno Setup 5\Compil32.exe

You should have a look at _lseeki64, which takes 64-bit offsets. lseek() (and its successor, _lseek()) are limited to signed 32-bit offsets, which have an upper limit of 2147483647. Your offset of 2154654555 exceeds that (and would be treated as a negative number if stored in a 32-bit long). See http://msdn.microsoft.com/en-us/library/1yee101t. (You'll need something comparable for your compiler.)

The maximum value of off_t is 2147483647, where off_t is the type used for the offset in lseek().

lseek() doesn't work with files larger than 2 GB because the offset parameter is a 32-bit value, which cannot hold anything greater than 2147483647. Many operating systems support larger files either through compile-time macros (such as _FILE_OFFSET_BITS=64 on POSIX systems) or by providing alternative functions.
With the MSVC compiler you can try _lseeki64, which takes a 64-bit offset. Since you are not using MSVC, check for an equivalent function in your toolchain.

Related

Adding a 1 bit flag in a 64 bit pointer

If you define a union for traversing a data structure either sequentially (start and end offsets) or via a pointer to a tree structure, depending on the number of data elements, on a 64-bit system where these unions are aligned with cache lines, is it possible to add a one-bit flag in one of those 64 bits in order to know which traversal must be used, while still being able to reconstruct the correct pointer?
union {
    uint32_t offsets[2];
    Tree<NodeData> * tree;
};
It's system-dependent, but I don't think any 64-bit system really uses its full pointer length yet.
Also, if you know your data is 2^n-aligned, chances are those n low bits are just sitting idle there. (On some old systems they would not exist at all, but I don't think any of those were 64-bit systems, and anyway they are no longer of interest.)
As an example, x86_64 uses 48 bits; the upper 16 must be the same as bit 47 (sign-extended).
Another example: ARM64 uses 49 bits (two mappings of 48 bits at the same time), so there you only have 15 bits left.
Just remember to correct the pilfered bits before dereferencing. (You might want to use uintptr_t instead of a pointer, and convert back after the correction.)
Using a misaligned or impossible pointer causes behavior ranging from silent auto-correction, through silent misbehavior, to loud crashes.

size of char being written to file as a binary value in C++

What I understood about the char type from a few questions asked here is that it is always 1 byte in C++, but the number of bits per byte can vary from system to system.
The sizeof operator uses char as its unit, so sizeof(char) is always 1 in C++ bytes (where a byte has the number of bits of the smallest addressable unit on the local machine). When using the file functions of fstream in binary mode, we read and write directly from/to the address of a variable in RAM, so the smallest unit of data written to the file should be the size of the value read from RAM, and vice versa for a read from the file. Can we then say that data may not be written 8 bits at a time if something like this is tried:
ofstream file;
file.open("blabla.bin", ios::out | ios::binary);
char a[] = "asdfghjkkll";
file.seekp(0);
file.write((char*)a, sizeof(a) - 1);  // -1: don't write the trailing '\0'
file.close();
Unless char is always the de facto standard 8 bits, what happens if a heap of data is written to a file on a 16-bit machine and read on a 32-bit machine? Or should I use the OS-dependent text mode? If not, what have I misunderstood?
Edit: I have corrected my mistake.
Thanks for the warning.
Edit 2: My system is 64-bit, but I get 8 as the number of bits of the char type. What is wrong? Is the way I obtained the result of 8 flawed?
I got all zeros by shifting a char variable by more than its possible size with the bitwise operators. After guaranteeing that all bits of the variable were zero, I got all ones by inverting it, and then shifted until it became zero again. If we shift it by its size in bits, we get zero, so the loop index below gives the number of bits:
unsigned char zero = 0;     // start from a known all-zero value
unsigned char test = ~zero; // all bits set: 111...
int i;
for (i = 0; test != zero; i++)
    test = test << 1;       // shift one bit out per iteration
The value of i after the loop is the number of bits in the char type. According to this, the result is 8.
My last question is:
Are the filesystem byte and the char type different data types, because the way the computer addresses positions in a file stream differs from the standard char type, which is at least 8 bits?
So, what exactly is going on in the background?
Edit 3: Why these downvotes? What is my mistake? Isn't the question clear enough? Maybe my question is stupid, but why is there no response related to it?
A language standard can't really specify what the filesystem does - it can only specify how the language interacts with it. The C and C++ standards also don't address anything to do with interoperability or communication between different implementations. In other words, there isn't a general answer to this question except to say that:
the VAST majority of systems use 8-bit bytes;
the C and C++ standards require that char is at least 8 bits;
it is very likely that greater-than-8-bit systems have mechanisms in place to somehow utilize (or at least transcode) 8-bit files.

Doing serialization in c++

I was reading this page about serialization in c++.
http://www.parashift.com/c++-faq-lite/serialize-binary-format.html
Third bullet got me confused (the one that starts with: "If the binary data might get read by a different computer than the one that wrote it, be very careful about endian issues (little-endian vs. big-endian) and sizeof issues") which also mentions: "header file that contains machine dependencies (I usually call it machine.h)".
What are these endianness and sizeof issues? (The sizeof issue presumably being that an int can be 4 bytes on one machine and fewer bytes on another, right?)
What would that machine.h file look like?
Is there some tutorial on the internet that explains all these things in an understandable way?
Sometimes in some source codes I also encounter typedefs like:
typedef unsigned long long u64_t;
is it related somehow to that machine.h file?
sizeof: on one architecture long is 64 bits, on another 32 bits.
endianness: let's assume a 4-byte long. The 4 bytes can be placed in different orders in memory: on Intel the least significant byte is at the lowest address, on Motorola or SPARC the order is the opposite, but there can be processors with a 2301 order too.

Sizeof() difference between C++ on PC and Arduino [duplicate]

Possible Duplicate:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
In the following code, the value of structSize is different depending on whether it's executed on an Arduino vs my PC (Ubuntu 11.04 x64).
struct testStruct{
    uint8_t val1;
    uint16_t val2;
};
...
uint8_t structSize = sizeof(testStruct);
On my PC, the value of structSize is 4, and on my Arduino the value of structSize is 3 (as expected).
Where is this 4th byte coming from?
Actually, I would have expected the size to be 4, because uint16_t is usually aligned to 16 bits.
The extra byte is padding inserted between the members to keep the alignment of uint16_t.
This is compiler- and target-dependent, though. Arduino is more frugal with memory, and its CPU probably doesn't care that much about alignment, so no padding is inserted. (possible explanation)
It's because of differing ABIs between the two CPU types you're targeting. The ABI on the Arduino's CPU (an 8-bit AVR on classic boards) differs from that of x86_64.
On x86 at least, uint16_t (short) is generally aligned to a two-byte boundary. In order to achieve that, a byte of padding is inserted after val1. I expect the same is true on x86_64.
There's lots of information about this in the Wikipedia article on x86 structure padding.
You may be able to achieve what you want using the #pragma pack directive … but here be dragons, don't tell anyone I suggested it :)
If you are designing a database engine to run on a mobile processor then carry on, but for most anything else you are writing, your time will be better spent on building functionality using type systems that are easy to understand and relatively standard across architectures.

Writing binary data in c++

I am in the process of building an assembler for a rather unusual machine that a few other people and I are building. This machine takes 18-bit instructions, and I am writing the assembler in C++.
I have collected all of the instructions into a vector of 32-bit unsigned integers, none of which is larger than what can be represented with an 18-bit unsigned number.
However, there does not appear to be any way (as far as I can tell) to output such an unusual number of bits to a binary file in C++. Can anyone help me with this?
(I would also be willing to use C's stdio and FILE structures. However, there still does not appear to be any way to output such an arbitrary number of bits.)
Thank you for your help.
Edit: It looks like I didn't specify how the instructions will be stored in memory well enough.
Instructions are contiguous in memory. Say the instructions start at bit 0 in memory:
The first instruction will be at bit 0, the second instruction at bit 18, the third at bit 36, and so on.
There are no gaps and no padding between instructions. There can be a few superfluous 0s at the end of the program if needed.
The machine uses big-endian instructions, so an instruction with value 3 should map to: 000000000000000011
Keep an eight-bit accumulator.
Shift bits from the current instruction into the accumulator until either:
The accumulator is full; or
No bits remain of the current instruction.
Whenever the accumulator is full:
Write its contents to the file and clear it.
Whenever no bits remain of the current instruction:
Move to the next instruction.
When no instructions remain:
Shift zeros into the accumulator until it is full.
Write its contents.
End.
For n instructions, this will leave (8 - 18n mod 8) zero bits after the last instruction.
There are a lot of ways you can achieve the same end result (I am assuming the end result is a tight packing of these 18 bits).
A simple method would be to create a bit-packer class that accepts the 32-bit words, and generates a buffer that packs the 18-bit words from each entry. The class would need to do some bit shifting, but I don't expect it to be particularly difficult. The last byte can have a few zero bits at the end if the original vector length is not a multiple of 4. Once you give all your words to this class, you can get a packed data buffer, and write it to a file.
You could maybe represent your data in a std::bitset and then write the bitset to a file.
That wouldn't work with fstream's write function directly, but there is a way, described here...
The short answer: Your C++ program should output the 18-bit values in the format expected by your unusual machine.
We need more information, specifically the format that your "unusual machine" expects, or more precisely, the format that your assembler should be outputting. Once you understand what the format of the output you're generating is, the answer should be straightforward.
One possible format — I'm making things up here — is that we could take two of your 18-bit instructions:
instruction 1 instruction 2 ...
MSB LSB MSB LSB ...
bits → ABCDEFGHIJKLMNOPQR abcdefghijklmnopqr ...
...and write them in an 8-bits/byte file thus:
KLMNOPQR CDEFGHIJ 000000AB klmnopqr cdefghij 000000ab ...
...this is basically arranging the values in "little-endian" form, with 6 zero bits padding the 18-bit values out to 24 bits.
But I'm assuming: the padding, the little-endianness, the number of bits / byte, etc. Without more information, it's hard to say if this answer is even remotely near correct, or if it is exactly what you want.
Another possibility is a tight packing:
ABCDEFGH IJKLMNOP QRabcdef ghijklmn opqr0000
or
ABCDEFGH IJKLMNOP abcdefQR ghijklmn 0000opqr
...but I've made assumptions about where the corner cases go here.
Just output them to the file as 32-bit unsigned integers, exactly as you have them in memory, with the endianness that you prefer.
Then, when the loader / EEPROM writer / JTAG or whatever method you use to send the code to the machine reads each 32-bit word, just omit the 14 most significant bits and send the real 18 bits to the target.
Unless, of course, you have written a FAT driver for your machine...