APR and large files - c++

I want to use APR to mmap a really large file, greater than 4 GB. First I need to create a file that big, but I found that the function apr_file_seek accepts a parameter of type apr_seek_where_t, which is just an alias for int. So it seems possible to seek within only the first 4 GiB. Is it possible to handle large files with APR?

You can seek multiple times with APR_CUR.
Also note that an int on a 32-bit system allows you to seek two gibibytes forward, not four.
Also note that on a 32-bit system the mmap will most probably fail to map more than two to three gibibytes. (When the address space is limited to 32 bits, the maximum address space is four gibibytes, but the operating system has to reserve some of that address space for itself.)
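A minimal sketch of the multiple-seek idea, assuming APR was built with large-file support (so apr_off_t is 64-bit) and with error handling omitted for brevity:

```cpp
// Sketch: growing a file past 2 GiB by seeking forward in 1 GiB steps
// relative to the current position (APR_CUR), then writing one byte.
#include <apr_general.h>
#include <apr_file_io.h>

int main() {
    apr_initialize();
    apr_pool_t *pool;
    apr_pool_create(&pool, NULL);

    apr_file_t *file;
    apr_file_open(&file, "huge.bin",
                  APR_FOPEN_CREATE | APR_FOPEN_WRITE,
                  APR_FPROT_OS_DEFAULT, pool);

    for (int i = 0; i < 6; ++i) {            // 6 x 1 GiB = 6 GiB
        apr_off_t step = 1024L * 1024L * 1024L;
        apr_file_seek(file, APR_CUR, &step); // relative seek, small offset
    }
    apr_file_putc('\0', file);               // force the file out to 6 GiB

    apr_file_close(file);
    apr_terminate();
    return 0;
}
```

Note that apr_file_seek writes the resulting absolute offset back through its pointer argument, which is why the step is re-initialized each iteration.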

Why is the value of an address in C always even?

Why is the value of an address in C and C++ always even?
For example: I declare a variable int x, and x has the memory address 0x6ffe1c (in hexadecimal). No matter what, that value is never an odd number; it is always even. Why is that so?
Computer memory is composed of bits. The bits are organized into groups. A computer may have a gigabyte of memory, which is over 1,000,000,000 bytes or 8,000,000,000 bits, but the physical connections to memory cannot simply get any one particular bit from that memory.
When the processor wants data from memory, it puts a signal on a bus that asks for a particular word of memory. A bus is largely a set of wires that connects different parts of the computer. A word of memory is a group of bits of some size particular to that hardware, perhaps 32 bits. When the memory device sees a request for a word, it gets those bits and puts them on the bus, all at once. (The bus for that will have 32 or more wires, so it can carry all the data for one word at one time.)
Let’s continue with the example of 32-bit words. Since memory is grouped into words of 32 bits, each word starts at a memory address that is a multiple of four bytes. Every address that is a multiple of four (0, 4, 8, 12, 16, … 4096, 4100, 4104, …) is the address of a word. The processor always reads or writes memory in units of words—that is the only interaction the hardware can do; the processor cannot read individual bytes from memory. If your int is in a single word, then the processor can get it from memory by asking for that word.
On the other hand, suppose your int starts at address 99. Then one byte of it is in the word that starts at address 96 (addresses 96 to 99), and three bytes of it are in the word that starts at address 100 (addresses 100 to 103). In order to get your int, the processor has to read two words and then stitch together bytes from them to make one int.
First, that is a waste of time: doing two reads from memory takes longer than doing one. Second, if the processor has to have extra wires and circuits for doing that, it makes the processor more expensive and more power-hungry, and it takes resources away from other things the processor could be doing, like adding or multiplying.
So processors are designed to prefer aligned data. They may have components for handling unaligned data, but using those components may take extra time or resources. So compilers are designed to align objects in ways that are preferable for the target architecture.
Why is the value of an address in C and C++ always even?
This is not true in general. For example, if you have an array char[2], you'll find that exactly one of those elements has an even address, and the other must have an odd address.
I declare a variable int x and x has a memory address 0x6ffe1c
Every type in C and C++ has some alignment requirement. That is, objects of the type must be stored at an address that is divisible by that alignment. An alignment requirement is an integer that is always a power of two.
There is exactly one power of two that is not even: 2^0 == 1. Objects with an alignment requirement of 1 can be stored at odd addresses. char always has a size and alignment of 1 byte; int typically has a higher alignment requirement, in which case it will be stored at an even address.
The reason why alignment is important is that there are CPU instruction sets which only allow reading and writing memory addresses that are aligned to the width of the CPU word. Other CPU instruction sets may support operations on addresses aligned to some fraction of the word size. Further, some CPU instruction sets, such as x86, support operating on entirely unaligned addresses, but (at least on older models) such operations may be much slower.
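A quick way to see these alignments on your own machine (a minimal sketch; the exact addresses will vary between runs and platforms):

```cpp
// Sketch: printing alignment requirements and actual addresses.
#include <cstdio>

int main() {
    char c[2];   // alignment 1: one of the two elements gets an odd address
    int  x = 0;  // alignment typically 4: address is a multiple of 4

    std::printf("alignof(char) = %zu, alignof(int) = %zu\n",
                alignof(char), alignof(int));
    std::printf("&c[0] = %p, &c[1] = %p\n",
                static_cast<void*>(&c[0]), static_cast<void*>(&c[1]));
    std::printf("&x    = %p\n", static_cast<void*>(&x));
    return 0;
}
```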

Cache mapping techniques

I'm trying to understand hardware caches. I have a slight idea, but I would like to ask on here whether my understanding is correct or not.
I understand that there are 3 types of cache mapping: direct, fully associative, and set associative.
I would like to know: is the type of mapping implemented with logic gates in hardware, specific to some particular computer system, and in order to change the mapping, would one be required to change the electrical connections?
My current understanding is that in RAM there exists a memory address to refer to each block of memory. A block contains words, and each word contains a number of bytes. We can represent the number of options with a number of bits.
So for example, with 4096 memory locations, each containing 16 bytes, if we want to refer to each byte then 2^12 * 2^4 = 2^16, so a 16-bit memory address would be required to refer to each byte.
The cache also has a memory address, a valid bit, a tag, and some data capable of storing a block of main memory of n words and thus m bytes, where m = n * i (with i bytes per word).
For example, with direct mapping, a block of main memory can only be at one particular location in the cache. When the CPU requests some data using a 16-bit memory address of RAM, it checks the cache first.
How does it know that this particular 16-bit memory address can only be in a few places?
My thoughts are that there could be some electrical connection between every RAM address and a cache address. The 16-bit address could then be split into parts; for example, only compare the left 8 bits with every cache memory address, then if they match compare the byte bits, then the tag bits, then the valid bit.
Is my understanding correct? Thank you!
I'd really appreciate it if someone read this long post.
You may want to read section 3.3.1 Associativity in What Every Programmer Should Know About Memory by Ulrich Drepper:
https://people.freebsd.org/~lstewart/articles/cpumemory.pdf#subsubsection.3.3.1
The title is a little bit catchy, but it explains everything you ask in detail.
In short:
The problem with caches is the number of comparisons. If your cache holds 100 blocks, you need to perform 100 comparisons in one cycle. You can reduce this number with the introduction of sets: if a specific memory region can only be placed in slots 1-10, you reduce the number of comparisons to 10.
The sets are addressed by an additional bit field inside the memory address called the index.
So for instance your 16 bits (from your example) could be split into:
[15:6] block address; stored in the `cache` as the `tag` to identify the block
[5:4] index bits; 2 bits --> 4 sets
[3:0] block offset; byte position inside the block
So the choice of method depends on the availability of hardware resources and the access time you want to achieve. It's pretty much hardwired, since you want to reduce the comparison logic.
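A small sketch of that address split in code, using the assumed 10/2/4 bit layout from above (the field widths are illustrative, not fixed by any real cache):

```cpp
// Sketch: splitting a 16-bit address into tag/index/offset for a
// set-associative cache with 4 sets and 16-byte blocks.
#include <cstdint>
#include <cstdio>

int main() {
    uint16_t addr = 0xABCD;

    uint16_t offset = addr & 0xF;         // bits [3:0]: byte within block
    uint16_t index  = (addr >> 4) & 0x3;  // bits [5:4]: which set
    uint16_t tag    = addr >> 6;          // bits [15:6]: compared with stored tags

    std::printf("addr=0x%04X tag=0x%03X index=%u offset=%u\n",
                addr, tag, index, offset);
    return 0;
}
```

The hardware does the same thing with wiring rather than shifts: the index bits select a set directly, and only the tags within that set are compared.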
There are a few mapping functions used to map cache lines to main memory:
Direct Mapping
Associative Mapping
Set-Associative Mapping
You should have at least a basic idea of these three mapping functions.

Casting size_t to allow more elements in a std::vector

I need to store a huge number of elements in a std::vector (more than the 2^32-1 allowed by unsigned int) in 32 bits. As far as I know, this quantity is limited by the std::size_t unsigned integer type. May I change this std::size_t by casting to an unsigned long? Would that resolve the problem?
If that's not possible, suppose I compile in 64 bits. Would that solve the problem without any modification?
size_t is a type that can hold the size of any allocatable chunk of memory. It follows that you can't allocate more memory than what fits in your size_t, and thus can't store more elements in any way.
Compiling in 64 bits will allow it, but realize that the array still needs to fit in memory. 2^32 is about 4 billion, so you are going to go over 4 * sizeof(element) GiB of memory. More than 8 GiB of RAM is still rare, so that does not look reasonable.
I suggest replacing the vector with the one from STXXL. It uses external storage, so your vector is not limited by the amount of RAM. The library claims to handle terabytes of data easily.
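A hedged sketch of what that looks like, based on STXXL's documented VECTOR_GENERATOR interface (block and page parameters left at their defaults):

```cpp
// Sketch: an external-memory vector via STXXL; elements spill to disk
// instead of being limited by RAM or by a 32-bit element count.
#include <stxxl/vector>

int main() {
    typedef stxxl::VECTOR_GENERATOR<int>::result vector_type;
    vector_type v;

    for (unsigned long long i = 0; i < 5000000000ULL; ++i)  // > 2^32 elements
        v.push_back(static_cast<int>(i));

    return 0;
}
```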
(edit) Pedantic note: size_t needs to hold the size of the maximal single object, not necessarily the size of all available memory. In segmented memory models it only needs to accommodate the offset when each object has to live in a single segment, but with different segments more memory may be accessible. It is even possible to use this on x86 with PAE, the "long" memory model. However, I've not seen anybody actually use it.
There are a number of things to say.
First, about the size of std::size_t on 32-bit systems and 64-bit systems, respectively. This is what the standard says about std::size_t (§18.2/6,7):
6 The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
7 [ Note: It is recommended that implementations choose types for ptrdiff_t and size_t whose integer conversion ranks (4.13) are no greater than that of signed long int unless a larger size is necessary to contain all the possible values. — end note ]
From this it follows that std::size_t will be at least 32 bits in size on a 32-bit system, and at least 64 bits on a 64-bit system. It could be larger, but that would obviously not make any sense.
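For what it's worth, this is trivial to confirm for a given target:

```cpp
// Sketch: checking the width of std::size_t on the current target.
#include <cstdio>
#include <cstddef>

int main() {
    std::printf("sizeof(std::size_t) = %zu bytes\n", sizeof(std::size_t));
    // typically prints 4 on a 32-bit target and 8 on a 64-bit target
    return 0;
}
```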
Second, about the idea of type casting: For this to work, even in theory, you would have to cast (or rather: redefine) the type inside the implementation of std::vector itself, wherever it occurs.
Third, when you say you need this super-large vector "in 32 bits", does that mean you want to use it on a 32-bit system? In that case, as the others have pointed out already, what you want is impossible, because a 32-bit system simply doesn't have that much memory.
But, fourth, if what you want is to run your program on a 64-bit machine, and use only a 32-bit data type to refer to the number of elements, but possibly a 64-bit type to refer to the total size in bytes, then std::size_t is not the obstacle, because it is used to refer to the total number of elements and the index of individual elements, not the size in bytes.
Finally, if you are on a 64-bit system and want to use something of extreme proportions that works like a std::vector, that is certainly possible. Systems with 32 GB, 64 GB, or even 1 TB of main memory are perhaps not extremely common, but definitely available.
However, to implement such a data type, it is generally not a good idea to simply allocate gigabytes of memory in one contiguous block (which is what a std::vector does), because of reasons like the following:
Unless the total size of the vector is determined once and for all at initialization time, the vector will be resized, and quite likely re-allocated, possibly many times as you add elements. Re-allocating an extremely large vector can be a time-consuming operation. [ I have added this item as an edit to my original answer. ]
The OS will have difficulties providing such a large portion of unfragmented memory, as other processes running in parallel require memory, too. [Edit: As correctly pointed out in the comments, this isn't really an issue on any standard OS in use today.]
On very large servers you also have tens of CPUs and typically NUMA-type memory architectures, where it is clearly preferable to work with relatively smaller chunks of memory, and have multiple threads (possibly each running on a different core) access various chunks of the vector in parallel.
Conclusions
A) If you are on a 32-bit system and want to use a vector that large, using disk-based methods such as the one suggested by @JanHudec is the only thing that is feasible.
B) If you have access to a large 64-bit system with tens or hundreds of GB, you should look into an implementation that divides the entire memory area into chunks. Essentially something that works like a std::vector<std::vector<T>>, where each nested vector represents one chunk. If all chunks are full, you append a new chunk, etc. It is straightforward to implement an iterator type for this, too. Of course, if you want to optimize this further to take advantage of multi-threading and NUMA features, it will get increasingly complex, but that is unavoidable.
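A minimal sketch of that chunked design (the name and chunk size are illustrative; no iterators or exception safety):

```cpp
// Sketch: a "big vector" built from fixed-size chunks, so no single
// allocation ever has to be contiguous or be resized wholesale.
#include <cstddef>
#include <vector>

template <typename T, std::size_t ChunkSize = 1 << 20>
class chunked_vector {
    std::vector<std::vector<T>> chunks_;
    std::size_t size_ = 0;
public:
    void push_back(const T& value) {
        if (chunks_.empty() || chunks_.back().size() == ChunkSize) {
            chunks_.emplace_back();
            chunks_.back().reserve(ChunkSize); // a chunk never re-allocates
        }
        chunks_.back().push_back(value);
        ++size_;
    }
    T& operator[](std::size_t i) {
        return chunks_[i / ChunkSize][i % ChunkSize];
    }
    std::size_t size() const { return size_; }
};
```

Appending never moves existing elements, and indexing is still O(1) via one division and one modulo.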
A vector might be the wrong data structure for you. It requires storage in a single block of memory, which is limited by the size of size_t. You can increase this by compiling for 64-bit systems, but then you can't run on 32-bit systems, which might be a requirement.
If you don't need vector's particular characteristics (particularly O(1) lookup and a contiguous memory layout), another structure such as a std::list might suit you; it has no size limits except what the computer can physically handle, as it's a linked list instead of a conveniently wrapped array.

Optimal Struct size for modern systems

I've read that the ideal size of a structure for performance, when it's going to be used in a large collection, is 32 bytes. Is this true, and why? Does this affect 64-bit processors, or is it not applicable?
This is in context of modern (2008+) home Intel-based systems.
The ideal size of a struct is enough to hold the information it needs to contain.
The optimal size for a struct is usually the minimum size needed to store whatever data it's supposed to contain without requiring any hacks like bit twiddling/misaligned accesses to make it fit.
The ideal size of a structure is likely to be one cache line (or a sub-multiple thereof). Level one cache lines are typically 32 or 64 bytes. Splitting an element of a data structure across a cache line boundary will require two main memory accesses to read or write it instead of one.
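For illustration, one common trick along those lines (a sketch; 64 bytes is an assumed line size, and the right choice depends on the target):

```cpp
// Sketch: padding a struct to one cache line so that array elements
// never straddle a line boundary.
#include <cstdio>

struct alignas(64) Record {
    long   id;
    double value;
    int    flags;
    // the compiler pads the rest of the struct up to 64 bytes
};

int main() {
    std::printf("sizeof(Record) = %zu\n", sizeof(Record)); // 64
    Record table[4];
    std::printf("stride = %td\n",
                reinterpret_cast<char*>(&table[1]) -
                reinterpret_cast<char*>(&table[0]));        // also 64
    return 0;
}
```

The trade-off is wasted space: padding buys single-line accesses at the cost of fitting fewer elements per line of cache.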
I don't think there is a reasonable answer to your question. Without any information on the context of the application, the "ideal size of a structure" is way, way underspecified.
As an aside, 32 bits is the space of one modern integer -- it isn't large enough for a "struct" except for one of a couple of characters or bit-fields.

How to store bits to a huge char array for file input/output

I want to store lots of information in a block, bit by bit, and save it to a file.
To keep my file from getting too big, I want to use a small number of bits to save specified information instead of an int.
For example, I want to store Day, Hour, Minute in a file.
I only want 5 bits (day) + 5 bits (hour) + 6 bits (minute) = 16 bits of memory for data storage.
I cannot find an efficient way to store it in a block to put in a file.
There are some big problems concerning me:
The data length I want to store each time is not constant. It depends on the incoming information, so I cannot use a structure to store it.
There must not be any unused bits in my block. I found some topics mentioning that if I store 30 bits in an int (a 4-byte variable), then the next 3 bits I save will automatically go into the next int, but I do not want that to happen!
I know I can use shift right and shift left to put a number into a char, and put the char into a block, but it is inefficient.
I want a char array that I can keep putting specified bits into, and use write to put it into a file.
I think I'd just use the number of bits necessary to store the largest value you might ever need for any given piece of information. Then, Huffman encode the data as you write it (and obviously Huffman decode it as you read it). Most other approaches are likely to be less efficient, and many are likely to be more complex as well.
I haven't seen such a library, so I'm afraid you'll have to write one yourself. It won't be difficult, anyway.
And about the efficiency: this kind of operation always needs bit shifting and masking, because few CPUs support operating directly on bits, especially across two machine words. The only difference is whether you or your compiler does the translation.
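A minimal sketch of such a bit writer (the class name is illustrative, and MSB-first packing is an arbitrary choice):

```cpp
// Sketch: packing values of arbitrary bit widths into a byte buffer
// with no gaps, ready to be written out to a file in one call.
#include <cstdint>
#include <vector>

class BitWriter {
    std::vector<uint8_t> buf_;
    int bitsUsed_ = 0;  // bits already filled in the last byte
public:
    // Append the low `width` bits of `value`, MSB-first within the stream.
    void put(uint32_t value, int width) {
        for (int i = width - 1; i >= 0; --i) {
            if (bitsUsed_ == 0) buf_.push_back(0);
            uint8_t bit = (value >> i) & 1u;
            buf_.back() |= bit << (7 - bitsUsed_);
            bitsUsed_ = (bitsUsed_ + 1) % 8;
        }
    }
    const std::vector<uint8_t>& bytes() const { return buf_; }
};

// Usage: pack day(5) + hour(5) + minute(6) = 16 bits, as in the question.
int main() {
    BitWriter w;
    w.put(23, 5);   // day
    w.put(14, 5);   // hour
    w.put(59, 6);   // minute
    // w.bytes() now holds exactly 2 bytes; write them with fwrite or
    // std::ofstream::write.
    return 0;
}
```

The shifting and masking here is exactly what any such library would do internally; there is no way around it on byte-addressed hardware.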