function to compare two single byte elements - compare

I'm recording sounds and saving them into a byte[] buffer. Now I want to compare every single element of those two list, to know how different they are. Could anyone tell me wich is the best method to do that? I have read on here Array.equal(array1,array2), but I want to compare just two elements in each iteration and I want the position of the different bytes as well.

If you are storing the bytes in a byte[] array, why not do a classic a[i] == b[i] comparison? Or whatever operation you intend to perform on the two bytes.
If performance is an issue (say for very large byte arrays, which I can foresee for sound streams), there are plenty of parallel methods for speeding up the process.

Related

c++ read a multidimensional array as an integer only for comparison

UPDATE: Considerations: this is on an embedded platform, I am memory limited and not wanting to have to use any extra Libs to perfom things like hashing..
I am running a program that uses a multi dimensional array.
uint8_t array[8][32]
This array is constantly updating, I need to run a comparison to determine if in the last 3 steps, there is a repetition.
It occoured to me that the array is just memory, so rather than iterate through the array, and create a comparison temp array (memory and processor wasted) why not approach this almost like making a hash of the array. I dont need any unique crypto so why not just read the memory location of the whole array as an integer?
Thoughts? How would I go about addressing the array to read in an integer which I could store in another simple array for my much easier comparison
unsigned long arrayTemp[3]

Emulate memory-mapping of a game console, access different locations based on the address provided

I am implementing an emulator for an old game console, mostly for learning purposes.
This console maps roms, and a lot of other things, to regions within its address space. Certain locations are also mirrored so that multiple addresses can correspond to the same physical location. I would like to emulate this, but I am not sure what would be a good approach to do so (and also have no idea what this process is called, hence this somewhat generic question).
One thing that does work is a simple, unordered map. Have it contain absolute addresses and the corresponding pointers to my data structures. This way, I can easily map everything I need into the system's address space. The problem with this approach is that, it's obviously a memory hog. Even with small roms, I end up with close to ten million entries, thanks to the aforementioned mirroring. Surely, this can't be the rigth thing to do?
Any help is much appreciated.
Edit:
To provide some details as to how exactly I am doing this:
The system in question is, of course, the SNES. Using this wiki as my primary resource, I implemeted what I mentioned above as follows:
Create a std::unordered_map<uint32,uint8_t*> mMemoryMap;
Check whether the rom is LoRom or HiRom
For each byte in the rom
Calculate the address where it should be mapped and emplace both the address and a pointer to said byte in the map
If the section needs to be mirrored somewhere else, repeat the above
This will be applied to anything else I need to make available, such as video- or system-memory
If I now want to access anything within the address space, I can simply use the address the system would use internally.
I'm assuming that for contiguous addresses the physical locations are also contiguous, within certain blocks of memory or "chunks". That is, if the address 0x0000 maps to 0xFF00, then 0x0004 maps to 0xFF04.
If they work like that, then you can make a list that contains the information of those chunks. Say:
struct Chunk
{
int addressStart, memoryStart, size;
}
The chunks may be ordered by the addressStart, so you can find out the correct chunk you would need for any address. This requires you to iterate the list, but if you have only a few chunks this may be acceptable.
Rather than use simple maps (which even with ranges can grow to large sizes) you can instead use a more intelligent map.
For instance if the console maps 0x10XXXX through 0x1FXXXX all to the same 0x20XXXX you can design a structure which holds that repetition (start 0x100000 end 0x1FFFFF repeat 0x010000 although you may want to use a bitmask rather than repeat).
I'm currently in the same boat as you and doing an NES emulator for learning purposes. The complete memory map is declared as an array of pointers to bytes. Each pointer can point into another array, which allows me to have pointers to the same data in the case of mirroring.
byte *memoryMap[cpuMemSize];
I iterate over the addresses that repeat and map them to point to the bytes in the following array. The ram array is the memory that gets mapped 4 times across the CPU's memory map.
byte ram[ramSize];
The following code goes through the RAM array and maps it across the CPU memory map 4 times.
// map RAM 4 times to emulate mirroring
mem_address memAddr = 0;
for (int x = 0; x < 4; ++x)
{
for (mem_address ramAddr = 0; ramAddr < ramSize; ++ramAddr.value)
{
memoryMap[memAddr.value++] = &ram[ramAddr.value];
}
}
You can then write the a value to the 256th byte using something like this, which would of course be propagated to the other parts of the memory map since they point to the same byte in memory:
*memoryMap[0x00ff] = 10;
I haven't really tested this and want to do some more testing with regard to CPU cache use and performance. I was clearly out searching for other ways of doing this when I stumbled on your question and figured I'd put in my (unverified) two cents. Hope this makes sense.

Why is deque using so much more RAM than vector in C++?

I have a problem I am working on where I need to use some sort of 2 dimensional array. The array is fixed width (four columns), but I need to create extra rows on the fly.
To do this, I have been using vectors of vectors, and I have been using some nested loops that contain this:
array.push_back(vector<float>(4));
array[n][0] = a;
array[n][1] = b;
array[n][2] = c;
array[n][3] = d;
n++
to add the rows and their contents. The trouble is that I appear to be running out of memory with the number of elements I was trying to create, so I reduced the number that I was using. But then I started reading about deque, and thought it would allow me to use more memory because it doesn't have to be contiguous. I changed all mentions of "vector" to "deque", in this loop, as well as all declarations. But then it appeared that I ran out of memory again, this time with even with the reduced number of rows.
I looked at how much memory my code is using, and when I am using deque, the memory rises steadily to above 2GB, and the program closes soon after, even when using the smaller number of rows. I'm not sure exactly where in this loop it is when it runs out of memory.
When I use vectors, the memory usage (for the same number of rows) is still under 1GB, even when the loop exits. It then goes on to a similar loop where more rows are added, still only reaching about 1.4GB.
So my question is. Is this normal for deque to use more than twice the memory of vector, or am I making an erroneous assumption in thinking I can just replace the word "vector" with "deque" in the declarations/initializations and the above code?
Thanks in advance.
I'm using:
MS Visual C++ 2010 (32-bit)
Windows 7 (64-bit)
The real answer here has little to do with the core data structure. The answer is that MSVC's implementation of std::deque is especially awful and degenerates to an array of pointers to individual elements, rather than the array of arrays it should be. Frankly, only twice the memory use of vector is surprising. If you had a better implementation of deque you'd get better results.
It all depends on the internal implementation of deque (I won't speak about vector since it is relatively straightforward).
Fact is, deque has completely different guarantees than vector (the most important one being that it supports O(1) insertion at both ends while vector only supports O(1) insertion at the back). This in turn means the internal structures managed by deque have to be more complex than vector.
To allow that, a typical deque implementation will split its memory in several non-contiguous blocks. But each individual memory block has a fixed overhead to allow the memory management to work (eg. whatever the size of the block, the system may need another 16 or 32 bytes or whatever in addition, just for bookkeeping). Since, contrary to a vector, a deque requires many small, independent blocks, the overhead stacks up which can explain the difference you see. Also note that those individual memory blocks need to be managed (maybe in separate structures?), which probably means some (or a lot of) additional overhead too.
As for a way to solve your problem, you could try what #BasileStarynkevitch suggested in the comments, this will indeed reduce your memory usage but it will get you only so far because at some point you'll still run out of memory. And what if you try to run your program on a machine that only has 256MB RAM? Any other solution which goal is to reduce your memory footprint while still trying to keep all your data in memory will suffer from the same problems.
A proper solution when handling large datasets like yours would be to adapt your algorithms and data structures in order to be able to handle small partitions at a time of your whole dataset, and load/save those partitions as needed in order to make room for the other partitions. Unfortunately since it probably means disk access, it also means a big drop in performance but hey, you can't eat the cake and have it too.
Theory
There two common ways to efficiently implement a deque: either with a modified dynamic array or with a doubly linked list.
The modified dynamic array uses is basically a dynamic array that can grow from both ends, sometimes called array deques. These array deques have all the properties of a dynamic array, such as constant-time random access, good locality of reference, and inefficient insertion/removal in the middle, with the addition of amortized constant-time insertion/removal at both ends, instead of just one end.
There are several implementations of modified dynamic array:
Allocating deque contents from the center of the underlying array,
and resizing the underlying array when either end is reached. This
approach may require more frequent resizings and waste more space,
particularly when elements are only inserted at one end.
Storing deque contents in a circular buffer, and only resizing when
the buffer becomes full. This decreases the frequency of resizings.
Storing contents in multiple smaller arrays, allocating additional
arrays at the beginning or end as needed. Indexing is implemented by
keeping a dynamic array containing pointers to each of the smaller
arrays.
Conclusion
Different libraries may implement deques in different ways, but generally as a modified dynamic array. Most likely your standard library uses the approach #1 to implement std::deque, and since you append elements only from one end, you ultimately waste a lot of space. For that reason, it makes an illusion that std::deque takes up more space than usual std::vector.
Furthermore, if std::deque would be implemented as doubly-linked list, that would result in a waste of space too since each element would need to accommodate 2 pointers in addition to your custom data.
Implementation with approach #3 (modified dynamic array approach too) would again result in a waste of space to accommodate additional metadata such as pointers to all those small arrays.
In any case, std::deque is less efficient in terms of storage than plain old std::vector. Without knowing what do you want to achieve I cannot confidently suggest which data structure do you need. However, it seems like you don't even know what deques are for, therefore, what you really want in your situation is std::vector. Deques, in general, have different application.
Deque can have additional memory overhead over vector because it's made of a few blocks instead of contiguous one.
From en.cppreference.com/w/cpp/container/deque:
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.
The primary issue is running out of memory.
So, do you need all the data in memory at once?
You may never be able to accomplish this.
Partial Processing
You may want to consider processing the data into "chunks" or smaller sub-matrices. For example, using the standard rectangular grid:
Read data of first quadrant.
Process data of first quandrant.
Store results (in a file) of first quandrant.
Repeat for remaining quandrants.
Searching
If you are searching for a particle or a set of datum, you can do that without reading in the entire data set into memory.
Allocate a block (array) of memory.
Read a portion of the data into this block of memory.
Search the block of data.
Repeat steps 2 and 3 until the data is found.
Streaming Data
If your application is receiving the raw data from an input source (other than a file), you will want to store the data for later processing.
This will require more than one buffer and is more efficient using at least two threads of execution.
The Reading Thread will be reading data into a buffer until the buffer is full. When the buffer is full, it will read data into another empty one.
The Writing Thread will initially wait until either the first read buffer is full or the read operation is finished. Next, the Writing Thread takes data out of the read buffer and writes to a file. The Write Thread then starts writing from the next read buffer.
This technique is called Double Buffering or Multiple Buffering.
Sparse Data
If there is a lot of zero or unused data in the matrix, you should try using Sparse Matrices. Essentially, this is a list of structures that hold the data's coordinates and the value. This also works when most of the data is a common value other than zero. This saves a lot of memory space; but costs a little bit more execution time.
Data Compression
You could also change your algorithms to use data compression. The idea here is to store the data location, value and the number or contiguous equal values (a.k.a. runs). So instead of storing 100 consecutive data points of the same value, you would store the starting position (of the run), the value, and 100 as the quantity. This saves a lot of space, but requires more processing time when accessing the data.
Memory Mapped File
There are libraries that can treat a file as memory. Essentially, they read in a "page" of the file into memory. When the requests go out of the "page", they read in another page. All this is performed "behind the scenes". All you need to do is treat the file like memory.
Summary
Arrays and deques are not your primary issue, quantity of data is. Your primary issue can be resolved by processing small pieces of data at a time, compressing the data storage, or treating the data in the file as memory. If you are trying to process streaming data, don't. Ideally, streaming data should be placed into a file and then processed later.
A historical purpose of a file is to contain data that doesn't fit into memory.

MPI synchronize matrix of vectors

Excuse me if this question is common or trivial, I am not very familiar with MPI so bear with me.
I have a matrix of vectors. Each vector is empty or has a few items in it.
std::vector<someStruct*> partitions[matrix_size][matrix_size];
When I start the program each process will have the same data in this matrix, but as the code progresses each process might remove several items from some vectors and put them in other vectors.
So when I reach a barrier I somehow have to make sure each process has the latest version of this matrix. The big problem is that each process might manipulate any or all vectors.
How would I go about to make sure that every process has the correct updated matrix after the barrier?
EDIT:
I am sorry I was not clear. Each process may move one or more objects to another vector but only one process may move each object. In other words each process has a list of objects it may move, but the matrix may be altered by everyone. And two processes can't move the same object ever.
In that case you'll need to send messages using MPI_Bcast that inform the other processors about this and instruct them to do the same. Alternatively, if the ordering doesn't matter until you hit the barrier, you can only send the messages to the root process which performs the permutations and then after the barrier sends it to all the others using MPI_Bcast.
One more thing: vectors of pointers are usually quite a bad idea, as you'll need to manage the memory manually in there. If you can use C++11, use std::unique_ptr or std::shared_ptr instead (depending on what your semantics are), or use Boost which provides very similar facilities.
And lastly, representing a matrix as a fixed-size array of fixed-size arrays is readlly bad. First: the matrix size is fixed. Second: adjacent rows are not necessarily stored in contiguous memory, slowing your program down like crazy (it literally can be orders of magnitudes). Instead represent the matrix as a linear array of size Nrows*Ncols, and then index the elements as Nrows*i + j where Nrows is the number of rows and i and j are the row and column indices, respectively. If you don't want column-major storage instead, address the elements by i + Ncols*j. You can wrap this index-juggling in inline functions that have virtually zero overhead.
I would suggest to lay out the data differently:
Each process has a map of his objects and their position in the matrix. How that is implemented depends on how you identify objects. If all local objects are numbered, you could just use a vector<pair<int,int>>.
Treat that as the primary structure you manipulate and communicate that structure with MPI_Allgather (each process sends it data to all other processes, at the end everyone has all data). If you need fast lookup by coordinates, then you can build up a cache.
That may or may not be performing well. Other optimizations (like sharing 'transactions') totally depend on your objects and the operations you perform on them.

How to store bits to a huge char array for file input/output

I want to store lots of information to a block by bits, and save it into a file.
To keep my file not so big, I want to use a small number of bits to save specified information instead of a int.
For example, I want to store Day, Hour, Minute to a file.
I only want 5 bit(day) + 5 bit(Hour) + 6 bit(Minute) = 16 bit of memory for data storage.
I cannot find a efficient way to store it in a block to put in a file.
There are some big problems in my concern:
the data length I want to store each time is not constant. It depends on the incoming information. So I cannot use a structure to store it.
there must not be any unused bit in my block, I searched for some topics that mentioned that if I store 30 bits in an int(4 byte variable), then the next 3 bit I save will automatically go into the next int. but I do not want it to happen!!
I know I can use shift right, shift left to put a number to a char, and put the char into a block, but it is inefficient.
I want a char array that I can continue putting specified bits into, and use write to put it into a file.
I think I'd just use the number of bits necessary to store the largest value you might ever need for any given piece of information. Then, Huffman encode the data as you write it (and obviously Huffman decode it as you read it). Most other approaches are likely to be less efficient, and many are likely to be more complex as well.
I haven't seen such a library. So I'm afraid you'll have to write one yourself. It won't be difficult, anyway.
And about the efficiency. This kind of operations always need bits shifting and masking, because few CPUs support directly operating into bits, especially between two machine words. The only difference is you or your compiler does the translation.