c++ array sorting with some specifications - c++

I'm using C++. Using sort from STL is allowed.
I have an array of int, like this :
1 4 1 5 145 345 14 4
The numbers are stored in a char* buffer (I read them from a binary file, 4 bytes per number).
I want to do two things with this array :
swap each number with the one after that
4 1 5 1 345 145 4 14
sort it by group of 2
4 1 4 14 5 1 345 145
I could code it step by step, but it wouldn't be efficient. What I'm looking for is speed. O(n log n) would be great.
Also, this array can be bigger than 500MB, so memory usage is an issue.
My first idea was to sort the array starting from the end (to swap the numbers 2 by 2) and treating it as a long* (to force the sorting to take 2 int each time). But I couldn't manage to code it, and I'm not even sure it would work.
I hope I was clear enough, thanks for your help : )

This is the most memory efficient layout I could come up with. Obviously the vector I'm using would be replaced by the data blob you're using, assuming endian-ness is all handled well enough. The premise of the code below is simple.
Generate 1024 random values in pairs, each pair consisting of the first number between 1 and 500, the second number between 1 and 50.
Iterate the entire list, flipping all even-index values with their following odd-index brethren.
Send the entire thing to std::qsort with an item width of two (2) int32_t values and a count of half the original vector.
The comparator function simply sorts on the immediate value first, and on the second value if the first is equal.
The sample below does this for 1024 items. I've tested it without output for 134217728 items (exactly 536870912 bytes) and the results were pretty impressive for a measly macbook air laptop, about 15 seconds, only about 10 of that on the actual sort. What is ideally most important is no additional memory allocation is required beyond the data vector. Yes, to the purists, I do use call-stack space, but only because q-sort does.
I hope you get something out of it.
Note: I only show the first part of the output, but I hope it shows what you're looking for.
#include <iostream>
#include <fstream>
#include <algorithm>
#include <iterator>
#include <vector>
#include <cstdint>
#include <cstdlib>
#include <ctime>
// an odd little random generator. successive calls alternate between
// rand() modulo the first template parameter and rand() modulo the
// second, starting with the first.
template<int N,int M>
struct randN
{
    int i = 0;
    int32_t operator ()()
    {
        i = (i+1)%2;
        return (i ? rand() % N : rand() % M) + 1;
    }
};
// compare two pairs of integer values by address:
// first value first, second value as the tie-breaker.
int pair_cmp(const void* arg1, const void* arg2)
{
    const int32_t *left = (const int32_t*)arg1;
    const int32_t *right = (const int32_t *)arg2;
    // compare explicitly instead of subtracting, so large values cannot overflow
    if (left[0] != right[0])
        return (left[0] < right[0]) ? -1 : 1;
    if (left[1] != right[1])
        return (left[1] < right[1]) ? -1 : 1;
    return 0;
}
int main(int argc, char *argv[])
{
    // number of int values (must be even)
    static const size_t N = 1024;
    // seed rand()
    srand((unsigned)time(0));
    // fill with random pairs: first value in 1..500, second in 1..50
    std::vector<int32_t> data;
    data.reserve(N);
    std::generate_n(std::back_inserter(data), N, randN<500,50>());
    // flip the two values of every pair
    for (size_t i=0;i+1<data.size();i+=2)
        std::swap(data[i], data[i+1]);
    // now sort in pairs. using qsort only because it lends itself
    // *very* nicely to performing block-based sorting.
    std::qsort(&data[0], data.size()/2, sizeof(data[0])*2, pair_cmp);
    std::cout << "After sorting..." << std::endl;
    std::copy(data.begin(), data.end(), std::ostream_iterator<int32_t>(std::cout,"\n"));
    std::cout << std::endl << std::endl;
    return EXIT_SUCCESS;
}
Output
After sorting...
1
69
1
83
1
198
1
343
1
367
2
12
2
30
2
135
2
169
2
185
2
284
2
323
2
325
2
347
2
367
2
373
2
382
2
422
2
492
3
286
3
321
3
364
3
377
3
400
3
418
3
441
4
24
4
97
4
153
4
210
4
224
4
250
4
354
4
356
4
386
4
430
5
14
5
26
5
95
5
145
5
302
5
379
5
435
5
436
5
499
6
67
6
104
6
135
6
164
6
179
6
310
6
321
6
399
6
409
6
425
6
467
6
496
7
18
7
65
7
71
7
84
7
116
7
201
7
242
7
251
7
256
7
324
7
325
7
485
8
52
8
93
8
156
8
193
8
285
8
307
8
410
8
456
8
471
9
27
9
116
9
137
9
143
9
190
9
190
9
293
9
419
9
453

With some additional constraints on both your input and your platform, you can probably use an approach like the one you are thinking of. These constraints would include
Your input contains only positive numbers (i.e. can be treated as unsigned)
Your platform provides uint8_t and uint64_t in <cstdint>
You address a single platform with known endianness.
In that case you can divide your input into groups of 8 bytes, do some byte shuffling to arrange each group as one uint64_t with the "first" number from the input in the lower-valued half, and run std::sort on the resulting array. Depending on endianness you may need to do more byte shuffling to rearrange each sorted 8-byte group as a pair of uint32_t in the expected order.
If you can't code this on your own, I'd strongly advise you not to take this approach.
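For illustration, the packing idea above could be sketched as follows. This is only a sketch under the stated constraints (non-negative values that fit in uint32_t); the function name swap_and_sort_packed is hypothetical. It uses shifts rather than byte shuffling, so it does not depend on endianness, but it copies the keys into a second vector for clarity, which a memory-constrained version would have to avoid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: swaps the two values of every pair and sorts the
// pairs by packing each pair into one uint64_t key.
// Assumption: all values are non-negative and stored as uint32_t.
void swap_and_sort_packed(std::vector<std::uint32_t>& data) {
    assert(data.size() % 2 == 0);  // need whole pairs
    std::vector<std::uint64_t> keys(data.size() / 2);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const std::uint64_t first = data[2 * i];
        const std::uint64_t second = data[2 * i + 1];
        // After the swap the second value leads, so it goes into the
        // high half and the first value into the low half.
        keys[i] = (second << 32) | first;
    }
    std::sort(keys.begin(), keys.end());
    // Unpack: high half first, low half second.
    for (std::size_t i = 0; i < keys.size(); ++i) {
        data[2 * i] = static_cast<std::uint32_t>(keys[i] >> 32);
        data[2 * i + 1] = static_cast<std::uint32_t>(keys[i] & 0xFFFFFFFFu);
    }
}
```

Applied to the question's sample 1 4 1 5 145 345 14 4, this produces 4 1 4 14 5 1 345 145.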
A better and more portable approach (you have some inherent non-portability by starting from a not clearly specified binary file format), would be:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>
std::vector<int> swap_and_sort_int_pairs(const unsigned char buffer[], size_t buflen) {
    const size_t intsz = sizeof(int);
    // We have to assume that the binary format in buffer is compatible with our int representation
    // we also require an even number of integers
    assert(buflen % (2*intsz) == 0);
    // load pairwise
    std::vector< std::pair<int,int> > pairs;
    pairs.reserve(buflen/(2*intsz));
    for (const unsigned char* bufp=buffer; bufp<buffer+buflen; bufp+= 2*intsz) {
        // It would be better to have a more portable binary -> int conversion;
        // memcpy at least avoids alignment and aliasing problems
        int first_value, second_value;
        std::memcpy(&first_value, bufp, intsz);
        std::memcpy(&second_value, bufp + intsz, intsz);
        // swap each pair here
        pairs.emplace_back( second_value, first_value );
    }
    // less<pair<..>> does lexicographical ordering, which is what you are looking for
    std::sort(pairs.begin(), pairs.end());
    // convert back to linear vector
    std::vector<int> result;
    result.reserve(2*pairs.size());
    for (auto& entry : pairs) {
        result.push_back(entry.first);
        result.push_back(entry.second);
    }
    return result;
}
Both the initial parse/swap pass (which you need anyway) and the final conversion are O(N), so the total complexity is still O(N log N).
If you can continue to work with pairs, you can save the final conversion. The other way to save that conversion would be to use a hand-coded sort with two-int strides and two-int swap: much more work - and possibly still hard to get as efficient as a well-tuned library sort.

Do one thing at a time. First, give your data some *struct*ure. It seems that each 8 bytes form a unit of the form
struct unit {
    int key;
    int value;
};
If the endianness is right, you can do this in O(1) with a reinterpret_cast. If it isn't, you'll have to live with an O(n) conversion effort. Both vanish compared to the O(n log n) sorting effort.
When you have an array of these units, you can use std::sort like:
bool compare_units(const unit& a, const unit& b) {
    return a.key < b.key;
}
// std::sort takes a begin/end iterator pair, not a pointer and a length
std::sort(array, array + length, compare_units);
The key to this solution is that you do the "swapping" and byte-interpretation first and then do the sorting.
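Putting these steps together, a minimal sketch might look like the following. It assumes the bytes are already in the machine's native byte order and that values are 32-bit; the helper name load_swapped_sorted is hypothetical, and the tie-break on value is an addition so that equal keys come out in a defined order. Note that it copies the data into a new vector, which doubles memory use for very large inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct unit {
    std::int32_t key;
    std::int32_t value;
};

// Hypothetical helper: interprets raw bytes as (a, b) pairs in native byte
// order, stores each pair swapped as {key = b, value = a}, then sorts.
std::vector<unit> load_swapped_sorted(const char* bytes, std::size_t len) {
    assert(len % (2 * sizeof(std::int32_t)) == 0);  // whole pairs only
    std::vector<unit> units(len / (2 * sizeof(std::int32_t)));
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::int32_t raw[2];
        // memcpy avoids alignment and aliasing problems with the char buffer
        std::memcpy(raw, bytes + i * sizeof(raw), sizeof(raw));
        units[i] = unit{raw[1], raw[0]};  // the swap happens here
    }
    // Sort by key, using value as a tie-breaker (an assumption, see above).
    std::sort(units.begin(), units.end(), [](const unit& a, const unit& b) {
        return a.key != b.key ? a.key < b.key : a.value < b.value;
    });
    return units;
}
```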

Related

How to extract data from Minecraft .mca files

I would like to generate a map for my own world, so I wrote a program in C++ to try to analyze data from the files.
According to this page, the region file begins with a 4KB header which gives the position of each chunk.
I wrote a program, but it outputs the wrong stuff.
This is my program
#include<bits/stdc++.h>
#include<fstream>
using namespace std;
char DataRaw[4100];
uint8_t DataUnsigned8[4100];
struct F4K{int pos,sz,dfn;} _chunk[40][40];
int main()
{
ifstream file("first4K.sample",ios::binary|ios::in|ios::ate);
ifstream::pos_type n=file.tellg();
freopen("offset.out","w",stdout);
file.seekg(0);
file.read((char*)(&DataRaw),n);
DataRaw[n]='\0';
file.close();
for (int i=0;i<n;i++)
DataUnsigned8[i+1]=uint8_t(DataRaw[i]);
for (int i=0;i<32;i++)
{
for (int j=0;j<32;j++)
{
int id=4*((i&31)+(j&31)*32);
_chunk[i][j].pos=(DataUnsigned8[id+1]<<16)|(DataUnsigned8[id+2]<<8)|(DataUnsigned8[id+3]);
_chunk[i][j].sz=DataUnsigned8[id+4];
}
}
cout<<" X Z offset sz\n-------------------"<<endl;
for (int i=0;i<32;i++)
{
for (int j=0;j<32;j++)
cout<<setw(2)<<i<<setw(3)<<j<<setw(9)<<_chunk[i][j].pos<<setw(4)<<_chunk[i][j].sz<<endl;
}
return 0;
}
And it outputs this
X Z offset sz
-------------------
0 0 2098751 32
0 1 2098252 2
0 2 139296 73
0 3 2113312 32
0 4 286978 32
0 5 2099058 3
0 6 2098275 2
0 7 139271 65
...
31 25 7602464 32
31 26 2556192 1
31 27 2105407 32
31 28 6546821 168
31 29 10927590 179
31 30 15109023 35
31 31 15315359 230
I expected the offsets to be sorted and to begin with 8192, but they are totally wrong! Some offsets (for example X:31, Z:31) are even bigger than the file size (which is only 8,048,640 bytes).
Can anyone tell me why?
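For comparison, here is a small sketch of decoding one header entry. It assumes the layout usually documented for the region format: each 4-byte entry holds a 3-byte big-endian offset counted in 4 KiB sectors, followed by a 1-byte sector count. It also assumes the input really is the raw binary header. The names ChunkLocation, read_location and byte_offset are hypothetical. Under these assumptions the stored offset is a sector index, so a chunk starting at byte 8192 would be stored as 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type for one entry of the location table.
struct ChunkLocation {
    std::uint32_t sector_offset;  // in 4 KiB sectors from the start of the file
    std::uint8_t sector_count;    // length in 4 KiB sectors
};

// header points at the first 4096 bytes of the region file; x and z are 0..31.
ChunkLocation read_location(const std::uint8_t* header, int x, int z) {
    assert(x >= 0 && x < 32 && z >= 0 && z < 32);
    const std::size_t id =
        4 * (static_cast<std::size_t>(x) + static_cast<std::size_t>(z) * 32);
    ChunkLocation loc;
    // Three offset bytes, most significant first (assumed big-endian).
    loc.sector_offset = (static_cast<std::uint32_t>(header[id]) << 16) |
                        (static_cast<std::uint32_t>(header[id + 1]) << 8) |
                        static_cast<std::uint32_t>(header[id + 2]);
    loc.sector_count = header[id + 3];
    return loc;
}

// Byte position of the chunk data, assuming 4096-byte sectors.
std::uint64_t byte_offset(const ChunkLocation& loc) {
    return static_cast<std::uint64_t>(loc.sector_offset) * 4096u;
}
```

Reading the bytes straight into an unsigned buffer with zero-based indices also avoids the shifted copy used in the program above.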

Why are the keys and values of unordered_map printed in a different order than those of map (dictionary)?

Here is my code. Please tell me why unordered_map does not print from the first key, whereas map prints in the correct order.
#include<bits/stdc++.h>
using namespace std;
int main(){
unordered_map<int,int>arr;
for(int i=1;i<=10;i++){
arr[i]=i*i;
}
for(auto it=arr.begin();it!=arr.end();it++){
cout<<it->first<<" "<<it->second<<"\n";
}
cout<<"normal map \n";
map<int,int>arry;
for(int i=1;i<=10;i++){
arry[i]=i*i;
}
for(auto it=arry.begin();it!=arry.end();it++){
cout<<it->first<<" "<<it->second<<"\n";
}
}
and my output is
10 100
9 81
8 64
7 49
6 36
5 25
1 1
2 4
3 9
4 16
normal map
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
Why does unordered_map print the values in this fashion? Why doesn't it print them like map?
std::unordered_map doesn't order keys in any specific order. This is why it is called unordered.
Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. This allows fast access to individual elements, since once the hash is computed, it refers to the exact bucket the element is placed into.
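If sorted output is needed while still using std::unordered_map for storage, one option is to copy the entries into a vector and sort that before printing. The sketch below shows this; the helper name sorted_entries is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: copies the entries of an unordered_map into a vector
// and sorts them by key (std::pair compares first, then second).
std::vector<std::pair<int, int>> sorted_entries(const std::unordered_map<int, int>& m) {
    std::vector<std::pair<int, int>> out(m.begin(), m.end());
    assert(out.size() == m.size());
    std::sort(out.begin(), out.end());
    return out;
}
```

Iterating over the returned vector then visits the keys in ascending order, like std::map does, at the cost of an extra copy and an O(n log n) sort.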

Does fstream access the hard drive with reads and writes during compile time if used within a class template?

Consider the following code snippet of this class template...
template<class T>
class FileTemplate {
private:
std::vector<T> vals_;
std::string filenameAndPath_;
public:
inline FileTemplate( const std::string& filenameAndPath, const T& multiplier ) :
filenameAndPath_( filenameAndPath ) {
std::fstream file;
if ( !filenameAndPath_.empty() ) {
file.open( filenameAndPath_ );
T val = 0;
while ( file >> val ) {
vals_.push_back( val );
}
file.close();
for ( unsigned i = 0; i < vals_.size(); i++ ) {
vals_[i] *= multiplier;
}
file.open( filenameAndPath_ );
for ( unsigned i = 0; i < vals_.size(); i++ ) {
file << vals_[i] << " ";
}
file.close();
}
}
inline std::vector<T> getValues() const {
return vals_;
}
};
When used in main as such with the lower section commented out with the following pre-populated text file:
values.txt
1 2 3 4 5 6 7 8 9
int main() {
std::string filenameAndPath( "_build/values.txt" );
std::fstream file;
FileTemplate<unsigned> ft( filenameAndPath, 5 );
std::vector<unsigned> results = ft.getValues();
for ( auto r : results ) {
std::cout << r << " ";
}
std::cout << std::endl;
/*
FileTemplate<float> ft2( filenameAndPath, 2.5f );
std::vector<float> results2 = ft2.getValues();
for ( auto r : results2 ) {
std::cout << r << " ";
}
std::cout << std::endl;
*/
std::cout << "\nPress any key and enter to quit." << std::endl;
char q;
std::cin >> q;
return 0;
}
When I run this code through the debugger, sure enough, both the output to the screen and the file are changed to
values.txt - overwritten are -
5 10 15 20 25 30 35 40 45
Then let's say I don't change any code, just stop debugging or running the application, and run it again 2 more times. The outputs are, respectively:
values.txt - iterations 2 & 3
25 50 75 100 125 150 175 200 225 250
125 250 375 500 625 750 875 1000 1125 1250
Okay, good so far. Now let's reset the values in the text file back to the default, uncomment the 2nd instantiation of this class template for float with a multiplier value of 2.5f, and then run this 3 times.
values.txt - reset to default
1 2 3 4 5 6 7 8 9
-iterations 1,2 & 3 with both unsigned & float the multipliers are <5,2.5> respectively. 5 for the unsigned and 2.5 for the float
- Iteration 1
cout:
5 10 15 20 25 30 35 40 45
12.5 25 37.5 50 62.5 75 87.5 100 112.5
values.txt:
12.5 25 37.5 50 62.5 75 87.5 100 112.5
- Iteration 2
cout:
60
150 12.5 62.5 93.75 125 156.25 187.5 218.75 250 281.25
values.txt:
150 12.5 62.5 93.75 125 156.25 187.5 218.75 250 281.25
- Iteration 3
cout:
750 60
1875 150 12.5 156.25 234.375 312.5 390.625 468.75 546.875 625 703.125
values.txt:
1875 150 12.5 156.25 234.375 312.5 390.625 468.75 546.875 625 703.125
A couple of questions come to mind: it is two fold regarding the same behavior of this program.
The first and primary question is: Are the file read and write calls being done at compile time considering this is a class template and the constructor is inline?
After running the debugger a couple of times; why is the output incrementing the number of values in the file? I started off with 9, but after an iteration or so there are 10, then 11.
This part just for fun if you want to answer:
The third and final question is, yes, opinion based, but merely for educational purposes, as I would like to see what the community thinks about this: What are the pros & cons of this type of programming? What are the potentials and the limits? Are there any practical real-world applications & production benefits to this?
As for the other issues: the main one is that you are not truncating the file when you do the second file.open call. You need:
file.open( filenameAndPath_, std::fstream::trunc|std::fstream::out );
What is happening is that, when you read an unsigned int from a file containing floating-point values, it reads the first number (e.g. 12.5) only up to the decimal point and then stops (e.g. reading only 12), because the text that follows (".5") does not look like an unsigned int. This means it reads only the number 12, multiplies it by 5 to get 60, and writes that to the file.
Unfortunately, because you don't truncate the file when writing the 60, the original text is left at the end and is interpreted as additional numbers in the next read loop. Hence, 12.5 appears in the file as 60 5
stream buffers
Extracts as many characters as possible from the stream and inserts them into the output sequence controlled by the stream buffer object pointed by sb (if any), until either the input sequence is exhausted or the function fails to insert into the object pointed by sb.
(http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/)
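The extraction behaviour described in this answer can be reproduced without touching a file. The sketch below runs the same `while ( in >> val )` loop over a string; the helper name read_all is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads values of type T with operator>> until an
// extraction fails, mirroring the read loop in the constructor.
template <class T>
std::vector<T> read_all(const std::string& text) {
    std::istringstream in(text);
    std::vector<T> vals;
    T val{};
    while (in >> val) {
        vals.push_back(val);
    }
    return vals;
}
```

With this helper, `read_all<unsigned>("12.5 25 37.5")` yields only 12, because the next extraction starts at ".5" and fails, while `read_all<float>` on the same text yields all three values. Reading "60 5 25 37.5 50" as float yields five values, which matches the extra number that shows up after the non-truncating write.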

Is there any way to add numbers to a "buffer" in a while-loop?

I'm pretty new to C++, and at the moment I am trying to make a calculator that runs Euclid's algorithm.
Anyway, what I need help with is how I can add the final number to some kind of array on each loop iteration.
Let's say, for example, I put in the numbers 1128 and 16. My program will then give this output:
1128 % 16 = 70 + 8
70 % 16 = 4 + 6
4 % 16 = 0 + 4
These three lines are printed one at a time, one per loop iteration. What I want is to add the last numbers (8, 6 and 4) to an array. How would I do this?
Use a vector instead of an array. Hope this helps!
#include <iostream>
#include<vector>
using namespace std;
int main()
{
int a=1128,b=16,i;
vector<int>arr;
while(a>b)
{
cout<<a/b<<" "<<a%b<<endl;
arr.push_back(a%b);
a/=b;
}
cout<<a/b<<" "<<a%b<<endl;
arr.push_back(a%b); // final step, once a <= b
for(i=0;i<arr.size();i++)
cout<<arr[i]<<" "; // Array i.e 8 6 4
return 0;
}
Output:
70 8
4 6
0 4
8 6 4 // Array

ant colony optimisation for 01 MKP

I'm trying to implement ACO for the 0-1 MKP. My input values are from the OR-Library file mknap1.txt. In my algorithm, I first choose an item randomly. Then I calculate the probabilities for all other items on the construction graph. The probability equation depends on the pheromone level and the heuristic information.
p[i] = (tau[i]*n[i]) / Σ_j (tau[j]*n[j]).
The cells of my pheromone matrix all have the same constant initial value (0.2). For this reason, when I try to find the next item to go to, the pheromone matrix has no effect, so my probability function determines the next item from the heuristic information alone. As you know, the heuristic information equation is
n[i]=profit[i]/Ravg.
(Ravg is the average of the resource constraints.) For this reason my probability function chooses the item with the biggest profit value. Let's say that at the first iteration my algorithm randomly selected an item with a profit of 600. Then at the second iteration it chooses the item with a profit of 2400. But in the OR-Library instance, the item with a profit of 2400 causes a resource violation. Whatever I do, the second item chosen is always the one with a profit of 2400.
Is there anything wrong with my algorithm? I hope people who know something about ACO can help me. Thanks in advance.
Input values:
6 10 3800//no of items (n) / no of resources (m) // the optimal value
100 600 1200 2400 500 2000//profits of items (6)
8 12 13 64 22 41//resource constraints matrix (m*n)
8 12 13 75 22 41
3 6 4 18 6 4
5 10 8 32 6 12
5 13 8 42 6 20
5 13 8 48 6 20
0 0 0 0 8 0
3 0 4 0 8 0
3 2 4 0 8 4
3 2 4 8 8 4
80 96 20 36 44 48 10 18 22 24//resource capacities.
My algorithm:
for i=0 to max_ant
{
    for j=0 to item_number
    {
        if j==0
        {
            item=rand()%n
            ant[i].value+=profit[item]
            ant[i].visited[j]=item
        }
        else
        {
            calculate probabilities for all the other items in P[0..n]
            find the biggest P value.
            item=biggest P's item.
            check if it is in the visited list
            check if it violates a resource constraint.
            if everything is ok:
                ant[i].value+=profit[item]
                ant[i].visited[j]=item
        }//end of else
    }//next j
    update pheromone matrix => tau[a][b]=rou*tau[a][b]+deltaTou
}//next i
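As an illustration of the probability formula from the question, the sketch below computes p[i] over only those items that are still allowed (not yet visited and not violating a capacity). This is one possible reading of the formula, not a confirmed fix; the function name selection_probabilities and the allowed mask are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: p[i] = tau[i]*eta[i] / sum over allowed j of tau[j]*eta[j].
// Items with allowed[i] == false (already visited, or infeasible) get 0.
std::vector<double> selection_probabilities(const std::vector<double>& tau,
                                            const std::vector<double>& eta,
                                            const std::vector<bool>& allowed) {
    assert(tau.size() == eta.size() && tau.size() == allowed.size());
    std::vector<double> p(tau.size(), 0.0);
    double total = 0.0;
    for (std::size_t i = 0; i < tau.size(); ++i) {
        if (allowed[i]) {
            p[i] = tau[i] * eta[i];
            total += p[i];
        }
    }
    // Normalise so the allowed items sum to 1 (skip if nothing is allowed).
    if (total > 0.0) {
        for (double& v : p) v /= total;
    }
    return p;
}
```

Many ACO descriptions draw the next item at random in proportion to these probabilities, for example with std::discrete_distribution from <random>, instead of always taking the largest one. With uniform initial pheromone, a pure argmax picks the same highest-profit item every time, which matches the behaviour described in the question.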