Fastest way to transpose 16 word array - c++

I have the following code:
void shuffle_words(WORD_TYPE* _state)
{
WORD_TYPE temp[DATA_SIZE];
temp[7] = _state[0];
temp[12] = _state[1];
temp[14] = _state[2];
temp[9] = _state[3];
temp[2] = _state[4];
temp[1] = _state[5];
temp[5] = _state[6];
temp[15] = _state[7];
temp[11] = _state[8];
temp[6] = _state[9];
temp[13] = _state[10];
temp[0] = _state[11];
temp[4] = _state[12];
temp[8] = _state[13];
temp[10] = _state[14];
temp[3] = _state[15];
memcpy_s(_state, DATA_SIZE * sizeof(WORD_TYPE), temp, DATA_SIZE * sizeof(WORD_TYPE));
}
int prp(WORD_TYPE* data, WORD_TYPE key)
{
shuffle_words(data);
key = round_function<14, 15>(data, key);
key = round_function<13, 14>(data, key);
key = round_function<12, 13>(data, key);
key = round_function<11, 12>(data, key);
key = round_function<10, 11>(data, key);
key = round_function<9, 10>(data, key);
key = round_function<8, 9>(data, key);
key = round_function<7, 8>(data, key);
key = round_function<6, 7>(data, key);
key = round_function<5, 6>(data, key);
key = round_function<4, 5>(data, key);
key = round_function<3, 4>(data, key);
key = round_function<2, 3>(data, key);
key = round_function<1, 2>(data, key);
key = round_function<0, 1>(data, key);
key = round_function<15, 0>(data, key);
return key;
}
I would like to know if there is a faster way to perform the shuffle_words operation. I have seen questions about matrix transposition, but those appear to be focused on situations where the matrix is either large or multidimensional.
My array will always be 16 words in size, and the prp function will be applied multiple times on the same array, one immediately after another. This leads me to believe simply accessing elements in the transposed order without actually transposing them is an option.
The round_function already writes data to the array, so if it would be more efficient to move the shuffle into it, that would be acceptable. Here is the code for that, in case it's needed:
template <int left_index, int right_index>
WORD_TYPE round_function(WORD_TYPE* state, WORD_TYPE key)
{
WORD_TYPE left, right;
left = state[left_index];
right = state[right_index];
key ^= right;
right = rotate_left<ROTATION_AMOUNT>(right + key + left_index);
key ^= right;
key ^= left;
left += right >> (BIT_WIDTH / 2);
left ^= rotate_left<(left_index % BIT_WIDTH) ^ ROTATION_AMOUNT>(right);
key ^= left;
state[left_index] = left;
state[right_index] = right;
return key;
}
I thought of supplying a destination index to round_function, but doing so overwrites bytes that have yet to be operated on, which destroys the data at the destination index.
What is the most efficient way to perform the word transposition step?
Is it possible to efficiently perform shuffle_words without temporary storage and memcpy? Will the compiler optimize this for me if I leave it as is?
edit:
For a sample input of 16 null words, I got the following output:
5390936987981438580
7289498000187791405
11630888819098945478
4862561973623181657
11364775727483781365
1302861686580238483
10934483497681452460
376472396741801
17443576244438476890
17213444377027086447
15287741771379858051
16772715748200046576
6216997191100954620
16389751604649919423
2033403819063771136
14517213842436349075
I used these #defines:
#define ROTATION_AMOUNT 41
#define BIT_WIDTH 64
#define DATA_SIZE 16
typedef unsigned long long WORD_TYPE;
I am ok if functionality is modified slightly if an increase in efficiency can be obtained.

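Yes! The permutation is a single cycle of length 16, so it can be applied in place with one temporary word and no memcpy: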
void shuffle_words(WORD_TYPE* _state) {
WORD_TYPE temp = _state[0];
_state[0] = _state[11];
_state[11] = _state[8];
_state[8] = _state[13];
_state[13] = _state[10];
_state[10] = _state[14];
_state[14] = _state[2];
_state[2] = _state[4];
_state[4] = _state[12];
_state[12] = _state[1];
_state[1] = _state[5];
_state[5] = _state[6];
_state[6] = _state[9];
_state[9] = _state[3];
_state[3] = _state[15];
_state[15] = _state[7];
_state[7] = temp;
}
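As a rough check of the in-place idea, the sketch below drives both approaches from the same index table and lets them be compared on a sample array. The names Word, kWords, kDest, shuffle_with_copy and shuffle_in_place are made up for this illustration; the table itself is copied from the assignments in the question (the word at index i moves to kDest[i]). The in-place walk relies on the permutation being one 16-element cycle, which holds for this table but not for permutations in general.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Word = std::uint64_t;          // stand-in for WORD_TYPE
constexpr std::size_t kWords = 16;   // stand-in for DATA_SIZE

// Destination index for each source index, copied from the question:
// the word at position i moves to position kDest[i].
constexpr std::array<std::size_t, kWords> kDest = {
    7, 12, 14, 9, 2, 1, 5, 15, 11, 6, 13, 0, 4, 8, 10, 3};

// Reference version: scatter into a temporary array, then copy back.
void shuffle_with_copy(Word* state) {
    Word temp[kWords];
    for (std::size_t i = 0; i < kWords; ++i) temp[kDest[i]] = state[i];
    for (std::size_t i = 0; i < kWords; ++i) state[i] = temp[i];
}

// In-place version: walk the cycle that starts at index 0, carrying one word.
// Only valid because this particular permutation is a single cycle.
void shuffle_in_place(Word* state) {
    Word carried = state[0];
    std::size_t i = 0;
    do {
        std::size_t next = kDest[i];
        Word displaced = state[next];  // save the word about to be overwritten
        state[next] = carried;
        carried = displaced;
        i = next;
    } while (i != 0);
}
```

The hand-unrolled version in the answer does the same work with the indices fixed at compile time, which gives the compiler more to optimize; the loop form is only meant to make the cycle structure visible.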


LeetCode 380: Insert Delete GetRandom O(1)

I came across the LeetCode problem Insert Delete GetRandom O(1), which asks you to implement a data structure supporting insert, delete and getRandom in average O(1) time, and solved it using a map and a vector.
My solution passes all the test cases except the last one, and I can't figure out why. The last test case is too large to debug by hand.
I changed my code a little and then it passes, but I still don't understand why the previous version didn't.
Non-Accepted Solution:
class RandomizedSet {
map<int, int> mp;
vector<int> v;
public:
/** Initialize your data structure here. */
RandomizedSet() {
}
/** Inserts a value to the set. Returns true if the set did not already contain the specified element. */
bool insert(int val) {
if(mp.find(val) == mp.end()){
v.push_back(val);
mp[val] = v.size()-1;
return true;
}
else return false;
}
/** Removes a value from the set. Returns true if the set contained the specified element. */
bool remove(int val) {
if(mp.find(val) == mp.end()){
return false;
}
else{
int idx = mp[val];
mp.erase(val);
swap(v[idx], v[v.size()-1]);
v.pop_back();
if(mp.size()!=0) mp[v[idx]] = idx;
return true;
}
}
/** Get a random element from the set. */
int getRandom() {
if(v.size() == 0) return 0;
int rndm = rand()%v.size();
return v[rndm];
}
};
/**
* Your RandomizedSet object will be instantiated and called as such:
* RandomizedSet* obj = new RandomizedSet();
* bool param_1 = obj->insert(val);
* bool param_2 = obj->remove(val);
* int param_3 = obj->getRandom();
*/
Accepted Solution:
The problem is in the remove function; when I replace its body with the code below, it passes.
if(mp.find(val) == mp.end()){
return false;
}
else{
int idx = mp[val];
swap(v[idx], v[v.size()-1]);
v.pop_back();
mp[v[idx]] = idx;
mp.erase(val);
return true;
}
I don't understand why this is happening. I moved mp.erase(val) to the end and replaced if(mp.size()!=0) mp[v[idx]] = idx with just mp[v[idx]] = idx.
Both versions of the remove function appear to handle the corner case where only a single element is left in the map and we want to remove it.
This is because of undefined behavior when the element being removed is the last element of the vector.
For example, say the operations are
insert(1) // v = [1], mp = [1->0]
insert(2) // v = [1,2], mp = [1->0, 2->1]
remove(2):
int idx = mp[val]; // val = 2, idx = 1
mp.erase(val); // mp = [1->0]
swap(v[idx], v[v.size()-1]); // idx = v.size()-1 = 1, so this does nothing.
v.pop_back(); // v = [1]
if(mp.size()!=0) mp[v[idx]] = idx; // mp[v[1]] = 1.
// But reading v[1] after pop_back() is out of bounds, since v's size is 1 at this point.
I am guessing that the memory that used to hold v[1] is not cleared, so v[1] still reads as 2, and the code ends up putting 2 back into mp.
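One way to avoid the out-of-bounds read entirely is to move the last element into the hole before calling pop_back(), and to erase val from the map only at the end. The sketch below is one possible arrangement, not the asker's accepted code: it swaps in std::unordered_map for the question's std::map (to match the average O(1) requirement), uses the made-up member names index_of and values, adds contains() and size() purely for checking, and leaves getRandom out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

class RandomizedSet {
    std::unordered_map<int, std::size_t> index_of;  // value -> position in values
    std::vector<int> values;

public:
    bool insert(int val) {
        if (index_of.count(val)) return false;
        index_of[val] = values.size();
        values.push_back(val);
        return true;
    }

    bool remove(int val) {
        auto it = index_of.find(val);
        if (it == index_of.end()) return false;
        std::size_t idx = it->second;
        int last = values.back();
        // Move the last element into the hole while it still exists,
        // so nothing is read from the vector after pop_back().
        values[idx] = last;
        index_of[last] = idx;  // rewrites the same entry when val == last
        values.pop_back();
        index_of.erase(val);   // erased last, so the update above cannot bring val back
        return true;
    }

    bool contains(int val) const { return index_of.count(val) != 0; }
    std::size_t size() const { return values.size(); }
};
```

Note that the accepted version in the question still reads v[idx] after pop_back() when the removed element is the last one; it only appears to work because the stale entry it writes is erased on the next line.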

How to sort a string without sorting commands

I have an assignment for one of my classes where I need to sort strings alphabetically without using any commands besides the simple ones that are already in here. Whenever I run it, it works for the most part, except that it leaves pairs like Fred & Eric, or Hazel & Ian, in the wrong order (their first letters are next to each other in the alphabet). The string currently being compared is stored in "two", and it is compared against "hold". The B array is just a second one whose elements are swapped along with those of A. If anyone knows why this is happening, that would be greatly appreciated!
for (int ct = 0; ct < kh; ct++){
hold = A[ct];
bool pass = false;
for (int ct2 = ct+1; ct2 < kh; ct2++){
two = A[ct2];
if (two[0] < hold[0]){
save = A[ct2];
A[ct2] = A[ct];
A[ct] = save;
hold = two;
save = B[ct2];
B[ct2] = B[ct];
B[ct] = save;
}
else if (two[0] == hold[0]){
if (two[1] < hold [1]){
save = A[ct2];
A[ct2] = A[ct];
A[ct] = save;
hold = two;
save = B[ct2];
B[ct2] = B[ct];
B[ct] = save;
}
}
else if (two[1] == hold[1]){
if (two[2] < hold [2]){
save = A[ct2];
A[ct2] = A[ct];
A[ct] = save;
hold = two;
save = B[ct2];
B[ct2] = B[ct];
B[ct] = save;
}
}
}
}
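Tracing the loop with hold = "Eric" and two = "Fred" suggests a likely cause: the last branch, else if (two[1] == hold[1]), is reached only when the first letters differ and two[0] is larger, yet it goes on to compare the third letters ('e' < 'i') and swaps. Hazel & Ian hit the same branch through their shared second letter 'a'. A later character should only be consulted while all earlier ones are equal. The sketch below shows one way to do that, assuming A and B are parallel arrays of std::string with kh elements (the question does not show their types); comes_before and sort_pairs are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true when a sorts strictly before b, comparing one
// character at a time and only moving on while the characters are equal.
bool comes_before(const std::string& a, const std::string& b) {
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] < b[i]) return true;
        if (a[i] > b[i]) return false;
    }
    return a.size() < b.size();  // shared prefix: the shorter string goes first
}

// Same double loop as in the question, with every swap mirrored in B.
void sort_pairs(std::string A[], std::string B[], int kh) {
    for (int ct = 0; ct < kh; ct++) {
        for (int ct2 = ct + 1; ct2 < kh; ct2++) {
            if (comes_before(A[ct2], A[ct])) {
                std::string save = A[ct2];
                A[ct2] = A[ct];
                A[ct] = save;
                save = B[ct2];
                B[ct2] = B[ct];
                B[ct] = save;
            }
        }
    }
}
```

The comparison is by raw character value, so uppercase letters sort before lowercase ones; that matches what the original code does.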

C++ Boolean Value always incorrectly returns true in 2-3 Tree Search

I'm literally ripping my hair out on this one fellas. Here's the problem. I've hard coded a 2-3 Tree and verified that it works with the use of an inorder traversal function that outputs the values of the node it's currently in. So I know the tree is built correctly.
Node *r;
Node zero,one,two,three,four,five,six,seven,eight,nine,ten;
r = &zero;
//Root
zero.small = 50;
zero.large = 90;
zero.left = &one; //Child node to the left
zero.middle = &four; //Child node in the middle
zero.right = &seven; //Child node to the right
//Left Tree
one.small = 20;
one.large = NULL;
one.left = &two;
one.middle = NULL;
one.right = &three;
two.small = 10;
two.large = NULL;
two.left = NULL;
two.middle = NULL;
two.right = NULL;
three.small = 30;
three.large = 40;
three.left = NULL;
three.middle = NULL;
three.right = NULL;
//Middle Tree
four.small = 70;
four.large = NULL;
four.left = &five;
four.middle = NULL;
four.right = &six;
five.small = 60;
five.large = NULL;
five.left = NULL;
five.middle = NULL;
five.right = NULL;
six.small = 80;
six.large = NULL;
six.left = NULL;
six.middle = NULL;
six.right = NULL;
//Right Tree
seven.small = 120;
seven.large = 150;
seven.left = &eight;
seven.middle = &nine;
seven.right = &ten;
eight.small = 100;
eight.large = 110;
eight.left = NULL;
eight.middle = NULL;
eight.right = NULL;
nine.small = 130;
nine.large = 140;
nine.left = NULL;
nine.middle = NULL;
nine.right = NULL;
ten.small = 160;
ten.large = NULL;
ten.left = NULL;
ten.middle = NULL;
ten.right = NULL;
cout<<"inorder traversal for debug"<<endl;
inOrder(*r);
Output would be: 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160
So that proves the tree is built correctly. I've been asked to modify the code to search for a value in the tree, so I wrote the function below. It is essentially the inorder traversal function minus the outputs, plus a simple if statement that returns true if the search key is found in the tree.
bool retrieve(Node r, int key)
{
if (r.left)
retrieve(*r.left, key);
if (r.small)
{
if (r.small == key)
{
cout<<"The node: "<<r.small<<" is equal to search key: "<<key<<endl; //for debug purposes
return true;
}
}
if (r.middle)
retrieve(*r.middle, key);
if (r.large)
if (r.right)
retrieve(*r.right, key);
}
The user is prompted for a number to search for (int key), and upon entry enters an if statement
if (retrieve(*r, key))
{
cout<<key<<" is found!"<<endl;
}
else
cout<<key<<" is not found!"<<endl;
Now the problem is that this seems logically sound to me, and yet when I enter the value "85" (which is not located in the tree AT ALL), the program outputs "85 is found!". Notice that it didn't print the cout statement I have in the function: cout<<"The node: "<<r.small<<" is equal to search key: "<<key<<endl; I've debugged and stepped through the program, and no matter what, the bool function (retrieve) always returns true... What? So I switched the if statement in the bool function to return false (just for debugging purposes), and upon entering "60" (which IS located in the tree), the function STILL returns true. I've tried several combinations of slightly different code but to no avail. What the heck is going on??
Thanks in advance,
Tyler
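You never return a value, except in the if (r.small == key) branch. Flowing off the end of a non-void function without a return statement is undefined behavior, so the caller sees whatever happens to be left in the return register, which here reads as true.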
From 2–3 tree - Wikipedia, I would say your code should compare the key with the small and large key first and depending on the comparison return the result from retrieve(*r.left/middle/right, key).
Something along these lines (untested)
if (key < r.small)
    return retrieve(*r.left, key);   // r.small is a key, not a child pointer
if (key == r.small)
    return true;
if (r.right == NULL)
    return retrieve(*r.middle, key);
if (key < r.large)
    return retrieve(*r.middle, key);
if (key == r.large)
    return true;
return retrieve(*r.right, key);
You need to first check whether the key is found in the current node, in either small or large, and if it is, return true. If it is not, you need to recursively call retrieve on each of the child nodes, and if any of them returns true, return true. If your function has not returned by then, you need to return false.
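A minimal sketch of a search that returns on every path is shown below. It makes a few assumptions that are not spelled out above: the Node layout mirrors the question (large == 0 marks a node with a single key, and such a node uses only left and right), the node is passed by pointer rather than by value so that a missing child can be tested directly, and the key ordering is used to descend into one child instead of visiting all of them.

```cpp
#include <cassert>

// Assumed node layout, mirroring the question: large == 0 marks a node
// with a single key, and such a node uses only left and right.
struct Node {
    int small = 0;
    int large = 0;
    Node* left = nullptr;
    Node* middle = nullptr;
    Node* right = nullptr;
};

bool retrieve(const Node* r, int key) {
    if (r == nullptr) return false;                       // ran past a leaf
    if (key == r->small) return true;
    if (r->large != 0 && key == r->large) return true;
    if (key < r->small) return retrieve(r->left, key);
    if (r->large == 0 || key > r->large) return retrieve(r->right, key);
    return retrieve(r->middle, key);                      // small < key < large
}
```

With the tree from the question this would be called as retrieve(r, key), since r is already a Node pointer there. Using 0 as the "no second key" marker means 0 cannot be stored as a large key; a separate flag would be needed if that matters.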
You need an initial test to see whether the recursion should stop because you are at a leaf node.
// precondition: current is not 0
// returns: true or false. If true, location is set to the node
// where it was found.
bool DoSearch(Node *current, int key, Node *&location)
{
    /*
     * Is key in current?
     */
    if (current->smallValue == key || (current->isThreeNode()
            && current->largeValue == key)) {
        location = current;
        return true;
    } else if (current->isLeafNode()) {
        location = current;
        return false;
    /*
     * Does current have two keys?
     */
    } else if (current->isThreeNode()) {
        if (key < current->smallValue) {
            return DoSearch(current->leftChild, key, location);
        } else if (key < current->largeValue) {
            return DoSearch(current->middleChild, key, location);
        } else {
            return DoSearch(current->rightChild, key, location);
        }
    } else { // ...or only one?
        if (key < current->smallValue) {
            return DoSearch(current->leftChild, key, location);
        } else {
            return DoSearch(current->rightChild, key, location);
        }
    }
}

PE File Format Section Add On

I'm confused about why they use - 1 here. Can someone explain what this line is doing in very, very low-level detail, please? Not just "oh, it's subtracting 1 structure"; I need to know more about the low level. Thanks.
PIMAGE_SECTION_HEADER last_section = IMAGE_FIRST_SECTION(nt_headers) + (nt_headers->FileHeader.NumberOfSections - 1);
The code above is in the below function:
//Reference: http://www.codeproject.com/KB/system/inject2exe.aspx
PIMAGE_SECTION_HEADER add_section(const char *section_name, unsigned int section_size, void *image_addr) {
PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)image_addr;
if(dos_header->e_magic != 0x5A4D) {
wprintf(L"Could not retrieve DOS header from %p", image_addr);
return NULL;
}
PIMAGE_NT_HEADERS nt_headers = (PIMAGE_NT_HEADERS)((DWORD_PTR)dos_header + dos_header->e_lfanew);
if(nt_headers->OptionalHeader.Magic != 0x010B) {
wprintf(L"Could not retrieve NT header from %p", dos_header);
return NULL;
}
const int name_max_length = 8;
PIMAGE_SECTION_HEADER last_section = IMAGE_FIRST_SECTION(nt_headers) + (nt_headers->FileHeader.NumberOfSections - 1);
PIMAGE_SECTION_HEADER new_section = IMAGE_FIRST_SECTION(nt_headers) + (nt_headers->FileHeader.NumberOfSections);
memset(new_section, 0, sizeof(IMAGE_SECTION_HEADER));
new_section->Characteristics = IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_CNT_CODE;
memcpy(new_section->Name, section_name, name_max_length);
new_section->Misc.VirtualSize = section_size;
new_section->PointerToRawData = align_to_boundary(last_section->PointerToRawData + last_section->SizeOfRawData,
nt_headers->OptionalHeader.FileAlignment);
new_section->SizeOfRawData = align_to_boundary(section_size, nt_headers->OptionalHeader.SectionAlignment);
new_section->VirtualAddress = align_to_boundary(last_section->VirtualAddress + last_section->Misc.VirtualSize,
nt_headers->OptionalHeader.SectionAlignment);
nt_headers->OptionalHeader.SizeOfImage = new_section->VirtualAddress + new_section->Misc.VirtualSize;
nt_headers->FileHeader.NumberOfSections++;
return new_section;
}
In C and C++, array elements are indexed from 0 to n-1 (in FORTRAN, from 1 to n by default). So, if you have a pointer p0 to the first element and want a pointer to the last element, you have to add n-1:
plast = p0 + n - 1. That is all there is to it.
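At the byte level, adding an integer k to a pointer of type T* advances the address by k * sizeof(T) bytes, so the - 1 selects the header that starts (n - 1) headers past the first one, while adding n itself gives the slot just past the last existing header (which is where the function places new_section). The sketch below illustrates this with a hypothetical 40-byte stand-in struct rather than the real IMAGE_SECTION_HEADER, so that it compiles without the Windows headers; SectionHeader, byte_offset and last_header are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 40-byte stand-in for IMAGE_SECTION_HEADER.
struct SectionHeader {
    unsigned char bytes[40];
};

// Distance in bytes between two positions in the same header table.
std::ptrdiff_t byte_offset(const SectionHeader* from, const SectionHeader* to) {
    return reinterpret_cast<const char*>(to) - reinterpret_cast<const char*>(from);
}

// Pointer to the last of count headers starting at first.
SectionHeader* last_header(SectionHeader* first, unsigned count) {
    // The compiler scales the offset: this moves (count - 1) * 40 bytes.
    return first + (count - 1);
}
```

For a table of 4 headers, last_header lands 120 bytes after the first header (index 3), and first + 4 lands 160 bytes after it, one slot beyond the existing entries.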

BulkLoading the R* tree with spatialindex library

After successfully building the R* tree with the spatialindex library by inserting 2.5 million records one by one, I tried to create the R* tree with bulk loading. I implemented the DBStream class to feed the data to the BulkLoader iteratively. Essentially, it invokes the following method, which prepares a Data object (the d variable in the code) for the BulkLoader:
void DBStream::retrieveTuple() {
if (query.next()) {
hasNextBool = true;
int gid = query.value(0).toInt();
// allocate memory for bounding box
// this streets[gid].first returns bbox[4]
double* bbox = streets[gid].first;
// filling the bounding box values
bbox[0] = query.value(1).toDouble();
bbox[1] = query.value(2).toDouble();
bbox[2] = query.value(3).toDouble();
bbox[3] = query.value(4).toDouble();
rowId++;
r = new SpatialIndex::Region();
d = new SpatialIndex::RTree::Data((size_t) 0, (byte*) 0, *r, gid);
r->m_dimension = 2;
d->m_pData = 0;
d->m_dataLength = 0;
r->m_pLow = bbox;
r->m_pHigh = bbox + 2;
d->m_id = gid;
} else {
d = 0;
hasNextBool = false;
cout << "stream is finished d:" << d << endl;
}
}
I initialize the DBStream object and invoke the bulk loading in the following way:
// creating a main memory RTree
memStorage = StorageManager::createNewMemoryStorageManager();
size_t capacity = 1000;
bool bWriteThrough = false;
fileInMem = StorageManager
::createNewRandomEvictionsBuffer(*memStorage, capacity, bWriteThrough);
double fillFactor = 0.7;
size_t indexCapacity = 100;
size_t leafCapacity = 100;
size_t dimension = 2;
RTree::RTreeVariant rv = RTree::RV_RSTAR;
DBStream dstream; // no parentheses: "DBStream dstream();" would declare a function
tree = RTree::createAndBulkLoadNewRTree(SpatialIndex::RTree::BLM_STR, dstream,
*fileInMem,
fillFactor, indexCapacity,
leafCapacity, dimension, rv, indexIdentifier);
cout << "BulkLoading done" << endl;
Bulk loading calls my next() and hasNext() functions, retrieves my data, sorts it, and then seg faults in the building phase. Any clues why? The error is:
RTree::BulkLoader: Building level 0
terminate called after throwing an instance of 'Tools::IllegalArgumentException'
The problem apparently lies in the memory allocation, plus a few bugs in the code (also somewhat related to memory allocation). First, one needs to properly assign the properties of the Data variable:
memcpy(data->m_region.m_pLow, bbox, 2 * sizeof(double));
memcpy(data->m_region.m_pHigh, bbox + 2, 2 * sizeof(double));
data->m_id = gid;
Second (and most importantly) getNext must return a new object with all the values:
RTree::Data *p = new RTree::Data(returnData->m_dataLength, returnData->m_pData,
returnData->m_region, returnData->m_id);
return p;
De-allocation of this memory is done by the RTree, so no extra care needs to be taken here.