Extendible Hashing, doubling the size of an array of pointers


I'm trying to implement Extendible Hashing in C++
There's a struct which acts as an Index and it contains an array of type 'Bucket'
Bucket * bucket_pointers;
There's another struct, Bucket, which has an array, which holds my values
E values[N] = {};
I've got a more or less working program, with one problem: every time I double the size of my hash table, I copy all of my buckets into a new array (twice the size), which gives me:
Index_0
Bucket <n= 3, local_depth=2, 0x100200000>
[12,4,,8,]
Index_1
Bucket <n= 0, local_depth=1, 0x100200028>
[,,,,]
Index_2
Bucket <n= 3, local_depth=2, 0x100200050>
[2,10,6,,]
Index_3
Bucket <n= 0, local_depth=1, 0x100200078>
[,,,,]
However, the Bucket with address 0x100200078 should actually point to the bucket with address 0x100200028, i.e. both indices (1 and 3) should point to the same bucket.
Here I'm deciding whether to split a bucket or double the size of my index...
while (!bucket_pointers[h%index_size].append(e)){
if(bucket_pointers[h%index_size].local_depth<global_depth){
split(hashValue);
}
else if(bucket_pointers[h%index_size].local_depth==global_depth){
resize();
}
}
I'm currently doubling the size of my array like this:
for (size_t i = 0; i < index_size; ++i){
for (size_t j = 0; j < bucket_pointers[i].n; ++j){
newBucket_pointers[i] = bucket_pointers[i];
newBucket_pointers[i+index_size] = bucket_pointers[i];
}
}

Note that Bucket * bucket_pointers; is not an array of Bucket pointers, as its name would imply. It's a pointer to a Bucket (the first Bucket in an array of Buckets, to be specific).
So, when you copy the array of buckets to another, you end up with identical copies of buckets each with their own values arrays.
newBucket_pointers[i] = bucket_pointers[i];
newBucket_pointers[i+index_size] = bucket_pointers[i];
If you want newBucket_pointers[i] and newBucket_pointers[i+index_size] to be pointers that point to the same Bucket then the type of bucket_pointers (and newBucket_pointers) should actually be Bucket**. Then bucket_pointers is a pointer to a Bucket* and bucket_pointers[i] is a pointer to a Bucket. That way bucket_pointers[i], newBucket_pointers[i] and newBucket_pointers[i+index_size] would point to the same Bucket. I recommend a std::vector<Bucket*> bucket_pointers instead though for easier memory management.
If instead, you intend to copy the Buckets as you do now but have their values member point to a shared array, then you can keep bucket_pointers as it is and you need to change the type of values to a pointer and allocate the array separately. If you want to share the array this way, you should probably use a shared_ptr to make the eventual deallocation easier.
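A minimal sketch of the pointer-based directory described above. The simplified Bucket (only a local_depth field) and the helper name double_directory are assumptions for illustration, not code from the question. Doubling copies pointers only, so slots i and i + old_size end up referring to the same Bucket:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the question's Bucket (assumption: values omitted).
struct Bucket {
    std::size_t local_depth = 1;
};

// Hypothetical helper: doubles the directory in place.
// Only the pointers are copied; no Bucket is duplicated.
void double_directory(std::vector<Bucket*>& directory) {
    const std::size_t old_size = directory.size();
    directory.resize(old_size * 2);
    for (std::size_t i = 0; i < old_size; ++i) {
        directory[i + old_size] = directory[i];  // alias the existing bucket
    }
}
```

The directory does not own the buckets in this sketch, so whoever allocates them is still responsible for freeing each Bucket exactly once.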

I've included some code below that performs as a very simple hash table. It is for instructional purpose only and not robust enough for use in a real application. In real life use the built-in std::unordered_set which works much better.
I avoid the need to change the bucket size, by using a linked list as a bucket that can expand as needed.
Is this example helpful to set you on the right track?
#include <iostream>
#include <array>
#include <list>
#include <string>
#include <cassert>
#include <functional> // std::hash
#include <stdexcept> // std::runtime_error
class CTable
{
public:
void Add(const std::string &sKey, int nVal);
int Find(const std::string &sKey);
protected:
size_t Index(const std::string &sKey);
private:
struct SData
{
SData(const std::string &s, int n)
: sKey(s)
, nVal(n)
{
}
std::string sKey;
int nVal;
};
typedef std::list<SData> Bucket_t;
enum { nBuckets = 24 };
typedef std::array<Bucket_t, nBuckets> Table_t;
Table_t m_table;
const SData *Lookup(const Bucket_t &b, const std::string &sKey);
};
void CTable::Add(const std::string &sKey, int nVal)
{
size_t nIndex = Index(sKey);
const SData *p = Lookup(m_table.at(nIndex), sKey);
if (p)
throw std::runtime_error("duplicate key");
m_table.at(nIndex).push_back(SData(sKey, nVal));
}
int CTable::Find(const std::string &sKey)
{
size_t nIndex = Index(sKey);
const SData *p = Lookup(m_table.at(nIndex), sKey);
if (p)
return p->nVal;
else
throw std::runtime_error("not found");
}
size_t CTable::Index(const std::string &sKey)
{
return std::hash<std::string>()(sKey) % m_table.size();
}
const CTable::SData *CTable::Lookup(const CTable::Bucket_t &b,
const std::string &sKey)
{
for (const SData &s : b)
if (s.sKey == sKey)
return &s;
return nullptr;
}
int main()
{
CTable t;
t.Add("one", 1);
t.Add("two", 2);
t.Add("three", 3);
assert(2 == t.Find("two"));
try
{
t.Find("four");
assert(false);
}
catch (std::exception &)
{
}
try
{
t.Add("two", 3);
assert(false);
}
catch (std::exception &)
{
}
return 0;
}

As @user2079303 already pointed out, what you want is an array of Bucket*, i.e. a Bucket**.
Let me clarify this with some imagery:
[Image: extendible hashing explained]
One thing to remember in case Bucket** index = new Bucket*[<size_here>] confuses you: say you want to make a simple int array.
You would do:
int* nums = new int[5];
The type on the right-hand side has one * fewer than the type on the left, because the right-hand side names the element type. Here, all you want to store is addresses of Buckets, so the index contains one or more pointers to Buckets.
Hope it helps!
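As a rough sketch of how doubling could look with a raw Bucket** index: the function name double_index and the one-field Bucket are assumptions made for this example. Only the array of pointers is reallocated; the buckets themselves stay where they are.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for the real Bucket (assumption).
struct Bucket {
    int local_depth;
};

// Hypothetical helper: returns a new index twice as large.
// The old pointer array is freed; the Buckets it pointed to are not touched.
Bucket** double_index(Bucket** index, std::size_t old_size) {
    Bucket** bigger = new Bucket*[old_size * 2];
    for (std::size_t i = 0; i < old_size; ++i) {
        bigger[i] = index[i];
        bigger[i + old_size] = index[i];  // both slots share one Bucket
    }
    delete[] index;
    return bigger;
}
```

The caller replaces its old index pointer with the returned one and later releases it with delete[].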

Related

How to create a copy constructor and use a constructor as a function argument

At the beginning I would like to apologize for the title of this thread. I was not sure about it, so I will explain what I would like to achieve.
Lets get started :)
My code:
class CCompany
{
public:
CCompany(void); // Default constructor
CCompany(const CCompany & b); // Copy constructor
bool NewAccount(const char * AccountID, int initialBalance);
bool NewTranscation(const char * AccountID_SendFrom, const char * AccountID_SendTo, int amount);
CCompany ShowAllTransactions(const char * AccountID); // Function that shows all the transactions of the selected account
int Balance(); // Shows actual account
...
...
...
private:
struct Transcaction
{
char AccountID[100];
int InitialBalance;
int ActualAccount;
int *transactionAmount;
int transactionCounter;
};
Transcaction Transcactions[10000];
};
int main ( void )
{
CCompany c0;
assert(c0.NewAccount( "11111", 200 ));
assert(c0.NewAccount("66666", -600 ));
assert(c0.NewTranscation( "66666", "11111", 1000 ) ); // After this transaction>>> 11111: 1200, 66666: -1600
assert(c0.Account( "66666" ).Balance() == -1600 );
}
First of all, I would like to say that we cannot use the STL, so vector and the other STL containers are not an option. I chose an array of structs for the purpose of the task. If you have a better suggestion, just let me know. We also cannot use std::string, only char arrays. This is probably so that we learn how to allocate memory and copy char arrays correctly.
Anyway, I would like to ask two questions:
1) how to create a copy constructor for this class
2) I am not sure about this "assert ( x0 . Account ( "32322" ). Balance ( ) == -1600 );"
How do I create a function Account whose result can be used to call Balance()?
Thank you in advance.

You seem to have to rely on static allocation:
Transcaction Transcactions[10000];
Well, do the same with your accounts:
Account accounts[1000];
You will have to create an appropriate account class, though, it might look similar to your transaction class with a member for id and balance. You might want to provide appropriate constructors to facilitate further work with it:
#include <cstring> // strncpy
struct Account
{
char id[128];
int balance;
Account(char const* theId = nullptr, int balance = 0)
: balance(balance)
{
id[0] = 0; // keep id a valid empty string if no id is given
if(theId)
{
strncpy(id, theId, sizeof(id));
// just for the case of theId being too long:
id[sizeof(id) - 1] = 0;
}
}
Account(Account const& other) = default;
};
Further details omitted... Be aware that you get a default assignment operator, too.
Your transaction class, though, appears inappropriate to me. A transaction essentially requires a source account, a destination account and the amount to be transferred. An id is optional, but can be useful. Depending on your needs, you could use a pointer to the corresponding account instance or the index within your company's account array:
struct Transaction
{
unsigned int id;
unsigned int sourceAccountIndex; // alternative: pointer or id string
unsigned int destinationAccountIndex; // alternative: pointer or id string
int amount;
};
Using indices or pointers, you get quicker access to the related accounts. However, you have to consider that pointers/indices become invalid if you are ever going to delete accounts. This is not too difficult to cope with, but if you don't want to deal with it, you can still keep the account ids as you do now (but you need two of them!).
With the array class members, you get your copy constructor for free! Just declare it:
Company(Company const& other) = default;
If you want or have to write the copy constructor explicitly, simply iterate over the arrays and copy the entries (normally, you'd prefer std::copy, but as STL is explicitly excluded...):
Company::Company(Company const& other)
{
for(unsigned int i = 0; i < sizeof(accounts)/sizeof(*accounts); ++i)
{
accounts[i] = other.accounts[i];
}
// transactions analogously
}
Finally: Your account function would iterate over the entries and return the first account whose id matches:
Company::Account* Company::account(char const* id)
{
for(unsigned int i = 0; i < sizeof(accounts)/sizeof(*accounts); ++i)
{
if(strcmp(id, accounts[i].id) == 0)
{
return accounts + i;
}
}
return nullptr;
}
Variant of using pointers (the one I personally would prefer):
Company::Account* Company::account(char const* id)
{
Account* end = accounts + sizeof(accounts)/sizeof(*accounts);
for(Account* a = accounts; a != end; ++a)
{
if(strcmp(id, a->id) == 0)
{
return a;
}
}
return nullptr;
}
A last hint: with every new Transaction being created, do not forget to adjust the balances of the involved accounts!
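To illustrate how a call such as c0.Account("66666").Balance() can work, here is a minimal sketch without STL containers. It returns a reference rather than the pointer used above, purely so that the call can be chained with a dot. The names AccountRecord, UnknownAccount, Find and the limit of 1000 accounts are assumptions for this example, and the method is spelled NewTransaction here.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record type; name and fields are assumptions.
struct AccountRecord {
    char id[100];
    int balance;
    int Balance() const { return balance; }
};

struct UnknownAccount {};  // thrown when no account has the requested id

class CCompany {
public:
    CCompany() : count_(0) {}

    bool NewAccount(const char* id, int initialBalance) {
        if (count_ == kMaxAccounts) return false;                       // table full
        if (std::strlen(id) >= sizeof(accounts_[0].id)) return false;   // id too long
        if (Find(id)) return false;                                     // duplicate id
        std::strcpy(accounts_[count_].id, id);
        accounts_[count_].balance = initialBalance;
        ++count_;
        return true;
    }

    // Moves amount from one account to the other and adjusts both balances.
    bool NewTransaction(const char* from, const char* to, int amount) {
        AccountRecord* src = Find(from);
        AccountRecord* dst = Find(to);
        if (!src || !dst || src == dst) return false;
        src->balance -= amount;
        dst->balance += amount;
        return true;
    }

    // Returns a reference, so the caller can chain .Balance() directly.
    AccountRecord& Account(const char* id) {
        AccountRecord* a = Find(id);
        if (!a) throw UnknownAccount();
        return *a;
    }

private:
    enum { kMaxAccounts = 1000 };

    AccountRecord* Find(const char* id) {
        for (int i = 0; i < count_; ++i)
            if (std::strcmp(accounts_[i].id, id) == 0) return &accounts_[i];
        return nullptr;
    }

    AccountRecord accounts_[kMaxAccounts];
    int count_;
};
```

Because every member is a plain array or an int, the compiler-generated copy constructor already copies the whole object correctly in this sketch.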

C++, array of objects, customize where they are stored in memory

Currently I am working on an existing project (a DLL) which I have to extend.
For the transport through the DLL I have a struct, for example 'ExternEntry',
and a struct which passes an array of it.
struct ExternEntry
{
unsigned int MyInt;
const wchar_t* Text;
};
struct ExternEntries
{
const ExternEntry* Data;
const unsigned int Length;
ExternEntries(const ExternEntry* ptr, const unsigned int size)
: Data(ptr)
, Length(size)
{
}
};
In the existing project architecture, this will be the first time that an array is passed to the DLL callers. So far the architecture has not dealt with arrays, and if a struct is passed to a caller, there is normally a wrapper struct for it (because of its string pointers).
Inside the DLL I need to wrap the ExternEntry so that it has a valid Text pointer.
struct InternEntry
{
ExternEntry Data;
std::wstring Text;
inline const ExternEntry* operator&() const { return &Data; }
void UpdateText() { Data.Text = Text.c_str(); }
};
struct InternEntries
{
std::vector<InternEntry> Data;
operator ExternEntries() const
{
return ExternEntries(Data.data()->operator&(), static_cast<unsigned int>(Data.size()));
}
};
So the problem is, when the caller receives the ExternEntries and creates a vector again:
auto container = DllFuncReturnInternEntries(); // returns ExternEntries
std::vector<ExternEntry> v(container.Data, container.Data + container.Length);
The first element is valid. All other elements point to the wrong memory, because in memory each InternEntry's wstring Text is stored between its Data and the next InternEntry's Data.
Maybe I'm wrong with the reason why this can't work.
[Data][std::wstring][Data][std::wstring][Data][std::wstring]
Caller knows just about the size of the [Data]
So the vector is doing the following:
[Data][std::wstring][Data][std::wstring][Data][std::wstring] 
  |       |       |
 Get     Get     Get
instead of
[Data][std::wstring][Data][std::wstring][Data][std::wstring]
  |                   |                   |
 Get                 Get                 Get
Do I have any possibilities to customize how the vector stores InternEntry objects in memory?
like Data,Data,Data ..anywhere else wstring,wstring,wstring
I hope I have explained my problem well
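One possible layout along the lines of "Data,Data,Data ... wstring,wstring,wstring" is to keep two parallel vectors, so the ExternEntry objects are contiguous on their own. This is only a sketch under assumptions: the member names Add, UpdateText, Data and Length on this reworked InternEntries are invented for the example and are not part of the existing project.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ExternEntry {
    unsigned int MyInt;
    const wchar_t* Text;
};

// Hypothetical replacement for InternEntries: entries and their strings
// live in two separate vectors with matching indices.
class InternEntries {
public:
    void Add(unsigned int value, const std::wstring& text) {
        texts_.push_back(text);
        ExternEntry e = { value, nullptr };
        entries_.push_back(e);
    }

    // Re-point every Text at its string. Call this after the last Add,
    // because growing texts_ can move the strings.
    void UpdateText() {
        for (std::size_t i = 0; i < entries_.size(); ++i)
            entries_[i].Text = texts_[i].c_str();
    }

    const ExternEntry* Data() const { return entries_.data(); }
    unsigned int Length() const { return static_cast<unsigned int>(entries_.size()); }

private:
    std::vector<ExternEntry> entries_;  // contiguous block handed to the caller
    std::vector<std::wstring> texts_;   // owns the characters Text points to
};
```

The pointers handed out stay valid only as long as the InternEntries object is alive and unmodified, which is the same lifetime constraint the original wrapper has.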

Retrieve array name

I have written a function in C++ which receives a struct as input. The struct object has two arrays, and I need to use both arrays for different purposes. The array names follow a certain format. How can I retrieve the array names as strings?
struct INFO
{
float fADataLHS[3] = {1,2,3};
float fADataRHS[3] = {4,5,6};
};
The struct INFO defines and initializes two arrays. The function useStruct uses both arrays for different purposes.
void useStruct(INFO *info)
{
// ...
// ...
}
int main()
{
INFO info;
useStruct(&info);
}
I want a method with which I can retrieve the name of an array, e.g. fADataLHS, and store it in a string. The idea is to find the substrings LHS and RHS in the names and process the arrays accordingly.
PS: I am quite new to C++.

I will keep it simple, as you're a beginner in C++.
If you want to use both arrays for different purposes, just do it. For instance:
void use_array_for_different_purposes(INFO *info)
{
// Purpose one, printing values using fADataLHS.
for (int i = 0; i < 3; i++) {std::cout << info->fADataLHS[i] << std::endl;}
// Purpose two, computing total sum using fADataRHS.
float acum = 0; // initialized before accumulating; float matches the element type
for (int i = 0; i < 3; i++) {acum += info->fADataRHS[i];}
}
As you can see, you don't need to get the array names as string values.

If I understand correctly, your use case is this: you have two (or more) names and each has a float array associated with it. You want to get the array by name and process the data.
Consider this code:
#include <iostream>
#include <map>
#include <string>
#include <vector>
class INFO
{
std::map<std::string, std::vector<float>> vectors;
public:
INFO() : vectors{}
{
vectors["fADataLHS"] = { 1, 2, 3 };
vectors["fADataRHS"] = { 4, 5, 6 };
}
const std::vector<float>& operator[](const std::string& key) const // access vector by key
{
return vectors.at(key);
}
};
void useStruct(const INFO& info) // pass instance by const reference
{
std::cout << info["fADataLHS"][0] << "\n"; // access element 0 from the fADataLHS array
// get the entire array:
const auto& arr = info["fADataRHS"];
// this will throw a std::out_of_range
const auto& missing = info["non-existent-key"];
}
EDIT: A few other notes:
In C++, prefer double over float unless you have a specific reason to use float.
If you need to alter the vector contents from client code, add a non-const version of operator[].
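A short sketch of what such a non-const overload could look like, reusing the INFO class from this answer. Using at() in the non-const overload as well is a choice made here (an assumption), so that an unknown key throws instead of silently creating a new entry.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class INFO {
    std::map<std::string, std::vector<float>> vectors;
public:
    INFO() {
        vectors["fADataLHS"] = { 1, 2, 3 };
        vectors["fADataRHS"] = { 4, 5, 6 };
    }

    // Read-only access; throws std::out_of_range for unknown keys.
    const std::vector<float>& operator[](const std::string& key) const {
        return vectors.at(key);
    }

    // Writable access; also throws for unknown keys instead of inserting.
    std::vector<float>& operator[](const std::string& key) {
        return vectors.at(key);
    }
};
```

The compiler picks the overload from the constness of the object, so info["fADataLHS"][0] = 10.0f compiles for a non-const INFO and is rejected for a const one.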

Offset and pass vector reference

When using arrays you can do something like
class SomeClass
{
public:
int* LockMember( size_t& numInts );
private:
int* member;
size_t numInts;
};
int* SomeClass::LockMember( size_t& out_numInts )
{
out_numInts = numInts - 1;
return member + 1;
}
This returns an array offset by some amount, so as to prevent someone from modifying some part of contiguous memory, or, at least, to show some intent that this part of the object's contiguous memory should remain untouched.
Since I use vectors everywhere, I am wondering if there was some way to accomplish the same sort of thing:
class SomeClass
{
public:
std::vector<int> LockMember( void );
private:
std::vector<int> member;
};
std::vector<int> SomeClass::LockMember( void )
{
// somehow make a vector with its beginning iterator pointing to member.begin() + 1
// have a size smaller by one, still the same end iterator. The vector must be
// pointing to the same data as in this class as it needs to be modifiable.
return magicOffsetVector;
}
With the commented part replaced by real code. Any ideas?

If I understand you correctly: You want some memory with two parts: At the beginning you want something that can't be touched, and after that you want something that is open for use by client code.
You could do something along the lines of the following code. This will give the client code a copy to play with. This does mean you would have to do a lot of copying, though.
#include <algorithm> // std::copy
#include <cstddef> // size_t
#include <vector>
class SomeClass
{
public:
std::vector<int> getMember() const;
void setMember(const std::vector<int>& newContent);
private:
std::vector<int> member;
size_t magicOffset;
};
// Read the client-accessible part (returns a copy)
std::vector<int> SomeClass::getMember() const
{
return std::vector<int>(member.begin() + magicOffset, member.end());
}
// Assign to the client-accessible part; newContent must not be longer than that part
void SomeClass::setMember(const std::vector<int>& newContent)
{
std::copy(newContent.begin(), newContent.end(), member.begin() + magicOffset);
}
In order to avoid the copying, it is possible that you could allocate memory for two vectors, one for the protected part and one for the unprotected part, and use placement new to put both vectors into that memory, thus ensuring that they are in contiguous memory. And then give the client code more or less free access to the public part of the vector. However, there's still the thing with bookkeeping variables in vector, and basically this would be an awful hack that's just waiting to blow up.
However, if you only need access to the unrestricted part on a per-element basis, you could just do range-checking on the arguments, i.e.:
int getElement(size_t idx)
{
idx += magicOffset;
// idx is unsigned, so only the upper bound needs checking; valid indices are 0 .. size() - 1
if (idx >= member.size()) throw std::out_of_range("Illegal index");
return member[idx];
}
And then either provide a setElement, or return int&.
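Another way to avoid the copy is to hand out a small non-owning view (a pointer plus a length) into the vector's storage, much like the raw-array version in the question. The IntView class and the contents() accessor below are invented for this sketch; in C++20, std::span<int> fills the same role.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical non-owning view over a run of ints.
class IntView {
public:
    IntView(int* first, std::size_t count) : first_(first), count_(count) {}
    int& operator[](std::size_t i) const { return first_[i]; }
    std::size_t size() const { return count_; }
    int* begin() const { return first_; }
    int* end() const { return first_ + count_; }
private:
    int* first_;
    std::size_t count_;
};

class SomeClass {
public:
    explicit SomeClass(std::vector<int> v) : member(std::move(v)) {}

    // Exposes everything except the first element, without copying.
    IntView LockMember() {
        if (member.empty()) return IntView(nullptr, 0);
        return IntView(member.data() + 1, member.size() - 1);
    }

    const std::vector<int>& contents() const { return member; }

private:
    std::vector<int> member;
};
```

The view is only valid until the vector reallocates or is destroyed, and it expresses intent rather than enforcing it: a determined caller can still step the pointer backwards.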

Handling Huge Multidimensional Arrays in C++

I'm designing a game in C++ similar to Minecraft that holds an enormous amount of terrain data in memory. In general, I want to store an array in memory that is [5][4][5][50][50][50]. This isn't bad, since it amounts to about 100 MB of virtual memory, as my structure will only be about 8 bytes.
However, I'm having trouble figuring out the best way to handle this. I do want this to be in virtual memory, but obviously not on the stack. And I somehow keep making the mistake of creating this array on the stack and causing a stack overflow. What I would like to do is below. This is just code that I threw together to give you an example of what I'm doing; I have code with correct syntax on my machine, I just didn't want to clutter the post.
typedef struct modelBlock
{
// Information about the blocks
} BLOCK;
typedef struct modelGrid
{
bool empty;
BLOCK blocksArray[50][50][50];
} GRID;
class Parent
{
Child* child;
Parent(void);
}
Parent::Parent()
{
Child c;
child = &c;
}
class Child
{
GRID grids[5][4][5];
}
However, every time I do this, I cause a stack overflow (appropriate web site choice right?). I played with using pointer based arrays, but I had a lot of trouble with data being lost outside of its scope.
If anyone could give me some insight on how to get my data to store on the heap instead of the stack, or if I should use some other way of creating my array, I'd really appreciate the help. I'd like to avoid using vectors because of overhead, though I'm not sure how substantial it is.

Use boost::multi_array

If you want to allocate something on the heap, use new.
#include <memory>
class Parent
{
std::unique_ptr<Child> child; // use unique_ptr (C++11) for dynamically-allocated members; auto_ptr was removed in C++17
Parent(const Parent&); // You probably don't want to copy this giant thing
public:
Parent();
};
Parent::Parent()
: child(new Child) // initialize members with an initializer list
{
}
Also, avoid mixing C and C++ styles. There's no reason to do
typedef struct blah{ ... } BLAH;
in C++. A struct is just a class with all of the members public by default; just like a class, you can refer to the struct type's name without using the struct tag. There's also no need to specify void for a function that takes no parameters.
boost::multi_array (linked in PigBen's answer) is a good choice over raw arrays.

If you want the class created on the heap, create it with new:
Child * c = new Child;
and then of course delete it, or better still use a smart pointer.

In order to do exactly what you're trying to do you have to declare everything as pointers (and pointers to pointers to pointers to pointers) and then allocate each one individually.
Teh sux!
A better option is to simply allocate the ginormous block in one chunk and use the individual indices together with pointer arithmetic to arrive at the correct location.
Edit: Wasn't paying attention and didn't notice your constructor. That's not only not the way to get your Child allocated on the free-store, it's a great way to create situations eliciting undefined behavior. Your Child will be gone when the constructor is through and the pointer to it will then be invalid. I wonder if you shouldn't run through some basic tutorials before trying to write a game.
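The one-chunk idea can be sketched roughly as follows. The class name FlatGrid, the one-field Block and the x/y/z parameter names are assumptions for illustration; the same index formula extends to six dimensions by nesting further multiplications.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct Block {
    int data;  // stand-in for the real per-block information
};

// Hypothetical wrapper: one heap allocation, indexed with plain arithmetic.
class FlatGrid {
public:
    FlatGrid(std::size_t nx, std::size_t ny, std::size_t nz)
        : nx_(nx), ny_(ny), nz_(nz),
          cells_(new Block[nx * ny * nz]()) {}  // () value-initializes every Block

    Block& at(std::size_t x, std::size_t y, std::size_t z) {
        assert(x < nx_ && y < ny_ && z < nz_);
        // Row-major layout: z varies fastest, x slowest.
        return cells_[(x * ny_ + y) * nz_ + z];
    }

private:
    std::size_t nx_, ny_, nz_;
    std::unique_ptr<Block[]> cells_;  // frees the chunk automatically
};
```

Only the small FlatGrid object lives on the stack; the 50 x 50 x 50 cells are on the heap and are released when the object goes out of scope.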

Here's something that works and can be built upon without the boost dependency. One downside is it removes use of [][][] style of referencing elements, but it's a small cost and can be added.
#include <cassert>
#include <cstddef>
template<class T>
class Matrix {
unsigned char* _data;
const size_t _depth;
const size_t _cols;
const size_t _rows;
public:
Matrix(const size_t& depth, const size_t& rows, const size_t& cols):
_depth(depth),
_rows(rows),
_cols(cols) {
_data = new unsigned char [depth * rows * cols * sizeof(T)];
}
~Matrix() {
delete[] _data;
}
T& at(const size_t& depthIndex, const size_t& rowIndex, const size_t& colIndex) const {
return *reinterpret_cast<T*>(_data + ((((depthIndex * _cols + colIndex) * _rows) + rowIndex) * sizeof(T)));
}
const size_t& getDepth() const {
return _depth;
}
const size_t& getRows() const {
return _rows;
}
const size_t& getCols() const {
return _cols;
}
};
int main() // portable entry point instead of the MSVC-specific _tmain
{
Matrix<int> block(50, 50, 50);
size_t d, r, c;
for (d = 0; d < block.getDepth(); d++) {
for (r = 0; r < block.getRows(); r++) {
for (c = 0; c < block.getCols(); c++) {
block.at(d, r, c) = d * 10000000 + r * 10000 + c;
}
}
}
for (d = 0; d < block.getDepth(); d++) {
for (r = 0; r < block.getRows(); r++) {
for (c = 0; c < block.getCols(); c++) {
assert(block.at(d, r, c) == d * 10000000 + r * 10000 + c);
}
}
}
return 0;
}

A smaller example (with changed names for all the structs, to make the general principle clearer). The 'Bloe' struct is the one you want to allocate on the heap, and this is accomplished using 'new'.
struct Bla {
int arr[4][4];
};
struct Bloe {
Bla bla[2][2];
};
int main()
{
Bloe* bloe = new Bloe();
bloe->bla[1][1].arr[1][1] = 1;
delete bloe; // release the heap allocation again
return 0;
}

I did this by putting all the data in a binary file. I calculated the offset of the data and used seek() and read() to get the data when needed. The open() call is very slow so you should leave the file open during the lifetime of the program.

Below is how I understood what you showed you were trying to do in your example. I tried to keep it straightforward. Each array of [50][50][50] is allocated in one memory chunk on the heap, and only allocated when used. There is also an example of access code. No fancy boost or anything special, just basic C++.
#include <iostream>
class Block
{
public:
// Information about the blocks
int data;
};
class Grid
{
public:
bool empty;
Block (*blocks)[50][50];
Grid() : empty(true), blocks(nullptr) {
}
void makeRoom(){
this->blocks = new Block[50][50][50];
this->empty = false;
}
~Grid(){
if (!this->empty){
delete [] this->blocks;
}
}
};
class Parent
{
public:
Grid (* child)[4][5];
Parent()
{
this->child = new Grid[5][4][5];
}
~Parent()
{
delete [] this->child;
}
};
int main(){
Parent p;
p.child[0][0][0].makeRoom();
if (!p.child[0][0][0].empty){
Block (* grid)[50][50] = p.child[0][0][0].blocks;
grid[49][49][49].data = 17;
}
std::cout << "item = "
<< p.child[0][0][0].blocks[49][49][49].data
<< std::endl;
}
This could still be simpler and more straightforward by just using one big array of [50][50][50][5][4][5] blocks in one memory chunk on the heap, but I'll let you figure out how, if this is what you want.
Also, using dynamic allocation in class Parent has the sole purpose of using the heap instead of the stack, but for such a small array (5*4*5 pointers), allocating it on the stack should not be a problem, hence it could be written as
class Parent
{
public:
Grid child[5][4][5];
};
without changing anything in the way it is used.
