Efficient way to declare and access a huge array - c++

I am writing code in which I need to declare an array of around 200 entries. To find a specific entry I currently have to walk the array until the desired value is found, so in the worst case I make 200 comparisons for a 200-row table.
This is exactly what I want to avoid, so I ended up coding it as follows:
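enum Index { salary, age };
// Note: array designators ([index] = ...) are C99 syntax; in C++ they are a compiler extension at best.
static const Datas Mydata [] =
{
    [Index::salary] = {"hello", function_call_ptr_1, function_call_ptr_2},
    [Index::age] = {"hekllo1", function_call_ptr_1, function_call_ptr_2}
};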
This lets me access a row directly:
Mydata [Index::age]
The lookup will happen inside a function, so that function should receive the Index value as its argument. But what if the caller passes something that is not a proper Index, for example:
age = 0;
fun(age);
Is there a better way to access Mydata, so that the desired row can be reached directly and a wrong argument cannot slip through?
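One possible approach, sketched under assumptions: a scoped enum plus a std::array sized by a sentinel enumerator, with a small checked accessor. The Datas layout, the Handler type and the functions twice and negated are hypothetical stand-ins for the question's types, and row() is a name invented for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical handler type and row layout, standing in for the question's Datas.
using Handler = int (*)(int);

struct Datas {
    const char* name;
    Handler first;
    Handler second;
};

inline int twice(int x) { return 2 * x; }
inline int negated(int x) { return -x; }

// A scoped enum does not convert implicitly from int, so fun(0) will not compile.
// Count is a sentinel that always stays last and gives the table size.
enum class Index : std::size_t { salary, age, Count };

constexpr std::size_t kRowCount = static_cast<std::size_t>(Index::Count);

// Rows are listed in enumerator order; the array size is tied to Index::Count.
const std::array<Datas, kRowCount> Mydata = {{
    {"salary", twice, negated},
    {"age", negated, twice},
}};

// Constant-time lookup; the range check catches values forged with a cast.
inline const Datas& row(Index i) {
    const auto pos = static_cast<std::size_t>(i);
    if (pos >= Mydata.size()) {
        throw std::out_of_range("invalid Index");
    }
    return Mydata[pos];
}
```

With this layout, row(Index::age) reaches the second row without any search, while a plain integer argument is rejected at compile time and a cast-in value outside the table is rejected at run time.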

Related

Heterogeneous container of base class when the derived instances are not pointers

I have a base class and I want to store instances of its derivatives in a collection of some sort.
At first I created a map:
std::map<int, Variable> varriableItems;
and then, using templates, I created a function for each derived class and tried passing the derived objects in like so:
template <>
void Array::addToMap<Number>(Number input)
{
numberVariables[itemCount_] = input;
itemCount_++;
}
This function was never called, because everything stored in the map was of type Variable, and that is how I found out about slicing.
So instead I changed my map to hold pointers to my base class:
std::map<int, Variable*> varriableItems;
but the problem is that none of my objects are created as pointers, so I could not pass them in and I was getting errors:
No suitable conversion from "Number" to "Variable" exists.
Due to my implementation I can only create instances of objects like so:
auto aNumberVariable = Number{50};
Of course, if I instead do:
Number* aNumberVariable = new Number(50);
it works great.
The reason I am doing this is explained below.
Please bear with me because this is a weird assignment.
We were asked to create a program that behaves like and understands the syntax of a programming language called Logo, without actually parsing the text as an input file. Instead we "disguise" it to appear as such, while in fact we just use C++, relying on what we have learned so far plus lots of overloads and preprocessor tricks.
We have to be able to make our own "types" of variables called NUMBER, WORD, BOOLEAN, ARRAY, LIST, SENTENCE.
To declare them we have to use the following (note that no semicolons may be used):
//define number variable with value 21
MAKE number = NUMBER: 21
//define hello variable with value “hello”
MAKE hello = WORD: “hello”
//define myMoves variable contains list of turtle moves
MAKE myMoves = LIST [
LIST [WORD: “FORWARD”, NUMBER: 100],
LIST [WORD: “LEFT”, NUMBER: 90],
LIST [WORD: “FORWARD”, NUMBER: 100]
]
//define array variable with empty array
MAKE array = ARRAY {
number,
hello,
NUMBER: 12
BOOLEAN: TRUE,
ARRAY {
myMoves,
LIST [WORD: “BACK”, NUMBER: 100]
}
}
//define book variable with sentence type
MAKE book = SENTENCE (hello, WORD: “hello!”)
That's just a small part, we later have to support functions, nested loops , etc.
To do this I had to find a way to use the colon, since I cannot overload it, so I did this:
//Create an instance of Number and write the first half of the ternary operator so we
//always get the false value so we can use the : like this
#define NUMBER Number{} = (false) ? 0
//leading semicolon terminates the previous command, which needs it
#define MAKE ;auto
So now this:
//following commands will deal with the semicolon
MAKE myNumber = NUMBER: 21
worked great, and the preprocessor actually expands it to this:
;auto myNumber = Number{} = (false) ? 0 : 21
So I used this approach for all my derived classes, and I went on to overload operators to compare them and to implement an if/else function in a similarly weird syntax.
Now I either have to figure out a way to make this work again, but this time creating the objects as pointers (which I assume is the only way for this to work, but so far I couldn't figure it out), or create a single class for all types. Keeping them as separate classes that all inherit from a single base class makes more sense to me, though.
I am also not sure how strict the graders will be; it is an unconventional project assignment for sure.
The reason I want to hold them together in a container is so that I can then implement an Array and a List object that can hold every type. At first I tried to use a different container for each type and wrote an iterator that walks multiple maps separately, but when I got to the LIST implementation things got weird.
The list syntax uses the brackets [ ], which can only take one input value, so the idea was to collect the elements by overloading the comma operator and pass a single value to the list object.
I know this is weird. Thank you for your time.
I didn't read through all of your post (actually I did, because your task is so ... beyond words), but if you need polymorphism in a container and you also need the container to own the objects, then the solution is unique_ptr:
container<std::unique_ptr<Base>>
In your case it would go something along these lines:
std::unordered_map<int, std::unique_ptr<Variable>> varriableItems;
varriableItems[0] = std::make_unique<Number>(50);
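A slightly fuller sketch of the same idea, assuming a minimal Variable hierarchy. The describe() method, the Word class, the VariableMap alias and this version of addToMap are illustrative inventions, not the asker's real code. It shows that an object created by value, as in the question, can still be handed to the container by moving it into a heap allocation, and that virtual dispatch survives because nothing is sliced.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the question's classes.
struct Variable {
    virtual ~Variable() = default;
    virtual std::string describe() const { return "Variable"; }
};

struct Number : Variable {
    int value;
    explicit Number(int v = 0) : value(v) {}
    std::string describe() const override { return "Number " + std::to_string(value); }
};

struct Word : Variable {
    std::string text;
    explicit Word(std::string t = "") : text(std::move(t)) {}
    std::string describe() const override { return "Word " + text; }
};

using VariableMap = std::unordered_map<int, std::unique_ptr<Variable>>;

// Moves a value-type object onto the heap and stores it under the next free key.
// T is deduced as the derived type, so the stored object keeps its dynamic type.
template <typename T>
int addToMap(VariableMap& items, T input) {
    const int key = static_cast<int>(items.size());
    items[key] = std::make_unique<T>(std::move(input));
    return key;
}
```

Calling addToMap(items, aNumberVariable) with an object made via auto aNumberVariable = Number{50}; stores a copy on the heap, and items.at(key)->describe() then runs the Number override. The key scheme (next key equals current size) only holds if entries are never erased.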

Best approach to query a database to create a collection of a custom class in C++

I am new to interfacing with databases from C++ and was wondering what the best approach is to do the following:
I have an object with member variables that I define ahead of time, and member variables that I need to pull from a database given the known variables. For example:
class DataObject
{
public:
    int input1;
    string input2;
    double output1;
    DataObject(int Input1, string Input2) :
        input1(Input1), input2(Input2)
    {
        output1 = Initializer(input1, input2);
    }
private:
    // Looks up output1 for the given pair of inputs
    double Initializer(int, string);
    static RecordSet rs; //I am just guessing the object would be called RecordSet
};
Now, I can do something like:
std::vector<DataObject> v;
for (int n = 0; n <= 10; ++n)
for (char w = 'a'; w <= 'z'; ++w)
        v.push_back(DataObject{n, string(1, w)}); // string(1, w) turns the char into a one-character string
This gives me an initialized vector of DataObjects. Behind the scenes, Initializer will check whether rs already has data. If not, it will connect to the database and run a query like: select input1, input2, output1 from ... where input1 between 1 and 10 and input2 between 'a' and 'z', and then initialize each DataObject with the output1 that matches its input1/input2 pair.
This would be utterly simple in C#, but from the code samples I have found online it looks utterly ugly in C++. I am stuck on two things. First, as stated earlier, I am completely new to database interfaces in C++, and there are so many methods to choose from that I would like to home in on one that truly fits my purpose. Second, and this is the real goal, I am trying to use a static data set to pull the data in a single query rather than run a new query for each input1/input2 combination. Better yet, is there a way to have database results written directly into the newly created DataObjects, rather than making a pit stop in some temporary RecordSet object?
To summarize and clarify: I have data in a relational database, and I am trying to pull it and store it in a collection of objects. How do I do this? I would be much obliged for any tips or direction.
EDIT 8/16/17: After some research and trials I have come up with the below
I've made progress by using an ADORecordset with put_CursorLocation set to adUseServer:
rs->put_CursorLocation(adUseServer)
My understanding is that by using this setting the query result is stored on the server, and the client side only gets the current row pointed to by rs.
So I get my data from the row and create the DataObject on the spot, emplace_back it into the vector, and finally call rs->MoveNext() to get the next row and repeat until I reach the end. Partial example as follows:
std::vector<DataObject> v;
DataObject::rs.Open(connString,Sql); // Connection for wrapper class
for (int n = 0; n <= 10; ++n)
for (char w = 'a'; w <= 'z'; ++w)
        v.emplace_back(n, string(1, w)); // constructs the DataObject in place
// Somewhere else...
double DataObject::Initializer(int a, string b) {
int ra; string rb; double rc = 0.0; // rc stays 0.0 if the record set is already exhausted
// For simplicity's sake, let's assume the result set is ordered
// in the same way as the for-loop, and that no data is missing.
// So the below sanity-check would be unnecessary, but included.
while (!rs.IsEOF())
{
// Let's assume I defined these 'Get' functions
ra = rs.Get<int>("Input1");
rb = rs.Get<string>("Input2");
rc = rs.Get<double>("Output1");
rs.MoveNext();
if (ra == a && rb == b) break;
}
return rc;
}
// Constructor for RecordSet:
RecordSet::RecordSet()
{
HRESULT hr = rs_.CoCreateInstance(CLSID_CADORecordset);
ATLENSURE_SUCCEEDED(hr);
rs_->put_CursorLocation(adUseServer);
}
Now I'm hoping that I interpreted how this works correctly; otherwise, this would be a whole lot of fuss over nothing. I am clearly not an ADO or .NET expert, but I'm hoping someone can chime in to confirm that this is indeed how it works, and perhaps shed some more light on the topic. On my end, I tested the memory usage using VS2015's diagnostic tool, and the heap seems to be significantly larger when using adUseClient. If my conjecture is correct, then why would anyone opt to use adUseClient, or any of the other choices, over adUseServer?
I can think of two options: one column per member, and BLOB.
For classes, I recommend one row per class instance with one column per member. Look up the data types supported by your database; some types are common to most of them.
Another method is to use the BLOB (Binary Large OBject) data type. This is a "binary" data type used for storing data as-is.
You can use the BLOB type for members whose data types are not supported.
For more advanced designs, research "Database Normalization" or "Database normal forms".
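The single-query idea from the question can be sketched independently of any database library. In the sketch below, ResultRow and OutputCache are hypothetical names, and the rows are assumed to have already been fetched by whatever client API is in use. The result set is read once into a map keyed by (input1, input2), so each DataObject can look up its output1 without another query.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One row of the result set; the field names mirror the question's columns.
struct ResultRow {
    int input1;
    std::string input2;
    double output1;
};

// Lookup table keyed by (input1, input2), filled once from the whole result set.
class OutputCache {
public:
    explicit OutputCache(const std::vector<ResultRow>& rows) {
        for (const ResultRow& r : rows) {
            outputs_[std::make_pair(r.input1, r.input2)] = r.output1;
        }
    }

    // Returns true and writes the value if the pair was present in the query result.
    bool find(int input1, const std::string& input2, double& out) const {
        const auto it = outputs_.find(std::make_pair(input1, input2));
        if (it == outputs_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::pair<int, std::string>, double> outputs_;
};
```

Unlike the ordered scan in the question's Initializer, this does not depend on the result set arriving in the same order as the loops, at the cost of holding all rows in memory.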

static STL map of int and tuple returns 0

Building a calendar for my C++ class. I have a utility class of static methods and static containers. Most notably:
Dictionary.h
static std::map<int,std::tuple<std::string,int>>months;
static std::map<int,std::tuple<std::string,int>>::iterator mitr;
This map contains, as keys, months 0-11. The tuple value contains the string representation of each month and the number of days of each month. For instance:
Dictionary.cpp
map<int,tuple<string,int>> Dictionary::initMonths(){
map<int,tuple<string,int>>m;
map<int,tuple<string,int>>::iterator mapitr = m.begin();
m.insert(mapitr, make_pair(0,make_tuple("January",31)));
m.insert(mapitr, make_pair(1,make_tuple("February",28)));
// insert remaining months...
return m;
}
The problem occurs when I attempt to access this map from another class:
Calendar.cpp
Calendar::Calendar(){
Dictionary::init();
time_t t = chrono::system_clock::to_time_t(chrono::system_clock::now());
tm* t2 = localtime(&t);
int mo = (t2->tm_mon);
Dictionary::mitr = Dictionary::months.find(mo);
cout<<(*Dictionary::mitr).first<<endl; // => 0
cout<<get<0>((*Dictionary::mitr).second)<<endl; // nothing
}
I'm not sure what I'm doing wrong here. Any suggestions would be appreciated.
EDIT:
void Dictionary::init(){
packaged_task<map<int,tuple<string,int>>()>task3(initMonths);
future<map<int,tuple<string,int>>>fu3 = task3.get_future();
guarded_thread t3(std::move(task3));
map<int,tuple<string,int>>months = fu3.get();
}
How exactly do you initialize the map? Your Dictionary code shows a function Dictionary::initMonths(), which returns a map, but your example application code just calls Dictionary::init(). If those functions are actually the same and this is just a typo, then you forgot to assign the return value of initMonths to the static months variable.
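The init() shown in the question's edit has the same effect: it declares a new local map named months, which hides the static member, so the member stays empty. A minimal sketch of the difference, with the threading left out and with the MonthMap alias and initShadowed() added purely for illustration:

```cpp
#include <map>
#include <string>
#include <tuple>

using MonthMap = std::map<int, std::tuple<std::string, int>>;

struct Dictionary {
    static MonthMap months;
    static MonthMap initMonths();
    static void initShadowed();
    static void init();
};

MonthMap Dictionary::months;

MonthMap Dictionary::initMonths() {
    MonthMap m;
    m[0] = std::make_tuple(std::string("January"), 31);
    m[1] = std::make_tuple(std::string("February"), 28);
    // remaining months omitted in this sketch
    return m;
}

// Mirrors the bug: the type in front of the name declares a new local variable
// that hides Dictionary::months and is destroyed when the function returns.
void Dictionary::initShadowed() {
    MonthMap months = initMonths();
    (void)months;
}

// Fixed version: assigns to the static member itself.
void Dictionary::init() {
    months = initMonths();
}
```

After initShadowed() the static map is still empty, so find() returns end() and dereferencing that iterator is undefined behaviour; after init() the lookup succeeds.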
Just as a suggestion, a map seems like overkill here: its main feature is arranging sparse keys in a tree-like structure, and you don't need that for keys 0-11. It might be easier (and faster, for what it's worth) to use a vector or a fixed-size array and access it by index, without iterators.
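A minimal sketch of that suggestion, assuming a small Month struct and a monthAt() helper (both names are made up here). The day counts follow the question, so February is fixed at 28 and leap years are ignored.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

struct Month {
    const char* name;
    int days;   // February is fixed at 28, as in the question; leap years are ignored
};

const std::array<Month, 12> kMonths = {{
    {"January", 31}, {"February", 28}, {"March", 31}, {"April", 30},
    {"May", 31}, {"June", 30}, {"July", 31}, {"August", 31},
    {"September", 30}, {"October", 31}, {"November", 30}, {"December", 31},
}};

// tm_mon from <ctime> is already in the range 0-11, so it can serve as the index.
inline const Month& monthAt(int mon) {
    if (mon < 0 || mon >= static_cast<int>(kMonths.size())) {
        throw std::out_of_range("month index");
    }
    return kMonths[static_cast<std::size_t>(mon)];
}
```

In the Calendar constructor this would replace the find() call with something like monthAt(t2->tm_mon).name, and there is no separate initialization step to forget.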

Hashing Function/Code

I'm just learning (or trying to learn) a bit about hashing. I'm attempting to write a hashing function, but I'm confused about where I save the data. I'm trying to calculate the number of collisions and print that out. I have made 3 different files: one with 10,000 words, one with 20,000 and one with 30,000. Each word is just 10 random numbers/letters.
#include <fstream>
#include <string>

long hash(const std::string& s){
    long h = 0;   // must be initialised before it is accumulated into
    for(char c : s){
        h = h + (int)c;
    }
    //A lot of examples then mod h by the table size
    //I'm a bit confused what this table is... Is it an array of
    //10,000 (or however many words)?
    //h % TABLE_SIZE
    return h;
}
int main (int argc, char* argv[]){
    std::ifstream input(argv[1]);
    std::string nextWord;   // a std::string owns its storage, unlike an uninitialised char*
    while(input >> nextWord){   // stops cleanly at end of file
        hash(nextWord);
    }
}
So that's what I currently have, but I can't figure out what the table is exactly, as I said in the comments above. Is it a predefined array in my main, sized by the number of words? For example, if I have a file of 10 words, do I make an array of size 10 in my main? Then, when I return h, let's say the results come out in this order: 3, 7, 2, 3
The 4th word is a collision, correct? When that happens, do I add 1 to the collision count and then add 1 to the index to check whether slot 4 is also full?
Thanks for the help!
The point of hashing is to have constant-time access to every element you store. I'll try to explain with a simple example below.
First, you need to know how much data you have to store. Say you want to store numbers and you know they will all be in the range 0 to 9. The simplest solution is to create an array with 10 elements. That array is your "table", where you store your numbers. So how do you achieve that amazing constant-time access? With a hashing function! Its job is to return an index into your array. Let's create a simple one: if you'd like to store 7, you just save it in the array at position 7. Every time you want to look up element 7, you pass it to your hashing function and bam! You get the position of your element in constant time. But what if you'd like to store more elements with value 7? Your simple hashing function returns 7 for every one of them, and that position is already occupied. How do you solve that? There are not many solutions; the simplest are:
1: Open addressing (linear probing) - you simply save the element in the next free position. This has a significant drawback: imagine you want to delete some element ... (this is the method you describe in the question)
2: Chaining (linked lists) - if you create an array of pointers to linked lists, you can easily append your new element to the linked list stored at position 7!
Both of these simple solutions have drawbacks. I guess you can see them. As @rwols has said, you don't have to use an array. You can also use a tree, or be a real C++ master and use unordered_map and unordered_set with a custom hash function, which is quite cool. There is also a structure called a trie, which is useful when you'd like to create some sort of dictionary (where it is really hard to know how many words you will need to store).
To sum it up: you have to know how many things you want to store and then create an ideal hashing function that covers an array of appropriate size. In a perfect world it has a uniform index distribution with no collisions. (Achieving this is pretty hard, and in the real world I guess it is impossible, so the fewer collisions, the better.)
Your hash function is pretty bad. It will have a lot of collisions (strings like "ab" and "ba" hash to the same value). You also need to take the result mod m, with m being the size of your array (aka the table), so that the result is a valid index. The modulo is a way of making the hash function "fit" the table you specified at the beginning, because you can't save an element at position 11, 12, ... if you have an array of 10.
What should a good hashing function look like? Well, there are better sources than me. Some example (Alert! It's in Java)
To your example: you simply can't save 10k or more words into a table of size 10. That will create a lot of collisions and you lose the main benefit of a hash table: constant-time access to the elements you saved.
And how would your code look? Something like this:
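int main (int argc, char* argv[]){
    std::ifstream input(argv[1]);
    std::string nextWord;
    TypeOfElement table[size_of_table];   // size_of_table: a constant you choose up front
    while(input >> nextWord){
        // hash() is assumed to accept the word and return a non-negative value;
        // the modulo keeps the index inside the table
        table[hash(nextWord) % size_of_table] = /* desired element which you want to save */;
    }
}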
But I guess your goal isn't to save something somewhere, but to count the number of collisions. Also note that the code above doesn't resolve collisions. If you'd like to count collisions, create an array table of ints and initialize it to zero. Then just increment the value stored at the index returned by your hash function; any slot whose count ends up above 1 has had collisions. Like this:
table[hash(nextWord) % size_of_table]++;
I hope I helped. Please specify, what else you want to know.
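The counting approach described above can be sketched as a small function. This is only an illustration: additiveHash and countCollisions are names chosen here, the hash is the question's character sum with the accumulator initialised, and tableSize is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same additive hash as in the question, with the accumulator initialised.
inline unsigned long additiveHash(const std::string& s) {
    unsigned long h = 0;
    for (unsigned char c : s) {
        h += c;
    }
    return h;
}

// Counts how many words land in a slot that is already occupied.
// tableSize must be greater than zero.
inline std::size_t countCollisions(const std::vector<std::string>& words,
                                   std::size_t tableSize) {
    std::vector<std::size_t> table(tableSize, 0);   // one counter per slot
    std::size_t collisions = 0;
    for (const std::string& w : words) {
        const std::size_t slot = additiveHash(w) % tableSize;
        if (table[slot] > 0) {
            ++collisions;   // some earlier word already hashed here
        }
        ++table[slot];
    }
    return collisions;
}
```

For the words "ab", "ba" and "abc" with a table of size 10, the first two both hash to slot 5 and the third to slot 4, so the function reports one collision. Reading the words from a file and passing them in is left to the caller.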
If a hash table is required then, as others have stated, std::unordered_map will work in most cases. If you need something more powerful because of a large number of entries, I would suggest looking into tries. A trie combines ideas from arrays (each node indexes its children directly), hashing (each character selects a slot) and linked lists (nodes are chained by pointers). A lookup takes about O(M) steps, where M is the number of characters in the string, and because each key follows its own path there are no collisions. The more you add to a trie, the less work later insertions need, because the shared prefix nodes already exist. The one drawback is that tries require more memory.
The size of the array in each node depends on what you are storing, but the overall concept and construction are the same. If you were doing a word-to-definition lookup, you might want an array of 26 entries, or a few more, one for each possible character.
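A minimal sketch of such a trie, assuming keys made only of the lowercase letters 'a' to 'z' in an ASCII-compatible encoding. The Trie class and its insert and contains members are illustrative names, and no value (such as a definition) is stored, only membership.

```cpp
#include <array>
#include <memory>
#include <string>

class Trie {
public:
    // Inserts a word; returns false if it contains anything other than 'a'-'z'.
    // A rejected word may leave unused prefix nodes behind, which is harmless here.
    bool insert(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            if (c < 'a' || c > 'z') return false;
            std::unique_ptr<Node>& child = node->children[c - 'a'];
            if (!child) child.reset(new Node());   // create the node on first use
            node = child.get();
        }
        node->isWord = true;
        return true;
    }

    // Walks one node per character, so the cost is O(M) for a word of length M.
    bool contains(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            if (c < 'a' || c > 'z') return false;
            const std::unique_ptr<Node>& child = node->children[c - 'a'];
            if (!child) return false;
            node = child.get();
        }
        return node->isWord;
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> children;   // one slot per letter
        bool isWord = false;                              // true if a word ends here
    };
    Node root_;
};
```

After inserting "car" and "cart", the two words share the nodes for c, a and r; contains("ca") is false because no word ends at that node. The 26 pointers per node are where the extra memory goes.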
To count the number of words that have the same hash, we have to know the hashes of all previous words. Whenever you compute the hash of a word, write it down, for example in an array. So you need an array whose size equals the number of words.
Then you compare each new hash with all previous ones. The method of counting depends on what you need: the number of colliding pairs or the number of identical elements.
Hash function should not be responsible for storing data. Normally you would have a container that uses hash function internally.
From what you wrote I understood that you want to create hashtable. One way you could do that (probably not the most efficient one, but should give you an idea):
#include <fstream>
#include <vector>
#include <string>
#include <map>
#include <memory>
using namespace std;
namespace example {
    // Sums the character codes of the word (the question's hash, with h initialised)
    long hash(const string& s){
        long h = 0;
        for(char c : s){
            h = h + (int)c;
        }
        return h;
    }
}
int main (int argc, char* argv[]){
    if (argc < 2) return 1;   // expects the input file name as the first argument
    ifstream input(argv[1]);
    string nextWord;
    // Maps each hash value to the list of words that produced it
    std::map<long, std::unique_ptr<std::vector<std::string>>> hashtable;
    while(input >> nextWord){   // stops cleanly at end of file
        long newHash = example::hash(nextWord);
        auto it = hashtable.find(newHash);
        if (it == hashtable.end()) {
            // First word with this hash: start a new bucket
            hashtable.insert(std::make_pair(newHash, std::unique_ptr<std::vector<std::string>>(new std::vector<std::string> { nextWord } )));
        }
        else {
            // Collision: another word already produced this hash
            it->second->push_back(nextWord);
        }
    }
}
I used some C++ 11 features to write an example faster.
I am not sure that I understand what you do not understand. The explanations below might help you.
A hash table is a kind of associative array. It is used to map keys to values, much as an array maps indexes (keys) to values. For instance, an array of three numbers, { 11, -22, 33 }, associates index 0 with 11, index 1 with -22 and index 2 with 33.
Now, let us assume that we would like to associate 1 with 11, 2 with -22 and 3 with 33. The solution is simple: we keep the same array and transform the key by subtracting one from it, thus obtaining the original index.
This is fine until we realize that it is just a particular case. What if the keys are not so "predictable"? A solution would be to put the associations in a list of {key, value} pairs and, when someone asks for a key, just search the list: { 123, 11}, {3, -22}, {0, 33}. If the value associated with 3 is requested, we search the keys in the list for a match and find -22. That's fine, but if the list is large we're in trouble. We could speed up the search by sorting the array by key and using binary search, but the search may still take some time if the list is large.
The search can be sped up further if we break the list into sub-lists (or buckets) of related pairs. This is what a hash function does: it groups pairs by related keys (an ideal hash function would associate each key with its own bucket, so every bucket holds a single pair).
A hash table is a two-column table (an array):
The first column is the hash key (the index computed by the hash function). The size of the hash table is given by the range of the hash function. If, for instance, the last step in computing the hash function is modulo 10, the size of the table will be 10; the list of pairs will be broken into 10 sub-lists.
The second column is a list (bucket) of key/value pairs (the sub-list I was talking about).
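The two-column picture can be sketched as a vector of buckets. This is a hedged illustration only: BucketTable, put and get are invented names, keys are unsigned integers, the hash is simply key modulo the bucket count, and the bucket count is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

class BucketTable {
public:
    // bucketCount must be greater than zero.
    explicit BucketTable(std::size_t bucketCount) : buckets_(bucketCount) {}

    // Stores or overwrites the value for a key inside the key's bucket.
    void put(unsigned key, int value) {
        auto& bucket = buckets_[key % buckets_.size()];
        for (auto& kv : bucket) {
            if (kv.first == key) { kv.second = value; return; }
        }
        bucket.emplace_back(key, value);
    }

    // Searches only the one bucket selected by the hash, not the whole table.
    bool get(unsigned key, int& out) const {
        const auto& bucket = buckets_[key % buckets_.size()];
        for (const auto& kv : bucket) {
            if (kv.first == key) { out = kv.second; return true; }
        }
        return false;
    }

private:
    // First "column": the vector index. Second "column": the list of pairs.
    std::vector<std::vector<std::pair<unsigned, int>>> buckets_;
};
```

With 10 buckets, the keys 123 and 3 from the example above both land in bucket 3 and are told apart by comparing the stored keys, while key 0 sits alone in bucket 0.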

Function to return SQL query into a vector<double> in C++

I have a query in T-SQL which returns 300 records. Each record has 2 columns (date, int)
What is the easiest way in C++ to put all the dates in one vector and all the integers in another one?
I would like to do it in a function.
It's hard to provide full code without knowing your SQL client library - this affects how you populate the vectors, but basically you loop through the rows read from the DB doing push_back on your two vectors for the values retrieved in each row.
The main question is how are you going to handle the returned parameters? You have two vectors, as you have specified the problem here. You could achieve this by having the caller create the vectors and then the function populate them, like this:
#include <vector>
// function declaration - return false on error, or throw exception if preferred
bool populate(std::vector<double>& dates, std::vector<int>& values);
// calling code
std::vector<double> myDates;
std::vector<int> myValues;
// if you know the row count ahead of time (300 in this example), reserve up front
unsigned int rowCount = 300;
myDates.reserve(rowCount);
myValues.reserve(rowCount);
// Populate vectors, checking for error (false = error)
if (populate(myDates, myValues)) {
// work with the returned data
}
For extra credit, and for better encapsulation of the row data, I would be inclined to use a vector of POD structures. The advantage is that each date and value then remain tightly coupled, and you can extend the struct into a full-blown class if there are operations you wish to perform on each row. Preferably, hide the data behind getters.
struct Row {
public:
double date;
int value;
};
bool populate(std::vector<Row>& rows);
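The loop itself might look like the sketch below. Because the SQL client library is unknown, the cursor is a template parameter and FakeCursor is a purely hypothetical in-memory stand-in; next() and rowCount() are assumed names, not any real driver's API, and the extra cursor argument is an addition to the populate signature shown above.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Row {
    double date;
    int value;
};

// Hypothetical stand-in for a database cursor; a real client library has its own API.
class FakeCursor {
public:
    explicit FakeCursor(std::vector<Row> data) : data_(std::move(data)) {}

    // Copies the next row into out; returns false once all rows are consumed.
    bool next(Row& out) {
        if (pos_ >= data_.size()) return false;
        out = data_[pos_++];
        return true;
    }

    std::size_t rowCount() const { return data_.size(); }

private:
    std::vector<Row> data_;
    std::size_t pos_ = 0;
};

// Works with any cursor type that offers next(Row&) and rowCount().
template <typename Cursor>
bool populate(Cursor& cursor, std::vector<Row>& rows) {
    rows.clear();
    rows.reserve(cursor.rowCount());   // avoids reallocation while filling
    Row r{};
    while (cursor.next(r)) {
        rows.push_back(r);
    }
    return true;   // a real implementation would return false on a driver error
}
```

With a real driver, the body of the while loop is where each column would be read from the current record and converted into the Row fields.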