Is a compile-time checked string-to-int map possible? - c++

I'm probably trying to achieve the impossible, but StackExchange always surprises me, so please have a go at this:
I need to map a name to an integer. The names (about 2k) are unique. There will be no additions nor deletions to that list and the values won't change during runtime.
Implementing them as const int variables gives me compile-time checks for existence and type.
Also this is very clear and verbose in code. Errors are easily spotted.
Implementing them as std::map<std::string, int> gives me a lot of flexibility for building the names to look up with string manipulation. I may use this to pass strings as parameters to functions, which can then query the list for multiple values by appending pre-/suffixes to that string. I can also loop over several values by creating a numeral part of the key name from the loop variable.
Now my question is: is there a method to combine both advantages? The missing compile-time check (especially for key existence) almost kills the second method for me. (Especially as std::map's operator[] silently inserts a missing key and returns 0, which creates hard-to-find bugs.) But the looping and pre-/suffix adding capabilities are so damn useful.
I would prefer a solution that doesn't use any additional libraries like boost, but please suggest them nevertheless as I might be able to re-implement them anyway.
An example on what I do with the map:
void init(std::map<std::string, int> &labels)
{
labels.insert(std::make_pair("Bob1" , 45 ));
labels.insert(std::make_pair("Bob2" , 8758 ));
labels.insert(std::make_pair("Bob3" , 436 ));
labels.insert(std::make_pair("Alice_first" , 9224 ));
labels.insert(std::make_pair("Alice_last" , 3510 ));
}
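void checkName(std::map<std::string, int> &labels, const std::string &name);
int main()
{
std::map<std::string, int> labels;
init(labels);
for (int i=1; i<=3; i++)
{
std::stringstream key;
key << "Bob" << i;
doSomething(labels[key.str()]);
}
checkName(labels, "Alice");
}
// labels is passed in explicitly: it is local to main() and not visible here otherwise
void checkName(std::map<std::string, int> &labels, const std::string &name)
{
std::stringstream key1,key2;
key1 << name << "_first";
key2 << name << "_last";
doFirstToLast(labels[key1.str()], labels[key2.str()]);
}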
Another goal is that the code shown in the main() routine stays as easy and verbose as possible. (Needs to be understood by non-programmers.) The init() function will be code-generated by some tools. The doSomething(int) functions are fixed, but I can write wrapper functions around them. Helpers like checkName() can be more complicated, but need to be easily debuggable.
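Since wrapper functions are allowed, the silent-zero problem can at least be turned into a loud runtime error. The following is only a minimal sketch: labelValue is a made-up helper name, and it uses find() instead of operator[] so that a missing key throws rather than being inserted with value 0. It does not give a compile-time check.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: looks up a label and throws if it does not exist.
// Unlike operator[], find() never inserts a new element into the map.
int labelValue(const std::map<std::string, int>& labels, const std::string& key)
{
    auto it = labels.find(key);
    if (it == labels.end()) {
        throw std::out_of_range("unknown label: " + key);
    }
    return it->second;
}
```

With this, main() could call doSomething(labelValue(labels, key.str())) and a typo in a key name would fail on first use instead of passing 0 along.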

One way to implement your example is using an enum and token pasting, like this
enum {
Bob1 = 45,
Bob2 = 8758,
Bob3 = 436,
Alice_first = 9224,
Alice_last = 3510
};
#define LABEL( a, b ) ( a ## b )
int main()
{
doSomething( LABEL(Bob,1) );
doSomething( LABEL(Bob,2) );
doSomething( LABEL(Bob,3) );
}
void checkName()
{
doFirstToLast( LABEL(Alice,_first), LABEL(Alice,_last) );
}
Whether or not this is best depends on where the names come from.
If you need to support the for loop use-case, then consider
int bob[] = { 0, Bob1, Bob2, Bob3 }; // Values from the enum
int main()
{
for( int i = 1; i <= 3; i++ ) {
doSomething( bob[i] );
}
}

I'm not sure I understand all your requirements, but how about something like this, without using std::map.
I am assuming that you have three strings, "FIRST", "SECOND" and "THIRD" that you
want to map to 42, 17 and 37, respectively.
#include <stdio.h>
const int m_FIRST = 0;
const int m_SECOND = 1;
const int m_THIRD = 2;
const int map[] = {42, 17, 37};
#define LOOKUP(s) (map[m_ ## s])
int main ()
{
printf("%d\n", LOOKUP(FIRST));
printf("%d\n", LOOKUP(SECOND));
return 0;
}
The disadvantage is that you cannot use variable strings with LOOKUP. But now you can iterate over the values.

Maybe something like this (untested)?
struct Bob {
static constexpr int values[3] = { 45, 8758, 436 };
};
struct Alice {
struct first {
static const int value = 9224;
};
struct last {
static const int value = 3510;
};
};
template <typename NAME>
void checkName()
{
doFirstToLast(NAME::first::value, NAME::last::value);
}
...
constexpr int Bob::values[3]; // need a definition in exactly one TU
int main()
{
for (int i=1; i<=3; i++)
{
doSomething(Bob::values[i-1]); // values has indices 0..2, the names are Bob1..Bob3
}
checkName<Alice>();
}

Using enum you have both compile-time check and you can loop over it:
How can I iterate over an enum?
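One possible middle ground between the approaches above, assuming C++17 is available (the question does not say which standard is in use): keep the generated table as a constexpr array and use a single constexpr lookup function. Keys written as literals can then be checked at compile time, while keys built at runtime are still possible and throw if they don't exist. The names Label, kLabels, lookup and bob are invented for this sketch.

```cpp
#include <stdexcept>
#include <string>
#include <string_view>

struct Label {
    std::string_view name;
    int value;
};

// Hypothetical output of the code generator (values from the question's init()).
constexpr Label kLabels[] = {
    {"Bob1", 45}, {"Bob2", 8758}, {"Bob3", 436},
    {"Alice_first", 9224}, {"Alice_last", 3510},
};

// Usable at compile time and at run time. Linear scan for brevity.
constexpr int lookup(std::string_view name)
{
    for (const Label& l : kLabels) {
        if (l.name == name) return l.value;
    }
    // Reached at compile time: the constant expression is rejected.
    // Reached at run time: an exception is thrown.
    throw std::out_of_range("unknown label");
}

// Compile-time check: a misspelled literal here would not compile.
static_assert(lookup("Bob2") == 8758);

// Runtime use with a key assembled from a loop variable.
int bob(int i)
{
    return lookup("Bob" + std::to_string(i));
}
```

For about 2k names the linear scan may be too slow at runtime; a sorted table with binary search would be a natural refinement, but that is left out here.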

Related

Creating ArrayBuilders in a Loop

Is there any way to create a dynamic container of arrow::ArrayBuilder objects? Here is an example
int main(int argc, char** argv) {
std::size_t rowCount = 5;
arrow::MemoryPool* pool = arrow::default_memory_pool();
std::vector<arrow::Int64Builder> builders;
for (std::size_t i = 0; i < 2; i++) {
arrow::Int64Builder tmp(pool);
tmp.Reserve(rowCount);
builders.push_back(tmp);
}
return 0;
}
This yields error: variable ‘arrow::Int64Builder tmp’ has initializer but incomplete type
I am ideally trying to build a collection that will hold various builders and construct a table from row-wise data I am receiving. My guess is that this isn't the intended use for builders, but I couldn't find anything definitive in the Arrow documentation
What do your includes look like? That error message seems to suggest you are not including the right files. The full definition for arrow::Int64Builder is in arrow/array/builder_primitive.h but you can usually just include arrow/api.h to get everything.
The following compiles for me:
#include <iostream>
#include <arrow/api.h>
arrow::Status Main() {
std::size_t rowCount = 5;
arrow::MemoryPool* pool = arrow::default_memory_pool();
std::vector<arrow::Int64Builder> builders;
for (std::size_t i = 0; i < 2; i++) {
arrow::Int64Builder tmp(pool);
ARROW_RETURN_NOT_OK(tmp.Reserve(rowCount));
builders.push_back(std::move(tmp));
}
return arrow::Status::OK();
}
int main() {
auto status = Main();
if (!status.ok()) {
std::cerr << "Err: " << status << std::endl;
return 1;
}
return 0;
}
One small change to your example is that builders don't have a copy constructor / can't be copied. So I had to std::move it into the vector.
Also, if you want a single collection with many different types of builders then you probably want std::vector<std::unique_ptr<arrow::ArrayBuilder>> and you'll need to construct your builders on the heap.
One challenge you may run into is the fact that the builders all have different signatures for the Append method (e.g. the Int64Builder has Append(long) but the StringBuilder has Append(arrow::util::string_view)). As a result arrow::ArrayBuilder doesn't really have any Append methods (there are a few which take scalars, if you happen to already have your data as an Arrow C++ scalar). However, you can probably overcome this by casting to the appropriate type when you need to append.
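To illustrate the casting idea without depending on Arrow, here is a library-free sketch. BuilderBase, IntBuilder and StringBuilder are hypothetical stand-ins for arrow::ArrayBuilder and its typed subclasses, not Arrow's actual classes; the point is only the pattern of storing base-class pointers and downcasting where the column type is known.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Stand-in for the common builder base class (hypothetical, not Arrow).
struct BuilderBase {
    virtual ~BuilderBase() = default;
};

// Stand-ins for typed builders, each with its own Append signature.
struct IntBuilder : BuilderBase {
    std::vector<std::int64_t> values;
    void Append(std::int64_t v) { values.push_back(v); }
};

struct StringBuilder : BuilderBase {
    std::vector<std::string> values;
    void Append(const std::string& v) { values.push_back(v); }
};

// One heterogeneous collection of heap-allocated builders.
std::vector<std::unique_ptr<BuilderBase>> makeBuilders()
{
    std::vector<std::unique_ptr<BuilderBase>> builders;
    builders.push_back(std::make_unique<IntBuilder>());
    builders.push_back(std::make_unique<StringBuilder>());
    return builders;
}

// The caller knows column 0 holds integers and column 1 holds strings,
// so it casts each builder to its concrete type before appending.
void appendRow(std::vector<std::unique_ptr<BuilderBase>>& builders,
               std::int64_t id, const std::string& name)
{
    static_cast<IntBuilder&>(*builders[0]).Append(id);
    static_cast<StringBuilder&>(*builders[1]).Append(name);
}
```

The static_cast is only safe because the schema is fixed by makeBuilders(); if the column type were not known for certain, dynamic_cast with a null check would be the safer choice.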
Update:
If you really want to avoid casting and you know the schema ahead of time you could maybe do something along the lines of...
std::vector<std::function<arrow::Status(const Row&)>> append_funcs;
std::vector<std::shared_ptr<arrow::ArrayBuilder>> builders;
for (std::size_t i = 0; i < schema.fields().size(); i++) {
const auto& field = schema.fields()[i];
if (isInt32(field)) {
auto int_builder = std::make_shared<arrow::Int32Builder>();
// Capture i by value so the lambda reads the right column later
append_funcs.push_back([int_builder, i] (const Row& row) {
int val = row.GetCell<int>(i);
return int_builder->Append(val);
});
builders.push_back(std::move(int_builder));
} else {
// Other types go here
}
}
// Later
for (const auto& row : rows) {
for (const auto& append_func : append_funcs) {
ARROW_RETURN_NOT_OK(append_func(row));
}
}
Note: I made up Row because I have no idea what format your data is in originally. Also I made up isInt32 because I don't recall how to check that off the top of my head.
This uses shared_ptr instead of unique_ptr because you need two copies, one in the capture of the lambda and the other in the builders array.

Efficient way to compare more than 50 strings

I have a method that takes two parameters, a string and an int.
The string has to be compared against more than 50 strings, and once a match is found, the int value is used to pick one of the hard-coded strings, as in the example below:
std::string Compare_Method(std::string str, int val) {
if(str == "FIRST")
{
std::array<std::string, 3> real_value = {"Hello1","hai1","bye1"};
return real_value[val];
}
else if(str == "SECOND")
{
std::array<std::string, 4> real_value = {"Hello2","hai2","bye2"};
return real_value[val];
}
else if(str == "THIRD")
{
std::array<std::string, 5> real_value = {"Hello3","hai3","bye3"};
return real_value[val];
}
//----- 50+ else if
return ""; // no match
}
My approach is as above. What would be an efficient way to
1. compare against more than 50 strings?
2. create the std::array for each if case?
EDITED: the std::array size is not fixed; it can be 3, 4 or 5, as edited above.
This would be my way of doing that. The data structure is created only once and the access times should be fast enough
#include <iostream>
#include <string>
#include <array>
#include <unordered_map>
std::string Compare_Method(const std::string& str, int val)
{
// or std::vector<std::string>
static std::unordered_map<std::string, std::array<std::string, 3>> map
{
{ "FIRST", { "Hello1", "hail1", "bye1" }},
{ "SECOND", { "Hello2", "hail2", "bye2" }},
{ "THIRD", { "Hello3", "hail3", "bye3" }},
// 50+ more
};
// maybe check if str is present in the map
return map[str][val];
}
int main()
{
std::cout << Compare_Method("SECOND", 1) << std::endl;
}
If std::unordered_map isn't (fast) enough for you, you can come up with some sort of static optimal hash structure, since keys are known at compile time.
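The "maybe check if str is present" comment matters, because operator[] would insert an empty entry for an unknown key. A minimal sketch of a checked variant follows; the function name compareMethodChecked and the choice to return an empty string on failure are assumptions, and std::vector is used so rows may have different lengths.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the val-th string for the given key, or "" if the key is unknown
// or val is out of range. The table is built once, on the first call.
std::string compareMethodChecked(const std::string& str, std::size_t val)
{
    static const std::unordered_map<std::string, std::vector<std::string>> table{
        {"FIRST",  {"Hello1", "hai1", "bye1"}},
        {"SECOND", {"Hello2", "hai2", "bye2", "aloha2"}},
        {"THIRD",  {"Hello3", "hai3", "bye3", "aloha3", "tata3"}},
        // 50+ more
    };
    const auto it = table.find(str);  // find() never inserts
    if (it == table.end() || val >= it->second.size()) {
        return "";
    }
    return it->second[val];
}
```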
If those 50 strings are something you will be using widely throughout your program, string comparisons will take a toll on performance. I'd suggest adapting them to an enum.
enum Strings
{
FIRST,
SECOND,
THIRD,
…
…
};
You'll obviously need a method to convert string to int whenever you get one from the source (user input, file read, etc.). This should be as infrequent as possible since your system now works on enum values (which can be used as indices on STL containers as we see in the next step)
int GetEnumIndex(const std::string& str)
{
// here you can map all variants of the same string to the same number
if ("FIRST" == str || "first" == str) return FIRST; // must match the enum value, which is used as an index below
…
}
Then, the comparison method can be based on the enum instead of the string:
std::string Compare_Method(const int& strIndex, int val)
{
static std::vector<std::vector<std::string>> stringArray
{
{ "Hello1", "hail1", "bye1" },
{ "Hello2", "hail2", "bye2", "aloha2" },
{ "Hello3", "hail3", "bye3", "aloha3", "tata3" },
…
};
return stringArray[strIndex][val];
}
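Putting the two steps of this answer together, a self-contained sketch could look like the following. It assumes C++17 for std::optional; parseString and compareByEnum are made-up names, and the table contents are placeholders in the style of the question.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

enum Strings { FIRST, SECOND, THIRD };

// Converts an input string to the enum once, at the boundary of the program.
// Returns std::nullopt for unknown strings.
std::optional<Strings> parseString(const std::string& s)
{
    static const std::unordered_map<std::string, Strings> names{
        {"FIRST", FIRST},   {"first", FIRST},
        {"SECOND", SECOND}, {"second", SECOND},
        {"THIRD", THIRD},   {"third", THIRD},
    };
    const auto it = names.find(s);
    if (it == names.end()) return std::nullopt;
    return it->second;
}

// After parsing, lookups are plain index operations with no string comparison.
std::string compareByEnum(Strings which, std::size_t val)
{
    static const std::vector<std::vector<std::string>> table{
        {"Hello1", "hai1", "bye1"},
        {"Hello2", "hai2", "bye2", "aloha2"},
        {"Hello3", "hai3", "bye3", "aloha3", "tata3"},
    };
    const auto& row = table[which];
    return val < row.size() ? row[val] : std::string{};
}
```

The rows of the table must be kept in the same order as the enumerators, since the enum value is used directly as the index.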
With the information you provided, I tried several variations to find the best way to achieve the objective. I am listing the best one here. You can see other methods here.
You can compile it and run run.sh to compare performance of all cases.
std::string Method6(const std::string &str, int val) {
static std::array<std::string, 5> NUMBERS{"FIRST", "SECOND", "THIRD",
"FOURTH", "FIFTH"};
static std::array<std::vector<std::string>, 5> VALUES{
std::vector<std::string>{"FIRST", "hai1", "bye1"},
std::vector<std::string>{"Hello1", "SECOND", "bye1"},
std::vector<std::string>{"Hello1", "hai1", "THIRD"},
std::vector<std::string>{"FOURTH", "hai1", "bye1"},
std::vector<std::string>{"Hello1", "FIFTH", "bye1"}};
for (int i = 0; i < NUMBERS.size(); ++i) {
if (NUMBERS[i] == str) {
return VALUES[i][val];
}
}
return "";
}
For simplicity I have used NUMBERS with a length of 5, but you can use whatever length you want.
VALUES is a std::array of std::vector, so each std::vector can hold any number of elements.
output from github code.
Method1 880
Method2 851
Method3 7292
Method4 989
Method5 598
Method6 440
Your output may differ depending on your system and the system load at the time of execution.

creating a vector of objects

I coded this on the train just now; it creates a vector of objects.
I would appreciate any suggestions to make it more elegant or efficient.
#include <iostream>
#include <vector>
using namespace std;
typedef struct{
short xpos,ypos;
short width, height;
short area;
}LABELPROP;
int get_property(int n,LABELPROP *pP)
{
if(n%2)
{
pP->xpos=n*2;
pP->ypos=n*3;
return 1;
}
else return 0;
}
int main()
{
vector<LABELPROP> myvector;
cout<<"Initial Number :"<<myvector.size()<<endl;
LABELPROP temporal;
// LABELPROP *pT=&temporal;
for(int n=1;n<=10;n++) //10 objects
{
//if(get_property(n,pT))
if(get_property(n,&temporal))
myvector.push_back(temporal);
}
for(int i=0;i<myvector.size();i++)
cout<<"("<<myvector[i].xpos<<","<<myvector[i].ypos<<")"<<endl;
return 0;
}
As you can see, I eliminated an unnecessary pointer that I originally had.
The temporal struct gets its values from the get_property function, which is why I pass it as a pointer.
Thanks in advance.
You are writing C++ in a C style. You don't need to typedef your struct, just use:
struct LabelProp
{
LabelProp(short xpos, short ypos) : xpos(xpos), ypos(ypos) {}
short xpos,ypos;
short width, height;
short area;
};
I also changed the naming, because most conventions use all uppercase names as constants or macros. I also added a constructor to be used.
You have get_property returning an int, but since this is C++, return a bool.
Probably an even better idea would be to replace get_property with addIfOdd and have it look something like:
void addIfOdd(int n, std::vector<LabelProp>& results)
{
if(n%2)
{
results.emplace_back(n * 2, n * 3);
}
}
int main()
{
std::vector<LabelProp> myvector;
for(int n=1;n<=10;n++) //10 objects
{
addIfOdd(n, myvector);
}
}
It looks like you are unnecessarily iterating over a full range of integers when it is in fact the odd numbers that you're after. Why not limit the range to odd numbers? (only half as many iterations), e.g.:
for(int i = 1; i < 10; i += 2) { /* 1, 3, ..., 9 */ }
Using modern C++ you could add a constructor to your class:
explicit label_prop(int n)
: xpos{static_cast<short>(n * 2)}, ypos{static_cast<short>(n * 3)} {}
and replace all your code with a std::generate_n, e.g.:
std::generate_n(std::back_inserter(myvector), 5, [n = 1] () mutable {
return label_prop{std::exchange(n, n + 2)};
});
Here, you're specifying that you want 5 consecutive odd numbers, starting at 1.
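For reference, the fragments above can be assembled into something that compiles on its own. This is a sketch assuming C++14 or later (for the init-capture and std::exchange); the wrapper function first_odd_labels is a made-up name, and the struct only carries the two fields the example uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

struct label_prop {
    short xpos, ypos;
    explicit label_prop(int n)
        : xpos{static_cast<short>(n * 2)}, ypos{static_cast<short>(n * 3)} {}
};

// Generates label_prop objects for the first `count` odd numbers: 1, 3, 5, ...
std::vector<label_prop> first_odd_labels(std::size_t count)
{
    std::vector<label_prop> out;
    std::generate_n(std::back_inserter(out), count, [n = 1]() mutable {
        // std::exchange returns the old n and stores n + 2 for the next call.
        return label_prop{std::exchange(n, n + 2)};
    });
    return out;
}
```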
If your compiler is modern enough and supports C++11 features, you could use the new range-based for loop, like this:
for(auto& x : myvector)
cout<<"("<<x.xpos<<","<<x.ypos<<")"<<endl;
Also, to improve efficiency, you can call the reserve method if you know the size of the vector in advance. This avoids unnecessary reallocations and the element copies or moves that come with them, which can be expensive.
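A small sketch of the reserve idea, under the assumption that the number of elements can be computed up front (here: the count of odd values in [1, maxN]). makeOddLabels is a hypothetical name, and the struct is cut down to the two fields that are actually filled.

```cpp
#include <cstddef>
#include <vector>

struct LabelProp {
    short xpos, ypos;
    LabelProp(short x, short y) : xpos(x), ypos(y) {}
};

// Builds one LabelProp per odd n in [1, maxN], reserving storage up front
// so that the loop never triggers a reallocation.
std::vector<LabelProp> makeOddLabels(int maxN)
{
    std::vector<LabelProp> out;
    if (maxN < 1) return out;
    out.reserve(static_cast<std::size_t>((maxN + 1) / 2));  // odd values in [1, maxN]
    for (int n = 1; n <= maxN; n += 2) {
        // emplace_back constructs the element in place from the arguments.
        out.emplace_back(static_cast<short>(n * 2), static_cast<short>(n * 3));
    }
    return out;
}
```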

To find duplicate entry in c++ using 2D Vector (std::vector)

I wrote a program to find duplicate entries in a table. I am a beginner in C++, so I don't know how efficient this program is. Is there a better way to write it? I have 3 tables (2D vectors): 1) aRecord_arr, 2) mainTable and 3) idxTable. idxTable identifies the key columns used to check for duplicate entries. The records in aRecord_arr are to be added to mainTable. If a record already exists in mainTable, the program shows the error "Duplicate Entry". Please check this program and give your suggestions.
typedef vector<string> rec_t;
typedef vector<rec_t> tab_t;
typedef vector<int> cn_t;
int main()
{
tab_t aRecord_arr= { {"a","apple","fruit"},
{"b","banana","fruit"} };
tab_t mainTable = { {"o","orange","fruit"},
{"p","pineapple","fruit"},
{"b","banana","fruit"},
{"m","melon","fruit"},
{"a","apple","fruit"},
{"g","guava","fruit"} };
tab_t idxTable = { {"code","k"},
{"name","k"},
{"category","n"}};
size_t Num_aRecords = aRecord_arr.size();
int idxSize = idxTable.size();
int mainSize = mainTable.size();
rec_t r1;
rec_t r2;
tab_t t1,t2;
cn_t idx;
for(int i=0;i<idxSize;i++)
{
if(idxTable[i][1]=="k")
{
idx.push_back(i);
}
}
for(size_t j=0;j<Num_aRecords;j++)
{
for(unsigned int id=0;id<idx.size();id++)
{
r1.push_back(aRecord_arr[j][idx[id]]);
}
t1.push_back(std::move(r1));
}
for(int j=0;j<mainSize;j++)
{
for(unsigned int id=0;id<idx.size();id++)
{
r2.push_back(mainTable[j][idx[id]]);
}
t2.push_back(std::move(r2));
}
for(size_t i=0;i<t1.size();i++)
{
for(size_t j=0;j<t2.size();j++)
{
if(t1[i]==t2[j])
{
cout<<"Duplicate Entry"<<endl;
exit(0);
}
}
}
}
If you want to avoid duplicate entries in an array, you should consider using a std::set instead.
What you want is probably a std::map or a std::set
Don't reinvent the wheel, the STL is full of goodies.
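As a rough sketch of the std::set suggestion applied to the question's tables: the key columns of each record are collected into a set, and insert() reports through the second member of its result whether the key was already present. Record, keyOf and hasDuplicate are names invented for this example.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Record = std::vector<std::string>;

// Extracts the key columns (given by index) from a record.
Record keyOf(const Record& rec, const std::vector<std::size_t>& keyCols)
{
    Record key;
    for (std::size_t c : keyCols) key.push_back(rec[c]);
    return key;
}

// Returns true if any incoming record has the same key as a record in
// the table, or as an earlier incoming record.
bool hasDuplicate(const std::vector<Record>& table,
                  const std::vector<Record>& incoming,
                  const std::vector<std::size_t>& keyCols)
{
    std::set<Record> seen;
    for (const Record& r : table) seen.insert(keyOf(r, keyCols));
    for (const Record& r : incoming) {
        // insert().second is false when the key was already in the set.
        if (!seen.insert(keyOf(r, keyCols)).second) return true;
    }
    return false;
}
```

Each check is then a logarithmic set lookup instead of a comparison against every row of the main table.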
You seem to be rooted in a weakly typed language - but C++ is strongly typed.
You will 'pay' the disadvantage of strong typing almost no matter what you do, but you almost painstakingly avoid the advantage.
Let me start with the field that always says 'fruit' - my suggestion is to make this an enum, like:
enum PlantType { fruit, veggie };
Second, you have a vector that always contains 3 strings, each position with the same meaning. This seems to be a job for a struct, like:
struct Post {
PlantType kind;
char firstchar;
string name;
// possibly other characteristics
};
The 'firstchar' is probably premature optimization, but let's keep that for now.
Now you want to add a new Post, to an existing vector of Posts, like:
vector<Post> mainDB;
bool AddOne( const Post& p )
{
for( auto& pp : mainDB )
if( pp.name == p.name )
return false;
mainDB.push_back(p);
return true;
}
Now you can use it like:
if( ! AddOne( Post{ fruit, 'b', "banana" } ) )
cerr << "duplicate entry";
If you need speed (at the cost of memory), switch your mainDB to a map, like:
map<string,Post> mainDB;
bool AddOne( const Post& p )
{
if( mainDB.find(p.name) != mainDB.end() )
return false;
mainDB[p.name]=p;
return true;
}
this also makes it easier (and faster) to find and use a specific post, like
cout << "the fruit is called " << mainDB["banana"].name ;
Beware that operator[] does not fail if the post doesn't exist: it silently inserts a default-constructed Post and prints an empty name. Use find() or at() if you need to detect a missing key.
As you can see, firstchar was never used and could be omitted. std::map keeps its keys in a sorted tree, so a lookup takes O(log n) string comparisons instead of the linear scan above; std::unordered_map is the hash-based alternative.
All of the above assumed inclusion of the correct headers, and
using namespace std;
If you don't like using namespace, prepend std:: in all the right places.
hope it helps :)
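A compact variant of the map-based version, as a sketch only: emplace() does the existence check and the insertion in one lookup, and a find()-based accessor avoids touching the map when a name is missing. The Post struct is reduced to two fields here, and addOne and findPost are illustrative names.

```cpp
#include <map>
#include <string>

enum PlantType { fruit, veggie };

struct Post {
    PlantType kind;
    std::string name;
};

std::map<std::string, Post> mainDB;

// Returns false (and leaves the map unchanged) if the name already exists.
bool addOne(const Post& p)
{
    return mainDB.emplace(p.name, p).second;
}

// Returns a pointer to the stored post, or nullptr if there is none.
// Unlike operator[], find() never inserts an element.
const Post* findPost(const std::string& name)
{
    const auto it = mainDB.find(name);
    return it == mainDB.end() ? nullptr : &it->second;
}
```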

Assigning Unique ID's to each instance of a class

I am importing items from an XML file. Each XML element (FoodItem, Person, Order, CoffeeRun) is a class & each of these elements will have a unique ID(unique to that class).
<person>
<id>0</id>
<name>...</name>
</person>
<FoodItem>
<id>0</id>
<name>Coffee</name>
</FoodItem>
I am trying to develop a sub class DatabaseItem, that ensures that no 2 objects of a class have the same ID. Can you assist me, by helping me develop an efficient algorithm that makes sure no object will have the same ID as another?
My 2 approaches seem a little inefficient to me:
1. Use a static class vector that contains all the USED ids so far. When a new DatabaseID( int requestedID ) object is created, I check whether the ID is available by going over all the used values in the vector. I think that's O(n)?
2. Use a static class bool vector where each element of the vector corresponds to an id (so vector[1] will correspond to the object with ID 1). I check if an ID is already taken by seeing if that element in the vector is true: if ( v[nID] == true ) { // this ID is already taken }. This seems inefficient because it means my vector will take a lot of memory, right?
I am not familiar with using maps in C++ but maybe I should use one here?
Any advice on an efficient algorithm would be really helpful:
class DatabaseItem
{
public:
static unsigned int instanceCount;
DatabaseItem()
{
// Assign next available ID
}
DatabaseItem( unsigned int nID )
{
// Check that that id is not already taken
// if id is taken, look for next available id &
// give the item that id
}
private:
unsigned int uniqueID;
};
// My solution: Do you have any better ideas that ensure no objects have the same ID?
// This seems REALLY inefficient...
class DatabaseItem
{
public:
static unsigned int instanceCount;
static vector <unsigned int> usedIDs;
DatabaseItem()
{
DatabaseItem::instanceCount++;
uniqueID = instanceCount;
usedIDs.push_back( instanceCount ); // std::vector has no add(); push_back appends
}
DatabaseItem( unsigned int nID )
{
if ( isIDFree( nID ) )
{
uniqueID = nID;
}
else uniqueID = nextAvailableID();
usedIDs.push_back( uniqueID ); // record the ID so later checks can see it
DatabaseItem::instanceCount++;
}
bool isIDFree( unsigned int nID )
{
// This is pretty slow to check EVERY element
for (int i=0; i<usedIDs.size(); i++)
{
if (usedIDs[i] == nID)
{
return false;
}
}
return true;
}
unsigned int nextAvailableID()
{
unsigned int ID = 0; // declared outside the loop so the increment is not lost
while ( true )
{
if ( isIDFree( ID ) )
{
return ID;
}
else ID++;
}
}
private:
unsigned int uniqueID;
};
// Alternate that uses boolean vector to track which ids are occupied
// This means I take 30000 boolean memory when I may not need all that
class DatabaseItem
{
public:
static unsigned int instanceCount;
static const unsigned int MAX_INSTANCES = 30000;
static vector <bool> idVector;
// Is this how I initialise a static class vector...? (note this code will be outside the class definition)
// vector <bool> DatabaseItem::idVector( MAX_INSTANCES, false );
DatabaseItem()
{
uniqueID = nextAvailableID();
idVector[uniqueID] = true;
}
DatabaseItem( unsigned int nID )
{
if ( nID >= MAX_INSTANCES )
{
// not sure how I shd handle this case?
}
if ( idVector[nID] == false )
{
uniqueID = nID;
idVector[nID] = true;
}
else
{
uniqueID = nextAvailableID();
idVector[uniqueID] = true;
}
instanceCount++;
}
unsigned int nextAvailableID()
{
for (int i=0; i<idVector.size(); i++)
{
if ( !idVector[i] )
{
return i;
}
}
return -1;
}
bool isIDFree( unsigned int nID )
{
// Note I cannot do this: Because I am using Mosync API & it doesn't support any C++ exceptions'
// I declare idVector with no size! so not idVector( 30000, false)... just idVector;
// then I allow an exception to occur to check if an id is taken
try
{
return !idVector.at(nID); // at() throws on out-of-range, operator[] does not; free means not taken
}
catch (...)
{
return true;
}
}
private:
unsigned int uniqueID;
};
A vector<bool> is typically implemented with one bit per element, so it's not wasting as much space as you assume.
A set<unsigned int> is the easy solution to this. A vector<bool> is faster. Both could use a bit of memory. Depending on your usage patterns, there's a few other solutions:
An unsigned int all_taken_upto_this; combined with a set<int> covering all the oddball ID's that are higher than all_taken_upto_this - remove from set and increase the counter when you can.
A map<unsigned int, unsigned int> which is logically treated as begin,end of either taken or free sequences. This'll take a little fiddling to implement correctly (merging consecutive map elements when you add the last ID in between two elements e.g.)
You could probably use a premade "sparse bitset" type of data structure, but I don't know of any implementations offhand.
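The first of these alternatives (a counter plus a set of "oddball" IDs) could be sketched roughly as follows. IdRegistry, claim and claimNext are names made up for this example; the invariant is that every ID below nextFree_ is taken, and the set only holds taken IDs above it.

```cpp
#include <set>

class IdRegistry {
public:
    // Tries to reserve a specific ID. Returns false if it is already taken.
    bool claim(unsigned id)
    {
        if (id < nextFree_ || !sparse_.insert(id).second) return false;
        compact();
        return true;
    }

    // Reserves and returns the lowest free ID.
    unsigned claimNext()
    {
        const unsigned id = nextFree_++;
        compact();
        return id;
    }

private:
    // Absorbs IDs from the set into the counter while they are contiguous
    // with it, which keeps the set small.
    void compact()
    {
        auto it = sparse_.begin();
        while (it != sparse_.end() && *it == nextFree_) {
            it = sparse_.erase(it);
            ++nextFree_;
        }
    }

    unsigned nextFree_ = 0;       // all IDs below this value are taken
    std::set<unsigned> sparse_;   // taken IDs above nextFree_
};
```

If the XML mostly contains consecutive IDs, the set stays nearly empty and both operations are cheap; releasing IDs again is not covered by this sketch.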
Depending on the number of elements and a couple other issues, you might consider actually storing them (or at least pointers to them) in a map. That would be rather simple to implement, but will take some space. On the other hand, it will provide you with fast lookup by id which might be a clear advantage if there are cross references in the XML. The map (assuming pointers) would look like:
std::map<int, std::shared_ptr<Object> > id_map;
std::shared_ptr<Object> p( new Object( xml ) );
if ( !id_map.insert( std::make_pair( p->id, p ) ).second ) {
// failed to insert, the element is a duplicate!!!
}
If you are not locked into using an integer you may look into GUIDs (Global Unique IDs). Depending on which platform you are using you can generally find a couple of utility functions to dynamically generate a GUID. If using Visual Studio, I've used the CoCreateGuid function.
If you are locked into a 32-bit integer, another option is a hash table. A hash function applied to each (unique) XML element could generate the ID, but depending on the size of your data set there is still a small probability of collision, so the result is not guaranteed to be unique. The one I've used that seems to have a pretty low collision rate with the data sets I've worked with is the Jenkins hash function.