boost::dynamic_bitset concat performance - c++

I want to concatenate a big bitset with a smaller one in a way that won't kill performance. Currently my application spends 20% of its CPU time in just the following code:
boost::dynamic_bitset<> encode(const std::vector<char>& data)
{
    boost::dynamic_bitset<> result;
    std::for_each(data.begin(), data.end(), [&](unsigned char symbol)
    {
        for(size_t n = 0; n < codes_[symbol].size(); ++n)
            result.push_back(codes_[symbol][n]); // codes_[symbol].size() averages ~5 bits
    });
    return result;
}
I have read this post, which proposes a solution; unfortunately it will not work for me, as the difference between the sizes of the destination bitset and the source bitset is very large.
Any ideas?
If this is not possible to do efficiently with boost::dynamic_bitset then I'm open to other suggestions.

This is because you keep using push_back(), when in actual fact you already know the size in advance. This means lots of redundant copying and reallocating. You should resize the bitset first. In addition, you don't have to push_back() every value; it should be possible for you to use some form of insert() (I don't actually know its exact interface, but I think append() is the name) to insert a whole code at once, which should be significantly better.
In addition, you're leaving the dynamic_bitset's block type as unsigned long, but as far as I can see, you're only actually inserting unsigned char into it. Changing that could make life easier for you.
I'm also curious as to what type codes_ is. If it's a map, you could replace it with a vector, or in fact, since it's statically sized maximally (256 entries is the maximum for an unsigned char), a static array.
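To illustrate the sizing idea concretely, here is a minimal sketch. It assumes codes_ is a member table of boost::dynamic_bitset<> indexed by symbol; since I can't vouch for append()'s exact interface, it sizes the result up front and assigns bits by index, which already removes all reallocation:
#include <boost/dynamic_bitset.hpp>
#include <vector>

boost::dynamic_bitset<> encode(const std::vector<char>& data)
{
    // First pass: compute the exact output size so we allocate once.
    std::size_t total_bits = 0;
    for (std::size_t i = 0; i < data.size(); ++i)
        total_bits += codes_[static_cast<unsigned char>(data[i])].size();

    // Second pass: write each code's bits by index instead of push_back().
    boost::dynamic_bitset<> result(total_bits);
    std::size_t pos = 0;
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        const boost::dynamic_bitset<>& code = codes_[static_cast<unsigned char>(data[i])];
        for (std::size_t n = 0; n < code.size(); ++n)
            result[pos++] = code[n];
    }
    return result;
}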

I've tried using boost bitset in performance code before and been disappointed. I dug into it a bit, and concluded I'd be better off implementing my own bit-buffer class, although I forget the details of what convinced me boost's class was never going to be fast (I did get as far as inspecting the assembly produced).
I still don't know what the fastest way of building bit-buffers/bitsets/bitstreams or whatever you want to call them is. A colleague is trying to find out with this related question, but at time of writing it's still awaiting a good answer.

I wrote my own bitset class. I'd appreciate any suggestions for improvements. I will also try to look into SSE and see if there is anything useful there.
With my very rough benchmark I got an 11x performance increase while appending 6 bits at a time.
#include <cassert>
#include <vector>

class fast_bitset
{
public:
    typedef unsigned long block_type;
    static const size_t bits_per_block = sizeof(block_type) * 8;

    fast_bitset()
        : is_open_(true)
        , blocks_(1)
        , space_(blocks_.size() * bits_per_block) {}

    void append(const fast_bitset& other)
    {
        assert(!other.is_open_);
        for (size_t n = 0; n < other.blocks_.size() - 1; ++n)
            append(other.blocks_[n], bits_per_block);
        // Skip the tail if the last block is completely empty
        // (shifting by a full block width is undefined behaviour).
        if (other.space_ < bits_per_block)
            append(other.blocks_.back() >> other.space_, bits_per_block - other.space_);
    }

    void append(block_type value, size_t n_bits)
    {
        assert(is_open_);
        // append(other) appends whole blocks, so n_bits may equal bits_per_block.
        assert(n_bits <= bits_per_block);
        if (space_ < n_bits)
        {
            // Split the value: the top space_ bits finish the current block,
            // the remainder starts a new one. The guard avoids an undefined
            // shift by a full block width when space_ == 0.
            if (space_ > 0)
            {
                blocks_.back() = blocks_.back() << space_;
                blocks_.back() = blocks_.back() | (value >> (n_bits - space_));
            }
            blocks_.push_back(value);
            space_ = bits_per_block - (n_bits - space_);
        }
        else
        {
            if (n_bits == bits_per_block)
                blocks_.back() = value; // space_ == bits_per_block here
            else
                blocks_.back() = (blocks_.back() << n_bits) | value;
            space_ -= n_bits;
        }
    }

    void push_back(bool bit)
    {
        append(bit, 1);
    }

    bool operator[](size_t index) const
    {
        assert(!is_open_);
        // Must be block_type: shifting a plain int by bits_per_block-1
        // is undefined behaviour for 64-bit blocks.
        static const block_type high_bit = block_type(1) << (bits_per_block - 1);
        const size_t block_index = index / bits_per_block;
        const size_t bit_index = index % bits_per_block;
        const block_type bit_mask = high_bit >> bit_index;
        return (blocks_[block_index] & bit_mask) != 0;
    }

    void close()
    {
        // Left-align the partially filled last block.
        if (space_ < bits_per_block)
            blocks_.back() = blocks_.back() << space_;
        is_open_ = false;
    }

    size_t size() const
    {
        return blocks_.size() * bits_per_block - space_;
    }

    const std::vector<block_type>& blocks() const { return blocks_; }

    class reader
    {
    public:
        reader(const fast_bitset& bitset)
            : bitset_(bitset)
            , index_(0)
            , size_(bitset.size()) {}

        bool next_bit() { return bitset_[index_++]; }
        bool eof() const { return index_ >= size_; }

    private:
        const fast_bitset& bitset_;
        size_t index_;
        size_t size_;
    };

private:
    bool is_open_;
    std::vector<block_type> blocks_;
    size_t space_;
};
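A rough usage sketch of the class above (my own, not part of the original benchmark), appending 6-bit symbols the way the benchmark describes and reading them back:
#include <cassert>

int main()
{
    fast_bitset bits;

    // Append a stream of 6-bit symbols, then seal the buffer.
    for (unsigned long symbol = 0; symbol < 1000; ++symbol)
        bits.append(symbol & 0x3F, 6); // keep only the low 6 bits
    bits.close();

    assert(bits.size() == 6000);

    // Read the bits back one at a time.
    size_t ones = 0;
    fast_bitset::reader r(bits);
    while (!r.eof())
        if (r.next_bit())
            ++ones;
    return 0;
}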

Related

How to construct a hash function for a user-defined type?

For example, in the following struct:
1) editLine is a pointer to a data line which ends with a CRLF,
2) nDisplayLine is the display line index of this editLine,
3) start is the offset in the display line,
4) len is the length of the text.
struct CacheKey {
    const CEditLine* editLine;
    int32 nDisplayLine;
    int32 start;
    int32 len;

    friend bool operator==(const CacheKey& item1, const CacheKey& item2) {
        return (item1.start == item2.start && item1.len == item2.len &&
                item1.nDisplayLine == item2.nDisplayLine &&
                item1.editLine == item2.editLine);
    }

    CacheKey() {
        editLine = NULL;
        nDisplayLine = 0;
        start = 0;
        len = 0;
    }

    CacheKey(const CEditLine* editLine, int32 dispLine, int32 start, int32 len) :
        editLine(editLine), nDisplayLine(dispLine), start(start), len(len)
    {
    }

    int hash() {
        return (int)((unsigned char*)editLine - 0x10000) + nDisplayLine * nDisplayLine +
               start * 2 - len * 1000;
    }
};
Now I need to put it into a std::unordered_map<int, CacheItem> cacheMap_.
The problem is how to design the hash function for this structure. Are there any guidelines?
How can I make sure the hash function is collision-free?
To create a hash function, you can use std::hash, which is defined for integers. Then, you can combine them "as the Boost guys do" (because writing a good hash is non-trivial), as explained here: http://en.cppreference.com/w/cpp/utility/hash.
Here is a hash_combine method:
inline void hash_combine(std::size_t& seed, std::size_t v)
{
    seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
So the "guideline" is more or less what's is shown on cppreference.
You CAN'T be sure your hash function is colision free. Colision free means that you do not loose data (or you restrict yourself to a small set of possibilities for your class). If any int32 value is allowed for each fields, a collision free hash is a monstrously big index, and it won't fit in a small table. Let unordered_map take care of collisions, and combine std::hash hash as explained above.
In you case, it will look something like
std::size_t hash() const
{
    std::size_t h1 = std::hash<const CEditLine*>()(editLine);
    // Your int32 type is probably a typedef of a hashable type. Otherwise,
    // you'll have to static_cast<> it to a type supported by std::hash.
    std::size_t h2 = std::hash<int32>()(nDisplayLine);
    std::size_t h3 = std::hash<int32>()(start);
    std::size_t h4 = std::hash<int32>()(len);

    std::size_t hash = 0;
    hash_combine(hash, h1);
    hash_combine(hash, h2);
    hash_combine(hash, h3);
    hash_combine(hash, h4);
    return hash;
}
Then, you can specialize std::hash for your class.
namespace std
{
    template<>
    struct hash<CacheKey>
    {
    public:
        std::size_t operator()(CacheKey const& s) const
        {
            return s.hash();
        }
    };
}
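With the specialization in place, the map can be keyed on CacheKey directly instead of on a precomputed int. A small usage sketch (CacheItem is the mapped type from the question):
#include <unordered_map>

std::unordered_map<CacheKey, CacheItem> cacheMap_;

void store(const CEditLine* line, const CacheItem& item)
{
    CacheKey key(line, 0, 0, 16); // dispLine, start, len
    cacheMap_[key] = item;        // uses std::hash<CacheKey>
    if (cacheMap_.count(key)) {
        // hit: found via CacheKey::operator==
    }
}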

Pointer type casting altering unintended memory

#define ARRAY_SIZE 20

float DataSource[ARRAY_SIZE];

void Read(unsigned char const *Source, unsigned char *Destination, unsigned long DataSize)
{
    for ( unsigned long i = 0; i < DataSize; i++)
    {
        *(Destination + i*DataSize) = *(Source + i*DataSize);
    }
}

void fun()
{
    int Index;
    float Dest;

    for ( Index = 0; Index < ARRAY_SIZE; Index++ )
    {
        Read((unsigned char *)&DataSource[Index], (unsigned char *)&Dest, sizeof(DataSource[Index]));
    }
}
I'm having an issue with the above code: upon calling Read(), my Index variable gets overwritten. I am certain the ugly pointer casting is the culprit, but I'm having trouble understanding exactly what is happening here.
The unsigned char pointer types are mandatory because the above code is intended to simulate some driver level software and maintain the same prototype.
Can someone help me to understand the issue here? All the above code is changeable except for the prototype of Read().
The error is here:
for ( unsigned long i = 0; i < DataSize; i++)
{
    //                vvvvvvvvvv              vvvvvvvvvv
    *(Destination + i*DataSize) = *(Source + i*DataSize);
}
i * DataSize grows DataSize times faster than i, so for every i > 0 you index far past the end of the buffer: an "out of bounds" access.
Replace with:
for ( unsigned long i = 0; i < DataSize; i++)
{
    *(Destination + i) = *(Source + i);
}
You pass in a single float's address to Read (&Dest) and then proceed to write many values to consecutive memory locations. Since you're writing to random memory at that point, it's not unlikely that it overwrote Index (and other stuff), because stacks usually grow downwards.
This is wrong:
*(Destination + i*DataSize) = *(Source + i*DataSize);
You want to copy DataSize adjacent bytes, not bytes DataSize apart (total span DataSize*DataSize)
Just say
Destination[i] = Source[i];
An amusing (to me) C++ way.
#include <cassert>
#include <cstddef>

template<typename Data>
struct MemBlockRefHelper {
    typedef Data value_type;
    Data* data;
    size_t size;

    MemBlockRefHelper( Data* d, size_t s ) : data(d), size(s) {}

    template<typename Target, typename Other=typename Target::value_type>
    Target& Assign( MemBlockRefHelper<Other> const& other ) {
        assert(size == other.size);
        for (size_t i = 0; i < size; ++i) {
            if (i < other.size) {
                data[i] = other.data[i];
            } else {
                data[i] = 0;
            }
        }
        Target* self = static_cast<Target*>(this);
        return *self;
    }
};

struct MemBlockRef;

struct MemBlockCRef : MemBlockRefHelper<const unsigned char> {
    MemBlockCRef( const unsigned char* d, size_t s ) : MemBlockRefHelper<const unsigned char>( d, s ) {}
    MemBlockCRef( const MemBlockRef& other );
};

struct MemBlockRef : MemBlockRefHelper<unsigned char> {
    MemBlockRef( unsigned char* d, size_t s ) : MemBlockRefHelper<unsigned char>( d, s ) {}
    MemBlockRef& operator=( MemBlockRef const& other ) {
        return Assign< MemBlockRef >( other );
    }
    MemBlockRef& operator=( MemBlockCRef const& other ) {
        return Assign< MemBlockRef, const unsigned char >( other );
    }
};

inline MemBlockCRef::MemBlockCRef( const MemBlockRef& other ) : MemBlockRefHelper<const unsigned char>( other.data, other.size ) {}

void Read( unsigned char const* Source, unsigned char* Dest, unsigned long DataSize ) {
    MemBlockCRef src( Source, DataSize );
    MemBlockRef dest( Dest, DataSize );
    dest = src;
}
Massively over-engineered, but the idea is to wrap up the notion of a block of POD memory of a certain size and provide reference semantics to its contents (initialization creates a new reference to the same data; assignment copies over the referred-to data).
Once you have such classes, the code for Read becomes a 3 liner. Well, you can do it in one:
MemBlockRef( Dest, DataSize ) = MemBlockCRef( Source, DataSize );
but that is needless.
Well, so is this entire framework.
But I was amused by writing it.
Let's take a closer look at your Read(): i changes from 0 to DataSize-1, and each time you access memory at an offset of i*DataSize, that is, at offsets from 0 to DataSize*(DataSize-1). That looks wrong, since DataSize^2 - DataSize makes no sense dimensionally.
Unlike other answers, I don't want to guess what you wanted; I'm just showing a kind of "dimensional analysis" that can help spot the most broken part of the code without reading the author's mind.
You are treating the scalar variable Dest declared inside fun() as an array inside Read(). It seems that both Dest and your Index variable are adjacent on the stack, which explains why Index gets overwritten exactly when the loop inside Read() executes for i==1.
So the solution is: declare Dest as an array, too:
float Dest[ARRAY_SIZE];
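Combining that with the loop fix from the earlier answer (copying byte i, not byte i*DataSize), a corrected fun() would look something like this:
void fun()
{
    int Index;
    float Dest[ARRAY_SIZE]; // one destination element per source element

    for ( Index = 0; Index < ARRAY_SIZE; Index++ )
    {
        // Copies sizeof(float) bytes into Dest[Index] only.
        Read((unsigned char *)&DataSource[Index],
             (unsigned char *)&Dest[Index],
             sizeof(DataSource[Index]));
    }
}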

efficient way to save objects into binary files

I have a class that consists basically of a matrix of vectors: vector< MyFeatVector<T> > m_vCells, where the outer vector represents the matrix. Each element in this matrix is then a vector (I extended the STL vector class and named it MyFeatVector<T>).
I'm trying to code an efficient method to store objects of this class in binary files.
Up to now, I require three nested loops:
foutput.write( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
where this->at(dy,dx,dz) retrieves the dz-th element of the vector at position [dy,dx].
Is there any way to store the m_vCells private member without using loops? I tried something like foutput.write(reinterpret_cast<char*>(&(this->m_vCells[0])), (this->m_vCells.size())*sizeof(CFeatureVector<T>)); which does not seem to work correctly. We can assume that all the vectors in this matrix have the same size, although a more general solution is also welcome :-)
Furthermore, with my nested-loop implementation, storing objects of this class in binary files seems to require more physical space than storing the same objects in plain-text files, which is a bit weird.
I was trying to follow the suggestion under http://forum.allaboutcircuits.com/showthread.php?t=16465 but couldn't arrive into a proper solution.
Thanks!
Below is a simplified example of my serialization and deserialization methods.
template < typename T >
bool MyFeatMatrix<T>::writeBinary( const string & ofile ){
    ofstream foutput(ofile.c_str(), ios::out|ios::binary);
    foutput.write(reinterpret_cast<char*>(&this->m_nHeight), sizeof(int));
    foutput.write(reinterpret_cast<char*>(&this->m_nWidth), sizeof(int));
    foutput.write(reinterpret_cast<char*>(&this->m_nDepth), sizeof(int));
    //foutput.write(reinterpret_cast<char*>(&(this->m_vCells[0])), nSze*sizeof(CFeatureVector<T>));
    for(register int dy=0; dy < this->m_nHeight; dy++){
        for(register int dx=0; dx < this->m_nWidth; dx++){
            for(register int dz=0; dz < this->m_nDepth; dz++){
                foutput.write( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
            }
        }
    }
    foutput.close();
    return true;
}

template < typename T >
bool MyFeatMatrix<T>::readBinary( const string & ifile ){
    ifstream finput(ifile.c_str(), ios::in|ios::binary);
    int nHeight, nWidth, nDepth;
    finput.read(reinterpret_cast<char*>(&nHeight), sizeof(int));
    finput.read(reinterpret_cast<char*>(&nWidth), sizeof(int));
    finput.read(reinterpret_cast<char*>(&nDepth), sizeof(int));
    this->resize(nHeight, nWidth, nDepth);
    for(register int dy=0; dy < this->m_nHeight; dy++){
        for(register int dx=0; dx < this->m_nWidth; dx++){
            for(register int dz=0; dz < this->m_nDepth; dz++){
                finput.read( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
            }
        }
    }
    finput.close();
    return true;
}
The most efficient method is to store the objects into an array (or contiguous space), then blast the buffer to the file. An advantage is that the disk platters don't have to waste time ramping up, and the writing can be performed contiguously instead of at random locations.
If this is your performance bottleneck, you may want to consider using multiple threads, with one extra thread to handle the output. Dump the objects into a buffer, set a flag, and let the writing thread handle the output, relieving your main thread to perform more important tasks.
Edit 1: Serializing Example
The following code has not been compiled and is for illustrative purposes only.
#include <cstring>
#include <fstream>
#include <string>
#include <algorithm>

using std::ofstream;
using std::fill;

class binary_stream_interface
{
public:
    virtual ~binary_stream_interface() {}
    virtual void load_from_buffer(const unsigned char *& buf_ptr) = 0;
    virtual size_t size_on_stream(void) const = 0;
    virtual void store_to_buffer(unsigned char *& buf_ptr) const = 0;
};
struct Pet
    : public binary_stream_interface
{
    std::string name;
    unsigned int age;
    static const size_t max_name_length = 32;

    void load_from_buffer(const unsigned char *& buf_ptr)
    {
        age = *((unsigned int *) buf_ptr);
        buf_ptr += sizeof(unsigned int);
        name = std::string((char *) buf_ptr);
        buf_ptr += max_name_length;
        return;
    }

    size_t size_on_stream(void) const
    {
        return sizeof(unsigned int) + max_name_length;
    }

    void store_to_buffer(unsigned char *& buf_ptr) const
    {
        *((unsigned int *) buf_ptr) = age;
        buf_ptr += sizeof(unsigned int);
        std::fill(buf_ptr, buf_ptr + max_name_length, 0);
        strncpy((char *) buf_ptr, name.c_str(), max_name_length);
        buf_ptr += max_name_length;
        return;
    }
};
int main(void)
{
    Pet dog;
    dog.name = "Fido";
    dog.age = 5;

    ofstream data_file("pet_data.bin", std::ios::binary);

    // Determine size of buffer
    size_t buffer_size = dog.size_on_stream();

    // Allocate the buffer
    unsigned char * buffer = new unsigned char [buffer_size];
    unsigned char * buf_ptr = buffer;

    // Write / store the object into the buffer.
    dog.store_to_buffer(buf_ptr);

    // Write the buffer to the file / stream.
    data_file.write((char *) buffer, buffer_size);
    data_file.close();

    delete [] buffer;
    return 0;
}
Edit 2: A class with a vector of strings
#include <vector>

class Many_Strings
    : public binary_stream_interface
{
public:
    enum {MAX_STRING_SIZE = 32};

    size_t size_on_stream(void) const
    {
        return m_string_container.size() * MAX_STRING_SIZE // Total size of strings,
               + sizeof(size_t);                           // with room for the quantity variable.
    }

    void store_to_buffer(unsigned char *& buf_ptr) const
    {
        // Treat the vector<string> as a variable-length field.
        // Store the quantity of strings into the buffer,
        // followed by the content.
        size_t string_quantity = m_string_container.size();
        *((size_t *) buf_ptr) = string_quantity;
        buf_ptr += sizeof(size_t);
        for (size_t i = 0; i < string_quantity; ++i)
        {
            // Each string is a fixed-length field.
            // Pad with '\0' first, then copy the data.
            std::fill(buf_ptr, buf_ptr + MAX_STRING_SIZE, 0);
            strncpy((char *) buf_ptr, m_string_container[i].c_str(), MAX_STRING_SIZE);
            buf_ptr += MAX_STRING_SIZE;
        }
    }

    void load_from_buffer(const unsigned char *& buf_ptr)
    {
        // The actual coding is left as an exercise for the reader.
        // Pseudo code:
        //   Clear / empty the string container.
        //   Load the quantity variable.
        //   Increment the buffer pointer by the size of the quantity variable.
        //   For each new string (up to the quantity just read):
        //     load a temporary string from the buffer via the buffer pointer,
        //     push the temporary string into the vector,
        //     increment the buffer pointer by MAX_STRING_SIZE.
    }

    std::vector<std::string> m_string_container;
};
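For completeness, one possible body for that exercise, under the same fixed-width layout that store_to_buffer() writes:
void load_from_buffer(const unsigned char *& buf_ptr)
{
    m_string_container.clear();

    // Load the quantity variable and step past it.
    size_t string_quantity = *((const size_t *) buf_ptr);
    buf_ptr += sizeof(size_t);

    for (size_t i = 0; i < string_quantity; ++i)
    {
        // Each field is NUL-padded, so the char* constructor
        // stops at the first '\0'.
        m_string_container.push_back(std::string((const char *) buf_ptr));
        buf_ptr += MAX_STRING_SIZE;
    }
}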
I'd suggest you read the C++ FAQ on serialization; you can choose whatever best fits your case.
When you're working with structures and classes, you have to take care of two things:
Pointers inside the class.
Padding bytes.
Both of these can produce surprising results in your output. IMO, the object itself must implement serialization and de-serialization. The object knows all about its structure, pointer data, etc., so it can decide which format can be implemented efficiently.
You will have to iterate anyway, or wrap the iteration somewhere. Once you have finished implementing the serialization and de-serialization functions (either as overloaded operators or as named functions), passing objects around is straightforward; especially when you're working with stream objects, overloading the << and >> operators is the easy route.
Regarding your question about using the vector's underlying pointer: it might work as long as it's a single vector, but it's not a good idea for the nested case.
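As a sketch of the operator approach for the matrix class in the question (member names are taken from the posted code; it assumes a const overload of at() and that the operator is declared a friend):
#include <ostream>

// Writes the dimensions followed by the elements, element by element.
template <typename T>
std::ostream& operator<<(std::ostream& os, const MyFeatMatrix<T>& m)
{
    os.write(reinterpret_cast<const char*>(&m.m_nHeight), sizeof(int));
    os.write(reinterpret_cast<const char*>(&m.m_nWidth), sizeof(int));
    os.write(reinterpret_cast<const char*>(&m.m_nDepth), sizeof(int));
    for (int dy = 0; dy < m.m_nHeight; ++dy)
        for (int dx = 0; dx < m.m_nWidth; ++dx)
            for (int dz = 0; dz < m.m_nDepth; ++dz)
                os.write(reinterpret_cast<const char*>(&m.at(dy, dx, dz)), sizeof(T));
    return os;
}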
Update according to the question update.
There are a few things you should keep in mind before deriving from STL containers. They're not really good candidates for inheritance because they don't have virtual destructors. If you're using basic data types and POD-like structures it won't cause much trouble, but if you use them in a truly object-oriented way, you may face some unpleasant behavior.
Regarding your code:
Why are you typecasting to char*?
The way you serialize the object is your choice. IMO, what you did is a basic file write operation in the name of serialization.
Serialization comes down to the object, i.e. the parameter T in your template class. If you're using POD or basic types, no special serialization is needed. Otherwise you have to carefully choose the way you write the object.
Choosing text format or binary format is your choice. Text format always has a cost, but at the same time it's easier to manipulate than a binary format.
For example, the following code does a simple write and then read back (text-formatted values, despite the binary flag):
fstream fr("test.txt", ios_base::out | ios_base::binary );
for( int i = 0; i < _countof(arr); i++ )
    fr << arr[i] << ' ';
fr.close();

fstream fw("test.txt", ios_base::in | ios_base::binary);
int j = 0;
// Read until EOF or until the output array is full.
while( !fw.eof() && j < _countof(arrout) )
{
    fw >> arrout[j++];
}
It seems to me that the most direct route to generating a binary file containing a vector is to memory-map the file and place the data in the mapped region. As pointed out by sarat, you need to worry about how pointers are used within the class. But the boost-interprocess library has a tutorial on how to do this using their shared memory regions, which include memory-mapped files.
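A minimal sketch of that idea with Boost.Interprocess, assuming a POD element type and a file that already exists at the required size (mapped files must be pre-sized before mapping):
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstring>
#include <vector>

void write_mapped(const std::vector<float>& v)
{
    using namespace boost::interprocess;

    const std::size_t n_bytes = v.size() * sizeof(float);

    // "data.bin" must already exist and be at least n_bytes long.
    file_mapping file("data.bin", read_write);
    mapped_region region(file, read_write, 0, n_bytes);

    // Copy the vector's contiguous storage straight into the mapping;
    // the OS writes it back to disk (or call region.flush() explicitly).
    std::memcpy(region.get_address(), &v[0], n_bytes);
}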
First off, have you looked at Boost.multi_array? Always good to take something ready-made rather than reinventing the wheel.
That said, I'm not sure if this is helpful, but here's how I would implement the basic data structure, and it'd be fairly easy to serialize:
#include <array>
#include <cstddef>

template <typename T, size_t DIM1, size_t DIM2, size_t DIM3>
class ThreeDArray
{
    typedef std::array<T, DIM1 * DIM2 * DIM3> array_t;
    array_t m_data;

public:
    inline size_t size() const { return m_data.size(); }
    inline size_t byte_size() const { return sizeof(T) * m_data.size(); }

    inline T & operator()(size_t i, size_t j, size_t k)
    {
        return m_data[i + j * DIM1 + k * DIM1 * DIM2];
    }
    inline const T & operator()(size_t i, size_t j, size_t k) const
    {
        return m_data[i + j * DIM1 + k * DIM1 * DIM2];
    }

    inline const T * data() const { return m_data.data(); }
};
You can serialize the data buffer directly:
ThreeDArray<int, 4, 6, 11> arr;
/* ... */
std::ofstream outfile("file.bin", std::ios::binary);
outfile.write(reinterpret_cast<const char*>(arr.data()), arr.byte_size());

C++ hash table w/o using STL

I need to create a hash table that has a key as a string, and value as an int. I cannot use STL containers on my target. Is there a suitable hash table class for this purpose?
Here's a quick and dirty C hash I just wrote. It compiles, but is untested locally. Still, the idea is there for you to run with as needed. The performance of this is completely dependent upon the keyToHash function. My version will not be high performance, but again, it demonstrates how to do it.
#include <cassert>
#include <cstring>

static const int kMaxKeyLength = 31;
static const int kMaxKeyStringLength = kMaxKeyLength + 1;

struct HashEntry
{
    int value;
    char key[kMaxKeyStringLength]; // room for the terminating NUL
};

static const char kEmptyHash[2] = "";
static const int kHashPowerofTwo = 10;
static const int kHashSize = 1 << kHashPowerofTwo;
static const int kHashMask = kHashSize - 1;
static const int kSmallPrimeNumber = 7;

static HashEntry hashTable[kHashSize];

int keyToHash(const char key[])
{
    assert(strlen(key) < kMaxKeyLength);
    int hashValue = 0;
    for(size_t i = 0; i < strlen(key); i++)
    {
        hashValue += key[i];
    }
    return hashValue;
}

bool hashAdd(const char key[], const int value)
{
    int hashValue = keyToHash(key);
    int hashFullSentinal = 0;
    while(strcmp(hashTable[hashValue & kHashMask].key, kEmptyHash))
    {
        hashValue += kSmallPrimeNumber; // probe the next slot
        if(hashFullSentinal++ >= (kHashSize - 1))
        {
            return false; // table is full
        }
    }
    strcpy(hashTable[hashValue & kHashMask].key, key);
    hashTable[hashValue & kHashMask].value = value;
    return true;
}

bool hashFind(const char key[], int *value)
{
    int hashValue = keyToHash(key);
    // Probe until a match or an empty slot (assumes the table
    // never becomes completely full).
    while(strcmp(hashTable[hashValue & kHashMask].key, kEmptyHash))
    {
        if(!strcmp(hashTable[hashValue & kHashMask].key, key))
        {
            *value = hashTable[hashValue & kHashMask].value;
            return true;
        }
        hashValue += kSmallPrimeNumber; // keep probing, as in hashAdd
    }
    return false;
}

bool hashRemove(const char key[])
{
    int hashValue = keyToHash(key);
    while(strcmp(hashTable[hashValue & kHashMask].key, kEmptyHash))
    {
        if(!strcmp(hashTable[hashValue & kHashMask].key, key))
        {
            // Note: simply emptying the slot can break the probe
            // chain for other keys that collided past this one.
            hashTable[hashValue & kHashMask].value = 0;
            hashTable[hashValue & kHashMask].key[0] = 0;
            return true;
        }
        hashValue += kSmallPrimeNumber;
    }
    return false;
}
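A quick usage sketch of the table above:
#include <cstdio>

int main()
{
    hashAdd("apple", 1);
    hashAdd("banana", 2);

    int value = 0;
    if(hashFind("apple", &value))
        printf("apple -> %d\n", value);

    hashRemove("apple");
    return 0;
}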
In the case that you know your list of keys ahead of time (or some superset thereof), you can use a perfect hash function generator like gperf. gperf will spit out either C or C++ code.
(You may need to do some work to actually build a container, given the hash function, though.)
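For illustration, a gperf input file for a string-to-int table might look roughly like this (syntax written from memory, so double-check it against the gperf manual):
%struct-type
%define lookup-function-name find_keyword
struct Keyword { const char *name; int value; };
%%
apple, 1
banana, 2
cherry, 3
Running something like gperf -L C++ keys.gperf > keys.hpp then generates a perfect-hash lookup routine you can wrap in a small container-like class.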
You can use the unordered associative container from Boost, aka. boost::unordered_map, which is implemented in terms of a hash table.
It's a moot point, since the STL has no hash table container; std::map would be the alternative. For most purposes there is no reason not to use std::map. For uses that require a hash table, boost::unordered_map is the best choice (and I think it matches the hash table defined in the proposed C++ TR1 standard). Some compilers -- but I can't name them -- may provide the TR1 hash table as std::tr1::unordered_map.
You might want to check out glib hash tables
http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html
If you need maximum performance, use MCT's closed_hash_map or Google's dense_hash_map. The former is easier to use, the latter is more mature. Your use case sounds like it would benefit from closed hashing.

What is the best way of comparing a string variable to a set of string constants?

An if statement looks too awkward, because I need to be able to increase the number of constants.
Sorry for leading you astray with the word "constant"; it isn't quite what I meant.
Add all your constants to a std::set; then you can check whether the set contains your string with:
std::set<std::string> myLookup;
// populate the set with your strings here

std::set<std::string>::size_type i = myLookup.count(searchTerm);
if( i )
    std::cout << "Found";
else
    std::cout << "Not found";
Depends whether you care about performance.
If not, then the simplest code is probably to put the various strings in an array (or vector if you mean you want to increase the number of constants at run time). This will also be pretty fast for a small number of strings:
static const char *const strings[] = { "fee", "fie", "fo", "fum" };
static const int num_strings = sizeof(strings) / sizeof(char*);
Then either:
int main() {
    const char *search = "foe";
    bool match = false;
    for (int i = 0; i < num_strings; ++i) {
        if (std::strcmp(search, strings[i]) == 0) match = true;
    }
}
Or:
struct stringequal {
    const char *const lhs;
    stringequal(const char *l) : lhs(l) {}
    bool operator()(const char *rhs) const {
        return std::strcmp(lhs, rhs) == 0;
    }
};

int main() {
    const char *search = "foe";
    std::find_if(strings, strings+num_strings, stringequal(search));
}
[Warning: I haven't tested the above code, and I've got the signatures wrong several times already...]
If you do care about performance, and there is a reasonable number of strings, then one quick option would be something like a trie. But that's a lot of effort, since there isn't one in the standard C++ library. You can get much of the benefit by using a sorted array/vector, searched with std::binary_search:
// These strings MUST be in ASCII-alphabetical order. Don't add "foo" to the end!
static const char *const strings[] = { "fee", "fie", "fo", "fum" };
static const int num_strings = sizeof(strings) / sizeof(char*);

bool stringcompare(const char *lhs, const char *rhs) {
    return std::strcmp(lhs, rhs) < 0;
}

std::binary_search(strings, strings+num_strings, "foe", stringcompare);
... or use a std::set. But unless you're changing the set of strings at runtime, there is no advantage to using a set over a sorted array with binary search, and a set (or vector) has to be filled in with code whereas an array can be statically initialized. I think C++0x will improve things, with initializer lists for collections.
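For reference, this is what the statically initialized std::set looks like once initializer lists are available (a hypothetical keyword set, using what C++0x eventually standardized as C++11):
#include <set>
#include <string>

static const std::set<std::string> keywords = { "fee", "fie", "fo", "fum" };

bool is_keyword(const std::string& s) {
    return keywords.count(s) != 0;
}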
Put the strings to be compared in a static vector or set and then use std::find algorithm.
The technically best solution is: build a 'perfect hash function' tailored to your set of string constants, so later there are no collisions during hashing.
const char * values[] = { "foo", "bar", ..., 0 };

bool IsValue( const std::string & s ) {
    int i = 0;
    while( values[i] ) {
        if ( s == values[i] ) {
            return true;
        }
        i++;
    }
    return false;
}
Or use a std::set.