MongoDb creating sparse index using c++ driver - c++

Is there a way to create a sparse index using the MongoDb (2.2) C++ driver?
It seems that the ensureIndex function does not accept this argument. From MongoDb docs:
bool mongo::DBClientWithCommands::ensureIndex(
const string & ns,
BSONObj keys,
bool unique = false,
const string & name = "",
bool cache = true,
bool background = false,
int v = -1)

For that matter, dropDups isn't an argument either...
As a workaround, you can build the server command yourself and append the sparse argument. If you follow this link, you'll notice that the server command consists of building a BSONObject and the various index options are appended as fields. It should be straightforward to write your own version of ensureIndex.

I ended up patching the Mongo source code (mongo/cleint/dbclient.cpp):
bool DBClientWithCommands::ensureIndex( const string &ns , BSONObj keys , bool unique, const string & name , bool cache, bool background, int version, bool sparse ) {
BSONObjBuilder toSave;
toSave.append( "ns" , ns );
toSave.append( "key" , keys );
string cacheKey(ns);
cacheKey += "--";
if ( name != "" ) {
toSave.append( "name" , name );
cacheKey += name;
}
else {
string nn = genIndexName( keys );
toSave.append( "name" , nn );
cacheKey += nn;
}
if( version >= 0 )
toSave.append("v", version);
if ( unique )
toSave.appendBool( "unique", unique );
if ( sparse )
toSave.appendBool( "sparse", true );
if( background )
toSave.appendBool( "background", true );
if ( _seenIndexes.count( cacheKey ) )
return 0;
if ( cache )
_seenIndexes.insert( cacheKey );
insert( Namespace( ns.c_str() ).getSisterNS( "system.indexes" ).c_str() , toSave.obj() );
return 1;
}
The issue should be resolved in version 2.3.2

Related

Adding XPointer/XPath searches to ALL(?) C++ JSON libraries, is it doable?

Is it possible to extend all(?) existing C++ JSON libraries with XPath/XPointer or subset with just one C++ implementation? At least those with iterators for object and array values?
I have reviewed three C++ JSON libraries (reviewing nlohmann, Boost.JSON and RapidJSON) to see the internals and check their search functionality. Some have implemented Json pointer. Json pointer is basic, almost like working with json as a name-value list.
XML has XPath and XPointer searches and rules are standardized. With XPath and XPointer you can do more.
One reason to reviewing these libraries was to see if it is possible to extend any of them with better search functionality. Or might it be possible to extend all(?) C++ JSON libraries at once?
A longer text describing this can be found here, trying to be brief.
I tried to do one traverse method that selects json values with one specific property name and that method should work an all tested JSON libraries. If I got that to work it may be possible to add more search logic and get it to work on almost all C++ JSON.
I got this C++ templated function to work an all tested json libraries. It can walk the JSON tree and select json values on all tested libraries.
What is needed to is to implement specializations of is_object, is_array, compare_name, get_value, begin and end. That are just one liners so it's easy.
template<typename json_value>
bool is_object( const json_value* p )
{ static_assert(sizeof(json_value) == 0, "Only specializations of is_object is allowed"); }
template<typename json_value>
bool is_array( const json_value* p )
{ static_assert(sizeof(json_value) == 0, "Only specializations of is_array is allowed"); }
template<typename iterator>
bool compare_name( iterator it, std::string_view stringName )
{ static_assert(sizeof(it) == 0, "Only specializations of compare_name is allowed"); }
template<typename iterator, typename json_value>
const json_value* get_value( iterator it )
{ static_assert(sizeof(it) == 0, "Only specializations of get_value is allowed"); }
template<typename iterator, typename json_value>
iterator begin( const json_value& v ) { return std::begin( v ); }
template<typename iterator, typename json_value>
iterator end( const json_value& v ) { return std::end( v ); }
// ------------------------------------------------
// Selects all json values that match property name
template<typename json_value, typename object_iterator,typename array_iterator = object_iterator>
uint32_t select( const json_value& jsonValue, std::string_view stringQuery, std::vector<const json_value*>* pvectorValue = nullptr )
{ assert( is_object( &jsonValue ) || is_array( &jsonValue ) );
uint32_t uCount = 0;
if( is_object( &jsonValue ) == true ) // found object ?
{
for( auto it = begin<object_iterator,json_value>( jsonValue ); it != end<object_iterator,json_value>( jsonValue ); it++ )
{
if( is_object( get_value<object_iterator,json_value>( it ) ) == true )
{ // found object, scan it
auto value = get_value<object_iterator,json_value>( it );
uCount += select<json_value,object_iterator>( *value, stringQuery, pvectorValue );
}
else if( is_array( get_value<object_iterator,json_value>( it ) ) == true )
{ // found array, scan it
auto parray = get_value<object_iterator,json_value>( it );
uCount += select<json_value,object_iterator,array_iterator>( *parray, stringQuery, pvectorValue );
}
else if( compare_name<object_iterator>( it, stringQuery ) == true )
{ // property name matches, store value if pointer to vector
if( pvectorValue != nullptr ) pvectorValue->push_back( get_value<object_iterator,json_value>( it ) );
uCount++;
}
}
}
else if( is_array( &jsonValue ) == true ) // found array
{
for( auto it = begin<array_iterator,json_value>( jsonValue ); it != end<array_iterator,json_value>( jsonValue ); it++ )
{
if( is_object( get_value<array_iterator,json_value>( it ) ) == true )
{ // found object, scan it
auto value = get_value<array_iterator,json_value>( it );
uCount += select<json_value,object_iterator>( *value, stringQuery, pvectorValue );
}
else if( is_array( get_value<array_iterator,json_value>( it ) ) == true )
{ // found array, scan it
auto parray = get_value<array_iterator,json_value>( it );
uCount += select<json_value,object_iterator,array_iterator>( *parray, stringQuery, pvectorValue );
}
}
}
return uCount;
}
if this works and if I haven't forgot something, shouldn't it be possible to extend all libraries with just one implementation? The additional logic for XPath and XPointer is not dependent on the implementation of these C++ JSON libraries.
Am I missing something

How to manage memory while reading CSV using Apache Arrow C++ API?

I don't understand memory management in C++ Arrow API. I use Arrow 1.0.0 and I'm reading CSV file. After a few runs of ReadArrowTableFromCSV, my memory is full of allocated data. Am I missing something? How can I free that memory? I don't see any method in memory pool to clear all allocated memory. Code Listing below.
void LoadCSVData::ReadArrowTableFromCSV( const std::string & filePath )
{
auto tableReader = CreateTableReader( filePath );
ReadArrowTableUsingReader( *tableReader );
}
std::shared_ptr<arrow::csv::TableReader> LoadCSVData::CreateTableReader( const std::string & filePath )
{
arrow::MemoryPool* pool = arrow::default_memory_pool();
auto tableReader = arrow::csv::TableReader::Make( pool, OpenCSVFile( filePath ),
*PrepareReadOptions(), *PrepareParseOptions(), *PrepareConvertOptions() );
if ( !tableReader.ok() )
{
throw BadParametersException( std::string( "CSV file reader error: " ) + tableReader.status().ToString() );
}
return *tableReader;
}
void LoadCSVData::ReadArrowTableUsingReader( arrow::csv::TableReader & reader )
{
auto table = reader.Read();
if ( !table.ok() )
{
throw BadParametersException( std::string( "CSV file reader error: " ) + table.status().ToString() );
}
this->mArrowTable = *table;
}
std::unique_ptr<arrow::csv::ParseOptions> LoadCSVData::PrepareParseOptions()
{
auto parseOptions = std::make_unique<arrow::csv::ParseOptions>( arrow::csv::ParseOptions::Defaults() );
parseOptions->delimiter = mDelimiter;
return parseOptions;
}
std::unique_ptr<arrow::csv::ReadOptions> LoadCSVData::PrepareReadOptions()
{
auto readOptions = std::make_unique<arrow::csv::ReadOptions>( arrow::csv::ReadOptions::Defaults() );
readOptions->skip_rows = mNumberOfHeaderRows;
readOptions->block_size = 1 << 27; // 128 MB
readOptions->column_names.reserve( mTable->GetNumberOfColumns() );
for ( auto & colName : mTable->GetColumnsOrder() )
{
readOptions->column_names.emplace_back( colName );
}
return readOptions;
}
std::unique_ptr<arrow::csv::ConvertOptions> LoadCSVData::PrepareConvertOptions() const
{
auto convertOptions = std::make_unique<arrow::csv::ConvertOptions>( arrow::csv::ConvertOptions::Defaults() );
for ( auto & col : mTable->GetColumsInfo() )
{
convertOptions->column_types[col.second.GetName()] = MyTypeToArrowDataType( col.second.GetType() );
}
convertOptions->strings_can_be_null = true;
return convertOptions;
}
std::shared_ptr<arrow::io::ReadableFile> LoadCSVData::OpenCSVFile( const std::string & filePath )
{
MTR_SCOPE_FUNC();
auto inputFileResult = arrow::io::ReadableFile::Open( filePath );
if ( !inputFileResult.ok() )
{
throw BadParametersException( std::string( "CSV file reader error: " ) + inputFileResult.status().ToString() );
}
return *inputFileResult;
}
Maciej, the method TableReader::Read should return shared_ptr<arrow::Table>. The arrow::Table itself has a number of shared pointers to structures which eventually contain the data. To free up the data you will need to make sure that the arrow::Table and any copies of it are destroyed. This should happen as soon as that shared_ptr goes out of scope. However, it appears you are storing the table in a member variable here (which is to be expected, you probably want to use the data after you read it):
this->mArrowTable = *table;
So now you have a second copy of the arrow::Table instance. You could reassign this->mArrowTable to a new blank table or you could destroy whatever this is. Of course, if you are making any other copies of the table then you will need to ensure those go out of scope as well.

Classes with pointers and binary files

Well guys/girls, I already asked this but I think I didn't explain good and I couldn't find the solution so I'll ask again with more details and explain more the context of my problem.
I have two classes that contain user data and I want to save them in binary files. On the other hand, I have a template class responsible for save these classes.
There is a really important fact that I have to mention: in the beginning I chose encode a auxiliary class for any class that I would save. This auxiliary class is responsible of writing/reading of data. The original classes have string members and the auxiliary classes have pointers to char. But recently, looking for more simplicity and flexibility, I chose to combine the original class, that contains the benefits of string class ;and its auxiliary that has the pointers that makes the class more comfortable at the moment of save it. So, instead of have two classes, I have one class that handles the input/output of data and the write/read of data.
This change looks something like this:
class AuxNOTE;
//Original Class: Input/Output of Data
class NOTE{
private:
string _Category;
string _Description;
public:
NOTE() : _Category( "" ) , _Description( "" ) { }
NOTE( const NOTE & note ) : _Category( note._Category )
, _Description( note._Description ) { }
NOTE( string category , string description ) : _Category( category)
, _Description( description ) { }
NOTE( const AuxNOTE & aux ) : _Category( aux._Category )
, _Description( aux._Description ) { }
NOTE & operator=( const NOTE & note ){
_Category = note._Category;
_Description = note._Description;
return *this;
}
NOTE & operator=( const AuxNOTE & aux ){
_Category = string( aux._Category );
_Description = string( aux._Description );
return *this;
}
string GetCategory() const { return _Category; }
string GetDescription() const { return _Description; }
void SetCategory( string category ) { _Category = category; }
void SetDescription( string description ) { _Description = description; }
};
//Auxliary Class: Writing/Reading of Data to/from binary files
class AuxNOTE{
private:
char _Category[50];
char _Description[255];
public:
AuxNOTE(){ }
AuxNOTE( const NOTE & note ){
strcpy( _Category , note._Category );
strcpy( _Description , note._Description);
}
AuxNOTE & operator=( const NOTE & note ){
strcpy( _Category , note._Category );
strcpy( _Description , note._Description );
return *this;
}
};
What I have now is something like this:
//Class NOTE: Input/Output of Data and Writing/Reading to/from binary files.
// .h file
class NOTE{
private:
char * _Category;
char * _Description;
public:
NOTE();
NOTE( const NOTE & note );
NOTE( string category , string description );
NOTE & operator=( const NOTE & note )
string GetCategory() const;
string GetDescription() const;
void SetCategory( string category );
void SetDescription( string description );
};
// .cpp file
#include "NOTE.h"
NOTE :: NOTE() : _Category( nullptr ) ,_Description( nullptr )
{
}
NOTE :: NOTE( string description , string category )
: _Category ( new char[ category.size() + 1 ] )
, _Categoria( new char[ description.size() + 1 ] )
{
strcpy( _Categoria , category.c_str() );
strcpy( _Descripcion , description.c_str() );
}
NOTE :: NOTE (const NOTE & copy )
: _Category( nullptr )
, _Description nullptr )
{
if( copy._Description != nullptr ){
_Description = new char[ strlen( copy._Description ) + 1 ];
strcpy( _Description , copy._Description );
}
if( copy._Category != nullptr ){
_Category = new char[ strlen( copy._Category ) + 1 ];
strcpy( _Category , copy._Category );
}
}
NOTE :: ~NOTE() {
if( _Description != nullptr ) delete [] _Description;
if( _Category != nullptr ) delete [] _Category;
}
//Get Methods
string NOTE :: GetDescription() const { return string(_Description); }
string NOTE :: GetCategory() const { return string(_Category); }
//Set Methods
void NOTE :: SetDescription( string description ){
if( _Description != nullptr ) delete [] _Description;
_Description = new char[ description.size() + 1 ];
strcpy( _Description , description.c_str() );
}
void NOTE :: SetCategory( string category ){
if( m_Category != nullptr ) delete [] _Category;
_Category = new char[ category.size() + 1 ];
strcpy( _Category , category.c_str() );
}
//Operators
NOTE & NOTE :: operator=( const NOTE & note ){
if( note._Description != nullptr ) SetDescription( note.GetDescription() );
if( note._Category != nullptr ) SetCategory( note.GetCategory() );
return *this;
}
Note that the public interface looks like if the NOTE class works with string members but it doesn't because it works with pointers to char. Thus the NOTE class can be saved without any problem. However, the class is not responsible at all of writing/reading but I created another class that can save any class as long as these classes have members that can be saved.
And the class that is responsible of this is a template class and looks like this:
template< class T >
class SAVER{
private:
vector< T > _Vector;
string _File;
public:
SAVER( string file );
~SAVER();
};
template< class T >
SAVER< T > :: SAVER( string file ) : _File( file ){
assert( _File != "" );
ifstream file( _File , ios::binary );
if( file.is_open() ){
T obj;
while( file.read( reinterpret_cast<char*>(&obj) , sizeof(obj) ) )
_Vector.push_back( obj );
}
}
template< class T >
Saver< T > :: ~Saver() {
if( _Vector.empty() )
return;
ofstream file( _File , ios::binary | ios::trunc );
assert( file.is_open() );
auto itr = _Vector.begin();
auto end = _Vector.end();
while( itr != end ){
if ( !file.write( reinterpret_cast<char*>( &itr ) , sizeof(itr) ) )
break;
itr++;
}
}
The SAVER's constructor handles the reading and puts the data( e.g NOTE objects ) in its vector. The destroyer handles the writing of all vector's objects into the corresponding binary file.
I got to clear that my errors aren't compile error but they are runtime errors.
Now, This is the problem I have:
When I execute the entire program, it has to read the binary file but it breaks. I open it with the debugger and I see that the program finishes in this line with a "segmentation fault error" and this comes from the SAVER constructor:
NOTE :: ~NOTE() {
if( _Description != nullptr ) delete [] _Description; //It breaks at this line
if( _Category != nullptr ) delete [] _Category;
}
In the debugger I can see the value of _Description and next to it appears an memory error that says: error: Cannot access memory at address (value of _Description).
Why is this happen? Do you see any error? If you need more information or you don't understand something just let me know.
First, search the internet for "c++ serialization library". What you are performing is called serialization.
Pointers and any class that contains pointers cannot be written verbatim to a file. The pointers are locations in memory. There is no guarantee by most Operating Systems that your program will have the exact memory locations next time it is executed. Your program may run in different areas of memory which change where your data is stored.
There are techniques around this, such as either writing the quantity first, then the data or writing the data then some kind of sentinel (such as the '\0' in C-Style strings).
Consider not writing as a binary file, but using formatted textual representations. A number will be read in by many platforms and converted to native representations. A native representation written in binary mode to a file, many not be the same on another platform (look up "Endianess"). Also, most text editors and word processors can easily read text files. Reading and interpreting a binary file is more difficult.
Unless your application's bottleneck is I/O bound and the I/O timing is critical, consider using textual representation of data. It is easier to read (especially when debugging your program) and easily portable.
In the debugger I can see the value of _Description and next to it appears an memory error that says: error: Cannot access memory at address (value of _Description).
Sure, you cannot deserialize pointers from your binary. You need to store their size information and contents in the file instead.

C++ Prepend DateTime to console output

Searched around and couldn't find any info on this besides redirecting to files so hopefully someone can help me out.
I have a console application that launches and hooks another process, by default the new process output displays in the first console application.
What I would like to do is prepend a datetime value to all the output, the problem is that I don't control the output from the child process (3rd party app) so the easy solution of adding a datetime to all printed values is unavailable. Is it possible to prepend a string to all stdout?
Since you confirmed C++ (which implies a solution for
std::cout is what is needed, and not for stdout): the
obvious solution is a filtering streambuf:
class TimeStampStreambuf : public std::streambuf
{
std::streambuf* myDest;
std::ostream* myOwner;
bool myIsAtStartOfLine;
protected:
int overflow( int ch )
{
// To allow truly empty lines, otherwise drop the
// first condition...
if ( ch != '\n' && myIsAtStartOfLine ) {
std::string tmp = now();
// function now() should return the timestamp as a string
myDest->sputn( tmp.data(), tmp.size() );
}
myIsAtStartOfLine = ch == '\n';
ch = myDest->sputc( ch );
return ch;
}
public:
TimeStampStreambuf( std::streambuf* dest )
: myDest( dest )
, myOwner( nullptr )
, myIsAtStartOfLine( false )
{
}
TimeStampStreambuf( std::ostream& owner )
: myDest( dest.rdbuf() )
, myOwner( &owner )
, myIsAtStartOfLine( false )
{
myOwner->rdbuf( this );
}
~TimeStampStreambuf()
{
if ( myOwner != nullptr ) {
myOwner->rdbuf( myDest );
}
}
};
And to install it:
// scoped:
TimeStampStreambuf anyName( std::cout );
// Time stamping will be turned off when variable goes out
// of scope.
// unscoped:
std::streambuf* savedStreambuf = std::cout.rdbuf();
TimeStampStreambuf name( savedStreambuf );
// In this case, you have to restore the original streambuf
// yourself before calling exit.
As far as I know, the articles explaining this (from C++
Reports, Sept. 1998) are not on line.
EDIT:
I've actually found them:
http://gabisoft.free.fr/articles-en.html. I can't believe that
the link still works; it's been years since I had an account
with Free.

Linear Probing Hash Function not working?

What am i doing wrong with my linear probing function?
bool HashTable_lp::insert( const string & x )
{
// Insert x as active
int currentPos = findPos( x );
if( isActive( currentPos ) )
return true;
array[ currentPos ] = HashEntry( x, ACTIVE );
if( ++currentSize > array.size( ) / 2 )
rehash( );
return false;
}
You generally want a while loop until you find an empty slot. The other problem is you were equating a cell that your string hashes to being active to meaning that the cell contains the same value you are trying to insert. In general, you need to check the value of your key to insert against existing entries. It may be worth making an insert version that does not do this check for cases where the user can guarantee the item is not already in the hash table to avoid these comparisons...
bool HashTable_lp::insert( const string & x )
{
// Insert x as active
int currentPos = findPos( x );
while( isActive( currentPos ) ){
// here key gets out of array[currentPos] the string key
if(key(currentPos)==x) return true; // already in hash
currentPos++; // linearly probe to next guy
}
// at this point we know currentPos is empty
array[ currentPos ] = HashEntry( x, ACTIVE );
if( ++currentSize > array.size( ) / 2 ) // ensure load factor is .5 to guarantee probing works!
rehash( );
return false;
}