Design direction for multiple format parsing


I am writing an app to parse lines in a text file. The problem is that I need to be able to load different routines depending on a variable set at run-time. I cannot change the format of the incoming file.
int intFormat = 1; // Loaded from INI file
void __fastcall TForm1::Button1Click(TObject *Sender) {
myFileConverstion *myFC = nullptr;
switch(intFormat) {
case 1:
myFC = new FileConverstionCompanyA();
break; // without this, case 1 falls through into case 2
case 2:
myFC = new FileConverstionCompanyB();
break;
}
if (myFC) myFC->Execute("fileName"); // call through the pointer, not the type name
}
Within ->Execute(), I would be calling private (or protected) methods to do the parsing. Some of those methods could be shared across all formats, too.
What would be the best OOP way to do this?
Create an abstract base class, i.e. myFileConverstion, then inherit from it for CompanyA, B, C, etc.?
Or write myFileConverstion with all the common methods (private/protected) and a virtual Execute(), then just change the Execute() internals for the various "companies"?
I'm looking for some guidance.
I haven't really tried anything yet; I'm in the planning stage.
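The second option from the question (shared helpers in a base class, a virtual Execute() overridden per company) might look roughly like the sketch below. The names FileConversion, trim and makeConverter are hypothetical, the parsing bodies are placeholders, and Execute() takes a single line here rather than a file name to keep the example small.

```cpp
#include <memory>
#include <string>

// Hypothetical base class: shared helpers live here, format-specific work goes in Execute().
class FileConversion {
public:
    virtual ~FileConversion() = default;
    virtual std::string Execute(const std::string& line) = 0;

protected:
    // Example of a helper every format can reuse: strip leading/trailing spaces.
    static std::string trim(const std::string& s) {
        const auto first = s.find_first_not_of(' ');
        if (first == std::string::npos) return "";
        const auto last = s.find_last_not_of(' ');
        return s.substr(first, last - first + 1);
    }
};

class FileConversionCompanyA : public FileConversion {
public:
    std::string Execute(const std::string& line) override { return "A:" + trim(line); }
};

class FileConversionCompanyB : public FileConversion {
public:
    std::string Execute(const std::string& line) override { return "B:" + trim(line); }
};

// Factory keyed on the run-time format id; returns nullptr for an unknown id.
std::unique_ptr<FileConversion> makeConverter(int format) {
    switch (format) {
        case 1: return std::make_unique<FileConversionCompanyA>();
        case 2: return std::make_unique<FileConversionCompanyB>();
        default: return nullptr;
    }
}
```

Returning a std::unique_ptr from the factory keeps ownership explicit, and the virtual destructor makes deleting through the base pointer safe.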

It would also be possible to implement the whole thing with std::function by storing the routines in some kind of lookup map. Here is an example:
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
class FileConverter
{
using convert_engine_t = std::function< bool( const std::string& ) >;
std::unordered_map< size_t, convert_engine_t > engines;
convert_engine_t *activeEngine { nullptr };
public:
template < class T >
auto addEngine( size_t id, T && routine ) -> void
{
engines.emplace( id, std::forward< T >( routine ) );
}
auto setActiveEngine( size_t id ) -> bool
{
const auto iterFound = engines.find( id );
if ( iterFound == engines.end() )
return false;
activeEngine = &iterFound->second;
return true;
}
auto convert( const std::string& fileName ) -> bool
{
if ( !activeEngine || !( *activeEngine ) )
return false;
return ( *activeEngine )( fileName );
}
};
int main()
{
struct MyConvertingEngineA
{
auto operator()( const std::string& fn ) { // a local class cannot have a member template, so no auto parameter here
std::cout << "Doing A" << std::endl; return true;
}
};
auto myConvertingEngineB = []( const auto& fn ) {
std::cout << "Doing B" << std::endl; return true;
};
FileConverter myFC;
myFC.addEngine( 1, MyConvertingEngineA() );
myFC.addEngine( 2, std::move( myConvertingEngineB ) );
myFC.addEngine( 8, []( const auto& fn ){ std::cout << "Doing C" << std::endl; return true; } );
myFC.setActiveEngine( 1 );
myFC.convert( "MyFileA1" );
myFC.convert( "MyFileA2" );
myFC.setActiveEngine( 2 );
myFC.convert( "MyFileB" );
myFC.setActiveEngine( 8 );
myFC.convert( "MyFileC" );
myFC.setActiveEngine( 7 ); // Will fail, old engine will remain active
return 0;
}
Output:
Doing A
Doing A
Doing B
Doing C
A few notes on the code:
The addEngine function template takes a forwarding reference and uses perfect forwarding, so it accepts the widest possible range of arguments.
Engines are copied or moved depending on whether an lvalue or an rvalue is passed. They are passed in the form of "callable objects".
Different kinds of callables, such as functors or lambdas, can be added.
Even generic lambdas (with an auto parameter, as in the example) can be passed, as long as they can be called with the signature of the std::function.
This avoids spelling out the string type a second time.
If you have large and complex engines, use a separate class instead of a lambda to keep the local scope small.
Shared parser functionality can also be passed to the respective engines via a smart pointer if you don't want any inheritance:
class EngineA
{
std::shared_ptr< ParserCore > parserCore;
public:
EngineA( std::shared_ptr< ParserCore > parserCoreParam ) :
parserCore( std::move( parserCoreParam ) )
{}
auto operator()( const auto& fn ) {
std::cout << "Doing A" << std::endl; return true;
// -> parserCore->helperFuncForABC() ....;
}
};

Related

C++ directory item iteration without exceptions

In C++17 it became easy to iterate over the items in some directory dir:
for ( auto& dirEntry: std::filesystem::directory_iterator(dir) )
{
if ( !dirEntry.is_regular_file() ) continue;
...
Unfortunately, this approach may throw exceptions, which I want to avoid in my program.
Iteration without throwing exceptions is also possible:
std::error_code ec;
const std::filesystem::directory_iterator dirEnd;
for ( auto it = std::filesystem::directory_iterator( dir, ec ); !ec && it != dirEnd; it.increment( ec ) )
{
if ( !it->is_regular_file( ec ) ) continue;
...
but it is much wordier. For example, I cannot use a range-based for loop. The extra code is significant for me, since I iterate over directories in many places.
Is there a way to simplify the code iterating directory items and still avoid exceptions?

I think one can create a safe wrapper iterator whose operator++ does not throw an exception, as follows:
// object of this struct can be passed to range based for
struct safe_directory
{
std::filesystem::path dir;
std::error_code & ec;
};
//iterator of directory items that will save any errors in (ec) instead of throwing exceptions
struct safe_directory_iterator
{
std::filesystem::directory_iterator it;
std::error_code & ec;
safe_directory_iterator & operator ++() { it.increment( ec ); return * this; }
auto operator *() const { return *it; }
};
safe_directory_iterator begin( const safe_directory & sd )
{
return safe_directory_iterator{ std::filesystem::directory_iterator( sd.dir, sd.ec ), sd.ec };
}
std::filesystem::directory_iterator end( const safe_directory & )
{
return {};
}
bool operator !=( const safe_directory_iterator & a, const std::filesystem::directory_iterator & b )
{
return !a.ec && a.it != b;
}
Then it can be used in a program like this (after the loop, ec tells whether the iteration stopped because of an error):
int main()
{
std::error_code ec;
safe_directory sdir{ std::filesystem::current_path(), ec };
for ( auto dirEntry : sdir )
{
if ( dirEntry.is_regular_file( ec ) )
std::cout << dirEntry.path() << std::endl;
}
}
See example in online compiler: https://gcc.godbolt.org/z/fb4qPE6Gf
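If a range-based for is not essential, another way to cut the repetition is to write the verbose loop once inside a function and call it everywhere. This is a minimal sketch of that alternative, not the wrapper from the answer. The name listRegularFiles is hypothetical.

```cpp
#include <filesystem>
#include <system_error>
#include <vector>

// Hypothetical helper: returns the regular files directly inside dir.
// Filesystem errors from opening or advancing the iterator are reported through ec
// instead of being thrown.
std::vector<std::filesystem::path> listRegularFiles(const std::filesystem::path& dir,
                                                    std::error_code& ec) {
    namespace fs = std::filesystem;
    std::vector<fs::path> files;
    const fs::directory_iterator end;
    for (fs::directory_iterator it(dir, ec); !ec && it != end; it.increment(ec)) {
        std::error_code entryEc;  // a failing status query on one entry does not abort the scan
        if (it->is_regular_file(entryEc)) files.push_back(it->path());
    }
    return files;
}
```

Call sites then shrink to one call plus a check of ec, at the cost of collecting the paths into a vector first.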

Handling combinatorial explosion when converting AoS to SoA

Suppose I want to convert an Array of Structures into a Structure of Arrays with a runtime-parameter indicating which of the members of the source structure should be converted. For example:
struct SourceElement {
string member1;
float member2;
int member3;
//More members...
};
auto source_elements = ...; //A forward-iterable range of SourceElement objects
vector<string> members1;
vector<float> members2;
vector<int> members3;
for(auto& source_element : source_elements) {
if(member1_required) {
members1.push_back(source_element.member1);
}
if(member2_required) {
members2.push_back(source_element.member2);
}
if(member3_required) {
members3.push_back(source_element.member3);
}
//...and so on...
}
//Some of the vectors might be empty, which I am fine with
I want to get rid of the conditionals within the loop in the hopes that the conditional-less code will run a bit faster. The typical way I know of is to simply move the conditionals out of the loop, which works fine if it is only a single conditional, but with multiple conditionals this results in a combinatorial explosion - for N members I have to write 2^N different loop bodies. Adding a new member requires writing a lot of code as well. Here is an example of how this looks:
if(member1_required && !member2_required && !member3_required) {
for(auto& source_element : source_elements) {
members1.push_back(source_element.member1);
}
} else if(member1_required && member2_required && !member3_required) {
for(auto& source_element : source_elements) {
members1.push_back(source_element.member1);
members2.push_back(source_element.member2);
}
}
//... and so on
What is a good way to deal with this sort of problem? An ideal solution should have the following properties:
Generated code should be as close as possible to a hand-rolled solution (one for-loop per combination)
Adding new members should require little effort
Destructuring the source element should allow for data transformations (e.g. members1.push_back(my_conversion(source_element.member1))). A simple case: SourceElement has a double member, but I want to store only float data
Source data might come from a forward iterator, so one cannot assume that all data is stored linearly in memory

You can use templates. An example (untested):
struct DestElements
{
vector<string> members1;
vector<float> members2;
vector<int> members3;
};
template<uint32_t bitMask>
DestElements copy( const SourceElement* begin, const SourceElement* end )
{
DestElements dest;
for( ; begin < end; begin++ )
{
if constexpr( bitMask & 1 )
dest.members1.push_back( begin->member1 );
if constexpr( bitMask & 2 )
dest.members2.push_back( begin->member2 );
if constexpr( bitMask & 4 )
dest.members3.push_back( begin->member3 );
}
return dest; // no std::move here: it would prevent copy elision (NRVO)
}
using pfnCopy = DestElements( *)( const SourceElement* begin, const SourceElement* end );
static const std::array<pfnCopy, 8> dispatch =
{
// You can do crazy C++ metaprogramming here, std::apply, std::make_index_sequence, etc.
// When I have too large count of them, I write ~2 lines of C# in a T4 template instead.
&copy<0>, &copy<1>, &copy<2>, &copy<3>, &copy<4>, &copy<5>, &copy<6>, &copy<7>,
};
// Usage
uint32_t mask = 0;
if( member1_required ) mask |= 1;
if( member2_required ) mask |= 2;
if( member3_required ) mask |= 4;
DestElements dest = dispatch[ mask ]( source_elements.data(),
source_elements.data() + source_elements.size() );
You'll need to replace the const pointer with the type of your forward iterator (and compare with != instead of <), obviously.
However, I’m not sure this will have a measurable effect on performance. All modern CPUs do branch prediction, and these conditions don’t change while you are iterating. After the first few loop iterations, these branches should be predicted almost perfectly.
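The hand-written dispatch table from the answer can also be generated, as its comment hints. Below is a sketch using std::index_sequence (C++17). The helper name makeDispatch is an assumption; SourceElement, DestElements, copy and pfnCopy follow the answer and the question.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct SourceElement { std::string member1; float member2; int member3; };

struct DestElements {
    std::vector<std::string> members1;
    std::vector<float> members2;
    std::vector<int> members3;
};

// One instantiation per bit mask; the if constexpr branches are resolved at compile time.
template <std::uint32_t bitMask>
DestElements copy(const SourceElement* begin, const SourceElement* end) {
    DestElements dest;
    for (; begin != end; ++begin) {
        if constexpr ((bitMask & 1) != 0) dest.members1.push_back(begin->member1);
        if constexpr ((bitMask & 2) != 0) dest.members2.push_back(begin->member2);
        if constexpr ((bitMask & 4) != 0) dest.members3.push_back(begin->member3);
    }
    return dest;
}

using pfnCopy = DestElements (*)(const SourceElement*, const SourceElement*);

// Expands to { &copy<0>, &copy<1>, ..., &copy<N-1> }.
template <std::size_t... I>
constexpr std::array<pfnCopy, sizeof...(I)> makeDispatch(std::index_sequence<I...>) {
    return {{ &copy<static_cast<std::uint32_t>(I)>... }};
}

// 3 members give 2^3 = 8 entries; adjust the 8 when a member is added.
inline const auto dispatch = makeDispatch(std::make_index_sequence<8>{});
```

Adding a member then means one more if constexpr line and doubling the table size, instead of listing every instantiation by hand.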

You can select which functions to run in your loop from a map of lambdas before the loop. Like the others here have said though, this is unlikely to give a performance boost. The following code is a working example:
#include <unordered_map>
#include<functional>
#include <iostream>
#include <vector>
#include <string>
struct component
{
float member1;
std::string member2;
int member3;
};
struct vector_builder
{
std::vector<std::function<void(component&)>> commands;
std::unordered_map<int, std::function<void(component&)>> function_map{
{1, [&](component& comp){members1.push_back(comp.member1);}},
{2, [&](component& comp){members2.push_back(comp.member2);}},
{3, [&](component& comp){members3.push_back(comp.member3);}}
};
vector_builder(std::vector<int> must_contain)
{
for(auto i : must_contain) {commands.push_back(function_map.at(i));} // at() throws std::out_of_range for an unknown id
}
std::vector<float> members1;
std::vector<std::string> members2;
std::vector<int> members3;
void Push(component& c) {for(auto& func : commands) func(c);} // by reference: avoids copying each std::function
};
int main()
{
// Create an iterable collection of component objects we want to transform
std::vector<component> components{{2.1, "hello", 5},
{3.4, "world", 6},
{0.5, "great", 10}};
// Let's say we want only members 2 and 3 to be made into vectors:
vector_builder builder({2, 3});
// Now the loop comprises only the two push_back functions we wanted
for (auto& comp : components) builder.Push(comp);
// Print the results
std::cout << "members1: "; // This should be empty.
for (auto& i : builder.members1) std::cout << i << " ";
std::cout << "\nmembers2: ";
for (auto& i : builder.members2) std::cout << i << " ";
std::cout << "\nmembers3: ";
for (auto& i : builder.members3) std::cout << i << " ";
std::cout << std::endl;
return 0;
}

Is there a way with catch framework to compare stream or files?

I have seen in the Boost testing tools the macro:
BOOST_<level>_EQUAL_COLLECTIONS(left_begin, left_end, right_begin, right_end)
which can work for streams by using std::istream_iterator.
Does the Catch framework provide a similar way to compare streams/files?

Not built-in, but then it does not purport to.
For this you write your own matcher.
Here's the documentation's example for integer range checking:
// The matcher class
class IntRange : public Catch::MatcherBase<int> {
int m_begin, m_end;
public:
IntRange( int begin, int end ) : m_begin( begin ), m_end( end ) {}
// Performs the test for this matcher
virtual bool match( int const& i ) const override {
return i >= m_begin && i <= m_end;
}
// Produces a string describing what this matcher does. It should
// include any provided data (the begin/ end in this case) and
// be written as if it were stating a fact (in the output it will be
// preceded by the value under test).
virtual std::string describe() const override {
std::ostringstream ss;
ss << "is between " << m_begin << " and " << m_end;
return ss.str();
}
};
// The builder function
inline IntRange IsBetween( int begin, int end ) {
return IntRange( begin, end );
}
// ...
// Usage
TEST_CASE("Integers are within a range")
{
CHECK_THAT( 3, IsBetween( 1, 10 ) );
CHECK_THAT( 100, IsBetween( 1, 10 ) );
}
Clearly you can adapt this to perform any check that you need.

C++ equivalent of this Python code for dictionaries/lists?

In my Python code I have an issue that I need clarified before I start translating to C++: how to make proper dictionaries/lists that I can use the equivalent of "if var in _" with.
An arbitrary example that needs translation:
CONFIRMATION = ('yes', 'yeah', 'yep', 'yesh', 'sure', 'yeppers', 'yup')
DECLINATION = ('no', 'nope', 'too bad', 'nothing')
varResponse = str(input('yes or no question'))
if varResponse in CONFIRMATION:
doSomething()
elif varResponse in DECLINATION:
doSomethingElse()
else:
doAnotherThing()
It's fairly easy to do similar tasks using arrays, like:
if (userDogName == name[0])
execute something;
but what I need is something like:
if (userDogName is one of a population of dog names in a dictionary)
execute something;

You can use the standard library container std::set. It is typically implemented as a balanced binary tree, so lookups take O(log n):
#include <iostream>
#include <set>
#include <string>
int main(int argc, char* argv[])
{
std::set<std::string> set;
std::set<std::string>::const_iterator iter;
set.insert("yes");
set.insert("yeah");
iter = set.find("yess");
if (iter != set.end( ))
{
std::cout << "Found:" << *iter;
}
else
{
std::cout << "Not found!";
}
return 0;
}

C++11 permits a solution that's very similar to the Python code:
#include <iostream>
#include <set>
#include <string>
using namespace std;
set<string> CONFIRMATION = {"yes", "yeah", "yep", "yesh", "sure", "yeppers", "yup"};
set<string> DECLINATION = {"no", "nope", "too bad", "nothing"};
int main() {
cout << "yes or no question";
string varResponse;
getline(cin, varResponse);
if (CONFIRMATION.find(varResponse) != CONFIRMATION.end()) {
doSomething();
} else if (DECLINATION.find(varResponse) != DECLINATION.end()) {
doSomethingElse();
} else {
doAnotherThing();
}
}
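The membership test can be made to read even closer to Python's in by using count, which returns 0 or 1 for a std::set (from C++20 on, the member function contains does the same). A sketch, with the hypothetical names Answer and classify standing in for the three doSomething branches:

```cpp
#include <set>
#include <string>

enum class Answer { Yes, No, Unknown };

// Hypothetical helper mirroring the if/elif/else chain of the Python snippet.
Answer classify(const std::string& response) {
    static const std::set<std::string> confirmation = {
        "yes", "yeah", "yep", "yesh", "sure", "yeppers", "yup"};
    static const std::set<std::string> declination = {
        "no", "nope", "too bad", "nothing"};
    if (confirmation.count(response) != 0) return Answer::Yes;  // "response in CONFIRMATION"
    if (declination.count(response) != 0) return Answer::No;    // "response in DECLINATION"
    return Answer::Unknown;
}
```

Returning an enum keeps the lookup separate from the actions, so the caller can switch on the result.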

Well, C++ isn't well suited for small throwaway programs, because it doesn't provide much infrastructure. You're meant to create your own infrastructure (such as, well, even just plain sets!) on top of the standard library, or use some third-party libraries of your choice.
So while Python comes with batteries included, with C++ there is no strong pressure to accept the particular provided batteries (because there are none), but you have to at least choose batteries.
For just the basic code, the Python snippet
CONFIRMATIONS = ("yes", "yeah", "yep", "yesh", "sure", "yeppers", "yup")
DECLINATIONS = ("no", "nope", "too bad", "nothing")
response = raw_input( "yes or no? " )
if response in CONFIRMATIONS:
pass # doSomething()
elif response in DECLINATIONS:
pass # doSomethingElse()
else:
pass #doAnotherThing()
can look like this in C++:
typedef Set< wstring > Strings;
Strings const confirmations = temp( Strings() )
<< L"yes" << L"yeah" << L"yep" << L"yesh" << L"sure" << L"yeppers" << L"yup";
Strings const declinations = temp( Strings() )
<< L"no" << L"nope" << L"too bad" << L"nothing";
wstring const response = lineFromUser( L"yes or no? " );
if( isIn( confirmations, response ) )
{
// doSomething()
}
else if( isIn( declinations, response ) )
{
// doSomethingElse()
}
else
{
// doAnotherThing()
}
But then, it relies on some infrastructure having been defined, like the Set class:
template< class TpElem >
class Set
{
public:
typedef TpElem Elem;
private:
set<Elem> elems_;
public:
Set& add( Elem const& e )
{
elems_.insert( e );
return *this;
}
friend Set& operator<<( Set& s, Elem const& e )
{
return s.add( e );
}
bool contains( Elem const& e ) const
{
return (elems_.find( e ) != elems_.end());
}
};
template< class Elem >
bool isIn( Set< Elem > const& s, Elem const& e )
{
return s.contains( e );
}
I used operator<< because, as of 2012, Visual C++ still does not support C++11 brace-enclosed list initialization.
Here set is std::set from the standard library.
And, hm, the temp thingy:
template< class T >
T& temp( T&& o ) { return o; }
And, more infrastructure, the lineFromUser function:
wstring lineFromUser( wstring const& prompt )
{
wcout << prompt;
wstring result;
getline( wcin, result )
|| throwX( "lineFromUser: std::getline failed" );
return result;
}
Which relies on a throwX function:
bool throwX( string const& s ) { throw runtime_error( s ); }
But that's about all, except that you have to put the C++ code I showed first into some function, say cppMain, and invoke that from your main function (even more infrastructure to define!):
int main()
{
try
{
cppMain();
return EXIT_SUCCESS;
}
catch( exception const& x )
{
wcerr << "!" << x.what() << endl;
}
return EXIT_FAILURE;
}
So, to do things even half-way properly in C++, there is some steep overhead.
C++ is mainly for larger programs, and Python (which I often use) for smallish programs.
And yes, I know that some students may or will react to that statement, either they feel that it's a slur on C++ to say that it's no good for small programs (hey, I make those all the time!) and/or it's a slur on Python to say that it's no good for large systems (hey, haven't you heard of YouTube, you dumb incompetent person?), but, that's the way that it is. Sometimes it can be more convenient to use the hammer one has to fasten a screw, so sometimes I, for example, use C++ to do some small task. But generally that's because it would be too much hassle to install Python on the machine at hand, and in general, to do a task X, it's best to use tools that have been designed for X-like work.

This can be solved using std::find on any Standard Template Library container.
std::vector<std::string> answers;
std::string input;
...
if(std::find(answers.begin(), answers.end(), input) != answers.end()) {
/* input was found in answers */
} else {
/* input was not found in answers */
}
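For larger lists, it may be better to store them in a std::set object instead, as Tilo suggested. std::find still works there, but it remains a linear search; use the set's own member function (answers.find(input) != answers.end()) to get the logarithmic lookup.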

Accessing for_each iterator from lambda

Is it possible to access the std::for_each iterator, so I can erase the current element from an std::list using a lambda (as below)?
typedef std::shared_ptr<IEvent> EventPtr;
std::list<EventPtr> EventQueue;
EventType evt;
...
std::for_each(
EventQueue.begin(), EventQueue.end(),
[&]( EventPtr pEvent )
{
if( pEvent->EventType() == evt.EventType() )
EventQueue.erase( ???Iterator??? );
}
);
I've read about using [](typename T::value_type x){ delete x; } here on SO, but VS2010 doesn't seem to like this statement (underlines T as error source).

You are using the wrong algorithm. Use remove_if:
EventQueue.remove_if([&](EventPtr const& pEvent)
{
return pEvent->EventType() == evt.EventType();
});
The STL algorithms do not give you access to the iterator being used for iteration. This is in most cases a good thing.
(In addition, consider whether you really want to use std::list; it's unlikely that it is the right container for your use case. Consider std::vector, with which you would use the erase/remove idiom to remove elements that satisfy a particular predicate.)
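The erase/remove idiom mentioned for std::vector can be sketched as follows. Event and dropEvents are hypothetical stand-ins for the question's IEvent and its removal condition.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical stand-in for the question's IEvent; only the type id matters here.
struct Event { int type; };
using EventPtr = std::shared_ptr<Event>;

// Removes every event whose type equals typeToDrop.
void dropEvents(std::vector<EventPtr>& queue, int typeToDrop) {
    // remove_if moves the kept elements to the front and returns the new logical end;
    // erase then trims the leftover tail in one step.
    queue.erase(std::remove_if(queue.begin(), queue.end(),
                               [typeToDrop](const EventPtr& e) { return e->type == typeToDrop; }),
                queue.end());
}
```

The relative order of the remaining events is preserved, since std::remove_if is stable.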

No, use a regular for loop instead. Since erase returns the iterator to the next element, the loop must only increment when nothing was erased:
for( auto it = EventQueue.begin(); it != EventQueue.end(); )
{
if( (*it)->EventType() == evt.EventType() )
it = EventQueue.erase( it );
else
++it;
}

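Erasing is not the only case where you may need the iterator inside the lambda.
For contiguous storage (a built-in array or std::vector), where a pointer to the element can serve as an iterator, I take the address of the element with the & operator, like this (it does not work for node-based containers such as std::list):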
int main (int argc, char* argv []) {
size_t tmp [6] = {0, 1, 2, 3, 4, 5};
std::list<size_t> ls ((size_t*)tmp, (size_t*) &tmp [6]);
//printing next element
std::for_each ((const size_t*)tmp, (const size_t*) &tmp [5], [] (const size_t& s) {
std::cout << s << "->";
std::cout << *(&s +1) << " ";
});
std::cout << std::endl;
}
