How to do string formatting in BetterC mode? - d

I'd like to use something like the "Concepts" package from Atila Neves.
I implemented the check of an object against a type signature myself in a simple naive way. I can check struct objects against interfaces which I define within compile-time-evaluated delegate blocks to make them work with BetterC. I only used compile-time function evaluation with enums which receive return values of executed delegate code blocks.
Now I faced problems with std.format.format, which uses TypeInfo for %s formatters and therefore gives errors when compiling in BetterC. For code generation I'd like to use token strings because they have syntax highlighting. But proper usage of them requires string interpolation or string formatting. core.stdc.stdio.snprintf is no alternative because CTFE can only interpret D source code.
This is not technically a problem. I can just turn token strings into WYSIWYG strings.
But I wonder why I can't use it. The official documentation says, compile-time features are unrestricted for BetterC (I assume this includes TypeInfo). Either it is plain wrong or I am doing it wrong.
template implementsType(alias symbol, type)
if (isAbstractClass!type)
{
enum implementsType = mixin(implementsTypeExpr);
enum implementsTypeExpr =
{
import std.format : format;
auto result = "";
static foreach(memberName; __traits(allMembers, type))
{
result ~= format(
q{__traits(compiles, __traits(getMember, symbol, "%1$s")) && }~
q{covariantSignature!(__traits(getMember, symbol, "%1$s"), __traits(getMember, type, "%1$s")) && }
, memberName);
}
return (result.length >= 3)? result[0 .. $-3] : result;
}();
}

TypeInfo is not available in BetterC.
There's a bc-string dub package that provides a limited string formatter that will work in BetterC.

Can I use virtual tokens (tokens with identical return value) in ANTLR4 similar to c++?

In C++ I can use virtual functions to process data from similar classes that have the same parent/ancestor, does ANTLR4 support this and how would I have to set up the grammar?
I have tried to set up a grammar, using arguments that have the same return value and use that value in a token that contains the different "subclassed" tokens.
Here is some code I have tried to work with:
amf_group
: statements=amf_statements (GROUPSEP WS? LINE_COMMENT? EOL? | EOF)
;
amf_statements returns [amf::AmfStatements stmts]
: ( WS? ( stmt=amf_statement { stmts.emplace_back(std::move($stmt.stmtptr)); } WS? EOL) )*
;
amf_statement returns [amf::AmfStatementPtr stmtptr]
: (
stmt = jsonparent_statement
| stmt = jsonvalue_statement
)
{
$stmtptr = std::move($stmt.stmtptr);
}
;
jsonparent_statement returns [amf::AmfStatementPtr stmtptr] locals [int lineno=0]
:
(T_JSONPAR { $lineno = $T_JSONPAR.line;} ) WS (arg=integer_const)
{
$stmtptr = std::make_shared<amf::JSONParentStatement>($lineno, nullptr);
}
;
jsonvalue_statement returns [amf::AmfStatementPtr stmtptr] locals [int lineno=0]
: ( T_JSONVALUE { $lineno = $T_JSONVALUE.line; } ) WS (arg=integer_const) (WS fmt=integer_const)?
{
$stmtptr = std::make_shared<amf::JSONValueStatement>($lineno, std::move($arg.argptr), std::move($fmt.argptr));
}
;
I receive the following error:
error(75): amf1.g4:23:10: label stmt=jsonvalue_statement type mismatch with previous definition: stmt=jsonparent_statement
This error is of course quite logical, because the tokens are indeed of a different type, but their return value types are identical. For two (virtual) tokens I can write all the code separately, but in my case I have some 40+ different tokens that either represent arguments or statements, and writing all the combinations would be cumbersome. The above code did work in ANTLR3, by the way.
Is there another way to get around these errors using ANTLR4? Does anybody have any suggestions?
What's specified in a rule return value is not really a return value in a functional sense. Instead, the context representing the rule gets a new member field that takes the "return" value. Given that, it makes no sense to treat parser rules like C++ functions; they are simply not comparable.
Instead of handling all the fields in your grammar, I recommend a different approach: with ANTLR4 you get a parse tree (if enabled), which represents the matched rules using parse rule contexts (a superset of the AST that was generated previously). Each context contains all the values that have been parsed out. You just need a listener in a second step after the parse run (often called the semantic phase) to walk over this tree, pick those values up and create your own data structures from them. This separation also allows you to use your parser for quick syntax checks, since you don't do all the heavy work in the parse run.
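To make the two-phase idea above more concrete, here is a minimal sketch of a second pass that walks a tree and builds statement objects sharing one base class. Note that ParseNode, collect and the statement classes are hypothetical stand-ins written for this illustration; this is not the ANTLR4 runtime API, where you would instead derive from the listener class generated for your grammar.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical parse-tree node; NOT an ANTLR4 runtime type.
struct ParseNode {
    std::string rule;                 // name of the matched rule
    int line = 0;                     // source line of the match
    std::vector<ParseNode> children;  // sub-rules matched inside this rule
};

// Semantic-phase data structures: one base class, many statement kinds.
struct Statement {
    explicit Statement(int line) : line(line) {}
    virtual ~Statement() = default;
    virtual std::string kind() const = 0;
    int line;
};

struct JsonParentStatement : Statement {
    using Statement::Statement;
    std::string kind() const override { return "parent"; }
};

struct JsonValueStatement : Statement {
    using Statement::Statement;
    std::string kind() const override { return "value"; }
};

// Second pass: walk the tree depth-first and create one Statement per
// recognised rule. The polymorphism lives here, not in the grammar.
void collect(const ParseNode& node,
             std::vector<std::unique_ptr<Statement>>& out) {
    if (node.rule == "jsonparent_statement") {
        out.push_back(std::make_unique<JsonParentStatement>(node.line));
    } else if (node.rule == "jsonvalue_statement") {
        out.push_back(std::make_unique<JsonValueStatement>(node.line));
    }
    for (const ParseNode& child : node.children) {
        collect(child, out);
    }
}
```

With this split, the grammar needs no labels or embedded actions for the statement alternatives, which avoids the label type mismatch entirely.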

Function with a custom return type and the "false" return conditions?

I have a function that returns a custom class structure, but how should I handle the cases where I wish to inform the user that the function has failed, as in return false?
My function looks something like this:
Cell CSV::Find(std::string segment) {
Cell result;
// Search code here.
return result;
}
So when successful, it returns the proper result, but how should I handle the case when it could fail?
I thought about adding a boolean method inside Cell to check whether Cell.data is empty or not (Cell.IsEmpty()). But am I overcomplicating this?
There are three general approaches:
Use exceptions. This is what's in Bathsheba's answer.
Return std::optional<Cell> (or some other type which may or may not hold an actual Cell).
Return bool, and add a Cell & parameter.
Which of these is best depends on how you intend this function to be used. If the primary use case is passing a valid segment, then by all means use exceptions.
If part of the design of this function is that it can be used to tell if a segment is valid, exceptions aren't appropriate, and my preferred choice would be std::optional<Cell>. This may not be available on your standard library implementation yet (it's a C++17 feature); if not, boost::optional<Cell> may be useful (as mentioned in Richard Hodges's answer).
In the comments, instead of std::optional<Cell>, user You suggested expected<Cell, error> (not standard C++, but proposed for a future standard, and implementable outside of the std namespace until then). This may be a good option to add some indication on why no Cell could be found for the segment parameter passed in, if there are multiple possible reasons.
The third option I include mainly for completeness. I do not recommend it. It's a popular and generally good pattern in other languages.
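For the expected<Cell, error> idea mentioned above, here is a rough C++17 sketch of an expected-like result type built on std::variant. Everything in it is an assumption made for illustration: FindResult, FindError, the free function find_cell and the toy rule that only "A1" exists are not from the question, and a real expected implementation offers a much richer interface.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in; the original Cell class is not shown.
struct Cell { std::string data; };

// Hypothetical reasons why a lookup can fail.
enum class FindError { EmptySegment, NotFound };

// Minimal expected-like wrapper: holds either a Cell or a FindError.
class FindResult {
public:
    FindResult(Cell c) : state_(std::move(c)) {}
    FindResult(FindError e) : state_(e) {}

    bool has_value() const { return std::holds_alternative<Cell>(state_); }
    explicit operator bool() const { return has_value(); }

    // Precondition: has_value() is true (std::get throws otherwise).
    const Cell& value() const { return std::get<Cell>(state_); }
    // Precondition: has_value() is false (std::get throws otherwise).
    FindError error() const { return std::get<FindError>(state_); }

private:
    std::variant<Cell, FindError> state_;
};

// Toy lookup standing in for CSV::Find: only the segment "A1" exists.
FindResult find_cell(const std::string& segment) {
    if (segment.empty()) return FindError::EmptySegment;
    if (segment != "A1") return FindError::NotFound;
    return Cell{"42"};
}
```

A caller would check the result first (if (auto r = find_cell(s)) ...) and only ask for error() on the failure path, which is where this carries more information than a plain optional.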
Is this function a query, which could validly not find the cell, or is it an imperative, where the cell is expected to be found?
If the former, return an optional (or nullable pointer to) the cell.
If the latter, throw an exception if not found.
Former:
boost::optional<Cell> CSV::Find(std::string segment) {
boost::optional<Cell> result;
// Search code here.
return result;
}
Latter:
as you have it.
And of course there is the c++17 variant-based approach:
#include <variant>
#include <string>
struct CellNotFound {};
struct Cell {};
using CellFindResult = std::variant<CellNotFound, Cell>;
CellFindResult Find(std::string segment) {
CellFindResult result { CellNotFound {} };
// Search code here.
return result;
}
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;
void cellsAndStuff()
{
std::visit(overloaded
{
[&](CellNotFound)
{
// the not-found code
},
[&](Cell c)
{
// code on cell found
}
}, Find("foo"));
}
The C++ way of dealing with abject failures is to define an exception class of the form:
struct CSVException : std::exception{};
In your function you then throw one of those in the failure branch:
Cell CSV::Find(std::string segment) {
Cell result;
// Search code here.
if (fail) throw CSVException();
return result;
}
You then handle the fail case with a try catch block at the calling site.
If however the "fail" branch is normal behaviour (subjective indeed but only you can be the judge of normality), then do indeed imbue some kind of failure indicator inside Cell, or perhaps even change the return type to std::optional<Cell>.
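The try/catch handling at the calling site could look roughly like the sketch below. The free function find_cell stands in for CSV::Find, and cell_text_or, the Cell layout and the what() message are hypothetical names added only to keep the example self-contained.

```cpp
#include <cassert>
#include <exception>
#include <string>

struct CSVException : std::exception {
    const char* what() const noexcept override { return "cell not found"; }
};

// Hypothetical stand-in; the original Cell class is not shown.
struct Cell { std::string data; };

// Stand-in for CSV::Find: in this toy version only "A1" can be found.
Cell find_cell(const std::string& segment) {
    if (segment != "A1") throw CSVException();
    return Cell{"42"};
}

// Calling site: returns the cell text, or a fallback if the lookup throws.
std::string cell_text_or(const std::string& segment,
                         const std::string& fallback) {
    try {
        return find_cell(segment).data;
    } catch (const CSVException&) {
        return fallback;
    }
}
```

Catching by const reference avoids slicing and also catches any classes later derived from CSVException.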
If you can use C++17, another approach would be to use an std::optional type as your return value. That's a wrapper that may or may not contain a value. The caller can then check whether your function actually returned a value and handle the case where it didn't.
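std::optional<Cell> CSV::Find(std::string segment) {
std::optional<Cell> result; // stays empty unless the search succeeds
// Search code here; assign the found Cell to result.
return result;
}
void clientCode() {
auto cell = CSV::Find("foo");
if (cell) {
// do stuff when found (access the value via *cell or cell->)
} else {
// handle not found
}
}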
A further option is using multiple return values:
std::pair<Cell, bool> CSV::Find(std::string segment) {
Cell result;
bool found = false;
// Search code here; set found to true on success.
return {result, found};
}
// ...
auto cell = CSV::Find("foo");
if (cell.second)
// do stuff with cell.first
The boolean flag says whether the requested Cell was found or not.
PROs
well known approach (e.g. std::map::insert);
quite direct: value and success indicator are return values of the function.
CONs
obscureness of first and second, which requires you to always remember the relative positions of the values within the pair. C++17 structured bindings / if statement with initializer partially resolve this issue:
if (auto [result, found] = CSV::Find("foo"); found)
// do stuff with `result`
possible loss of safety (the calling code has to check if there is a result value, before using it).
Details
Returning multiple values from functions in C++
C++ Error Handling - downside of using std::pair or std::tuple for returning error codes and function returns
For parsing, it is generally better to avoid std::string and instead use std::string_view; if C++17 is not available, minimally functional versions can be whipped up easily enough.
Furthermore, it is also important to track not only what was parsed but also the remainder.
There are two possibilities to track the remainder:
taking a mutable argument (by reference),
returning the remainder.
I personally prefer the latter, as in case of errors it guarantees that the caller has in its hands an unmodified value, which is useful for error reporting.
Then, you need to examine what potential errors can occur, and what recovery mechanisms you wish for. This will inform the design.
For example, if you wish to be able to parse ill-formed CSV documents, then it is reasonable that Cell be able to represent ill-formed CSV cells, in which case the interface is rather simple:
std::pair<Cell, std::string_view> consume_cell(std::string_view input) noexcept;
Where the function always advances and the Cell may contain either a proper cell, or an ill-formed one.
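As a rough illustration of that first interface, here is a sketch of a consume_cell that returns the parsed cell together with the remainder. The Cell layout (text plus an ill_formed flag) and the very simplified quoting rules are assumptions for the example, and noexcept is left off because building a std::string may allocate and throw.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical Cell: the text plus a flag for ill-formed input.
struct Cell {
    std::string text;
    bool ill_formed = false;
};

// Consumes one cell from the front of `input` and returns it together
// with the unparsed remainder. Simplified rules: a cell is either plain
// text up to the next comma, or text between two double quotes.
// Escaped quotes are not handled.
std::pair<Cell, std::string_view> consume_cell(std::string_view input) {
    Cell cell;
    std::size_t end = 0;
    if (!input.empty() && input.front() == '"') {
        const std::size_t close = input.find('"', 1);
        if (close == std::string_view::npos) {
            // Unterminated quote: take everything, mark it ill-formed.
            cell.text = std::string(input.substr(1));
            cell.ill_formed = true;
            end = input.size();
        } else {
            cell.text = std::string(input.substr(1, close - 1));
            end = close + 1;
        }
    } else {
        end = input.find(',');
        if (end == std::string_view::npos) end = input.size();
        cell.text = std::string(input.substr(0, end));
    }
    // Skip the separator that follows the cell, if there is one.
    if (end < input.size() && input[end] == ',') ++end;
    return {std::move(cell), input.substr(end)};
}
```

Because the input is taken by value as a std::string_view, the caller's own view is never modified, which is the property the "return the remainder" style is meant to preserve.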
On the other hand, if you only wish to support well-formed CSV documents, then it is reasonable to signal errors via exceptions and that Cell only be able to hold actual cells:
std::pair<std::optional<Cell>, std::string_view> consume_cell(...);
And finally, you need to think about how to signal end of row conditions. It may be a simple marker on Cell, though at this point I personally prefer to create an iterator as it presents a more natural interface since a row is a range of Cell.
The C++ interface for iterators is a bit clunky (as you need an "end", and the end is unknown before parsing), however I recommend sticking to it to be able to use the iterator with for loops. If you wish to depart from it, though, at least make it work easily with while, such as std::optional<Cell> cell; while ((cell = row.next())) { ... }.
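The while-friendly alternative mentioned above might be sketched like this. Row, its next() member and the helper cell_texts are hypothetical names, and the splitting is deliberately naive (commas only, no quoting), so treat it as an outline of the interface rather than a CSV parser.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in; the original Cell class is not shown.
struct Cell { std::string text; };

// Hypothetical cursor over one row of comma-separated text.
class Row {
public:
    explicit Row(std::string_view line) : rest_(line) {}

    // Returns the next cell, or std::nullopt once the row is exhausted.
    std::optional<Cell> next() {
        if (done_) return std::nullopt;
        const auto comma = rest_.find(',');
        Cell cell{std::string(rest_.substr(0, comma))};
        if (comma == std::string_view::npos) {
            done_ = true;  // that was the last cell of the row
        } else {
            rest_.remove_prefix(comma + 1);
        }
        return cell;
    }

private:
    std::string_view rest_;  // not yet consumed part of the row
    bool done_ = false;
};

// Usage: drain a row with the while-loop pattern from the text.
std::vector<std::string> cell_texts(std::string_view line) {
    std::vector<std::string> out;
    Row row(line);
    std::optional<Cell> cell;
    while ((cell = row.next())) {
        out.push_back(cell->text);
    }
    return out;
}
```

In this sketch an empty line yields one empty cell; whether that is the right convention depends on how you want to treat blank rows.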

How to pass token kind with its associated information from lexer to preprocessor, then to parser

I am trying to implement a simple C/C++ parser that parses only part of the C++ language. So I need to create a Lexer, a Preprocessor and a Parser class.
I'm considering which data type I need to pass information between those three layers. Normally, a Token class is needed here; right now, my Token class looks like this:
struct Token
{
TokenKind id;
std::string lexeme;
int fileIndex;
int line;
int column;
};
I think the most important part is the TokenKind (it could be IDENTIFIER or CLASS_KEYWORD or any punctuation like LPAREN), and sometimes the lexeme is also important, because it usually contains the type name or variable name information.
I looked at some implementations to see how the Token is passed to parsers.
1, I see that Clang has some functions in its Preprocessor class, like Preprocessor.cpp:739
void Preprocessor::Lex(Token &Result)
You see, a reference is passed as the function argument, and the function fills the object with the result. See another example in a Clang tutorial here: Clang-tutorial/CItutorial3.cpp at master · loarabia/Clang-tutorial, where the instance tok is reused in a loop.
Token tok;
do {
ci.getPreprocessor().Lex(tok);
if( ci.getDiagnostics().hasErrorOccurred())
break;
ci.getPreprocessor().DumpToken(tok);
std::cerr << std::endl;
} while ( tok.isNot(clang::tok::eof));
2, For some lexer generators, I see that the function yylex() just returns an int, which is actually a TokenKind, and the other information such as the actual lexeme string is stored in global variables like yylval.
3, For a tiny language for GCC (A tiny GCC front end – Part 3 | Think In Geek), I see the Lexer return a std::shared_ptr<Token>, that is:
static TokenPtr
make_identifier (location_t locus, const std::string& str)
{
return TokenPtr(new Token (IDENTIFIER, locus, str));
}
The Lexer returns a TokenPtr, which is a smart pointer to the Token object, so the whole Token is returned to the Parser.
4, GCC's cpp library has the cpp_get_token() function, with an interface like below:
const cpp_token *token = cpp_get_token (pfile);
Then token->type is just like the TokenKind field.
So, my question is: what are the advantages and disadvantages of those kinds of implementations? Some of the methods mentioned above do not even have a preprocessor layer; I do need three layers (the lexer, the preprocessor and the parser).
Note that my parser won't be as big as Clang's or GCC's parser. My main idea is that my parser only parses a very limited part of the C++ language, and I would like to write all of it by hand.
EDIT A similar question is here: What should be the datatype of the tokens a lexer returns to its parser? I also posted some comments there several days ago, but that question does not involve the three layers.

Syntax for std::binary_function usage

I'm a newbie at using the STL Algorithms and am currently stuck on a syntax error. My overall goal of this is to filter the source list like you would using Linq in c#. There may be other ways to do this in C++, but I need to understand how to use algorithms.
My user-defined function object to use as my function adapter is
struct is_Selected_Source : public std::binary_function<SOURCE_DATA *, SOURCE_TYPE, bool>
{
bool operator()(SOURCE_DATA * test, SOURCE_TYPE ref)const
{
if (ref == SOURCE_All)
return true;
return test->Value == ref;
}
};
And in my main program, I'm using as follows -
typedef std::list<SOURCE_DATA *> LIST;
LIST; *localList = new LIST;;
LIST* msg = GLOBAL_DATA->MessageList;
SOURCE_TYPE _filter_Msgs_Source = SOURCE_TYPE::SOURCE_All;
std::remove_copy(msg->begin(), msg->end(), localList->begin(),
std::bind1st(is_Selected_Source<SOURCE_DATA*, SOURCE_TYPE>(), _filter_Msgs_Source));
I'm getting the following error in Rad Studio 2010. The error means "Your source file used a typedef symbol where a variable should appear in an expression."
"E2108 Improper use of typedef 'is_Selected_Source'"
Edit -
After doing more experimentation in VS2010, which has better compiler diagnostics, I found the problem is that the definition of remove_copy only allows unary functions. I changed the function to unary and got it to work.
(This is only relevant if you didn't accidentally omit some of your code from the question, and may not address the exact problem you're having)
You're using is_Selected_Source as a template even though you didn't define it as one. The last line in the 2nd code snippet should read std::bind1st(is_Selected_Source()...
Or perhaps you did want to use it as a template, in which case you need to add a template declaration to the struct.
template<typename SOURCE_DATA, typename SOURCE_TYPE>
struct is_Selected_Source : public std::binary_function<SOURCE_DATA *, SOURCE_TYPE, bool>
{
// ...
};
At a guess (though it's only a guess) the problem is that std::remove_copy expects a value, but you're supplying a predicate. To use a predicate, you want to use std::remove_copy_if (and then you'll want to heed @Cogwheel's answer).
I'd also note that:
LIST; *localList = new LIST;;
Looks wrong -- I'd guess you intended:
LIST *localList = new LIST;
instead.
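For the Linq-style filtering goal, a sketch with a unary predicate could look like the following. It swaps in std::copy_if with a lambda (C++11) rather than std::remove_copy_if, because remove_copy_if copies the elements for which the predicate is false, so the selection test would have to be negated. It also writes through std::back_inserter, since writing to begin() of an empty list is undefined behaviour. The enum values SOURCE_A and SOURCE_B and the function filter_sources are made up for the example; only SOURCE_All and Value appear in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical stand-ins for the question's types.
enum SOURCE_TYPE { SOURCE_All, SOURCE_A, SOURCE_B };
struct SOURCE_DATA { SOURCE_TYPE Value; };

// Returns the entries whose Value matches `filter`; SOURCE_All keeps all.
std::list<SOURCE_DATA*> filter_sources(const std::list<SOURCE_DATA*>& msgs,
                                       SOURCE_TYPE filter) {
    std::list<SOURCE_DATA*> result;
    std::copy_if(msgs.begin(), msgs.end(), std::back_inserter(result),
                 // Unary predicate: the filter value is captured, so no
                 // binder such as std::bind1st is needed.
                 [filter](const SOURCE_DATA* item) {
                     return filter == SOURCE_All || item->Value == filter;
                 });
    return result;
}
```

Returning the list by value also removes the need for the new LIST allocation in the original snippet.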

C++ std::string and NULL const char*

I am working in C++ with two large pieces of code, one done in "C style" and one in "C++ style".
The C-type code has functions that return const char* and the C++ code has in numerous places things like
const char* somecstylefunction();
...
std::string imacppstring = somecstylefunction();
where it is constructing the string from a const char* returned by the C style code.
This worked until the C style code changed and started returning NULL pointers sometimes. This of course causes seg faults.
There is a lot of code around, so I would like the most parsimonious fix for this problem. The expected behavior is that imacppstring would be the empty string in this case. Is there a nice, slick solution to this?
Update
The const char* returned by these functions are always pointers to static strings. They were used mostly to pass informative messages (destined for logging most likely) about any unexpected behavior in the function. It was decided that having these return NULL on "nothing to report" was nice, because then you could use the return value as a conditional, i.e.
if (somecstylefunction()) do_something;
whereas before the functions returned the static string "";
Whether this was a good idea, I'm not going to touch this code and it's not up to me anyway.
What I wanted to avoid was tracking down every string initialization to add a wrapper function.
Probably the best thing to do is to fix the C library functions to their pre-breaking change behavior, but maybe you don't have control over that library.
The second thing to consider is to change all the instances where you're depending on the C lib functions returning an empty string to use a wrapper function that'll 'fix up' the NULL pointers:
const char* nullToEmpty( char const* s)
{
return (s ? s : "");
}
So now
std::string imacppstring = somecstylefunction();
might look like:
std::string imacppstring( nullToEmpty( somecstylefunction()));
If that's unacceptable (it might be a lot of busy work, but it should be a one-time mechanical change), you could implement a 'parallel' library that has the same names as the C lib you're currently using, with those functions simply calling the original C lib functions and fixing the NULL pointers as appropriate. You'd need to play some tricky games with headers, the linker, and/or C++ namespaces to get this to work, and this has a huge potential for causing confusion down the road, so I'd think hard before going down that road.
But something like the following might get you started:
// .h file for a C++ wrapper for the C Lib
namespace clib_fixer {
const char* somecstylefunction();
}
// .cpp file for a C++ wrapper for the C Lib
namespace clib_fixer {
const char* somecstylefunction() {
const char* p = ::somecstylefunction();
return (p ? p : "");
}
}
Now you just have to add that header to the .cpp files that are currently calling the C lib functions (and probably remove the header for the C lib) and add a
using namespace clib_fixer;
to the .cpp file using those functions.
That might not be too bad. Maybe.
Well, without changing every place where a C++ std::string is initialized directly from a C function call (to add the null-pointer check), the only solution would be to prohibit your C functions from returning null pointers.
With GCC, you can use the compiler extension "Conditionals with Omitted Operands" to create a wrapper macro for your C function
#define somecstylefunction() (somecstylefunction() ? : "")
but in general case I would advise against that.
I suppose you could just add a wrapper function which tests for NULL, and returns an empty std::string. But more importantly, why are your C functions now returning NULL? What does a NULL pointer indicate? If it indicates a serious error, you might want your wrapper function to throw an exception.
Or to be safe, you could just check for NULL, handle the NULL case, and only then construct an std::string.
const char* s = somecstylefunction();
if (!s) explode();
std::string str(s);
For a portable solution:
(a) define your own string type. The biggest part is a search and replace over the entire project - that can be simple if it's always std::string, or a big one-time pain. (I'd make the sole requirement that it's Liskov-substitutable for a std::string, but also constructs an empty string from a null char *.)
The easiest implementation is inheriting publicly from std::string. Even though that's frowned upon (for understandable reasons), it would be ok in this case, and also help with 3rd party libraries expecting a std::string, as well as debug tools. Alternatively, aggregate and forward - yuck.
(b) #define std::string to be your own string type. Risky, not recommended. I wouldn't do it unless I knew the codebases involved very well and it saved tons of work (and I'd add some disclaimers to protect the remains of my reputation ;))
(c) I've worked around a few such cases by re-#define'ing the offensive type to some utility class only for the purpose of the include (so the #define is much more limited in scope). However, I have no idea how to do that for a char *.
(d) Write an import wrapper. If the C library headers have a rather regular layout, and/or you know someone who has some experience parsing C++ code, you might be able to generate a "wrapper header".
(e) ask the library owner to make the "Null string" value configurable at least at compile time. (An acceptable request since switching to 0 can break compatibility as well in other scenarios) You might even offer to submit the change yourself if that's less work for you!
You could wrap all your calls to C-style functions in something like this...
std::string makeCppString(const char* cStr)
{
return cStr ? std::string(cStr) : std::string("");
}
Then wherever you have:
std::string imacppstring = somecstylefunction();
replace it with:
std::string imacppstring = makeCppString( somecstylefunction() );
Of course, this assumes that constructing an empty string is acceptable behavior when your function returns NULL.
I don't generally advocate subclassing standard containers, but in this case it might work.
class mystring : public std::string
{
public:
// ... appropriate constructors are an exercise left to the reader
mystring & operator=(const char * right)
{
if (right == NULL)
{
clear();
}
else
{
std::string::operator=(right); // I think this works, didn't check it...
}
return *this;
}
};
Something like this should fix your problem.
const char *cString;
std::string imacppstring;
cString = somecstylefunction();
if (cString == NULL) {
imacppstring = "";
} else {
imacppstring = cString;
}
If you want, you could stick the error checking logic in its own function. You'd have to put this code block in fewer places, then.