Chunk arguments for noweb - literate-programming

In nuweb, I can do something like this
#d Define the chunk with argument
echo "Hello, #1";
Then I can use it in other chunks by passing arguments:
#d Second chunk
#<Define the chunk with argument#(John#)#>
It will generate the following line:
echo "Hello, John";
I know that in this particular case I could use other means inside the chunk (a variable), but passing arguments to chunks is very useful for code declarations where you need bits of code that are almost, but not completely, the same (for example, calling functions with different names). It works fine in nuweb.
I have now switched to noweb, but I don't see any way to pass an argument to a chunk. Is there any way to do this in noweb, as in nuweb?

To avoid such cryptic syntax, try NanoLP for literate programming (it supports named arguments, variable dictionaries, and many other features).

Related

Can I use tparm() without tputs or putp

My understanding is that the function char *tparm(char *str, ...); just converts the given string str to an expanded parameterized version which will be fine to use with stdout outputting functions like printf or cout. But the man page mentions -
Parameterized strings should be passed through tparm to instantiate them. All terminfo strings [including the output of tparm] should be printed with tputs or putp.
So can I parse terminfo entries, use tparm() on them with the appropriate parameters, and output the result using ordinary stdout output functions? I'm also checking for non-tty output and skipping these methods in that case, so that base is covered.
Sure, you can. But some capability strings include padding and time delays, which tparm assumes will be interpreted by tputs.
For instance, the flash capability would use time-delays, which are passed along to tputs (using the syntax described in the terminfo(5) manual page).
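As a rough illustration of the point above: terminfo(5) writes padding as `$<...>` inside a capability string (for example `$<100/>`), and that marker is what tputs interprets. The sketch below checks whether a string still carries such a marker, so a program could decide whether plain printf output is safe. `has_padding` is a hypothetical helper, not part of curses, and it matches the padding syntax only loosely.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not a curses function): returns true if the
// capability string contains a terminfo padding marker such as "$<5>",
// "$<100/>" or "$<2.5*>". Such strings are meant to go through tputs.
bool has_padding(const std::string& cap) {
    std::size_t pos = cap.find("$<");
    while (pos != std::string::npos) {
        std::size_t i = pos + 2;
        bool digits = false;
        // Delay value: digits, optionally with a decimal point (loosely checked).
        while (i < cap.size() &&
               (std::isdigit(static_cast<unsigned char>(cap[i])) || cap[i] == '.')) {
            if (cap[i] != '.') digits = true;
            ++i;
        }
        // Optional flags: '*' (proportional) and '/' (mandatory).
        while (i < cap.size() && (cap[i] == '*' || cap[i] == '/')) ++i;
        if (digits && i < cap.size() && cap[i] == '>') return true;
        pos = cap.find("$<", pos + 1);
    }
    return false;
}
```

If this returns false for the output of tparm, writing the string with an ordinary stdout function should behave the same as tputs; if it returns true, the delay would be printed literally instead of being honoured.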

TCL: check if variable is list

set var1 A
set var2 {A}
Is it possible to check whether a variable is a list in Tcl? For both var1 and var2, llength gives 1. I think these two variables are considered the same: both are lists with one element. Am I right?
Those two things are considered to be entirely identical, and will produce identical bytecode (except for any byte offsets used for indicating where the contents of constants are located, which is not information normally exposed to scripts at all so you can ignore it, plus the obvious differences due to variable names). Semantically, braces are a quoting mechanism and not an indicator of a list (or a script, or …)
You need to write your code to not assume that it can look things up by inspecting the type of a value. The type of 123 could be many different things, such as an integer, a list (of length 1), a unicode string or a command name. Tcl's semantics are based on you not asking what the type of a value is, but rather just using commands and having them coerce the values to the right type as required. Tcl's different to many other languages in this regard.
Because of this different approach, it's not easy to answer questions about this in general: the answers get too long with all the different possible cases to be considered in general yet most of it will be irrelevant to what you're really seeking to do. Ask about something specific though, and we'll be able to tell you much more easily.
You can try string is list $var1, but that will accept both of these forms. It only returns false for something that can't syntactically be interpreted as a list, e.g. because there is an unmatched brace, as in "aa { bb".

Named parameter string formatting in C++

I'm wondering if there is a library like Boost Format, but which supports named parameters rather than positional ones. This is a common idiom in e.g. Python, where you have a context to format strings with that may or may not use all available arguments, e.g.
mouse_state = {}
mouse_state['button'] = 0
mouse_state['x'] = 50
mouse_state['y'] = 30
#...
"You clicked %(button)s at %(x)d,%(y)d." % mouse_state
"Targeting %(x)d, %(y)d." % mouse_state
Are there any libraries that offer the functionality of those last two lines? I would expect it to offer an API something like:
PrintFMap(string format, map<string, string> args);
In Googling I have found many libraries offering variations of positional parameters, but none that support named ones. Ideally the library has few dependencies so I can drop it easily into my code. C++ won't be quite as idiomatic for collecting named arguments, but probably someone out there has thought more about it than me.
Performance is important, in particular I'd like to keep memory allocations down (always tricky in C++), since this may be run on devices without virtual memory. But having even a slow one to start from will probably be faster than writing it from scratch myself.
The fmt library supports named arguments:
print("You clicked {button} at {x},{y}.",
arg("button", "b1"), arg("x", 50), arg("y", 30));
And as a syntactic sugar you can even (ab)use user-defined literals to pass arguments:
print("You clicked {button} at {x},{y}.",
"button"_a="b1", "x"_a=50, "y"_a=30);
For brevity the namespace fmt is omitted in the above examples.
Disclaimer: I'm the author of this library.
I've always been critical of C++ I/O (especially formatting) because in my opinion it is a step backward with respect to C. Formats need to be dynamic, and it makes perfect sense, for example, to load them from an external resource such as a file or a parameter.
I had never actually tried to implement an alternative before, however, and your question prompted me to invest some weekend hours in the idea.
Of course the problem was more complex than I thought (for example, the integer formatting routine alone is 200+ lines), but I think this approach (dynamic format strings) is more usable.
You can download my experiment from this link (it's just a .h file) and a test program from this link (test is probably not the correct term, I used it just to see if I was able to compile).
The following is an example
#include "format.h"
#include <iostream>
using format::FormatString;
using format::FormatDict;
int main()
{
    std::cout << FormatString("The answer is %{x}") % FormatDict()("x", 42);
    return 0;
}
It differs from the Boost.Format approach because it uses named parameters and because the format string and format dictionary are meant to be built separately (and, for example, passed around). I also think that formatting options should be part of the string (as with printf) and not in the code.
FormatDict uses a trick for keeping the syntax reasonable:
FormatDict fd;
fd("x", 12)
("y", 3.141592654)
("z", "A string");
FormatString is instead just parsed from a const std::string& (I decided to preparse format strings but a slower but probably acceptable approach would be just passing the string and reparsing it each time).
The formatting can be extended for user defined types by specializing a conversion function template; for example
struct P2d
{
    int x, y;
    P2d(int x, int y)
        : x(x), y(y)
    {
    }
};
namespace format {
    template<>
    std::string toString<P2d>(const P2d& p, const std::string& parms)
    {
        return FormatString("P2d(%{x}; %{y})") % FormatDict()
            ("x", p.x)
            ("y", p.y);
    }
}
after that a P2d instance can be simply placed in a formatting dictionary.
Also it's possible to pass parameters to a formatting function by placing them between % and {.
For now I only implemented an integer formatting specialization that supports
Fixed size with left/right/center alignment
Custom filling char
Generic base (2-36), lower or uppercase
Digit separator (with both custom char and count)
Overflow char
Sign display
I've also added some shortcuts for common cases, for example
"%08x{hexdata}"
is a hex number with 8 digits, padded with '0's.
"%026/2,8:{bindata}"
is a 24-bit binary number (as required by "/2") with digit separator ":" every 8 bits (as required by ",8:").
Note that the code is just an idea; for example, for now I have simply prevented copies, although it is probably reasonable to allow storing both format strings and dictionaries (for dictionaries it is, however, important to be able to avoid copying an object just because it needs to be added to a FormatDict; while IMO this is possible, it also raises non-trivial problems about lifetimes).
UPDATE
I've made a few changes to the initial approach:
Format strings can now be copied
Formatting for custom types is done using template classes instead of functions (this allows partial specialization)
I've added a formatter for sequences (two iterators). Syntax is still crude.
I've created a github project for it, with boost licensing.
The answer appears to be, no, there is not a C++ library that does this, and C++ programmers apparently do not even see the need for one, based on the comments I have received. I will have to write my own yet again.
Well I'll add my own answer as well, not that I know (or have coded) such a library, but to answer to the "keep the memory allocation down" bit.
As always I can envision some kind of speed / memory trade-off.
On the one hand, you can parse "Just In Time":
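class Formatter:
    def __init__(self, format):
        self._string = format

    def compute(self, context):
        # Rescan the whole string for every key: simple, but repeated work.
        for key, value in context.items():
            placeholder = "{" + key + "}"
            while placeholder in self._string:
                left, _, right = self._string.partition(placeholder)
                self._string = left + str(value) + right
        return self._string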
This way you don't keep a "parsed" structure at hand, and hopefully most of the time you'll just insert the new data in place (unlike Python, C++ strings are not immutable).
However it's far from being efficient...
On the other hand, you can build a fully constructed tree representing the parsed format. You will have several classes like: Constant, String, Integer, Real, etc... and probably some subclasses / decorators as well for the formatting itself.
I think, however, that the most efficient approach would be some kind of mix of the two:
- explode the format string into a list of Constant, Variable
- index the variables in another structure (a hash table with open-addressing would do nicely, or something akin to Loki::AssocVector).
There you are: you're done with only 2 dynamically allocated arrays (basically). If you want to allow a same key to be repeated multiple times, simply use a std::vector<size_t> as a value of the index: good implementations should not allocate any memory dynamically for small sized vectors (VC++ 2010 doesn't for less than 16 bytes worth of data).
When evaluating the context itself, look up the instances. You then parse the formatter "just in time", check it against the current type of the value with which to replace it, and process the format.
Pros and cons:
- Just In Time: you scan the string again and again
- One Parse: requires a lot of dedicated classes, possibly many allocations, but the format is validated on input. Like Boost it may be reused.
- Mix: more efficient, especially if you don't replace some values (allow some kind of "null" value), but delaying the parsing of the format delays the reporting of errors.
Personally I would go for the One Parse scheme, trying to keep the allocations down using boost::variant and the Strategy Pattern as much as I could.
Given that Python itself is written in C and that formatting is such a commonly used feature, you might be able (ignoring copyright issues) to rip the relevant code from the Python interpreter and port it to use STL maps rather than Python's native dicts.
I've written a library for this purpose; check it out on GitHub.
Contributions are welcome.
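A simplified sketch of the pre-parsing idea discussed in this answer: the format string is split once into constant and variable segments, and rendering then only walks that list and looks names up. The `{name}` placeholder syntax and the names `Segment`, `parse_format` and `render` are assumptions made for illustration; no type-specific formatting or separate variable index is attempted here.

```cpp
#include <map>
#include <string>
#include <vector>

// One piece of a pre-parsed format string.
struct Segment {
    bool is_variable;  // true: text holds a variable name; false: literal text
    std::string text;
};

// Split a format such as "a {name} b" into segments, once.
std::vector<Segment> parse_format(const std::string& fmt) {
    std::vector<Segment> out;
    std::size_t pos = 0;
    while (pos < fmt.size()) {
        std::size_t open = fmt.find('{', pos);
        std::size_t close =
            (open == std::string::npos) ? std::string::npos : fmt.find('}', open);
        if (close == std::string::npos) {  // no complete placeholder left
            out.push_back({false, fmt.substr(pos)});
            break;
        }
        if (open > pos) out.push_back({false, fmt.substr(pos, open - pos)});
        out.push_back({true, fmt.substr(open + 1, close - open - 1)});
        pos = close + 1;
    }
    return out;
}

// Render a parsed format; unknown names are left in place as "{name}".
std::string render(const std::vector<Segment>& segs,
                   const std::map<std::string, std::string>& args) {
    std::string result;
    for (const Segment& s : segs) {
        if (!s.is_variable) {
            result += s.text;
            continue;
        }
        auto it = args.find(s.text);
        result += (it != args.end()) ? it->second : "{" + s.text + "}";
    }
    return result;
}
```

The parsed segment list can be kept and reused with different argument maps, which is where the saving over rescanning the string each time would come from.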

Initializing a char array in C. Which way is better?

The following are the two ways of initializing a char array:
char charArray1[] = "foo";
char charArray2[] = {'f','o','o','\0'};
If both are equivalent, one would expect everyone to use the first option above (since it requires fewer key strokes). But I've seen code where the author takes the pain to always use the second method.
My guess is that in the first case the string "foo" is stored in the data segment and copied into the array at runtime, whereas in the second case the characters are stored in the code segment and copied into the array at runtime. And for some reason, the author is allergic to having anything in the data segment.
Edit: Assume the arrays are declared local to a function.
Questions: Is my reasoning correct? Which is your preferred style and why?
What about another possibility:
char charArray3[] = {102, 111, 111, 0};
You shouldn't forget that the C char type is a numeric type; it just happens that the value is often used as a character code. If I use an array for something not related to text at all, I would definitely prefer to initialize it with the above syntax rather than encode it as letters and put them between quotes.
If you don't want the terminating 0, you also have to use the second form or, in C, use:
char charArray3[3] = "foo";
It is a C feature that nearly nobody knows: if the array does not have enough room to hold the final 0 when it is initialized from a string literal, the compiler simply omits it, and the code is still legal. However, this should be avoided because it is not permitted in C++, and a C++ compiler would report an error.
I checked the assembly code generated by gcc, and all the different forms are equivalent. The only difference is that it uses either the .string or the .byte pseudo-instruction to declare the data. But that's just a readability issue and does not make a bit of difference in the resulting program.
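The equivalence of the initializer spellings can also be checked from the source side. The sketch below is written in C++ (so the `char x[3] = "foo"` form is left out, since C++ rejects it) and assumes an ASCII-compatible character set, where 'f' is 102 and 'o' is 111; `initializers_match` is a name invented for the example.

```cpp
#include <cstring>

// Returns true when the three initializer spellings produce identical bytes.
// Assumes an ASCII-compatible execution character set.
bool initializers_match() {
    char a[] = "foo";
    char b[] = {'f', 'o', 'o', '\0'};
    char c[] = {102, 111, 111, 0};
    // The string-literal form includes the terminating 0, so all are 4 bytes.
    static_assert(sizeof a == 4, "string literal form has 4 elements");
    static_assert(sizeof b == 4 && sizeof c == 4, "explicit forms have 4 elements");
    return std::memcmp(a, b, sizeof a) == 0 && std::memcmp(a, c, sizeof a) == 0;
}
```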
I think the second method is used mostly in legacy code where compilers didn't support the first method. Both methods should store the data in the data segments. I prefer the first method due to readability. Also, I needed to patch a program once (can't remember which, it was a standard UNIX tool) to not use /etc (it was for an embedded system). I had a very hard time finding the correct place because they used the second method and my grep couldn't find "etc" anywhere :-)

Macro Replacement during Code Generation

I have some legacy code that generates opcodes. If the input contains a large number of macros, code generation takes a very long time (hours!).
I have gone through the logic: it handles macros by searching for each one and replacing each variable in it, something like inlining.
Is there a way to optimize this without manipulating the string?
You must tokenize your input before starting this kind of process. (I can't recommend the famous Dragon Book highly enough - even the ancient edition stood the test of time, the updated 2006 version looks great). Compiling is the sort of job that's best split up into smaller phases: if your first phase performs lexical analysis into tokens, splitting lines into keywords, identifiers, constants, and so on, then it's much simpler to find the references to macros and look them up in a symbol table. (It's also relatively easier to use a tool like lex or flex or one of their modern equivalents to do this job for you, than to attempt to do it from scratch).
The 'clue' seems to be if the code has more number of macros then the code generation takes so much of time. That sounds like the process is linear in the number of macros, which is certainly too much. I'm assuming this process occurs one line at a time (if your language allows that, obviously that has enormous value, since you don't need to treat the program as one huge string), and the pseudocode looks something like
for(each line in the program)
{
    for(each macro definition)
    {
        test if the macro appears;
        perform replacement if needed;
    }
}
That clearly scales with the number of macro definitions.
With tokenization, it looks something like this:
for(each line in the program)
{
    tokenize the line;
    for(each token in the line)
    {
        switch(based on the token type)
        {
            case(an identifier)
                lookup the identifier in the table of macro names;
                perform replacement as necessary;
            ....
        }
    }
}
which scales mostly with the size of the program (not the number of definitions) - the symbol table lookup can of course be done with more optimal data structures than looping through them all, so that no longer becomes the significant factor. That second step is something that again programs like yacc and bison (and their more modern variants) can happily generate code to do.
Afterthought: when parsing the macro definitions, you can store those as a token stream as well, and mark the identifiers that are the 'placeholder' names for parameter replacement. When expanding a macro, switch to that token stream. (Again, this is something tools like flex can easily do.)
I have an application which has its own grammar. It supports all the data types that a typical compiler supports (even macros). More precisely, it is a kind of compiler that generates opcodes by taking a program (written using that grammar) as input.
For handling the macros, it uses text replacement logic.
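A minimal sketch of the tokenized loop described in this answer, assuming macros without parameters and C-style identifiers. Each line is scanned once, and every identifier costs one hash-table lookup, so the work no longer grows with the number of macro definitions. `expand_line` and `is_word` are hypothetical names, and expanded text is not rescanned for further macros.

```cpp
#include <cctype>
#include <string>
#include <unordered_map>

// True for characters that can appear inside an identifier or number.
static bool is_word(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Expand parameterless macros in one pass over the line.
std::string expand_line(const std::string& line,
                        const std::unordered_map<std::string, std::string>& macros) {
    std::string out;
    std::size_t i = 0;
    while (i < line.size()) {
        if (is_word(line[i])) {
            unsigned char first = static_cast<unsigned char>(line[i]);
            std::size_t start = i;
            while (i < line.size() && is_word(line[i])) ++i;
            std::string word = line.substr(start, i - start);
            // Words starting with a digit are numbers, never macro names.
            auto it = std::isdigit(first) ? macros.end() : macros.find(word);
            out += (it != macros.end()) ? it->second : word;
        } else {
            out += line[i++];  // punctuation and whitespace pass through
        }
    }
    return out;
}
```

Because whole identifiers are looked up, a macro named N does not accidentally match inside N1, which plain substring replacement would have to guard against separately.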
For Example:
Macro Add (a:int, b:int)
int c = a + b
End Macro
// Program Sum
..
int x = 10, y = 10;
Add(x, y);
..
// End of the program
After replacement it will be
// Program Sum
..
int x = 10, y = 10;
int c = x + y
..
// End of program
This text replacement, i.e. replacing the macro call with the macro body, takes a very long time.
Is there an optimal way to do it?
This is really hard to answer without knowing more about your preprocess/parse/compile process. One idea would be to store the macro names in a symbol table. When parsing, check text tokens against that table first. If you find a match, write the replacement into a new string and run that through the parser, then continue parsing the original text following the macro's closing parenthesis.
Depending on your opcode syntax, another idea might be this: when you encounter the macro definition while parsing, generate the opcodes, but put placeholders in place of the arguments. Then, when the parser encounters calls to the macro, generate the code for evaluating the arguments, and insert that code in place of the placeholders in the pre-generated macro code.
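The placeholder idea can be sketched with text tokens standing in for real opcodes: the macro body is stored once as a sequence of tokens, each either literal text or a reference to a parameter position, and expansion just walks that sequence. `BodyToken`, `MacroDef` and `expand` are names invented for this sketch, and returning an empty string on an argument-count mismatch is an arbitrary choice.

```cpp
#include <string>
#include <vector>

// One token of a stored macro body.
struct BodyToken {
    int param_index;   // >= 0: placeholder for that parameter; -1: literal text
    std::string text;  // literal text (unused for placeholders)
};

// A macro definition parsed once, ahead of any calls.
struct MacroDef {
    std::size_t param_count;
    std::vector<BodyToken> body;
};

// Expand a call by walking the pre-tokenized body; no string searching.
std::string expand(const MacroDef& m, const std::vector<std::string>& args) {
    std::string out;
    if (args.size() != m.param_count) return out;  // mismatch: empty result
    for (const BodyToken& t : m.body) {
        if (t.param_index >= 0 &&
            static_cast<std::size_t>(t.param_index) < args.size()) {
            out += args[static_cast<std::size_t>(t.param_index)];
        } else {
            out += t.text;
        }
    }
    return out;
}
```

For the Add macro from the question, the body would be stored as the literal "int c = ", placeholder 0, the literal " + ", and placeholder 1, so a call with x and y yields "int c = x + y".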