Context: I'm using Maxima on a platform that also uses KaTeX. For various reasons related to content management, this means that we are regularly using Maxima functions to generate the necessary KaTeX commands.
I'm currently trying to develop a group of functions that will facilitate generating different sets of strings corresponding to KaTeX commands for various symbols related to vectors.
Problem
I have written the following function makeKatexVector(x), which takes a string, list or list-of-lists and returns the same type of object, with each string wrapped in \vec{} (i.e. makeKatexVector(string) returns \vec{string} and makeKatexVector(["a","b"]) returns ["\vec{a}", "\vec{b}"] etc).
/* Flexible Make KaTeX Vector Version of List Items */
makeKatexVector(x):= block([ placeHolderList : x ],
    if stringp(x) /* Special Handling if x is Just a String */
        then placeHolderList : concat("\vec{", x, "}")
    else if listp(x[1]) /* check to see if it is a list of lists */
        then for j:1 thru length(x)
            do placeHolderList[j] : makelist(concat("\vec{", k ,"}"), k, x[j] )
    else if listp(x) /* check to see if it is just a list */
        then placeHolderList : makelist(concat("\vec{", k, "}"), k, x)
    else placeHolderList : "makeKatexVector error: not a list-of-lists, a list or a string",
    return(placeHolderList));
Although I have my doubts about the efficiency or elegance of the above code, it seems to return the desired expressions; however, I would like to modify this function so that it can distinguish between single- and multi-character strings.
In particular, I'd like multi-character strings like x_1 to be returned as \vec{x}_1 and not \vec{x_1}.
In fact, I'd simply like to modify the above code so that \vec{} is wrapped around the first character of the string, regardless of how many characters there may be.
My Attempt
I was ready to tackle this with brute force (e.g. transcribing each character of a string into a list and then reassembling); however, the real programmer on the project suggested I look into "Regular Expressions". After exploring that endless rabbit hole, I found the command regex_subst; however, I can't find any Maxima documentation for it, and am struggling to reproduce the examples in the related documentation here.
Once I can work out the appropriate regex to use, I intend to implement this in the above code using an if statement, such as:
if slength(x) >1
then {regex command}
else {regular treatment}
If anyone knows of helpful resources on any of these fronts, I'd greatly appreciate any pointers at all.
Looks like you got the regex approach working, that's great. My advice about handling subscripted expressions in TeX, however, is to avoid working with names which contain underscores in Maxima, and instead work with Maxima expressions with indices, e.g. foo[k] instead of foo_k. While writing foo_k is a minor convenience in Maxima, you'll run into problems pretty quickly, and in order to straighten it out you might end up piling one complication on another.
E.g. Maxima doesn't know there's any relation between foo, foo_1, and foo_k -- those have no more in common than foo, abc, and xyz. What if there are 2 indices? foo_j_k will become something like foo_{j_k} by the preceding approach -- what if you want foo_{j, k} instead? (Incidentally the two are foo[j[k]] and foo[j, k] when represented by subscripts.) Another problematic expression is something like foo_bar_baz. Does that mean foo_bar[baz], foo[bar_baz] or foo_bar_baz?
The code for tex(x_y) yielding x_y in TeX is pretty old, so it's unlikely to go away, but over the years I've come to increasingly feel that it should be avoided. However, the last time it came up and I proposed disabling that, there were enough people who supported it that we ended up keeping it.
Something that might be helpful: there is a function texput, which allows you to specify how a symbol should appear in TeX output. For example:
(%i1) texput (v, "\\vec{v}");
(%o1) "\vec{v}"
(%i2) tex ([v, v[1], v[k], v[j[k]], v[j, k]]);
$$\left[ \vec{v} , \vec{v}_{1} , \vec{v}_{k} , \vec{v}_{j_{k}} ,
\vec{v}_{j,k} \right] $$
(%o2) false
texput can modify various aspects of TeX output; you can take a look at the documentation (see ? texput).
While I didn't expect to work this out on my own, after several hours I made some progress, so I figured I'd share it here in case anyone else can benefit from the time I put in.
To load the regex package in wxMaxima, at least on the macOS version, simply type load("sregex");. I didn't have this loaded, and was trying to work through our custom platform, which cost me several hours.
Take note that many of the arguments in the linked documentation by Dorai Sitaram occur in reverse, or in a different order, compared with their corresponding Maxima versions.
Not all the "pregexp" functions exist in Maxima.
In addition to this, escaping special characters varied in important ways between wxMaxima, the inline Maxima compiler (running within Ace editor) and the actual rendered version on our platform; in particular, the inline compiler often returned false for expressions that compiled properly in wxMaxima and on the platform. Because I didn't have sregex loaded on wxMaxima from the beginning, I lost a lot of time to this.
Finally, the regex expression that achieved the desired substitution, in my case, was:
regex_subst("\vec{\\1}", "([[:alpha:]])", "v_1");
which returns vec{v}_1 in wxMaxima (N.B. none of my attempts to get wxMaxima to return \vec{v}_1 were successful; escaping the backslash just does not seem to work; fortunately, the usual escaped version \\vec{\\1} does return the desired form).
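For comparison, the same "wrap only the first letter" substitution can be sketched in Python rather than Maxima. The function names below are hypothetical and are not part of sregex or of the platform described above. The replacement is built by a function instead of a template string, which sidesteps the backslash-escaping problem described above.

```python
import re

def wrap_first_letter(s):
    # Wrap only the first alphabetic character in \vec{...}.
    # A function replacement avoids template escaping rules for "\".
    return re.sub(r"[A-Za-z]", lambda m: "\\vec{" + m.group(0) + "}", s, count=1)

def make_katex_vector(x):
    # Accept a string, a list, or a list of lists (handled recursively).
    if isinstance(x, str):
        return wrap_first_letter(x)
    if isinstance(x, list):
        return [make_katex_vector(item) for item in x]
    raise TypeError("expected a string, a list or a list of lists")

print(make_katex_vector("v_1"))
print(make_katex_vector([["a", "x_1"], ["foo_k"]]))
```

Because the recursion handles nesting, no separate branch is needed for the list-of-lists case.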
I have yet to adjust the code for the rest of the function, but I doubt that will be of use to anyone else, and wanted to be sure to post an update here, before anyone else took time to assist me.
Always interested in better methods / practices or any other pointers / feedback.
I was looking at the question posed in this stackoverflow link (Regular expression for odd number of a's) for which it is asked to find the regular expression for strings that have odd number of a over Σ = {a,b}.
The answer given by the top comment which works is b*(ab*ab*)*ab*.
I am quite confused - a was placed just before the last b*, does this ordering actually matter? Why can't it be b*a(ab*ab*)*b* instead (where a is placed after the first b*), or any other permutation of it?
Another thing I am confused about is why it is (ab*ab*)* and not (b*ab*ab*)*. Isn't b*ab*ab* the more accurate definition of 'having exactly 2 a'?
Why can't it be b*a(ab*ab*)*b* instead?
b*a(ab*ab*)*b* does not work because, whenever the string contains more than one a, it requires the first two a's to be consecutive. For example, abaa would not be matched by your proposed regex when it should be. Use the regex debugger on a site like Regex101 to see this for yourself.
On the other hand, moving the whole ab* part to the start (b*ab*(ab*ab*)*) works as well.
why it is (ab*ab*)* and not (b*ab*ab*)*?
(b*ab*ab*)* does work, but the first b* is redundant: any b's that precede an a inside the group can already be matched by the last b* of the previous repetition or, for the first repetition, by the b* that precedes the group.
There are infinitely many equivalent regular expressions which generate a given (infinite) regular language. A particular expression might be preferable in some cases and by certain authors: one might prefer a minimal expression, or one which shows structure or symmetry, or even one that simplifies the reasoning in a proof by induction.
Your particular suggestion to move the a is insufficient since, as noted above, it forces the substring aa to appear in any string with more than one a. However, ab*ab* could be changed to b*ab*a to make that placement work. Choosing b*ab*ab* would work with either placement. You could even go for an expression like bab + bababab + (babab*)a(babab*) which might be nice to work with depending on your application. Something like b*(ab*ab*)*ab* has the advantage of being minimal (if it's not strictly minimal, it must be pretty close).
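Claims like these are easy to check by brute force. The sketch below assumes Python's re syntax is close enough to the textbook notation for these expressions. It compares each pattern against the definition "odd number of a's" on every string over {a, b} up to a small length; the helper name first_counterexample is made up for this illustration.

```python
import re
from itertools import product

def first_counterexample(pattern, max_len=8):
    # Return a string where the pattern disagrees with
    # "contains an odd number of a's", or None if none is found.
    rx = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            matched = rx.fullmatch(s) is not None
            if matched != (s.count("a") % 2 == 1):
                return s
    return None

for pattern in ["b*(ab*ab*)*ab*",      # accepted answer
                "b*ab*(ab*ab*)*",      # whole "ab*" part moved to the front
                "b*(b*ab*ab*)*ab*",    # redundant leading b* in the group
                "b*a(ab*ab*)*b*"]:     # only the a moved: misses abaa
    print(pattern, "->", first_counterexample(pattern))
```

A result of None only means that no disagreement was found up to max_len; it is a sanity check, not a proof of equivalence.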
Taken from Introduction to Ada—If expressions:
Ada's if expressions are similar to if statements. However, there are a few differences that stem from the fact that it is an expression:
All branches' expressions must be of the same type
It must be surrounded by parentheses if the surrounding expression does not already contain them
An else branch is mandatory unless the expression following then has a Boolean value. In that case an else branch is optional and, if not present, defaults to else True.
I do not understand the need to have two different ways of constructing code with the if keyword. What is the reasoning behind this?
Also, there are case expressions and case statements. Why is this?
I think this is best answered by quoting the Ada 2012 Rationale Chapter 3.1:
One of the key areas identified by the WG9 guidance document [1] as
needing attention was improving the ability to write and enforce
contracts. These were discussed in detail in the previous chapter.
When defining the new aspects for preconditions, postconditions, type
invariants and subtype predicates it became clear that without more
flexible forms of expressions, many functions would need to be
introduced because in all cases the aspect was given by an expression.
However, declaring a function and thus giving the detail of the
condition, invariant or predicate in the function body makes the
detail of the contract rather remote for the human reader. Information
hiding is usually a good thing but in this case, it just introduces
obscurity. Four forms are introduced, namely, if expressions, case
expressions, quantified expressions and expression functions. Together
they give Ada some of the flexible feel of a functional language.
In addition, if statements and case statements often assign different values to the same variable in all branches, and nothing else:
if Foo > 10 then
Bar := 1;
else
Bar := 2;
end if;
In this case, an if expression may increase readability and more clearly state in the code what's going on:
Bar := (if Foo > 10 then 1 else 2);
We can now see that there's no longer a need for the maintainer of the code to read a whole if statement in order to see that only a single variable is updated.
Same goes for case expressions, which can also reduce the need for nesting if expressions.
Also, I can throw the question back to you: Why do C-based languages have the ternary operator ?: in addition to if statements?
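The same split exists outside Ada and the C family. As a small illustration (in Python, not Ada, with hypothetical function names), the statement form and the expression form of the Bar example compute the same value, but only the expression form can appear inside another expression:

```python
def bar_with_statement(foo):
    # Statement form: the reader must scan both branches to see
    # that only one variable is being assigned.
    if foo > 10:
        bar = 1
    else:
        bar = 2
    return bar

def bar_with_expression(foo):
    # Expression form: a single value is chosen in place.
    return 1 if foo > 10 else 2

# The expression form can be nested where a statement is not allowed.
labels = [("big" if n > 10 else "small") for n in (3, 42)]
print(bar_with_statement(11), bar_with_expression(11), labels)
```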
Egilhh already covered the main reason, but there are sometimes other useful reasons to implement expressions. Sometimes you make packages where only one or two methods are needed and they are the only reason to make a package body. You can use expressions to make expression functions which allow you to define the operations in the spec file.
Additionally, if you ever end up with some complex variant record combinations, sometimes expressions can be used to setup default values for them in instances where you normally would not be able to as cleanly. Consider the following example:
with Ada.Text_IO; use Ada.Text_IO;
procedure Hello is
type Binary_Type is (On, Off);
type Inner(Binary : Binary_Type := Off) is record
case Binary is
when On =>
Value : Integer := 0;
when Off =>
null;
end case;
end record;
type Outer(Some_Flag : Boolean) is record
Other : Integer := 32;
Thing : Inner := (if Some_Flag then
(Binary => Off)
else
(Binary => On, Value => 23));
end record;
begin
Put_Line("Hello, world!");
end Hello;
I had something come up with a more complex setup that was meant to map to a complex messaging interface at the hardware level. It's nice to have defaults whenever possible. Now I could have used a case inside of Outer, but then I would have had to come up with two separately named versions of the message field for each case, which really isn't optimal when you want your code to map to an ICD. Again, I could have used a function to initialize it as well, but as noted in the other poster's answer, that isn't always a good way to go.
Another place that outlines the motivation for adding conditional expressions to Ada can be found in the ARG document, AI05-0147-1, which explains the motivation and gives some examples of use.
An example of a place where I find them quite useful is in processing command line parameters, for the case when a default value is used if the parameter is not specified on the command line. Generally, you'd want to declare such values as constants in your program. Conditional expressions make it easier to do that.
with Ada.Command_Line; use Ada;
procedure Main
is
N : constant Positive :=
(if Command_Line.Argument_Count = 0 then 2_000_000
else Positive'Value (Command_Line.Argument (1)));
...
Otherwise, without conditional expressions, you'd need to declare a function to achieve the same effect, which I find more difficult to read:
with Ada.Command_Line; use Ada;
procedure Main
is
function Get_N return Positive is
begin
if Command_Line.Argument_Count = 0 then
return 2_000_000;
else
return Positive'Value (Command_Line.Argument (1));
end if;
end Get_N;
N : constant Positive := Get_N;
...
The if expression in Ada feels and works a lot like a statement using the ternary operator in the C-based languages. I took the liberty of copying some code from learn.adacore.com that introduces the if expression:
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
procedure Check_Positive is
N : Integer;
begin
Put ("Enter an integer value: ");
Get (N);
Put (N,0);
declare
S : constant String :=
(if N > 0 then " is a positive number"
else " is not a positive number");
begin
Put_Line (S);
end;
end Check_Positive;
And I translated it to a C-based language, in this case Java. I believe the main point to notice is that both languages, although syntactically different, are effectively doing the same thing: testing a condition and assigning one of two values to a variable, all within one statement. I realize this is an oversimplification for most here on Stack Overflow; my goal is to help beginners understand the basic concept with introductory examples. Cheers.
import java.util.Scanner;
public class IfExpression {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Enter an integer value: ");
var N = in.nextInt();
System.out.print(N);
var S = N > 0 ? " is a positive number" : " is not a positive number";
System.out.println(S);
in.close();
}
}
I am wondering whether it is possible to use zero-parameter options multiple times with boost::program_options.
I have something in mind like this:
mytool --load myfile --print_status --do-something 23 --print_status
It is easy to get this working with one "print_status" parameter, but it is not obvious to me how one could use this option two times (in my case, boost throws an exception if a zero-parameter option is specified more than once).
So, the question is:
Is there any (simple) way to achieve this with out-of-the box functionality from program_options?
Right now, it seems this is a drawback of the current program_options implementation.
P.S.:
There have already been similar questions in the past (both over four years old), where no solution was found:
http://lists.boost.org/boost-users/2006/08/21631.php
http://benjaminwolsey.de/de/node/103
This thread contains a solution, but it is not obvious whether it is a working one, and it seems rather complex for such a simple feature:
Specifying levels (e.g. --verbose) using Boost program_options
If you don't need to count the number of times the option has been specified, it's fairly easy (if a little odd); just declare the variable as vector<bool> and set the following parameters:
std::vector<bool> example;
// ...
desc.add_options()
("example,e",
po::value(&example)
->default_value(std::vector<bool>(), "false")
->implicit_value(std::vector<bool>(1), "true")
->zero_tokens()
)
// ...
Specifying a vector suppresses multiple argument checking; default_value says that the vector should by default be empty, implicit_value says to set it to a 1-element vector if -e/--example is specified, and zero_tokens says not to consume any following tokens.
If -e or --example is specified at least once, example.size() will be exactly 1; otherwise it will be 0.
If you do want to count how many times the option occurs, it's easy enough to write a custom type and validator:
struct counter { int count = 0; };
void validate(boost::any& v, std::vector<std::string> const& xs, counter*, long)
{
if (v.empty()) v = counter{1};
else ++boost::any_cast<counter&>(v).count;
}
Note that unlike in the linked question this doesn't allow additionally specifying a value (e.g. --verbose 6) - if you want to do something that complex you would need to write a custom value_semantic subclass, as it's not supported by Boost's existing semantics.
Maybe I am not from this planet, but it would seem to me that the following should be a syntax error:
int a[] = {1,2,}; //extra comma in the end
But it's not. I was surprised when this code compiled on Visual Studio, but I have learnt not to trust MSVC compiler as far as C++ rules are concerned, so I checked the standard and it is allowed by the standard as well. You can see 8.5.1 for the grammar rules if you don't believe me.
Why is this allowed? This may be a stupid useless question but I want you to understand why I am asking. If it were a sub-case of a general grammar rule, I would understand - they decided not to make the general grammar any more difficult just to disallow a redundant comma at the end of an initializer list. But no, the additional comma is explicitly allowed. For example, it isn't allowed to have a redundant comma in the end of a function-call argument list (when the function takes ...), which is normal.
So, again, is there any particular reason this redundant comma is explicitly allowed?
It makes it easier to generate source code, and also to write code which can be easily extended at a later date. Consider what's required to add an extra entry to:
int a[] = {
1,
2,
3
};
... you have to add the comma to the existing line and add a new line. Compare that with the case where the three already has a comma after it, where you just have to add a line. Likewise if you want to remove a line you can do so without worrying about whether it's the last line or not, and you can reorder lines without fiddling about with commas. Basically it means there's a uniformity in how you treat the lines.
Now think about generating code. Something like (pseudo-code):
output("int a[] = {");
for (int i = 0; i < items.length; i++) {
output("%s, ", items[i]);
}
output("};");
No need to worry about whether the current item you're writing out is the first or the last. Much simpler.
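A runnable version of that pseudo-code might look like the following Python sketch. The function name emit_array and the four-space indent are illustrative choices, not anything from the original answer. Every element goes through the same code path, and the generated C relies on the trailing comma being legal.

```python
def emit_array(name, items):
    # Emit a C array initializer; every element line ends with a comma,
    # so the last element needs no special handling.
    lines = [f"int {name}[] = {{"]
    for item in items:
        lines.append(f"    {item},")
    lines.append("};")
    return "\n".join(lines)

print(emit_array("a", [1, 2, 3]))
```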
It's useful if you do something like this:
int a[] = {
1,
2,
3, //You can delete this line and it's still valid
};
Ease of use for the developer, I would think.
int a[] = {
1,
2,
2,
2,
2,
2, /*line I could comment out easily without having to remove the previous comma*/
};
Additionally, if for whatever reason you had a tool that generated code for you, the tool doesn't have to care about whether it's the last item in the initializer or not.
I've always assumed it makes it easier to append extra elements:
int a[] = {
5,
6,
};
simply becomes:
int a[] = {
5,
6,
7,
};
at a later date.
Everything everyone is saying about the ease of adding/removing/generating lines is correct, but the real place this syntax shines is when merging source files together. Imagine you've got this array:
int ints[] = {
3,
9
};
And assume you've checked this code into a repository.
Then your buddy edits it, adding to the end:
int ints[] = {
3,
9,
12
};
And you simultaneously edit it, adding to the beginning:
int ints[] = {
1,
3,
9
};
Semantically these sorts of operations (adding to the beginning, adding to the end) should be entirely merge safe and your versioning software (hopefully git) should be able to automerge. Sadly, this isn't the case because your version has no comma after the 9 and your buddy's does. Whereas, if the original version had a trailing comma after the 9, they would have automerged.
So, my rule of thumb is: use the trailing comma if the list spans multiple lines, don't use it if the list is on a single line.
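The effect on line-based tools can be made concrete with Python's difflib. This is only a sketch of how many lines an append touches, not a simulation of git's merge algorithm, and the helper name touched_lines is made up. Without the trailing comma, appending 12 also rewrites the line holding 9; with it, the change is a pure one-line insertion.

```python
import difflib

def touched_lines(before, after):
    # Count lines removed from `before` plus lines added in `after`.
    a, b = before.splitlines(), after.splitlines()
    ops = difflib.SequenceMatcher(None, a, b).get_opcodes()
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in ops if tag != "equal")

no_comma_before = "int ints[] = {\n    3,\n    9\n};"
no_comma_after = "int ints[] = {\n    3,\n    9,\n    12\n};"
comma_before = "int ints[] = {\n    3,\n    9,\n};"
comma_after = "int ints[] = {\n    3,\n    9,\n    12,\n};"

print(touched_lines(no_comma_before, no_comma_after))  # 9 line rewritten, 12 added
print(touched_lines(comma_before, comma_after))        # only the 12 line added
```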
I am surprised that after all this time no one has quoted the Annotated C++ Reference Manual (ARM), which says the following about [dcl.init] (emphasis mine):
There are clearly too many notations for initializations, but each seems to serve a particular style of use well. The ={initializer_list,opt} notation was inherited from C and serves well for the initialization of data structures and arrays. [...]
Although the grammar has evolved since the ARM was written, the origin remains.
We can go to the C99 rationale to see why this was allowed in C; it says:
K&R allows a trailing comma in an initializer at the end of an
initializer-list. The Standard has retained this syntax, since it
provides flexibility in adding or deleting members from an initializer
list, and simplifies machine generation of such lists.
Trailing comma I believe is allowed for backward compatibility reasons. There is a lot of existing code, primarily auto-generated, which puts a trailing comma. It makes it easier to write a loop without special condition at the end.
e.g.
for_each(my_inits.begin(), my_inits.end(),
[](const std::string& value) { std::cout << value << ",\n"; });
There isn't really any advantage for the programmer.
P.S. Though it is easier to autogenerate the code this way, I actually always took care not to put the trailing comma, the efforts are minimal, readability is improved, and that's more important. You write code once, you read it many times.
I see one use case that was not mentioned in other answers,
our favorite Macros:
int a [] = {
#ifdef A
1, //this can be last if B and C is undefined
#endif
#ifdef B
2,
#endif
#ifdef C
3,
#endif
};
Adding macros to handle the last , would be a big pain. With this small change in syntax, this is trivial to manage. And this is more important than machine-generated code, because it is usually a lot easier to do it in a Turing-complete language than in the very limited preprocessor.
One of the reasons this is allowed as far as I know is that it should be simple to automatically generate code; you don't need any special handling for the last element.
It makes code generators that spit out arrays or enumerations easier.
Imagine:
std::cout << "enum Items {\n";
for(Items::iterator i(items.begin()), j(items.end()); i != j; ++i)
std::cout << *i << ",\n";
std::cout << "};\n";
I.e., no need to do special handling of the first or last item to avoid spitting the trailing comma.
If the code generator is written in Python, for example, it is easy to avoid spitting the trailing comma by using str.join() function:
print("enum Items {")
print(",\n".join(items))
print("};")
The reason is trivial: ease of adding/removing lines.
Imagine the following code:
int a[] = {
1,
2,
//3, // - not needed any more
};
Now, you can easily add/remove items to the list without having to add/remove the trailing comma sometimes.
In contrast to other answers, I don't really think that ease of generating the list is a valid reason: after all, it's trivial for the code to special-case the last (or first) line. Code-generators are written once and used many times.
It allows every line to follow the same form. Firstly this makes it easier to add new rows and have a version control system track the change meaningfully and it also allows you to analyze the code more easily. I can't think of a technical reason.
The only language where it's - in practice* - not allowed is Javascript, and it causes an innumerable amount of problems. For example if you copy & paste a line from the middle of the array, paste it at the end, and forget to remove the comma then your site will be totally broken for your IE visitors.
*In theory it is allowed but Internet Explorer doesn't follow the standard and treats it as an error
It's easier for machines, i.e. parsing and generation of code.
It's also easier for humans, i.e. modification, commenting-out, and visual-elegance via consistency.
Assuming C, would you write the following?
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
puts("Line 1");
puts("Line 2");
puts("Line 3");
return EXIT_SUCCESS
}
No. Not only because the final statement is an error, but also because it's inconsistent. So why do the same to collections? Even in languages that allow you to omit last semicolons and commas, the community usually doesn't like it. The Perl community, for example, doesn't seem to like omitting semicolons, bar one-liners. They apply that to commas too.
Don't omit commas in multiline collections for the same reason you don't omit semicolons for multiline blocks of code. I mean, you wouldn't do it even if the language allowed it, right? Right?
This is allowed to protect from mistakes caused by moving elements around in a long list.
For example, let's assume we have code that looks like this.
#include <iostream>
#include <string>
#include <cstddef>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
int main() {
std::string messages[] = {
"Stack Overflow",
"Super User",
"Server Fault"
};
size_t i;
for (i = 0; i < ARRAY_SIZE(messages); i++) {
std::cout << messages[i] << std::endl;
}
}
And it's great, as it shows the original trilogy of Stack Exchange sites.
Stack Overflow
Super User
Server Fault
But there is one problem with it. You see, the footer on this website shows Server Fault before Super User. Better fix that before anyone notices.
#include <iostream>
#include <string>
#include <cstddef>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
int main() {
std::string messages[] = {
"Stack Overflow",
"Server Fault"
"Super User",
};
size_t i;
for (i = 0; i < ARRAY_SIZE(messages); i++) {
std::cout << messages[i] << std::endl;
}
}
After all, moving lines around couldn't be that hard, could it be?
Stack Overflow
Server FaultSuper User
I know, there is no website called "Server FaultSuper User", but our compiler claims it exists. Now, the issue is that C has a string concatenation feature, which allows you to write two double-quoted strings next to each other and have them concatenated with nothing in between (a similar issue can also happen with integers, as the - sign has multiple meanings).
Now what if the original array had a useless comma at the end? Well, the lines would be moved around, but such a bug wouldn't have happened. It's easy to miss something as small as a comma. If you remember to put a comma after every array element, such a bug just cannot happen. You wouldn't want to waste four hours debugging something, only to find that a comma was the cause of your problems.
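Python happens to share this behaviour: adjacent string literals are concatenated there too. That allows a short, runnable illustration of the same pitfall, in Python rather than the C++ of the example above:

```python
# The comma was lost when the lines were swapped, so the two
# adjacent literals silently merge into a single element.
messages_without_comma = [
    "Stack Overflow",
    "Server Fault"
    "Super User",
]

# With a comma after every element, reordering lines is safe.
messages_with_commas = [
    "Stack Overflow",
    "Server Fault",
    "Super User",
]

print(len(messages_without_comma), messages_without_comma[1])
print(len(messages_with_commas))
```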
Like many things, the trailing comma in an array initializer is one of the things C++ inherited from C (and will have to support forever). A view totally different from those given here is mentioned in the book "Deep C Secrets".
There, after an example containing more than one "comma paradox":
char *available_resources[] = {
"color monitor" ,
"big disk" ,
"Cray" /* whoa! no comma! */
"on-line drawing routines",
"mouse" ,
"keyboard" ,
"power cables" , /* and what's this extra comma? */
};
we read :
...that trailing comma after the final initializer is not a typo, but a blip in the syntax carried over from aboriginal C. Its presence or absence is allowed but has no significance. The justification claimed in the ANSI C rationale is that it makes automated generation of C easier. The claim would be more credible if trailing commas were permitted in every comma-separated list, such as in enum declarations, or multiple variable declarators in a single declaration. They are not.
... to me this makes more sense
In addition to code generation and editing ease, if you want to implement a parser, this type of grammar is simpler and easier to implement. C# follows this rule in several places where there's a list of comma-separated items, like items in an enum definition.
It makes generating code easier as you only need to add one line and don't need to treat adding the last entry as if it's a special case. This is especially true when using macros to generate code. There's a push to try to eliminate the need for macros from the language, but a lot of the language did evolve hand in hand with macros being available. The extra comma allows macros such as the following to be defined and used:
#define LIST_BEGIN int a[] = {
#define LIST_ENTRY(x) x,
#define LIST_END };
Usage:
LIST_BEGIN
LIST_ENTRY(1)
LIST_ENTRY(2)
LIST_END
That's a very simplified example, but often this pattern is used by macros for defining things such as dispatch, message, event or translation maps and tables. If a comma wasn't allowed at the end, we'd need a special:
#define LIST_LAST_ENTRY(x) x
and that would be very awkward to use.
So that when two people add a new item in a list on separate branches, Git can properly merge the changes, because Git works on a line basis.
It makes editing the code a lot easier.
I'm comparing editing C/C++ array elements with editing JSON documents - if you forget to remove the last comma, the JSON will not parse. (Yes, I know JSON is not meant to be edited manually.)
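The JSON side of that comparison is easy to reproduce with Python's standard json module (the helper name parses_as_json is made up for this sketch):

```python
import json

def parses_as_json(text):
    # Return True if `text` is valid JSON according to json.loads.
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses_as_json("[1, 2]"))   # well-formed array
print(parses_as_json("[1, 2,]"))  # trailing comma is rejected
```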
If you use an array without specified length,VC++6.0 can automaticly identify its length,so if you use "int a[]={1,2,};"the length of a is 3,but the last one hasn't been initialized,you can use "cout<