String functions in MediaWiki template? - templates

One of the more interesting "programming languages" I've been stuck with lately is MediaWiki templates. You can do a surprising amount of stuff with the limited syntax they give you, but recently I've run into a problem that stumps me: using string functions on template arguments. What I'd like to do (somewhat simplified) is:
{{myTemp|a=1,2,3,4}}
then write a template that can do some sort of magic like
You told me _a_ starts with {{#split:{{{a}}}, ",", 0}}
At present, I can do this with embedded JavaScript, capturing regexp matching, and document.write, but a) it's huge, b) it's hacky, and c) it will break horribly if anybody turns off JavaScript. (Note that "split" is merely an example; concatenation, capturing-regexp matching, etc., would be even better.)
I realize the right solution is to have the caller invoke the template with separate arguments, but for various reasons that would be hard in my particular case. If it's simply not possible, I guess that's the answer, but if there is some way to have templates do string-manipulation on the back end, that'd be great.

Concatenation is easy. To assign x = y concat z (using #vardefine, which comes from the Variables extension):
{{#vardefine:x|{{{y}}}{{{z}}}}}
And, to add to Mark's answer, there are also RegexParserFunctions
Ceterum censeo: MediaWiki will never be not hacky.

You can do this with extensions, e.g. StringFunctions. But see also ParserFunctions and ParserFunctions/Extended. (You'll find a lot more examples in the Category:Parser function extensions.)
A great overview is Help:Extension:ParserFunctions.

Related

Abstract structure of Clojure

I've been learning Clojure and am a good way through a book on it when I realized how much I'm still struggling to interpret the code. What I'm looking for is the abstract structure, interface, or rules, Clojure uses to parse code. I think it looks something like:
(some-operation optional-args)
optional-args can be nearly anything and that's where I start getting confused.
(operation optional-name-string [vector of optional args]) would equal (defn newfn [argA, argB])
I think this pattern holds for all lists () but with so much flexibility and variation in Clojure, I'm not sure. It would be really helpful to see the rules the interpreter follows.
You are not crazy. Sure, it's easy to point out how "easy" ("simple"? but that's another discussion) Clojure syntax is, but there are two things a new learner should be aware of that beginning tutorials do not point out very clearly, and that greatly complicate understanding what you are seeing:
Destructuring. Spend some quality time with guides on destructuring in Clojure. I will say that this adds a complexity to the language and is not dissimilar from "*args" and "**kwargs" arguments in Python or from the use of the "..." spread operator in javascript. They are all complicated enough to require some dedicated time to read. This relates to the optional-args you reference above.
Macros and metaprogramming. You say you wish to "see the rules the interpreter follows" for the some-operation you reference above. In the majority of cases it is a function, but Clojure gives you no indication of whether you are looking at a function or a macro. In the standard library, you will just need to know some standard macros and how they affect the syntax they headline (e.g. if, defn, etc.). For included libraries, there will typically be a small set of macros that are core to understanding that library. Any macro will modify, dare I say complicate, the syntax in the parens you are looking at, so be on your toes.
Clojure is fantastic and easy to learn but those two points are not to be glossed over IMHO.
Before you start coding in Clojure, I highly recommend studying functional programming and Lisp. Clojure uses prefix notation: when you want to run a specific function, you call it and then feed it some arguments. For example, 1+2+3 is (+ 1 2 3) in Clojure. In other words, every function you call sits at the start of a parenthesized list, and all of its arguments follow the function name.
If you define a function, you can do it as follows:
(defn newfunc [a1 a2]
  (+ 100 a1 a2))
Here newfunc adds 100, a1, and a2. When you call it, you should do this:
(newfunc 1 2)
and the result will be 103.
In the first example, + is a function, so we call it at the beginning of the parentheses.
Clojure is a beautiful world full of simplicity. Please learn it deeply.

Maxima: creating a function that acts on parts of a string

Context: I'm using Maxima on a platform that also uses KaTeX. For various reasons related to content management, this means that we are regularly using Maxima functions to generate the necessary KaTeX commands.
I'm currently trying to develop a group of functions that will facilitate generating different sets of strings corresponding to KaTeX commands for various symbols related to vectors.
Problem
I have written the following function makeKatexVector(x), which takes a string, list or list-of-lists and returns the same type of object, with each string wrapped in \vec{} (i.e. makeKatexVector(string) returns \vec{string} and makeKatexVector(["a","b"]) returns ["\vec{a}", "\vec{b}"] etc).
/* Flexible Make KaTeX Vector Version of List Items */
makeKatexVector(x):= block([ placeHolderList : x ],
  if stringp(x) /* Special Handling if x is Just a String */
    then placeHolderList : concat("\vec{", x, "}")
  else if listp(x[1]) /* check to see if it is a list of lists */
    then for j:1 thru length(x)
      do placeHolderList[j] : makelist(concat("\vec{", k ,"}"), k, x[j] )
  else if listp(x) /* check to see if it is just a list */
    then placeHolderList : makelist(concat("\vec{", k, "}"), k, x)
  else placeHolderList : "makeKatexVector error: not a list-of-lists, a list or a string",
  return(placeHolderList));
Although I have my doubts about the efficiency or elegance of the above code, it seems to return the desired expressions; however, I would like to modify this function so that it can distinguish between single- and multi-character strings.
In particular, I'd like multi-character strings like x_1 to be returned as \vec{x}_1 and not \vec{x_1}.
In fact, I'd simply like to modify the above code so that \vec{} is wrapped around the first character of the string, regardless of how many characters there may be.
My Attempt
I was ready to tackle this with brute force (e.g. transcribing each character of a string into a list and then reassembling); however, the real programmer on the project suggested I look into "Regular Expressions". After exploring that endless rabbit hole, I found the command regex_subst; however, I can't find any Maxima documentation for it, and am struggling to reproduce the examples in the related documentation here.
Once I can work out the appropriate regex to use, I intend to implement this in the above code using an if statement, such as:
if slength(x) >1
then {regex command}
else {regular treatment}
If anyone knows of helpful resources on any of these fronts, I'd greatly appreciate any pointers at all.
Looks like you got the regex approach working, that's great. My advice about handling subscripted expressions in TeX, however, is to avoid working with names which contain underscores in Maxima, and instead work with Maxima expressions with indices, e.g. foo[k] instead of foo_k. While writing foo_k is a minor convenience in Maxima, you'll run into problems pretty quickly, and in order to straighten it out you might end up piling one complication on another.
E.g. Maxima doesn't know there's any relation between foo, foo_1, and foo_k -- those have no more in common than foo, abc, and xyz. What if there are 2 indices? foo_j_k will become something like foo_{j_k} by the preceding approach -- what if you want foo_{j, k} instead? (Incidentally the two are foo[j[k]] and foo[j, k] when represented by subscripts.) Another problematic expression is something like foo_bar_baz. Does that mean foo_bar[baz], foo[bar_baz] or foo_bar_baz?
The code for tex(x_y) yielding x_y in TeX is pretty old, so it's unlikely to go away, but over the years I've come to increasingly feel that it should be avoided. However, the last time it came up and I proposed disabling that, there were enough people who supported it that we ended up keeping it.
Something that might be helpful, there is a function texput which allows you to specify how a symbol should appear in TeX output. For example:
(%i1) texput (v, "\\vec{v}");
(%o1) "\vec{v}"
(%i2) tex ([v, v[1], v[k], v[j[k]], v[j, k]]);
$$\left[ \vec{v} , \vec{v}_{1} , \vec{v}_{k} , \vec{v}_{j_{k}} ,
\vec{v}_{j,k} \right] $$
(%o2) false
texput can modify various aspects of TeX output; you can take a look at the documentation (see ? texput).
While I didn't expect that I'd work this out on my own, after several hours, I made some progress, so figured I'd share here, in case anyone else may benefit from the time I put in.
- To load the regex package in wxMaxima, at least on the macOS version, simply type load("sregex");. I didn't have this loaded, and was trying to work through our custom platform, which cost me several hours.
- Take note that many of the arguments in the linked documentation by Dorai Sitaram occur in reverse, or in a different order, than they do in their corresponding Maxima versions.
- Not all the "pregexp" functions exist in Maxima.
- In addition, escaping special characters varied in important ways between wxMaxima, the inline Maxima compiler (running within Ace editor) and the actual rendered version on our platform; in particular, the inline compiler often returned false for expressions that compiled properly in wxMaxima and on the platform. Because I didn't have sregex loaded in wxMaxima from the beginning, I lost a lot of time to this.
Finally, the regex expression that achieved the desired substitution, in my case, was:
regex_subst("\vec{\\1}", "([[:alpha:]])", "v_1");
which returns vec{v}_1 in wxMaxima (N.B. none of my attempts to get wxMaxima to return \vec{v}_1 were successful; escaping the backslash just does not seem to work; fortunately, the usual escaped version \\vec{\\1} does return the desired form).
I have yet to adjust the code for the rest of the function, but I doubt that will be of use to anyone else, and wanted to be sure to post an update here, before anyone else took time to assist me.
Always interested in better methods / practices or any other pointers / feedback.
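For comparison, the substitution described above (wrap only the first letter in \vec{}, and apply it to a string, a list, or a list of lists) can be sketched in Python. This is an illustration using Python's re module, not Maxima code. The names wrap_first_letter and make_katex_vector are hypothetical, and reading "the first character" as the first ASCII letter is an assumption that mirrors the [[:alpha:]] pattern.

```python
import re


def wrap_first_letter(name: str) -> str:
    """Wrap the first ASCII letter of `name` in \\vec{...}; leave the rest untouched."""
    # count=1 limits the substitution to the first match only.
    # A function is used as the replacement so no backslash escaping rules apply to it.
    return re.sub(
        r"[A-Za-z]",
        lambda m: "\\vec{" + m.group(0) + "}",
        name,
        count=1,
    )


def make_katex_vector(x):
    """Apply wrap_first_letter to a string, a list, or nested lists of strings."""
    if isinstance(x, str):
        return wrap_first_letter(x)
    if isinstance(x, list):
        # Recursion handles both flat lists and lists of lists.
        return [make_katex_vector(item) for item in x]
    raise TypeError("expected a string, a list, or a list of lists")


print(make_katex_vector("x_1"))
print(make_katex_vector([["a", "b_2"], ["c"]]))
```

Recursing on lists avoids the separate list and list-of-lists branches, and it builds a new list instead of modifying the input in place.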

Regex that matches a list of comma separated items in any order

I have three "Clue texts" that say:
SomeClue=someText
AnotherClue=somethingElse
YetAnotherClue=moreText
I need to parse a string and see if it contains exactly these 3 texts, separated by a comma. No Clue Text contains any comma.
The problem is, they can be in any order and they must be the only clues in the string.
Matches:
SomeClue=someText,AnotherClue=somethingElse,YetAnotherClue=moreText
SomeClue=someText,YetAnotherClue=moreText,AnotherClue=somethingElse
AnotherClue=somethingElse,SomeClue=someText,YetAnotherClue=moreText
YetAnotherClue=moreText,SomeClue=someText,AnotherClue=somethingElse
Non-Matches:
SomeClue=someText,AnotherClue=somethingElse,YetAnotherClue=moreText,
SomeClue=someText,YetAnotherClue=moreText,,AnotherClue=somethingElse
,AnotherClue=somethingElse,SomeClue=someText,YetAnotherClue=moreText
YetAnotherClue=moreText,SomeClue=someText,AnotherClue=somethingElse,UselessText
YetAnotherClue=moreText,SomeClue=someText,AnotherClue=somethingElse,AClueThatIDontWant=wrongwrongwrong
Putting together what I found on other posts, I have:
(?=.*SomeClue=someText($|,))(?=.*AnotherClue=somethingElse($|,))(?=.*YetAnotherClue=moreText($|,))
This works as far as Clues and their order are concerned.
Unfortunately, I can't find a way to avoid adding a comma and then some stupid text at the end.
My real case has somewhat more complicated Clue Texts, because each of them is a small regex, but I am pretty sure once I know how to handle commas, the rest will be easy.
I think you'd be better off with a stronger tool than regexes (and I genuinely love regular expressions). Regexes aren't good with needing supplementary memory, which is what you have here: you need exactly these 3, but they can come in any order.
In principle, you could write a regex for each of the 6 permutations. But that would never scale. You ought to use something with parsing power.
I suggest writing a verification function in your favorite scripting language, made up of underlying string functions.
In basic Python, you could do (for instance)
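ref = {'SomeClue=someText', 'AnotherClue=somethingElse', 'YetAnotherClue=moreText'}

def ismatch(myline):
    # Split on commas; require exactly the reference items, with no duplicates or extras
    splt = myline.split(',')
    return len(splt) == len(ref) and ref == set(splt)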
You can tweak that as necessary, of course. Note that this nearly-complete solution is not really longer, and much more readable, than any regex would be.
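If a single regex is still required, one possible sketch keeps the lookaheads from the question, makes each one match a complete comma-delimited item, and adds a structural check that the string holds exactly three non-empty items. This assumes Python's re module, distinct clue texts, and clue texts treated as literals via re.escape (the asker's real clues are small regexes, which would be substituted unescaped). The names build_pattern and is_match are hypothetical.

```python
import re

CLUES = ["SomeClue=someText", "AnotherClue=somethingElse", "YetAnotherClue=moreText"]


def build_pattern(clues):
    # One lookahead per clue: the clue must appear as a whole item,
    # i.e. preceded by start-of-string or a comma and followed by a comma or the end.
    lookaheads = "".join(
        r"(?=.*(?:^|,)" + re.escape(clue) + r"(?:,|$))" for clue in clues
    )
    # Structure: exactly len(clues) non-empty items separated by single commas.
    body = r"[^,]+(?:,[^,]+){" + str(len(clues) - 1) + r"}"
    return re.compile(lookaheads + body)


PATTERN = build_pattern(CLUES)


def is_match(line):
    # fullmatch forces the structural part to consume the entire string.
    return PATTERN.fullmatch(line) is not None


print(is_match("SomeClue=someText,YetAnotherClue=moreText,AnotherClue=somethingElse"))
print(is_match("SomeClue=someText,AnotherClue=somethingElse,YetAnotherClue=moreText,"))
```

Because there are exactly three items and three distinct clues must each appear as an item, no extra or duplicate item can slip through.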

Regular Expression for whole word

First of all, I use C# 4.0 to parse the code of a VB6 application.
I have some old VB6 code and about 500+ copies of it. And I use a regular expression to grab all kinds of global variables from the code. The code is described as "Yuck" and some poor victim still has to support this. So I'm hoping to help this poor sucker a bit by generating overviews of specific constants. (And yes, it should be rewritten but it ain't broke, so...)
This is a sample of a code line I need to match, in this case all boolean constants:
Public Const gDemo = False 'Is this a demo version
And this is the regular expression I use at this moment:
Public\s+Const\s+g(?'Name'[a-zA-Z][a-zA-Z0-9]*)\s+=\s+(?'Value'[0-9]*)
I think it is yucky too, because of the * at the end of the boolean group. But if I don't use it, it only returns 'T' or 'F'. I want the whole word.
Is this the proper RegEx to use as solution or is there an even nicer-looking option?
FYI, I use similar regexs to find all string constants and all numeric constants. Those work just fine. And basically the same .BAS file is used for all 50 copies but with different values for all these variables. By parsing all files, we have a good overview of how every version is configured.
And again, yes, we need to rebuild the whole project from scratch since it becomes harder to maintain these days. But it works and we need the manpower for other tasks. It just needs the occasional tweaks...
You can use: Public\s+Const\s+g(?<Name>[a-zA-Z][a-zA-Z0-9]*)\s+=\s+(?<Value>False|True)
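The alternation can be tried outside .NET as well. Below is a rough Python sketch of the same idea; note that Python spells named groups (?P<Name>...) rather than (?<Name>...). The trailing \b and the helper name find_bool_constants are additions for illustration, not part of the pattern above.

```python
import re

# Same shape as the .NET pattern, with Python's named-group syntax.
# The trailing \b (an assumption) stops a match inside a longer word such as "Falsey".
BOOL_CONST = re.compile(
    r"Public\s+Const\s+g(?P<Name>[a-zA-Z][a-zA-Z0-9]*)\s+=\s+(?P<Value>False|True)\b"
)


def find_bool_constants(source: str) -> dict:
    """Map each boolean constant name (without the leading g) to its Python bool value."""
    return {
        m.group("Name"): m.group("Value") == "True"
        for m in BOOL_CONST.finditer(source)
    }


sample = "Public Const gDemo = False 'Is this a demo version\nPublic Const gCount = 5\n"
print(find_bool_constants(sample))
```

Matching the literal words True and False, instead of a character class followed by *, captures the whole word and also rejects numeric and string constants.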

hierarchical regex expression

Is it possible/practical to build a single regular expression that matches hierarchical data?
For example:
<h1>Action</h1>
<h2>Title1</h2><div>data1</div>
<h2>Title2</h2><div>data2</div>
<h1>Adventure</h1>
<h2>Title3</h2><div>data3</div>
I would like to end up with matches.
"Action", "Title1", "data1"
"Action", "Title2", "data2"
"Adventure", "Title3", "data3"
As I see it, this would require knowing that there is a hierarchical structure at play here. If I code the pattern to capture the H1, it only matches the first entry of that hierarchy; if I don't code for H1, then I can't capture it. I was wondering if there are any special tricks I can employ to solve this.
This is a .NET project.
The solution is to not use regular expressions. They're not powerful enough for this sort of thing.
What you want is a parser - since it looks like you're trying to match HTML, there are plenty to choose from.
It's generally considered bad practice to attempt to parse HTML/XML with RegEx, precisely because it's hierarchical. You COULD use a recursive function to do so, but a better solution in this case is to use a real XML parser. I couldn't give you better advice than that without knowing the platform you're using.
EDIT: Regex is also very slow, which is another reason it's bad for processing HTML; however, I don't know that an XML/DOM processor is likely to be faster since it's likely to use a lot more memory.
If you JUST want data from a simple document like you've demonstrated, and/or if you want to build a solution yourself, it's not that tough to do. Just build a simple, recursive state-based stream processor that looks for tags and passes the contents to the next recursive level.
For example:
- In a recursive function, seek out a "<" character.
- Now find a ">" character.
- Preserve everything you find until the next "<" character.
- Find a ">" character.
- Pass whatever you found between those tags into the recursive function.
You'd have to work out error checking yourself, but the base case (when you return back up to the previous level) is just when there's nothing else to find.
Maybe this helps, maybe not. Good luck to you.
Regex does not work for this type of data. It is not regular, per se.
You should use an XML parser for this.
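To make the parser suggestion concrete, here is a minimal sketch in Python using the standard library's html.parser (the question is about .NET, so this only illustrates the approach, not a drop-in answer). The class name OutlineParser and the choice to remember the most recent h1 and h2 are assumptions made for the sample document in the question.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (h1, h2, div) triples by remembering the most recent headings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._tag = None   # tag currently open, if any
        self._h1 = None    # most recent <h1> text
        self._h2 = None    # most recent <h2> text

    def handle_starttag(self, tag, attrs):
        self._tag = tag

    def handle_endtag(self, tag):
        self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        if self._tag == "h1":
            self._h1 = text
        elif self._tag == "h2":
            self._h2 = text
        elif self._tag == "div":
            # A div closes one record: emit it with the headings seen so far.
            self.rows.append((self._h1, self._h2, text))


html_doc = """<h1>Action</h1>
<h2>Title1</h2><div>data1</div>
<h2>Title2</h2><div>data2</div>
<h1>Adventure</h1>
<h2>Title3</h2><div>data3</div>"""

parser = OutlineParser()
parser.feed(html_doc)
parser.close()
for row in parser.rows:
    print(row)
```

The hierarchy is carried by state (the last heading seen) rather than by the pattern itself, which is the piece a single regular expression cannot express.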