I have written an ML function, and in the output I am getting
out = Mary ("a",[Zary #,Zary #])
where Mary and Zary are constructors. But as you can see there are some "#" in the output.
If I do
val Mary("a",x) = out;
then it is showing
x = [Zary("b"),Zary("c")]; which is right.
I want to get the complete output instead of hashes. Kindly help me.
If (as seems to be the case) you're using SML/NJ, then you need to set either Control.Print.printDepth or Compiler.Control.Print.printDepth (depending on which version of SML/NJ you're using) to a value larger than its default.
The hashes are used as an abbreviation, to make complicated output more manageable. The thresholds governing how complicated a given bit of output has to be before eliding it are rather low. (There's a printLength threshold too, governing elision of long lists as opposed to deeply-nested structures.)
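For example, at the SML/NJ prompt (the exact values are a matter of taste, and on older versions the path is Compiler.Control.Print instead):

Control.Print.printDepth := 100;  (* print deeply nested structure in full *)
Control.Print.printLength := 100; (* print long lists in full *)
out; (* now prints Mary ("a",[Zary "b",Zary "c"]) instead of eliding with # *)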
See http://www.smlnj.org/doc/Compiler/pages/printcontrol.html for the official documentation.
Context: I'm using Maxima on a platform that also uses KaTeX. For various reasons related to content management, this means that we are regularly using Maxima functions to generate the necessary KaTeX commands.
I'm currently trying to develop a group of functions that will facilitate generating different sets of strings corresponding to KaTeX commands for various symbols related to vectors.
Problem
I have written the following function, makeKatexVector(x), which takes a string, a list, or a list of lists and returns the same type of object, with each string wrapped in \vec{} (i.e. makeKatexVector(string) returns \vec{string}, and makeKatexVector(["a","b"]) returns ["\vec{a}", "\vec{b}"], etc.).
/* Flexible Make KaTeX Vector Version of List Items */
makeKatexVector(x) := block([placeHolderList : x],
    if stringp(x)           /* special handling if x is just a string */
    then placeHolderList : concat("\vec{", x, "}")
    else if listp(x[1])     /* check whether it is a list of lists */
    then for j:1 thru length(x)
         do placeHolderList[j] : makelist(concat("\vec{", k, "}"), k, x[j])
    else if listp(x)        /* check whether it is just a list */
    then placeHolderList : makelist(concat("\vec{", k, "}"), k, x)
    else placeHolderList : "makeKatexVector error: not a list-of-lists, a list or a string",
    return(placeHolderList));
Although I have my doubts about the efficiency or elegance of the above code, it seems to return the desired expressions; however, I would like to modify this function so that it can distinguish between single- and multi-character strings.
In particular, I'd like multi-character strings like x_1 to be returned as \vec{x}_1 and not \vec{x_1}.
In fact, I'd simply like to modify the above code so that \vec{} is wrapped around the first character of the string, regardless of how many characters there may be.
My Attempt
I was ready to tackle this with brute force (e.g. transcribing each character of a string into a list and then reassembling); however, the real programmer on the project suggested I look into "Regular Expressions". After exploring that endless rabbit hole, I found the command regex_subst; however, I can't find any Maxima documentation for it, and am struggling to reproduce the examples in the related documentation here.
Once I can work out the appropriate regex to use, I intend to implement this in the above code using an if statement, such as:
if slength(x) > 1
then {regex command}
else {regular treatment}
If anyone knows of helpful resources on any of these fronts, I'd greatly appreciate any pointers at all.
Looks like you got the regex approach working; that's great. My advice about handling subscripted expressions in TeX, however, is to avoid working with names which contain underscores in Maxima, and instead to work with Maxima expressions with indices, e.g. foo[k] instead of foo_k. While writing foo_k is a minor convenience in Maxima, you'll run into problems pretty quickly, and in order to straighten them out you might end up piling one complication on another.
E.g. Maxima doesn't know there's any relation between foo, foo_1, and foo_k -- those have no more in common than foo, abc, and xyz. What if there are 2 indices? foo_j_k will become something like foo_{j_k} by the preceding approach -- what if you want foo_{j, k} instead? (Incidentally the two are foo[j[k]] and foo[j, k] when represented by subscripts.) Another problematic expression is something like foo_bar_baz. Does that mean foo_bar[baz], foo[bar_baz] or foo_bar_baz?
The code for tex(x_y) yielding x_y in TeX is pretty old, so it's unlikely to go away, but over the years I've come to feel increasingly that it should be avoided. However, the last time it came up and I proposed disabling it, there were enough people who supported it that we ended up keeping it.
Something that might be helpful: there is a function texput which allows you to specify how a symbol should appear in TeX output. For example:
(%i1) texput (v, "\\vec{v}");
(%o1) "\vec{v}"
(%i2) tex ([v, v[1], v[k], v[j[k]], v[j, k]]);
$$\left[ \vec{v} , \vec{v}_{1} , \vec{v}_{k} , \vec{v}_{j_{k}} ,
\vec{v}_{j,k} \right] $$
(%o2) false
texput can modify various aspects of TeX output; you can take a look at the documentation (see ? texput).
While I didn't expect to work this out on my own, after several hours I made some progress, so I figured I'd share it here in case anyone else can benefit from the time I put in.
To load the regex package in wxMaxima (at least on the macOS version), simply type load("sregex");. I didn't have this loaded and was trying to work through our custom platform, which cost me several hours.
Take note that many of the arguments in the linked documentation by Dorai Sitaram occur in reverse, or in a different order, compared to their corresponding Maxima versions.
Not all of the "pregexp" functions exist in Maxima.
In addition to this, escaping special characters varied in important ways between wxMaxima, the inline Maxima compiler (running within the Ace editor), and the actual rendered version on our platform; in particular, the inline compiler often returned false for expressions that compiled properly in wxMaxima and on the platform. Because I didn't have sregex loaded in wxMaxima from the beginning, I lost a lot of time to this.
Finally, the regex expression that achieved the desired substitution, in my case, was:
regex_subst("\vec{\\1}", "([[:alpha:]])", "v_1");
which returns vec{v}_1 in wxMaxima (N.B. none of my attempts to get wxMaxima to return \vec{v}_1 were successful; escaping the backslash just does not seem to work; fortunately, the usual escaped version \\vec{\\1} does return the desired form).
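For reference, then, the fully escaped call that produces \vec{v}_1 is:

regex_subst("\\vec{\\1}", "([[:alpha:]])", "v_1");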
I have yet to adjust the code for the rest of the function, but I doubt that will be of use to anyone else, and wanted to be sure to post an update here, before anyone else took time to assist me.
Always interested in better methods / practices or any other pointers / feedback.
The clang-format style options documentation includes a number of options called PenaltyXXX. The documentation doesn't explain how these penalties should be used. Can you describe how to use these penalty values and what effect they achieve (perhaps with an example)?
When you have a line that's over the line length limit, clang-format will need to insert one or more breaks somewhere. You can think of penalties as a way of discouraging certain line-breaking behavior. For instance, say you have:
Namespaces::Are::Pervasive::SomeReallyVerySuperDuperLongFunctionName(args);
// and the column limit is here: ^
Clang-format will probably format it to look a little strange:
Namespaces::Are::Pervasive::SomeReallyVerySuperDuperLongFunctionName(
args);
You might decide that you're willing to violate the line length by a character or two for cases like this, so you could steer that by setting the PenaltyExcessCharacter to a low number and PenaltyBreakBeforeFirstCallParameter to a higher number.
Personally, I really dislike when the return type is on its own line, so I set PenaltyReturnTypeOnItsOwnLine to an absurdly large number.
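For instance, a .clang-format stanza along these lines (the numbers are illustrative, not recommendations) encodes both preferences:

# Tolerate slightly long lines rather than break before the first call
# argument; never put the return type on its own line.
PenaltyExcessCharacter: 10
PenaltyBreakBeforeFirstCallParameter: 200
PenaltyReturnTypeOnItsOwnLine: 1000000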
As an aside, this system was inherited from LaTeX, which allows you to specify all kinds of penalties for line breaking, pagination, and hyphenation.
Can you describe how to use these penalty values and what effect they achieve (perhaps with an example)?
You can see an example in Git 2.15 (Q4 2017): the .clang-format file for the Git project, which is written in C.
See commit 42efde4 (29 Sep 2017) by Johannes Schindelin (dscho).
(Merged by Johannes Schindelin -- dscho -- in commit 42efde4, 01 Oct 2017)
To illustrate those values, here is the commit message:
clang-format: adjust line break penalties
We really, really, really want to limit the columns to 80 per line: One
of the few consistent style comments on the Git mailing list is that the
lines should not have more than 80 columns/line (even if 79 columns/line
would make more sense, given that the code is frequently viewed as diff,
and diffs adding an extra character).
The penalty of 5 for excess characters is way too low to guarantee that,
though, as pointed out by Brandon Williams.
(See this thread)
From the existing clang-format examples and documentation, it appears
that 100 is a penalty deemed appropriate for Stuff You Really Don't Want, so let's assign that as the penalty for "excess characters", i.e.
overly long lines.
While at it, adjust the penalties further: we are actually not that keen
on preventing new line breaks within comments or string literals, so the
penalty of 100 seems awfully high.
Likewise, we are not all that adamant about keeping line breaks away
from assignment operators (a lot of Git's code breaks immediately after
the = character just to keep that 80 columns/line limit).
We do frown a little bit more about functions' return types being on
their own line than the penalty 0 would suggest, so this was adjusted,
too.
Finally, we do not particularly fancy breaking before the first parameter
in a call, but if it keeps the line shorter than 80 columns/line, that's
what we do, so lower the penalty for breaking before a call's first
parameter, but not quite as much as introducing new line breaks to
comments.
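A stanza in the direction the commit describes might look like the following (a sketch with plausible numbers; see the commit itself for Git's exact values):

PenaltyBreakAssignment: 5
PenaltyBreakComment: 10
PenaltyBreakString: 10
PenaltyBreakBeforeFirstCallParameter: 30
PenaltyExcessCharacter: 100
PenaltyReturnTypeOnItsOwnLine: 60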
I see that there are many base64 implementations available as open source, and I found multiple internal implementations in a product that I am maintaining.
I'm trying to factor out the duplicates, but I am not 100% certain that all these implementations give identical output. Therefore I need a dataset that tests all possible combinations of input.
Is that available somewhere? A Google search did not really turn it up.
I saw a similar question on Stack Overflow, but it has not been fully answered; it actually just asks for one phrase (in ASCII) that would test all 64 characters, and it does not handle padding with =, for example. So one test string will certainly not fit the bill for a 100% test.
Perhaps something like Base64Test in Bouncy Castle would do what you want. The tricky part in base64 is handling the padding correctly; it's certainly important to cover that, as you mentioned. Accordingly, RFC 4648 specifies these test vectors:
BASE64("") = ""
BASE64("f") = "Zg=="
BASE64("fo") = "Zm8="
BASE64("foo") = "Zm9v"
BASE64("foob") = "Zm9vYg=="
BASE64("fooba") = "Zm9vYmE="
BASE64("foobar") = "Zm9vYmFy"
Some of your implementations may produce base64 output that differs only by whether they insert line breaks, and where implementations that break lines insert the break and the line termination used. You would have to do additional testing to determine whether you can safely replace an implementation that's using one style with a different one. In particular, a decoder might make assumptions about line length or termination.
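If it helps, here is a minimal sketch of a vector-driven check in C++; encode_base64 is a hypothetical stand-in for whichever implementation you are testing, so swap each candidate in behind it:

#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper around the implementation under test.
std::string encode_base64(const std::string& input);

int main() {
    // RFC 4648 test vectors, as quoted above.
    const std::pair<const char*, const char*> vectors[] = {
        {"", ""},        {"f", "Zg=="},        {"fo", "Zm8="},
        {"foo", "Zm9v"}, {"foob", "Zm9vYg=="}, {"fooba", "Zm9vYmE="},
        {"foobar", "Zm9vYmFy"},
    };
    for (const auto& v : vectors)
        assert(encode_base64(v.first) == std::string(v.second));
    return 0;
}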
I have a simple C++ function that calls uuid_generate_random. It seems the generated UUID is not standard-compliant (UUID version 4). Any ideas?
The code I am using is (uuid_generate_random & uuid_unparse functions are available in libuuid):
#include <uuid/uuid.h>

char uuidBuff[37]; /* uuid_unparse writes 36 characters plus a terminating NUL */
uuid_t uuidGenerated;
uuid_generate_random(uuidGenerated);
uuid_unparse(uuidGenerated, uuidBuff);
I understand that "uuid_generate_random" generates the UUID version 4, however, the string that is being returned is like:
Run1) 2e9dc3c5-426a-eea0-8186-f6b65b5dc361
Run2) 112c6a78-51bc-cbbb-ae9c-92a33e1fd95f
Run3) 5abd3d6d-0259-ce5c-8428-a51bacc04c0f
Run4) 66a22485-7e5f-e5c2-a317-c1b295bf124f
It appears to be returning random hex digits for the whole string. According to the definition from Wikipedia, it should be in the form:
xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx with any hexadecimal digits for x but only one of 8, 9, A, or B for y.
Any input will be greatly appreciated.
Regards
Daniel
You're quite correct that libuuid seems not to be adhering to the ITU recommendation... It could be argued that the recommendation itself is overly pedantic about retaining version information, as it serves little purpose beyond allowing technorati to easily distinguish how the UUID was generated, but that's beside the point.
The good news is that, if you care to, you can easily make these into conformant UUIDs through the simple expedient of smashing in the right version bits. :-)
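Concretely: uuid_t in libuuid is just an unsigned char[16], the version lives in the high nibble of byte 6, and the variant lives in the top two bits of byte 8, so a couple of masks after generation will do it:

uuid_generate_random(uuidGenerated);
uuidGenerated[6] = (uuidGenerated[6] & 0x0F) | 0x40; /* version 4 */
uuidGenerated[8] = (uuidGenerated[8] & 0x3F) | 0x80; /* variant 10xx per RFC 4122 */
uuid_unparse(uuidGenerated, uuidBuff);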
Suppose you have a repository of 10,000 function names, and possibly their frequency of use, in a corpus of code that may be in C, C#, or C++ (each usually has different prescribed conventions).
Some Samples may be:
DoPaint
OnPaint
CloseWindow
DeleteGraphOnClose
FreeConnection
ConnectInternat (small typo, but part of the code)
FreeSoH
Now, given a function name, how can we predict whether the name follows the conventions of a human-generated name?
Note:
Obviously, all candidate names will be valid names
Generated names can have arbitrary characters and will be treated as bad
Letter cases can get garbled up
Some candidates:
Z090292 - not likely
onDelete - likely
CloseWindow - likely
iGetIndex - unlikely
Any pointers on techniques and software are welcome.
You could try conducting some Bayesian analysis on the text:
Load the list of names (and their frequencies) into your program. It might be worth tokenising the names at this point, so that e.g. CloseWindow becomes Close and Window, with the frequency of both incremented. At this point it would also be useful to load in some non-human function names, to train the program on negatives as well.
Take a function name and, using the data you have just gathered, find the probability of each part coming up:
P(Human Generated | Seeing the Token) = P(Seeing the Token | Human Generated) * P(Human Generated) / P(Seeing the Token)
In this case the probability of something being human or computer generated would be decided based on known knowledge i.e. what percentage of function names are thought to be human generated.
The probability of seeing the token, P(Seeing the Token), would have to evolve gradually. It would consist of the number of times the token is seen in human functions and the number of times it is seen in computer-generated functions; this solution is based on the premise that the program learns over time (and thus needs to be trained).
The result, P(Human Generated | Seeing the Token), will give you the probability that the function name was generated by a human.
NB: This is only a rough outline; many details are missing. If you are interested in this line of investigation, I would suggest reading up on probability theory, and in particular Bayesian analysis.
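To make the outline concrete, here is a minimal C++ sketch of the idea, assuming you have already built per-token counts from labeled human and generated names (the function names and the simplified smoothing are illustrative):

#include <cctype>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Tokenize an identifier at case boundaries: "CloseWindow" -> {"close", "window"}.
std::vector<std::string> tokenize(const std::string& name) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : name) {
        if (std::isupper(static_cast<unsigned char>(c)) && !current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
        current += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// P(human | tokens) via Bayes' rule with a naive independence assumption and
// add-one smoothing (the +2 denominator is a crude stand-in for a vocabulary size).
double humanProbability(const std::vector<std::string>& tokens,
                        const std::map<std::string, int>& humanCounts, int humanTotal,
                        const std::map<std::string, int>& machineCounts, int machineTotal,
                        double priorHuman) {
    double logHuman = std::log(priorHuman);
    double logMachine = std::log(1.0 - priorHuman);
    for (const std::string& t : tokens) {
        auto h = humanCounts.find(t);
        auto m = machineCounts.find(t);
        logHuman += std::log(((h != humanCounts.end() ? h->second : 0) + 1.0) / (humanTotal + 2.0));
        logMachine += std::log(((m != machineCounts.end() ? m->second : 0) + 1.0) / (machineTotal + 2.0));
    }
    return 1.0 / (1.0 + std::exp(logMachine - logHuman));
}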
Split the identifiers into individual words (based on capitalization), and put the words into a spell checker (such as ispell). Consider all words with spelling errors as non-human-generated, along with the identifiers in which they occur.
A friend of mine might help. He is doing a PhD on this very subject, as far as I can tell.
Predicting if it's human-generated is a very tricky question. Analyzing the code base to find the function names is easier - you might look at tools such as NDepend.
You can probably detect camelCase. Also, you could possibly do a regex search for typical words like do, get, set, in, etc. before the next capitalized word.
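A rough sketch of that heuristic in C++ (the pattern is illustrative and will misjudge plenty of edge cases):

#include <regex>
#include <string>

// Accept an optional lowercase prefix ("on", "get", ...) followed by one or
// more capitalized, letter-only words: matches "OnPaint" and "onDelete",
// rejects "Z090292".
bool looksHumanNamed(const std::string& name) {
    static const std::regex pattern("([a-z]+)?([A-Z][a-z]+)+");
    return std::regex_match(name, pattern);
}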
Using a dictionary, as Martin V. Lowes suggested, is a good approach, but you also have to remember to account for the following common forms of variables:
Single-letter variable names.
Variable names that use underscores instead of camel case.
Metasyntactic variables.
Hungarian notation.
Keywords/types with a character attached (e.g. $return or list_).