xsl:value-of vs xsl:apply-templates: what is faster? - xslt

Looking at all the examples of XSL, I've noticed that
some people use xsl:value-of while others use apply-templates. Which method is faster than the another and why?

Firstly, they don't do the same thing, so it's not a simple choice based on performance.
Secondly, you can't ask questions about the performance of language constructs except in the context of a specific processor. There are a dozen popular XSLT processors and another dozen that are less well known, and they all perform differently.
Thirdly, if anyone tells you that one is faster than the other, they probably won't tell you all the configuration variables that you need to reproduce their measurements, and if they do, your own circumstances will probably be quite different.
So the best thing to do is to measure it and see, and tell us what results you obtained.

Related

CUDA: Synchronize global memory writing & reading with compute capability 1.1

Well, found the solution on nVidia forums (https://devtalk.nvidia.com/default/topic/458062/atomicadd-float-float-atomicmul-float-float-/).
Thanks.
Serialization can be achieved by using Atomic Functions.
Compute Capability 1.1 does not support atomicAdd() for floating point numbers but any atomic operation can be implemented based on atomicCAS() (Compare And Swap).
Firstly, this code is a mess of repetitive garble that'd make debugging quite painful. Work out which sub-expressions are redundant and separate them into different variables, so that your code becomes more legible. Otherwise, it makes about as much sense to ask for help regarding this code as it'd make to ask for help winning the lottery. Nobody's going to bother reading your code because it's an eyesore.
Get a single-threaded solution running. Use a profiler to determine which parts of this code would be best exposed to parallelisation, otherwise your optimisation is just guesswork which you can't put in measurable terms. I would guess that once you have the single-threaded solution running, you might get fairly good performance from running that same solution, in parallel, on each core, on an independant range of values so there's virtually no need for synchronisation.

Which XSLT processor actually took advantage of parallel-processing?

XSLT processing has the potential to be very fast because many of it's language elements allows things to be processed in parallel.
However even though theoretically things can run in parallel and processing can be insanely fast, in practice is there an actual implementation of an XSLT processor that took advantage of this potential and actually run things in parallel?
You'll probably have to look at the high-end commercial XSLT processors (Datapower, Intel) for this kind of capability. There's very little technical information available about either, but there have been one or two conference papers describing techniques that may or may not have found their way into product.
(Personally, I have a bit of a feeling that both these products sell on the basis that if the product is expensive, it must be good. But that feeling is based solely on the absence of information, rather than on any real knowledge.)
Saxon's documentation http://www.saxonica.com/documentation/extensions/attributes/threads.xml documents an extension attribute for xsl:for-each to specify the number of threads to be used to execute the for-each. It is only available in the commercial version of Saxon, I haven't used that so I can't tell you more about it.

Do very long methods always need refactoring?

I face a situation where we have many very long methods, 1000 lines or more.
To give you some more detail, we have a list of incoming high level commands, and each generates results in a longer (sometime huge) list of lower level commands. There's a factory creating an instance of a class for each incoming command. Each class has a process method, where all the lower level commands are generated added in sequence. As I said, these sequences of commands and their parameters cause quite often the process methods to reach thousands of lines.
There are a lot of repetitions. Many command patterns are shared between different commands, but the code is repeated over and over. That leads me to think refactoring would be a very good idea.
On the contrary, the specs we have come exactly in the same form as the current code. Very long list of commands for each incoming one. When I've tried some refactoring, I've started to feel uncomfortable with the specs. I miss the obvious analogy between the specs and code, and lose time digging into newly created common classes.
Then here the question: in general, do you think such very long methods would always need refactoring, or in a similar case it would be acceptable?
(unfortunately refactoring the specs is not an option)
edit:
I have removed every reference to "generate" cause it was actually confusing. It's not auto generated code.
class InCmd001 {
OutMsg process ( InMsg& inMsg ) {
OutMsg outMsg = OutMsg::Create();
OutCmd001 outCmd001 = OutCmd001::Create();
outCmd001.SetA( param.getA() );
outCmd001.SetB( inMsg.getB() );
outMsg.addCmd( outCmd001 );
OutCmd016 outCmd016 = OutCmd016::Create();
outCmd016.SetF( param.getF() );
outMsg.addCmd( outCmd016 );
OutCmd007 outCmd007 = OutCmd007::Create();
outCmd007.SetR( inMsg.getR() );
outMsg.addCmd( outCmd007 );
// ......
return outMsg;
}
}
here the example of one incoming command class (manually written in pseudo c++)
Code never needs refactoring. The code either works, or it doesn't. And if it works, the code doesn't need anything.
The need for refactoring comes from you, the programmer. The person reading, writing, maintaining and extending the code.
If you have trouble understanding the code, it needs to be refactored. If you would be more productive by cleaning up and refactoring the code, it needs to be refactored.
In general, I'd say it's a good idea for your own sake to refactor 1000+ line functions. But you're not doing it because the code needs it. You're doing it because that makes it easier for you to understand the code, test its correctness, and add new functionality.
On the other hand, if the code is automatically generated by another tool, you'll never need to read it or edit it. So what'd be the point in refactoring it?
I understand exactly where you're coming from, and can see exactly why you've structured your code the way it is, but it needs to change.
The uncertainty you feel when you attempt to refactor can be ameliorated by writing unit tests. If you've tests specific to each spec, then the code for each spec can be refactored until you're blue in the face, and you can have confidence in it.
A second option, is it possible to automatically generate your code from a data structure?
If you've a core suite of classes that do the donkey work and edge cases, you can auto-generate the repetitive 1000 line methods as often as you wish.
However, there are exceptions to every rule.
If the methods are a literal interpretation of the spec (very little additional logic), and the specs change infrequently, and the "common" portions (i.e. bits that happen to be the same right now) of the specs change at different times, and you're not going to be asked to get a 10x performance gain out of the code anytime soon, then (and only then) . . . you may be better off with what you have.
. . . but on the whole, refactor.
Yes, always. 1000 lines is at least 10x longer than any function should ever be, and I'm tempted to say 100x, except that when dealing with input parsing and validation it can become natural to write functions with 20 or so lines.
Edit: Just re-read your question and I'm not clear on one point - are you talking about machine generated code that no-one has to touch? In which case I would leave things as they are.
Refectoring is not the same as writing from scratch. While you should never write code like this, before you refactor it, you need to consider the costs of refactoring in terms of time spent, the associated risks in terms of breaking code that already works, and the net benefits in terms of future time saved. Refactor only if the net benefits outweigh the associated costs and risks.
Sometimes wrapping and rewriting can be a safer and more cost effective solution, even if it appears expensive at first glance.
Long methods need refactoring if they are maintained (and thus need to be understood) by humans.
As a rule of thumb, code for humans first. I don't agree with the common idea that functions need to be short. I think what you need to aim at is when a human reads your code they grok it quickly.
To this effect it's a good idea to simplify things as much as possible--but not more than that. It's a good idea to delegate roughly one task for each function. There is no rule as for what "roughly one task" means: you'll have to use your own judgement for that. But do recognize that a function split into too many other functions itself reduces readability. Think about the human being who reads your function for the first time: they would have to follow one function call after another, constantly context-switching and maintaining a stack in their mind. This is a task for machines, not for humans.
Find the balance.
Here, you see how important naming things is. You will see it is not that easy to choose names for variables and functions, it takes time, but on the other hand it can save a lot of confusion on the human reader's side. Again, find the balance between saving your time and the time of the friendly humans who will follow you.
As for repetition, it's a bad idea. It's something that needs to be fixed, just like a memory leak. It's a ticking bomb.
As others have said before me, changing code can be expensive. You need to do the thinking as for whether it will pay off to spend all this time and effort, facing the risks of change, for a better code. You will possibly lose lots of time and make yourself one headache after another now, in order to possibly save lots of time and headache later.
Take a look at the related question How many lines of code is too many?. There are quite a few tidbits of wisdom throughout the answers there.
To repost a quote (although I'll attempt to comment on it a little more here)... A while back, I read this passage from Ovid's journal:
I recently wrote some code for
Class::Sniff which would detect "long
methods" and report them as a code
smell. I even wrote a blog post about
how I did this (quelle surprise, eh?).
That's when Ben Tilly asked an
embarrassingly obvious question: how
do I know that long methods are a code
smell?
I threw out the usual justifications,
but he wouldn't let up. He wanted
information and he cited the excellent
book Code Complete as a
counter-argument. I got down my copy
of this book and started reading "How
Long Should A Routine Be" (page 175,
second edition). The author, Steve
McConnell, argues that routines should
not be longer than 200 lines. Holy
crud! That's waaaaaay to long. If a
routine is longer than about 20 or 30
lines, I reckon it's time to break it
up.
Regrettably, McConnell has the cheek
to cite six separate studies, all of
which found that longer routines were
not only not correlated with a greater
defect rate, but were also often
cheaper to develop and easier to
comprehend. As a result, the latest
version of Class::Sniff on github now
documents that longer routines may not
be a code smell after all. Ben was
right. I was wrong.
(The rest of the post, on TDD, is worth reading as well.)
Coming from the "shorter methods are better" camp, this gave me a lot to think about.
Previously my large methods were generally limited to "I need inlining here, and the compiler is being uncooperative", or "for one reason or another the giant switch block really does run faster than the dispatch table", or "this stuff is only called exactly in sequence and I really really don't want function call overhead here". All relatively rare cases.
In your situation, though, I'd have a large bias toward not touching things: refactoring carries some inherent risk, and it may currently outweigh the reward. (Disclaimer: I'm slightly paranoid; I'm usually the guy who ends up fixing the crashes.)
Consider spending your efforts on tests, asserts, or documentation that can strengthen the existing code and tilt the risk/reward scale before any attempt to refactor: invariant checks, bound function analysis, and pre/postcondition tests; any other useful concepts from DBC; maybe even a parallel implementation in another language (maybe something message oriented like Erlang would give you a better perspective, given your code sample) or even some sort of formal logical representation of the spec you're trying to follow if you have some time to burn.
Any of these kinds of efforts generally have a few results, even if you don't get to refactor the code: you learn something, you increase your (and your organization's) understanding of and ability to use the code and specifications, you might find a few holes that really do need to be filled now, and you become more confident in your ability to make a change with less chance of disastrous consequences.
As you gain a better understanding of the problem domain, you may find that there are different ways to refactor you hadn't thought of previously.
This isn't to say "thou shalt have a full-coverage test suite, and DBC asserts, and a formal logical spec". It's just that you are in a typically imperfect situation, and diversifying a bit -- looking for novel ways to approach the problems you find (maintainability? fuzzy spec? ease of learning the system?) -- may give you a small bit of forward progress and some increased confidence, after which you can take larger steps.
So think less from the "too many lines is a problem" perspective and more from the "this might be a code smell, what problems is it going to cause for us, and is there anything easy and/or rewarding we can do about it?"
Leaving it cooking on the backburner for a bit -- coming back and revisiting it as time and coincidence allows (e.g. "I'm working near the code today, maybe I'll wander over and see if I can't document the assumptions a bit better...") may produce good results. Then again, getting royally ticked off and deciding something must be done about the situation is also effective.
Have I managed to be wishy-washy enough here? My point, I think, is that the code smells, the patterns/antipatterns, the best practices, etc -- they're there to serve you. Experiment to get used to them, and then take what makes sense for your current situation, and leave the rest.
I think you first need to "refactor" the specs. If there are repetitions in the spec it also will become easier to read, if it makes use of some "basic building blocks".
Edit: As long as you cannot refactor the specs, I wouldn't change the code.
Coding style guides are all made for easier code maintenance, but in your special case the ease of maintenance is achieved by following the spec.
Some people here asked if the code is generated. In my opinion it does not matter: If the code follows the spec "line by line" it makes no difference if the code is generated or hand-written.
1000 thousand lines of code is nothing. We have functions that are 6 to 12 thousand lines long. Of course those functions are so big, that literally things get lost in there, and no tool can help us even look at high level abstractions of them. the code is now unfortunately incomprehensible.
My opinion of functions that are that big, is that they were not written by brilliant programmers but by incompetent hacks who shouldn't be left anywhere near a computer - but should be fired and left flipping burgers at McDonald's. Such code wreaks havok by leaving behind features that cannot be added to or improved upon. (too bad for the customer). The code is so brittle that it cannot be modified by anyone - even the original authors.
And yes, those methods should be refactored, or thrown away.
Do you ever have to read or maintain the generated code?
If yes, then I'd think some refactoring might be in order.
If no, then the higher-level language is really the language you're working with -- the C++ is just an intermediate representation on the way to the compiler -- and refactoring might not be necessary.
Looks to me that you've implemented a separate language within your application - have you considered going that way?
It has been my understanding that it's recommended that any method over 100 lines of code be refactored.
I think some rules may be a little different in his era when code is most commonly viewed in an IDE. If the code does not contain exploitable repetition, such that there are 1,000 lines which are going to be referenced once each, and which share a significant number of variables in a clear fashion, dividing the code into 100-line routines each of which is called once may not be that much of an improvement over having a well-formatted 1,000-line module which includes #region tags or the equivalent to allow outline-style viewing.
My philosophy is that certain layouts of code generally imply certain things. To my mind, when a piece of code is placed into its own routine, that suggests that the code will be usable in more than one context (exception: callback handlers and the like in languages which don't support anonymous methods). If code segment #1 leaves an object in an obscure state which is only usable by code segment #2, and code segment #2 is only usable on a data object which is left in the state created by #1, then absent some compelling reason to put the segments in different routines, they should appear in the same routine. If a program puts objects through a chain of obscure states extending for many hundreds of lines of code, it might be good to rework the design of the code to subdivide the operation into smaller pieces which have more "natural" pre- and post- conditions, but absent some compelling reason to do so, I would not favor splitting up the code without changing the design.
For further reading, I highly recommend the long, insightful, entertaining, and sometimes bitter discussion of this topic over on the Portland Pattern Repository.
I've seen cases where it is not the case (for example, creating an Excel spreadsheet in .Net often requires a lot of line of code for the formating of the sheet), but most of the time, the best thing would be to indeed refactor it.
I personally try to make a function small enough so it all appears on my screen (without affecting the readability of course).
1000 lines? Definitely they need to be refactored. Also not that, for example, default maximum number of executable statements is 30 in Checkstyle, well-known coding standard checker.
If you refactor, when you refactor, add some comments to explain what the heck it's doing.
If it had comments, it would be much less likely a candidate for refactoring, because it would already be easier to read and follow for someone starting from scratch.
Then here the question: in general, do
you think such very long methods would
always need refactoring,
if you ask in general, we will say Yes .
or in a
similar case it would be acceptable?
(unfortunately refactoring the specs
is not an option)
Sometimes are acceptable, but is very unusual, I will give you a pair of examples:
There are some 8 bit microcontrollers called Microchip PIC, that have only a fixed 8 level stack, so you can't nest more than 8 calls, then care must be taken to avoid "stack overflow", so in this special case having many small function (nested) is not the best way to go.
Other example is when doing optimization of code (at very low level) so you have to take account the jump and context saving cost. Use it with care.
EDIT:
Even in generated code, you could need to refactorize the way its generated, for example for memory saving, energy saving, generate human readable, beauty, who knows, etc..
There has been very good general advise, here a practical recommendation for your sample:
common patterns can be isolated in plain feeder methods:
void AddSimpleTransform(OutMsg & msg, InMsg const & inMsg,
int rotateBy, int foldBy, int gonkBy = 0)
{
// create & add up to three messages
}
You might even improve that by making this a member of OutMsg, and using a fluent interface, such that you can write
OutMsg msg;
msg.AddSimpleTransform(inMsg, 12, 17)
.Staple("print")
.AddArtificialRust(0.02);
which can be an additional improvement under circumstances.

Tool to identify the similarities in logic between a function of C and C++

Is there a tool in either Linux/Windows that would enable us to determine if a logic of the particular function in C is same as that of a particular function in C++ ?
In general, the equivalence of Turing machines is undecidable, so no.
If you are just talking of control structures, if/else, blocks of code, swtich/case, while, for, etc AND if you are willing to be able to accept "gettign a good feel for it", rather than 100% accuray, then a picture may be work a thousand words, and you might look at a code to flowchart program.
I won't recommend any, as I don't know them well enough (but have always wanted to try them out, espcially if round trip. It might not be easy to find something free. In general, you will see something like this ... alt text http://www.ezprog.com/wp-content/uploads/flowchart.gif
is that what you have is mind? Do it for both C and C++ versions, and you can get a rough feel for similarity of logic.
Perhaps you can tell us a little more what exactly you are looking for? Help us to help you? Thanks.
You can imagine a tool that compares the structure of ASTs after the compiler has done the initial conversion to abstract representation or after one of more optimization passes.
This would probably
Miss some real matches (i.e. generate false negatives)
Identify some bogus matches (i.e. generate false positives)
With tuning you could force the second case to be more common. I have no feel for how good it would have to be to be useful as a front end to a vgrep process.
But it get worse, because you've asking for a cross-language implementation, and that will make it harder. Still, gcc uses the same abstract representation for everything, so it is not beyond imagining.
That said, I know of no such tool.
I think there is such a tool, called an assembly listing.

What's so bad about xsl:for-each?

I hear time and time again about how you should avoid the use of XSLT for-each. That it's your inner imperative programming demon that should be banished.
What's so bad about it?
Does this best practice matter depending on the size of XML (i.e 100 vs 10,000 nodes)?
Essential difference between <xsl:apply-templates> and <xsl:-for-each> that nobody has pointed out:
<xsl:apply-templates> is really something much more than a nicer, more elegant equivalent of <xsl:for-each>:
xsl:apply-templates is much richer and deeper than xsl:for-each, even
simply because we don't know what code will be applied on the nodes of
the selection -- in the general case this code will be different for
different nodes of the node-list.
Also, the code that will be applied
can be written way after the xsl:apply templates was written and by
people that do not know the original author.
_2. On the other side, using <xsl:for-each> is in no way harmful if one knows exactly how an <xsl:for-each> is processed.
The trouble is that a lot of newcomers to XSLT that have experience in imperative programming take <xsl:for-each> as a substitute of a "loop" in their favorite PL and think that it allows them to perform the impossible -- like incrementing a counter or any other modification of an already defined <xsl:variable>.
One indispensable use of <xsl:for-each> is to change the current document -- this is often needed in order to be able to use the key() function on a document, different from the current source XML document, for example to efficiently access lookup-table that resides in its own xml document.
Templates tend to split up the code more nicely. Also, for-each loops suffer from the fact that people often come to them with the idea that they operate identically to the way for loops work in the major programming languages.
It is the use of a for-each to call templates that is discouraged, not the widespread use of for-each, per se. Even the Muenchian method of grouping relies on xsl:key constructs with xsl:for-each loops.
The idea of writing good XSLT is that the structure of your XML should dictate what templates are matched and then applied. Therefore, whenever possible, use apply-templates to select the nodes, rather than applying a select using for-each.
Quick answer: XSLT is largely functional in nature, and imperative loops are not very functional.
In general the based way to get the best out of XSLT is to use pattern matching as much as possible (xsl:apply-template rather than loops, ifs and call-template1).
In the end, it is all about style, and will make little difference in short pieces of XSLT, but being more functional will help with longer/complex problems.
1 Other than when creating functions which return values rather than modifying the output.
By using apply-templates, perhaps with a mode, makes it easier to include appropriate transformations of more kinds of elements that may be of interest at that location.
For example if you had XML containing a libraries contents, and you keep using for-each all over the place to select books, then you start keeping record of DVDs, CDs, Ginger Nut biscuits, etc, you will have to examine all the for-each sections to include more than just books. If you had used apply-templates, you may be able to simply create the new matches for the new elements.