Truncate text formatted via HTML with XSLT 1.0 - xslt

I am trying to truncate some text that has been formatted via HTML, but I need to keep the html in tact. I am doing so in SharePoint 2007 - so I am using XSLT 1.0.
I found this bit of XSLT here: http://symphony-cms.com/download/xslt-utilities/view/20816/
I was able to implement it, but it is telling me that the variable or parameter "Limit" has been defined twice.
However, the author has named many variables and parameters "Limit" and I am not sure which one I need to change.
I am fairly new to XSLT, and any help is greatly appreciated.

This is because at the top the XSLT the author has defined limit as a parameter
<xsl:param name="limit"/>
But a few lines down, then defines it as a variable
<xsl:variable name="limit">
Perhaps he had a 'buggy' xslt processor which allowed variables to be re-defined, but it should not actually be valid.
I did try renaming the variable to newlimit but it is hard to know whn he subsequently refers to limit whether it is the paramater or variable it is referring too (I couldn't actually get it to output useful HTML).
You are probably better off looking for something else to meet your needs. There may even be similar questions here on StackOverflow if you search about. For example, perhaps this meets your needs
XSLT - Using substring with copy-of to preserve inner HTML tags
I am sure there may be others if you look. If not, feel free to ask a new question, giving your input HTML, and your expected output, so that it is clear what your requirements are.

Related

How to skip a self closing tag in a ST function on an SAP system?

So I have this problem handling an XML file in my SAP ABAP-based software, with a Simple Transformation.
The file I receive have normally no empty tags like <test></test>, but can happen sometimes that I receive some self closing tag like <test/>.
This is an example of what I thought to use now. The first condition handles if the ref('test') is blank by skipping it. The second one takes the values if we have one.
<tt:cond check="initial(ref('test'))">
<tt:skip count="*" name="test"/>
</tt:cond>
<tt:cond check="not-initial(ref('test'))">
<test tt:value-ref="test"/>
</tt:cond>
The idea is: if we have this tag <test/> we need to skip it, otherwise we need to assign the data. Now, this working in the first case, cause he takes no date, but not in the second cause it not takes the data again.
Someone can help?
Thanks in advantages.
The XDM tree representations of <test></test> and <test/> are 100% identical, so there is no way an XSLT stylesheet can distinguish them or treat them differently. The idea of attaching different meanings to the two constructs is completely misguided: you can never be sure which representation an XML library will choose to use.
It is of course possible to distinguish an element that contains a value (such as <test>value</test>) from one that is empty - but both the above examples represent empty elements and must be treated as equivalent.

RegEx to remove specific XML elements

I'm using Kate to process text to create an XML file but I've hit a roadblock. The text now contains additional data that I need to remove based on its content.
To be specific, I have an XML element called <officers> that contains 0 or more <officer> elements, which contain further elements such as <title>, <name>, etc.. While I probably could exclude these at run time using XSL, the file also drives another process that I don't want to touch - it's a general purpose data importer for Scribus so I don't want to touch the coding.
What I want to do is remove an <officer> element if the <title> content isn't what I want. For example, I don't want the First VP, so I'd like to remove:
<officer>
<title>First VP</title>
<incumbent>Joe Somebody</incumbent>
<address>....</address>
<address>....</address>
......
</officer>
I don't know how many lines will be in any <officer> element nor what positions they will in within the <officers> element.
The easy part it getting to the start of the content I want removed. The hard part is getting to the </officer> end tag. All the solutions I've found so far just result in Kate deciding that the RegEx is invalid.
Any suggestions are appreciated.
Regex is the wrong tool for this job; never process XML without a proper parser, except possibly for a one-off job on a single document where you will throw the code away after running it and checking the results by hand. You might find a regex that works on one sample document, but you'll never get it to work properly on a well-designed set of 100 test documents.
And it's easily done using XSLT. It's a stylesheet with two template rules: a default "identity template" rule to copy elements unchanged, and a second rule to delete the elements you don't want. In fact in XSLT 3.0 it gets even simpler:
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="officer[title='First VP']"/>

How to break caching on exist-db of included XSLs in Transform

I have a large set of XSLs that we recently went through and implemented a shared XSL template with common bits. We included an xsl:include in all the main XSLs now to pull these in. We had no issues at first until we started to make changes to the shared XSL.
For information, the whole system is web based, calling queries to dynamically format documents in the database given different XSLs through XSL FO and RenderX.
The main transform is:
let $fo := util:expand(transform:transform($articles, doc("/db/Customer/data/edit/xsl/Custbatch.xsl"), $parameters))
That XSL (Custbatch.xsl) has:
<xsl:include href="Custshared.v1.xsl"/>
If we make an edit to "Custshared.v1.xsl" is not reflected in the result because it is obvious that "Custshared.v1.xsl" is being cached and used. We know this because as you can see the name now includes "v1". If we make a change and change all the references say from v1 to v2, it all works. But this seems a bit ridiculous as that means we have to change the 18 XSLs that include this XSL or do something silly like restart the database.
So, what am I missing in the setup or controller.xql (which has the following on all not matched paths), to get things not to cache. I assume that is all internal so this setting likely does not matter. Is there some other setting in the config that does?
<dispatch xmlns="http://exist.sourceforge.net/NS/exist">
<cache-control cache="no"/>
</dispatch>
In reading the document here: http://exist-db.org/exist/apps/doc/xsl-transform.xml, it states:
"The stylesheet will be compiled into a template using the standard Java APIs (javax.xml.transform). The template is shared between all instances of the function and will only be reloaded if modified since its last invocation."
However, if I change an included XSL, it is not being used.
Update #1
I even went as far as creating a query that returns the XSL that is included, then I use:
<xsl:include href="http://localhost/get-include-xsl.xq"/>
This does work as formatting is not broken, but changing the underlying XSL yields the same result. So even that Xquery result is cached.
Update #2
And yes, through some simple test all is proven.
If I make any change to the root template (like add a meaningless space) and run, it does include the changes made in the include. If I only change the included XSL, no changes happen.
So lacking anything else, we could always write a Xquery that basically touches all the main templates after a change is made to the include template. Seems so wrong as a workaround.
Update #3
So the workaround we are currently using is that we have an unused "variable" in the XSL (version) and when we update the shared template, we execute that query which basically updates the value in that variable. At least it's only one XQuery and maybe we should attach to a trigger.
There is a setting in $exist-db-root$/conf.xml for the XSL transformer where you can turn off caching: <transformer class="net.sf.saxon.TransformerFactoryImpl" caching="no"> (The default is 'yes')

Regex or Xpath for extracting nodes?

I have an XML file with the following structure;
<JobList>
<Job><subnodes/></Job>
<Job><subnodes/></Job>
</JobList>
This xml can be broken sometimes leaving a missing ending of <JobList> and missing end of </Job>.
I would like to be able to extract the <Job> nodes with full content on those that are closed with </Job>. What is the best way to do this?
To make a long story short I am using .NET and built in serializers for deserializing xml content. But since new properties are added you cannot just go back and forth between different versions as it is to strict. Mostly it works, but I would like to have a backup recovery method for this - hence the question.
The current situation is that the deserializer "crashes" the whole deserializing when a new property has been added instead of ignoring it. I am looking to manually parse it on error.
As mentioned on the comments, the ideal would be to make the xml valid, if for whatever reason that is not possible, the workaround is parsing the file as text with a regex.
A general regex for this case could be something like:
<Job>((?!<Job>).)*</Job>$
this will bring anything between a complete pair
Please notice that this will also return nodes with 'broken' inner nodes, but according to your question you are only concerned about missing and tags.

Using xpath/xslt to get the anchor part of the page url

I'm writing a template in xslt with xpath 1.0 and need to access the anchor tag from the url of the current page. For example, the url:
http://localhost/destinations/london.aspx#accommodation
I need to be able to get #accommodation and assign it to a variable. I realise I'm somewhat limited by using xpath 1.0 - has anyone got any experience doing this?
Thanks, Adam
Why is this an xpath problem at all? A URL is not an XML document, ergo xpath does not apply.
XSLT is completely unaware of any state like page location. Guessing a bit at what you're trying to do, you're probably best off getting #accomodation from string manipulation or framework in the layer which calls the XSLT, passing the value in as a param.
OTOH maybe this is nonsensical and your question just needs clarification.
As #Annakata said, this is not an XPath problem. It doesn't seem to be an XSLT problem either, though I may be mistaken. If it is related to XSLT string parsing, then what you need is something like this question talks about.
What you probably need instead is Javascript to get the current URL (document.location) and then perform Javascript string parsing on it.
There is no way in standard XSLT to access to URL of a document : http://www.dpawson.co.uk/xsl/sect2/nono.html#d1974e804
Some vendors might provide this information via custom properties, but then you would be dependant on the XSLT processor.
If you have managed to get the URL into the XSLT in some fashion, then I suggest you will have to resort to simple string manipulation to get the anchor.