Check if <fo:page-number> is even or not using XSLT 2.0

How can I check whether <fo:page-number> is even or odd using XSLT 2.0? Is there any way to use <fo:page-number> inside <xsl:if test="fo:page-number mod 2 = 0">?

The XSLT stage generates the XSL-FO that the formatter then makes into pages. So, no, you can't get the current page number when you are generating the XSL-FO.
What do you want to change if it is an even-numbered page?
With XSL-FO, you can set up different page masters for odd and even pages (and more besides). The different page masters can have different margins, and you can set things up so that the formatter will direct different content to headers and footers on even pages than is used on odd pages.
See the 'Page Region and Structure' PDF and FO files in the 'XSL-FO Samples Collection' at https://www.antennahouse.com/xsl-fo-samples#structure

What you ask for cannot be done with a true batch formatter in a single pass. It requires "human" intervention to mark only those places where the break needs to occur and not others.
Also, there is no guarantee that one XSL-FO formatter will yield the same results as another. Some formatters handle "line tightness" (very slight squeezing of spaces and characters to fit text within a line) differently, some support kerning and others do not, and there are many other factors besides, so it is not possible to predict in advance whether a given paragraph will start on a particular page.
Formatting text in true typography is not merely word-space-word-space ... there are many other factors involved that could change the number of lines in a paragraph between one formatter and another which can easily ripple to a known paragraph existing on an even page in one formatter, yet an odd page in a different formatter.
Then you also need other rules: for example, what should happen if, in your formatter of choice, the paragraph you want to break before is already the first one on its page? Do you want a blank page? Maybe, who knows?
The only way to accomplish your task is a multipass approach, which could be implemented generically for any formatter. You would need to format the whole document (or, if you are chunking the document with page masters, at least a chunk that starts and ends on page boundaries). Format it, then test your condition on the first paragraph. If it passes (meaning a break is needed), go back to the original content (or modify the XSL-FO) and set an attribute that results in break-before="page" on that structure. Then repeat the process until you reach the end of the document. Some formatters can provide the area tree, and let you put markers in that tree, so that you can do this programmatically rather than by eye.
If your document is long and in one page-sequence (say, 3000 pages when formatted) and your break condition is frequent, you may have to repeat the process 700+ times.
As stated, some formatters through their API may allow you to control this programmatically. You can examine the area tree, look for your marker and keep count of pages. You may even be able to start formatting again at the break condition and not start over, but you need to program such things.
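The multipass loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_layout` is a hypothetical stand-in for a real formatting pass (it is not any formatter's API), blocks are reduced to line counts, and the example condition "a section block must not start on an even page" is made up for the demonstration.

```python
def toy_layout(blocks, breaks, lines_per_page=10):
    """Hypothetical stand-in for one formatting pass.

    blocks is a list of block heights in lines; breaks is the set of block
    indexes that carry break-before="page". Returns the page on which each
    block starts.
    """
    pages = []
    line = 0  # lines consumed so far, counted from the start of the document
    for index, height in enumerate(blocks):
        if index in breaks and line % lines_per_page != 0:
            # A forced break moves the block to the top of the next page.
            line += lines_per_page - line % lines_per_page
        pages.append(line // lines_per_page + 1)
        line += height
    return pages


def place_breaks(blocks, layout, needs_break):
    """Format, mark the first block that needs a break, and format again."""
    breaks = set()
    while True:
        pages = layout(blocks, breaks)  # one full pass over the document
        target = next(
            (i for i, page in enumerate(pages)
             if i not in breaks and needs_break(i, page)),
            None,
        )
        if target is None:
            return breaks, pages
        breaks.add(target)


# Assumed example data: blocks 2 and 4 are "sections" that must not start
# on an even page.
blocks = [4, 8, 6, 5, 9]
sections = {2, 4}
breaks, pages = place_breaks(
    blocks, toy_layout, lambda i, page: i in sections and page % 2 == 0
)
```

Each added break shifts everything after it, which is why the whole layout is recomputed on every pass. In this toy run, the break before block 2 is what pushes block 4 onto an even page, so a second break is needed.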

How to skip a self closing tag in a ST function on an SAP system?

So I have this problem handling an XML file in my SAP ABAP-based software, with a Simple Transformation.
The file I receive normally has no empty tags like <test></test>, but sometimes I receive a self-closing tag like <test/>.
This is an example of what I thought to use now. The first condition handles if the ref('test') is blank by skipping it. The second one takes the values if we have one.
<tt:cond check="initial(ref('test'))">
<tt:skip count="*" name="test"/>
</tt:cond>
<tt:cond check="not-initial(ref('test'))">
<test tt:value-ref="test"/>
</tt:cond>
The idea is: if we have this tag <test/> we need to skip it; otherwise we need to assign the data. This works in the first case, because no data is read, but not in the second, because the data is still not read.
Can someone help?
Thanks in advance.
The XDM tree representations of <test></test> and <test/> are 100% identical, so there is no way an XSLT stylesheet can distinguish them or treat them differently. The idea of attaching different meanings to the two constructs is completely misguided: you can never be sure which representation an XML library will choose to use.
It is of course possible to distinguish an element that contains a value (such as <test>value</test>) from one that is empty - but both the above examples represent empty elements and must be treated as equivalent.
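The equivalence is easy to observe with any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree` rather than XSLT or Simple Transformations; the helper name `is_empty` is made up for this example, and treating whitespace-only text as empty is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    """True when the element has no child elements and no non-whitespace text."""
    has_text = bool(element.text and element.text.strip())
    return len(element) == 0 and not has_text


explicit = ET.fromstring("<test></test>")
self_closing = ET.fromstring("<test/>")
with_value = ET.fromstring("<test>value</test>")

# Both empty forms parse to the same tree and serialize identically, so the
# only distinction that can be tested is "empty" versus "has a value".
same_serialization = ET.tostring(explicit) == ET.tostring(self_closing)
```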

RegEx to remove specific XML elements

I'm using Kate to process text to create an XML file but I've hit a roadblock. The text now contains additional data that I need to remove based on its content.
To be specific, I have an XML element called <officers> that contains 0 or more <officer> elements, which contain further elements such as <title>, <name>, etc.. While I probably could exclude these at run time using XSL, the file also drives another process that I don't want to touch - it's a general purpose data importer for Scribus so I don't want to touch the coding.
What I want to do is remove an <officer> element if the <title> content isn't what I want. For example, I don't want the First VP, so I'd like to remove:
<officer>
<title>First VP</title>
<incumbent>Joe Somebody</incumbent>
<address>....</address>
<address>....</address>
......
</officer>
I don't know how many lines there will be in any <officer> element, nor what position each one will occupy within the <officers> element.
The easy part is getting to the start of the content I want removed. The hard part is getting to the </officer> end tag. All the solutions I've found so far just result in Kate deciding that the RegEx is invalid.
Any suggestions are appreciated.
Regex is the wrong tool for this job; never process XML without a proper parser, except possibly for a one-off job on a single document where you will throw the code away after running it and checking the results by hand. You might find a regex that works on one sample document, but you'll never get it to work properly on a well-designed set of 100 test documents.
And it's easily done using XSLT. It's a stylesheet with two template rules: a default "identity template" rule to copy elements unchanged, and a second rule to delete the elements you don't want. In fact in XSLT 3.0 it gets even simpler:
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="officer[title='First VP']"/>
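If no XSLT processor is at hand, the same "copy everything except the unwanted officers" idea can be approximated with a parser in a short script. A rough sketch using Python's standard `xml.etree.ElementTree`; the function name `remove_officers` and the sample data are assumptions for illustration, and unlike the stylesheet it does not preserve comments or the original formatting exactly.

```python
import xml.etree.ElementTree as ET


def remove_officers(xml_text, unwanted_title):
    """Return the document with every <officer> whose <title> matches removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the element list first so removals do not disturb the walk.
    for parent in list(root.iter()):
        for officer in parent.findall("officer"):
            if officer.findtext("title") == unwanted_title:
                parent.remove(officer)
    return ET.tostring(root, encoding="unicode")


# Assumed sample input, modelled on the structure described in the question.
sample = (
    "<officers>"
    "<officer><title>First VP</title><incumbent>Joe Somebody</incumbent></officer>"
    "<officer><title>President</title><incumbent>Jane Somebody</incumbent></officer>"
    "</officers>"
)
cleaned = remove_officers(sample, "First VP")
```

Because the parser matches start and end tags itself, the number of lines inside each <officer> and its position within <officers> no longer matter.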

Is it safe to wrap an entire CFM page in a cfoutput tag

I am placing a <cfoutput> tag around my entire <html> tag. The ColdBox best practice guide states "When you are creating view templates, try to always surround it with 1 cfoutput tag, instead of nesting them all over the place."
But I have on occasion seen errors pop up where a <script> block containing JavaScript code is within the <cfoutput> tag. This is probably because ColdFusion sees a hash (#) and tries to parse it, but can't because it's JavaScript.
So how does one get away with having a single <cfoutput> tag on a view page in which to place everything?
I am not aware of any significant security or performance concerns with wrapping an entire page in cfoutput. Of course, you'll always need to remember to escape any pound signs by doubling them whenever you're inside a cfoutput.
The best practices in that ColdBox guide are geared primarily toward readability and reducing clutter on the page. If you have large sections of the page that you don't want to escape pound signs on or if you like to use cfoutput's grouping functionality, there's nothing wrong with breaking up your cfoutputs in a way that makes sense.
In the olden days of CF there might have been more overhead, but these days I can't imagine it being more than a few nanoseconds, and that's once at compile time.
In my view files I tend to wrap all output in a single cfoutput tag.
You can escape # symbols in JavaScript, etc, by converting them to ##.
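If static CSS or JavaScript is pasted into a view by a script or build step, the doubling can be automated. A minimal sketch in Python; the helper name `escape_pound_signs` is hypothetical, and it assumes the text contains no CFML expressions, since it doubles every # it sees.

```python
def escape_pound_signs(static_text):
    """Double every # so the text survives inside a <cfoutput> block.

    Only suitable for text with no CFML expressions: a #variable# reference
    would be doubled too and would then be output literally.
    """
    return static_text.replace("#", "##")


escaped = escape_pound_signs("a { color: #ff0000; }")
```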
The simple answer to your question as posted is yes.
There are only two issues that I'm aware of to keep in mind:
- Escape any single hash signs (#) that occur in your code (CSS, JS, etc.) by doubling them (##), unless the hash signs are actually wrapped around a CFML function or variable.
- If you're using "cfoutput query...", you will probably want to close the first "cfoutput" tag and then reopen it after the query output. Otherwise, you can run into issues when trying to group query output.
My preference is to use as few tags as possible, mostly for readability and to reduce clutter.

Multi language website, how to approach this?

I have a website (ColdFusion) on which I want to offer multiple languages, but I have no idea what the best way to do this is.
I have 2 plans:
1:
Of course all content (text) is in a database.
If a user would want a different language, the user would click on a link/flag, this would put the requested language in a session variable, for example: session.language = "es"
In the database I would have 2 columns (every language has 1 column) and then select the text which belongs to 'es'
Every page would then do a request to the database to get the text belonging to the session.language.
PROS: Relatively simple to implement
CONS: SEO-wise I don't think this would be very good. http:// www.domain.com/page.cfm would return English text or Spanish text (or another language). Google will not add duplicate URLs.
2:
Do something with http:// www.domain.com/en/page.cfm for English and http:// www.domain.com/es/page.cfm for Spanish.
With a URL rewrite rule the language value in the URL http:// www.domain.com/en/page.cfm would actually be a page http:// www.domain.com/page.cfm?language=en
The url.language variable will then select the correct language from the database.
PROS: Unique URL for each language. Good for SEO and Google indexing.
CONS: A bit more difficult to implement. (I think)
Or does anyone have other / better ideas?
Thanks!!
You should always check the browser's "Accept-Language" header first for the default language(s) (the correct, standard way to do it), and offer links (the way that seems intuitively right) only as an alternative.
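As a rough illustration of what checking that header involves, here is a simplified sketch in Python. The function name `pick_language` and the set of supported languages are assumptions for the example; it handles only the common `tag;q=value` form and is not a full implementation of HTTP content negotiation.

```python
def pick_language(accept_language, supported, default="en"):
    """Pick the best supported language from an Accept-Language header value."""
    candidates = []
    for part in accept_language.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q value has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        candidates.append((quality, tag.strip().lower()))
    # Highest quality first; the stable sort keeps header order for ties.
    candidates.sort(key=lambda candidate: candidate[0], reverse=True)
    for quality, tag in candidates:
        if quality <= 0:
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "es-mx" falls back to "es"
        if primary in supported:
            return primary
    return default


chosen = pick_language("es-MX,es;q=0.9,en;q=0.8", {"en", "es"})
```

The language links then only need to override this default, for example by storing the user's explicit choice.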
Doing it in a database doesn't seem very standard. Let's assume you would like to use MVC architecture (model-view-controller). Most software uses keys in the presentation layer (view) (eg. html) and along with the presentation layer, you have language files (in Java, this is typically properties files) which are mapped simply by their filenames, and can be modified by regular users, without any special skills, such as professional translators with no computer skills. Certainly you could put it in a database, but then it is just more work, and moves the information out of the presentation layer.
There are various libraries for doing this. You should find the normal one for your application. Please edit your question to include what you are using to develop the application. (eg. JSP, Tapestry, Wicket, ASP, PHP, etc.) So for example, if you wanted to use JSPs, I would then suggest you use the JSTL tag library's language support. Or if you were using Tapestry, I would point you to http://tapestry.apache.org/localization.html or http://tapestry.apache.org/tapestry4.1/UsersGuide/localization.html
To look it up, you can look for the terms "internationalization" aka "i18n", or "localization". (The terms don't mean the same thing, but few use them correctly, so either works. http://www.w3.org/International/questions/qa-i18n)
I would go for option 2. Every translation should have its own url. Links to your website will already be in the intended translation.
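The mapping that option 2's rewrite rule performs, from /en/page.cfm to /page.cfm?language=en, can be sketched as follows. This is plain Python showing only the transformation, not web-server rewrite syntax; the two-letter prefix pattern and the function name `rewrite_path` are assumptions.

```python
import re

# Assumed convention: a two-letter lowercase language code as the first segment.
LANGUAGE_PREFIX = re.compile(r"^/(?P<lang>[a-z]{2})(?P<rest>/.*)$")


def rewrite_path(path):
    """Turn /en/page.cfm into /page.cfm?language=en; leave other paths alone."""
    match = LANGUAGE_PREFIX.match(path)
    if match is None:
        return path
    rest = match.group("rest")
    separator = "&" if "?" in rest else "?"
    return f"{rest}{separator}language={match.group('lang')}"


rewritten = rewrite_path("/es/page.cfm")
```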
To store translations in a database, I wouldn't put every translation in a separate column, but rather put them in a separate table:
Table Posts:
- id
- title_id
- ...
Table Translations:
- label_id
- value
- country_code
- language_code
Where title_id matches label_id
This way you won't have to alter your table structure when a new translation is added. This allows you to have infinite translations for any label or text.
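A small runnable sketch of that two-table layout, using an in-memory SQLite database from Python. The column types, the primary key, the sample rows, and the helper name `post_title` are all assumptions added for illustration; only the table and column names come from the outline above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        id       INTEGER PRIMARY KEY,
        title_id INTEGER NOT NULL
    );
    CREATE TABLE translations (
        label_id      INTEGER NOT NULL,
        value         TEXT NOT NULL,
        country_code  TEXT NOT NULL DEFAULT '',
        language_code TEXT NOT NULL,
        PRIMARY KEY (label_id, language_code, country_code)
    );
    """
)
# Made-up sample data: one post whose title exists in two languages.
conn.execute("INSERT INTO posts (id, title_id) VALUES (1, 100)")
conn.executemany(
    "INSERT INTO translations (label_id, language_code, value) VALUES (?, ?, ?)",
    [(100, "en", "Welcome"), (100, "es", "Bienvenido")],
)


def post_title(connection, post_id, language_code):
    """Look up a post's title in one language; None if it is not translated."""
    row = connection.execute(
        "SELECT t.value FROM posts p "
        "JOIN translations t ON t.label_id = p.title_id "
        "WHERE p.id = ? AND t.language_code = ?",
        (post_id, language_code),
    ).fetchone()
    return row[0] if row else None


spanish_title = post_title(conn, 1, "es")
```

Adding a third language is then just more rows in translations, with no change to either table's structure.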
To do a multilingual site effectively, you need to set yourself a rule that NO TEXT is ever hard-coded in the source. It needs to come from the database and/or a resource bundle.
Text from the database
You need to make sure that the column you are storing your data in is Unicode, otherwise you'll have issues with accented characters. Also, don't have a column per language, as this is not scalable; do what @jan suggests and have a translations table where the items are keyed on a reference as well as a language.
Resource bundles
You are not going to want to get every last little bit of text from the database so for those you can utilise a resource bundle. This is an, admittedly old, link http://www.sustainablegis.com/blog/cfg11n/index.cfm?mode=entry&entry=FD48909C-50FC-543B-1FE177C1B97E8CC1 from Paul Hastings's blog about some solutions to resource bundles. To be honest his blog is an excellent resource on this very subject.
With regard to how you handle the URLs, do not do option 1. As you quite rightly identified, it will cause issues with the page's SEO rankings, and it will mean that users cannot correctly share or return to the page.
There are two approaches. The first is having the language code in the URL, as you identified in option 2.
Pros
Simpler to configure
Cons
You have one application which means that as you add more languages you add more complexity and weight on the memory of that app
Or you can have a different subdomain or domain per application, e.g. es.yourdomain.com or yourdomain.es; they can all share the same codebase.
Pros
Each language is a standalone application, meaning it has its own memory
Cons
more effort to configure
http://i18n.riaforge.org/ has a download for i18n. It can be used to make sure that all string labels match. That way if some one wants to change "Save" to "Update", it can all be done in one spot.
It is also important to consider the technical background of those who will be doing the translation. It is often easier to get the translation team to edit files in Notepad than to update a DB. Text files also work well with version control.
The best way I have found is to use an XML file that holds just that page's language strings: one XML file per page, which you then vary by language. When the page loads, just load a different XML file from the database or from files; there are many ways to do this. All the other methods I have tried have their issues, and this one at least lets you take a language XML file and hand it to someone who will copy it and change the values. You then put it in the DB to serve it.
You can also do this for text, and have the DB build the XML for just that page's text from a list of items to include.
Once you get the idea, the rest becomes very easy.
And given CF's way of accessing such data with dot notation, it is easy to use.
Say you have "Load Images".
In the English XML it may be <LoadIMGS>Load Images</LoadIMGS>
In Chinese it may be <LoadIMGS>加载图像</LoadIMGS>
or <LoadIMGS>Jiā zǎi túxiàng</LoadIMGS>
Regardless, in your CFM code you would just put #variablename.LoadIMGS# in its place. I would also suggest recording in the LoadIMGS tag the size the font should be adjusted to, if not the normal size. That way, when a translation is too long, you can shrink the font for that item.
Enjoy!

Aspose, append Pdf's without extra space

I've got a list of PDF objects (or Streams) and would like to append them so that there is no extra white space between them. Aspose's 'PdfFileEditor.Concatenate' adds the next PDF starting on a new page, but I'd like it to start immediately after the previous one finishes.
Is it possible?
I don't know whether that is possible (there wasn't much time to investigate), so I had to refactor the code so that only one stream was created, with the correct layout.