XSL-FO Remove Static Content when break-before creates empty page

I force certain elements in my XML to start on either an even or an odd page by setting the attribute break-before="odd-page" or break-before="even-page". So far everything works great!
My problem is that this can create an empty page, for example when the preceding content ends on an even page and the next element is forced to start on an even page. The generated empty page (only its body is empty) still has the same static-content as the page before.
In such a case I want a completely empty page to be inserted instead, without page number, header or similar.
Is this somehow possible?
I am using Apache FOP (Formatting Objects Processor) 2.3.

You want an fo:conditional-page-master-reference (see https://www.w3.org/TR/xsl11/#fo_conditional-page-master-reference) with a blank-or-not-blank (see https://www.w3.org/TR/xsl11/#blank-or-not-blank) value of blank.
The first fo:conditional-page-master-reference for which all of its traits are true is the one that is selected, so the new fo:conditional-page-master-reference should come before the fo:conditional-page-master-reference elements that refer to the page masters for odd and even pages. Here's an example that comes from a FOP test file (I think):
<fo:page-sequence-master master-name="pages">
  <fo:repeatable-page-master-alternatives>
    <fo:conditional-page-master-reference page-position="first" master-reference="first-page"/>
    <fo:conditional-page-master-reference page-position="last" master-reference="last-page"/>
    <fo:conditional-page-master-reference blank-or-not-blank="blank" master-reference="blank-page"/>
    <fo:conditional-page-master-reference odd-or-even="odd" master-reference="odd-page"/>
    <fo:conditional-page-master-reference odd-or-even="even" master-reference="even-page"/>
  </fo:repeatable-page-master-alternatives>
</fo:page-sequence-master>
For completely blank pages, the second part of the solution is to not direct your existing fo:static-content to any regions on the blank pages. The fo:simple-page-master for blank pages should either not contain fo:region-after, etc., or, if it does include them, those regions should have region-name values that differ from the flow-name values in your existing fo:static-content.
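As a minimal sketch of that second part: the blank master can declare a body region and nothing else, so no fo:static-content has a region to flow into. Only the master name blank-page comes from the example above; the region names header and footer, the page dimensions, and the contents of the odd-page master are assumptions for illustration.

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Normal page (hypothetical): region names match the flow-name values
       used by the existing fo:static-content elements. -->
  <fo:simple-page-master master-name="odd-page"
      page-height="297mm" page-width="210mm" margin="20mm">
    <fo:region-body margin-top="15mm" margin-bottom="15mm"/>
    <fo:region-before region-name="header" extent="10mm"/>
    <fo:region-after region-name="footer" extent="10mm"/>
  </fo:simple-page-master>

  <!-- Blank page: only a body region, so headers, footers and page numbers
       have nowhere to be rendered. -->
  <fo:simple-page-master master-name="blank-page"
      page-height="297mm" page-width="210mm" margin="20mm">
    <fo:region-body/>
  </fo:simple-page-master>
</fo:layout-master-set>
```

The fo:page-sequence-master that selects between these masters lives in the same fo:layout-master-set.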

Check if <fo:page-number> is even or not using XSLT 2.0

How can I check whether the <fo:page-number> is even or odd using XSLT 2.0? Is there any way to use <fo:page-number> inside <xsl:if test="fo:page-number mod 2 = 0">?
The XSLT stage generates the XSL-FO that the formatter then makes into pages. So, no, you can't get the current page number when you are generating the XSL-FO.
What do you want to change if it is an even-numbered page?
With XSL-FO, you can set up different page masters for odd and even pages (and more besides). The different page masters can have different margins, and you can set things up so that the formatter will direct different content to headers and footers on even pages than is used on odd pages.
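A minimal sketch of such a setup follows. All master names, region names, dimensions and margins here are made up for illustration; the point is only that each page master has its own footer region name and that a separate fo:static-content targets each one.

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- Odd pages: wider inner (left) margin, footer region "odd-footer". -->
    <fo:simple-page-master master-name="odd-page" page-height="297mm" page-width="210mm"
        margin-top="20mm" margin-bottom="20mm" margin-left="30mm" margin-right="20mm">
      <fo:region-body margin-bottom="15mm"/>
      <fo:region-after region-name="odd-footer" extent="10mm"/>
    </fo:simple-page-master>
    <!-- Even pages: mirrored margins, footer region "even-footer". -->
    <fo:simple-page-master master-name="even-page" page-height="297mm" page-width="210mm"
        margin-top="20mm" margin-bottom="20mm" margin-left="20mm" margin-right="30mm">
      <fo:region-body margin-bottom="15mm"/>
      <fo:region-after region-name="even-footer" extent="10mm"/>
    </fo:simple-page-master>
    <!-- The formatter picks the master for each page as it is generated. -->
    <fo:page-sequence-master master-name="pages">
      <fo:repeatable-page-master-alternatives>
        <fo:conditional-page-master-reference odd-or-even="odd" master-reference="odd-page"/>
        <fo:conditional-page-master-reference odd-or-even="even" master-reference="even-page"/>
      </fo:repeatable-page-master-alternatives>
    </fo:page-sequence-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="pages">
    <!-- Page number on the outer edge: right on odd pages, left on even pages. -->
    <fo:static-content flow-name="odd-footer">
      <fo:block text-align="end"><fo:page-number/></fo:block>
    </fo:static-content>
    <fo:static-content flow-name="even-footer">
      <fo:block text-align="start"><fo:page-number/></fo:block>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Body text goes here.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

With this arrangement the XSLT never needs to know the page number: it emits both fo:static-content elements, and the formatter decides which one applies to each page.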
See the 'Page Region and Structure' PDF and FO files in the 'XSL-FO Samples Collection' at https://www.antennahouse.com/xsl-fo-samples#structure
What you ask for cannot be done with a true batch formatter in a single pass. It requires "human" intervention to mark only those places where the break needs to occur and not others.
Also, there is no guarantee that one XSL-FO formatter will yield the same results as another. Formatters differ in how they handle "line tightness" (very slight squeezing of spaces and characters to fit text within a line), in whether they support kerning, and in many other ways, so it is not possible to predict whether a given paragraph will start on a particular page.
Formatting text in true typography is not merely word-space-word-space... Many other factors can change the number of lines in a paragraph between one formatter and another, and that can easily ripple so that a given paragraph lands on an even page in one formatter, yet on an odd page in a different formatter.
Then you also need other rules: what if, with your formatter of choice, the paragraph at which you wish to break is already the first one on its page? Do you want a blank page? Maybe, who knows?
The only way to accomplish your task is through a multipass approach, which can be implemented so that it is generic to any formatter. You would need to format the whole document or, if you are chunking the document with page masters, at least a chunk that starts and ends on page boundaries. Format it and test your condition on the first paragraph. If it passes (meaning a break is needed), go back to the original content (or modify the XSL-FO) and set an attribute that results in break-before="page" on that structure. Then repeat the process until you reach the end of the document. Some formatters can provide you with the area tree, and with markers you can put in that tree, so that you can do this programmatically and not by eye.
If your document is long and in one page-sequence (say like 3000 pages when formatted) and your break condition is frequent, you may have to repeat the process 700+ times.
As stated, some formatters through their API may allow you to control this programmatically. You can examine the area tree, look for your marker and keep count of pages. You may even be able to start formatting again at the break condition and not start over, but you need to program such things.

Drupal 8 Webform: how to display text input on one page on the next page?

I am trying to develop a multistep webform in Drupal 8 using Webform 8.x-5.1. I have written a WebformHandler that extends Drupal\webform\Plugin\WebformHandlerBase and made it available to the webform.
In the first step of the webform, I collect a text-field. I would like to display the value of that text-field in an HTML element (Advanced HTML/Text or Basic HTML) on the second page after doing some computation.
I have overridden submitForm() in the WebformHandler and in it assign the value I want to the HTML element as follows:
$form['elements']['page_name']['advanced_html_element']['#text'] = '...my HTML...';
Using ksm() I can see that this assignment works, but the HTML element is not rendered with my HTML: the element is either invisible or contains the initial value set up in the form editor.
Clearly I'm missing something. Should I be using something other than submitForm? Can anyone help me?
It's been a long haul, but I've finally worked out how to do what I want to. The following works for me.
Firstly, I discovered the method validateForm in WebformHandlerBase. On each page of a form with multiple pages, you will find that the following methods are called in the order given here:
1. submitForm (called once)
2. alterForm (called possibly more than once)
3. validateForm (called once)
The name validateForm leads me to believe I may be misusing this method, but that is where I set up the elements on the following page that I wish to programmatically initialise. It works, so what the hey!
In validateForm, I initialise the elements that appear on the following page as follows:
$form_state->setValue(<element name>, <data structure>);
The <element name> is the name you give the element in the form editor ("Build" tab). The <data structure> has to be correct, of course: I suggest you find the appropriate structure by first filling in the element on the next page manually and seeing what turns up in $form_state.
There is also a $form_state->getValue(<element name>), which seems to me to mean that $form_state can also be used for storing session data, say in hidden fields. I initially used \Drupal::service('tempstore.private')->get('xxx') for storing data that had to be available across page boundaries, but $form_state might be a cleaner solution.
I hope this helps someone: I spent a horribly long time trying to get this to work.

Build validation rules for item DisplayName property

I'm working on a multi-language solution in Sitecore and want to use the DisplayName property of an item to represent the URL to allow for language-specific URLs.
I've set the useDisplayName web.config property to true as shown below
<linkManager defaultProvider="sitecore">
  <providers>
    <clear />
    <add name="sitecore"
         alwaysIncludeServerUrl="false"
         encodeNames="true"
         type="Sitecore.Links.LinkProvider, Sitecore.Kernel"
         addAspxExtension="false"
         shortenUrls="true"
         languageEmbedding="asNeeded"
         languageLocation="filePath"
         useDisplayName="true" />
  </providers>
</linkManager>
I've also been playing with the <encodeNameReplacements> section which can replace %20 with a hyphen in the URL to give nice clean URLs - this is done with the following for those who are interested:
<replace mode="on" find=" " replaceWith="-" />
All very good, except that Sitecore breaks if a user enters a hyphen within a DisplayName with the above setting turned on.... If I turn the above setting off, then I have to ensure that users enter nice hyphen separated values for the DisplayName otherwise we start seeing nasty %20s again in the URL...
So, is there a way to validate the DisplayName property to either disallow or allow hyphens being used?
Or, even better, is there a way to hook into whatever code is executed when the encodeNameReplacements thing happens? This would be ideal, as I could allow users to enter whatever they like for DisplayName, then just sanitise this value on the fly.
There is a solution for this, but it will require some coding and it's quite complex.
I have used this kind of solution in many projects before, and it's the only way of solving this that I'm aware of.
You don't want to really replace spaces with hyphens when saving the item, because it's not user friendly.
My solution works at runtime.
First get rid of the <replace> rule you added.
Then create your own LinkProvider (inherit from Sitecore's default provider).
Inside the LinkProvider, create a method to "normalize" the item's display name (e.g. replace spaces with hyphens); let's call this method NormalizeDisplayName(). Make it public and static because you will need it later.
Once your LinkProvider uses this method when building URLs, Sitecore replaces all spaces with hyphens in links. The rest you can still configure using the default provider options (addAspxExtension="false", useDisplayName="true", etc.).
Next up is the ItemResolver: Sitecore's default ItemResolver is not going to recognize the item path anymore so you are going to add your own ItemResolver to fix this.
Create a class that inherits from Sitecore.Pipelines.HttpRequest.HttpRequestProcessor and configure it to be used in the <httpRequestBegin> pipeline after the default ItemResolver.
Now, when the itemresolver is processed you will first split up the requested itempath (let's assume that "/category-name/subitem-name" was requested).
Starting from the site root (which can be pulled from Sitecore.Context.Site), loop through all children while normalizing their display names with the NormalizeDisplayName() method you created earlier, until you find one that matches the current part of the requested item path.
So in this case, loop through the children of your Home item until you find one that matches normalized displayname "category-name". Then do the same thing for the children of that item until you find the item with normalized displayname "subitem-name".
This way you can resolve the requested item and it will also work if the original displayname already contained hyphens!
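Registering such a resolver could look roughly like the following include file. This is a sketch only: the class name MySite.Pipelines.DisplayNameItemResolver and the assembly name MySite are hypothetical, and the exact type string of the default ItemResolver should be checked against your own configuration.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <!-- Hypothetical custom resolver, inserted right after the default
             ItemResolver so it only has to handle paths the default one missed. -->
        <processor type="MySite.Pipelines.DisplayNameItemResolver, MySite"
                   patch:after="processor[@type='Sitecore.Pipelines.HttpRequest.ItemResolver, Sitecore.Kernel']" />
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>
```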
I'm sorry that I can't give you complete code examples as it's quite complex and is not limited to just the above things. You also need to think about redirecting if the URL is not properly formatted and make exceptions for master/core database to prevent Sitecore from breaking.
Hope this helps you!
If you look at the source in .NET Reflector (Sitecore.Shell.Framework.Commands.SetDisplayName), no pipelines are run.
You could handle the item:saving event instead and rewrite the display name as you want, for example replacing hyphens with spaces:
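using System;
using Sitecore.Data.Items;
using Sitecore.Events;
public void OnItemSaving(object sender, EventArgs args)
{
    // The item being saved is passed as the first event parameter.
    Item item = Event.ExtractParameter(args, 0) as Item;
    if (item == null) return;
    item.Appearance.DisplayName = item.Appearance.DisplayName.Replace("-", " ");
}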
Just a quick example of the event
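To have Sitecore call a handler like this, it has to be registered for the item:saving event. A rough sketch of the configuration, assuming the method lives in a class named MySite.Events.DisplayNameHandler in an assembly named MySite (both names are made up here):

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <events>
      <event name="item:saving">
        <!-- Hypothetical type; "method" names the handler method to invoke. -->
        <handler type="MySite.Events.DisplayNameHandler, MySite" method="OnItemSaving" />
      </event>
    </events>
  </sitecore>
</configuration>
```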

Why might a sitecore presentation component produce different output dependent on domain?

As I can't post code I'm asking this as a theoretical question, but giving a scenario.
I have a "newsroom" sublayout, which statically binds a couple of XSLTs to list latest news and latest events. The sublayout is used on a newsroom item; the events and news items are descendants of it (though not direct child items, as there are a couple of layers of folders to categorise and date items).
The sublayout is in use in around 10 sites in our solution with no problem. Each site is a clone of our main site with an extra language version added. We have successfully used this with (amongst others) Japanese, Chinese, Russian, Polish and Czech language sites.
Our most recent clone (Turkish), however, shows no items in the event or news lists. The items exist and are published, and display as expected when browsed individually.
The presentation details for the newsroom item are identical to all other newsroom items.
Even more perplexing, the newsroom item itself, when displayed in the context of a different domain, displays correctly.
i.e.
www.mysite.com/sitecore/content/my_turkish_site/path/newsroom?sc_lang=tr-TR
shows the lists without a problem, including dates formatted according to culture, but
www.mysite.com.tr/sitecore/content/my_turkish_site/path/newsroom?sc_lang=tr-TR
shows empty lists.
The exact same problem occurs if language is switched to English (the language of the source of the clone)
Almost all of the Turkish site is working properly.
None of the presentation components are marked as cachable.
None of the presentation components have a specified data source (i.e. they all use the current item/descendants axis)
What are the possible causes of this problem, and how can I test them?
EDIT:
For Mark Ursino
This is the site definition (slightly fictionalised). I can't post that much more of the web.config...
<site name="www.mysite.com.tr" patch:after="site[@name='www.mysite.com.au']" virtualFolder="/" physicalFolder="/" rootPath="/sitecore/content/CloneData/TurkishClone" hostName="www.mysite.com.tr" startItem="/Turkey_Home" database="web" domain="extranet" allowDebug="true" cacheHtml="true" htmlCacheSize="10MB" enablePreview="true" enableWebEdit="true" enableDebugger="true" disableClientData="false" language="tr-TR" />
Some debugging shows that the XSLTs aren't matching the template when the item is viewed in the Turkish context.
This is the debug match used (the select is what we use in our for-each):
<xsl:value-of select="count(./descendant::item[@template='newsitem' and @id!=$topNewsId and sc:fld('__created',.)])"/>
It matches on the same item viewed from other domains.
Debug output shows the Turkish site thinks the template is newsıtem instead of newsitem (the i is wrong!).
I've also tested viewing the newsroom of other sites through the Turkish domain - the problem is the same.
We have the same problem with items based on the eventitem template, and apparently with an Image Spot template.
Have you tried printing out the language, culture and context item id, lang, version within your events lister and xslts?
If that shows anything unexpected you could step-in to some of the Sitecore pipeline resolvers with Reflector Pro's external assembly debugging.
Answering my own question.
Sitecore template names are converted to lower case by the Sitecore XSLT extensions for use in XSLT comparisons.
This lower-case conversion is not culture-invariant. In Turkish, lower-casing an upper-case I gives the dotless lower-case ı.
As of 6.5 rev. 110419, Sitecore has replaced calls to ToLower() in the Sitecore API with ToLowerInvariant():
http://sdn.sitecore.net/Products/Sitecore%20V5/Sitecore%20CMS%206/ReleaseNotes/ChangeLog.aspx
As we're not upgrading to 6.5 quite yet we'll be using template IDs rather than names in XSLT for now.
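In the stylesheet, matching on template ID instead of name might look like the fragment below. This is a sketch only: it assumes the item nodes expose the template ID in a tid attribute and that $topNewsId is defined as in the original expression, and the GUID shown is a placeholder, not a real template ID.

```xml
<!-- Placeholder: replace with the real ID of the news item template. -->
<xsl:variable name="newsTemplateId" select="'{00000000-0000-0000-0000-000000000000}'"/>

<!-- Same count as before, but compared by template ID (assumed attribute: @tid)
     rather than by the lower-cased template name. -->
<xsl:value-of select="count(./descendant::item[@tid=$newsTemplateId and @id!=$topNewsId and sc:fld('__created',.)])"/>
```

A GUID contains only hexadecimal digits, so there is no letter I for a culture-sensitive case conversion to change.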
.Net info on strings and culture, including specifically the "Turkish I problem"
http://msdn.microsoft.com/en-us/library/ms973919.aspx

Markdown and XSS

Ok, so I have been reading about markdown here on SO and elsewhere and the steps between user-input and the db are usually given as
1. convert markdown to html
2. sanitize html (w/whitelist)
3. insert into database
but to me it makes more sense to do the following:
1. sanitize markdown (remove all tags - no exceptions)
2. convert to html
3. insert into database
Am I missing something? This seems to me to be pretty nearly xss-proof
Please see this link:
http://michelf.com/weblog/2010/markdown-and-xss/
> hello <a name="n"
> href="javascript:alert('xss')">*you*</a>
Becomes
<blockquote>
<p>hello <a name="n"
href="javascript:alert('xss')"><em>you</em></a></p>
</blockquote>
∴ you must sanitize after converting to HTML.
There are two issues with what you've proposed:
1. I don't see a way for your users to be able to format posts. You took advantage of Markdown to provide nice numbered lists, for example. In the proposed no-tags-no-exceptions world, I'm not seeing how the end user would be able to do such a thing.
2. Considerably more important: when using Markdown as the "native" formatting language and whitelisting the other available tags, you are limiting not just the input side of the world, but the output as well. In other words, if your display engine expects Markdown and only allows whitelisted content out, then even if (God forbid) somebody gets to the database and injects some nasty malware-laden code into a bunch of posts, the actual site and its users are protected because you are sanitizing it upon display as well.
There are some good resources on the web about output sanitization:
Sanitizing user data: Where and how to do it
Output sanitization (One of my clients, who shall remain nameless and whose affected system was not developed by me, was hit with this exact worm. We have since secured those systems, of course.)
BizTech: Best Practices: Never heard of XSS?
Well certainly removing/escaping all tags would make a markup language more secure. However the whole point of Markdown is that it allows users to include arbitrary HTML tags as well as its own forms of markup(*). When you are allowing HTML, you have to clean/whitelist the output anyway, so you might as well do it after the markdown conversion to catch everything.
*: It's a design decision I don't agree with at all, and one that I think has not proven useful at SO, but it is a design decision and not a bug.
Incidentally, step 3 should be ‘output to page’; this normally takes place at the output stage, with the database containing the raw submitted text.
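1. insert into database
2. convert markdown to html
3. sanitize html (w/whitelist)
In Perl:
use strict;
use warnings;
use Text::Markdown ();
use HTML::StripScripts::Parser ();
# Whitelist-based sanitizer: links and mailto are allowed, src attributes are not.
my $hss = HTML::StripScripts::Parser->new(
    {
        Context        => 'Document',
        AllowSrc       => 0,
        AllowHref      => 1,
        AllowRelURL    => 1,
        AllowMailto    => 1,
        EscapeFiltered => 1,
    },
    strict_comment => 1,
    strict_names   => 1,
);
# Convert the raw Markdown to HTML first, then sanitize the resulting HTML.
$hss->filter_html(Text::Markdown::markdown(shift));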
1. convert markdown to html
2. sanitize html (w/whitelist)
3. insert into database
Here, the assumptions are:
- Given dangerous HTML, the sanitizer can produce safe HTML.
- The definition of safe HTML will not change, so if it is safe when I insert it into the DB, it is safe when I extract it.
1. sanitize markdown (remove all tags - no exceptions)
2. convert to html
3. insert into database
Here, the assumptions are:
- Given dangerous markdown, the sanitizer can produce markdown that, when converted to HTML by a different program, will be safe.
- The definition of safe HTML will not change, so if it is safe when I insert it into the DB, it is safe when I extract it.
The markdown sanitizer has to know not just about dangerous HTML and dangerous markdown, but how the markdown->HTML converter does its job. That makes it more complex, and more likely to be wrong than the simpler unsafeHTML->safeHTML function above.
As a concrete example, "remove all tags" assumes you can identify tags, and would not work against UTF-7 attacks. There might be other encoding attacks out there that render this assumption moot, or there might be a bug that causes the markdown->HTML program to convert (full-width '<', exotic white-space characters stripped by markdown, SCRIPT) into a <script> tag.
The most secure would be:
1. sanitize markdown (remove all tags - no exceptions)
2. convert markdown to HTML
3. sanitize HTML
4. insert into a DB column marked risky
5. re-sanitize HTML every time you fetch that column from the DB
That way, when you update your HTML sanitizer you get protection against any newly discovered attacks. This is often inefficient, but you can get pretty good security by storing a timestamp with HTML inserted so that you can tell which might have been inserted during the time when someone knew about an attack that gets past your sanitizer.