Aspose.Words converting DOCX to HTML loses MERGEFIELD, IF conditions, headers and footers, table cell widths - aspose

I'm trying to write an online document editor with TinyMCE 5 as the editor and Aspose.Words v20.8 as the converter.
But when I convert the DOCX to HTML5 with Aspose.Words, it does not render as expected in TinyMCE.
As far as I can tell so far, the HTML loses, for example, headers, footers, MERGEFIELDs, IF fields, and TableStart:TableEnd regions.
I need the HTML to contain all of this data because I have to convert it back to DOCX again.
The code to generate the HTML5 is:
var doc = new Document({Stream_Of_DOCX});
var options = new HtmlSaveOptions();
options.SaveFormat = SaveFormat.Html;
options.Encoding = System.Text.Encoding.UTF8;
options.UpdateFields = true;
options.ExportRoundtripInformation = true;
options.ExportImagesAsBase64 = true;
options.ExportFontsAsBase64 = true;
options.ExportPageSetup = true;
options.ExportDocumentProperties = true;
options.ExportHeadersFootersMode = ExportHeadersFootersMode.PerSection;
options.HtmlVersion = HtmlVersion.Html5;
doc.Save($"{fileName}.html", options);
The code to convert the HTML5 back to DOCX is below, where model.Html is the content of the TinyMCE textarea:
var doc = new Document();
var builder = new DocumentBuilder(doc);
builder.InsertHtml(model.Html);
doc.Save($"{fileName}.docx");
Can anybody help me get this working, ideally with some code examples?
Or maybe someone has a better idea for accomplishing the task.
The main idea is to be able to edit DOCX files online, without having to download a file and upload it again (for example, via some Windows service acting as a client).

Aspose.Words does preserve headers and footers when saving to HTML if the ExportRoundtripInformation option is enabled. In this case Aspose.Words writes header and footer content with special CSS properties that Aspose.Words understands:
<div style="-aw-headerfooter-type:header-primary; clear:both">
<p style="margin-top:0pt; margin-bottom:0pt; line-height:normal">
<span>header</span>
</p>
</div>
Also, Aspose.Words preserves some fields (PAGE, NUMPAGES, NOTEREF, REF, AUTHOR and TITLE). For example, a PAGE field is exported like this:
<span style="-aw-field-start:true"></span><span style="-aw-field-code:' PAGE \\* MERGEFORMAT '"></span><span style="-aw-field-separator:true"></span><span>1</span><span style="-aw-field-end:true"></span>
Such content is recognized by Aspose.Words when it reads the HTML and is loaded into the model as a field. I logged a request, WORDSNET-21037, to preserve other types of fields too.
I am not familiar with TinyMCE, but I suspect that the custom attributes Aspose.Words uses to roundtrip MS Word features are removed, and that is why headers and footers are not preserved in your case.
Disclosure: I work on the Aspose.Words team.
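The suspicion that the editor strips the roundtrip markers can be checked before changing any conversion code. The following is a minimal sketch and not part of Aspose.Words or TinyMCE: the helper names collectAwProperties and missingAwProperties are hypothetical, and it assumes the roundtrip markers are the "-aw-" prefixed CSS properties shown above. It compares the HTML exported by Aspose.Words with the HTML that comes back from the editor and lists the marker properties that disappeared.

```javascript
// Hypothetical helper: collect the distinct "-aw-*" CSS property names
// that appear in an HTML string.
function collectAwProperties(html) {
  const found = new Set();
  const pattern = /(-aw-[a-z0-9-]+)\s*:/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    found.add(match[1].toLowerCase());
  }
  return found;
}

// Hypothetical helper: list the "-aw-*" properties present in `before`
// (the exported HTML) but absent from `after` (the HTML read back from
// the editor), sorted alphabetically.
function missingAwProperties(before, after) {
  const kept = collectAwProperties(after);
  return [...collectAwProperties(before)]
    .filter((name) => !kept.has(name))
    .sort();
}
```

If the returned list is non-empty for HTML that has passed through the editor, the markers were dropped somewhere between export and save, which would be consistent with the explanation above.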

Related

Use the title= HTML attribute with RMarkdown

I am trying to understand whether it is possible to insert the HTML title= attribute (not necessarily inside an <abbr> tag) in an RMarkdown document (e.g. a blog post written with blogdown).
From W3C: the title attribute specifies extra information about an element. The information is most often shown as a tooltip text when the mouse moves over the element.
The <abbr title="World Health Organization">WHO</abbr> was founded in 1948.
I couldn't find anything about using it in RMarkdown, though.
You can write raw HTML in Markdown. However, if you are using Hugo >= v0.60.0, raw HTML will be ignored by default. You need to set an option in your config file to enable it:
[markup.goldmark.renderer]
unsafe = true

How to render a link with a query string in Sitecore

I am trying to render a link field on the page in the format below:
<a class="book__btn" href="https://oc.axis.com/rez.aspx?submit=&shell=CASGCF">
Book
</a>
aspx:
<sc:Link ID="lnkBook" runat="server" Field="Target URL"></sc:Link>
<sc:FieldRenderer ID="frBook" runat="server" FieldName="Target URL"></sc:FieldRenderer>
aspx.cs:
Item offerDetails = this.DataSource;
lnkBook.Item = offerDetails;
frBook.Item = offerDetails;
The pic shows my declarations using Rocks
When the page is previewed, the button does not render at all. However, if I remove the text in the Query String field, it renders fine.
The Sitecore Rocks interface is a little misleading for links because the available fields do not change based on the link type. Query String is only supported for internal links. If you want to add a query string to an external link, just add it directly to the URL.

How to replace all anchor tags with a different anchor using regex in ColdFusion

I found a similar question here: Wrap URL within a string with a href tags using Coldfusion
But what I want to do is replace anchor tags with a slightly modified version AFTER the user has submitted the text to the server. Here is some typical HTML text that a user will submit:
<p>Terminator Genisys is an upcoming 2015 American science fiction action film directed by Alan Taylor. You can find out more by clicking here</p>
What I want to do is replace the <a href=""> part with a new version which would be like this:
...
clicking here
So I'm just adding the text rel="nofollow noreferrer" to the tag.
I must match anchor tags that contain a href attribute with a URL, not just the URL string itself, because sometimes a user could just do this:
<p>Terminator Genisys is an upcoming 2015 American science fiction action film directed by Alan Taylor. You can find out more by http://www.imdb.com</p>
In which case I still only want to replace the tag. I don't want to touch the actual anchor text used even though it is a URL.
So how could I rewrite this Regex
#REReplaceNoCase(myStr, "(\bhttp://[a-z0-9\.\-_:~###%&/?+=]+)", "\1", "all")#
the other way round, so that it selects the tags and replaces them with my modified text?
If you're willing, this is a really easy task for jQuery (client-side).
JSFiddle: http://jsfiddle.net/mz1rwo0u/
$(document).ready(function () {
    $("a").each(function () {
        // Anchors without an href return undefined, so guard before matching.
        var href = $(this).attr('href');
        if (href && href.match(/^https?:\/\/(www\.)?imdb\.com/i)) {
            $(this).attr('rel', 'nofollow noreferrer');
        }
    });
});
(If you right click any of the imdb links and Inspect Element, you'll see the rel attribute is added to the imdb links. Note that View Source won't reflect the changes, but Inspect Element is the important part.)
If you want to affect every a link, you can do this.
$(document).ready(function () {
$("a").each(function(e) {
$(this).attr('rel','nofollow noreferrer');
});
});
Finally, you can also use a selector to narrow it down. For example, if the content loads into a DOM element with the id contentSection, you can do this:
$(document).ready(function () {
    $("#contentSection a").each(function () {
        // Anchors without an href return undefined, so guard before matching.
        var href = $(this).attr('href');
        if (href && href.match(/^https?:\/\/(www\.)?imdb\.com/i)) {
            $(this).attr('rel', 'nofollow noreferrer');
        }
    });
});
It's a bit tougher to reliably parse this in ColdFusion without the risk of accidentally adding the attribute twice (unless you bring in a tool like jSoup). The jQuery version is client-side and works by reading from the DOM rather than trying to hot-wire into the markup (a jSoup implementation works similarly, creating a DOM-like structure you can work with).
When weighing client-side against server-side, you have to consider the mythical user who doesn't have JavaScript enabled (or who turns it off with malicious intent). If this functionality is not mission-critical, I'd use jQuery to do it. I've used similar functionality to pop up an alert box when the user clicks an outside link on one of my sites.
Here's a quick and dirty jSoup implementation. jSoup is great because its selectors work much like jQuery's.
<cfscript>
jsoup = CreateObject("java", "org.jsoup.Jsoup");
HTMLDocument = jsoup.parse("<A href='http://imdb.com'>test</a> - <A href='http://google.com'>google</a>");
As = htmldocument.select("a");
for (link in As) {
if (reFindnoCase("^https?:\/\/(www\.)?imdb\.com",link.attr("href"))) {
link.attr("rel","nofollow noreferrer");
}
}
writeOutput(htmldocument);
</cfscript>
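For comparison, the double-insertion concern mentioned above can also be handled in a plain regex by refusing to match tags that already carry a rel attribute. This is only a sketch in JavaScript under stated assumptions: the function name addRelToAnchors is hypothetical, the pattern has not been tested with REReplaceNoCase, and like any regex over HTML it can be fooled by unusual markup (for example a ">" inside an attribute value, or "rel=" appearing inside the href itself).

```javascript
// Hypothetical helper: add a rel attribute to every <a ...> opening tag
// that has an href and does not already declare rel.
function addRelToAnchors(html, relValue) {
  // (?![^>]*\brel\s*=) rejects tags that already contain rel=,
  // so running the function twice leaves the markup unchanged.
  const pattern = /<a\b(?![^>]*\brel\s*=)([^>]*\bhref\s*=[^>]*)>/gi;
  return html.replace(pattern, (tag, attrs) => {
    return "<a" + attrs + ' rel="' + relValue + '">';
  });
}
```

Bare URLs in the text are left alone because the pattern only matches opening anchor tags. A DOM-based approach such as the jSoup version remains the more robust choice.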

Extract Sharepoint 2013 wiki page HTML Source / Display wiki page without master layout in iFrame?

What I am trying to achieve is to display wiki page content in a floating iFrame (keeping the styling, of course) for a tool that I am developing for our employees. Right now the tool uses a jQuery dialog box to display a specific document or PDF. For compatibility and usability reasons I would really like to upgrade it to use a wiki page instead of documents and PDFs. The problem is that there is no direct link to the content of a SharePoint wiki page; the only direct link available is to the whole page, with all the navigation menus, option panel, user panel, etc. I want to avoid using JavaScript to strip these elements away. Instead, I am trying to find out whether SharePoint 2013 has a more elegant way of providing the content, such as a web service or the JavaScript SP API.
My ideas so far:
REST URL to give the content back? I know for sure it works for lists and libraries, but I couldn't find anything in the REST API about wiki page content
SP.js ? Couldn't find anything about that either
Anyways, it could be possible that I have overlooked things, or probably haven't searched hard enough. However, any help is very very welcome. If you don't know about a concrete solution I would be very happy with nice suggestions too :)
If there is nothing out of the box I will have to get to my backup plan of a jQuery solution to get the page and strip off all unnecessary content and keep the styling.
I believe you are on the right track with the REST API. In an Enterprise Wiki page, the content is stored in the PublishingPageContent property.
The following example demonstrates how to retrieve Enterprise Wiki Page content:
var getWikiPageContent = function (webUrl,itemId,result) {
var listTitle = "Pages";
var url = webUrl + "/_api/web/lists/GetByTitle('" + listTitle + "')/items(" + itemId + ")/PublishingPageContent";
$.getJSON(url,function( data ) {
result(data.value);
});
}
Usage
getWikiPageContent('https://contoso.sharepoint.com',1,function(pageContent){
console.log(pageContent);
});
And here is something for those of you who like to have more than one example:
var inner_content;
var page_title = "home";
$.ajax({
url: "https://mysharepoint.sharepoint.com/MyEnterpriseWikiSite/_api/web/Lists/getbytitle('Pages')/items?$filter=Title eq '" + page_title +"'",
type: "GET",
headers: {
"ACCEPT": "application/json;odata=verbose"
},
success: function (data) {
if (data.d.results[0]) {
inner_content = data.d.results[0].PublishingPageContent;
}
},
error: function () {
// Show error here
}
});
That's what did the job for me.
This example fetches the inner content of an Enterprise Wiki page by Title. Make sure you are not using the Name of the page: although Title and Name can be given the same string value, they are different fields in SharePoint 2013.
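Both examples build the request URL by string concatenation, which breaks if the site URL ends in a slash or the title contains an apostrophe. Below is a minimal sketch of a URL builder, assuming the list is titled 'Pages' and the content lives in PublishingPageContent as in the answers above. The helper name buildPageContentUrl and the use of $select to limit the response are assumptions added for illustration, not something the answers confirm.

```javascript
// Hypothetical helper: build the REST URL that looks up a page by Title.
function buildPageContentUrl(siteUrl, pageTitle) {
  // Drop trailing slashes so the result does not contain "//_api".
  const base = siteUrl.replace(/\/+$/, "");
  // OData string literals escape a single quote by doubling it.
  const escapedTitle = pageTitle.replace(/'/g, "''");
  const filter = encodeURIComponent("Title eq '" + escapedTitle + "'");
  return base + "/_api/web/lists/getbytitle('Pages')/items" +
    "?$select=PublishingPageContent&$filter=" + filter;
}
```

The resulting string can be passed as the url option of the $.ajax call shown earlier.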

Extract URL from document.write()

I have a simple html page that sources a javascript file. The javascript file's only purpose is to write the following...
document.write('<img src="http://www.location.com/image.png">');
Once the information has been written, I need some JavaScript to extract the URL and the image source and return the URL and image locations alone.
Any help is appreciated. Thank you in advance!
Check out this answer, which shows you how to do it with jQuery or just plain JavaScript.
UPDATE:
If you have the ability to modify the HTML, then why don't you put in a DOM element that you can hook on to right after where the image will be inserted? Then you can use the following jQuery:
var linkDest = $('#Anchor').prev().attr('href');
var imgSrc = $('#Anchor').prev().children().attr('src');
Which you can see in this JSFiddle example