Convert XLF to HTML using Okapi

I have implemented a local service that converts multiple formats like HTML, DOCX, XLSX, TMX... to XLIFF. After performing a specific process on the generated XLF file, I want to convert it back to its original format. I use the Okapi libraries for this purpose, and everything works properly.
I would like to know if Okapi implements a mechanism to convert an XLF file back to its original file format, especially XLF to HTML (this format is mandatory for me).
Is there any suitable approach?
Thanks in advance

Yes, this is generally possible. Okapi calls it merging, and it requires that the source HTML (or other format) file is available in addition to the translated XLIFF.
A common method for doing this is to use a pair of Rainbow pipelines. The first ("extraction") pipeline looks like this:
Raw Document to Filter Events
[Other steps, such as segmentation, are optional here]
Rainbow Translation Kit Creation (select "Generic XLIFF" as the type)
This will generate a "translation kit" containing the source file, an extracted XLIFF, and some metadata in a file called manifest.rkm. You can then modify the XLIFF to perform the translation, etc. Then, use another pipeline to perform the merge:
Raw Document to Filter Events
Rainbow Translation Kit Merging
Sort of confusingly, the source file for this merge pipeline should be the manifest.rkm file for the translation kit, not the XLIFF or the source file. Okapi will parse the manifest and figure out where everything else is, then merge the translations from the XLIFF back into a new output copy of the HTML.
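For orientation, a generated kit looks roughly like this on disk (the directory names here are illustrative and can vary with the kit type and Okapi version, so check what your extraction pipeline actually produced):

manifest.rkm
original/
    test.html
work/
    test.html.xlf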
This process can fail if you do sufficiently gruesome things to the XLIFF that Okapi can't figure out how to map the translated segments back to the original document any more.
A quick-and-dirty way to do this same thing, without the kit, is to use the tikal command-line tool that is bundled with Okapi. First, use this to extract test.html to test.html.xlf:
tikal.sh -fc okf_html -x test.html
Then, merge the translated test.html.xlf to an output test.out.html:
tikal.sh -fc okf_html -m test.html.xlf
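If you need explicit language codes in the extracted XLIFF, tikal also takes source and target locale options; the flags below are from the bundled help, but confirm them with tikal.sh -h on your version:

tikal.sh -fc okf_html -sl en -tl fr -x test.html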

I do not understand your question: can you convert files back or not? I assume not, and that's what this answer is about.
The Okapi doc at http://www.opentag.com/okapi/wiki/index.php?title=Rainbow says:
There are filters for many formats, for example: OpenOffice, XML, HTML, Properties, DTD, MS Office, tables, etc.
To convert XLIFF files back to their original format you have to add the Filter Events to Raw Document Step to your command pipeline. There are two filter configurations available for HTML, and one for HTML 5.


How to use *.mbconfig files with mlnet CLI

I am looking to automate more of the auto-training that can be done via the Visual Studio GUI. The mlnet command line tool is useful, but doesn't allow specification of column types, and seems to default many of my numerical fields to "strings" rather than "single" when loading data from a CSV file (especially values such as '0.05663258').
Is there a way to pass a .mbconfig file to the mlnet command line tool (since these are just JSON files with a great deal more flexibility)? It looks like this might be a pending feature request, but the tool's documentation is a little inconsistent from source to source...
Alternatively, is there a way to specify column types (or default column types) in the CLI? I do see the command options to ignore columns, but nothing to control either default column datatypes, or datatypes for individual columns.
If you install the new ML.NET CLI (version 16.13 or later), it includes a train command that you use like this:
mlnet train --training-config <mbconfig-name>
Note that the training data that was used to generate the "mbconfig" file will also need to be in the same directory.
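For reference, the column-type information lives in the DataSource section of the .mbconfig JSON. The sketch below is illustrative rather than authoritative; the field names follow the files the Visual Studio tooling generates, and the schema can change between versions, so compare it against an .mbconfig produced by your own install:

{
  "DataSource": {
    "Type": "TabularFile",
    "FilePath": "data.csv",
    "Delimiter": ",",
    "HasHeader": true,
    "ColumnProperties": [
      {
        "ColumnName": "Rate",
        "ColumnPurpose": "Feature",
        "ColumnDataFormat": "Single"
      }
    ]
  }
}

Changing ColumnDataFormat from String to Single is exactly the kind of per-column override the plain CLI flags don't expose.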

Pentaho reports localization into unsupported languages

I am from Slovakia; I wouldn't be surprised if most of you haven't heard of it.
However, that causes me trouble when it comes to reports. We need to have 3 (soon 4) language versions of each report: Slovak is the main language, then Polish and English.
Since Pentaho supports neither Polish nor Slovak, keeping these localized is a real pain for me.
What I do is:
Create the report in Slovak
Write down all the phrases from the report
Send the phrases to one of our partners to translate
Create a copy of it in the pl or en directory
Open it in Report Designer and edit every phrase accordingly
Save it as another language version
As you can imagine, the process is very time-consuming and error-prone. Plus, every time I add a new parameter to a report or change its data source (which is a BeanShell script), I need to do it in 3 separate files. As a result, the language versions are usually out of date, way behind the main language version.
I have tried to automate it with OneSky and wrote a Python script that works in 2 stages (a shell sketch of the unpacking step follows below):
Stage 1 (extract and upload):
Change the *.prpt file suffix to *.zip
Extract phrases from the files ~/datadefinition.xml, ~/layout.xml, ~/styles.xml, ~/datasources/inline-ds.xml
Put those phrases into a *.po file
Export the *.po file to OneSky
Stage 2 (download and import):
Change the *.prpt file suffix to *.zip
Download the translated *.po file from OneSky
Run through the ~/datadefinition.xml, ~/layout.xml, ~/styles.xml, ~/datasources/inline-ds.xml files and replace the original phrases with the translated ones
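For illustration, the unpacking step common to both stages looks roughly like this (a .prpt file is just a ZIP archive, so the rename is purely for convenience; the file names are the ones listed above):

cp report.prpt report.zip
unzip -o report.zip datadefinition.xml layout.xml styles.xml datasources/inline-ds.xml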
While this approach works fine, it does not translate everything, and the process still has flaws. I need to go through it every time I make even the slightest change to a report's data source or fix a small mistake. Even if I just make a small fix in the SQL code, I need to do it in 3 files. That, of course, increases the chance of a mistake being made.
So, I was wondering: how are you guys solving the issue of translating your reports?
I will share the very simple method which we are following.
1) Create a properties file in key=value format for each language, holding the resource labels (the static values); see the sketch after this list.
2) Put it into the resources folder (report-designer/resources/).
3) Based on a parameter, you can specify which properties file to select, and you can put the keys into the value fields so the report knows which value to display in which language.
4) If you need to translate data coming from the database, you have to design a data warehouse and specify all the mappings; accordingly, it can fetch the data.
5) For converting dates, currency symbols, or number formats, you can use inbuilt functions which handle all these things; I am using MySQL, and MySQL has translation functions which can handle all of this.
It is difficult to explain the entire process here, but if you can get an idea from this, it may be useful to you.
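As a sketch of step 1, such a properties file is plain key=value pairs. The file name and keys below are made up for illustration:

# report-designer/resources/translations_en.properties
report.title=Monthly Sales Report
report.footer=Confidential

Parallel translations_sk.properties and translations_pl.properties files would carry the same keys with Slovak and Polish values, and the report references the keys instead of the literal text.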

XSLT to convert an XML element containing RTF data to HTML?

OK, so here's the background:
We have a third-party piece of software that does a lot of complicated stuff to generate an XML file from a lot of tables based on a wide array of business rules. The software allows you to apply an XSL transformation by supplying an XSLT file as part of its workflow, before continuing on in the process, which is usually an upload to one or more servers, based on more business rules.
Here's the problem:
One of the elements (with more on the way) this application is processing contains RTF text, and needs to be converted into formatted HTML before being uploaded. There are no means of transforming the XML inside the application other than through an XSLT file, and once we output the file, we cannot resume the workflow. My original thought was, "Easy! someone must have written a few XSL transforms for converting RTF to formatted HTML!" Hours of searching later, I must conclude I either suck at searching or it's awfully obscure.
Disclaimers:
I know the software is pretty darned limited; I'm stuck with it.
I know there are a lot of third-party tools to do this; they are not available to me because I would need to run them externally.
I know that this is not a pretty or efficient thing to do with XSLT. Changing that is not an option for me at this point.
If I cannot find a means to do this through pure XSL transforms, I will need to output the files locally, run the extra process, and take the destination routing on through a custom process. I really don't want to do that.
Does anyone have access to an XSL transformation function/scheme that will allow me to do this natively in the application? Perhaps a series of regular expressions I could use, or something?
So it turns out that external scripts can be invoked from the XSLT. It seems I will be using another scripting language to get this to work. I'm a little bummed there was no other answer available.
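For anyone who lands here later, this is roughly what the extension-script route looks like under an MSXML/.NET processor. The msxsl:script element is processor-specific and must be enabled by the host application, and the element name Description and the function body are placeholders, so treat this as a sketch rather than a drop-in solution:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:user="urn:my-scripts">
  <msxsl:script language="JScript" implements-prefix="user">
    function rtfToHtml(rtf) {
      // Placeholder: the actual RTF-to-HTML conversion logic goes here.
      return rtf;
    }
  </msxsl:script>
  <xsl:template match="Description">
    <xsl:value-of select="user:rtfToHtml(string(.))"
        disable-output-escaping="yes"/>
  </xsl:template>
</xsl:stylesheet>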

C++ Logger - Should I use an ordinary XML parser?

I'm working on a logging system for my 2D engine, and I'm confused about how I should go about creating/editing the file, and how I should output that file.
I've learned that XML is more of a data carrier than a data displayer like HTML. I've read that I can use XML-to-HTML converters. One method I've thought about is simply writing the log to a file as HTML markup.
Clarity on these matters is what I ask of you, stack overflow.
Creating an XML (or HTML) file doesn't need any special library. Straightforward string concatenation is usually good enough; you may have to encode some special characters (e.g. > into &gt;).
But as Owen says, plain text is a lot more common for log files. One reasonable compromise is comma-separated values in a text file; this gives you a little bit of structure without much overhead. For example, the Windows web server (IIS) uses this format by default, and if some fields, such as a timestamp or a source filename and line number, are output for each line, this makes it easy to separate them out again.
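For example, a comma-separated log line carrying a timestamp, severity, source file, line number, and message might look like this (purely illustrative):

2013-04-02 10:32:07,WARN,renderer.cpp,214,texture atlas is full

If the message field can itself contain commas, quote or escape it so the line can still be split reliably.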
Just about every log I've ever worked with has been pure text delimited by newlines. If you're going to depart from that, you may want to ask yourself what it is about your logging needs that you want to accomplish with markup.
If you must go the way of markup, I would suggest an XML format that contains a minimal set of markup that would be useful in your situation. You could use XML to capture structure in your log entries (timestamp, severity, and operational code, for example) that would be inconvenient to code for in HTML.
Note that you could also go hybrid and embed some XHTML tags in an XML element whose purpose is to capture displayable text, if you want.
The problem with XML or HTML files is that you cannot simply append to them at any time: you have to close the final (document) tag properly at the end of writing.
Therefore, it's not a popular format for logging.
For logging, I suggest using one of the existing log engines, such as the Apache logger or John Torjo's Boost.Log candidate. They support log levels, runtime configuration, etc.
If you are considering writing logs in XML files, please, stop.
Log files should be simple plain text files; XML-izing them introduces needless complexity. Logs are not structured data; they are meant to be read by people, not by automated tools.
It all starts with XML logs, and then it goes downhill from there.

Combining two PDF files in C++

In C++ I'm generating a PDF report with libHaru. I'm looking for some way to append two pages from an existing PDF file to the end of my report. Is there any free way to do that?
Thanks.
Try PoDoFo
http://podofo.sourceforge.net/
You should be able to open both of the PDFs as PdfMemDocuments using PdfMemDocument.Load( filename ).
Then, acquire references to the two pages you want to copy and add them to the end of the document using InsertPages; or, optionally, remove all but the last two pages of the source document, then call PdfMemDocument.Append and pass that trimmed document. Hard to say which would be faster or more stable.
Hope that helps,
Troy
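A minimal sketch of the first approach, assuming the PoDoFo 0.9.x API; the file names are placeholders, and the exact signatures should be checked against your PoDoFo version:

#include <podofo/podofo.h>

int main()
{
    // Load the libHaru-generated report and the existing PDF.
    PoDoFo::PdfMemDocument report;
    report.Load("report.pdf");

    PoDoFo::PdfMemDocument extra;
    extra.Load("existing.pdf");

    // Copy two pages from "extra", starting at page index 0,
    // onto the end of "report".
    report.InsertPages(extra, 0, 2);

    report.Write("combined.pdf");
    return 0;
}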
You can use the Ghostscript utility pdf2ps to convert the PDF files to PostScript, append the PostScript files, and then convert them back to a PDF using ps2pdf.
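That round trip looks like this (file names are placeholders). Note that naively concatenating PostScript files does not always yield valid output; if that happens, Ghostscript can also merge the PDFs directly with its pdfwrite device, as shown in the last line:

pdf2ps report.pdf report.ps
pdf2ps extra.pdf extra.ps
cat report.ps extra.ps > merged.ps
ps2pdf merged.ps merged.pdf

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf report.pdf extra.pdf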