XSLT Transformation requires counts - xslt

Hello, I am transforming a CSV file using XSLT to pull training records out by employee.
What I now need to do is also pull the footer at the bottom of the CSV file, which holds the total record count, and count each record that is transformed, so I can compare the two in the system I am importing these into.
This is what the source file looks like:
TrainingRecord,,SP Training,,geoff.culbertson,,Trained,,IT
TrainingRecord,,SP Training,,jim.schultz,,Trained,,IT
RecordCount|2
So I need to transform the record count at the end of the file, and also count the records myself (in this example it would be 2) and output that count, so I can compare the two.

It seems very strange to use XSLT with a CSV as the input, but to answer your question: based on the XSLT you've provided, I believe you could obtain the row count with:
<xsl:value-of select="count(//Line)" />
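The `count(//Line)` expression assumes the asker's stylesheet has already turned each CSV row into a `Line` element. That stylesheet is not shown here. If the comparison happens on the importing side instead, the check itself is simple. The following is a minimal Python sketch. It assumes that record rows start with `TrainingRecord,` and that the footer has the form `RecordCount|N`. The function name `count_and_check` is hypothetical.

```python
from typing import Optional, Tuple


def count_and_check(csv_text: str) -> Tuple[int, Optional[int], bool]:
    """Count TrainingRecord rows and compare them with the RecordCount|N footer.

    Returns (records counted, count stated in the footer or None, whether they match).
    """
    counted = 0
    footer: Optional[int] = None
    for line in csv_text.splitlines():
        line = line.strip()
        if line.startswith("RecordCount|"):
            # Footer line: the expected total follows the pipe character
            footer = int(line.split("|", 1)[1])
        elif line.startswith("TrainingRecord,"):
            counted += 1
    # A missing footer (None) never matches, so the import can be rejected
    return counted, footer, footer == counted
```

For the sample file above, this returns `(2, 2, True)`.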

Related

Expressions in Data Integrator tool on Informatica Cloud

I use the Data Integrator tool on IICS, and I have a CSV file as a source. I need to change the data type of every single column, because they all become nvarchar when read from the file. I have made an Expression transformation and use the To_Decimal function in each expression, but I find it very time-consuming and boring to create about 100 expressions. This was easier and quicker to do in PowerCenter. Is there a smarter and quicker way to do this in IICS?
Br,
Ø
This is where reusability plays a vital role.
Create a reusable Expression transformation that takes an input and converts it to decimal with To_Decimal(). Create 10 generic input ports and 10 generic output ports; one pair is shown below. Just copy and paste the pair 10 times, and make sure each formula references the right input column.
in_col1 (string (150))
...
out_col1 (decimal(22,7)) = To_Decimal(LTRIM(RTRIM(in_col1)), 7)
Then add 10 instances of it to your mapping (10 instances with 10 ports each cover about 100 columns). Please note that I used LTRIM/RTRIM to remove spaces.
You can do the same for date columns, or to trim spaces from strings.
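The core idea of the answer is to write the trim-and-convert logic once and apply it to every column. Outside IICS, that idea can be sketched in Python as follows. This is not Informatica code. The function names are hypothetical, the scale of 7 mirrors `decimal(22,7)` above, and returning `None` for blank or non-numeric input is a choice made for this sketch, not a description of how `To_Decimal` behaves.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Iterable, Optional


def to_decimal(value: Optional[str], scale: int = 7) -> Optional[Decimal]:
    """Trim spaces and convert to a Decimal with `scale` fractional digits.

    Returns None for missing, blank, or non-numeric input (a choice of this sketch).
    """
    if value is None or not value.strip():
        return None
    try:
        # Decimal(1).scaleb(-7) is 1E-7, so quantize keeps exactly 7 fractional digits
        return Decimal(value.strip()).quantize(Decimal(1).scaleb(-scale))
    except InvalidOperation:
        return None


def convert_columns(row: Dict[str, str], columns: Iterable[str]) -> Dict[str, object]:
    """Apply the same conversion to every listed column, like one reusable expression."""
    out: Dict[str, object] = dict(row)
    for col in columns:
        out[col] = to_decimal(row.get(col), scale=7)
    return out
```

Listing the columns once and looping over them plays the same role as reusing one transformation: a change to the conversion rule is made in one place only.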

PDI - Multiple file input based on date in filename

I'm working on a project using Kettle (PDI).
I have to read multiple .csv or .xls files and insert them into a DB.
The file names have the form AAMMDDBBBB, where AA is the city code and BBBB is the shop code. MMDD is the date in MM-DD format. For example: LA0326F5CA.csv.
The regexps I use in the input file steps look like LA.*\.csv or DT.*\.xls, which return all files to insert into the DB.
Can you tell me how to select only the files for yesterday (based on the MMDD in the file name)?
As you need some "complex" logic in your selection, you cannot filter based only on a regexp. I suggest you first read all filenames, then filter the filenames based on their "age", then read the files based on the selected filenames.
In detail:
Use the Get File Names step with the same regexp you currently use (LA.*\.csv or DT.*\.xls). You may be more restrictive at that stage with a regexp like LA\d\d\d\d....\.csv, to ensure that MM and DD are numbers and that BBBB is exactly 4 characters.
Filter based on the date. You can do this with a Java Filter, but it would be an order of magnitude easier to use a JavaScript step (Modified Java Script Value) to compute the "age" of your file, and then a Filter rows step to keep only yesterday's files.
To compute the age of the file, extract MM and DD, for example like this (other methods are available):
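// Extract MM and DD from a name like LA0326F5CA.csv
var regexp = filename.match(/..(\d\d)(\d\d).*/);
var age = null;
if (regexp) {
  // JavaScript months are 0-based, so subtract 1 from MM; the year is taken from today
  var fileDate = new Date(new Date().getFullYear(), parseInt(regexp[1], 10) - 1, parseInt(regexp[2], 10));
  // Difference in milliseconds, converted to days
  age = (new Date() - fileDate) / 1000 / 60 / 60 / 24;
}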
If you are not familiar with JavaScript regexps: match tests the filename against the regexp and keeps the values captured by the parentheses in an array. If the test succeeds (which you must check explicitly to avoid a runtime failure), use the captured values to compute the corresponding date, and subtract that date from today's date to get the age. The age is in milliseconds, which is then converted to days. Because the current time of day is included, a file dated yesterday has an age between 1 and 2 days.
Use the Text File Input and Excel Input steps with the option to accept filenames from a previous step. Note that CSV Input does not have this option, but the more powerful Text File Input does.
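The same select-by-date logic can be sketched outside PDI to check which files should pass the filter. This Python sketch is not PDI code, and the names `NAME_RE`, `file_date` and `files_for_yesterday` are hypothetical. It assumes the naming scheme from the question: a 2-letter city code, MMDD, a 4-character shop code, and `.csv` or `.xls`. Unlike the JavaScript above, it compares calendar dates rather than a fractional age in days. It also handles the year boundary: a file named `...1231...` read on January 1 is dated December 31 of the previous year.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List, Optional

# Assumed naming scheme: 2-letter city code, MMDD, 4-character shop code, .csv or .xls
NAME_RE = re.compile(r"^([A-Z]{2})(\d{2})(\d{2})(.{4})\.(csv|xls)$")


def file_date(name: str, today: date) -> Optional[date]:
    """Return the calendar date encoded in the MMDD part of the name, or None."""
    m = NAME_RE.match(name)
    if not m:
        return None
    month, day = int(m.group(2)), int(m.group(3))
    # The name carries no year: take the most recent MMDD that is not after today
    for year in (today.year, today.year - 1):
        try:
            d = date(year, month, day)
        except ValueError:  # e.g. month 13, or Feb 29 in a non-leap year
            continue
        if d <= today:
            return d
    return None


def files_for_yesterday(names: Iterable[str], today: date) -> List[str]:
    """Keep only the file names whose MMDD is exactly yesterday."""
    yesterday = today - timedelta(days=1)
    return [n for n in names if file_date(n, today) == yesterday]
```

For example, `files_for_yesterday(["LA0326F5CA.csv", "LA0325F5CA.csv", "DT0326AB12.xls", "notes.txt"], date(2018, 3, 27))` keeps `LA0326F5CA.csv` and `DT0326AB12.xls`.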
Well, I replaced the Java Filter with Modified Java Script Value and it works fine now.
Another question: how can I increase the performance and speed of my current transformation (I now have 2 transformations for 2 cities)? My Insert/Update step makes the transformation too slow: it needs almost 1 hour 30 minutes to process 500k rows of data with a lot of fields (300 MB). And this is not all my data: if it works faster and my company decides to use it, I'm going to run it on 10 TB of data per year, with a lot of transformations and rows. I need suggestions about it.

xml format for spreadsheet

I have a scientific C++ project which needs to report some information, such as the performance in each iteration, the time step size in each iteration, some other specific values per iteration, and totals. It is certainly possible to dump this information to standard output and redirect it to a file; that is how it works now. But it would be nice to dump it to an XML file instead. First, it is human readable. Second, if the format can be imported by OpenOffice or LibreOffice, the data would be displayed in a nice table view with computed max, min, average and some graphics. Is there a format for that?
OpenDocument Format (ISO/IEC 26300:2006) and Office Open XML (ISO/IEC 29500) are standard XML based file formats that support spreadsheets.
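OpenDocument also defines a single-file XML form (a "flat" spreadsheet, commonly saved as `.fods`) with an `office:document` root element. A program can write this form directly, without building a zipped package. The following sketch is in Python rather than C++, and the helper names are hypothetical. It writes one row per iteration, then MAX/MIN/AVERAGE rows using OpenFormula syntax (`of:=MAX([.B2:.B4])`). This is an unverified illustration: whether formula cells without a cached value are recalculated on load depends on the application's settings.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence, Union

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
for prefix, uri in (("office", OFFICE), ("table", TABLE), ("text", TEXT)):
    ET.register_namespace(prefix, uri)


def q(ns: str, name: str) -> str:
    """Build an ElementTree qualified name like {uri}name."""
    return f"{{{ns}}}{name}"


def add_cell(row: ET.Element, value: Union[str, float, None] = None,
             formula: Optional[str] = None) -> None:
    cell = ET.SubElement(row, q(TABLE, "table-cell"))
    if formula is not None:
        cell.set(q(TABLE, "formula"), formula)
        cell.set(q(OFFICE, "value-type"), "float")
        return
    if isinstance(value, (int, float)):
        cell.set(q(OFFICE, "value-type"), "float")
        cell.set(q(OFFICE, "value"), str(value))
    else:
        cell.set(q(OFFICE, "value-type"), "string")
    ET.SubElement(cell, q(TEXT, "p")).text = str(value)


def build_fods(header: Sequence[str], rows: Sequence[Sequence[float]]) -> str:
    """Return a flat ODF spreadsheet: header, data rows, then MAX/MIN/AVERAGE rows."""
    doc = ET.Element(q(OFFICE, "document"), {
        q(OFFICE, "version"): "1.2",
        q(OFFICE, "mimetype"): "application/vnd.oasis.opendocument.spreadsheet",
    })
    sheet = ET.SubElement(ET.SubElement(doc, q(OFFICE, "body")), q(OFFICE, "spreadsheet"))
    table = ET.SubElement(sheet, q(TABLE, "table"), {q(TABLE, "name"): "Iterations"})
    for values in [list(header)] + [list(r) for r in rows]:
        row = ET.SubElement(table, q(TABLE, "table-row"))
        for v in values:
            add_cell(row, v)
    last = len(rows) + 1  # data occupies spreadsheet rows 2..last
    for func in ("MAX", "MIN", "AVERAGE"):
        row = ET.SubElement(table, q(TABLE, "table-row"))
        add_cell(row, func.lower())
        for i in range(1, len(header)):
            col = chr(ord("A") + i)  # assumes at most 26 columns
            add_cell(row, formula=f"of:={func}([.{col}2:.{col}{last}])")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(doc, encoding="unicode")
```

Writing the result of `build_fods(["iteration", "dt"], [[1, 0.5], [2, 0.25], [3, 0.125]])` to a `.fods` file gives a single human-readable XML file, which keeps the first advantage mentioned in the question.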

XSL-T: How to get the length of XML tree in characters other than "string-length(serialize(.))"?

Good afternoon!
The question: How to get the length of XML tree in characters using XSL-T or XPath in Saxon?
The goal: I would like to transform XML into a "large" CSV and a "small" CSV based on the size of the second-level elements (/root/secondLevelElement). The size is expressed in number of characters. Edit: my whole effort is about ETL (extract, transform, load) of XML into an SQL database under a huge, continuous, parallel load, in the following way: Application Server -> extract to XML file -> transform from XML file to CSV file using XSL-T -> import into database. One XML file will contain from 20,000 to 50,000 secondLevelElements, depending on the script configuration. Each secondLevelElement can be from 5 to 15+ element levels deep. The last column of the CSV will be the full secondLevelElement XML, ready to be imported as VARCHAR2(4000) or CLOB, while the previous columns will be metadata extracted by XPath from the secondLevelElement. Since the character length during the import into the database is crucial, I need to know the EXACT length of each full secondLevelElement XML.
The problem: I have found the following solution using XSL-T 3.0 functions "string-length(serialize(.))"
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">
  <xsl:output method="text"/>
  <xsl:template match="/root/secondLevelElement">
    <xsl:value-of select="string-length(serialize(.))"/>
  </xsl:template>
</xsl:stylesheet>
But it seems quite slow for large XML files. Is there a faster solution, such as a Saxon extension in Saxon PE or EE?
Thank you in advance for your tips. Stepan
If by "length of XML tree" (a strange concept: trees have height and breadth, but not length) you do actually mean the number of characters in the serialized output, then a pretty close approximation will be something like
sum(descendant-or-self::*/(string-length(name())*2 + 5))
+ sum(.//@*/(string-length(name()) + string-length(.) + 4))
+ sum(.//text()/string-length())
Computing that should be a fair bit faster than actually serializing.
It doesn't allow for empty element tags, namespace declarations, comments, or processing instructions, but it's not clear how accurate you need to be.
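The estimate can be checked against a real serializer. The following Python ElementTree sketch applies the same per-node arithmetic: 2×len(name)+5 for each element's start and end tags, len(name)+len(value)+4 for each attribute (leading space, `=`, two quotes), plus all text. It carries the same assumptions as the XPath: no empty-element tags, namespaces, comments, or characters that need escaping. The function name is hypothetical. Like the XPath, it makes a single pass over the tree and never builds the serialized string.

```python
import xml.etree.ElementTree as ET


def approx_serialized_length(elem: ET.Element) -> int:
    """Estimate len(serialize(elem)) without serializing it."""
    total = 0
    for e in elem.iter():  # elem itself and all of its descendant elements
        total += 2 * len(e.tag) + 5  # "<name>" plus "</name>"
        for name, value in e.attrib.items():
            total += len(name) + len(value) + 4  # ' name="value"'
        total += len(e.text or "")
        if e is not elem:
            # Text following a child element is still inside elem
            total += len(e.tail or "")
    return total
```

For `<a x="1"><b>hi</b><c y="22">yo</c></a>`, both the estimate and `len(ET.tostring(root, encoding="unicode"))` give 38. An empty element such as `<d/>` is where the two start to diverge.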
Because Saxon HE 9.6.n.n for Java (released 2014-10-02) supports XPath 3.0, and XPath 3.0 contains the functions serialize() and string-length(), my final choice for now is string-length(serialize(myElement)).

Convert text to elements in XML using XSLT

I'm currently migrating XML from one CMS to another and need to convert some text to elements. Because of how the system works, some editors can only enter escaped text. The challenge is to replace some of these escaped elements and convert them into valid XML elements.
Source file:
<p>Press the <button-name>Select key </button-name>to show more information.</p>
<p>Press the <button-name>Back key</button-name> to save the
values.</p>
<p>When the storage is completed, the <product-name/> machine
displays:</p>
<p><attention>
<display-text translate="no">STORAGE COMPLETED
Press BACK to exit</display-text>
</attention></p>
What I want to do
Replace <button-name> with <gui>
Replace <product-name/> with <kt.in name="custom-name"/>
Keep all other escaped elements as they are.
XML I want
<p>Press the <gui>Select key</gui>to
show more information.</p>
<p>Press the <gui>Back key</gui>
to save the calibrations values.</p>
<p>When the storage is completed, the <kt.in name="custom-name"/> machine
displays:</p>
<p><attention> <display-text translate="no">STORAGE COMPLETED
Press BACK to exit</display-text>
</attention></p>
I tried a string-based search-and-replace, but since I want a proper XML element as output, that didn't work.
This is probably only going to work with a string-based search-and-replace, depending on how many text "tags" you want to switch to XML. The bigger problem I see is actually keeping it all as a proper XML element.
I don't think you can do this without writing a small tool that reads the strings between the escaped tags of the text, e.g.
<button-name>
and copies them into the right fields of an object, which you then serialize back as a well-formed XML element.
It doesn't really matter which language you prefer, since there are plenty of object-XML mapping libraries available.
For just changing the tags, you could also unescape the text, so that &lt; turns into <, and then filter the content between < and > to exchange the names you want, e.g. button-name to gui.
Hope this gives you an idea.
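The unescape-and-rename idea can be sketched in Python. This sketch assumes the exported text contains the escaped tags literally as `&lt;button-name&gt;…&lt;/button-name&gt;` and `&lt;product-name/&gt;` inside element content, not inside attribute values. The function name is hypothetical. Only the two listed tags are turned into real elements; every other escaped tag stays escaped and is therefore still plain text. Parsing the result afterwards confirms that it is well-formed XML.

```python
import re
import xml.etree.ElementTree as ET


def unescape_selected(fragment: str) -> str:
    """Turn selected escaped tags into real elements; leave other escaped tags alone."""
    # &lt;button-name&gt;...&lt;/button-name&gt;  ->  <gui>...</gui>
    fragment = re.sub(r"&lt;button-name&gt;(.*?)&lt;/button-name&gt;",
                      r"<gui>\1</gui>", fragment, flags=re.DOTALL)
    # &lt;product-name/&gt;  ->  <kt.in name="custom-name"/>
    fragment = fragment.replace("&lt;product-name/&gt;", '<kt.in name="custom-name"/>')
    return fragment


src = ("<p>Press the &lt;button-name&gt;Select key &lt;/button-name&gt;to show more "
       "information. The &lt;product-name/&gt; machine shows &lt;b&gt;x&lt;/b&gt;</p>")
result = unescape_selected(src)
ET.fromstring(result)  # raises ParseError if the result is not well-formed
```

A regular expression works here because the escaped tags are flat text patterns. Nested escaped tags with the same name would need a real parser, as the answer suggests.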