Transposing rows manually - Kettle

I'm using the denormaliser step. In the Targetfieldname column of the step I need to enter 8000 rows, and it is not practical to do that manually. Can someone help me with some automation?

Look at the format of the .ktr (Kettle transformation) files. They are pure XML, so you should be able to generate the XML for the fields you need.
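
As a rough sketch of that approach, here is a small Python script that generates the <field> entries for the denormaliser step from a list of target field names. The tag names in the template and the file names are assumptions for illustration; copy the exact tags from a .ktr you have already saved with one or two fields configured, then paste the generated block into the step's <fields> section.

# Generate the <field> XML entries for a Kettle denormaliser step.
# NOTE: the tag names below are illustrative - copy the real ones from a saved .ktr.
from xml.sax.saxutils import escape

TEMPLATE = """  <field>
    <field_name>{name}</field_name>
    <key_value>{name}</key_value>
    <target_name>{name}</target_name>
    <target_type>String</target_type>
  </field>"""

# target_fields.txt (hypothetical): one target field name per line, 8000 lines.
with open("target_fields.txt", encoding="utf-8") as src:
    names = [line.strip() for line in src if line.strip()]

with open("denormaliser_fields.xml", "w", encoding="utf-8") as out:
    out.write("\n".join(TEMPLATE.format(name=escape(n)) for n in names))

You can then open the .ktr in a text editor and replace the step's <fields> section with the contents of denormaliser_fields.xml.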


How to Generalize an Informatica Mapping & Workflow?

I need to read a text file (JSON format) and load it into a database table in Informatica Developer. For only one text file and one database table, that is easy.
But now I have N different text files, hence N different database tables and their corresponding data processor transformations. The transformation logic inside the mappings is the same. Rather than creating N sets of mappings and workflows, one per text file, is it possible to create just one generalized mapping and workflow that caters for all the text files? I would appreciate it if anyone could give me a general direction to explore further.

Trying to aggregate data from multiple files into two distinct tables

I just got started with Power BI and I am generating two report files every month from ServiceNow:
the SLA report and the Incident report. These files are named INC_MM_YY.xls or SLA_MM_YY.xls.
I am trying to add the previous months' files without having to add new data sources or edit the queries. It seems to be possible using the M language in the Advanced Editor, but that looks complicated since I have zero experience with Power Query M.
Are there other ways?
Or, in the case above, I can retrieve the folder data as a table and iterate over the files. But how do I do that in M?
Thank you.
EDIT: Just to make it clearer, let's look at the table generated by the folder source.
For each row we have the name of the file and its path.
So in pseudocode it should be something like:
for (each row as n) {
    if (n.folderpath ends with "sla") {
        tablesla += load source n."folderpath" && n."filename"
    }
    else {
        tableincident += load source n."folderpath" && n."filename"
    }
}
It just doesn't seem practical in Power Query :/ I could find how to make something similar to a for loop, but it is very confusing.
I figured it out.
You can actually create two different sources, one for the SLA folder and another for the Incident folder. After combining and transforming the data from one of the folders, still in the Query Editor, you just click New Source and the other folder's data will be combined into a different table.
With that you have two distinct tables, and any time you put a new file in one of the folders and hit Refresh, the data will be added to the correct table.
Thank you guys.
Try the load-from-folder option: you can place each month's files into its own folder, one for the SLAs and one for the Incidents. With load from folder, it will go through each file and load it. So the next month, you add in November's data, refresh the dataset(s) and it will be added automatically.
The files need to have the same structure for it to work effectively, and it will load what it sees in the folder, so if you remove a file, Power BI will not retain it in the workbook; it only loads what it can see.
Other examples
https://powerbi.tips/2016/06/loading-data-from-folder/
https://insightsoftware.com/blog/power-bi-load-data-from-folder/
Hope that helps

Is it possible to validate the column order when uploading data from flat files using the AWS COPY command

I'm uploading data from zipped flat files to Redshift using the COPY command. Is there any way to validate that the column order in the files is correct? (For example, if all fields are varchar, the data could be loaded into the wrong columns without any error.)
The COPY command documentation shows that you can specify the column order, but not for flat files. I was wondering if there are any other approaches that would let me check how the columns have been supplied (for example, loading only the header row into a dummy table to check, but that doesn't seem to be possible).
You can't really do this inside Redshift. COPY doesn't provide any options to only load a specific number of rows or perform any validation.
Your best option would be to do this in the tool where you schedule the loads. You can get the first line from a compressed file easily enough (zcat < file.z|head -1) but for a file on S3 you may have to download the whole thing first.
FWIW, the process generating the load file should be fully automated in such a way that the column order can't change. If these files are being manually prepared you're asking for all sorts of trouble.
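
If it helps, here is a minimal Python sketch of that pre-load check, meant to run in the scheduling tool before the COPY is issued. It assumes the file has already been fetched locally, is gzip-compressed, comma-delimited, and has a header row; the file and column names are made up.

import gzip

# Expected column order, taken from the target Redshift table definition (illustrative names).
EXPECTED_HEADER = ["order_id", "customer_id", "order_date", "status"]

def header_matches(path, delimiter=","):
    # Read only the first line of the gzipped flat file and compare it to the expected order.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        header = [col.strip().lower() for col in f.readline().rstrip("\n").split(delimiter)]
    return header == [col.lower() for col in EXPECTED_HEADER]

if not header_matches("orders_2024_01.csv.gz"):   # hypothetical file name
    raise SystemExit("Column order mismatch - aborting COPY")
# ...otherwise issue the COPY as usual.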

Using RegEx in SSIS

I currently have a package pulling data from an Excel file, but when pulling the data out I get rows I do not want. So I need to extract every row whose 'ID' field has any sort of letter in it.
I need to be able to run a RegEx pattern such as "%[a-zA-Z]%" to pull out that data, but the limitations of the Conditional Split won't let me do that. Any ideas on how this can be done?
At the core of the logic, you would use a Script Transformation as that's the only place you can access the regex.
You could simply add a second column to your data flow, IDCleaned, and that column would only contain cleaned values or a NULL. You could then use the Conditional Split to filter good rows vs. bad (see the related question "System.Text.RegularExpressions.Regex.Replace error in C# for SSIS").
If you don't want to add another column, you can set your current ID column to ReadWrite for the Script and then update it in place. Perhaps adding a Boolean column would make the Conditional Split logic easier at that point.
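
The Script Transformation itself would be written in C# or VB, but the per-row check is just a regex test. As a language-neutral sketch of that logic (shown in Python here, with made-up rows), where IDs containing any letter go to the "bad" output:

import re

HAS_LETTER = re.compile(r"[a-zA-Z]")   # the intent behind the "%[a-zA-Z]%" pattern

def is_good_id(raw_id):
    # True when the ID contains no letters; these rows would go to the "good" output.
    return HAS_LETTER.search(raw_id or "") is None

rows = [{"ID": "12345"}, {"ID": "AB123"}, {"ID": "9X87"}]   # illustrative rows
good = [r for r in rows if is_good_id(r["ID"])]
bad = [r for r in rows if not is_good_id(r["ID"])]
print(good)   # [{'ID': '12345'}]
print(bad)    # [{'ID': 'AB123'}, {'ID': '9X87'}]

In the actual Script Transformation you would set a Boolean output column per row with the same test and split on it in the Conditional Split.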

Add new attribute calculated based on other attributes

I'm starting with WEKA and want to achieve the following.
I have file with 2 attributes: user_id, user_age.
I can successfully load data using WEKA API and get Instances object.
Now I want to calculate a new attribute, user_age_range - e.g. (0-18) -> 0, (19-25) -> 1, etc.
Is there a way to calculate this attribute using WEKA Filters?
Also, I would prefer not to iterate manually through all instances, but to define a method that operates on a single Instance and use some filter (or other abstraction) that will apply the corresponding "transformation" to all instances.
Please advise on how I could achieve this.
Thanks in advance.
After looking through the docs I found a couple of filters that you could use together to achieve what you want.
http://weka.sourceforge.net/doc.dev/weka/filters/unsupervised/attribute/Copy.html
Use Copy to create a copy of the attribute that you will then transform.
http://weka.sourceforge.net/doc.dev/weka/filters/unsupervised/attribute/NumericTransform.html
NumericTransform takes a class and a method option; you could write your own class with a method that buckets the ages into the ranges you want and supply that class and method as the options.
Hope this helps
Using a CSV file you can do that in Excel.
If you are using ARFF files, convert them to CSV; then you can add the columns you want, depending on the number of new attributes, and build whatever calculation you need from one or more attributes in the first row. Extend that to all rows and it's done.
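
If you would rather script that CSV step than do it in Excel, here is a small Python sketch that adds a user_age_range column before the file is converted back for WEKA. The file names and the exact age bins are assumptions; adjust them to your data.

import csv

def age_range(age):
    # Illustrative bins: 0-18 -> 0, 19-25 -> 1, 26+ -> 2.
    if age <= 18:
        return 0
    if age <= 25:
        return 1
    return 2

# users.csv (hypothetical): columns user_id, user_age.
with open("users.csv", newline="", encoding="utf-8") as src, \
     open("users_with_range.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["user_age_range"])
    writer.writeheader()
    for row in reader:
        row["user_age_range"] = age_range(int(row["user_age"]))
        writer.writerow(row)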