Pentaho kettle denormalizer or concat

Pentaho kettle denormalizer or concat - kettle

I have one step that I need to concat x rows into one separated by coma or similar. I have this data on my final step:
I need to have the "tarifas" row in the same column with the 5 diferent "tarifas" for the same "REFERENCIA" : example:
tarifas:
T00:6.9000,T01:7.9000,T02:8.9000,T03:9.9000,T04:10.9000,T05:11.9000
I saw denormalizer step but can't get a good result.

Have you tried the Group By step?
It has an operation called Concatenate string separated by , which is exactly what you want here. For your other non-group fields use First non-null value as the operation.
Make sure your data is sorted on the group field and then the tarifas field.

Related

UNIQUE formula in Google Sheets for multiple ranges

I have a list of participants in column A. A full employee list in column B. I want to get the list of non-participants in column C. Basically 'B-A' but in list form.
'January' is the participants list:

try:
=FILTER(A:A; NOT(COUNTIF(B:B; A:A)))

It is always an added challenge to write formulas when we don't have access to actual date. But based on what I can see, try this formula in the top cell of any empty column:
=ArrayFormula({"My Header"; FILTER(R2:R,ISERROR(VLOOKUP(TRIM(R2:R),TRIM(T2:T),1,FALSE)))})
You can change "My Header" to something meaningful.
The next part means "FILTER in anything in the range R2:R that cannot be found [i.e., ISERROR(VLOOKUP(...))] in T2:T."
TRIM is used just to account for any accidental/stray spaces that may occur in either list, since that would result in no match if one or the other had extra space.
If this does not do what you expect, please share a link to a sample spreadsheet.

Sum values in one column of all rows which match one or more of multiple criteria

I have some table data in which I'd like to sum all the values in a specific column of all rows where column A contains string A and/or column B contains string B. How can I achieve this?
This works for one criterium:
=SUM(FILTER(G:G,REGEXMATCH(F:F,"stringA")))
I tried this, but it didn't work:
=SUM(FILTER(G:G,OR(ISTEXT(REGEXMATCH(F:F,"stringA")),ISTEXT(REGEXMATCH(C:C,"stringB")))))

Please try:
=SUM(FILTER(G:G,REGEXMATCH(F:F,"stringA")+REGEXMATCH(C:C,"stringB")))
+ works for or logic. ISTEXT is not needed because REGEXMATCH gives true or false.
OR does not work because filter is an arrayformula, use + in array formulas.

=SUM(FILTER(G:G,REGEXMATCH(F:F&C:C,"stringA|stringB")))
OR is denoted by |
EDIT Added &C:C to denote different Columns

How to search multiple strings in a string?

I want to check in a powerquery new column if a string like "This is a test string" contains any of the strings list items {"dog","string","bark"}.
I already tried Text.PositionOfAny("This is a test string",{"dog","string","bark"}), but the function only accepts single-character values
Expression.Error: The value isn't a single-character string.
Any solution for this?

This is a case where you'll want to combine a few M library functions together.
You'll want to use Text.Contains many times against a list, which is a good case for List.Transform. List.AnyTrue will tell you if any string matched.
List.AnyTrue(List.Transform({"dog","string","bark"}, (substring) => Text.Contains("This is a test string", substring)))
If you wished that there was a Text.ContainsAny function, you can write it!
let
Text.ContainsAny = (string as text, list as list) as logical =>
List.AnyTrue(List.Transform(list, (substring) => Text.Contains(string, substring))),
Invoked = Text.ContainsAny("This is a test string", {"dog","string","bark"})
in
Invoked

Another simple solution is this:
List.ContainsAny(Text.SplitAny("This is a test string", " "), {"dog","string","bark"})
It transforms the text into a list because there we find a function that does what you need.

If it's a specific (static) list of matches, you'll want to add a custom column with an if then else statement in PQ. Then use a filter on that column to keep or remove the columns. AFAIK PQ doesn't support regex so Alexey's solution won't work.
If you need the lookup to be dynamic, it gets more complicated... but doable you essentially need to
have an ID column for the original row.
duplicate the query so you have two queries, then in the newly created query
split the text field into separate columns, usually by space
unpivot the newly created columns.
get the list of intended names
use list.generate method to generate a list that shows 1 if there's a match and 0 if there isn't.
sum the values of the list
if sum > 0 then mark that row as a match, usually I use the value 1 in a new column. Then you can filter the table to keep only rows with value 1 in the new column. Then group this table on ID - this is the list of ID that contain the match. Now use the merge feature to merge in the first table ensuring you keep only rows that match the IDs. That should get you to where you want to be.

Thanks for giving me the lead. In my own case I needed to ensure two items exist in a string hence I replaced formula as:
List.AllTrue(List.Transform({"/","2017"},(substring) => Text.Contains("4/6/2017 13",substring)))
it returned true perfectly.

You can use regex here with logical OR - | expression :
/dog|string|bark/.test("This is a test string") // retruns true

Searching for unmatched ntheames when comparing spreadsheets

In one spreadsheet I have 3 columns with a first and last name of a person combined. In the 2nd spreadsheet, I have column a = first name and column b = last name.
I want to know which names in spreadsheet one cannot be found in spreadsheet two. I also need to verify the data to make sure that the formula was accurate on finding the correct lookup.
Do I have to combine my columns in spreadsheet 2 to make the first and last name in the same column to make this work?
Which formula would you use for either scenario?

Use this:
=ISNA(MATCH($A1&" "&$B1,Sheet2!$A:$A,FALSE)))
Where (in order):
A1 is the first name column in Sheet1
B1 is the last name column in Sheet1
Sheet2 is the sheet that has the data stored as names separately
$A:$A is the rows that have the two names together
FALSE is because it's an exact match
This will return FALSE if the element does not exist, and TRUE if it does
You can also use:
=VLOOKUP($A1&" "&$B1,Sheet2!$A:$D,3,FALSE)
If you want to retrieve data for a match.
Finally, if you need to do your lookups the other way, take a look at this thread for some ideas on how to split the string into two pieces.

Splitting column values with Sybase and ColdFusion

I need to trim the date from a text string in the function call of my app.
The string comes out as text//date and I would like to trim or replace the date with blank space. The column name is overall_model and the value is ford//1911 or chevy//2011, but I need the date part removed so I can loop over the array or list to get an accurate count of all the models.
The problem is that if there is a chevy//2011 and a chevy//2010 I return two rows in my table because of the date. So if I can remove the date and loop over them I can get my results of chevy there are 23 chevy models.

I have not used Sybase in a while, but I remember its string functions are very similar to MS SQL Server.
If overall_model always contains "//", use charindex to return the position of the delimiter and substring to retrieve the "text" before it. Then combine it with a COUNT. (If the "//" is not always present, you will need to add a CASE statement as well).
SELECT SUBSTRING(overall_model, 1, CHARINDEX('/', overall_model)-1) AS Model
, COUNT(*) AS NumberOfRecords
FROM YourTable
GROUP BY SUBSTRING(overall_model, 1, CHARINDEX('/', overall_model)-1)
However, ideally the "text" and "date" should be stored separately. That would offer greater flexibility and generally better performance.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Pentaho kettle denormalizer or concat - kettle

Have you tried the Group By step? It has an operation called Concatenate string separated by , which is exactly what you want here. For your other non-group fields use First non-null value as the operation. Make sure your data is sorted on the group field and then the tarifas field.

Related

UNIQUE formula in Google Sheets for multiple ranges

Sum values in one column of all rows which match one or more of multiple criteria

How to search multiple strings in a string?

Searching for unmatched ntheames when comparing spreadsheets

Splitting column values with Sybase and ColdFusion

Categories

Resources