CDA kettle over kettleTransFromFile: different behaviour vs Pentaho Data Integration (Kettle)

I created a form in the Pentaho server using CDE. The form is a table with some input fields. On button click it generates an array which is sent as a parameter value. In the DB table I have 3 columns: alfa, beta, gamma.
//var data = JSON.stringify(array);
var data = [
{"alfa":"some txt","beta":"another text","gamma": 23},
{"alfa":"stxt","beta":"anoxt","gamma": 43}
]
I created a Kettle transformation which runs as expected: the two rows of the array are inserted into the database. But when I run the same transformation through CDA (kettle over kettleTransFromFile) in Pentaho, only the first row is inserted.
This is my transformation:
Get Variable: data (String)
Modified Java Script Value: data_decode contains the JSON array
var data_decode = eval(data.toString());
JSON Input: alfa - $..[0].alfa, beta - $..[0].beta, gamma - $..[0].gamma
Table Insert: inserts into the database.
... From Spoon and the Kettle command line everything is OK, but not from Pentaho.
What is wrong?
Thank you!
Geo
UPDATE
Maybe it's a misconfiguration, a bug, or a feature, but I no longer use this method. I found a simpler approach: I created a scriptable over scripting datasource with a bit of Java code inside (using BeanShell). Now it works as expected. I'll move this form into a Sparkl plugin. Thank you.
The question still remains open; maybe somebody wants to try this approach.

Please use a correct JSONPath to eliminate side effects: $.*.alfa (the [0] index pins the path to the first array element, so only one row ever comes out of the JSON Input step).
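The difference between the two paths can be illustrated in plain Python (standard library only, using the asker's two-row payload): an index-0 path reads only the first element, while a wildcard path walks every row of the array.

```python
import json

# The two-row payload from the question above
data = json.loads("""
[
 {"alfa": "some txt", "beta": "another text", "gamma": 23},
 {"alfa": "stxt", "beta": "anoxt", "gamma": 43}
]
""")

# A path like $..[0].alfa is pinned to index 0, so only the first row is read:
first_only = [data[0]["alfa"]]

# A wildcard path like $.*.alfa visits every element of the array:
all_rows = [row["alfa"] for row in data]

print(first_only)  # ['some txt']
print(all_rows)    # ['some txt', 'stxt']
```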


What is the equivalent of the SQL function REGEXP_EXTRACT in Azure Synapse?

I want to convert code that I was running in Netezza (SQL) to Azure Synapse (T-SQL). I was using the built-in Netezza SQL function REGEXP_EXTRACT, but this function is not built into Azure Synapse.
Here's the code I'm trying to convert:
-- Assume that "column_v1" has datatype Character Varying(3) and can take value between 0 to 999 or NULL
SELECT
column_v1
, REGEXP_EXTRACT(column_v1, '[0-9]+') as column_v2
FROM INPUT_TABLE
;
Thanks,
John
The regexExtract() function is supported in Synapse data flows. To implement it you need a couple of things; here is a demo I built, using the SalesLT.Customer data that Microsoft ships as sample data:
In Synapse -> Integrate tab:
Create a new pipeline
Add a Data flow activity to your pipeline
In the Data flow activity: under the Settings tab -> create a new data flow
Double click on the data flow (it should open it) and add a source (it can be Blob Storage, on-prem files, etc.)
Add a derived column transformation
In the derived column, add a new column (or override an existing one) and in Expression add: regexExtract(Phone,'(\\d{3})'). This selects the first 3 digits; since my data has dashes in it, it makes more sense to strip every character that is not a digit using the regexReplace method: regexReplace(Phone,'[^0-9]', '')
Add a sink
(Screenshots of the data flow activities, the derived column transformation, and its output are omitted here.)
please check MS docs:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-derived-column
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expression-functions
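For reference, the two data flow expressions above correspond to ordinary regular-expression operations. A standard-library Python sketch of the same logic, with a hypothetical phone value standing in for the Phone column:

```python
import re

phone = "555-123-4567"  # hypothetical value of the Phone column

# Analogous to regexExtract(Phone, '(\\d{3})'):
# take the first run of three consecutive digits
first_three = re.search(r"\d{3}", phone).group()

# Analogous to regexReplace(Phone, '[^0-9]', ''):
# strip every character that is not a digit
digits_only = re.sub(r"[^0-9]", "", phone)

print(first_three)  # 555
print(digits_only)  # 5551234567
```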
REGEXP_EXTRACT is not available in T-SQL, so we can get similar functionality by combining SUBSTRING/LEFT/RIGHT with the PATINDEX function:
SELECT input = '789A',
       extract = SUBSTRING('789A', PATINDEX('%[0-9][0-9][0-9]%', '789A'), 3);
Note that PATINDEX needs the % wildcards so the pattern can match anywhere in the string, and the SUBSTRING length of 3 keeps the trailing letter out of the result.
Result: extract = '789'
Refer to the Microsoft documents PATINDEX (T-SQL) and SUBSTRING (T-SQL) for additional information.

Great Expectations - Run Validation over specific subset of a PostgreSQL table

I am fairly new to Great Expectations and have a question. Essentially I have a PostgreSQL database, and every time I run my data pipeline I want to validate a specific subset of the PostgreSQL table based on some key. E.g. if the data pipeline runs every day, there would be a field called current_batch, and the validation would run for the query below:
SELECT * FROM jobs WHERE current_batch = <input_batch>.
I am unsure of the best way to accomplish this. I am using the v3 API of Great Expectations and am a bit confused as to whether to use a checkpoint or a validator. I assume I want a checkpoint, but I can't figure out how to create one that validates only a specific subset of the PostgreSQL datasource.
Any help or guidance would be much appreciated.
Thanks,
I completely understand your confusion, because I am working with GE too and the documentation is not really clear.
First of all, "Validators" are now called "Checkpoints", so they are not a different entity, as you can read here.
I am working on an Oracle database, and the only way I found to apply a query before testing my data with expectations is to put the query inside the checkpoint.
To create a checkpoint, run the great_expectations checkpoint new command from your terminal. After creating it, add a "query" field inside the .yml file that is your checkpoint.
Below is a snippet of a checkpoint I am working with. When I want to validate my data, I run the command great_expectations checkpoint run check1
name: check1
module_name: great_expectations.checkpoint
class_name: LegacyCheckpoint
batches:
  - batch_kwargs:
      table: pso
      schema: test
      query: SELECT p AS c,
        [ ... ]
        AND lsr = c)
      datasource: my_database
      data_asset_name: test.pso
    expectation_suite_names:
      - exp_suite1
Hope this helps! Feel free to ask if you have any doubts :)
I managed this using views (in Postgres). Before running GE, I create (or replace) a view as a query with all the necessary joins, filtering, aggregations, etc., and then specify the name of this view in the GE checkpoint.
Yes, it is not the ideal solution; I would rather use a query in the checkpoint too. But as a workaround it covers all my cases.
Let's have view like this:
CREATE OR REPLACE VIEW table_to_check_1_today AS
SELECT * FROM initial_table
WHERE dt = current_date;
And a checkpoint configured something like this:
name: my_task.my_check
config_version: 1.0
validations:
  - expectation_suite_name: my_task.my_suite
    batch_request:
      datasource_name: my_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: table_to_check_1_today
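The view side of this workaround can be sketched with the standard library alone (SQLite stands in for Postgres here purely to keep the example self-contained; table and view names follow the snippet above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE initial_table (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO initial_table VALUES (?, ?)",
    [(1, "2000-01-01"), (2, "2000-01-02")],
)
conn.execute("INSERT INTO initial_table VALUES (3, date('now'))")

# The view bakes the date filter in, so a checkpoint pointed at it
# only ever sees today's rows
conn.execute(
    "CREATE VIEW table_to_check_1_today AS "
    "SELECT * FROM initial_table WHERE dt = date('now')"
)

rows = conn.execute("SELECT id FROM table_to_check_1_today").fetchall()
print(rows)  # [(3,)]
```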
Yes, a view can be created using current_date, and the checkpoint can simply run against the view. However, this means the variable (current_date) lives in the database, which may not be desirable; you might want to run the checkpoint query for a different date coming from an environment variable or elsewhere, passed to the CLI or to Python/notebook code.
I have yet to find a solution where we can substitute a string into the checkpoint query; using a config variable from the file is a very static approach, and there may be different checkpoints running for different dates.
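One workaround for the static-query problem is to render the query string yourself before handing it to the checkpoint. A minimal stdlib sketch, where the template string, the jobs table, and the run date are all hypothetical placeholders:

```python
from datetime import date

# Hypothetical query template; the batch key could come from an
# environment variable or a CLI argument instead of a constant
query_template = "SELECT * FROM jobs WHERE current_batch = '{batch}'"

run_date = date(2023, 1, 15)  # e.g. parsed from an env var
query = query_template.format(batch=run_date.isoformat())

print(query)  # SELECT * FROM jobs WHERE current_batch = '2023-01-15'
```

The rendered string could then be written into the checkpoint's query field (or passed through a runtime batch request) before the run.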

Doctrine Migration from string to Entity

I have an apparently simple task to perform: I have to convert several table columns from a string to a new entity (integer foreign key) value.
I have 10 DB tables with a column called "app_version", which at the moment is of type VARCHAR. Since I'm about to do a little project refactor, I'd like to convert those VARCHAR columns to a new column containing an ID that represents the newly mapped value, so:
V1 -> ID: 1
V2 -> ID: 2
and so on
I've prepared a Doctrine migration (I'm using Symfony 3.4) which performs the conversion by dropping the old column and adding the new id column for the AppVersion table.
Of course I need to preserve my existing data.
I know about preUp and postUp, but I can't figure out how to do it without hitting DB performance too much. I could collect the data via SELECT in preUp, store it in some PHP variables, and use them later in postUp to write the new values to the DB, but since I have 10 tables with many rows this becomes a disaster real fast.
Do you have any suggestions to make this smooth and easy?
Please don't ask why I have to do this refactor now and didn't set up the DB correctly the first time. :D
Keywords for ideas: transactions? bulk queries? avoiding PHP variable storage? writing an SQL file? Anything could be good.
I feel dumb, but the solution was much simpler: I created a custom migration with all the ALTER TABLE [table_name] DROP app_version statements, executed after one that simply does:
UPDATE [table_name] SET app_version_id = 1 WHERE app_version = 'V1'
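The point of this fix is that the mapping happens entirely in SQL, one bulk UPDATE per distinct version value, so no row data ever has to be pulled into PHP. A self-contained sketch of the idea using SQLite from the Python standard library (table and column names follow the question; the version map is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (id INTEGER, app_version TEXT, app_version_id INTEGER)"
)
conn.executemany(
    "INSERT INTO some_table (id, app_version) VALUES (?, ?)",
    [(1, "V1"), (2, "V2"), (3, "V1")],
)

# One bulk UPDATE per distinct version value; no per-row traffic
version_map = {"V1": 1, "V2": 2}
for version, version_id in version_map.items():
    conn.execute(
        "UPDATE some_table SET app_version_id = ? WHERE app_version = ?",
        (version_id, version),
    )

# Once every row is mapped, a later migration can drop the old column
rows = conn.execute(
    "SELECT id, app_version_id FROM some_table ORDER BY id"
).fetchall()
print(rows)  # [(1, 1), (2, 2), (3, 1)]
```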

Ajax call returned server error ORA-01403: no data found for APEX Interactive Grid

I am trying to save data into my table using an Interactive Grid with the help of a custom PL/SQL process. I am running into an "ORA-01403: no data found" error while inserting data and I can't figure out why.
This is the custom PL/SQL process I run. I appreciate your help.
DECLARE
   em_id NUMBER;
BEGIN
   CASE :apex$row_status
      WHEN 'C' THEN
         SELECT NVL(MAX(emergency_id), 0) + 1
           INTO em_id
           FROM emp_emergency_contact;

         INSERT INTO emp_emergency_contact
            (emergency_id, emp_id, emergency_name, emergency_relation)
         VALUES
            (em_id, :emp_id, :emergency_name, :emergency_relation);
      WHEN 'U' THEN
         UPDATE emp_emergency_contact
            SET emergency_name     = :emergency_name,
                emergency_relation = :emergency_relation
          WHERE emergency_id = :emergency_id;
      WHEN 'D' THEN
         DELETE FROM emp_emergency_contact
          WHERE emergency_id = :emergency_id;
   END CASE;
END;
So far I have not come across any documented way to use custom PL/SQL logic for processing submitted rows of an APEX 5.1 Interactive Grid via an AJAX call.
You are getting the no data found error because the return is expected to be in a certain JSON format.
The example you have provided is not too complex and can be done using the standard "Interactive Grid - Automatic Row Processing (DML)" process, which is an AJAX approach. If the AJAX call is not important, you can create your own PL/SQL process with custom logic; an example is demonstrated in the "Sample Interactive Grids" packaged application. Check out the Advanced > Custom Server Processing page in that application for more information.
I agree with Scott: you should be using a sequence or identity column for ids.
Not entirely sure. A SELECT INTO can raise a NO_DATA_FOUND exception, but yours shouldn't.
That being said, you shouldn't have MAX(id) + 1 anywhere in your code; it is a bug waiting to happen. Use a sequence or identity column instead.
I have gotten this many times, so the first thing I do is look at any columns in my grid SQL that are not part of the save; they come from a join and are for display only.
I just got it again: it was a heading sort column that I had set to a column type of Number. I changed it to Display Only and the save now works.
Note that I had already set the Source of the column to "Query Only", which is also needed.
It is a bummer the AJAX error message doesn't at least give the column name that caused the error.
Hope this helps someone.
BillC
Add a RETURNING INTO clause after the INSERT. The Interactive Grid expects the primary key to be returned so it can query the inserted row.
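The same pattern, letting the database generate the key and handing it straight back to the caller, can be sketched with the standard library alone (SQLite standing in for Oracle; an identity-style column replaces the MAX(id)+1 logic, and lastrowid plays the role of RETURNING emergency_id INTO):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_emergency_contact ("
    "emergency_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "emergency_name TEXT)"
)

# The database generates the key; the caller reads it back after the insert,
# which is the role RETURNING ... INTO plays in the PL/SQL above
cur = conn.execute(
    "INSERT INTO emp_emergency_contact (emergency_name) VALUES (?)", ("Jane",)
)
new_id = cur.lastrowid

print(new_id)  # 1
```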

Simple query to a WSS/MOSS list from remote client

What is the simplest way to query a WSS/MOSS list from a remote client?
Using /_vti_bin/lists.asmx and XML fragments for the query seems like a large chunk of work for a simple task.
I have found the U2U CAML Query Builder, which helps a bit.
If you write in .NET, reference Microsoft.SharePoint.dll, get an instance of SPWeb and, through that, an instance of your list. You can then access the list directly, or build a CAML query by defining an SPQuery q, setting the CAML query string on q.Query, and fetching SPListItemCollection col = web.Lists["myList"].GetItems(q);
I found the solution: LINQ to SharePoint. How cool is that!
http://www.codeplex.com/LINQtoSharePoint
It's not finished, but what it has is fine for me.