WSO2 - Table created using Analytic Script invisible in Gadget Generation Tool

My use case: push data from a stream configured in the ESB to BAM and create a report using the "Gadget Generation Tool".
Publishing the stream from ESB to BAM after adding an agent to the proxy service worked fine.
From the stream I created a table using the Analytics->Add screen and the table seems to persist as I am able to do a select and see results from the same screen.
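For example, running a simple select like the one below from that screen returns rows (a minimal check, using the table and columns created by the script that follows):
SELECT creditkey, creditFlag, version FROM CREDITTABLE LIMIT 10;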
Now I am trying to generate a dashboard using the Gadget Generation Tool, but the table is not available there. The JDBC connection works fine, yet the table is nowhere to be found:
Script for the analytics table, run from the Analytics->Add screen:
CREATE EXTERNAL TABLE IF NOT EXISTS CREDITTABLE(creditkey STRING, creditFlag STRING, version STRING)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1" ,
cassandra.port" = "9163" , "cassandra.ks.name" = "EVENT_KS" ,
"cassandra.ks.username" = "admin" ,
"cassandra.ks.password" = "admin" ,
"cassandra.cf.name" = "firstStream" ,
"cassandra.columns.mapping" = ":key,payload_k1-constant, Version" );
I tried looking for the table in the following databases:
jdbc:h2:repository/database/WSO2CARBON_DB;AUTO_SERVER=TRUE
jdbc:h2:repository/database/metastore_db;AUTO_SERVER=TRUE
jdbc:h2:repository/database/samples/BAM_STATS_DB;AUTO_SERVER=TRUE
Have not done any custom db configurations.

Did you try jdbc:h2:repository/database/samples/WSO2CARBON_DB;AUTO_SERVER=TRUE? Also, what you have pasted is the Cassandra storage definition, probably used for reading the input, not for persisting the output. If you share the full Hive query, that would help to figure out the problem.

Why did I not see the table in the Gadget Generation tool?
The table I created using the Hive script is a Cassandra (distributed database) table, whereas the references I supplied in the Gadget Generation tool while looking for the table pointed to H2 RDBMS databases.
Below are the references to the H2 RDBMS databases that come out of the box with WSO2:
jdbc:h2:repository/database/WSO2CARBON_DB;AUTO_SERVER=TRUE
jdbc:h2:repository/database/metastore_db;AUTO_SERVER=TRUE
jdbc:h2:repository/database/samples/BAM_STATS_DB;AUTO_SERVER=TRUE
Resolution: How to get the tables listed in the Gadget Generation tool?
To get the tables listed in the Gadget Generation tool, you have to use Hive scripts to complete the following 3 steps:
Create a Hive table reference for the Cassandra data stream to which data is pushed (from the ESB, in my case):
CREATE EXTERNAL TABLE IF NOT EXISTS CREDITTABLE(
payload_creditkey STRING, payload_creditFlag STRING, payload_version STRING)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
"cassandra.host" = "127.0.0.1" ,
"cassandra.port" = "9163" ,
"cassandra.ks.name" = "EVENT_KS" ,
"cassandra.ks.username" = "admin" ,
"cassandra.ks.password" = "admin" ,
"cassandra.cf.name" = "firstStream" ,
"cassandra.columns.mapping" = ":key,payload_k1-constant, Version" );
Using a Hive script, create an H2 RDBMS table reference to which the data from the Cassandra stream will be copied:
CREATE EXTERNAL TABLE IF NOT EXISTS CREDITTABLEh2summary(
creditFlg STRING,
verSion STRING
)
STORED BY
'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
'mapred.jdbc.driver.class' = 'org.h2.Driver' ,
'mapred.jdbc.url' = 'jdbc:h2:C:/wso2bam-2.2.0/repository/samples/database/BAM_STATS_DB' ,
'mapred.jdbc.username' = 'wso2carbon' ,
'mapred.jdbc.password' = 'wso2carbon' ,
'hive.jdbc.update.on.duplicate' = 'true' ,
'hive.jdbc.primary.key.fields' = 'creditFlg' ,
'hive.jdbc.table.create.query' = 'CREATE TABLE CREDITTABLE_newh2(creditFlg VARCHAR(100), version VARCHAR(100))' );
Write a Hive query that copies the data from Cassandra to H2 (RDBMS):
insert overwrite table CREDITTABLEh2summary select a.payload_creditFlag,a.payload_version from CREDITTABLE a;
On doing this I was able to see the table in the Gadget Generation tool; however, I also had to change the reference to the H2 database in the JDBC URL value that I passed to an absolute path.
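For reference, the absolute JDBC URL I pointed the Gadget Generation tool at looked roughly like this (an illustrative value, assuming the default BAM 2.2.0 install path already used in the Hive script above):
jdbc:h2:C:/wso2bam-2.2.0/repository/samples/database/BAM_STATS_DB;AUTO_SERVER=TRUE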
Observation:
I was wondering whether the Gadget Generation tool can directly point to the Cassandra stream without having to copy the tables to an RDBMS database.

Related

GCP - load table from file and variable in BQ routine

I want to load a table from a file and a variable. Since the file schema is not the same as the table to be loaded, the extra columns need to be filled from a variable inside a stored procedure.
In the example below, pty is not part of the CSV file, while the other 2 columns, mt and de, are part of the file.
set pty = 'sss';
LOAD DATA INTO `###.Tablename`
(
pty STRING ,
mt INTEGER ,
de INTEGER
)
FROM FILES
(
format='CSV',
skip_leading_rows=1,
uris = ['gs://###.csv']
);
I think you can do that in 2 steps with 2 queries:
LOAD DATA INTO `###.Tablename`
FROM FILES
(
format='CSV',
skip_leading_rows=1,
uris = ['gs://###.csv']
);
update `###.Tablename`
set pty = "sss"
where pty is null;
If it's complicated for you to apply your logic with BigQuery and SQL, you can also create a Python script with the Google BigQuery client and the Google Storage client:
Your script loads the CSV file
Transforms the results into a list of dicts
Adds the extra fields to each element of the dicts with your code logic
Loads the resulting dicts into BigQuery

PowerBI Query contains transformations that can't be used for DirectQuery

I am using PowerBI Desktop (2.96.1061.0) to connect to a local MS SQL server so I can prepare some visualizations. It is important to mention that all data connections (Tables, SQL queries) are using the DirectQuery option.
It's been quite a smooth experience so far. No issues at all. Now I am trying to get some new data, again, through a direct SQL query:
SELECT BillId, string_agg(PGroupName, ', ')
FROM
(SELECT bm.ImportedBillsId as BillId, pg.Name as PGroupName
FROM [BillMp] bm
JOIN [Mps] m on bm.ImportersId = m.Id
JOIN [PGroups] pg on m.PoliticalGroupId = pg.Id
GROUP BY bm.ImportedBillsId, pg.Name) t
GROUP BY BillId
but for some reason, it is not letting me re-create the model and apply the new changes, even though the import wizard is able to preview the actual data prior to the update. The error that I am getting is the one from the title: "Query contains transformations that can't be used for DirectQuery".
I have also tried to import only the data from the internal/nested query
SELECT bm.ImportedBillsId as BillId, pg.Name as PGroupName
FROM [BillMp] bm
JOIN [Mps] m on bm.ImportersId = m.Id
JOIN [PGroups] pg on m.PoliticalGroupId = pg.Id
GROUP BY bm.ImportedBillsId, pg.Name
and to process the outer query through Power BI (according to this article), but I am still getting the same error.

How to use query parameters in GCP BigQuery federated queries

I have a GCP-based environment. I use standard SQL scripting in GCP BigQuery and federated queries to Cloud SQL MySQL. The federated query selects data from the Cloud SQL MySQL database. I need to select data from the Cloud SQL MySQL database based on a condition that depends on data in BigQuery. I use variables in standard SQL scripting in BigQuery to store the value that I select from BigQuery, and I want to use the value of this variable in the WHERE clause of the MySQL query. See the following example, where I select a date from BigQuery and store it in a variable "BQ_LAST_DATETIME".
DECLARE BQ_LAST_DATETIME DATETIME;
SET BQ_LAST_DATETIME = (select max(date_created) from bq_my_dataset.bq_my_table);
I am using a BigQuery federated query to read data out of the Cloud SQL database (https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries), as shown below, and I want to use the value stored in the variable "BQ_LAST_DATETIME" in the MySQL query's WHERE clause:
SELECT * FROM EXTERNAL_QUERY("my-gcp-project.my-region.my-connection2-cloudsql", "select * from mysqlschema.mysql_table where date_created = #BQ_LAST_DATETIME;" );
Please note that in the above query I have used "#BQ_LAST_DATETIME" as a placeholder to show what I want to achieve. I am not sure whether I can directly use a BigQuery scripting variable as a query parameter in the "external" query part of the federated query.
Any suggestions on how to achieve parametrization of the external query in a federated query, or on how I could achieve an effect similar to my intent?
I actually tried the following, as depicted. I used the BigQuery scripting variable as a query parameter in the "external" query part of the federated query. The only nuance is that, since I was dealing with dates, I performed a cast, and since the date variable is actually treated as a string, I converted it back to a date using MySQL's STR_TO_DATE, as follows:
DECLARE BQ_LAST_DATETIME DATETIME;
DECLARE BQ_LAST_DATE DATE;
SET BQ_LAST_DATETIME = (select max(date_created) from bq_my_dataset.bq_my_table);
SET BQ_LAST_DATE = CAST(BQ_LAST_DATETIME AS DATE);
SELECT * FROM EXTERNAL_QUERY("my-gcp-project.my-region.my-connection2-cloudsql", "select * from mysqlschema.mysql_table where date_created = STR_TO_DATE(#BQ_LAST_DATE,'%Y-%m-%d') ;" );
While this query is accepted by the parser, it is NOT giving the expected result.
Basically, the value of the variable #BQ_LAST_DATE does not seem to reach the MySQL query as expected.
Does anyone know what I am missing?
Thanks a lot for your help.
You can try EXECUTE IMMEDIATE:
DECLARE BQ_LAST_DATETIME STRING;
DECLARE DSQL STRING;
SET BQ_LAST_DATETIME = 'SELECT max(date_created) from bq_my_dataset.bq_my_table';
SET DSQL = '"select * from mysqlschema.mysql_table where date_created = (' || BQ_LAST_DATETIME || ')"';
EXECUTE IMMEDIATE 'SELECT * FROM EXTERNAL_QUERY("my-gcp-project.my-region.my-connection2-cloudsql",' || DSQL || ');'
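A variant of the same EXECUTE IMMEDIATE idea, sketched under the assumption that you want to splice the already-computed value (rather than the text of the BigQuery subquery) into the external MySQL statement, using the table and connection names from the question:
DECLARE BQ_LAST_DATE DATE;
SET BQ_LAST_DATE = (SELECT CAST(MAX(date_created) AS DATE) FROM bq_my_dataset.bq_my_table);
-- Build the MySQL text with the literal date value already substituted, then run it.
EXECUTE IMMEDIATE FORMAT("""
SELECT * FROM EXTERNAL_QUERY(
  'my-gcp-project.my-region.my-connection2-cloudsql',
  "select * from mysqlschema.mysql_table where date_created = '%s';")
""", CAST(BQ_LAST_DATE AS STRING));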

QuickSight could not generate any output column after applying transformation Error

I am running a query that works perfectly on AWS Athena; however, when I use Athena as a data source from QuickSight and try to run the query, it keeps giving me the error message "QuickSight could not generate any output column after applying transformation".
Here is my query:
WITH register as (
select created_at as register_time
, serial_number
, node_name
, node_visible_time_name
from table1
where type = 'register'),
bought as (
select created_at as bought_time
, node_name
, serial_number
from table1
where type= 'bought')
SELECT r.node_name
, r.serial_number
, r.register_time
, b.bought_time
, r.node_visible_time_name
FROM register r
LEFT JOIN bought b
ON r.serial_number = b.serial_number
AND r.node_name = b.node_name
AND b.bought_time between r.deploy_time and date(r.deploy_time + INTERVAL '1' DAY)
LIMIT 11;
I've done some searching and found a similar question, "Quicksight custom query postgresql functions". In that case, adding INTERVAL '1' DAY caused the problem. I've tried other alternatives but had no luck; furthermore, running the query without it still produces the same error message.
No other lines seem to be getting transformed in any other way.
Re-creating the dataset and running the exact same query works.
I think queries that have already been run on an existing dataset transform the data. Please let me know if anyone knows why this is so.

Issue querying Athena with select having special characters

Below is the select query I am trying:
SELECT * from test WHERE doc = '/folder1/folder2-path/testfile.txt';
This query returns zero results.
If I change the query to use LIKE, omitting the special characters (/ - .), it works:
SELECT * from test WHERE doc LIKE '%folder1%folder2%path%testfile%txt';
This works
How can I fix this query to use the equality or IN operator, as I am interested in running a batch select?
To test your situation, I created a text file containing:
hello
there
/folder1/folder2-path/testfile.txt
this/that
here.there
I uploaded the file to a directory on S3, then created an external table in Athena:
CREATE EXTERNAL TABLE stack (doc string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\")
LOCATION 's3://my-bucket/my-folder/'
I then ran the command:
select * from stack WHERE doc = '/folder1/folder2-path/testfile.txt'
It returned:
1 /folder1/folder2-path/testfile.txt
So, it worked for me. Therefore, your problem would be a result of either the contents of the file or the way that the external table is defined (e.g. using a different SerDe).
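For the batch select mentioned in the question, an IN list should behave the same way as the equality test; a minimal sketch using the sample values above:
SELECT * FROM stack WHERE doc IN ('/folder1/folder2-path/testfile.txt', 'here.there');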