In my scenario, I'd like to run a number of batch analytics queries, changing just a few parameters. For instance, I'd like to change the range of dates used in the query.
WSO2's docs point to the Spark SQL docs, where UDFs are mentioned as a way to extend Spark's capabilities. Is this approach supported/recommended?
TIA.
I want to validate the data that is exported from Y42 to BigQuery in Google Cloud (e.g. given a predefined schema, I want to check whether all columns appear in the data, the ranges of the values, etc.).
I created a Python script that validates the data that comes in a CSV file. However, I do not know how to run the script before exporting the data to Google Cloud. I can create a VM instance in Google Cloud and run a Python script there, but I don't know how to use the data that is stored in Google Cloud in my script. Can anyone give me any hints regarding this issue?
I investigated whether there are any other ways to validate data directly in Google Cloud, but I did not find anything. Is anyone aware of any data validation methods in Google Cloud?
What I usually do is import the data into BigQuery (in a temporary table, so as not to break my clean prod table) and run a query on it. That query performs all the checks that I want.
If the query returns rows, those rows are in error and the others are OK. I then merge the valid data into the clean prod table, and the bad data into a log table for further analysis.
That whole sequence is orchestrated with Cloud Workflows.
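For illustration, here is a rough sketch of that sequence using the google-cloud-bigquery Python client. The project, dataset, table, and column names (my-project, staging.orders_tmp, prod.orders, logs.orders_rejected, order_id, amount, order_date) are placeholders for your own schema, and the checks are only examples of what the validation query could test.

# Rough sketch, assuming placeholder table names and an existing prod/log table.
from google.cloud import bigquery

client = bigquery.Client()

# 1. Load the exported CSV into a temporary/staging table.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition="WRITE_TRUNCATE",
)
client.load_table_from_uri(
    "gs://my-bucket/export.csv",          # placeholder GCS path
    "my-project.staging.orders_tmp",      # placeholder staging table
    job_config=load_config,
).result()

# 2. One predicate that flags any row failing a check (example checks only).
bad_row = """
  order_id IS NULL
  OR amount NOT BETWEEN 0 AND 100000
  OR order_date > CURRENT_DATE()
"""

# 3. Bad rows go to a log table for further analysis...
client.query(f"""
    INSERT INTO `my-project.logs.orders_rejected`
    SELECT * FROM `my-project.staging.orders_tmp` WHERE {bad_row}
""").result()

# 4. ...and the valid rows are merged into the clean prod table.
client.query(f"""
    INSERT INTO `my-project.prod.orders`
    SELECT * FROM `my-project.staging.orders_tmp` WHERE NOT ({bad_row})
""").result()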
I am new to Informatica Cloud. I have a list of queries ready in my table, like below.
Now I want to take the queries from this table one by one; each works as a source query, and whatever results it returns need to be loaded into the target. All tables are already created in the source and target.
I just need to copy the data based on dynamic queries kept in one of my SQL tables.
If anyone has any ideas, please share your thoughts with me. It would be a great help.
The source connection will be the connector to your source database, and the Source Type will be Query. From there, it depends on how you are managing your variables. See the thread on Informatica Network for links to multiple examples.
Read the table as you normally would in the cloud. Then pass each record into the SQL transformation for execution. Configure where the SQL transformation has to execute, and it will run the queries in the database you want.
You can use a SQL task to run dynamic SQL queries.
Link to the SQL task approach: https://www.datastackpros.com/2019/12/informatica-cloud-incremental-load_14.html
I am new to WSO2 DSS 6.4.0. I have to retrieve the data from multiple sheets of a single Excel file and insert that data into multiple tables. Please help me to do this; just guide me.
It looks like you need to implement sophisticated logic. Excel files may be a source of data, but first of all, how does WSO2 DSS know the moment when it must start reading the Excel file? It sounds like a job for WSO2 ESB, which supports a virtual file system and can track a directory and generate an event if there are any changes.
Why don't you use WSO2 ESB to read the file sheet by sheet and insert the data?
It provides the necessary tools (mediators) to do this.
Anyway, it does look like an ETL job.
I am new to development, so I am sorry if this is a really basic question. I am trying to access some of the data available from Instagram's API, as documented here: https://developers.facebook.com/docs/instagram-api/insights.
I would like some kind of data repository to pull the data into, so I am looking at Google BigQuery to see if I can pull in the data. (The ultimate destination will be Power BI so I can publish online.)
Looking at the Facebook request code, is it possible to put this into Google BigQuery to return the data?
I am replacing the 'instagram-business-user-id' with an ID I have already generated, but it feels like it perhaps needs more markup to let BigQuery know what language it is in.
Any help would be much appreciated.
GET graph.facebook.com/{instagram-business-user-id}/insights
?metric=impressions,reach,profile_views
&period=day
Looking at the Facebook request code, is it possible to put this into Google BigQuery to return the data?
Yes, it's absolutely possible using the BigQuery API or the BigQuery CLI.
You can use this pseudo-workflow as an example (using the BigQuery API); a rough code sketch follows after the steps:
Create a table in BigQuery with the desired schema; for this you also have two options:
Save the result in one column holding the full JSON; this means that in your SELECT you need to use JSON_EXTRACT to fetch specific data.
Process the JSON in your code and save it in specific columns to simplify the SELECT statement.
Call Instagram's API.
Call the BigQuery API or BigQuery CLI to insert the data; this link provides one option for how to do this.
Call the BigQuery API or BigQuery CLI to fetch the data; this link provides one option for how to do this.
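To make the workflow above more concrete, here is a rough Python sketch (not a complete implementation). The access token, the Instagram business-user id, the BigQuery table name, and the flattened schema (metric, end_time, value) are all placeholder assumptions; it follows the second schema option above, processing the JSON in code and writing it into specific columns.

# Rough sketch with placeholder credentials, ids, and table name.
import requests
from google.cloud import bigquery

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"          # placeholder
IG_USER_ID = "YOUR_IG_BUSINESS_USER_ID"     # placeholder
TABLE_ID = "my-project.social.ig_insights"  # placeholder; table must already exist

# 1. Call the Instagram Graph API insights endpoint (the GET request from the question).
resp = requests.get(
    f"https://graph.facebook.com/{IG_USER_ID}/insights",
    params={
        "metric": "impressions,reach,profile_views",
        "period": "day",
        "access_token": ACCESS_TOKEN,
    },
)
resp.raise_for_status()
payload = resp.json()

# 2. Flatten the JSON into rows matching the assumed table schema
#    (metric STRING, end_time TIMESTAMP, value INT64).
rows = [
    {"metric": m["name"], "end_time": v["end_time"], "value": v["value"]}
    for m in payload.get("data", [])
    for v in m.get("values", [])
]

# 3. Insert the rows into BigQuery via the streaming insert API.
client = bigquery.Client()
errors = client.insert_rows_json(TABLE_ID, rows)
if errors:
    raise RuntimeError(f"BigQuery insert errors: {errors}")

Fetching the data back out is then just a SELECT against that table, via the same client or the CLI.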
I'm trying to retrieve the result of a query with aggregates, based on the GA sessions tables, using the BigQuery API in Python, and then push it to my data warehouse.
Issue: I can only retrieve 8333 records of the aforementioned query result.
But there are always 40k+ records on any day of the year.
I tried setting 'allowLargeResults': True.
I read that I should extract it all to Google Cloud first and then retrieve it...
I also read somewhere in the Google docs that I might only get the first page?!
Has anybody faced the same situation?
See the section on paging through results in the BigQuery docs: https://cloud.google.com/bigquery/docs/data#paging
Alternatively, you can export your table to Google Cloud Storage: https://cloud.google.com/bigquery/exporting-data-from-bigquery
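As a small illustration of the paging approach with the Python client: iterating the RowIterator returned by query().result() transparently fetches additional pages, so you are not limited to the first page of rows. The query below is a placeholder over the public GA sample dataset, standing in for your own GA sessions query.

# Sketch of paging through a large result set; query and dataset are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT date, COUNT(*) AS sessions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
GROUP BY date
"""

# result() returns a RowIterator; iterating it requests further pages
# as needed, so all rows come back, not just the first page.
rows = client.query(query).result(page_size=10000)

for row in rows:
    print(row["date"], row["sessions"])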