Upload data from API into GCP dataprep - google-cloud-platform

Is it possible to import data from a Restful API directly into data prep?
I think there might be a couple of work arounds...
1: Save the results to a JSON file in a GCS bucket and import from there.
2: Import the results into a Big Query table and then import into data prep from there.
It would be much smoother to just call an API and get a result set, as opposed to having to take an extra step. I just can't find anywhere that explains how to do this.
TIA!

Long story short: there's no real way to directly stream data into Data Prep. Even the new Dataprep Premium Edition expects that you'll have the data in some form of a database--though this does expand your options to Google Sheets, Salesforce, Oracle, Microsoft SQLServer, MySQL and PostgreSQL.
Personally, I've just gotten in the habit of writing directly into BigQuery and/or Firestore-to-BigQuery to get around this sort of thing. It also has the nice side effect of being another type of logging from applications.

Related

Do I need to update item csv in AWS personalize?

I'm trying to use AWS personalize, and following their documents.
So I've uploaded dataset files(interaction, user, item) to S3, then created a solution and a campaign.
And I implemented PutEvents API using java.
GetRecommendations API call works good.
At this moment I'm curious I need to update dataset files, especially item csv.
In general it's done at this point for very basic recommendations.
Since you are using PutEvents call, then all of the real-time events are added to Interactions dataset this way. Interactions datasets created by manual import and by PutEvents calls are separated from themselves. You can actually see them in Personalize Datasets web console.
Still you might want to update dataset files, using dataset import job feature, but it's going to replace your existing dataset. In general I would recommend using it only when:
You just created a fresh/bigger/better dump of your database with Interactions.
You've found, that your previous interactions dataset was invalid.
The schema of dataset changed (pretty much you are forced to do it then).
User or Item dataset changed/improved, it's actually a good idea to refresh it often, so Personalize can produce better recommendations. Keep in mind, that it also requires retraining of the Solution, so the new Items/Users will be included during the recommendations generation.
So for interactions you usually don't want to update dataset. For other datasets it might be a good idea to even create an automatic import mechanism.
Keep in mind, that Items and Users datasets are used only with Personalize Recipes, that support metadata. Otherwise they are simply ignored.

Export Data from BigQuery to Google Cloud SQL using Create Job From SQL tab in DataFlow

I am working on a project which crunching data and doing a lot of processing. So I chose to work with BigQuery as it has good support to run analytical queries. However, the final result that is computed is stored in a table that has to power my webpage (used as a Transactional/OLTP). My understanding is, BigQuery is not suitable for transactional queries. I was looking more into other alternatives and I realized I can use DataFlow to do analytical processing and move the data to Cloud SQL (relationalDb fits my purpose).
However, It seems, it's not as straightforward as it seems. First I have to create a pipeline to move the data to the GCP bucket and then move it to Cloud SQL.
Is there a better way to manage it? Can I use "Create Job from SQL" in the dataflow to do it? I haven't found any examples which use "Create Job From SQL" to process and move data to GCP Cloud SQL.
Consider a simple example on Robinhood:
Compute the user's returns by looking at his portfolio and show the graph with the returns for every month.
There are other options, beside pipeline use, but in all cases you cannot export table data to a local file, to Sheets, or to Drive. The only supported export location is Cloud Storage, as stated on the Exporting table data documentation page.

Querying BigQuery Dataset from Django App Engine

I have data stored in BigQuery - it is a small dataset - roughly 500 rows. I want to be able to query this data and load it in to the front end of Django Application. What is the best practice for this type of data flow?
I want to be able to make calls to the BigQuery API using Javascript. I will then parse the result of the query and serve it in the webpage. The alternative seems to be to find a way of making a regular copy of the BigQuery data which I could store in a Cloud Storage Bucket but this adds a potentially unnecessary level of complexity which I could hopefully avoid if there is a way to query the live dataset.

How to update data in google cloud storage/bigquery for google data studio?

For context, we would like to visualize our data in google data studio - this dataset receives more entries each week. I have tried hosting our data sets in google drive, but it seems that they're too large and this slows down google data studio (the file is only 50 mb, am I doing something wrong?).
I have loaded our data into google cloud storage --> google bigquery, and connected my google data studio to my bigquery table. This has allowed me to use the google data studio dashboard much quicker!
I'm not sure what is the best way to update our data weekly in google cloud/bigquery. I have found a slow way to do this by uploading the new weekly data to google cloud, then appending the data to my table manually in bigquery, but I'm wondering if there's a better way to do this (or at least a more automated way)?
I'm open to any suggestions, and if you think that bigquery/google cloud storage is not the answer for me, please let me know!
If I understand your question correctly, you want to automate the query that populate your table, which is connected to Data Studio.
If this is the case, then you can use Scheduled Query from BigQuery. Scheduled query allow you to define a query which results can be inserted in a new table. Particularly you can specify different rules for repetition (minimum each 15 minutes) and execution, as well as destination writing options (destination table, writing mode: append, truncate).
In order to use Scheduled Queries your account must have the right permissions. You can have a look at the following documentation to better understand how to use Scheduled Query [1].
Also, please note that at the front end the updated data in the BigQuery table will be seen updated in Datastudio at each refresh (click on refresh button in Datastudio). To automatically refresh the front-end visualization you can use the following plugin [2] or automate the click on the refresh button through Browser console commands.
[1] https://cloud.google.com/bigquery/docs/scheduling-queries
[2] https://chrome.google.com/webstore/detail/data-studio-auto-refresh/inkgahcdacjcejipadnndepfllmbgoag?hl=en

Pulling Instagram data into Google Big Query

I am new to development, so I am sorry if this is a really basic question. I am trying to access some of the data available from instagram's API as documented here. https://developers.facebook.com/docs/instagram-api/insights.
I would like some kind of data repository to pull the data into, so I am looking at Google Big Query to see if I can pull in the data. (The ultimate place will be PowerBi so I can publish online)
Looking at the Facebook request code - is it possible to put this into Google Big query to return the data?
I am replacing the 'instagram-business-user-id' with an ID I have generated already - but it feels like perhaps it needs more markup to let Big Query know what language it is in.
Any help would be much appreciated.
GET graph.facebook.com/{instagram-business-user-id}/insights
?metric=impressions,reach,profile_views
&period=day
Looking at the Facebook request code - is it possible to put this into Google Big query to return the data?
Yes it's absolutely possible using bigQuery API or bigQuery CLI
You can use this Psuedo workflow as an example (using BigQuery API):
Create a table in bigQuery with the desired schema for this you also have 2 options:
Save the result in 1 column with the full JSON, This means to the select you need you use JSON_EXTRACT to fetch specific data
Process the JSON in your code and save it in specific columns to simplify the select statement
Call instagram's API
Call bigQuery API or bigQuery CLI to insert the data, This link provides one option how to do this
Call bigQuery API or bigQuery CLI to fetch the data, This link provides one option how to do this