I have several questions regarding the Google Spanner Export / Import tool. Apparently the tool creates a Dataflow job.
Can an import/export Dataflow job be re-run after it has run successfully from the tool? If so, will it use the current timestamp?
How to schedule a daily backup (export) of Spanner DBs?
How to get notified of new enhancements within the GCP platform? I was browsing the web for something else and I noticed that the export / import tool for GCP Spanner had been released 4 days earlier.
I am still browsing through the documentation for Dataflow jobs, templates, etc. Any suggestions on the above would be greatly appreciated.
Thx
My response is based on limited experience with the Spanner Export tool.
I have not seen a way to do this. There is no option in the GCP console, though that does not mean it cannot be done.
There is no built-in scheduling capability. Perhaps this can be done via Google's managed Airflow service, Cloud Composer (https://console.cloud.google.com/composer)? I have yet to try this, but it is my next step as I have similar needs.
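For what it's worth, here is a minimal sketch of what such a Composer/Airflow DAG might look like, assuming the Google-provided Cloud_Spanner_to_GCS_Avro Dataflow template (the one the console's Export tool uses) and placeholder project, instance, database, and bucket names. I have not run this, so verify the operator import path and template parameters against your Airflow and template versions:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataflowTemplateOperator

# Placeholder values; replace with your own project, instance, database and bucket.
PROJECT = "my-project"
BUCKET = "gs://my-spanner-backups"

default_args = {
    "start_date": datetime(2019, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    "daily_spanner_export",
    default_args=default_args,
    schedule_interval="@daily",
    catchup=False,
) as dag:

    export_spanner_db = DataflowTemplateOperator(
        task_id="export_spanner_db",
        # Google-provided Spanner export template.
        template="gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro",
        parameters={
            "instanceId": "my-instance",
            "databaseId": "my-database",
            # Write each run into a dated folder.
            "outputDir": BUCKET + "/{{ ds }}",
        },
        dataflow_default_options={
            "project": PROJECT,
            "tempLocation": BUCKET + "/tmp",
        },
    )
```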
I've made this request to Google several times. I have yet to get a response. My best recommendation is to read the change logs when updating the gcloud CLI.
Finally, there is an outstanding issue with the Export tool that causes it to fail if you export a table with 0 rows. I have filed a case with Google (Case #16454353) and they confirmed the issue. Specifically:
After running into a similar error message during my reproduction of the issue, I drilled down into the error message and discovered that there is something odd with the file path for the Cloud Storage folder [1]. There seems to be an issue with the Java File class viewing ‘gs://’ as having a redundant ‘/’ and that causes the ‘No such file or directory’ error message.
Fortunately for us, there is an ongoing internal investigation on this issue, and it seems like there is a fix being worked on. I have indicated your interest in a fix as well, however, I do not have any ETAs or guarantees of when a working fix will be rolled out.
I am just getting started with both GCP and Google Cloud Data Fusion. I have just viewed the intro video. I see that pipelines can be exported. I was wondering how we might promote a pipeline from, say, a Dev to a Prod environment? My guess is that after some testing, the exported file is copied to the Prod branch on Git, from where we need to invoke the APIs to deploy it? Also, what about connection details: how do we avoid hard-coding the source/destination configurations and credentials?
Yes. You would have to export and re-import the pipeline.
About the first question, if you have different environments for development and production, you can export your pipeline and import it into the correct environment.
I didn't understand the second question very well. In the official Data Fusion plugins there is a standard way to provide your credentials. If you need a better answer, please explain your question in a little more detail.
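To make the promotion step a bit more concrete: Data Fusion instances expose the CDAP REST API, so one way to deploy an exported pipeline JSON into the Prod instance from a CI job is to PUT it to the apps endpoint. A rough Python sketch follows, where the instance endpoint, namespace, pipeline name, and exported file name are all placeholders, and the endpoint path is my reading of the CDAP/Data Fusion docs (verify against your version before relying on it):

```python
import json
import subprocess

import requests

# Placeholders; replace with your Prod instance's API endpoint (the apiEndpoint
# field from `gcloud beta data-fusion instances describe`), namespace and name.
CDAP_ENDPOINT = "https://my-prod-instance.datafusion.googleusercontent.com/api"
NAMESPACE = "default"
PIPELINE_NAME = "my_pipeline"

# Use the caller's gcloud credentials to obtain an OAuth access token.
token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"]).decode().strip()

# The pipeline JSON exported from the Dev environment (e.g. checked into Git).
with open("my_pipeline-cdap-data-pipeline.json") as f:
    pipeline = json.load(f)

# Deploying an app is a PUT to /v3/namespaces/<ns>/apps/<name> on the CDAP API.
resp = requests.put(
    f"{CDAP_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps/{PIPELINE_NAME}",
    headers={"Authorization": f"Bearer {token}"},
    json=pipeline,
)
resp.raise_for_status()
print("Deployed:", resp.status_code)
```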
I have read many articles and solutions about scheduling query exports to external storage in Google BigQuery, but none of them seemed very clear.
Note: My company has a subscription only to Google BigQuery and not to the complete Google Cloud Platform.
I know how to do it manually but I am looking to automate the process since I need the same data every week.
Any suggestions will be appreciated. Thank you.
Option 1
You can use Apache Airflow, which provides the option to create scheduled tasks on top of BigQuery using the BigQuery operators.
You can find the basic steps required to start setting this up in this link.
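For illustration, a minimal sketch of such a DAG on a weekly schedule, using the Airflow 1.x contrib BigQuery-to-GCS operator; the project, dataset, table and bucket names are placeholders, and the import path may differ in newer Airflow releases:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator

with DAG(
    "weekly_bq_export",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:

    export_table = BigQueryToCloudStorageOperator(
        task_id="export_table",
        # Placeholder table and bucket; replace with your own.
        source_project_dataset_table="my-project.my_dataset.my_table",
        destination_cloud_storage_uris=["gs://my-bucket/weekly/export-*.csv.gz"],
        export_format="CSV",
        compression="GZIP",
        field_delimiter=",",
        print_header=True,
    )
```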
Option 2
You can use the Google BigQuery command-line tool to export your data as you do from the web UI, for example:
bq --location=[LOCATION] extract --destination_format [FORMAT] --compression [COMPRESSION_TYPE] --field_delimiter [DELIMITER] --print_header [BOOLEAN] [PROJECT_ID]:[DATASET].[TABLE] gs://[BUCKET]/[FILENAME]
Once you get this working, you can use any scheduling process of your liking to schedule the run of this job.
BTW: Airflow has a connector which enables you to run the command-line tool.
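Alternatively, if you would rather schedule a small Python script (e.g. with cron) than the bq CLI itself, the equivalent export can be done with the google-cloud-bigquery client library. A rough sketch, with placeholder project, table and bucket names:

```python
from google.cloud import bigquery

# Placeholder project; replace with your own.
client = bigquery.Client(project="my-project")

# Mirror the bq extract flags: CSV output, gzip compression, comma delimiter, header row.
job_config = bigquery.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.CSV
job_config.compression = bigquery.Compression.GZIP
job_config.field_delimiter = ","
job_config.print_header = True

extract_job = client.extract_table(
    "my-project.my_dataset.my_table",          # placeholder source table
    "gs://my-bucket/weekly/export-*.csv.gz",   # placeholder destination bucket
    job_config=job_config,
    location="US",  # must match the dataset's location
)
extract_job.result()  # wait for the export job to finish
```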
Once the file is in GCP (Cloud Storage), you can use the Box G Suite integration to view and manage your files.
I'm attempting to create an integration between Bitbucket Repo and Google Cloud Build to automatically build and test upon pushes to certain branches and report status back (for that lovely green tick mark). I've got the first part working, but the second part (reporting back) has thrown up a bit of a stumbling block.
Per https://cloud.google.com/cloud-build/docs/send-build-notifications, Cloud Build is supposed to automatically publish update messages to a Pub/Sub topic entitled "cloud-builds". However, trying to find it (both through the web interface and via gcloud command line tool) has turned up nothing. Copious amounts of web searching has turned up https://github.com/GoogleCloudPlatform/google-cloud-visualstudio/issues/556, which seems to suggest that the topic referenced in that doc is now being filtered out of results; however, that issue seems to be specific to the visual studio tools and not GCP as a whole. Moreover, https://cloud.google.com/cloud-build/docs/configure-third-party-notifications suggests that it's still accessible, but perhaps only to Cloud Functions? And maybe only manually via the command line, since the web interface for Cloud Functions also does not display this phantom "cloud-builds" topic?
Any guidance as to where I can go from here? As near as I can tell, there are two possibilities: either something is utterly borked in my GCP project and the Pub/Sub topic is not visible to me or has somehow been deleted, or I'm right and this topic just isn't accessible anymore.
I was stuck with the same issue. After a while, I created the cloud-builds topic manually and created a Cloud Function subscribed to that topic.
After that, build details are pushed to the topic as expected, and my Cloud Function gets triggered by new events.
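For reference, a minimal sketch of such a Pub/Sub-triggered Cloud Function (Python runtime); the function name is arbitrary, and the message payload is assumed to be the Build resource JSON described in the build-notifications docs:

```python
import base64
import json


def on_build_event(event, context):
    """Background Cloud Function triggered by messages on the cloud-builds topic."""
    # The Pub/Sub message data is base64-encoded JSON of the Build resource.
    build = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    status = build.get("status")    # e.g. QUEUED, WORKING, SUCCESS, FAILURE
    build_id = build.get("id")
    repo = build.get("source", {}).get("repoSource", {}).get("repoName")

    print(f"Build {build_id} for {repo}: {status}")

    # From here you could call the Bitbucket build-status API to report the
    # result back to the commit (left out of this sketch).
```

You would deploy it with something like gcloud functions deploy on_build_event --runtime python37 --trigger-topic cloud-builds.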
You can check for the existence of the cloud-builds topic another way, outside the UI, by downloading the gcloud command-line tool and, after running gcloud init, running gcloud pubsub topics list to list all topics for the configured project. If the topic projects/{your project}/topics/cloud-builds is not listed, I would suggest filing a bug with the Cloud Build team here.
Creating the cloud-builds topic manually won't work, since it's a special topic that Google manages.
In this case, you have to go to APIs & Services, disable the Cloud Build API, and then enable it again; the cloud-builds topic will be created for you. Enable and disable Cloud Build API
I've been using Dataprep for months, and have a lot of different flows built in one of my projects. I was working with it this morning, but now when I log in, the project in Dataprep is blank, like I'm a brand new user. I'm starting to panic because months of work has vanished! Does anyone have any suggestions on what to do?
Things I've tried without success:
I switched into a different project and I can see that project's flows listed.
Logged out/in.
Restarted browser.
Thank you for your help, you are correct. It turns out we received an email from Google with the subject "[Action Required] Please migrate off JSON-RPC and Global HTTP Batch Endpoints" (specifically storage#v1). We were not using this API with the solutions we developed within this project, so one of our developers deactivated it. It showed the affected dependencies, which included the Dataflow API. Dataprep was not disabled, nor did it need to be re-enabled before accessing it again... it just lost its metadata, like both Ali T and James commented.
Google Cloud Support recommends exporting the recipes and flows (manually, I believe) as the best way to prevent Dataprep working-file loss in the future.
When I open the Google Cloud Shell Code Editor it is not loading the resources and hence I am unable to work. I have attached a screenshot below with a view of the developer tools console. Please help me out. Thanks.
This issue seems to be related to an internal project/billing configuration. Since this kind of access error is thrown when an account has payment issues, I think you should first verify that your billing account is in good standing; however, if you continue getting these error messages after that validation, I suggest you take a look at the Issue Tracker tool, which you can use to raise a Cloud Shell ticket and verify this scenario with the Google Technical Support Team.
A couple of things could cause this:
An interfering browser extension: are you using any browser extensions that could be interfering (e.g., an ad blocker)?
A bug: as #Armin_SC suggested, use Issue Tracker to file an issue in this case.
As a workaround, you might want to try gcloud compute ssh to connect to your instances.