Our source is BigQuery datasets, and we would like to copy datasets across regions.
What is the difference between the Copying datasets feature and the BigQuery Data Transfer Service?
Per the documentation, Copying datasets is a beta feature and it is free. Does this mean even the data extraction from the source is free, or is it only the egress charges?
There is no "real" difference. In fact, BigQuery Data Transfer Service is used for more sources than BigQuery dataset only.
For BigQuery, the copy dataset is a feature of BigQuery data transfert. To convince yourself, you can try a copy dataset without the BigQUery data transfer service API activated, and you can't! The activation is mandatory.
About the pricing, you have to pay the egress cost. But I'm currently in discussion with Google Cloud because it's not clear. Google Cloud told me the doc is outdated and normally, it's the Cloud Storage egress cost that apply (instead of Compute Engine egress cost).
It makes sense because the copy dataset is simply a wrapper:
Export the data from a table to Cloud Storage in region A (a free operation in BigQuery)
Copy the export from region A to a Cloud Storage bucket in region B (this is where egress should apply)
Create a BigQuery load job from the export in region B (a free operation in BigQuery)
I have no confirmation yet, especially because the feature is free for now and I don't know whether the egress is also free.
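To make those wrapper steps concrete, here is a rough sketch of the equivalent manual process with the Python clients. All project, dataset, bucket and region names are placeholders for illustration, not the actual implementation of the copy dataset feature:

    from google.cloud import bigquery, storage

    # Placeholder names throughout.
    bq_client = bigquery.Client(project="my-project")
    gcs_client = storage.Client(project="my-project")

    # 1. Export the table to a Cloud Storage bucket in region A (free BigQuery operation).
    extract_job = bq_client.extract_table(
        "my-project.dataset_a.my_table",
        "gs://bucket-region-a/export/my_table-*.avro",
        location="EU",  # region of the source dataset
        job_config=bigquery.ExtractJobConfig(destination_format="AVRO"),
    )
    extract_job.result()

    # 2. Copy the exported files from the bucket in region A to a bucket in region B
    #    (this is the step where egress charges would apply).
    src_bucket = gcs_client.bucket("bucket-region-a")
    dst_bucket = gcs_client.bucket("bucket-region-b")
    for blob in gcs_client.list_blobs("bucket-region-a", prefix="export/"):
        src_bucket.copy_blob(blob, dst_bucket, blob.name)

    # 3. Load the exported files into a dataset in region B (free BigQuery operation).
    load_job = bq_client.load_table_from_uri(
        "gs://bucket-region-b/export/my_table-*.avro",
        "my-project.dataset_b.my_table",
        location="US",  # region of the destination dataset
        job_config=bigquery.LoadJobConfig(source_format="AVRO"),
    )
    load_job.result()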
I'm reading through the BigQuery documentation and I'm confused by the points quoted below.
Source: Google Doc
BigQuery transparently and automatically provides highly durable, replicated storage in multiple locations and high availability with no extra charge and no additional setup.
Source: Google Doc
BigQuery does not automatically provide a backup or replica of your data in another geographic region. You can create cross-region dataset copies to enhance your disaster recovery strategy.
Does BigQuery automatically replicate data across zones/regions?
For long-term data storage, given the options of Bigtable, BigQuery and Regional Persistent Disk, is it preferable to use Regional Persistent Disk to automatically replicate data across different geographical locations?
Yes, BigQuery automatically replicates data across zones within the selected location.
From the Google documentation:
In either case, BigQuery automatically stores copies of your data in two different Google Cloud zones within the selected location.
But as you read, I think you're missing some information: the documentation mentions a hard regional failure.
What is a hard regional failure? As the Google documentation describes:
Hard failure is an operational deficiency where hardware is destroyed. Hard failures are more severe than soft failures. Hard failure examples include damage from floods, terrorist attacks, earthquakes, and hurricanes.
For example, in asia-east1 (Taiwan), earthquakes are quite frequent, so if you're creating a dataset in this region, you might consider cross-region dataset copies to enhance your disaster recovery strategy.
I think you can export your table data to GCS for long-term data storage, because GCS offers several storage classes. For example:
Archive Storage is the lowest-cost, highly durable storage service for data archiving, online backup, and disaster recovery.
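As a minimal sketch of that idea (bucket name, project and location are placeholders), you could create a bucket whose default storage class is Archive, so that exported files land directly in the lowest-cost class:

    from google.cloud import storage

    # Placeholder names for illustration.
    client = storage.Client(project="my-project")

    # Create a bucket whose default storage class is ARCHIVE so that
    # exported BigQuery files are stored in the lowest-cost class.
    bucket = client.bucket("my-long-term-backups")
    bucket.storage_class = "ARCHIVE"
    bucket = client.create_bucket(bucket, location="asia-east1")

    print(f"Created {bucket.name} with storage class {bucket.storage_class}")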
There is a requirement to copy 10 TB of data from Azure Blob to S3, and also 10 TB of data from Synapse to Redshift.
What is the best way to achieve these 2 migrations?
For the Redshift part - you could export Azure Synapse Analytics to a blob storage in a compatible format, ideally compressed, and then copy the data to S3. It is pretty straightforward to import data from S3 into Redshift.
You may need a VM instance to read from Azure Storage and put the data into AWS S3 (it doesn't matter where the VM runs). The simplest option seems to be using the default CLIs (Azure and AWS) to read the content onto the migration instance and write it to the target bucket. However, personally I'd maybe create an application that writes down checkpoints, so that if the migration process is interrupted for any reason, it wouldn't need to start from scratch.
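A minimal sketch of that checkpointing idea (container, bucket and connection details are placeholders; for 10 TB you would also want retries, parallelism and streaming instead of in-memory downloads):

    import os
    from azure.storage.blob import ContainerClient
    import boto3

    # Placeholder names and credentials for illustration.
    container = ContainerClient.from_connection_string(
        os.environ["AZURE_CONNECTION_STRING"], container_name="source-container"
    )
    s3 = boto3.client("s3")
    TARGET_BUCKET = "target-bucket"
    CHECKPOINT_FILE = "copied_blobs.txt"

    # Load the names of blobs that were already copied in a previous run.
    done = set()
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            done = set(line.strip() for line in f)

    with open(CHECKPOINT_FILE, "a") as checkpoint:
        for blob in container.list_blobs():
            if blob.name in done:
                continue  # already migrated, skip after an interruption
            data = container.download_blob(blob.name).readall()
            s3.put_object(Bucket=TARGET_BUCKET, Key=blob.name, Body=data)
            checkpoint.write(blob.name + "\n")
            checkpoint.flush()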
There are a few options you may "tweak" based on the files to move - whether there are many small files or fewer large files, from which region to move where, and so on.
https://aws.amazon.com/premiumsupport/knowledge-center/s3-upload-large-files/
You may also consider using AWS S3 Transfer Acceleration; it may or may not help.
Please note that every larger cloud provider charges for outbound data egress; for 10 TB it may be a considerable cost.
Is there any API (preferably Python) that can be used to get a resource usage cost report in GCP? The Billing APIs don't seem to return the costs of the resources being used.
You can export your Cloud Billing data to BigQuery:
https://cloud.google.com/billing/docs/how-to/export-data-bigquery
https://cloud.google.com/billing/docs/how-to/export-data-bigquery-setup
You select the dataset where the cost metadata goes. Once it's in BigQuery, it's fairly easy to query, which you can do with the Python BigQuery client API. It also makes sure you keep a history in case you change billing provider; of course, it will incur a storage cost that varies with your usage.
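For example, a rough sketch of summing costs per service with the Python client (project, dataset and table names are placeholders; the export table is normally named gcp_billing_export_v1_ followed by your billing account ID):

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project

    # Placeholder dataset/table names; adjust to your billing export location.
    query = """
        SELECT
          service.description AS service,
          SUM(cost) AS total_cost
        FROM `my-project.billing_dataset.gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX`
        WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
        GROUP BY service
        ORDER BY total_cost DESC
    """

    # Print the cost per service over the last 30 days.
    for row in client.query(query).result():
        print(row.service, row.total_cost)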
I have some datasets in my own BigQuery account (organization A) that need to be transferred to another BigQuery account in organization B. How do I do it?
I am aware of the Data Transfer Service and the REST API, but those seem to transfer data across projects and regions within the same organization.
Thanks!
You can use the BigQuery Copy Datasets feature (a beta feature at the moment) to copy datasets across projects/organizations and across regions (not all regions are supported). Cross-organization copy works as long as you don't have VPC Service Controls set. You can use COPY DATASET or TRANSFERS in the BQ Web UI, or use the CLI. Using Transfers allows running the copy on a recurring schedule.
Usage: bq mk --transfer_config --project_id=[PROJECT_ID] --data_source=[DATA_SOURCE] --target_dataset=[DATASET] --display_name=[NAME] --params='[PARAMETERS]'
Use --params for specifying the source dataset and other options.
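If you prefer the Python client over the CLI, a sketch of the same transfer config looks roughly like this (project, dataset and schedule values are placeholders):

    from google.cloud import bigquery_datatransfer

    # Placeholder project/dataset names for illustration.
    transfer_client = bigquery_datatransfer.DataTransferServiceClient()

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id="dataset_in_org_b",
        display_name="Copy dataset from org A",
        data_source_id="cross_region_copy",
        params={
            "source_project_id": "project-in-org-a",
            "source_dataset_id": "dataset_in_org_a",
        },
        schedule="every 24 hours",  # omit for a one-off copy
    )

    # Create the transfer config in the destination project (organization B).
    transfer_config = transfer_client.create_transfer_config(
        parent=transfer_client.common_project_path("project-in-org-b"),
        transfer_config=transfer_config,
    )
    print("Created transfer config:", transfer_config.name)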
I wanted to export BigQuery table data using the API and would like to know whether there are any charges for it.
The export itself is a free operation, but you will be charged for the storage used by the exported files in Google Cloud Storage. You can download them to your local machine quickly, but I believe there would still be a small charge for the outbound (egress) traffic. It depends on how much data you export: if it's KB/MB it is essentially free; if it's TB/PB you may incur a big bill.
There is info about GCS pricing at https://cloud.google.com/storage/pricing
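For reference, a minimal sketch of such an export via the Python client (project, table, bucket and format are placeholders); the extract job itself is free, and the GCS storage plus any download traffic are what you pay for:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project

    # Export the table as gzip-compressed CSV to Cloud Storage (the extract job is free;
    # you pay for the resulting GCS storage and for any download/egress traffic).
    job_config = bigquery.ExtractJobConfig(
        destination_format="CSV",
        compression="GZIP",
    )
    extract_job = client.extract_table(
        "my-project.my_dataset.my_table",
        "gs://my-export-bucket/my_table-*.csv.gz",
        job_config=job_config,
    )
    extract_job.result()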