Copying BigQuery result to another table is not working? - google-cloud-platform

I have noticed a weird problem on BigQuery in the last 2-3 days; it was working fine earlier.
I have a BigQuery table in a dataset located in the EU region. I am running a simple SELECT query on that table and it runs without any issue.
Now, when I try to save that query result into another BigQuery table in the same dataset, it gives the error below:
To copy a table, the destination and source datasets must be in the
same region. Copy an entire dataset to move data between regions.
The strange part is that other alternatives are working fine, such as:
Copying the source table to a new table works fine.
When I set the destination table in the query settings and run the query, it is able to save the query result into that configured table.
If I run the query, access the temporary table where BigQuery actually stores the query result, and then copy that temporary table to the destination table, this also works.
Not sure why only the "Save results" option is not working; it was working before though.
Does anyone have any idea if something has changed on GCP recently?

You can try CREATE OR REPLACE TABLE `abc.de.omg` AS SELECT ... to store the same result.
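A minimal sketch of that statement, assuming an EU dataset and using placeholder project, dataset, table, and column names (the actual SELECT is whatever query you were saving):
-- Placeholder names; the destination table is created in the same EU dataset.
CREATE OR REPLACE TABLE `my-project.my_eu_dataset.query_result` AS
SELECT *
FROM `my-project.my_eu_dataset.source_table`
WHERE event_date >= '2022-01-01';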
Edit: another workaround is to set it up as a scheduled query and run it once as a backfill.
On another note, anyone finding this can comment on the reported bug here: https://issuetracker.google.com/issues/233184546 (I'm not the original poster).

I tried to save the query results as a BigQuery table as below. I manually gave the dataset name and it worked.

This happens when you have the source and destination datasets in different regions.
Can you share source and destination dataset screenshots so we can check the region?

Tried to reproduce the issue:
Error when I typed the dataset name manually.
Successful when I selected the dataset from the drop-down - strange.


Append Query Error during Service Refresh

Hello, I am working on a dashboard that needs to combine two tables: a sales table and a sales order table. The two tables come from different sources: the sales table from SAP BW and the sales order table through a dataflow. Both tables have a number of applied steps that delete, reformat, and rename the columns. I then want to append the Sales Order table to the Sales table. In the desktop version everything works, even if I hit "Refresh all" in the Power Query Editor. When I publish the report to the service and refresh, I get an error saying: The key didn't match any rows in the table.
After troubleshooting for the last couple of days, I'm pretty sure that the issue has to do with my column names being renamed. However, it's strange that the append works in Desktop but not in the Service.
My main questions are:
Can you append tables after multiple applied transformation steps?
Can the tables both refresh daily or does one or the other have to be static?
Does the order of the table queries in the Query Editor affect the order in which tables are loaded, or does everything load all at once? Can this order affect my append query?
Any suggestions or help would be greatly appreciated. I am 99% of the way to launching this new report but this refresh issue is driving me crazy.
You can append tables after multiple applied transformation steps
Both tables can refresh daily
The order of the table queries in query editor doesn't matter. Power BI will determine the order in which they are loaded. E.g. if you want to append a table to another table, that other table is loaded first.
Thanks for answering those questions, Peter. I concluded that the issue had something to do with appending a dataflow. I switched to a new data source from SAP BW and it's working now. What's interesting is that the dataflow is still being imported and refreshed daily, so the issue had to do with just the append.

How to fetch the latest schema change in BigQuery and restore deleted column within 7 days

Right now I fetch the columns and data types of BQ tables via the command below:
SELECT COLUMN_NAME, DATA_TYPE
FROM `Dataset`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE table_name="User"
But if I drop a column using the command ALTER TABLE User DROP COLUMN blabla,
the column blabla is not actually deleted within 7 days (the TTL), based on the official documentation.
If I use the above command, the column is still there in the schema as well as in the table Dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS.
It is just that I cannot insert data into such a column or view it in the GCP console. This inconsistency really causes an issue
if I want to write a bash script to monitor schema changes and do some operations based on them.
I need more visibility into the table schema of BigQuery. The least thing I need is:
Dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS storing a flag column that indicates deleted or TTL: 7 days.
My questions are:
How can I fetch the correct schema in Spanner which reflects the recently deleted column?
If the column is not actually deleted, is there any way to easily restore it?
If you want to find the recently deleted column, you can try searching through Cloud Logging. I'm not sure what tools Spanner supports, but if you want to use Bash you can use gcloud to fetch the logs, though it will be difficult to parse the output and get the information you want.
The command below fetches the logs for google.cloud.bigquery.v2.JobService.InsertJob, since an ALTER TABLE is considered an InsertJob, and filters them based on the actual query where it says drop. The regex I used is not strict (for the sake of example); I suggest making it stricter.
gcloud logging read 'protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob" AND protoPayload.metadata.jobChange.job.jobConfig.queryConfig.query=~"Alter table .*drop.*"'
In a sample snippet returned by the command above, the column PADDING is dropped based on the query.
If you have options other than Bash, I suggest creating a BigQuery sink for your logging; you can then run queries there and get this information. You can also use client libraries (Python, Node.js, etc.) to either query the sink or query Cloud Logging directly.
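For example, if the audit logs are routed to a BigQuery sink, something along these lines could surface recent DROP COLUMN statements. This is only a hedged sketch: the audit_logs dataset name, the exported table name, and the JSON paths all depend on how your sink is set up.
-- Sketch only: table and field names vary with the log sink configuration.
SELECT
  timestamp,
  JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
    '$.jobChange.job.jobConfig.queryConfig.query') AS query_text
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
WHERE protopayload_auditlog.methodName = 'google.cloud.bigquery.v2.JobService.InsertJob'
  AND REGEXP_CONTAINS(
        JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
          '$.jobChange.job.jobConfig.queryConfig.query'),
        r'(?i)alter table .* drop')
ORDER BY timestamp DESC;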
As per this SO answer, you can use the time travel feature of BigQuery to query the deleted column. The answer also explains BigQuery's behavior of retaining a deleted column for 7 days and a workaround to delete the column instantly. See the actual query used to retrieve the deleted column and the workaround for deleting a column at the previously provided link.
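For reference, a minimal time-travel sketch, reusing the User table and blabla column from the question (the project name and the 1-hour offset are placeholders); this only works within the 7-day window:
-- Read the table as it was one hour ago, when the dropped column still existed.
SELECT blabla
FROM `my-project.Dataset.User`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

-- One way to recover it is to materialise that snapshot into a new table.
CREATE OR REPLACE TABLE `my-project.Dataset.User_restored` AS
SELECT *
FROM `my-project.Dataset.User`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);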

How to copy one BigQuery table to the existing dataset in another region?

I'm having trouble trying to copy a particular table from a dataset in the US region to a dataset in the asia-south1 region (as an example).
But after I try to copy the table using the "Copy" button in the UI, an error appears saying that no such dataset is found (presumably it is trying to find the target table in the US region, or the source in asia-south1).
I don't need to copy the whole dataset anywhere, as answers to other questions suggested, just a couple of tables.
I couldn't find compelling answers to this problem on SO yet. Thanks!
Table copy only works when the source and destination tables are in the same region. A workaround could be (a rough SQL sketch of these steps follows the list):
create a temp_source dataset in the same region as the source table
copy the source table to the temp_source dataset
create a temp_destination dataset in the same region as the wanted destination (asia-south1 in your case)
use the BigQuery Data Transfer Service (Data transfers in the BigQuery Cloud console) to copy the temp_source dataset (containing your one table) to temp_destination
copy temp_destination.your_table to your wanted destination dataset (asia-south1)
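A rough sketch of those steps, assuming a single project my-project and placeholder dataset/table names; the cross-region copy in step 4 is done with the Data Transfer service rather than SQL:
-- Step 1: temp dataset co-located with the source table (US).
CREATE SCHEMA IF NOT EXISTS `my-project.temp_source` OPTIONS (location = 'US');

-- Step 2: same-region copy of just the table you need.
CREATE TABLE `my-project.temp_source.my_table` AS
SELECT * FROM `my-project.source_dataset.my_table`;

-- Step 3: temp dataset co-located with the destination region.
CREATE SCHEMA IF NOT EXISTS `my-project.temp_destination` OPTIONS (location = 'asia-south1');

-- Step 4: copy temp_source to temp_destination with a BigQuery Data Transfer
-- (dataset copy) job; this cross-region step is not plain SQL.

-- Step 5: same-region copy into the final destination dataset.
CREATE TABLE `my-project.destination_dataset.my_table` AS
SELECT * FROM `my-project.temp_destination.my_table`;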

Move entire dataset from one google project to another google project without data

As part of code deployment to production, we need to copy all tables from a BigQuery dataset to the production environment. However, both the UI option and the bq command-line option move the data too. How do I move just the BigQuery tables (without the data), all at once, from the non-prod to the prod environment?
Kindly suggest.
Posting my comment as an answer:
I don't know of any way to achieve what you want directly, but there is a possible workaround:
You first need to create the dataset in the destination project and then run CREATE TABLE new_project.dataset.xx AS SELECT * FROM old_project.dataset.xx WHERE 1=0.
You also need to make sure to specify the partition field. This works well for datasets with just a few tables; for larger datasets you can script this operation in Python or whatever else you use. A sketch of the statement is below.
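A hedged sketch of that statement for one partitioned table; the partition column created_at is made up for the example, and CTAS does not inherit partitioning from the source table, hence the note about specifying the partition field:
-- Creates an empty copy: same columns, no rows, explicit partitioning.
CREATE TABLE `new_project.dataset.xx`
PARTITION BY DATE(created_at)
AS
SELECT *
FROM `old_project.dataset.xx`
WHERE 1 = 0;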

Update AWS Athena data & table to rename columns

Today I found myself with a simple problem: renaming a column of an Athena Glue table from an old name to a new one.
First, I searched here and tried some solutions like this, this, and many others. Unfortunately, none worked, so I decided to use my knowledge and imagination.
I'm posting this question with the intention of sharing, but also to learn how others did it and maybe find out I reinvented the wheel. So please also share your way if you know how to do it.
My setup is an Athena JSON table partitioned by day with a valuable and enormous amount of data; the infrastructure is defined and updated through CloudFormation.
How to rename an Athena column and still keep the data?
Explaining it without all the CloudFormation infrastructure.
Imagine a table containing:
userId
score
otherColumns
eventDateUtc
dt_utc
Partitioned by dt_utc and stored using the JSON format. We need to change the column score to deltaScore.
Keep in mind that, although I haven't tested with other formats/configurations, this should apply to any configuration supported by Athena, as we are going to use the Athena engine to do the job for us.
How to do it
If you run the CloudFormation migration first, you will "lose" access to the dropped column,
but you can simply rename the column back and the data reappears.
These are the steps required to rename an AWS Athena column:
Create a temporary table mapping the old column name to the new one.
This can be done with CREATE TABLE AS; read more in the AWS docs.
With this command, we use the Athena engine to apply the transformation to the files of the original table for us and save them at s3://bucket_name/A_folder/temp_table_rename/.
CREATE TABLE "temp_table_rename"
WITH(
format = 'JSON',
external_location = 's3://bucket_name/A_folder/temp_table_rename/',
partitioned_by = ARRAY['dt_utc']
)
AS
SELECT DISTINCT
userid,
score as deltascore,
otherColumns,
eventDateUtc,
"dt_utc"
FROM "my_database"."original_table"
Apply the column rename by running the CloudFormation change, or in whatever way you manage the schema.
At this point, you can even drop original_table and create it again using the right column name.
After the rename, you will notice that the renamed column has no data.
Remove the data of the original table by deleting its S3 source.
Copy the data from the temp table's S3 location to the original table's S3 location.
I prefer to use an AWS CLI command for this, as there can be thousands of files to copy:
aws s3 cp s3://bucket_name/A_folder/temp_table_rename/ s3://bucket_name/A_folder/original_table/ --recursive
Restore the partitions of the original table:
MSCK REPAIR TABLE "my_database"."original_table"
done.
Final notes:
Using CREATE TABLE AS to do the transformation job allows you to do much more than just renaming a column, for example splitting the data of one column into two new columns, or merging two columns into a single one, as sketched below.
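For instance, the SELECT inside the same CTAS could look like this; the full_name, country_code, and region_code columns are hypothetical and only illustrate the idea:
-- Hypothetical example: split one column into two and merge two into one
-- while the CTAS rewrites the files.
SELECT DISTINCT
  userid,
  score AS deltascore,
  split_part(full_name, ' ', 1) AS first_name,
  split_part(full_name, ' ', 2) AS last_name,
  concat(country_code, '-', region_code) AS location_code,
  eventDateUtc,
  "dt_utc"
FROM "my_database"."original_table"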