BigQuery: Query Failed. Dataset Not Found - google-cloud-platform

I imported a table from a .csv file into my dataset in a project.
When I preview the table the data shows up, but whenever I run a query against it, it always responds with
Query Failed
Error: Not found: Dataset <project-id>:<table-name>. Please verify that the dataset exists and the correct location was used for the job.
Here's my query
SELECT distinct(customer_id) as cust_id FROM [<project-id>:<table-name>.orders] LIMIT 1000
Is there anything wrong?
Or how should I query an imported table?

From your question, I see that you are using <project-id>:<table-name> as the table reference, but as you can see in this documentation page, the correct naming for project-qualified table references is the following:
#legacySQL
[PROJECT_ID:DATASET.TABLE]
#standardSQL
`PROJECT_ID.DATASET.TABLE`
Since you are using Legacy SQL (indicated by the square brackets [ ]), you should go with the first form, but you are missing the dataset name between the project and the table.
Additionally, you are appending orders to the table reference, but it is not clear what that is, given that you redacted the table name as <table-name>.
Also, if your dataset is not located in the US or the EU, make sure you specify the location when running the query, as explained in this entry in the documentation.
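For example, assuming orders is your table and the value you redacted as <table-name> is actually your dataset, the fixed legacy reference would be [<project-id>:<dataset-name>.orders]. The equivalent query in standard SQL (placeholders kept) would be:
#standardSQL
SELECT DISTINCT customer_id AS cust_id
FROM `<project-id>.<dataset-name>.orders`
LIMIT 1000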

Related

Retrieve a column name from the underlying dataset (before it got renamed in Power BI)

I'm trying to build a dynamic data dictionary for my Power BI data set. To do that, I am querying the DMVs in DAX studio to get the objects names and descriptions from the model directly.
The query I used for the column details:
SELECT * from $SYSTEM.TMSCHEMA_COLUMNS
However, when I run this query, I'm getting ExplicitName = SourceColumn. I had assumed that SourceColumn would contain the column name before any transformation in Power Query. Does anyone have any idea how to get the original column name (the name of the column in the SQL Server DB, for example)?
I have found a solution for this. You can find the technical column names in:
select * from $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS where [COLUMN_TYPE] = 'BASIC_DATA'
If building this type of dynamic data dictionary interests anyone, do let me know. I can share the end result when I'm done.

Create a table using `pg_table_def` data in Redshift or DBT

To create a table from all the data in pg_table_def that is visible to my user, I tried:
create table adhoc_schema.pg_table_dump as (
select *
from pg_table_def
);
But it throws an error:
Column "schemaname" has unsupported type "name"
Any way to create a table from pg_table_def or information_schema.columns?
Found this from another thread, which seems like it would help.
Amazon considers the internal functions that INFORMATION_SCHEMA.COLUMNS uses to be leader-node-only functions. Rather than being sensible and redefining the standardized INFORMATION_SCHEMA.COLUMNS, Amazon chose to define its own proprietary version. For that, they made available another function, PG_TABLE_DEF, which seems to address the same need. Pay attention to the note in the middle about adding the schema to search_path.
Stores information about table columns.
PG_TABLE_DEF only returns information about tables that are visible to the user. If PG_TABLE_DEF does not return the expected results, verify that the search_path parameter is set correctly to include the relevant schemas.
You can use SVV_TABLE_INFO to view more comprehensive information about a table, including data distribution skew, key distribution skew, table size, and statistics.
So using your example code (rewritten to use NOT EXISTS for clarity),
SET search_path TO '$user', 'public', 'target_schema';
SELECT "column"
FROM dev.fields f
WHERE NOT EXISTS (
    SELECT 1
    FROM pg_table_def pgtd
    WHERE pgtd."column" = f.field
      AND pgtd.schemaname = 'target_schema'
);
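As a side note on the SVV_TABLE_INFO suggestion in the quote above, here is a rough sketch of the kind of per-table metadata it exposes (column names taken from the Redshift docs; size is in 1 MB blocks):
SELECT "schema", "table", diststyle, size, tbl_rows, skew_rows, stats_off
FROM svv_table_info
ORDER BY size DESC;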
See also,
Official docs on Querying Redshift System Tables: https://docs.aws.amazon.com/redshift/latest/dg/t_querying_redshift_system_tables.html
pg_table_def is a leader-node system table with some additional data types not supported in user tables. You will need to cast to text first.
However, this still won't work, because pg_table_def is a leader-node table while the table you are creating is a user table stored on the compute nodes. You will need to pull your source information from tables that are stored on the compute nodes. Since I don't know what information you are looking for from pg_table_def, I cannot say exactly which ones you need, but you can start with stv_tbl_perm and join in pg_class and other tables as more info is needed.
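A minimal sketch of that compute-node route (the table and column choices are illustrative, not a drop-in replacement for pg_table_def): stv_tbl_perm joined to the pg_class and pg_namespace catalog tables, with the "name"-typed catalog columns cast to text so they can be stored in a user table:
CREATE TABLE adhoc_schema.table_dump AS
SELECT DISTINCT
       t.id                  AS table_id,
       TRIM(n.nspname::text) AS schema_name,
       TRIM(c.relname::text) AS table_name
FROM stv_tbl_perm t
JOIN pg_class     c ON c.oid = t.id
JOIN pg_namespace n ON n.oid = c.relnamespace;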

How to insert Billing Data from one Table into another Table in BigQuery

I have two tables, both containing GCP billing data, in two different regions. I want to insert one table into the other. Both tables are partitioned by day, and the larger one is being written to by GCP for billing exports, which is why I want to insert the data into the larger table.
I am attempting the following:
Export the smaller table to Google Cloud Storage (GCS) so it can be imported into the other region.
Import the table from GCS into BigQuery.
Use BigQuery SQL to run INSERT INTO dataset.big_billing_table SELECT * FROM dataset.small_billing_table
However, I am running into a lot of issues, as it won't just let me insert (there are repeated fields in the schema, etc.). An example of the dataset can be found here: https://bigquery.cloud.google.com/table/data-analytics-pocs:public.gcp_billing_export_v1_EXAMPL_E0XD3A_DB33F1
Thanks :)
## Update ##
So the issue was exporting and importing the data in Avro format and using schema auto-detect when importing the table back in (timestamps were being confused with integer types).
Solution
Export the small table in JSON format to GCS, use GCS to do the regional transfer of the files, and then import the JSON files into a BigQuery table without schema auto-detect (i.e. specify the schema manually). Then you can use INSERT INTO with no problems.
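If you want to script the export step instead of using the console, a possible sketch using BigQuery's EXPORT DATA statement (bucket path and table names are placeholders; the load back into the other region would still be done with an explicitly specified schema, as described above):
EXPORT DATA OPTIONS (
  uri = 'gs://<transfer-bucket>/small_billing/*.json',
  format = 'JSON',
  overwrite = true
) AS
SELECT *
FROM `<project>.<dataset>.small_billing_table`;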
I was able to reproduce your case with the example dataset you provided. I used dummy tables, generated from the queries below, to test both approaches:
Table 1: billing_bigquery
SELECT * FROM `data-analytics-pocs.public.gcp_billing_export_v1_EXAMPL_E0XD3A_DB33F1`
where service.description ='BigQuery' limit 1000
Table 2: billing_pubsub
SELECT * FROM `data-analytics-pocs.public.gcp_billing_export_v1_EXAMPL_E0XD3A_DB33F1`
where service.description ='Cloud Pub/Sub' limit 1000
I will propose two methods for performing this task. However, I must point out that the target and the source table must have the same column names, at least for the columns you are going to insert.
First, I used the INSERT INTO method. I would like to stress that, according to the documentation, if your table is partitioned you must list the column names that will be used to insert new rows. Therefore, using the dummy data shown above, it looks as follows:
INSERT INTO `billing_bigquery` (billing_account_id, service, sku, usage_start_time, usage_end_time, project, labels, system_labels, location, export_time, cost, currency, currency_conversion_rate, usage, credits) # invoice and cost_type omitted
SELECT billing_account_id, service, sku, usage_start_time, usage_end_time, project, labels, system_labels, location, export_time, cost, currency, currency_conversion_rate, usage, credits
FROM `billing_pubsub`
Notice that for nested fields I just write the top-level field name, for instance service and not service.description, because the whole record is used. Furthermore, I did not select all the columns of the target table, but every column I listed for the target table must also be present in the source table's SELECT.
For the second method, you can simply use the Query settings button to append the small_billing_table to the big_billing_table. In the BigQuery console, click More >> Query settings. When the settings window appears, go to Destination table, check Set a destination table for query results, and fill in the Project name, Dataset name and Table name fields (these identify the destination table). Then, under Destination table write preference, check Append to table, which according to the documentation:
Append to table — Appends the query results to an existing table
Then you run the following query:
Select * from <project.dataset.source_table>
After running it, the source table's data should be appended to the target table.

Power BI: change the original table name which is displayed in DAX Studio

I noticed a very odd thing: DAX Studio allows you to view the original table name. It is a bit confusing, because when you rename a table created in M, DAX Studio still refers to it using the original table name, not the new name. Here is how to reproduce the bug.
Power BI > Home > Enter Data
Name the table RedTable.
Run a query in DAX Studio:
EVALUATE
DISTINCT('RedTable'[Column1])
Now rename the table to BlueTable and run the query in DAX Studio with the new table name. You can still see the original table name in the Query column of Server Timings.
Is there a way to change this original table name created with M?
I can add that this bug applies only to tables created with M (be it Enter Data or a connection to a server). It is not reproducible with DAX tables: those are updated after you change their name, and the actual (not original) name is displayed in the Query column of DAX Studio's Server Timings.
This is not a problem of DAX Studio, even though we could try to fix it in DAX Studio. :)
Here is what happens: the Tabular model (used also by Power BI) has an ID for each object and then a Name. The ID is assigned by the Power BI and Visual Studio UI when you create an entity (table/column/...). Visual Studio shows the ID as a readonly property, whereas Power BI doesn't show that property at all.
The ID is used internally to reference objects in the Tabular Object Model (TOM). It is also used to identify columns in the interaction between the Formula Engine and the Storage Engine.
The Storage Engine queries captured by DAX Studio are messages intercepted by a debugging session connected to the Analysis Services session, and in these messages the text refers to entities (in particular table names) by their ID rather than by the Name property.
As Microsoft would say, this is "by design".
So far, DAX Studio simply captures the text of the Storage Engine queries and displays this information.
However, DAX Studio "massages" the text, removing a lot of "noise" and making the query more readable. During this phase, it would be possible to replace IDs with Names.
I just created a feature request. That was easy. Implementing the feature, and finding the time to do so, is much harder!
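In the meantime, if you want to see the internal ID alongside the current Name for each table in your model, a small DMV sketch you can run from DAX Studio (assuming the TMSCHEMA_TABLES DMV, which exposes both columns):
SELECT [ID], [Name]
FROM $SYSTEM.TMSCHEMA_TABLES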

Bad Data in linked table

I pull data into SQL Server from a COBOL database that is connected as a linked server.
We have ended up with bad data in one of our tables, and I am trying to track down the offending record. Specifically, we have a letter entered into a year field; when SQL Server pulls the data over, it attempts to convert that column to a numeric data type.
I believe what I need is a combination of OPENQUERY and CAST to select all the columns, with at least that specific column as varchar, so that I can retrieve the offending record and have the department fix the error.
I have tried the following two statements, but both produce an error.
select * from [incode]...ctvehl
where VEH_YEAR like '992D'
select * from openquery (incode, 'select cast(* as nvarchar) from ctvehl')
For clarity:
linked server name = incode
table name = CTVEHL
Specific offending column = VEH_YEAR
Assistance with this would be greatly appreciated.
Thanks
You could just initially insert the data into a work table within SQL Server that has all varchar() columns. You could then validate and parse the work table for possible errors, moving the bad rows to an "error" table for other processing/reporting. Then insert the remaining rows into your actual table.
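For example, a rough sketch of that staging approach (the column list and varchar widths are made up for illustration; only VEH_YEAR is known from the question, and whether the provider hands the bad value over as text depends on its type mapping):
-- Stage the linked-server rows as text so nothing gets coerced to numeric.
CREATE TABLE dbo.ctvehl_work (
    VEH_YEAR varchar(8)
    -- ...remaining CTVEHL columns, all declared as varchar
);

INSERT INTO dbo.ctvehl_work (VEH_YEAR)
SELECT VEH_YEAR
FROM OPENQUERY(incode, 'SELECT VEH_YEAR FROM ctvehl');

-- Flag rows whose year is not a clean 4-digit number.
SELECT *
FROM dbo.ctvehl_work
WHERE VEH_YEAR LIKE '%[^0-9]%'
   OR LEN(VEH_YEAR) <> 4;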
You should look into SQL Server Integration Services; it offers ways to mass-import data and handle bad rows. See: SQL Server Integration Services Dealing with Bad Data