I have two tables in my report:
Stages Table (Column names - Stage Name, Stage Id, Status, etc.)
Job Table (Column names - Job Name, Job Id, Stage Id, Stage Name, Job Status, etc.)
One stage has multiple jobs, so each row in the Job Table carries the Stage Id of the stage the job belongs to.
Now, one of my requirements is if I select a specific Stage Id from Stage Table, I should be able to see all the jobs belonging to that Stage, in the Job Table.
For this, I created a relationship from Stage Id (Stage Table) --> Stage Id (Job Table), basically a one-to-many relationship, since a particular Stage Id in the Stage Table will have multiple jobs belonging to it.
After creating this relationship and saving all the changes, when I click on any Stage Id in the Stage Table, the Job Table goes blank. I don't see anything, and this happens for every Stage Id in the Stage Table.
Note that the data in both tables is correct. If I apply a filter (at the dataset level, in the data section) on the Job Table with a specific Stage Id, I get all the jobs for that Stage Id.
But in the table visualization, when I click a Stage Id in the Stage Table, the data in the Job Table just vanishes.
What could be the reason? Am I missing anything here?
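For reference, the cross-filter behavior I expect from the relationship is equivalent to this SQL join; a sketch only, with hypothetical table and column names (Power BI applies this through the relationship, not an explicit query):

```sql
-- Sketch: after clicking a stage in the Stage Table visual,
-- the Job Table visual should show exactly these rows.
SELECT j.job_name, j.job_id, j.job_status
FROM stages s
JOIN jobs j
  ON j.stage_id = s.stage_id   -- one-to-many: one stage, many jobs
WHERE s.stage_id = 'STG-001';  -- the Stage Id clicked in the visual
```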
In the performanceInsights of a BigQuery job there is a field called avgPreviousExecutionMs. Its description reads:
Output only. Average execution ms of previous runs. Indicates the job ran slow compared to previous executions. To find previous executions, use INFORMATION_SCHEMA tables and filter jobs with the same query hash.
I tried to validate the avgPreviousExecutionMs value for one of my jobs against the INFORMATION_SCHEMA views, filtering for queries with the same hash via the query_info.query_hashes.normalized_literals field.
Steps I took to validate this info:
I ran a job on flat-rate pricing concurrently with other queries to make it slower, so that the avgPreviousExecutionMs field would appear in the performanceInsights section.
Now I want to validate this field with the information schema data
I ran this query against the INFORMATION_SCHEMA, excluding my current job id:
SELECT
  AVG(TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_duration
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE
  query_info.query_hashes.normalized_literals = "myqueryHash"
  AND job_id != "myjobId";
The result of this query and the avgPreviousExecutionMs value shown for the job in the performanceInsights section do not match.
How can we validate this info?
Over what time period of data is avgPreviousExecutionMs computed?
And is this average based on the JOBS view, the JOBS_BY_USER view, JOBS_BY_FOLDER, or JOBS_BY_ORGANIZATION?
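One variable worth pinning down is the lookback window. A hedged sketch of the same validation against the region-level JOBS view, restricted to a recent window — the view choice, the 30-day interval, and the "myqueryHash"/"myjobId" placeholders are all assumptions, since the exact source and period behind avgPreviousExecutionMs are precisely what this question asks about:

```sql
-- Sketch: average duration of earlier runs sharing the same query hash,
-- limited to the last 30 days (an assumed window, not documented behavior).
SELECT
  AVG(TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_duration_ms,
  COUNT(*) AS previous_runs
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  query_info.query_hashes.normalized_literals = "myqueryHash"
  AND job_id != "myjobId"
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY);
```

Comparing previous_runs against the count of executions you expect can reveal whether the mismatch comes from the window or from the view's scope (user vs. project vs. organization).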
I have this going on with two separate relationships but for the sake of this question we will focus on one.
I have a Fact table full of a given service's info and a separate region lookup table.
There are only six regions and IDs. The Fact table has a Job ID that is correlated to a specific region,
i.e., Job-2000 is region 1000.
I would expect that when I add the JobID and the region name, I would get
Job-2000 LA West
However, that is not happening. When I test a count on the data in the matrix, one region ID returns multiple region names.
I am unsure why, as the region_key values are all distinct.
So, for example, this job, despite having one region ID, gets a count of 6 regions.
I'd instead expect the following:
JobName RegionID Region Name
Job-0069 00038329-ac05-4f61-897a-e2ec7924a8f3 Olive Plaza Pilot
When I select the RegionID, I pull it from the Fact table, not the regions lookup table.
What could be happening?
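In SQL terms, the relationship I expect would behave like this join; a sketch with assumed table and column names (fact_table, region_lookup, region_key are placeholders for my actual schema):

```sql
-- Sketch: each job should resolve to exactly one region name,
-- because region_key is assumed distinct in the lookup table.
SELECT f.job_name, f.region_id, r.region_name
FROM fact_table f
JOIN region_lookup r
  ON r.region_key = f.region_id;
```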
I want to find out what my table sizes are (in BigQuery).
However, I want to sum up the sizes of all tables that belong to a specific set of sharded tables.
So I need to find metadata that shows that a table is part of a set of sharded tables.
I can already get the size per table, as in "How to get BigQuery storage size for a single table":
select
sum(size_bytes)/pow(2, 30) as size_gb
from
<your_dataset>.__TABLES__
But here I can't see whether a table is part of a sharded set of tables.
This is what my Google Analytics sharded tables look like in BQ:
So somewhere there must be metadata that indicates that tables with, for example, the name ga_sessions_20220504 belong to a sharded set ga_sessions_.
Where/how can I find that metadata?
I think you are exploring the right query; most of the time, I use the following query to drill down on shards and their sizes:
SELECT
project_id,
dataset_id,
table_id,
array_reverse(SPLIT(table_id, '_'))[OFFSET(0)] AS shard_pt,
DATE(TIMESTAMP_MILLIS(creation_time)) creation_dt,
ROUND(size_bytes/POW(1024, 3), 2) size_in_gb
FROM
`<project>.<dataset>.__TABLES__`
WHERE
table_id LIKE 'ga_sessions_%'
ORDER BY
4 DESC
Result (on some random GA dataset I have access to, FYI):
There is no metadata on sharded tables available via SQL.
Tables are displayed as sharded in the BigQuery UI when you do the following:
Create 2 or more tables that have the following characteristics:
exist in the same dataset
have the exact same table schema
have the same prefix
have a suffix of the form _YYYYMMDD (e.g. _20210130)
These are something of a legacy feature; they were more commonly used with BigQuery's legacy SQL.
This blog was very insightful on this:
https://mark-mccracken.medium.com/bigquery-date-sharding-vs-date-partitioning-cee3754f7900
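Given the _YYYYMMDD naming convention above, you can derive shard-set membership yourself and sum sizes per set. A sketch, assuming every shard suffix is exactly eight digits (adjust the regex if your naming differs):

```sql
-- Sketch: group shard tables by their prefix and sum their sizes.
SELECT
  REGEXP_REPLACE(table_id, r'\d{8}$', '') AS shard_set,
  COUNT(*) AS shard_count,
  ROUND(SUM(size_bytes) / POW(1024, 3), 2) AS total_size_gb
FROM
  `<project>.<dataset>.__TABLES__`
WHERE
  REGEXP_CONTAINS(table_id, r'_\d{8}$')  -- keep only _YYYYMMDD-suffixed tables
GROUP BY
  shard_set
ORDER BY
  total_size_gb DESC;
```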
Got this error when setting up new scheduled query.
BigQuery scheduled query: Cannot create a transfer in JURISDICTION_US when destination dataset is located in REGION_EUROPE_NORTH_1
I tried to schedule the query from Query editor > Schedule query.
The location in the query settings and the destination table location are both "europe-north-1".
Scheduled queries do not support cross-region queries.
Is the queried table's region also europe-north-1?
Judging by JURISDICTION_US in the error message, the queried table seems to be in the US.
You should check both tables' regions (the queried table and the destination table) before running a scheduled query.
Refer here for information about the regions scheduled queries support.
If the two tables' regions differ, you need to migrate one of them to the other region.
Refer to the guide on moving BigQuery data between locations for the migration.
I have an interesting problem I need to resolve. I have a table A in Postgres. This table is treated like a queue that holds a set of tasks; ID is an incremental id in Postgres.
I want a metric containing the currently processed position (ID) and the maximum ID. Both numbers grow every second.
Is there an efficient way to do this?
The easiest way off the top of my head is to execute a SQL query like this every 10 seconds (varies):

SELECT id FROM a ORDER BY id ASC LIMIT 1;

to get the smallest id, and the same query with ORDER BY id DESC to get the largest id.
But this command is expensive. Is there a better way to do this?
When you insert a new record into the table, return the record ID. When you extract a record, do the same. You could cache this in memory, a file, a different DB table, etc. Then run a scheduled task to post these values to CloudWatch as a custom metric.
Example (very simple) SQL statement to return the ID when inserting new records (Postgres syntax):

INSERT INTO my_table (name) VALUES ('bob') RETURNING id;
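The "extract" side can return the ID the same way. A sketch, assuming a status column marks tasks as pending/processing (the table and column names here are hypothetical, not from the question):

```sql
-- Sketch: claim the next pending task and return its id in one statement.
UPDATE my_table
SET status = 'processing'
WHERE id = (
  SELECT id FROM my_table
  WHERE status = 'pending'
  ORDER BY id
  LIMIT 1
  FOR UPDATE SKIP LOCKED  -- avoids two workers claiming the same task
)
RETURNING id;
```

With the inserted and claimed IDs captured at write time, the scheduled task only reads the cached values and never scans the queue table.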