Issue description
I am trying to insert DML statistics from BigQuery system tables into BQ native tables for monitoring purposes, using an Airflow task.
For this, I am using the query below:
INSERT INTO
`my-project-id.my_dataset.my_table_metrics`
(table_name,
row_count,
inserted_row_count,
updated_row_count,
creation_time)
SELECT
b.table_name,
a.row_count AS row_count,
b.inserted_row_count,
b.updated_row_count,
b.creation_time
FROM
`my-project-id.my_dataset`.__TABLES__ a
JOIN (
SELECT
tables.table_id AS table_name,
dml_statistics.inserted_row_count AS inserted_row_count,
dml_statistics.updated_row_count AS updated_row_count,
creation_time AS creation_time
FROM
`my-project-id`.`region-europe-west3`.INFORMATION_SCHEMA.JOBS,
UNNEST(referenced_tables) AS tables
WHERE
DATE(creation_time) = current_date ) b
ON
a.table_id = b.table_name
WHERE
a.table_id = 'my_bq_table'
The query works in the BigQuery console but fails when run via Airflow.
Error as per Airflow:
python_http_client.exceptions.UnauthorizedError: HTTP Error 401:
Unauthorized
[2022-10-21, 12:18:30 UTC] {standard_task_runner.py:93}
ERROR - Failed to execute job 313922 for task load_metrics (403 Access
Denied: Table
my-project-id:region-europe-west3.INFORMATION_SCHEMA.JOBS: User does
not have permission to query table
my-project-id:region-europe-west3.INFORMATION_SCHEMA.JOBS, or perhaps
it does not exist in location europe-west3.
How can I fix this access issue?
Appreciate your help.
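The 403 suggests the Airflow job runs under a service account that lacks the region-level permission on the JOBS view (bigquery.jobs.listAll, which your console user likely already has). A sketch of granting it at the project level, assuming gcloud access; the service-account name is a placeholder for whatever your Airflow GCP connection uses:

```shell
# hypothetical service-account name; substitute the one behind your Airflow connection
gcloud projects add-iam-policy-binding my-project-id \
  --member="serviceAccount:airflow-sa@my-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.resourceViewer"
```

Also check that the Airflow operator submits the query job in europe-west3 (e.g. via the operator's location parameter): a job running in another region cannot see a region-qualified JOBS view, which matches the "or perhaps it does not exist in location europe-west3" part of the error.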
I've created a Glue table (external) via Terraform, where I didn't set the location of the table.
The location of the table should be updated after the app runs. But when the app runs, it receives an exception:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error: ',', ':', or ';' expected at position 291 from 'bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:smallint:smallint:smallint:decimal(12,2):decimal(12,2):decimal(12,2):bigint:string:bigint:string:timestamp:timestamp:bigint:bigint:bigint:bigint:bigint:string:string:decimal(12,2) :bigint:timestamp:string:bigint:decimal(12,2):string:bigint:bigint:timestamp:int' [0:bigint, 6::, 7:bigint, 13::, 14:bigint, 20::, 21:bigint, 27::, 28:bigint, 34::, 35:bigint, 41::, 42:bigint, 48::, 49:bigint, 55::, 56:bigint, 62::, 63:bigint, 69::, 70:bigint, 76::, 77:bigint, 83::, 84:bigint, 90::, 91:bigint, 97::, 98:string, 104::, 105:string, 111::, 112:smallint, 120::, 121:smallint, 129::, 130:smallint, 138::, 139:decimal, 146:(, 147:12, 149:,, 150:2, 151:), 152::, 153:decimal, 160:(, 161:12, 163:,, 164:2, 165:), 166::, 167:decimal, 174:(, 175:12, 177:,, 178:2, 179:), 180::, 181:bigint, 187::, 188:string, 194::, 195:bigint, 201::, 202:string, 208::, 209:timestamp, 218::, 219:timestamp, 228::, 229:bigint, 235::, 236:bigint, 242::, 243:bigint, 249::, 250:bigint, 256::, 257:bigint, 263::, 264:string, 270::, 271:string, 277::, 278:decimal, 285:(, 286:12, 288:,, 289:2, 290:), 291: , 292::, 293:bigint, 299::, 300:timestamp, 309::, 310:string, 316::, 317:bigint, 323::, 324:decimal, 331:(, 332:12, 334:,, 335:2, 336:), 337::, 338:string, 344::, 345:bigint, 351::, 352:bigint, 358::, 359:timestamp, 368::, 369:int]
This exception appears to list the fields that were defined in Terraform.
From the AWS console I couldn't set the location after the table was created. When I connected to an AWS EMR cluster that uses the Glue metastore and tried to execute the same query, I received the same exception.
So I have several questions:
Does anybody know how to alter the empty location of an external Glue table?
The default location of the table should look like hive/warehouse/dbname.db/tablename. So what is the correct path in that case in EMR?
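Two observations. First, the parser error points at position 291 of the concatenated type string, which in the message reads decimal(12,2) :bigint with a stray space before the colon; removing that whitespace from the column type definitions in Terraform may be what actually unblocks the DDL. Second, once the table metadata parses, the empty location can usually be set with a plain ALTER TABLE from the EMR side (Hive or Spark SQL). A sketch, where the database, table, and bucket names are assumptions to replace with your own:

```sql
-- hypothetical names; point the path at wherever the app writes its data
ALTER TABLE mydb.mytable
SET LOCATION 's3://my-bucket/hive/warehouse/mydb.db/mytable/';
```

The same value can also be set declaratively in Terraform via the table's storage descriptor location, which avoids drift between Terraform and a runtime ALTER.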
Snowflake S3 data is in .txt.bz2. I need to export the data files present in this Snowflake S3 stage to my own AWS S3 bucket, and the exported results must be in the same format as in the source location. This is what I tried:
COPY INTO @mystage/folder from
(select $1||'|'||$2||'|'|| $3||'|'|| $4||'|'|| $5||'|'||$6||'|'|| $7||'|'|| $8||'|'|| $9||'|'|| $10||'|'|| $11||'|'|| $12||'|'|| $13||'|'|| $14||'|'||$15||'|'|| $16||'|'|| $17||'|'||$18||'|'||$19||'|'|| $20||'|'|| $21||'|'|| $22||'|'|| $23||'|'|| $24||'|'|| $25||'|'||$26||'|'|| $27||'|'|| $28||'|'|| $29||'|'|| $30||'|'|| $31||'|'|| $32||'|'|| $33||'|'|| $34||'|'|| $35||'|'|| $36||'|'|| $37||'|'|| $38||'|'|| $39||'|'|| $40||'|'|| $41||'|'|| $42||'|'|| $43
from @databasename)
CREDENTIALS = (AWS_KEY_ID = '*****' AWS_SECRET_KEY = '*****' )
file_format=(TYPE='CSV' COMPRESSION='BZ2')
PATTERN='*/*.txt.bz2';
Right now, Snowflake does not support exporting data to files in bz2.
My suggestion is to set COMPRESSION='gzip'; then you can export the data to your S3 in gzip.
If exporting file in bz2 is high priority for you, please contact Snowflake support.
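Concretely, the gzip variant of the unload might look like the following sketch (the column list is abbreviated, and the stage names are the question's own placeholders):

```sql
COPY INTO @mystage/folder FROM
  (SELECT $1 || '|' || $2 || '|' || $3  -- ...continue through $43 as in the question
   FROM @databasename)
CREDENTIALS = (AWS_KEY_ID = '*****' AWS_SECRET_KEY = '*****')
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP');
```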
If you want to unload bz2 file from a Snowflake stage to your own S3, you can do something like this.
COPY INTO @myS3stage/folder from
(select $1||'|'||$2||'|'|| $3||'|'|| $4||'|'|| $5||'|'||$6||'|'|| $7||'|'|| $8||'|'|| $9||'|'|| $10||'|'|| $11||'|'|| $12||'|'|| $13||'|'|| $14||'|'||$15||'|'|| $16||'|'|| $17||'|'||$18||'|'||$19||'|'|| $20||'|'|| $21||'|'|| $22||'|'|| $23||'|'|| $24||'|'|| $25||'|'||$26||'|'|| $27||'|'|| $28||'|'|| $29||'|'|| $30||'|'|| $31||'|'|| $32||'|'|| $33||'|'|| $34||'|'|| $35||'|'|| $36||'|'|| $37||'|'|| $38||'|'|| $39||'|'|| $40||'|'|| $41||'|'|| $42||'|'|| $43
from @snowflakeStage(PATTERN => '*/*.txt.bz2'))
CREDENTIALS = (AWS_KEY_ID = '*****' AWS_SECRET_KEY = '*****' )
file_format=(TYPE='CSV');
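Typing out 43 positional columns by hand is error-prone (it is easy to drop a $ somewhere in the middle). A small helper, Python here purely as an illustration, can generate the pipe-delimited concatenation instead:

```python
def pipe_select(n: int) -> str:
    """Build the SELECT expression $1||'|'||$2||...||$n for n stage columns."""
    return "||'|'||".join(f"${i}" for i in range(1, n + 1))

# paste the result into the COPY INTO ... (SELECT ...) statement
print(pipe_select(43))
```

For example, pipe_select(4) produces $1||'|'||$2||'|'||$3||'|'||$4.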
I want to use AWS S3 to store image files for my website. I created a bucket named images.mydomain.com, which is referenced by a DNS CNAME record (images.mydomain.com) in AWS Route 53.
I want to check whether a folder or file exists; if not I will create one.
The following PHP code works fine for a regular bucket name using the stream wrapper, but fails for a dotted bucket name such as xxxx.mydomain.com. This kind of bucket name fails in the doesObjectExist() method too.
// $new_dir = "s3://aaaa/akak3/kk1/yy3/ww4" ; // this line works !
$new_dir = "s3://images.mydomain.com/us000000/10000" ; // this line fails !
if( !file_exists( $new_dir) ){
if( !mkdir( $new_dir , 0777 , true ) ) {
echo "create new dir $new_dir failed ! <br>" ;
} else {
echo "SUCCEED in creating new dir $new_dir <br>" ;
}
} else {
echo "dir $new_dir already exists. Skip creating dir ! <br>" ;
}
I got the following message:
Warning: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint: "images.mydomain.com.s3.amazonaws.com". in C:\AppServ\www\ecity\vendor\aws\aws-sdk-php\src\Aws\S3\StreamWrapper.php on line 737
What is the problem here?
Any advice on what to do in this case?
Thanks!
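That warning usually means the SDK is sending requests to the default us-east-1 endpoint while the bucket lives in another region, so S3 redirects to the bucket-specific endpoint it names. A sketch of a possible fix, assuming SDK v2 (which matches the Aws\S3\StreamWrapper path in the warning); the region value is an assumption you would replace with the bucket's actual region:

```php
<?php
require 'vendor/autoload.php';

// create the client pinned to the bucket's region so requests are
// signed against the correct regional endpoint (region is an assumption)
$s3 = Aws\S3\S3Client::factory(array(
    'key'    => 'YOUR_KEY',
    'secret' => 'YOUR_SECRET',
    'region' => 'us-west-2',
));
$s3->registerStreamWrapper();

$new_dir = "s3://images.mydomain.com/us000000/10000";
if (!file_exists($new_dir)) {
    mkdir($new_dir, 0777, true);
}
```

Note that dotted bucket names also break HTTPS wildcard certificate validation under virtual-hosted addressing, which is a separate reason such names tend to misbehave with the default client settings.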
In my Cassandra cluster's EVENT_KS keyspace, I have a bookTicket1 stream with the columns
payload_provider and payload_totalNoTickets. When I tried to add a new analytics script as below,
CREATE EXTERNAL TABLE IF NOT EXISTS BusTicketTable
(provider STRING, totalNoTickets STRING, version STRING)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
"cassandra.host" = "127.0.0.1" ,
"cassandra.port" = "9160" ,
"cassandra.ks.name" = "EVENT_KS" ,
"cassandra.ks.username" = "admin" ,
"cassandra.ks.password" = "admin" ,
"cassandra.cf.name" = "bookTicket1" ,
"cassandra.columns.mapping" = ":payload_provider,payload_totalNoTickets, Version" );
It returns the error:
ERROR: Error while executing Hive script. Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Consider this line:
"cassandra.columns.mapping" = ":payload_provider,payload_totalNoTickets, Version"
Here the row key is not mapped. I am not sure, but I think you have to map the key as well, because a row key is mandatory for a Cassandra column family.
e.g.:
"cassandra.columns.mapping" = ":key, payload_provider,payload_totalNoTickets, Version"
You may need to set a unique field as the key.
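Putting that together, the corrected DDL might look like the sketch below. The extra Hive column (messageRowID here) is an assumption; any column name mapped to :key works, as long as the Hive column list and the mapping have the same number of entries:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS BusTicketTable
(messageRowID STRING, provider STRING, totalNoTickets STRING, version STRING)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
  "cassandra.host" = "127.0.0.1",
  "cassandra.port" = "9160",
  "cassandra.ks.name" = "EVENT_KS",
  "cassandra.ks.username" = "admin",
  "cassandra.ks.password" = "admin",
  "cassandra.cf.name" = "bookTicket1",
  "cassandra.columns.mapping" = ":key,payload_provider,payload_totalNoTickets,Version" );
```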