Vertica Parquet format

Vertica Parquet format - hdfs

I am using Vertica version "Vertica Analytic Database v7.1.1-0" and I am trying to copy data from a Parquet file into a table using the following query:
COPY temp.sessions_parquet FROM '/dbadmin/vertica-import/parquet/*' ON ANY NODE PARQUET;
I've created the table with the following DDL:
CREATE TABLE temp.sessions_parquet (c0 VARCHAR, c1 VARCHAR, c2 VARCHAR, c3 VARCHAR, c4 VARCHAR, c5 VARCHAR, c6 VARCHAR, c7 VARCHAR, c8 VARCHAR, c9 VARCHAR, c10 VARCHAR, c11 VARCHAR, c12 VARCHAR, c13 VARCHAR, c14 VARCHAR, c15 VARCHAR, c16 VARCHAR, c17 VARCHAR, c18 VARCHAR, c19 VARCHAR, c20 VARCHAR, c21 VARCHAR, c22 VARCHAR, c23 VARCHAR, c24 VARCHAR, c25 VARCHAR, c26 VARCHAR, c27 VARCHAR, c28 VARCHAR, c29 VARCHAR, c30 VARCHAR, c31 VARCHAR, c32 VARCHAR, c33 VARCHAR, c34 VARCHAR, c35 VARCHAR, c36 VARCHAR, c37 VARCHAR, c38 VARCHAR, c39 VARCHAR, c40 VARCHAR, c41 VARCHAR, c42 VARCHAR, c43 VARCHAR, c44 VARCHAR, c45 VARCHAR, c46 VARCHAR, c47 VARCHAR, c48 VARCHAR, c49 VARCHAR, c50 VARCHAR, c51 VARCHAR, c52 VARCHAR, c53 VARCHAR, c54 VARCHAR, c55 VARCHAR, c56 VARCHAR, c57 VARCHAR);
The Parquet file is already on the Vertica node at the following location:
/dbadmin/vertica-import/parquet/part-r-00000.snappy.parquet
When I try to execute the following command:
COPY temp.sessions_parquet FROM '/dbadmin/vertica-import/parquet/*' ON ANY NODE PARQUET;
I get the following error:
An error occurred when executing the SQL command: COPY
temp.sessions_parquet FROM '/dbadmin/vertica-import/parquet/*' ON ANY
NODE PARQUET
[Vertica]VJDBC ERROR: Syntax error at or near "PARQUET" [SQL
State=42601, DB Errorcode=4856] 1 statement failed.
Could you please help and tell me how I can import the data?

Reading from Parquet files was first supported in version 7.2.3. It looks like you've found the syntax from that version, but you're using it with an older version (7.1.1), which is why the parser rejects the PARQUET keyword.
Here is the documentation of this feature from 7.2.3. Note, by the way, that it doesn't support complex types, which is still true in 8.0.x.
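
As a rough illustration of the version check implied above, the following Python sketch compares a version string against 7.2.3. The helper names are hypothetical, and it assumes the string has the form quoted in the question ("Vertica Analytic Database v7.1.1-0").

```python
import re

# First release that can read Parquet files, according to the answer above.
PARQUET_MIN_VERSION = (7, 2, 3)


def parse_vertica_version(version_string):
    """Extract (major, minor, patch) from a string such as
    'Vertica Analytic Database v7.1.1-0' (format assumed from the question)."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def supports_parquet(version_string):
    """Return True if the version is at least 7.2.3."""
    return parse_vertica_version(version_string) >= PARQUET_MIN_VERSION


print(supports_parquet("Vertica Analytic Database v7.1.1-0"))
print(supports_parquet("Vertica Analytic Database v7.2.3-0"))
```

For the version in the question this reports that Parquet is not supported, which matches the syntax error.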

Related

Django ORM make Union Query with column not in common in both tables, set value of not in common column as null

Hi, I want to make a query in the Django ORM
like this:
Select Col1, Col2, Col3, Col4, Col5 from Table1
Union
Select Col1, Col2, Col3, Null as Col4, Null as Col5 from Table2
As you can see, Col4 and Col5 are not common to both tables, so they should be returned as null for Table2.
Table1_qs = Table1.objects.all()
Table2_qs = Table2.objects.all()
Table1_qs.values('Col1', 'Col2','Col3','Col4','Col5').union(Table2_qs.values('Col1', 'Col2','Col3','Null as Col4','Null as Col5'))
How can I make this query in Django?

The solution is made possible by Value and annotate.
Here is how.
Let's say Col4 is an IntegerField
and Col5 is a CharField:
from django.db.models import Value, IntegerField, CharField
Table1_qs = Table1.objects.all()
Table2_qs = Table2.objects.all()
Table1_qs = Table1_qs.values('Col1', 'Col2','Col3','Col4','Col5')
Table2_qs = Table2_qs.values('Col1', 'Col2','Col3').annotate(
    Col4=Value(None, output_field=IntegerField()),
    Col5=Value(None, output_field=CharField()),
)
unioned_query = Table1_qs.union(Table2_qs)
Please note:
1: The column types must match in both querysets.
2: The columns must also be in the same order.
One problem arises with foreign keys: only their id (primary key) is returned when using values() on a queryset.
I hope Django adds a way to get them as regular objects too.
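
For reference, the union described above has the same shape as the SQL in the question. The sketch below runs that SQL directly with Python's sqlite3 module instead of going through Django; the table contents and column types are made up for illustration.

```python
import sqlite3

# In-memory database with two tables; Table2 lacks Col4 and Col5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 INTEGER, Col5 TEXT);
    CREATE TABLE Table2 (Col1 TEXT, Col2 TEXT, Col3 TEXT);
    INSERT INTO Table1 VALUES ('a', 'b', 'c', 1, 'x');
    INSERT INTO Table2 VALUES ('d', 'e', 'f');
""")

# Both SELECTs list the same number of columns in the same order;
# the missing columns are filled with NULL for Table2.
rows = conn.execute("""
    SELECT Col1, Col2, Col3, Col4, Col5 FROM Table1
    UNION
    SELECT Col1, Col2, Col3, NULL AS Col4, NULL AS Col5 FROM Table2
    ORDER BY Col1
""").fetchall()

print(rows)  # [('a', 'b', 'c', 1, 'x'), ('d', 'e', 'f', None, None)]
```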

Athena / Presto data last week

I am currently trying to write an Athena query to fetch all data in a table from the last 7 days.
SELECT *
FROM "engagement_metrics"."spikes"
where spike_noticed_moment_utc > date_add('day', -7, now())
When running this query I get the following error:
SYNTAX_ERROR: line 3:32: '>' cannot be applied to varchar, timestamp with time zone
How can I achieve grabbing data from the last week given the current day in Athena?

It looks like the column spike_noticed_moment_utc is defined as varchar. If its values are ISO 8601 strings, you can easily convert it to a timestamp using from_iso8601_timestamp:
SELECT *
FROM "engagement_metrics"."spikes"
where from_iso8601_timestamp(spike_noticed_moment_utc) > date_add('day', -7, now())
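
The error comes from comparing text with a timestamp, so the text has to be parsed first. The same idea in Python, as a minimal sketch: the function name and sample values are hypothetical, and the strings are assumed to be ISO 8601 with an explicit UTC offset.

```python
from datetime import datetime, timedelta, timezone


def in_last_7_days(iso_text, now):
    """Return True if the ISO 8601 string is later than 7 days before `now`."""
    moment = datetime.fromisoformat(iso_text)  # text -> timezone-aware datetime
    return moment > now - timedelta(days=7)


# A fixed "now" keeps the example reproducible.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(in_last_7_days("2024-01-10T08:30:00+00:00", now))  # True
print(in_last_7_days("2024-01-01T08:30:00+00:00", now))  # False
```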

How to do an importrange query with a time duration condition (all rows under 1 minute)

I'm trying to run a QUERY over an IMPORTRANGE of this Google Sheets file.
I want to filter my data based on three conditions:
Col5='GC' OR
Col5='CL' AND (this is the part I cannot solve)
the time in Col4 must be under 60 seconds.
I've tried different solutions (time, seconds, timevalue) but none of them work.
I tried this, but it is missing the last, crucial condition:
=query(IMPORTRANGE("18OOzibH9rmuzNxOPo_EbZ1rhF32qESuvPa4x4pB1BmA/edit#gid=0",
"data!A1:Q"),
"select Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14, Col15, Col16, Col17 WHERE (Col5='GC') OR (Col5='CL')"
)
The result I expect is only the rows with GC or CL in Col5 and a duration <= 60 seconds.

=QUERY(IMPORTRANGE("18OOzibH9rmuzNxOPo_EbZ1rhF32qESuvPa4x4pB1BmA", "data!A1:Q"),
"where Col5 matches 'CL|GC'
and minute(Col4) < 1", 1)
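
The filter combines two conditions: Col5 must be one of the two codes, and the duration must be under one minute. The Python sketch below shows the same logic on a few made-up rows with only three columns. Note that it compares the whole duration, whereas minute(Col4) looks only at the minute component of the time value.

```python
from datetime import timedelta

# Hypothetical sample rows: (label, duration, code).
rows = [
    ("call-1", timedelta(seconds=45), "GC"),
    ("call-2", timedelta(minutes=2), "CL"),
    ("call-3", timedelta(seconds=30), "XX"),
    ("call-4", timedelta(seconds=59), "CL"),
]


def keep(row):
    """Keep rows whose code is CL or GC and whose duration is under a minute."""
    _, duration, code = row
    return code in {"CL", "GC"} and duration < timedelta(minutes=1)


filtered = [row[0] for row in rows if keep(row)]
print(filtered)  # ['call-1', 'call-4']
```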

How the pricing is calculated If I query a Bigquery View?

Say I have a BigQuery View
MyView:
select col1, col2, col3, col4, col5, col6, col7 from mytable;
Now, if I query my view:
select col1 from MyView;
In this case, will the pricing be calculated for all columns or only for col1?

Will the pricing be calculated for all columns or only for col1?
Only for col1!
You can easily check this in the UI by comparing the estimate of how many bytes will be processed for
select col1 from MyView
vs
select * from MyView

sql update command with skip primary key duplicates

I have two tables, T1(col1, col2, col3) and T2(col4, col5, col6).
Only T1 has a primary key, which is col1.
I need to update col1=col4, col2=col5, col3=col6 where col1=col4 or col1=col5.
The primary key may end up duplicated, in which case the UPDATE command fails.
Basically, I want to update the primary key without creating duplicates.

You do not need to update col1, because you are updating the row that already has the same primary key.
col1 is the primary key of T1, so it won't be duplicated.
The query should be UPDATE T1 SET col2=col5, col3=col6 WHERE col1=col4
In the case where col1 != col4 and col1 = col5, first execute the query SELECT * FROM T1 WHERE col1 = col4
If the result contains more than 0 rows, skip the update.
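
The first case (col1 = col4) can be tried out with Python's sqlite3 module. This is only a sketch under assumptions: the column types and sample rows are made up, T2.col4 is assumed to be unique, and the correlated-subquery form is one portable way to pull values from T2, not necessarily the syntax your database prefers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (col1 INTEGER PRIMARY KEY, col2 TEXT, col3 TEXT);
    CREATE TABLE T2 (col4 INTEGER, col5 TEXT, col6 TEXT);
    INSERT INTO T1 VALUES (1, 'old', 'old'), (2, 'keep', 'keep');
    INSERT INTO T2 VALUES (1, 'new5', 'new6');
""")

# Update only the non-key columns of rows whose key appears in T2.col4;
# the primary key col1 is never written, so it cannot become duplicated.
conn.execute("""
    UPDATE T1
    SET col2 = (SELECT col5 FROM T2 WHERE T2.col4 = T1.col1),
        col3 = (SELECT col6 FROM T2 WHERE T2.col4 = T1.col1)
    WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.col4 = T1.col1)
""")

rows = conn.execute("SELECT * FROM T1 ORDER BY col1").fetchall()
print(rows)  # [(1, 'new5', 'new6'), (2, 'keep', 'keep')]
```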
