Athena: Possible To Use Alias For Column? - amazon-web-services

My AWS Athena table contains a schema as follows:
CREATE EXTERNAL TABLE IF NOT EXISTS .... (
name STRING,
address STRING,
phone STRING,
...
)
However, when querying this table I want to be able to query against both name and, for example, personName.
Ideally I'd like to be able to do this:
CREATE EXTERNAL TABLE IF NOT EXISTS .... (
name STRING as personName,
address STRING as personAddress,
phone STRING as personPhone,
...
)
...but I don't see how to achieve this from the documentation (I am using Avro).
How might I achieve this without having 2 tables?

Related

How to query DynamoDB to fetch records based on a list of key values?

I have a DynamoDB table with a GSI that has a partition key and a sort key.
Let's say the GSI's partition key is name and its sort key is ssn.
I need to fetch items by name and ssn. Below is the query I am using, and it works fine:
table.query(IndexName='lookup-by-name',KeyConditionExpression=Key('name').eq(name)\
& Key('ssn').eq(ssn))
Now I need to query by a name and a list of ssns.
For example:
ssns = ['ssn1', 'ssn2', 'ssn3', 'ssn4']
name = 'Alex'
Query all records whose name is 'Alex' and whose ssn is in the ssns list.
How do I implement something like this?
While the native DynamoDB query API cannot do this directly (a key condition has no IN operator), you can achieve it using PartiQL, which provides a SQL-like interface for interacting with DynamoDB.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-gettingstarted.html
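import boto3
client = boto3.client('dynamodb', region_name="eu-west-1")
name = 'Alex'
ssns = ['ssn1', 'ssn2', 'ssn3', 'ssn4']
# PartiQL SELECT against the "lookup-by-name" GSI of "MyTableTest".
# str(ssns) renders as ['ssn1', 'ssn2', ...], which is PartiQL list syntax for IN.
# Values are inserted unescaped, so they must not contain single quotes.
response = client.execute_statement(
    Statement="SELECT * FROM \"MyTableTest\".\"lookup-by-name\" "
              "WHERE \"name\" = '%s' AND \"ssn\" IN %s" % (name, ssns)
)
print(response['Items'])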
This also requires the lower-level client instead of the Table resource you are using above.
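As a variation on the PartiQL answer, here is a minimal sketch that passes the values as typed parameters instead of formatting them into the statement text. It assumes that DynamoDB's PartiQL accepts ? placeholders inside an IN list; the helper name build_partiql_in_query is hypothetical, and the code only builds the statement and parameter list, so nothing is sent to AWS.

```python
from typing import Dict, List, Tuple


def build_partiql_in_query(
    table: str, index: str, name: str, ssns: List[str]
) -> Tuple[str, List[Dict[str, str]]]:
    """Build a parameterized PartiQL SELECT against a GSI with an IN list.

    Assumes DynamoDB PartiQL accepts '?' placeholders inside an IN list.
    Table, index and attribute names are quoted identifiers; values are
    passed separately as typed parameters ({"S": ...} means string).
    """
    if not ssns:
        raise ValueError("ssns must contain at least one value")
    placeholders = ", ".join("?" for _ in ssns)
    statement = (
        f'SELECT * FROM "{table}"."{index}" '
        f'WHERE "name" = ? AND "ssn" IN [{placeholders}]'
    )
    # Parameter order must match placeholder order: name first, then each ssn.
    parameters = [{"S": name}] + [{"S": ssn} for ssn in ssns]
    return statement, parameters


if __name__ == "__main__":
    stmt, params = build_partiql_in_query(
        "MyTableTest", "lookup-by-name", "Alex", ["ssn1", "ssn2", "ssn3", "ssn4"]
    )
    print(stmt)
    print(params)
```

With a boto3 client, the pair would be passed as client.execute_statement(Statement=stmt, Parameters=params); if the response contains a NextToken, repeating the call with NextToken set fetches the remaining items.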
Otherwise, you would have to run multiple queries, one per ssn.
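A minimal sketch of the multiple-queries approach: one key-condition query per ssn, with the results concatenated. To keep it self-contained, query_fn is a hypothetical callable standing in for table.query(IndexName='lookup-by-name', KeyConditionExpression=Key('name').eq(name) & Key('ssn').eq(ssn))['Items'], and the in-memory fake exists only for the demo. A Query is used rather than GetItem because GetItem does not work on a GSI, and GSI key values need not be unique.

```python
from typing import Callable, Dict, Iterable, List

Item = Dict[str, str]


def query_each_ssn(
    query_fn: Callable[[str, str], List[Item]], name: str, ssns: Iterable[str]
) -> List[Item]:
    """Issue one (name, ssn) key-condition query per ssn and merge the results."""
    results: List[Item] = []
    for ssn in dict.fromkeys(ssns):  # drop duplicate ssns, keep input order
        results.extend(query_fn(name, ssn))
    return results


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for the GSI, used only for this demo.
    records = [
        {"name": "Alex", "ssn": "ssn1"},
        {"name": "Alex", "ssn": "ssn3"},
        {"name": "Bob", "ssn": "ssn1"},
    ]

    def fake_query(name: str, ssn: str) -> List[Item]:
        return [r for r in records if r["name"] == name and r["ssn"] == ssn]

    print(query_each_ssn(fake_query, "Alex", ["ssn1", "ssn2", "ssn3", "ssn4"]))
```

Each call consumes read capacity, so this trades extra requests for exact matches; the alternative is a single query on name with client-side filtering.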
I ended up using just the name as the key condition and then filtering by ssn in Python code.
This worked for me because the number of records was small:
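from boto3.dynamodb.conditions import Key
def fetch_by_name_and_ssns(table, name, ssns):
    # Key condition on the GSI partition key only; ssn is filtered client-side.
    # A single query() call returns at most one page (1 MB) of items.
    response = table.query(IndexName='lookup-by-name',
                           KeyConditionExpression=Key('name').eq(name))
    data = response['Items']
    return [record for record in data if record['ssn'] in ssns]
# e.g. fetch_by_name_and_ssns(table, 'Alex', ['ssn1', 'ssn2', 'ssn3', 'ssn4'])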
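The code above reads only the first page of results: a query returns at most 1 MB per call and signals that more data exists with LastEvaluatedKey. A minimal sketch that follows the pages (passing ExclusiveStartKey back) and then filters with a set; query_fn is assumed to behave like table.query, and the helper names are hypothetical. The filter stays client-side because DynamoDB does not allow key attributes (ssn is the index sort key) in a FilterExpression.

```python
from typing import Any, Callable, Dict, Iterable, List

Page = Dict[str, Any]


def query_all_pages(query_fn: Callable[..., Page], **kwargs: Any) -> List[Dict[str, Any]]:
    """Call query_fn repeatedly, following LastEvaluatedKey, and collect Items."""
    items: List[Dict[str, Any]] = []
    while True:
        page = query_fn(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next call where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def filter_by_ssn(items: Iterable[Dict[str, Any]], ssns: Iterable[str]) -> List[Dict[str, Any]]:
    """Keep only items whose ssn is in ssns (set lookup is O(1) per item)."""
    wanted = set(ssns)
    return [item for item in items if item.get("ssn") in wanted]


if __name__ == "__main__":
    # Hypothetical two-page response, standing in for table.query.
    pages = [
        {"Items": [{"name": "Alex", "ssn": "ssn1"}, {"name": "Alex", "ssn": "ssn9"}],
         "LastEvaluatedKey": {"name": "Alex", "ssn": "ssn9"}},
        {"Items": [{"name": "Alex", "ssn": "ssn3"}]},
    ]

    def fake_query(**kw: Any) -> Page:
        return pages[1] if "ExclusiveStartKey" in kw else pages[0]

    all_items = query_all_pages(fake_query, IndexName="lookup-by-name")
    print(filter_by_ssn(all_items, ["ssn1", "ssn2", "ssn3", "ssn4"]))
```

With boto3 this would be called as query_all_pages(table.query, IndexName='lookup-by-name', KeyConditionExpression=Key('name').eq(name)).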

Does AWS Athena support SequenceFile?

Has anyone tried creating an AWS Athena table on top of SequenceFiles? According to the documentation it looks possible, and I was able to execute the CREATE TABLE statement below.
create external table if not exists sample_sequence (
account_id string,
receiver_id string,
session_index smallint,
start_epoch bigint)
STORED AS sequencefile
location 's3://bucket/sequencefile/';
The statement executed successfully, but when I try to read data from the table it throws the error below:
Your query has the following error(s):
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://viewershipforneo4j/2017-09-26/000030_0 (offset=372128055, length=62021342) using org.apache.hadoop.mapred.SequenceFileInputFormat: s3://viewershipforneo4j/2017-09-26/000030_0 not a SequenceFile
This query ran against the "default" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 9f0983b0-33da-4686-84a3-91b14a39cd09.
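Hadoop SequenceFiles begin with the three magic bytes SEQ followed by a version byte, and the "not a SequenceFile" error above is what Hadoop reports when that header check fails. A minimal sketch for testing a locally downloaded copy of the object (for example one fetched with aws s3 cp); the function names are hypothetical:

```python
import os
import tempfile

SEQUENCEFILE_MAGIC = b"SEQ"  # every Hadoop SequenceFile begins with these bytes


def looks_like_sequencefile(header: bytes) -> bool:
    """True if header starts with 'SEQ' plus at least one version byte."""
    return len(header) >= 4 and header[:3] == SEQUENCEFILE_MAGIC


def check_file(path: str) -> bool:
    """Read the first four bytes of a local file and test the magic."""
    with open(path, "rb") as f:
        return looks_like_sequencefile(f.read(4))


if __name__ == "__main__":
    # Demo with two throwaway files: one with a SequenceFile-style header,
    # one that is plain tab-delimited text.
    with tempfile.TemporaryDirectory() as tmp:
        seq_path = os.path.join(tmp, "seq.bin")
        txt_path = os.path.join(tmp, "plain.txt")
        with open(seq_path, "wb") as f:
            f.write(b"SEQ\x06rest-of-header")
        with open(txt_path, "wb") as f:
            f.write(b"acct1\trecv1\t3\t1506384000\n")
        print(check_file(seq_path), check_file(txt_path))  # True False
```

If the object does not start with SEQ, it is likely not a SequenceFile at all, and SequenceFileInputFormat will not be able to open it regardless of the delimiter.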
The SequenceFiles are valid. The issue here is that no delimiter is defined,
i.e. the row format delimited fields terminated by clause is missing.
If, in your case, tab is the column delimiter and each row is on its own line, the statement becomes:
create external table if not exists sample_sequence (
account_id string,
receiver_id string,
session_index smallint,
start_epoch bigint)
row format delimited fields terminated by '\t'
STORED AS sequencefile
location 's3://bucket/sequencefile/';
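To illustrate why the delimiter clause matters, here is a rough Python approximation (not Athena's actual SerDe) of how a delimited row is mapped onto the four declared columns. Hive's default field delimiter is the \001 (Ctrl-A) character, so without the clause a tab-separated line would presumably land entirely in the first column. SCHEMA and parse_row are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Columns from the DDL above; Python int stands in for smallint/bigint.
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("account_id", str),
    ("receiver_id", str),
    ("session_index", int),
    ("start_epoch", int),
]


def parse_row(line: str, delimiter: str = "\t") -> Dict[str, Optional[object]]:
    """Split one text row on delimiter and cast each field to its column type.

    Missing trailing fields and unparseable numbers become None (NULL);
    extra fields are ignored.
    """
    fields = line.rstrip("\n").split(delimiter)
    row: Dict[str, Optional[object]] = {}
    for i, (name, cast) in enumerate(SCHEMA):
        raw = fields[i] if i < len(fields) else None
        try:
            row[name] = None if raw is None else cast(raw)
        except ValueError:
            row[name] = None
    return row


if __name__ == "__main__":
    line = "acct1\trecv1\t3\t1506384000"
    print(parse_row(line, "\t"))    # four populated columns
    print(parse_row(line, "\x01"))  # Ctrl-A delimiter: whole line in account_id
```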

Column names containing dots in Spectrum

I created a customers table with columns such as account_id.cust_id, account_id.ord_id, and so on.
My create external table query was as follows:
CREATE EXTERNAL TABLE spectrum.customers
(
"account_id.cust_id" numeric,
"account_id.ord_id" numeric
)
row format delimited
fields terminated by '^'
stored as textfile
location 's3://awsbucketname/test/';
SELECT "account_id.cust_id" FROM spectrum.customers limit 100
and I get the following error:
Invalid Operation: column account_id.cust_id does not exists in customers.
Is there any way or syntax to write column names like account_id.cust_id (text.text) while creating the table or while writing the select query?
Please help.
PS: Single quotes and backticks don't work either.

libpqxx : How to retrieve table name from the results container

I am trying to get the table name (or OID) from PostgreSQL using the pqxx library. I know I can get it using column_table() if my result container comes from a query that returns columns, such as a SELECT. However, I am running INSERT, UPDATE, or DELETE commands and want to retrieve the table name from their results afterwards. I can't find a way to do it.
....
const std::string _sqlcmd="update mytable set port=35012 where id=1";
pqxx::work Xaction(*m_conn);
m_result = Xaction.exec(_sqlcmd);
Xaction.commit();
result.hxx does not seem to provide such a function for INSERT, UPDATE, or DELETE commands.

Dynamodb2 Table Schema Creation

I'm using the following: dynamodb2, boto, python. I have the following code for creating a table:
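from boto.dynamodb2.fields import GlobalAllIndex, HashKey, RangeKey
from boto.dynamodb2.table import Table
from boto.dynamodb2.types import NUMBER, STRING
# conn is an existing DynamoDB connection created elsewhere
table = Table.create('mySecondTable',
                     schema=[HashKey('ID'), RangeKey('advertiser')],
                     throughput={'read': 5, 'write': 2},
                     global_indexes=[GlobalAllIndex('otherDataIndex', parts=[
                         HashKey('date', data_type=NUMBER),
                         RangeKey('publisher', data_type=STRING),
                     ], throughput={'read': 5, 'write': 3})],
                     connection=conn)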
I would like to be able to query by the following attributes:
ID, advertiser, date, publisher, size, and color
That means I need a different schema: when I add additional attributes, I cannot query on them unless the attribute name is listed in the schema.
The problem, however, is that right now I can only query by ID, advertiser, date, and publisher. How can I add additional attributes that I can query by?
I read this, which appears to say that it is possible:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
However there is no example here:
http://boto.readthedocs.org/en/latest/dynamodb2_tut.html
I tried adding an additional range key, but it doesn't work (duplicates are not allowed).
I'd like it to be like:
table = Table.create('mySecondTable',
schema=[
RangeKey('advertiser'),
otherKey('date')
fourthKey('publisher') ... etc
throughput={'read':5,'write':2},
connection=conn)
Thanks!
If you want additional range keys, you need to use local secondary indexes (LSIs).
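A minimal sketch of a table definition with one LSI per extra range key, written as the request for the low-level CreateTable API (the boto3 client's create_table) rather than the legacy boto.dynamodb2 Table.create used in the question. ID and advertiser are strings (boto's default key type), date is a number as in the question, and treating size and color as strings is an assumption; the index names are hypothetical. LSIs share the table's hash key, can only be defined when the table is created, and a table can have at most five.

```python
from typing import Any, Dict


def lsi(index_name: str, hash_key: str, range_key: str) -> Dict[str, Any]:
    """One LocalSecondaryIndexes entry; an LSI reuses the table's hash key."""
    return {
        "IndexName": index_name,
        "KeySchema": [
            {"AttributeName": hash_key, "KeyType": "HASH"},
            {"AttributeName": range_key, "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},  # project every attribute into the index
    }


def create_table_request(table_name: str) -> Dict[str, Any]:
    """CreateTable kwargs: hash key ID, range key advertiser, one LSI per extra range key."""
    # Extra range keys and their types; "S" for size and color is an assumption.
    extra_range_keys = {"date": "N", "publisher": "S", "size": "S", "color": "S"}
    definitions = [
        {"AttributeName": "ID", "AttributeType": "S"},
        {"AttributeName": "advertiser", "AttributeType": "S"},
    ]
    definitions += [
        {"AttributeName": name, "AttributeType": kind}
        for name, kind in extra_range_keys.items()
    ]
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "ID", "KeyType": "HASH"},
            {"AttributeName": "advertiser", "KeyType": "RANGE"},
        ],
        # AttributeDefinitions may only list attributes used in some key schema.
        "AttributeDefinitions": definitions,
        "LocalSecondaryIndexes": [lsi(f"{name}Index", "ID", name) for name in extra_range_keys],
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 2},
    }


if __name__ == "__main__":
    request = create_table_request("mySecondTable")
    print([index["IndexName"] for index in request["LocalSecondaryIndexes"]])
```

With boto3 this would be submitted as boto3.client('dynamodb').create_table(**create_table_request('mySecondTable')).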
You can query an LSI the same way you query the base table: provide an exact value for the hash key and, optionally, a comparison predicate (such as =, <, BETWEEN, or begins_with) on the range key.
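Querying an LSI looks like a base-table query with IndexName added. A minimal sketch that builds the low-level Query request (for the boto3 client's query): the hash key gets an exact value and the range key a BETWEEN condition. Attribute names go through ExpressionAttributeNames placeholders because some of them (date, for example) are DynamoDB reserved words. The helper name, index name, and example values are hypothetical.

```python
from typing import Any, Dict


def lsi_query_request(
    table_name: str,
    index_name: str,
    hash_attr: str,
    hash_value: Dict[str, str],
    range_attr: str,
    low: Dict[str, str],
    high: Dict[str, str],
) -> Dict[str, Any]:
    """Build Query kwargs: exact hash-key match plus a BETWEEN on the range key.

    Values use DynamoDB's typed format, e.g. {"S": "abc"} or {"N": "42"}.
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#h = :h AND #r BETWEEN :lo AND :hi",
        "ExpressionAttributeNames": {"#h": hash_attr, "#r": range_attr},
        "ExpressionAttributeValues": {":h": hash_value, ":lo": low, ":hi": high},
    }


if __name__ == "__main__":
    req = lsi_query_request(
        "mySecondTable", "dateIndex", "ID", {"S": "item-1"},
        "date", {"N": "20140101"}, {"N": "20141231"},
    )
    print(req)
```

The result would be sent with boto3.client('dynamodb').query(**req).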