Redshift: create external table returns 0 rows

I have a text file, test.txt, located at s3://myBucket/ (sample below), from which I want to create an external table in Redshift.
When I select from the table, it returns 0 rows.
1,One
2,Two
3,Three
create external table spectrum_schema.test(
Id integer,
Name varchar(255))
row format delimited
fields terminated by ','
stored as textfile
location 's3://myBucket/';
select * from spectrum_schema.test; -- returns 0 rows
Any suggestions how I can fix this?

I fixed this by moving the file to s3://myBucket/test

Related

How to RENAME struct/array nested columns using ALTER TABLE in BigQuery?

Suppose we have the following table in BigQuery:
CREATE TABLE sample_dataset.sample_table (
id INT
,struct_geo STRUCT
<
country STRING
,state STRING
,city STRING
>
,array_info ARRAY
<
STRUCT<
key STRING
,value STRING
>
>
);
I want to rename the columns inside the STRUCT and the ARRAY using an ALTER TABLE command. For normal (non-nested) columns, I can follow the Google documentation:
ALTER TABLE sample_dataset.sample_table
RENAME COLUMN id TO str_id
But when I run the same command on nested columns, BigQuery returns errors.
Running the command for a column inside a STRUCT gives me the following message:
ALTER TABLE sample_dataset.sample_table
RENAME COLUMN `struct_geo.country` TO `struct_geo.str_country`
Error: ALTER TABLE RENAME COLUMN not found: struct_geo.country.
The exact same message appears when I run the same statement, but targeting a column inside an ARRAY:
ALTER TABLE sample_dataset.sample_table
RENAME COLUMN `array_info.str_key` TO `array_info.str_key`
Error: ALTER TABLE RENAME COLUMN not found: array_info.str_key
I got stuck because the BigQuery documentation on nested columns has no ALTER TABLE examples and refers directly to the default documentation for non-nested columns.
I understand that I can rename the columns by creating a new table with CREATE TABLE new_table AS SELECT ... and passing the new column names as aliases, but that would run a query over the whole table, which I'd rather avoid since my original table is well over 10 TB.
Thanks in advance for any tips or solutions!

Unable to load data into a Hive table in the correct form

I have this CREATE statement for an external table in Hive, but my data is not consistent, so when I query it I get NULLs:
create external table sampleartistdata(
artistid int,
artistname string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="# ")
STORED AS TEXTFILE
location '/user/users/sampledata/';
select * from sampleartistdata limit 3;
This is what the data looks like:
1134999 06Crazy Life
10113088 Terfel, Bartoli- Mozart: Don
6826647 Bodenstandig 3000
10186265 Jota Quest e Ivete Sangalo
6828986 Toto_XX (1977
10236364 U.S Bombs -
1135000 artist formaly know as Mat
10299728 Kassierer - Musik für beide Ohren
10299744 Rahzel, RZA
Result:
sampleartistdata.artistid sampleartistdata.artistname
NULL NULL
NULL NULL
NULL NULL
I was able to resolve it by defining the table with ROW FORMAT DELIMITED and a tab field delimiter instead of ROW FORMAT SERDE:
drop table sampleartistdata;
create external table sampleartistdata(
artistid int,
artistname string
) ROW format DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
location '/user/jovyan/sampledata/';
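The NULL NULL result can be reproduced with a rough Python simulation of how a delimited-text SerDe maps each line onto the two columns. This is a simplification, not Hive's actual code, and it assumes the raw file separates the columns with a tab (which the working '\t' definition suggests; the sample above shows the tabs as spaces). With the "# " delimiter nothing is split, so the whole line lands in artistid, fails the int cast and becomes NULL, while artistname gets no field at all.

```python
from typing import Optional


def parse_row(line: str, delim: str) -> tuple[Optional[int], Optional[str]]:
    """Roughly mimic mapping one text line onto (artistid int, artistname string).

    A simplification of what a delimited-text SerDe does, not Hive's code:
    a value that cannot be cast to int, or a missing field, becomes NULL (None).
    """
    fields = line.split(delim)
    try:
        artistid: Optional[int] = int(fields[0])
    except ValueError:
        artistid = None  # unparseable int -> NULL
    artistname = fields[1] if len(fields) > 1 else None  # missing field -> NULL
    return artistid, artistname


if __name__ == "__main__":
    # Assumption: the real file separates the two columns with a tab.
    for line in ["1134999\t06Crazy Life", "10299744\tRahzel, RZA"]:
        print(parse_row(line, "# "), parse_row(line, "\t"))
```

Splitting on "# " gives (None, None) for every line, while splitting on "\t" gives the id and name, in line with the tab-delimited fix above.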

Issue querying Athena with select having special characters

Below is the select query I am trying:
SELECT * from test WHERE doc = '/folder1/folder2-path/testfile.txt';
This query returns zero results.
If I rewrite the query with LIKE, replacing the special characters (/, - and .) with % wildcards, it works:
SELECT * from test WHERE doc LIKE '%folder1%folder2%path%testfile%txt';
How can I fix the query so that it works with = or IN? I want to run a batch select.
To test your situation, I created a text file containing:
hello
there
/folder1/folder2-path/testfile.txt
this/that
here.there
I uploaded the file to a directory on S3, then created an external table in Athena:
CREATE EXTERNAL TABLE stack (doc string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\")
LOCATION 's3://my-bucket/my-folder/'
I then ran the command:
select * from stack WHERE doc = '/folder1/folder2-path/testfile.txt'
It returned:
1 /folder1/folder2-path/testfile.txt
So it worked for me. Your problem is therefore likely caused either by the contents of the file or by the way the external table is defined (e.g. using a different SerDe).
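One possibility consistent with LIKE matching while = does not is hidden characters at the start of the stored value, such as leading whitespace or a byte-order mark: the leading % in the LIKE pattern absorbs them, but an exact comparison does not. The Python sketch below is a hypothetical helper (read_raw_lines and suspicious_lines are illustrative names, not part of any AWS tool) that you could run over a local copy of the data file to flag values that would not compare equal to a clean string literal.

```python
import unicodedata


def read_raw_lines(path: str) -> list[str]:
    """Read a local copy of the data file without newline translation.

    newline="" keeps a stray carriage return visible, and plain "utf-8"
    (not "utf-8-sig") keeps a byte-order mark as the character U+FEFF.
    """
    with open(path, encoding="utf-8", newline="") as f:
        return [line.rstrip("\n") for line in f]


def suspicious_lines(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line number, repr) for values with leading/trailing whitespace
    or invisible control/format characters (e.g. a BOM or a carriage return)."""
    hits = []
    for n, value in enumerate(lines, start=1):
        has_padding = value != value.strip()
        # Unicode categories starting with "C" are control/format/unassigned chars.
        has_invisible = any(unicodedata.category(ch).startswith("C") for ch in value)
        if has_padding or has_invisible:
            hits.append((n, repr(value)))
    return hits


if __name__ == "__main__":
    # In-memory sample; for a real check use suspicious_lines(read_raw_lines(path)).
    sample = ["hello", "\ufeff/folder1/folder2-path/testfile.txt", "this/that"]
    for n, shown in suspicious_lines(sample):
        print(n, shown)
```

Printing repr() makes otherwise invisible characters show up as escapes such as \ufeff or \r.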

Does AWS Athena support SequenceFile?

Has anyone tried creating an AWS Athena table on top of SequenceFiles? According to the documentation it looks possible, and I was able to execute the CREATE TABLE statement below:
create external table if not exists sample_sequence (
account_id string,
receiver_id string,
session_index smallint,
start_epoch bigint)
STORED AS sequencefile
location 's3://bucket/sequencefile/';
The statement executed successfully, but when I try to read data from the table it throws the following error:
Your query has the following error(s):
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://viewershipforneo4j/2017-09-26/000030_0 (offset=372128055, length=62021342) using org.apache.hadoop.mapred.SequenceFileInputFormat: s3://viewershipforneo4j/2017-09-26/000030_0 not a SequenceFile
This query ran against the "default" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 9f0983b0-33da-4686-84a3-91b14a39cd09.
The SequenceFiles are valid. The issue here is that no delimiter is defined, i.e. the ROW FORMAT DELIMITED FIELDS TERMINATED BY clause is missing.
If, in your case, tab is the column delimiter and each row is on its own line, the statement would be:
create external table if not exists sample_sequence (
account_id string,
receiver_id string,
session_index smallint,
start_epoch bigint)
row format delimited fields terminated by '\t'
STORED AS sequencefile
location 's3://bucket/sequencefile/';
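Since the error message itself says the object is "not a SequenceFile", it may also be worth confirming what the files actually contain. Hadoop SequenceFiles begin with the three bytes SEQ followed by a one-byte version number; the Python sketch below checks only that header, on a local copy of one object (for example one downloaded with aws s3 cp). It is a quick sanity check, not a SequenceFile parser, and the byte strings in the example are made up for illustration.

```python
import io
from typing import BinaryIO

# Hadoop SequenceFiles start with b"SEQ" followed by a single version byte.
SEQUENCEFILE_MAGIC = b"SEQ"


def looks_like_sequencefile(stream: BinaryIO) -> bool:
    """Return True if the stream starts with a SequenceFile header."""
    header = stream.read(len(SEQUENCEFILE_MAGIC) + 1)  # magic + version byte
    return len(header) == 4 and header[:3] == SEQUENCEFILE_MAGIC


if __name__ == "__main__":
    # In-memory examples; for a real file use open("000030_0", "rb").
    print(looks_like_sequencefile(io.BytesIO(b"SEQ\x06rest-of-header")))  # True
    print(looks_like_sequencefile(io.BytesIO(b"1\tabc\t0\t1506384000\n")))  # False
```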

Column names containing dots in Spectrum

I created a customers table with columns named account_id.cust_id, account_id.ord_id, and so on.
My create external table query was as follows:
CREATE EXTERNAL TABLE spectrum.customers
(
"account_id.cust_id" numeric,
"account_id.ord_id" numeric
)
row format delimited
fields terminated by '^'
stored as textfile
location 's3://awsbucketname/test/';
SELECT "account_id.cust_id" FROM spectrum.customers limit 100
I get the following error:
Invalid Operation: column account_id.cust_id does not exists in customers.
Is there any way or syntax to write column names like account_id.cust_id (text.text) while creating the table or while writing the select query?
Please help.
PS: Single quotes and backticks don't work either.