How do I query a CSV dataset that's been imported? - questdb

I have imported data from CSV into Questdb using the web console but I can't query any values from it. I can see it in the list of tables but if I run something like this:
select * from imported-data.csv;
I get an error saying:
function, literal or constant is expected

If the table hasn't been renamed, you have to escape the filename with singlequotes, i.e.:
select * from 'imported-data.csv';
And if you want to rename the table to something else, you can also easily do this with RENAME:
RENAME TABLE 'imported-data.csv' TO 'newTable';

Related

Can Query Builder in SAS EG be used to output fixed-width columns?

thanks for any help! I’ll try to be concise.
I am new to SAS and have inherited a SAS run that outputs a table with 15 columns, which I export as an excel file and deliver.
The recipient would now like it as a .txt file with fixed-width columns, and has provided the character limit/width for each column.
Is this something that can be done in query builder, so I can just query the existing table rather than modifying the code in the run? I tried filling the field for ‘Length’ when you add a column to query builder, but it did not have the desired result.

Hive Array<Struct<field1:string, field2:string> equivalent in BigQuery

I have a hive table where one of the column is Array
ruleInfoList array<structfield1:string,field2:string,field3:string,field4:string>
I am trying to create a similar DDL in Bigquery but unable to figure out how to do that.
Then I tried inserting the data without creating a table in BigQuery using Java and noticed below schema and I'm not sure what Record is and from where list and element got created. Please see the below screenshot.
RECORD is STRUCT, and REPEATED is ARRAY in BigQuery SQL, the type of your column should be:
ruleinfolist STRUCT<list ARRAY<element STRUCT<dataClassCode STRING, dataRuleGroupCode> > >
Tip: to see the type of a table column, you can query INFORMATION_SCHEMA.COLUMNS, and check the DATA_TYPE column.

Amazon Athena - Column cannot be resolved on basic SQL WHERE query

I am currently evaluating Amazon Athena and Amazon S3.
I have created a database (testdb) with one table (awsevaluationtable). The table has two columns, x (bigint) and y (bigint).
When I run:
SELECT *
FROM testdb."awsevaluationtable"
I get all of the test data:
However, when I try a basic WHERE query:
SELECT *
FROM testdb."awsevaluationtable"
WHERE x > 5
I get:
SYNTAX_ERROR: line 3:7: Column 'x' cannot be resolved
I have tried all sorts of variations:
SELECT * FROM testdb.awsevaluationtable WHERE x > 5
SELECT * FROM awsevaluationtable WHERE x > 5
SELECT * FROM testdb."awsevaluationtable" WHERE X > 5
SELECT * FROM testdb."awsevaluationtable" WHERE testdb."awsevaluationtable".x > 5
SELECT * FROM testdb.awsevaluationtable WHERE awsevaluationtable.x > 5
I have also confirmed that the x column exists with:
SHOW COLUMNS IN sctawsevaluation
This seems like an extremely simple query yet I can't figure out what is wrong. I don't see anything obvious in the documentation. Any suggestions would be appreciated.
In my case, changing double quotes to single quotes resolves this error.
Presto uses single quotes for string literals, and uses double quotes for identifiers.
https://trino.io/docs/current/migration/from-hive.html#use-ansi-sql-syntax-for-identifiers-and-strings
Strings are delimited with single quotes and identifiers are quoted with double quotes, not backquotes:
SELECT name AS "User Name"
FROM "7day_active"
WHERE name = 'foo'
I have edited my response to this issue based on my current findings and my contact with both the AWS Glue and Athena support teams.
We were having the same issue - an inability to query on the first column in our CSV files. The problem comes down to the encoding of the CSV file. In short, AWS Glue and Athena currently do not support CSV's encoded in UTF-8-BOM. If you open up a CSV encoded with a Byte Order Mark (BOM) in Excel or Notepad++, it looks like any comma-delimited text file. However, opening it up in a Hex editor reveals the underlying issue. There are a bunch of special characters at the start of the file:  i.e. the BOM.
When a UTF-8-BOM CSV file is processed in AWS Glue, it retains these special characters, and associates then with the first column name. When you try and query on the first column within Athena, you will generate an error.
There are ways around this on AWS:
In AWS Glue, edit the table schema and delete the first column, then reinsert it back with the proper column name, OR
In AWS Athena, execute the SHOW CREATE TABLE DDL to script out the problematic table, remove the special character in the generated script, then run the script to create a new table which you can query on.
To make your life simple, just make sure your CSV's are encoded as UTF-8.
I noticed that the csv source of the original table had column headers with capital letters (X and Y) unlike the column names that were being displayed in Athena.
So I removed the table, edited the csv file so that the headers were lowercase (x and y), then recreated the table and now it works!

Postgres Copy select rows from CSV table

This is my first post to stackoverflow. Your forum has been SO very helpful as I've been learning Python and Postgres on the fly for the last 6 months, that I haven't needed to post yet. But this task is tripping me up and I figure I need to start earning reputation points:
I am creating a python script for backing up data into an SQL database daily. I have a CSV file with an entire months worth of hourly data, but I only want to select a single day of data from from the file and copy those select rows into my database. Am I able to query the CSV table and append the query results into my database? For example:
sys.stdin = open('file.csv', 'r')
cur.copy_expert("COPY table FROM STDIN
SELECT 'yyyymmddpst LIKE 20140131'
WITH DELIMITER ',' CSV HEADER", sys.stdin)
This code and other variations aren't working out - I keep getting syntax errors. Can anyone help me out with this task? Thanks!!
You need create temporary table at first:
cur.execute('CREATE TEMPORARY TABLE "temp_table" (LIKE "your_table") WITH OIDS')
Than copy data from csv:
cur.execute("COPY temp_table FROM '/full/path/to/file.csv' WITH CSV HEADER DELIMITER ','")
Insert necessary records:
cur.execute("INSERT INTO your_table SELECT * FROM temp_table WHERE yyyymmddpst LIKE 20140131")
And don't forget do conn.commit()
Temp table will destroy after cur.close()
You can COPY (SELECT ...) TO an external file, because PostgreSQL just has to read the rows from the query and send them to the client.
The reverse is not true. You can't COPY (SELECT ....) FROM ... . If it were a simple SELECT PostgreSQL could try to pretend it was a view, but really it doesn't make much sense, and in any case it'd apply to the target table, not the source rows. So the code you wrote wouldn't do what you think it does, even if it worked.
In this case you can create an unlogged or temporary table, copy the full CSV to it, and then use SQL to extract just the rows you want, as pointed out by Dmitry.
An alternative is to use the file_fdw to map the CSV file as a table. The CSV isn't copied, it's just read on demand. This lets you skip the temporary table step.
From PostgreSQL 12 you can add a WHERE clause to your COPY statement and you will get only the rows that match the condition.
So your COPY statement could look like:
COPY table
FROM '/full/path/to/file.csv'
WITH( FORMAT CSV, HEADER, DELIMITER ',' )
WHERE yyyymmddpst LIKE 20140131

Create DynamoDB tables using Hive

I have in my cloud, inside a S3 bucket, a CSV file with some data.
I would like to export that data into a DynamoDB table with columns "key" and "value".
Here's the current hive script I wrote:
CREATE EXTERNAL TABLE FromCSV(key string, value string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ', '
LOCATION 's3://mybucket/output/';
CREATE EXTERNAL TABLE hiveTransfer(col1 string, col2 string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "InvertedIndex",
"dynamodb.column.mapping" = "col1:key,col2:value");
INSERT OVERWRITE TABLE hiveTransfer SELECT * FROM FromCSV;
Now, basically the script works. though I would like to make some modifications to this script as follows:
1) The script works only if the table "InvertedIndex" already exists in DynamoDB, I would like the script to create the new table by itself and then put the data as it already does.
2) In the CSV the key is always a string but I have 2 kinds of values, string or integer. I would like the script to distinguish between the two and make two different tables.
Any help with those two modifications will be appriciated.
Thank you
Hi this could be accomplished but it is not trivial case.
1) For creating dynamo table that can't be done by hive because Dynamo Tables are managed by Amazon cloud. One thing which gets in my mind is to create Hive UDF for creating dynamo table and call it inside some dummy query before running insert. For example:
SELECT CREATE_DYNO_TABLE() FROM dummy;
Where dummy table has only one record.
2) You can split loading into two queries where in one query you will use RLIKE operator and [0-9]+ regular expression to detect numeric values and other just negation of that.
HTH,
Dino