Can you make a table read-only in QuestDB?

I have a few big historical data tables in QuestDB and don't want to accidentally modify the data in them.
Is there a way to lock a table or make it read-only in QuestDB?

You cannot have any security or permissions over tables in QuestDB.
As a workaround, however, you can use file-system security and allow only read-only access to the files in the table folders. The directory structure is very simple in QuestDB:
root/
  conf/
  db/
    tableA/
    tableB/
    ...
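
For example, here is a minimal sketch in Python that strips the write bits from one table's folder, assuming a hypothetical QuestDB root at /var/lib/questdb and a table named tableA (adjust the path to your installation). QuestDB can still read the files, but writes to that table will then fail at the OS level.

import stat
from pathlib import Path

TABLE_DIR = Path("/var/lib/questdb/db/tableA")  # hypothetical table folder

def make_read_only(root: Path) -> None:
    # Drop the write bits for user, group and others on the folder
    # and everything inside it, so accidental writes are rejected.
    for path in [root, *root.rglob("*")]:
        mode = path.stat().st_mode
        path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

make_read_only(TABLE_DIR)

To allow writes again, restore the write bit (e.g. chmod -R u+w on the folder).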

Related

Athena tables keeping a history of records from every CSV

I am uploading CSV files to an S3 bucket, creating tables through a Glue crawler, viewing the tables in Athena, making a connection between Athena and QuickSight, and showing the results graphically there in QuickSight.
But what I need to do now is keep a history of the files uploaded. Instead of a new CSV file being uploaded and the crawler updating the table, can I have the crawler save each record separately? Or is that even a reasonable thing to do, since I wonder whether it would then create so many tables that it would be a mess?
I'm just trying to figure out a way to keep a history of previous records. How can I achieve this?
When you run an Amazon Athena query, Athena will look at the location parameter defined in the table's DDL. This specifies where the data is stored in an Amazon S3 bucket.
Athena will include all files in that location when it runs the query on that table. Thus, if you wish to add more data to the table, simply add another file in that S3 location. To replace data in that table, you can overwrite the file(s) in that location. To delete data, you can delete files from that location.
There is no need to run a crawler on a regular basis. The crawler can be used to create the table definition and it can be run again to update the table definition if anything has changed. But you typically only need to use the crawler once to create the table definition.
If you wish to preserve historical data in the table while adding more data to the table, simply upload the data to new files and keep the existing data files in place. That way, any queries will include both the historical data and the new data because Athena simply looks at all the files in that location.
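
As a concrete sketch, assuming a table whose LOCATION is s3://my-bucket/sales/ (a hypothetical bucket and prefix) and that boto3 credentials are configured, appending data is just uploading another object under that prefix; the existing files stay in place, so the history is preserved:

import boto3

s3 = boto3.client("s3")

# Add new data under the table's LOCATION; older CSVs are left untouched,
# so the next Athena query sees both the historical and the new records.
s3.upload_file("sales_2024_06.csv", "my-bucket", "sales/sales_2024_06.csv")

# Replacing data would mean overwriting an existing key instead, and deleting
# data would be: s3.delete_object(Bucket="my-bucket", Key="sales/old.csv")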

Can you have a schema or folder structure in AWS Athena?

I am copying an entire Snowflake DB into S3 to be viewed through Athena. I would like to preserve the schema/hierarchy so that the corresponding queries do not change. All the files are organized properly for this in S3 as follows:
DataBase/Schema/Folder/Table/{parquet files}
When I crawl with Glue they all end up in one DB at the same level. Is it possible to have a similar folder structure in Athena?
Right now all queries in Athena are like
Select *
FROM database.table
I would like to have
Select *
FROM database.schema.folder.table
The only logical grouping of tables available in Athena is a database, and as you have indicated, there is no concept of hierarchy, schemas, or folders in Athena.
Database and schema comprise a namespace in Snowflake. If your intention is to simply have a similar namespace, what you can do is combine the Snowflake database d1 and schema name s1 to create a flattened logical grouping in Athena d1_s1. Then you can do:
SELECT * FROM d1_s1.table
Also, the only special character you can have in an Athena database name is an underscore, so there really is no other way to preserve the structure or the existing queries. At least this way the format is close enough that it should be straightforward to fix the existing queries programmatically (e.g., using a regex to replace a.b.c with a_b.c).
However, there will still be differences. For example, grants are managed differently for Snowflake databases and schemas. Schemas also have a concept of managed access. This will not be possible in Athena.
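
As a minimal sketch of that programmatic rewrite in Python (the pattern is naive and assumes three-part references are always database.schema.table, so fully qualified column names would need more care):

import re

def flatten_namespace(query: str) -> str:
    # d1.s1.customer  ->  d1_s1.customer
    return re.sub(r"\b(\w+)\.(\w+)\.(\w+)\b", r"\1_\2.\3", query)

print(flatten_namespace("SELECT * FROM d1.s1.customer c JOIN d1.s1.orders o ON c.id = o.cust_id"))
# SELECT * FROM d1_s1.customer c JOIN d1_s1.orders o ON c.id = o.cust_id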

Having trouble setting up multiple tables in AWS glue from a single bucket

So, I've used Glue before, but only with a single-file-to-single-folder relationship.
What I'm trying to do now is to have a structure like this create individual tables for each folder:
- Data Bucket
  - Table 1 Folder
    - file1.csv
    - file2.csv
  - Table 2 Folder
    - file1.csv
    - file2.csv
...and so on.
But every time I create the crawler and set the Data Bucket as the data source, I only get a single table created. I've tried every combination of the "create single schema ...etc" settings I can think of.
I'm hoping that I don't have to add each sub-folder as a separate data source as my ultimate goal is to translate it eventually into an RDS instance. Hoping to keep the high-level bucket as the single data source if possible. I can easily tweak folder/file structure if needed.
And yes, I'm aware of partitioning, but isn't that only applicable to individual tables?
Thanks!
I ran into the same issue and, digging into the Glue docs, I found that setting the table level in the crawler's output configuration does the trick.
The table level is counted from the bucket level; in your case, I believe setting the table level to 2 (the first folder after the root) would do the trick. A value of 2 means that the table definitions start at that depth.
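A minimal sketch of that configuration with boto3 (the crawler name, role ARN, database and bucket are hypothetical):

import json
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="data-bucket-crawler",                       # hypothetical name
    Role="arn:aws:iam::123456789012:role/GlueRole",   # hypothetical role
    DatabaseName="my_database",
    Targets={"S3Targets": [{"Path": "s3://data-bucket/"}]},
    # Table level 2 = table definitions start one folder below the bucket root
    Configuration=json.dumps(
        {"Version": 1.0, "Grouping": {"TableLevelConfiguration": 2}}
    ),
)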
I've been trying to accomplish the same thing. I was hoping that Glue would magically see the different folders and automatically create separate tables. Glue seems to want to create a single table, especially when the schemas overlap. In my example, I'm using US census data so there are some common fields, especially in the beginning of each file.
In the end, I was able to get this to work by creating multiple data stores in the Glue Crawler. By doing this, it would create the five separate tables I wanted, but I had to add each folder manually. Still hoping to find a way to get Glue to discover them automatically.
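A sketch of that multiple-data-stores setup with boto3 (same hypothetical names as above): each folder is registered as its own S3 target, so the crawler produces one table per folder even when the schemas overlap:

import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="data-bucket-per-folder-crawler",
    Role="arn:aws:iam::123456789012:role/GlueRole",
    DatabaseName="my_database",
    Targets={
        "S3Targets": [
            {"Path": "s3://data-bucket/table1/"},
            {"Path": "s3://data-bucket/table2/"},
            # ...one entry per table folder
        ]
    },
)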

Is it possible to change a database (schema) name in AWS Athena?

I created a database and some tables with data on AWS Athena and would like to rename the database without deleting and re-creating the tables and database. Is there a way to do this? I tried the standard SQL ALTER DATABASE but it doesn't seem to work.
thanks!
I'm afraid there is no way to do this according to this official forum thread. You would need to remove the database and re-create it. However, since Athena does not store any data by itself, deleting a table or a database won't impact your data stored in S3. Therefore, if you kept all the scripts that create the external tables, re-creating the database should be a fairly quick thing to do.
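A minimal sketch of that re-creation with boto3, assuming the CREATE EXTERNAL TABLE statements were saved to local .sql files and that the names, paths and results bucket are hypothetical. Because Athena tables only point at data in S3, re-running the DDL under the new database name never touches the data itself:

import boto3

athena = boto3.client("athena")
RESULTS = "s3://my-athena-query-results/"          # hypothetical results bucket

def run_ddl(sql: str) -> None:
    # Fire-and-forget DDL; in real use, poll get_query_execution() for status
    athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": RESULTS},
    )

run_ddl("CREATE DATABASE new_db_name")             # the "renamed" database

# Re-run each saved CREATE EXTERNAL TABLE script against the new database,
# assuming the scripts qualify table names with the database name
for script in ["customers.sql", "orders.sql"]:     # hypothetical files
    with open(script) as f:
        run_ddl(f.read().replace("old_db_name.", "new_db_name."))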
Athena doesn't support renaming a database. You need to recreate the database with a new name.
You can also use Presto, the open-source engine that Athena is built on, which supports more DDL statements.

PostgreSQL: update table with new records from the same table on remote server

We have a PostgreSQL server running in production and plenty of workstations with isolated development environments. Each one has its own local PostgreSQL server (with no replication with the production server). Developers need to receive the updates stored in the production server periodically.
I am trying to figure out how to dump the contents of several selected tables from the server in order to update the tables on the development workstations. The biggest challenge is that the tables I'm trying to synchronize may have diverged (developers may add, but not delete, new fields to the tables through the Django ORM, while the schema of the production database remains unchanged for a long time).
Therefore, the updated records and new fields of the tables stored on the workstations must be preserved from being overwritten.
I guess that direct dumps (e.g. pg_dump -U remote_user -h remote_server -t table_to_copy source_db | psql target_db) are not suitable here.
UPD: If possible, I would also like to avoid the use of a third (intermediate) database while transferring the data from the production database to the workstations.
I would recommend the following approach.
I'll outline an example based on a single table, customer.
1. We want to copy some entries from this table on production. Obviously, a full table dump would break the new stuff that exists in the development environments;
2. Therefore, create a table with a similar structure but a different name, say customer_$. Another way is to create a dedicated schema for such "copying" tables. You might also want to include a couple of extra columns there, like copy_id and/or copy_stamp;
3. Now you can INSERT INTO customer_$ SELECT ... to populate your copying table with the wanted data. You might need to think about how to do this, though. In the tool we use here we can supply the predicate via the -w switch, like -w "customer_id IN (SELECT id FROM cust2copy)";
4. After you've populated your copying table(s), you can dump them. Make sure to use the following switches to pg_dump:
   - --column-inserts to explicitly list the target columns, because on the development env the copying table might have changed its structure. This might be "slow" for big volumes, though;
   - --table / -t to specify the tables to dump;
5. On the target env, make sure to (1) empty the copying tables and (2) prevent parallel activities of a similar nature;
6. Load the data into the copying tables;
7. The most interesting part comes now: you need to check that the data you're about to INSERT into the main tables will not conflict with any of the constraints defined on those tables. You might have:
   - PRIMARY KEY violations: you can (1) replace existing entries, (2) merge entries together, (3) skip entries from the copying tables, or (4) assign different IDs in the copying tables;
   - UNIQUE KEY violations: most likely you'll have to UPDATE some columns in the copying tables;
   - FOREIGN KEY violations: you'll either have to give up on such entries, or copy over the missing stuff from production as well;
   - CHECK violations: you'll have to investigate these manually;
8. After the checks are done and the data in the copying tables is fixed, you can copy it into the main tables (see the sketch after this list).
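A minimal sketch of steps 2-8 in Python with psycopg2 and pg_dump, assuming a single customer table with an integer primary key id, and hypothetical hosts, databases, users, predicate table and column names (passwords are assumed to come from .pgpass or the environment):

import subprocess
import psycopg2

# --- production side: build and dump the copying table (steps 2-4) ---------
prod = psycopg2.connect("host=prod-db dbname=app user=readonly")
with prod, prod.cursor() as cur:
    cur.execute("CREATE TABLE customer_$ (LIKE customer INCLUDING DEFAULTS)")
    cur.execute("""
        INSERT INTO customer_$
        SELECT * FROM customer
        WHERE id IN (SELECT id FROM cust2copy)   -- hypothetical predicate table
    """)

subprocess.run(
    ["pg_dump", "-h", "prod-db", "-U", "readonly", "-d", "app",
     "--column-inserts",          # explicit column lists survive schema drift
     "--no-owner",                # avoid ownership mismatches on the dev box
     "-t", "customer_$",          # dump only the copying table
     "-f", "customer_copy.sql"],
    check=True,
)

# --- workstation side: load and merge (steps 5-8) --------------------------
dev = psycopg2.connect("host=localhost dbname=app user=dev")
with dev, dev.cursor() as cur:
    # step 5: "empty" the copying table; dropping it lets the dump recreate it
    cur.execute("DROP TABLE IF EXISTS customer_$")

# step 6: load the dump, which recreates and fills customer_$
subprocess.run(["psql", "-h", "localhost", "-U", "dev", "-d", "app",
                "-f", "customer_copy.sql"], check=True)

with dev, dev.cursor() as cur:
    # steps 7-8: one simple conflict policy is to skip PRIMARY KEY clashes so
    # locally modified rows are kept; list the production columns explicitly,
    # since the dev table may have extra fields with defaults
    cur.execute("""
        INSERT INTO customer (id, name, email)   -- hypothetical column list
        SELECT id, name, email FROM customer_$
        ON CONFLICT (id) DO NOTHING
    """)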
This is a very formal description of the approach. For step #7, for example, we have a whole pile of extra tools to remap IDs or ID ranges, manipulate data in the copying tables, adjust security settings, ownership, some defaults, etc.
Also, we have a so-called catalogue for this tool, which allows us to group logically related tables under common names. To copy customers from production, say, we have to check around 50 tables in order to satisfy all possible dependencies.
I haven't seen similar tools in the wild so far, though.