We'd like to use DynamoDB for several applications (each with multiple tables). Is there any way to group tables together (something like folders)? I tried tagging the tables, but when I created a resource group I didn't see DynamoDB under the resource types. Thanks.
Currently, there is no way to group or organize tables in the AWS Console. In most cases, name prefixes are used to keep related tables together in the list.
e.g.:
prod_users
prod_tenants
stag_users
stag_tenants
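If you need to work with one application's tables from code, the prefix also gives you something to filter on. A minimal boto3 sketch (the prod_ prefix is just the example above):

import boto3

dynamodb = boto3.client("dynamodb")

def tables_with_prefix(prefix):
    # Page through ListTables and keep the names that start with the prefix.
    names = []
    paginator = dynamodb.get_paginator("list_tables")
    for page in paginator.paginate():
        names.extend(n for n in page["TableNames"] if n.startswith(prefix))
    return names

print(tables_with_prefix("prod_"))  # e.g. ['prod_tenants', 'prod_users']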
I am copying an entire Snowflake DB into S3 to be viewed through Athena. I would like to preserve the schema/hierarchy so that the corresponding queries do not change. All the files are organized properly for this in S3 as follows:
DataBase/Schema/Folder/Table/{parquet files}
When I crawl with Glue, they all end up in one DB at the same level. Is it possible to have a similar folder structure in Athena?
Right now all queries in Athena are like
Select *
FROM database.table
I would like to have
Select *
FROM database.schema.folder.table
The only logical grouping of tables available in Athena is a database, and as you have indicated, there is no concept of hierarchy, schemas, or folders in Athena.
Database and schema comprise a namespace in Snowflake. If your intention is simply to have a similar namespace, you can combine the Snowflake database name d1 and schema name s1 into a flattened logical grouping in Athena, d1_s1. Then you can do:
SELECT * FROM d1_s1.table
Also, the only special character allowed in an Athena database name is the underscore, so there really is no other way to preserve the structure or the existing queries. At least this way the format is close enough that it should be easy to programmatically fix the existing queries (e.g., using a regex to replace a.b.c with a_b.c).
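A minimal sketch of that rewrite in Python (the pattern assumes plain, unquoted three-part identifiers; it is not a general SQL parser):

import re

def flatten_namespace(sql):
    # Rewrite database.schema.table into database_schema.table.
    # Assumes unquoted identifiers; a real migration would also need
    # to handle quoted names, comments, and string literals.
    return re.sub(r"\b(\w+)\.(\w+)\.(\w+)\b", r"\1_\2.\3", sql)

print(flatten_namespace("SELECT * FROM d1.s1.table1"))
# SELECT * FROM d1_s1.table1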
However, there will still be differences. For example, grants are managed differently for Snowflake databases and schemas. Schemas also have a concept of managed access. This will not be possible in Athena.
I've searched the documentation a lot, but couldn't find anything that allows me to do the following:
Create a role that grants full table access only to tables with certain names (e.g., "table1"), including tables that will be created in the future. This should work across all available datasets in a GCP project, and also across datasets that will be created in the future.
Is this possible? If not directly, indirectly maybe?
Thanks.
The simplest way to do that would be to create a dataset to house such tables, and set its access policy to what you need. Tables requiring a different set of policies should be housed in other datasets.
More information here: https://cloud.google.com/bigquery/docs/dataset-access-controls
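For illustration, granting dataset-level access with the google-cloud-bigquery client might look like this (the project, dataset name, and group email are placeholder assumptions):

from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.restricted_tables")  # hypothetical dataset

# Append a READER grant for a group; keep the existing entries intact.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="groupByEmail",
        entity_id="table1-readers@example.com",  # hypothetical group
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])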
So, I've used Glue before, but it's been with a single file <> single folder relationship.
What I'm trying to do now is have a structure like this, where the crawler creates an individual table for each folder:
- Data Bucket
  - Table 1 Folder
    - file1.csv
    - file2.csv
  - Table 2 Folder
    - file1.csv
    - file2.csv
...and so on.
But every time I create the crawler and set the Data Bucket as the data source, I only get a single table created. I've tried every combination of the "create a single schema ...etc" options I can think of.
I'm hoping that I don't have to add each sub-folder as a separate data source, as my ultimate goal is to eventually move this into an RDS instance. I'm hoping to keep the high-level bucket as the single data source if possible. I can easily tweak the folder/file structure if needed.
And yes, I'm aware of partitioning, but isn't that only applicable to individual tables?
Thanks!
I ran into the same issue, and digging into the Glue docs I found that setting the table level in the crawler's output configuration does the trick.
The table level is counted from the bucket root; in your case, I believe setting the table level to 2 (the first folder level below the root) would do the trick. A level of 2 means the table definitions start at that depth.
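A hedged sketch of setting this via boto3 (the crawler name, IAM role, database, and bucket are placeholders; the same value can also be set in the console under the crawler's output configuration):

import boto3
import json

glue = boto3.client("glue")

glue.create_crawler(
    Name="data-bucket-crawler",   # hypothetical name
    Role="GlueCrawlerRole",       # hypothetical IAM role
    DatabaseName="my_database",   # hypothetical catalog database
    Targets={"S3Targets": [{"Path": "s3://data-bucket/"}]},
    # Level 2 = one folder below the bucket root becomes a table.
    Configuration=json.dumps(
        {"Version": 1.0, "Grouping": {"TableLevelConfiguration": 2}}
    ),
)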
I've been trying to accomplish the same thing. I was hoping that Glue would magically see the different folders and automatically create separate tables. Glue seems to want to create a single table, especially when the schemas overlap. In my example, I'm using US census data, so there are some common fields, especially at the beginning of each file.
In the end, I was able to get this to work by creating multiple data stores in the Glue crawler. By doing this, it created the five separate tables I wanted, but I had to add each folder manually. Still hoping to find a way to get Glue to discover them automatically.
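One way to avoid adding every folder by hand is to list the top-level prefixes yourself and feed them to the crawler as separate targets. A sketch with boto3 (the bucket and crawler names are placeholders):

import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# List the top-level "folders" in the bucket.
resp = s3.list_objects_v2(Bucket="data-bucket", Delimiter="/")
prefixes = [p["Prefix"] for p in resp.get("CommonPrefixes", [])]

# Point an existing crawler at one target per folder.
glue.update_crawler(
    Name="data-bucket-crawler",  # hypothetical existing crawler
    Targets={
        "S3Targets": [{"Path": f"s3://data-bucket/{p}"} for p in prefixes]
    },
)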
We have a very large number of folders and files in S3, all under one particular folder, and we want to crawl for all the CSV files, and then query them from one table in Athena. The CSV files all have the same schema. The problem is that the crawler is generating a table for every file, instead of one table. Crawler configurations have a checkbox option to "Create a single schema for each S3 path" but this doesn't seem to do anything.
Is what I need possible? Thanks.
Glue crawlers claim to solve many problems, but in fact solve few. If you're slightly outside the scope of what they were designed for, you're out of luck. There might be a way to configure one to do what you want, but in my experience, trying to make Glue crawlers do things they weren't built for is not worth the effort.
It sounds like you have a good idea of what the schema of your data is. When that is the case, Glue crawlers provide very little value. You probably have a better idea of what the schema should look like than Glue will ever be able to figure out.
I suggest that you manually create the table and write a one-off script that lists all the partition locations on S3 that you want to include in the table, then generates ALTER TABLE ADD PARTITION … SQL, or Glue API calls, to add those partitions to the table.
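A sketch of such a one-off script, assuming Hive-style dt=... partition folders under a single prefix (the bucket, table, and output location are placeholders):

import boto3

s3 = boto3.client("s3")
athena = boto3.client("athena")

# Find the partition "folders" under the table's prefix.
resp = s3.list_objects_v2(
    Bucket="data-bucket", Prefix="tables/events/", Delimiter="/"
)
partitions = [p["Prefix"] for p in resp.get("CommonPrefixes", [])]

# Build one ALTER TABLE statement that adds every partition.
clauses = []
for prefix in partitions:
    dt = prefix.rstrip("/").split("=")[-1]  # assumes .../dt=2020-02-03/
    clauses.append(
        f"PARTITION (dt = '{dt}') LOCATION 's3://data-bucket/{prefix}'"
    )

if clauses:
    sql = "ALTER TABLE events ADD IF NOT EXISTS " + " ".join(clauses)
    athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "my_database"},
        ResultConfiguration={"OutputLocation": "s3://query-results-bucket/"},
    )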
To keep the table up to date when new partition locations are added, have a look at this answer for guidance: https://stackoverflow.com/a/56439429/1109
One way to do what you want is to use one of the tables created by the crawler as an example and create a similar table manually (in AWS Glue -> Tables -> Add tables, or in Athena itself) with a statement like:
CREATE EXTERNAL TABLE `tablename`(
  `column1` string,
  `column2` string, ...)
To use an existing table as the example, you can see the query used to create it in Athena: go to Database, select your database from the Glue Data Catalog, click the three dots next to the automatically created crawler table you chose as an example, and choose the "Generate Create Table DDL" option. It will generate a big query for you; modify it as necessary (I believe you mostly need to look at the LOCATION and TBLPROPERTIES parts).
When you run this modified query in Athena, a new table will appear in the Glue Data Catalog. But it will not have any information about your S3 files and partitions, and the crawler most likely will not update the metastore info for you. So in Athena you can run an "MSCK REPAIR TABLE tablename;" query (it's not very efficient, but it works for me), and it will add the missing file information. In the Result tab you will see something like this (if you use partitions on S3, of course):
Partitions not in metastore: tablename:dt=2020-02-03 tablename:dt=2020-02-04
Repair: Added partition to metastore tablename:dt=2020-02-03
Repair: Added partition to metastore tablename:dt=2020-02-04
After that you should be able to run your Athena queries.
I would like to know if it is possible to make multiple DynamoDB requests using only one DynamoDB resolver in AppSync.
Or is using a Lambda function the only/best way to do more complicated processing?
Practically, no. You cannot even query multiple indexes in a single resolver definition for one query.
However, if you want to use that structure for joining multiple DynamoDB tables, you can attach resolvers not to the query entry but to the fields that relate one type to another.
I had a similar issue relating users to another table containing their posts, and I got past it by attaching a resolver to the posts field of the User type.
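As an illustration, attaching such a field-level resolver with boto3 might look like this (the API ID, data source name, and key names are placeholder assumptions, not details from the question):

import boto3

appsync = boto3.client("appsync")

# Resolve User.posts against the posts table instead of the top-level query.
appsync.create_resolver(
    apiId="YOUR_API_ID",           # hypothetical API id
    typeName="User",
    fieldName="posts",
    dataSourceName="PostsTable",   # hypothetical DynamoDB data source
    requestMappingTemplate="""
{
    "version": "2017-02-28",
    "operation": "Query",
    "query": {
        "expression": "authorId = :id",
        "expressionValues": {
            ":id": { "S": "$context.source.id" }
        }
    }
}
""",
    responseMappingTemplate="$util.toJson($context.result.items)",
)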
This issue refers to a similar problem and is quite helpful for that kind of case: https://github.com/awslabs/aws-mobile-appsync-sdk-js/issues/17
If that's not your case, you can elaborate on the question; I may, after all, just be guessing your purpose for relating the tables.
Have you looked at batch resolvers with AWS AppSync? https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-dynamodb-batch.html
This will allow you to write to one or more tables in a single request, and also allow you to do multiple write/read/delete operations in a single request.
You can do it with pipeline resolvers
https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-pipeline-resolvers.html
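A rough boto3 sketch of wiring up a pipeline resolver from two existing AppSync functions (all IDs and names are placeholders; each function would hold one DynamoDB operation):

import boto3

appsync = boto3.client("appsync")

# A pipeline resolver runs its functions in order, so two DynamoDB
# operations can back a single query field.
appsync.create_resolver(
    apiId="YOUR_API_ID",              # hypothetical API id
    typeName="Query",
    fieldName="getUserWithPosts",     # hypothetical field
    kind="PIPELINE",
    pipelineConfig={
        "functions": ["FUNCTION_ID_1", "FUNCTION_ID_2"]  # from create_function
    },
    requestMappingTemplate="{}",
    responseMappingTemplate="$util.toJson($context.result)",
)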