AWS Beanstalk - Customize app to create DynamoDB tables

I'm trying to deploy my app (Node.js) using AWS Elastic Beanstalk. I want to create DynamoDB tables during the deployment. I'm trying to use Beanstalk's customization feature, which means writing a config file (YAML) under .ebextensions. I want to create a table something like this:
TestTable with fields:
field 1 (hash key),
field 2 (range key),
field 3,
field 4, ...
When googling, I can only find examples of config files with a single field (e.g. http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-environment-resources-dynamodb.html).
I'm looking for examples of config files with tables containing multiple fields, or documentation covering the full set of features/keywords supported in Beanstalk's YAML config files.

DynamoDB is schemaless, so you don't need to specify field3, field4, etc. You can specify the key schema as shown on the page you linked. The example on that page only uses a hash key for the DynamoDB table, but you can also specify a range key (field2) in the same way the hash key is specified: the config file just needs a RANGE key in addition to the HASH key in KeySchema (see the sketch after the links below). It follows the CloudFormation resource description syntax. See the following links for details:
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-dynamodb-table.html
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-dynamodb-keyschema.html
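For example, a minimal .ebextensions sketch using the AWS::DynamoDB::Table syntax from the links above (table name, attribute names, types, and throughput values are placeholders; field 3, field 4, etc. are omitted because non-key attributes don't need to be declared):
# .ebextensions/dynamodb-table.config -- a sketch, not a drop-in file
Resources:
  TestTable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
        - AttributeName: field1
          AttributeType: S
        - AttributeName: field2
          AttributeType: S
      KeySchema:
        - AttributeName: field1
          KeyType: HASH
        - AttributeName: field2
          KeyType: RANGE
      ProvisionedThroughput:
        ReadCapacityUnits: 1
        WriteCapacityUnits: 1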

Do you want to create a table on each deployment?
If the table is a resource that your app uses, I would go with CloudFormation to define your template (the list of AWS resources you need together, like 3 EC2 machines, a load balancer, an RDS server, a DynamoDB table, and an Elastic Beanstalk app and environment). Then your app only gets a reference to the table's name and uses it (this can be done with an environment variable; see the trimmed sketch after the link below).
If you haven't used CloudFormation before, you might find a slight learning curve at the beginning, but in the end it's relatively simple. A template is a JSON file with declarations, and after you upload your template you can create many instances of it (for example, production and staging).
You can find a snippet of using Elastic Beanstalk in CloudFormation here - http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/quickref-elasticbeanstalk.html
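As a rough, trimmed sketch of the wiring (the Beanstalk application, solution stack, and other required environment properties are omitted, and all names are placeholders), the table's name can be handed to the environment as an environment variable via Ref:
{
  "Resources": {
    "MyTable": {
      "Type": "AWS::DynamoDB::Table",
      "Properties": {
        "AttributeDefinitions": [
          { "AttributeName": "id", "AttributeType": "S" }
        ],
        "KeySchema": [
          { "AttributeName": "id", "KeyType": "HASH" }
        ],
        "BillingMode": "PAY_PER_REQUEST"
      }
    },
    "MyEnvironment": {
      "Type": "AWS::ElasticBeanstalk::Environment",
      "Properties": {
        "ApplicationName": { "Ref": "MyApplication" },
        "OptionSettings": [
          {
            "Namespace": "aws:elasticbeanstalk:application:environment",
            "OptionName": "TABLE_NAME",
            "Value": { "Ref": "MyTable" }
          }
        ]
      }
    }
  }
}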

Related

Access current environment name for a NextJS app running on amplify

I have added a few tables on DynamoDB using the amplify add storage command.
But the tables have a suffix that is the environment name (dev, prod, etc.).
How can I access the environment name in my NextJS backend so I can suffix the DynamoDB table name in my code?
Or is there another way to achieve what I want?
Amplify automatically creates DynamoDB tables (and also AppSync queries, etc.) to match your current Amplify environment. When you create a new environment (e.g. 'dev'), Amplify will automatically create duplicate tables for that environment, which behave the same as the tables in your existing environment. I'm guessing that in your case you won't need to access environment variables.
If you are using AppSync/GraphQL to make calls, then you can use Amplify's built-in dynamic env features here: https://docs.amplify.aws/cli-legacy/graphql-transformer/function/#usage
For example, you could set up a custom Lambda function to update your DynamoDB table. You could then set up an AppSync call to that Lambda in your schema.graphql file.
There are some cases where you may need to access your environment variables. You can either set them up manually in .env.local or, perhaps more easily, determine the current domain from your NextJS JavaScript:
const origin =
  typeof window !== "undefined" && window.location.origin
    ? window.location.origin
    : "";
console.log(origin); // "https://dev.<>.amplifyapp.com"
A better solution would be to follow this Amplify documentation, except I've tried it and it doesn't work.
I've explored each item in the left nav panel and found no sign of the described Environment Variables section.
It describes accessing/updating env vars here, but apparently you can only find/use this feature if you've connected your Amplify app to GitHub first. (It would have been nice if the docs had clarified this!)

AWS Glue - custom s3 partition to single table

I have multiple files stored in an S3 bucket under uniquely named folders, which I would expect AWS Glue to put into a single table - instead it creates one table per file. Any ideas how to configure the crawler to get a single table?
The current S3 structure is s3://bucket_name/YYYYMMDDUUID/data.json:
20210801123123cfec/data.json
20210808876551cedc/data.json
....
20210810112313feed/data.json
The JSON schema is definitely not the problem; it is similar across files - for example, when I change the folder names from the custom names to "1", "2", etc., I get a single table with multiple partitions.

Assigning default table properties to tables created by a crawler

I'm trying to assign table properties to the tables that are created with a crawler.
The idea is that all tables created by a crawler get the same default properties (in addition to the ones they'd usually have).
I examined the options in the crawler creation interface but didn't see such an option.
Creating a python boto3 script to alter table property values after the table creation is the only thing that comes to mind.
If this is not possible with the default crawler functionality, what is a viable approach to attach table properties to every table that is created with a certain crawler?
EDIT: One possible solution would be to create a Lambda function that checks whether the custom parameters exist on the Glue tables and, if not, creates them.
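A minimal boto3 sketch of that idea, assuming the properties live in each table's Parameters map (the database name, default parameters, and read-only field list are assumptions that may need adjusting):
import boto3

glue = boto3.client("glue")

DATABASE = "my_database"  # placeholder database name
DEFAULT_PARAMETERS = {"owner": "data-team", "source": "crawler"}  # placeholder properties

# Fields returned by get_tables that update_table's TableInput does not accept;
# extend this if your tables return additional read-only fields.
READ_ONLY_FIELDS = (
    "DatabaseName", "CreateTime", "UpdateTime", "CreatedBy",
    "IsRegisteredWithLakeFormation", "CatalogId", "VersionId",
)

paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName=DATABASE):
    for table in page["TableList"]:
        missing = {k: v for k, v in DEFAULT_PARAMETERS.items()
                   if k not in table.get("Parameters", {})}
        if not missing:
            continue  # table already has the custom parameters
        table_input = {k: v for k, v in table.items() if k not in READ_ONLY_FIELDS}
        table_input["Parameters"] = {**table.get("Parameters", {}), **missing}
        glue.update_table(DatabaseName=DATABASE, TableInput=table_input)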
Option 1
Directly adding the fields in the definition might be the best way to approach this (using CloudFormation).
https://docs.amazonaws.cn/en_us/AWSCloudFormation/latest/UserGuide/aws-properties-glue-classifier-csvclassifier.html
Option 2
I guess there's some reason why you do not add the table fields directly. If this should be triggered by the data itself, the clean way to handle it is to write custom classifiers:
https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
Option 3
When you need a quick hack, you could merge the schema by crawling an additional file containing the missing schema info and letting the crawler merge the fields:
If you have JSON files in S3, for example (or any format that is consistent for your use case), you can add an additional init file and declare the columns there. Set the crawler configuration to:
{
  "Version": 1.0,
  "Grouping": {
    "TableGroupingPolicy": "CombineCompatibleSchemas"
  }
}
Quoting from the AWS docs:
"To help illustrate this option, suppose that you define a crawler with an include path s3://bucket/table1/. When the crawler runs, it finds two JSON files with the following characteristics:
File 1 – S3://bucket/table1/year=2017/data1.json
File content – {“A”: 1, “B”: 2}
Schema – A:int, B:int
File 2 – S3://bucket/table1/year=2018/data2.json
File content – {“C”: 3, “D”: 4}
Schema – C: int, D: int
By default, the crawler creates two tables, named year_2017 and year_2018 because the schemas are not sufficiently similar. However, if the option Create a single schema for each S3 path is selected, and if the data is compatible, the crawler creates one table. The table has the schema
A:int,B:int,C:int,D:int and partitionKey year:string.
See https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html
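If you prefer to apply this grouping behaviour programmatically rather than through the console, a hedged boto3 sketch (the crawler name is a placeholder; Configuration must be passed as a JSON string):
import json
import boto3

glue = boto3.client("glue")

# Equivalent of ticking "Create a single schema for each S3 path" in the console.
configuration = {
    "Version": 1.0,
    "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
}

glue.update_crawler(
    Name="my-crawler",  # placeholder crawler name
    Configuration=json.dumps(configuration),
)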

Can AWS Glue write to DynamoDB?

I need to do a grouping job over a Source DynamoDB table, then write each resulting Item to another Target DynamoDB table (or a secondary index of the Source one).
Here I see that DynamoDB can be used as a Source (as also reported in Connection Types).
However, it's not clear to me whether a DynamoDB table can be used as a Target as well.
Note: each resulting grouping item must be written into a separate DynamoDB Item (i.e., if there are X objects resulting from grouping, X Items must be written to Target DynamoDB table).
Glue can now read and write to DynamoDB. The option to write is not available via the console, but can be done by editing the script.
Example:
Datasink1 = glueContext.write_dynamic_frame_from_options(
    frame=ApplyMapping_Frame1,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": "myDDBTable",
        "dynamodb.throughput.write.percent": "1.0"
    }
)
As per:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#etl-connect-dynamodb-as-sink
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-dynamo-db-cross-account.html
The Glue job scripts can be customized to write to any data source. If you are using the auto-generated scripts, you can add the boto3 library to write to DynamoDB tables.
If you want to test the scripts easily, you can create a dev endpoint through the AWS console and launch a Jupyter notebook to write and test your Glue job scripts.
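For example, a minimal boto3 sketch for writing grouped results to a target table (the table name and item fields are placeholders, and grouped_items stands in for the output of your grouping logic):
import boto3

dynamodb = boto3.resource("dynamodb")
target = dynamodb.Table("TargetTable")  # placeholder table name

# Stand-in for the X items produced by your grouping job.
grouped_items = [
    {"group_key": "A", "count": 10},
    {"group_key": "B", "count": 7},
]

# batch_writer buffers put_item calls and sends them in batches of up to 25.
with target.batch_writer() as batch:
    for item in grouped_items:
        batch.put_item(Item=item)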

AWS Amplify filter for #searchable annotation

Currently I am using a DynamoDB instance for my social media application. While designing the schema I stuck to the "one table" rule, so I am putting all data in the same table: posts, users, comments, etc. Now I want to make flexible queries over my data. I found out that I could use the #searchable annotation to create an Elasticsearch instance for a table that is annotated with #model.
In my GraphQL schema I only have one #model, since I only have one table. My problem now is that I don't want to make everything in the table searchable, since that would most likely be very expensive. There is some data that doesn't have to be added to the Elasticsearch instance (for example, comment-related data). How could I handle this? Do I really have to split my schema into multiple tables to be able to manage the #searchable annotation? Couldn't I decide whether a row should be sent to Elasticsearch based on the partition key / primary key, acting like a filter?
The current implementation of the amplify-cli uses a predefined Python Lambda that is added once you add the #searchable directive to one of your models.
The Lambda code cannot be edited and currently there is no option to define a custom Lambda; you can read about it here:
https://github.com/aws-amplify/amplify-cli/issues/1113
https://github.com/aws-amplify/amplify-cli/issues/1022
If you want a custom Lambda where you can filter what goes to the Elasticsearch Instance, you can follow the steps described here https://github.com/aws-amplify/amplify-cli/issues/1113#issuecomment-476193632
The closest you can get is by creating a template in amplify\backend\api\myapiname\stacks\ where you can manage all the resources related to Elasticsearch. A good starting point is to:
Add #searchable to one of your models in schema.graphql
Run amplify api gql-compile
Copy the generated template from the build folder, \amplify\backend\api\myapiname\build\stacks\SearchableStack.json, to amplify\backend\api\myapiname\stacks\
Remove the #searchable directive from the model added in step 1
Start editing your new template copied in step 3
Add a Lambda and use it in the template as the resolver for the DynamoDB Stream
Using this approach will give you total control of the resources related to the Elasticsearch service, but it will also require you to manage them all on your own.
Or just create a table for each model.
Hope it helps
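If you do go with a custom Lambda as the DynamoDB stream resolver (step 6 above), a rough Python sketch of filtering by partition key before indexing might look like this; the key name "PK", the "POST#" prefix, and the index_to_elasticsearch helper are hypothetical placeholders, not Amplify-provided APIs:
def lambda_handler(event, context):
    # Iterate over the DynamoDB stream records delivered to the Lambda.
    for record in event.get("Records", []):
        keys = record.get("dynamodb", {}).get("Keys", {})
        pk = keys.get("PK", {}).get("S", "")

        # Skip rows that should not be searchable (e.g. comment items).
        if not pk.startswith("POST#"):
            continue

        index_to_elasticsearch(record)


def index_to_elasticsearch(record):
    # Placeholder: sign and send a request to your Elasticsearch/OpenSearch
    # domain here (for example with the requests library and SigV4 auth).
    pass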
It is now possible to override the generated streaming function code as well.
Thanks to AWS Support for the information provided.
I left a message on the related GitHub issue as well: https://github.com/aws-amplify/amplify-category-api/issues/437#issuecomment-1351556948
All you need to do is:
run amplify override api
edit the corresponding override.ts
change the code via resources.opensearch.OpenSearchStreamingLambdaFunction.code:
resources.opensearch.OpenSearchStreamingLambdaFunction.functionName = 'python_streaming_function';
resources.opensearch.OpenSearchStreamingLambdaFunction.handler = 'index.lambda_handler';
resources.opensearch.OpenSearchStreamingLambdaFunction.code = {
  zipFile: `
# python streaming function customized code goes here
  `
};
Resources:
[1] https://docs.amplify.aws/cli/graphql/override/#customize-amplify-generated-resources-for-searchable-opensearch-directive
[2] AWS::Lambda::Function Code - Properties - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-lambda-function-code.html#aws-properties-lambda-function-code-properties