My company has a very restrictive policy on the QA environment's AWS account. I need a way to clean up DynamoDB tables using Jenkins. One approach I considered was AWS CLI commands, but I could not find a way to wipe out the contents of a DynamoDB table using just the AWS CLI. If such a command existed, I could easily run it from Jenkins. Any insights would be really helpful.
We had the same problem. Deleting records in bulk is a time-consuming, costly process.
We delete the table and recreate it, then recreate the data as needed.
I have not seen anything special with Jenkins beyond running an AWS CLI shell script.
Hope it helps.
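If it helps, here is a rough boto3 sketch of that delete-and-recreate flow (the same steps map onto aws dynamodb describe-table / delete-table / create-table in a Jenkins shell step). The table name is hypothetical and the sketch assumes a table with no secondary indexes:

import boto3

def recreate_table(table_name):
    client = boto3.client("dynamodb")
    # Capture the existing key schema and attribute definitions before deleting.
    desc = client.describe_table(TableName=table_name)["Table"]
    client.delete_table(TableName=table_name)
    client.get_waiter("table_not_exists").wait(TableName=table_name)
    client.create_table(
        TableName=table_name,
        KeySchema=desc["KeySchema"],
        AttributeDefinitions=desc["AttributeDefinitions"],
        BillingMode="PAY_PER_REQUEST",  # assumption: on-demand; copy ProvisionedThroughput instead if needed
    )
    client.get_waiter("table_exists").wait(TableName=table_name)

recreate_table("qa-orders")  # hypothetical QA table name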
There are a couple of concerns here. If you allow Jenkins to access DynamoDB directly and delete content, make sure the IAM policy attached to the credentials used by the AWS CLI grants fine-grained access, restricting delete permissions to the particular tables involved.
Another approach is to have Jenkins publish to an SNS topic (via HTTP, email, etc.) that triggers a Lambda function to delete the content. That way you do not need to give your Jenkins server any DynamoDB permissions, and the script can be version controlled and managed outside Jenkins.
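A minimal sketch of such a Lambda handler, assuming a hypothetical table name and key attribute names that are not reserved words:

import boto3

TABLE_NAME = "qa-orders"  # hypothetical table; could also come from the SNS message

def handler(event, context):
    client = boto3.client("dynamodb")
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    # Only the key attributes are needed to delete an item.
    key_names = [k["AttributeName"] for k in client.describe_table(TableName=TABLE_NAME)["Table"]["KeySchema"]]
    scan_kwargs = {"ProjectionExpression": ", ".join(key_names)}
    with table.batch_writer() as batch:
        while True:
            page = table.scan(**scan_kwargs)
            for item in page["Items"]:
                batch.delete_item(Key={k: item[k] for k in key_names})
            if "LastEvaluatedKey" not in page:
                break
            scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]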
You can also consider using DynamoDB Local for testing purposes.
The downloadable version of DynamoDB lets you write and test
applications without accessing the DynamoDB web service. Instead, the
database is self-contained on your computer. When you're ready to
deploy your application in production, you can make a few minor
changes to the code so that it uses the DynamoDB web service.
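If you go that route, pointing the SDK at the local endpoint is the only change; a minimal sketch (port 8000 is DynamoDB Local's default, and the credentials are dummies because it does not validate them):

import boto3

dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",  # DynamoDB Local default port
    region_name="us-east-1",               # any region value works locally
    aws_access_key_id="dummy",
    aws_secret_access_key="dummy",
)
print([t.name for t in dynamodb.tables.all()])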
I have two Redshift clusters in two different regions: one is production and the other is development. I need to export multiple schemas from the production cluster to the dev cluster on a weekly basis, so I am using the Prod UNLOAD --> S3 --> LOAD Dev approach.
Currently I am using the query below, which returns the list of UNLOAD commands I need to run.
select 'unload (''select * from '||n.nspname||'.'||c.relname||''') to ''s3://my-redshift-data/'||c.relname||'_''
iam_role ''arn:aws:iam::xxxxxxxxxxxxxx:role/my-redshift-role'' ;' as sql
from pg_class c
left join pg_namespace n on c.relnamespace=n.oid
where n.nspname in ('schema1','schema2') and relkind = 'r'
The results returned look something like below:
unload ('select * from schema1.table1') to 's3://my-redshift-data/table1_' iam_role 'arn:aws:iam::xxxxxxxxxxxxxx:role/my-redshift-role';
unload ('select * from schema2.table2') to 's3://my-redshift-data/table2_' iam_role 'arn:aws:iam::xxxxxxxxxxxxxx:role/my-redshift-role';
Just some additional information: I have a Kubernetes cluster in the same VPC as my Redshift cluster, running apps that connect to the Redshift cluster. I also have a GitLab server with GitLab runners that are connected to the Kubernetes cluster using the GitLab agent. There are a few ways I can think of to do this:
Using a GitLab scheduled pipeline to run the UNLOAD/LOAD script via the Redshift Data API
Using a Kubernetes batch job to run the UNLOAD/LOAD script via the Redshift Data API
Using AWS Lambda with a Python script (something new to me), scheduled through EventBridge?
I would appreciate any suggestions, because I can't decide on the best approach.
Whenever building anything new it's helpful to think about the time / cost / quality tradeoff. How much time do you have? How much time do you want to spend supporting it in the future? Will you have time to improve it later if necessary? Will someone else be able to pick it up to improve it? etc.
If you're looking to get this set up with as little effort as possible, and to worry about the future when it happens, a GitLab pipeline plus the Redshift Data API will likely do the job. It is quite literally just a few calls to that API (a sketch follows the notes below).
You need an IAM role for your GitLab runner to authenticate to the API, which you likely already have. Obviously check that the role has the correct access.
You can store the database credentials as GitLab CI/CD variables as a fast/simple option, or leverage a more secure secret store if necessary/critical.
Remember that the IAM role needs access to both the "source" (prod) and "target" (dev) clusters; if they are in different accounts, this means more setup.
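To make "a few calls to that API" concrete, here is a rough boto3 sketch of one UNLOAD/COPY round trip through the Redshift Data API. The cluster names, databases, regions, and secret ARNs are placeholders; because the dev cluster is in another region, a second client is used for it and COPY gets the REGION option:

import time
import boto3

client_prod = boto3.client("redshift-data", region_name="us-east-1")  # prod region (assumption)
client_dev = boto3.client("redshift-data", region_name="us-west-2")   # dev region (assumption)

# Placeholder identifiers; swap in your own clusters, databases, and secrets.
PROD = {"ClusterIdentifier": "prod-cluster", "Database": "prod", "SecretArn": "arn:aws:secretsmanager:us-east-1:xxxxxxxxxxxxxx:secret:prod-redshift"}
DEV = {"ClusterIdentifier": "dev-cluster", "Database": "dev", "SecretArn": "arn:aws:secretsmanager:us-west-2:xxxxxxxxxxxxxx:secret:dev-redshift"}

def run(client, target, sql):
    # Submit the statement and poll until it finishes.
    stmt_id = client.execute_statement(Sql=sql, **target)["Id"]
    while True:
        desc = client.describe_statement(Id=stmt_id)
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(5)
    if desc["Status"] != "FINISHED":
        raise RuntimeError(desc.get("Error", desc["Status"]))

# Unload from prod, then load into dev, table by table.
run(client_prod, PROD, "unload ('select * from schema1.table1') to 's3://my-redshift-data/table1_' iam_role 'arn:aws:iam::xxxxxxxxxxxxxx:role/my-redshift-role';")
run(client_dev, DEV, "copy schema1.table1 from 's3://my-redshift-data/table1_' iam_role 'arn:aws:iam::xxxxxxxxxxxxxx:role/my-redshift-role' region 'us-east-1';")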
We are successfully spinning up an AWS CloudFormation stack that includes a serverless RDS PostgreSQL instance. Once the PostgreSQL instance is in place, we automatically restore a PostgreSQL database dump (in binary format), created with pg_dump on a local development machine, onto the PostgreSQL instance just created by CloudFormation.
We're using a Lambda function (instantiated by the CloudFormation process) that includes a build of the pg_restore executable in a Lambda layer, and we've also packaged our database dump file with the Lambda.
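For reference, the handler is essentially a thin wrapper around pg_restore; the layer path, dump file name, and environment variable names below are illustrative rather than our exact code:

import os
import subprocess

def handler(event, context):
    # pg_restore comes from the Lambda layer (mounted under /opt); the dump ships with the function code.
    env = dict(os.environ, PGPASSWORD=os.environ["DB_PASSWORD"])
    subprocess.run(
        [
            "/opt/bin/pg_restore",
            "--host", os.environ["DB_HOST"],
            "--username", os.environ["DB_USER"],
            "--dbname", os.environ["DB_NAME"],
            "--clean",
            "--if-exists",
            "--no-owner",
            "/var/task/dump.pgdump",
        ],
        env=env,
        check=True,
    )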
The above approach seems complicated for something that has presumably been solved many times... but Google searches have revealed almost nothing that corresponds to our scenario. We may be thinking about our situation in the wrong way, so please feel free to offer a different approach (e.g., is there a CodePipeline/CodeBuild approach that would automate everything?). Our preference would be to stick with the AWS toolset as much as possible.
This process will be run every time we deploy to a new environment (e.g., development, test, stage, pre-production, demonstration, alpha, beta, production, troubleshooting) potentially per release as part of our CI/CD practices.
Does anyone have advice or a blog post that illustrates another way to achieve our goal?
If you have provisioned everything via IaC (Infrastructure as Code), most of the time saving is already done: you should be able to replicate your infrastructure to other accounts/stacks via different roles in your AWS credentials and config files by passing the --profile some-profile flag. I would recommend AWS SAM (Serverless Application Model) over plain CloudFormation, though, as I find I need to write roughly half the code (roles and policies are mostly created for you) and get much better, faster feedback via the console. I would also recommend sam sync (currently in beta, but worth using) so you don't need to create a change set on code updates; only the code is updated, so deploys take 3-4 seconds. Some AWS SAM examples here, check both videos and patterns: https://serverlessland.com (official AWS site)
In terms of the RDS restore, I'd probably create a base RDS instance, take a snapshot of it, and restore all other RDS instances from snapshots rather than recreating everything manually each time. You can copy snapshots, and in fact automate backups to snapshots cross-account (and cross-region if required), which should be a little cleaner.
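If you go the snapshot route, the restore itself is only a couple of RDS API calls; a rough boto3 sketch (all identifiers are assumptions, and the engine/engine mode must match your serverless cluster):

import time
import boto3

rds = boto3.client("rds")

# Snapshot the hand-built "base" cluster once.
rds.create_db_cluster_snapshot(
    DBClusterSnapshotIdentifier="base-postgres-snapshot",
    DBClusterIdentifier="base-postgres",
)
while rds.describe_db_cluster_snapshots(DBClusterSnapshotIdentifier="base-postgres-snapshot")["DBClusterSnapshots"][0]["Status"] != "available":
    time.sleep(30)

# Restore a new environment's cluster from that snapshot.
rds.restore_db_cluster_from_snapshot(
    DBClusterIdentifier="dev-postgres",
    SnapshotIdentifier="base-postgres-snapshot",
    Engine="aurora-postgresql",
    EngineMode="serverless",  # assumption: Aurora Serverless v1; adjust for your setup
)
while rds.describe_db_clusters(DBClusterIdentifier="dev-postgres")["DBClusters"][0]["Status"] != "available":
    time.sleep(30)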
There is also a clean solution for replicating data near-instantaneously across AWS accounts using AWS DMS (Database Migration Service) with CDC (Change Data Capture). The setup is: you have a source DB, a target DB, and a 'replication instance' (e.g. an EC2 micro) that monitors your source DB and replicates it out to other instances, so your data stays in sync. For example, if you have several devs working on separate 'stacks' so they don't pollute each other's logs, you can replicate data and changes out seamlessly from one DB when required. This works when both the source and destination DBs are already in AWS, but you can of course also use AWS DMS to migrate a local DB over to AWS. Some more info here: https://docs.aws.amazon.com/dms/latest/sbs/chap-manageddatabases.html
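For completeness, creating and starting such a replication task with boto3 looks roughly like this; it assumes the replication instance and the source/target endpoints have already been set up in DMS, and all ARNs are placeholders:

import json
import boto3

dms = boto3.client("dms")

task = dms.create_replication_task(
    ReplicationTaskIdentifier="prod-to-dev-cdc",
    SourceEndpointArn="arn:aws:dms:us-east-1:xxxxxxxxxxxxxx:endpoint:source",
    TargetEndpointArn="arn:aws:dms:us-east-1:xxxxxxxxxxxxxx:endpoint:target",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:xxxxxxxxxxxxxx:rep:instance",
    MigrationType="full-load-and-cdc",  # initial full load, then ongoing change data capture
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "all-tables",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)

dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)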
Is there some way to 'dehydrate' or extract an entire AWS setup? I have a small application that uses several AWS components, and I'd like to put the project on hiatus so I don't get charged every month.
I built the app directly through the various services' console pages (VPC, RDS, etc.). Is there some way I can extract my setup into files so I can keep those files in version control, and 'rehydrate' them back into AWS when I want to set up my app again?
I tried extracting pieces from Lambda and Event Bridge, but it seems like I can't just 'replay' these files using the CLI to re-create my application.
Specifically, I am looking to extract all code, settings, connections, etc. for:
Lambda. Code, Env Variables, layers, scheduling thru Event Bridge
IAM. Users, roles, permissions
VPC. Subnets, Route tables, Internet gateways, Elastic IPs, NAT Gateways
Event Bridge. Cron settings, connections to Lambda functions.
RDS. MySQL instances. Would like to get all DDL. Data in tables is not required.
Thanks in advance!
You could use Former2. It will scan your account and allow you to generate CloudFormation, Terraform, or Troposphere templates. It uses a browser plugin, but there is also a CLI for it.
What you describe is called Infrastructure as Code. The idea is to define your infrastructure as code and then deploy your infrastructure using that "code".
There are a lot of options in this space. To name a few:
Terraform
Cloudformation
CDK
Pulumi
All of those should allow you to import already existing resources. Terraform, at least, has an import command to bring an existing resource into your IaC project.
This way you could create a project that mirrors what you currently have in AWS.
Excluded are things that are, strictly speaking, not AWS resources, like:
Code of your Lambdas
MySQL DDL
Depending on the Lambda's deployment 'strategy', the code is either on S3 or was deployed directly to the Lambda service. If it is the former, you just need to find the S3 bucket and download the code from there. If it is the latter, you might need to copy and paste it by hand.
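As a shortcut, the Lambda API will hand you a download link for each function's deployment package either way; a rough boto3 sketch that pulls every function's package down as a zip:

import urllib.request
import boto3

lam = boto3.client("lambda")

# GetFunction returns a short-lived presigned URL for the deployment package.
for page in lam.get_paginator("list_functions").paginate():
    for fn in page["Functions"]:
        name = fn["FunctionName"]
        code_url = lam.get_function(FunctionName=name)["Code"]["Location"]
        urllib.request.urlretrieve(code_url, f"{name}.zip")
        print(f"saved {name}.zip")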
When it comes to your MySQL DDL, you need a tool to export it, but there are plenty of tools out there for this (mysqldump --no-data, for example).
After you did that, you should be able to destroy all the AWS resources and then deploy them later on again from your new IaC.
CloudFormation amateur here. Been looking online and can't find any references as to how I would go about creating my tables after my RDS instance is stood up through CloudFormation. Is it possible to specify a Lambda to launch and create all the tables, or maybe specify a SQL file to be applied? What's the standard pattern on this?
There aren't any CloudFormation resources which deal with the 'internals' of an RDS instance once it's been created; it's a black box which you're expected to configure (with users, tables and data) outside of CloudFormation. This is a bit sad, as versioning, managing and deploying database schemas is a classic deployment problem on the borderline between infrastructure and application, which is also exactly where CloudFormation sits (as a tool for versioning, managing and deploying other infrastructure).
What you could do, although it's not a beginner-level feature, is use a custom resource to connect to the RDS instance once it's created and run the appropriate SQL commands to create the users and tables. You define the schema in your CloudFormation template (either as SQL or as a more structured description, similar to DynamoDB AttributeDefinitions), and that schema is passed to your custom resource Lambda function, which then runs the queries against the database.
If you go down this route, you'll probably do a lot of wheel-inventing of how to translate differences in the 'before' and 'after' schemas into ALTER TABLE SQL statements to fire at the database. Your custom resource lambda code needs to be very robust and always send a response back to CloudFormation, even in the case of an error. And don't forget that if your stack update fails for any reason, CloudFormation will call your custom resource again 'in reverse' (ie asking you to alter the database from the post-update schema back to the pre-update schema) as part of the rollback process.
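For the flavour of it, here is a heavily simplified sketch of such a custom resource handler. pg8000 is just one driver choice and would need to be packaged with the function (psycopg2 works too), and the property names (Host, User, Ddl, etc.) are hypothetical values you would pass in from the template:

import json
import urllib.request

import pg8000.native  # assumption: packaged with the function


def send_response(event, context, status, reason=""):
    # CloudFormation blocks until this callback arrives, so it must be sent
    # even on failure, otherwise the stack hangs until the resource times out.
    body = json.dumps({
        "Status": status,
        "Reason": reason or f"See CloudWatch log stream {context.log_stream_name}",
        "PhysicalResourceId": event.get("PhysicalResourceId", "db-schema"),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }).encode()
    req = urllib.request.Request(event["ResponseURL"], data=body, method="PUT", headers={"content-type": ""})
    urllib.request.urlopen(req)


def handler(event, context):
    try:
        if event["RequestType"] in ("Create", "Update"):
            props = event["ResourceProperties"]
            conn = pg8000.native.Connection(
                user=props["User"], password=props["Password"],
                host=props["Host"], database=props["Database"],
            )
            for statement in props["Ddl"]:  # schema statements passed from the template
                conn.run(statement)
        # Deletes are a no-op here: the RDS instance disappears with the stack.
        send_response(event, context, "SUCCESS")
    except Exception as exc:
        send_response(event, context, "FAILED", reason=str(exc))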
If you do go down this road and come up with something at all robust, do publish it, because I'm sure a lot of people would be interested in it! I had a quick look online but I couldn't find anything obvious pre-existing.
Unfortunately, a CloudFormation template cannot run the setup script directly. You can try, as you mentioned, using a Lambda function to run the table setup once the RDS instance has been created successfully by the CloudFormation template; the DependsOn attribute helps here. You will need to pass credentials to the Lambda function so it can access the RDS instance.
Is there a way to list/view (graphically?) all created resources on AWS? All the DBs, users, pools, etc.
The best way I can think of is to run each of the cli aws <resource> ls commands in a bash file.
What would be great would be to have a graphical tool that showed all the relationships. Is anyone aware of such a tool?
UPDATE
I decided to make my own start on this; currently it's CLI-only, but it might move to graphical output. Help needed!
https://github.com/QuantumInformation/aws-x-ray
No, it is not possible to easily list all services created on AWS.
Each service has a set of API calls and will typically have Describe* calls that can list resources. However, these commands would need to be issued to each service individually and they typically have different syntax.
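For example, a rough boto3 sketch covering just a handful of services; every additional service needs its own call added by hand:

import boto3

ec2 = boto3.client("ec2")
rds = boto3.client("rds")
lam = boto3.client("lambda")
iam = boto3.client("iam")

for reservation in ec2.describe_instances()["Reservations"]:
    for instance in reservation["Instances"]:
        print("EC2:", instance["InstanceId"], instance["State"]["Name"])

for db in rds.describe_db_instances()["DBInstances"]:
    print("RDS:", db["DBInstanceIdentifier"], db["Engine"])

for fn in lam.list_functions()["Functions"]:
    print("Lambda:", fn["FunctionName"])

for user in iam.list_users()["Users"]:
    print("IAM user:", user["UserName"])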
There are third-party services (eg Kumolus) that offer functionality to list and visualize services but they are typically focussed on Amazon EC2 and Amazon VPC-based services. They definitely would not go 'into' a database to list DB users, but they would show Amazon RDS instances.