What is the difference between AWS redshift-data permissions and redshift permissions - amazon-web-services

Basically, I am trying to add permissions for Amazon Redshift. But AWS provides permissions under two headings: redshift and redshift-data.
For example, if I want the ListDatabases permission, AWS provides two options, redshift:ListDatabases and redshift-data:ListDatabases, and I'm not sure which one I should use.
Somewhere I read that redshift gives access to the "Amazon Redshift console" and redshift-data gives access to the "cluster". I'm not sure what that means.

To explain the difference, let's consider the SDK clients. Notice that there are two clients:
RedshiftClient
RedshiftDataClient
The RedshiftClient lets you manage Amazon Redshift clusters such as creating clusters, deleting clusters and so on.
The RedshiftDataClient lets you perform CRUD operations on the database. For example, insert operations using the executeStatement method.
That is the main difference. One lets you manage clusters and the other lets you perform CRUD operations.
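The same split shows up in the SDKs and in IAM. Here is a minimal boto3 sketch (the cluster name, database, and user are hypothetical): the redshift client handles cluster management and is gated by actions in the redshift: namespace, while the redshift-data client runs SQL against the database and is gated by actions in the redshift-data: namespace.
```python
import boto3

# Management plane: the "redshift" service (IAM actions in the redshift:* namespace)
redshift = boto3.client("redshift")
clusters = redshift.describe_clusters()  # requires redshift:DescribeClusters
for cluster in clusters["Clusters"]:
    print(cluster["ClusterIdentifier"], cluster["ClusterStatus"])

# Data plane: the "redshift-data" service (IAM actions in the redshift-data:* namespace)
redshift_data = boto3.client("redshift-data")
resp = redshift_data.execute_statement(  # requires redshift-data:ExecuteStatement
    ClusterIdentifier="my-cluster",      # hypothetical cluster name
    Database="dev",                      # hypothetical database
    DbUser="awsuser",                    # hypothetical user; SecretArn=... also works
    Sql="SELECT 1;",
)
print(resp["Id"])  # statement id; results are retrieved asynchronously
```
So when you write an IAM policy, grant redshift:* actions for the management calls and redshift-data:* actions for the Data API calls, matching whichever client you actually use.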

Related

Move data from Amazon S3 to Amazon RDS

Can we connect Amazon S3 buckets in two different regions and migrate CSV file data into an Amazon RDS database in one particular region? I am trying to use AWS Glue.
There are certainly different ways to solve this use case. You can use AWS Glue, or you can write a workflow using AWS Step Functions. For example, you can write a series of Lambda functions that read a CSV file in an Amazon S3 bucket, extract the values, and then write them to an Amazon RDS database (a minimal sketch of that Lambda approach follows below). Both approaches are valid.
See these docs for reference:
https://aws.amazon.com/blogs/big-data/orchestrate-multiple-etl-jobs-using-aws-step-functions-and-aws-lambda/
https://aws.amazon.com/glue/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc
Keep in mind, however, that a Lambda-based workflow is not ideal when your data set is so large that processing exceeds Lambda's 15-minute timeout. In that case, you should use AWS Glue.
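As a rough illustration of the Lambda approach above, here is a minimal sketch of one such function. The bucket, key, table, columns, environment variables, and the psycopg2 driver (which you would need to package yourself, e.g. as a Lambda layer, and which assumes a PostgreSQL-flavored RDS instance) are all assumptions for the sake of the example:
```python
import csv
import io
import os

import boto3
import psycopg2  # assumed to be packaged with the function, e.g. as a Lambda layer

s3 = boto3.client("s3")

def handler(event, context):
    # Hypothetical bucket/key; in practice these would come from the S3 event payload.
    bucket = event.get("bucket", "my-source-bucket")
    key = event.get("key", "incoming/data.csv")

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    rows = list(csv.reader(io.StringIO(body)))

    # Hypothetical connection settings and target table.
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.executemany("INSERT INTO my_table (col_a, col_b) VALUES (%s, %s)", rows)
    conn.close()
    return {"inserted": len(rows)}
```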

DynamoDB replication across AWS accounts

I am looking for a better way to replicate data from one AWS account DynamoDB to another account.
I know this can be done using Lambda triggers and streams.
Is there something like Global tables which exist in AWS we can use for replication across accounts?
I think the best way to migrate data between accounts is AWS Data Pipeline. This process essentially exports (backs up) your DynamoDB table in account A to an S3 bucket in account B via Data Pipeline. Then another Data Pipeline job in account B imports the data from S3 into the target DynamoDB table.
The step-by-step guide is given in this document:
https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb.html
You will also need cross-account access to the S3 bucket used to store your DynamoDB table data from account A, so the bucket (or the files) must be shared between your source account (A) and your destination account (B) until the migration is complete.
Refer to this doc for the required permissions: https://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html
Another approach is to use a script. There is no direct API for migration, so you have to use two clients, one for each account: one client scans the data and the other writes it into the table in the other account.
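To make the script approach concrete, here is a minimal sketch using boto3; the profile names and the table name are hypothetical. It simply paginates a Scan from the source account and batch-writes the items into the destination table:
```python
import boto3

# Two sessions, one per account; the profile names are hypothetical.
source = boto3.Session(profile_name="account-a").resource("dynamodb")
target = boto3.Session(profile_name="account-b").resource("dynamodb")

src_table = source.Table("my-table")  # hypothetical table name
dst_table = target.Table("my-table")

scan_kwargs = {}
with dst_table.batch_writer() as batch:
    while True:
        page = src_table.scan(**scan_kwargs)
        for item in page["Items"]:
            batch.put_item(Item=item)
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```
For a large table you would want to parallelize the scan and throttle the writes so you stay within the destination table's capacity.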
There is also an import-export tool in the AWSLabs repo, although I have never tried it.
https://github.com/awslabs/dynamodb-import-export-tool

Moving Across AWS Regions: us-east-1 to us-east-2

I currently have the following in the AWS us-east-1 region, and per the request of our AWS architect I need to move it all to us-east-2 completely and continue developing in us-east-2 only. What are the options that require the least work and coding (as this is a one-time deal) to move?
S3 bucket with a ton of folders and files.
Lambda function.
AWS Glue database with a ton of crawlers.
AWS Athena with a ton of tables.
Thank you so much for taking a look at my little challenge :)
There is no easy answer for your situation. There are no simple ways to migrate resources between regions.
Amazon S3 bucket
You can certainly create another bucket and then copy the content across, either using the AWS Command-Line Interface (CLI) aws s3 sync command or, for a huge number of files, using S3DistCp running on Amazon EMR.
If there are previous Versions of objects in the bucket, it's not easy to replicate them. Hopefully you have Versioning turned off.
Also, it isn't easy to get the same bucket name in the other region. Hopefully you will be allowed to use a different bucket name. Otherwise, you'd need to move the data elsewhere, delete the bucket, wait a day, create the same-named bucket in another region, then copy the data across.
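If you prefer to script the copy instead of using the CLI, a minimal boto3 sketch could look like the following; the bucket names are hypothetical, and for a huge number of objects the aws s3 sync or S3DistCp options above will be more practical:
```python
import boto3

src_bucket = "my-old-bucket"   # hypothetical, lives in us-east-1
dst_bucket = "my-new-bucket"   # hypothetical, lives in us-east-2

s3_src = boto3.client("s3", region_name="us-east-1")  # list from the source region
s3_dst = boto3.client("s3", region_name="us-east-2")  # copy into the destination region

paginator = s3_src.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=src_bucket):
    for obj in page.get("Contents", []):
        # CopyObject is sent to the destination bucket's region; S3 copies the
        # bytes server-side, so they never pass through this machine.
        s3_dst.copy_object(
            Bucket=dst_bucket,
            Key=obj["Key"],
            CopySource={"Bucket": src_bucket, "Key": obj["Key"]},
        )
```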
AWS Lambda function
If it's just a small number of functions, you could simply recreate them in the other region. If the code is stored in an Amazon S3 bucket, you'll need to move the code to a bucket in the new region.
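If you would rather script that than click through the console, here is a minimal sketch (the function name is hypothetical, and memory, timeout, environment variables, and layers would still need to be carried over separately):
```python
import urllib.request

import boto3

old_fn = "my-function"  # hypothetical function name
lambda_east1 = boto3.client("lambda", region_name="us-east-1")
lambda_east2 = boto3.client("lambda", region_name="us-east-2")

cfg = lambda_east1.get_function(FunctionName=old_fn)
package = urllib.request.urlopen(cfg["Code"]["Location"]).read()  # pre-signed URL to the zip

lambda_east2.create_function(
    FunctionName=old_fn,
    Runtime=cfg["Configuration"]["Runtime"],
    Role=cfg["Configuration"]["Role"],  # IAM roles are global, so the same role can be reused
    Handler=cfg["Configuration"]["Handler"],
    Code={"ZipFile": package},
)
```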
AWS Glue
Not sure about this one. If you're moving the data files, you'll need to recreate the database anyway. You'll probably need to create new jobs in the new region (but I'm not that familiar with Glue).
Amazon Athena
If your data is moving, you'll need to recreate the tables anyway. You can use the Athena interface to show the DDL commands required to recreate a table. Then, run those commands in the new region, pointing to the new S3 bucket.
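You can also pull that DDL programmatically instead of through the console; here is a minimal boto3 sketch, where the database, table, and results bucket are hypothetical:
```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# SHOW CREATE TABLE returns the same DDL the Athena console displays.
query_id = athena.start_query_execution(
    QueryString="SHOW CREATE TABLE my_database.my_table",
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)["QueryExecutionId"]

# Poll until the query finishes.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print("\n".join(r["Data"][0]["VarCharValue"] for r in rows))
```
You would then run the printed DDL in the new region, with the LOCATION pointed at the new S3 bucket.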
AWS Support
If this is an important system for your company, it would be prudent to subscribe to AWS Support. They can provide advice and guidance for these types of situations, and might even have some tools that can assist with a migration. The cost of support would be minor compared to the savings in your time and effort.
Is it possible for you to create CloudFormation stacks (from existing resources) using the console, then copy the contents of those stacks and run them in the other region (replacing values where needed)?
See this link: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/resource-import-new-stack.html

How to load data into Redshift from a custom REST API

I am new to AWS, so please forgive me if this question has been asked previously.
I have a REST API which returns 2 parameters (name, email). I want to load this data into Redshift.
I thought of making a Lambda function that runs every 2 minutes and calls the REST API. The API might return at most 3-4 records within those 2 minutes.
So, in this situation, is it okay to just do an INSERT operation, or do I still have to use COPY (via S3)? I am worried only about performance and error-free (robust) data inserts.
Also, the Lambda function will start asynchronously every 2 minutes, so there might be an overlap of insert operations (but there won't be an overlap in data).
If I go with the S3 option, I am worried that the S3 file generated by a previous Lambda invocation will be overwritten and a conflict will occur.
Long story short, what is the best practice for inserting a few records into Redshift?
PS: I am okay with using other AWS components as well. I even looked into Firehose, which would be perfect for me, but it can't load data into a Redshift cluster in a private subnet.
Thanks all in advance
Yes, it would be fine to INSERT small amounts of data.
The recommendation to always load via a COPY command is for large amounts of data because COPY loads are parallelized across multiple nodes. However, for just a few lines, you can use INSERT without feeling guilty.
If your SORTKEY is a timestamp and you are loading data in time order, there is also less need to perform a VACUUM, since the data is already sorted. However, it is good practice to still VACUUM the table regularly if rows are being deleted.
As you don't have much data, you can use either COPY or INSERT. The COPY command is more optimized for bulk loads; it essentially gives you batch-insert capability.
Both will work equally well.
FYI, AWS now supports the Data API feature.
As described in the official documentation, you can now access Redshift data using HTTP requests, without a JDBC connection.
The Data API doesn't require a persistent connection to the cluster. Instead, it provides a secure HTTP endpoint and integration with AWS SDKs. You can use the endpoint to run SQL statements without managing connections. Calls to the Data API are asynchronous.
https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html
Here are the steps you need to follow to use the Redshift Data API (a minimal sketch follows the list):
Determine if you, as the caller of the Data API, are authorized. For more information about authorization, see Authorizing access to the Amazon Redshift Data API.
Determine if you plan to call the Data API with authentication credentials from Secrets Manager or temporary credentials. For more information, see Choosing authentication credentials when calling the Amazon Redshift Data API.
Set up a secret if you use Secrets Manager for authentication credentials. For more information, see Storing database credentials in AWS Secrets Manager.
Review the considerations and limitations when calling the Data API. For more information, see Considerations when calling the Amazon Redshift Data API.
Call the Data API from the AWS Command Line Interface (AWS CLI), from your own code, or using the query editor in the Amazon Redshift console. For examples of calling from the AWS CLI, see Calling the Data API with the AWS CLI.
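Here is a minimal boto3 sketch of the final step (calling the Data API from your own code), using Secrets Manager credentials; the cluster identifier, secret ARN, and users table are hypothetical, and the few records from the question are folded into a single multi-row INSERT:
```python
import time

import boto3

client = boto3.client("redshift-data")

# Hypothetical cluster, database, and Secrets Manager secret.
common = {
    "ClusterIdentifier": "my-cluster",
    "Database": "dev",
    "SecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-creds",
}

# Insert the few records returned by the REST API in one statement.
stmt = client.execute_statement(
    Sql="INSERT INTO users (name, email) VALUES "
        "('Alice', 'alice@example.com'), ('Bob', 'bob@example.com');",
    **common,
)

# Calls to the Data API are asynchronous, so poll until the statement finishes.
while True:
    desc = client.describe_statement(Id=stmt["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

print(desc["Status"], desc.get("Error"))
```
Since each invocation issues its own INSERT and the Data API needs no persistent connection, overlapping 2-minute runs should not conflict, and no intermediate S3 file is needed.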

Amazon AWS: DynamoDB requirements

Objective: Using an iPhone app, I would like users to store objects in DynamoDB and have fine-grained access control for the objects using IAM with a TVM (token vending machine).
The objects will contain only strings, no images/file storage -- I'm thinking I won't need S3?
Question: Since there is no server-side application, do I still need an EC2 instance? Which AWS services will I have to use in order to accomplish my objective?
You can use either DynamoDB or S3, and neither of them requires an EC2 instance - there is no dependency.
If it were me, I'd first see if I could get what I wanted done in S3 (because you mentioned it as a possibility), and then go to DynamoDB if I couldn't (e.g. if I wanted to run aggregation queries across my data set). S3 will be cheaper and, depending on what you are doing, may even be faster, and it lets you distribute the stored data globally through CloudFront easily, which may be beneficial if you have a globally diverse user base.
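On the fine-grained access control part of the objective: the usual pattern is an IAM policy, attached to the role that the token vending machine (or Cognito) hands out, that restricts each user to items whose partition key equals their own identity. A sketch of such a policy document, with a hypothetical UserObjects table and the Cognito identity variable:
```python
import json

# Minimal sketch of a fine-grained access policy: each user may only read/write
# items whose leading (partition) key equals their own Cognito identity.
# The table ARN below is hypothetical.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/UserObjects",
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
                }
            },
        }
    ],
}
print(json.dumps(policy, indent=2))
```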