I have IBM DataStage server installed on premises.
I want to connect to an Amazon S3 bucket from DataStage to load data.
How can I establish a connection to Amazon S3 from the DataStage server?
From some online reading, it seems the IBM product can run custom Java code through a Java API:
https://www.ibm.com/docs/en/iis/11.5?topic=connections-java-code
Therefore, you can use the AWS SDK for Java to invoke Amazon S3 operations. If you are not familiar with how to use the AWS SDK for Java (V2), see this doc topic:
Get started with the AWS SDK for Java 2.x
Depending on your version, you may even have an S3 Connector stage available in your Palette.
I am planning to use the MuleSoft S3 connector to integrate with AWS, and I have a few questions.
Will the connector move the data via internet or other channel?
Is the data encrypted in transit?
Will the connector move the data via internet or other channel?
The documentation states that:
...you must have access to the Amazon S3 target resource, Amazon Web Services, and Anypoint Platform.
Basically, the connector uses the AWS SDK to connect to Amazon S3, which exposes HTTP REST APIs. The application must be able to reach S3 over the Internet, because the SDK resolves and connects to the public DNS host names for AWS.
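To make the DNS point concrete, here is a minimal Python sketch of the kind of public host name an SDK-based client ends up resolving for S3. It assumes the regional virtual-hosted-style format `https://<bucket>.s3.<region>.amazonaws.com/<key>`; the helper `s3_object_url` and the bucket/key values are hypothetical, and some partitions (for example the China regions, which use `amazonaws.com.cn`) follow a different pattern.

```python
from urllib.parse import quote, urlsplit


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style HTTPS URL for an S3 object (illustrative only)."""
    # The host is a public DNS name under amazonaws.com, so the machine running
    # the connector must be able to resolve it and reach it over the network.
    host = f"{bucket}.s3.{region}.amazonaws.com"
    # Percent-encode the key but keep "/" so prefixes stay readable.
    return f"https://{host}/{quote(key, safe='/')}"


if __name__ == "__main__":
    url = s3_object_url("example-bucket", "eu-west-1", "exports/daily report.csv")
    print(url)
    print(urlsplit(url).hostname)
```

If that host name cannot be resolved or reached from the Mule runtime (for example because outbound traffic is blocked), the connector cannot talk to S3.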
Is the data encrypted in transit?
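You can use an AWS KMS key to encrypt objects; check the AWS documentation for exactly what that provides in terms of encryption. Note that a KMS key protects the object at rest in S3. Encryption in transit comes from the HTTPS connection, which the AWS SDKs use by default.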
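At the REST level, asking S3 to encrypt an uploaded object with a KMS key comes down to request headers on the upload. The minimal sketch below only builds those headers; the helper `sse_kms_headers` and the key ARN are hypothetical placeholders, and how the MuleSoft connector exposes this option is not shown here.

```python
from typing import Dict, Optional


def sse_kms_headers(kms_key_id: Optional[str] = None) -> Dict[str, str]:
    """Return S3 PUT headers requesting server-side encryption with AWS KMS."""
    headers = {"x-amz-server-side-encryption": "aws:kms"}
    # Without an explicit key ID, S3 uses the AWS managed key for S3 (aws/s3).
    if kms_key_id:
        headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
    return headers


if __name__ == "__main__":
    # Placeholder ARN: substitute a key your IAM principal is allowed to use.
    example_arn = "arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE-KEY-ID"
    print(sse_kms_headers(example_arn))
```

These headers only change how S3 stores the object; they do not affect whether the request itself travels over HTTPS.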
I have an Amazon AWS RDS (PostgreSQL) database. I am trying to connect it to Amazon API Gateway as simply as possible (AWS Service WITHOUT Lambda).
I am trying to perform a simple GET request to get all "animals" (table name "animals") from the DB.
The question is which Action to select: all actions in the documentation change the DB, and I only need to perform a simple GET request.
We also need to set up the policy and specify the actions for it.
API Gateway request:
https://i.ibb.co/2hkdVqZ/AWS.png
API Gateway policy:
https://i.ibb.co/vk8pLzd/AWS2.png
The AWS API is for creating/changing the DB server itself, as you have mentioned. You can't query the RDS database directly from the AWS API. You have to create a DB connection to the PostgreSQL database using traditional database drivers in order to run queries against the database.
You will need to use a Lambda function to accomplish what you are trying to achieve.
When we upload data to S3, is it protected in transit by default (via HTTPS maybe)?
I found this article which, if I understand correctly, states S3 does not use HTTPS:
Amazon Simple Storage Service: You can still use HTTP with Amazon S3 and securely make authenticated requests. The service uses a different secure signing protocol.
In this case, should we protect the data in transit with client-side encryption?
The article you cited is obsolete. It was originally written in 2008, and apparently when updated in 2015, some of the outdated information was left in place.
The version refers to the particular algorithm for signing the request. These AWS services have deprecated the older, less-secure methods (signature versions 0 and 1) and will no longer allow them after September 2009.
Indeed, versions 0 and 1 are not supported.
A few AWS services don't support signature version 2:
Amazon Simple Storage Service: You can still use HTTP with Amazon S3 and securely make authenticated requests. The service uses a different secure signing protocol.
This is also inaccurate. S3 supports signature version 2 in all regions where signature version 2 was deployed. Regions launched in 2014 or later do not support V2 at all; they require Signature Version 4, and in those regions S3 also requires Signature Version 4.
Importantly, though, none of this has anything at all to do with HTTPS.
From the same document:
Most AWS services accept HTTPS requests, including:
...
Amazon Simple Storage Service
Okay, so, let's revisit this line:
The service uses a different secure signing protocol.
This statement is not about encryption, or the security of the payload. It is a statement about the security of the request authentication and authorization process (its resistance to forgery and reverse-engineering), whether or not the request is sent encrypted.
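To see why signing is separate from encryption, here is a minimal Python sketch of the final stage of Signature Version 4: deriving the signing key and computing the HMAC. Building the canonical request and the string to sign is omitted, and the secret, date, and string-to-sign values are made-up placeholders. The signature lets S3 verify who sent the request and that the signed parts were not altered; it does nothing to hide the body, so over plain http:// the data is still readable by anyone on the path.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key: secret -> date -> region -> service -> "aws4_request"."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sigv4_signature(signing_key: bytes, string_to_sign: str) -> str:
    """Hex-encoded HMAC-SHA256 of the string to sign (used in the Authorization header)."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = sigv4_signing_key("EXAMPLE-SECRET", "20240101", "us-east-1", "s3")
    print(sigv4_signature(key, "placeholder string to sign"))
```

The secret key itself never goes over the wire, which is what makes the scheme resistant to forgery even without TLS; confidentiality of the payload is a separate concern.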
HTTPS is supported by S3, to protect data in transit.
Quoting from the Security section of the S3 FAQs:
You can securely upload/download your data to Amazon S3 via SSL endpoints using the HTTPS protocol.
If you're using the https:// endpoint for S3, then your data in transit should be encrypted properly. The quote that you referred to in the question means that it's also possible to access S3 using http:// protocol, in which case the data wouldn't be encrypted in transit. See this related question.
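If the endpoint is configurable (for example, read from a config file), one cheap client-side safeguard is to refuse plain http:// before any data is sent. A minimal sketch; `require_https` is a hypothetical helper, not part of any AWS SDK.

```python
from urllib.parse import urlsplit


def require_https(endpoint: str) -> str:
    """Return the endpoint unchanged if it uses HTTPS; otherwise raise ValueError."""
    scheme = urlsplit(endpoint).scheme.lower()
    if scheme != "https":
        # Plain http:// would send object data and headers unencrypted.
        raise ValueError(f"refusing non-HTTPS S3 endpoint: {endpoint!r}")
    return endpoint


if __name__ == "__main__":
    print(require_https("https://s3.us-east-1.amazonaws.com"))
```

On the server side, the same intent is commonly enforced with an S3 bucket policy that denies any request whose `aws:SecureTransport` condition key is `false`.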
If you were asking specifically about whether AWS CLI encrypts data in transit, then the answer is yes. See this question.
Also, please note that the primary purpose of client-side encryption is to encrypt data at rest, and to use an encryption algorithm of your own choosing. If you use client-side encryption but still use the http:// endpoint, your communication over the wire is still unencrypted, technically speaking: request headers and metadata travel in plaintext, and the ciphertexts passed over the wire could be captured by an attacker for analysis.
Update:
If you were asking specifically about the AWS SDK for Java, the default protocol is again HTTPS. Quoting from the javadocs for the AWS SDK for Java:
By default, all service endpoints in all regions use the https protocol. To use http instead, specify it in the ClientConfiguration supplied at construction.
And from the javadocs for ClientConfiguration.getProtocol:
The default configuration is to use HTTPS for all requests for increased security.
Client-side and server-side encryption are primarily about securing data at rest. If anyone were to break into your cloud provider's data center and steal the disks holding your data, encrypting it (client-side or server-side) makes it difficult for them to recover the plaintext. Doing it client-side gives you more control over the encryption algorithm, with the apparent additional side effect that your data is not transmitted in plaintext over the wire. However, the communication channel itself is still not encrypted unless you use HTTPS. If you were using a weak encryption algorithm, for example, an attacker could still sniff the encrypted data over the wire and decrypt it. It is also important to know that using SSL means:
You as the client can be sure you're talking to AWS
Your communication with AWS is encrypted, so others can't intercept it
You have verification that the message received is the same as the message sent
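These guarantees rest on the client actually verifying the server's certificate and host name; without that check, a man-in-the-middle could terminate the TLS session itself. The AWS SDKs handle this internally, so the following is only an illustrative Python sketch using the standard library; `strict_tls_context` and `s3_connection` are hypothetical helper names.

```python
import http.client
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that verifies the server certificate and host name."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # These are already the defaults; setting them explicitly documents the intent.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def s3_connection(host: str) -> http.client.HTTPSConnection:
    """Create (but do not open) an HTTPS connection that uses the strict context."""
    return http.client.HTTPSConnection(host, 443, context=strict_tls_context())


if __name__ == "__main__":
    conn = s3_connection("s3.us-east-1.amazonaws.com")
    print(conn.host, conn.port)
```

No network traffic happens until a request is made on the connection, at which point the handshake fails if the certificate does not match the host name.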
In essence, you definitely want to use SSL irrespective of whether you want to use client-side encryption or not.
I am going to use Amazon S3 Infrequent Access. I have played around and found that it is possible to access this service in two ways.
Making Requests Using the REST API. This way looks pretty simple and clear.
Using Amazon API Gateway. I am not a big expert in this service, and one difference I have found is that the payload size is limited to 10 MB.
What other advantages and disadvantages does using Amazon S3 Infrequent Access via Amazon API Gateway have?
If you are going to use Amazon S3, then you should call S3 directly. This can be done via the REST API or via AWS SDKs for most popular programming languages.
You can also use the AWS Command-Line Interface (CLI), which makes it possible to write scripts that can call the AWS API.
Amazon API Gateway allows you to create your own APIs that can call an AWS Lambda function or your own application. It should not be used to call Amazon S3 unless you are trying to Create an API as an Amazon S3 Proxy, which is a rare situation.
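For context on calling S3 directly: Infrequent Access is an S3 storage class chosen per object, so a direct REST upload is an ordinary PUT carrying an `x-amz-storage-class: STANDARD_IA` header. The minimal sketch below only builds the request with the standard library and does not send it; a real request also needs SigV4 signing headers, which an AWS SDK or the CLI adds for you. The helper `build_ia_put`, the bucket, region, and key are placeholders.

```python
import urllib.request
from urllib.parse import quote


def build_ia_put(bucket: str, region: str, key: str, body: bytes) -> urllib.request.Request:
    """Build, but neither sign nor send, an S3 PUT that stores the object as STANDARD_IA."""
    url = f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"
    # A real request also needs SigV4 headers (Authorization, x-amz-date,
    # x-amz-content-sha256); they are deliberately not added here.
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            # Infrequent Access is selected per object via the storage-class header.
            "x-amz-storage-class": "STANDARD_IA",
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    req = build_ia_put("example-bucket", "us-east-1", "reports/2024.csv", b"a,b\n")
    print(req.get_method(), req.full_url)
```

Because the body goes straight to S3, the 10 MB API Gateway payload limit mentioned in the question does not apply; a single S3 PUT can be up to 5 GB.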