I am trying to connect to S3 from an EC2 instance using AmazonS3Client, to get the list of objects in an S3 bucket. While I can connect to S3 when running this code from my local machine, I am having a hard time running the same code on EC2.
Am I missing any setting or configuration on the EC2 instance?
Code
AWSCredentials credentials = new BasicAWSCredentials("XXXX", "YYYY");
AmazonS3Client conn = new AmazonS3Client(credentials);
String bucketName = "s3-xyz";
String prefix = "123";
ObjectListing objects = conn.listObjects(bucketName, prefix);
List<S3ObjectSummary> objectSummary = objects.getObjectSummaries();
for (S3ObjectSummary os : objectSummary) {
    System.out.println(os.getKey());
}
Errors
ERROR com.amazonaws.http.AmazonHttpClient - Unable to execute HTTP request: Connect to s3-xyz.amazonaws.com:443 timed out
org.apache.http.conn.ConnectTimeoutException: Connect to s3-xyz.s3.amazonaws.com:443 timed out
at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:551)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:640)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:318)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:202)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3037)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3008)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:531)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:515)
If the EC2 instance can only reach the internet through a proxy, the proxy has to be configured on the S3 client explicitly:
ClientConfiguration cc = new ClientConfiguration();
cc.setProxyHost("10.66.80.122");
cc.setProxyPort(8080);
AWSCredentials propertiesCredentials = new BasicAWSCredentials(aws_access_key_id, aws_secret_access_key);
AmazonS3Client s3 = new AmazonS3Client(propertiesCredentials, cc);
To find the proxy host and port, check your machine's LAN (proxy) settings.
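For reference, here is the listing code from the question combined with that proxy configuration into one self-contained sketch (the proxy address, keys, bucket, and prefix are the same placeholder values used above):
import java.util.List;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class ListBucketThroughProxy {
    public static void main(String[] args) {
        // Proxy settings from the snippet above (replace with your own proxy)
        ClientConfiguration cc = new ClientConfiguration();
        cc.setProxyHost("10.66.80.122");
        cc.setProxyPort(8080);

        // Credentials and client (SDK for Java v1)
        AWSCredentials credentials = new BasicAWSCredentials("XXXX", "YYYY");
        AmazonS3Client conn = new AmazonS3Client(credentials, cc);

        // List objects under the prefix, as in the question
        ObjectListing objects = conn.listObjects("s3-xyz", "123");
        List<S3ObjectSummary> summaries = objects.getObjectSummaries();
        for (S3ObjectSummary os : summaries) {
            System.out.println(os.getKey());
        }
    }
}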
Related
I need to copy files from an S3 bucket in one account to another. I am trying to do that via the aws-java-sdk client and its getObject and putObject functions. There are a lot of files to upload, so while putObject is running I get this error:
Exception in thread "main" software.amazon.awssdk.services.s3.model.S3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: S3, Status Code: 400)
How can this issue be fixed?
Here's the code that produces this error:
val clientConfigurationBuilder = ClientOverrideConfiguration.builder()
val clientConfiguration = clientConfigurationBuilder.build
val builder = S3Client
.builder
.credentialsProvider(createCredentialsProvider(accessKey, secretKey))
.region(Region.of(region))
.overrideConfiguration(clientConfiguration)
.httpClientBuilder(ApacheHttpClient.builder())
endpoint.map(URI.create).foreach(builder.endpointOverride)
val awsS3Client = builder.build
val getObjectRequest = GetObjectRequest
.builder()
.bucket(fromBucket)
.key(fromKey)
.build()
val getObjectResponse = awsS3Client.getObject(getObjectRequest)
val putObjectRequest = PutObjectRequest
.builder()
.bucket(toBucket)
.key(toKey)
.build()
val reqBody = RequestBody.fromInputStream(getObjectResponse,
getObjectResponse.response().contentLength())
awsS3Client.putObject(putObjectRequest, reqBody)
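One common way to sidestep this particular idle-connection timeout is to avoid piping the getObject stream directly into putObject and instead buffer each object to a temporary local file, so neither connection sits idle while the other side is slow. A minimal Java (SDK v2) sketch of that idea, with placeholder bucket names, region, and default credentials rather than the poster's setup:
import java.nio.file.Files;
import java.nio.file.Path;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class BufferedCopy {
    public static void main(String[] args) throws Exception {
        S3Client s3 = S3Client.builder().region(Region.US_EAST_1).build();

        GetObjectRequest get = GetObjectRequest.builder()
                .bucket("from-bucket").key("from-key").build();
        PutObjectRequest put = PutObjectRequest.builder()
                .bucket("to-bucket").key("to-key").build();

        // Download to a temporary file; getObject(request, path) refuses to
        // overwrite an existing file, so delete the placeholder first.
        Path tmp = Files.createTempFile("s3-copy-", ".tmp");
        Files.delete(tmp);
        s3.getObject(get, tmp);

        // Upload from the local file; the content length is known up front
        // and the socket is written to continuously.
        s3.putObject(put, RequestBody.fromFile(tmp));
        Files.deleteIfExists(tmp);
    }
}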
I am trying to download a 19 MB file from an Amazon S3 bucket using the Amazon SDK, but it takes a lot more time than the AWS CLI. The code I am using is below:
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
.withRegion(Regions.EU_WEST_1)
.withCredentials(new DefaultAWSCredentialsProviderChain())
.build();
s3Client.getObject(new GetObjectRequest("bucketName", "path/fileName.zip"), new File("localFileName.zip"));
Comparing the download timings of the two mechanisms: the SDK took around 9 minutes to download the file, whereas the AWS CLI took around 5 seconds.
Is there a way to decrease the download time while using the SDK?
The first issue here is that you are using the old SDK for Java, which is V1. Amazon recommends moving to V2 as a best practice.
To learn about AWS SDK for Java V2, see:
Developer guide - AWS SDK for Java 2.x
Here is the code you should use to download an object from an Amazon S3 bucket. This is the V2 S3TransferManager:
package com.example.transfermanager;
import software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.transfer.s3.FileDownload;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import java.nio.file.Paths;
/**
* To run this AWS code example, ensure that you have set up your development environment, including your AWS credentials.
*
* For information, see this documentation topic:
*
* https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
*/
public class GetObject {

    public static void main(String[] args) {
        final String usage = "\n" +
            "Usage:\n" +
            "    <bucketName> <objectKey> <objectPath> \n\n" +
            "Where:\n" +
            "    bucketName - the Amazon S3 bucket that contains the object to download.\n" +
            "    objectKey - the object to download (for example, book.pdf).\n" +
            "    objectPath - the path where the file is written (for example, C:/AWS/book2.pdf). \n\n";

        if (args.length != 3) {
            System.out.println(usage);
            System.exit(1);
        }

        long MB = 1024 * 1024; // 1 MB in bytes; minimumPartSizeInBytes expects bytes
        String bucketName = args[0];
        String objectKey = args[1];
        String objectPath = args[2];
        Region region = Region.US_EAST_1;

        S3TransferManager transferManager = S3TransferManager.builder()
            .s3ClientConfiguration(cfg -> cfg.region(region)
                .credentialsProvider(EnvironmentVariableCredentialsProvider.create())
                .targetThroughputInGbps(20.0)
                .minimumPartSizeInBytes(10 * MB))
            .build();

        downloadObjectTM(transferManager, bucketName, objectKey, objectPath);
        System.out.println("Object was successfully downloaded using the Transfer Manager.");
        transferManager.close();
    }

    public static void downloadObjectTM(S3TransferManager transferManager, String bucketName, String objectKey, String objectPath) {
        FileDownload download =
            transferManager.downloadFile(d -> d.getObjectRequest(g -> g.bucket(bucketName).key(objectKey))
                .destination(Paths.get(objectPath)));
        download.completionFuture().join();
    }
}
I just ran this code and downloaded a PDF that is 25 MB in seconds...
Any idea how to set the AWS proxy host and region on a Spark session or Spark context?
I am able to set them in AWS Java SDK code, and it works fine:
ClientConfiguration clientConfig = new ClientConfiguration();
clientConfig.setProxyHost("aws-proxy-qa.xxxxx.organization.com");
clientConfig.setProxyPort(8099);
AmazonS3ClientBuilder.standard()
    .withRegion(getAWSRegion(Regions.US_WEST_2))
    .withClientConfiguration(clientConfig) // Setting the AWS proxy host
    .build();
Can someone help me set the same thing on the Spark context (both region and proxy), since I am reading an S3 file that is in a different region from the EMR region?
Based on fs.s3a.access.key and fs.s3a.secret.key, the region will be determined automatically.
Just like the other s3a properties, set these on the SparkConf:
/**
 * Example: getSparkSessionForS3
 * @return a SparkSession configured for s3a access through the proxy
 */
def getSparkSessionForS3(): SparkSession = {
  val conf = new SparkConf()
    .setAppName("testS3File")
    .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .set("spark.hadoop.fs.s3a.endpoint", "yourendpoint")
    .set("spark.hadoop.fs.s3a.connection.maximum", "200")
    .set("spark.hadoop.fs.s3a.fast.upload", "true")
    .set("spark.hadoop.fs.s3a.connection.establish.timeout", "500")
    .set("spark.hadoop.fs.s3a.connection.timeout", "5000")
    .set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
    .set("spark.hadoop.com.amazonaws.services.s3.enableV4", "true")
    .set("spark.hadoop.com.amazonaws.services.s3.enforceV4", "true")
    .set("spark.hadoop.fs.s3a.proxy.host", "yourhost")
    .set("spark.hadoop.fs.s3a.proxy.port", "8099") // the proxy port goes alongside the proxy host
  val spark = SparkSession
    .builder()
    .config(conf)
    .getOrCreate()
  spark
}
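If you would rather set these on an existing context than via SparkConf, the same s3a keys can be applied to the context's Hadoop configuration. A minimal sketch (Java API shown for illustration; host, port, endpoint, and paths are placeholders):
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class S3ProxyOnContext {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("testS3File")
                .getOrCreate();

        // Same s3a settings as above, applied after the session already exists
        Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
        hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        hadoopConf.set("fs.s3a.endpoint", "yourendpoint");
        hadoopConf.set("fs.s3a.proxy.host", "yourhost");
        hadoopConf.set("fs.s3a.proxy.port", "8099");

        // Read a file from the other region's bucket through the proxy
        Dataset<String> lines = spark.read().textFile("s3a://your-bucket/path/to/file");
        System.out.println(lines.count());
    }
}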
I am trying to upload an image to AWS S3.
The web app runs on my local desktop in a Tomcat server.
When I upload the image from the server, I can see the file details in the HTTP request's multipart file; I'm able to get its size and details.
This is how I set up the connection:
File convFile = new File( file.getOriginalFilename());
file.transferTo(convFile);
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
.withRegion(Regions.US_WEST_2) //regionName is a string for a region not supported by the SDK yet
.withCredentials(new AWSStaticCredentialsProvider
(new BasicAWSCredentials("key", "accessId")))
// .setEndpointConfiguration(new EndpointConfiguration("https://s3.console.aws.amazon.com", "us-west-1"))
.enablePathStyleAccess()
.disableChunkedEncoding()
.build();
s3.putObject(new PutObjectRequest(bucketName, "key", convFile));
I tried two methodologies.
1) Converting the MultipartFile to java.io.File and uploading.
Error: com.amazonaws.SdkClientException: Unable to calculate MD5 hash: MyImage.png (No such file or directory)
2) Sending the image as a byte stream.
Error: I am getting java.io.FileNotFoundException: /path/to/tomcat/MyImage.tmp not found
The actual image name is MyImage.png.
Whichever method I try, I get an exception.
OK, there were several issues.
I mistyped the region for a different set of keys.
But the issue was still happening, so I went back to version 1.11.76. There were still some problems, and this is how I fixed them:
ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.setContentType(file.getContentType());
byte[] contentBytes = null;
try {
InputStream is = file.getInputStream();
contentBytes = IOUtils.toByteArray(is);
} catch (IOException e) {
System.err.printf("Failed while reading bytes from %s", e.getMessage());
}
Long contentLength = Long.valueOf(contentBytes.length);
objectMetadata.setContentLength(contentLength);
objectMetadata.setHeader("filename", fileNameWithExtn);
/*
* Reobtain the tmp uploaded file as input stream
*/
InputStream inputStream = file.getInputStream();
File convFile = new File(fileNameWithExtn); // If I don't do this, I think I was getting the file-not-found or MD5 error.
file.transferTo(convFile);
FileUtils.copyInputStreamToFile(inputStream, convFile); // You need commons.io in your pom.xml for this FileUtils to work. Not the apache FileUtils.
AmazonS3 s3 = new AmazonS3Client(new AWSStaticCredentialsProvider
(new BasicAWSCredentials("<yourkeyId>", "<YourAccessKey>")));
s3.setRegion(Region.US_West.toAWSRegion());
s3.setEndpoint("yourRegion.amazonaws.com");
versionId = s3.putObject(new PutObjectRequest("YourBucketName", name, convFile)).getVersionId();
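Since the ObjectMetadata above already carries the content type and length, the same bytes can also be sent without creating a local File at all, using the InputStream overload of putObject. A minimal sketch of that variant (the AmazonS3 client, bucket, and key are placeholders, not the original code):
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class StreamUpload {
    /**
     * Uploads raw bytes (for example, the MultipartFile contents read above)
     * without writing a temporary File to disk first.
     */
    public static String upload(AmazonS3 s3, String bucket, String key,
                                byte[] contentBytes, String contentType) {
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentType(contentType);
        metadata.setContentLength(contentBytes.length);

        InputStream stream = new ByteArrayInputStream(contentBytes);
        return s3.putObject(new PutObjectRequest(bucket, key, stream, metadata))
                .getVersionId();
    }
}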
I am having trouble using the provisioners (both "file" and "remote-exec") with AWS Lightsail. With the "file" provisioner I keep getting a dial error on port 22 with connection refused, and "remote-exec" gives me a timeout error. I can see it keeps trying to connect to the instance, but it just cannot connect.
For the file provisioner, I have also tried scp directly and it works just fine.
A sample snippet of the connection block I am using is as the following:
resource "aws_lightsail_instance" "han-mongo" {
name = "han-mongo"
availability_zone = "us-east-1b"
blueprint_id = "ubuntu_16_04"
bundle_id = "nano_1_0"
key_pair_name = "my_key_pair"
user_data = "${file("userdata.sh")}"
provisioner "file" {
source = "file.service"
destination = "/home/ubuntu"
connection {
type = "ssh"
private_key = "${file("my_key.pem")}"
user = "ubuntu"
timeout = "20s"
}
}
}
In addition to the authentication information, it's also necessary to tell Terraform which IP address it should use to connect, like this:
connection {
  type        = "ssh"
  host        = "${self.public_ip_address}"
  private_key = "${file("my_key.pem")}"
  user        = "ubuntu"
  timeout     = "20s"
}
For some resources Terraform is able to automatically infer some of the connection details from the resource attributes, but at present that is not supported for Lightsail instances and so it's necessary to specify the host argument explicitly.