Updating a file in Amazon S3 bucket - amazon-web-services

I am trying to append a string to the end of a text file stored in S3.
Currently I just read the contents of the file into a String, append my new text and resave the file back to S3.
Is there a better way to do this. I am thinkinig when the file is >>> 10MB then reading the entire file would not be a good idea so how should I do this correctly?
Current code
[code]
private void saveNoteToFile( String p_note ) throws IOException, ServletException
{
String str_infoFileName = "myfile.json";
String existingNotes = s3Helper.getfileContentFromS3( str_infoFileName );
existingNotes += p_note;
writeStringToS3( str_infoFileName , existingNotes );
}
public void writeStringToS3(String p_fileName, String p_data) throws IOException
{
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream( p_data.getBytes());
try {
streamFileToS3bucket( p_fileName, byteArrayInputStream, p_data.getBytes().length);
}
catch (AmazonServiceException e)
{
e.printStackTrace();
} catch (AmazonClientException e)
{
e.printStackTrace();
}
}
public void streamFileToS3bucket( String p_fileName, InputStream input, long size)
{
//Create sub folders if there is any in the file name.
p_fileName = p_fileName.replace("\\", "/");
if( p_fileName.charAt(0) == '/')
{
p_fileName = p_fileName.substring(1, p_fileName.length());
}
String folder = getFolderName( p_fileName );
if( folder.length() > 0)
{
if( !doesFolderExist(folder))
{
createFolder( folder );
}
}
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(size);
AccessControlList acl = new AccessControlList();
acl.grantPermission(GroupGrantee.AllUsers, Permission.Read);
s3Client.putObject(new PutObjectRequest(bucket, p_fileName , input,metadata).withAccessControlList(acl));
}
[/code]

It's not possible to append to an existing file on AWS S3. When you upload an object it creates a new version if it already exists:
If you upload an object with a key name that already exists in the
bucket, Amazon S3 creates another version of the object instead of
replacing the existing object
Source: http://docs.aws.amazon.com/AmazonS3/latest/UG/ObjectOperations.html
The objects are immutable.
It's also mentioned in these AWS Forum threads:
https://forums.aws.amazon.com/message.jspa?messageID=179375
https://forums.aws.amazon.com/message.jspa?messageID=540395

It's not possible to append to an existing file on AWS S3.
You can delete existing file and upload new file with same name.
Configuration
private string bucketName = "my-bucket-name-123";
private static string awsAccessKey = "AKI............";
private static string awsSecretKey = "+8Bo..................................";
IAmazonS3 client = new AmazonS3Client(awsAccessKey, awsSecretKey,
RegionEndpoint.APSoutheast2);
string awsFile = "my-folder/sub-folder/textFile.txt";
string localFilePath = "my-folder/sub-folder/textFile.txt";
To Delete
public void DeleteRefreshTokenFile()
{
try
{
var deleteFileRequest = new DeleteObjectRequest
{
BucketName = bucketName,
Key = awsFile
};
DeleteObjectResponse fileDeleteResponse = client.DeleteObject(deleteFileRequest);
}
catch (Exception ex)
{
throw new Exception(ex.Message);
}
}
To Upload
public void UploadRefreshTokenFile()
{
FileInfo file = new FileInfo(localFilePath);
try
{
PutObjectRequest request = new PutObjectRequest()
{
InputStream = file.OpenRead(),
BucketName = bucketName,
Key = awsFile
};
PutObjectResponse response = client.PutObject(request);
}
catch (Exception ex)
{
throw new Exception(ex.Message);
}
}

One option is to write the new lines/information to a new version of the file. This would create a LARGE number of versions. But, essentially, whatever program you are using the file for could read ALL the versions and append them back together when reading it (this seems like a really bad idea as I write it out).
Another option would be to write a new object each time with a time stamp appended to the object name. my-log-file-date-time . Then whatever program is reading from it could append them all together after downloading my-log-file-*.
You would want to delete objects older than a certain time just like log rotation.
Depending on how busy your events are this might work. If you have thousands per second, I don't think this would work. But if you just have a few events per minute it may be reasonable.

You can do it with s3api put-object.
First download the version you want and use below commend. it will upload as the latest version.
ᐅ aws s3api put-object --bucket $BUCKET --key $FOLDER/$FILE --body $YOUR_LOCAL_DOWNLOADED_VERSION_FILE

Related

Pentaho ETL Transformation Using Lamda

Is it possible to run Pentaho ETL Jobs/transformation using AWS Lamda functions?
I have Pentaho ETL jobs running on schedule on the Windows server, we are planning to migrate to AWS. I am considering the Lambda function. just to understand if it is possible to schedule the Pentaho ETL Jobs using AWS Lamdba
Here is the snippet of code that I was able to successfully run in AWS Lambda Function.
handleRequest Function is called from AWS Lambda Function
public Integer handleRequest(String input, Context context) {
parseInput(input);
return executeKtr(transName);
}
parseInput: This function is used to parse out a string parameter passed by Lambda Function to extract KTR name and its parameters with value. Format of the input is "ktrfilename param1=value1 param2=value2"
public static void parseInput(String input) {
String[] tokens = input.split(" ");
transName = tokens[0].replace(".ktr", "") + ".ktr";
for (int i=1; i<tokens.length; i++) {
params.add(tokens[i]);
}
}
Executing KTR: I am using git repo to store all my KTR files and based on the name passed as a parameter KTR is executed
public static Integer executeKtr(String ktrName) {
try {
System.out.println("Present Project Directory : " + System.getProperty("user.dir"));
String transName = ktrName.replace(".ktr", "") + ".ktr";
String gitURI = awsSSM.getParaValue("kattle-trans-git-url");
String repoLocalPath = clonePDIrepo.cloneRepo(gitURI);
String path = new File(repoLocalPath + "/" + transName).getAbsolutePath();
File ktrFile = new File(path);
System.out.println("KTR Path: " + path);
try {
/**
* IMPORTANT NOTE FOR LAMBDA FUNCTION MUST CREATE .KEETLE DIRECOTRY OTHERWISE
* CODE WILL FAIL IN LAMBDA FUNCTION WITH ERROR CANT CREATE
* .kettle/kettle.properties file.
*
* ALSO SET ENVIRNOMENT VARIABLE ON LAMBDA FUNCTION TO POINT
* KETTLE_HOME=/tmp/.kettle
*/
Files.createDirectories(Paths.get("/tmp/.kettle"));
} catch (IOException e) {
e.printStackTrace();
throw new RuntimeException("Error Creating /tmp/.kettle directory");
}
if (ktrFile.exists()) {
KettleEnvironment.init();
TransMeta metaData = new TransMeta(path);
Trans trans = new Trans(metaData);
// SETTING PARAMETERS
trans = parameterSetting(trans);
trans.execute( null );
trans.waitUntilFinished();
if (trans.getErrors() > 0) {
System.out.print("Error Executing transformation");
throw new RuntimeException("There are errors in running transformations");
} else {
System.out.print("Successfully Executed Transformation");
return 1;
}
} else {
System.out.print("KTR File:" + path + " not found in repo");
throw new RuntimeException("KTR File:" + path + " not found in repo");
}
} catch (KettleException e) {
e.printStackTrace();
throw new RuntimeException(e.getMessage());
}
}
parameterSetting: If KTR is accepting parameter and it is passed while calling AWS Lambda function, it is set using parameterSetting function.
public static Trans parameterSetting(Trans trans) {
String[] transParams = trans.listParameters();
for (String param : transParams) {
for (String p: params) {
String name = p.split("=")[0];
String val = p.split("=")[1];
if (name.trim().equals(param.trim())) {
try {
System.out.println("Setting Parameter:"+ name + "=" + val);
trans.setParameterValue(name, val);
} catch (UnknownParamException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}
trans.activateParameters();
return trans;
}
CloneGitRepo:
public class clonePDIrepo {
/**
* Clones the given repo to local folder
*
* #param pathWithPwd Gir repo URL with access token included in the url. e.g.
* https://token_name:token_value#github.com/ktr-git-repo.git
* #return returns Local Repository String Path
*/
public static String cloneRepo(String pathWithPwd) {
try {
/**
* CREATING TEMP DIR TO AVOID FOLDER EXISTS ERROR, THIS TEMP DIRECTORY LATER CAN
* BE USED TO GET ABSOLETE PATH FOR FILES IN DIRECTORY
*/
File pdiLocalPath = Files.createTempDirectory("repodir").toFile();
Git git = Git.cloneRepository().setURI(pathWithPwd).setDirectory(pdiLocalPath).call();
System.out.println("Git repository cloned successfully");
System.out.println("Local Repository Path:" + pdiLocalPath.getAbsolutePath());
// }
return pdiLocalPath.getAbsolutePath();
} catch (Exception e) {
e.printStackTrace();
return null;
}
}
}
AWSSSMgetParaValue: Gets string value of the parameter passed.
public static String getParaValue(String paraName) {
try {
Region region = Region.US_EAST_1;
SsmClient ssmClient = SsmClient.builder()
.region(region)
.build();
GetParameterRequest parameterRequest = GetParameterRequest.builder()
.name(paraName)
.withDecryption(true)
.build();
GetParameterResponse parameterResponse = ssmClient.getParameter(parameterRequest);
System.out.println(paraName+ " value retreived from AWS SSM");
ssmClient.close();
return parameterResponse.parameter().value();
} catch (SsmException e) {
System.err.println(e.getMessage());
return null;
}
}
Assumptions:
Git repo is created with KTR files in the root of the repo
git repo url exists on the aws SSM with valid tokens to clone the repo
Input string contains name of the KTR file
Environment Variable is configured on Lambda Function for KETTLE_HOME=/tmp/.kettle
Lambda Function has necessary permissions for SSM and S3 VPC Network
Proper Security Group rules are setup to allow required network access for the KTR File
I am planning to upload complete code to git. I will update this post with the URL of the repository.

Read n number of lines from s3 object using AWS lambda

In my Lambda I am trying to parse the content of a document from s3 bucket. The document I am processing is a txt file with more than 100Mb. I need to parse only the first line of the file.
What is the best cost-effective way to read the file?
Currently, I am taking the content using getObjectContent() method and taking the 1st line from it like this.
private AmazonS3 s3 = AmazonS3ClientBuilder.standard().build ();
GetObjectRequest getObjectRequest = new GetObjectRequest(bucket, key);
S3Object s3Object = s3.getObject(getObjectRequest);
BufferedReader reader = new BufferedReader(new InputStreamReader(s3Object.getObjectContent()));
String firstLine;
try {
while ((firstLine = reader.readLine()) != null) {
logger.log("META PROCESSOR | FIRST LINE OF FILE : " + firstLine);
break;
}
} catch (IOException e) {
logger.log("META PROCESSOR | FAILED TO LOAD FIRST LINE ");
return null;
}
Is it a good way to read the entire content just to read the first line? Is there any method available to read n number of lines from a file or n number of bytes from a file?

Uploading large files to aws s3 bucket without loading to memory

I am trying to upload 2GB+ file to my bucket using MultipartFile and AmazonS3, controller:
#PostMapping("/uploadFile")
public String uploadFile(#RequestPart(value = "file") MultipartFile file) throws Exception {
String fileUploadResult = this.amazonClient.uploadFile(file);
return fileUploadResult;
}
amazonClient-uploadFile:
public String uploadFile(MultipartFile multipartFile) throws Exception {
StringBuilder fileUrl = new StringBuilder();
try {
File file = convertMultiPartToFile(multipartFile);
String fileName = generateFileName(multipartFile);
fileUrl.append(endpointUrl);
fileUrl.append("/");
fileUrl.append(bucketName);
fileUrl.append("/");
fileUrl.append(fileName);
uploadFileTos3bucket(fileName, file);
file.delete();
} catch (Exception e) {
e.printStackTrace();
throw e;
}
return fileUrl.toString();
}
amazonClient-convertMultiPartToFile:
private File convertMultiPartToFile(MultipartFile file) throws IOException {
File convFile = new File(file.getOriginalFilename());
FileOutputStream fos = new FileOutputStream(convFile);
fos.write(file.getBytes());
fos.close();
return convFile;
}
amazonClient-uploadFileTos3bucket:
private void uploadFileTos3bucket(String fileName, File file) {
s3client.putObject(
new PutObjectRequest(bucketName, fileName, file)
.withCannedAcl(CannedAccessControlList.PublicRead));
}
The process works well for small file, to deal with large ones I defined in my application.properties -
spring.servlet.multipart.max-file-size=5GB
spring.servlet.multipart.max-request-size=5GB
spring.servlet.multipart.enabled=true
And getting - java.lang.OutOfMemoryError so:
1- How can I upload file without loading it to memory(not sure that is possible)?
2- How to load it in smaller parts?
Excption -
{"timestamp":"2018-11-12T12:50:38.250+0000","status":500,"error":"Internal Server Error","message":"No message available","trace":"java.lang.OutOfMemoryError\r\n\tat java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)\r\n\tat java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)\r\n\tat java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)\r\n\tat java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)\r\n\tat org.springframework.util.StreamUtils.copy(StreamUtils.java:143)\r\n\tat org.springframework.util.FileCopyUtils.copy(FileCopyUtils.java:110)\r\n\tat org.springframework.util.FileCopyUtils.copyToByteArray(FileCopyUtils.java:162)\r\n\tat org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile.getBytes(StandardMultipartHttpServletRequest.java:245)\r\n\tat com.siemens.plm.it.aws.connect.handels.AmazonClient.convertMultiPartToFile(AmazonClient.java:51)\r\n\tat com.siemens.plm.it.aws.connect.handels.AmazonClient.uploadFile(AmazonClient.java:75)\r\n\tat com.siemens.plm.it.aws.connect.controllers.BucketController.uploadFile(BucketController.java:48)\r\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\r\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\r\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\r\n\tat java.lang.reflect.Method.invoke(Method.java:498)\r\n\tat org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:215)\r\n\tat org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:142)\r\n\tat org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)\r\n\tat org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895)\r\n\tat org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:800)\r\n\tat org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)\r\n\tat org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038)\r\n\tat org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)\r\n\tat org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:998)\r\n\tat org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:901)\r\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:660)\r\n\tat org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:875)\r\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:741)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)\r\n\tat org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)\r\n\tat org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)\r\n\tat org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)\r\n\tat org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:92)\r\n\tat org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)\r\n\tat org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93)\r\n\tat org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)\r\n\tat org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)\r\n\tat org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)\r\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)\r\n\tat org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)\r\n\tat org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)\r\n\tat org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490)\r\n\tat org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)\r\n\tat org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)\r\n\tat org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)\r\n\tat org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)\r\n\tat org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)\r\n\tat org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)\r\n\tat org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:770)\r\n\tat org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1415)\r\n\tat org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)\r\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\r\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\r\n\tat org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)\r\n\tat java.lang.Thread.run(Thread.java:748)\r\n","path":"/storage/uploadFile"}
I too had this problem and I solved it by directly uploading the file as stream to the bucket.
In case you are using java servlet , below line of code is to read the video from form-data.
Part filePart = request.getPart("video");
//Now convert it to inputstream
InputStream in = request.getPart("video").getInputStream();
Now I made one S3Class, below is the part of code from the function.
try {
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
.withRegion(clientRegion)
.withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
.build();
ObjectMetadata meta = new ObjectMetadata();
meta.setContentLength(in.available());
meta.setContentType("video/mp4");
TransferManager tm = TransferManagerBuilder.standard()
.withS3Client(s3Client)
.build();
PutObjectRequest request = new PutObjectRequest(bucketName, keyName, in, meta);
Upload upload = tm.upload(request);
// Optionally, you can wait for the upload to finish before continuing.
upload.waitForCompletion();
if(upload.isDone())
{
System.out.println("Total bytes transferred is : "+totalBytesTransferred);
tm.shutdownNow();
}
} catch (AmazonServiceException e) {
// The call was transmitted successfully, but Amazon S3 couldn't process
// it, so it returned an error response.
output = "Couldn't Upload the file.";
output_code = 0;
System.out.println("Inside exception");
e.printStackTrace();
} catch (SdkClientException e) {
// Amazon S3 couldn't be contacted for a response, or the client
// couldn't parse the response from Amazon S3.
output = "Couldn't Upload the file.";
output_code = 0;
e.printStackTrace();
}
P.S. Don't forget to add setContentLength() to ObjectMetaData else it will lead to OOM error.

Accessing specified key from s3 bucket?

I have a S3 bucket xxx. I wrote one lambda function to access data from s3 bucket and writing those details to a RDS PostgreSQL instance. I can do it with my code. I added one trigger to the lambda function for invoking the same when a file falls on s3.
But from my code I can only read file having name 'sampleData.csv'. consider my code given below
public class LambdaFunctionHandler implements RequestHandler<S3Event, String> {
private AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
public LambdaFunctionHandler() {}
// Test purpose only.
LambdaFunctionHandler(AmazonS3 s3) {
this.s3 = s3;
}
#Override
public String handleRequest(S3Event event, Context context) {
context.getLogger().log("Received event: " + event);
String bucket = "xxx";
String key = "SampleData.csv";
System.out.println(key);
try {
S3Object response = s3.getObject(new GetObjectRequest(bucket, key));
String contentType = response.getObjectMetadata().getContentType();
context.getLogger().log("CONTENT TYPE: " + contentType);
// Read the source file as text
AmazonS3 s3Client = new AmazonS3Client();
String body = s3Client.getObjectAsString(bucket, key);
System.out.println("Body: " + body);
System.out.println();
System.out.println("Reading as stream.....");
System.out.println();
BufferedReader br = new BufferedReader(new InputStreamReader(response.getObjectContent()));
// just saving the excel sheet data to the DataBase
String csvOutput;
try {
Class.forName("org.postgresql.Driver");
Connection con = DriverManager.getConnection("jdbc:postgresql://ENDPOINT:5432/DBNAME","USER", "PASSWORD");
System.out.println("Connected");
// Checking EOF
while ((csvOutput = br.readLine()) != null) {
String[] str = csvOutput.split(",");
String name = str[1];
String query = "insert into schema.tablename(name) values('"+name+"')";
Statement statement = con.createStatement();
statement.executeUpdate(query);
}
System.out.println("Inserted Successfully!!!");
}catch (Exception ase) {
context.getLogger().log(String.format(
"Error getting object %s from bucket %s. Make sure they exist and"
+ " your bucket is in the same region as this function.", key, bucket));
// throw ase;
}
return contentType;
} catch (Exception e) {
e.printStackTrace();
context.getLogger().log(String.format(
"Error getting object %s from bucket %s. Make sure they exist and"
+ " your bucket is in the same region as this function.", key, bucket));
throw e;
}
}
From my code you can see that I mentioned key="SampleData.csv"; is there any way to get the key inside a bucket without specifying a specific file name?
These couple of links would be of help.
http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html
You can list objects using prefix and delimiter to find the key you are looking for without passing a specific filename.
If you need to get the event details on S3, you can actually enable the s3 event notifier to lambda function. Refer the link
You can enable this by,
Click on 'Properties' inside your bucket
Click on 'Events '
Click 'Add notification'
Give a name and select the type of event (eg. Put, delete etc.)
Give prefix and suffix if necessary or else leave blank which consider all events
Then 'Sent to' Lambda function and provide the Lambda ARN.
Now the event details will be sent lambda function as a json format. You can fetch the details from that json. The input will be like this:
{"Records":[{"eventVersion":"2.0","eventSource":"aws:s3","awsRegion":"ap-south-1","eventTime":"2017-11-23T09:25:54.845Z","eventName":"ObjectRemoved:Delete","userIdentity":{"principalId":"AWS:AIDAJASDFGZTLA6UZ7YAK"},"requestParameters":{"sourceIPAddress":"52.95.72.70"},"responseElements":{"x-amz-request-id":"A235BER45D4974E","x-amz-id-2":"glUK9ZyNDCjMQrgjFGH0t7Dz19eBrJeIbTCBNI+Pe9tQugeHk88zHOY90DEBcVgruB9BdU0vV8="},"s3":{"s3SchemaVersion":"1.0","configurationId":"sns","bucket":{"name":"example-bucket1","ownerIdentity":{"principalId":"AQFXV36adJU8"},"arn":"arn:aws:s3:::example-bucket1"},"object":{"key":"SampleData.csv","sequencer":"005A169422CA7CDF66"}}}]}
You can access the key as objectname = event['Records'][0]['s3']['object']['key'](Oops, this is for python)
and then sent this info to RDS.

AWS S3 returns 404 for a file that definitely still exists there

We have some code that downloads a bunch of S3 files to a local directory. The list of files to retrieve is from a query we run. It only lists files that actually exist in our S3 bucket.
As we loop to retrieve these files, about 10% of them return a 404 error as if the file doesn't exist. I log out the name/location of that file, so I can go to S3 and check, and sure enough every single one of the IS ON S3 in the location we went looking for it.
Why does S3 throw a 404 when the file exists?
Here is the Groovy code of the script.
class RetrieveS3FilesFromCSVLoader implements Loader {
private static String missingFilesFile = "00-MISSED_FILES.csv"
private static String csvFileName = "/csv/s3file2.csv"
private static String saveFilesToLocation = "/tmp/retrieve/"
public static final char SEPARATOR = ','
#Autowired
DocumentFileService documentFileService
private void readWithCommaSeparatorSQL() {
int counter = 0
String fileName
String fileLocation
File missedFiles = new File(saveFilesToLocation + missingFilesFile)
PrintWriter writer = new PrintWriter(missedFiles)
File fileCSV = new File(getClass().getResource(csvFileName).toURI())
fileCSV.splitEachLine(SEPARATOR as String) { nextLine ->
//if (counter < 15) {
if (nextLine != null && (nextLine[0] != 'FileLocation')) {
counter++
try {
//Remove 0, only if client number start with "0".
fileLocation = nextLine[0].trim()
byte[] fileBytes = documentFileService.getFile(fileLocation)
if (fileBytes != null) {
fileName = fileLocation.substring(fileLocation.indexOf("/") + 1, fileLocation.length())
File file = new File(saveFilesToLocation + fileName)
file.withOutputStream {
it.write fileBytes
}
println "$counter) Wrote file ${fileLocation} to ${saveFilesToLocation + fileLocation}"
} else {
println "$counter) UNABLE TO RETRIEVE FILE ELSE: $fileLocation"
writer.println(fileLocation)
}
} catch (Exception e) {
println "$counter) UNABLE TO RETRIEVE FILE: $fileLocation"
println(e.getMessage())
writer.println(fileLocation)
}
} else {
counter++;
}
//}
}
writer.close()
}
Here is the code for getFile(fileLocation) and client creation.
public byte[] getFile(String filename) throws IOException {
AmazonS3Client s3Client = connectToAmazonS3Service();
S3Object object = s3Client.getObject(S3_BUCKET_NAME, filename);
if(object == null) {
return null;
}
byte[] fileAsArray = IOUtils.toByteArray(object.getObjectContent());
object.close();
return fileAsArray;
}
/**
* Connects to Amazon S3
*
* #return instance of AmazonS3Client
*/
private AmazonS3Client connectToAmazonS3Service() {
AWSCredentials credentials;
try {
credentials = new BasicAWSCredentials(S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY);
} catch (Exception e) {
throw new AmazonClientException(
"Cannot load the credentials from the credential profiles file. " +
"Please make sure that your credentials file is at the correct " +
"location (~/.aws/credentials), and is in valid format.",
e);
}
AmazonS3Client s3 = new AmazonS3Client(credentials);
Region usWest2 = Region.getRegion(Regions.US_EAST_1);
s3.setRegion(usWest2);
return s3;
}
The code above works for 90% of the files in the list passed to the script, but we know with fact that all 100% of the files exist in S3 and with the location String we are passing.
I am just an idiot. Thought it had the production AWS credentials in the properties file. Instead it was development credentials. So I had the wrong credentials.