Alright, so I have the file transfer part working, but what I'm dealing with is on a huge scale (hundreds of thousands of potential uploads), so what I'm trying to do is this:
Trigger a Lambda to move the uploaded source object to a new location
The location should be a named key that includes the object's name (in a different bucket)
I have it moving the files from one S3 bucket to another; I just can't figure out how to get it to create a new key in my destination bucket based on the name of the uploaded file.
Example: uploaded file grandkids.jpg -> the Lambda PUT trigger moves the file to /grandkids/grandkids.jpg
Thank you all in advance. (It doesn't help that I only know the little bit of Node.js/Python I've picked up from Lambda; I am not an experienced coder at all.)
You just want to split the filename and use that as the prefix, like below.
fn = 'grandkids.jpg'
folder = fn.split('.')[0]
newkey = folder + '/' + fn
print(newkey)
# grandkids/grandkids.jpg
But what if you have a filename with more than one '.'? Use rsplit with a maxsplit of 1 to split only on the right-most '.'.
fn = 'my.awesome.grandkids.jpg'
folder = fn.rsplit('.', 1)[0].replace('.', '_') #personal preference to use underscores in folder names
newkey = folder + '/' + fn
print(newkey)
# my_awesome_grandkids/my.awesome.grandkids.jpg
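Putting that key logic together with the move itself, a minimal handler might look like the sketch below. This is only a sketch: the destination bucket name is a placeholder, and it assumes the function is wired to the source bucket's PUT event.

import boto3
from urllib.parse import unquote_plus

s3 = boto3.client("s3")
DEST_BUCKET = "my-destination-bucket"  # placeholder: your destination bucket

def lambda_handler(event, context):
    for record in event["Records"]:
        src_bucket = record["s3"]["bucket"]["name"]
        # keys arrive URL-encoded in the event, so decode them first
        src_key = unquote_plus(record["s3"]["object"]["key"])
        # derive the folder from the filename, as shown above
        folder = src_key.rsplit(".", 1)[0]
        new_key = folder + "/" + src_key
        # copy into the destination bucket under the new key, then remove the source
        s3.copy_object(
            Bucket=DEST_BUCKET,
            Key=new_key,
            CopySource={"Bucket": src_bucket, "Key": src_key},
        )
        s3.delete_object(Bucket=src_bucket, Key=src_key)
    return {"statusCode": 200}

S3 has no real "move" operation, so copy-then-delete is the usual pattern.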
I have an S3 bucket streaming logs to a Lambda function that tags files based on some logic.
While I have worked around this issue in the past, and I understand there are some characters that need to be handled, I'm wondering if there is a safe way to handle this with some API, or if it is something I need to handle on my own.
For example, I have a Lambda function like so:
import boto3

def lambda_handler(event, context):
    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        objectName = record["s3"]["object"]["key"]
        tags = []
        if "Pizza" in objectName:
            tags.append({"Key": "Project", "Value": "Great"})
        if "Hamburger" in objectName:
            tags.append({"Key": "Project", "Value": "Good"})
        if "Liver" in objectName:
            tags.append({"Key": "Project", "Value": "Yuck"})
        s3.put_object_tagging(
            Bucket=bucket,
            Key=objectName,
            Tagging={
                "TagSet": tags
            }
        )
    return {
        'statusCode': 200,
    }
This code works great. I upload a file to S3 called Pizza-Is-Better-Than-Liver.txt, the function runs, and it tags the file with both Great and Yuck (sorry for the strained example).
However, if I upload the file Pizza Is+AmazeBalls.txt, things go sideways:
Looking at the event in CloudWatch, the object key shows as: Pizza+Is%2BAmazeBalls.txt.
Obviously the space is escaped to a + and the + to a %2B. When I pass that key to put_object_tagging(), it fails with a NoSuchKey error.
My question: is there a defined way to deal with escaped characters in boto3 or some other SDK, or do I just need to do it myself? I really don't want to add any modules to the function, and I could just do a contains/replace(), but it's odd that I would get something back that I can't immediately use without some transformation.
I'm not uploading the files and can't mandate what they call things (i-have-tried-but-it-fails); if it's a valid Windows or Mac filename, it should work (I get that is a whole other issue, but I can deal with that).
EDIT:
So after some comments on GitHub: I should have been using urllib.parse.unquote_plus in this situation. This is the proper way to solve escaping issues like this.
from urllib.parse import unquote_plus
print(unquote_plus("Pizza+Is%2BAmazeBalls.txt"))
# Pizza Is+AmazeBalls.txt
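Applied to the handler above, the key just needs to be run through unquote_plus before it is used against the S3 API. A small self-contained sketch (the helper name is mine, not part of boto3 or the original code):

from urllib.parse import unquote_plus

def decoded_key(record):
    # The object key in an S3 event record is URL-encoded; decode it
    # before passing it to calls like put_object_tagging().
    return unquote_plus(record["s3"]["object"]["key"])

# Example with the record shape S3 sends:
record = {"s3": {"object": {"key": "Pizza+Is%2BAmazeBalls.txt"}}}
print(decoded_key(record))  # Pizza Is+AmazeBalls.txt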
Original Answer:
Since there were no other answers, I guess I'll post my band-aid:
def format_path(path):
    path = path.replace("+", " ")
    path = path.replace("%21", "!")
    path = path.replace("%24", "$")
    path = path.replace("%26", "&")
    path = path.replace("%27", "'")
    path = path.replace("%28", "(")
    path = path.replace("%29", ")")
    path = path.replace("%2B", "+")
    path = path.replace("%40", "@")
    path = path.replace("%3A", ":")
    path = path.replace("%3B", ";")
    path = path.replace("%2C", ",")
    path = path.replace("%3D", "=")
    path = path.replace("%3F", "?")
    return path
I'm sure there is a simpler, more complete way to do this, but this seems to work... for now.
I created a Google Form with a linked Google Spreadsheet. I would like the spreadsheet to be copied to an S3 bucket in AWS every time someone submits the form. To do so, I just got started with Google Apps Script. I managed to get the trigger part working on form submit, but I am struggling to understand the readme of this GitHub project for uploading to S3.
function setUpTrigger() {
  ScriptApp.newTrigger('copyDataS3')
    .forForm('1SK-2Ow63vs_TaoF54UjSgn35FL7F8_ANHDTOOiTabMM')
    .onFormSubmit()
    .create();
}

function copyDataS3() {
  // https://github.com/viuinsight/google-apps-script-for-aws
  // I do not understand where I should place aws.js and util.js.
  // Should I do File -> New -> Script file and copy-paste the contents? Should the file be .js or .gs?
  S3.init("MY_ACCESS_KEY", "MY_SECRET_KEY");
  // If I want to copy a spreadsheet with the following id, what should go into "object" below?
  var ssID = "SPREADSHEET_ID";
  S3.putObject(bucketName, objectName, object, region);
}
I believe your goal is as follows.
You want to send a Google Spreadsheet to an S3 bucket as CSV data using Google Apps Script.
Modification points:
When I looked at the google-apps-script-for-aws library you are using, I noticed that the data is handled as a string. In that case, your CSV data can probably be sent directly. But when you want to send binary data, for example, an error will occur. So in this answer, I would like to propose modified scripts for 2 patterns.
I thought that the situation might be similar to this thread, but I noticed that you are using a different library from that thread, so I am posting this answer.
Pattern 1:
In this pattern, it is supposed that only text data is sent, like the CSV data mentioned in your reply. In this case, I think it is not necessary to modify the library.
Modified script:
S3.init("MY_ACCESS_KEY", "MY_SECRET_KEY"); // Please set this.
var spreadsheetId = "###"; // Please set the Spreadsheet ID.
var sheetName = "Sheet1"; // Please set the sheet name.
var region = "###"; // Please set this.
var csv = SpreadsheetApp
  .openById(spreadsheetId)
  .getSheetByName(sheetName)
  .getDataRange()
  .getValues() // or .getDisplayValues()
  .map(r => r.join(","))
  .join("\n");
var blob = Utilities.newBlob(csv, MimeType.CSV, sheetName + ".csv");
S3.putObject("bucketName", "test.csv", blob, region);
Pattern 2:
In this pattern, it is supposed that both text data and binary data are sent. In this case, the library side also needs to be modified.
For google-apps-script-for-aws
Please modify line 110 in s3.js as follows.
From:
var content = object.getDataAsString();
To:
var content = object.getBytes();
And please modify line 146 in s3.js as follows.
From:
Utilities.DigestAlgorithm.MD5, content, Utilities.Charset.UTF_8));
To:
Utilities.DigestAlgorithm.MD5, content));
For Google Apps Script:
In this case, please give the blob to S3.putObject as follows.
Script:
S3.init("MY_ACCESS_KEY", "MY_SECRET_KEY"); // Please set this.
var fileId = "###"; // Please set the file ID.
var region = "###"; // Please set this.
var blob = DriveApp.getFileById(fileId).getBlob();
S3.putObject("bucketName", blob.getName(), blob, region);
References:
viuinsight/google-apps-script-for-aws
Class UrlFetchApp
computeDigest(algorithm, value)
PutObject
I am a newbie in Power BI and am currently working on a POC where I need to load data from a folder or directory. Before this load, I need to check that:
1) the respective folder exists
2) the file under the folder has a .csv extension
For example, suppose we have a file '/MyDoc2004/myAction.csv'.
Here we first need to check if MyDoc2004 exists, and then whether the myAction file has a .csv extension.
Is there any way we can do this using Power Query?
1. Check if the folder exists
You can apply the Folder.Contents function with the absolute path of the folder, and handle the error returned when the folder does not exist with the try ... otherwise ... syntax.
let
    absoluteFolderPath = "C:/folder/that/may/not/exist",
    folderContentsOrError = Folder.Contents(absoluteFolderPath),
    alternativeResult = """" & absoluteFolderPath & """ is not a valid folder path",
    result = try folderContentsOrError otherwise alternativeResult
in
    result
2. Check if the file has a .csv extension
I'm not sure what output you are expecting.
Here is a way to get the content of the file by its full path (including ".csv"), or return an alternative result if it is not found.
let
    absoluteFilePath = "C:/the/path/myAction.csv",
    fileContentsOrError = File.Contents(absoluteFilePath),
    alternativeResult = """" & absoluteFilePath & """ is not a valid file path",
    result = try fileContentsOrError otherwise alternativeResult
in
    result
If this is not what you are looking for, please update the question with the expected output.
Hope it helps.
I'm using a RegExp to search a folder for file types held in an array.
How can I show a failure if the folder contains more than one different file type? It is perfectly OK to have multiple files of the same type in the folder.
Here's my code:
var AllowedFileTypes = ["orf", "tif", "tiff", "jpg", "jpeg"];
var regex = new RegExp('.+\\.(?:' + AllowedFileTypes.join('|') + ')$', 'i');
var fileList = inputFolder.getFiles(regex);
Thanks
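The check being asked for comes down to collecting the distinct extensions of the matched files and failing when there is more than one. A rough sketch of that logic, written in Python purely to illustrate the idea (the file names are placeholders, and this is not the ExtendScript API):

import os

ALLOWED_FILE_TYPES = {"orf", "tif", "tiff", "jpg", "jpeg"}

def has_single_file_type(file_names):
    # Collect the distinct (lower-cased) extensions of the allowed files.
    extensions = set()
    for name in file_names:
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        if ext in ALLOWED_FILE_TYPES:
            extensions.add(ext)
    # More than one distinct type means the check fails;
    # any number of files of the same type is fine.
    return len(extensions) <= 1

print(has_single_file_type(["a.jpg", "b.jpg", "c.JPG"]))  # True
print(has_single_file_type(["a.jpg", "b.tif"]))           # False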
I use Paperclip 4.0.2 in my app to upload pictures.
So my Document model has an attached_file called attachment.
The attachment has a few styles, say :medium, :thumb, :facebook.
In my model, I stop the styles processing and extract it into a background job.
class Document < ActiveRecord::Base
  # stop paperclip styles generation
  before_post_process { false }
end
But the :original style file is still uploaded!
I would like to know if it's possible to stop this behavior and instead copy the file into :original/filename.jpg from a remote directory.
My goal is to use a file that has been uploaded to an S3 /temp/ directory with jQuery File Upload, and copy it to the directory where Paperclip needs it in order to generate the other styles.
Thank you in advance for your help!
New Answer:
Paperclip attachments get uploaded in the flush_writes method, which, for your purposes, is part of the Paperclip::Storage::S3 module. The line responsible for the uploading is:
s3_object(style).write(file, write_options)
So, by means of a monkey patch, you can change this to something like:
s3_object(style).write(file, write_options) unless style.to_s == "original" and @queued_for_write[:your_processed_style].present?
EDIT: This would be accomplished by creating the following file: config/initializers/decorators/paperclip.rb
Paperclip::Storage::S3.class_eval do
  def flush_writes #:nodoc:
    @queued_for_write.each do |style, file|
      retries = 0
      begin
        log("saving #{path(style)}")
        acl = @s3_permissions[style] || @s3_permissions[:default]
        acl = acl.call(self, style) if acl.respond_to?(:call)
        write_options = {
          :content_type => file.content_type,
          :acl => acl
        }
        # add storage class for this style if defined
        storage_class = s3_storage_class(style)
        write_options.merge!(:storage_class => storage_class) if storage_class
        if @s3_server_side_encryption
          write_options[:server_side_encryption] = @s3_server_side_encryption
        end
        style_specific_options = styles[style]
        if style_specific_options
          merge_s3_headers(style_specific_options[:s3_headers], @s3_headers, @s3_metadata) if style_specific_options[:s3_headers]
          @s3_metadata.merge!(style_specific_options[:s3_metadata]) if style_specific_options[:s3_metadata]
        end
        write_options[:metadata] = @s3_metadata unless @s3_metadata.empty?
        write_options.merge!(@s3_headers)
        s3_object(style).write(file, write_options) unless style.to_s == "original" and @queued_for_write[:your_processed_style].present?
      rescue AWS::S3::Errors::NoSuchBucket
        create_bucket
        retry
      rescue AWS::S3::Errors::SlowDown
        retries += 1
        if retries <= 5
          sleep((2 ** retries) * 0.5)
          retry
        else
          raise
        end
      ensure
        file.rewind
      end
    end
    after_flush_writes # allows attachment to clean up temp files
    @queued_for_write = {}
  end
end
Now the original does not get uploaded. You could then add some lines, like those in my original answer below, to your model if you wish to transfer the original to its appropriate final location when it was uploaded to S3 directly.
Original Answer:
Perhaps something like this, placed in your model and executed with the after_create callback:
paperclip_file_path = "relative/final/destination/file.jpg"
s3.buckets[BUCKET_NAME].objects[paperclip_file_path].copy_from("relative/temp/location/file.jpg")
thanks to https://github.com/uberllama