Best way to publicly download a full folder with Amazon?

I'm writing a launcher (in C#) that downloads a full game or app. The app can be very large (e.g. 5 GB), and I need to download it with the correct folder hierarchy so that the same launcher can check whether the user has the correct app or whether it needs to be repaired or updated.
I'm trying to do that with Amazon S3 and CloudFront, but it seems that I can only get individual objects and not the full folder of the app.
I have also stored the folder on an EC2 instance, and that works fine, but it seems that EC2 is not designed for that, so downloads are extremely slow.
Is there any amazon service to do that?

Have you considered zipping the files first? It solves a lot of issues, e.g. folder structure and compression, and works great from S3 and CloudFront. It's a common solution for this use case.
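To illustrate why an archive takes care of the folder structure, here is a minimal sketch in Python (used only for illustration; a C# launcher would use its own zip library). The function names `zip_folder` and `unzip_folder` are hypothetical. Each file is stored under its path relative to the app folder, so extracting the archive recreates the same hierarchy.

```python
import zipfile
from pathlib import Path


def zip_folder(src_dir: str, zip_path: str) -> None:
    """Pack src_dir into zip_path, storing paths relative to src_dir."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # The archive name keeps the sub-folder part of the path.
                zf.write(path, path.relative_to(src).as_posix())


def unzip_folder(zip_path: str, dest_dir: str) -> list:
    """Extract zip_path into dest_dir and return the archived file names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return [name for name in zf.namelist() if not name.endswith("/")]
```

The sketch assumes the archive is written outside the folder being zipped; otherwise the archive would try to include itself.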

You can do this in your application with the DownloadDirectory method of the TransferUtility class in the .NET SDK.
You can read more about the DownloadDirectory method in the AWS SDK for .NET documentation. By default I believe it only downloads objects in the root path, so don’t forget to do it recursively for sub-folders if necessary.
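For reference, the recursive part boils down to listing every key under a prefix and turning each key into a local path. Below is a rough sketch of that logic in Python, not the .NET TransferUtility itself. It assumes a boto3-style S3 client (`get_paginator("list_objects_v2")` and `download_file`); the helper names `local_path_for_key` and `download_prefix` are made up for this example, and `dest_dir` is assumed to be a non-empty path.

```python
import os


def local_path_for_key(key: str, prefix: str, dest_dir: str) -> str:
    """Map an S3 key under `prefix` to a path under dest_dir, keeping sub-folders."""
    relative = key[len(prefix):].lstrip("/")
    return os.path.join(dest_dir, *relative.split("/"))


def download_prefix(s3_client, bucket: str, prefix: str, dest_dir: str) -> list:
    """Download every object under `prefix`, recreating the key hierarchy locally."""
    downloaded = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                # Skip zero-byte "folder" placeholder objects.
                continue
            target = local_path_for_key(key, prefix, dest_dir)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3_client.download_file(bucket, key, target)
            downloaded.append(target)
    return downloaded
```

With a real client this would be called as something like `download_prefix(boto3.client("s3"), "my-bucket", "game/", "C:/Games/MyGame")`, where the bucket name and paths are placeholders.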

Related

AEM/Adobe Experience Manager upload only some assets to AWS S3

My company is using AEM 6.5 and we were thinking to get some better performance out of our systems.
The idea we had is to upload only some assets (for example videos) to an S3 bucket and keep the other assets locally, we do not want to upload all the assets/datastore to S3. I know I can switch the datastore to S3, but that would mean all the assets go to S3, and we don't want this.
Restriction: we want the video upload to be done seamlessly from within the AEM Author, the editor should upload the video normally and somehow, behind the scenes, this transition to S3 to happen.
I checked as much documentation as I could find, and there is no mention of this partial asset upload to S3, you either go full S3 or nothing at all (we already tested full S3 datastore, it's working, but we do not want it).
So, my question is: did someone manage to do something like this?
Thanks
Have you looked into writing an Adobe Experience Manager workflow that reads a list of assets and uploads only those specified assets? That way you control which assets are uploaded to the Amazon S3 bucket before running the AEM workflow.
You can create a custom workflow step as discussed at the link below. In your use case, however, you would use the S3 Java API inside that custom workflow step. This is one way to control which assets are uploaded to an Amazon S3 bucket from AEM.
https://helpx.adobe.com/experience-manager/using/message_service_gateway_api_64.html
Technically, it is possible to upload assets to S3 when they are uploaded to AEM, instead of storing them in the JCR. Nevertheless, this probably won't work as you expect and would require a lot of refactoring of AEM itself to make it work properly.
Just because the binary is stored in S3 does not mean that AEM's internals are aware of that and can deal with it.
Take asset preview on the author for example: this part of AEM expects the binary to be stored in the JCR. You would have to rewrite this whole part of AEM to look for those assets in S3. This would be a massive headache, not least because overlaying those parts of AEM is already deprecated. And this is just one example of hundreds that you would need to find a solution for.
It is not worth the effort.
You probably need to go "all-in" with S3 or leave it as is. Not sure what the reasoning is behind this drive to only use S3 "partially" for videos instead of all assets. Videos are probably already the largest assets you have, so it can't be cost. We run pure asset installations with S3 datastore that have 20TB-60TB of data which is totally fine.

Azure DevOps Copying files to Google cloud failed

Until now we have been using this extension (https://marketplace.visualstudio.com/items?itemName=GlobalFreightSolutionsLtd.copy-files-to-google-buckets) to copy files from Azure to a Google Cloud bucket. For almost a year everything worked perfectly, until we got this error:
UnhandledPromiseRejectionWarning: ResumableUploadError: A resumable upload could not be performed. The directory, C:\Users\VssAdministrator.config, is not writable. You may try another upload, this time setting options.resumable to false.
Maybe someone has had a similar problem and can help solve it. Or should we contact the product owner to solve the issue? Any other options/suggestions for uploading files are also welcome. Thanks.
Judging by the error message, it's coming from the Node.js client library used to connect to GCP buckets; the extension is apparently built with this GCP client library. The suggestion mentioned on GitHub would need to be implemented by the developer of the extension. Concretely, it looks like it may be using the createWriteStream method, which according to the docs:
Resumable uploads require write access to the $HOME directory. Through
config-store, some metadata is stored. By default, if the directory is
not writable, we will fall back to a simple upload. However, if you
explicitly request a resumable upload, and we cannot write to the
config directory, we will return a ResumableUploadError
I would suggest contacting the extension publisher, or trying to disable resumable uploads within the extension options (if any).

Google Cloud Storage - files not showing

I have over 30 Leaflet maps hosted on my Google Cloud Platform bucket (for example) and it has always been an easy process to upload my folder (which includes an html file with sub-folders including .js and .css files) and share the map publicly.
I tried uploading another map today, but within the folder there are no files showing and I get the following message "There are no live objects in this folder. If you have object versioning enabled, this folder may contain archived versions of objects, which aren't visible in the console. You can list archived object versions using gsutil or the APIs."
Does anyone know what is going on here?
We have also seen this problem, and it seems that the issue is limited to buckets that have spaces in the name.
It's also not reproducible through the gcloud web console, but if you use gsutil to upload a file to a bucket with a space in the name then it won't be visible on the web UI.
I can see from your screenshot that your bucket also has spaces (%20 in the url).
If you need a workaround asap, you could rename your bucket...
But google should fix this soon, I hope.
There is currently an open issue on the GCS/Console integration.
If file names contain any symbols that need URL encoding, they are not visible in the console but are accessible via gsutil/API (which is currently the recommended workaround).
Issue has been resolved as of 8-May-2018 10:00 UTC
This can happen if the file doesn't have an extension: the UI treats it as a folder and lets you navigate into it, showing a blank folder instead of the file contents.
We had the same symptom (files show up in API but invisible on the web and via CLI).
The issue turned out to be that we were saving files to "./uploads", which Google interprets as "create a directory literally called '.' and then a subdirectory called uploads."
The fix was to upload to "uploads/" instead of "./uploads". We also just ran a mass copy operation via the API for everything under "./uploads". All visible now!
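A small sketch of that fix, assuming you build object names in your own upload code: normalize the path before using it as the object name, so a leading "./" never reaches the bucket. The function name `normalize_object_name` is made up for this example; it only cleans the string and does not call any Google API.

```python
import posixpath


def normalize_object_name(path: str) -> str:
    """Drop leading "./" and "/" and collapse repeated slashes in an object name."""
    cleaned = posixpath.normpath(path).lstrip("/")
    # normpath returns "." for an empty or "./" path; treat that as no name.
    return "" if cleaned == "." else cleaned
```

For example, `normalize_object_name("./uploads/map.html")` gives `"uploads/map.html"`, which is the form that showed up correctly for us.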
I also had spaces in my url and it was not working properly yesterday. Checked this morning and everything is working as expected. I still have the spaces in my URL btw.

Is there an easier way to remove old files on servers through GitHub after deleting files and syncing locally?

I currently work with a web dev team and we have 100+ GitHub repos, each for a different e-commerce website that has an instance on AWS. The developers use the GitHub app to upload their changes to the servers, and do this multiple times a day.
I'm trying to find the easiest way for us to remove old, deleted files from our servers after we delete and sync GitHub locally.
To make it clear, say we have an index.html, page1.html and page2.html. We want to remove page1.html, so the developers delete page1.html and sync through the GitHub app. The file is no longer visible in the repo, but to remove it completely I must also SSH into our AWS server, go to the www directory, find page1.html and remove it there. Is there an easier way for the developers, who do not use SSH and the command line, to get rid of those files when syncing with GitHub? It becomes a pain to have to SSH into many different servers and then determine which files were removed from the repo so that I can remove them there.
Thanks in advance
Something we do with our repo is use tags (releases), and then through automation (Chef in our case) we tell it to pull the new tag. It sounds like this wouldn't necessarily work for you, but what Chef actually does with the tag might be of interest.
It pulls the tag and then updates a symlink (and gracefully restarts Apache). This means there's zero downtime (the symlink updates instantly) and, because it's pulling a fresh copy, any deleted files are gone.
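The symlink switch itself is only a few lines. This is a rough sketch of the idea in Python, not what Chef actually runs; `activate_release` and the directory layout are assumptions for illustration, and it assumes a POSIX server where renaming one symlink over another is atomic.

```python
import os


def activate_release(release_dir: str, current_link: str) -> None:
    """Point current_link at release_dir by renaming a temporary symlink over it."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover link from an interrupted run
    os.symlink(release_dir, tmp_link)
    # Renaming over the old link swaps the target in one step (atomic on POSIX).
    os.replace(tmp_link, current_link)
```

Each deploy checks the tag out into a fresh directory (say `releases/v2`) and then calls `activate_release("releases/v2", "www")`. Since the web root now points at a clean checkout, a file such as page1.html that was deleted from the repo is simply not there any more.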

How can I copy/duplicate a folder on S3?

I want to make a copy of the folders and images on my s3 bucket for my development server. How can I do that?
I just wanted to write an updated answer here:
You can now use Amazon's AWS Management Console (under the S3 tab) to right-click on any folder (or file) in Bucket A, click Copy, then navigate to Bucket B, right-click, and click Paste.
This makes it extremely easy to copy the contents of your production bucket over to your dev bucket.
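If you would rather script the same copy than click through the console, something along these lines should work. It is a sketch that assumes a boto3-style S3 client (`get_paginator("list_objects_v2")` and `copy_object`); the function name `copy_prefix` and the bucket names in the usage note are placeholders.

```python
def copy_prefix(s3_client, src_bucket: str, dst_bucket: str, prefix: str = "") -> list:
    """Copy every object under `prefix` from src_bucket to dst_bucket, keeping keys."""
    copied = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Server-side copy: the data does not pass through this machine.
            s3_client.copy_object(
                Bucket=dst_bucket,
                Key=key,
                CopySource={"Bucket": src_bucket, "Key": key},
            )
            copied.append(key)
    return copied
```

Usage would look like `copy_prefix(boto3.client("s3"), "prod-bucket", "dev-bucket", "images/")`. Note that a single `copy_object` call only handles objects up to 5 GB, which should be fine for folders of images.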
If you are using Linux and just want to pull copies down to the local filesystem, you could use s3sync:
http://www.s3sync.net/wiki
If you wanted to access the files directly on s3, you could mount s3 as a fuse filesystem locally, but beware that accessing files using this method is dependent on your connection, so there could be speed issues. I've used s3fs perfectly well for accessing backups etc:
(can only post one link atm, but google s3fs - it's hosted on googlecode)
If you just need a copy, then s3sync is the easiest option.
Hope this helps.
I have to say, in conclusion, I recommend using a GUI. They've already laid out the work for you.
My best recommendation is Bucket Explorer (works on all OSes).
The runner-up is CloudBerry (only on PCs).
Bucket Explorer has a sweet, very easy-to-understand GUI, and has a lot of great perks, analytics, and usability that outweigh all the others I experimented with.