I have two buckets with a large quantity of PDF files, and I want to make these searchable by file name and content after indexing all the documents. I tried using CloudSearch, but it appeared to be useful only for a single data type. Please guide me on how I can make the documents in an Amazon S3 bucket searchable under a domain name or from any web browser.
CloudSearch can index PDFs. You can submit that data from S3 buckets using the AWS CLI or the web console. This functionality is documented here http://docs.aws.amazon.com/cloudsearch/latest/developerguide/uploading-data.html
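For reference, a CloudSearch document batch is just JSON (or XML) describing add/delete operations. Below is a minimal boto3 sketch of submitting one; the document endpoint and field names are placeholders, and the PDF text is assumed to have been extracted beforehand.

```python
import json
import boto3

# Hypothetical document endpoint; copy yours from the CloudSearch console.
DOC_ENDPOINT = "https://doc-my-domain-xxxxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com"

client = boto3.client("cloudsearchdomain", endpoint_url=DOC_ENDPOINT)

# A batch is a JSON array of "add"/"delete" operations. The fields must
# match your domain's indexing options; file_name/content are examples.
batch = [
    {
        "type": "add",
        "id": "report-2023-pdf",
        "fields": {
            "file_name": "report-2023.pdf",
            "content": "text previously extracted from the PDF",
        },
    }
]

response = client.upload_documents(
    documents=json.dumps(batch),
    contentType="application/json",
)
print(response["status"])
```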
If you want something automated, an AWS Lambda function can be triggered whenever new objects land in your buckets and submit them for indexing.
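A rough sketch of such a Lambda handler, assuming an s3:ObjectCreated:* trigger and the same hypothetical CloudSearch endpoint as above (text extraction is omitted):

```python
import json
import urllib.parse
import boto3

# Same hypothetical CloudSearch document endpoint as above.
DOC_ENDPOINT = "https://doc-my-domain-xxxxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com"
search = boto3.client("cloudsearchdomain", endpoint_url=DOC_ENDPOINT)

def handler(event, context):
    """Triggered by s3:ObjectCreated:* events on the bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Index the file name right away; fetching the object and
        # extracting its text would happen here as well.
        batch = [{
            "type": "add",
            "id": key.replace("/", "_"),
            "fields": {"file_name": key, "bucket": bucket},
        }]
        search.upload_documents(
            documents=json.dumps(batch),
            contentType="application/json",
        )
```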
Related
I am looking to create an AWS solution where a Lambda function will transform some Excel data in an S3 bucket. While planning the architecture, I need to think of a way to get non-technical users, who don't have access to the AWS account, to upload data into an S3 bucket. One possible solution is using an S3 API and creating a UI to allow the users to upload the data. However, I do not have much experience with front-end programming skills such as JS and HTML. Are there any other possible solutions we can use?
Some ideas:
Serve a web page from somewhere (S3?) and have them upload via the browser, or
Give them a simple program like Cyberduck and they can drag & drop files, or
Set up the AWS Command-Line Interface (CLI) on their computer and have them double-click a script file to sync a local disk folder to the S3 bucket (see the sketch after this list), or
Use Dropbox and Invoke Lambda Function on New File from Dropbox - Pipedream
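On the sync-script idea: here's a tiny Python stand-in for `aws s3 sync` using boto3, assuming the user's credentials are already configured; the folder and bucket names are made up.

```python
import pathlib
import boto3

# Placeholder folder and bucket names.
LOCAL_DIR = pathlib.Path.home() / "upload-drop"
BUCKET = "my-intake-bucket"

s3 = boto3.client("s3")

# Upload every file under LOCAL_DIR, mirroring the folder structure
# as the object key.
for path in LOCAL_DIR.rglob("*"):
    if path.is_file():
        key = path.relative_to(LOCAL_DIR).as_posix()
        s3.upload_file(str(path), BUCKET, key)
        print(f"uploaded {key}")
```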
The main thing to think about is how you want to secure the upload. For example, do they need to authenticate first, or do you want anyone in the world to be able to upload to the bucket (not a good idea!). Should each user have their own IAM User credentials, or should they authenticate to an application that manages its own logins?
Also, have a think about what you want to happen after they upload the file. If they need to download something after the file has been processed, then you'll need a way to 'give back' the new file.
Use-case
We basically want to collect files from external customers into a file server.
We were thinking of using the S3 bucket as the file server that customers can interact with directly.
Question
Is it possible to accomplish this where we create a bucket for each customer, and the customer is given a link to the S3 bucket that also serves as the UI for dragging and dropping files into it directly?
He shouldn't have to log in to AWS or create an AWS account.
He should directly interact with only his own S3 bucket (drag and drop, add, delete files); there shouldn't be a way for him to see other buckets. We will probably create many S3 buckets for our customers in the same AWS account. His entry point into the S3 bucket UI is a link (the S3 bucket URL, perhaps).
If such a thing is possible, I would love some general pointers on what more I should do (see my approach below).
My work so far
I've been able to create an S3 bucket and grant public access to it.
I set policies to allow GetObject, ListBucket and PutObject on the bucket.
I've been able to give public access to objects inside the bucket via their links, but never to the bucket itself.
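For concreteness, the setup described above roughly corresponds to a bucket policy like this boto3 sketch; the bucket name is a placeholder, and note that opening a bucket to anonymous writes like this is generally discouraged (the presigned-URL answer below avoids it).

```python
import json
import boto3

s3 = boto3.client("s3")

# Placeholder bucket name. WARNING: this lets anyone on the internet
# read, list and write the bucket, which is exactly the risk the
# presigned-URL approach below works around.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicObjectAccess",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::customer-acme-uploads/*",
        },
        {
            "Sid": "PublicList",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::customer-acme-uploads",
        },
    ],
}

s3.put_bucket_policy(Bucket="customer-acme-uploads", Policy=json.dumps(policy))
```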
Is there something more I can build on or am I hitting a dead-end and this is not possible to accomplish?
P.S.: This may not be a coding question, but your answer could include code to accomplish it if at all possible, or general pointers otherwise.
S3 presigned URLs can help in such cases, but you have to write your own custom frontend application for the drag-and-drop features.
Link: https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html
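A minimal sketch of generating such a URL with boto3, assuming one URL per upload and placeholder bucket/key names:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and key; in practice you would generate one URL
# per upload, scoped to that customer's bucket or key prefix.
url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={"Bucket": "customer-acme-uploads", "Key": "incoming/invoice.pdf"},
    ExpiresIn=3600,  # link stays valid for one hour
)
print(url)
```

The customer (or your frontend) then uploads with a plain HTTP PUT to that URL, with no AWS credentials on their side; `generate_presigned_post` is the browser-form variant.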
I am using a RESTful API, and the API provider has more than 80 GB of images in an S3 bucket.
I need to download these images and upload them to my own AWS S3 bucket, which is a time-consuming job.
Is there any way to copy the images from the API to my S3 bucket, instead of downloading and uploading them again?
I talked with the API support team; they say I am given the image URLs, so it's up to me how I handle them.
I am using Laravel.
Is there a way to take the source image URLs and move the images directly to S3, instead of downloading and then uploading them?
Thanks
I think downloading and re-uploading across accounts would be inefficient, plus pricey for the API provider. Instead, I would talk to the API provider and try to replicate the images across accounts.
After replication, you can use Amazon S3 Inventory for various information related to the objects in the bucket.
Configuring replication when the source and destination buckets are owned by different accounts
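A rough boto3 sketch of what the provider would configure, assuming versioning is enabled on both buckets and a suitable replication role exists; every name and ARN below is a placeholder:

```python
import boto3

s3 = boto3.client("s3")

# Run by the provider (the source-bucket owner). Versioning must be
# enabled on both buckets for replication to work.
s3.put_bucket_replication(
    Bucket="provider-images",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111111111111:role/s3-replication-role",
        "Rules": [
            {
                "ID": "copy-images-to-consumer-account",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::my-image-bucket",
                    "Account": "222222222222",
                    # Hand ownership of the copies to the destination account.
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    },
)
```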
You want "S3 Batch Operations". Search for "xcopy".
You do not say how many images you have, but 1,000 at 80 GB each is 80 TB, and at that size you would not even want to download them file by file to a temporary EC2 instance in the same region, which might otherwise be a one- or two-day option; either way, you will still pay for ingress/egress.
I am sure AWS will do this in an ad-hoc manner for a price, as they would do if you were migrating from the platform.
It may also be easier to allow access to the original bucket from the alternative account, but that is not the question.
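If you do go the cross-account-access route, note that a server-side copy never moves the bytes through your machine. A minimal boto3 sketch, with placeholder bucket and key names and assuming the provider has granted your account read access:

```python
import boto3

s3 = boto3.client("s3")

# Server-side copy: S3 moves the bytes directly between buckets.
# Assumes the provider's bucket policy grants your account
# s3:GetObject on the source objects.
s3.copy_object(
    Bucket="my-image-bucket",
    Key="images/photo-001.jpg",
    CopySource={"Bucket": "provider-images", "Key": "images/photo-001.jpg"},
)
```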
I am using Amazon Rekognition in a project. My requirement is to upload a set of products to the bucket initially, and when a user uploads an image to my portal, he/she should get the matching (similar) image or images from my bucket as a result. Is this possible?
AWS Rekognition only supports a fixed set of labels; Amazon keeps expanding it, but they don't support user-defined labels. Here is a snippet from their FAQ (source):
Q. I can’t find the label I need. How do I request a new label?
Please send us your requests through AWS Customer Support. Amazon Rekognition continuously expands its catalog of labels based on customer feedback.
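To illustrate what the label-based approach looks like in practice, here is a small boto3 sketch; the bucket and object names are placeholders, and matching "similar" products would then come down to comparing these labels against labels you pre-computed for your product images:

```python
import boto3

rekognition = boto3.client("rekognition")

# Placeholder bucket and object names.
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-portal-uploads", "Name": "user-upload.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)

for label in response["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))
```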
I have a large bucket on AWS. The program I'm using, EVS, requires that all videos and photos be in one bucket instead of being split across new buckets.
So my bucket now has a crapload of stuff in there. Is there a way I can just skip to the item I want in the bucket without having to scroll through the entire bucket?
Thanks
If you're using the AWS web console, you can just start typing and it will filter the files in the bucket.
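Outside the console, you can do the same thing programmatically by listing with a key prefix; a small boto3 sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")

# Pages through only the keys that start with the prefix,
# instead of the whole bucket.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-evs-bucket", Prefix="videos/2023-"):
    for obj in page.get("Contents", []):
        print(obj["Key"])
```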