Which AWS service should I use to make videos web-ready? - amazon-web-services

I have users uploading videos in all kinds of formats, some not supported by chrome.
I just want to transcode all videos so they play in the browser, e.g. h.264.
I have looked at AWS Elemental MediaConvert, but its documentation does not really explain how to go from zero to hero, let alone provide meaningful job templates.
Is it at all possible to do that with MediaConvert, and if not, what would be an appropriate service?

In general, yes. Existing AWS Blog articles describe workflows that use S3 "watch folders" to convert new files as they arrive: MediaConvert converts each file to a ubiquitous output format (e.g. H.264) and delivers the converted output to a specified S3 location, which can support web access both directly and as an origin for CDN distributions.
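For orientation, a minimal sketch of submitting such a job with boto3 might look like the following. The role ARN, bucket paths and encoder settings are placeholders, and MediaConvert may require additional fields depending on your inputs, so treat this as a starting point rather than a finished job template:
import boto3

# MediaConvert needs an account-specific endpoint, discovered once via describe_endpoints
mc = boto3.client('mediaconvert', region_name='us-east-1')
endpoint = mc.describe_endpoints()['Endpoints'][0]['Url']
mc = boto3.client('mediaconvert', region_name='us-east-1', endpoint_url=endpoint)

job = mc.create_job(
    Role='arn:aws:iam::123456789012:role/MediaConvertRole',  # placeholder IAM role
    Settings={
        'Inputs': [{
            'FileInput': 's3://my-upload-bucket/user-video.avi',  # placeholder input object
            'AudioSelectors': {'Audio Selector 1': {'DefaultSelection': 'DEFAULT'}},
            'VideoSelector': {},
        }],
        'OutputGroups': [{
            'OutputGroupSettings': {
                'Type': 'FILE_GROUP_SETTINGS',
                'FileGroupSettings': {'Destination': 's3://my-web-bucket/transcoded/'},
            },
            'Outputs': [{
                'ContainerSettings': {'Container': 'MP4'},
                'VideoDescription': {'CodecSettings': {
                    'Codec': 'H_264',
                    'H264Settings': {'RateControlMode': 'QVBR',
                                     'QvbrSettings': {'QvbrQualityLevel': 7},
                                     'MaxBitrate': 5000000},
                }},
                'AudioDescriptions': [{'CodecSettings': {
                    'Codec': 'AAC',
                    'AacSettings': {'Bitrate': 96000, 'CodingMode': 'CODING_MODE_2_0',
                                    'SampleRate': 48000},
                }}],
            }],
        }],
    },
)
print(job['Job']['Id'])
In a watch-folder setup, the same call would typically run inside a Lambda function triggered by the S3 upload event.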
There are several considerations you may want to account for (for example: very long input files, exotic file formats, the nature of the content, etc.). The design and testing of a multi-step workflow with error handling can be complex. If you wish to use outside expertise, you have at least three options:
[a] AWS Paid Professional Services --
There is a large global AWS ProServices team able to help via paid service engagements. The fastest way to start this dialog is by submitting the AWS Sales team 'contact me' form found at this link and specifying 'Sales Support': https://aws.amazon.com/contact-us/
[b] AWS Certified Consulting Partners -- The partner search tool & listings are here: https://iq.aws.amazon.com/services/aws/medialive
[c] AWS Solutions Architects -- for AWS Enterprise Support customers. The TAM or the Sales contact linked in item [a] is the best way to engage them. Purchasing AWS Enterprise Support will entitle the customer to a dedicated TAM / SA combination.

Here is the documentation for AWS Elemental MediaConvert; following the Getting Started steps will give you a basic understanding of the service.
If there is any specific issue with the usage of the service, please create a new question.

Related

Google Cloud Vision - Which region does Google upload the images to?

I am building an OCR based solution to extract information from certain financial documents.
As per the regulation in my country (India), this data cannot leave India.
Is it possible to find the region where Google Cloud Vision servers are located?
Alternately, is it possible to restrict the serving region from the GCP console?
This is what I have tried:
I went through the GCP Data Usage FAQ: https://cloud.google.com/vision/docs/data-usage
GCP Terms of Service:
https://cloud.google.com/terms/
(Look at point 1.4 Data Location on this page)
I talked to a GCP sales rep, but he did not know the answer.
I know that I can talk to Google support, but that requires $100 to activate, which is a lot for me.
Any help would be appreciated. I went through the documentation for Rekognition as well, but it seems to send some data outside for training, so I am not considering it at the moment.
PS - Edited to make things I have tried clearer.
For anyone looking at this topic recently: Google Vision introduced multi-region support in December 2019, as can be seen in their release notes.
Currently Google Vision supports two processing regions, eu and us, and Google states that using a specific endpoint guarantees that processing will only take place in the chosen territory.
The documentation for regionalization mentions that you can simply replace the default API endpoint vision.googleapis.com with either of the regional ones:
eu-vision.googleapis.com
us-vision.googleapis.com
The vision client libraries offer options for selecting the endpoint as well, and the documentation gives code samples.
For example, here is how you would do it in Python:
from google.cloud import vision

# Point the client at the EU regional endpoint instead of the global default
client_options = {'api_endpoint': 'eu-vision.googleapis.com'}
client = vision.ImageAnnotatorClient(client_options=client_options)
As pointed out by @Tedinoz in a comment above, the answer can be found here: https://groups.google.com/forum/#!topic/cloud-vision-discuss/at43gnChLNY
To summarise:
1. Google stores images uploaded to Cloud Vision only in memory
2. Data is not restricted to a particular region (as of Dec 6, 2018)
3. They might add data residency features in Q1, 2019.

Planning an architecture in GCP

I want to plan an architecture based on the GCP cloud platform. Below are the subject areas I have to cover. Can someone please help me find the proper services to perform these operations?
Data ingestion (Batch, Real-time, Scheduler)
Data profiling
AI/ML based data processing
Analytical data processing
Elastic search
User interface
Batch and Real-time publish
Security
Logging/Audit
Monitoring
Code repository
If I am missing something that I have to take care of, then please add that too.
GCP offers many products whose functionality can partially overlap. Which product to use depends on the specific use case, and you can find an overview about it here.
That being said, an overall summary of the services you asked about would be:
1. Data ingestion (Batch, Real-time, Scheduler)
That will depend on where your data comes from, but the most common options are Dataflow (for both batch and streaming) and Pub/Sub for streaming messages.
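As a rough illustration only (the project, topic, table and schema names below are invented), a streaming Dataflow pipeline written with the Apache Beam Python SDK that reads events from Pub/Sub and lands them in BigQuery could look like this:
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project/topic/table; streaming=True makes this a real-time pipeline
options = PipelineOptions(streaming=True, project='my-project', runner='DataflowRunner',
                          region='us-central1', temp_location='gs://my-bucket/tmp')

with beam.Pipeline(options=options) as p:
    (p
     | 'ReadEvents' >> beam.io.ReadFromPubSub(topic='projects/my-project/topics/events')
     | 'ParseJson' >> beam.Map(lambda msg: json.loads(msg.decode('utf-8')))
     | 'WriteToBQ' >> beam.io.WriteToBigQuery(
           'my-project:analytics.events',
           schema='user_id:STRING,page:STRING,ts:TIMESTAMP',
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
The same pipeline code can be reused for batch ingestion by swapping the Pub/Sub source for a file-based one (e.g. reading from Cloud Storage) and dropping the streaming flag.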
2. Data profiling
Dataprep (which actually runs on top of Dataflow) can be used for data profiling; here is an overview of how you can do it.
3. AI/ML based data processing
For this, you have several options depending on your needs. For developers with limited machine learning expertise there is AutoML, which allows you to quickly train and deploy models. For more experienced data scientists there is ML Engine, which allows training and prediction of custom models built with frameworks like TensorFlow or scikit-learn.
Additionally, there are some pre-trained models for things like video analysis, computer vision, speech to text, speech synthesis, natural language processing or translation.
Plus, it’s even possible to perform some ML tasks directly in GCP’s data warehouse, BigQuery, using SQL.
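For instance, a hypothetical BigQuery ML model could be trained with a plain SQL statement submitted through the BigQuery Python client (the dataset, table and column names here are made up):
from google.cloud import bigquery

client = bigquery.Client()

# Train a simple logistic regression model directly inside BigQuery (BigQuery ML)
sql = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS(model_type='logistic_reg') AS
SELECT pageviews, sessions, churned AS label
FROM `my_dataset.user_stats`
"""
client.query(sql).result()  # waits for the training job to finish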
4. Analytical data processing
Depending on your needs, you can use Dataproc, which is a managed Hadoop and Spark service, or Dataflow for stream and batch data processing.
BigQuery is also designed with analytical operations in mind.
5. Elastic search
There is no managed Elasticsearch service provided directly by GCP, but you can find several options on the Marketplace, like an API service or a Kubernetes app for Google Kubernetes Engine.
6. User interface
If you are referring to a user interface for your own use, GCP’s console is what you’d be using. If you are referring to a UI for end-users, I’d suggest using App Engine.
If you are referring to a UI for data exploration, there is Datalab, which is essentially a managed notebook service, and Data Studio, where you can build plots of your data in real time.
7. Batch and Real-time publish
The publishing service in GCP, for both synchronous and asynchronous messages is Pub/Sub.
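A minimal publishing sketch with the Pub/Sub Python client (project and topic names are placeholders) would look something like this:
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'my-topic')  # placeholder names

# Messages are plain bytes; keyword arguments become optional string attributes
future = publisher.publish(topic_path, b'hello world', origin='batch-job')
print(future.result())  # blocks until the server returns the message ID
Because publish() is asynchronous, the same topic can be fed by both batch jobs and real-time producers.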
8. Security
Most security concerns in GCP are addressed here. This is a wide topic by itself and would probably warrant a separate question.
9. Logging/Audit
GCP uses Stackdriver for logging across most of its products, and provides many ways to process and analyze those logs.
10. Monitoring
Stackdriver also has monitoring features.
11. Code repository
For this there is Cloud Source Repositories, which integrates with GCP’s automated build system and can also be easily synced with a GitHub repository.
12. Analytical data warehouse
You did not ask for this one, but I think it's an important part of a data analysis stack.
In the case of GCP, this would be BigQuery.

What is the best tool to use for real-time web statistics?

I operate a number of content websites that have several million user sessions and need a reliable way to monitor some real-time metrics on particular pieces of content (key metrics being: pageviews/unique pageviews over time, unique users, referrers).
The use case here is for the stats to be visible to authors/staff on the site, as well as to act as source data for real-time content popularity algorithms.
We already use Google Analytics, but this does not update quickly enough (4-24 hours depending on traffic volume). Google Analytics does offer a real-time reporting API, but this is currently in closed beta (I have requested access several times, but no joy yet).
New Relic appears to offer a few analytics products, but they are quite expensive ($149/500k pageviews - we have several times this).
Other answers I found on StackOverflow suggest building your own, but this was 3-5 years ago. Any ideas?
I've heard some good things about Woopra, and they offer 1.2m page views for the same price as New Relic.
https://www.woopra.com/pricing/
If that's too expensive, then the alternative is live-loading your logs and using an Elasticsearch service to read them to get the data you want, but you will need access to your logs whilst they are being written to.
A service like Loggly might suit you, as it would enable you to "live tail" your logs (view them whilst they are being written), but again there is a cost to that.
Failing that, you could do something yourself, or get someone on Freelancer to knock something up for you that enables the logs to be read and displayed in a format you recognise.
https://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm
If the metrics that you need to track are limited to the ones that you have listed (page views, unique users, referrers), you might consider collecting your web server logs and using a log analyzer.
There are several free tools available on the Internet to get real-time statistics out of those logs.
Take a look at www.elastic.co, for example.
Hope this helps!
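For example, if your servers write Combined Log Format access logs, a quick sketch along these lines (the file path and filtering rules are my own assumptions) already gives you pageviews, unique IPs and top referrers:
import re
from collections import Counter

# Combined Log Format: host ident authuser [date] "request" status bytes "referer" "user-agent"
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "([^"]*)" "[^"]*"')

pageviews, ips, referrers = 0, set(), Counter()
with open('access.log') as f:          # placeholder path to your web server log
    for line in f:
        m = CLF.match(line)
        if not m or m.group(4) != '200' or m.group(2) != 'GET':
            continue                   # count only successful GET requests
        pageviews += 1
        ips.add(m.group(1))            # unique client IPs as a proxy for unique users
        if m.group(5) not in ('', '-'):
            referrers[m.group(5)] += 1

print(pageviews, len(ips), referrers.most_common(10))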
Google Analytics offers real-time data viewing now, if that's what you want:
https://support.google.com/analytics/answer/1638635?hl=en
I believe their API is now released, as we are looking at incorporating this!
If you have access to the web server logs, then you can set up Elasticsearch as the search engine, along with Logstash as the log parser and Kibana as the front-end tool for analyzing the data.
For more information, please go through the Elasticsearch link:
Elasticsearch weblink
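If you go that route, indexing parsed log entries from Python is straightforward; here is a rough sketch assuming the official elasticsearch client (8.x) and a local node, with made-up index and field names:
from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')  # placeholder node address

# One document per request; Kibana can then visualise the 'weblogs' index in real time
doc = {'path': '/article/42', 'ip': '203.0.113.9', 'referrer': 'https://example.com',
       'timestamp': datetime.utcnow().isoformat()}
es.index(index='weblogs', document=doc)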

Building an Amazon store with Drupal 7

I've been playing around with an idea for an Amazon store with Drupal 7. I do a lot of product reviews, and I typically link to Amazon pages already (without referrer IDs, since I wanted to avoid any questions of integrity altogether), but as for having a separate storefront link, well, I'm playing with the idea.
I'm using Drupal 7, and I installed the Amazon API and Amazon Store module. It uses an Amazon AWS account and an Amazon Associates ID, and basically creates a light storefront that does all the lifting through Amazon itself. It only ever uses Amazon items, which is fine (since what isn't on Amazon?), and it only gives you a referral payout.
Well, what I'd love to do is have a stronger control over the items in the store. The Amazon Store module just gives you the option to control the basic items that are visible upon loading.
What I'd like to do: Create a store where categories match the contents of my site, and disable the search options. Is this possible with these modules? Does anyone have advice on creating something like this?
Please see the module below; I hope it will be handy:
https://drupal.org/project/amazon_store

Are there monitoring tools for AWS S3 and CloudFront?

I am using the Amazon services S3 and CloudFront for a web application, and I would like to have various statistics about access to the data I am providing, based on the logs of those services (logging is activated in both services).
I did a bit of googling, and the only thing I could find is how to manage my S3 storage. I also noticed that New Relic offers monitoring for many Amazon services, but not for those two.
Is there something that you use? A service that could read my logs periodically and provide me with some nice analytics that would make developers and managers happy?
e.g.
I am trying to avoid writing my own log parsers.
I believe Piwik supports the Amazon S3 log format. Take a look at their demo site to see some example reports.
Well, this may not be what you expect, but I use Qloudstat for my CloudFront distributions.
The $5 plan covers my needs; that's less than a burrito here where I live.
Best regards.
Well, we have a SaaS product, Cloudlytics, which offers you many reports including geo, IP tracking, spam, and CloudFront cost analysis. You can try it for free for up to 25 MB of logs.
I might be answering this very late, but I have worked on a Golang library that can analyze CDN and S3 usage and store the results in a backend of your choice (InfluxDB, MongoDB or Cassandra) for later time-series evaluation. The project is hosted at http://github.com/meson10/cdnlysis
See if this fits.
Popular 3rd party analytics packages include S3stat, Cloudlytics and Qloudstat. They all run around $10/month for low traffic sites.
Several stand-alone analytics packages support Amazon's logfile format if you want to download logs each night and feed them in directly. Others might need pre-processing to transform to Combined Logfile Format (CLF) first.
I've written about how to do that here:
https://www.expatsoftware.com/articles/2007/11/roll-your-own-web-stats-for-amazon-s3.html
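If you end up doing that pre-processing yourself, a rough sketch of converting S3 server access log lines to CLF in Python could look like this (the regex follows the documented S3 access log layout, but you should verify it against your own log files, and the later S3 fields are simply ignored here):
import re
import sys

# Leading fields of an S3 server access log line: owner bucket [time] ip requester
# request-id operation key "request" status error bytes ...
S3_LOG = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request>[^"]*)" (?P<status>\S+) (?P<error>\S+) (?P<bytes>\S+)'
)

def to_clf(line):
    m = S3_LOG.match(line)
    if not m:
        return None
    bytes_sent = m.group('bytes') if m.group('bytes') != '-' else '0'
    # CLF: host ident authuser [date] "request" status bytes
    return '%s - - [%s] "%s" %s %s' % (
        m.group('ip'), m.group('time'), m.group('request'),
        m.group('status'), bytes_sent)

if __name__ == '__main__':
    for line in sys.stdin:
        clf = to_clf(line)
        if clf:
            print(clf)
The converted output can then be fed to any analytics package that expects standard web server logs.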