What is the expectation of privacy on an EC2 instance?

If I turn on a machine in EC2, what expectation of privacy do I have for my running processes, command line history, data stored on ephemeral disk, etc?
Can people at Amazon decide to take a look at what I'm running?
Could Amazon decide to do some profiling for the purposes of upselling?
Hi there! Looks like you're running Cassandra! Here are the optimal tuning settings for Cassandra on your m1.xlarge machine!
I can't seem to find anything in the docs...

This is the most applicable thing I found:
AWS only uses each customer's content to provide the AWS services selected by that customer and does not use customer content for any other purposes. AWS treats all customer content the same and has no insight into what type of content the customer chooses to store in AWS. AWS simply makes available the compute, storage, database, mobile, and network services selected by the customer. AWS does not require access to customer content to provide its services.
http://aws.amazon.com/compliance/data-privacy-faq/

What you are asking about should be addressed in the "Data Privacy" section of their Customer Agreement page (http://aws.amazon.com/agreement/):
3.2 Data Privacy. We participate in the safe harbor programs described in the Privacy Policy. You may specify the AWS regions in which Your Content will be stored and accessible by End Users. We will not move Your Content from your selected AWS regions without notifying you, unless required to comply with the law or requests of governmental entities. You consent to our collection, use and disclosure of information associated with the Service Offerings in accordance with our Privacy Policy, and to the processing of Your Content in, and the transfer of Your Content into, the AWS regions you select.
Here's a link to their "Privacy Policy":
http://aws.amazon.com/privacy/
So in essence, it's saying that you need to consent for them to gather information stored on your server. That's different from poking at the TCP ports on your machines from the outside. Amazon constantly runs port and traffic checks from the outside (possibly from inside their internal network too) to make sure you are complying with their customer agreement. For example, they can check that you are not hosting something illegal (through public content) or sending spam or bot traffic that tries to break into other servers.
That said, it's quite possible that they use some of these monitoring tools to reason like this: "this person has port so-and-so open, so they must be running this application, and we can suggest something better for them."
Hope it helps.

Related

Usage monitoring from whitelisted IPs

I need to set up a shared processing service that uses a load balancer and several EC2 instances to process incoming requests with a custom .NET application. My issue is that I need to bill based on usage. Only whitelisted IPs will be able to call the application, and each IP gets a set number of free calls, after which every call is a billable event.
Since the AWS documentation for ELB states "We recommend that you use access logs to understand the nature of the requests, not as a complete accounting of all requests", I don't think the ELB access logs are what I'm looking for.
My question is how best to manage this so that the accounting team gets an easy report each month showing how many calls each client made.
Actually, you can use the access logs: they are written to S3, so you can query them per IP with Athena using standard SQL, analyze your logs, and extract reports.
References:
https://docs.aws.amazon.com/athena/latest/ug/what-is.html
https://aws.amazon.com/premiumsupport/knowledge-center/athena-analyze-access-logs/
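Once the client IP of every request is available (for example as one column of an Athena query over a month of logs), the billing rule from the question reduces to simple counting. The following is a minimal Python sketch; the whitelist, client names and free allowance are hypothetical placeholders. Keep in mind the caution quoted in the question: ELB access logs are best-effort, so if every call must be billed exactly, counting inside the application may be the safer source.

```python
from collections import Counter
from typing import Iterable

# Hypothetical placeholders: real values would come from your client contracts.
WHITELIST = {"203.0.113.10": "client-a", "198.51.100.7": "client-b"}
FREE_CALLS_PER_IP = 1000


def monthly_billing_report(client_ips: Iterable[str], whitelist: dict[str, str],
                           free_calls: int) -> dict[str, dict[str, int]]:
    """Split each whitelisted client's calls into free and billable calls.

    `client_ips` holds one entry per request (e.g. the client IP column of an
    Athena query over one month of access logs); IPs that are not on the
    whitelist are ignored.
    """
    counts = Counter(ip for ip in client_ips if ip in whitelist)
    report = {}
    for ip, client in whitelist.items():
        total = counts[ip]  # Counter returns 0 for IPs that made no calls
        report[client] = {
            "total_calls": total,
            "free_calls": min(total, free_calls),
            "billable_calls": max(0, total - free_calls),
        }
    return report


if __name__ == "__main__":
    ips = ["203.0.113.10"] * 1200 + ["198.51.100.7"] * 40 + ["192.0.2.99"] * 5
    # client-a: 1200 calls, 200 billable; client-b: 40 calls, 0 billable
    print(monthly_billing_report(ips, WHITELIST, FREE_CALLS_PER_IP))
```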

AWS load balancer log analyzer

I'm new to the AWS world. My goal is, when problems occur, to quickly find the top client IPs in the Elastic Load Balancer logs and, if possible, who they are or some other details about them. I've only found paid services. Does anyone know a free application or website that analyzes AWS ELB logs?
As far as I know, there is no completely free solution, but there are cheap ones.
You can monitor your load balancer by "Access logs", "CloudWatch metrics", "Request tracing" and "CloudTrail logs".
I'm not sure exactly what you want, but here are some possible solutions.
If you're worried about being attacked and need immediate protection (against security scans, DDoS, etc.), you can use AWS's own services. AWS Shield Standard is automatically included at no extra cost, and "for added protection against DDoS attacks, AWS offers AWS Shield Advanced": https://docs.aws.amazon.com/shield/
AWS WAF is also good against attacks: you can create rules, rule actions, etc. Unfortunately it isn't completely free; it's billed pay-as-you-go: https://aws.amazon.com/waf/pricing/
You can store the access logs in S3 and analyse them later, but this can end up being costly (and it's not real time).
You can analyse your log records with a Lambda function. In that case you need some NoSQL store or similar to keep state and logic. (Lambda and DynamoDB are pay-as-you-go and cheap, but not free.)
Keep in mind that:
The load balancer and Lambda also increment the corresponding CloudWatch metrics (cheap, but not free).
You will pay for outgoing data transfer (from AWS to the internet), although 1 TB/month per account is always free through CloudFront: https://aws.amazon.com/free/
If you want a cheap, good solution, use AWS's own services.
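The Lambda option boils down to counting requests per client IP in short time windows and flagging anything unusual. Below is a minimal Python sketch of just that counting step, assuming the log lines have already been parsed into (timestamp, client IP) pairs; the function name and threshold are hypothetical, and persisting counts between invocations (e.g. in DynamoDB, as suggested above) is left out.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Iterable


def heavy_hitters_per_minute(records: Iterable[tuple[str, str]],
                             threshold: int) -> dict[str, list[str]]:
    """Return {minute: [client IPs]} for IPs that sent more than `threshold`
    requests within that minute.

    `records` holds (ISO-8601 timestamp, client IP) pairs, e.g. parsed from
    ELB access log lines inside a Lambda function.
    """
    per_minute: dict[str, Counter] = defaultdict(Counter)
    for timestamp, ip in records:
        # ELB timestamps look like 2015-05-13T23:39:43.945958Z; bucket by minute.
        ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        per_minute[ts.strftime("%Y-%m-%dT%H:%M")][ip] += 1
    return {
        minute: sorted(ip for ip, n in counts.items() if n > threshold)
        for minute, counts in sorted(per_minute.items())
        if any(n > threshold for n in counts.values())
    }
```

A fixed per-minute threshold is the simplest possible rule; a real setup would probably tune it per endpoint or compare against a normal baseline.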
Elastic Load Balancing provides access logs that capture detailed information about requests sent to your load balancer. Each log contains information such as the time the request was received, the client's IP address, latencies, request paths, and server responses.
But keep in mind that access logging is an optional feature of Elastic Load Balancing that is disabled by default. After you enable access logging for your load balancer, Elastic Load Balancing captures the logs and stores them in the Amazon S3 bucket that you specify as compressed files. You can disable access logging at any time.
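To see how those fields can be used directly, here is a minimal Python sketch that builds a top-requester report (request count and mean response time per client IP) from gzip-compressed log files already downloaded from the bucket. It assumes the space-separated Application Load Balancer (ALB) entry layout; Classic Load Balancer entries have no leading type field, so the indices would shift by one. Function names are hypothetical, and no whois lookup is done.

```python
import gzip
import shlex
from collections import defaultdict
from pathlib import Path
from typing import Iterable, Optional


def parse_alb_line(line: str) -> dict:
    """Extract a few fields from one ALB access log entry.

    Assumes the layout: type, time, elb, client:port, target:port, three
    processing times, status codes, byte counts, "request", ... shlex keeps
    quoted fields such as the request line together.
    """
    f = shlex.split(line)
    return {
        "time": f[1],
        "client_ip": f[3].rsplit(":", 1)[0],    # "client:port" -> "client"
        "target_processing_time": float(f[6]),  # seconds; -1 if no target responded
        "elb_status_code": f[8],
        "request": f[12],
    }


def top_requesters(log_files: Iterable[Path],
                   n: int = 10) -> list[tuple[str, int, Optional[float]]]:
    """Return (client IP, request count, mean target processing time) for the
    n busiest clients; the mean is None if none of a client's requests
    reached a target."""
    # per IP: [requests, requests with a valid processing time, total time]
    stats = defaultdict(lambda: [0, 0, 0.0])
    for path in log_files:
        with gzip.open(path, "rt") as fh:
            for line in fh:
                if not line.strip():
                    continue
                entry = parse_alb_line(line)
                s = stats[entry["client_ip"]]
                s[0] += 1
                if entry["target_processing_time"] >= 0:
                    s[1] += 1
                    s[2] += entry["target_processing_time"]
    busiest = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)[:n]
    return [(ip, req, total / timed if timed else None)
            for ip, (req, timed, total) in busiest]
```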
There are many complex, paid applications that report on access logs, but I'd recommend a simple, easy-to-use website that I use when I want to see the top requesters on our load balancer.
The website is https://vegalog.net
You only need to upload a log file taken from your S3 bucket, and it returns a report with the top requesters, who they are (using a whois lookup), response times, and other useful information.

Is Redshift/S3 Data co-mingled?

I am working on moving our business needs into the cloud, namely using AWS Simple Storage Service and Amazon Redshift. Because we are working with sensitive data we have some client concerns to consider. One question we will have to answer is whether or not the data in S3/Redshift is co-mingled with other data and show evidence that it is isolated.
While researching, I found information about EC2 instances being shared on the same server unless the instance is specified as a dedicated instance. However, I have been totally unable to find anything similar regarding other AWS services. Is the data in S3 and Redshift co-mingled as well?
Everything in the cloud is co-mingled, with security boundaries in between, unless you pay more for dedicated resources (like dedicated EC2 hosts); if you need that level of isolation everywhere, you might as well stick with on-prem.
Your co-mingling concerns fall under the Shared Responsibility Model, under which AWS is responsible for making sure your data is not accessible to other services running on its hosts unless you open up access.
Read this article on how the Shared Responsibility Model works:
https://aws.amazon.com/compliance/shared-responsibility-model/
Or this whitepaper
https://d0.awsstatic.com/whitepapers/Security/AWS_Security_Best_Practices.pdf

How to measure speed from AWS regions to specific location (not mine)?

I'm looking for a way to pick the best AWS region to host a Proof of Concept installation for a potential customer in India.
For this, I'd like to try to ping the customer's web site (I verified that it's hosted in India, I assume by the customer itself since that's part of their business) from multiple AWS regions and see which one gives best results.
I found multiple tools which would allow me to run ping from my own browser to multiple AWS locations (e.g. https://cloudharmony.com/speedtest, http://www.cloudping.info/) but none which will allow me to ping between all AWS regions and a specific third party.
Does such a tool exist, or is my only option to run up an EC2 instance in each region and try to ping from it?
You might want to check the answers to this very similar question.
Keep in mind that not all regions have all AWS services available at this time, so make sure the region you pick has all the services that you plan to use. Also, Amazon has said that an India region is in the works.
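If you do end up launching a small instance in each candidate region, a script like the following can be run on each one. It's a minimal Python sketch that times TCP handshakes to the customer's web server rather than using ICMP ping (which needs raw-socket privileges and is often blocked by firewalls); the host, port and region names are placeholders.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host: str, port: int = 443, attempts: int = 5,
                           timeout: float = 3.0) -> float:
    """Median time in milliseconds to open a TCP connection to host:port.

    For a hostname, each attempt also includes name resolution, which the OS
    usually caches after the first lookup.
    """
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # handshake completed; close immediately
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def best_region(latencies_ms: dict[str, float]) -> str:
    """Pick the region whose instance measured the lowest latency."""
    return min(latencies_ms, key=latencies_ms.get)
```

Run `tcp_connect_latency_ms("www.example.com")` on each instance, collect the medians into a dict such as `{"ap-southeast-1": 62.0, "eu-west-1": 130.5}` (illustrative numbers, not measurements), and pass it to `best_region`. Using the median rather than the mean keeps one slow handshake from skewing the comparison.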

need some guidance on usage of Amazon AWS

Every once in a while I read or hear about AWS, and now I tried reading the docs.
But the docs seem to be written for people who already know which AWS services they need and are only looking for how to use them.
So, to understand AWS better myself, I've tried to sketch a hypothetical web application with a few questions.
The app's purpose is to modify content like videos or images. A user has some kind of web interface where he can upload his files and choose some settings, and a server grabs the file and modifies it (e.g. re-encoding). The service also extracts the audio track of a video and tries to index the spoken words so the customer can search within his videos. (It's just hypothetical.)
So my questions:
Given my own domain 'oneofmydomains.com', is it possible to host the complete web interface on AWS? I thought about using GWT to create the interface and just delivering the JS/images via AWS, but from which service, Simple Storage? What about some kind of index.html: is an EC2 instance needed to host a web server that has to run 24/7 and cause costs?
Now the user has the interface with a login form. Is it possible to manage logins with an AWS service? Here I'm also thinking about an EC2 instance hosting a database, but that would also cause costs, and I'm not sure whether there's a better way.
The user has logged in and uploads a file. Which storage solution could be used to save the customer's original and modified content?
Now the user wants to browse the status of his uploads, which means I need some kind of ACL so that each customer only sees his own files. Do I need to use a database (e.g. on EC2) for this, or does Amazon provide some kind of ACL, so the GWT web interface will be secure without any EC2?
The customer's files are re-encoded and the audio track is indexed, so he wants to search for a video. Which service could be used to create and maintain the index for each customer?
I hope someone can give a few answers so I better understand how one could use AWS.
Thanks!
Amazon AWS offers a whole ecosystem of services which should cover all aspects of a given architecture, from hosting to data storage, or messaging, etc. Whether they're the best fit for purpose will have to be decided on a case by case basis. Seeing as your question is quite broad I'll just cover some of the basics of what AWS has to offer and what the different types of services are for:
EC2 (Elastic Compute Cloud)
Amazon's cloud solution, which is basically the same as older virtual machine technology, but the 'cloud' adds extra bits and bobs such as automated provisioning, scaling, billing, etc.
you pay for what you use (by the hour); the basic instance (single CPU, 1.7 GB RAM) would probably cost you just under $3 a day if you run it 24/7 (on a Windows instance, that is)
there are a number of operating systems to choose from, including Linux and Windows; Linux instances are cheaper to run because there's no Windows license cost
once you've set up the server the way you want, including any updates/patches, you can create your own AMI (Amazon Machine Image), which you can then use to bring up more identical instances
however, if all your HTML is baked into the image, updates become difficult, so the normal approach is to include a service (a Windows service, for instance) that pulls the latest deployment package from a storage service (see S3 below) and updates the site at startup and at intervals
there's the Elastic Load Balancer (which has its own cost but only one is needed in most cases) which you can put in front of all your web servers
there's also the CloudWatch service (again, extra cost), which you can enable on a per-instance basis to help you monitor the CPU, network in/out, etc. of your running instances
you can set up Auto Scaling to automatically bring up or terminate instances based on some metric, e.g. terminate 1 instance at a time if average CPU utilization is below 50% for 5 mins, and bring up 1 instance at a time if average CPU goes above 70% for 5 mins
you can use the instances as web servers, use them to run a DB, or a Memcache cluster, etc. choice is yours
typically, I wouldn't recommend having Amazon instances talk to a DB outside of Amazon because the round trip is much longer; the usual approach is to use SimpleDB (see below) as the database
the AWS SDK contains enough classes to help you write a custom monitoring/scaling service if you ever need to, but the AWS console lets you do most of your configuration anyway
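The scale-out/scale-in rule in the Auto Scaling example above can be expressed as a small decision function. This is a minimal Python sketch of the logic only; in practice Auto Scaling evaluates such rules for you through CloudWatch alarms and scaling policies. It assumes one average-CPU datapoint per minute, reads "for 5 mins" as "every datapoint in the last 5 minutes", and the instance bounds are hypothetical.

```python
from typing import Sequence

SCALE_IN_BELOW = 50.0   # % average CPU
SCALE_OUT_ABOVE = 70.0  # % average CPU
WINDOW_MINUTES = 5


def scaling_decision(cpu_per_minute: Sequence[float], current: int,
                     min_instances: int = 1, max_instances: int = 10) -> int:
    """Return the desired instance count: +1 if average CPU stayed above 70%
    for the last 5 minutes, -1 if it stayed below 50%, otherwise unchanged.

    `cpu_per_minute` holds the fleet's average CPU for each recent minute,
    oldest first (e.g. values read from CloudWatch).
    """
    window = cpu_per_minute[-WINDOW_MINUTES:]
    if len(window) < WINDOW_MINUTES:
        return current  # not enough data to decide yet
    if all(c > SCALE_OUT_ABOVE for c in window):
        return min(current + 1, max_instances)
    if all(c < SCALE_IN_BELOW for c in window):
        return max(current - 1, min_instances)
    return current
```

The gap between the two thresholds (50% and 70%) is what keeps the fleet from flapping: a load sitting at, say, 60% triggers neither rule.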
SimpleDB
Amazon's non-relational, key-value data store. Compared to a traditional database you tend to pay a penalty in per-query performance, but you get high scalability without having to do any extra work.
you pay for usage, i.e. how much work it takes to execute your queries
extremely scalable by default: Amazon scales up SimpleDB instances based on traffic without you having to do anything (and without giving you any control over it, for that matter)
data are partitioned into 'domains' (equivalent to a table in a normal SQL DB)
data are non-relational; if you need a relational model, check out Amazon RDS. I don't have any experience with it, so I'm not the best person to comment on it.
you can still execute SQL-like queries against the database, usually through some plugin or tool; Amazon doesn't provide a front end for this at the moment
be aware of 'eventual consistency': data are duplicated on multiple instances after Amazon scales up your database, and synchronization is not guaranteed when you do an update, so it's possible (though highly unlikely) to update some data, read it back straight away, and get the old data
there are 'Consistent Read' and 'Conditional Update' mechanisms available to guard against the eventual consistency problem; if you're developing in .NET, I suggest using the SimpleSavant client to talk to SimpleDB
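To make the eventual-consistency point concrete, here is a toy in-memory model in Python. It is a teaching sketch only, not SimpleDB's implementation or API (in SimpleDB, consistent reads and conditional updates are request options rather than methods like these); the class and method names are hypothetical. Writes land on one replica and reach the others only when `replicate()` runs, so an ordinary read can return stale data, while a consistent read and a conditional put both check the up-to-date copy.

```python
import random


class ToyEventuallyConsistentStore:
    """Teaching model of an eventually consistent key-value store.

    Writes land on replica 0 and reach the other replicas only when
    replicate() runs, so an ordinary read may hit a stale replica.
    """

    def __init__(self, replicas: int = 3, seed: int = 0):
        self._replicas = [{} for _ in range(replicas)]
        self._pending = []  # writes not yet copied to replicas 1..n
        self._rng = random.Random(seed)

    def put(self, key, value):
        self._replicas[0][key] = value
        self._pending.append((key, value))

    def get(self, key, consistent_read=False):
        if consistent_read:
            # Read from the replica that has seen every write.
            return self._replicas[0].get(key)
        # Ordinary read: any replica, possibly one that is behind.
        return self._rng.choice(self._replicas).get(key)

    def conditional_put(self, key, value, expected) -> bool:
        """Write only if the current value equals `expected`; return success."""
        if self.get(key, consistent_read=True) != expected:
            return False
        self.put(key, value)
        return True

    def replicate(self):
        """Copy pending writes, in order, to the remaining replicas."""
        for key, value in self._pending:
            for replica in self._replicas[1:]:
                replica[key] = value
        self._pending.clear()
```

The conditional put is what prevents lost updates: for example, a worker marks a video "published" only if its status is still "encoded", so two workers racing on the same item cannot both succeed.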
S3 (Simple Storage Service)
Amazon's storage service, again extremely scalable, and safe too: when you save a file on S3 it's replicated across multiple nodes, so you get some DR (disaster recovery) ability straight away.
you pay for the storage you use, the requests you make, and data transfer
files are stored against a key
you create 'buckets' to hold your files, and each bucket has a unique URL (bucket names are unique across all of Amazon, and therefore across all S3 accounts)
CloudBerry S3 Explorer is the best UI client I've used in Windows
using the AWS SDK you can write your own repository layer on top of S3
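Because files are stored against a key, a repository layer for the video app in the question mostly comes down to choosing a key layout. Here is a minimal Python sketch of one possible convention, with one bucket and one key prefix per customer; the layout and function names are hypothetical. The prefix only organizes the data: access control still has to be enforced by your application or by AWS access policies.

```python
from typing import Iterable
from urllib.parse import quote


def object_key(user_id: str, kind: str, filename: str) -> str:
    """Build a key such as 'users/42/original/holiday.mp4' (hypothetical layout)."""
    if kind not in ("original", "encoded"):
        raise ValueError(f"unknown kind: {kind!r}")
    # Percent-encode the user id so it can't inject extra '/' segments.
    return f"users/{quote(user_id, safe='')}/{kind}/{filename}"


def keys_visible_to(user_id: str, all_keys: Iterable[str]) -> list[str]:
    """Return only the keys under this user's prefix, sorted.

    Against the real service you would pass the same prefix to a ListObjects
    request rather than filtering a full listing client-side.
    """
    prefix = f"users/{quote(user_id, safe='')}/"
    return sorted(k for k in all_keys if k.startswith(prefix))
```

The trailing slash in the prefix matters: without it, user "42" would also see the files of user "420".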
Sorry if this is a bit long-winded, but those are the three most popular web services Amazon provides, and they should cover all the requirements you've mentioned. We've been using Amazon AWS for some time now, and there are still some kinks and bugs, but it's generally moving forward and pretty stable.
One downside of using something like AWS is vendor lock-in: while you could run your services outside of Amazon in your own datacenter, or move files out of S3 (at a cost), getting out of SimpleDB will likely represent the bulk of the migration work.