One of the great advantages of AWS Lambda functions is that we don't have to worry about deploying or managing servers. However in some regulated industries (healthcare, finance, etc) we have to worry about our data and code executing in a shared compute instance.
With that in mind, what kind of virtualization is used to stand up and execute Lambda functions?
Do they use Hyper-V (like EC2 instances), which enforces better tenant separation (hypervisor layer firewall is used to separate the physical and logical network interfaces and similar mechanisms are used to separate the physical RAM. Any logical disks presented to the instances are securely scrubbed by the hypervisor once unallocated and customers have the option to encrypt at-rest any data persisted in memory) OR do they use Docker containers to separate Lambdas?
Lambda uses Docker containers behind the scenes.
If you are wondering if you can use Lambda for healthcare or finance systems you are asking the wrong questions. You need to be familiar with the specific compliance standards you are required to meet, like HIPAA for healthcare and PCI for finance. Then you need to read the compliance related documentation published by Amazon. This is a good place to start For example at this time you are not able to use AWS Lambda to process health data covered by HIPAA.
Related
I am very new with AWS and wanted to clear my concept on AWS services. I have read that that AWS has plenty of services that can also be accessed through API. A service is basically a software program. Then why are services not available in all regions. If my customers are from India, I can buy the EC2 instance from Asia but why should I choose service from USA East. Again, why does AWS provide regions for End Points. They could have installed all the services in all their regions - assuming that they are only software programs and not hardware resources.
Latency is not a big problem for you, I think, you can choose the best price options for your sources. If latency big a problem, you must choose the region/zone near your target market. Better understanding read this doc.
AWS Services operate on multiple levels and are all exposed through APIs.
Some services operate at a global scope (e.g. Identity and Access Management or Route53), most on a regional level (e.g. S3) and others somewhere between the region and availability zone (EC2, RDS, VPC...).
AWS uses the concept of a region for multiple purposes, one of the major drivers being fault isolation. Something breaking in Ireland (eu-west-1) shouldn't stop a service in Frankfurt (eu-central-1) from operating. Latency is another driver here. Since physics is involved, higher distances also increase the latency, which makes things like replication more tricky. Data residency and other compliance aspects are also a good reason to compartmentalize services.
Services being regional results in their endpoints being regional as well.
As to not every service being available in every region: Hardware availability is part of the reason, it doesn't make sense to have the more obscure hardware for niche use cases (think GroundStation, their satellite control service) in all regions. Aside from that, there are most likely some financial aspects involved as well, as global scale and complexity come at a cost and if demand isn't sufficient, it may not make sense to roll out a service everywhere.
I can choose to pay more to have dedicated AWS EC2 instances so that my VMs are physically isolated from other people's instances.
However, using EC2 also means I bear the responsibility of maintenance, either through automation or not.
So I would like to use things like Fargate and Lambda, which removes the maintenance burden from me.
Is possible to still have the same level of hardware isolation?
Can I require Amazon to run my Lambda functions and Fargate containers in a physically isolated fashion?
It is not possible as far as I know.
Pulling from the documentation of AWS
For FarGate
Ensure that the VPC that you choose is not configured to require dedicated hardware tenancy, as that is not supported by Fargate tasks.
And at the moment, Lambda also share resource. One Lambda invocation takes up some part of the big chip's CPU time and I do not think they will roll dedicated Lambda out soon as It's one of the reason they can offer computational power that cheap ( keeping their hardware busy serving multiple people )
Also from the docs
Lambda doesn't currently support running in dedicated tenancy
Context :
We are prototyping a multi cloud deployment of our application (based on micro services).
For balancing between high availability and co location we used "Availability Sets" feature in Azure. Which kind off ensures that Azure platform/service upgrades doesn't happen in two distinct sets simultaneously.
Availability sets Azure
Scenario :
I couldn't find anything similar in Google Cloud Platform and AWS. So in this case we have to go with separate "Zones" for high availability.
One argument in favor of Availability sets ( theoretically) are they are kind of more closer that Zones as the former is inside an data center.
Do we have anything close to "availability sets" in GCP and AWS. Please share your thoughts.
Regarding GCP, there are several solutions for high-availability. In general it is recommended to Design Robust Systems prone to failures and Building scalable and resilient applications.
By designing robust systems you are insuring that your VMs are available in case of single instance failure, reboot of the instance or if there is an issue with the zone.
What looks most similar to Availability Sets is Managed Instance Groups.
The managed instance group auto-updater allows you to deploy new versions of software to instances in your MIG, supporting different rollout scenarios (rolling updates, canary updates). You can control the speed and scope of deployment as well as the level of disruption to your service.
Also you can use Regional Persistent Disk that replicates data across zones (datacenters).
It sounds like Placement Groups may be an equivalent feature in AWS. There are a few different configurations where you can ask AWS to cluster your instances very close to maximize network I/O performance or spread your instances across hardware to reduce correlated failures.
Cluster – packs instances close together inside an Availability Zone. This strategy enables workloads to achieve the low-latency network performance necessary for tightly-coupled node-to-node communication that is typical of HPC applications.
Partition – spreads your instances across logical partitions such that groups of instances in one partition do not share the underlying hardware with groups of instances in different partitions. This strategy is typically used by large distributed and replicated workloads, such as Hadoop, Cassandra, and Kafka.
Spread – strictly places a small group of instances across distinct underlying hardware to reduce correlated failures.
I can't speak for Google Cloud as I am not aware of a similar feature but I am also not nearly as familiar with their offerings.
Hope that helps.
How to share S3 storage between multiple EC2 instances? I am beginner to AWS, I need to know how to share a drive between multiple EC2 instances.
Currently you can't, and S3 is your best bet, but AWS does have their Elastic File System in BETA currently, and there is the possibility it will be available for general availability anytime (I have no inside knowledge, just a guess - maybe even this week, they often have lots of announcements during their annual conference going on now).
You can signup for 'preview' access and see if it suits your needs, and then decide if you can wait for it to become fully available.
AWS EFS will allow you to share a drive between instances:
Amazon EFS supports the Network File System version 4 (NFSv4)
protocol, so the applications and tools that you use today work
seamlessly with Amazon EFS. Multiple Amazon EC2 instances can access
an Amazon EFS file system at the same time, providing a common data
source for workloads and applications running on more than one
instance.
https://aws.amazon.com/efs/
EFS (still in beta, half a year later) indeed looks like the best option. But as EFS is basically just a managed, highly available NFS server, it should be possible to roll out some other NFS solution first, and replace it with EFS once it's finally available.
One promising candidate seems dCache, which is
a system for storing and retrieving huge amounts of data, distributed
among a large number of heterogenous server nodes, under a single
virtual filesystem tree with a variety of standard access methods.
It is used by research institutions all over the world to store over 100PB of data, and it provides an NFSv4 interface. Not sure how easy setup on AWS would be, or what the performance would be like.
https://www.dcache.org/
every once in a while i read/hear about AWS and now i tried reading the docs.
But such docs seem to be written for people who already know which AWS they need to use and only search for how it can be used.
So, for myself, to understand AWS better i try to sketch a hypothetical Webapplication with a few questions.
The apps purpose is to modify content like videos or images. So a user has some kind of webinterface where he can upload his files, do some settings and a server grabs the file and modifies it (e.g. reencoding). The Service also extracts the audio track of a video and trys to index the spoken words so the customer can search within his videos. (well its just hypothetical)
So my questions:
given my own domain 'oneofmydomains.com' is it possible to host the complete webinterface on AWS? i thought about using GWT to create the interface and just deliver the JS/images via AWS, but which one, simple storage? what about some kind of index.html, is there an EC2 instance needed to host a webserver which has to run 24/7 causing costs?
now the user has the interface with a login form, is it possible to manage logins with an AWS? here i also think about an EC2 instance hosting a database, but it would also cause costs and im not sure if there is a better way?
the user has logged in and uploads a file. which storage solution could be used to save the customers original and modified content?
now the user wants to browse the status of his uploads, this means i need some kind of ACL, so that the customer only sees his own files. do i need to use a database (e.g. EC2) for this, or does amazon provide some kind of ACL, so the GWT webinterface will be secure without any EC2?
the customers files are reencoded and the audio track is indexed. so he wants to search for a video. Which service could be used to create and maintain the index for each customer?
hope someone can give a few answers so i understand AWS better on how one could use it
thx!
Amazon AWS offers a whole ecosystem of services which should cover all aspects of a given architecture, from hosting to data storage, or messaging, etc. Whether they're the best fit for purpose will have to be decided on a case by case basis. Seeing as your question is quite broad I'll just cover some of the basics of what AWS has to offer and what the different types of services are for:
EC2 (Elastic Cloud Computing)
Amazon's cloud solution, which is basically the same as older virtual machine technology but the 'cloud' offers additional knots and bots such as automated provisioning, scaling, billing etc.
you pay for what your use (by hour), for the basic (single CPU, 1.7GB ram) would prob cost you just under $3 a day if you run it 24/7 (on a windows instance that is)
there's a number of different OS to choose from including linux and windows, linux instances are cheaper to run without the license cost associated with windows
once you're set up the server to be the way you want, including any server updates/patches, you can create your own AMI (Amazon machine image) which you can then use to bring up another identical instance
however, if all your html are baked into the image it'll make updates difficult, so normal approach is to include a service (windows service for instance) which will pull the latest deployment package from a storage (see S3 later) service and update the site at start up and at intervals
there's the Elastic Load Balancer (which has its own cost but only one is needed in most cases) which you can put in front of all your web servers
there's also the Cloud Watch (again, extra cost) service which you can enable on a per instance basis to help you monitor the CPU, network in/out, etc. of your running instance
you can set up AutoScalers which can automatically bring up or terminate instances based on some metric, e.g. terminate 1 instance at a time if average CPU utilization is less than 50% for 5 mins, bring up 1 instance at a time if average CPU goes beyond 70% for 5 mins
you can use the instances as web servers, use them to run a DB, or a Memcache cluster, etc. choice is yours
typically, I wouldn't recommend having Amazon instances talk to a DB outside of Amazon because of the round trip is much longer, the usual approach is to use SimpleDB (see below) as the database
the AmazonSDK contains enough classes to help you write some custom monitor/scaling service if you ever need to, but the AWS console allows you to do most of your configuration anyway
SimpleDB
Amazon's non-relational, key-value data store, compared to a traditional database you tend to pay a penalty on per query performance but get high scalability without having to do any extra work.
you pay for usage, i.e. how much work it takes to execute your query
extremely scalable by default, Amazon scales up SimpleDB instances based on traffic without you having to do anything, AND any control for that matter
data are partitioned in to 'domains' (equivalent to a table in normal SQL DB)
data are non-relational, if you need a relational model then check out Amazon RDB, I don't have any experience with it so not the best person to comment on it..
you can execute SQL like query against the database still, usually through some plugin or tool, Amazon doesn't provide a front end for this at the moment
be aware of 'eventual consistency', data are duplicated on multiple instances after Amazon scales up your database, and synchronization is not guaranteed when you do an update so it's possible (though highly unlikely) to update some data then read it back straight away and get the old data back
there's 'Consistent Read' and 'Conditional Update' mechanisms available to guard against the eventual consistency problem, if you're developing in .Net, I suggest using SimpleSavant client to talk to SimpleDB
S3 (Simple Storage Service)
Amazon's storage service, again, extremely scalable, and safe too - when you save a file on S3 it's replicated across multiple nodes so you get some DR ability straight away.
you only pay for data transfer
files are stored against a key
you create 'buckets' to hold your files, and each bucket has a unique url (unique across all of Amazon, and therefore S3 accounts)
CloudBerry S3 Explorer is the best UI client I've used in Windows
using the AmazonSDK you can write your own repository layer which utilizes S3
Sorry if this is a bit long winded, but that's the 3 most popular web services that Amazon provides and should cover all the requirements you've mentioned. We've been using Amazon AWS for some time now and there's still some kinks and bugs there but it's generally moving forward and pretty stable.
One downside to using something like aws is being vendor locked-in, whilst you could run your services outside of amazon and in your own datacenter or moving files out of S3 (at a cost though), getting out of SimpleDB will likely to represent the bulk of the work during migration.