Securing Folder on EC2 Amazon Marketplace AMI - django

I'm planning to start a small business and submit a Linux AMI to Amazon's AWS Marketplace. As I'm reading the seller's guide, I see this:
"AMIs MUST allow OS-level administration capabilities to allow for compliance requirements, vulnerability updates and log file access. For Linux-based AMIs this is through SSH." (6.2.2)
How can I protect my source code if anyone who uses my product can SSH to the machine and poke around? Can I lock down certain folders yet still allow "os-level administration"?
Here is a bit of context if needed:
I'm using Ubuntu Server 16.04 LTS (HVM), SSD Volume Type (ami-cd0f5cb6) as my base AMI
I'm provisioning a slightly modified MySQL database that I want my customers to be able to access. This is their primary way of interacting with my service.
I'm building a Django web service that will come packaged on the AMI. This is what I'd like to lock down and prevent access to.

Whether or not you provide SSH access, it will always be possible for your users to mount your AMI's root EBS volume on another EC2 instance and investigate its contents, so disabling SSH or making certain files unreadable to an SSH user doesn't help you in this regard.
Instead of trying to keep users away from your source code, I suggest you simply state clearly in the terms of service what users are and are not allowed to do with it.
Even large companies provide OS images which contain the source code of their applications (whenever they use a scripting language) in the clear or only slightly obfuscated.
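To make the point concrete, here is a rough boto3 sketch of how a customer could pull the root volume off an instance they launched from your AMI and read it from another instance they control. The instance IDs are placeholders, and AWS may impose extra restrictions on volumes carrying Marketplace product codes, so treat this as an illustration of the risk rather than a recipe.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    appliance_id = "i-0123456789abcdef0"   # instance launched from your AMI (placeholder)
    inspector_id = "i-0fedcba9876543210"   # instance the customer fully controls (placeholder)

    # Stop the appliance and find its root EBS volume.
    ec2.stop_instances(InstanceIds=[appliance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[appliance_id])
    instance = ec2.describe_instances(InstanceIds=[appliance_id])["Reservations"][0]["Instances"][0]
    root_volume_id = instance["BlockDeviceMappings"][0]["Ebs"]["VolumeId"]

    # Detach it and attach it as a secondary disk on the inspector instance.
    ec2.detach_volume(VolumeId=root_volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[root_volume_id])
    ec2.attach_volume(VolumeId=root_volume_id, InstanceId=inspector_id, Device="/dev/sdf")

    # After something like `sudo mount /dev/xvdf1 /mnt` on the inspector instance,
    # every file from the AMI's root filesystem is readable, whatever its permissions were.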

Related

AWS share custom AMI build instructions

I want to use a trusted cloud provider (I chose Amazon, maybe there is an alternative) to share an application without leaking its code.
The application is supposed to use the customer's data, and I want to prove to them that I am not using their data for any other purpose.
So, is there any way to ask Amazon to publish the instructions I followed to create an Amazon Machine Image containing the application, so that the user can happily instantiate the machine and send it their data without any fear of misuse?
Please help me, thanks a lot
You can create an AMI and grant permission for your user to launch an Amazon EC2 instance in their own AWS Account that uses that AMI.
Any data they place on the instance will remain within their AWS Account and you will not have access to it.
However, they might argue that there could be some software installed in the AMI that secretly sends their data to another computer on the Internet, so they might prefer to install the software themselves rather than use a pre-built AMI. However, unless they look through your entire source code, they wouldn't know whether your software itself is stealing data, either!
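As a rough illustration of the launch-permission approach (the AMI ID and the customer's account ID below are placeholders), sharing a private AMI with a single account is a single API call:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Grant one specific AWS account permission to launch the private AMI.
    ec2.modify_image_attribute(
        ImageId="ami-0123456789abcdef0",                          # your private AMI (placeholder)
        LaunchPermission={"Add": [{"UserId": "111122223333"}]},   # customer's account ID (placeholder)
    )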

Accessing a database inside an EC2 instance

Is it possible to create a database server (MySQL or PostgreSQL) inside an EC2 instance (running Windows 2016) and access it the way we access an RDS or do I need to have a separate RDS for that purpose?
My plan was to have an EC2 instance and use it as a server providing some Windows applications to my (small) company, as well as an always-available database to store our reports.
Please let me know if I am on the wrong path.
Yes, you can install MySQL or PostgreSQL on an EC2 instance, just like you would for a server that was within your company.
You of course won't have all of the extra redundancy/backup features that RDS provides for you - unless you start adding all of that yourself, i.e. automated backups, master/slave configurations, read replicas, etc. (and if you do start adding all of those extra features, I would reconsider your decision not to use RDS).
I do this for some smaller, less mission-critical solutions I support and generally have not had many issues; I still prefer RDS when possible, but it's not always an option for me.
You can install and configure the database on Windows and access it from your app. The endpoint will be the Windows machine's IP address and the port the service runs on. You also have to allow access from the application in the instance's security group.
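As a minimal sketch of that last step (the security group ID and CIDR range are placeholders), opening the MySQL port to your application's subnet with boto3 might look like this:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Allow the application subnet to reach MySQL on the DB instance.
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",               # DB instance's security group (placeholder)
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 3306,                         # 5432 for PostgreSQL
            "ToPort": 3306,
            "IpRanges": [{"CidrIp": "10.0.1.0/24"}],  # app servers' subnet (placeholder)
        }],
    )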

Prevent an elastic beanstalk app being downloaded using the AWS account?

Short background, we're a small business but our clients are much larger businesses. We have some software they subscribe to which is deployed to AWS elastic beanstalk. Clients have their own devops teams, unlike us, and will need to manage some of the technical support. They will need access to the AWS account running the software, so they can do things like reboot the server, clear the database if they screw it up, change the EC2 instance type etc. This is OK but we want to prevent the software being downloaded outside of the AWS account.
The software is a java WAR running on Tomcat, on a single elastic beanstalk instance. We only care about limiting access to the WAR file (not the database for example).
The Elastic Beanstalk application versions page appears to have no way to download the WAR file - which is good. They could SSH into the underlying EC2 instance though, so presumably they could just copy the WAR out of the Tomcat directory. Given the complexity of AWS there are probably other ways they could get access to the WAR file too (e.g. clone the EBS volume and attach it to another EC2 instance).
I assume that the machine instances available for purchase via the AWS Marketplace must have some form of copy protection, but I've not been able to find any details on this. Also, it looks like AWS only accepts Marketplace vendors who are much larger than us, so the Marketplace option may not be open to us.
Any idea how I could prevent access to the WAR file running on elastic beanstalk while still allowing the client access to the AWS account? (Or at least make access hard).
The only solution that comes to mind for this would be removing any EC2 SSH key pairs from the account and specifically denying them access to ec2:CreateKeyPair. Really, what you need to be doing is granting them least-privilege access to the account, that is, specifically granting them access only to those actions they absolutely need.
This will go a long way, but with sufficient knowledge of AWS, it's going to be an uphill battle trying to ensure that you give them enough access to do what they need while not giving them more than you want. I'd question whether a legal option (contracts, licenses, etc.) would be better protection for this.
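A rough sketch of that explicit deny, attached to the client's IAM user with boto3 (the user and policy names are placeholders, and a real least-privilege setup needs far more than this one statement):

    import json
    import boto3

    iam = boto3.client("iam")

    # Explicitly deny creating or importing SSH key pairs.
    deny_key_pairs = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": ["ec2:CreateKeyPair", "ec2:ImportKeyPair"],
            "Resource": "*",
        }],
    }

    iam.put_user_policy(
        UserName="client-devops",              # placeholder IAM user
        PolicyName="deny-ssh-key-creation",
        PolicyDocument=json.dumps(deny_key_pairs),
    )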

How to setup shared persistent storage for multiple AWS EC2 instances?

I have a service hosted on Amazon Web Services. There I have multiple EC2 instances running with the exact same setup and data, managed by an Elastic Load Balancer and scaling groups.
Those instances are web servers running PHP-based web applications, so currently the very same files etc. are placed on every instance. But when the ELB / scaling group launches a new instance based on load rules etc., the files might not be up to date.
Additionally, I'd rather like to use a shared file system for PHP sessions etc. than sticky sessions.
So, for those reasons and maybe more coming up in the future, I would like to have a shared file system that I can attach to my EC2 instances.
What way would you suggest to resolve this? Are there any solutions offered by AWS directly, so I can rely on their services rather than doing it on my own with DRBD and so on? What is the easiest approach? DRBD, NFS, ...? Is S3 also feasible for these purposes?
Thanks in advance.
As mentioned in a comment, AWS has announced EFS (http://aws.amazon.com/efs/) a shared network file system. It is currently in very limited preview, but based on previous AWS services I would hope to see it generally available in the next few months.
In the meantime there are a couple of third party shared file system solutions for AWS such as SoftNAS https://aws.amazon.com/marketplace/pp/B00PJ9FGVU/ref=srh_res_product_title?ie=UTF8&sr=0-3&qid=1432203627313
S3 is possible but not always ideal; the main blocker is that it does not natively support any filesystem protocols - instead, all interactions need to go via the AWS API or HTTP calls. Additionally, when looking at using it for session stores, the 'eventually consistent' model will likely cause issues.
That being said - if all you need is updated resources, you could create a simple script, run either as a cron job or on startup, that downloads the files from S3.
Finally, in the case of static resources like CSS/images, don't store them on your webserver in the first place - there are plenty of articles covering the benefits of storing and serving static web resources directly from S3 while keeping the dynamic stuff on your server.
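The simple startup/cron script mentioned above could look roughly like this with boto3 (the bucket name, prefix and target directory are all placeholders):

    import os
    import boto3

    s3 = boto3.resource("s3")
    bucket = s3.Bucket("my-webapp-assets")       # placeholder bucket
    prefix = "webroot/"
    target_dir = "/var/www/html"

    # Pull the current web files down from S3 on boot (or from cron).
    for obj in bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith("/"):                # skip "directory" placeholder keys
            continue
        local_path = os.path.join(target_dir, os.path.relpath(obj.key, prefix))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        bucket.download_file(obj.key, local_path)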
From what we can tell at this point, EFS is expected to provide basic NFS file sharing on SSD-backed storage. Once available, it will be a v1.0 proprietary file system. There is no encryption and it's AWS-only. The data is completely under AWS control.
SoftNAS is a mature, proven, advanced ZFS-based NAS filer that is full-featured, including encrypted EBS and S3 storage, storage snapshots for data protection, writable clones for DevOps and QA testing, RAM and SSD caching for maximum IOPS and throughput, deduplication and compression, cross-zone HA, and a 100% up-time SLA. It supports NFS with LDAP and Active Directory authentication, CIFS/SMB with AD users/groups, iSCSI multi-pathing, FTP and (soon) AFP. SoftNAS instances and all storage are completely under your control, and you have complete control of the EBS and S3 encryption and keys (you can use EBS encryption or any Linux-compatible encryption and key-management approach you prefer or require).
The ZFS filesystem is a proven filesystem that is trusted by thousands of enterprises globally. Customers are running more than 600 million files in production on SoftNAS today - ZFS is capable of scaling into the billions.
SoftNAS is cross-platform and runs on cloud platforms other than AWS, including Azure, CenturyLink Cloud, Faction cloud, VMware vSphere/ESXi, VMware vCloud Air and Hyper-V, so your data is not limited to or locked into AWS. More platforms are planned. It provides cross-platform replication, making it easy to migrate data between any supported public cloud, private cloud, or premises-based data center.
SoftNAS is backed by industry-leading technical support from cloud storage specialists (it's all we do), something you may need or want.
Those are some of the more noteworthy differences between EFS and SoftNAS. For a more detailed comparison chart:
https://www.softnas.com/wp/nas-storage/softnas-cloud-aws-nfs-cifs/how-does-it-compare/
If you are willing to roll your own HA NFS cluster, and be responsible for its care, feeding and support, then you can use Linux and DRBD/corosync or any number of other Linux clustering approaches. You will have to support it yourself and be responsible for whatever happens.
There's also GlusterFS. It does well up to 250,000 files (in our testing) and has been observed to suffer from an IOPS brownout when approaching 1 million files, and IOPS blackouts above 1 million files (according to customers who have used it). For smaller deployments it reportedly works reasonably well.
Hope that helps.
CTO - SoftNAS
For keeping your webserver sessions in sync you can easily switch to Redis or Memcached as your session handler. This is a simple setting in php.ini, and all your web servers can then use the same Redis or Memcached server for sessions. You can use Amazon's ElastiCache, which will manage the Redis or Memcached instance for you.
http://phpave.com/redis-as-a-php-session-handler/ <- explains how to set up Redis as a PHP session handler pretty easily
Keeping your files in sync is a little bit more complicated.
How do I push new code changes to all my webservers?
You could use Git. When you deploy, you can set up multiple servers as remotes and push your branch (master) to all of them, so every new build goes out to all webservers.
What about new machines that launch?
I would set up new machines to run an rsync script from a trusted source, your master web server. That way they sync their web folders with the master when they boot and will be identical even if the AMI had old web files in it.
What about files that change and need to be live updated?
Store any user-uploaded files in S3. If a user uploads a document on server 1, the file is stored in S3 and its location is stored in a database. Then if a different user is on server 2, they can see and access the same file as if it were on server 2; the file is retrieved from S3 and served to the client.
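A small sketch of that upload pattern (shown in Python/boto3 for brevity; the bucket name and key layout are placeholders):

    import os
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-app-user-uploads"               # placeholder bucket

    def save_upload(local_path, user_id):
        # Push the file to S3 and return the key to store in the database.
        key = "uploads/{}/{}".format(user_id, os.path.basename(local_path))
        s3.upload_file(local_path, BUCKET, key)
        return key

    def serve_upload(key):
        # Any web server in the fleet can hand out a short-lived URL for the file.
        return s3.generate_presigned_url(
            "get_object", Params={"Bucket": BUCKET, "Key": key}, ExpiresIn=3600)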
GlusterFS is also an open-source distributed file system used by many to create shared storage across EC2 instances.
Until Amazon EFS hits production, the best approach in my opinion is to build a storage backend exporting NFS from EC2 instances, maybe using Pacemaker/Corosync to achieve HA.
You could create an EBS volume that stores the files and instruct Pacemaker to unmount/detach the volume from the failed node and then attach/mount it on the healthy NFS cluster node.
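For the failover itself, the resource agent would essentially move the EBS volume between nodes, roughly like this (the volume and instance IDs are placeholders; the unmount/mount and NFS export steps still run on the nodes themselves):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    VOLUME_ID = "vol-0123456789abcdef0"          # placeholder shared-data volume

    def move_volume(failed_instance_id, healthy_instance_id):
        # Detach the data volume from the failed NFS node...
        ec2.detach_volume(VolumeId=VOLUME_ID, InstanceId=failed_instance_id)
        ec2.get_waiter("volume_available").wait(VolumeIds=[VOLUME_ID])
        # ...and attach it to the healthy node, which then mounts and re-exports it.
        ec2.attach_volume(VolumeId=VOLUME_ID,
                          InstanceId=healthy_instance_id,
                          Device="/dev/sdf")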
Hi, we currently use a product called SoftNAS in our AWS environment. It allows us to choose between both EBS- and S3-backed storage. It has built-in replication as well as a high-availability option. May be something you can check out. I believe they offer a free trial you can try out on AWS.
We are using ObjectiveFS and it is working well for us. It uses S3 for storage and is straight forward to set up.
They've also written a doc on how to share files between EC2 instances.
http://objectivefs.com/howto/how-to-share-files-between-ec2-instances

What to bake into an AWS AMI and what to provision using cloud-init?

I'm using AWS CloudFormation to set up numerous elements of network infrastructure (VPCs, security groups, subnets, Auto Scaling groups, etc.) for my web application. I want the whole process to be automated: I want to click a button and be able to fire up the whole thing.
I have successfully created a Cloudformation template that sets up all this network infrastructure. However the EC2 instances are currently launched without any needed software on them. Now I'm trying to figure out how best to get that software on them.
To do this, I'm creating AMIs using Packer.io. But some people have instead urged me to use Cloud-Init. What heuristic should I use to decide what to bake into the AMIs and/or what to configure via Cloud-Init?
For example, I want to preconfigure an EC2 instance to allow me (saqib) to log in without a password from my own laptop. Thus the EC2 instance must have a user. That user must have a home directory. And in that home directory must live a .ssh/authorized_keys file containing my public key. Should I bake these directories into the AMI? Or should I use cloud-init to set them up? And how should I decide in this and other similar cases?
I like to separate out machine provisioning from environment provisioning.
In general, I use the following as a guide:
Build Phase
Build a Base Machine Image with something like Packer, including all software required to run your application. Create an AMI out of this.
Install the application(s) onto the Base Machine Image creating an Application Image. Tag and version this artifact. Do not embed environment specific stuff here like database connections etc. as this precludes you from easily reusing this AMI across different environment runtimes.
Ensure all services are stopped
Release Phase
Spin up an environment consisting of the images and infra required, using something like CFN.
Use Cloud-Init user-data to configure the application environment (database connections, log forwarders, etc.) and then start the applications/services (a sketch follows below).
This approach gives the greatest flexibility and cleanly separates out the various concerns of a continuous delivery pipeline.
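As a hedged sketch of the release phase (all IDs, service names and the config path below are placeholders), launching the pre-baked Application Image and wiring it to a specific environment via user-data might look like this:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Only environment-specific wiring goes in user-data; the application
    # itself is already baked into the Application Image.
    user_data = """#!/bin/bash
    mkdir -p /etc/myapp
    cat > /etc/myapp/env.conf <<'EOF'
    DATABASE_URL=mysql://appuser:change-me@db.internal:3306/appdb
    LOG_FORWARDER=logs.internal:514
    EOF
    systemctl start myapp
    """

    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",         # versioned Application Image (placeholder)
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
        UserData=user_data,
    )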
One of the important factors that determines how you should assemble servers, AMIs, and infrastructure planning is to answer the question: In production, how fast will I need a new instance launched?
The answer to this question will determine how much you bake into the AMI vs. how much you build after boot.
NOTE: My experience is with Chef Server so I will use Chef terminology but the concepts are the same for any other configuration management stack.
The general rule of thumb is to treat your "Infrastructure as Code". This means thinking about the process of launching instances, creating users on those machines, and managing SSH keys and authorized_keys files the same way you would your application code. Being able to track infrastructure changes in source control makes management, redeployments, and even CI much easier.
This Chef Introduction covers the terminology in Chef of Cookbooks, Recipes, Resources, and more. It shows you how to build a simple LAMP stack, and how you can relaunch it just as easily with one command.
So given the example in your question, at a high level I would do the following:
Launch a base Ubuntu Linux AMI (currently 14.04) with a CloudFormation script.
In the UserData section of the instance configuration, bootstrap the Chef client install process.
Run a Recipe to create a user.
Run a Recipe to set up the user's .ssh/authorized_keys file.
Tools like Chef are used because you are able to break down the infrastructure into small blocks of code performing specific functions. There are numerous Cookbooks already built and available that perform the basic building blocks of creating services, installing software packages, etc.
All that being said, there are times when you have to deviate from best practices in the interest of your specific domain and requirements. There may be situations where, given all the advantages of infrastructure management, you will still need to bake items into the AMI.
Let's pretend your application does image processing and has a requirement to use ImageMagick. Let's assume that you will need to build ImageMagick from source. If you were to do this via Chef Recipes this could add another 7 minutes of just compiling ImageMagick to the normal instance boot time. If waiting 10-12 minutes is too long for a new instance to come online then you may want to consider baking your own AMI that has ImageMagick already compiled and installed.
This is an acceptable solution, but you should keep in mind that managing your own fleet of pre-baked AMIs adds additional infrastructure overhead. You will need to keep your custom AMIs updated as new base AMIs are released and as you expand to different instance types and different AWS Regions.