saltstack best practices on AWS - amazon-web-services

I have been using saltstack for a few years with bare metal server. Now we need to setup a whole new environment on AWS. I'd prefer to use saltstack set everything up because I like the orchestration of salt and the event based stuff, like beacons & reactors. Plus it's easy to write your own customised python module. We will also be running kubernetes clusters on EC2 instances. Can someone provide some best practices for using salt with AWS and k8s?

There’s a few reusable setups floating around, last I remember https://github.com/valentin2105/Kubernetes-Saltstack was the most complete of them. But all of them are less solid than tools closer to the community mainstream (kops, kubespray) so beware of weird problems. I would recommend going through Kubernetes The Hard Way just so you have some familiarity with the underlying components that make up Kubernetes so you’ll have a better chance of debugging them :)

Related

Emacs Tramp access to AWS SageMaker instance

I do machine learning code development on an AWS SageMaker instance, and would like to make use of Emacs/Tramp†. My question: how, if at all, could that be done? I strongly suspect that some knowledge about security protocols / IAM roles, etc., would be important, but I am a mere SageMaker end-user.
A co-worker may be able to assist on some of these questions, but he is way over-subscribed. It is a big ask of him to indulge my personal preferences (however strong), so I start by posing the question, has anybody else has already solved this problem, or are there good starting points for consideration?
Already seen:
Martin Baillie's Emacs TRAMP over AWS SSM APIs, which seems light on key details, and may not be applicable for SageMaker environments
AWS's own Tutorial: Set Up PyCharm Professional with a Development Endpoint, which seems to be pretty specific to PyCharm, and which again may not be suitable for SageMaker environments.
†Why? Because I have a long lifetime of using emacs key bindings and macros, etc., that improve my efficiency greatly. Why not emacs within a terminal running on a SageMaker instance? That's what I'm doing now, but it leaves out important flexibility compared with a local windowed emacs client; the latter can be as tall or wide as my pixels permit, can have multiple frames simultaneously, wouldn't have as many networking latencies, etc.

Usefulness of IaaS Provisoning tools like Terraform?

I have a quick point of confusion regarding the whole idea of "Infrastructure as a Code" or IaaS provisioning with tools like Terraform.
I've been working on a team recently that uses Terraform to provision all of its AWS resources, and I've been learning it here and there and admit that it's a pretty nifty tool.
Besides Infrastructure as Code being a "cool" alternative to manually provisioning resources in the AWS console, I don't understand why it's actually useful though.
Take, for example, a typical deployment of a website with a database. After my initial provisioning of this infrastructure, why would I ever need to even run the Terraform plan again? With everything I need being provisioned on my AWS account, what are the use cases in which I'll need to "reprovision" this infrastructure?
Under this assumption, the process of provisioning everything I need is front-loaded to begin with, so why do I bother learning tools when I can just click some buttons in the AWS console when I'm first deploying my website?
Honestly I thought this would be a pretty common point of confusion, but I couldn't seem to find clarity elsewhere so I thought I'd ask here. Probably a naive question, but keep in mind I'm new to this whole philosophy.
Thanks in advance!
Manually provisioning, in the long term, is slow, non-reproducible, troublesome, not self-documenting and difficult to do in teams.
With tools such as terraform or CloudFormation you can have the following benefits:
Apply all the same development principles which you have when you write a traditional code. You can use comments to document your infrastructure. You can track all changes and who made these changes using software version control system (e.g. git).
you can easily share your infrastructure architecture. Your VPC and ALB don't work? Just post your terraform code to SO or share with a colleague for a review. Its much easier then sharing screenshots of your VPC and ALB when done manually.
easy to plan for disaster recovery and global applications. You just deploy the same infrastructure in different regions automatically. Doing the same manually in many regions would be difficult.
separation of dev, prod and staging infrastructure. You just re-use the same infrastructure code across different environments. A change to dev infrastructure can be easily ported to prod.
inspect changes before actually performing them. Manual upgrades to your infrastructure can have disastrous effects due to domino effect. Changing one, can change/break many other components of your architecture. With infrastructure as a code, you can preview the changes and have good understanding what implications can be before you actually do the change.
work team. You can have many people working on the same infrastructure code, proposing changes, testing and reviewing.
I really like the #Marcin's answer.
Here some additional points from my experience:
As for software version control case you not only can see history/authors, perform code review, but also treat infrastructural changes as product features. Let's say for example you're adding CDN support to your application so you have to make some changes in your infrastructure (to provision a cloud CDN service), application (to actually support and work with CDN) and your pipelines (to deliver static to CDN, if you're using this approach). If all changes related to this new feature will be in a one single branch - all feature related changes will be transparent for everyone in the team and can be easily tracked down later.
Another thing related to version control - is have ability to easily provision and destroy infrastructures for review apps semi-automatically using triggers and capabilities of your CI/CD tools for automated and manual testing. It's even possible to run automated tests for your changes in infrastructure declaration.
If you working on multiple similar project or if your project requires multiple similar but isolated from each other environment, IaC can help save countless hours of provisioning and tracking down everything. Although it's not always silver bullet, but in almost all cases it helps with saving time and avoiding most of accidental mistakes.
Last but not least - it helps with seeing bigger picture if you working with hybrid or multicloud environments. Not as good as infrastructural diagrams, but diagrams might not be always up date unlike your code.

Is there a way to discover VMs using Terraform?

Infrastructure team members are creating, deleting and modifying resources in GCP project using console. Security team wants to scan the infra and check weather proper security measures are taken care
I am tryng to create a terraform script which will:
1. Take project ID as input and list all instances of the given project.
2. Loop all the instances and check if the security controls are in place.
3. If any security control is missing, terraform script will be modifying the resource(VM).
I have to repeat the same steps for all resoources available in project like subnet, cloud storage buckets, firewalls etc.
As per my initial investigation to do such task We will have to import the resources to terraform using "terraform import" command and after that will have to think of loops.
Now it looks like using APIs of GCP is the best fit for this task, as it looks terraform is not the good choice for this kind of tasks and I am not sure weather it is achievable using teffarform.
Can somebody provide any directions here?
Curious if by "console" you mean the gcp console (aka by hand), because if you are not already using terraform to create the resources (and do not plan to in the future), then terraform is not the correct tool for what you're describing. I'd actually argue it is increasing the complexity.
Mostly because:
The import feature is not intended for this kind of use case and we still find regular issues with it. Maybe 1 time for a few resources, but not for entire environments and not without it becoming the future source of truth. Projects such as terraforming do their best but still face wild west issues in complex environments. Not all resources even support importing
Terraform will not tell you anything about the VM's that you wouldn't know from the GCP cli already. If you need more information to make an assessment about the controls then you will need to use another tool or have some complicated provisioners. Provisioners at best would end up being a wrapper around other tooling you could probably use directly.
Honestly, I'm worried your team is trying to avoid the pain of converting older practices to IaC. It's uncomfortable and challenging, but yields better fruit in the long run then the path you're describing.
Digress, if you have infra created via terraform then I'd invest more time in some other practices that can accomplish the same results. Some other options are: 1) enforce best practices via parent modules that security has "blessed", 2) implement some CI on your terraform, 3) AWS has Config and Systems Manager, not sure if GCP has an equivalent but I would look around. Also it's worth evaluating using different technologies for different layers of abstraction. What checks your OS might be different from what checks your security groups and that's ok. Knowing is half the battle and might make for a more sane first version then automatic remediation.
With or without terraform, there is a an ecosystem of both products and opensource projects that can help with the compliance or control enforcement. Take a look at tools like inspec, sentinel, or salstack for inspiration.

Deploy hyperledger on AWS - production setup

My company is currently evaluating hyperledger(fabric) and we're using it for our POC. It looks very promising and we're targeting rolling out to production in next few months.
We're targeting AWS as our production environment.
However, we're struggling to find good tutorial/practices/recommendations about operating hyperledger network in such environment.
I'm aware that Cello is aiming to solve/ease deploying/monitoring hyperledger network but i also read that its not production ready yet. Question is, should we even consider looking at Cello at this point?
If not, what are our alternatives? Docker swarm, kubernetes?
I also didn't find information about recommended instance types. I understand this is application and AWS specific but what are the minimal system requirements
(memory&CPU&network) for example for 'peer' node (our application is not network intensive, nor a lot of transactions will be submitted per hour/day, only few of them per day).
Another question is where to create those instances on AWS from geographical&decentralization point of view. Does it make sense all of them to be created in same region? Or, we must create instances running in different regions?
Tnx a lot.
Igor.
yes, look at Cello.. if nothing else it will help you see the aws deployment model.
really nothing special..
design the desired system, peers, orderer, gateways, etc..
then decide who many ec2 instance u need to support that.
as for WHERE (region).. depends on where the connecting application is and what kind of fault tolerance you need for your business model.
one of the businesses I am working with wants a minimum of 99.99999 % availability. so, multi-region is critical. its just another ec2 instance with sockets open from different hosts..
aws doesn't provide much in terms of support for hyperledger. they have some templates which allow you to setup the VMs initially, but that's stuff you can do yourself as well.
you are right, the documentation is very light and most of the time confusing. I got to the point where I can start from scratch with a brand new VM and got everything ready and deploy my own network definition and chaincode and have the scripts to do that.
IBM cloud has much better support for hyperledger however. you can design your network visually, you can download your connection profiles, deploy and instantiate chaincode, create and join channels, handle certificates, pretty much everything you need to run and support such a network. It's light years ahead of AWS. They even have a full CI / CD pipepline that you could replicate for your own project. if you look at their marbles demo, you'll see what i mean.
Cello is definitely worth looking at, with the caveat that it's incubation meaning, not real yet, not production ready and not really useful until it becomes a fully fledged product.

django deployment with java and c++

I have created a django app that contains c++ for some of the views as well as a java library. How would I deploy this app? What kind of hosting service allows for multiple languages? I have looked at EC2, GAE, and several platforms (like heroku) but I can't seem to find a definitive solution.
I have never deployed anything to the web so a simple explanation would be much appreciated.
PaaS stuff is probably not your best bet. If you want the scalability and associated buzzwords(muh 99.9999999999% availability because my servers are hosted in a parallel dimension without electrical storms, power outages, hurricanes, earthquakes, or nuclear holocausts) that comes with hosting your application on a huge web company's platform, check out IaaS(Infrastructure as a service) systems like Google's Compute Engine or AWS. With these you just get a virtual server (or servers), running your Linux distro of choice, and you can install and run whatever you please on them without being constrained to a specific platform like App Engine or Heroku(where you have to basically write your app to specifically run on that platform). If you plan on consuming a ton of bandwidth/resources from the get-go, you will almost certainly get a better deal using a dedicated server(s) from a small company.
Interested in what specifically you are executing C++ for in a Django view. Image/video processing?
Well. Deployment is not really something where a simple explanation helps much.
First I would check what the requirements to the operating system are (compilers, dependencies,…). That will maybe reduce the options quickly.
I guess that with a setup containing C++ & Java artifacts, the usual PaaS (GaE, Heroku,…) offerings will not be sufficient because they define the stack. And a mixture of Python/C++/Java is rather uncommon I'd say.
Choosing an IaaS offering (EC2, …) may be an option. There you can run your whole self-defined stack and have the possibility of easier scaling.
Hosting the application on your own server(s) is also always possible. Check your data protection regulations to find out if it's not even a requirement.
There are a lot of ways to get the Django application to run. The Django documentation has some information about deployment. If you have certain special requirements, uwsgi may be a good application server.
You may also want a web server in front of the application. Possibilities range from using uwsgi's built-in http server or using e.g. Nginx with uwsgi.
All in all every component of the whole "deployment" has hundereds of bells and whistels and it's not easy to give advice without knowing specific requirements and properties of the system itself. You'll also probably need a database you have to deploy.
But before deploying it to the web, it's also important to have a solid build process to assemble all the parts. And not only on the development machine. With three languages involved this should be the first step solve. If it easily and automagically deploys in a development environment, moving it to a server is easier.