istio-operator image is no longer available in the public istio-release registry on GCR - google-cloud-platform

We have recently come across an issue on one of our cluster pods, which caused an outage of our application and impacted our customers.
Here is the thing: we had been able to pull the gke.gcr.io/istio/operator:1.6.3 image from GCR without problems, but it started failing overnight.
Eventually, we noticed that this image is no longer available in the public istio-release registry on gcr.io, which caused an ImagePullBackOff failure. However, we can still find it on docker.io.
For now, we're sticking with the straightforward workaround of pulling the image from docker.io/istio/operator:1.6.3. Nevertheless, we're still skeptical and wondering why this image suddenly vanished from gcr.io.
Has anyone been facing something similar?
Best regards.

I did some research but couldn't find anything related.
As I mentioned in the comments, I strongly suggest you keep all critical images in a private container registry. With this approach you can avoid incidents like this one and gain extra control over the images, such as versioning and security.
There are many guides on the internet for setting up your own private container registry, such as Nexus; if you'd rather use a managed service, you can try Google Container Registry.
Keep in mind that when you are working in a critical environment, you should try to minimize external variables to keep your service as resilient as possible.
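To make the private-registry advice a bit more concrete, here is a minimal Python sketch that flags any container image in a Kubernetes pod spec that is not pulled from an allowed registry. The prefix `gcr.io/my-private-project/` is a hypothetical placeholder, and the pod spec is assumed to be already parsed into a plain dict (e.g. from YAML).

```python
from __future__ import annotations

# Hypothetical private registry prefix; replace with your own mirror.
ALLOWED_PREFIXES = ("gcr.io/my-private-project/",)


def external_images(pod_spec: dict, allowed: tuple = ALLOWED_PREFIXES) -> list[str]:
    """Return every container image in a pod spec that is not from an allowed registry."""
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    # str.startswith accepts a tuple, so any allowed prefix counts as a match.
    return [c["image"] for c in containers if not c["image"].startswith(allowed)]


pod_spec = {
    "containers": [
        {"name": "app", "image": "gcr.io/my-private-project/app:1.0"},
        {"name": "istio-operator", "image": "gke.gcr.io/istio/operator:1.6.3"},
    ]
}
print(external_images(pod_spec))  # ['gke.gcr.io/istio/operator:1.6.3']
```

Running a check like this in CI, or before each deploy, is one way to spot workloads that still depend on a public registry you don't control.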

I noticed a short downtime on one of our services deployed to GKE and saw that istio-operator was listed with a red warning.
The log was:
Back-off pulling image "gke.gcr.io/istio/operator:1.6.4": ImagePullBackOff
Since istio-operator is a workload managed by GKE, I was hesitant to touch it, but the downtime recurred a couple of times, lasting a couple of minutes each, so I also edited the service YAML and pointed the image at Docker Hub (docker.io).
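The manual YAML edit boils down to swapping the registry prefix of the container image. A minimal Python sketch of that change on a manifest already parsed into a dict; the container name and the stripped-down Deployment structure are assumptions for illustration, and you would still apply the result with your usual tooling (e.g. `kubectl apply`).

```python
import copy


def replace_registry(manifest: dict, old_prefix: str, new_prefix: str) -> dict:
    """Return a copy of a Deployment-style manifest with one image registry swapped for another."""
    patched = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    pod_spec = patched["spec"]["template"]["spec"]
    for container in pod_spec.get("containers", []) + pod_spec.get("initContainers", []):
        image = container["image"]
        if image.startswith(old_prefix):
            container["image"] = new_prefix + image[len(old_prefix):]
    return patched


# Hypothetical, minimal Deployment body for illustration.
deployment = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "istio-operator", "image": "gke.gcr.io/istio/operator:1.6.4"},
    ]}}}
}
patched = replace_registry(deployment, "gke.gcr.io/", "docker.io/")
print(patched["spec"]["template"]["spec"]["containers"][0]["image"])  # docker.io/istio/operator:1.6.4
```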

Related

Is there a best practice for testing your stack locally before deploying to AWS, to avoid deploying your stack over and over while debugging?

I have been working with AWS and the Serverless Framework/CloudFormation over the last few months.
A solid amount of time went into debugging my applications, and most of that time went into staring at my console while my stack was being deployed.
I read in "The Software Craftsman" (Sandro Mancuso) that the author worked for a company where the developers were working in a similar fashion: changing a tiny bit of code, deploying all of the code to the server, executing it, and checking print statements before changing another tiny bit of code and deploying everything again.
Mancuso heavily criticized this approach and strongly recommended writing tests before deployment to avoid this kind of behavior. Since I am currently developing in pretty much exactly that fashion, I gave this approach some thought, but I came across some issues.
Of course testing is very important, and it catches some issues I would otherwise have missed before deploying my code. However, when working on cloud infrastructure, microservices and other distributed systems, there are a lot of aspects I simply cannot capture in my tests: errors stemming from the AWS infrastructure itself, from interaction with other microservices or connected systems, etc.
Therefore I am looking for a way (if one exists) to test my AWS stack locally, so that I don't have to change tiny bits of code and then wait a few minutes for them to deploy to AWS while debugging.
I have not yet found a perfect solution. Even if you test your code locally with some mocked services, it can still fail after deployment because you forgot to configure the IAM rights, permissions, security groups, policies, etc.
Currently I am working with the AWS CLI, which creates CloudFormation stacks. Testing a Lambda locally is not a problem, but even if it works against your local DB, it can fail after deployment because the DB in your account is in a VPC and you forgot to change the policies...
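Testing the handler logic locally is the easy part; one common way to do it is to inject the database dependency so that a fake can stand in for it. A minimal Python sketch, assuming an API Gateway-style event with `pathParameters` and a hypothetical `get_user` data-access method. As the answer says, this still won't catch IAM, VPC or policy problems.

```python
import json


def make_handler(db):
    """Build a Lambda-style handler; `db` is injected so a fake can replace it locally."""
    def handler(event, context):
        user_id = event["pathParameters"]["id"]
        user = db.get_user(user_id)  # hypothetical data-access method
        if user is None:
            return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
        return {"statusCode": 200, "body": json.dumps(user)}
    return handler


class FakeDb:
    """In-memory stand-in for the real database (hypothetical interface)."""
    def __init__(self, users):
        self.users = users

    def get_user(self, user_id):
        return self.users.get(user_id)


# Local run: no deployment, no VPC, no IAM involved.
handler = make_handler(FakeDb({"42": {"id": "42", "name": "Ada"}}))
print(handler({"pathParameters": {"id": "42"}}, None))
```

In the deployed function, the same `make_handler` would be called once at module load with the real database client instead of `FakeDb`.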
Our current approach is to work with nested stacks, so that we don't have to redeploy the entire infrastructure, only the part that actually changed.
Nested stacks work well with the AWS CLI.
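A sketch of that layout: a parent template, generated from Python, with two nested `AWS::CloudFormation::Stack` resources. The bucket URL, the child template names and the `VpcId` output of the network stack are hypothetical; the idea is that a change confined to `app.yaml` should mainly affect the app nested stack rather than the whole infrastructure.

```python
import json


def parent_template(bucket_url: str) -> dict:
    """Parent stack nesting a network stack and an app stack (all names hypothetical)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "NetworkStack": {
                "Type": "AWS::CloudFormation::Stack",
                "Properties": {"TemplateURL": f"{bucket_url}/network.yaml"},
            },
            "AppStack": {
                "Type": "AWS::CloudFormation::Stack",
                "Properties": {
                    "TemplateURL": f"{bucket_url}/app.yaml",
                    # Feed an output of the network stack into the app stack.
                    "Parameters": {
                        "VpcId": {"Fn::GetAtt": ["NetworkStack", "Outputs.VpcId"]}
                    },
                },
            },
        },
    }


print(json.dumps(parent_template("https://s3.amazonaws.com/my-bucket"), indent=2))
```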

The zone 'projects/*******/zones/northamerica-northeast1-b' does not have enough resources available

I have been unable to restart my VM for 2 hours now; my services are down because of this error:
The zone 'projects/******/zones/northamerica-northeast1-b' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
I can't rely on Google Cloud being unavailable for hours because of resources. What should I do? I can't afford to change the zone because it needs to be in Canada, and I can't change the IP either because it's behind a DNS record. I just need to restart my VM; my business is down...
What's the issue/solution?
Thank you
I'm glad to see that you solved your issue by trying a different machine type. I was about to suggest trying a different machine type and then checking whether it allowed you to restart your VM.
I also want to mention, in case it helps other users: if switching to a non-shared-core machine type or to a VM from a different family doesn't help, you can try recreating your VM in a different zone of the same region (I've been using northamerica-northeast1-a without any issues so far).
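The fallback order described in this answer (other machine types first, then another zone in the same region) can be sketched as a simple loop. `try_start` is a hypothetical callable you would implement on top of gcloud or the Compute Engine API, and the zone and machine-type lists are example values only. Remember that moving to another zone means recreating the VM, not just restarting it.

```python
from typing import Callable, Iterable, Optional, Tuple


def start_with_fallback(
    try_start: Callable[[str, str], bool],
    zones: Iterable[str],
    machine_types: Iterable[str],
) -> Optional[Tuple[str, str]]:
    """Try every machine type in the first zone, then move on to the next zone.

    Returns the (zone, machine_type) pair that worked, or None if nothing had capacity.
    """
    machine_types = list(machine_types)  # reused for every zone
    for zone in zones:
        for machine_type in machine_types:
            if try_start(zone, machine_type):
                return zone, machine_type
    return None


# Example values only: the current zone first, then the other zones in the region.
ZONES = ["northamerica-northeast1-b", "northamerica-northeast1-a", "northamerica-northeast1-c"]
MACHINE_TYPES = ["e2-medium", "n1-standard-2", "n2-standard-2"]
```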
However, if you want to prevent this from happening on a future restart, I recommend creating a reservation to make sure these resources remain available to you, so that a shortage doesn't impact your workload/application.
Finally, here is a link you may be interested in: Patterns for scalable apps. It discusses why it's best to deploy your app/workload across different zones: balancing it makes it more resilient, and you wouldn't need to change your DNS records every time you switch the VM serving the backend.

Usefulness of IaC provisioning tools like Terraform?

I have a quick point of confusion regarding the whole idea of "Infrastructure as Code" or IaC provisioning with tools like Terraform.
I've been working on a team recently that uses Terraform to provision all of its AWS resources, and I've been learning it here and there and admit that it's a pretty nifty tool.
Besides Infrastructure as Code being a "cool" alternative to manually provisioning resources in the AWS console, I don't understand why it's actually useful.
Take, for example, a typical deployment of a website with a database. After my initial provisioning of this infrastructure, why would I ever need to even run the Terraform plan again? With everything I need being provisioned on my AWS account, what are the use cases in which I'll need to "reprovision" this infrastructure?
Under this assumption, the process of provisioning everything I need is front-loaded to begin with, so why do I bother learning tools when I can just click some buttons in the AWS console when I'm first deploying my website?
Honestly I thought this would be a pretty common point of confusion, but I couldn't seem to find clarity elsewhere so I thought I'd ask here. Probably a naive question, but keep in mind I'm new to this whole philosophy.
Thanks in advance!
Manually provisioning, in the long term, is slow, non-reproducible, troublesome, not self-documenting and difficult to do in teams.
With tools such as Terraform or CloudFormation you get the following benefits:
apply the same development principles you use when writing traditional code. You can use comments to document your infrastructure, and you can track all changes, and who made them, with a version control system (e.g. git).
you can easily share your infrastructure architecture. Your VPC and ALB don't work? Just post your Terraform code to SO or share it with a colleague for review. It's much easier than sharing screenshots of a VPC and ALB set up manually.
easy to plan for disaster recovery and global applications. You just deploy the same infrastructure in different regions automatically. Doing the same manually in many regions would be difficult.
separation of dev, prod and staging infrastructure. You just re-use the same infrastructure code across different environments. A change to dev infrastructure can be easily ported to prod.
inspect changes before actually performing them. Manual upgrades to your infrastructure can have disastrous consequences due to a domino effect: changing one component can change or break many other components of your architecture. With infrastructure as code, you can preview the changes and understand their implications before you actually make them.
teamwork. Many people can work on the same infrastructure code, proposing changes, testing and reviewing them.
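To illustrate the "inspect changes before performing them" point, here is a toy Python model of what a plan step does conceptually: compare the desired resources described in code with the recorded current state and report what would be created, updated or deleted. This is not how Terraform is implemented internally, and the resource names are made up.

```python
def plan(current: dict, desired: dict) -> dict:
    """Diff two resource maps (name -> attributes) the way a plan preview would."""
    return {
        "create": sorted(desired.keys() - current.keys()),
        "delete": sorted(current.keys() - desired.keys()),
        "update": sorted(
            name for name in desired.keys() & current.keys()
            if desired[name] != current[name]
        ),
    }


current = {"vpc": {"cidr": "10.0.0.0/16"}, "alb": {"port": 80}, "old_sg": {}}
desired = {"vpc": {"cidr": "10.0.0.0/16"}, "alb": {"port": 443}, "cdn": {}}
print(plan(current, desired))
# {'create': ['cdn'], 'delete': ['old_sg'], 'update': ['alb']}
```

Reviewing output like this before applying is what lets you catch a domino effect up front instead of after the fact.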
I really like @Marcin's answer.
Here are some additional points from my experience:
With version control you can not only see the history and authors and perform code review, but also treat infrastructure changes as product features. Say, for example, you're adding CDN support to your application: you have to make changes to your infrastructure (to provision a cloud CDN service), to your application (to actually support and work with the CDN) and to your pipelines (to deliver static assets to the CDN, if you're using that approach). If all changes related to this new feature live in one single branch, they are transparent to everyone on the team and can easily be tracked down later.
Another benefit related to version control is the ability to provision and destroy infrastructure for review apps semi-automatically, using the triggers and capabilities of your CI/CD tools, for automated and manual testing. It's even possible to run automated tests against changes to your infrastructure declarations.
If you're working on multiple similar projects, or if your project requires multiple similar but isolated environments, IaC can save countless hours of provisioning and tracking everything down. It's not always a silver bullet, but in almost all cases it saves time and helps avoid most accidental mistakes.
Last but not least, it helps you see the bigger picture if you're working with hybrid or multicloud environments. It's not as good as an infrastructure diagram, but diagrams might not always be up to date, unlike your code.
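The "multiple similar but isolated environments" point comes down to one shared definition plus small per-environment differences. In Terraform that is usually expressed with variables (e.g. per-environment `.tfvars` files); the Python sketch below only illustrates the idea, and all sizes and names in it are hypothetical. Review apps get a name derived from the branch, so one can be created and destroyed per feature.

```python
import re
from typing import Optional

# Shared definition (hypothetical values).
BASE = {"instance_type": "t3.small", "min_size": 1, "max_size": 2, "multi_az": False}

# Only what differs per environment.
OVERRIDES = {
    "dev": {},
    "staging": {"max_size": 4},
    "prod": {"instance_type": "m5.large", "min_size": 2, "max_size": 10, "multi_az": True},
}


def environment_config(env: str, branch: Optional[str] = None) -> dict:
    """Merge the shared base with per-environment overrides."""
    if env == "review":
        if not branch:
            raise ValueError("review environments need a branch name")
        # Turn the branch name into a resource-name-friendly suffix.
        slug = re.sub(r"[^a-z0-9-]+", "-", branch.lower()).strip("-")
        return {**BASE, "name": f"review-{slug}"}
    return {**BASE, **OVERRIDES[env], "name": env}


print(environment_config("review", "feature/CDN_support")["name"])  # review-feature-cdn-support
```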

Deploy hyperledger on AWS - production setup

My company is currently evaluating Hyperledger (Fabric) and we're using it for our POC. It looks very promising, and we're targeting a production rollout in the next few months.
We're targeting AWS as our production environment.
However, we're struggling to find good tutorial/practices/recommendations about operating hyperledger network in such environment.
I'm aware that Cello aims to ease deploying and monitoring Hyperledger networks, but I also read that it's not production ready yet. The question is, should we even consider Cello at this point?
If not, what are our alternatives? Docker swarm, kubernetes?
I also didn't find information about recommended instance types. I understand this is application- and AWS-specific, but what are the minimal system requirements (memory, CPU, network) for, say, a 'peer' node? Our application is not network intensive, and only a few transactions will be submitted per day.
Another question is where to create those instances on AWS from a geographical and decentralization point of view. Does it make sense for all of them to be in the same region, or must we run instances in different regions?
Thanks a lot.
Igor.
Yes, look at Cello. If nothing else, it will help you see the AWS deployment model.
There's really nothing special about it.
Design the desired system: peers, orderer, gateways, etc.
Then decide how many EC2 instances you need to support that.
As for WHERE (region), it depends on where the connecting application is and what kind of fault tolerance your business model needs.
One of the businesses I'm working with wants a minimum of 99.99999% availability, so multi-region is critical. It's just another EC2 instance with sockets open from different hosts.
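To put that target in perspective, 99.99999% availability leaves roughly 3 seconds of downtime per year. A small Python calculation, which also shows why redundancy helps: with N replicas where any one suffices, the unavailabilities multiply. That formula assumes failures are independent, which separate regions only approximate.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 s


def downtime_per_year(availability_pct: float) -> float:
    """Allowed downtime, in seconds per year, for an availability percentage."""
    return (1 - availability_pct / 100) * SECONDS_PER_YEAR


def combined_availability(single: float, replicas: int) -> float:
    """Availability (as a fraction) of N independent replicas when any one of them suffices."""
    return 1 - (1 - single) ** replicas


print(round(downtime_per_year(99.99999), 2))     # 3.16 (seconds)
print(round(downtime_per_year(99.9) / 3600, 2))  # 8.77 (hours)
print(combined_availability(0.999, 2))           # 1 - 0.001**2, i.e. roughly six nines
```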
AWS doesn't provide much support for Hyperledger. They have some templates that let you set up the VMs initially, but that's something you can do yourself as well.
You are right, the documentation is very light and often confusing. I got to the point where I can start from scratch with a brand-new VM, get everything ready, and deploy my own network definition and chaincode, and I have scripts to do that.
IBM Cloud has much better support for Hyperledger, however. You can design your network visually, download your connection profiles, deploy and instantiate chaincode, create and join channels, handle certificates: pretty much everything you need to run and support such a network. It's light years ahead of AWS. They even have a full CI/CD pipeline that you could replicate for your own project; if you look at their marbles demo, you'll see what I mean.
Cello is definitely worth looking at, with the caveat that it's still in incubation, meaning it's not production ready and won't be really useful until it becomes a fully fledged product.

How to convert a WAMP stacked app running on a VPS to a scalable AWS app?

I have a web app running on PHP, MySQL and Apache on a virtual Windows server. I want to redesign it to be scalable on AWS (for fun, so I can learn new things).
I can see how to set up an EC2 instance and dump everything in there, but I want to make it scalable and take advantage of all the cool features on AWS.
I've tried googling but just can't find a simple guide (note - I have no command line experience of Linux)
Can anyone direct me to detailed resources that can lead me through the steps and teach me? Or alternatively, summarise the steps in an answer so I can research based on what you say.
Thanks
AWS is growing and changing all the time, so there aren't a lot of books to help. Amazon offers training that's excellent. I took their three day class on Architecting with AWS that seems to be just what you're looking for.
Of course, not everyone can afford to spend the travel time and money to attend a class. The AWS re:Invent conference in November 2012 had a lot of sessions related to what you want, and most (maybe all) of the sessions have videos available online for free. Building Web Scale Applications With AWS is probably relevant (slides and video available), as is Dissecting an Internet-Scale Application (slides and video available).
A great way to understand these options better is by fiddling with your existing application on AWS. It will be easy to just move it to an EC2 instance in AWS, then start taking more advantage of what's available. The first thing I'd do is get rid of the MySQL server on your own machine and use one offered with RDS. Once that's stable, create one or more read replicas in RDS, and change your application to read from them for most operations, reading from the main (writable) database only when you need completely current results.
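Splitting reads from writes could look roughly like this. The app in the question is PHP, so this Python version only shows the idea; the endpoint names are hypothetical placeholders for your RDS primary and read-replica endpoints. Reads that must see the latest data go to the primary because replicas can lag behind it.

```python
import random
from typing import Sequence


class DbRouter:
    """Send writes (and reads that need fresh data) to the primary, other reads to replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        self.replicas = list(replicas)

    def for_write(self) -> str:
        return self.primary

    def for_read(self, need_current: bool = False) -> str:
        # Fall back to the primary if fresh data is required or no replica exists.
        if need_current or not self.replicas:
            return self.primary
        return random.choice(self.replicas)  # naive load spreading across replicas


# Hypothetical endpoint names.
router = DbRouter("primary.example.internal", ["replica-1.example.internal", "replica-2.example.internal"])
print(router.for_read())
```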
Does your application keep any data on the web server, other than in the database? If so, get rid of all local storage by moving that data off the EC2 instance. Some of it might go to the database, some (like big files) might be suitable for S3. DynamoDB is a good place for things like session data.
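The point of moving session data off the web server is that any instance behind the load balancer can then serve any request. A minimal sketch of a session-store interface, using an in-memory dict as a stand-in; a real deployment would back `create`/`get` with a shared store such as a DynamoDB table, and the class and method names here are hypothetical.

```python
import time
import uuid
from typing import Optional


class InMemorySessionStore:
    """Stand-in for an external session table; the interface is hypothetical."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self._items = {}  # session_id -> (expires_at, data)

    def create(self, data: dict) -> str:
        session_id = uuid.uuid4().hex
        self._items[session_id] = (time.time() + self.ttl, data)
        return session_id

    def get(self, session_id: str) -> Optional[dict]:
        item = self._items.get(session_id)
        if item is None or item[0] < time.time():
            self._items.pop(session_id, None)  # drop expired or unknown sessions
            return None
        return item[1]


store = InMemorySessionStore()
sid = store.create({"user": "ada"})
print(store.get(sid))  # {'user': 'ada'}
```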
All of the above reduces the load on the web server to just your application code, which helps with scalability. And now that you keep no state on the web server, you can use ELB and Auto-scaling to automatically run multiple web servers (and even automatically launch more as needed) to handle greater load.
Does the application have any long running, intensive operations that you now perform on demand from a web request? Consider not performing the operation when asked, but instead queueing the request using SQS, and just telling the user you'll get to it. Now have long running processes (or cron jobs or scheduled tasks) check the queue regularly, run the requested operation, and email the result (using SES) back to the user. To really scale up, you can move those jobs off your web server to dedicated machines, and again use auto-scaling if needed.
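The queue-then-process pattern, sketched with Python's standard `queue.Queue` standing in for SQS and a `notify` callable standing in for sending email via SES. With boto3, the queue operations would map roughly to the SQS client's `send_message`, `receive_message` and `delete_message`; `run_job` and `notify` are hypothetical hooks.

```python
import queue
from typing import Callable

jobs: "queue.Queue[dict]" = queue.Queue()  # stand-in for an SQS queue


def handle_request(user_email: str, params: dict) -> str:
    """Web request path: enqueue the work and return immediately."""
    jobs.put({"email": user_email, "params": params})
    return "Your request has been queued; we'll email you the result."


def drain(run_job: Callable[[dict], str], notify: Callable[[str, str], None]) -> int:
    """Worker path: process every queued job and notify its user; returns the count."""
    processed = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return processed
        notify(job["email"], run_job(job["params"]))
        processed += 1
```

A scheduled task or long-running worker would call `drain` periodically, so the web request never waits for the heavy operation.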
Do you need bigger machines, or can you perhaps live with smaller ones? CloudWatch metrics can show you how much IO, memory, and CPU are used over time. You can use provisioned IOPS with EC2 or RDS instances to improve performance (at a cost) as needed, and use different instance sizes for more memory or CPU.
All this AWS setup and configuration can be done with the AWS web console, or command-line tools, or SDKs available in many languages (Python's boto library is great). After learning the basics, look into CloudFormation to automate it better (I've written a couple of posts about that so far).
That's a bit of the 10,000 foot high view of one approach. You'll need to discover the details of each AWS service when you try to use them. AWS has good documentation about all of them.
Depending on how you look at it, this is more of a comment than it is an answer, but it was too long to write as a comment.
What you're asking for really can't be answered on SO; it's a huge, complex question. You're basically asking, "How do I design a highly scalable, durable application that can be deployed on a cloud-based platform?" The answer depends largely on:
The specifics of your application--what does it do and how does it work?
Your tolerance for downtime balanced against your budget
Your present development and deployment workflow
The resources/skill sets you have on-staff to support the application
What your launch time frame looks like.
I run a software consulting company that specializes in consulting on Amazon Web Services architecture. About 80% of our business is investigating and answering these questions for our clients. It's a multi-week long project each time.
However, to get you pointed in the right direction, I'd recommend that you look at Elastic Beanstalk. It's a PaaS-like service that abstracts away the underlying AWS resources, making AWS easier to use for developers who don't have a lot of sysadmin experience. Think of it as "training wheels" for designing an autoscaling application on AWS.