`kops update cluster` returns multiple will create/modify resources - amazon-web-services

I have a Kubernetes cluster running version 1.17.17. I want to increase the CPU/RAM of a node using kOps. When running the kops update cluster command, I expected it to return a preview of my old instance type vs. the new instance type.
However, it returns a long list of "will create resources" / "will modify resources" entries.
I want to know why it shows such a long log of changes it will execute instead of showing only the instance type change I made, and whether it is safe to apply these changes.

After you do that cluster update, you will need to do a rolling update on the cluster. The nodes will be terminated one by one and new ones will come up. While a node is going down to be replaced, the services running on it are shifted to the remaining nodes. Small tip: check your PodDisruptionBudgets first, since a restrictive budget can block the node drain (consider temporarily removing them). Also, the long log is fine; don't worry.
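The workflow described above can be sketched with the standard kOps commands (a minimal sketch; $CLUSTER_NAME and the instance-group name "nodes" are placeholders for your own values):

```shell
# Edit the instance group to set the new machine type (opens an editor):
kops edit ig nodes --name "$CLUSTER_NAME"

# Preview the changes (this is the long "will create/modify" log):
kops update cluster --name "$CLUSTER_NAME"

# Apply the changes to the cloud resources:
kops update cluster --name "$CLUSTER_NAME" --yes

# Replace the nodes one by one so they pick up the new instance type:
kops rolling-update cluster --name "$CLUSTER_NAME" --yes
```

Without --yes, both update and rolling-update only print a preview, which is why the long log on its own has not changed anything yet.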


Trying to come up with a way to track any ec2 instance type changes across our account

I have been trying to come up with a way to track any and all instance type changes that happen in our company's account (e.g., t2.micro to t2.nano).
I settled on creating a custom Config rule that would alert us with a noncompliant warning if an instance type changed, but I think this might be overcomplicating it, and I suspect I should be using CloudWatch alarms or EventBridge instead.
I have used the following setup (from the CLI):
rdk create ec2_check_instance_type --runtime python3.7 --resource-types AWS::EC2::Instance --input-parameters '{"modify-instance-type":"*"}'
modify-instance-type seemed to be the only parameter I could find related to what I wanted the Lambda function to track, and I used the wildcard to signify any change.
I then added the following to the lambda function:
if configuration_item['resourceType'] != 'AWS::EC2::Instance':
    return 'NOT_APPLICABLE'
if configuration_item['configuration']['instanceType'] == valid_rule_parameters['ModifyInstanceAttribute']:
    return 'NON_COMPLIANT'
Is there a different input parameter I should be using for this instead of "modify-instance-type"? So far this has returned nothing, and I don't think it is evaluating properly.
Or does anyone know of a service that might be a better way to track configuration changes like this within AWS that I'm just not thinking of?
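As an alternative to a custom Config rule, an EventBridge rule can match the ModifyInstanceAttribute API call directly (a sketch, assuming CloudTrail is enabled in the account; the rule name and the SNS topic ARN are placeholders):

```shell
# Create a rule that fires whenever ModifyInstanceAttribute is called:
aws events put-rule \
  --name ec2-instance-type-change \
  --event-pattern '{
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
      "eventSource": ["ec2.amazonaws.com"],
      "eventName": ["ModifyInstanceAttribute"]
    }
  }'

# Send matching events to an SNS topic for alerting:
aws events put-targets \
  --rule ec2-instance-type-change \
  --targets "Id"="1","Arn"="arn:aws:sns:us-east-1:123456789012:instance-type-changes"
```

Note that ModifyInstanceAttribute also fires for attribute changes other than the instance type, so the consumer may still need to filter on the request parameters in the event detail.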

How to add network tag when creating compute node in Slurm-GCP?

I would like to automatically add network tags (http-server, https-server) to the compute nodes that are automatically created by slurm-gcp.
For now, I manually add the tags with the gcloud command after each node is created.
However, as more nodes are created, this manual step becomes too slow.
I created a custom image and built an instance template from that image (including the http-server and https-server tags).
However, slurm-gcp does not appear to use the instance template at all when creating nodes.
How can I solve this?
Thank you.
According to the documentation:
If you need to create a tag on a VM, you must create the tag manually.
You can assign network tags to new VMs at creation time, or you can edit the set of assigned tags at any time later. You can edit network tags without stopping a VM. You can also add tags to, and remove tags from, an existing VM.
Check out the documentation on Configuring network tags.
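For reference, the manual step the asker describes looks like this with gcloud (the node name and zone are placeholders):

```shell
# Add the network tags to an already-running compute node:
gcloud compute instances add-tags my-compute-node-0 \
  --zone us-central1-a \
  --tags http-server,https-server
```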
I solved this problem.
I modified the code in the slurm/scripts/resume.py file that sets the network tags of the compute node, changing

'tags': {'items': ['compute']},

to

'tags': {'items': ['compute', 'http-server', 'https-server']},

Is it safe to apply Terraform plan when it says the database instance must be replaced?

I'm importing the existing resources (AWS RDS) but the terraform plan command showed a summary:
#aws_db_instance.my_main_db must be replaced
+/- resource "aws_db_instance" "my_main_db" {
~ address = x
allocated_storage = x
+ apply_immediately = x
~ arn = x
~ username = x
+ password = x
(other arguments with a lot of +/- and ~)
}
my_main_db is online with persistent data. My question is as the title says: is it safe for the existing database to run terraform apply? I don't want to lose all my customer data.
"Replace" in Terraform's terminology means to destroy the existing object and create a new one to replace it. The +/- symbol (as opposed to -/+) indicates that this particular resource will be replaced in the "create before destroy" mode, where there will briefly be two database instances existing during the operation. (This may or may not be possible in practice, depending on whether the instance name is changing as part of this operation.)
For aws_db_instance in particular, destroying an instance is equivalent to deleting the instance in the RDS console: unless you have a backup of the contents of the database, it will be lost. Even if you do have a backup, you'll need to restore it via the RDS console or API rather than with Terraform because Terraform doesn't know about the backup/restore mechanism and so its idea of "create" is to produce an entirely new, empty database.
To sum up: applying a plan like this directly is certainly not generally "safe", because Terraform is planning to destroy your database and all of the contents along with it.
If you need to make changes to your database that cannot be performed without creating an entirely new RDS instance, you'll usually need to make those changes outside of Terraform using RDS-specific tools so that you can implement some process for transferring data between the old and new instances, whether that be backup and then restore (which will require a temporary outage) or temporarily running both instances and setting up replication from old to new until you are ready to shut off the old one. The details of such a migration are outside of Terraform's scope, because they are specific to whatever database engine you are using.
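Before deciding, it is worth pinning down exactly what Terraform intends to do. One way to review a plan safely with the standard Terraform CLI (tfplan is an arbitrary file name):

```shell
# Save the plan to a file, so the plan you review is exactly what gets applied:
terraform plan -out=tfplan

# Render the saved plan and look for resources slated for replacement:
terraform show tfplan | grep -B 2 "must be replaced"

# Only after reviewing, apply that exact saved plan:
terraform apply tfplan
```

Applying the saved plan file guarantees that nothing beyond the reviewed changes will be executed.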
It's most likely not safe, but really only someone familiar with the application can make that decision. Look at the properties and what is going to change or be recreated. Unless you are comfortable with all of those properties changing, it's not safe.

Best way to retire an index

I am retiring an old Elasticsearch index in AWS that has not received a new document since 2016. However, something is still trying to search it.
I still want to deprecate this index in a manner where I can get back to the original state quickly. I have created a manual snapshot of the index and it is sitting in S3. I was planning on deleting the domain, but, from what I understand, that deletes everything billable under AWS, including the endpoint. As I mentioned above, I want to be able to get back to the original state of the index. This domain contains a series of indexes; the largest index is 20.5 GB. I was going to delete the large index and resize the cluster to a smaller instance size and footprint. Will this work, or will the index be unsearchable?
I've no experience using Elasticsearch on AWS, but I have an idea about your index.
You say the index has received no new documents for a long time. If this also means no deletions and no updates, you could theoretically just take this index to a new cluster, using either snapshot + restore, or a cross-cluster reindex. Continue operating your old cluster until you're sure the new one is working well.
Again - not familiar with AWS terminology, but it sounds like this approach translates to using separate "domains". First you fully ensure the new "domain" is working with the right hardware spec and data, and then delete the old "domain".
TL;DR -> yes!
The backup to S3 will work, but the documents will be unsearchable because in order to downsize the storage you have to delete the index.
But if someday you want to restore the data from S3 back to the index, you can.
You can resize instances and storage sizes with no downtime; however, that takes a long time, and you pay extra for the machines while they are resizing.
Example: if you change your storage size from 100 GB to 99 GB, the Elasticsearch service will spin up another instance, copy all your data from the old instance to the new one, and then delete the old one.
The same goes for instance sizes: machine up, cluster sync, machine down.
While they are syncing, you pay for both.
Your plan will work; ES is very flexible.
If you really don't trust AWS, just make a JSON export from the index and keep it on S3 too, just in case things go south.
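The restore-from-S3 path mentioned above can be sketched with the Elasticsearch snapshot API (a sketch; $ES_ENDPOINT and the repository, snapshot, and index names are placeholders, and on the AWS-managed service registering an S3 repository additionally requires a signed request with an IAM role, per the AWS documentation):

```shell
# List the snapshots available in the registered S3 repository:
curl -XGET "https://$ES_ENDPOINT/_snapshot/my-s3-repo/_all"

# Restore a single index from a snapshot:
curl -XPOST "https://$ES_ENDPOINT/_snapshot/my-s3-repo/my-snapshot/_restore" \
  -H 'Content-Type: application/json' \
  -d '{"indices": "my-large-index"}'
```

If the index still exists in the cluster, it must be closed or deleted before the restore will succeed.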

Migrate ColdFusion scheduled tasks using neo-cron.xml

We currently have two ColdFusion 10 dedicated servers which we are migrating to a single VPS server. We have many scheduled tasks on each. I took each of the neo-cron.xml files and copied the var XML elements from within the struct type='coldfusion.server.ConfigMap' XML element, and pasted them within that element in the neo-cron.xml file on the new server. Afterward I restarted the ColdFusion service, logged into CF Admin, and the tasks all showed as expected.
My problem is, when I try to update any of the tasks I get the following error when saving:
An error occured scheduling the task. Unable to store Job :
'SERVERSCHEDULETASK#$%^DEFAULT.job_MAKE CATALOGS (SITE CONTROL)',
because one already exists with this identification
Also, when I try to delete a task, it tells me a task with that name does not exist. So it seems to me that the task information must also be stored elsewhere: when I try to update a task, the record doesn't exist in the secondary location, so ColdFusion tries to add it as new to the neo-cron.xml file, which causes an error because it already exists there; and when I try to delete, it doesn't exist in the secondary location, so it says a task with that name does not exist. That is just a guess, though.
Any ideas how I can get this to work without manually re-creating dozens of tasks? From what I've read this should work, but I need to be able to edit the tasks.
Thank you.
After a lot of hair-pulling I was able to figure out the problem. It all boiled down to having parentheses in the scheduled task names. This was causing both the "Unable to store Job : 'SERVERSCHEDULETASK#$%^DEFAULT.job_MAKE CATALOGS (SITE CONTROL)', because one already exists with this identification" error and also causing me to be unable to delete jobs. I believe it has something to do with encoding the parentheses because the actual neo-cron.xml name attribute of the var element encodes the name like so:
serverscheduletask#$%^default#$%^MAKE CATALOGS (SITE CONTROL)
Note that this anomaly did not exist on ColdFusion 10, Update 10, but does exist on Update 13. I'm not sure which update broke it, but there you go.
You will have to copy the neo-cron.xml from C:\ColdFusion10\lib of one server to the other. After that, restart the server to make the changes effective. Log in to the CF Admin and check the functionality.
This should work.
Note: please take a backup of the existing neo-cron.xml before making the changes.