Can't use Composer environment after re-enabling service - google-cloud-platform

I have been trying to save on the cost of Airflow by disabling it when not in use. I discovered that if we disable the composer.googleapis.com service, Google does not charge for the service while it is disabled, although it continues to charge for other resources that are still active. Unfortunately, if the service is disabled for more than an hour or so, it is not usable after re-enabling it. After the service has been disabled for an extended period of time, the Composer Environment Details page shows
An error occurred with retrieving the last operation on this environment
and
This environment cannot be edited due to the errors that occurred during environment creation/update. Please investigate the logs to determine the cause, or create a new environment.
And gcloud composer environments describe shows state: ERROR
The one error that I did see in the logs was due to a duplicate key when the airflow_monitoring DAG was rescheduled after a little over an hour. Therefore, I created a new Composer environment, disabled all DAGs, disabled the Composer service, waited a while, then enabled it again. The environment was once again in an error state.
The Cloud Composer documentation states:
If you disable the Cloud Composer API, environments become unusable within an hour of service deactivation unless you re-enable the API. If you re-enable the API, you are billed for the service usage that occurs while the Cloud Composer service is deactivating.
Maybe this is poorly worded, but to me it sounds like the environment becomes unusable within an hour if you disable the API, but becomes usable again if you re-enable it at any later time. I am wondering if it really means that if you disable it, you must re-enable it within one hour or it will become permanently unusable.
Is there a way to disable the composer.googleapis.com service for longer than an hour and then get it working again after the service has been re-enabled? Is there something I can restart, or some way to clear the error state? Is there more I should do before disabling it?
I am using composer-1.10.4-airflow-1.10.6 with Python 3.
Thanks.

No, there is no way to disable the composer.googleapis.com service for more than an hour and then have environments be functional after re-enablement.
GCP services are not meant to be enabled/disabled on the fly in this manner, and disablement of a service is meant to be performed with the intention of disabling it for the long term. Keeping a service disabled for long enough means Google-managed components created for the service (specifically for your project) will be decommissioned, and in Composer's case, this will render your environments permanently unusable.
The error state in the environment cannot be cleared. If you want to save on costs, you should delete Composer environments as opposed to deactivating the service entirely. The "service" is not cluster-like and isn't meant to be toggled on and off.
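As a rough sketch of the delete-instead-of-disable approach: the function names below, and the environment name and location in the usage comment, are placeholders and not taken from the question. As far as I know, deleting an environment leaves its Cloud Storage bucket (where the DAGs live) in place, but verify that before relying on it.

```shell
# Minimal sketch; my-env and us-central1 are placeholder values.
GCLOUD="${GCLOUD:-gcloud}"   # set GCLOUD="echo gcloud" to preview the commands

# Print the environment state (for example RUNNING or ERROR).
composer_state() {
  $GCLOUD composer environments describe "$1" --location "$2" --format="value(state)"
}

# Delete the environment so that it stops accruing charges.
composer_delete() {
  $GCLOUD composer environments delete "$1" --location "$2" --quiet
}

# Usage:
#   composer_state my-env us-central1
#   composer_delete my-env us-central1
```

The environment has to be recreated when it is needed again, so this only pays off if the idle periods are long compared with the time it takes to create an environment.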

Related

When I deploy LAMP Stack on Google Cloud Platform the deployment shows these warnings. How to fix this?

When I deploy LAMP Stack on Google Cloud Platform, these warnings are shown after the deployment finishes.
This deployment has resources from the Runtime Configurator service,
which is in Beta. There is no planned date for moving this feature
into General Availability (GA). Examples of runtimeconfig types used:
runtimeconfig.v1beta1.config, runtimeconfig.v1beta1.waiter
How can I avoid this?
As the warning message states, the feature you are using is not generally available yet, and there is no planned date for that. To confirm this, I created the same deployment in my test environment and got the same warning after it was deployed.
The warning is intended behavior. It notifies you that the product you are using, Runtime Configurator, is still a pre-GA offering, so you should not rely on it completely. If you want to keep using deployments that depend on Runtime Configurator, you can ignore the warning.

Google Cloud Composer | Airflow - Specifying or closing maintenance windows

I'm using Airflow with Cloud Composer. I have a DAG scheduled to run every hour. However, I read this in the Composer documentation: "Maintenance operations might impact the execution of your DAGs and Airflow tasks"
But the cron job I created should run for every hour. I don't want any downtime due to maintenance windows.
I'm worried about any problems with the selected maintenance windows. Can you give me information about this? Do I have an option to close the maintenance window?
Thanks in advance.
As you have mentioned, running DAGs during a maintenance window might cause scheduling or execution issues. This is also mentioned in the Cloud Composer documentation on troubleshooting scheduler issues:
You can define specific maintenance windows for your environment.
During these time periods, maintenance events for Cloud SQL and GKE
take place.
Avoid scheduling DAG runs during maintenance windows because this
might cause scheduling or execution issues.
Also, according to the Cloud Composer Issue Tracker, removing the maintenance window will not be offered as a feature:
Information from the Engineering team is that this feature is not in
the road map. Hence, We're not going to have this feature in the
future to remove the maintenance window once it's applied to the
composer environment.
Reason: If Environment don't have any maintenance windows then
maintenance operations are happening at random times and having
maintenance window allows to have maintenance operations in
predictable slots.
Unfortunately, since your DAG runs every hour and therefore cannot avoid the window, your only option is to deal with the scheduling or execution issues if you ever encounter them.
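Since the window cannot be removed, the remaining control is when it occurs. A minimal sketch of setting it with gcloud follows; the function name and all values in the usage comment are placeholders, and whether these flags are available (or need the beta track) depends on your Composer and gcloud versions. As I recall, the documentation requires the windows to add up to at least 12 hours per week, so check that before picking values.

```shell
# Minimal sketch; environment name, location and times are placeholders.
GCLOUD="${GCLOUD:-gcloud}"   # set GCLOUD="echo gcloud" to preview the command

# Arguments: environment, location, window start, window end, recurrence (RRULE).
composer_set_window() {
  $GCLOUD composer environments update "$1" --location "$2" \
    --maintenance-window-start="$3" \
    --maintenance-window-end="$4" \
    --maintenance-window-recurrence="$5"
}

# Usage (a 12-hour window on Saturdays and Sundays, times in UTC):
#   composer_set_window my-env us-central1 \
#     2021-01-01T01:00:00Z 2021-01-01T13:00:00Z 'FREQ=WEEKLY;BYDAY=SA,SU'
```

Only the time of day and the recurrence rule matter for the schedule; the date part of the start and end values just anchors the first window.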

What's the gcloud equivalent of appcfg rollback?

The GCP command appcfg has been deprecated. It used to provide appcfg rollback for use after a failed deployment.
What is its equivalent for gcloud (the new command)? I can't find it in Google GCP documentation.
More context:
Rolling back in appcfg was not meant for managing the traffic and going back to the previous version. It was used to remove the lock on your deploy.
If you had an unsuccessful deployment, you were not able to deploy any more. appcfg rollback was used to remove that lock and make it available for you to deploy again.
I think there is no direct equivalent of appcfg rollback. However, I would highly recommend considering the traffic splitting option.
This will let you redirect the traffic from one of your versions to another, even between old versions of your service.
Let's imagine this:
You have version 1 of your service and it works just fine.
A couple of weeks later you decide to deploy a new version: version 2.
However, the deploy fails and your app is completely down. You are losing users and money. Everything is on fire.
You can easily switch the traffic to the trusty version 1 by redirecting 100% of the traffic to it.
Version 2 is out of the game until you deploy a new version.
The advantage of this is that you don't have to wait until a rollback is done. The traffic is redirected to the old version right away. Additionally, you can do this from the CLI with the gcloud app services set-traffic command.
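The scenario above could look roughly like this on the command line. The function names are made up for illustration, and the service name (default) and version ID (v1) in the usage comment are placeholders for your own.

```shell
# Minimal sketch; "default" and "v1" are placeholder service and version IDs.
GCLOUD="${GCLOUD:-gcloud}"   # set GCLOUD="echo gcloud" to preview the commands

# List the versions of a service to find the last known-good one.
app_list_versions() {
  $GCLOUD app versions list --service="$1"
}

# Send 100% of the traffic for a service to a single version.
app_route_all_to() {
  $GCLOUD app services set-traffic "$1" --splits="$2=1" --quiet
}

# Usage:
#   app_list_versions default
#   app_route_all_to default v1
```

Note that this changes where traffic goes; it is not a replacement for the lock-clearing behavior that appcfg rollback had.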
Hope this is helpful!

Google Cloud service stopped and never restarting

I have been using the Google Cloud speech recognition service for some time, through a python application.
After accidentally copying my Google Cloud JSON key file to a shared GitHub location (I was doing a backup), I got a warning from Google Cloud that I was violating the rules, because the JSON key is private. I promptly removed the file, but I still got an email saying that the resources for my project "santo1" had been suspended, citing "cryptocurrency mining", which I know nothing about.
I appealed and the appeal was accepted promptly, with a reply saying that my resources for santo1 had been reinstated.
Unfortunately, the speech recognition still didn't work.
Launching it from python, it records from the microphone but no answer from the service - and no error messages at all.
Then I attempted the following:
regenerate API
create a new json
create a new project with its own json under my same google account
as suggested by the Google Cloud chat operator, I manually clicked play to the VM resource that appeared stopped
create a new gmail account, with another new project, setup with billing and everything (also reconfigured through "gcloud init")
None of these attempts worked.
I need assistance on this, as the chat operator didn't seem capable of telling me more.
Thank you in advance
Best regards
I would recommend contacting GCP support for this case, as your cloud service could still be in suspended status even though your access is OK.
Apparently, the access key was stolen and used by hackers to mine cryptocurrency with your GCP account, which is why your service account was banned.
If it's a testing account/project, you should consider creating a new project rather than continuing with this one, since the hacker could have created other services that you may not notice until it is too late.
In the worst case, if it's your PROD service, you had better review the bill and transaction report thoroughly.
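If you do keep the project, it is worth confirming that the leaked key no longer exists. A minimal sketch, assuming the leaked file was a service account key; the function names, the service account email and the key ID in the usage comment are all placeholders.

```shell
# Minimal sketch; the email and key ID below are placeholders.
GCLOUD="${GCLOUD:-gcloud}"   # set GCLOUD="echo gcloud" to preview the commands

# List the keys that currently exist for a service account.
sa_list_keys() {
  $GCLOUD iam service-accounts keys list --iam-account="$1"
}

# Delete one key by ID so that it can no longer be used.
sa_delete_key() {
  $GCLOUD iam service-accounts keys delete "$1" --iam-account="$2" --quiet
}

# Usage:
#   sa_list_keys my-sa@my-project.iam.gserviceaccount.com
#   sa_delete_key 0123456789abcdef my-sa@my-project.iam.gserviceaccount.com
```

Removing the file from GitHub does not invalidate the key; only deleting the key itself does.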

Google CloudSQL instance non-responsive, how to get support?

When it comes to databases, we want to leave managing them to the pros, which is why we went for a managed solution in the form of a CloudSQL 2nd gen db instance. Today the instance stopped responding. I clicked restart, and it has been restarting for hours and is still not responding. I have also tried cloning the instance, and the clone is not responding either.
I don't know what else to do, our db is crippled and the service that uses it is down. These things happen, fine.
The thing that shocked me is that I am unable to contact anybody to resolve this problem. I understand that I can pay for a support subscription, at $150 per month and up. This confuses me, though: the GCloud console UI is not responding, so am I incorrect in assuming I should not have to pay for support for the core product to at least work?
This leads me to my main question, if I want to continue using Google Cloud products in production, do I NEED a support subscription?
The same happened to us yesterday. The Cloud SQL instance did not respond for an hour and a half (from 18:00 to 19:30 GMT+1).
We could do absolutely nothing. We tried to back up the instance to a bucket, but the command returned an error saying that another operation was in progress.
We are a small startup and we can't pay for a support plan, but when we signed up for the Cloud SQL service we thought that this kind of situation wouldn't happen.
Honestly, after this I believe that Cloud SQL is not a good option unless you also purchase a gold or platinum support plan. It is frustrating that something fails and you cannot do anything, or even report the error.
Try the gcloud command line tool in your active shell instead of the console UI. Try exporting the data from your SQL instance to a Google Cloud Storage bucket by using this command:
gcloud sql instances \
export <sql-instance-name> \
gs://<bucket-name>/backup.sql
By default, the SQL instance's service account has read and write access to the Google Cloud Storage bucket.
Create a new SQL instance using this command:
gcloud sql instances \
create <new-sql-instance-name>
Now, add the data to the new SQL instance using this command:
gcloud sql instances \
import <new-sql-instance-name> \
gs://<bucket-name>/backup.sql
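In recent gcloud releases, as far as I know, the export and import commands live under gcloud sql export sql and gcloud sql import sql. The sketch below uses those forms and adds a check for operations still in progress, which is the error mentioned above. The function names and the values in the usage comment are placeholders; depending on the database engine you may also need --database, and if the export fails with a permissions error, check that the instance's service account can write to the bucket.

```shell
# Minimal sketch; instance and bucket names are placeholders.
GCLOUD="${GCLOUD:-gcloud}"   # set GCLOUD="echo gcloud" to preview the commands

# Show recent operations on an instance, to see whether one is still running.
sql_recent_ops() {
  $GCLOUD sql operations list --instance="$1" --limit=10
}

# Export the instance to a SQL dump file in a Cloud Storage bucket.
sql_export() {
  $GCLOUD sql export sql "$1" "$2"
}

# Import a SQL dump file from a Cloud Storage bucket into an instance.
sql_import() {
  $GCLOUD sql import sql "$1" "$2"
}

# Usage:
#   sql_recent_ops my-instance
#   sql_export my-instance gs://my-bucket/backup.sql
#   sql_import my-new-instance gs://my-bucket/backup.sql
```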
You can get free or premium support here. You do not need a subscription to get help; it all depends on your needs and the level of urgency you estimate for eventual future problems.
If you have a recent backup of your database, you may consider re-creating it in another instance, from there.
You may consider posting your issue in the Google Cloud SQL Product Issue Tracker. This way, it will enjoy much better visibility from developers and Google support, without attracting any extra costs.