WSO2 Governance Registry: How to avoid registry table growth?

We are using WSO2 Carbon / WSO2 Governance Registry for one of our products and noticed abnormal database growth. It turned out that the constant growth is related to REGISTRY-3919.
This issue is marked as resolved with the resolution "Postponed", but that is not a resolution for production systems. Are any other WSO2 product users facing this problem, and is there a proper resolution for the constant database growth?

As stated in the comments, this is a design issue. We ended up running pt-archiver from a cron job
to clean orphaned registry properties from our Percona cluster:
pt-archiver \
--run-time 4h \
--progress 10000 \
--limit 2000 \
--no-check-charset \
--source h=<hostname>,D=<database>,t=REG_RESOURCE_PROPERTY \
--purge \
--where '(REG_VERSION, REG_TENANT_ID) NOT IN (SELECT REG_VERSION, REG_TENANT_ID FROM REG_RESOURCE)'
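To schedule it, a crontab entry could look like the following (the schedule, wrapper script path, and log path are illustrative assumptions; the pt-archiver invocation above goes inside the wrapper script, since cron expects the command on a single line):
# Run the purge every Sunday at 02:00 and keep a log of the output.
0 2 * * 0 /usr/local/bin/purge-registry-properties.sh >> /var/log/pt-archiver-registry.log 2>&1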

Related

How to pause a Google Cloud Composer Environment?

We spun up a Google Cloud Composer environment, but we only need it for testing purposes. Is there a way to pause the environment and use it only when needed?
I am unable to find a way to do it.
Please suggest any solution to pause or disable it, rather than deleting it.
Thanks!
I tried to find a way to disable/pause the environment but could not find one.
You can't do that, but if you are using Cloud Composer 2, it runs on a GKE cluster in Autopilot mode.
Autopilot mode is optimized when there are no DAG executions in the cluster.
If the environment is used for testing purposes, I recommend using the small environment size and a cheap, minimal configuration for the workers and web server (CPU, memory and storage), for example:
gcloud composer environments create example-environment \
--location us-central1 \
--image-version composer-2.0.31-airflow-2.2.5 \
--environment-size small \
--scheduler-count 1 \
--scheduler-cpu 0.5 \
--scheduler-memory 2.5 \
--scheduler-storage 2 \
--web-server-cpu 1 \
--web-server-memory 2.5 \
--web-server-storage 2 \
--worker-cpu 1 \
--worker-memory 2 \
--worker-storage 2 \
--min-workers 1 \
--max-workers 2
Check the documentation for the best sizing in your case.

Dataproc Serverless - how to set javax.net.ssl.trustStore property to fix java.security.cert.CertPathValidatorException

Trying to use Google Cloud Dataproc Serverless with the spark.jars.repositories option:
gcloud beta dataproc batches submit pyspark sample.py --project=$GCP_PROJECT --region=$MY_REGION --properties \
spark.jars.repositories='https://my.repo.com:443/artifactory/my-maven-prod-group',\
spark.jars.packages='com.spark.mypackage:my-module-jar',spark.dataproc.driverEnv.javax.net.ssl.trustStore=.,\
spark.driver.extraJavaOptions='-Djavax.net.ssl.trustStore=. -Djavax.net.debug=true' \
--files=my-ca-bundle.crt
This gives the following exception:
javax.net.ssl.SSLHandshakeException: java.security.cert.CertPathValidatorException
I tried to set the javax.net.ssl.trustStore property using spark.dataproc.driverEnv / spark.driver.extraJavaOptions, but it's not working.
Is it possible to fix this issue by setting the right config properties and values,
or
is a custom image with pre-installed certificates the ONLY solution?
You need a Java trust store with your certificate imported. Then submit the batch with:
--files=my-trust-store.jks \
--properties spark.driver.extraJavaOptions='-Djavax.net.ssl.trustStore=./my-trust-store.jks',spark.executor.extraJavaOptions='-Djavax.net.ssl.trustStore=./my-trust-store.jks'
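For completeness, a hedged sketch of how such a trust store could be created with keytool before submitting the batch (the alias, store password, and file names are assumptions):
# Import the CA bundle as a trusted certificate into a new JKS trust store.
# If the bundle contains several certificates, import each one under its own alias.
keytool -importcert \
  -alias my-ca \
  -file my-ca-bundle.crt \
  -keystore my-trust-store.jks \
  -storepass changeit \
  -noprompt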

How to create a connection in Airflow of an external provider type (like google-cloud-platform) with the Airflow REST API

I'm trying to automate the creation of a connection in Airflow from a GitHub Action, but since it is an external provider type, the payload that needs to be sent to the Airflow REST API doesn't work, and I didn't find any documentation on how to do it.
Here is the payload I'm trying to send:
PAYLOAD = {
    "connection_id": CONNECTOR,
    "conn_type": "google_cloud_platform",
    "extra": json.dumps({
        "google_cloud_platform": {
            "keyfile_dict": open(CONNECTOR_SERVICE_ACCOUNT_FILE, "r").read(),
            "num_retries": 2,
        }
    })
}
This follows the Airflow documentation here
and the information I found on the "create connection" page of the Airflow UI:
(screenshot: Airflow UI create connection page)
I receive no error (HTTP 200) and the connection is created, but it doesn't have the settings I tried to configure.
I confirm the creation works in the UI.
Does anyone have a solution, or a document that describes the exact payload I need to send to the Airflow REST API? Or maybe I'm missing something.
Airflow version: 2.2.3+composer
Cloud Composer version (GCP): 2.0.3
GitHub runner version: 2.288.1
Language: Python
Thanks, and feel free to contact me for further questions.
Bye
#vdolez was right, it's kind of a pain to format the payload into the exact format the Airflow REST API wants. It's something like this:
"{\"extra__google_cloud_platform__key_path\": \"\",
\"extra__google_cloud_platform__key_secret_name\": \"\",
\"extra__google_cloud_platform__keyfile_dict\": \"{}\",
\"extra__google_cloud_platform__num_retries\": 5,
\"extra__google_cloud_platform__project\": \"\",
\"extra__google_cloud_platform__scope\": \"\"}"
And when you need to nest a dictionary inside some of these fields, it's not worth the time and effort. But in case someone wants to know: you have to escape every special character.
I changed my workflow to notify the relevant users to create the connection manually after my pipeline succeeds.
I will try to contact Airflow/Cloud Composer support to see if we can get a feature for better formatting.
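For reference, a sketch of what the REST call could look like with the escaped extra string (the URL, credentials and authentication mechanism are placeholders/assumptions; on Cloud Composer the Airflow API is usually reached through IAM rather than basic auth):
# Hypothetical endpoint and credentials; adjust the auth to your deployment.
curl -X POST "https://<airflow-webserver>/api/v1/connections" \
  -H "Content-Type: application/json" \
  --user "<username>:<password>" \
  -d '{
    "connection_id": "my_gcp_connection",
    "conn_type": "google_cloud_platform",
    "extra": "{\"extra__google_cloud_platform__keyfile_dict\": \"{}\", \"extra__google_cloud_platform__num_retries\": 5}"
  }'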
You might be running into encoding/decoding issues while sending data over the web.
Since you're using Composer, it might be a good idea to use the Composer CLI to create the connection.
Here's how to run Airflow commands in Composer:
gcloud composer environments run ENVIRONMENT_NAME \
--location LOCATION \
SUBCOMMAND \
-- SUBCOMMAND_ARGUMENTS
Here's how to create a connection with the native Airflow commands:
airflow connections add 'my_prod_db' \
--conn-type 'my-conn-type' \
--conn-login 'login' \
--conn-password 'password' \
--conn-host 'host' \
--conn-port 'port' \
--conn-schema 'schema' \
...
Combining the two, you'll get something like:
gcloud composer environments run ENVIRONMENT_NAME \
--location LOCATION \
connections \
-- add 'my_prod_db' \
--conn-type 'my-conn-type' \
--conn-login 'login' \
--conn-password 'password' \
--conn-host 'host' \
--conn-port 'port' \
--conn-schema 'schema' \
...
You could run this in a Docker image where gcloud is already installed.
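For the google_cloud_platform type in particular, the provider-specific settings can be passed as a JSON string via --conn-extra. A hedged sketch (the connection id and the extra keys are assumptions based on the format shown above):
gcloud composer environments run ENVIRONMENT_NAME \
  --location LOCATION \
  connections \
  -- add 'my_gcp_conn' \
  --conn-type 'google_cloud_platform' \
  --conn-extra '{"extra__google_cloud_platform__num_retries": 2, "extra__google_cloud_platform__project": "my-project"}'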

Terraform 0.9.6 remote config outdated

I have been trying to update some of my Terraform scripts from version 0.6.13 to 0.9.6. In my scripts I previously had:
terraform remote config -backend=s3 \
-backend-config="bucket=my_bucker" \
-backend-config="access_key=my_access_key" \
-backend-config="secret_key=my_secret" \
-backend-config="region=my_region" \
-backend-config="key=my_state_key"
and then
terraform/terraform remote pull
which pulled the remote state from AWS. Upon running terraform apply, it would show me exactly which resources needed to be updated/created based on the remote tfstate stored in an S3 bucket.
The issue I'm now facing is that the remote pull and remote config commands are outdated and don't work anymore.
I tried to follow the instructions at https://www.terraform.io/docs/backends/types/remote.html,
but they were not very helpful.
From what I understand, I would have to do an init first with a partial configuration, which presumably would automatically pull the remote state, as follows:
terraform init -var-file="terraform.tfvars" \
-backend=true \
-backend-config="bucket=my_bucker" \
-backend-config="access_key=my_access_key" \
-backend-config="secret_key=my_secret" \
-backend-config="region=my_region" \
-backend-config="key=my_state_key"
However, it doesn't really pull the remote state as it did before.
Would anyone be able to point me in the right direction?
You don't need terraform remote pull any more. By default, Terraform will automatically refresh the state before a plan or apply, based on the refresh flag, which defaults to true.
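If you still want a local copy of the remote state (what terraform remote pull used to provide), a rough modern equivalent, assuming the backend has already been configured and initialised (see the minimal backend block below), is:
# init reads the backend configuration; state pull dumps the current remote state to stdout.
terraform init
terraform state pull > remote.tfstate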
Apparently I had to add a minimal backend configuration such as
terraform {
  backend "s3" {
  }
}
in my main.tf file for it to work

How to restart webserver in Cloud Composer

We recently ran into a known issue on Airflow:
Airflow "This DAG isn't available in the webserver DagBag object"
As a temporary solution we restarted the whole environment by changing configurations, but this is not an efficient method.
The best workaround, we think, is to restart the webservers on Cloud Composer, but we didn't find any command to restart the webserver. Is this possible?
Thanks!
For those who wander in and find this thread: for Composer versions >= 1.13.1 there is a preview of a web server restart.
Only certain types of updates cause the webserver container to be restarted, such as adding, removing, or upgrading one of the PyPI packages, or changing an Airflow setting.
You can, for example, do:
# Set some arbitrary Airflow config value to force a webserver rebuild.
gcloud composer environments update ${ENVIRONMENT_NAME} \
--location=${ENV_LOCATION} \
--update-airflow-configs=dummy=true
# Remove the previously set config value.
gcloud composer environments update ${ENVIRONMENT_NAME} \
--location=${ENV_LOCATION} \
--remove-airflow-configs=dummy
From Google Cloud Docs:
gcloud beta composer environments restart-web-server ENVIRONMENT_NAME --location=LOCATION
I finally found an alternative solution!
Based on this document:
https://cloud.google.com/composer/docs/how-to/managing/deploy-webserver
we can run a self-managed Airflow webserver on Kubernetes (yes, you can throw away the built-in webserver). Then we can kill the webserver pods to force a restart =)
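With such a self-managed deployment, a restart is just a matter of deleting the webserver pods and letting the Deployment recreate them. A sketch (the namespace and label selector are assumptions about how the webserver was deployed):
# Find the webserver pods (namespace and labels depend on your deployment).
kubectl get pods -n airflow -l app=airflow-webserver
# Delete them; the Deployment controller recreates the pods, which restarts the webserver.
kubectl delete pod -n airflow -l app=airflow-webserver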
DAGs can also be retrieved from the console: we can list all DAGs present, and there are other commands too.
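For example, listing the DAGs through the Composer CLI wrapper (assuming an Airflow 2 environment):
gcloud composer environments run ENVIRONMENT_NAME \
  --location LOCATION \
  dags list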