vCenter update via SDDC (Cloud Foundation) error - VMware

I have a VxRail cluster with SDDC (Cloud Foundation) version 4.4.1.1 installed. I am trying to apply the VMware Software Update 4.4.1.0, as it is shown in the available updates for my cluster, and it fails. Previously I was able to apply updates via the vCenter admin console (running on port 5480), and vCenter appears to be up to date (Version: 7.0.3, Build: 20395099), but SDDC Manager still fails the update. The error is strange, because it should at least say that vCenter is already updated; instead the status is COMPLETED_WITH_FAILURE even though the log itself says "VCenter already up to date".
The update activity is:
9/29/22, 12:35 PM
COMPLETED_WITH_FAILURE
9/29/22, 12:34 PM
Upgrade element resourceType: VCENTER resourceId: b74dd3f1-b683-472d-98b8-9c2ae98ecf24 status changed to COMPLETED_WITH_FAILURE
9/29/22, 12:34 PM
Upgrade failed : post upgrade checks failed.
9/29/22, 12:34 PM
Upgrade failed: VCenter version post-validation failed.
9/29/22, 12:34 PM
Post-validation of the VCenter for populating known hosts is successful
9/29/22, 12:34 PM
Upgrade element resourceType: VCENTER resourceId: b74dd3f1-b683-472d-98b8-9c2ae98ecf24 recorded stage VCENTER_UPGRADE_POST_VALIDATION
9/29/22, 12:34 PM
Upgrade succeeded : VCenter is already up to date
9/29/22, 12:34 PM
Upgrade element resourceType: VCENTER resourceId: b74dd3f1-b683-472d-98b8-9c2ae98ecf24 recorded stage VCENTER_UPGRADE_PRE_VALIDATION
9/29/22, 12:34 PM
Update VC begin
9/29/22, 12:34 PM
Upgrade element resourceType: VCENTER resourceId: b74dd3f1-b683-472d-98b8-9c2ae98ecf24 recorded all stages [ "VCENTER_UPGRADE_PRE_VALIDATION", "VCENTER_UPGRADE_UNSTAGE", "VCENTER_UPGRADE_SET_REPO", "VCENTER_UPGRADE_PRECHECK", "VCENTER_UPGRADE_STAGE", "VCENTER_UPGRADE_INSTALL", "VCENTER_UPGRADE_POST_VALIDATION", "VCENTER_UPGRADE_POST_INSTALL_VSAN_HCL_UPDATE" ]
The question is: how can I make this update complete successfully? I assume that once it is applied, I will be able to apply the VxRail bundle update.
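In the meantime, one way to double-check what version the appliance itself reports (versus what the SDDC post-validation expects) is to query the vCenter appliance REST API. A minimal sketch, assuming vCenter 7.x; the hostname and credentials below are placeholders:

# Open an API session against the vCenter appliance (hostname and credentials are placeholders)
SESSION=$(curl -sk -X POST -u 'administrator@vsphere.local:YourPassword' \
  https://vcsa.example.com/api/session | tr -d '"')

# Ask the appliance which version/build it is actually running,
# then compare it with the build SDDC Manager expects (20395099)
curl -sk -H "vmware-api-session-id: $SESSION" \
  https://vcsa.example.com/api/appliance/system/version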

Related

Jenkins suddenly started failing to provision agents in Amazon EKS

We are using the Kubernetes plugin to provision agents in EKS, and around 8:45 pm EST yesterday, with no apparent changes on our end (I'm the only admin, and I certainly wasn't doing anything then), we started getting issues with provisioning agents. I have rebooted the EKS node and the Jenkins master. I can confirm that kubectl works fine and lists 1 node running.
I'm suspecting something must have changed on the AWS side of things.
What's odd is that those ALPN errors don't show up anywhere else in our logs until just before this started happening. Googling around, I see people saying to ignore these "info" messages because the Java version doesn't support ALPN, but the fact that it's complaining about "HTTP/2" makes me wonder if Amazon changed something on their end to be HTTP/2 only?
I know this might seem too specific for a SO question, but if something did change with AWS that broke compatibility, I think this would be the right place.
From the Jenkins log at around 8:45:
INFO: Docker Container Watchdog check has been completed
Aug 29, 2019 8:42:05 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished DockerContainerWatchdog Asynchronous Periodic Work. 0 ms
Aug 29, 2019 8:45:04 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Excess workload after pending Kubernetes agents: 1
Aug 29, 2019 8:45:04 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Template for label eks: Kubernetes Pod Template
Aug 29, 2019 8:45:04 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 29, 2019 8:45:04 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning Kubernetes Pod Template from eks with 1 executors. Remaining excess workload: 0
Aug 29, 2019 8:45:14 PM hudson.slaves.NodeProvisioner$2 run
INFO: Kubernetes Pod Template provisioning successfully completed. We have now 3 computer(s)
Aug 29, 2019 8:45:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Created Pod: jenkins-eks-39hfp in namespace jenkins
Aug 29, 2019 8:45:14 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 29, 2019 8:45:14 PM io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1 onFailure
WARNING: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Aug 29, 2019 8:45:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
WARNING: Error in provisioning; agent=KubernetesSlave name: jenkins-eks-39hfp, template=PodTemplate{inheritFrom='', name='jenkins-eks', namespace='jenkins', slaveConnectTimeout=300, label='eks', nodeSelector='', nodeUsageMode=NORMAL, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], volumes=[HostPathVolume [mountPath=/var/run/docker.sock, hostPath=/var/run/docker.sock], EmptyDirVolume [mountPath=/tmp/build, memory=false]], containers=[ContainerTemplate{name='jnlp', image='infra-docker.artifactory.mycompany.io/jnlp-docker:latest', alwaysPullImage=true, workingDir='/home/jenkins/work', command='', args='-url http://jenkins.mycompany.io:8080 ${computer.jnlpmac} ${computer.name}', ttyEnabled=true, resourceRequestCpu='', resourceRequestMemory='', resourceLimitCpu='', resourceLimitMemory='', envVars=[KeyValueEnvVar [getValue()=/home/jenkins, getKey()=HOME]], livenessProbe=org.csanchez.jenkins.plugins.kubernetes.ContainerLivenessProbe@2043f440}], envVars=[KeyValueEnvVar [getValue()=/tmp/build, getKey()=BUILDDIR]], imagePullSecrets=[org.csanchez.jenkins.plugins.kubernetes.PodImagePullSecret@40ba07e2]}
io.fabric8.kubernetes.client.KubernetesClientException:
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:198)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Aug 29, 2019 8:45:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminating Kubernetes instance for agent jenkins-eks-39hfp
Aug 29, 2019 8:45:14 PM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (OkHttp Dispatcher/255634) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2c315338 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@2bddc643[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632)
at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:300)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:48)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:213)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Ran into this today as AWS just pushed the update for the net/http golang CVE for K8s versions 1.12.x. That patch apparently broke the version of the Kubernetes plugin we were on. Updating to the latest version of the plugin 1.18.3 resolved the issue.
https://issues.jenkins-ci.org/browse/JENKINS-59000?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel
Just realized my Kubernetes plugin had an update available. Applied that and it seems to be working fine now.
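If you would rather script the update than click through the UI, something along these lines should work (the Jenkins URL and API token are placeholders; jenkins-cli.jar is served by the controller itself at /jnlpJars/jenkins-cli.jar):

# Fetch the CLI jar from the controller, then update the Kubernetes plugin and restart
# (URL and API token are placeholders)
curl -O http://jenkins.mycompany.io:8080/jnlpJars/jenkins-cli.jar
java -jar jenkins-cli.jar -s http://jenkins.mycompany.io:8080/ \
  -auth admin:API_TOKEN install-plugin kubernetes -restart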
It's a bit difficult to tell without direct troubleshooting, but the log points to Jenkins (through the Java API library) not being able to talk to the kube-apiserver because the request is being denied (HTTP 403).
I would check whether you can still talk to the cluster with a previously working KUBECONFIG using standard kubectl.
I speculate that the reason for the changed behavior could be an automatic EKS upgrade of your minor version. For example, EKS recently released a patch (~08/30/19) to address CVE-2019-9512 and CVE-2019-9514.
PS. I don't think the issue is related to dropping the HTTP/2 connection.
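To make that kubectl check concrete, a minimal sketch; the kubeconfig path and service-account name below are placeholders, not known values from this setup:

# Confirm the credentials Jenkins uses can still reach the kube-apiserver
kubectl --kubeconfig /var/lib/jenkins/.kube/config get nodes

# Confirm the service account the plugin authenticates as may still create pods
kubectl auth can-i create pods -n jenkins \
  --as system:serviceaccount:jenkins:jenkins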
Just update to the latest version of the Kubernetes plugin (1.18.3) and restart Jenkins. This worked for me.

Artifactory: automation with Packer and AWS

I am looking for a way to automate Artifactory deployment on AWS with Packer.
I want a simple configuration:
AWS ALB + [ASG: just 1 EC2] Artifactory + EFS for blobs + AWS RDS PostgreSQL
I wrote the Terraform and the infrastructure is set up and working properly.
I build the Artifactory AMI with Packer easily (I build from the RPM installation).
In the AWS user data:
I update the blob path /var/opt/jfrog/artifactory/data to point to AWS EFS.
I would like to change the DB from Derby to PostgreSQL, following this URL: https://www.jfrog.com/confluence/display/RTF/PostgreSQL
I adjust $ARTIFACTORY_HOME/etc/db.properties and download the JDBC driver corresponding to my PostgreSQL version.
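For reference, a minimal user-data snippet along the lines of the JFrog PostgreSQL guide might look like this (the RDS endpoint, credentials, and JDBC driver version are placeholders, not the actual values from this setup):

# Point Artifactory at RDS PostgreSQL instead of the embedded Derby DB
# (endpoint, credentials and driver version are placeholders)
cat > /var/opt/jfrog/artifactory/etc/db.properties <<'EOF'
type=postgresql
driver=org.postgresql.Driver
url=jdbc:postgresql://my-artifactory-db.abc123.eu-west-1.rds.amazonaws.com:5432/artifactory
username=artifactory
password=CHANGE_ME
EOF

# Drop the matching JDBC driver into Tomcat's lib directory
curl -L -o /opt/jfrog/artifactory/tomcat/lib/postgresql-42.2.5.jar \
  https://jdbc.postgresql.org/download/postgresql-42.2.5.jar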
Everything is OK: Artifactory detects the new empty database and creates the objects... but Artifactory does not start. I have an error with master.key; here is my catalina.out:
Jul 26, 2018 2:00:17 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-nio-8081"]
Jul 26, 2018 2:00:17 PM org.apache.tomcat.util.net.NioSelectorPool getSharedSelector
INFO: Using a shared selector for servlet write/read
Jul 26, 2018 2:00:17 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-nio-8040"]
Jul 26, 2018 2:00:17 PM org.apache.tomcat.util.net.NioSelectorPool getSharedSelector
INFO: Using a shared selector for servlet write/read
Jul 26, 2018 2:00:17 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-nio-8019"]
Jul 26, 2018 2:00:17 PM org.apache.tomcat.util.net.NioSelectorPool getSharedSelector
INFO: Using a shared selector for servlet write/read
Jul 26, 2018 2:00:17 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service [Catalina]
Jul 26, 2018 2:00:17 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/8.5.23
Jul 26, 2018 2:00:17 PM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor [/opt/jfrog/artifactory/tomcat/conf/Catalina/localhost/access.xml]
Jul 26, 2018 2:00:17 PM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor [/opt/jfrog/artifactory/tomcat/conf/Catalina/localhost/artifactory.xml]
2018-07-26 14:00:18 [UNDEFINED] [INFO ] Fetched Artifactory [artifactory.home=null] from servlet context
2018-07-26 14:00:19 [UNDEFINED] [INFO ] Resolved Artifactory home by logger [artifactory.home=/var/opt/jfrog/artifactory].
14:00:19.438 [localhost-startStop-2] DEBUG org.artifactory.converter.VersionProviderImpl - Last Artifactory database version is: v610
14:00:19.460 [localhost-startStop-2] INFO org.artifactory.converter.ConvertersManagerImpl - Triggering PRE_INIT conversion, from v610 to v610
14:00:19.460 [localhost-startStop-2] INFO org.artifactory.converter.ConvertersManagerImpl - Finished PRE_INIT conversion, current version is: v610
2018-07-26 14:00:19 [ARTIFACTORY] [INFO ] master.key file currently missing - waiting for Access to create it. Reattempting to check master.key file existence in 1 second.
2018-07-26 14:00:20 [ARTIFACTORY] [INFO ] master.key file currently missing - waiting for Access to create it. Reattempting to check master.key file existence in 1 second.
Any idea is welcome :-)
Here is an example of a Terraform template to install JFrog Artifactory. I think you can use something similar to https://github.com/jfrog/JFrog-Cloud-Installers/blob/master/Terraform/userdata.sh for setting up the Artifactory configuration files.
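On the EFS side, the user-data step that remounts the data directory could look something like this (the file-system ID is a placeholder, and this assumes amazon-efs-utils is available on the AMI; the artifactory user/group comes from the RPM install):

# Mount EFS over the Artifactory data directory before the service starts
# (fs-12345678 is a placeholder file-system ID)
yum install -y amazon-efs-utils
mkdir -p /var/opt/jfrog/artifactory/data
echo 'fs-12345678:/ /var/opt/jfrog/artifactory/data efs _netdev,tls 0 0' >> /etc/fstab
mount -a
chown -R artifactory:artifactory /var/opt/jfrog/artifactory/data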

Docker logs driver gcplogs: How to format logs so that they're shown correctly in Stackdriver?

tldr:
- I'm writing logs from a python application that's running outside of Google Cloud
- I'm using gcplogs to import logs to Stackdriver
- Logs imported must be in the wrong format, because they're not displayed as they should be in Stackdriver logging
I'm having a hard time figuring out how to configure the gcplogs Docker logs driver so that logs appear as they should in Stackdriver logging. Here's what the logs currently look like:
It looks like Stackdriver isn't parsing these logs correctly. However, if the Docker image is run in GKE, the logs look correct:
Here Stackdriver has recognized that these logs were debug messages.
Since these two applications are using the same logger, I think the message that is logged from the application is in the right format, and that it must be the gcplogs configuration that's wrong or missing something. (See this repository for what the python code looks like)
Here's the content of /etc/docker/daemon.json on the machine that's running the Docker image:
{
  "log-driver": "gcplogs",
  "log-opts": {
    "gcp-project": "removed",
    "env": "host"
  }
}
The output of docker info | grep 'Logging Driver' is Logging Driver: gcplogs.
The output of docker version is:
Client:
Version: 18.03.0-ce
API version: 1.37
Go version: go1.9.2
Git commit: 0520e24
Built: Wed Mar 21 23:05:52 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
Server:
Engine:
Version: 18.03.0-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:14:54 2018
OS/Arch: linux/amd64
Experimental: false
Any tips on how to configure this so that the logs end up looking as they should in Stackdriver would be much appreciated.
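One thing that might help narrow it down is looking at how the entries actually arrive on the Stackdriver side, e.g. whether the message lands in textPayload with no severity set. A hedged sketch using the gcloud CLI; the log-name filter is an assumption about the driver's default log name:

# Pull a few raw entries and check whether payload and severity fields are populated
gcloud logging read 'logName:"gcplogs-docker-driver"' \
  --project=removed --limit=5 --format=json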

Jenkins GAE Deploy hangs on Getting current resource limits

I am deploying a Django project to GAE. When I deploy locally everything runs as smooth as silk; however, when I try to deploy from Jenkins it hangs on the following line:
12:42 PM Getting current resource limits.
Full Deploy Log
12:42 PM Application: <gaeproject>; version: dev (was: 1)
12:42 PM Host: appengine.google.com
12:42 PM Starting update of app: <gaeproject>, version: dev
12:42 PM Getting current resource limits.
Deploy Command
python $GAE_PATH/appcfg.py -A $GAE_INSTANCE -V $GAE_VERSION update .
Has anyone else encountered this issue before? What might be causing this hang?
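One thing worth ruling out is appcfg.py sitting at an interactive OAuth prompt that never shows up in the Jenkins console. A hedged alternative for CI is to deploy non-interactively with the gcloud tooling and a service account (the key path is a placeholder; project and version reuse the existing job variables):

# Authenticate non-interactively with a service-account key, then deploy
# (the key path is a placeholder)
gcloud auth activate-service-account --key-file=/var/lib/jenkins/gae-deploy-key.json
gcloud app deploy app.yaml --project="$GAE_INSTANCE" --version="$GAE_VERSION" --quiet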

Admin account constantly locked

I have tried multiple solutions from Stack and other sites and the outcome is the same. My admin account is constantly locked. I don't really know what more I can do. Here are some screenshots:
Neither of my admin accounts is shown as locked. While trying to log in as ADMIN I get a 'locked account' alert, and with the TWEANT account I get 'wrong password'.
I have tried wwv_flow_fnd_user_api.UNLOCK_ACCOUNT('ADMIN'); and running the apxchpwd.sql script. Nothing helped and I am out of ideas. Maybe there is something wrong with my installation, but this is what I see when running:
[oracle#OracleDatabase oracle-rest]$ java -jar ords.war install simple
Verify ORDS schema in Database Configuration apex with connection host: localhost port: 1521 service name: pdbwindow
Jul 24, 2017 11:59:42 AM oracle.dbtools.rt.config.setup.SchemaSetup install
INFO: Oracle REST Data Services schema version 3.0.11.180.12.34 is installed.
2017-07-24 11:59:42.947:INFO::main: Logging initialized @11856ms
Jul 24, 2017 11:59:44 AM
INFO: The document root is serving static resources located in: /home/oracle/oracle-rest/conf/ords/standalone/doc_root
2017-07-24 11:59:48.514:INFO:oejs.Server:main: jetty-9.2.z-SNAPSHOT
2017-07-24 11:59:49.354:INFO:/ords:main: INFO: Using configuration folder: /home/oracle/oracle-rest/conf/ords
2017-07-24 11:59:49.363:INFO:/ords:main: FINEST: |ApplicationContext [configurationFolder=/home/oracle/oracle-rest/conf/ords, services=Application Scope]|
Jul 24, 2017 11:59:49 AM
INFO: Validating pool: |apex||
Jul 24, 2017 11:59:49 AM
INFO: Pool: |apex|| is correctly configured
Jul 24, 2017 11:59:49 AM
INFO: Validating pool: |apex|al|
Jul 24, 2017 11:59:49 AM
INFO: Pool: |apex|al| is correctly configured
Jul 24, 2017 11:59:49 AM
INFO: Validating pool: |apex|pu|
Jul 24, 2017 11:59:50 AM
INFO: Pool: |apex|pu| is correctly configured
Jul 24, 2017 11:59:50 AM
INFO: Validating pool: |apex|rt|
Jul 24, 2017 11:59:50 AM
INFO: Pool: |apex|rt| is correctly configured
config.dir
2017-07-24 11:59:51.322:INFO:/ords:main: INFO: Oracle REST Data Services initialized|Oracle REST Data Services version : 3.0.11.180.12.34|Oracle REST Data Services server info: jetty/9.2.z-SNAPSHOT|
2017-07-24 11:59:51.338:INFO:oejsh.ContextHandler:main: Started o.e.j.s.ServletContextHandler@86ab0b2{/ords,null,AVAILABLE}
2017-07-24 11:59:51.339:INFO:oejsh.ContextHandler:main: Started o.e.j.s.h.ContextHandler@14767a6f{/,null,AVAILABLE}
2017-07-24 11:59:51.340:INFO:oejsh.ContextHandler:main: Started o.e.j.s.h.ContextHandler@6e6017e7{/i,null,AVAILABLE}
2017-07-24 11:59:51.388:INFO:oejs.ServerConnector:main: Started ServerConnector@5714f585{HTTP/1.1}{0.0.0.0:8080}
2017-07-24 11:59:51.394:INFO:oejs.Server:main: Started @20302ms
Jul 24, 2017 12:00:04 PM
INFO: Configuration properties for: |apex||
cache.caching=false
cache.directory=/tmp/apex/cache
cache.duration=days
cache.expiration=7
cache.maxEntries=500
cache.monitorInterval=60
cache.procedureNameList=
cache.type=lru
db.hostname=localhost
db.port=1521
db.servicename=pdbwindow
debug.debugger=false
debug.printDebugToScreen=false
error.keepErrorMessages=true
error.maxEntries=50
jdbc.DriverType=thin
jdbc.InactivityTimeout=1800
jdbc.InitialLimit=3
jdbc.MaxConnectionReuseCount=1000
jdbc.MaxLimit=10
jdbc.MaxStatementsLimit=10
jdbc.MinLimit=1
jdbc.statementTimeout=900
log.logging=false
log.maxEntries=50
misc.compress=
misc.defaultPage=apex
security.crypto.enc.password=******
security.crypto.mac.password=******
security.disableDefaultExclusionList=false
security.maxEntries=2000
security.requestValidationFunction=wwv_flow_epg_include_modules.authorize
security.validationFunctionType=plsql
db.password=******
db.username=APEX_PUBLIC_USER
Jul 24, 2017 12:00:04 PM
WARNING: *** jdbc.MaxLimit in configuration |apex|| is using a value of 10, this setting may not be sized adequately for a production environment ***
Jul 24, 2017 12:00:04 PM
WARNING: *** jdbc.InitialLimit in configuration |apex|| is using a value of 3, this setting may not be sized adequately for a production environment ***
Jul 24, 2017 12:00:04 PM oracle.ucp.common.UniversalConnectionPoolBase initInactiveConnectionTimeoutTimer
INFO: inactive connection timeout timer scheduled
This has happened to me, and I found the solution was to confirm and re-confirm that you're using the right apxchpwd.sql, because if you run the 5.1 script against a 5.0 installation you get no errors, but exactly the issue you've described.
Heck, copy the file from another installation and give it a try.
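For completeness, the unlock sequence run as SYS looks roughly like this, together with a check of which APEX version is actually installed. A sketch only: APEX_050000 is an assumption, so substitute the parsing schema that matches your installed version:

# Check the installed APEX version, then unlock the internal ADMIN account
# (APEX_050000 is a placeholder schema name - match it to your installed version)
sqlplus / as sysdba <<'EOF'
SELECT version_no FROM apex_release;
ALTER SESSION SET CURRENT_SCHEMA = APEX_050000;
BEGIN
  wwv_flow_fnd_user_api.unlock_account('ADMIN');
  COMMIT;
END;
/
EOF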