Self-hosted build agent takes too long to run a job successfully

Our self-hosted build agent sometimes fails to publish the test results to Azure DevOps (MS Cloud, not on-premises).
We configured the agent according to the documentation at https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/proxy?view=azure-devops&tabs=windows by passing the proxy settings to config.cmd.
We have the following task definition:
steps:
- task: PublishTestResults@2
  displayName: 'Publish Test Results **/junit*.xml'
  inputs:
    testResultsFiles: '**/junit*.xml'
    testRunTitle: 'Jest unit tests'
  continueOnError: true
  condition: succeededOrFailed()
  timeoutInMinutes: 2
In most cases the build agent takes about 15 minutes to run the "Publish Test Results **/junit*.xml" task (in an Azure DevOps pipeline).
Since it sometimes takes only 6 seconds, I think something is going wrong in this task. It really should not take that long.
The task returns the following warning:
##[warning]Failed to upload file junit.xml to Blob Transfer exception with errorcode Unknown, exception message Microsoft.Azure.Storage.DataMovement.TransferException: The transfer failed. ---> Microsoft.Azure.Storage.StorageException: An error occurred while sending the request. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The remote server returned an error: (407) Proxy Authentication Required.
The inner exception messages were in German in the original log and are translated here.
Does anybody know where the error/warning comes from? I am grateful for any help, and sorry if this question is not written perfectly; I promise I will do better next time. I am learning!
For those who would suggest using environment variables for the proxy configuration: unfortunately this is not an option.
Thanks in advance!

You are getting an HTTP 407 error, which is the Proxy Authentication Required client error status response code.
Please check: https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/proxy?view=azure-devops&tabs=windows

Problem solved!
The problem was that some URLs needed to be whitelisted; the firewall had been blocking them.
Works fine now.
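The thread does not say which URLs had to be allowed. As a rough sketch of how one might compare the hosts an agent needs against a firewall allowlist, the snippet below uses wildcard matching. Both lists are assumptions made for illustration: the patterns are hypothetical, and `example.blob.core.windows.net` is a made-up storage host standing in for whatever blob endpoint the upload in the warning targets.

```python
from fnmatch import fnmatch

# Hypothetical patterns the firewall currently allows (assumption, not from the thread).
ALLOWED_PATTERNS = ["dev.azure.com", "*.dev.azure.com", "*.visualstudio.com"]

# Hosts the agent is assumed to contact; the blob host name is invented for illustration.
REQUIRED_HOSTS = ["dev.azure.com", "example.blob.core.windows.net"]


def blocked_hosts(required, allowed_patterns):
    """Return the required hosts that match none of the allowed patterns."""
    return [
        host
        for host in required
        if not any(fnmatch(host.lower(), pattern.lower()) for pattern in allowed_patterns)
    ]


if __name__ == "__main__":
    for host in blocked_hosts(REQUIRED_HOSTS, ALLOWED_PATTERNS):
        print(f"not allowlisted: {host}")
```

A host reported here is only a candidate; the firewall or proxy log remains the authoritative record of what was actually blocked.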

Google Cloud Django App Deployment - Permission Issues

I'm following this tutorial, yet I get stuck at the very end when I'm trying to deploy the app on the App Engine.
I get the following error message:
Updating service [default] (this may take several minutes)...failed.
ERROR: (gcloud.app.deploy) Error Response: [13] Flex operation projects/responder-289707/regions/europe-west6/operations/a0e5f3f4-29a7-49d8-98b5-4a52b7bf04ca error [INTERNAL]: An internal error occurred while processing task /app-engine-flex/insert_flex_deployment/flex_create_resources>2020-09-21T20:32:48.366Z12808.hy.0: Deployment Manager operation responder-289707/operation-1600720369987-5afd8c109adf5-6a4ad9a9-e71b9336 errors: [code: "RESOURCE_ERROR"
location: "/deployments/aef-default-20200921t223056/resources/aef-default-20200921t223056"
message: "{\"ResourceType\":\"compute.beta.regionAutoscaler\",\"ResourceErrorCode\":\"403\",\"ResourceErrorMessage\":{\"code\":403,\"message\":\"The caller does not have permission\",\"status\":\"PERMISSION_DENIED\",\"statusMessage\":\"Forbidden\",\"requestPath\":\"https://compute.googleapis.com/compute/beta/projects/responder-289707/regions/europe-west6/autoscalers\",\"httpMethod\":\"POST\"}}"
I don't really understand why, though. I have authenticated gcloud and made sure my account has App Engine Admin/Deployment rights. I have everything in place.
Any hints would be much appreciated.
You apparently do not have the rights to create autoscaling resources. This could be because you are on a free account, or because deploying an autoscaling service needs rights other than App Engine Admin/Deployment.
Since you're following the tutorial, you could define a fixed resource amount; this is safer for your wallet as well.
app.yaml
# add this
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 2
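The deployment error in the question carries a JSON document inside its `message` field, and decoding it makes it easier to see which API call was refused. The following is a minimal sketch, assuming the escaped string has been copied out of the error and unescaped by hand; the function name `summarize` is hypothetical.

```python
import json

# The `message` field from the error in the question, with the outer quotes
# and backslash escaping removed by hand.
RAW_MESSAGE = (
    '{"ResourceType":"compute.beta.regionAutoscaler","ResourceErrorCode":"403",'
    '"ResourceErrorMessage":{"code":403,"message":"The caller does not have permission",'
    '"status":"PERMISSION_DENIED","statusMessage":"Forbidden",'
    '"requestPath":"https://compute.googleapis.com/compute/beta/projects/'
    'responder-289707/regions/europe-west6/autoscalers",'
    '"httpMethod":"POST"}}'
)


def summarize(raw):
    """Pull out the fields that identify which API call was refused."""
    outer = json.loads(raw)
    inner = outer["ResourceErrorMessage"]
    return {
        "resource_type": outer["ResourceType"],
        "status": inner["status"],
        "method": inner["httpMethod"],
        "path": inner["requestPath"],
    }


if __name__ == "__main__":
    for key, value in summarize(RAW_MESSAGE).items():
        print(f"{key}: {value}")
```

Read this way, the refused call is a POST that creates a regional autoscaler, which fits the diagnosis that the missing permission concerns autoscaling rather than App Engine itself.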

Random “upstream connect error or disconnect/reset before headers” between services with Istio 1.3

This problem seems to happen randomly, and between different services.
For example, we have a service A which needs to talk to service B, and sometimes we get this error, but after a while it goes away. The error doesn't happen very often.
When it happens, we see service A logging the “upstream connect error” message, but nothing in service B, so we think it might be related to the sidecars.
One thing we noticed is that in service B, we get a lot of these error messages in the istio-proxy container:
[src/istio/mixerclient/report_batch.cc:109] Mixer Report failed with: UNAVAILABLE:upstream connect error or disconnect/reset before headers. reset reason: connection failure
According to the documentation, when a request comes in, Envoy asks Mixer whether everything is in order (authorization and other things), and if Mixer doesn’t reply, the request does not succeed. That is why an option called policyCheckFailOpen exists.
We have that set to false, which I guess is a sane default: we don’t want the request to go through if Mixer cannot be reached. But why can’t it be reached?
disablePolicyChecks: true
policyCheckFailOpen: false
controlPlaneSecurityEnabled: false
NOTE: istio-policy is running with the istio-proxy sidecar. Is that correct?
We don’t see that error in some other service which can also fail.
Another log that I can see a lot, and this one happens in all the services not running as root with fsGroup defined in the YAML files is:
watchFileEvents: "/etc/certs": MODIFY|ATTRIB
watchFileEvents: "/etc/certs/..2020_02_10_09_41_46.891624651": MODIFY|ATTRIB
watchFileEvents: notifying
One of the leads I'm chasing is the default circuitBreakers values. Could that be related to this?
Thanks
The error you are seeing is caused by a failure to establish a connection to istio-policy.
Based on this GitHub issue:
Community members added two answers there which could help you with your issue.
If mTLS is enabled globally, make sure you set controlPlaneSecurityEnabled: true
I was facing the same issue, then I read about protocol selection. I realised the name of the port in the service definition should start with a protocol prefix, for example http-. This fixed the issue for me. If you still face the issue, you might need to look at the tls-check for the pods and resolve it using destinationrules and policies.
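To illustrate the port-naming point, here is a small sketch that flags service ports whose name does not start with a protocol prefix. The prefix list is an assumption based on the `<protocol>[-<suffix>]` naming convention and should be checked against the documentation for your Istio version; the function names and the dictionary shape of a port are hypothetical.

```python
# Protocol prefixes assumed to be recognized in port names; verify the list
# against the Istio version you run.
KNOWN_PROTOCOLS = {
    "grpc", "http", "http2", "https", "mongo",
    "mysql", "redis", "tcp", "tls", "udp",
}


def protocol_from_port_name(name):
    """Return the protocol a port name declares, or None if it declares none."""
    prefix = name.lower().split("-", 1)[0]
    return prefix if prefix in KNOWN_PROTOCOLS else None


def ports_without_protocol(ports):
    """Return the names of ports whose name declares no known protocol."""
    return [
        port.get("name", "")
        for port in ports
        if protocol_from_port_name(port.get("name", "")) is None
    ]


if __name__ == "__main__":
    # Hypothetical ports, shaped like the entries under spec.ports of a Service.
    example_ports = [
        {"name": "http-api", "port": 80},
        {"name": "web", "port": 8080},
    ]
    print(ports_without_protocol(example_ports))
```

A port such as `web` would be flagged, while `http-api` or plain `http` would pass.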
istio-policy is running with the istio-proxy sidecar. Is that correct?
Yes, I just checked it and it's with sidecar.
Let me know if that helps.

Some scheduled tasks failing with error: Connection Failure: Status code unavailable

I've googled the issue and pretty much every answer has been related to a certificate issue. Problem is, we have other tasks on the same server that trigger just fine. The file runs as expected directly from a browser, so it's not an issue in the CF code. And with other scheduled tasks running fine I don't see it being a problem with any certificate. Any suggestions on what else could cause this?
From the log:
Information [DefaultQuartzScheduler_Worker-9] - MyTask - myreport triggered.
Information [DefaultQuartzScheduler_Worker-9] - Starting HTTP request {URL='http://myserver/reports/myreport.cfm', method='get'}
Error [DefaultQuartzScheduler_Worker-9] - Connection Failure: Status code unavailable

ColdFusion 2016 - Security service not available

CF 2016 on Windows 10 with IIS
I've checked other threads on similar issues and they don't appear to apply.
My laptop has needed to be crash-started on a number of occasions recently due to the laptop not waking up from sleep mode. A couple of times ColdFusion 2016 didn't start automatically and needed to be manually started. Now, ColdFusion appears to be starting automatically, but now I'm getting an error:
HTTP Error 500.0 - The Security service is not available.
I'm afraid I have no idea where to start on this or even what additional information to provide. So, I would really appreciate any hints.
The remainder of the error has the following information:
Detailed Error Information:
Module: IsapiModule Notification: ExecuteRequestHandler
Handler: ISAPI-dll
Error Code: 0x00000000
Requested URL: http://zbay_sys:80/jakarta/isapi_redirect.dll
Physical Path : C:\ColdFusion2016\config\wsconfig\1\isapi_redirect.dll
Logon Method: Anonymous
Logon User : Anonymous
I really hope I don't have to re-install CF.
Glad that you are sorted.
The error message says, "The Security service is not available," so IIS shows an HTTP 500 error. If the service is not starting, the problem is likely on the ColdFusion side.
Please try the following if you face a similar issue in the future:
Stop ColdFusion service (if not already)
Launch Command prompt as Administrator
Browse to cf_root\cfusion\bin and run the following command:
coldfusion.exe -start console
Try to access the CF Admin once the services are started.
If it gives an error message, please share it.

NATS Error while developing echo service

I'm trying to develop a system service, so I am using the echo service as a test.
I developed the service by following the directions in the CF docs.
Now the echo node runs, but the echo gateway fails with the error "echo_gateway - pid=15040 tid=9321 fid=290e ERROR -- Exiting due to NATS error: Could not connect to server on nats://localhost:4222/"
I ran into this issue and was stuck for almost a week until someone finally helped me resolve it. The underlying problem is something else, and because errors are not trapped properly, it gives a misleading message. You need to go to GitHub and get the latest code base. The fix for this issue is http://reviews.cloudfoundry.org/#/c/8891 . Once you fix this issue, you will most likely encounter a timeout field issue. The solution for that is to define the timeout field in gateway.yml.
A few additional properties became required in the echo_gateway.yml.erb file - specifically, the latest were default_plan and timeout, under the service group. The properties have been added to the appropriate file in the vcap-services-sample-release repo.
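As a rough illustration of the point about required properties, the sketch below checks a parsed gateway configuration for the two keys named above. It assumes the YAML has already been loaded into a dictionary and that the group is keyed as `service`; the function name and the example values are hypothetical.

```python
# Keys the answers say became required under the service group
# (the top-level key name `service` is an assumption).
REQUIRED_SERVICE_KEYS = ("default_plan", "timeout")


def missing_service_keys(config):
    """Return the required keys that are absent from config['service']."""
    service = config.get("service") or {}
    return [key for key in REQUIRED_SERVICE_KEYS if key not in service]


if __name__ == "__main__":
    # Hypothetical parsed gateway config; a real file carries many more fields.
    example = {"service": {"name": "echo", "timeout": 15}}
    for key in missing_service_keys(example):
        print(f"missing required property: service.{key}")
```

Running a check like this before starting the gateway can help tell a configuration gap apart from a real NATS connectivity problem.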
Looks like the fix for the misleading error has been merged into github. I haven't updated and verified this myself just yet but the gerrit comments indicate the solution is the same as what the node base has had for some time. I did previously run into that error handling and it was far more helpful.