Azure Image fails to deploy to VMSS or VM - azure-virtual-machine

I have an image, which has been sysprep'd with no unattend.xml file.
When I try to deploy it to a VMSS, it fails with:
New-AzVmss : Long running operation failed with status 'Failed'.
Additional Info:'OS provisioning for VM 'Lancelot-vmss_0' failed. Error details: This installation of Windows is undeployable. Make sure the image has been properly prepared (generalized).
The image was generalized. I also checked the boot diagnostics screen of the failed VM and of the failed VMSS VM instance.
I have tried this several times and each time get the same result.
The image has been recreated several times: I take the current image, deploy it to a VM, update the software within the VM, and create a new image.
This process used to work but no longer does, even though it has not changed.
I thought it might have something to do with the rearm count, but I checked it before creating this latest image and it showed 987 remaining.
The Serial Console in the Azure portal appears to connect but just shows a flashing cursor (even if I press Enter).
I can see a failed event which is:
Additional error information is available for this virtual machine:
GENERAL
Provisioning state: Provisioning failed. OS provisioning for VM 'LancelotBase' failed. Error details: This installation of Windows is undeployable. Make sure the image has been properly prepared (generalized). Instructions for Windows: https://azure.microsoft.com/documentation/articles/virtual-machines-windows-upload-image/. OSProvisioningClientError
Provisioning state error code: ProvisioningState/failed/OSProvisioningClientError
Guest agent: Not Ready. VM status blob is found but not yet populated.
Does anyone know what is causing this, or how I can investigate it, given that I cannot find a way to access the logs inside the VM?
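One way to investigate without logging into the failed VM is to inspect the Windows setup state recorded on the image's OS disk, for example by attaching a copy of that disk as a data disk to a helper VM. Windows records this state as ImageState in %WINDIR%\Setup\State\State.ini (and in the registry under HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Setup\State). A disk generalized with sysprep /generalize /oobe should report IMAGE_STATE_GENERALIZE_RESEAL_TO_OOBE, while IMAGE_STATE_UNDEPLOYABLE corresponds to the "This installation of Windows is undeployable" message. The sketch below assumes those locations and values; the script name and drive letter in the comment are hypothetical.

```python
import configparser
import sys

# State Windows Setup is expected to record after `sysprep /generalize /oobe`
# (assumption based on the documented Windows Setup image states).
GENERALIZED = "IMAGE_STATE_GENERALIZE_RESEAL_TO_OOBE"


def decode_state_ini(raw: bytes) -> str:
    """Decode State.ini bytes, honouring a UTF-16 BOM if one is present."""
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return raw.decode("utf-16")
    return raw.decode("utf-8-sig", errors="replace")


def image_state(ini_text: str) -> str:
    """Return the ImageState value from State.ini text, or '<missing>'."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # configparser looks option names up case-insensitively; the fallback
    # covers both a missing [State] section and a missing ImageState key.
    return parser.get("State", "ImageState", fallback="<missing>")


def looks_generalized(ini_text: str) -> bool:
    return image_state(ini_text) == GENERALIZED


if __name__ == "__main__":
    # Pass one or more State.ini paths, e.g. from an attached OS disk
    # (hypothetical): python check_state.py F:\Windows\Setup\State\State.ini
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            text = decode_state_ini(fh.read())
        state = image_state(text)
        verdict = "generalized" if state == GENERALIZED else "NOT generalized"
        print(f"{path}: {state} ({verdict})")
```

If the captured disk reports anything other than the generalized state, the image was most likely captured before sysprep finished or was booted again afterwards.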

Related

How to run docker task with Amazon ECS - getting error `STOPPED (CannotStartContainerError: Error response from dae)`

My goal is to run a benchmark deployed as a Docker image. I ran into so many issues doing so that I decided to first get something extremely trivial working.
So I followed the guide at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-task-definition.html
and used the "ping" example: it should just ping a domain a couple of times and then stop.
The problem is, I always receive this message in the task status:
STOPPED (CannotStartContainerError: Error response from dae)
I tried it with various subnets and security groups, but the result is always the same: the task starts and, after a minute or two, fails with the message above.
I even tried it on a brand-new AWS account, using these steps:
In https://us-east-2.console.aws.amazon.com/ecs/, I created a new cluster (networking only).
In Task Definitions, I created a task definition with the Docker image alpine:latest and the command ping -c 4 google.com.
Then I selected the cluster, switched to the "Tasks" tab, and opened the Run Task dialog, choosing one of the pre-created subnets.
After running it:
the task appears in the cluster's task list in the PENDING state;
after a couple of minutes (checking with the refresh button), it changes to the message mentioned above: STOPPED (CannotStartContainerError: Error response from dae).
My guess is that either the task cannot download the image, or the instance cannot reach the outside network.
What am I doing wrong, and how can I fix it?
In my case, too, the log group was the problem: the one I had configured wasn't working, so I enabled the "Auto-configure CloudWatch Logs" option in the "Log Configuration" section of the container settings.
Also, if you open the stopped task, navigate to the container section and expand it, the Details section shows a more detailed error message.
It could be a problem with the entry point in the task definition, as pointed out in the comments on the question: Entrypoint: ["sh","-c"]
It could also be a bad reference, for example a wrong log group in the LogConfiguration or something similar.
I just created the log group in my CloudWatch console because it had not been created, and now everything works.
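The two task-definition fixes suggested above (a shell entry point, and a log group that actually exists) can be combined in one minimal sketch. Python is used here only to emit the task-definition JSON; the family name, log group, role ARN and CPU/memory sizes are hypothetical placeholders, and it assumes a Fargate ("networking only") cluster in us-east-2 as in the question. Note that with an entry point of ["sh","-c"], the whole command line must be passed as a single string.

```python
import json


def ping_task_definition(log_group: str, region: str, execution_role_arn: str) -> dict:
    """Build a Fargate task definition for the alpine ping example.

    All names and ARNs passed in are placeholders for illustration.
    """
    container = {
        "name": "ping",
        "image": "alpine:latest",
        "essential": True,
        # `sh -c` takes the entire command line as ONE argument,
        # so `command` holds a single string, not ["ping", "-c", ...].
        "entryPoint": ["sh", "-c"],
        "command": ["ping -c 4 google.com"],
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": log_group,
                "awslogs-region": region,
                "awslogs-stream-prefix": "ecs",
                # Let the awslogs driver create the group if it is missing
                # instead of failing because the group does not exist.
                "awslogs-create-group": "true",
            },
        },
    }
    return {
        "family": "ping-test",  # hypothetical family name
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",
        "cpu": "256",
        "memory": "512",
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [container],
    }


if __name__ == "__main__":
    taskdef = ping_task_definition(
        log_group="/ecs/ping-test",
        region="us-east-2",
        execution_role_arn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    )
    print(json.dumps(taskdef, indent=2))
```

Saving the printed JSON to a file lets you register it with aws ecs register-task-definition --cli-input-json file://taskdef.json. If you rely on awslogs-create-group, the task execution role also needs the logs:CreateLogGroup permission; otherwise, create the log group beforehand as described in the last answer.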

Google Cloud Dataprep: Transformation engine unavailable due to prior crash (exit code: -1)

I am trying to create a flow using Google Cloud Dataprep. The flow takes a data set from Big Query which contains app events data from Firebase Analytics to flatten event parameters for easier analysis. I keep getting the following error before even being able to create the first step (recipe):
Transformation engine unavailable due to prior crash (exit code: -1)
The error appears in the top right corner of the screen.
The error message you received is particularly challenging in that it is so generic. The root cause could be within the platform, or it could be in whatever execution environment you used for the job. Unfortunately, we don't have the resources right now to capture and document all of the error messages that can be emitted during the job execution process, which can span a wide variety of servers and other software platforms.
I encountered the same problem. First, I tried the following steps:
Refresh the browser (i.e., click the Reload button top left)
"Hard refresh" the browser (i.e., ctrl + Reload)
Clear the cache and cookies (see https://support.google.com/accounts/answer/9098093?co=GENIE.Platform=Desktop&hl=en&visit_id=636802035537591679-2642248633&rd=1)
References:
https://community.trifacta.com/s/question/0D51L00005dG3MXSA0/i-was-working-on-a-recipe-and-i-received-the-error-message-transformation-engine-unavailable-due-x-to-prior-crash-exit-code-1-why-am-i-getting-this-error
https://community.trifacta.com/s/question/0D51L00005choIbSAI/unable-to-develop-on-our-trifacta-42-platform-for-the-past-12-hours-steps-added-to-recipes-are-lost-and-having-to-recode-the-error-given-is-transformation-engine-unavailable-what-is-causing-this-error
However this did not solve the problem. Then I tried:
Confirm that your Chrome version is 68+. If not, please upgrade.
Navigate to chrome://nacl/ and ensure that PNaCl is enabled.
Navigate to chrome://components/ and ensure that the PNaCl Version is not 0.0.0.0. Click on Check for Updates
That did not solve the problem either.
References:
https://community.trifacta.com/s/question/0D51L00005dDrcmSAC/not-able-to-preview-data-sources-or-edit-recipes
Trifacta told me that there had been an internal issue after maintenance. So if none of the above solutions work, you just have to wait until they fix the problem.

GoCD Custom Command

I am trying to run a very simple custom command, "echo helloworld", in GoCD as described in Part 2 of the Getting Started Guide. However, the job never finishes: the Console says "Waiting for console logs", and the raw output says "Console log for this job is unavailable as it may have been purged by Go or deleted externally."
I configured the job by typing "echo" in the Lookup Command field (this differs from the Getting Started example, which I tried first with the same result).
Judging from the screenshot, the problem seems to be that no agent is assigned to the task. For an agent to be assigned, it must satisfy all of these conditions:
An agent must be running, and connected to the server
The agent must be enabled on the "Agents" page
If you use environments, the job and the agent need to be in the same environment
The agent needs to have all of the resources assigned that are configured in the job
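The four conditions above can be expressed as a small eligibility check. This is a simplified, hypothetical model: the Agent and Job classes are invented for illustration and are not GoCD's API or its actual scheduler. It reads the environment rule strictly, so a job outside any environment only matches agents outside any environment, and a job in an environment only matches agents in that environment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    connected: bool  # agent process is running and connected to the server
    enabled: bool  # agent is enabled on the "Agents" page
    environments: set[str] = field(default_factory=set)
    resources: set[str] = field(default_factory=set)


@dataclass
class Job:
    name: str
    environment: str | None = None  # None: the pipeline is in no environment
    resources: set[str] = field(default_factory=set)


def can_run(agent: Agent, job: Job) -> bool:
    """Return True if `agent` satisfies all four conditions for `job`."""
    # Conditions 1 and 2: running, connected and enabled.
    if not (agent.connected and agent.enabled):
        return False
    # Condition 3 (strict reading): job and agent share an environment,
    # or neither of them uses environments.
    if job.environment is None:
        if agent.environments:
            return False
    elif job.environment not in agent.environments:
        return False
    # Condition 4: the agent has every resource the job asks for.
    return job.resources <= agent.resources


def eligible_agents(agents: list[Agent], job: Job) -> list[str]:
    """Names of the agents that could be assigned `job`."""
    return [a.name for a in agents if can_run(a, job)]
```

Under this model, an agent placed in an environment never picks up a job whose pipeline is outside it, which matches the fix the asker reports: the job waits indefinitely because eligible_agents returns an empty list.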
Found the issue: the pipelines have to be in the same environment as the agent for this to work.

Resource manager does not transit to active state from standby

One Spark job ran for more than 23 days and eventually caused the ResourceManager to crash. After restarting the ResourceManager instances (there are two in our cluster), both of them stayed in the standby state.
And we are getting this error:
ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Failed to load/recover state
org.apache.hadoop.yarn.exceptions.YarnException: Application with id application_1470300000724_40101 is already present! Cannot add a duplicate!
We could not kill 'application_1470300000724_40101' through YARN because the ResourceManager is not working. So we killed all of its processes at the Unix level on all nodes, but that didn't work. We also tried rebooting all nodes, with the same result.
Somewhere an entry for that job is still present and is preventing the ResourceManager from being elected active. We are using Cloudera 5.3.0, and I can see that this issue has been addressed and resolved in Cloudera 5.3.3, but for now we need a workaround.
To resolve this issue, format the RMStateStore with the command below, running it only while the ResourceManagers are stopped:
yarn resourcemanager -format-state-store
Be careful: this clears the history of all applications that ran before the command was executed.
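Before formatting the store, it can help to confirm which application is blocking recovery. YARN application IDs have the form application_<clusterTimestamp>_<sequence>, where the cluster timestamp is the start time, in epoch milliseconds, of the ResourceManager that issued the ID. A small sketch that pulls those parts out of the error line (the function name is illustrative):

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# application_<clusterTimestamp>_<sequence>; clusterTimestamp is the issuing
# ResourceManager's start time in milliseconds since the Unix epoch.
APP_ID_RE = re.compile(r"application_(\d+)_(\d+)")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_app_id(text: str) -> tuple[str, datetime, int]:
    """Find the first YARN application ID in `text` and split it into parts."""
    match = APP_ID_RE.search(text)
    if match is None:
        raise ValueError("no YARN application id found")
    cluster_ts_ms = int(match.group(1))
    sequence = int(match.group(2))
    # Integer milliseconds avoid float rounding in the conversion.
    rm_started = EPOCH + timedelta(milliseconds=cluster_ts_ms)
    return match.group(0), rm_started, sequence


if __name__ == "__main__":
    line = ("org.apache.hadoop.yarn.exceptions.YarnException: Application with id "
            "application_1470300000724_40101 is already present! Cannot add a duplicate!")
    app_id, started, seq = parse_app_id(line)
    print(app_id, started.isoformat(), seq)
```

For the ID in the question, this gives an issuing ResourceManager start time of 2016-08-04 08:40:00.724 UTC and sequence number 40101, which helps confirm that the duplicate entry belongs to the long-running job rather than to a recent submission.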

SAS Grid Validation error

I am getting the following error message:
"Objspawn was unable to launch the server SASApp - Workspace Server due to the server launch exceeding the specified wait time.
Failed to start the server"
This is a grid environment, and the newly created application servers (for example SASAppVS, created for Visual Statistics) work fine. I am new to grid environments, so any help with this would be appreciated.