How do you undo a deleted agent for a GOCD server? - go-cd

I disabled/deleted a GOCD go-agent. It seems I can no longer attempt to register this go-agent, as it doesn't appear in the agents list on service start.
Is this agent getting blacklisted somewhere since it was deleted? Just tried to disable/delete another go-agent that was working just fine, and it also no longer appears in the list on service start.
I checked the cruise-config.xml, and there is no mention of the deleted agent. The UI doesn't show deleted agents either.

If you disable an agent on the GoCD server's agent administration page, it is shown grayed out at the end of the list. You can re-enable it.
If you delete an agent, GoCD truly forgets about it.
If you then, on an agent machine, start an agent that you had previously deleted, it'll show up in the agent list again in state Pending, and you have to enable it.
If the agent doesn't show up, check the agent's log (/var/log/go-agent/*.log), and possibly the server's log, for clues about why the agent doesn't register.
Finally, you can try to delete /var/lib/go-agent/config/ (make a backup first) on the agent. Restarting the agent then allocates a new agent UUID, so the GoCD server will see it as a new agent.
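The backup-then-remove step can be sketched in Python as below. This is only an illustration: the function name reset_agent_identity is made up, it assumes the agent service is already stopped, and it moves the directory aside rather than deleting it so the old identity can be restored.

```python
import shutil
import time
from pathlib import Path


def reset_agent_identity(config_dir: str) -> Path:
    """Move the agent's config directory to a timestamped backup.

    Hypothetical helper: stop the agent before calling it, then restart
    the agent afterwards so it registers with the server as a new agent.
    """
    src = Path(config_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such config directory: {src}")
    # Keep the backup next to the original, e.g. config.bak-20240101-120000
    backup = src.with_name(f"{src.name}.bak-{time.strftime('%Y%m%d-%H%M%S')}")
    shutil.move(str(src), str(backup))
    return backup
```

On the agent machine this would be called as reset_agent_identity("/var/lib/go-agent/config"), typically with root privileges.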

GoCD Custom Command

I am trying to run a very simple custom command, "echo helloworld", in GoCD as per the Getting Started Guide Part 2. However, the job does not finish: the Console says "Waiting for console logs" and the raw output says "Console log for this job is unavailable as it may have been purged by Go or deleted externally."
My job looks like the following, which was taken from typing "echo" in the Lookup Command (this differs from the Getting Started example, which I tried first with the same result).
Judging from the screenshot, the problem seems to be that no agent is assigned to the task. For an agent to be assigned, it must satisfy all of these conditions:
An agent must be running, and connected to the server
The agent must be enabled on the "Agents" page
If you use environments, the job and the agent need to be in the same environment
The agent needs to have all of the resources assigned that are configured in the job
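To make the checklist above concrete, here is a small Python sketch that reports which condition is blocking an assignment. The Agent and Job classes and their fields are invented for illustration and are not GoCD's data model or API; the logic only encodes the four conditions as stated.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Agent:  # hypothetical model, not a GoCD type
    connected: bool
    enabled: bool
    environments: Set[str] = field(default_factory=set)
    resources: Set[str] = field(default_factory=set)


@dataclass
class Job:  # hypothetical model, not a GoCD type
    environment: Optional[str] = None
    resources: Set[str] = field(default_factory=set)


def blocking_reasons(agent: Agent, job: Job) -> List[str]:
    """Return the reasons the agent cannot take the job (empty list if none)."""
    reasons = []
    if not agent.connected:
        reasons.append("agent is not running or not connected")
    if not agent.enabled:
        reasons.append("agent is not enabled")
    # Only checked when the job's pipeline is in an environment.
    if job.environment is not None and job.environment not in agent.environments:
        reasons.append(f"agent is not in environment '{job.environment}'")
    missing = job.resources - agent.resources
    if missing:
        reasons.append("agent lacks resources: " + ", ".join(sorted(missing)))
    return reasons
```

An empty result means all four conditions hold for that agent and job.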
Found the issue.
The Pipelines have to be in the same Environment to work.

Azure Webjob using QueueTrigger not polling Queue in new Storage account

I have a WebApp with a WebJob (monitoring a queue with QueueTrigger) in Azure that has been working fine for over a year. I'm trying to re-organize some of my Azure resources and would like the WebJob to monitor a different Queue than it has been.
I've created a new (non-classic) storage account and changed my code to insert new messages into the new Queue. I can see new messages showing up in the new Queue but the WebJob is never triggered. (The old queue is in a classic storage account. I don't think that matters, but it has crossed my mind. I have seen some older posts that make me think this used to be a problem, but some newer ones that make me think it's OK.)
My code is pretty straightforward (almost straight out of tutorial). It wants the Queue connection strings in app.config for both AzureWebJobsDashboard and AzureWebJobsStorage, which I have done.
var host = new JobHost();
host.RunAndBlock();
To verify that the new code is being successfully deployed, I have deleted the old WebJob in the Azure portal and verified the file dates in the App_Data\Jobs... folder are current and I've looked at the value of "AzureWebJobsStorage" in the deployed config file and it is the new Queue's connection string.
I finally thought to manually insert a message into the old queue (that none of my code is pointing at any more). Sure enough - when I do that, the WebJob is triggered and runs.
I think changing the connection string values in App.Config should be all that's needed to have it "watch" a new queue, but that doesn't seem to be enough. Does anyone know what else would need to be changed?
So I figured this out. Not only do you have to have the Connection String entries (pointing to the queue) in the App.Config of the WebJob project, but also for the Web Application. I had configured AzureWebJobsDashboard and AzureWebJobsStorage in the Application Settings section of the Web App in the Azure portal.
Those were still pointing to the old queue, and apparently those were the important ones. Once I updated those settings to point to the new queue, it worked as it should.

(AWS SWF) Is there a way to get a list of all activity workers listening on a particular tasklist?

In our beta stack, we have a single EC2 instance listening to a tasklist. Sometimes another developer on the team starts their own instance for testing purposes and forgets to turn it off. This creates problems for the next developer, who tries to start an activity only for it to be taken up by the last developer's machine. Is there a way to get the hostnames of all activity workers listening to a particular tasklist?
It is not currently possible to get a list of pollers waiting on a task list through the SWF API. The workaround is to look at the identity field on the ActivityTaskStarted event after the task was picked up by the wrong worker.
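As a rough sketch of that workaround, the Python function below pulls worker identities out of a list of history events. It assumes the events are plain dicts in the shape of the SWF workflow execution history (event type ActivityTaskStarted, with an identity inside activityTaskStartedEventAttributes); fetching the history itself is left out, and the function name is made up.

```python
from typing import Iterable, List, Tuple


def started_by(events: Iterable[dict]) -> List[Tuple[int, str]]:
    """Return (eventId, identity) for every activity-started event.

    Assumes the event dicts follow the SWF history format; workers that
    did not set an identity when polling are reported as "<unknown>".
    """
    result = []
    for event in events:
        if event.get("eventType") != "ActivityTaskStarted":
            continue
        attrs = event.get("activityTaskStartedEventAttributes", {})
        result.append((event.get("eventId"), attrs.get("identity", "<unknown>")))
    return result
```

This only helps if each worker sets a meaningful identity (for example its hostname) when it polls.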
One way to avoid this issue is to always use a task list name that is specific to a machine or developer, so that workers don't collide.
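A per-developer task list name could be built along these lines. This is only a sketch: the naming scheme and the function name personal_task_list are assumptions, and the same name has to be used by both the code that schedules activities and the worker that polls.

```python
import getpass
import socket


def personal_task_list(base: str, user: str = "", host: str = "") -> str:
    """Build a task list name unique to a developer and machine.

    Falls back to the current login name and hostname when not given.
    """
    user = user or getpass.getuser()
    host = host or socket.gethostname()
    return f"{base}-{user}-{host}"
```

The shared beta instance would keep polling the plain base name, while a developer's test instance polls only its own personal name.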

One or more services have started or stopped unexpectedly SPTimerService (SPTimerV4)

I have stopped and restarted the services (SharePoint Administration and SharePoint Timer Service).
I cleaned the Configuration Cache by using mentioned steps.
Summary of the steps to clear the timer job:
Stop SharePoint Timer service on all servers in the farm.
Browse to C:\ProgramData\Microsoft\SharePoint\Config\{GUID}, where the {GUID} folder is the one that contains a bunch of XML files and NOT the one containing files with a ".PERSISTEDFILE" extension.
Delete all the XML files
Update the contents of the Cache.ini file to just say “1” (without quotes).
Restart the SharePoint Timer service on each server
Reanalyze the issue in Health Analyzer
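The file-level part of those steps (delete the XML files, reset Cache.ini) can be sketched in Python as follows. It is a hypothetical helper, not a SharePoint tool: it assumes the Timer service is already stopped and that you pass it the path of the correct {GUID} folder.

```python
from pathlib import Path


def clear_config_cache(guid_dir: str) -> int:
    """Delete the XML files in the cache folder and reset cache.ini to "1".

    Returns the number of XML files removed. Other files are left alone.
    """
    folder = Path(guid_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"no such cache folder: {folder}")
    removed = 0
    for entry in folder.iterdir():
        # Match .xml regardless of case; never touch cache.ini here.
        if entry.is_file() and entry.suffix.lower() == ".xml":
            entry.unlink()
            removed += 1
    (folder / "cache.ini").write_text("1")
    return removed
```

It would need to be run on every server in the farm before the Timer service is restarted.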
Does anyone know why this keeps occurring and how I can stop it?
First, check your ULS logs and see whether any errors show up there.
Second, check the Event Viewer on your SharePoint server for errors, and make sure you have enough disk space available.
You might also want to check this: Clearing Timer Services
If you see any errors, post them here.
Hope it helps.
Yotam.

Amazon EC2 custom AMI not running bootstrap (user-data)

I have encountered an issue when creating custom AMIs (images) on EC2 instances. If I start up a default Windows Server 2012 instance with a custom bootstrap/user-data script such as:
<powershell>
PowerShell "(New-Object System.Net.WebClient).DownloadFile('http://download.microsoft.com/download/3/2/2/3224B87F-CFA0-4E70-BDA3-3DE650EFEBA5/vcredist_x64.exe','C:\vcredist_x64.exe')"
</powershell>
It works as intended: it goes to the URL, downloads the file, and stores it on the C: drive.
But if I set up a Windows Server instance, create an image from it, store it as a custom AMI, and then deploy it with the exact same user-data script, it does not work. If I go to the instance URL (http://169.254.169.254/latest/user-data), it shows that the script was imported successfully, but it has not been executed.
After checking the error logs, I have noticed this message regularly:
Failed to fetch instance metadata http://169.254.169.254/latest/user-data with exception The remote server returned an error: (404) Not Found.
Update 4/15/2017: For EC2Launch and Windows Server 2016 AMIs
Per AWS documentation for EC2Launch, Windows Server 2016 users can continue using the persist tags introduced in EC2Config 2.1.10:
For EC2Config version 2.1.10 and later, or for EC2Launch, you can use <persist>true</persist> in the user data to enable the plug-in after user data execution.
User data example:
<powershell>
insert script here
</powershell>
<persist>true</persist>
For subsequent boots:
Windows Server 2016 users must additionally configure and enable EC2Launch instead of EC2Config. EC2Config was deprecated on Windows Server 2016 AMIs in favor of EC2Launch.
Run the following powershell to schedule a Windows Task that will run the user data on next boot:
C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1 -Schedule
By design, this task is disabled after it is run for the first time. However, using the persist tag causes Invoke-UserData to schedule a separate task via Register-FunctionScheduler, to persist your user data on subsequent boots. You can see this for yourself at C:\ProgramData\Amazon\EC2-Windows\Launch\Module\Scripts\Invoke-Userdata.ps1.
Further troubleshooting:
If you're having additional issues with your user data scripts, you can find the user data execution logs at C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log for instances sourced from the WS 2016 base AMI.
Original Answer: For EC2Config and older versions of Windows Server
User data execution is automatically disabled after the initial boot. When you created your image, it is probable that execution had already been disabled. This is configurable manually within C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml.
The documentation for "Configuring a Windows Instance Using the EC2Config Service" suggests several options:
Programmatically create a scheduled task to run at system start using schtasks.exe /Create, and point the scheduled task to the user data script (or another script) at C:\Program Files\Amazon\Ec2ConfigService\Scripts\UserScript.ps1.
Programmatically enable the user data plug-in in Config.xml.
Example, from the documentation:
<powershell>
$EC2SettingsFile="C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml"
$xml = [xml](get-content $EC2SettingsFile)
$xmlElement = $xml.get_DocumentElement()
$xmlElementToModify = $xmlElement.Plugins
foreach ($element in $xmlElementToModify.Plugin)
{
    if ($element.name -eq "Ec2SetPassword")
    {
        $element.State="Enabled"
    }
    elseif ($element.name -eq "Ec2HandleUserData")
    {
        $element.State="Enabled"
    }
}
$xml.Save($EC2SettingsFile)
</powershell>
Starting with EC2Config version 2.1.10, you can use <persist>true</persist> to enable the plug-in after user data execution.
Example, from the documentation:
<powershell>
insert script here
</powershell>
<persist>true</persist>
Another solution that worked for me is to run Sysprep with EC2Launch.
The issue is that AWS doesn't reestablish the route to the profile service (169.254.169.254) in your custom AMI. See response by SanjitPatel in this post. So when I tried to use my custom AMI to create spot requests, my new instances were failing to find user data.
Shutting down with Sysprep essentially forces AWS to redo all the setup work on the instance, as if it were being run for the first time. So when you create your instance, shut it down with Sysprep, and then create your custom AMI, AWS will set up the profile service route correctly for the new instances and execute your user data. This also avoids manually changing Windows Tasks and executing user data on subsequent boots, as the persist tag does.
Here is a quick step-by-step:
Create an instance using one of the AWS Windows AMIs (Windows Server 2016 Nano Server doesn't support Sysprep) and passing your desired user data (this may be optional, but good to make sure AWS wires setup scripts correctly to handle user data).
Customize your instance as needed.
Shut down your instance with Sysprep. Just open EC2LaunchSettings application and click "Shutdown with Sysprep". Full instructions here.
Create your custom AMI from the instance you just shut down.
Use your custom AMI to create other instances, passing user data on instance creation. User data will be executed on instance launch. In my case, I used Spot Request screen, which had a User Data text box.
Hope this helps!
At the end of the initial bootstrap (UserData) script, just append the persist tag as shown below.
Works perfectly.
<powershell>
insert script here
</powershell>
<persist>true</persist>
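If the user data is generated by a deployment script, appending the tag can be automated. Below is a minimal Python sketch; with_persist is a made-up helper, it only manipulates the user data text, and it does not talk to AWS.

```python
PERSIST_TAG = "<persist>true</persist>"


def with_persist(user_data: str) -> str:
    """Return the user data with the persist tag appended exactly once."""
    if PERSIST_TAG in user_data:
        return user_data  # already present, leave unchanged
    return user_data.rstrip("\n") + "\n" + PERSIST_TAG + "\n"
```

The result is what you would pass as user data when launching the instance from the custom AMI.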
For those people that got here from Google and are running a Server 2016 instance, it seems that this is no longer possible.
Server 2016 doesn't have the EC2Config service, so you can't use the persist flag (<persist>true</persist>) described in Anthony Neace's post.
Server 2016 uses EC2Launch and I haven't yet seen how it's possible to run a script at every boot. You can run a script on the first boot, but subsequent boots will not run it.
I added the PowerShell script below to run during the AMI bake process, which helped me fix this issue. This was on Windows Server 2019.
$EC2LaunchInitInstance = "C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1"
$EC2LaunchSysprep = "C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\SysprepInstance.ps1"
Invoke-Expression -Command "$EC2LaunchInitInstance -Schedule"
Invoke-Expression -Command "$EC2LaunchSysprep -NoShutdown"