Unable to launch task from a Spring Cloud Data Flow stream - cloud-foundry

I registered my task app in Spring Cloud Data Flow and created a definition for it, but its status shows 'unknown'. I created the stream, and when I try to launch the task through the task-sink I get an error:
java.lang.IllegalStateException: failed to resolve MavenResource:
How do I launch a task from the task-sink? Am I missing something? Any help is appreciated. Another question: how do I access the payload sent via the TaskLaunchRequest in my task?
S1 http | step1: transformer-rabbit | log
S2 :S1.step1 > filter --expression=payload.contains('CUSTADDRMODRQ_V15') | task-processor | task-sink
The task-sink launches the task identified by the URI in the TaskLaunchRequest. It looks for the resource as shown in the log:
OUT Using manager EnhancedLocalRepositoryManager with priority 10.0 for /home/vcap/.m2/repository
OUT Using transporter HttpTransporter with priority 5.0 for https://repo.spring.io/libs-snapshot
and finally fails.
The task artifact is published to our repository, and as mentioned, I registered the app and created the definition for it as well.
This is in a Cloud Foundry environment, and I am using SCDF server 1.0.0.M4.
In the application.properties for the task-sink I am providing maven.remote.repositories.snapshots.url=**
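For illustration only, the entry has this shape (the real URL is redacted above as **; the public Spring snapshot repository from the log is used here as a stand-in, and the property name should be verified against the deployer version in use):

maven.remote.repositories.snapshots.url=https://repo.spring.io/libs-snapshot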
task create fis-ifx-event-task --definition "fis-event-task"
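For clarity, the surrounding shell steps would look roughly like this (syntax per the 1.0.x Data Flow shell, so verify against your version; the Maven coordinates are the ones that appear in the log further down):

dataflow:> app register --name fis-event-task --type task --uri maven://com.xxx:fis.ifx.event-task:jar:1.0-SNAPSHOT
dataflow:> task create fis-ifx-event-task --definition "fis-event-task"
dataflow:> task launch fis-ifx-event-task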
My goal is to launch the task from the stream.
Thanks for the information. I am in fact using the BUILD-SNAPSHOT, as I was unable to enable tasks in the 1.0.0.M4 version. The one I am using is spring-cloud-dataflow-server-cloudfoundry-1.0.0.BUILD-20160808.144306-116. I am able to register and create task definitions, but the status of the task definition shows as 'unknown' even when I use the sample task module provided by your team.

When I initiate the flow of the stream and the task-sink tries to launch the task, it is unable to find the Maven resource. When I create the task definition, does the task module get deployed? I don't see any app in Pivotal Apps Manager. As mentioned earlier, I provided maven.remote.repositories.snapshot.url in the application.properties file for the task-sink application.

Another thing I observed: when I launch the task manually from the Data Flow shell, it gives the error CF-UnprocessableEntity(10008): The request is semantically invalid: Unknown field(s): 'staging_disk_in_mb', 'staging_memory_in_mb', and also a message saying 'Source is empty'. Presently the task is only supposed to print the timestamp and is not dependent on any input.
TaskProcessor code:
import java.util.HashMap;
import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.cloud.task.launcher.TaskLaunchRequest;
import org.springframework.integration.annotation.Transformer;
import org.springframework.messaging.support.GenericMessage;

@EnableBinding(Processor.class)
@EnableConfigurationProperties(TaskProcessorProperties.class)
public class TaskProcessor {

    @Autowired
    private TaskProcessorProperties processorProperties;

    @Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
    @ELI(level = "info", eventType = ELIEventType.INBOUND) // custom in-house logging annotation
    public Object setupRequest(String message) {
        // Carry the incoming message along as an environment property of the task to be launched
        Map<String, String> properties = new HashMap<>();
        properties.put("payload", message);
        TaskLaunchRequest request = new TaskLaunchRequest(processorProperties.getUri(), null, properties, null);
        return new GenericMessage<>(request);
    }
}
TaskSink code:
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Sink;
import org.springframework.cloud.task.launcher.annotation.EnableTaskLauncher;

@SpringBootApplication
@EnableTaskLauncher
@EnableBinding(Sink.class)
@EnableConfigurationProperties(TaskSinkProperties.class)
public class FisIfxEventTaskSinkApplication {

    public static void main(String[] args) {
        SpringApplication.run(FisIfxEventTaskSinkApplication.class, args);
    }
}
I provided the stream I am using earlier in the post. The sink is receiving the TaskLaunchRequest with the URI and payload, as you can see here, but it is unable to launch the task:
OUT registering [40, java.io.File] with serializer org.springframework.integration.codec.kryo.FileSerializer
2016-08-10T16:08:55.02-0600 [APP/0]
OUT Launching Task for the following resource TaskLaunchRequest{uri='maven://com.xxx:fis.ifx.event-task:jar:1.0-SNAPSHOT', commandlineArguments=[], environmentProperties={payload={"statusCode":0,"fisTopic":"CustomerDataUpdated","payloadId":"CUSTADDRMODRQ_V15","customerIds":[1597304]}}, deploymentProperties={}}

Before I begin, you have a number of questions here. In the future, it's better to break them up into multiple questions so that they are easier to find by other users and easier to answer. That being said:
A little context on the current state of things
In order to understand how things will work, it's important to understand the current state of things. The current releases of the software involved are:
Pivotal Cloud Foundry (PCF) - 1.7.12. This version is required for any task support.
Spring Cloud Task (SCT) - 1.0.2.RELEASE
Spring Cloud Data Flow CF (SCDF) - 1.0.0.BUILD-SNAPSHOT (current as of the date of this post).
Currently PCF 1.7.12+ has all the capabilities to run tasks. You can create v3 applications (the type of application used to launch a task), run them as tasks, etc. However, the tooling around that functionality is not currently complete. There is no support for v3 applications in Apps Manager or the CLI. There is a plugin for the CLI that is more of a dev tool that can be used to help with some functions (it will show you logs, etc.), but it is not fully functional and requires a specific version of the CLI to work [1]. This is one of the reasons that the task functionality within PCF is still considered experimental.
Spring Cloud Task is currently GA and supports all the functionality needed to effectively run tasks on CF. However, it's important to note that SCT doesn't handle orchestration, so the actual launching of tasks on CF is the responsibility of either the user or Spring Cloud Data Flow (the easier route).
Spring Cloud Data Flow's Cloud Foundry server implementation currently has functionality to launch tasks on PCF in the latest snapshots. We have validated this against 1.7.12 as well as the development branch of 1.8.
The task workflow within SCDF
Tasks are fundamentally different from stream applications within the context of SCDF. When you create a stream definition, you are given the option to deploy it. What this actually does is download the Spring Boot über jars and deploy them to PCF as long-running processes. If they go down, PCF will relaunch them as expected, etc.
Tasks, on the other hand, are not deployed. They are launched. The difference is that while you create a task definition, nothing is deployed until you click launch. And when the task completes, the software is shut down and cleaned up. So while a stream definition may have states, it's really a one-to-one relationship between the definition and the deployed software, whereas with a task, you can launch a task definition as many times as you want. The shell session below sketches the contrast.
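As a rough sketch of that difference in the SCDF shell (the names mystream and mytask are made up for illustration; 'timestamp' is the sample task module, and the syntax follows the 1.0.x shell):

dataflow:> stream create --name mystream --definition "http | log"
dataflow:> stream deploy --name mystream
dataflow:> task create mytask --definition "timestamp"
dataflow:> task launch mytask
dataflow:> task launch mytask

Deploying the stream stands up long-running applications; each task launch runs the definition once, and the launched app exits when it completes.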
Your issues
Reading through your post, I see a few things that you are struggling with. Let me see if I can help:
Task Definitions within SCDF and launching them via a stream - When launching a task from a stream, the task registry within SCDF is not used. The sink expects the URL for the resource to be within the TaskLaunchRequest. (On your related question about reading the payload inside the launched task, see the sketch after this list.)
Apps Manager and tasks - As mentioned above, there is no support for v3 applications in Apps Manager yet so you won't be able to see your tasks there.
Viewing the logs - In order to debug what's going wrong with launching your task on CF, you're going to want to view the logs. To do so, use the v3 CLI plugin mentioned above to view them. It's important to note that you can only tail live logs with the plugin, not view logs that have previously been rendered. Because of that, when testing, you'll want to tail the logs as soon as the app is created, before it's launched.
Error in SCDF Shell - The error you received from the SCDF shell (CF-UnprocessableEntity(10008):...) leads me to wonder if you have both the correct version of PCF (1.7.12+) and the correct versions of the following other libraries:
spring-cloud-deployer-cloudfoundry - The latest snapshots
cf-java-client - 2.0.0.M10+
reactor-core - 3.0.0.RC1+
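On your related question about accessing the payload: your TaskProcessor puts the payload into the TaskLaunchRequest's environmentProperties under the key payload, so it should surface to the launched task as a property. A minimal sketch of a task that reads it, assuming those environment properties end up in Spring's property sources in your deployer version (verify this against your setup):

import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@EnableTask
public class PayloadAwareTaskApplication {

    public static void main(String[] args) {
        SpringApplication.run(PayloadAwareTaskApplication.class, args);
    }

    // 'payload' is the key the TaskProcessor put into environmentProperties;
    // it defaults to empty if the property is not present.
    @Bean
    public CommandLineRunner runner(@Value("${payload:}") String payload) {
        return args -> System.out.println("Received payload: " + payload);
    }
}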
I hope this helps!
[1] https://github.com/cloudfoundry/v3-cli-plugin

Task support is not available in the 1.0.0.M4 release of SCDF's CF-server. In this release, the task commands/REST-APIs should be disabled - see here. For that reason, you won't see any docs related to tasks in the 1.0.0.M4 reference guide.
That said, task support is available/enabled in the BUILD-SNAPSHOT release. If you build the CF-server locally and push it to CF, you can take advantage of the task commands in the shell to create and launch task definitions.

Related

Running Batch python processes on Google Cloud

I have a couple of Python scripts which I would like to schedule to run once a month on Google Cloud. The scripts basically trigger DLP jobs and extract Data Catalog information to a file in GCS. These batch workloads would run for barely 30 minutes, so I don't need resource-intensive services like GKE, Composer, etc.
For these batch workloads I would like to know the best options available in GCP. Looking at some blog posts, I found the article below, which uses Cloud Scheduler -> Pub/Sub -> Cloud Functions -> Create VM (using a startup script).
https://medium.com/google-cloud/running-a-serverless-batch-workload-on-gcp-with-cloud-scheduler-cloud-functions-and-compute-86c2bd573f25
I have the following question about the above design:
1) How long does the Cloud Function run as it starts the VM? I know a Cloud Function has a timeout of 9 minutes; what happens if the VM takes longer than 9 minutes to process the startup script?
Any other design ideas are much appreciated.
Thanks
I'm the author of that medium post.
1) How long does the Cloud Function run as it starts the VM?
You can change the Cloud Function code to not wait for the response. It's using Node.js, so you just don't have to wait for the Promise. Also, in that solution the Cloud Function's only job is to trigger the VM creation:
zone
  .createVM(vmName, vmConfig)
  .then(data => {
    // Operation pending.
    const vm = data[0];
    const operation = data[1];
    console.log(`VM being created: ${vm.id}`);
    console.log(`Operation info: ${operation.id}`);
    // createVM returns right away with the VM in a pending state; you can
    // finish your logic here and not wait for the VM creation to complete.
    // You can even skip this step if you don't need the VM ID logged for
    // debugging purposes.
    return operation.promise();
  })
  .then(() => {
    const message = 'VM created with success, Cloud Function finished execution.';
    console.log(message);
  });
Using that same code, in the worst case (if it takes more than 9 minutes), the Cloud Function will time out, but the VM creation will continue.
The design that I suggest uses Cloud Scheduler + Pub/Sub + Compute Engine.
This design in a few words:
- your Compute Engine instance will run a utility that listens to a Cloud Pub/Sub topic (see the sketch after the quote below)
- this utility executes upon receiving a new event from the topic and runs a cron job on the instance
- Cloud Scheduler is used here to push messages to the Pub/Sub topic on a schedule that you specify in your job.
By using Pub/Sub to decouple the task-scheduling logic from the logic running the commands on Compute Engine, you can update your cron scripts as needed, without updating the Cloud Scheduler configuration. You can also change your task schedule without updating the utility service on your Compute Engine instances.
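As a minimal sketch of that utility (shown here in Java with the Cloud Pub/Sub client; my-project and cron-sub are placeholder names, and the job-triggering logic is left as a stub):

import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;

public class CronMessageListener {

    public static void main(String[] args) {
        ProjectSubscriptionName subscription =
                ProjectSubscriptionName.of("my-project", "cron-sub");

        // Called once per message pushed by Cloud Scheduler via Pub/Sub
        MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
            System.out.println("Trigger received: " + message.getData().toStringUtf8());
            // ... kick off the batch job here (DLP trigger, Data Catalog export, etc.) ...
            consumer.ack();
        };

        Subscriber subscriber = Subscriber.newBuilder(subscription, receiver).build();
        subscriber.startAsync().awaitRunning();
        // Block so the utility keeps listening for scheduled triggers
        subscriber.awaitTerminated();
    }
}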
You can find a full explanation of this design and sample code by following this and this.
Let me know if there is anything that is not obvious.

Unknown reason for Google Cloud Compute Engine VM shutdown?

One of the VM instances on Google Cloud Compute Engine was shut down, with an event log in Stackdriver showing no IP or actor (user, service, or system) that initiated the event. The instance has onHostMaintenance set to migrate and automaticRestart set to true. This particular instance has migrated on maintenance without error before. The Stackdriver event log looks like:
{
actor: {
user: ""
},
event_subtype: "compute.instances.stop",
event_timestamp_us: "1531781734907624",
event_type: "GCE_API_CALL",
ip_address: "",
}
The user and ip_address fields are NOT redacted; they have empty values in the actual log.
Is this common? How does one identify the cause of a shutdown in these peculiar cases?
Based on your event type, I dug into the documentation on activity logs:
Compute Engine API calls - GCE_API_CALL events are API calls that change the state of a resource.
It seems someone may have used an API call to shut down your VM. Looking at the settings and logs on the VM instance, it can't be a hostError or maintenance event: as long as you have onHostMaintenance set to migrate and automaticRestart set to true, your VM will always be migrated to other hardware.
Out of curiosity, did you restart your VM? Did it shut down again? How often does it happen?
Seems like App Engine had an outage during that time; see the incident.

GoCD Custom Command

I am trying to run a very simple custom command, "echo helloworld", in GoCD as per the Getting Started Guide Part 2. However, the job does not finish, with the Console saying "Waiting for console logs" and the raw output saying "Console log for this job is unavailable as it may have been purged by Go or deleted externally."
My job looks like the following, which was taken from typing "echo" in the Lookup Command (this is different from the Getting Started example, which I tried first with the same result):
Judging from the screenshot, the problem seems to be that no agent is assigned to the task. For an agent to be assigned, it must satisfy all of these conditions:
An agent must be running, and connected to the server
The agent must be enabled on the "Agents" page
If you use environments, the job and the agent need to be in the same environment
The agent needs to have all of the resources assigned that are configured in the job
Found the issue.
The Pipelines have to be in the same Environment to work.

GAE service running on Flexible Env. as target of a task queue

According to the Google doc, a service running in the flexible environment can be the target of a push task:
Outside of the standard environment, you can't add tasks to push queues, but a service running in the flexible environment can be the target of a push task. You can specify this using the target parameter when adding a task to a queue or by specifying the default target for the queue in queue.yaml.
However, when I try to do it I get 404 errors in the flexible service.
That's entirely expected, since the endpoint required for task queues (/_ah/queue/deferred) is not defined in the flexible service.
How do I make a flexible service a valid target for task queues?
Do I have to define that endpoint in my code in some way?
Usually, you'll need to write a handler in your worker service to do the processing after receiving a task. In the case of push tasks, the service will send HTTP requests to whatever URL you specify. If no URL is specified, the default URL /_ah/queue/[QUEUE_NAME] will be used. A sketch of such a handler is shown below.
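Here is a minimal sketch of such a handler, assuming a Java flexible service (the path /tasks/process is a made-up example; use whatever URL you configure for the task):

import java.io.IOException;
import java.util.stream.Collectors;

import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical worker endpoint; point the task's url/target here.
@WebServlet("/tasks/process")
public class TaskWorkerServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // The task payload arrives in the POST body
        String payload = req.getReader().lines().collect(Collectors.joining("\n"));
        // ... process the task here ...
        // A 2xx response tells Task Queue the task succeeded; other statuses cause a retry.
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}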
Now, from the endpoint you mention, it seems you are using deferred tasks, which are a somewhat special kind. Please see this thread for a workaround that adds the needed URL entry. It mentions Managed VMs, but it should still work.

Amazon EC2 custom AMI not running bootstrap (user-data)

I have encountered an issue when creating custom AMIs (images) from EC2 instances. If I start up a default Windows Server 2012 instance with a custom bootstrap/user-data script such as:
<powershell>
PowerShell "(New-Object System.Net.WebClient).DownloadFile('http://download.microsoft.com/download/3/2/2/3224B87F-CFA0-4E70-BDA3-3DE650EFEBA5/vcredist_x64.exe','C:\vcredist_x64.exe')"
</powershell>
it will work as intended: it goes to the URL, downloads the file, and stores it on the C: drive.
But if I set up a Windows Server instance, create an image from it, store it as a custom AMI, and then deploy it with the exact same custom user-data script, it does not work. If I go to the instance metadata URL (http://169.254.169.254/latest/user-data), it shows that the script was imported successfully, but it has not been executed.
After checking the error logs I have noticed this on a regular occasion:
Failed to fetch instance metadata http://169.254.169.254/latest/user-data with exception The remote server returned an error: (404) Not Found.
Update 4/15/2017: For EC2Launch and Windows Server 2016 AMIs
Per AWS documentation for EC2Launch, Windows Server 2016 users can continue using the persist tags introduced in EC2Config 2.1.10:
For EC2Config version 2.1.10 and later, or for EC2Launch, you can use <persist>true</persist> in the user data to enable the plug-in after user data execution.
User data example:
<powershell>
insert script here
</powershell>
<persist>true</persist>
For subsequent boots:
Windows Server 2016 users must additionally configure and enable EC2Launch instead of EC2Config. EC2Config was deprecated on Windows Server 2016 AMIs in favor of EC2Launch.
Run the following PowerShell to schedule a Windows task that will run the user data on the next boot:
C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1 -Schedule
By design, this task is disabled after it is run for the first time. However, using the persist tag causes Invoke-UserData to schedule a separate task via Register-FunctionScheduler, to persist your user data on subsequent boots. You can see this for yourself at C:\ProgramData\Amazon\EC2-Windows\Launch\Module\Scripts\Invoke-Userdata.ps1.
Further troubleshooting:
If you're having additional issues with your user data scripts, you can find the user data execution logs at C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log for instances sourced from the WS 2016 base AMI.
Original Answer: For EC2Config and older versions of Windows Server
User data execution is automatically disabled after the initial boot. When you created your image, it is probable that execution had already been disabled. This is configurable manually within C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml.
The documentation for "Configuring a Windows Instance Using the EC2Config Service" suggests several options:
Programmatically create a scheduled task to run at system start using schtasks.exe /Create, and point the scheduled task to the user data script (or another script) at C:\Program Files\Amazon\Ec2ConfigServer\Scripts\UserScript.ps1.
Programmatically enable the user data plug-in in Config.xml.
Example, from the documentation:
<powershell>
$EC2SettingsFile="C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml"
$xml = [xml](Get-Content $EC2SettingsFile)
$xmlElement = $xml.get_DocumentElement()
$xmlElementToModify = $xmlElement.Plugins

foreach ($element in $xmlElementToModify.Plugin)
{
    if ($element.name -eq "Ec2SetPassword")
    {
        $element.State="Enabled"
    }
    elseif ($element.name -eq "Ec2HandleUserData")
    {
        $element.State="Enabled"
    }
}
$xml.Save($EC2SettingsFile)
</powershell>
Starting with EC2Config version 2.1.10, you can use <persist>true</persist> to enable the plug-in after user data execution.
Example, from the documentation:
<powershell>
insert script here
</powershell>
<persist>true</persist>
Another solution that worked for me is to run Sysprep with EC2Launch.
The issue is that AWS doesn't reestablish the route to the profile service (169.254.169.254) in your custom AMI. See response by SanjitPatel in this post. So when I tried to use my custom AMI to create spot requests, my new instances were failing to find user data.
Shutting down with Sysprep essentially forces AWS to redo all the setup work on the instance, as if it were run for the first time. So when you create your instance, shut it down with Sysprep, and then create your custom AMI, AWS will set up the profile service route correctly for the new instances and execute your user data. This also avoids manually changing Windows tasks and executing user data on subsequent boots, as the persist tag does.
Here is a quick step-by-step:
Create an instance using one of the AWS Windows AMIs (Windows Server 2016 Nano Server doesn't support Sysprep) and pass your desired user data (this may be optional, but it's good for making sure AWS wires up the setup scripts correctly to handle user data).
Customize your instance as needed.
Shut down your instance with Sysprep: just open the EC2LaunchSettings application and click "Shutdown with Sysprep". Full instructions here.
Create your custom AMI from the instance you just shut down.
Use your custom AMI to create other instances, passing user data on instance creation. User data will be executed on instance launch. In my case, I used Spot Request screen, which had a User Data text box.
Hope this helps!
At the end of the initial bootstrap (user data) script, just append the persist tag as shown below. It works perfectly.
<powershell>
insert script here
</powershell>
<persist>true</persist>
For those people that got here from Google and are running a Server 2016 instance, it seems that this is no longer possible. Server 2016 doesn't have the EC2Config service, so you can't use the persist flag:
<persist>true</persist>
as described in Anthony Neace's post.
Server 2016 uses EC2Launch and I haven't yet seen how it's possible to run a script at every boot. You can run a script on the first boot, but subsequent boots will not run it.
I added the PowerShell script below to run during the AMI bake process, which helped me fix this issue. This was on Windows Server 2019.
# Paths to the EC2Launch scripts baked into Windows Server 2016/2019 AMIs
$EC2LaunchInitInstance = "C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1"
$EC2LaunchSysprep = "C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\SysprepInstance.ps1"
# Schedule instance initialization (including user data execution) for the next boot,
# then run Sysprep without shutting the instance down
Invoke-Expression -Command "$EC2LaunchInitInstance -Schedule"
Invoke-Expression -Command "$EC2LaunchSysprep -NoShutdown"