Is there a way to specify ignoreExisting on pipelineJob? - jenkins-job-dsl

Is there a way to specify ignoreExisting on pipelineJob? I don't see it listed in plugin/job-dsl/api-viewer/index.html but maybe I'm missing a way to do it.
In my setup all jobs are defined using Job DSL through the Configuration as Code module. All jobs defined by Job DSL are used to load pipelines, where all the information for the jobs is configured. Since all of the jobs' configuration is stored in the pipeline, I'd like to be able to define each job and have it not be modified by Job DSL again unless the job is removed.
Current behavior is that Job DSL overwrites any changes made to the job by the pipeline, which is not what I want. Is there any way around this? I thought ignoreExisting would do the trick, but it doesn't seem to be available in pipelineJob.
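For context, a minimal sketch of the kind of seed script described above, assuming each pipelineJob only points at a Jenkinsfile in SCM (the repository URL, branch and script path are placeholders):

pipelineJob('my-pipeline') {
    definition {
        cpsScm {
            scm {
                git {
                    remote {
                        url('https://example.com/my-org/my-repo.git')   // placeholder repository
                    }
                    branch('*/main')
                }
            }
            scriptPath('Jenkinsfile')   // all real job configuration lives in this pipeline
        }
    }
}

Everything else about the job is then configured from the pipeline itself, which is exactly why having the next seed run overwrite those changes is the problem.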

Related

How to get dataflow job id from inside that dataflow job - JAVA

In my current architecture, multiple Dataflow jobs are triggered at various stages as part of the ABC framework. I need to capture the job IDs of those jobs as audit metrics inside the Dataflow pipeline and update them in BigQuery.
How do I get the run ID of a Dataflow job from within the pipeline using Java?
Is there an existing method that I can use for that, or do I need to use Google Cloud's client library inside the pipeline?
If you are submitting to dataflow, I believe this might work:
// On the Dataflow runner, run() returns a DataflowPipelineJob handle
DataflowPipelineJob result = (DataflowPipelineJob) pipeline.run();
String jobId = result.getJobId();
But you cannot access that within the pipeline itself afaik (DoFns etc).
The best way to ensure you know your job ID/name is to set it yourself. You can do this by passing --jobName, which is accessible via options.getJobName(); Dataflow will use this name for the job. Note that it must be unique.
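To illustrate the --jobName suggestion, a minimal sketch assuming the Apache Beam Java SDK with the Dataflow runner; the job-name prefix and the trivial Create/ParDo graph are placeholders:

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class JobNameExample {
    public static void main(String[] args) {
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation().as(DataflowPipelineOptions.class);
        // Set the name yourself so it is known before submission; Dataflow requires it to be unique.
        options.setJobName("abc-audit-job-" + System.currentTimeMillis());

        Pipeline p = Pipeline.create(options);
        p.apply(Create.of("record"))
         .apply(ParDo.of(new DoFn<String, String>() {
             @ProcessElement
             public void processElement(ProcessContext c) {
                 // The name is also readable from worker code through the pipeline options.
                 String jobName = c.getPipelineOptions().as(DataflowPipelineOptions.class).getJobName();
                 c.output(c.element() + " audited by " + jobName);
             }
         }));
        p.run();
    }
}

The job ID itself is still assigned by the service only at submission time, so inside DoFns the self-chosen job name is the practical handle to record.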

Is it possible to re-run a job in Google Cloud Dataflow after it has succeeded?

Maybe the question sounds stupid, but I was wondering: once the job has successfully finished and has an ID, is it possible to start the same job again?
Or is it necessary to create another one?
Because otherwise I would have the job with the same name throughout the list.
I just want to know if there is a way to restart it without recreating it again.
It's not possible to run the exact same job again, but you can create a new job with the same name that runs the same code. It will just have a different job ID and show up as a separate entry in the job list.
If you want to make running repeated jobs easier, you can create a template. This will let you create jobs from that template via a gcloud command instead of having to run your pipeline code.
Cloud Dataflow does have a re-start function. See SDK here. One suggested pattern (to help with deployment) is to create a template for the graph you want to repeatedly run AND execute the template.
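To make the template suggestion concrete, a minimal sketch assuming the Apache Beam Java SDK with the Dataflow runner; the bucket, paths and the trivial graph are placeholders:

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class StageTemplate {
    public static void main(String[] args) {
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation().as(DataflowPipelineOptions.class);
        // With templateLocation set, run() stages a reusable template instead of executing the job.
        options.setTemplateLocation("gs://my-bucket/templates/my-template");
        options.setTempLocation("gs://my-bucket/temp");

        Pipeline p = Pipeline.create(options);
        p.apply(Create.of("placeholder"));   // build the graph you want to run repeatedly
        p.run();
    }
}

Each job launched from that template afterwards (for example with gcloud dataflow jobs run my-job --gcs-location gs://my-bucket/templates/my-template) gets its own job ID and its own entry in the job list, but reuses the same code.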

Is it possible for a workflow to run the same job multiple times with different parameters?

I am new to amazon's aws-glue and I am still trying to figure it out.
Currently, I have a python shell glue job and every time I execute it I change the job parameters.
I am looking at the workflows and have managed to pass the parameters through there and set a trigger to run every day.
My question would be:
Is there a way for workflows to create instances and concurrently execute the same job with different parameters?
Is creating multiple workflows the only way to go about it?
I guess you did it through triggers, right? That's the way. You can parameterize your Glue job (the underlying Python code) and create multiple triggers that use the same Glue job but different parameters. (You have an option to pass the job parameters in triggers; see the sketch below.)
"Is there a way for workflows to create instances and concurrently execute the same job with different parameters? Is creating multiple workflows is the only way to go about it?"
-- I tried this approach but I didnot find any way out.
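A minimal sketch of the triggers-with-parameters idea from the answer above, assuming the AWS SDK for Java v2; the job name, trigger names, schedules and the --dataset argument key are placeholders, and the same pattern repeats for each parameter set:

import java.util.Map;
import software.amazon.awssdk.services.glue.GlueClient;
import software.amazon.awssdk.services.glue.model.Action;
import software.amazon.awssdk.services.glue.model.CreateTriggerRequest;
import software.amazon.awssdk.services.glue.model.TriggerType;

public class CreateParameterizedTriggers {
    public static void main(String[] args) {
        try (GlueClient glue = GlueClient.create()) {
            // Two triggers, same Glue job, different job parameters.
            glue.createTrigger(CreateTriggerRequest.builder()
                    .name("daily-run-dataset-a")
                    .type(TriggerType.SCHEDULED)
                    .schedule("cron(0 6 * * ? *)")
                    .startOnCreation(true)
                    .actions(Action.builder()
                            .jobName("my-python-shell-job")
                            .arguments(Map.of("--dataset", "a"))
                            .build())
                    .build());
            glue.createTrigger(CreateTriggerRequest.builder()
                    .name("daily-run-dataset-b")
                    .type(TriggerType.SCHEDULED)
                    .schedule("cron(0 7 * * ? *)")
                    .startOnCreation(true)
                    .actions(Action.builder()
                            .jobName("my-python-shell-job")
                            .arguments(Map.of("--dataset", "b"))
                            .build())
                    .build());
        }
    }
}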

Using AWS Batch can a docker image be specified dynamically in a job definition?

I want to create jobs in AWS Batch that vary on the image that is used to launch the container. I'd like to do this without creating a different Job Definition for each image. Is it possible to parameterize the image property using job definition parameters? If not, what's the best way to achieve this or do I have to just create job definitions on the fly in my application?
I would really love this functionality as well. Sadly, it appears the current answer is no.
Batch allows parameters, but they're only for the command.
AWS Batch Parameters
You may be able to find a workaround by using a :latest tag, but then you're buying a ticket to :latest hell.
My current solution is to use my CI pipeline to update all dev job definitions using the aws cli (describe-job-definitions then register-job-definition) on each tagged commit.
To keep my infrastructure-as-code consistent, I've moved the version for batch job definitions into an environment variable that I retrieve before running any terraform commands.
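A minimal sketch of the core call behind that describe/register flow, assuming the AWS SDK for Java v2; the definition name, image tag, command and resource values are placeholders:

import software.amazon.awssdk.services.batch.BatchClient;
import software.amazon.awssdk.services.batch.model.ContainerProperties;
import software.amazon.awssdk.services.batch.model.JobDefinitionType;
import software.amazon.awssdk.services.batch.model.RegisterJobDefinitionRequest;
import software.amazon.awssdk.services.batch.model.RegisterJobDefinitionResponse;
import software.amazon.awssdk.services.batch.model.ResourceRequirement;
import software.amazon.awssdk.services.batch.model.ResourceType;

public class RegisterDefinitionForImage {
    public static void main(String[] args) {
        // e.g. the image tag produced by the CI pipeline for this commit
        String image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.2.3";
        try (BatchClient batch = BatchClient.create()) {
            // Registering under an existing name creates a new revision of that job definition.
            RegisterJobDefinitionResponse response = batch.registerJobDefinition(
                    RegisterJobDefinitionRequest.builder()
                            .jobDefinitionName("my-app-dev")
                            .type(JobDefinitionType.CONTAINER)
                            .containerProperties(ContainerProperties.builder()
                                    .image(image)
                                    .command("python", "main.py")
                                    .resourceRequirements(
                                            ResourceRequirement.builder().type(ResourceType.VCPU).value("1").build(),
                                            ResourceRequirement.builder().type(ResourceType.MEMORY).value("2048").build())
                                    .build())
                            .build());
            System.out.println("Registered revision " + response.revision());
        }
    }
}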
Typically you make a job definition for a Docker image.
However, that job definition and Docker image can certainly do anything you've programmed them to do, so the image can be multi-purpose and you can pass in whatever parameters or command line you would like to execute.
You can override most of the parameters in a Job definition when you submit the job.
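For the override route, a minimal sketch assuming the AWS SDK for Java v2; the queue, definition and override values are placeholders (note this can change the command, environment and resources at submit time, but not the image baked into the definition):

import java.util.Map;
import software.amazon.awssdk.services.batch.BatchClient;
import software.amazon.awssdk.services.batch.model.ContainerOverrides;
import software.amazon.awssdk.services.batch.model.KeyValuePair;
import software.amazon.awssdk.services.batch.model.SubmitJobRequest;
import software.amazon.awssdk.services.batch.model.SubmitJobResponse;

public class SubmitWithOverrides {
    public static void main(String[] args) {
        try (BatchClient batch = BatchClient.create()) {
            SubmitJobResponse response = batch.submitJob(SubmitJobRequest.builder()
                    .jobName("my-app-run")
                    .jobQueue("my-queue")
                    .jobDefinition("my-app-dev")   // name or name:revision
                    // Values here are substituted into Ref:: placeholders in the command.
                    .parameters(Map.of("inputKey", "s3://my-bucket/input.csv"))
                    .containerOverrides(ContainerOverrides.builder()
                            .command("python", "main.py", "Ref::inputKey")
                            .environment(KeyValuePair.builder().name("RUN_MODE").value("dev").build())
                            .build())
                    .build());
            System.out.println("Submitted job " + response.jobId());
        }
    }
}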

How to automate the Updating/Editing of Amazon Data Pipeline

I want to use AWS Data Pipeline service and have created some using the manual JSON based mechanism which uses the AWS CLI to create, put and activate the pipeline.
My question is that how can I automate the editing or updating of the pipeline if something changes in the pipeline definition? Things that I can imagine changing could be schedule time, addition or removal of Activities or Preconditions, references to DataNodes, resources definition etc.
Once the pipeline is created, we cannot edit quite a few things as mentioned here in the official doc: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-manage-pipeline-modify-console.html#dp-edit-pipeline-limits
This makes me believe that if I want to automate updating the pipeline, I would have to delete and re-create/activate a new pipeline. If yes, then the next question is how I can create an automated process which identifies the previous version's ID, deletes it and creates a new one. Essentially I am trying to build a release-management flow where the configuration JSON file is released and deployed automatically.
Most commands like activate, delete, list-runs, put-pipeline-definition etc. take the pipeline-id, which is not known until a new pipeline is created. I am unable to find anything which remains constant across updates or recreation (the unique-id and name parameters of the create-pipeline command are consistent, but I can't use them for the above-mentioned tasks; I need the pipeline-id for those).
Of course I can try writing shell scripts which grep and search the output and try to create a script but is there any other better way? Some other info that I am missing?
Thanks a lot.
You cannot edit schedules completely or change references so creating/deleting pipelines seems to be the best way for your scenario.
You'll need the pipeline-id to delete a pipeline. Is it not possible to keep a record of that somewhere? You can have a file with the last used id stored locally or in S3 for instance.
Some other ways I can think of are:
- If you have only 1 pipeline in the account, you can list-pipelines and use the only result.
- If you have the pipeline name, you can list-pipelines and find the id (sketched below).
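A minimal sketch of that name-lookup and re-create flow, assuming the AWS SDK for Java v2; the pipeline name and unique ID are placeholders, and paging through list-pipelines results is omitted:

import software.amazon.awssdk.services.datapipeline.DataPipelineClient;
import software.amazon.awssdk.services.datapipeline.model.CreatePipelineRequest;
import software.amazon.awssdk.services.datapipeline.model.DeletePipelineRequest;
import software.amazon.awssdk.services.datapipeline.model.PipelineIdName;

public class RecreatePipeline {
    public static void main(String[] args) {
        String pipelineName = "my-etl-pipeline";
        try (DataPipelineClient client = DataPipelineClient.create()) {
            // Find the previous version's id by its (stable) name and remove it.
            for (PipelineIdName p : client.listPipelines().pipelineIdList()) {
                if (p.name().equals(pipelineName)) {
                    client.deletePipeline(DeletePipelineRequest.builder().pipelineId(p.id()).build());
                }
            }
            // Re-create under the same name; the response carries the new pipeline-id
            // to use with put-pipeline-definition and activate-pipeline.
            String newId = client.createPipeline(CreatePipelineRequest.builder()
                    .name(pipelineName)
                    .uniqueId(pipelineName + "-" + System.currentTimeMillis())
                    .build()).pipelineId();
            System.out.println("New pipeline id: " + newId);
        }
    }
}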