What is the difference between deploy and redeploy in SAS DI Studio? - sas

Am guessing that it might just retain the metadata ID (redeploy) as opposed to generating a new one (deploy), is that the only difference though?

It is only difference but it is very important. You should always redeploy jobs that any flow is dependent on. If you deploy job that was already added to a flow, the flow will be damaged.

I disagree with the previous comment by #barjey that the flow is damaged. It is not damaged , its just that a New metadata ID is created in the form of a new .sas file for the same job, like JOB_NAME_00000.sas and deployed
This adds to lot of confusion and a lot of versions of the same jobs float around, which is incorrect. That's the reason a job is always re-deployed so that the previous version of the code is over-written and new changes reflected in the flow.

Yo redeploy a job to incorporate the environmental changes by automatically identifying the environment to which you have deployed the job .These changes are then reflected at back end where the job is actually saved (job which is scheduled in a flow).

Related

Can I configure Google DataFlow to keep nodes up when I drain a pipeline

I am deploying a pipeline to Google Cloud DataFlow using Apache Beam. When I want to deploy a change to the pipeline, I drain the running pipeline and redeploy it. I would like to make this faster. It appears from the logs that on each deploy DataFlow builds up new worker nodes from scratch: I see Linux boot messages going by.
Is it possible to drain the pipeline without tearing down the worker nodes so the next deployment can reuse them?
rewriting Inigo's answer here:
Answering the original question, no, there's no way to do that. Updating should be the way to go. I was not aware it was marked as experimental (probably we should change that), but the update approach has not changed in the last 3 i have been using DF. About the special cases of update not working, supposing your feature existed, the workers would still need the new code, so no really much to save, and update should work in most of the other cases.

AWS Sagemaker processing Job automatically created?

I haven't used sagemaker for a while and today I started a training job (with the same old settings I always used before), but this time I noticed that a processing job has been automatically created and it's running while my training job run (I presume for debugging purpose).
I'm sure that this is the first time that it happens.. Is that a new feature introduced by sagemaker? I didn't find any related in documentation, but it's important to know because I don't want extra costs..
This is the image used by the processing job, with a instance type of ml.m5.2xlarge which I didn't set anywhere..
929884845733.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-debugger-rules:latest
I can answer my question.. it seems to be a new feature as highlighted here. You can turn it off as suggested in the doc:
To disable both monitoring and profiling, include the disable_profiler parameter to your estimator and set it to True.

Why is my cloud run deploy hanging at the "Deploying..." step?

Up until today, my deploy process has worked fine. Today when I go to deploy a new revision, I get stuck at the Deploying... text with a spinning indicator, and it says One or more of the referenced revisions does not yet exist or is deleted. I've tried a number of different images and flags -- all the same.
See Viewing the list of revisions for a service, in order to undo whatever you may have done.
Probably you have the wrong project selected, if it does not know any of the revisions.
I know I provided scant information, but just to follow up with an answer: it looks like the issue was that I was deploying a revision, and then immediately trying to tag it using gcloud alpha run services update-traffic <service_name> --set-tags which looks to have caused some sort of race, where it complained that the revision was not yet deployed, and would hang indefinitely. Moving the set-tag into the gcloud alpha run deploy seemed to fix it.

How to automate the Updating/Editing of Amazon Data Pipeline

I want to use AWS Data Pipeline service and have created some using the manual JSON based mechanism which uses the AWS CLI to create, put and activate the pipeline.
My question is that how can I automate the editing or updating of the pipeline if something changes in the pipeline definition? Things that I can imagine changing could be schedule time, addition or removal of Activities or Preconditions, references to DataNodes, resources definition etc.
Once the pipeline is created, we cannot edit quite a few things as mentioned here in the official doc: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-manage-pipeline-modify-console.html#dp-edit-pipeline-limits
This makes me believe that if I want to automate the updating of pipeline then I would have to delete and re-create/activate a new pipeline? If yes, then the next question is that how can I create a automated process which identifies the previous version's ID, deletes it and creates a new one? Essentially trying to build a release management flow for this where the configuration JSON file is released and deployed automatically.
Most commands like activate, delete, list-runs, put-pipeline-definition etc. take the pipeline-id which is not known until a new pipeline created. I am unable to find anything which remains constant across updates or recreation (the unique-id and name parameters of the createpipeline command are consistent but then I can't use them for the above mentioned tasks (I need pipeline-id for that.
Of course I can try writing shell scripts which grep and search the output and try to create a script but is there any other better way? Some other info that I am missing?
Thanks a lot.
You cannot edit schedules completely or change references so creating/deleting pipelines seems to be the best way for your scenario.
You'll need the pipeline-id to delete a pipeline. Is it not possible to keep a record of that somewhere? You can have a file with the last used id stored locally or in S3 for instance.
Some other ways I can think of are:
If you have only 1 pipeline in the account you can list-pipelines and
use the only result
If you have the pipeline name you can list-pipelines and find the id

Can a Hudson job poll a SCM without pulling code down?

I have a job that I want to run every time a commit is made to a repository. I want to avoid pulling this code down, I only want the notification build trigger. So, is there either a way to not pull down certain repositories in your SCM upon a build or a way to poll things that aren't in the SCM for a build?
you could use a post commit hook to trigger your hudson job.
Since you want to avoid changing SVN, you have to write a job that gets executed every so often (may be every 5 Minutes). This jobs runs a svn command using windows bach or shell script task to get the current revision for the branch in question. You can set the status of the job to unstable if there is a change. Don't use failure because you can't distinguish than between a real failure and a repository change. I think there is a plugin that sets the job status depending on the contents of you output.
You can then use the email extension plugin to send an email every time the revision changes. You can get the revision number from the last (or better the last successful or unstable) job. You can archive a file containing the revision number on the jobs or you can set the description for the job to the revision using the description setter plugin. Have a look at Hudsons remote API for ideas on how to get the information from the previous job.
Since you run your job very often during the day. don't forget to delete old job runs. But I would keep at least two days worth of history, just in case your svn is down for 24 hours.