SAS DI Stop job if dataset is populated - sas

I'm quite new to SAS and really can't get my head around its code, so I'm asking here for help.
I have a job that reads an external CSV file, and a macro created by a colleague that validates the data in this file and writes error messages to a work table.
What I'd like to do, either in the precode of the file reader or in a separate user-written code transformation, is read the work table, check whether any observations exist, and abort the job if they do. From googling, and between here and the SAS community, I can find how to read a dataset and count observations, but I'm having real difficulty figuring out how to implement it, so any guidance would be really appreciated.
Can anyone please help me on this?
Thanks

Related

I want to ask questions in terms of PSM code for stata

I'm having trouble understanding code for PSM (propensity score matching) with time-varying DID in Stata.
I found the code online but can't follow it.
I don't have the data on hand, but I'm wondering how the author generated the variable "myblock", and which variable I should put in place of "comsup" when I run my own code.
I'm also wondering how many variables are appropriate for "xlist".
Thank you so much for the help and support!

ClientError: Unable to parse csv: rows 1-1000, file

I've looked at the other answers to this issue and none of them are helping me. I am trying to run a simple random cut forest algorithm. I have a small data set of IPs which have been stripped down to only have numbers. I still get this error. It only has one column of these numbers. The CSV looks like this:
176162144
176862141
176762141
176761141
176562141
Have you looked at this sample notebook, and tried using it with your own data?
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/random_cut_forest/random_cut_forest.ipynb
In a nutshell, it reads the CSV file with Pandas and trains the model like this:
from sagemaker import RandomCutForest

rcf = RandomCutForest(role=execution_role,
                      train_instance_count=1,
                      train_instance_type='ml.m4.xlarge',
                      data_location='s3://{}/{}/'.format(bucket, prefix),
                      output_path='s3://{}/{}/output'.format(bucket, prefix),
                      num_samples_per_tree=512,
                      num_trees=50)
# automatically upload the training data to S3 and run the training job
# (to_numpy() replaces as_matrix(), which was removed in pandas 1.0)
rcf.fit(rcf.record_set(taxi_data.value.to_numpy().reshape(-1, 1)))
You didn't say what your use case was, but as you're working with IP addresses, you may find the IP Insights built-in algorithm useful too: https://docs.aws.amazon.com/sagemaker/latest/dg/ip-insights.html
I was using the sample notebook Julien Simon mentioned earlier, but at some point the data was ending up as strings! The funny thing about RCF is that it needs numeric data, and strings won't parse.
What I did was explicitly cast the array to an int array as a double check, and voilà! It worked. I am at a loss over how the data ended up in a string format, but that was the issue. Simple solution.
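For anyone hitting the same thing, here is a minimal sketch of that cast, not the exact code from the notebook. It assumes a single-column CSV like the one in the question; `csv_text` and `load_single_column` are made-up names for illustration. The values are read as strings on purpose to mimic the problem, then cast explicitly and reshaped to one feature per row, which is the shape passed to `record_set` above.

```python
import io

import numpy as np

# Hypothetical stand-in for the single-column CSV shown in the question.
csv_text = "176162144\n176862141\n176762141\n176761141\n176562141\n"


def load_single_column(source):
    """Read a one-column CSV as strings, then cast explicitly to integers."""
    # Read the values as strings, mimicking data that lost its numeric type.
    raw = np.loadtxt(source, dtype=str, ndmin=1)
    # The explicit cast raises ValueError if any row is not a valid integer.
    values = raw.astype(np.int64)
    # One row per record, one feature per row.
    return values.reshape(-1, 1)


data = load_single_column(io.StringIO(csv_text))
print(data.dtype, data.shape)
```

If the cast raises a ValueError, the offending value is named in the message, which is usually enough to spot a stray header or blank field.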

Extraction of Mainframe datasets used in the jobs which are scheduled in TWS

I am looking to extract the list of datasets used in the jobs that are scheduled in TWS. Can you please help me with this?
Here is a sample of the output I am after:
owner name ,Jobname ,dataset name
CAXXXXXXX PADLSHX EBEU.XXXX
The jobs can be either plain JCL or JCL with SAS. Please let me know if there is any program that produces the output above.

how to read records from ESE database using cpp

I have opened the ESE database successfully using the JetOpenDatabase API.
To read the records, I opened the "MSysObjects" table and set the current index to "RootObjects".
Here's my code (without error-handling):
err = ::JetOpenTable(sessionID,dbID,"MSysObjects",NULL,0,0,&tableId);
err = ::JetSetCurrentIndex( sessionID, tableId, "RootObjects" );
err = ::JetMove( sessionID, tableId, JET_MoveFirst, 0 );
To read the records, I tried the JetRetrieveColumns function to retrieve multiple column values from the current record. I also tried the JetRetrieveColumn function, but I didn't get the expected result.
Does anyone know how to read the records from existing, unmounted ESE database files using C++?
The esent engine gives you a hint of what went wrong by the error code. Look it up here:
https://msdn.microsoft.com/en-us/library/gg269297(v=exchg.10).aspx
In general, you have to prepare the JET_RETRIEVECOLUMN structures before you actually try to read the data via JetRetrieveColumn(s): select which columns you want to retrieve, set up the buffers and pointers, and so on. Of course there's more to it, but you should be a bit more specific in your question.
Yes, Fotis gives good advice. The specific error codes are very valuable. Since you're looking for example code, some of the more comprehensive example code is written in C#.
Take a look at the EsentInteropTests at https://managedesent.codeplex.com/SourceControl/latest. Search for RetrieveColumn, and it will give you a good idea of which functions to call in which order. Sure, it's not the right language, but you can easily translate.
I presume you're using MSysObjects as an example because every database has that table. It's for internal use, and can be fairly cryptic to decipher.
-martin

Finding and debugging bad record using hive

Is there any way to pinpoint a bad record when loading data with Hive, or while processing the data?
The scenario goes like this.
Suppose I have a file with 1 million records, delimited by the '|' symbol, that needs to be loaded as a table using Hive.
Suppose that after processing half a million records I hit a problem. Is there any way to debug it, or to pinpoint exactly which record or records are causing the issue?
If you are not clear about my question please let me know.
I know MapReduce can skip bad records (as a kind of percentage). I would like to know how to do this from the Hive side.
Thanks In Advance.
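One way to narrow this down before Hive is involved is to scan the raw file and flag rows whose field count does not match the table definition. The sketch below is only a pre-check outside Hive, not a Hive feature. It assumes a plain text file with one record per line and no escaped or quoted delimiters; `find_bad_records`, `expected_fields` and the sample rows are made up for illustration.

```python
import io


def find_bad_records(lines, expected_fields, delimiter="|"):
    """Yield (line_number, field_count, text) for rows with the wrong field count."""
    for line_number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        field_count = len(text.split(delimiter))
        if field_count != expected_fields:
            yield line_number, field_count, text


# Hypothetical sample: rows 2 and 4 do not have the expected 3 fields.
sample = io.StringIO("1|alice|30\n2|bob\n3|carol|41\n4|dave|52|extra\n")
bad = list(find_bad_records(sample, expected_fields=3))
for line_number, field_count, text in bad:
    print(f"line {line_number}: {field_count} fields: {text}")
```

For a real file, pass an open file handle instead of the `io.StringIO` sample. The function reads line by line, so a million records does not need to fit in memory.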