SAS Metadata DATA STEP Functions

SAS Metadata DATA STEP Functions - sas

I want to know why is the metadata_getnatr function is used with metadata_resolve function when we are trying to read the metadata through data step functions.
For example: in the code that is covered in the link
Reproduced Here:
Example 1: Using an Object URI
data _null_;
length id $20
type $256;
rc=metadata_resolve("omsobj:Machine?#Name='bluedog'",type,id);
put rc=;
put id=;
put type=;
run;
Example 2: Using a Repository URI
data _null_;
length id $20
type $256
attr $256
value $256;
rc=metadata_resolve("omsobj:RepositoryBase?#Name='myrepos'",type,id);
put rc=;
put id=;
put type=;
n=1;
rc=1;
do while(rc>=0);
rc=metadata_getnatr("omsobj:RepositoryBase?#Name='myrepos'",n,attr,value);
if (rc>=0) then put attr=;
if (rc>=0) then put value=;
n=n+1;
end;
run;
Why are they using getnatr function?
Does the metadata_resolve gives a URI as an Output or what is the output?
Thanks!

I agree - the documentation here could be improved!
The first example is quite clear - provide a URI (metadata query) and return the type and Id to be used for further logic / queries.
The second example demonstrates something which is a bit of an edge case. It is using the REPO namespace (instead of the usual, SAS namespace) to return an object representing the repository (eg FOUNDATION). You may have noticed that you cannot substitue the URI with the ID from the original metadata_resolve function (which you'd expect the example to be demonstrating, as an efficiency thing). According to the documentation, the RepositoryBase subclass inherits it's metadata id, so this may indicate why it cannot be referenced without using a URI.
In any case, to clarify the usage of metadata_resolve:
It's not mandatory to use it in combination with metadata_getnatr
It's useful when you when you want to find out the metadata type returned from a URI
It's useful when you are going to use the same uri in more than one query (so more efficient to turn it into an ID)
SAS will cache your query within the same metadata function, so unless you are going to use the same URI in more than one metadata function, you don't need to use metadata_resolve.

Related

What is AWS S3 dataset?

Looking at documentation of awswrangler.s3.to_csv or awswrangler.s3.to_parquet, there is a dataset parameter.
From testing, it looks like setting dataset=True allows, among other things, to append new data to an already existing set. It also looks like when dataset=True, I can't specify the file name and AWS autogenerates the names for the files which are added to the specified path.
Apart from that, I can't find more information on what dataset means. Is it just referring to the general concept or is there a specific meaning within the context of AWS? What exactly is dataset and when should it be set to True?

The dataset=True option allows you to store the entire dataset, including all metadata, indexes, etc.
The dataset parameter documentation:
dataset (bool) – If True store as a dataset instead of ordinary file(s) If True, enable all follow arguments: partition_cols, mode, database, table, description, parameters, columns_comments, concurrent_partitioning, catalog_versioning, projection_enabled, projection_types, projection_ranges, projection_values, projection_intervals, projection_digits, catalog_id, schema_evolution.
Note all those extra things that get saved when you save a dataset. All that information, like columns_comments, concurrent_partitioning, projection_values, will be lost when you save to CSV or Parquet. But on the other hand, those values are probably only useful if you plan to do further manipulation of the data via awswrangler/pandas at some later date.
Also note that if you set dataset=True you have to give it a file name prefix instead of a single file name, because the output generated will be spread across multiple files.
If you want to use the data in any other tool besides Pandas, such as loading the CSV into Excel, then you most likely want to set dataset=False and output to a single file.

Google Cloud Data Fusion - Dynamic arguments based on functions

Good morning all,
I'm looking in Google Data Fusion for a way to make dynamic the name of a source file stored on GCS. The files to be processed are named according to their value date, example: 2020-12-10_data.csv
My need would be to set the filename dynamically so that the pipeline uses the correct file every day (something like this: ${ new Date(). Getfullyear()... }_data.csv
I managed to use the arguments in runtime by specifying the date as a string (2020-12-10) but not with a function.
And more generally is there any documentation on how to enter dynamic parameters with ready-made or custom "functions" (I couldn't find it)
Thanks in advance for your help.

There is a readymade workaround, you can give a try "BigQuery Execute" plugin.
Steps:
Put below query in SQL
select cast(current_date as string) ||'_data.csv' as filename
--for output '2020-12-15_data.csv'
Row As Arguments to 'true'
Now use the above arguments via ${filename} wherever you want to.

SAS Metadata Objects Details - SASLibraries, PhysicalTables, Jobs

I've used this code to fetch the list of objects for all SAS Libraries, Physical Tables and Jobs.
https://github.com/sasjs/core/blob/master/meta/mm_getobjects.sas
I now need to fetch these objects details,
Like for Libraries - I need their libname and full path,
Teradata Libs - Schema Name,Lib path
Physical Tables - Location and other attribs
Jobs - Location, and other attribs.
I'm not very familiar on how or what attribs can we report, but I definitely need their paths and attribs.
Thank you.

The example you refer to is using proc metadata that returns XML you need to understand and process. The real problem here is you have to learn how to build input XML to construct the metadata query, which is quite a complex thing.
Maybe more straight forward is to use Data step metadata functions like here.
METABROWSE command is useful to understand metadata object relations (if you have access to SAS Foundation), see here

The attributes you are asking for will change depending on the library engine you are checking for.
The following macro will generate a libname for BASE, OLEDB, ODBC and POSTGRES engines (note that the repository has moved):
https://github.com/sasjs/core/blob/main/meta/mm_assigndirectlib.sas
The direct attributes are available as per this answer: How to get details of metadata objects in SAS
Folder path is available as per this answer:
%let metauri=OMSOBJ:PhysicalTable\A5HOSDWY.BE0006N9;
/* get metadata paths */
data ;
length tree_path $500 tree_uri parent_uri parent_name $200;
call missing(tree_path,tree_uri,parent_uri,parent_name);
drop tree_uri parent_uri parent_name rc ;
uri="&metauri";
rc=metadata_getnasn(uri,"Trees",1,tree_uri);
rc=metadata_getattr(tree_uri,"Name",tree_path);
do while (metadata_getnasn(tree_uri,"ParentTree",1,parent_uri)>0);
rc=metadata_getattr(parent_uri,"Name",parent_name);
tree_path=strip(parent_name)||'/'||strip(tree_path);
tree_uri=parent_uri;
end;
tree_path='/'||strip(tree_path);
run;

With stored procedures, is cfSqlType necessary?

To protect against sql injection, I read in the introduction to ColdFusion that we are to use the cfqueryparam tag.
But when using stored procedures, I am passing my variables to corresponding variable declarations in SQL Server:
DROP PROC Usr.[Save]
GO
CREATE PROC Usr.[Save]
(#UsrID Int
,#UsrName varchar(max)
) AS
UPDATE Usr
SET UsrName = #UsrName
WHERE UsrID=#UsrID
exec Usr.[get] #UsrID
Q: Is there any value in including cfSqlType when I call a stored procedure?
Here's how I'm currently doing it in Lucee:
storedproc procedure='Usr.[Save]' {
procparam value=Val(form.UsrID);
procparam value=form.UsrName;
procresult name='Usr';
}

This question came up indirectly on another thread. That thread was about query parameters, but the same issues apply to procedures. To summarize, yes you should always type query and proc parameters. Paraphrasing the other answer:
Since cfsqltype is optional, its importance is often underestimated:
Validation:
ColdFusion uses the selected cfsqltype (date, number, etcetera) to validate the "value". This occurs before any sql is ever sent to
the database. So if the "value" is invalid, like "ABC" for type
cf_sql_integer, you do not waste a database call on sql that was never
going to work anyway. When you omit the cfsqltype, everything is
submitted as a string and you lose the extra validation.
Accuracy:
Using an incorrect type may cause CF to submit the wrong value to the database. Selecting the proper cfsqltype ensures you are
sending the correct value - and - sending it in a non-ambiguous format
the database will interpret the way you expect.
Again, technically you can omit the cfsqltype. However, that
means CF will send everything to the database as a string.
Consequently, the database will perform implicit conversion
(usually undesirable). With implicit conversion, the interpretation
of the strings is left entirely up to the database - and it might
not always come up with the answer you would expect.
Submitting dates as strings, rather than date objects, is a
prime example. How will your database interpret a date string like
"05/04/2014"? As April 5th or a May 4th? Well, it depends. Change the
database or the database settings and the result may be completely
different.
The only way to ensure consistent results is to specify the
appropriate cfsqltype. It should match the data type of the target
column/function (or at least an equivalent type).

SharePoint UserData and the ;# Syntax in returned data

Can a SharePoint expert explain to me the ;# in data returned by the GetListItems() call to the Lists web service?
I think I understand what they are doing here. The ;# is almost like a syntax for making a comment... or better yet, including the actual data (string) and not just the ID. This way you can use either, but they are nicely paired together in the same column.
Am I way off base? I just can't figure out the slighly different use. For example
I have a list with:
ows_Author
658;#Tyndall, Bruno
*in this case the 658 seems to be an ID for me in a users table somewhere*
ows_CreatedDate (note: a custom field. not ows_Created)
571;#2009-08-31 23:41:58
*in this case the 571 seems to be an ID of the row I'm already in. Why the repetition?*
Can anyone out there shed some light on this aspect of SharePoint?

The string ;# is used as a delimiter by SharePoint's lookup fields, including user fields. When working with the object model, you can use SPFieldLookupValue and SPFieldUserValue to convert the delimited string into a strongly-typed object. When working with the web services, however, I believe you'll need to parse the string yourself.
You are correct that the first part is an integer ID: ID in the site user list, or ID of the corresponding item in the lookup list. The second part is the user name or value of the lookup column.
Nicolas correctly notes that this delimiter is also used for other composite field values, including...
SPFieldLookupValueCollection
SPFieldMultiColumnValue
SPFieldMultiChoiceValue
SPFieldUserValueCollection

The SPFieldUser inherits from the SPFieldLookup which uses the ;# notation. You can easily parse the value by creating a new instance of the SPFieldLookupValue class:
string rawValue = "1;#value";
SPFieldLookupValue lookupValue = new SPFieldLookupValue(rawValue);
string value = lookupValue.LookupValue; // returns value

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js