Using SAS to format a string as a substring - sas

I am new to SAS formats.
Say I have a string in the form of NNN.xxx where NNN is a number in the format of z3. and xxx is just some text.
E.g.
001.NUL and 002.ABC
Now can I define a format, fff, such that b = put("&NNN..&xxx.",fff.); returns only the &xxx. part?
I know we can achieve this by using b = substr("&NNN..&xxx.",5,3); but I want to have a format so that I can simply assign the format to a variable and not have to create a new variable out of it.
Thanks in advance.

Probably the only way is to code your own custom character format using SAS/TOOLKIT. It will be much easier to create another variable as you do with substr().

As said, I think this can be achieved through a combination of custom-defined formats and SAS built-in character functions such as CAT, CATX, CATS, and CATT.


Replace string with numerical data in Stata error type mismatch r(109);

I'm attempting to reformat the following data
treatment
text-only
text-only
text-only
text-only
text-only
text-only
text+photo
text+photo
text+photo
text+photo
text+photo
text+photo
text+video
text+video
as binary data (0,1,2)
I use the following code, but receive the following error. What am I doing wrong?
Thanks!
replace treatment = 0 if treatment == "text-only"
type mismatch
r(109);
I take it you are not really looking to create a binary variable (only zeros and ones) but rather just a numeric variable.
The problem you are running into is that you want to replace strings with integers. These are two different data types, and in Stata you can't have a variable with two different data types.
You solved this problem by making the integers strings. This solves one problem, but could create others. The best way to solve your problem is to use encode.
This is a simple way to preserve the original variable and create a new numeric variable based on the original string. Your code would look like this:
encode treatment, gen(id_treatment)
This will give you a new variable called id_treatment that is numeric but has value labels that correspond to the original strings. Note that encode numbers the categories 1, 2, 3 in alphabetical order of the strings, not 0, 1, 2. You will also still have the original string variable if you need it.
Figured it out!
replace treatment = "0" if treatment == "text-only"
This works.

sqlite3 multiple string replacement

I'm looking for a way to do multiple string replacements for language localization.
I have a field containing text with multiple placeholders, like this:
"Set Point for Temperatur is changed from {%} ℃ to {%} ℃"
I want to replace each {%} with a value from another field in the same row. I know SQLite has a replace function, but that replaces all the placeholders at once, and I want the first {%} to get its value from a different field than the second {%}. Is that possible? I can do it programmatically in C++ or PHP, but it would be nice to have a solution inside the database.
Thanks and Regards
If you can live with a different placeholder, you can use printf():
select printf('%s, %s!', 'Hello', 'world');
Or, if it is guaranteed that none of printf's %x placeholders occur in your strings, you can replace {%} with %s before applying printf.
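A minimal sketch of that second suggestion, run through Python's built-in sqlite3 module only so that it is easy to try. The table name messages and the columns template, old_value and new_value are hypothetical, and the template is assumed to contain no other % characters:

```python
import sqlite3

# In-memory database with a hypothetical table layout (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (template TEXT, old_value TEXT, new_value TEXT)")
conn.execute(
    "INSERT INTO messages VALUES (?, ?, ?)",
    ("Set point changed from {%} ℃ to {%} ℃", "20", "22"),
)

# replace() turns every {%} into %s; printf() then fills the %s slots
# left to right with the remaining arguments.
row = conn.execute(
    "SELECT printf(replace(template, '{%}', '%s'), old_value, new_value) FROM messages"
).fetchone()
localized = row[0]
print(localized)
```

The SELECT statement is the part that matters; it should work the same from C++ or PHP as long as the linked SQLite version provides printf().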

Parsing pipes from a line using expression

I have data that looks like this:
A|B|CC|DD|EE|FF|GG
Is there any way I can parse the string to output the values between the pipe separators? Can someone give me some examples?
e.g.
A is the value before the first pipe
B is the value before the second pipe
etc.
etc.
It's possible within an Expression Transformation, but very inconvenient. You need to use the INSTR and SUBSTR functions, as indicated by @Vikas.
What you can also try is Java Transformation or...
A trick: how about dumping this (i.e. the string along with some key value) to a file prior to processing the dataset. And then use an additional Source Qualifier with Column delimiter set to "|" to do all the dirty work for you? Then you can join it all back together using a Joiner Transformation and the key value dumped to the file.
You can use a combination of INSTR and SUBSTR, or the REG_ functions. Thanks!
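The INSTR/SUBSTR idea is to find the separator positions and cut out the text between them. The following is only an illustration of that logic in Python, not Informatica expression syntax; nth_field is a hypothetical helper name:

```python
def nth_field(text, n, sep="|"):
    """Return the n-th (1-based) field of text, or "" if there are fewer fields."""
    start = 0
    # Skip past the first n-1 separators (the INSTR step).
    for _ in range(n - 1):
        pos = text.find(sep, start)
        if pos == -1:
            return ""
        start = pos + len(sep)
    # The field ends at the next separator or at the end of the string.
    end = text.find(sep, start)
    if end == -1:
        end = len(text)
    # Cut out the field (the SUBSTR step).
    return text[start:end]


line = "A|B|CC|DD|EE|FF|GG"
fields = [nth_field(line, i) for i in range(1, 8)]
print(fields)
```

In an expression this corresponds to one output port per field, each locating its own pair of separator positions, which is why the approach gets inconvenient as the number of fields grows.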

How do I use numeric functions to correct date typos?

I know it's easy enough to do manual corrections on date typos, but I want to automate such corrections using one or more SAS functions, given that my dataset is large and typos are frequent.
For instance, it seems that whoever created the dataset I am cleaning often transposed digits in the year of someone's birthdate (e.g., '2102' rather than '2012', '2110' instead of '2010', etc.). I'm aware of string functions such as INDEX() that find certain character values or strings and then allow for the replacement of said characters in the same position (i.e., replace "ABCD" with "ABBB", regardless of the string's location in a value). Can the same process be replicated with numeric (and specifically date) values?
I don't think SAS has any functions that would check numeric values for digit patterns. I often do data cleaning and address this issue by making a character variable out of the numeric date variable, then using character functions and Perl regex to clean the character values, and then storing the cleaned values as numeric date.
For specifically date values, you could try using SAS date functions (e.g. DAY(), MONTH(), YEAR(), MDY(), etc.) to extract parts of the date value, error-check them, and put them all back together into a date value. This could be a good quick solution if you expect a limited set of typos and you roughly know what they are. For a more thorough error check, converting the numeric values to character and using char or regex functions would give you more options.
The only really concise suggestion I can imagine is using mdy (assuming these are date, not datetime, variables).
For example:
data want;
  set have;
  if year(datevar) > 2100 then
    datevar = mdy(month(datevar), day(datevar), year(datevar) - 90);
run;
would correct any '2104' to '2014'. That's a very simple correction (and may well do as much harm as good, since '2114' is also a possible typo), but things along those lines - break the date up into its pieces, verify the pieces, reconstruct using mdy.
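The same break-up, verify, reconstruct idea can be sketched outside SAS. The Python below is only an illustration of the logic; the function name fix_future_year and the default cutoff and shift are assumptions mirroring the 2100 and 90 used above:

```python
from datetime import date


def fix_future_year(d, max_year=2100, shift=90):
    """Shift implausibly late years back by `shift`; return None if that is impossible."""
    if d.year <= max_year:
        return d  # year looks plausible, leave the date alone
    try:
        # Rebuild the date from its pieces with the corrected year.
        return d.replace(year=d.year - shift)
    except ValueError:
        # For example 29 Feb 2104 has no counterpart in 2014: flag for manual review.
        return None


print(fix_future_year(date(2104, 3, 15)))
print(fix_future_year(date(2012, 6, 1)))
print(fix_future_year(date(2104, 2, 29)))
```

Returning None instead of guessing keeps the dates that cannot be reconstructed visible for a manual check.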

ICU Custom Currency Formatting (C++)

Is it possible to custom format currency strings using the ICU library, similar to the way it lets you format time strings by providing a format string (e.g. "mm/dd/yyyy")?
So that for a given locale (say USD), if I wanted, I could have all currency strings come back as "xxx.00 $ USD".
See http://icu-project.org/apiref/icu4c/classDecimalFormat.html,
Specifically: http://icu-project.org/apiref/icu4c/classDecimalFormat.html#aadc21eab2ef6252f25eada5440e3c65
For pattern syntax see: http://icu-project.org/apiref/icu4c/classDecimalFormat.html#_details
I haven't used this myself, but from my knowledge of ICU this is the right direction.
However I would suggest to use:
http://icu-project.org/apiref/icu4c/classNumberFormat.html and its createCurrencyInstance member, and then use setMaximumIntegerDigits or other functions to get what you need; that would be much better localized. Try not to assume anything about any culture, because "10,000 USD" may be misinterpreted as "$ 10" in some countries where "," is used as the decimal separator.
So be careful.
You can create a currency instance, then check whether it is safe to cast it to a DecimalFormat:
if (((const NumberFormat*)fmt)->getDynamicClassID() == DecimalFormat::getStaticClassID())
{ DecimalFormat* df = (DecimalFormat*) fmt; /* non-const, so applyPattern can be called */ ...
… then you can call applyPattern on it. See the information on ¤, ¤¤, ¤¤¤ under 'special pattern chars'
Use the ICU library's createCurrencyInstance().