SAS How to add space in between a string? - sas

I want to generate value for a variable in this format X (XX).
eg: 7 (70)
I've the values in the variables N and PctN_01. SO tried
value = cats(N," (",PctN_01,")");
But its not adding the space. I get the result 7(70). So what do I do?

You can use two 'cat' functions, cats and catx - catx concatenates with a delimiter...
value = catx(" ", N, cats("(", PctN_01, ")")) ;

If you're running a version of SAS prior to version 9 (catx not available) then you can do it along the lines of:
value = N || "(" || PctN_01 || ")";

Related

I need to parse text to find beginning, middle and end of a phrase, replace beginning completely, keep middle, and modify ending

I am converting DB2 SQL to Microsoft TSQL Syntax.
DB2 has a CONCAT function working completely different from TSQL that I need to convert.
Example problems...these are all legitimate syntax examples:
Concat as function
CONCAT ( sample1.ColumnName, sample2.ColumnName)
Converts to:
sample1.ColumnName + sample2.ColumnName
Concat as inline
sample1.ColumnName CONCAT sample2.ColumnName
Converts to
sample1.ColumnName + sample2.ColumnName
Concat Symbol as inline
sample1.ColumnName || sample2.ColumnName
Converts to
sample1.ColumnName + sample2.ColumnName
CONCAT as Simple Composite
CONCAT ( sample1.ColumnName, TRIM(sample2.ColumnName))
Converts to
sample1.ColumnName + TRIM(sample2.ColumnName)
CONCAT as Conversion from %!#^#^&
CONCAT ( sample1.ColumnName, sample3.ColumnName CONCAT TRIM(sample2.ColumnName))
Converts to
sample1.ColumnName + sample3.ColumnName + TRIM(sample2.ColumnName)
CONCAT as the Worse Case Scenario
CONCAT ( CONCAT(sample1.ColumnName, sample4.ColumnName), sample3.ColumnName CONCAT TRIM(sample2.ColumnName))
Converts to
sample1.ColumnName + sample4.ColumnName + sample3.ColumnName + TRIM(sample2.ColumnName)
It would be nice to have recursive regex. I can make multiple calls if only processing the outer or inner instance of the replacement pattern.
The one I'm hung up on the most is the version of CONCAT ( value1, value2 )
I can identify CONCAT (. How do I identify value1, value2 and the correct instance of the closing )?
If I can do that, then I can drop CONCAT ( and ). I can return value1 + "+" + value2.
If that scenario could be defined, I could extrapolate other formulas needed, knowing how to specify the necessary groups.
I feel that I would approach it the same way as you have described in your title. First, remove everything off the front of your string and also any closing parenthesis that are not followed by another closing parenthesis. I would match the following and replace with nothing:
^CONCAT.*CONCAT\s*\( | ^Concat (?:Symbol )?as inline | \)(?!\))
Then, do a second replace to look for commas, the string CONCAT or a set of double vertical pipes || and replace it with a plus sign:
\s*(?: , | CONCAT | \|\| )\s*
Here is a demo in PHP
db2 concat to other.
do while locate('Concat(', mystring) = 0
Convert conat(blah , blah ) to blah || blah
loop
Make sure it still works on db2 and you haven't broke anything.
convert || to +

SAS Retain not working for 1 string variable

The below code doesn't seem to be working for the variable all_s when there is more than 1 record with the same urn. Var1,2,3 work fine but that one doesn't and I cant figure out why. I am trying to have all_s equal to single_var1,2,3 concatenated with no spaces if it's first.urn but I want it to be
all_s = all_s + ',' + single_var1 + single_var2 + single_var3
when it's not the first instance of that urn.
data dataset_2;
set dataset_1;
by URN;
retain count var1 var2 var3 all_s;
format var1 $40. var2 $40. var3 $40. all_s $50.;
if first.urn then do;
count=0;
var1 = ' ';
var2 = ' ';
var3 = ' ';
all_s = ' ';
end;
var1 = catx(',',var1,single_var1);
var2 = catx(',',var2,single_var2);
var3 = catx(',',var3,single_var3);
all_s = cat(all_s,',',single_var1,single_var2,single_var3);
count = count+1;
if first.urn then do;
all_s = cat(single_var1,single_var2,single_var3);
end;
run;
all_s is not large enough to contain the concatenation if the total length of the var1-var3 values within the group exceeds $50. Such a scenario seems likely with var1-var3 being $40.
I recommend using the length function to specify variable lengths. format will create a variable of a certain length as a side effect.
catx removes blank arguments from the concatenation, so if you want spaces in the concatenation when you have blank single_varN you won't be able to use catx
A requirement that specifies a concatenation such that non-blank values are stripped and blank values are a single blank will likely have to fall back to the old school trim(left(… approach
Sample code
data have;
length group 8 v1-v3 $5;
input group (v1-v3) (&);
datalines;
1 111 222 333
1 . 444 555
1 . . 666
1 . . .
1 777 888 999
2 . . .
2 . b c
2 x . z
run;
data want(keep=group vlist: all_list);
length group 8 vlist1-vlist3 $40 all_list $50;
length comma1-comma3 comma $2;
do until (last.group);
set have;
by group;
vlist1 = trim(vlist1)||trim(comma1)||trim(left(v1));
vlist2 = trim(vlist2)||trim(comma2)||trim(left(v2));
vlist3 = trim(vlist3)||trim(comma3)||trim(left(v3));
comma1 = ifc(missing(v1), ' ,', ',');
comma2 = ifc(missing(v2), ' ,', ',');
comma3 = ifc(missing(v3), ' ,', ',');
all_list =
trim(all_list)
|| trim(comma)
|| trim(left(v1))
|| ','
|| trim(left(v2))
|| ','
|| trim(left(v3))
;
comma = ifc(missing(v3),' ,',',');
end;
run;
Reference
SAS has operators and multiple functions for string concatenation
|| concatenate
cat concatenate
catt concatenate, trimming (remove trailing spaces) of each argument
cats concatenate, stripping (remove leading and trailing spaces) of each argument
catx concatenate, stripping each argument and delimiting
catq concatenate with delimiter and quote arguments containing the delimiter
From SAS 9.2 documentation
Comparisons
The results of the CAT, CATS, CATT, and CATX functions are usually equivalent to results that are produced by certain combinations of the concatenation operator (||) and the TRIM and LEFT functions. However, the default length for the CAT, CATS, CATT, and CATX functions is different from the length that is obtained when you use the concatenation operator. For more information, see Length of Returned Variable.
Note: In the case of variables that have missing values, the concatenation produces different results. See Concatenating Strings That Have Missing Values.
Some example data would be helpful, but I'm going to give it a shot and ask you to try
all_s = cat(strip(All_s),',',single_var1,single_var2,single_var3);

Dynamic regexprep in MATLAB

I have the following strings in a long string:
a=b=c=d;
a=b;
a=b=c=d=e=f;
I want to first search for above mentioned pattern (X=Y=...=Z) and then output like the following for each of the above mentioned strings:
a=d;
b=d;
c=d;
a=b;
a=f;
b=f;
c=f;
d=f;
e=f;
In general, I want all the variables to have an equal sign with the last variable on the extreme right of the string. Is there a way I can do it using regexprep in MATLAB. I am able to do it for a fixed length string, but for variable length, I have no idea how to achieve this. Any help is appreciated.
My attempt for the case of two equal signs is as follows:
funstr = regexprep(funstr, '([^;])+\s*=\s*+(\w+)+\s*=\s*([^;])+;', '$1 = $3; \n $2 = $3;\n');
Not a regexp but if you stick to Matlab you can make use of the cellfun function to avoid loop:
str = 'a=b=c=d=e=f;' ; %// input string
list = strsplit(str,'=') ;
strout = cellfun( #(a) [a,'=',list{end}] , list(1:end-1), 'uni', 0).' %'// Horchler simplification of the previous solution below
%// this does the same than above but more convoluted
%// strout = cellfun( #(a,b) cat(2,a,'=',b) , list(1:end-1) , repmat(list(end),1,length(list)-1) , 'uni',0 ).'
Will give you:
strout =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
Note: As Horchler rightly pointed out in comment, although the cellfun instruction allows to compact your code, it is just a disguised loop. Moreover, since it runs on cell, it is notoriously slow. You won't see the difference on such simple inputs, but keep this use when super performances are not a major concern.
Now if you like regex you must like black magic code. If all your strings are in a cell array from the start, there is a way to (over)abuse of the cellfun capabilities to obscure your code do it all in one line.
Consider:
strlist = {
'a=b=c=d;'
'a=b;'
'a=b=c=d=e=f;'
};
Then you can have all your substring with:
strout = cellfun( #(s)cellfun(#(a,b)cat(2,a,'=',b),s(1:end-1),repmat(s(end),1,length(s)-1),'uni',0).' , cellfun(#(s) strsplit(s,'=') , strlist , 'uni',0 ) ,'uni',0)
>> strout{:}
ans =
'a=d;'
'b=d;'
'c=d;'
ans =
'a=b;'
ans =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
This gives you a 3x1 cell array. One cell for each group of substring. If you want to concatenate them all then simply: strall = cat(2,strout{:});
I haven't had much experience w/ Matlab; but your problem can be solved by a simple string split function.
[parts, m] = strsplit( funstr, {' ', '='}, 'CollapseDelimiters', true )
Now, store the last part of parts; and iterate over parts until that:
len = length( parts )
for i = 1:len-1
print( strcat(parts(i), ' = ', parts(len)) )
end
I do not know what exactly is the print function in matlab. You can update that accordingly.
There isn't a single Regex that you can write that will cover all the cases. As posted on this answer:
https://stackoverflow.com/a/5019658/3393095
However, you have a few alternatives to achieve your final result:
You can get all the values in the line with regexp, pick the last value, then use a for loop iterating throughout the other values to generate the output. The regex to get the values would be this:
matchStr = regexp(str,'([^=;\s]*)','match')
If you want to use regexprep at any means, you should write a pattern generator and a replace expression generator, based on number of '=' in the input string, and pass these as parameters of your regexprep func.
You can forget about Regex and Split the input to generate the output looping throughout the values (similarly to alternative #1) .

Why is max number ignoring two-digit numbers?

At the moment I am saving a set of variables to a text file. I am doing following to check if my code works, but whenever I use a two-digit numbers such as 10 it would not print this number as the max number.
If my text file looked like this.
tom:5
tom:10
tom:1
It would output 5 as the max number.
name = input('name')
score = 4
if name == 'tom':
fo= open('tom.txt','a')
fo.write('Tom: ')
fo.write(str(score ))
fo.write("\n")
fo.close()
if name == 'wood':
fo= open('wood.txt','a')
fo.write('Wood: ')
fo.write(str(score ))
fo.write("\n")
fo.close()
tomL2 = []
woodL2 = []
fo = open('tom.txt','r')
tomL = fo.readlines()
tomLi = tomL2 + tomL
fo.close
tomLL=max(tomLi)
print(tomLL)
fo = open('wood.txt','r')
woodL = fo.readlines()
woodLi = woodL2 + woodL
fo.close
woodLL=max(woodLi)
print(woodLL)
You are comparing strings, not numbers. You need to convert them into numbers before using max. For example, you have:
tomL = fo.readlines()
This contains a list of strings:
['tom:5\n', 'tom:10\n', 'tom:1\n']
Strings are ordered lexicographically (much like how words would be ordered in an English dictionary). If you want to compare numbers, you need to turn them into numbers first:
tomL_scores = [int(s.split(':')[1]) for s in tomL]
The parsing is done in the following way:
….split(':') separates the string into parts using a colon as the delimiter:
'tom:5\n' becomes ['tom', '5\n']
…[1] chooses the second element from the list:
['tom', '5\n'] becomes '5\n'
int(…) converts a string into an integer:
'5\n' becomes 5
The list comprehension [… for s in tomL] applies this sequence of operations to every element of the list.
Note that int (or similarly float) are rather picky about what it accepts: it must be in the form of a valid numeric literal or it will be rejected with an error (although preceding and trailing whitespace is allowed). This is why you need ….split(':')[1] to massage the string into a form that it's willing to accept.
This will yield:
[5, 10, 1]
Now, you can apply max to obtain the largest score.
As a side-note, the statement
fo.close
will not close a file, since it doesn't actually call the function. To call the function you must enclose the arguments in parentheses, even if there are none:
fo.close()

Regex: How to remove English words from sentences using Regex?

I've number of rows in SQLite, each row has one column that contains data like this:
prosperکامیاب شدن ، موفق شدن ، رونق یافتن
As you can see, the sentence starts with English words, Now I want to remove English words at first of each sentence. Is there any way to do that via T-SQL query(using Regex)?
you may try this :) I have made it as a function to call upon
create function dbo.RemoveEngChars (#Unicode_string nvarchar(max))
returns nvarchar(max) as
begin
declare #i int = 1; -- must start from 1, as SubString is 1-based
declare #OriginalString nvarchar(100) = #Unicode_string collate SQL_Latin1_General_Cp1256_CS_AS
declare #ModifiedString nvarchar(100) = N'';
while #i <= Len(#OriginalString)
begin
if SubString(#OriginalString, #i, 1) not like '[a-Z]'
begin
set #ModifiedString = #ModifiedString + SubString(#OriginalString, #i, 1);
end
set #i = #i + 1;
end
return #ModifiedString
end
--To call the function , you can run the following script and pass the Unicode in N' prefix
select dbo.RemoveEngChars(N'prosperکامیاب شدن ، موفق شدن ، رونق یافتن')