I am pulling survey participants' comments out of a dataset using prxmatch in SAS EG 7.1. A lot of comments start with "No": sometimes just "No" on its own, sometimes "No problems" and so on. I only want to filter these out when the string starts with "no", not when "no" appears elsewhere in the string, so I included the ^ metacharacter. I then noticed that some comments containing "not" were being filtered out, so I added \b after "no" for the word boundary. However, it is still filtering out strings that have "not" or "no" anywhere within them.
Examples of what should match:
"No problems" "No"
Examples of what should NOT match:
"His name is STEVE not SPEVE" "I was mostly fine but I was not expecting to get a headache"
How do I stop this from happening? I've included my code, any help would be great.
data cclhd hnelhd islhd nbmlhd seslhd swslhd slhd wslhd mnclhd nnswlhd wnswlhd;
set work.schools_dataset;
where Comments ne " ";
if prxmatch ("m/^no\b|^nil|^none|^nop|all good|na|n\/a|n\.a/i",Comments) = 0 ;
keep ParticipantID FirstName Mobile VaxDate OperationID Venue Comments;
if operationid=108 then output work.cclhd;
else if operationid=109 then output work.hnelhd;
else if operationid=110 then output work.islhd;
else if operationid=111 then output work.nbmlhd;
else if operationid=113 then output work.seslhd;
else if operationid=114 then output work.swslhd;
else if operationid=115 then output work.slhd;
else if operationid=116 then output work.wslhd;
else if operationid=118 then output work.mnclhd;
else if operationid=120 then output work.nnswlhd;
else if operationid=122 then output work.wnswlhd;
run;
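In your pattern the ^ only anchors the alternatives it is written in front of. The alternatives all good, na, n\/a and n\.a are unanchored, so they can match anywhere in the string; for example, na matches inside "name". Try testing with only the anchored "no":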
data cclhd hnelhd islhd nbmlhd seslhd swslhd slhd wslhd mnclhd nnswlhd wnswlhd;
set work.schools_dataset;
where Comments ne " ";
if prxmatch("/^no\b/i",comments) = 0;
keep ParticipantID FirstName Mobile VaxDate OperationID Venue Comments;
if operationid=108 then output work.cclhd;
else if operationid=109 then output work.hnelhd;
else if operationid=110 then output work.islhd;
else if operationid=111 then output work.nbmlhd;
else if operationid=113 then output work.seslhd;
else if operationid=114 then output work.swslhd;
else if operationid=115 then output work.slhd;
else if operationid=116 then output work.wslhd;
else if operationid=118 then output work.mnclhd;
else if operationid=120 then output work.nnswlhd;
else if operationid=122 then output work.wnswlhd;
run;
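If the other alternatives are still needed, one option is to group them so that a single ^ covers all of them. Below is a minimal sketch in Python rather than SAS, so treat it as an illustration of the pattern only. The names NO_COMMENT and is_no_comment are made up for the example, and the \b after na is my own addition so that a comment starting with a word like "nausea" is kept. PRX uses Perl-style syntax, so the same grouping should carry over to prxmatch, but I have not run it in SAS.

```python
import re

# Assumed Python translation of the PRX pattern. One non-capturing group
# keeps every alternative under the single ^ anchor.
NO_COMMENT = re.compile(
    r"^(?:no\b|nil|none|nop|all good|na\b|n/a|n\.a)",
    re.IGNORECASE,
)


def is_no_comment(comment: str) -> bool:
    """Return True when the comment starts with a 'nothing to report' form."""
    return NO_COMMENT.search(comment) is not None


# The two examples that should match, then the two that should not.
for text in [
    "No problems",
    "No",
    "His name is STEVE not SPEVE",
    "I was mostly fine but I was not expecting to get a headache",
]:
    print(f"{text!r}: {is_no_comment(text)}")
```

Rows where is_no_comment returns True are the ones to drop, which corresponds to keeping rows where prxmatch returns 0.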
I have the following block of code that I thought should be able to classify a user's input's data type in a Stata .do file:
capture program drop smth
program define smth
di "Enter smth: " _request(smth1)
local type = substr("`: type $smth1 '", 1, 3)
if "`type'" == "str" {
di "It is a string!"
}
else if "`type'" == "flo" {
di "It is a float!"
}
else if "`type'" == "int" {
di "It is an integer!"
}
else {
di "it is not a string, float nor integer!"
}
end
However, when I executed the .do file (trialscript is the name of the .do file) in a Stata command prompt with the user input, "hello", I encountered the following error:
. do trialscript
. capture program drop smth
. program define smth
1. di "Enter smth: " _request(smth1)
2. local type = substr("`: type $smth1 '", 1, 3)
3. if "`type'" == "str" {
4. di "It is a string!"
5. }
6. else if "`type'" == "flo" {
7. di "It is a float!"
8. }
9. else if "`type'" == "int" {
10. di "It is an integer!"
11. }
12. else {
13. di "it is not a string, float nor integer!"
14. }
15. end
.
.
end of do-file
. smth
Enter smth: . hello
no variables defined
it is not a string, float nor integer!
What the user enters given your code is put into a global macro, which is not a variable in Stata's sense, as a variable is (only) a column of data in a dataset. The type syntax you used works only with variables.
All global macros that are defined are strings. The programmer and user can think they contain numbers if and only if their content can be used numerically.
A test of whether the input is numeric is to try something numeric, e.g.
capture di 1 + $smth1
if _rc di "it is a string"
else di "it is a number"
This is not quite fail-safe, as a string might contain the name of a numeric variable or scalar, in which case the operation should work.
A test of whether a global macro contains a string that can be interpreted as an integer is to check whether floor($smth1) == $smth1 or equivalently that ceil() and round() return the input value.
There is no sense in which a global macro or its contents can be a float or int, except by trying whether such a variable would accept the contents as a value.
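The "try something numeric, then compare with floor()" idea is not specific to Stata. Here is a rough sketch of the same logic in Python, only to illustrate the reasoning; it is not Stata code, and the function name classify is made up for the example.

```python
import math


def classify(text: str) -> str:
    """Classify raw user input as 'integer', 'number' or 'string'."""
    try:
        value = float(text)  # the "try something numeric" step
    except ValueError:
        return "string"
    # Same idea as checking floor(x) == x; non-finite values are not integers.
    if math.isfinite(value) and math.floor(value) == value:
        return "integer"
    return "number"


for raw in ["hello", "42", "2.5"]:
    print(raw, "->", classify(raw))
```

As with the macro test, "3.0" counts as an integer here because the check is on the value, not on how it was typed.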
Stata's terminology here is that of many statistical programs in which a variable is a column in a dataset. It comes as a surprise to many of those who started with a mainstream programming language, as I did myself. More at https://www.stata.com/statalist/archive/2008-08/msg01258.html
The kind of input you are programming is now unusual in Stata.
I'm currently using Crystal Reports 2013 to run reports. I'm having an issue with a formula that needs to look at an SAP order status and only print out a specific few. The SAP Order Status field is made up of 2 sections.
Section 1: 'A' 'B' 'C' 'D' (Only a single selection is pulled from this list)
Section 2: 'E' 'F' 'G' 'H' (This can have multiple selections within the Status)
Example Order #1111 Status: "A; F; G"
I currently have a formula that pulls the status of an order from the 1st Section.
if (isnull({user_status}) or
{user_status}=" " or
not ({user_status} like ["*A*", "*B*", "*C*", "*D*"])) then "N/A" else
if {user_status} like "*A*" then "A" else
if {user_status} like "*B*" then "B" else
if {user_status} like "*C*" then "C" else
if {user_status} like "*D*" then "D"
The above snippet would only bring in "A" for Order #1111 and omit "F" & "G".
I need assistance with a formula that would omit "A" and list out both "F" & "G".
I've tried the following:
if (isnull({user_status}) or
{user_status}=" " or
not ({user_status} like ["*E*", "*F*", "*G*", "*H*"])) then "N/A" else
if {user_status} like "*E*" then "E" else
if {user_status} like "*F*" then "F" else
if {user_status} like "*G*" then "G" else
if {user_status} like "*H*" then "H"
But that formula just returns the full "A; F; G" status.
Figured out the answer. (6/9/2020) I used the following code to create an array and loop through it pulling out only what I needed.
NumberVar Counter;
StringVar finalStatus:= "";
StringVar array statusList;
statusList:= split({user_status},";");
//Loop through array and only print out Status w/o No values
FOR Counter := 1 to UBound(statusList) DO
(if statusList[Counter] like ["E", "F", "G", "H"] then
finalStatus:= finalStatus+statusList[Counter]+";" else "");
finalStatus
Thanks for the input.
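For anyone who wants to check the split-and-filter logic outside Crystal Reports, here is a small sketch of the same idea in Python. The function name second_section is made up, and stripping whitespace from each piece is an assumption on my part, based on the status looking like "A; F; G" with a space after each semicolon.

```python
def second_section(status: str, wanted=("E", "F", "G", "H"), sep=";") -> str:
    """Keep only the Section 2 codes from a status string such as 'A; F; G'."""
    # Split on the separator and trim each piece (assumes optional spaces).
    parts = [piece.strip() for piece in status.split(sep)]
    # Exact comparison against the wanted codes, like the loop over the array.
    return sep.join(piece for piece in parts if piece in wanted)


print(second_section("A; F; G"))
```

Unlike the formula above, this joins the kept codes without a trailing separator.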
I have written Perl code for validating GSTIN Number which is related to India’s tax according to the following rules:
The first two digits represent the state code as per Indian Census 2011. Every state has a unique code.
The next ten digits will be the PAN number of the taxpayer
The thirteenth digit will be assigned based on the number of registration within a state
The fourteenth digit will be Z by default
The last digit will be for check code. It may be an alphabet or a number.
Following is the code:
my $gst_number_input = '35AABCS1429B1AX';
my $gst_number_character_count = length($gst_number_input);
my $gst_validation =~ /\d{2}[A-Z]{5}\d{4}[A-Z]{1}[A-Z\d]{1}[Z]{1}[A-Z\d]{1}/;
if ($gst_number_character_count == 15 && $gst_number_input =~ $gst_validation) {
print "GST Number is valid";
} else {
print "Invalid GST Number";
}
I have an invalid GSTIN input entered in the code. So when I run the script, I get:
GST Number is valid
Instead I should get the error because the GSTIN input is invalid:
Invalid GST Number
Can anyone please help ?
Thanks in advance
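In this part you are using =~ where it should be an equals sign =. As written, $gst_validation is never assigned, so the later test effectively matches against an empty pattern, which succeeds for any input: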
my $gst_validation =~ /\d{2}[A-Z]{5}\d{4}[A-Z]{1}[A-Z\d]{1}[Z]{1}[A-Z\d]{1}/;
If you want to use it as a variable, you could use qr.
Note that you can omit {1} from the pattern, and you don't need the square brackets around [Z].
Your code might look like
my $gst_number_input = '35AABCS1429B1AX';
my $gst_number_character_count = length($gst_number_input);
my $gst_validation = qr/\d{2}[A-Z]{5}\d{4}[A-Z][A-Z\d]Z[A-Z\d]/;
if ($gst_number_character_count == 15 && $gst_number_input =~ $gst_validation) {
print "GST Number is valid";
} else {
print "Invalid GST Number";
}
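As a cross-check of the pattern itself, here is the same validation sketched in Python. The names GSTIN_PATTERN and is_valid_gstin are made up for the example. Using fullmatch makes the separate length check unnecessary, since the pattern consumes exactly 15 characters. This only checks the layout described in the rules; it does not verify the state code or compute the check character.

```python
import re

# Layout: 2 digits, 5 letters, 4 digits, 1 letter, 1 alphanumeric,
# a literal Z, then 1 alphanumeric check character.
GSTIN_PATTERN = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][A-Z0-9]Z[A-Z0-9]")


def is_valid_gstin(value: str) -> bool:
    """Return True when the whole string has the 15-character GSTIN layout."""
    return GSTIN_PATTERN.fullmatch(value) is not None


for candidate in ["35AABCS1429B1AX", "35AABCS1429B1ZX"]:
    label = "valid" if is_valid_gstin(candidate) else "invalid"
    print(candidate, label)
```

The first input is the one from the question; its fourteenth character is A rather than Z, so it is rejected.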
I know next to nothing about regular expressions and after reading tutorials on several sites, still know next to nothing. I want a regular expression that will validate/enforce that a film rating is one of G, PG-13, R or NC-17.
You didn't specify the programming language you're using, but you don't need a regex, i.e.:
for python:
ratings = ['G', 'PG-13', 'R', 'NC-17']
rating = "PG-13"
if rating in ratings:
    print("rating exists")
else:
    print("no match")
for php:
$ratings = array('G','PG-13','R','NC-17');
$rating = 'PG-13';
if (in_array($rating, $ratings))
{
print "rating exists";
} else {
print "no match";
}
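If you do want the regex version anyway, it is just an alternation of the four literals matched against the whole string. A minimal sketch in Python, with made-up names RATING and is_valid_rating:

```python
import re

# The four ratings from the question, matched against the whole string.
RATING = re.compile(r"G|PG-13|R|NC-17")


def is_valid_rating(rating: str) -> bool:
    """Return True only for exactly G, PG-13, R or NC-17."""
    return RATING.fullmatch(rating) is not None


for candidate in ["PG-13", "PG", "NC-17"]:
    print(candidate, is_valid_rating(candidate))
```

In flavours without a full-match function, the usual equivalent is to anchor the group, i.e. ^(?:G|PG-13|R|NC-17)$.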
Solved. See footnote.
/*check regex*/
go = 1;
i = 1;
do while (go = 1);
set braw.regex point = i;
if (upcase(fname) = upcase("&var.")) then do;
put format1 " one"; /*format1 is a field of braw.regex, properties says character length 30*/
if format1 = '/\d{8}/' then put 'hello world one'; else put 'good bye world one';
%check1(&data, format1, &var)
end;
else i = i+1;
end;
/*check1 passes regex, string, true false to check_format*/
%macro check_format(regex, string, truefalse);
pattern = prxparse(&regex.);
truefalse = prxmatch(pattern, &string);
put &regex " " &string " " &truefalse "post";
%mend;
Sorry about the lack of indentation; Stack Overflow seems to be buggy at the moment.
This outputs
/\d{8}/ one
good bye world one
Apparently format1 isn't a string, so it then fails the prxparse, which is looking for a string input.
Any idea what I should do?
I was thinking I could use a macro variable to put quotes around it, perhaps using:
call symput('mymacrovar', format1);
%let mymacrovar = "&mymacrovar";
but that symput does nothing.
Solved:
It was being read as a string. On the CSV file that the regex dataset was being read from, there were additional spaces between the commas, making the string '_/\d{8}/' (underscore denoting a space) which prxparse doesn't like.
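To show how a space after the comma ends up inside the field, here is a small sketch in Python instead of SAS. The column names and sample row are made up, and compile_delimited is a hypothetical helper that mimics a /.../-delimited pattern. In SAS the analogous fix would presumably be to trim the value before parsing, for example with strip(format1), though I have not tested that against this dataset.

```python
import csv
import re

# Hypothetical CSV content with a space after each comma.
lines = ["fname, format1", r"phone, /\d{8}/"]

rows_raw = list(csv.reader(lines))
rows_clean = list(csv.reader(lines, skipinitialspace=True))

print(repr(rows_raw[1][1]))    # the field keeps its leading space
print(repr(rows_clean[1][1]))  # the leading space is dropped by the reader


def compile_delimited(pattern: str) -> re.Pattern:
    """Compile a /.../-style pattern, tolerating surrounding whitespace."""
    pattern = pattern.strip()
    if len(pattern) < 2 or not (pattern.startswith("/") and pattern.endswith("/")):
        raise ValueError(f"not a /.../ pattern: {pattern!r}")
    return re.compile(pattern[1:-1])


print(bool(compile_delimited(rows_raw[1][1]).search("id 12345678")))
```

Either cleaning the CSV at read time or trimming just before compiling works; doing both is harmless.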