Checking data type of variable derived from user input - stata

I have the following block of code that I thought should be able to classify a user's input's data type in a Stata .do file:
capture program drop smth
program define smth
di "Enter smth: " _request(smth1)
local type = substr("`: type $smth1 '", 1, 3)
if "`type'" == "str" {
di "It is a string!"
}
else if "`type'" == "flo" {
di "It is a float!"
}
else if "`type'" == "int" {
di "It is an integer!"
}
else {
di "it is not a string, float nor integer!"
}
end
However, when I executed the .do file (trialscript is the name of the .do file) in a Stata command prompt with the user input, "hello", I encountered the following error:
. do trialscript
. capture program drop smth
. program define smth
1. di "Enter smth: " _request(smth1)
2. local type = substr("`: type $smth1 '", 1, 3)
3. if "`type'" == "str" {
4. di "It is a string!"
5. }
6. else if "`type'" == "flo" {
7. di "It is a float!"
8. }
9. else if "`type'" == "int" {
10. di "It is an integer!"
11. }
12. else {
13. di "it is not a string, float nor integer!"
14. }
15. end
.
.
end of do-file
. smth
Enter smth: . hello
no variables defined
it is not a string, float nor integer!

With your code, what the user enters is put into a global macro, which is not a variable in Stata's sense: a variable is (only) a column of data in a dataset. The type syntax you used works only with variables, which is why Stata complains that no variables are defined when the dataset is empty.
All global macros that are defined are strings. The programmer and user can think they contain numbers if and only if their content can be used numerically.
A test of whether the input is numeric is to try something numeric, e.g.
capture di 1 + $smth1
if _rc di "it is a string"
else di "it is a number"
This is not quite fail-safe, as a string might contain the name of a numeric variable or scalar, in which case the operation would succeed and the input would be reported as a number.
A test of whether a global macro contains a string that can be interpreted as an integer is to check whether floor($smth1) == $smth1 or equivalently that ceil() and round() return the input value.
There is no sense in which a global macro or its contents can be a float or int, except by trying whether such a variable would accept the contents as a value.
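The same idea can be sketched outside Stata. The following Python sketch is only an analogy, not Stata code: it treats the input as text, tries to use it numerically, and then applies the floor test for integers. The function name classify_input is hypothetical.

```python
import math


def classify_input(text: str) -> str:
    """Classify raw input text as 'integer', 'number', or 'string'."""
    try:
        # Input always arrives as text; "being a number" just means this works.
        value = float(text)
    except ValueError:
        return "string"
    # Same test as floor(x) == x; non-finite values (inf, nan) are not integers.
    if math.isfinite(value) and math.floor(value) == value:
        return "integer"
    return "number"


for sample in ("hello", "42", "3.5"):
    print(sample, "->", classify_input(sample))
```

As in Stata, there is no "float" or "int" type attached to the input itself; the classification only describes how the text can be used.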
Stata's terminology here is that of many statistical programs in which a variable is a column in a dataset. It comes as a surprise to many of those who started with a mainstream programming language, as I did myself. More at https://www.stata.com/statalist/archive/2008-08/msg01258.html
The kind of input you are programming is now unusual in Stata.

How can I fix a for loop that checks dictionary values?

I know it may sound like a newbie question, but I'm having a hard time trying to make a loop work.
I'm using Julia to write a simple regular expression that checks whether a telephone number is valid and returns the state of the number according to its area code. Here are the details:
1-User enters the phone number;
2-Regex tests if it's a valid number;
3-The area code is parsed and checked if present in the dictionary values. If yes, then it should return the key owning that value. Otherwise a simple message saying that area code doesn't exist should be printed.
The problem is: the loop keeps going to the end, printing whether the value is available or not every time it checks an entry in the dictionary. A break wouldn't help much: as soon as it did the first check, it would simply stop.
Then of course I noticed I had to take the final print statement out of the loop and maybe even assign a variable to copy the value into, but it still gets overwritten with the very last "Number doesn't exist" result.
How can I rewrite this code so it works?
regexTel = r"^(\+55)?[\s]?\(?(\d{2})?\)?[\s-]?(9?\d{4}[\s-]?\d{4})$"
areaCode = Dict("City A"=> [68],
"City B"=> [82], [...] (and so on)
print("Type the phone number:\n")
telNum = readline()
validTel = (match(regexTel, telNum))
fnlAreaCd = parse(Int32, validTel[2])
for (cityCode, availbCode) in areaCode
if fnlAreaCd in availbCode
println("Phone number: ", validTel[3], "\n",
"Area code: ", fnlAreaCd, "\n",
"City: ", cityCode)
else
println("Area code doesn't exist")
end
end
Your usage of Dict seems strange to me. I think you should use it the other way around.
If the area codes are unique, as I believe they are, then the variable areaCode would look like this:
areaCode = Dict(68 => "City A", 82 => "City B", ...) # do you need to wrap them in an array?
It'll allow you to write much simpler code:
if haskey(areaCode, fnlAreaCd)
cityCode = areaCode[fnlAreaCd]
println("Phone number: ", validTel[3], "\n",
"Area code: ", fnlAreaCd, "\n",
"City: ", cityCode)
else
println("Area code doesn't exist")
end
The matching city won't necessarily be the first entry in the dictionary, but you have instructed the code to say it doesn't exist any time it finds a non-match! Try this instead of your bare loop:
# acode: the parsed area code; tel: the phone number part of the match
function lookup_areacode(acode, tel)
    code_exists = false
    for (cityCode, availbCode) in areaCode
        if acode in availbCode
            code_exists = true
            println("Phone number: ", tel, "\n",
                    "Area code: ", acode, "\n",
                    "City: ", cityCode)
        end
    end
    # Report a miss only after every entry has been checked
    if !code_exists
        println("Area code doesn't exist")
    end
end
lookup_areacode(fnlAreaCd, validTel[3])
You may want to move the dictionary inside the function for neatness.
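Both answers come down to deciding "not found" only after every entry has been examined. As a language-neutral illustration, here is a small Python sketch of that pattern. The table contents and the names AREA_CODES and find_city are hypothetical and only mirror the layout used in the question.

```python
from typing import Dict, List, Optional

# Hypothetical sample data in the question's layout: name -> list of area codes.
AREA_CODES: Dict[str, List[int]] = {"City A": [68], "City B": [82]}


def find_city(area_code: int, table: Dict[str, List[int]]) -> Optional[str]:
    """Return the name that owns area_code, or None if no entry contains it."""
    for city, codes in table.items():
        if area_code in codes:
            return city  # stop at the first hit
    # Reached only after the whole table has been scanned without a hit.
    return None


city = find_city(82, AREA_CODES)
print(city if city is not None else "Area code doesn't exist")
```

Returning from the function plays the role of the flag: the "doesn't exist" message is produced once, by the caller, instead of once per non-matching entry.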

GSTIN Validation through condition not working in Perl

I have written Perl code for validating GSTIN Number which is related to India’s tax according to the following rules:
The first two digits represent the state code as per Indian Census 2011. Every state has a unique code.
The next ten digits will be the PAN number of the taxpayer
The thirteenth digit will be assigned based on the number of registration within a state
The fourteenth digit will be Z by default
The last digit will be a check code. It may be a letter or a number.
Following is the code:
my $gst_number_input = '35AABCS1429B1AX';
my $gst_number_character_count = length($gst_number_input);
my $gst_validation =~ /\d{2}[A-Z]{5}\d{4}[A-Z]{1}[A-Z\d]{1}[Z]{1}[A-Z\d]{1}/;
if ($gst_number_character_count == 15 && $gst_number_input =~ $gst_validation) {
print "GST Number is valid";
} else {
print "Invalid GST Number";
}
I have an invalid GSTIN input entered in the code. So when I run the script, I get:
GST Number is valid
Instead I should get the error because the GSTIN input is invalid:
Invalid GST Number
Can anyone please help ?
Thanks in advance
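In this part you are using =~ where it should be an equals sign =
my $gst_validation =~ /\d{2}[A-Z]{5}\d{4}[A-Z]{1}[A-Z\d]{1}[Z]{1}[A-Z\d]{1}/;
Because of that, $gst_validation is never assigned and stays undefined, so the later match in the if condition uses an empty pattern, which succeeds for any input.
If you want to use it as a variable, you could use qr
Note that you can omit {1} from the pattern and you don't have to use the square brackets around [Z]
Your code might look like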
my $gst_number_input = '35AABCS1429B1AX';
my $gst_number_character_count = length($gst_number_input);
my $gst_validation = qr/\d{2}[A-Z]{5}\d{4}[A-Z][A-Z\d]Z[A-Z\d]/;
if ($gst_number_character_count == 15 && $gst_number_input =~ $gst_validation) {
print "GST Number is valid";
} else {
print "Invalid GST Number";
}
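In the code above the pattern has no anchors; it is the separate length check that forces the 15 pattern positions to cover the whole string. For comparison, here is a hedged Python sketch of the same format check in which the whole-string requirement is part of the match itself. It only tests the layout given in the question's rules, not whether the state code or check character is actually correct, and the function name is_valid_gstin is hypothetical.

```python
import re

# Same pattern as in the answer; re.ASCII keeps \d restricted to 0-9.
GSTIN_PATTERN = re.compile(r"\d{2}[A-Z]{5}\d{4}[A-Z][A-Z\d]Z[A-Z\d]", re.ASCII)


def is_valid_gstin(text: str) -> bool:
    """Return True if text has the 15-character GSTIN layout."""
    # fullmatch requires the pattern to cover the entire string,
    # so no separate length check is needed.
    return GSTIN_PATTERN.fullmatch(text) is not None


print(is_valid_gstin("35AABCS1429B1AX"))  # fourteenth character is not Z
print(is_valid_gstin("35AABCS1429B1ZX"))
```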

Why are these obviously same strings not equal

I am trying the following code:
open Str
let ss = (Str.first_chars "testing" 3);;
print_endline ("The first 3 chars of 'testing' are: "^ss);;
if (ss == "tes")
then print_endline "These are equal to 'tes'"
else print_endline "These are NOT equal to 'tes'"
However, I am getting these are NOT equal:
$ ocaml str.cma testing2.ml
The first 3 chars of 'testing' are: tes
These are NOT equal to 'tes'
Why are the first 3 characters pulled by Str.first_chars from "testing" not equal to "tes"?
Also, I had to use ;; to make this code work (combinations of in and ; which I tried did not work). What is the best way to put these 3 statements together?
The (==) function is the physical equality operator. If you want to test whether two objects have the same contents, then you should use the structural equality operator which has one equal sign (=).
What is the best way to put these 3 statements together?
There are no statements in OCaml, only expressions, all returning values. It is like a mathematical formula, where you have numbers, operators, and functions and you combine them into bigger formulae, e.g., sin (2 * pi). The closest thing to a statement is an expression that has side effects and returns a value of type unit. But this is still an expression.
Here is an example, how you can build your expression, which will first bind the returned substring to the ss variable, and then compute in order two expressions: an unconditional print, and a conditional print. Altogether, this will be one expression evaluating to the unit value.
open Str
let () =
  let ss = Str.first_chars "testing" 3 in
  print_endline ("The first 3 chars of 'testing' are: " ^ ss);
  if ss = "tes"
  then print_endline "These are equal to 'tes'"
  else print_endline "These are NOT equal to 'tes'"
and here is how it works
$ ocaml str.cma test.ml
The first 3 chars of 'testing' are: tes
These are equal to 'tes'
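The difference between physical and structural equality is not specific to OCaml. As an analogy only, the Python sketch below shows the same two notions: Python's == compares contents (like OCaml's =), while Python's is compares identity (like OCaml's ==). Lists are used here because they are always built as fresh objects, which keeps the result predictable; the variable names are illustrative.

```python
# Two separately built values with the same contents.
first = list("tes")
second = list("tes")

same_contents = first == second  # structural: element-by-element comparison
same_object = first is second    # physical: are they the very same object?

# A second name for the same object is physically equal to it.
alias = first

print(same_contents, same_object, alias is first)
```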

Stata: label variables using forvalue loop

I am trying to label a batch of variables using a loop as follows, but it fails with the Stata error "invalid syntax". I can't work out what went wrong.
local myvars "basicenumerator" "basicfr_gpslatitude" "basicfr_gpslongitude"
local mylabels "Name of enumerator" "the latitude of the farmers house" "the longtitude of the farmers house"
local n : word count `mylabels'
forvalues i = 1/`n'{
local a: word `i' of `mylabels'
local b: word `i' of `myvars'
label var `b' "`a'"
}
To debug this, the main trick is to get Stata to show you what it thinks the local macros are. This script makes your code reproducible and also fixes it.
clear
set obs 1
gen basicenumerator = 42
gen basicfr_gpslatitude = 42
gen basicfr_gpslongitude = 42
local myvars `" "basicenumerator" "basicfr_gpslatitude" "basicfr_gpslongitude" "'
local mylabels `" "Name of enumerator" "the latitude of the farmers house" "the longtitude of the farmers house" "'
local n : word count `mylabels'
mac li
forvalues i = 1/`n'{
local a: word `i' of `mylabels'
local b: word `i' of `myvars'
label var `b' "`a'"
}
The problem is that the outer " " get stripped in defining your locals, so to keep the " " as desired, you need to wrap each string within compound double quotes.
For explanation, see http://www.stata.com/manuals14/u12.pdf 12.4.6.
Picky correction: spelling is longitude.

Taking data from columns in two separate files and combining in another file

There are two files, each with multiple columns of data and up to around 14,000 rows, neatly spaced and everything. File1 has 6 columns (Student ID #, semester code #, class name, class code # (though some have letters), the letter grade the student received, and the numeric grade they received).
The second file has 4 columns: class name, class code, how many hours per week it is, and designation code (three letters indicating whether it's a liberal arts class or not).
The task is to output everything from the first file into the new file, but add on two columns (from the second file) corresponding to each appropriate row, that have the hours for the course and designation code.
The second task is to take this new file, and output into it the students ID, overall GPA, GPA in CSCI courses, and a percent of hours spent taking non-liberal arts courses.
I'm not asking for someone to do it for me (obviously), it's just that I've run out of ideas. We're supposed to use nothing more than fstream, iostream, strings, if statements, loops, functions, and " .clear(); " and " seekg(ios::beg); " (also we're not supposed to use getline)
basically super simple stuff, no arrays or vectors or anything.
I figured out how to output parts of the two files into the third file using while loops and if statements, but I have no idea how to tell it to compare values in a column from one file to a column in a different file and that if the values are equal, to output the corresponding values from the other columns (the amount of hours for each class and designation code). And I need a lot of help with the second task as well.
What you're looking for is a map. If you need help streaming into a map, you can check out this post: Is there a Way to Stream in a Map?
But what you'll want to do is stream File2 into a map, using the "class code" as the map key and a tuple or your own custom struct as the value. Then index that map with the "class code #" from the line you're currently outputting from File1, and append the appropriate elements of the map's indexed value.
All this may sound like hand-waving, so, because the question lacks an example input and output, I have created an example File1 input as though it had already been streamed in:
tuple<int, int, string, string, char, int> File1[] = {
    make_tuple(13, 1, "Computer Science 1", "CS101", 'A', 100),
    make_tuple(13, 2, "Computer Science 2", "CS201", 'A', 100)
};
and a File2 input as though it had already been streamed in:
map<string, tuple<string, int, string>> File2 = {
    make_pair("CS101", make_tuple("Computer Science 1", 4, "NOT")),
    make_pair("CS201", make_tuple("Computer Science 2", 4, "NOT"))
};
These can then be streamed out, potentially to another file, as follows:
for (auto& it : File1) {
    // Look up the File2 entry by the class code of the current File1 row
    const auto& i = File2[get<3>(it)];
    cout << get<0>(it) << ' ' << get<1>(it) << ' ' << get<2>(it) << ' '
         << get<3>(it) << ' ' << get<4>(it) << ' ' << get<5>(it) << ' '
         << get<1>(i) << ' ' << get<2>(i) << endl;
}
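The lookup step is the heart of the approach: key the second file by class code, then extend each row of the first file with the values found under its code. Below is a minimal Python sketch of just that step, using hypothetical sample rows in the same layout as the example data above; the names FILE1_ROWS, FILE2_BY_CODE and join_rows are illustrative. One difference is worth noting: indexing a C++ std::map with [] inserts a default entry for an unknown key, whereas this sketch skips rows whose class code has no match.

```python
from typing import Dict, List, Tuple

# Hypothetical rows: (student id, semester, class name, class code, letter, numeric grade)
FILE1_ROWS: List[tuple] = [
    (13, 1, "Computer Science 1", "CS101", "A", 100),
    (13, 2, "Computer Science 2", "CS201", "A", 100),
]

# Hypothetical File2 contents keyed by class code: (class name, hours, designation)
FILE2_BY_CODE: Dict[str, Tuple[str, int, str]] = {
    "CS101": ("Computer Science 1", 4, "NOT"),
    "CS201": ("Computer Science 2", 4, "NOT"),
}


def join_rows(rows: List[tuple], courses: Dict[str, Tuple[str, int, str]]) -> List[tuple]:
    """Append hours and designation to each row whose class code is known."""
    joined = []
    for row in rows:
        info = courses.get(row[3])  # row[3] is the class code
        if info is None:
            continue  # assumption: rows without a matching course are skipped
        _name, hours, designation = info
        joined.append(row + (hours, designation))
    return joined


for out_row in join_rows(FILE1_ROWS, FILE2_BY_CODE):
    print(*out_row)
```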