In one line, I would like to perform an operation on a specific row, where the row number is given by a local macro minus a constant.
Here is a MWE:
sysuse auto2, clear
*save the number of observations in a local (is there a quicker way to do this?)
count
local N = r(N)
*make sure it works
di `N'
*make sure that the subtraction works
di `N'-1
*make the replacement, which works when referencing row with `N'
replace make = "def" in `N'
*here is the problem - subtracting from `N' doesn't work
replace make = "abc" in `N'-1
Error:
'74-1' invalid observation number
How can I solve this problem?
There are at least two ways to do this.
Create another local macro:
local Nm1 = `N' - 1
replace make = "abc" in `Nm1'
Force evaluation of an expression on the fly:
replace make = "abc" in `=`N'-1'
I have a table in sas and I want to create a new column C with a variable that should be computed by A and B, A should be in upcase letters and B in brackets.
If A is dog and B is cat then the C in that row should be DOG (cat).
I'm very new to SAS. How can I do that?
I know that I can get upcase by upcase(A), but I don't know how I can have 2 character variables after one another to create a new variable and how to put a new variable in brackets.
SAS has a series of CAT...() functions that make that simple. CATS() strips the leading/trailing spaces from the values. CATX() lets you specify a value to paste between the values.
data want ;
set have;
length new $100 ;
new=catx(' ',upcase(a),cats('(',b,')'));
run;
Personally, I use cat/cats/catx only in very specific cases. For a problem like this, you can simply use the concatenation operator ||, which makes the code much easier to understand:
data want;
set have;
attrib new format=$100.;
new = strip(upcase(a)) || " (" || strip(b) || ")";
run;
OK, that's maybe a little more verbose, but I think it's also easier to understand for a new SAS programmer :)
Why does this code not need two trim statements, one for first and one for last name? Does the length statement remove blanks?
data work.maillist; set cert.maillist;
length FullName $ 40;
fullname=trim(firstname)||' '||lastname;
run;
length is a declarative statement and introduces a variable to the Program Data Vector (PDV) with the specific length you specify. When an undeclared variable is used in a formula SAS will assign it a default length depending on the formula or usage context.
Character variables in SAS have a fixed length and are padded with spaces on the right. That is why the trim(firstname) is needed when the || lastname concatenation occurs. If it weren't there, the right padding of firstname would be part of the value in the concatenation, and the result might exceed the length of the variable receiving it.
There are concatenation functions that can simplify string operations:
CAT same as using <var>|| operator
CATT same as using trim(<var>)||
CATS same as using trim(left(<var>))||
CATX same as using CATS with a delimiter.
STRIP same as trim(left(<var>))
Your expression could be re-coded as:
fullname = catx(' ', firstname, lastname);
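The padding behaviour described above can be mimicked outside SAS. The following is only a rough Python sketch: `assign` is a hypothetical helper that models storing a value in a fixed-length character variable (truncate, then pad on the right), and the lengths of 20 and 40 are assumptions chosen for illustration.

```python
def assign(value: str, length: int) -> str:
    """Model a fixed-length character variable: truncate, then right-pad with spaces."""
    return value[:length].ljust(length)

# Hypothetical source variables, each stored with length 20 (an assumption).
firstname = assign("Ann", 20)
lastname = assign("Lee", 20)

# No trim at all: the padding of firstname ends up between the two names.
untrimmed = assign(firstname + " " + lastname, 40)

# trim(firstname) only: a single space separates the names.
trimmed_first = assign(firstname.rstrip(" ") + " " + lastname, 40)

# Trimming lastname as well changes nothing, because the target
# variable is padded back out to its full length anyway.
trimmed_both = assign(firstname.rstrip(" ") + " " + lastname.rstrip(" "), 40)

print(repr(untrimmed))
print(repr(trimmed_first))
print(trimmed_first == trimmed_both)
```

In this model the trailing padding of the last operand is harmless, while the padding of any operand that is followed by more text ends up inside the result.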
Is there a reason you think it should? Can you see trailing spaces in the surname, have you tried a length() function?
I could be wrong here but sometimes when you apply a function (put especially) or import data you can inadvertently store leading or trailing spaces. Trailing spaces are a mystery because you don't realise they are there until you try to do something else with the data.
A length statement should allow you to store exactly the data you give it providing you use a number/character variable correctly with truncation only occurring if the length value is too short.
I've found the compress() function to be the most convenient for dealing with white space and punctuation, particularly if you are concatenating variables.
https://www.geeksforgeeks.org/sas-compress-function-with-examples/
All the best,
Phil
Because SAS will truncate the value when it is too long to fit into FULLNAME. And when it is too short it will fill in the rest of FULLNAME with spaces anyway so there is no need to remove them.
It would only be an issue if the length of FULLNAME is smaller than the sum of the lengths of FIRSTNAME and LASTNAME plus one. Otherwise the result cannot be too long to fit into FULLNAME, even if there are no trailing spaces in either FIRSTNAME or LASTNAME.
Try it yourself with non-blank values so it is easier to see what is happening.
data test;
length one $1 two $2 three $3 ;
one = 'ABCD';
two = 'ABCD';
three='ABCD';
put (_all_) (=);
run;
one=A two=AB three=ABC
NOTE: The data set WORK.TEST has 1 observations and 3 variables.
So in my PostgreSQL 10 I have a column of type integer. This column represents a code of products and it should be searched against another code or part of the code. The values of the column are made of three parts, a five-digit part and two two-digit parts. Users can search for only the first part, the first-second or first-second-third.
So, say my column holds 123451233 and the user searches for 12345 (the first part). I want to be able to return 123451233. The same goes if the user searches for 1234512 or 123451233.
Unfortunately I cannot change the type of column or break the one column into three (one for every part). How can I do this? I cannot use LIKE. Maybe something like a regex for integers?
Thanks
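Consider using simple arithmetic.
floor(log(value))::int + 1 returns the number of digits in the integer part of the value (the floor() matters, because casting to int on its own rounds to the nearest integer rather than truncating). Using this,
value / (10 ^ (floor(log(value))::int - floor(log(search_input))::int))::int
returns value truncated to the same number of digits as search_input, so finally
search_input = value / (10 ^ (floor(log(value))::int - floor(log(search_input))::int))::int
will do the trick, provided value has at least as many digits as search_input (otherwise the divisor becomes 0).
It looks more complex, but it could also be more efficient than string manipulation.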
PS: But with an index like create index idx on your_table(cast(your_column as text)); a search like
select * from your_table
where cast(your_column as text) like search_input || '%';
is the best case IMO.
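The arithmetic approach above can be checked outside the database. This is a minimal Python sketch, not the answer's SQL: `digit_count` and `has_prefix` are hypothetical helper names, both inputs are assumed to be positive integers, and the digits are counted with integer division rather than a logarithm to avoid any floating-point rounding.

```python
def digit_count(n: int) -> int:
    """Number of decimal digits of a positive integer, using integer arithmetic only."""
    count = 1
    while n >= 10:
        n //= 10
        count += 1
    return count


def has_prefix(value: int, search_input: int) -> bool:
    """True if the decimal digits of value start with those of search_input."""
    shift = digit_count(value) - digit_count(search_input)
    if shift < 0:
        # The search term has more digits than the stored code.
        return False
    # Drop the trailing digits of value and compare what is left.
    return value // 10 ** shift == search_input


print(has_prefix(123451233, 12345))    # True
print(has_prefix(123451233, 1234512))  # True
print(has_prefix(123451233, 12346))    # False
```

The explicit check for a negative shift covers the case where the search term is longer than the stored code.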
You do not need regex functions. Cast the integer to text and use the function left(), for example:
create table my_table(code int); -- or bigint
insert into my_table values (123451233);
with input_data(input_code) as (
values('1234512')
)
select t.*
from my_table t
cross join input_data
where left(code::text, length(input_code)) = input_code;
code
-----------
123451233
(1 row)
I have a dataset with missing values coded "missing". How do I recode these so Stata recognizes them as missing values? When I have numeric missing values, I have been using e.g.:
mvdecode _all, mv(99=. )
However, when I run this with a character in it, e.g.:
mvdecode _all, mv("missing"=. )
I get the error "missing is not a valid numlist".
mvdecode is for numeric variables only: the banner in the help is "Change numeric values to missing values" (emphasis added). So the error message should make sense: the string "missing" is certainly not a numeric value, so Stata stops you there. It makes no sense to say to Stata that numeric values "missing" should be changed to system missing, as you requested.
As for what you should do, that depends on what you mean in Stata terms by coded "missing".
If you are referring to string variables with literal values "missing" which should just be replaced by the empty string "", then that would be a loop over all string variables:
ds, has(type string)
quietly foreach v in `r(varlist)' {
replace `v' = "" if `v' == "missing"
}
If you are referring to numeric variables for which there is a value label "missing" then you need to find out the corresponding numeric value and use that in your call to mvdecode. Use label list to look up the association between values and value labels.
mvdecode works with numlists, not strings (clearly stated in help mvdecode). The missing value for strings in Stata is denoted by "".
clear
set more off
*----- example dataset -----
sysuse auto
keep make mpg
keep in 1/5
replace make = "missing" in 2
list
*----- what you want -----
ds, has(type string)
foreach var in `r(varlist)' {
replace `var' = "" if `var' == "missing"
}
list
list if missing(make)
You can verify that Stata now recognizes one missing value for the string variable using the missing() function.
I have a few rows in a test database where a dollar sign is prefixed to the value. I want to UPDATE the values in the name column of the test1 table. However, when I threw the following query together, it emptied the six rows of data in the name column...
UPDATE test1 SET name=overlay('$' placing '' from 1 for 1);
So "$user" became "" when I intended for that column/row value to become "user".
How do I combine UPDATE and a substr replacement without deleting any of the other data?
If there isn't a dollar sign I want the row to remain untouched.
The dollar sign only occurs as the first character when it does occur.
If you want to replace all dollar signs, use this:
update test1
set name = replace(name, '$', '');
If you want to replace the $ only at the beginning of the value, you can use substr() and a where clause so that only those rows where the column actually starts with a $ are changed:
update test1
set name = substr(name, 2)
where name like '$%';
To answer the question using the pattern the OP had in mind:
UPDATE test1 SET name=overlay(name placing '' from 1 for 1)
WHERE name like '$%';
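Why the original query emptied the column can be seen by modelling overlay() by hand. The Python function below is a hypothetical stand-in that only imitates the overlay(string placing new from start for count) form for simple cases. It is a sketch, not PostgreSQL's implementation.

```python
def overlay(string: str, new: str, start: int, count: int) -> str:
    """Replace count characters of string, beginning at 1-based position start, with new."""
    return string[: start - 1] + new + string[start - 1 + count :]


# The original query passed the literal '$' as the string to edit,
# so every row was set to the same empty result.
print(repr(overlay("$", "", 1, 1)))      # ''

# Passing the column value instead removes only its first character.
print(repr(overlay("$user", "", 1, 1)))  # 'user'
```

Because the function always removes the first character, the where clause is still needed to leave values without a leading $ untouched.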