SAS displays a value longer than the column's length - sas

I have a column with a length of 8. However, when I view the data, the value in the column is above 8 characters long.
See the following screenshot:

You're mixing up length and format.
http://blogs.sas.com/content/sasdummy/2007/11/20/lengths-and-formats-the-long-and-short-of-it/
Length: The column length, in SAS terms, is the amount of storage allocated in the data set to hold the column values. The length is specified in bytes. For numeric columns, the valid lengths are usually 3 through 8. The longer the length, the greater the precision allowed within the column values. For character columns, the length can be 1 through 32767. For single-byte data values, that equates to the number of characters the column can hold. For multibyte data values (DBCS, Unicode, or UTF-8), where a character can occupy more than one byte, the number of characters that fit might be less than the length value of the column.
Format: The column format, in SAS terms, is basically an instruction for how to transform a raw value into an appearance that is suitable for a given purpose. A basic attribute of a format is the format length, which controls how much of the value is displayed. For example, a character column might have a storage length of 10 bytes, but a format length of 5 characters ($5. format), so when you see the formatted values you will see at most 5 characters for each record.
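As a rough illustration in Python rather than SAS (the variable names are made up, and this assumes the column in the question is numeric, since the screenshot is not available): an 8-byte numeric holds the value as a binary double, so the number of characters you see is decided by the format, not by the 8 bytes of storage.

```python
import struct

# Hypothetical numeric value; assumes an IEEE 754 double, as SAS uses on Windows/UNIX.
value = 123456789012.0

stored = struct.pack("<d", value)   # the 8 bytes that "length 8" refers to
displayed = format(value, ".0f")    # roughly what a 12-wide numeric format would print

print(len(stored))      # bytes of storage
print(displayed)        # text shown to the user
print(len(displayed))   # characters displayed

# Character analogy for the $5. example: storage keeps 10 characters, display shows 5.
name = "ABCDEFGHIJ"
shown = format(name, ".5")
print(shown)
```

The point of the sketch is only that storage size and display width are two separate attributes; it does not reproduce SAS format behaviour in detail.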

Related

Can SAS support numeric that is longer than 16 digits?

I have a requirement set whereby some of the SAS numeric columns must be able to store numeric value that is more than 16 digits. For example:
123,456,789,123,456,789,123,123.9996
Counting the digits, that is 24 integer digits and 4 decimal places (24.4).
I've studied a few pages such as :
http://www.sfu.ca/sasdoc/sashtml/unixc/z0344718.htm
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lrcon/p0ji1unv6thm0dn1gp4t01a1u0g6.htm
http://v8doc.sas.com/sashtml/win/numvar.htm#:~:text=The%20maximum%20number%20of%20variables,can%20be%20is%20160%20bytes.
It seems to me that the maximum numeric length SAS supports is 8 bytes, which can only hold a 16-digit whole number. Is there a way to store a numeric value in "24.4" form, like the example above?
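To see why the 8-byte limit bites, here is a small Python sketch (not SAS; it assumes the numeric is stored as an IEEE 754 double, as on Windows and UNIX hosts). It pushes the example value through a double and compares the result with an exact decimal.

```python
from decimal import Decimal

text = "123456789123456789123123.9996"   # the 24.4 example, commas removed

exact = Decimal(text)          # keeps all 28 significant digits
as_double = float(text)        # nearest 8-byte double
round_trip = Decimal(as_double)  # exact decimal expansion of that double

print(exact)
print(round_trip)
print(exact == round_trip)     # the double is not the same number
print(abs(exact - round_trip)) # size of the error introduced
```

This only demonstrates the precision loss; it does not suggest a particular SAS workaround.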

SAS, converting numbers, from character format to numeric format, keeping all leading zeros, but length of numbers is NOT uniform

I'm working in SAS EG and I'm trying to convert a column that's in character format to numeric format, EXACTLY as they appear in their character format. The numbers vary in length and some have one or two leading zeros.
If I do it one way, it gets rid of all leading zeros. Another way I tried, it adds leading zeros to the point that it's as long as the longest number in the column, e.g., a 9-digit number with one leading zero now has four leading zeros because the longest number in the column is 12 digits. (I hope this description makes sense).
I'm working in SAS EG. When I run proc contents, it tells me my existing variable is a character variable of length 26. It is blank for both 'format' and 'informat.'
I need to convert it so that a new column is a numeric variable, with length 8, and 'F12.' for 'format' and 'BEST12.' for 'informat,' as I plan to use it to match two data sets.
I created the following test data set in 'regular' SAS, but I'm not sure if it fully recreates the issue I'm working on in SAS EG:
data have;
input mrn $1-12;
cards;
118283586928
003875807
038087875
0385709873
0038576830
;
run;
As you can see, I have one number that's 12 digits long (no leading zeros); two that are 9 digits (with one or two leading zeros); and two that are 10 digits (with one or two leading zeros).
Any help would be greatly appreciated.
Thanks
You cannot store 26 digit strings exactly as a number in SAS. SAS stores numbers as floating point values. You can use the CONSTANT() function to see the end of the contiguous integers that can be stored exactly.
data _null_;
  x=constant('exactint');
  put x= comma30.;
run;
x=9,007,199,254,740,992
So if you actually have values longer than 15 digits in the character variable you will not be able to convert them to numbers.
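The same limit can be checked in Python, whose float is also an 8-byte IEEE 754 double. This is only an illustration; the 16-digit identifier below is made up.

```python
# 2**53 is the end of the contiguous range of exactly representable integers.
exact_int = 2 ** 53
print(f"{exact_int:,}")

# Just above the limit, neighbouring integers collapse onto the same double.
print(float(exact_int + 1) == float(exact_int))

# A hypothetical 16-digit identifier does not survive conversion to a double.
mrn_string = "9007199254740993"
converted = int(float(mrn_string))
print(mrn_string, converted, int(mrn_string) == converted)
```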
But if they are only 12 digits long then just convert the strings into numbers and compare the numbers.
proc sql;
create table want as
select *
from a, b
where a.mrn = input(b.mrn_string,32.)
;
quit;
It's not possible to have different formats in the same column in SAS. The only way to keep them looking exactly as they do while in the same column is to keep them as text. If you need to do calculations on them I'd suggest just creating a 2nd column with their numeric values.
Leading zeros can be added to numbers for display using the Zw.d format (for example, z12.), which pads every value to the same fixed width.
https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000205244.htm
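A quick Python sketch with the sample values from the question shows why a fixed-width zero-padded display cannot reproduce the original strings. The 12-wide padding stands in for a z12. format; that mapping is an assumption for illustration.

```python
# Sample values from the question, kept as text.
mrns = ["118283586928", "003875807", "038087875", "0385709873", "0038576830"]

for text in mrns:
    number = int(text)           # numeric value: leading zeros are gone
    padded = f"{number:012d}"    # fixed-width zero padding to 12 digits
    # Only the 12-digit value matches its original text after padding.
    print(text, number, padded, padded == text)
```

The numeric values are enough for matching two data sets; the original text column is what preserves the exact appearance.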

How to use numbers present as text with different unit prefixes in calculations

I have data in a spreadsheet describing amount of data transferred over a mobile network: data in one column (over 300 rows) has three possible forms:
123,45KB
123,45MB
1,23GB
How can I transform or use this data in order to sum or do other calculations on numbers properly?
Assuming your data is in column A and there are always two characters as unit ("KB", "MB" or "GB") at the end, then the formula for transforming the data to numeric could be:
=--LEFT(A2;LEN(A2)-2)*10^(IF(RIGHT(A2;2)="KB";3;IF(RIGHT(A2;2)="MB";6;IF(RIGHT(A2;2)="GB";9))))
Put the formula in B2 and fill downwards as needed.
I assumed the decimal delimiter in your locale is a comma. If not, please state what it is.
Also, since this site is in English, I have used English function names. You may need to translate them into your language version.
If the decimal delimiter in your locale is not a comma, then you need to substitute the comma with your decimal delimiter to get a proper numeric decimal value.
For example if the decimal delimiter is dot, then:
=SUBSTITUTE(LEFT(A2,LEN(A2)-2),",",".")*10^(IF(RIGHT(A2,2)="KB",3,IF(RIGHT(A2,2)="MB",6,IF(RIGHT(A2,2)="GB",9))))
An alternative formula:
=LEFT(A1,LEN(A1)-2)*10^(3*MATCH(RIGHT(LEFT(A1,LEN(A1)-1)),{"K","M","G"},0))
Uses the position of the next to last character in an array to determine the factor.
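For comparison, the same conversion logic can be sketched in Python. The function name to_bytes and the FACTORS table are hypothetical; like the formulas, it assumes a comma decimal delimiter and a two-character unit at the end.

```python
from decimal import Decimal

# Decimal (power-of-ten) factors, matching the 10^3 / 10^6 / 10^9 in the formulas.
FACTORS = {"KB": 10 ** 3, "MB": 10 ** 6, "GB": 10 ** 9}

def to_bytes(text: str) -> Decimal:
    """Convert e.g. '123,45KB' to a number of bytes."""
    number, unit = text[:-2], text[-2:]
    # Replace the comma decimal delimiter so Decimal can parse the number.
    return Decimal(number.replace(",", ".")) * FACTORS[unit]

values = ["123,45KB", "123,45MB", "1,23GB"]
converted = [to_bytes(v) for v in values]
print(converted)
print(sum(converted))
```

Decimal is used here instead of float only so that the converted totals are exact.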

Redshift varchar too narrow

I've got a table that I populate with tab-separated data from files whose encoding doesn't seem to be utf-8 exactly, like so:
CREATE TABLE tab (
url varchar(2000),
...
);
COPY tab
FROM 's3://input.tsv'
After the copy has completed I run
SELECT
MAX(LEN(url))
FROM tab
which returns 1525. I figure that since I'm wasting space, I might as well shrink the column by almost a quarter by using varchar(1525) instead of varchar(2000). But neither redoing the COPY nor setting up a new table and inserting the already imported data works. In both cases I get
error: Value too long for character type
Why won't the column hold these values?
Your file might be in a multi-byte format.
From the LEN Function documentation:
The LEN function returns an integer indicating the number of characters in the input string. The LEN function returns the actual number of characters in multi-byte strings, not the number of bytes. For example, a VARCHAR(12) column is required to store three four-byte Chinese characters. The LEN function will return 3 for that same string.
The extra size of a VARCHAR will not waste disk space due to the compression methods used by Amazon Redshift, but it will waste in-memory buffer space when a block is read from disk and decompressed into memory.

Changed byte value solved this, but why? SAS: this range is repeated, or values overlap

My BI department just ran into the SAS error: this range is repeated, or values overlap.
I found some links they looked at and found that there was an error in a macro.
The error arose because the length of a numeric variable had been changed from 7 to 6 bytes.
Now that they have changed it back to its previous value, everything is OK.
What is this behaviour all about? Is there some logic to it?
When reducing the length of a variable from 7 to 6 bytes, some numbers might get "truncated". 7 bytes can store integers exactly only up to 35,184,372,088,832, while 6 bytes can store them exactly only up to 137,438,953,472. Decimal numbers should always be length 8. See here for details.
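Those limits can be reproduced with a short Python sketch. It assumes IEEE 754 doubles and that shortening a numeric simply drops the low-order bytes of the 8-byte value; the helper names are made up.

```python
import struct

def max_exact_int(length: int) -> int:
    # 1 sign bit + 11 exponent bits leave 8*length - 12 stored mantissa bits,
    # plus one implicit leading bit.
    return 2 ** (8 * length - 11)

def truncate(value: float, length: int) -> float:
    # Keep the high-order bytes of the double and zero the rest.
    raw = struct.pack(">d", value)
    kept = raw[:length] + b"\x00" * (8 - length)
    return struct.unpack(">d", kept)[0]

for length in range(3, 9):
    print(length, f"{max_exact_int(length):,}")

# One above the 6-byte limit: survives in 7 bytes, collapses in 6.
value = float(2 ** 37 + 1)
print(truncate(value, 7) == value)
print(truncate(value, 6) == float(2 ** 37))
```

If two distinct range boundaries collapse onto the same truncated value like this, a "range is repeated, or values overlap" message would be a plausible result, although the source does not confirm that this is exactly what happened in the macro.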