Converting CHAR to NUM with varying decimal places - SAS

I am trying to convert a column stored as character to numeric. The problem is that this column has a varying number of decimal places.
For example,
Data
1052969525
392282764.234
221018301.2
130010764.7894
82340150
183779233.4
I have determined that the likely maximum number of decimal places is 4, so the width required would be about 15. I have therefore attempted the following:
datanum = input(data, 15.4);
But this appears to put the decimal point in the wrong place, especially for values that have no decimal places. What is the most reasonable way to convert this column from character to numeric? This column is part of a database table uploaded by someone else, so there's not much option to change that. Thanks.

You don't normally supply the decimal width in informats. For a normal number, you only supply the width, and SAS will figure out the decimal for you (based on the position of the decimal point).
datanum = input(data,15.);
The .d part of an informat exists mostly for compatibility with older systems that left the decimal point out of the data to save space. For example, if I'm reading in money amounts and I only have 6 spaces:
123456
882348
100000
123400
I can read that in as an integer amount of cents - or I can do:
input cost 6.2;
That will then tell SAS to place the decimal point before the last 2 digits.
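To see the two behaviours side by side, here is a minimal sketch using the sample values from the question (the dataset name demo and the variable names num_plain and num_wd are just illustrative):

/* Minimal sketch: w. versus w.d informats on character values with mixed decimal places */
data demo;
    length data $15;
    input data $;
    num_plain = input(data, 15.);  /* decimal point is taken from the data itself */
    num_wd    = input(data, 15.4); /* the .4 divides by 10**4 only when the data has no decimal point */
datalines;
1052969525
392282764.234
221018301.2
130010764.7894
;
run;

proc print data=demo;
run;

With 15., 1052969525 reads in as 1052969525; with 15.4 it becomes 105296.9525, which is the misplaced decimal point the question describes.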

Related

Identify the value with highest number of decimal values

I have a range of values and I want to count the decimal places of all values in the range and display the maximum count. The formula should exclude trailing zeroes (i.e. not count ending zeroes among the decimal places).
For example, in the above sample the maximum count of decimal places across the whole range is 4, excluding trailing zeroes, so the answer to be displayed in cell D2 is 4.
I tried using regex, but I do not know how to apply it to a whole range of values.
Please help!
try:
=INDEX(MAX(LEN(IFERROR(REGEXEXTRACT(TO_TEXT(A2:C4), "(\..+)")*1))-2))
Player0's solution is a good start, but uses TO_TEXT which seems to rely on the formatting of your cells.
If you want to safely compute the number of decimal places, use the TEXT function instead.
TEXT(number, format) requires a format in which the maximum number of decimal places is specified. There is no way around this, because values like =1/3 have infinitely many decimal places.
Therefore, first decide on the max. precision for your use case (here we use 8). Then use the formula below, which works independently of your document's formatting and language:
=INDEX(MAX(
LEN(REGEXEXTRACT(
TEXT(ABS(A2:C4); "."&REPT("#";8));
"[,.].*$"
))-1
))
We subtract 1 since LEN(REGEXEXTRACT()) also counts the decimal separator (. for English, , for many other locales).
Everything after the 8th decimal place is ignored. If all your numbers are something like 123.00000000987, the computed max. is 0. If you prefer it to be 8 instead, then add ROUNDUP( ; 8):
=INDEX(MAX(
LEN(REGEXEXTRACT(
TEXT(ROUNDUP(ABS(A2:C4);8); "."&REPT("#";8));
"[,.].*$"
))-1
))

powerquery: extra digits added to number when importing table

Glad to ask a question here again after more than 10 years (the last one was about BASH scripting; now that I'm in the corporate world, guess what... it's about Excel ;) )
Here is my question/issue:
I am importing data with Power Query for further analysis.
What I have discovered is that the imported values contain extra digits that are not present in the original table.
I have googled this problem but have not been able to find an explanation or a solution (a similar issue is this one, more than a year old, but with no feedback from Microsoft).
(columns are formatted as text in the screenshot but the issue is still present even if formatted as number)
The workaround I am using now, though I am not happy with it, is the following:
I "increased decimal" to make sure all my digits are captured (in my source the entries do not all have the same number of significant digits),
saved as CSV,
imported the impacted columns as number,
converted the columns to text (for future text matching).
I am really annoyed by this unwanted and unpredictable behaviour of Excel.
I see a serious data integrity issue: if we cannot rely on the Power Query/Power BI platform to maintain accurate queries, I wonder why we would use it.
Adding another screenshot to clarify that changing the source format to text does not solve the problem.
Another screenshot added following David Bacci's comments:
I think I wrongly assumed my data was stored as text in the source; can you confirm?
If you are exporting and importing as text, then this will not happen. If you convert to number, you will lose precision. From the docs (my bold):
Represents a 64-bit (eight-byte) floating-point number. It's the most common number type, and corresponds to numbers as you usually think of them. Although designed to handle numbers with fractional values, it also handles whole numbers. The Decimal Number type can handle negative values from –1.79E+308 through –2.23E–308, 0, and positive values from 2.23E–308 through 1.79E+308. For example, numbers like 34, 34.01, and 34.000367063 are valid decimal numbers. The largest precision that can be represented in a Decimal Number type is 15 digits long. The decimal separator can occur anywhere in the number. The Decimal Number type corresponds to how Excel stores its numbers. Note that a binary floating-point number can't represent all numbers within its supported range with 100% accuracy. Thus, minor differences in precision might occur when representing certain decimal numbers.
BTW, you should probably accept some of the good answers from your previous questions from 10 years ago.

One decimal field taking up 75% file size of power bi file

I have a Power BI file which is over 2 GB in size, and I found that one field is taking up 1.5 GB of the file size. When I change it to a whole number or a decimal it is reduced to 350 MB.
I wanted to change it to a decimal, but I feel that storing it as a decimal shouldn't increase the file size so dramatically. Is this correct? I wanted to check whether this is expected behaviour.
Thanks for any help
Here is a screenshot of the settings:
If you are OK with preserving only 4 decimals, then you can switch to the "fixed decimal number" data type and it should compress the same as a whole number. Fixed decimal is stored as an integer, with the last 4 digits interpreted as being to the right of the decimal point, as explained here.

What is SAS format 8.

I am new to SAS and currently working on a small piece of work with SAS.
Could I please ask what the format below means? I believe the 8. formats two digits to the right of the decimal point, such as 896.33, but I am not sure. I am also not really sure what input means.
input(tablename.fieldname, 8.)
That is an INFORMAT, not a FORMAT. It means to read the first 8 characters as a number. If there is a decimal point in the data then it is used naturally. You could have up to 7 digits to the right of the decimal point (since the decimal point would use up the eighth character position). It will also support reading scientific notation so '896.33E2' would mean the number 89,633.
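A minimal sketch of that behaviour, using illustrative values only:

/* Illustrative only: how INPUT with the 8. informat reads different strings */
data _null_;
    a = input('896.33',   8.);   /* 896.33   - decimal point read from the data */
    b = input('896.33E2', 8.);   /* 89633    - scientific notation is supported */
    c = input('12345678', 8.);   /* 12345678 - at most 8 characters are read    */
    put a= b= c=;
run;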

SAS Format in calculation

I am creating a new variable AGE. The CUTOFF value is 100 and it is divided by 12, so the value is 8.3333.... But a few FRESHNESS values are 8.3333333. I have to pick the value of SEGMENT if FRESHNESS >= 100/12, but it is picking AMU where FRESHNESS is 8.3333.... The format of FRESHNESS is F12.9 and CUTOFF is BEST12.
data new;
    set SEGMENT_AGE;
    if Freshness < CUTOFF/12 then AGE = AMU;
    else AGE = SEGMENT;
run;
I tried a different format, making CUTOFF F12.9, but it is still not working.
You're running into an issue of floating-point precision. If a number is a repeating decimal (in binary), you may end up with two slightly different values (a higher one or a lower one, e.g. 0.333333333333333333 or 0.333333333333333334) depending on how it was arrived at. For example:
1-(1/3) - (1/3) = 0.33333333333333333334
0+(1/3) = 0.33333333333333333333
So do not assume two values are precisely equal just because they look like they should be. Further, some numbers that are not repeating decimals in base 10 are repeating in binary: 7/10, for example, is 0.7 in decimal but is not storable precisely in binary.
You should compare rounded numbers if you need to compare precisely; for example,
if round(freshness,0.001) < round(cutoff/12,0.001) ...
should result in your calculations matching your expectations.
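Here is a minimal sketch of the difference the rounding makes, assuming an illustrative FRESHNESS value that was stored upstream with fewer digits than 100/12:

/* Illustrative only: the raw comparison fails, the rounded comparison behaves as expected */
data _null_;
    cutoff    = 100;
    freshness = 8.3333333;   /* assumed: value arrived with limited precision */
    raw_ok    = (freshness >= cutoff/12);                              /* 0, since 8.3333333 < 8.3333333333... */
    round_ok  = (round(freshness, 0.001) >= round(cutoff/12, 0.001));  /* 1, since 8.333 >= 8.333 */
    put raw_ok= round_ok=;
run;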