What is SAS format 8. - sas

I am new to SAS and currently working on a small piece of work with it.
Could I please ask what the format below means? I believe the 8. formats two digits to the right of the decimal place, such as 896.33, but I am not sure. I am also not sure what input means.
input(tablename.fieldname, 8.)

That is an INFORMAT, not a FORMAT. It means to read the first 8 characters as a number. If there is a decimal point in the data then it is used naturally. You could have up to 7 digits to the right of the decimal point (since the decimal point would use up the eighth character position). It will also support reading scientific notation so '896.33E2' would mean the number 89,633.
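The reading rule described above can be sketched in Python. This is only a rough emulation for illustration, not SAS itself: the function name read_numeric is hypothetical, and SAS details such as missing values for blank fields are not modeled.

```python
def read_numeric(text: str, width: int = 8) -> float:
    """Rough emulation of a width-only numeric informat such as 8."""
    # Only the first `width` characters of the field are considered.
    field = text[:width].strip()
    # float() honors an explicit decimal point and scientific notation.
    return float(field)


print(read_numeric("896.33"))        # 896.33
print(read_numeric("896.33E2"))      # 89633.0
print(read_numeric("123456789012"))  # 12345678.0 (only 8 characters are read)
```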

Related

Identify the value with highest number of decimal values

I have a range of values and I want to count the decimal places of every value in the range and display the maximum count. The formula should not count trailing zeroes.
For example, in the above sample, the maximum number of decimal places across the whole range is 4, excluding trailing zeroes, so 4 should be displayed in cell D2.
I tried using a regex, but I do not know how to apply it to a whole range of values.
Please help!
try:
=INDEX(MAX(LEN(IFERROR(REGEXEXTRACT(TO_TEXT(A2:C4), "(\..+)")*1))-2))
Player0's solution is a good start, but it uses TO_TEXT, which seems to rely on the formatting of your cells.
If you want to safely compute the number of decimal places, use the TEXT function instead.
TEXT(number, format) requires a format in which the maximum number of decimal places is specified. There is no way around this, because formulas like =1/3 can have infinitely many decimal places.
Therefore, first decide on the maximum precision for your use case (here we use 8). Then use the formula below, which works independently of your document's formatting and language:
=INDEX(MAX(
LEN(REGEXEXTRACT(
TEXT(ABS(A2:C4); "."&REPT("#";8));
"[,.].*$"
))-1
))
We subtract 1 because LEN(REGEXEXTRACT()) also counts the decimal separator (. for English, , for many other languages).
Everything after the 8th decimal place is ignored. If all your numbers are something like 123.00000000987, the computed maximum is 0. If you prefer it to be 8 instead, then add ROUNDUP( ; 8):
=INDEX(MAX(
LEN(REGEXEXTRACT(
TEXT(ROUNDUP(ABS(A2:C4);8); "."&REPT("#";8));
"[,.].*$"
))-1
))
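For comparison, the same idea (count decimal places, ignore trailing zeros, take the maximum over a range) can be sketched outside the spreadsheet in Python. This is a minimal sketch that assumes the values are available as text; the function names and sample values are hypothetical.

```python
from decimal import Decimal


def decimal_places(value: str) -> int:
    """Count digits after the decimal point, ignoring trailing zeros."""
    # normalize() strips trailing zeros, so "3.1250" becomes 3.125.
    exponent = Decimal(value).normalize().as_tuple().exponent
    # Whole numbers normalize to a non-negative exponent, i.e. 0 places.
    return max(0, -exponent)


def max_decimal_places(values) -> int:
    """Largest decimal-place count found in the given values."""
    return max(decimal_places(v) for v in values)


sample = ["12.5", "3.1250", "7.0001", "100", "0.10"]  # hypothetical data
print(max_decimal_places(sample))  # 4
```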

powerquery: extra digits added to number when importing table

Glad to ask a question here again after more than 10 years (the last one was about Bash scripting; now that I'm in corporate, guess what... it's about Excel ;) ).
Here is my question/issue:
I am importing data with Power Query for further analysis.
What I have discovered is that the imported values contain extra digits not present in the original table.
I have googled this problem but have not been able to find an explanation or a solution (a similar issue is this one, more than one year old, but with no feedback from Microsoft).
(The columns are formatted as text in the screenshot, but the issue is still present even if they are formatted as number.)
The workaround I am using now, which I am not happy with, is the following:
I used "Increase Decimal" to make sure all my digits are captured (the entries in my source do not all have the same number of significant digits),
saved as CSV,
imported the impacted columns as number,
converted the columns to text (for future text matching).
I am really annoyed by this unwanted and unpredictable behaviour of Excel.
I see a serious data integrity issue: if we cannot rely on the Power Query/Power BI platform to maintain accurate queries, I wonder why we would use it.
Adding another screenshot to clarify that changing the source format to text does not solve the problem.
Another screenshot added following @David Bacci's comments:
I think I wrongly assumed my data was stored as text in the source. Can you confirm?
If you are exporting and importing as text, then this will not happen. If you convert to number, you will lose precision. From the docs (my bold):
Represents a 64-bit (eight-byte) floating-point number. It's the most
common number type, and corresponds to numbers as you usually think of
them. Although designed to handle numbers with fractional values, it
also handles whole numbers. The Decimal Number type can handle
negative values from –1.79E +308 through –2.23E –308, 0, and positive
values from 2.23E –308 through 1.79E + 308. For example, numbers like
34, 34.01, and 34.000367063 are valid decimal numbers. The largest
precision that can be represented in a Decimal Number type is 15
digits long. The decimal separator can occur anywhere in the number.
The Decimal Number type corresponds to how Excel stores its numbers.
Note that a binary floating-point number can't represent all numbers
within its supported range with 100% accuracy. Thus, minor differences
in precision might occur when representing certain decimal numbers.
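The effect the documentation describes is not specific to Power Query; any 64-bit binary floating-point number behaves this way. A small Python sketch (the helper name as_stored is hypothetical) shows the extra digits that appear once a decimal text value is converted to such a number, and that keeping the value as a decimal/text avoids them:

```python
from decimal import Decimal


def as_stored(text: str, places: int = 20) -> str:
    """Show the digits actually held after converting text to a 64-bit float."""
    return format(float(text), f".{places}f")


# "0.1" has no exact binary representation, so extra digits appear.
print(as_stored("0.1"))  # 0.10000000000000000555

# The same rounding makes a simple float comparison fail...
print(0.1 + 0.2 == 0.3)  # False

# ...while values kept as decimals parsed from text compare exactly.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```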
BTW, you should probably accept some of the good answers from your previous questions from 10 years ago.

How to use numbers present as text with different unit prefixes in calculations

I have data in a spreadsheet describing amount of data transferred over a mobile network: data in one column (over 300 rows) has three possible forms:
123,45KB
123,45MB
1,23GB
How can I transform or use this data in order to sum or do other calculations on numbers properly?
Assuming your data is in column A and the unit at the end is always two characters ("KB", "MB" or "GB"), the formula for transforming the data to numeric could be:
=--LEFT(A2;LEN(A2)-2)*10^(IF(RIGHT(A2;2)="KB";3;IF(RIGHT(A2;2)="MB";6;IF(RIGHT(A2;2)="GB";9))))
Result:
Put the formula in B2 and fill downwards as needed.
I assumed that the decimal separator in your locale is a comma. If not, please state what it is.
Also, since this site is in English, I have used English function names. You may need to translate them into your language version.
If the decimal separator in your locale is not a comma, then you need to substitute the comma with your decimal separator to get a proper numeric decimal value.
For example, if the decimal separator is a dot, then:
=SUBSTITUTE(LEFT(A2,LEN(A2)-2),",",".")*10^(IF(RIGHT(A2,2)="KB",3,IF(RIGHT(A2,2)="MB",6,IF(RIGHT(A2,2)="GB",9))))
An alternative formula:
=LEFT(A1,LEN(A1)-2)*10^(3*MATCH(RIGHT(LEFT(A1,LEN(A1)-1)),{"K","M","G"},0))
This uses the position of the next-to-last character in the array {"K","M","G"} to determine the factor.
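If the data is processed outside the spreadsheet, the same conversion can be sketched in Python. This is a minimal sketch under the same assumptions as the formulas above (comma decimal separator, two-character unit, decimal factors of 10^3, 10^6 and 10^9); the names FACTORS and to_bytes are hypothetical.

```python
from decimal import Decimal

# Decimal multipliers, matching the 10^3 / 10^6 / 10^9 factors in the formulas.
FACTORS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def to_bytes(text: str) -> Decimal:
    """Convert a value like '123,45KB' to a number of bytes."""
    text = text.strip()
    # The last two characters are the unit, everything before is the number.
    number, unit = text[:-2], text[-2:].upper()
    # Replace the comma decimal separator so Decimal can parse the number.
    return Decimal(number.replace(",", ".")) * FACTORS[unit]


values = ["123,45KB", "123,45MB", "1,23GB"]
total = sum(to_bytes(v) for v in values)
print(total)
```

Decimal is used instead of float so that values such as 123,45 are not affected by binary rounding before they are summed.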

Converting CHAR to NUM with varying decimal places

I am trying to convert a column stored as character to numeric. The problem is that this column has a varying number of decimal places.
For example,
Data
1052969525
392282764.234
221018301.2
130010764.7894
82340150
183779233.4
I have determined that the likely maximum of decimal places is 4, the width required would be about 15. So I have attempted the following:
datanum = input(data, 15.4);
But this appears to put the decimal place in the wrong place, especially for those that have no decimal places. What is the most reasonable way to convert this column from char to numeric? This column is part of a database table uploaded by someone else so there's not much option to change that. Thanks.
You don't normally supply the decimal width in informats. For a normal number, you only supply the width, and SAS will figure out the decimal for you (based on the position of the decimal point).
datanum = input(data,15.);
The .d part of an informat is to allow for compatibility with (mostly) older systems that did not have decimals in the data, to save space. For example, if I'm reading in money amounts, and I only have 6 spaces:
123456
882348
100000
123400
I can read that in as an integer amount of cents - or I can do:
input cost 6.2;
That will then tell SAS to place the decimal before the last 2 characters.
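The w.d behaviour described above explains the misplaced decimal point in the question: the d part only applies when the text contains no decimal point. A rough Python emulation (not SAS; the function name read_w_d is hypothetical and edge cases such as blanks or scientific notation are not modeled) makes this visible:

```python
from decimal import Decimal


def read_w_d(text: str, width: int, d: int = 0) -> Decimal:
    """Rough emulation of a w.d numeric informat."""
    field = text[:width].strip()
    value = Decimal(field)
    if "." in field:
        # An explicit decimal point in the data wins; d is ignored.
        return value
    # No decimal point: the last d digits become the fractional part.
    return value.scaleb(-d)


print(read_w_d("123456", 6, 2))          # 1234.56
print(read_w_d("392282764.234", 15, 4))  # 392282764.234
print(read_w_d("1052969525", 15, 4))     # 105296.9525 (the unwanted result)
print(read_w_d("1052969525", 15))        # 1052969525
```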

Incorrect conversion when decimal point embedded in VT_BSTR and German locale used

I have a piece of C++ code that writes some floating-point values to Excel like this:
...
values[ position ].bstrVal = formattedValue;
values[ position ].vt = VT_BSTR;
...
As you can see, those floating-point values are stored as strings, and the decimal point is formatted in different ways, for example:
"110.000000", "20.11" etc. (this example is for the English locale)
This works perfectly when the English locale is used. However, when I switch to the German locale in the Control Panel, the decimal point changes to "," (and that's fine), but after passing those localized strings to Excel they are not converted correctly. For example, when writing "110,000000" I get 110 million in Excel. Other values like "20,11" stay as text.
The only way to fix this is to overwrite the decimal point with "." in my program before writing to Excel. Any ideas why the conversion is not locale-aware when using VT_BSTR?
I should also add that I tried switching the locale in my program from the default one to German, still with no luck.
Thank you in advance
It is never a good idea to let Excel guess at the value type. Do not use VT_BSTR, a currency value should be of variant type VT_CY. Assign the cyVal member with the value. It is an 8 byte integer value (int64 member of type LONGLONG), the currency amount multiplied by 10,000. Ten thousand :)
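The scaling used by the currency type can be illustrated with a short Python sketch. It only shows the arithmetic (amount times 10,000, stored as a 64-bit integer) and is not COM code; the function name to_cy_int64 and the choice of half-even rounding are assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CY_SCALE = 10_000  # a currency value stores the amount multiplied by 10,000


def to_cy_int64(amount: str) -> int:
    """Scale a decimal amount (using '.' as separator) to a 64-bit integer."""
    scaled = Decimal(amount) * CY_SCALE
    # Rounding mode is an assumption for this sketch.
    value = int(scaled.to_integral_value(rounding=ROUND_HALF_EVEN))
    if not -(2**63) <= value < 2**63:
        raise OverflowError("amount does not fit in a 64-bit integer")
    return value


print(to_cy_int64("20.11"))       # 201100
print(to_cy_int64("110.000000"))  # 1100000
```

Because the value is passed as an integer rather than as a string, no decimal separator is involved and the locale cannot change how it is interpreted.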