Convert Scientific notation to text or integer using regex in Notepad++ - regex

I am looking to convert a scientific notation number into the full integer number.
E.g:
8.18234E+11 => 818234011668
Excel reformatted all my upc codes within a csv and this solution is not working for me.
I have my csv open in Notepad++ and would love to do this using a regex find and replace.
Thanks.

The damage is already done and cannot be recovered from the CSV file. 8.18234E+11 could be anything* from 818233500000 to 818234499999.
To prevent Excel from rounding large numbers, you need to store them as text. If you set the cell format to text, any value inserted from then on should be automatically interpreted as text. In OpenOffice Calc (I don't have MS Excel), you can also prefix a numeric value with ' to get it interpreted as text no matter the cell format.
There is a chance that the correct value is stored in the original XLS (or XSLX or ODS or the live Excel session or ...) file. If not, then you'll have to enter the data again. If the data is there, you need to store it as text or increase the number of significant digits in the exported CSV. If you only have the exported data, then you're out of luck.
*UPC has a single check digit, so only 100 000 out of the 1 000 000 codes are actually valid UPC codes.

Related

How to let PowerBI read dot as decimal in csv

I use R output csv file for powerbi
and the decimal in the file is dot
e.g. 27.25
and the PowerBI treat this as Text rather than number.
How should I fix this?
because this is .csv file, I will not able to change 27.25 to 27,25
I tried to change the 'reginal setting' to 'English''chinese''us'...not work.
any suggestion? thank you!
You can use the functionality Using Locale to transform a column to a desired region.
So, I have loaded these values as an example
With Using Locale, I can transform decimals on the expected locality, in this case, any country that uses dots as decimal, I will use English (United States).
Step 1
Step 2
Output

Meaning of 3F7.1 in Fortran data format

I am trying to create an MDM file using HLM 7 Student version, but since I don't have access to SPSS I am trying to import my data using ASCII input. As part of this process I am required to input the data format Fortran style. Try as I might I have not been able to understand this step. Could someone familiar with Fortran (or even better HLM itself) explain to me how this works? Here is my current understanding
From the example EG3.DAT they give
(A4,1X,3F7.1)
I think
A4 signifies that the ID is 4 characters long.
1X means skip a space.
F.1 means that it should read 1 decimal places.
I am very confused about what 3F7 might mean.
EG3.DAT
2020 380.0 40.3 12.5
2040 502.0 83.1 18.6
2180 777.0 96.6 44.4
Below are examples from the help documents.
Rules for format statement
Format statement example
EG1 data format
EG2 data format
EG3 data format
One similar question is Explaining Fortran Write Format. Unfortunately it does not explicitly treat the F descriptor.
3F7.1 means 3 floating point numbers, each printed over 7 characters, each with one decimal number behind the decimal point. Leading characters are blanks.
For reading you don't need the .1 info at all, just read a floating point number from those 7 characters.
You guessed the meaning of A4 (string of four characters) and 1X (one blank) correctly.
In Fortran, so-called data edit descriptors (which format the input or output of data) may have repeat specifications.
In the format (A4,1X,3F7.1) the data edit descriptors are A4 and F7.1. Only F7.1 has a repeat specification (the number before the F). This simply means that the format is as though the descriptor appeared repeated: like F7.1, F7.1, F7.1. With a repeat specification of 1, or not given, there is just the single appearance.
The format of the question, then, is like
(A4,1X,F7.1,F7.1,F7.1)
This format is one that is covered by the rules provided in one of the images of the question. In particular, the aspect of repeat specification is given in rule 2 with the corresponding example of rule 3.
Further, in Fortran proper, a repeat count specifier may also be * as special case: that's like an exceptionally large repeat count. *(F7.1) would be like F7.1, F7.1, F7.1, .... I see no indication that this is supported by HLM but if this is needed a very large repeat count may be given instead.
In 1X the 1 isn't a repeat specification but an integral, and necessary, part of the position edit descriptor.
Procedure for making MDM file from excel for HLM:
-Make sure ALL the characters in ALL the columns line up
Select a column, then right click and select Format Cells
Then click on 'Custom' and go to the 'Type' box and enter the number
of 0s you need to line everything up
-Remove all the tabs from the document and replace them with spaces.
Open the document in word and use find and replace
-To save the document as .dat
First save it as .txt
Then open it in Notepad and save it as .dat
To enter the data format (FORTRAN-Style)
The program wants to read the data file space by space, so you have to specify it perfectly so that it reads the whole set properly.
If something is off, even by a single space, then your descriptive stats will be wonky compared to if you check them in another program.
Enclose the code with brackets ()
Divide the entries with commas ,
-Need ID column for all levels
ID column needs to be sorted so that it is in order from smallest to
largest
Use A# with # being the number of characters in the ID
Use an X1 to
move from the ID to the next column
-Need to say how many characters are needed in each column
Use F
After F is the number of characters needed for that column -Use F# (#= number)
There need to be enough character spaces to provide one 'gap' space
between each column
There need to be enough to character spaces to allow for the decimal
As part of the F you need to specify the number of decimal places
You do this by adding a decimal point after the F number and then a
number to represent the spaces you need -F#.#
You can use a number in front of the F so as to 'repeat' it. Not
necessary though. -#F#.#
All in all, it should look something like this:
(A4,X1,F4.0,F5.1)
Helpful links:
https://books.google.de/books?id=VdmVtz6Wtc0C&pg=PA78&lpg=PA78&dq=data+format+fortran+style+hlm&source=bl&ots=kURJ6USN5e&sig=fdtsmTGSKFxn04wkxvRc2Vw1l5Q&hl=en&sa=X&ved=0ahUKEwi_yPurjYrYAhWIJuwKHa0uCuAQ6AEIPzAC#v=onepage&q&f=false
http://www.ssicentral.com/hlm/help6/error/Problems_creating_MDM_files.pdf
http://www.ssicentral.com/hlm/help7/faq/FAQ_Format_specifications_for_ASCII_data.pdf

Regular expression for a time format

I am trying to search for cells in a column containing a time in the format of 00:00 using the MATCH function. I have tried things similar to MATCH("??:??",A:A) and MATCH("?*:?*",A:A) with no luck. How can I form a regular expression of 2 digits, then a colon, then 2 digits?
Often times what is displayed to us in excel isn't the actual value. Time and Dates are one such time. Generally if you enter a time into excel 3:55 it will convert that automatically to excel's time format, which is a decimal number: 0.163194444444444, but it formats it automatically to "mm:ss" just like you entered it.
So... when you try to =MATCH() using a wildcard, you aren't going to find a hit since the value in the cell is actually that decimal number.
Instead you have to convert the value of the cells you are searching into a text format. You can do that with the =TEXT() formula. Assuming your data starts in A1 you can put in B1:
=Text(A1, "mm:ss")
Now the returned value from that formula still looks like the mm:ss format, but it's now text. The underlying value is actually 3:55. No funny business. You can now base your wild card search of "*:*" off of this column:
=Match("*:*", B1:B10, 0)
If you want to do this all in one formula you can use an Array formula (or CSE):
=Match("*:*", Text(A1:A10,"mm:ss"),0)
Using Ctrl+Shift+Enter (instead of Enter) when you enter the formula.

convert existing numeric cells to text

I've got a .xls file with a column of zip codes.
Since they are all 5 digit numbers, Open Office Calc is treating them as numbers. I want it to treat them as text.
I know I can do it by prepending an apostrophe to all the numeric fields. But I've got a couple dozen spreadsheets with a couple thousand zip codes.
I've tried selecting the column and doing Format - Cells - picking the Number tab and selecting Text. But that doesn't work and the Format Code is # instead of '
Is there a way to select a column of numeric data and automatically add a ' to the beginning of each field?
You could do it using VBA in a macro, which would be straightforward. Alternatively you can make a formula using BAHTTEXT (under text) which converts numbers to text. If that doesn't work you can use CONCATENATE and have a column of all apostrophes, and just join those two columns. Then just drag down the formula and you will have a new column of text.

long digit reading in sas

I have a long ID number (say, 12184447992012111111). BY using proc import from csv file that number shortens itself with a addition of 'E' in between the digits (1.2184448E19, with format best12. and informat best32.). Browsing here I got to know the csv format itself shortens it previously so it is nothing to do with SAS. So I tried to copy say about 5 numbers and use datalines statement then also it results same.... It wil be helpful if anyone can suggest which format I need to use. Using best32. format I donot get the original number since most probably it modifies that altered number, which infact gives me 12184447992012111872 which is not my desired number.
Because your ID variable is really an identifier rather than a "real" number, you need to read it in as a character string. The value you show as an example is too large to be represented as an integer, so since SAS stores all numerics as floating point, you are losing "precision".
Since you mention using PROC IMPORT, copy the SAS program it generates and change the FORMAT and INFORMAT specifications from "21." and "best32." to "$32." (or whatever value matched your data.
Best of course would be if you had SAS Access to PC File formats, in which case you cound format the column as "text" in Excel and let SAS read it directly.
I'm not sure about the csv changing the value (they are just plain text files) - unless you are saving an excel spreadsheet as a csv file. If you are using excel just set the column to number format, no decimal places.
It might be easier to treat the column as text when importing it to SAS - unless you need to perform mathematical operations on it! If you really need to keep it as a number the format 32. should force it to be a 32 digit number - best is fairly sensibly changing it into scientific notation (though I suspect the data is there in the background and just displayed unhelpfully).
There is a SAS informat for reading exponential notation - Ew.d where w is the width and d the number of decimal places. In your case, it probably won't help because you will "lose" the complete number - and the value stored in case you read with this informat will be 1.2184448 * (10^19). The only way in your case is to ensure that the program which produces the CSV file outputs it in the right way. If you are creating the data from an Excel worksheet, then format the number in the Excel worksheet to display all the digits correctly.