How to convert a text field with formatted currency to a numeric field type in Postgres?

I have a table with a text field that holds formatted strings representing money.
For example, it has values like these, but it also contains "bad", invalid data:
$5.55
$100050.44
over 10,000
$550
my money
570.00
I want to convert this to a numeric field, keeping the actual numbers wherever they can be retained and converting everything else to null.
I was originally using this function, which did convert clean numbers (numbers without any formatting). The issue was that it would not convert $5.55, for example, and set it to null instead.
CREATE OR REPLACE FUNCTION public.cast_text_to_numeric(
    v_input text)
    RETURNS numeric
    LANGUAGE plpgsql
    COST 100
    VOLATILE
AS $BODY$
declare
    v_output numeric default null;
begin
    begin
        v_output := v_input::numeric;
    exception when others then
        return null;
    end;
    return v_output;
end;
$BODY$;
I then created a simple UPDATE statement that removes every character except word characters (letters, digits, underscore) and the period:
update public.numbertesting set field_1 = regexp_replace(field_1, '[^\w.]', '', 'g');
If I then run this statement, it correctly converts the text data to numeric and keeps the numbers:
alter table public.numbertesting
alter column field_1 type numeric
using field_1::numeric;
But I need to use the function in order to properly discard any bad data and set those values to null.
Even after I run the cleanup so that the text value is, say, 5.55,
my "cast_text_to_numeric" function STILL sets it to null. I don't understand why it does that when the statement above correctly converts the value to a proper number.
How can I fix my cast_text_to_numeric function so that it properly converts values such as 5.55?
I'm fine with discarding (setting to NULL) any values that don't end up as digits and a period. The regular expression strips out everything except letters, digits, underscores and the period, and if there happen to be two numbers in the text field, the script combines them into one (spaces are removed); I'm good with that.
For the example data above, the end result in the numeric field would be:
5.55
100050.44
null
550
null
570.00
FYI, I am on Postgres 11 right now
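To sanity-check the expected results above, here is a small Python model of the same two-step rule: strip with the same character class as the UPDATE ([^\w.]), then accept only what looks like a plain decimal and return None for everything else. It is not the Postgres function; the name cast_text_to_numeric is reused only for readability, and the "plain decimal" pattern is an assumption about which cleaned strings should count as numbers.

```python
import re
from decimal import Decimal
from typing import Optional

# Same character class as regexp_replace(field_1, '[^\w.]', '', 'g'):
# keep word characters (letters, digits, underscore) and the period.
STRIP = re.compile(r"[^\w.]")

# Assumption: after stripping, only plain decimals such as "550", "5.55",
# ".5" or "5." count as numbers; anything with letters or "_" becomes None.
PLAIN_DECIMAL = re.compile(r"[0-9]+\.?[0-9]*|\.[0-9]+")


def cast_text_to_numeric(value: Optional[str]) -> Optional[Decimal]:
    """Clean a formatted currency string and return a Decimal, or None."""
    if value is None:
        return None
    cleaned = STRIP.sub("", value)
    if not PLAIN_DECIMAL.fullmatch(cleaned):
        return None
    return Decimal(cleaned)


samples = ["$5.55", "$100050.44", "over 10,000", "$550", "my money", "570.00"]
for s in samples:
    print(repr(s), "->", cast_text_to_numeric(s))
```

With this rule "over 10,000" becomes "over10000" and "my money" becomes "mymoney", so both map to None, which matches the expected nulls in the list above.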

Related

Nvarchar '45,56' casted to decimal is 4,00. Nvarchar '45' to tinyint is 4. Why?

There are 2 text boxes called #IMPORTOORARIO and #ANTICIPO.
The user writes '45,56' in the first one and '45' in the second one.
I want to cast the first string to decimal and the second one to tinyint.
No matter what I try, if the cast succeeds I end up with '45,56' cast to 4,00 and '45' cast to 4.
Just 4. Not 4,00.
Here is part of the stored procedure I am using:
INSERT INTO TableName
VALUES
(
    try_convert(decimal(6, 2), @IMPORTOORARIO),
    try_convert(tinyint, @ANTICIPO)
)
I attach a screenshot to show the problem.
You will see more text boxes and table fields but the problem is always the same, just focus on #IMPORTOORARIO and #ANTICIPO.
#IMPORTOORARIO = number 3. #ANTICIPO = number 4
Info: the maximum number I am going to store in the database is decimal(9, 2).
So something like 123456,78:
6 digits before the comma and 2 after it.
To solve the problem, I tried to use CAST, CONVERT, TRY_CONVERT, and something I did not understand with REPLACE:
Select try_convert(numeric(6, 2),replace('25,12', ',', '.'))
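The REPLACE in that last query swaps the decimal comma for a period before converting. A rough Python analogue of TRY_CONVERT(decimal(p, 2), REPLACE(text, ',', '.')) is sketched below. The function name try_convert_decimal is hypothetical, and the rounding mode (half away from zero) is an assumption about SQL Server's behaviour rather than something taken from its documentation; the overflow rule simply reflects that decimal(p, s) keeps p - s digits before the separator.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP
from typing import Optional


def try_convert_decimal(text: str, precision: int = 9, scale: int = 2) -> Optional[Decimal]:
    """Hypothetical analogue of TRY_CONVERT(decimal(precision, scale), REPLACE(text, ',', '.')).

    Returns None instead of raising, the way TRY_CONVERT returns NULL.
    """
    try:
        # Treat a comma as the decimal separator, as REPLACE(text, ',', '.') does.
        value = Decimal(text.strip().replace(",", "."))
        if not value.is_finite():
            return None
        # Round to `scale` decimals (ROUND_HALF_UP rounds half away from zero).
        value = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    except InvalidOperation:
        return None
    # decimal(p, s) has room for at most p - s digits before the separator.
    if abs(value) >= Decimal(10) ** (precision - scale):
        return None
    return value


print(try_convert_decimal("45,56"))                   # prints 45.56
print(try_convert_decimal("123456,78", precision=6))  # prints None
```

The second call also shows why decimal(6, 2) is too small for the values described above: it leaves only four digits before the separator, so 123456,78 needs at least decimal(8, 2).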

Multiple To clauses in Data step

I have a data step where I have a few columns that need to be tied to one other column.
I have tried using multiple "from" statements and "to" statements and a couple of other permutations of that, but nothing seems to do the trick. The code looks something like this:
data analyze;
set css_email_analysis;
from = bill_account_number;
to = customer_number;
output;
from = bill_account_number;
to = email_addr;
output;
from = bill_account_number;
to = e_customer_nm;
output;
run;
I would like to see two columns showing bill accounts in the "from" column, and the other values in the "to", but instead I get a bill account and its customer number, with some "..."'s for the other values.
Issue
This is most likely because SAS has only two data types (character and numeric), and the first time the to variable is set up it takes the characteristics of customer_number. In your second to assignment you attempt to give to the value of email_addr. Assuming email_addr is a character variable, two things can happen here:
If customer_number is numeric: to has already been set up as numeric, so SAS cannot force it to become character, and a message like this may appear:
NOTE: Invalid numeric data, 'me@mywebsite.com' , at line 15 column 8. to=.
_ERROR_=1 _N_=1
If customer_number is character: to has been set up as character with the length of customer_number, because its length was never defined explicitly. If that length is shorter than the value of email_addr, the email address will be truncated. SAS will not show an error if this happens:
Code:
data _NULL_;
to = 'hiya';
to = 'me@mydomain.com';
put to=;
run;
to=me@m
to is set with a length of 4, and SAS does not expand it to fit the new data.
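To picture the rule just described (the first assignment fixes a character variable's length, and later values are silently cut to that length), here is a toy Python model. It is not SAS and ignores details such as blank padding; the CharVar class and its optional length argument, which stands in for a LENGTH statement, are illustrative only.

```python
from typing import Optional


class CharVar:
    """Toy model of a SAS character variable: its length is fixed the first
    time a value is assigned, unless it was declared up front."""

    def __init__(self, length: Optional[int] = None):
        # None means "not declared"; a number stands in for e.g. `length to $200;`
        self.length = length
        self.value = ""

    def assign(self, text: str) -> str:
        if self.length is None:
            # The first assignment decides the length.
            self.length = len(text)
        # Longer values are truncated without any error.
        self.value = text[: self.length]
        return self.value


to = CharVar()
to.assign("hiya")
print("to=" + to.assign("me@mydomain.com"))  # prints to=me@m

declared = CharVar(length=200)
declared.assign("hiya")
print("to=" + declared.assign("me@mydomain.com"))  # prints to=me@mydomain.com
```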
Detail
The thing to bear in mind here is how SAS works behind the scenes.
The data statement sets up an output location
The set statement adds the variables from the first observation of the specified dataset to a space in memory called the PDV, inheriting their lengths and data types.
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm
===================================================================
010101 | 758|me@my.com |John Smith
The to statement adds another variable inheriting the characteristics of customer_number
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me@my.com |John Smith |758
(to is either char length 3 or a numeric)
Subsequent to statements will not alter the characteristics of the variable and SAS will continue processing
PDV (if customer_number is character = TRUNCATION):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me@my.com |John Smith |me@
PDV (if customer_number is numeric = DATA ERROR, to set to missing):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me@my.com |John Smith |.
Resolution
To resolve this issue it's probably easiest to set the length and type of to before your first to statement:
data analyze;
set css_email_analysis;
from = bill_account_number;
length to $200;
to = customer_number;
output;
...
You may get messages like this, where SAS has converted data on your behalf:
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
27:8
N.B. it's not necessary to explicitly define the length and type of from, because as far as I can see, you only ever get the values for this variable from one variable in the source dataset. You could also achieve this with a rename if you don't need to keep the bill_account_number variable:
rename bill_account_number = from;
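For comparison only, the same wide-to-long reshape (one from column, one to column) can be sketched in pandas with DataFrame.melt. This is not SAS code: the single row reuses the example values shown in the PDV listing above, and the helper name to_long is hypothetical. Converting the to column to strings plays the role of declaring to as a long character variable.

```python
import pandas as pd


def to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Stack customer_number, email_addr and e_customer_nm into one 'to'
    column, paired with bill_account_number as 'from'."""
    long = df.melt(
        id_vars="bill_account_number",
        value_vars=["customer_number", "email_addr", "e_customer_nm"],
        var_name="source",
        value_name="to",
    )
    long = long.rename(columns={"bill_account_number": "from"})
    # One text column holds every value, like `length to $200;` in SAS.
    long["to"] = long["to"].astype(str)
    return long[["from", "to"]]


css_email_analysis = pd.DataFrame(
    {
        "bill_account_number": ["010101"],
        "customer_number": [758],
        "email_addr": ["me@my.com"],
        "e_customer_nm": ["John Smith"],
    }
)
print(to_long(css_email_analysis))
```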

CAST numeric to varchar is giving scientific notation

I have three fields that I am trying to concatenate into one large field. Two of the fields are varchar, but one is a float. In certain situations, the concatenated field is showing scientific notation. The concatenated field should be a varchar and show the combination of the three fields regardless of how they are formatted. I am even seeing scientific notation when I just concatenate the two varchar fields when the values have all numbers in them. Why is this occurring and how can I fix it? Here are some examples of ways I am trying to do the concatenation:
Field1 = e.DocumentNo + e.Assignment + CAST(CAST([Amount in LC] as int) as nvarchar(50))
Field2 = CAST(e.DocumentNo + e.Assignment as varchar(255))
I have also tried using CONVERT and it does not provide the expected result. DocumentNo is a varchar(255) and Assignment is a varchar(255), yet when I have these values for each, 5115146916 and 1610000 respectively, Field2 looks like 5.11515E+16.
I also tried to use CONCAT() with the fields and it produces the same undesired result.
Here you go:
IF OBJECT_ID('TEMPDB..#ConcatData','U') IS NOT NULL
DROP TABLE #ConcatData;
CREATE TABLE #ConcatData(
[Amount in LC] [float] NULL,
[Assignment] [varchar](255) NULL,
[DocumentNo] [varchar](255) NULL)
INSERT INTO #ConcatData
VALUES
(-27.08, '20120295', '4820110172'),
(10625451.5124, '20140701', '4810122475'),
(205.5, 'TPE035948900001', '8200022827'),
(10000000, 'TPE035948900001', '8200022827')
SELECT DOCUMENTNO +
ASSIGNMENT +
CASE WHEN RIGHT(str([amount in lc],50,4),4) = '0000'
THEN ltrim(LEFT(str([amount in lc],50,4),LEN(str([amount in lc],50,4))-5))
WHEN RIGHT(str([amount in lc],50,4),3) = '000'
THEN ltrim(LEFT(str([amount in lc],50,4),LEN(str([amount in lc],50,4))-3))
WHEN RIGHT(str([amount in lc],50,4),2) = '00'
THEN ltrim(LEFT(str([amount in lc],50,4),LEN(str([amount in lc],50,4))-2))
WHEN RIGHT(str([amount in lc],50,4),1) = '0'
THEN ltrim(LEFT(str([amount in lc],50,4),LEN(str([amount in lc],50,4))-1))
ELSE ltrim(str([amount in lc],50,4))
END
FROM #ConcatData
The moral of the story here: float isn't the right data type for your column. I actually don't know when float is the right data type...
Anyway, the obnoxious CASE expression is needed to remove the excess decimal-place zeroes produced by STR(). You might even need more branches, but this covers you up to 4 decimal places and I think you'll get the idea.
One note: the first THEN removes 5 characters instead of 4, so that the decimal point is removed as well.
Output:
482011017220120295-27.08
48101224752014070110625451.5124
8200022827TPE035948900001205.5
8200022827TPE03594890000110000000
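The CASE cascade boils down to one rule: print the amount with a fixed 4 decimal places, as STR([amount in lc], 50, 4) does, then drop trailing zeros and a dangling decimal point. A small Python sketch of that rule, applied to the same four rows; the helper names are hypothetical, and like the SQL it only covers amounts with at most 4 meaningful decimals.

```python
def trim_amount(amount: float) -> str:
    """Format with 4 decimals, then strip trailing zeros and a trailing '.',
    mirroring the CASE cascade around STR()."""
    text = f"{amount:.4f}"  # always contains a '.', so rstrip('0') cannot eat integer digits
    return text.rstrip("0").rstrip(".")


def concat_row(document_no: str, assignment: str, amount: float) -> str:
    return document_no + assignment + trim_amount(amount)


rows = [
    ("4820110172", "20120295", -27.08),
    ("4810122475", "20140701", 10625451.5124),
    ("8200022827", "TPE035948900001", 205.5),
    ("8200022827", "TPE035948900001", 10000000),
]
for row in rows:
    print(concat_row(*row))
```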

Converting a string of numbers to hex and back to dec pandas python

I currently have a string of values that I retrieved after filtering data from a CSV file. I had to do some filtering of the data, but I have the same numbers as a list, a DataFrame, or an array. For each element I need to convert the number to hex, take the first 8 digits of the hex and convert them to decimal, and then also take the last 8 digits of the same hex and convert those to decimal.
I cannot provide a snippet because it is sensitive data, but here is an example.
I basically have something like this
>>> list_A
[52894036, 78893201, 45790373]
If I convert it to a dataframe and call df.dtypes, it says dtype: object and I can convert the values of Column A to bool, int, or string, but the dtype is always an object.
It does not matter whether it is a function, or just a simple loop. I have been trying many methods and am unable to attain the results I need. But ultimately the data is taken from different csv files and will never be the same values or list size.
Pandas is designed to work primarily with integers and floats, with no particular facilities for hexadecimal that I know of, but you can use apply to access standard python conversion functions like hex and int:
import pandas as pd
df = pd.DataFrame({'a': [52894036999, 78893201999, 45790373999]})
df['b'] = df['a'].apply(hex)
df['c'] = df['b'].apply(int, base=0)
Results:
a b c
0 52894036999 0xc50baf407 52894036999
1 78893201999 0x125e66ba4f 78893201999
2 45790373999 0xaa951a86f 45790373999
Note that this answer is for Python 3. For Python 2 you may need to strip off the trailing "L" in column "b" with str[:-1].
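The answer stops at the round trip. For the step the question actually asks for (take the first 8 and the last 8 hex digits and convert each back to decimal), here is a sketch built on the same apply approach. The helper split_hex and its width parameter are hypothetical: with width=0 the hex string is sliced as is, so values with 8 or fewer hex digits give the same number for both halves; width=16 zero-pads to 16 digits, which treats each value as a 64-bit word split into high and low 32-bit halves. Which reading is intended is an assumption, and values are assumed to be non-negative integers (or strings of digits).

```python
import pandas as pd


def split_hex(values, width: int = 0) -> pd.DataFrame:
    """Hex-encode each value, then convert its first 8 and last 8 hex digits
    back to decimal. Assumes non-negative integers (or digit strings)."""
    df = pd.DataFrame({"a": [int(v) for v in values]})
    # format(n, "x") gives hex digits without the "0x" prefix; zfill pads to `width`.
    df["hex"] = df["a"].apply(lambda n: format(int(n), "x").zfill(width))
    df["first8"] = df["hex"].str[:8].apply(int, base=16)
    df["last8"] = df["hex"].str[-8:].apply(int, base=16)
    return df


list_A = [52894036999, 78893201999, 45790373999]
print(split_hex(list_A))
print(split_hex(list_A, width=16))
```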

Format mask for number field items: trailing and 'leading' zero

I'm having some trouble displaying numbers in APEX, but only when I fill them in through code. When numbers are fetched through an automated row fetch, they're fine!
Leading Zero
For example, I have a report where a user can click a link, which runs a JavaScript function. There I get detailed values for that record through an application process. The returned values are in JSON. Several fields are number fields.
My response looks like this, for example:
{"AVAILABLE_STOCK": "15818", "WEIGHT": ".001", "VOLUME": ".00009", "BASIC_PRICE": ".06", "COST_PRICE": ".01"}
The numbers are already 'not correct' here: values less than one have no zero before the decimal point.
I had hoped that the format mask on the items would catch this. If I specify FM999G990D000 for the WEIGHT item, I'd expect it to show '0.001'.
But okay, I suppose it only works that way when the value comes through session state, and not when you set an item value through $("#").val()?
Where am I going wrong? Is my only option to change the SELECT in the application process?
Now:
SELECT '"AVAILABLE_STOCK": "' || AVAILABLE_STOCK ||'", '||
'"WEIGHT": "' || WEIGHT ||'", '||
'"VOLUME": "' || VOLUME ||'", '||
'"BASIC_PRICE": "' || BASIC_PRICE ||'", '||
Do I need to wrap my number fields in to_char with the format mask here, e.g. to_char(available_stock, 'FM999G990D000')?
Right now I need to put my numbers between quotes, of course, or I get invalid JSON when I parse it.
Trailing Zero
I have an application process on a page at the After Header point, right after an automated row fetch. Several fields (totals) are calculated here. The variables used are all declared as number(10, 2). All values are correct and rounded to 2 decimal places. The format masks on the items are also specified as FM999G999G990D00.
However, when one of the calculated values has only one meaningful digit after the decimal point, the trailing zero gets dropped. Instead of '987.50', it is displayed as '987.5'.
So, I have a number variable and assign it like this: :P12_NDB_TOTAL_INCL := v_totI;
Would I need to convert my numbers here too, with a format mask?
What am I doing wrong, or what am I missing?
If you aren't doing math on it and are more concerned with formatting, I suggest treating it as a varchar/string instead of as a number wherever you can.
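Following that suggestion, one option is to turn each number into its display string before it goes into the JSON, instead of relying on the item's format mask. A Python sketch of the idea, using the sample response from the question; the DECIMALS mapping and the helper names are hypothetical, and Python's ',' grouping is only a stand-in for the G element of an Oracle mask such as FM999G990D000, whose actual separator depends on NLS settings.

```python
import json
from decimal import Decimal

# Hypothetical: decimals per field, standing in for each item's format mask
# (FM999G990D000 -> 3 decimals, FM999G999G990D00 -> 2 decimals).
DECIMALS = {"AVAILABLE_STOCK": 0, "WEIGHT": 3, "VOLUME": 5, "BASIC_PRICE": 2, "COST_PRICE": 2}


def format_number(value: str, decimals: int) -> str:
    """Always print a digit before the decimal point and pad to a fixed
    number of decimals, so '.001' -> '0.001' and '987.5' -> '987.50'."""
    return f"{Decimal(value):,.{decimals}f}"


def build_response(row: dict) -> str:
    return json.dumps({key: format_number(val, DECIMALS[key]) for key, val in row.items()})


row = {"AVAILABLE_STOCK": "15818", "WEIGHT": ".001", "VOLUME": ".00009",
       "BASIC_PRICE": ".06", "COST_PRICE": ".01"}
print(build_response(row))
```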