Casting not working correctly in Amazon Athena (Presto)? - amazon-web-services

I have a doctor license registry dataset that includes the total_submitted_charge_amount for each doctor as well as the number of entitlements with Medicare & Medicaid. I used the query from the answer suggested below:
with datamart AS
(SELECT npi,
provider_last_name,
provider_first_name,
provider_mid_initial,
provider_address_1,
provider_address_2,
provider_city,
provider_zipcode,
provider_state_code,
provider_country_code,
provider_type,
number_of_services,
CASE
WHEN REPLACE(num_entitlement_medicare_medicaid,',', '') ='' THEN
null
ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
END AS medicare_medicaid_entitlement,
CASE
WHEN REPLACE(total_submitted_charge_amount,',', '') ='' THEN
null
ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
END AS total_submitted_charge_amount
FROM cmsaggregatepayment2017)
SELECT *
FROM datamart
ORDER BY total_submitted_charge_amount DESC
Unfortunately I get the error:
INVALID_CAST_ARGUMENT: Cannot cast VARCHAR '' to DECIMAL(38, 0)
This query ran against the aggregatepayment_data_2017 database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: be01d1e8-dc4d-4c75-a648-428dcb6be3a5.
I have tried DECIMAL, REAL, and BIGINT, and nothing works for casting num_entitlement_medicare_medicaid. Below is a screenshot of what the data looks like:
Can someone please suggest how to rephrase this query?

Instead of putting cast/replace in your queries, you could convert the data into a new table with 'clean' data:
CREATE TABLE clean_table
WITH (format='Parquet', external_location='s3://my_bucket/clean_data/')
AS
SELECT
npi,
provider_last_name,
provider_first_name,
...
CASE WHEN REPLACE(num_entitlement_medicare_medicaid,',', '') ='' THEN null
ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
END AS medicare_medicaid_entitlement,
CASE WHEN REPLACE(total_submitted_charge_amount,',', '') ='' THEN null
ELSE CAST(REPLACE(total_submitted_charge_amount,',', '') AS DECIMAL)
END AS total_submitted_charge_amount
FROM cmsaggregatepayment2017
You can then SELECT ... FROM clean_table without having to do any conversions.
In data warehousing, this type of process is known as ETL (Extract, Transform, Load). The cleaning process is the 'transform' to convert the data into a more useful format.
See: CREATE TABLE AS - Amazon Athena

You might want to try try_cast() in Presto. It works like cast(), but if the cast fails it returns NULL instead of raising an error, so the query carries on with the next row.
Documentation: https://prestodb.io/docs/current/functions/conversion.html

The reason you are getting the error is that the column contains blank values (empty strings, not NULL), and an empty VARCHAR '' cannot be cast to DECIMAL. You can handle this with a CASE expression. Also, in this data set the column num_entitlement_medicare_medicaid contains commas (','), which have to be removed before casting.
SELECT npi,
case
when REGEXP_REPLACE(num_entitlement_medicare_medicaid,'[^0-9.]', '') ='' then null
else CAST(REGEXP_REPLACE(num_entitlement_medicare_medicaid,'[^0-9.]', '') AS DECIMAL)
end as medicare_medicaid_entitlement,
case
when REGEXP_REPLACE(total_submitted_charge_amount,'[^0-9.]', '') ='' then null
else CAST(REGEXP_REPLACE(total_submitted_charge_amount,'[^0-9.]', '') AS DECIMAL)
end as total_submitted_charge_amount
FROM cmsaggregatepayment2017
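The cleaning rule above can be checked locally before running it in Athena. The following is only a sketch in Python of the same idea (strip everything except digits and dots, map empty results to null); the function name clean_decimal is made up for illustration and is not part of Athena or Presto. It also shows why a literal replace is not enough when the first argument is meant as a regular expression.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Same character class as the SQL above: keep digits and dots only.
NON_NUMERIC = re.compile(r"[^0-9.]")


def clean_decimal(raw: Optional[str]) -> Optional[Decimal]:
    """Hypothetical helper: return a Decimal, or None when nothing usable is left."""
    if raw is None:
        return None
    digits = NON_NUMERIC.sub("", raw)
    if digits == "":
        return None
    try:
        return Decimal(digits)
    except InvalidOperation:
        # Leftovers such as "1.2.3" are not numbers; returning None here
        # is similar in spirit to what try_cast() does.
        return None


# A literal replace looks for the text "[^0-9.]" itself, so nothing changes.
print("1,234.50".replace("[^0-9.]", ""))  # 1,234.50
print(clean_decimal("1,234.50"))          # 1234.50
print(clean_decimal(""))                  # None
```

Running a handful of raw values from the table through a helper like this makes it easier to spot rows that would still fail the cast.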

Related

regexp_similar '^.$' issues in teradata

For data scrubbing I have a lot of hard-coded values in my program, and I am trying to move those values into a table. One of the scrubbing conditions is to find names that are a single character long (character_length(name) = 1).
But when I try to emulate this with the pattern ^.$, it does not catch values like ¿, ¥, Ã.
Please let me know if I am doing something wrong.
When I run the code below, I see these 3 values: ¿, ¥, Ã
select name from email_table
where character_length(name) = 1
and name not in
(select name from email_table
where regexp_similar(translate(name USING LATIN_TO_UNICODE WITH ERROR),'^.$', 'i') = 1)
It seems the issue is due to the Teradata version.
We have TD 14 and TD 15 on different servers, and I ran the following query on both:
select case when regexp_similar('¥','^.$', 'i')=1
then 'Y'
else 'N'
end as output;
On TD 14 the output is 'N', and on TD 15 it is 'Y'.

How to remove carriage returns and new lines on all the columns in a table using Postgresql?

I am trying to see if there is any way to remove carriage returns and newlines from all the varchar columns in a table using one statement.
I know that we can do this for a single column using something like the following:
select regexp_replace(field, E'[\\n\\r]+', ' ', 'g' )
With that approach I would need one call for every column, which I don't want to do unless there is no easier way.
Appreciate your help!
You can do this either by creating a plpgsql function that executes dynamic SQL, or by running it directly via DO, as in the following example (replace <mytable> with the name of your table):
do $$declare _q text; _table text = '<mytable>';
begin
select 'update '||attrelid::regclass::text||E' set\n'||
string_agg(' '||quote_ident(attname)||$q$ = regexp_replace($q$||quote_ident(attname)||$q$, '[\n\r]+', ' ', 'g')$q$, E',\n' order by attnum)
into _q
from pg_attribute
where attnum > 0 and atttypid::regtype::text in ('text', 'character varying')
group by attrelid
having attrelid = _table::regclass;
raise notice E'Executing:\n\n%', _q;
-- uncomment this line when happy with the query:
-- execute _q;
end;$$;
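To see what statement the DO block builds, here is a rough Python sketch that assembles the same kind of UPDATE from a list of column names. The helper names build_update and quote_ident are illustrative only; this quote_ident is a simplified stand-in that always double-quotes, which is not exactly what PostgreSQL's own quote_ident does, and the table and column names in the usage lines are made up.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Simplified stand-in for PostgreSQL's quote_ident: always double-quote."""
    return '"' + name.replace('"', '""') + '"'


def build_update(table: str, columns: List[str]) -> str:
    """Build one UPDATE that strips CR/LF runs from every listed column."""
    if not columns:
        raise ValueError("at least one column is required")
    assignments = ",\n".join(
        f"  {quote_ident(c)} = regexp_replace({quote_ident(c)}, '[\\n\\r]+', ' ', 'g')"
        for c in columns
    )
    return f"update {quote_ident(table)} set\n{assignments}"


# Hypothetical table and column names, for illustration only.
print(build_update("my_table", ["name", "notes"]))
```

As with the DO block, it is safer to print and inspect the generated statement before executing it.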

Cellular number valid in pl sql oracle

I want to check whether a cellular number is valid and, if it is, return it in a standard format.
For example, these numbers are valid:
0521234567
521234567 (only needs a 0 added at the start)
052-1234567
(052)1234567
052-123-456-7
These numbers are not valid:
052123
0871234567
How do I do it?
I tried to write:
SELECT REGEXP_REPLACE('0521234567', '^0?(5[0-9])(\-)?\d{7}$', '') FROM dual;
but it returns ''.
Thanks.
SELECT CASE WHEN REGEXP_LIKE('0521234567', '^0?(5[0-9])(\-)?\d{7}$')
THEN '0521234567'
ELSE NULL END
FROM dual;
If the string satisfies the format, this returns the string; otherwise it returns NULL.
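The pattern above does not cover every example in the question, such as (052)1234567 or 052-123-456-7. One way to handle them is to strip the separators first and validate afterwards. The sketch below shows that idea in Python rather than PL/SQL; the function name normalize_cell_number is hypothetical, and the rule "10 digits starting with 05 after normalization" is an assumption inferred from the sample numbers, not a stated requirement.

```python
import re
from typing import Optional


def normalize_cell_number(raw: str) -> Optional[str]:
    """Return the number as 05XXXXXXXX, or None if it is not valid."""
    # Drop the separators seen in the examples: hyphens, parentheses, spaces.
    digits = re.sub(r"[-()\s]", "", raw)
    if not re.fullmatch(r"[0-9]+", digits):
        return None
    # A 9-digit number starting with 5 is only missing the leading 0.
    if re.fullmatch(r"5[0-9]{8}", digits):
        digits = "0" + digits
    # Assumed rule: valid numbers are 10 digits and start with 05.
    return digits if re.fullmatch(r"05[0-9]{8}", digits) else None


for sample in ["0521234567", "521234567", "052-1234567", "(052)1234567",
               "052-123-456-7", "052123", "0871234567"]:
    print(sample, "->", normalize_cell_number(sample))
```

The same two steps (remove separators, then match a strict pattern) can be expressed in Oracle with a nested REGEXP_REPLACE inside the REGEXP_LIKE check.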

InvalidOperation: Invalid literal for Decimal: u' '

When users allocate money to the envelopes, they sometimes forget to enter an amount for some of them, so those fields are submitted empty. This then raises InvalidOperation.
How can I fix this error? Or how can the system pick up only the amounts that are greater than 0?
Exception
Types: InvalidOperation
Value: Invalid literal for Decimal: u''
envelopes/views.py in allocate (application)
t2_payee = 'Envelope Transfer'
for val in request.POST:
    if val[0:4] == "env_":
        env = Envelope.objects.get(pk=int(val[4:]))
        amt = Decimal(request.POST[val])
<WSGIRequest
path:/envelopes/allocate/6313/,
GET:<QueryDict: {}>,
POST:<QueryDict: {u'allocation_date': [u'2013-03-03'], u'month': [u'03'],
u'source': [u'6313'], u'year': [u'2013'], u'env_6316': [u''],
u'csrfmiddlewaretoken': [u'3kKoVymvIpbyhCknE1c3WH6YFznTaEoj'],
u'env_6315': [u'1'], u'env_6314': [u'0']}>,
COOKIES:{'__utma': '136509540.132217190.1357543480.1362303551.1362307904.34',
'__utmb': '1
In your example the value of env_6316 is empty, and Decimal doesn't know how to convert an empty string to a number. You should check whether the value is empty and, if so, replace it with 0 before converting it to Decimal.
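A minimal sketch of that check follows. It uses a plain dict as a stand-in for request.POST, and the helper name parse_amount is made up for illustration; only empty or whitespace-only values are mapped to 0, so any other invalid input would still raise InvalidOperation.

```python
from decimal import Decimal


def parse_amount(raw):
    """Treat an empty or whitespace-only value as 0, otherwise parse it."""
    text = (raw or "").strip()
    if not text:
        return Decimal("0")
    return Decimal(text)


# Stand-in for request.POST, using the values from the error report.
post = {"env_6316": "", "env_6315": "1", "env_6314": "0", "month": "03"}

allocations = {
    int(key[4:]): parse_amount(value)
    for key, value in post.items()
    if key.startswith("env_")
}

# Keep only the envelopes that actually received money.
positive = {pk: amount for pk, amount in allocations.items() if amount > 0}
print(positive)  # {6315: Decimal('1')}
```

Filtering after parsing also answers the second part of the question: envelopes with an empty or zero amount are simply skipped.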
I encountered this error while running a SQL query and attempting to construct a Pandas dataframe with the data returned from the query. An alteration to the query solved the problem for me. I had also attempted to CAST the column values returned, but ultimately, appending ::FLOAT8 to the problematic field was the only solution for me.
Example query:
SELECT sum(dollars)::FLOAT8 FROM [table] WHERE ...
sum(dollars) was the field causing the issue for me. Its type in my table was numeric(10,6), and its size was 8.

Django - coercing to Unicode

I am having a Unicode problem and, as always when something like this comes up, I'm completely lost.
One of my Django templates raises a TypeError:
Exception Value:
coercing to Unicode: need string or buffer, long found
The line causing trouble just builds a string (which I want to use in a MySQL query):
query = unicode('''(SELECT asset_name, asset_description, asset_id, etat_id, etat_name FROM Asset LEFT OUTER JOIN Etat ON etat_id_asset=asset_id WHERE asset_id_proj='''+proj+''' AND asset_id_type='''+t.type_id+''' ORDER BY asset_name, asset_description) UNION (SELECT asset_name, asset_description, asset_id, 'NULL', 'NULL' FROM Asset WHERE asset_id_proj='''+proj+''' AND asset_id_type='''+t.type_id+''' AND asset_id IN (SELECT etat_id_asset FROM Etat)); ''')
What can be wrong here?
I know you figured out a better way to accomplish this, but to answer the original question, in case you get that error again somewhere else in the project:
t.type_id appears to be a long integer. You cannot concatenate an integer with a string unless you first convert it to a string, which is simple to do:
myString = 'some string with type id ' + str(t.type_id) + ', and whatever else you want in the string.'
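The short sketch below reproduces the problem and the fix outside Django. It is written for Python 3, where the TypeError message is worded differently from the Python 2 message in the question, and type_id is just a stand-in for t.type_id.

```python
type_id = 42  # stand-in for t.type_id, which is an integer

# Concatenating a str and an int raises TypeError.
try:
    clause = "asset_id_type=" + type_id
except TypeError as exc:
    print("TypeError:", exc)

# Converting explicitly, or letting format() do it, avoids the error.
clause = "asset_id_type=" + str(type_id)
same_clause = "asset_id_type={}".format(type_id)
print(clause)       # asset_id_type=42
print(same_clause)  # asset_id_type=42
```

For values that end up in a SQL statement, passing them as query parameters to the database driver is generally safer than building the string by hand.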