converting values from string to int in hive - casting

I am creating a table in Hive:
create table patients (
  patient_id INT,
  age_group STRING,
  gender STRING,
  income_range STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
load data local inpath '/mnt/patients.csv' into table patients;
Now when I run the command:
hive> select * from patients limit 5;
I am getting the output:
NULL 75-84, F, 32000-47999
NULL 75-84, M, 16000-23999
NULL 85+, M, <16000
NULL 65-74, F, 32000-47999
NULL <65, M, <16000
But when I declare patient_id as STRING, it shows:
910997967, 75-84, F, 32000-47999
506013497, 75-84, M, 16000-23999
432041392, 85+, M, <16000
633048699, 65-74, F, 32000-47999
I tried to use:
hive>select CAST(patient_id AS int) from patients;
But it's not converting the values to int; it only shows:
NULL
NULL
...
How can the values of patient_id be converted to int?
Thanks

As #visakh pointed out, there is a comma (,) in your first column, patient_id: the file is comma-separated but the table is space-delimited, so every patient_id value carries a trailing comma, the INT conversion fails, and Hive returns NULL.
You need to remove this comma.
You may use
CAST(regexp_replace(patient_id, ',' , '') AS INT)
This is similar to
Hive function to replace comma in column value
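A minimal sketch of both fixes, assuming the trailing comma is the only non-numeric character in patient_id and that the file really is comma-plus-space separated:

-- Fix 1: clean the value at query time
select cast(regexp_replace(patient_id, ',', '') as INT) as patient_id,
       age_group, gender, income_range
from patients
limit 5;

-- Fix 2: reload into a table whose delimiter matches the file
-- (note: the other fields may then keep a leading space; trim() them if needed)
create table patients_csv (
  patient_id INT,
  age_group STRING,
  gender STRING,
  income_range STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
load data local inpath '/mnt/patients.csv' into table patients_csv;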

Related

combine column in json format in big query

I have columns in BigQuery named like a_firstname, a_middlename, a_lastname (my sample data and expected output were attached as screenshots).
I am trying to merge these columns into JSON using BigQuery, taking the letter before the underscore (the common name) as the output column and converting the rest of each name into a JSON key.
I am trying this query:
with selectdata as (
  SELECT a_firstname, a_middlename, a_lastname FROM `account_id.Dataset.Table_name`
)
select TO_JSON_STRING(t) AS json_data FROM selectdata AS t;
How can I join the columns, with a condition or a CASE, to achieve this output in BigQuery?
Consider the approach below:
create temp function extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
""";
create temp function extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
""";
select * except(row_id) from (
  -- pair each column name (key) with its value by matching offsets,
  -- group by the prefix before '_' and assemble a JSON object per group
  select format('%t', t) as row_id,
    split(key, '_')[offset(0)] as col,
    '{' || string_agg(format('"%s":"%s"', split(key, '_')[safe_offset(1)], value)) || '}' as value
  from your_table t, unnest(extract_keys(to_json_string(t))) key with offset
  join unnest(extract_values(to_json_string(t))) value with offset
  using(offset)
  group by row_id, col
)
-- one output column per prefix
pivot (any_value(value) for col in ('a','b','c'))
If applied to the sample data in your question, this produces the expected output.
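For a self-contained test, your_table can be replaced with an inline CTE placed after the two temp functions; the rows and column names below are invented for illustration:

-- hypothetical test data; the prefix before '_' becomes the output column
with your_table as (
  select 'John' as a_firstname, 'Q' as a_middlename, 'Doe' as a_lastname,
         'Springfield' as b_city, 'j.doe@example.com' as c_email
)
-- running the pivot query above against this CTE yields a single row:
-- a: {"firstname":"John","middlename":"Q","lastname":"Doe"}
-- b: {"city":"Springfield"}
-- c: {"email":"j.doe@example.com"}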

How to use string as column name in Bigquery

There is a scenario where I receive a string in a BigQuery function and need to use it as a column name.
Here is the function:
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column STRING, row_number int64) AS (
  (SELECT column from WORK.temp WHERE rownumber = row_number)
);
When I call this function as select METADATA.GET_VALUE("TXCAMP10", 149); I get the value TXCAMP10 back, so it is evidently processed as SELECT "TXCAMP10" from WORK.temp WHERE rownumber = 149. But I need it to run as SELECT TXCAMP10 from WORK.temp WHERE rownumber = 149, which would return some value from the temp table; let's suppose that value is A.
So ultimately I need the value A instead of the column name TXCAMP10.
I tried using EXECUTE IMMEDIATE, like execute immediate("SELECT " || column || " from WORK.temp WHERE rownumber = " || row_number), following a Stack Overflow post, but it turns out I can't use it in a function.
How do I achieve the required result?
I don't think you can achieve this with a UDF in standard SQL in BigQuery.
But it is possible with a stored procedure in BigQuery and the EXECUTE IMMEDIATE statement. Consider this code, which simulates the situation you have:
create or replace table d1.temp(
  c1 int64,
  c2 int64
);
insert into d1.temp values (1, 1), (2, 2);

create or replace procedure d1.GET_VALUE(column STRING, row_number int64, out result int64)
BEGIN
  -- the column name is spliced into the SQL text; the row number is bound as a parameter
  EXECUTE IMMEDIATE 'SELECT ' || column || ' from d1.temp where c2 = ?' into result using row_number;
END;

BEGIN
  DECLARE result_c1 INT64;
  call d1.GET_VALUE("c1", 1, result_c1);
  select result_c1;
END;
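The final block should select 1 into result_c1, i.e. the c1 value of the row whose c2 equals 1.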
After some research and trial and error, I used this workaround to solve the issue. It may not be the best solution when you have too many columns, but it surely works.
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column_name STRING, row_number int64) AS (
  (SELECT case
            when column_name = 'a' then a
            when column_name = 'b' then b
            when column_name = 'c' then c
            when column_name = 'd' then d
            when column_name = 'e' then e
          end
   from WORK.temp WHERE rownumber = row_number)
);
And this gives the required results.
Point to note: the columns you use in the CASE statement must all be of the same datatype, else it won't work.
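For instance, assuming WORK.temp has same-type columns a through e, the call mirrors the one in the question:

select METADATA.GET_VALUE('a', 149);  -- returns column a's value from the row where rownumber = 149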

Remove empty commas in the string in sql server

I want to remove blank comma values (leading, trailing, and consecutive empty commas) in the table below. My table contains these values:
x
---
,1,2,3,
4,5,6
7,8,,,9
,10,11
I want the below result.
O/P
---
1,2,3
4,5,6
7,8,9
10,11
CREATE TYPE TableVariable AS TABLE
(
  id int identity(1,1),
  field_ids INT,
  value VARCHAR(MAX)
)
Inserting values into this type:
DECLARE @DepartmentTVP AS TableVariable;
insert into @DepartmentTVP values (1994, ',85574,,,85538,');
When I select from this type, I want to remove these commas (first, last, and consecutive commas in the value).
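A minimal sketch of one way to do this, assuming SQL Server 2017+ (for TRIM(... FROM ...)) and that the values never contain the characters < or >:

SELECT id, field_ids, value,
       -- ',' -> '<>' : turn every comma into a two-character token
       -- '><' -> ''  : delete the boundaries between consecutive tokens, collapsing runs
       -- '<>' -> ',' : turn the surviving tokens back into single commas
       TRIM(',' FROM
            REPLACE(REPLACE(REPLACE(value, ',', '<>'), '><', ''), '<>', ',')) AS cleaned
FROM @DepartmentTVP;

On older versions, the outer TRIM can be emulated with SUBSTRING after checking for leading and trailing commas.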

regular expression to solve for the following

Example 1
asdk[wovkd'vk'psacxu5=205478499|205477661zamd;amd;a;d
Example 2
sadlmdlmdadsldu5=205478499|205477661|234567899amsd/samdamd
u5 can have multiple values separated by |
How can I capture all u5 values from a long string I have?
Below is for BigQuery Standard SQL
#standardSQL
WITH data AS (
  SELECT 1 AS id, "asdk[wovkd'vk'psacxu5=205478499|205477661zamd;amd;a;d" AS junk UNION ALL
  SELECT 2, "sadlmdlmdadsldu5=205478499|205477661|234567899amsd/samdamd"
)
-- grab the run of digits and pipes right after "u5=", then split it on '|'
SELECT id, SPLIT(REGEXP_EXTRACT(junk, r'(?i)u5=([\d|]*)'), '|') AS value
FROM data
with output as below:
id  value
1   205478499
    205477661
2   205478499
    205477661
    234567899
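If one row per u5 value is preferred over an array, a small variation unnests the split result (same extraction, different shape):

SELECT id, value
FROM data, UNNEST(SPLIT(REGEXP_EXTRACT(junk, r'(?i)u5=([\d|]*)'), '|')) AS value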

The result being cast to double in Pig but is still being ordered as a string

I encountered the following problem.
First, my data is a string that looks like this:
decimals, decimals
example: 1.345, 3.456
I used the following Pig script to split this column, say QQ, into two columns:
result = FOREACH old_table GENERATE FLATTEN(STRSPLIT(QQ, ',')) as (COL1: double, COL2: double);
Then, I want to order it by first field, then second field.
result_ordered = ORDER result BY COL1, COL2;
However, I got a result like the following:
> 59.619198977071434 -151.4586740547339
> 60.52611316847121 -150.8005347076273
> 64.8310014577408 -147.84786488835852
> 7.059652849999997 125.59985130999996
which implies that my data is still being ordered as a string and not as a double. Has anyone encountered this issue, and does anyone know how to solve it? Thank you in advance!
I'm not sure why STRSPLIT is returning a chararray even though you explicitly state they are doubles. But if you look at http://pig.apache.org/docs/r0.10.0/basic.html#arithmetic, notice that chararrays can't be multiplied by 1.0 into doubles, but bytearrays can. Therefore you can do something like:
-- declare the split fields as bytearrays, then promote them to doubles via arithmetic
result = FOREACH old_table
         GENERATE FLATTEN(STRSPLIT(QQ, ',')) AS (COL1: bytearray, COL2: bytearray);
B = FOREACH result GENERATE 1.0 * COL1 AS COL1, 1.0 * COL2 AS COL2;
result_ordered = ORDER B BY COL1, COL2;
Which gives me the correct output of:
result_ordered: {COL1: double,COL2: double}
(7.059652849999997,125.59985130999996)
(59.619198977071434,-151.4586740547339)
(60.52611316847121,-150.8005347076273)
(64.8310014577408,-147.84786488835852)
Instead of assigning the output of FLATTEN to a schema with two doubles, try actually casting the fields with (double). It may be that Pig only uses the COL1: double schema syntax for schema checking, but requires an explicit cast to convert the types during execution.
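A minimal sketch of that explicit-cast variant (assuming your Pig version supports chararray-to-double casts; otherwise fall back to the 1.0 * bytearray trick above):

-- split without forcing a numeric schema, then cast each field explicitly
split_data = FOREACH old_table GENERATE FLATTEN(STRSPLIT(QQ, ',')) AS (COL1: chararray, COL2: chararray);
result = FOREACH split_data GENERATE (double) COL1 AS COL1, (double) COL2 AS COL2;
result_ordered = ORDER result BY COL1, COL2;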