MYSQL ROW :
ID : 1
DATA : ,1,11,5
I want to remove only the ,1
MYSQL QUERY :
UPDATE SET DATA=TRIM(BOTH ',' FROM REPLACE(CONCAT(',', DATA, ','), ',1,', ',')) WHERE ID='1';
MYSQL RESULT : 11,5
NEED RESULT : ,11,5
I was able to get it with:
UPDATE TABLE SET DATA=TRIM(BOTH ',' FROM REPLACE(CONCAT(' ', DATA, ','), ',1,', ',')) WHERE ID=1
MYSQL RESULT : ,11,5
It's working.
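For anyone who wants to test before updating: a minimal sketch (mytable is a stand-in for the real table name) that previews the expression with a SELECT. As a suggestion, appending only a trailing comma avoids the leading space that the CONCAT(' ', ...) form leaves in the stored value, since TRIM(BOTH ',' ...) strips commas but not spaces:

SELECT DATA,
       TRIM(TRAILING ',' FROM REPLACE(CONCAT(DATA, ','), ',1,', ',')) AS new_data
FROM mytable   -- hypothetical table name
WHERE ID = 1;
-- For DATA = ',1,11,5' this returns ',11,5', keeping the leading comma intact.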
I would like to prepare a manifest file using Lambda and then execute the stored procedure, providing the input parameter manifest_location.
Stored procedure signature:
CREATE OR REPLACE PROCEDURE stage.sp_stage_user_activity_page_events(manifest_location varchar(256))
and I would like to use this parameter as follows:
COPY stage.user_activity_event
FROM manifest_location
IAM_ROLE 'arn:aws:iam::XXX:role/redshift-s3-read-only-role'
IGNOREHEADER 1
REMOVEQUOTES
DELIMITER ','
LZOP
MANIFEST;
but Redshift gives me this ERROR:
syntax error at or near "$1" Where: SQL statement in PL/PgSQL function "sp_stage_user_activity_page_events" near line 21
How can I achieve this?
Could you try the command below?
EXECUTE 'COPY '|| parameter_for_table ||' FROM '||CHR(39)|| parameter_for_s3_path ||CHR(39)||' IAM_ROLE '||CHR(39)|| parameter_for_iam_role ||CHR(39)||' FORMAT AS PARQUET;';
Please let me know if you have any more questions. I created a stored procedure like the one below today:
CREATE TABLE table_for_error_log (message varchar);
CREATE OR REPLACE procedure sp_proc_name(parameter_for_table in VARCHAR, parameter_for_s3_path in VARCHAR, parameter_for_iam_role in VARCHAR)
AS
$$
BEGIN
EXECUTE 'COPY '|| parameter_for_table ||' FROM '||CHR(39)|| parameter_for_s3_path ||CHR(39)||' IAM_ROLE '||CHR(39)|| parameter_for_iam_role ||CHR(39)||' FORMAT AS PARQUET;';
EXCEPTION WHEN OTHERS THEN
RAISE INFO 'An exception occurred.';
INSERT INTO table_for_error_log VALUES ('Error message: ' || SQLERRM);
END;
$$ LANGUAGE plpgsql;
It's tested and working; just replace the COPY command with your own.
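Applied to the original question, the procedure might look like this (a sketch, untested; the table name, IAM role, and COPY options come from the question, and the quoting uses the same CHR(39) style as above):

CREATE OR REPLACE PROCEDURE stage.sp_stage_user_activity_page_events(manifest_location varchar(256))
AS
$$
BEGIN
-- build the COPY statement dynamically so the manifest path can vary per call
EXECUTE 'COPY stage.user_activity_event FROM ' || CHR(39) || manifest_location || CHR(39)
  || ' IAM_ROLE ' || CHR(39) || 'arn:aws:iam::XXX:role/redshift-s3-read-only-role' || CHR(39)
  || ' IGNOREHEADER 1 REMOVEQUOTES DELIMITER ' || CHR(39) || ',' || CHR(39) || ' LZOP MANIFEST;';
END;
$$ LANGUAGE plpgsql;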
I don't have enough reputation to upvote the above answer, but it works!
You can also use the quote_literal function, e.g.:
'DATEFORMAT as ' || quote_literal('auto') ||
' TIMEFORMAT as ' || quote_literal('MM/DD/YYYY HH24:MI:SS') ||
etc.
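In context, those fragments would sit inside the EXECUTE string from the answer above, for example (a sketch; the DATEFORMAT and TIMEFORMAT values are just the ones from this comment):

EXECUTE 'COPY ' || parameter_for_table
  || ' FROM ' || quote_literal(parameter_for_s3_path)
  || ' IAM_ROLE ' || quote_literal(parameter_for_iam_role)
  || ' DATEFORMAT AS ' || quote_literal('auto')
  || ' TIMEFORMAT AS ' || quote_literal('MM/DD/YYYY HH24:MI:SS')
  || ';';
-- quote_literal(s) wraps s in single quotes, so the CHR(39) concatenation is not needed.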
I am following this solution for loading an external table into Impala, because I get the same error if I load the data by referring to the file directly.
So, if I run:
[quickstart.cloudera:21000] > create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
> fields terminated by ','
> STORED as TEXTFILE
> location '/user/cloudera/rdpdata/rpd_data_all.csv' ;
I get:
Query: create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
fields terminated by ','
STORED as TEXTFILE
location '/user/cloudera/rdpdata/rpd_data_all.csv'
ERROR: ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: MetaException: hdfs://quickstart.cloudera:8020/user/cloudera/rdpdata/rpd_data_all.csv is not a directory or unable to create one
and if I run the below, nothing gets imported.
[quickstart.cloudera:21000] > create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
> fields terminated by ','
> location '/user/cloudera/rdpdata' ;
Query: create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
fields terminated by ','
location '/user/cloudera/rdpdata'
Fetched 0 row(s) in 1.01s
and the content of the folder:
[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/rdpdata
Found 1 items
-rwxrwxrwx 1 cloudera cloudera 75115191 2020-09-02 19:36 /user/cloudera/rdpdata/rpd_data_all.csv
and the content of the file:
[cloudera@quickstart ~]$ hadoop fs -cat /user/cloudera/rdpdata/rpd_data_all.csv
1,EMSP,RP,RC, 03/21/2013,095454,000000,000000,101659,CANC
and a screenshot of the Cloudera QuickStart VM (image not reproduced here).
The LOCATION option in the Impala CREATE TABLE statement specifies the hdfs_path, i.e. the HDFS directory where the data files are stored. Try giving the directory location instead of the file name; that should let you use the existing data.
For your reference: https://impala.apache.org/docs/build/html/topics/impala_tables.html
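Concretely, the question's first DDL should work once LOCATION points at the directory (a sketch; note that "Fetched 0 row(s)" after such a CREATE is just the DDL result, not a failed load, so query the table afterwards to verify):

create external table Police2 (
  Priority string, Call_Type string, Jurisdiction string, Dispatch_Area string,
  Received_Date string, Received_Time int, Dispatch_Time int, Arrival_Time int,
  Cleared_Time int, Disposition string)
row format delimited fields terminated by ','
stored as textfile
location '/user/cloudera/rdpdata';
-- verify the file's rows are now visible
select count(*) from Police2;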
I have a table with a column whose values are separated by ';'. The data looks like this:
row_id  col
1       p.[D389R;D393_W394delinsRD]
2       p.[D390R;D393_W394delinsRD]
3       p.D389R
4       p.[D370R;D393_W394delinsRD]
I would like to remove the '[' and ']' brackets wherever they appear and fetch the text inside. Then I would like to split the string on ';', prepend 'p.' to each part (if it is not already there), and output each part as a new row.
The expected output is:
row_id new_col
1 p.D389R
2 p.D393_W394delinsRD
3 p.D390R
4 p.D393_W394delinsRD
5 p.D389R
6 p.D370R
7 p.D393_W394delinsRD
I have tried the query below to get the desired output.
SELECT *,
CASE
WHEN regexp_split_to_table(regexp_replace(col, '\[|\]', '', 'g'), E';') NOT LIKE 'p.[%'
THEN 'p.' || (regexp_split_to_table(regexp_replace(col, '\[|\]', '', 'g'), E';'))[1]
ELSE regexp_split_to_table(regexp_replace(col, '\[|\]', '', 'g'), E';')[2]
END AS new_col
FROM table;
Any suggestions would be really helpful.
I would first remove the constant values ( p.[ and ]) from the string and then unnest it.
with clean as (
select row_id, regexp_replace(col, '^p\.(\[){0,1}|\]$', '', 'g') as col
from the_table
)
select row_id, 'p.'|| t.c
from clean c
cross join unnest(string_to_array(c.col, ';')) as t(c)
The CTE (with ...) isn't really necessary, but that way the unnest(...) stays readable.
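If you also need fresh sequential row_ids as in the expected output, one way (a sketch) is to add WITH ORDINALITY and row_number(), so elements stay ordered within each original row:

with clean as (
  select row_id, regexp_replace(col, '^p\.(\[){0,1}|\]$', '', 'g') as col
  from the_table
)
select row_number() over (order by c.row_id, t.idx) as row_id,
       'p.' || t.c as new_col
from clean c
cross join unnest(string_to_array(c.col, ';')) with ordinality as t(c, idx);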
This is a sample row in the input data file, with two fields: dept and names.
dept,names
Mathematics,[foo,bar,alice,bob]
Here, 'names' is an array of strings, and I want to load it into Athena as an array of strings.
Any suggestions?
To have a valid CSV file, make sure you put quotes around your array:
Mathematics,"[foo,bar,alice,bob]"
If you can remove the "[" and "]", the solution below becomes even easier, and you can just split without the regex.
Better: Mathematics,"foo,bar,alice,bob"
First create a simple table from CSV with just strings:
CREATE EXTERNAL TABLE IF NOT EXISTS test.mydataset (
`dept` string,
`names` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ',',
'quoteChar' = '"',
'separatorChar' = ',',
'collection.delim' = ',',
'mapkey.delim' = ':'
) LOCATION 's3://<your location>'
TBLPROPERTIES ('has_encrypted_data'='false')
Then create a view which uses a regex to remove your '[' and ']' characters, then splits the rest by ',' into an array.
CREATE OR REPLACE VIEW mydataview AS
SELECT dept,
split(regexp_extract(names, '^\[(.*)\]$', 1), ',') as names
FROM mydataset
Then use the view for your queries. I am not 100% sure as I've only spent like 12 hours using Athena.
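To sanity-check the view, a query like this should show the parsed array (a sketch; note that array indexing in Athena/Presto is 1-based):

SELECT dept,
       names,                        -- the full parsed array
       names[1]           AS first_name,
       cardinality(names) AS name_count
FROM mydataview;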
--
Note that in order to use the quotes you need OpenCSVSerde; the 'lazyserde' won't work because it does not support quotes. lazyserde DOES support internal arrays, but you can't use ',' as the separator in that case. If you want to try that, your data would look like:
Better: Mathematics,foo|bar|alice|bob
In that case this MIGHT work directly:
CREATE EXTERNAL TABLE IF NOT EXISTS test.mydataset (
`dept` string,
`names` array<string>
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ',',
'quoteChar' = '"',
'separatorChar' = ',',
'collection.delim' = '|',
'mapkey.delim' = ':'
) LOCATION 's3://<your location>'
TBLPROPERTIES ('has_encrypted_data'='false')
Note how collection.delim = '|', which should translate your field directly to an array.
Sorry, I don't have time to test this; I'll be happy to update my answer if you can confirm what works. Hopefully this gets you started.
When trying to use a Script Argument in the sqlActivity:
{
"id" : "ActivityId_3zboU",
"schedule" : { "ref" : "DefaultSchedule" },
"scriptUri" : "s3://location_of_script/unload.sql",
"name" : "unload",
"runsOn" : { "ref" : "Ec2Instance" },
"scriptArgument" : [ "'s3://location_of_unload/#{format(minusDays(#scheduledStartTime,1),'YYYY/MM/dd/hhmm/')}'", "'aws_access_key_id=????;aws_secret_access_key=*******'" ],
"type" : "SqlActivity",
"dependsOn" : { "ref" : "ActivityId_YY69k" },
"database" : { "ref" : "RedshiftCluster" }
}
where the unload.sql script contains:
unload ('
select *
from tbl1
')
to ?
credentials ?
delimiter ',' GZIP;
or :
unload ('
select *
from tbl1
')
to ?::VARCHAR(255)
credentials ?::VARCHAR(255)
delimiter ',' GZIP;
The process fails with:
syntax error at or near "$1" Position
Any idea what I'm doing wrong?
This is the script that works fine from the psql shell:
insert into tempsdf select * from source where source.id = '123';
Here are some of my tests on SqlActivity using Data Pipeline:
Test 1 : Using ?'s
insert into mytable select * from source where source.id = ?; - works fine when used via both the 'script' and 'scriptUri' options on the SqlActivity object,
where "ScriptArgument" : "123"
Here the ? can replace the value of the condition, but not the condition itself.
Test 2 : Using parameters - works when the command is specified using the 'script' option only
insert into #{myTable} select * from source where source.id = ?; - works fine if used via the 'script' option only
insert into #{myTable} select * from source where source.id = #{myId}; - works fine if used via the 'script' option only
where #{myTable} and #{myId} are parameters whose values can be declared in the template (see the sketch after the note below).
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-templates.html
(when you are using only parameters, make sure you delete any unused
scriptArguments; otherwise it will still throw an error)
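Declared in the pipeline definition file, such parameters would look roughly like this (a sketch; the ids, types, and defaults are illustrative):

{
  "parameters" : [
    { "id" : "myTable", "type" : "String", "default" : "tempsdf" },
    { "id" : "myId", "type" : "String", "default" : "123" }
  ]
}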
FAILED TESTS and inferences:
insert into ? select * from source where source.id = ?;
insert into ? select * from source where source.id = '123';
Both of the above commands do not work, because table names cannot be used as placeholders for script arguments; ?'s can only be used to pass values for a comparison condition and column values.
insert into #{myTable} select * from source where source.id = #{myId}; - doesn't work if used via 'scriptUri'
insert into tempsdf select * from source where source.id = #{myId}; - does not work when used via 'scriptUri'
The above two commands do not work because
parameters cannot be evaluated if the script is stored in S3.
insert into tempsdf select * from source where source.id = $1; - doesn't work with 'scriptUri'
insert into tempsdf values ($1,$2,$3); - does not work.
Using $'s does not work in any combination.
Other tests :
"ScriptArgument" : "123"
"ScriptArgument" : "456"
"ScriptArgument" : "789"
insert into tempsdf values (?,?,?); - works via both 'script' and 'scriptUri' and translates to insert into tempsdf values ('123','456','789');
scriptArguments are applied in the order you list them and replace the ?'s in the script.
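Put together, that last test corresponds to a SqlActivity roughly like this (a sketch; the id and refs are illustrative):

{
  "id" : "ActivityId_example",
  "type" : "SqlActivity",
  "script" : "insert into tempsdf values (?,?,?);",
  "scriptArgument" : [ "123", "456", "789" ],
  "database" : { "ref" : "RedshiftCluster" }
}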
In a ShellCommandActivity, we specify two scriptArguments that can be accessed as $1 and $2 in the shell script (.sh):
"scriptArgument" : "'s3://location_of_unload/#{format(minusDays(#scheduledStartTime,1),'YYYY/MM/dd/hhmm/')}'" # can be accessed using $1
"scriptArgument" : "'aws_access_key_id=????;aws_secret_access_key=*******'" # can be accessed using $2
I don't know whether this will work for you.
I believe you are using this SqlActivity with Redshift. Can you modify your SQL script to refer to the parameters using positional notation?
To refer to the parameters in the SQL statement itself, use $1, $2, etc.
See http://www.postgresql.org/docs/9.1/static/sql-prepare.html
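For reference, this is what positional parameters look like in the linked PostgreSQL PREPARE docs (a sketch; whether SqlActivity binds its scriptArguments this way is exactly what the failed tests above call into question):

-- declare a statement with one positional text parameter
PREPARE myquery (text) AS
  select * from source where source.id = $1;
-- bind the value at execution time
EXECUTE myquery('123');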