Use multiple replace conditions for a single column in Amazon Redshift - amazon-web-services

I have a table where the amount column contains commas and dollar signs, for example $8,122.14. I want to write a replace function that strips both $ and , from that column in one go. Is there any way to apply multiple conditions in one replace in Redshift? This is part of post-processing the data: I am inserting rows from a stage table into a final table after replacing these values.
I tried the two approaches shown below as Take 1 and Take 2, but both failed.
Take 1:
insert into db.stage_table
select
(coalesce(replace(logging_amount,'$',','),''))) as logging_amount
from db.table;
Take 2:
insert into db.stage_table
select
(coalesce(replace(logging_amount,'$',',')) as logging_amount
from db.table;
The expected result is a single statement that performs both replacements.

Yes, you can nest replace calls like this:
replace(replace(logging_amount,'$',''),',','')
Or you can use a regex if you prefer (personally, for something like this I think nested replaces are easier to read).
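To compare the two approaches outside the database, here is a minimal Python sketch of the same cleanup. It is only an illustration of the logic, not Redshift code; the function names strip_nested and strip_regex are made up for this example.

```python
import re

def strip_nested(amount: str) -> str:
    # Mirrors replace(replace(amount, '$', ''), ',', ''): one call per character.
    return amount.replace("$", "").replace(",", "")

def strip_regex(amount: str) -> str:
    # A character class removes both characters in a single pass.
    return re.sub(r"[$,]", "", amount)

print(strip_nested("$8,122.14"))
print(strip_regex("$8,122.14"))
```

Both functions produce the same string for the sample value, which is why the choice between nesting and a regex is mostly about readability.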

Related

How to use Calculated Join in Tableau using 1-M relationship and regex_extract

Question 1 - Can Tableau use multiple results from a single line in a REGEXP, using the global flag, to compare against another table during a join operation? If no, question 2 is moot. If yes...
Question 2 - I'm attempting to join two data sources in Tableau using a regexp in a calculated join, because the left table has 1 value in each cell (e.g. 64826) and the right table has 4 possible matches in each cell (e.g. 00000|00000|21678|64826).
The problem is that my regex stops looking after it finds 1 match (the first of 4 values), and the global flag /g has the opposite effect from what I expected and eliminates all matches.
I've tried calculated joins on the Data Source tab. I've also tried separating those 4 values into their own columns in worksheets using regexp_extract_nth. In both cases, the regex stops looking after the first result. A Left Join seems to work somewhat, while an Outer Join returns nothing.
REGEXP_EXTRACT([Event Number],'(\d{5})')
REGEXP_EXTRACT_NTH([Event Number],'(?!0{5})(\d{5})',1)
With these examples, regex would match a NULL with the left table even though 64826 is in the right table. I expect the calculated join to return all possible matches from the right set, so there'd be a match on 21678 and on 64826, duplicating rows in the right table like so...
21678 - 00000|00000|21678|64826
64826 - 00000|00000|21678|64826
45245 - 45106|45245|00000|00000
45106 - 45106|45245|00000|00000
Your original expression looks fine. The open question is whether Tableau is receiving the pattern we intend, which I'm not sure about. As a test, try an expression similar to:
\b([^0]....)\b
and then change the calls to:
REGEXP_EXTRACT([Event Number], '\b([^0]....)\b')
or:
REGEXP_EXTRACT_NTH([Event Number], '\b([^0]....)\b', 1)
to see what happens. This assumes the desired numbers never start with 0.
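The one-to-many result the question asks for can be sketched outside Tableau. The Python below is only an illustration of the intended expansion, assuming every cell holds 5-digit codes separated by |; the names expand_codes and join_one_to_many are hypothetical, and nothing here is a Tableau function.

```python
import re

# Five digits that are not all zeros (same idea as the asker's lookahead pattern).
NONZERO_CODE = re.compile(r"(?!0{5})\d{5}")

def expand_codes(cell):
    # Return every non-zero code in the cell, not just the first match.
    return NONZERO_CODE.findall(cell)

def join_one_to_many(left_values, right_cells):
    # Emit one (code, cell) row per code that also appears in the left table.
    rows = []
    for cell in right_cells:
        for code in expand_codes(cell):
            if code in left_values:
                rows.append((code, cell))
    return rows

left = {"21678", "64826", "45245", "45106"}
right = ["00000|00000|21678|64826", "45106|45245|00000|00000"]
for row in join_one_to_many(left, right):
    print(row)
```

Each right-hand cell is duplicated once per matching code, which is the row duplication described in the question.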

Split multiple hive queries if query contains a semicolon

I am trying to split files that contain multiple Hive queries, loop over the queries, and run them using Scala/Spark. I am using .split(";"), but this breaks when a query itself contains a semicolon.
select * from table where value='myName\;is\;Name';
select * from table;
How can I escape the semicolon in the first query and split the above into 2 separate queries in Scala?
Try this pattern:
.split("(?<!\\\\);")
In Java it returns the correct output, but I have not verified that it works for you in Scala.
The pattern means: match a ; that is not preceded by a \.
Search for "negative lookbehind" in regex documentation for more detail.
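The same negative lookbehind can be tried quickly in Python, whose re module supports this syntax as well. This is a sketch for checking the pattern, not the Scala code itself; split_queries is a name invented for the example.

```python
import re

def split_queries(script):
    # Split on ';' only when it is not preceded by a backslash.
    parts = re.split(r"(?<!\\);", script)
    # Drop the empty piece after the final ';' and trim whitespace.
    return [p.strip() for p in parts if p.strip()]

script = "select * from table where value='myName\\;is\\;Name';\nselect * from table;"
for query in split_queries(script):
    print(query)
```

Note that the escaping backslashes stay in the first query after the split, so they may still need to be removed or handled before the query is executed.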

Function regex_extract in hive

I'm extracting information from logs in Hive with these expressions:
regexp_extract(values, "^(\\w{3} \\s?\\d+ \\d\\d:\\d\\d:\\d\\d \\w+-\\w+ \\w+:) (\\[)(\\d{2})(\\/)(\\w{3})(\\/)(\\d{4})(.*\\])",3)day,
regexp_extract(values, "^(\\w{3} \\s?\\d+ \\d\\d:\\d\\d:\\d\\d \\w+-\\w+ \\w+:) (\\[)(\\d{2})(\\/)(\\w{3})(\\/)(\\d{4})(.*\\])",5)month
I use the same regular expression to extract two fields in two different regexp_extract calls. Is it possible to extract more than one field while executing regexp_extract only once?
Maybe not exactly what you are looking for, but if you really want one extraction that gives you multiple fields instead of one, this is what I found:
http://dev.bizo.com/2012/01/using-genericudfs-to-return-multiple.html
Note that for this solution you need to write a UDF with object inspectors, but see for yourself.
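To show what "one extraction, several fields" means, here is a small Python sketch that runs the question's pattern once and reads two capture groups from the single match. It is not a Hive UDF; the sample log line and the function name extract_day_month are assumptions made up for the illustration.

```python
import re

# Same pattern as in the question, written as a Python raw string.
LOG_RE = re.compile(
    r"^(\w{3} \s?\d+ \d\d:\d\d:\d\d \w+-\w+ \w+:) (\[)(\d{2})(/)(\w{3})(/)(\d{4})(.*\])"
)

def extract_day_month(line):
    # Match once, then read group 3 (day) and group 5 (month) from the same match.
    m = LOG_RE.match(line)
    if m is None:
        return None
    return m.group(3), m.group(5)

# Hypothetical log line shaped to fit the pattern.
sample = "Jan  5 12:34:56 web-01 httpd: [05/Jan/2012:12:34:56 +0000] GET /"
print(extract_day_month(sample))
```

A UDF that returns a struct or array would follow the same idea: evaluate the regex once per row and expose several groups from that one match.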

Optimize join with regex

I have one table (A) with a phrase, and the other (B) is a phrase that I want to find WITHIN table A's phrase. So I'm joining them as follows:
Create table C as
SELECT A.*
FROM A
JOIN B
where (A.phrase LIKE concat("%",B.phrase,"%"));
It is taking a long time because it's only using one reducer, and I believe this has to do with the nature of the query? Is there a way of speeding this up? I don't think a mapjoin or bucketjoin would help, because I'm not equating two columns, but rather, searching within one table for words from another table...
I found the solution.
The problem was that Hive doesn't do non-equi joins well. So I did equi joins to get a subset of table A before I did the non-equi join regex. So, 3 steps:
1. Break A.phrase and B.phrase into individual words.
2. Equate these words to see which keywords from B.phrase are equal to any keywords from A.phrase. This gives a subset of table A where A.phrase contains at least one keyword from B.phrase.
3. Use this table A subset to find the whole "%B.phrase%".
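The three steps above can be sketched in plain Python to show the logic. This is an illustration under the assumption that phrases are split on whitespace, not the Hive query itself; the function name phrases_containing is hypothetical. One caveat: unlike LIKE '%...%', the word prefilter skips matches that start or end in the middle of a word.

```python
from collections import defaultdict

def phrases_containing(a_phrases, b_phrases):
    # Step 1: break A's phrases into words and index row numbers by word.
    index = defaultdict(set)
    for i, phrase in enumerate(a_phrases):
        for word in phrase.split():
            index[word].add(i)

    matches = set()
    for b in b_phrases:
        # Step 2: the "equi join" on words gives candidate rows of A.
        candidates = set()
        for word in b.split():
            candidates |= index.get(word, set())
        # Step 3: run the expensive substring test only on the candidates.
        for i in candidates:
            if b in a_phrases[i]:
                matches.add(i)
    return [a_phrases[i] for i in sorted(matches)]

a = ["the quick brown fox", "lazy dog sleeps", "quick thinking"]
b = ["quick brown", "dog"]
print(phrases_containing(a, b))
```

Collecting matches in a set also means each row of A is returned once, even if several phrases from B match it.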
I think EXISTS may be faster, simply because your original query returns the same row from A once for every matching row in B:
SELECT
A.*
FROM A as a
WHERE EXISTS (
SELECT
1
FROM B
WHERE a.phrase LIKE concat("%",phrase,"%")
);

regular expression to extract insert sql statement from a text file and to check for hardcoded parameters

I have a bunch of SQL statements updated by the developers on my team.
I intend to run a check before these statements are run against a database,
for instance to check whether a certain column is hardcoded instead of being fetched from the respective table (foreign key).
For example:
INSERT INTO [Term1] ([CreatedBy]
,[CreateUser]) values(1,'asdadad')
where 1 is a hardcoded value.
Is there a regular expression that can extract all INSERT statements from the file so that they can be parsed?
I tried this expression, http://regexlib.com/REDetails.aspx?regexp_id=1750, but it did not work.
You may need to run a multi-level regex on this. First extract the entire parameter string from the whole query, then parse each individual field out of that parameter string, ignoring any other characters that may come up.
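A minimal Python sketch of this two-level approach follows. It assumes simple INSERT ... VALUES (...) statements with no parentheses inside the column or value lists, and it treats a bare integer as "hardcoded"; both are simplifying assumptions, and the names INSERT_RE, VALUE_RE and find_hardcoded are invented for the example.

```python
import re

# Level 1: capture table name, column list and value list of each INSERT.
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+(\[?\w+\]?)\s*\((.*?)\)\s*VALUES\s*\((.*?)\)",
    re.IGNORECASE | re.DOTALL,
)
# Level 2: split the value list into quoted strings or bare tokens.
VALUE_RE = re.compile(r"\s*('(?:[^']|'')*'|[^,]+)")

def find_hardcoded(sql_text, column):
    # Report (table, column, value) where the column gets an integer literal.
    findings = []
    for table, cols, vals in INSERT_RE.findall(sql_text):
        columns = [c.strip().strip("[]") for c in cols.split(",")]
        values = [v.strip() for v in VALUE_RE.findall(vals)]
        for col, val in zip(columns, values):
            if col.lower() == column.lower() and re.fullmatch(r"-?\d+", val):
                findings.append((table.strip("[]"), col, val))
    return findings

sample = "INSERT INTO [Term1] ([CreatedBy]\n,[CreateUser]) values(1,'asdadad')"
print(find_hardcoded(sample, "CreatedBy"))
```

For statements with nested function calls, subqueries, or comments, a real SQL parser is likely to be more reliable than regular expressions.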