Split multiple hive queries if query contains a semicolon - regex

I am trying to split multiple Hive queries stored in files, loop over them, and run them using Scala/Spark. I am using .split(";"), but this causes a problem when the query itself contains a semicolon.
select * from table where value='myName\;is\;Name';
select * from table;
How can I escape the semicolon in the first query and split the above into 2 separate queries in Scala?

Try this pattern:
.split("(?<!\\\\);")
In Java this returns the correct output, but I am not sure whether it works the same way in Scala.
The pattern means: match a ; that is not preceded by a \.
Search for "negative lookbehind" for more detail on this regex feature.
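As a rough illustration of the same negative lookbehind outside Java, here is a minimal Python sketch. The helper name split_queries is hypothetical, and the script text is the two-query example from the question; it only shows how the pattern behaves and is not the Scala code itself.

```python
import re

# Split on ';' only when it is NOT preceded by a backslash (negative lookbehind).
UNESCAPED_SEMICOLON = re.compile(r"(?<!\\);")


def split_queries(script):
    """Return the individual queries in `script`, dropping empty pieces."""
    pieces = UNESCAPED_SEMICOLON.split(script)
    return [piece.strip() for piece in pieces if piece.strip()]


script = r"""select * from table where value='myName\;is\;Name';
select * from table;"""

queries = split_queries(script)
for query in queries:
    print(query)
```

Note that the escaped \; sequences are left untouched inside the first query, so the backslashes may still need to be removed before the query is executed.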

Related

Use multiple replace conditions for a single column in Amazon Redshift

I have a table where the amount column contains , and $ signs, for example $8,122.14. I want to write a replace function that removes both $ and , from that column in one go. Is there any way to write multiple conditions in one replace in Redshift? This is a part of post-processing the data, where I insert data from a stage table into a final table after replacing these values.
I tried the two approaches shown as Take 1 and Take 2 below, but both of them failed.
Take 1:
insert into db.stage_table
select
(coalesce(replace(logging_amount,'$',','),''))) as logging_amount
from db.table;
Take 2:
insert into db.stage_table
select
(coalesce(replace(logging_amount,'$',',')) as logging_amount
from db.table;
Both of them failed.
The expected result is a single statement that performs both replacements.
Yes, you can nest replace calls like this:
replace(replace(logging_amount,'$',''),',','')
Or you can use a regex if you prefer (personally, for something like this I think nested replaces are easier to read).
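To make the regex alternative concrete, here is a small Python sketch of the idea: a character class removes both symbols in one pass. The helper name strip_currency is hypothetical, and this is Python rather than Redshift SQL; a regex replace function in Redshift would presumably take a similar character class, but that is an assumption not verified here.

```python
import re


def strip_currency(amount):
    """Remove every '$' and ',' in a single pass using a character class."""
    return re.sub(r"[$,]", "", amount)


# Same result as the nested-replace approach, written both ways for comparison.
cleaned = strip_currency("$8,122.14")
nested = "$8,122.14".replace("$", "").replace(",", "")
print(cleaned, nested)
```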

regex for CREATE TABLE PARTITIONED BY DDL

I need a regex that matches to both of these strings:
CREATE TEMPORARY TABLE db.table (cols)USING parquet PARTITIONED BY (DATA2, DATA3)
CREATE TABLE db.table (cols)USING parquet
The closest I've got is this:
CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\).*?USING.*?(PARTITIONED BY \((?P<pcol>.*?)\))
But that doesn't match the second string. I've tried adding a ? at the end, but that didn't help. I've been playing around with this for hours now and can't figure it out, so I'm resorting to SO.
I've set up a demo of this here: https://regex101.com/r/ffSVuD/1 If anyone feels game enough to try and solve it, be my guest!
I ended up using CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\).*?USING +([^\s]+) *(PARTITIONED BY \((?P<pcol>.*?)\))? to match both your examples.
Basically, I replaced USING.*? with USING +([^\s]+) *, so that you don't end up with a .*? before your last group.
Finally, I added a ? after your last group to make it optional.
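A quick way to check the final pattern against both example strings is a short Python script (the named-group syntax (?P<name>...) is Python's). The variable names below are only illustrative.

```python
import re

# The pattern from the answer, split across two lines for readability.
DDL_PATTERN = re.compile(
    r"CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\)"
    r".*?USING +([^\s]+) *(PARTITIONED BY \((?P<pcol>.*?)\))?"
)

with_partition = DDL_PATTERN.match(
    "CREATE TEMPORARY TABLE db.table (cols)USING parquet PARTITIONED BY (DATA2, DATA3)"
)
without_partition = DDL_PATTERN.match("CREATE TABLE db.table (cols)USING parquet")

for match in (with_partition, without_partition):
    # The 'table' group keeps the space before '(', so strip it for display.
    print(match.group("db"), match.group("table").strip(), match.group("pcol"))
```

The pcol group is None for the second string, because the optional last group simply does not participate in the match.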

Function regexp_extract in Hive

I'm extracting information from logs in Hive with these expressions:
regexp_extract(values, "^(\\w{3} \\s?\\d+ \\d\\d:\\d\\d:\\d\\d \\w+-\\w+ \\w+:) (\\[)(\\d{2})(\\/)(\\w{3})(\\/)(\\d{4})(.*\\])",3)day,
regexp_extract(values, "^(\\w{3} \\s?\\d+ \\d\\d:\\d\\d:\\d\\d \\w+-\\w+ \\w+:) (\\[)(\\d{2})(\\/)(\\w{3})(\\/)(\\d{4})(.*\\])",5)month
I use the same regular expression to extract two fields in two different regexp_extract calls. Is it possible to extract more than one field while executing regexp_extract only once?
Maybe not exactly what you are looking for, but if you really want a single extraction that gives you multiple fields instead of one, this is what I found:
http://dev.bizo.com/2012/01/using-genericudfs-to-return-multiple.html
Note that for this solution you need to write a UDF with object inspectors, but see for yourself.
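To illustrate what "one extraction, multiple fields" means, here is a Python sketch (not Hive, and not the UDF from the link) that applies the question's pattern once and reads several capture groups from the single match. The function name and the sample log line are hypothetical, made up only so that the pattern has something to match.

```python
import re

# Same pattern as in the question, with Hive's doubled backslashes un-doubled.
LOG_PATTERN = re.compile(
    r"^(\w{3} \s?\d+ \d\d:\d\d:\d\d \w+-\w+ \w+:) "
    r"(\[)(\d{2})(\/)(\w{3})(\/)(\d{4})(.*\])"
)


def extract_date_fields(line):
    """Match once and return several capture groups, or None if no match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {"day": match.group(3), "month": match.group(5), "year": match.group(7)}


# Hypothetical log line shaped to fit the pattern.
sample = "Jan 12 10:15:30 web-01 httpd: [12/Jan/2012:10:15:30 +0000]"
fields = extract_date_fields(sample)
print(fields)
```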

Searching for unknown characters in an Oracle database

Is there a way to search an Oracle database (some sort of regex I suspect) to find unknown characters (which often appear as □ □)?
There is no standard way to search an entire Oracle database. You would need a tedious script that walks over the various types of Oracle objects in dba_objects and then descends into each one (as a trivial example, if an object is a table you need to go through its columns, and if a column contains character data, apply REGEXP_LIKE; but there are more types of objects, for example a package: do you want to search the package's literals too?). I would instead manually write an explicit list of queries over tables and columns.
Try something like this:
select col1, ...
from tab1
where col1 like '%'||chr(9)||'%' -- ASCII code for tab
or col1 like '%'||chr(10)||'%' -- ASCII code for newline (line feed)
--...
;
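If the column values are fetched into a client program instead (an assumption; the answer itself stays in SQL), the same kind of check can be sketched in Python. The function name suspect_characters is hypothetical, and treating control characters plus the Unicode replacement character U+FFFD as "unknown" is only one possible definition.

```python
import unicodedata


def suspect_characters(value):
    """Return (index, code point) pairs for control characters and U+FFFD."""
    found = []
    for index, ch in enumerate(value):
        # Category "Cc" covers control characters such as tab and line feed.
        if unicodedata.category(ch) == "Cc" or ch == "\ufffd":
            found.append((index, ord(ch)))
    return found


hits = suspect_characters("abc\tdef\nghi")
print(hits)
```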

Using a RegEx in a SQL Query

Here's the situation I'm in: We have a field in our database that contains a 3 digit number, surrounded by some text. This number is actually a PK in another table, and I need to extract this out so I can implement a proper FK relationship. Here's an example of what would currently reside in the column:
Some Text Goes Here - (305) Followed By Some More Text
So, what I'm looking to do is extract the '305' from the column, and hopefully end up with a result that looks something like this (pseudo code)
SELECT
<My Extracted Value>,
Original Column Text,
Id
FROM dbo.MyTable
It seems to me that using a Regex match in my query is the most effective way to do this. Can anybody point me in the right direction?
EDIT: We're using SQL Server 2005
RegExp in SQL is defined by the SQL standard, but most databases implement their own syntax, so you should tell us the product name of your RDBMS ;)
This is based on Pranay's first answer that has since been changed.
DECLARE @NumStr varchar(1000)
SET @NumStr = 'Some Text Goes Here - (305) Followed By Some More Text';
SELECT SUBSTRING(@NumStr,PATINDEX('%[0-9][0-9][0-9]%',@NumStr),3)
Returns 305
Microsoft seems to suggest using a CLR assembly to do Regex pattern matching in SQL Server 2005.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Apart from LIKE (which is not going to solve your problem) I don't know of a built-in pattern matching functionality in SQL Server 2005 (that is, more advanced than simple string searches).
Just after I implemented a solution in Postgres, I saw that you are using SQL Server... Just for the record, then, here it is with a regex that extracts the data in parentheses.
PostgreSQL solution:
create table main(id text not null);
insert into main values('some text (44) other text');
insert into main values('and more text (78) and even more');
select substring(id from '\\(([^\\(]+)\\)') from main
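For comparison, the same parenthesis-based extraction can be sketched in Python. The function name extract_code is hypothetical, and this version is deliberately stricter than the Postgres pattern above: it only accepts exactly three digits between the parentheses, as described in the question.

```python
import re

# Exactly three digits enclosed in parentheses; the digits are group 1.
PAREN_NUMBER = re.compile(r"\((\d{3})\)")


def extract_code(text):
    """Return the three-digit number in parentheses, or None if absent."""
    match = PAREN_NUMBER.search(text)
    return match.group(1) if match else None


code = extract_code("Some Text Goes Here - (305) Followed By Some More Text")
print(code)
```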
The only way to access RegEx-type functions in SQL Server 2005 (and probably 2008) is by writing (or downloading) and using CLR functions.
If all the strings are always formatted in such a way that you can identify the specific numbers you want, you can do something like the following. This is based on the (big) assumption that the first set of parentheses found in the string contains the number that you want.
/*
CREATE TABLE MyTable
(
MyText varchar(500) not null
)
INSERT MyTable values ('Some Text Goes Here - (305) Followed By Some More Text')
*/
SELECT
  MyText -- String
  ,charindex('(', MyText) -- Where's the open parenthesis
  ,charindex(')', MyText) -- Where's the closed parenthesis
  ,substring(MyText
    ,charindex('(', MyText) + 1
    ,charindex(')', MyText) - charindex('(', MyText) - 1) -- Glom it all together
from MyTable
Awkward as heck (because SQL has a pathetically limited set of string manipulation functions), but it works.
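The index arithmetic in that query (start one past the open parenthesis, take everything up to the close parenthesis) may be easier to follow in a short Python sketch. The function name between_first_parens is hypothetical, and unlike the SQL above it looks for the closing parenthesis only after the opening one.

```python
def between_first_parens(text):
    """Return the text inside the first '(' ... ')' pair, or None if missing."""
    open_at = text.find("(")                 # like charindex('(', MyText), but 0-based
    close_at = text.find(")", open_at + 1)   # first ')' after the '('
    if open_at == -1 or close_at == -1:
        return None
    # Start one past '(' and stop just before ')', mirroring the +1 / -1 in the SQL.
    return text[open_at + 1:close_at]


value = between_first_parens("Some Text Goes Here - (305) Followed By Some More Text")
print(value)
```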