regex for CREATE TABLE PARTITIONED BY DDL

regex for CREATE TABLE PARTITIONED BY DDL - regex

I need a regex that matches to both of these strings:
CREATE TEMPORARY TABLE db.table (cols)USING parquet PARTITIONED BY (DATA2, DATA3)
CREATE TABLE db.table (cols)USING parquet
The closest I've got is this:
CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\).*?USING.*?(PARTITIONED BY \((?P<pcol>.*?)\))
But that doesn't match to the second string. I've tried using a ? on the end but that didn't help. basically I've been playing around with this for hours now and can't figure it out, so I'm resorting to SO.
I've set up a demo of this here: https://regex101.com/r/ffSVuD/1 If anyone feels game enough to try and solve it, be my guest!

I ended up using CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\).*?USING +([^\s]+) *(PARTITIONED BY \((?P<pcol>.*?)\))? to match both your examples.
Basically, I replaced USING.*? by USING +(\[^\s\]+) *, so that you don't end up with a .*? before your last group.
Finally, I added a ? after your last group to make it optional.

Related

How to Keep rows of multi-line cells containing a keyword in google sheets

I'm trying to keep lines that contain the word "NOA" in a column A which has many multi-line cells as can be viewed in this Google Spreadsheet.
If "NOA" is present then, I would like to keep the line. The input and output should look like the image which I have "working" with too-many helper cells. Can this be combined into a single formula?
Theoretical Approaches:
I have been thinking about three approaches to solve this:
ARRAYFORMULA(REGEXREPLACE - couldn't get it to work
JOIN(FILTER(REGEXMATCH(TRANSPOSE - showing promise as it works in multiple steps
Using the QUERY Function - unfamiliar w/ function but wondering if this function has a fast solution
Practical attempts:
FIRST APPROACH: first I attempted using REGEXEXTRACT to extract out everything that did not have NOA in it, the Regex worked in demo but didn't work properly in sheets. I thought this might be a concise way to get the value, perhaps if my REGEX skill was better?
ARRAYFORMULA(REGEXREPLACE(A1:A7, "^(?:[^N\n]|N(?:[^O\n]|O(?:[^A\n]|$)|$)|$)+",""))
I think the Regex because overly complex, didn't work in Google or perhaps the formula could be improved, but because Google RE2 has limitations it makes it harder to do certain things.
SECOND APPROACH:
Then I came up with an alternate approach which seems to work 2 stages (with multiple helper cells) but I would like to do this with one equation.
=TRANSPOSE(split(A2,CHAR(10)))
=TEXTJOIN(CHAR(10),1,FILTER(C2:C7,REGEXMATCH(C2:C7,"NOA")))
Questions:
Can these formulas be combined and applied to the entire Column using an Index or Array?
Or perhaps, the REGEX in my first approach can be modified?
Is there a faster solution using Query?
The shared Google spreadhseet is here.
Thank you in advance for your help.

Here's one way you can do that:
=index(substitute(substitute(transpose(trim(
query(substitute(transpose(if(regexmatch(split(
filter(A2:A,A2:A<>""),char(10)),"NOA"),split(
filter(A2:A,A2:A<>""),char(10)),))," ","❄️")
,,9^9)))," ",char(10)),"❄️"," "))
First, we split the data by the newline (char 10), then we filter out the lines that don't contain NOA and finally we use a "query smush" to join everything back together.

Teradata REGEX or SUBSTR to remove the text between two *'s and the asterisk?

I'm working in teradata with a dataset that has several occurrences of data in the following format:
*6A*H.ORTHO I
*4A*IMP
*16A*T.IMPLANTS
*2A*HIMPLANTS
*9A*IMP
*5A*F.IMPLANT
*6A*DIMP
*4A*TISSUE
*5A*KIMP
*7A*IMP
*10A*D.IMP
*3A*W.LSH
*10A*IMP
*16A*IMP
*22A*T.IMPLANTS
In the dataset above I'm attempting to extract everything after the second occurrence of an asterick. I.E. D.IMP, IMP, T.IMPLANTS, F.IMPLANT, etc..
I've attempted to use SUBSTR and came close using:
SUBSTR(TRIM(FSS.Surgical_Inventory_Code),1,
INDEX(TRIM(FSS.Surgical_Inventory_Code),'*')-1)
But, that only returns the data after the first *.
I believe the best solution to solve problem would be using a REGEX expression or SUBSTR. There is a function in teradata called REGEXP_SUBSTR. I'm not exactly sure how to create a REGEX statement to solve my problem.

If you only ever have 2 asterisks in your string, you can use STRTOK:
strtok(<source string>,'*',2)

Use multiple replace conditions for a single column in Amazon Redshift

I have a table where the amount column has , and $ sign for example: $8,122.14 as values. I want to write a replace function to replace $ and , over that column in one go. Is there any way we can write multiple conditions in one replace in Redshift? Also, this is apart of post processing the data where I am inserting data from stage table to a final table after replacing these values.
I tried the ways listed in the take 1 and 2 given in the code but both of them failed.
Take 1:
insert into db.stage_table
select
(coalesce(replace(logging_amount,'$',','),''))) as logging_amount
from db.table;
Take 2:
insert into db.stage_table
select
(coalesce(replace(logging_amount,'$',',')) as logging_amount
from db.table;
Both of them failed.
The expected result should be replace function in a single statement.

Yes you can nest replace statements like this
replace(replace(logging_amount,'$',''),',','')
Or you can use regex if you prefer (personally for something like this i think nested replaces are easier to read.)

Split multiple hive queries if query contains a semicolon

I am trying to split multiple hive queries in files, and loop over them and run them using scala/spark. I am using .split(";"). But it is creating a problem when the query itself has a semicolon.
select * from table where value='myName\;is\;Name';
select * from table;
How can I escape the semicolon in the first query and split the above into 2 separate queries in scala

Let check this pattern:
.split("(?<!\\\\);")
In Java, it return a correct output, but I am not sure it work for you on Scala.
The pattern mean: Find the ; with not \ before.
You can find "Negative look behind" regex for more detail.

Using a RegEx in a SQL Query

Here's the situation I'm in: We have a field in our database that contains a 3 digit number, surrounded by some text. This number is actually a PK in another table, and I need to extract this out so I can implement a proper FK relationship. Here's an example of what would currently reside in the column:
Some Text Goes Here - (305) Followed By Some More Text
So, what I'm looking to do is extract the '305' from the column, and hopefully end up with a result that looks something like this (pseudo code)
SELECT
<My Extracted Value>,
Original Column Text,
Id
FROM dbo.MyTable
It seems to me that using a Regex match in my query is the most effective way to do this. Can anybody point me in the right direction?
EDIT: We're using SQL Server 2005

RegExp in SQL is defined by a SQL-Standard but most databases implemented their own syntax, you should tell us the product name of your RDBMS ;)

This is based on Pranay's first answer that has since been changed.
DECLARE #NumStr varchar(1000)
SET #NumStr = 'Some Text Goes Here - (305) Followed By Some More Text';
SELECT SUBSTRING(#NumStr,PATINDEX('%[0-9][0-9][0-9]%',#NumStr),3)
Returns 305

Microsoft seems to suggest using a CLR assembly to do Regex pattern matching in SQL Server 2005.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Apart from LIKE (which is not going to solve your problem) I don't know of a built-in pattern matching functionality in SQL Server 2005 (that is, more advanced than simple string searches).

Just after I implemented a solution in Postgres, I see you are using SqlServer... Just for the records, then, with a regex that extracts data in parenthesis.
Postgresql solution:
create table main(id text not null)
insert into main values('some text (44) other text');
insert into main values('and more text (78) and even more');
select substring(id from '\\(([^\\(]+)\\)') from main

The only way to access RegEx-type functions in SQL 2005 (and probably 2008) is by writing (or downloading) and using CLR functions.
If all the strings are always formatted in such a way as you can identify the specific numbers you want, you can do something like the following. This is based on the (big) assumption that the first set of parenthesis found in the string contains the number that you want.
/*
CREATE TABLE MyTable
(
MyText varchar(500) not null
)
INSERT MyTable values ('Some Text Goes Here - (305) Followed By Some More Text')
*/
SELECT
MyText -- String
,charindex('(', MyText) -- Where's the open parenthesis
,charindex(')', MyText) -- Where's the closed parenthesis
,substring(MyText
,charindex('(', MyText) + 1, charindex(')'
,MyText) - charindex('(', MyText) - 1) -- Glom it all together
from MyTable
Awkward as heck (because SQL has a pathetically limited set of string manipulation functions), but it works.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

regex for CREATE TABLE PARTITIONED BY DDL - regex

Related

How to Keep rows of multi-line cells containing a keyword in google sheets

Teradata REGEX or SUBSTR to remove the text between two *'s and the asterisk?

Use multiple replace conditions for a single column in Amazon Redshift

Split multiple hive queries if query contains a semicolon

Using a RegEx in a SQL Query

Categories

Resources