BigQuery - substr() cannot be negative - google-cloud-platform

I migrated from Hive to BigQuery.
But when I run the query, I get this message: Third argument in SUBSTR() cannot be negative.
Substr(variable, instr(variable, 'a')+2, instr(variable, 'f') - instr(variable, 'a') - 3)

Don't forget the powerful regex string functions (extract and substring). Here is an example:
with input as (select "14526a utfsd f azd" as data)
select REGEXP_SUBSTR(data,"[a-z]{5}") from input
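The length expression in the question goes negative whenever 'f' is missing or does not come far enough after 'a'. The following Python sketch mirrors that arithmetic to show the failure case, and clamps the length at zero, which is roughly what wrapping the third argument in something like GREATEST(..., 0) would do in SQL. The helper names (instr, substr_length, safe_substr) are hypothetical and exist only for illustration.

```python
import re

def instr(s: str, sub: str) -> int:
    """1-based position of sub in s, or 0 if absent (mirrors SQL INSTR)."""
    return s.find(sub) + 1

def substr_length(s: str) -> int:
    """Third SUBSTR argument from the question: INSTR('f') - INSTR('a') - 3."""
    return instr(s, "f") - instr(s, "a") - 3

def safe_substr(s: str) -> str:
    """Same extraction, but with the length clamped so it is never negative."""
    start = instr(s, "a") + 2            # 1-based start position
    length = max(0, substr_length(s))    # clamp negative lengths to zero
    return s[start - 1 : start - 1 + length]

data = "14526a utfsd f azd"
print(substr_length(data))     # length when 'f' comes after 'a'
print(substr_length("abc"))    # negative, because 'f' is missing
print(repr(safe_substr("abc")))
# Regex alternative from the answer above, expressed with Python's re module
print(re.search(r"[a-z]{5}", data).group())
```

Whether an empty string is the right fallback depends on what the original Hive query was expected to return for such rows.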

Substitute for Function STUFF (SQL Server) in AWS redshift

I have to replace the first 3 digits of a column with a fixed 3-digit prefix (123).
This SQL Server code works, but it does not work on AWS Redshift.
Code:
Select
Stuff (ColName,1,3,'123')as NewColName
From DataBase.dbo.TableName
Example 1: input 8010001802000000000209092396, output 1230001802000000000209092396
Example 2: input 555209092396, output 123209092396
It should replace the first 3 digits with 123 irrespective of the value's length.
Please advise on anything that is supported in AWS Redshift.
I am still trying with substring and replace.
I see that AWS RedShift was based on an old version of Postgres, and I looked up the SUBSTRING function for you (https://docs.aws.amazon.com/redshift/latest/dg/r_SUBSTRING.html), which is pretty forgiving of its argument values.
In this sample in Transact-SQL, and as documented for RedShift, the third argument of SUBSTRING can be much longer than the actual strings without causing an error. In Transact-SQL, even the second argument is "forgiving" if it starts after the end of the actual string:
;WITH RawData AS
(SELECT * FROM (VALUES ('8010001802000000000209092396'),
                       ('555209092396'),
                       ('AB')
               ) AS X(InputString)
)
SELECT InputString, '123' + SUBSTRING(InputString, 4, 1000) AS OutputString
FROM RawData
InputString                   OutputString
8010001802000000000209092396  1230001802000000000209092396
555209092396                  123209092396
AB                            123
As it appears that the concatenation operator in Redshift is ||, I think your expression will be very close to:
'123' || SUBSTRING(InputString, 4, 1000)
Got this and it worked
--Using Substring and concat
Select
cast('123'+substring(ColName,4,LEN(ColName)) as numeric (28)) as NewColName
From DataBase.dbo.TableName
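For readers who want to check the expected results outside the database, here is a minimal Python sketch of the same "fixed prefix plus everything from the fourth character on" idea. The function name replace_prefix and its parameters are hypothetical; the sample values are the ones from the question and the answer above.

```python
def replace_prefix(value: str, prefix: str = "123", width: int = 3) -> str:
    """Swap the first `width` characters of `value` for `prefix`.

    Like '123' || SUBSTRING(value, 4, 1000), a value shorter than `width`
    simply yields the prefix on its own.
    """
    return prefix + value[width:]

for sample in ("8010001802000000000209092396", "555209092396", "AB"):
    print(sample, "->", replace_prefix(sample))
```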

Regexp expression from Oracle SQL to Big Query

I previously got help here with a regular expression in Oracle SQL, which worked great. However, our organization is converting to BigQuery, and the regexp no longer seems to work.
In my tables, I have the following data:
WC 12/10 change FC from 24 to 32
W/C 12/10 change fc from 401 to 340
W/C12/10 18-26
This Oracle SQL split the comment up to give me the before number (24), the after number (32), and the date (12/10).
cast(REGEXP_SUBSTR(Line_Comment, '((\d+ |\d+)(change )?(- |-|to |to|too|too )(\d+))', 1, 1, 'i',2) as Int) as Before,
cast(REGEXP_SUBSTR(Line_Comment, '((\d+ |\d+)(change )?(- |-|to |to|too|too )(\d+))', 1, 1, 'i', 5) as Int) as After,
REGEXP_SUBSTR(Line_Comment, '((\d+)(\/|-|.| )(\d+)(\/|-|.| )(\d+))|(\d+)(\/|-|.| )(\d+)', 1, 1, 'i') as WC_Date,
I fully understand that the comments are not consistent and that this may not always work, but if it works more than 80% of the time, which it has, then we are fine with this.
Since moving to BigQuery, I'm getting the error message below. In Oracle, the columns were VARCHAR, but after the migration to BigQuery they are STRING. Could this be the reason why it's broken? Can anyone help with this? This is way over my head.
No matching signature for function REGEXP_SUBSTR for argument types:
STRING, STRING, INT64, INT64, STRING, INT64. Supported signatures:
REGEXP_SUBSTR(STRING, STRING, [INT64], [INT64]); REGEXP_SUBSTR(BYTES,
BYTES, [INT64], [INT64]) at [69:12]
Since Google BigQuery's REGEXP_SUBSTR doesn't support the subexpr parameter of Oracle's REGEXP_SUBSTR, you need to modify your regexes to take advantage of the fact that:
If the regular expression contains a capturing group, the function returns the substring that is matched by that capturing group.
So for each value you are trying to extract, you need to make that the only capturing group in the regex. The patterns are written as raw strings (r'...') so that BigQuery does not reject escapes such as \d, and the date pattern uses no capturing group at all, because BigQuery returns an error for a pattern with more than one:
cast(REGEXP_SUBSTR(Line_Comment, r'(?:(\d+ |\d+)(?:change )?(?:- |-|to |to|too|too )(?:\d+))', 1, 1) as Int) as Before,
cast(REGEXP_SUBSTR(Line_Comment, r'(?:(?:\d+ |\d+)(?:change )?(?:- |-|to |to|too|too )(\d+))', 1, 1) as Int) as After,
REGEXP_SUBSTR(Line_Comment, r'(?:\d+)(?:\/|-|.| )(?:\d+)(?:\/|-|.| )(?:\d+)|(?:\d+)(?:\/|-|.| )(?:\d+)', 1, 1) as WC_Date,
Note you can substantially simplify your regexes as below:
(\d+) ?(?:change )?(?:-|too?) ?(?:\d+)
(?:\d+) ?(?:change )?(?:-|too?) ?(\d+)
(?:\d+)(?:[\/.-](?:\d+)){1,2}
Regex demos on regex101: numbers, date
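As a quick sanity check of the three simplified patterns, here is a small Python sketch that runs them against the sample comments from the question. It uses Python's re module rather than BigQuery's RE2 engine, so treat it as an approximation; the re.IGNORECASE flag stands in for the Oracle 'i' option, and the names BEFORE, AFTER, WC_DATE and parse are hypothetical.

```python
import re

# The simplified patterns from above; exactly one capturing group where a value is extracted
BEFORE = re.compile(r"(\d+) ?(?:change )?(?:-|too?) ?(?:\d+)", re.IGNORECASE)
AFTER = re.compile(r"(?:\d+) ?(?:change )?(?:-|too?) ?(\d+)", re.IGNORECASE)
WC_DATE = re.compile(r"(?:\d+)(?:[\/.-](?:\d+)){1,2}")

def parse(comment: str):
    """Return (before, after, wc_date); each item is None when its pattern does not match."""
    before = BEFORE.search(comment)
    after = AFTER.search(comment)
    date = WC_DATE.search(comment)
    return (
        int(before.group(1)) if before else None,
        int(after.group(1)) if after else None,
        date.group(0) if date else None,
    )

comments = [
    "WC 12/10 change FC from 24 to 32",
    "W/C 12/10 change fc from 401 to 340",
    "W/C12/10 18-26",
]
for c in comments:
    print(c, "->", parse(c))
```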
Based on the sample data you provided in the comment section, you can try the query below:
with t1 as (
  select 'WC 12/10 change FC from 24 to 32' as Comment
  union all select 'W/C 12/10 change fc from 401 to 340' as Comment
  union all select 'W/C12/10 18-26' as Comment
)
select Comment,
  regexp_extract(t1.Comment, r'(\d+\/\d+)') as WC,
  regexp_extract(t1.Comment, r'.+\s(\d{1,3})[\s|\-]') as Before,
  regexp_extract(t1.Comment, r'.+[\sto\s|\-](\d{1,3})$') as After
from t1
Consider the super simple approach below:
select Comment,
format('%s/%s', arr[offset(0)], arr[safe_offset(1)]) as wc,
arr[safe_offset(2)] as before,
arr[safe_offset(3)] as after
from your_table, unnest([struct(regexp_extract_all(Comment, r'\d+') as arr)])
Applied to the sample data in your question, this returns 12/10 as wc for all three rows, with before/after values of 24/32, 401/340, and 18/26.
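The "extract every number, then pick by position" idea can be sketched in Python as follows. This is an illustration only: re.findall plays the role of regexp_extract_all, the hypothetical helper safe mimics safe_offset by returning None for a missing position, and the wc handling for short arrays is a simplification rather than a reproduction of BigQuery's behavior.

```python
import re

def split_comment(comment: str) -> dict:
    """Pull all digit runs out of the comment and assign them by position."""
    arr = re.findall(r"\d+", comment)

    def safe(i: int):
        # Like safe_offset: None instead of an error when the index is out of range
        return arr[i] if i < len(arr) else None

    return {
        "wc": f"{arr[0]}/{arr[1]}" if len(arr) >= 2 else None,
        "before": safe(2),
        "after": safe(3),
    }

for c in ("WC 12/10 change FC from 24 to 32",
          "W/C 12/10 change fc from 401 to 340",
          "W/C12/10 18-26"):
    print(split_comment(c))
```

The approach relies on the numbers always appearing in the same order, so a comment with an extra number before the date would shift every field.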

sqlite valid email input [duplicate]

I'd like to use a regular expression in sqlite, but I don't know how.
My table has got a column with strings like this: "3,12,13,14,19,28,32"
Now if I type "where x LIKE '3'" I also get the rows which contain values like 13 or 32,
but I'd like to get only the rows which have exactly the value 3 in that string.
Does anyone know how to solve this?
As others pointed out already, REGEXP calls a user-defined function which must first be defined and loaded into the database. Maybe some SQLite distributions or GUI tools include it by default, but my Ubuntu install did not. The solution was
sudo apt-get install sqlite3-pcre
which implements Perl regular expressions in a loadable module in /usr/lib/sqlite3/pcre.so
To be able to use it, you have to load it each time you open the database:
.load /usr/lib/sqlite3/pcre.so
Or you could put that line into your ~/.sqliterc.
Now you can query like this:
SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';
If you want to query directly from the command-line, you can use the -cmd switch to load the library before your SQL:
sqlite3 "$filename" -cmd ".load /usr/lib/sqlite3/pcre.so" "SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';"
If you are on Windows, I guess a similar .dll file should be available somewhere.
SQLite3 supports the REGEXP operator:
WHERE x REGEXP <regex>
http://www.sqlite.org/lang_expr.html#regexp
A hacky way to solve it without regex is where ',' || x || ',' like '%,3,%'
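To see the delimiter-wrapping trick in action, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The table name t and the sample rows are made up for illustration; the WHERE clause is the one from the answer above, with the searched value passed as a parameter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?)",
    [("3,12,13",), ("12,13,3",), ("12,3,13",), ("3",), ("13,32",), ("1,23,30",)],
)

# Wrap the column in commas so every value, including the first and last,
# is delimited on both sides; then look for ',3,'.
rows = [
    r[0]
    for r in con.execute(
        "SELECT x FROM t WHERE ',' || x || ',' LIKE '%,' || ? || ',%' ORDER BY rowid",
        ("3",),
    )
]
print(rows)
con.close()
```

No extension or user-defined function is needed for this, at the cost of a scan that cannot use an index on x.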
SQLite does not contain regular expression functionality by default.
It defines a REGEXP operator, but this will fail with an error message unless you or your framework define a user function called regexp(). How you do this will depend on your platform.
If you have a regexp() function defined, you can match an arbitrary integer from a comma-separated list like so:
... WHERE your_column REGEXP "\b" || your_integer || "\b";
But really, it looks like you would find things a whole lot easier if you normalised your database structure by replacing those groups within a single column with a separate row for each number in the comma-separated list. Then you could not only use the = operator instead of a regular expression, but also use more powerful relational tools like joins that SQL provides for you.
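A rough sketch of what that normalisation could look like, again with Python's sqlite3 module: each number from the comma-separated list becomes its own row in a child table, and the lookup turns into a plain equality test plus a join. The table and column names (item, item_number, and so on) are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_number (item_id INTEGER REFERENCES item(id), number INTEGER);
""")

# Original denormalised data: one comma-separated string per row
raw = {1: "3,12,13,14,19,28,32", 2: "13,32"}
for item_id, csv in raw.items():
    con.execute("INSERT INTO item (id, name) VALUES (?, ?)", (item_id, f"row {item_id}"))
    con.executemany(
        "INSERT INTO item_number (item_id, number) VALUES (?, ?)",
        [(item_id, int(n)) for n in csv.split(",")],
    )

# Exact match on the number, joined back to the parent row
names = [
    r[0]
    for r in con.execute(
        "SELECT i.name FROM item AS i "
        "JOIN item_number AS n ON n.item_id = i.id "
        "WHERE n.number = 3 ORDER BY i.id"
    )
]
print(names)
con.close()
```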
A SQLite UDF in PHP/PDO for the REGEXP keyword that mimics the behavior in MySQL:
$pdo->sqliteCreateFunction('regexp',
    function ($pattern, $data, $delimiter = '~', $modifiers = 'isuS')
    {
        if (isset($pattern, $data) === true)
        {
            return (preg_match(sprintf('%1$s%2$s%1$s%3$s', $delimiter, $pattern, $modifiers), $data) > 0);
        }
        return null;
    }
);
The u modifier is not implemented in MySQL, but I find it useful to have it by default. Examples:
SELECT * FROM "table" WHERE "name" REGEXP 'sql(ite)*';
SELECT * FROM "table" WHERE regexp('sql(ite)*', "name", '#', 's');
If either $data or $pattern is NULL, the result is NULL - just like in MySQL.
My solution in Python with sqlite3:
import sqlite3
import re

def match(expr, item):
    # True when the pattern matches at the start of item
    return re.match(expr, item) is not None

conn = sqlite3.connect(':memory:')
conn.create_function("MATCHES", 2, match)
cursor = conn.cursor()
cursor.execute("SELECT MATCHES('^b', 'busy');")
print(cursor.fetchone()[0])
cursor.close()
conn.close()
If the regex matches, the output is 1; otherwise it is 0.
With python, assuming con is the connection to SQLite, you can define the required UDF by writing:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
Here is a more complete example:
import re
import sqlite3
with sqlite3.connect(":memory:") as con:
    con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
    cursor = con.cursor()
    # ...
    cursor.execute("SELECT * from person WHERE surname REGEXP '^A' ")
I don't know whether it is good to answer a question that was posted almost a year ago, but I am writing this for those who think that SQLite itself provides the REGEXP function.
One basic requirement for invoking the REGEXP function in SQLite is:
"You should create your own function in the application and then provide the callback link to the sqlite driver".
For that you have to use sqlite3_create_function (C interface). You can find the details here and here.
An exhaustive or'ed where clause can do it without string concatenation:
WHERE ( x == '3' OR
x LIKE '%,3' OR
x LIKE '3,%' OR
x LIKE '%,3,%');
Includes the four cases exact match, end of list, beginning of list, and mid list.
This is more verbose, but it doesn't require the regex extension.
UPDATE TableName
SET YourField = ''
WHERE YourField REGEXP 'YOUR REGEX'
And :
SELECT * from TableName
WHERE YourField REGEXP 'YOUR REGEX'
SQLite version 3.36.0, released 2021-06-18, now has the REGEXP operator built in, but for CLI builds only.
Consider using this
WHERE x REGEXP '(^|,)(3)(,|$)'
This will match exactly 3 when x is in:
3
3,12,13
12,13,3
12,3,13
Other examples:
WHERE x REGEXP '(^|,)(3|13)(,|$)'
This will match on 3 or 13
You may consider also
WHERE x REGEXP '(^|\D{1})3(\D{1}|$)'
This finds the number 3 at any position in any string, as long as it is not part of a longer number.
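The three patterns can be tried out quickly with Python's re module before wiring them into a REGEXP function. This is only a sketch: the sample strings are made up, and Python's regex engine may differ in small ways from whichever engine backs REGEXP in a given SQLite setup.

```python
import re

exact_3 = re.compile(r"(^|,)(3)(,|$)")               # 3 as a whole list item
three_or_13 = re.compile(r"(^|,)(3|13)(,|$)")        # 3 or 13 as a whole list item
non_digit_bounded = re.compile(r"(^|\D{1})3(\D{1}|$)")  # 3 not touching another digit

samples = ["3", "3,12,13", "12,13,3", "12,3,13", "13,32", "[1,2,3]"]

hits_exact = [s for s in samples if exact_3.search(s)]
hits_3_or_13 = [s for s in samples if three_or_13.search(s)]
hits_bounded = [s for s in samples if non_digit_bounded.search(s)]

print(hits_exact)
print(hits_3_or_13)
print(hits_bounded)
```

The last pattern is the only one of the three that also accepts lists wrapped in brackets, since it allows any non-digit on either side of the 3.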
You could use a regular expression with REGEXP, but that is a silly way to do an exact match.
You should just say WHERE x = '3'.
If you are using PHP, you can add any function to your SQL statement by using SQLite3::createFunction.
In PDO you can use PDO::sqliteCreateFunction and implement the preg_match function within your statement.
See how it's done by Havalite (RegExp in SqLite using Php).
In case someone is looking for a non-regex condition for Android SQLite with a string like [1,2,3,4,5]: don't forget to add the brackets ([]), and likewise any other special characters such as braces ({}), to @phyatt's condition:
WHERE ( x == '[3]' OR
x LIKE '%,3]' OR
x LIKE '[3,%' OR
x LIKE '%,3,%');
You can use the sqlean-regexp extension, which provides regexp search and replace functions.
Based on the PCRE2 engine, this extension supports all major regular expression features. It also supports Unicode. The extension is available for Windows, Linux, and macOS.
Some usage examples:
-- select messages containing number 3
select * from messages
where msg_text regexp '\b3\b';
-- count messages containing digits
select count(*) from messages
where msg_text regexp '\d+';
-- 42
select regexp_like('Meet me at 10:30', '\d+:\d+');
-- 1
select regexp_substr('Meet me at 10:30', '\d+:\d+');
-- 10:30
select regexp_replace('password = "123456"', '"[^"]+"', '***');
-- password = ***
In Julia, the model to follow can be illustrated as follows:
using SQLite
using DataFrames
db = SQLite.DB("<name>.db")
register(db, SQLite.regexp, nargs=2, name="regexp")
SQLite.Query(db, "SELECT * FROM test WHERE name REGEXP '^h';") |> DataFrame
For Rails:
db = ActiveRecord::Base.connection.raw_connection
db.create_function('regexp', 2) do |func, pattern, expression|
  func.result = expression.to_s.match(Regexp.new(pattern.to_s, Regexp::IGNORECASE)) ? 1 : 0
end

How to get more than one matched keyword using regexp_matches

How can I get more than one matched keyword from a given string?
Please see the query below.
SELECT regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING');
output: "{BAKERY}"
In the above scenario, the given string matches two keywords,
but when I execute the query I get only one keyword.
How do I get the other matched keywords?
g is the global search flag in regex. It is used to get all the matching strings:
select regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING','g')
regexp_matches
text[]
--------------
{BAKERY}
{BAKING}
to get the result as a single row :
SELECT ARRAY(select array_to_string(regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING','g'),''));
array
text[]
---------------
{BAKERY,BAKING}
by using unnest - to convert the array returned to a table
select unnest(regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING','g'))
unnest
text
------
BAKERY
BAKING
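The difference between the default behavior and the 'g' flag is the same as the difference between "first match" and "all matches" in most regex libraries. As a rough analogy (not PostgreSQL itself), Python's re module shows it like this:

```python
import re

text = "bakerybaking".upper()
pattern = re.compile(r"BAKERY|BAKING")

# Without a global search only the first match is reported,
# comparable to regexp_matches(...) without the 'g' flag.
first_only = pattern.search(text).group()

# findall scans the whole string, comparable to the 'g' flag.
all_matches = pattern.findall(text)

print(first_only)
print(all_matches)
```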
According to: http://www.postgresql.org/docs/9.5/static/functions-string.html
SELECT regexp_matches(UPPER('bakerybaking'),'(BAKERY)(BAKING)');
Output:
regexp_matches
-----------------
{BAKERY,BAKING}
(1 row)
Oh the humanity. Please thank me.
--https://stackoverflow.com/questions/52178844/get-second-match-from-regexp-matches-results
--https://stackoverflow.com/questions/24274394/postgresql-8-2-how-to-get-a-string-representation-of-any-array
CREATE OR REPLACE FUNCTION aaa(anyarray,Integer, text)
RETURNS SETOF text
LANGUAGE plpgsql
AS $function$
DECLARE s $1%type;
BEGIN
FOREACH s SLICE 1 IN ARRAY $1[$2:$2] LOOP
RETURN NEXT array_to_string(s,$3);
END LOOP;
RETURN;
END;
$function$;
--SELECT aaa((ARRAY(SELECT unnest(regexp_matches('=If(If(E_Reports_# >=1, DMT(E_Date_R1_#, DateShift),0)', '(\w+_#)|([0-9]+)','g'))::TEXT)),1,',')
--select (array[1,2,3,4,5,6])[2:5];
SELECT aaa(array_remove(Array(SELECT unnest(regexp_matches('=If(If(E_Reports_# >=1, DMT(E_Date_R1_#, DateShift),0)', '(\w+_#)|([0-9]+)','g'))::TEXT), Null),3,',')

SQL and regular expression to check if string is a substring of larger string?

I have a database filled with some codes like
EE789323
990
78000
These numbers are ALWAYS endings of a larger code. Now I have a function that needs to check if the larger code contains the subcode.
So if I have codes 90 and 990 and my full code is EX888990, it should match both of them.
However I need to do it in the following way:
SELECT * FROM tableWithRecordsWithSubcode
WHERE subcode MATCHES [reg exp with full code];
Is a regular expression like this even possible?
EDIT:
To clarify the issue I'm having, I'm not using SQL here. I just used that to give an example of the type of query I'm using.
In fact I'm using iOS with CoreData, and I need a predicate to fetch me only the records that match.
In the way that is mentioned below.
Given the observations from a comment:
Do you have two tables, one called tableWithRecordsWithSubcode and another that might be tableWithFullCodeColumn? So the matching condition is in part a join - you need to know which subcodes match any of the full codes in the second table? But you're only interested in the information in the tableWithRecordsWithSubcode table, not in which rows it matches in the other table?
and the laconic "you're correct" response, then we have to rewrite the query somewhat.
SELECT DISTINCT S.*
FROM tableWithRecordsWithSubcode AS S
JOIN tableWithFullCodeColumn AS F
ON F.Fullcode ...ends-with... S.Subcode
or maybe using an EXISTS sub-query:
SELECT S.*
FROM tableWithRecordsWithSubcode AS S
WHERE EXISTS(SELECT * FROM tableWithFullCodeColumn AS F
WHERE F.Fullcode ...ends-with... S.Subcode)
This uses a correlated sub-query but avoids the DISTINCT operation; it may mean the optimizer can work more efficiently.
That just leaves the magical 'X ...ends-with... T' operator to be defined. One possible way to do that is with LENGTH and SUBSTR. However, SUBSTR does not behave the same way in all DBMS, so you may have to tinker with this (possibly adding a third argument, LENGTH(s.subcode)). With the common 1-based SUBSTR, the suffix starts at the length difference plus 1:
LENGTH(f.fullcode) >= LENGTH(s.subcode) AND
SUBSTR(f.fullcode, LENGTH(f.fullcode) - LENGTH(s.subcode) + 1) = s.subcode
This leads to two possible formulations:
SELECT DISTINCT S.*
FROM tableWithRecordsWithSubcode AS S
JOIN tableWithFullCodeColumn AS F
ON LENGTH(F.Fullcode) >= LENGTH(S.Subcode)
AND SUBSTR(F.Fullcode, LENGTH(F.Fullcode) - LENGTH(S.Subcode) + 1) = S.Subcode;
and
SELECT S.*
FROM tableWithRecordsWithSubcode AS S
WHERE EXISTS(
SELECT * FROM tableWithFullCodeColumn AS F
WHERE LENGTH(F.Fullcode) >= LENGTH(S.Subcode)
AND SUBSTR(F.Fullcode, LENGTH(F.Fullcode) - LENGTH(S.Subcode) + 1) = S.Subcode);
This is not going to be a fast operation; joins on computed results such as required by this query seldom are.
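Here is a small runnable sketch of the EXISTS formulation, using Python's sqlite3 module with an in-memory database. It assumes SQLite's 1-based SUBSTR, so the suffix starts at the length difference plus 1; other DBMS may need adjusting. The table names follow the answer, while the sample rows are the codes from the question plus the subcode 90.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableWithRecordsWithSubcode (subcode TEXT);
    CREATE TABLE tableWithFullCodeColumn (fullcode TEXT);
    INSERT INTO tableWithRecordsWithSubcode VALUES ('EE789323'), ('990'), ('78000'), ('90');
    INSERT INTO tableWithFullCodeColumn VALUES ('EX888990');
""")

# A subcode matches when it equals the trailing characters of some full code.
query = """
    SELECT S.subcode
    FROM tableWithRecordsWithSubcode AS S
    WHERE EXISTS (
        SELECT 1 FROM tableWithFullCodeColumn AS F
        WHERE LENGTH(F.fullcode) >= LENGTH(S.subcode)
          AND SUBSTR(F.fullcode, LENGTH(F.fullcode) - LENGTH(S.subcode) + 1) = S.subcode)
    ORDER BY S.subcode
"""
matches = [row[0] for row in con.execute(query)]
print(matches)
con.close()
```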
I'm not sure why you think that you need a regular expression... Just use the charindex function, which takes the string to find first and the string to search second:
select something
from table
where charindex(subcode, code) <> 0
Edit:
To find strings at the end, you can create a pattern with the % wildcard from the subcode:
select something
from table
where code like '%' + subcode