Google BigQuery, convert legacy SQL to standard SQL - regex

I'm trying to convert a legacy SQL query to standard SQL:
SELECT * FROM
TABLE_QUERY([prod-chap_out],'REGEXP_MATCH(table_id, r"OUT\d+$")')
This query works fine in legacy SQL; however, its result can't be converted to a JSON response when used in APIs. I would rather serialize this into JSON than have to work with a bunch of data tables and convert values.
How can this be converted to standard SQL?
I've tried
REGEXP_CONTAINS(table_id, r"OUT\d+$")
but I get the error \d is an illegal character.

You can use the wildcard * in your FROM and the resulting pseudo column _table_suffix in your WHERE:
SELECT
*
FROM
`<project-id>.<dataset-id>.<table-prefix>*`
WHERE
REGEXP_CONTAINS(_table_suffix, r"OUT\d+$")
I'm not entirely sure what your table names look like - here is the official documentation on transitioning to standard SQL: https://cloud.google.com/bigquery/docs/reference/standard-sql/wildcard-table-reference#the_table_query_function
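For example, assuming prod-chap_out is the dataset that was passed to TABLE_QUERY and <project-id> stands in for your project, an empty table prefix makes _table_suffix match the full table name:
SELECT
*
FROM
`<project-id>.prod-chap_out.*`
WHERE
REGEXP_CONTAINS(_table_suffix, r"OUT\d+$")
If the "\d is an illegal character" error comes back when the query is sent through the API, the backslash most likely needs escaping as \\d in the JSON request body; the pattern itself is valid standard SQL.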

Related

Failure while converting text column to binary data type [duplicate]

In my application I am using a PostgreSQL database table with a "text" column to store
pickled Python objects.
As the database driver I'm using psycopg2, and until now I only passed Python strings (not unicode objects) to the DB and retrieved strings from the DB. This basically worked fine until I recently decided to handle strings the better/correct way and added the following construct to my DB layer:
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
This basically works fine everywhere in my application and I'm using unicode-objects where possible now.
But this special case, the text column containing the pickled objects, causes trouble. I got it working in my test system this way:
retrieving the data:
SELECT data::bytea, params FROM mytable
writing the data:
execute("UPDATE mytable SET data=%s", (psycopg2.Binary(cPickle.dumps(x)),) )
... but unfortunately I'm getting errors with the SELECT for some columns in the production-system:
psycopg2.DataError: invalid input syntax for type bytea
This error also happens when I try to run the query in the psql shell.
Basically I'm planning to convert the column from "text" to "bytea", but the error
above also prevents me from doing this conversion.
As far as I can see, (when retrieving the column as pure python string) there are only characters with ord(c)<=127 in the string.
The problem is that casting text to bytea doesn't mean "take the bytes in the string and assemble them as a bytea value"; instead, the string is interpreted as an escaped input value to the bytea type. So that won't work, mainly because pickle data contains lots of backslashes, which bytea interprets specially.
Try this instead:
SELECT convert_to(data, 'LATIN1') ...
This converts the string into a byte sequence (bytea value) in the LATIN1 encoding. For you, the exact encoding doesn't matter, because it's all ASCII (but there is no ASCII encoding).
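That also unblocks the planned conversion of the column itself; a minimal sketch, assuming the table and column names from the question and that the stored strings are pure ASCII as observed:
ALTER TABLE mytable ALTER COLUMN data TYPE bytea USING convert_to(data, 'LATIN1');
After the conversion, new pickles can be written with psycopg2.Binary as above and read back without any cast.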

Azure Web Job bad encoding downloaded data from data lake store

Currently I just want to download files from Data Lake Store and store the data in my SQL database, but I have a problem with strings that should contain characters like (ę, ą, ć, ł) being replaced by (e, a, c, l). I've tried changing the culture information and the encoding in StreamReader, but it doesn't give me any better result (I still get replaced characters in my string values). So is there any workaround, or any place where I can globally set encoding parameters for my app service and the web jobs included in the web app service?
The issue is not related to the WebJob. We can read any special character from one place and write it to another, because the reads and writes work at the byte level.
strings that should contain characters like (ę, ą, ć, ł) being replaced by (e, a, c, l)
What column type did you define for the special characters in your SQL Server? If the column type is char or varchar, it will lose data when you store special characters. Changing the column type to nchar or nvarchar will solve this issue.
Here is the test from my side.
Step 1, Define a table using the following SQL statement.
CREATE TABLE [dbo].[mytable]
(
[id] INT NOT NULL PRIMARY KEY,
[text1] varchar(50),
[text2] nvarchar(50)
)
Step 2, Insert a row using the following SQL statement (the N prefix marks the second literal as Unicode, so it reaches the nvarchar column intact).
insert into mytable (id, text1, text2) values(1, 'ę, ą, ć, ł', N'ę, ą, ć, ł')
Step 3, Query data from mytable using the following SQL statement.
select * from dbo.mytable
Here is the result I got: text1 came back as 'e, a, c, l' while text2 kept 'ę, ą, ć, ł'. The value of text1 was changed because the column type is varchar.
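If the table already exists, a sketch of the corresponding fix for the affected column (length kept as defined above); note that rows which have already lost their characters cannot be recovered by the type change:
ALTER TABLE dbo.mytable ALTER COLUMN text1 nvarchar(50);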

QSqlQuery using with indexes

I have my own data store mechanism for storing data, but I want to implement a standard data manipulation and query interface for end users, so I thought Qt SQL would suit my case.
But I still cannot understand how to involve my indexes in an SQL query.
Say, for example,
I have a table with columns A(int), B(int), C(int), D(int) and column A is indexed. Assume I execute a query like select * from Foo where A = 10;
How do I involve my index in searching for the results?
You have written your own storage system and want to manipulate it using an SQL-like syntax? I don't think Qt SQL is the right tool for that job. It offers connectivity to various SQL servers and is not meant for parsing SQL statements. Qt expects to "pass through" the queries and then somehow parse the result set and transform it into a Qt-friendly representation.
So if you only want a Qt-friendly representation, I don't see a reason to go through the indirection of SQL.
But regarding your problem:
In SQL, indexes are usually not stated in the queries but during the creation of the table schema. But SQL Server has the possibility to "hint" indexes; is that what you are looking for?
SELECT column_list FROM table_name WITH (INDEX (index_name) [, ...]);
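To make the distinction concrete, a minimal sketch (index and table names are illustrative): the index is declared once in the schema and the optimizer uses it automatically; the hint only forces the choice on SQL Server.
-- Declared once, at schema level; queries never mention it
CREATE INDEX idx_foo_a ON Foo (A);
-- The optimizer will normally use idx_foo_a here on its own
SELECT * FROM Foo WHERE A = 10;
-- SQL Server-specific hint, forcing the index explicitly
SELECT * FROM Foo WITH (INDEX (idx_foo_a)) WHERE A = 10;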

Concat strings in SQL Server and Oracle with the same unmodified query

I have a program that must support both Oracle and SQL Server for its database.
At some point I must execute a query where I want to concatenate 2 columns in the select statement.
In SQL Server, this is done with the + operator
select column1 + ' - ' + column2 from mytable
And in Oracle this is done with concat
select concat(concat(column1, ' - '), column2) from mytable
I'm looking for a way to leverage them both, so my code has a single SQL query literal string for both databases and I can avoid ugly constructs where I need to check which DBMS I'm connected to.
My first instinct was to encapsulate the different queries into a stored procedure, so each DBMS can have their own implementation of the query, but I was unable to create the procedure in Oracle that returns the record set in the same way as SQL Server does.
Update: Creating a concat function in SQL Server doesn't make the query compatible with Oracle, because SQL Server requires the owner to be specified when calling the function, as in:
select dbo.concat(dbo.concat(column1, ' - '), column2) from mytable
It took me a while to figure it out after creating my own concat function in SQL Server.
On the other hand, it looks like an Oracle function returning a SYS_REFCURSOR can't be called with a simple
exec myfunction
and return the table as SQL Server does.
In the end, the solution was to create a view with the same name on both RDBMs but with different implementations, then I can do a simple select on the view.
If you want to go down the path of creating a stored procedure, whatever framework you're using should be able to more or less transparently handle an Oracle stored procedure with an OUT parameter that is a SYS_REFCURSOR and call that as you would a SQL Server stored procedure that just does a SELECT statement.
CREATE OR REPLACE PROCEDURE some_procedure( p_rc OUT sys_refcursor )
AS
BEGIN
-- You could use the CONCAT function rather than Oracle's string concatenation
-- operator || but I would prefer the double pipes.
OPEN p_rc
FOR SELECT column1 || ' - ' || column2
FROM myTable;
END;
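A hedged sketch of how that ref cursor comes back, here via SQL*Plus bind variables (client libraries expose the same OUT parameter as an ordinary result set):
-- SQL*Plus: bind a ref cursor, call the procedure, print the rows
VARIABLE rc REFCURSOR
EXEC some_procedure(:rc)
PRINT rc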
Alternatively, you could define your own CONCAT function in SQL Server.
Nope, sorry.
As you've noted, string concatenation is implemented in SQL Server with + and in Oracle with concat or ||.
I would avoid doing nasty string manipulation in stored procedures and simply create your own concatenation function in one instance or the other, so that both accept the same syntax. Probably in SQL Server, so you can use concat.
The alternative is to pass + or || depending on what RDBMS you're connected to.
Apparently in SQL Server 2012 they have included a CONCAT() function:
http://msdn.microsoft.com/en-us/library/hh231515.aspx
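If SQL Server 2012 or later is acceptable, the nested two-argument form already shown for Oracle becomes the single portable query, since both engines accept CONCAT with exactly two arguments:
select concat(concat(column1, ' - '), column2) from mytable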
If you are trying to create a database-agnostic application, you should either:
Stick to very basic SQL and do anything like this in your application, or
Create different abstractions for different databases. If you hope to get any kind of scale out of your application, this is the path you'll likely need to take.
I wouldn't go down the stored procedure path. You can probably get it to work, but next week you'll find out you need to support "database X", and then you'll need to rewrite your stored proc in that database as well. It's a recipe for pain.

Using a single ADO Query to copy data from a text file into another ODBC source

This may seem an odd question, as I have a solution; I just don't understand why it works, and that limits me.
I am copying data from various sources into SQL databases using an ADO connection in C++ Builder XE2.
When the data is from MSAccess or MSExcel the code is similar to the following:
//SetupADO..
ADOConn->ConnectionString="Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:/temp/testdb.mdb";
//Then open it..
ADOConn->Connected = true;
//Build SQL
UnicodeString sSQL = "SELECT * INTO [ODBC;DSN=PostgreSQL30;DATABASE=admin_db;SERVER=192.168.1.10;PORT=5432;UID=user1;PWD=pass1;SSLmode=disable;ReadOnly=0;Protocol=7.4;].[table1] FROM [accesstb]";
//And finally I use the Execute() function of the ADO connection
ADOConn->Execute(sSQL, iRA, TExecuteOptions() << TExecuteOption::eoExecuteNoRecords);
This works fine for Excel too, but not for CSV files. I'm using the same driver but can only get it working by changing the syntax around.
//SetupADO..
ADOConn->ConnectionString="Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\\temp;Extended Properties=\"Text;HDR=Yes;\";Persist Security Info=False";
//Then open it..
ADOConn->Connected = true;
//Build SQL with the IN keyword and start internal ODBC connection with 2 single quotes
UnicodeString sSQL = "SELECT * INTO [table1] IN '' [ODBC;DSN=PostgreSQL30;DATABASE=admin_db;SERVER=192.168.1.10;PORT=5432;UID=user1;PWD=pass1;SSLmode=disable;ReadOnly=0;Protocol=7.4;] FROM [test.csv]";
//And finally Execute() again
ADOConn->Execute(sSQL, iRA, TExecuteOptions() << TExecuteOption::eoExecuteNoRecords);
When using the same SQL as the Access query, the error "Query input must contain at least one table or query" was returned.
Interestingly, one escaped quote, i.e. \', fails when used in place of the two single ones. I have also tried writing to another Access database in case the problem was with PG, but I had the same results.
Can someone tell me why the IN keyword is required and what the single quotes do?
Extended Properties=\"Text;HDR=Yes;\" specifies text as the datasource, so the connection string is different. IN '' is Jet's external-database clause: the empty string is the (here unused) database path, and the bracketed ODBC connect string that follows identifies the destination database, so table1 is created on the ODBC (PostgreSQL) side instead of inside the text datasource.
References
Importing CSV Data and saving it in database - CodeProject