BigQuery struct introspection - google-cloud-platform

Is there a way to get the element types of a struct? For example something along the lines of:
SELECT #TYPE(structField.y)
SELECT #TYPE(structField)
...etc
Is that possible to do? The closest I can find is via the query editor and the web call it makes to validate a query:

As I already mentioned in the comments, one option is to mimic that same Dry Run call with a query built so that it fails with an error message containing the type information you are looking for. This assumes your use case can be implemented in whatever scripting language you prefer; it should be relatively easy to do.
Meanwhile, I was looking for a way to do this within the SQL query itself.
Below is an example of another option.
It is limited to the following JSON types, which may or may not fit your particular use case:
object, array, string, number, boolean, null
So the example is
select
s.birthdate, json_type(to_json(s.birthdate)),
s.country, json_type(to_json(s.country)),
s.age, json_type(to_json(s.age)),
s.weight, json_type(to_json(s.weight)),
s.is_this, json_type(to_json(s.is_this)),
from (
select struct(date '2022-01-01' as birthdate, 'UA' as country, 1 as age, 2.5 as weight, true as is_this) s
)
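The output shows the JSON type of each field: string for birthdate and country, number for age and weight, and boolean for is_this.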
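If the struct has many fields, writing one json_type(to_json(...)) expression per field by hand gets tedious. Below is a minimal sketch, in Python, that only builds the query text for a given list of field names. The function name build_json_type_query and the *_type column aliases are my own invention for illustration, and the sketch does not run anything against BigQuery.

```python
def build_json_type_query(struct_expr, fields, source_sql):
    """Build a query reporting json_type(to_json(...)) for each struct field.

    struct_expr: alias of the struct column, e.g. "s"
    fields:      field names to inspect (assumed to be known in advance)
    source_sql:  subquery that produces the struct column
    """
    select_items = [
        f"  json_type(to_json({struct_expr}.{name})) as {name}_type"
        for name in fields
    ]
    return "select\n" + ",\n".join(select_items) + f"\nfrom (\n  {source_sql}\n)"


query = build_json_type_query(
    "s",
    ["birthdate", "country", "age"],
    "select struct(date '2022-01-01' as birthdate, 'UA' as country, 1 as age) s",
)
print(query)
```

The generated text can then be pasted into the query editor or submitted with whatever client you already use.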

You can try the below approach.
SELECT COLUMN_NAME, DATA_TYPE
FROM `your-project.your-dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE TABLE_NAME = 'your-table-name'
AND COLUMN_NAME = 'your-struct-column-name'
ORDER BY ORDINAL_POSITION
See the BigQuery INFORMATION_SCHEMA documentation for more details.

Related

How to see 'full' SQL Error Messages in BigQuery?

I am writing a large MERGE statement in BigQuery.
When I attempt to run this query the validator gives me an error involving a lot of ...'s that hides the useful information as shown below:
Value has type ARRAY<STRUCT<eventName STRING, eventUUID STRING, eventDate DATE, ...>> which cannot be inserted into column Events, which has type ARRAY<STRUCT<eventName STRING, eventUUID STRING, eventDate DATE, ...>> at [535:1]
I am extremely confident these two array objects match exactly, however since I am struggling to get around this I would love to see the full error message.
Is there any way to see the full error?
I have looked into the Google Logging tool and cannot see any additional information.
I have also tried the following Cloud Shell command:
bq --format=prettyjson show -j [Job Id Goes Here]
Again, this seems to provide no additional information.
This approach feels pretty silly, but it could be a last resort for a really long nested type.
1) Use INFORMATION_SCHEMA.COLUMNS to get the full type string of the target, in your case the type of column Events.
2) Use CREATE TABLE <yourDataset>.<yourTempTable> AS SELECT ... to dump one row of the Value into a table, then use 1) again to see its full type string.
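Once you have both full type strings, spotting the difference by eye in a long nested type is still painful. Here is a rough sketch, in Python, that splits each string into its top-level fields and reports the first pair that differs. The helper names are hypothetical, and the eventTime field in the usage example is made up for illustration (the question elides the real fields with "..."). It assumes the strings look like the ARRAY<STRUCT<...>> text shown in the error message.

```python
def split_top_level(fields_text):
    """Split 'a INT64, b STRUCT<c STRING, d DATE>' on commas outside <...> and (...)."""
    parts, depth, current = [], 0, []
    for ch in fields_text:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def struct_fields(type_string):
    """Unwrap ARRAY<...> layers and one STRUCT<...>, then return its field declarations."""
    s = type_string.strip()
    while s.startswith("ARRAY<") and s.endswith(">"):
        s = s[len("ARRAY<"):-1].strip()
    if s.startswith("STRUCT<") and s.endswith(">"):
        s = s[len("STRUCT<"):-1]
    return split_top_level(s)


def first_mismatch(target_type, value_type):
    """Return (index, target_field, value_field) for the first difference, or None."""
    target, value = struct_fields(target_type), struct_fields(value_type)
    for i, (t, v) in enumerate(zip(target, value)):
        if t != v:
            return i, t, v
    if len(target) != len(value):
        i = min(len(target), len(value))
        return (i,
                target[i] if i < len(target) else None,
                value[i] if i < len(value) else None)
    return None


# Hypothetical type strings; only the first three fields come from the error message.
target = "ARRAY<STRUCT<eventName STRING, eventUUID STRING, eventDate DATE, eventTime TIMESTAMP>>"
value = "ARRAY<STRUCT<eventName STRING, eventUUID STRING, eventDate DATE, eventTime DATETIME>>"
print(first_mismatch(target, value))
```

The comparison is purely textual, so a difference in field order or field name shows up as a mismatch in the same way a type difference does.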

Pentaho PDI get SQL SUM() with conditions

I'm using Pentaho PDI 7.1. I'm trying to convert data from MySQL to MySQL while changing the structure of the data.
I'm reading the source table (customers), and for each row I have to run another query to calculate the balance.
I was trying to use Database value lookup to accomplish this, but maybe it is not the best way.
I have to run a query like this to get the balance:
SELECT
SUM(
CASE WHEN direzione='ENTRATA' THEN -importo ELSE +importo END
)
FROM Movimento WHERE contoFidelizzato_id = ?
The parameter should be taken from the previous step. Any advice?
The Database value lookup step may be a good idea, especially if you are used to database reasoning, but it may result in many queries, which may not be the most efficient approach.
A more PDI-ish style would be to write the query like this:
SELECT contoFidelizzato_id
, SUM(CASE WHEN direzione='ENTRATA' THEN -importo ELSE +importo END)
FROM Movimento
GROUP BY contoFidelizzato_id
and use it as the info source of a Stream lookup step.
An even more PDI-ish style would be to split the source table (customer) into two flows: one in which you keep the source rows, and one that you group by contoFidelizzato_id. Of course, you need a Formula step, a JavaScript step, or an expression in the SQL of the Table input to change the sign when needed.
Test to find out which strategy is better in your case. You'll soon discover that PDI is very good at handling large data.
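To make the "aggregate once, then look up" idea concrete outside of PDI, here is a small Python sketch of the same data flow. It is only an illustration of the logic, not of any PDI API. The function names, the customer "id" key, and the 'USCITA' direction value are assumptions for the example; only 'ENTRATA', importo and contoFidelizzato_id come from the question.

```python
from collections import defaultdict


def balances_by_account(movements):
    """Equivalent of the GROUP BY query: one signed sum per contoFidelizzato_id.

    movements: iterable of (contoFidelizzato_id, direzione, importo) tuples.
    """
    totals = defaultdict(int)
    for account_id, direzione, importo in movements:
        # Same sign rule as the CASE expression: ENTRATA is subtracted.
        totals[account_id] += -importo if direzione == "ENTRATA" else importo
    return dict(totals)


def attach_balance(customers, balances):
    """Equivalent of the stream lookup: add the balance to each customer row.

    Customers without any movement get None, like a lookup that finds no match.
    """
    return [{**c, "balance": balances.get(c["id"])} for c in customers]


movements = [(1, "ENTRATA", 10), (1, "USCITA", 4), (2, "USCITA", 7)]
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
print(attach_balance(customers, balances_by_account(movements)))
```

The movements are read once and the per-customer step is an in-memory lookup, which is the reason this tends to beat one query per customer row.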

QSqlQuery using with indexes

I have my own data store mechanism for storing data, but I want to implement a standard data manipulation and query interface for end users, so I thought Qt SQL would be suitable for my case.
However, I still cannot understand how to involve my indexes in an SQL query.
For example,
I have a table with columns A (int), B (int), C (int), D (int), and column A is indexed. Assume I execute a query like select * from Foo where A = 10;
How do I use my index to search for the results?
You have written your own storage system and want to manipulate it using an SQL-like syntax? I don't think Qt SQL is the right tool for that job. It offers connectivity to various SQL servers and is not meant for parsing SQL statements. Qt expects to "pass through" the queries and then somehow parse the result set and transform it into a Qt-friendly representation.
So if you only want a Qt-friendly representation, I don't see a reason to take the detour through SQL.
But regarding your problem:
In SQL, indexes are usually not named in queries; they are defined when the table schema is created. SQL Server, however, lets you "hint" indexes. Is that what you are looking for?
SELECT column_list FROM table_name WITH (INDEX (index_name) [, ...]);

Case sensitive LINQ to DataSet

I am having an issue with a strongly typed DataSet exhibiting case-sensitivity using LINQ to DataSet to retrieve and filter data. In my example project, I have created a strongly typed DataSet called DataSet1. It contains a single DataTable called Customers. To instantiate and populate, I create a couple of rows (notice the casing on the names):
// Instantiate
DataSet1 ds = new DataSet1();
// Insert data
ds.Customers.AddCustomersRow(1, "Smith", "John");
ds.Customers.AddCustomersRow(2, "SMith", "Jane");
Next, I can easily fetch/filter using the DataSet's built-in Select functionality:
var res1 = ds.Customers.Select("LastName LIKE 'sm%'");
Console.WriteLine("DataSet Select: {0}", res1.Length);
DataSet Select: 2
The trouble begins when attempting to use LINQ to DataSet to perform the same operation:
var res2 = from c in ds.Customers where c.LastName.StartsWith("sm") select c;
Console.WriteLine("LINQ to DataSet: {0}", res2.Count());
LINQ to DataSet: 0
I've already checked the instantiated DataSet's CaseSensitive property as well as the Customer DataTable's CaseSensitive property--both are false. I also realize that when using the Select methodology, the DataSet performs the filtering and the LINQ query is doing something else.
My hope and desire for this type of code was to use it to Unit Test our Compiled LINQ to SQL queries so I can't really change all the current queries to use:
...where c.LastName.StartsWith("sm", StringComparison.CurrentCultureIgnoreCase) select c;
...as that changes the query in SQL. Thanks all for any suggestions!
LINQ to DataSet still uses normal managed methods, including the standard String.StartsWith method.
It is fundamentally impossible for these methods to be aware of the DataTable's CaseSensitive property.
Instead, you can use an ExpressionVisitor to change all StartsWith (or similar) calls to pass StringComparison.CurrentCultureIgnoreCase.
You could also use c.LastName.ToLower().StartsWith("sm"), which lowercases the value before comparing so that the match ignores case. Good luck!

WQL SELECT with optional column

I need to make a query like this:
SELECT PNPDeviceID FROM Win32_NetworkAdapter WHERE AdapterTypeId = 0
Trouble is, the AdapterTypeId column isn't always present. In this case, I just want everything, like so:
SELECT PNPDeviceID FROM Win32_NetworkAdapter
My WQL/SQL knowledge is extremely limited. Can anybody tell me how to do this in a single query?
EDIT:
A bit more background seems to be required: I am querying Windows for device information using WMI, which uses an SQL-like syntax. So, in my example, I am querying for network adapters that have an AdapterTypeId of 0.
That column is not always present however, meaning that if I enumerate through the returned values then "AdapterTypeId" is not listed.
EDIT 2:
Changed SQL to WQL; apparently this is more correct.
I am assuming you mean the underlying schema is unreliable.
This is a highly unconventional situation. I suggest that you resolve the issue that is causing the column to not always be present, because to have the schema changing dynamically underneath your application is potentially (almost certainly) disastrous.
Update:
OK, so WQL lets you query objects with a SQL-like syntax but, unlike SQL, the schema can change underneath your feet. This is a classic example of a leaky abstraction, and I now hate WQL without ever having used it :).
Since the available properties are in flux, I am guessing that WQL provides a way to enumerate the properties for a given adapter. Do this, and choose which query to run depending upon the results.
After some Googling, there is an example here, which shows how to enumerate through the available properties. You can use this to determine if AdapterTypeId exists or not.
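The "enumerate first, then pick the query" idea can be sketched as follows. This is a minimal Python illustration of the decision only: it assumes you have already obtained the list of property names by whatever enumeration mechanism you use, and the function name choose_adapter_query is hypothetical. Comparing names case-insensitively is also my assumption.

```python
def choose_adapter_query(available_properties):
    """Return the WQL text to run, depending on whether AdapterTypeId is available.

    available_properties: property names enumerated beforehand (not done here).
    """
    base = "SELECT PNPDeviceID FROM Win32_NetworkAdapter"
    names = {name.lower() for name in available_properties}
    if "adaptertypeid" in names:
        # The property exists, so the filter from the question can be applied.
        return base + " WHERE AdapterTypeId = 0"
    # The property is missing: fall back to the unfiltered query.
    return base


print(choose_adapter_query(["PNPDeviceID", "AdapterTypeId"]))
print(choose_adapter_query(["PNPDeviceID"]))
```

This takes two steps rather than the single query the question asks for, but it avoids relying on how WQL treats a missing property in a WHERE clause.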
SELECT PNPDeviceID FROM Win32_NetworkAdapter WHERE AdapterTypeId = {yourDesire} OR AdapterTypeId IS NULL
I assume you mean that this field is missing from the table.
Do you know before submitting the query whether this field exists?
If so, just build the query dynamically; otherwise I think you will get a syntax error when the field is missing.
This is not an SQL question. SQL does not contemplate records with varying schemas in a single table source. Instead (as you mention) this is a different system using an "SQL-like" syntax. You'll have better luck if you recast the question using the actual product that you're trying to query, and information how that product deals with variable record structures is probably discussed in the documentation.