Customizing Calcite's SQL parser

I am trying to use Apache Calcite's SqlParser to get entity-level lineage for my views written in Hive and Impala. However, the parser throws an exception for the following cases.
SELECT emp.1ab FROM emp -- column name begins with a digit
org.apache.calcite.sql.parser.SqlParseException: Encountered ".1" at line 1, column 12.
Was expecting one of:
<EOF>
"AS" ...
SELECT PERIOD from emp -- column name is a keyword (period)
Caused by: org.apache.calcite.sql.parser.impl.ParseException: Encountered "period from" at line 1, column 9.
Was expecting one of:
"ALL" ...
This is my SqlParser config:
SqlParser.Config config = SqlParser
.configBuilder()
.setConformance(SqlConformanceEnum.LENIENT)
.build();
Is there any other config setting available to customize the parser to suit my needs? All these queries are valid in Hive.
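For what it's worth, below is a minimal, untested sketch of further settings that could be tried. It assumes the calcite-babel module is on the classpath (for SqlBabelParserImpl) and that the Calcite version in use has SqlConformanceEnum.BABEL; Lex.MYSQL switches the lexical policy to Hive-style back-tick quoting. Even then, an identifier that begins with a digit will most likely still need to be back-tick quoted.
import org.apache.calcite.config.Lex;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.parser.babel.SqlBabelParserImpl;
import org.apache.calcite.sql.validate.SqlConformanceEnum;

// Sketch only: MYSQL lex gives Hive-style back-tick quoting, and the Babel
// parser (from the separate calcite-babel artifact) accepts many reserved
// words, such as PERIOD, as ordinary column names.
SqlParser.Config config = SqlParser
.configBuilder()
.setLex(Lex.MYSQL)
.setConformance(SqlConformanceEnum.BABEL)
.setParserFactory(SqlBabelParserImpl.FACTORY)
.build();

// The identifier that starts with a digit is still back-tick quoted here;
// parseQuery() throws SqlParseException if the statement cannot be parsed.
SqlNode node = SqlParser
.create("SELECT emp.`1ab`, `period` FROM emp", config)
.parseQuery();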

Related

Google Sheet query where literal string contains field value

Let's say you have a table of properties in a Google Sheet...
Col A (Name)
Property 1
Property 2
Property 3
Property 4
Property 5
... and you want a formula-driven solution that pulls data on certain properties, specified by a comma-separated literal string like "Property 2,Property 5".
The QUERY() function comes to mind, which uses a SQL-like syntax. I tried these WHERE clauses:
SELECT A WHERE 'Property 2, Property 5' LIKE '%{$A}%' -- No error but returns empty set.
SELECT A WHERE INSTR('Property 2, Property 5', A) -- returned error: Unable to parse query string for function QUERY parameter 2: PARSE_ERROR: Encountered " 'INSTR'" at line 1, column 16. Was expecting one of: "("... "("...
Is there some other query to find the needle in the haystack, where the haystack is a literal string and the needle is a field in the query?
try:
=QUERY(A:A; "select A where A matches 'Property 2|Property 5'"; )

Failed to parse SQL query - column invalid identifier

I am on Application Express 21.1.0.
I added a column to a db table, and tried to add that column to the Form region based on that table.
I got this error:
ORA-20999: Failed to parse SQL query! ORA-06550: line 4, column 15: ORA-00904: "NEEDED_EXAMS": invalid identifier
And I cannot find the column in any "Source > Column" attribute of any page item of that form.
I can query the new column in "SQL COMMANDS".
The new column's name is "NEEDED_EXAMS". It's a varchar2(500).
Don't do it manually; use the built-in feature by right-clicking the region and then selecting "Synchronize Columns" from the menu, as it'll do everything for you. It works for reports and forms.
Solved.
I have many parsing schemas, and I was creating tables through Object Browser in a different schema than my app's parsing schema.

Power BI lookup and substrings

I am looking for a DAX function, or any trick with Power BI Desktop, to do the operation below:
2 tables:
'data' table, with a data column in text format
e.g. "Product XXX.YYY"
'ref' table, with 2 columns, key and label. The keys are usually substrings of the data fields.
e.g. "Product XXX"; "Label 1"
I need to add to the data table a measure that contains the label matching the first key from the ref table that is a substring of the data.
e.g. "Product XXX" is a substring of "Product XXX.YYY" => should return "Label 1"
Of course, all values in the data column have a different format, so I cannot manipulate the data to split "Product XXX" and "YYY" in a static way.
Thanks a lot!
Alternative: rather than a simple substring, if the key could be a full regex pattern to be matched, that would be far more interesting. :)
Without example data it's hard to test if this solution works, but since October 2018 there is a fuzzy matching preview feature.
Steps summary
Enable the fuzzy merge preview feature via File → Options → Preview Features
Go to Edit Queries and click on Merge Queries as New
Select the data table (containing the Product XXX.YYY values) as the source
Select the ref table as the match table
Pick a join type from the drop down
Tick the fuzzy matching checkbox
Try out some fuzzy matching options to see what is suitable

Does AWS Athena support sequence files?

Has anyone tried creating an AWS Athena table on top of sequence files? As per the documentation it looks like it is possible. I was able to execute the below CREATE TABLE statement.
create external table if not exists sample_sequence (
account_id string,
receiver_id string,
session_index smallint,
start_epoch bigint)
STORED AS sequencefile
location 's3://bucket/sequencefile/';
The statement executed successfully, but when I try to read data from the table it throws the below error:
Your query has the following error(s):
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://viewershipforneo4j/2017-09-26/000030_0 (offset=372128055, length=62021342) using org.apache.hadoop.mapred.SequenceFileInputFormat: s3://viewershipforneo4j/2017-09-26/000030_0 not a SequenceFile
This query ran against the "default" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 9f0983b0-33da-4686-84a3-91b14a39cd09.
The sequence files are valid ones. The issue here is that there is no delimiter defined,
i.e. row format delimited fields terminated by is missing.
If, in your case, tab is the column delimiter and each record is on its own line, it will be:
create external table if not exists sample_sequence (
account_id string,
receiver_id string,
session_index smallint,
start_epoch bigint)
row format delimited fields terminated by '\t'
STORED AS sequencefile
location 's3://bucket/sequencefile/';

Django with Oracle database 11g

I am new to the Python language and Django. I need to connect Django to Oracle Database 11g. I have imported the cx_Oracle library and am using Instant Client to connect Oracle with Django, but when I run the command manage.py inspectdb > models.py I get the error "Invalid column identifier" in models.py. How can I solve it? I have only 2 tables in the schema I am connecting to.
"Invalid column" suggests that you specified column name that doesn't exist in any of those tables, or you misspelled its name.
For example:
SQL> desc dept
Name
-----------------------------------------
DEPTNO
DNAME
LOC
SQL> select ndame from dept; --> misspelled column name
select ndame from dept
*
ERROR at line 1:
ORA-00904: "NDAME": invalid identifier
SQL> select imaginary_column from dept; --> non-existent column name
select imaginary_column from dept
*
ERROR at line 1:
ORA-00904: "IMAGINARY_COLUMN": invalid identifier
SQL>
Also, pay attention to letter case, especially if you created tables/columns using mixed case and enclosed those names in double quotes (if so, I'd suggest you drop the tables and recreate them without double quotes; if you can't do that, you'll have to reference them using double quotes and exactly the same letter case).
So - check column names and compare them to your query. If you still can't make it work, post some more information - table description and your code.
I've faced the same problem. The problem is that Django expects your table to have a primary key (ID), so when your table is without one, it returns "Invalid column identifier".
https://docs.djangoproject.com/en/2.1/topics/db/models/#automatic-primary-key-fields