Is quote REQUIRED in calcite sql parser? - apache-calcite

From the config there are a few options, such as \", [, etc.
However, is it possible for Calcite to parse identifiers without quotes? Like
select * from a.b?
Now I must write
select * from \"a\".\"b\"
and it's kind of annoying.

Quotes are never required but you may need to change how Calcite handles the casing of identifiers depending on your application. By default, unquoted identifiers are converted to uppercase and quoted identifiers are left unchanged. So if you have lowercase identifiers, they need to be quoted by default.
To change this behaviour, you need to modify the config you pass to SqlParser so that the case of unquoted identifiers is also unchanged.
SqlParser.Config config = SqlParser.configBuilder()
    .setUnquotedCasing(Casing.UNCHANGED)
    .build();
SqlParser parser = SqlParser.create(/* what you're parsing here */, config);
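For example, here is a minimal, self-contained sketch of parsing the unquoted form (the builder method names can differ slightly between Calcite versions, so check the SqlParser.Config API of the version you use):

import org.apache.calcite.avatica.util.Casing;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParser;

public class ParseWithoutQuotes {
    public static void main(String[] args) throws Exception {
        SqlParser.Config config = SqlParser.configBuilder()
                .setUnquotedCasing(Casing.UNCHANGED)  // keep a.b as a.b instead of A.B
                .build();
        SqlParser parser = SqlParser.create("select * from a.b", config);
        SqlNode node = parser.parseQuery();           // no quotes required
        System.out.println(node);
    }
}

Note that this only changes how the parser normalizes identifiers; whether identifiers are later matched case-sensitively against a schema is governed by the separate caseSensitive setting.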

Related

Validating xml through xsd takes entities as referred character

I'm struggling with something strange. We have to send some xml generated documents to a website, where they are going to be parsed against an xsd. This xsd includes a data type for text fields where a regular expression is used to exclude different characters (in fact the reg exp includes a definite list of valid characters, excluding everything outside the rule).
For example, standard double quotes (") are excluded.
On our xml generator I've changed (") to its entity &quot; and with this change, those data that include (") pass the validation.
Of course, quotes aren't our only problem, there are lots of different characters excluded from the reg exp that can be found on our source data.
So I made a little function in our Oracle package that checks units of data against the reg exp before adding them to the xml, and if any doesn't fit, loops through its content seeking invalid characters and replacing them with the html entity associated with their encoding (thus, quotes are changed to &#34;).
My surprise is that, although the reg exp validates a piece of data with &quot;, it doesn't validate a piece of data with &#34; (and yes, (#) is included in the reg exp).
Is there any reason for this? I don't know... are #numeric entities parsed before the validation against the xsd, or something like that?
BTW, the regExp is this:
[0-9a-zA-ZñáàéèíìóòúùÁÉÍÓÚÑü\s/çÇ¡!¿=\?%€#&#,;:\.\-_''\*\+\(\) ÀÈÌÒÙÜ’´´`·äëïöÄËÏÖ“”’]+

Use reserved keyword as alias in doctrine query builder

$em->createQueryBuilder()
->select("MIN(m.price) AS min")
->addSelect('MAX(m.price) AS max')
->from('AppBundle:Sites', 'm');
How can I escape min to make this work? I tried to change the min alias to something like _min instead, but there should be a better way.
I tried both single quotes and backticks but neither worked.
You won't be able to use min or max as an alias since it is simply not available with the current grammar of the DQL. You can find this info in the section of the documentation that is defining the grammar of DQL. There you will find out the following:
In Select Expressions:
SimpleSelectExpression ::= (...) [["AS"] AliasResultVariable]
In Identifiers:
/* Alias ResultVariable declaration (the "total" of "COUNT(*) AS total") */
AliasResultVariable = identifier
And eventually, in Terminals:
identifier (name, email, …) must match [a-z_][a-z0-9_]*
As you can see, there is nothing in there to help you escape your keyword in any way. Thus, as it is, when stumbling upon min, the Lexer will identify it as the MIN function (see this section of the code) and not as an identifier, hence the error.
Long story short, you will have to either rely on a native query or use an alias name that is not one of the reserved keywords listed here.
Note: Doctrine allows you to implement your own quoting strategy as discussed in this post but the issue is unrelated. Here, the problem with your alias is that it is matched as a function by the DQL parser which is unexpected at this position.
Using MySQL, you could escape it with backticks: `min`
Using Postgres, you could escape it with double quotes: "min"
You can use another word like minimum as an alias to avoid being database-dependent.

Enabling Elasticsearch index names with illegal characters

I am trying to create elasticsearch indexes with strings like xxx/yyy and xxx yyy but these are not permitted because they contain illegal characters (/ and the space character). These names are largely user-created and out of my control, so changing the names for the sake of fitting into the requirements of elasticsearch is not really an option.
This is the exact error message:
[Error: InvalidIndexNameException[[XXX\%FFZZZ] Invalid index name [XXX\%FFZZZ], must not contain the following characters [\, /, *, ?, ", <, >, |, , ,]]]
Anyways, I've tried URL encoding the strings, but that doesn't work because those include capital letters which are not permitted and backslash escaping is out of the question because it is in the list of illegal characters.
Is there a conventional solution to this problem, or do I have to come up with some sketchy serialization and/or hashing scheme to solve this?
Hmm, letting users have control over things like index names is asking for trouble :)
But if you're willing to pursue that route, what I suggest is simply to remove any character that is not alphanumeric and lowercase the result in the process.
In PHP that would be:
$index = strtolower(preg_replace("/[^a-z0-9]+/i", "", $index));
In Java:
index = index.replaceAll("[^a-zA-Z0-9]+", "").toLowerCase();
In Javascript:
index = index.replace(/[^a-z0-9]+/gi, "").toLowerCase();
Please do not allow users to define the index name. You can try to filter out illegal characters, but your regexp might have an issue, and you might run into trouble later.
Also, users might not understand why they create problems if one user uses My_Index and writes stuff into it, and the next user, trying to access myindex, accesses the same index.
BTW: The regexp given above is more strict than the list of legal characters requires. For example, _ is legal (but not at the beginning of the name). If you wanted to create a regexp that allows everything that is legal by ES standards, it would become more complicated and more error-prone.
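If you do go down the sanitizing route, here is a rough Java sketch based only on the rules mentioned in this thread (the characters from the error message above, lowercase-only names, and no leading underscore); the full set of restrictions depends on your Elasticsearch version, so check its documentation before relying on this:

import java.util.Locale;

public class IndexNames {
    // Replace the characters listed in the error message, lowercase the result,
    // and strip leading underscores (legal elsewhere, but not at the start).
    // This is a sketch, not a complete validator.
    public static String sanitize(String raw) {
        String name = raw.toLowerCase(Locale.ROOT)
                .replaceAll("[\\\\/*?\"<>|,\\s]", "_");
        name = name.replaceFirst("^_+", "");
        if (name.isEmpty()) {
            throw new IllegalArgumentException("nothing left of index name: " + raw);
        }
        return name;
    }
}

Note that different raw names can still collide after sanitizing (xxx/yyy and xxx yyy both become xxx_yyy), which is exactly the problem described above.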

c++ - escape special characters

I need to escape all special characters and replace national characters to get a "plain text" table name.
string getTableName(string name)
My string could be "šárka65_%&." and I want to get a string I can use in my database as a table name.
Which DBMS?
In standard SQL, a name enclosed in double quotes is a delimited identifier and may contain any characters.
In MS SQL Server, a name enclosed in square brackets is a delimited identifier.
In MySQL, a name enclosed in back-ticks is a delimited identifier.
You could simply choose to enclose the name in the appropriate markers.
I had a feeling that wasn't what you wanted...
What codeset is your string in? It seems to be UTF-8 by the time it gets to my browser. Do you need to be able to invert the mapping unambiguously? That is harder.
You can use many schemes to map the information:
One simple minded one is simply to hex-encode everything, using a marker (X) to protect against leading digits:
XC5A1C3A1726B6136355F25262E
One slightly less simple minded one is to hex-encode anything that is not already an ASCII alphanumeric or underscore (a rough sketch of this scheme follows below):
XC5A1C3A1rka65_25262E
Or, as a comment suggests, you can devise a mapping table for accented Latin letters - indeed, a mapping table appropriately initialized will be the fastest approach. The input is the character in the source string; the output is the desired mapped character or characters. If you use an 8-bit character set, this is entirely manageable. If you use full Unicode, it is a lot less manageable (not least, how do you map all the Han syllabary to ASCII?).
Or ...
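For what it's worth, here is a rough sketch of the second scheme (hex-encode every UTF-8 byte that is not an ASCII alphanumeric or underscore, with a leading X to guard against names starting with a digit). The question is about C++, but the scheme itself is language-independent; this version is in Java just to illustrate:

import java.nio.charset.StandardCharsets;

public class TableNameEncoder {
    public static String encode(String name) {
        StringBuilder out = new StringBuilder("X");
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            boolean plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_';
            if (plain) {
                out.append(c);                                // keep ASCII alphanumerics and _
            } else {
                out.append(String.format("%02X", b & 0xFF));  // hex-encode the byte
            }
        }
        return out.toString();
    }
}

encode("šárka65_%&.") yields "XC5A1C3A1rka65_25262E" as in the example above. Be aware that only the first scheme (hex-encoding everything) is unambiguously invertible; with this one, a literal "C5" in the name cannot be distinguished from an encoded byte.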

Modify PL/SQL statement strings in C++

This is my use case: Input is a string representing an Oracle PL/SQL statement of arbitrary complexity. We may assume it's a single statement (not a script).
Now, several bits of this input string have to be rewritten.
E.g. table names need to be prefixed, aggregate functions in the selection list that don't use a column alias should be assigned a default one:
SELECT SUM(ABS(x.value)),
TO_CHAR(y.ID,'111,111'),
y.some_col
FROM
tableX x,
(SELECT DISTINCT ID
FROM tableZ z
WHERE ID > 10) y
WHERE
...
becomes
SELECT SUM(ABS(x.value)) COL1,
TO_CHAR(y.ID,'111,111') COL2,
y.some_col
FROM
pref.tableX x,
(SELECT DISTINCT ID, some_col
FROM pref.tableZ z
WHERE ID > 10) y
WHERE
...
(Disclaimer: just to illustrate the issue, statement does not make sense)
Since aggregate functions might be nested and subSELECTs are a b_tch, I dare not use regular expressions. Well, actually I did and achieved 80% of success, but I do need the remaining 20%.
The right approach, I presume, is to use grammars and parsers.
I fiddled around with c++ ANTLR2 (although I do not know much about grammars and parsing with the help of such). I do not see an easy way to get the SQL bits:
list<string> *ssel = theAST.getSubSelectList(); // fantasy land
Could anybody maybe provide some pointers on how "parsing professionals" would pursue this issue?
EDIT: I am using Oracle 9i.
Maybe you can use this, it changes a select statement into an xml block:
declare
cl clob;
begin
dbms_lob.createtemporary (
cl,
true
);
sys.utl_xml.parsequery (
user,
'select e.deptno from emp e where deptno = 10',
cl
);
dbms_output.put_line (cl);
dbms_lob.freetemporary (cl);
end;
/
<QUERY>
<SELECT>
<SELECT_LIST>
<SELECT_LIST_ITEM>
<COLUMN_REF>
<SCHEMA>MICHAEL</SCHEMA>
<TABLE>EMP</TABLE>
<TABLE_ALIAS>E</TABLE_ALIAS>
<COLUMN_ALIAS>DEPTNO</COLUMN_ALIAS>
<COLUMN>DEPTNO</COLUMN>
</COLUMN_REF>
....
....
....
</QUERY>
See here: http://forums.oracle.com/forums/thread.jspa?messageID=3693276&#3693276
Now you 'only' need to parse this xml block.
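Once the CLOB is fetched into your client (over JDBC, OCI, or similar), pulling for instance the referenced table names out of that XML is easy with any XML parser. A rough sketch, in Java just to illustrate and assuming the element names match the fragment above:

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ParsedQueryTables {
    public static void printTables(String parseQueryXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        parseQueryXml.getBytes(StandardCharsets.UTF_8)));
        NodeList tables = doc.getElementsByTagName("TABLE");
        for (int i = 0; i < tables.getLength(); i++) {
            System.out.println(tables.item(i).getTextContent());  // e.g. EMP
        }
    }
}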
Edit1:
Sadly I don't fully understand the needs of the OP, but I hope this can help (it is another way of asking for the 'names' of the columns of, for example, the query select count(*),max(dummy) from dual):
set serveroutput on
DECLARE
c NUMBER;
d NUMBER;
col_cnt PLS_INTEGER;
f BOOLEAN;
rec_tab dbms_sql.desc_tab;
col_num NUMBER;
PROCEDURE print_rec(rec in dbms_sql.desc_rec) IS
BEGIN
dbms_output.new_line;
dbms_output.put_line('col_type = ' || rec.col_type);
dbms_output.put_line('col_maxlen = ' || rec.col_max_len);
dbms_output.put_line('col_name = ' || rec.col_name);
dbms_output.put_line('col_name_len = ' || rec.col_name_len);
dbms_output.put_line('col_schema_name= ' || rec.col_schema_name);
dbms_output.put_line('col_schema_name_len= ' || rec.col_schema_name_len);
dbms_output.put_line('col_precision = ' || rec.col_precision);
dbms_output.put_line('col_scale = ' || rec.col_scale);
dbms_output.put('col_null_ok = ');
IF (rec.col_null_ok) THEN
dbms_output.put_line('True');
ELSE
dbms_output.put_line('False');
END IF;
END;
BEGIN
c := dbms_sql.open_cursor;
dbms_sql.parse(c,'select count(*),max(dummy) from dual ',dbms_sql.NATIVE);
dbms_sql.describe_columns(c, col_cnt, rec_tab);
for i in rec_tab.first..rec_tab.last loop
print_rec(rec_tab(i));
end loop;
dbms_sql.close_cursor(c);
END;
/
(See here for more info: http://www.psoug.org/reference/dbms_sql.html)
The OP also wants to be able to change the schema name of the table in a query. I think the easiest way to achieve that is to query the table names from user_tables and search the sql statement for those table names and prefix them, or to do an 'alter session set current_schema = ....'.
If the source of the SQL statement strings is other coders, you could simply insist that the parts that need changing are marked by special escape conventions, e.g., write $TABLE instead of the table name, or $TABLEPREFIX where one is needed. Then finding the places that need patching can be accomplished with a substring search and replacement.
If you really have arbitrary SQL strings and cannot get them nicely marked, you need to somehow parse the SQL string as you have observed. The XML solution certainly is one possible way.
Another way is to use a program transformation system. Such a tool can parse a string for a language instance, build ASTs, carry out analysis and transformation on ASTs, and then spit out a revised string.
The DMS Software Reengineering Toolkit is such a system. It has a PL/SQL front-end parser. And it can use pattern-directed transformations to accomplish the rewrites you appear to need. For your example involving select items:
domain PLSQL.
rule use_explicit_column(e: expression):select_item -> select_item
"\e" -> "\e \column\(\e\)".
To read the rule, you need to understand that the stuff inside quote marks represents abstract trees in some computer language which we want to manipulate. What the "domain PLSQL" phrase says is, "use the PLSQL parser" to process the quoted string content, which is how it knows. (DMS has lots of language parsers to choose from). The terms "expression" and "select_item" are grammatical constructs from the language of interest, e.g., PLSQL in this case. See the railroad diagrams in your PLSQL reference manual.
The backslash represents escape/meta information rather than target language syntax.
What the rule says is, transform those parsed elements which are select_items
that are composed solely of an expression \e, by converting it into a select_item consisting of the same expression \e and the corresponding column ( \column(\e) ) presumably based on position in the select item list for the specific table. You'd have to implement a column function that can determine the corresponding name from the position of the select item. In this example, I've chosen to define the column function to accept the expression of interest as argument; the expression is actually passed as the matched tree, and thus the column function can determine where it is in the select_items list by walking up the abstract syntax tree.
This rule handles just the select items. You'd add more rules to handle the other various cases of interest to you.
What the transformation system does for you is:
parse the language fragment of interest
build an AST
let you pattern match for places of interest (by doing AST pattern matching)
but using the surface syntax of the target language
replace matched patterns by other patterns
compute arbitrary replacements (as ASTs)
regenerate source text from the modified ASTs.
While writing the rules isn't always trivial, it is what is necessary if your problem
is stated as posed.
The XML suggested solution is another way to build such ASTs. It doesn't have the nice pattern matching properties although you may be able to get a lot out of XSLT. What I don't know is if the XML has the parse tree in complete detail; the DMS parser does provide this by design as it is needed if you want to do arbitrary analysis and transformation.