How to modify a RelNode tree? - apache-calcite

I am using Apache Calcite to validate and rewrite SQL based on policies that put certain restrictions on these SQL queries. I am trying to modify a RelNode tree in order to rewrite the query to enforce these restrictions. I want to be able to remove certain parts from a query (after it has been validated). For example, I want to be able to remove projection fields (which I managed to do using RelBuilder.projectExcept) and to remove a table scan and its corresponding column references from a query.
Simple example:
SELECT a.foo, b.bar, c.baz
FROM a, b, c
WHERE a.index = b.index AND b.index = c.index
Let's say we want to remove table c from the query, to get to the following:
SELECT a.foo, b.bar
FROM a, b
WHERE a.index = b.index
I have tried using RelBuilder but this does not support removing nodes from the tree. I have also thought about an approach using RelVisitor but this seems quite complicated for this purpose. I think it would essentially require building a new RelNode tree. Lastly, implementing rules using RelRule seems like it would be a suitable option, but I cannot figure out from the Calcite documentation how to remove a particular RelNode and how to parameterize this (e.g. conditionally apply the rule if the table name is c).
Can anyone point me to a good approach? Alternatively, would it be easier to just modify the SqlNode parse tree?

A rule transforms (in this case TransformationRule) a RelNode to an equivalent RelNode i.e both should have the same row. Assumming you want to use HepPlanner with your custom rule registered and if the rule matches, it will eventually check whether the original rel and the transformed rel have the same row using RelOptUtil#verifyTypeEquivalence. I think mutating the relNode via RelVisitor or mutating the sqlNode via SqlVisitor is your best bet.

Related

Lucene Syntax For More Complicated Queries

I am developing a website for my company, that allows users to query a database in order to get the information they need.
Currently, the users are used to a particular form of queries, and I don't want to make them change the way they are used to. Therefore, I need to convert their query to Lucene's query syntax.
There are some cases which I'm not sure what is the best way to implement them using Lucene syntax, I was wondering maybe you have some better ideas:
"Current Query" : serverRole=~'(ServerOne|ServerTwo|ServerThree)'
"Lucene Suggested": (serverRole:*ServerOne* OR serverRole:*ServerTwo* OR serverRole:*ServerThree*)
Take into account that I'm using Regex to convert these queries, so one of the difficulties I'm facing for example, is how to do it if the number of elements (ServerOne|ServerTwo|ServerThree.....) is dynamic:
luceneQuery = currentQuery
.replace(/(==~|=~)('|")([a-zA-Z0-9]+)(\|)([a-zA-Z0-9]+)('|")/g, ':*$3 OR $5*')
Another query for example:
"Current Query" : OS=~'SLES1[12]'
"Lucene Suggested": (OS:*SLES11* OR OS:*SLES12*)
I would recomand you to check BooleanQuery() on Lucene to create more complex queries like Wildcard , Term, Fuzzy U can include all by using Occur parameter while u build your queries. As an example
Query query1 = new WildcardQuery(new Term("contents", "*ServerOne*"));
Query query2 = new WildcardQuery(new Term("contents", "*ServerTwo*"));
BooleanQuery booleanQuery = new BooleanQuery.Builder()
.add(query1, BooleanClause.Occur.SHOULD)
.add(query2, BooleanClause.Occur.SHOULD)
.build();
There is also regex queries you can directly run but when your indexed field will be complicates it taking time to find regex match

how to evaluate sql query using RelNode object

I am trying to convert sql query to Tinkerpop Gremlin. sql2Gremlin library does it but it looks on join as relation while I am relying on no join approach where you can refer relations with dot as delimiter between two entity.
I have parsed and validated query and I have RelRoot object.
Apache calcite returns RelRoot object which is root of algebraic expression.
Lets say I dont want to apply any query optimization, How do i use my RelNode Visitor to transform the RelRoot into TinkerPop Gremlin DSL.
Ideally I would first use From clause and then apply filters defined in where clause? How is select, filters, From clause represent in RelRoot tree?
What does apache calcite means by relational expression or RelNode?
Rephrasing the same question without TinkerPop Gremlin context:
How should I use RelRoot visitor to visit the RelRoot and transform the query to another DSL?
I don't know why you insist on RelRoot and not RelNode tree, but Apache Calcite is doing its optimizations of relational algebra in RelNode stack. There is a class called RelVisitor that you might find interesting, since it can do exactly what you need: visit all RelNodes. You can then extract information you need from them and build your DSL with it.
EDIT: In RelVisitor, you have access to the parent node and the child nodes of the currently visited node. You can extract all the information usually available to the RelNode object (see docs), and if you cast it to specific relational algebra operation, for example, Project, you can extract what fields are inside Project operation by doing node.getRowType().getFieldList().forEach(field -> names.add(field.getName())), where names is a previously defined Set<String>. You can find the full code here.
You should also take a look at the algebra docs to understand how SQL maps to relational algebra in Calcite before attempting this.

Is it possible to query multiple AWS Cloudsearch fields for the same value without repeating?

Using AWS Cloudsearch, I need to query 2 separate fields for the same value using a structured (compound) query e.g.
(and (or name:'john smith') (or curr_addr:'123 someplace' other_addr:'123 someplace'))
This query works, but I'm wondering if it's necessary to repeat the value for each field that I want to search against. Is there some way to specify the value only once e.g. curr_addr+other_addr:'123 someplace'
That is the correct way to structure your compound query. From the AWS documentation, you'll see that they structure their example query the same way:
(and title:'star' (or actors:'Harrison Ford' actors:'William Shatner')(not actors:'Zachary Quinto'))
From Constructing Compound Queries
You may be able to get around this by listing the more repetitive fields in the query options (q.options), and then specify the field for the rest of the fields. The fields list is sort of a fallback for when you don't specify which field you are searching in a compound query. So if you list the address fields there, and then only specify the name field in your query, you may get close to the behavior you're looking for.
Query options
q.options={fields: ['curr_addr','other_addr']}
Query
(and (or name:'john smith') (or '123 someplace'))
But this approach would only work for one set of repetitive fields, so it's not a silver bullet by any means.
From Search API Reference (see q.options => fields)

JPA2 how to select n entities starting at the i'th entity/row

I have let the modeling tools in my IDE create entities from tables, so each entity is one record. How can I select n records starting at the i'th record, such that I may easily implement pagination?
Using criteria queries but a simple reference should be enough. My tables are varied so I can't do this by key. I can do this with native queries but am uncertain how at the moment how a criteria query and native query can be combined.
Currently I am returning a list and discarding the portion I do not want, this is proving to be too inefficient.
you can use the combination of javax.persistence.Query#setFirtsResult and javax.persistence.Query#setMaxResult if you don't insist on using criteria.
Criteria criteria
= session.createCriteria(SomeClass.class);
criteria.setFirstResult(0);
criteria.setMaxResults(10);

doctrine: QueryBuilder vs createQuery?

In Doctrine you can create DQL in 2 ways:
EntityManager::createQuery:
$query = $em->createQuery('SELECT u FROM MyProject\Model\User u WHERE u.id = ?1');
QueryBuilder:
$qb->add('select', 'u')
->add('from', 'User u')
->add('where', 'u.id = ?1')
->add('orderBy', 'u.name ASC');
I wonder what the difference is and which should I use?
DQL is easier to read as it is very similar to SQL. If you don't need to change the query depending on a set of parameters this is probably the best choice.
Query Builder is an api to construct queries, so it's easier if you need to build a query dynamically like iterating over a set of parameters or filters. You don't need to do any string operations to build your query like join, split or whatever.
Query builder is just, lets say, interface to create query... It should be more comfortable to use, it does not have just add() method, but also methods like where(), andWhere(), from(), etc. But in the end, it just composes query like the one you use in the createQuery() method.
Example of more advanced use of query builder:
$em->createQueryBuilder()
->from('Project\Entities\Item', 'i')
->select("i, e")
->join("i.entity", 'e')
->where("i.lang = :lang AND e.album = :album")
->setParameter('lang', $lang)
->setParameter('album', $album);
They have different purposes:
DQL is easier to use when you know your full query.
Query builder is smarter when you have to build your query based on some conditions, loops etc.
The main difference is the overhead of calling the methods. Your first code sample (createQuery) just for simplicity makes one method call, while the the queryBuilder makes 4. At the end of everything, they come down to a string that has to be executed, first example you are giving it the string, and the other you are building it with multiple chained method calls.
If you are looking for a reason to use one over the other, that is a question of style, and what looks more readable. For me, I like the queryBuider most of the time, it provides well defined sections for the query. Also, in the past it makes it easier to add in conditional logic when you need it.
It might be easier to unit test when using the query builder. Let's say you have a repository that queries for some data basing on the complicated list of conditions. And you want to assure that if a particular condition is passed into the repository, some other conditions are added into the query. In case of DQL you have two options:
1) To use fixtures and test the real interaction with DB. Which I find somewhat troublesome and ununitestish.
2) To check the generated DQL code. Which can make your test too fragile.
With QueryBuilder, you can substitute it with mock and verify that "andWhere" method with needed parameter is called. Of course such considerations are not applicable if your query is simple and not depended on any parameters.