There are not enough rules to produce a node with desired properties - apache-calcite

I would like to use the Calcite Volcano planner to optimise a query. It doesn't work and throws this exception:
There are not enough rules to produce a node with desired properties: convention=NONE, sort=[]. All the inputs have relevant nodes, however the cost is still infinite.
Root: rel#18:RelSubset#4.NONE.[]
Original rel:
LogicalJoin(condition=[=($0, $3)], joinType=[inner]): rowcount = 81495.22499999999, cumulative cost = {686245.725 rows, 61452.0 cpu, 0.0 io}, id = 4
LogicalJoin(condition=[=($0, $1)], joinType=[inner]): rowcount = 543301.5, cumulative cost = {604750.5 rows, 61451.0 cpu, 0.0 io}, id = 2
LogicalTableScan(table=[[CALCITE_TEST, TTLA_ONE]]): rowcount = 59.0, cumulative cost = {59.0 rows, 60.0 cpu, 0.0 io}, id = 0
LogicalTableScan(table=[[CALCITE_TEST, TTLR_ONE]]): rowcount = 61390.0, cumulative cost = {61390.0 rows, 61391.0 cpu, 0.0 io}, id = 1
LogicalTableScan(table=[[CALCITE_TEST, EMPTY_T]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 io}, id = 3
This is the code causing the issue:
val rootSchema = CalciteSchema.createRootSchema(true).plus
val schema = rootSchema.add("CALCITE_TEST", new AbstractSchema())
schema.add("TTLA_ONE", TableA())
schema.add("EMPTY_T", TableS())
schema.add("TTLR_ONE", TableR())
val config = Frameworks.newConfigBuilder.defaultSchema(schema).build
val builder = RelBuilder.create(config)
val opTree: RelNode = builder
.scan("TTLA_ONE")
.scan("TTLR_ONE")
.join(JoinRelType.INNER, "X")
.scan("EMPTY_T")
.join(JoinRelType.INNER, "X")
.build()
val rw = new RelWriterImpl(new PrintWriter(System.out, true))
opTree.explain(rw)
println()
val program = HepProgram.builder
.addRuleInstance(FilterJoinRule.FILTER_ON_JOIN).build
val hepPlanner = new HepPlanner(program)
hepPlanner.setRoot(opTree)
hepPlanner.findBestExp.explain(rw)
println()
val cluster = opTree.getCluster
val planner = cluster.getPlanner().asInstanceOf[VolcanoPlanner]
planner.setRoot(opTree)
// add rules
planner.addRule(PruneEmptyRules.PROJECT_INSTANCE)
// add ConverterRule
planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE)
planner.addRule(EnumerableRules.ENUMERABLE_SORT_RULE)
planner.addRule(EnumerableRules.ENUMERABLE_VALUES_RULE)
planner.addRule(EnumerableRules.ENUMERABLE_PROJECT_RULE)
planner.addRule(EnumerableRules.ENUMERABLE_FILTER_RULE)
planner.addRule(Bindables.BINDABLE_TABLE_SCAN_RULE)
val optimized = planner.findBestExp
optimized.explain(rw)
It produces the output below:
4:LogicalJoin(condition=[=($0, $3)], joinType=[inner])
2:LogicalJoin(condition=[=($0, $1)], joinType=[inner])
0:LogicalTableScan(table=[[CALCITE_TEST, TTLA_ONE]])
1:LogicalTableScan(table=[[CALCITE_TEST, TTLR_ONE]])
3:LogicalTableScan(table=[[CALCITE_TEST, EMPTY_T]])
10:LogicalJoin(condition=[=($0, $3)], joinType=[inner])
7:LogicalJoin(condition=[=($0, $1)], joinType=[inner])
0:LogicalTableScan(table=[[CALCITE_TEST, TTLA_ONE]])
1:LogicalTableScan(table=[[CALCITE_TEST, TTLR_ONE]])
3:LogicalTableScan(table=[[CALCITE_TEST, EMPTY_T]])
There are not enough rules to produce a node with desired properties: convention=NONE, sort=[]. All the inputs have relevant nodes, however the cost is still infinite.
Root: rel#18:RelSubset#4.NONE.[]
I added some rules to the VolcanoPlanner. What can be the issue?

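The error message shows the root being requested with convention=NONE. Nodes in that convention are purely logical and always have infinite cost, so no plan could satisfy it. I changed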
val cluster = opTree.getCluster
val planner = cluster.getPlanner().asInstanceOf[VolcanoPlanner]
planner.setRoot(opTree)
to
val cluster = opTree.getCluster
val desiredTraits = cluster.traitSet.replace(EnumerableConvention.INSTANCE)
val planner = cluster.getPlanner.asInstanceOf[VolcanoPlanner]
val newRoot = planner.changeTraits(opTree, desiredTraits)
planner.setRoot(newRoot)
by introducing
val desiredTraits = cluster.traitSet.replace(EnumerableConvention.INSTANCE)
and creating a new root from the desiredTraits
val newRoot = planner.changeTraits(opTree, desiredTraits)
I also added some projections to the query, but they are not needed for the Volcano planner to work.
This is now the output:
6:LogicalProject(X=[$0], X0=[$2])
5:LogicalJoin(condition=[=($0, $2)], joinType=[inner])
3:LogicalProject(X=[$0], X0=[$1])
2:LogicalJoin(condition=[=($0, $1)], joinType=[inner])
0:LogicalTableScan(table=[[CALCITE_TEST, TTLA_ONE]])
1:LogicalTableScan(table=[[CALCITE_TEST, TTLR_ONE]])
4:LogicalTableScan(table=[[CALCITE_TEST, EMPTY_T]])
16:LogicalProject(X=[$0], X0=[$2])
14:LogicalJoin(condition=[=($0, $2)], joinType=[inner])
11:LogicalProject(X=[$0], X0=[$1])
9:LogicalJoin(condition=[=($0, $1)], joinType=[inner])
0:LogicalTableScan(table=[[CALCITE_TEST, TTLA_ONE]])
1:LogicalTableScan(table=[[CALCITE_TEST, TTLR_ONE]])
4:LogicalTableScan(table=[[CALCITE_TEST, EMPTY_T]])
103:EnumerableProject(X=[$2], X0=[$0])
102:EnumerableHashJoin(condition=[=($0, $2)], joinType=[inner])
52:EnumerableTableScan(table=[[CALCITE_TEST, EMPTY_T]])
101:EnumerableProject(X=[$0], X0=[$1])
100:EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
33:EnumerableTableScan(table=[[CALCITE_TEST, TTLA_ONE]])
37:EnumerableTableScan(table=[[CALCITE_TEST, TTLR_ONE]])

Related

How to set proper back test range

I do not know how to code and I am trying to learn Pine Script, but it really makes no sense to me. I googled how to set a backtest range and used some code someone else wrote, but it doesn't seem to test the range I would like; it tests the entire chart. I'd like to test from 1/1/2018 to the present. I'm trying to do this for multiple strategies so I can better tailor them to the current market. Here is what I have for one of them. If you are willing to help with the others I would very much appreciate it! Feel free to DM me.
//@version=5
strategy("Bollinger Bands BACKTEST", overlay=true)
source = close
length = input.int(20, minval=1)
mult = input.float(2.0, minval=0.001, maxval=50)
basis = ta.sma(source, length)
dev = mult * ta.stdev(source, length)
upper = basis + dev
lower = basis - dev
buyEntry = ta.crossover(source, lower)
sellEntry = ta.crossunder(source, upper)
if (ta.crossover(source, lower))
    strategy.entry("BBandLE", strategy.long, stop=lower, oca_name="BollingerBands", oca_type=strategy.oca.cancel, comment="BBandLE")
else
    strategy.cancel(id="BBandLE")
if (ta.crossunder(source, upper))
    strategy.entry("BBandSE", strategy.short, stop=upper, oca_name="BollingerBands", oca_type=strategy.oca.cancel, comment="BBandSE")
else
    strategy.cancel(id="BBandSE")
//plot(strategy.equity, title="equity", color=color.red, linewidth=2, style=plot.style_areabr)
// === INPUT BACKTEST RANGE ===
fromMonth = input.int(defval = 1, title = "From Month", minval = 1, maxval = 12)
fromDay = input.int(defval = 1, title = "From Day", minval = 1, maxval = 31)
fromYear = input.int(defval = 2018, title = "From Year", minval = 1970)

Calculating results pro rata over several months with PowerQuery

I am currently stuck on the issue below:
I have two tables that I have to work with. One contains financial information for vessels and the other contains arrival and departure times for vessels. I get my data by combining multiple Excel sheets from different folders:
financialTable
voyageTimeTable
I have to calculate the result for above voyage, and apportion the result over June, July and August for both estimated and updated.
Time in June : 4 hours (20/06/2020 20:00 - 23:59) + 10 days (21/06/2020 00:00 - 30/06/2020 23:59) = 10.1666
Time in July : 31 full days
Time in August: 1 day + 14 hours (02/08/2020 00:00 - 14:00) = 1.5833
Total voyage duration = 10.1666 + 31 + 1.5833 = 42.7499
The result for the "updated" financialItem would be the following:
Result June : 100*(10.1666/42.7499) = 23.7816
Result July : 100*(31/42.7499) = 72.5148
Result August : 100*(1.5833/42.7499) = 3.7036
sum = 100
and then for "estimated" it would be twice of everything above.
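The arithmetic above can be checked with a short Python sketch. The month durations are copied from the worked example; the totals of 100 ("updated") and 200 ("estimated") and the helper name `prorate` are assumptions made for illustration. Because it uses exact fractions of a day instead of the truncated 10.1666 and 1.5833, the results agree with the hand-calculated figures only to about three decimal places.

```python
# Durations in days per month, taken from the worked example above.
days_per_month = {
    "June": 10 + 4 / 24,    # 20/06 20:00 up to the end of June
    "July": 31.0,           # the full month
    "August": 1 + 14 / 24,  # 01/08 00:00 up to 02/08 14:00
}


def prorate(total, durations):
    """Split `total` across periods in proportion to their durations."""
    whole = sum(durations.values())
    return {period: total * days / whole for period, days in durations.items()}


updated = prorate(100, days_per_month)    # assumed "updated" total
estimated = prorate(200, days_per_month)  # assumed "estimated" total (twice as much)

for month in days_per_month:
    print(f"{month}: updated={updated[month]:.4f}, estimated={estimated[month]:.4f}")
```

The per-month shares always add back up to the original total, which is a quick sanity check for any other voyage.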
This is the format I ideally would like to get:
prorataResultTable
I have to do this for multiple vessels, with multiple timespans and several voyage numbers.
Eagerly awaiting responses, if any. Many thanks in advance.
Brds,
Not sure if you're still looking for an answer, but the code below gives me your expected output:
let
financialTable = Table.FromRows({{"A", 1, "profit/loss", 200, 100}}, type table [vesselName = text, vesselNumber = Int64.Type, financialItem = text, estimated = number, updated = number]),
voyageTimeTable = Table.FromRows({{"A", 1, #datetime(2020, 6, 20, 20, 0, 0), #datetime(2020, 8, 2, 14, 0, 0)}}, type table [vesselName = text, vesselNumber = Int64.Type, voyageStartDatetime = datetime, voyageEndDatetime = datetime]),
joined =
let
joined = Table.NestedJoin(financialTable, {"vesselName", "vesselNumber"}, voyageTimeTable, {"vesselName", "vesselNumber"}, "$toExpand", JoinKind.LeftOuter),
expanded = Table.ExpandTableColumn(joined, "$toExpand", {"voyageStartDatetime", "voyageEndDatetime"})
in expanded,
toExpand = Table.AddColumn(joined, "$toExpand", (currentRow as record) =>
let
voyageInclusiveStart = DateTime.From(currentRow[voyageStartDatetime]),
voyageExclusiveEnd = DateTime.From(currentRow[voyageEndDatetime]),
voyageDurationInDays = Duration.TotalDays(voyageExclusiveEnd - voyageInclusiveStart),
createRecordForPeriod = (someInclusiveStart as datetime) => [
inclusiveStart = someInclusiveStart,
exclusiveEnd = List.Min({
DateTime.From(Date.EndOfMonth(DateTime.Date(someInclusiveStart)) + #duration(1, 0, 0, 0)),
voyageExclusiveEnd
}),
durationInDays = Duration.TotalDays(exclusiveEnd - inclusiveStart),
prorataDuration = durationInDays / voyageDurationInDays,
estimated = prorataDuration * currentRow[estimated],
updated = prorataDuration * currentRow[updated],
month = Date.MonthName(DateTime.Date(inclusiveStart)),
year = Date.Year(inclusiveStart)
],
monthlyRecords = List.Generate(
() => createRecordForPeriod(voyageInclusiveStart),
each [inclusiveStart] < voyageExclusiveEnd,
each createRecordForPeriod([exclusiveEnd])
),
toTable = Table.FromRecords(monthlyRecords)
in toTable
),
expanded =
let
dropped = Table.RemoveColumns(toExpand, {"estimated", "updated", "voyageStartDatetime", "voyageEndDatetime"}),
expanded = Table.ExpandTableColumn(dropped, "$toExpand", {"month", "year", "estimated", "updated"})
in expanded
in
expanded
The code tries to:
join financialTable and voyageTimeTable, so that for each vesselName and vesselNumber combination, we know: estimated, updated, voyageStartDatetime and voyageEndDatetime.
generate a list of months for the period between voyageStartDatetime and voyageEndDatetime (which get expanded into new table rows)
for each month (in the list), do all the arithmetic you mention in your question
get rid of some columns (like the old estimated and updated columns)
I recommend testing it with different vesselNames and vesselNumbers from your dataset, just to see if the output is always correct (I think it should be).
You should be able to manually inspect the cells in the $toExpand column (of the toExpand step/expression) to see the nested rows before they get expanded.
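For readers who find the M code hard to follow, the month-splitting step can be sketched in Python. This is not a translation of the query above, only a minimal illustration of the same idea under the same assumption (start inclusive, end exclusive); the function names `next_month_start` and `split_by_month` are made up for this example.

```python
from datetime import datetime


def next_month_start(moment):
    """Return midnight on the first day of the month after `moment`."""
    if moment.month == 12:
        return datetime(moment.year + 1, 1, 1)
    return datetime(moment.year, moment.month + 1, 1)


def split_by_month(start, end, amount):
    """Apportion `amount` over the calendar months spanned by [start, end)."""
    total_seconds = (end - start).total_seconds()
    rows = []
    period_start = start
    while period_start < end:
        # Each period stops at the next month boundary or at the voyage end.
        period_end = min(next_month_start(period_start), end)
        share = (period_end - period_start).total_seconds() / total_seconds
        rows.append((period_start.year, period_start.month, amount * share))
        period_start = period_end
    return rows


rows = split_by_month(datetime(2020, 6, 20, 20, 0), datetime(2020, 8, 2, 14, 0), 100)
for year, month, value in rows:
    print(year, month, round(value, 4))
```

Each loop iteration plays the role of one record produced by List.Generate: it starts where the previous period ended and stops at whichever comes first, the next month boundary or the end of the voyage.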

Google Sheets Search and Sum in two lists

I have a Google Sheets question I was hoping someone could help with.
I have a list of about 200 keywords which looks like the ones below:
**List 1**
Italy City trip
Italy Roundtrip
Italy Holiday
Hungary City trip
Czechia City trip
Croatia Montenegro Roundtrip
....
....
And I then have another list with jumbled keywords with around 1 million rows. The keywords in this list don't exactly match with the first list. What I need to do is search for the keywords in list 1 (above) in list 2 (below) and sum all corresponding cost values. As you can see in the list below the keywords from list 1 are in the second list but with other keywords around them. For example, I need a formula that will search for "Italy City trip" from list 1, in list 2 and sum the cost when that keyword occurs. In this case, it would be 6 total. Adding the cost of "Italy City trip April" and "Italy City trip June" together.
**List 2** Cost
Italy City trip April 1
Italy City trip June 5
Next week Italy Roundtrip 4
Italy Holiday next week 1
Hungary City holiday trip 9
....
....
I hope that makes sense.
Any help would be greatly appreciated
try:
=ARRAYFORMULA(QUERY({IFNA(REGEXEXTRACT(PROPER(C1:C),
TEXTJOIN("|", 1, SORT(PROPER(A1:A), 1, 0)))), D1:D},
"select Col1,sum(Col2)
where Col1 is not null
group by Col1
label sum(Col2)''", 0))
You want to establish whether keywords in one list (List#1) can be found in another list (List#2).
List#2 is 1,000,000 rows long, so I would recommend segmenting the list so that execution times are not exceeded. That's something you will be able to establish by trial and error.
The solution is to use the JavaScript method indexOf.
Paraphrasing from w3schools: indexOf() returns the position of the first occurrence of a specified value in a string. If the value is not found, it returns -1. So testing if (idx !=-1){ will only return List#1 values that were found in List#2. Note: The indexOf() method is case sensitive.
function so5864274503() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var srcname = "source";
var tgtname = "target";
var sourceSheet = ss.getSheetByName(srcname);
var targetSheet = ss.getSheetByName(tgtname);
// get the source list
var sourceLR = sourceSheet.getLastRow();
var srcData = sourceSheet.getRange(1,1,sourceLR).getValues();
//get the target list
var targetLR = targetSheet.getLastRow();
var tgtlist = targetSheet.getRange(1,1,targetLR,2).getValues();
var totalcostvalues = [];
// start looping through the keywords (list 1)
for (var s = 0;s<srcData.length;s++){
var totalcost = 0;
var value = srcData[s][0]
// start looping through the strings (List 2)
for (var i=0;i<tgtlist.length;i++){
// set cost to zero
var cumcost = 0;
// use indexOf to test if keyword is in the string
var idx = tgtlist[i][0].indexOf(value);
// value of -1 = no match, value >-1 indicates position in the string where the keyword was found
if (idx !=-1){
var cost = tgtlist[i][1]
cumcost = cumcost + cost;
totalcost = totalcost+cost
}
}//end of loop - list2
//Logger.log("DEBUG: Summary: "+value+", totalcost = "+totalcost)
totalcostvalues.push([totalcost])
}// end of loop - list1
//Logger.log(totalcostvalues); //DEBUG
sourceSheet.getRange(1,2,sourceLR).setValues(totalcostvalues);
}
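The same substring-and-sum logic, stripped of the spreadsheet calls, can be sketched in Python. The sample data is taken from the question; the function name `sum_costs_by_keyword` is made up for this illustration. Python's `str.find` behaves like `indexOf` here: it returns -1 when the keyword is absent and is case sensitive.

```python
def sum_costs_by_keyword(keywords, rows):
    """For each keyword, sum the cost of every row whose text contains it."""
    totals = {}
    for keyword in keywords:
        total = 0
        for text, cost in rows:
            # str.find returns -1 when the keyword is not in the text.
            if text.find(keyword) != -1:
                total += cost
        totals[keyword] = total
    return totals


keywords = ["Italy City trip", "Italy Roundtrip", "Italy Holiday"]
rows = [
    ("Italy City trip April", 1),
    ("Italy City trip June", 5),
    ("Next week Italy Roundtrip", 4),
    ("Italy Holiday next week", 1),
    ("Hungary City holiday trip", 9),
]

print(sum_costs_by_keyword(keywords, rows))
# {'Italy City trip': 6, 'Italy Roundtrip': 4, 'Italy Holiday': 1}
```

Like the script above, this compares every keyword against every row, so the work grows with the product of the two list lengths.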
I also came up with this one. It lowercases both strings before comparing, so the match is case-insensitive:
function myFunction() {
var ss = SpreadsheetApp.getActive();
var sheet1 = ss.getSheets()[0];
var sheet2 = ss.getSheets()[1];
var valuesSheet1 = sheet1.getRange(2,1, (sheet1.getLastRow()-1), sheet1.getLastColumn()).getValues();
var valuesCol1Sheet1 = valuesSheet1.map(function(r){return r[0]});
var valuesCol2Sheet1 = valuesSheet1.map(function(r){return r[1]});
Logger.log(valuesCol2Sheet1);
var valuesSheet2 = sheet2.getRange(2,1, (sheet2.getLastRow()-1)).getValues();
var valuesCol1Sheet2 = valuesSheet2.map(function(r){return r[0]});
for (var i = 0; i<= valuesCol1Sheet2.length-1; i++){
var price = 0;
valuesCol1Sheet1.forEach(function(elt,index){
var position = elt.toLowerCase().indexOf(valuesCol1Sheet2[i].toLowerCase());
if(position >-1){
price = price + valuesCol2Sheet1[index];
}
});
sheet2.getRange((i+2),2).setValue(price);
};
}

How to randomly generate an Oct-Tuple with SML

Edit: Here is the code I have so far for generating the Patient Oct-Tuples.
(thanks Anon for giving me the boost on how to calculate weighted probability/setting the seed)
fun genPatients(x:int) =
let
val seed=let
val m=Date.minute(Date.fromTimeLocal(Time.now()))
val s=Date.second(Date.fromTimeLocal(Time.now()))
in Random.rand(m,s)
end;
val survivalrate = ref(1)
val numl = ref(1)
val td = ref(1)
val xray = ref(false)
val count= ref(0)
val emnum= ref(1000)
val ageList = [1, 2, 3, 3];
val xrayList=[false,true];
val age = Random.randRange (0, 3) seed;(* random age*)
val nextInt1 = Random.randRange(0, 1)(* random xray*)
val r1 = Random.rand(1,1)
val nextInt2 = Random.randRange(1, 10000000)(* random td*)
val r2 = Random.rand(1,1)
val r1hold= ref(1);
in
while !count < x do
(
count:= !count + 1;
List.nth(ageList, age);
r1hold:= nextInt1 r1;
td:= nextInt2 r2;
(!emnum,age,survivalrate,numl,[],[],xray,td);
emnum:= !emnum + 1
)
end;
My question now is how to go about indexing a boolean list?
So I was looking for some help defining my Oct-tuple to finish up my project and lo and behold I find someone posting the entirety of my project hoping for a handout answer. Not only that, but I'm almost certain we're in the same class, and you think posting this the night before the morning the project is due is what a responsible student does? Pretty sure nobody on SO is gonna do your homework for you anyway, in fact I'm not even sure it's allowed.
Maybe do some work and then ask for help when you've actually done anything. Or maybe in the next phase try a little harder.
EDIT: I'll give you something to get you started.
To calculate weighted probability you need a seed.
val seed=let
val m=Date.minute(Date.fromTimeLocal(Time.now()))
val s=Date.second(Date.fromTimeLocal(Time.now()))
in Random.rand(m,s)
end;
Here's one. Then you can calculate probability, at least for the age, like this:
val ageList = [1, 2, 3, 3];
val ageInt = Random.randRange (0, 3) seed;
List.nth(ageList, ageInt)
This was how I decided to do the weighted probability portion, you can equate this to the other weighted sections if you're creative. Good luck.
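The trick of repeating a value in the list to give it more weight can be illustrated in Python (used here only for illustration; this is not SML). The helper name `pick_age` and the fixed seed are assumptions for this sketch.

```python
import random


def pick_age(rng):
    """Pick from a list in which 3 appears twice, so it is twice as likely."""
    age_list = [1, 2, 3, 3]
    index = rng.randint(0, 3)  # randint includes both end points
    return age_list[index]


rng = random.Random(42)  # fixed seed only so that the run is repeatable
counts = {1: 0, 2: 0, 3: 0}
for _ in range(10000):
    counts[pick_age(rng)] += 1

print(counts)
```

Over many draws, 3 should come up about half the time and 1 and 2 about a quarter of the time each. The same pattern applies to any other weighted field: repeat each value in the list in proportion to its weight.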

Pseudocode to Calculate average using MapReduce

Hi, I want to write a MapReduce algorithm in pseudocode to solve the following problem:
Given input records in the following format:
address, zip, city, house_value,
please calculate the average house value for each zip code.
I would really appreciate it if you could help me with this.
The easiest way would be to use Apache Pig. Here is an example of finding an average:
inpt = load 'data.txt' as (address:chararray, zip:chararray, city:chararray, house_value:long);
grp = group inpt by zip;
average = foreach grp generate FLATTEN(group) as (zip), AVG(inpt.house_value) as average_price;
dump average;
For pseudo MapReduce code you would need a MAPPER, a COMBINER and a REDUCER:
MAPPER(record):
zip_code_key = record['zip'];
value = {1, record['house_value']};
emit(zip_code_key, value);
COMBINER(zip_code_key, value_list):
record_num = 0;
value_sum = 0;
foreach (value : value_list) {
record_num += value[0];
value_sum += value[1];
}
value_out = {record_num, value_sum};
emit(zip_code_key, value_out);
REDUCER(zip_code_key, value_list):
record_num = 0;
value_sum = 0;
foreach (value : value_list) {
record_num += value[0];
value_sum += value[1];
}
avg = value_sum / record_num;
emit(zip_code_key, avg);
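The pseudocode above can be simulated in plain Python to see why the intermediate values are (count, sum) pairs and not averages: pairs can be merged in any order, whereas averaging partial averages would weight small groups incorrectly. This is a single-process sketch, not Hadoop code; the `run` driver and the sample records are made up for illustration.

```python
from collections import defaultdict


def mapper(record):
    """Emit (zip, (1, house_value)) for one input record."""
    yield record["zip"], (1, record["house_value"])


def combine(values):
    """Merge (count, sum) pairs; the combiner and reducer share this step."""
    count = 0
    total = 0
    for value in values:
        count += value[0]
        total += value[1]
    return count, total


def reducer(zip_code, values):
    """Turn the merged (count, sum) pair into an average."""
    count, total = combine(values)
    return zip_code, total / count


def run(records):
    """Hypothetical driver: group mapper output by key, combine, then reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return dict(reducer(key, [combine(values)]) for key, values in grouped.items())


records = [
    {"address": "1 A St", "zip": "10001", "city": "X", "house_value": 100},
    {"address": "2 B St", "zip": "10001", "city": "X", "house_value": 300},
    {"address": "3 C St", "zip": "94105", "city": "Y", "house_value": 500},
]
print(run(records))
# {'10001': 200.0, '94105': 500.0}
```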