custom reduce functions in crossfilter on 2 fields - mapreduce

My data looks like this:
field1,field2,value1,value2
a,b,1,1
b,a,2,2
c,a,3,5
b,c,6,7
d,a,6,7
The ultimate goal is to get, for each distinct value appearing in field1 or field2, the sum of value1 (for rows where it appears in field1) and value2 (for rows where it appears in field2): {a: 15 (= 1+2+5+7), b: 9 (= 1+2+6), c: 10 (= 3+7), d: 6 (= 6)}
I don't have a good way of rearranging the data, so let's assume it has to stay in this shape.
Based on this previous question (thanks, @Gordon), I created the dimension using:
cf.dimension(function(d) { return [d.field1,d.field2]; }, true);
But I am a bit puzzled about how to write the custom reduce functions for my use case. Main question: from within the reduceAdd and reduceRemove functions, how do I know which key is currently being "worked on"? I.e., in my case, how do I know whether to add value1 or value2 to the sum?
(I have tagged dc.js and reductio because this could be useful for users of those libraries.)

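OK, so I ended up defining the group as follows:
// `dimension` is the array dimension created above with cf.dimension(..., true)
const group = dimension.group().reduce(
  // reduceAdd: keep a running total for every key this row mentions
  (p, v) => {
    if (!p.hasOwnProperty(v.field1)) {
      p[v.field1] = 0;
    }
    if (!p.hasOwnProperty(v.field2)) {
      p[v.field2] = 0;
    }
    p[v.field1] += +v.value1;
    p[v.field2] += +v.value2;
    return p;
  },
  // reduceRemove: undo exactly what reduceAdd did
  (p, v) => {
    p[v.field1] -= +v.value1;
    p[v.field2] -= +v.value2;
    return p;
  },
  // reduceInitial: an empty object of per-key totals
  () => ({})
);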
And when you use the group in a chart, you just change the valueAccessor to be (d) => d.value[d.key] instead of the usual (d) => d.value
This is slightly inefficient, because each bin stores totals for more keys than it needs, but unless you have millions of distinct values the overhead is negligible.
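To see why reading d.value[d.key] yields the desired totals, you can simulate the bookkeeping outside crossfilter. The following C++ sketch illustrates the idea; it is not crossfilter's implementation. Like an array dimension, it feeds each row to the bin of every key the row lists, and each bin keeps one running total per key. At the end it reads bin[key][key]. The names Row, Bin, buildGroup and valueFor are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One input row, matching the CSV columns in the question.
struct Row {
    std::string field1, field2;
    double value1, value2;
};

// Reduce value of one group bin: a running total per key, like the
// object returned by reduceInitial and filled in by reduceAdd.
using Bin = std::map<std::string, double>;

void reduceAdd(Bin& p, const Row& v) {
    p[v.field1] += v.value1;  // operator[] starts a missing key at 0
    p[v.field2] += v.value2;
}

void reduceRemove(Bin& p, const Row& v) {
    // Removing a row that was never added would be a caller bug.
    assert(p.count(v.field1) == 1 && p.count(v.field2) == 1);
    p[v.field1] -= v.value1;
    p[v.field2] -= v.value2;
}

// Simulates grouping on an array-valued dimension: each row goes to the
// bin of every key it lists. Sketch assumption: a row whose two fields
// are equal is fed to that bin only once.
std::map<std::string, Bin> buildGroup(const std::vector<Row>& rows) {
    std::map<std::string, Bin> bins;
    for (const Row& r : rows) {
        reduceAdd(bins[r.field1], r);
        if (r.field2 != r.field1) {
            reduceAdd(bins[r.field2], r);
        }
    }
    return bins;
}

// Counterpart of the valueAccessor (d) => d.value[d.key].
double valueFor(const std::map<std::string, Bin>& bins, const std::string& key) {
    auto bin = bins.find(key);
    if (bin == bins.end()) return 0;
    auto total = bin->second.find(key);
    return total == bin->second.end() ? 0 : total->second;
}
```

With the five rows from the question, valueFor returns 15, 9, 10 and 6 for a, b, c and d, which matches the expected result. Each bin also carries totals for other keys (bin "a" holds totals for b, c and d too), which is the small inefficiency mentioned above.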

You always have a good way to rearrange the data: after you have fetched it and before you feed it to crossfilter ;)
In fact, it's pretty much mandatory as soon as you handle non string fields (numeric or date)
You can do a reduceSum over multiple fields (reduceSum is a method of the group, not of the dimension):
dimension.group().reduceSum(function(d) { return +d.value1 + +d.value2; });

Related

Access individual elements of ChunkedArray by its index within column

What is the best method to randomly access individual elements ("Scalars") of arrow::ChunkedArray e.g. for testing and display purposes? Is there some equivalent method to Array::GetScalar which takes into account that the ChunkedArray consists of multiple chunks?
The only way I have found so far is my own helper function, which "searches" for the appropriate chunk like this (not tested):
std::shared_ptr<arrow::Scalar>
get_scalar(const std::shared_ptr<arrow::ChunkedArray>& chunked_array, int64_t index) {
  // Without this check, an out-of-range index walks past the last chunk
  if (index < 0 || index >= chunked_array->length()) {
    return nullptr;
  }
  auto it = chunked_array->chunks().begin();
  // Skip whole chunks until index falls inside the current one
  while (index >= (*it)->length()) {
    index -= (*it)->length();
    ++it;
  }
  auto result = (*it)->GetScalar(index);
  if (!result.ok()) {
    return nullptr;
  }
  return result.MoveValueUnsafe();
}
But I'm wondering if there is something like this already provided or there is a better practice to do so?
The same applies to arrow::stl::ArrayIterator: is there an equivalent iterator for arrow::ChunkedArray?
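One way to avoid the linear scan over chunks is to precompute the starting offset of every chunk once and then binary-search those offsets for each lookup. The following is a minimal C++ sketch of that idea. ChunkLocator is a hypothetical name, and the class needs only the chunk lengths, so it does not depend on the Arrow headers. Recent Arrow C++ releases also document a ChunkedArray::GetScalar(int64_t) member that performs this lookup, so check whether your version has it before writing your own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Maps a logical index in a chunked column to (chunk number, index inside
// that chunk). It only needs the chunk lengths, e.g. chunk->length() for
// each chunk of an arrow::ChunkedArray.
class ChunkLocator {
public:
    explicit ChunkLocator(const std::vector<int64_t>& chunkLengths) {
        offsets_.reserve(chunkLengths.size() + 1);
        offsets_.push_back(0);
        for (int64_t len : chunkLengths) {
            assert(len >= 0);
            offsets_.push_back(offsets_.back() + len);
        }
    }

    int64_t length() const { return offsets_.back(); }

    // Returns {-1, -1} for an out-of-range index instead of reading past
    // the last chunk. O(log(number of chunks)) per lookup.
    std::pair<int64_t, int64_t> locate(int64_t index) const {
        if (index < 0 || index >= length()) return {-1, -1};
        // First offset strictly greater than index; the containing chunk
        // starts at the offset just before it (empty chunks are skipped).
        auto it = std::upper_bound(offsets_.begin(), offsets_.end(), index);
        int64_t chunk = (it - offsets_.begin()) - 1;
        return {chunk, index - offsets_[chunk]};
    }

private:
    std::vector<int64_t> offsets_;  // offsets_[i] = first logical index of chunk i
};
```

For example, with chunk lengths {3, 0, 2}, locate(3) returns {2, 0}, so the empty middle chunk is skipped. With Arrow, you could then pass the pair to chunked_array->chunk(chunk)->GetScalar(offset).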

Google Sheets - If statement to populate cell

It feels like this should be really easy, but I keep getting errors related to circular logic.
Column C "Total" will always be entered by the user first. If user enters number in Column B "Variable" then Column A "Fixed" will be populated with C - B. If user enters number in Column A "Fixed", then Column B "Variable" will be populated with C - A.
https://docs.google.com/spreadsheets/d/1xBbU6A_MDK6fyLjdFUD7X06b7BQ1VhQ-FWQBET4cLso/edit?usp=sharing
You are trying to add formulas that each depend on the other's output, so they will always run into a circular dependency error.
Possible solution:
Try using the "Iterative Calculation" option under File –> Spreadsheet Settings –> Calculation. You can see the description for Iterative Calculation here.
Here is one way to avoid circular references: do not hand enter any formulas, but use an onEdit() script to insert formulas programmatically only when necessary.
The following script will enter a formula in column B when column A is edited, and vice versa:
function onEdit(e) {
  if (!e) {
    throw new Error('Please do not run the script in the script editor window. It runs automatically when you hand edit the spreadsheet.');
  }
  const watchSheet = /^(Sheet1|Sheet2|Sheet3|Sheet4)$/i;
  // formula(row) builds a same-row formula, e.g. '=C5 - B5' for row 5
  const watchColumns = [
    {
      colNumber: 1,
      formula: row => `=C${row} - B${row}`,
    },
    {
      colNumber: 2,
      formula: row => `=C${row} - A${row}`,
    },
  ];
  const sheet = e.range.getSheet();
  if (!sheet.getName().match(watchSheet)) {
    return;
  }
  const editedColumn = watchColumns.find(column => column.colNumber === e.range.columnStart);
  if (!editedColumn) {
    return;
  }
  // Put the formula into the other watched column of the edited row
  const row = e.range.rowStart;
  watchColumns
    .filter(column => column.colNumber !== editedColumn.colNumber)
    .forEach(column => {
      sheet.getRange(row, column.colNumber).setFormula(column.formula(row));
    });
}

Prioritization of output by more than one factor in List

public Transform m_targetPos;
public List<Transform> l_targetList = new List<Transform>();
private void GetPriority()
{
l_targetList = l_targetList.OrderBy(x => Vector3.Distance(_path.m_start.position,
(x.transform == _PC.transform) ? x.transform.position + new Vector3(0, 10, 0) : x.transform.position))
.ToList();
m_targetPos = l_targetList[0];
}
This method picks a single Transform and stores it in m_targetPos, which then feeds other methods, such as targeting for an AoE attack. Currently the list is sorted by one factor (distance from a point, _path.m_start.position), and that is the basic behavior I intended.
However, how can I add another distinguishing factor?
Let me explain what I want:
There are two tags: tagA and tagB. If objectAwithTagA in l_targetList has a distance of 10 and objectBwithTagB has a distance of 7, GetPriority() will store objectBwithTagB because it is closer. However, I want tagA to take priority, so GetPriority() should ignore the distance (or compensate for some of it) so that the object with tagA is chosen.
I feel like I have explained this badly.
Instead of just x => Vector3.Distance(...), use something along the lines of x => Vector3.Distance(...) * CalculateTagFactor(x). For no effect from the tag, have CalculateTagFactor return 1. To favor a tag, return a value below 1: with 0.5, a tagA object at distance 10 scores 5 and beats a tagB object at distance 7. To effectively ignore objects with a given tag, return float.PositiveInfinity, which pushes them to the end of the ordering. Avoid float.NaN for this: .NET sorts NaN before every other float, so OrderBy would put those objects first.
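The following minimal C++ sketch shows the weighted ordering. It assumes each candidate has already been reduced to a tag and a precomputed distance. The Target struct, the tag names "tagA" and "excluded", and the 0.5 factor are illustrative assumptions, not Unity API. Taking the minimum of the weighted key gives the same element as ordering by it and taking element 0, without sorting the whole list.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for a Unity target: its tag and its distance to
// the reference point (_path.m_start.position in the question).
struct Target {
    std::string tag;
    float distance;
};

// Multiplier applied to the distance. The values are assumptions to tune:
// below 1 lets a tag win at larger distances, 1 is neutral, and +infinity
// sends the target to the back of the ordering.
float tagFactor(const std::string& tag) {
    if (tag == "tagA") return 0.5f;
    if (tag == "excluded") return std::numeric_limits<float>::infinity();
    return 1.0f;
}

float weightedDistance(const Target& t) {
    const float factor = tagFactor(t.tag);
    // Keep infinite factors infinite even at distance 0 (0 * inf is NaN).
    if (std::isinf(factor)) return factor;
    return t.distance * factor;
}

// Equivalent of OrderBy(weighted key) followed by taking element 0.
// min_element returns the first of equal minima, like a stable sort.
const Target& highestPriority(const std::vector<Target>& targets) {
    assert(!targets.empty());
    return *std::min_element(targets.begin(), targets.end(),
        [](const Target& a, const Target& b) {
            return weightedDistance(a) < weightedDistance(b);
        });
}
```

With tagA at distance 10 and tagB at distance 7, the tagA target wins (5 < 7). If the tagA target is at distance 20, its weighted score of 10 loses to 7, so the factor compensates for some distance instead of overriding it completely.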

How to make a readable return of data

I read this article http://www.slideshare.net/redigon/refactoring-1658371
On page 53 it states: "You have a method that returns a value but also changes the state of the object. Create two methods, one for the query and one for the modification."
But what if the query needs the values of more than one field?
For example:
QSqlQuery query(QSqlDatabase::database("MAIN"));
QString command = "SELECT FIELD1, FIELD2, FIELD3, FIELD4, FIELD5 FROM TABLE";
query.exec( command );
This is the approach I know, but I really feel it is not very readable:
QString values;
const QString columnDelimiter = "[!##]";
const QString rowDelimiter = "[$%^]";
while( query.next() )
{
values += query.value(0).toString() + columnDelimiter;
values += query.value(1).toString() + columnDelimiter;
values += query.value(2).toString() + columnDelimiter;
values += query.value(3).toString() + columnDelimiter;
values += rowDelimiter;
}
And I retrieve it like this:
QStringList rowValues, columnValues;
rowValues = values.split(rowDelimiter);
int rowCtr =0;
while( rowCtr < rowValues.count() )
{
columnValues.clear();
// Here i got the fields I need
columnValues = rowValues.at( rowCtr ).split( columnDelimiter );
// I will put the modification on variables here
rowCtr++;
}
EDIT: Is there a more readable way of doing this?
"Is there a more readable way of doing this?" is a subjective question. I'm not sure whether your question will last long on SO, as SO prefers factual problems and solutions.
What I personally think will make your code more readable, would be:
Use a custom made data structure for your data set. Strings are not the right data structures for tabulated data. Lists of custom made structs are better.
Example:
// data structure for a single row
struct MyRow {
    QString a, b, c;
};
// ...
QList<MyRow> myDataSet;
while( query.next() )
{
    MyRow currentRow;
    // fill with data
    currentRow.a = query.value(0).toString();
    currentRow.b = query.value(1).toString();
    // ...
    myDataSet.append(currentRow);
}
I doubt all your data is text. Some is probably numbers. Never store numbers as strings. That's inefficient.
You first read all data into a data structure, and then read the data structure to process it. Why don't you combine the two? I.e. process while reading the data, in the same while(...)
In your comment, you seem confused by the difference between an enum and a struct. I suggest you stop doing complex database and Qt work for now, grab a basic C++ book and try to understand C++ first.
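The following minimal sketch shows the struct-per-row approach in plain C++, so it compiles without Qt. FakeQuery is a hypothetical stand-in that mimics QSqlQuery's next()/value(i) loop but returns std::string. With the real class, you would convert each QVariant with toInt(), toDouble() or toString() instead of std::stoi/std::stod. The ProductRow columns are made up for the example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Typed row: numbers are stored as numbers, not as text.
struct ProductRow {
    int id;
    std::string name;
    double price;
};

// Hypothetical stand-in for a query cursor with the same next()/value(i)
// shape as QSqlQuery. Here every column arrives as text.
class FakeQuery {
public:
    explicit FakeQuery(std::vector<std::vector<std::string>> rows)
        : rows_(std::move(rows)) {}

    // Moves to the next record; returns false once the records are exhausted.
    bool next() {
        if (current_ < static_cast<long>(rows_.size())) ++current_;
        return current_ < static_cast<long>(rows_.size());
    }

    const std::string& value(int column) const {
        assert(current_ >= 0 && current_ < static_cast<long>(rows_.size()));
        return rows_[current_].at(column);
    }

private:
    std::vector<std::vector<std::string>> rows_;
    long current_ = -1;  // before the first record, like a fresh query
};

// Converts each record once, while reading, instead of joining everything
// into one delimited string and splitting it again later.
std::vector<ProductRow> readRows(FakeQuery& query) {
    std::vector<ProductRow> rows;
    while (query.next()) {
        ProductRow row;
        row.id = std::stoi(query.value(0));
        row.name = query.value(1);
        row.price = std::stod(query.value(2));
        rows.push_back(row);
    }
    return rows;
}
```

If the rows do not need to be kept, you can do the processing inside the while loop instead of calling push_back, which combines reading and processing as suggested above.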

Better, or advantages in different ways of coding similar functions

I'm writing the code for a GUI (in C++), and right now I'm concerned with the organisation of text in lines. One of my problems is that the code is getting very long and confusing, and I'm heading into an n^2 scenario: for every option I add for the text's presentation, the number of functions I have to write grows with the square. While trying to deal with this, a particular design choice has come up, and I don't know which option is better or how large the advantages and disadvantages of each are:
I have two methods that are very similar in flow, i.e. they iterate through the same objects and take the same constraints into account, but they perform different operations within that flow. For anyone interested, one method renders the text, and the other determines whether any text overflows the line, either because it wraps around other objects or simply because it reaches the end of the line.
These functions need to be copied and rewritten for left, right and centred text, which have different flows, so whatever design choice I make would be repeated three times.
Basically, I could continue what I have now, which is two separate methods to handle these different actions, or I could merge them into one function, which has if statements within it to determine whether or not to render the text or figure out if any text overflows.
Is there a generally accepted right way of going about this? Otherwise, what are the tradeoffs, and what signs might indicate that one way should be used over the other? Is there some other way of doing this that I've missed?
I've edited through this a few times to try and make it more understandable, but if it isn't please ask me some questions so I can edit and explain. I can also post the source code of the two different methods, but they use a lot of functions and objects that would take too long to explain.
// EDIT: Source Code //
Function 1:
void GUITextLine::renderLeftShifted(const GUIRenderInfo& renderInfo) {
if(m_renderLines.empty())
return;
Uint iL = 0;
Array2t<float> renderCoords;
renderCoords.s_x = renderInfo.s_offset.s_x + m_renderLines[0].s_x;
renderCoords.s_y = renderInfo.s_offset.s_y + m_y;
float remainingPixelsInLine = m_renderLines[0].s_y;
for (Uint iTO= 0;iTO != m_text.size();++iTO)
{
if(m_text[iTO].s_pixelWidth <= remainingPixelsInLine)
{
string preview = m_text[iTO].s_string;
m_text[iTO].render(&renderCoords);
remainingPixelsInLine -= m_text[iTO].s_pixelWidth;
}
else
{
FSInternalGlyphData intData = m_text[iTO].stealFSFastFontInternalData();
float characterWidth = 0;
Uint iFirstCharacterOfRenderLine = 0;
for(Uint iC = 0;;++iC)
{
if(iC == m_text[iTO].s_string.size())
{
// wrap up
string renderPart = m_text[iTO].s_string;
renderPart.erase(iC, renderPart.size());
renderPart.erase(0, iFirstCharacterOfRenderLine);
m_text[iTO].s_font->renderString(renderPart.c_str(), intData,
&renderCoords);
break;
}
characterWidth += m_text[iTO].s_font->getWidthOfGlyph(intData,
m_text[iTO].s_string[iC]);
if(characterWidth > remainingPixelsInLine)
{
// Can't push in the last character
// No more space in this line
// First though, render what we already have:
string renderPart = m_text[iTO].s_string;
renderPart.erase(iC, renderPart.size());
renderPart.erase(0, iFirstCharacterOfRenderLine);
m_text[iTO].s_font->renderString(renderPart.c_str(), intData,
&renderCoords);
if(++iL != m_renderLines.size())
{
remainingPixelsInLine = m_renderLines[iL].s_y;
renderCoords.s_x = renderInfo.s_offset.s_x + m_renderLines[iL].s_x;
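// Cool, so now try rendering this character again.
// The new render line starts at this character (set before --iC,
// otherwise the previous character would be rendered twice)
iFirstCharacterOfRenderLine = iC;
--iC;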
characterWidth = 0;
}
else
{
// Quit
break;
}
}
}
}
}
// Done!
}
Function 2:
vector<GUIText> GUITextLine::recalculateWrappingContraints_LeftShift()
{
m_pixelsOfCharacters = 0;
float pixelsRemaining = m_renderLines[0].s_y;
Uint iRL = 0;
// Go through every text object, fiting them into render lines
for(Uint iTO = 0;iTO != m_text.size();++iTO)
{
// If an entire text object fits in a single line
if(pixelsRemaining >= m_text[iTO].s_pixelWidth)
{
pixelsRemaining -= m_text[iTO].s_pixelWidth;
m_pixelsOfCharacters += m_text[iTO].s_pixelWidth;
}
// Otherwise, character by character
else
{
// Get some data now we don't get it every function call
FSInternalGlyphData intData = m_text[iTO].stealFSFastFontInternalData();
for(Uint iC = 0; iC != m_text[iTO].s_string.size();++iC)
{
float characterWidth = m_text[iTO].s_font->getWidthOfGlyph(intData, '-');
if(characterWidth < pixelsRemaining)
{
pixelsRemaining -= characterWidth;
m_pixelsOfCharacters += characterWidth;
}
else // End of render line!
{
m_pixelsOfWrapperCharacters += pixelsRemaining; // we might track how much wrapping px we use
// If this is true, then we ran out of render lines before we ran out of text. Means we have some overflow to return
if(++iRL == m_renderLines.size())
{
return harvestOverflowFrom(iTO, iC);
}
else
{
pixelsRemaining = m_renderLines[iRL].s_y;
}
}
}
}
}
vector<GUIText> emptyOverflow;
return emptyOverflow; }
So basically, render() takes renderCoordinates as a parameter and gets from it the global position it needs to render from. recalculateWrappingContraints figures out how much of the text in the object goes beyond the allocated space and returns that text as a vector.
m_renderLines is a std::vector of a two-float structure, where .s_x is where rendering can start and .s_y is how much space is available for rendering. Note that .s_y is essentially the width of the 'renderLine', not where it ends.
m_text is a std::vector of GUIText objects, each containing a string of text and some data such as style, colour and size. Under s_font it also holds a reference to a font object, which performs rendering, calculates the width of a glyph, etc.
Hopefully this clears things up.
There is no generally accepted way in this case.
However, common practice in any programming scenario is to remove duplicated code.
I think you're getting stuck on how to divide code by direction, when direction changes the outcome too much to make this division. In these cases, focus on the common portions of the three algorithms and divide them into tasks.
I did something similar when I duplicated WinForms flow layout control for MFC. I dealt with two types of objects: fixed positional (your pictures etc.) and auto positional (your words).
In the example you provided, I can list the following common portions:
Write Line (direction)
bool TestPlaceWord (direction) // returns false if it cannot place word next to previous word
bool WrapPastObject (direction) // returns false if it runs out of line
bool WrapLine (direction) // returns false if it runs out of space for new line.
Each of these would be performed no matter what direction you are faced with.
Ultimately, the algorithm for each direction is just too different to simplify any further than that.
How about an implementation of the Visitor Pattern? It sounds like it might be the kind of thing you are after.
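The following minimal C++ sketch shows that idea under simplifying assumptions. Words are never split across segments, and Word, Segment, LayoutVisitor and the two visitors are hypothetical types, not the GUITextLine code above. The left-shifted traversal is written once, and the operation that differs (rendering vs. finding overflow) is passed in through an interface. Strictly speaking, this is a single-dispatch, visitor-style callback (closer to the Strategy pattern) rather than the full double-dispatch Visitor. It still means one traversal per alignment instead of two near-duplicate functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model (assumption): a word has a fixed pixel width and is never
// split; a segment is a free horizontal run, like m_renderLines entries.
struct Word { std::string text; float width; };
struct Segment { float startX; float width; };

// The operations that differ between "render" and "measure overflow".
struct LayoutVisitor {
    virtual ~LayoutVisitor() = default;
    virtual void place(const Word& word, float x) = 0;
    virtual void overflow(std::size_t firstUnplacedWord) = 0;
};

// The shared flow, written once: fill segments left to right with words.
void layoutLeft(const std::vector<Word>& words,
                const std::vector<Segment>& segments, LayoutVisitor& visitor) {
    for (const Segment& s : segments) assert(s.width >= 0);
    std::size_t seg = 0;
    float x = segments.empty() ? 0.0f : segments[0].startX;
    float remaining = segments.empty() ? 0.0f : segments[0].width;
    for (std::size_t i = 0; i < words.size(); ++i) {
        // Move to later segments until the word fits or none are left.
        while (seg < segments.size() && words[i].width > remaining) {
            if (++seg < segments.size()) {
                x = segments[seg].startX;
                remaining = segments[seg].width;
            }
        }
        if (seg >= segments.size()) {
            visitor.overflow(i);  // ran out of space before running out of words
            return;
        }
        visitor.place(words[i], x);
        x += words[i].width;
        remaining -= words[i].width;
    }
}

// Records where each word would be drawn (a real one would call the font).
struct RecordingRenderer : LayoutVisitor {
    std::vector<float> positions;
    void place(const Word&, float x) override { positions.push_back(x); }
    void overflow(std::size_t) override {}
};

// Finds the first word that does not fit, like the wrapping-constraints method.
struct OverflowFinder : LayoutVisitor {
    bool overflowed = false;
    std::size_t firstOverflow = 0;
    void place(const Word&, float) override {}
    void overflow(std::size_t i) override { overflowed = true; firstOverflow = i; }
};
```

With segments {0, 60} and {100, 60} and words of width 30, 50 and 40, the renderer places the first two words at x = 0 and x = 100, and OverflowFinder reports word 2 as the first one that does not fit. Right-aligned and centred text would each get their own traversal, but the visitors would be shared.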