Cursor-based pagination spec is inefficient - facebook-graph-api

I am looking at cursor-based pagination: https://relay.dev/graphql/connections.htm#ApplyCursorsToEdges()
The ApplyCursorsToEdges function initializes edges to contain all edges.
However, if we are using a database with 1 billion entries in it, that would require us to fetch 1 billion edges.
Even if these are column indexes, this is very inefficient.
How do people actually implement cursor based pagination, in a way that is not garbage?
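In practice the spec's ApplyCursorsToEdges describes the result, not an execution plan: database-backed servers typically encode the cursor as an indexed key and turn after/first into a seek query, so only one page of rows is ever fetched. A minimal sketch of that translation, assuming a SQL store with a unique, indexed id column (the table name, cursor format, and helper functions here are made up for illustration):

    #include <cstdint>
    #include <iostream>
    #include <string>

    // Hypothetical opaque cursor: just the row's primary key wrapped in a string.
    // Real servers usually base64-encode this so clients can't depend on the format.
    std::string encodeCursor(std::int64_t id) { return "id:" + std::to_string(id); }
    std::int64_t decodeCursor(const std::string& cursor) {
        return std::stoll(cursor.substr(cursor.find(':') + 1));
    }

    // Turn (after, first) into a seek query instead of scanning all edges.
    // With an index on id, the database jumps straight to the cursor position and
    // reads only first + 1 rows. In real code, bind the values with a prepared statement.
    std::string buildPageQuery(const std::string& afterCursor, int first) {
        const std::int64_t afterId = decodeCursor(afterCursor);
        return "SELECT id, payload FROM edges WHERE id > " + std::to_string(afterId) +
               " ORDER BY id LIMIT " + std::to_string(first + 1);
    }

    int main() {
        std::cout << buildPageQuery(encodeCursor(123456), 50) << "\n";
        // SELECT id, payload FROM edges WHERE id > 123456 ORDER BY id LIMIT 51
    }

The hasNextPage flag falls out of asking for first + 1 rows and checking whether the extra row came back; the spec's pseudocode then holds for the one page you actually fetched.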

Related

QT infinite view on model

I am looking for a way to create an infinite view on a model that is not initialized completely. I would like to create something similar to an Excel spreadsheet, and all I came up with was to start with an initialized model (e.g. 100x100 empty cells, maybe working on a database that has empty values), and then just dynamically add next rows/columns (and update the view) once we are close to the end of a scrollbar.
But I am wondering if that is the best solution - I think I would definitely benefit from a model that's filled in only partially - by that, I mean store information in the model only about filled cells, and let the view handle showing 'empty cells' (which would be created once we - for example - click them).
I know it would be necessary to store XY positions and cell data (instead of only a 2D container with data), but I would like to try different solutions:
a) have a pointer-like container which would contain a list of filled cells with their positions on a 2D plane
b) have a 2D container with size (x,y), where x and y would mean the 'last filled cell' in a given dimension
And for both solutions, I would like to dynamically allocate more space once data is written.
So there is my question - how can it be achieved with Qt model/view programming, if it is even possible to show 'ghost cells' without a model filled with empty data? It would also be nice if I could get a brief explanation of how it is done in apps like Excel etc.
Well, your table will never be truly infinite unless you implement indexing with arbitrary-precision numbers, and in that case you will probably not be able to use the Qt classes.
But I think you should choose some big enough number to define the maximum. It can be a really large number. QAbstractItemModel uses plain int for the rows and columns of a QModelIndex, and int is 32 bits on typical platforms, so your 'infinite' table can have 2,147,483,647 rows and the same number of columns. That gives you roughly 4.6e+18 cells in your two-dimensional 'Excel' table - about half a billion cells for every person on this planet. If this table size is not big enough for you then I do not know what is. But you will need to buy a bit bigger computer to actually fill them.
So if you decide to go this way, then you can easily override QAbstractTableModel and display it in a QTableView. Of course, you cannot store the underlying data in a two-dimensional array because you do not have enough memory, so you have to choose some other method. For example a QHash<QPoint, QString>, where QPoint represents the coordinates and QString the value (you can choose any other type instead of a string of course). Then when you want to get the value for given coordinates, you just look it up in the hash table. The number of data points you can hold depends only on your memory size. This solution is very simple, I guess it will be some 30 lines of code, not more.
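A minimal sketch of that model (roughly those 30 lines). It keys the hash on QPair<int,int> rather than QPoint, since qHash(QPoint) is only available in newer Qt versions, and the one-million row/column ceiling is an arbitrary choice (anything up to INT_MAX works):

    #include <QAbstractTableModel>
    #include <QHash>
    #include <QPair>
    #include <QString>

    // Sparse 'infinite' table: only cells that were actually edited are stored.
    class SparseTableModel : public QAbstractTableModel {
    public:
        using QAbstractTableModel::QAbstractTableModel;

        int rowCount(const QModelIndex& = QModelIndex()) const override { return 1000000; }
        int columnCount(const QModelIndex& = QModelIndex()) const override { return 1000000; }

        QVariant data(const QModelIndex& index, int role) const override {
            if (role != Qt::DisplayRole && role != Qt::EditRole)
                return QVariant();
            // Unwritten cells are simply absent from the hash and look empty.
            return m_cells.value(qMakePair(index.row(), index.column()));
        }

        bool setData(const QModelIndex& index, const QVariant& value, int role) override {
            if (role != Qt::EditRole)
                return false;
            m_cells.insert(qMakePair(index.row(), index.column()), value.toString());
            emit dataChanged(index, index);
            return true;
        }

        Qt::ItemFlags flags(const QModelIndex& index) const override {
            return QAbstractTableModel::flags(index) | Qt::ItemIsEditable;
        }

    private:
        QHash<QPair<int, int>, QString> m_cells;  // (row, column) -> cell contents
    };

Plug it into a view with something like QTableView view; view.setModel(new SparseTableModel(&view)); view.show(); - the view only asks data() for the cells currently on screen, so the huge row/column counts cost nothing by themselves.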

running out of memory plotting with Google Charts and very sparse data

We have very sparse data that we are attempting to plot with Google Charts. There are 16 different vectors and each has about 12,000 points. The points are times, and all of the times are different. My reading of the API is that I need to create one row per point, where each element corresponds to a different vector. That makes a set of 192,000 rows, where the first element in each row is the time and all of the other elements are null except for the one that has data, for a total of 3,072,000 elements. When we give this to Google Charts, the browser dies.
The problem with using arrayToDataTable is that our array is sparse, so arrayToDataTable doesn't work.
My question: is there a more efficient way to do this? Can I plot each data value independently, rather than all at the same time?
It turns out that the answer to this question is to do server-side data reduction in the form of binning. The individual rows each have their own timestamp, but because we are displaying this in a graph at most 2000 pixels wide, it makes sense to bin on the server into 2000 rows, each one with 16 columns. Then the total array is 32,000 elements, which appears to be well within the limits of the browser.
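A sketch of what that server-side binning step can look like (the struct fields, the NaN-for-null convention, and the fixed 2000 bins are assumptions, not part of the Google Charts API):

    #include <array>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Point {
        double time;    // timestamp of the sample
        int    series;  // which of the 16 vectors this sample belongs to
        double value;
    };

    // One output row per bin: a representative time plus one column per series.
    // NaN marks 'no data for this series in this bin' and becomes null in the
    // payload handed to Google Charts.
    struct Row {
        double time;
        std::array<double, 16> values;
    };

    std::vector<Row> binPoints(const std::vector<Point>& points,
                               double tMin, double tMax, std::size_t bins = 2000) {
        std::vector<Row> rows(bins);
        const double width = (tMax - tMin) / static_cast<double>(bins);
        for (std::size_t i = 0; i < bins; ++i) {
            rows[i].time = tMin + (static_cast<double>(i) + 0.5) * width;  // bin center
            rows[i].values.fill(std::nan(""));
        }
        for (const Point& p : points) {
            if (p.time < tMin || p.time > tMax) continue;
            std::size_t i = static_cast<std::size_t>((p.time - tMin) / width);
            if (i >= bins) i = bins - 1;  // clamp the right edge into the last bin
            // Last sample in the bin wins; average or min/max per bin if you prefer.
            rows[i].values[static_cast<std::size_t>(p.series)] = p.value;
        }
        return rows;
    }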

Clustering a list of dates

I have a list of dates I'd like to cluster into 3 clusters. Now, I can see hints that I should be looking at k-means, but all the examples I've found so far are related to coordinates, in other words, pairs of list items.
I want to take this list of dates and append them to three separate lists indicating whether they were before, during or after a certain event. I don't have the time of this event, which is why I'm trying to infer it by breaking the date/times into three groups.
Can anyone please help with a simple example on how to use something like numpy or scipy to do this?
k-means is exclusively for coordinates - more precisely, for continuous, linear values.
The reason is the mean function. Many people overlook the role of the mean in k-means (despite it being in the name...)
On non-numerical data, how do you compute the mean?
There exist some variants for binary or categorical data. IIRC there is k-modes, for example, and there is k-medoids (PAM, partitioning around medoids).
It's unclear to me what you want to achieve overall... your data seems to be 1-dimensional, so you may want to look at the many questions here about 1-dimensional data (as the data can be sorted, it can be processed much more efficiently than multidimensional data).
In general, even if you projected your data into unix time (seconds since 1.1.1970), k-means will likely only return mediocre results for you. The reason is that it will try to make the three intervals have the same length.
Do you have any reason to suspect that "before", "during" and "after" have the same duration? If not, don't use k-means.
You may however want to have a look at kernel density estimation (KDE) and plot the estimated density. Once you have understood the role of density for your task, you can start looking at appropriate algorithms (e.g. take the derivative of your density estimate and look for the largest increase / decrease, or estimate an "average" level and look for the longest above-average interval).
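The question asks for numpy/scipy (where scipy.stats.gaussian_kde does the heavy lifting); the sketch below spells the same idea out in C++ so the mechanics are visible: estimate the density of the dates (as epoch seconds) on a grid, then split at the two deepest valleys. The bandwidth and grid size are arbitrary placeholder choices.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Unnormalized Gaussian kernel density estimate on a regular grid.
    std::vector<double> kde(const std::vector<double>& xs, double lo, double hi,
                            std::size_t gridSize, double bandwidth) {
        std::vector<double> density(gridSize, 0.0);
        const double step = (hi - lo) / static_cast<double>(gridSize - 1);
        for (std::size_t i = 0; i < gridSize; ++i) {
            const double g = lo + step * static_cast<double>(i);
            for (double x : xs)
                density[i] += std::exp(-0.5 * std::pow((g - x) / bandwidth, 2.0));
        }
        return density;  // only the shape matters for locating the valleys
    }

    int main() {
        std::vector<double> dates = { /* epoch seconds go here */ };
        if (dates.size() < 3) return 0;
        auto [mnIt, mxIt] = std::minmax_element(dates.begin(), dates.end());
        const double lo = *mnIt, hi = *mxIt;
        if (hi <= lo) return 0;
        const std::size_t grid = 512;
        const std::vector<double> d = kde(dates, lo, hi, grid, (hi - lo) / 50.0);

        // Local minima of the density curve; keep the two deepest as split points.
        std::vector<std::size_t> minima;
        for (std::size_t i = 1; i + 1 < grid; ++i)
            if (d[i] < d[i - 1] && d[i] < d[i + 1]) minima.push_back(i);
        std::sort(minima.begin(), minima.end(),
                  [&](std::size_t a, std::size_t b) { return d[a] < d[b]; });
        if (minima.size() > 2) minima.resize(2);
        std::sort(minima.begin(), minima.end());
        // Dates below the first valley are 'before', between the valleys 'during',
        // above the second 'after' - interval widths come from the data, not from
        // k-means' preference for similarly sized groups.
        return 0;
    }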
Here are some workaround methods that may not be the best answer but should help.
You can represent the dates as durations from a starting date (such as one week) and convert them to numbers - time in minutes or hours from that starting point.
These would all plot along a single x-axis, but k-means should still be possible and the clustering still visible on a graph.
Here are more numpy examples: Python k-means algorithm
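To make the conversion step concrete, here is a small sketch (the date format, local-time parsing, and names are assumptions) that turns date strings into minutes elapsed since the earliest one; the resulting 1-D values can be fed to any k-means implementation, including the examples linked above:

    #include <algorithm>
    #include <ctime>
    #include <iomanip>
    #include <sstream>
    #include <string>
    #include <vector>

    // Parse "YYYY-MM-DD HH:MM:SS" into seconds since the Unix epoch (local time).
    std::time_t parseDate(const std::string& s) {
        std::tm tm = {};
        std::istringstream in(s);
        in >> std::get_time(&tm, "%Y-%m-%d %H:%M:%S");
        return std::mktime(&tm);
    }

    // Minutes elapsed since the earliest date in the list.
    std::vector<double> toMinutes(const std::vector<std::string>& dates) {
        std::vector<std::time_t> ts;
        for (const std::string& d : dates) ts.push_back(parseDate(d));
        if (ts.empty()) return {};
        const std::time_t start = *std::min_element(ts.begin(), ts.end());
        std::vector<double> minutes;
        for (std::time_t t : ts) minutes.push_back(std::difftime(t, start) / 60.0);
        return minutes;  // 1-D numeric values, ready for clustering
    }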

An efficient way to draw many OpenGL points in individual Begin-End blocks?

If you begin to render points, render a ton of vertices, and then end, you get noticeably better performance than if you begin points, render a vertex, end, and repeat a ton of times (e.g., redraws during pan and zoom actions for, say, 200,000 points are MUCH smoother).
I guess this might make sense, but it's disappointing. Is there a way to get back the performance while still rendering each point in its own begin-end block?
BACKGROUND:
I wanted to design a control that could contain a ton (upwards of a million in an extreme case) of "objects", each of which does its own rendering. Many of these objects will represent themselves as points.
If I let a hundred-thousand points individually render themselves in their own begin-end blocks, I get a major performance hit (as opposed to rendering them all in a single begin-end block). It thus seems I might have to make the container aware of the way the objects render themselves (for example, beginning points, telling everything that needs to render a point to do so, and then ending).
This messes up the independent nature of the display-object relationship I wanted. It also messes up hit testing by selection because I don't think you can add a name to a vertex inside a begin-end block of points, right?
FYI (in case this helps) my project will be displaying a 2D scene (using an ortho projection) and requires hit testing to determine which related object a user might click. In general, the objects will represent "tracks" containing individual points connected with lines. The position data is generally static, but point and track colors and display representations may change due to user settings and selection information. One exception--a "playback" mode may allow the user to see only one track point at a time (the "current" point in the playback) and step through time from one point to the next. However, even in that case I assumed I would simply change which point on each track is actually displayed (at its "static" location) depending on the current time in the playback. If any of that brings to mind further suggestions for an OpenGL newbie, then much thanks!
To solve this issue, I started by using VBOs (which did speed things up). I then allowed my "track" objects to each draw their own set of points as well as the lines connecting the points (each track using two DrawArrays calls: one for the line strip and one for the points). Each point no longer has to draw itself independently of the other points--this was the major performance improvement.
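A rough sketch of that per-track draw path, written here in plain C-style OpenGL for illustration (the actual project is in C#, per the aside below, but the calls map one-to-one). The buffer functions need OpenGL 1.5+ headers or an extension loader, and the VBO holds interleaved x,y pairs uploaded once up front:

    #include <GL/gl.h>  // buffer functions need GL 1.5+ headers or a loader such as GLEW
    #include <vector>

    // Upload a track's 2D positions into a vertex buffer object once, up front.
    GLuint createTrackVbo(const std::vector<float>& xyPairs) {
        GLuint vbo = 0;
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER,
                     static_cast<GLsizeiptr>(xyPairs.size() * sizeof(float)),
                     xyPairs.data(), GL_STATIC_DRAW);
        glBindBuffer(GL_ARRAY_BUFFER, 0);
        return vbo;
    }

    // Draw one track: a single line strip plus a single batch of points,
    // instead of one begin-end pair per point.
    void drawTrack(GLuint vbo, GLsizei pointCount) {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glEnableClientState(GL_VERTEX_ARRAY);
        glVertexPointer(2, GL_FLOAT, 0, nullptr);    // 2 floats (x, y) per vertex

        glDrawArrays(GL_LINE_STRIP, 0, pointCount);  // the connecting lines
        glDrawArrays(GL_POINTS, 0, pointCount);      // the points on top

        glDisableClientState(GL_VERTEX_ARRAY);
        glBindBuffer(GL_ARRAY_BUFFER, 0);
    }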
BUT, I still needed hit-testing against the points, so..
Finally, I allowed each displayed object (in this case, the tracks) to do its own selection routine so each object can do what it needs for efficient selection. For the tracks, this is a two-step process. First, a track names its entire line strip with one name (0) and performs the select. If that results in a hit, the track does a second render pass, naming each individual point and line segment to hit-test against each part of the track. This makes hit-testing against each point quite fast.
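A sketch of that two-pass idea with the legacy GL_SELECT mechanism (the pick-matrix setup around it is omitted, the buffer sizes are arbitrary, and drawTrack is the routine from the sketch above):

    #include <GL/gl.h>  // same GL 1.5+ / extension-loader caveat as above

    void drawTrack(GLuint vbo, GLsizei pointCount);  // from the earlier sketch

    // Pass 1: the whole track carries a single name (0). If the pick ray misses,
    // we never bother naming the individual points.
    bool trackWasHit(GLuint vbo, GLsizei pointCount) {
        GLuint hitBuffer[64];
        glSelectBuffer(64, hitBuffer);
        glRenderMode(GL_SELECT);
        glInitNames();
        glPushName(0);
        drawTrack(vbo, pointCount);
        glPopName();
        return glRenderMode(GL_RENDER) > 0;  // returns the number of hit records
    }

    // Pass 2: re-render only this track with one name per point, so the hit
    // records identify the exact point that was clicked.
    int pickPoint(GLuint vbo, GLsizei pointCount) {
        if (!trackWasHit(vbo, pointCount))
            return -1;
        GLuint hitBuffer[512];
        glSelectBuffer(512, hitBuffer);
        glRenderMode(GL_SELECT);
        glInitNames();
        glPushName(0);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glEnableClientState(GL_VERTEX_ARRAY);
        glVertexPointer(2, GL_FLOAT, 0, nullptr);
        for (GLsizei i = 0; i < pointCount; ++i) {
            glLoadName(static_cast<GLuint>(i));  // distinct name per point
            glDrawArrays(GL_POINTS, i, 1);
        }
        glDisableClientState(GL_VERTEX_ARRAY);
        glBindBuffer(GL_ARRAY_BUFFER, 0);
        glPopName();
        const GLint hits = glRenderMode(GL_RENDER);
        // Each hit record is {name count, z min, z max, names...}; take the first name.
        return hits > 0 ? static_cast<int>(hitBuffer[3]) : -1;
    }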
As an aside, I'm using .Net (C#) for my programming. With it, I created a class (SelectEventArgs) derived from EventArgs to describe selection criteria to objects being displayed. My SelectEventArgs class includes a list meant to be filled with selected objects. The display class then has an EventHandler<SelectEventArgs> event. When an object is added to a display, it subscribes to that event. When the event fires, each object determines whether it's been selected and fills the list of selected objects (in the SelectEventArgs passed in the event) with its information. After the event fires, I can access the list of objects returned in the SelectEventArgs instance to handle the user interaction. I find this to be a nice pattern for creating flexible display objects and selection tools.

Sorting objects by y value before rendering?

I have a 2D top-down 45-degree game like Pokemon or Zelda. Since the y value of an object determines its depth, objects need to be drawn in order of their y value. So when you're standing behind a tree, for example, the tree is drawn on top of your player so it looks like you are standing behind the tree.
My current design would be to draw a row of tiles, and then draw any players standing on that row, then draw the next row, and then draw any players standing on that. This way any tile that has a higher y value than the player is drawn in front of them to simulate depth.
However, my players are currently a std::vector of objects that is simply iterated and drawn after all the tiles are drawn. For my method to work, I would have to either iterate the vector for every row of tiles and only render the players that are on the current row, OR sort every player by y value each frame. Both of these methods seem quite CPU intensive, and maybe I am overthinking it and there is a simpler way of simulating depth in this type of game.
Edit:
This game is an MMORPG-type game, so there could potentially be many players/NPCs walking around, which is why I need a very efficient method.
Any ideas or comments would be appreciated!
Thanks
You could use std::set or std::map instead of a vector to keep your objects in sorted order. Unfortunately you wouldn't be able to simply modify their position; you would have to remove and re-insert each time you needed to change the y coordinate.
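A sketch of that idea with a std::multimap keyed on y (Player and its fields are placeholders); moving a player is an erase followed by a re-insert, both O(log n):

    #include <map>

    struct Player { float x = 0, y = 0; /* sprite, id, ... */ };

    using DepthOrdered = std::multimap<float, Player*>;  // key = y, ascending

    // Change a player's y and keep the container sorted.
    DepthOrdered::iterator movePlayer(DepthOrdered& players,
                                      DepthOrdered::iterator it, float newY) {
        Player* p = it->second;
        players.erase(it);                 // remove under the old key
        p->y = newY;
        return players.insert({newY, p});  // re-insert under the new key
    }

    // Drawing back-to-front is then just an in-order walk:
    //   for (auto& [y, player] : players) drawPlayer(*player);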
(Disregard my original suggestion, if you read it; it was daft.)
As far as I can tell, you basically have to iterate over a sorted container each time you render a frame; there will be a computational penalty for this, but having to do a copy and sort each time will not be that bad (O(N log N) sorting time, I'd guess).
A clever data structure might help you here; for example, an array whose elements are vectors of game objects. Each element of the array represents a row of your tiled grid, and you iterate over your vector of objects and bin them into this depth-buffer array (which takes approximately O(N) time). Then just iterate over the vector representing each row and draw each object in turn, again O(N).
There are probably cleverer z-buffering techniques, but I'll leave those as an exercise for the reader.
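To make the binning structure above concrete, here is a sketch (Player, drawTileRow, and drawPlayer are placeholders for the game's real types and draw calls):

    #include <vector>

    struct Player { int row = 0; /* x, y, sprite, ... */ };

    void drawTileRow(int /*row*/) {}          // placeholder draw calls
    void drawPlayer(const Player& /*p*/) {}

    // One bucket per tile row, refilled each frame in O(N) and drawn in O(N).
    // Rows with a larger y are drawn later, so they end up in front.
    void drawByRow(const std::vector<Player*>& players, int rowCount) {
        std::vector<std::vector<Player*>> buckets(rowCount);
        for (Player* p : players)
            buckets[p->row].push_back(p);   // bin each player by its row

        for (int row = 0; row < rowCount; ++row) {
            drawTileRow(row);               // the row of tiles first...
            for (const Player* p : buckets[row])
                drawPlayer(*p);             // ...then everyone standing on it
        }
    }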
As long as you aren't also attempting to sort your players according to other criteria, you can get away with just sorting them by y-coordinate every time you want to iterate through them. If you use a sort that runs in linear time when the input is mostly sorted, you will probably not even incur the O(n log n) cost for this step. I'm assuming here that your set of players changes slowly and that their y-coordinates also change slowly. If that is the case, then every time you need to sort the vector it will already be mostly sorted, and something like Smoothsort or Timsort will run in linear time.
For C++, it looks like you can find an implementation of Timsort here.
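If pulling in a Timsort implementation feels like overkill, a plain insertion sort has the same useful property (near-linear on nearly sorted input), since players only move a little between frames. A sketch, with Player as a placeholder:

    #include <cstddef>
    #include <vector>

    struct Player { float x = 0, y = 0; /* ... */ };

    // Insertion sort by y: O(n^2) in the worst case, but close to O(n) when the
    // vector is already almost sorted - the usual case from one frame to the next.
    void sortByY(std::vector<Player>& players) {
        for (std::size_t i = 1; i < players.size(); ++i) {
            Player p = players[i];
            std::size_t j = i;
            while (j > 0 && players[j - 1].y > p.y) {
                players[j] = players[j - 1];
                --j;
            }
            players[j] = p;
        }
    }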
If you have space, another option would be to create an array of lists of players, where the index into the array is the row number and each element holds the collection of players in that row.
As I mentioned, this will require a bit of extra memory, and some bookkeeping every time a player moves to a different row, but then drawing is simply a matter of iterating through the rows and drawing the players present in each row.