Django spatial calculations efficiency in PostGIS (RDS) vs. in object manager (EC2)? - django

From this question, I'd like to decide whether I should use GeoDjango, or roll my own with Python to filter Points within a certain radius of another Point.
There are two excellent answers that take different approaches to the question of how to perform such a calculation here: Django sort by distance
One of them uses GeoDjango to perform the distance calculation in PostGIS. I'm guessing that the compute would be done on the RDS instance?
The other uses a custom manager to implement the Great Circle distance formula. The compute would obviously be done on the EC2 instance.
I would imagine that the PostGIS implementation is more efficient because it's likely that people much smarter than I have optimized it. To what extent have they optimized it? Is there anything special about their implementation?
Assuming I am correct in assuming GeoDjango performs the distance compute using PostGIS on the RDS instance, I would imagine that RDS is not suited for heavy compute tasks, and may end up being slower or more expensive in the end. Are my assumptions correct?
What if I don't need a precise distance, where an octagon or even a square would suffice? In the case of a square, it would simply be a matter of filtering Points with latitude and longitude within a certain range. Is GeoDjango/PostGIS able to perform estimates like this?
If I do need a precise distance, I could calculate the furthest bounds that can be reached with the given radius, and only perform precise distance calculations on Points within those bounds. Does GeoDjango/PostGIS do this?

I'll try to address your questions:
One of them uses GeoDjango to perform the distance calculation in
PostGIS. I'm guessing that the compute would be done on the RDS
instance?
If you are bringing two django models to memory, and doing the calculation using Django, such as
model_a = Foo.objects.get(id=1)
model_b = Bar.objects.get(id=1)
distance = model_a.geometry.distance(model_b.geometry)
This will be done in Python, using GEOS.
https://docs.djangoproject.com/en/1.9/ref/contrib/gis/geos/#django.contrib.gis.geos.GEOSGeometry.distance
There are distance lookups on Django, such as
from django.contrib.gis.geos import Point
from django.contrib.gis.measure import D  # D(km=1) is a Distance of one kilometre

foos = Foo.objects.filter(geometry__distance_lte=(Point(0, 0, srid=4326), D(km=1)))
This calculation will be done by the backend (aka database).
The other uses a custom manager to implement the Great Circle distance
formula. The compute would obviously be done on the EC2 instance.
I would imagine that the PostGIS implementation is more efficient because it's likely that people much smarter than I have optimized it.
To what extent have they optimized it? Is there anything special about
their implementation?
Django has methods to use great-circle distance (GCD) in queries. This requires a cast in PostGIS from your geometry field to a geography field. Only EPSG:4326 is supported for now. If that's all you need, I bet the PostGIS implementation is good enough for almost all applications (if not all).
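As a rough sketch, assuming a hypothetical Foo model whose geometry field is a PointField(geography=True, srid=4326), a geography-based distance query could look like this:

from django.contrib.gis.db.models.functions import Distance
from django.contrib.gis.geos import Point
from django.contrib.gis.measure import D

# Hypothetical reference point; srid=4326 so PostGIS treats it as lon/lat.
ref = Point(-49.2655, -25.3827, srid=4326)

nearby = (
    Foo.objects
       .filter(geometry__distance_lte=(ref, D(km=5)))  # evaluated by the database
       .annotate(distance=Distance("geometry", ref))   # ST_Distance on the backend
       .order_by("distance")
)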
Assuming I am correct in assuming GeoDjango performs the distance compute using PostGIS on the RDS instance, I would imagine that RDS is
not suited for heavy compute tasks, and may end up being slower or
more expensive in the end. Are my assumptions correct?
I don't know much about Amazon's products, but without an estimate of the size of the problem (number of rows, types of calculations such as cross products, etc.), it's hard to help further.
What if I don't need a precise distance, where an octagon or even a square would suffice? In the case of a square, it would simply be a
matter of filtering Points with latitude and longitude within a
certain range. Is GeoDjango/PostGIS able to perform estimates like
this?
What kind of data do you have? There are several components in calculating distances and areas, mainly the spatial reference that you use (datum, ellipsoid, projection).
If you need to do accurate (or more accurate) distance measurements between two distant points on the globe, the geography type is more precise and will yield good results. If you do that kind of measurement on a Cartesian plane, your data will yield bad results.
If your data is local, covering only a few square kilometers, consider using a more local spatial reference. WGS84 (EPSG:4326) is more suitable for global data. Local spatial references can give you precise results, but only over much smaller extents.
If I do need a precise distance, I could calculate the furthest bounds that can be reached with the given radius, and only perform
precise distance calculations on Points within those bounds. Does
GeoDjango/PostGIS do this?
I think you are optimizing too early. I know your question is a bit old, but this is something you should only worry about when it starts to hurt. PostGIS and Django have been grinding through a lot of data for me for a long time in a government system that checks land registry parcels and runs tons of queries against several parameters. It has been working for a few years without a hitch.
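For what it's worth, the bounding-box prefilter described in the question is easy to express in GeoDjango; a minimal sketch, reusing the hypothetical Foo model from the snippets above (note also that PostGIS's ST_DWithin already combines an index-assisted bounding-box pass with the exact distance test):

from django.contrib.gis.geos import Polygon

# Hypothetical bounds (xmin, ymin, xmax, ymax) around the search centre.
bbox = Polygon.from_bbox((-49.4, -25.5, -49.1, -25.3))

# Cheap prefilter: bounding boxes only, served by the spatial index.
candidates = Foo.objects.filter(geometry__bboverlaps=bbox)

# Exact distance tests then only run against this much smaller candidate set.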

Related

Algorithm for calculating the volume of the part of point cloud

I am taking part in a project involving point clouds.
We have to create a web application whose task will be displaying a point cloud from a .ply file, then selecting an area and calculating its volume. The algorithm for computing the volume is to be implemented in C++. The only things we have are a file in .ply format and a file with the XYZ coordinates of all points. The point cloud we get is generated from a picture taken by a drone; for example, it is a cloud of points representing a mountainous area. Our task is to be able to select one such mountain and calculate its approximate volume, taking into account a margin of error (+/-). The measurement does not have to be perfect, but it has to be close to the real volume of the mountain. The volume has to be calculated from the flat surface at the lowest point of the mountain.
I have two questions for you.
-First, could you give me a clue, a link, or anything that would help me find such an algorithm and the reasons why it is the best?
-Second, do any of you have an idea of the best way to select an area from the rendered point cloud?
I have been looking for this information, but I cannot find anything useful enough for our project. Any tip or document on the subject would be very helpful ;)
"Volume" is not a clearly defined concept for a point cloud. There are very many ways to determine a surface, and there is no single answer. It would depend very much on what constraints were given for defining the surface of the point cloud.
A very simplistic approach would be to use the minimum and maximum coordinate values on all three axes, giving the volume of a right rectangular parallelepiped that encloses all the points.
A much more complex approach would involve computing a minimum convex envelope. That is a nontrivial problem.
It would get even harder if you were trying to find an envelope that was not necessarily convex.
In any case, it is important to pin down exactly what is meant by "volume" before you can craft an effective algorithm to compute it.
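To make the two simpler options above concrete, here is a rough Python/NumPy sketch (the question asks for C++, but the idea carries over directly; the input file name is a placeholder):

import numpy as np
from scipy.spatial import ConvexHull

# Placeholder: N x 3 array of XYZ coordinates loaded from your coordinate file.
points = np.loadtxt("points.xyz")

# 1) Axis-aligned bounding box: volume of the smallest box enclosing all points.
extents = points.max(axis=0) - points.min(axis=0)
bbox_volume = float(np.prod(extents))

# 2) Minimum convex envelope: ConvexHull (Qhull) reports its enclosed volume.
hull_volume = ConvexHull(points).volume

print(bbox_volume, hull_volume)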
As you are working with point clouds generated "from a picture taken by a drone" (I'm assuming here that you mean something like a photogrammetric process over drone imagery):
First:
Take a look at:
This
Or try to develop yourself some approach based on octrees.
If you go for developing your own approach, and you want it in c++, take a look at:
This
and This
Second:
I'm not sure if I understand the question, but it looks obvious to me that the best way to select the area of interest for the calculation is through user interaction (let the user select points around the area and compute over the points that fall between them).
Extra:
Just in case you didn't know it yet, I recommend CloudCompare to everyone who is working on something point-cloud-related.
Hope these links help you.

Clustering a list of dates

I have a list of dates I'd like to cluster into 3 clusters. Now, I can see hints that I should be looking at k-means, but all the examples I've found so far are related to coordinates, in other words, pairs of list items.
I want to take this list of dates and append them to three separate lists indicating whether they were before, during or after a certain event. I don't have the time for this event, but that's why I'm guessing it by breaking the date/times into three groups.
Can anyone please help with a simple example on how to use something like numpy or scipy to do this?
k-means is exclusively for coordinates. And more precisely: for continuous and linear values.
The reason is the mean function. Many people overlook the role of the mean in k-means (despite it being in the name...).
On non-numerical data, how do you compute the mean?
There exist some variants for binary or categorical data. IIRC there is k-modes, for example, and there is k-medoids (PAM, partitioning around medoids).
It's unclear to me what you want to achieve overall... your data seems to be 1-dimensional, so you may want to look at the many questions here about 1-dimensional data (as the data can be sorted, it can be processed much more efficiently than multidimensional data).
In general, even if you projected your data into unix time (seconds since 1.1.1970), k-means will likely only return mediocre results for you. The reason is that it will try to make the three intervals have the same length.
Do you have any reason to suspect that "before", "during" and "after" have the same duration? If not, don't use k-means.
You may however want to have a look at KDE; and plot the estimated density. Once you have understood the role of density for your task, you can start looking at appropriate algorithms (e.g. take the derivative of your density estimation, and look for the largest increase / decrease, or estimate an "average" level, and look for the longest above-average interval).
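As a rough illustration of the KDE idea, assuming the dates have already been converted to Unix timestamps (the sample values are made up):

import numpy as np
from scipy.stats import gaussian_kde

# Made-up timestamps (seconds since 1970-01-01).
timestamps = np.array([1_600_000_000, 1_600_003_600, 1_600_007_200,
                       1_600_100_000, 1_600_101_000, 1_600_200_000], dtype=float)

kde = gaussian_kde(timestamps)

# Evaluate the estimated density on a grid; local minima of this curve are
# natural candidates for the "before/during/after" boundaries.
grid = np.linspace(timestamps.min(), timestamps.max(), 500)
density = kde(grid)

# Indices where the density has a simple local minimum.
minima = np.where((density[1:-1] < density[:-2]) & (density[1:-1] < density[2:]))[0] + 1
print(grid[minima])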
Here are some workaround methods that may not be the best answer but should help.
You can convert the dates to numeric representations, such as minutes or hours elapsed from a starting point (for example, the start of a week), and treat them as durations.
These would all lie along a single axis, but k-means is still possible and the clustering is still visible on a graph.
Here are more examples with numpy: Python k-means algorithm
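A minimal sketch of the convert-then-cluster approach above, using scikit-learn's KMeans on made-up datetimes (with the caveats from the previous answer in mind):

from datetime import datetime
import numpy as np
from sklearn.cluster import KMeans

# Made-up list of dates to split into before / during / after.
dates = [datetime(2023, 5, 1, 9), datetime(2023, 5, 1, 10),
         datetime(2023, 5, 2, 14), datetime(2023, 5, 2, 15),
         datetime(2023, 5, 4, 8),  datetime(2023, 5, 4, 9)]

# Convert each date to minutes elapsed since the earliest one (a 1-D feature).
start = min(dates)
minutes = np.array([(d - start).total_seconds() / 60.0 for d in dates]).reshape(-1, 1)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(minutes)

# Group the original dates by cluster label.
clusters = {k: [d for d, l in zip(dates, labels) if l == k] for k in set(labels)}
print(clusters)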

Spatial queries on AWS SimpleDB

I would like to know what people suggest as efficient ways of doing a spatial query in an Amazon Web Services SimpleDB?
By spatial query I mean finding objects in a given radius of a latitude and longitude.
SimpleDB doesn't currently offer any built-in spatial search operations, but that doesn't mean it can't be done. There are several methods of implementing geospatial searches in non-geospatially-aware databases such as SimpleDB, and all of them center around the idea of using the database to retrieve a rough first selection based on a geospatial bounding box and then filtering the returned data in your application using more accurate algorithms such as the Haversine formula.
You could store the latitude and longitude as (zero-padded and normalized) numeric attributes and then perform a double range query (lat >= minLat and lat <= maxLat and lon >= minLon and lon <= maxLon), but since neither of these predicates is selective (each predicate matches a lot of items) it's not ideal (see Tuning Queries).
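Since SimpleDB compares everything lexicographically as strings, the usual trick is to offset the coordinates so they are never negative and zero-pad them to a fixed width; a rough sketch (the width and scale factor are arbitrary choices):

def encode_coord(value, offset, width=9, scale=10**5):
    # Offset so the value is never negative, scale to an integer and zero-pad
    # so that lexicographic (string) order matches numeric order in SimpleDB.
    return str(int(round((value + offset) * scale))).zfill(width)

# Hypothetical item position.
lat, lon = -25.382708, -49.265506
item = {
    "lat": encode_coord(lat, offset=90),    # '006461729'
    "lon": encode_coord(lon, offset=180),   # '013073449'
}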
A better way would be using GeoHashes.
Geohashes offer properties like arbitrary precision, similar prefixes
for nearby positions, and the possibility of gradually removing
characters from the end of the code to reduce its size (and gradually
lose precision).
As a practical example, the Geohash 6gkzwgjzn820 decodes to the
coordinates -25.382708 and -49.265506, while the Geohash 6gkzwgjz will
decode to -25.383 and -49.266, and if we take a similar position in
the same region, such as -25.427 and -49.315, we can see it being
encoded as 6gkzmg1w (note the similar prefix).
From http://geohash.org/site/tips.html
With your item positions as GeoHashes you could use the like operator to search for a bounding box (where GeoHash like '6gkzmg1w%') but since the like operator is expensive (Comparison Operators) a better way would be to denormalize the data by storing each GeoHash prefix level (how many depends on your required search precision) as a separate attribute (GeoHash6 GeoHash8 etc) and then use a simple equality predicate (where Geohash8 = '6gkzmg1w').
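For illustration, here is a small self-contained geohash encoder (the standard base-32 geohash algorithm) and how the denormalized prefix attributes could be built; the attribute names simply mirror the ones used above:

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=12):
    # Standard geohash: alternately bisect the longitude and latitude ranges,
    # emitting one bit per step, and pack the bits into base-32 characters.
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, even = [], True  # geohash starts by bisecting longitude
    while len(bits) < precision * 5:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        even = not even
    return "".join(BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
                   for i in range(0, len(bits), 5))

# Denormalized prefix attributes for a hypothetical item, as described above.
full = geohash_encode(-25.382708, -49.265506)   # starts with '6gkzwgjz'
attributes = {"GeoHash6": full[:6], "GeoHash8": full[:8], "GeoHash": full}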
Now on to the downside of GeoHashes. Since you can't make any assumption of a GeoHash being centered within your search box you have to search all neighboring prefixes as well. The process is excellently described by geohash-js
Geohash also has the property that as the number of digits decreases
(from the right), accuracy degrades. This property can be used to do
bounding box searches, as points near to one another will share
similar Geohash prefixes.
However, because a given point may appear at the edge of a given
Geohash bounding box, it is necessary to generate a list of Geohash
values in order to perform a true proximity search around a point.
Because the Geohash algorithm uses a base-32 numbering system, it is
possible to derive the Geohash values surrounding any other given
Geohash value using a simple lookup table.
So, for example, 1600 Pennsylvania Avenue, Washington DC resolves to:
38.897, -77.036
Using the geohash algorithm, this latitude and longitude is converted
to: dqcjqcp84c6e
A simple bounding box around this point could be described by
truncating this geohash to: dqcjqc
However, 'dqcjqcp84c6e' is not centered inside 'dqcjqc', and searching
within 'dqcjqc' may miss some desired targets.
So instead, we can use the mathematical properties of the Geohash to
quickly calculate the neighbors of 'dqcjqc'; we find that they are:
'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8'
This gives us a bounding box around 'dqcjqcp84c6e' roughly 2km x 1.5km
and allows for a database search on just 9 keys: SELECT * FROM table
WHERE LEFT(geohash,6) IN ('dqcjqc',
'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8');
Translated to a SimpleDB query that'd be where GeoHash6 in('dqcjqc', 'dqcjqf', 'dqcjqb', 'dqcjr1', 'dqcjq9', 'dqcjqd', 'dqcjr4', 'dqcjr0', 'dqcjq8'), and then you'll do your Haversine filtering on the results in order to get only the items that are within your search radius.
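The final Haversine pass could look roughly like this in Python (the candidate list stands in for whatever the SimpleDB query returned):

from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two lat/lon points, in kilometres.
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlam = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Hypothetical (lat, lon) candidates returned by the GeoHash6 query above.
candidates = [(38.897, -77.036), (38.905, -77.020), (38.850, -77.120)]
center, radius_km = (38.897, -77.036), 2.0

within = [p for p in candidates
          if haversine_km(center[0], center[1], p[0], p[1]) <= radius_km]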
I'm going to leave this here because it might help you!
14 years ago we tried to build a geo lookup table of locations within a radius. There were obviously no geospatial indexes or anything like that.
There was literally only standard SQL and Oracle... anyway, we ended up converting all lat/lng into kilometers from a fixed reference plane, which is essentially what geospatial indexes do these days.
To explain what it does: it turns the world into a flat surface, and with a bit of SQL trickery you can even select by radius and get the distance between the two points you're selecting. Since it also works on plain integers, the queries are blazing fast.
Here is a simple example in PHP, along with a SQL query that looks very complex but is pretty easy once you understand it:
https://gist.github.com/tobsn/899413

How to combine two images with different gains in Matlab/ C++

I have two images, both taken at the same time from the same detector.
Both images have 11 bit resolution (yes, it's odd, but that is the case here). The difference between the two images is that one image has been amplified by a factor of 1 and the other has been amplified by a factor of 10.
How can I take these two 11 bit images, and combine their pixel values to get a single 16 bit image? Basically, this increases the dynamic range of the final image.
I am fairly new to image processing. I know there is a solution for this, since other systems do this on the fly pixel-by-pixel in an FPGA. I was just hoping to be able to do this in Matlab post processing instead of live. I know doing bitwise operations in Matlab can be kinda difficult, but we do have an educational license with every toolbox available.
As mentioned below, this looks an awful lot like HDR processing. The goal isn't artistic, but rather data preservation. This is eventually going to be ported to C++ and flown on an autonomous flight computer, and running standard, bloated HDR software on the fly would kill our timing requirements.
Thanks for the help!
As a side note, I'd like to be able to do this for any combination of gains, i.e. 2x and 30x, 4x and 8x, etc. In my gut I feel like this is a deceptively simple algorithm or interpolation, but I just don't know where to start.
Gains
Since there is some confusion on what the gains mean, I'll try to explain. The image sensor (CMOS) being used on our custom camera has the capability to simultaneously output two separate images, both taken from the same exposure. It can do this because the sensor has 2 different electrical amplifiers along its data path.
In photography terms, it would be like your DSLR being able to take a picture using 2 different ISO values at the same time.
Sorry for the confusion
The problems you pose are known as "High Dynamic Range Imaging" and "Tone Mapping". I suggest you start with those Wikipedia articles, then drill down to the bibliography cited therein.
You don't provide enough details about your imagery to give a more specific answer. What is the "gain" you mention? Did you crank up the sensor's gain (to what ISO-equivalent number?), or did you use a longer exposure time? Are the 11-bit pixel values linear or already gamma-compressed?
To upscale an 11bit range to a 16bit range, multiply by (2^16-1)/(2^11-1).
(This assumes you want a linear scaling, which is reasonable when scaling up.)
If the gain was discrete (applied to the 11bit range), then you have two 11bit images which may have some values saturated.
If the gain was applied in a continuous (analog) or floating point range, then your values can go beyond the original 11bits. Also, if the gain was applied in a continuous (analog) or floating point range, the values were probably scaled to another range first e.g. [0,1] (by dividing by (2^11-1)).
If the values were scaled to another range, you will have to divide by the maximum of the new range instead of by (2^11-1).
Either way (whether the gain was applied in the 11bit range or not), due to the gain and due to the addition, the resulting values may be larger than the original range. In this case, you need to decide how you want to scale them:
Do you want to scale the original 11bit range to 16bit (possibly causing saturation)?
If so, multiply by (2^16-1)/(2^11-1).
Do you want to scale the maximum possible value to 2^16-1?
If so, multiply by (2^16-1)/((2^11-1) * (G1+G2)).
Do you want to scale the actual maximum value to 2^16-1?
If so, multiply by (2^16-1)/max(I1+I2).
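A NumPy sketch of the three scaling options above (Python rather than Matlab/C++, purely for illustration; the image arrays and gains are made up):

import numpy as np

# Made-up 11bit input images and their gains (G1 = 1x, G2 = 10x).
I1 = np.random.randint(0, 2**11, size=(480, 640)).astype(np.float64)
I2 = np.random.randint(0, 2**11, size=(480, 640)).astype(np.float64)
G1, G2 = 1.0, 10.0

combined = I1 + I2  # simple sum; may exceed the original 11bit range

# Option 1: map the original 11bit range onto 16 bits (anything above saturates).
out1 = np.clip(combined * (2**16 - 1) / (2**11 - 1), 0, 2**16 - 1).astype(np.uint16)

# Option 2: map the maximum possible value onto 2^16 - 1.
out2 = (combined * (2**16 - 1) / ((2**11 - 1) * (G1 + G2))).astype(np.uint16)

# Option 3: map the actual maximum of the summed image onto 2^16 - 1.
out3 = (combined * (2**16 - 1) / combined.max()).astype(np.uint16)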
Edit:
Since you do not want to add the images, but rather use the different details in them, perhaps this article will help you:
Digital Photography with Flash and No-Flash Image Pairs

Finding the spread of each cluster from Kmeans

I'm trying to detect how well an input vector fits a given cluster centre. I can find the best match quite easily (the centre with the minimum Euclidean distance to the input vector is the best); however, I now need to work out how good a match that is.
To do this I need to find the spread (standard deviation?) of the vectors which build up the centroid, then see if the distance from my input vector to the centre is less than the spread. If it's more than the spread, then I should be able to say that I have no clusters that fit it (given that the best one doesn't fit the input vector well).
I'm not sure how to find the spread per cluster. I have all the centre vectors, and all the training vectors are labelled with their closest cluster, I just can't quite fathom exactly what I need to do to get the spread.
I hope that's clear? If not I'll try to reword it!
TIA
Ian
Use the distance function and calculate the distance from your cluster centre to each point labelled with that cluster, then take the mean of those distances (or their root mean square, which behaves like a standard deviation). That gives you a per-cluster spread.
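A small NumPy sketch of that idea, assuming hypothetical arrays X (training vectors), labels (assigned cluster indices) and centers (the k cluster centres):

import numpy as np

def cluster_spreads(X, labels, centers):
    # Per-cluster spread: root-mean-square distance of member vectors to their centre.
    spreads = []
    for k, center in enumerate(centers):
        dists = np.linalg.norm(X[labels == k] - center, axis=1)
        spreads.append(np.sqrt(np.mean(dists ** 2)))
    return np.array(spreads)

# An input vector fits cluster k "well" if its distance to centers[k]
# is no more than spreads[k] (or some chosen multiple of it).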
If you switch to using a different algorithm, such as Mixture of Gaussians, you get the spread (e.g., std. deviation) as part of the model (clustering result).
http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/mixture.html
http://en.wikipedia.org/wiki/Mixture_model
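A rough scikit-learn sketch of the mixture-of-Gaussians alternative (the data is made up):

import numpy as np
from sklearn.mixture import GaussianMixture

# Made-up 2-D data drawn around three centres.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(100, 2))
               for loc in ([0, 0], [5, 5], [0, 5])])

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

print(gmm.means_)          # cluster centres
print(gmm.covariances_)    # per-cluster covariance matrices (the spread)
print(gmm.score_samples(X[:5]))  # per-sample log-likelihood as a fit score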