PostGIS: endless ST_Intersection

I have two land-use layers (2006 and 2018) of MultiPolygon type, and a grid of 100 * 100 m cells of Polygon type.
These two layers are clipped to a circle of 10,000 m², are in the Lambert 93 projection, and have a spatial index.
https://zupimages.net/viewer.php?id=20/49/dlkk.png
The 2006 land-use layer contains 14,320 features, the 2018 layer approximately 13,000, and the grid 31,000.
Within each cell of my grid (and strictly within its extent), I want to clip out the land-use features, separately for the two dates (to start with). For this, I turned to ST_Intersection:
/* 1 */
select
    grille.id as id_carr,
    2006 as annee,
    code,
    st_intersection(occ_2006_test.geom, grille.geom) as geom
from grille, occ_2006_test
where st_intersects(occ_2006_test.geom, grille.geom);
/* 2 */
select
    grille.id as id_carr,
    2018 as annee,
    code,
    st_intersection(occ_2018_test.geom, grille.geom) as geom
from grille, occ_2018_test
where st_intersects(occ_2018_test.geom, grille.geom);
The first query, on the 2006 layer, takes 12 seconds and returns 110,000 rows. The second one, for some reason, is still running after 10 minutes.
How can this difference in performance be explained? (Here is the link to my layers, in case it helps: https://drive.google.com/file/d/1RC76v0Dm-iufw_mmA8EGaarO0ZaC4MJw/view?usp=sharing)
Thank you.
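One way to investigate is to compare geometry validity and complexity between the two layers; a minimal diagnostic sketch, assuming the table and column names used above:
/* how many invalid geometries, and how complex are the features? */
select count(*) filter (where not st_isvalid(geom)) as nb_invalid,
       max(st_npoints(geom)) as max_vertices,
       avg(st_npoints(geom))::int as avg_vertices
from occ_2018_test;
If a handful of features turn out to be invalid or to have very large vertex counts, repairing them with ST_MakeValid and/or cutting them up with ST_Subdivide before the intersection often helps.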

Related

How can I estimate the DC output of a solar plant consisting of multiple modules and inverters in PVLib?

I'm using the ModelChain class to estimate DC and AC values for a fictitious solar plant. Input parameters include module, inverter, number of strings, number of modules, number of inverters, albedo, PVGIS TMY data, etc. I apply simple math to calculate the number of modules per string and the number of strings per inverter, then I create one PVSystem object, consisting of a single Array, per inverter. Then I run the ModelChain model for each inverter and, for simplicity, add up the AC output to estimate the total AC for all arrays, like this:
import pvlib

total_ac = 0
for idx in range(0, num_of_inverters):
    # One single-array PVSystem per inverter.
    array = {
        'name': f'pvsystem-{idx+1}-array',
        'mount': mount,
        'module': module_name,
        'module_parameters': module_parameters,
        'module_type': module_type,
        'albedo': albedo,
        'strings': strings_per_inverter,
        'modules_per_string': modules_per_string,
        'temperature_model_parameters': temperature_model_parameters,
    }
    pvsystem = pvlib.pvsystem.PVSystem(arrays=[pvlib.pvsystem.Array(**array)],
                                       inverter_parameters=inverter_parameters)
    mc = pvlib.modelchain.ModelChain(pvsystem, location)
    mc.run_model(tmy_weather)
    # Sum the hourly AC power over the year for this inverter.
    total_ac += mc.results.ac.sum()
According to the PVLib documentation, the AC output is yearly, in watt-hours.
But now I need to get the DC output as well (yearly, in watt-hours) so I can calculate the DC/AC ratio. Running mc.results.dc gives me a DataFrame with several values (columns) that are hard to grasp for a newbie like me:
i_sc : Short-circuit current (A)
i_mp : Current at the maximum-power point (A)
v_oc : Open-circuit voltage (V)
v_mp : Voltage at maximum-power point (V)
p_mp : Power at maximum-power point (W)
i_x : Current at module V = 0.5Voc, defines 4th point on I-V curve for modeling curve shape
i_xx : Current at module V = 0.5(Voc+Vmp), defines 5th point on I-V curve for modeling curve shape
I tried using p_mp and adding it up: mc.results.dc['p_mp'].sum() but the output is much bigger than the estimated AC. I usually expect the DC/AC ratio to be somewhere > 1 and <= 1.5, roughly. However, I'm getting DC values that are like 3-5 times bigger which probably means I'm doing something wrong.
Example: 1 string, 1 inverter, 10 modules per string:
Output (yearly):
AC: 869.61kW
DC: 3326.36kW
Ratio: 3.83
Any help is appreciated.
As for why the total DC and AC generation values are so different, it's because the inverter is way undersized for the array. The inverter is rated for 250 W maximum, which is not much more than what a single module produces at STC (calculated as Impo * Vmpo below, or by noticing the "220" in the module name), and you have ten modules total. So the inverter will be saturated at even very low light, and the total AC production will be severely curtailed as a result. I think if you make a plot (mc.results.ac.plot()) you will see that the daily inverter output curve is clipped at 250 W while the simulated DC power can be nearly 10x higher. It's always a good idea to plot your time series when things aren't making sense!
In [23]: pvlib.pvsystem.retrieve_sam('cecinverter')['ABB__MICRO_0_25_I_OUTD_US_208__208V_']['Paco']
Out[23]: 250.0
In [24]: pvlib.pvsystem.retrieve_sam('sandiamod')['Canadian_Solar_CS5P_220M___2009_'][['Impo', 'Vmpo']]
Out[24]:
Impo 4.54629
Vmpo 48.3156
Name: Canadian_Solar_CS5P_220M___2009_, dtype: object
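To see that clipping, a quick comparison plot helps; a minimal sketch, assuming mc is the ModelChain above with a single array (so mc.results.ac is a Series):
import matplotlib.pyplot as plt

# Compare simulated DC power at the maximum power point with the inverter's AC output.
ax = mc.results.dc['p_mp'].plot(label='DC p_mp (W)')
mc.results.ac.plot(ax=ax, label='AC output (W)')
ax.set_ylabel('Power (W)')
ax.legend()
plt.show()
The AC curve should sit flat at 250 W whenever the DC curve is above the inverter's rating.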
A couple other notes:
Please be careful about units:
Summing (really, integrating) an hourly time series of power (Watts) produces energy (Watt-hours). An annual output in kW doesn't make sense, since kW is for power and power is an instantaneous rate of energy generation. If this is new to you, it might be helpful to think about speed vs distance: a car might be traveling at 60mph at any given time point, but the total distance it travels in a year is measured in miles, not mph. Power is energy per unit time just like speed is distance per unit time.
Summing voltages (num_of_inverters * mc.results.dc['v_mp'].sum()) makes no sense that I can see. Volt-hours doesn't seem like a useful unit to me outside of some very specialized power electronics engineering contexts.
The term "DC/AC ratio" is typically understood to mean the ratio of rated capacities, not annual productions. So for the example in your gist, the DC/AC ratio would be calculated as (220 W/module * 10 modules/string * 2 strings/inverter = 4400 W DC) / (250 W AC) = 17.6 (which is a crazy DC/AC ratio).

Linear interpolation of two vector arrays with different lengths

I have two curves. One is hand-drawn, and one is a smoothed version of the hand-drawn one.
The data of each curve is stored in 2 separate vector arrays.
A time delta is also stored in the hand-drawn curve's vectors, so I can replay the drawing process and make it look natural.
Now I need to transfer the time deltas from curve 1 (the raw input) to curve 2 (the already-smoothed curve).
Sometimes the first vector is larger and sometimes smaller than the second vector
(it depends on the input drawing speed).
So my question is: how do I fill the vector PenSmoot.time with the correct values?
Case 1: Input vector is larger
PenInput.time[0] = 0 PenSmoot.time[0] = 0
PenInput.time[1] = 5 PenSmoot.time[1] = ?
PenInput.time[2] = 12 PenSmoot.time[2] = ?
PenInput.time[3] = 2 PenSmoot.time[3] = ?
PenInput.time[4] = 50 PenSmoot.time[4] = ?
PenInput.time[5] = 100
PenInput.time[6] = 20
PenInput.time[7] = 3
PenInput.time[8] = 9
PenInput.time[9] = 33
Case 2: Input vector is smaller
PenInput.time[0] = 0 PenSmoot.time[0] = 0
PenInput.time[1] = 5 PenSmoot.time[1] = ?
PenInput.time[2] = 12 PenSmoot.time[2] = ?
PenInput.time[3] = 2 PenSmoot.time[3] = ?
PenInput.time[4] = 50 PenSmoot.time[4] = ?
PenSmoot.time[5] = ?
PenSmoot.time[6] = ?
PenSmoot.time[7] = ?
PenSmoot.time[8] = ?
PenSmoot.time[9] = ?
Simplified representation:
PenInput holds the whole data of a drawn curve (raw input):
PenInput.x        // X coordinate
PenInput.y        // Y coordinate
PenInput.pressure // the pressure of the pen
PenInput.timetotl // total elapsed time
PenInput.timepart // time fragments
PenSmoot holds the data of the massaged (smoothed, evenly distributed) version of PenInput:
PenSmoot.x        // X coordinate
PenSmoot.y        // Y coordinate
PenSmoot.pressure // unknown - the pressure of the pen
PenSmoot.timetotl // unknown - total elapsed time
PenSmoot.timepart // unknown - time fragments
This is the struct that I have.
struct Pencil
{
    sf::VertexArray vertices;
    std::vector<int> pressure;
    std::vector<sf::Int32> timetotl;
    std::vector<sf::Int32> timepart;
};
[This answer has been extensively revised based on editing to the question.]
Okay, it seems to me that you just about need to interpolate the time stamps in parallel with the points.
I'm going to guess that the incoming data is something on the order of an array of points (e.g., X, Y coordinates) and an array of time deltas with the same number of each, so time-delta N tells you the time it took to get from point N-1 to point N.
When you interpolate the points, you're probably going to want to do it intelligently. For example, in the shape shown in the question, we have what look like two nearly straight lines, one with positive slope, and the other with negative slope. According to the picture, that's composed of 263 points. We could reduce that to three points and still have a fairly reasonable representation of the original shape by choosing the two end-points plus one point where the two lines meet.
We probably don't need to go quite that far though. Especially taking time into account, we'd probably want to use at least 7 points for the output--one for each end-point of each colored segment. That would give us 6 straight line segments. Let's say those are at points 0, 30, 140, 180, 200, 250, and 263.
We'd then use exactly the same segmentation on the time deltas. Add up the deltas from 0 to 30 to get an average speed for the first segment. Add up the deltas for 31 through 140 to get an average speed for the second segment (and so on to the end).
Increasing the number of points works out roughly the same way. We need to look at exactly which input points were used to create a pair of output points. For a simplistic example, let's assume we produced output that was precisely double the number of input points. We'd then interpolate time deltas exactly halfway between each pair of input points.
In the case shown in the question, we start with unevenly distributed inputs, but produce evenly distributed outputs. So the second output point might be an average of the first four input points. The next output point might be an average of three input points (and so on). In many cases, it's likely that neither end-point of a segment in the output corresponds precisely to any point in the input.
That's fine too. We interpolate between two points of the input to figure out the time hack for the starting point of the output segment. Likewise for the ending point. Then we can compute the total time it should have taken to travel between them based on the time delta between the points.
If you want to get fancy, you could use a higher-order interpolation instead of linear. That does require more input points per interpolation, but it looks like you probably have plenty to do something like a quadratic or cubic interpolation (in most cases). This is likely to make the most difference at transitions -- places where the "pen" was accelerating or decelerating quickly. In such a place, linear interpolation can give somewhat misleading results (though, given the number of points you seem to be working with, it may not make enough difference to notice).
As an illustration, let's consider a straight line. We're going to start from 5 input points, and produce 7 output points.
So, the input points are [0, 2, 7, 10, 15], and the associated time deltas are [0, 1, 4, 8, 3].
So, our total distance traveled is 15, and we want our 7 output points to be evenly distributed, so the distance between output points will be 15/6 = 2.5.
Obviously the first output point and time are both 0. The second output point is at 2.5. To compute its output time, we take the entire time for the first input segment (0 -> 2), which is 1, plus the fraction of the second input segment we cover: (2.5 - 2) / (7 - 2) = 0.1 of that segment, times its time delta of 4, giving 0.4. So our first output time delta is 1 + 0.4 = 1.4.
The next output point should be at a distance of 5.0. Since the second input segment goes from 2 to 7, our entire second output segment lies within it. It occupies 2.5 / (7 - 2) = 0.5 of the input segment, so we multiply that by the segment's time delta to get the time delta for the second output segment: 0.5 * 4 = 2.0.
[...and it continues on the same way until we reach the end.]
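A minimal C++ sketch of that idea, assuming each curve is given as cumulative distances along the path plus per-segment time deltas (the names here are illustrative):
#include <vector>
#include <cstddef>

// Given the raw curve as cumulative distances and a time delta per input segment,
// produce a time delta per segment of the resampled (smoothed) curve, whose points
// are also given as cumulative distances along the same path.
std::vector<double> resampleTimeDeltas(const std::vector<double>& inDist,   // cumulative distance at each input point
                                       const std::vector<double>& inDelta,  // time from point i-1 to point i (inDelta[0] unused)
                                       const std::vector<double>& outDist)  // cumulative distance at each output point
{
    // Cumulative time at every input point.
    std::vector<double> inTime(inDist.size(), 0.0);
    for (std::size_t i = 1; i < inDist.size(); ++i)
        inTime[i] = inTime[i - 1] + inDelta[i];

    // Linearly interpolate the cumulative time at an arbitrary distance d.
    auto timeAt = [&](double d) {
        std::size_t i = 1;
        while (i + 1 < inDist.size() && inDist[i] < d)
            ++i;
        double span = inDist[i] - inDist[i - 1];
        double frac = span > 0.0 ? (d - inDist[i - 1]) / span : 0.0;
        return inTime[i - 1] + frac * inDelta[i];
    };

    std::vector<double> outDelta(outDist.size(), 0.0);
    for (std::size_t i = 1; i < outDist.size(); ++i)
        outDelta[i] = timeAt(outDist[i]) - timeAt(outDist[i - 1]);
    return outDelta;
}
The cumulative distances can be built by summing point-to-point segment lengths of each curve; PenSmoot.timepart could then be filled from the returned deltas, with PenSmoot.timetotl as their running sum.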

The Concept of Bilinear Interpolation by Accessing 2-d array like 1-d array does

A 2-d array holds the pixels of a BMP file, and its size is width (3*65536) * height (3*65536) after I scaled it up.
It's like this.
1 2 3 4
5 6 7 8
9 10 11 12
Between 1 and 2 there are 2 holes, because I enlarged the original 2-d array (multiplied it by 3).
I use a 1-d array-like access method, like this:
array[y* width + x]
index
0 1 2 3 4 5 6 7 8 9...
1 2 3 4 5 6 7 8 9 10 11 12
(this array is actually a 2-d array, scaled up by a factor of 3)
Now I can patch the holes like this:
In a double for loop, when (j%3==1):
Image[i*width+j] = Image[i*width+(j-1)]*(1-1/3) + Image[i*width+(j+2)]*(1-2/3)
When (j%3==2):
Image[i*width+j] = Image[i*width+(j-2)]*(1-2/3) + Image[i*width+(j+1)]*(1-1/3)
This is the way I understand how to patch the holes, which is the so-called "bilinear interpolation".
I want to be sure about what I know before implementing this logic in my code. Thanks for reading.
Bilinear interpolation requires either 2 linear interpolation passes (horizontal and vertical) per interpolated pixel (well, some pixels only require 1), or up to 4 source pixels per interpolated pixel.
Between 1 and 2 there are two holes. Between 1 and 5 there are 2 holes. Between 1 and 6 there are 4 holes. Your code, as written, can only patch the holes between 1 and 2; it does not handle the other holes correctly.
In addition, your division is integer division (1/3 and 2/3 are both 0), so it does not do what you want.
Generally you are far better off writing an r = interpolate_between(a, b, x, y) function that interpolates between a and b at step x out of y. Test and fix that first. Now scale your image horizontally using it, and check visually that you got it right (especially at the edges!).
Now try using it to scale vertically only.
Now do both horizontal, then vertical.
Next, write the bilinear version, which you can test against applying the linear version three times (the results should agree to within rounding error). Then try to bilinearly scale the image, checking visually.
Compare with the two-pass linear scale; it should differ only by rounding error.
At each of these stages you'll have a single "new" operation that can go wrong, with the previous code already validated.
Writing everything at once will lead to complex bug-ridden code.
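A sketch of that incremental plan, assuming an 8-bit grayscale image in the same row-major layout as the question (the helper name follows the answer's suggestion):
#include <algorithm>
#include <cstdint>
#include <vector>

// Interpolate between a and b at step x out of y, in floating point to avoid
// the integer-division problem pointed out above.
static std::uint8_t interpolate_between(std::uint8_t a, std::uint8_t b, int x, int y)
{
    double t = static_cast<double>(x) / static_cast<double>(y);
    return static_cast<std::uint8_t>(a * (1.0 - t) + b * t + 0.5);  // round to nearest
}

// Horizontal pass only: fill the holes in the rows that contain source pixels
// (columns 0, 3, 6, ... are known when scale == 3). A vertical pass with the
// same helper then fills the remaining rows.
void fill_horizontal(std::vector<std::uint8_t>& img, int width, int height, int scale)
{
    for (int i = 0; i < height; i += scale)                 // only rows with known pixels
        for (int j = 0; j < width; ++j)
        {
            int x = j % scale;
            if (x == 0) continue;                           // known source pixel, leave it
            int left  = j - x;                              // nearest known pixel to the left
            int right = std::min(left + scale, width - 1);  // nearest known pixel to the right (clamped at the edge)
            img[i * width + j] = interpolate_between(img[i * width + left],
                                                     img[i * width + right], x, scale);
        }
}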

OpenGL 2.1: Rebuffering sub-region in VBO

I have a terrain mesh stored in a VBO. The mesh is a grid composed of right triangles. In other words, it looks like a rectilinear grid with diagonals. The width and height of the mesh are known, so it's easy to calculate the vertex indices for a given XY or vice-versa.
The terrain mesh will be editable. My question concerns rebuffering the vertex data when the terrain is edited. I will be able to determine the rectangular region of vertices that are dirtied by any edit operation, so obviously I'd prefer to rebuffer only those and leave the rest alone.
The first thing that comes to mind is glBufferSubData. But I can't come up with a way to lay out my VBO such that glBufferSubData would only affect the dirty vertices. For example, suppose my mesh is 5 x 5 vertices. (It would actually be much larger; this is just an example.) Like this:
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
(Each number in the diagram above represents the vertex's offset from the start of the VBO.)
Suppose the 3 x 3 region in the center needs to be rebuffered. That means I want to hit vertices 6, 7, 8, 11, 12, 13, 16, 17, and 18. So I could call glBufferSubData starting at index 6 and ending at 18:
0 1 2 3 4
5 *6 *7 *8 *9
*10 *11 *12 *13 *14
*15 *16 *17 *18 19
20 21 22 23 24
(In the diagram above, the vertices marked with * are rebuffered.)
Notice that vertices 10, 14, and 15 are not dirty, and yet they get rebuffered, because they're in the range given to glBufferSubData. This strikes me as inefficient. For a large mesh, I'd be rebuffering way more data than I need to in most cases.
Is there a well-known solution to this problem? Should I call glBufferSubData once per row (which would solve the present problem, but would come with its own overhead)? Or is it standard just to buffer the full range and eat the cost of the unnecessary writing?
Also, terrain editing happens sometimes but not often. When it does, it will be animated, so the dirty vertices will have to be updated repeatedly while the animation is occurring. I'm thinking GL_DYNAMIC_DRAW would be good. Does this sound right?
You should use buffer object mapping. You can access the buffer object as a memory array and update only the scattered dirty vertices; the driver will (hopefully) optimize this for you.
The use of GL_DYNAMIC_DRAW is correct.
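A minimal sketch of the mapping approach for the 5 x 5 example, assuming a tightly packed position-only vertex struct and a CPU-side copy of the mesh:
#include <GL/glew.h>   /* or any loader that exposes glMapBuffer / glUnmapBuffer */

typedef struct { float x, y, z; } Vertex;

/* Rewrite only the dirty rectangle of vertices (inclusive bounds, in grid coordinates). */
void update_dirty_region(GLuint vbo, const Vertex *cpu_mesh, int mesh_width,
                         int x0, int y0, int x1, int y1)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    Vertex *gpu = (Vertex *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (!gpu)
        return;
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
        {
            int i = y * mesh_width + x;   /* same row-major layout as the diagram */
            gpu[i] = cpu_mesh[i];         /* touch only the dirty vertices */
        }
    glUnmapBuffer(GL_ARRAY_BUFFER);
}
Calling glBufferSubData once per dirty row is a reasonable alternative if mapping the whole buffer turns out to stall the pipeline.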

Designing a grid overlay based on longitudes and latitudes

I'm trying to figure out the best way to approach the following:
Say I have a flat representation of the earth. I would like to create a grid that overlays this, with each square on the grid corresponding to about 3 square kilometers. Each square would have a unique region id. This grid would just be stored in a database table that would have a region id and then probably the long/lat coordinates of the four corners of the region, right? Any suggestions on how to generate this table easily? I know I would first need to find out the width and height of this "flattened earth" in km, calculate the number of regions, and then somehow assign the long/lats to each intersection of the vertical and horizontal lines; however, this sounds like a lot of manual work.
Secondly, once I have that grid table created, I need to design a function that takes a long/lat pair and determines which logical "region" it is in. I'm not sure how to go about this.
Any help would be appreciated.
Thanks.
Assume the Earth is a sphere with radius R = 6371 km.
Start at (lat, long) = (0, 0) deg. Around the equator, 3km corresponds to a change in longitude of
dlong = 3 / (2 * pi * R) * 360
= 0.0269796482 degrees
If we walk around the equator and put a marker every 3km, there will be about (2 * pi * R) / 3 = 13343.3912 of them. "About" because it's your decision how to handle the extra 0.3912.
From (0, 0), we walk north 3 km to (lat, long) = (0.0269796482, 0). We will walk around the Earth again on a path that is locally parallel to the first path we walked. Because it is a little closer to the North Pole, the radius of this circle is a bit smaller than that of the first circle we walked. Let's use lower-case r for this radius:
r = R * cos(lat)
  = 6371 * cos(0.0269796482 deg)
  = 6370.9993 km
We calculate dlong again using the smaller radius:
dlong = 3 / (2 * pi * r) * 360
      = 0.0269796512 deg
We put down the second set of flags. This time there are about (2 * pi * r) / 3 = 13343.3897 of them: marginally fewer than on the equator. A single 3 km step makes almost no difference, but the deficit grows with every step toward the pole; at latitude 60 degrees, r = R * cos(60 deg) is only half of R, so a ring there only has room for about half as many 3 km cells as the equator does.
How do we draw a ribbon of squares when the top line has fewer corners than the bottom line? In fact, as we kept laying out ring after ring toward the pole, we'd find that we started off with pretty good squares, but that the shape of the regions sheared out into increasingly extreme parallelograms.
We need a different strategy that gives us the same number of corners above and below. If the lower boundary (SW-SE) is 3 km long, then the top should be a little shorter, to make a ribbon of trapeziums.
There are many ways to craft a compromise that approximates your ideal square grid. This Wikipedia article on map projections that preserve a metric property links to several dozen such strategies.
The specifics of your app may allow you to simplify things considerably, especially if you don't really need to map the entire globe.
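A small Python sketch of that ring arithmetic, assuming a spherical Earth of radius 6371 km:
import math

R = 6371.0   # Earth radius in km (sphere approximation)
CELL = 3.0   # desired cell edge length in km

def ring_stats(lat_deg):
    """Radius of the circle of constant latitude, how many 3 km cells fit
    around it, and the longitude step (degrees) corresponding to 3 km."""
    r = R * math.cos(math.radians(lat_deg))
    cells = 2 * math.pi * r / CELL
    dlong = CELL / (2 * math.pi * r) * 360
    return r, cells, dlong

for lat in (0, 30, 45, 60):
    r, cells, dlong = ring_stats(lat)
    print(f"lat {lat:2d}: r = {r:7.1f} km, ~{cells:8.1f} cells, dlong = {dlong:.6f} deg")
The number of cells per ring falls off with cos(latitude), which is exactly why a fixed square grid cannot close up as you move toward the poles.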
Microsoft has been investing in spatial data types in its SQL Server 2008 offering, which could help you out here: it has data types to represent your flattened-earth regions, operators to determine whether a set of coordinates is inside a geometry, and so on. Even if you choose not to use it, consider checking out the following links. The second one in particular has a lot of good background information on the problem and a discussion of some of the industry-standard data formats for spatial data.
http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx
http://jasonfollas.com/blog/archive/2008/03/14/sql-server-2008-spatial-data-part-1.aspx
First, Paul is right. Unfortunately the earth is round, which really complicates the heck out of this stuff.
I created a grid similar to this for a topographical mapping server many years ago. I just recorded the coordinates of the upper-left corner of each region. I also used UTM coordinates instead of lat/long. If you know that each region covers 3 square kilometers, and since UTM is based on meters, it is straightforward to do a range query to discover the right region.
You do realize that because the earth is a sphere that "3 square km" is going to be a different number of degrees near the poles than near the equator, right? And that at the top and bottom of the map your grid squares will actually represent pie-shaped parts of the world, right?
I've done something similar with my database - I've broken it up into quad cells. So what I did was divide the earth into four quarters (-180,-90)-(0,0), (-180,0)-(0,90) and so on. As I added point entities to my database, if the "cell" got more than X entries, I split the cell into 4. That means that in areas of the world with lots of point entities, I have a lot of quad cells, but in other parts of the world I have very few.
My database for the quad tree looks like:
\d areaids;
            Table "public.areaids"
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 areaid       | integer                     | not null
 supercededon | timestamp without time zone |
 supercedes   | integer                     |
 numpoints    | integer                     | not null
 rectangle    | geometry                    |
Indexes:
    "areaids_pk" PRIMARY KEY, btree (areaid)
    "areaids_rect_idx" gist (rectangle)
Check constraints:
    "enforce_dims_rectangle" CHECK (ndims(rectangle) = 2)
    "enforce_geotype_rectangle" CHECK (geometrytype(rectangle) = 'POLYGON'::text OR rectangle IS NULL)
    "enforce_srid_rectangle" CHECK (srid(rectangle) = 4326)
I'm using PostGIS to help find points in a cell. If I look at a cell, I can tell if it's been split because supercededon is not null. I can find its children by looking for rows whose supercedes equals its id. And I can dig down from top to bottom until I find the ones that cover the area I'm concerned about, by looking for ones with supercededon null and whose rectangle overlaps my area of interest (using the PostGIS '&&' operator).
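As a sketch, finding the current (non-superseded) cells that cover a point of interest looks something like this against the table above (the coordinates are just an example):
select areaid, numpoints
from areaids
where supercededon is null
  and rectangle && ST_SetSRID(ST_MakePoint(-122.33, 47.61), 4326);
Since the cells are axis-aligned rectangles, the bounding-box overlap operator && acts as a containment test here; ST_Contains(rectangle, point) would make that explicit.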
There's no way you'll be able to do this with rectangular cells, but I've just finished an R package dggridR which would make this easy to do using a grid of hexagonal cells. However, the 3km cell requirement might yield so many cells as to overload your machine.
You can use R to generate the grid:
install.packages('devtools')
install.packages('rgdal')
library(devtools)
devtools::install_github('r-barnes/dggridR')
library(dggridR)
library(rgdal)
#Construct a discrete global grid (geodesic) with cells of ~3 km^2
dggs <- dgconstruct(area=100000, metric=FALSE, resround='nearest')
#Get a hexagonal grid for the whole earth based on this dggs
grid <- dgearthgrid(dggs,frame=FALSE)
#Save the grid
writeOGR(grid, "grid_3km_cells.kml", "cells", "KML")
The KML file then contains the ids and edge vertex coordinates of every cell.
The grid looks a little like this:
My package is based on Kevin Sahr's DGGRID which can generate this same grid to KML directly, though you'll need to figure out how to compile it yourself.