Load partial shapefile into PostGIS / GeoDjango project - django

I have a shapefile with Canadian postal codes, but am only looking to load a small subset of the data. I can load the entire data file and use SQL or Django queries to prune the data, but the load process takes about 2 hours on the slower machines I'm using.
As the data I'm actually after is about 10% of the dataset, this isn't a very efficient process.
I'm following the instructions in the GeoDjango tutorial, specifically the following code:
from django.contrib.gis.utils import LayerMapping
from geoapp.models import TestGeo

mapping = {'name' : 'str',     # The 'name' model field maps to the 'str' layer field.
           'poly' : 'POLYGON', # For geometry fields use OGC name.
           }                   # The mapping is a dictionary
lm = LayerMapping(TestGeo, 'test_poly.shp', mapping)
lm.save(verbose=True)  # Save the layermap, imports the data.
Is there a way to only import data with a particular name, as in the example above?
I'm limited to the Linux / OS X command line, so wouldn't be able to utilize any GUI tools.

Thanks to everyone here and on PostGIS for their help, particularly ThomasG77 for this answer.
The following line did the trick:
ogr2ogr PostalCodes.shp CANmep.shp -sql "select * from CANmep where substr(postalcode,1,3) in ('M1C', 'M1R')"
ogr2ogr comes with GDAL. brew install gdal will install GDAL on OS X. If you're on another *nix system, the following installs it from source:
$ wget http://download.osgeo.org/gdal/gdal-1.9.2.tar.gz
$ tar xzf gdal-1.9.2.tar.gz
$ cd gdal-1.9.2
$ ./configure
$ make
$ sudo make install
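
If you'd rather stay in Python, the same SQL filter can be run through GDAL's OGR bindings, which are installed alongside GDAL. This is a minimal sketch, assuming the osgeo package is importable and the layer and field names match the command above:

from osgeo import ogr

# Open the source shapefile and run the same attribute filter as the
# ogr2ogr command above.
src = ogr.Open('CANmep.shp')
filtered = src.ExecuteSQL(
    "select * from CANmep where substr(postalcode,1,3) in ('M1C', 'M1R')")

# Write the filtered features out as a new shapefile.
driver = ogr.GetDriverByName('ESRI Shapefile')
dst = driver.CreateDataSource('PostalCodes.shp')
dst.CopyLayer(filtered, 'PostalCodes')

src.ReleaseResultSet(filtered)  # free the SQL result layer
dst = None                      # close the datasource to flush features to disk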

If the postal codes you need won't change for some time, try creating a shapefile of just the selected postal codes with QGIS. If you're not familiar with QGIS, it's worth looking into. I use it to prepare files for the web app before uploading, such as CRS conversion, editing the attribute table, and maybe simplifying the geometry.
There's plenty of tutorials and great help at gis.stackexchange.
If you haven't done so already, take this question to gis.stackexchange.
Hope this helps get you started, and feel free to ask for more info. I was new to Django/GeoDjango not long ago and appreciated all of the help I received. Django is not for the faint of heart.
Michael

How do I find the ID to download an ImageNet subset?

I am new to ImageNet and would like to download full-sized images from one of the subsets/synsets; however, I have found it incredibly difficult to find out which subsets are available and where to find the ID code so I can download them.
All previous answers (from only 7 months ago) contain links which are now all invalid. Some seem to imply there is some sort of algorithm for making up an ID, as it is linked to WordNet?
Essentially, I would like a dataset of plastic or plastic waste, or ideally marine debris. Any help on how to get the relevant ImageNet ID, or suggestions for other datasets, would be much appreciated!
I used this repo to achieve what you're looking for. Follow these steps:
Create an account on the ImageNet website
Once you get permission, download the list of WordNet IDs for your task
Once you have the .txt file containing the WordNet IDs, you are all set to run main.py
You can adjust the number of images per class as needed
By default, ImageNet images are automatically resized to 224x224. To remove that resizing, or to implement other kinds of preprocessing, simply modify the code at line #40
Source: refer to this Medium article for more details.
You can find all the 1000 classes of ImageNet here.
EDIT:
The above method no longer works as of March 2021. As per this update:
The new website is simpler; we removed tangential or outdated functions to focus on the core use case—enabling users to download the data, including the full ImageNet dataset and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
So, to parse and search ImageNet, you may now have to use nltk, as sketched below.
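As a hedged sketch of that nltk route: ImageNet wnids are formed from a synset's part-of-speech letter plus its zero-padded 8-digit WordNet offset (e.g. n02084071 for dog), so you can search for candidate IDs like this:

import nltk
nltk.download('wordnet')  # one-time download of the WordNet corpus
from nltk.corpus import wordnet as wn

# Print ImageNet-style IDs (wnids) for noun synsets matching a query term.
for synset in wn.synsets('plastic', pos=wn.NOUN):
    wnid = 'n{:08d}'.format(synset.offset())
    print(wnid, synset.definition())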
More recently, the organizers hosted a Kaggle challenge based on the original dataset with additional labels for object detection. To download the dataset you need to register a Kaggle account and join this challenge. Please note that by doing so, you agree to abide by the competition rules.
Please be aware that this file is very large (168 GB) and the download will take anywhere from minutes to days depending on your network connection.
Install the Kaggle CLI and set up credentials as per this guideline.
pip install kaggle
Then run these:
kaggle competitions download -c imagenet-object-localization-challenge
unzip imagenet-object-localization-challenge.zip -d <YOUR_FOLDER>
Additionally, to understand the ImageNet hierarchy, refer to this.

Can't find pvlib.pvsystem.Array

I am using pvlib-python to model a series of photovoltaic installations.
I have been running the normal pvlib-python procedural code just fine (as described in the intro tutorial).
I am now trying to extend my model to be able to cope with several arrays of panels in different directions etc., but connected to the same inverter. For this I thought the easiest way would be to use pvlib.pvsystem.Array to create a list of Array objects that I can then pass to the pvlib.pvsystem.PVSystem class (as described here).
My issue now is that I can't find pvsystem.Array at all; e.g., I'm just getting:
AttributeError: module 'pvlib.pvsystem' has no attribute 'Array'
when I try to create an instance of Array using:
from pvlib import pvsystem
module_parameters = {'pdc0': 5000, 'gamma_pdc': -0.004}
array_one = pvsystem.Array(module_parameters=module_parameters)
array_two = pvsystem.Array(module_parameters=module_parameters)
system_two_arrays = pvsystem.PVSystem(arrays=[array_one, array_two],
                                      inverter_parameters=inverter_parameters)
as described in the examples on the PVSystem and Arrays page.
I am using pvlib-python=0.8.1, installed in my conda env using conda install -c conda-forge pvlib-python.
I am quite confused about this since I can obviously see all the documentation on pvsystem.Array on read-the-docs and see the source code on pvlib's github.
When I look at the code in my conda env, it doesn't have Array under pvsystem (nor does it show up if I list the module contents with dir(pvlib.pvsystem)), so something is wrong with the installation, but I simply can't figure out what. I've tried reinstalling pvlib and using different installation methods, but I always hit the same issue.
Am I missing something really obvious here?
Kind regards and thank you,
This feature is not present in the current stable version (0.8.1). If you want to use it already, you could download the latest source as a zip file and install it, or clone the pvlib git repository on your computer.
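For instance, assuming pip and git are available, one way to install the development version straight from pvlib's GitHub repository is:
pip install git+https://github.com/pvlib/pvlib-python.git
A quick way to confirm the running interpreter actually picked up the new version:

import pvlib
print(pvlib.__version__)  # should now report a version newer than 0.8.1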

Why doesn't pm4py.view generate any image?

For a business process discovery task, I am trying to generate a process model using the pm4py Python library. Here's a sample code:
!pip install pm4py
import pm4py
log = pm4py.read_xes('/content/running-example.xes')
process_model, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
pm4py.view_petri_net(process_model, initial_marking, final_marking, format="svg")
However, I get output as:
parsing log, completed traces :: 100%
6/6 [00:00<00:00, 121.77it/s]
But no image as is expected from the website: https://pm4py.fit.fraunhofer.de/getting-started-page#discovery
Being relatively new to the world of Python, what I learnt from other coders' suggestions here on SO is to always read the source code in depth in the case of open-source libraries.
Here are the pm4py visualization links:
https://github.com/pm4py/pm4py-core/blob/afee8b0932283b8f8f02dd2b6cc0968a1f1cc723/pm4py/visualization/process_tree/visualizer.py#L69
and specifically for my example:
https://github.com/pm4py/pm4py-core/blob/afee8b0932283b8f8f02dd2b6cc0968a1f1cc723/pm4py/vis.py#L17
But I am not able to figure out how to manipulate it.
Can someone please point out the problem to me and help me generate the views? Also, if anyone has done business process discovery before, could you suggest any libraries or techniques to analyse event-log data? It would be really helpful.
To visualize the process models mined in PM4Py, make sure that you have Graphviz installed on your computer.
See https://pm4py.fit.fraunhofer.de/install for more information on this.
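For reference, installing both pieces might look like this (package manager commands assumed for Debian/Ubuntu and Homebrew; PM4Py needs the Graphviz dot binary on the PATH, not just the Python bindings):
pip install graphviz
sudo apt-get install graphviz   # Debian/Ubuntu
brew install graphviz           # OS X
If a viewer still doesn't open (for example inside a notebook), recent pm4py versions can also write the image straight to a file with the save variant of the same call:

pm4py.save_vis_petri_net(process_model, initial_marking, final_marking, 'net.png')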

WEKA Incremental Learning?

The idea of incremental learning, as I understand it, is that after training I save my model, and when new data arrives, instead of retraining on the old data together with the new, I just load the saved model and train it further using only the new data, so the newly trained model builds on top of the old one.
I have searched for this in WEKA and found that it can be done using "Incremental Algorithms". I know that HoeffdingTree is an incremental version of the J48 algorithm, but I am not sure how to do the incremental learning.
Could anybody explain whether this is possible in WEKA and how it could be done?
In order to do incremental learning in WEKA, you have to choose classifiers that implement the UpdateableClassifier interface. There are 10 classifiers that can do this. Note that this can only be done from code or the command line.
You have to first build your model from training data, then save the model. After that, you load the same model and train it further.
Using the HoeffdingTree algorithm, it would be something like this:
java weka.classifiers.trees.HoeffdingTree -L 2 -S 0 -E 1.0E-7 -H 0.1 -M 0.01 -G 200.0 -N 0.0 -t Training.arff -no-cv -d ht.model
java weka.classifiers.trees.HoeffdingTree -t Training.arff -T Testing.arff -l ht.model -d ht.updated.model
Of course, there is no need to specify the training parameters again when updating the model, because these settings are already saved in the model.
For more information:
http://weka.8497.n7.nabble.com/WEKA-Incremental-Learning-Training-td35691.html
https://weka.wikispaces.com/Classification-Train/test%20set#Classification-Building%20a%20Classifier-Incremental

How to Convert a Maxmind .MMDB to .DAT?

How do I convert MaxMind's MMDB GeoIP format to the DAT format so that I can use it with ModSecurity + Apache? ModSecurity supports only the DAT format.
As of February 2019, the following Python script is the best option for converting GeoIP2 MMDB format to legacy .dat format:
https://github.com/sherpya/geolite2legacy
Using this script, somebody has done the conversion and made the resulting .dat files available for download:
https://www.miyuru.lk/geoiplegacy
The Legacy GeoIP builds (.dat) are not going away in the near future. If they do ever go away, you could build off of the .dat build program that Debian uses for its GeoLite databases (copy of it on GitHub) or this (untested) Python script.
Firstly, what I have to say to some here: you are required by MaxMind to update to new databases within 30 days of their release (EULA point 4.c), so using old databases is actually not legitimate; also, the data in old databases is simply outdated (probably no longer valid), so why use it in the first place?