Save GeoTools query result in a shapefile or CSV file

I have implemented the GeoTools Query tutorial. Can someone tell me how I could apply a query to two shapefiles at a time and save the results in a new shapefile or a CSV file?
Or can I even save the query results of a single shapefile, as done in the Query tutorial, to a new shapefile or CSV file?

You can apply your Query to as many shapefiles as you like: just loop through them and keep adding to the feature collection where you are storing the answers.
So in pseudocode:
collection = new List()
for file in shapefiles:
    collection.add(file.queryResult(query))
writeShapefile(collection)
You can actually use a normal Java ArrayList (or Set) to store the results in and then use a ListFeatureCollection to write them out.
FeatureCollection features = new ListFeatureCollection(schema, feats);
Caveat: all your shapefiles must have the same schema (or your Query must only extract the common attributes). See my answer to this question for an example of shapefile writing.

Related

Can I add a new column without rewriting an entire file?

I've been experimenting with Apache Arrow. I have used column-oriented memory-mapped files for many years. In the past, I've used a separate file for each column. Arrow seems to prefer storing everything in one file. Is there a way to add a new column without rewriting the entire file?
The short answer is probably no.
Arrow's in-memory format & libraries support this. You can add a chunked array to a table by just creating a new table (this should be zero-copy).
However, it appears you are talking about storing tables in files. None of the common file formats in use (parquet, csv, feather) support partitioning a table in this way.
Keep in mind, if you are reading a parquet file, you can specify which column(s) you want to read and it will only read the necessary data. So if your goal is only to support individual column retrieval/query then you can just build one large table with all your columns.
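For example, with pyarrow (assuming the Python bindings; the file name below is made up), adding a column in memory versus reading back a single column from parquet looks roughly like this:
import pyarrow as pa
import pyarrow.parquet as pq

# In memory: "adding" a column really builds a new Table that reuses the
# existing column buffers, so it is cheap (zero-copy).
table = pa.table({"a": [1, 2, 3]})
table = table.append_column("b", pa.chunked_array([[4.0, 5.0, 6.0]]))

# On disk there is no in-place append, but parquet lets you read back
# only the columns you actually need.
pq.write_table(table, "data.parquet")
subset = pq.read_table("data.parquet", columns=["b"])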

Alternatives to dynamically creating model fields

I'm trying to build a web application where users can upload a file (specifically the MDF file format) and view the data in the form of various charts. The files can contain any number of time-based signals (of various numeric data types), and users may give the signals arbitrary names.
My thought on saving the data involves 2 steps:
Maintain a master table as an index, to save such meta information as file names, who uploaded it, when, etc. Records (rows) are added each time a new file is uploaded.
Create a new table (I'll refer to these as data tables) for each uploaded file; within each table, every column will be one signal (the first column being timestamps).
This brings up the problem that I can't pre-define the Model for the data tables, because the number, names, and datatypes of the fields will differ among virtually all uploaded files.
I'm aware of some libs that help to build runtime dynamic models, but they're all dated and questions about them on SO basically get zero answers. So despite the effort to make it work, I'm not even sure my approach is the optimal way to do what I want to do.
I also came across this Postgres-specific model field which can take nested arrays (which I believe fits the 2-D time-based signal lists). In theory I could parse the raw uploaded file, construct such an array, and basically save all the data in one field. Since I don't know the limit on the size of the data, this could also be a nightmare for queries later on, because creating the charts usually takes only a few signal columns at a time, out of a total of up to hundreds of signals.
So my question is:
Is there a better way to organize the storage of data? And how?
Any insight is greatly appreciated!
If the names, number, and datatypes of the fields will differ for each user, then you do not need an ORM. What you need is a query builder or SQL string composition, as provided by Psycopg. You will be programmatically creating a table for each combination of user and uploaded file (if they are different) and programmatically inserting the records.
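A rough sketch of that with psycopg2's sql module (the signal names, types, table name, and connection string below are invented for illustration):
import psycopg2
from psycopg2 import sql

# Hypothetical signals parsed out of an uploaded MDF file.
# The type strings should come from your own type mapping, not raw user input.
signals = [("engine_speed", "double precision"), ("gear", "integer")]

conn = psycopg2.connect("dbname=mdf_app")  # made-up connection string
with conn, conn.cursor() as cur:
    columns = [
        sql.SQL("{} {}").format(sql.Identifier(name), sql.SQL(dtype))
        for name, dtype in signals
    ]
    # Compose a per-file CREATE TABLE from the user-named signals,
    # with the identifiers safely quoted.
    cur.execute(
        sql.SQL("CREATE TABLE {} (ts double precision, {})").format(
            sql.Identifier("datafile_42"),      # one data table per upload
            sql.SQL(", ").join(columns),
        )
    )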
Using PostgreSQL might be a good choice; you might also create a GIN index on the arrays to speed up queries.
However, if you are primarily working with time-series data, then using a time-series database like InfluxDB or Prometheus makes more sense.

To extract information from IFC file using ifcopenshell

I am working on extracting data from an IFC file using ifcopenshell. So far I have extracted the entities that are needed, i.e. I have extracted the structural model from the architectural model.
But now the main problem is that I want to obtain information from my IFC file. I want to ask ifcopenshell questions like:
How many columns are there?
What is the total area of the structure?
What is the size of the column?
What is the loading on column?
These are some of the questions that I expect ifcopenshell to answer. I need this information further on for designing.
Any help that can point me in the right direction will really be appreciated.
It might not be possible to answer all those questions.
How many columns are there?
Query for IfcColumn and count the size of the result set. This might be inaccurate if objects that should be columns aren't typed as columns.
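For example (the file name here is made up):
import ifcopenshell

model = ifcopenshell.open("building.ifc")   # hypothetical file name
columns = model.by_type("IfcColumn")        # by_type also matches subtypes
print(len(columns), "columns found")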
What is the total area of the structure?
This might be given as a property of the complete model (if the creator of the file included that information). If not, you can try to get an estimate based on the model size. The third option would be to see if there are annotations/measurements or a 2D floor plan in the file and use that information.
What is the size of the column?
Again look for properties. Model size might not be appropriate as it is based on what is displayed. For design different sizes/measurements are probably relevant (I am not an architect, so I don't know about building design).
What is the loading on column?
This information is probably only available through properties. A look at the standard properties at http://www.buildingsmart-tech.org/ifc/IFC4/Add2TC1/html/link/ifccolumn.htm doesn't reveal a property relating to your question, so either the load information is included in a custom property by the program/creator of that IFC file, or you need an external source. IfcOpenShell only does IFC file parsing and geometry rendering, no additional calculations.
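If the authoring tool did write such data, it usually ends up in property sets attached to the element. A rough sketch of dumping them with ifcopenshell (the file name is made up and the property names will vary between files):
import ifcopenshell

model = ifcopenshell.open("building.ifc")   # hypothetical file name
for column in model.by_type("IfcColumn"):
    for rel in column.IsDefinedBy:
        if rel.is_a("IfcRelDefinesByProperties"):
            definition = rel.RelatingPropertyDefinition
            if definition.is_a("IfcPropertySet"):
                for prop in definition.HasProperties:
                    if prop.is_a("IfcPropertySingleValue") and prop.NominalValue:
                        print(column.GlobalId, definition.Name,
                              prop.Name, prop.NominalValue.wrappedValue)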

Qt splitting data structure into groups

I have a problem I'm trying to solve, but I'm at a standstill due to the fact that I'm in the process of learning Qt, which in turn is causing doubts as to what the 'Qt' way of solving the problem is, whilst also being the most efficient in terms of time complexity. So I read a file line by line (anywhere from 10 to 2,000,000 lines per file). At the moment my approach is to dump every line into a QVector.
QVector<QString> lines;
lines.append("id,name,type");
lines.append("1,James,A");
lines.append("2,Mark,B");
lines.append("3,Ryan,A");
Assuming the above structure, I would like to give the user three views that present the data based on the type field. The data is comma-delimited in its original form. My question is: what's the most elegant and possibly most efficient way to achieve this?
Note: for visual aid, the end result kind of emulates Microsoft Access, so there will be a list of tables on the left side. In my case these table names will be the values of the grouping field (A, B). And when I switch between those two list items, the central view (a table) will refill to contain that particular group's data.
Should I split the data into x amount of structures ? Or would that cause unnecessary overhead ?
Would really appreciate any help
In the end, you'll want to have some sort of a data model that implements QAbstractItemModel that exposes the data, and one or more views connected to it to display it.
If the data doesn't have to be editable, you could implement a custom table model derived from QAbstractTableModel that maps the file in memory (using QFile::map), and incrementally parses it on the fly (implement canFetchMore and fetchMore).
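To illustrate the shape of that approach, here is a minimal sketch of such an incrementally populated model. It is written with PySide6 purely to keep it short; the overridden methods (rowCount, columnCount, data, canFetchMore, fetchMore) are exactly the virtual functions you would implement in C++, and the plain line-by-line reading stands in for a QFile::map-based parser:
from PySide6.QtCore import QAbstractTableModel, QModelIndex, Qt

class CsvModel(QAbstractTableModel):
    BATCH = 256  # rows parsed per fetchMore() call

    def __init__(self, path, parent=None):
        super().__init__(parent)
        self._file = open(path, "r")        # stand-in for QFile + map() in C++
        self._header = self._file.readline().rstrip("\n").split(",")
        self._rows = []
        self._more = True

    def rowCount(self, parent=QModelIndex()):
        return len(self._rows)

    def columnCount(self, parent=QModelIndex()):
        return len(self._header)

    def data(self, index, role=Qt.ItemDataRole.DisplayRole):
        if role == Qt.ItemDataRole.DisplayRole:
            return self._rows[index.row()][index.column()]
        return None

    def canFetchMore(self, parent=QModelIndex()):
        # True until the whole file has been parsed.
        return self._more

    def fetchMore(self, parent=QModelIndex()):
        # Parse the next batch of lines and append them as new rows.
        batch = []
        for _ in range(self.BATCH):
            line = self._file.readline()
            if not line:
                self._more = False
                break
            batch.append(line.rstrip("\n").split(","))
        if batch:
            first = len(self._rows)
            self.beginInsertRows(QModelIndex(), first, first + len(batch) - 1)
            self._rows.extend(batch)
            self.endInsertRows()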
If the data is to be editable, you might be best off throwing it all into a temporary sqlite table as you parse the file, attaching a QSqlTableModel to it, and attaching some views to it.
When the user wants to save the changes, you simply iterate over the model and dump it out to a text file.

How do I join huge CSV files (1000s of columns x 1000s of rows) efficiently using C/C++?

I have several (1-5) very wide (~50,000 columns) .csv files. The files are (.5GB-1GB) in size (avg. size around 500MB). I need to perform a join on the files on a pre-specified column. Efficiency is, of course, the key. Any solutions that can be scaled out to efficiently allow multiple join columns is a bonus, though not currently required. Here are my inputs:
- Primary File
- Secondary File(s)
- Join column of Primary File (name or col. position)
- Join column of Secondary File (name or col. position)
- Left Join or Inner Join?
Output = 1 File with results of the multi-file join
I am looking to solve the problem using a C-based language, but of course an algorithmic solution would also be very helpful.
Assuming that you have a good reason not to use a database (for all I know, the 50,000 columns may constitute such a reason), you probably have no choice but to clench your teeth and build yourself an index for the right file. Read through it sequentially to populate a hash table where each entry contains just the key column and an offset in the file where the entire row begins. The index itself then ought to fit comfortably in memory, and if you have enough address space (i.e. unless you're stuck with 32-bit addressing) you should memory-map the actual file data so you can access and output the appropriate right rows easily as you walk sequentially through the left file.
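To make that concrete, here is a language-agnostic sketch of the two-pass inner join (written in Python only for brevity; the same structure maps directly onto C with a hash table and mmap, and the naive comma splitting ignores quoted fields):
import csv

def inner_join(left_path, right_path, left_key, right_key, out_path):
    # Pass 1: index the right (secondary) file as key -> byte offset of the row.
    offsets = {}
    with open(right_path, "rb") as right:
        right_header = right.readline().decode().rstrip("\r\n").split(",")
        rkey = right_header.index(right_key)
        pos = right.tell()
        line = right.readline()
        while line:
            offsets[line.decode().rstrip("\r\n").split(",")[rkey]] = pos
            pos = right.tell()
            line = right.readline()

    # Pass 2: stream the left (primary) file and seek into the right file.
    with open(left_path, newline="") as left, \
         open(right_path, "rb") as right, \
         open(out_path, "w", newline="") as out:
        reader, writer = csv.reader(left), csv.writer(out)
        left_header = next(reader)
        lkey = left_header.index(left_key)
        writer.writerow(left_header + right_header)
        for row in reader:
            off = offsets.get(row[lkey])
            if off is None:
                continue                      # inner join: drop unmatched rows
            right.seek(off)
            writer.writerow(row + right.readline().decode().rstrip("\r\n").split(","))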
Your best bet by far is something like SQLite; there are C++ bindings for it and it's tailor-made for lightning-fast inserts and queries.
For the actual reading of the data, you can just go row by row and insert the fields into SQLite, no need for cache-destroying objects of objects :) As an optimization, you should group multiple inserts into one statement (insert into table(...) select ... union all select ... union all select ...).
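Sketched with Python's built-in sqlite3 module just to show the shape of it (the table layout, key column, and file names are made up; in C++ you would do the same through the SQLite C API or a binding):
import csv, sqlite3

conn = sqlite3.connect("join.db")
conn.execute("CREATE TABLE primary_file (key TEXT, line TEXT)")     # hypothetical schema
conn.execute("CREATE TABLE secondary_file (key TEXT, line TEXT)")

def load(table, path, key_col):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        k = header.index(key_col)
        # executemany runs all inserts inside one transaction, which serves the
        # same purpose as the big "insert ... select ... union all ..." batch.
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)",
                         ((row[k], ",".join(row)) for row in reader))
    conn.commit()

load("primary_file", "primary.csv", "id")
load("secondary_file", "secondary.csv", "id")

conn.execute("CREATE INDEX idx_secondary_key ON secondary_file (key)")
for joined in conn.execute("SELECT p.line, s.line FROM primary_file p "
                           "JOIN secondary_file s ON p.key = s.key"):
    print(joined)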
If you need to use C or C++, open the file and load the data directly into a database such as MySQL. The C and C++ languages do not have adequate data table structures or functionality for manipulating the data. A spreadsheet application may be useful, but may not be able to handle the capacities.
That said, I recommend objects for each field (column). Define a record (file specific) as a collection of fields. Read a text line from a file into a string. Let the record load the field data from the string. Store records into a vector.
Create a new record for the destination file. For each record from the input file(s), load the new record using those fields. Finally, for each record, print the contents of each field with separation characters.
An alternative is to whip up a 2 dimensional matrix of strings.
Your performance bottleneck will be I/O. You may want to read huge blocks of data in. The thorn in the side of efficiency is the variable record length of a CSV file.
I still recommend using a database. There are plenty of free ones out there, such as MySQL.
It depends on what you mean by "join". Are the columns in file 1 the same as in file 2? If so, you just need a merge sort. Most likely a solution based on merge sort is "best". But I agree with @Blindy above that you should use an existing tool like SQLite. Such a solution is probably more future-proof against changes to the column lists.