I am working on extracting data from an IFC file using ifcopenshell. So far I have extracted the entities that are needed, i.e. I have extracted the structural model from the architectural model.
But now the main problem is that I want to obtain information from my IFC file. I want to ask ifcopenshell questions like:
How many columns are there?
What is the total area of the structure?
What is the size of the column?
What is the loading on column?
These are some of the questions that I am expecting ifcopenshell to answer. I need this information later for design.
Any help that can point me in the right direction will really be appreciated.
It might not be possible to answer all those questions.
How many columns are there?
Query for IfcColumn and count the size of the result set. This might be inaccurate if objects that should be columns aren't typed as columns.
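A minimal sketch of that count, assuming the file has already been exported (the path is a placeholder):
import ifcopenshell

model = ifcopenshell.open("model.ifc")  # placeholder path
columns = model.by_type("IfcColumn")
print(f"Number of columns: {len(columns)}")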
What is the total area of the structure?
This might be given as a property on the complete model (if the creator of the file included that information). If not, you can try to get an estimate based on the model's size. The third option would be to see if there are annotations/measurements or a 2D floor plan in the file and use that information.
What is the size of the column?
Again, look for properties. The model's size might not be appropriate, as it is based on what is displayed. For design, different sizes/measurements are probably relevant (I am not an architect, so I don't know about building design).
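As a starting point, here is a hedged sketch that walks the quantity and property sets attached to each column; the same pattern works for area quantities relevant to the previous question. Whether anything useful comes back depends entirely on what the authoring tool wrote into the file:
import ifcopenshell

model = ifcopenshell.open("model.ifc")  # placeholder path

for column in model.by_type("IfcColumn"):
    for rel in column.IsDefinedBy:
        if not rel.is_a("IfcRelDefinesByProperties"):
            continue
        definition = rel.RelatingPropertyDefinition
        if definition.is_a("IfcElementQuantity"):
            # For the simple quantity types (length, area, volume, ...)
            # the numeric value sits at attribute index 3.
            for quantity in definition.Quantities:
                print(column.GlobalId, quantity.Name, quantity[3])
        elif definition.is_a("IfcPropertySet"):
            for prop in definition.HasProperties:
                if prop.is_a("IfcPropertySingleValue") and prop.NominalValue:
                    print(column.GlobalId, prop.Name, prop.NominalValue.wrappedValue)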
What is the loading on column?
This information is probably only available through properties. A look at the standard properties at http://www.buildingsmart-tech.org/ifc/IFC4/Add2TC1/html/link/ifccolumn.htm doesn't reveal one relating to your question, so either the load information is included in a custom property by the program/creator of that IFC file, or you need an external source. IfcOpenShell only does IFC file parsing and geometry rendering, no additional calculations.
I've been experimenting with Apache Arrow. I have used column-oriented memory-mapped files for many years. In the past, I've used a separate file for each column. Arrow seems to prefer to store everything in one file. Is there a way to add a new column without rewriting the entire file?
The short answer is probably no.
Arrow's in-memory format & libraries support this. You can add a chunked array to a table by just creating a new table (this should be zero-copy).
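For example, with pyarrow (names and values here are placeholders), appending a column produces a new table object but reuses the existing column buffers:
import pyarrow as pa

table = pa.table({"a": [1, 2, 3]})
new_column = pa.chunked_array([[10.0, 20.0, 30.0]])

# append_column returns a new Table; the original column buffers are
# shared, not copied.
table = table.append_column("b", new_column)
print(table.column_names)  # ['a', 'b']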
However, it appears you are talking about storing tables in files. None of the common file formats in use (parquet, csv, feather) support partitioning a table in this way.
Keep in mind, if you are reading a parquet file, you can specify which column(s) you want to read and it will only read the necessary data. So if your goal is only to support individual column retrieval/query then you can just build one large table with all your columns.
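A sketch of that selective read, assuming a Parquet file and placeholder column names:
import pyarrow.parquet as pq

# Only the listed columns are read from the file; the other column chunks
# are skipped entirely.
table = pq.read_table("data.parquet", columns=["timestamp", "sensor_03"])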
I need to determine the best approach for deriving the structure of my Django app's models at runtime from the structure of an uploaded CSV file; the models will then be held constant once they are created in Django.
I have come across several questions relating to dynamically creating/altering Django models at runtime. The consensus was that this is bad practice and that one should know beforehand what the fields are.
I am creating a site where a user can upload a time-series based CSV file with many columns representing sensor channels. The user must then be able to select a field to plot the corresponding data of that field. The data will be approximately 1 billion rows.
Essentially, I am seeking to implement the following steps, but information is scarce and I have never done a job like this before:
User selects a CSV (or DAT) file.
The app then loads only the header row (these files are > 4 GB); a sketch of reading just the header follows this list.
The header row is split by ",".
I use the results from step 3 to create a table for each channel (column), with the field name the same as the header entry for that specific channel.
I then load the corresponding data into the respective tables, and I have my models for my app, which will then not be changed again.
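A rough sketch of steps 2 and 3, pulling just the header out of a multi-gigabyte file (file name and delimiter are placeholders):
def read_header(path, delimiter=","):
    # Read a single line; the rest of the >4 GB file is never loaded.
    with open(path, "r", newline="") as f:
        header_line = f.readline().rstrip("\r\n")
    return header_line.split(delimiter)

channels = read_header("upload.dat")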
Another option I am considering is creating a model with 10 fields, as I know there will never be more than 10 channels, then reading my CSV into that table when a user loads a file and simply leaving the unused fields empty.
Has anyone had experience with similar applications?
That is a lot of records; I have never worked with so many. For performance, the fixed-fields idea sounds best. If you use PostgreSQL you could look at the JSON field, but I don't know its impact on so many rows.
For flexible models you could use the EAV pattern, but in my experience this works only for small data sets.
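A minimal sketch of the fixed-fields idea, with made-up model and field names; unused channels simply stay NULL:
from django.db import models

class Upload(models.Model):
    filename = models.CharField(max_length=255)
    uploaded_at = models.DateTimeField(auto_now_add=True)

class Sample(models.Model):
    upload = models.ForeignKey(Upload, on_delete=models.CASCADE)
    timestamp = models.DateTimeField(db_index=True)
    channel_1 = models.FloatField(null=True, blank=True)
    channel_2 = models.FloatField(null=True, blank=True)
    # channel_3 through channel_10 follow the same pattern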
I'm trying to build a web application where users can upload a file (specifically the MDF file format) and view the data in the form of various charts. The files can contain any number of time-based signals (various numeric data types), and users may name the signals arbitrarily.
My thought on saving the data involves 2 steps:
Maintain a master table as an index, to save such meta information as file names, who uploaded it, when, etc. Records (rows) are added each time a new file is uploaded.
Create a new table (I'll refer to this as data tables) for each file uploaded, within the table each column will be one signal (first column being timestamps).
This brings up the problem that I can't pre-define the model for the data tables, because the number, names, and datatypes of the fields will differ among virtually all uploaded files.
I'm aware of some libs that help to build runtime dynamic models but they're all dated and questions about them on SO basically get zero answers. So despite the effort to make it work, I'm not even sure my approach is the optimal way to do what I want to do.
I also came across this Postgres-specific model field which can take nested arrays (which I believe fits the 2-D time-based signal lists). In theory I could parse the raw uploaded file, construct such an array, and basically save all the data in one field. Not knowing the limit on the size of the data, this could also be a nightmare for the queries later on, since creating the charts usually takes only a few columns of signals at a time, compared to a total of up to hundreds of signals.
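For reference, the nested-array variant I have in mind would look roughly like this (field names are illustrative only):
from django.contrib.postgres.fields import ArrayField
from django.db import models

class SignalFile(models.Model):
    name = models.CharField(max_length=255)
    signal_names = ArrayField(models.CharField(max_length=100))
    # One inner array per signal, the first one holding the timestamps.
    samples = ArrayField(ArrayField(models.FloatField()))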
So my question is:
Is there a better way to organize the storage of data? And how?
Any insight is greatly appreciated!
If the name, number, and datatypes of the fields will differ for each user, then you do not need an ORM. What you need is a query builder or SQL string composition, such as Psycopg provides. You will be programmatically creating a table for each combination of user and uploaded file (if they are different) and programmatically inserting the records.
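A sketch of that with psycopg2's sql module (the table and column names come from the upload, so they are composed as identifiers rather than formatted into the string; the "double precision" column type is an assumption):
from psycopg2 import sql

def create_data_table(conn, table_name, signal_names):
    # Build "CREATE TABLE <file> (ts timestamp, <signal> double precision, ...)"
    columns = [sql.SQL("{} double precision").format(sql.Identifier(name))
               for name in signal_names]
    query = sql.SQL("CREATE TABLE {} (ts timestamp, {})").format(
        sql.Identifier(table_name),
        sql.SQL(", ").join(columns),
    )
    with conn.cursor() as cur:
        cur.execute(query)
    conn.commit()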
Using PostgreSQL might be a good choice; you might also create a GIN index on the arrays to speed up queries.
However, if you are primarily working with time-series data, then using a time-series database such as InfluxDB or Prometheus makes more sense.
I could split the file contents up into separate search documents but then I would have to manually identify this in the results and only show one result to the user - otherwise it will look like there are 2 files that match their search when in fact there is only one.
Also the relevancy score would be incorrect. Any ideas?
So the response from AWS support was to split the files up into separate documents. In response to my concerns regarding relevancy scoring and multiple hits they said the following:
You do raise two very valid concerns for your more challenging use case. With regard to relevance, you already face a very significant problem in that it is harder to establish a strong 'signal' and degrees of differentiation with large bodies of text. If the documents you have are much like reports or whitepapers, a potential workaround may be to index the first X number of characters (or the first identified paragraph) into a "thesis" field. This field could be weighted to better indicate what the document's subject matter may be without manual review.
With regard to result duplication, this will require post-processing on your end if you wish to filter it. You can create a new field that holds a unique "Parent" id shared by every chunk of the whole document. The post-processing can check whether this "Parent" id has already been returned (the first result should be seen as the most relevant), and if it has, filter out the subsequent results. What is doubly useful in such a scenario is that you can include a refinement link in your results that filters on all matches within that particular Parent id.
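A sketch of that de-duplication step, assuming the results arrive ordered by relevance and that each chunk document carries a parent_id field (both names are placeholders):
def dedupe_hits(hits):
    seen_parents = set()
    unique = []
    for hit in hits:  # hits are assumed to be ordered by relevance
        parent_id = hit["fields"]["parent_id"]
        if parent_id in seen_parents:
            continue  # a more relevant chunk of this document was already kept
        seen_parents.add(parent_id)
        unique.append(hit)
    return unique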
I have a problem I'm trying to solve, but I'm at a standstill because I'm still learning Qt, which causes doubts about what the 'Qt' way of solving the problem is while also being the most efficient in terms of time complexity. I read a file line by line (file sizes ranging from 10 to 2,000,000 lines). At the moment my approach is to dump every line into a QVector.
QVector<QString> lines;
lines.append("id,name,type");
lines.append("1,James,A");
lines.append("2,Mark,B");
lines.append("3,Ryan,A");
Assuming the above structure, I would like to give the user three views that present the data based on the type field. The data is comma-delimited in its original form. My question is: what's the most elegant and possibly efficient way to achieve this?
Note: for visual aid, the end result roughly emulates Microsoft Access, so there will be a list of tables on the left side. In my case these table names will be the values of the grouping field (A, B), and when I switch between those two list items the central view (a table) will refill to contain that particular group's data.
Should I split the data into some number of separate structures, or would that cause unnecessary overhead?
Would really appreciate any help
In the end, you'll want to have some sort of a data model that implements QAbstractItemModel that exposes the data, and one or more views connected to it to display it.
If the data doesn't have to be editable, you could implement a custom table model derived from QAbstractTableModel that maps the file in memory (using QFile::map), and incrementally parses it on the fly (implement canFetchMore and fetchMore).
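A sketch of that read-only model, written here in Python/PySide6 for brevity (the C++ structure is the same) and using plain buffered reads instead of QFile::map:
from PySide6.QtCore import QAbstractTableModel, QModelIndex, Qt

class CsvTableModel(QAbstractTableModel):
    BATCH = 1000  # rows parsed per fetchMore call

    def __init__(self, path, parent=None):
        super().__init__(parent)
        self._file = open(path, "r")
        self._header = self._file.readline().rstrip("\n").split(",")
        self._rows = []
        self._at_end = False

    def rowCount(self, parent=QModelIndex()):
        return 0 if parent.isValid() else len(self._rows)

    def columnCount(self, parent=QModelIndex()):
        return 0 if parent.isValid() else len(self._header)

    def data(self, index, role=Qt.DisplayRole):
        if role == Qt.DisplayRole and index.isValid():
            return self._rows[index.row()][index.column()]
        return None

    def canFetchMore(self, parent):
        return not parent.isValid() and not self._at_end

    def fetchMore(self, parent):
        # Parse the next batch of lines and tell the attached views about it.
        batch = []
        for _ in range(self.BATCH):
            line = self._file.readline()
            if not line:
                self._at_end = True
                break
            batch.append(line.rstrip("\n").split(","))
        if batch:
            first = len(self._rows)
            self.beginInsertRows(QModelIndex(), first, first + len(batch) - 1)
            self._rows.extend(batch)
            self.endInsertRows()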
If the data is to be editable, you might be best off throwing it all into a temporary sqlite table as you parse the file, attaching a QSqlTableModel to it, and attaching some views to it.
When the user wants to save the changes, you simply iterate over the model and dump it out to a text file.