What is the most efficient way to import an .STL file in C++? - c++

What is the most efficient strategy for parsing a .STL file?
A critical part of my code is importing an .STL file (a common CAD file format), and this is limiting overall performance.
The .STL file format is summarized here: https://en.wikipedia.org/wiki/STL_(file_format)
Using the ASCII format is required for this application.
The generic format is:
solid name
  facet normal ni nj nk
    outer loop
      vertex v1x v1y v1z
      vertex v2x v2y v2z
      vertex v3x v3y v3z
    endloop
  endfacet
endsolid
However, I've noticed that there are no strict formatting requirements, and the import function must do a minimal amount of error checking. I've done some performance measurements (using chrono), which for a 43,000-line file give:
stl_import() - 1.177568 s
parsing loop - 3.894250 s
Parsing loop:
cout << "Importing " << stl_path << "... ";
auto file_vec = import_stl(stl_path);
for (auto& l : file_vec) {
trim(l);
if (solid_state) {
if (facet_state) {
if (starts_with(l, "vertex")) {
//---------ADD FACE----------//
l.erase(0, 6);
trim(l);
vector<string> strs;
split(strs, l, is_any_of(" "));
point p = { stod(strs[0]), stod(strs[1]), stod(strs[2]) };
facet_points.push_back(p);
//---------------------------//
}
else {
if (starts_with(l, "endfacet")) {
facet_state = false;
}
}
}
else {
if (starts_with(l, "facet")) {
facet_state = true;
//assert(facet_points.size() == 0);
//---------------------------//
// Normals can be ignored //
//---------------------------//
}
if (starts_with(l, "endsolid")) {
solid_state = false;
}
}
}
else {
if (starts_with(l, "solid")) {
solid_state = true;
}
}
if (facet_points.size() == 3) {
triangle facet(facet_points[0], facet_points[1], facet_points[2]);
stl_solid.add_facet(facet);
facet_points.clear();
//check normal
facet.normal();
}
}
The stl_import function is:
std::vector<std::string> import_stl(const std::string& file_path)
{
    std::ifstream infile(file_path);
    SkipBOM(infile);
    std::vector<std::string> file_vec;
    std::string line;
    while (std::getline(infile, line))
    {
        file_vec.push_back(line);
    }
    return file_vec;
}
I have searched for ways to optimize file reading, etc., and I see that using mmap may improve file read speed:
Fast textfile reading in c++
This question is asking: what is the best parsing strategy for a .STL file?

Without data which can be used to measure where the time is spent, it is hard to determine what actually improves the performance. A decent library already doing the job may be the easiest approach. However, the current code uses a few approaches where there may be easy wins for performance. Here are the things I spotted:
The streams library is quite good at skipping leading whitespace. Instead of first reading the spaces and then trimming them off, you may want to use std::getline(infile >> std::ws, line): the std::ws manipulator skips leading whitespace.
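For example, something along these lines (a minimal sketch; infile is the stream from the question's code):
std::string line;
while (std::getline(infile >> std::ws, line)) {
    // leading whitespace is already consumed here, so no trim() call is
    // needed before checking the keyword at the start of the line
}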
Instead of using starts_with() with string literals, I'd rather read each line into a "command" and the tail of the line and compare the commands against std::string const objects: instead of a character comparison it may be sufficient to compare the size.
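A rough sketch of that idea (point and facet_points are the names used in the question's code; the keywords follow the ASCII STL grammar):
std::string const vertex_kw("vertex");
std::string const endfacet_kw("endfacet");
// ... one const std::string per keyword of interest

std::string command;
while (infile >> command) {
    if (command == vertex_kw) {        // std::string equality rejects mismatched
        double x, y, z;                // lengths before comparing characters
        if (infile >> x >> y >> z) {
            facet_points.push_back(point{ x, y, z });
        }
    } else if (command == endfacet_kw) {
        // close the current facet
    }
    // "solid", "facet", "outer", "endloop", "endsolid" handled similarly
}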
Instead of split()ing a std::string into a std::vector<std::string> on whitespace I'd rather reset a suitable stream (probably an std::istringstream but to prevent copying possibly a custom memory stream) and read directly from that:
std::istringstream in; // declared outside the reading loop
// ...
point p;
in.clear(); // get rid of potentially existing errors
in.str(line);
if (in >> p.x >> p.y >> p.z) {
    facet_points.push_back(p);
}
This approach has the added advantage of allowing format checking: I always distrust any input received, even when it is from a trusted source.
If you insist on adjusting the character sequence and/or splitting it into subsequences, I'd strongly recommend using std::string_view (or, in case this C++17 class isn't available, a similar class) to avoid moving characters around.
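For instance, trimming via a std::string_view only narrows the view instead of erasing characters (a small sketch, not tied to any particular library):
#include <string_view>

std::string_view trim_view(std::string_view s) {
    auto const first = s.find_first_not_of(" \t\r");
    if (first == std::string_view::npos) return {};
    auto const last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);   // no characters are copied or moved
}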
Assuming the file is of a significant size, I'd recommend against reading the file into a std::vector<std::string> and then parsing it. Instead, I'd parse the file on the fly: this way the hot memory is immediately reused instead of moving it out of cache for later post-processing. This way dealing with an auxiliary stream (see point 3 above) can be avoided. To prevent an overly complex reading loop I'd split nested sections into appropriate functions, returning from them on closing tags. In addition I'd define input functions for structures like point to simply read them off the stream.
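A minimal sketch of what such an on-the-fly reader could look like; point is assumed to have x, y, z members as in the question's code, and read_facet() is a hypothetical helper that returns when it sees the closing tag:
std::istream& operator>>(std::istream& in, point& p) {
    return in >> p.x >> p.y >> p.z;   // read one point directly off the stream
}

// Reads everything between "facet" and "endfacet"; returns false on malformed input.
bool read_facet(std::istream& in, std::vector<point>& pts) {
    std::string word;
    pts.clear();
    while (in >> word) {
        if (word == "vertex") {
            point p;
            if (!(in >> p)) return false;
            pts.push_back(p);
        } else if (word == "endfacet") {
            return pts.size() == 3;
        }
        // tokens from "normal ni nj nk", "outer loop", "endloop" are skipped
    }
    return false;
}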
Depending on the system you are working on, you may want to call std::ios_base::sync_with_stdio(false) before reading the file: there used to be at least one often used implementation of streams which would benefit from this call.

Related

Is there a data structure for implementing a function equivalent to 'tail -n' command in C++?

I want to write a function equivalent to the Linux tail -n command in C++. Currently I parse through the file's data line by line, incrementing the line count, but if the file gets really big (~gigabytes), this method takes a lot of time! Is there a better approach or a data structure to implement this function?
Here are my 2 methods:
int File::countlines()
{
    int lineCount = 0;
    string str;
    if (file)
    {
        while (getline(file, str))
        {
            lineCount += 1;
        }
    }
    return lineCount;
}
void File::printlines()
{
    int lineCount = 0;
    string line;
    if (file)
    {
        lineCount = countlines();
        file.clear();
        file.seekg(ios::beg);
        if (lineCount <= 10)
        {
            while (getline(file, line))
            {
                cout << line << endl;
            }
        }
        else
        {
            int position = lineCount - 10;
            while (position--)
            {
                getline(file, line);
            }
            while (getline(file, line))
            {
                cout << line << endl;
            }
        }
    }
}
This method is time consuming if the file size increases, so I want to either replace it with another data structure, or write a more efficient code.
One of the things slowing down your program is reading the file twice, so you could instead keep the last n end-of-line positions (n = 10 in your program). The most convenient data structure for this is a circular buffer, but that isn't provided by the standard library as far as I know (Boost has one). It can be implemented with a std::vector of size n and an index to which a modulo of n is applied after incrementing.
With that circular buffer, you can jump immediately to the lowest offset in the file (the next one if the buffer is full) and print the needed lines.
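A compact sketch of that idea (the function name print_tail is made up; it does one pass recording the last n line-start offsets, then seeks once and prints):
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

void print_tail(std::ifstream& file, std::size_t n = 10) {
    std::vector<std::streampos> starts(n);   // circular buffer of line-start offsets
    std::size_t count = 0;
    std::string line;
    std::streampos pos = file.tellg();
    while (std::getline(file, line)) {
        starts[count++ % n] = pos;           // index wraps around via modulo
        pos = file.tellg();
    }
    if (count == 0) return;
    file.clear();                            // clear the EOF flag so seekg works
    file.seekg(count < n ? starts[0] : starts[count % n]);  // oldest remembered offset
    while (std::getline(file, line))
        std::cout << line << '\n';
}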
When I've done this, I've done a generous estimate of the maximum length of a line (e.g., one kilobyte), seeked to that distance from the end, and started reading lines into a circular buffer until the end of the file.
In nearly every case, you get more than n lines, so you just print out the contents of the circular buffer, and you're done. Note, however, that you do need to assure that you read more than n lines, not just n lines. The first line you read will usually only be a partial line, so if you read exactly n lines, the first would probably be only a partial line.
On rare occasion, you haven't gotten the required number of lines, so you seek back twice as far (or other factor of your choice), and restart. If you want to get really fancy, you can extrapolate the number of lines you'll need based on the average length of the lines you did read (but honestly, this is such a rare situation it's not worth a lot of work to optimize it).
This normally works essentially instantly, regardless of file size. I suppose (in theory) for a file with incredibly long lines, it would get slower, but if that's the case, the user has probably made a mistake, and tried to tail something that isn't a text file (which is generally useless anyway).
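Roughly, that strategy could look like this (a sketch; the 1 KiB-per-line estimate and the deque standing in for a circular buffer are assumptions):
#include <deque>
#include <fstream>
#include <iostream>
#include <string>

void tail_from_end(std::ifstream& file, std::size_t n = 10) {
    file.seekg(0, std::ios::end);
    std::streamoff size = file.tellg();
    std::streamoff guess = static_cast<std::streamoff>(1024) * (n + 1);  // generous estimate
    for (;;) {
        bool from_start = guess >= size;
        file.clear();
        file.seekg(from_start ? std::streamoff(0) : size - guess);
        std::deque<std::string> last;              // stands in for a circular buffer
        std::string line;
        if (!from_start) std::getline(file, line); // drop the probably-partial first line
        while (std::getline(file, line)) {
            last.push_back(line);
            if (last.size() > n) last.pop_front();
        }
        if (last.size() >= n || from_start) {
            for (auto const& l : last) std::cout << l << '\n';
            return;
        }
        guess *= 2;    // rare case: not enough lines, seek back twice as far and retry
    }
}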

C++: How to read a lot of data from formatted text files into program?

I'm writing a CFD solver for specific fluid problems. So far the mesh is generated every time the simulation runs, and when the geometry or fluid properties change, the program needs to be recompiled.
For a small-sized problem with a low number of cells, this works just fine. But for cases with over 1 million cells, where fluid properties need to be changed very often, it is quite inefficient.
Obviously, we need to store simulation setup data in a config file, and geometry information in a formatted mesh file.
Simulation.config file
% Dimension: 2D or 3D
N_Dimension= 2
% Number of fluid phases
N_Phases= 1
% Fluid density (kg/m3)
Density_Phase1= 1000.0
Density_Phase2= 1.0
% Kinematic viscosity (m^2/s)
Viscosity_Phase1= 1e-6
Viscosity_Phase2= 1.48e-05
...
Geometry.mesh file
% Dimension: 2D or 3D
N_Dimension= 2
% Points (index: x, y, z)
N_Points= 100
x0 y0
x1 y1
...
x99 y99
% Faces (Lines in 2D: P1->p2)
N_Faces= 55
0 2
3 4
...
% Cells (polygons in 2D: Cell-Type and Points clock-wise). 6: triangle; 9: quad
N_Cells= 20
9 0 1 6 20
9 1 3 4 7
...
% Boundary Faces (index)
Left_Faces= 4
0
1
2
3
Bottom_Faces= 6
7
8
9
10
11
12
...
It's easy to write config and mesh information to formatted text files. The problem is, how do we read these data efficiently into program? I wonder if there is any easy-to-use c++ library to do this job.
Well, well.
You can implement your own API based on a finite element collection, a dictionary, some regex and, after all, apply best practice according to some international standard.
Or you can take a look at these:
GMSH_IO
OpenMesh:
I just used OpenMesh in my last implementation for a C++ OpenGL project.
As a first-iteration solution to just get something tolerable - take #JosmarBarbosa's suggestion and use an established format for your kind of data - which also probably has free, open-source libraries for you to use. One example is OpenMesh developed at RWTH Aachen. It supports:
Representation of arbitrary polygonal (the general case) and pure triangle meshes (providing more efficient, specialized algorithms)
Explicit representation of vertices, halfedges, edges and faces.
Fast neighborhood access, especially the one-ring neighborhood.
[Customization]
But if you really need to speed up your mesh data reading, consider doing the following:
Separate the limited-size meta-data from the larger, unlimited-size mesh data;
Place the limited-size meta-data in a separate file and read it whichever way you like, it doesn't matter.
Arrange the mesh data as several arrays of fixed-size elements or fixed-size structures (e.g. cells, faces, points, etc.).
Store each of the fixed-width arrays of mesh data in its own file - without streaming individual values anywhere: just read or write the array as-is, directly (a sketch of how such a read might look follows this list). You'll know the appropriate size of the read either by looking at the file size or at the metadata.
Finally, you could avoid explicit reading altogether and use memory-mapping for each of the data files. See
fastest technique to read a file into memory?
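A read of one such fixed-size-record array might look roughly like this (the Point layout and file name are made up for illustration):
#include <cstddef>
#include <fstream>
#include <vector>

struct Point { double x, y, z; };

std::vector<Point> read_points(const char* path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at the end to learn the size
    std::streamsize bytes = in.tellg();
    in.seekg(0);
    std::vector<Point> points(static_cast<std::size_t>(bytes) / sizeof(Point));
    in.read(reinterpret_cast<char*>(points.data()),
            static_cast<std::streamsize>(points.size() * sizeof(Point)));
    return points;
}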
Notes/caveats:
If you write and read binary data on systems with different memory layout of certain values (e.g. little-endian vs big-endian) - you'll need to shuffle the bytes around in memory. See also this SO question about endianness.
It might not be worth it to optimize the reading speed as much as possible. You should consider Amdahl's law, and only optimize it to a point where it's no longer a significant fraction of your overall execution time. It's better to lose a few percentage points of execution time, but get human-readable data files which can be used with other tools supporting an established format.
In the following answer I assume:
That if the first character of a line is % then it shall be ignored as a comment.
Any other line is structured exactly as follows: identifier= value.
The code I present will parse a config file following the mentioned assumptions correctly. This is the code (I hope that all needed explanation is in comments):
#include <fstream>       //required for file IO
#include <iostream>      //required for console IO
#include <string>        //required for std::string and std::stod
#include <unordered_map> //required for creating a hashtable to store the identifiers

int main()
{
    std::unordered_map<std::string, double> identifiers;
    std::string configPath;
    std::cout << "Enter config path: ";
    std::cin >> configPath;
    std::ifstream config(configPath); //open the specified file
    if (!config.is_open())            //error if failed to open file
    {
        std::cerr << "Cannot open config file!";
        return -1;
    }
    std::string line;
    while (std::getline(config, line)) //read each line of the file
    {
        if (line.empty() || line[0] == '%') //blank line or comment
            continue;
        std::size_t identifierLength = 0;
        while (line[identifierLength] != '=')
            ++identifierLength;
        identifiers.emplace(
            line.substr(0, identifierLength),
            std::stod(line.substr(identifierLength + 2))
        ); //add entry to identifiers
    }
    for (const auto& entry : identifiers)
        std::cout << entry.first << " = " << entry.second << '\n';
}
After reading the identifiers you can, of course, do whatever you need to do with them. I just print them as an example to show how to fetch them. For more information about std::unordered_map look here. For a lot of very good information about making parsers have a look here instead.
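For instance, a lookup might look like this (the key names are taken from the sample config above):
double density = identifiers.at("Density_Phase1"); // throws std::out_of_range if the key is missing
int nPhases = identifiers.count("N_Phases")
            ? static_cast<int>(identifiers.at("N_Phases"))
            : 1;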
If you want to make your program process input faster insert the following line at the beginning of main: std::ios_base::sync_with_stdio(false). This will desynchronize C++ IO with C IO and, in result, make it faster.
Assuming:
you don't want to use an existing format for meshes
you don't want to use a generic text format (json, yml, ...)
you don't want a binary format (even though you want something efficient)
In a nutshell, you really need your own text format.
You can use any parser generator to get started. While you could probably parse your config file as it is using only regexps, they can be really limited in the long run. So I'll suggest a context-free grammar parser, generated with Boost spirit::x3.
AST
The Abstract Syntax Tree will hold the final result of the parser.
#include <string>
#include <utility>
#include <vector>
#include <variant>

namespace AST {
    using Identifier = std::string;                  // Variable name.
    using Value      = std::variant<int, double>;    // Variable value.
    using Assignment = std::pair<Identifier, Value>; // Identifier = Value.
    using Root       = std::vector<Assignment>;      // Whole file: all assignments.
}
Parser
Grammar description:
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/home/x3.hpp>

namespace x3 = boost::spirit::x3;

namespace Parser {
    using namespace x3;

    // Line: Identifier = value
    const x3::rule<class assignment, AST::Assignment> assignment = "assignment";
    // Line: comment
    const x3::rule<class comment> comment = "comment";
    // Variable name
    const x3::rule<class identifier, AST::Identifier> identifier = "identifier";
    // File
    const x3::rule<class root, AST::Root> root = "root";
    // Any valid value in the config file
    const x3::rule<class value, AST::Value> value = "value";

    // Semantic action
    auto emplace_back = [](const auto& ctx) {
        x3::_val(ctx).emplace_back(x3::_attr(ctx));
    };

    // Grammar
    const auto assignment_def = skip(blank)[identifier >> '=' >> value];
    const auto comment_def    = '%' >> omit[*(char_ - eol)];
    const auto identifier_def = lexeme[alpha >> +(alnum | char_('_'))];
    const auto root_def       = *((comment | assignment[emplace_back]) >> eol) >> omit[*blank];
    const auto value_def      = double_ | int_;

    BOOST_SPIRIT_DEFINE(root, assignment, comment, identifier, value);
}
Usage
// Takes iterators on string/stream...
// Returns the AST of the input.
template<typename IteratorType>
AST::Root parse(IteratorType& begin, const IteratorType& end) {
    AST::Root result;
    bool parsed = x3::parse(begin, end, Parser::root, result);
    if (!parsed || begin != end) {
        throw std::domain_error("Parser received an invalid input.");
    }
    return result;
}
Live demo
Evolutions
To change where blank spaces are allowed, add/move x3::skip(blank) in the xxxx_def expressions.
Currently the file must end with a newline. Rewriting the root_def expression can fix that.
You'll certainly want to know why the parsing failed on invalid inputs. See the error handling tutorial for that.
You're just a few rules away from parsing more complicated things:
// 100 X_n Y_n
const auto point_def = lit("N_Points") >> ':' >> int_ >> eol >> *(double_ >> double_ >> eol)
If you don't need a specific text file format, but have a lot of data and care about performance, I recommend using an existing data serialization framework instead.
E.g. Google protocol buffers allow efficient serialization and deserialization with very little code. The file is binary, so it is typically much smaller than a text file, and binary serialization is much faster than parsing text. It also supports structured data (arrays, nested structs), data versioning, and other goodies.
https://developers.google.com/protocol-buffers/
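As a rough sketch of the C++ side, assuming a hypothetical mesh.proto that generates a mesh::Mesh message (the message type and file names are made up; SerializeToOstream/ParseFromIstream are the actual protobuf calls):
#include <fstream>
#include <string>
#include "mesh.pb.h"   // hypothetical generated header

bool save_mesh(const mesh::Mesh& m, const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    return m.SerializeToOstream(&out);   // compact binary, no text parsing on reload
}

bool load_mesh(mesh::Mesh& m, const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return m.ParseFromIstream(&in);
}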

C++ Read in file element by element, but executing functions every line

I have a file that I need to read in. Each line of the file is exceedingly long, so I'd rather not read each line to a temporary string and then manipulate those strings (unless this isn't actually inefficient - I could be wrong). Each line of the file contains a string of triplets - two numbers and a complex number, separated by a colon (as opposed to a comma, which is used in the complex number). My current code goes something like this:
while (states.eof() == 0)
{
    std::istringstream complexString;
    getline(states, tmp_str, ':');
    tmp_triplet.row() = stoi(tmp_str);
    getline(states, tmp_str, ':');
    tmp_triplet.col() = stoi(tmp_str);
    getline(states, tmp_str, ':');
    complexString.str(tmp_str);
    complexString >> tmp_triplet.value();
    // Then something useful done with the triplet before moving onto the next one
}
tmp_triplet is a variable that stores these three numbers. I want some way to run a function every line (specifically, the triplets in every line are pushed into a vector, and each line in the file denotes a different vector). I'm sure there's an easy way to go about this, but I just want a way to check whether the end of the line has been reached, and to run a function when this is the case.
When trying to plan stuff out, abstraction can be your best friend. If you break down what you want to do by abstract functionality, you can more easily decide what data types should be used and how different data types should be planned out, and often you can find some functions almost write themselves. And typically, your code will be more modular (almost by definition), which will make it easy to reuse, maintain, and adapt if future changes are needed.
For example, it sounds like you want to parse a file. So that should be a function.
To do that function, you want to read in the file lines then process the file lines. So you can make two functions, one for each of those actions, and just call the functions.
To read in file lines you just want to take a file stream, and return a collection of strings for each line.
To process file lines you want to take a collection of strings and for each one parse the string into a triplet value. So you can create a method that takes a string and breaks it into a triplet, and just use that method here.
To process a string you just need to take a string and assign the first part as the row, the second part as the column, and the third part as the value.
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct TripletValue
{
    int Row;
    int Col;
    int Val;
};

// Forward declarations so ParseFile can be defined first, mirroring the outline above.
std::vector<std::string> ReadFileLines(std::istream& inputStream);
std::vector<TripletValue> GetValuesFromData(std::vector<std::string> data);
TripletValue ParseLine(std::string fileLine);

std::vector<TripletValue> ParseFile(std::istream& inputStream)
{
    std::vector<std::string> fileLines = ReadFileLines(inputStream);
    std::vector<TripletValue> parsedValues = GetValuesFromData(fileLines);
    return parsedValues;
}

std::vector<std::string> ReadFileLines(std::istream& inputStream)
{
    std::vector<std::string> fileLines;
    std::string fileLine;
    while (std::getline(inputStream, fileLine)) // loop on getline itself rather than eof()
    {
        fileLines.push_back(fileLine);
    }
    return fileLines;
}

std::vector<TripletValue> GetValuesFromData(std::vector<std::string> data)
{
    std::vector<TripletValue> values;
    for (std::size_t i = 0; i < data.size(); i++)
    {
        TripletValue parsedValue = ParseLine(data[i]);
        values.push_back(parsedValue);
    }
    return values;
}

TripletValue ParseLine(std::string fileLine)
{
    std::stringstream sstream;
    sstream << fileLine;
    TripletValue parsedValue;
    std::string strValue;
    sstream >> strValue;
    parsedValue.Row = std::stoi(strValue);
    sstream >> strValue;
    parsedValue.Col = std::stoi(strValue);
    sstream >> strValue;
    parsedValue.Val = std::stoi(strValue);
    return parsedValue;
}
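A hypothetical call site (the file name is made up); each element of the returned vector holds one Row/Col/Val set:
#include <fstream>

int main()
{
    std::ifstream in("states.txt");
    std::vector<TripletValue> triplets = ParseFile(in);
    // use `triplets` here
}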

parse huge csv file with C++

In order to simulate my network I am using a trace file (CSV file) with a size between 5 and 30 GB.
The CSV file is row based, where each row contains multiple fields delimited by a space, forming the information for a network packet:
3 53 4 12 1 1 2 6
Since the file's size could reach several GBs (millions of lines), is it better to divide it into small chunks (myfile00.csv, myfile01.csv, ...), or can I process the entire file on the hard drive without loading it into memory?
I want to read the file line by line at a specific time, which is the clock cycle of the simulation, and get all the information in the line to create an omnet++ message.
packet MyTrace::getpacket() {
    int id;                   // first field
    int cycle;                // second field
    int source;               // third field
    int destination;          // fourth field
    int numberofDep;          // fifth field
    std::list<int> listofDep; // remaining fields
    if (!traceFile.is_open()) {
        // get id
        // get cycle
        // ....
    }
}
Any suggestion would be helpful.
EDIT:
string line;
ifstream myfile("BlackSmall.csv");
int currentline = 0;
if (myfile.is_open())
{
    while (getline(myfile, line)) {
        istringstream ss(line);
        string request;
        int id, cycle, source, dest, srcType, destType, packetSize, dependency;
        int listdep;
        std::list<int> dep;
        ss >> id;
        ss >> cycle;
        ss >> source;
        ss >> dest;
        ss >> request;
        ss >> srcType;
        ss >> destType;
        ss >> packetSize;
        ss >> dependency;
        while (ss >> listdep) dep.push_back(listdep);
        // Create my packet
    }
    myfile.close();
}
else cout << "Unable to open file";
With the above code, I can get all the information I need from a line.
The problem is that I need to use this code inside a class which, when called, returns just one line's information. Is there a way to point to a specific line when I call this class?
It seems like your application requires a single sequential pass through the input, so processing a file that is 1 GB or 100 GB is perhaps just a matter of patience and perhaps parallelism.
The approach should be to translate records line-by-line. You should avoid strategies that attempt to read the entire file into memory. The STL offers the easy-to-use std::ifstream class with a built-in getline method, which returns a std::string containing the line to be converted.
If you are feeling more ambitious and want to control the amount of data read or buffered more carefully then you would not be the first developer to roll-your-own code to implement a buffered reader. This is a fairly empowering exercise and will help you think through some corner cases with reading partial lines and such. But in the end, it probably will not give you a significant boost toward your goal. I suspect the ifstream approach will get you up and running without the hassle and will not ultimately be the bottleneck in processing these files.
If you were really concerned about optimizing execution time then having multiple files might help you launch parallel processing tasks.
// define a class to hold your custom record
class Record {
    // fields for the parsed values go here
};

// create a parser function to convert a line of text into the record
bool parse(std::string const &line, Record &record) {
    // fill in `record` from `line`; return false on malformed input
    return true;
}

// create a translator method to convert a record into the desired output
bool write(Record const &record, std::ofstream &os) {
    // write `record` to the output stream; return false on failure
    return true;
}

// actually open input stream for the input file
std::ifstream is;
std::ofstream os;
std::string line;
while (std::getline(is, line)) {
    Record record;
    if (!parse(line, record)) break;
    if (!write(record, os)) break;
}
You can re-use the Record instance by moving it outside the while loop, so long as you are careful to reset the variable so that information from preceding records does not taint the current record. You can also dive head first into the C++ ecosystem by providing stream input and output operators ("<<", ">>"), but I personally find that approach more confusing than it is worth.
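That reuse could look roughly like this (reset() is a hypothetical member that clears the previous record's state):
Record record;
while (std::getline(is, line)) {
    record.reset();                 // hypothetical: wipe data left over from the previous line
    if (!parse(line, record)) break;
    if (!write(record, os)) break;
}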
Perhaps the best approach for you would be to import your CSV file into an SQLite database.
Once you import it and add some indexes, you can easily and very efficiently query the necessary rows from that database. SQLite has lots of ready-to-use C/C++ client libraries available; you can start with the default one at https://www.sqlite.org/cintro.html.
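As a sketch of what a query could look like with the default C API (the table name, column names and the index on "cycle" are assumptions; only the sqlite3_* calls themselves are real API):
#include <sqlite3.h>
#include <iostream>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("trace.db", &db) != SQLITE_OK) return 1;

    // Assumes the CSV was imported into a table "packets" with an index on "cycle".
    const char* sql = "SELECT id, source, destination FROM packets WHERE cycle = ?;";
    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) == SQLITE_OK) {
        sqlite3_bind_int(stmt, 1, 53);                 // the clock cycle we are interested in
        while (sqlite3_step(stmt) == SQLITE_ROW) {
            std::cout << sqlite3_column_int(stmt, 0) << ' '
                      << sqlite3_column_int(stmt, 1) << ' '
                      << sqlite3_column_int(stmt, 2) << '\n';
        }
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);
}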

Read a line of a file c++

I'm just trying to use the fstream library and I want to read a given row.
I thought of this, but I don't know if it is the most efficient way.
#include <iostream>
#include <fstream>
using namespace std;

int main() {
    int x;
    fstream input2;
    string line;
    int countLine = 0;
    input2.open("theinput.txt");
    if (input2.is_open()) {
        while (getline(input2, line)) {
            countLine++;
            if (countLine == 1) { //1 is the line I want to read.
                cout << line << endl;
            }
        }
    }
}
Is there another way?
This does not appear to be the most efficient code, no.
In particular, you're currently reading the entire input file even though you only care about one line of the file. Unfortunately, doing a good job of skipping a line is somewhat difficult. Quite a few people recommend using code like:
your_stream.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
...for this job. This can work, but has a couple of shortcomings. First and foremost, if you try to use it on a non-text file (especially one that doesn't contain new-lines) it can waste inordinate amounts of time reading an entire huge file, long after you've read enough that you would normally realize that there must be a problem. For example, if you're reading a "line", that's a pretty good indication that you're expecting a text file, and you can pretty easily set a much lower limit on how long that first line could reasonably be, such as (say) a megabyte, and usually quite a lot less than that.
You also usually want to detect whether it stopped reading because it reached that maximum, or because it got to the end of the line. Skipping a line "succeeded" only if a new-line was encountered before reaching the specified maximum. To do that, you can use gcount() to compare against the maximum you specified. If you stopped reading because you reached the specified maximum, you typically want to stop processing that file (and log the error, print out an error message, etc.)
With that in mind, we might write code like this:
bool skip_line(std::istream &in) {
    size_t max = 0xfffff;
    in.ignore(max, '\n');
    return in.gcount() < max;
}
Depending on the situation, you might prefer to pass the maximum line size as a parameter (probably with a default) instead:
bool skip_line(std::istream &in, size_t max = 0xfffff) {
// skip definition of `max`, remainder identical
With this, you can skip up to a megabyte by default, but if you want to specify a different maximum, you can do so quite easily.
Either way, with that defined, the remainder becomes fairly trivial, something like this:
int main(){
    std::ifstream in("theinput.txt");
    if (!skip_line(in)) {
        std::cerr << "Error reading file\n";
        return EXIT_FAILURE;
    }
    // copy the second line:
    std::string line;
    if (std::getline(in, line))
        std::cout << line;
}
Of course, if you want to skip more than one line, you can do that pretty easily as well by putting the call to skip_line in a loop--but note that you still usually want to test the result from it, and break the loop (and log the error) if it fails. You don't usually want something like:
for (int i=0; i<lines_to_skip; i++)
skip_line(in);
With this, you'd lose one of the basic benefits: assurance that your input really is what you expected and that you're not producing garbage.
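Instead, something like this keeps the error checking (a small sketch using the skip_line above):
for (int i = 0; i < lines_to_skip; i++) {
    if (!skip_line(in)) {
        std::cerr << "Error reading file\n";
        return EXIT_FAILURE;
    }
}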
I think you can condense your code to this. if (input) is sufficient to check for failure.
#include <iostream>
#include <fstream>
#include <limits>
#include <string>

int main()
{
    std::ifstream input("file.txt");
    int row = 5;
    int count = 0;
    if (input)
    {
        while (count++ < row) input.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        std::string line;
        std::getline(input, line);
        std::cout << line;
    }
}