Searching for hundreds of patterns in huge logfiles - C++

I have to collect lots of filenames from inside a webserver's htdocs directory and then search a huge number of archived logfiles for the last access to each of these files.
I plan to do this in C++ with Boost. I would take the newest log first and read it backwards, checking every single line against all of the filenames I have.
If a filename matches, I read the time from the log line and save it as that file's last access. From then on I don't need to look for this file any more, as I only want to know the last access.
The vector of filenames to search for should therefore shrink rapidly.
I wonder how to handle this kind of problem most effectively with multiple threads.
Do I partition the logfiles and let every thread search a part of the logs in memory, with a thread removing a filename from the vector when it finds a match, or is there a more effective way to do this?
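For reference, here is a minimal single-threaded sketch of the shrinking-set idea described above; the line format, the extract_time helper, and the container choices are assumptions for illustration, not code from the actual project:
#include <string>
#include <vector>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper: assume the timestamp is the first token of the line.
std::string extract_time(const std::string& line) {
    return line.substr(0, line.find(' '));
}

std::unordered_map<std::string, std::string>
find_last_access(const std::vector<std::string>& lines_newest_first,
                 std::unordered_set<std::string> wanted)
{
    std::unordered_map<std::string, std::string> last_access;
    for (const std::string& line : lines_newest_first) {
        if (wanted.empty())
            break; // everything has been found, stop scanning
        for (auto it = wanted.begin(); it != wanted.end(); ) {
            if (line.find(*it) != std::string::npos) {
                last_access[*it] = extract_time(line); // newest hit wins
                it = wanted.erase(it); // never search for this file again
            } else {
                ++it;
            }
        }
    }
    return last_access;
}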

Try using mmap; it will save you considerable hair loss. I was feeling expeditious and in some odd mood to recall my mmap knowledge, so I wrote a simple thing to get you started. Hope this helps!
The beauty of mmap is that it can be easily parallelized with OpenMP. It's also a really good way to prevent an I/O bottleneck. Let me first define the Logfile class, and then I'll go over the implementation.
Here's the header file (logfile.h):
#ifndef LOGFILE_H_
#define LOGFILE_H_

#include <fcntl.h>
#include <stdio.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

using std::string;

class Logfile {
public:
    Logfile(string name);
    char* open();
    unsigned int get_size() const;
    string get_name() const;
    bool close();
private:
    string name;
    char* start;
    unsigned int size;
    int file_descriptor;
};

#endif
And here's the .cpp file.
#include <iostream>
#include "logfile.h"

using namespace std;

Logfile::Logfile(string name){
    this->name = name;
    start = NULL;
    size = 0;
    file_descriptor = -1;
}

char* Logfile::open(){
    // get file size
    struct stat st;
    if(stat(name.c_str(), &st) < 0){
        cerr << "Error stat-ing file: " << name << endl;
        return NULL;
    }
    size = st.st_size;
    // get file descriptor; the scope operator is needed so we call the
    // POSIX open() and not Logfile::open() recursively
    file_descriptor = ::open(name.c_str(), O_RDONLY);
    if(file_descriptor < 0){
        cerr << "Error obtaining file descriptor for: " << name << endl;
        return NULL;
    }
    // memory-map the file; note that mmap reports failure with MAP_FAILED, not NULL
    start = (char*) mmap(NULL, size, PROT_READ, MAP_SHARED, file_descriptor, 0);
    if(start == MAP_FAILED){
        cerr << "Error memory-mapping the file\n";
        ::close(file_descriptor);
        start = NULL;
        return NULL;
    }
    return start;
}

unsigned int Logfile::get_size() const {
    return size;
}

string Logfile::get_name() const {
    return name;
}

bool Logfile::close(){
    if(start == NULL){
        cerr << "Error closing file. Was close() called without a matching open()?\n";
        return false;
    }
    // unmap memory and close file
    bool ret = munmap(start, size) != -1 && ::close(file_descriptor) != -1;
    start = NULL;
    return ret;
}
Now, using this code, you can use OpenMP to work-share the parsing of these logfiles, e.g.:
Logfile lf("yourfile");
char* log = lf.open();
int size = (int) lf.get_size();
int i;
#pragma omp parallel shared(log, size) private(i)
{
    #pragma omp for
    for (i = 0; i < size; i++) {
        // do your routine
    }
    #pragma omp critical
    {
        // some code that combines the thread results
    }
}
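To make that concrete, here is a hedged sketch of what "do your routine" might look like for the original problem: each thread scans its share of the mapped bytes for the wanted filenames, collecting hits into a thread-local map that is merged in the critical section. The line handling and the merge are my assumptions, not part of the answer above.
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Sketch: for every wanted filename, record one line that mentions it.
// "log" and "size" come from Logfile::open() and Logfile::get_size();
// lines are assumed to be '\n'-terminated.
std::map<std::string, std::string>
scan_mapped_log(const char* log, int size, const std::vector<std::string>& wanted)
{
    std::map<std::string, std::string> merged;
    #pragma omp parallel shared(log, size, wanted, merged)
    {
        std::map<std::string, std::string> local; // per-thread results
        #pragma omp for
        for (int i = 0; i < size; i++) {
            // only do work at the beginning of a line
            if (i != 0 && log[i - 1] != '\n') continue;
            const char* p = log + i;
            const char* nl = static_cast<const char*>(std::memchr(p, '\n', size - i));
            std::string line(p, nl ? nl - p : size - i);
            for (const std::string& f : wanted)
                if (local.find(f) == local.end() &&
                    line.find(f) != std::string::npos)
                    local[f] = line; // remember a hit for this filename
        }
        #pragma omp critical
        {
            // combine the thread results; insert() keeps the first entry seen
            merged.insert(local.begin(), local.end());
        }
    }
    return merged;
}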

Parse the logfile into a database table (SQLite ftw). One of the fields will be the path.
In another table, add the files you are looking for.
Now it is a simple join against a derived table. Something like this:
SELECT l.file, l.last_access FROM toFind f
LEFT JOIN (
SELECT file, max(last_access) as last_access from logs group by file
) as l ON f.file = l.file
All the files in toFind will be in the result, with last_access NULL for those not found in the logs.
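If it helps, here is a hedged sketch of setting up the two tables that query assumes, via the SQLite C API; the column types are guesses, since the answer doesn't specify them.
#include <sqlite3.h>
#include <iostream>

int main() {
    sqlite3* db = NULL;
    if (sqlite3_open("accesses.db", &db) != SQLITE_OK) {
        std::cerr << "cannot open database\n";
        return 1;
    }
    // Table and column names match the query above; the types are assumptions.
    const char* schema =
        "CREATE TABLE IF NOT EXISTS logs  (file TEXT, last_access INTEGER);"
        "CREATE TABLE IF NOT EXISTS toFind(file TEXT PRIMARY KEY);";
    char* err = NULL;
    if (sqlite3_exec(db, schema, NULL, NULL, &err) != SQLITE_OK) {
        std::cerr << "schema error: " << err << "\n";
        sqlite3_free(err);
    }
    sqlite3_close(db);
    return 0;
}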

OK, this was some days ago already, but since then I have spent some time writing code and working with SQLite in other projects.
I still wanted to compare the DB approach with the mmap solution, just for the performance aspect.
Of course it saves you a lot of work if you can use SQL queries to handle all the data you parsed. But I really didn't care about the amount of work, because I'm still learning a lot, and what I learned from this is:
The mmap approach - if you implement it correctly - is absolutely superior in performance. It's unbelievably fast, which you will notice if you implement the "word count" example, which can be seen as the "hello world" of MapReduce algorithms.
Now if you further want to benefit from the SQL language, the correct approach would be implementing your own SQL wrapper that uses a kind of map-reduce too, by sharing queries amongst threads.
You could perhaps shard objects by ID amongst the threads, where every thread handles its own DB connection and queries objects in its own part of the dataset.
This would be much faster than just writing things to an SQLite DB the usual way.
After all you can say:
mmap is the fastest way to handle string processing.
SQL provides great functionality for parser applications, but it slows things down if you don't implement a wrapper for processing SQL queries.
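A hedged sketch of that per-thread-connection idea: each thread opens its own SQLite connection and queries only its shard of the IDs. The table name, columns, and sharding rule are made up for illustration.
#include <sqlite3.h>
#include <thread>
#include <vector>

// Each worker opens its own connection and scans only the rows whose
// id % num_threads == shard. Table objects(id, payload) is hypothetical.
void worker(int shard, int num_threads) {
    sqlite3* db = NULL;
    if (sqlite3_open("accesses.db", &db) != SQLITE_OK) return;
    sqlite3_stmt* stmt = NULL;
    const char* sql = "SELECT id, payload FROM objects WHERE id % ? = ?;";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) == SQLITE_OK) {
        sqlite3_bind_int(stmt, 1, num_threads);
        sqlite3_bind_int(stmt, 2, shard);
        while (sqlite3_step(stmt) == SQLITE_ROW) {
            // process one object from this thread's part of the dataset
        }
        sqlite3_finalize(stmt);
    }
    sqlite3_close(db);
}

int main() {
    const int num_threads = 4;
    std::vector<std::thread> pool;
    for (int i = 0; i < num_threads; i++)
        pool.emplace_back(worker, i, num_threads);
    for (std::thread& t : pool)
        t.join();
    return 0;
}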

Related

Big csv file c++ parsing performance

I have a big CSV file (25 MB) that represents a symmetric graph (about 18k x 18k). While parsing it into an array of vectors, I profiled the code (with the VS2012 analyzer), and it shows that the parsing inefficiency (about 19 seconds total) occurs while reading each character (getline::basic_string::operator+=), as shown in the profiler screenshot (not reproduced here).
This leaves me frustrated, as with Java's simple buffered line reading and a tokenizer I achieve the same in less than half a second.
My code uses only the standard library:
int allColumns = initFirstRow(file, secondRow);
// secondRow has been initialized with one value
int column = 1; // don't forget, first column is 0
VertexSet* rows = new VertexSet[allColumns];
rows[1] = secondRow;
string vertexString;
long double vertexDouble;
for (int row = 1; row < allColumns; row++) {
    // don't do the last row
    for (; column < allColumns; column++) {
        // don't do the last column
        getline(file, vertexString, ',');
        vertexDouble = stold(vertexString);
        if (vertexDouble > _TH) {
            rows[row].add(column);
        }
    }
    // do the last in the column
    getline(file, vertexString);
    vertexDouble = stold(vertexString);
    if (vertexDouble > _TH) {
        rows[row].add(++column);
    }
    column = 0;
}
initLastRow(file, rows[allColumns-1], allColumns);
initFirstRow and initLastRow basically do the same thing as the loop above, but initFirstRow also counts the number of columns.
VertexSet is basically a vector of indices (int). Each vertex read (separated by ',') is no more than 7 characters long (values are between -1 and 1).
At 25 megabytes, I'm going to guess that your file is machine generated. As such, you (probably) don't need to worry about things like verifying the format (e.g., that every comma is in place).
Given the shape of the file (i.e., each line is quite long) you probably won't impose a lot of overhead by putting each line into a stringstream to parse out the numbers.
Based on those two facts, I'd at least consider writing a ctype facet that treats commas as whitespace, then imbuing the stringstream with a locale using that facet to make it easy to parse out the numbers. Overall code length would be a little greater, but each part of the code would end up pretty simple:
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <time.h>
#include <stdlib.h>
#include <locale>
#include <sstream>
#include <algorithm>
#include <iterator>
class my_ctype : public std::ctype<char> {
    // The table must be ready before the base-class constructor runs and
    // must outlive the facet, hence the static helper (a vector member
    // named in the initializer list would be constructed after the base,
    // so passing its data() to the base constructor would be UB).
    static mask* make_table() {
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[','] = (mask)space;
        return table.data();
    }
public:
    my_ctype(size_t refs = 0) : std::ctype<char>(make_table(), false, refs) {}
};

template <class T>
class converter {
    std::stringstream buffer;
    my_ctype *m;
    std::locale l;
public:
    converter() : m(new my_ctype), l(std::locale::classic(), m) { buffer.imbue(l); }
    std::vector<T> operator()(std::string const &in) {
        buffer.clear();
        buffer << in;
        return std::vector<T> {std::istream_iterator<T>(buffer),
                               std::istream_iterator<T>()};
    }
};

int main() {
    std::ifstream in("somefile.csv");
    std::vector<std::vector<double>> numbers;
    std::string line;
    converter<double> cvt;
    clock_t start = clock();
    while (std::getline(in, line))
        numbers.push_back(cvt(line));
    clock_t stop = clock();
    std::cout << double(stop-start)/CLOCKS_PER_SEC << " seconds\n";
}
To test this, I generated a 1.8K x 1.8K CSV file of pseudo-random doubles like this:
#include <iostream>
#include <stdlib.h>

int main() {
    for (int i = 0; i < 1800; i++) {
        for (int j = 0; j < 1800; j++)
            std::cout << rand()/double(RAND_MAX) << ",";
        std::cout << "\n";
    }
}
This produced a file around 27 megabytes. After compiling the reading/parsing code with gcc (g++ -O2 trash9.cpp), a quick test on my laptop showed it running in about 0.18 to 0.19 seconds. It never seems to use (even close to) all of one CPU core, indicating that it's I/O bound, so on a desktop/server machine (with a faster hard drive) I'd expect it to run faster still.
The inefficiency here is in Microsoft's implementation of std::getline, which is being used in two places in the code. The key problems with it are:
It reads from the stream one character at a time
It appends to the string one character at a time
The profile in the original post shows that the second of these problems is the biggest issue in this case.
I wrote more about the inefficiency of std::getline here.
GNU's implementation of std::getline, i.e. the version in libstdc++, is much better.
Sadly, if you want your program to be fast and you build it with Visual C++, you'll have to use lower-level functions than std::getline.
The debug runtime library in VS is very slow because it does a lot of debug checks (for out-of-bounds accesses and things like that) and calls lots of very small functions that don't get inlined when you compile in Debug.
Running your program in Release should remove all these overheads.
My bet on the next bottleneck is string allocation.
I would try reading bigger chunks of memory at once and then parsing them all, e.g. read a full line, then parse that line using pointers and specialized functions.
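A minimal sketch of that suggestion, assuming '\n'-terminated lines and numeric fields, with strtod doing both the conversion and the pointer advance; the buffer size is an arbitrary assumption:
#include <stdio.h>
#include <stdlib.h>

int main() {
    FILE* in = fopen("in.txt", "r");
    if (!in) return 1;
    char line[64 * 1024]; // assumed upper bound on line length
    while (fgets(line, sizeof line, in)) {
        char* p = line;
        while (*p && *p != '\n') {
            char* end = NULL;
            double value = strtod(p, &end);    // parse one field in place
            if (end == p) break;               // no number here, stop
            (void)value;                       // ... do your routine ...
            p = (*end == ',') ? end + 1 : end; // step over the delimiter
        }
    }
    fclose(in);
    return 0;
}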
Hmm, good answer here. It took me a while, but I had the same problem; after this fix my write-and-process time went from 38 seconds to 6 seconds.
Here's what I did.
First, get the data using a Boost memory-mapped file. Then you can use boost::thread to speed up processing of the const char* that the mapped file gives you. Something like this (the multithreading differs depending on your implementation, so I excluded that part):
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/thread/thread.hpp>
#include <boost/lockfree/queue.hpp>
#include <cstdlib>
#include <string>
#include <vector>

using namespace std;

void foo(string path)
{
    boost::iostreams::mapped_file mmap(path, boost::iostreams::mapped_file::readonly);
    auto chars = mmap.const_data(); // pointer to the mapped char array
    auto eofile = chars + mmap.size(); // used to detect end of file
    string next = ""; // used to read in chars
    vector<double> data; // store the data
    for (; chars && chars != eofile; chars++) {
        if (chars[0] == ',' || chars[0] == '\n') { // end of value
            data.push_back(atof(next.c_str())); // add value
            next = ""; // clear
        }
        else
            next += chars[0]; // add to read string
    }
}
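For the part that was excluded, here is a hedged sketch of one way to split the mapped range across boost threads; parse_range stands in for the same comma/newline loop as foo(), and the chunking is my assumption:
#include <boost/thread/thread.hpp>
#include <cstddef>

// Hypothetical worker: run the same comma/newline loop as foo() over [begin, end).
void parse_range(const char* begin, const char* end)
{
    // parsing loop omitted; see foo() above
}

void parallel_parse(const char* chars, const char* eofile, int num_threads)
{
    boost::thread_group pool;
    std::ptrdiff_t chunk = (eofile - chars) / num_threads;
    for (int i = 0; i < num_threads; i++) {
        const char* begin = chars + i * chunk;
        const char* end = (i == num_threads - 1) ? eofile : begin + chunk;
        // Real code must also snap begin/end forward to the next '\n' so no
        // value is split between two threads; omitted here for brevity.
        pool.create_thread([=] { parse_range(begin, end); });
    }
    pool.join_all();
}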

ofstream overwriting stack variables

I'm working on a "threadpool" program for an operating systems class. Essentially, files are extracted from a tar file and written to disk using a pool of 5 threads. Here's my thread code:
#include <iostream>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>
#include <pthread.h>

using namespace std;

// Defined elsewhere in the real program; shown here with the fields this code uses.
struct Header {
    string fileName;
    size_t fileSize;
    int userId, groupId, fileMode;
};

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
vector<Header*> headers;
vector<string> fileBlocks;

void* writeExtractedFiles(void* args)
{
    bool hasFilesLeft = true;
    ofstream outputFile;
    while(hasFilesLeft)
    {
        pthread_mutex_lock(&mutex);
        if(headers.size() != 0)
        {
            Header* hdr = headers.back();
            headers.pop_back();
            string fileBytes = fileBlocks.back();
            fileBlocks.pop_back();
            pthread_mutex_unlock(&mutex);
            outputFile.open(hdr->fileName.c_str(), ios::app);
            outputFile.rdbuf()->pubsetbuf(0, 0);
            fileBytes = fileBytes.substr(0, hdr->fileSize);
            outputFile.put('0');
            outputFile.close();
            // This is a dummy object to check if the values are corrupted
            Header* test0 = headers.back();
            cout << "GRAWWR!";
            //chown(hdr->fileName.c_str(), hdr->userId, hdr->groupId);
            //chmod(hdr->fileName.c_str(), hdr->fileMode);
        }
        else
        {
            // We're done!
            hasFilesLeft = false;
            pthread_mutex_unlock(&mutex);
        }
    }
    return NULL;
}
Note: as of right now, I'm only testing it with a single thread. Obviously, accessing the headers vector outside of my mutex would be counterproductive with multiple threads.
The problem is, the values of test0 are all messed up: super-high numbers and nonsense for fileName. It seems like I'm overwriting my stack variables for some reason. When I comment out outputFile.close(), my variable values aren't changed, but when I keep it, whether I actually write things to the file or not, things get wonky. I know there must be something I'm missing. I've tried getting rid of the buffer altogether, writing the file in a different place, anything I could think of. Any suggestions?
(I'm testing it on a Windows machine, but it's being made for Linux.)

C++ iostream binary read and write issues

Right, please bear with me, as I have two separate attempts I'll cover below.
I first started off reading the guide here (http://www.cplusplus.com/doc/tutorial/files/). However, whilst it contains what appears to be a good example of how to use read(), it does not contain an example of how to use write().
I first attempted to store a simple char array in binary using write(). My original idea (and hope) was that I could append new entries to this file using ios::app. Originally this appeared to work, but I was also getting junk output. A post on another forum suggested I lacked a null terminator at the end of my char array. I applied this (or at least attempted to, based on how I was shown), as can be seen in the example below. Unfortunately, this meant that read() no longer functioned properly, because it won't read past the null terminator.
I was also told that doing char *memoryBlock is 'abuse' of the C++ standard or something, and is unsafe, and that I should instead define an array of an exact size, i.e. char memoryBlock[5]. But what if I wish to write char data to a file that could be of any size? How do I proceed then? The code below includes various commented-out lines indicating the different attempts and variations I have made, including some of the suggestions mentioned above. I do wish to use good-practice code, so if char *memoryBlock is unsafe, or any other lines are, I wish to amend this.
I would also like to clarify that I am writing chars here for testing purposes only, so please do not suggest writing in text mode rather than binary mode. I'll elaborate further in the second part of this question, under the code below.
First code:
#include <cstdlib>
#include <iostream>
#include <fstream>
//#include <string>

int main()
{
    //char memoryBlock[5];
    char *memoryBlock;
    char *memoryBlockTwo;
    std::ifstream::pos_type size; // the number of characters to be read or written from/to the memory block
    std::ofstream myFile;
    myFile.open("Example", std::ios::out | /*std::ios::app |*/ std::ios::binary);
    if(myFile.is_open() && myFile.good())
    {
        //myFile.seekp(0,std::ios::end);
        std::cout << "File opening successfully completed." << std::endl;
        memoryBlock = "THEN";
        //myFile.write(memoryBlock, (sizeof(char)*4));
        //memoryBlock = "NOW THIS";
        //strcpy_s(memoryBlock, (sizeof(char)*5), "THIS");
        //memoryBlock = "THEN";
        //strcpy(memoryBlock, "THIS");
        //memoryBlock[5] = NULL;
        myFile.write(memoryBlock, (sizeof(char)*5));
    }
    else
    {
        std::cout << "File opening NOT successfully completed." << std::endl;
    }
    myFile.close();
    std::ifstream myFileInput;
    myFileInput.open("Example", std::ios::in | std::ios::binary | std::ios::ate);
    if(myFileInput.is_open() && myFileInput.good())
    {
        std::cout << "File opening successfully completed. Again." << std::endl;
        std::cout << "READ:" << std::endl;
        size = myFileInput.tellg();
        memoryBlockTwo = new char[size];
        myFileInput.seekg(0, std::ios::beg); // seek to the beginning of the file
        myFileInput.read(memoryBlockTwo, size);
        std::cout << memoryBlockTwo << std::endl;
        delete[] memoryBlockTwo;
        std::cout << std::endl << "END." << std::endl;
    }
    else
    {
        std::cout << "Something has gone disastrously wrong." << std::endl;
    }
    myFileInput.close();
    return 0;
}
My next attempt works on the basis that attempting to use ios::app with ios::binary simply won't work, and that to amend a file I must read the entire thing in, make my alterations, then write back and replace the entire contents of the file, although this does seem somewhat inefficient.
However, I don't read in and amend contents in the code below. What I am actually trying to do is write an object of a custom class to a file, then read it back out again intact.
This seems to work (although if I'm doing anything bad code-wise here, please point it out), HOWEVER, I am seemingly unable to store variables of type std::string and std::vector, because I get access violations when I reach myFileInput.close(). With those member variables commented out, the access violation does not occur. My best guess as to why this happens is that they use pointers to other pieces of memory to store their data, and I am not writing the data itself to my file but the pointers to it, which happen to still be valid when I read my data back out.
Is it possible at all to store the contents of these more complex data types in a file? Or must I break everything down into more basic variables such as chars, ints and floats?
Second code:
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

class testClass
{
public:
    testClass()
    {
        testInt = 5;
        testChar = 't';
        //testString = "Test string.";
        //testVector.push_back(3.142f);
        //testVector.push_back(0.001f);
    }
    testClass(int intInput, char charInput, std::string stringInput, float floatInput01, float floatInput02)
    {
        testInt = intInput;
        testChar = charInput;
        testArray[0] = 't';
        testArray[1] = 'e';
        testArray[2] = 's';
        testArray[3] = 't';
        testArray[4] = '\0';
        //testString = stringInput;
        //testVector = vectorInput;
        //testVector.push_back(floatInput01);
        //testVector.push_back(floatInput02);
    }
    ~testClass()
    {}
private:
    int testInt;
    char testChar;
    char testArray[5];
    //std::string testString;
    //std::vector<float> testVector;
};

int main()
{
    testClass testObject(3, 'x', "Hello there!", 9.14f, 6.662f);
    testClass testReceivedObject;
    //char memoryBlock[5];
    //char *memoryBlock;
    //char *memoryBlockTwo;
    std::ifstream::pos_type size; // the number of characters to be read or written from/to the memory block
    std::ofstream myFile;
    myFile.open("Example", std::ios::out | /*std::ios::app |*/ std::ios::binary);
    if(myFile.is_open() && myFile.good())
    {
        //myFile.seekp(0,std::ios::end);
        std::cout << "File opening successfully completed." << std::endl;
        //memoryBlock = "THEN";
        //myFile.write(memoryBlock, (sizeof(char)*4));
        //memoryBlock = "NOW THIS";
        //strcpy_s(memoryBlock, (sizeof(char)*5), "THIS");
        //memoryBlock = "THEN AND NOW";
        //strcpy(memoryBlock, "THIS");
        //memoryBlock[5] = NULL;
        myFile.write(reinterpret_cast<char*>(&testObject), sizeof(testClass));
    }
    else
    {
        std::cout << "File opening NOT successfully completed." << std::endl;
    }
    myFile.close();
    std::ifstream myFileInput;
    myFileInput.open("Example", std::ios::in | std::ios::binary | std::ios::ate);
    if(myFileInput.is_open() && myFileInput.good())
    {
        std::cout << "File opening successfully completed. Again." << std::endl;
        std::cout << "READ:" << std::endl;
        size = myFileInput.tellg();
        //memoryBlockTwo = new char[size];
        myFileInput.seekg(0, std::ios::beg); // seek to the beginning of the file
        myFileInput.read(reinterpret_cast<char*>(&testReceivedObject), size);
        //std::cout << memoryBlockTwo << std::endl;
        //delete[] memoryBlockTwo;
        std::cout << std::endl << "END." << std::endl;
    }
    else
    {
        std::cout << "Something has gone disastrously wrong." << std::endl;
    }
    myFileInput.close();
    return 0;
}
I apologise for the long-windedness of this question, but I am hoping that my thoroughness in providing as much information as I can about my issues will hasten the appearance of answers, even if this turns out to be a simple fix (I have searched for hours trying to find solutions), as time is a factor here. I will be monitoring this question throughout the day to provide clarifications in aid of an answer.
In the first example, I'm not sure what you are writing out when memoryBlock is commented out and never initialized to anything. When you are reading it back in, since you are using std::cout to display the data on the console, it MUST be null-terminated, or you will print beyond the end of the memory buffer allocated for memoryBlockTwo.
Either write the terminating null to the file:
memoryBlock = "THEN"; // 4 chars + implicit null terminator
myFile.write(memoryBlock, (sizeof(char)*5));
And/or, ensure the buffer is terminated after it is read (allocating one extra byte for the terminator):
memoryBlockTwo = new char[size + 1];
myFileInput.read(memoryBlockTwo, size);
memoryBlockTwo[size] = '\0';
In your second example, don't do that with C++ objects. You are circumventing necessary constructor calls, and if you try it with the vectors you have commented out, it certainly won't work like you expect. If the class is plain old data (no virtual functions, no pointers to other data) you will likely be OK, but it's still really bad practice. When persisting C++ objects, consider looking into overloading the << and >> operators.
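As a hedged illustration of that last suggestion (my sketch, not code from this answer): stream each field out explicitly and read them back in the same order; a length prefix lets the string round-trip safely. The fields mirror the question's testClass.
#include <iostream>
#include <fstream>
#include <string>

class testClass {
public:
    testClass() : testInt(5), testChar('t'), testString("Test string.") {}
    // Write the fields explicitly, with a length prefix for the string.
    friend std::ostream& operator<<(std::ostream& os, const testClass& t) {
        os << t.testInt << ' ' << t.testChar << ' '
           << t.testString.size() << ' ' << t.testString;
        return os;
    }
    // Read them back in exactly the same order.
    friend std::istream& operator>>(std::istream& is, testClass& t) {
        std::string::size_type len = 0;
        is >> t.testInt >> t.testChar >> len;
        is.get(); // skip the single separator space
        t.testString.resize(len);
        is.read(&t.testString[0], len);
        return is;
    }
private:
    int testInt;
    char testChar;
    std::string testString;
};

int main() {
    { std::ofstream out("Example"); out << testClass(); }
    std::ifstream in("Example");
    testClass received;
    in >> received;
    std::cout << "Round trip " << (in ? "succeeded" : "failed") << "\n";
}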

How to enhance the speed of my C++ program in reading delimited text files?

I show you C# and C++ code that do the same job: read a text file delimited by '|' and save it back out delimited by '#'.
When I execute the C++ program, the time elapsed is 169 seconds.
UPDATE 1: Thanks to Seth (compiling with cl /EHsc /Ox /Ob2 /Oi) and GWW (moving the string declarations outside the loops), the elapsed time was reduced to 53 seconds. I have updated the code accordingly.
UPDATE 2: Do you have any other suggestions to enhance the C++ code?
When I execute the C# program, the elapsed time is 34 seconds!
The question is: how can I bring the speed of the C++ program up to the C# one?
C++ Program:
int main()
{
    Timer t;
    cout << t.ShowStart() << endl;
    ifstream input("in.txt");
    ofstream output("out.txt", ios::out);
    char const row_delim = '\n';
    char const field_delim = '|';
    string s1, s2;
    while (input)
    {
        if (!getline(input, s1, row_delim))
            break;
        istringstream iss(s1);
        while (iss)
        {
            if (!getline(iss, s2, field_delim))
                break;
            output << s2 << "#";
        }
        output << "\n";
    }
    t.Stop();
    cout << t.ShowEnd() << endl;
    cout << "Executed in: " << t.ElapsedSeconds() << " seconds." << endl;
    return 0;
}
C# program:
static void Main(string[] args)
{
    Stopwatch sw = new Stopwatch();
    Console.WriteLine(DateTime.Now);
    sw.Start();
    StreamReader sr = new StreamReader("in.txt", Encoding.Default);
    StreamWriter wr = new StreamWriter("out.txt", false, Encoding.Default);
    object[] cols = new object[0]; // allocates more elements automatically when filling
    string line;
    while (!string.Equals(line = sr.ReadLine(), null)) // fastest way
    {
        cols = line.Split('|'); // faster than using a List<>
        foreach (object col in cols)
            wr.Write(col + "#");
        wr.WriteLine();
    }
    sw.Stop();
    Console.WriteLine("Count took {0} secs", sw.Elapsed);
    Console.WriteLine(DateTime.Now);
}
UPDATE 3:
Well, I must say I am very happy for the help received, and the answer to my question has been satisfied.
I changed the text of the question a little to be more specific, and I tested the solutions kindly offered by molbdnilo and Bo Persson.
Keeping Seth's indications for the compile command (i.e. cl /EHsc /Ox /Ob2 /Oi pgm.cpp):
Bo Persson's solution took 18 seconds on average, really a good one taking into account that the code is close to the style I like.
molbdnilo's solution took 6 seconds on average, really amazing! (Thanks to Constantine also.)
Never too late to learn, and I learned valuable things with my question.
My best regards.
As Constantine suggests, read large chunks at a time using read.
I cut the time from ~25 s to ~3 s on a 129 MB file with 5M "entries" (26 bytes each) in 100,000 lines.
#include <iostream>
#include <fstream>
#include <sstream>
#include <algorithm>

using namespace std;

int main()
{
    ifstream input("in.txt");
    ofstream output("out.txt", ios::out);
    const size_t size = 512 * 1024;
    char buffer[size];
    while (input) {
        input.read(buffer, size);
        size_t readBytes = input.gcount();
        replace(buffer, buffer + readBytes, '|', '#');
        output.write(buffer, readBytes);
    }
    input.close();
    output.close();
    return 0;
}
How about this for the central loop:
while (getline(input, s1, row_delim))
{
    for (string::iterator c = s1.begin(); c != s1.end(); ++c)
        if (*c == field_delim)
            *c = '#';
    output << s1 << '\n';
}
It seems to me that the slow part is within getline. I don't have precise documentation to support this idea, but that's how it feels to me. You should try using read instead. Because getline takes a delimiter, it has to check every symbol to see whether it has found that delimiter, which amounts to many tiny input operations: the program fetches a symbol from the file, writes it into the program's memory, and repeats, so the time goes into that per-symbol overhead. If you use the read function, you copy a whole block of symbols at once and then work with them inside the program's memory, which may reduce the time consumed.
PS: Again, I don't have documentation about getline and how it works internally, but I'm sure about read. Hope this is helpful.
If you know the max line length, you can use stdio + fgets and null-terminated strings; it will rock.
For C#, if the file fits in memory (probably not, if it takes 34 seconds), I'd be curious to see how IO.File.WriteAllText("out.txt", IO.File.ReadAllText("in.txt").Replace("|","#")); performs!
I'd be really surprised if this beat @molbdnilo's version, but it's probably the second fastest, and (I would posit) the simplest and cleanest:
#include <fstream>
#include <string>
#include <sstream>
#include <algorithm>

int main() {
    std::ifstream in("in.txt");
    std::ostringstream buffer;
    buffer << in.rdbuf();
    std::string s(buffer.str());
    std::replace(s.begin(), s.end(), '|', '#');
    std::ofstream out("out.txt");
    out << s;
    return 0;
}
Based on past experience with this method, I'd expect it to be no worse than half the speed of what @molbdnilo posted -- which should still be around triple the speed of your C# version, and over ten times as fast as your original version in C++. [Edit: I just wrote a file generator, and on a file a little over 100 megabytes, it's even closer than I expected -- I'm getting 4.4 seconds, versus 3.5 for @molbdnilo's code.] The combination of reasonable speed with really short, simple code is often quite a decent trade-off. Of course, that's all predicated on your having enough physical RAM to hold the entire file contents in memory, but that's generally a fairly safe assumption these days.

Reading in image files without specifying name

Are there any facilities in SDL or C++ that allow you to read image files from a folder without specifying their names, e.g. reading them in sequential order? If not, are there any techniques you use to accomplish something along the same lines?
Doing something like this:
foo_ani[0] = LoadImage("Animations/foo1.png");
foo_ani[1] = LoadImage("Animations/foo2.png");
foo_ani[2] = LoadImage("Animations/foo3.png");
can become quite tedious, and a loop can't be used because the file name is different each time.
The only way I could really think of is to have a string that you modify on each loop iteration, inserting the loop number into the appropriate part of the string (assuming that's how your files are labeled), and using that string as the LoadImage parameter. That seems like more work, though, than just doing the above.
Use boost::filesystem.
The tiny program shown here lists all files in the directory files/, matching the pattern fileN.type, where N is from 0 and upwards, unspecified.
#include <iostream>
#include <sstream>
#include <string>
#include <boost/filesystem.hpp>

using namespace std;
namespace fs = boost::filesystem;

int main(int argc, char** argv)
{
    fs::path dir("./files");
    string prefix = "file";
    string suffix = "type";
    int i = 0;
    fs::path file;
    do {
        stringstream ss;
        ss << prefix << i++ << "." << suffix;
        file = fs::path(dir / fs::path(ss.str()));
        if (fs::exists(file)) {
            cout << file.leaf() << " exists." << endl;
        }
    } while (fs::exists(file));
    return 0;
}
Link with -lboost_filesystem.
boost::filesystem also provides a simple directory iterator.
For this type of situation, you would typically get a list of the filenames in the directory (with opendir/readdir or FindFirstFile/FindNextFile, as appropriate) and loop over each filename in the directory. Given each filename, you can call LoadImage() and append the result to your array.
This technique doesn't require that you know the filenames ahead of time.
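A minimal POSIX sketch of that technique (the directory argument is a placeholder, and sorting is added for a stable order; see the caveat in the last answer below about lexicographic order once you pass 10 frames):
#include <dirent.h>
#include <string>
#include <vector>
#include <algorithm>

// Collect every entry in the directory except "." and "..", sorted by name.
// Each returned path could then be handed to LoadImage() in turn.
std::vector<std::string> list_files(const std::string& dir)
{
    std::vector<std::string> names;
    DIR* d = opendir(dir.c_str());
    if (!d) return names;
    while (struct dirent* entry = readdir(d)) {
        std::string name = entry->d_name;
        if (name != "." && name != "..")
            names.push_back(dir + "/" + name);
    }
    closedir(d);
    std::sort(names.begin(), names.end()); // lexicographic order
    return names;
}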
How about loading all files in that directory automatically?
foo_ani = LoadImages("Animations/");
Just traverse the given directory and load all files that fit.
Another solution, if you have several animations with different prefixes, is to use regular expressions. I suggest boost or std::tr1::regex for that, like this:
foo_ani = LoadImageSet("Animations/", std::tr1::regex("foo.*\\.png"));
Given that you are currently hard-coding the names of the frames, I'm going to assume you know / have control over the naming scheme of the files. I'm also assuming you want them sequentially, since they seem to be frames of an animation. Finally, I'm assuming you know how many frames there are, since you seem to have an array big enough to accommodate them all ready and waiting.
Given the names of the files presented in the question, you can't just do FindFirst / FindNext, because once you get past 10 frames they will almost certainly come back out of order (given the naming scheme presented).
So I think you're right that the best way to do it is in a loop, but wrong that it's more effort than doing it by hand.
char* fname = new char[50]; // buffer big enough to hold filenames
int numFrames = 8; // or however many; you seem to know what this value should be
for (int i = 0; i < numFrames; ++i)
{
    sprintf(fname, "Animations/foo%d.png", (i+1));
    foo_ani[i] = LoadImage(fname);
}
delete[] fname;
That's about 6 lines of code. So for animations of more than 6 frames, I'd say that was easier.