Efficient search algorithm for files with specified extension C/C++ - c++

I want to implement a faster directory search.
Is there any algorithm available in C/C++ for that?

Check the Boost.Filesystem library at http://www.boost.org/doc/libs/1_47_0/libs/filesystem/v3/doc/index.htm; it has a recursive_directory_iterator class.

There isn't a C++ solution per se. Directory search is usually slow because of I/O, and because you must stat each file (or use whatever the OS equivalent is on non-Unix systems) to find out anything besides its name. One way to make this faster would be to write a server that keeps the inodes and filenames in memory. The difficulty is that the inode information is not static, so you would need to listen for file-system changes to keep your cache up to date. That is definitely possible on Linux, but I have no experience with it on other systems. This also shows another theme of the problem: it is very system-dependent and possibly filesystem-dependent. A system-independent library like Boost.Filesystem may help, but I doubt it implements directory update callbacks.
Maybe just install Google Desktop?

Here's a windows solution (http://ideone.com/5dFVf)
#include <windows.h>
#include <cstring>
#include <deque>
#include <string>
#include <vector>
// Thin RAII wrapper around FindFirstFileW/FindNextFileW; a NULL handle marks the end.
class file_iterator {
    HANDLE handle;
    WIN32_FIND_DATAW fdata;
public:
    file_iterator() : handle(NULL) { std::memset(&fdata, 0, sizeof(fdata)); }
    explicit file_iterator(const std::wstring& pattern)
        : handle(FindFirstFileW(pattern.c_str(), &fdata)) {
        if (handle == INVALID_HANDLE_VALUE)   // no match or bad path: become the end iterator
            handle = NULL;
    }
    file_iterator(file_iterator&& b) : handle(b.handle), fdata(b.fdata) { b.handle = NULL; }
    file_iterator& operator=(file_iterator&& b) {
        if (this != &b) { close(); handle = b.handle; fdata = b.fdata; b.handle = NULL; }
        return *this;
    }
    file_iterator(const file_iterator&) = delete;             // a find handle cannot be shared
    file_iterator& operator=(const file_iterator&) = delete;
    ~file_iterator() { close(); }
    void close() {
        if (handle) { FindClose(handle); handle = NULL; }     // reset, or ++ never reaches end
    }
    const WIN32_FIND_DATAW& operator*() const { return fdata; }
    file_iterator& operator++() {
        if (!FindNextFileW(handle, &fdata)) close();
        return *this;
    }
    bool operator==(const file_iterator& b) const { return handle == b.handle; }
    bool operator!=(const file_iterator& b) const { return handle != b.handle; }
};
// Breadth-first: folders found while scanning are queued and scanned later.
std::vector<std::wstring>
find_files_with_extension(
    const std::wstring& folder,
    const std::wstring& extension,
    std::vector<std::wstring>* result = NULL)
{
    std::vector<std::wstring> local_result;
    if (result == NULL)
        result = &local_result;
    std::deque<std::wstring> todo(1, folder);
    while (!todo.empty()) {
        std::wstring current = todo.front();
        todo.pop_front();
        for (file_iterator iter(current + L"\\*"); iter != file_iterator(); ++iter) {
            std::wstring name((*iter).cFileName);
            if (name == L"." || name == L"..")
                continue;                                   // otherwise the search never ends
            std::wstring full = current + L"\\" + name;     // keep the full path, not just the name
            if ((*iter).dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)   // '&', not '|'
                todo.push_back(full);
            else if (name.size() > extension.size() &&
                     name.compare(name.size() - extension.size(), extension.size(), extension) == 0)
                result->push_back(full);
        }
    }
    return *result;
}
This uses a breadth-first search, which takes a little more RAM and is slightly more complicated, but can be faster thanks to caching.

Searching is an OS feature these days, and those trying to implement third-party indexing are giving up. Even Google Desktop is no longer being updated, and most consider it dead:
https://superuser.com/questions/194082/is-google-desktop-search-a-dead-project
If you install a search server on someone's computer and get caught hogging disk and CPU--and you do not have a very, VERY good reason for doing so--you will not only waste a lot of time writing code and patching bugs but you will also alienate your users.
For most cross-platform apps, letting the users find the files in the Explorer/Finder/Nautilus and then making your app accept multi-file drag and drops is a better answer. Also, most "common dialogs" for opening files provide built in search functionality now.
If you're trying to write a search-accelerated tool for a specific platform, hook into that platform's API, which may even permit you to supplement its index. Here's Microsoft's Programmatic Search API:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb266517(v=vs.85).aspx
OS X has the Spotlight API:
http://developer.apple.com/library/mac/#documentation/Carbon/Conceptual/SpotlightQuery/SpotlightQuery.html
I'm not sure there is a canonical search service in the Linux world, but almost all relevant versions of Ubuntu now ship with Tracker:
http://live.gnome.org/Tracker/Documentation

Related

The equivalent of %SystemDrive% (batch) in C++

To anyone that can help Please,
(My operating system is Windows XP)
I have looked on this forum but have not found a similar answer that I could use or adapt to suit this particular situation. I will try to explain (I apologise in advance if my question seems confusing).
I am constructing a batch file that will call a C++ program (.exe). The C++ program is hard-coded to the C: drive. I did not write the C++ program, as I am not able to write C++, but I would like to replace the C: in the C++ code with the equivalent of the batch variable %SystemDrive%. The line of code in C++ reads as follows:
SetSfcFileException(0, L"c:\\windows\\system32\\calc.exe",-1);
// Now we can modify the system file in a complete stealth.
}
The part of the code above that I would like to alter is the "c:": I want it to use %SystemDrive% instead, i.e. change the hard-coded part of the C++ program so that it reads a system path variable on XP.
I have also looked elsewhere on the net but have not found a suitable answer, and I do not want to break the C++ code.
The C++ code was obtained from the following website, written by Abdellatif_El_Khlifi:
https://www.codeproject.com/Articles/14933/A-simple-way-to-hack-Windows-File-Protection-WFP-u
Many Thanks for any help given,
David
The search term you should be looking for is Known Folders.
Specifically, calling SHGetKnownFolderPath() with the FOLDERID_System identifier, one of the many IDs found here.
That's for Vista or better. For earlier than that (such as XP), you have to use CSIDL values, CSIDL_SYSTEM (see here for list) passed into SHGetFolderPath().
You can still use the pre-Vista ones but I think they're just thin wrappers around the newer ones.
This is the simplest console application I could come up with that shows this in action (Visual Studio 2019):
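#include <iostream>
#include <string>
#include <windows.h>
#include <shlobj_core.h>   // SHGetKnownFolderPath, FOLDERID_System
#include <comutil.h>       // _bstr_t
int main()
{
    PWSTR path = NULL;
    HRESULT hr = SHGetKnownFolderPath(FOLDERID_System, 0, NULL, &path);
    if (FAILED(hr))
        return 1;
    _bstr_t bstrPath(path);
    std::string strPath((char*)bstrPath);
    std::cout << "Path is '" << strPath << "'\n";
    CoTaskMemFree(path);   // the caller must free the returned string
}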
and the output on my system is:
Path is 'C:\WINDOWS\system32'
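If what you need is literally the value of %SystemDrive%, as in the batch file, another option is to read the environment variable. Below is a minimal sketch. The helper name with_system_drive is hypothetical, and it assumes a fallback to "C:" when the variable is missing. On Windows, GetEnvironmentVariableW or ExpandEnvironmentStringsW would return the value as a wide string directly.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: prefix a drive-relative path with %SystemDrive%, the
// same variable the batch file uses. Falls back to "C:" when the variable is
// not set (for example when this runs outside Windows).
std::wstring with_system_drive(const std::wstring& relative)
{
    const char* drive = std::getenv("SystemDrive");   // e.g. "C:"
    std::string d = (drive && *drive) ? drive : "C:";
    return std::wstring(d.begin(), d.end()) + relative;  // drive names are plain ASCII
}
```

In the program from the question, the call would become `SetSfcFileException(0, with_system_drive(L"\\windows\\system32\\calc.exe").c_str(), -1);`.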
This is not really an answer to my own question (well, it is, but in an alternative manner; there are many ways to skin a cat, so to speak)!
Here is one encouraging bit of news, though: I have stumbled across the very thing I need, called WFPReplacer. It is a command-line Windows utility that does pretty much what I want, and in generally the same manner. It disables WFP for single files, and it can be used to switch off WFP wholesale if the right file is replaced. All I need to do is write a batch file as a front end that backs up the system files before I use WFPReplacer.exe to disable their protection, so that if the routine goes wrong I can revert to the backed-up files. I think this program uses the same kind of embedded code but is written in Delphi/Pascal; it comes from Remko Weijnen's Blog (Remko's Blog), in the post "Replacing WFP protected files".
I generally like to leave whatever I am doing on a positive note. So, in case someone else lands on this forum and is trying to accomplish a similar exercise, here is the code that one can compile (this is not my code; it belongs to Remko Weijnen's Blog (Remko's Blog)). Please be advised it is NOT C++; it is a command-line Delphi/Pascal exe found at this link, so all credit belongs to him. The link is:
https://www.remkoweijnen.nl/blog/2012/12/05/replacing-wfp-protected-files/
DWORD __stdcall SfcFileException(RPC_BINDING_HANDLE hServer, LPCWSTR lpSrc, int Unknown)
{
RPC_BINDING_HANDLE hServerVar; // eax#2
int nts; // eax#6
__int32 dwResult; // eax#7
DWORD dwResultVar; // esi#9
int v8; // [sp+8h] [bp-8h]#1
int v9; // [sp+Ch] [bp-4h]#1
LOWORD(v8) = 0;
*(int *)((char *)&v8 + 2) = 0;
HIWORD(v9) = 0;
if ( !hServer )
{
hServerVar = _pRpcHandle;
if ( !_pRpcHandle )
{
hServerVar = SfcConnectToServer(0);
_pRpcHandle = hServerVar;
if ( !hServerVar )
return 0x6BA; // RPC_S_SERVER_UNAVAILABLE
}
hServer = hServerVar;
}
nts = SfcRedirectPath(lpSrc, (int)&v8);
if ( nts >= 0 )
dwResult = SfcCli_FileException((int)hServer, v9, Unknown).Simple;
else
dwResult = RtlNtStatusToDosError(nts);
dwResultVar = dwResult;
MemFree(v9);
return dwResultVar;
}
One further warning: unless you know what you are doing, do not attempt to use this program, and ALWAYS back up your system files before deleting or altering them.
What this program does is disarm WFP for 60 seconds while you interchange or amend your files. Example usage:
WfpReplacer.exe c:\windows\Notepad.exe (Errorlevel true or false will be produced on execution).
Best Regards
David

boost::interprocess_exception - library_error exception when creating shared_memory_object

In some rare cases (in fact on a single client's computer) code below throws an exception "library_error":
namespace ipc = boost::interprocess;
ipc::shared_memory_object m_shm;
...
bool initAsServer(size_t sharedMemSize)
{
ipc::permissions perm;
perm.set_unrestricted();
try
{
m_shm = ipc::shared_memory_object(
ipc::create_only,
CNameGenHelper::genUniqueNameUtf8().c_str(), // static std::string genUniqueNameUtf8()
ipc::read_write, perm);
}
catch(const ipc::interprocess_exception& ex)
{
logError("failed with exception \"%s\"", ex.what());
return false;
}
...
}
In log file:
[ERR] failed with exception "boost::interprocess_exception::library_error"
Boost v1.58, platform win32, vs13.
I'll be very grateful if you help me in solving this problem. Thank you in advance!
The cause of the problem is the events with Event ID 6005 and source name "EventLog" in the "System" Windows log
(Event Viewer - Windows Logs - System).
If the system log does not contain at least one such event, then boost::interprocess::winapi::get_last_bootup_time() returns false and the boost::interprocess::ipcdetail::windows_bootstamp constructor throws an exception
(this is the path taken when BOOST_INTERPROCESS_HAS_KERNEL_BOOTTIME is defined).
So it seems that clearing the "System" Windows event log is enough to make any application that uses Boost shared memory stop working.
What terrible logic: relying on the contents of the Windows event log.
This seems to be a Boost.Interprocess bug that has not yet been fixed (as of boost_1_61_0).
My temporary workaround for this case (no reboot required) is to write a 6005 event myself:
bool fixBoostIpcSharedMem6005issue() const
{
bool result = false;
HANDLE hEventLog = ::RegisterEventSourceA(NULL, "EventLog");
if(hEventLog)
{
const char* msg = "simple boost shared memory fix for 6005";
if(::ReportEventA(hEventLog, EVENTLOG_INFORMATION_TYPE, 0, 6005, NULL, 1, 0, &msg, NULL))
result = true;
::DeregisterEventSource(hEventLog);
}
return result;
}
Call it, then try to create the ipc::shared_memory_object again.
Detailed explanations of the problem by one of the authors of the library: "Boost interprocess: Getting boot-up time is unreliable on Windows" and "Interprocess get_last_bootup_time use of Event Log on Windows is completely unreliable".
Apparently, a reliable solution is to define the preprocessor constant BOOST_INTERPROCESS_SHARED_DIR_PATH as a function call that always returns the same directory path as a string for as long as the machine stays booted, for example by formatting the modification time stamp of a file that is written at start-up.
You can #define BOOST_INTERPROCESS_BOOTSTAMP_IS_SESSION_MANAGER_BASED or BOOST_INTERPROCESS_BOOTSTAMP_IS_LASTBOOTUPTIME to switch to registry-based or WMI-based boot time detection, respectively.
Alternatively, you can use BOOST_INTERPROCESS_SHARED_DIR_PATH, but it's of limited use on Windows because it is a hard-coded path. BOOST_INTERPROCESS_SHARED_DIR_FUNC is a much better option, since it lets you define a function that returns the path to the shared directory.

How to walk through directory tree step by step?

I found many examples of walking through a directory tree, but I need something a little different: a class with a method that returns one file from the directory on each call, gradually walking through the directory tree. How can I do this? I am using the functions FindFirstFile, FindNextFile and FindClose, and I am a newbie in C++. I have something like this...
For example I have this simple directory tree
Parent(folder)\
file1.txt
file2.txt
Child(folder)\
file3.txt
file4.txt
and I need a class with a method, for example getNextFile(), whose first call returns file1.txt, second call returns file2.txt, third call returns Child(folder), fourth call returns file3.txt, and so on...
Edit on duplicate flag: I basically need to walk through the tree without a do/while, while or for loop. I need some kind of iterator that can be stored for later use and can continue from the last file when I interrupt browsing, ideally using only WinAPI calls.
WIN32_FIND_DATA fdFile;
HANDLE hFind = NULL;
if((hFind = FindFirstFile(sPath, &fdFile)) == INVALID_HANDLE_VALUE)
{
return false;
}
do
{
//do some job with fdFile
}
while(FindNextFile(hFind, &fdFile));
Here is a native way of doing it on the Windows platform, using the MFC framework:
void ListFiles(const CString& sPath)
{
    CFileFind finder;
    CString sWildcard(sPath);
    sWildcard += _T("\\*.*");
    BOOL bWorking = finder.FindFile(sWildcard);
    while (bWorking)
    {
        bWorking = finder.FindNextFile();
        if (finder.IsDots())
            continue;                       // skip "." and ".."
        CString sFilePath = finder.GetFilePath();
        if (finder.IsDirectory())
        {
            // TODO: handle the folder here
            ListFiles(sFilePath);           // then recurse into it
        }
        else
        {
            // TODO: handle the file here
        }
    }
    finder.Close();
}
You can change the wildcard string to target specific files, like *.txt. You can also pass it as a parameter to make the function more general-purpose.
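When the pattern becomes a parameter, it can also be useful to filter names yourself, for example on a platform without FindFirstFile-style wildcards. Below is a minimal sketch that assumes case-sensitive matching of * and ? only. wildcard_match is a hypothetical helper, and unlike Windows, it requires a literal dot for *.* to match.

```cpp
#include <cstddef>
#include <string>

// Hypothetical '*' / '?' matcher: '?' matches one character, '*' matches any
// run (including an empty one). Case-sensitive; a mismatch after a '*'
// backtracks so that the star absorbs one more character.
bool wildcard_match(const std::string& pattern, const std::string& name)
{
    std::size_t p = 0, n = 0, star = std::string::npos, mark = 0;
    while (n < name.size()) {
        if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == name[n])) {
            ++p; ++n;                                // literal or '?' match
        } else if (p < pattern.size() && pattern[p] == '*') {
            star = p++; mark = n;                    // remember the star, try matching nothing
        } else if (star != std::string::npos) {
            p = star + 1; n = ++mark;                // let the star swallow one more char
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars match empty
    return p == pattern.size();
}
```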
Use the right tools. Boost is available practically everywhere and has the methods you want.
From http://rosettacode.org/wiki/Walk_a_directory/Recursively#C.2B.2B:
#include "boost/filesystem.hpp"
#include "boost/regex.hpp"
#include <iostream>
using namespace boost::filesystem;
int main()
{
path current_dir(".");           // start from the working directory
boost::regex pattern("a.*"); // list all files starting with a
for (recursive_directory_iterator iter(current_dir), end;
iter != end;
++iter)
{
std::string name = iter->path().filename().string();
if (regex_match(name, pattern))
std::cout << iter->path() << "\n";
}
}
Remove the regex part if you don't care whether your files match a certain pattern.
EDIT:
Could you please explain why it would be bad to use directly API calls ?
it's ugly and hard to read, even harder to get right,
it's not portable at all, and what's most important,
there are a million corner cases you might have to take care of when using the raw WinAPI. Boost has been written by people who have done this a few hundred times and has undergone serious code review, so take the safe route and don't reinvent the wheel.
In essence, the WinAPI is about two decades old, and there has been a lot of usability improvement in the rest of the world since. Unless you have a really good reason, I would abstract as much of it away as possible by using common libraries such as Boost.
I think this does not solve my problem; I edited the original post to make it clearer.
basically need walk through tree without do/while, while or for...I need some kind of iterator, which can be stored for later use
That's exactly what my answer does: give you an Iterator in a for loop. I don't understand what's not fulfilling your Edit's specification about that.
In addition, it would be best to use only WinAPI, because it has to work on different computers with windows and installing boost could be a problem.
You don't have to install Boost on any of these computers. Boost.Filesystem can comfortably be linked statically. The old-school Windows way of doing this is to ship boost_filesystem*.dll and boost_system*.dll along with your binary. However, if your goal is a single executable that contains all needed functions, you'll go for static linking anyway, so this is no problem at all.

Find out if file path is mapped / remote or local

Is it possible to find out if a drive path (e.g. P:/temp/foo) is local or remote?
Here ( CMD line to tell if a file/path is local or remote? ) it's shown for a cmd evaluation, but I am looking for a C++/Qt way.
Related to:
QDir::exists with mapped remote directory
How to perform Cross-Platform Asynchronous File I/O in C++
There's no way in Qt, at least up to Qt 5.5. QStorageInfo would be the closest fit, but there is no agreement about what such an API should look like (see the gigantic discussion that started in this thread; basically, one risks having Qt report misleading information).
So, for now, you're stuck with native APIs. GetDriveType (suggested in another answer) would be fine for Windows, but you're pretty much on your own on Linux and Mac.
You could use the GetDriveType function:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa364939(v=vs.85).aspx
I recently filed a feature request about this exact question: https://bugreports.qt.io/browse/QTBUG-83321
A possible workaround emerged there. Using the following enum:
enum DeviceType {
Physical,
Other,
Unknown
};
I could reliably check whether a mount is a local device or something else (possibly a network mount) using the following function on Linux, Windows and macOS:
DeviceType deviceType(const QStorageInfo &volume) const
{
#ifdef Q_OS_LINUX
if (QString::fromLatin1(volume.device()).startsWith(QLatin1String("/"))) {
return DeviceType::Physical;
} else {
return DeviceType::Other;
}
#endif
#ifdef Q_OS_WIN
if (QString::fromLatin1(volume.device()).startsWith(QLatin1String("\\\\?\\Volume"))) {
return DeviceType::Physical;
} else {
return DeviceType::Other;
}
#endif
#ifdef Q_OS_MACOS
if (! QString::fromLatin1(volume.device()).startsWith(QLatin1String("//"))) {
return DeviceType::Physical;
} else {
return DeviceType::Other;
}
#endif
return DeviceType::Unknown;
}
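Another heuristic, offered here as an assumption rather than something from the bug report, is to look at the file-system type name (QStorageInfo::fileSystemType()) instead of the device string. The list below is illustrative, not exhaustive, and it mainly helps on Linux and macOS. On Windows, a mapped drive may simply report the remote volume's type. The helper is plain C++ so it can be tested without Qt.

```cpp
#include <set>
#include <string>

// Hypothetical heuristic: treat a mount as remote if its file-system type
// name is a known network file system. Illustrative list, not exhaustive.
bool looks_like_network_fs(const std::string& fsType)
{
    static const std::set<std::string> remote = {
        "nfs", "nfs4", "cifs", "smbfs", "smb3", "fuse.sshfs", "afpfs", "webdav"};
    return remote.count(fsType) > 0;
}
```

With Qt it would be called as `looks_like_network_fs(volume.fileSystemType().toStdString())`.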

How to know and load all images in a specific folder?

I have an application (C++ Builder 6.0) that needs to know how many images there are in a specific folder, and then load them into an ImageList, a ComboBoxEx, or any other control...
How can I do that?
I know how to load an image into a control, or save it in a TList or an ImageList... but how do I find out how many files there are in the directory, and how do I load every image in it?
I am sorry about my English.
I did something like this yesterday with C++ using the boost::filesystem library. However, if you are not using boost already, I would strongly recommend you just use the windows libraries instead. This was my code though in case you're interested:
#include <algorithm>
#include <cctype>
#include <memory>
#include <set>
#include <string>
#include <vector>
#include <boost/filesystem.hpp>
namespace fs = boost::filesystem;
typedef std::vector<fs::path> PathVector;
// std::auto_ptr was removed in C++17; std::unique_ptr replaces it.
std::unique_ptr<PathVector> ImagesInFolder(const fs::path& folderPath) {
    std::set<std::string> targetExtensions;
    targetExtensions.insert(".JPG");
    targetExtensions.insert(".BMP");
    targetExtensions.insert(".GIF");
    targetExtensions.insert(".PNG");
    std::unique_ptr<PathVector> paths(new PathVector());
    fs::directory_iterator end;
    for(fs::directory_iterator iter(folderPath); iter != end; ++iter) {
        if(!fs::is_regular_file(iter->status())) { continue; }
        // In Filesystem v3, extension() returns a path, so convert it to a string.
        std::string extension = iter->path().extension().string();
        std::transform(extension.begin(), extension.end(), extension.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        if(targetExtensions.find(extension) == targetExtensions.end()) { continue; }
        paths->push_back(iter->path());
    }
    return paths;
}
This doesn't answer the part of your question about how to actually put the paths into a listbox though.
Use the Win32 functions FindFirstFile and FindNextFile ...?
There's no practical way to identify every image in an arbitrary folder. Almost anything you can't identify as something else, could be some sort of image. Then again, using steganography, even something you can identify as something else still might be (or contain) at least part of an image as well.
Realistically, you want to pick out a set of formats you want to support, and write code that knows about them. For quite a few purposes, a half dozen formats or so is quite adequate, though the exact half dozen you pick will vary by the type of application -- only a few programs have any use for both bitmapped and vector graphics, for one example.
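Code that "knows about" a format can be as small as a check of the file's leading signature bytes. Below is a sketch for four common formats (PNG, JPEG, GIF, BMP). sniff_image_format is a hypothetical helper, and the two-byte BMP check in particular is only a heuristic.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Recognise a few common formats from their leading "magic" bytes instead of
// trusting the file extension. Returns "" when the format is not recognised.
std::string sniff_image_format(const unsigned char* data, std::size_t size)
{
    auto starts_with = [&](const char* sig, std::size_t len) {
        return size >= len && std::memcmp(data, sig, len) == 0;
    };
    if (starts_with("\x89PNG\r\n\x1a\n", 8)) return "png";   // 89 50 4E 47 0D 0A 1A 0A
    if (starts_with("\xFF\xD8\xFF", 3))       return "jpeg";
    if (starts_with("GIF87a", 6) || starts_with("GIF89a", 6)) return "gif";
    if (starts_with("BM", 2))                 return "bmp";   // weak check: only two bytes
    return "";
}
```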
Once you've decided what you want, DlgDirList is probably the easiest way to list some files. If that isn't flexible enough for your purposes, the next obvious choice is FindFirstFile, FindNextFile, and FindClose.
To get a list of all files in a folder, have a look at the FindFirst and FindNext functions in SysUtils.
Here is an example function which shows how to get a list of files.
void __fastcall TForm1::GetDirList(TStrings *List, const AnsiString SearchStr)
{
TSearchRec SRec;
AnsiString TempFName;
List->Clear();
// start search
if (FindFirst(SearchStr, faAnyFile, SRec) == 0)
{
do
{
if ((SRec.Attr & faDirectory) != faDirectory) // exclude directories
{
List->Add(SRec.Name);
} // end if
}
while (FindNext(SRec) == 0);
FindClose(SRec);
} // end if
}
Examples:
// get list of all files in directory
GetDirList(MyStringList, "C:\\images\\*.*");
// get list of all .bmp files in directory
GetDirList(MyStringList, "C:\\images\\*.bmp");
If you can upgrade to a newer version of C++Builder, have a look at TMS AdvSmoothImageListBox from TMS Software.
The TMS Smooth Controls are available free for C++Builder 2010 users from the Embarcadero website.