Sort filenames naturally with Qt

I am reading a directory's contents using QDir::entryList(). The filenames in it are structured like this:
index_randomNumber.png
I need them sorted by index, the way Windows Explorer would sort them, so that I get
0_0815.png
1_4711.png
2_2063.png
...
instead of what sorting by QDir::Name gives me:
0_0815.png
10000_6661.png
10001_7401.png
...
Is there a built-in way in Qt to achieve this, and if not, where is the right place to implement it?

If you want to use QCollator to sort the list returned by QDir::entryList(), turn off QDir's own sorting and sort the result with std::sort():
dir.setFilter(QDir::Files | QDir::NoSymLinks);
dir.setSorting(QDir::NoSort); // will sort manually with std::sort
auto entryList = dir.entryList();
QCollator collator;
collator.setNumericMode(true);
std::sort(
entryList.begin(),
entryList.end(),
[&](const QString &file1, const QString &file2)
{
return collator.compare(file1, file2) < 0;
});
According to The Badger's comment, QCollator can also be used directly as an argument to std::sort, replacing the lambda, so the call to std::sort becomes:
std::sort(entryList.begin(), entryList.end(), collator);

Qt had no natural sort implementation until Qt 5.2 (see this feature request).
Since Qt 5.2 there is QCollator, which sorts naturally when numeric mode is enabled.

Yes, it is possible.
To do that, specify the LocaleAware flag when constructing the QDir object. The constructor is
QDir(const QString & path, const QString & nameFilter, SortFlags sort = SortFlags( Name | IgnoreCase ), Filters filters = AllEntries)
You can also use
QDir dir;
dir.setSorting(QDir::LocaleAware);

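#include <QCoreApplication>
#include <QDebug>
#include <QDir>
#include <QStringList>
#include <algorithm>

// Returns the value of the first run of digits in sIn (0 if there is none),
// e.g. 10000 for "10000_6661.png".
inline int findNumberPart(const QString& sIn)
{
    QString s = "";
    int i = 0;
    bool isNum = false;
    while (i < sIn.length())
    {
        if (isNum)
        {
            // Inside the number: stop at the first non-digit.
            if (!sIn[i].isDigit())
                break;
            s += sIn[i];
        }
        else
        {
            // Before the number: the first digit starts it.
            if (sIn[i].isDigit())
            {
                isNum = true;
                s += sIn[i];
            }
        }
        ++i;
    }
    if (s == "")
        return 0;
    return s.toInt();
}
bool naturalSortCallback(const QString& s1, const QString& s2)
{
    int idx1 = findNumberPart(s1);
    int idx2 = findNumberPart(s2);
    return (idx1 < idx2);
}
int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    QDir dir(MYPATH); // MYPATH: the directory to list
    QStringList list = dir.entryList(QDir::AllEntries | QDir::NoDotAndDotDot);
    // qSort is deprecated (and removed in Qt 6); std::sort takes the same arguments.
    std::sort(list.begin(), list.end(), naturalSortCallback);
    for (const QString& s : list)
        qDebug() << s; // qDebug() appends the newline itself
    return 0; // no event loop is needed for a one-shot listing
}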
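If the filenames always start with the index, as in index_randomNumber.png, you can also compare on the leading digits alone. The following is a Qt-free sketch of that idea in standard C++ on std::string; leadingDigits, byLeadingIndex and sortByLeadingIndex are hypothetical names, only ASCII digits are recognized, and the digit strings are compared by length and then character by character, so very long indices cannot overflow an int.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns the leading run of ASCII digits of a name
// (e.g. "10000" for "10000_6661.png"), with leading zeros stripped but at
// least one digit kept. Returns "" if the name does not start with a digit.
std::string leadingDigits(const std::string& name)
{
    std::string::size_type end = 0;
    while (end < name.size() && std::isdigit(static_cast<unsigned char>(name[end])))
        ++end;
    std::string::size_type start = 0;
    while (start + 1 < end && name[start] == '0')
        ++start;
    return name.substr(start, end - start);
}

// Orders by the numeric value of the leading index. Equal indices (or names
// without one) fall back to a plain comparison, which keeps the ordering
// a strict weak ordering as std::sort requires.
bool byLeadingIndex(const std::string& a, const std::string& b)
{
    const std::string da = leadingDigits(a);
    const std::string db = leadingDigits(b);
    if (da.size() != db.size())
        return da.size() < db.size(); // fewer digits means a smaller number
    if (da != db)
        return da < db;               // same length: text order equals numeric order
    return a < b;
}

// Hypothetical convenience wrapper.
void sortByLeadingIndex(std::vector<std::string>& names)
{
    std::sort(names.begin(), names.end(), byLeadingIndex);
}
```

Names without a leading index compare as the empty digit string, so they end up first.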

This isn't an answer to the question as such, but some general information for the benefit of others who stumble across this while trying to figure out how to "sort naturally".
First off: it's impossible. "Correct" natural sorting depends on context that, short of "true" artificial intelligence, is virtually impossible to have. For instance, if I have a bunch of file names with mixed numbers and letters, and some parts of those names happen to match [0-9a-f], is that a hexadecimal number? Is "1,500" the same as "1500", or are "1" and "500" individual numbers? Does "2019/06/07" come before or after "2019/07/06"? What about "1.21" vs. "1.5"? (Hint: the last depends on whether those are decimal numbers or semantic version numbers.)
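To make the "1.21" vs. "1.5" hint concrete, here is a small standard C++ illustration; asDecimal and asVersion are hypothetical helpers, and the inputs are assumed to be well-formed. Read as decimals, 1.21 sorts first; read as dotted versions, 1.5 does.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads "1.21" as the decimal number 1.21.
double asDecimal(const std::string& s)
{
    return std::stod(s);
}

// Hypothetical helper: reads "1.21" as a dotted version, i.e. {1, 21}.
std::vector<int> asVersion(const std::string& s)
{
    std::vector<int> parts;
    std::istringstream in(s);
    std::string part;
    while (std::getline(in, part, '.'))
        parts.push_back(std::stoi(part));
    return parts;
}
```

Comparing asVersion results with std::vector's operator< is lexicographic per component, so {1, 5} comes before {1, 21}, while 1.21 < 1.5 as doubles.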
"Solving" this problem requires constraining it; deciding we're only going to handle specific cases, and anything outside of those bounds is just going to produce a "wrong" answer. (Fortunately, the OP's problem would appear to already satisfy the usual set of constraints.)
That said, I believe QCollator generally works well (again, not in that it "really" works, but in that it succeeds within the generally accepted constraints). In the "own solutions" department, also have a look at qtNaturalSort, which I wrote as a Qt-API improvement over a different (non-QCollator) algorithm. (Case insensitivity is not supported as of this writing, but patches are welcome!) I put a lot of effort into making it parse numbers "correctly", even handling numbers of arbitrary length and non-BMP digits.

Apart from QCollator (Qt 5.2 and later), Qt doesn't offer natural sorting out of the box, but it is quite easy to implement. For example, this can be used to sort a QStringList:
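#include <QString>

struct naturalSortCompare {
    static bool isNumber(QChar c) {
        return c >= QLatin1Char('0') && c <= QLatin1Char('9');
    }
    // Compares two runs of ASCII digits by value without converting them to
    // int, so arbitrarily long runs cannot overflow. Returns <0, 0 or >0.
    static int compareNumbers(const QString& n1, const QString& n2) {
        int i = 0, j = 0;
        // Skip leading zeros, but keep at least one digit.
        while (i < n1.length() - 1 && n1[i] == QLatin1Char('0')) i++;
        while (j < n2.length() - 1 && n2[j] == QLatin1Char('0')) j++;
        const int len1 = n1.length() - i;
        const int len2 = n2.length() - j;
        if (len1 != len2) return len1 - len2; // more digits means a larger number
        return n1.mid(i).compare(n2.mid(j));
    }
    bool operator() (const QString& s1, const QString& s2) const {
        if (s1.isEmpty() || s2.isEmpty()) return s1 < s2;
        // Move to the first difference between the strings
        int startIndex = -1;
        const int length = qMin(s1.length(), s2.length());
        for (int i = 0; i < length; i++) {
            if (s1[i] != s2[i]) {
                startIndex = i;
                break;
            }
        }
        // Equal strings, or one is a prefix of the other: exit now.
        if (startIndex < 0) return s1 < s2;
        // Back up to the start of a digit run shared by both strings, so that
        // "100" vs "19" compares 100 with 19 rather than "00" with "9".
        while (startIndex > 0 && isNumber(s1[startIndex - 1])) startIndex--;
        // Now extract the numbers, if any, from the two strings.
        QString sn1;
        QString sn2;
        for (int i = startIndex; i < s1.length() && isNumber(s1[i]); i++) sn1 += s1[i];
        for (int i = startIndex; i < s2.length() && isNumber(s2[i]); i++) sn2 += s2[i];
        // If neither string has a number at that position, use a regular comparison.
        if (sn1.isEmpty() && sn2.isEmpty()) return s1 < s2;
        // If only one string has a number at that position, put the string
        // without a number first so that, for example, "example.bin" comes
        // before "example1.bin".
        if (sn1.isEmpty()) return true;
        if (sn2.isEmpty()) return false;
        const int cmp = compareNumbers(sn1, sn2);
        // Equal values written differently (e.g. "01" and "1") fall back to a
        // plain comparison so that the ordering stays strict.
        return cmp != 0 ? cmp < 0 : s1 < s2;
    }
};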
Then usage is simply:
std::sort(stringList.begin(), stringList.end(), naturalSortCompare());
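A different way to get the same kind of ordering is to walk both strings in chunks: runs of digits are compared by value, everything else character by character. The sketch below is plain standard C++ on std::string rather than QString, so it is not a drop-in for QStringList; naturalLess is a hypothetical name, only ASCII digits count as digits, and there is no case folding or locale handling.

```cpp
#include <cctype>
#include <string>

// Hypothetical natural "less than" for std::string: digit runs compare by
// numeric value (any length), other characters compare one by one.
bool naturalLess(const std::string& a, const std::string& b)
{
    auto isDigit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    std::string::size_type i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (isDigit(a[i]) && isDigit(b[j])) {
            // Skip leading zeros, then find where each digit run ends.
            std::string::size_type si = i, sj = j;
            while (si < a.size() && a[si] == '0') ++si;
            while (sj < b.size() && b[sj] == '0') ++sj;
            std::string::size_type ei = si, ej = sj;
            while (ei < a.size() && isDigit(a[ei])) ++ei;
            while (ej < b.size() && isDigit(b[ej])) ++ej;
            // More significant digits means a larger number.
            if (ei - si != ej - sj) return ei - si < ej - sj;
            // Same length: digit-by-digit order equals numeric order.
            const int c = a.compare(si, ei - si, b, sj, ej - sj);
            if (c != 0) return c < 0;
            i = ei;
            j = ej;
        } else {
            if (a[i] != b[j]) return a[i] < b[j];
            ++i;
            ++j;
        }
    }
    // Both exhausted: equal by value (e.g. "01" vs "1"), so use a plain
    // comparison to keep the ordering strict.
    if (i == a.size() && j == b.size()) return a < b;
    // Otherwise the string that ran out first is smaller.
    return i == a.size();
}
```

Because the digit runs are never converted to an integer type, names with very long numbers still sort correctly.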

Related

Is there an alternative to using str.substr( ) to extract a substring at a given position?

I am trying to compare two std::strings, and decide if string A is the same as string B, but with the insertion or deletion of a single character.
Otherwise it returns false.
For example: "start" and "strt" or "ad" and "add"
Currently:
if(((sizeA - sizeB) != 1)
&& ((sizeB - sizeA) != 1))
{
return false;
}
if(sizeA < sizeB)
{
for(int i = 0; i < sizeA; ++i)
{
if(stringA[i] != stringB[i])
{
if(stringA.substr(i)
== stringB.substr(i + 1))
{
return true;
}
else return false;
}
}
} //with another loop that runs only if stringA is larger than stringB
This works flawlessly, but gprof tells me that this function is a bottleneck.
I tried converting the for loop to use iterators to access the chars, but this doubled my run time.
I've narrowed it down to my use of std::string::substr(), because it constructs new strings each time stringA and stringB differ in size by 1.
When I find the first character that differs, I need a more efficient way to check: if I deleted that character, would the two strings be equal?

It seems that, once it is known whether the lengths differ by exactly one character, the comparison can be done more efficiently with a single pass over the string: find the location of the difference, skip the character, and see if the tails are the same. To that end it is necessary to know which one is the shorter string, but that's trivial to determine:
#include <algorithm>
#include <string>
#include <utility>

// True if 'longer' equals 'shorter' with exactly one extra character.
bool oneCharDiff(std::string const& shorter, std::string const& longer) {
    if (shorter.size() + 1u != longer.size()) {
        return false;
    }
    typedef std::string::const_iterator const_iterator;
    // Find the first position where the strings differ...
    std::pair<const_iterator, const_iterator> p
        = std::mismatch(shorter.begin(), shorter.end(), longer.begin());
    // ...skip that character in the longer string and compare the tails.
    return std::equal(p.first, shorter.end(), p.second + 1);
}
bool atMostOneCharDiff(std::string const& s0, std::string const& s1) {
    if (s0.size() < s1.size()) {
        return oneCharDiff(s0, s1);
    }
    else if (s1.size() < s0.size()) {
        return oneCharDiff(s1, s0);
    }
    else {
        return s0 == s1;
    }
}

Try:
if (stringA.compare(i, stringA.npos, stringB, i+1, stringB.npos) == 0) {
/* the strings are equal */
}
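In the std::basic_string::compare reference, that's version (3): it compares the tail of stringA starting at i with the tail of stringB starting at i+1 in place, without constructing temporary strings.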
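Put into the shape of the loop from the question, the compare() call replaces both substr() calls. This is a sketch: oneInsertionApart is a hypothetical name, and it treats equal-length strings as not matching, as the question's size check does.

```cpp
#include <string>

// Hypothetical helper: true if one string equals the other with exactly one
// character inserted or deleted.
bool oneInsertionApart(const std::string& a, const std::string& b)
{
    const std::string& shorter = a.size() < b.size() ? a : b;
    const std::string& longer = a.size() < b.size() ? b : a;
    if (shorter.size() + 1 != longer.size())
        return false;
    for (std::string::size_type i = 0; i < shorter.size(); ++i) {
        if (shorter[i] != longer[i]) {
            // Compare shorter[i..] with longer[i+1..] in place; no temporaries.
            return shorter.compare(i, std::string::npos,
                                   longer, i + 1, std::string::npos) == 0;
        }
    }
    // No difference found: the extra character is the last one of 'longer'.
    return true;
}
```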

If your compiler supports it, it may be worth checking out the string_view class from the new ISO/IEC TS 19568:xxxx Technical Specification.
It provides an immutable view of a string through references, without copying the string itself, so it promises to be much more efficient when dealing with substrings.
#include <experimental/string_view>
using std::experimental::string_view;
bool func(string_view svA, string_view svB)
{
// ... stuff
if(svA.size() < svB.size())
{
for(int i = 0; i < svA.size(); ++i)
{
if(svA[i] != svB[i])
{
if(svA.substr(i)
== svB.substr(i + 1))
{
return true;
}
else return false;
}
}
}
// ... stuff
return false;
}
As you can see, it works pretty much as a drop-in replacement for std::string (or const char* etc.). Simply pass your normal std::string objects as arguments to the function, and the string_view parameters will be initialized from the passed-in strings.

What data structure is better to use to find out whether a sentence consists of unique characters?

I'm trying to solve a task and I'm not sure if I'm using a suitable data structure for it. My task is to determine whether a sentence consists of unique characters and return a boolean value as the result.
Here is my function:
bool use_map(string sentence) {
map<int, string> my_map;
for (string::size_type i = 0; i <= sentence.length(); i++) {
unsigned int index = (int)sentence[i];
if (my_map.find(index) != my_map.end())
return false;
my_map[index] = sentence[i];
}
return true;
}
The map is the only structure I found that suits me. Am I missing something?
Maybe it's better to use something like the dynamic arrays in PHP?
I'm trying to use a hash table solution.

The other answers suggest std::set, and that's a solution. BUT they copy all chars into the std::set and then take the size of the set. You don't really need this, and you can avoid it by using the return value of std::set::insert. Something like:
std::set< char > my_set;
for (std::string::size_type ii = 0; ii < sentence.size(); ++ii)
{
if( ! my_set.insert( sentence[ ii ] ).second )
{
return false;
}
}
This way you'll:
stop at the first duplicated char instead of (unnecessarily) copying the whole string,
avoid the unnecessary cast to int in your code,
save memory, if you don't actually need your std::map< int, std::string >::second.
Also, decide whether you need to "count" all chars or want to skip some of them (like spaces, commas, question marks, etc.).
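Since the question mentions a hash table, the same insert().second idea works with std::unordered_set, which has average constant-time insertion. This is a sketch: allUnique is a hypothetical name, and skipping spaces is an assumption you may want to widen or drop.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: true if no character (spaces excluded) occurs twice.
// Stops at the first duplicate, using the bool returned by insert().
bool allUnique(const std::string& sentence)
{
    std::unordered_set<char> seen;
    for (char c : sentence) {
        if (c == ' ')            // assumption: spaces are not counted
            continue;
        if (!seen.insert(c).second)
            return false;        // c was already in the set
    }
    return true;
}
```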

A very simple (but rather memory expensive) way would be:
#include <set>
#include <string>

bool use_map(const std::string& sentence)
{
    std::set<char> chars(sentence.begin(), sentence.end());
    return chars.size() == sentence.size();
}
If there are no duplicate chars, the sizes of the string and the set will be equal.
@Jonathan Leffler raises a good point in the comments: sentences usually contain several spaces, so this will return false. You'll want to filter spaces out. Still, std::set should be your container of choice.
Edit:
Here's an idea for an O(n) solution that needs only a fixed 256-entry table of extra memory. Just use a lookup table where you mark whether the char was seen before:
#include <string>

bool no_duplicates(const std::string& sentence)
{
    bool table[256] = {}; // one flag per unsigned char value, all false
    for (char c : sentence) {
        // don't test spaces
        if (c == ' ') continue;
        // add more tests if needed
        const unsigned char uc = static_cast<unsigned char>(c);
        if (table[uc]) return false;
        table[uc] = true;
    }
    return true;
}

I guess an easy way is to store all the characters in an associative container that does not allow duplicates, such as std::set, and check whether the set has as many elements as the string has characters:
#include <set>
#include <string>
bool has_unique_character(std::string const& str)
{
std::set<char> s(begin(str), end(str));
return (s.size() == str.size());
}

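What about this? There is a case issue, of course: only the lowercase letters 'a' to 'z' are checked, and every other character (uppercase letters included) is skipped.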
bool use_map(const std::string& sentence)
{
std::vector<bool> chars(26, false);
for(std::string::const_iterator i = sentence.begin(); i != sentence.end(); ++i) {
if(*i == ' ' || *i - 'a' > 25 || *i - 'a' < 0) {
continue;
} else if(chars[*i - 'a']) {
return false;
} else {
chars[*i - 'a'] = true;
}
}
return true;
}

Sort the characters and then look for an adjacent pair of alphabetic characters with both characters equal. Something like this:
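#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

void report_duplicates(std::string my_sentence) // by value: the copy gets sorted
{
    std::sort(my_sentence.begin(), my_sentence.end());
    // 'auto' keeps both adjacent_find arguments the same iterator type.
    auto it = std::adjacent_find(my_sentence.begin(), my_sentence.end());
    // Skip equal pairs that are not letters (e.g. repeated spaces).
    while (it != my_sentence.end() && !std::isalpha(static_cast<unsigned char>(*it)))
        it = std::adjacent_find(++it, my_sentence.end());
    if (it == my_sentence.end())
        std::cout << "No duplicates.\n";
    else
        std::cout << "Duplicated '" << *it << "'.\n";
}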

If you are allowed to use additional memory, use a hash table: iterate through the string and check whether the current element has already been hashed. If yes, you have found a repetition; if not, add it to the hash. This is linear, but requires additional memory.
If the range of original sequence elements is quite small, instead of hashing you can simply have an array of the range size and do like in a bucket sort. For example
#include <string>
#include <vector>

bool hasDuplicate( const std::string& s )
{
    std::vector<char> v( 256, 0 );
    for( std::string::size_type i = 0; i < s.size(); ++i )
    {
        // Index with unsigned char: a plain char can be negative.
        const unsigned char c = static_cast<unsigned char>( s[ i ] );
        if( v[ c ] ) // v[ hash( s[i] ) ] here in case of hash usage
            return true;
        v[ c ] = 1; // and here too
    }
    return false;
}
Finally, if you are not allowed to use additional memory, you can just sort the string and check in one pass whether two adjacent elements are equal. This takes O(n log n) time. No need for sets or maps :)

Here is a fast solution using a lookup table:
#include <string>
using std::string;

bool charUsed[256];
bool isUnique(const string& sentence) {
    int i;
    for(i = 0; i < 256; ++i) {
        charUsed[i] = false;
    }
    int n = sentence.size();
    for(i = 0; i < n; ++i) {
        if (charUsed[(unsigned char)sentence[i]]) {
            return false;
        }
        charUsed[(unsigned char)sentence[i]] = true;
    }
    return true;
}

c++ STL sort with extra parameter 'invalid operator <'

I am trying to sort a not-so-small vector of strings with a custom comparison rule:
bool lexGraph(string const &str1, string const &str2)
{
string::const_iterator i1 = str1.begin(), i2 = str2.begin();
while((i1 < str1.end()) && (i2 < str2.end()))
{
if(*i1 == ' ')
{
i1++;
continue;
}
if(*i2 == ' ')
{
i2++;
continue;
}
if(toupper(*i1) < toupper(*i2))
{
return true;
}
if(toupper(*i1) > toupper(*i2))
{
return false;
}
i1++, i2++;
}
return (str1.length() <= str2.length());
}
I use it in this loop:
vector<string> subset;
ifstream fin(input);
ofstream fout(output);
string buff;
for(long i = 0; i < 241; i++)
{
getline(fin,buff);
buff += '\n';
subset.push_back(buff);
}
sort(subset.begin(), subset.end(),lexGraph);
I found out that the overflow error occurs with vectors larger than 240 elements, and that this threshold can become even smaller if I use a smaller file. Also, the strings are never really big. If I cut my function down to
bool lexGraph(string const &str1, string const &str2)
{
return (str1.length() <= str2.length());
}
the error still occurs, but it doesn't when I use STL sort without the extra parameter.
So I can't figure out where the leak is, and I hope for a hint here.

You need a strict weak ordering. Your ordering function must return false when called with two equal strings; with <= it returns true (lexGraph(s, s) is true), which breaks the assumptions std::sort relies on and makes its behavior undefined. BTW: I believe that some standard library implementations have a diagnostic mode that could have caught this error for you. Use it, as there is enough rope in C++ to shoot yourself in the foot with.
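One way to repair lexGraph, sketched under the assumption that strings differing only in spaces or letter case should count as equal: skip spaces in both strings, compare the uppercased characters, and when one string runs out, treat it as smaller only if the other still has non-space characters. Equal strings then give false in both directions, as a strict weak ordering requires.

```cpp
#include <cctype>
#include <string>

// Case-insensitive, space-ignoring "less than" that is a strict weak ordering.
bool lexGraph(std::string const& str1, std::string const& str2)
{
    std::string::const_iterator i1 = str1.begin(), i2 = str2.begin();
    for (;;) {
        // Skip spaces in both strings.
        while (i1 != str1.end() && *i1 == ' ') ++i1;
        while (i2 != str2.end() && *i2 == ' ') ++i2;
        if (i1 == str1.end() || i2 == str2.end())
            // Less only if str1 ran out while str2 still has characters.
            return i1 == str1.end() && i2 != str2.end();
        const int c1 = std::toupper(static_cast<unsigned char>(*i1));
        const int c2 = std::toupper(static_cast<unsigned char>(*i2));
        if (c1 != c2)
            return c1 < c2;
        ++i1;
        ++i2;
    }
}
```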
