c++ find text in QStringList that starts with "..." using .indexOf - c++

I have a question concerning QStringList:
I have a .txt file containing several thousand lines of data followed by this:
+-------------------------+-------------------+-----------------------|
| Conditions at | X1 | X2 |
+-------------------------+-------------------+-----------------------|
| Time [ms] | 0.10780E-02 | 0.27636E-02 |
| Travel [m] | 0.11366E+00 | 0.18796E+01 |
| Velocity [m/s] | 0.43980E+03 | 0.13920E+04 |
| Acceleration [g] | 0.11543E+06 | 0.20936E+05 |
…
The header (Conditions at…) and the first column (Time, Travel, …) always stay the same, but the values vary for each run. From this file I want to read only the values into fields of a GUI.
First I read all the data into a QStringList (each line of the .txt file becomes one element of the QStringList).
To get the values from the QStringList, I tried to find the corresponding lines with .indexOf(), which didn't work because I have to ask for the exact text of the whole line. Since the values vary, the lines are different for each run and my program is not able to find the corresponding lines.
Is there a command like “.indexOf-Starting with certain text” that would find the lines starting with a certain text, for example “| Time [ms]”?
Thank you very much
itelly

Yes, there is a way to do an “.indexOf-Starting with certain text”. You can use a regular expression to match the beginning of a string:
int QStringList::indexOf (const QRegExp& rx, int from = 0) const
Use it in this way:
int timeLineIndex = stringList.indexOf(QRegExp("^\\| Time \\[ms\\].+"));
^ means that this text should be at the beginning of a string
\ escapes special characters (inside a C++ string literal the backslash itself must be written as \\)
.+ means that any text can follow this
EDIT:
Here is a working example that shows how it works:
QStringList stringList;
stringList << "abc 5234 hjd";
stringList << "bnd|gf dfs aaa";
stringList << "das gf dfs aaa";
int index = stringList.indexOf(QRegExp("^bnd\\|gf.+"));
qDebug() << index;
Output: 1
EDIT:
Here is a function that makes this easy to use:
int indexOfLineStartingWith(const QStringList& list, const QString& textToFind)
{
return list.indexOf(QRegExp("^" + QRegExp::escape(textToFind) + ".+"));
}
int index = indexOfLineStartingWith(stringList, "bnd|gf"); //it's not needed to escape characters here

First of all, your actual data starts at line 4 (after the header). Second, each data line has a specific layout that you can parse. Assuming that you read the whole file into the QStringList, where each item in the list represents one line, you can do the following:
QStringList data;
[..]
for (int i = 3; i < data.size(); i++) {
const QString &line = data.at(i);
// Parse the X1 and X2 columns' values
QString strX1 = line.section('|', 1, 1, QString::SectionSkipEmpty).trimmed();
QString strX2 = line.section('|', 2, 2, QString::SectionSkipEmpty).trimmed();
}
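For readers who want to try the idea without Qt, here is a minimal sketch of both steps (find the line by its prefix, then pull out the cells) in plain standard C++. The helper names firstLineStartingWith and tableCells are made up for this illustration and are not part of Qt or of the answers above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: index of the first line that begins with `prefix`, or -1.
int firstLineStartingWith(const std::vector<std::string>& lines, const std::string& prefix)
{
    for (std::size_t i = 0; i < lines.size(); ++i)
        if (lines[i].compare(0, prefix.size(), prefix) == 0)
            return static_cast<int>(i);
    return -1;
}

// Hypothetical helper: split a table row on '|' and return the trimmed,
// non-blank cells (the first column label followed by the values).
std::vector<std::string> tableCells(const std::string& line)
{
    std::vector<std::string> cells;
    std::istringstream row(line);
    std::string cell;
    while (std::getline(row, cell, '|'))
    {
        const std::size_t first = cell.find_first_not_of(" \t");
        if (first == std::string::npos)
            continue; // Empty or blank cell, e.g. the one before the leading '|'.
        const std::size_t last = cell.find_last_not_of(" \t");
        cells.push_back(cell.substr(first, last - first + 1));
    }
    return cells;
}
```

With the table from the question, tableCells applied to the "| Time [ms] | ..." line yields the label at index 0 and the X1 and X2 values at indices 1 and 2.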

Related

Convert string to portable filename with <filesystem> or Boost.Filesystem

Is there a simple way, with <filesystem> or <boost/filesystem.hpp> to convert a sequence of bytes, perhaps represented by std::vector<char> into a portable filename string such that the result can be converted back to the input sequence?
As an example, suppose a platform only permits filenames made up of the characters [a-f] and [0-9]. A conversion function that suits the above constraint might be one that simply outputs each character as its two-digit hex equivalent, so {'a', 'b'} would become "6162", as 'a' -> 97 -> 0x61 and 'b' -> 98 -> 0x62.
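A minimal sketch of such a hex scheme in standard C++, assuming lowercase digits so that the output stays within [a-f] and [0-9] (note that 'a' is 0x61). The function names bytes_to_hex and hex_to_bytes are hypothetical; they are not part of <filesystem> or Boost.Filesystem.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Encode every byte as two lowercase hex digits, e.g. {'a', 'b'} -> "6162".
std::string bytes_to_hex(const std::vector<char>& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (char c : bytes)
    {
        const unsigned char b = static_cast<unsigned char>(c);
        out += digits[b >> 4];   // High nibble.
        out += digits[b & 0x0f]; // Low nibble.
    }
    return out;
}

// Decode the output of bytes_to_hex; throws std::invalid_argument on malformed input.
std::vector<char> hex_to_bytes(const std::string& hex)
{
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("odd number of hex digits");
    auto value = [](char d) -> int {
        if ('0' <= d && d <= '9') return d - '0';
        if ('a' <= d && d <= 'f') return d - 'a' + 10;
        throw std::invalid_argument("not a lowercase hex digit");
    };
    std::vector<char> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2)
        bytes.push_back(static_cast<char>(value(hex[i]) * 16 + value(hex[i + 1])));
    return bytes;
}
```

The encoding is reversible by construction, at the cost of doubling the length and losing all readability.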
Converting a sequence of characters to a path is very simple with filesystem::path. The first step is to construct a path object from the sequence of characters. There are two-iterator constructors available, as well as constructors that take any of C++'s character types for encoding purposes.
Then, just call generic_u8string on that path object; you will get a std::u8string (in C++20; in C++17, you get a std::string) containing the path formatted in a platform-neutral generic format. This string can later be used to reconstitute the path object as well.
Now, full round-tripping from the platform-specific format through path back to the platform-specific format is not really permitted. You can get the native string version of the path (path::u8string returns this), but there's no guarantee of a byte-for-byte identical string. There is a guarantee that the two strings will identify the same filesystem resource. So the differences, if they exist, are unimportant.
It took me several days to write this :/.
My personal objectives here were the following:
Each unique input string must result in a unique filename. In other words, the conversion must be "one-to-one" which implies that it is also reversible, as was requested by the OP.
As many characters as possible must be kept the same; at least, the filename should be mostly human readable and look much like the original string.
When nothing else goes, characters should be escaped as is usual for url-encoding: a percentage followed by the byte value in hexadecimal.
The escape character (%) itself is escaped with two of them (%%), unless another translation is requested.
I wanted simple things, like the ability to have spaces replaced by underscores; and since underscores might also occur frequently, I don't want those escaped with %5F, but with something neat, like a (multi-byte) unicode character.
I decided to only support UTF8 strings therefore, and be able to treat any utf8 glyph as a translatable 'character'; that is: you can translate single glyphs into different glyphs (including all 1-byte ASCII values).
It is not possible to translate multi-glyph sequences with my implementation, but since it is based on a Dictionary class, most of the code should be reusable if anyone wants to add support for that.
Making sure that every string, under any possible translation is reversible turned out to be non-trivial to say the least.
// From
// |
// v
// .----------------------------------------.
// | |
// | j--------.
// | | |
// | i k | |
// | .--------|-----|---+----|----------------.
// | |1 a v | b--->B v 2|<-- To
// | .--->M | I | | J |
// | | | E v | | |
// | | | ^ A d c | | |
// | .-------|--+--|-----|--|--|---+---------. |
// | | m |3 | g | v v | | |
// | | | e | | C K | | |
// | | p | v | | | |
// | l | | n o-->O G h | f------------->F |
// '----|-----+-|---|----+------|-|---------' | |
// | | | | |4 v v | |
// | | | | | H D | |
// | | | `----------------------------------->N |
// | | `--------->P | |
// `----------------->L | |
// | | | |
// | '----------------------------+-----------'
// | |
// | q | r
// | |
// | |<-- Illegal
// '---------------------------------------'
(and that doesn't even include escape characters)
But I think I succeeded. If anyone manages to find arguments to u8string_to_filename that do not convert back with filename_to_u8string, let me know!
First of all I needed a function that returns the number of bytes of a glyph:
// Returns the length in bytes of the UTF8 encoded glyph. The pointer
// should either point at guaranteed correct UTF8 or point inside a
// zero terminated string.
//
// If the pointer does not point to a legal UTF8 glyph then 1 is returned.
// The zero termination is necessary to detect the end of the string
// in the case that the apparent encoded glyph length goes beyond the string.
//
int utf8_glyph_length(char8_t const* glyph)
{
// The length of a glyph is determined by the first byte.
// This magic formula returns 1 for 110xxxxx, 2 for 1110xxxx,
// 3 for 11110xxx and 0 otherwise.
int extra = (0x3a55000000000000 >> ((*glyph >> 2) & 0x3e)) & 0x3;
// Detect if there are indeed `extra` bytes that follow the first
// one, each of which must begin with 10xxxxxx to be legal UTF8.
int i = 0;
while (++i <= extra)
if (glyph[i] >> 6 != 2)
return 1; // Not legal UTF8 encoding.
return 1 + extra;
}
You can find this file here.
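To make the magic formula easier to check, here is a sketch that spells the same classification out with explicit bit masks and places the shifted-constant lookup next to it. Both function names are hypothetical, and the parameter is an unsigned char rather than char8_t so that the snippet also compiles before C++20.

```cpp
// Number of continuation bytes announced by a UTF-8 lead byte:
// 1 for 110xxxxx, 2 for 1110xxxx, 3 for 11110xxx and 0 otherwise.
int utf8_extra_bytes(unsigned char lead)
{
    if ((lead & 0xe0) == 0xc0) return 1;
    if ((lead & 0xf0) == 0xe0) return 2;
    if ((lead & 0xf8) == 0xf0) return 3;
    return 0;
}

// The lookup used in utf8_glyph_length: the constant is a table of 32
// two-bit entries, indexed by the top five bits of the lead byte.
int utf8_extra_bytes_magic(unsigned char lead)
{
    return static_cast<int>((0x3a55000000000000ULL >> ((lead >> 2) & 0x3e)) & 0x3);
}
```

The shift amount is twice the value of the top five bits, so only the entries for 11000 through 11110 (bits 48 to 61 of the constant) are non-zero.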
Next we need a simple Dictionary class:
class Dictionary
{
private:
std::vector<std::u8string_view> m_words;
public:
Dictionary(std::u8string const&);
size_t size() const { return m_words.size(); }
void add(std::u8string_view glyph);
int find(std::u8string_view glyph) const;
std::u8string_view operator[](int index) const { return m_words[index]; }
};
with its definition
Dictionary::Dictionary(std::u8string const& in)
{
// Run over each glyph in the input.
int glen; // The number of bytes of the current glyph.
for (char8_t const* glyph = in.data(); *glyph; glyph += glen)
{
glen = utf8_glyph_length(glyph);
m_words.emplace_back(glyph, glen);
}
}
void Dictionary::add(std::u8string_view glyph)
{
if (find(glyph) == -1)
m_words.push_back(glyph);
}
int Dictionary::find(std::u8string_view glyph) const
{
for (int index = 0; index < static_cast<int>(m_words.size()); ++index)
if (m_words[index] == glyph)
return index;
return -1;
}
I also used the following two helper functions
char8_t to_hex_digit(int d)
{
if (d < 10)
return '0' + d;
return 'A' + d - 10;
}
std::u8string to_hex_string(char8_t c)
{
std::u8string hex_string;
hex_string += to_hex_digit(c / 16);
hex_string += to_hex_digit(c % 16);
return hex_string;
}
Finally, here is the encoder function
// Copy str to the returned filename, replacing every occurrence of
// the utf8 glyphs in `from` with the corresponding one in `to`.
//
// All glyphs in `illegal` will be escaped with a percent sign (%)
// followed by two hexadecimal characters for each byte of
// the glyph.
//
// If `from` does not contain the escape character, then each '%' will
// be replaced with "%%".
//
// All glyphs in `to` that are not in `from` are considered illegal
// and will also be escaped.
//
std::filesystem::path u8string_to_filename(std::u8string const& str,
std::u8string const& illegal, std::u8string const& from, std::u8string const& to)
{
using namespace detail::us2f;
// All glyphs are found by their first byte.
// Build a dictionary for each of the three strings.
Dictionary from_dictionary(from);
Dictionary to_dictionary(to);
Dictionary illegal_dictionary(illegal);
// The escape character is always illegal (is not allowed to appear on its own
// in the output).
illegal_dictionary.add({ &escape, 1 });
// For each `from` entry there must exist one `to` entry.
ASSERT(from_dictionary.size() == to_dictionary.size());
std::filesystem::path filename;
// Run over all glyphs in the input string.
int glen; // The number of bytes of the current glyph.
for (char8_t const* gp = str.data(); *gp; gp += glen)
{
glen = utf8_glyph_length(gp);
std::u8string_view glyph(gp, glen);
// Perform translation.
int from_index = from_dictionary.find(glyph);
if (from_index != -1)
glyph = to_dictionary[from_index];
else if (*gp == escape)
{
filename += escape;
filename += escape;
continue;
}
// What is in illegal is *always* illegal - even when it is the result
// of a translation.
if (illegal_dictionary.find(glyph) != -1 ||
// If an input glyph is not in the from_dictionary (aka, it
// wasn't just translated) but it is in the to_dictionary -
// then also escape it. This is necessary to make sure that
// each unique input str results in a unique filename (and
// consequently is reversible).
(from_index == -1 && to_dictionary.find(glyph) != -1))
{
// Escape illegal glyphs.
// Always escape the original input (not a possible translation),
// otherwise we can't know what the input was when decoding:
// the input could have been translated first or not.
for (int j = 0; j < glen; ++j)
{
filename += escape;
filename += to_hex_string(gp[j]);
}
continue;
}
// Append the glyph to the filename.
filename += glyph;
}
return filename;
}
And the decoder function
std::u8string filename_to_u8string(std::filesystem::path const& filename,
std::u8string const& from, std::u8string const& to)
{
using namespace detail::us2f;
std::u8string input = filename.u8string();
std::u8string result;
Dictionary from_dictionary(from);
Dictionary to_dictionary(to);
// First unescape all bytes in the filename.
int glen; // The number of bytes of the current glyph.
for (char8_t const* gp = input.c_str(); *gp; gp += glen)
{
glen = utf8_glyph_length(gp);
std::u8string_view glyph(gp, glen);
// First translate escape sequences back - those are then always
// original input.
if (*gp == escape)
{
if (gp[1] == escape)
{
glen = 2; // Skip the second escape character too.
result += escape;
}
else
{
char8_t val = 0;
for (int d = 1; d <= 2; ++d)
{
val <<= 4;
val |= ('0' <= gp[d] && gp[d] <= '9') ? gp[d] - '0'
: gp[d] - 'A' + 10;
}
result += val;
glen = 3; // Skip the two hex digits too.
}
continue;
}
else
{
// Otherwise - if the character is in the from dictionary, it must have
// been translated - otherwise it would have been escaped.
int from_index = from_dictionary.find(glyph);
if (from_index != -1)
glyph = to_dictionary[from_index];
}
result += glyph;
}
return result;
}
You can find all of this (and the latest version) on GitHub.

How can I use languages (like arabic or chinese) in a QString?

How can I use languages (like arabic or chinese) in a QString?
I am creating a QString:
QString m = "سلام علیکم";
and then I am saving it into a file using:
void stWrite(QString Filename,QString stringtext){
QFile mFile(Filename);
if(!mFile.open(QIODevice::WriteOnly | QIODevice::Append |QIODevice::Text))
{
QMessageBox message_file_Write;
message_file_Write.warning(0,"Open Error"
,"could not to open file for Writing");
return;
}
QTextStream out(&mFile);
out << stringtext<<endl;
out.setCodec("UTF-8");
mFile.flush();
mFile.close();
}
But, when I open the result file I see:
???? ????
What is going wrong? How can I get my characters to be saved correctly in the file?
QString has unicode support. So, there is nothing wrong with having*:
QString m = "سلام علیکم";
Most modern compilers use UTF-8 to encode this ordinary string literal (you can enforce this in C++11 by using u8"سلام عليكم", see here). The string literal has the type array of char. When a QString is initialized from a const char*, it expects the data to be encoded in UTF-8, so everything works as expected.
All input controls and text drawing methods in Qt can take such a string and display it without any problems. See here for a list of supported languages.
As for the problem you are having writing this string to a file, you just need to set the encoding of the data you are writing to a codec that can encode these international characters (such as UTF-8).
From the docs: when using QTextStream::operator<<(const QString& string), the string is encoded using the assigned codec before it is written to the stream.
The problem is that you are using operator<< before assigning the codec. You should call setCodec before writing. Your code should look something like this:
void stWrite(QString Filename,QString stringtext){
QFile mFile(Filename);
if(!mFile.open(QIODevice::WriteOnly | QIODevice::Append |QIODevice::Text))
{
// warning() is a static function, so no QMessageBox object is needed.
QMessageBox::warning(nullptr, "Open Error",
"Could not open file for writing");
return;
}
QTextStream out(&mFile);
out.setCodec("UTF-8");
out << stringtext << endl;
mFile.flush();
mFile.close();
}
* In translation phase 1, any source file character not in the basic source character set is replaced by the universal-character-name that designates that character. The basic source character set is defined as follows:
N4140 §2.3 [lex.charset]/1
The basic source character set consists of 96 characters: the space
character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
This means that a string like:
QString m = "سلام عليكم";
Will be translated to something like:
QString m = "\u0633\u0644\u0627\u0645\u0020\u0639\u0644\u064a\u0643\u0645";
This assumes that the source file is saved in an encoding that can store such characters, such as UTF-8.

Extracting numbers from string into an array

I'm having a problem with an assignment. I have to open a text file that looks more or less like this:
-------------------------------------------------------------
|ammount | time |delay |
-------------------------------------------------------------
|100 | 342 | 4324 |
with a few more rows. All I have to do is get the numbers into an array, which, for the example above, would look like this: ar[0]=100, ar[1]=342, ar[2]=4324. I imagine that I need to read the file line by line into strings with getline, but what next? If I use stringstream, I would get |100 instead of just 100. I'm really out of ideas now.
To read one line of input like you described (file may be an ifstream or an istringstream here):
for (int i = 0; i < 3; ++i)
{
file.ignore(numeric_limits<streamsize>::max(), '|'); // Ignores all characters until it finds a '|' character
file >> ar[i]; // Reads the number following the '|' to ar[i]
}
file.ignore(numeric_limits<streamsize>::max(), '\n'); // Finally, ignores all characters until newline
You can even make a small shortcut macro if you want:
#define ignore_until(c) ignore(numeric_limits<streamsize>::max(), c)
and use it like this:
file.ignore_until('|');

Reading a 2d array with blank spaces into char

I have a 2d array that represents a map/maze that looks like this :
+-+-+-+-+-+
| |
+-+ +-+ + +
| | | |
+ +-+-+ + +
| | |
+-+ +-+-+-+
And I have the following code for reading that map :
char mapa[hlimit][wlimit];
for(int j=0;j<hlimit;j++)
cin>>mapa[j];
also tried this :
char mapa[hlimit][wlimit];
for(int j=0;j<hlimit;j++)
for(int k=0;k<wlimit;k++)
cin>>mapa[j][k];
Both ways the for loop ends before I enter the whole map. I tried replacing the blank spaces in the map with dots and the input works flawlessly. So, how do I do the input with spaces? I tried cin.getline(mapa[j],wlimit) also, didn't work for me.
The answer provided by WhozCraig suggesting using get() worked flawlessly. Here's my code :
char mapa[hlimit][wlimit];
for(int j=0;j<hlimit;j++)
{
for(int k=0;k<wlimit;k++)
mapa[j][k]=cin.get();
cin.get();
}

C++ Reading a text file backwards from the end of each line up until a space

Is it possible to read a text file backwards from the end of each line up until a space? I need to be able to output the numbers at the end of each line. My text file is formatted as follows:
1 | First Person | 123.45
2 | Second Person | 123.45
3 | Third Person | 123.45
So my output would be, 370.35.
Yes. But in your case, it's most likely more efficient to simply read the whole file and parse out the numbers.
You could do something like this (I'm writing this in pseudocode, so you have to actually write the real code yourself, since that's how you learn):
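pos = file size - 1
collecting = true            // we start at the end of the last line
number = ""
while (pos >= 0)
{
    seek to pos
    read a char from file
    if (char == newline)
    {
        collecting = true    // the end of the previous line comes next
        number = ""
    }
    else if (char == space and collecting and number is not empty)
    {
        collecting = false   // ignore the rest of this line
        reverse number, convert it and add it to sum
    }
    else if (collecting)
    {
        add char to number
    }
    pos--
}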