How to adjust the indentation of code in a text file using C++?

I am copying the code from file 1 to file 2, but I want the code in file 2 to be re-indented like this: the indentation starts at 0, every opening curly bracket increases the indentation depth, and every closing curly bracket reduces it (by 4 spaces, for example). I need help fixing this so that it works:
char preCh;
int depth=0;
int tab = 3;
int d = 0;
int pos = 0;
file1.get(ch);
while(!file1.eof())
{
if(ch=='{')
{
d++;
}
if(ch=='}'){
d--;
}
depth = tab * d;
if(preCh == '{' && ch=='\n'){
file2.put(ch);
for (int i = 0; i <= depth; i++)
{
file2.put(' ');
}
}
else
file2.put(ch);
preCh = ch;
ch = file1.get();
}
}
The result must be indented like in code editors:
int main(){
    if(a>0)
    {
        something();
    }
}

Maybe unexpectedly for you, there is no easy answer to your question.
And because of that, your code as written will never work.
First and most important, you need to understand and define indentation styles; see the Wikipedia article on indentation styles. Even in your mini example you are mixing Allman and K&R. So first you must be clear about which one to use.
Then you must be aware that brackets may appear in single quotes, double quotes, C comments, C++ comments and, even worse, multi-line comments (and #if or #ifdef blocks). This will make life really hard.
And for closing brackets, in Allman style for example, you only know the required indentation after you have already printed the indentation spaces. So you need to work line-oriented, or use a line buffer, before you print a complete line.
Example:
}
In this one simple line, you read the '}' character only after you have already printed the spaces. This always leads to wrong (too far right) indentation.
The logic for this case alone would be complicated. And then consider statements like
if (x ==5) { y = 3; } } } }
So, unfortunately, I cannot give you an easy solution.
A parser would be needed; otherwise I simply recommend using any kind of beautifier or pretty printer.

Looking for a better algorithm for finding substrings in strings using Qt

UPDATE
I'll add some info about the problem to give you a better idea of why everything is done the way it is.
The main point of the whole script is to find all errors in a special file that keeps original and translated strings.
The script requires the "special" bilingual file (XML in real life) and a "special" vocabulary file that keeps words and their translations (xls or xlsx, constructed by hand; PO would probably be better).
As a result it finds all errors in the translation, using the provided vocabulary.
Obviously, if the vocab is bad, the result sucks.
At some point the whole thing used 'std', or mostly 'std' plus Boost regular expressions.
At some other point came the need for UTF-8 support, including in the regular expressions. We had no time to write complex stuff, so it was decided to go the Qt way.
We were aware that it is possible to iterate over bytes. But we needed actual letters and sequences of letters; we also needed to cut the word endings, which is done through regular expressions, and no other regex engine supports UTF-8 nearly as well.
It was decided that Qt fitted the role far better than anything we would write ourselves in very limited time, as Qt has UTF-8 support and, as of v5, treats 8-bit strings as UTF-8 by default (internally QString keeps its text as UTF-16, as far as I am aware).
It was pointed out that the complexity of the proposed solution looks like O(m * n).
In reality it's probably even worse: closer to O(m * n * log(l)) or even straight O(m * n * l). Here m is the number of strings, n the number of vocabulary records, and l the number of synonyms each word has (l is always at least 1).
Since we need to check all strings, and for each string run through the whole vocabulary to find all errors, I currently see no way to make it any faster.
As the question implies I am looking for a better solution to an existing coding problem.
I am gonna try to explain what exactly the problem is as best as I can.
Imagine you have a piece of code written in C++ that takes a string and a translation of the string,
and gets rid of pesky word endings.
After that it takes another file, which is a vocabulary, and runs through the whole vocab to find out whether the translation of the string has any errors.
Obviously this thing is highly dependent on the actual vocabulary, but that is not really a problem.
I actually have such a piece of code, although I need to mention that the whole thing runs through CGI (don't ask, but at some point it was decided that C++ would run it faster). I can upload the full code to a git repo; it's rather big, but I will share the essential parts here.
The current problem I am facing is twofold: either the code does not find everything it is supposed to, or it works too slowly (it probably gets stuck somewhere, but I have not yet pinpointed where).
The main idea behind the code was:
// All definitions of the essential structures, so you have a better idea of what is going on
struct Word {
QString full = "";
QString stemmed = "";
};
struct VocRecord {
QVector<Word> orig;
QVector<Word> trans;
QString error = "";
void clearRecord() {
this->orig.clear();
this->trans.clear();
this->error = "";
}
};
typedef QVector<VocRecord> Vocabluary;
......
Vocabluary voc = .....; // Obviously here we get the vocabulary. How we get it is rather complicated; you can just assume it looks like the vector of records defined above.
QString origStemmed, transStemmed, orig, trans;
// orig - original string
// trans - its translation
// origStemmed - original string with removed word endings (we call it stemming, hence stemmed)
// transStemmed - translation with removed word endings.
At first the algo was something along the lines of:
origStemmed = QString(" ") + origStemmed + QString(" "); // Add whitespaces in the begin and end of string for searching
transStemmed = QString(" ") + transStemmed + QString(" ");
for(int i = 0; i < voc.length(); i++) {
VocRecord record = voc[i];
for(int j = 0; j < record.orig.length(); j++) {
Word origWord = record.orig[j];
si = origStemmed.indexOf(QString(" ") + origWord.stemmed + QString(" "));
if(si > -1) {
int ind = origWord.stemmed.indexOf(" ");
int idx = 0;
if(ind != -1) {
// Found a space in the record, which means the record contains at least two words.
// Here we care where the first word ends, and it's part of the global problem
idx = origMod.indexOf(origWord.full.mid(0, ind));
} else {
// We did not find a space, do one word only, take the whole thing.
idx = origMod.indexOf(origWord.full);
}
// Now comes the tricky part: we try to figure out whether the original text, in which we found our voc record, had any punctuation after the word.
// This actually matters only for records that have more than one word, but as you'll see we check all of them, and that is not correct - still figuring out how to get around it.
QChar symb; // We'll keep the last symbol of the first word here
// origMod - modified original: everything is lowercase, punctuation is kept.
// The main reason we have this at all is that when stemming we have to get rid of all punctuation, so we keep the "lowercased" string separate.
// I am 100% sure we don't need it at all, since Qt supports case-insensitive search, but I would like to hear your opinion on it.
if(origMod.indexOf(" ", idx) > 0) {
symb = origMod[origMod.indexOf(" ", idx)-1];
} else {
symb = origMod[origMod.length()-1];
}
// When we have the last symbol we skip the found word
if(ind != -1 && (symb == QChar(',') || symb == QChar(';') || symb == QChar('!') || symb == QChar(':') || symb == QChar('?') || symb == QChar('.'))) {
continue;
}
// The important part ends here
............
As you will notice, we search for the stemmed word in the original string.
By all accounts it should work, but the main problem of this search is that it can have several matches, including false ones, and we only care about the first one found. The most obvious solution is probably to go through all matches, but I am unsure that is a good idea: it requires another loop and the algo is quite slow already.
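For what it's worth, going through all matches does not have to restart the search each time: the search can resume just after the previous hit. Below is a rough sketch of that idea using plain std::string so it stays self-contained; std::string::find takes a start position much like QString::indexOf(str, from) does. The function name findWholeWordMatches is made up, and the sketch assumes single spaces are the only word separator, as in the padded strings above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the start offsets (in `text`) of every
// occurrence of `word` that is delimited by spaces or the string boundaries.
// Assumption: single spaces are the only word separator in `text`.
std::vector<std::size_t> findWholeWordMatches(const std::string& text,
                                              const std::string& word)
{
    std::vector<std::size_t> hits;
    if (word.empty())
        return hits;

    // Pad both sides once, so the first and last word match as well.
    const std::string padded = " " + text + " ";
    const std::string needle = " " + word + " ";

    std::size_t pos = padded.find(needle);
    while (pos != std::string::npos) {
        // The leading pad shifts everything by one, so the position of the
        // needle's space in `padded` equals the word's offset in `text`.
        hits.push_back(pos);
        // Resume right after this hit's leading space instead of starting over.
        pos = padded.find(needle, pos + 1);
    }
    return hits;
}
```

Each hit can then be run through the punctuation check individually, so a false first match no longer hides a valid later one.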
The next solution I came up with was to use regular expressions, but I must have messed up, because the algo became "really slow".
The main idea of the second solution:
// We DO not add spaces! spaces suck big time.
for(int i = 0; i < voc.length(); i++) {
VocRecord record = voc[i];
for(int j = 0; j < record.orig.length(); j++) {
Word origWord = record.orig[j];
// Instead of using spaces, we search for a regular expression made from the vocab record.
// A simple contains actually runs into the same set of problems, namely more than one match or in some cases false matches (when the searched part matches something it should not).
// Now this is terribly slow, as you can imagine, because we create the regular expressions on the fly and do not pre-make them. But I still have not thought of a way around it.
if(origStemmed.contains(QRegularExpression(origWord.stemmed + "\\b",
QRegularExpression::UseUnicodePropertiesOption | QRegularExpression::CaseInsensitiveOption))) {
// Here we do something ungodly. We take our stemmed voc record, split it by space, then go through all parts making a string that will become our regular expression later
QString temp;
parts.clear();
parts = origWord.stemmed.split(" ");
for(int k = 0; k < parts.count(); k++) {
temp += "\\b" + parts[k] + "[a-z]*?\\b";
}
// After we have added everything we need, we join the whole thing back by spaces.
temp = parts.join(" ");
// And here is the ungodly check - we actually search for the constructed regular expression in the original string, and because we made sure to exclude any punctuation from the expression, in theory this should work.
if(!origMod.contains(QRegularExpression(temp, QRegularExpression::UseUnicodePropertiesOption | QRegularExpression::CaseInsensitiveOption))) {
continue;
}
// Well, it does not work, or rather it works so slowly that it's impossible to get any result, and even if we do, we still don't find everything we should - I blame the shitty regex here.
// And the important part ends.
As I pointed out, the second solution sucks big time. Currently I am aiming for some intermediate solution and would gladly accept any tips or suggestions on where to look or what to look for.
If any of you want to see the full code for this thing, just add a comment and I'll put all the important files on GitHub in a separate repo.

C++ Qt creator can I comment each line, instead of commenting just the selection?

I think this question must have been asked before, but I couldn't find any.
So in Qt creator, let's say that I have some code like this:
int var1;
int var2;
for (int i = 0; i < 10; i++) {
// do sth
}
When I select a bunch of lines from the beginning of the first line till the end of the last line and toggle comment, I get this:
// int var1;
// int var2;
// for (int i = 0; i < 10; i++) {
// // do sth
// }
But when I select from the middle of the first line, I get something like this:
int v/*ar1;
^ note the /*
int var2;
for (int i = 0; i < 10; i++) {
// do sth
}*/
^ and */
What I would like is, have Qt creator comment using // from the beginning of each line selected, just like the first example.
Is there a way to do this? For all IDEs and editors I have used in the past (Atom, Sublime etc) this has worked, so I assume that there must be a way, but I can't seem to find it.
Thanks in advance.
Ctrl + / will comment in/out the line the cursor happens to be on.
If you have a selection that spans over the entirety of one or more lines, the // comment will be generated as well.
But if your selection doesn't span over entire lines, the /* */ format will be used.
That is very logical behavior that allows you to have both comment styles, depending on whether you want to comment out an entire line or just a small fragment. There is no benefit in losing the second commenting style, which can be quite useful at times. If you don't want the block comment style, simply select whole lines, from the beginning of the first to the end of the last.

Split an even-length string in C++

I am very new to C++. I am trying to split a string that contains even-length substrings until there is no even-length substring left. For example, if I input AB ABCD ABC, the output should be A B A B C D ABC. I am trying to do it without tokens, because I don't know how to use them.
What I have so far only splits the first even substring, and it doesn't work if I only have one substring. Can someone please help me out?
Any advice will be much appreciated. Thank you!
string temp = "";
void check(string &str, int &i, int &flag)
{
int count = 0;
int reminder;
do
{
count++;
temp += str[i];
i++;
} while (str[i] != ' ');
i = i - temp.size();
reminder = count % 2;
if (reminder == 0)
flag = 1;
else
flag = 0;
}
void SplitEvenWord(string &str)
{
int i = 0;
int flag = 0;
for (i = 0; i < str.size(); i++)
{
check(str, i, flag);
if (flag == 1)
{
temp.insert(temp.size() / 2, " ");
str.replace(i, temp.size() - 1, temp);
}
}
}
There are two skills that are absolutely vital in software engineering (Well, more than two, but two for now): developing new functions in isolation, and testing things in the simplest possible way.
You say that the code fails if there is only one substring. You don't say how it fails (I should have mentioned clear error reports in the list) so I don't know whether to test your code with an even-length string which it ought to split ("ABCD" => "A B C D") or an odd-length string which it ought to leave alone ("ABC" => "ABC"). Before I try to code these up, I look at your first function:
void check(string &str, int &i, int &flag)
{
...
do
{
count++;
temp += str[i];
i++;
} while (str[i] != ' ');
...
}
Trouble already. The strings I have in mind do not contain any spaces, so the loop cannot terminate. This code will run past the end of the string into whatever happens to be in that memory space, which will cause undefined behavior. (If you don't know that term, it means that there's no telling what will happen, but if you're lucky the program will just crash.)
Fix that, try running that code on "ABC" and "ABCD" and "A" and "" and "ABC DEF", and get it working perfectly. Once it does, take a look at your other function. Don't test it with random typing, test it with short, clearly defined strings. Once it works perfectly, try longer, more complicated ones. If you find a string which causes it to fail, hold onto it! That string will lead you to a bug.
That should be enough to get you started.
I'm writing this as an answer because it was too long to fit as a comment.
I have a couple of suggestions that may help you to figure out what the problem is.
Separate "check" into at least two functions: one to split the string into individual words ("tokenize"), and one to check the length of a word.
Test the "check" and "tokenize" functions separately and see if they give you the expected answers. Work on them individually until they are correct.
Separate the formatting of the answers out of "SplitEvenWord" into a separate function.
"SplitEvenWord" should then be nothing more than calling the functions you created as a result of the steps above.
When I'm stuck, I always try to break the problem down into small bite sized pieces that I know I can get working. Eventually, the problem becomes assembling the already working pieces of the solution into a larger function that solves the original problem.
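As a rough illustration of that decomposition (not the only way to slice it), the sketch below uses one small function per step. The names tokenize, splitWord and splitEvenWords are invented for this example, and it reads "until there is no even-length substring left" as repeated halving, which is an assumption about the intended rule.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Step 1: split the input into whitespace-separated words.
std::vector<std::string> tokenize(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}

// Step 2: halve a word for as long as the pieces have even length.
// Assumption: this is what "until no even-length substring is left" means.
std::string splitWord(const std::string& word)
{
    if (word.empty() || word.size() % 2 != 0)
        return word;                       // odd length: leave it alone
    const std::size_t half = word.size() / 2;
    return splitWord(word.substr(0, half)) + " " + splitWord(word.substr(half));
}

// Step 3: glue the pieces together; nothing but calls to the helpers.
std::string splitEvenWords(const std::string& text)
{
    std::string result;
    for (const std::string& word : tokenize(text)) {
        if (!result.empty())
            result += ' ';
        result += splitWord(word);
    }
    return result;
}
```

Each helper can be tested on its own with short inputs such as "", "A", "ABC" and "ABCD" before the pieces are combined.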

Alternative to break?

I'm pretty new to C++, and I was told not to use a 'break' statement. What are some alternatives to 'break'? (Using the code below as an example.)
void remove_comments( ifstream& fileIn , ofstream& fileOut)
{
string line;
bool flag = false;
bool found = false;
while (! fileIn.eof() )
{
getline(fileIn, line);
if (line.find("/*") < line.length() )
flag = true;
if (! flag)
{
for (int i=0; i < line.length(); i++)
{
if(i<line.length())
if ((line.at(i) == '/') && (line.at(i + 1) == '/'))
break;
else
fileOut << line[i];
}
fileOut<<endl;
}
if(flag)
{
if(line.find("*/") < line.length() )
flag = false;
}
}
}
In my opinion using break is quite OK, but if your task is to do the job without it, then let's do it without it. The very same problem can be solved by several differently structured code snippets that use different C++ control flow statements, and it can also be solved without break. I recommend breaking your function into a central function and several helper functions. Since I don't want to solve the problem for you, I'll help just with instructions and a pseudo-code-ish outline.
You have an input text that consists of alternating commented and non-commented sections. You want to do the following in a loop:
// I refer to non-commented text as "writable"
writable_begin = 0
while (writable_begin < text_len)
{
writable_end, comment_type = find_next_comment_begin(writable_begin);
write_out_text(writable_begin, writable_end);
if (comment_type == singleline)
writable_begin = find_singleline_comment_end(writable_end);
else
writable_begin = find_multiline_comment_end(writable_end);
}
You have to find out how to implement the helper functions/methods I used in my pseudo code; they can easily be implemented without break. If you solve the problem with helper functions you also get a much nicer looking solution than your current spaghetti code that uses complex control flow statements. Many bugs can easily hide in such code.
Tip: Your helper functions will search for the end of the commented text in a loop, but instead of break you can simply use return to exit the helper function with the result.
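One possible shape for those helpers is sketched below. It works on a whole string rather than on file streams, it deliberately ignores comment markers inside string literals, and all names (CommentType, CommentStart, findNextCommentBegin and so on) are invented for this example rather than taken from the question.

```cpp
#include <cstddef>
#include <string>

enum class CommentType { None, SingleLine, MultiLine };

struct CommentStart {
    std::size_t pos;     // index of the comment opener, or text.size() if none
    CommentType type;
};

// Scans forward and returns as soon as an opener is found: return, not break.
// Assumption: "//" and "/*" inside string literals are not handled.
CommentStart findNextCommentBegin(const std::string& text, std::size_t from)
{
    for (std::size_t i = from; i + 1 < text.size(); ++i) {
        if (text[i] == '/' && text[i + 1] == '/') return {i, CommentType::SingleLine};
        if (text[i] == '/' && text[i + 1] == '*') return {i, CommentType::MultiLine};
    }
    return {text.size(), CommentType::None};
}

// A single-line comment ends at the newline, which stays in the output.
std::size_t findSingleLineCommentEnd(const std::string& text, std::size_t from)
{
    const std::size_t nl = text.find('\n', from);
    return nl == std::string::npos ? text.size() : nl;
}

// A multi-line comment ends just past the closing "*/".
std::size_t findMultiLineCommentEnd(const std::string& text, std::size_t from)
{
    const std::size_t close = text.find("*/", from + 2);
    return close == std::string::npos ? text.size() : close + 2;
}

std::string removeComments(const std::string& text)
{
    std::string out;
    std::size_t writableBegin = 0;
    while (writableBegin < text.size()) {
        const CommentStart next = findNextCommentBegin(text, writableBegin);
        out.append(text, writableBegin, next.pos - writableBegin);
        if (next.type == CommentType::SingleLine)
            writableBegin = findSingleLineCommentEnd(text, next.pos);
        else if (next.type == CommentType::MultiLine)
            writableBegin = findMultiLineCommentEnd(text, next.pos);
        else
            writableBegin = text.size();   // no more comments: we are done
    }
    return out;
}
```

The central loop has the same structure as the pseudo code, and every early exit lives in a helper as a plain return.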
You could rewrite the loop
for (int i=0;
     i < line.length() &&
     !(i+1 < line.length() && (line.at(i) == '/') && (line.at(i + 1) == '/'));
     ++i)
{
    fileOut << line[i];
}
fileOut<<endl;
Breaking is sometimes necessary -- without breaks you might crash into the stuff ahead and hurt yourself.
You may also hurt yourself by thinking poorly and then solving the problem in a cryptic manner that even you won't understand 6 months later.
Lastly -- whoever told you not to use a "break" .. give him a break -- never stop by him/her/it for advice.
BTW -- work on your indentation and curlies -- not good.
You could - and should - rewrite that loop. Mankarse showed one option, but that's got all those weird and difficult to understand conditions in the for loops.
You should learn to leverage the power of the standard library. For example, this code will remove all the characters that follow a C++ style line comment from the string stored in line:
// Find the first instance of two forward slashes in the line;
// find() returns its index, or std::string::npos if there is none.
auto begin_comment = line.find("//");
// We found it! Remove all characters from that point on
if (begin_comment != std::string::npos)
    line.erase(begin_comment);
fileOut << line << std::endl;
Consider how you could also take small chunks of code like that and put them into functions, which you will call to do work on your behalf. This will not only keep the code more readable, but it will get you into the habit of designing interfaces, which is a very important skill to have.
As a sidenote, your indentation is really bad and you must work on it. Look at this gem:
for (int i=0; i < line.length(); i++)
{
if(i<line.length())
if ((line.at(i) == '/') && (line.at(i + 1) == '/'))
break;
else
fileOut << line[i];
}
Which of those two if statements does the else match up against? Are you immediately and completely sure that you are right?
To be sure, the compiler doesn't need indentation and couldn't care less for it. But you do, and you will soon find out that as your code grows more complex, unless it's properly indented, it will be impossible to understand.

Text difference of scrolling output

I have code that is capturing the text from scrolling output and I'm looking for an algorithm (working with C++/Qt) that can tell me which lines are new. NOTE: New lines are only ever added to the end.
So on first capture I might have the following:
hello world
some more text
hello world
some text
And on second capture might have:
hello world
some text
yet more text
hello world
So I want the algorithm to return that I have two new lines:
yet more text
hello world
If possible, it would help performance if it could start from the last line and terminate once it reaches an already processed line. But I'm thinking this is probably not possible, since there can be duplicate lines.
Well, you say it's scrolling, and you are using OCR, so could you also capture the size of the scroll widget on the scroll window, and check that along with the lines you've recorded?
Alternatively, can you hook a DLL into the producer program so you can signal when it outputs a new line? Or directly pipe its output into yours?
For your special case I would consider a plain, basic loop-inside-loop algorithm. I don't think that performance is really an issue (there are not that many lines, and I would expect OCR to be the major cost), so the algorithm should be easily readable and robust.
One possible algorithm in pseudo code:
numberOfNewLines = 0
while numberOfNewLines <= numberOfTotalLines do
compare lines
[1..numberOfTotalLines-numberOfNewLines] of textNew
with lines [1+numberOfNewLines..numberOfTotalLines] of textOld
if identical then exit while
numberOfNewLines++
end while
You can stop a comparison as soon as one line differs, but the algorithm is still O(N^2) in the number of lines.
Then you can output the last numberOfNewLines lines of textNew. As mentioned in the comment, you of course cannot detect some edge cases, like 10000 times 'ABC' followed by one 'DEF', where most of the 'ABC' lines will be missed.
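A sketch of the same loop in C++ might look like the following. It uses std::vector<std::string> instead of Qt containers to stay self-contained, the name newLinesSince is made up, and it generalizes slightly by allowing the two captures to have different lengths. It looks for the longest tail of the old capture that equals the head of the new one, so it shares the duplicate-line limitation described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the lines of `newLines` that were not yet
// visible in `oldLines`, assuming new lines are only ever appended.
std::vector<std::string> newLinesSince(const std::vector<std::string>& oldLines,
                                       const std::vector<std::string>& newLines)
{
    const std::size_t maxOverlap = std::min(oldLines.size(), newLines.size());

    // Try the largest overlap first; this corresponds to the smallest
    // numberOfNewLines in the pseudo code.
    for (std::size_t overlap = maxOverlap; overlap > 0; --overlap) {
        const auto tailBegin = oldLines.end() - static_cast<std::ptrdiff_t>(overlap);
        // std::equal stops at the first differing line.
        if (std::equal(tailBegin, oldLines.end(), newLines.begin())) {
            return std::vector<std::string>(
                newLines.begin() + static_cast<std::ptrdiff_t>(overlap),
                newLines.end());
        }
    }
    return newLines;   // nothing in common: treat every line as new
}
```

With the two captures from the question, the overlap found is the last two old lines, which leaves "yet more text" and "hello world" as the new ones.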
I have tested this against a number of test cases and it works so far:
QStringList scrollDiff(const QStringList& oldLines, const QStringList& newLines)
{
    if (oldLines.empty()) {
        return newLines;
    }
    if (oldLines.size() < newLines.size()) {
        return newLines.mid(oldLines.size());
    }
    /*
     * Note: oldLines.size() == newLines.size()
     */
    int i;
    for (i = 0; i < oldLines.size() && oldLines[i] == newLines[i]; ++i);
    if (i == oldLines.size()) {
        return QStringList();
    }
    // Remove lines from oldLines that are no longer shown
    int j = oldLines.indexOf(newLines[i]);
    if (j == -1) {
        return newLines;
    }
    QStringList commonLines = oldLines.mid(j - i);
    return newLines.mid(commonLines.size());
}