Should C++ file read be slower than Ruby or C#? - c++

Completely new to C++.
I'm comparing various aspects of C++, C# and Ruby to see if there's need for mirroring a library. Currently, simple read of a file (post update).
Compiling C++ and C# in VS 2017. C++ is in release(x64) mode (or at least compile then run)
The libraries more or less read a file and split the lines into three which make up the members of an object which are then stored in an array member.
For stress testing I tried a large file 380MB(7M lines) (after update) now getting similar performance with C++ and Ruby,
Purely reading the file and doing nothing else the performance is as below:
Ruby: 7s
C#: 2.5s
C++: 500+s (stopped running after awhile, something's clearly wrong)
C++(release build x64): 7.5s
The code:
#Ruby
file = File.open "test_file.txt"
while !file.eof
line = file.readline
end
//C#
StreamReader file = new StreamReader("test_file.txt");
file.Open();
while((line = file.ReadLine()) != null){
}
//C++
#include "stdafx.h"
#include "string"
#include "iostream"
#include "ctime"
#include "fstream"
int main()
{
std::ios::sync_with_stdio(false);
std::ifstream file;
file.open("c:/sandboxCPP/test_file.txt");
std::string line;
std::clock_t start;
double duration;
start = std::clock();
while (std::getline(file, line)) {
}
duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
std::cout << "\nDuration: " << duration;
while (true)
{
}
return 0;
}
Edit: The following performed incredibly well. 0.03s
vector<string> lines;
string tempString = str.str();
boost::split(lines, tempString, boost::is_any_of("\n"));
start = clock();
cout << "\nCount: " << lines.size();
int count = lines.size();
string s;
for (int i = 0; i < count; i++) {
s = lines[i];
}
s = on the likelihood that I don't know what boost is doing. Changed performance.
Tested with a cout of a random record at the end of the loop.
Thanks

Based on the comments and the originally posted code (it has now been fixed [now deleted]) there was previously a coding error (i++ missing) that stopped the C++ program from outputting anything. This plus the while(true) loop in the complete code sample would present symptoms consistent with those stated in the question (i.e. user waits 500s sees no output and force terminates the program). This is because it would complete reading the file without outputting anything and enter into the deliberately added infinite loop.
The revised complete source code correctly completes (according to the comments) in ~1.6s for a 1.2 million file. My advice for improving performance would be as follows:
Make sure you are compiling in release mode (not debug mode). Given the user has specified they are using Visual Studio 2017, I would recommend viewing the official Microsoft documentation (https://msdn.microsoft.com/en-us/library/wx0123s5.aspx) for a thorough explanation.
To make it easier to diagnose problems do not add an infinite loop at the end of your program. Instead run the executable from powershell / (cmd) and confirm that it terminates correctly.
EDIT: I would also add:
For accurate timings you also need to take into account the OS disk cache. Run each benchmark multiple times to 'warm-up' the disk cache.

C++ doesn’t automatically write everything the instant you tell it to. Instead, it buffers the data so it can write it all at once, which is usually faster. To say “I really want to write this now.”, you need to say something like std::cout << std::flush (if you use std::endl to end your lines it does this automatically).
Usually you don’t need to do this; the buffers are flushed when the program exits, or when you ask for input from the user, or things like that. However, your program doesn’t exit, so it never flushes its buffer. You read the input, and then the program is executing while(true) forever, never giving the output.
The solution to this is simple: remove the while loop at the end of the program. You should not have that; people usually assume a console program exits when it’s finished. I would’ve guessed you had that because Visual Studio automatically closed the console window when the program was finished, but apparently it doesn’t do that with Ctrl+F5, which you use, so I’m not sure.

Related

GDB Skips While Loop Condition When Used With File Input

Alright Stack Overflow, I am running into a decently persistent problem in my C++ code. I'm sure this is one of those dumb mistake moments, but I have tried everything and cannot seem to squish this bug.
I have a bit of code here, and it's behavior is very odd. I have a main function that opens a file containing text I want to read in. I was taught in programming fundamentals class at my university that I could use getline() as a condition for the while loop, which is nice since it automatically terminates when it reaches the end of the file.
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
fstream input_mem_traces("gcc.txt");
string trace_to_parse = "";
while(getline(input_mem_traces, trace_to_parse))
{
cout << trace_to_parse << endl;
}
}
When I compile and run it, it works just fine. It reads out every single line of the file I pass it, and returns with no problems.
However, when I try to use gdb, and set a breakpoint at the line
cout << trace_to_parse << endl;
it didn't hit the breakpoint. Curious as to why that was, I broke above the loop, and tried single stepping through the code. When I got to the while loop, and tried to step, it simply skipped to the line after it, which happened to be the end of the program.
This behavior occurs both using the VSCode GUI front end for gdb, as well as straight gdb from the command line. I am running this on Windows using Ubuntu under WSL2, and VSCode as my IDE with the Remote - WSL extension enabled.
Turns out there was some weirdness going on with the working directory with GDB. For some reason, my working directory was changing under the VSCode GUI, thus the file was not able to be opened, and the while loop condition performed as expected for that circumstance by not entering the loop. Upon the recommendation by Retired Ninja in the comments, I used an absolute path in the fstream object constructor, and that solved the issue.

C++ Debug Assert Error / Crash (While Loop)

I'm debugging a very old application (Visual C++ 6) due to a crash. I have isolated the crash into a while loop. This app reads a bunch of strings from a database and outputs them into a file for processing.
This while loop appears to be looping through all strings retrieved from the DB and wrapping them in a delimiter/end character. However, the while loop will randomly crash.
I have a particular set of data containing around 2000 strings, and at some point, during processing, the loop will crash (or give a debug assertion error in debug mode).
What's strange is that this loop does not stop in the same location each time (bad string was my first thought), nor is this a common occurrence. This app has processed hundreds of thousands of rows over the years, and yet it chokes on this particular data set somewhat randomly.
I added some logging code initially (which I've stripped out, and it appears that the crash is occurring on the actual execution/evaluation of the WHILE line.
Is there anything, in particular, I should look for? I was thinking perhaps this is an index out of bounds error, but I would think it would happen consistently in the same spot with the same data set if that were the case. Does anyone see anything with this code I should double-check, or can offer any advice on how to better troubleshoot this?
//Definitions
CStringList m_stlMessageList;
CStringList m_stlMessageIDList;
//Populating list
rec.Open(CRecordset::forwardOnly, strSQL);
while (!rec.IsEOF())
{
rec.GetFieldValue("string_id", strMessageID);
strMessageID.TrimRight();
rec.GetFieldValue("string_text", strMessage);
strMessage.TrimRight();
rec.GetFieldValue("string_destination", strDestination);
strDestination.TrimRight();
rec.GetFieldValue("string_ip", strIP);
strIP.TrimRight();
//Add to the Lists...
m_stlMessageIDList.AddTail(strMessageID);
m_stlMessageList.AddTail(strMessage);
rec.MoveNext();
}
//Code thats crashing.
CString strSegment, strTemp, strTmp;
BOOL bFirst = TRUE;
POSITION POS = m_stlMessageList.GetHeadPosition();
while (POS)
{
strSegment = m_stlMessageList.GetNext(POS);
strTemp = strSegment.Left(4);
strSegment.Format(strSegment + "%c", 13);
if (POS)
{
strTmp = m_stlMessageList.GetPrev(POS);
m_stlMessageList.SetAt(POS, strSegment);
strTmp = m_stlMessageList.GetNext(POS);
}
else
{
m_stlMessageList.RemoveTail();
m_stlMessageList.AddTail(strSegment);
}
}

Why does getchar work like a buffer instead of working as expected in real-time

This is my first question on stackoverflow. Pardon me if I haven't searched properly but I do not seem to find an explanation for this. Was just attempting an example from Bjourne Stroustroup's papers. Added my bits to see the array get re-sized as I type the text.
But it doesn't seem to work that way! getchar() simply waits till I am done with entering all the characters and then it will execute the loop. As per the logic, it doesn't actually go into the loop, get a character, perform its actions and then iterate. I am wondering if this is implementation specific, or intended to be like this?
I am on Ubuntu 14.04 LTS using Codeblocks with gcc 4.8.2. The source was in cpp files if that matters.
while(true)
{
int c = getchar();
if(c=='\n' || c==EOF)
{
text[i] = 0;
break;
}
text[i] = c;
if(i == maxsize-1)
{
maxsize = maxsize+maxsize;
text = (char*)realloc(text,maxsize);
if(text == 0) exit(1);
cout << "\n Increasing array size to " << maxsize << endl;
}
i++;
}
The output is as follows:
Array Size is now: 10
Please enter some text: this is some sample text. I would have liked to see the memory being realloced right here, but apparently that's not how it works!
Increasing array size to 20
Increasing array size to 40
Increasing array size to 80
Increasing array size to 160
You have entered: this is some sample text. I would have liked to see the memory being realloced right here, but apparently that's not how it works!
Array Size is now: 160
This has nothing to do with getchar directly. The "problem" is the underlying terminal, which will buffer your Input. The Input is sent to the program after you press enter. In Linux (dunno if there is a way in Windows) you can workaround this by calling
/bin/stty raw
in terminal or by calling
system ("/bin/stty raw");
in your program. This will cause getchar to immediately return the input character to you.
Dont forget to reset the tty behaviour by calling
/bin/stty cooked
when done!
Here is an example (for Linux):
#include <stdio.h>
#include <stdlib.h>
using namespace std;
int main() {
system ("/bin/stty raw");
char c = getchar();
system ("/bin/stty cooked");
return 0;
}
Also have a look at this SO Post: How to avoid press enter with any getchar()
Also, as suggested in the comments, have a look here: http://linux.die.net/man/3/termios especially on the command tcsetattr, which should work cross-platform.
Actually, tcsetattr does not apply to Windows (which is what is commonly referred to in this site as "cross-platform"). However, the question is tagged for Linux, so "cross-platform" is a moot point.
By default the standard input, output and error streams are set to
line-buffered (input)
block-buffered (output)
line-buffered (error)
You can change that using setbuf, but of course will not solve the problem (the answer calls for single-character input). POSIX terminal I/O (termios) lets you change via a system call any of the flags shown using stty. As a rule, you might call stty directly from a script, rarely from a C program.
Reading a single character is a frequently asked question, e.g.,
How can I read single characters from the terminal? (unix-faq)
How can I read a single character from the keyboard without waiting for the RETURN key? How can I stop characters from being echoed on the screen as they're typed? (comp.lang.c FAQ)
You could also use ncurses: the filter function is useful for programs that process a command-line (rather than a full-screen application). There is a sample program in ncurses-examples (filter.c) which does this.

C++ unexpected hangtime

I am trying to write a function in my program that loads a huge text-file of 216,555 words and put them as strings into a set. This works properly, but as expected, it will hang for a few micro seconds while looping through the file. But there is something funky going on with my loop, and it's not behaving properly. Please take the time to read, I am sure there's a valid reason for this, but I have no idea what to search for.
The code, which is working by the way, is this:
ifstream dictionary;
dictionary.open("Dictionary.txt");
if(dictionary.fail())
{
cout<<"Could not find Dictionary.txt. Exiting."<<endl;
exit(0);
}
int i = 0;
int progress = 216555/50;
cout<<"Loading dictionary..."<<endl;
cout<<"< >"<<endl;
cout<<"<";
for(string line; getline(dictionary, line);)
{
usleep(1); //Explanation below (not the hangtime)
i++;
if(i%progress == 0)
cout<<"=";
words.insert(line);
}
The for-loop gets every string from the file, and inserts them in the map.
This is a console-application, and I want the user to see the progress. It's not much of a delay, but I wanted to do it anyway. If you don't understand the code, I'll try to explain.
When the program starts, it first prints out "Loading Dictionary...", and then a "<" and a ">" separated by 50 spaces. Then on the next line: "<" followed by a "=" for every 433 word it loops through (216555/50). The purpose of this is so the user can see the progress. The wanted output half way through the loop would be like this:
Loading dictionary...
< >
<=========================
My problem is:
The correct output is shown, but not at the expected time. It prints out the full progress bar, but that only after it has hanged and are done with the loop. How is that possible? The correct number of '=' are shown, but they all pop out at the same time, AFTER it hangs for some microseconds. I added the usleep(1) to make the hangtime a bit longer, but the same thing happened. The if-statement clearly works, or the '=' would've never showed at all, but it feels like my program is stacking the cout-calls for after the entire loop.
The weirdest thing of all, is that the last cout<<"<"; before the for-loop starts, is also shown at the same time as the rest of its line; after the loop is finished. Why and how?
You never flush the stream, so the output just goes into a buffer.
Change cout<<"="; to cout<<"="<<std::flush;
You need to flush the output stream.
cout << "=" << std::flush;
The program is "stacking the cout-calls". It's called output buffering, and every major operating system does it.
When in interactive mode (as your program is intended to be used), output is buffered by line; that is, it will be forced to the terminal when a newline is seen. You can also have block-buffered (fixed number of bytes between forced outputs; used when piping output) and unbuffered.
C++ provides the std::flush stream modifier to force an output at any point. It can be used like this:
cout << "=" << std::flush;
This will slow down your program a bit; the point of buffering is for efficiency. As you'll only be doing it about 51 times, though, the slowdown should be negligible.

how do I tell when a c++ program is waiting for input?

I'm trying to control a simple c++ program through python. The program works by prompting the user for input. The prompts are not necessarily endl terminated. What I would like to know is if there is a way to tell from python that the c++ program is no longer generating output and has switched to requesting input.
Here's an simple example:
c++
#include <iostream>
using namespace std;
int main()
{
int x;
cout << "Enter a number ";
cin >> x;
cout << x << " squared = " << x * x << endl;
}
python:
#! /usr/bin/python
import subprocess, sys
dproc = subprocess.Popen('./y', stdin=subprocess.PIPE,stdout=subprocess.PIPE, stderr=subprocess.PIPE)
while (True) :
dout = dproc.stdout.read(1)
sys.stdout.write(dout)
dproc.stdin.write("22\n")
This sort of works but writes to dproc.stdin too much. What I am looking for instead is a way to read everything from dproc.stdout until the program is ready for input and then write to dproc.stdout .
If possible, I would like to do this w/o modifying the c++ code. ( However, I have tried playing with buffering on the c++ side but it didn't seem to help )
Thank you for any responses.
I'm not aware of a sufficiently general paradigm for detecting that the remote end is waiting for input. I can think of basic ways, but also of situations in which they will fail. Examples:
Read with a non-blocking pipe/socket until no input arrives: the response might be interrupted by the process writing to disk, waiting for a reply from a database etc.
Wait until the process becomes idle: a background thread might still be running.
You'll have to be aware of the protocol used by the application you're trying to control. In a sense, you'll be designing an interpreter for that process. pexpect is a toolset for Python based on this idea.
Have a look at pexpect.