I'm writing a command-line tool for Mac OS X that processes a bunch of files. I would like to show the user the current file being processed, but do not want a bazillion files polluting the terminal window.
Instead I would like to use a single line to output the file path, then reuse that line for the next file. Is there a character (or some other code) to output to std::cout to accomplish this?
Also, if I wanted to re-target this tool for Windows, would the solution be the same for both platforms?
"\r" should work for both windows and Mac OS X.
Something like:
std::cout << "will not see this\rwill see this" << std::flush;
std::cout << std::endl; // all done
I don't have access to a mac, but from a pure console standpoint, this is going to be largely dependent on how it treats the carriage return and line-feed characters. If you can literally send one or the other to the console, you want to send just a carriage return.
I'm pretty sure Mac treats both carriage returns and line-feeds differently than *nix & windows.
If you're looking for in-place updates (e.g. overwrite the current line), I'd recommend looking at the curses lib. This should provide a platform independent means of doing what you're looking for. (because, even using standard C++, there is no platform independent means of what you're asking for).
As Nathan Ernst's answer says, if you want a robust, proper way to do this, use curses - specifically ncurses.
If you want a low-effort hackish way that tends to work, carry on...
Command-line terminals for Linux, UNIX, MacOS, Windows etc. tend to support a small set of basic ASCII control characters, including character 13 decimal - known as a Carriage Return and encoded in C++ as '\r' or equivalently in octal '\015' or hex '\x0D' - instructing the terminal to return to the start of the line.
What you generally want to do is...
int line_width = getenv("COLUMNS") ? atoi(getenv("COLUMNS")) : 80;
std::string spaces(line_width - 1, ' '); // parentheses pick the (count, char) constructor; braces would pick the initializer-list one
for (const auto& filename : filenames) {
std::cout << '\r' << spaces << '\r' << filename << std::flush;
process_file(filename);
}
std::cout << std::endl; // move past last filename...
This uses a string of spaces to overwrite the old filename before writing the next one, so if you have a shorter filename you don't see trailing characters from the earlier longer filename(s).
The std::flush ensures the C++ program calls the OS write() function to send the text to the terminal before starting to process the file. Without that, the text needed for the update - \r, spaces, \r and a filename - will be appended to a buffer and only written to the OS - in e.g. 4k chunks - when the buffer is full, so the filename displayed would lag dozens of files behind the actual file being processed. Further, say the buffer is 4k - 4096 bytes - and at some point you have 4080 bytes buffered, then output text for the next filename: you'll end up with \r and 15 spaces fitting in the buffer, which when auto-flushed will end up wiping out the first 15 characters on the line on-screen and leaving the rest of the previous filename (if it was longer than 15 characters), then waiting until the buffer is full again before updating the screen (still haphazardly).
The final std::endl just moves the cursor on from the line where you've been printing filenames so you can write "all done", or just leave main() and have the shell prompt display on a nice clean line, instead of potentially overwriting part of your last filename (great shells like zsh check for this).
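A variation on the loop above, as a minimal sketch rather than the answer's exact method: instead of clearing the line with spaces and then writing the filename, build one string that is the carriage return plus the filename padded (or truncated) to a fixed width. The names status_line and show_status and the default width of 79 are assumptions made up for illustration.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Build one in-place update: a carriage return, then the text truncated or
// padded with spaces to exactly `width` columns, so that leftovers from a
// longer previous filename are overwritten.
std::string status_line(const std::string& text, std::size_t width) {
    std::string out = "\r";
    out += text.substr(0, width);              // truncate if too long
    if (text.size() < width)
        out.append(width - text.size(), ' ');  // pad if too short
    return out;
}

// Hypothetical helper: write the update and flush so it shows up immediately.
void show_status(std::ostream& os, const std::string& text,
                 std::size_t width = 79) {
    os << status_line(text, width) << std::flush;
}
```

Inside the processing loop this would be called as show_status(std::cout, filename); the truncation keeps a long path from wrapping onto a second line, where '\r' could no longer reach it.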
Related
When I want to print out another text in the same line, I can do this:
int i = 0;
string text = "Paragraph ";
while (i < 10) {
if (clock() % CLOCKS_PER_SEC == 0) {
cout << text << i + 1 << "\r";
cout.flush();
i++;
}
}
But how can I do this with multiple lines? I want to keep a whole paragraph in its initial position in the terminal. If I change text to a string that contains a paragraph with some newline characters, it prints a new block of paragraph below the last one printed.
How can I retain its position?
Your question isn't very clear, but I'm going to assume you want to know how to overwrite text in places other than the current line.
Standard C++ doesn't give you this capability. You will have to use OS-specific functionality to place the cursor at an arbitrary place of the console.
Under Unix-like systems you will generally use ANSI escape sequences
Under Windows you're best served by the console manipulation functions, in particular SetConsoleCursorPosition. Look here for more console functions.
It is not possible in standard C++.
The technique depends on what the standard output device (i.e. std::cout) is - which is difficult, as that depends on the operating system and choices by the end user. For example, a lot of physical terminals (and terminal/console emulators) support escape sequences. Standard output can be redirected to various devices (including to a text file, which makes positioning the cursor a bit pointless).
In general terms, you will need to specify the output device (i.e. what your program can assume output is being written to), the host system, system settings, and a bunch of other things. And then use an API (or library) supported on the host system. Depending on your choices here, the techniques are highly variable.
Under unix, functions libraries like curses might be used. If you use curses, it will probably be necessary to use other curses functions to actually write your output (rather than cout).
Under windows, there is a set of console API functions (a subset of the win API), such as SetConsoleCursorPosition(). Again, it might be easier if you use other console functions, rather than cout.
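For the ANSI escape sequence route mentioned above, here is a rough sketch of redrawing a multi-line block in place. It assumes a terminal that honours the CSI sequences "cursor up" (ESC [ n A) and "erase to end of screen" (ESC [ J), and it assumes no line is long enough to wrap. The function names are invented for illustration.

```cpp
#include <algorithm>
#include <string>

// Number of rows the cursor moves down when `text` is printed,
// assuming no line wraps at the right edge of the terminal.
int rows_advanced(const std::string& text) {
    return static_cast<int>(std::count(text.begin(), text.end(), '\n'));
}

// Build the bytes that move the cursor back to the start of the previously
// printed block, erase it, and print the new block in its place.
std::string redraw_sequence(const std::string& block, int previous_rows) {
    std::string out;
    if (previous_rows > 0)
        out += "\x1b[" + std::to_string(previous_rows) + "A";  // cursor up
    out += "\r\x1b[J";  // go to column 0, erase from cursor to end of screen
    out += block;
    return out;
}
```

A caller would keep the rows_advanced value of the last block it printed and pass it as previous_rows on the next update, flushing after each write. When output is redirected to a file the escape bytes end up in the file, so this only makes sense when writing to a terminal.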
Specifically I'm interested in istream& getline ( istream& is, string& str );. Is there an option to the ifstream constructor to tell it to convert all newline encodings to '\n' under the hood? I want to be able to call getline and have it gracefully handle all line endings.
Update: To clarify, I want to be able to write code that compiles almost anywhere, and will take input from almost anywhere. Including the rare files that have '\r' without '\n'. Minimizing inconvenience for any users of the software.
It's easy to workaround the issue, but I'm still curious as to the right way, in the standard, to flexibly handle all text file formats.
getline reads in a full line, up to a '\n', into a string. The '\n' is consumed from the stream, but getline doesn't include it in the string. That's fine so far, but there might be a '\r' just before the '\n' that gets included into the string.
There are three types of line endings seen in text files:
'\n' is the conventional ending on Unix machines, '\r' was (I think) used on old Mac operating systems, and Windows uses a pair, '\r' followed by '\n'.
The problem is that getline leaves the '\r' on the end of the string.
ifstream f("a_text_file_of_unknown_origin");
string line;
getline(f, line);
if(!f.fail()) { // a line was read (it may be empty)
// BUT, there might be an '\r' at the end now.
}
Edit Thanks to Neil for pointing out that f.good() isn't what I wanted. !f.fail() is what I want.
I can remove it manually myself (see edit of this question), which is easy for the Windows text files. But I'm worried that somebody will feed in a file containing only '\r'. In that case, I presume getline will consume the whole file, thinking that it is a single line!
.. and that's not even considering Unicode :-)
.. maybe Boost has a nice way to consume one line at a time from any text-file type?
Edit I'm using this, to handle the Windows files, but I still feel I shouldn't have to! And this won't work for the '\r'-only files.
if(!line.empty() && *line.rbegin() == '\r') {
line.erase( line.length()-1, 1);
}
As Neil pointed out, "the C++ runtime should deal correctly with whatever the line ending convention is for your particular platform."
However, people do move text files between different platforms, so that is not good enough. Here is a function that handles all three line endings ("\r", "\n" and "\r\n"):
std::istream& safeGetline(std::istream& is, std::string& t)
{
t.clear();
// The characters in the stream are read one-by-one using a std::streambuf.
// That is faster than reading them one-by-one using the std::istream.
// Code that uses streambuf this way must be guarded by a sentry object.
// The sentry object performs various tasks,
// such as thread synchronization and updating the stream state.
std::istream::sentry se(is, true);
std::streambuf* sb = is.rdbuf();
for(;;) {
int c = sb->sbumpc();
switch (c) {
case '\n':
return is;
case '\r':
if(sb->sgetc() == '\n')
sb->sbumpc();
return is;
case std::streambuf::traits_type::eof():
// Also handle the case when the last line has no line ending
if(t.empty())
is.setstate(std::ios::eofbit);
return is;
default:
t += (char)c;
}
}
}
And here is a test program:
int main()
{
std::string path = ... // insert path to test file here
std::ifstream ifs(path.c_str());
if(!ifs) {
std::cout << "Failed to open the file." << std::endl;
return EXIT_FAILURE;
}
int n = 0;
std::string t;
while(!safeGetline(ifs, t).eof())
++n;
std::cout << "The file contains " << n << " lines." << std::endl;
return EXIT_SUCCESS;
}
Are you reading the file in BINARY or in TEXT mode? In TEXT mode the pair carriage return/line feed, CRLF, is interpreted as TEXT end of line, or end of line character, but in BINARY you fetch only ONE byte at a time, which means that either character MUST be ignored and left in the buffer to be fetched as another byte! Carriage return means, in the typewriter, that the typewriter car, where the printing arm lies in, has reached the right edge of the paper and is returned to the left edge. This is a very mechanical model, that of the mechanical typewriter. Then the line feed means that the paper roll is rotated a little bit up so the paper is in position to begin another line of typing. As far as I remember one of the low digits in ASCII means move to the right one character without typing, the dead char, and of course \b means backspace: move the car one character back. That way you can add special effects, like underlining (type underscore), strikethrough (type minus), approximate different accents, cancel out (type X), without needing an extended keyboard, just by adjusting the position of the car along the line before entering the line feed. So you can use byte sized ASCII voltages to automatically control a typewriter without a computer in between. When the automatic typewriter is introduced, AUTOMATIC means that once you reach the farthest edge of the paper, the car is returned to the left AND the line feed applied, that is, the car is assumed to be returned automatically as the roll moves up! So you do not need both control characters, only one, the \n, new line, or line feed.
This has nothing to do with programming but ASCII is older and HEY! looks like some people were not thinking when they began doing text things! The UNIX platform assumes an electrical automatic typemachine; the Windows model is more complete and allows for control of mechanical machines, though some control characters become less and less useful in computers, like the bell character, 0x07 if I remember well... Some forgotten texts must have been originally captured with control characters for electrically controlled typewriters and it perpetuated the model...
Actually the correct variation would be to just include the \r, line feed, the carriage return being unnecessary, that is, automatic, hence:
char c;
ifstream is;
is.open("",ios::binary);
...
is.getline(buffer, bufsize, '\r');
//ignore following \n or restore the buffer data
if ((c=is.get())!='\n') is.rdbuf()->sputbackc(c);
...
would be the most correct way to handle all types of files. Note however that \n in TEXT mode is actually the byte pair 0x0d 0x0a, but 0x0d IS just \r: \n includes \r in TEXT mode but not in BINARY, so \n and \r\n are equivalent... or should be. This is a very basic industry confusion actually, typical industry inertia, as the convention is to speak of CRLF, in ALL platforms, then fall into different binary interpretations. Strictly speaking, files including ONLY 0x0d (carriage return) as being \n (CRLF or line feed), are malformed in TEXT mode (typewriter machine: just return the car and strikethrough everything...), and are a non-line oriented binary format (either \r or \r\n meaning line oriented) so you are not supposed to read as text! The code ought to fail maybe with some user message. This does not depend on the OS only, but also on the C library implementation, adding to the confusion and possible variations... (particularly for transparent UNICODE translation layers adding another point of articulation for confusing variations).
The problem with the previous code snippet (mechanical typewriter) is that it is very inefficient if there are no \n characters after \r (automatic typewriter text). Then it also assumes BINARY mode where the C library is forced to ignore text interpretations (locale) and give away the sheer bytes. There should be no difference in the actual text characters between both modes, only in the control characters, so generally speaking reading BINARY is better than TEXT mode. This solution is efficient for BINARY mode typical Windows OS text files independently of C library variations, and inefficient for other platform text formats (including web translations into text). If you care about efficiency, the way to go is to use a function pointer, make a test for \r vs \r\n line controls however way you like, then select the best getline user-code into the pointer and invoke it from it.
Incidentally I remember I found some \r\r\n text files too... which translates into double line text just as is still required by some printed text consumers.
The C++ runtime should deal correctly with whatever the endline convention is for your particular platform. Specifically, this code should work on all platforms:
#include <string>
#include <iostream>
using namespace std;
int main() {
string line;
while( getline( cin, line ) ) {
cout << line << endl;
}
}
Of course, if you are dealing with files from another platform, all bets are off.
As the two most common platforms (Linux and Windows) both terminate lines with a newline character, with Windows preceding it with a carriage return, you can examine the last character of the line string in the above code to see if it is \r and if so remove it before doing your application-specific processing.
For example, you could provide yourself with a getline style function that looks something like this (not tested, use of indexes, substr etc for pedagogical purposes only):
istream & safegetline( istream & is, string & line ) {
string myline;
if ( getline( is, myline ) ) {
// drop a trailing '\r' left over from a Windows-style "\r\n" ending
if ( myline.size() && myline[myline.size()-1] == '\r' ) {
line = myline.substr( 0, myline.size() - 1 );
}
else {
line = myline;
}
}
return is;
}
One solution would be to first search and replace all line endings to '\n' - just like e.g. Git does by default.
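As a minimal sketch of that search-and-replace idea, assuming the whole text is already in memory as a std::string read in binary mode, the function below maps "\r\n" and a lone "\r" to "\n". The name normalize_newlines is made up for illustration; it is not a standard or Git facility.

```cpp
#include <cstddef>
#include <string>

// Return a copy of `in` where every "\r\n" pair and every lone '\r'
// has been replaced by a single '\n'. Existing '\n' bytes are kept.
std::string normalize_newlines(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            out += '\n';
            // Skip the '\n' half of a "\r\n" pair so it is not emitted twice.
            if (i + 1 < in.size() && in[i + 1] == '\n')
                ++i;
        } else {
            out += in[i];
        }
    }
    return out;
}
```

After this pass, plain std::getline with the default '\n' delimiter sees the same lines regardless of where the file came from.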
Other than writing your own custom handler or using an external library, you are out of luck. The easiest thing to do is to check whether line[line.length() - 1] is '\r' (after making sure the line is not empty). On Linux this check is superfluous for native files, whose lines end in a bare '\n', so it costs a little time if it runs in a loop. On Windows it is also superfluous in text mode, because '\r\n' is translated to '\n' on input. However, what about classic Mac files, whose lines end in '\r'? std::getline would not work for those files on Linux or Windows: it splits on '\n', which covers both '\n' and '\r\n' but never matches a lone '\r', so the whole file would be read as a single line. Of course, then there are the numerous EBCDIC systems, something that most libraries won't dare tackle.
Checking for '\r' is probably the best solution to your problem. Reading in binary mode would allow you to check for all three common line endings ('\r', '\r\n' and '\n'). If you only care about Linux and Windows as old-style Mac line endings shouldn't be around for much longer, check for '\n' only and remove the trailing '\r' character.
Unfortunately the accepted solution does not behave exactly like std::getline(). To obtain that behavior (to my tests), the following change is necessary:
std::istream& safeGetline(std::istream& is, std::string& t)
{
t.clear();
// The characters in the stream are read one-by-one using a std::streambuf.
// That is faster than reading them one-by-one using the std::istream.
// Code that uses streambuf this way must be guarded by a sentry object.
// The sentry object performs various tasks,
// such as thread synchronization and updating the stream state.
std::istream::sentry se(is, true);
std::streambuf* sb = is.rdbuf();
for(;;) {
int c = sb->sbumpc();
switch (c) {
case '\n':
return is;
case '\r':
if(sb->sgetc() == '\n')
sb->sbumpc();
return is;
case std::streambuf::traits_type::eof():
is.setstate(std::ios::eofbit); //
if(t.empty()) // <== change here
is.setstate(std::ios::failbit); //
return is;
default:
t += (char)c;
}
}
}
According to https://en.cppreference.com/w/cpp/string/basic_string/getline:
Extracts characters from input and appends them to str until one of the following occurs (checked in the order listed)
end-of-file condition on input, in which case, getline sets eofbit.
the next available input character is delim, as tested by Traits::eq(c, delim), in which case the delimiter character is extracted from input, but is not appended to str.
str.max_size() characters have been stored, in which case getline sets failbit and returns.
If no characters were extracted for whatever reason (not even the discarded delimiter), getline sets failbit and returns.
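The rules quoted above can be observed with a small experiment. This is only a sketch that uses an in-memory std::istringstream in place of a file; ReadResult and read_all_lines are names invented for the example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// What a plain std::getline loop produced, plus the stream flags afterwards.
struct ReadResult {
    std::vector<std::string> lines;
    bool eof = false;   // eofbit set after the loop
    bool fail = false;  // failbit set after the loop
};

ReadResult read_all_lines(const std::string& text) {
    std::istringstream is(text);
    ReadResult r;
    std::string line;
    // The loop ends on the first call that extracts no characters at all,
    // which is the call that sets failbit.
    while (std::getline(is, line))
        r.lines.push_back(line);
    r.eof = is.eof();
    r.fail = is.fail();
    return r;
}
```

For the input "x\ny" the loop yields two lines: the call that reads "y" reaches end-of-file and sets eofbit but still succeeds, and only the following call sets failbit. A replacement for getline needs the same flag behaviour if it is to be used in a while loop like this one.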
If it is known how many items/numbers each line has, one could read one line with e.g. 4 numbers as
string num;
is >> num >> num >> num >> num;
This also works with other line endings.
I have two programs whose output I am comparing. Whenever one of them writes a newline, a diff utility reports that my output is different.
I have tried using:
std::cout << std::endl;
and
std::cout << '\n';
but WinMerge still says that our output is different. I am running both executables on the same machine.
Streaming std::endl is equivalent to streaming '\n' then std::flush so, no, you won't see any differences. More likely your tool is expecting to find a Windows-style line ending (that is, CRLF rather than just LF).
std::cout << "\r\n" << std::flush;
But, instead of guessing, you should simply open up that comparison data in a hex editor and see for yourself what characters are expected.
Read the file in binary mode and check the byte values for the newline.
One will probably be a single char with ASCII code 10 (LF)
and the other will be 13 followed by 10 (CR LF).
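A hedged sketch of that check: read each output file in binary mode and count which kinds of line ending it contains. EolCounts, count_line_endings and read_all_bytes are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// How many line endings of each kind were found.
struct EolCounts {
    std::size_t lf = 0;    // lone "\n"   (byte 10)
    std::size_t crlf = 0;  // "\r\n" pair (bytes 13, 10)
    std::size_t cr = 0;    // lone "\r"   (byte 13)
};

EolCounts count_line_endings(const std::string& bytes) {
    EolCounts n;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\r') {
            if (i + 1 < bytes.size() && bytes[i + 1] == '\n') {
                ++n.crlf;
                ++i;  // consume the '\n' of the pair
            } else {
                ++n.cr;
            }
        } else if (bytes[i] == '\n') {
            ++n.lf;
        }
    }
    return n;
}

// Read a whole file without any newline translation (binary mode).
std::string read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}
```

Running count_line_endings(read_all_bytes(path)) on both outputs shows at a glance whether one uses CR LF where the other uses a bare LF.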
Is there a header file somewhere that stores the line termination character/s for the system (so that without any #ifdefs it will work on all platforms, MAC, Windows, Linux, etc)?
You should open the file in "text mode" (that is, not in binary mode), and newline is always '\n', whatever the native file format is. The C library will translate whatever native character(s) indicate newlines into '\n' whenever appropriate [that is, when reading/writing text files]. Note that this also means you can't rely on counting the number of characters read and using that count to seek back to a location.
If the file is binary, then newlines aren't newlines anyways.
And unless you plan on running on really ancient systems, and you REALLY want to do this, I would do:
#ifdef _WIN32 // predefined by the common Windows compilers (MSVC, MinGW, Clang)
#define END_LINE "\r\n"
#else
#define END_LINE "\n"
#endif
This won't work for MacOS before MacOS X, but surely nobody is using pre-MacOS X hardware any longer?
No, because it's \n everywhere. That expands to the correct newline character(s) when you write it to a text file.
Posix requires it to be \n. So if _POSIX_VERSION is defined, it's \n. Otherwise, special-case the only non-POSIX OS, windows, and you're done.
It doesn't look like there's anything in the standard library to obtain the current platform's line terminator.
The closest looking API is
char_type std::basic_ios::widen(char c);
It "converts a character c to its equivalent in the current locale" (cppreference). I was pointed at it by the documentation for std::endl which "inserts a endline character into the output sequence os and flushes it as if by calling os.put(os.widen('\n')) followed by os.flush()" (cppreference).
On Posix,
widen('\n') returns '\n' (as a char, for char-based streams);
endl inserts a '\n' and flushes the buffer.
On Windows, they do exactly the same. In fact
#include <iostream>
#include <fstream>
using namespace std;
int main() {
ofstream f;
f.open("aaa.txt", ios_base::out | ios_base::binary);
f << "aaa" << endl << "bbb";
f.close();
return 0;
}
will result in a file with just '\n' as a line terminator.
As others have suggested, when the file is open in text mode (the default) the '\n' will be automatically converted to '\r' '\n' on Windows.
(I've rewritten this answer because I had incorrectly assumed that std::endl translated to "\r\n" on Windows)
The answer to your question can be extended a little further by using the same code to read both Windows-based text files and Unix-based text files on Windows, MacOS and Linux/Unix systems (excluding the ancient Macintosh systems that use \r as line delimiter).
As already pointed out by others, \n can be used as line delimiter on all of the above systems because the underlying C library converts it to the native delimiter used by each system. Therefore, one can use the following code to read text files that use either \n or \r\n as line delimiters while discarding all delimiter characters:
// Open a file in text mode
std::ifstream file_stream(file_name, std::ios_base::in);
// Use widened '\n' as line delimiter
for(std::string text_line; std::getline(file_stream, text_line, file_stream.widen('\n'));)
{
if(!text_line.empty())
{
// Discard '\r' when read Windows-based file in Unix-like systems
if(text_line.back() == '\r') text_line.pop_back();
// Do more with text_line
}
}
In the code above, lines containing \r will only be encountered when reading Windows-based text files on Unix-like systems, because a single \n is used as the delimiter while Windows-based text files use \r\n. On the other hand, when reading text files on Windows in text mode, both \r\n and \n line endings are removed by std::getline with the widened \n as delimiter. Note that this code snippet doesn't remove any \r not adjacent to \n, because such text files are not correctly formed on Windows, Mac or Linux/Unix systems.
Thinking about UNIX, Windows and Mac and an output stream (both binary and text),
What does std::endl represent, i.e. <CR><LF>, <LF> or <CR>? Or is it always the same no matter what platform/compiler?
The reason I'm asking is that I'm writing a TCP client that talks a protocol that expects each command to end in <CR><LF>. So I'm wondering whether to use std::endl or "\r\n" in my streams.
EDIT: Ok, so one flushes the buffer and another doesn't. I get that. But if I'm outputting text to a file, is '\n' equal to <LF> or does it convert to <CR><LF> on Windows and <LF> on Unix or not? I don't see a clear answer yet.
The code:
stream << std::endl;
// Is equivalent to:
stream << "\n" << std::flush;
So the question is what "\n" is mapped to.
On normal streams nothing happens. But for file streams (in text mode) the "\n" gets mapped to the platform end of line sequence. Note: The read converts the platform end of line sequence back to a '\n' when it reads from a file in text mode.
So if you are using a normal stream nothing happens. If you are using a file stream, just make sure it is opened in binary mode so that no conversion is applied:
stream << "\r\n"; // <CR><LF>
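To make this concrete for the protocol case, here is a small sketch. It assumes the connection is wrapped in some std::ostream that performs no newline translation (a binary-mode stream, or a socket stream class of your own); send_command and bytes_written_by_endl are invented names, and a std::ostringstream stands in for the real connection.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Terminate a protocol command with an explicit <CR><LF> and flush it,
// rather than relying on std::endl, which only ever inserts '\n'.
void send_command(std::ostream& os, const std::string& command) {
    os << command << "\r\n" << std::flush;
}

// Shows what std::endl puts into a stream that does no translation:
// a single '\n' byte (followed by a flush), on every platform.
std::string bytes_written_by_endl() {
    std::ostringstream os;
    os << "aaa" << std::endl << "bbb";
    return os.str();
}
```

bytes_written_by_endl() returns "aaa\nbbb"; any conversion to "\r\n" happens later, inside a file stream opened in text mode, which is why the explicit "\r\n" is combined with a binary-mode stream.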
The C++ standard says that it:
Calls os.put(os.widen('\n')), then
os.flush()
What the '\n' is converted to, if it is converted at all, is down to the stream type it is used on, plus any possible mode the stream may be opened in.
Use stream << "\r\n" (and open the stream in binary mode). stream << std::endl; is equivalent to stream << "\n" << std::flush;. The "\n" might be converted to a "\r\n" if the code runs on Windows, but you can't count on it -- at least one Windows compiler converts it to "\n\r". On classic Mac OS (before OS X), it's likely to be converted to "\r", and on Unix/Linux and most similar systems, including Mac OS X, it'll be left as just a "\n".
Quoted from the accepted answer on a related question:
The varying line-ending characters don't matter, assuming the file is open in text mode, which is what you get unless you ask for binary. The compiled program will write out the correct thing for the system compiled for.
The only difference is that std::endl flushes the output buffer, and '\n' doesn't. If you don't want the buffer flushed frequently, use '\n'. If you do (for example, if you want to get all the output, and the program is unstable), use std::endl
In your case, since you specifically want <CR><LF>, you should explicitly use \r\n, and then call std::flush() if you still want to flush the output buffer.
Looks like your question got munged. Each command ends in []? For an over-the-wire protocol, I'd suggest using a delimiter that doesn't vary by platform. std::endl could resolve to '\r\n' or '\n\r' depending on the platform.