I am trying to read a string entered at the console using getchar(). I entered a string of 20k characters, but I am only able to read 4094 of them; the loop breaks after that. I used the following code:
int c;  /* int, not char, so that it can also hold EOF */
while ((c = getchar()) != '\n') {
    /* store or count c here */
}
I tested your code on Ubuntu, and it seems to be an issue with pasting the string into the shell rather than with the code.
Also, as mentioned above, make sure you compare c against EOF too, so that the loop stops if the input ends without a newline.
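A minimal sketch of the loop with the EOF check, written in C++ (the same code works in C with <stdio.h>). It reads from an arbitrary FILE* through std::getc, so calling it with stdin behaves like the getchar() loop; the function name read_line is an illustrative assumption, not something from the question.

```cpp
#include <cstdio>
#include <string>

// Reads characters from `in` until a newline or end of input.
// `c` is an int, not a char, so that EOF can be told apart from
// every valid character value.
std::string read_line(std::FILE* in) {
    std::string result;
    int c;
    while ((c = std::getc(in)) != '\n' && c != EOF) {
        result.push_back(static_cast<char>(c));
    }
    return result;
}
```

Called as read_line(stdin), it stops either at the newline or when input ends, instead of looping forever on EOF.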
To reproduce the problem first, I created a 20000-character file using Python and copied its contents.
When I pasted them into the shell, the program did not read them all but stopped at 4094 characters, as you described, which points to a limit on terminal input rather than a bug in your code.
My solution is to put the input in a file and redirect that file to the program's standard input.
On Linux, I did this: cat longfile20000 | ./a.out (or, equivalently, ./a.out < longfile20000)
One blog post I found describes this limit and suggests a workaround, but using a file seems better.
http://blog.chaitanya.im/4096-limit#:~:text=The%20reason%20for%20this%20discrepancy%20as%20I%20later,instead%20of%20taking%20input%20from%20the%20command%20line.
You don't mention your operating system, but its console input is probably limited to 4096 bytes (see, for example, this). Accounting for the newline and the null terminator, that leaves the 4094 bytes you are seeing.
There is no machine-independent way to change this limit, as far as I can see, and in any case it feels wrong to use console I/O for large amounts of data. I suggest a redesign, perhaps using the suggestion in VaibhavDS's answer.
By the way, what happens if you call getchar() after exiting the loop? I expect that the remaining input has been lost, but it might still be there.
Edited to add: you might find something useful in this answer.
How can I input text in the C++ console without breaking the input up one line at a time?
If I use cin, I can input only one string at a time, and I cannot edit the input (except by editing the string afterwards, but that won't help).
Is there any way to input multi-line strings without breaking them up one line at a time?
I am running Ubuntu 12.04.
Who is writing? Is it you, or some program?
Your terminology is unusual: generally programmers take the point of view of the computer!
What you write by typing on your keyboard is an input to some program (which reads it).
If you want editable input (input to the program, so "written" or typed by the human user), consider using GNU readline (on Linux), or perhaps ncurses.
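GNU readline needs an extra library, so as a simpler alternative (not the readline approach) here is a minimal sketch that gathers several typed lines into one string with std::getline until end of input, which on a Linux terminal you signal with Ctrl-D at the start of a line. The function name read_all_lines is hypothetical. It does not give you editing across lines; each line can only be edited with the terminal's own basic line editing before you press Enter.

```cpp
#include <iostream>
#include <sstream>  // std::istringstream is handy for trying it without typing
#include <string>

// Collects every line from `in` until end of input and joins them
// with '\n', so multi-line text arrives as a single string.
std::string read_all_lines(std::istream& in) {
    std::string text;
    std::string line;
    bool first = true;
    while (std::getline(in, line)) {
        if (!first) {
            text += '\n';
        }
        text += line;
        first = false;
    }
    return text;
}
```

Typical use is std::string text = read_all_lines(std::cin);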
If you want to format the program's output (which the user reads with their eyes), you'll generally need to code that formatting explicitly. ANSI escape codes might be useful (but using them might make readline or ncurses unhappy).
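As an illustration of the ANSI escape codes mentioned above, a small sketch that wraps text in SGR (Select Graphic Rendition) sequences for bold and red text. The helper names bold and red are hypothetical, and whether the styling actually shows up depends on the terminal.

```cpp
#include <string>

// ESC [ <param> m is an SGR sequence.
// 1 = bold, 31 = red foreground, 0 = reset all attributes.
std::string bold(const std::string& text) {
    return "\033[1m" + text + "\033[0m";
}

std::string red(const std::string& text) {
    return "\033[31m" + text + "\033[0m";
}
```

For example, std::cout << bold("Warning:") << " disk almost full\n"; prints the first word in bold and then resets the attributes so the rest of the line is unaffected.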
See also this answer and the references I gave there.
I have a fairly simple program with a vector of characters that is then written to a .txt file.
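#include <fstream>
#include <iostream>
#include <vector>
using namespace std;
int main() {
    ofstream op("output.txt");
    vector<char> outp;  // filled elsewhere in the program
    for (size_t i = 0; i < outp.size(); i++) {
        op << outp[i];    // the final output of this is incorrect
        cout << outp[i];  // this output is correct
    }
    op.close();
}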
The text that is output by cout is correct, but when I open the text file that was created, the output is wrong: it contains what look like Chinese characters, which the program should never be able to produce. For example, when the program should output:
O dsof
And cout prints the right output, the .txt file has this:
O獤景
I have even tried adding the characters to a string before outputting it, but it doesn't help. My best guess is that the characters are combining together and producing a different Unicode or ASCII value, but I don't know enough about character codes to know for sure, or how to stop this from happening. Is there a way to correct the output so that it doesn't do this? I am currently using a Windows 8.1 computer with Code::Blocks 12.11 and the GNU GCC compiler, in case that helps.
Some text editors try to guess the encoding of a file and occasionally get it wrong. This is especially likely with very small amounts of text, because whatever statistical analysis is used simply doesn't have enough data to reach a good conclusion. Windows Notepad has (or had) an infamous example with the text "Bush hid the facts".
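To see how a misguessed encoding turns this text into CJK characters: if an editor decides the file is UTF-16 little-endian, it reads each pair of bytes as one 16-bit code unit, low byte first. The sketch below (function name hypothetical) performs that reinterpretation. The byte pairs "ds" and "of" become U+7364 and U+666F, which appear to be the two CJK characters in the .txt output shown in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reinterprets raw bytes as UTF-16LE code units, the way an editor that
// wrongly guessed "UTF-16 little-endian" would: each byte pair (lo, hi)
// becomes the 16-bit value lo | (hi << 8). A trailing odd byte is ignored.
std::vector<std::uint16_t> as_utf16le(const std::string& bytes) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        auto lo = static_cast<unsigned char>(bytes[i]);
        auto hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return units;
}
```

The file itself is fine; only the editor's interpretation of the bytes is wrong.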
More advanced text editors (for example, Notepad++) may either not have the same problem or let you change which encoding is assumed. You could use such an editor to verify that the contents of the file are actually correct.
Hex editors/viewers are another way, since they allow you to examine the raw bytes of the file without interpretation. For instance, HxD is a hex editor that I have used in the past.
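If no hex editor is at hand, a few lines of C++ can show the raw bytes too. A minimal sketch (function names hypothetical): read_bytes loads a file in binary mode, and hex_dump formats each byte as two hex digits. For the expected output "O dsof", the dump should be exactly 4f 20 64 73 6f 66.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Loads a file's raw bytes; std::ios::binary stops Windows from
// translating "\r\n", so you see exactly what is on disk.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Formats bytes as space-separated two-digit hex values.
std::string hex_dump(const std::string& bytes) {
    std::string out;
    char buf[4];  // two hex digits, a space, and the terminating '\0'
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned>(b));
        out += buf;
    }
    if (!out.empty()) {
        out.pop_back();  // drop the trailing space
    }
    return out;
}
```

Usage would be along the lines of std::cout << hex_dump(read_bytes("output.txt")) << '\n';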
Alternatively, you can simply output more text: the more there is, the less likely the editor is to guess wrong. In my experience, newlines are particularly helpful in convincing a text editor to assume the correct encoding.
There is nothing wrong with your code.
Your text editor probably assumes (or guesses) the wrong encoding.
Use a more advanced editor and you will see the correct output.
I find it hard to explain, but I will try my best. Sometimes in Linux, in the terminal, things get printed but can still be written over. For example, when using wget you get a progress bar like this:
[===================> ]
Now, if you type something while it is doing this, it will 'overwrite' it. My question is how to recreate this in C++.
Would you use something like cout <<, or something else?
I hope you understand what I am getting at...
BTW, I am using the most recent version of Arch with Xfce 4.
On Linux terminals, printing a carriage return character \r typically moves the cursor back to the beginning of the current line. Try this, for example:
std::cout << "Hello\rJ";
The output will be:
Jello
This does depend on your terminal, however, so you should look up the meaning of particular control characters for your terminal.
For a more cross-platform solution and the ability to do more complex text-based user interfaces, take a look at ncurses.
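Building on the \r approach above, a minimal sketch of a wget-style progress bar. Each string starts with \r, so printing it repeatedly redraws the bar in place. The function name progress_line and the default width of 20 are arbitrary choices for illustration.

```cpp
#include <string>

// Builds a line such as "\r[=====>    ] 50%". The leading '\r' sends the
// cursor back to the start of the line, so each print overwrites the last.
std::string progress_line(int done, int total, int width = 20) {
    if (total <= 0) total = 1;  // avoid division by zero
    if (done < 0) done = 0;
    if (done > total) done = total;
    int filled = done * width / total;
    std::string bar(filled, '=');
    if (filled < width) {
        bar += '>';
        bar += std::string(width - filled - 1, ' ');
    }
    return "\r[" + bar + "] " + std::to_string(done * 100 / total) + "%";
}
```

In a loop, print progress_line(i, n) to std::cout followed by std::flush (there is no newline to trigger a flush), and print a final '\n' when the work is finished so the next output starts on a fresh line.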
You can print the special character \b to go back one space. Then you can print a space to blank it out, or another character to overwrite what was there. You can also use \r to return to the beginning of the current output line and write again from there.
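The \b technique from this answer can be wrapped in a small helper. In this sketch (name hypothetical), each character to erase becomes \b, a space, and \b again: move back, blank the cell, and leave the cursor on it. As with \r, the effect depends on the terminal, and \b typically cannot move back past the start of the current line.

```cpp
#include <cstddef>
#include <string>

// Returns "\b \b" repeated n times: each repetition moves the cursor back
// one column, overwrites that cell with a space, and moves back again.
std::string erase_chars(std::size_t n) {
    std::string s;
    for (std::size_t i = 0; i < n; ++i) {
        s += "\b \b";
    }
    return s;
}
```

Printing "Loading..." followed by erase_chars(3) and then "!!!" should leave "Loading!!!" on a typical terminal.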
Controlling the terminal involves sending various escape sequences to it, in order to move the cursor around and so on.
http://www.ibiblio.org/pub/historic-linux/ftp-archives/tsx-11.mit.edu/Oct-07-1996/info/vt102.codes
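Two sequences from such a VT102 list are handy for redrawing more than one line: ESC[nA (cursor up n lines) and ESC[2K (erase the entire current line). A small sketch that builds them, with hypothetical helper names; most modern terminal emulators accept these, but check your terminal's documentation.

```cpp
#include <string>

// CUU (cursor up): ESC [ n A moves the cursor up n lines.
std::string cursor_up(int n) {
    return "\033[" + std::to_string(n) + "A";
}

// EL with parameter 2: ESC [ 2 K erases the entire current line
// without moving the cursor.
std::string erase_line() {
    return "\033[2K";
}
```

To redraw two status lines, you could print cursor_up(2), then for each line "\r", erase_line(), the new text and "\n".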
You could also use ncurses to do this.
I am writing a C++ program that reads lines of text from a .txt file. Unfortunately, the text file is generated by a twenty-something-year-old UNIX program, and it contains a lot of bizarre formatting characters.
The first few lines of the file are plain, English text and these are read with no problems. However, whenever a line contains one or more of these strange characters mixed in with the text, that entire line is read as characters and the data is lost.
The really confusing part is that if I manually delete the first couple of lines, so that the very first character in the file is one of these unusual characters, then everything in the file is read perfectly. The unusual characters just display as little ASCII squiggles (arrows, smiley faces, etc.), which is fine. It seems as though a decision is being made automatically, without my knowledge or consent, based on the first line read.
Based on some googling, I suspected that the issue might be the locale, but according to the Visual Studio debugger, the locale of the ifstream object is "C" in both scenarios.
The code which reads the data is as follows:
//Function to open file at location specified by inFilePath, load and process data
int OpenFile(const char* inFilePath)
{
string line;
ifstream codeFile;
//open text file
codeFile.open(inFilePath,ios::in);
//read file line by line
while ( codeFile.good() )
{
getline(codeFile,line);
//check non-zero length
if (line != "")
ProcessLine(&line[0]);
}
//close file
codeFile.close();
return 1;
}
If anyone has any suggestions as to what might be going on or how to fix it, they would be very welcome.
From reading about your issue, it sounds like you are reading binary data, which can cause getline() to discard content or simply skip over the line.
You have a few choices:
1. If you simply need lines from the data file, you can first sanitize them by removing all non-printable characters (that is the "official" name for those weird ASCII characters). On UNIX, a tool such as strings would help you with that process.
2. You can, of course, also do this programmatically, by reading in X amount of data, storing it in a string, and then removing the characters that fall outside the standard ASCII range. This will most likely cause you to lose any Unicode text stored in the file.
3. Change your program to understand the format, essentially writing a parser that lets you process the document in a saner way.
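The programmatic clean-up described above (dropping characters outside printable ASCII) might look roughly like this. It is a sketch under the assumption that tab and newline should survive; everything else outside 0x20 to 0x7E is removed, including any non-ASCII (for example UTF-8) text. The function name is hypothetical.

```cpp
#include <string>

// Keeps printable ASCII (0x20-0x7E) plus tab and newline; every other
// byte, including non-ASCII bytes, is dropped.
std::string strip_non_printable(const std::string& raw) {
    std::string clean;
    clean.reserve(raw.size());
    for (char ch : raw) {
        unsigned char b = static_cast<unsigned char>(ch);
        if ((b >= 0x20 && b <= 0x7E) || b == '\t' || b == '\n') {
            clean.push_back(ch);
        }
    }
    return clean;
}
```

You would apply it to each block read from the file before passing the text on for processing.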
If you can, I would suggest trying solution number 1, simply to see whether the results are sane and still usable. You mention that this is medical data; do you by any chance know what file format it is? If you are trying to find out and have access to a UNIX/Linux machine, you can use the file utility, which may give you a clue (in the worst case, it will tell you it is simply data).
If possible, try to get a "clean" file whose hex dump you can post, so that we can provide better help than we currently can. By "clean" I mean a file with no personally identifying information in it.
For number 2, open the file in binary mode. You mentioned using Windows, where std::fstream objects handle binary and text mode differently (in text mode, for example, "\r\n" line endings are translated to "\n"), whereas on UNIX systems there is no difference (on most systems, at least; I'm sure I'll get a comment about the one system that doesn't match this description).
codeFile.open(inFilePath,ios::in);
would become
codeFile.open(inFilePath, ios::in | ios::binary);
Instead of getline(), you will want to become intimately familiar with read(), which reads a fixed number of characters as unformatted input without stopping at newlines.
Reading will be like this:
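// This code has not been tested!
char input[1024];
// read() sets failbit on the last, partial block, but gcount() still
// reports how many bytes it stored, so those are processed before the
// loop stops.
while (codeFile.read(input, sizeof input) || codeFile.gcount() > 0) {
    std::streamsize actual_read = codeFile.gcount();
    // Here you can process input, up to a maximum of actual_read characters.
    // ProcessLine() // We didn't necessarily read a line!
    ProcessData(input, actual_read);
}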
The other option, as mentioned, is to change the locale of the stream so that it treats only the newline as a separator. Maybe this will fix your issue without requiring the unformatted operations:
Imbue the stream with a new locale that only knows about the newline. This may or may not let your getline() calls work without issues.
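One common way to build such a locale, sketched here on the assumption that it is the kind of locale this answer has in mind, is a std::ctype<char> facet whose classification table marks only '\n' as whitespace. Note that this changes what operator>> treats as a separator; std::getline does not consult the ctype facet (it stops at its delimiter argument, '\n' by default), so with this locale you read lines using >>. Because >> skips leading whitespace, empty lines are skipped. The names newline_only_ctype and read_records are hypothetical.

```cpp
#include <istream>
#include <locale>
#include <sstream>  // std::istringstream works here as well as std::ifstream
#include <string>
#include <vector>

// A ctype facet whose table classifies only '\n' as whitespace, so
// operator>> into a std::string reads up to the next newline and keeps
// spaces and control characters as part of the record.
struct newline_only_ctype : std::ctype<char> {
    newline_only_ctype() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        static std::vector<mask> table(table_size, mask());
        table['\n'] = space;
        return table.data();
    }
};

// Imbues `in` with the facet (the locale takes ownership and deletes it)
// and reads newline-separated records with operator>>.
std::vector<std::string> read_records(std::istream& in) {
    in.imbue(std::locale(in.getloc(), new newline_only_ctype));
    std::vector<std::string> records;
    std::string record;
    while (in >> record) {
        records.push_back(record);
    }
    return records;
}
```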
I have been researching how to do this. Say I am running a program that produces a lot of output on the terminal: how would I clear the screen from within my program so that it can keep running?
I know I can just type clear in the terminal and it clears it fine, but for this program it would be more useful to do it from within the code.
I found something that works; however, I'm not sure what it is or what it does.
cout << "\033[2J\033[1;1H";
That works, but I have no clue what it is; if you could explain it, I would much appreciate it.
These are ANSI escape codes. The first one (\033[2J) clears the entire screen (J) from top to bottom (2). The second code (\033[1;1H) positions the cursor at row 1, column 1.
All ANSI escapes begin with the sequence ESC[, have zero or more parameters delimited by ;, and end with a command letter (J and H in your case). \033 is the C-style octal sequence for the escape character.
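To make that structure concrete, a small sketch that builds the two sequences from their parts: ESC, '[', the ';'-separated parameters, and the command letter. The helper names are hypothetical.

```cpp
#include <string>

// ESC [ 2 J : erase the entire screen. On VT100-style terminals the
// cursor stays where it is, hence the separate cursor-home sequence.
std::string clear_screen() {
    return "\033[2J";
}

// ESC [ row ; col H : move the cursor; rows and columns are 1-based.
std::string move_cursor(int row, int col) {
    return "\033[" + std::to_string(row) + ";" + std::to_string(col) + "H";
}
```

Printing clear_screen() followed by move_cursor(1, 1) sends exactly the same bytes as the "\033[2J\033[1;1H" string in the question.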
See here for the full roadshow.
Instead of depending on specific escape sequences that may break in unexpected situations (though accepting that trade-off is fine, if it's what you want), you can just do the same thing you'd do at your shell:
std::system("clear");
Though generally system() is to be avoided, for a user-interactive program neither the extra shell parsing nor process overhead is significant. There's no problem with shell escaping either, in this case.
You could always fork/exec to call clear if you did want to avoid system(). If you're already using [n]curses or another terminal library, use that.
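A minimal POSIX-only sketch of the fork/exec route (function name hypothetical): the child replaces itself with the program through execlp, which searches PATH, and the parent waits for it. Unlike std::system, no shell is involved.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `program` (looked up in PATH) with no arguments and waits for it.
// Returns its exit status, 127 if exec failed in the child, or -1 if
// fork/waitpid failed or the child did not exit normally.
int run_program(const char* program) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;  // fork failed
    }
    if (pid == 0) {
        execlp(program, program, static_cast<char*>(nullptr));
        _exit(127);  // only reached if exec failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Call it as run_program("clear"). Flush std::cout first, or output still sitting in its buffer will be written after the screen has been cleared.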
For portability, you should get the string from termcap's cl (clear) capability ("Clear screen and cursor home"). (Or use std::system("clear"), as suggested by Roger Pate.)
man 3 termcap (in ncurses)
man 5 termcap
set | grep TERMCAP
You can run clear > data in a terminal and read the escape sequences from the data file:
0x1B[H0x1B[2J0x1B[3J
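So (the trailing \033[3J also erases the scrollback buffer in terminals, such as xterm, that support it):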
std::cout << "\033[H\033[2J\033[3J" ;