Markov C++ read from file performance - c++

I have my 2nd assignment for C++ class, which involves Markov chains. The assignment is simple, but I'm not able to figure out the best implementation for reading characters from files.
I have a file of around 300 KB. One of the rules of the assignment is to use the map and vector classes: the map keys are strings only, and the values are vectors. While reading the file, I need to start collecting the key/value pairs.
Example:
File1.txt
1234567890
1234567890
If I select Markov k=3, I should have in my map:
key vector
123 -> 4
456 -> 7
789 -> 0
0\n1 -> 2
234 -> 5
567 -> 8
890 -> \n
\n -> NULL
The professor's suggestion is to read char by char, so my algorithm is the following:
while (readchar != EOF) {
    tempstring += readchar
    increment index
    if index == Markovlevel {
        get nextchar if != EOF
        insert nextchar value in vector
        insert tempstring into Map and assign vector
        unget char
    }
}
I omit some other details. My main question is that with 318,000 characters I will be doing the conditional check every single time, which slows my program down a lot (on a brand-new Mac Pro). A sample program from the professor processes this file in around 5 seconds.
I'm not able to figure out the best method for reading fixed-length words from a text file in C++.
Thanks!

Repeated file reads will slow the program down.
Read the file in blocks of, say, 1024 bytes into a buffer, then process that buffer as the assignment requires. Repeat for the next block until you are done with the file.
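
To make that concrete, here is a minimal sketch of the sliding-window idea (my own illustration, not the professor's sample program): read the file into one buffer - for a ~300 KB file a single block is plenty - then take every k-character substring as a key and push the character that follows it into that key's vector. The name build_markov_map is made up for illustration.

#include <fstream>
#include <iterator>
#include <map>
#include <string>
#include <vector>

std::map<std::string, std::vector<char>> build_markov_map(const std::string& path, std::size_t k)
{
    // Read the whole file into one string (one big "block").
    std::ifstream in(path, std::ios::binary);
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());

    std::map<std::string, std::vector<char>> chain;
    for (std::size_t i = 0; i + k < text.size(); ++i) {
        // key = the k characters starting at i, value = the character that follows them
        chain[text.substr(i, k)].push_back(text[i + k]);
    }
    return chain;
}

// Usage (hypothetical): auto chain = build_markov_map("File1.txt", 3);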

Have you actually timed the program? 318,000 conditionals should be a piece of cake for your brand-new Mac Pro; that should take only microseconds.
Premature optimization is the root of all evil. Make your program work first; optimization comes second.
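
For instance, a quick way to check is to time a comparable amount of work with <chrono>. This is a toy sketch, not the assignment code; the loop below merely stands in for 318,000 per-character checks.

#include <chrono>
#include <iostream>

int main()
{
    auto start = std::chrono::steady_clock::now();

    volatile long sum = 0;
    for (int i = 0; i < 318000; ++i)     // stand-in for the per-character conditional
        if (i % 3 == 0)
            sum += i;

    auto elapsed = std::chrono::steady_clock::now() - start;
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()
              << " microseconds\n";
}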

Related

Read bytes from the standard input when they're written [duplicate]

I'm trying to read from the standard input by using input_byte stdin
Something like this:
let rec loop () =
  Format.eprintf "in the loop@.";
  (match input_byte stdin with
   | 0x0D -> print_newline ()
   | b -> Format.eprintf "saw %i@." b);
  loop ()

let () =
  loop ()
If I type aaa I expect to see
in the loop
a
in the loop
saw 97
a
in the loop
saw 97
a
in the loop
saw 97
But instead nothing happens until I hit Enter, and this is what I get:
in the loop
aaa
saw 97
in the loop
saw 97
in the loop
saw 97
in the loop
saw 10
in the loop
Is there a way to read characters from a channel with a reader that doesn't wait for a flush?
Generally speaking, the terminal driver of your system doesn't send characters at each keystroke. As you're seeing, it waits to send a whole line. This is more efficient, and it also lets you correct errors before sending.
There are system-dependent ways to change this buffering behavior (and also the echoing behavior, which you might want to change as well). If you're on some kind of Unix system (essentially anything but Windows), you can use Unix.tcsetattr to change how the terminal driver handles characters.
Here is a previous question and answer that shows how to read one character at a time: How to read a character in OCaml without a return key?
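
For illustration, here is roughly what that terminal-driver change looks like at the POSIX level - a hedged C++ sketch using termios, which is the same tcsetattr call that OCaml's Unix.tcsetattr wraps (the linked question covers the OCaml side):

#include <termios.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    termios old_settings{};
    tcgetattr(STDIN_FILENO, &old_settings);          // remember the current settings

    termios raw = old_settings;
    raw.c_lflag &= ~(ICANON | ECHO);                 // no line buffering, no echo
    raw.c_cc[VMIN]  = 1;                             // read returns after a single byte
    raw.c_cc[VTIME] = 0;
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    int c;
    while ((c = std::getchar()) != EOF && c != 'q')  // 'q' quits in this toy example
        std::printf("saw %d\n", c);

    tcsetattr(STDIN_FILENO, TCSANOW, &old_settings); // restore the terminal
}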

C++ - Count integers from .txt file

I want to count the integers in a .txt file named "Example.in", which contains (for example):
1 2 3 4 5
3 6 7
5 8
8 9 10 11
1. It should return 11 in this case (integers that repeat count only once - I want the unique number count only). At this point it only prints out 0 (I think there's a problem with opening the file at this stage).
2. It should also print only the first integer in every row - 1, 3, 5, 8.
int integer_count() {
    int count = 0;
    int i;
    ifstream fin;
    fin.open("Example.in"); // .txt file
    while (fin >> i)
    {
        count++;
    }
    fin.close();
    return count; // This returns 14 instead of the 11 I want, because the duplicates
                  // (3, 5 and 8) are counted too - I haven't figured out how to skip
                  // duplicates and count only the unique integers.
}
When opening the file, you should always check whether the opening succeeded. I believe file_name.good() is a perfectly good boolean function for that. Just write a simple if statement to see whether it worked, like this:
if (fin.good()) std::cout << "File opened!";
else { /* do something when opening didn't work */ }
Additionally, I believe you may also need to write the 'full file name', which in this case (if your file name is exactly "Example.in") would be "Example.in.txt".
For the unique-integers problem there are a couple of solutions. One: keep a std::vector of the integers already read from the file, and every time you read another one, check whether it was already read (the fastest approach here would be quick/heap/merge sort plus binary search, instead of iterating every time). Whenever a new integer is added, increase your count by one.
The second solution: just store your integers in a vector and get rid of the duplicates by iterating and erasing. The count is then simply your_vector.size().
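
A minimal sketch of that second approach, assuming the same "Example.in" file, using the standard sort + unique + erase idiom rather than hand-written erasing:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    std::ifstream fin("Example.in");
    std::vector<int> values;
    for (int i; fin >> i; )
        values.push_back(i);                 // read every integer, duplicates included

    std::sort(values.begin(), values.end());
    // std::unique shifts duplicates to the end; erase them and count what is left
    values.erase(std::unique(values.begin(), values.end()), values.end());
    std::cout << "unique integers: " << values.size() << '\n';   // 11 for the sample file
}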
"Printing each integer that starts a new line" is a diffrent problem. I would switch from using while(fin >> i) to getline() function. That way, I have each line of file as a string, that can be turned into row of integers. With your while() loop, you do not know when the new line stars, so it's impossible to complete second task (or, after the task 1, you could open a file once again and use another algoritm only for getting the first integers of every line)
Last thing: as many people here have stated, using a debugger is highly recommended. It has saved me hours of pondering what is wrong, simply by stepping through the code line by line. Code::Blocks offers a really good debugger, so I encourage you to google a Code::Blocks debugger guide; even YouTube has some tutorials covering that.

How do I take in "standard input commands" and then know when to continue running the program?

I'm working on a traveling salesperson program in C++. I'm very new to C++, and it's so different from Java that the simple things really get confusing.
How do I use standard input to accept commands from a file (apparently I don't open the file myself; my professor just uses it to feed in commands) and then use those commands to run my program accordingly?
An example would be
city a
city b
city c
city d
a c 1300
a d 1400
a b 900
d c 1500
So basically an unknown amount of information is going to be passed into my program and then my program needs to declare a specific number of cities and then attach travel costs between each of them as specified. I think I can do the latter part, but my problem is knowing how to take an unknown number of inputs and then attach those inputs to variables.
I guess in Java I would do something like this:
while (nextLine != null) {
    if (nextLine.contains("city")) {
        String city = nextLine;
        // ...and so on
    }
}
Start by reading a filename and opening it with an ifstream; then you can read the input by character or by line. If you want to buffer the whole file with a char pointer first, you can determine its size with something like this:

std::ifstream::pos_type filesize(const char* filename) {
    std::ifstream in(filename, std::ifstream::ate | std::ifstream::binary);
    return in.tellg();  // the position at the end of the file is its size in bytes
}

Once you have the contents buffered, carry over what you know from Java and combine it. Besides, as Sam suggested, you should do some reading on the basics.
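
Since the commands arrive on standard input, here is a minimal sketch of reading them until end of input (my own illustration; the container choices and names are assumptions, not the assignment's required structure):

#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

int main()
{
    std::vector<std::string> cities;                           // declared cities
    std::map<std::pair<std::string, std::string>, int> cost;   // travel cost per city pair

    std::string line;
    while (std::getline(std::cin, line)) {                     // stops automatically at end of input
        std::istringstream row(line);
        std::string first;
        row >> first;
        if (first == "city") {
            std::string name;
            row >> name;
            cities.push_back(name);                            // e.g. "city a"
        } else if (!first.empty()) {
            std::string to;
            int c;
            row >> to >> c;
            cost[{first, to}] = c;                             // e.g. "a c 1300"
        }
    }
    std::cout << "read " << cities.size() << " cities\n";
}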

OCaml string length limitation when reading from stdin\file

As part of a Compiler Principles course I'm taking in my university, we're writing a compiler that's implemented in OCaml, which compiles Scheme code into CISC-like assembly (which is just C macros).
The basic operation of the compiler is as follows:
Read a *.scm file and convert it to an OCaml string.
Parse the string and perform various analyses.
Run a code generator on the AST output from the semantic analyzer, that outputs text into a *.c file.
Compile that file with GCC and run it in the terminal.
Well, all is well and good, except for this: I'm trying to read an input file that's around 4,000 lines long and is basically one huge expression that mixes Scheme if and and.
I'm executing the compiler via utop. When I try to read the input file, I immediately get a stack overflow error message. My initial guess is that the file is just too large for OCaml to handle, but I wasn't able to find any documentation that would support this theory.
Any suggestions?
The maximum string length is given by Sys.max_string_length. For a 32-bit system, it's quite short: 16777211. For a 64-bit system, it's 144115188075855863.
Unless you're using a 32-bit system, and your 4000-line file is over 16MB, I don't think you're hitting the string length limit.
A stack overflow is not what you'd expect to see when a string is too long.
It's more likely that you have infinite recursion, or possibly just a very deeply nested computation.
Well, it turns out that the limitation was the stack space the OCaml runtime is configured to use.
I ran the following command in the terminal in order to increase the limit:
export OCAMLRUNPARAM="l=5555555555"
This worked like a charm - I managed to read and compile the input file almost instantaneously.
For reference purposes, this is the code that reads the file:
let file_to_string input_file =
  let in_channel = open_in input_file in
  let rec run () =
    try
      let ch = input_char in_channel in ch :: (run ())
    with End_of_file ->
      ( close_in in_channel;
        [] )
  in list_to_string (run ());;

where list_to_string is:

let list_to_string s =
  let rec loop s n =
    match s with
    | [] -> String.make n '?'
    | car :: cdr ->
      let result = loop cdr (n + 1) in
      String.set result n car;
      result
  in
  loop s 0;;
The funny thing is, I also wrote file_to_string with tail recursion. That prevented the stack overflow, but for some reason it went into an infinite loop. Oh, well...

Why does my program stop when inserting values into a map?

I'm trying to read a file into a map, but the program stops in the middle of the file.
The file consists of millions of lines; each line is a string composed of digits followed by an int,
e.g. 1230981237120313 123.
#include <map>
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    ifstream mapfile("filename.txt", ifstream::in);
    int itemp;
    string stemp;
    map<string, int> mapping;
    while (mapfile >> stemp >> itemp)
    {
        mapping[stemp] = itemp;
    }
}
It handles small files with hundreds of lines just fine, but when it gets past about 90 million lines it stops without reporting any error - it just exits with "Press any key to continue...".
I've done some analysis, and I'm sure the program stops after reading a line from the file, at the point where it needs to do mapping[stemp] = itemp. Every time it stops it happens at a different line, but always around the 90 millionth.
Could anyone tell me why this could happen?
Any help will be highly appreciated.
It is always advisable not to read the entire file into memory at once, as file sizes can vary from a few KB to many MB.
It's better to read in fixed-size chunks of, say, a few thousand bytes (say 4092): each time you read a chunk from the file, do your processing on it, and close the file when you are done.
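
A minimal sketch of that chunked approach; the buffer size and process_chunk() are placeholders of my own, not part of the original program:

#include <fstream>
#include <vector>

// Placeholder: parse the n bytes in data and insert them into your map here.
void process_chunk(const char* data, std::streamsize n)
{
    (void)data;
    (void)n;
}

int main()
{
    std::ifstream in("filename.txt", std::ios::binary);
    std::vector<char> buffer(4096);

    // read() fails on the final short chunk, but gcount() still reports the bytes it got
    while (in.read(buffer.data(), buffer.size()) || in.gcount() > 0) {
        process_chunk(buffer.data(), in.gcount());
    }
}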