Using getline on html file - c++

I have an assignment to search for certain information in an HTML file and write the result to a text file. I wanted to do it using getline, but somehow it's not working. I have no problems using getline on a text file, so I assumed that you cannot use getline on an HTML file. Is that assumption right? How can I convert such a file into a text file? Or maybe there is a better/easier solution? Thanks.
Here is the code (the variable names are not in English; I hope that's not a problem):
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
string nazwa_wejsciowego;
string roboczy;
ifstream html;
ifstream txt("wynik.txt");
int main()
{
cout<<"Enter the name of the HTML file from which the data should be read."<<endl;
cin>>nazwa_wejsciowego;
ifstream html(nazwa_wejsciowego, ios::app); //opening the file
if(!html){
cout<<"Opening file "<<nazwa_wejsciowego<<" failed."<<endl;
system("pause");}
//checking if it opened properly
getline(html, roboczy);
cout<<roboczy<<endl;
return 0;}

No, the assumption is not right. HTML is text; the fact that it is in a structured format that a computer can parse to render a webpage is irrelevant to reading individual characters from the file.
getline may be a suitable approach. However, as Steve points out in the comments, some HTML pages are "minified" (they have unnecessary whitespace removed to save space and make the code harder to copy), and in such a case you may end up with just one really big line. It may therefore be more convenient to read in chunks of bytes.
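As a rough illustration of reading the file without depending on line breaks, here is a minimal sketch that pulls the whole file into one string and searches it. It assumes the file fits comfortably in memory; the function names read_whole_file and extract_title, and the choice of <title> as the thing to look for, are made up for this example and are not from the question.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Reads the whole file into one string.
// Returns an empty string if the file cannot be opened.
std::string read_whole_file(const std::string& path) {
    // Plain input mode is enough for reading; ios::app is an output flag.
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy every byte, line breaks or not
    return buffer.str();
}

// Returns the text between the first <title> and </title>,
// or an empty string if either tag is missing.
std::string extract_title(const std::string& html) {
    const std::string open = "<title>";
    const std::string close = "</title>";
    std::string::size_type begin = html.find(open);
    if (begin == std::string::npos) return {};
    begin += open.size();
    const std::string::size_type end = html.find(close, begin);
    if (end == std::string::npos) return {};
    return html.substr(begin, end - begin);
}
```

Because the search runs over the whole buffer, it behaves the same whether the page is minified into one line or spread over many. A plain substring search like this is case-sensitive and is not a real HTML parser.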

Related

How can I edit a txt file without deleting the content in C++?

I want to create a database in a txt file, then access it and edit certain parts of it using seekp(), but when I open the file to write to it, the program creates a new file, deleting the previous one.
#include <fstream>
#include <iostream>
using namespace std;
int main() {
ofstream g;
g.open("text.txt",ios::out);
if(!g.is_open())
cout<<"error";
else {
g.seekp(2);
g.write("apple",5);
}
g.close();
return 0;
}
You'll need a different open mode.
The documentation is quite obscure when it comes to the behavior of ofstream (for all practical purposes, the behavior you observe is by design: it will truncate).
Use fstream with ios_base::in | ios_base::out | ios_base::binary instead.
Unless you're using a fixed-width encoding (one where every character always takes the same number of bytes, whether that is one, two, or four), you won't be able to do this consistently in text mode. Also, writing at a seek position before end-of-file does not shift the content that follows it; that content is simply overwritten. So in order to achieve database-like behavior, you're at least going to need some kind of fixed-size records or an indexing data structure.
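A minimal sketch of the suggested open mode follows. The helper name overwrite_at is hypothetical; it only shows that in | out | binary keeps the existing bytes and that a write at a given offset replaces bytes rather than inserting them.

```cpp
#include <fstream>
#include <string>

// Overwrites bytes of an existing file starting at `offset`, without truncating it.
// Returns false if the file cannot be opened or the write fails.
bool overwrite_at(const std::string& path, std::streamoff offset, const std::string& data) {
    // in | out opens the file for update and keeps its content.
    // It does not create the file if it is missing.
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!file) return false;
    file.seekp(offset, std::ios::beg);
    file.write(data.data(), static_cast<std::streamsize>(data.size()));
    file.flush();
    return static_cast<bool>(file);
}
```

With fixed-size records, the offset of record i would simply be i times the record size, which is what makes in-place edits like this workable.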

How to append to the last line of a file in c++?

Using g++, I want to append some data to the last line of a file (without creating a new line). A good idea would probably be to move the cursor back to skip the '\n' character in the existing file. However, this code does not work:
#include <iostream>
#include <fstream>
using namespace std;
int main() {
ofstream myfile;
myfile.open ("file.dat", fstream::app|fstream::out);
myfile.seekp(-1,myfile.ios::end); //I believe, I am just before the last '\n' now
cout << myfile.tellp() << endl; //indicates the position set above correctly
myfile << "just added"; //places the text IN A NEW LINE :(
//myfile.write("just added",10); //also, does not work correctly
myfile.close();
return 0;
}
Please give me an idea of how to correct the code. Thank you in advance. Marek.
When you open with app, writing always writes at the end, regardless of what tellp tells you.
("app" is for "append", which does not mean "write in an arbitrary location".)
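You want ate (one of the more inscrutable names in C++), which seeks to the end only immediately after opening. Note that out on its own truncates the file, so ate has to be combined with in | out to keep the existing content.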
You also want to add that final newline, if you want to keep it.
And you probably also want to check that the last character is a newline before overwriting it.
Also, seeking by characters can do strange things in text mode, and if you open in binary mode you need to worry about the platform's newline convention.
Manipulating text is much harder than you think.
(And by the way, you don't need to specify out on an ofstream - the "o" in "ofstream" takes care of that.)
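Putting these points together, one possible sketch looks like this. It assumes the file uses a bare '\n' as its line ending (a "\r\n" file would need extra handling), and the function name append_to_last_line is invented for the example.

```cpp
#include <fstream>
#include <string>

// Appends `text` to the last line of a file. If the file ends with '\n',
// that newline is overwritten and written again after the new text.
bool append_to_last_line(const std::string& path, const std::string& text) {
    // in | out keeps the existing content; ate starts at the end
    // but, unlike app, still allows seeking before a write.
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::ate | std::ios::binary);
    if (!file) return false;

    const std::streamoff size = file.tellg();
    bool had_newline = false;
    if (size > 0) {
        file.seekg(-1, std::ios::end);
        had_newline = (file.get() == '\n');  // check before overwriting
    }

    // A seek is needed when switching from reading to writing.
    file.seekp(had_newline ? -1 : 0, std::ios::end);
    file << text;
    if (had_newline) file << '\n';  // put the final newline back
    return static_cast<bool>(file);
}
```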

Parsing a textfile (with HTML in it) with C++

I've been able to get some raw data in the form of an HTML webpage, which I have in turn put into an ordinary text file. I'm currently trying to use a C++ program to parse this file, but for some reason it's giving me weird output: it's putting #s, symbols, and ^Ms in between every single letter. I'm unsure whether this is because I'm trying to parse an HTML file or because my code is wrong, but I've tried my code on smaller HTML files and it works fine. The file I want it to work on is just 145 kB.
Here is my code:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main(int argc, char** argv)
{
ifstream inFile;
inFile.open(argv[1]);
string str;
while(getline(inFile, str))
{
cout << str << endl;
}
}
If anyone could give me a clue as to why this isn't working, I'd be very grateful.
HTML files may come in virtually any encoding. OP needs to open the file according to the encoding that it has, which is typically supplied to the web browser he got it from as part of the page serve. Note that individual pages served up by the same site may have different encodings. The "#" are probably actually printed as "^#", which is what many output routines will print if you give them null characters. He may have a UTF-16 file and be reading it as if it were 8-bit ASCII.
He also needs to understand that "newline" conventions vary between machines; his "^M" probably means he is running on a Unix machine (which thinks "^J" is a line break) and got his file from a Windows box (which thinks "^M^J" is a line break). Welcome to the real world.
Next, OP will find that parsing HTML is actually hard: it is complex, has lots of crazy character conventions (above and beyond encoding), and is often simply invalid, because browsers tolerate malformed markup and not everyone checks that their HTML is clean.
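To check both suspicions on a given file, a small sketch like the following can help. It guesses the encoding only from a leading byte-order mark (BOM), which many files do not have, so "unknown" says nothing definite. The helper names detect_bom and strip_trailing_cr are made up for this example.

```cpp
#include <string>

// Guesses the encoding from a leading byte-order mark, if there is one.
// A UTF-32LE BOM also starts with FF FE and is not distinguished here.
std::string detect_bom(const std::string& bytes) {
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) return "UTF-8";
    if (bytes.compare(0, 2, "\xFF\xFE") == 0) return "UTF-16LE";
    if (bytes.compare(0, 2, "\xFE\xFF") == 0) return "UTF-16BE";
    return "unknown";
}

// Removes the trailing '\r' (shown as ^M) that getline leaves behind
// when a file with Windows line endings is read on a Unix system.
void strip_trailing_cr(std::string& line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
}
```

Passing the first few bytes of the file (read in binary mode) to detect_bom is enough; if it reports UTF-16, reading the file byte by byte as if it were ASCII would explain the extra characters between letters.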
See whether this works for you.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main(int argc, char** argv)
{
wifstream inFile;
inFile.open(argv[1]);
wstring str;
while(getline(inFile, str))
{
wcout << str << endl;
}
}

Appending string in the middle of a text file

Here is my code:
#include <string>
#include <iostream>
#include <fstream>
using namespace std;
int main(void){
fstream myfile2;
myfile2.open("test2.txt", ios::app);
string checkline;
getline(myfile2, checkline);
int razmer=checkline.length();
string balli="256";
myfile2.seekp(razmer);
myfile2<<balli;
}
test2.txt consists of two lines, so it looks like this:
Ivanov
Petrov
I want to turn Ivanov into Ivanov 256 without touching the second line, but my code does not work at all. Thanks in advance.
There's no easy way to edit a text file. The usual solution is to read the whole source file into memory, make your modifications in memory, and then write out all of the file.
In your example where the file seems to be line-based, you could read it line by line and put the lines in a std::vector. Edit the line you want to edit, then loop over the vector and write out the lines.
Note: When writing the file, you open it in write mode, so the file is recreated and loses all its old contents.
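The read, edit, and write-back approach could be sketched roughly as follows. The function name append_to_line is hypothetical, and the sketch assumes the file is small enough to hold in memory.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Appends `suffix` to line number `index` (0-based) by rewriting the whole file.
// Returns false if the file cannot be opened or has too few lines.
bool append_to_line(const std::string& path, std::size_t index, const std::string& suffix) {
    std::ifstream in(path);
    if (!in) return false;

    // Read every line into memory.
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line); ) lines.push_back(line);
    in.close();
    if (index >= lines.size()) return false;

    lines[index] += suffix;  // edit in memory

    // Opening for writing recreates the file, so all lines are written again.
    std::ofstream out(path);
    if (!out) return false;
    for (const std::string& line : lines) out << line << '\n';
    return static_cast<bool>(out);
}
```

For the file in the question, append_to_line("test2.txt", 0, " 256") would change the first line and leave Petrov as it was.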

C++ filestream problem

I'm making a simple game in C++ and I want the highest score at the end of the game to be written to a text file. I'm using fstream to first read the last saved high score and compare it to the new high score. The output in the text file looks like this (0НН) and it shouldn't. I'm really frustrated with this.
Here's a part of my code.
double score_num=0;
fstream datafile("score.pon"); //Declaration of variables
...
if(SPEED>score_num)
{
score_num=SPEED;
}
//getting the score
...
datafile<<score_num; //Writing it to the file
#include <iostream>
#include <fstream>
using namespace std;
#define SPEED 12
int main()
{
double score_num=0;
ofstream datafile("score.pon"); //Declaration of variables
if(SPEED>score_num)
{
score_num=SPEED;
}
//getting the score
datafile<<score_num; //Writing it to the file
return 0;
}
Replacing fstream with ofstream works like a charm. Perhaps you should show more code? Also, closing the file is a good habit:
datafile.flush();
datafile.close();
I'll leave error handling to you.
Hacky solution - open the file as an ifstream, read existing value, close it, adjust score, open file as an ofstream, write score, close it. Alternatively, investigate the use of the seekp() function, and write the score as a binary value, not as text.
My best guess as to why the original was failing is that when a read reaches the end of the file, the EOF bit is set. In this state, all read and write operations fail. You can write to a file stream that has reached its end by calling clear first.
// the following doesn't truncate file, or handle other error conditions.
if (datafile.eof()) {
datafile.clear();
}
datafile.seekp(0, std::ios_base::beg);
datafile << score_num;
However, this won't solve all your problems. If you write less to the file than its current length (e.g. the old high score was "1.5" and the new high score is "2"), part of the old data will still be present at the end of the file. As long as scores never have a fractional part (in which case you should probably be using an integer type, such as unsigned long), you won't notice the bug, since a < b ⇒ len(a) ≤ len(b). To handle this properly, you'll need to use one of unapersson's recommended approaches (which will either truncate the file or always write the same amount of data to it), or use a different I/O library (such as your platform's C library or Boost) that provides a way to truncate files (such as the POSIX ftruncate).
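The first of those approaches (read with an ifstream, close it, then rewrite with a truncating ofstream) might look like the sketch below. The function names are invented for the example, and a missing or unreadable file is treated as a score of 0, which is an assumption rather than something stated in the question.

```cpp
#include <fstream>
#include <string>

// Reads the stored high score; returns 0 if the file is missing or unreadable.
double load_high_score(const std::string& path) {
    std::ifstream in(path);
    double score = 0;
    if (!(in >> score)) return 0;
    return score;
}  // the input stream is closed here

// Saves `candidate` if it beats the stored score.
// Returns the higher of the stored score and `candidate`.
double update_high_score(const std::string& path, double candidate) {
    const double best = load_high_score(path);
    if (candidate <= best) return best;
    // trunc discards the old content, so no stale digits are left behind.
    std::ofstream out(path, std::ios::trunc);
    out << candidate;
    return candidate;
}
```

Because reading and writing use separate streams, the EOF state left by the read never affects the write.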