Read file and extract certain part only - c++

ifstream toOpen;
openFile.open("sample.html", ios::in);
if(toOpen.is_open()){
while(!toOpen.eof()){
getline(toOpen,line);
if(line.find("href=") && !line.find(".pdf")){
start_pos = line.find("href");
tempString = line.substr(start_pos+1); // i dont want the quote
stop_pos = tempString .find("\"");
string testResult = tempString .substr(start_pos, stop_pos);
cout << testResult << endl;
}
}
toOpen.close();
}
What I am trying to do, is to extrat the "href" value. But I cant get it works.
EDIT:
Thanks to Tony hint, I use this:
if(line.find("href=") != std::string::npos ){
// Process
}
it works!!

I'd advise against trying to parse HTML like this. Unless you know a lot about the source and are quite certain about how it'll be formatted, chances are that anything you do will have problems. HTML is an ugly language with an (almost) self-contradictory specification that (for example) says particular things are not allowed -- but then goes on to tell you how you're required to interpret them anyway.
Worse, almost any character can (at least potentially) be encoded in any of at least three or four different ways, so unless you scan for (and carry out) the right conversions (in the right order) first, you can end up missing legitimate links and/or including "phantom" links.
You might want to look at the answers to this previous question for suggestions about an HTML parser to use.

As a start, you might want to take some shortcuts in the way you write the loop over lines in order to make it clearer. Here is the conventional "read line at a time" loop using C++ iostreams:
#include <fstream>
#include <iostream>
#include <string>
int main ( int, char ** )
{
std::ifstream file("sample.html");
if ( !file.is_open() ) {
std::cerr << "Failed to open file." << std::endl;
return (EXIT_FAILURE);
}
for ( std::string line; (std::getline(file,line)); )
{
// process line.
}
}
As for the inner part the processes the line, there are several problems.
It doesn't compile. I suppose this is what you meant with "I cant get it works". When asking a question, this is the kind of information you might want to provide in order to get good help.
There is confusion between variable names temp and tempString etc.
string::find() returns a large positive integer to indicate invalid positions (the size_type is unsigned), so you will always enter the loop unless a match is found starting at character position 0, in which case you probably do want to enter the loop.
Here is a simple test content for sample.html.
<html>
<a href="foo.pdf"/>
</html>
Sticking the following inside the loop:
if ((line.find("href=") != std::string::npos) &&
(line.find(".pdf" ) != std::string::npos))
{
const std::size_t start_pos = line.find("href");
std::string temp = line.substr(start_pos+6);
const std::size_t stop_pos = temp.find("\"");
std::string result = temp.substr(0, stop_pos);
std::cout << "'" << result << "'" << std::endl;
}
I actually get the output
'foo.pdf'
However, as Jerry pointed out, you might not want to use this in a production environment. If this is a simple homework or exercise on how to use the <string>, <iostream> and <fstream> libraries, then go ahead with such a procedure.

Related

How do I find character in string and after that I extract the rest of string in C++

So my problem is next:
I have youtube link entered as input, and I should print youtubeID as output.
So, perhaps I have this link: https://www.youtube.com/watch?v=szyKv63JB3s , I should print this "szyKv63JB3s", so I came to conclusion that I need to find that " = " in user inputed string and I should look to make new string called "ytID" or "result" and store youtubeID from the moment I find that " = " characther.
So I really don't have idea how should it be done...
I mean I should go through string with for loop I guess, and after that I should store the part which will be called youtubeID, and I have problem there because I don't know how long is that link, I mean how much characthers it have, so I cannot really use for or while loops...
P.S This is not HOMEWORK, I just want to practice. :)
you should give a look at this post :
Removing everything after character (and also character)
A solution could be as simple as :
theString.substr( theString.find('=') ) ;
Just search for the character = as you mentioned in the question:
#include <iostream>
#include <string_view>
std::string_view extractYoutubeId(std::string_view const link) {
auto const off = link.find('=');
if (off == std::string_view::npos)
return std::string_view{};
return link.substr(off + 1);
}
int main() {
auto const& link = "https://www.youtube.com/watch?v=dQw4w9WgXcQ";
std::cout << extractYoutubeId(link);
}
Output: dQw4w9WgXcQ

Read from file up to another point C++

So I have a list
{"ID":"55e5f0c8ace3e","nombre":"Jacqueline ","apellido":"Charlet ","sobrenombre":"","edad":"30","caracteristicas":"","comentario":"",
I need to change that list and put it in a correct way like:
ID: 55e5f0c8ace3e
Nombre: Jacqueline
Apellido: ...
etc..
Tried this:
#include <iostream>
#include <fstream>
using namespace std;
ifstream datos("datos.txt");
ofstream final("final.txt");
int main(){
char valor;
if(!datos)
{
cout << "error";
}
else
{
while(!datos.eof())
{
datos.get(valor);
if(valor == 'I' && datos.peek() == 'D')
{
cout << "I can read" << endl;
}
}
}
}
I´m trying to do this with C++, which is the correct way to do it? I´ve tried some ways, but i dont know how to read from one point to another, by this i mean read from the double comas the ID and finish on the other double comas.
Thanks in advance
Introduce a integer state variable that describes what part of file you're in. Then interpret every character in accordance with that. For example, initially it's 0. Once you encounter a ", it becomes 1 - that means you're reading a name. If state is 1, and a character is not a "", you accumulate characters in some string variable. Once you encounter a quote character and state is 1, that means the name is over. And so on.
If this is not homework, Google for a C++ JSON parser. This is a stock problem, solved a thousand times already.
Also, please don't edit the answer - place a comment instead.

Simple Sentence Reverser in C++

I'm trying to build a program to solve a problem in a text book I bought recently and it's just driving me crazy.
I have to built a sentence reverser so I get the following:
Input = "Do or do not, there is no try."
Output = "try. no is there not, do or Do"
Here's what I've got so far:
void ReverseString::reversalOperation(char str[]) {
char* buffer;
int stringReadPos, wordReadPos, writePos = 0;
// Position of the last character is length -1
stringReadPos = strlen(str) - 1;
buffer = new char[stringReadPos+1];
while (stringReadPos >= 0) {
if (str[stringReadPos] == ' ') {
wordReadPos = stringReadPos + 1;
buffer[writePos++] = str[stringReadPos--];
while (str[wordReadPos] != ' ') {
buffer[writePos] = str[wordReadPos];
writePos++;
wordReadPos++;
}
} else {
stringReadPos--;
}
}
cout << str << endl;
cout << buffer << endl;
}
I was sure I was on the right track but all I get for an output is the very first word ("try.") I've been staring at this code so long I can't make any headway. Initially I was checking in the inner while look for a '/0' character as well but it didn't seem to like that so I took it out.
Unless you're feeling masochistic, throw your existing code away, and start with std::vector and std::string (preferably an std::vector<std::string>). Add in std::copy with the vector's rbegin and rend, and you're pretty much done.
This is utter easy in C++, with help from the standard library:
std::vector< std::string > sentence;
std::istringstream input( str );
// copy each word from input to sentence
std::copy(
(std::istream_iterator< std::string >( input )), std::istream_iterator< std::string >()
, std::back_inserter( sentence )
);
// print to cout sentence in reverse order, separated by space
std::copy(
sentence.rbegin(), sentence.rend()
, (std::ostream_iterator< std::string >( std::cout, " " ))
);
In the interest of science, I tried to make your code work as is. Yeah, it's not really the C++ way to do things, but instructive nonetheless.
Of course this is only one of a million ways to get the job done. I'll leave it as an exercise for you to remove the trailing space this code leaves in the output ;)
I commented my changes with "EDIT".
char* buffer;
int stringReadPos, wordReadPos, writePos = 0;
// Position of the last character is length -1
stringReadPos = strlen(str) - 1;
buffer = new char[stringReadPos+1];
while (stringReadPos >= 0) {
if ((str[stringReadPos] == ' ')
|| (stringReadPos == 0)) // EDIT: Need to check for hitting the beginning of the string
{
wordReadPos = stringReadPos + (stringReadPos ? 1 : 0); // EDIT: In the case we hit the beginning of the string, don't skip past the space
//buffer[writePos++] = str[stringReadPos--]; // EDIT: This is just to grab the space - don't need it here
while ((str[wordReadPos] != ' ')
&& (str[wordReadPos] != '\0')) // EDIT: Need to check for hitting the end of the string
{
buffer[writePos] = str[wordReadPos];
writePos++;
wordReadPos++;
}
buffer[writePos++] = ' '; // EDIT: Add a space after words
}
stringReadPos--; // EDIT: Decrement the read pos every time
}
buffer[writePos] = '\0'; // EDIT: nul-terminate the string
cout << str << endl;
cout << buffer << endl;
I see the following errors in your code:
the last char of buffer is not set to 0 (this will cause a failure in cout<
in the inner loop you have to check for str[wordReadPos] != ' ' && str[wordReadPos] != 0 otherwise while scanning the first word it will never find the terminating space
Since you are using a char array, you can use C string library. It will be much easier if you use strtok: http://www.cplusplus.com/reference/clibrary/cstring/strtok/
It will require pointer use, but it will make your life much easier. Your delimiter will be " ".
What where the problems with your code and what are more cplusplusish ways of doing is yet well written. I would, however, like to add that the methodology
write a function/program to implement algorithm;
see if it works;
if it doesn't, look at code until you get where the problem is
is not too productive. What can help you resolve this problem here and many other problems in the future is the debugger (and poor man's debugger printf). It will make you able to see how your program actually works in steps, what happens to the data etc. In other words, you will be able to see which parts of it works as you expect and which behaves differently. If you're on *nix, don't hesitate to try gdb.
Here is a more C++ version. Though I think the simplicity is more important than style in this instance. The basic algorithm is simple enough, reverse the words then reverse the whole string.
You could write C code that was just as evident as the C++ version. I don't think it's necessarily wrong to write code that isn't ostentatiously C++ here.
void word_reverse(std::string &val) {
size_t b = 0;
for (size_t i = 0; i < val.size(); i++) {
if (val[i] == ' ') {
std::reverse(&val[b], &val[b]+(i - b));
b = ++i;
}
}
std::reverse(&val[b], &val[b]+(val.size() - b));
std::reverse(&val[0], &val[0]+val.size());
}
TEST(basic) {
std::string o = "Do or do not, there is no try.";
std::string e = "try. no is there not, do or Do";
std::string a = o;
word_reverse(a);
CHECK_EQUAL( e , a );
}
Having a multiple, leading, or trailing spaces may be degenerate cases depending on how you actually want them to behave.

How to read a file and get words in C++

I am curious as to how I would go about reading the input from a text file with no set structure (Such as notes or a small report) word by word.
The text for example might be structured like this:
"06/05/1992
Today is a good day;
The worm has turned and the battle was won."
I was thinking maybe getting the line using getline, and then seeing if I can split it into words via whitespace from there. Then I thought using strtok might work! However I don't think that will work with the punctuation.
Another method I was thinking of was getting everything char by char and omitting the characters that were undesired. Yet that one seems unlikely.
So to sort the thing short:
Is there an easy way to read an input from a file and split it into words?
Since it's easier to write than to find the duplicate question,
#include <iterator>
std::istream_iterator<std::string> word_iter( my_file_stream ), word_iter_end;
size_t wordcnt;
for ( ; word_iter != word_iter_end; ++ word_iter ) {
std::cout << "word " << wordcnt << ": " << * word_iter << '\n';
}
The std::string argument to istream_iterator tells it to return a string when you do *word_iter. Every time the iterator is incremented, it grabs another word from its stream.
If you have multiple iterators on the same stream at the same time, you can choose between data types to extract. However, in that case it may be easier just to use >> directly. The advantage of an iterator is that it can plug into the generic functions in <algorithm>.
Yes. You're looking for std::istream::operator>> :) Note that it will remove consecutive whitespace but I doubt that's a problem here.
i.e.
std::ifstream file("filename");
std::vector<std::string> words;
std::string currentWord;
while(file >> currentWord)
words.push_back(currentWord);
You can use getline with a space character, getline(buffer,1000,' ');
Or perhaps you can use this function to split a string into several parts, with a certain delimiter:
string StrPart(string s, char sep, int i) {
string out="";
int n=0, c=0;
for (c=0;c<(int)s.length();c++) {
if (s[c]==sep) {
n+=1;
} else {
if (n==i) out+=s[c];
}
}
return out;
}
Notes: This function assumes that it you have declared using namespace std;.
s is the string to be split.
sep is the delimiter
i is the part to get (0 based).
You can use the scanner technique to grabb words, numbers dates etc... very simple and flexible. The scanner normally returns token (word, number, real, keywords etc..) to a Parser.
If you later intend to interpret the words, I would recommend this approach.
I can warmly recommend the book "Writing Compilers and Interpreters" by Ronald Mak (Wiley Computer Publishing)

tokenizing and converting to pig latin

This looks like homework stuff but please be assured that it isn't homework. Just an exercise in the book we use in our c++ course, I'm trying to read ahead on pointers..
The exercise in the book tells me to split a sentence into tokens and then convert each of them into pig latin then display them..
pig latin here is basically like this: ball becomes allboy in piglatin.. boy becomes oybay.. take the first letter out, put it at the end then add "ay"..
so far this is what i have:
#include <iostream>
using std::cout;
using std::cin;
using std::endl;
#include <cstring>
using std::strtok;
using std::strcat;
using std::strcpy;
void printPigLatin( char * );
int main()
{
char sentence[500];
char *token;
cout << "Enter string to tokenize and convert: ";
cin.getline( sentence, 500 );
token = strtok( sentence, " " );
cout << "\nPig latin for each token will be: " << endl;
while( token != NULL )
{
printPigLatin( token );
token = strtok( NULL, " " );
}
return 0;
}
void printPigLatin( char *word )
{
char temp[50];
for( int i = 0; *word != '\0'; i++ )
{
temp[i] = word[i + 1];
}
strcat( temp, "ay" );
cout << temp << endl;
}
I understand the tokenizing part quite clearly but I'm not sure how to do the pig latin.. i tried to start by simply adding "ay" to the token and see what the results will be .. not sure why the program goes into an infinite loop and keeps on displaying "ayay" .. any tips?
EDIT: this one works fine now but im not sure how to add the first letter of the token before adding the "ay"
EDIT: this is how i "see" it done but not sure how to correctly implement it ..
You're running over your input string with strcat. You need to either create a new string for each token, copying the token and "ay", or simply print the token and then "ay". However, if you're using C++ why not use istream iterators and STL algorithms?
To be honest, I severly doubt the quality of the C++ book, judging from your example. The “basic stuff” in C++ isn't the C pointer style programming. Rather, it's applying high-level library functionality. As “On Freund” pointed out, the C++ standard library provides excellent features to tackle your task. You might want to search for recommendations of better C++ books.
Concerning the problem: your printPigLatin could use the existing function strcpy (or better: strncpy which is safer in regards to buffer overflows). Your manual copy omits the first character from the input because you're using the i + 1st position. You also have a broken loop condition which always tests the same (first) character. Additionally, this should result in an overflow anyway.
As the people before me pointed out, there are several other methods of achieving what you want to do.
However, the actual problem with your code seems to be the use of strcat, I see that you changed it a bit in the edit. Here is an explanation of why the initial one did not work char* and size issues
Basically, the pointer does not allocate enough memory to add the "ay" to the string provided. If you create a pointer using the technique shown in the link, it should work fine.
I got your program to work, taking the strcat out and using
cout << word << "ay" << endl
Your loop is infinite because of *word != '\0'.
The word pointer is not changed at any time in the loop.
This seemed to have worked:
void printPigLatin( char *word )
{
cout << word + 1 << word[0] << "ay" << endl;
}
Just not sure if it's a good idea to do that.