C++: how do I get a particular string in HTML code?

I am trying to parse this XML Yahoo feed.
How do I get each record into an array in C++?
For example, I would create a structure,
then read the values
and record each element inside the structure.
In the first place, how do I get the value out?
Thanks

You may want to see if the given page offers output in a JSON format. Then you can simply request the value instead of messing around with HTML. The Yahoo! Finance site may even offer an API that you can use to easily request the value.

If you want to mess with html code:
#include <fstream>
#include <iostream>
#include <string>
int main() {
    std::ifstream ifile("in.html");
    std::string line;
    // Opening tag that precedes the value we want.
    const std::string ndl("<span id=\"yfs_l10_sgdmyr=x\">");
    // Test the result of getline itself so a failed read is never processed.
    while (std::getline(ifile, line)) {
        std::size_t spos, epos;
        if ((spos = line.find(ndl)) != std::string::npos) {
            spos += ndl.size();  // skip past the opening tag
            if ((epos = line.find("</span>", spos)) != std::string::npos) {
                std::cout << line.substr(spos, epos - spos) << std::endl;
            }
        }
    }
    return 0;
}
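To get the extracted values into a structure, as the question asks, the same find/substr idea can be wrapped in a function that collects every match. This is only a minimal sketch, assuming the markup is as regular as in the snippet above. `Quote`, `extractAll`, and the field names are hypothetical names chosen for illustration; they are not part of any Yahoo API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type; the field names are assumptions for illustration.
struct Quote {
    std::string id;     // value of the span's id attribute
    std::string value;  // text between the opening tag and </span>
};

// Collect the id and text of every <span id="...">text</span> found in `html`.
// Plain string search only: nested spans and extra attributes are not handled.
std::vector<Quote> extractAll(const std::string& html) {
    const std::string open = "<span id=\"";
    const std::string close = "</span>";
    std::vector<Quote> quotes;
    std::size_t pos = 0;
    while ((pos = html.find(open, pos)) != std::string::npos) {
        const std::size_t idStart = pos + open.size();
        const std::size_t idEnd = html.find("\">", idStart);
        if (idEnd == std::string::npos) break;  // malformed tag, stop
        const std::size_t textStart = idEnd + 2;
        const std::size_t textEnd = html.find(close, textStart);
        if (textEnd == std::string::npos) break;  // no closing tag, stop
        quotes.push_back({html.substr(idStart, idEnd - idStart),
                          html.substr(textStart, textEnd - textStart)});
        pos = textEnd + close.size();
    }
    return quotes;
}
```

Calling `extractAll` on the whole file contents (or on each line) yields a `std::vector<Quote>` that can then be searched by `id`.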

Related

boost property tree cannot read multiple json data in one file

I need help solving a problem. I am using Boost property tree to parse Twitter messages that are stored in a JSON file. All messages are saved in one JSON file and I need to parse them one by one.
Here is the Twitter JSON data saved in the file. It has 3 different messages. (Below is a reduced version of the messages, for testing only.)
{"id":593393012970926082,"in_reply_to_status_id":1,"user":{"id":2292380240,"followers_count":2},"retweet_count":0}
{"id":654878454684687878,"in_reply_to_status_id":7,"user":{"id":2292380241,"followers_count":4},"retweet_count":5}
{"id":123487894154878414,"in_reply_to_status_id":343,"user":{"id":2292380242,"followers_count":773},"retweet_count":654}
And here is my C++ code for parsing the message, using property tree.
#include <boost/property_tree/json_parser.hpp>
#include <boost/property_tree/ptree.hpp>
#include <iostream>
#include <string>
using namespace std;
using namespace boost::property_tree;
string jsonfile = "./twitter.json";
int main()
{
    ptree pt;
    read_json( jsonfile, pt );
    cout<<"in_reply_to_status_id: "<<pt.get("in_reply_to_status_id",0)<<"\n";
}
I want to get all in_reply_to_status_id values from the file. Currently it prints only the value from the first line. The output is as follows.
in_reply_to_status_id: 1
I would like to get all values like below.
in_reply_to_status_id: 1
in_reply_to_status_id: 7
in_reply_to_status_id: 343
How can I get all values from the file?
Please help me. Thank you very much.
You need a valid JSON file; for example, wrap the objects in an array like this:
[
{"id":593393012970926082,"in_reply_to_status_id":1,"user":{"id":2292380240,"followers_count":2},"retweet_count":0},
{"id":654878454684687878,"in_reply_to_status_id":7,"user":{"id":2292380241,"followers_count":4},"retweet_count":5},
{"id":123487894154878414,"in_reply_to_status_id":343,"user":{"id":2292380242,"followers_count":773},"retweet_count":654}
]
And the code should look like this:
for (const auto& p : pt)
{
cout << p.second.get("in_reply_to_status_id",0) << endl;
}
Instead of a range-based for, you can use BOOST_FOREACH, for example:
BOOST_FOREACH(const ptree::value_type& p, pt)
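If the file cannot be edited by hand, one option is to build the array text in code before parsing. The following is a rough sketch using only the standard library, assuming the file holds exactly one JSON object per line; `wrapAsJsonArray` is a hypothetical helper name, not a Boost function.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Turn a stream holding one JSON object per line into the text of a single
// JSON array. Blank lines are skipped; the lines themselves are not validated.
std::string wrapAsJsonArray(std::istream& in) {
    std::string out = "[";
    std::string line;
    bool first = true;
    while (std::getline(in, line)) {
        if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
        if (!first) out += ",";
        out += line;
        first = false;
    }
    out += "]";
    return out;
}
```

The returned string can be placed in a std::istringstream and handed to the stream overload of read_json, after which the loop above visits each element.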
Here is my example: first get the child tree, then parse it. My code:
string str = "{\"key\":[{\"id\":1}, {\"id\":2}]}";
stringstream ss(str);
boost::property_tree::ptree parser, child;
boost::property_tree::json_parser::read_json(ss, parser);
child = parser.get_child("key");
for(auto& p : child)
cout << p.second.get<uint32_t>("id") << endl;
I hope this can help you.

Parsing a text file based on first word C++

I'm trying to write a simple program in C++ that reads information from a text file and prints it to the console. The text file will look similar to this.
thing1 contents1
thing2 contents2
thing3 contents3
thing4 contents4
Is there a way that I can print contents1 to the console by knowing that the preceding word is thing1?
#include <istream>
#include <string>
#include <vector>
// Return the word that follows each occurrence of `mark` in the stream.
std::vector<std::string> getContents(std::istream &stream, const std::string &mark) {
    std::vector<std::string> contents;
    std::string current;
    // Loop on the extraction itself so a failed read is never processed.
    while (stream >> current) {
        if (current == mark && stream >> current) {
            contents.push_back(current);
        }
    }
    return contents;
}
This is a very basic example. I wouldn't suggest using just this, but it does get the job done. What it does: if mark is found in the stream, it grabs the content that follows. What it does not do: much checking that the stream is valid, or that the line is valid (i.e. that the content actually comes immediately after mark). This could probably also be done more easily on strings; it is just my personal preference to use streams.
Edit: I thought I saw thing1 = content1. I looked again and it turns out it was thing1 content1. The code has been edited accordingly.
One way, using std::map
std::ifstream fin("textfile.txt");
std::string firstw, secondw;
std::map< std::string, std::string> m ;
while ( fin >> firstw >> secondw )
{
m[firstw] = secondw ;
}
fin.close( );
std::string input_word = "thing1" ;
// Wrap following in a function, input_word is the search element
if( m.find(input_word) != m.end () )
{
std::cout << m[input_word] ;
}

How to find specific string constant in line and copy the following

I am creating a simple database (I have very little experience, so please forgive the mess of my code). Every time my console program starts, it checks whether a database (stored in userlist.txt) has already been created. If not, a new one is created. If the database exists, however, its contents should all be copied into a vector 'users' of structs, which I keep within the class 'userbase' and which then holds all user information.
My userstats struct looks like this,
enum securityLevel {user, moderator, admin};
struct userstats
{
string ID;
string name;
string password;
securityLevel secLev;
};
I read all this information from a text file with this code:
int main()
{
Userbase userbase; // Class to contain userinformation during runtime.
ifstream inFile;
inFile.open("userlist.txt");
if(inFile.good())
{
// ADD DATE OF MODIFICATION
cout << "USERLIST FOUND, READING USERS.\n";
userstats tempBuffer;
int userCount = -1;
int overCount = 0;
while(!inFile.eof())
{
string buffer;
getline(inFile, buffer);
if (buffer == "ID:")
{
userCount++;
if (userCount > overCount)
{
userbase.users.push_back(tempBuffer);
overCount++;
}
tempBuffer.ID = buffer;
cout << "ID"; // Just to see if works
}
else if (buffer == "name:")
{
cout << "name"; // Just to see if works
tempBuffer.name = buffer;
}
else if (buffer == "password:")
{
cout << "password"; // Just to see if works
tempBuffer.password = buffer;
}
}
if (userCount == 0)
{
userbase.users.push_back(tempBuffer);
}
inFile.close();
}
...
What I try to do is to read and analyze every line of the text file. An example of the userlist.txt could be,
created: Sun Apr 15 22:19:44 2012
mod_date: Sun Apr 15 22:19:44 2012
ID:1d
name:admin
password:Admin1
security level:2
(I am aware I do not read "security level" into the program yet)
EDIT: There could also be more users simply following the "security level:x"-line of the preceding user in the list.
Now, if the program reads the line "ID:1d" it should then copy this into the struct and finally I will put it all into the vector userbase.users[i]. This does not seem to work, however. It does not seem to catch on to any of the if-statements. I've gotten this sort of program to work before, so I am very confused what I am doing wrong. I could really use some help with this. Any other kind of criticism of the code is very welcome.
Regards,
Mikkel
None of the if (buffer == ...) conditions will ever be true, because each line contains the value of the attribute as well as its type. For example:
ID:1d
When getline() reads this line, buffer will contain ID:1d, so:
if (buffer == "ID:")
will be false. Use string.find() instead:
if (0 == buffer.find("ID:")) // Comparing to zero ensures that the line
{ // starts with "ID:".
// Avoid including the attribute type
// in the value.
tempBuffer.ID.assign(buffer.begin() + 3, buffer.end());
}
As commented by jrok, the while for reading the file is incorrect as no check is made immediately after getline(). Change to:
string buffer;
while(getline(inFile, buffer))
{
...

Read file and extract certain part only

ifstream toOpen;
openFile.open("sample.html", ios::in);
if(toOpen.is_open()){
while(!toOpen.eof()){
getline(toOpen,line);
if(line.find("href=") && !line.find(".pdf")){
start_pos = line.find("href");
tempString = line.substr(start_pos+1); // i dont want the quote
stop_pos = tempString .find("\"");
string testResult = tempString .substr(start_pos, stop_pos);
cout << testResult << endl;
}
}
toOpen.close();
}
What I am trying to do is extract the "href" value. But I cant get it works.
EDIT:
Thanks to Tony's hint, I use this:
if(line.find("href=") != std::string::npos ){
// Process
}
it works!!
I'd advise against trying to parse HTML like this. Unless you know a lot about the source and are quite certain about how it'll be formatted, chances are that anything you do will have problems. HTML is an ugly language with an (almost) self-contradictory specification that (for example) says particular things are not allowed -- but then goes on to tell you how you're required to interpret them anyway.
Worse, almost any character can (at least potentially) be encoded in any of at least three or four different ways, so unless you scan for (and carry out) the right conversions (in the right order) first, you can end up missing legitimate links and/or including "phantom" links.
You might want to look at the answers to this previous question for suggestions about an HTML parser to use.
As a start, you might want to take some shortcuts in the way you write the loop over lines in order to make it clearer. Here is the conventional "read line at a time" loop using C++ iostreams:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
int main ( int, char ** )
{
std::ifstream file("sample.html");
if ( !file.is_open() ) {
std::cerr << "Failed to open file." << std::endl;
return (EXIT_FAILURE);
}
for ( std::string line; (std::getline(file,line)); )
{
// process line.
}
}
As for the inner part that processes the line, there are several problems.
It doesn't compile. I suppose this is what you meant by "I cant get it works". When asking a question, this is the kind of information you might want to provide in order to get good help.
There is confusion between variable names temp and tempString etc.
string::find() returns a large positive integer (std::string::npos) when nothing is found (the size_type is unsigned), so the condition on find() is true unless a match starts at character position 0, which is exactly the case in which you probably do want to enter the block.
Here is a simple test content for sample.html.
<html>
<a href="foo.pdf"/>
</html>
Sticking the following inside the loop:
if ((line.find("href=") != std::string::npos) &&
(line.find(".pdf" ) != std::string::npos))
{
const std::size_t start_pos = line.find("href");
std::string temp = line.substr(start_pos+6);
const std::size_t stop_pos = temp.find("\"");
std::string result = temp.substr(0, stop_pos);
std::cout << "'" << result << "'" << std::endl;
}
I actually get the output
'foo.pdf'
However, as Jerry pointed out, you might not want to use this in a production environment. If this is a simple homework or exercise on how to use the <string>, <iostream> and <fstream> libraries, then go ahead with such a procedure.
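For such an exercise, the same logic can be put in a small function that also copes with several links on one line. This is a sketch under the same caveats as above: `hrefsEndingWith` is a hypothetical name, and only the exact form href="..." with double quotes is recognized.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return every double-quoted href value in `line` that ends with `suffix`.
// Single quotes, spaces around '=', and character entities are not handled.
std::vector<std::string> hrefsEndingWith(const std::string& line,
                                         const std::string& suffix) {
    const std::string marker = "href=\"";
    std::vector<std::string> result;
    std::size_t pos = 0;
    while ((pos = line.find(marker, pos)) != std::string::npos) {
        const std::size_t start = pos + marker.size();
        const std::size_t stop = line.find('"', start);
        if (stop == std::string::npos) break;  // unterminated attribute
        const std::string value = line.substr(start, stop - start);
        if (value.size() >= suffix.size() &&
            value.compare(value.size() - suffix.size(), suffix.size(),
                          suffix) == 0) {
            result.push_back(value);
        }
        pos = stop + 1;  // continue searching after the closing quote
    }
    return result;
}
```

Inside the line loop, `hrefsEndingWith(line, ".pdf")` returns the matching values for that line; checking the suffix of the value itself avoids matching a ".pdf" that appears elsewhere on the line.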

C++: How to extract a string from RapidXml

In my C++ program I want to parse a small piece of XML, insert some nodes, then extract the new XML (preferably as a std::string).
RapidXml has been recommended to me, but I can't see how to retrieve the XML back as a text string.
(I could iterate over the nodes and attributes and build it myself, but surely there's a build in function that I am missing.)
Thank you.
Although the documentation is poor on this topic, I managed to get some working code by looking at the source. Note that the output is missing the XML header, which normally contains important information. Here is a small example program that does what you are looking for using rapidxml:
#include <iostream>
#include <sstream>
#include <string>
#include "rapidxml/rapidxml.hpp"
#include "rapidxml/rapidxml_print.hpp"
int main(int argc, char* argv[]) {
char xml[] = "<?xml version=\"1.0\" encoding=\"latin-1\"?>"
"<book>"
"</book>";
//Parse the original document
rapidxml::xml_document<> doc;
doc.parse<0>(xml);
std::cout << "Name of my first node is: " << doc.first_node()->name() << "\n";
//Insert something
rapidxml::xml_node<> *node = doc.allocate_node(rapidxml::node_element, "author", "John Doe");
doc.first_node()->append_node(node);
std::stringstream ss;
ss <<*doc.first_node();
std::string result_xml = ss.str();
std::cout <<result_xml<<std::endl;
return 0;
}
Use print function (found in rapidxml_print.hpp utility header) to print the XML node contents to a stringstream.
rapidxml::print requires an output iterator to generate the output, so a plain character array works with it. But this is risky, because I cannot know whether a fixed-length array (say 2048 bytes) is long enough to hold all the content of the XML.
The right way to do this is to pass in an output iterator of a string stream, so that the buffer can grow as the XML is dumped into it.
My code is like below:
std::stringstream stream;
std::ostream_iterator<char> iter(stream);
rapidxml::print(iter, doc, rapidxml::print_no_indenting);
printf("%s\n", stream.str().c_str());
printf("len = %zu\n", stream.str().size()); // size() returns size_t, so use %zu
If you do build XML yourself, don't forget to escape the special characters. This tends to be overlooked, but can cause some serious headaches if it is not implemented:
< &lt;
> &gt;
& &amp;
" &quot;
' &apos;
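A small helper can apply these five replacements when building XML by hand. This is a minimal sketch; `escapeXml` is a hypothetical name and not part of RapidXml.

```cpp
#include <string>

// Replace the five characters that have predefined XML entities.
// Processing one character at a time means an '&' introduced by an
// earlier replacement is never escaped a second time.
std::string escapeXml(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

The helper should be applied to text content and attribute values only, never to markup that has already been assembled.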
Here's how to print a node to a string straight from the RapidXML Manual:
xml_document<> doc; // character type defaults to char
// ... some code to fill the document
// Print to stream using operator <<
std::cout << doc;
// Print to stream using print function, specifying printing flags
print(std::cout, doc, 0); // 0 means default printing flags
// Print to string using output iterator
std::string s;
print(std::back_inserter(s), doc, 0);
// Print to memory buffer using output iterator
char buffer[4096]; // You are responsible for making the buffer large enough!
char *end = print(buffer, doc, 0); // end contains pointer to character after last printed character
*end = 0; // Add string terminator after XML
If you aren't yet committed to Rapid XML, I can recommend some alternative libraries:
Xerces - This is probably the de facto C++ implementation.
XMLite - I've had some luck with this minimal XML implementation. See the article at http://www.codeproject.com/KB/recipes/xmlite.aspx
Use static_cast<>
Ex:
rapidxml::xml_document<> doc;
std::string strBuff;
doc.parse<0>(xml);
// Fetch the root only after parsing; before parse() the document is empty.
rapidxml::xml_node <> * root_node = doc.first_node();
.
.
.
strBuff = static_cast<std::string>(root_node->first_attribute("attribute_name")->value());
The following is very simple:
std::string s;
print(back_inserter(s), doc, 0);
cout << s;
You only need to include "rapidxml_print.hpp" header in your source code.