Let's say I have a string like this:
<html>
<head><title>301 Moved Permanently</title><head>
and so on.
I'm using str.find() to find where the title tag starts, and it gives me the correct position, but how would I go about printing just this:
301 Moved Permanently
My code:
string requestedPage = page.GetBody(); //Get the body of a page and store as string "requestedPage"
int subFromBeg = requestedPage.find("<title>"); //Search for the <title> tag
int subFromEnd = requestedPage.find("</title>"); //Search for the </title> tag
std::cout << requestedPage; //before
requestedPage.substr( subFromBeg, subFromEnd );
std::cout << requestedPage; //after
requestedPage.substr( subFromBeg, subFromEnd );
should be
requestedPage = requestedPage.substr( subFromBeg, subFromEnd );
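std::string::substr doesn't modify the string; it returns a new string holding the requested part. Note also that its second argument is a length, not an end position, so to keep only the text between the tags you need to pass the distance between the two positions rather than subFromEnd itself.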
substr is how I would do it. Something like cout << str.substr(str.find("title") + 6, 21); would get you a 21-character string starting at 6 characters after 'title' (hopefully, I counted my indices right, but you get the idea).
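To avoid counting characters by hand, the offsets can be derived from the tags themselves. The following is a minimal sketch, assuming the page contains at most one title element; the helper name extract_title is made up for illustration and is not part of the original code.

```cpp
#include <string>

// Hypothetical helper (not from the original post): returns the text between
// the first "<title>" and the following "</title>", or an empty string if
// either tag is missing.
std::string extract_title(const std::string& html)
{
    const std::string open_tag = "<title>";
    const std::string close_tag = "</title>";

    const std::string::size_type open_pos = html.find(open_tag);
    if (open_pos == std::string::npos)
        return "";

    // The text starts right after the opening tag, not at the tag itself.
    const std::string::size_type text_begin = open_pos + open_tag.length();

    const std::string::size_type close_pos = html.find(close_tag, text_begin);
    if (close_pos == std::string::npos)
        return "";

    // substr takes (position, length), so pass the distance between the two.
    return html.substr(text_begin, close_pos - text_begin);
}
```

With the string from the question, std::cout << extract_title(requestedPage); should print only the title text.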
Here's my task and below is most of the code I already wrote:
Develop the program so that it finds and extracts specified items from the xmlstring file using start and end tags. Now we find and extract and display first the location information and then the temperature information.
Location can be found between the tags <location> and </location>. The temperature is between the tags <temp_c> and </temp_c>.
To make it easy to find whatever information from the xml-string, write a function that takes the xml-string and the "inner" text (same for start tag and end tag) of the tags as parameters and returns the text from between the start tag and end tags. If either start or end tag is not found the function must return "not found".
Note that when you search for the tag you must search for the whole tag (including angle brackets) not just the tag name that was given as parameter.
For example, if you wanted to find the location
location = find_field(page, "location");
and to get the temperature you could call it as follows:
temperature = find_field(page, "temp_c");
MY CODE:
#pragma warning (disable:4996)
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
string find_field(const string& xml, string tag_name);
int main() {
string page, line, location, temperature;
ifstream inputFile("weather.xml");
while (getline(inputFile, line)) {
page.append(line);
line.erase();
}
location = find_field(page, "location");
temperature = find_field(page, "<temp_c>");
cout << "Location: " << location << endl;
cout << "Temperature: " << temperature << endl;
}
string find_field(const string& xml, string tag_name)
{
string start_tag = "<" + tag_name + ">";
string end_tag = "<" + tag_name + ">";
return "not found";
}
SPECIFIC QUESTION
When I run the program it says:
Location: not found
Temperature: not found
Just "not found". It doesn't show the data that is in the file. How can I fix it? Thanks
I will not solve this completely for you, because I assume this is a task for you to actually learn, but I will give you some guidance.
In C++ you work on strings with iterators or offsets (positions). Since you are just starting, I would suggest getting familiar with offsets first.
Basically, you need to search for the positions of start_tag and end_tag in your xml string and then return what is in between. The std::string class has a find method (http://www.cplusplus.com/reference/string/string/find/). You use it to find the start position of the search string. Additionally, you will probably need the length method (http://www.cplusplus.com/reference/string/string/length/) for position calculation and the substr method (http://www.cplusplus.com/reference/string/string/substr/) to get the interesting part.
Your pseudo logic of the find could be:
1. Find the start position of the start_tag in xml.
2. Calculate the end position of the start_tag (start position + length of start_tag).
3. Find the start position of the end_tag in xml.
4. Return the substring between the two positions (its length is the difference between the start position of end_tag and the end position of start_tag).
Of course you need to check that the positions are valid before step 4 and, if not, return "not found".
Additionally, please consider Scheff's third comment on your question, that the end_tag starts with </ and that you need to call find_field without the angle brackets around your search string because you add them later in the function.
I hope this helps you find a solution.
Here is a very incomplete starter example:
string find_field(const string& xml, string tag_name)
{
std::string start_tag = "<" + tag_name + ">";
std::string end_tag = "</" + tag_name + ">";
size_t start_tag_start_pos = xml.find(start_tag);
// make some sanity checks to the position here (find returns std::string::npos if start_tag wasn't found)
size_t start_pos_of_interesting_string = start_tag_start_pos + start_tag.length();
// you can find the start_pos of the end tag similarly
// don't forget sanity checks
// calculate the length of between start and end tag
return xml.substr(start_pos_of_interesting_string, /* you need to add the length of the interesting string here*/);
}
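For readers who want to compare against a finished version, here is one way the starter above might be completed. It is a sketch under the task's stated rules (search for the whole tag including angle brackets, return "not found" if either tag is missing), not the only possible solution; the local variable names are my own.

```cpp
#include <string>

// Returns the text between <tag_name> and </tag_name>, or "not found" if
// either tag is missing. tag_name is passed WITHOUT angle brackets.
std::string find_field(const std::string& xml, const std::string& tag_name)
{
    const std::string start_tag = "<" + tag_name + ">";
    const std::string end_tag = "</" + tag_name + ">";

    // Step 1: locate the start tag.
    const std::string::size_type start_tag_pos = xml.find(start_tag);
    if (start_tag_pos == std::string::npos)
        return "not found";

    // Step 2: the interesting text begins right after the start tag.
    const std::string::size_type content_begin = start_tag_pos + start_tag.length();

    // Step 3: locate the end tag, searching only after the start tag.
    const std::string::size_type end_tag_pos = xml.find(end_tag, content_begin);
    if (end_tag_pos == std::string::npos)
        return "not found";

    // Step 4: substr takes (position, length).
    return xml.substr(content_begin, end_tag_pos - content_begin);
}
```

It would be called as find_field(page, "location") and find_field(page, "temp_c"), that is, without angle brackets in the second argument.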
C++ newbie here, I'm not sure if my title describes what I am trying to do perfectly, but basically I am trying to output one line of a string array for a certain index of that array.
For example: Say myArray[2] is the 3rd index of a string array, and it holds an entire paragraph, with each sentence separated by a newline character.
contents of myArray[2]: "This is just an example.
This is the 2nd sentence in the paragraph.
This is the 3rd sentence in the paragraph."
I would like to output only the first sentence of the content held in the 3rd index of the string array.
Desired output: This is just an example.
So far I have only been able to output the entire paragraph instead of one sentence, using the basic:
cout << myArray[2] << endl;
But obviously this is not correct. I am assuming the best way to do this is to use the newline character in some way, but I am not sure how to go about that. I was thinking I could maybe copy the array into a new, temporary array which would hold in each index a sentence of the paragraph held in the original array index, but this seems like I am complicating the issue too much.
I have also tried to copy the string array into a vector, but that didn't seem to help my confusion.
You can do something along these lines
size_t end1stSentencePos = myArray[2].find('\n');
std::string firstSentence = end1stSentencePos != std::string::npos?
myArray[2].substr(0,end1stSentencePos) :
myArray[2];
cout << firstSentence << endl;
Here's the reference documentation of std::string::find() and std::string::substr().
Below is a general solution to your problem.
std::string findSentence(
unsigned const stringIndex,
unsigned const sentenceIndex,
std::vector<std::string> const& stringArray,
char const delimiter = '\n')
{
auto result = std::string{ "" };
// If the string index is valid
if(stringIndex < stringArray.size())
{
auto index = unsigned{ 0 };
auto posStart = std::string::size_type{ 0 };
auto posEnd = stringArray[stringIndex].find(delimiter);
// Attempt to find the specified sentence
while((posEnd != std::string::npos) && (index < sentenceIndex))
{
posStart = posEnd + 1;
posEnd = stringArray[stringIndex].find(delimiter, posStart);
index++;
}
// If the sentence was found, retrieve the substring.
if(index == sentenceIndex)
{
result = stringArray[stringIndex].substr(posStart, (posEnd - posStart));
}
}
return result;
}
Where,
stringIndex is the index of the string to search.
sentenceIndex is the index of the sentence to retrieve.
stringArray is your array (I used a vector) that contains all of the strings.
delimiter is the character that specifies the end of a sentence (\n by default).
It is safe in that if an invalid string or sentence index is specified, it returns an empty string.
From what I could understand in the docs, I deduced that every xml_node knows its position in the source text. What I'd like to do is to retrieve LINE and COLUMN for a given xml_node<>*:
rapidxml::file<> xmlFile("generators.xml"); // Open file, default template is char
xml_document<> doc; // character type defaults to char
doc.parse<0>(xmlFile.data()); // 0 means default parse flags
xml_node<> *main = doc.first_node(); //Get the main node that contains everything
cout << "My first node is: <" << main->name() << ">\n";
cout << " located at line " << main->?????() << ", column " << main->?????() << "\n";
How should I retrieve those offsets? Could I somehow crawl from the main->name() pointer back to the beginning of the document? But how can I access the document string from xml_document<> doc to compare offsets?
Let's say you parse a simple xml document in a string.
char xml[] = "<hello/><world/>";
doc.parse<0>(xml);
RapidXML will insert null terminators (and maybe make other modifications to the "document"), so it might look like this now:
char xml[] = "<hello\000\000<world\000\000";
If you then ask for the name() of the 'hello' node, it returns a pointer to the 'h' in your xml array. You can just subtract the base of the array to get an offset.
int offset = node->name() - &xml[0];
Obviously this isn't a line and column. To get those, you'd need to count the number of newlines between the array start and the offset; the column is then the distance from the last newline before the offset. (But maybe do this on a 'clean' copy of the XML data, as RapidXML might well mangle newline sequences in the processed version.)
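The newline counting could look roughly like the sketch below. It does not touch RapidXML at all: it assumes you kept an unmodified copy of the document text and already computed the byte offset as shown above. The helper name line_and_column and the 1-based numbering are my own choices.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: converts a 0-based byte offset into a 1-based
// (line, column) pair by scanning an unmodified copy of the document text.
std::pair<std::size_t, std::size_t>
line_and_column(const std::string& text, std::size_t offset)
{
    if (offset > text.size())
        offset = text.size();            // clamp out-of-range offsets

    std::size_t line = 1;
    std::size_t line_start = 0;          // offset of the first char on this line
    for (std::size_t i = 0; i < offset; ++i) {
        if (text[i] == '\n') {
            ++line;
            line_start = i + 1;
        }
    }
    return {line, offset - line_start + 1};
}
```

Note that the column is counted in bytes, so multi-byte encodings and tab characters would need extra handling.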
Here is an example feed that I would like to parse:
https://gdata.youtube.com/feeds/api/users/aniBOOM/subscriptions?v=2&alt=json
You can check it with http://json.parser.online.fr/ to see what it contains.
I have a small problem parsing the data feed provided by YouTube. The first issue was that YouTube wraps the data inside a feed field, so I couldn't parse the username straight from the original JSON file; I had to parse the entry field first and generate new JSON data from that.
Anyway, the problem is that for some reason the result doesn't include more than the first username, and I don't know why, because if you check that feed in the online parser the entry should contain all the usernames.
`
data = value["feed"]["entry"];
Json::StyledWriter writer;
std::string outputConfig = writer.write( data );
//This removes [ at the beginning of entry and also last ] so we can treat it as a Json data
size_t found;
found=outputConfig.find_first_of("[");
int sSize = outputConfig.size();
outputConfig.erase(0,1);
outputConfig.erase((sSize-1),sSize);
reader.parse(outputConfig, value2, false);
cout << value2 << endl;
Json::Value temp;
temp = value2["yt$username"]["yt$display"];
cout << temp << endl;
std::string username = writer.write( temp );
int sSize2 = username.size();
username.erase(0,1);
username.erase((sSize2-3),sSize2);
`
But for some reason the [] fix also cuts the data I'm generating. If I print out the data without removing the [] I can see all the users, but in that case I can't extract temp = value2["yt$username"]["yt$display"];
In JSON, the brackets denote Arrays (nice reference here). You can see this in the online parser, also -- Objects (items with one or more key/value pairs {"key1": "value1", "key2": "value2"}) are denoted with blue +/- signs and Arrays (items inside brackets separated by commas [{arrayItem1}, {arrayItem2}, {arrayItem3}]) are denoted with red +/- signs.
Since entry is an Array, you should be able to iterate through them by doing something like this:
// Assumes value is a Json::Value
Json::Value entries = value["feed"]["entry"];
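Json::ArrayIndex size = entries.size();
// Json::ArrayIndex avoids the ambiguous operator[] overload a size_t index can trigger
for (Json::ArrayIndex index = 0; index < size; ++index) {
Json::Value entryNode = entries[index];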
cout << entryNode["yt$username"]["yt$display"].asString() << endl;
}
I'm working on a program that uses boost::regex to match some patterns inside a huge text file (greater than 200 MB). The matches are working fine, but to build the output file I need to order the matches (from just 2 patterns, but over the whole text) in the sequence in which they are found in the text.
In debug mode, during the cout loop, I can see inside the iterator it1 an m_base attribute holding an address that increases with each step of the loop. I think this m_base address is the address of the matched pattern in the text, but I could not confirm it, and I could not find a way to access this attribute to store the address.
I don't know if there is any way to retrieve the address of each matched pattern in the text, but I really need to get this information.
#define FILENAME "File.txt"
int main() {
int length;
char * cMainBuf;
ifstream is;
is.open (FILENAME, ios::binary );
is.seekg(0, ios::end);
length = is.tellg();
is.seekg (0, ios::beg);
cMainBuf = new char[length+1];
memset(cMainBuf, '\0',length+1);
is.read(cMainBuf,length);
is.close();
string str=cMainBuf;
regex reg("^(\\d{1,3}\\s[A-F]{99})");
regex rReg(reg);
int const sub_matches[] = { 1 };
boost::sregex_token_iterator it1(str.begin() ,str.end() ,rReg ,sub_matches ), it2;
while(it1!=it2)
{
cout<<"#"<<sz++<<"- "<< *(it1++) << endl;
}
return 0;
}
@sln
Hi sln,
I'll answer your questions:
1. I removed all code that is not part of this issue, so some libraries remaining there;
2. Same as 1;
3. Because the file is not a simple text file in fact, it can have any symbol and it may affect the reading procedure, as I could realize in the past;
4. Zero buffer was necessary during the tests period, since I could not store more than 1MB in the buffer;
5. The iterator doesn't allow using char* to set the beginning and the end of the file, so it was necessary to change it to string;
6. The incoming RegEx will not be declared static, this is just a draft to show the problem, and the anchor acts to find the line start, not only the string start;
7. sub_matches was part of the test to see where the iterator was for regex with 2 or more groups inside it;
8. sz is just a counter;
9. There is no cast possible from const std::_String_const_iterator<_Elem,_Traits,_Alloc> to long.
In fact all the code works fine, and I can identify any pattern inside the text, but what I really need to know is the memory address of each matched pattern (in this case, the address of the iterator for each iteration). I found that m_base has this address, but so far I have not been able to retrieve it.
I'll continue the analysis; if I find any solution for this problem I'll post it here.
Edit @Tchesko, I am deleting my original answer. I've loaded boost::regex and tried it out with a regex_search(). It's not the it1 method you are using, but I think it comes down to just getting the results from the boost::smatch class, which is really boost::match_results().
It has member functions to get the position and length of the match and sub-matches. So it's really all you need to find the offset into your big string. The reason you can't get to m_base is that it is a private member variable.
Use the methods position() and length(). See the sample below... which I ran, debugged and tested. I'm getting back up to speed with VS-2005 again. But boost does seem a little quirky. If I am going to use it, I want it to do Unicode, and that means I have to compile ICU. The boost binaries I'm using are the downloaded 1.44. The latest is 1.46.1, so I might build it with vc++ 8 after I assess its viability with ICU.
Hey, let me know how it turns out. Good luck!
#include <boost/regex.hpp>
#include <cstdio> // printf
#include <locale>
#include <iostream>
using namespace std;
int main()
{
std::locale::global(std::locale("German"));
std::string s = " Boris Schäling ";
boost::regex expr("(\\w+)\\s*(\\w+)");
boost::smatch what;
if (boost::regex_search(s, what, expr))
{
// These are from boost::match_results() class ..
int Smpos0 = what.position();
int Smlen0 = what.length();
int Smpos1 = what.position(1);
int Smlen1 = what.length(1);
int Smpos2 = what.position(2);
int Smlen2 = what.length(2);
printf ("Match Results\n--------------\n");
printf ("match start/end = %d - %d, length = %d\n", Smpos0, Smpos0 + Smlen0, Smlen0);
std::cout << " '" << what[0] << "'\n" << std::endl;
printf ("group1 start/end = %d - %d, length = %d\n", Smpos1, Smpos1 + Smlen1, Smlen1);
std::cout << " '" << what[1] << "'\n" << std::endl;
printf ("group2 start/end = %d - %d, length = %d\n", Smpos2, Smpos2 + Smlen2, Smlen2);
std::cout << " '" << what[2] << "'\n" << std::endl;
/*
This is the hard way, still m_base is a private member variable.
Without m_base, you can't get the root address of the buffer.
long Match_start = (long)(what[0].first._Myptr);
long Match_end = (long)(what[0].second._Myptr);
long Grp1_start = (long)(what[1].first._Myptr);
long Grp1_end = (long)(what[1].second._Myptr);
*/
}
}
/* Output:
Match Results
--------------
match start/end = 2 - 17, length = 15
'Boris Schäling'
group1 start/end = 2 - 7, length = 5
'Boris'
group2 start/end = 9 - 17, length = 8
'Schäling'
*/
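Since the original question iterates over every match in the file rather than just the first one, the same position() and length() calls can be collected in a loop. The sketch below uses std::regex from the standard library instead of boost::regex so that it builds without Boost; std::match_results offers position() and length() members as well, and for regex iterators the reported position is measured from the start of the searched string. The function name match_offsets is made up for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (offset, length) for every match of `pattern`
// in `text`, in the order in which the matches occur in the text.
std::vector<std::pair<std::size_t, std::size_t>>
match_offsets(const std::string& text, const std::regex& pattern)
{
    std::vector<std::pair<std::size_t, std::size_t>> result;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern); it != end; ++it) {
        // position() and length() without arguments refer to the whole match;
        // pass a group index to query a sub-match instead.
        result.emplace_back(static_cast<std::size_t>(it->position()),
                            static_cast<std::size_t>(it->length()));
    }
    return result;
}
```

With two different patterns, the two resulting vectors can be merged and sorted by offset to recover the order in which the matches appear in the file. If a raw address is really needed, it is the address of the first character of the string plus the offset, valid for as long as the string is not modified.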