I am trying to extract a snippet from the source code of a website, and I want to delete all the spaces and tabs before the tags in each line. I copied the string into a char array, and for each line I check every character with isspace (I also tried comparing against '\t' and ' ') until I reach some other character such as '<', counting how many spaces and tabs there are. I then create another char array and copy the line (separator) into it, skipping the leading whitespace (with [chars+i]). This method works pretty well, but if a line has more than 5 tabs it doesn't work properly. I have absolutely no idea where the fault is.
for(int i = 0;i < lines;i++){
getline(codefile, buf);
char *separator = new char[buf.size()+1];
separator[buf.size()] = 0;
memcpy(separator,buf.c_str(),buf.size());
int chars = 0;
for(int j = 0; j <= sizeof(separator); j++){
if(isspace(separator[j])){
chars++;
}
else{
break;
}
}
char *newbuf= new char[buf.size()-chars+1];
newbuf[buf.size()-chars] = 0;
for(int k = 0; k <= buf.size()-chars+1; k++){
newbuf[k] = separator[chars+k];
}
if(i > lcounter){
cout << newbuf << i << endl;
}
}
Here is the snippet of the source code from the website. The problem shows up at the img tag, the closing figure tag and the p tag, which are indented by more than 5 tabs (sorry, I had to censor the content).
<div class="xxx">
<article class="xxx" data-id="0">
<a href="link" class="tile" style="background-image:url('x.jpg');background-position:left center" data-more="<a href=x" data-clicks="<i class="fa fa-eye"></i>" data-teaserimg="x.jpg">
<time datetime="2015">
<span>2015</span>
</time>
<h1 class="title">
<span>x</span>
</h1>
<div class="x">x</div>
<div class="x">x</div>
<div class="x">
<figure class="x">
<img src="x.jpg" width="1" height="1" alt="">
</figure>
<p>
<strong>x</strong>xxx
</p>
</div>
</a>
Sorry I can't post a picture and I hope it is understandable.
sizeof(separator) should be strlen(separator)
sizeof gives the size of the separator variable, not the length of the string. Since separator is a char*, that is the size of a pointer: four bytes on a 32-bit build (eight on a 64-bit one). Now do you see why your code doesn't work when you have more than five tabs?
And as others have pointed out there really is no reason to copy the string to the separator array. Why not just examine the characters where they are? isspace(buf[j]) works just as well as isspace(separator[j]).
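A minimal sketch of that suggestion, examining the characters directly in the std::string instead of copying them into a char array. The helper name strip_leading_whitespace is hypothetical and not part of the original code:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of `line` without its leading
// spaces and tabs (or any other character isspace accepts).
std::string strip_leading_whitespace(const std::string& line) {
    std::size_t first = 0;
    // The bound is the real string length, not sizeof of a pointer.
    // The cast avoids undefined behaviour for negative char values.
    while (first < line.size() &&
           std::isspace(static_cast<unsigned char>(line[first]))) {
        ++first;
    }
    return line.substr(first);  // everything from the first non-space character
}
```

Inside the original loop this would replace both char arrays: read the line with getline(codefile, buf) and print strip_leading_whitespace(buf).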
Using C++, Windows 10, VS 2019, Curl libcurl
I am trying to fill a const char array in order to send an email using libcurl.
The libcurl example defines the char array this way outside of the main()
static const char inline_html[] =
"<html><body>\r\n"
"<p>HTML_Body</p>"
"<br />\r\n"
"</body></html>\r\n";
However, I have converted the main() in their example code to accept parameters so that I can pass the From, To, CC, Subject, and Body. So at the top of main() I have this:
std::string strBody = argv[5];
Since the inline_html[] is const I tried to create another array on the fly inside of main() like this
char* inline_array = new char[strBody.length() + 1];
and then fill it from the passed parameter strBody. If I look for char(10) or char(13) in strBody, they are there, but they do not end up in inline_array[] in any useful form (or at least they do not act as CRLF in the resulting email) if I just do this:
for (int i = 0; i < strBody.length(); i++) {
inline_array[i] = strBody[i];
}
The resulting email will just be a run-on block of text with none of the original line feeds that were in the original string.
If I look for the char(10) and (13)s and substitute an escaped character like this
for (int i = 0; i < strBody.length(); i++) {
if (strBody[i] == char(10)) {
inline_array[i] = '\n'; // '\x10';
}
else if (strBody[i] == char(13)) {
inline_array[i] = '\x13'; // '\r';
}
else {
inline_array[i] = strBody[i];
}
}
I just get "\r\n" in the final email sent and not the desired CRLF.
I do know that separately I can put the escaped CRLF into a char inline_html2Chr[230] like this
inline_html2Chr[i] = '\n';
I have done it in debug mode and am able to see that inline_html2Chr[i] == '\n', for instance.
But when I pass this to the libcurl function to send the email like this
curl_mime_data(part, inline_html2Chr, CURL_ZERO_TERMINATED);
I still end up with a run-on block of text with only the "\r\n" embedded.
Even if I modify the original inline_html[] in the example code outside of main() to put CRLF in the body like this
static const char inline_html[] =
"<html><body>\r\n"
"<p>LineOne\r\nLineTwo</p>"
"<br />\r\n"
"</body></html>\r\n";
I get "LineOne LineTwo" as the body with no line break but also no "\r\n" in the body.
Does anyone have a suggestion of what else I can try? I am just going around in circles and can't even remember all of the variations I have tried.
Thanks for any suggestions.
Ed
So obvious when you think in terms of "HTML" and not in terms of string and char. @dewaffled was correct: \n and \r are not part of the HTML lexicon. It has to be <br> or <p>.
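A minimal sketch of that fix, assuming the body arrives as plain text with CR/LF line endings and should be sent as HTML. The helper name newlines_to_br is hypothetical, and the sketch does not escape HTML special characters such as < or &:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces each line ending (CRLF, lone CR or lone LF)
// with an HTML <br> tag so the break survives HTML rendering.
std::string newlines_to_br(const std::string& text) {
    std::string html;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            // Treat a CR followed by LF as one line break, not two.
            if (i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;
            }
            html += "<br>";
        } else if (text[i] == '\n') {
            html += "<br>";
        } else {
            html += text[i];
        }
    }
    return html;
}
```

The result could then be handed to the same call used in the question, for example curl_mime_data(part, body.c_str(), CURL_ZERO_TERMINATED) with body = newlines_to_br(strBody).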
So I have come quite far in my Hangman game, but I'm stuck on placing the letters when a letter gets clicked.
I made 2 variables. One of them is "word", which stores the word that has to be guessed.
The other one is "hiddenWord", which has the same length as "word" but with the letters replaced by underscores, as you can see here:
function checkLetter(buchstabe) {
if(word.includes(buchstabe)) {
hiddenWord = hiddenWord.replace(/_/g, buchstabe);
console.log("right");
document.getElementById("response").innerHTML = "Great guess!";
}
else {
console.log("wrong")
lives = lives -1;
document.getElementById("response").innerHTML = "Sorry, the letter you chose is not part of the hidden word.";
}
}
Now my 2 Variables I told you about:
let word = "hallo123";//I set the variable to something random, because it randomly changes on start
let lives = 8; //Ignore this one, its just how many tries the user has left.
let hiddenWord;//and I didnt declare this one yet, because it will be only underscores
Example of what my Hangman looks like
So my plan is: if someone clicks a letter (here is an example of my clickable letters, which are divs) and it's actually part of the word, it will replace the specific underscore with that letter.
<div class="layout" onclick="checkLetter('a');">a</div>
<div class="layout" onclick="checkLetter('b');">b</div>
<div class="layout" onclick="checkLetter('c');">c</div>
<div class="layout" onclick="checkLetter('d');">d</div>
<div class="layout" onclick="checkLetter('e');">e</div>
<div class="layout" onclick="checkLetter('b');">b</div>
<div class="layout" onclick="checkLetter('c');">c</div>
<div class="layout" onclick="checkLetter('d');">d</div>
<div class="layout" onclick="checkLetter('e');">e</div>
Okay! So after looking at your PasteBin I can see it looks like the #wordToGuess div at the bottom of the image you posted is a div containing x amount of underscores with a preceding space (" _") depending on the length of the word the user needs to guess.
When the checkLetter(buchstabe) function is fired and meets the if(word.includes(buchstabe)) condition why don't you do:
// HTML
<div id="wordToGuess"> _ _ _</div>
// JS
var buchstabe = 'a';
var currentHiddenWord = document.getElementById('wordToGuess').textContent;
var newHiddenWord = currentHiddenWord.replace(' _', buchstabe);
document.getElementById('wordToGuess').textContent = newHiddenWord;
So your function would become:
function checkLetter(buchstabe) {
if(word.includes(buchstabe)) {
hiddenWord = hiddenWord.replace(/_/g, buchstabe);
console.log("right");
document.getElementById("response").innerHTML = "Great guess!";
// Update underscores
var currentHiddenWord = document.getElementById('wordToGuess').textContent;
var newHiddenWord = currentHiddenWord.replace(' _', buchstabe);
document.getElementById('wordToGuess').textContent = newHiddenWord;
}
else {
console.log("wrong")
lives = lives -1;
document.getElementById("response").innerHTML = "Sorry, the letter you chose is not part of the hidden word.";
}
}
The code, upon a successful guess, now stores the current text of the wordToGuess div in a variable called currentHiddenWord, replaces the first instance of " _" with the correctly guessed letter (buchstabe), stores the result in a variable called newHiddenWord, and then replaces the text inside the wordToGuess div with the value of newHiddenWord.
Does that achieve what you need?
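One caveat: replacing the first " _" puts the letter in the first free slot, not at the position where it occurs in the word. A position-aware variant is sketched below in C++ (the same index loop carries over to JavaScript). The function name reveal_letter is hypothetical, and the sketch assumes the hidden string holds exactly one underscore per letter, without the separating spaces:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies `letter` into `hidden` at every index where
// it occurs in `word`. Returns true if the letter was found at least once.
bool reveal_letter(const std::string& word, std::string& hidden, char letter) {
    bool found = false;
    for (std::size_t i = 0; i < word.size() && i < hidden.size(); ++i) {
        if (word[i] == letter) {
            hidden[i] = letter;  // reveal this position only
            found = true;
        }
    }
    return found;
}
```

With word "hallo123" and a hidden string of eight underscores, guessing 'l' would reveal both middle positions and leave the rest as underscores.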
I am writing a program that counts the number of sentences in a string.
I count the number of '.', '?' and '!' characters. However, abbreviations such as Mr., Mrs., PhD. and Dr. also contain periods. Any help please?
int number_of_sentences = 0;
for(unsigned int i=0; i < text.length(); i++){ // i < length() also works for an empty string
if(text[i] == '.' || text[i] == '?' ||text[i] == '!'){
++number_of_sentences;
}
}
return number_of_sentences;
You can't do it. You would need a full natural language parser to handle it with any accuracy.
Discarding the words you mention won't solve the problem. Consider:
I am impressed by that PhD. James was awarded.
I am impressed by that PhD. James was awarded it in 2001.
It is only your understanding of the semantics of English that tells you that the first one is one sentence and the second one is two sentences. You wouldn't be able to tell the difference without thinking about the meaning of the words, though. You are trying to solve the problem at the purely syntactic level, but there isn't enough information in the text without considering semantics.
The best approximation would probably be to say that you get a new sentence whenever you get a ".", "!" or "?" and the next word starts with a capital letter. But this would still be only approximately correct. It would get the first of these examples wrong, and the second one right.
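A rough sketch of that approximation in C++. The function name count_sentences_heuristic is hypothetical, and counting a terminator at the very end of the text as a sentence end is an added assumption, since no next word exists there:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts a sentence end when '.', '?' or '!' is
// followed (after optional whitespace) by a capital letter, or by the
// end of the text (assumption). Only an approximation, as noted above.
int count_sentences_heuristic(const std::string& text) {
    int count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != '.' && text[i] != '?' && text[i] != '!') {
            continue;
        }
        // Skip the whitespace that separates the terminator from the next word.
        std::size_t next = i + 1;
        while (next < text.size() &&
               std::isspace(static_cast<unsigned char>(text[next]))) {
            ++next;
        }
        if (next == text.size() ||
            std::isupper(static_cast<unsigned char>(text[next]))) {
            ++count;
        }
    }
    return count;
}
```

Both PhD. examples above come out as 2 with this sketch, which is right for the second and wrong for the first, exactly the limitation described.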
Hint: why don't you split the string into tokens? Then decrement the count every time a token is a word such as Mrs., Mr., etc.
Or replace the special words with white space first and then count as before.
std::string RemoveWords(const std::string& source, const std::string& chars) {
std::string result="";
for (unsigned int i=0; i<source.length(); i++) {
bool foundany=false;
for (unsigned int j=0; j<chars.length() && !foundany; j++) {
foundany=(source[i]==chars[j]);
}
if (!foundany) {
result+=source[i];
}
}
return result;
}
int number_of_sentences = 0;
text = RemoveWords(text);
for(unsigned int i=0; i < text.length(); i++){ // i < length() also works for an empty string
if(text[i] == '.' || text[i] == '?' ||text[i] == '!'){
++number_of_sentences;
}
}
return number_of_sentences;
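Note that the function above omits every individual character that appears in the second argument string, rather than whole words, so passing "Mrs." also strips every '.', 'M', 'r' and 's' from the text. For example: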
std::string result=RemoveWords("Mrs. Rease will play to football. ByeBye", "Mrs.");
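The token idea from the hint can be sketched as follows, skipping whole abbreviations instead of individual characters. The function name count_sentences_skipping and the abbreviation set are hypothetical, and the approach inherits the limitations discussed earlier (an abbreviation that really does end a sentence is missed):

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: splits `text` on whitespace and counts tokens that
// end in '.', '?' or '!', ignoring tokens listed in `abbreviations`.
int count_sentences_skipping(const std::string& text,
                             const std::set<std::string>& abbreviations) {
    std::istringstream words(text);
    std::string token;
    int count = 0;
    while (words >> token) {  // operator>> never yields an empty token
        if (abbreviations.count(token) != 0) {
            continue;  // e.g. "Mrs." does not end a sentence
        }
        const char last = token.back();
        if (last == '.' || last == '?' || last == '!') {
            ++count;
        }
    }
    return count;
}
```

For the example text "Mrs. Rease will play to football. ByeBye" and a set containing "Mrs.", only "football." is counted.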
I am new to C++ and working on a relatively ambitious Win32 application in Visual Studio. The problem I'm having is that a text field in the main window is displaying a number instead of a letter when 'writing' from a string.
I'm reading a vector into a Rich Text Box, but instead of AA, the text box displays 6565. I understand that 65 is the character code for 'A', but haven't been able to find how to get the window to display the letter.
This is the code that creates the vector. Debugging this piece shows that the data loads properly. This snippet of code is part of a 'portfolio' class.
vector<string> tickers;
string cell;
string line;
ifstream d ("file.csv"); //The file contents look like: AA,AAPL,BAC and so on
if (d.is_open()) {
getline(d,line);
stringstream line2(line);
for (ci=0; ci<c; ci++) { //c is known and is the number of 'columns' in the file I am reading.
getline(line2,cell,',');
tickers.push_back(cell);
}
}
The rich text box's name is "Results2".
The code to set the Text property of the rich text box is:
portfolio p;
int n = p.tickers.size();
for (int i=0; i<n; i++) {
for (int j=0; j<p.tickers[i].size(); j++) {
Results2->Text += p.tickers[i][j];
}
Results2->Text += "\n";
}
I know that the "Results2->Text" bit is correct, because the data is getting there just fine. The problem is, I end up with:
6565
65658076
instead of:
AA
AAPL
What am I doing wrong?
Thanks in advance for any help!
It may be a Marshalling issue between your C++/STL string you are manipulating and C++/CLI String that the RichTextBox is taking.
EDIT
The problem is that you are putting the characters one by one into RichTextBox::Text, but this field expects a System::String, so each character is converted as an integer value to a String before being put in the textbox. That's why you get the ASCII numbers instead of the string.
Since portfolio::tickers is a vector<std::string>, as you specify in the comment, you could convert each std::string to a System::String (via its c_str()) before appending it to RichTextBox::Text:
portfolio p;
for (int i=0; i<p.tickers.size(); i++) {
Results2->Text += gcnew String(p.tickers[i].c_str());
Results2->Text += "\n";
}
You could also use marshal_as, introduced in VC2008 (declared in <msclr/marshal.h>, in the msclr::interop namespace):
Results2->Text += marshal_as<String^>(p.tickers[i].c_str());
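The effect can be imitated in standard C++, as a rough analogy only: this is not C++/CLI, and the .NET conversion differs in detail. Appending each char through an integer-to-string conversion gives the character codes, while appending the whole string gives the text. Both function names are hypothetical:

```cpp
#include <string>
#include <vector>

// Analogy for the buggy loop: each char is promoted to int and formatted
// as a number, so 'A' becomes "65".
std::string join_as_numbers(const std::vector<std::string>& tickers) {
    std::string out;
    for (const std::string& ticker : tickers) {
        for (char c : ticker) {
            out += std::to_string(c);  // picks the int overload: "65", not "A"
        }
        out += "\n";
    }
    return out;
}

// Analogy for the fix: append the whole string at once.
std::string join_as_text(const std::vector<std::string>& tickers) {
    std::string out;
    for (const std::string& ticker : tickers) {
        out += ticker;
        out += "\n";
    }
    return out;
}
```

For the tickers AA and AAPL, the first function produces the 6565 / 65658076 lines from the question, and the second produces AA / AAPL.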
So lets say I had a string that was
<html>
<head><title>301 Moved Permanently</title><head>
and so on.
I'm using str.find() to find where the title tag starts, and it gives me the correct position, but how would I go about printing just the
301 Moved Permanently
My code:
string requestedPage = page.GetBody(); //Get the body of a page and store as string "requestedPage"
int subFromBeg = requestedPage.find("<title>"); //Search for the <title> tag
int subFromEnd = requestedPage.find("</title>"); //Search for the </title> tag
std::cout << requestedPage; //before
requestedPage.substr( subFromBeg, subFromEnd );
std::cout << requestedPage; //after
requestedPage.substr( subFromBeg, subFromEnd );
should be
requestedPage = requestedPage.substr( subFromBeg, subFromEnd );
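std::string::substr doesn't modify the string; it returns a copy of the requested substring. Note also that its arguments are a start position and a length, not start and end positions, so to get only the title text the start has to be moved past "<title>" (7 characters) and the length computed as the end position minus that start.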
substr is how I would do it. Something like cout << str.substr(str.find("title") + 6, 21); would get you a 21-character string starting at 6 characters after 'title' (hopefully, I counted my indices right, but you get the idea).
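Putting these pieces together, here is a minimal sketch that avoids hard-coded offsets and lengths. The function name extract_title is hypothetical, and returning an empty string when a tag is missing is an assumption about the desired behaviour:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text between <title> and </title>,
// or an empty string if either tag is missing (assumption).
std::string extract_title(const std::string& page) {
    const std::string open = "<title>";
    const std::string close = "</title>";

    std::size_t start = page.find(open);
    if (start == std::string::npos) {
        return "";
    }
    start += open.size();  // move past the opening tag itself

    const std::size_t end = page.find(close, start);
    if (end == std::string::npos) {
        return "";
    }
    // substr takes (position, length), so the length is end - start.
    return page.substr(start, end - start);
}
```

In the question's code this would be used as std::cout << extract_title(requestedPage);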