Gumbo HTML text inside A - c++

I'm using Gumbo to parse a web page encoded in CP1251. I convert the text to UTF-8 and pass it to the Gumbo parser. I have a problem getting the text inside an A link with
node->v.text.text
I get strange symbols in the output, although the source is displayed correctly in the console.
I am using Qt 5.2 and libiconv for the conversion.
Do I need to convert the node text to the local code page, or am I doing something else wrong?
Getting the page in CP1251
QByteArray barrData = pf->getData();
size_t dstlen = 1048576;
char buf[dstlen];
memset((char*)buf, 0, dstlen);
char* pIn = barrData.data();
char* pOut = (char*)buf;
size_t srclen = barrData.size();
iconv_t conv = iconv_open("UTF-8", "CP1251");
iconv(conv, &pIn, &srclen, &pOut, &dstlen);
iconv_close(conv);
GumboOutput* output = gumbo_parse(buf);
parsePage(output->root);
gumbo_destroy_output(&kGumboDefaultOptions, output);
Parsing
if (node->v.element.tag == GUMBO_TAG_DIV && (_class = gumbo_get_attribute(&node->v.element.attributes, "class")))
{
if (QString(_class->value) == "catalog-item-title")
{
qDebug() << "parsePage: found product, parsing...";
GumboVector* children = &node->v.element.children;
for (int i = 0; i < children->length; ++i)
{
GumboNode* node = static_cast<GumboNode*>(children->data[i]);
GumboAttribute* href;
GumboAttribute* id;
if (node->v.element.tag == GUMBO_TAG_A &&
(href = gumbo_get_attribute(&node->v.element.attributes, "href"))
)
{
char buf[1024];
memset(buf, 0, 1024);
int i = node->v.text.original_text.length;
memcpy(buf, node->v.text.original_text.data, i);
QString strTitle = buf;
Q_ASSERT(node->v.text.original_text.length > 0);
qDebug() << "parsePage: found product" << strTitle << href->value;
break;
}
}
}
}
Source page text:
<div class="catalog-item-title">Измир 2</div>

I finally worked it out from the examples: the text is contained in a child node of the A element.
if (node->v.element.tag == GUMBO_TAG_A &&
(href = gumbo_get_attribute(&node->v.element.attributes, "href"))
)
{
QString strTitle;
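// The link text lives in a child text node, so make sure a child exists first.
if (node->v.element.children.length > 0)
{
GumboNode* title_text = static_cast<GumboNode*>(node->v.element.children.data[0]);
if (title_text->type == GUMBO_NODE_TEXT)
{
strTitle = title_text->v.text.text;
}
}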
qDebug() << "parsePage: found product" << strTitle << href->value;
break;
}

Related

Protobuf C++: ParseFromString returns false when the message is larger

I'm using the method PROTOBUF_ATTRIBUTE_REINITIALIZES bool ParseFromString(ConstStringParam data); to parse a message in my C++ code. The code is as simple as this:
model::BroadcastMessage msg;
if (msg.ParseFromString(message.toStdString())) {
qDebug() << "Parse ok";
} else {
qDebug() << "Parse failed";
}
message is of type QString.
The model looks like this:
message BroadcastMessage
{
enum BRC_MESSAGE_ID{
StreamURL = 0;
AllStreamURL = 1;
AllSavedURL = 2;
}
BRC_MESSAGE_ID BrcMessageID = 1;
bytes Body = 2;
bool JSON = 3;
string Data = 4;
string Token = 5;
}
message ListAllSavedVideoURL {
repeated LiveVideoURL CurrentUrl = 1;
}
message LiveVideoURL {
string StreamURL = 1;
int32 NumberCurrentViewing = 2;
}
It parses OK when the input is short, like this:
"\b\u0002\u0012\\\n\u0015\n\u0013saved_x_22-15-34.ts\n\u0015\n\u0013saved_x_22-16-06.ts\n\u0015\n\u0013saved_x_22-16-39.ts\n\u0015\n\u0013saved_x_22-17-04.ts" "" "hoavq.broadcast"
But it fails to parse when the data is longer, like this:
"\b\u0002\u0012?\u0001\n\u0015\n\u0013saved_x_22-15-34.ts\n\u0015\n\u0013saved_x_22-16-06.ts\n\u0015\n\u0013saved_x_22-16-39.ts\n\u0015\n\u0013saved_x_22-17-04.ts\n\u0015\n\u0013saved_x_22-17-38.ts\n\u0015\n\u0013saved_x_22-18-03.ts" "" "hoavq.broadcast"
Both inputs have the same structure.
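One detail in the two dumps may be worth checking. After the tag byte \u0012 (field 2, length-delimited), the short input has a single length byte, '\\' (0x5C = 92), while the long input shows "?\u0001" where a two-byte varint length would be expected. Each list entry is 23 bytes, so four entries give 92 and six give 138, and 138 encodes as the bytes 0x8A 0x01. A 0x8A byte is not valid text on its own, so the '?' would be consistent with the payload being damaged while it is held in a QString; that is an inference from the dumps, not something the question confirms. The sketch below uses a hypothetical helper, encodeVarint, to show the base-128 varint encoding of the two lengths.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the protobuf API): encodes an unsigned
// value as a base-128 varint, least significant 7-bit group first.
std::vector<std::uint8_t> encodeVarint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        // Low 7 bits plus the continuation bit.
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    // Final group, continuation bit clear.
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}
```

With this helper, a length of 92 fits in one byte below 0x80, while 138 needs two bytes and the first one has its high bit set. If that is what happens here, keeping the serialized data in a QByteArray or std::string rather than a QString may be worth trying.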

How to fill a label with text read from a .msg file

I want to fill a label with some text read from a .msg file. I think I've managed to read from the file, but now I need to fill the label with what I've read.
void __fastcall TErrorPanel::lblOpMsgErClick(TObject *Sender)
{
char OutBuf[500];
char OutBuf2[500];
static int Func_exec = 0;
if (Func_exec == 0)
{
Func_exec = 1;
if (tpgm_cfg.TestMod.RejectModule == 0)
{
GetMessage(1, SYSMSGIMG, OutBuf, gPathMsgFile);
}
else
{
GetMessage(2, SYSMSGIMG, OutBuf2, gPathMsgFile);
}
Func_exec = 0;
}
return;
}
This is the custom GetMessage function. At the moment it shows MsgNf; it looks like it isn't picking up the content of OutBuf.
void GetMessage(int Code,char *Section, char *OutBuf, char *PathMsgFile, int InsErrCode)
{
char buff[512],Msg[500],sCode[10];
char *p;
int cmpres;
long rOffset = 0;
itoa(Code,sCode,10);
::GetPrivateProfileString(Section, sCode, "MsgNf", buff, sizeof(buff), PathMsgFile);
rOffset = ::GetPrivateProfileInt(Section, "Offset", 0, PathMsgFile);
cmpres=strcmp("MsgNf",buff);
if (cmpres==0)
{
sprintf(Msg,"Message[%ld]: Not Found !",Code + rOffset);
}
do
{
p = strchr (buff , '|');
if(p != NULL)
{
*p = '\n';
}
}while(p != NULL);
strcpy(OutBuf, buff);
if (strcmpi(SYSERRORMSG,Section)==0)
{
sprintf(buff,"Error[%ld]-%s", Code + rOffset, OutBuf);
strcpy(OutBuf,buff);
rmLastErrorCode = Code;
}
return;
}
This is how you generally set the text to display:
label_name->Caption = "Text to display";
However, I don't know how to fit that into the code you've shown.
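As a rough sketch of how the pieces might fit together: the click handler in the question fills OutBuf or OutBuf2 but never assigns either one to a label. The snippet below shows the '|' to line-break conversion from GetMessage as a small standalone function, with the Caption assignment only as a comment. The helper name pipesToNewlines and the label name lblOpMsgEr (guessed from the handler name) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper mirroring the do/while loop in GetMessage:
// every '|' in the raw message becomes a line break.
std::string pipesToNewlines(std::string raw) {
    std::replace(raw.begin(), raw.end(), '|', '\n');
    return raw;
}

// In the click handler, the filled buffer could then be displayed, e.g.:
//   lblOpMsgEr->Caption = pipesToNewlines(OutBuf).c_str();
// (lblOpMsgEr is an assumed label name, not taken from the question.)
```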

How to use Native Host Message in Native Application For Chrome Native Messaging Extension

I am working on a Chrome extension with native host messaging. I am not able to use the message text in my host application. Everything works fine, from establishing the connection to getting the response in the extension.
I need to use the message text in my application as a plain text type (string/char) for further processing. I know the message is UTF-8 encoded; I tried to decode it but still have a problem. Can anyone help me out?
When I decode the message, the Chrome extension console shows the error "Error when communicating with the native messaging host." The same error appears if I use the message text after "cout". Sending and receiving the message directly works fine for me.
The code is something like this:
std::string mycode(std::string data){
data= data+"abc"; //changing text to any thing.
cout<< data;
anotherFunction(data);//killing processes using string data
}
int main(int argc, char* argv[])
{
std::cout.setf( std::ios_base::unitbuf );
while (true)
{
unsigned int ch, inMsgLen = 0, outMsgLen = 0;
std::string input = "", response = "";
std::cin.read((char*)&inMsgLen, 4);
if (inMsgLen == 0)
{
break;
}
else
{
for (int i=0; i < inMsgLen; i++)
{
ch = getchar();
input += ch;
}
}
response.append("{\"echo\":").append(input).append("}");
outMsgLen = response.length();
std::cout.write((char*)&outMsgLen, 4);
std::cout << response;
cout<< input;
//using "input" variable for further use
mycode(input);
}
return 0;
}
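That's wrong; did you read the docs? Everything written to stdout has to be a length-prefixed message, so the extra cout calls break the protocol.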
Try something like this...
_setmode( _fileno( stdin ), _O_BINARY );
_setmode( _fileno( stdout ), _O_BINARY );
char cBuffer[65536] = {0};
while(true)
{
unsigned int uiSize = 0;
std::cin.read((char*)&uiSize, sizeof(unsigned int));
if(uiSize != 0 && uiSize < 65536)
{
memset(cBuffer, 0, 65536);
std::cin.read(cBuffer, uiSize);
std::string strIn(cBuffer);
std::string strOut = "{\"result\":\"This is a Test\"}";
uiSize = strOut.length();
std::cout << char(((uiSize>>0) & 0xFF));
std::cout << char(((uiSize>>8) & 0xFF));
std::cout << char(((uiSize>>16) & 0xFF));
std::cout << char(((uiSize>>24) & 0xFF));
std::cout << strOut.c_str();
}
else
break;
}
You need to set stdin and stdout to binary mode; otherwise things like this can happen:
if a byte with the value 00011010 (Ctrl+Z, decimal 26) is present, it will be treated as EOF and end the communication. :)
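The framing itself can be sketched independently of the console setup. Each message is the UTF-8 JSON text preceded by its 32-bit length in native byte order. The helpers frameMessage and readMessage and the 1 MiB cap kMaxMessageSize are names and limits made up for this sketch, not part of any Chrome API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Assumed upper bound for this sketch only.
const std::uint32_t kMaxMessageSize = 1024 * 1024;

// Hypothetical helper: prefixes a JSON payload with its 32-bit length
// in native byte order.
std::string frameMessage(const std::string& json) {
    assert(json.size() <= kMaxMessageSize);
    const std::uint32_t len = static_cast<std::uint32_t>(json.size());
    std::string framed(sizeof(len), '\0');
    std::memcpy(&framed[0], &len, sizeof(len));
    framed += json;
    return framed;
}

// Hypothetical helper: reads one framed message from a binary stream.
// Returns false on EOF, on an oversized length, or on a truncated payload.
bool readMessage(std::istream& in, std::string& payload) {
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof(len))) return false;
    if (len > kMaxMessageSize) return false;
    payload.assign(len, '\0');
    if (len == 0) return true;
    return static_cast<bool>(in.read(&payload[0], len));
}
```

In a real host the stream would be std::cin, still switched to binary mode on Windows, and each reply would be written with a single std::cout << frameMessage(reply) and nothing else. The helpers can be tried out against a std::istringstream.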

libzip can't close file

I'm currently using libzip in a C++11 program to extract the contents of a compressed file and store them into a data structure that will also hold metadata related to the file.
I'm using the following function to extract the zip file and get the content of each file in it:
void explodeArchive(const string& path, vector<ZipFileModel>& files) {
int error = 0;
zip *zip = zip_open(path.c_str(), 0, &error);
if (zip == nullptr) {
throw logic_error("Could not extract content of file " + path);
}
const zip_int64_t n_entries = zip_get_num_entries(zip, ZIP_FL_UNCHANGED);
for (zip_int64_t i = 0; i < n_entries; i++) {
const char *file_name = zip_get_name(zip, i, ZIP_FL_ENC_GUESS);
struct zip_stat st;
zip_stat_init(&st);
zip_stat(zip, file_name, ZIP_FL_NOCASE, &st);
char *content = new char[st.size];
std::cerr << file_name << std::endl;
zip_file *file = zip_fopen(zip, file_name, ZIP_FL_NOCASE);
const zip_int64_t did_read = zip_fread(file, content, st.size);
if (did_read <= 0) {
continue;
}
if (strlen(content) < st.size) {
LOG(WARNING)<< "File " << file_name << " is truncated.";
}
if (strlen(content) > st.size) {
content[st.size] = '\0';
}
ZipFileModel model;
model.name = string(file_name);
model.content = string(content);
model.order = -1;
files.push_back(model);
zip_fclose(file);
delete[] content;
}
zip_close(zip);
}
My problem is that I get random segmentation faults with gdb pointing to zip_fclose(file);:
Program received signal SIGSEGV, Segmentation fault.
0x00000001001ef8a0 in zip_source_close (src=0x105001b00) at /Users/xxx/Projects/xxx/xxx/src/libzip/zip_source_close.c:48
48 (void)src->cb.l(src->src, src->ud, NULL, 0, ZIP_SOURCE_CLOSE);
What's the best way to debug this? As I said it happens intermittently so it's hard to pin down the exact cause.
You aren't closing the zip_file when there's nothing to read.
First you open the file inside:
zip_file *file = zip_fopen(zip, file_name, ZIP_FL_NOCASE);
Then try to read something:
const zip_int64_t did_read = zip_fread(file, content, st.size);
and if there's nothing to read you continue and the file is never closed.
if (did_read <= 0) {
continue;
}
So, just add the close (and free the buffer, which would otherwise leak on this path):
if (did_read <= 0) {
zip_fclose(file);
delete[] content;
continue;
}
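Separately from the unclosed file, the buffer handling in the question may be worth a look. A buffer of st.size bytes filled by zip_fread is not null-terminated, so strlen(content) and string(content) can read past the end of the allocation, and content[st.size] = '\0' writes one byte past it. Out-of-bounds accesses like these can also produce intermittent crashes, although whether they are behind this particular segfault is only a guess. A minimal sketch of copying by explicit length, using the hypothetical helper bytesToString:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies exactly n bytes from the buffer, including
// any embedded '\0', without relying on a terminator being present.
std::string bytesToString(const std::vector<char>& buffer, std::size_t n) {
    assert(n <= buffer.size());
    return std::string(buffer.data(), n);
}
```

In the question's loop, the equivalent would be to build model.content from the pointer and the byte count returned by zip_fread instead of calling strlen on the buffer.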

Go to other logical drives and continue searching for a file

I have a program that searches for files with a particular extension (.apk) on a particular logical drive (C:). My system has three more partitions (D:, E: and F:), and these also contain .apk files. I want my program to search those logical drives as well. How can I do this? Any suggestions would help; I have been trying since this morning. Here is my code:
int SearchDirectory(std::vector<std::string> &refvecFiles,
const std::string &refcstrRootDirectory,
const std::string &refcstrExtension,
bool bSearchSubdirectories = true)
{
std::string strFilePath; // Filepath
std::string strPattern; // Pattern
std::string strExtension; // Extension
HANDLE hFile; // Handle to file
WIN32_FIND_DATA FileInformation; // File information
strPattern = refcstrRootDirectory + "\\*.*";
hFile = FindFirstFile(strPattern.c_str(), &FileInformation);
if(hFile != INVALID_HANDLE_VALUE)
{
do
{
if(FileInformation.cFileName[0] != '.')
{
strFilePath.erase();
strFilePath = refcstrRootDirectory + "\\" + FileInformation.cFileName;
if(FileInformation.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
{
if(bSearchSubdirectories)
{
// Search subdirectory
int iRC = SearchDirectory(refvecFiles,
strFilePath,
refcstrExtension,
bSearchSubdirectories);
if(iRC)
return iRC;
}
}
else
{
// Check extension
strExtension = FileInformation.cFileName;
strExtension = strExtension.substr(strExtension.rfind(".") + 1);
if(strExtension == refcstrExtension)
{
// Save filename
refvecFiles.push_back(strFilePath);
}
}
}
} while(FindNextFile(hFile, &FileInformation) == TRUE);
// Close handle
FindClose(hFile);
DWORD dwError = GetLastError();
if(dwError != ERROR_NO_MORE_FILES)
return dwError;
}
return 0;
}
int main()
{
int iRC = 0;
std::vector<std::string> vecAPKFiles;
//std::vector<std::string> vecTxtFiles;
// Search 'c:' for '.apk' files including subdirectories
iRC = SearchDirectory(vecAPKFiles, "c:", "apk");
if(iRC)
{
std::cout << "Error " << iRC << std::endl;
return -1;
}
// Print results
for(std::vector<std::string>::iterator iterAvi = vecAPKFiles.begin();
iterAvi != vecAPKFiles.end();
++iterAvi)
std::cout << *iterAvi << std::endl;
TCHAR szDrive[] = (" A:");
DWORD uDriveMask = GetLogicalDrives();
while(uDriveMask)
{
// Use the bitwise AND: 1 - available, 0 - not available
if(uDriveMask & 1)
printf("%s ", (const char *)szDrive);
// increment, check next drive
++szDrive[1];
// shift the bitmask binary right
uDriveMask >>= 1;
}
printf("\n ");
// Wait for keystroke
_getch();
return 0;
}
Your drive string char szDrive[] = " A:"; is a bit odd (even with the half-baked TCHAR stuff removed). I'd use char szDrive[] = "A:\\"; instead and increment ++szDrive[0];. You can then pass szDrive to SearchDirectory() for every drive whose bit is set in the mask.
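A minimal sketch of that loop, assuming the mask comes from GetLogicalDrives() (bit 0 is A:, bit 1 is B:, and so on). The helper name drivesFromMask is made up for illustration, and the mask is a plain parameter so the snippet does not depend on the Windows headers. It returns roots without a trailing backslash, matching the "c:" form that the question already passes to SearchDirectory().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: converts a logical-drive bitmask into root names
// such as "C:". Bit 0 corresponds to A:, bit 1 to B:, and so on.
std::vector<std::string> drivesFromMask(unsigned long mask) {
    std::vector<std::string> drives;
    for (char letter = 'A'; letter <= 'Z' && mask != 0; ++letter, mask >>= 1) {
        if (mask & 1UL) {
            drives.push_back(std::string(1, letter) + ":");
        }
    }
    return drives;
}
```

In main(), each returned root could then be searched in turn, for example by calling SearchDirectory(vecAPKFiles, root, "apk") inside a loop over drivesFromMask(GetLogicalDrives()).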