How to parse HTML in C++? [closed] - c++

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
How would I go about parsing HTML in C++ in my web server application?

libxml2 has an HTML parser. libxml++ is a wrapper for libxml2, but I'm not sure whether it exposes the HTMLparser functionality.

It mainly depends on what you want to retrieve from the web page.
You can try boost::spirit to create your own parser (or a Yacc/Lex parser).
If you are only looking for simple information in the HTML page, reading it character by character with getc may be good enough.

Hand parsing gets messy, even for relatively trivial cases.
Have you considered a Lexer/Parser, such as Flex/Bison? I highly recommend Antlr - and get AntlrWorks.
A picture is worth a thousand words, so this will tell you why - http://www.antlr.org/works/screenshots/editor.jpg

Related

Why does regex not work for searching json? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a lot of different documents in which I want to find certain JSON; here is an example (regex101).
regex: {\"columns.*]}
I expect to get JSON like this:
{"columns":["1",{"title":"Bad Boys For Life","value":"Bad Boys For Life"},"2","686.5","764.5","874","877","897","937",{"value":"686.5","isMeta":true},{"isMeta":true,"value":"764.5"},{"isMeta":true,"value":"874"},{"isMeta":true,"value":"877"},{"value":"897","isMeta":true},{"isMeta":true,"value":"937"},"850398",{"value":"937","isMeta":true}]}
But that doesn't work, why?
Your regex is missing the global flag, so it only produces one match.
Here's the fixed version: https://regex101.com/r/o0j2Sk/2/
The reason you're getting downvoted, though, is that you should not use regex to parse JSON. It is easy to parse JSON properly in almost any language, so that is strongly recommended to everybody.
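
If the goal is only to locate the JSON text inside a larger document before handing it to a real JSON parser, one alternative to a regex is to track bracket depth. A greedy .* runs to the last ]} on the line, so it can over-match when more text follows the object. The C++ sketch below is one way to do the depth tracking. The function name extract_json_object and the default prefix are assumptions made for this example, and the prefix is assumed to begin with '{'. It only finds where the object ends; it does not validate the JSON.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: returns the balanced JSON object that
// starts at the first occurrence of `prefix` in `text`, or an empty string if
// the prefix is not found or the object is never closed.
// Assumes `prefix` begins with '{'. Locates the object only; no validation.
std::string extract_json_object(const std::string& text,
                                const std::string& prefix = "{\"columns\"") {
    const std::size_t start = text.find(prefix);
    if (start == std::string::npos) return "";

    int depth = 0;           // nesting depth of { } and [ ]
    bool in_string = false;  // true while inside a "..." literal
    bool escaped = false;    // true right after a backslash inside a string

    for (std::size_t i = start; i < text.size(); ++i) {
        const char c = text[i];
        if (in_string) {
            // Brackets inside string literals must not affect the depth.
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') in_string = false;
            continue;
        }
        if (c == '"') {
            in_string = true;
        } else if (c == '{' || c == '[') {
            ++depth;
        } else if (c == '}' || c == ']') {
            if (--depth == 0) return text.substr(start, i - start + 1);
        }
    }
    return "";  // ran out of input before the object was closed
}
```

The returned substring should still go through a proper JSON library before you use any of its values.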

How to implement a parser in C for xml based structured language? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I need to implement a simple parser in C (or C++ if C is discouraged for this application) that will read an XML file containing only a few elements (that's why I called it XML-based), i.e., only 4 root elements and fewer than 5 child elements in total.
Is it easy to implement, or should I use a library like expat? And if it is possible, can someone tell me how I can go about the process?
Use libxml or libexpat.
There are countless examples you can find on the net, or even on StackOverflow.
Have a look at this: How can libxml2 be used to parse data from XML?
Or use Google protobuf; it is not XML-based, but it is fast.

How to approach a C++ parser [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I want to have a go at writing a C++ parser for a formatter I am making.
You can obviously open a file and use getline(..) or get(). Is this a reasonable way of starting things off, and then working out a system using vectors, creating lots of arrays and somehow structuring and processing things from there? For example, say I wanted to find every function in a source file. All functions have the common syntax "(){" once whitespace has been removed, so do you just look for common delimiters to parse the sections out into arrays? I suppose I will learn as I go.
I also assume there are tried and tested ways of doing this, and that I would likely just be reinventing the wheel, as they say.
C++ is a language that is quite hard to parse in the first place. So if you want anything other than really trivial C++ code to be "understood" by your parser, you are definitely better off starting with an existing product.
The Clang frontend library would perhaps be a good starting point.
There are also a number of "source to source" conversion examples based on clang. Here's one of them: http://eli.thegreenplace.net/2012/06/08/basic-source-to-source-transformation-with-clang/

What advantage do we get in using xml as a database in Embedded systems? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I have recently seen that people use XML files as a database to store settings. However, I don't know exactly why this is done. I am from a C/C++, Linux background, so please help me understand this concept. Any simple C/C++ example would help me understand its benefit better.
XML is a very common format with tons of libraries to handle it. Although it isn't the most beautiful format in the world, it can be read and modified both by hand and by a program. You would probably want to use it when the program's configuration is modified by some GUI or tool. If you intend the configuration to be edited manually, it's probably better to choose something else, for example INI. This is why Linux tools rarely use XML, by the way.
As a C++ programmer, you'd probably find the "boost::property_tree" library interesting for dealing with configs. Usage examples are included in its documentation. It also provides several different backends for storing configuration, so you don't have to stick to a single format.

What are the algorithms that can be used as a spell checker for Indic scripts [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I've built an optical character recognition (OCR) system for Sinhala (a language of Sri Lanka), and I've had success to some extent. Now what I need to do is post-processing using dictionary data.
What would be the best approach for changing misspelled words into correct words? Can anyone give suggestions?
I have the dictionary data files in Unicode, and my OCR output is also a Unicode file. I am doing this in C++. I have tried string matching algorithms with no success so far. I want to start with the most relevant approach to this problem. Can anyone help me, please?
Thanks in advance.
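
One common starting point for dictionary-based correction is edit distance: for each OCR word, pick the dictionary word that needs the fewest insertions, deletions and substitutions. The sketch below is only an illustration of that idea, not a method taken from this thread. The names edit_distance and closest_word and the max_distance threshold are made up for the example, and it assumes the text has already been decoded into Unicode code points (std::u32string). Because Sinhala uses combining vowel signs, comparing code points rather than whole syllables is an approximation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance between two sequences of Unicode code points.
// Uses two rolling rows, so memory is proportional to the length of `b`.
std::size_t edit_distance(const std::u32string& a, const std::u32string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t substitute =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitute});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical helper: returns the dictionary word closest to `word`, or
// `word` unchanged if no entry is within `max_distance` edits.
// Linear scan over the dictionary; fine for a sketch, slow for large lists.
std::u32string closest_word(const std::u32string& word,
                            const std::vector<std::u32string>& dictionary,
                            std::size_t max_distance = 2) {
    std::u32string best = word;
    std::size_t best_distance = max_distance + 1;
    for (const auto& candidate : dictionary) {
        const std::size_t d = edit_distance(word, candidate);
        if (d < best_distance) {
            best_distance = d;
            best = candidate;
        }
    }
    return best;
}
```

For OCR output it may help to make substitutions between visually similar characters cheaper than other edits, but that weighting would have to come from your own confusion data.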