How can I read and parse a URL's XML or JSON content with C++? - c++

I can easily do this with JQuery or PHP but I have a project for my Intro to C++ class and I thought it'll be pretty cool if I could mix C++ with some APIs like twitter, google, yahoo etc.
Could you tell me if there is a class ( I know OOP ) I can use to read an external XML or JSON file. The program has to run on windows and linux so I can't use commands.
Or, if this can't be done, what other cool project would you do? Thanks in advance.

XML: xerces(C++) http://xerces.apache.org/xerces-c/
JSON:http://sourceforge.net/projects/jsoncpp/

Related

How can I convert an AIML file to CSV file?

I am trying to make a chatbot android application. Please tell me how can I convert my .aiml file to .aiml.csv
If you want to know the process, you may go through Google's Program-AB and understand how it processes all tags. Also, it is build in Java so you can copy these classes to your application.
https://code.google.com/archive/p/program-ab/source

About the intrepreter for AIML

I tried to build a chatbot in AIML. I downloaded the codes from http://nlp-addiction.com/chatbot/mathbot/ but couldn't get the idea about how to run the program. Please help me.
An AIML file isn't program code, it's a data file (much like any other xml file).
You need to use an interpreter like Program-AB to load and use the file to answer queries.
If you just want to test the contents and formatting of the aiml file, you could use Pandorabots and load the file into a blank bot fairly easily.
Yes, AIML file isn't program code. It's just like a data format. You can learn about it more from here : http://www.alicebot.org/aiml.html
AIML is a data encoding format that tells the bot when to do what to do. Many interpreters can be used to interpret the aiml tags.
One of them is PyAIML which is python based interpreter fairly simple to use.

Extracting key words from HTML to C++ under linux

I am working on a simple client-server project. Client is written in Java, it sends key words to C++ server written under Linux and recives a list of URLs with best ranks ( depending on number of occurrences of key words ). Server's job is to go through some URLs in search of key words and return best-fitting URLs. And now the problem is that I have to parse HTML sites to find occurrences of key words, plus I need to extract links from visited page to search on them as well. And my question is what library can I use to do that? Remember only C++ linux libraries are suitable for me. There were some similar topics, so I tried to go through most of them, but some of libraries parse only html files and I don't want to download every site I visit, but parse it on the fly and just store it's rank and url. Some of them look a bit complicated to me - for instance firstly parsing HTML to XML or something else and then finally work on the results with C++. Is there something simple and sufficient to do what I need it to do? Any advise will be appreciated.
I don't think regular expressions are appropriate for HTML parsing. I'm using libxml2, and I enjoy it very much - easy to use, portable and lightning fast.
To get URLs from the web using C/C++ you could use the libcurl library. To parse URLs and other not too easy stuff from the site you can use a regex library.
Separating the HTML tags from the real content can also be done without the use of a library.
For more advanced stuff one could use Qt which offers classes such as QWebPage (which uses WebKit) that allows one to access the DOM-Model of the page and extract individual HTML objects (e.g. single cells of a table) rather easyly.
You can try xerces-c. It's a powerful library for xml parsing. It support xml reading on the fly, dom and sax parsing.

Help programmatically add text to an existing PDF

I need to write a program that displays a PDF which a third-party supplies. I need to insert text data in to the form before displaying it to the user. I do have the option to convert the PDF in to another format, but it has to look exactly like the original PDF. C++ is the preferred language of choice, but other languages can be investigated (e.g. C#). It need to work on a Windows desktop machine.
What libraries, tools, strategies, or other programming languages do you suggest investigate to accomplish this task? Are there any online examples you could direct me to.
Thank-you in advance.
What about PoDoFo:
The PoDoFo library is a free, portable
C++ library which includes classes to
parse PDF files and modify their
contents into memory. The changes can
be written back to disk easily. The
parser can also be used to extract
information from a PDF file (for
example the parser could be used in a
PDF viewer). Besides parsing PoDoFo
includes also very simple classes to
create your own PDF files. All classes
are documented so it is easy to start
writing your own application using
PoDoFo.
iTextSharp is a free library that you can use in .Net applications. Take a look at the iText page - that is for the iText project, which is a Java library. iTextSharp is part of that project, and is a port to C# and .Net.
Consider Python It have a lot PDF librarys (both creating and extracting) eg:
http://pypi.python.org/pypi/pdfsplit/0.4.2
http://pypi.python.org/pypi/JagPDF/1.4.0
http://pypi.python.org/pypi/pdfminer/20091129
http://pypi.python.org/pypi/podofo/0.0.1
http://pypi.python.org/pypi/pyFPDF/1.52
There are also good tools for using C/C++ code in Python and to create .exe form Python scripts. If you decide to use different language consider Python as prototyping language!

Load Excel data into Linux / wxWidgets C++ application?

I'm using wxWidgets to write cross-plafrom applications. In one of applications I need to be able to load data from Microsoft Excel (.xls) files, but I need this to work on Linux as well, so I assume I cannot use OLE or whatever technology is available on Windows.
I see that there are many open source programs that can read excel files (OpenOffice, KOffice, etc.), so I wonder if there is some library that I could use?
Excel files it needs to support are very simple, straight tabular data. I don't need to extract any formatting except column/row position and the data itself.
Suggestedd reference: What is a simple and reliable C library for working with Excel files?
I came across other libraries (chicago on sf.net, xlsLib) but they seem to be outdated.
jrh
I can say that I know of a wxWidgets application that reads Excel .xls and .xlsx files on any platform. For the .xlsx files we used an XML parser and zip stream reader and grab the data we need, pretty easy to get going. For the .xls files we used: ExcelFormat, which works well and we found the author to be very generous with his support.
Maybe just some encouragement to give it a go? It was a couple of days work to get working.
Maybe http://www.libxl.com/ can help ?
I think that it is not something easy to do. xls files are quite complex and it is a proprietary format.
Maybe this is a stupid idea but why don't you upload and access your doc with Google docs. There are some apis to access your doc.
2 potential problems:
- Your app needs internet access
- Currently there is no C++ api.
But there are api for several languages including python see http://code.google.com/intl/fr/apis/gdata/articles/python_client_lib.html