Could anyone please provide sample C/C++ code to read and edit PDF metadata?
If the metadata is XMP, what else do I need to do?
If it's XMP, I think there's an SDK available from Adobe. But beware, PDF metadata has a long history and isn't only stored in XMP.
You might be best off using a library that allows PDF manipulation. There are several commercial ones available. I have no idea whether there's something usable available for free.
https://github.com/hfiguiere/exempi -- C-library to read and write XMP.
The XMP SDK from Adobe does not read or write the metadata of PDF files. PDF reading and writing is a complicated task. Your best bet is to use a third-party PDF library, or commands like pdfleo.
If you haven't tried XMPToolkit yet, give it a shot and see if it meets your needs.
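Since PDF metadata can live in an XMP packet, a quick way to check whether a given file carries one is to scan the raw bytes for the `<?xpacket begin=` and `<?xpacket end=` processing instructions that wrap a packet. This is only a minimal sketch under assumptions: `find_xmp_packet` is a hypothetical helper name, it only finds packets whose metadata stream is stored uncompressed and unencrypted, and it ignores the older Info dictionary entirely. For editing, a real PDF library is still the safer route.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies the first XMP packet found in `data` into
// `packet` and returns true, or returns false when no packet is found.
// Assumption: the metadata stream is neither compressed nor encrypted.
bool find_xmp_packet(const std::string& data, std::string& packet) {
    const std::string begin_marker = "<?xpacket begin=";
    const std::string end_marker = "<?xpacket end=";

    // Locate the opening processing instruction of the packet.
    const std::size_t start = data.find(begin_marker);
    if (start == std::string::npos) return false;

    // Locate the closing processing instruction after it.
    const std::size_t end_pi = data.find(end_marker, start);
    if (end_pi == std::string::npos) return false;

    // The packet ends at the "?>" that terminates the closing instruction.
    const std::size_t close = data.find("?>", end_pi);
    if (close == std::string::npos) return false;

    packet = data.substr(start, close + 2 - start);
    return true;
}
```

To use it, read the whole file in binary mode into a `std::string`, call the helper, and pass the resulting packet to an XMP parser such as exempi.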
Basically, I want to be able to pass data between Excel cells and my C++ program. I don't have any experience in Excel/C++ interactions and I haven't been able to find a coherent explanation or documentation on any websites. If someone could link me some references or provide one themselves it would be much appreciated. Thanks.
If this is for a Windows system, you could always use one of the available managed Excel libraries, such as OfficeWriter or Aspose.
There might also be similar libraries specifically for C++; I know we (OfficeWriter) used to make one.
Edit: Looks like there are a few out there, like LibXL and BasicExcel.
If the application will run on an end user machine with Excel installed, you can easily use the Excel interop and keep Excel hidden.
In addition to LibXL and BasicExcel mentioned by smoore, there is:
ExcelFormat Library is an improved version of the BasicExcel library and will allow you to read and write simple values. It is free.
xlslib will also read and write simple values, though I have not tried it. It is also free.
Number Duck is a commercial library that I have written. It supports reading and writing values, formulas and pictures. The website has examples of how to use the features.
As part of my application, my client has requested that I include an automated e-mailing system. As part of this system, I generate HTML code and use automation to send it via Outlook.
However, they also require a PDF copy of the HTML document to be sent as an attachment. My initial attempts involved using libHaru, which proved difficult to use efficiently, as I was required to create the PDF document from scratch, which required computation of the position of each of the lines in a table, and positioning of all the text, etc.
I was wondering if there would be a way to programmatically convert HTML code (or an HTML file if need be) into a PDF document either by using Win32/MFC itself or an external library.
Thanks in advance!
EDIT: Just to clarify, I am looking for solutions which minimize external dependencies.
You should evaluate this utility, wkhtmltopdf:
http://code.google.com/p/wkhtmltopdf/
You can call it from the command line without the need to run a setup.
I use it by generating my output documents as HTML and then calling ShellExecute(...) to convert them to PDF. It's great!
Internally it uses WebKit and Qt, so compatibility with modern HTML is good.
Hope it helps.
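To make the command-line route concrete, here is a minimal sketch of invoking the tool from C++. It relies on the basic `wkhtmltopdf <input> <output>` invocation; the function names are hypothetical, the executable is assumed to be on the PATH, and `std::system` is used instead of ShellExecute only to keep the example portable. On Windows, ShellExecute or CreateProcess can be substituted for the last step.

```cpp
#include <cstdlib>
#include <string>

// Wraps a path in double quotes so that spaces survive the shell.
// Assumption: the path itself contains no double-quote characters.
std::string quote_path(const std::string& path) {
    return "\"" + path + "\"";
}

// Builds the command line: wkhtmltopdf "<input.html>" "<output.pdf>".
// Assumption: the wkhtmltopdf executable can be found on the PATH.
std::string build_wkhtmltopdf_command(const std::string& html_path,
                                      const std::string& pdf_path) {
    return "wkhtmltopdf " + quote_path(html_path) + " " + quote_path(pdf_path);
}

// Runs the conversion; returns true when the command reports exit status 0.
bool convert_html_to_pdf(const std::string& html_path,
                         const std::string& pdf_path) {
    const std::string command = build_wkhtmltopdf_command(html_path, pdf_path);
    return std::system(command.c_str()) == 0;
}
```

A call such as `convert_html_to_pdf("report.html", "report.pdf")` would then produce the attachment next to the generated HTML, provided the tool is installed.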
I'd take a look at PDF Creator, which can be used as a COM object (that acts pretty much like a printer). I haven't used it to print HTML, so I'm not sure, but my guess is that you'll probably end up having to instantiate a web browser control to render the HTML, and then feed it from there to the PDF control.
Some possible answers are in this thread:
C++ Library to Convert HTML to PDF?
Not sure if they will satisfy your particular requirements, but these might at least get you started.
Edit:
Some other possible options here.
It's not MFC, but you can try QtWebKit. It can render HTML and export it to PDF, PNG or JPEG.
Have you ever worked with JPEG2000 and/or GML?
I'm reading documentation about GMLJP2 but I can't find any libraries that implement it. As far as I know, it should be possible to have a GML file within a jp2 file (that is, a single file containing both things).
Also, it's difficult to get a viewer that integrates GML and JPEG2000.
Any information regarding how to work with GML or JPEG2000 is welcome ;-)
Many thanks!
PS: I want to work with C or C++, but the language doesn't matter yet.
"Also, it's difficult to get a viewer that integrates GML and JPEG2000."
As I understand it, GML is just a block of XML metadata inserted into the JPEG2000 file. Any decent decoder should be able to extract it.
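To illustrate why extraction is straightforward, here is a minimal sketch that walks the box structure of a JP2 file and collects the payload of every XML box. A JP2 box starts with a 4-byte big-endian length and a 4-byte type; a length of 1 means an 8-byte extended length follows, and a length of 0 means the box runs to the end of the data. As far as I know, GMLJP2 places its XML boxes inside association (`asoc`) boxes, so the sketch descends into those. The function names are hypothetical and the code does no validation of the XML itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Reads an n-byte big-endian unsigned integer starting at p.
static std::uint64_t read_be(const unsigned char* p, int n) {
    std::uint64_t value = 0;
    for (int i = 0; i < n; ++i) value = (value << 8) | p[i];
    return value;
}

// Walks the boxes stored in data[begin, end) and appends the payload of
// every 'xml ' box to `out`, descending into 'asoc' superboxes.
void collect_xml_boxes(const unsigned char* data, std::size_t begin,
                       std::size_t end, std::vector<std::string>& out) {
    std::size_t pos = begin;
    while (end - pos >= 8) {
        std::uint64_t length = read_be(data + pos, 4);
        const unsigned char* type = data + pos + 4;
        std::size_t header = 8;
        if (length == 1) {            // extended 64-bit length follows
            if (end - pos < 16) break;
            length = read_be(data + pos + 8, 8);
            header = 16;
        } else if (length == 0) {     // box extends to the end of the range
            length = end - pos;
        }
        if (length < header || length > end - pos) break;  // malformed box

        const std::size_t body = pos + header;
        const std::size_t body_end = pos + static_cast<std::size_t>(length);
        if (std::memcmp(type, "xml ", 4) == 0) {
            out.emplace_back(reinterpret_cast<const char*>(data + body),
                             body_end - body);
        } else if (std::memcmp(type, "asoc", 4) == 0) {
            collect_xml_boxes(data, body, body_end, out);
        }
        pos = body_end;
    }
}
```

Load the file into a byte buffer and call `collect_xml_boxes(buffer, 0, size, result)`; any GML instance should then be among the returned strings.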
I wrote a JP2 metadata editor and included basic GML support.
You can download it here:
http://j2k-codec.com/mde.html
See the "gml.jp2" file for an example.
Hope it will be helpful.
I need to write a program that displays a PDF which a third party supplies. I need to insert text data into the form before displaying it to the user. I do have the option to convert the PDF into another format, but it has to look exactly like the original PDF. C++ is the preferred language, but other languages can be investigated (e.g. C#). It needs to work on a Windows desktop machine.
What libraries, tools, strategies, or other programming languages do you suggest I investigate to accomplish this task? Are there any online examples you could direct me to?
Thank you in advance.
What about PoDoFo:
The PoDoFo library is a free, portable C++ library which includes classes to parse PDF files and modify their contents into memory. The changes can be written back to disk easily. The parser can also be used to extract information from a PDF file (for example the parser could be used in a PDF viewer). Besides parsing PoDoFo includes also very simple classes to create your own PDF files. All classes are documented so it is easy to start writing your own application using PoDoFo.
iTextSharp is a free library that you can use in .Net applications. Take a look at the iText page - that is for the iText project, which is a Java library. iTextSharp is part of that project, and is a port to C# and .Net.
Consider Python. It has a lot of PDF libraries (for both creating and extracting), e.g.:
http://pypi.python.org/pypi/pdfsplit/0.4.2
http://pypi.python.org/pypi/JagPDF/1.4.0
http://pypi.python.org/pypi/pdfminer/20091129
http://pypi.python.org/pypi/podofo/0.0.1
http://pypi.python.org/pypi/pyFPDF/1.52
There are also good tools for using C/C++ code in Python and for creating .exe files from Python scripts. If you decide to use a different language, consider Python as a prototyping language!
I'm using wxWidgets to write cross-platform applications. In one of my applications I need to be able to load data from Microsoft Excel (.xls) files, but I need this to work on Linux as well, so I assume I cannot use OLE or whatever technology is available on Windows.
I see that there are many open source programs that can read excel files (OpenOffice, KOffice, etc.), so I wonder if there is some library that I could use?
Excel files it needs to support are very simple, straight tabular data. I don't need to extract any formatting except column/row position and the data itself.
Suggested reference: What is a simple and reliable C library for working with Excel files?
I came across other libraries (chicago on sf.net, xlsLib) but they seem to be outdated.
jrh
I can say that I know of a wxWidgets application that reads Excel .xls and .xlsx files on any platform. For the .xlsx files we used an XML parser and a zip stream reader to grab the data we needed, which was pretty easy to get going. For the .xls files we used ExcelFormat, which works well, and we found the author to be very generous with his support.
Maybe just some encouragement to give it a go? It was a couple of days work to get working.
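For the .xlsx route, one small piece worth sketching: in the worksheet XML each cell element carries an A1-style reference in its `r` attribute (for example `r="B3"`), so getting the column/row position comes down to decoding that string. Below is a minimal sketch of such a decoder; `CellPos` and `parse_cell_ref` are hypothetical names, not part of any library, and the code does not guard against absurdly long references.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Zero-based position of a cell in the sheet.
struct CellPos {
    int row;
    int col;
};

// Decodes an A1-style reference such as "B3" into zero-based row and
// column indices. Returns false when the reference is malformed.
bool parse_cell_ref(const std::string& ref, CellPos& out) {
    std::size_t i = 0;
    int col = 0;
    // Column letters form a base-26 number in which A=1 ... Z=26.
    while (i < ref.size() && std::isalpha(static_cast<unsigned char>(ref[i]))) {
        col = col * 26 + (std::toupper(static_cast<unsigned char>(ref[i])) - 'A' + 1);
        ++i;
    }
    // Need at least one letter followed by at least one digit.
    if (i == 0 || i == ref.size()) return false;

    int row = 0;
    for (; i < ref.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(ref[i]))) return false;
        row = row * 10 + (ref[i] - '0');
    }
    if (row == 0) return false;  // rows are numbered from 1 in the file

    out.row = row - 1;
    out.col = col - 1;
    return true;
}
```

With this, a cell whose reference is "B3" lands at row 2, column 1 of whatever table structure the application uses.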
Maybe http://www.libxl.com/ can help?
I think that it is not something easy to do. xls files are quite complex and it is a proprietary format.
Maybe this is a stupid idea, but why don't you upload your document to Google Docs and access it there? There are some APIs to access your document.
Two potential problems:
- Your app needs internet access.
- Currently there is no C++ API.
But there are APIs for several languages, including Python; see http://code.google.com/intl/fr/apis/gdata/articles/python_client_lib.html