Processing multi-page PDFs with an OpenCV program


I need to process scanned multi-page PDFs with a program I wrote in C++ using the OpenCV libraries. OpenCV cannot read PDFs, so I currently use pdftk to split the PDF into pages and then run convert -density 300 page##.pdf page##.png to turn each page into a PNG before reading it in with my program.
The issue is that convert takes about 30 seconds on my Raspberry Pi to do this conversion. Is there a faster or simpler way to convert multi-page PDFs into something my C++/OpenCV program can read?

You can try converting the PDF to PNG with one of these:
http://www.foolabs.com/xpdf/
https://github.com/coolwanglu/pdf2htmlEX (it converts PDF to HTML, but it also generates a PNG image for each page, which you can use)
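
As a rough sketch of the Xpdf route: the Xpdf tools include pdftoppm, which rasterizes every page of a PDF in one pass, so pdftk is not needed to split the file first. The helper below only builds the command line. It assumes pdftoppm is installed and on the PATH; the function names, the file names and the 300 dpi value are illustrative and not taken from the question.

```cpp
#include <string>

// Wrap a path in single quotes for a POSIX shell, escaping embedded quotes.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') {
            out += "'\\''";  // close quote, escaped quote, reopen quote
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Build a pdftoppm command line that rasterizes all pages of `pdf` at `dpi`.
// -r sets the resolution and -gray selects grayscale PGM output.
// pdftoppm writes one file per page, each name starting with `prefix`.
std::string buildPdftoppmCommand(const std::string& pdf,
                                 const std::string& prefix,
                                 int dpi) {
    return "pdftoppm -r " + std::to_string(dpi) + " -gray " +
           shellQuote(pdf) + " " + shellQuote(prefix);
}
```

The resulting string can be passed to std::system, and each PGM file it produces can then be loaded with cv::imread. Whether this is quicker than convert on a Raspberry Pi would need to be measured.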

Related

Saving JPG encoded array from a ROS sensor_msgs/CompressedImage to a file in roscpp

I have a Robot Operating System (ROS) .bag file containing .jpg compressed images in the form of sensor_msgs/CompressedImage messages. I have written a roscpp program that can access the raw data in the individual messages, but I'm having a hard time saving the array of raw jpg encoded data into a file.
Unfortunately, the bag files I have are very large and contain thousands of images, and I am working under a time constraint. I tried using rosbag play -i and image_view export to save off the images, but it's way too slow. I also tried using Python, but Python is slow, and I don't have a way to save the images (same problem as in C++).
Essentially, I need a way to prepend a valid jpg header to my data and save it in a file. Any suggestions are appreciated!

Creating an image header for a chunk of data that should already be an image is probably not the right approach. After all, jpegs are complex and the datastream should have all the information needed to decode them... how else would those tools be able to display them to you?
You can get a good idea of whether or not a binary blob contains an image by looking at the first and last bytes. Jpegs start with FF D8 and end with FF D9, for example. Some of the magic numbers for other files can be found here: https://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_numbers_in_files
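
A minimal sketch of that check in C++, followed by writing the bytes to disk unchanged. It assumes the message payload is available as a std::vector of bytes (as the data field of sensor_msgs/CompressedImage is in roscpp); the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// True if the buffer starts with FF D8 and ends with FF D9.
bool looksLikeJpeg(const std::vector<std::uint8_t>& data) {
    const std::size_t n = data.size();
    return n >= 4 &&
           data[0] == 0xFF && data[1] == 0xD8 &&
           data[n - 2] == 0xFF && data[n - 1] == 0xD9;
}

// Write the buffer to `path` byte for byte; returns false on I/O failure.
bool writeBytes(const std::string& path, const std::vector<std::uint8_t>& data) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}
```

If looksLikeJpeg returns true for a message's data, writing that data with writeBytes to a file ending in .jpg should be enough; no extra header is needed.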

How to convert a PDF to image format (jpeg, png,...) using my C program?

I'm working with Visual Studio and OpenCV for image processing, but all my documents are PDFs, so I need to convert them to an image format that OpenCV can read.
This is my code to open an image:
[code screenshot not included]
Is there any way to modify it to open and read PDFs as images?
Many thanks!

You may want to take a look at ImageMagick. Its command-line tool can convert PDFs to images, and it also provides C and C++ libraries (MagickWand and Magick++).

why is my libharu pdf oversized with .png images?

I am creating a PDF with libharu in C++ (compiled as a .cgi) that contains .png images.
The code works, but my PDFs are ridiculously oversized.
Each page contains one image of around 30 KB and around 4 text characters in libharu's system font. A 20-page output file is about 25 MB; if I open it and "print" it to a file from my operating system, it shrinks to 256 KB or so with no visible change to the images.
I think the issue is related to libharu, because another user reports the same thing. He is using PHP, while my C++ code is a compiled .cgi linked to libharu.
Someone else on Stack Overflow has also seen size issues with libharu, but that question does not mention .png, so it may be unrelated.
Code for reference:
WorkingGraphic = HPDF_LoadPngImageFromMem (*gPdfPtr,
                                           PngAssets[AssetIndex],  //Image data ptr
                                           PngSizes[AssetIndex]);  //data length
//Render Appropriate
HPDF_Page_DrawImage (*BlitParams->page,
                     WorkingGraphic,
                     BlitParams->OutputRect->X,
                     BlitParams->OutputRect->Y,
                     BlitParams->OutputRect->Width,
                     BlitParams->OutputRect->Height);
Does anyone know how to drive libharu so that it creates sensibly sized PDFs when you use .png images?

I don't know how to remove a question, but maybe this information will be useful to others anyway.
I may have had the same issue as another user, on whose question I have duplicated this answer.
What I needed to do was enable compression for the PDF, which I had not done.
C code:
HPDF_SetCompressionMode (pdf, HPDF_COMP_ALL);
I had not done enough research to know that the PDF format does not natively support .png (or, if it has since been updated to do so, libharu still doesn't). This option tells libharu to use zlib to compress everything it can, including your images.
The result is not perfect (you will still see a size difference if you zip the output .pdf), but it is acceptable for my use case.
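
To see why output without compression can be so large, it helps to estimate the size of a decoded image. The sketch below assumes the image ends up in the PDF as raw 8-bit samples with no compression, which is consistent with the file sizes reported in the question but is not confirmed there. The function name and the 640x480 RGB dimensions are hypothetical.

```cpp
#include <cstdint>

// Bytes needed for an uncompressed image with 8 bits per channel.
std::uint64_t rawImageBytes(std::uint64_t width,
                            std::uint64_t height,
                            std::uint64_t channels) {
    return width * height * channels;
}
```

For example, rawImageBytes(640, 480, 3) is 921600 bytes, so 20 pages with one such image each would come to roughly 18 MB before compression, even if every source PNG is only about 30 KB.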

If you don't need the full-size image in the PDF, you can scale it down with the GDI+ APIs to the size at which it will appear in the PDF.
Save the scaled PNG to a temporary file and pass that file to Haru PDF instead. This will reduce the size of the PDF file.
The trade-off is that the image will look pixelated when the viewer zooms in.

Reading PDF file using OpenCV

Is it possible to convert a PDF file to a cv::Mat?
I know that a PDF file is generally a collection of vector objects, but given a required resolution, is there any tool that can do such a conversion?

OpenCV doesn't support the PDF format at all, so you need to convert each PDF page to an image using another library. See this discussion: Open source PDF library for C/C++ application?
This question is also similar to yours: What C++ library can I use to convert a PDF to an image on windows?

Converting CfbsBitmap format to JPG

I am writing a mobile application using Symbian and Qt 4.7.
I have CFbsBitmap data. When I save it, the result is an .mbm file, and I would like to save it in JPG format instead.
Can anyone help me with this? I am quite new to Qt and Symbian.

QImage provides a save function that takes a format parameter. Just specify the JPG format.
