Writing a PNG in C++ - c++

I am trying to write some data to a PNG file using C++ with Visual Studio Express 2013 on Windows 7 64-bit. I understand that to do this, I need to use an external library, but here is where I'm having some difficulty.
I tried using LodePNG - it looked simple, lightweight, and easy to use. The problem is, it was TOO simple, and seems to require data in a certain pixel format that doesn't match what I have. I could modify my data to make it compatible with LodePNG, but I'd much rather use a library such as libpng with a bit more flexibility.
However, I don't understand the first thing about building or linking libraries, and libpng has proved to be an absolute nightmare in this. I tried following this guide, and managed to produce "libpng.lib" and "png.h", but when I try to include these in my project (I placed both files in my project directory, added "png.h" to my header files and added "libpng.lib" to the Linker's "Additional Dependencies" field), I got a ton of build errors, notably:
error C1083: Cannot open include file: 'pnglibconf.h': No such file or directory
Can anyone please instruct me as to how to install libpng, direct me to a good guide on the subject (I'm amazed by the lack of guides out there...), or recommend a different (lightweight, easy to install) PNG library? I'm going crazy here.

LodePNG is as easy as you say; I've used it before as well. In case you change your mind and decide to convert the data you have into the right format (assuming it is BGRA, stored bottom row first), the following converts it to the top-down RGBA layout that LodePNG expects:
std::vector<std::uint8_t> PngBuffer(ImageData.size());
for(std::int32_t I = 0; I < Height; ++I)
{
for(std::int32_t J = 0; J < Width; ++J)
{
std::size_t OldPos = (Height - I - 1) * (Width * 4) + 4 * J;
std::size_t NewPos = I * (Width * 4) + 4 * J;
PngBuffer[NewPos + 0] = ImageData[OldPos + 2]; //R is at offset 2 in BGRA
PngBuffer[NewPos + 1] = ImageData[OldPos + 1]; //G is at offset 1
PngBuffer[NewPos + 2] = ImageData[OldPos + 0]; //B is at offset 0 in BGRA
PngBuffer[NewPos + 3] = ImageData[OldPos + 3]; //A is at offset 3
}
}
std::vector<std::uint8_t> ImageBuffer;
lodepng::encode(ImageBuffer, PngBuffer, Width, Height);
lodepng::save_file(ImageBuffer, "SomeImage.png");
You can also swap the channels in place (note that this version does not flip the rows):
for(std::int32_t I = 0; I < Height; ++I)
{
for(std::int32_t J = 0; J < Width; ++J)
{
std::size_t Pos = I * (Width * 4) + 4 * J;
std::swap(ImageData[Pos + 0], ImageData[Pos + 2]); //exchange B and R
}
}

Consider writing your file in NetPBM/PBMplus format as specified here. It is very easy and you don't need a library as the file is so straightforward. The Wikipedia article shows the format here.
Here is a simple example:
#include <stdio.h>
#include <stdlib.h>
int main(){
FILE *imageFile;
int x,y,pixel,height=100,width=256;
imageFile=fopen("image.pgm","wb");
if(imageFile==NULL){
perror("ERROR: Cannot open output file");
exit(EXIT_FAILURE);
}
fprintf(imageFile,"P5\n"); // P5 filetype
fprintf(imageFile,"%d %d\n",width,height); // dimensions
fprintf(imageFile,"255\n"); // Max pixel
/* Now write a greyscale ramp */
for(x=0;x<height;x++){
for(y=0;y<width;y++){
pixel=y;
fputc(pixel,imageFile);
}
}
fclose(imageFile);
}
Once you have the file in PBM/PGM format, use ImageMagick (here) to convert it to PNG with a simple command like:
convert file.pgm file.png
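The same idea extends to colour images with the binary PPM (P6) variant of the format. Below is a minimal C++ sketch, assuming the pixels are tightly packed 8-bit RGB with the top row first; encodePpm is a hypothetical helper name, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds the bytes of a binary PPM (P6) image from tightly packed 8-bit RGB data.
// encodePpm is a hypothetical helper name used only for this illustration.
std::string encodePpm(int width, int height, const std::vector<unsigned char>& rgb)
{
    assert(width > 0 && height > 0);
    assert(rgb.size() == static_cast<std::size_t>(width) * height * 3);

    // Header: magic number, dimensions, maximum channel value.
    std::string out = "P6\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";

    // The pixel data follows immediately as raw R, G, B bytes, top row first.
    out.append(rgb.begin(), rgb.end());
    return out;
}
```

To save the result, write the returned bytes to a std::ofstream opened with std::ios::binary, then run convert on the .ppm file exactly as for the .pgm above.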

The missing header file may be because you also need to specify additional include directories that point to libpng's header files. It looks like you are probably linking correctly.
It's been a while since I've done this in Visual Studio, but there should be a field for this in the project's configuration.

Related

How do I create and save an image from a byte stream in c++?

I am working in c++ trying to create an image and save it to a given directory. I don't care what type of image file it is (PNG, jpeg, bitmap, etc.), I just want it to be viewable in Windows 10. The data stream is in the following form: std::vector<unsigned char>.
I would like to do this natively in c++, but I am not against using a library if it is straightforward to implement and lightweight.
I have this working in C# using the following code, but I don't know if there is a direct translation into c++
// C# code to translate into c++?
var image = new BitmapImage();
using (var ms = new MemoryStream(message.ImageData))
{
image.BeginInit();
image.CacheOption = BitmapCacheOption.OnLoad;
image.StreamSource = ms;
image.EndInit();
image.Freeze();
}
C# implementations have image routines in the language's standard library. C++ does not. So there is no equivalent code in standard C++: you need to use a third party library. On Windows you could use Win32.
Below I use stb_image_write.h, which can be found here; the stb libraries are barebones 1-file or 2-file libraries typically used in independent game development, where having a dependency on libPNG et al. would be overkill.
#include <vector>
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
std::vector<unsigned char> generate_some_image(int wd, int hgt)
{
// this is just an example for tutorial purposes ... generate a red circle
// in a white field.
std::vector<unsigned char> data(wd * hgt * 4);
int c_x = wd / 2;
int c_y = hgt / 2;
int radius = c_x;
int i = 0;
for (int y = 0; y < hgt; y++) {
for (int x = 0; x < wd; x++) {
if ((x - c_x) * (x - c_x) + (y - c_y) * (y - c_y) <= radius * radius) {
data[i++] = 255;
data[i++] = 0;
data[i++] = 0;
data[i++] = 255;
} else {
data[i++] = 255;
data[i++] = 255;
data[i++] = 255;
data[i++] = 255;
}
}
}
return data;
}
int main()
{
const int wd = 128;
const int hgt = 128;
std::vector<unsigned char> data = generate_some_image(wd, hgt);
// stbi_write_png returns non-zero on success, so map that to exit code 0
return stbi_write_png("c:\\test\\foo.png", wd, hgt, 4, data.data(), 4 * wd) ? 0 : 1;
}
There is no default standard C++ library for creating an image file with a specific format (PNG, Bitmap ... etc). However, there are tricks to create an image and make it viewable if this is all that you need. The ways are discussed below:
Use a library, as jwezorek mentioned in his answer. There are tons of libraries that can help: OpenCV, STB, etc.
Create a function that creates a file and streams the pixels data to that file in a specific format (Can be fun for some and a headache for others).
Save the image data as raw data (pixels data only) in a file, and use a simpler programming language to read the file data and view the image. For instance, let's assume that you will use "python" as the simpler language to create a simple image viewer on windows 10, the code will look as follows:
import numpy as np
from PIL import Image
w = 1066
h = 600
with open('rgb.raw', 'r') as file:
    data = file.read().replace('\n', ' ') # Replacing new lines with spaces
rgbs = data.split(' ') # Get a 1D array of all the numbers (pixels) in file
rgbs = [int(x) for x in rgbs if x != ''] # Change the data into integers
nprgbs = np.array(rgbs, dtype=np.uint8) # Define a numpy array
arr = np.reshape(nprgbs, (h, w, 3)) # Reorder the 1D array to a 3D matrix (height, width, channels)
img = Image.fromarray(arr) # Create an Image from the pixels.
img.show() # View the image in a window (It uses the default viewer of the OS)
The above code will read a file containing the RGB channels of an image and view it on the default viewer.
As you can see, you can do it in a variety of ways, choose the simplest and the most suitable for you. In conclusion, the answer to your question is "No" unless you use a custom library, create your own functions, or use simpler programming languages to read the stream of data and use them to create the file.
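For the raw-file route, the C++ side only has to produce the text layout the Python reader above expects: decimal channel values separated by spaces or newlines. A minimal sketch, assuming the byte stream is packed RGB; toRawText is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Formats pixel bytes as space-separated decimal numbers, one image row per line.
// toRawText is a hypothetical helper; the input is assumed to be packed RGB.
std::string toRawText(const std::vector<unsigned char>& rgb, int width)
{
    assert(width > 0);
    const std::size_t rowBytes = static_cast<std::size_t>(width) * 3;
    assert(rgb.size() % rowBytes == 0);

    std::string out;
    for (std::size_t i = 0; i < rgb.size(); ++i) {
        out += std::to_string(static_cast<int>(rgb[i]));
        // End each image row with a newline, otherwise separate with a space.
        out += ((i + 1) % rowBytes == 0) ? '\n' : ' ';
    }
    return out;
}
```

Writing the returned string to a file named rgb.raw with std::ofstream gives the viewer script its input. A text dump like this is several times larger than the binary data, so it is only sensible for quick inspection.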

C++ GDI+ bitmap manipulation needs speed up on byte operations

I'm using GDI+ in C++ to manipulate some Bitmap images, changing the colour and resizing the images. My code is very slow at one particular point and I was looking for some potential ways to speed up the line that's been highlighted in the VS2013 Profiler
for (UINT y = 0; y < 3000; ++y)
{
//one scanline at a time because bitmaps are stored wrong way up
byte* oRow = (byte*)bitmapData1.Scan0 + (y * bitmapData1.Stride);
for (UINT x = 0; x < 4000; ++x)
{
//get grey value from 0.114*Blue + 0.299*Red + 0.587*Green
byte grey = (oRow[x * 3] * .114) + (oRow[x * 3 + 1] * .587) + (oRow[x * 3 + 2] * .299); //THIS LINE IS THE HIGHLIGHTED ONE
//rest of manipulation code
}
}
Any handy hints on how to handle this arithmetic line better? It's causing massive slow downs in my code
Thanks in advance!
Optimization depends heavily on the compiler used and the target system, but here are some hints which may be useful. Avoid the repeated index multiplications:
Instead of:
byte grey = (oRow[x * 3] * .114) + (oRow[x * 3 + 1] * .587) + (oRow[x * 3 + 2] * .299); //THIS LINE IS THE HIGHLIGHTED ONE
use...
//get grey value from 0.114*Blue + 0.299*Red + 0.587*Green
byte grey = (*oRow) * .114;
oRow++;
grey += (*oRow) * .587;
oRow++;
grey += (*oRow) * .299;
oRow++;
You can put the increment of the pointer on the same line. I put it on a separate line for better understanding.
Also, instead of multiplying by a float you can use a table, which can be faster than arithmetic. This depends on the CPU and the table size, but you can give it a shot:
// somewhere global or as class attributes
// The pixel bytes are stored in B, G, R order, so the tables are used in that order.
byte tblue[256];
byte tgreen[256];
byte tred[256];
...at startup...
// Only init once at startup
// I am ignoring the warnings, you should not :-)
for(int i=0;i<256;i++)
{
tblue[i]=i*.114;
tgreen[i]=i*.587;
tred[i]=i*.299;
}
...in the loop...
byte grey = tblue[*oRow];
oRow++;
grey += tgreen[*oRow];
oRow++;
grey += tred[*oRow];
oRow++;
Also, 256*256*256 is not such a huge size, so you could build one big table. As that table would be larger than the usual CPU cache, though, I would not expect it to be much faster.
As suggested, you could do math in integer, but you could also try floats instead of doubles (.114f instead of .114), which are usually quicker and you don't need the precision.
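The integer approach can be sketched with 8.8 fixed-point weights. This is an illustration rather than a measured result: the weights are the question's coefficients scaled by 256 and rounded (29, 150, 77), and greyFromBgr and greyRow are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-point weights scaled by 256: 0.114*256 ~ 29, 0.587*256 ~ 150, 0.299*256 ~ 77.
// They sum to exactly 256, so pure white still maps to 255.
inline std::uint8_t greyFromBgr(std::uint8_t b, std::uint8_t g, std::uint8_t r)
{
    const unsigned sum = 29u * b + 150u * g + 77u * r;
    return static_cast<std::uint8_t>(sum >> 8); // divide by 256
}

// Converts one row of packed 24-bit BGR pixels into one grey byte per pixel.
void greyRow(const std::uint8_t* bgr, std::uint8_t* grey, unsigned pixelCount)
{
    assert(bgr != nullptr && grey != nullptr);
    for (unsigned i = 0; i < pixelCount; ++i, bgr += 3)
        grey[i] = greyFromBgr(bgr[0], bgr[1], bgr[2]);
}
```

The result can differ from the floating-point version by one grey level because the weights are rounded and the shift truncates. Whether it is actually faster depends on the compiler and CPU, so profile it in a release build.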
Do the loop like this, instead, to save on pointer math. Creating a temporary pointer like this won't cost because the compiler will understand what you're up to.
for(UINT x = 0; x < 12000; x+=3)
{
byte* pVal = &oRow[x];
....
}
This code is also easily threadable - the compiler can do it for you automatically in various ways; here's one, using parallel for:
https://msdn.microsoft.com/en-us/library/dd728073.aspx
If you have 4 cores, that's a 4x speedup, just about.
Also be sure to check release vs debug build - you don't know the perf until you run it in release/optimized mode.
You could premultiply values like oRow[x * 3] * .114 and put them into an array. oRow[x * 3] has 256 possible values, so you can easily create an array aMul1 of 256 entries for 0->255, each multiplied by .114. Then use aMul1[oRow[x * 3]] to look up the multiplied value, and do the same for the other components.
Actually you could even create such an array for whole RGB values, i.e. your pixel is 888, so you would need an array of size 256*256*256, which is 16777216 entries = ~16MB. Whether this would speed up your process, you would have to check yourself with a profiler.
In general I've found that the usual optimum, short of going to direct assembly, comes from more direct pointer management, fewer instructions (on most CPUs they cost about the same these days), and fewer memory fetches; tables are not the answer more often than they are. Vectorization, especially explicit vectorization, is also helpful, as is dumping the assembly of the function and confirming that the inner loop conforms to your expectations. Try this:
for (UINT y = 0; y < 3000; ++y)
{
//one scanline at a time because bitmaps are stored wrong way up
byte* oRow = (byte*)bitmapData1.Scan0 + (y * bitmapData1.Stride);
byte *p = oRow;
byte *pend = p + 4000 * 3;
for(; p != pend; p+=3){
const float grey = p[0] * .114f + p[1] * .587f + p[2] * .299f;
}
//alternatively with an autovectorizing compiler
for(; p != pend; p+=3){
#pragma unroll //or use a compiler option to unroll loops
//make sure vectorization and relevant instruction sets are enabled - this is effectively a dot product so the following intrinsic fits the bill:
//https://msdn.microsoft.com/en-us/library/bb514054.aspx
//vector types or compiler intrinsics are more reliable often too... but get compiler specific or architecture dependent respectively.
float grey = 0;
const float w[3] = {.114f, .587f, .299f};
for(int c = 0; c < 3; ++c){
grey += w[c] * p[c];
}
}
}
Consider fooling around with OpenCL and targeting your CPU to see how fast you could solve with CPU specific optimizations and easily multiple cores - OpenCL covers this up for you pretty well and provides built in vector ops and dot product.

exchanging 2 memory positions

I am working with OpenCV and Qt. OpenCV uses BGR while Qt uses RGB, so I have to swap those 2 bytes for very big images.
Is there a better way of doing the following?
I cannot think of anything faster, but it looks so simple and lame...
int width = iplImage->width;
int height = iplImage->height;
uchar *iplImagePtr = (uchar *) iplImage->imageData;
uchar buf;
int limit = height * width;
for (int y = 0; y < limit; ++y) {
buf = iplImagePtr[2];
iplImagePtr[2] = iplImagePtr[0];
iplImagePtr[0] = buf;
iplImagePtr += 3;
}
QImage img((uchar *) iplImage->imageData, width, height,
QImage::Format_RGB888);
We are currently dealing with this issue in a Qt application. We've found the Intel Performance Primitives to be the fastest way to do this; they have extremely optimized code. The HTML help files at Intel ippiSwapChannels Documentation include an example of exactly what you are looking for.
There are a couple of downsides:
The size of the library, although you can statically link just the library routines you need.
Running on AMD CPUs. Intel libs run VERY slowly by default on AMD. Check out www.agner.org/optimize/asmlib.zip for details on how to do a workaround.
I think this looks absolutely fine. That the code is simple is not something negative. If you want to make it shorter you could use std::swap:
std::swap(iplImagePtr[0], iplImagePtr[2]);
You could also do the following:
uchar* end = iplImagePtr + height * width * 3;
for ( ; iplImagePtr != end; iplImagePtr += 3) {
std::swap(iplImagePtr[0], iplImagePtr[2]);
}
There's cvConvertImage to do the whole thing in one line, but I doubt it's any faster either.
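Couldn't you use one of the following methods? (Note that invertPixels inverts the pixel values rather than exchanging channels, so rgbSwapped is the one that performs the red/blue swap.)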
void QImage::invertPixels ( InvertMode mode = InvertRgb )
or
QImage QImage::rgbSwapped () const
Hope this helps a bit !
I would be inclined to do something like the following, working on the basis of that RGB data being in three byte blocks.
int i = 0;
int limit = width * height * 3; // total number of bytes, three per pixel
while(i < limit)
{
buf = iplImagePtr[i]; // should be blue colour byte
iplImagePtr[i] = iplImagePtr[i + 2]; // save the red colour byte in the blue space
iplImagePtr[i + 2] = buf; // save the blue colour byte into what was the red slot
i += 3;
}
I doubt it is any 'faster' but at end of day, you just have to go through the entire image, pixel by pixel.
You could always do this:
int width = iplImage->width;
int height = iplImage->height;
uchar *start = (uchar *) iplImage->imageData;
uchar *end = start + width * height * 3; // three bytes per pixel
for (uchar *p = start ; p < end ; p += 3)
{
uchar buf = *p;
*p = *(p+2);
*(p+2) = buf;
}
but a decent compiler would do this anyway.
Your biggest overhead in these sorts of operations is going to be memory bandwidth.
If you're using Windows then you can probably do this conversion using the BitBlt and two appropriately set up DIBs. If you're really lucky then this could be done in the graphics hardware.
I hate to ruin anyone's day, but if you don't want to go the IPP route (see photo_tom) or pull in an optimized library, you might get better performance from the following (modifying Andreas answer):
uchar *iplImagePtr = (uchar *) iplImage->imageData;
uchar buf;
size_t limit = height * width;
for (size_t y = 0; y < limit; ++y) {
std::swap(iplImagePtr[y * 3], iplImagePtr[y * 3 + 2]);
}
Now hold on, folks, I hear you yelling "but all those extra multiplies and adds!" The thing is, this form of the loop is far easier for a compiler to optimize, especially if they get smart enough to multithread this sort of algorithm, because each pass through the loop is independent of those before or after. In the other form, the value of iplImagePtr was dependent on the value in previous pass. In this form, it is constant throughout the whole loop; only y changes, and that is in a very, very common "count from 0 to N-1" loop construct, so it's easier for an optimizer to digest.
Or maybe it doesn't make a difference these days because optimizers are insanely smart (are they?). I wonder what a benchmark would say...
P.S. If you actually benchmark this, I'd also like to see how well the following performs:
uchar *iplImagePtr = (uchar *) iplImage->imageData;
size_t limit = height * width;
for (size_t y = 0; y < limit; ++y) {
uchar *pixel = iplImagePtr + y * 3;
std::swap(pixel[0], pixel[2]);
}
Again, pixel is defined in the loop to limit its scope and keep the optimizer from thinking there's a cycle-to-cycle dependency. If the compiler increments and decrements the stack pointer each time through the loop to "create" and "destroy" pixel, well, it's stupid and I'll apologize for wasting your time.
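For anyone who wants to run that benchmark, here is a rough timing harness. It is only a sketch: swapByPointer, swapByIndex and timeSwap are hypothetical names, the buffer is assumed to be tightly packed 3-byte pixels, and a single run like this is noisy.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

using uchar = unsigned char;

// Variant 1: walk a pointer through the buffer, three bytes at a time.
void swapByPointer(uchar* p, std::size_t pixels)
{
    for (uchar* end = p + pixels * 3; p != end; p += 3)
        std::swap(p[0], p[2]);
}

// Variant 2: index from a fixed base pointer, as in the loop above.
void swapByIndex(uchar* base, std::size_t pixels)
{
    for (std::size_t i = 0; i < pixels; ++i)
        std::swap(base[i * 3], base[i * 3 + 2]);
}

// Runs fn once on a private copy of the image and returns the elapsed milliseconds.
template <typename Fn>
double timeSwap(Fn fn, std::vector<uchar> image)
{
    assert(image.size() % 3 == 0);
    const auto start = std::chrono::steady_clock::now();
    fn(image.data(), image.size() / 3);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Call timeSwap(swapByPointer, image) and timeSwap(swapByIndex, image) several times on a large buffer in an optimized build and compare the medians. Because the swapped copy is discarded, an aggressive optimizer may remove the work entirely; printing a byte of the result is one way to guard against that.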
cvCvtColor(iplImage, iplImage, CV_BGR2RGB);

C++ creating image

I haven't been programming in C++ for a while, and now I have to write a simple thing, but it's driving me nuts.
I need to create a bitmap from a table of colors:
char image[200][200][3];
First coordinate is width, second height, third colors: RGB. How to do it?
Thanks for any help.
Adam
I'm sure you've already checked http://en.wikipedia.org/wiki/BMP_file_format.
With that information in hand we can write a quick BMP with:
// setup header structs bmpfile_header and bmp_dib_v3_header before this (see wiki)
// * note for a windows bitmap you want a negative height if you're starting from the top *
// * otherwise the image data is expected to go from bottom to top *
FILE * fp = fopen ("file.bmp", "wb");
fwrite(&bmpfile_header, sizeof(bmpfile_header), 1, fp); // pass the address of each header struct
fwrite(&bmp_dib_v3_header, sizeof(bmp_dib_v3_header), 1, fp);
for (int i = 0; i < 200; i++) {
for (int j = 0; j < 200; j++) {
fwrite(&image[j][i][2], 1, 1, fp);
fwrite(&image[j][i][1], 1, 1, fp);
fwrite(&image[j][i][0], 1, 1, fp);
}
}
fclose(fp);
If setting up the headers is a problem let us know.
Edit: I forgot, BMP files expect BGR instead of RGB, I've updated the code (surprised nobody caught it).
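In case the headers are the sticking point, here is a sketch that builds the whole file in memory, byte by byte, so struct packing never comes into it. It assumes a 24-bit uncompressed image given as top-down packed RGB; encodeBmp24 and putLe are hypothetical helper names, and the 2835 pixels-per-metre resolution is just a conventional placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends the low `bytes` bytes of value in little-endian order, as BMP requires.
static void putLe(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes)
{
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Encodes top-down packed RGB pixels as a 24-bit uncompressed BMP file image.
std::vector<std::uint8_t> encodeBmp24(int width, int height,
                                      const std::vector<std::uint8_t>& rgb)
{
    assert(width > 0 && height > 0);
    assert(rgb.size() == static_cast<std::size_t>(width) * height * 3);

    const std::uint32_t w = static_cast<std::uint32_t>(width);
    const std::uint32_t h = static_cast<std::uint32_t>(height);
    const std::uint32_t rowSize = (w * 3 + 3) & ~3u; // rows are padded to 4 bytes
    const std::uint32_t dataSize = rowSize * h;
    const std::uint32_t headerSize = 14 + 40;

    std::vector<std::uint8_t> out;
    out.reserve(headerSize + dataSize);

    // File header (14 bytes)
    out.push_back('B');
    out.push_back('M');
    putLe(out, headerSize + dataSize, 4); // total file size
    putLe(out, 0, 4);                     // reserved
    putLe(out, headerSize, 4);            // offset of the pixel data

    // DIB header, BITMAPINFOHEADER (40 bytes)
    putLe(out, 40, 4);       // size of this header
    putLe(out, w, 4);        // width in pixels
    putLe(out, h, 4);        // positive height: rows are stored bottom-up
    putLe(out, 1, 2);        // colour planes
    putLe(out, 24, 2);       // bits per pixel
    putLe(out, 0, 4);        // no compression
    putLe(out, dataSize, 4); // size of the pixel data
    putLe(out, 2835, 4);     // horizontal resolution (placeholder)
    putLe(out, 2835, 4);     // vertical resolution (placeholder)
    putLe(out, 0, 4);        // colours in palette
    putLe(out, 0, 4);        // important colours

    // Pixel data: bottom row first, each pixel as B, G, R, rows zero-padded.
    for (int y = height - 1; y >= 0; --y) {
        const std::uint8_t* row = &rgb[static_cast<std::size_t>(y) * w * 3];
        for (std::uint32_t x = 0; x < w; ++x) {
            out.push_back(row[x * 3 + 2]);
            out.push_back(row[x * 3 + 1]);
            out.push_back(row[x * 3 + 0]);
        }
        for (std::uint32_t pad = w * 3; pad < rowSize; ++pad)
            out.push_back(0);
    }
    return out;
}
```

The returned vector can be written out with a single fwrite to a file opened in "wb" mode. For the 200x200 array in the question the rows happen to need no padding, but the padding loop matters for widths that are not a multiple of 4.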
I'd suggest ImageMagick, comprehensive library etc.
I would first try to find out, how the BMP file format (that's what you mean by a bitmap, right?) is defined. Then I would convert the array to that format and print it to the file.
If that's an option, I would also consider trying to find an existing library for BMP files creation, and just use it.
Sorry if what I said is already obvious for you, but I don't know on which stage of the process you are stuck.
For simple image operations I highly recommend CImg. This library works like a charm, and is extremely easy to use. You just have to include a header file in your code. It literally took me less than 10 minutes to compile and test.
If you want to do more complicated image operations however, I would go with Magick++ as suggested by dagoof.
It would be advisable to declare the image as a simple one-dimensional array,
i.e. (where bytes is the number of bytes per pixel)
char image[width * height * bytes];
You can then access the relevant position in the array as follows:
char byte1 = image[(x * bytes) + (y * (width * bytes)) + 0];
char byte2 = image[(x * bytes) + (y * (width * bytes)) + 1];
char byte3 = image[(x * bytes) + (y * (width * bytes)) + 2];
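That indexing can be wrapped in a small type so the arithmetic lives in one place. A minimal sketch, where Image and at are hypothetical names and the layout is assumed to be row-major with interleaved channels:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A flat, row-major image buffer; Image is a hypothetical name for illustration.
struct Image {
    int width;
    int height;
    int bytes; // bytes per pixel, e.g. 3 for RGB
    std::vector<unsigned char> data;

    Image(int w, int h, int bpp)
        : width(w), height(h), bytes(bpp),
          data(static_cast<std::size_t>(w) * h * bpp, 0) {}

    // Returns a reference to channel c of the pixel at column x, row y.
    unsigned char& at(int x, int y, int c)
    {
        assert(x >= 0 && x < width && y >= 0 && y < height && c >= 0 && c < bytes);
        return data[(static_cast<std::size_t>(y) * width + x) * bytes + c];
    }
};
```

With this, Image img(200, 200, 3); img.at(x, y, 0) = red; replaces the three-dimensional array from the question, and the contiguous data vector can be handed straight to a file writer.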

How do I read JPEG and PNG pixels in C++ on Linux?

I'm doing some image processing, and I'd like to individually read each pixel value in a JPEG and PNG images.
In my deployment scenario, it would be awkward for me to use a 3rd party library (as I have restricted access on the target computer), but I'm assuming that there's no standard C or C++ library for reading JPEG/PNG...
So, if you know of a way of not using a library then great, if not then answers are still welcome!
There is no standard library in the C-standard to read the file-formats.
However, most programs, especially on the linux platform use the same library to decode the image-formats:
For jpeg it's libjpeg, for png it's libpng.
The chances that the libs are already installed are very high.
http://www.libpng.org
http://www.ijg.org
This is a small routine I dug out of 10-year-old source code (using libjpeg):
#include <stdio.h> // must come before jpeglib.h, which relies on FILE and size_t
#include <jpeglib.h>
int loadJpg(const char* Name) {
unsigned char a, r, g, b;
int width, height;
struct jpeg_decompress_struct cinfo;
struct jpeg_error_mgr jerr;
FILE * infile; /* source file */
JSAMPARRAY pJpegBuffer; /* Output row buffer */
int row_stride; /* physical row width in output buffer */
if ((infile = fopen(Name, "rb")) == NULL) {
fprintf(stderr, "can't open %s\n", Name);
return 0;
}
cinfo.err = jpeg_std_error(&jerr);
jpeg_create_decompress(&cinfo);
jpeg_stdio_src(&cinfo, infile);
(void) jpeg_read_header(&cinfo, TRUE);
(void) jpeg_start_decompress(&cinfo);
width = cinfo.output_width;
height = cinfo.output_height;
unsigned char * pDummy = new unsigned char [width*height*4];
unsigned char * pTest = pDummy;
if (!pDummy) {
printf("NO MEM FOR JPEG CONVERT!\n");
return 0;
}
row_stride = width * cinfo.output_components;
pJpegBuffer = (*cinfo.mem->alloc_sarray)
((j_common_ptr) &cinfo, JPOOL_IMAGE, row_stride, 1);
while (cinfo.output_scanline < cinfo.output_height) {
(void) jpeg_read_scanlines(&cinfo, pJpegBuffer, 1);
for (int x = 0; x < width; x++) {
a = 0; // alpha value is not supported on jpg
r = pJpegBuffer[0][cinfo.output_components * x];
if (cinfo.output_components > 2) {
g = pJpegBuffer[0][cinfo.output_components * x + 1];
b = pJpegBuffer[0][cinfo.output_components * x + 2];
} else {
g = r;
b = r;
}
*(pDummy++) = b;
*(pDummy++) = g;
*(pDummy++) = r;
*(pDummy++) = a;
}
}
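(void) jpeg_finish_decompress(&cinfo);
jpeg_destroy_decompress(&cinfo);
fclose(infile); // close the file only after libjpeg has finished with it
// BMap, Height, Width and Depth are globals defined elsewhere in the original program
BMap = (int*)pTest;
Height = height;
Width = width;
Depth = 32;
return 1; // success
}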
For jpeg, there is already a library called libjpeg, and there is libpng for png. The good news is that they compile right in and so target machines will not need dll files or anything. The bad news is they are in C :(
Also, don't even think of trying to read the files yourself. If you want an easy-to-read format, use PPM instead.
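To show how little code PPM needs, here is a sketch of a reader for the binary (P6) variant. It assumes 8-bit channels (maximum value 255) and a header without '#' comment lines, which the format otherwise allows; PpmImage and readPpm are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct PpmImage {
    int width = 0;
    int height = 0;
    std::vector<unsigned char> rgb; // packed R, G, B, top row first

    // Returns channel c (0 = R, 1 = G, 2 = B) of the pixel at column x, row y.
    unsigned char at(int x, int y, int c) const
    {
        assert(x >= 0 && x < width && y >= 0 && y < height && c >= 0 && c < 3);
        return rgb[(static_cast<std::size_t>(y) * width + x) * 3 + c];
    }
};

// Reads a binary PPM (P6) with 8-bit channels; returns false on malformed input.
// Assumption: the header contains no '#' comment lines.
bool readPpm(std::istream& in, PpmImage& image)
{
    std::string magic;
    int maxValue = 0;
    if (!(in >> magic >> image.width >> image.height >> maxValue))
        return false;
    if (magic != "P6" || maxValue != 255 || image.width <= 0 || image.height <= 0)
        return false;
    in.get(); // exactly one whitespace byte separates the header from the pixels

    image.rgb.resize(static_cast<std::size_t>(image.width) * image.height * 3);
    in.read(reinterpret_cast<char*>(image.rgb.data()),
            static_cast<std::streamsize>(image.rgb.size()));
    return in.gcount() == static_cast<std::streamsize>(image.rgb.size());
}
```

For a file on disk, pass a std::ifstream opened with std::ios::binary; a std::istringstream works for in-memory data. JPEG and PNG files would first have to be converted to PPM by some other tool, which is the trade-off this answer is suggesting.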
Unfortunately, the JPEG format is compressed, so you would have to decompress it before reading individual pixels. This is a non-trivial task. If you can't use a library, you may want to refer to one to see how it decompresses the image. There is an open-source library you can study: CImg on SourceForge.
Since it could use the exposure, I'll mention one other library to investigate: The IM Toolkit, which is hosted at Sourceforge. It is cross platform, and abstracts the file format completely away from the user, allowing an image to be loaded and processed without worrying about most of the details. It does support both PNG and JPEG out of the box, and can be extended with other import filters if needed.
It comes with a large collection of image processing operators as well...
It also has a good quality binding to Lua.
As Nils pointed out, there is no such thing as a C or C++ standard library for JPEG compression and image manipulation.
In case you'd be able to use a third party library, you may want to try GDAL which supports JPEG, PNG and tens of other formats, compressions and mediums.
Here is simple example that presents how to read pixel data from JPEG file using GDAL C++ API:
#include <gdal_priv.h>
#include <cassert>
#include <iostream>
#include <string>
#include <vector>
int main()
{
GDALAllRegister(); // once per application
// Assume 3-band image with 8-bit per pixel per channel (24-bit depth)
std::string const file("/home/mloskot/test.jpg");
// Open file with image data
GDALDataset* ds = static_cast<GDALDataset*>(GDALOpen(file.c_str(), GA_ReadOnly));
assert(0 != ds);
// Example 1 - Read multiple bands at once, assume 8-bit depth per band
{
int const ncols = ds->GetRasterXSize();
int const nrows = ds->GetRasterYSize();
int const nbands = ds->GetRasterCount();
int const nbpp = GDALGetDataTypeSize(GDT_Byte) / 8;
std::vector<unsigned char> data(ncols * nrows * nbands * nbpp);
CPLErr err = ds->RasterIO(GF_Read, 0, 0, ncols, nrows, &data[0], ncols, nrows, GDT_Byte, nbands, 0, 0, 0, 0);
assert(CE_None == err);
// ... use data
}
// Example 2 - Read the first band only, scanline by scanline, assume 8-bit depth per band
{
GDALRasterBand* band1 = ds->GetRasterBand(1);
assert(0 != band1);
int const ncols = band1->GetXSize();
int const nrows = band1->GetYSize();
int const nbpp = GDALGetDataTypeSize(GDT_Byte) / 8;
std::vector<unsigned char> scanline(ncols * nbpp);
for (int i = 0; i < nrows; ++i)
{
// the Y offset must advance with i, otherwise every pass re-reads row 0
CPLErr err = band1->RasterIO(GF_Read, 0, i, ncols, 1, &scanline[0], ncols, 1, GDT_Byte, 0, 0);
assert(CE_None == err);
// ... use scanline
}
}
return 0;
}
There is more complete GDAL API tutorial available.
I've had good experiences with the DevIL library. It supports a wide range of image formats and follows a function-style very similar to OpenGL.
Granted, it is a library, but it's definitely worth a try.
Since the other answers already mention that you will most likely need to use a library, take a look at ImageMagick and see if it is possible to do what you need it to do. It comes with a variety of different ways to interface with the core functionality of ImageMagick, including libraries for almost every single programming language available.
Homepage: ImageMagick
If speed is not a problem you can try LodePNG, which takes a very minimalist approach to PNG loading and saving.
Or even go with picoPNG from the same author, which is a self-contained PNG loader in a single function.