The accepted answer to the question C++ Library for image recognition: images containing words to string recommended that you:
Upsize/Downsize your input image to 300 DPI.
How would I do this... I was under the impression that DPI was for monitors, not image formats.
I think the more accurate term here is resampling. You want a pixel resolution high enough to support accurate OCR. Font size (e.g. in points) is typically measured in units of length, not pixels. Since 72 points = 1 inch, a resolution of 300 dpi (pixels per inch) corresponds to 300/72 ≈ 4.17 pixels per point. That means a typical 12-point font has a height (more accurately, a baseline-to-baseline distance in single-spaced text) of 12 × 300/72 = 50 pixels.
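As a quick illustration, here is a minimal sketch of that arithmetic (the values are just the ones from this answer):

    // Point sizes are physical lengths (72 points per inch),
    // so pixels = points * dpi / 72.
    #include <iostream>

    int main() {
        const double pointSize = 12.0; // typical body text
        const double dpi = 300.0;      // target scan resolution
        std::cout << pointSize << " pt at " << dpi << " DPI = "
                  << pointSize * dpi / 72.0 << " px\n"; // prints 50 px
    }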
Ideally, your source documents should be scanned at an appropriate resolution for the given font size, so that the font in the image is about 50 pixels high. If the resolution is too high or too low, you can easily resample the image in a graphics program (e.g. GIMP). You can also do this programmatically through a graphics library, such as ImageMagick, which has interfaces for many programming languages.
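If you'd rather do it in code, here is a minimal Magick++ sketch (assuming ImageMagick's C++ bindings are installed; the file names and the 150 DPI source resolution are placeholder assumptions):

    #include <Magick++.h>

    int main(int argc, char** argv) {
        Magick::InitializeMagick(*argv);
        Magick::Image image("in.png");  // placeholder file name
        // Suppose the page was scanned at 150 DPI; doubling both dimensions
        // gives the pixel density of a 300 DPI scan.
        const double scale = 300.0 / 150.0;
        image.resize(Magick::Geometry(
            static_cast<size_t>(image.columns() * scale),
            static_cast<size_t>(image.rows() * scale)));
        image.write("out.png");         // placeholder file name
    }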
DPI makes sense whenever you're relating an image in pixels to a physical device with a picture size. In the case of OCR, it usually means the resolution of the scan, i.e. how many pixels you get for each inch of your scan. A 12-point font is meant to be printed at 12/72 inches per line; at 300 DPI that line is 50 pixels, and an upper-case character might fill about 80% of it, so it would be approximately 40 pixels tall when scanned at 300 DPI.
Many image formats have a DPI recorded in them. If the image was scanned, this should be the exact setting from the scanner. If it came from a digital camera, it always says 72 DPI, which is a default value mandated by the EXIF specification; this is because a camera can't know the original size of the image. When you create an image with an imaging program, you might have the opportunity to set the DPI to any arbitrary value. This is a convenience for you to specify how you want the final image to be used, and has no bearing on the detail contained in the image.
Here's a previous question that asks the details of resizing an image:
How do I do high quality scaling of a image?
OCR software is typically designed to work with "normal" font sizes. From an image point of view, this means that it will be looking for letters perhaps around the 30 to 100 pixel height range. Images of much higher resolution would produce letters that appear much too large for the OCR software to process efficiently. Similarly, images of lower resolution would not provide enough pixels for the software to recognise letters.
"How would I do this... I was under the impression that dpi was for monitors, not image formats."
DPI stands for dots per inch. What does it have to do with monitors? On a monitor, each pixel is made of three RGB subpixels; the higher the DPI, the more detail you cram into a given physical area.
DPI is a useful measurement for displays and prints, but it means nothing for image formats themselves.
The reason a DPI value is tagged inside some formats is to instruct devices to display at that resolution, but from what I understand virtually all of them ignore that instruction and do their best to optimize the image for a particular output.
You can change 72 dpi to 1 dpi or 6000 dpi in an image file and it won't make any difference whatsoever on a monitor. "Upsize/downsize to 300 dpi" makes no sense on its own. Changing the DPI tag does not resample the image either: try it in Photoshop with "Resample" unchecked when changing the DPI and you'll see no difference whatsoever. The image will NOT get bigger or smaller.
DPI is totally meaningless for image formats, IMO.
If your goal is OCR, DPI makes sense as the number of dots in your image for each inch in the original scanned document. If your dpi is too low, the information is gone forever, and even bicubic interpolation is not going to do a brilliant job recovering it. If your dpi is too high, it's easy to throw away bits.
To get the job done, I'm a big fan of the netpbm/pbmplus toolset. The tool to start with is pnmscale, though if you've got a bilevel bitmap you'll want to consider related tools such as pbmreduce.
Related
I have read several articles that explain how to export different image sizes for iOS devices, but I don't really understand their explanations. (Maybe I'm not good at English.)
I found these dimensions for the launch screen. Please let me check my understanding with you.
So, when I create an image, I should first create the largest size (@3x) and then export that image to the smaller sizes (@2x, @1x). Am I right?
For example, I create a 1242x2208 px (@3x) image, scale it down to @2x and @1x, and save.
My questions are:
1) I draw images in Photoshop CS6. For any image size, the resolution is still 75 ppi, isn't it?
So, for the 1242x2208 px (@3x) size, the resolution is 75 ppi, and then I decrease the size. The image gets smaller, but does it get a blurry appearance?
2) Do image elements (the heart image in my example) need to be made larger in the smaller (@1x) version to look clear? Or
3) If we don't modify the image elements, or make some text's font size larger or smaller, can we still get a sharp-looking image at @1x? I'm afraid that if we scale down to @1x, it will be blurry and not look good, because we are still at 75 ppi.
4) Do the image elements need to be made to fit the image size? I found this video: https://www.youtube.com/watch?v=WOnczJSsMqk . In it, he crops the white space and exports to @1x, @2x and @3x, so the @3x size is not the image size from Apple's official guideline website. I'm not clear about this.
5) If we type text with a 90 pt font size in the @3x image, it will automatically change to 30 pt in the @1x image, right?
But at this link https://www.smashingmagazine.com/2015/05/retina-design-in-photoshop/#font-size , he wrote:
a text box with the font set to 16 pixels. But at @2x this is 32 pixels, and at @3x it's 48 pixels!
Not ideal, is it, having to constantly multiply by two or three? I don't know about you, but I could do without the constant math. When I design, I want to know that 16 pixels is 16 pixels!
So, should the text be 16 px in every image size (@1x, @2x and @3x), or not?
6) These image sizes are for the launch screen, aren't they? But if I create an image for the background of a login screen, is it the same concept, and should I save it at these same sizes?
7) Are the image size dimensions above correct or not? On this page https://developer.apple.com/ios/human-interface-guidelines/graphics/launch-screen/ the sizes are a little bit different.
Right now, I'm trying to create a design for a login background image, so I was looking up the sizes before drawing. But after reading many articles about image sizes for Retina devices, I've become confused and have many questions in my mind.
That's why I've written my questions down like this, and I'd like to apologize that my question is long and perhaps hard to follow.
Sorry again for my poor English.
I hope someone will help me answer all my questions step by step. Thanks for reading to the end. :)
You are asking way too much here. First off, you do not work in pixels; you work in points. These are two different units of measurement. At 1x scale 1 point = 1x1 pixel, at 2x scale 1 point = 2x2 pixels, and at 3x scale 1 point = 3x3 pixels. For example, a 100x100 pt image is 100x100 px at @1x, 200x200 px at @2x, and 300x300 px at @3x.
Now when it comes to how to scale, people claim that you should start big and go small for the best quality. This is simply not true; how well an image scales depends on the actual image. So your goal is to find what works best for the image. I would recommend starting big-to-small, but if it doesn't come out as nicely as you'd like, go small-to-big, then try a different scaling method.
I personally do not rely 100% on automation. I like to tweak all three sizes manually until the images are perfect, which makes SpriteKit really tough to work with in this department, because I have to design my graphics in a way that counteracts the hardware scaling. The bottom line is: at the end of the day, do what is best for your app within your budget constraints.
Now when it comes to font sizes, again you are working in points, not pixels. Whoever told you that you need to multiply has no understanding of how Retina displays work. So when you use a 16 pt font, the system will automatically render it with 32 and 48 pixels on @2x and @3x displays. (But if you query it, it will still say 16 pt.)
Try not to overthink this; it is really simple to understand. The entire point of a Retina display is to provide a sharper image while maintaining the same experience, and it does this by offering more granularity in the way pixels are displayed. Each individual pixel is very, very tiny, which makes it hard to see with your eyes. Instead, pixels act as companions to other pixels, so that when your eyes put two pixels together, you get a better-looking image than a single colored pixel could produce. Keep this in mind when you work on your apps. This is why it stinks for new people to get into development: everybody should start with the iPhone 2G and then adapt their app to the iPhone 4; they would get a clear understanding of what Retina is built for.
A generic question: in GDI, font size is an int, so text drawn by GDI in a window does not stay accurate when zooming in and out.
Is there a simple way to use a floating-point font size in GDI to make the font size accurate?
Thanks a lot for your kind help!
GDI text does not scale linearly, and not just because it uses only integer sizes, but also because of hinting which tries to make text look better when rendered at a resolution on the order of its stroke width. If you double the height of a GDI font, the width may not exactly double. GDI text was once hardware accelerated but no longer is on modern versions of Windows. That doesn't matter much for performance because it's relatively simple and efficient, and hardware is fast.
GDI+ text will scale linearly in the sense that doubling the height will double the width (or very nearly). But the text may look "fuzzier" because it uses grayscale antialiasing instead of ClearType subpixel rendering. GDI+ text tends to be slower than GDI because more work is done in software.
DirectWrite (which runs on Direct2D) scales linearly and generally looks very good. It's harder to write efficient Direct2D/DirectWrite code, and, depending on your requirements, you might have to drop back to GDI if you also need to print. If you try to write DPI-aware programs, you may find yourself doing a lot of conversions between DirectWrite's device-independent coordinates for graphics and mouse coordinates that are still device-dependent. DirectWrite is hardware accelerated, so it's fast if you use it efficiently by caching lots of intermediate data structures.
With CreateFont (and CreateFontIndirect) you specify the font size in pixels, so it remains accurate to the pixel regardless of zooming (within the constraints of the sizes available in the font that's selected--if you use a bitmapped font, scaling may be limited or nonexistent).
If you're using CreatePointFont to create the font, you specify the font size in tenths of a point, which usually works out to less than a pixel, so it gets rounded to the nearest pixel. If you want to be sure you're specifying the height to the nearest pixel, however, use CreateFont/CreateFontIndirect instead of CreatePointFont.
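For illustration, a hedged sketch of that approach: the point size stays a double and is rounded to a pixel count exactly once, under your control (the face name is just an example):

    #include <windows.h>

    HFONT CreatePixelAccurateFont(HDC hdc, double pointSize)
    {
        LOGFONT lf = {};
        // A negative lfHeight requests the character height (excluding
        // internal leading), in pixels for a typical MM_TEXT device context.
        lf.lfHeight = -static_cast<LONG>(
            pointSize * GetDeviceCaps(hdc, LOGPIXELSY) / 72.0 + 0.5);
        lstrcpy(lf.lfFaceName, TEXT("Segoe UI")); // example face name
        return CreateFontIndirect(&lf);
    }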
I am using the EM_FORMATRANGE message to render the output of a rich text control to an arbitrary device context. However, when rendering to a bitmap, the dots-per-inch of the bitmap's device context is the same as the display device's DPI, which is 96 dots-per-inch. This is much lower than what I would like to render to. I'd rather render at a much higher DPI so that the user can zoom in, and perhaps print on a high-DPI printer later.
I suspect what happens is that the RTF control calls GetDeviceCaps with LOGPIXELSX and LOGPIXELSY to get the number of pixels per inch of the device. It then renders the document using this DPI value at a 100% zoom level. Windows display devices always return a value of 96 DPI, unless large fonts are being used on the system (as set in Control Panel) and the application is DPI-aware.
Many examples on the Internet propose scaling the output of EM_FORMATRANGE. This is so that any arbitrary DPI resolution can be achieved. Most examples generally involve using SetMapMode, SetWindowExtEx, and SetViewportExtEx (e.g. see http://social.msdn.microsoft.com/Forums/en-us/netfxbcl/thread/37fd1bfb-f07b-421d-9b5e-5f4492ffbbc3). These functions can be used to scale the rich text control's rendered output: for example, if I specify 400% scaling, then if the rich text control rendered something that was 5 pixels wide, it would actually become 20 pixels wide.
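In outline, the setup those examples propose looks like this (a sketch with placeholder values; hdc is the target device context):

    // Map 100 logical units to 400 device units, i.e. 400% scaling.
    SetMapMode(hdc, MM_ANISOTROPIC);
    SetWindowExtEx(hdc, 100, 100, nullptr);   // logical extent...
    SetViewportExtEx(hdc, 400, 400, nullptr); // ...stretched to 4x device units
    // EM_FORMATRANGE output sent to this DC is then scaled up by GDI.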
Unfortunately, the old GDI functions use integers instead of floating point numbers. For example, suppose the RTF control decided that an element should be drawn at (12.7, 15.3) pixels. This would be rounded to a position of (13, 15). These rounded coordinates are passed to GDI, which then scales up the image using scaling specified by SetMapMode: for the example of 400%, it would be (13*4, 15*4), or (52, 60). But this is not accurate: the element would have better been placed at (12.7*4, 15.3*4), or (51, 61). The worst part is that for some cases, the error becomes cumulative.
I believe this is the underlying cause of this very noticeable error when scaling some simple text:
The example above is 8-point Segoe UI scaled to 400% using EM_FORMATRANGE and SetMapMode on a 96 DPI display device context. The text has effectively become 32 points, but the space between characters is too large and looks unnatural.
The example above was created in WordPad by entering the text as 8-point Segoe UI and using the zoom control to set a 400% zoom level. The space between characters looks normal. The exact same result is achieved with a 32-point font at a 100% zoom level.
To work around this issue, I have tried the following. For each thing tried, the result has been identically unsatisfactory when scaled to 400%.
Using a scaling transform set using SetWorldTransform instead of the scaling done with SetMapMode and SetWindowExtEx etc.
Passing the device context for a metafile to EM_FORMATRANGE, and then scaling the metafile later.
Using SetMapMode to scale in conjunction with rendering to a metafile, and then showing the metafile later without scaling.
I believe the results are always unsatisfactory because the problem boils down to the fact that the rich edit control is rounding to the nearest integer and rendering to what it thinks is a 96 DPI device - ignoring the transforms in place. I looked into the metafile format and what I discovered is that the individual character positions are actually stored in the metafile at pixel-level resolution - that's why scaling the metafile obviously didn't work since the rounding has already happened by that point.
I can think of two real solutions that would work around this issue:
Use a device context with a higher user-specified dots per inch, such that GetDeviceCaps returns different values. (Note: some examples propose using the printer device since they generally have higher DPI, but I want my code to work on systems that don't have a printer and be able to render to an off-screen buffer).
Some way to tell the rich edit control to assume the device context has a different dots per inch than reported by GetDeviceCaps.
Anything else seems like it would still be subject to these rounding errors.
Does anyone (1) have an idea of how to implement either of the solutions I have proposed, or (2) have an alternate idea of how to achieve my goal of getting an accurate high-DPI output into a buffer?
I'm having the exact same problem.
A quick solution is to draw the text into a bitmap at 100% scale, and then just scale the bitmap.
It's not the best solution, but it might work for you.
Did you find any better solutions? If so, please share them here.
Also note that this problem occurs as well when you draw the text to a metafile at 100% and then scale the metafile to the screen; I believe it has something to do with the GDI text drawing functions not working well with scaling.
Roey
You could multiply the point size of all the text in the control by a factor of 4 and render the control to a bitmap that's 4 times larger.
If you're populating the control yourself this would be quite straightforward. If you support arbitrary content entered by the user it would be a lot more work and would require extra effort to handle anything that wasn't text (e.g. embedded bitmaps).
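For the simple case of uniform formatting, a hedged sketch of the scaling step (mixed formatting would need per-run enumeration, e.g. via the Text Object Model, which is omitted here):

    CHARFORMAT cf = {};
    cf.cbSize = sizeof(cf);
    SendMessage(hwndRichEdit, EM_GETCHARFORMAT, SCF_DEFAULT, (LPARAM)&cf);
    cf.dwMask = CFM_SIZE;
    cf.yHeight *= 4;  // yHeight is in twips (1/20 of a point)
    SendMessage(hwndRichEdit, EM_SETCHARFORMAT, SCF_ALL, (LPARAM)&cf);
    // ...then run EM_FORMATRANGE into a memory DC whose bitmap is 4x the
    // target size, and scale the bitmap down (or blit it 1:1 for a zoom).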
I just spent two weeks on a similar problem. I needed a rich edit control that was scalable for WYSIWYG editing. As we've found, the Windows rich edit control does not scale correctly with EM_FORMATRANGE: inter-character spacing does not change between zoom levels, and font sizes only scale in discrete steps.
Since I did not need large differences in scale, the solution I settled on was to use the windowless text interfaces from ITextServices to render to an internal bitmap at a fixed resolution. Then I used GDI+ to resample the internal bitmap to the needed screen size with trilinear filtering. The result emulated a scalable rich edit well enough as long as the scale difference was not too large; it was good enough for my needs.
After trying many different options, I am convinced you cannot get precise scaling with the Windows rich edit control. You can write your own control that renders text, but you would need a separate draw call for every piece of text with a different style, and you would need to handle all the niceties the rich edit control handles for you, like highlighting text, placing the cursor, handling mouse and keyboard input, parsing RTF, et cetera. It would probably be best just to buy a third-party component in this case (I could not find any suitable free open-source components). In case someone wants to attempt it, I will point out the relevant starting points for text rendering in the different APIs.
GDI - TextOut does not set inter-character spacing correctly. You need GetCharacterPlacement and ExtTextOut. You also need to calculate the scaling yourself. You probably don't want to use GDI.
GDI+ - DrawString handles scaling correctly. GDI+ is a reasonable option (see the sketch after this list).
DirectWrite - If you are willing to limit yourself to Vista Platform Update or later, DirectWrite is the newest text API from Microsoft.
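To illustrate the GDI+ option, a minimal hedged sketch (the font, size, and 4x factor are placeholder choices, and GdiplusStartup is assumed to have been called at startup):

    #include <windows.h>
    #include <gdiplus.h>
    using namespace Gdiplus;

    void DrawScaledText(HDC hdc, const WCHAR* text)
    {
        Graphics g(hdc);
        g.SetTextRenderingHint(TextRenderingHintAntiAlias);
        g.ScaleTransform(4.0f, 4.0f); // 400% zoom
        Font font(L"Segoe UI", 8.0f, FontStyleRegular, UnitPoint);
        SolidBrush brush(Color(255, 0, 0, 0));
        // DrawString honors the world transform, so character placement
        // scales linearly (unlike GDI's integer-rounded positions).
        g.DrawString(text, -1, &font, PointF(0.0f, 0.0f), &brush);
    }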
Also here is link describing how text rendering is different between GDI and GDI+:
http://windowsclient.net/articles/gdiptext.aspx
Try using the EM_SETZOOM message to let the rich edit control scale the output itself.
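For example (the message takes a numerator/denominator pair, so 4/1 requests 400%; whether it affects EM_FORMATRANGE output in your scenario is something to test):

    SendMessage(hwndRichEdit, EM_SETZOOM, 4, 1); // zoom to 4/1 = 400%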
I need to convert 24bppRGB to 16bppRGB, 8bppRGB, 4bppRGB, 8bpp grayscale and 4bpp grayscale. Any good links or other suggestions?
preferably using Windows/GDI+
[EDIT] Speed is more critical than quality. The source images are screenshots.
[EDIT1] Color conversion is required to minimize space.
You're better off getting yourself a library, as others have suggested. Aside from ImageMagick, there are others, such as OpenCV. The benefits of leaving this to a library are:
Save yourself some time -- by cutting out dev and testing time for the algorithm
Speed. Most libraries out there are optimized to a level far greater than a standard developer (such as ourselves) could achieve
Standards compliance. There are many image formats out there, and using a library cuts the problem of standards compliance out of the equation.
If you're doing this yourself, then your problem can be divided into the following sub-problems:
Simple color quantization. As @Alf P. Steinbach pointed out, this is just "downscaling" the number of colors. RGB24 has 8 bits for each of the R, G and B channels. For RGB16 you can do a number of conversions (a sketch of both the quantization and grayscale conversions follows this list):
Equal number of bits for each of R, G, B. This typically means 4 or 5 bits each.
Favor the green channel (human eyes are more sensitive to green) and give it 6 bits. R and B get 5 bits.
You can even do the same thing for RGB24 to RGB8, but the results won't be as pretty as a palettized image:
4 bits green, 2 red, 2 blue.
3 bits green, 5 bits split between red and blue (e.g. 3 red, 2 blue).
Palettization (indexed color). This is for going from RGB24 to RGB8 and RGB4. It is a hard problem to solve by yourself.
Color to grayscale conversion. Very easy. Convert your RGB24 to the Y'UV color space and keep the Y' channel. That will give you 8bpp grayscale. If you want 4bpp grayscale, then you either quantize or do palettization.
Also be sure to check out chroma subsampling. Often, you can decrease the bitrate by a third without visible losses to image quality.
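As a hedged sketch of problems 1 and 3 above (plain bit math, no library dependencies; wiring this into your bitmap code is left out):

    #include <cstdint>

    // Problem 1: pack 8-bit channels into 5-6-5, favoring green as discussed.
    inline uint16_t ToRgb565(uint8_t r, uint8_t g, uint8_t b)
    {
        return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
    }

    // Problem 3: keep only the luma channel, Y' = 0.299 R + 0.587 G + 0.114 B,
    // in fixed point (77 + 150 + 29 = 256) to stay fast for screenshots.
    inline uint8_t ToGray8(uint8_t r, uint8_t g, uint8_t b)
    {
        return static_cast<uint8_t>((77 * r + 150 * g + 29 * b) >> 8);
    }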
With that breakdown, you can divide and conquer. Problems 1 and 3 you can solve pretty quickly. That will let you see the quality you can get simply from coarser color quantization.
Whether or not you want to solve Problem 2 will depend on the result from above. You said that speed is more important, so if the quality of color quantization alone is good enough, don't bother with palettization.
Finally, you never mentioned WHY you are doing this. If this is for reducing storage space, then you should be looking at image compression. Even lossless compression will give you better results than reducing the color depth alone.
EDIT
If you're set on using PNG as the final format, then your options are quite limited, because neither RGB16 nor RGB8 is a valid combination in the PNG header.
So what this means is: regardless of bit depth, you will have to switch to indexed color if you want RGB color images below 24bpp (8 bits per channel). This also means you will NOT be able to take advantage of the color quantization and chroma decimation mentioned above -- they're not supported in PNG. So you will have to solve Problem 2 -- palettization.
But before you think about that, some more questions:
What are the dimensions of your images?
What sort of ideal file-size are you after?
How close to that ideal file size do you get with straight RGB24 + PNG compression?
What is the source of your images? You've mentioned screenshots, but since you're so concerned about disk space, I'm beginning to suspect that you might be dealing with image sequences (video). If this is so, then you could do better than PNG compression.
Oh, and if you're serious about doing things with PNG, then definitely have a look at this library.
Find yourself a copy of the ImageMagick library. It's very configurable, so you can teach it about the details of some binary format that you need to process...
See: ImageMagick, which has a very practical license.
I received acceptable (preliminary) results with GDI+ v1.1, which ships with Vista and Win7. It allows conversion to 16bpp (I used PixelFormat16bppRGB565), and to 8bpp and 4bpp using standard palettes. Better quality can be had with an "optimal palette" - GDI+ will calculate an optimal palette for each screenshot - but that conversion is about twice as slow. Grayscale was obtained by specifying a simple custom palette, e.g. as demonstrated here, except that I didn't need to modify the pixels manually; Bitmap::ConvertFormat() did it for me.
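For reference, a hedged sketch of the 16bpp case (GDI+ 1.1 headers required; the dither and palette arguments shown are my assumptions, since they should be ignored for non-indexed target formats):

    #include <windows.h>
    #include <gdiplus.h>
    using namespace Gdiplus;

    // Assumes GdiplusStartup has already been called.
    Status ScreenshotTo16bpp(Bitmap& bmp)
    {
        return bmp.ConvertFormat(PixelFormat16bppRGB565,
                                 DitherTypeNone,
                                 PaletteTypeCustom, // ignored for 16bpp targets
                                 nullptr, 0.0f);
    }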
[EDIT] The results were really acceptable until I decided to test the solution on WinXP. Surprisingly, Microsoft decided not to ship GDI+ v1.1 (required for Bitmap::ConvertFormat) with WinXP. Nice move! So I continue researching...
[EDIT] I had to reimplement this in plain GDI, hardcoding the palettes taken from GDI+.
Pretty simple but specific question here:
I'm not entirely familiar with the JPEG standard for compressing images. Does it create a better result (that is, a smaller file size at similar quality) when the X dimension (width) is very large and the Y dimension (height) is very small, vice versa, or when the two are nearly equal?
The practical use I have for this is CSS sprites. If a website were to consist of hundreds of CSS sprites, it would be ideal to minimize the size of the sprite file to assist users on slower internet and also to reduce server load. If the JPEG standard operates really well on a single horizontal line, but moving vertically requires a lot more complexity, it would make sense for an image of 100 16x16 CSS sprites to be 1600x16.
On the other hand if the JPEG standard has a lot of complexity working horizontally but moves from row to row easily, you could make a smaller file or have higher quality by making the image 16x1600.
If the best compression occurs when the image is a perfect square, you would want the final image to be 160x160.
The MPEG/JPEG blocking mechanism would (very slightly) favor an image size that is an exact multiple of the compression block size in each dimension. However, beyond that, the format won't care if the blocks are vertical or horizontal.
So, the direct answer to your question would be "square is as good as anything", as long as your sprites divide easily into a JPEG compression block (just make sure they are 8, 16, 24 or 32 pixels wide and you'll be fine).
However, I would go a bit further and say that for "most" sprites, you are going to get a smaller file size and clearer resolution if the initial master image is GIF instead of JPG, even more so if you can use a reduced color palette. Consider why you would need JPG at all for "hundreds of sprites".
It looks like JPEG's compression ratio isn't affected by image dimensions. However, your dimensions should be multiples of 8; since all your examples use multiples of 16, you should be fine there.
http://en.wikipedia.org/wiki/JPEG#JPEG_codec_example
If I remember correctly, PNG (being lossless) compresses much better when the same color appears in a horizontal run rather than a vertical one. Why are you making your sprites JPEG? If they have a limited color set (which is likely for 16x16 sprites, animated or not), PNG might actually yield smaller file sizes with perfect image quality.