PCF filtering on OpenGL - opengl

I'm reading the docs for ARB_shadow and they hint that OpenGL is free to implement PCF (Percentage Closer Filtering) in the texture samplers but they can't talk about it in the spec.
First they say:
How is bilinear or trilinear filtering implemented?
RESOLUTION: We suggest an implementation behaviour but leave the
details up to the implementation. Differences here amount to the
quality and softness of shadow edges. Specific filtering algorithms
could be expressed via layered extensions. We're intentionally vague
here to avoid IP and patent issues.
Then later on:
If TEXTURE_MAG_FILTER is not NEAREST or TEXTURE_MIN_FILTER is not
NEAREST or NEAREST_MIPMAP_NEAREST then r may be computed by comparing
more than one depth texture value to the texture R coordinate. The
details of this are implementation-dependent but r should be a value
in the range [0, 1] which is proportional to the number of comparison
passes or failures.
And ... that's PCF!
Questions
The first question is: Does it work? How many vendors really do PCF when you sample a depth texture with a LINEAR filter?
The second (and most important) question is: How would you detect if hardware PCF filtering via ARB_shadow will work? I want to fall back to a software PCF in the shader if it's not present.
Notes
I noticed that ARB_shadow goes hand-in-hand with ARB_fragment_program_shadow (at least on the graphics cards I have access to). ARB_fragment_program_shadow is the low level hardware support for PCF in shaders. Is that a clue to the answer, ie. if both extensions are present then hardware PCF will appear when I say the magic words?
Implementing ARB_shadow with a simple binary result would be silly. I'd assume the whole point of implementing ARB_shadow in a driver would be to do PCF, even if the docs aren't allowed to say so. Does the mere presence of ARB_shadow imply hardware PCF?

Related

How do fragments get generated by rasterizer in OpenGL

I came across the description of rasterization and it basically says that when an object gets projected onto screen that what happens is that a scan takes place over all the pixels on the window/screen and decides if the pixel/fragment is within the triangle and hence determines that the pixel/fragment is inside the triangle and follows with further processing for the pixel/fragment such as colouring etc
Now since i am studying OpenGL and i do know that OpenGL probably has its own implementations of this process i was wondering whether this also takes place with OpenGL since of the "Scan-Conversion" process of vertices that i have read in OpenGL tutorial
Now another question related to this i have is that i know that the image/screen/window of pixels is an image or 2d array of pixels also known as the default framebuffer that is linear
So what i am wondering is if that is the case, how would projecting the 3 vertices of a triangle define which pixels are covered in side it?
Does the rasterizer draw the edges of a triangle first and then scans through each pixel or 2d array of pixels (also known as the default framebuffer) and sees if the points are between the lines using some mathematical method or some other simpler process happens?
and i do know that OpenGL probably has its own implementations of this process
OpenGL is just a specification document. What runs on a computer is an OpenGL implementation, most of the time as part of a GPU driver. The actual workload is carried out by a GPU…
this also takes place with OpenGL since of the "Scan-Conversion" process of vertices that i have read in OpenGL tutorial
most likely not. As a matter of fact last weekend I was attending a Khronos (the group that specifies OpenGL) event hosted by AMD and one of AMD's GPU engineers was lamenting that newbies have the scanline algorithm in mind with OpenGL, Direct3D, Mantel, Vulkan, etc., while GPUs do something entirely different.
2d array of pixels also known as the default framebuffer that is linear
actually the memory layout of pixels as used internally by the GPU is not linear (i.e. row-by-row) but follows a pattern that gives efficient localized access. For linear access GPUs have extremely efficient copy engines that allow for practically zero overhead conversion between the internal and linear format.
The exact layout used internally is a detail only the GPU engineers have insight into, though. But the fact that memory is not organized linearly but in a localized fashion is also one reason, that the traditional scanline algorithm is not used by GPUs.
So what i am wondering is if that is the case, how would projecting the 3 vertices of a triangle define which pixels are covered in side it?
Any method that satisfies the requirements of the OpenGL specification is allowed. The details are part of the OpenGL implementation, i.e. usually the combination of particular GPU model and driver version.
The scanline algorithm is what people did in software back in the 1990s, before modern GPUs. GPU developers figured out rather quickly that the algorithms you use for software rendering are vastly different from the algorithms you would implement in a VLSI implementation with billions of transistors. Algorithms optimized for hardware implementation tend to look fairly alien to anyone who comes from a software background anyway.
Another thing I'd like to clear up is that OpenGL doesn't say anything about "how" you render, it's just "what" you render. OpenGL implementations are free to do it however they please. We can find out "what" by reading the OpenGL standard, but "how" is buried in secrets kept by the GPU vendors.
Finally, before we start, the articles you linked are unrelated. They are about how ultrasonic scans work.
What do we know about scan conversion?
Scan conversion has as input a number of primitives. For our purposes, let's assume that they're all triangles (which is increasingly true these days).
Every triangle must be clipped by the clipping planes. This can add up to three additional sides to the triangle, in the worst case (turning it into a hexagon). This has to happen before perspective projection.
Every primitive must go through perspective projection. This process takes each vertex with homogeneous coordinates (X, Y, Z, W) and converts it to (X/W, Y/W, Z/W).
The framebuffer is usually organized hierarchically into tiles, not linearly like the way you do in software. Furthermore, the processing might be done at more than one hierarchical level. The reason why we use linear organization in software is because it takes extra cycles to compute memory addresses in a hierarchical layout. However, VLSI implementations do not suffer from this problem, they can simply wire up the bits in a register how they want to make an address from it.
So you can see that in software, tiles are "complicated and slow" but in hardware they're "easy and fast".
Some notes looking at the R5xx manual:
The R5xx series is positively ancient (2005) but the documentation is available online (search for "R5xx_Acceleration_v1.5.pdf"). It mentions two scan converters, so the pipeline looks something like this:
primitive output -> coarse scan converter -> quad scan converter -> fragment shader
The coarse scan converter appears to operate on larger tiles of configurable size (8x8 to 32x32), and has multiple selectable modes, an "intercept based" and a "bounding box based" mode.
The quad scan converter then takes the output of the coarse scan converter and outputs individual quads, which are groups of four samples. The depth values for each quad may be represented as four discrete values or as a plane equation. The plane equation allows the entire quad to be discarded quickly if the corresponding quad in the depth buffer also is specified as a plane equation. This is called "early Z" and it is a common optimization.
The fragment shader then works on one quad at a time. The quad might contain samples outside the triangle, which will then get discarded.
It's worth mentioning again that this is an old graphics card. Modern graphics cards are more complicated. For example, the R5xx doesn't even let you sample textures from the vertex shaders.
If you want an even more radically different picture, look up the PowerVR GPU implementations which use something called "tile-based deferred rendering". These modern and powerful GPUs are optimized for low cost and low power consumption, and they challenge a lot of your assumptions about how renderers work.
Quoting from GPU Gems: Parallel Prefix Sum (Scan) with CUDA, it describes how OpenGL does its scan
conversion and compares it with CUDA which I think suffices as the answer of my question:
Prior to the introduction of CUDA, several researchers implemented
scan using graphics APIs such as OpenGL and Direct3D (see Section
39.3.4 for more). To demonstrate the advantages CUDA has over these APIs for computations like scan, in this section we briefly describe
the work-efficient OpenGL inclusive-scan implementation of Sengupta et
al. (2006). Their implementation is a hybrid algorithm that performs a
configurable number of reduce steps as shown in Algorithm 5. It then
runs the double-buffered version of the sum scan algorithm previously
shown in Algorithm 2 on the result of the reduce step. Finally it
performs the down-sweep as shown in Algorithm 6.
Example 5. The Reduce Step of the OpenGL Scan Algorithm
1: for d = 1 to log2 n do
2: for all k = 1 to n/2 d – 1 in parallel do
3: a[d][k] = a[d – 1][2k] + a[d – 1][2k + 1]]
Example 6. The Down-Sweep Step of the OpenGL Scan Algorithm
1: for d = log2 n – 1 down to 0 do
2: for all k = 0 to n/2 d – 1 in parallel do
3: if i > 0 then
4: if k mod 2 U2260.GIF 0 then
5: a[d][k] = a[d + 1][k/2]
6: else
7: a[d][i] = a[d + 1][k/2 – 1]
The OpenGL scan computation is implemented using pixel shaders, and
each a[d] array is a two-dimensional texture on the GPU. Writing to
these arrays is performed using render-to-texture in OpenGL. Thus,
each loop iteration in Algorithm 5 and Algorithm 2 requires reading
from one texture and writing to another.
The main advantages CUDA has over OpenGL are its on-chip shared
memory, thread synchronization functionality, and scatter writes to
memory, which are not exposed to OpenGL pixel shaders. CUDA divides
the work of a large scan into many blocks, and each block is processed
entirely on-chip by a single multiprocessor before any data is written
to off-chip memory. In OpenGL, all memory updates are off-chip memory
updates. Thus, the bandwidth used by the OpenGL implementation is much
higher and therefore performance is lower, as shown previously in
Figure 39-7.

What is the difference between clearing the framebuffer using glClear and simply drawing a rectangle to clear the framebuffer?

I think at least some old graphics drivers used to crash if glClear wasn't used and that glClear is probably faster in a lot of cases but why? How are 3-d graphics drivers usually implemented such that these uses would have different results?
On a high level, it can be faster because the OpenGL implementation knows ahead of time that the whole buffer needs to be set to the same color/value. The more you know about what exactly needs to be done, the more you can take advantage of possible accelerations.
Let's say setting a whole buffer to the same value is more efficient than setting the same pixels to variable values. With a glClear(), you know already that all pixels will have the same value. If you draw a screen sized quad with a fragment shader that emits a constant color, the driver would either have to recognize that situation by analyzing the shaders, or the system would have to compare the values coming out of the shader, to know that all pixels have the same value.
The reason why setting everything to the same value can be more efficient has to do with framebuffer compression and related technologies. GPUs often don't actually write each pixel out to the framebuffer, but use various kinds of compression schemes to reduce the memory bandwidth needed for framebuffer writes. If you imagine almost any kind of compression, all pixels having the same value is very favorable.
To give you some ideas about the published vendor specific technologies, here are a few sources. You can probably find more with a search.
Article talking about new framebuffer compression method in relatively recent AMD cards: http://techreport.com/review/26997/amd-radeon-r9-285-graphics-card-reviewed/2.
NVIDIA patent on zero bandwidth clears: http://www.google.com/patents/US8330766.
Blurb on ARM web site about Mali framebuffer compression: http://www.arm.com/products/multimedia/mali-technologies/arm-frame-buffer-compression.php.
Why is it faster? Because it is a function that bypasses most calculations that other types of drawings have to go through.
Alpha function, blend function, logical operation, stenciling, texture mapping, and depth-buffering are ignored by glClear
Source
Why do some drivers crash without it? It's hard to say, but it should have something to do with the implementation details of OpenGL. The functions does what it's supposed to do, but might do more that you don't know about.
OpenGL might infer from this function call other tasks that it needs to perform.

how to make opengl mipmaps sharper?

I am rewriting an opengl-based gis/mapping program. Among other things, the program allows you to load raster images of nautical charts, fix them to lon/lat coordinates and zoom and pan around on them.
The previous version of the program uses a custom tiling system, where in essence it manually creates mipmaps of the original image, in the form of 256x256-pixel tiles at various power-of-two zoom levels. A tile for zoom level n - 1 is constructed from four tiles from zoom level n, using a simple average-of-four-points algorithm. So, it turns off opengl mipmapping, and instead when it comes time to draw some part of the chart at some zoom level, it uses the tiles from the nearest-match zoom level (i.e., the tiles are in power-of-two zoom levels but the program allows arbitrary zoom levels) and then scales the tiles to match the actual zoom level. And of course it has to manage a cache of all these tiles at various levels.
It seemed to me that this tiling system was overly complex. It seemed like I should be able to let the graphics hardware do all of this mipmapping work for me. So in the new program, when I read in an image, I chop it into textures of 1024x1024 pixels each. Then I fix each texture to its lon/lat coordinates, and then I let opengl handle the rest as I zoom and pan around.
It works, but the problem is: My results are a bit blurrier than the original program, which matters for this application because you want to be able to read text on the charts as early as possible, zoom-wise. So it's seeming like the simple average-of-four-points algorithm the original program uses gives better results than opengl + my GPU, in terms of sharpness.
I know there are several glTexParameter settings to control some aspects of how mipmaps work. I've tried various combinations of GL_TEXTURE_MAX_LEVEL (anywhere from 0 to 10) with various settings for GL_TEXTURE_MIN_FILTER. When I set GL_TEXTURE_MAX_LEVEL to 0 (no mipmaps), I certainly get "sharp" results, but they are too sharp, in the sense that pixels just get dropped here and there, so the numbers are unreadable at intermediate zooms. When I set GL_TEXTURE_MAX_LEVEL to a higher value, the image looks quite good when you are zoomed far out (e.g., when the whole chart fits on the screen), but as you zoom in to intermediate zooms, you notice the blurriness especially when looking at text on the charts. (I.e., if it weren't for the text you might think "wow, opengl is doing a nice job of smoothly scaling my image." but with the text you think "why is this chart out of focus?")
My understanding is that basically you tell opengl to generate mipmaps, and then as you zoom in it picks the appropriate mipmaps to use, and there are some limited options for interpolating between the two closest mipmap levels, and either using the closest pixels or averaging the nearby pixels. However, as I say, none of these combinations seem to give quite as clear results, at the same zoom level on the chart (i.e., a zoom level where text is small but not minuscule, like the equivalent of "7 point" or "8 point" size), as the previous tile-based version.
My conclusion is that the mipmaps that opengl creates are simply blurrier than the ones the previous program created with the average-four-point algorithm, and no amount of choosing the right mipmap or LINEAR vs NEAREST is going to get the sharpness I need.
Specific questions:
(1) Does it seem right that opengl is in fact making blurrier mipmaps than the average-four-points algorithm from the original program?
(2) Is there something I might have overlooked in my use of glTexParameter that could give sharper results using the mipmaps opengl is making?
(3) Is there some way I can get opengl to make sharper mipmaps in the first place, such as by using a "cubic" filter or otherwise controlling the mipmap creation process? Or for that matter it seems like I could use the same average-four-points code to manually generate the mipmaps and hand them off to opengl. But I don't know how to do that...
(1) it seems unlikely; I'd expect it just to use a box filter, which is average four points in effect. Possibly it's just switching from one texture to a higher resolution one at a different moment — e.g. it "Chooses the mipmap that most closely matches the size of the pixel being textured", so a 256x256 map will be used to texture a 383x383 area, whereas the manual system it replaces may always have scaled down from 512x512 until the target size was 256x256 or less.
(2) not that I'm aware of in base GL, but if you were to switch to GLSL and the programmable pipeline then you could use the 'bias' parameter to texture2D if the problem is that the lower resolution map is being used when you don't want it to be. Similarly, the GL_EXT_texture_lod_bias extension can do the same in the fixed pipeline. It's an NVidia extension from a decade ago and is something all programmable cards could do, so it's reasonably likely you'll have it.
(EDIT: reading the extension more thoroughly, texture bias migrated into the core spec of OpenGL in version 1.4; clearly my man pages are very out of date. Checking the 1.4 spec, page 279, you can supply a GL_TEXTURE_LOD_BIAS)
(3) yes — if you disable GL_GENERATE_MIPMAP then you can use glTexImage2D to supply whatever image you like for every level of scale, that being what the 'level' parameter dictates. So you can supply completely unrelated mip maps if you want.
To answer your specific points, the four-point filtering you mention is equivalent to box-filtering. This is less blurry than higher-order filters, but can result in aliasing patterns. One of the best filters is the Lanczos filter. I suggest you calculate all of your mipmap levels from the base texture using a Lanczos filter and crank up the anisotropic filtering settings on your graphics card.
I assume that the original code managed textures itself because it was designed to view data sets that are too large to fit into graphics memory. This was probably a bigger problem in the past, but is still a concern.

What is state-of-the-art for text rendering in OpenGL as of version 4.1? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
The community reviewed whether to reopen this question last month and left it closed:
Original close reason(s) were not resolved
Improve this question
There are already a number of questions about text rendering in OpenGL, such as:
How to do OpenGL live text-rendering for a GUI?
But mostly what is discussed is rendering textured quads using the fixed-function pipeline. Surely shaders must make a better way.
I'm not really concerned about internationalization, most of my strings will be plot tick labels (date and time or purely numeric). But the plots will be re-rendered at the screen refresh rate and there could be quite a bit of text (not more than a few thousand glyphs on-screen, but enough that hardware accelerated layout would be nice).
What is the recommended approach for text-rendering using modern OpenGL? (Citing existing software using the approach is good evidence that it works well)
Geometry shaders that accept e.g. position and orientation and a character sequence and emit textured quads
Geometry shaders that render vector fonts
As above, but using tessellation shaders instead
A compute shader to do font rasterization
Rendering outlines, unless you render only a dozen characters total, remains a "no go" due to the number of vertices needed per character to approximate curvature. Though there have been approaches to evaluate bezier curves in the pixel shader instead, these suffer from not being easily antialiased, which is trivial using a distance-map-textured quad, and evaluating curves in the shader is still computationally much more expensive than necessary.
The best trade-off between "fast" and "quality" are still textured quads with a signed distance field texture. It is very slightly slower than using a plain normal textured quad, but not so much. The quality on the other hand, is in an entirely different ballpark. The results are truly stunning, it is as fast as you can get, and effects such as glow are trivially easy to add, too. Also, the technique can be downgraded nicely to older hardware, if needed.
See the famous Valve paper for the technique.
The technique is conceptually similar to how implicit surfaces (metaballs and such) work, though it does not generate polygons. It runs entirely in the pixel shader and takes the distance sampled from the texture as a distance function. Everything above a chosen threshold (usually 0.5) is "in", everything else is "out". In the simplest case, on 10 year old non-shader-capable hardware, setting the alpha test threshold to 0.5 will do that exact thing (though without special effects and antialiasing).
If one wants to add a little more weight to the font (faux bold), a slightly smaller threshold will do the trick without modifying a single line of code (just change your "font_weight" uniform). For a glow effect, one simply considers everything above one threshold as "in" and everything above another (smaller) threshold as "out, but in glow", and LERPs between the two. Antialiasing works similarly.
By using an 8-bit signed distance value rather than a single bit, this technique increases the effective resolution of your texture map 16-fold in each dimension (instead of black and white, all possible shades are used, thus we have 256 times the information using the same storage). But even if you magnify far beyond 16x, the result still looks quite acceptable. Long straight lines will eventually become a bit wiggly, but there will be no typical "blocky" sampling artefacts.
You can use a geometry shader for generating the quads out of points (reduce bus bandwidth), but honestly the gains are rather marginal. The same is true for instanced character rendering as described in GPG8. The overhead of instancing is only amortized if you have a lot of text to draw. The gains are, in my opinion, in no relation to the added complexity and non-downgradeability. Plus, you are either limited by the amount of constant registers, or you have to read from a texture buffer object, which is non-optimal for cache coherence (and the intent was to optimize to begin with!).
A simple, plain old vertex buffer is just as fast (possibly faster) if you schedule the upload a bit ahead in time and will run on every hardware built during the last 15 years. And, it is not limited to any particular number of characters in your font, nor to a particular number of characters to render.
If you are sure that you do not have more than 256 characters in your font, texture arrays may be worth a consideration to strip off bus bandwidth in a similar manner as generating quads from points in the geometry shader. When using an array texture, the texture coordinates of all quads have identical, constant s and t coordinates and only differ in the r coordinate, which is equal to the character index to render.
But like with the other techniques, the expected gains are marginal at the cost of being incompatible with previous generation hardware.
There is a handy tool by Jonathan Dummer for generating distance textures: description page
Update:
As more recently pointed out in Programmable Vertex Pulling (D. Rákos, "OpenGL Insights", pp. 239), there is no significant extra latency or overhead associated with pulling vertex data programmatically from the shader on the newest generations of GPUs, as compared to doing the same using the standard fixed function.
Also, the latest generations of GPUs have more and more reasonably sized general-purpose L2 caches (e.g. 1536kiB on nvidia Kepler), so one may expect the incoherent access problem when pulling random offsets for the quad corners from a buffer texture being less of a problem.
This makes the idea of pulling constant data (such as quad sizes) from a buffer texture more attractive. A hypothetical implementation could thus reduce PCIe and memory transfers, as well as GPU memory, to a minimum with an approach like this:
Only upload a character index (one per character to be displayed) as the only input to a vertex shader that passes on this index and gl_VertexID, and amplify that to 4 points in the geometry shader, still having the character index and the vertex id (this will be "gl_primitiveID made available in the vertex shader") as the sole attributes, and capture this via transform feedback.
This will be fast, because there are only two output attributes (main bottleneck in GS), and it is close to "no-op" otherwise in both stages.
Bind a buffer texture which contains, for each character in the font, the textured quad's vertex positions relative to the base point (these are basically the "font metrics"). This data can be compressed to 4 numbers per quad by storing only the offset of the bottom left vertex, and encoding the width and height of the axis-aligned box (assuming half floats, this will be 8 bytes of constant buffer per character -- a typical 256 character font could fit completely into 2kiB of L1 cache).
Set an uniform for the baseline
Bind a buffer texture with horizontal offsets. These could probably even be calculated on the GPU, but it is much easier and more efficient to that kind of thing on the CPU, as it is a strictly sequential operation and not at all trivial (think of kerning). Also, it would need another feedback pass, which would be another sync point.
Render the previously generated data from the feedback buffer, the vertex shader pulls the horizontal offset of the base point and the offsets of the corner vertices from buffer objects (using the primitive id and the character index). The original vertex ID of the submitted vertices is now our "primitive ID" (remember the GS turned the vertices into quads).
Like this, one could ideally reduce the required vertex bandwith by 75% (amortized), though it would only be able to render a single line. If one wanted to be able to render several lines in one draw call, one would need to add the baseline to the buffer texture, rather than using an uniform (making the bandwidth gains smaller).
However, even assuming a 75% reduction -- since the vertex data to display "reasonable" amounts of text is only somewhere around 50-100kiB (which is practically zero to a GPU or a PCIe bus) -- I still doubt that the added complexity and losing backwards-compatibility is really worth the trouble. Reducing zero by 75% is still only zero. I have admittedly not tried the above approach, and more research would be needed to make a truly qualified statement. But still, unless someone can demonstrate a truly stunning performance difference (using "normal" amounts of text, not billions of characters!), my point of view remains that for the vertex data, a simple, plain old vertex buffer is justifiably good enough to be considered part of a "state of the art solution". It's simple and straightforward, it works, and it works well.
Having already referenced "OpenGL Insights" above, it is worth to also point out the chapter "2D Shape Rendering by Distance Fields" by Stefan Gustavson which explains distance field rendering in great detail.
Update 2016:
Meanwhile, there exist several additional techniques which aim to remove the corner rounding artefacts which become disturbing at extreme magnifications.
One approach simply uses pseudo-distance fields instead of distance fields (the difference being that the distance is the shortest distance not to the actual outline, but to the outline or an imaginary line protruding over the edge). This is somewhat better, and runs at the same speed (identical shader), using the same amount of texture memory.
Another approach uses the median-of-three in a three-channel texture details and implementation available at github. This aims to be an improvement over the and-or hacks used previously to address the issue. Good quality, slightly, almost not noticeably, slower, but uses three times as much texture memory. Also, extra effects (e.g. glow) are harder to get right.
Lastly, storing the actual bezier curves making up characters, and evaluating them in a fragment shader has become practical, with slightly inferior performance (but not so much that it's a problem) and stunning results even at highest magnifications.
WebGL demo rendering a large PDF with this technique in real time available here.
http://code.google.com/p/glyphy/
The main difference between GLyphy and other SDF-based OpenGL renderers is that most other projects sample the SDF into a texture. This has all the usual problems that sampling has. Ie. it distorts the outline and is low quality. GLyphy instead represents the SDF using actual vectors submitted to the GPU. This results in very high quality rendering.
The downside is that the code is for iOS with OpenGL ES. I'm probably going to make a Windows/Linux OpenGL 4.x port (hopefully the author will add some real documentation, though).
The most widespread technique is still textured quads. However in 2005 LORIA developed something called vector textures, i.e. rendering vector graphics as textures on primitives. If one uses this to convert TrueType or OpenType fonts into a vector texture you get this:
http://alice.loria.fr/index.php/publications.html?Paper=VTM#2005
I'm surprised Mark Kilgard's baby, NV_path_rendering (NVpr), was not mentioned by any of the above. Although its goals are more general than font rendering, it can also render text from fonts and with kerning. It doesn't even require OpenGL 4.1, but it is a vendor/Nvidia-only extension at the moment. It basically turns fonts into paths using glPathGlyphsNV which depends on the freetype2 library to get the metrics, etc. Then you can also access the kerning info with glGetPathSpacingNV and use NVpr's general path rendering mechanism to display text from using the path-"converted" fonts. (I put that in quotes, because there's no real conversion, the curves are used as is.)
The recorded demo for NVpr's font capabilities is unfortunately not particularly impressive. (Maybe someone should make one along the lines of the much snazzier SDF demo one can find on the intertubes...)
The 2011 NVpr API presentation talk for the fonts part starts here and continues in the next part; it is a bit unfortunate how that presentation is split.
More general materials on NVpr:
Nvidia NVpr hub, but some material on the landing page is not the most up-to-date
Siggraph 2012 paper for the brains of the path-rendering method, called "stencil, then cover" (StC); the paper also explains briefly how competing tech like Direct2D works. The font-related bits have been relegated to an annex of the paper. There are also some extras like videos/demos.
GTC 2014 presentation for an update status; in a nutshell: it's now supported by Google's Skia (Nvidia contributed the code in late 2013 and 2014), which in turn is used in Google Chrome and [independently of Skia, I think] in a beta of Adobe Illustrator CC 2014
the official documentation in the OpenGL extension registry
USPTO has granted at least four patents to Kilgard/Nvidia in connection with NVpr, of which you should probably be aware of, in case you want to implement StC by yourself: US8698837, US8698808, US8704830 and US8730253. Note that there are something like 17 more USPTO documents connected to this as "also published as", most of which are patent applications, so it's entirely possible more patents may be granted from those.
And since the word "stencil" did not produce any hits on this page before my answer, it appears the subset of the SO community that participated on this page insofar, despite being pretty numerous, was unaware of tessellation-free, stencil-buffer-based methods for path/font rendering in general. Kilgard has a FAQ-like post at on the opengl forum which may illuminate how the tessellation-free path rendering methods differ from bog standard 3D graphics, even though they're still using a [GP]GPU. (NVpr needs a CUDA-capable chip.)
For historical perspective, Kilgard is also the author of the classic "A Simple OpenGL-based API for Texture Mapped Text", SGI, 1997, which should not be confused with the stencil-based NVpr that debuted in 2011.
Most if not all the recent methods discussed on this page, including stencil-based methods like NVpr or SDF-based methods like GLyphy (which I'm not discussing here any further because other answers already cover it) have however one limitation: they are suitable for large text display on conventional (~100 DPI) monitors without jaggies at any level of scaling, and they also look nice, even at small size, on high-DPI, retina-like displays. They don't fully provide what Microsoft's Direct2D+DirectWrite gives you however, namely hinting of small glyphs on mainstream displays. (For a visual survey of hinting in general see this typotheque page for instance. A more in-depth resource is on antigrain.com.)
I'm not aware of any open & productized OpenGL-based stuff that can do what Microsoft can with hinting at the moment. (I admit ignorance to Apple's OS X GL/Quartz internals, because to the best of my knowledge Apple hasn't published how they do GL-based font/path rendering stuff. It seems that OS X, unlike MacOS 9, doesn't do hinting at all, which annoys some people.) Anyway, there is one 2013 research paper that addresses hinting via OpenGL shaders written by INRIA's Nicolas P. Rougier; it is probably worth reading if you need to do hinting from OpenGL. While it may seem that a library like freetype already does all the work when it comes to hinting, that's not actually so for the following reason, which I'm quoting from the paper:
The FreeType library can rasterize a glyph using sub-pixel anti-aliasing in RGB mode.
However, this is only half of the problem, since we also want to achieve sub-pixel
positioning for accurate placement of the glyphs. Displaying the textured quad at
fractional pixel coordinates does not solve the problem, since it only results in texture
interpolation at the whole-pixel level. Instead, we want to achieve a precise shift
(between 0 and 1) in the subpixel domain. This can be done in a fragment shader [...].
The solution is not exactly trivial, so I'm not going to try to explain it here. (The paper is open-access.)
One other thing I've learned from Rougier's paper (and which Kilgard doesn't seem to have considered) is that the font powers that be (Microsoft+Adobe) have created not one but two kerning specification methods. The old one is based on a so-called kern table and it is supported by freetype. The new one is called GPOS and it is only supported by newer font libraries like HarfBuzz or pango in the free software world. Since NVpr doesn't seem to support either of those libraries, kerning might not work out of the box with NVpr for some new fonts; there are some of those apparently in the wild, according to this forum discussion.
Finally, if you need to do complex text layout (CTL) you seem to be currently out of luck with OpenGL as no OpenGL-based library appears to exist for that. (DirectWrite on the other hand can handle CTL.) There are open-sourced libraries like HarfBuzz which can render CTL, but I don't know how you'd get them to work well (as in using the stencil-based methods) via OpenGL. You'd probably have to write the glue code to extract the re-shaped outlines and feed them into NVpr or SDF-based solutions as paths.
I think your best bet would be to look into cairo graphics with OpenGL backend.
The only problem I had when developing a prototype with 3.3 core was deprecated function usage in OpenGL backend. It was 1-2 years ago so situation might have improved...
Anyway, I hope in the future desktop opengl graphics drivers will implement OpenVG.

How to do ray tracing in modern OpenGL?

So I'm at a point that I should begin lighting my flatly colored models. The test application is a test case for the implementation of only latest methods so I realized that ideally it should be implementing ray tracing (since theoretically, it might be ideal for real time graphics in a few years).
But where do I start?
Assume I have never done lighting in old OpenGL, so I would be going directly to non-deprecated methods.
The application has currently properly set up vertex buffer objects, vertex, normal and color input and it correctly draws and transforms models in space, in a flat color.
Is there a source of information that would take one from flat colored vertices to all that is needed for a proper end result via GLSL? Ideally with any other additional lighting methods that might be required to complement it.
I would not advise to try actual ray tracing in OpenGL because you need a lot hacks and tricks for that and, if you ask me, there is not a point in doing this anymore at all.
If you want to do ray tracing on GPU, you should go with any GPGPU language, such as CUDA or OpenCL because it makes things a lot easier (but still, far from trivial).
To illustrate the problem a bit further:
For raytracing, you need to trace the secondary rays and test for intersection with the geometry. Therefore, you need access to the geometry in some clever way inside your shader, however inside a fragment shader, you cannot access the geometry, if you do not store it "coded" into some texture. The vertex shader also does not provide you with this geometry information natively, and geometry shaders only know the neighbors so here the trouble already starts.
Next, you need acceleration data-structures to get any reasonable frame-rates. However, traversing e.g. a Kd-Tree inside a shader is quite difficult and if I recall correctly, there are several papers solely on this problem.
If you really want to go this route, though, there are a lot papers on this topic, it should not be too hard to find them.
A ray tracer requires extremely well designed access patterns and caching to reach a good performance. However, you have only little control over these inside GLSL and optimizing the performance can get really tough.
Another point to note is that, at least to my knowledge, real time ray tracing on GPUs is mostly limited to static scenes because e.g. kd-trees only work (well) for static scenes. If you want to have dynamic scenes, you need other data-structures (e.g. BVHs, iirc?) but you constantly need to maintain those. If I haven't missed anything, there is still a lot of research currently going on just on this issue.
You may be confusing some things.
OpenGL is a rasterizer. Forcing it to do raytracing is possible, but difficult. This is why raytracing is not "ideal for real time graphics in a few years". In a few years, only hybrid systems will be viable.
So, you have three possibities.
Pure raytracing. Render only a fullscreen quad, and in your fragment shader, read your scene description packed in a buffer (like a texture), traverse the hierarchy, and compute ray-triangles intersections.
Hybrid raytracing. Rasterize your scene the normal way, and use raytracing in your shader on some parts of the scene that really requires it (refraction, ... but it can be simultated in rasterisation)
Pure rasterization. The fragment shader does its normal job.
What exactly do you want to achieve ? I can improve the answer depending on your needs.
Anyway, this SO question is highly related. Even if this particular implementation has a bug, it's definetely the way to go. Another possibility is openCL, but the concept is the same.
As for 2019 ray tracing is an option for real time rendering but requires high end GPUs most users don't have.
Some of these GPUs are designed specifically for ray tracing.
OpenGL currently does not support hardware accelerated ray tracing.
DirectX 12 on windows does have support for it. It is recommended to wait a few more years before creating a ray tracing only renderer although it is possible using DirectX 12 with current desktop and laptop hardware. Support from mobile may take a while.
Opengl (glsl) can be used for ray (path) tracing. however there are few better options: Nvidia OptiX (Cuda toolkit -- cross platform), directx 12 (with Nvidia ray tracing extension DXR -- windows only), vulkan (nvidia ray tracing extension VKR -- cross platform, and widely used), metal (only works on MacOS), falcor (DXR, VKR, OptiX based framework), Intel Embree (CPU ray tracing only).
I found some of the other answers to be verbose and wordy. For visual examples that YES, functional ray tracers absolutely CAN be built using the OpenGL API, I highly recommend checking out some of the projects people are making on https://www.shadertoy.com/ (Warning: lag)
To answer the topic: OpenGL has no RTX extension, but Vulkan has, and interop is possible. Example here: https://github.com/nvpro-samples/gl_vk_raytrace_interop
As for the actual question: To light the triangles, there are tons of techniques, look up for "forward", "forward+" or "deferred" renderers. The technique to be used depends on you goal. The simplest and most good-looking these days, is image based lighting (IBL) with physically based shading (PBS). Basically, you use a cubemap and blur it more or less depending on the glossiness of the object. For a simple object viewer you don't need more.