I am writing a game engine with DirectX 11.
I want to create a lighting system, but with no limit on the number of lights!
I am sending this structure to my pixel shader:
struct LightingConstantBufferData {
DirectionalLight directionalLights[???];
};
The problem is the size of the 'directionalLights' array. C++ requires a constant size, but unlimited lighting requires a dynamic array. I don't know how to do this.
And here is the part of my pixel shader code that receives it:
cbuffer LightingConstantBufferData: register(b3) {
DirectionalLight directionalLights[SH_TOTAL_LIGHTS]; // 'SH_TOTAL_LIGHTS' is defined from the C++ code (shader macro)
};
I come from OpenGL, so I may be wrong here, but as far as I know, you can send compute shaders a buffer of data whose last element can be an array of unknown size. I'm not sure whether DirectX works the same way, but it's definitely worth a try if you can do your calculations in a compute shader.
Each 'pass' of rendering is going to be limited to a certain number of lights. Generally graphics shaders and constant buffers are not designed for dynamic arrays.
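The usual way to live with that limit is to size the array for a maximum and send the number of lights actually in use alongside it, so the shader loops only over `lightCount` entries. This is a minimal C++ sketch that assumes a hypothetical `DirectionalLight` layout (the question does not show one) and a hypothetical cap of 8 that would also be passed to HLSL as `SH_TOTAL_LIGHTS`. The HLSL cbuffer would then declare a matching `uint lightCount;` after the array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical light layout: each row is padded to a full float4 so the C++
// struct matches HLSL constant buffer packing (16-byte registers).
struct DirectionalLight {
    float direction[3]; float pad0;
    float color[3];     float intensity;
};

// Hypothetical cap; compile the shader with the same value for SH_TOTAL_LIGHTS.
constexpr std::size_t MAX_DIRECTIONAL_LIGHTS = 8;

struct LightingConstantBufferData {
    DirectionalLight directionalLights[MAX_DIRECTIONAL_LIGHTS];
    std::uint32_t lightCount;  // how many entries the shader should loop over
    std::uint32_t pad[3];      // keep the total size a multiple of 16 bytes
};

// D3D11 requires constant buffer sizes to be multiples of 16 bytes.
static_assert(sizeof(LightingConstantBufferData) % 16 == 0,
              "cbuffer size must be a multiple of 16");

// Copies up to MAX_DIRECTIONAL_LIGHTS lights and records the count.
// Returns how many lights did not fit (they would need another pass).
std::size_t fillLighting(LightingConstantBufferData& cb,
                         const DirectionalLight* lights, std::size_t n) {
    assert(lights != nullptr || n == 0);
    std::size_t used = n < MAX_DIRECTIONAL_LIGHTS ? n : MAX_DIRECTIONAL_LIGHTS;
    for (std::size_t i = 0; i < used; ++i) cb.directionalLights[i] = lights[i];
    cb.lightCount = static_cast<std::uint32_t>(used);
    return n - used;
}
```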
There are of course other approaches to achieve the effect of 'as many lights as you'd like'. Multi-pass rendering was the common approach in the old days where you'd render the same frame multiple times and blend the results together. Many engines used a mix of a few 'important' direct lights (3 or 4) for dynamic characters and objects, and static lightmaps for background.
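The multi-pass idea boils down to splitting the light list into chunks that each fit the per-pass limit. This is a minimal sketch. The blend state mentioned in the comments is the usual choice for accumulating lighting, but the names here are only illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One rendering pass covers lights [first, first + count).
struct LightPass { std::size_t first; std::size_t count; };

// Splits totalLights into passes of at most maxPerPass lights each. The first
// pass would be drawn normally; later passes would be drawn with additive
// blending (e.g. ONE/ONE) so their contributions add up. With zero lights no
// pass is produced (an ambient-only pass would be handled separately).
std::vector<LightPass> planLightPasses(std::size_t totalLights, std::size_t maxPerPass) {
    assert(maxPerPass > 0);
    std::vector<LightPass> passes;
    for (std::size_t first = 0; first < totalLights; first += maxPerPass) {
        std::size_t remaining = totalLights - first;
        passes.push_back({first, remaining < maxPerPass ? remaining : maxPerPass});
    }
    return passes;
}
```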
Many modern game engines use 'deferred rendering' or a hybrid of deferred and forward techniques (called forward tiled rendering or "forward+") largely because it allows them to linearly scale lighting based on the desires of designers and artists.
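Forward+ replaces one global light list with a short list per screen tile, so each pixel only loops over the lights that can reach it. Below is a CPU-only 2D sketch of that light-binning step. It assumes each light has already been projected to a screen-space circle. Real implementations do this in a compute shader and also cull against each tile's depth range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Screen-space bounds of a light's influence (a circle, in pixels).
struct ScreenLight { float x, y, radius; };

// For each tile x tile pixel block, collects the indices of lights whose
// circle overlaps it. The shading pass would then loop only over the lights
// listed for the pixel's tile.
std::vector<std::vector<std::size_t>> binLights(const std::vector<ScreenLight>& lights,
                                                int width, int height, int tile) {
    assert(width > 0 && height > 0 && tile > 0);
    int tilesX = (width + tile - 1) / tile;
    int tilesY = (height + tile - 1) / tile;
    std::vector<std::vector<std::size_t>> bins(static_cast<std::size_t>(tilesX * tilesY));
    for (int ty = 0; ty < tilesY; ++ty) {
        for (int tx = 0; tx < tilesX; ++tx) {
            float x0 = float(tx * tile), y0 = float(ty * tile);
            float x1 = x0 + tile, y1 = y0 + tile;
            for (std::size_t i = 0; i < lights.size(); ++i) {
                const ScreenLight& l = lights[i];
                // Closest point of the tile rectangle to the circle centre.
                float cx = std::clamp(l.x, x0, x1);
                float cy = std::clamp(l.y, y0, y1);
                float dx = l.x - cx, dy = l.y - cy;
                if (dx * dx + dy * dy <= l.radius * l.radius)
                    bins[ty * tilesX + tx].push_back(i);
            }
        }
    }
    return bins;
}
```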
A quick search of the Internet will bring up numerous articles on these topics. This one is a good overview.
Forward vs Deferred vs Forward+ Rendering with DirectX 11
My advice is to start small and not worry about hundreds of lights just yet... The DirectX Tool Kit may be of use to you as well.
I am in the middle of rendering different textures on multiple meshes of a model, but I do not have many clues about the procedure. Someone suggested creating a descriptor set for each mesh and calling vkCmdBindDescriptorSets() and vkCmdDrawIndexed() for rendering, like this:
// Pipeline with descriptor set layout that matches the shared descriptor sets
vkCmdBindPipeline(...pipelines.mesh...);
...
// Mesh A
vkCmdBindDescriptorSets(...&meshA.descriptorSet... );
vkCmdDrawIndexed(...);
// Mesh B
vkCmdBindDescriptorSets(...&meshB.descriptorSet... );
vkCmdDrawIndexed(...);
However, the above approach is quite different from the chopper sample and Vulkan's samples, so I have no idea where to start making the change. I'd really appreciate any help pointing me in the right direction.
Cheers
You have a conceptual object which is made of multiple meshes which have different texturing needs. The general ways to deal with this are:
Change descriptor sets between parts of the object. Painful, but it works on all Vulkan-capable hardware.
Employ array textures. Each individual mesh fetches its data from a particular layer of the array texture. Of course, this restricts you to having each sub-mesh use textures of the same size. But it works on all Vulkan-capable hardware (every implementation must support at least 256 array layers). The array layer for a particular mesh can be provided as a push constant, or as a base instance if that's available.
Note that if you manage to be able to do it by base instance, then you can render the entire object with a multi-draw indirect command. Though it's not clear that a short multi-draw indirect would be faster than just baking a short sequence of drawing commands into a command buffer.
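This is a minimal CPU-side sketch of the base-instance idea: one indirect command per sub-mesh, with the array layer carried in `firstInstance`. The vertex shader can read it back through `gl_InstanceIndex`, which includes `firstInstance` in Vulkan. The `SubMesh` struct is hypothetical. The command struct only mirrors the field order of Vulkan's `VkDrawIndexedIndirectCommand`, so the example compiles without the Vulkan headers. A non-zero `firstInstance` in indirect draws requires the `drawIndirectFirstInstance` feature, and a draw count above 1 requires `multiDrawIndirect`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the field order of VkDrawIndexedIndirectCommand so the vector's
// contents could be copied into an indirect buffer as-is.
struct DrawIndexedIndirectCommand {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  vertexOffset;
    std::uint32_t firstInstance;
};
static_assert(sizeof(DrawIndexedIndirectCommand) == 20, "must match the 20-byte Vulkan layout");

// Hypothetical description of one sub-mesh inside shared index/vertex buffers.
struct SubMesh {
    std::uint32_t indexCount, firstIndex;
    std::int32_t  vertexOffset;
    std::uint32_t textureLayer;  // array-texture layer this part samples
};

// One command per sub-mesh; firstInstance carries the layer index.
std::vector<DrawIndexedIndirectCommand> buildCommands(const std::vector<SubMesh>& parts) {
    std::vector<DrawIndexedIndirectCommand> cmds;
    cmds.reserve(parts.size());
    for (const SubMesh& p : parts) {
        assert(p.indexCount > 0);
        cmds.push_back({p.indexCount, 1u, p.firstIndex, p.vertexOffset, p.textureLayer});
    }
    return cmds;
}
```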
Employ sampler arrays, as Sascha Willems suggests. Presumably, the array index for the sub-mesh is provided as a push-constant or a multi-draw's draw index. The problem is that, regardless of how that array index is provided, it will have to be a dynamically uniform expression. And Vulkan implementations are not required to allow you to index a sampler array with a dynamically uniform expression. The base requirement is just a constant expression.
This limits you to hardware that supports the shaderSampledImageArrayDynamicIndexing feature. So you have to ask for it, and if it's not available, you have to work around that with the first or second approach. Or just don't run on that hardware. But that last option means you can't run on much of the mobile hardware out there, since most of it doesn't support this feature as of yet.
Note that I am not saying you shouldn't use this method. I just want you to be aware that there are costs. There's a lot of hardware out there that can't do this. So you need to plan for that.
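The fallback logic above can be written down as a small decision function. This is a sketch only. `dynamicIndexing` stands for the `shaderSampledImageArrayDynamicIndexing` member of `VkPhysicalDeviceFeatures`, and `maxLayers` stands for `VkPhysicalDeviceLimits::maxImageArrayLayers`. The enum, the `sameSize` and `layerCount` inputs, and the order of preference are assumptions. The answer itself only lists the options and their costs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the three approaches described above.
enum class TexturingStrategy { SamplerArray, ArrayTexture, DescriptorSetPerMesh };

// dynamicIndexing: the device feature (it must also be enabled at device creation).
// sameSize/layerCount: hypothetical facts about the model's textures.
// maxLayers: the device's maxImageArrayLayers limit.
TexturingStrategy chooseStrategy(bool dynamicIndexing, bool sameSize,
                                 std::uint32_t layerCount, std::uint32_t maxLayers) {
    assert(layerCount > 0);
    if (dynamicIndexing) return TexturingStrategy::SamplerArray;
    if (sameSize && layerCount <= maxLayers) return TexturingStrategy::ArrayTexture;
    return TexturingStrategy::DescriptorSetPerMesh;  // works on all Vulkan hardware
}
```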
The person that suggested the above code fragment was me I guess ;)
This is only one way of doing it. You don't necessarily have to create one descriptor set per mesh or per texture. If your mesh e.g. uses 4 different textures, you could bind all of them at once to different binding points and select them in the shader.
And if you a take a look at NVIDIA's chopper sample, they do it pretty much the same way only with some more abstraction.
The example also sets up descriptor sets for the textures used:
VkDescriptorSet *textureDescriptors = m_renderer->getTextureDescriptorSets();
It binds them a few lines later:
VkDescriptorSet sets[3] = { sceneDescriptor, textureDescriptors[0], m_transform_descriptor_set };
vkCmdBindDescriptorSets(m_draw_command[inCommandIndex], VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, 3, sets, 0, NULL);
It then renders the mesh with the bound descriptor sets:
vkCmdDrawIndexedIndirect(m_draw_command[inCommandIndex], sceneIndirectBuffer, 0, inCount, sizeof(VkDrawIndexedIndirectCommand));
vkCmdDraw(m_draw_command[inCommandIndex], 1, 1, 0, 0);
If you take a look at initDescriptorSets you can see that they also create separate descriptor sets for the cubemap, the terrain, etc.
The LunarG examples should work similarly, though if I'm not mistaken they never use more than one texture?
I am trying to run 100000 and more particles.
I've been watching many tutorials and other examples that demonstrate the power of shaders and OpenCL.
In one example that I watched, each particle's position was calculated based on the position of the mouse pointer (the physical device you hold in one hand, and the cursor on the screen).
The position of each particle was stored as RGB (R being x, G being y, and B being z) and passed to the pixel shader. Each color pixel was then drawn as the position of a particle.
However, this approach felt absurd to me.
Isn't this approach or coding style rather to be avoided?
Shouldn't I learn how to use OpenCL and use the power of the GPU's multithreading to directly state and pass my intended code?
Isn't this approach or coding style rather to be avoided?
Why?
The entire point of shaders is for you to be able to do what you want, to more effectively express what you want to do, and to allow yourself greater control over the hardware.
You should never, ever be afraid of re-purposing something for a different functionality. Textures do not store colors; they store data, which can be color, but it can also be other stuff. The sooner you stop thinking of textures as pictures, the better off you will be as a graphics programmer.
The GPU and API exist to be used. Use it as you see fit; do not allow how you think the API should be used to limit you.
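To make the "textures store data" point concrete, here is a CPU-only sketch of what the GPU does in that particle example. The follow-the-target motion rule is an assumption, since the example's actual rule isn't described. The point is only that each RGB texel is read and written as an (x, y, z) position, the way a fragment shader rendering into a second texture (ping-pong) would.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// An RGB32F texture is just 3 floats per texel: R = x, G = y, B = z.
using RGBTexture = std::vector<float>;  // size = 3 * particleCount

// One update "pass": each output texel depends only on the same input texel,
// like a full-screen fragment pass from texture A into texture B.
void updatePass(const RGBTexture& src, RGBTexture& dst,
                const float target[3], float followRate) {
    assert(src.size() == dst.size() && src.size() % 3 == 0);
    for (std::size_t texel = 0; texel < src.size() / 3; ++texel) {
        for (int c = 0; c < 3; ++c) {  // c = 0,1,2 -> R,G,B -> x,y,z
            float p = src[3 * texel + c];
            // Move a fraction of the way toward the target (e.g. the mouse position).
            dst[3 * texel + c] = p + (target[c] - p) * followRate;
        }
    }
}
```

Each frame the two textures swap roles, and a separate draw pass reads the texels back as vertex positions.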
Shouldn't I learn how to use OpenCL and use the power of the GPU's multithreading to directly state and pass my intended code?
Yesterday, I would have said "yes". However, today this was released: OpenGL compute shaders.
The fact that the OpenGL ARB and Khronos created this shader type and so forth is a tacit admission that OpenCL/OpenGL interop is not the most efficient way to generate data for rendering purposes. After all, if it was, there would be no need for OpenGL to have generalized compute functionality. There were 3 versions of GL 4.x that didn't provide this. The fact that it's here now is basically the ARB saying, "Yeah, OK, we need this."
If the ARB, staffed by many people who make the hardware, think that CL/GL interop is not the fastest way to go, then it's pretty clear that you should use compute shaders.
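With compute shaders, the 100000-particle case maps naturally to one invocation per particle. This sketch shows the dispatch-size arithmetic for a hypothetical shader declared with `layout(local_size_x = 256) in;`. The group count would be passed as the first argument of `glDispatchCompute`.

```cpp
#include <cassert>
#include <cstdint>

// Work groups needed so every particle gets one invocation:
// ceil(particleCount / localSizeX). The shader must still skip invocations
// whose gl_GlobalInvocationID.x >= particleCount (the last group may be partial).
std::uint32_t workGroupsFor(std::uint32_t particleCount, std::uint32_t localSizeX) {
    assert(localSizeX > 0);
    return (particleCount + localSizeX - 1) / localSizeX;
}
```

For 100000 particles and a local size of 256, this gives 391 groups, the last of which is only partly used.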
Of course, if you're trying to do something right now, that won't help; only NVIDIA has compute shader support. And even that's only in beta drivers. It will take many months before AMD gets support for them, and many more before that support becomes solid and stable enough to use.
Even so, you don't need compute shaders to generate data. People have used transform feedback and geometry shaders to do LOD and frustum culling for instanced rendering. Do not be afraid to think outside of the "OpenGL draws stuff" box.
To simulate particles in OpenCL, you should try out "Yet Another Shader Editor" / http://yase.chnk.us/ - it takes away all the tricky parts and lets you get down to the meat of coding the particle control algorithms. IN YOUR BROWSER. Nothing to download, no accounts to create, just alter whatever examples you find. It's a blast.
https://lotsacode.wordpress.com/2013/04/16/fun-with-particles-yet-another-shader-editor/
I'm not affiliated with yase in any way.
There are already a number of questions about text rendering in OpenGL, such as:
How to do OpenGL live text-rendering for a GUI?
But mostly what is discussed is rendering textured quads using the fixed-function pipeline. Surely shaders must offer a better way.
I'm not really concerned about internationalization, most of my strings will be plot tick labels (date and time or purely numeric). But the plots will be re-rendered at the screen refresh rate and there could be quite a bit of text (not more than a few thousand glyphs on-screen, but enough that hardware accelerated layout would be nice).
What is the recommended approach for text-rendering using modern OpenGL? (Citing existing software using the approach is good evidence that it works well)
Geometry shaders that accept e.g. position and orientation and a character sequence and emit textured quads
Geometry shaders that render vector fonts
As above, but using tessellation shaders instead
A compute shader to do font rasterization
Rendering outlines, unless you render only a dozen characters total, remains a "no go" due to the number of vertices needed per character to approximate curvature. Though there have been approaches to evaluate bezier curves in the pixel shader instead, these suffer from not being easily antialiased, which is trivial using a distance-map-textured quad, and evaluating curves in the shader is still computationally much more expensive than necessary.
The best trade-off between "fast" and "quality" is still textured quads with a signed distance field texture. It is very slightly slower than using a plain textured quad, but not by much. The quality, on the other hand, is in an entirely different ballpark. The results are truly stunning, it is about as fast as you can get, and effects such as glow are trivially easy to add, too. Also, the technique can be downgraded nicely to older hardware, if needed.
See the famous Valve paper for the technique.
The technique is conceptually similar to how implicit surfaces (metaballs and such) work, though it does not generate polygons. It runs entirely in the pixel shader and takes the distance sampled from the texture as a distance function. Everything above a chosen threshold (usually 0.5) is "in", everything else is "out". In the simplest case, on 10 year old non-shader-capable hardware, setting the alpha test threshold to 0.5 will do that exact thing (though without special effects and antialiasing).
If one wants to add a little more weight to the font (faux bold), a slightly smaller threshold will do the trick without modifying a single line of code (just change your "font_weight" uniform). For a glow effect, one simply considers everything above one threshold as "in" and everything above another (smaller) threshold as "out, but in glow", and LERPs between the two. Antialiasing works similarly.
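The thresholding described above fits in a few lines. This is a CPU transcription of what the pixel shader would compute from the sampled distance `d` in [0, 1], with 0.5 on the outline. The parameter names are illustrative, and `aa` is assumed to be a small half-width of the antialiasing ramp, which a real shader would usually derive from screen-space derivatives. The final color would then be a LERP between the glow color (weighted by `glowAmount`) and the text color (weighted by `glyphCoverage`).

```cpp
#include <algorithm>
#include <cassert>

// GLSL-style smoothstep: 0 below edge0, 1 above edge1, smooth in between.
float smoothstep(float edge0, float edge1, float x) {
    assert(edge0 < edge1);
    float t = std::clamp((x - edge0) / (edge1 - edge0), 0.0f, 1.0f);
    return t * t * (3.0f - 2.0f * t);
}

// Coverage of the glyph: 'weight' is the in/out threshold (lower = bolder,
// i.e. the "font_weight" uniform), 'aa' is the antialiasing half-width.
float glyphCoverage(float d, float weight, float aa) {
    return smoothstep(weight - aa, weight + aa, d);
}

// Glow: 0 below glowThreshold, rising to 1 at the glyph threshold
// ("out, but in glow" for values in between).
float glowAmount(float d, float glowThreshold, float weight) {
    return smoothstep(glowThreshold, weight, d);
}
```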
By using an 8-bit signed distance value rather than a single bit, this technique increases the effective resolution of your texture map 16-fold in each dimension (instead of black and white, all possible shades are used, thus we have 256 times the information using the same storage). But even if you magnify far beyond 16x, the result still looks quite acceptable. Long straight lines will eventually become a bit wiggly, but there will be no typical "blocky" sampling artefacts.
You can use a geometry shader to generate the quads from points (reducing bus bandwidth), but honestly the gains are rather marginal. The same is true for instanced character rendering as described in GPG8. The overhead of instancing is only amortized if you have a lot of text to draw. In my opinion, the gains do not justify the added complexity and the loss of downgradeability. Plus, you are either limited by the number of constant registers, or you have to read from a texture buffer object, which is non-optimal for cache coherence (and the intent was to optimize to begin with!).
A simple, plain old vertex buffer is just as fast (possibly faster) if you schedule the upload a bit ahead of time, and it will run on any hardware built during the last 15 years. And it is not limited to any particular number of characters in your font, nor to a particular number of characters to render.
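This is a minimal sketch of that plain-vertex-buffer path. It assumes a hypothetical 256-entry `GlyphInfo` table produced by whatever tool builds the distance-field atlas, and it leaves out kerning for brevity. The resulting array could be uploaded with `glBufferData` and drawn as `GL_TRIANGLES`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-glyph metrics from the font atlas (pixels and atlas UVs).
struct GlyphInfo {
    float offsetX, offsetY;  // quad's bottom-left relative to the pen position
    float width, height;     // quad size
    float u0, v0, u1, v1;    // atlas rectangle
    float advance;           // pen movement after this glyph
};

struct TextVertex { float x, y, u, v; };

// Two triangles (6 vertices) per character, laid out along one line.
std::vector<TextVertex> buildTextVertices(const std::string& text, float penX, float penY,
                                          const GlyphInfo (&font)[256]) {
    std::vector<TextVertex> out;
    out.reserve(text.size() * 6);
    for (unsigned char ch : text) {
        const GlyphInfo& g = font[ch];
        assert(g.width >= 0.0f && g.height >= 0.0f);
        float x0 = penX + g.offsetX, y0 = penY + g.offsetY;
        float x1 = x0 + g.width,     y1 = y0 + g.height;
        TextVertex bl{x0, y0, g.u0, g.v0}, br{x1, y0, g.u1, g.v0};
        TextVertex tr{x1, y1, g.u1, g.v1}, tl{x0, y1, g.u0, g.v1};
        out.insert(out.end(), {bl, br, tr, bl, tr, tl});
        penX += g.advance;
    }
    return out;
}
```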
If you are sure that you do not have more than 256 characters in your font, texture arrays may be worth a consideration to strip off bus bandwidth in a similar manner as generating quads from points in the geometry shader. When using an array texture, the texture coordinates of all quads have identical, constant s and t coordinates and only differ in the r coordinate, which is equal to the character index to render.
But like with the other techniques, the expected gains are marginal at the cost of being incompatible with previous generation hardware.
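For comparison, the array-texture variant makes every quad's (s, t) corners identical and puts the character index into r. This is a sketch assuming one glyph per layer in fixed-size cells. Array textures require equal layer sizes anyway, and the names here are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Every quad uses the same (s, t) corners; only r (the layer) differs,
// and it equals the character index to render.
struct ArrayTexVertex { float x, y, s, t, r; };

std::vector<ArrayTexVertex> quadForChar(unsigned char ch, float x0, float y0, float size) {
    assert(size > 0.0f);
    const float r = static_cast<float>(ch);
    float x1 = x0 + size, y1 = y0 + size;
    return {
        {x0, y0, 0.f, 0.f, r}, {x1, y0, 1.f, 0.f, r}, {x1, y1, 1.f, 1.f, r},
        {x0, y0, 0.f, 0.f, r}, {x1, y1, 1.f, 1.f, r}, {x0, y1, 0.f, 1.f, r},
    };
}
```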
There is a handy tool by Jonathan Dummer for generating distance textures: description page
Update:
As more recently pointed out in Programmable Vertex Pulling (D. Rákos, "OpenGL Insights", pp. 239), there is no significant extra latency or overhead associated with pulling vertex data programmatically from the shader on the newest generations of GPUs, as compared to doing the same using the standard fixed function.
Also, the latest generations of GPUs have increasingly large general-purpose L2 caches (e.g. 1536 KiB on NVIDIA Kepler), so one may expect incoherent access, when pulling random offsets for the quad corners from a buffer texture, to be less of a problem.
This makes the idea of pulling constant data (such as quad sizes) from a buffer texture more attractive. A hypothetical implementation could thus reduce PCIe and memory transfers, as well as GPU memory, to a minimum with an approach like this:
Only upload a character index (one per character to be displayed) as the only input to a vertex shader that passes on this index and gl_VertexID, and amplify that to 4 points in the geometry shader, still having the character index and the vertex id (this will be "gl_primitiveID made available in the vertex shader") as the sole attributes, and capture this via transform feedback.
This will be fast, because there are only two output attributes (main bottleneck in GS), and it is close to "no-op" otherwise in both stages.
Bind a buffer texture which contains, for each character in the font, the textured quad's vertex positions relative to the base point (these are basically the "font metrics"). This data can be compressed to 4 numbers per quad by storing only the offset of the bottom left vertex, and encoding the width and height of the axis-aligned box (assuming half floats, this will be 8 bytes of constant buffer per character -- a typical 256 character font could fit completely into 2kiB of L1 cache).
Set a uniform for the baseline
Bind a buffer texture with horizontal offsets. These could probably even be calculated on the GPU, but it is much easier and more efficient to do that kind of thing on the CPU, as it is a strictly sequential operation and not at all trivial (think of kerning). Also, it would need another feedback pass, which would be another sync point.
Render the previously generated data from the feedback buffer, the vertex shader pulls the horizontal offset of the base point and the offsets of the corner vertices from buffer objects (using the primitive id and the character index). The original vertex ID of the submitted vertices is now our "primitive ID" (remember the GS turned the vertices into quads).
Like this, one could ideally reduce the required vertex bandwidth by 75% (amortized), though it would only be able to render a single line. If one wanted to render several lines in one draw call, one would need to add the baseline to the buffer texture rather than using a uniform (making the bandwidth gains smaller).
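The approach above was admittedly untried. The following is likewise only a CPU-side sketch of its two pieces of arithmetic. The first piece is the 8-byte per-glyph record, assuming half floats stored as raw 16-bit patterns. The second is the corner position the final vertex shader would compute from pulled data, assuming a hypothetical corner numbering in which bit 0 selects right and bit 1 selects top.

```cpp
#include <cassert>
#include <cstdint>

// Compressed record: bottom-left offset plus width and height, each a
// 16-bit half float (stored here as raw bit patterns) -> 8 bytes per glyph.
struct PackedGlyphBox { std::uint16_t x, y, w, h; };
static_assert(sizeof(PackedGlyphBox) == 8, "8 bytes per glyph");
static_assert(256 * sizeof(PackedGlyphBox) == 2048, "a 256-glyph font fits in 2 KiB");

// Unpacked form, as the vertex shader would see it after the fetch.
struct GlyphBox { float x, y, w, h; };

// Corner 0..3 of the quad (derived from the primitive/vertex id) picks
// left/right and bottom/top of the glyph box, offset by the pulled
// horizontal pen position and the baseline uniform.
void pulledCorner(const GlyphBox& box, int corner, float penX, float baseline,
                  float& outX, float& outY) {
    assert(corner >= 0 && corner < 4);
    outX = penX + box.x + ((corner & 1) ? box.w : 0.0f);
    outY = baseline + box.y + ((corner & 2) ? box.h : 0.0f);
}
```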
However, even assuming a 75% reduction -- since the vertex data to display "reasonable" amounts of text is only somewhere around 50-100kiB (which is practically zero to a GPU or a PCIe bus) -- I still doubt that the added complexity and losing backwards-compatibility is really worth the trouble. Reducing zero by 75% is still only zero. I have admittedly not tried the above approach, and more research would be needed to make a truly qualified statement. But still, unless someone can demonstrate a truly stunning performance difference (using "normal" amounts of text, not billions of characters!), my point of view remains that for the vertex data, a simple, plain old vertex buffer is justifiably good enough to be considered part of a "state of the art solution". It's simple and straightforward, it works, and it works well.
Having already referenced "OpenGL Insights" above, it is also worth pointing out the chapter "2D Shape Rendering by Distance Fields" by Stefan Gustavson, which explains distance field rendering in great detail.
Update 2016:
Meanwhile, there exist several additional techniques which aim to remove the corner rounding artefacts which become disturbing at extreme magnifications.
One approach simply uses pseudo-distance fields instead of distance fields (the difference being that the distance is the shortest distance not to the actual outline, but to the outline or an imaginary line protruding over the edge). This is somewhat better, and runs at the same speed (identical shader), using the same amount of texture memory.
Another approach uses the median of three channels in a three-channel texture (multi-channel distance fields; details and an implementation are available on GitHub). This aims to improve on the and-or hacks used previously to address the issue. The quality is good and it is only slightly (almost unnoticeably) slower, but it uses three times as much texture memory. Also, extra effects (e.g. glow) are harder to get right.
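The core of the median-of-three approach is a single expression. This is a CPU transcription of the per-pixel step, assuming the three channel distances have already been sampled and that 0.5 is the outline threshold, as with single-channel fields.

```cpp
#include <algorithm>
#include <cassert>

// Median of the three channel distances; the result is then thresholded
// exactly like a single-channel distance value.
float median3(float r, float g, float b) {
    return std::max(std::min(r, g), std::min(std::max(r, g), b));
}

bool insideGlyph(float r, float g, float b) {
    return median3(r, g, b) > 0.5f;
}
```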
Lastly, storing the actual bezier curves making up characters, and evaluating them in a fragment shader has become practical, with slightly inferior performance (but not so much that it's a problem) and stunning results even at highest magnifications.
WebGL demo rendering a large PDF with this technique in real time available here.
http://code.google.com/p/glyphy/
The main difference between GLyphy and other SDF-based OpenGL renderers is that most other projects sample the SDF into a texture. This has all the usual problems of sampling, i.e. it distorts the outline and is low quality. GLyphy instead represents the SDF using actual vectors submitted to the GPU. This results in very high quality rendering.
The downside is that the code is for iOS with OpenGL ES. I'm probably going to make a Windows/Linux OpenGL 4.x port (hopefully the author will add some real documentation, though).
The most widespread technique is still textured quads. However in 2005 LORIA developed something called vector textures, i.e. rendering vector graphics as textures on primitives. If one uses this to convert TrueType or OpenType fonts into a vector texture you get this:
http://alice.loria.fr/index.php/publications.html?Paper=VTM#2005
I'm surprised Mark Kilgard's baby, NV_path_rendering (NVpr), was not mentioned by any of the above. Although its goals are more general than font rendering, it can also render text from fonts, with kerning. It doesn't even require OpenGL 4.1, but it is a vendor/Nvidia-only extension at the moment. It basically turns fonts into paths using glPathGlyphsNV, which depends on the freetype2 library to get the metrics, etc. Then you can also access the kerning info with glGetPathSpacingNV and use NVpr's general path rendering mechanism to display text using the path-"converted" fonts. (I put that in quotes because there's no real conversion; the curves are used as is.)
The recorded demo for NVpr's font capabilities is unfortunately not particularly impressive. (Maybe someone should make one along the lines of the much snazzier SDF demo one can find on the intertubes...)
The 2011 NVpr API presentation talk for the fonts part starts here and continues in the next part; it is a bit unfortunate how that presentation is split.
More general materials on NVpr:
Nvidia NVpr hub, but some material on the landing page is not the most up-to-date
Siggraph 2012 paper for the brains of the path-rendering method, called "stencil, then cover" (StC); the paper also explains briefly how competing tech like Direct2D works. The font-related bits have been relegated to an annex of the paper. There are also some extras like videos/demos.
GTC 2014 presentation for a status update; in a nutshell: it's now supported by Google's Skia (Nvidia contributed the code in late 2013 and 2014), which in turn is used in Google Chrome and [independently of Skia, I think] in a beta of Adobe Illustrator CC 2014
the official documentation in the OpenGL extension registry
USPTO has granted at least four patents to Kilgard/Nvidia in connection with NVpr, which you should probably be aware of in case you want to implement StC yourself: US8698837, US8698808, US8704830 and US8730253. Note that there are something like 17 more USPTO documents connected to this as "also published as", most of which are patent applications, so it's entirely possible that more patents will be granted from those.
And since the word "stencil" did not produce any hits on this page before my answer, it appears that the subset of the SO community that participated on this page so far, despite being pretty numerous, was unaware of tessellation-free, stencil-buffer-based methods for path/font rendering in general. Kilgard has a FAQ-like post on the OpenGL forum which may illuminate how the tessellation-free path rendering methods differ from bog-standard 3D graphics, even though they're still using a [GP]GPU. (NVpr needs a CUDA-capable chip.)
For historical perspective, Kilgard is also the author of the classic "A Simple OpenGL-based API for Texture Mapped Text", SGI, 1997, which should not be confused with the stencil-based NVpr that debuted in 2011.
Most if not all of the recent methods discussed on this page, including stencil-based methods like NVpr or SDF-based methods like GLyphy (which I'm not discussing here any further because other answers already cover it), have one limitation, however. They are suitable for large text display on conventional (~100 DPI) monitors without jaggies at any level of scaling, and they also look nice, even at small sizes, on high-DPI, retina-like displays. But they don't fully provide what Microsoft's Direct2D+DirectWrite gives you, namely hinting of small glyphs on mainstream displays. (For a visual survey of hinting in general see this typotheque page for instance. A more in-depth resource is on antigrain.com.)
I'm not aware of any open & productized OpenGL-based stuff that can do what Microsoft can with hinting at the moment. (I admit ignorance of Apple's OS X GL/Quartz internals, because to the best of my knowledge Apple hasn't published how they do GL-based font/path rendering. It seems that OS X, unlike Mac OS 9, doesn't do hinting at all, which annoys some people.) Anyway, there is one 2013 research paper by INRIA's Nicolas P. Rougier that addresses hinting via OpenGL shaders; it is probably worth reading if you need to do hinting from OpenGL. While it may seem that a library like FreeType already does all the work when it comes to hinting, that's not actually so, for the following reason, which I'm quoting from the paper:
The FreeType library can rasterize a glyph using sub-pixel anti-aliasing in RGB mode. However, this is only half of the problem, since we also want to achieve sub-pixel positioning for accurate placement of the glyphs. Displaying the textured quad at fractional pixel coordinates does not solve the problem, since it only results in texture interpolation at the whole-pixel level. Instead, we want to achieve a precise shift (between 0 and 1) in the subpixel domain. This can be done in a fragment shader [...].
The solution is not exactly trivial, so I'm not going to try to explain it here. (The paper is open-access.)
One other thing I've learned from Rougier's paper (and which Kilgard doesn't seem to have considered) is that the font powers that be (Microsoft+Adobe) have created not one but two kerning specification methods. The old one is based on a so-called kern table and it is supported by freetype. The new one is called GPOS and it is only supported by newer font libraries like HarfBuzz or pango in the free software world. Since NVpr doesn't seem to support either of those libraries, kerning might not work out of the box with NVpr for some new fonts; there are some of those apparently in the wild, according to this forum discussion.
Finally, if you need to do complex text layout (CTL) you seem to be currently out of luck with OpenGL as no OpenGL-based library appears to exist for that. (DirectWrite on the other hand can handle CTL.) There are open-sourced libraries like HarfBuzz which can render CTL, but I don't know how you'd get them to work well (as in using the stencil-based methods) via OpenGL. You'd probably have to write the glue code to extract the re-shaped outlines and feed them into NVpr or SDF-based solutions as paths.
I think your best bet would be to look into cairo graphics with OpenGL backend.
The only problem I had when developing a prototype with 3.3 core was deprecated function usage in the OpenGL backend. That was 1-2 years ago, so the situation might have improved...
Anyway, I hope that in the future desktop OpenGL graphics drivers will implement OpenVG.
I am displaying a texture that I want to manipulate without affecting the image data. I want to be able to clamp the texel values so that anything below the lower value becomes 0, anything above the upper value becomes 0, and anything in between is linearly mapped from 0 to 1.
Originally, to display my image I was using glDrawPixels. And to solve the problem above I would create a color map using glPixelMap. This worked beautifully. However, for performance reasons I have begun using textures to display my image. The glPixelMap approach no longer seems to work. Well that approach may work but I was unable to get it working.
I then tried using glPixelTransfer to set scales and biases. This seemed to have some sort of effect (not necessarily the desired one) on the first pass, but when the upper and lower bounds were changed, no effect was visible.
I was then told that fragment shaders would work. But after a call to glGetString(GL_EXTENSIONS), I found that GL_ARB_fragment_shader was not supported. Plus, a call to glCreateShaderObjectARB caused a NullReferenceException.
So now I am at a loss. What should I do? Please Help.
Whatever might work, I am willing to try. The vendor is Intel and the renderer is Intel 945G. Unfortunately, I am confined to a graphics chip integrated on the motherboard, which only supports OpenGL 1.4.
Thanks for your response thus far.
Unless you have a pretty old graphics-card, it's surprising that you don't have fragment-shader support. I'd suggest you try double-checking using this.
Also, are you sure you want anything above the max value to be 0? Perhaps you meant 1? If you did mean 1 and not 0, then there are some quite long-winded ways to do what you're asking.
The condensed answer is that you use multiple rendering passes. First you render the image at normal intensity. Then you use subtractive blending (look up glBlendEquation) to subtract your minimum value. Then you use additive blending to scale everything up by 1/(max-min). Since one blended pass (for example with glBlendFunc(GL_DST_COLOR, GL_ONE)) can at most double a value, this may need multiple passes.
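As a reference for checking either the multi-pass result or a fragment program, this is the per-channel mapping both are meant to produce. It is a CPU sketch with hypothetical names, written for values normalized to [0, 1]. The clamp stands in for what an 8-bit framebuffer does on its own. The second function follows the question as literally stated, with values above the upper bound mapped to 0.

```cpp
#include <algorithm>
#include <cassert>

// Subtract the lower bound, scale by 1 / (hi - lo), clamp to [0, 1].
float windowLevel(float texel, float lo, float hi) {
    assert(hi > lo);
    return std::clamp((texel - lo) / (hi - lo), 0.0f, 1.0f);
}

// Literal reading of the question: anything above 'hi' becomes 0.
float windowLevelZeroAbove(float texel, float lo, float hi) {
    return texel > hi ? 0.0f : windowLevel(texel, lo, hi);
}
```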
If you really want to do this, please post back the GL_VENDOR and GL_RENDERER for your graphics-card.
Edit: Hmm. The Intel 945G doesn't have ARB_fragment_shader, but it does have ARB_fragment_program, which will also do the trick.
Your fragment program should look something like this (but it's been a while since I wrote one, so it may well be buggy):
!!ARBfp1.0
# Maps each channel to (texel - cbias) * cscale, saturated to [0,1]
ATTRIB tex = fragment.texcoord[0];
PARAM cbias = program.local[0];
PARAM cscale = program.local[1];
OUTPUT cout = result.color;
TEMP tmp;
TXP tmp, tex, texture[0], 2D;
SUB tmp, tmp, cbias;
MUL_SAT cout, tmp, cscale;
END
You load this into OpenGL like so:
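GLuint prog;
GLint errorPos;
glEnable(GL_FRAGMENT_PROGRAM_ARB);
glGenProgramsARB(1, &prog);
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
// src: NUL-terminated string holding the program text above (strlen is from <string.h>)
glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB, (GLsizei)strlen(src), src);
// errorPos is -1 on success; otherwise the error string explains the problem
glGetIntegerv(GL_PROGRAM_ERROR_POSITION_ARB, &errorPos);
if (errorPos != -1) fprintf(stderr, "fragment program error at %d: %s\n", errorPos, (const char *)glGetString(GL_PROGRAM_ERROR_STRING_ARB));
glDisable(GL_FRAGMENT_PROGRAM_ARB);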
Then, before rendering your geometry, you do this:
glEnable(GL_FRAGMENT_PROGRAM_ARB);
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
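// colour4f stands for your own RGBA type; cmin and cmax are the per-channel bounds
colour4f cbias = cmin;
colour4f cscale = 1.0f / (cmax - cmin); // per-channel reciprocal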
glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 0, cbias.r, cbias.g, cbias.b, cbias.a);
glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 1, cscale.r, cscale.g, cscale.b, cscale.a);
//Draw your textured geometry
glDisable(GL_FRAGMENT_PROGRAM_ARB);
Also see if the GL_ARB_fragment_program extension is supported. That extension supports the ASM style fragment programs. That is supposed to be supported in OpenGL 1.4.
It's really unfortunate that you're using such an ancient version of OpenGL. Can you upgrade your card?
With a more modern OpenGL 2.x, this is exactly the kind of program that GLSL is for. Great documentation can be found here:
OpenGL Documentation
OpenGL Shading Language