I made a new website and decided to ditch the Wordpress blog format. Should be easier to keep this one updated with posts and new pictures, given that it actually has a working gallery and stuff.
Of course, I promised that I'd update more often before and that didn't turn into much, but whatever. Here goes attempt number n.
Only the old OpenGL article is preserved from before.
Now with the popularization of things like WebGL, I'm starting to see a lot of people dip their toes into GL programming, including people who wouldn't normally get anywhere near it, like web programmers. It's great to see this kind of high-performance, low-level control extended to the web, cell phones, and whatever else lies beyond mere desktop PCs.
Unfortunately they're all doing it wrong.
I mean, that's expected from someone who's new to it. But there's a problem with every OpenGL "tutorial" out there: none of them help you adjust your thinking enough to really understand what's going on under the hood.
Granted, that's what you'll be doing in the end to get your visuals to your user. But more importantly, there are a few costly things that happen implicitly, and everyone really should be aware of them. Forgive me for the number of 90%-true generalizations I'm about to throw out, but I feel that simplifying some of this stuff in the name of easy understanding is necessary.
When you tell OpenGL to draw a triangle, rather than drawing your triangle and returning once it's done, it actually sticks your command on the end of a big list of commands and then returns. The graphics card gradually works through this list in the background.
Here's the first big important takeaway from this: The graphics processor (GPU) is an entirely separate processor from your CPU and can handle itself pretty well. Ideally, you want to have it start handling all those graphics functions, then you go do something else on the CPU. Handle game logic or something. Concurrency matters because if you do it wrong you'll lose half your framerate.
Consider this simple game loop for a single-threaded game:
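Something along these lines, where doGameLogic() and drawEverything() are just placeholder names for your per-frame update and your GL draw calls:

    while (running) {
        doGameLogic();     /* update the game state on the CPU */
        drawEverything();  /* issue all your GL draw calls */
        flipScreen();      /* swap buffers; likely waits on the GPU internally */
    }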
flipScreen() is your favorite platform-specific screen update function. (I personally prefer SDL_GL_SwapBuffers.)
This loop is pretty bad, because your CPU and GPU end up waiting on each other quite a lot. If you were to try to visualize processor usage over time it would look like this...
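Something roughly like this, as an illustrative sketch rather than a measurement:

    CPU: ********--------
    GPU: --------********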
('*' = Time utilized, '-' = Time waiting)

The GPU ends up waiting for the CPU to finish all its game logic before it can even start, because its list of commands to execute is just empty that whole time. Similarly, SDL_GL_SwapBuffers, or whatever equivalent you're using, probably calls glFlush or glFinish internally, and glFinish causes the CPU to wait until the GPU is done before returning.
So consider this loop instead...
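Same placeholder names as before, just reordered so the GL calls go out first:

    while (running) {
        drawEverything();  /* queue up the GL commands first so the GPU has work */
        doGameLogic();     /* the CPU crunches game logic while the GPU renders */
        flipScreen();      /* swap buffers once both sides are done */
    }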
Now, because you've filled the GPU's list of commands first, it has something to work on while the CPU crunches through game logic. Both processors are used more fully, leading to a shorter overall frame time...
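Again just an illustrative sketch, with the same legend as before:

    CPU: ********--
    GPU: --********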
These are pretty naive examples of game loops, so don't take them as gospel or anything. Just be aware of how this stuff works under the hood, and know that when you tell OpenGL to draw a triangle, it might not actually finish drawing that triangle until glFinish or something similar returns.
And data generally flows one way: toward the GPU. When you ask for graphics data back from the GPU, it's like calling glFinish: it causes everything to stop and wait until the video card is done before returning.
Imagine rendering a bunch of triangles to the screen, and then taking a screenshot. The video card has to finish rendering the triangles before you can get at the resulting image data. Drivers can obviously try to be clever about how they handle dependencies of that sort, but I wouldn't rely on them. Simply consider that asking for data back from the GPU is a potentially costly operation.
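For instance, a naive screenshot might be something like this (a minimal sketch; the 640x480 window size and drawEverything() are placeholder assumptions):

    drawEverything();  /* placeholder for your rendering calls; these just get queued */

    /* Now ask for the frame back. The driver can't hand over the pixels until
       the GPU has actually finished rendering them, so this can stall the CPU. */
    unsigned char pixels[640 * 480 * 4];
    glReadPixels(0, 0, 640, 480, GL_RGBA, GL_UNSIGNED_BYTE, pixels);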
If you want to see clever ways to work around these stalls, you may look into how people set up hardware occlusion queries.
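As a rough sketch of the idea, using the query API that ended up in core GL (drawBoundingBox() and drawTheRealObject() are placeholder names): kick off the query, go do other work, and only read the result back once the driver says it's ready, instead of stalling on it immediately.

    GLuint query;
    glGenQueries(1, &query);

    /* Count how many samples some cheap proxy geometry would produce.
       (Normally you'd disable color and depth writes while drawing it.) */
    glBeginQuery(GL_SAMPLES_PASSED, query);
    drawBoundingBox();
    glEndQuery(GL_SAMPLES_PASSED);

    /* ...render other things here instead of waiting on the result... */

    /* Later: only fetch the result once it's actually available. */
    GLint available = 0;
    glGetQueryObjectiv(query, GL_QUERY_RESULT_AVAILABLE, &available);
    if (available) {
        GLuint samples = 0;
        glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);
        if (samples > 0) {
            drawTheRealObject();
        }
    }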
Don't use "if" statements or "for" loops of any sort in your shaders. They're terrible at that. CPUs have complicated systems set up to predict branching logic and start executing future code in a pipeline before the result of a test is done (probably to throw it away if the prediction is incorrect). But GPUs are really bad at this.
Don't get me wrong. GPUs are incredibly fast at all kinds of math. They just suck at branching. The closer you can get your shader to being a pure mathematical equation, rather than something with logical choices here and there, the better.
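For instance, a branch in a fragment shader can often be rewritten with built-ins like mix() and step(). A hypothetical GLSL example, where brightness, litColor, and shadowColor are placeholder values:

    // With a branch (something GPUs handle poorly):
    vec3 branchy;
    if (brightness > 0.5)
        branchy = litColor;
    else
        branchy = shadowColor;

    // The same result as straight math, no branch:
    vec3 blended = mix(shadowColor, litColor, step(0.5, brightness));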
Know what's in video memory, or at least what you expect to be there. The driver can put stuff wherever it wants, but once you call glTexImage2D, you're basically telling OpenGL to copy that texture image data from your system's RAM to the video card's own RAM. The pipe between the CPU's RAM and the GPU's RAM is pretty big, but it's not infinite; calling functions that push lots of bulk data too often can slow things down. (Also, the driver is allowed to handle that stuff however it wants, so the data could end up getting swapped into VRAM whenever it feels like it.)
On the other hand, things that are in video memory can be used for rendering pretty damn fast. Vertex buffers, index buffers, textures, compiled shader programs, and so on all need to be in the graphics card's memory to be useful. As a rule of thumb, once GL owns the memory and the only way you can access the data is through a GLuint associated with it instead of a pointer, it's probably either in video memory, on its way there, or the driver is doing something clever with it (in which case you shouldn't worry about it at this point anyway). (The GLuints I'm referring to are things like the values generated by glGenTextures, glGenBuffers, glCreateProgram, and others.)
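A minimal sketch of what that looks like for a texture; the dimensions and the loadMyImage() helper are placeholders:

    int width = 256, height = 256;            /* placeholder dimensions */
    unsigned char *pixelData = loadMyImage(); /* hypothetical loader returning RGBA bytes */

    GLuint texture;                           /* the handle is all you ever get back */
    glGenTextures(1, &texture);
    glBindTexture(GL_TEXTURE_2D, texture);

    /* This is the copy: the pixel data goes from your RAM over to the driver,
       usually ending up in the video card's own RAM. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixelData);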
How expensive a texture sample is actually depends a lot on the size of the texture, the filtering mode, and a bunch of other junk. In general, though, you should try to minimize the number of superfluous texture samples in a shader (every call to texture2D in a shader is a texture sample).
The other really important thing that nobody ever tells you is this: Don't sample from one texture and use the resulting value to sample another texture. That is, imagine you sample a color from a texture...
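In GLSL that first lookup would be something like this (the sampler and coordinate names are just placeholders):

    vec4 lookup = texture2D(firstTexture, texCoord);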
...and then use that color as the position in another sample...
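Continuing the same sketch, with the first result feeding the second lookup's coordinates:

    vec4 color = texture2D(secondTexture, lookup.xy);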
This is called a "dependent texture read" and is potentially very slow due to the way drivers and video cards try to predict stuff. The reasoning for this is possibly a little esoteric, but the results are something you have to deal with.
Most of the time you will have far more fragments than vertices, so work done per vertex is usually cheaper than work done per fragment. It's probably okay to make your vertex shaders more complicated than your fragment shaders; how you divide up the logic is up to you.
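As a hypothetical example, something like a distance-based fog factor can be computed once per vertex and interpolated, instead of being recomputed for every fragment (the fog uniforms and baseColor here are placeholders):

    // --- vertex shader: compute the fog factor once per vertex ---
    uniform float fogStart;
    uniform float fogEnd;
    varying float fogFactor;

    void main() {
        vec4 eyePos = gl_ModelViewMatrix * gl_Vertex;
        fogFactor = clamp((fogEnd - length(eyePos.xyz)) / (fogEnd - fogStart), 0.0, 1.0);
        gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
    }

    // --- fragment shader: just blend with the interpolated factor ---
    uniform vec4 fogColor;
    uniform vec4 baseColor;   /* placeholder for whatever you'd normally shade */
    varying float fogFactor;

    void main() {
        gl_FragColor = mix(fogColor, baseColor, fogFactor);
    }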
There's probably more worth mentioning; I just can't think of it right now. Good luck!
Edit: This is also pretty useful. Found it on the opengl.org wiki: Common_Mistakes.