3D Rendering
In computer graphics, 3D rendering is the process of computing images which represent a projected view of 3D objects through a virtual camera.
There are many methods and algorithms for doing so, differing in many aspects such as computational complexity, implementation complexity, realism of the result, representation of the 3D data, limitations of viewing and so on. If you are just interested in the realtime 3D rendering used in games nowadays, you are probably interested in GPU-accelerated 3D rasterization with APIs such as OpenGL and Vulkan.
LRS has a simple 3D rendering library called small3dlib.
Methods
As most existing 3D "frameworks" are harmful, a LRS programmer is likely to write his own 3D rendering system that suits his program best, therefore we should list some common methods of achieving 3D. Besides that, it's just pretty interesting to see what there is in store.
A very important realization for a graphics programmer is that 3D rendering is to a great extent about faking (especially the mainstream realtime 3D) -- it is an endeavor that seeks to produce something that looks somehow familiar to HUMAN sight specifically and so even though the methods are mathematical, the endeavor is really an art in the end, not dissimilar to that of a magician who searches for "smoke and mirrors" hacks to produce illusions for the audience. Reality is infinitely complex, so we use nothing but approximations and simplifications that rely on assumptions about human sight such as "60 FPS looks like smooth movement to human eye", "infrared spectrum is invisible", "humans can't tell a mirror reflection is a bit off", "inner corners are usually darker than flat surfaces", "no shadow is completely black because light scatters in the atmosphere" etc. Really 3D graphics is nothing but searching for what looks good enough, and deciding this relies on a SUBJECTIVE judgement of a human (and sometimes of every individual). In theory -- if we had infinitely powerful computers -- we would just program in a few lines of electromagnetic equations and run the precise simulation of light propagating in a 3D environment to produce an absolutely realistic result, but though some methods try to come close to said approach, we simply won't ever have infinitely powerful computers. Because of this we have to resort to a somewhat uglier approach of identifying specific notable real-life phenomena (for example caustics, Fresnel, mirror reflections, refractions, subsurface scattering, metallicity, noise, motion blur and myriads of others) and addressing each one individually with special treatment, many times correcting and masking our imperfections (e.g. applying antialiasing because we dared to use a simplified model of light sampling, applying texture filtering because we dared to only use a finite amount of memory for our data, applying postprocessing etc.).
Rendering spectrum: The book Real-Time Rendering mentions that methods for 3D rendering can be seen as lying on a spectrum, one extreme of which is appearance reproduction and the other physics simulation. Methods closer to trying to imitate the appearance try to simply focus on imitating the look of an object on the monitor that the actual 3D object would have in real life, without being concerned with how that look arises in real life (i.e. closer to the "faking" approach mentioned above) -- these may e.g. use image data such as photographs; these methods may rely on lightfields, photo textures etc. The physics simulation methods try to replicate the behavior of light in real life -- their main goal is to solve the rendering equation, still only more or less approximately -- and so, through internally imitating the same processes, come to similar visual results that arise in real world: these methods rely on creating 3D geometry (e.g. that made of triangles or voxels), computing light reflections and global illumination. This is often easier to program but more computationally demanding. Most methods lie somewhere in between these two extremes: for example billboards and particle systems may use a texture to represent an object while at the same time using 3D quads (very simple 3D models) to correctly deform the textures by perspective and solve their visibility. The classic polygonal 3D models are also usually somewhere in between: the 3D geometry and shading are trying to simulate the physics, but e.g. a photo texture mapped on such 3D model is the opposite appearance-based approach (PBR further tries to shift the use of textures more towards the physics simulation end).
With this said, let's now take a look at possible classifications of 3D rendering methods. As seen, there are many ways:
- by order:
- object order: The method iterates on objects and draws object by object, one after another. This results in pixels being drawn to "random" places on the screen and possibly already drawn pixels being overdrawn with new pixels (though this can be further reduced). Typically requires a frame buffer and double buffering, often also z-buffer (or sorting), i.e. requires a lot of memory. This method is also a bit ugly but typically also faster than the alternative, so it is prevailing nowadays.
- image order: The method iterates on screen pixels, typically going pixel by pixel from left to right, top to bottom, deciding the color of each pixel independently. May be easier to program and require less memory (no frame buffer is needed, see e.g. frameless rendering), however though parallelism is applicable here (many pixels may potentially be independently computed in parallel, speeding up rendering), the algorithms used (e.g. path tracing) often have to expensively simulate light behavior and so performance is still an issue.
- by speed: realtime vs offline (non-realtime) rendering
- by relative limitation:
- primitive/"pseudo3D"/2.5D/...: Older methods that produce 3D views but had great limitations e.g. in camera degrees of freedom or possible environment geometry that was usually limited to a "2D sector map" (see e.g. Doom).
- full/"true" 3D: The "new" way of 3D rendering that allows freely rotating camera, arbitrary 3D geometry etc. Though this still has limitations (as any computer approximation of reality), many people just call this the "true" 3D.
- by approach (sides of above mentioned rendering spectrum):
- appearance based: Focuses on achieving desired appearance by any means necessary, faking, "cheating", not trying to stay physically correct. This is typically faster.
- physics simulation (see also physically based rendering): Focuses on simulating the underlying physics of reality with high correctness so that we also get a very realistic result.
- by main method/algorithm (see also the table below):
- rasterization: Appearance based object order methods further based on a relatively simple algorithm capable of drawing (rasterizing) a simple geometric shape (such as a triangle) which we then use to draw the whole complex 3D scene (composed of great many of triangles).
- ray casting/tracing: Physics simulation image order methods further based on tracing paths of light in a manner that's closer to reality.
- ...
- by 3D data (vector vs raster classification applies here just as in 2D graphics):
- triangle meshes (vector, and other boundary representations)
- voxels (raster, and potentially other volumetric representations)
- point clouds
- heightmaps
- implicit surfaces
- smooth surfaces (e.g. NURBS)
- 2D sectors (e.g. Doom's BSP "pseudo 3D" rendering)
- ...
- by hardware:
- software rendering: Rendering only with CPU. This is typically slower as a CPU mostly performs sequential computation, eliminating the possible parallelism optimization, however the approach is more KISS and portable.
- GPU accelerated: Making use of specialized graphics rendering hardware (GPU) that typically uses heavy parallelism to immensely speed up rendering. While this is the mainstream, extremely fast way of rendering, it is also greatly bloated while often being an overkill that greatly complicates programming and makes programs less portable, less future proof etc.
- by realism of output:
- photorealistic
- stylized, flat shaded, wireframe, ...
- ...
- hybrids: Methods may be combined and/or lie in between different extremes, for example we may see a rasterizer 3D renderer that uses ray tracing to add detail (shadows, reflections, ...) to the scene, we may see renderers that allow triangle meshes as well as voxels etc. { One nice hybrid looking engine is e.g. Chasm: The Rift. ~drummyfish }
- ...
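To make the object order versus image order distinction from the list above more concrete, here is a minimal sketch in C. It is not taken from any particular renderer: the `Disc` type, the tiny `W` x `H` character image and both function names are made up for illustration, and the "scene" is just a few flat discs with a depth value. The object order function assumes the discs are already sorted back to front.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define W 16 // image width (hypothetical tiny "screen")
#define H 8  // image height

typedef struct { int x, y, r, depth; char c; } Disc; // illustrative scene object

static int covers(const Disc *d, int px, int py)
{
  int dx = px - d->x, dy = py - d->y;
  return dx * dx + dy * dy <= d->r * d->r;
}

// Object order: the outer loop goes over objects (assumed sorted back to
// front), each object writes its pixels and may overdraw earlier ones.
void renderObjectOrder(const Disc *discs, int n, char *image)
{
  assert(discs != NULL && image != NULL);
  memset(image, '.', W * H); // this method needs the whole frame buffer

  for (int i = 0; i < n; ++i)
    for (int y = discs[i].y - discs[i].r; y <= discs[i].y + discs[i].r; ++y)
      for (int x = discs[i].x - discs[i].r; x <= discs[i].x + discs[i].r; ++x)
        if (x >= 0 && x < W && y >= 0 && y < H && covers(&discs[i], x, y))
          image[y * W + x] = discs[i].c;
}

// Image order: the outer loop goes over pixels, each pixel independently
// searches for the nearest object covering it (no sorting, no overdraw).
void renderImageOrder(const Disc *discs, int n, char *image)
{
  assert(discs != NULL && image != NULL);

  for (int y = 0; y < H; ++y)
    for (int x = 0; x < W; ++x)
    {
      char color = '.';
      int nearest = INT_MAX;

      for (int i = 0; i < n; ++i)
        if (covers(&discs[i], x, y) && discs[i].depth < nearest)
        {
          nearest = discs[i].depth;
          color = discs[i].c;
        }

      image[y * W + x] = color; // could be sent straight to the display
    }
}
```

For a scene whose objects have distinct depths and are sorted back to front, both functions should fill the image identically; what differs is the order of the work and what has to be kept in memory.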
Finally a table of some common 3D rendering methods follows, including the most simple, most advanced and some unconventional ones. Note that here we talk about methods and techniques rather than algorithms, i.e. general approaches that are often modified and combined into a specific rendering algorithm. For example the traditional triangle rasterization is sometimes combined with raytracing to add e.g. realistic reflections. The methods may also be further enriched with features such as texturing, antialiasing and so on. The table below should help you choose the base 3D rendering method for your specific program.
The methods may be tagged with the following:
- 2.5D: primitive 3D, often called pseudo 3D or fake 3D, having significant limitations e.g. in degrees of freedom of the camera
- off: slow method usually used for offline (non-realtime) rendering (even though it may indeed run in real time, e.g. with the help of powerful GPUs)
- IO vs OO: image order (rendering by pixels) vs object order (rendering by objects)
method | notes |
---|---|
3D raycasting | IO off, shoots rays from camera |
2D raycasting | IO 2.5D, e.g. Wolf3D |
AI image synthesis | "just let AI magic do it" |
beamtracing | IO off |
billboarding | OO |
BSP rendering | 2.5D, e.g. Doom |
conetracing | IO off |
"dungeon crawler" | OO 2.5D, e.g. Eye of the Beholder |
edge list, scanline, span rasterization | IO, e.g. Quake 1 |
ellipsoid rasterization | OO, e.g. Ecstatica |
flat-shaded 1 point perspective | OO 2.5D, e.g. Skyroads |
reverse raytracing (photon tracing) | OO off, inefficient |
image based rendering | generally using images as 3D data |
light fields | image-based, similar to holography |
mode 7 | IO 2.5D, e.g. F-Zero |
parallax scrolling | 2.5D, very primitive |
pathtracing | IO off, Monte Carlo, high realism |
portal rendering | 2.5D, e.g. Duke3D |
prerendered view angles | 2.5D, e.g. Iridion II (GBA) |
raymarching | IO off, e.g. with SDFs |
raytracing | IO off, recursive 3D raycasting |
segmented road | OO 2.5D, e.g. Outrun |
shear warp rendering | IO, volumetric |
splatting | OO, rendering with 2D blobs |
texture slicing | OO, volumetric, layering textures |
triangle rasterization | OO, traditional in GPUs |
voxel space rendering | OO 2.5D, e.g. Comanche |
wireframe rendering | OO, just lines |
TODO: Rescue On Fractalus!
TODO: find out how build engine/slab6 voxel rendering worked and possibly add it here (from http://advsys.net/ken/voxlap.htm seems to be based on raycasting)
TODO: VoxelQuest has some innovative voxel rendering, check it out (https://www.voxelquest.com/news/how-does-voxel-quest-work-now-august-2015-update)
3D Rendering Basics For Nubs
If you're a complete noob and are asking what the essence of 3D is or just how to render simple 3Dish pictures for your game without needing a PhD, here's the very basics. Yes, you can use some 3D engine such as Godot that has all the 3D rendering preprogrammed, but you'll surrender to bloat, you won't really know what's going on and your ability to tinker with the rendering or optimize it will be basically zero... AND you'll miss on all the fun :) So here we just foreshadow some concepts you should start with if you want to program your own 3D rendering.
The absolute basic thing in 3D is probably perspective, or the concept which says that "things further away look smaller". This is basically the number one thing you need to know and with which you can make simple 3D pictures, even though there are many more effects and concepts that "make pictures look 3D" and which you can potentially study later (lighting, shadows, focus and blur, stereoscopy, parallax, visibility/obstruction etc.). { It's probably possible to make something akin to "3D" even without perspective, just with orthographic projection, but that's just getting to details now. Let's just suppose we need perspective. ~drummyfish }
If you don't have rotating camera and other fancy things, perspective is actually mathematically very simple, you basically just divide the object's size by its distance from the viewer, i.e. its Z coordinate (you may divide by some multiple of Z coordinate, e.g. by 2 * Z to get different field of view) -- the further away it is, the bigger number its size gets divided by so the smaller it becomes. This "dividing by distance" ultimately applies to all distances, so in the end even the details on the object get scaled according to their individual distance, but as a first approximation you may just consider scaling objects as a whole. Just keep in mind you should only draw objects whose Z coordinate is above some threshold (usually called a near plane) so that you don't divide by 0! With this "dividing by distance" trick you can make an extremely simple "3Dish" renderer that just draws sprites on the screen and scales them according to the perspective rules (e.g. some space simulator where the sprites are balls representing planets). There is one more thing you'll need to handle: visibility, i.e. nearer objects have to cover the further away objects -- you can do this by simply sorting the objects by distance and drawing them back-to-front (painter's algorithm).
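The sprite approach just described (divide by Z, skip what lies in front of the near plane, draw back to front) can be sketched roughly as follows. This is only an illustration under assumptions: the `Sprite` and `ScreenSprite` types, the `FOCAL` and `NEAR_PLANE` constants and the function names are all made up here, and actual drawing of the scaled sprites is left out.

```c
#include <assert.h>
#include <stdlib.h>

#define NEAR_PLANE 1 // sprites with Z below this are not drawn
#define FOCAL 8      // hypothetical constant, affects "field of view"

typedef struct { int x, y, z, size; } Sprite;      // position and size in 3D
typedef struct { int x, y, size; } ScreenSprite;   // position and size on screen

// Applies perspective: divides position and size by distance (Z).
// Returns 1 if the sprite should be drawn, 0 if it is before the near plane.
int projectSprite(const Sprite *s, ScreenSprite *out)
{
  assert(s != NULL && out != NULL);

  if (s->z < NEAR_PLANE)
    return 0; // too close or behind the viewer, also avoids division by 0

  out->x = (FOCAL * s->x) / s->z;
  out->y = (FOCAL * s->y) / s->z;
  out->size = (FOCAL * s->size) / s->z;
  return 1;
}

// Comparison for qsort: sprites with bigger Z (further away) go first.
static int furtherFirst(const void *a, const void *b)
{
  int za = ((const Sprite *) a)->z, zb = ((const Sprite *) b)->z;
  return (za < zb) - (za > zb);
}

// Painter's algorithm: sort back to front, then draw in that order so
// that nearer sprites get drawn over the further ones.
void sortBackToFront(Sprite *sprites, int count)
{
  qsort(sprites, count, sizeof(Sprite), furtherFirst);
}
```

A renderer would call `sortBackToFront` once per frame and then go through the array, drawing every sprite for which `projectSprite` returns 1 at the computed screen position and size.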
Here is some "simple" C code that demonstrates perspective and draws a basic animated wireframe cuboid as ASCII in terminal:
#include <stdio.h>
#define SCREEN_W 50 // ASCII screen width
#define SCREEN_H 22 // ASCII screen height
#define LINE_POINTS 64 // how many points for drawing a line
#define FOV 8 // affects "field of view"
#define FRAMES 30 // how many animation frames to draw
char screen[SCREEN_W * SCREEN_H];
void showScreen(void)
{
for (int y = 0; y < SCREEN_H; ++y)
{
for (int x = 0; x < SCREEN_W; ++x)
putchar(screen[y * SCREEN_W + x]);
putchar('\n');
}
}
void clearScreen(void)
{
for (int i = 0; i < SCREEN_W * SCREEN_H; ++i)
screen[i] = ' ';
}
// Draws point to 2D ASCII screen, [0,0] means center.
void drawPoint2D(int x, int y, char c)
{
x = SCREEN_W / 2 + x;
y = SCREEN_H / 2 + y;
if (x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H) // stay inside the screen array
screen[y * SCREEN_W + x] = c;
}
// Divides coord. by distance taking "FOV" into account => perspective.
int perspective(int coord, int distance)
{
return (FOV * coord) / distance;
}
void drawPoint3D(int x, int y, int z, char c)
{
if (z <= 0)
return; // at or behind the camera, don't draw
drawPoint2D(perspective(x,z),perspective(y,z),c);
}
int interpolate(int a, int b, int n)
{
return a + ((b - a) * n) / LINE_POINTS;
}
void drawLine3D(int x1, int y1, int z1, int x2, int y2, int z2, char c)
{
for (int i = 0; i < LINE_POINTS; ++i) // draw a few points to form a line
drawPoint3D(interpolate(x1,x2,i),interpolate(y1,y2,i),interpolate(z1,z2,i),c);
}
int main(void)
{
int shiftX, shiftY, shiftZ;
#define N 12 // side length
#define C '*'
// cuboid points:
// X Y Z
#define PA -2 * N + shiftX, N + shiftY, N + shiftZ
#define PB 2 * N + shiftX, N + shiftY, N + shiftZ
#define PC 2 * N + shiftX, N + shiftY, 2 * N + shiftZ
#define PD -2 * N + shiftX, N + shiftY, 2 * N + shiftZ
#define PE -2 * N + shiftX, -N + shiftY, N + shiftZ
#define PF 2 * N + shiftX, -N + shiftY, N + shiftZ
#define PG 2 * N + shiftX, -N + shiftY, 2 * N + shiftZ
#define PH -2 * N + shiftX, -N + shiftY, 2 * N + shiftZ
for (int i = 0; i < FRAMES; ++i) // render animation
{
clearScreen();
shiftX = -N + (i * 4 * N) / FRAMES; // animate
shiftY = -N / 3 + (i * N) / FRAMES;
shiftZ = 0;
// bottom:
drawLine3D(PA,PB,C); drawLine3D(PB,PC,C); drawLine3D(PC,PD,C); drawLine3D(PD,PA,C);
// top:
drawLine3D(PE,PF,C); drawLine3D(PF,PG,C); drawLine3D(PG,PH,C); drawLine3D(PH,PE,C);
// sides:
drawLine3D(PA,PE,C); drawLine3D(PB,PF,C); drawLine3D(PC,PG,C); drawLine3D(PD,PH,C);
drawPoint3D(PA,'A'); drawPoint3D(PB,'B'); // corners
drawPoint3D(PC,'C'); drawPoint3D(PD,'D');
drawPoint3D(PE,'E'); drawPoint3D(PF,'F');
drawPoint3D(PG,'G'); drawPoint3D(PH,'H');
showScreen();
puts("press key to animate");
getchar();
}
return 0;
}
One frame of the animation should look like this:
E*******************************F
* * *** *
* ** *** *
* H***************G* *
* * * *
* * * *
* * * *
* * * *
* * * *
* * * *
* D***************C *
* ** *** *
* * * *
* * ** *
*** * * *
A*******************************B
press key to animate
Mainstream Realtime 3D
You may have come here just to learn about the typical realtime 3D rendering used in today's games because aside from research and niche areas this kind of 3D is what we normally deal with in practice. This is what this section is about.
Nowadays "game 3D" means a GPU accelerated 3D rasterization done with rendering APIs such as OpenGL, Vulkan, Direct3D or Metal (the last two being proprietary and therefore shit) and higher level engines above them, e.g. Godot, OpenSceneGraph etc. The methods seem to be evolving to some kind of rasterization/pathtracing hybrid, but rasterization is still the basis.
This mainstream rendering uses an object order approach (it blits 3D objects onto the screen rather than determining each pixel's color separately) and works on the principle of triangle rasterization, i.e. 3D models are composed of triangles (or higher polygons which are however eventually broken down into triangles) and these triangles are projected onto the screen according to the position of the virtual camera and laws of perspective. Projecting the triangles means finding the 2D screen coordinates of each of the triangle's three vertices -- once we have these coordinates, we draw (rasterize) the triangle to the screen just as a "normal" 2D triangle (well, with some asterisks).
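As a rough illustration of "project the three vertices, then fill a 2D triangle", here is a small sketch in C. It is not how any specific GPU or library does it: it uses integer math, a tiny character image and the so called edge function test to decide which pixels are inside, and it ignores the "asterisks" (clipping, depth, perspective correction, fill rules for shared edges). The names `Vec3`, `Vec2`, `project`, `drawTriangle` and the constants `W`, `H`, `FOCAL` are assumptions made for this example.

```c
#include <assert.h>

#define W 24    // image width in pixels
#define H 12    // image height in pixels
#define FOCAL 8 // hypothetical constant, affects "field of view"

typedef struct { int x, y, z; } Vec3;
typedef struct { int x, y; } Vec2;

// Projects a camera space point to pixel coordinates ([0,0] is top left).
Vec2 project(Vec3 v)
{
  assert(v.z > 0); // points at or behind the camera must be handled earlier
  Vec2 p = { W / 2 + (FOCAL * v.x) / v.z, H / 2 + (FOCAL * v.y) / v.z };
  return p;
}

// Edge function: its sign says on which side of line a->b the point p lies.
static int edge(Vec2 a, Vec2 b, Vec2 p)
{
  return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

static int min3(int a, int b, int c) { int m = a < b ? a : b; return m < c ? m : c; }
static int max3(int a, int b, int c) { int m = a > b ? a : b; return m > c ? m : c; }

// Projects the triangle and fills the pixels it covers, returns their count.
int drawTriangle(Vec3 v0, Vec3 v1, Vec3 v2, char c, char *image)
{
  Vec2 a = project(v0), b = project(v1), d = project(v2);
  int count = 0;

  if (edge(a, b, d) == 0)
    return 0; // degenerate triangle with zero area

  // only visit the bounding box of the triangle, clamped to the image
  int x0 = min3(a.x, b.x, d.x), x1 = max3(a.x, b.x, d.x);
  int y0 = min3(a.y, b.y, d.y), y1 = max3(a.y, b.y, d.y);
  if (x0 < 0) x0 = 0;
  if (y0 < 0) y0 = 0;
  if (x1 > W - 1) x1 = W - 1;
  if (y1 > H - 1) y1 = H - 1;

  for (int y = y0; y <= y1; ++y)
    for (int x = x0; x <= x1; ++x)
    {
      Vec2 p = { x, y };
      int e0 = edge(a, b, p), e1 = edge(b, d, p), e2 = edge(d, a, p);

      // inside if all three edge functions agree in sign (any winding)
      if ((e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0))
      {
        image[y * W + x] = c;
        ++count;
      }
    }

  return count;
}
```

The caller is expected to provide an `image` array of `W * H` characters. Accepting both signs of the edge functions means triangles get drawn regardless of their winding, i.e. this sketch does no backface culling.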
Furthermore things such as z-buffering (for determining correct overlap of triangles) and double buffering are used, which makes this approach very memory (RAM/VRAM) expensive -- of course mainstream computers have more than enough memory but smaller computers (e.g. embedded) may suffer and be unable to handle this kind of rendering. Thankfully it is possible to adapt and imitate this kind of rendering even on "small" computers -- even those that don't have a GPU, i.e. with pure software rendering. For this we e.g. replace z-buffering with painter's algorithm (triangle sorting), drop features like perspective correction, MIP mapping etc. (of course quality of the output will go down).
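To show where the memory cost of z-buffering comes from, here is a minimal sketch of the idea in C. The `FrameBuffer` type, its tiny dimensions and the function names are made up for this example, and smaller depth is assumed to mean nearer to the camera (real APIs let you configure the comparison).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define W 8 // buffer width (hypothetical tiny resolution)
#define H 4 // buffer height

typedef struct
{
  char color[W * H]; // what will be shown
  int depth[W * H];  // z-buffer: extra memory, one depth value per pixel
} FrameBuffer;

void clearBuffers(FrameBuffer *fb)
{
  assert(fb != NULL);

  for (int i = 0; i < W * H; ++i)
  {
    fb->color[i] = ' ';
    fb->depth[i] = INT_MAX; // "infinitely far", anything will pass the test
  }
}

// Writes a pixel only if it is nearer than what is already stored there.
// Returns 1 if the pixel was written, otherwise 0.
int writeFragment(FrameBuffer *fb, int x, int y, int z, char c)
{
  assert(fb != NULL);

  if (x < 0 || x >= W || y < 0 || y >= H)
    return 0; // outside the screen

  int i = y * W + x;

  if (z >= fb->depth[i])
    return 0; // depth test failed: something at least as near is there

  fb->depth[i] = z;
  fb->color[i] = c;
  return 1;
}
```

With this test in place triangles can be drawn in any order and still overlap correctly, at the price of the `depth` array. Painter's algorithm drops that array and pays with sorting instead (and with wrong results for intersecting or cyclically overlapping triangles).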
Additionally a lot of bloat gets added, such as complex screen space shaders, pathtracing (popularly known as raytracing), megatexturing, shadow rendering, postprocessing, compute shaders etc. This may make it difficult to get into "modern" 3D rendering. Remember to keep it simple.
On PCs the whole rendering process is hardware-accelerated with a GPU (graphics card). A GPU is special hardware capable of performing many operations in parallel (as opposed to a CPU which mostly computes sequentially with a low level of parallelism) -- this is ideal for graphics because we can for example perform mapping and drawing of many triangles at once, greatly increasing the speed of rendering (FPS). However this hugely increases the complexity of the whole rendering system: we have to have a special API and drivers for communication with the GPU and we have to upload data (3D models, textures, ...) to the GPU before we want to render them. Debugging gets a lot more difficult. So again, this is bloat, consider avoiding GPUs.
GPUs nowadays are no longer just focusing on graphics, but are kind of a general device that can be used for more than just 3D rendering (e.g. crypto mining, training AI etc.) and can no longer even perform 3D rendering completely by themselves -- for this they have to be programmed. I.e. if we want to use a GPU for rendering, not only do we need a GPU but also some extra code. This code is provided by "systems" such as OpenGL or Vulkan which consist of an API (an interface we use from a programming language) and the underlying implementation in a form of a driver (e.g. Mesa3D). Any such rendering system has its own architecture and details of how it works, so we have to study it a bit if we want to use it.
The important part of a system such as OpenGL is its rendering pipeline. The pipeline is the "path" along which data travel through the rendering process. Each rendering system and potentially even each of its versions may have a slightly different pipeline (but generally all mainstream pipelines somehow achieve rasterizing triangles, the difference is in details of how they achieve it). The pipeline consists of stages that follow one after another (e.g. the mentioned mapping of vertices and drawing of triangles constitute separate stages). A very important fact is that some (not all) of these stages are programmable with so called shaders. A shader is a program written in a special language (e.g. GLSL for OpenGL) running on the GPU that processes the data in some stage of the pipeline (therefore we distinguish different types of shaders based on which part of the pipeline they reside in). In early GPUs stages were not programmable but they became so in order to give greater flexibility -- shaders allow us to implement all kinds of effects that would otherwise be impossible.
Let's see what a typical pipeline might look like, similarly to something we might see e.g. in OpenGL. We normally simulate such a pipeline also in software renderers. Note that the details such as the coordinate system handedness and presence, order, naming or programmability of different stages will differ in any particular pipeline, this is just one possible scenario:
- Vertex data (e.g. 3D model space coordinates of triangle vertices of a 3D model) are taken from a vertex buffer (a GPU memory to which the data have been uploaded).
- Stage: vertex shader: Each vertex is processed with a vertex shader, i.e. one vertex goes into the shader and one vertex (processed) goes out. Here the shader typically maps the vertex 3D coordinates to the screen 2D coordinates (or normalized device coordinates) by:
- multiplying the vertex by a model matrix (transforms from model space to world space, i.e. applies the model move/rotate/scale operations)
- multiplying by view matrix (transforms from world space to camera space, i.e. takes into account camera position and rotation)
- multiplying by projection matrix (applies perspective, transforms from camera space to screen space in homogeneous coordinates)
- Possible optional stages that follow are tessellation and geometry processing (tessellation shaders and geometry shader). These offer possibility of advanced vertex processing (e.g. generation of extra vertices which vertex shaders are unable to do).
- Stage: vertex post processing: Usually not programmable (no shaders here). Here the GPU does things such as clipping (handling vertices outside the screen space), primitive assembly and perspective divide (transforming from homogeneous coordinates to traditional cartesian coordinates).
- Stage: rasterization: Usually not programmable, the GPU here turns triangles into actual pixels (or fragments), possibly applying backface culling, perspective correction and things like stencil test and depth test (even though if fragment shaders are allowed to modify depth, this may be postponed until later).
- Stage: pixel/fragment processing: Each pixel (fragment) produced by rasterization is processed here by a pixel/fragment shader. The shader is passed the pixel/fragment along with its coordinates, depth and possibly other attributes, and outputs a processed pixel/fragment with a specific color. Typically here we perform shading and texturing (pixel/fragment shaders can access texture data which are again stored in texture buffers on the GPU).
- Now the pixels are written to the output buffer which will be shown on screen. This can potentially be preceded by other operations such as depth tests, as mentioned above.
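The vertex part of the pipeline above (model, view and projection matrix followed by the perspective divide) can be imitated on the CPU as in the following sketch. This is a simplified illustration, not the OpenGL API: the types `Vec4` and `Mat4` and all function names are made up, the camera is assumed to look towards positive Z (OpenGL conventionally looks towards negative Z) and the projection matrix is a bare minimum one that only copies Z into W, with no near/far plane or aspect ratio handling.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4; // m[row][column], used as M * v

// Multiplies a column vector by a matrix.
Vec4 transform(const Mat4 *m, Vec4 v)
{
  Vec4 r;
  r.x = m->m[0][0] * v.x + m->m[0][1] * v.y + m->m[0][2] * v.z + m->m[0][3] * v.w;
  r.y = m->m[1][0] * v.x + m->m[1][1] * v.y + m->m[1][2] * v.z + m->m[1][3] * v.w;
  r.z = m->m[2][0] * v.x + m->m[2][1] * v.y + m->m[2][2] * v.z + m->m[2][3] * v.w;
  r.w = m->m[3][0] * v.x + m->m[3][1] * v.y + m->m[3][2] * v.z + m->m[3][3] * v.w;
  return r;
}

// Translation matrix, usable as a very simple model or view matrix.
Mat4 translation(float tx, float ty, float tz)
{
  Mat4 t = {{{1, 0, 0, tx}, {0, 1, 0, ty}, {0, 0, 1, tz}, {0, 0, 0, 1}}};
  return t;
}

// Minimal perspective matrix: scales X and Y and copies Z into W, so that
// the later divide by W makes distant things smaller (simplified on purpose).
Mat4 simplePerspective(float focal)
{
  Mat4 p = {{{focal, 0, 0, 0}, {0, focal, 0, 0}, {0, 0, 1, 0}, {0, 0, 1, 0}}};
  return p;
}

// What a vertex shader typically does: model space -> world space ->
// camera space -> homogeneous (clip) coordinates.
Vec4 vertexStage(const Mat4 *model, const Mat4 *view, const Mat4 *proj, Vec4 v)
{
  return transform(proj, transform(view, transform(model, v)));
}

// Done in vertex post processing: homogeneous -> cartesian coordinates.
Vec4 perspectiveDivide(Vec4 v)
{
  assert(v.w != 0.0f); // such vertices must be clipped before the divide
  Vec4 r = { v.x / v.w, v.y / v.w, v.z / v.w, 1.0f };
  return r;
}
```

For example with a model matrix `translation(0, 0, 8)`, a view matrix `translation(-2, 0, 0)` (i.e. camera standing at X = 2) and `simplePerspective(2)`, the model space vertex [4,2,0,1] ends up at [4,4,8,8] in homogeneous coordinates and at [0.5,0.5] after the perspective divide.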
TODO: example of specific data going through the pipeline