# Network
WORK IN PROGRESS
*See also [Internet](internet.md).*
A computer network is a set of multiple [computers](computer.md) that are interconnected and can communicate with each other. This allows the computers to share [information](information.md), collaborate on calculations, back up and mirror each other's data, allow people to communicate over large distances and so on. The largest and most famous one is the [Internet](internet.md), but it's not the only one: there exist many local networks ([LAN](lan.md)s), community networks, large networks separate from the Internet (isolated army networks, [North Korea's intranet](kwangmyong.md), ...), virtual networks and so on -- these networks may differ greatly in many aspects, be it their basic topology (which nodes are connected to which), protocols (languages the computers use to communicate), speed (latency and bandwidth), reliability, accessibility, usage policies and so on.
From a mathematical point of view we tend to see a network as a **[graph](graph.md)**, so we usually call the computers in the network **nodes**.
TODO
## Basic Concepts
Networks are hugely complicated, so we can only give a very quick overview here. Hopefully it can be a good starting point. (However bear in mind that networking can also be done in a [KISS](kiss.md) way, especially if you're for example just letting two devices communicate. Always think about the problem at hand.)
One of the very basic concepts is that of a **[protocol](protocol.md)** -- basically the language and rules that the computers will use for the communication. Computers connected in the network may be quite different, they may run different [operating systems](operating_system.md) and programs and have different [hardware](hw.md) -- this is all fine as long as they use the same protocol for the communication. A protocol specifies how communication is established, what formats the data will be sent in, what happens if someone is not responding etc. Examples of protocols are [IP](ip.md), [TCP](tcp.md), [UDP](udp.md), [ICMP](icmp.md), [HTTP](http.md) and many others.
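To make the idea of "formats the data will be sent in" more concrete, here is a minimal sketch of a made-up toy protocol (it is NOT any real protocol, all names such as `toyEncode` and the header layout are assumptions chosen just for illustration): every message starts with 1 byte saying its type, then 2 bytes saying the payload length (higher byte first), then the payload itself. As long as both sides agree on this, it doesn't matter what hardware or operating system they run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HEADER_LEN 3 // 1 byte type + 2 bytes payload length (made up)

// Writes a message into out: type, payload length (big endian), payload.
// Returns the total number of bytes written, or 0 if out is too small.
size_t toyEncode(uint8_t type, const uint8_t *payload, uint16_t len,
  uint8_t *out, size_t outSize)
{
  assert(out != NULL && (payload != NULL || len == 0));

  if (outSize < (size_t) TOY_HEADER_LEN + len)
    return 0;

  out[0] = type;
  out[1] = (uint8_t) (len >> 8);   // high byte first, whatever the CPU is
  out[2] = (uint8_t) (len & 0xff);

  if (len > 0)
    memcpy(out + TOY_HEADER_LEN,payload,len);

  return (size_t) TOY_HEADER_LEN + len;
}

// Parses a received message, returns 1 on success, 0 if it's truncated.
// On success *payload points into the input buffer (nothing is copied).
int toyDecode(const uint8_t *in, size_t inLen, uint8_t *type,
  const uint8_t **payload, uint16_t *len)
{
  assert(in != NULL && type != NULL && payload != NULL && len != NULL);

  if (inLen < TOY_HEADER_LEN)
    return 0;

  *type = in[0];
  *len = (uint16_t) ((in[1] << 8) | in[2]);

  if (inLen < (size_t) TOY_HEADER_LEN + *len)
    return 0; // header promises more data than we actually got

  *payload = in + TOY_HEADER_LEN;
  return 1;
}
```

Encoding the two byte payload `"hi"` with type 2 this way gives the five bytes `02 00 02 68 69`. Notice the decoder also has to say what happens with broken input (here it simply refuses a message that's shorter than its header claims) -- that's part of the protocol as well.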
Oftentimes we will talk about network parameters such as **latency** (also sometimes called *ping* -- the time it takes a message to be delivered to its destination), **throughput** (often loosely called *bandwidth* -- how much data over time the network can transfer, measured in [bits](bit.md) per second), reliability, stability etc. Networks also have different **topologies** -- a topology says how the nodes are interconnected, for example a fully connected network has every node (computer) connected to every other node directly (faster, more reliable, more efficient, but more expensive, complex, ...), a ring basically forms a circle of the nodes (each one is connected to two neighbors), a star has one central node to which all other nodes are connected etc. Choosing a specific topology depends on the situation.
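The price of different topologies can be roughly quantified. The following is only a small sketch (function names are made up, it assumes an idealized network with no redundant links) that computes how many links each of the mentioned topologies needs for *n* nodes and, for the ring, the worst case number of hops between two nodes.

```c
#include <assert.h>

// Fully connected network: every pair of nodes has its own link.
unsigned int linksFullMesh(unsigned int n)
{
  return n * (n - 1) / 2;
}

// Ring: every node is linked to two neighbors, forming one circle.
unsigned int linksRing(unsigned int n)
{
  assert(n >= 3); // fewer nodes can't really form a ring
  return n;
}

// Star: every node except the central one has one link to the center.
unsigned int linksStar(unsigned int n)
{
  assert(n >= 1);
  return n - 1;
}

// Worst case number of hops between two nodes of a ring: the farthest
// node is the one on the opposite side of the circle.
unsigned int maxHopsRing(unsigned int n)
{
  assert(n >= 3);
  return n / 2;
}
```

For 10 nodes a fully connected network needs 45 links, a ring 10 and a star 9. On the other hand in the fully connected network any message takes just 1 hop, in a star at most 2 (through the center, which is also the single point of failure), while in the ring it can take up to 5.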
For computer networks the concept of **packet switching** is very important -- packet switching is a way of delivering messages by splitting them into small **[packets](packet.md)** of data, assigning each packet metadata such as its number and destination address, then releasing them all into the network, letting them find their ways to the destination (potentially through different paths) and then, once they all arrive, assembling them back to the original message. This is basically the invention of the Internet, it is contrasted with the originally used way of so called **circuit switching** in which a circuit was established between any nodes that wanted to communicate to basically allow them direct communication over a constant path (similarly to how phone networks worked: you would first call a telephone exchange, say to whom you wanted to talk and the lady would directly connect the cables so that you could talk to that guy). Packet switching may seem like an overcomplicated way of networking (for example packets may arrive in wrong order, they may get lost, we are also sending extra data in the packet headers etc.), but at bigger scales it really makes communication more efficient, decentralized and reliable (if some path in the network gets destroyed, the network still keeps working). Even non-Internet networks now work on this principle, any computer network nowadays basically copies this mechanism and even uses the same protocols etc., so in networking we'll just be encountering packets everywhere.
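The splitting and reassembling can be sketched as follows. This is just a simplified illustration, not how any real protocol does it: the `Packet` structure, its tiny payload size and the function names are all assumptions, and we suppose that no packet gets lost or duplicated (real protocols have to deal with that too). The important thing is that each packet carries its sequence number, so the receiver can put the message together even if packets arrive in a different order.

```c
#include <assert.h>
#include <string.h>

#define PAYLOAD_LEN 4u  // made up and tiny, so that splitting is visible
#define MAX_PACKETS 16u

typedef struct
{
  unsigned int seq;          // position of this packet in the message
  unsigned int len;          // number of payload bytes actually used
  char payload[PAYLOAD_LEN];
} Packet;

// Splits zero terminated msg into packets, returns their count.
unsigned int packetize(const char *msg, Packet *packets)
{
  unsigned int total = (unsigned int) strlen(msg), count = 0;

  for (unsigned int pos = 0; pos < total; pos += PAYLOAD_LEN)
  {
    assert(count < MAX_PACKETS);

    Packet *p = &packets[count];

    p->seq = count;
    p->len = total - pos < PAYLOAD_LEN ? total - pos : PAYLOAD_LEN;
    memcpy(p->payload,msg + pos,p->len);
    count++;
  }

  return count;
}

// Rebuilds the message from packets that may come in any order. The out
// buffer must have space for at least count * PAYLOAD_LEN + 1 bytes.
void reassemble(const Packet *packets, unsigned int count, char *out)
{
  unsigned int total = 0;

  for (unsigned int i = 0; i < count; ++i)
  {
    const Packet *p = &packets[i];

    // sequence number says where in the message this piece belongs
    memcpy(out + p->seq * PAYLOAD_LEN,p->payload,p->len);
    total += p->len;
  }

  out[total] = 0; // zero terminate the string
}
```

For example the 13 character message `"hello network"` gets split into 4 packets (4 + 4 + 4 + 1 bytes); we can shuffle the array of packets any way we like and `reassemble` will still give back the original string.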
Another important concept is that of **network layers**. Unless we are dealing with a very simple 1-to-1 communication, we inevitably get a lot of complexity -- a message has to be chopped into packets, each of which will potentially travel through the network by different paths and some may even get lost; we have to ensure not only their fast and correct delivery between individual neighboring nodes (some of which communicate over electrical cables, some through optical cables, some through air, ...) but also their correct routing/forwarding (i.e. that they are being pushed in the direction of their destination) and that they arrive in correct order and without errors (caused e.g. by noise). So this process is split into parts or layers, each one creating an [abstraction](abstraction.md) over a certain part of this delivery -- each layer then has its own protocols, addressing and so on. Exactly which layers there are and what they are called is a matter of design and convention, it depends on what standard we use, but generally the layers are ordered from lowest (which ensure delivery between neighboring nodes) through middle (which ensure correct delivery over the whole network) to highest (which are concerned with how specific programs talk to each other). This is often compared to how a post office works, i.e. how paper letters are delivered -- the highest level layer is just concerned with what human language the letter is written in and which men lead the communication, the lower levels are concerned with wrapping the letter in an envelope and putting an address and postal code on it, yet lower levels then try to deliver this to the local post office reliably, using whatever means are deemed best (cars, planes, ships, ...), and finally at the lowest level are the mailmen who deliver the letters to the house, again choosing the best way of doing so (walking, riding a bike, finding the shortest paths, ...).
The problem of delivery is simplified by the fact that one layer doesn't have to care about the internal details of another layer, i.e. for example a man writing a letter is only concerned about passing the letter to the layer below (putting correct information on the envelope), he doesn't care at all if it will then be delivered by a truck or plane, through which cities it will fly, if it will eventually be delivered by a man or woman etc. Now two of the biggest standards for network layers are TCP/IP and OSI. The OSI model is more general, it defines 7 layers (application, presentation, session, transport, network, data link, physical -- also shortened to L7 through L1) and can be used for anything we could remotely call a network. TCP/IP is a bit simpler and is used for the Internet -- let's take a look at the TCP/IP layers (each one maps more or less to one or more OSI layers):
| layer             | task                                                     | addressing              | protocol examples   |
| ----------------- | -------------------------------------------------------- | ----------------------- | ------------------- |
| Application layer | Communicate data (text or bin.) between programs.        | URL, email addr., ...   | HTTP, FTP, DNS, ... |
| Transport layer   | Break data into packets, potentially ensure reliability. | IP addr. + port + proto | TCP, UDP, ...       |
| Internet layer    | Deliver packet from node A to node B.                    | IP address              | IPv4, IPv6, ...     |
| Link layer        | Deliver bits of data between two neighboring nodes.      | MAC address             | Ethernet, Wifi, ... |
Now please keep in mind this separation into layers doesn't always have to be 100% respected, for example while on the application layer level we prefer "nice addresses" such as those used in email, we may sometimes resort to specifying raw IP addresses and ports too. Sometimes very specialized applications (e.g. some games that need to minimize latency) may decide to implement their own level of reliable delivery on application level, ignoring this potential service of the transport layer. There may also appear protocols that span several layers or lie somewhere in between etc.
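In practice the layers work by so called encapsulation: each layer takes whatever the layer above handed to it, treats it as opaque data and puts its own header in front of it, just like putting a letter into an envelope. The following is only a toy sketch of this: real protocols use compact binary headers, here we use readable text, and the function names as well as the addresses and port in the headers are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

// Wraps payload into the "envelope" of one layer, i.e. writes the string
// "[header|payload]" into out. Returns 1 on success, 0 if it didn't fit.
int wrap(const char *header, const char *payload, char *out,
  size_t outSize)
{
  assert(header != NULL && payload != NULL && out != NULL);

  int n = snprintf(out,outSize,"[%s|%s]",header,payload);

  return n >= 0 && (size_t) n < outSize;
}

// Passes application data down through transport, internet and link
// layer, each adding its own (made up) header. Returns 1 on success.
int buildFrame(const char *data, char *out, size_t outSize)
{
  char segment[128], packet[192];

  return
    wrap("TCP port 80",data,segment,sizeof(segment)) &&
    wrap("IP 10.0.0.2",segment,packet,sizeof(packet)) &&
    wrap("ETH aa:bb:cc:dd:ee:ff",packet,out,outSize);
}
```

Calling `buildFrame` with the data `"GET /"` produces `[ETH aa:bb:cc:dd:ee:ff|[IP 10.0.0.2|[TCP port 80|GET /]]]`. The receiving side then does the opposite: each layer strips its own header and hands the rest to the layer above, never looking inside.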
**[Routing](routing.md)** is an important problem to solve in networking -- basically it means coming up with an [algorithm](algorithm.md) for finding delivery paths in the network, usually in a distributed way, i.e. we are trying to make it so that if some node in the network sends a packet to some other node (identified by its address), all other nodes will know what to do and how to efficiently get it there, i.e. every node should know whom to hand the packet over to just from seeing its address. This is not trivial. Nodes usually maintain and update routing tables, i.e. they keep records of "which direction" various addresses lie in, but the situation is complicated by the fact that they practically can't record every single address (there are many of them and they change quickly) and also the routes on the Internet constantly change (some stop working, some get slow by higher traffic, new ones emerge etc.). **Forwarding** is related to routing, it is the process of moving data from the router's input to the correct output (while routing generally refers to the whole larger process of finding the whole path).
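Because a node can't store every single address, routing tables typically store whole address ranges (prefixes), and when forwarding a packet the most specific matching range wins -- this is known as longest prefix match and it's what IP routers do. Here is a small sketch of such a lookup; the `Route` structure, the `IP` macro and the numeric next hop identifiers are assumptions made for this example, and a real router would use a much faster data structure than a linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// helper for writing IPv4 addresses as numbers (host byte order)
#define IP(a,b,c,d) (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
  ((uint32_t) (c) << 8) | (uint32_t) (d))

typedef struct
{
  uint32_t prefix;   // start of the address range
  unsigned int bits; // how many leading bits of prefix have to match
  int nextHop;       // made up ID of the neighbor to hand the packet to
} Route;

// Finds where to forward a packet going to addr: from all routes whose
// prefix matches, the longest (most specific) one is chosen. Returns the
// next hop of that route or -1 if no route matches.
int lookup(const Route *table, size_t count, uint32_t addr)
{
  const Route *best = NULL;

  for (size_t i = 0; i < count; ++i)
  {
    assert(table[i].bits <= 32);

    // mask with the top "bits" bits set, 0 bits matches every address
    uint32_t mask = table[i].bits == 0 ? 0 :
      (uint32_t) (0xFFFFFFFFu << (32 - table[i].bits));

    if ((addr & mask) == (table[i].prefix & mask) &&
      (best == NULL || table[i].bits > best->bits))
      best = &table[i];
  }

  return best != NULL ? best->nextHop : -1;
}
```

With a table holding the routes `0.0.0.0/0` (next hop 1), `10.0.0.0/8` (next hop 2) and `10.1.0.0/16` (next hop 3), a packet for `10.1.2.3` goes to neighbor 3, one for `10.2.2.3` to neighbor 2 and anything else (e.g. `8.8.8.8`) falls to the default route, i.e. neighbor 1. So three records are enough to cover the whole address space.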
With network programs/systems we talk about **architectures** -- there are two main types: **client/server** and **peer to peer** (P2P). Client/server means there is one special, central computer (with usually quite powerful hardware) called server that offers services to many clients (other computers in the network) -- clients connect to the server and ask the server to do something for them (e.g. send them a website, store some files for them, fetch emails and so on); in this model even if clients communicate between themselves they communicate through the server, i.e. the server is very stressed and it's a weak point of the system, but it can also possibly better control and coordinate what's going on (for example it can try to prevent [cheating](cheating.md) in games). Peer to peer architecture means that all participants are equal ("peers"): none of them is central, none of them has greater authority, they all run the same software and generally any of the peers can talk between themselves directly. Again, choice of architecture depends on many things, we can't say one is inherently better than the other, but among freedom proponents P2P is usually favored for its anarchist, decentralized and more robust nature -- it is harder to censor or take down a P2P network.
TODO: subnetwork, sockets, reliability, addresses, ports, NAT, ...
## Code Examples
First let's try writing a **UDP** program in C under [Unix](unix.md). Remember that UDP is the unreliable protocol, so it's possible our messages may get lost or distorted, but in programs that can handle some losses this is the faster and more KISS way. Our program will be peer-to-peer, it will create a single socket that's used for both receiving and sending. It will make a few message exchange turns, in each turn it will check if it itself got any message, then it will send something to its partner and wait for some time before the next round. Note that we will use a non-blocking socket, i.e. checking if we have any messages won't pause our program if there is nothing to be received, we'll simply move on if there is nothing (that's how realtime games may do it, but other kinds of programs, such as servers, may rather use a blocking socket if they intend to do nothing while waiting for a message). Also pay attention to the fact that the program will choose its port number based on a one letter "name" we give to the program -- this is so that if we test the programs on the same computer (where both will have the same IP address), they will choose different ports (different processes on the same computer of course cannot use the same port).
```
#include <stdio.h>
#include <stdlib.h> // for exit
#include <unistd.h> // for usleep, close
#include <arpa/inet.h>
#include <sys/socket.h>

#define BUFFER_LEN 8
#define PORT_BASE 1230

// run as ./program partner_addr partner_letter my_letter

char buffer[BUFFER_LEN + 1]; // extra space for zero terminator
char name = '?';             // name of this agent (single char)
int sock = -1;               // socket, for both sending and receiving

void error(const char *msg)
{
  printf("%c: ERROR, %s\n",name,msg);

  if (sock >= 0)
    close(sock);

  exit(1);
}

int main(int argc, char **argv)
{
  if (argc < 4)
    error("give me correct arguments");

  name = argv[3][0];

  char *addrStrDst = argv[1];

  int portSrc = PORT_BASE + name, // different name => different port
      portDst = PORT_BASE + argv[2][0];

  struct sockaddr_in addrSrc = {0}, addrDst = {0};

  // SOCK_NONBLOCK (non-blocking socket) is not available on every system
  sock = socket(AF_INET,SOCK_DGRAM | SOCK_NONBLOCK,IPPROTO_UDP);

  if (sock < 0)
    error("couldn't create socket");

  addrSrc.sin_family = AF_INET;
  addrSrc.sin_port = htons(portSrc); // convert port to netw. endianness
  addrSrc.sin_addr.s_addr = htonl(INADDR_ANY);

  if (bind(sock,(struct sockaddr *) &addrSrc,sizeof(addrSrc)) < 0)
    error("couldn't bind socket");

  addrDst.sin_family = AF_INET;
  addrDst.sin_port = htons(portDst);

  if (inet_aton(addrStrDst,&addrDst.sin_addr) == 0)
    error("couldn't translate address");

  printf("%c: My name is %c, listening on port %d, "
    "gonna talk to %c (address %s, port %d).\n",
    name,name,portSrc,argv[2][0],addrStrDst,portDst);

  for (int i = 0; i < 4; ++i)
  {
    printf("%c: Checking messages...\n",name);

    // non-blocking: returns a negative value at once if nothing arrived
    int len = recv(sock,buffer,BUFFER_LEN,0);

    if (len > 0)
    {
      buffer[len] = 0;
      printf("%c: Got \"%s\"\n",name,buffer);
    }
    else
      printf("%c: Nothing.\n",name);

    for (int j = 0; j < BUFFER_LEN; ++j) // make some gibberish message
      buffer[j] = 'a' + (name + i * 3 + j * 2) % 26;

    printf("%c: Sending \"%s\"\n",name,buffer);

    if (sendto(sock,buffer,BUFFER_LEN,0,
      (struct sockaddr *) &addrDst,sizeof(addrDst)) < 0)
      printf("%c: Couldn't send it!\n",name);

    printf("%c: Waiting...\n",name);
    usleep(2000000); // 2 seconds
  }

  printf("%c: That's enough, bye.\n",name);

  close(sock);

  return 0;
}
```
We can test this for example like this:
```
./program 127.0.0.1 A B & { sleep 1; ./program 127.0.0.1 B A; } &
```
Which may print out something like this:
```
B: My name is B, listening on port 1296, gonna talk to A (address 127.0.0.1, port 1295).
B: Checking messages...
B: Nothing.
B: Sending "oqsuwyac"
B: Waiting...
A: My name is A, listening on port 1295, gonna talk to B (address 127.0.0.1, port 1296).
A: Checking messages...
A: Nothing.
A: Sending "nprtvxzb"
A: Waiting...
B: Checking messages...
B: Got "nprtvxzb"
B: Sending "rtvxzbdf"
B: Waiting...
A: Checking messages...
A: Got "rtvxzbdf"
A: Sending "qsuwyace"
A: Waiting...
B: Checking messages...
B: Got "qsuwyace"
B: Sending "uwyacegi"
B: Waiting...
A: Checking messages...
A: Got "uwyacegi"
A: Sending "tvxzbdfh"
A: Waiting...
B: Checking messages...
B: Got "tvxzbdfh"
B: Sending "xzbdfhjl"
B: Waiting...
A: Checking messages...
A: Got "xzbdfhjl"
A: Sending "wyacegik"
A: Waiting...
B: That's enough, bye.
A: That's enough, bye.
```
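One detail of the program above that's worth a closer look is the `htons` call ("host to network short"). Network protocols send multi-byte numbers with the most significant byte first (big [endian](endianness.md)), while many CPUs store them the other way around, so port numbers have to be converted before being put in the address structure. The following small sketch (the function name `portWireBytes` is made up) shows what bytes actually end up there:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

// Stores into bytes the two bytes of a port number in the order in which
// they are laid out in sin_port, i.e. in network byte order.
void portWireBytes(uint16_t port, unsigned char bytes[2])
{
  assert(bytes != NULL);

  uint16_t net = htons(port); // host => network (big endian) order

  memcpy(bytes,&net,2);
}
```

For port 1295 (hexadecimal `0x050f`, the port of agent `A` above) this gives the bytes `05 0f` on any machine, no matter its endianness. The opposite conversion is done by `ntohs`, and `htonl`/`ntohl` do the same for 32 bit values such as IPv4 addresses.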
TODO: TCP