Yesterday I described the minimal commands for a TCP server. But that server can only serve one client at a time! It does some work with one TCP connection, then closes it and deals with the next TCP connection, etc. This is not how most servers work; the clients expect to be able to talk to the server regardless of what other clients are around.
The reason the server can only handle one connection at a time is that the process blocks waiting for a single kind of event. If the process calls accept, the process blocks waiting for a new TCP connection, and nothing else will wake it up. If the process calls recv(fd), the process blocks waiting for some bytes sent on that existing TCP connection, and nothing else will wake it up.
To solve this, the process needs to say, “OS, please put me to sleep and wake me up when something interesting happens”. In our case, “something interesting” would be a new TCP connection or some bytes sent on any existing TCP connection.
The old-school UNIX way to say this is the select syscall. Roughly, we call ready_fds = select(fds), which means “OS, please put me to sleep, and when something happens on a file descriptor in fds, wake me up and tell me which file descriptors are ready”. Here, “ready” means “you can call a blocking syscall on it, but it won’t block”. If the file descriptor is linked to a TCP listening port, you can call accept on it, and it won’t block. Thus, “ready” for a TCP listening port means “there is a client waiting to open a TCP connection”. If the file descriptor is linked to a TCP connection, you can call recv on it, and it won’t block. Thus, “ready” for a TCP connection means “there are some bytes in the buffer waiting to be read”.
After the select call, the process can decide which file descriptors to call accept or recv on. Because neither accept nor recv will block, the server process can deal with clients in a timely way.
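To make this concrete, here is a rough sketch in C of one turn of such a loop, written as an echo server. It is only an illustration: the name serve_one_round, the echo behaviour and the 512-byte buffer are assumptions, not part of yesterday's server. The real select call takes more arguments than the rough select(fds) form (a descriptor count, two more sets and a timeout), and the sketch assumes the listening socket has already been created and added to watched.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* One turn of a hypothetical echo server's event loop.
   `watched` holds every descriptor we care about (the listening socket and
   all open connections); `*max_fd` is the highest descriptor in it.
   Returns the number of ready descriptors, or -1 if select fails. */
int serve_one_round(int listen_fd, fd_set *watched, int *max_fd) {
    fd_set ready = *watched;  /* select overwrites its argument, so pass a copy */
    int n = select(*max_fd + 1, &ready, NULL, NULL, NULL);
    if (n < 0) return -1;

    int top = *max_fd;  /* snapshot: accepting a client may raise *max_fd */
    for (int fd = 0; fd <= top; fd++) {
        if (!FD_ISSET(fd, &ready)) continue;

        if (fd == listen_fd) {
            /* Ready listening socket: accept will not block. */
            int conn = accept(listen_fd, NULL, NULL);
            if (conn >= 0 && conn < FD_SETSIZE) {
                FD_SET(conn, watched);
                if (conn > *max_fd) *max_fd = conn;
            } else if (conn >= 0) {
                close(conn);  /* descriptor number too large for an fd_set */
            }
        } else {
            /* Ready connection: recv will not block. */
            char buf[512];
            ssize_t got = recv(fd, buf, sizeof buf, 0);
            if (got <= 0) {
                /* Peer closed the connection, or an error occurred. */
                close(fd);
                FD_CLR(fd, watched);
            } else {
                /* Echo the bytes back. This send can still block if the
                   peer is slow to read; a fuller server would watch for
                   write-readiness too. */
                send(fd, buf, (size_t)got, 0);
            }
        }
    }
    return n;
}
```

Calling serve_one_round in a loop gives a single process that serves many clients: no client can hold the others up, because the process only ever calls accept or recv on descriptors that select has reported as ready.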
The fds argument is an fd_set. An fd_set is an array of bits, each of which corresponds to a file descriptor. There are FD_SETSIZE bits, and on my machine, FD_SETSIZE is 1024. Thus, select can only watch file descriptors numbered below 1024, which means ~1000 connected clients.
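A small C sketch for checking this limit on your own machine. The helper names are made up for illustration; FD_SETSIZE and fd_set come from <sys/select.h>. The printed values depend on the platform, so none are shown here.

```c
#include <stdio.h>
#include <sys/select.h>

/* Highest descriptor number that fits in an fd_set on this platform. */
int max_selectable_fd(void) {
    return (int)FD_SETSIZE - 1;
}

/* Returns 1 if fd can safely be passed to FD_SET, 0 otherwise.
   Using FD_SET on a descriptor outside this range is undefined behaviour. */
int fits_in_fd_set(int fd) {
    return fd >= 0 && fd < (int)FD_SETSIZE;
}

/* Prints the limit and the size of the bit array that holds it. */
void print_fd_set_limits(void) {
    printf("FD_SETSIZE = %d\n", (int)FD_SETSIZE);
    printf("sizeof(fd_set) = %zu bytes\n", sizeof(fd_set));
}
```

Note that the limit is on the descriptor's number, not on how many descriptors are in the set: a process watching only one socket still cannot use select on it if that socket happens to be descriptor 1024 or higher.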
First wrinkle: the syscall doesn’t return a new file descriptor set; it overwrites the one passed in. So we must track which file descriptors we have elsewhere, then copy them to an fd_set before calling select, and after select returns, we must iterate over the same fd_set to find out which file descriptors are “ready”.
To work with fd_sets, we use the functions FD_ZERO (make the set empty), FD_SET (add to set), FD_CLR (remove from set), and FD_ISSET (test set membership). A convenience FD_COPY copies one set to another, though it is an extension found on macOS and the BSDs rather than part of POSIX. (They’re actually macros; we could see in future how they are implemented.)
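A short C sketch of these operations. The helper names and the descriptor numbers 3 and 7 are arbitrary choices for illustration. Since fd_set is a structure type, plain assignment is used here as a portable stand-in for FD_COPY.

```c
#include <sys/select.h>

/* Counts how many of the descriptors 0..max_fd are members of set. */
int fd_set_count(fd_set *set, int max_fd) {
    int count = 0;
    for (int fd = 0; fd <= max_fd; fd++) {
        if (FD_ISSET(fd, set)) count++;
    }
    return count;
}

/* Builds two example sets: a ends up as {7}, b as {3, 7}. */
void build_example_sets(fd_set *a, fd_set *b) {
    FD_ZERO(a);      /* a = {}                               */
    FD_SET(3, a);    /* a = {3}                              */
    FD_SET(7, a);    /* a = {3, 7}                           */
    *b = *a;         /* b = {3, 7}, copied by assignment     */
    FD_CLR(3, a);    /* a = {7}; b is unaffected by this     */
}
```

There is no macro for listing the members of a set, which is why fd_set_count has to test every descriptor number up to max_fd in turn.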
Second wrinkle: you actually pass in three fd_set* arguments, not one. The first is for “read” operations, the second for “write” operations, the third for “exceptional” conditions. Thus, select(readfds, writefds, errorfds) means “OS, please put me to sleep, and wake me up when a file descriptor in readfds is ready for reading, or when a file descriptor in writefds is ready for writing, or when a file descriptor in errorfds has some exceptional condition.”
This is a mouthful, and it’s not totally clear what it means. What is “ready for writing”? As always, it depends on the resource that the file descriptor is linked to. If it’s a TCP connection, it means there is space in the TCP outbound buffer. If it’s a TCP listening socket, I don’t know.
If we’re not interested in some of these sets, e.g. we’re not interested in write-readiness, we can pass NULL as the argument, and the process will not be notified of write-readiness.
The third wrinkle is that the process can pass another argument, a timeout. If nothing happens in that time, the call will unblock the process anyway, with a return value of 0 telling it that nothing happened.
I wrote this because I felt like it. This post is my own, and not associated with my employer.