Structure of a Simple Distributed Computing System
The question came up: how do you structure a simple distributed computing system, such as a distributed ray tracer or a cryptographic attack program?
You could use plain shell connections to move the data around: have the clients SSH to the server and "check out" a chunk of work -- no custom network programming needed for that part. The only thing you need to worry about is two clients checking out at the same time, so the check-out step needs to be synchronized.
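Here is a minimal sketch of the server side, assuming the server keeps one file per chunk in a work/pending directory (the layout and script name are made up for illustration). It sidesteps a lock file entirely by relying on POSIX atomic rename; the lock-file approach discussed below works just as well.

```python
#!/usr/bin/env python3
# checkout.py -- runs on the server; clients invoke it over SSH
import os
import sys

PENDING = "work/pending"          # assumed layout: one file per chunk
IN_PROGRESS = "work/in_progress"

def check_out():
    for name in sorted(os.listdir(PENDING)):
        src = os.path.join(PENDING, name)
        dst = os.path.join(IN_PROGRESS, name)
        try:
            # rename() within one filesystem is atomic on POSIX, so of two
            # clients racing for the same chunk, exactly one wins; the
            # loser just falls through to the next chunk
            os.rename(src, dst)
        except OSError:
            continue
        os.utime(dst)             # stamp the check-out time for the timeout check
        sys.stderr.write(name + "\n")        # tell the client which chunk it got
        with open(dst, "rb") as f:
            sys.stdout.buffer.write(f.read())  # stream the chunk over the SSH pipe
        return 0
    return 1                      # queue empty

if __name__ == "__main__":
    sys.exit(check_out())
```

A client would then run something like `ssh server ./checkout.py > chunk` and go to work on the result.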
Clients then SSH back in to "check in" the pieces they complete. The master would keep an estimate of how long a chunk "should" take; if a client falls far behind that estimate, the master would give that piece to someone else as well and accept the first result to come back. This adds robustness in the face of client crashes and disconnections.
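The timeout side could be as small as a periodic job on the server, again assuming the directory layout sketched above; the two-hour deadline is an arbitrary placeholder:

```python
#!/usr/bin/env python3
# requeue.py -- run periodically on the server (e.g. from cron) to
# recover chunks from clients that have gone quiet
import os
import time

PENDING = "work/pending"
IN_PROGRESS = "work/in_progress"
DEADLINE = 2 * 3600   # assumed: roughly twice the expected time per chunk

def requeue_stale():
    now = time.time()
    for name in os.listdir(IN_PROGRESS):
        path = os.path.join(IN_PROGRESS, name)
        # checkout.py stamped the file's mtime at check-out time
        if now - os.path.getmtime(path) > DEADLINE:
            try:
                # hand the chunk back to the queue; whichever client checks
                # in a result first wins, later duplicates are ignored
                os.rename(path, os.path.join(PENDING, name))
            except OSError:
                pass   # already checked in, or requeued by a parallel run

if __name__ == "__main__":
    requeue_stale()
```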
I think this would be pretty scalable (each client only needs to know about the one server), and actually easy to implement. The components are:

- a master server holding the queue of work chunks;
- a check-out/check-in script pair that clients invoke over SSH, with the check-out step synchronized;
- a timeout job on the master that re-issues chunks whose clients have gone quiet;
- the worker program itself (the ray tracer, the key-search code), which each client runs locally on its chunk.
The hardest part would be serializing the check-out and check-in steps. Specifically, if you just use lock files and a client dies in the middle of checking out (or in), the lock may stay there forever. Even that problem isn't particularly hard to solve, though; there are several approaches, from easy (put the holder's process ID in the lock file and re-validate it when the lock is contended) to robust (use a separate lock server).
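A sketch of the easy end of that spectrum, assuming the lock only ever guards the server-side scripts (so the holder and any waiter run on the same machine, and a zero signal is a valid liveness probe):

```python
#!/usr/bin/env python3
# PID-in-lock-file scheme: the lock file records the holder's PID,
# and a waiter breaks the lock if that process is gone.
import os
import time

LOCKFILE = "work/.lock"

def acquire():
    while True:
        try:
            # O_CREAT|O_EXCL is atomic: exactly one process can create the file
            fd = os.open(LOCKFILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.write(fd, str(os.getpid()).encode())
            os.close(fd)
            return
        except FileExistsError:
            pass
        # Lock already held: re-validate that its owner is still alive.
        try:
            with open(LOCKFILE) as f:
                pid = int(f.read().strip())
            os.kill(pid, 0)        # signal 0 checks existence, sends nothing
            time.sleep(0.1)        # holder is alive; wait and retry
        except (ValueError, FileNotFoundError):
            pass                   # corrupt or just released; retry
        except ProcessLookupError:
            # Holder died mid-operation; break the stale lock. Breaking it
            # this way still has a narrow race (two waiters can both unlink),
            # which is exactly what the "robust" separate lock server buys you.
            try:
                os.unlink(LOCKFILE)
            except FileNotFoundError:
                pass

def release():
    os.unlink(LOCKFILE)
```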