Programming in Python with Medusa and the Async Sockets Library

Introduction

Why Asynchronous?

There are only two ways to have a program on a single processor do 'more than one thing at a time'. Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, one that lets you have nearly all the advantages of multi-threading without actually using multiple threads. It's really only practical if your program is I/O bound (I/O is the principal bottleneck). If your program is CPU bound, then pre-emptively scheduled threads are probably what you really need. Network servers, however, are rarely CPU-bound.
If your operating system supports the select() system call in its I/O library (and nearly all do), then you can use it to juggle multiple communication channels at once, doing other work while your I/O is taking place in the 'background'.

Select-based multiplexing in the real world

Several well-known Web servers (and other programs) are written using exactly this technique: thttpd, Zeus, and the Squid Internet Object Cache are excellent examples. The InterNet News server (INN) used this technique for several years before the web exploded. An interesting web server comparison chart is available at the thttpd web site.
Variations on a Theme: poll() and WaitForMultipleObjects
Of similar (but better) design is the poll() system call, available on many modern Unix variants.
In the Windows world, the Win32 API provides a bewildering array of features for multiplexing. Although slightly different in semantics, the combination of Event objects and the WaitForMultipleObjects() interface gives essentially the same capability as select().

Here's what a call to select() looks like from Python:

    (r, w, e) = select.select (read_list, write_list, error_list, timeout)

You pass in three lists of descriptors: those you want to read from, those you want to write to, and those to watch for error conditions. select() returns the subsets of each list that are actually ready. In practice, the error list is rarely useful, so we ignore it.
So that leaves only two types of events to build our programs around: read events and write events. As it turns out, this is actually enough to get by with, because other types of events can be implied by the sequencing of these two. It also keeps the low-level interface as simple as possible - always a good thing in my book.

The polling loop
Now that you know what select() does, here is a pseudo-code example of a polling loop:

    while (any_descriptors_left):
        events = select (descriptors, timeout)
        for event in events:
            handle_event (event)
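The pseudo-code above maps almost directly onto Python's select module. Here is a minimal runnable sketch (modern Python 3 syntax; a socketpair stands in for a real network connection, and the variable names are illustrative):

```python
import select
import socket

# A connected pair of sockets; 'peer' plays the role of a remote host.
sock, peer = socket.socketpair()
peer.send(b'hello')        # data is now waiting to be read

received = []
descriptors = [sock]

# The polling loop: wait for events, then handle each one.
while descriptors:
    readable, writable, _ = select.select(descriptors, [], [], 1.0)
    for s in readable:
        data = s.recv(4096)
        received.append(data)
        descriptors.remove(s)   # one read event is all this demo needs

sock.close()
peer.close()
print(received)    # [b'hello']
```

Once the descriptor list is empty the loop falls through, just as the pseudo-code's any_descriptors_left test suggests.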
If you take a look at the code used by the library, it looks very similar to this (see the file asyncore.py).

The Code

Blocking vs. Non-Blocking

File descriptors can be in either blocking or non-blocking mode. A descriptor in blocking mode will stop (or 'block') your entire program until the requested event takes place. For example, if you ask to read 64 bytes from a descriptor attached to a socket which is ultimately connected to a modem deep in the backwaters of the Internet, you may wait a while for those 64 bytes.
If you put the descriptor in non-blocking mode, then one of two things might happen: if the data is sitting in a local buffer, it will be returned to you immediately; otherwise you will get back a code (usually EWOULDBLOCK) telling you that the read is in progress, and you should try again later.
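In Python 3 the EWOULDBLOCK case surfaces as a BlockingIOError exception rather than an error code. A small sketch, again using a socketpair as a stand-in for a network connection:

```python
import socket

sock, peer = socket.socketpair()
sock.setblocking(False)    # put our end into non-blocking mode

# Nothing has been sent yet, so the read cannot complete immediately.
try:
    sock.recv(64)
    outcome = 'data'
except BlockingIOError:    # the EWOULDBLOCK case
    outcome = 'would block'

# Once data is sitting in the local buffer, recv returns right away.
peer.send(b'x' * 64)
data = sock.recv(64)

sock.close()
peer.close()
print(outcome, len(data))    # would block 64
```

The whole point of the polling loop is to avoid hitting the would-block case at all: select() tells us which descriptors can be serviced without waiting.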
sockets vs. other kinds of descriptors
Although most of our discussion will be about TCP/IP sockets, on Unix you can use select() on other kinds of file descriptors as well, such as pipes.

The socket_map

We use a global dictionary (asyncore.socket_map) to keep track of all the active socket objects; each time through the polling loop, this map is consulted to build the lists handed to select().

asyncore.dispatcher
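The bookkeeping idea behind the socket_map can be sketched in a few lines of modern Python. The channel class and its methods here are illustrative; only the descriptor-to-object mapping mirrors what asyncore does:

```python
import socket

socket_map = {}   # descriptor number -> channel object, like asyncore's map

class channel:
    def __init__(self, sock):
        self.socket = sock
        socket_map[sock.fileno()] = self      # register on creation
    def close(self):
        del socket_map[self.socket.fileno()]  # unregister on close
        self.socket.close()

a, b = socket.socketpair()
ca, cb = channel(a), channel(b)
registered = len(socket_map)    # both channels are now in the map
ca.close()
cb.close()
remaining = len(socket_map)
print(registered, remaining)    # 2 0
```

Because every channel registers itself when created and removes itself when closed, the polling loop always has an up-to-date picture of the open descriptors.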
The first class we'll introduce you to is the dispatcher class. It is a thin wrapper around a low-level socket object; it adds itself to the socket_map when created, and attaches a few methods for event handling.
The direct interface between the select loop and the socket object is a pair of methods, handle_read_event and handle_write_event; these are called whenever select() reports that the underlying descriptor is ready.
The firing of these low-level events can tell us whether certain
higher-level events have taken place, depending on the timing
and state of the connection. For example, if we have asked for
a socket to connect to another host, we know that the connection
has been made when the socket fires a write event (at this point
you know that you may write to it with the expectation of
success).
Thus, the set of user-level events is a little larger than simply read and write; depending on the state of the connection, events like connect, accept, and close can be detected as well.
A quick terminology note: in order to distinguish between low-level socket objects and those based on the async library classes, I call these higher-level objects channels.

Enough Gibberish, let's write some code

Ok, that's enough abstract talk. Let's do something useful and concrete with this stuff. We'll write a simple HTTP client that demonstrates how easy it is to build a powerful tool in only a few lines of code.

    # -*- Mode: Python; tab-width: 4 -*-

    import asyncore
    import socket
    import string

    class http_client (asyncore.dispatcher):

        def __init__ (self, host, path):
            asyncore.dispatcher.__init__ (self)
            self.path = path
            self.create_socket (socket.AF_INET, socket.SOCK_STREAM)
            self.connect ((host, 80))

        def handle_connect (self):
            self.send ('GET %s HTTP/1.0\r\n\r\n' % self.path)

        def handle_read (self):
            data = self.recv (8192)
            print data

        def handle_write (self):
            pass

    if __name__ == '__main__':
        import sys
        import urlparse
        for url in sys.argv[1:]:
            parts = urlparse.urlparse (url)
            if parts[0] != 'http':
                raise ValueError, "HTTP URL's only, please"
            else:
                host = parts[1]
                path = parts[2]
            http_client (host, path)
        asyncore.loop()
HTTP is (in theory, at least) a very simple protocol. You connect to the web server, send a request line ('GET /some/path HTTP/1.0', followed by a blank line), and the server sends back the response headers and the document itself.
We have defined a single new class, http_client, derived from the abstract class asyncore.dispatcher. It overrides three event handlers: handle_connect sends the request as soon as the connection is made, handle_read prints whatever data comes back, and handle_write does nothing (for now).
Go ahead and run this demo, giving a single URL as an argument, like this:
You should see something like this:
    [rushing@gnome demo]$ python asynhttp.py http://www.nightmare.com/
    log: adding channel <http_client at 80ef3e8>
    HTTP/1.0 200 OK
    Server: Medusa/3.19
    Content-Type: text/html
    Content-Length: 1649
    Last-Modified: Sun, 26 Jul 1998 23:57:51 GMT
    Date: Sat, 16 Jan 1999 13:04:30 GMT

    [... body of the file ...]

    log: unhandled close event
    log: closing channel 4:<http_client connected at 80ef3e8>
The 'log' messages are there to help; they are useful when debugging, but you will want to disable them later. The first log message tells you that a new channel object has been created and added to the socket map.

Now at this point we haven't seen anything revolutionary, but that's because we've only looked at one URL. Go ahead and add a few other URL's to the argument list; as many as you like - and make sure they're on different hosts...
Now you begin to see why the asynchronous approach is attractive: a single process, in a single thread, is talking to all of those hosts at the same time.
A really good way to understand select() is to add a few print statements to the polling function in asyncore:

        [...]
        (r,w,e) = select.select (r,w,e, timeout)
        print '---'
        print 'read', r
        print 'write', w
        [...]

Each time through the loop you will see which channels have fired which events. If you haven't skipped ahead, you'll also notice a pointless barrage of events, with all your http_client objects in the 'writable' set. This is because we were a bit lazy earlier, sweeping some ugliness under the rug. Let's fix that now.

Buffered Output
In our new version of http_client, the request string is placed in an output buffer when the object is created, and is only sent once the socket is actually ready for it:

    class http_client (asyncore.dispatcher):

        def __init__ (self, host, path):
            asyncore.dispatcher.__init__ (self)
            self.path = path
            self.create_socket (socket.AF_INET, socket.SOCK_STREAM)
            self.connect ((host, 80))
            self.buffer = 'GET %s HTTP/1.0\r\n\r\n' % self.path

        def handle_connect (self):
            pass

        def handle_read (self):
            data = self.recv (8192)
            print data

        def writable (self):
            return (len(self.buffer) > 0)

        def handle_write (self):
            sent = self.send (self.buffer)
            self.buffer = self.buffer[sent:]
The handle_write method sends as much of the buffer as the socket will accept (send returns the number of bytes actually transmitted), and keeps whatever is left over for the next write event.
We also introduce the writable method. Before calling select(), the polling loop asks each channel whether it has anything to write; once the buffer is empty, writable returns false and the channel stops asking for write events.
If you try the client now (with the print statements in asyncore's polling loop still in place), you'll see that each channel shows up in the 'writable' set only until its request buffer has been emptied.
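The same buffer-slicing pattern can be exercised outside asyncore. This sketch (modern Python 3; a socketpair stands in for a network peer, and the names are illustrative) pushes a large buffer through a non-blocking socket, sending only what the kernel will accept on each write event:

```python
import select
import socket

sock, peer = socket.socketpair()
sock.setblocking(False)
peer.setblocking(False)

TOTAL = 300000
buffer = b'x' * TOTAL     # more than a single send() will usually accept
received = 0

while buffer or received < TOTAL:
    # Ask for write events only while there is something left to send.
    want_write = [sock] if buffer else []
    r, w, _ = select.select([peer], want_write, [], 1.0)
    if w:
        sent = sock.send(buffer)     # may be a partial send
        buffer = buffer[sent:]       # keep the unsent remainder
    if r:
        received += len(peer.recv(65536))

sock.close()
peer.close()
print(received)
```

Note how want_write plays the role of the writable method: once the buffer is empty, the socket is no longer offered to select() for writing, and the pointless barrage of write events stops.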
asynchat.py

The dispatcher class is useful, but somewhat limited in capability. As you might guess, managing input and output buffers manually can get complex, especially if you're working with a protocol more complicated than HTTP.
The async_chat class does this buffer management for you. There are four new methods to introduce:

    set_terminator (terminator): tell the channel what marks the end of a request (for example, '\r\n').
    collect_incoming_data (data): called with incoming data as it arrives.
    found_terminator (): called when the terminator has been seen in the input stream.
    push (data): place data on the outgoing queue, to be sent as the socket allows.
These methods build on the underlying capabilities of dispatcher, turning the raw stream of read and write events into a higher-level, record-oriented interface.
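The heart of the terminator machinery can be sketched independently of asynchat. The class below is illustrative only (modern Python 3; the real asynchat divides the work between the framework and your subclass differently), but it shows how arbitrary chunks become whole records:

```python
class line_channel:
    # Collect arbitrary chunks and fire found_terminator for each complete
    # line - roughly the effect of async_chat with set_terminator('\n').
    def __init__(self):
        self.terminator = '\n'
        self.buffer = ''
        self.lines = []

    def collect_incoming_data(self, data):
        self.buffer += data
        # A single chunk may complete zero, one, or several records.
        while self.terminator in self.buffer:
            line, self.buffer = self.buffer.split(self.terminator, 1)
            self.found_terminator(line)

    def found_terminator(self, line):
        self.lines.append(line)

ch = line_channel()
# Data arrives in arbitrary chunks, just as it does from a real socket.
for chunk in ('HEL', 'O there\nQUI', 'T\n'):
    ch.collect_incoming_data(chunk)
print(ch.lines)    # ['HELO there', 'QUIT']
```

The caller never needs to know where the chunk boundaries fell; found_terminator fires exactly once per complete record.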
The implementation of async_chat is compact, and worth reading alongside this tutorial (see the file asynchat.py).

A Proxy Server

In order to demonstrate the async_chat class, we will put together a simple proxy server. A proxy server combines a server and a client together, in effect sitting between the real server and client. You can use this to monitor or debug protocol traffic.
    # -*- Mode: Python; tab-width: 4 -*-

    import asynchat
    import asyncore
    import socket
    import string

    class proxy_server (asyncore.dispatcher):

        def __init__ (self, host, port):
            asyncore.dispatcher.__init__ (self)
            self.create_socket (socket.AF_INET, socket.SOCK_STREAM)
            self.set_reuse_addr()
            self.there = (host, port)
            here = ('', port + 8000)
            self.bind (here)
            self.listen (5)

        def handle_accept (self):
            proxy_receiver (self, self.accept())

    class proxy_sender (asynchat.async_chat):

        def __init__ (self, receiver, address):
            asynchat.async_chat.__init__ (self)
            self.receiver = receiver
            self.set_terminator (None)
            self.create_socket (socket.AF_INET, socket.SOCK_STREAM)
            self.buffer = ''
            self.set_terminator ('\n')
            self.connect (address)

        def handle_connect (self):
            print 'Connected'

        def collect_incoming_data (self, data):
            self.buffer = self.buffer + data

        def found_terminator (self):
            data = self.buffer
            self.buffer = ''
            print '==> (%d) %s' % (self.id, repr(data))
            self.receiver.push (data + '\n')

        def handle_close (self):
            self.receiver.close()
            self.close()

    class proxy_receiver (asynchat.async_chat):

        channel_counter = 0

        def __init__ (self, server, (conn, addr)):
            asynchat.async_chat.__init__ (self, conn)
            self.set_terminator ('\n')
            self.server = server
            self.id = self.channel_counter
            self.channel_counter = self.channel_counter + 1
            self.sender = proxy_sender (self, server.there)
            self.sender.id = self.id
            self.buffer = ''

        def collect_incoming_data (self, data):
            self.buffer = self.buffer + data

        def found_terminator (self):
            data = self.buffer
            self.buffer = ''
            print '<== (%d) %s' % (self.id, repr(data))
            self.sender.push (data + '\n')

        def handle_close (self):
            print 'Closing'
            self.sender.close()
            self.close()

    if __name__ == '__main__':
        import sys
        import string
        if len(sys.argv) < 3:
            print 'Usage: %s <server-host> <server-port>' % sys.argv[0]
        else:
            ps = proxy_server (sys.argv[1], string.atoi (sys.argv[2]))
            asyncore.loop()

To try out the proxy, find a server (any SMTP, NNTP, or HTTP server should do fine) and give its hostname and port as arguments:

    python proxy.py localhost 25
The proxy server will start up its own server on port n + 8000 (in the example above, port 8025), relaying traffic to and from the real server.

Pipelining

Pipelining refers to a protocol capability. Normally, a conversation with a server has a back-and-forth quality to it. The client sends a command, and waits for the response. If a client needs to send many commands over a high-latency connection, waiting for each response can take a long time.
For example, when sending a mail message to many recipients with SMTP, the client will send a series of RCPT commands, one for each recipient, and the server replies to each one in turn.

I have a favorite visual when explaining the advantages of pipelining. Imagine each request to the server is a boxcar on a train. The client is in Los Angeles, and the server is in New York. Pipelining lets you hook all your cars in one long chain; send them to New York, where they are filled and sent back to you. Without pipelining you have to send one car at a time.

Not all protocols allow pipelining, and not all servers support it; Sendmail, for example, does not, because it tends to fork unpredictably, leaving buffered data in a questionable state. A recent extension to the SMTP protocol allows a server to specify whether it supports pipelining. HTTP/1.1 explicitly requires that a server support pipelining.
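The boxcar analogy is easy to demonstrate in code. In this sketch (modern Python 3; the RCPT-style protocol and all names are made up for illustration), three commands go out in a single burst, and the server answers them in order from one buffered read:

```python
import socket

def server_step(conn):
    # Process every complete command sitting in the buffer: a pipelining-
    # friendly server does not insist on one command per read.
    data = conn.recv(4096).decode()
    for command in data.splitlines():
        conn.send(('ok:' + command + '\n').encode())

client, server = socket.socketpair()

# Pipelined: three commands sent back-to-back, without waiting for replies.
client.send(b'RCPT a\nRCPT b\nRCPT c\n')
server_step(server)

replies = client.recv(4096).decode().splitlines()
client.close()
server.close()
print(replies)    # ['ok:RCPT a', 'ok:RCPT b', 'ok:RCPT c']
```

With pipelining, the high-latency round trip is paid once for the whole batch instead of once per command.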
Servers built on top of async_chat handle pipelined requests naturally: incoming data is collected and carved into requests by the terminator machinery, so commands that arrive back-to-back are simply processed in order.

Producers
A producer is a simple object with a single method, more(), which returns a chunk of output data each time it is called, and an empty string when it is exhausted. In addition to plain strings, async_chat lets you push producers onto the output queue, so large or computed replies need not be built in memory all at once.

Samual M. Rushing
Last modified: Fri Apr 30 21:42:52 PDT 1999