AllegroServe - A Web Application Server
version 1.2.23

copyright(c) 2000-2001. Franz Inc

Table of Contents

Introduction
Running AllegroServe
Starting the Server
  start
Shutting Down the Server
  shutdown
Publishing Information
  publish-file
  publish-directory
  publish
  publish-prefix
  publish-multi
Generating a Computed Response
  with-http-response
  with-http-body
  get-request-body
  header-slot-value
  reply-header-slot-value
  request-query
  request-query-value
Request Object Readers and Accessors
  request-method
  request-uri
  request-protocol
  request-socket
  request-wserver
  request-raw-request
  request-reply-code
  request-reply-date
  request-reply-headers
  request-reply-content-length
  request-reply-plist
  request-reply-strategy
  request-reply-stream

CGI Program Execution
  run-cgi-program
Form Processing
  get-multipart-header
  parse-multipart-header
  get-multipart-sequence
  get-all-multipart-data
  form-urlencoded-to-query
  query-to-form-urlencoded

Authorization
  get-basic-authorization
  set-basic-authorization
  password-authorizer
  location-authorizer
  function-authorizer
Cookies
  set-cookie-header
  get-cookie-values

Variables
  *aserve-version*
  *default-aserve-external-format*
  *http-response-timeout*
  *mime-types*

AllegroServe Request Processing Protocol
  handle-request
  standard-locator
  unpublish-locator
  authorize
  failed-request
  denied-request
  process-entity

Client Functions
  do-http-request
  client-request
  cookie-jar
  make-http-client-request
  read-client-response-headers
  client-request-read-sequence
  client-request-close
  uriencode-string

Proxy
Cache
Request Filters
Virtual Hosts
Timeouts
  wserver-io-timeout
  wserver-response-timeout

Miscellaneous
  ensure-stream-lock
  map-entities
Running AllegroServe as a Service on Windows NT
Using International Characters in AllegroServe
Debugging
  net.aserve::debug-on
  net.aserve::debug-off



Introduction

AllegroServe is a web server written at Franz Inc. It is designed to work with the htmlgen system for generating dynamic html, since one of the big advantages of a web server written in Common Lisp is the ability to generate html dynamically. In this document we'll consider the web server and dynamic html generation to be parts of the same product.

The design goals of AllegroServe are:

 

Running AllegroServe

Running  AllegroServe requires that you

We mention publish twice to emphasize that you can publish urls before and after you start the server.

 

Starting the server

The function net.aserve:start is used to start the server running.

(start &key port host listeners chunking keep-alive server setuid setgid
            debug proxy proxy-proxy cache restore-cache accept-hook ssl ssl-password
            os-processes external-format)

If no arguments are given then start  starts a multi-threaded web server on port 80, which is the standard web server port.    If you are running this on Unix then you can only allocate port 80 if you are logged in as root or have made Lisp a set-user-id root program.
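As a minimal sketch of the point about port 80, the following starts a server on an unprivileged port so that root access is not needed. It assumes AllegroServe can be loaded with (require :aserve), as is usual in Allegro CL; the port number 8000 and the variable name *my-server* are arbitrary choices made for illustration.

```lisp
;; Assumption: AllegroServe is available as the :aserve module.
(require :aserve)

;; Start a server on port 8000.  Any free port above 1023 avoids the
;; need for root privileges on Unix.  start is expected to return the
;; server object, which we keep in a (hypothetical) variable.
(defparameter *my-server*
  (net.aserve:start :port 8000 :listeners 5))
```

Urls can be published before or after this call.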

There are quite a few keyword arguments to start, but in practice you only need be concerned with :port and :listeners.     The arguments have the following meanings:

 

Shutting down the server

(shutdown &key server save-cache)

This shuts down the web server given (or the most recently started web server if no argument is passed for server).  If save-cache is given then it should be the name of a file to which the current state of the proxy cache will be written.   The save-cache file will only contain in-memory information about the cache.  The cache usually consists of disk files as well and in order to maintain the complete state of the cache these files must be saved by the user as well.  The information in the save-cache file refers to the disk cache files so those disk cache files must exist and be in the same state and location should the user choose to restore the state of the cache.
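A short sketch of shutting down a particular server follows. The variable *my-server* is a hypothetical name for the object returned by start, and port 8000 is an arbitrary choice.

```lisp
;; Assumption: AllegroServe is already loaded.
;; Keep the server object returned by start so that it can be
;; named explicitly later on.
(defvar *my-server* (net.aserve:start :port 8000))

;; ... publish urls and serve requests ...

;; Shut down that specific server.  Calling (net.aserve:shutdown)
;; with no arguments would instead shut down the most recently
;; started server.
(net.aserve:shutdown :server *my-server*)
```

The :save-cache argument is only of interest when the proxy cache is in use.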

 

Publishing information

Once the server is started it will accept requests from http clients, typically web browsers.   Each request is parsed and then AllegroServe searches for an object to handle that request.   That object is called an entity.  If an entity is found, it is passed the request and is responsible for generating and sending a response to the client.  If an entity can't be found then AllegroServe sends back a response saying that that request was invalid.

Publishing is the process of creating entities and registering them in the tables scanned by AllegroServe after a request is read.

Components of a request

A request from an http client contains a lot of information.  The two items that determine which entity will handle the request are

 

 

A request contains other information and while that information isn't used to determine which entity will handle the request it can be used by the entity handling the request in any way it sees fit.

 

The following functions create entities and specify which requests they will handle.    An entity is distinguished by the path and host values passed to the particular publish function.   When a publish is done for a path and host for which there is already an entity assigned, the old entity is replaced by the new entity.

 

(publish-file &key path host port  file content-type class preload cache-p remove
                   authorizer server timeout plist)

This creates an entity that will return the contents of a file on the disk in response to a request. The path and file arguments must be given; the rest of the arguments are optional. The arguments have these meanings:

The function that handles requests for files will respond correctly to If-Modified-Since header lines and thus minimizes network traffic.

Example:

This will work on Unix where the password file is stored in /etc.

(publish-file :path "/password" :file "/etc/passwd" :content-type "text/plain")

 

 

(publish-directory &key prefix host port destination remove authorizer server
                        indexes filter timeout plist publisher access-file)

publish-directory is used to publish a complete directory tree of files. This is similar to how web servers such as Apache publish files. AllegroServe publishes the files in the directory tree in a lazy manner. As files in the tree are referenced by client requests, entities are created and published.

publish-directory creates a mapping from all urls whose name begins with prefix to files stored in the directory specified by the destination.   The host, port, remove, authorizer, plist and server arguments are as described above for publish-file.      The timeout argument defaults as described in publish-file.   The access-file argument names the access file name which will be used in this directory tree. When a request comes in for which there isn't an entity that matches it exactly,  AllegroServe checks to see if a prefix of the request has been registered.  If so, and if the resulting entity is a directory-entity as created by this function, then it strips the prefix off the given request and appends the remaining part of the request to the destination string.  It then publishes that (normally using publish-file and computing the content-type from the file type).    Next that file-entity is made to handle the request in the normal manner.

If a request comes that maps to a directory rather than a file then AllegroServe tries to locate an index file for that directory.  The indexes argument specifies a list of index files to search for.  By default the list consists of two filenames "index.html" and "index.htm".

The value of the filter argument is a function of four arguments: req, ent, filename and info. req and ent are the request and entity objects that describe the current client request. filename is the name of a known file on the current machine which is being requested by the current request. info is the list of access information for this file.

If the filter returns nil then the normal operation  is done by the directory-entity handler: the selected file is published and then the request to access it processed (and subsequent access using that url will just return the file and never go through the filter again).

If the filter chooses to handle the request for the file itself it must generate a response to the request and then return a non-nil value. To avoid subsequent calls to the filter for this file the filter may choose to publish a handler for this url. If the filter wants to forbid access to this file a handy way is to call (failed-request req) and the standard "404 Not found" will be sent back to the client.

The publisher argument can be used to specify exactly what happens when a request comes that's handled by the directory-entity and a file is located on the disk that matches the incoming url. Normally a publish-file is done to add that file. You may want to publish some other kind of entity to represent that file. The publisher argument, if non-nil, must be a function of four arguments: req ent filename info. The filename is a string naming the file that's been matched with the request. info is the list of access information for this file. The publisher function must return an entity to be processed to send back a response. The publisher function may wish to publish that entity but it need not do so.
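The filter protocol described above can be sketched as follows. This is only an illustration: the package name aserve-sketch, the function name hide-backup-files, the "/docs/" prefix and the destination directory are all assumptions, and the policy (refusing files whose names end in ~) is just an example.

```lisp
;; Hypothetical package for the sketch; assumes AllegroServe is loaded.
(defpackage :aserve-sketch
  (:use :common-lisp :net.aserve :net.html.generator))
(in-package :aserve-sketch)

(defun hide-backup-files (req ent filename info)
  "Filter for publish-directory: refuse files whose names end in ~."
  (declare (ignore ent info))
  (let ((len (length filename)))
    (if (and (plusp len)
             (char= (char filename (1- len)) #\~))
        (progn
          ;; Handle the request ourselves with a 404, then return
          ;; non-nil so the directory handler does nothing further.
          (failed-request req)
          t)
        ;; nil means: publish and serve the file in the normal way.
        nil)))

(publish-directory :prefix "/docs/"
                   :destination "/home/joe/html/"
                   :indexes '("index.html" "index.htm")
                   :filter #'hide-backup-files)
```

Because a file that passes the filter is published, later requests for the same url go straight to the file entity and do not reach the filter again.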

 

Directory Access Files

When files are accessed and automatically published you may wish to set some of the parameters of the entity that is published. As mentioned above you can define a publisher function that has complete control in publishing the entity. A less powerful but easier to use alternative is to place access files in the directory tree being published. An access file specifies information that you want passed to the publisher function. You can modify these access files while the directory tree is published and their latest values will be used for publishing subsequent files. This is similar to the way Apache controls its publishing with .htaccess files (except that in AllegroServe once a file is published the access files have no effect on it).

The name of an access file in AllegroServe is controlled by the :access-file argument to publish-directory. We'll assume the name chosen is access.cl in this document. If no :access-file argument is given to publish-directory then no access file checking is done. When a file is about to be published all access files from the destination directory all the way down to the directory containing the file to be published are read and used. For example if the destination in a publish-directory was given as "/home/joe/html/" and an http request comes in which references the file "/home/joe/html/pics/archive/foo.jpg" then AllegroServe will check for access files at all of these locations and in this order:

  /home/joe/html/access.cl
  /home/joe/html/pics/access.cl
  /home/joe/html/pics/archive/access.cl

The information is collected as successive access files are read. The new information is placed before the existing information, so information in a subdirectory's access file can shadow information in access files in the directories above it. Also, superdirectory access file information is automatically eliminated if it isn't marked as being inherited.

The publisher function receives the collected information and can do with it what it wishes.  We'll describe what the built-in publisher function does with the information.

When we speak of information in access files we are purposely being vague.   We define what information must look like and what the standard publisher function does with certain information but we allow users to define their own kinds of information and use that in their own publisher function.

Each access file consists of zero or more Lisp forms (and possibly lisp style comments).  Each form is  a list beginning with a keyword symbol and then followed by a property-list-like sequence of   keywords and values.   Nothing in the form is evaluated.     The form cannot contain #. or #,. macros.

One  information form is used by AllegroServe's directory publisher code to decide if it's permitted to descend another directory level:

(:subdirectories  :allow allow-list :deny deny-list :inherit inherit-value)

As AllegroServe descends from the destination directory toward the directory containing the file to be accessed it stops at each directory level, accumulates the access information, and then tests to see if it can descend further based on the :subdirectories information. If it cannot descend into the next subdirectory it gives up immediately and a 404 - Not Found response is returned. See the section Allow and Deny Processing below for a description of how it uses the :allow and :deny values.
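As an illustration only, a :subdirectories form placed in an access file might look like the one below. The directory names CVS and private are made up for the example, and both :allow and :deny are given explicitly so that the result does not depend on any default.

```lisp
; allow descent into every subdirectory except those named
; exactly CVS or private; :inherit t keeps the rule in force
; at deeper directory levels
(:subdirectories :allow ".*" :deny ("^CVS$" "^private$") :inherit t)
```

Like every access file form this is read as data and nothing in it is evaluated.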

These other information forms are used by the standard publisher function. Each takes an :inherit argument which defaults to false. Information not given with :inherit t will be eliminated as AllegroServe descends directory levels.

name        args
  meaning

:ip         :patterns :inherit
  Specifies a location-authorizer restriction on which machines can see published files. The value of the :patterns argument has the same form as the :patterns slot of a location-authorizer.

:password   :realm :allowed :inherit
  Specifies a password-authorizer restriction on access to published files. See the password-authorizer documentation for a description of the :realm and :allowed arguments.

:files      :allow :deny :inherit
  Specifies which files are visible to be published. To be visible a file must be allowed and not denied. What is tested is the filename only (that is, the part after the last directory separator in the file's complete name). See below for the rules on how allow and deny are used.

:mime       :types :inherit
  Specifies which mime types are to be associated with which file types. This list takes precedence over the built-in list inside AllegroServe. :types is a list of mime specifiers. A mime specifier is a list beginning with a string giving the mime type followed by the file types that should map to that mime type. A file type in a list (e.g. ("ReadMe")) refers to the whole file name rather than the type component.

Allow and Deny Processing

The :files and :subdirectories information are used to determine if a file or subdirectory of a given name is accessible. AllegroServe will collect all the access file information for the directory containing the file or subdirectory and for all directories above it up to the directory given as the destination argument to publish-directory. Information from superdirectories will only be used if :inherit t is given for that information.

The rule is that a given name is accessible if it is allowed and not denied. That is, the filename or directory name must match one of the allow clauses and none of the deny clauses. There may be multiple allow and deny clauses since there may be multiple information forms of the type :files or :subdirectories. Each allow or deny argument can be a string or a list of strings or nil (which is the same as that argument not being given). The strings are regular expressions (which are not exactly like unix shell wildcard filename expressions). In particular ".*" is the regular expression that matches anything.

The special cases are the following

 


Here is a sample access file:

; only connections to localhost will be able to access the files
(:ip :patterns ((:accept "127.1") :deny) :inherit t)   
(:password :realm "mysite"
           :allowed (("joe" . "mypassword")
                     ("sam" . "secret"))
           :inherit t) ;  applies to subdirectories
; publish html and cgi files, but not those beginning with a period
(:files :allow ("\\.html$" "\\.cgi$") :deny ("^\\."))
; specify mime type for non-standard file extensions.  Also
; specify that a file named exactly ChangeLog should be given
; mime type "text/plain"
(:mime :types (("text/jil" "jil" "jlc") ("text/plain" "cl" ("ChangeLog"))))
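For a file like this to be consulted, the directory tree has to be published with an :access-file argument. A minimal sketch follows, in which the "/site/" prefix, the destination directory and the package name aserve-sketch are assumptions made for the example.

```lisp
;; Hypothetical package for the sketch; assumes AllegroServe is loaded.
(defpackage :aserve-sketch
  (:use :common-lisp :net.aserve :net.html.generator))
(in-package :aserve-sketch)

;; Every directory from /home/joe/html/ down to the one holding the
;; requested file is checked for a file named access.cl.
(publish-directory :prefix "/site/"
                   :destination "/home/joe/html/"
                   :access-file "access.cl")
```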

 

 

(publish &key path host port content-type function class format remove server authorizer timeout plist)

This creates a mapping from a url to a computed-entity, that is, an entity that computes its response every time a request comes in. The path, host, port, remove, server, authorizer and class arguments are as in the other publish functions. The timeout argument defaults to nil always. The content-type sets a default value for the response to the request but this can be overridden. The format argument is either :text (the default) or :binary and it specifies the kind of value that will be sent back (after the response headers, which are always in text). This value is only important if the response is generated in a particular way (described below).

The function argument is  a function of two arguments: an object of class http-request that holds a description of the request, and an object of class entity that holds this entity which is handling the request.   This function must generate a response to the http request, even if the response is only that the request wasn't found.

 

(publish-prefix &key prefix host port content-type function class
                     format remove server authorizer timeout plist)

This is like publish except that it declares function to be the handler for all urls that begin with the string prefix. Note however that prefix handlers have lower priority than exact handlers. Thus if you declare a prefix handler for "/foo" and also a specific handler for "/foo/bar.html" then the specific handler will be chosen if "/foo/bar.html" is found in an http request. Typically a prefix handler is used to make available a whole directory of files since their complete names begin with a common prefix (namely the directory in which the files are located). If you want to publish a whole directory then you probably want to use publish-directory since it has a number of features to support file publishing.
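The priority rule can be illustrated with a small sketch. The urls, the handler names and the package name aserve-sketch are hypothetical; net.uri:uri-path is the accessor from Allegro CL's uri module and is assumed to be available.

```lisp
;; Hypothetical package for the sketch; assumes AllegroServe is loaded.
(defpackage :aserve-sketch
  (:use :common-lisp :net.aserve :net.html.generator))
(in-package :aserve-sketch)

(defun show-path (req ent)
  "Prefix handler: report which path under /echo/ was requested."
  (with-http-response (req ent)
    (with-http-body (req ent)
      (format (request-reply-stream req) "You asked for ~a"
              (net.uri:uri-path (request-uri req))))))

(defun show-special (req ent)
  "Exact handler for one particular url."
  (with-http-response (req ent)
    (with-http-body (req ent)
      (write-string "This is the special page"
                    (request-reply-stream req)))))

;; Handles /echo/a, /echo/b/c and so on ...
(publish-prefix :prefix "/echo/"
                :content-type "text/plain"
                :function #'show-path)

;; ... except for this url, because exact handlers have priority.
(publish :path "/echo/special"
         :content-type "text/plain"
         :function #'show-special)
```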

 

(publish-multi &key path host port content-type items class remove server authorizer timeout)

Some web pages are created from information from various sources. publish-multi allows you to specify a sequence of places that supply data for the combined web page. The data for each page is cached by publish-multi so that minimal computation is required each time the page is requested.

The host, port, content-type, class, remove, server, authorizer and timeout arguments are the same as those of the other publish functions. The items argument is unique to publish-multi and is a list of zero or more of the following objects

Here's an example where we create a page from a fixed header and trailer page with a bit of dynamic content in the middle.

(publish-multi :path "/thetime"
	       :items (list "header.html"
			    #'(lambda (req ent old-time old-val)
				(declare (ignore req ent old-time old-val))
				(with-output-to-string (p)
				  (html-stream p
					       :br
					       "The time is "
					       (:princ (get-universal-time))
					       (:b
						"Lisp Universal Time")
					       :br)))
			    "footer.html"))
				   

Generating a computed response

There are a variety of ways that a response can be sent back to the http client depending on whether keep-alive is being done, chunking is possible, whether the response is text or binary, whether the client already has the most recent data, and whether the size of the body of the response is known before the headers are sent. AllegroServe handles the complexity of determining the optimal response strategy and the user need only use a few specific macros in the computation of the response in order to take advantage of AllegroServe's strategy computation.

Here's a very simple computed response.  It just puts "Hello World!" in the browser window:

(publish :path "/hello"
         :content-type "text/html"
         :function #'(lambda (req ent)
                       (with-http-response (req ent)
                          (with-http-body (req ent)
                             (html "Hello World!")))))



This example works regardless of whether the request comes in from an old HTTP/0.9 browser or a modern HTTP/1.1 browser. It may or may not send the response back with Chunked transfer encoding and it may or may not keep the connection alive after sending back the response. The user code doesn't have to deal with those possibilities; it just uses with-http-response and with-http-body and the rest is automatic. The html macro is part of the htmlgen package that accompanies AllegroServe. In the case above we are being lazy and not putting out the html directives that should be found on every page of html since most browsers are accommodating. Here's the function that generates the correct html:

(publish :path "/hello2"
         :content-type "text/html"
         :function #'(lambda (req ent)
                       (with-http-response (req ent)
                         (with-http-body (req ent)
                          (html 
                             (:html
                               (:body "Hello World!")))))))

 

The function above generates: <html><body>Hello World!</body></html>.

 

The macros and functions used in computing responses are these:


(with-http-response (req ent &key timeout check-modified format response content-type)
                    &rest body)

This macro begins the process of generating a response to an http request and then runs the code in the body which will actually send out the response. req and ent are the request and entity objects passed into the function designated to compute the response for the request. timeout sets a time limit for the computation of the response. If timeout is nil then the entity ent is checked for a timeout value. If that value is also nil then the timeout value is retrieved from the current wserver object using wserver-response-timeout. If check-modified is true (the default) then the last-modified time stored in the entity object will be compared against the if-modified-since time of the request and if that indicates that the client already has the latest copy of this entity then a not-modified response will be automatically returned to the client and the body of this macro will not be run. response is an object containing the code and description of the http response we wish to return. The default value is the value of *response-ok* (which has a code of 200 and a string descriptor "OK"). content-type is a string describing the MIME type of the body (if any) sent after the headers. It has a form like "text/html". If content-type isn't given here then the content-type value in the entity (which is set in the call to publish) will be used.

The format argument specifies whether the code that writes the body of the response will want to write :text (e.g. write-char) or :binary (e.g. write-byte) when it writes the data of the body of the response. Based on the value of the format argument, AllegroServe will create the correct kind of response stream. If format is not specified here it will default to the value specified when publish was called to create the entity. If no :format argument was passed to publish then :binary format is assumed. If :binary is specified then you can write both text and binary to the stream since Allegro's binary streams also support text calls. If you specify :text then you may end up with a stream that supports only text operations.

An http response consists of a line describing the response code, followed by headers (unless it's the HTTP/0.9 protocol in which case there are no headers), and then followed by the body (if any) of the response. with-http-response doesn't normally send anything to the client. It only does so when it determines that the if-modified-since predicate doesn't hold and that it must send back a not-modified response. Thus it is not enough to just call with-http-response in your response function. You must always call with-http-body inside the call to with-http-response.
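As a sketch of the response argument, the handler below answers with the default *response-ok* for one query value and with a not-found response otherwise. The names lookup-item and aserve-sketch, the url and the "item" query key are made up for the example; *response-not-found* is assumed to be the response object AllegroServe provides for code 404.

```lisp
;; Hypothetical package for the sketch; assumes AllegroServe is loaded.
(defpackage :aserve-sketch
  (:use :common-lisp :net.aserve :net.html.generator))
(in-package :aserve-sketch)

(defun lookup-item (req ent)
  "Reply 200 for the one item we know about, 404 for anything else."
  (let ((item (request-query-value "item" req)))
    (if (equal item "widget")
        ;; :response defaults to *response-ok*
        (with-http-response (req ent)
          (with-http-body (req ent)
            (html (:html (:body "Widgets are in stock.")))))
        ;; Assumption: *response-not-found* holds the 404 response.
        (with-http-response (req ent :response *response-not-found*)
          (with-http-body (req ent)
            (html (:html (:body "No such item."))))))))

(publish :path "/item"
         :content-type "text/html"
         :function #'lookup-item)
```

In both branches with-http-body is called inside with-http-response, since with-http-response alone normally sends nothing.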

 


(with-http-body (req ent &key format headers external-format)  &rest body)

This macro causes the whole http response to be sent out. The macro itself will send out everything except the body of the response. That is the responsibility of the code supplied as the body form of the macro. In cases where there is no body to the response being sent it is still necessary to call with-http-body so that the other parts of the response are sent out, e.g. at a minimum you should put (with-http-body (req ent)) in the body of a with-http-response.

The headers argument is a list of conses, where the car is the header name (a keyword symbol) and the cdr is the header value.  These headers are added to the headers sent as part of this response.

Within the body forms the code calls (request-reply-stream req) to obtain a stream to which it can write to supply the body of the response.   The external-format of this stream is set to the value of the external-format argument (which defaults to the value of *default-aserve-external-format*).   The variable *html-stream* is bound to the value of (request-reply-stream req) before the body is evaluated.   This makes it easy  to use the html macro to generate html as part of the response.

Note: there used to be a :format argument to with-http-body. That argument was never used by with-http-body.  The :format argument has been moved to with-http-response so that it can now have an effect on the stream created.
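The headers argument and the empty-body case can be sketched together with a redirect. The handler name, the urls and the package name aserve-sketch are hypothetical, and *response-found* is assumed to be the response object AllegroServe provides for code 302.

```lisp
;; Hypothetical package for the sketch; assumes AllegroServe is loaded.
(defpackage :aserve-sketch
  (:use :common-lisp :net.aserve :net.html.generator))
(in-package :aserve-sketch)

(defun redirect-to-hello (req ent)
  "Send the client to /hello without sending any body."
  ;; Assumption: *response-found* holds the 302 response.
  (with-http-response (req ent :response *response-found*)
    ;; Each header is a cons of a keyword name and a value.
    ;; There are no body forms, but with-http-body must still be
    ;; called so that the response line and headers are sent.
    (with-http-body (req ent :headers '((:location . "/hello"))))))

(publish :path "/old-hello"
         :content-type "text/html"
         :function #'redirect-to-hello)
```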


(get-request-body request)

Return the body of the request as a string.  If there is no body the return value will be an empty string.   The result is cached inside the request object, so this function can be called more than once while processing a request.   The typical reason for there to be a body to a request is when a web browser sends the result of a form with  a POST method.
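A small sketch of a handler that reads the request body and reports on it follows. The handler name echo-body, the url and the package name aserve-sketch are made up for the example.

```lisp
;; Hypothetical package for the sketch; assumes AllegroServe is loaded.
(defpackage :aserve-sketch
  (:use :common-lisp :net.aserve :net.html.generator))
(in-package :aserve-sketch)

(defun echo-body (req ent)
  "Send the body of the request back as plain text."
  ;; An empty string is returned when the request has no body.
  (let ((body (get-request-body req)))
    (with-http-response (req ent)
      (with-http-body (req ent)
        (format (request-reply-stream req)
                "Received ~d characters:~%~a"
                (length body) body)))))

(publish :path "/echo-body"
         :content-type "text/plain"
         :function #'echo-body)
```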


(header-slot-value request header-name)

Return the value given in the request for the given header-name (a keyword symbol). If the header wasn't present in this request then nil will be returned. header-slot-value is a macro that will expand into a fast accessor if the header-name is a constant naming a known header slot.

In older versions of aserve the header-name was a string.

 


(reply-header-slot-value request header-name)

Return the value associated with the header header-name in the reply sent back to the client.  This function is setf'able and this is the preferred way to specify headers and values to be sent with a reply.
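The two header functions can be sketched together: one reads a header from the request and the other sets a header on the reply. The handler name show-agent, the url, the package name aserve-sketch and the choice of the :user-agent and :cache-control headers are illustrative assumptions.

```lisp
;; Hypothetical package for the sketch; assumes AllegroServe is loaded.
(defpackage :aserve-sketch
  (:use :common-lisp :net.aserve :net.html.generator))
(in-package :aserve-sketch)

(defun show-agent (req ent)
  "Report the client's User-Agent header and mark the reply uncacheable."
  ;; nil is returned if the client sent no such header.
  (let ((agent (header-slot-value req :user-agent)))
    (with-http-response (req ent)
      ;; Set a reply header before the headers are sent out.
      (setf (reply-header-slot-value req :cache-control) "no-cache")
      (with-http-body (req ent)
        (html (:html
               (:body "Your browser: "
                      ;; :princ-safe escapes html special characters
                      (:princ-safe (or agent "unknown")))))))))

(publish :path "/agent"
         :content-type "text/html"
         :function #'show-agent)
```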


(request-query request &key uri post external-format)

Decode and return an alist of the query values in the request.   Each item in the alist is a cons where the car is a string giving the name of the argument and the cdr is a string giving the value of the argument.

The query string is in one or both of two places:

  in the uri of the request, following the ?
  in the body of a POST request

request-query will by default look in both locations for the query string and concatenate the results of decoding both query strings.  If you would like it to not check one or both of the locations you can use the :uri and :post keyword arguments.   If uri is true (and true is the default value) then the query string in the uri is checked.  If post is true (and true is the default value) and if the request is a POST then the body of the post form will be decoded for query values.

The external-format is used in the conversion of bytes in the form to characters.  It defaults to the value of *default-aserve-external-format*.

A query is normally a set of names and values. For example
http://foo.com/bar?a=3&b=4 yields the query alist (("a" . "3") ("b" . "4")).
If a name doesn't have an associated value then the value in the alist is the empty string. For example
http://foo.com/bar?a&b=&c=4 yields the query alist (("a" . "") ("b" . "") ("c" . "4")).

(request-query-value key request &key uri post external-format test)

This combines a call to request-query to retrieve the alist of query keys and values, with a call to assoc to search for the specific key, and finally with a call to cdr to return just the value from the assoc list entry.  The test argument is the function to be used to test the given key against the keys in the assoc list. It defaults to #'equal.

If the given key is not present in the query nil is returned.   If the given key is present in the query but doesn't have an associated value then the empty string is returned.
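The difference between a missing key and a key without a value can be seen in the following sketch. The handler name greet, the url, the "name" query key and the package name aserve-sketch are made up for the example.

```lisp
;; Hypothetical package for the sketch; assumes AllegroServe is loaded.
(defpackage :aserve-sketch
  (:use :common-lisp :net.aserve :net.html.generator))
(in-package :aserve-sketch)

(defun greet (req ent)
  "Greet the visitor named by the name query value."
  (let ((name (request-query-value "name" req)))
    (with-http-response (req ent)
      (with-http-body (req ent)
        (html
         (:html
          (:body "Hello, "
                 (:princ-safe
                  (cond ((null name) "stranger")  ; /greet
                        ((equal name "") "you")   ; /greet?name=
                        (t name))))))))))          ; /greet?name=joe

(publish :path "/greet"
         :content-type "text/html"
         :function #'greet)
```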


 

Request Object Readers and Accessors

The request object contains information about the http request being processed and it contains information about the response that is being computed and returned to the requestor.   The following functions access slots of the request object.   Those with names beginning with request-reply- are accessing the slots which hold information about the response to the request.   When a function is listed as an accessor that means that it can be setf'ed as well as used to read the slot value.

 

(request-method request) - reader - a keyword symbol naming the kind of request, typically :get, :put or :post.

(request-uri request) - reader - a uri object describing the request.   If the request contains a "Host:" header line then the value of this header is placed in the uri-host and uri-port slots of this uri object.

(request-protocol request) - reader - a keyword symbol naming the http protocol requested.  It is either :http/0.9, :http/1.0 or :http/1.1.

(request-protocol-string request) - reader - a string naming the http protocol requested. It is either "HTTP/0.9", "HTTP/1.0" or "HTTP/1.1".

(request-socket request) - reader - the socket object through which the request was made and to which the response must be sent.    This object can be used to determine the IP address of the requestor.

(request-wserver request) - reader - the wserver object describing the web server taking this request

(request-raw-request request) - reader -  a string holding the exact request made by the client

 

(request-reply-code request)      - accessor - the value describes the response code and string we will return for this request.   See the value of the argument response in with-http-response for more information.

(request-reply-date request)      - accessor - the date the response will be made (in Lisp's universal time format).  This defaults to the time when the request arrived.

(request-reply-headers request) - accessor - an alist of some of the headers to send out with the reply (other header values are stored in specific slots of the request object). Each entry in the alist is a cons where the car is a keyword symbol holding the header name and the cdr is the value (it is printed using the ~a format directive). Typically request-reply-headers isn't used, instead the headers to be sent are passed as the :headers argument to with-http-body, or (setf reply-header-slot-value) is called.

(request-reply-content-length request)     - accessor -  the value to send as the Content-Length of this response.   This is computed automatically by AllegroServe and thus a user program shouldn't have to set this slot under normal circumstances.

(request-reply-plist request)      - accessor -  this slot holds a property list which AllegroServe uses to store less important information. The user program can use it as well.

(request-reply-strategy request)   - accessor - the strategy is a list of symbols which describe how AllegroServe will build a response stream and will send back a response.  More details will be given about the possible strategies at a future time.

(request-reply-stream request)       - accessor -  This is the stream to be used in user code to send back the body of the response.    This stream must  be used instead of the value of request-socket.
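Several of these readers can be exercised in one small handler. This is only a sketch: the handler name, the url and the package name aserve-sketch are hypothetical, net.uri:uri-path comes from Allegro CL's uri module, and socket:remote-host and socket:ipaddr-to-dotted are assumed to be the Allegro CL socket functions for finding and printing the address of the peer.

```lisp
;; Hypothetical package for the sketch; assumes AllegroServe is loaded.
(defpackage :aserve-sketch
  (:use :common-lisp :net.aserve :net.html.generator))
(in-package :aserve-sketch)

(defun describe-request (req ent)
  "Report a few facts about the request as plain text."
  (with-http-response (req ent)
    (with-http-body (req ent)
      ;; Write to the reply stream, never to the socket itself.
      (let ((out (request-reply-stream req)))
        (format out "method:   ~s~%" (request-method req))
        (format out "protocol: ~s~%" (request-protocol req))
        (format out "path:     ~a~%"
                (net.uri:uri-path (request-uri req)))
        (format out "client:   ~a~%"
                (socket:ipaddr-to-dotted
                 (socket:remote-host (request-socket req))))))))

(publish :path "/describe"
         :content-type "text/plain"
         :function #'describe-request)
```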

 


CGI Program Execution

The Common Gateway Interface (CGI) specification allows web servers to run programs in response to http requests and to send the results of the execution of those programs back to the web client. The CGI program finds information about the request in its environment variables and, in the case of a put or post request, the body of the request is sent to standard input of the program.

CGI is a clumsy and slow protocol for extending the behavior of a web server and is falling out of favor.  However there are legacy CGI applications you may need to call from AllegroServe.   You invoke an external program using the CGI protocol with the run-cgi-program function.

(run-cgi-program req ent program &key path-info path-translated
                                      script-name query-string
                                      auth-type timeout error-output env)

In response to an http request, this runs program which must be a string naming an executable program or script followed optionally by command line arguments to pass to that program.  Before the program is run the environment variables are set according to the CGI protocol.

The timeout argument is how long AllegroServe should wait for a response from the program before giving up.  The default is 200 seconds.  The error-output argument specifies what should be done with data the cgi program sends to its standard error.  This is described in detail below.

The other keyword arguments allow the caller to specify values for the CGI environment variables that can't be computed automatically.  path-info specifies the PATH_INFO environment variable, and similarly for path-translated, script-name, query-string and auth-type.  If query-string is not given and the uri that invoked this request contains a query part then that query part is passed in the QUERY_STRING environment variable.  If script-name is not given then its value defaults to the path of the uri of the request.

If you wish to add or modify the environment variables set for the cgi process you can specify a value for env.  The value of env should be a list of conses, the car of each cons containing the environment variable name (a string) and the cdr of each cons containing the environment variable value (a string).  env is checked after all the standard environment variables are computed and the value given in env will override the value computed automatically.

CGI programs send their result to standard output (file descriptor 1 on Unix).  If they encounter problems they often send informative messages to standard error (file descriptor 2 on Unix).  The error-output argument to run-cgi-program allows the caller to specify what happens to data sent to standard error.  The possible values for error-output are:

nil - The cgi program's standard error is made the same as the Lisp process' standard error.   This standard error may not be the same as the current binding of *standard-error*.
pathname or string - A file with the given name is opened and standard error is directed to that file.
:output - Standard error is directed to the same place as standard output thus the error messages will be mixed into the result of running the cgi program.
symbol or function - The function is run whenever there is data available to be read from standard error.  It must read that data.  It must return a true value if it detected an end of file during the read and nil otherwise.  The function takes three arguments: req ent stream

A typical way of publishing a CGI page is this:

(publish :path "/cgi/myprog"
         :function #'(lambda (req ent) 
                        (run-cgi-program req ent "/server/cgi-bin/myprog")))
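
A variation on the call above, given only as a sketch: it shortens the timeout, sends the program's standard error to a file, and adds one environment variable through env.  The path "/cgi/myprog-logged", the log file name and the variable name MYPROG_MODE are made up for illustration.

```lisp
(publish :path "/cgi/myprog-logged"
         :function
         #'(lambda (req ent)
             (run-cgi-program req ent "/server/cgi-bin/myprog"
                              ;; give up after 30 seconds instead of 200
                              :timeout 30
                              ;; a string names the file that receives
                              ;; the program's standard error
                              :error-output "/tmp/myprog-errors.log"
                              ;; list of (name . value) conses, both strings;
                              ;; these override the computed variables
                              :env '(("MYPROG_MODE" . "test")))))
```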

If you're concerned about capturing the error output then here's an example where we supply  a function to collect all the error output into a string. Once collected we simply print it out here but in a real web server you would want to store it in a log file.

(defun cgierr (req ent)
  (let ((error-buffer (make-array 10
                                  :element-type 'character
                                  :adjustable t
                                  :fill-pointer 0)))
    (net.aserve:run-cgi-program
     req ent
     "aserve/examples/cgitest.sh 4"
     :error-output
     #'(lambda (req ent stream)
         (declare (ignore req ent))
         (let (eof)
           (loop
             (let ((ch (read-char-no-hang stream nil :eof)))

               (if* (null ch) then (return))

               (if* (eq :eof ch)
                  then (setq eof t)
                       (return))

               (vector-push-extend ch error-buffer)))
           eof
           )))

    (format t "error buffer is ~s~%" error-buffer)
    ))

 

 

Note: The ability to run CGI programs from AllegroServe was due to features added in Allegro Common Lisp version 6.1.   This will not work in earlier versions of Allegro CL.


Form Processing

Forms are used on web pages in order to allow the user to send information to the web server.  A form consists of a number of objects, such as text fields, file fields, check boxes and radio buttons.  Each field has a name.  When the user takes a certain action, the form data is encoded and sent to the web server.  There are three ways that data can be sent to the web server.  The method used is determined by the attributes of the <form> tag that defines the form.

Retrieving multipart/form-data information

If you create a form with <form method="post" enctype="multipart/form-data"> then your url handler must do the following to retrieve the value of each field in the form:

  1. Call (get-multipart-header req) to return the MIME headers of the next field.  If this returns nil then there are no more fields to retrieve.  You'll likely want to call parse-multipart-header on the result of get-multipart-header in order to extract the important information from the header.
  2. Create a buffer and call (get-multipart-sequence req buffer) repeatedly to return the next chunk of data.  When there is no more data to read for this field, get-multipart-sequence will return nil.  If you're willing to store the whole multipart data item in a lisp object in memory you can call get-all-multipart-data instead to return the entire item in one Lisp object.
  3. Go back to step 1.

It's important to retrieve all of the data sent with the form, even if that data is just ignored.  This is because there may be another http request following this one and it's important to advance to the beginning of that request so that it is properly recognized.  
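
The steps above can be sketched as the following loop.  This is only an illustration: the function name count-multipart-fields is hypothetical, it assumes the current package uses net.aserve, and it merely counts the bytes of each field where a real handler would store or process them.

```lisp
(defun count-multipart-fields (req)
  "Read every field of a multipart/form-data request.
Returns a list of (header . byte-count) conses, one per field."
  (let ((buffer (make-array 4096 :element-type '(unsigned-byte 8)))
        (fields '()))
    (loop
      ;; step 1: fetch the MIME headers of the next field
      (let ((header (get-multipart-header req))
            (total 0))
        (unless header (return))        ; nil means no more fields
        ;; step 2: read the field's data chunk by chunk
        (loop
          (let ((end (get-multipart-sequence req buffer)))
            (unless end (return))       ; nil means this field is finished
            ;; data was stored from index 0, so end is the number of
            ;; bytes placed in the buffer by this call
            (incf total end)))
        (push (cons header total) fields)))
    ;; step 3 is the outer loop; return the fields in arrival order
    (nreverse fields)))
```

Because the loop runs until get-multipart-header returns nil, all of the form data is consumed even when the caller ignores the result.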

Details on the functions are given next.

 


(get-multipart-header request)

This returns nil or the MIME headers for the next form field in alist form.  If nil is returned then there is no more form data.  See parse-multipart-header for a simple way to extract information from the header.

For an input field such as <input type="text" name="textthing"> the value returned by get-multipart-header would be

((:content-disposition
      (:param "form-data" ("name" . "textthing"))))

For an input field such as <input type="file" name="thefile"> the value returned by get-multipart-header would be something like

((:content-disposition
      (:param "form-data" ("name" . "thefile")
                          ("filename" . "C://down//550mhz.gif")))
 (:content-type "image/gif"))

Note that the filename is expressed in the syntax of the operating system on which the web browser is running.  This syntax may or may not make sense to the Lisp pathname functions of the AllegroServe web server as it may be running on a totally different operating system.

 


(parse-multipart-header header)

This takes the value of get-multipart-header and returns values that describe the important information in the header.

The first value returned is


(get-multipart-sequence request buffer &key start end external-format)

This retrieves the next chunk of data for the current form field and stores it in buffer.    If start is given then it specifies the index in the buffer at which to begin storing the data.  If end is given then it specifies the index just after the last index in which to store data.

The return value is nil if there is no more data to return, otherwise it is the index one after the last  index filled with data in buffer.

The buffer can be a one dimensional array of character or of (unsigned-byte 8).  For the most efficient transfer of data from the browser to AllegroServe, the program should use a 4096 byte (unsigned-byte 8) array.

If the buffer is  a character array then the data is converted from get-multipart-sequence's (unsigned-byte 8) array to characters using the given external-format (which defaults to the value of *default-aserve-external-format*).

get-multipart-sequence may return before filling up the whole buffer, so the program should be sure to make use of the index returned by get-multipart-sequence.

 


(get-all-multipart-data request &key  type size external-format limit)

This retrieves the complete data object following the last multipart header.  It returns it as a lisp object.  If type is :text (the default) then the result is returned as a lisp string.  If type is :binary then the result is returned as an array of element-type (unsigned-byte 8).

size (which defaults to 4096) is the size of the internal buffers used by this function to retrieve the data.  You usually won't need to specify a value for this but if you know the values retrieved are either very small or very large it may make the operation run faster to specify an appropriate size.

external-format is used when type is :text to convert the octet stream into characters.  It defaults to the value of *default-aserve-external-format*.

limit can be given an integer value that specifies the maximum size of data you're willing to retrieve.  By default there is no limit.  This can be dangerous as a user may try to upload a huge data file which will take up so much Lisp heap space that it takes down the server.  If a limit is given and that limit is reached, get-all-multipart-data will continue to read the data from the client until it reaches the end of the data, however it will not save it and will return the symbol :limit to indicate that the data being sent to the server exceeded the limit.  It will return a second value which is the size of the data the client tried to upload to the server.

If your application intends to handle very large amounts of data being uploaded to the server you would be better off using get-multipart-sequence since with that you can write the data buffer by buffer to the disk instead of storing it in the Lisp heap.
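
A short sketch of using the limit argument follows.  The function name read-field-with-limit and the one megabyte limit are made up for illustration, and the code assumes get-multipart-header has already been called for the field being read.

```lisp
(defun read-field-with-limit (req)
  "Return the current field as an (unsigned-byte 8) array,
or nil if the client sent more than the limit."
  (multiple-value-bind (data size)
      (get-all-multipart-data req :type :binary :limit 1000000)
    (if (eq data :limit)
        ;; the data was read and discarded; size is what the client sent
        (progn
          (format t "rejected an upload of ~d bytes~%" size)
          nil)
        data)))
```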


 

In AllegroServe the information sent to the web server as a result of filling out a form is called a query.  We store a query as a list of conses, where the car of the cons is the name (a string) and the cdr of the cons is the value (another string).  When a query is transmitted by the web browser to AllegroServe it is sent as a string using the encoding application/x-www-form-urlencoded.  We provide the following functions to convert between the encoding and the query list:

 

(form-urlencoded-to-query string &key external-format)

Decodes the string and returns the query list.   The default value for external-format is the value of *default-aserve-external-format*.

 

(query-to-form-urlencoded query &key external-format)

Encodes the query and returns a string.   The default value for external-format is the value of *default-aserve-external-format*.

 

Examples:

user(4): (query-to-form-urlencoded '(("first name" . "joe") 
                                     ("last name" . "smith")))
"first+name=joe&last+name=smith"

user(5): (form-urlencoded-to-query "first+name=joe&last+name=smith")
(("first name" . "joe") ("last name" . "smith"))
 
user(6): (query-to-form-urlencoded
            `(("last name" . ,(coerce '(#\hiragana_letter_ta
                                        #\hiragana_letter_na
                                        #\hiragana_letter_ka)
                                      'string)))
              :external-format :euc)
 "last+name=%a4%bf%a4%ca%a4%ab"
user(7): (query-to-form-urlencoded
            `(("last name" . ,(coerce '(#\hiragana_letter_ta
                                        #\hiragana_letter_na
                                        #\hiragana_letter_ka)
                                      'string)))
             :external-format :shiftjis)
 "last+name=%82%bd%82%c8%82%a9"

user(8): (coerce
           (cdr
              (assoc "last name"
                (form-urlencoded-to-query "last+name=%82%bd%82%c8%82%a9"
                                      :external-format :shiftjis)
                :test #'equalp))
           'list)
 (#\hiragana_letter_ta #\hiragana_letter_na #\hiragana_letter_ka)

Authorization

You may want to restrict certain entities to be accessible from only certain machines or people.   You can put the test for authorization in the entity response function using one of the following functions, or you can have the check done automatically by storing an authorizer object in the entity.

 

functions

(get-basic-authorization request)

This function retrieves the Basic authorization information associated with this request, if any.    The two returned values are the name and password, both strings.  If there is no Basic authorization information with this request, nil is returned.

 

(set-basic-authorization request realm)

This adds a header line that requests Basic authorization in the given realm (a string).  This should be called between with-http-response and with-http-body and only for responses of type 401 (i.e. *response-unauthorized*).  The realm is an identifier, unique on this site, for the set of pages for which access should be authorized by a certain name and password.

 

This example manually tests for basic authorization where the name is foo and the password is bar.

(publish :path "/secret"
    :content-type "text/html"
    :function
    #'(lambda (req ent)
        (multiple-value-bind (name password) (get-basic-authorization req)
           (if* (and (equal name "foo") (equal password "bar"))
             then (with-http-response (req ent)
                    (with-http-body (req ent)
                      (html (:head (:title "Secret page"))
                            (:body "You made it to the secret page"))))
             else ; this will cause browser to put up a name/password dialog
                  (with-http-response (req ent :response *response-unauthorized*)
                     (set-basic-authorization req "secretserver")
                     (with-http-body (req ent)))))))

authorizer classes

If an entity has an associated authorizer object, then before that entity's response function is run the authorizer is tested to see if  it will accept or deny the current request.    AllegroServe supplies three   interesting subclasses of authorizer and users are free to add their own subclasses to support their own authorization needs.  

The protocol followed during authorization is this:

  1. an entity object is selected that matches the request
  2. if the entity object's authorizer slot is nil then it is considered authorized.
  3. otherwise the authorize generic function is called, passing it the authorizer object, the http-request object and the entity object
  4. the return value from authorize can be 
    t - meaning this request is authorized to access this entity
    nil - meaning that this request isn't authorized.  The response from AllegroServe will be the standard "failed request" response so the user won't be able to distinguish this response from one that would be received if the entity didn't exist at all.
    :deny - a denied request response will be returned.   It will not use the 401 return code so this will not cause a password box to be displayed by the browser.
    :done - the request is denied, and a response has already been sent to the requestor by the authorize function so no further response should be made.
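
As an illustration of the protocol above, here is a sketch of a user-defined authorizer subclass.  The class name office-hours-authorizer and its slots are hypothetical, and the code assumes that the names authorizer, authorize, http-request and entity are accessible in the current package.

```lisp
(defclass office-hours-authorizer (authorizer)
  ((open-hour  :initarg :open-hour  :initform 9  :reader open-hour)
   (close-hour :initarg :close-hour :initform 17 :reader close-hour)))

(defmethod authorize ((auth office-hours-authorizer)
                      (req http-request)
                      (ent entity))
  ;; decode-universal-time returns second, minute, hour, ...
  ;; so the value at index 2 is the current hour (0-23)
  (let ((hour (nth-value 2 (decode-universal-time (get-universal-time)))))
    (if (and (>= hour (open-hour auth))
             (< hour (close-hour auth)))
        t        ; authorized
        :deny))) ; send a denied request response, no password dialog
```

An instance would be passed as the :authorizer argument to publish, for example (make-instance 'office-hours-authorizer :open-hour 8 :close-hour 18).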

 

password-authorizer  [class]

This subclass of authorizer is useful if you want to protect an entity using the basic authorization scheme that asks for a name and a password.     When you create this class of object you should supply values for the two slots:

Slot Name    initarg     what
allowed      :allowed    list of conses, each cons having the form ("name" . "password") where any of the listed name password pairs will allow access to this page.
realm        :realm      A string which names the protection space for the given name and password.   The realm will appear in the dialog box the browser displays when asking for a name and password.

An example of its use is the following where we allow access only if the user enters a name of joe and a password of eoj or a name of fred and a password of derf.

  (publish :path "/foo"
    :content-type "text/html"
    :authorizer (make-instance 'password-authorizer
                       :allowed '(("joe" . "eoj")
                                  ("fred" . "derf"))
                       :realm "SecretAuth")

    :function
    #'(lambda (req ent)
        (with-http-response (req ent)
           (with-http-body (req ent)
              (html (:head (:title "Secret page"))
                    (:body "You made it to the secret page"))))))

 

location-authorizer [class]

This authorizer class checks the IP address of the request to see if it is permitted access to the entity.  The  authorizer can specify a sequence of  patterns and for each pattern a command of :accept (permit the access) or :deny (forbid the access).    The first pattern that matches determines if the request is accepted or denied.  If the pattern list is empty or if no pattern matches, then the request is accepted. 

The single slot of an object of class location-authorizer is

Slot Name    initarg      what
patterns     :patterns    a list of patterns and commands, where the syntax of a pattern-command is described below.

A pattern can be

This example of using a location-authorizer permits only connections coming in via the loopback network (which occurs if you specify http://localhost/whatever) or connections that come from one particular machine (tiger.franz.com).  Note that we end the pattern list with :deny so that anything not matching the preceding patterns will be denied.

(publish :path "/local-secret-auth"
    :content-type "text/html"
    :authorizer (make-instance 'location-authorizer
                         :patterns '((:accept "127.0" 8)
                                     (:accept "tiger.franz.com")
                                     :deny))

    :function
    #'(lambda (req ent)
        (with-http-response (req ent)
           (with-http-body (req ent)
               (html (:head (:title "Secret page"))
                     (:body (:b "Congratulations. ")
                       "You made it to the secret page"))))))

function-authorizer   [class]

This authorizer contains a function provided by the user which is used to test if the request is authorized.  The function takes three arguments: the http-request object, the entity and the authorizer object.  It must return one of the four values that the authorize function returns, namely t, nil, :deny or :done.

A function-authorizer is created as follows

(make-instance 'function-authorizer
    :function #'(lambda (req ent auth)
                          t  ; always authorize
                 ))

The function slot can be set using (setf function-authorizer-function) if you wish to change it after the authorizer has been created.

 

Cookies

Cookies are name value pairs that a web server can direct a web browser to save and then pass back to the web server under certain circumstances.   Some users configure their web browsers to reject cookies, thus you are advised against building a site that depends on cookies to work.

Each cookie has these components:

  1. name - a string.   Since you can get multiple cookies sent to you by a web browser, using a unique name will allow you to distinguish the values.
  2. value - a string
  3. path - a string which must be the prefix of the request from the web browser for this cookie to be sent.  The string "/" is the prefix of all requests.
  4. domain - a string which must be the suffix of the name of the machine where the request is being sent in order for this cookie to be sent.
  5. expiration - a time when this cookie expires.
  6. secure - either true or false.  If true then this cookie will only be sent if the connection is through a secure socket.

 

(set-cookie-header request &key name value expires domain path secure encode-value external-format)

This function should be called between the calls to with-http-response and with-http-body.   It can be called more than once.  Each call will cause one Set-Cookie directive to be sent to the web browser.     The name and value arguments should be given (and they should be strings).  They will be automatically encoded using the same encoding used in urls (we call it uriencoding). The purpose of this encoding is to convert characters that are either unprintable or those that have a special meaning into a printable string.    The web browser doesn't care about the name and value, it just stores them and sends them back to the web server.     If you use the get-cookie-values function to retrieve the cookie name and value pairs, then it will automatically decode the uriencoding.

You can disable the encoding of the value by specifying a nil value to encode-value.    This should only be necessary if you are working with buggy http client applications.

If the path argument isn't given, it will default to "/" which will allow this cookie to match all requests.
If the domain argument isn't given then it will default to the host to which this request was sent.  If you wish to specify this you are only allowed to specify a subsequence of the host to which this request was sent (i.e. the name of the machine running the webserver).   The domain should have at least two periods in it (e.g. ".foo.com").
expires can be a lisp universal time or it can be the symbol :never meaning this should never expire.  If expires isn't given or is nil then this cookie will expire when the user quits their web browser.
secure should be true or false.  Any non-nil value is interpreted as true. The default value is false.
The external-format is used to convert bytes to characters.   It defaults to the value of *default-aserve-external-format*.

 

(get-cookie-values request &key external-format)

Return the cookie name and value pairs from the header of the request.  Each name value pair will be in a cons whose car is the name and whose cdr is the value.  The names and values will be decoded (in other words the encoding done by set-cookie-header will be undone).  The external-format is used to convert bytes to characters.  It defaults to the value of *default-aserve-external-format*.
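
The two cookie functions can be combined as in the following sketch, which greets returning visitors.  The path "/visit", the cookie name "visitor" and its value are made up for illustration, and the html macro is assumed to be available as in the other examples in this document.

```lisp
(publish :path "/visit"
         :content-type "text/html"
         :function
         #'(lambda (req ent)
             ;; the names are strings, so compare them with equal
             (let ((seen (cdr (assoc "visitor" (get-cookie-values req)
                                     :test #'equal))))
               (with-http-response (req ent)
                 ;; cookies must be set before with-http-body
                 (unless seen
                   (set-cookie-header req
                                      :name "visitor"
                                      :value "returning"
                                      :path "/"
                                      :expires :never))
                 (with-http-body (req ent)
                   (html (:body (:princ-safe
                                 (if seen "Welcome back" "Welcome")))))))))
```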

 


Variables

These special variables contain  information about AllegroServe or help control AllegroServe:

*aserve-version* - a list of three values: (major-version minor-version sub-minor-version) which is usually printed with periods separating the values (i.e. X.Y.Z).

*default-aserve-external-format* - a symbol or external format object which is the default value for those AllegroServe functions that take an external-format argument.   http requests are normally run in separate lisp threads and those threads bind *default-aserve-external-format* to the value of the external-format argument to the start function.   Thus changing the value of *default-aserve-external-format* in one thread will not affect its value in other threads.   You should decide the default external format before you start AllegroServe running.

*http-response-timeout* - the default value for the timeout argument to with-http-response.   [in future versions of AllegroServe we'll treat this value like *default-aserve-external-format* and bind it in each worker thread]

*mime-types* - a hash table where the keys are the file types (e.g. "jpg") and the values are the MIME types (e.g. "image/jpeg").
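
Since *mime-types* is an ordinary hash table, it can be examined and extended with the standard hash table functions.  The sketch below assumes the keys are plain strings without a leading period, as in the "jpg" example above; the "svg" entry is only an illustration.

```lisp
;; look up the MIME type registered for a file type
(gethash "jpg" *mime-types*)

;; register a MIME type for an additional file type
(setf (gethash "svg" *mime-types*) "image/svg+xml")
```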

 


AllegroServe request processing protocol

We'll describe here the steps AllegroServe goes through from the time it receives a request until a response to that request has been sent back to the browser.    We want the protocol to be open so that users can extend AllegroServe's behavior to suit their needs.  However given that AllegroServe is a new program and will be undergoing extensive review from its users, we expect that the protocol will change.   It shouldn't lose any of its current extensibility but the names and argument lists of generic functions may change. 

When a client connects to the port on which AllegroServe is listening, AllegroServe passes that connected socket to a free worker thread which then wakes up and calls the internal function net.aserve::process-connection.   If there are no free worker threads then AllegroServe waits for one to be available.

In each worker thread the variable *wserver* is bound to the wserver object that holds all the information about the webserver on which the connection was made (remember that one AllegroServe process can be running more than one webserver).   process-connection reads the request from the socket (but doesn't read past the header lines).     If the request can't be read within *read-request-timeout* seconds (currently 20) then the request is rejected.    The request is stored in an object of class http-request.    Next process-connection calls handle-request to do all the work of the request and then log-request to log the action of the request.  Finally if the response to the request indicated that the connection was to be kept open rather than being closed after the response, then process-connection loops back to the top to read the next request.

 

(handle-request (req http-request))    [generic function]

This generic function must locate the entity to handle this request and then cause it to respond to the request.  If there is no matching entity then handle-request must send a response back to the client itself.  handle-request uses locators to find the entity (more on this below), and then if an entity is found and that entity has an authorizer, it calls authorize to see if this request is allowed to access the selected entity.  If the entity passes the authorization then process-entity is called to cause the entity to respond to the request.  process-entity returns true if it processed the entity, and nil if it did not, in which case the search continues for an entity.  If there is no entity to respond then failed-request is called to send back a failure message.

A locator is an object used to map requests into entities.    The value of (wserver-locators *wserver*) is a list of locator objects.   handle-request calls

(standard-locator (req http-request) (loc locator)) [generic function]

on each successive locator in that list until one returns an entity object.  AllegroServe has two built-in locator classes, locator-exact and locator-prefix, that are subclasses of locator.  When you call publish or publish-file you are adding the entity to the locator of class locator-exact found in the wserver-locators list.  When you call publish-directory you are adding to the locator of class locator-prefix.  Users are free to define new locator classes.  Locators should define the standard-locator method as well as

(unpublish-locator (loc locator))    [generic  function]

which if called should remove all published entities from the locator.

 

Let's return to handle-request.  It has called standard-locator and found an entity.   Next it checks to see if the entity has an authorizer value and if so calls

(authorize (auth authorizer) (req http-request) (ent entity))   [generic function]

The return value will be one of

If there is no authorizer for this entity then we just call process-entity.    If there is no entity, then we call failed-request.

 

(failed-request (req http-request))    [generic function]

Sends back a response to the effect that the requested url doesn't exist on this server.

 

(denied-request (req http-request))   [generic function]

Sends back a response to the effect that access to the requested url was denied.

 

(process-entity  (req http-request) (ent entity))    [generic function]

Send back a response appropriate to the given entity.     The macros with-http-response and with-http-body should be used in the code that sends the response.

 

 


Client functions

AllegroServe has a set of functions that perform http client-side actions.   These functions are useful in generating computed pages that reflect the contents of other pages.  We also use the client-side http functions to test AllegroServe.

The client-side functions described in this section are exported from the net.aserve.client package.

The function do-http-request sends a request and retrieves the whole response.    This is the most convenient function to use to retrieve a web page.

If you need more control over the process you can use the functions: make-http-client-request, read-client-response-headers and client-request-read-sequence.

 

(do-http-request uri &key method protocol accept
                          content content-type query format cookies
                          redirect redirect-methods basic-authorization
                          keep-alive headers proxy user-agent external-format ssl
                          skip-body)

Sends a request to uri and returns four values:

  1. The body of the response.  If there is no body the empty string is returned.
  2. the response code (for example, 200, meaning that the request succeeded)
  3. an alist of headers where the car of each entry is a lowercase string with the header name and the cdr is a string with the value of that header item.
  4. the uri object denoting the page accessed.  This is normally computed from the uri value passed in but if redirection was done then this reflects the target of the redirection.  If you plan to interpret relative html links in the body returned then you must do so with respect to this uri value
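
A minimal sketch of calling do-http-request and using its return values follows.  The function name fetch-page is hypothetical, and treating any code other than 200 as an error is a simplification chosen for this example.

```lisp
(defun fetch-page (url)
  "Return the body of URL and the uri object actually accessed.
Signals an error unless the response code is 200."
  (multiple-value-bind (body code headers uri)
      (net.aserve.client:do-http-request url :method :get)
    (declare (ignorable headers))
    (if (eql code 200)
        ;; uri reflects any redirection that was followed
        (values body uri)
        (error "request for ~a failed with response code ~a" url code))))
```

For example, (fetch-page "http://localhost:8000/") would retrieve the top page of a server running on the local machine, assuming one is listening on port 8000.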

The uri can be a uri object or a string.   The scheme of the uri must be nil or "http".   The keyword arguments to do-http-request are

Name default description
method :get The type of request to make.  Other possible values are :put, :post and :head.  :head is useful if you just want to see if the link works without downloading the data.
protocol :http/1.1 The other possible value is :http/1.0.  Modern web servers will return the response body in chunks if told to use the :http/1.1 protocol.  Buggy web servers may do chunking incorrectly (even Apache has bugs in this regard but we've worked around them).  If you have trouble talking to a web server you should try specifying the :http/1.0 protocol to see if that works.
accept "*/*" A string listing of MIME types that are acceptable as a response to this request.  The type listed can be simple such as "text/html" or more complex like "text/html, audio/*"  The default is to accept anything which is expressed "*/*".
content nil If the method is :put or :post then the request should include something to be sent to the web server.   The value of this argument is either a string or a vector of type (unsigned-byte 8) which will be sent to the web server.   It may also be a list of strings or vectors. See the query argument for a more convenient way to :post data to a form.
content-type nil A string which is to be the value of the Content-Type header field, describing the format of the value of the content argument.    This is only needed for :put and :post requests.
query nil This is a query alist of the form suitable for query-to-form-urlencoded.   If the method is a :get then the value of  this argument is urlencoded and made the query string of the uri being accessed.  If the method is :post then the query string is urlencoded and made the content of the request.  Also the content-type is set to application/x-www-form-urlencoded.
format :text The body of the response is returned as a string if the value is :text or as an array of type (unsigned-byte 8) if the value is :binary.    When the body is a string the external-format argument is important.
cookies nil If you wish the request to include applicable cookies and for returned cookies to be saved, then a cookie-jar object should be passed as the value of this argument.
redirect 5 If the response is a redirect (code 301, 302, 303), and the method is one given by the value of redirect-methods then if this argument is true (and, if an integer, positive), do-http-request will call itself to access the page to which the redirection is pointed.  If redirect is an integer then in the recursive call the value passed for redirect will be one less than the current value.  This prevents infinite recursion due to redirection loops.
redirect-methods (:get :head) List of http methods which will be redirected if redirect is true.
basic-authorization nil If given, it is a cons whose car is the name and whose cdr is the password to be used to get authorization to access this page.
keep-alive nil If true then the web server will be told to keep the connection alive.    Since do-http-request closes the connection after the request this option currently does no more than allow us to experiment with how a web server responds to a keep-alive request.
headers nil an alist of conses ("header-name" . "header-value") for additional headers to send with the request.
proxy nil the name and optionally the port number of a proxying web server through which this request should be made.   The form is of the argument is "www.machine.com" or "www.machine.com:8000" if the web server is listening on port 8000 rather than 80.   Proxying web servers are often used when clients are behind firewalls that prevent direct access to the internet.   Another use is to centralize the page cache for a group of clients.
user-agent nil If given it specifies the value of the User-Agent header to be sent with the request.  Some sites respond differently based on the user-agent they believe has made the request.  The lack of a User-Agent header may cause a server to ignore a request since it believes that it is being probed by a robot.  The value of user-agent can be a string or one of the keywords :aserve, :netscape or :ie in which case an appropriate user agent string is sent.
external-format the value of *default-aserve-external-format* This determines the socket stream's external format.
ssl nil If true then the connection is made using the Secure Sockets Layer protocol.    If the uri uses the https scheme then ssl is assumed to be true and the ssl argument need not be specified.
skip-body nil If the value is a fucntion (satisifies functionp) then the value is funcalled passing the client-request object as an argument.   At this point the client-request object contains the information on the headers of the response.   The function should return true if the body of the response should be skipped and nil returned as the first value from do-http-request.  If skip-body is not a function then if its value is true then reading the body is skipped and nil returned in its place.

 

For example:

user(5): (do-http-request "http://www.franz.com")
"<HTML>
    <HEAD>
        <TITLE>Franz Inc: Allegro Common Lisp and Common Lisp Products</TITLE>
        <BASE FONTFACE=\"helvetica, arial\" FONTSIZE=\"1\">
.....
"
200
(("content-type" . "text/html") ("transfer-encoding" . "chunked")
("server" . "Apache/1.3.9 (Unix) PHP/3.0.14")
("date" . "Mon, 24 Apr 2000 11:00:51 GMT"))

 

It's easy to use do-http-request to fill in form objects on a page.   If the form has input elements named  width and height then you can send a request that specifies that information in this way:

(do-http-request "http://www.foo.com/myform.html" 
                 :query '(("width" . 23) ("height" . 45)))

The above assumes that the method on the form is "GET".   If the method is "POST" then a similar call will work:

(do-http-request "http://www.foo.com/myform.html"  :method :post
                 :query '(("width" . 23) ("height" . 45)))
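
The keyword arguments described above can be combined. As a hedged sketch, the call below asks for the body as octets, limits redirection, and uses a skip-body function so that the body is only read for a 200 response. The URL is made up for illustration, and the code assumes that the client symbols are exported from the net.aserve.client package and that header names are returned as lower case strings, as in the example output above.

```lisp
;; Sketch only: the URL is hypothetical.
(multiple-value-bind (body code headers)
    (net.aserve.client:do-http-request
     "http://www.foo.com/images/logo.gif"
     :format :binary     ; return the body as (unsigned-byte 8) data
     :redirect 2         ; follow at most two redirects
     ;; Called once the response headers have been read.  A true
     ;; return value means "skip the body", so BODY is then nil.
     :skip-body #'(lambda (creq)
                    (/= 200
                        (net.aserve.client:client-request-response-code
                         creq))))
  (if body
      (format t "~&read ~d octets, content-type ~s~%"
              (length body)
              (cdr (assoc "content-type" headers :test #'equal)))
      (format t "~&body skipped, response code ~d~%" code)))
```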


       

 

Before we describe the lower level client request functions we will describe two classes of objects used in that interface.

client-request

A client-request object includes the information about the request and the response.

The public fields of a client-request that are filled in after a call to make-http-client-request are:

Accessor - Description

client-request-uri - uri object corresponding to this request.

client-request-socket - socket object open to the web server denoted by the uri.

client-request-cookies - the cookie-jar object (if any) passed in with this request.

 

After read-client-response-headers is called, the following fields of the client-request objects are set:

Accessor - Description

client-request-response-code - the integer that is the response code for this request. The most common codes are 200 for Success and 404 for Not Found.

client-request-headers - an alist of header values in the response. Each entry is a cons of the form ("header-name" . "header-value"). The header names are all lower case.

client-request-protocol - a keyword symbol naming the protocol that the web server returned (which may be different from the protocol given in the request). A typical return value is :http/1.1.

client-request-response-comment - a string giving a textual version of the response code. The string is arbitrary and you should not depend on all web servers returning the same string for any given response code.

 

cookie-jar

A cookie-jar is a repository for cookies. Cookies are stored in a jar when a response from a client request includes Set-Cookie headers. Cookies from a jar are sent along with a request when they are applicable to the given request. We won't describe the rules for cookie applicability here. You need only know that if you use our client functions to access a site that uses cookies to implement persistence, then you should create a cookie-jar object and pass that same object in with each request. More information on cookies can be found here.

A cookie-jar is created with (make-instance 'cookie-jar).

 

(cookie-jar-items  cookie-jar)

returns an alist of the cookies in the jar where each item has the form:

(hostname cookie-item ...)

The hostname is a string which is matched against the suffix of the name of the host in the request (that is, a hostname of ".foo.com" matches "a.foo.com" and "b.foo.com"). The hostname should have at least two periods in it. The cookie-item objects that follow the hostname in the list all apply to that hostname. A cookie-item is a defstruct object and has these fields:

Accessor - Description

cookie-item-path - A string that must be the prefix of the path of the request for it to match. The prefix "/" matches all paths.

cookie-item-name - The name of the cookie. A string.

cookie-item-value - The value of the cookie. A string.

cookie-item-expires - A string holding the time the cookie expires [in a future release we may make this a universal time].

cookie-item-secure - True if this cookie should only be sent over a secure connection.
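
As an illustration of how a jar is typically used, here is a minimal sketch that passes one cookie-jar to two requests and then prints what the jar holds. The site, the form field names and the package name cookie-example are hypothetical, and the code assumes that the accessors documented above are exported from net.aserve.client.

```lisp
(defpackage :cookie-example
  (:use :common-lisp :net.aserve.client))
(in-package :cookie-example)

(defun fetch-with-cookies ()
  "Log in, fetch a second page with the same jar, then list the cookies."
  (let ((jar (make-instance 'cookie-jar)))
    ;; First request: any Set-Cookie headers in the response are
    ;; stored in JAR.
    (do-http-request "http://www.foo.com/login"
      :method :post
      :query '(("user" . "joe") ("password" . "secret"))
      :cookies jar)
    ;; Second request: cookies in JAR that apply to this host and
    ;; path are sent along with the request.
    (do-http-request "http://www.foo.com/account" :cookies jar)
    ;; Each entry of the alist is (hostname cookie-item ...).
    (dolist (entry (cookie-jar-items jar))
      (destructuring-bind (hostname &rest items) entry
        (dolist (item items)
          (format t "~&~a ~a ~a=~a~%"
                  hostname
                  (cookie-item-path item)
                  (cookie-item-name item)
                  (cookie-item-value item)))))
    jar))
```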

 

 

(make-http-client-request uri &key method protocol keep-alive
                                   accept cookies headers proxy
                                   basic-authorization query
                                   content content-type content-length
                                   user-agent external-format ssl)

This function connects to the web server indicated by the uri and sends the request.   The arguments are the same as those for do-http-request and are documented there.   There is one additional argument: content-length.    This argument can be used to set the content-length header value in the request.  After setting the content-length the caller of make-http-client-request would then be responsible for sending that many bytes of data to the socket to serve as the body of the request.   If content-length is given, then a value for content should not be given.

If  make-http-client-request succeeds in contacting the web server and sending a request, a client-request object is returned.    If make-http-client-request fails, then an error is signalled.

The returned client-request object contains an open socket to a web server, thus you must ensure that the client-request object isn't discarded before client-request-close is called on it to close the socket and reclaim that resource.

After calling make-http-client-request the program will send the body of the request (if any), and then it will call read-client-response-headers to partially read the web server's response to the request.

The default value for external-format is the value of *default-aserve-external-format*

 

 

(read-client-response-headers client-request)

This function reads the response code and response headers from the web server. After the function returns the program can use the client-request accessors noted above to read the web server's response. The body of the response (if any) has not been read at this point. You should use client-request-read-sequence to read the body of the response.

 

 

(client-request-read-sequence buffer client-request
                              &key start end)

This fills the buffer with the body of the response from the web server.   The buffer should either be a character array or an array of (unsigned-byte 8).    If given, start specifies the index of the first element in the buffer in which to store, and end is one plus the index of the last element in which to store. 

The return value is one plus the last index in the buffer filled by this function. The caller of the function must be prepared for having the buffer only partially filled.   If the return value is zero then it indicates an End of File condition.

 

(client-request-close client-request)

The client-request object returned by make-http-client-request is closed. This returns the resources used by this connection to the operating system.
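
Taken together, the lower level functions are used in the order shown in the following sketch: make the request, read the response headers, read the body in pieces until zero is returned, and always close the client-request. The function name fetch-body-octets, the package name and the buffer size are illustrative choices, and the code assumes the client symbols are exported from net.aserve.client.

```lisp
(defpackage :lowlevel-client-example
  (:use :common-lisp :net.aserve.client))
(in-package :lowlevel-client-example)

(defun fetch-body-octets (url)
  "Return a list of octet vectors holding the body, the response code
and the response headers."
  (let ((creq (make-http-client-request url)))
    ;; unwind-protect guarantees that the socket is released even if
    ;; reading the response signals an error.
    (unwind-protect
        (progn
          (read-client-response-headers creq)
          (let ((buffer (make-array 4096
                                    :element-type '(unsigned-byte 8)))
                (pieces '()))
            (loop
              ;; END is one plus the last index filled.  The buffer
              ;; may be only partially filled; zero means end of file.
              (let ((end (client-request-read-sequence buffer creq)))
                (when (zerop end) (return))
                (push (subseq buffer 0 end) pieces)))
            (values (nreverse pieces)
                    (client-request-response-code creq)
                    (client-request-headers creq))))
      (client-request-close creq))))
```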

 

(uriencode-string  string &key external-format)

Convert the string into a format that would be safe to use as a component of a url. In this conversion most printing characters are not changed. All non-printing characters and printing characters that could be confused with characters that separate fields in a url are encoded as %xy where xy is the hexadecimal representation of the char-code of the character.
external-format defaults to the value of *default-aserve-external-format*.
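
To make the %xy encoding concrete, here is a small sketch of the idea in portable Common Lisp. It is not the AllegroServe implementation: the function name percent-encode-sketch and its set of unchanged characters are assumptions made for illustration, and unlike uriencode-string it ignores external formats, so it is only meaningful for characters whose char-code is below 256.

```lisp
(defun percent-encode-sketch (string)
  "Return STRING with each character outside a small safe set replaced
by %xy, where xy is the two digit hexadecimal char-code."
  (with-output-to-string (out)
    (loop for ch across string
          for code = (char-code ch)
          do (if (and (< code 128)
                      (or (alphanumericp ch) (find ch "-_.")))
                 ;; safe character: copy it unchanged
                 (write-char ch out)
                 ;; anything else: percent sign, then two hex digits
                 (format out "%~(~2,'0x~)" code)))))
```

With this sketch (percent-encode-sketch "a b&c") returns "a%20b%26c", since the space has char-code 32 (hex 20) and the ampersand has char-code 38 (hex 26).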


Proxy

AllegroServe can serve as an http proxy.   What this means is that web clients can ask AllegroServe to fetch a URL for them.   The two primary uses for a proxy server are

  1. You have web clients on a local network and you would prefer that the web clients don't send messages out to the internet. You run AllegroServe on a machine that has access both to the internal network and to the internet. You then configure the web clients to proxy through AllegroServe (directions for doing this are given below).
  2. You wish to use AllegroServe's caching facility to store copies of pages locally to improve responsiveness.  In this case you must start AllegroServe as a proxy server for the web clients who will use the cache.

In order to run AllegroServe as a proxy server you should specify :proxy t in the arguments to the net.aserve:start function.   With this specified AllegroServe will still act as a web server for pages on the machine on which AllegroServe is running.  AllegroServe will act as a proxy for requests to other machines.
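
For instance, the following sketch starts a combined web server and proxy and then sends a client request through it using the proxy argument of do-http-request. The port number 8000 and the use of localhost are arbitrary choices for illustration.

```lisp
;; Start AllegroServe as a web server that also proxies requests
;; for other machines.  Port 8000 is an arbitrary choice.
(net.aserve:start :port 8000 :proxy t)

;; From Lisp, route a client request through that proxy.
(net.aserve.client:do-http-request "http://www.franz.com/"
                                   :proxy "localhost:8000")
```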

Each web browser has its own way of specifying which proxy server it should use. For Netscape version 4 select the Edit menu, then Preferences... and then click on the plus sign to the left of Advanced. Then select Proxies, click on Manual Proxy Configuration, and then click on View and specify the name of the machine running AllegroServe and the port number on which AllegroServe is listening. Then click OK on all the dialog boxes.

For Internet Explorer 5 select the Tools menu, and then Internet Options.. and then the Connections tab, and then LAN Settings.   Click on Use a Proxy Server and then click on Advanced and specify the machine name and port number for AllegroServe.  Then click on OK to dismiss the dialog windows.

 


Cache

The AllegroServe cache is a facility in development. We'll describe here the current status of the code.

The cache consists of a memory cache and a set of zero or more disk caches.      Items initially live in the memory cache and are moved to the disk caches when the memory cache fills up.   Items enter the memory cache due to a page being accessed via the proxy server.   Items in the disk cache move back to the memory cache if the data portion must be sent back to the requesting client (some requests can be answered without sending back the contents of the page and for these the item stays in the disk cache).

You specify the sizes of each cache.   The disk caches will never grow beyond the size you specified but the memory cache can exceed the specified size for a short time.  A background thread moves items from the memory cache to the disk caches and we will allow you to control how often that thread wakes up and ensures that the memory cache is within the desired constraints.

When net.aserve:start is called you specify if you want caching and if so what size caches you want.   A sample argument pair passed to net.aserve:start is

:cache '(:memory 10000000 :disk ("/tmp/mycache" 30000000) :disk (nil 20000000))

This says that the memory cache should be 10,000,000 bytes and that there should be two disk caches. One disk cache is the file "/tmp/mycache" and can grow to 30,000,000 bytes and the other cache will have a name chosen by AllegroServe and it can grow to 20,000,000 bytes. We should note here that one thing that distinguishes the AllegroServe caching facility from that found in many other http proxy-caches is that AllegroServe uses a few large cache files rather than storing each cached item in a separate file in the filesystem.

A few other ways of specifying caching at startup are:

:cache t

This will create a memory cache of the default size (currently 10 megabytes) and it will create no disk caches.

:cache 20000000

This will create a memory cache of 20,000,000 bytes and no disk caches.

 

When caching is enabled we publish two links to pages showing cache information.    This is useful during debugging and is likely to change in the future.   The two pages are  /cache-stats  and  /cache-entries.
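
As a sketch of a complete startup call, the form below enables proxying together with a memory cache and one disk cache, and then fetches the statistics page from Lisp. The port number, the cache file name and the sizes are illustrative values only.

```lisp
;; Proxying must be enabled for pages to enter the cache.
(net.aserve:start :port 8000
                  :proxy t
                  :cache '(:memory 10000000
                           :disk ("/tmp/mycache" 30000000)))

;; Look at the cache statistics page published by the server.
(net.aserve.client:do-http-request "http://localhost:8000/cache-stats")
```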

 


Request Filters

After AllegroServe reads a request and before it checks the locators to find an entity to handle the request, AllegroServe runs the request through a set of filters.

A filter is a function of one argument: the http-request object. The filter examines and possibly alters the request object. The idea is that filters can do large scale and simple url rewriting, such as changing all requests for one machine to another machine. The filtering occurs before the test to see if this is a proxy request, so a filter can change a proxy request to a non-proxy request or vice versa.

The currently active filters are found in two places.  First the vhost-filters function of the applicable vhost returns a set of vhost specific filters.   Next the wserver-filters function on the current wserver object returns a set of server global filters.     Both of these functions are setf'able to change the set of filters.

 

A filter function returns :done if no more filters should be run after this one. If the filter returns anything else then subsequent filters in the list are run as well. If a filter in the vhost list returns :done then the server global filters are not even checked.

When a filter function runs it's most likely going to be looking at two slots in the request object, which are accessed via the functions request-raw-uri and request-uri.

Also the value of (header-slot-value request :host) is important to check and possibly change.

If the browser is setup to access the internet directly then a request from the user for
    http://foo.bar.com:23/whatever

will cause the request to be sent to the server at foo.bar.com port 23 and the request will have:

  1. the request-raw-uri is /whatever
  2. the request-uri is http://foo.bar.com:23/whatever
  3. the Host header value is "foo.bar.com:23"




If the browser is setup to send all requests through a proxy at proxy.blop.com then a request for
http://foo.bar.com:23/whatever
will come to proxy.blop.com and will have a different raw uri:

  1. the request-raw-uri is now http://foo.bar.com:23/whatever
  2. the request-uri is still http://foo.bar.com:23/whatever
  3. the Host header value is still "foo.bar.com:23"

If the filter wants to alter the destination of a request it should ensure that the three values mentioned above are set appropriately for the destination. If the new destination is not served by the current AllegroServe wserver, then the filter will have to make sure to turn it into a proxy request (and this will only work if this AllegroServe was started with proxying enabled).
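
Before writing a filter that rewrites requests it can help to see what the three values look like for real traffic. The following sketch defines a filter that only prints them and then installs it as a server global filter. The names filter-example and log-target-filter are made up for illustration, and the code assumes that request-raw-uri, request-uri, wserver-filters and *wserver* are exported from net.aserve.

```lisp
(defpackage :filter-example
  (:use :common-lisp :net.aserve))
(in-package :filter-example)

(defun log-target-filter (req)
  "Print the values a rewriting filter would need to inspect."
  (format *trace-output* "~&raw-uri=~a uri=~a host=~s~%"
          (request-raw-uri req)
          (request-uri req)
          (header-slot-value req :host))
  ;; Returning anything other than :done lets the remaining
  ;; filters run as usual.
  nil)

;; Install it in front of the existing server global filters.
(push #'log-target-filter (wserver-filters *wserver*))
```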

 


Virtual Hosts

It is possible for a single web server to act like two or more independent web servers. This is known as virtual hosting. AllegroServe supports the ability to run any number of virtual hosts in a single instance of AllegroServe.

AllegroServe runs on a single machine and listens for requests on one port on one or more IP addresses. When a request arrives there is usually a header line labelled Host whose value is the specific hostname typed into the browser by the user. Thus if hostnames www.foo.com and www.bar.com both point to the same machine then it's possible for the web server on that machine to distinguish a request for http://www.foo.com from a request for http://www.bar.com by looking at the Host header.

In order to make AllegroServe easy to use you can ignore the virtual hosting facility until you plan to use it.   As long as you don't specify a :host argument to any of the publish functions when adding content to your site, everything you publish will be visible from your web server no matter which hostname the web browser uses to access your site.  If you decide you want to make use of virtual hosting, then read on.

vhost class

In AllegroServe a virtual host is denoted by an instance of class vhost. The contents of a vhost object are:

Accessor Function (initarg) - What

vhost-log-stream (:log-stream) - Stream to which to write logging information on requests to this virtual host.

vhost-error-stream (:error-stream) - Unused by AllegroServe but applications may wish to store here a stream to which to write a log of errors.

vhost-names (:names) - A list of all the names for this virtual host.

vhost-filters (:filters) - List of filter functions.

The default value for both streams in a vhost object is the wserver-log-stream from the server object.

Every instance of AllegroServe has a default vhost object that can be retrieved from the wserver object via the function wserver-default-vhost.    If a request comes in for a virtual host that's not known, then it's assumed to be for the default virtual host.

There are two ways to create virtual hosts in AllegroServe: implicitly or explicitly.    If a publish function is called with a :host value that names a host not known to be a virtual host then a vhost instance will be created automatically and stored in the wserver's hash table that maps names to vhost objects.  This is implicit virtual host creation.

If you know ahead of time the virtual hosts you'll be serving then it's better to set up all the virtual hosts explicitly. You create a vhost instance with make-instance and you register each virtual host in the wserver-vhosts table using (setf gethash). Following is an example of setting up a server to have two virtual hosts, one that responds to three names and one that responds to two names. Since we are using the default vhost to represent the first virtual host, this virtual host will also receive requests for names we haven't mentioned explicitly.

 

(defun setup-virtual-hosts (server)
  (let ((vhost-table (wserver-vhosts server))
	(foo-names '("localhost" "www.foo.com" "foo.com"))
	(bar-names '("www.bar.com" "store.bar.com")))
    
    (let ((default-vhost (wserver-default-vhost server)))
      (setf (vhost-names default-vhost) foo-names)
      (dolist (name foo-names)
	(setf (gethash name vhost-table) default-vhost)))
    
    (let ((bar-vhost (make-instance 'vhost :names bar-names)))
      (dolist (name bar-names)
	(setf (gethash name vhost-table) bar-vhost)))))

When a request comes in, AllegroServe will determine which vhost is the intended target and if none is found it will select the default vhost as the intended target. The vhost so determined will be stored in the http-request object in the slot accessed by the request-vhost function.

host argument to publish functions

We now are in a position to describe what values the :host argument to the publish functions can take on.   The :host argument can be nil or one of:

  1. a string naming a virtual host.  If there is no virtual host with this name a new virtual host object is created.
  2. a vhost object
  3. the symbol :wild
  4. a list of the above items

If the value of the :host argument is nil, then its value is assumed to be :wild.

The value of the :host argument is converted into a list of one or more vhost objects and/or the symbol :wild. The meaning of a vhost is clear: it means that this entity will be visible on this virtual host. The meaning of :wild is that this entity will be visible on all virtual hosts, except that it can be shadowed by an entity specified for a particular virtual host. Thus you could publish an entity for :path "/" and :host :wild and it will be used for all virtual hosts that don't specify an entity for :path "/". Note that when a request comes in and the search is done for an entity to match the request, at every step of the way we look first for a vhost specific handler and then for a :wild handler. It is not the case that we first do a complete search for a vhost specific handler and then restart the search, this time looking for a :wild handler.
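
The shadowing rule can be illustrated with a short sketch that publishes the same path twice, once for all hosts and once for a single virtual host. The helper publish-greeting, the package name and the response text are hypothetical, and the sketch writes to the stream returned by request-reply-stream inside with-http-body.

```lisp
(defpackage :vhost-example
  (:use :common-lisp :net.aserve))
(in-package :vhost-example)

(defun publish-greeting (text &key host)
  "Publish a plain text page at / for HOST (nil means :wild)."
  (publish :path "/"
           :host host
           :content-type "text/plain"
           :function
           #'(lambda (req ent)
               (with-http-response (req ent)
                 (with-http-body (req ent)
                   (write-string text (request-reply-stream req)))))))

;; Visible on every virtual host that has no entity of its own for "/".
(publish-greeting "generic site")

;; Shadows the :wild entity, but only for requests to www.bar.com.
;; The vhost is created implicitly if it does not exist yet.
(publish-greeting "the bar store" :host "www.bar.com")
```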

 


Timeouts

A web server is a program that provides resources to client programs connecting over the network. The resources a web server has to offer are limited and it's important that network problems or buggy clients don't cause those resources to be unavailable to new clients. AllegroServe uses timeouts to ensure that no client can hold a web server resource for more than a certain amount of time.

Three common ways for a resource to be held are

  1. A client stops sending a request in the middle of the request.   This can happen if the client machine crashes or  if the client's machine loses network connectivity with the  machine running AllegroServe.
  2. A client stops reading the response to its request.    The networking code will automatically stop the sender from writing new data if the receiver has a lot of existing data to read.
  3. The response function to an http request can take a very long time, or may even be in an infinite loop. This could be due to a bug in an http response function or something unexpected, like a database query taking a long time to finish.

 

Acl 6.0 or older

For AllegroServe running in Acl 6.0 or older timeouts are done this way:

net.aserve::*read-request-timeout*  - number of seconds AllegroServe allows for the request line (the first line) and all following header lines.   The default is 20 seconds.

net.aserve::*read-request-body-timeout* - number of seconds AllegroServe allows for the body of the request (if any) to be read.   The default is 60 seconds.

(wserver-response-timeout wserver) - the number of seconds AllegroServe allows for an http request  function to be run and finished sending back its response.  The initial value for this slot of the wserver object is found in *http-response-timeout* which defaults to 120 seconds.  You can alter this timeout value with the :timeout argument to with-http-response or by specifying a :timeout when publishing the entity.

Acl 6.1 or newer

In Acl 6.1 we added the capability of having each I/O operation to a socket stream time out.   This means that we don't have to predict how long it should take to get a request or send a response.  As long as we're making progress reading or writing we know that the client on the other end of the network connection is alive and well.    We still need a timeout to handle case (3) above but we can allow a lot more time for the http response since we aren't using this timer to catch dead clients as well.    Thus we have these timeout values:

(wserver-io-timeout wserver) - the number of seconds that AllegroServe will wait for any read or write operation to the socket to finish.   The value is initialized to the value of *http-io-timeout*   which defaults to 60 seconds.

(wserver-response-timeout wserver) -  the number of seconds AllegroServe allows for an http request function to be run and finished sending back its response. The initial value for this slot of the wserver object is found in *http-response-timeout* which defaults to 300 seconds. You can alter this timeout value with the :timeout argument to with-http-response or by specifying a :timeout argument to the publish function creating the entity.

publish-directory and publish-file default their timeout argument in a way that makes sense based on whether the Lisp supports I/O timeouts.    If I/O timeouts are supported then there is no reason to do a global timeout for the whole response if you're just sending back a file.   Thus in this case the timeout argument defaults to a huge number.
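
As a sketch of how these values might be adjusted in Acl 6.1 or newer, the forms below tighten the I/O timeout for the whole server and give one slow entity a longer response timeout. The numbers and the path /slow-report are arbitrary examples, and the code assumes that the two wserver accessors can be used with setf.

```lisp
(defpackage :timeout-example
  (:use :common-lisp :net.aserve))
(in-package :timeout-example)

;; No single socket read or write may take longer than 30 seconds.
(setf (wserver-io-timeout *wserver*) 30)

;; Response functions get 10 minutes unless they say otherwise.
(setf (wserver-response-timeout *wserver*) 600)

;; This entity is known to be slow, so it is published with its own
;; 15 minute timeout.  The same value could instead be given as the
;; :timeout argument to with-http-response.
(publish :path "/slow-report"
         :content-type "text/plain"
         :timeout 900
         :function
         #'(lambda (req ent)
             (with-http-response (req ent)
               (with-http-body (req ent)
                 (write-string "report finished"
                               (request-reply-stream req))))))
```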

 


Miscellaneous

(ensure-stream-lock stream)

The function adds a process lock to stream's property list (under the indicator :lock) if no such lock is present.   Then it returns the object stream.

The AllegroServe logging functions make use of the stream's lock to ensure that only one thread at a time writes log information to the stream. If the logging functions find that a log stream doesn't have a lock associated with it then the log information will still be written to the stream, but under heavy load the log information from multiple threads will become intermixed.
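
For example, a log file opened by the application can be given a lock before it is installed as a log stream. This is a minimal sketch: the file name is made up, and it assumes that vhost-log-stream (described in the Virtual Hosts section) can be used with setf.

```lisp
;; Open (or create) a log file and append to it.
(let ((log (open "/tmp/aserve-access.log"
                 :direction :output
                 :if-exists :append
                 :if-does-not-exist :create)))
  ;; ensure-stream-lock returns the stream itself, so its result can
  ;; be stored directly as the default virtual host's log stream.
  (setf (net.aserve:vhost-log-stream
         (net.aserve:wserver-default-vhost net.aserve:*wserver*))
        (net.aserve:ensure-stream-lock log)))
```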

 

(map-entities function locator)

When one of the publish functions is called, entities are placed in locator objects. The locator objects are then checked when http requests come in to find the appropriate entity. map-entities will apply the given function of one argument to all the entities in the given locator. One common use of map-entities is to find entities that you no longer wish to be published. For that reason map-entities will remove the entity passed to the function if the function returns the keyword symbol :remove as its value.
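
The removal protocol might be used as in the following sketch, which walks every locator of a server and removes the entities selected by a caller supplied predicate. The function remove-entities-if and the package name are hypothetical, and the code assumes that the server's locators are available as a list from the wserver-locators accessor.

```lisp
(defpackage :map-entities-example
  (:use :common-lisp :net.aserve))
(in-package :map-entities-example)

(defun remove-entities-if (predicate &optional (server *wserver*))
  "Unpublish every entity for which PREDICATE returns true.
Returns the number of entities removed."
  (let ((removed 0))
    (dolist (locator (wserver-locators server))
      (map-entities #'(lambda (entity)
                        ;; Returning :remove tells map-entities to
                        ;; drop this entity from the locator.
                        (when (funcall predicate entity)
                          (incf removed)
                          :remove))
                    locator))
    removed))
```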

 


Running AllegroServe as a Service on Windows NT

On Windows NT (and Windows 2000 and Windows XP) when you log off all the programs you are running are terminated.   If you want to run AllegroServe on your machine after you log out you have to start it as a Windows Service.  This is easy to do thanks to code contributed by Ahmon Dancy.  

The first step is to download the ntservice code and documentation from the Franz opensource site.  Read the documentation carefully especially as regards the different capabilities of the accounts under which you may choose to run AllegroServe.  

You'll probably want to build an AllegroServe application that can run either normally or as a service. You can run it normally to debug it and then start it as a service when you're satisfied that it works.

Following is an example of how this can be done.   I've decided that if the /service argument is given on the command line when I start my application then I'll start my application as a service, otherwise I start it normally.      Here is the restart-init-function (to generate-application) that I define:

(defun start-aserve-application ()
  (flet ((start-application ()
	   (net.aserve:start :port 8020)
	   (loop (sleep 100000))))
    (if* (member "/service" (sys:command-line-arguments) :test #'equalp)
     then ; start as  a service
	  (ntservice:start-service #'start-application)
     else ; start as a normal app
	  (start-application))))

 

I use (loop (sleep 100000)) to ensure that the restart-init-function never returns.

 

In order to register my application as a service to the operating system I call ntservice:create-service like this:

(ntservice:create-service "aservetest" "Aserve Test Service"
     "c:\\acl61\\testservice\\testapp\\testapp.exe -- /service")

Note that I use "--" before the "/service".  This is very important.    The "--" separates the arguments used to start up the program from the arguments passed to the program itself.    The call to ntservice:create-service is done only once and need not be done from within your application. 

Once an application is registered as a service you can start it by going to the Control Panel, selecting Administrative Tools and then Services. Locate the service you just added, right click on it and select Start. You can stop the service with a right click as well.

 


Using International Characters in AllegroServe

A character set is a collection of characters and a rule to encode them as a sequence of octets.   The default character set for web protocols is Latin1 (also known as ISO 8859-1).   The Latin1 character set represents nearly every character and punctuation needed for western European languages (which includes English).   

If you want to work with characters outside the Latin1 set you'll want to use the International version of Allegro CL which represents characters internally by their 16-bit Unicode value.    In this section we'll assume that you're using International Allegro CL.

What the web protocols refer to as charset (character set) Allegro CL refers to as an external-format. Allegro CL uses a different term since it always uses 16-bit Unicode to represent characters internally. 16-bit Unicode can represent nearly all characters on the planet. It's only when those characters are read from or written to devices outside of Lisp that the actual encoding of those characters into octets matters. Thus the external-format specifies how characters are encoded and specifies which Unicode characters are part of the character set that the external-format defines. An attempt to write a Unicode character that's not part of the character set results in a question mark being written.

External-formats are also used in Allegro CL to do certain character to character transformations. In particular on the Windows platform external formats are used to convert the Lisp end of line (a single #\newline character) to the #\return #\linefeed character sequence that is standard on Windows. Thus an external format such as :utf-8 has a different effect on Windows than on Unix, and this is not desirable for web applications. The function call (crlf-base-ef :utf-8) returns an external format on Windows and on Unix that simply does the character encoding part of the external format, and thus this is the external format you would want to use in a web application.

server to client (browser) character transfer

When a web server returns a response to a client it sends back a response line, a header and optionally a body. The response line and header are always sent using a subset of the Latin1 character set (the subset corresponding to the US ASCII character set). The body is sent using the full Latin1 character set, unless otherwise specified. To specify the character set of the body you add an extra parameter to the Content-Type header. Instead of specifying a content type of "text/html" you might specify "text/html; charset=iso-8859-2". This alerts the http client that it must interpret the octets comprising the body of the response according to the iso-8859-2 character set. This however is not enough to make AllegroServe encode the Unicode characters it's sending to the client using the appropriate external format. You would have to do this:

(with-http-response (req ent)
  (with-http-body (req ent :external-format (crlf-base-ef :iso8859-2))
    ;; ... generate and write page here ...
    ))

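Note that the charset names and external format names are similar but not identical (for example, the charset iso-8859-2 corresponds to the external format :iso8859-2). The charset names are those registered with IANA; the external format names are listed in the Allegro CL documentation.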
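Putting the two pieces together, a handler might declare the charset in the Content-Type and pass the matching external format to with-http-body. The following is only a sketch: the path "/latin2-page" and the page text are made up for illustration, and it assumes an Allegro CL image in which (require :aserve) loads AllegroServe.

```lisp
(require :aserve)

;; Hypothetical page: the path and the content are illustrative only.
(net.aserve:publish
 :path "/latin2-page"
 ;; Tells the client how to decode the octets of the body.
 :content-type "text/html; charset=iso-8859-2"
 :function
 #'(lambda (req ent)
     (net.aserve:with-http-response (req ent)
       ;; Tells AllegroServe how to encode the characters it writes.
       ;; This must name the same character set as the charset above.
       (net.aserve:with-http-body
           (req ent :external-format (excl:crlf-base-ef :iso8859-2))
         (format (net.aserve:request-reply-stream req)
                 "<html><body>~a</body></html>"
                 ;; U+0141 is part of the iso-8859-2 character set.
                 (code-char #x0141))))))
```

If the two settings name different character sets, the client will decode the body with the wrong table and display the wrong characters.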

In order to make it easier to specify external formats in AllegroServe you can specify a default external format when you start the server (with the :external-format argument to the start function). The variable *default-aserve-external-format* will then be bound to this external format in each of the threads that process http requests. The value of *default-aserve-external-format* is used as the default value of the :external-format argument to with-http-body.

The default value of the :external-format argument to the start function, and thus the default value of *default-aserve-external-format*, is (crlf-base-ef :latin1-base). This means that regardless of the locale in which you run AllegroServe, AllegroServe will by default use the Latin1 character set, which is what is expected by web clients.

A very useful character set is utf-8, which encodes the whole Unicode character set and thus covers all of the characters you can store inside Lisp. The corresponding Allegro CL external format is the value of (crlf-base-ef :utf-8). Specifying this character set allows you to write web pages that can contain characters from nearly every language in the world (whether the web browser can find the glyphs to display all those characters is another matter).
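As a minimal sketch of making utf-8 the server-wide default, the external format can be passed to start. The port number 8000 is an arbitrary choice for this example, and the code assumes an Allegro CL image in which (require :aserve) loads AllegroServe.

```lisp
(require :aserve)

;; Port 8000 is an arbitrary choice for this sketch.
;; After this call *default-aserve-external-format* is bound to the
;; utf-8 external format in the threads that process requests, so
;; with-http-body encodes in utf-8 unless told otherwise.
(net.aserve:start :port 8000
                  :external-format (excl:crlf-base-ef :utf-8))
```

This only controls how AllegroServe encodes what it sends. Each page must still announce the charset to the client, for example with a content type of "text/html; charset=utf-8".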

client (browser) to server character transfer

The browser sends characters to the web server when the user enters data into a form and submits the form. The important thing to remember is that the browser will encode characters using the character set that was specified for the web page containing the form. If you fail to specify a charset when the page is given to the web browser then the web browser will decide on its own how to encode characters that aren't part of the default character set (which is of course Latin1). The browser will not tell you which encoding it chose. Therefore if you ever plan on allowing non-Latin1 characters to be entered in your forms you'll want to specify a charset for the page containing the form.

You can specify the charset in the Content-Type field of the header that's sent with the page (as we described above) or you can put it in the page itself using a meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Retrieving form data in AllegroServe is done with the request-query function, and that function takes an :external-format argument so you can specify how the form data should be decoded. If your form sends multipart data then you can use the :external-format argument to get-multipart-sequence to retrieve and decode the form data.
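The round trip described above might look like the following sketch. The path "/greet" and the form field "name" are hypothetical, and the code assumes that (require :aserve) also makes the net.html.generator package available. The page is served as utf-8, so the browser submits the form in utf-8, and the handler decodes the query with the same external format.

```lisp
(require :aserve)

;; Hypothetical handler: "/greet" and the field "name" are made up.
(net.aserve:publish
 :path "/greet"
 :content-type "text/html; charset=utf-8"
 :function
 #'(lambda (req ent)
     (let* ((utf8 (excl:crlf-base-ef :utf-8))
            ;; request-query returns an alist of (name . value) pairs,
            ;; decoded here with the charset the form page was sent in.
            (query (net.aserve:request-query req :external-format utf8))
            (name (cdr (assoc "name" query :test #'equal))))
       (net.aserve:with-http-response (req ent)
         (net.aserve:with-http-body (req ent :external-format utf8)
           (net.html.generator:html
            (:html
             (:body
              ;; :princ-safe escapes characters special to HTML.
              (:p "Hello, " (:princ-safe (or name "stranger")))
              ((:form :method "post" :action "/greet")
               ((:input :type "text" :name "name"))
               ((:input :type "submit")))))))))))
```

The same external format appears in three places: the charset in the content type, the argument to request-query, and the argument to with-http-body. Keeping all three in agreement is the point of the example.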

examples

The AllegroServe test page has links to a few pages that show how international characters work with AllegroServe. One of these is the International Character Display page. This page shows what happens when the charset and external-format are set to different values and a page containing international characters is displayed. It demonstrates how important it is that those two character set specifications be kept in sync, and it shows that utf-8 is most likely the best choice for a character set for your web pages.

 


Debugging

Debugging entity handler functions is difficult since these are usually run on a separate Lisp thread. In addition, AllegroServe catches errors in entity handler functions, which prevents you from interactively diagnosing the problem.

You can put AllegroServe in a mode that makes debugging easier with the net.aserve::debug-on function. This function is deliberately not exported, to emphasize the fact that you are working with the internals of AllegroServe.

 

(net.aserve::debug-on &rest debugging-features-to-enable)

We've classified the debugging features and given each a keyword symbol name.    This function turns on those named features.  If no arguments are given, then debug-on prints the list of debugging features and whether each is enabled.

 

(net.aserve::debug-off &rest debugging-features-to-disable)

This function turns off the given list of features.

 

The debug features are:

:info AllegroServe prints information at certain places while doing its processing.  
:xmit AllegroServe prints what it receives from and sends to the client.  In some cases the body of a request or response will not be printed.
:notrap When enabled, this prevents AllegroServe from catching errors in entity handler functions.  If an error occurs and you're running in an environment where background processes automatically create new windows (such as the emacs-lisp interface) then you'll be given a chance to :zoom the stack and diagnose the problem.  Note that if a timeout has been established to limit the amount of time that a certain step is allowed (and this is done by default) then the interactive debugging session will be aborted when the timeout is reached.

 

Two pseudo debug features are :all and :log. Specifying :all to debug-on or debug-off is the same as listing all of the debug features. Specifying :log is the same as specifying all features except :notrap.
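A typical debugging session, sketched using only the two functions and the feature names described above, might run as follows. It assumes AllegroServe is already loaded.

```lisp
;; With no arguments, print each debugging feature and whether it is enabled.
(net.aserve::debug-on)

;; Print progress information and let errors in entity handler
;; functions reach the debugger instead of being caught.
(net.aserve::debug-on :info :notrap)

;; ... reproduce the failing request and inspect the stack here ...

;; Return to normal operation by turning every feature off.
(net.aserve::debug-off :all)
```

Because :notrap leaves errors uncaught, it is best reserved for development and turned off again once the problem has been diagnosed.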