A MultiThreaded Webserver: Introduction to Socket Programming and HTTP

Stats

Submitted: 04/30/02

Objective
Socket programming is a very interesting area of software development. However, with little or no experience it can be difficult to get a program up and running. In this tutorial I'll do my best to walk step by step through the development of a very simple Web server. When finished, you should have a general understanding of HTTP and of socket programming with TCP.

This tutorial is presented in Java. For anyone interested in putting this to work and gaining some more valuable experience, I'd recommend implementing this in C# once you've got a grasp on how the code works and on the concepts of HTTP and socket programming. It'd be a good exercise in using the FCL and a good first project for C# socket programming.

Our Web Server
The goals for the Web server we'll design here are:

Remember, "the server" is simply the machine that is running our program (the WebServer program). Valid file types for our Web server will be HTML, JPEG, GIF, and plain text. This should be enough. If you're hungry for more, you can extend the server's functionality and file support.

Brief background on HTTP
The basic idea is this: when a user requests a web page (for example, by clicking a link), the browser sends HTTP request messages for the page's objects to the server (which we'll be implementing). "The objects" here are the HTML file plus any JPEG, GIF, or other files associated with the requested page; each object is requested with its own request message. The server receives each request and responds with an HTTP response message that contains the requested object. In HTTP 1.0 (used here) a new TCP connection is created for each request, the response is sent back over that connection, and the connection is then closed.
The structure of HTTP 'request' and 'response' messages is important to the implementation of our web server. The server will be sent requests, and will respond with appropriate response messages. But we'll cover the details of these HTTP messages in a moment. For now, I'll quickly explain sockets, and then we'll get to some coding.

Sockets
A socket is the interface between the application layer and the transport layer (if that makes any sense). In less technical terms, a socket can be thought of as a mailbox: a process delivers information to, and receives information from, its socket. Since this tutorial is presented in Java, you will see how to create and use sockets with the Java classes java.net.Socket and java.net.ServerSocket.
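To make the mailbox picture a bit more concrete, here is a minimal sketch that pushes one line of text through a TCP connection on the local machine. The class name MailboxDemo and the method sendThroughLoopback are made up for this illustration and are not part of the Web server; the sketch also assumes the message is small enough to sit in the operating system's buffers while the accepting side catches up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of the tutorial's WebServer.
final class MailboxDemo {

    // Sends one line over a loopback TCP connection and returns what the
    // accepting side read from its own socket.
    static String sendThroughLoopback(String message) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            // "Client" side: connect and drop a line into the mailbox.
            try (Socket client = new Socket(loopback, listener.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                out.println(message);

                // "Server" side: accept the pending connection and pick the line up.
                try (Socket server = listener.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(server.getInputStream()))) {
                    return in.readLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendThroughLoopback("hello, socket"));
    }
}
```

Both ends run in one thread here only to keep the sketch short. A real server accepts connections from other processes, which is what the WebServer class does.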

Excuse the extremely brief "discussion" on sockets, but the concept should become clearer in the code. So let's get to that.

import java.io.*;
import java.net.*;
import java.util.*;

public final class WebServer {
    public static void main(String args[]) throws Exception {

        //Establish the listen socket
        int PORT = 5306;     //select your favorite number > 1023
        ServerSocket listenSocket = new ServerSocket(PORT);

        //Process HTTP service requests in an infinite loop
        while(true) {
            //listen for TCP connection request
            //Construct an object to process the HTTP request message
            HttpRequest request = new HttpRequest(listenSocket.accept());
            Thread thread = new Thread(request);
            thread.start();
        }
    }
}

You'll notice something called a ServerSocket, which is a little different than just Socket. A ServerSocket listens on a specified port. When a request comes in on that port, the ServerSocket uses the accept() method to create a new TCP connection with the client (the computer sending the request), and return a new Socket which can be used to communicate with the client over the TCP connection. All information sent into that socket is sent (over TCP) to the client, and all information sent by the client to the server is picked up at that socket.

The port on which the ServerSocket is listening is up to you. It can be any number between 1024 and 65,535. Port numbers between 0 and 1023 are reserved for certain other application protocols (HTTP, FTP, SMTP, TELNET, etc.) and are therefore restricted. These are called well-known port numbers.
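If you'd like the program to check the port before it starts listening, something along these lines would do. PortCheck and isUnreservedPort are names invented for this sketch; the range test simply encodes the rule above (outside the well-known range, and no larger than 65535, the biggest value a 16-bit port number can hold).

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of the tutorial's WebServer.
final class PortCheck {

    // True for ports outside the well-known range that still fit in 16 bits.
    static boolean isUnreservedPort(int port) {
        return port >= 1024 && port <= 65535;
    }

    public static void main(String[] args) throws IOException {
        int port = 5306; // the tutorial's example port
        if (!isUnreservedPort(port)) {
            throw new IllegalArgumentException("pick a port between 1024 and 65535");
        }
        // Bind briefly just to show that the port can be listened on.
        try (ServerSocket listenSocket = new ServerSocket(port)) {
            System.out.println("Listening on port " + listenSocket.getLocalPort());
        }
    }
}
```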

So execution of the server goes as follows:
First, a ServerSocket is created and set to listen on a specified port. The server then enters an infinite loop, listening for new connection requests. When it receives one, it creates a new HttpRequest with a reference to the associated Socket, processes the request (on a separate thread of execution), and sends a response. The response will be generated and sent from within the class HttpRequest. So all that is left is to define the class HttpRequest.

So let's do it! Here's how it begins:

final class HttpRequest implements Runnable {

    final static String CRLF ="\r\n";
    Socket socket;

    public HttpRequest(Socket socket) throws Exception {
        this.socket = socket;
    }


    //Implement the run() method of the Runnable interface
    public void run() {
        try {
            processRequest();
        } catch (Exception e) {
            System.out.println(e);
        }
    }

    private void processRequest() throws Exception {
        //Get references to the socket's input and output streams
        InputStream is = this.socket.getInputStream();
        DataOutputStream os = new DataOutputStream(this.socket.getOutputStream());
        
        //Set up input stream filter
        BufferedReader br = new BufferedReader(new InputStreamReader(is));
        
        //Get the request line of HTTP message 
        String requestLine = br.readLine();


        //...

We save a reference to the client socket as a member of the class. The CRLF is simply a carriage return followed by a line feed, which will come in handy later. The run() method simply calls processRequest(). Implementing Runnable this way lets us process each HTTP request on a separate thread of execution. (Yep, that's all it takes to make the server multi-threaded!) The heart of this class is the processRequest() method.
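If the Runnable idea is new to you, here it is in isolation: a rough sketch with no sockets involved. RunnableDemo, Task, and runAll are names made up for this example; each Task stands in for one HttpRequest and just records which thread ran it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, not part of the tutorial's WebServer.
final class RunnableDemo {

    // Stands in for HttpRequest: the work to do lives in run().
    static final class Task implements Runnable {
        private final int id;
        private final List<String> log;

        Task(int id, List<String> log) {
            this.id = id;
            this.log = log;
        }

        public void run() {
            log.add("task " + id + " ran on " + Thread.currentThread().getName());
        }
    }

    // Starts one thread per task, waits for all of them, and returns the log.
    static List<String> runAll(int count) throws InterruptedException {
        List<String> log = new CopyOnWriteArrayList<>(); // safe to share across threads
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = new Thread(new Task(i, log));
            threads[i].start(); // run() now executes on the new thread
        }
        for (Thread t : threads) {
            t.join();
        }
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String line : runAll(3)) {
            System.out.println(line);
        }
    }
}
```

The order of the log lines can change from run to run, since the threads are scheduled independently. The Web server never calls join(); it just starts a thread per connection and goes back to accept().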

Only half the method is shown here, because there is some information on HTTP request messages and HTTP response messages that you'll have to know before examining the rest of this method. But for now, you see the creation of input and output streams on the client socket. br can now be used to read from the socket, and os can be used to write to the socket. Remember, all information the client sends is sent to our socket, where we pick it up (read it), and all information we want to send to the client is sent through the socket, where we drop it off (write it). The last line of code here uses br to read the first line of the request message from the socket.
Before we move on, we've got to talk a little bit about request and response messages.

HTTP Messages
HTTP messages are written in ordinary ASCII text that any human being can read. They consist of an initial request line (for request messages) or status line (for response messages), followed by several header lines and then a blank line. In response messages, the body (consisting of all the data of the requested object) is the last part of the message. A closer look at these message types follows.
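Before the closer look, here is a rough sketch of that layout in code: it splits a raw message into its first line, its header lines, and its body. The class HttpMessageParts is invented for this illustration, and it assumes the whole message is already in one String with CRLF line endings, which is a simplification of what our server actually does when it reads from the socket.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the message layout; the Web server in this
// tutorial does not use this class.
final class HttpMessageParts {
    final String startLine;                                 // request line or status line
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;                                      // empty if there is none

    HttpMessageParts(String raw) {
        // A blank line (CRLF CRLF) separates the header lines from the body.
        int headEnd = raw.indexOf("\r\n\r\n");
        String head = headEnd >= 0 ? raw.substring(0, headEnd) : raw;
        body = headEnd >= 0 ? raw.substring(headEnd + 4) : "";

        String[] lines = head.split("\r\n");
        startLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
    }

    public static void main(String[] args) {
        HttpMessageParts m = new HttpMessageParts(
                "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>");
        System.out.println(m.startLine);
        System.out.println(m.headers);
        System.out.println(m.body);
    }
}
```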

HTTP Request Message
GET /somedir/page.html HTTP/1.0
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/4.0
Accept-language: fr
(extra carriage return, line feed)

Above is an example of a typical HTTP request message. Something very similar to this is generated and sent to the server on the appropriate port (80 for HTTP, though in our case it will be whatever port you set your ServerSocket to listen on) when you click on a link to a web page. In this example, the line of importance is the request line, the first line. It begins with a method, which can be one of several values, including GET, POST, and HEAD. For our purposes, we need only focus on GET requests. The request line also specifies the file requested. It is the server's responsibility to bundle that file into a response message and send it back to the client.
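As a small illustration, the request line can be pulled apart into its three pieces like this. RequestLine is a hypothetical class written for this sketch; the server in this tutorial takes a shortcut and simply assumes the first token is GET.

```java
// Hypothetical helper, not part of the tutorial's WebServer.
final class RequestLine {
    final String method;   // e.g. GET
    final String target;   // e.g. /somedir/page.html
    final String version;  // e.g. HTTP/1.0

    private RequestLine(String method, String target, String version) {
        this.method = method;
        this.target = target;
        this.version = version;
    }

    // Splits "METHOD target VERSION" on whitespace; rejects anything else.
    static RequestLine parse(String line) {
        if (line == null) {
            throw new IllegalArgumentException("no request line received");
        }
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    boolean isGet() {
        return "GET".equals(method);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /somedir/page.html HTTP/1.0");
        System.out.println(r.method + " | " + r.target + " | " + r.version);
    }
}
```

A stricter server could answer 400 Bad Request when parse() throws, rather than letting the exception end the thread.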

HTTP Response Message
HTTP/1.0 200 OK
Connection: close
Date: Thu, 06 Aug 1998 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 1998 09:23:25 GMT
Content-Length: 6821
Content-Type: text/html

(data data data data data data ...)

The (data data data data data data ...) represents the entity body and is the meat of the message. This is the requested object that is being sent back to the client. The first line is the status line, which contains the HTTP version, a status code, and a short status phrase; here the code is 200 and the phrase is "OK". Several other status codes are possible, each with its own phrase. Some common examples:

200 OK
404 Not Found
400 Bad Request
505 HTTP Version Not Supported
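For illustration, a status line can be assembled from a status code as in the sketch below. StatusLines is a hypothetical helper that covers only the four codes listed above; our server simply hard-codes the two status lines it needs.

```java
// Hypothetical helper, not part of the tutorial's WebServer.
final class StatusLines {
    static final String CRLF = "\r\n";

    // Maps the status codes listed above to their standard phrases.
    static String reasonPhrase(int code) {
        switch (code) {
            case 200: return "OK";
            case 400: return "Bad Request";
            case 404: return "Not Found";
            case 505: return "HTTP Version Not Supported";
            default:  return "Unknown"; // placeholder for codes this sketch ignores
        }
    }

    // Builds a complete HTTP/1.0 status line, including the trailing CRLF.
    static String statusLine(int code) {
        return "HTTP/1.0 " + code + " " + reasonPhrase(code) + CRLF;
    }

    public static void main(String[] args) {
        System.out.print(statusLine(200));
        System.out.print(statusLine(404));
    }
}
```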

The next few lines are the header lines. The only header line we will use in our WebServer is the Content-Type header, which specifies the type of the object being sent. In our server the possible types are:

text/html
image/jpeg
image/gif
text/plain

So the response message our server sends will consist of a status line, followed by a Content-type line, a blank line, and then the requested object. The code below examines the received request line, generates an appropriate response message, and sends it to the client.

        // Extract the filename from the request line. Assume a GET command
        StringTokenizer tokens = new StringTokenizer(requestLine);
        tokens.nextToken(); //SKIP OVER THE ASSUMED 'GET' token
        String fileName = tokens.nextToken();
        // Drop the slash at the beginning.
        if(fileName.startsWith("/"))
            fileName = fileName.substring(1);

        // Open the requested file.  
        FileInputStream fis = null;
        boolean fileExists = true;
        try {
            fis = new FileInputStream(fileName);
        } catch (FileNotFoundException e) {
            fileExists = false;
        }
    
        // Construct the response message.
        String statusLine = null;
        String contentTypeLine = null;
        String entityBody = null;
            
        if (fileExists) {
            statusLine = "HTTP/1.0 200 OK" + CRLF;
            contentTypeLine = "Content-type: " + contentType(fileName) + CRLF;
        } else {
            statusLine = "HTTP/1.0 404 Not Found" + CRLF; 
            // Send a well-formed header line, terminated by CRLF like the others
            contentTypeLine = "Content-type: text/plain" + CRLF;
            entityBody = "404 Not Found";
        }
        
        // Send the status line.
        os.writeBytes(statusLine);
        
        // Send the content type line.
        os.writeBytes(contentTypeLine);
        
        // Send a blank line to indicate the end of the header lines.
        os.writeBytes(CRLF);
        
        // Send the entity body.
        if (fileExists) {
            sendBytes(fis, os);
            fis.close();
        } else {
            os.writeBytes(entityBody);
        }
        
        //Close the streams
        os.close();
        br.close();
        socket.close();
    }

    private String contentType(String fileName) {
        if(fileName.endsWith(".htm") || fileName.endsWith(".html"))
            return "text/html";
        else if(fileName.endsWith(".jpg") || fileName.endsWith(".jpeg"))
            return "image/jpeg";
        else if(fileName.endsWith(".gif"))
            return "image/gif";
        else if(fileName.endsWith(".txt"))
            return "text/plain";
        else
            return "application/octet-stream";
    }
         
    private static void sendBytes(FileInputStream fis, OutputStream os) throws Exception {
        // Construct a 1K buffer to hold bytes on their way to the socket.
        byte[] buffer = new byte[1024];
        int bytes = 0;
        
        // Copy requested file into the socket's output stream.
        while((bytes = fis.read(buffer)) != -1 )
            os.write(buffer, 0, bytes);
    }
}

The two methods contentType() and sendBytes() are just helper methods, and should be self-explanatory. What you need to concentrate on is the implementation of the processRequest() method.

The fileName could also be a pathname to a file in another directory. In this implementation, file names are resolved relative to the current directory. This detail is up to you. You could prepend "C:\Web\page\" to each requested file name, so that all objects/pathnames would be searched for in that directory. Your choice. Also, this is platform dependent, for the simple matter of the slashes used in pathnames. (A solution to this problem is beyond the scope of this tutorial, but not terribly difficult.)
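For the curious, here is one possible way to handle both points, sketched with java.nio.file: resolve every request against a fixed root directory and let the Path API deal with the platform's separators. This is not the approach used in the code above; the class DocumentRoot and the directory name "www" are assumptions made for this example. As a side benefit, the check refuses requests such as /../secret.txt that try to climb out of the root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the tutorial's WebServer.
final class DocumentRoot {
    private final Path root;

    DocumentRoot(String rootDir) {
        // Normalise once so later comparisons are reliable.
        this.root = Paths.get(rootDir).toAbsolutePath().normalize();
    }

    // Returns the file a request path refers to, or null if the path would
    // end up outside the root directory.
    Path resolve(String requestPath) {
        String relative = requestPath.startsWith("/")
                ? requestPath.substring(1)
                : requestPath;
        // resolve() joins the paths using the platform's own separator;
        // normalize() collapses any "." and ".." segments.
        Path candidate = root.resolve(relative).normalize();
        return candidate.startsWith(root) ? candidate : null;
    }

    public static void main(String[] args) {
        DocumentRoot docs = new DocumentRoot("www"); // "www" is an assumed directory name
        System.out.println(docs.resolve("/index.html"));
        System.out.println(docs.resolve("/../secret.txt")); // prints null
    }
}
```

In processRequest() you could then open the file with new FileInputStream(path.toFile()) and answer 404 whenever resolve() returns null.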

Telioses!
That's it, you're done. To try it out, just run this puppy on your machine. Then, fire up your favorite browser, and direct it to:

http://hostname:port#/filename

where port# is the port you have your ServerSocket listening on. Make sure "filename" is in the appropriate directory, and that it is one of the supported file types.
hostname is the name or IP address of the machine on which you are running the WebServer, such as ws13.ug.cs.sunysb.edu or 129.49.238.126.
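If you'd rather see the raw response than a rendered page, a small client along the following lines can stand in for the browser. It is only a sketch: SimpleClient, buildRequest, and fetch are names made up here, and the host, port, and file name in main are assumptions you should replace with your own values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical test client, not part of the tutorial's WebServer.
final class SimpleClient {

    // Builds a minimal HTTP/1.0 GET request, ending with the blank line.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Sends the request and reads until the server closes the connection,
    // which is how an HTTP/1.0 response ends.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            // ISO-8859-1 maps every byte to one char, so nothing is lost when printing.
            return new String(response.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed values: the WebServer running locally on port 5306 with an index.html.
        System.out.println(fetch("localhost", 5306, "/index.html"));
    }
}
```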

Conclusion
Hopefully, you're not staring at your monitor wondering what you just read. I tried to keep things as clear and concise as possible. Now that I've finished this tutorial, I'll get the C# code for this up here ASAP, but I recommend you use this tutorial as a roadmap and write the C# code yourself. I'll post both Java and C# code samples in the Tools section.
