The Hypertext Transfer Protocol (HTTP) is an application-level protocol with the lightness and speed necessary for distributed, collaborative, hypermedia information systems. It is a generic, stateless, object-oriented protocol which can be used for many tasks, such as name servers and distributed object management systems, through extension of its request methods (commands). A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
Put simply, HTTP is the protocol that allows Web browsers and servers to communicate. It forms the basis of what a Web server must do to perform its most basic operations.
When discussing how a Web server works, it is not enough to simply outline a diagram of how low-level network packets go in and out of a Web server.
HTTP started out as a very simple protocol, and even though it has had numerous enhancements, it is still relatively simple. As with other standard Internet protocols, control information is passed as plain text via a TCP connection.
In fact, an HTTP connection can be made by hand using the standard “telnet” command.
/home/chughes > telnet www.extropia 80
GET /index.html HTTP/1.0
Note that port 80 is the default port a Web server “listens” on for connections.
In response to this HTTP GET command, the Web server returns to us the page “index.html” across the telnet session, and then closes the connection to signify the end of the document.
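The same exchange can be sketched in a few lines of Python. This is only an illustration of the telnet session above, not a full client: the function names are invented for this example, and the host is left as a parameter because the host name in the transcript appears incomplete.

```python
import socket


def build_get_request(path: str, version: str = "HTTP/1.0") -> bytes:
    """Build the plain-text request typed into telnet, ended by an empty line."""
    return f"GET {path} {version}\r\n\r\n".encode("ascii")


def read_until_close(sock: socket.socket) -> bytes:
    """Collect bytes until the peer closes the connection.

    In HTTP/1.0 the server closing the connection marks the end of the document.
    """
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # empty read: the other side has closed
            break
        chunks.append(chunk)
    return b"".join(chunks)


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Open a TCP connection to port 80 (the default), send GET, return the raw reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(path))
        return read_until_close(sock)
```

Calling `fetch(host, "/index.html")` with a reachable host would return the raw bytes of the reply, status line and headers included, exactly as telnet would display them.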
The following is part of the sample response:
But this simple request/response protocol was quickly outgrown, and it wasn’t long before HTTP was refined into a more complex protocol (currently version 1.1). Perhaps the greatest change in HTTP/1.1 is its support for persistent connections.
In HTTP/1.0, a connection must be made to the Web server for each object the browser wishes to download. Many Web pages are graphics-intensive, which means that in addition to downloading the base HTML page (or frames), the browser must also retrieve a number of images. Many of these images may be quite small, sliced up merely to provide a hard-coded layout framework for the rest of the HTML page.
Establishing a connection for each one is wasteful, as several network packets have to be exchanged between the Web browser and Web server before the image data can ever start transmitting. In contrast, opening a single TCP connection that transmits the HTML document and then each image one-by-one is more efficient, as the negotiation of starting new TCP connections is eliminated.
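A rough way to see the difference is to count connections. The sketch below uses only the Python standard library: a toy local server (the handler name, paths, and response bodies are made up for the demonstration) answers three requests, and the client sends all of them over one persistent connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Toy handler that counts how many TCP connections were accepted."""

    protocol_version = "HTTP/1.1"  # lets the stdlib server keep connections open
    connections = 0

    def setup(self):
        super().setup()
        type(self).connections += 1  # setup() runs once per accepted connection

    def do_GET(self):
        body = f"you asked for {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where each body ends, so the
        # connection does not have to be closed to mark the end of the data.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_all(host, port, paths):
    """Request every path over one persistent connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            results.append((response.status, response.read()))  # drain before reuse
    finally:
        conn.close()
    return results


def demo():
    """Start the toy server on a free local port and fetch three objects."""
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_all("127.0.0.1", server.server_address[1],
                         ["/index.html", "/logo.gif", "/bar.gif"])
    finally:
        server.shutdown()
        server.server_close()
```

After `demo()` runs, `CountingHandler.connections` shows how many TCP connections the server accepted for the three requests. An HTTP/1.0-style client would instead open a new connection for each path.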
An HTTP transaction consists of a header followed optionally by an empty line and some data. The header will specify such things as the action required of the server, or the type of data being returned, or a status code.
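As a minimal sketch of that layout, the function below splits a raw message at the first empty line. The function name is invented here, and the parser is deliberately simplified: it ignores repeated and folded header lines, which a real implementation would have to handle.

```python
def split_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body).

    The header section ends at the first empty line; whatever follows is data.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]  # the request line or the status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return start_line, headers, body
```

For a response, the start line carries the status code; for a request, it carries the action required of the server, such as `GET /index.html HTTP/1.0`.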
The header lines received from the client, if any, are placed by the server into the CGI environment variables with the prefix HTTP_ followed by the header name. Any - characters in the header name are changed to _ characters. The server may exclude any headers which it has already processed, such as Authorization and Content-type.
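The renaming rule can be sketched as follows. The function name and the default exclusion list are assumptions made for this example; the upper-casing of the header name follows the usual CGI convention.

```python
def cgi_environ_from_headers(headers, already_processed=("authorization", "content-type")):
    """Map client header lines to CGI-style HTTP_* environment variables.

    `already_processed` is a hypothetical list of headers the server has
    consumed itself and therefore leaves out.
    """
    environ = {}
    for name, value in headers.items():
        if name.lower() in already_processed:
            continue
        # Prefix with HTTP_, upper-case the name, and turn "-" into "_".
        key = "HTTP_" + name.upper().replace("-", "_")
        environ[key] = value
    return environ
```

With this mapping, a `User-Agent` header line becomes the variable `HTTP_USER_AGENT`, and an `Accept` header line becomes `HTTP_ACCEPT`.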
HTTP_ACCEPT
The MIME types which the client will accept, as given by HTTP headers. Other protocols may need to get this information from elsewhere. Each item in this list should be separated by commas as per the HTTP spec.
Format: type/subtype, type/subtype
HTTP_USER_AGENT
The browser the client is using to send the request. General format:
The server sends back to the client:
- A status code that indicates whether the request was successful or not. Typical error codes indicate that the requested file was not found, that the request was malformed, or that authentication is required to access the file.
- The data itself. Since HTTP is liberal about sending documents of any format, it is ideal for transmitting multimedia such as graphics, audio, and video files. It also sends back information about the object being returned.
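The status code arrives on the first line of the response. A small sketch of reading it and mapping the error cases mentioned above; the function names and the wording of the descriptions are chosen for this example only.

```python
def parse_status_line(line: str):
    """Split a status line such as 'HTTP/1.0 404 Not Found' into its parts."""
    parts = line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


# Only the cases named in the text; real servers use many more codes.
ERROR_MEANINGS = {
    400: "request was malformed",
    401: "authentication is required",
    404: "requested file was not found",
}


def describe(code: int) -> str:
    """Return a short description of what a status code tells the client."""
    if 200 <= code < 300:
        return "success"
    return ERROR_MEANINGS.get(code, "other status")
```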
The Content-Type header field indicates the media type of the data sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.
Content-Type: text/html
The Date header field gives the date and time at which the message was originated.
Date: Tue, 15 Nov 1994 08:12:31 GMT
The Expires header field gives the date after which the information in the document ceases to be valid. Caching clients, including proxies, must not cache this copy of the resource beyond the date given, unless its status has been updated by a later check of the origin server.
Expires: Thu, 01 Dec 1994 16:00:00 GMT
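A cache can apply the expiry rule by parsing the date and comparing it with the current time. The sketch below uses the standard library's `email.utils.parsedate_to_datetime`, which understands this date format; the function name `may_serve_from_cache` is invented for the example, and the revalidation case ("unless its status has been updated by a later check") is left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def may_serve_from_cache(expires_header: str, now: datetime) -> bool:
    """Return True while the cached copy is still within its Expires date."""
    expires = parsedate_to_datetime(expires_header)  # timezone-aware for "GMT" dates
    return now < expires


# Example call with the current time; the result depends on when it is run.
still_valid = may_serve_from_cache("Thu, 01 Dec 1994 16:00:00 GMT",
                                   datetime.now(timezone.utc))
```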
The From header field gives an Internet e-mail address for the human user who controls the requesting user agent.
From: [email protected]
The request is being performed on behalf of the person given, who accepts responsibility for the method performed. Robot agents should include this header so that the person responsible for running the robot can be contacted if problems occur on the receiving end.
The If-Modified-Since header field is used with the GET method to make it conditional: if the requested resource has not been modified since the time specified in this field, a copy of the resource will not be returned from the server; instead, a 304 (not modified) response will be returned without any data.
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
The Last-Modified header field indicates the date and time at which the sender believes the resource was last modified. It is useful for clients that eliminate unnecessary transfers by using caching.
Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
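Taken together, the two dates let a server decide between sending the resource and sending a 304. A minimal sketch of that decision, assuming both values are well-formed HTTP dates (the function name is made up for this example):

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: str, if_modified_since: Optional[str]) -> int:
    """Pick the status code for a GET that may carry If-Modified-Since."""
    if if_modified_since is None:
        return 200  # ordinary GET: always send the resource
    modified = parsedate_to_datetime(last_modified)
    since = parsedate_to_datetime(if_modified_since)
    # Send the data only if the resource changed after the client's copy.
    return 200 if modified > since else 304
```

With the sample dates above, the resource was modified on 15 Nov 1994, after the client's date of 29 Oct 1994, so the server would return the resource in full rather than a 304.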
The Location response header field defines the exact location of the resource that was identified by the request URI. If the value is a full URL, the server returns a “redirect” to the client to retrieve the specified object directly.
Location: http://WWW.Stars.com/Tutorial/HTTP/index.html
If you want to reference another file on your own server, you should output a partial URL, such as the following:
The Referer header field allows the client to specify, for the server’s benefit, the address (URI) of the resource from which the request URI was obtained. This allows a server to generate lists of back-links to resources for interest, logging, optimized caching, etc. It also allows obsolete or mistyped links to be traced for maintenance.
Referer: http://WWW.Stars.com/index.html
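Generating back-link lists amounts to grouping requests by the page that referred to them. A small sketch, assuming the server has logged each request as a pair of request URI and header dictionary; that log layout and the function name are inventions for this example. Note that the header name is spelled `Referer` on the wire.

```python
from collections import defaultdict


def build_backlinks(requests):
    """Group referring pages by the resource they link to.

    `requests` is an iterable of (request_uri, headers) pairs.
    """
    backlinks = defaultdict(set)
    for uri, headers in requests:
        referer = headers.get("Referer")
        if referer:  # the header is optional, so it may be absent
            backlinks[uri].add(referer)
    return {uri: sorted(sources) for uri, sources in backlinks.items()}
```

A resource that shows up in the result with an unexpected referring page is a candidate for the link maintenance described above.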
The Server response header field contains information about the software used by the origin server to handle the request.
Server: CERN/3.0 libwww/2.17
The User-Agent header field carries information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations, such as inability to support HTML tables.
User-Agent: CERN