Keeping Your Images from Adorning Other Sites


June 14, 2000

Webmasters are ever searching for ways to make their sites look cool and attractive. One way is to dress them up with images, logos, and other graphics--sometimes referred to as 'eye candy.' Of course, if you happen to be in the forefront of this in any way, you run the risk of having others cadge your art in order to dress up their sites. And they probably won't ask permission or pay you a royalty, either.

This article shows how you can use Apache configuration directives to limit access to your art so that it's more difficult to use elsewhere.

The Problem

Simply put, there are two types of "infringement" involved here:

  1. Someone uses an IMG tag on their site to refer to a graphic on yours
  2. Someone downloads an image from your site and puts a copy on their own

The first type not only causes your images to prettify someone else's site, but hurts you more directly because visitors to their site are hammering yours to get the images. Your log files get filled with access request entries, your bandwidth gets used -- and you're getting no benefit from it. This type of theft is almost completely preventable.

The second type of theft is more insidious. The 'borrower' doesn't cause your site to get pounded on for access to the images, since they've been copied to the borrower's site, but you probably weren't given any credit for the artwork--and you probably don't even know the theft happened. Because of the way the Web works, this type of theft can't really be prevented, but you can at least make it a little more difficult.

You can't completely prevent either of these, of course, but you can make them more difficult to do.

Identifying the Files to Protect

You're probably not going to want to protect every document on your site. Even if you do, for the sake of this article I'm assuming you only want to protect your artwork. So how do you indicate that the rules apply only to your image files? With directives such as the following in your server config files:


  
    <FilesMatch "\.(gif|jpg)$">
        [limiting directives will go here]
    </FilesMatch>
  

You can put a container such as this inside a <Directory> container, or inside a <VirtualHost> container, or outside any containers at all (in which case it applies to all such files on your server), or even inside .htaccess files. Put it wherever it makes sense to protect what you want protected.
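
As a rough sketch of one such placement, the container below sits inside a <Directory> block so that only images under one directory tree are covered. The path shown is a made-up example, not part of this article's setup; substitute your own.

```apache
# Hypothetical directory; adjust to wherever your artwork actually lives.
<Directory "/usr/local/web/apache/htdocs/artwork">
    # Applies only to files in this tree whose names end in .gif or .jpg
    <FilesMatch "\.(gif|jpg)$">
        # [limiting directives will go here]
    </FilesMatch>
</Directory>
```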

The Key: the Referer Header Field

Down on the wire, where the browsers, spiders, and servers live, every request for a Web page includes a component called the HTTP request header. This contains information about the request, such as the user's preferred languages, the types of documents the client is able to handle -- and not least, the name of the item being requested. This information is conveyed in a series of name/value pairs called header fields.

One of these header fields is of particular importance to what we want to do. It's called the Referer field (yes, I know, it's misspelt--but that's how it's misspelt in the definition, too), and it indicates the URL of the page the client came from, if and only if the client is following a link or fetching something that page refers to, such as an embedded image. That is, if you're viewing page A, and click on a link to page B, the request for page B will include a Referer field that says "I'm following a link on page A." Likewise, when the browser fetches an image embedded in page A, the request for the image names page A as the referrer. If no link is being followed, such as if the user just typed B's URL into the Location field of his browser, there will be no Referer field in the request header.

How does this help? Well, it gives us a way to tell whether an image is being requested because it was linked to by one of our pages -- or by someone else's.

Using SetEnvIf to 'Tag' Images

For a simple case, suppose our Web site's main page is <http://my.apache.org/>. In this case, we want to restrict any artwork requests that don't originate on our site (i.e., only allow them if the image was linked to by one of our pages). We can do this by using an environment variable (also called an envariable) as a flag, and setting it if the conditions are right. Something like the following ought to do it:


  
    SetEnvIfNoCase Referer "^http://my.apache.org/" local_ref=1
  

When Apache processes a request, it will examine the Referer field in the header, and set the environment variable local_ref to "1" if the value starts with our site address--i.e., is one of our pages.

The string inside the quotation marks is a regular expression pattern that the value must match in order for the environment variable to be set. Describing how to use regular expressions (REs) is far beyond the scope of this article; for now, just be aware that the SetEnvIf* directives use them.

The "NoCase" portion of the directive name means, "do this whether the Referer is 'http://my.apache.org/', or 'http://My.Apache.Org/', or 'http://MY.APACHE.ORG/'" -- in other words, ignore the upper/lower caseness of the value.
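
If your pages can be reached under more than one name, each name needs to be recognised by the pattern. The sketch below assumes, purely for illustration, that the site also answers to www.my.apache.org; it also escapes the dots so that they match only literal dots.

```apache
# Assumption: the site is reachable as both my.apache.org and www.my.apache.org.
# "(www\.)?" makes the "www." prefix optional; "\." matches a literal dot.
SetEnvIfNoCase Referer "^http://(www\.)?my\.apache\.org/" local_ref=1
```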

Using Envariables in Access Control

The Order, Allow, and Deny directives allow us to control access to documents based upon the setting (or unset-ness) of an envariable. The first thing to do is to indicate the order in which Apache will process Allow and Deny directives; you do this with the Order directive as follows:

    Order Allow,Deny
  

This means that Apache will go through any list of Allow directives it has that apply to the current request, and then repeat the process with any Deny directives. With this ordering, the default condition is 'denied'; that is, no one will be able to access anything unless there's an applicable Allow directive.

All right, so let's add the directive that will let local references work:

    Order Allow,Deny
    Allow from env=local_ref
  

This will let a request proceed if the local_ref envariable is set (with any value whatsoever). Any and all other requests will be denied because they don't meet the Allow conditions and the default is to deny access.
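
For comparison, here is a sketch of the same policy written with the opposite ordering. With Order Deny,Allow the default is 'allowed', so an explicit "Deny from all" is needed before the Allow can carve out the exception. It's shown only to illustrate how the ordering changes the default; the Allow,Deny form is the shorter of the two.

```apache
# Same policy, opposite evaluation order: Deny directives are checked first,
# and a matching Allow then overrides them.
Order Deny,Allow
Deny from all
Allow from env=local_ref
```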

Note:
Please don't fall into the trap of sprinkling your .htaccess and server config files with <Limit> containers. You almost certainly don't need them, and they'll just confuse the issue. Don't use them unless you really want to have GET requests treated differently from POST requests, for instance.

Putting It All Together

Putting all these pieces together, we end up with a stanza of directives that looks something like this:


  
        SetEnvIfNoCase Referer "^http://my.apache.org/" local_ref=1
        <FilesMatch "\.(gif|jpg)$">
            Order Allow,Deny
            Allow from env=local_ref
        </FilesMatch>
  

These may all appear in your server-wide configuration files (e.g., httpd.conf), or you can put the <FilesMatch> container in one or more .htaccess files. The effect is the same: Within the scope of these directives, images can only be fetched if they were linked to from one of your pages.
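
One consequence of this stanza is that a request with no Referer field at all (a typed-in URL, say) is refused too. If you'd rather let those through, a variation along the following lines might do. It assumes that mod_setenvif treats a missing field as an empty string, which is worth verifying on your own server before relying on it.

```apache
SetEnvIfNoCase Referer "^http://my.apache.org/" local_ref=1
# Assumption: an absent Referer is matched by the empty-string pattern "^$".
SetEnvIf Referer "^$" local_ref=1
<FilesMatch "\.(gif|jpg)$">
    Order Allow,Deny
    Allow from env=local_ref
</FilesMatch>
```

The trade-off is that anyone who can suppress the Referer field gets the image, while an IMG tag on someone else's page, which normally does send the field, is still turned away.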

Note:
In Apache 1.3.12 and earlier, the SetEnvIf* directives are only allowed in the server-wide configuration files. In later versions, they can be used inside containers and in .htaccess files.

Going Further

I mentioned earlier that you can't fully prevent image theft. That's because of two things, which apply pretty much to the two different types of poaching respectively:

  • Someone who really wants your artwork can always request it using a faked-up Referer value that happens to meet your criteria. In other words, by jiggering up the request so it looks like it's a reference from your site.
  • If someone legitimately views your artwork by going through your pages, the image files are almost certainly in his client's cache somewhere. So he can pull it out of a cached valid request rather than making another one just to pick up the image.

Though it's essentially impossible to foil someone who's really desperate to snitch your artwork, the steps described in this article should make it too difficult for the casual poacher.

Another thing you can do, depending upon how protective you are of your art, is to watermark the images. Watermarking a digital image consists of encoding a special 'signature' into the graphic so that it can be detected later. Digital watermarking doesn't degrade the quality of the image, and can be done in such a way that even a cropped subsection of the image contains the mark, and it's detectable even if the image has been otherwise edited since the mark was inserted. It's even possible to detect a watermark in an image that was printed and then scanned in, having left the digital realm altogether! If you watermark your images, there's an excellent chance you'll be able to prove snitching if you ever find a suspicious image on another site somewhere.

Logging Snitch-Attempt Requests

If you're not sure whether anyone is really after your artwork, you can use the same detection mechanism and envariable to log suspicious requests. For instance, if you add the following directives to your httpd.conf file, an entry will be made in the /usr/local/web/apache/logs/poachers_log file any time someone accesses one of your images without a valid Referer:


  
    SetEnvIfNoCase Referer      "^http://my.apache.org/" local_ref=1
    SetEnvIfNoCase Request_URI  "\.(gif|jpg)$"           is_image=1
    RewriteEngine  On
    RewriteCond    %{ENV:local_ref} !=1
    RewriteCond    %{ENV:is_image}  =1
    RewriteRule    .*               -     [Last,Env=poach_attempt:1]
    CustomLog logs/poachers_log     CLF   env=poach_attempt
  

This should have the effect of logging all attempts to access your images using one of the potential 'snitching' techniques described in this article. The first two lines set flags for the conditions (that the request was referred by a local document, and that it's for an image), the RewriteCond lines check that the first flag is not set and that the second one is, the RewriteRule line sets a third flag combining the two, and the last line causes the logging of the request in a special file if that last flag is set. The log entry is written in the pre-defined 'CLF' format ('Common Log Format'), but you could put together your own format just as easily.
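
As a sketch of what such a custom format might look like, the lines below record the Referer value alongside the request, which is the field of most interest here. The nickname "poacher" is made up for this example, and the CustomLog line would take the place of the one that uses CLF.

```apache
# "poacher" is a hypothetical nickname chosen for this sketch.
# Fields: client host, time, request line, status, and the Referer value.
LogFormat "%h %t \"%r\" %>s \"%{Referer}i\"" poacher
CustomLog logs/poachers_log poacher env=poach_attempt
```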

Other Resources

The techniques described in this article are geared toward a single purpose, but illustrate some of the capabilities of the Apache server. Here are some pointers to resources for further investigation:

Then there are the specific pieces of the Apache documentation that are directly related to the directives and commands described in this article:

Conclusion

Custom artwork is the product of someone's effort, and taking without permission something that another has created is generally accepted as theft. This article has described a basic way to put your works of art behind a velvet rope--if you're so inclined. It won't stop determined thieves, but it should stymie or dissuade the more casual ones.


Got a Topic You Want Covered?

If you have a particular Apache-related topic that you'd like covered in a future article in this column, please let me know; drop me an email at <coar@Apache.Org>. I do read and answer my email, usually within a few hours (although a few days may pass if I'm travelling or my mail volume is 'way up). If I don't respond within what seems to be a reasonable amount of time, feel free to ping me again.

About the Author

Ken Coar is a member of the Apache Group and a director and vice president of the Apache Software Foundation. He is also a core member of the Jikes open-source Java compiler project, a contributor to the PHP project, the author of Apache Server for Dummies, and a contributing author to Apache Server Unleashed. He can be reached via email at <coar@apache.org>.