Perchild: Setting Users and Groups per Virtual Host
August 18, 2000
One of the biggest problems with administering a large server that houses multiple sites is restricting access to each site to only those people responsible for maintaining it. The reason is that all of the Apache child processes run with the same user and group IDs. Therefore, all of the files need to be readable, writable, and executable by the user and group that the server is running as. This becomes a much bigger issue when you add CGI and PHP scripts to the site. If those scripts must access private information, then that information must be stored with relatively insecure user and group IDs.
Apache 1.3 addressed this problem by introducing suexec, but suexec introduces other problems, and PHP and mod_cgi cannot take advantage of it. Apache 2.0 has introduced a new MPM that solves this problem in a more elegant way, one that all scripts can take advantage of.
The new MPM is called Perchild, and it is based on the Dexter MPM. This means that a set number of child processes are created and each process has a dynamic number of threads. In this MPM it is possible to specify User and Group IDs for clusters of child processes. Each virtual host is then assigned to run in a specific cluster of child processes. If no cluster of child processes is specified, then the virtual host is run with the default User and Group IDs.
There were many designs considered for this MPM, but in the end only one made sense. The first consideration was which MPM to base it on. The options were prefork, mpmt_pthread, and Dexter. Prefork and mpmt_pthread had one major drawback: they create new child processes, completely separate from each other, whenever the server gets busy. This means that the parent process would need to determine what User and Group IDs the new process should have when it is created. While this seems easy at first glance, it requires load balancing techniques that begin to get very complicated. If the prefork or mpmt_pthread MPMs are desired, it makes more sense to put a load balancer or proxy in front of the web servers, and run multiple instances of Apache on different ports. To the client, this would look very similar to the Perchild MPM.
After eliminating prefork and mpmt_pthread, the only option left was Dexter. Now the question was how to associate virtual hosts with child processes. Do we base the number of child processes on the number of virtual hosts, or do we allow the web admin to specify how the setup should look? Assuming that the more flexible we make the Perchild MPM, the more likely it is to be used, we allow the web admin to determine how their site looks. This is done through the combination of two directives:
The first directive allows the administrator to assign a number of child processes to use the same User and Group IDs. This is to provide some level of robustness. Because Perchild creates new threads in the same child process to handle new requests, it is not the most robust server, although it is very scalable. If one of the threads segfaults, then that entire process will die, taking with it all of the requests currently being served by that child process. By specifying more than one child per user/group pair, we allow the server to balance the number of requests between multiple child processes. The second directive is specified inside a VirtualHost stanza, and assigns that Virtual Host to a specific User and Group ID. The server is smart enough to combine all of the VirtualHosts with the same User and Group IDs into the same child processes.
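To make the relationship between the two directives concrete, here is a minimal sketch in Python (not the C used by Apache itself). In the later Apache 2.0 perchild documentation the directives are named ChildPerUserID and AssignUserID; the function names, the tuple layout, and the rule that slots are filled in order are all assumptions made for illustration, not the MPM's actual data structures.

```python
from collections import defaultdict


def build_child_table(clusters, num_children):
    """Fill a child table from (user, group, count) cluster requests.

    Hypothetical model: each slot holds a (user, group) pair, or None
    when the child should run with the server's default identity.
    Slots are filled in order, which is an assumption of this sketch.
    """
    table = [None] * num_children
    next_slot = 0
    for user, group, count in clusters:
        if next_slot + count > num_children:
            raise ValueError("more children requested than the server starts")
        for _ in range(count):
            table[next_slot] = (user, group)
            next_slot += 1
    return table


def children_for_vhosts(table, vhosts):
    """Map each virtual host to the child slots that share its identity.

    vhosts maps a host name to a (user, group) pair, or to None for
    hosts that run with the default identity.
    """
    by_identity = defaultdict(list)
    for slot, identity in enumerate(table):
        by_identity[identity].append(slot)
    return {name: by_identity.get(identity, [])
            for name, identity in vhosts.items()}


if __name__ == "__main__":
    table = build_child_table([("alice", "web", 2), ("bob", "web", 1)], 5)
    hosts = {"a.example": ("alice", "web"), "b.example": ("bob", "web")}
    print(children_for_vhosts(table, hosts))
```

Two virtual hosts that name the same user and group end up sharing the same list of child slots, which is the combining behavior described above.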
How Does it Work?
The obvious question now is how this works internally. The Perchild MPM has a special global table which it uses to start children and allow those children to change to the correct user IDs. It also uses the per-server configuration to pass requests between child processes. When the MPM encounters a [...] parsing the configuration for each VirtualHost, if the server [...]
The next step is to create the child processes. When each process is started, it checks the global child table and switches to the appropriate User and Group IDs. If no User and Group IDs are specified for this child process, then the User and Group specified in the main server are used. Each child also adds the socket in the socket table to the list of sockets it will poll on. From here, child startup proceeds as normal, with each child process polling on all of the ports opened in the parent process. This leaves the server looking like Figure 1.
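The startup step can be sketched as follows, again in Python and for Unix only. The helper names are hypothetical, and the table layout is an assumption: a list of (uid, gid) pairs with None for children that use the main server's identity. The system calls are injected as parameters so the ordering can be exercised without root privileges. The one point that is not an assumption is the ordering: the group must be changed before the user, because a process that has already given up its privileged user ID can no longer change its group.

```python
import os


def identity_for_child(slot, child_table, default_identity):
    """Return the (uid, gid) a child should use, falling back to the
    main server's identity when its table slot is empty (None)."""
    entry = child_table[slot]
    return entry if entry is not None else default_identity


def switch_identity(uid, gid,
                    setgroups=os.setgroups,
                    setgid=os.setgid,
                    setuid=os.setuid):
    """Drop from a privileged identity to the given uid and gid.

    Clearing the supplementary groups first is common Unix practice;
    whether Perchild does exactly this is not stated in the article.
    """
    setgroups([gid])  # keep only the target group as a supplementary group
    setgid(gid)       # group first, while we still have the privilege
    setuid(uid)       # user last; this gives up the privilege


if __name__ == "__main__":
    table = [(1001, 2002), None]
    uid, gid = identity_for_child(1, table, default_identity=(33, 33))
    print("child 1 would switch to uid", uid, "gid", gid)
```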
When a request comes in, the Perchild MPM is the first module called in the [...]
The request processing then moves to the correct child process. Once a socket is passed over the Unix Domain socket, the new child process is woken up out of poll with data on its end of the Unix Domain socket. Each child has a table of sockets to use for this occasion, with one socket in the table for each thread in the process. Usually, the sockets are set to -1, but when the passed socket descriptor is detected, we set this thread's spot in the table to -2. Later, the fact that the socket is -2 is used to determine that we must receive the socket descriptor from the Unix Domain socket. The received socket is then placed in the thread's position in the socket table.
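Descriptor passing over a Unix Domain socket can be tried out in a few lines. The sketch below uses Python 3.9 or later, whose socket.send_fds and socket.recv_fds wrap the underlying sendmsg and recvmsg calls with an SCM_RIGHTS ancillary message. Everything runs in one process for simplicity, and the helper names, the constants, and the list used as the per-thread table are assumptions for illustration; only the -1 and -2 values come from the description above.

```python
import socket

NO_SOCKET = -1       # thread has no passed socket
SOCKET_PENDING = -2  # a descriptor is waiting on the Unix Domain socket


def pass_descriptor(channel, fd):
    """Send an open descriptor across a Unix Domain socket.
    One byte of ordinary data is sent along with the ancillary message."""
    socket.send_fds(channel, [b"x"], [fd])


def receive_descriptor(channel):
    """Receive one descriptor from the Unix Domain socket."""
    _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    if not fds:
        raise RuntimeError("no descriptor was passed")
    return fds[0]


def claim_socket(table, thread, channel):
    """If this thread's slot is marked pending, receive the descriptor
    and store it in the thread's position in the table."""
    if table[thread] == SOCKET_PENDING:
        table[thread] = receive_descriptor(channel)
    return table[thread]


def demo():
    # Stands in for the Unix Domain socket pair between two children.
    first_child, second_child = socket.socketpair(socket.AF_UNIX,
                                                  socket.SOCK_STREAM)
    # Stands in for a browser and the connection the first child accepted.
    client, accepted = socket.socketpair()
    table = [NO_SOCKET, NO_SOCKET]
    with first_child, second_child, client, accepted:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        pass_descriptor(first_child, accepted.fileno())
        table[0] = SOCKET_PENDING  # data was detected on the channel
        fd = claim_socket(table, 0, second_child)
        with socket.socket(fileno=fd) as conn:
            return conn.recv(1024)


if __name__ == "__main__":
    print(demo())
```

The receiving side ends up with its own descriptor for the same connection, so it can read the request even though it never called accept.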
Processing then continues as normal, reading from the Unix Domain socket, until the post_read_request phase. At this point, we know that the request has come from another child process in our server, and we know that this request is meant for this child process's User and Group IDs. The only thing left to do is replace the Unix Domain socket that is currently in the connection structure with the socket that was passed from the first child process. This child then continues serving the original request.
This will never be the fastest MPM, because it relies on passing socket descriptors between processes, which is inherently slow. It would be much faster to give the server multiple IP addresses, and have different Apache installations listen on port 80 on different IPs. However, that can get very difficult to administer.
This MPM was finished the day before the fifth alpha was released, so it is not well tested at all. Over the next few weeks and months, this MPM will become more stable and more portable. Currently, this MPM has only ever been tested on Linux, but with minor modifications, it should work on almost all Unices. There has been talk of modifying the Windows MPM to allow the threads to change their identities for each request, but that has not happened yet.