<div class="CHAPTER">
=Process startup, supervision and shutdown=

<div class="SECT1">

==Process startup==

In typical Linux systems, services (processes) are started at boot time through a mechanism such as <span class="emphasis">''System V init''</span>. When the system administrator needs to change the settings, they modify the configuration files and then restart the service or notify the process that it needs to re-read the configuration.

It is usually assumed that processes which have been started will continue to run, and only require intervention during configuration changes. There are a number of problems with this model, which are addressed by the SME Server:

* Processes do occasionally fail through software errors, memory exhaustion and accidental finger poking by the system administrator.
* Some startup scripts and processes do not gracefully handle server crashes, such as power outages. The startup scripts and processes often use process identifier (PID) files to determine whether the process is running. It is impossible to handle PID files reliably in every failure case.
* Many processes do not deal properly with rapid invocation of stop and start requests. This is often, but not always, due to "PID file race" conditions.
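
The PID file problem can be illustrated with a minimal Python sketch. This is not code from the SME Server; the function name <tt class="FILENAME">pid_file_says_running</tt> and the file name used in the demonstration are hypothetical. It shows the usual check (read the PID, probe it with signal 0) and why a stale file can give a false positive: the check only proves that ''some'' process owns that PID, not that it is the service.

```python
import os
import tempfile


def pid_file_says_running(pid_file):
    """Return True if the PID recorded in pid_file belongs to some live process.

    Hypothetical helper: this mimics the check many init scripts perform.
    """
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no file, or unreadable contents
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only tests the PID
    except ProcessLookupError:
        return False  # nobody owns this PID
    except PermissionError:
        return True  # the PID exists but belongs to another user
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stale = os.path.join(tmp, "service.pid")
        # Simulate a crash that left a PID file behind, followed by PID reuse:
        # the recorded PID now belongs to an unrelated process (this script).
        with open(stale, "w") as handle:
            handle.write(str(os.getpid()))
        print("service believed to be running:", pid_file_says_running(stale))
```

The check reports the service as running even though no service was ever started, which is the kind of failure a supervisor that holds a direct connection to its child avoids.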

</div><div class="SECT1">
----

==Process supervision: runit (and supervise)==

The SME Server addresses these issues by running processes under the '''runit''' process supervision environment, which:

* runs each process under control of its own supervisor process
* imposes process limits
* restarts the process if it fails
* provides a consistent mechanism for controlling the underlying process

<div class="NOTE"><blockquote class="NOTE">

'''Note: '''Gerrit Pape's '''runit''' came from previous work by Dan Bernstein on the '''supervise''' supervision environment. '''runit''' provides additional features, and has been released under a free software license.

</blockquote></div><div class="SECT2">
----

===The runit process tree===

When a Linux system boots, it starts the '''init''' process, which then starts all other processes. When '''init''' enters "run-level 7", it starts '''/etc/runit/2''' from an entry in <tt class="FILENAME">/etc/inittab</tt>.

<tt class="FILENAME">/etc/runit/2</tt> starts the '''runsvdir''' master supervision process, which scans the <tt class="FILENAME">/service/</tt> directory for work to do. If '''runsvdir''' fails, '''init''' restarts it.

The '''runsvdir''' command looks for subdirectories under the <tt class="FILENAME">/service/</tt> directory, and starts one '''runsv''' process to manage each of them. If any of the '''runsv''' processes fail, they will be restarted by '''runsvdir'''.
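
As a rough illustration of the scan, the following Python sketch lists the service directories that would each receive a supervisor. The function name <tt class="FILENAME">discover_services</tt> is hypothetical, and skipping names that begin with a dot is an assumption about '''runsvdir''' behaviour rather than something stated in this guide.

```python
import os


def discover_services(service_root):
    """Return the sorted names of the service directories under service_root.

    Hypothetical helper: one supervisor would be started per returned name.
    """
    names = []
    for entry in os.scandir(service_root):
        if entry.name.startswith("."):
            continue  # assumption: dot-names are ignored
        if entry.is_dir():  # follows symlinks, so linked directories count
            names.append(entry.name)
    return sorted(names)


if __name__ == "__main__":
    root = "/service"
    if os.path.isdir(root):
        for name in discover_services(root):
            print("would supervise", os.path.join(root, name))
```

Plain files are ignored, so only directories are treated as services in this sketch.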

Each '''runsv''' process looks for a <tt class="FILENAME">run</tt> script under the directory it is managing. '''runsv''' runs the <tt class="FILENAME">run</tt> script and keeps a connection to the process started by that script. If the process dies, it is restarted.

If the directory also has a <tt class="FILENAME">log</tt> subdirectory, '''runsv''' runs the <tt class="FILENAME">run</tt> script in that directory and connects the output of the main program to the input of the "logger" process.
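
The connection between a service and its logger is an ordinary pipe. The Python sketch below shows the wiring under simplifying assumptions: the function name <tt class="FILENAME">run_with_logger</tt> and both stand-in commands are hypothetical, and nothing is restarted here, whereas '''runsv''' also supervises both ends.

```python
import os
import subprocess
import sys
import tempfile


def run_with_logger(service_cmd, logger_cmd):
    """Run service_cmd with its stdout piped into logger_cmd's stdin.

    Returns (service exit code, logger exit code).
    """
    logger = subprocess.Popen(logger_cmd, stdin=subprocess.PIPE)
    service = subprocess.Popen(service_cmd, stdout=logger.stdin)
    # The parent closes its copy of the write end so that the logger
    # sees end-of-file as soon as the service exits.
    logger.stdin.close()
    service_rc = service.wait()
    logger_rc = logger.wait()
    return service_rc, logger_rc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "current")
        # Stand-in logger: copies everything from stdin into a file.
        logger_cmd = [
            sys.executable, "-c",
            "import sys; open(sys.argv[1], 'w').write(sys.stdin.read())",
            log_path,
        ]
        service_cmd = [sys.executable, "-c", "print('service started')"]
        print(run_with_logger(service_cmd, logger_cmd))
        with open(log_path) as handle:
            print(handle.read(), end="")
```

The service itself needs no logging code: it writes to standard output and the logger decides where the text ends up.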

This produces a process tree which looks something like this:

 [root@gsxdev1 events]# pstree 1
 init-+-acpid
      |-md1_raid1
      |-md2_raid1
      | ...
      |-runsvdir-+-runsv-+-multilog
      |          |       `-ulogd
      |          |-6*[runsv---multilog]
      |          |-runsv-+-multilog
      |          |       `-ntpd
      |          |-runsv-+-multilog
      |          |       `-tinydns
      |          |-runsv-+-cvm-unix
      |          |       `-multilog
      |          |-runsv-+-multilog
      |          |       `-mysqld
      |          |-5*[runsv-+-multilog]
      |          |          `-tcpsvd]
      |          |-runsv-+-multilog
      |          |       `-oidentd
      |          |-runsv-+-multilog
      |          |       `-smtp-auth-proxy
      |          |-runsv-+-multilog
      |          |       `-smbd---smbd
      |          |-runsv---httpd---10*[httpd]

This looks like a complex process tree, but it is a critical part of the SME Server's design for reliability. Each process is independent, has a consistent management interface, has process limits imposed on it, and will restart if it happens to fail.

<div class="NOTE"><blockquote class="NOTE">

'''Note: '''For the curious, if init fails, the system reboots.

</blockquote></div>

For further documentation on runit, refer to the runit manual page.

</div><div class="SECT2">
----

===Run-level 7 and the e-smith-service wrapper===

The SME Server runs in the normally unused run-level 7. This ensures that the only software running on the SME Server is software that we have chosen to run, and that it is started and stopped in a consistent way. If we need to replace a standard startup script with one which runs the process under supervise, we can do so <span class="emphasis">''without modifying the original package''</span>.

In order to run a process under run-level 7, all you need to do is provide a link in the <tt class="FILENAME">/etc/rc.d/rc7.d/</tt> directory to your startup script. However, in most cases your process should only start if it is enabled in the configuration database.

If you look at the <tt class="FILENAME">/etc/rc.d/rc7.d/</tt> directory, you will see that it contains a large number of links to the <tt class="FILENAME">/etc/rc.d/init.d/e-smith-service</tt> script.

S00microcode_ctl -> /etc/rc.d/init.d/e-smith-service
S05syslog -> /etc/rc.d/init.d/e-smith-service
S06cpuspeed -> /etc/rc.d/init.d/e-smith-service
S15nut -> ../init.d/e-smith-service
S15raidmonitor -> /etc/rc.d/init.d/e-smith-service
S26apmd -> /etc/rc.d/init.d/e-smith-service
S35bootstrap-console -> /etc/rc.d/init.d/e-smith-service
[...]

This script is key to ensuring that services start when they are enabled and do not start when they are disabled, as it:

* Checks the name of the link, e.g. <tt class="FILENAME">S05syslog</tt>
* Removes the S05 prefix, leaving <tt class="FILENAME">syslog</tt>
* Checks to see whether <tt class="FILENAME">syslog</tt> is defined in the configuration database, and whether it has its <var class="LITERAL">status</var> set to <var class="LITERAL">enabled</var>.
* If so, it runs the <tt class="FILENAME">/etc/init.d/syslog</tt> script with the argument <var class="LITERAL">start</var>.
* If the service is not enabled, it exits without starting the service.
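
The decision made in the steps above can be sketched in Python. This is not the actual <tt class="FILENAME">e-smith-service</tt> script: the function names are hypothetical, the configuration database is modelled as a plain dictionary, and the prefix pattern (<var class="LITERAL">S</var> or <var class="LITERAL">K</var> followed by two digits) is an assumption generalised from the <tt class="FILENAME">S05syslog</tt> example.

```python
import re


def service_name(link_name):
    """Strip the run-level prefix from a link name, e.g. S05syslog -> syslog."""
    match = re.fullmatch(r"[SK]\d\d(.+)", link_name)  # assumed prefix format
    if match is None:
        raise ValueError(f"unexpected link name: {link_name!r}")
    return match.group(1)


def init_script_to_start(link_name, config_db):
    """Return the init script to run with 'start', or None if not enabled.

    config_db is a stand-in for the configuration database:
    {service name: {property: value}}.
    """
    name = service_name(link_name)
    record = config_db.get(name)
    if record is None or record.get("status") != "enabled":
        return None  # undefined or disabled: exit without starting
    return f"/etc/init.d/{name}"


if __name__ == "__main__":
    db = {"syslog": {"status": "enabled"}, "apmd": {"status": "disabled"}}
    for link in ("S05syslog", "S26apmd", "S15nut"):
        print(link, "->", init_script_to_start(link, db))
```

Because every link points at the same wrapper, enabling or disabling a service is purely a database change and the links themselves never need to be added or removed.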

<div class="NOTE"><blockquote class="NOTE">

'''Note: '''If a script exists in the <tt class="FILENAME">/etc/init.d/supervise/</tt> directory, <tt class="FILENAME">e-smith-service</tt> will use that in preference to the one in the <tt class="FILENAME">/etc/init.d/</tt> directory. This allows us to install our own supervised startup scripts <span class="emphasis">''without modifying the original package''</span>.

</blockquote></div></div></div></div>
