A new view of configuration -- it's all about "who"

Who are the changers?

For each package, there is configuration defined by the author or maintainer of the package. There might be several authors, who would coordinate on one file, but in fact there's no reason they couldn't each have their own, if rules exist to combine them. These files should typically live alongside the other files of the package, not in an area belonging to others, such as /etc.

Next comes any packager that bundles the program into something else, most probably an operating system. For example, Debian would have a place where it puts all the changes made by the Debian team. They would not, generally, touch the package's files at all. (Of course, sometimes they would need to; more on that below.)

Other post-packagers who take Debian and apply their own changes would again do so in their directories. For example, Ubuntu is based on Debian but would have its own Ubuntu section. The variants of Ubuntu (Xubuntu, Kubuntu, Edubuntu, Mint, Mythbuntu etc.) might be a simple addition on top of that.

Next you want places for "network" sysadmins, who apply changes and policy for entire companies, departments and networks. Each level of sysadmin would have their own directory in which to make changes. People could also "import" system administration work from outsiders, giving each outside source a directory into which its changes go.

Next to last is the configuration for the local machine itself. A directory on the local machine would be used just for this. Any change that applies to more than this machine would go elsewhere.

Finally, configuration for the end user. While all the above configuration would go into "system" level directories, this would reside in a special ".config" directory in the user's home directory.

(Further levels of configuration are possible, such as based on the directory data resides in, as is commonly used by web servers.)

In this case, "changes" can mean anything from installing a whole program to making a minor configuration change. In general these two tasks should be kept separate, because installation has some specific rules, discussed below.

On top of all this would be configuration imported from trusted sysadmins who bundle together their own customizations. For example, while you might install your mail system directly from the author, with modifications from your OS packager, somebody might advertise, "here's some config to make your mail system do X or Y." You might want to import that, and keep importing it as it changes and updates.
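
To make the chain concrete, here is a minimal sketch in Python of how these levels might be listed, from the highest authority down to the end user. The level names and directory paths are purely hypothetical; nothing like this layout exists today.

    # Hypothetical chain of changers, highest authority first.
    # Paths are illustrative only, using the /whofig idea described later.
    CHAIN = [
        ("author",   "/packages/mydaemon/config/"),   # the package author/maintainer
        ("packager", "/whofig/debian/mydaemon/"),     # the OS packager (e.g. Debian)
        ("derived",  "/whofig/ubuntu/mydaemon/"),     # a post-packager (e.g. Ubuntu)
        ("network",  "/whofig/netadmin/mydaemon/"),   # company/department sysadmins
        ("machine",  "/whofig/machine/mydaemon/"),    # this machine only
        ("user",     "~/.config/mydaemon/"),          # the end user
    ]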

Configuration Files

Of course, you need a configuration language and a way to deal with conflicts and the ordering of configuration parameters.

Today, XML makes sense. Generally, a chain of authority would be laid out, and the files processed in that order: we would start with the author's configuration, then the packager's, then the sysadmins' and finally the user's.

We might then descend back through that chain looking for "override" configuration set at any level, that is, rules meant to supersede changes made at lower levels.

At each level, the file could grant or deny powers to set parameters or use configurations of given types at lower levels as well.
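
A minimal sketch of those two passes, plus a simple "deny" power, might look like the following. The dictionary keys ("set", "deny", "override") are invented here to illustrate the idea, not a proposed format.

    def effective_value(levels, key):
        """levels is ordered author -> packager -> ... -> user.
        Each level is a dict that may carry "set", "deny" and "override" entries."""
        value = None
        denied = False
        # Forward pass: lower levels normally win, unless a higher level
        # has denied the power to change this key.
        for lvl in levels:
            if key in lvl.get("set", {}) and not denied:
                value = lvl["set"][key]
            if key in lvl.get("deny", set()):
                denied = True
        # Reverse pass: walk back up looking for overrides, which supersede
        # anything chosen below them.
        for lvl in reversed(levels):
            if key in lvl.get("override", {}):
                return lvl["override"][key]
        return value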

Hopefully most configuration sets will be "unordered" in that they don't depend on one another. These can be readily defined anywhere, at any level, with the only question being what to do when definitions conflict.

Some configuration, however, will need dependencies. As such, higher level files will need to include tags at various points that can be used by lower levels to insert configuration "into the stream" at the right point.

For example, a server's master (author's) configuration might contain an empty XML block of a form like "<point name=dir1/>". Later configuration files could then indicate that this is the point at which a group of changes should be interpreted, when an ordering matters.
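
A minimal sketch of how a merging tool might splice a lower level's block at such a point, using Python's xml.etree. The element and attribute names ("point", "at", "option") are only guesses at what such a format could look like.

    import xml.etree.ElementTree as ET

    # The author's master config declares an insertion point named "dir1".
    master = ET.fromstring(
        '<config><option name="loglevel" value="info"/><point name="dir1"/></config>')

    # A lower level (say, a sysadmin) supplies a block aimed at that point.
    sysadmin = ET.fromstring(
        '<at point="dir1"><option name="datadir" value="/srv/mydaemon"/></at>')

    def splice(master, fragment):
        """Insert the fragment's children right after the named <point>."""
        target = fragment.get("point")
        for parent in master.iter():
            for i, child in enumerate(list(parent)):
                if child.tag == "point" and child.get("name") == target:
                    for j, new in enumerate(fragment):
                        parent.insert(i + 1 + j, new)
                    return

    splice(master, sysadmin)
    print(ET.tostring(master, encoding="unicode"))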

Existing config formats

Most programs use some form of "fieldname = value" configuration, one per line, with continuation via indent or backslash. The delimiter is commonly equals, colon or whitespace. The system should allow blocks of such configuration to be surrounded by XML that specifies which type it is, along with other attributes, such as who set it, for what version and/or on what date. This makes it easy for existing programs and old code to bring their configuration into the system by wrapping it in some simple XML.
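
As a sketch of how such a wrapped block might be produced (the element name and attributes are invented for illustration):

    from datetime import date
    from xml.sax.saxutils import escape

    def wrap_legacy(lines, who, version):
        """Wrap legacy 'field = value' lines in a simple XML envelope,
        recording who supplied them and for which program version."""
        body = "\n".join(escape(line) for line in lines)
        return ('<legacy format="field-equals-value" who="%s" version="%s" date="%s">\n'
                '%s\n</legacy>'
                % (who, version, date.today().isoformat(), body))

    print(wrap_legacy(["mail_dir = /var/mail", "timeout = 30"], "netadmin", "2.1"))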

For very popular programs that don't use this style, custom code to understand their config syntax will be needed until such time as the program starts to use the new config database. If the custom code can be written in languages like perl that are whiz-bang at parsing configuration files, this is not so hard a task.

Old style programs

For older programs, the tool would need to provide commands so lower levels can safely "edit" the master configuration file, which would be moved to a safe "source" location. Then a new master configuration file would be written out based on the rules.

Consider a program that wishes to have a file like /etc/mydaemon.config as its configuration file. The original would be moved to a new location. The sysadmin's changes, in a place like /whofig/netadmin/1/mydaemon/changes, would contain "patch" style XML tags to change the source and generate the new configuration. This would of course not be safe if changes are made to the master which are incompatible with the patches, but that's always true in such cases!

The configuration master program would read the original config file and all the individual additions and changes, and generate a new resulting config file when anything has changed. A symlink would be placed from /etc/mydaemon.config to the generated config file, allowing /etc to remain read-only.
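
A minimal sketch of that regeneration step is below; the paths are invented, and a real tool would interpret the patch-style XML rather than simply appending the change files.

    import os

    SOURCE    = "/whofig/author/mydaemon/mydaemon.config"   # the moved original
    CHANGES   = ["/whofig/netadmin/1/mydaemon/changes",     # sysadmin's patch-style edits
                 "/whofig/machine/mydaemon/changes"]        # this machine's edits
    GENERATED = "/var/cache/whofig/mydaemon.config"         # where the result is written
    LINK      = "/etc/mydaemon.config"                      # what the old program reads

    def regenerate():
        # Placeholder logic: a real tool would apply the patch tags properly.
        with open(GENERATED, "w") as out:
            for path in [SOURCE] + CHANGES:
                if os.path.exists(path):
                    with open(path) as src:
                        out.write(src.read())
        if not os.path.islink(LINK):
            os.symlink(GENERATED, LINK)   # /etc itself can then stay read-only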

See also the possibility of a whofig filesystem.

Master Database

There does need to be one master database which says where things are after they get installed. That is, each program needs to be able to reliably query where it is installed, and the location of its compiled/cached configuration and the other materials it needs.
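
A sketch of what such a query might look like, assuming a flat, static index file; the file name and JSON format are invented here.

    import json

    MASTER_DB = "/whofig/master.db"   # hypothetical static, read-only index

    def lookup(package):
        """Return where a package is installed and where its compiled config lives."""
        with open(MASTER_DB) as f:
            db = json.load(f)
        entry = db[package]
        return entry["install_dir"], entry["compiled_config"]

    # e.g. install_dir, config_path = lookup("mydaemon")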

Permanent archive (for backup)

In order to be a packager of software, you would need to commit that you, or somebody else, would provide permanent access to anything you put in an independent package. (A package which exists only in the context of a distribution is not independent; the owners of the distribution make that commitment.)

The reason is that users are now able to back up their systems by storing only their own personal file tree. If the system is lost, they must know they can get all the other file trees from archives. Of course, "from archives" may simply mean the CD a distribution was provided on, so a restore can be done by grabbing the data off the CD and merging it with the user's own file trees and nearer trees.

Indeed, users will judge just how stable imported material is, and back it up if they have any doubts about its public permanence and ease of access.

Installing Packages

Just about every package will have its own space, named after it. The "who" in a package is the package maintainer. (This is a title, not necessarily an individual.) People who work on the package would combine their work into one package using other collaborative tools.

Some confusion arises when you install a package. Two or more things are going on. First of all, you're importing a new tree of files into the system, which are maintained by the package maintainer. However, the act of importing is a change made to the system by you, the sysadmin or user. So the package is brought in, and it's noted in your own tree what you did -- namely import a particular version of somebody else's tree.

You might actually do more than this when importing a package, because you might import other people's changes to the package along with it. In particular, the people who build a distribution (which is largely a collection of custom configured packages) would try to avoid changing the packages themselves. Instead, they would create their own changes in their own tree. So a typical package import would bring in the package's tree, and one or more sub-trees belonging to the distribution. One way to do that would be for you, the admin, to bring in not "Firefox" but "Ubuntu's Firefox." That latter package contains both an import of the master "Firefox" as well as Ubuntu's customizations for it.

In addition, the install process might have asked you some questions, or calculated some custom values based on the particulars of your system. These would be stored in your file tree, in a sub-tree for the package.

Doing all this would be the job of an enhanced package management system.

A problem is also present when one maintainer installs a package at one version level, and another installs a newer (or older) version. Generally the "lower" (closer to the user) choice should win, though an argument can be made that the newest version should win unless an explicit command is given to use an older one. Ideally if the two (or more) levels eventually settle on the same version, there should be only one -- this would cause the lower levels to remove their changes, as they are no longer changes.

Package managers would need to continue their old roles of handling dependencies and install/removal of packages. They would need to understand that dependencies must go to an equal or higher level -- you can't install a package as "network sysadmin" that depends on a package installed by the "machine admin." You would have to install that lower level "machine" package at the "network" level, ideally arranging to remove it from the machine level.
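
A sketch of that level rule is below, reusing the hypothetical level names from earlier; a real package manager would of course track much more than this.

    # Higher number = closer to the user (narrower scope).
    LEVELS = {"author": 0, "packager": 1, "derived": 2,
              "network": 3, "machine": 4, "user": 5}

    def dependency_ok(install_level, dep_level):
        """A package installed at install_level may only depend on packages
        installed at the same level or a wider (higher-authority) one."""
        return LEVELS[dep_level] <= LEVELS[install_level]

    # A network-level package may not depend on one installed only on this machine:
    assert dependency_ok("network", "packager")
    assert not dependency_ok("network", "machine")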

Compiling from source

A problem is created when a person decides to install a program by compiling it from source, with custom configuration. This is the traditional way to do things on Linux, and common for new tools that are not yet fully packaged for your OS.

It may be necessary to consider the new program as the "property" of the person who compiles and installs it, and they install it into their own tree. It makes them responsible for maintaining updates and merging in changes that come from the real authors. This is how we do things today, so it's tolerable. If it's done all the time you are reverting to the old mishmash.

(This does not apply to automated, packaged compiling from source as done with Portage. Presumably those building a Portage package would configure it to follow a whofig system if they wanted to. There it is clear who is making the changes, whereas when a user compiles a program it's not as clear.)

It would also be possible to move compiling of source into a whofig-style system. In effect, version control systems perform some of this role, dividing out the changes of various contributors. To do this would require a suite of tools to make it easy to work on source distributed over a filesystem like this, with the original source in one directory and a user's patches in another.

New packages

When a package is brand new, you might not have confidence in how it fits into your distribution, or if the maintainers will commit to providing a public archive of all versions. In this case you might wish to install their packages in a subspace of your own tree, rather than a tree belonging independently to the maintainers.

If a program is "promoted" from such ad-hoc installation and status, it will need to know how to uninstall the old arrangement and start fresh with the new one. It might leave symlinks behind.

New Ways of Thinking

In this system, nobody except the OS authors puts files in places like /etc. Even the OS designers, if creating a file that users will modify or which stores their modifications, would not put it in /etc.

I propose a new root level directory called /whofig, which would have sub-directories for all the types of people changing the system, and sub-directories within for each individual, and finally sub-directories or files within those for each package getting changes.
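
For illustration, a hypothetical /whofig layout (all of the names below are invented) might look like this:

    /whofig/
        os/                    # the OS distribution's own changes
            mydaemon/
        ubuntu/                # a post-packager's changes
            mydaemon/
        company/               # company-wide sysadmin changes, possibly mirrored
            mydaemon/
        netadmin/              # departmental/network sysadmins
            1/
                mydaemon/
                    changes
        machine/               # changes for this machine only
            mydaemon/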

Files maintained within an OS distribution could live in /whofig/os, but since the filesystem by default will "belong" to the OS distribution, they could live anywhere.

As noted below, one place they should move away from is the old traditional locations. That way, we can identify old programs that don't understand whofig. New programs that suddenly put themselves in /usr/bin can be spotted and dealt with as well as possible.

All sysadmin changes would go into /whofig. Some of the directories there, such as /whofig/company, might be mirrored from master systems over the network.

There would also be $HOME/whofig (or .whofig if you prefer) for per-user program config information. This would contain a faster-access database file as well as the manually tweaked files for different tools.

Within each changer's space will be a software directory, with subdirectories for each system that changer installs or configures.

Since large packages need disk space, there could be an explicit /packages directory, or symlinks could be used from /whofig to the physical locations of such files.

When you download a new package, even a source package, you would never touch the config.h or other config file that came with it. Instead the tools might also support a special sub-directory or paired directory for test changes to software not yet fully installed. Again the advantage is that if you get a new update of the package, you never changed its original so you should be able to just drop the update in place. That won't always work but it will always be easier. Of course, source code revision control systems are still recommended for source code changes.

(In fact, there's some merit to having a version control system for the config files of the system being described here.)

Migration

This can't happen overnight. We can't even begin without tools and libraries to handle the various config files and allow programs to access their own configuration information. And probably a package manager that understands the concepts of installing packages into the system.

All packages built to work with whofig would move all their files out of traditional locations. Only non-whofig programs would remain there. For legacy reasons, files accessed by other programs could be left as symlinks in the traditional locations. Ideally all such symlinks would contain a magic token in their path to identify them. Unless absolutely needed, most symlinks would be to read-only files generated by the new tools, to prevent attempts at configuring things the old way. Text files would include comments pointing to the new way to configure.

Over time, the traditional directories would become nothing but symlinks, and eventually nobody would even be using the symlinks. Of course that will take a lot of time.

Many tools already support being installed in any named place in the filesystem. Many support at least system vs. user configuration, too. Many can thus see a basic migration with just a recompile.

Initially, tools that read a config file today will just read an identical file produced by an independent whofig compiler from all the various source files. It may even be possible to support per-user config for programs that don't traditionally support it, if they can be given an option to read their config from an arbitrary location. A wrapper would rebuild this file, based on timestamps, whenever the sources change.
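
A sketch of such a wrapper is below. The "whofig-compile" command is an invented name for the tool that merges the source files; the paths and the "--config" option are likewise assumptions for illustration.

    #!/usr/bin/env python
    import os, subprocess, sys

    SOURCES   = ["/whofig/author/legacyprog/config",
                 "/whofig/machine/legacyprog/changes",
                 os.path.expanduser("~/.whofig/legacyprog/changes")]
    GENERATED = os.path.expanduser("~/.cache/whofig/legacyprog.conf")
    REAL_PROG = "/usr/lib/legacyprog/legacyprog.real"   # the unmodified binary

    def stale():
        existing = [p for p in SOURCES if os.path.exists(p)]
        if not os.path.exists(GENERATED):
            return True
        return any(os.path.getmtime(p) > os.path.getmtime(GENERATED) for p in existing)

    if stale():
        # Regenerate the legacy-format config from the whofig sources.
        subprocess.check_call(["whofig-compile", "-o", GENERATED] + SOURCES)

    # Hand off to the real program, pointing it at the generated config.
    os.execv(REAL_PROG, [REAL_PROG, "--config", GENERATED] + sys.argv[1:])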

Control of lower levels

Some configuration files will be edited by hand, in text editors. Many will also have GUIs or other UIs for setting configuration. Some configuration is just data remembered about user activity and preferences.

Ideally, especially with machine generated config, we'll have a timestamp for all changes, and know what version of a program they were written for. This allows smarter programs to know the difference between an option set for version 1 and one written with version 2 in mind.

Upper levels of control will be able to use tags to control how lower levels are handled. Ideally you might want to say, "Ignore any user configuration for this value that was created for version < 2.05 or before Jan 4, 2003."

Anal sysadmins may also wish to forbid or lock out actions at lower user levels. Thus XML attributes might dictate that a user can't change a value set by a higher sysadmin, or the higher sysadmin can set a minimum or maximum on numeric values, or a regexp for textual values.

The library that allows fetching of configuration will allow more than just querying the value for a given field. It might return the entire stack of values, and indicate who set each one and approximately when in time or in version history. Most programs would not get into this but those that wanted to, could.
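
A sketch of what such a richer query might look like; the function names and fields are invented to illustrate the idea.

    from collections import namedtuple

    Setting = namedtuple("Setting", "value who when version")

    def get_stack(config, key):
        """Return every value set for key, from author down to user, along with
        who set it and (roughly) when, or for which program version."""
        return [Setting(lvl["set"][key], lvl.get("who"), lvl.get("when"), lvl.get("version"))
                for lvl in config if key in lvl.get("set", {})]

    def get(config, key):
        """Most programs only want the effective (lowest-level) value."""
        stack = get_stack(config, key)
        return stack[-1].value if stack else None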

Design notes

This system will be used by all sorts of programs, possibly even the kernel itself for kernel config. As such, it has to use a static database, not a database system like MySQL that needs a running process to be used. Of course, the power of the database systems is considerable, so the tools could also be built to use a static on-disk database until the real database comes online, and reflect any changes made in one into the other. Many people don't run a database, nor should they have to in small embedded systems, so this must not be depended on.

Writing

It is good if programs can write to their own config. However, due to the static file requirements, the master database must be read-only except to the program that rebuilds it from other config files. (Also, this program must not lock the database for more than a very short time to write it.)

One option would be to make the read-only database only be a list of other databases, which belong to programs that can be trusted to read and write to them.

Alternately, the master database can contain all sorts of data, and included in that can be a file, database or directory for data to be written by the program. However, the program should also follow the whofig rules in writing out data, based on who it is being written out for. In particular if a program has its own configuration user interface, it should know who is doing the configuring and write out the info in a space belonging to that party.

Some programs won't know about this, but admins could set an environment variable when they want to do config that is not per-user or per-machine.
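
A sketch of how a configuration UI might decide whose space to write into; the environment variable name and paths are invented.

    import os

    def config_dir_for_writer(package):
        """Pick the directory new settings should be written into,
        based on who is doing the configuring."""
        who = os.environ.get("WHOFIG_AS", "user")   # invented variable; default: per-user
        if who == "user":
            return os.path.expanduser("~/.whofig/%s" % package)
        return "/whofig/%s/%s" % (who, package)     # e.g. "machine", "netadmin"

    # A sysadmin making a machine-wide change might run the config tool with
    # WHOFIG_AS=machine so the result lands in /whofig/machine/<package>.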

Predefined spaces

Here are some drafts on Whofig File Structure and how the new file tree might be organized.

What's Needed

This is just a draft of an idea for what's needed to make a dramatic change in how easy sysadmin and upgrading are. I am coding other things, but if you are interested in helping with this, let me know.

Here's a list of partial steps on the way to whofig.

The key things needed, though, are a finalized structure, tools to build the master configuration database once people start using that structure, and then packages that use the database to get their config.

Other Thoughts

There are other projects delving into easing the configuration and sysadmin problems. One that also values the idea of treating the changes to a system like a unit that can be isolated is Arusha.

A venerated tool for applying changes as scripts to many machines is CFengine.

Some new thinking in another direction on managing configuration files can be found from Matthew Arnison.