DevOps Documentation

In a previous entry I wrote that the key to removing the wall between developers and operations is communication. I wrote about becoming involved in your co-workers’ meetings and understanding their needs. That is of course very important, but sometimes it is best to have things written down, because your co-workers aren’t always around to answer your questions (ops people who have tried calling developers at 4am know this all too well). Good documentation is often touted as the answer to all of life’s problems, yet we never make time for it. I’ve found a few ways to reduce the barriers to creating documentation, which I discuss below.

First, documentation needs to be revision controlled and live with the code, as doing so reduces the cost for developers to create documentation and helps keep track of changes to it. This also means you don’t use binary formats for your documentation: no OpenOffice documents, no MS Word, no format that you can’t usefully store and diff in git, svn, or whichever revision control system your company uses. You must be able to diff revisions. Documentation stored primarily in plain text is ideal because plain text is a universal format; plain text can be emailed to anyone, edited by anyone, read by anyone, and easily manipulated with tools readily available to both developers and operations. My current choice for documentation is reStructuredText, which you already know how to write even if you haven’t seen it before. reST is a plain text format that evolved out of the docutils package and focuses on simplicity and clarity of meaning.
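To make the "you must be able to diff revisions" point concrete, here is a minimal Python sketch that compares two revisions of a plain-text document using the standard library's difflib. The file name and the document contents are made up for illustration; in practice your revision control system's own diff command does this for you.

```python
import difflib

def diff_revisions(old_text, new_text, name="README.rst"):
    """Return a unified diff between two revisions of a plain-text document."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=f"a/{name}", tofile=f"b/{name}"
        )
    )

# Two hypothetical revisions of the same section of a document.
old = "Memory requirements\n-------------------\n512 MB heap\n"
new = "Memory requirements\n-------------------\n1 GB heap\n"
print(diff_revisions(old, new))
```

The same comparison against a word processor file would show only that the binary blob changed, not what changed in it.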

Furthermore, documentation that will be used by the developers and ops people should live with or near the code that the documentation is for, so that the docs are easily accessible to both groups. Documentation stored in revision control also means that you can set up commit hooks that trigger alerts when the documentation changes. If you use a web-based system for browsing source code, that system probably offers an RSS feed that can notify you of changes to files, so you can see when your documentation has changed without leaving your RSS client. The automation possibilities are endless: if a specific source file was updated but the docs weren’t, send an email to the person who made the commit asking why.
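The "source changed but the docs didn’t" check could be sketched roughly as follows. This is only an illustration: the src/ and docs/ directory names are assumptions about the repository layout, and a real hook would get the list of changed files from the revision control system (for git, something like `git diff --name-only`) and send the email instead of printing.

```python
def source_changed_without_docs(changed_files,
                                source_prefix="src/",
                                docs_prefix="docs/"):
    """Return the changed source files if the commit touched no documentation.

    An empty list means either no source changed or the docs changed too.
    """
    source = [f for f in changed_files if f.startswith(source_prefix)]
    docs = [f for f in changed_files if f.startswith(docs_prefix)]
    return source if source and not docs else []

# Hypothetical list of files touched by a commit.
commit = ["src/server.py", "src/config.py"]
stale = source_changed_without_docs(commit)
if stale:
    print("Docs not updated for:", ", ".join(stale))
```

Keeping the check as a plain function over a list of paths makes it easy to reuse from whichever hook mechanism your revision control system provides.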

Keeping the documentation within the source tree means the documentation is always checked out with the code, so the most recent revision is available to all the developers, and the ops people keep up with the source tree as they work with the documentation.

The DevOps Dialogue Document

One of the first documents that an ops person looking to bridge the development divide should create is a dialogue document. This is a document that the developer and the ops person work on together that tracks all of the information needed to handle the care and feeding of the software. What should this document look like?

Administrative Details

A description of the software role and purpose.
A contact in development, a contact in ops, and a contact in QA. These are the people that can be bugged about issues with the software, and these are the technical, not managerial, contacts.

Operating Requirements

Disk space required for installation, disk space used in normal operations, disk space used by logs, disk space used by data, and estimates of growth rates.
Memory requirements.
Network requirements, including ports the software listens on, connections the software makes to other services, protocols used (at all layers), if SSL is required, and so on.
OS requirements, such as platform, distribution, release of distribution, kernel versions, patch versions, word size (32 or 64 bit), other unique OS requirements.
Environment requirements, such as libraries required, variables that need to be set, shells that are required, users and groups that need to be on the system, and more.
Details on how to start and stop the software.
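The disk space and growth rate figures in this section can be turned into a rough planning number. The sketch below assumes linear growth, which is a simplification, and the capacity and usage values are invented for the example.

```python
def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Estimate days until the disk fills, assuming linear growth."""
    if growth_gb_per_day <= 0:
        return None  # no growth: the disk never fills under this model
    # Clamp at zero so an already-full disk reports 0 days, not a negative number.
    return max(capacity_gb - used_gb, 0) / growth_gb_per_day

# Hypothetical numbers: 500 GB volume, 320 GB used, growing 1.5 GB per day.
print(days_until_full(capacity_gb=500, used_gb=320, growth_gb_per_day=1.5))  # 120.0
```

Even a crude estimate like this gives dev and ops a shared number to argue about, which is the point of writing the requirement down.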

Configuration Details

Details on how the software is configured; is it via a config file? Is the configuration in a database? Is there a way to change the running configuration on the fly?

Monitoring & Debugging

How does the software get monitored? Is there a way to have the software report status? What about performance monitoring?
What process do the developers use to debug issues they encounter in the software? For example, is there a mechanism to get a stack trace, such as Java’s handling of SIGQUIT?
How does the software respond to common error situations like out of memory or out of disk space, or being unable to bind to a port? What about handling of bad input?
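As an illustration of the kind of stack trace mechanism worth documenting, Python's standard faulthandler module can provide behavior comparable to the Java SIGQUIT example above. This is a sketch only: the function name is made up, the temporary file stands in for a real log file, and faulthandler.register is only available on Unix-like systems.

```python
import faulthandler
import os
import signal
import tempfile

def install_stack_dump_handler(log_file, signum=signal.SIGQUIT):
    """Dump the traceback of every thread to log_file when signum arrives.

    The process keeps running afterwards, because the registered handler
    replaces the signal's default action (which for SIGQUIT would end the
    process). Unix only.
    """
    faulthandler.register(signum, file=log_file, all_threads=True)

if __name__ == "__main__":
    with tempfile.TemporaryFile(mode="w+") as log:
        install_stack_dump_handler(log)
        # Simulates an operator running `kill -QUIT <pid>` against the process.
        os.kill(os.getpid(), signal.SIGQUIT)
        log.seek(0)
        print(log.read())
        # Unregister before the file is closed so the handler never
        # writes to a closed file descriptor.
        faulthandler.unregister(signal.SIGQUIT)
```

Whatever the mechanism is for your software, the dialogue document should record the signal or command to use and where the output ends up.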

Backup & Restore

How does the software store state, and can you easily back up and restore that state?
Can you do a backup while the software is running and get a restorable backup? How can this be tested?

Security

What privileges are needed to run the software?
How is input validated?
How are any authentication tokens (passwords, certificates, etc.) stored?

There is a lot more that should go in each section and many more sections that could be added; the above is an example that you can use to start your document.
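If the dialogue document is kept as reStructuredText, a small script can report which of the sections listed above have not been started yet. The sketch below is an assumption about how you might do this, not part of any tool: it uses a simplified notion of a reST section title (a line of text followed by an underline of one repeated punctuation character), and the draft document is invented.

```python
import re

REQUIRED_SECTIONS = [
    "Administrative Details",
    "Operating Requirements",
    "Configuration Details",
    "Monitoring & Debugging",
    "Backup & Restore",
    "Security",
]

# Simplified: an underline is one punctuation character repeated.
UNDERLINE = re.compile(r"^([=\-~^#*+])\1*$")

def section_titles(rst_text):
    """Return the underlined section titles found in a reST document."""
    lines = rst_text.splitlines()
    titles = []
    for title, underline in zip(lines, lines[1:]):
        title = title.strip()
        underline = underline.rstrip()
        if (title
                and not UNDERLINE.match(title)
                and UNDERLINE.match(underline)
                and len(underline) >= len(title)):
            titles.append(title)
    return titles

def missing_sections(rst_text, required=REQUIRED_SECTIONS):
    """Return the required sections that the document does not yet contain."""
    present = set(section_titles(rst_text))
    return [name for name in required if name not in present]

# A hypothetical first draft with only two sections filled in.
draft = """\
Administrative Details
======================
Role: order-processing service (hypothetical).

Security
========
Runs as an unprivileged user.
"""
print(missing_sections(draft))
```

Run from a commit hook or a build step, a check like this keeps the gaps visible without forcing anyone to fill in the whole document at once.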

At the last Boston DevOps Meetup, @hercynium mentioned that he uses the FCAPS model to figure out what he needs to know about software he is going to deploy and manage. I hadn’t heard about this model but after some quick research it looks like an excellent reference for helping you build your dialogue document. From the Wikipedia entry:

FCAPS is an acronym for Fault, Configuration, Accounting, Performance, Security, the management categories into which the ISO model defines network management tasks. In non-billing organizations Accounting is sometimes replaced with Administration.

These topics map well to the items that developers and ops people both need to understand about the software they are working on together.

No software will have the correct “answers” for the dialogue questions, and that’s not the point. You’re trying to start the dialogue that gets everyone thinking about software outside of their silo. I’ve written the questions above from the perspective of an ops person, but the developers should add their own set of questions, perhaps a section on what the production environment is like and how software will be pushed to production. Since you are storing this document in plain text alongside your source code under revision control, you can easily check in new answers or questions and have those changes be seen by the developers on their next update.

Developers and operations people lament the perceived lack of documentation and everyone agrees we need more. The purpose of the dialogue document is to create the documentation that is needed in the lowest cost way, so don’t try to make the perfect document from the start. Embrace the iterative and evolutionary nature of Agile and grow the document as your understanding of the software grows.

February 23, 2010 communications, devops, documentation

Damon Edwards says:

February 23, 2010 at 23:04

Hi Adam,

Excellent post. I wonder if there is a maturity model that needs to be considered. For example, configuration tools like Puppet or Chef and workflow/coordination tools like ControlTier try to bridge this communication gap by telling everyone to “communicate with code”. Rather than hand documents back and forth, hand executable code that takes all (or most of the) ambiguity out of the handoff.

But before you can even get to that point you need to know what all of the procedural and configuration points are… I wonder how many companies would be better off with a step 0 like the “DevOps Dialogue Document” rather than reaching straight for the automation tools.

-Damon
http://dev2ops.org

scott says:

February 23, 2010 at 23:20

I agree with Damon, great post. We use a similar checklist for new software deployments and it’s been very useful.

One thing to keep in mind is the method of both input and submission: while it’s a good candidate for a wiki page, it should be submitted to the Ops organization through another format. People tend to format it the easiest way possible (read: very little extra formatting), which leads to difficult-to-read answers due to the low contrast.

@Damon: I believe the two go hand in hand, but having such a document should definitely be a prerequisite.
Adam Fletcher says:

February 23, 2010 at 23:35

@Damon the idea of a maturity model is interesting and a good way to look at how far along the path of “devops” your organization is (or even how far along a part of the organization, or single piece of software, is). I 100% agree that we should be talking in code as much as possible, but for some organizations we need to get there in small steps.

Thanks for the comment!
Nikolay Sturm says:

February 24, 2010 at 02:10

I totally don’t get the idea behind this post. Devops as I understand it in this context is about communication and collaboration. Ops should get involved in development whenever ops related topics are dealt with. Ops should be trained by Devs, so both can immediately see problems. And finally, Devs have to be around when their software is deployed, to get their share of learning.

Documentation might be a band-aid, but I don’t see how it helps fix your real problems. To me this sounds like the road to standardization, certification and bureaucratic death. 😉

Hm, to finish this rant constructively, documentation might be a starting point for root cause analysis: for each point you document, ask why, why, why and try to get rid of it.
Andrew Clay Shafer says:

February 24, 2010 at 03:23

Adam,

My experience has been that documents don’t stay updated.

Echoing Damon, code don’t lie. Why record in text what you can mandate and enforce with code?

Also, I doubt devs will know half the stuff on the list.

For example, ‘Memory Requirements’ is likely going to depend on load, which no one knows until the thing is live and is not likely a static requirement. It’s all about feedback loops.

I have seen scenarios where lists like this would be an improvement, but I see this as more of a starting point than a goal. The situation that comes to mind is when you have a lot of different dev teams making lots of small internally facing applications to support the general business processes, which are then handed over to a monolithic but highly resource constrained ops team. I’d probably advocate trying to segment the ops team so they have more knowledge of the applications they support when possible. I know one smallish ops team that supports about 80 development teams, but their life is pretty close to hell and I would consider that an organizational failure to recognize the value of or invest in infrastructure. They would love to have this list.

Where the business is outward facing web applications, the ops team managing that better know the ‘software purpose’ without looking at a text file and if you need to look up contact info for who to talk with in development or qa, then you probably aren’t talking to them enough.

0.02
Andrew
Adam Fletcher says:

February 24, 2010 at 09:56

@nikolay – rant away, you’re on the Internet 😉 However, I completely agree that documentation tends to languish and die and that the code is the right place for documenting operating requirements (where code is the puppet/chef/other configuration & service management framework). I’ll get there in a later post.

@andrew – This is definitely meant as a starting point. I’ve found that some companies have already solved the communication problem between dev and ops because the company’s organization is set up such that ops and dev people have to work together, but it is more common for companies to have separate dev and ops groups (who, to echo Nikolay, have started down the “… road to standardization, certification and bureaucratic death.”) I’m hoping that we can help the latter companies break down those walls by first increasing communication and then codifying practices in tools and automation. I know in many companies if the ops team forced a new tool on development without first talking to development, the tool would be roundly rejected without consideration (and this happens the other way, with ops rejecting new code from development when development dumps a new project in ops’ lap). This is a more common situation than having the freedom to wholesale change existing process (or the luxury of creating new process from scratch).

Thanks for your comments!