Boston DevOps @ MS NERD

Boston DevOps meeting regular Vladimir Vuksan has gone and done a great thing – he’s set up the next meeting at Microsoft NERD, which is at One Memorial Drive in Cambridge, MA (about a mile from where we usually meet).

Vladimir has set up a registration link here.

This promises to be a great show; Vlad and Jeff Buchbinder will be giving a presentation on lessons learned while reengineering deployments at their companies. You can read more about what is on the agenda on this post on Vlad’s blog.

See you there – Tuesday, August 3rd, 2010, from 6pm until 8pm.

OpsCamp Boston 2010

This past Thursday I attended OpsCamp Boston 2010, an unconference themed around topics in systems operations. I was interested in meeting several of the people I know only from Twitter and also making new friends from the greater Boston Operations community. Microsoft NERD generously provided the use of their space for the conference.

Conference Structure

The structure of OpsCamp was unlike any of my other unconference experiences (almost all BarCamps) in that before the group decided on what topics to discuss, the sponsors had an opportunity to give lightning talks. I’ve got no issue with sponsors giving lightning talks, but the organizers arranged it such that all attendees of OpsCamp were in the room and essentially required to watch the sponsor talks. The unconference Rule of Two Feet (reason #2 for attending unconferences on this page) was never explained to the attendees.

Another interesting difference between OpsCamp and other unconferences was that directly after the lightning talks, but again before the community had a chance to choose topics for sessions, the organizers had an unpanel answer seven or eight questions from the audience. The questions covered a variety of concerns and issues in systems operations today, and they ended up leading to suggested sessions when we were finally able to decide what people would like to have the unconference on.

Unpanel

The questions for the unpanel, as my notes recall them:

  1. What happens when all the ops jobs move to India?
  2. How would cloud adoption affect the outsourcing of ops?
  3. What are the costs of ops and IT? What is the trend for that? What is the correct ratio of IT assets to people administrating them?
  4. Why won’t my ops people let me self-provision like I can in the cloud?
  5. What is the connection between the talks we just heard and the cloud?
  6. Should dedicated infrastructure and public cloud resources be centrally managed?
  7. Patch or Rebuild?

Questions 1, 2 and 3 became a single breakout session (more on that in a moment), and 4 became another breakout session. Question 5 was addressed by Cory from Dyn, who said that he felt that the policy and process haven’t changed but the method has, in that we now deploy via APIs instead of physical hardware. For question 6, many people felt that you should mix your infrastructure between physical hardware, virtual hardware and the cloud, and manage it centrally, and it was pointed out that rPath, Opscode and others have technology to help do this. Question 7 was the last breakout session of the camp.

Breakout: Will Ops Get Outsourced?

This breakout came out of the heated discussions on whether or not Ops people are going to be outsourced and the cost of Operations in general (questions 1, 2 and 3, above), whether new Ops people are less skilled, and other ideas that mirror the offshore development discussions of ten years ago. There was a pretty obvious age divide, which is a hot topic in technology in general, and a lot of discussion on the evolving nature of what an Ops person is. There was a lot of respectful arguing in this breakout session, and I think anyone who attended this session left thinking a lot about their own future in Operations.

In parallel with this session was a session on what tools can be used to build a cloud.

Breakout: Why can’t I just deploy to the cloud?

I ended up moderating this breakout panel because the gentleman who asked question 4 (above) had left OpsCamp early.

Why can’t a developer simply pull out her credit card and put her product in production? Perhaps even in a large company with an established IT department? I’m a big fan of everyone in the organization working towards delivering the service, not bickering over domain, so I’m in support of questions along these lines. In that spirit I renamed this breakout “Why are developers trying to ruin the business? ~or~ Why are Ops people assholes?” in the hopes of bolstering attendance. Everybody at OpsCamp ended up going to this panel, so score one point for inflammatory panel titles.

There’s no short answer, and we lost track of the original question several times, but the overall idea is that process and repeatability increase the chance of successful service delivery, and often developers overlook these issues when creating software. That said, I think the Operations department should do everything it can to bring Ops processes to the developer (and make sure that Operations is built into the product, not bolted on later). Work together despite often having seemingly conflicting goals.

Breakout: Patch Or Rebuild?

The last session of the evening was a discussion of when it is okay to simply rebuild from an image instead of patching the running software. There were a lot of opinions on this, but I didn’t have too much to add because I think the right answer largely depends on the situation.

Networking

After the sessions, many of us retired to a bar in Kendall Square to have drinks (graciously provided by the folks from Dyn) and chat.

Final Thoughts

While I had some issues with the structure of OpsCamp, I enjoyed the people and the discussions that we had. I do wish that the organizers had encouraged people to post possible presentation topics on a wiki ahead of the camp (as was done for BarCamp 5) because I think that encourages people to prepare presentations on topics and helps avoid every session being a discussion.

I’ve also uploaded the raw notes for your viewing pleasure.

If you attended OpsCamp Boston I encourage you to come out to the Boston DevOps Meetups, the next of which is Tuesday, May 4th.

Recent Readings

Web

Devops Homebrew – Vladimir Vuksan is a regular at the Boston DevOps Meetups and I was happy to see this post on his previous job’s release process. The post is an excellent case study in DevOps in deployment.

An Agile Architectural Epic Kanban System – Part 1 – There’s a lot of room for Kanban and Agile in DevOps initiatives, and I think many people are already headed in that direction (I’ve started doing Kanban with the operations teams at ITA; there’ll be a post on how this is working in a few months). Having the developers and ops people use the same process management technique helps improve communication all around, and Kanban gives excellent visibility into what is happening now in an organization. The article above discusses using Kanban to give visibility into the process of architectural decision making, a process which is often invisible to developers or ops people.

Print

The Visible Ops Handbook – Tom Zauli from rPath brought me a copy of this at the last Boston DevOps Meetup, and I’m about halfway through. I think the practical steps recommended in Visible Ops would be very effective to gain control of an operations organization that is underwater, and after control is regained you can start automating as much as possible.

The Checklist Manifesto – If you haven’t read Complications and Better you should stop reading this and pick up those two books right now. Dr. Gawande’s analytical look at process improvement in medicine (or lack thereof) is readable, and it is easy to find parallels between his observations about medicine and any other industry. Both books are highly recommended for people who care about honest self-reflection and evolutionary improvement.

The thesis of Dr. Gawande’s new book couldn’t be simpler: checklists prevent errors. He backs this up with examples from many fields and the argument really is compelling; I can think of many cases at work where a checklist has saved the day. I think the DevOps trend of automating as much as possible, especially around deployment, is a way of encoding checklists. At ITA our deployment process went from a checklist that took a day or more to complete manually to code that performs the same checklist in under 45 minutes – that’s 45 minutes for an entire airline reservation system.
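To make the idea of "encoding a checklist" concrete, here is a minimal sketch in Python. It is not the ITA deployment code; the Step class, the run_checklist function and the step names are all hypothetical, and each check is a placeholder where a real step would run a command or query a service.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One checklist item: a human-readable name plus a check that returns True on success."""
    name: str
    check: Callable[[], bool]


def run_checklist(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run the steps in order and stop at the first one that fails.

    Returns the names of the completed steps and the name of the
    failing step (or None if every step passed).
    """
    completed = []
    for step in steps:
        if not step.check():
            return completed, step.name
        completed.append(step.name)
    return completed, None


# Hypothetical deployment checklist; the lambdas stand in for real work.
steps = [
    Step("back up database", lambda: True),
    Step("stop old service", lambda: True),
    Step("install new release", lambda: False),  # simulated failure
    Step("start new service", lambda: True),
]
completed, failed = run_checklist(steps)
print("completed:", completed)
print("failed at:", failed)
```

In this example the run stops at the simulated failure, so the last step never executes. That is the property that matters: just as with a paper checklist, nobody skips ahead past a step that did not pass.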

Getting Started With DevOps

DevOps has been defined in this article by Stephen Nelson-Smith, and the executive summary is that operations and development should no longer be separate functions (and never should have been) and need to start working closely together.

Why? Without working together, failures inevitably occur. For example, at the last Boston DevOps Meetup, one of the attendees, a developer, was commenting on the disconnect between him and his sysadmin and how their relationship was unlike the devops model.  

“That all sounds nice for you guys, but my sysadmin at work doesn’t seem to care about any of this. He’s not engaged.” 

The developer went on to give examples of times when the production servers broke the code because the production servers were configured incorrectly, or the ops person didn’t assist in debugging a problem because the ops person felt the problem was the developer’s to deal with.

We all talked about this for a while, when I realized that in another bar in another town, that developer’s sysadmin was saying to his friends, probably over a beer, “That all sounds nice for you guys, but my developers don’t care about any of this. They’re not engaged.” The sysadmin probably went on to talk about how the developers don’t keep their configurations sane and how they never debug the problems they create.

This is why we need DevOps. (And probably, really, DevOpsQASales, but that’s another post).

How do you get started with DevOps at your work?

  • If you are a developer, invite your ops guys to your scrum or weekly meeting. Make sure they come and always ask them if what you are talking about has an impact on their work.
  • If you are an ops guy, invite your developers to your scrum or weekly meeting. Tell the developers about upcoming changes in each environment. Ask the developers what is going on in their world.

If you’re a small team, bring everyone. If you are large, bring one developer from each project, but don’t invite the development managers or the ops managers. Invite the people who do the work. You need to have the developers who will say, “We’re implementing a feature that uses these extra libraries, can they be installed in production?” and the ops people who will say, “Oh, if you use v2.8 of that library it won’t work on the older machines because of x, can you guys use v2.9?”. You want this to happen before you go to production.

Adding meetings to your calendar always sucks, but you’ll save headaches later by talking to each other, and more importantly, you’ll buy into the projects that everyone is working on. You’ll believe in the work others at your company are doing and want to help if there are issues.

The developer above should be talking to his ops guys on a daily basis. They should go for a beer and talk about technical problems at work. I guarantee one of them will say, “Oh for that I just do x y and z, and it works great.” and this will be the solution to a nagging problem.

Technology, of course, can help a lot. In the example above, the ops guy should:

  • Create virtual machine images of production on a regular schedule that the developers must run their code on as part of checkin cycle.
  • Use a configuration management tool such as puppet or chef to keep staging and QA environments matching production environments.
  • Talk to the developers about the network, hardware and software that make up the production environment, including details of the resource limitations of those components.

While the developer should:

  • Write up details of the software’s operating requirements in terms of resource usage, environment configuration, and other dependencies.
  • Package the software properly, so the ops people can review package manifests on upgrades automatically to track changes in QA and staging.
  • Codify operational issues in unit tests (lossy network unit tests, disk full unit tests, out of memory unit tests, blocked port unit tests).
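
As an illustration of the last point, a "disk full" unit test might look something like the sketch below. The save_report function is a hypothetical stand-in for application code, and the full disk is simulated with Python's standard unittest.mock rather than by actually filling a filesystem.

```python
import errno
import unittest
from unittest import mock


def save_report(path, data):
    """Hypothetical application function: write data to path.

    Returns False if the disk is full, True on success, and lets
    any other error propagate to the caller.
    """
    try:
        with open(path, "w") as handle:
            handle.write(data)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False
        raise
    return True


class DiskFullTest(unittest.TestCase):
    def test_disk_full_is_reported(self):
        # Simulate "No space left on device" whenever the code opens a file.
        full = OSError(errno.ENOSPC, "No space left on device")
        with mock.patch("builtins.open", side_effect=full):
            self.assertFalse(save_report("/tmp/report.txt", "data"))

    def test_other_errors_are_not_swallowed(self):
        # A permissions problem is a different failure and should surface.
        denied = OSError(errno.EACCES, "Permission denied")
        with mock.patch("builtins.open", side_effect=denied):
            with self.assertRaises(OSError):
                save_report("/tmp/report.txt", "data")
```

Saved in a test module, this runs with python -m unittest. The same pattern (patch the call that touches the outside world, raise the error the ops team has actually seen in production) should extend to the lossy network and blocked port cases, though those need different calls patched.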

Working together:

  • Use build tools such as Hudson to trigger jobs on checkin – jobs which run the full set of unit tests on a production-like environment/virtual machine.
  • Define hand off procedures for new features, which require checkoffs from ops, QA and development.
  • Push the ops deployment scripts and methods to the developers’ workstations so the developers can use them.
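
The package manifest review mentioned in the developer list is one of the easier pieces to automate. Below is a rough sketch, assuming each manifest has already been parsed into a plain mapping of package name to version; the diff_manifests function and the package names are made up for illustration, and a real version would read the manifests from your packaging system.

```python
def diff_manifests(old, new):
    """Compare two {package_name: version} mappings.

    Returns three dicts: packages added in new, packages removed
    from old, and packages whose version changed (old, new).
    """
    added = {name: ver for name, ver in new.items() if name not in old}
    removed = {name: ver for name, ver in old.items() if name not in new}
    changed = {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }
    return added, removed, changed


# Hypothetical manifests for the release in production and the one in staging.
production = {"libfoo": "2.8", "libbar": "1.0", "oldtool": "0.3"}
staging = {"libfoo": "2.9", "libbar": "1.0", "newtool": "1.1"}

added, removed, changed = diff_manifests(production, staging)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

A report like this, generated on every upgrade in QA and staging, gives ops and developers one shared list of what is about to change before anything reaches production.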

Also, whenever possible you should automate that which can be automated. That’s what computers are really good at.

These are just a few examples. There’s a lot to talk about in later posts, such as more detail on the heavy use of automation, Agile practices in operations, proper storage of documentation (version control! plain text formats!) and more.

I hope this is a useful introduction to some DevOps concepts as I’ve understood them. Please comment below about your own experiences.