3 Reasons why DevOps isn’t changing IT faster

The recent article from Luke Kanies (from Puppet Labs) on Wired.com really got me thinking. Similar to Luke, I have had an interesting vantage point to observe the changing nature of systems administration – spending time as one myself – over the last decade or so. From my graduate physics department, to Loudcloud, EDS, BladeLogic, BMC, and now Sumo Logic, I have seen the best and not-so-great IT shops and how they operate. Not all of those IT teams I saw in action were, or are now, adopting best practices like automation and DevOps. The trend that Luke points out means that operations teams that continue to languish in constant firefighting mode, relying on ad-hoc scripts and the sweat off the admin’s brow, are becoming more obvious for being clearly out of step with the direction of the industry.

So, why aren’t all operations teams, and the techies themselves, falling over each other to embrace tools like Puppet, Sumo Logic, and other DevOps/Automation tools? Clearly some organizations are embracing them. Why not all of them? I have a few of my own ideas here, and I would like to hear yours as well.

1. The move from “Artist” to “Manager” is not natural

Back when I started in IT, most IT admins “owned” a small collection of devices or applications. Server admins owned a handful of servers, database admins a few databases, network admins a few switches and firewalls, etc. They controlled access to their systems jealously, and took personal pride in their operation. They were artists, and their systems, and the way those systems were managed, were an art form. As IT budgets have shrunk, and the load on IT has increased, this level of care has become impossible.

Yet, you still find many admins jealously guarding their root access privileges, instead of moving to a shared responsibility model with other admins. Why? I think it is the same reason why I feel such satisfaction after cooking a meal, building new shelves in my garage, or fixing a leaky faucet. I did it myself, and it feels good to start and finish something. Participating in automated processes can be deeply unsatisfying. That is why admins need to learn new skills, and find new pride in the quality of their automation or find satisfaction in steadily improving quality.

2. Using Automation may seem like losing control

One conversation from my IT past sticks in my mind more than any other. I was on site with a customer trying to explain the benefits of automation to a group of systems administrators. One system admin floored me by insisting that she could more accurately, and more quickly, make changes to 20 UNIX servers than I could ever do with automation. It was like some modern version of John Henry calling me out to a man vs. machine contest, and subtly decrying the inhumanity of my automation tool. I can’t even remember my answer now, but this perspective is at the root of much of the push-back against automation and DevOps. Instead of looking at the business outcome – better experience and value for the customer – some frustrated system admins see these new ideas as a direct affront to the quality of their work. This is precisely why I think the fundamental shift in DevOps is from an internal IT focus to an external customer focus. That way admins can measure their success by customer impact. Not an easy change to make, but it is essential for IT’s continued relevance.

3. Change seems hard/bad/unnatural/unneeded

Isn’t the root of the resistance here really the natural tendency to resist change? On the other hand, how many times have operations teams been assured that the latest IT fad will reduce their workload and improve quality, only to see the opposite happen? So what’s different about DevOps? I think I could write a whole blog entry just on that, but a few things come to mind. First, the focus is on customer value, which greatly simplifies priorities. Second, it’s all about outcomes, not process for the sake of process. Finally, it is all about continuous improvement driven by the experience of the people on the frontline. This means that admins must be rewarded going forward for doing things that increase customer value, rather than putting out fires or pleasing angry executives.

So, it all comes back to culture – surprise, surprise. I think this will be the primary challenge of DevOps going forward. How do we overcome the IT culture so resistant to change, while providing an attractive way for all of those systems administrators to breathe easily in their new roles?

IT is War : DevOps & ITIL through the lens of military history

Climbing into the DevOps vs. ITIL debate is like stepping into a minefield, at least from my vantage point. Both have serious-minded proponents, and engender the kind of passion that lesser methodologies only dream of. But, after this very issue has come up multiple times recently, I really felt compelled to think more about it. Most of the attempts at reconciling the two haven’t resonated with me. So, I have tackled this issue the way that appeals the most to me – using military analogies.

Military history is probably my favorite part of the huge topic of history. I particularly enjoy understanding how advances in weapons, tactics, and organization have influenced the course of events. The evolution of IT over the years has been a lot about adjusting process and tactics to the available tools and competitors. The evolution of military strategy boils down to the same thing, so I think a quick look at the evolution of military tactics over time can shed some real light on the DevOps & ITIL debate.

Discipline and Organization beats Chaos

Success in war usually favors the bold – those who embrace new technology and change their tactics to deal with new situations and threats. A few days ago I watched a documentary on the massive defeat of the invading Persian army by the Greek city states in 490 B.C. What many people don’t know is that it was the technological advantage of iron lances and large shields, and the highly coordinated group maneuver called the Phalanx that gave the Greeks the edge. Moving as one unit, with shields interlocked, spears pointed front, the Greek hoplites were an unstoppable force cutting a swath through the Persians.

Fast forward roughly two thousand years. The Europeans, with their endless wars of the 17th, 18th, and 19th centuries, made an art form of the highly trained foot soldier with a musket. Napoleon represents the pinnacle of that art. Coordinated volleys of musket fire from trained soldiers, supported by well-placed cannon, and fast moving light cavalry, could wipe out less well-equipped and trained troops. Napoleon also worked at a scale unheard of before his time, mastering logistics for hundreds of thousands of troops in the field. With his armies, Napoleon was able to dominate Europe for 20 years, and his ideas lasted longer.

Tools can obsolete Tactics

The effectiveness of these large scale maneuvers was upset by the advances of the 19th century, particularly during the U.S. Civil War. The vastly increased accuracy of rifled muskets and cannon, and the high rate of fire afforded by the caplock (percussion cap) mechanism, meant that the magnificent infantry charges of the 18th century only made for easier targets and mind-blowing casualty rates. This reached a peak with the mindless violence of trench warfare in World War I.

And again, armies adapted. Along with the introduction of the tank, it was small group tactics that won the day and broke the stalemate in World War I France. Instead of ordering breathtakingly stupid charges into the face of machine gun fire, commanders could rely on small squads of soldiers that adapted to the circumstances and advanced more rapidly. World War II continued the refinement of those tactics. This didn’t mean that large scale coordination wasn’t still necessary. Artillery and air support still needed to be coordinated with the soldiers on the ground. Tanks and mobile infantry could move quickly and pack a powerful punch (which the Germans perfected with Blitzkrieg).

Very interesting, but what’s the point?

Other than indulging my need to geek out with talk of weaponry, this rapid flyby of history does have a point. Successful armies over time have adapted their tactics and tools to meet the threats at hand. The ancient Greeks and Napoleon used organization and discipline to overwhelm their enemies. In the face of the devastating weapons of the 20th century, successful armies used more flexible and fast-moving tactics to dominate their slower moving enemies. All of these militaries adapted to their circumstances and made the most of what tools they had.

I don’t think IT is all that different. ITIL made a lot of sense when confronted with the chaos of IT operations, and the need to provide stable services for a business questioning the value being derived from their investment. With well-documented processes and coordination, IT departments could confront and conquer the chaos.

Conversely, DevOps has arisen in the wake of the pressures exerted by a hyper-competitive business environment and hard-to-please users with no end of choices. Just like the soldiers facing rifled muskets at Gettysburg and those facing machine-gun nests in French trenches, IT operations teams trying to please the 21st century Internet user can’t march into battle with the highly coordinated, but rigid, maneuvers of ITIL. By the time they perform the service management equivalent of a pivot, the business has lost customers and revenue. In the words of U.S. General George S. Patton of World War II fame:

“A good plan violently executed now is better than a perfect plan next week”.

On the other hand, DevOps teams need a backdrop of coordinated services (e.g. cloud services or automation) to enable their agile methods, just like the U.S. Marines in World War II needed artillery and aerial support.

So, what’s the takeaway? We should never compare methodologies in a vacuum. Any methodology needs to solve today’s problems, not yesterday’s. The whole point of methodology is to provide a way to repeat the successes of the past. So, you need to find the Von Clausewitz that has succeeded where you want to succeed, and follow their lead. The methodology that best helps you meet your goals today is the right one every time.

Kaizen and the Art of DevOps Automation Maintenance

And now we come to the most “boring” part. Right? Maintenance. The death of joy for the innovator. Or is it? I don’t think so. Continuous innovation is at the core of DevOps and Lean methodology. Maintenance is essential to keeping the spirit of DevOps strong, and automation that isn’t improving will grow stale and useless.

So, let’s review the IT Automation Curator’s job description one last time:

  • Collect existing automation, and then Catalog it where others can find it (See Part 2)
  • Develop new automation based on requirements from IT (See Part 3)
  • Train others on how to use the automated processes (See Part 4)
  • Maintain the existing automation

Going back to Lean methodology, we can look to the idea of Continual Improvement or Kaizen. There are 3 main areas from Masaaki Imai’s 1986 book Kaizen: The Key to Japan’s Competitive Success.

  • Reflection of processes. (Feedback)
  • Identification, reduction, and elimination of suboptimal processes. (Efficiency)
  • Incremental, continual steps rather than giant leaps. (Evolution)

Use Metrics and Reporting for Feedback

How can you improve if you don’t know how far you’ve come and how well you are doing? That’s like going on a diet without ever weighing yourself or even looking in the mirror. Healthy weight loss involves such small changes every week that you would be discouraged if you didn’t look at changes over the long term (I speak from experience). So, why do so few companies and automation vendors include metrics and reporting on automation efficiency?! The whole point of implementing automation is to reduce waste, reduce costs, and increase velocity. But you have no context for understanding whether you have succeeded if you don’t have metrics and reporting (I talked about this in a previous post).
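
To make that a little more concrete, here is a minimal Python sketch of the kind of report I have in mind. Everything in it is an assumption for illustration: the `AutomationRun` record, its field names, and the sample numbers are hypothetical and not taken from any particular automation tool. The idea is simply to track success rate and estimated time saved per job, so that small gains become visible over the long term.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AutomationRun:
    """One execution of an automated job (hypothetical record layout)."""
    job: str
    succeeded: bool
    automated_minutes: float  # how long the automated run took
    manual_minutes: float     # estimate of the same task done by hand


def summarize(runs):
    """Return per-job run count, success rate, and estimated minutes saved."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.job].append(run)

    report = {}
    for job, job_runs in grouped.items():
        successes = [r for r in job_runs if r.succeeded]
        # Only successful runs count as time saved; a failed run still
        # has to be redone by someone.
        saved = sum(r.manual_minutes - r.automated_minutes for r in successes)
        report[job] = {
            "runs": len(job_runs),
            "success_rate": len(successes) / len(job_runs),
            "minutes_saved": saved,
        }
    return report


# Made-up sample data, for illustration only.
sample_runs = [
    AutomationRun("patch-servers", True, 5.0, 60.0),
    AutomationRun("patch-servers", False, 7.0, 60.0),
    AutomationRun("provision-vm", True, 3.0, 45.0),
]

for job, stats in summarize(sample_runs).items():
    print(job, stats)
```

Counting only successful runs as savings is a deliberate, conservative choice in this sketch; your team may prefer a different rule, as long as it is applied consistently week over week.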

Ruthlessly eliminate sub-optimal processes

The whole point of this process is to get rid of wasted effort and time – muda. The hard part here is that once you agree to continually improve your automated processes, you have to be ruthless in your evaluation of their efficiency. That means there are no sacred cows. Just because somebody smart and dedicated invested hours of their life into creating something, doesn’t mean it can’t be improved or scrapped entirely. The whole team has to be dedicated to, and incentivized towards, efficiency and continuous improvement. This point is important – these kinds of improvements come from the grassroots – not the top. If the people in the trenches aren’t bought into continual improvement, it won’t work.

Baby Steps, not Leaps of Faith

I know the hardest part of this approach for me is the gradualism. I like to grandiloquently solve grandiose problems with lofty and visionary solutions. The problem is that most of those involve large amounts of kool-aid, and they are never finished. The truly mature IT organization has to keep its eye on the goals of the business, and relentlessly reduce muda – step by tortuous step. We can refer back to the weight loss analogy. You lose weight through all of the small victories – Do I really need that donut? One serving is good enough. But bringing us back to our first point – small victories only show up as victories when you can measure your long term progress. Otherwise it looks like tentative, timid, risk-averse behavior.

And so, we reach the last chapter of my IT Automation Curator series. It has been a lot of fun writing it, and I hope that you enjoyed it as well. I am looking forward to continuing to explore how the proven methods of lean and agile can be applied to DevOps and Operations overall.

Training others in the dark arts of DevOps Automation

If you are the Automation hero, why would you EVER share that stage? You are basically reducing your value to the organization by sharing your secrets. Right? Wrong! By hoarding them you are actually doing yourself a lot of harm, as I discussed in the first blog post. How can you move on to other exciting challenges if you have to maintain your work of automation genius?

That is why the IT Automation Curator’s job description has training as a core requirement:

  • Collect existing automation, and then Catalog it where others can find it (See Part 2)
  • Develop new automation based on requirements from IT (See Part 3)
  • Train others on how to use the automated processes
  • Maintain the existing automation

I had a great comment from jamesmarcus in the first blog post. Here is what he said:

“As a Director of IT I look to tools that promote easy automation, documentation, and best practices. I try to design networks and setups with the “if I disappear” rule in mind. Meaning another sys admin of lesser knowledge should be able to look at my work and understand how why we did something in a certain way”.

I think this is a great perspective at the core of why I included training in my job description. Very few programmers, sysadmins, and other IT techies enjoy documenting their work. I don’t either – when it is after the fact. It is so mind-numbing to document your automation after you are already done and want to move on. So, that brings us to our first point.

Build your Automation to be well-documented and reusable

While performing amazing feats of scripting judo can impress your colleagues and get you kudos online, it is not a good long-term objective. One thing I learned early as a programmer is that creating incredibly efficient and elegant code seemed great, but it was really bad if even I couldn’t figure out what I had done a year later. That all comes down to great comments while you are writing the code. I know this may seem basic, but I have seen too many IT organizations with automation scripts, packages, etc. that no-one understands anymore. This is essentially a guarantee that the automation in question will be left alone to become outdated, brittle, and even “dangerous”. And if you are the only one that understands it, then it is your burden to bear.

So, this is basic to the training function of the IT Automation Curator. How can you possibly train people if your automation is over-complicated, un-documented, and impenetrable to mere mortals? Only with difficulty, and no one (especially you) will enjoy the experience. By documenting your automation very well as you write it, and building it to be as straightforward and simple as possible, you increase your chance of handing it off successfully. Writing well-documented and straightforward automation has to be part of your process – bottom line.
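
As a small illustration of what “documented as you write it” can look like, here is a Python sketch. The function, the paths, and the back-story in the docstring are all hypothetical; the point is only that the why, the inputs, and the edge cases are written down next to the code while they are still fresh.

```python
from datetime import datetime, timedelta


def select_stale_logs(log_files, max_age_days, now=None):
    """Pick the log files that are old enough to archive.

    Why this exists (hypothetical back-story): disks kept filling up
    because nobody remembered which logs were safe to remove. The rule
    lives in one place so it can be reviewed and changed without reading
    the rest of the script.

    Args:
        log_files: mapping of file path -> last-modified datetime.
        max_age_days: files older than this many days are selected.
        now: reference time; defaults to the current time. Passing it in
            explicitly makes the function easy to test.

    Returns:
        Sorted list of paths whose last-modified time is before the cutoff.
    """
    if now is None:
        now = datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    # Strictly older than the cutoff: a file modified exactly at the
    # cutoff is kept.
    return sorted(path for path, modified in log_files.items() if modified < cutoff)


# Usage with made-up paths and dates.
reference = datetime(2024, 1, 31)
logs = {
    "/var/log/app/old.log": datetime(2024, 1, 1),
    "/var/log/app/recent.log": datetime(2024, 1, 30),
}
print(select_stale_logs(logs, max_age_days=7, now=reference))
```

Note that the function only decides what is stale; it does not delete anything. Keeping the decision separate from the action is one simple way to make automation easier to explain and safer to hand off.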

Find automation disciples, and train them in the dark arts

While I see no reason why you can’t “teach” a class on automation as part of this role, I don’t think that is optimal, or even desirable for most people (stage fright anyone?). I have always envisioned a much more personal approach to automation training. Not every sysadmin, administrator, or IT techie extraordinaire will have an aptitude for, or interest in, designing automation. The right person has a somewhat rare combination of programming know-how, patience, troubleshooting skills, and IT systems knowledge. Obviously, everyone will use the automation, but only a few will write it.

The IT Automation Curator should be a mature, senior IT operator that has an eye for spotting talent. Like Mr. Miyagi in Karate Kid, you can watch for the young IT admin with lots of promise and fire in their belly, but unable to conquer the IT problems with their lousy karate skills. In all seriousness, I think mentoring promising candidates on automation best practices is more enjoyable and effective than the typical shotgun approach. The best part is that you can let the young upstart take care of the boring automation bits, while you save the best for yourself!

So, in summary:

  1. You can’t pass on automation that is impenetrable to anyone but yourself
  2. One-on-one mentoring is a much more effective way to pass on your automation skills and knowledge

So, here is a parting challenge for all of you out there that actually remember Karate Kid. What might the IT Automation equivalent be of Mr. Miyagi’s “catch a fly with chopsticks” trick?

IT Automation Curator for DevOps – Part 2 – Collect and Catalog

This topic is far too interesting and deep to cover in just one blog post. So, I am going to split the discussion into a few sections. I’ll use my proposed “job description” for an IT Automation Curator as a starting point:

  • Collect existing automation, and then Catalog it where others can find it
  • Develop new automation based on requirements from IT
  • Train others on how to use the automated processes
  • Maintain the existing automation

This first step of collect and catalog is where I have seen many automation efforts stumble. The natural inclination of most techies (myself included) is to jump right into developing automation, no matter what is in place. As I learned the hard way, that is a bad idea. So, I will give a few reasons why this step is important:

Reason #1: If you don’t know about all the automation in place, you don’t really understand how your data center is operating

It’s great that you developed that new automated process that auto-magically deploys a set of configurations for you. Are you sure that other scripts or tools won’t change it or corrupt it? Most IT teams have scripts strewn all over the place – some well known, some the detritus of sysadmins past. They may have been scheduled centrally or on individual servers. This is very hard to get a grip on. There are a few tools out there, but it is hard to ensure that you have found all automation spread over all the systems. This is just another reason why you need to control access and even re-build some servers from scratch (hopefully in an automated way).
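
A first pass at collecting what is out there can be quite simple. The Python sketch below is only a starting point under stated assumptions: it treats any file beginning with a shebang line as a script, and the function names (`catalog_scripts`, `find_duplicates`) and the demo files are hypothetical. A real inventory would also need to look at cron entries, scheduler exports, and scripts without a shebang.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def read_shebang(path):
    """Return the interpreter line (e.g. '#!/bin/sh') or None if absent."""
    try:
        with open(path, "rb") as handle:
            first_line = handle.readline(256)
    except OSError:
        return None  # unreadable file or broken link: skip it
    if first_line.startswith(b"#!"):
        return first_line.decode("utf-8", errors="replace").strip()
    return None


def catalog_scripts(root):
    """Walk `root` and list every file that starts with a shebang line."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            shebang = read_shebang(path)
            if shebang is None:
                continue
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            entries.append(
                {"path": str(path), "interpreter": shebang, "sha256": digest}
            )
    return entries


def find_duplicates(entries):
    """Group paths by content hash so identical copies stand out."""
    by_hash = {}
    for entry in entries:
        by_hash.setdefault(entry["sha256"], []).append(entry["path"])
    return [paths for paths in by_hash.values() if len(paths) > 1]


# Demo against a throwaway directory so the sketch is safe to run anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "backup.sh").write_text("#!/bin/sh\necho backup\n")
    (Path(tmp) / "backup_copy.sh").write_text("#!/bin/sh\necho backup\n")
    (Path(tmp) / "notes.txt").write_text("not a script\n")
    demo_entries = catalog_scripts(tmp)
    demo_duplicates = find_duplicates(demo_entries)

print(len(demo_entries), "scripts found;", len(demo_duplicates), "duplicate group(s)")
```

Hashing the contents is a cheap way to spot the same script copied onto many servers, which is often the first thing worth consolidating in a shared catalog.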

Most IT operations teams also have multiple automation tools in play. Each siloed team has their preferred tool, which they guard jealously. Overall, this is not a good approach. The more tools you have, the harder it is to standardize automation and create efficient end-to-end processes. At a minimum, all of these tools need to be documented and managed centrally.

Reason #2: Don’t duplicate work and ignore experience

A lot of the automation in place may not be optimal, but it was most likely built to solve the same problems you will need to solve later. Tossing it out, or just ignoring it, is essentially disregarding the combined experience of the IT team. Even if you rebuild it in a better tool, and in a more efficient way – the lessons learned will be valuable.

There is also an important lesson here about prioritization. Just because you can make an automated process more elegant or more efficient, doesn’t mean you should. More often than not you will have no end of automation projects to look at. Why spend your time on what already works? What is important is to apply automation judiciously, where it provides the most value for the business.

Reason #3: More sharing will always lead to better results

Fostering a culture of sharing automation, essentially an open-source culture, will ensure that everyone has access to the best work on offer, that they don’t re-invent the wheel, and it will allow for continual improvement. That last point is crucial. The idea is not for the automation curator to control all the automation per se. They should be catalysts for making better automation, whether they do it or not. So, it is important to leave one’s ego at the door, and admit that your automation becomes better when you let others critique it and improve on it.

Bottom line, having a central place to share and continually improve automation is essential. This will most likely affect your choice of automation platforms as well. If you can’t share and improve, then you will be hobbling yourselves.

So, how do you do this in your own environment? Do you have ideas about the best way to go about it? Any success stories?

IT Automation Curator – Good for techies, good for business, good for DevOps

Recently my thoughts have been going back to a concept I like in the seminal IT operations book, The Visible Ops Handbook (By Gene Kim, Kevin Behr, and George Spafford). I have been doing a lot of thinking about how Lean, DevOps, Agile, etc. are changing IT culture, or at least pressing for change. Properly leveraged automation is a big part of that change process – which makes me think of the passage in Visible Ops where the authors discuss changing the behavior of senior IT staff:

“Their mastery of configurations continually increases while they integrate it into documented and repeatable processes. We jokingly refer to this phenomenon as ‘turning firefighters into curators’ […]”*

As a former IT techie myself, I get the need to challenge oneself in the often routine and monotonous world of IT. Personally, I think that is a lot of the grass-roots impetus behind the DevOps movement, and the adoption of open-source automation tools. Creating automation is a way of turning the mind-numbingly mundane into something exciting and intellectually challenging. So far so good. Boredom leads to sinking morale and productivity – poor morale is bad for business.

So, what’s not to like? In short, it goes back to focus and sustainability. No, I’m not talking green-energy windmills. How do you sustain and focus the efforts of these budding automation aficionados? Left to their own devices, they will likely create lots of useful, but narrowly directed scripts, packages, etc. All of these will be focused on the problems they face on a daily basis. For the problems outside of the automation guru’s gaze – those problems will most likely remain unsolved.

So, this is where the idea from Visible Ops comes to the rescue. The answer is that we pull these gurus out of their day-to-day grind in the IT trenches, and make them automation curators. Now, I know that many of you hear curator and think of an older man in a tweed jacket, peering over horn rimmed glasses, waxing rhapsodic about the various manufacturer stamps of 18th-century American chamber pots. So, as interesting as early American port-a-potties may be, let’s look at the definition of curator:

curator – one who has the care and superintendence of something (Merriam-Webster Dictionary)

Clearly tweed is not mentioned. In all seriousness, museum curators do much more than merely talk about old things. Considering the Smithsonian’s own description, curators:

  • Acquire new items for the collection
  • Research the collection
  • Display the collection
  • Maintain the collection

So, if we work off the Smithsonian’s “model”, I suggest that an IT Automation Curator would:

  • Collect existing automation, and then Catalog it where others can find it
  • Develop new automation based on requirements from IT
  • Train others on how to use the automated processes
  • Maintain the existing automation

This kind of role is exactly what I wish someone had offered me early in my career. I would have jumped at it. It would have been a great new challenge for me, I would have been creating value for the business, and IT would have been more efficient. And this isn’t really a new idea. Software developers have long needed to share code snippets and concepts with each other, and they defined the interfaces between code as well. The trick here is that the Automation Curator needs to take an active role in both building the best automation and also in promoting the proper use of automation in IT.

One last comment. We might ask if this would be better classified as an Automation Librarian. I think it is a good question. At the end of the day, I think the existence of the position is more important than what you call it. However, in my mind the concept of curator leans more towards the acquisition, development, and training part. The words Library and Librarian in IT seem to lean more towards the maintenance and storage part of the equation (notwithstanding what traditional librarians actually do). Curator is also a cool word.

So, why aren’t more IT shops doing this? What do you think?

This is the first part of a multi-part series.

* Kim, Gene; George Spafford; Kevin Behr (2005-06-15). The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps (Kindle Locations 917-919). IT Process Institute, Inc. Kindle Edition.