The Metamorphosis: From Developer to DevOps
I would like to share a story about our metamorphosis: how we changed from Developers into something that is a Developer, and a Site Reliability Engineer, and a Security Person, and maybe even a Business Analyst.
This did not happen overnight. It took about five years.
First steps into DevOps
We were working for a big international corporation, where money was never an issue. In those days there were some smart enthusiasts envisioning a new platform. A platform built from the ground up. A platform that uses all the new and shiny technology, principles and just everything that was climbing the hype scale. I was lucky to be part of the team that was appointed to deliver this somewhat crazy vision.
Among the vast number of requirements, the so-called Product Management
defined Continuous Delivery and DevOps as something that we must achieve.
At that point I had no real idea what those were.
Sure, I had heard about both, but now I know that I had no clue what they
were, or whether they were interconnected or not.
Anyhow, I had a problem to solve.
I started to hunt for presentations on those topics, blogs, etc., but I was missing the concepts behind them.
So what I did was buy a book and start to read. It was the Continuous Delivery book by Jez Humble and David Farley. A great book that changed the way I think about software development. Actually, this book convinced me that DevOps is the right way (or at least a better way). Oh, not the DevOps you hear about nowadays. No, the original one, where developers, operators/admins, testers, security folks, etc. all work in the same team and share the same faith: deliver a service the customer likes, on time, within budget.
At that point the metamorphosis began. Slowly I was changing, and I changed the people in my team and the whole environment around me.
In the meantime, the project failed. What were you expecting?
What were we expecting?!
These kinds of all-in, big-bang projects never work out. The company
had meanwhile forgotten about the real business requirements and
client needs; the top management just saw a Rainbow Unicorn, and was
blinded by the rays. :-)
Departure to implementation
A part of my team moved to a new company together, and with a rationalized scope, a pragmatic approach and some real clients and projects ahead, we carried over the knowledge and mentality. In less than a year, we started to deliver our first SaaS project on behalf of our customer to their clients.
All done in the spirit of DevOps using Continuous Delivery.
We’ve changed. We’ve changed a lot!
What changed? To see some aspects of the change, we need to go back in time.
Back in time, when DevOps was not a thing yet
I remember myself from around 8 years ago.
I was working on a product that had a requirement of full text search.
I was so glad this popped up, as I was inspired by Elasticsearch,
and I wanted to try it. Shiny, cool, new stuff!
So I introduced Elasticsearch in less than a month.
It was all happiness and rainbows, until we tried
to sell it to a customer. When the customer realized he now
needed at least 4 times more hardware, he wanted that item out.
But it was not that easy to take it out, as it would actually
mean disabling the full text search. So our Engineers worked hard
to enable this stuff somehow on smaller hardware.
It somehow worked out, but we sacrificed a few things, like speed,
reliability, resilience, etc.
I was sad. I thought that the next customer would be more willing to take it. So we went ahead and tried to understand the inner workings of Elasticsearch, so that we could help our customers deploy it with confidence. It was hard. Our developers, with the help of our administrators, worked for more than 3 months to come up with some documentation and an example configuration that worked reliably in our environment.
This, as well as other similar issues, made me sceptical about the way we work. About the way we love to approach a problem as developers.
Developers’ passion
We developers, or at least some of us, just love technology. We do fall in love with the new shiny stuff, with the latest database technology, or sorting algorithm, or whatever else. Sure, we tend to have our preferences. For example, one of my peers loves algorithms and databases, another one is a pattern purist, while I’m in love with architecture, concurrency and distributed systems.
We all fall for our weakness: we see a new sexy thing, we fall in love, and tomorrow’s project has it.
We never, or only rarely, ask for permission; we are the experts, they used to say. We do not consult with operations; why would we, they just need to operate it. :-) We forget to ask security, as we used to think security comes on top of it. We don’t write tests ourselves; the test department will test it…
It’s all your responsibility now
But… now YOU are operations, security and the testing department as well. Nobody else but YOU.
That is a turning point. You now need to think it all over again.
So you want to introduce Kafka. Fine, that would be another service you need to operate. Oh, and the SLA will be impacted as well: let’s assume we can get Kafka to an SLA of 99.95%, and our application is already at 99.82%. Since the application now needs both to be up, the availabilities multiply, so adding this piece would mean dropping to about 99.77%.
To be able to achieve 99.95% on Kafka you need to deploy it in a cluster across at least two availability zones. Huh, how do I do that? How do I deploy even one instance of Kafka? Oh, damn, I do not want to operate it, it’s too hard, let’s find a fully managed service that can be operated “near” our data center, or it will be slooooow. OK, but that would cost us too much. So either we operate it ourselves, or we cannot have it… Usually at that point you go back to the drawing board and rule out that item.
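To make the SLA arithmetic from the Kafka example concrete, here is a minimal sketch in Python. It assumes the simplest possible model: the application needs every dependency to be up, and failures are independent, so the availabilities multiply. The function names are made up for this illustration, and real SLA calculations are usually messier than this.

```python
def composite_availability(*availabilities_percent):
    """Availability (in percent) of a chain of services that must ALL be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    result = 1.0
    for pct in availabilities_percent:
        result *= pct / 100.0
    return result * 100.0


def downtime_hours_per_year(availability_percent, hours_per_year=365 * 24):
    """Expected downtime per year, in hours, for a given availability."""
    return (1.0 - availability_percent / 100.0) * hours_per_year


app = 99.82    # our application today (figure from the text)
kafka = 99.95  # what we assume we can reach with Kafka (figure from the text)

combined = composite_availability(app, kafka)
print(f"combined availability: {combined:.2f}%")  # combined availability: 99.77%

extra = downtime_hours_per_year(combined) - downtime_hours_per_year(app)
print(f"extra downtime per year: {extra:.1f} hours")
```

Every new piece in the chain can only pull the number down, which is why each new service has to earn its place.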
When you do operations, you want a small number of well-known services. That is how you can have a high SLA number, find people to operate them, acquire the knowledge, etc.
I do not consider myself an SRE or the like; I don’t feel I am there yet. But I do operate a handful of services for more than 700,000 clients, with 10,000 concurrent users at peak time. I get big help from my peers on the application operation side, but on the other side it’s usually just a few of us.
The best part is that cautious decisions, a great architecture and a fully automated SDLC make me a relaxed person, who actually spends less than 5% of his time on operations.
Just for those who are not familiar with numbers in operations: operating an application of similar size took around 15 full-time employees a few years ago at one of our customers’ sites. So, I think we are doing well.
Credits
Background Photo by Boris Smokrovic on Unsplash