SRE Story with Sathya Bhat
Learnings, musings, taking your chances and much more!
Today we have Sathya Bhat sharing his SRE story with us. I came across Sathya's work from the open-source community and his books. Many people also recommended him as a perfect candidate for SRE Stories.
Sathya, why don't you introduce yourself?
Sure, my full name is Sathyajith. But nobody calls me that. Everyone calls me Sathya. I've been working for nearly 18 years; time flies when you have been working. I am based out of Sydney these days.
It is amusing how I started because, in my classes, I was the go-to guy for anything related to computers. That sort of went to my head. I flunked the Computer Organization paper. I thought I knew everything and could answer all questions, but the exam wants you to respond in a certain way and not like how you know it. But anyway, I started working in 2007 and joined a small company, 3i Infotech. They used to build insurance software, and I started working as a trainee there. I used to write patch notes back in those days. People called them either release notes or patch notes. Our software was essentially database-based, built from database procedures and triggers, and we used Oracle Forms as the frontend. I used to write the weekly release, zip up the branches, etc.
That started me steering toward what is now known as SRE, because I gained a knack for being the person to go to if somebody had trouble. I discovered not everyone was interested in figuring out why something was broken. They just wanted it to be fixed, and my curiosity was more about why it was broken. I didn't care much for fixing it, and that's what I've always been interested in since. They would contact me for obscure bugs that were not easily reproducible. So I became the go-to person for those things.
Around 2009, I started exploring other things apart from SQL and dabbled in Python, Ruby, and C#. Just did random stuff. GitHub was launched in 2008 or 2009, so late 2008 or mid-2009 is when I created a GitHub account. It was cool then; I had no idea it would be at the forefront of today's software development world. I didn't have much to share on GitHub at that time.
Sathya's GitHub profile: it has definitely grown now!
I continued working on insurance and databases till 2015. By then, I had started getting bored because it was the same repetitive work. I had been working at service companies till then. The problem with those companies is you can have the best time, or you can have the worst time, depending on the manager. Fortunately, I have always had a good manager throughout my career. But I was getting bored and reaching the limits of my ability. I wanted to keep myself interested in different things, and in early 2012 or late 2011, I moved to Bengaluru. Around that time, I started helping Barcamp Bengaluru.
Barcamp Bengaluru 2023 edition is happening on May 20 and Sathya is still involved!
I was running my website on a dedicated domain name by then. I was very familiar with the whole terminal lifestyle and Linux things, but I had yet to use them professionally. When I started helping at Barcamp, one of the organizers asked me if I wanted to handle the server because they were using a shared host. This was about the time when Barcamp was getting really, really popular. The shared host could not handle the traffic we were getting. I had done a similar migration from a shared host to a VPS, so I took up that role, and that's how I got into professional Linux server management.
Back then, there were no junior DevOps positions, and if you wanted to learn about AWS or the cloud, you did it on your own personal time, so it was tough. It was like a chicken and egg situation.
I went to a couple of interviews for a DevOps position. I bombed them severely because I essentially said no, no, no, no to all the questions they asked. This company was well-known, and I knew many people working there. So it wasn't very pleasant, but such is life. But soon after that, because of Barcamp, a mutual friend discovered I was interested in moving toward DevOps as a full-time professional job. I had told a friend to let me know if he knew someone looking for a DevOps person, and another friend had told the same guy that they were looking for one. The irony is that we had all known each other for a long time because of Barcamp. Also, it was funny that I had taken the lead in organizing Barcamp but did not attend it because of a family function on the same day. So I did everything, but I was not there for the actual event. But that's how it happened. I knew Prashanth from Barcamp and early Twitter. He gave me a chance at StyleTag even knowing I didn't have actual experience with the cloud.
Prashanth is now the CTO of AntStack.
And that's how I started with DevOps and Cloud. I was that person who had never logged into the AWS console before the job. On my first day, they asked me to create a backup of our RDS instances. The general manager of Engineering saw me exploring and said no, no, don't do this during peak traffic. Later on, I learned that RDS stops all processing and then takes a snapshot while doing a backup, so it is disastrous to do it live. That was the level of my ignorance of the cloud. I was only there for nine months, and the company shut down soon after. But I learned so much in the first six months because I had some fantastic mentors. By the seventh month, I had automated myself out of the job, which has been my goal since then: do enough so that people don't have to rely on you. Many people believe their job is safe if they keep some dependency, but that's the wrong way of thinking.
Do enough so that people don't have to rely on you.
After that, I joined the API Gateway team at Adobe, where I started with SRE work. We were the internal developer platform for all of Adobe. Plus, we used to run the API Gateway for Adobe. It was an incredible five years, so much so that I didn't want to leave the job. But I had moved to Romania by then, and Romania is a fine place, but it was not gelling well with us. So, I decided to leave after five years. We were doing good SRE work then, along the same lines as how Google defines it in the SRE book. We had an on-call setup as well. Of course, that was the painful part :) But there was so much to learn from many amazing folks as we had an open engineering culture. Most of the engineering repos and documents were open for people to read, understand and comment on.
From there, I moved to The Trade Desk. Officially, I'm still an SRE. But it is closer to platform engineering because we run a platform for our internal teams to onboard to the cloud via a self-service platform rather than us doing any of it. But yeah, that's a really long line of introduction :)
No, this is great; it gives an excellent overview of your journey. How do you look at Platform engineering vs. SRE vs. DevOps?
Yeah. So you notice different definitions of what platform engineering is, and it's nebulous in the same way that there are different definitions of what SRE is, what DevOps is, etc. Many people ask me if DevOps is good or SRE is good. And my answer is: don't go by the title; you should ask what your day-to-day role will be. Because I've seen companies that have packaged sysadmin roles as DevOps and SRE roles because they couldn't hire anyone otherwise.
So similarly, platform engineering has a lot of different definitions. Which is the right one? Back at Adobe, we were also building stuff close to platform engineering because we provided a unified experience to other teams. It is about making it easy for teams to put their services on production.
Making it easy for teams to put their services on production — that is platform engineering.
You abstract all the implementation details, and you provide sane defaults. That's the kind of work we were doing at Adobe. Suppose a team wants to bring their service to the cloud. In that case, we have built self-service mechanisms for them so they can run on the cloud without knowing about the internals of AWS or the cloud. It is similar to what Kubernetes provides to some extent, where you specify x amount of CPU and memory, and it does the rest.
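To make "abstract the implementation details and provide sane defaults" concrete, here is a minimal Python sketch. It is not Adobe's platform or any real API: `ServiceRequest`, `render_deployment`, and every value in `DEFAULTS` are hypothetical names and numbers chosen for illustration. The team supplies only a name plus CPU and memory, and the platform fills in everything else.

```python
from dataclasses import dataclass, field

# Hypothetical platform defaults; none of these values come from the interview.
DEFAULTS = {
    "replicas": 2,
    "log_format": "json",
    "metrics_path": "/metrics",
    "regions": ("region-a", "region-b"),
}


@dataclass
class ServiceRequest:
    """What a product team fills in: a name plus CPU and memory."""
    name: str
    cpu: str
    memory: str
    overrides: dict = field(default_factory=dict)


def render_deployment(request: ServiceRequest) -> dict:
    """Merge the team's request with the platform defaults into a full spec."""
    # Only settings the platform already knows about may be overridden.
    unknown = set(request.overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported overrides: {sorted(unknown)}")
    spec = {**DEFAULTS, **request.overrides}
    spec["name"] = request.name
    spec["resources"] = {"cpu": request.cpu, "memory": request.memory}
    return spec


if __name__ == "__main__":
    print(render_deployment(ServiceRequest("checkout", cpu="500m", memory="256Mi")))
```

Rejecting unknown overrides is one possible design choice: it keeps the surface the platform team has to support small, which matters when the team is not large.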
At The Trade Desk, it is different. Because of data sovereignty and client confidentiality issues, some teams only want to run their workloads on a specific cloud. So the development teams come to us saying that they want their service to run on the cloud and it has to be Azure, for example, but they don't care about the details. And that's what we provide.
I have seen in some discussions that platform engineering means people are building their own platforms instead of reusing what the cloud provides. I can see why they would want to do it, but it's like being on a treadmill, and the treadmill is just going too fast. People will always want more and more from you. If your team is not sized sufficiently, you're always running against that. Sooner or later, you will fall off that treadmill because you can't keep up with what clients are asking of you.
So, instead of rebuilding the platform, providing good abstractions makes platform engineering.
If it means giving them an easy way to expose their metrics and enable unified logging, then so be it. There is an excellent talk from ArgoCon 2022 last year on how we did this at Adobe using Ethos, the platform the team built on top of AWS, Azure and data centers.
With this, a person new to Adobe's abstractions could get a system in production with multi-region deployment in under 30 minutes. That's a gold standard for me and shows how amazingly the platform engineering work can be done.
You are active in the community and have written books and a lot of content on the blog; how do you find time for it?
Now it's reduced a lot; I used to do a lot more. I don't have any set timetable or schedule. I don't do journaling; I don't get up and do meditation. I'm pulling the legs of people who claim to do all this. It depends on what you're interested in. Whatever I've done has been a natural extension of what I'm interested in. One of my fundamental beliefs is that I have always liked helping people.
Why do I do it? Because people have helped me out at different times, which has made me a better person in terms of getting a better job, making me understand things, and giving me a chance. The blog posts are also self-documentation, written for selfish reasons. They are for me; if others find them interesting, that's good.
I used to do at least a hundred blogs a year from 2008-2009. But that was also when Twitter was just in its infancy, and my blog post would essentially be what my tweets were. And recently, I reduced that, which I am trying to fix.
As far as the books are concerned, it was, again, pure 100% coincidence/luck. For my first book - the editor was looking for people to write books, and he left a comment on my blog. And the funny thing is that message was in spam for two months. I didn't even realize it. It was marked as a spam comment. And then, two months later, I was looking at the spam comments. I saw the name and realized that it was from Apress. Then I immediately reached out to him again :) and that's how the first book happened.
Sathya's first book - Practical Docker with Python: Build, Release and Distribute your Python App with Docker
For the second book, it was, again, pure luck. I was helping with the planning for CDK Day, and some people mentioned that publishers had approached them to write a book on CDK. None had time to write an entire book, but they were ready to write a few chapters. They also wanted some feedback on publishing the book, and I gave them my experience working with Apress. They invited me to write a few chapters, and that's how the second book happened. You should use the chances you get because you don't know when you will get them again.
Sathya’s second book - The CDK Book.
Use the chances you get because you don't know when you will get them again.
Make the best of whatever opportunity comes your way because you never know when it will return. My first few jobs were also because of the network I built on early Twitter around 2006-2007. The crowd was tiny, the same way it is nowadays on Mastodon.
I just talked to many people and made a network with them. Things don't happen overnight. They can take some time, but the dots can connect.
You mentioned that so much content is out there these days. How do you keep yourself updated with new things or new trends?
It is mostly catching up on Twitter and Reddit. I follow r/devops, r/aws, r/sre. But communities are my primary means of keeping up with the news, which is why I'm in the Last9 Discord as well: I like being around communities, especially niche communities like this. That's the best way to keep up with things because you can't keep up with newsletters; too many people are writing about too many things, which makes it really difficult. This is weird to say because I am trying to decide if I should start another AWS newsletter. So Reddit, Hacker News, Twitter, some niche communities, and meeting people at conferences and meetups are my primary sources. That's also one of the worst things about moving across countries. You completely give up the network that you built. It becomes difficult to rebuild the network, especially if you're older. You don't have the same capacity to meet people that you had as a young person.
Any conferences you are looking forward to attending this year?
I will definitely be going to AWS re:Invent.
All right, some rapid-fire questions, Vim or Emacs?
I've never used Emacs in my life. I use Vim all the time, and I'm not even a power Vim user. I do just enough to get by.
If you were not an SRE, what would you be?
An Auto Driver? I used to think of myself as an Auto driver as a kid. On a more serious note, I may be a teacher because I like helping out people.
How do you take time away from work and recharge?
I sleep a lot. I love listening to music. I'm a big fan of classic / indie rock, and I also play many games. I just got the Steam Deck. I had a PS4, but I sold it when moving across countries. Now I'm thinking about whether I need it or not. I also have a Switch for my portable gaming, but since I have the Steam Deck now, I will give up on it. So a lot of gaming, a lot of sleeping.
What strength should someone have to become a good SRE?
Patience. You need a lot of patience. You need the patience to look through logs and information to understand what's happening and why it is going wrong. I lose my patience once in a while. Still, on the whole, I'm quite a patient person, and that's helped a lot in this job because otherwise I can't imagine sitting through an eight-hour video call to figure out what's happening or what's not happening.
What is the longest on-call or war room shift you had?
I would need to look up the counts of my on-call shifts at Adobe. To give you context on how on-call used to work on my team at Adobe, a person would be on call 24/7 for a week. It was a rotation shared between teammates, so once every seven weeks you would be on call 24/7. Usually, there used to be a code and deployment freeze during Christmas. I would take up on-call around that time as most of my other colleagues would be with their families for Christmas, and for me it was just another week. Usually, nothing used to happen around that time. I thought it would be the most silent on-call ever, but it turned out to be one of the worst it could ever be. I remember this because I had got my M1 Mac about two weeks before. I joined the call at a hundred percent battery, and when I got off it, the battery still had 20% left. And this was after being on a video call for 8 hours.
The incident was also terrible. There was a Kinesis outage. We lost all observability. We couldn't do any Route 53 updates because, internally, AWS uses Kinesis as a message bus, so our DNS updates were not going through. Every time people asked for an update, we used to give a standard update: we are working on it and will let you know. I don't get to have that kind of incident now, and part of me is happy about it because I don't have to worry too much, and part of me wants to know more; it's FOMO. We have on-call these days, but only during work hours.
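The weekly rotation Sathya describes, where each teammate is on call 24/7 for one week and the turn comes around again once the whole team has had a go, can be sketched in a few lines of Python. This is only an illustration: the function name `on_call_for`, the example names, and the start date are hypothetical, and real rotations also handle swaps and holidays, which this does not.

```python
from datetime import date


def on_call_for(day: date, engineers: list[str], rotation_start: date) -> str:
    """Return who is on call on `day` for a simple week-long rotation.

    `rotation_start` is the first day of the first engineer's week.
    """
    if not engineers:
        raise ValueError("the rotation needs at least one engineer")
    if day < rotation_start:
        raise ValueError("day is before the rotation started")
    # Whole weeks elapsed since the rotation began decide whose turn it is.
    weeks_elapsed = (day - rotation_start).days // 7
    return engineers[weeks_elapsed % len(engineers)]


if __name__ == "__main__":
    team = ["asha", "bo", "chen", "dev", "eli", "fay", "gus"]  # hypothetical names
    print(on_call_for(date(2020, 12, 25), team, rotation_start=date(2020, 11, 9)))
```

With seven people in the list, each person is on call one week in every seven, which matches the cadence described above.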
Thanks a lot Sathya for sharing your story with us, where can people find you?
Readers, if you would like to ask Sathya any questions, do reach out to him on social media. If you want to feature on SRE Stories or want to nominate someone, do let me know on Twitter.