Monitoring your network has become much more than up/down monitoring. Nowadays, IT is not just tasked with monitoring servers and bandwidth; businesses are now in the cloud. Microservices underpin many of today's applications, and it all needs to be monitored. But why? Well, mostly because your users expect things to work so they can get their jobs done.
Greg: I'm joined today on the phone by Mike Julian, a monitoring expert, author, and founder of Aster Labs, where he works as a monitoring and visibility consultant. Mike is also the editor of "Monitoring Weekly", a weekly email newsletter about all things monitoring. You can sign up for free at weekly.monitoring.love. Mike's book, Practical Monitoring: Effective Strategies for the Real World, comes out this fall, and we are lucky enough to get an inside peek at the strategies it will cover. Thanks so much for joining us today, Mike.
Mike: Yeah, thanks for having me.
Greg: So Mike, for the audience, can you give us a little bit of your background as an IT professional and what has led you to being the monitoring guru you are today?
Mike: Yeah, absolutely. So in 2006, this is where my journey starts. I was working for a small little school and they had a bunch of printers. And these printers kept going down, just all the time. And you know, the school was kind of big and it was kind of a pain to walk across the whole campus just to plug a network cable back in.
And since then I've gone on to work at a bunch of enterprise companies. I used to run the NOC for Oak Ridge National Lab. They had twenty-five thousand devices on their network. The scale was interesting, and you ran into some really fun problems.
So since then, I've gone on to work for Peak Hosting, did some similar stuff there with data center management and monitoring. And Taos Consulting, where I helped companies like Airbnb and Consensus Corporation do their monitoring. So now I'm an independent consultant.
Greg: Wow. So those are some pretty prolific companies. I mean they're definitely disruptors, I would call them. And it's funny you bring up the printers because when I worked in IT, printers were the bane of my existence.
Mike: Aren't they for all of us?
Greg: So in your book, you cover not only how you monitor infrastructure but also the applications and microservices that form much of the backbone of the apps we use today. When we think of monitoring, generally the first thing that comes to my mind, at least, is hardware. But with today's cloud environments and connected apps, there's much more to watch. How has this changed the way IT monitors networks today?
Mike: Yeah, so we can go any number of different directions with that. Do you want to talk about just networks or do you want to talk about applications?
Greg: Actually, I'm kind of interested to hear a little bit more about the microservices and virtual environments because that seems to be the new thing that IT teams need to tackle these days.
Mike: Yeah. So I'm actually writing the section on microservices; I'm working on it this morning. And it's an interesting challenge. So it's...if you've got a dozen different services, you would think that oh, well the network is highly reliable. But it never is, the network isn't reliable, you can't trust it. Now, usually it's reliable until suddenly it isn't and latency is spiking across the board. Except in a microservices environment, you don't know that. Suddenly you'll have a dozen different microservices all talking to each other and suddenly one service is taking longer to talk to another except it's buried way down in the tiers of all this microservice mesh. And you have no idea what's going on.
So every time you click a button suddenly it takes way longer than it should and you're like well, I don't know why that is. When you're running a monolith it's a lot easier. It's just one application. It might be spread over 2 servers or 3 or 5 or 20 but it's pretty simple to debug. But microservices is more like imagine like 20 different applications and they're all kinda sorta interdependent on a network you can't trust.
Greg: So there's a lot of ways that this could become an issue for the users using these applications.
Mike: Yeah, absolutely. So what I've found is that there's a few things that are the most important things to be tracking with really any applications, but especially with microservices. Latency, throughput, and error rates. If you track just those three things, both inbound and outbound on the microservice, you're doing quite well.
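As a rough illustration of the three signals Mike names, here is a minimal Python sketch that computes latency, throughput, and error rate from a batch of request records. The `Request` record, the `summarize` function, and the choice of a nearest-rank 95th percentile for latency are assumptions made for this example; they are not taken from the interview or the book.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one request seen by a service."""
    duration_ms: float
    ok: bool


def summarize(requests, window_seconds):
    """Return throughput, error rate, and p95 latency for one time window."""
    if not requests:
        return {"throughput_rps": 0.0, "error_rate": 0.0, "p95_latency_ms": None}

    count = len(requests)
    durations = sorted(r.duration_ms for r in requests)

    # Nearest-rank percentile, computed with integer math to avoid float rounding.
    rank = max(1, (95 * count + 99) // 100)
    p95 = durations[rank - 1]

    errors = sum(1 for r in requests if not r.ok)
    return {
        "throughput_rps": count / window_seconds,
        "error_rate": errors / count,
        "p95_latency_ms": p95,
    }


# Example window: 20 requests over 10 seconds, 2 of which failed.
sample = [Request(duration_ms=float(i), ok=(i > 2)) for i in range(1, 21)]
stats = summarize(sample, window_seconds=10)
```

To cover both directions, the same summary could be computed once over the requests a service receives and once over the calls it makes to other services.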
Greg: So you are basically setting a threshold to say, "I expect this many errors within this amount of time," and if it goes past that threshold essentially you have a failed state?
Mike: So yes and no. In the world of microservices, everything is pretty much always in a sort of a semi-failed state. It's broken and you just don't know it. So yes, I do look at error rates in that I don't expect them to go over some sort of threshold and that threshold is based on my previous experience with that particular microservice. Sometimes in a more mature environment, I'll opt for a statistical model, looking at a rolling average, a standard deviation, or something like that.
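The statistical approach Mike mentions could look something like the following sketch: keep a rolling window of recent error rates and flag a new value that sits well above the rolling average. The class name, the window size, the three-standard-deviation cutoff, and the minimum sample count are all assumptions chosen for illustration, not values from the interview.

```python
from collections import deque
from statistics import mean, stdev


class RollingErrorRateCheck:
    """Flags an error rate that is far above the recent rolling average."""

    def __init__(self, window=30, sigmas=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # oldest samples drop off automatically
        self.sigmas = sigmas
        self.min_samples = min_samples

    def observe(self, error_rate):
        """Record a sample and return True if it looks anomalous."""
        anomalous = False
        # Only judge once there is enough history to estimate a baseline.
        if len(self.history) >= self.min_samples:
            baseline = mean(self.history)
            spread = stdev(self.history)
            anomalous = error_rate > baseline + self.sigmas * spread
        self.history.append(error_rate)
        return anomalous


check = RollingErrorRateCheck()
normal = [check.observe(r) for r in [0.010, 0.012, 0.009, 0.011, 0.010, 0.011]]
spike = check.observe(0.20)
```

Because the baseline comes from the service's own recent behaviour, this fits the point that a microservice is always somewhat broken: a steady background error rate is tolerated, and only a sharp departure from it is flagged. Note that a perfectly flat history gives a spread of zero, so any increase at all would be flagged; a real check would likely add a floor.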
Greg: So there's a lot more to it than just basically seeing if the thing's working or not.
Mike: Right.
Greg: So when I'm discussing monitoring with other IT pros, I rarely hear about the connection between monitoring and data security. And, you know, honestly on Defrag This, especially the past few episodes, we've constantly been talking about data security. Especially with WannaCry and that stuff that came out, I'm sure you've heard about that.
Yeah, and it seems like especially since I think in most companies these are two different roles. You've got the network admin who's obviously monitoring the flow of traffic through the network, and bandwidth, and all that fun stuff. But there probably is a bit of a rift working with a data security professional, like a CISO (Chief Information Security Officer), for instance. They're two very different job titles, yet they have to work side by side to kind of accomplish the same thing. You know, keeping the data safe while also being able to provide it to the people who need to have access to it.
Mike: Yeah, absolutely. When I worked at Oak Ridge National Lab, we had dozens of systems administrators and network administrators, network engineers. But the infosec people--we called them Cyber Security--were a completely separate team. And they had their own tools, they had their own way of doing things, they had their own room. Which, to get into the room, it's a locked door that you had to badge into. And everyone else had an open office plan.
It was very much the security people were segmented off into their own world and occasionally would come out and say "Hey, you have a problem", like please, you have to do something about it now. We're like "Oh, hi. Who are you?" And at my next job, when I went to Peak Hosting, once we started really integrating these two roles of having someone that was both...having the security people talking with the system administrators and network engineers, things started getting a lot better. We started noticing like, "Oh, this is how security should work." And suddenly it's no longer an adversarial relationship. And once you start working really well together like that, you can get some amazing stuff done.
Greg: You know, the reason I bring it up is it's an ongoing struggle that I see, especially the people that I talk to in healthcare, it's an ongoing rift. Because it seems like they're speaking two different languages. Now within the context of your book, I'm sure you discuss different strategies in which different teams can communicate with each other as well as how monitoring can help the rest of the organization.
Mike: In addition, I also go into a few other topics on monitoring the security of a host for things like rootkits and other intrusions. Using the audit subsystem in Linux to track what people are doing, like what users are doing or, more maliciously, what automated script kiddies are doing.
Greg: Now unfortunately, that's all the time we have today. So our listeners are going to have to wait to buy your book to learn more about the script kiddies and other data security issues and how to monitor for them. So I'd like to thank you, Mike, for calling in today to Defrag This. And we are looking forward to your book, which comes out this fall.
Mike: Yeah, thank you so much for having me. It's been a great time with you.
Greg: Yeah, of course, my pleasure.