DevOps Topeaks

Two DevOps engineers sharing their hands-on experience, a dash of knowledge, a bit of brainstorming, and having fun along the way.

#22 - Serverless Backlash

June 09, 2023 • Omer & Meir • Season 1 • Episode 22

Is Amazon dropping serverless? What triggered the huge backlash against serverless and microservices over the past few weeks?
We discussed AWS's blog post, DHH's comments, Kelsey Hightower's response, and more!

Amazon's Prime Video moving away from serverless: https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90
DHH blog post: https://world.hey.com/dhh/even-amazon-can-t-make-sense-of-serverless-or-microservices-59625580
Pprof (Go profiling): https://github.com/google/pprof
Dex: https://dexidp.io/docs/kubernetes/

Meir's blog: https://meirg.co.il
Omer's blog: https://omerxx.com
Telegram channel: https://t.me/espressops

Yes, I think I'm ready. And action!

Action. Wow, long time ago. Long time ago, definitely.

So welcome to DevOps Topeaks, episode number 22. Yes, we are in the 22nd episode, which marks our second season. But not really a second season; it's the same season, we just had a little bit of a break.

Exactly, because we are like The Young and the Restless, we don't rest.

Yeah, and we are not young. The Bold and the Beautiful. Also, we are bald, but okay. With an A, not with an O.

Yeah, exactly. So today's topic is going to be serverless, but not just serverless. We are going to talk about AWS's shift from a serverless architecture to ECS, like containerized.

It's not really AWS, to be exact, it's Amazon Prime Video. And from there it went on into AWS, we'll talk about that, with AWS product managers speaking about serverless. But I'm framing it as a serverless backlash, because it started with Amazon and then it blew up.

That's the name of the episode, Omer. Star Wars backlash. Yes, it is.

All right. Okay, so Omer, what's the first thing that comes to your mind when I say serverless backlash?

Okay, the first thing is DHH. DHH is the guy behind hey.com today. I think he's one of the founders, or at least one of the core members, of Ruby on Rails. He's a really strong persona in our field, at least on the internet. And everything started with a really quite amazing blog post by Amazon Prime Video. Prime Video, if you don't know, is Amazon's Netflix. They have their own streaming platform, and they released a blog post where they're saying, I don't remember the exact words, obviously, we'll add the link.
But basically, what they were saying is that they're making a massive shift. Their entire thing was deployed on serverless functions, obviously on AWS, they're part of Amazon, and they're shifting everything, not only from serverless to non-serverless, but from a microservices architecture to one monolith, deployed on I don't know how many servers, but basically one huge monolith. Same as Microsoft, GitHub; there are lots of big examples that are using a monolith. It's an ever-going discussion on the internet. We'll focus on the serverless side, not actually the monolith, that's a different discussion. And they were saying two things. One, it became far more performant. Everything works faster, everything works better, the developers are happy. But moreover, they saved 90% of their bill, their cloud cost bill. Now, obviously, it's Amazon and they're using AWS, but still, it's different divisions. It's a budget. They are, in some way, paying for that. They are consuming resources, and they managed to save 90%, which is mind-blowing. So DHH saw that. He wrote his own blog post. He published it on LinkedIn. It got something like 25,000 likes, and it kind of blew up on LinkedIn. That's it. That's the story.

Okay, so that's what comes up for Omer when I say serverless backlash.

Yeah, and yours?

To me, I think a friend of ours, Mr. Tukey, always said it's an overkill. So every time I think of serverless, and of companies trying to shift to serverless, like big companies, when you have large workloads and large scale: when you do that, you're taking a huge risk, and you need to understand what it will cost, how you will manage it, and how it will work. And I think that's what happened to AWS, to Amazon Prime, where they tried to eat their own dog food, but eventually they paid a lot for that food.

Exactly. Okay, I totally agree.
So, we do have an episode we made specifically on serverless, right? So we don't need to reiterate everything, but just for a small context: serverless is very appealing, especially when you start, because it's so easy to start with, right? You don't need to manage infrastructure. Developers love it, because you can use some kind of a framework around it, either AWS SAM or serverless.com, and it has everything for you. It's like magic. You deploy everything. You've got the API. The infra does its own thing. If you're using serverless.com, for example, and you run it, everything's connected. You have API Gateway launching itself, CloudFormation stacks doing stuff under the hood. You just launch something, it runs, and poof, you have everything ready to work in production. It's amazing. It lets you manage environments and permissions and authentication. You can see the appeal. But when it grows too far... You start with one, two, three, four, five functions; obviously, you have to have at least a few functions to manage things. If it grows to dozens and hundreds, you get lost. And I'm speaking from the position of someone who has to manage a huge production workload of serverless functions. It is not easy. It is very appealing on one hand; when it grows, it becomes crazy, and we're just talking about management. The costs are mind-blowing, right? I can't even imagine what they had to go through with Prime Video, but I can say our cost is very, very high. And not only that, it comes with additional payments you have to make. For example, monitoring serverless is not easy, because it's so distributed. It's hard to trace things. So you have to pay additional platforms and services that do a really great job of managing that. And so it's like this snowball that keeps growing. And that's why this shift is so, so amazing to me.

Okay, so let's start with the questions, you know? Okay, now that's the fun part.
Okay, so usually when I work with serverless, I work on a small scale. I've never deployed a full-blown, let's say Amazon Prime scale, serverless application. So I want to first talk about the flexibility. I'm wondering how Amazon Prime works, because they love scale. They moved to a monolith, right?

Yeah.

Okay. I'm wondering how you work when you want to move a function, let's say if you work with the Serverless Framework, from one stack to another stack. Do you feel that it's flexible enough? Because with microservices, it's obvious: you just have a microservice, maybe a Dockerfile, which becomes a Docker image, and then you deploy it, and you can deploy it anywhere and everywhere. But with frameworks like the Serverless Framework, once the Lambda is part of that stack, it's a bit hard to rip it off that stack and move it to another stack, if maybe you want to make it global or shared or whatever. And I guess at Amazon's scale it's even more difficult, because you already have live customers and whatever. So I'm asking from your experience: do you feel any difficulties ripping Lambdas off one stack and moving them to another? Or once you put them in a stack, you're like, that's not going to move, that's going to stay here forever?

I think you're touching on an immensely important point. I can't really speak from experience, because I don't do all this shifting all that much. But the very important point here is, let's take DHH for example. In his article he mentioned something that Kelsey Hightower said. I think we've mentioned Kelsey Hightower a number of times; he is one of the biggest proponents of Kubernetes, working at Google.
And he said something like this, and I'm taking serverless aside from the discussion right now: monolith versus microservices. Let's call them microservices; it's a very broad term. You can speak about microservices or nanoservices. When you speak about functions, it's not even a microservice. It's smaller than that. It's just a function. It's one logical task, hopefully. So he said that when we're shifting towards microservices, and definitely towards functions, we're taking things that could be modeled as different classes or modules within one application, and we shift them towards network communication between different components. So for example, let's take Kubernetes as the biggest example of running different services. Instead of having one monolith speaking between modules within itself, you're now shifting all this routing within the application to routing within the network, inside either a cluster or something else. And that becomes an even bigger problem when you talk about serverless functions, because they might not always run within your VPC, within your cloud network. They run somewhere out there. And so every bit of small communication goes over the network. And you said something about scale, by the way: if I'm running my own application, the scale can be me managing a multi-threaded application, right? I'm getting a request, it's a thread. I'm getting another request, that's another thread. With Lambda, usually what happens is that instead of opening more threads, you're opening more Lambdas. That's the behavior. A request comes in from API Gateway, a Lambda starts, and that's just the AWS example; it's correct for every kind of serverless implementation. A request comes in, a Lambda comes up, it handles this request. Another request comes in, there's another Lambda, right? So imagine the scale and the cost and the performance of that.
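
The thread-per-request versus Lambda-per-request contrast can be put into rough numbers. The following is a minimal sketch and not something shown in the episode: it assumes that the number of in-flight requests is the arrival rate times the average duration, that one Lambda execution environment serves one request at a time, and that a long-running server process serves one request per thread. The function names and the example traffic figures are hypothetical.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate how many requests are in flight at once (arrival rate x duration)."""
    if requests_per_second < 0 or avg_duration_seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    return math.ceil(requests_per_second * avg_duration_seconds)


def lambda_environments_needed(requests_per_second: float, avg_duration_seconds: float) -> int:
    # Assumption: one execution environment handles one request at a time,
    # so environments needed equals the number of in-flight requests.
    return required_concurrency(requests_per_second, avg_duration_seconds)


def threaded_processes_needed(
    requests_per_second: float, avg_duration_seconds: float, threads_per_process: int
) -> int:
    # Assumption: a long-running server process handles one request per thread.
    if threads_per_process <= 0:
        raise ValueError("threads_per_process must be positive")
    in_flight = required_concurrency(requests_per_second, avg_duration_seconds)
    return math.ceil(in_flight / threads_per_process)


if __name__ == "__main__":
    # Hypothetical traffic: 200 requests per second, 0.5 seconds each.
    print(lambda_environments_needed(200, 0.5))
    print(threaded_processes_needed(200, 0.5, threads_per_process=50))
```

Under these assumptions the same traffic needs one environment per in-flight request on the function side, but only a handful of processes on the threaded side, which is the scale difference being described.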
Again, I'm trying not to get too much into the serverless basics of hot starts and cold starts and all of that. We've covered all of that in the serverless episode. But it's hard, and it's a big shift. And what Kelsey said, and also DHH, is that more often than not, like nine out of 10 times, that's not the way to go. That's not the right approach. I don't know if I support this. I don't know if I support completely moving everything from serverless to a huge monolith. I'm assuming Prime Video knows far better than me what they're doing. But there is a middle ground, right? Microservices don't have to be small functions and very, very small bits of the application. I think Kubernetes has great support for services. They don't have to be that small. Like you said, you can have a Dockerfile and a container, run a single service within a container, and have things communicate. We all do that daily and it works great. It's a discussion. I don't know. So, back to your question. Again, I don't have experience with shifting stuff. It's not easy, exactly like you said. It's hard. It's very easy to get started, but as soon as you try to provide structure and architect something big, the complexity grows very, very fast.

Now I want to take your network example and maybe shift to another topic, right? When I say another topic, I mean: Amazon moved to a monolith. Now let's play the assumptions game. I want to assume what their biggest pain was. In my eyes, it's the data processing. Data processing, data analytics or analysis, or streaming, you know, something which is very, very heavy, usually requires high IOPS, right? Input/output throughput. And the volumes are usually very, very slow.
When I say slow, I mean when you are sharing data between services, if it's on the hard disk. And usually movies are on the hard drive, right? On the hard disk, because it's heavy, so you store it over there. If you use Elastic File System or a network drive, it would be slow. And streaming should be fast. Of course, you have buffering and everything, but it will be very, very expensive, because network drives cost way more money than, you know, Elastic Block Store or any other volume. Again, I'm trying not to push to talk about Zesty. Unfortunately, I'm pushing it a little, but I don't mean to. Yeah, yeah, I'm pushing it. But in my eyes, the move was like this: if you have a lot of services on the same EC2 instance, well, they all have a shared volume. Everything is super fast, especially if you talk about streaming heavy data. Okay, that's my assumption for why they decided to move: because the storage is faster if you take all the services and put them on a single machine, and then everything is faster. So what's your assumption? Or maybe it's also your assumption, I don't know.

First of all, that's a very interesting angle. I didn't even think of that. At the end of the day, that's what they're doing. They're providing a streaming service, and they do have to manage a lot of data. So, okay, fast. Fast seems important. It's not like a backlog, you know, where you can just let it ride for a day. It has to be super fast. So let's assume they were one of the biggest users of, I'm assuming, CloudFront as a CDN. They are probably one of the biggest CDN users in the world, because at the end of the day, it's a streaming service. If I'm not mistaken, Netflix is one of the biggest customers of AWS.
So they are probably also one of the biggest users of their CDN, and it's probably using the same PoPs around the world to provide the media. That said, of course, they have to ingest data and store it somewhere. There is a single source of truth. And let's start with the problem: Lambda and storage. First of all, I think, like you said, you cannot really provide block storage support for Lambda. And if you do want to do that, you need to implement something like EFS. It's file system storage, right? So you do have...

20 gigabytes of tmp.

Yes, right. You have the ephemeral storage on the disk, and that evaporates together with the micro VM that starts with the Lambda. But if you want persistence, you need to use something like EFS. EFS, first of all, is more expensive to begin with. If you need to provision more IOPS, that grows even further. It's not as fast. It's network storage, and network storage is not as fast as provisioned EBS block storage. So that's the beginning. And taking it from there, Lambdas have a limit on how long they can run and how much they can ingest, and both the data and long running times force you to move elsewhere, right? I think the cap is 15 minutes. Now, let's not even get there, because once you run for long minutes, surely if you get to 13, 14, 15 minutes, you pay a lot of money to run that. The runtime is expensive with Lambda. That's the biggest spend. And so the natural shift is towards something like Step Functions within Amazon, and you start leveraging something like containers, standalone containers that run.

Wait, I'm not sure what just happened. What is Step Functions? What are you talking about? Do you want to explain it?

No, no, I want you to.

Or do you want me to? I don't know, whatever you want.
Okay, Step Functions is a service within Amazon that lets you structure different functions as different components in a logical workflow, let's call it, right?

You could call it some Lambda pipeline, you know?

That's actually a perfect, perfect framing for that. So that's what it is. It's a pipeline. There are different components that are in charge of different logical steps within an application, and Step Functions is a way to run through those steps, exactly like within a pipeline. So that's what it is, basically. But then when those Lambdas within the pipeline start hitting this 15-minute mark, you kind of shift towards Fargate containers on ECS. And that has an entire set of problems of its own, because they're standalone, hard to manage, hard to monitor, hard to track, et cetera, et cetera. And so, I don't know if you're reading between the lines of what I'm saying, but again, it's a huge snowball that keeps growing and growing. The complexity grows, the hardship grows. Everything becomes more complicated, more expensive, harder to track. When things get harder to track, you pay for services to help you: Datadog, New Relic, or just CloudWatch, things like that. The complexity grows. That's what I'm saying, especially, like you said, with storage. I didn't even think of the lens of storage here, which is massive within a streaming service.

When it comes to storage, I'm just guessing the architecture, though. I'm not really sure how to build a proper streaming service, you know, because it's got to be super complicated. I always say that if you want to build something heavy, build a streaming service, because usually that's the heaviest thing that you can do online. There's nothing more intense than live video streaming. Right?
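
The "Lambda pipeline" framing of Step Functions can be imitated locally to show the shape of the idea. This is only a sketch in plain Python, not the Step Functions API or its state language: each step is an ordinary function that receives the state produced by the previous step. The step names (`validate`, `count_frames`, `summarize`) and the payload are hypothetical.

```python
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], State]


def run_pipeline(steps: list[Step], payload: State) -> State:
    """Run each step in order, feeding the output of one step into the next."""
    state = dict(payload)  # copy so the caller's payload is not mutated
    for step in steps:
        state = step(state)
    return state


# Hypothetical steps; in the setup discussed, each would be its own function.
def validate(state: State) -> State:
    if "frames" not in state:
        raise ValueError("payload must contain 'frames'")
    return state


def count_frames(state: State) -> State:
    return {**state, "frame_count": len(state["frames"])}


def summarize(state: State) -> State:
    return {**state, "summary": f"{state['frame_count']} frames checked"}


if __name__ == "__main__":
    result = run_pipeline([validate, count_frames, summarize], {"frames": ["a", "b", "c"]})
    print(result["summary"])
```

In a real workflow every hand-off between steps crosses a service boundary, which is where the tracking and monitoring overhead mentioned in the conversation comes from; in this local imitation the hand-off is just a function call.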
Live 4K through the internet, delivered to, yeah, thousands of people.

Exactly. So I think, even if they store it on S3, right, in object storage, you pull the data to a machine, and then this machine is a sort of cache for S3. Let's say the movie is The Shawshank Redemption, right? Because it's number one on IMDb, so everybody knows it. So one user requested this movie, and it's fetched from S3 for the first time ever. So maybe this user experiences, I'm not sure, probably not, a slower download. But now the machine is caching the movie. So every following request comes to the same machine, and suddenly everything is super fast for the following users, blah, blah, blah. Maybe.

Which is the exact description of a CDN. Did we have an episode going through CDNs?

No. But actually, I just talked about a CDN. Yeah, I just imitated a CDN with S3 and EC2.

You described a CDN, that's a heck of a CDN.

Exactly. With the edge point, yeah.

Exactly. With the one caveat that the great thing about a CDN is its distribution, right? You have a lot of PoPs, that's the power of it. Because at the end of the day, it's just a caching machine.

But hang on, EBS, and you know that, you probably teach that: with streaming specifically, users don't write. They only read. So you can have a single EBS volume attached to multiple instances, right?

Well, that's probably, yes. So that's the attach...

Yeah, Multi-Attach. So you can have one EBS volume which has maybe 20 movies, I don't know, like a cache of 20 movies.

I'm assuming you're now just describing the underlying platform of the CDN server.

Yeah, pure guessing, but it sounds right.

It's probably what's going on under the hood. But that's the exact implementation of a CDN.
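
The "machine as a cache in front of S3" guess is a read-through cache. A minimal sketch, under the assumption that the origin is any function that returns bytes for a key; `fake_origin` below is a hypothetical stand-in for object storage, and nothing here reflects how Prime Video or any CDN is actually built.

```python
from typing import Callable


class ReadThroughCache:
    """Serve from local storage when possible; fetch from the origin on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: dict[str, bytes] = {}
        self.origin_fetches = 0  # counts the slow trips to the origin

    def get(self, key: str) -> bytes:
        if key not in self._store:
            # First request for this key: pay the slow fetch once.
            self._store[key] = self._fetch(key)
            self.origin_fetches += 1
        # Every later request is served from the local copy.
        return self._store[key]


def fake_origin(key: str) -> bytes:
    # Hypothetical stand-in for object storage; returns dummy content.
    return f"contents of {key}".encode()


if __name__ == "__main__":
    cache = ReadThroughCache(fake_origin)
    cache.get("shawshank-redemption")  # miss: fetched from the origin
    cache.get("shawshank-redemption")  # hit: served locally
    print(cache.origin_fetches)
```

A CDN applies the same idea at many points of presence at once, which is the caveat raised in the conversation: the caching logic is simple, the distribution is the hard part.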
And again, the power of it is having so many PoPs around the world. If you take a look at the biggest ones, Fastly, Akamai, CloudFront, et cetera, et cetera.

Cloudflare.

Cloudflare, yeah. So their distribution is so immense. They have dozens of PoPs within every continent in the world. It goes to tens of thousands, I think. And that's the power of it. You have so many small PoPs holding so much cache, and that's how you deliver media to end customers. So I'm guessing. I think that's a wise guess, that that's how things are done with a streaming service.

OK, so let's try to bring value to the viewers. Listeners. All right. So the value, I think, would be when to cut. We already talked about it a bit in the serverless episode, some 10 episodes ago or something like that. But where do you cut? Because every answer can be "it depends". But what's your rule of thumb when you say, OK, this one I think we should do serverless; this one I think we should maybe do with microservices on ECS; this one maybe should be on Kubernetes? What would be your approach when you make the cut and say, I'd take this technology over that one, especially when it comes to serverless? Because think about the discussions that Amazon Prime Video had before they made this shift. Think about the war rooms. Think about how they talked. This guy said, we should do this and that, it will cost less. And then she said, no, we should do this and that. So they fought a lot behind the scenes, I assume, until they arrived at a decision. So what's your rule of thumb for shifting between technologies?

I'll try to answer with a question. We were both consultants for quite a long time, right? Do you remember, or do you have in mind, the first time you meet a customer, the first day you sit with them?
What's the first question you ask?

What are you trying to achieve? I mean, what's your goal? That's one.

Do you have something else in mind?

I don't know, you're testing me, I have no idea.

So mine was always: tell me your two or three biggest pain points. Which is basically the same, because if you're asking what they are trying to achieve, they will probably describe the product, and somewhere along the conversation they'll describe the pain point, because we were consultants around ops. Naturally, we were there to fix something or help with something. So that's part of it: what am I here to fix? But instead of asking "what am I here to fix?", because they may be focusing on the wrong area, I was taught to ask, and I always used it, and I still try to use it today even though I'm not a consultant: what is the biggest pain point? So if you're running a system, try to think: what is my biggest pain point? And "my" is not me as an ops engineer or as a developer; try to think as a group, as an R&D, as a team, as a business, right? What are we trying to achieve, obviously, your question, and what are the pain points? So I'm assuming, right, if you're running on serverless, what are the pain points? I can assume developers are having issues with monitoring their application because it's so widely distributed. Their application is segmented into, let's say, 20 functions, and it's hard to read. And you can find that out with a simple interview. Go to one of the developers and ask them: what are the goods and what are the bads here?

Oh, man. Datadog and Logz.io are hearing you and saying, we have a solution for that, you know?

Yeah, of course, a solution. I just read a report about Datadog. I don't want to mess up the numbers, but it was something around $43 million spent within one quarter on just the ingestion. Great service, super expensive.
You may be able to save with a better architecture, right? There's another service, I won't name names, but it's a great service for monitoring Lambda functions within AWS. It works great, but it's so expensive to use. And so I would ask, what's the biggest pain point? By the way, that might be a pain point: I'm paying so many thousands of dollars a month just to monitor my serverless architecture. That can be a pain point. Another one would be: it's taking longer to deploy, it's taking longer to debug, it's hard for me to understand what's going on in production. In my case, I need to provide tons of permissions to the developers for them to be able to work things out and debug things within production. Maybe it's hard to run the code locally, because you start using LocalStack; I think you mentioned it in the episode, stuff like that. I mean, there's complexity, and you need to understand where the complexity comes from. What is the problem? By the way, your conclusion at the end of the day might not be to change the architecture. Maybe rebuild with the same technology that you're using. Serverless can work great, by the way, if you group things in a logical manner. If you start opening a repo for each and every function, then the distribution becomes even harder. It's harder to track, harder to monitor, harder to manage. If you are a SOC 2-compliant company, or trying to become SOC 2-compliant, not to mention ISO, everything's harder when there's so much of it.

I'm also thinking about auth. I think we've never talked about it. Well, maybe we talked about Cognito and API Gateway, but when it comes to authentication...
So when it comes to Lambda, OK, if I compare it to a containerized application on ECS, Kubernetes, whatever: if you use Nest.js or Express.js, whatever, they have those plugins, those extensions for using JWTs, JWT tokens, and they work with the Auth0 provider and with every provider. When it comes to Lambdas, it's way harder to protect them. I mean, way harder compared to something that has built-in native support, with a community and whatever.

Right, you're not using Nest. You're not using Ruby on Rails. You are using API Gateway, and then you're forced into the token mechanism of API Gateway, which can be very simple, but very easy to mess up. It's very easy to deploy your own Lambda function that's not covered by an authorization token. It's very easy, and if you try to block it, then other issues start popping up.

But also, I won't say usually, but if you have a backend and a frontend, right? Maybe your frontend is React and the backend is Nest.js, whatever. To communicate between the two, to me, it was easier to have a normal backend server in containers, rather than putting the Nest.js server on a Lambda function, which is an overkill. A Lambda function runs and needs to load a whole Nest.

That sounds like a bad marriage.

Yeah, but maybe we have people on the line, in the audience, that are running Lambda functions with Nest.js servers, who knows?

I hope we don't.

OK, but it might be.

Yeah, yeah, definitely. So I agree, I totally agree. When you run within a cluster, everything's structured. I'm assuming you've built your own authentication server, or you have an authentication service that you didn't build, but anyway, it sits there and everyone knows how to work with it. It sits on its own and it's easy to talk to. It's not this service on AWS that you can mess up completely while deploying your own infrastructure. And I mean, the sky's the limit of what you can do.
For good and for bad.

I totally agree, I couldn't agree more. So to me, Lambda functions, well, I like using them for cron jobs, scheduled jobs, maybe notifications, stuff like that. And maybe, for a very small organization, or a mock-up, or an internal service, as a backend API server. But when it comes to a server that serves customers, a production server... I think I've never done a full serverless architecture that serves customers at heavy, heavy scale, because I was always afraid of the Lambda concurrency, and always afraid of sharing data on storage. And there are many things that you need to take care of, like the logs, as you said, because it's distributed and whatever. So my rule of thumb is: for small applications, cron jobs, scheduled jobs, minor tasks, and maybe even tests or internal projects, use Lambda functions. For heavy-scale loads, for something that needs a lot of data processing, like what we talked about with Amazon Prime, where you have a lot of data processing, you should use, I won't say a monolith, but at least something containerized, right? I think that's mine.

First of all, totally. I want to give you one more perspective. I wanted to fight in this episode, but we're just agreeing all over the place. One more perspective, but before that, just one thing I want to mention, and we have our corner at the end, so I'll mention it there too. Lambda comes with a lot of issues. Obviously, if you're just starting out, either listen to our episode, or at least try to understand what you're doing. Understand the infrastructure, hot starts, cold starts, things like that. I have my own side project that has a huge memory leak because of something I did with database connections.
And database connections are one of the biggest issues with serverless, because you keep reopening those connections. You need to handle them, you need to make sure they're closed. First of all, the connection is shared within the run of the Lambda, and if it scales widely, because of a lot of cold starts, you need to make sure the data system that you're connecting to can handle that load of connections. That's another topic. So that's just one thing I wanted to say. In terms of perspective, I want to provide one more, because again, Lambda is very easy to start with. The frameworks are amazing to get started with. Imagine you're a small startup of two or three people; maybe one of them is a developer, maybe two of them are developers. You don't have a lot of ops skills within the group. It's hard to get started with ECS, with Kubernetes, with running containers. Maybe it's not a skill you have at all. Maybe those developers never worked with containers, I don't know why, but they didn't. And then serverless becomes very appealing, because it's easy to get started with. All you have to do is wrap your code within this handler function and throw it out there, and Amazon or Google or Azure will take care of it. And I can understand the appeal.

That's a lie, what you just said, because then people start using it, and they say, okay, I want to do yarn add axios, yarn add webpack, yarn add whatever. That's the road to suicide, because then you deal with Lambda layers.

And that's exactly what I want to say. If you're starting to use Lambda, which is good, and it's appealing to a small team, go ahead. Just understand the infrastructure, understand what you're doing.
If you're going to have to use a lot of layers, maybe that's not the way to go, because as you start adding layers, it would be easier to just build a little container, or just run on an EC2, or just something that can grow. So I understand the need to go to market quickly. I mean, that comes first. You need the money, you need to have paying customers, you need to have investors investing money in the company to keep going and pay salaries, et cetera, et cetera. So it's understandable.

Hey, Natalie, what are you talking about? Uh, something in my mouth. Oh, thanks, Mark. Yes, it's not as important.

You know what? You're touching on a very important point about this shift, by the way. DHH speaks a lot about these huge shifts, and specifically here, from serverless to monolith, right? This may be a great thing, and they've saved 90% on the monthly bill. Can you imagine how many engineering hours it took to make that shift? How much did it cost in salaries? Nobody speaks about that. Not Amazon in their blog post, not DHH in his blog post. I'm assuming, I am hoping, they've counted it in and made the calculation. They never spoke about it. And nobody ever talks about salaries and engineering hours when you make those big shifts. So every decision, while it can be a great shift and a great change and a switch and whatever, comes with a cost. Even if you don't see it at the moment in the AWS bill at the end of the month, it costs. Engineering hours cost, and not only for the shift itself. Maybe you're shifting into a huge Kubernetes cluster that needs to have two engineers constantly working around the clock to keep it alive, because of how it's built, because of how it's distributed, et cetera, et cetera. Just think about that as well.

Okay, I'll say just one last thing about Lambdas, and then we move to the corner. Moving to the corner.
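
The point about engineering hours can be turned into a simple break-even calculation. This is a back-of-the-envelope sketch with entirely hypothetical numbers; neither the Prime Video post nor DHH's post is quoted here on migration cost, and the function and parameter names are made up for illustration.

```python
def breakeven_months(
    monthly_bill_before: float,
    savings_fraction: float,
    engineers: int,
    months_of_work: float,
    monthly_cost_per_engineer: float,
) -> float:
    """Months of cloud savings needed to pay back the one-off migration effort."""
    monthly_savings = monthly_bill_before * savings_fraction
    if monthly_savings <= 0:
        raise ValueError("the migration must save something to ever break even")
    migration_cost = engineers * months_of_work * monthly_cost_per_engineer
    return migration_cost / monthly_savings


if __name__ == "__main__":
    # Hypothetical: a 10,000 per month bill cut by 90%, with 3 engineers
    # working 4 months at 15,000 per engineer-month.
    print(breakeven_months(10_000, 0.9, 3, 4, 15_000))
```

The sketch leaves out ongoing operations cost, such as engineers needed to keep a large cluster alive after the migration; including it would reduce the effective monthly savings and push the break-even point further out.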
So one last thing is, you talked about a great topic, the Lambda functions and database connections. And I didn't even think about it, but each Lambda function that is provisioned needs to connect with the database, and that's a whole new endpoint that connects with the database, so you suddenly open a new connection. Other than that, there's also a new elastic network interface that is attached to the Lambda. So you can also hit the limit of the maximum number of elastic network interfaces, and that's only because your Lambda function runs in a VPC, probably, because you want to communicate with the database. Now. Yeah, yeah. Well, I'm sorry to interrupt you. Amazon made a big shift around 2018, by the way, I think we touched on it in an episode. They made a big shift, because what you're saying was a huge problem for Lambda users, and they created this diagram where you have one ENI provisioned for everything that runs within the VPC, and that's kind of a single point where everything comes out of. And that kind of offloaded the immediate problem of lots of ENIs. Just that, just a comment. Okay, okay, good. Okay, good to know, right? You see how I didn't use it that much, you know, I'm not updated. I'm like ChatGPT, updated to 2021. Well, I had to use it. That's fine. Yeah. Okay, so also when it comes to Lambda, you also need to make sure it's a Lambda in a VPC, which is another, I won't say pain point, but if you run a container in a private network, you don't have to worry about those things. It's just in a private network. It's an overhead for sure. Okay. So I think we covered a lot, like why would Amazon Prime move to, I don't know, containers or a monolith? And I told you my guess is about the storage, because fetching from storage, writing to storage, reading from storage.
I think this is like a huge decision, and if you're listening now and you're wondering if your application should be containerized or run as functions, if you have heavy usage of storage, you know, IOPS, input/output, throughput, whatever, maybe you should consider using containers on EC2, like EC2 types. So the volumes are shared, then you can read and write fast. And maybe you'll also save costs. Okay. Any last words before we move to the corner? Yes, one thing. Just, I want to continue and build on what you just said. Think about what you're doing, and not only think whether what I'm doing right now matches the situation, the technology, but think about the future. So again, I'm not saying overkill and over-architect and over-structure everything. No, no, no, that's not the deal. Don't run into Kubernetes if you have 10 lines of Python code, that's not the idea. But do think about whether you are going to grow within a year into something that will need some kind of scale, right? And if everything you start with, and I'm speaking from experience on what I'm saying now, if everything you're going to build is going to be very much serverless oriented and built into functions running within a VPC, opening connections, using API gateways, all that stuff, then if you ever decide to make a shift, even a small one, it's going to be very hard. It's going to cost in salaries. It's going to, you know, force you into a lot of planning meetings and designs and whatnot. Try to make the decision from day one, if not from day one, from Q1. Think about the future because it's going to be painful. That's all. All right. Okay, ready? We are moving to the corner. Ali, I haven't actually moved to the corner and I'm doing a friend of mine. I was waiting for that. Yeah, so we have a friend over here. We have our first audience. Yeah, first, Ali's and El ever. Actually, it's not Elvis, Elvis, he's here. You said Elvis, but oh wait, Elvis is hang on.
Ha ha, he's here. Yeah, I'm Spain. Anyways, so corner of the week, effects, please. Yeah, so now we are going to talk about experiences we've had this week. Whether it comes to technology, maybe ideas, any topic that happened to us this week or maybe even last week. So, Omer, would you like to start? I would be happy to. Sure. Okay, I have, maybe it's a topic for another session, but I have a small Telegram bot my brother asked me to build, and it's open source. It's in my GitHub repo. I can link to it. I had a memory leak. I kept seeing it. I talked about Fly.io, by the way. I'm deploying it with Fly.io. It was one of the projects I mentioned in earlier episodes. And it helps me a lot because it has its own Grafana, so you can view everything. And I kept seeing my container getting restarted. And then I went into the metrics and I saw that I have like memory building up. Like those small steps, every hour it builds up, it builds up until, you know, everything's clogged and it dies and a new container starts and it keeps rebuilding. And I had to track it. So, it's written in Go, and to track the memory, I had to profile the application. In order to profile the application, Go has a tool, obviously developed by Google, it's called pprof, which is basically, I don't know what the PP stands for, profiling Go applications. You can produce profiles for CPU, memory and whatever else. I needed to profile my memory. So, I ran two different profiles, one at the beginning, at the start of the application. And then, since I have this scheduled task every one hour and I assumed they were connected, I ran another profile after it and then opened them. pprof can open a web browser with this nice visual diagram and it can show you the functions and how much memory they're consuming and the processes and everything. And it's really easy to debug and understand what's going on. And that helped me remove the issue of my memory leak.
If you're interested, my memory leak was because I did not share the database connection. On every query to the database, I just created another pointer in the system pointing to a new connection, and that kept rebuilding because every hour I have this task scanning the users and providing some data. And that just opened a new set of 10, 15 new connections and it kept growing and growing and staying forever in the system. Obviously, the memory kept growing, the application kept dying and that's it. So you'll see it at the application level, right? So it's not the application level, right? No, no, no, no, I just started one shared connection when the application started and then just shared that pointer within a global function. By the way, sorry, a global variable. Usually, globals are one of the ways to create memory leaks. In that case, that was the solution. Yeah, that's it. Nice. What was yours? Actually, this week, oh, yeah, this week, I did something which is like an ops thing. So remember that in the previous episode, I talked with you about like, you know, the act and then yes, so and yeah, everything. So now I did something which is related to ops, okay? I wanted to allow our developers to be able to access our internal Kubernetes cluster of internal services with the Kubernetes dashboard. Obviously, I didn't want all the developers to have administrator access to the dashboard, right? So only maybe the DevOps team has cluster admin, and for the rest, I don't mind them having view access to all of the cluster because again, it's an internal cluster, but the write access should be per namespace, per application, per whatever. And then you're starting to think like, how can I, okay, so the authentication part, usually the authentication part is the quote unquote, you know, easy part, but the authorization part is the pain point over here. Because now you're trying to think, okay, how are they going to authenticate?
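Going back to the memory leak fix for a moment, here is a small sketch of the difference between opening a connection per query and sharing one. The `conn` type and every function name are hypothetical. It is a counting stand-in for a real database handle (such as `*sql.DB` from `database/sql`) so the example runs without a database, and it is not the bot's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a real database handle; it exists
// only so we can count how many connections get opened.
type conn struct{ id int }

var (
	mu     sync.Mutex
	opened int // total number of connections opened so far
)

// openConn simulates opening a new database connection.
func openConn() *conn {
	mu.Lock()
	defer mu.Unlock()
	opened++
	return &conn{id: opened}
}

// leakyQuery opens a fresh connection on every call and never closes it,
// which is the pattern that caused the leak described above.
func leakyQuery() { _ = openConn() }

var (
	sharedOnce sync.Once
	shared     *conn // the single global connection
)

// sharedConn opens the connection once and returns the same pointer afterwards.
func sharedConn() *conn {
	sharedOnce.Do(func() { shared = openConn() })
	return shared
}

// sharedQuery reuses the global connection instead of opening a new one.
func sharedQuery() { _ = sharedConn() }

func main() {
	for i := 0; i < 15; i++ {
		leakyQuery()
	}
	fmt.Println("connections after 15 leaky queries:", opened)

	before := opened
	for i := 0; i < 15; i++ {
		sharedQuery()
	}
	fmt.Println("extra connections after 15 shared queries:", opened-before)
}
```

In real code with `database/sql`, the handle returned by `sql.Open` is already a connection pool that is safe to share across goroutines, so creating it once at startup and passing it around (or keeping it in a package-level variable, as described above) is the usual pattern.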
So what I did, there's actually a great blog post by AWS and I will share the link to it over here. Using Dex, are you familiar with Dex? I just wanted to mention Dex and I kept my mouth shut. Yes, definitely. It's amazing. So what I'm doing, well, the process is not that nice, but I prefer it not to be that nice because it's very, very sensitive, right? So I prefer developers, because they don't do it on a daily basis like me, I prefer them to have this extra step of logging in or signing in, rather than having a smooth process where maybe someone else will also be able to log in. Okay, so first, before I even made the Kubernetes dashboard private, I used a public load balancer. So the application was open, wide open, right? Then I was like, why wide open? Only if you're connected to the VPN should you be able to access the Kubernetes dashboard. So first things first, that worked, great. But how can I authorize users? Like how do I know that developers stay and move from here to well? So apparently with Dex, there is something which is called dex-k8s-authenticator, like Kubernetes authenticator. So that's an application by itself, right? It's not Dex. It's another application that is related to Dex. You log in to that application with Google, for example, or any identity provider, like Facebook, GitHub, whatever your company uses, probably not Facebook, but whatever. And so we're using Google, right? So I defined my Google Workspace to be the identity provider for that Kubernetes authenticator. So you log in and then you get this ID token that you can use for logging into the Kubernetes dashboard. So the Kubernetes dashboard, you know, when you log in for the first time, it asks you for a kubeconfig or ID token, right? So you need to go to the Dex login page, get the token, and with that you log in to the Kubernetes dashboard. And the token that you generate, that's the awesome thing about it. This token, it's a JWT token, right?
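Since the ID token is a JWT, its payload is just base64url-encoded JSON, and it can be inspected to see which identity and groups it carries. The sketch below is an assumption-heavy illustration: the token is built locally with made-up values, the claim names `email` and `groups` are assumed (Dex can emit a `groups` claim when that scope is requested, but the exact setup from the episode is not shown), and the signature is deliberately not verified, so this is for inspection only, never for making access decisions.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// claims holds the two fields this sketch looks at. The claim names are
// assumptions about how the identity provider is configured.
type claims struct {
	Email  string   `json:"email"`
	Groups []string `json:"groups"`
}

// decodeClaims reads the payload (second segment) of a JWT WITHOUT verifying
// its signature. Use it for debugging only.
func decodeClaims(token string) (claims, error) {
	var c claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("token must have three dot-separated segments")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, err
	}
	err = json.Unmarshal(payload, &c)
	return c, err
}

// fakeToken builds an unsigned, made-up token so the example is runnable
// without a real identity provider.
func fakeToken(c claims) string {
	enc := base64.RawURLEncoding
	header := enc.EncodeToString([]byte(`{"alg":"none"}`))
	body, _ := json.Marshal(c)
	return header + "." + enc.EncodeToString(body) + ".signature"
}

func main() {
	// Hypothetical user and group, for illustration only.
	token := fakeToken(claims{
		Email:  "dev@example.com",
		Groups: []string{"developers@example.com"},
	})

	c, err := decodeClaims(token)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("email:", c.Email)
	fmt.Println("groups:", c.Groups)
}
```

On the cluster side, the API server's OIDC settings decide which claim is treated as the group list, and role bindings can then name those groups as subjects.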
Tell me it's based on the Google group. It's based on the Google group. That is amazing. So I can't believe you're saying it. It's exactly what I'm looking for. RoleBinding, you just create, you know, like, if we have a ClusterRole, for example, cluster-admin, you create a ClusterRoleBinding to the Google group and the token is based on it, but there are a lot of things you need to do. So I won't talk about the configuration because it's a bit annoying, because they don't make the solution. Share the link with me and the audience, that's exactly what they're looking for. I will, it's amazing because now, okay, so I can say one caveat is because I'm using an Application Load Balancer. If I used the NGINX controller, for example, or an NGINX load balancer, the Authorization header would get directly to the Kubernetes dashboard and the step of filling in the ID token would be useless, right? Because I'm using the AWS ALB controller, right? Because this is the one I have deployed in the cluster. I can also deploy the NGINX controller, but for now, I didn't do that. So I cannot pass the Authorization header and this is why when you log into the Kubernetes dashboard, you see the login page every time. Okay, so I don't mind, because the developers, I understand why they'd get a little bit annoyed, but if you use the NGINX controller, whatever, you know, you can pass the Authorization header and the, I don't know, the role-based access with the identity provider can finally meet together and have fun together. And this blog post actually was updated, I think, this January, like five months ago. So it's a bit fresh, you know. If we talk about it, I know of a service that's doing something similar, not necessarily only for Kubernetes. I used it, I mean, it started from Kubernetes. There's, I forgot the base of it. Is it Keycloak? No, no, no, no, no, no, no. So what I'm talking about is called SSO, I think.
It's by Buzzfeed and that's based on, oh, I forgot what it was. Anyway, it's a very well-known service for Kubernetes. And that did roughly. Oh, see, maybe. Yeah, yeah, yeah, thank you. Thank you. Oh, that was based on that. And that provided the same functionality with a caching function. So you basically didn't have to keep reauthorizing yourself. It has a cookie and that cookie can hold as long as your TTL states. The problem is it stopped being maintained like a year, a year and a half ago. So we wanted to offload to something else. We started using Dex. That was a great service. It's good to know that we have that service. So I didn't know about that. Yeah, Dex is nice because then, you know, the thing that you do with Dex, maybe you also do it for your Kubernetes cluster, will you, how do you say? Associate, yeah, that's it. Associate Dex as the identity provider for your cluster instead of Google Workspace. So Google Workspace is behind the scenes, you know? So it's nice that even if I switch from Google, usually that doesn't happen because your user directory doesn't change that often, or never, okay, hopefully. But it's nice that you can have a layer where you can manage everything and it's easy and maintained. So I really enjoyed it. So OIDC, OpenID Connect, authentication to the Kubernetes dashboard, to the Kubernetes cluster, with roles, with authorization, works amazing. Dex, good job. Amazing. Yeah, good to know. Okay, all right. So I think this is enough for this week. I think so. We didn't break a record this time. Usually we break records, but, okay. Like it was for the length of the, yeah. The gap, the gap, the gap, the gap, that was a record, yeah. Okay, so see you next week. And thank you, Omer. See you. Thank you very much. Yeah, I love that. You went to the audience. Yeah, audience. Bye, audience. Bye bye.