#28 - The Perfect Pipeline

What's the perfect CI/CD pipeline?
What is even CD?
How do we implement our own, and what do we see as the holy-grail "perfect" flow?
That's our topic for this week.

Links:

https://aws.amazon.com/blogs/aws/new-aws-public-ipv4-address-charge-public-ip-insights/
https://www.runatlantis.io
https://k8sgpt.ai
https://12factor.net

Meir's blog: https://meirg.co.il
Omer's blog: https://omerxx.com
Telegram channel: https://t.me/espressops

0:00

Omer, ready? Yep. And action. Yeah, magnets to the rescue. Magnets to the rescue.

So hello everyone, and welcome to DevOps Topics, episode number 28. We are at the 28th, almost 100. Almost, almost. We are getting there week by week. And we have on the line, surprisingly, Omer, and if you're watching the video, he's also on video, not just on the line. Today we are going to talk about CI/CD: continuous integration and continuous development, or deployment, or delivery, or whatever you want to call it. Yeah. And the perfect pipeline. We're going to talk about the perfect pipeline ever, how we see it as DevOps engineers. So Omer, when I say CI/CD and the perfect pipeline, what is the first thing that comes to your mind?

The first thing that comes to my mind, just as you were presenting the topic, I was thinking, oh my god, this can be a podcast of its own. It can be a long-living podcast with a hundred different episodes, each of them just analyzing every step and every option: what is CI, what is CD, what is deployment or delivery. So I think that's the first thing that comes to my mind.

So let's follow up on that with a question. We say CI/CD, so can you break it down for me? When you say CI, how would you classify or do the CI stuff, and how would you classify and do your deployment stuff? Do you even break it up? Is it in the same pipeline? Do you have different pipelines? How do you split between the CI and the CD?

That's exactly it. I had lots of different scenarios over the years. I can tell you what I'm doing today and what I think is, I don't know if the perfect pipeline, because at the end of the day, as we like to say, it depends. It depends on the technology, it depends on what you're doing, right? If you're running against Kubernetes, you might have a different CD system. If you're only deploying
Lambda functions, it's easier to just run the deployment from the same pipeline as well. And if you're really advanced and you're actually running continuous delivery, not only deployment, which we can touch on in a little while, then that is another game, right? It's having this detached system that tests whatever is going on, I mean, after you run a deployment it tests what's going on, and then the system on its own makes the decision whether to actually deliver, and by delivery I mean going all the way to production. But let's wait with that. So I think a normal pipeline, just to lay the ground for that, most of the time includes build, test, and deploy, right? Those are the generic steps. That's a full pipeline. That's a full pipeline, yes, because that takes you from building something, be that a binary, a Docker image, or both of them, maybe, to testing that everything is in order, and testing is, you know, another podcast of its own: what are you testing, unit testing, integration testing, end to end, etc., all the way to deployment, which takes whatever you built and tested, hopefully, and deploys that into any environment. I think that's it. Do you agree?

Yeah. So just to classify it, I'd say build and test is for the CI and deployment is for the CD, if you break it down maybe into two separate pipelines. Correct? Yes, yes. Okay, so we have ground rules, right? Build and test in CI; CD is just the deployment. As far as I know, a lot of people in the world have the same pipeline for building and testing and also for deployment. Uh-huh. And before even diving into the specifics, I'm not even sure whether to call it architecture, but I want to throw out some types of deployments and applications. So let's say I have a monorepo, or any kind of project, which includes the frontend, the backend, and the infrastructure.

I do want to say, for a second: anytime someone says monorepo, I smell a fight, so I'm going to stay out
of this fight now. Okay, so let's say it's even in separate projects. Are we? Yeah, I'm just saying. Okay, so let's say for the frontend we have React, just to make it broad, you know, so a lot of people will say, oh, React, I know that one, this is a frontend framework. So it's TypeScript and React. The backend will also be TypeScript, maybe, so it will be NestJS, which a lot of people also know, or even Express.js, I don't really care. Just laying down the ground rules: the frontend is React, the backend is NestJS or Express.js, and the infrastructure is Terraform. Hopefully most of you know Terraform. So now what I'm wondering is how you and I, Omer, can break it down so we'll have, theoretically, the perfect pipeline for each use case: for frontend, for backend, and for infrastructure. It's going to be a few minutes on each topic, and only the ground rules that guide you on how to build the pipeline for each scenario. So let's start from the infrastructure, from the bottom of the ship, from the soil. Do you have any tips or recommendations or anything special that you do when you do CI/CD for infrastructure with Terraform?

I'm not fully there yet, but yes. My one saying about that is: keep your infrastructure managed as you would an application repo. Some would say deploying infrastructure is actually more important, but I don't want to play the game of what's more important; they're both equally important to the customer. At the end of the day it doesn't work without infra and it doesn't work without an application. Just treat it like it's an actual production system. Hopefully you would not go and deploy a pod to Kubernetes in production, you know, just like that, or switch a running application with another version. You don't want to do that with your infrastructure either. For the most part, I know, some things I personally like
to keep, I do like to keep manual, usually areas around DNS, stuff like that; that's how I was educated. Other than that, take the infrastructure code, also for staging and other environments, not only production, and treat it as a different environment. Meaning: you want to change something in infra or create something new? Add the code to the repo, create a commit in a different branch, push it, create a pull request, ask someone to review it, a fellow team member, your team leader, whoever that may be, get the merge request approved, and then once it gets in, it also goes through a CI/CD pipeline and testing. By the way, with Terraform, for example, there's plan: you can test whether everything's right, whether there are missing variables, whether something conflicts with the state, or something like that. So that's kind of a test. Then deploy it to staging, make sure that everything is okay there, and then progress to production.

Now I want to do a deep dive. What you laid down is a very high-level, bird's-eye view of the process. I'm sure there's a DevOps engineer listening to us saying, dude, I know this is the meta-level process of what I'm supposed to do, but eventually I need to write steps in my CI/CD, write steps in my GitHub Actions, Drone, or whatever CI/CD service I'm using. So now I'm wondering, Omer, do you have any special tips when it comes to, say, the merge request? When do you run the plan: on each push to any branch? Maybe you have branches per environment? Maybe you plan for every environment? What's your strategy when it comes to planning with Terraform?

Okay, so I don't want to take it out of the equation, because it depends on your strategy and how you run your Git system. Personally, I like what I think is called trunk-based, where you only have a main or a master, and then you open feature branches and you all merge into the same one, just because you don't want too many
conflicting branches merging with each other or progressing as you were working on a feature. That's also a good solution, by the way. I used to work like that for many years and it was okay for the most part, kind of annoying for developers, but it was okay. They can get by. So we're talking about the actual steps, what do I want to do?

Yeah, because now I'm wondering, when you say plan with Terraform: what happens in a pull request, or on any push to a feature branch, is that you plan. But eventually, when you want to apply, you want to apply the thing that you planned in the past. And with what you've just described, we'll have plan and apply in the same pipeline, and I don't think that's what we want, because you really want to break it down: we see the plan, we say okay, that is good, now I want to apply that plan to an environment, to a live environment in AWS, GCP, Azure, or whatever cloud. So, any tips about that?

It would be hard to give a generic tip that fits everything. What I can offer for this approach is a tool that we've mentioned lots of times here, called Atlantis, which is specifically tailored for pull requests for Terraform, right? So you're making changes to your infra; use Atlantis to manage that for you. Then the entire set of questions you just laid out, should I run the plan, should I deploy right away, should we wait for something, etc., is also handled by Atlantis, because when you integrate it into the system, it integrates as an automation into the pipeline flow. Then when you create a merge request, because it's already integrated, it kind of pops up and runs the plan and tells you: okay, the plan is okay, are you ready to approve? Or: the plan went wrong, I don't know, merge is blocked, please fix the changes, re-push the commit, and I will check again. So that kind of solves the entire set of questions, and lets you run your infrastructure as code, managed as a Git process, like a production
application. I hope that's kind of around what you asked. No, I'm not sure what you're doing... I think it's like, okay, so I'll use Atlantis. Basically, exactly, I'll use Atlantis for that.

Okay, so I want you to challenge my idea. I have an idea, well, I already implemented it, and I want you to tackle it. Let me know what is wrong with my idea, or maybe it's perfect, maybe this is the perfect pipeline for infrastructure, and then we move to the backend and to the frontend. Okay, let's go. So I think that planning with Terraform is important to do on any branch, for any environment. Assuming I have three live environments, development, staging, and production, I always want to know that everything is up to date, no matter what happens to any environment. So on a push to any feature branch, I plan with Terraform. Even though I'm only in development right now and I changed something, maybe I added a Cognito user pool, maybe I added an S3 bucket, or whatever I did in Terraform, I commit and push, and then on each push I get three plans. By the way, if you're worried about race conditions and stuff like that, there is a lock timeout in Terraform that you can use, so if something is locked during planning, it can keep the CI from failing. So it's worth searching for lock timeout.

Question, I'm not sure it's 100% relevant, but just so I have the context: your workflow is a feature branch coming out of dev, merged back to dev, then staging, then production, right? Okay, so I have only a main branch, actually. Okay. As you said. So from the main branch I check out, work on my feature, then commit and push. Everything looks good. Yeah. Then I merge it back. But get this part: Git has nothing to do with what's inside the live environments. Git is only for managing the versions. So how do you decide what goes where? Exactly, we'll get to that in a second, that's the deployment
part. But for now, as I said, I'm planning for three environments simultaneously in my pipeline. Hmm. Each plan is then pushed to its own S3 bucket of Terraform plans, according to a naming convention, per build, per commit, or whatever. So now I have the plan artifacts in the same S3 buckets according to a naming convention, blah blah, amazing.

Following that, assuming I want to deploy a version, it doesn't have to be from a specific branch. I can do it from a feature branch, from the main branch, maybe from a version branch, because, let's say in GitHub, when you create a release it creates its own, you know, new tag, creates a new branch, and that also triggers a new CI/CD run. And if I create, let's say, version 1.0.0-RC2 or something like that in GitHub, then I'll have a new branch, a new CI is triggered, and I'll again have a plan for all three environments. Then the deploy, that's a separate pipeline. In the deploy, and I cannot avoid it, I've got to run terraform init as well, because you've got to get all the modules and stuff. So I do terraform init, and I download the relevant plan from S3 per environment. In my deployment step, what I usually do is provide a path to the pushed plan, you know, from the CI step, and provide the target environment, and all of that is in a workflow dispatch, so in a manual workflow. It doesn't really matter if you're working with GitHub Actions or any other CI/CD provider; just use the manual workflow that your provider has and provide two inputs: the version, or the path in S3 that you want to apply, because that's the path to the plan, and the environment that you want to deploy to.

Where is the environment coming from? That's what I'm trying to understand. So the environment is coming from, like, I used a choice box, so I have three choices: development, staging, and production. Like a human choice box? Like a human choice box, but that
is also irrelevant; if you want to pass it through the API, you can do that too. Do you want to define the problem? Here is my problem. Shoot.

The way I see it, any kind of human involvement in the decision of what goes where is a source of trouble: sometime, somewhere in the future, it's going to be a problem, there's going to be a mistake. You cannot avoid it. And so that's my only tip here, because it sounds right. My problem is decision-making by a human being. Sometime, somewhere, someone's going to push the wrong button or trigger that, because, you know, you get replaced or joined by a new team member, and they decide to make everything automated and build a system around that which figures out a way to send the trigger, automating the choice box, right? Somehow this is going to break, that's my feeling. At the beginning we talked about what the D of CD is, which at the end of the day we also want to be delivery, and with what you're describing here, you've kind of created a human gate, right? Things are happening, and the human gate is where the decision happens: whether I want to deploy to this environment or that environment, or whether the change doesn't make sense because it breaks something, I found an error, and I'm saying no. Sure, we need to start with a human gate, there's no way to avoid that, but let's do it in a more structured way. That can be the branch name, it can be a naming convention in, I don't know, your source code, something like that. But the moment you put in something like a choice box, it kind of leads you to an error.

And kind of a complementary question about that: what if I make it more strict? Let's say the deploy part in my CD pipeline deploys only if the source version contains RC, or whatever, and then I'll know that for production and staging I'm only deploying RCs, and for development I can deploy any version, any branch, any plan. What do you say about that? It sounds perfect. Let's open
a parenthesis and put in a horror story that I think I've told you 100 times, but let's do it again. Maybe that's what's affecting me, because I keep it in the back of my mind. Shoot. Obviously, horror stories have to do with Jenkins, but it's not only Jenkins; any type of system that uses a human gate would hit this somewhere down the road. So we had a process doing exactly that: there's a choice box that decides which environment it's going to be deployed to. We had dev one, two, three, four, and then staging and production, if I'm not mistaken. Hopefully that's correct. The first problem with this, to me, is that if you're already running a Jenkins server, separate out another server for production. That's me. Anyway, let's go on with the story. Someone created an automation around that because he wanted to update something within the environment, I don't remember what. He wanted to test his script, so he ran it, but as he was running it, he forgot to provide a variable to the script. The script had a variable that sets the environment. So he thought to himself, okay, no worries, I forgot to set the variable, it'll just run with the default, which was dev one at the time, because it was the first one in the list. His automation system actually interpreted that as a star, so it ran a loop over all the environments, and it took him seven minutes to delete our entire set of environments, including production. And it took us an all-nighter and then the next morning to get back something like 60 to 70% of the system so customers could at least work. But it was a shit show. It was so bad, and I cannot forget that, and that's why I try to avoid these things as much as I can. That's it.

Another question? Yeah, sure. Another question then. I'm going to improve the solution as we talk. So let's say my CD pipeline doesn't just apply the plan to a specific environment; what it does is send a Slack notification with an approval button. So you get a Slack message to,
maybe we have a releases channel, and there is an approval button, and only certified people can approve those types of requests. So any push to staging or production requires another approval step, and when you click approve, it sends a hook back to GitHub Actions and triggers the correct pipeline that will deploy to staging or production or whatever. What do you say about that? So we have another buffer, another human buffer, along the road.

Okay, I think first of all it is an improvement, because you're turning the choice from deciding which environment it should be deployed to into a yes/no question, which is easier and less error-prone. The problem, I think, is that you still have strings attached to something that's very manual, very human-interactive, right? If someone wants to integrate an automated system into that, they'll have to work around the approval through Slack, maybe with an automated bot in Slack, which is kind of a weird solution, or just bypass the entire thing. And when you start creating bypasses, maybe you're bypassing everything and it will never be approved. Or the question is, why would you want to?

Okay, so you're talking about maybe building on top of it. To my eyes the system is perfect as it is. First of all, I don't even need the Slack approval, as you said. But the system, well, maybe because we haven't had any issues so far, right, but the system works perfectly as is, even though I agree that if someone chooses, maybe, production... well, currently we filter it by actor and branch, you know, so not everyone can just select this branch for this environment, because otherwise what would be the point of choosing production, you know, it wouldn't be protected. So the plan to production is protected. The only thing you can really destroy is maybe development and staging. Challenge accepted. You know, I always like to be challenged with
destructions of stuff. So what else can be wrong in this solution? I want to make a statement and I want you to attack it, okay, based on what you said, because you started this section by saying, what was it, this is already perfect, I'm happy with things the way they are. Allegedly. Allegedly, yeah. Okay, I'm not going to look at each and every specific thing you did there. Sure. But you said it's already perfect, I'm happy with where I am, right? And I'm going to say, okay, I get that you're happy where you're at. To me, and we're talking about the perfect CI/CD pipeline, right, we're looking for the perfect one, to me a perfect pipeline is one that a human doesn't have to be part of. At times things will go wrong and you need engineers; I'm not saying let's get rid of that. I'm saying that the perfect pipeline, if nothing goes wrong, goes on its own through the entire set of environments all the way to production, and rolls itself back if needed, on its own. That's the perfect pipeline.

Hang on with that. So the perfect pipeline, and maybe this is specific to infrastructure, I'm not sure if it applies to backend and frontend, but for infrastructure sometimes the changes are so big. Let's say I do want to delete my Cognito user pool, which is a big change, you know, destroying a user pool or maybe an S3 bucket is a big change, and this is a change that I do want to make. Usually what happens is a human reads the Terraform plan, sees, okay, these buckets should be deleted, maybe they even take it a step further and go manually empty the S3 bucket before the plan takes place, right before we apply, because sometimes you can get an error and blah blah blah. There's always a human involved in that process, especially when it comes to infra.

Okay, I want to say something. You described deleting a Cognito pool, right, for example. That's, I think, less than 1% of the deployments for me. Most of my infra, I mean sure,
you make big changes at times, you change DNS and you change routing and you build environments, but 99.9% of the changes I make are things like adding a port, removing something specific, updating a reference to a document, updating a Lambda function hash, something like that. Exactly. I mean, those are changes that don't require human interaction. Be my guest, add some kind of switch that tells the pipeline this one needs a human gate, this one needs human interaction. But 9 out of 10 times, or even more than that, 99 out of 100 times, I don't need that. I'm not saying I'm there yet; I'm saying that's where I want to be. So let's flip from infra for a second, because with infra, again, you said you need a plan, you need to approve stuff. In the backend world, correct me if I'm wrong, I feel like...

Before we jump to the backend, I just want to finish up with infra. So when it comes to the perfect pipeline for infra, you're saying, even though it's infra, I don't want human involvement? I don't care if it's infra or not, it should be exactly the same, that's what I'm saying. I try to work by principles, right? I'm not saying I'm there, by the way. If you delete a Docker image or update the application in the backend, that's, like, a rollout is just a push. Yeah. Infra holds state. If you delete a database, it's worse than deleting the Docker image of an application, because this one can be rebuilt and this one has state, so you need to recover it, as long as you have a way to recover it, or something like that. I'll say two things. It's an edge case, an extreme case. I agree, but it's an extreme case. And the other side of that: if you plan everything and everything is built correctly, trust the process, right? Even with backend deployments, you delete, you create migrations, you change the data. And it's not deleting a database, which is, okay, it's
dangerous, you're literally deleting the data in the database. So as long as you have backups and snapshots and, exactly, a DR plan, you'll be okay. So for the most part, yes, I think you can treat both on the same level. What I started saying is that I try to apply principles, and as I said, I'm not a saint, I'm not doing this all the way, it's not by the book. But the principle is: treat things with the process you think is right, and I think the right process is no human interaction, automated all the way to production. We can touch on the backend in a few moments and then I can share what I think the right process is.

I just like it that we, I can't say we don't agree, we have different perceptions. I think that you must have human interaction, you know. My concept is that before you go to production there must be a human. Even if it's the perfect pipeline ever, eventually, when you do a retro on the sprint or a retro on the deployment, someone will ask: how come you have a fully automated pipeline to production? Where was the human buffer, where was the operational human shield along the way? This is what I feel, that someone will want to know why a human didn't inspect the Terraform plan, or maybe the backend deployment or the frontend deployment, before it went live to customers.

Let me make another controversial statement and then you can attack it. Shoot. If you stand behind what you just said, that you think you always need the human buffer in deployment to production... Infra, backend, only production environments. Production environment, but I'm not separating between infrastructure and a backend application, right, or whatever application. Shoot. I'm saying, if you think you always need to have a human buffer, you don't trust your process enough. Specifically for a backend, the right saying is probably that you don't trust your tests, which means: improve your tests. The battery is exhausted, my
video is done. Yeah, but if you don't trust the backend processes because of tests, and if you don't trust your infra because of the way you test the infra... Do you want to stop for a second and switch the camera? Yes. Okay, we'll be right back.

Okay, we're back. Okay, yes. So I started by saying you don't trust the process enough if you think you need human interaction in the middle, and I don't care if that's an infrastructure or a backend thing. But specifically in the backend, the likely saying is you don't trust your testing process, and the next line would be: improve your testing process. Sure, maybe you need to add integration tests, probably it's time to implement end-to-end testing, and when you have the entire flow from start to finish and you trust the process enough, you have enough tests, there's no reason for a human buffer. It'll be all right. If you have a perfect pipeline and everything was tested, and we didn't even start talking about security and licensing and compliance and stuff like that, but if you have everything integrated into the perfect pipeline and everything passed, you've already put in kind of human gates, right, the gates are your steps. If you have a hundred steps and they all passed, security, compliance, testing, end-to-end user interaction, whatever, everything passed, you're good to go. Why would you need to stop?

Maybe I'm old-fashioned, I don't know. Maybe it's an old-fashioned way of thinking. I've already been in an organization where the Terraform plan had such a bad impact that people just asked: why hasn't anyone looked in the CI/CD logs to see the plan? Why was it accepted without checking it up front? So maybe I'm talking, and I designed my process, out of bad experience, I'd say. So I think we'll just agree to disagree. I say I want human interaction along the way when going to a production environment,
and you say...

The VP R&Ds and the CTOs I've worked with, I think over 15 to date, they all thought like you. And I can understand where it's coming from. When you own something, especially when you own a production environment that has customers, you want a human to be there and approve the thing, right? I don't want to say blame culture here, but it feels like you want someone to blame, right? It is kind of an old... You want someone who is responsible. I'm saying, usually there is a release manager or someone who says, listen, in this release we need to have this and this feature. So who is responsible for that? That's a better word, much better wording, totally. Ownership, responsibility, not blaming. Sure, I'm with you on that. I'm just saying there's no reason, I mean, if you test well enough and you do have the perfect pipeline, you've already... okay, let's go back a little bit. When we do ops, our ultimate goal is to automate ourselves, right? We want to automate stuff. Otherwise, why would you use Terraform? Just go and build stuff by hand, you'd probably do it better, right? I'm joking, obviously. But you want to automate stuff, you want things to reproduce on their own. You build them, you've made your mark, now let things run. So I'm saying, if you do that with infrastructure and Terraform and testing and security and all of that, there's a reason why systems work better, and that's what I'm saying: if you build a good enough pipeline, you don't need human interaction. Again, again, again, I am not there, I'm not a saint. If you ask me whether I have reached the goal, this is where I should be, but I don't think I'm there yet.

Okay, I'll do a recap of my almost perfect pipeline, but not so perfect according to Omer. Because we talked a lot, I just want to do a recap and then move to backend, and backend and frontend will be way shorter, because we touched on a lot of concepts while we talked about the
infrastructure, right? So the goal is: on any commit to any branch, you plan for all environments and push that plan to an S3 bucket, and in your deployment pipeline you select the S3 path of the plan and the environment you want to deploy to. Of course, it's best to restrict the deployment according to the actor who triggered the pipeline, or maybe, as I suggested to Omer after he told me it's not so good, send a Slack message where you just need to decide yes or no, and then deploy. And when I say deploy, in Terraform that means apply the plan that was downloaded. The only caveat here that I've hit is, let's say you want to roll back. By the way, Omer, you have to re-plan. So think about it: if I want to roll back in my almost perfect pipeline, I need to restart an old pipeline, so the plan is pushed again and I need to check it again. And that's less than half a percent of my CI/CD pipelines, which are mostly on the backend.

So let's move to backend. Okay. So for the backend, I know that a lot of people, for some reason, are building different artifacts for different environments, instead of building the exact same piece of code, the exact same artifact, let's say it's a Docker image, because we're talking about a web application, we talked about Express.js. Yes. So for backend, people, developers or DevOps engineers, are building a flavor for development, a flavor for staging, a flavor for production, or whatever. What do you have to say about that?

A few things. Let's start with the easy one. First of all, you're exactly right. And we just had the fight about how much we trust the process to let it go on its own to production, right? Mainly, if you build the same artifact, most likely, if it was okay in dev and staging, it will deploy fine. I'm not saying it will scale the same way, I'm not saying it will respond the same way,
but it will most likely run okay to begin with in production especially if you build the same artifact lots of issues start when you rebuild the same thing that's the first thing your trust rises if you use the same artifact so you're exactly right beyond that is just waste waste of storage and the bigger one waste of time and we did not touch it at all but I think one of the most important aspects of CI/CD mainly CI is the speed in which things can run now again I was the biggest proponent about let's run tests let's do end to end let's do this let's do that but one of the biggest variables is being able to run things quickly both because you want to run a lot of them right DevOps is all about the cycle of what they call it deploy feedback fix deploy right there's the feedback loop but on the other hand if you need to fix something if you need to roll back or or just fix a bug a bug quickly and then push to production you want it to deploy within within minutes sure you want to trust the process but it should be fairly quick if you do exactly what you said that's one huge way to shorten the path to production because you don't actually need to be so well let's be practical the CI process the CI steps you would like you would add to a CI/CD pipeline okay the CI steps for a pipeline what would they would what would they be like build the artifact so let's say I'm building a Docker image or maybe like running unit tests or whatever tests then build the artifact to run any tests that you want and then what after the build you push it to your but there's something very important to say about that the fact that I will the fact that someone's able to take the same artifact and run it in dev environment and then in staging and then in production means the code itself is agnostic to environment changes and that's you're saying yes because you understand it I think to lots of people developers it's not all that straightforward that means that nothing gets baked into the 
image nor the code base that has to do with the environment everything should be consumed from the outside world and by saying that there's a very famous concept that's called the 12 factor applications go check that out we'll leave the link one of the biggest concepts there is consume everything from environment variables that are being injected to you during runtime not during build time and that's the thing when I change the I can take the same image and have it run in production and it will use different database endpoints and secrets and variables and parameters and everything around that so that's a very very very important aspect and then yeah yeah I want to emphasize what you just said all right so when you build something when you build an artifact if you're using an external tool or service such as Snyk vulnerability maybe SonarQube for linting and testing or whatever but that is only relevant to the build time process then the API keys should be used during the build time process the API keys to Snyk SonarQube or any other API that you use but okay by the way I don't see you anymore but if you're trying to or if you want to run an application as Omer said the secrets and API keys and everything should be injected should be injected during runtime and usually that is done with environment variables so you need to make sure that when you deploy something when you deploy an artifact that during the runtime that artifact like the your application knows how to fetch all of those environment variables or get all of those environment environment variables during runtime this way as Omer said your your artifact will be agnostic so this is like the difference between using and consuming secrets during build time yes uh now one thing first of all I totally agree here nothing to say about that sometimes CTOs mostly they come and tell you okay but that's kind of a waste of um uptime when I'm starting to run my application why should I go and waste time 
especially when it comes into lambdas why should I waste time in the beginning of the lambda runtime which I am paying good money for with fetching secrets and parameters when I can fetch them in build time it may have some truth to that but this is so dangerous first of all you have you're just moving the problem to somewhere else right we now need to build for every single time for every environment the same artifact again and again and again second you're exposing yourself to vulnerabilities if you're baking everything into your lambda package at the end of the day it's just a zip file that's uploaded to S3 more likely and someone can fetch it you can make mistakes configuration mistakes the bucket can be um leaked or opened to the public or someone gets into your AWS environment and today the approach is we assume that we're vulnerable in one way or another why expose yourself further than you have to be so pay that few milliseconds in the beginning just run a good process right don't fetch the entire set of parameters or do bake the parameters but separate them from the secrets and only grab the secrets from your you should some of the secrets I mean some of them are really critical if it's a database password if it's an endpoint it's not that big of a deal I mean it's not good it's not that big of a deal if it's the user name and password of literally your production database probably don't bake it into the image so just uh think about it okay so let's do a recap CI so for the backend the CI process is build run unit test build the artifact and also push it to a private registry like maybe a private Docker Hub um or any type of registry but there's ECR exactly maybe it's even like an S3 if you want to push the zip file or tar.gz or whatever once you have your artifactory okay like your artifact store of tested artifacts this is this is also something I love on there because I only push artifacts that were tested so that is also like because you know that 
you only store things that were tested somehow then in your deployment process as we talked about the infra you can just select which artifact to deploy to which environment and again me and Omer can fight here for a long time like whether it's good or bad to do so but uh how would you deploy Omer like how would you select your artifact on how to deploy it to which environment same as you did for the infra like same process where it should be automated that's the big pipeline I like to dream of myself so I would again operate on the screen I would still select it manually like I want to deploy this version to this environment and okay other than that also we mentioned that it's important during runtime to inject environment variables or secrets to the runtime instead of burning them in the Docker image and I mean publishing again we can talk about it another hour so just again a small thing which can be discussed in a maybe different episode but time is everything and there are so many ways to improve time in CI/CD using caching runners caching is a big thing when you deploy stuff this also means that you need to enforce very rigid and strict hardcoded versions to everything that can be in your libraries that you consume on terraform and obviously to your dependencies in the backend so that's one thing using caching in frontend of course and the other thing is using light images because at the end of the day even if it's running on just containers which are fairly quick to start if it's heavy loaded containers like each of them is a giga or two or three the runners that spin up need to pull the images from somewhere and that takes a lot of time it can take a lot of time if you have lots of deployments running alongside each other that's something to think about so you know Alpine images and stuff like that I want to add to the time consuming by the way you said something important also time consuming like if you only test in the CI process this means you only test once and 
you only do it during development time now suddenly the VP R&D and the release manager tells you we need to release this version to this environment guess what everything was already tested everything was already built was already pushed the only thing that you need to do is to deploy an artifact that was already built and tested that means that deployment time is concentrated only on deploying nothing else and this is what I like about it we don't involve testing during deployment the one thing that yeah okay so that's super important to remember we need to deploy giant I mean sure that's it there are let's put a little star to that and and the fine print is sometimes you are going to have end-to-end tests that need to run against a different environment at the end of the day you are deploying something to a different environment so things are of course in the world of microservices different versions run in staging a different version of other things you're working with are running in production you might run run against that but yes generally yes that you should not and you don't want to regardless of time you don't want to build again because it's already been tested and built like you said you know what I like to add to my deployment path in the backend or even in the frontend I like to let's say inject some variable which says version and then when the deployment is completed I like in the CI/CD pipeline to do a curl you know to query a path of slash version and check in the same pipeline that the version that I deployed was actually deployed you know because the fact that it said deployed it's nice but if the application doesn't really report back that it was deployed with the version that I think of if I was an issue I tell you you don't trust your process you have trust issues I just want to I just want to make sure for what is that percentage itself end to end I like making sure yeah I have trust issues yeah so that last but not least the frontend which I think is 
the most hype and most troublesome process of them all maybe I'm not sure but how would you do you want me to start I do want to describe the best CI/CD for frontend again React Vue.js whatever no matter let me be annoying and say that I try I try to treat for the most part principle wise everything the same say infra backend frontend should be the same yes they don't go into the same place infra changes terraform code and deploys to AWS backend is just a pod on Kubernetes or a lambda function running and frontend can be a static file uploaded to S3 or something else but I have an issue with the frontend so I have a very big issue my application my frontend application maybe needs to use an API key okay so I will talk about API key later but my application my frontend application needs to speak with the backend and how it does it has this variable which is called API URL and this variable changes for each environment so for staging I have this URL production this development this so every time the URL changes but when I build my frontend application as you already know it's like it's bundled into maybe a single index HTML and then main JS or whatever so it's bundled into static files so how can you inject each time a different variable to a different environment so I'm quite confused so again how can you build like we cannot really we are contradicting the story that we told about the backend because it's it's not going to be the exact same what I started by saying sure okay you have edge cases and they're not all alike but the process for the most part should be the same you you are running build test and deploy yes when it's different environments you can like inject this game of fetching where I'm sitting at and then deciding where I'm running or what environment I'm running but I agree that's what I by the way what you describe is what I do I don't see another way of managing it but you can shorten things like you can use again caching to speed up the process and stuff 
like that you don't have to rebuild you don't always have to rebuild the entire thing some of the frontend frameworks do actually they did some work for you in that sense because they understand you will need to rebuild it for other environments so they kind of they do that smart loading and smart caching to reduce times of build at the end of the day yes it's different it's a static test you know that I treat it the same as my terraform so backend is agnostic backend is easy to understand because you can inject stuff during runtime but the infrastructure you know terraform has a very strict plan you know it's like it's a locked artifact same goes to the frontend it has a very strict you know static artifact so in my case for both of them I build for all environments so each commit builds artifacts for all environments both for infra and for frontend so for the frontend I get actually for each commit three zip files for example okay and those are pushed to S3 and upon deployment I choose which artifacts to push to each environment but that's a bit easier I think because then I have like a restriction of okay so if this was down like if this was pushed for development it can only be deployed to development and stuff like that just like I can do for my terraform so this is how I do it okay build for all environments on the same commit and then deploy when you deploy I have trust issues you select it manually which artifact which environment what would you say about that same thing no no no no actually no because this is one of those things that it depends we all have different cases especially with infrastructure it varies so much I mean we we're only talking about terraform people use other technologies some of them are rather limited in what they can say CloudFormation yeah so basically the situation changes as long as the principles are there and you understand okay this is a principle right I want to use the same artifact for everything but it does not apply here 
because the technology I'm using doesn't support that or the I don't know the framework I'm working or the context doesn't make sense but the general principle applies and then you can decide how to work within it as long as you understand the process and why you're doing stuff speed efficiency security ease of deployment ease of rolling rolling back issues that arise etc yeah so the only thing that I said like as you said regarding the speed when I do build for all environments and let's say I'm only intending to push to development and not to production so I'm just I'm building for production and running tests for production but I'm not intending to deploy to production right now I only want to do it when I get back to my main branch so I'm doing it now well I prefer doing it now you know to see things earlier because we always do the shift left right so I like building everything at the same time in the CI/CD even though it takes time so my CI/CD is crazy you know it has a lot of matrix I use GitHub Actions so I use strategy matrix and then I build all flavors at the same time without duplicating my code so I don't have code duplications and stuff like that oh so you also use this I do make yes I have to build a binary that runs on different platforms and we need to test and build it for different architectures architectures operating systems yes do you build it for any branch or only build and test on any branch when we run a feature I mean okay a feature branch will need to be tested on different environments if the change is big enough at the end of the day when you merge it into one of the release branch it will not really release branches environment branches it will be tested against everything but sometimes it doesn't make sense to slow down the pipeline of the feature branch to rebuild again and again and again so you work on one or two architectures and one or two operating systems and then when you merge it to one of the main branches it'll get built for 
everything I have a tough question then why like you cannot catch things before you merge I mean you're saying that no yeah yeah I'm sometimes saying like I'm asking and saying it sounds like you merge and then you're like oh man that's the most let's revert instead of finding it out so I will think about it most of the case you know these are changes that most likely won't be a problem with other environments when you understand the changes deep enough to or just make sense that the change can hurt the operating system that runs against or the architecture then you are trying to you are running the matrix against your system but it's not common most of the changes are small enough for you no wondering you said when you when you so you talking about humans at the developer that makes a change this human interaction so the developer needs to think if he actually doesn't need to think if he doesn't need to other platforms he runs it and it won't get rebuilt for every branch but if he sees an issue then I mean when senior enough developers understand what changes require that and then yes they trigger the pipeline that builds everything but most of the time I mean again 99.9% that's not the case it's not a problem it's rarely the case that they're them I feel like we live in different worlds you live in the 99.9% of the world and I live in the point point point point point point point point point point point point one I think I think that's the nature of zero one I think it's a nature of product I'm building a service that runs on mainly on EC2 instances on AWS that's where it stays it's not a C++ application that needs to run on hardware that I don't trust it's different you know so I want to recap on everything in less than I'll try to do it in 10 seconds okay Omer wants to automate everything and if there's a human intervention then you're doing it well I can't say wrong but it's not perfect and for me is infrastructure and frontend build to all 
environments at the same time for any branch and the deployment step should be manual selected from your artifact store and then deploy to the relevant environment and backend both Omer and I agreed that it should be agnostic and then inject all the secrets and stuff like that like environment variables during runtime but do not burn the API keys and secrets during build time to the artifact so it can be agnostic so this is like a very very short TLDR of maybe a one hour podcast almost okay so you want to move to the point now that we talked about that is we have so much to say about any of these and maybe we will in future episodes but generally I believe the summary the bottom line of mine the takeaway would be the perfect process doesn't involve you build something to replace you and then you can focus on other stuff that's how I see everything all right good so this this is why it's good because we have different perspectives we'll fight and then the fight will bring out ideas and then we fight and keep it together but okay so I think that's the our success here that's our winster yeah epic all right yes okay so moving to the corner this week I'll do the effect just a corner of the week yeah just this week so in this corner Omer and I will talk about challenges news experiences technologies or anything news or anything else that we've experienced this week or any other week so Omer would you like to start okay two things first of all AWS released something quite scary two days ago and they said that they are now starting to charge for every public IP regardless of whether that's attached to an instance or not it feels like you have not heard of that by judging by your eyes so they're starting to charge for everything it's not going to be all that cheap by the way that's basically it I mean there there's more to it I'll leave the announcement below and as far as I know if you have maybe you created a new elastic IP in AWS and it's not allocated right now then you 
pay for it so what you're saying even though it's attached yeah as far as I know and pay for it it's been not only elastic IP but I'm not sure about that we need to read it to be sure it feels like two things about that it feels like AWS are out of addresses are trying to I thought you were gonna say out of money that's my second out of AWS is announced at the last time they've changed like they've increased prices was 2016 or something like that there were always every announcement come with this brag about how we're only lowering prices or helping you reduce costs and this is the first time I remember since 2016 please anyone correct me if I'm wrong that they've actually increased the price of something that's kind of scary and that probably requires a lot of actions on all of us on behalf of all of us so that's the thing I'm just thinking about okay spot instances public spot instances I just want to say this is all what you bring out for IPv4 maybe they're trying to push us to IPv6 which doesn't yet doesn't seem to be delivering on its promise yet and yet and yet spot instances with public IPs when they come up they have a public IP then you delete them so what you pay for the public IP so again that's I'm not 100% sure let's reread same goes for public ECS tasks and same goes for any public IP service they're pushing us to start making changes to our infrastructure for not consuming addresses if we don't have to and applying IPv6 that's I think where they're going with it that's the first thing good to know other is a tool I kind of discovered and didn't have yet a lot of time to play with but it's called let's see the exact name it's called Kubernetes GPT k8sgpt.ai which is a tool that's supposed to be used kind of like ChatGPT only for Kubernetes and they say that they have like an SRE mind baked into a tool that you can ask questions or ask things to do you know run a deployment fix that change the the version on that instance or that node sorry again 
that's the tool I'll leave a link below hopefully it's good enough as they promise those are mine on that note I would like to share my experience this week with ChatGPT so you gave me a good connector to my corner now okay so this week I have a few tips like you can also ask if you want to do maybe a ChatGPT like a pro but we'll see if we're gonna do an episode on that because it's a bit it's a niche okay so let's have a ChatGPT like a pro in 15 seconds something like that so I realized that you can tell ChatGPT first if you're paying for the Plus version you can use custom instructions which is very very new because it was released on the 25th of July like five days ago so yeah so you can custom add custom instructions for example I'll tell you what I did I told it that I'm a DevOps engineer that I live in Israel and that I want that every time I intrigue with a question I want him to reply with challenge accepted and an exclamation mark like Barney Stinson from How I Met Your Mother so every time I ask something like can you do that it will tell me challenge accepted and then I'm good enough to tell you there's no such thing as a DevOps engineer no okay it's not there yeah maybe it's an occupation all right okay so like these are the custom instructions that I told him also you can also add to the custom instructions or in any live chat I don't know if you ever thought about it but you can tell it can you please rate my English so I told it please add the rating to of the of my last of each last message with my English level and then each time you send it a message it will tell you seven eight nine and then you can add please add a sentence about how I can improve it or tell my proficiency you should have heard your accent what about what when you're when you're really pushing it then have it right yeah when you talk it yeah when you talk to you okay so another thing you can ask can you tell me the relevancy to AI of my question so this goes like am I asking a 
good question to make sure that I know how to approach ChatGPT so I wanted to improve myself I wanted my questions to be perfect so I tell it to rate my questions so please exactly so reply to my last message with the rating of how close it is to the my last change because it's an AI model that was trained and now you're making a train you train you exactly because I want to be as much aligned to it as I can to get the best results that I can you can also add please add so you can also add please add a sentiment okay sentiment analysis to your messages so let's say you want to have a big conversation okay with someone maybe with your girlfriend maybe with your boss you can tell it hi I want you to I want to text my boss my girlfriend my whatever and then you want to simulate the conversation and to understand if your messages have a positive sentiment maybe a cynic sentiment sad you know so it it can analyze your sentiment and you can do simulations to understand if you're talking like the way you should like the way you would want someone to judge your your text okay and again it's only speculation and assessment of an AI model but it's good right yeah but you would also use that to change sentiment they take like an email they wrote and then they have it you know run it in a more casual voice or do it a little bit more professional and all but I like the conversation I think people miss the fact that ChatGPT is a chat like yeah you should but people think it's like Google when I chat with it I like it to be a full conversation and with a thing about or someone that can give me feedback on what I'm doing and improve my and improve my questions or my technical knowledge in every aspect you know so this is why also in my custom instructions I told it please give when you give your default references should be to the Python language because it's easier for me to understand with examples so each time you want to give an example please give it in Python so 
it's not in a random language that is according to some AI model so it translates it to Python and then it's easier for me to understand the response you know so I tweaked it a lot that does it fit my name maybe a person I think good question but does it keep a context for future conversations for example you told it yeah it knows that Meir yes only in the custom so it's a thing the custom instruction again right now it's in beta for all the ChatGPT 4 users only the Plus users can use it but you know I like yeah so custom instruction is relevant to any new chat to any new chat but if you want to test it like for the current chat only what I did was I created a new chat did a lot of testing like what I want to appear in every message that I asked ChatGPT and then I took that and put it in my custom instruction so I think custom instruction is like a prefix yeah it gives content a new chat that you do it gives context yeah so every time I asked it knows that I'm a DevOps engineer it knows I prefer Python references I want to get a sentiment level of my English or anything like that when I ask questions yeah so that's like a way another way to use ChatGPT not just like a blunt like ask a question get a response it's also funny because it says challenge accepted when intrigued okay and I can talk a lot about it because I did a lot of experiments you can do tons of things with it it's like really the tip of the iceberg so this like ChatGPT like a pro in five minutes but we can do a full episode okay anything else or maybe it's going to be our longest one it's gonna be our okay thank you everyone see you and bye bye see you next week thank you all
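
A small sketch of two ideas from the backend part of the episode, written in Python since that is the language mentioned for examples: reading configuration from environment variables at runtime (so the same artifact runs in every environment), and checking after a deployment that the application reports the expected version on its slash version path. This is only an illustration under stated assumptions, not the hosts' actual pipeline code. The variable names API_URL, DB_PASSWORD and APP_VERSION, and the helper names load_config and verify_deployment, are made up for this example.

```python
import os
import urllib.request
from dataclasses import dataclass

# Hypothetical variable names; the episode only says "environment variables".
REQUIRED_VARS = ("API_URL", "DB_PASSWORD", "APP_VERSION")


@dataclass(frozen=True)
class RuntimeConfig:
    api_url: str
    db_password: str
    version: str


def load_config(env=None):
    """Read settings at runtime so nothing environment-specific is baked into the artifact."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return RuntimeConfig(env["API_URL"], env["DB_PASSWORD"], env["APP_VERSION"])


def _http_get(url):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def verify_deployment(base_url, expected_version, fetch=_http_get):
    """Return True if the app's /version path reports the version we just deployed."""
    reported = fetch(base_url.rstrip("/") + "/version")
    return reported.strip() == expected_version
```

In a pipeline this would run as the last deployment step, failing the job when verify_deployment returns False. The fetch parameter exists only so the check can be exercised without a live service.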

DevOps Topeaks
