The Scripting Den Podcast

Episode 31 - How (and why) to optimize your unit tests for performance

Season 1 Episode 31

Unit tests are usually second-class (if not third-class) citizens in any codebase: teams usually work on them only once they're done with all the "important" features.
If you work that way, however, chances are those tests will eventually start slowing your development process down. They will affect your coding and your deployment pipelines, hurting your time-to-resolution metrics and even the business, by slowing down your time to market.

In other words, bad tests are a real problem.

In this episode I cover several reasons why optimizing for performance is actually a great idea, and multiple things you can do to improve your unit tests.

Mainly:

  • Get rid of your third-party dependencies.
  • Test only what needs to be tested.
  • Optimize the test data.
  • Implement selective testing.


Get in touch!

Rate us!
To help the podcast reach more developers, make sure to rate it on your favorite podcasting app and on Podchaser!
Thank you

All right, welcome to another episode of the Scripting Den Podcast, the place where I use my 20 years of experience to share some insights and hopefully shed some light on your day-to-day as a developer. In today's episode I want to cover the concept of optimizing your tests for speed, specifically your unit tests. This is something that, as developers, we usually don't pay much attention to when we write them; we just write them to get them over with and move on to the next feature. But we should be paying attention not only to how well they execute, but also to how fast they execute. In this episode I'm going to cover why you should care, plus some tips that can help you improve the speed at which your unit tests run. So let's take a look.

First, let's talk about why you should care. Why do your unit tests need to run as fast as your actual business logic? There are several reasons. The first, and the one that should be the most obvious to everyone, is that the faster your tests run, the faster your feedback loop on the quality of your code. Let me explain. There are two moments when your tests are really useful. The first is when you're developing a feature: if you're doing TDD, for example, you write the tests first, then write the code for your feature and check it against those tests; and even if you're maintaining an existing feature that has tests, you want to make sure that whatever code you're writing is not breaking existing functionality. That's number one: when you're writing code. The second is when you're deploying. As part of your CI/CD pipeline, which you should have, your tests run to make sure you're not trying to deploy anything that doesn't work.
The feedback loop is especially relevant in the first case, the moment when you're writing code. You write code until you reach a point where you say, OK, I think I'm done, or I think I've fixed the bug, or updated the feature, whatever you were trying to do. The next immediate thing you want to do, before you even try it yourself, is run the tests: make sure everything works, or at least that you're in the same state as before, and then go on with your manual tests.

Now, if that process of checking that you haven't broken anything takes a minute, two minutes, five minutes, that's crazy. When we're developing, we should try to avoid these deadlocks, these moments in the workflow where we're dead in the water, with nothing to do but go get a cup of coffee. If you want to be more productive, if you want to push and deploy code faster while keeping quality, then you need to optimize the processes that have the potential to break your flow and leave you staring at the screen, unable to do anything. That's the feedback loop I was talking about: you hit save, run the tests, and ten seconds later you already have a result. That's what you should aim for. That immediate answer, either "you haven't broken anything, everything is working fine" or "you broke these pieces of code, go check them out," is the immediate value tests give you. But if getting that result takes too long, the value is diluted; you're just wasting time as part of the process, which we should all try to avoid. So that's number one.
The second benefit is the one I mentioned about the CI/CD pipeline, and it's essentially the same thing. When you're deploying, you want the deploy to go smoothly, obviously, but also fast. Why? Because sometimes you're deploying a new feature, which is expected, and you can take your time; that's fine. But when you're deploying a hotfix, for example, you want it to hit production as fast as possible, because whatever is happening right now is affecting your users. So if your tests are the reason your deployment is slow, that's a problem, and one you can definitely fix. It just means you or your team haven't been paying attention to the way you write, code, and execute your tests; you're treating tests as second-class citizens, which should not be the case. Tests are just as important as the code they test, and whatever team you're part of should share that mentality.

There are two more benefits to improving the speed at which your tests run. One is that fast tests encourage developers to write more tests. That can sound like a silly benefit, but take a step back and think: when was the last time you wrote a test, and did you actually enjoy the process? Is your team actively making sure tests are being written and maintained? Usually what happens is that tests are, again, a second-class citizen, if not a third-class one: the last thing a developer does, after all the manual testing, after they think they're done with the feature, is write some tests.
That's because writing tests takes time away from the more interesting things, and because if they're not properly set up and not efficient when they run, they eat into the schedule for everything else. Performant tests encourage people to write more of them: they run fast, they deliver that feedback on the state of your code quickly, and once developers see that value, they'll recognize the importance of having more tests and covering as much as possible.

Finally, the last benefit of fast, performant tests shows up when your system scales. If your testing suite, the thing that loads and executes all the tests that need to run, is not efficient, then once the codebase is big enough you're going to have a problem: slower deployments, slower build times, slower development workflows; essentially, everything slows down. So if you expect your codebase to grow big, this is something you should be looking into.

Now, leaving the benefits aside, how can you actually improve the performance of your tests? The first thing you should look at, and this is something I always recommend for unit tests in general, not only when thinking about performance, is to get rid of third-party dependencies: get rid of everything that is not the code you're testing. Remember, a unit test is meant to test a small piece of code; sometimes it's not even a whole function, just a section of one.
But if the code you're testing uses some external dependency, and by external dependency I mean the database, the file system, an external API, or even just a set of other functions that do some calculation and return a value, then anything that is not specifically the lines of code you're testing should be either mocked or stubbed. That means you inject a replacement for that external dependency, and you can then control the output of that operation, whatever it is. You also avoid depending on something that could slow your test down without the slowness being the fault of the code you're testing.

So let me say that again. When you're writing a unit test, you're testing a specific bit of your code, and whatever the result of that test is should relate specifically to that bit of code. If there is a side effect, if there is slowness, if there is anything affecting the performance of your test that is not directly related to the code, then it's a problem you can easily avoid by stubbing or mocking those dependencies. Take the database, for example, which is the most common case: some developers just create a test database, spin it up when the tests start, and drop it when they end. Even if it's just a local database, that process adds a delay, because the tests are interacting with another system. And if for some reason that other system fails, your test fails too, through no fault of the code you're testing; it's the fault of a third-party dependency. That's why I always recommend the same thing.
When you're writing a unit test, remove every single third-party dependency, whatever it is. Just mock it and control exactly what comes out of the operations that interact with it. If it's a SELECT, your mock always returns the same thing you want. If you're trying to test an error condition while interacting with a database, you can easily mock that connection and always return an error when the test runs, so you can exercise the logic that handles that error. There are just too many benefits not to use this technique, and one of them is that the test runs faster, because there is nothing else to interact with: everything is already loaded in memory, and everything just works.

The other advice I can give you for improving the performance of your test suite is closely related to the previous point: make sure you test only what you need to test. It could be that you're implicitly testing the interaction with third-party dependencies by not removing them; testing the connection to a database, for instance, which is something you should not be testing. Even when you do mock these dependencies, consider whether the piece of code you're testing just contains a call into a library or framework, a function call you didn't write. Say you're using MongoDB with the Mongoose ORM library in a Node.js project, and you call its save function to store a document in the database.
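To make the stubbing idea concrete, here's a minimal sketch in Node.js. The function, its name, and the stubbed client are all illustrative, not something from the episode; the point is that because the database client is passed in as a parameter, the test can replace it with an in-memory stub and even force the error path on demand:

```javascript
// Hypothetical function under test: looks up a user and formats a display name.
// The database client is a parameter, so a test can replace it with a stub.
async function getUserDisplayName(db, userId) {
  try {
    const user = await db.findById(userId);
    return `${user.firstName} ${user.lastName}`;
  } catch (err) {
    return 'Unknown user';
  }
}

// Stub that always succeeds: no real database, the "query" resolves in memory.
const happyDb = {
  findById: async () => ({ firstName: 'Ada', lastName: 'Lovelace' }),
};

// Stub that always fails: lets us exercise the error-handling branch reliably,
// without waiting for a real connection to break.
const brokenDb = {
  findById: async () => { throw new Error('connection refused'); },
};

async function runTests() {
  const name = await getUserDisplayName(happyDb, 42);
  const fallback = await getUserDisplayName(brokenDb, 42);
  return { name, fallback };
}
```

Both cases run in microseconds, and a failure can only come from the logic inside `getUserDisplayName`, never from a flaky external system.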
If, as part of your unit test, you're testing the connection to the database, calling something like mongoose.connect, keep in mind that whoever created the library you depend on has already written a full test suite for that particular method and every piece of code related to it. You're testing something that has already been tested. Now, if you don't know whether that's the case, you probably shouldn't be using that library in the first place, but that's a whole different conversation. Assuming you've vetted the library, you know it has unit tests and a high quality of code, then why would you test something that has already been tested?

You have to be smart about which pieces of code you test: make sure your tests exercise code that you've written. That might look like a small improvement, but you'd be surprised how many times I've seen tests that just double-check something we already know has been tested by the library's creators. That's milliseconds on top of milliseconds, a delay that in a big codebase can become considerable, and you don't really need it. Check and test specifically the code you've written and nothing else.

Another piece of advice, which probably has minimal impact but can still affect the performance of your tests, is to be smart about how you create and maintain your test data. In some situations you definitely need to load test data to be processed by your code; in that case, make sure you have the absolute minimum needed for the code to be tested properly. What do I mean by that?
If your code loads a JSON file and processes the data following some logic, make sure that file has two records, three maybe, but that's it. Don't create a three-megabyte JSON file that, for one, takes longer just to be loaded into memory, and then forces your code to walk through the entire set of objects to test something that could have been tested with one, two, or three records. Be smart about how you create the test data; that's what I'm trying to say. Make sure it's easily available, that it loads fast, and that it doesn't add a delay simply by its nature, just because it's so big. Unless, of course, what you're trying to test is your code's ability to process large amounts of data. If that's not the case, be smart about how much test data you maintain and how it's loaded, and if you can minimize it, make sure you do.

Another thing you should look into is keeping your tests as simple as possible. If you feel like you're over-engineering a test because you have to work around the public API of whatever you're testing, because you really didn't design it with tests in mind, then instead of writing an overly complex test just so it can run and execute, think about fixing the design of your API. And by API I don't mean an HTTP API, but the public access to your code. Remember, a unit test only ever tests public code; it never directly tests private code, whatever "private" means in your framework or language.
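The minimal-fixture idea can be sketched like this (the record shape and the function are hypothetical, chosen only to show the pattern). Two inline records, one matching and one not, cover both branches of the logic without loading any file at all:

```javascript
// A minimal inline fixture: two records are enough to cover both branches
// (one that matches the filter, one that doesn't), instead of loading a
// multi-megabyte JSON file from disk.
const fixture = [
  { id: 1, status: 'active' },
  { id: 2, status: 'archived' },
];

// Hypothetical function under test: keeps only the active records.
function activeRecords(records) {
  return records.filter((r) => r.status === 'active');
}

const result = activeRecords(fixture);
```

The fixture lives next to the test, loads instantly, and makes the expected output obvious at a glance.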
So if you're doing extra work just to reach a specific logic path within your code, that's probably a sign there's a problem with your design, not with the test. The test should be very straightforward: it should just execute your code and check the results, with almost nothing else around it. If you find yourself adding extra machinery just to reach or test a specific section of your code, rethink the design; it will make the code a lot easier to test, and even to maintain, in the future.

One very common example of this is building very complex workarounds to inject dependencies into your classes or modules, simply because you didn't think about that when you designed them. You end up using metaprogramming or obscure features of your language or framework just to grab and replace some dependency that is hardcoded in the tested code. Instead of doing that, you could redesign the code to accept a replacement, to accept dependency injection. Remember that when you're designing your code, you have to think about the tests that will run against it. If you encase it in a box so impenetrable that you can't reach it from anywhere other than the specific place where it's meant to be used, you're making life very hard for whoever writes tests in the future, including yourself. This is why TDD is such a great methodology: you start with the test, so you're already thinking about that interface into your code being accessible and injectable, and then you just need to write whatever goes inside it. But that's probably a topic for another episode.
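Here's one way that redesign can look, as a minimal sketch with invented names. Instead of hardcoding the dependency and fighting it with metaprogramming later, the constructor accepts it as a parameter, so production code passes the real thing and tests pass a fake:

```javascript
// The mailer is a constructor parameter rather than a hardcoded module-level
// dependency, so a test can inject a fake without any metaprogramming tricks.
class ReportService {
  // The default is a placeholder; production code would pass a real mailer.
  constructor(mailer = { send: () => { throw new Error('no mailer configured'); } }) {
    this.mailer = mailer;
  }

  sendReport(text) {
    return this.mailer.send({ body: text });
  }
}

// In a test, inject a fake mailer that just records what was sent.
const sent = [];
const fakeMailer = { send: (msg) => { sent.push(msg); return 'queued'; } };

const service = new ReportService(fakeMailer);
const status = service.sendReport('monthly totals');
```

The test stays tiny: construct, call, assert. No network, no mail server, no reaching into private state.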
But definitely consider that if you're writing way too many lines inside a single test, and you're only testing one thing, which you should be, then you're probably over-engineering it, which means there's probably a design issue rather than a test issue. And if you don't fix that design issue, the test will have to do a lot more work and will be slower than it should be.

The final piece of advice is not really about the unit tests themselves, but about the tools you use to execute them. In some situations, if the codebase is large enough and the test suites have way too many tests, a good option is to implement selective testing: when you deploy something, only the affected portion of the code gets retested, and everything else is skipped. There are CI/CD tools and test runners that let you do this. Essentially, you optimize the whole testing process not by making individual tests faster, but by executing only the tests that have a direct dependency on whatever changed. If you have a hundred tests for your entire system and only a small portion of the codebase changed in the last deploy, for example, then maybe two or three tests need to be rerun; everything else would just be redundant. It's a huge improvement in testing time; you just need to make sure that whatever tooling you're using is compatible with this methodology. It's definitely a great option, just not entirely straightforward to implement.

And that's it for this episode. Remember, testing is not meant to be a boring or secondary activity.
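As one concrete example of selective testing in the Node.js world (the episode doesn't name a specific tool, so take this as an illustration): the Jest test runner can use Git to figure out which tests are affected by your changes and run only those.

```shell
# Run only the tests related to files changed since the last commit (Jest).
npx jest --onlyChanged

# Or, the usual shape in a CI pipeline: run only the tests affected by
# changes relative to a base branch.
npx jest --changedSince=main
```

Both flags rely on Jest's dependency graph, so tests that import (directly or transitively) a changed file are rerun, and everything else is skipped.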
It is very important for keeping and maintaining the integrity and quality of your code. Whether you're just getting started with a project or you're maintaining legacy code, tests are there to give you a safety net when you're making changes. If they're slow, they will affect your workflow and make you less productive as a developer. So keep the speed of your tests in mind when you write them. It will give you a faster feedback loop, it will increase your productivity, and it will increase the speed at which deployments are done, giving you a faster time to resolution and even a faster time to market, which is crucial for the business and for the dev team as well. It will also encourage developers to write more tests, because they run fast and are performant. And it will make sure your tests scale with your application's codebase: the bigger the code, the more tests you'll have, so make sure they're performant enough to be executed fast.

As for how to do that: remember that mocks and stubs are your go-to tools; you should remove every single third-party dependency from your code when running the tests. Make sure you're testing only what needs to be tested; don't retest something from a library or dependency that you should assume, or have verified, is already tested, because inside your own tests it shouldn't be tested again. Make sure to optimize the test data you use: don't create tons of data for your tests; if it's slim and can be loaded and processed fast, even better. And finally, if the CI/CD tool and the test suite you're using allow for it, implement selective testing; it will definitely improve the performance of the overall process, especially when it comes to deployments.
If you're deploying small changes, hotfixes, new features, rather than doing full-system deployments, this is especially useful. It shines on existing applications with large codebases, where you're not affecting the whole thing with every deploy, just making small changes. So keep that in mind.

Anyway, I hope you found the video, or the podcast, interesting. If you did, and you're watching this on YouTube, make sure to subscribe, like the video, and share it, or leave a comment. I love it when developers reach out and share what they think about the episode, or their own ideas for topics to cover. And if you're listening to this in podcast format, make sure to like or even rate the podcast in whatever application you're using, so we can reach more developers. Anyway, thank you, that's it, and I'll see you in the next one. Bye.

And that was it for this episode of the Scripting Den Podcast. Thank you so much for listening. Remember that you can find me online on Twitter at the league mat 123, and you can also find the podcast's YouTube channel by searching for The Scripting Den on YouTube. Thank you again, and catch you on the next one.
