Why you shouldn't write unit tests?
Introduction
So you're writing tests, following all the best practices you've read about. You've read that "unit" tests should test the smallest possible piece of code in isolation. A single class, a single method, a single responsibility. Uncle Bob says so. Kent Beck says so. Every testing framework tutorial starts with this assumption.
You feel productive. Your coverage is high. Your tests run in milliseconds. Everything looks great on paper.
Then you refactor some internal logic and suddenly half your test suite breaks. You change how two classes collaborate and spend more time fixing tests than writing actual features. Your "fast feedback" has become a slow trudge through brittle test maintenance.
Surprisingly, most of what I've seen defined as "unit testing" actually made our codebases harder to maintain, not easier.
The way most developers approach unit testing is theoretical and dogmatic, leading to tests that break for the wrong reasons and pass when they shouldn't, causing more harm than good.
Traditional Unit Testing
When I started coding, I was taught this exact cycle for writing tests:
- Write code
- Write a test that exercises that code
- When your code needs a collaborator, introduce an interface
- Mock out that interface in your tests
- Repeat until you have "good coverage"
The assumption is that a "unit" means the smallest testable part, typically a single class. This leads to tests that mock out everything except the class under test, creating what many regard and value as "pure" unit tests.
Let's look at a real example. Say you're building an orders API for an e-commerce platform. You might have something like this:
public class OrderService
{
    private readonly IOrderValidator _orderValidator;
    private readonly IOrderCalculator _orderCalculator;

    public OrderService(
        IOrderValidator orderValidator,
        IOrderCalculator orderCalculator)
    {
        _orderValidator = orderValidator;
        _orderCalculator = orderCalculator;
    }

    public async Task<OrderResult> CreateOrderAsync(CreateOrderRequest request)
    {
        var validationResult = await _orderValidator.ValidateAsync(request);
        if (!validationResult.IsValid)
            return OrderResult.Failed(validationResult.ErrorMessage);

        var order = new Order(request.CustomerId, request.Items);
        var pricing = await _orderCalculator.CalculatePricingAsync(order);
        order.ApplyPricing(pricing);

        // Process the order...
        return OrderResult.Success(order.Id);
    }
}
Following traditional unit testing practices, your test might look like this:
[Test]
public async Task CreateOrderWithValidRequestReturnsSuccess()
{
    // Arrange
    var mockOrderValidator = new Mock<IOrderValidator>();
    var mockOrderCalculator = new Mock<IOrderCalculator>();

    var orderService = new OrderService(
        mockOrderValidator.Object,
        mockOrderCalculator.Object);

    var request = new CreateOrderRequest
    {
        CustomerId = "customer123",
        Items = new[] { new OrderItem("product1", 2) }
    };

    mockOrderValidator
        .Setup(x => x.ValidateAsync(It.IsAny<CreateOrderRequest>()))
        .ReturnsAsync(ValidationResult.Valid());

    mockOrderCalculator
        .Setup(x => x.CalculatePricingAsync(It.IsAny<Order>()))
        .ReturnsAsync(new OrderPricing(100.00m, 10.00m, 110.00m));

    // Act
    var result = await orderService.CreateOrderAsync(request);

    // Assert
    Assert.True(result.Success);
    mockOrderValidator.Verify(x => x.ValidateAsync(request), Times.Once);
    mockOrderCalculator.Verify(x => x.CalculatePricingAsync(It.IsAny<Order>()), Times.Once);
}
Problems
This test looks comprehensive, but it's actually quite fragile and doesn't test what matters:
- Too Curious:
The test knows way too much about how OrderService works internally.
If you change the order of operations, add logging, or refactor the internal flow, the test breaks even though the behavior from a user's perspective hasn't changed.
Ironically, TDD advocates understand this problem better than most and try to solve it through rigorous manual discipline.
The "Red" in Red-Green-Refactor exists precisely because writing the test first forces you to think about behavior before implementation.
However, most developers skip the "Red" phase and write tests after the code is done, which inevitably couples the test to whatever implementation decisions they just made.
- Mock Hell:
You're spending more time setting up mocks of your own code than actually testing the behavior.
Each mock setup is another assumption about how your code should work internally, not about what it should do externally. More importantly, it all counts as code coverage when it really shouldn't!
- False Confidence:
This test tells you that your OrderService calls the right methods in the right order, but it doesn't tell you if an actual order gets created correctly.
You could have bugs in your mapping logic, validation, or state transitions that this test would never catch.
- Maintenance Nightmare:
Every time you refactor OrderService, you need to update not just the implementation but also all the mock setups.
Your tests become a maintenance burden instead of a safety net, because you have to change them every time you change anything.
Essentially, they're not preventing regressions at all, defeating the purpose of having tests in the first place! The sketch below shows how quickly this plays out.
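To make that concrete, here's a hypothetical refactor of OrderService. Suppose validation and pricing get merged behind a single IOrderPipeline collaborator (an interface invented purely for this sketch); the behavior a caller sees is identical.

// Hypothetical refactor: caller-visible behavior is identical, but validation
// and pricing now sit behind a single IOrderPipeline collaborator.
public interface IOrderPipeline
{
    Task<ValidationResult> ValidateAsync(CreateOrderRequest request);
    Task<OrderPricing> CalculatePricingAsync(Order order);
}

public class OrderService
{
    private readonly IOrderPipeline _pipeline;

    public OrderService(IOrderPipeline pipeline) => _pipeline = pipeline;

    public async Task<OrderResult> CreateOrderAsync(CreateOrderRequest request)
    {
        var validationResult = await _pipeline.ValidateAsync(request);
        if (!validationResult.IsValid)
            return OrderResult.Failed(validationResult.ErrorMessage);

        var order = new Order(request.CustomerId, request.Items);
        order.ApplyPricing(await _pipeline.CalculatePricingAsync(order));

        // Process the order...
        return OrderResult.Success(order.Id);
    }
}

Nothing a user can observe has changed, yet the constructor signature and both mocked interfaces did, so the "pure" unit test above no longer compiles and every Setup and Verify has to be rewritten.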
A Different Approach: Application Level Testing
I don't think there's a refined or standard name for this type of test. I usually call them Application Level Tests, but I've seen them go by Component Tests, Service Tests, etc.
The name does not matter and the idea remains the same:
Instead of testing individual classes in isolation, focus on testing behavior from the outside as much as possible.
Here's the same scenario using what I call "Application Level testing":
public class CreateOrderTests
{
    private readonly TestApplication _app;
    private readonly OrderTestFixture _orderFixture;

    public CreateOrderTests()
    {
        _app = new TestApplication();
        _orderFixture = new OrderTestFixture(_app);
    }

    [Test]
    public async Task WithValidItems_CreatesOrderAndProcessesPayment()
    {
        // Arrange
        var customer = await _orderFixture.CreateCustomerAsync("John Doe");
        var product = await _orderFixture.CreateProductAsync("Laptop", price: 999.99m, stock: 5);

        var request = new CreateOrderRequest
        {
            CustomerId = customer.Id,
            Items = new[] { new OrderItem(product.Id, quantity: 1) },
            PaymentMethod = new PaymentMethod("card", "4111111111111111")
        };

        // Act
        var response = await _app.PostAsync("/api/orders", request);

        // Assert
        response.StatusCode.Should().Be(HttpStatusCode.OK);

        var order = await response.ReadAsAsync<OrderResponse>();
        order.Status.Should().Be("Confirmed");
        order.Total.Should().Be(999.99m);
        order.Items.Should().HaveCount(1);
        order.Items[0].ProductId.Should().Be(product.Id);

        var savedOrder = await _orderFixture.GetOrderAsync(order.Id);
        savedOrder.Should().NotBeNull();
        savedOrder.PaymentStatus.Should().Be("Paid");

        var updatedProduct = await _orderFixture.GetProductAsync(product.Id);
        updatedProduct.Stock.Should().Be(4); // Stock decremented

        var notifications = await _orderFixture.GetNotificationsForCustomerAsync(customer.Id);
        notifications.Should().ContainSingle(n => n.Type == "OrderConfirmation" && n.OrderId == order.Id);
    }
}
Differences
These tests verify that the system behaves correctly from a user's perspective.
They don't care whether you use OrderService or OrderHandler or OrderProcessor internally.
They only care that when you send a valid order request, you get an order back, and that when you send an invalid one, you get an appropriate error.
Real Data Flow
Instead of mocking everything, we're using real implementations with in-memory or test databases. This catches integration bugs that mock-based tests miss completely.
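There are many ways to wire this up. Below is a minimal sketch of what the TestApplication used above could look like, assuming an ASP.NET Core app tested through WebApplicationFactory with the EF Core in-memory provider. Program, AppDbContext, IPaymentGateway, and FakePaymentGateway are assumed names from your application, not part of the article's example, and ReadAsAsync is a tiny helper to match the tests above.

using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// A minimal sketch: boots the real app with its real wiring, swapping only
// the edges (database, external payment provider) for test-friendly versions.
public class TestApplication : WebApplicationFactory<Program>
{
    protected override void ConfigureWebHost(IWebHostBuilder builder)
    {
        builder.ConfigureServices(services =>
        {
            // Swap the real database for an isolated in-memory one per run.
            services.RemoveAll<DbContextOptions<AppDbContext>>();
            services.AddDbContext<AppDbContext>(o =>
                o.UseInMemoryDatabase($"orders-{Guid.NewGuid()}"));

            // Swap the external payment provider for a configurable fake.
            services.RemoveAll<IPaymentGateway>();
            services.AddSingleton<IPaymentGateway, FakePaymentGateway>();
        });
    }

    public Task<HttpResponseMessage> PostAsync<T>(string url, T body) =>
        CreateClient().PostAsJsonAsync(url, body);
}

public static class HttpResponseMessageExtensions
{
    // Matches the ReadAsAsync<T> helper used in the tests above.
    public static Task<T?> ReadAsAsync<T>(this HttpResponseMessage response) =>
        response.Content.ReadFromJsonAsync<T>();
}

OrderTestFixture can then be a thin wrapper that resolves AppDbContext (and the fake gateway) from _app.Services to seed customers and products and to read back orders and notifications.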
You can now even test complex flows and edge cases. This is very useful for compensating actions, retry strategies, logging/telemetry, etc.
Here's an example of extending this same test suite in ways we couldn't reasonably do the traditional way:
[Test]
public async Task WithInsufficientStock_FailsWithAppropriateError()
{
    // Arrange
    var customer = await _orderFixture.CreateCustomerAsync("Jane Doe");
    var product = await _orderFixture.CreateProductAsync("Limited Item", price: 50.00m, stock: 1);

    var request = new CreateOrderRequest
    {
        CustomerId = customer.Id,
        Items = new[] { new OrderItem(product.Id, quantity: 2) }, // More than available
        PaymentMethod = new PaymentMethod("card", "4111111111111111")
    };

    // Act
    var response = await _app.PostAsync("/api/orders", request);

    // Assert
    response.StatusCode.Should().Be(HttpStatusCode.UnprocessableEntity);

    var error = await response.ReadAsAsync<ErrorResponse>();
    error.Code.Should().Be("insufficient_inventory");

    // Verify no side effects occurred
    var orders = await _orderFixture.GetOrdersForCustomerAsync(customer.Id);
    orders.Should().BeEmpty();

    var updatedProduct = await _orderFixture.GetProductAsync(product.Id);
    updatedProduct.Stock.Should().Be(1); // Stock unchanged
}

[Test]
public async Task WithInvalidPaymentMethod_FailsAndReleasesInventory()
{
    // Arrange
    var customer = await _orderFixture.CreateCustomerAsync("Bob Smith");
    var product = await _orderFixture.CreateProductAsync("Widget", price: 25.00m, stock: 10);

    var request = new CreateOrderRequest
    {
        CustomerId = customer.Id,
        Items = new[] { new OrderItem(product.Id, quantity: 2) },
        PaymentMethod = new PaymentMethod("card", "4000000000000002") // Invalid card
    };

    // Act
    var response = await _app.PostAsync("/api/orders", request);

    // Assert
    response.StatusCode.Should().Be(HttpStatusCode.UnprocessableEntity);

    var error = await response.ReadAsAsync<ErrorResponse>();
    error.Code.Should().Contain("payment_invalid");

    // Verify inventory was released
    var updatedProduct = await _orderFixture.GetProductAsync(product.Id);
    updatedProduct.Stock.Should().Be(10); // Stock back to original

    // Verify no order was created
    var orders = await _orderFixture.GetOrdersForCustomerAsync(customer.Id);
    orders.Should().BeEmpty();
}
The tests verify that side effects actually happened: inventory was decremented, notifications were sent, payments were processed.
Resilient to Refactoring
You can completely restructure your internal architecture and these tests will continue to pass as long as the external behavior remains the same.
Imagine you're introducing a new "User Validator Middleware", or centralizing the order calculation service in a new or different layer.
With minimal to no changes to those tests, you can confirm that there are actually no regressions between the old and new code.
At the end of the day, this is what you want from tests: they should catch regressions in behavior, not flag changes in structure, and they shouldn't need rewriting every time you make one.
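As a sketch of that first scenario (assuming ASP.NET Core; the class name, header, and config key are all invented for illustration):

using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Configuration;

// Hypothetical "User Validator Middleware" using a simple header check.
public class UserValidationMiddleware
{
    private readonly RequestDelegate _next;
    private readonly string _expectedApiKey;

    public UserValidationMiddleware(RequestDelegate next, IConfiguration config)
    {
        _next = next;
        _expectedApiKey = config["Api:Key"] ?? string.Empty;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Reject requests without the expected key before they reach any endpoint.
        if (context.Request.Headers["X-Api-Key"] != _expectedApiKey)
        {
            context.Response.StatusCode = StatusCodes.Status401Unauthorized;
            return;
        }

        await _next(context);
    }
}

// In Program.cs: app.UseMiddleware<UserValidationMiddleware>();

The application level tests never mention this class. At most, the TestApplication or its client starts sending the expected header, which is exactly the "minimal to no changes" described above.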
What is a Unit?
Every dev wonders what a unit is, but nobody can actually agree on an answer.
Is it a method? class? module? bounded context? Ask 5 devs and you'll get 6 different answers.
Even the same developer will define "unit" differently depending on what feature they're working on that day.
When you're working in a team, this subjectivity creates chaos, especially when iterating fast. Everyone has different definitions and approaches to testing the same functionality.
Tests start overlapping in weird ways. You're trying to understand how something works, you assume you're testing X, but the developer before you was testing Y while mocking Z.
It becomes a nightmare when you need to change existing code. How is this component tested? Should the existing test change? Is there even a test covering this scenario? You spend more time on archaeology than on actual development.
Application Level testing cuts through this mess.
Instead of arguing about artificial boundaries that are fragile/internal and bound to change, you align on something concrete that actually matters: the behavior your application exposes to users.
Everyone on your team understands what that means and where it happens. You only need to dive into the implementation details rabbit hole for genuine edge cases.
We still need Traditional Testing
There are times when testing a singular/very small unit still makes sense:
Complex Algorithms
Even with complex business logic or calculations, your default should still be Application Level tests. Only consider isolated testing if you've actually proven that testing through the application boundary is prohibitively expensive/impossible.
Most developers reach for isolated tests way too early. They see a complex algorithm and immediately think "this needs its own test class."
But ask yourself why: are the combinations you're testing realistic scenarios that can actually happen in production? Or are you just over-testing for no reason?
An Exception
The most common and valid case for a "yes" answer, i.e. for "lowering" the level of your tests, is promoting layers and components:
Following up on our Orders API, let's talk about order pricing. In the early days of your product, pricing is simple: you fetch prices from providers, maybe apply a basic discount, done. Testing this through your orders API works fine.
But then your company/product grows. Suddenly pricing becomes complex. Different prices by country, by customer tier, by time of day, by inventory levels. Bulk discounts, promotional codes, seasonal adjustments.
Yay! You now have hundreds of edge cases.
At this point, testing every pricing combination through your orders API becomes a nightmare. Your test suite takes forever to run, and you're creating thousands of fake orders just to test pricing logic, which seems unreasonable.
This is when you extract pricing into its own library/service/component/module with its own Application Level Tests.
You're not testing implementation details anymore, you're testing a legitimate business domain that has its own clear boundaries and contracts.
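Here's one hypothetical shape such an extraction could take, with the pricing domain tested at its own boundary. The PricingEngine, its record types, and the 15% rule are all invented purely for illustration.

using System.Collections.Generic;
using System.Linq;
using FluentAssertions;
using NUnit.Framework;

public enum CustomerTier { Standard, Gold }

public record PricedItem(decimal UnitPrice, int Quantity);
public record PricingRequest(string Country, CustomerTier CustomerTier, IReadOnlyList<PricedItem> Items);
public record PricingResult(decimal Subtotal, decimal Discount, decimal Total);

public class PricingEngine
{
    public PricingResult Price(PricingRequest request)
    {
        var subtotal = request.Items.Sum(i => i.UnitPrice * i.Quantity);

        // Made-up rule: gold-tier customers ordering 100+ units get 15% off.
        var discount = request.CustomerTier == CustomerTier.Gold
                       && request.Items.Sum(i => i.Quantity) >= 100
            ? subtotal * 0.15m
            : 0m;

        return new PricingResult(subtotal, discount, subtotal - discount);
    }
}

public class PricingEngineTests
{
    private readonly PricingEngine _engine = new();

    [Test]
    public void GoldTierBulkOrder_GetsTieredDiscount()
    {
        var request = new PricingRequest(
            Country: "DE",
            CustomerTier: CustomerTier.Gold,
            Items: new[] { new PricedItem(UnitPrice: 10.00m, Quantity: 100) });

        var result = _engine.Price(request);

        // A realistic combination of tier and volume, no fake orders required.
        result.Subtotal.Should().Be(1000.00m);
        result.Discount.Should().Be(150.00m);
        result.Total.Should().Be(850.00m);
    }
}

These are still behavior-level tests; they just treat the pricing component's public contract as the boundary instead of the HTTP API.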
Common Pushback!
Here are the common objections I've faced when proposing this approach:
- "But my tests are faster!":
Traditional unit tests are fast to run, but slow to maintain. Every time you refactor, you're updating both production code and test code. The time you save in test execution, you lose in test maintenance.
Application Level tests might take a few hundred milliseconds instead of a few milliseconds, which is barely noticeable for a human behind a screen, but they break far less often and provide much better coverage. Factor in the total time!
- "I can't test error paths without mocks!":
You absolutely can. Use test implementations that can be configured to simulate different behaviors, including failures; see the sketch after this list. This gives you much more control than mocks while still exercising real code paths.
Mocks/stubs are fine for external dependencies, where you're rightfully assuming how they work and testing how your code reacts.
The dependencies themselves (and those assumptions) will be covered by other types of tests, like Integration/Contract Tests.
- "My coverage will go down!":
Coverage metrics based on lines of code executed are misleading and not something you should rely on. Application Level tests typically provide better functional coverage even if they show lower line coverage.
Instead of measuring lines covered, measure behaviors covered. A single Application Level test often covers multiple classes and integration points that would require dozens of traditional unit tests.
- "These aren't unit tests, they're integration tests!":
The goal isn't to categorize tests but to write tests that give you confidence in your system's behavior while being maintainable over time.
Call them whatever you want, although I still find "unit" an appropriate name given they run exclusively in memory. Your unit is your service, if you think about it ;)
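As promised above, here's one hypothetical shape for such a configurable test implementation: the FakePaymentGateway assumed in the TestApplication sketch earlier. IPaymentGateway, PaymentRequest, and PaymentResult are invented names standing in for whatever external dependency your app actually has.

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical configurable test double for an external dependency.
public class FakePaymentGateway : IPaymentGateway
{
    // Tests flip these switches to drive error paths through real code.
    public bool DeclineNextCharge { get; set; }
    public bool SimulateTimeout { get; set; }

    // Records every charge so tests can assert on side effects.
    public List<PaymentRequest> Charges { get; } = new();

    public async Task<PaymentResult> ChargeAsync(PaymentRequest request, CancellationToken ct = default)
    {
        Charges.Add(request);

        if (SimulateTimeout)
            await Task.Delay(TimeSpan.FromSeconds(30), ct); // let the caller's timeout/retry logic kick in

        return DeclineNextCharge
            ? PaymentResult.Declined("card_declined")
            : PaymentResult.Approved(Guid.NewGuid().ToString());
    }
}

A test resolves the fake from _app.Services, sets DeclineNextCharge = true, posts a real order request, and then asserts the compensating behavior (inventory released, no order persisted), just like the invalid-payment test above.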
Safely Start Today!
You're sold, but you're struggling to see how you can apply this to your existing legacy code. Here's what has worked for me before:
You don't need to rewrite your entire test suite overnight. Align with your team, and strategically do the following:
- New Projects/Services: Start writing Application Level tests from the beginning
- New Features (Existing Service): When adding features to existing codebases, write Application Level tests instead of traditional unit tests
- Bug Fixes: When fixing bugs, write an Application Level test that reproduces the issue, then fix it
- Refactoring: When refactoring code that has brittle unit tests, consider replacing them with Application Level tests
Over time, you'll find that your test suite becomes more valuable and less burdensome, especially once you have all the scaffolding in place.
Conclusion
The goal of testing isn't to follow a particular methodology or hit coverage targets. It's to give you confidence that your software works correctly and continues to work as you make changes.
Traditional unit testing often fails at this goal because it focuses on implementation details rather than behavior. By shifting to Application Level testing, you can write tests that are more valuable, more maintainable, and more resilient to change.
Disclaimer
Before you start implementing Application Level tests, a word of caution about the examples in this article. The OrderTestFixture and data setup patterns I showed are simplified for illustration purposes and come with their own set of pitfalls.
Test code should be treated like production code. Setting up test data is an art with multiple considerations around maintainability, isolation, performance, and readability. The same goes for test helpers and fixtures.
There are better patterns that avoid the issues I glossed over here. You can see what I'm babbling about in a future article (retroactive link to be added 😉).