Posted: January 20, 2023

A Tale Of Two Scope Creeps

Table of Contents:

The First Scope Creep
The Second Scope Creep
The Moral Of the Story

I was reflecting last night on a week filled with deploys and thought there was an interesting takeaway about two different instances of scope creep, and the two very different outcomes we as a team ended up with in two of our applications. It’s a story where you as the reader get to decide at the end of the day what the takeaway is. I’ll be interested to hear if you end up with the same takeaway that I did.

The First Scope Creep

Last sprint, we were focusing on making some permissions-based changes to an application we maintain. While doing so, we noticed we were running pretty close to some system limits in another part of the application. Even though the story that we were supposed to be working on should have had us purely updating permission sets, we had a bit of extra time and it seemed like a good opportunity to approach this potential limits-based issue.

Cut to a few days later — we were still working on the same thing. We’d run into a whole host of problems with something that felt like it should have been relatively simple. We have a scheduled job which processes quite a bit of data. In order to cut down on the size of the job, we were planning to update the finish method in a Batchable class that looked something like this:

public class OurBatchable implements Database.Batchable<SObject>, Database.Stateful {
  private final Set<Id> ourStatefulSet = new Set<Id>();

  public void finish(Database.BatchableContext bc) {
    Database.executeBatch(new OurSecondBatchable(this.ourStatefulSet));
  }
}

Given that the stateful collections we were using were what drove the limit-approaching behavior, the “relatively simple” solution would be to split them:

public class OurBatchable implements Database.Batchable<SObject>, Database.Stateful {
  private final Set<Id> ourStatefulSet = new Set<Id>();
  // let's say half the max batch size amount
  private static final Integer SPLIT_AMOUNT = 2000 / 2;

  public void finish(Database.BatchableContext bc) {
    List<Set<Id>> splitSetIds = new List<Set<Id>>{
      new Set<Id>()
    };
    for (Id recordId : this.ourStatefulSet) {
      if (splitSetIds[splitSetIds.size() - 1].size() == SPLIT_AMOUNT) {
        splitSetIds.add(new Set<Id>());
      }
      splitSetIds[splitSetIds.size() - 1].add(recordId);
    }
    for (Set<Id> splitSetId : splitSetIds) {
      Database.executeBatch(new OurSecondBatchable(splitSetId));
    }
  }
}

This is a seemingly innocuous change, but it had broad-reaching implications across our entire application. We took it as the chance to do some much needed refactoring on OurBatchable, and though it took longer than we’d anticipated, in the end it seemed like a pretty good change. We were trying to walk the “leave the codebase cleaner than when you found it” philosophy. I’d recently read Stop lying to yourself – you will never “fix it later”, so even though our “simple permissions change” ended up taking up more time than we expected, it seemed like we were doing the right thing.
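As an aside on the splitting logic itself: one way to make it testable in isolation is to pull it out of finish into a small static helper. The following is a minimal sketch, not the code we shipped; the IdChunker class and its chunk method are hypothetical names invented for illustration. Unlike the loop shown earlier, it returns an empty list for an empty input, so no batch would be started with an empty set.

```apex
// Hypothetical helper (not from the original codebase): splits a set of Ids
// into consecutive sets that each contain at most chunkSize Ids.
public class IdChunker {
  public static List<Set<Id>> chunk(Set<Id> recordIds, Integer chunkSize) {
    if (chunkSize == null || chunkSize < 1) {
      throw new IllegalArgumentException('chunkSize must be at least 1');
    }
    List<Set<Id>> chunks = new List<Set<Id>>();
    Set<Id> current = new Set<Id>();
    for (Id recordId : recordIds) {
      // Start a new chunk once the current one is full
      if (current.size() == chunkSize) {
        chunks.add(current);
        current = new Set<Id>();
      }
      current.add(recordId);
    }
    // Keep the final, possibly partial, chunk; skip it when the input was empty
    if (!current.isEmpty()) {
      chunks.add(current);
    }
    return chunks;
  }
}
```

With something like this in place, finish would shrink to a loop over IdChunker.chunk(this.ourStatefulSet, SPLIT_AMOUNT) that calls Database.executeBatch once per chunk.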

This Monday, we went to deploy our changes. Immediately, we ran into issues in production. The root cause was unclear; we’d been running the modified version of our daily job for going on two weeks across various lower environments without issue. What was clear was that we needed to roll back.

Because we believe in linear git history and we employ package-based development, our rollback strategy looks like this:

1. Create a revert commit that undoes the last commit (which is the sum total diff of all the changes we’d made previously): git revert HEAD~1 works nicely here
2. Create a new package version and push that change along with the revert, which unblocks current users
3. Cherry pick only the permission set changes: git checkout HEAD~2 -- path/to/permissionsets
4. Create a new package version
5. Commit those changes and push

Because our CI runs deploys on commits to our main branch, we were able to unblock current users while packaging up steps 3-5.

The Second Scope Creep

We had our sprint retro and talked about good and bad scope creep, before moving on to our next item: creating a scheduled job in another package of ours to monitor for service interruptions with one of the APIs we integrate with. There’d been an issue a month or two ago where a breaking change was made to the API, and because our application makes frequent use of it, we were “first responders” as far as letting the other company know. Now we wanted to automate the process of monitoring the status of the API as a “keep alive” check; we didn’t want log messages created by errors from our users to be our first indication that something was wrong.
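For readers who haven't built this kind of check before, here is a minimal sketch of what a keep-alive job can look like. Everything named here is an assumption made for illustration: the ApiKeepAliveJob class, the Hypothetical_Api Named Credential, the /graphql path, and the tiny query body are not from our codebase. The one platform detail worth calling out is that scheduled Apex can't make synchronous callouts, so the Schedulable hands the work off to a Queueable that is allowed to.

```apex
// Hypothetical sketch: class names, the Named Credential and the query body
// are illustrative assumptions, not the real integration.
public class ApiKeepAliveJob implements Schedulable {
  public void execute(SchedulableContext sc) {
    // Synchronous callouts aren't supported from scheduled Apex,
    // so the callout itself happens in a Queueable
    System.enqueueJob(new KeepAliveCheck());
  }

  public class KeepAliveCheck implements Queueable, Database.AllowsCallouts {
    public void execute(QueueableContext qc) {
      HttpRequest req = new HttpRequest();
      req.setEndpoint('callout:Hypothetical_Api/graphql');
      req.setMethod('POST');
      // Smallest possible query; the headers and body shape a real API
      // expects will differ
      req.setBody('query keepAlive { __typename }');
      HttpResponse res = new Http().send(req);
      if (!ApiKeepAliveJob.isHealthy(res.getStatusCode())) {
        // Placeholder for whatever alerting mechanism is in use
        System.debug(LoggingLevel.ERROR, 'API keep-alive check failed: ' + res.getStatus());
      }
    }
  }

  // Treats any 2xx status code as healthy
  public static Boolean isHealthy(Integer statusCode) {
    return statusCode >= 200 && statusCode < 300;
  }
}
```

Scheduling it hourly would then be a one-liner in anonymous Apex: System.schedule('API keep-alive', '0 0 * * * ?', new ApiKeepAliveJob());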

This API uses GraphQL. For those unfamiliar with the GraphQL query syntax, this is the sort of thing that gets stuffed into the setBody() method of an HttpRequest:

query myQuery {
  apiName(filter1: "someString", filter2: 10010, filter3: "1700-01-01") {
    edges {
      node {
        someFieldYouCareAbout
        someObjectYouCareAbout {
          aFieldOnThatObject
        }
      }
    }
  }
}

Because GraphQL isn’t valid JSON, building up a request body with all of the fields isn’t exactly easy. Perhaps at some point GraphQL support will be added to the Apex standard library, so that we’ll have a GraphQL class in addition to JSON and XMLStreamWriter. In the meantime — at a certain point, you’re going to be working with strings. We’d previously vetted a few Apex GraphQL libraries that were open source, but none really fit our use-cases and many of them were complete overkill for what we were trying to achieve. We’d originally implemented the usage of this API in May of 2021 by making a “minified” GraphQL string:

HttpRequest req = new HttpRequest();
// filling out of other req properties
req.setBody(
  'query myQuery { apiName(filters...)' +
  ' { edges { node { someFieldYouCareAbout someObjectYouCareAbout { aFieldOnThatObject } } } }'
);

I’ll be the first to say that it wasn’t pretty — but it also gets the job done. Sometimes practicality is the best tool we can wield.
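If hand-minifying the string feels too error-prone, a middle ground is to keep the query readable in source and collapse the whitespace at runtime. This is a rough sketch under stated assumptions rather than what we did: GraphQlStrings and minify are hypothetical names, and the regex would also collapse whitespace inside quoted string arguments, so it only suits queries where that doesn't matter.

```apex
// Hypothetical helper: not part of the original code.
public class GraphQlStrings {
  // Collapses every run of whitespace (newlines included) into a single
  // space and trims the ends. It doesn't understand GraphQL syntax, so
  // whitespace inside quoted string arguments gets collapsed as well.
  public static String minify(String query) {
    return query.replaceAll('\\s+', ' ').trim();
  }
}
```

The query can then be assembled from readable lines, for example with String.join(lines, '\n'), and passed through GraphQlStrings.minify before being handed to setBody().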

With that being said — we’d also done some work as of late to start interacting with this API elsewhere (in a third, heretofore unmentioned application!), and as such had considerably streamlined the approach it took for a downstream consumer of the GraphQL API to get data back from it. That code? Wildly awesome — an object-oriented paradise of sorts. But I’m not going to show it, because it’s a total tangent in the allegory we’re in! It made a lot of sense that we would write our scheduled job to make use of the updated syntax, but that left us in an interesting position:

we now had a scheduled job making use of a much cleaner API
we still had our old code making callouts in a separate, less clean, way elsewhere in this same app

On the one hand, after having been burned by scope creep not even hours before, it made sense to leave the existing application code as is. On the other hand, leaving it alone felt bad. We hadn’t touched this application in months; who knew when we’d get back to it next? The scheduled job basically had all of the code we needed to cut over to our newer API service. In a minute, we had copied it over to our existing service and completely eliminated all of the GraphQL-based code from this repository.

Since this was the place we’d introduced the usage of the GraphQL API originally, we had a ton of tests already written that dealt with the usage of the API; this was a strict refactor, and we could prove that the update should be completely harmless by just running the tests. Nothing needed to change in the tests, at all — and they all passed. In one fell swoop, we’d eliminated nearly 100 lines of code.
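The reason tests like these can survive a refactor untouched is that they assert at the HTTP boundary rather than on how the request body gets built. Here is a minimal sketch of that style using the platform's HttpCalloutMock interface. The class names, the Hypothetical_Api Named Credential, and the canned response are assumptions for illustration; a real test would call the service class under test instead of Http directly.

```apex
// Hypothetical test sketch: names, endpoint and response body are
// illustrative assumptions.
@IsTest
public class GraphQlCalloutTests {
  // Stands in for the remote API and always returns a canned, empty result
  public class FakeApi implements HttpCalloutMock {
    public HttpResponse respond(HttpRequest req) {
      HttpResponse res = new HttpResponse();
      res.setStatusCode(200);
      res.setBody('{"data":{"apiName":{"edges":[]}}}');
      return res;
    }
  }

  @IsTest
  static void parsesResponseRegardlessOfHowTheBodyWasBuilt() {
    Test.setMock(HttpCalloutMock.class, new FakeApi());

    HttpRequest req = new HttpRequest();
    req.setEndpoint('callout:Hypothetical_Api/graphql');
    req.setMethod('POST');
    req.setBody('query myQuery { apiName { edges { node { someFieldYouCareAbout } } } }');

    // In a real test this would go through the service being refactored
    HttpResponse res = new Http().send(req);
    Map<String, Object> parsed = (Map<String, Object>) JSON.deserializeUntyped(res.getBody());

    System.assertEquals(200, res.getStatusCode());
    System.assert(parsed.containsKey('data'));
  }
}
```

Because nothing in the assertions depends on string concatenation versus a builder, swapping one for the other leaves the test as is.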

The Moral Of the Story

We did four deploys across two “stories” worth of work this week:

3 to the first application (1 poisoned, 1 revert, 1 update with just the relevant changes)
1 to the second application (creating the scheduled job and migrating to our newer GraphQL architecture)

Both stories had scope creep in them. One took up a significant amount of time, and even after a significant amount of testing it still required a “fail-forward” rollback scenario. The other worked immediately and eliminated a lot of code. If I were to take a stab at my own moral in this, it’s my favorite quote from Ralph Waldo Emerson:

A foolish consistency is the hobgoblin of little minds

I had to laugh when realizing the links above are to other blogs where I’ve referenced that quote, previously. In other words — sometimes scope creep is bad. Sometimes it’s good. Sometimes we don’t even know whether it’s good or bad until after the fact. Live and learn from it as you’re continually refining your approach to software development (see also: blameless post-mortems from The Life & Death Of Software).

Lastly, I was surprised to find out I'd won a Codey Award for the most listened-to Salesforce Developer podcast of 2022. Thanks for being jazzed about open source with me!

As always, thanks to Henry Vu for his support on Patreon.

In the past 6 years, hundreds of thousands of you have come to read & enjoy the Joys Of Apex. Over that time period, I've remained staunchly opposed to advertising on the site, but I've made a Patreon account in the event that you'd like to show your support there. Know that the content here will always remain free. Thanks again for reading — see you next time!