Feature-flagging via LaunchDarkly, and why we moved to Statsig

16 min read · Nov 6, 2022

Why Feature Flag

When I first joined Motion in late February of 2022, there were three critical problems facing the engineering team: constant firefighting, incredibly long (and unstable) release processes, and no ability to roll back. I immediately recognized all three types of issues as stemming from a lack of feature flag infrastructure. Feature flagging is a backbone for any large engineering organization.

a. Gating a feature behind a flag, rather than relying solely on whether or not it has merged into main, lets teams firefight on their own terms. If a new code path is rolled out and starts causing bugs, teams can just turn off the flag instead of reverting the entire release.

b. Furthermore, engineers no longer need to have incredibly long-lived and large feature branches. Instead, Motion encourages engineers to frequently commit small patches to main that are gated behind feature flags.

c. Lastly, by moving away from Git Flow, which is fundamentally merge based, to a release-branch model, we let CI run a completely automated release process with no humans involved. This speeds up releases and makes them much more stable.
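To make points (a) and (b) concrete, here is a minimal sketch of a flag gate. The `FlagStore` type, the `isEnabled` helper, and the `new-scheduler` flag key are all hypothetical names for illustration, not Motion's actual code or any vendor's API.

```typescript
// Hypothetical in-memory flag store; a real one would be backed by a vendor SDK.
type FlagStore = Map<string, boolean>

// Returns the flag's value, or `fallback` when the flag is unknown, so a
// missing or archived flag never enables a half-finished code path.
function isEnabled(flags: FlagStore, key: string, fallback = false): boolean {
  return flags.get(key) ?? fallback
}

// Both code paths live on main; the flag decides which one runs.
function scheduleTasks(flags: FlagStore, tasks: string[]): string[] {
  if (isEnabled(flags, 'new-scheduler')) {
    return [...tasks].sort() // new path, still being rolled out
  }
  return tasks // old path, known to be stable
}

const flags: FlagStore = new Map([['new-scheduler', true]])
const rolledOut = scheduleTasks(flags, ['b', 'a'])

// "Reverting the flag" is a config change, not a code revert or a redeploy.
flags.set('new-scheduler', false)
const rolledBack = scheduleTasks(flags, ['b', 'a'])
```

The small patches in (b) can merge with the flag off, and the rollback in (a) is the single `flags.set` call.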

So like any new Head of Engineering whose team was drowning, I decided we needed to buy a solution, and not build one ourselves — especially since feature flags were not our core product. While I was completely correct in my initial diagnosis, I was horribly incorrect in choosing LaunchDarkly.

LaunchDarkly had been highly recommended by several peers I had worked with at previous startups, so I honestly didn’t put too much thought into vendor evaluation — everything was horribly broken and we needed to move fast, so let’s just do it. Or so I naively believed.

Timeline

I want to speak to the duration and scope of our usage to be fully transparent. We aren’t a massive company (just 8 engineers), and perhaps there are things LaunchDarkly does for ultra-large companies that other platforms do not. But I can confidently say it’s not the right solution for small to mid-sized startups like ours.

I was fully onboarded onto the Motion codebase by March 14th, and by April 1st LaunchDarkly was mandatory for every new feature in our backend, web app, and chrome extension.

By July 1st it was obvious this was a huge mistake. By August 1st we began integrating Statsig into our codebase, and by November 1st LaunchDarkly was completely phased out.

At our peak we had 52 concurrent feature flags running in prod and about as many in dev.

Reason 1: Pricing

There’s a reason this is #1. As a small startup, we are very cost conscious. Startups can only suffer “dirty” ARR growth for so long before we’re forced to clean up our act and become profitable — especially in this economy. The trend has moved towards usage based pricing models, but LaunchDarkly stubbornly insists on seat-based pricing. So every engineer we added had an additional cost.

Like any scrappy startup, we tried to stay away from this and just give access to a select-few, but it just added extra burdens on those few engineers to keep track of everyone’s launches and feature flags — from reminding them to turn them on after a release to cleaning them up after a launch. Even adding new ones to the test environment became a laborious activity that folks had to pair on. Ultimately the developer productivity cost wasn’t worth it.

But the worst part about LaunchDarkly is their bizarre MAU estimation. Motion is a paid, logged-in only product. It is not possible to have mysterious unaccounted-for MAU. And yet, as soon as the second week of our usage, we got these rather scary warnings:

On March 29th I grew concerned enough to contact their support team:

Exact numbers have been redacted to protect company specific information.

They responded with the following anonymous-user explanation:

We went back and forth a few times, but we never got to the bottom of this. Again, it’s not possible for us to have anonymous users: our product is logged-in only and paid-only, so our MAU must match Stripe.

While it’s great that there were no overages from this, as the email indicated it did mean the dashboard was somewhat hampered. There was a very limited amount of functionality — we could never search customers for example by email or by name, and other odd UI quirks started popping up as soon as this red banner came up. My guess is that this is LaunchDarkly’s way of incentivizing you to pay up for a higher tier.

Reason 2: Custom Attributes are Completely Broken

Ok, sometimes products we love are insanely expensive. Like Apple. And yet, complain as I might, I still pony up for a new iPhone. So maybe you’re thinking LaunchDarkly is in that premium tier of product that you just pay up for but in exchange you get the best there is. Right?

Wrong.

Custom attributes are completely fucking broken. When I discovered this I was so shocked I had to re-reply to support to make sure I wasn’t misunderstanding their response.

So in the most basic example, when we detect that someone has a usemotion.com email, we wanted to set a very trivial isInternal custom attribute on their profile to true. Obviously, we were going to use this for dogfooding. Should be easy right? When someone logs in, the client checks their email with Firebase auth, and if it has the domain, sets the attribute. Then the backend fetches their LaunchDarkly profile based on the user ID and you should have access to that custom attribute! Boom easy peasy.

Nope!

You have to supply the computed custom attribute values every time you call LaunchDarkly!!! I’m going to say this again because I could not fucking believe it. You have to supply the computed values every. damn. time.
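In practice, that means every call site has to rebuild the full user object, computed attributes included. A minimal sketch of what that looks like, assuming a hypothetical `FlagUser` shape and `buildFlagUser` helper (the two email domains are the ones we check in our real initialization code further down):

```typescript
// Hypothetical user shape passed along with every flag evaluation.
interface FlagUser {
  key: string
  email: string | undefined
  custom: { isInternal: boolean }
}

const INTERNAL_DOMAINS = ['usemotion.com', 'inmotion.app']

// Compares the whole domain after the last "@", so look-alike domains
// such as "notusemotion.com" do not count as internal.
function isInternalEmail(email: string | undefined): boolean {
  if (!email) return false
  const domain = email.split('@').pop()?.toLowerCase() ?? ''
  return INTERNAL_DOMAINS.includes(domain)
}

// Must run before every evaluation, because nothing computed here is
// remembered on the vendor side between calls.
function buildFlagUser(uid: string, email: string | undefined): FlagUser {
  return { key: uid, email, custom: { isInternal: isInternalEmail(email) } }
}
```

For a cheap check like this one, recomputing is merely annoying. The trouble starts when the attribute is expensive to derive.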

When I first encountered this I was convinced this was a bug in their SDK, so I did the “right thing” and immediately reported it. This was their response:

I was so aghast I responded with an even dumber question just to double check that I wasn’t strawmanning their suggestion:

And this was their response:

There were many more harder to calculate custom attributes we’d like to have set (power users with > X number of calendar events, power task users with > Y number of Motion tasks over a 2 week period, etc) which would have been quite difficult to compute on the fly each time we called LaunchDarkly.
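One workaround we could have reached for is caching the expensive attributes on our side and only recomputing them every so often. This is a sketch of that idea under assumed names (`AttributeCache`, `compute`, `ttlMs`), not something we actually shipped:

```typescript
type Attributes = Record<string, boolean | number | string>

// Caches per-user attributes for `ttlMs` milliseconds so that an expensive
// `compute` function does not run on every single flag evaluation.
class AttributeCache {
  private entries = new Map<string, { value: Attributes; expiresAt: number }>()
  private compute: (userId: string) => Attributes
  private ttlMs: number
  private now: () => number

  constructor(
    compute: (userId: string) => Attributes,
    ttlMs: number,
    now: () => number = Date.now // injectable clock, mainly for tests
  ) {
    this.compute = compute
    this.ttlMs = ttlMs
    this.now = now
  }

  get(userId: string): Attributes {
    const hit = this.entries.get(userId)
    if (hit && hit.expiresAt > this.now()) return hit.value
    // Cache miss or expired entry: recompute and store with a fresh deadline.
    const value = this.compute(userId)
    this.entries.set(userId, { value, expiresAt: this.now() + this.ttlMs })
    return value
  }
}
```

The tradeoff is staleness: a user who crosses a power-user threshold only sees the matching flags after the cached entry expires. It is also exactly the kind of bookkeeping I expected the vendor to do for us.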

Some of you may be wondering, like I was, what value LaunchDarkly is providing in that case besides being a $1000/mo hash map. It’s a hell of a business model, that’s for sure.

Reason 3: LaunchDarkly on a Chrome Extension ☠️

Integrating LaunchDarkly in a chrome extension is not super trivial, and their documentation does a pretty awful job. If you naively put LaunchDarkly’s JS SDK into the content script, then you’ll end up with a copy of LaunchDarkly on every tab your extension runs on. In our case, that was every single tab they opened, and we had quite a few flags loaded and getting updates. We were already running into performance and memory issues, so this wasn’t an option.

Instead, we wanted to run LaunchDarkly on the service worker background thread that would be shared between all the content scripts across all the tabs. This way when a feature flag updated, all the tabs running the extension would get it at the same time.
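On the receiving side, each tab only needs to react to storage changes instead of running its own SDK. A minimal sketch, assuming the background script writes the flag map to `chrome.storage.local` under the `flags` key; `flagsFromChange` and `rerender` are hypothetical names:

```typescript
// Shape of one entry in the `changes` argument of chrome.storage.onChanged.
interface StorageChange { oldValue?: unknown; newValue?: unknown }
type Flags = Record<string, boolean>

// Picks the new flag map out of a storage change event, or returns null when
// the event is unrelated (a different key or a different storage area).
function flagsFromChange(
  changes: Record<string, StorageChange>,
  areaName: string
): Flags | null {
  if (areaName !== 'local') return null
  const change = changes['flags']
  if (!change || change.newValue === undefined) return null
  return change.newValue as Flags
}

// In a content script this would be wired up roughly like so:
//   chrome.storage.onChanged.addListener((changes, areaName) => {
//     const flags = flagsFromChange(changes, areaName)
//     if (flags) rerender(flags) // `rerender` is hypothetical
//   })
```

Keeping the filtering in a pure function makes it testable without a browser, and the tabs hold no streaming connection of their own.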

But running in a service worker means you don’t have things like window or the DOM. So instead I just went into the LaunchDarkly JS Client SDK on GitHub and adapted it slightly to work in a background thread context:

import { logInDev } from '../utils'
import api from '../chromeApi/chromeApiBackground'
import newHttpRequest from './httpRequest'
/**
 * Because we're running in either a chrome extension or service worker,
 * we don't have access to the common browser environments (like the
 * window object). So we can't use the standard JS SDK.
 *
 * Instead, we're using the "common" JS SDK which has no browser-specific
 * code. This SDK is used by the react, node, and JS SDKs.
 * https://github.com/launchdarkly/js-sdk-common
 *
 * In order for us to use this, we need to define a "platform" object,
 * similar to how the regular client JS SDK does:
 * https://github.com/launchdarkly/js-client-sdk/blob/56a62f9f39d5141f373eaddb04d527c30ac2ae1c/src/browserPlatform.js
 *
 * This function does that for us.
 *
 * @returns A LaunchDarkly "platform" that the JS common SDK can use
 */
export default function makeMotionPlatform () {
  const ret = {} as any
  ret.synchronousFlush = false
  ret.httpRequest = (method: string, url: string, headers: any, body: any) => {
    ret.synchronousFlush = false
    return newHttpRequest(method, url, headers, body, false)
  }
  ret.httpAllowsPost = () => {
    return 'withCredentials' in new XMLHttpRequest()
  }
  // Image-based mechanism for sending events if POST isn't available
  ret.httpFallbackPing = () => {}
  // const eventUrlTransformer = options && options.eventUrlTransformer
  ret.getCurrentUrl = () => 'https://usemotion.com'
  ret.isDoNotTrack = () => {
    return false
  }
  try {
    ret.localStorage = {
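      get: (key: string) => {
        logInDev(`[launchdarkly] localStorage get ${key}`)
        // Assuming `api.storage.local.get` mirrors chrome.storage.local.get, it resolves
        // to an object keyed by `key`, so unwrap the stored value before returning it.
        return Promise.resolve(api.storage.local.get(key)).then((items: any) => items?.[key])
      },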
      set: (key: string, value: any) => {
        logInDev(`[launchdarkly] localStorage set ${key}: ${value}`)
        return new Promise<void>(resolve => {
          // Computed property name: `{ key: value }` would store every value under the literal key "key"
          api.storage.local.set({ [key]: value })
          resolve()
        })
      },
      clear: (key: string) => {
        logInDev(`[launchdarkly] localStorage remove ${key}`)
        return new Promise<void>(resolve => {
          api.storage.local.remove(key)
          resolve()
        })
      }
    }
  } catch (e) {
    // In some browsers (such as Chrome), even looking at window.localStorage at all will cause a
    // security error if the feature is disabled.
    ret.localStorage = null
  }
  // The browser built-in EventSource implementations do not support setting the method used for
  // the request. When useReport is true, we ensure sending the user in the body of a REPORT request
  // rather than in the URL path. If a polyfill for EventSource that supports setting the request
  // method is provided (currently, launchdarkly-eventsource is the only polyfill that both supports
  // it and gives us a way to *know* that it supports it), we use the polyfill to connect to a flag
  // stream that will provide evaluated flags for the specific user. Otherwise, when useReport is
  // true, we fall back to a generic  'ping' stream that informs the SDK to make a separate REPORT
  // request for the user's flag evaluations whenever the flag definitions have been updated.
  ret.eventSourceAllowsReport = false
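  // If EventSource does not exist, the absence of eventSourceFactory will make us not try to open streams.
  // `typeof` is required here: a bare reference to an undefined global throws a ReferenceError.
  if (typeof EventSource !== 'undefined') {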
    const timeoutMillis = 300000 // this is only used by polyfills - see below
    ret.eventSourceFactory = (url: string, options: any) => {
      // The standard EventSource constructor doesn't take any options, just a URL. However, some
      // EventSource polyfills allow us to specify a timeout interval, and in some cases they will
      // default to a too-short timeout if we don't specify one. So, here, we are setting the
      // timeout properties that are used by several popular polyfills.
      // Also, the skipDefaultHeaders property (if supported) tells the polyfill not to add the
      // Cache-Control header that can cause CORS problems in browsers.
      // See: https://github.com/launchdarkly/js-eventsource
      const defaultOptions = {
        heartbeatTimeout: timeoutMillis,
        silentTimeout: timeoutMillis,
        skipDefaultHeaders: true
      }
      const esOptions = { ...defaultOptions, ...options }
      return new EventSource(url, esOptions)
    }
    ret.eventSourceIsActive = (es: any) =>
      es.readyState === EventSource.OPEN || es.readyState === EventSource.CONNECTING
  }
  ret.userAgent = 'MotionPlatform'
  ret.version = '0.1'
  ret.diagnosticSdkData = {
    name: 'motion-sdk',
    version: '0.1'
  }
  ret.diagnosticPlatformData = {
    name: 'JS'
  }
  ret.diagnosticUseCombinedEvent = true // the browser SDK uses the "diagnostic-combined" event format
  return ret
}

The newHttpRequest you see above is a variation of similar code in the JS SDK that I, once again, zombified and adapted for our purposes:

import { logInDev } from '../utils'
import api from '../chromeApi/chromeApiBackground'
const emptyResult = { promise: Promise.resolve({ status: 200, header: () => null, body: null }) }
/**
 * This has been lifted for the most part directly from the `httpRequest.js` source from LaunchDarkly.
 * We _could_ theoretically have used `fetch` instead of XMLHttpRequests, but I wasn't confident in all
 * the unknown changes that would incur. In an effort to keep as much 1:1 with the official implementation
 * as possible, I decided to more or less lift the file with very minor changes.
 *
 * In particular, all references to `window` have been removed.
 *
 * https://github.com/launchdarkly/js-client-sdk/blob/56a62f9f39d5141f373eaddb04d527c30ac2ae1c/src/httpRequest.js
 */
export default function newHttpRequest (method: string, url: string, headers: any, body: any, pageIsClosing: boolean) {
  logInDev(`[launchdarkly] url: ${url}, method: ${method}`)
  if (pageIsClosing) {
    // When the page is about to close, we have to use synchronous XHR (until we migrate to sendBeacon).
    // But not all browsers support this.
    logInDev('[launchdarkly] Returning empty result for http request')
    return emptyResult
    // Note that we return a fake success response, because we don't want the request to be retried in this case.
  }
  const xhr = new XMLHttpRequest()
  xhr.open(method, url, !pageIsClosing)
  for (const key in headers || {}) {
    if (Object.prototype.hasOwnProperty.call(headers, key)) {
      xhr.setRequestHeader(key, headers[key])
    }
  }
  if (pageIsClosing) {
    xhr.send(body) // We specified synchronous mode when we called xhr.open
    logInDev('[launchdarkly] Returning empty result for http request')
    return emptyResult // Again, we never want a request to be retried in this case, so we must say it succeeded.
  } else {
    let cancelled: boolean = false
    const p = new Promise((resolve, reject) => {
      xhr.addEventListener('load', async () => {
        if (cancelled) {
          logInDev('[launchdarkly] Request cancelled')
          return
        }
        logInDev(`[launchdarkly] status: ${xhr.status} body: ${xhr.responseText}`)
        // LaunchDarkly only uses GETs for the flag fetching
        // POSTs are used for diagnostics and other analytics
        if (method === 'GET') {
          try {
            const flags = JSON.parse(xhr.responseText)
            await api.storage.local.set({ flags })
          } catch {}
        }
        resolve({
          status: xhr.status,
          header: (key: string) => xhr.getResponseHeader(key),
          body: xhr.responseText
        })
      })
      xhr.addEventListener('error', () => {
        if (cancelled) {
          logInDev('[launchdarkly] Request cancelled')
          return
        }
        logInDev('[launchdarkly] Network request errored!')
        reject(new Error())
      })
      xhr.send(body)
    })
    const cancel = () => {
      cancelled = true
      logInDev('[launchdarkly] Request cancelled')
      xhr.abort()
    }
    return { promise: p, cancel: cancel }
  }
}

With these two defined, we could then initialize LaunchDarkly in the background thread somewhat similarly to how the JS SDK itself does:

/**
 * A singleton class that runs in the background script for a chrome extension
 * or a service worker in the case of a web app. It should only be initialized once.
 *
 * We rely on message passing to communicate feature flag changes to the frontend.
 * This is because in the case of the extension, there may be many tabs and
 * having each tab monitor for feature flag changes, even with caching, would be
 * poor for performance.
 *
 * Instead, we write the new feature flags to local storage under the 'flags' key.
 * Then, react components can read them based on the api.storage.onChanged api.
 */
export class LaunchDarkly {
  static initialize (user: firebase.User) {
    let CLIENT_SIDE_ID
    if (isChromeExtension) {
      CLIENT_SIDE_ID = isProd() ? PROD_CLIENT_ID : TEST_CLIENT_ID
    } else {
      CLIENT_SIDE_ID = webappEnv === 'prod' ? PROD_CLIENT_ID : TEST_CLIENT_ID
    }
    const ldUser = {
      key: user.uid,
      email: user.email || undefined,
      name: user.displayName || undefined,
      firstName: user.displayName?.split(' ')[0] || undefined,
      lastName: user.displayName?.split(' ')[user.displayName.split(' ').length - 1] || undefined,
      avatar: user.photoURL || undefined,
      custom: {
        isInternal: user.email?.endsWith('@usemotion.com') || user.email?.endsWith('@inmotion.app') || false
      }
    }
    const options = {}
    const extraOptionDefs = {
      fetchGoals: { default: true },
      hash: { type: 'string' },
      eventProcessor: { type: 'object' },
      eventUrlTransformer: { type: 'function' },
      disableSyncEventPost: { default: false }
    }
    const clientVars = (ld as any).initialize(CLIENT_SIDE_ID, ldUser, options, makeMotionPlatform(), extraOptionDefs)
    const client = clientVars.client
    const emitter = clientVars.emitter
    const goalsPromise = new Promise<void>(resolve => {
      const onGoals = emitter.on('goalsReady', () => {
        emitter.off('goalsReady', onGoals)
        resolve()
      })
    })
    client.waitUntilGoalsReady = () => goalsPromise
    clientVars.start()
  }
}

Full disclaimer: this took about two days to figure out, and while it mostly works I’m 100% sure there are some underlying bugs. I’m positive I’m using code in a way it wasn’t meant to be used, so please don’t blindly copy-paste this code anywhere.

Reason 4: Detecting Usage of a flag is broken

After you’re sufficiently confident in a feature, it’s natural to want to set the feature flag to 100% true (or 100% false), and eventually, remove that feature flag and the dead code branches entirely. This dramatically simplifies your code by removing cruft and also reduces chaos in your feature flagging system. Oncall engineers have fewer levers they need to focus on, and in general there’s fewer nodes in the system to monitor.

So a good feature flagging system should make it as easy to remove a feature flag as it does to add a new feature flag. And LaunchDarkly fails spectacularly yet again in this regard.

The main reason is that it’s impossible to tell if a feature flag is actually being used or not. This may be unique to us and yet another time when the client-side MAU quota is biting us (see Reason #1), and if so, please forgive me for beating this horse to death. But there’s a chance this bug is caused by something else entirely, and if that’s the case you need to be aware.

After a major refactor we decided to set the feature flag to 100% true. When you archive a flag, LaunchDarkly produces a graph in the ‘Insights’ section showing what % of users are getting that flag, and what value they receive.

This was our insights graph after we set the flag to 100% true:

And what was the graph when we completely removed it from our code? Yup, same image. So when we go to archive the flag, LaunchDarkly helpfully tells us — don’t archive it! In the past week there’s been 17 thousand invocations of this flag.

We scratch our heads and wonder how that’s possible when there’s no usages in the code. Is it possible some old chrome extensions just never updated? Ok, maybe. Let’s give it another couple weeks. No change in the graph. One month goes by, no change in the graph.

I rage-archive it — let the old chrome extension crash and let’s see if anyone complains. Nobody complains. Yup, the graph is just completely lying and broken. Wonderful.
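When the dashboard can’t be trusted, one cross-check is to look at the code itself. A minimal sketch, assuming flag keys appear as quoted string literals in source; `unreferencedFlags` is a hypothetical helper, and keys built dynamically at runtime would be missed:

```typescript
// Returns the flag keys that do not appear as a quoted string literal
// in any of the given source file contents.
function unreferencedFlags(flagKeys: string[], sources: string[]): string[] {
  return flagKeys.filter(key =>
    !sources.some(src => src.includes(`'${key}'`) || src.includes(`"${key}"`))
  )
}

// Example inputs: file contents would normally be read from the repo.
const exampleSources = [
  "if (isEnabled('new-scheduler')) { runNewScheduler() }",
  'const label = "unrelated string"'
]
const safeToArchive = unreferencedFlags(['new-scheduler', 'old-onboarding'], exampleSources)
```

This only says the current code no longer references a flag. It says nothing about old clients still in the wild, which is the part we ended up testing by archiving the flag and waiting for complaints.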

Reason 5: Death by a 1000 Papercuts

Ok, the above four reasons were really bad. But there were many more nuisances that also irked us.

No support for Expo in React Native — so our mobile apps just couldn’t use feature flags. (Statsig supports Expo!)
Updates were sometimes very random and fell back to the wrong value in about 1% of cases. We’d frequently get reports from a small minority of users who were just flat out getting the wrong values. These all went away completely after switching to Statsig. Nothing else fundamental in our code changed, so I’m convinced LaunchDarkly does have some underlying bugs, or maybe the ‘account overage’ warnings were doing something more nefarious than support was letting on. I’m not sure. But this was a constant and nagging source of issues for us for months.
Archiving a flag in the test environment also archives it in prod. This is a mistake we didn’t make more than once, and to be fair the UI does let you know, but it should be way more obvious. Archiving a flag should be in bright blinking red letters, especially since you (you = LaunchDarkly) knew it was still being used in prod. This caused our first incident. Statsig handles this the right way.
Mandating approvals on feature flags is an extra paid feature. So annoying, especially as bad feature flag config changes are one of the biggest causes of incidents as an eng team grows. Once again, Statsig provides this for free.

Conclusion

Lesson 1: Choosing vendors is not a joke

The biggest lesson I learned is that choosing vendors is not quite a type 1 decision, but it’s a lot closer to one than to a type 2 decision. This was pretty early in my tenure as head of engineering, and I emotionally was still acting like an IC at heart who was overconfident in his ability to code his way out of any problem.

In reality, spending my time fixing vague feature flag issues that constantly plague 1% of our users is a huge waste of my and the business’s time. There are much higher-leverage activities I could have been doing, and the time we spent migrating away from LaunchDarkly was a huge sunk cost for which I was the only one to blame.

Technical vendor evaluation is now something I take much more seriously (to be seen in a couple months after we use our new end-to-end test QA automation software!).

We’ve been using Statsig for a couple months now without any major concerns. They have a fantastic startup program that you can apply for which offsets the early stage costs, and they have great documentation. Their UI and design is clean, and their SDKs are performant.

One minor nit with Statsig is that their JS client SDKs don’t get realtime updates by default. We implemented our own polling logic, but this was trivial to do and not a huge issue.
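For a sense of how small that polling logic can be, here is a sketch of the general shape. It is not our production code and calls no Statsig API: `fetchFlags` is a hypothetical function standing in for whatever refreshes the flag values, and `pollFlags` and `createChangeDetector` are names made up for this example.

```typescript
type Flags = Record<string, boolean>

// Returns a function that reports whether a flag map differs from the
// previous one it was given. Keys are sorted before comparing so that
// property order alone never counts as a change.
function createChangeDetector(): (flags: Flags) => boolean {
  let last: string | null = null
  return (flags: Flags): boolean => {
    const serialized = JSON.stringify(
      Object.keys(flags).sort().map(key => [key, flags[key]])
    )
    const changed = serialized !== last
    last = serialized
    return changed
  }
}

// Calls `fetchFlags` immediately and then every `intervalMs`, invoking
// `onChange` only when the values differ. Returns a function that stops polling.
function pollFlags(
  fetchFlags: () => Promise<Flags>,
  onChange: (flags: Flags) => void,
  intervalMs: number
): () => void {
  const changed = createChangeDetector()
  let stopped = false
  const tick = async (): Promise<void> => {
    try {
      const flags = await fetchFlags()
      if (!stopped && changed(flags)) onChange(flags)
    } catch {
      // Swallow errors so one failed poll does not stop the loop;
      // the last known flags stay in effect until the next tick.
    }
  }
  void tick()
  const timer = setInterval(() => { void tick() }, intervalMs)
  return () => {
    stopped = true
    clearInterval(timer)
  }
}
```

The interval is a tradeoff between how quickly a flag flip reaches clients and how many requests each client makes.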

Lesson 2: Clean up feature flags!

Lastly, feature flagging doesn’t come for free. When left to their own devices, it’s natural for engineers (including me!) to just forget about their feature flags for already launched features. They just end up lingering in the code, causing unnecessary increases in cyclomatic complexity.

At Motion, we’re now militant about cleaning up your feature flags 2 weeks after a launch. Keeping them lingering for no reason only causes further quota pressure on Statsig and increases code complexity.
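A rule like this is easy to automate. A minimal sketch of the two-week check, assuming each flag records a launch date somewhere; the `LaunchedFlag` shape and `overdueFlags` helper are hypothetical:

```typescript
interface LaunchedFlag {
  key: string
  launchedAt: Date // when the feature went to 100%
}

const CLEANUP_WINDOW_MS = 14 * 24 * 60 * 60 * 1000 // two weeks

// Returns the keys of flags whose launch is more than two weeks in the past.
function overdueFlags(flags: LaunchedFlag[], now: Date): string[] {
  return flags
    .filter(flag => now.getTime() - flag.launchedAt.getTime() > CLEANUP_WINDOW_MS)
    .map(flag => flag.key)
}

const overdue = overdueFlags(
  [
    { key: 'new-scheduler', launchedAt: new Date('2022-10-01T00:00:00Z') },
    { key: 'calendar-sync', launchedAt: new Date('2022-11-01T00:00:00Z') }
  ],
  new Date('2022-11-06T00:00:00Z')
)
```

Run on a schedule, the output can feed a reminder to whoever owns each flag.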

PS — We’re Hiring

If you like this sort of thing, and you think you’d like working with us at Motion, my DMs are open! We’re hiring :)