The Test that Cried Wolf

There was an interesting post on the BDD list today asking a pretty common question:

TLDR: I want to automate receiving an SMS in my test to verify that my SMS send with <vendor> worked. What is the best way to do this?

An answer came back that you can use Twilio and receive the message through their API.
This is in general a terrible idea and you should avoid it.
The argument quickly came back that it’s easy and relatively cheap to automate, so why not?

STOP

People have a mistaken view that something being cheap and simple to automate makes that thing a good idea to automate. The reason it’s so terrible to automate the sending of a text message has nothing to do with the cost of the initial automation (though it’s not as simple as people think; I have done it!). The reason it’s so terrible is that it will become the Test-That-Cried-Wolf.

Let’s start with the service you will use to receive text messages (in this case Twilio). Here is its recent status history:

http://status.twilio.com/services/incoming-sms

1 day, 23 hours ago     This service is operating normally at this time.
2 days ago      We are investigating a higher than normal error rate in TwiML and StatusCallback webhooks
1 week, 6 days ago      This service is operating normally, and was not impacted by the POST request issue.
1 week, 6 days ago      We are investigating an issue with POST requests to /Messages and /SMS/Messages.
2 weeks, 1 day ago      Twilio inbound and outbound messaging experienced an outage from 1.30 to 1.34pm PDT. The service is operating normally at this time.
2 weeks, 1 day ago      Our messaging service is currently impacted. We are investigating and will provide further updates as soon as possible.
2 weeks, 1 day ago      All queued messages have been delivered. All inbound messages are being delivered normally.
2 weeks, 1 day ago      All inbound messages are being delivered normally. Our engineers are still working on delivering queued messages. We expect this to be resolved before 6pm PDT
2 weeks, 1 day ago      A percentage of incoming long code messages, that were received between 3.02pm and 3.45pm are queued for delivery. Our engineers are actively investigating the situation.
2 weeks, 2 days ago     A number of Twilio services experienced degraded network connectivity from 8:47am PT to 8:50am PT.  All services are now operating normally.
2 weeks, 2 days ago     This service is operating normally at this time.
2 weeks, 2 days ago     We are getting reports of elevated errors. Our Engineering Team is aware and are working to resolve.
2 weeks, 5 days ago     This service is operating normally at this time.
2 weeks, 5 days ago     We are investigating a problem where webhooks in response to incoming SMS or MMS messages may be delayed or may be made multiple times.

What happens when the service that you only use for receiving SMS in your test is having a problem? Test Fails.
What happens when the service sending the SMS is having a problem? Test Fails.
There are at minimum two other providers involved here. What happens when one of them is having a problem? Test Fails.
Anyone who has owned a phone knows that SMS messages are not always delivered immediately. How long do you wait? Test Fails.
Anyone who has owned a phone knows that SMS delivery is not guaranteed. Test Fails.
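The "how long do you wait?" problem can be made concrete with a small sketch. Everything here is hypothetical: `wait_for_sms`, the simulated inbox, the 30-second timeout, and the message text are invented for illustration and do not use any real vendor API. Time is simulated so the example runs instantly.

```python
def wait_for_sms(fetch_inbox, expected_body, timeout_s, poll_interval_s, clock, sleep):
    """Poll a (hypothetical) inbox until the message shows up or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if expected_body in fetch_inbox():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


def make_simulated_inbox(delivery_delay_s, body):
    """Build a fake inbox, clock, and sleep where the SMS arrives after a delay."""
    now = [0.0]

    def clock():
        return now[0]

    def sleep(seconds):
        now[0] += seconds  # advance simulated time instead of really sleeping

    def fetch_inbox():
        return [body] if now[0] >= delivery_delay_s else []

    return fetch_inbox, clock, sleep


def run_scenario(delivery_delay_s, timeout_s=30, poll_interval_s=5):
    """Return True if the 'test' would pass for a given (assumed) delivery delay."""
    body = "Your code is 1234"  # made-up message text
    fetch_inbox, clock, sleep = make_simulated_inbox(delivery_delay_s, body)
    return wait_for_sms(fetch_inbox, body, timeout_s, poll_interval_s, clock, sleep)


if __name__ == "__main__":
    print(run_scenario(delivery_delay_s=10))  # message arrives inside the window
    print(run_scenario(delivery_delay_s=45))  # message arrives late: the test fails
```

In the second scenario the code under test did nothing wrong. The message was simply slow, and whatever timeout you pick, some deliveries will land on the wrong side of it.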

Start adding these up and, if you run your tests on a regular basis, you can easily expect 1-2 failures/week. On most teams I deal with, a failed test gets looked at immediately to figure out why it’s failing. In all of these cases it will have nothing to do with anything in your code and is a transient issue (quite likely not impacting production). How many times will you research this problem before you say “well, it does that all the time”?
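To see how quickly small failure rates add up, here is a back-of-the-envelope sketch. The per-run failure rates and the number of runs per week are invented for illustration, not measured, and the calculation assumes the dependencies fail independently of each other.

```python
def chance_of_spurious_failure(per_run_failure_rates):
    """Probability that at least one independent external dependency fails in a run."""
    all_ok = 1.0
    for rate in per_run_failure_rates:
        all_ok *= 1.0 - rate
    return 1.0 - all_ok


def expected_spurious_failures(per_run_failure_rates, runs):
    """Expected number of runs that fail for reasons outside your code."""
    return runs * chance_of_spurious_failure(per_run_failure_rates)


# Hypothetical numbers, made up for this example only.
HYPOTHETICAL_RATES = {
    "receiving service": 0.01,
    "sending vendor": 0.01,
    "provider A": 0.005,
    "provider B": 0.005,
    "late or dropped message": 0.01,
}
RUNS_PER_WEEK = 50  # assumed: roughly ten test runs per working day

if __name__ == "__main__":
    per_week = expected_spurious_failures(HYPOTHETICAL_RATES.values(), RUNS_PER_WEEK)
    print(f"expected spurious failures per week: {per_week:.2f}")
```

With these made-up inputs, each individual dependency looks very reliable, yet the combined result works out to roughly two spurious failures per week.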

The cost of such tests is not in their initial implementation but in their false positives. When >90% of the test failures have nothing to do with your system, the failures will GET IGNORED. What’s the point of having a test when you ignore the failures? These are the tests-that-cry-wolf and should be avoided. There is a place for such tests: on the operations side, where any crying-wolf is a possible production issue and WILL be investigated.

3 Comments

  1. Posted October 23, 2014 at 2:10 am

    Agreed. A corollary to this that I’ve observed is it starts infecting the team’s opinion of _all_ the tests. The team starts thinking “oh, it’s ok, it’ll probably work on the CI Server….git push”.

  2. Posted October 26, 2014 at 6:35 am

    You could throw the test into its own group and run it less frequently than the standard test group, say before a release or nightly. Name the group something that describes it for what it is, such as “ExternalDependencyTestGroup”. That way you at least have a quick way of making sure the end-to-end system is working, but don’t run it as part of your CI and waste everyone’s time when it breaks.
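One possible way to sketch the grouping suggested in this comment, using Python's standard `unittest` module. The environment variable name `RUN_EXTERNAL_DEPENDENCY_TESTS`, the `CoreTestGroup` class, and the test bodies are assumptions made up for this example; only the group name comes from the comment.

```python
import os
import unittest

# Assumed convention: the nightly or pre-release job sets this variable to "1".
RUN_EXTERNAL = os.environ.get("RUN_EXTERNAL_DEPENDENCY_TESTS") == "1"


class CoreTestGroup(unittest.TestCase):
    """Fast tests with no external dependencies; these run on every build."""

    def test_message_text(self):
        self.assertEqual("Your code is {}".format(1234), "Your code is 1234")


@unittest.skipUnless(RUN_EXTERNAL, "external dependency tests run nightly or before a release")
class ExternalDependencyTestGroup(unittest.TestCase):
    """Tests that rely on third-party services; skipped in the normal CI run."""

    def test_sms_round_trip(self):
        # Placeholder: a real end-to-end send/receive check would go here.
        self.skipTest("end-to-end SMS check is not implemented in this sketch")


def run_all():
    """Load both groups and run them, returning the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(CoreTestGroup),
        loader.loadTestsFromTestCase(ExternalDependencyTestGroup),
    ])
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_all()
```

In a regular build the external group is reported as skipped rather than failed, so an outage at a third party does not turn the build red.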

  3. Thomas
    Posted March 10, 2015 at 12:27 am

    Using stuff like NServiceBus, you would more likely implement this as a custom check that is executed frequently in your production endpoint. While a failing integration test like this is not telling you anything about your codebase, a custom check like this will help keep customer complaints from escalating beyond first level support.
