By Johna Till Johnson and Vladimir Brezina
Take a scientist and an engineer, add a kit designed for children, and you’ll end up with a science project.
A few days ago (on the first day of spring, to be exact), we decided to color Easter eggs. We’re not sure whose idea it was (each of us says it originated with the other), but regardless: There we were with 14 hard-boiled eggs and the same PAAS egg-dyeing kit that Johna remembered from childhood. (In Czechoslovakia, too, a country nominally communist but where Easter traditions were hard to uproot, Vlad had something very similar.)
We set to work. The dye tablets fizzed in the vinegar, the appropriate amount of water was added, and the first six eggs were happily soaking in their colors. And then one of us noticed something:
“Hey, what are those lines?” As the dye deepened, several of the eggs were showing white lines, two per egg, circumscribing the eggs and trisecting them neatly. Why was this happening?
“It’s almost as if there’s a waxy or oily deposit, something that’s preventing the dye from absorbing,” Vlad speculated. “But the lines are so even…”
Suddenly Johna had a flash. “I bet it’s wheels. The eggs go on little wheels to be transported for processing, and they probably lubricate the wheels with something!” (Can you guess which one of us is the scientist and which the engineer?) We were envisioning something like this or this.
We looked at each other, the same question in both eyes: “How could we validate this hypothesis?”
What we needed was something that could potentially remove the waxy coating (if that’s indeed what it was). Some kind of universal solvent…
The thought was the deed. And fortunately, we had a handy household chemical that should do the trick: acetone, the active ingredient in nail polish remover.
So we launched an experiment.
And in the course of doing so, as Vlad pointed out, we encountered many of the common pitfalls of “real” science: ad hoc experimental design, materials limitations, inadequate sample size, data completeness issues, experimenter bias, exaggeration of the scope of the conclusions… Indeed, the very goal of “validating”—instead of neutrally testing—our hypothesis was setting us up for confirmation bias: we wanted the hypothesis to be true.
What did we find out?
We cleaned six of the fourteen eggs with acetone, leaving the other eight untouched. Why not seven, exactly half? Because we were still adjusting the experimental design as we went along, a big no-no.
Of course, since we didn’t know what the waxy substance was (if, indeed, there was a waxy substance), we had no way of knowing whether acetone would remove it. But we used acetone anyway because (a) it does a pretty good job of removing many oily and waxy chemicals, and we’d already determined that whatever it was wasn’t soluble in acid (vinegar) or water; and (b) we, um, had acetone lying around.
Which is another common occurrence in science: you use the tools you have, or that you can afford, rather than what might be optimal. Of course, in “real” science you cannot (yet) admit in your published paper that you could not do the experiment right because you didn’t have the money.
The first batch of acetone-treated eggs went into the dye.
We watched closely. Were those lines forming? “No, no, lines go away!” Johna shouted, and aggressively sloshed dye over the treated eggs, trying to get the lines to disappear.
Vlad laughed: “See how it works? You want a certain result, so you’re pushing to make it happen.” As indeed Johna was. Properly, the experiment should have been conducted blind: we should have been unaware which eggs were treated and which untreated until the end of the experiment.
As the first batch of eggs matured, we realized our testing methodology was too primitive: We were only recording “line” or “no line”. In reality, some eggs had no lines at all, some had very clear lines… and some were in-between. What did we do with the in-between ones? Count them as “line” because we could faintly see lines, or “no line” because the lines were clearly fainter?
There was really no good solution to this problem short of going down the rathole of measuring line width, sharpness, etc. So the results we recorded were a simplified, and very ad hoc, representation of the actual situation.
With the third and final batch of eggs we encountered yet another issue. Johna was unsatisfied with the out-of-the-box “purple” color. To her eye, it was more of a fuchsia. So she wanted to give one of the purple eggs a finishing wash in blue.
That egg—an acetone-treated one—had shown some signs of line formation. But by giving it a double dose of dye, Johna was changing the background treatment of that one egg—inconsistent test methodology in action!
But hey, we were making Easter eggs! So into the blue dye it went. And for whatever reason, the lines disappeared. (And the egg emerged a lovely shade of true purple.)
Finally it was time to tally up the results. Of the eight untreated eggs, seven had lines. Of the six treated eggs, three did.
So, did we validate the hypothesis? Well, you could say that untreated eggs had an 87.5% chance of having lines while the treated ones had only a 50% chance—a large decrease, a pretty clear indication that our hypothesis was correct.
You could say that—but it would be incorrect.
Because we don’t actually know what is “normal” for each group of eggs: maybe the 87.5% result was an extreme outlier for the untreated eggs, and the average would be more in line with those for the treated ones. Or the other way round, of course: the 50% result might have been atypical of the treated eggs.
In other words, we don’t know what the error in our measurements was. To estimate the error, we’d have to run several rounds of the experiment, using many more eggs, and obtain a distribution of the results for the untreated and for the treated eggs. Then we could compare the distributions and see if the difference was (statistically and practically) significant.
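(A purely illustrative aside, not part of our kitchen-table analysis: one standard way to ask whether a split like 7 of 8 versus 3 of 6 could arise by chance is Fisher’s exact test, which with samples this small can be computed by hand. A minimal Python sketch, using only the standard library:)

```python
from math import comb

# Contingency table from our tally:
#               lines   no lines
# untreated       7        1      (8 eggs)
# treated         3        3      (6 eggs)
#
# One-sided Fisher's exact test: holding the margins fixed
# (10 "line" eggs and 4 "no line" eggs among 14 total),
# what is the probability of seeing 7 or more "line" eggs
# among the 8 untreated ones purely by chance?

total, untreated, lines = 14, 8, 10

def p_exactly(k):
    # Hypergeometric probability of exactly k "line" eggs
    # among the 8 untreated eggs
    return comb(lines, k) * comb(total - lines, untreated - k) / comb(total, untreated)

p_value = sum(p_exactly(k) for k in range(7, untreated + 1))
print(round(p_value, 3))  # ~0.175, well above the usual 0.05 threshold
```

(With p of roughly 0.17, a gap this size could easily be a fluke of these particular fourteen eggs, which is exactly the point about needing more data.)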
So no, we didn’t validate the hypothesis—although, if this were a “real” science experiment, our results might well have been reported as doing so in the popular press.
Furthermore, was the experiment that we did really able to validate the hypothesis in the first place? If the hypothesis was that the lines were the traces of the mechanics of the egg-sorting process, then no. What we did was much more modest: we merely tested the effect of acetone on the lines. The extrapolation to the larger conclusion was not addressed by our results. This is why “real” scientific papers—in contrast to the accompanying press releases—often sound so unsatisfactory.
But scientists are infinitely optimistic about their theories! And it’s not possible to formally disprove a theory by any number of experiments. So we can still continue to believe that the lines were the result of some sort of mechanical process during the egg collection and sorting. More studies are needed…
What we have certainly learned, though, is that doing science correctly isn’t child’s play, even when you’re playing with children’s toys.
We discovered afterward on the Internet that similar experimentation with Easter egg coloring kits is irresistible for many people, as for example here and here and here… Another lesson: read the literature in the field first!