How does the FBI know it found “Female DNA”?

So, the latest development in the investigation of the Boston Marathon bombing is a report that the FBI has identified “female DNA” on the remains of at least one of the two bombs used by Dzhokhar and Tamerlan Tsarnaev in the attack. According to the report, published first in the Wall Street Journal, some genetic material has been recovered, and the FBI has gone to collect a DNA sample from Katherine Russell, the widow of Tamerlan Tsarnaev, presumably to see if it matches the DNA recovered from the bomb.

Here’s the thing. How does the FBI know, or think it knows, that it has recovered “Female DNA”? Well, there aren’t a lot of details available yet, but there are a couple of possibilities.

First, let’s start with the basic genetics. Humans normally have 46 chromosomes, which come in 23 pairs, as well as some mitochondrial DNA. From the mitochondrial DNA, and 22 of the 23 other chromosome pairs, there is nothing to tell you whether the DNA came from a male or a female. The genetic difference between males and females resides in that last chromosome pair, the sex chromosomes. At the sex chromosomes, women have two X chromosomes, while men have one X chromosome and one Y chromosome.

So, if you have a discrete source of your DNA sample, like a hair, you could do a couple of things. You could test it for the presence of Y-chromosome genetic material. If the DNA source was female, you should not find any. Of course, that requires basing your conclusion on a negative result (the absence of a Y chromosome), which is not ideal, since it is possible that you could miss the material for technical reasons (e.g., failure of a particular chemical reaction).

The real thing you would look for to indicate that you had DNA from a female is the presence of two different X chromosomes. That means you need to identify the DNA sequence on part of the X chromosome. You can do this by actually sequencing a region of the chromosome, but this is probably unnecessarily expensive. After all, the vast majority of sites on the chromosome are going to be identical, not just in the X chromosomes in your sample, but in every X chromosome in every human being in the world.

What you can do instead is use tools that focus on specific sites that are already known to be variable in the population. Maybe there’s a particular site where it is known that some people have a C in their DNA sequence, while other people have a G. (This is referred to as a “polymorphic” site.) You simply ask whether that site in your particular sample has a C or a G, while simultaneously asking the analogous question about thousands of other sites.

If your DNA sample came from a male, you might find that the answer is C, or G, or whatever, at a particular site. What the answer is is not as important as the fact that there will be a single answer. If your DNA sample comes from a woman, you should find that sometimes you have a mixture of C and G. Of course, at a given site, you could still get a single answer, say, G, if both of the woman’s X chromosomes had a G at this position. However, if you look at a whole bunch of sites, you should find that a decent number of them indicate a mixture of two sequences — revealing the presence of two distinct X chromosomes, and therefore, a female.
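To make that logic concrete, here is a minimal sketch in Python. It assumes the lab has already produced, for each polymorphic X-linked site, the set of bases observed in the sample. The function names, the toy data, and the 10% threshold are all hypothetical illustrations, not anything reported about the FBI's actual procedure.

```python
from typing import Iterable, Set


def mixed_site_fraction(calls: Iterable[Set[str]]) -> float:
    """Fraction of X-linked sites where more than one base was observed."""
    sites = list(calls)
    if not sites:
        raise ValueError("need at least one site")
    mixed = sum(1 for bases in sites if len(bases) > 1)
    return mixed / len(sites)


def looks_like_two_x(calls: Iterable[Set[str]], threshold: float = 0.1) -> bool:
    """Crude rule of thumb: enough mixed sites suggests two distinct X chromosomes.

    The threshold is an arbitrary illustrative value, not a forensic standard.
    """
    return mixed_site_fraction(calls) >= threshold


# Toy data: each set holds the bases seen at one polymorphic site.
male_like = [{"C"}, {"G"}, {"A"}, {"T"}, {"C"}]
female_like = [{"C", "G"}, {"G"}, {"A", "T"}, {"T"}, {"C"}]

print(mixed_site_fraction(male_like))    # a single answer at every site
print(mixed_site_fraction(female_like))  # a mixture at some sites
```

In practice you would want thousands of sites, not five, so that a woman who happens to carry the same base on both X chromosomes at a handful of positions is not mistaken for a man.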

But what if you don’t have a discrete genetic sample, like a hair, to work from? There’s not a lot of detail in the original article, so we have to speculate a bit here. (I’ve reached out to the reporters from the original piece, to see if there was some genetics-dork-relevant information that did not make it into the article. I’ll post an update if and when I hear back.) It seems likely that the bombs would also have carried DNA from one or both of the Tsarnaev brothers. Thus it is possible that the DNA collected by the FBI could contain a mixture of cells from multiple different individuals — like, say, they swabbed all around the bomb’s remains to collect their samples. What would they need to do then?

Well, first of all, let’s consider the case where you had a mixture of the two brothers’ DNA. The Y chromosomes from Dzhokhar and Tamerlan would be (virtually) identical, having both been copied from the single Y chromosome of their father. The two would have distinct X chromosomes, each of which would be a patchwork of pieces copied from their mother’s two X chromosomes.

So, the X chromosomes present in this sort of sample would look similar in some ways to the X chromosomes you would get from a female DNA sample: there would be some polymorphic sites where you would find a mixture of DNA sequences in your sample. However, we would not expect to find as many of these mixed sites as in a sample from a female. On average, half of the X chromosome sequence inherited by one brother would be (virtually) identical to the sequence inherited by the other brother. Although, depending on how, exactly, recombination plays out, the identical fraction of their X chromosomes could range anywhere from nearly none to nearly all of it. It is possible, just by chance, that the X chromosomes inherited by Dzhokhar and Tamerlan would be as different from each other as the two X chromosomes present in their mother. Of course, at this point, DNA samples have almost certainly been collected from both brothers, so that investigators would know exactly what sequences to expect.
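The "half on average, but anywhere from nearly none to nearly all" claim can be checked with a toy simulation. The sketch below treats the X chromosome as 1,000 segments and lets each son's inherited copy switch between the mother's two X chromosomes with a small probability at each segment. The segment count, the switch probability (about two crossovers per chromosome), and the function names are illustrative assumptions, not measured values.

```python
import random


def inherit_x(rng: random.Random, n_segments: int = 1000,
              switch_prob: float = 0.002) -> list:
    """For each segment, record which maternal X (0 or 1) a son inherited."""
    current = rng.randrange(2)  # start on a randomly chosen maternal X
    inherited = []
    for _ in range(n_segments):
        if rng.random() < switch_prob:  # a crossover flips the source
            current = 1 - current
        inherited.append(current)
    return inherited


def shared_fraction(rng: random.Random) -> float:
    """Fraction of the X where two brothers inherited the same maternal copy."""
    a = inherit_x(rng)
    b = inherit_x(rng)
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
fractions = [shared_fraction(rng) for _ in range(2000)]
mean_shared = sum(fractions) / len(fractions)

print(f"mean shared fraction: {mean_shared:.2f}")
print(f"range: {min(fractions):.2f} to {max(fractions):.2f}")
```

Under these assumptions the average comes out close to one half, while individual pairs of simulated brothers land all over the range, which is why the number of mixed sites alone cannot cleanly separate "two brothers" from "one woman."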

But what if there was an even messier mixture of DNA, say with samples from both brothers as well as one or more additional people? Well, at some point, the procedure of just looking for mixed sites in the DNA sequence is going to run into trouble. At most of these polymorphic sites, there are just two variants circulating at any frequency in the population. So, simply identifying sites that are polymorphic within your sample will let you distinguish between one X chromosome and more than one, but will not necessarily do a good job of telling you exactly how many different X chromosomes are present.

One approach to deal with this situation would be to look at a different type of polymorphism, one where there are more than two sequence variants present in the population. The polymorphisms most commonly used in this sort of context are short tandem repeats (STRs). These are stretches of DNA where a short sequence, maybe four or so nucleotides long, is repeated over and over again. Due to the nature of the process by which DNA is copied, these sequences are prone to a particular type of mutation, where the number of repeats increases or decreases. So, I might have a stretch of 19 copies of the sequence TCTA at a particular site in my genome, while you might have 23 copies of TCTA at the same location in your genome.

By looking at a whole bunch of these STR sites, the FBI could probably tell if the DNA they collected contained two, three, four, or more distinct X chromosomes. And, these are most likely the sorts of sites they will be using to see if the DNA collected from the bomb matches the DNA collected from Katherine Russell.
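Here is a rough sketch of that counting argument in Python. Each X chromosome carries exactly one allele (one repeat count) at each X-linked STR site, so the number of distinct repeat counts seen at any one site is a lower bound on the number of X chromosomes in the sample. The locus names, repeat counts, and function names below are made up for illustration, and a real forensic comparison would involve a good deal more statistics than this.

```python
from typing import Dict, Iterable


def min_x_chromosomes(mixture: Dict[str, Iterable[int]]) -> int:
    """Lower bound on distinct X chromosomes: the most alleles seen at one locus."""
    return max(len(set(alleles)) for alleles in mixture.values())


def is_consistent_with(mixture: Dict[str, Iterable[int]],
                       person: Dict[str, Iterable[int]]) -> bool:
    """True if every allele the person carries also appears in the mixture."""
    return all(
        set(alleles) <= set(mixture.get(locus, ()))
        for locus, alleles in person.items()
    )


# Hypothetical repeat counts observed in a mixed sample, keyed by locus.
mixture = {
    "locus_A": [19, 21, 23],
    "locus_B": [10, 12],
    "locus_C": [7, 8, 9, 11],
}

# Hypothetical reference samples from two women (two alleles per locus).
person_1 = {"locus_A": [19, 23], "locus_B": [12, 12], "locus_C": [8, 11]}
person_2 = {"locus_A": [20, 23], "locus_B": [10, 12], "locus_C": [7, 9]}

print(min_x_chromosomes(mixture))             # at least this many X chromosomes
print(is_consistent_with(mixture, person_1))  # all of her alleles are present
print(is_consistent_with(mixture, person_2))  # one of her alleles is missing
```

Note that consistency only means a person cannot be excluded as a contributor. The more sites you check, and the rarer the alleles involved, the more weight a match carries.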

Although the focus of this post has been on genetics, and specifically what it means for the FBI to say that they recovered some “female DNA,” I would be remiss if I did not include the caveat (emphasized in the original WSJ article) that there are a lot of different ways that someone’s DNA might have gotten onto one of the bombs without that person having been involved in the bombing — even if that person winds up being Tamerlan Tsarnaev’s widow.

E. O. Wilson is Wrong Again — not About Math, but About Collaboration

So, stop me if you’ve heard this one.

Q: What’s the difference between E. O. Wilson and a stopped clock?

A: A stopped clock does not have unlimited access to a national media platform to push its ridiculous ideas on the public.


A couple of weeks ago, E. O. Wilson published a piece in the Wall Street Journal, where he argued that you don’t need math to be a great scientist. There are two parts of the argument. First, that science is more about conceptual thinking that does not require mathematical formalism to get at great ideas. Second, that when it comes time to mathematize, you can always find a mathematician to collaborate with.

He has already been taken to task in places like Slate and Huffington Post. The criticism in these pieces and most of the grumbling I’ve heard around the internet has been something along the lines of, “Nuh uh! Math is too important!” More specifically, that the era of math-free scientific discovery is over. That to operate at the frontier of science in the twenty-first century, you have to be able to grapple with the mathematical and statistical concepts required in the days of big data.

There’s something to that.

On the other hand, I’m sympathetic to what Wilson is trying to do here. I would hate to see anyone drop out of science because they don’t feel that they can keep up with the math. Of course, that’s partly because I think most people can do more math than they think they can, if you know what I mean.

But what I want to focus on here is Wilson’s view of collaboration. This, even more than math, is going to be the must-have talent of the twenty-first-century scientist. The thing about science is, an awful lot of it has been done. To get to the frontiers of human knowledge requires years of study, and, for those of us without super powers, a lot of specialization. At the same time, the most interesting and important problems often lie between areas of specialization, and require contributions from more than one area. Those most interesting and important problems are going to be solved by teams and networks of people who bring different skills to the table, and who are able to integrate their skills in a way that leads to a whole that is greater than the sum of the parts.

It’s that integration bit, I think, that Wilson does not really get. Wilson’s view of collaboration seems to go something like this: you make some observations about some biology, come up with some ideas, then you go find someone who can translate those into the language of mathematics.

Here’s the thing about translation, though. It can’t be unidirectional, or rather, it shouldn’t be unidirectional. At the risk of something or other (obscurity? pretentiousness?), I’m going to dip into poetry here. Robert Hass (poet, Berkeley professor, and Occupy hero), in addition to writing a bunch of his own extraordinary verse, has translated seven volumes of poetry by Polish Nobel laureate Czesław Miłosz. Or, more accurately, he collaborated with Miłosz to produce those translations.

After Miłosz’s death, Hass included their translation of Miłosz’s poem “O!” in his own volume Time and Materials. The poem is prefaced with this note about the translation process:

In his last years, when he had moved back to Kraków, we worked on the translation of his poems by e-mail and phone. Around the time of his ninetieth birthday, he sent me a set of poems entitled “Oh!” I wrote to ask him if he meant “Oh!” or “O!” and he asked me what the difference was and said that perhaps we should talk on the phone. On the phone I explained that “Oh!” was a long breath of wonder, that the equivalent was, possibly, “Wow!” and that “O!” was a caught breath of surprise, more like “Huh!” and he said, after a pause, “O! for sure.”  Here are the translations we made:

Now, if you’re not a writer and/or avid reader of poetry, it may seem strange to fuss over the difference between “Oh!” and “O!” But worrying about the difference between “Oh!” and “O!” is precisely the sort of thing that differentiates poetry from other forms of writing. Robert Frost famously defined poetry as “what gets lost in translation.” One way to unpack that statement is to say that translation can typically capture the basic meaning of words and phrases, but the part of writing that is poetry is the part that goes beyond that basic meaning. Poetry is about subtle differences in meaning. It is about connotation and cultural resonance. It is about the sounds that words make and the emotional responses that they trigger in someone who has encountered that word thousands of times before, in a wide variety of contexts.

These things almost never have simple one-to-one correspondences from one language to another. That means that a good translation of poetry requires a back-and-forth process. If you have a translator who is truly fluent in both languages — linguistically and culturally — this back-and-forth can happen within the brain of the translator. But, if your translation involves two people, who each bring their expertise from one side of the translation, they have to get on the phone every so often to discuss things like the difference between “O!” and “Oh!”

Doing mathematical or theoretical biology is exactly like this.

The theories and observations that build up in the biological domain exist in a language that is profoundly different from the language of mathematics. For theory in biology to be both accurate and relevant, it has to stay true to both of these languages. That means there has to be a vibrant, even obsessive, back-and-forth between the biological observations and concepts and the mathematical representations that attempt to capture and formalize them.

As in the poetry case, if you, as an individual scientist, have a deep understanding of the biology and a fluency in the relevant mathematics, that back-and-forth can happen in your own brain. Where E. O. Wilson is right is in his assertion that, if you don’t have the math, you can still make a contribution, by focusing on building your deep understanding of the biology, and then by finding yourself a mathematician you can collaborate with.

But there’s a trick.

If you’re going to follow this route, you have to sit down with your mathematician, and you have to walk through every single equation. You have to press them on what it means, and you have to follow the thread of what it implies. If you’re the mathematician, you have to sit down with your biologist and say, “If we assume A, B, and C, then mathematically that implies X, Y, and Z.” You have to understand where, in the biology, A, B, and C come from, and you have to work together to discover whether or not X, Y, and Z make any sense.

Basically, each of you has to develop some fluency in the other’s language, at least within the narrow domain covered by the collaboration. If you’re not willing to put in this level of work, then yes, you should probably consider a different career.

Now, maybe you think I’m being unfair to Wilson here. After all, he doesn’t explicitly say that you should hand your ideas over to the mathematicians and walk away. And obviously, I don’t have any privileged access to the inner workings of Wilson’s brain or the nature of his collaborations.

But let’s go back to a couple of years ago, when he collaborated with Martin Nowak and Corina Tarnita to write a controversial paper in which they argued that modeling the evolution of social behaviors based on “kin selection” was fundamentally flawed. That paper elicited a response from the community that is rare: multiple responses criticizing the paper on multiple fronts, including one letter (nominally) co-authored by nearly 150 evolutionary biologists.

I won’t go into the details here, as I have written about the paper and the responses multiple times in the past (here and here, in particular, or you can just watch my video synopsis of the criticism here).

Briefly, the controversial article (published in Nature, arguably the most prestigious journal for evolutionary biologists), completely misinterprets, misrepresents, and/or ignores the work done by other people in the field. It’s a little bit like if you published a physics paper where you said, “But what if the speed of light is constant in different frames of reference? No one has ever thought of that, so all of physics is wrong!” That’s an exaggeration, of course, but the flaws in Wilson’s paper are of this general type.

The weird thing about the paper is that it includes an extensive supplement, which cites much of the literature that is disregarded by the main text of the paper. It is exactly the sort of error that happens when you have something that is written by a disconnected committee, where the right hand does not know what the left hand is doing. Basically, it is hard to imagine a scenario in which someone could actually have understood the papers that are cited and discussed in the supplementary materials, and then turned around and, in good faith, written that paper.

That leaves us with a few possible explanations. It could be that the authors were just not smart enough to understand what they were talking about. Or it could be that they deliberately misrepresented prior work to make their own work seem more original and important. For the purposes of our discussion here, let’s assume that neither of these explanations is accurate.

Instead, let’s assume that everyone involved is fundamentally competent, and was acting in good faith. In that case, perhaps the problem came from a failure of collaboration. E. O. Wilson probably knows more than just about anyone else in the world about the biology underlying the evolution of social behavior, especially among eusocial insects. Martin Nowak is a prominent and prolific mathematical biologist. Corina Tarnita was a postdoc at the time, with a background primarily in mathematics.

Wilson, as he acknowledges, lacks the mathematical skills required to really understand what the models of kin selection do and do not assume and imply. Tarnita, I imagine, has these skills, but as a young researcher coming out of math, perhaps lacked the biological knowledge and the perspective on the field to understand how the math related to the prior literature and the state of the field. Nowak, in principle, had both the mathematical skills and the biological experience to bridge this gap. He’s a curious case, though, as he, rather famously in the field, is interested in building and solving models, and has little interest in what has been done by other people, or in chasing down the caveats and nuanced implications of his work.

Among the three of them, Wilson, Nowak, and Tarnita have all of the skills and knowledge required to write an accurate analysis of models of kin selection. But if assembling the requisite skills was all that was necessary, that Nature paper would have been very different, in much the same way that you could dump a pile of gears, shafts, and pistons in my driveway, and I could drive away in a Camaro.

The challenge of interdisciplinary collaboration is to combine your various skills in a way that creates something greater than the sum of the parts. If you can master this, you’ll be able to make great contributions to whatever field you apply your skills and interests to.

In the case of Wilson’s disastrous paper, what we got was a situation where the deficits that each of the researchers brought to the table combined to create something greater than the sum of the parts. Sadly, I get the feeling that Wilson does not understand this difference, that he thinks collaborating with mathematicians means explaining your intuitions, and then waiting for the mathematicians to “prove” them.

So, yes, you can be a great scientist in the twenty-first century, even if you don’t have great mathematical skills yourself. But, just as Robert Hass called up Czesław Miłosz on the phone to discuss the difference between “O!” and “Oh!”, maybe you’re going to have to call up your mathematician collaborators to talk about the difference between O(x) and o(x). You don’t necessarily have to understand the difference in general, but you do need to understand the difference and its implications in the context of the system you’re studying. Otherwise, you’re not really doing science at all.

Cognitive Biases and the Trouble with Moral Local Shopping

So, the other day, after picking my son up from school, I stopped in at the local hardware store to pick up something or other, maybe a sack of nuts. The nuts would have been cheaper at Lowe’s or Home Depot, but I try to shop local when I can. That is, I am happy to pay a higher price for the satisfaction of feeling like I’m supporting the local economy rather than a big corporation, for a sense that the employees are well paid and well treated (whether true or not), and with the idea that sometimes it’s really convenient to have a local hardware store, and it would be a shame if it went out of business and I had to drive over to Lowe’s or Home Depot every time my nut sack was empty.

Now, as often happens when you run an errand after picking your son up at school, we were in the middle of shopping when he announced that he needed to use the bathroom. So, I found one of the very nice employees there and asked if he could, you know, use the bathroom.

He said no. More specifically, he said that normally he would let us use it, but the assistant manager was in the store that day, and he was worried that the assistant manager would tell the owner, who had a policy that customers could not use the bathroom. He apologized, and recommended that we go across the street to Dunkin’ Donuts, where they have nice, clean bathrooms, and they don’t give you a hard time, even if you come in to use them without buying anything.

Okay, so what the hell?

The standard story that we tell each other and ourselves when we are bemoaning the loss of little mom-and-pop stores is that these big chain stores are run by heartless corporations, that local business owners know and care about their customers, that they see them as people, rather than just sources of revenue. Why then do Lowe’s and Home Depot have open, well marked bathrooms, while my local hardware store has frightened employees who steer me towards Dunkin’ Donuts?

Of course, this isn’t really about bathrooms. Let me tell you another story.

A couple of days ago, I found a cool looking coffee shop that seemed to emphasize ethical sourcing of its beans, and was staffed by a bunch of people with various tattoos, piercings, and hair dyes. My initial thought was, “Hey, this is cool. I could work here instead of Starbucks, and I could encourage people I know to come here, too.”

As you probably know, the way wifi works at Starbucks is that you click a button in your web browser, agreeing to terms of use, and that’s it.

At this place, they had access to a paid wifi service. Now, they offered free access as well, but I had to go back up to the counter, wait in line again, and ask for the password, which was handed to me on a small card, and gave me access for two hours.

This, like the bathroom, is not a big thing. It’s a little thing, but it’s an annoying little thing. I can’t even tell you how much I paid for my coffee, or whether it was more or less than I would have paid at Starbucks. And, given the choice, I would favor the smaller business on general principles, but this little thing left me soured on the experience.

My point is not to argue that Lowe’s, Home Depot, and Dunkin’ Donuts are offering public bathrooms as part of a philanthropic effort to prevent public urination and bladder infections. I’m sure that these corporations are just as calculating and heartless as we all imagine them to be. There is only one reason for these corporations to provide nice, clean public bathrooms: the costs (in space, supplies, and cleaning) are outweighed by the benefits (in customer satisfaction and loyalty).

Remember that a few years ago, Starbucks did not have free and open wifi. For a while they put time limits on it, or required that you use a Starbucks card to access it. So why have they made it so easy now? Well, presumably for the exact same reason that Dunkin’ Donuts lets you use their bathroom: because it makes financial sense.

Sure, there are downsides to having free, unlimited wifi at your coffee shop. Sometimes you’re going to get a customer who milks a single cup of coffee for six hours, taking up a table and an outlet. It has to be hard not to look at that customer and get resentful, to feel like they are ripping you off, getting away with something. But here’s the thing. Whatever that customer is costing you, you are more than earning back from the people who came to your coffee shop because you have free, unlimited wifi. Maybe you even earn it back from that same customer, who uses your table all day on Monday, and then picks up his coffee to go on Tuesday, Wednesday, Thursday, and Friday.

The problem is that the guy who is sitting there using the wifi all day is cognitively salient. After all, he’s sitting right there! All day! It is easy to sit there and brood about how he is cheating the system, getting away with something. The extra customers you get are less salient, because it is easy to imagine that they would have come in anyway. If someone is on the wifi for only half an hour, why would it matter if you have a two-hour limit?

I suspect the difference is that open, unlimited, easy-to-access wifi makes you feel welcome, while limited, closed wifi makes you feel at best like a supplicant and at worst like a would-be criminal who is being scolded in advance.

Why am I so sure that free, open, unlimited wifi is the financially smarter move? Because big corporations like Starbucks and Panera, with lots of data and people who are trying to maximize profits, deliberately switched to the unlimited system.

What puzzles me is why small business owners don’t look at this and say, “You know what? If I am going to compete with these big stores, I should set up free wifi and a nice bathroom. I should try to make my customers feel as welcome and comfortable as I can.”

I suspect that there are two problems here. The first, which I have already alluded to, is one of cognitive biases. It is true in a wide range of contexts that negative events impact us more strongly than positive events: it is emotionally more painful to lose five dollars than it is emotionally gratifying to gain five dollars. So the guy who is freeloading on your wifi is more emotionally salient than all of the people who come for the wifi, spend money, and then leave again before you get mad.

This is a place where the cold, calculating nature of the disembodied corporation has an advantage. It can actually crunch those numbers and discover that this is one of those circumstances where you can cast your wifi upon the waters, for you will find customers after many days.

This may be the difference between the owner and the employee as well. If we use the bathroom, maybe the owner perceives that cost in a direct, emotional way that the employee does not, despite the fact that the employee is more likely to be the one who has to clean the bathroom.

The second problem, I think, is the moral language that we often use when discussing shopping locally, where it is presented as a moral duty to support local businesses. I think that there might be some good, rational reasons to shop locally when possible, but I think that the moral framing causes more harm than good. Small-business owners will often use this as a sort of crutch: “If you’re not shopping local, it’s because you’re a bad person, not because I provide an inferior product at a higher price.” It seems to me that if you’re going to start an independent coffee shop, you need to ask yourself, “What can I do to provide the most satisfying experience for my customers? How can I use my local knowledge and connections to create something wonderful that Starbucks could never pull off?” Every now and then you find something like that. When I lived in Santa Fe, there were a few different places that successfully did this, and I would rotate around, working in various locations, and spending way too much money on coffee.

Of course, the moral argument (this vague sense that small businesses are somehow better than big ones) is one that I buy to an extent. It is one of the reasons why I’m willing to pay a little bit more to re-nut my sack. But when the moral argument takes center stage, it eliminates the incentive for small businesses to think creatively about what they’re doing, or at least to copy uncreatively the best practices of their most successful competitors.