So, the latest development in the investigation of the Boston Marathon bombing is a report that the FBI has identified “female DNA” on the remains of at least one of the two bombs used by Dzhokhar and Tamerlan Tsarnaev in the attack. According to the report, published first in the Wall Street Journal, some genetic material has been recovered, and the FBI has gone to collect a DNA sample from Katherine Russell, the widow of Tamerlan Tsarnaev, presumably to see if it matches the DNA recovered from the bomb.
Here’s the thing. How does the FBI know, or think it knows, that it has recovered “female DNA”? Well, there aren’t a lot of details available yet, but there are a couple of possibilities.
First, let’s start with the basic genetics. Humans normally have 46 chromosomes, which come in 23 pairs, as well as some mitochondrial DNA. From the mitochondrial DNA, and 22 of the 23 other chromosome pairs, there is nothing to tell you whether the DNA came from a male or a female. The genetic difference between males and females resides in that last chromosome pair, the sex chromosomes. At the sex chromosomes, women have two X chromosomes, while men have one X chromosome and one Y chromosome.
So, if you have a discrete source of your DNA sample, like a hair, you could do a couple of things. You could test it for the presence of Y-chromosome genetic material. If the DNA source was female, you should not find any. Of course, that requires basing your conclusion on a negative result (the absence of a Y chromosome), which is not ideal, since it is possible that you could miss the material for technical reasons (e.g., failure of a particular chemical reaction).
The real thing you would look for to indicate that you had DNA from a female is the presence of two different X chromosomes. That means you need to identify the DNA sequence on part of the X chromosome. You can do this by actually sequencing a region of the chromosome, but this is probably unnecessarily expensive. After all, the vast majority of sites on the chromosome are going to be identical, not just in the X chromosomes in your sample, but in every X chromosome in every human being in the world.
What you can do instead is use tools that focus on specific sites that are already known to be variable in the population. Maybe there’s a particular site where it is known that some people have a C in their DNA sequence, while other people have a G. (This is referred to as a “polymorphic” site.) You simply ask whether that site in your particular sample has a C or a G, while simultaneously asking the analogous question about thousands of other sites.
If your DNA sample came from a male, you might find that the answer is C, or G, or whatever, at a particular site. Which answer you get is not as important as the fact that there will be a single answer. If your DNA sample comes from a woman, you should find that sometimes you have a mixture of C and G. Of course, at a given site, you could still get a single answer, say, G, if both of the woman’s X chromosomes had a G at this position. However, if you look at a whole bunch of sites, you should find that a decent number of them indicate a mixture of two sequences, revealing the presence of two distinct X chromosomes, and therefore, a female.
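To make the mixed-site logic concrete, here is a minimal sketch in Python. The input format (one set of observed bases per X-linked site), the function names, and the 5% cutoff are all assumptions made for illustration; none of this is meant to describe the FBI’s actual procedure.

```python
def mixed_site_fraction(calls):
    """Fraction of X-linked polymorphic sites showing more than one base.

    `calls` holds one entry per site; each entry is the set of bases
    observed at that site in the sample (hypothetical input format).
    """
    if not calls:
        raise ValueError("need at least one site")
    mixed = sum(1 for alleles in calls if len(alleles) > 1)
    return mixed / len(calls)


def suggests_two_x(calls, min_fraction=0.05):
    # min_fraction is an arbitrary illustrative cutoff, not a validated one.
    return mixed_site_fraction(calls) >= min_fraction


# Toy data: a single X gives one base per site; two distinct X
# chromosomes give a mixture at some sites.
male_like = [{"C"}, {"G"}, {"A"}, {"T"}]
female_like = [{"C", "G"}, {"G"}, {"A", "T"}, {"T"}]

print(mixed_site_fraction(male_like))
print(mixed_site_fraction(female_like))
```

In practice you would want thousands of sites and some allowance for genotyping error, which is why the sketch uses a fraction and a cutoff rather than flagging a sample on a single mixed site.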
But what if you don’t have a discrete genetic sample, like a hair, to work from? There’s not a lot of detail in the original article, so we have to speculate a bit here. (I’ve reached out to the reporters from the original piece, to see if there was some genetics-dork-relevant information that did not make it into the article. I’ll post an update if and when I hear back.) It seems likely that the bombs would also have carried DNA from one or both of the Tsarnaev brothers. Thus it is possible that the DNA collected by the FBI could contain a mixture of cells from multiple different individuals — like, say, they swabbed all around the bomb’s remains to collect their samples. What would they need to do then?
Well, first of all, let’s consider the case where you had a mixture of the two brothers’ DNA. The Y chromosomes from Dzhokhar and Tamerlan would be (virtually) identical, having both been copied from the single Y chromosome of their father. The two would have distinct X chromosomes, each of which would be a patchwork of pieces copied from their mother’s two X chromosomes.
So, the X chromosomes present in this sort of sample would look similar in some ways to the X chromosomes you would get from a female DNA sample: there would be some polymorphic sites where you would find a mixture of DNA sequences in your sample. However, we would not expect to find as many of these mixed sites as in a sample from a female. On average, half of the X chromosome sequence inherited by one brother would be (virtually) identical to the sequence inherited by the other brother. That is only an average, though: depending on how, exactly, recombination plays out, the identical fraction of their X chromosomes could range anywhere from nearly none to nearly all. It is possible, just by chance, that the X chromosomes inherited by Dzhokhar and Tamerlan would be as different from each other as the two X chromosomes present in their mother. Of course, at this point, DNA samples have almost certainly been collected from both brothers, so that investigators would know exactly what sequences to expect.
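The “half as many mixed sites” expectation can be written down as a back-of-the-envelope calculation. The sketch below assumes a two-variant site where one variant has population frequency p, and it assumes the mother’s two X chromosomes are independent draws from the population; the function names and the `shared` parameter are made up for illustration.

```python
def female_mixed_fraction(p):
    """Chance that a woman's two X chromosomes differ at a two-variant
    site where one variant has frequency p (assumes independent draws)."""
    return 2 * p * (1 - p)


def brothers_mixed_fraction(p, shared=0.5):
    """Chance that two brothers' X chromosomes differ at such a site.

    `shared` is the fraction of the X that both brothers copied from the
    same maternal chromosome; 0.5 is the average, but recombination can
    push it anywhere between 0 and 1.
    """
    return (1 - shared) * female_mixed_fraction(p)


print(female_mixed_fraction(0.5))
print(brothers_mixed_fraction(0.5))
print(brothers_mixed_fraction(0.5, shared=0.0))
```

Setting `shared=0.0` reproduces the chance scenario described above, where the brothers’ X chromosomes are as different from each other as their mother’s two.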
But what if there was an even messier mixture of DNA, say with samples from both brothers as well as one or more additional people? Well, at some point, the procedure of just looking for mixed sites in the DNA sequence is going to run into trouble. At most of these polymorphic sites, there are just two variants circulating at any frequency in the population. So, simply identifying sites that are polymorphic within your sample will let you distinguish between one X chromosome and more than one, but will not necessarily do a good job of telling you exactly how many different X chromosomes are present.
One approach to deal with this situation would be to look at a different type of polymorphism, one where there are more than two sequence variants present in the population. The polymorphisms most commonly used in this sort of context are short tandem repeats (STRs). These are stretches of DNA where a short sequence, maybe four or so nucleotides long, is repeated over and over again. Due to the nature of the process by which DNA is copied, these sequences are prone to a particular type of mutation, where the number of repeats increases or decreases. So, I might have a stretch of 19 copies of the sequence TCTA at a particular site in my genome, while you might have 23 copies of TCTA at the same location in your genome.
By looking at a whole bunch of these STR sites, the FBI could probably tell if the DNA they collected contained two, three, four, or more distinct X chromosomes. And, these are most likely the sorts of sites they will be using to see if the DNA collected from the bomb matches the DNA collected from Katherine Russell.
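Here is a rough sketch of how STR sites on the X chromosome could put a floor on the number of distinct X chromosomes in a mixture. Each X chromosome carries one repeat count per site, so the largest number of different repeat counts seen at any one site is a lower bound. The locus names and repeat counts are invented for illustration, and the function is a hypothetical helper, not a forensic tool.

```python
def min_x_chromosomes(profile):
    """Lower bound on the number of distinct X chromosomes in a sample.

    `profile` maps each X-linked STR site to the repeat counts detected
    there (hypothetical input format). Two chromosomes can share a repeat
    count, so the true number may be higher than this bound.
    """
    if not profile:
        raise ValueError("need at least one STR site")
    return max(len(set(repeat_counts)) for repeat_counts in profile.values())


# Made-up mixture: three different repeat counts at locus_B means at
# least three X chromosomes contributed to the sample.
mixture = {
    "locus_A": [19, 23],
    "locus_B": [11, 12, 14],
    "locus_C": [8],
}

print(min_x_chromosomes(mixture))
```

Because this is only a lower bound, looking at many sites matters: the more sites you type, the more likely it is that at least one of them separates every chromosome in the mixture.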
Although the focus of this post has been on genetics, and specifically what it means for the FBI to say that they recovered some “female DNA,” I would be remiss if I did not include the caveat (emphasized in the original WSJ article) that there are a lot of different ways that someone’s DNA might have gotten onto one of the bombs without that person having been involved in the bombing — even if that person winds up being Tamerlan Tsarnaev’s widow.
There is no way they can distinguish a scenario where there are two extra men vs. one woman, unless they know who it is and have matching DNA.
Unless they do some crazy autosome-to-X coverage calculations (which is apparently possible, says Nathan Pearson).
I think that you could distinguish two men from one woman if you looked for Y-chromosome genotypes as well. If you’re working with STRs, you should have enough distinct alleles to separate things out reasonably well.
And yes, if you’re able to quantify the DNA sufficiently, you would be able to infer the relative copy numbers of the various chromosomes that you started with. I don’t know how easy those methods are at the moment, or whether they require special equipment and/or conditions to get reliable results. I know people can do it in controlled experimental settings, but when you’re starting with the sort of field-collected sample that the FBI has, can you get reliable quantitative PCR data?
Anybody out there know?