Category Archives: science

Gene Patents Overturned — and Scalia’s Weird Dissenting Opinion

So, the Supreme Court just ruled that Myriad Genetics does not, in fact, have the right to patent two naturally occurring human genes, BRCA1 and BRCA2. This is good news, because . . . well, because patenting a gene is total bullshit.

If you’re not familiar, these two genes are important because genetic variation in their DNA sequences has been linked to breast cancer. So, the sequence of your DNA in these two genes can reveal if you have a higher-than-average risk of developing breast cancer. It was exactly this sort of test that prompted Angelina Jolie to undergo a preemptive double mastectomy.

The problem is that the tests were really, really expensive, because of Myriad’s patents. So, the immediate consequence of the ruling should be that the prices for these tests come way, way down.

The opinion (PDF here, if you’re interested) focuses on the difference between “discovering” something — like the sequence or location of a gene — and “creating” something — like a thing that can be patented. So, a gene is a naturally occurring thing that can not be patented. However, if you take the mRNA from a gene and reverse-transcribe it to make cDNA, this new thing might still be patentable. But, the ruling explicitly notes that the cDNA would be a creation because of the removal of introns. So, cDNA from a single-exon gene might not be patentable.

The ruling explicitly states that it offers no opinion on the patentability of genes that have had their DNA sequences deliberately altered — leaving that question for another day.

It also points out limitations of the ruling with respect to plants. The goal here seems to be to ensure that this ruling is not interpreted as invalidating any plant patents covering plant strains that have been developed through selective breeding.

That all seems pretty straightforward. The ruling does seem to leave a number of issues surrounding the patenting of genetic material unresolved, but it is quite clear about which issues it is kicking down the field.

But then there’s this bit of weirdness at the end.

The opinion is pretty much unanimous, which is always nice. Except for a little, tiny bit of dissension from Antonin Scalia. Here is the complete text of his dissenting opinion:

I join the judgment of the Court, and all of its opinion except Part I–A and some portions of the rest of the opinion going into fine details of molecular biology. I am unable to affirm those details on my own knowledge or even my own belief. It suffices for me to affirm, having studied the opinions below and the expert briefs presented here, that the portion of DNA isolated from its natural state sought to be patented is identical to that portion of the DNA in its natural state; and that complementary DNA (cDNA) is a synthetic creation not normally present in nature.

I actually thought Part I–A of the ruling was a little weird when I first read it. Not because it said anything strange or controversial, but because it read sort of like a Wikipedia entry on basic genetics, and contained a lot of details that don’t seem particularly relevant.

Here’s the full text of the part of the ruling about which Scalia says, “I am unable to affirm those details on my own knowledge or even my own belief.”

Genes form the basis for hereditary traits in living organisms. See generally Association for Molecular Pathology v. United States Patent and Trademark Office, 702 F. Supp. 2d 181, 192–211 (SDNY 2010). The human genome consists of approximately 22,000 genes packed into 23 pairs of chromosomes. Each gene is encoded as DNA, which takes the shape of the familiar “double helix” that Doctors James Watson and Francis Crick first described in 1953. Each “cross-bar” in the DNA helix consists of two chemically joined nucleotides. The possible nucleotides are adenine (A), thymine (T), cytosine (C), and guanine (G), each of which binds naturally with another nucleotide: A pairs with T; C pairs with G. The nucleotide cross-bars are chemically connected to a sugar-phosphate backbone that forms the outside framework of the DNA helix. Sequences of DNA nucleotides contain the information necessary to create strings of amino acids, which in turn are used in the body to build proteins. Only some DNA nucleotides, however, code for amino acids; these nucleotides are known as “exons.” Nucleotides that do not code for amino acids, in contrast, are known as “introns.”

Creation of proteins from DNA involves two principal steps, known as transcription and translation. In transcription, the bonds between DNA nucleotides separate, and the DNA helix unwinds into two single strands. A single strand is used as a template to create a complementary ribonucleic acid (RNA) strand. The nucleotides on the DNA strand pair naturally with their counterparts, with the exception that RNA uses the nucleotide base uracil (U) instead of thymine (T). Transcription results in a single strand RNA molecule, known as pre-RNA, whose nucleotides form an inverse image of the DNA strand from which it was created. Pre-RNA still contains nucleotides corresponding to both the exons and introns in the DNA molecule. The pre-RNA is then naturally “spliced” by the physical removal of the introns. The resulting product is a strand of RNA that contains nucleotides corresponding only to the exons from the original DNA strand. The exons-only strand is known as messenger RNA (mRNA), which creates amino acids through translation. In translation, cellular structures known as ribosomes read each set of three nucleotides, known as codons, in the mRNA. Each codon either tells the ribosomes which of the 20 possible amino acids to synthesize or provides a stop signal that ends amino acid production.

DNA’s informational sequences and the processes that create mRNA, amino acids, and proteins occur naturally within cells. Scientists can, however, extract DNA from cells using well known laboratory methods. These methods allow scientists to isolate specific segments of DNA — for instance, a particular gene or part of a gene—which can then be further studied, manipulated, or used. It is also possible to create DNA synthetically through processes similarly well known in the field of genetics. One such method begins with an mRNA molecule and uses the natural bonding properties of nucleotides to create a new, synthetic DNA molecule. The result is the inverse of the mRNA’s inverse image of the original DNA, with one important distinction: Because the natural creation of mRNA involves splicing that removes introns, the synthetic DNA created from mRNA also contains only the exon sequences. This synthetic DNA created in the laboratory from mRNA is known as complementary DNA (cDNA).

Changes in the genetic sequence are called mutations. Mutations can be as small as the alteration of a single nucleotide—a change affecting only one letter in the genetic code. Such small-scale changes can produce an entirely different amino acid or can end protein production altogether. Large changes, involving the deletion, rearrangement, or duplication of hundreds or even millions of nucleotides, can result in the elimination, misplacement, or duplication of entire genes. Some mutations are harmless, but others can cause disease or increase the risk of disease. As a result, the study of genetics can lead to valuable medical breakthroughs.

So, what do you think Scalia is objecting to? Is he just signaling that he thinks that the details of the molecular biology are not important here? Is it the claim that “Genes form the basis for hereditary traits in living organisms”? Is he unable to affirm with his own belief that G pairs with C? That uracil substitutes for thymine in RNA? That humans have 23 pairs of chromosomes?

Please share your most outlandish conspiracy theories in the comments!

At the Ronin Blog: An Outsider’s Theory of Everything

So, there’s been some buzz in the news the past few days about Eric Weinstein, a non-academic, who presented a lecture at Oxford on a mathematical theory that he has been working on for years. The theory aims to provide a “grand unification” for physics. The lecture and press coverage surrounding it have already received some negative reactions from academic scientists and science writers. Over at the Ronin Blog, I’ve tried to break down and evaluate those criticisms.

Basketball, Fraud, and the Broken University Incentive System

So, in 2005, Nature published a study by a group of researchers at Rutgers University showing that Jamaican teenagers who exhibited a high degree of body symmetry were judged (by other teenagers) to be better dancers than those with less symmetrical bodies. The study was an exploration of the idea that physical symmetry might be a trait subject to sexual selection — with “good dancers” serving as a proxy for “attractive mates.”

In 2013, Rutgers basketball coach Mike Rice received a $475,000 severance after video footage surfaced showing Rice pelting his players with basketballs, grabbing and shoving them, and shouting obscenities at them.

What do these two stories have in common? The stories reveal how Universities are not motivated by their stated missions (truth, education, etc.), so much as by a drive for money and prestige. As a friend of mine recently put it, “Pursuing truth is great, just don’t let it get in the way of your careerism.”

Now, you may be wondering how you can get a job at a University where you get to abuse students, and if you get caught and fired, you still walk away with nearly half a million dollars. I mean, you probably thought you could only get a job like that in an S & M dungeon, or maybe the CIA.

Here’s what pisses me off about the whole thing. Rice did not lose his job for abusing his players. He lost his job for embarrassing the University. When the video first surfaced, everything was handled in house. Rice was given a three-game suspension and a $50,000 fine — secretly. Then, when the video showed up online, he was fired, along with an assistant coach and the athletic director (who received a $1.25 million severance package). (Here’s one version)

Some players have argued that Rice’s behavior was not as bad as it appears on the video, and David Plotz argued in Slate that this type of behavior from his own high-school coach made him “a better player and a better man.” Maybe they’re right. Maybe we would all be better off if we were berated regularly for our shortcomings. There’s a balance in there somewhere between “preventing bullying” and raising a generation of hothouse flowers, and I’m not always convinced that we, as a society, don’t veer too far in the direction of kid gloves.

The point is, you can imagine defending having a coach like Mike Rice. Likewise, you can imagine arguing that behavior like Rice’s is absolutely unacceptable in a University setting. What is bullshit is internally saying, yes, Mike Rice is great, he just needs to tone it down a notch, and then, when the public finds out, being all, “Oh my god! I can’t believe this was happening!” Basically, the sense you get is that no one was interested in figuring out what the right thing to do was. Everyone was motivated by protecting their own careers, and deflecting embarrassment away from the University.

So how does this relate to dancing teenagers in Jamaica?

The 2005 paper had seven co-authors, but the three key players are Robert Trivers and Lee Cronk, both from the Rutgers Anthropology Department, and William Brown (a postdoc at the time, now in the Psych department at University of Bedfordshire).

A couple of years after the paper came out, Trivers started to suspect that Brown had manipulated the data to make the results more compelling than they actually were. He went on to detail his analysis in a book published in 2009.

So what happens when one of the authors of a high profile paper comes out and claims that the results presented in the paper were not just wrong, but fraudulent? A news piece just out at Nature notes that Trivers approached Nature in 2008 about retracting the paper, but they were not interested. After all, why would they be? Nature’s business plan hinges on publishing studies that are exciting — studies that will be cited by a lot of other papers, and that will attract a lot of attention from the popular press.

This means that Nature (and other high-impact journals like Science) are particularly prone to a few types of bad papers. One is the shocking result that would lead to a major paradigm shift if it were true — but it is not true. One is the paper that seems to represent a huge advance, but is a modest advance that seems bigger than it is because it fails to cite much of the relevant literature. And one is the type we’re potentially talking about here, where a paper presents some really beautiful results, which are beautiful because the data has been manipulated in some way.

The problem is that this business model works great. The upsides of citations and press coverage apparently far outweigh the downsides of publishing incorrect, or incorrectly presented results. In general, there is not that much effort that goes into replicating results in science (arguably, a lot less than there should be). And even if a study fails to be replicated, and is shown to have been wrong, this typically happens years later, when no one is paying attention to the original study anymore.

That means that Nature and Science tend to have this funny status. In biology, anyway, everyone dismisses them as “magazines” or “tabloids.” At the same time, everyone is desperate to publish there.

[Darwin Eats Cake comic #0085]

So what do you do if you’re one of these journals? Well, if you are driven by a moral principle of promoting truth and spreading the best information possible, you put more effort into vetting papers up front, and when a paper is shown to be wrong, you actively try to correct the misperceptions stemming from the fact that people are saying, “Well, this was published in Nature, so it must be right.” On the other hand, if you are driven by the financial health of your corporation, you continue to publish things that will attract attention, and you retract or correct only when you absolutely have to, and you do it as quietly as possible, so as not to harm your brand.

Similarly, what do you do if you are a University associated with publishing a fraudulent paper? Again, if your guiding principles are truth and honest scholarship . . . oh, who am I kidding, those have not been the guiding principles of Universities for decades. As a University, your guiding principle is to maximize your money and prestige (to the extent that prestige begets money). That means that you will promote truth to the extent that you are required to do so, in order to maintain your brand as an important “research” and “education” institution. But what you really want is to keep the accusations of fraud as quiet as possible.

According to Trivers’s account of events, once he went public with his book detailing the case for fraud, Rutgers was forced to undertake an investigation, since the work was supported in part by a grant from the NSF. Had they failed to do so, they might have become ineligible for NSF funds. When they completed their investigation in 2012, the University refused to make the results public. Trivers has posted the report on his own site (here). In brief, the report says that, yes, there is evidence for fraud in the paper (although the Nature news piece notes that Brown still denies having manipulated the data).

There’s another part of the story, though, that is much more disturbing. Trivers tells of an incident where he went to confront co-author Lee Cronk in his office about the situation. Trivers felt that Cronk had backed Brown’s version of events, and that the Rutgers report had vindicated his position over theirs, and he called Cronk a “punk.” Cronk then reported the incident to the campus police, and Trivers wound up getting banned from campus for alleged violation of the University’s anti-violence policy.

Now, clearly, I was not there, and so I have no independent source of knowledge about what happened that day. However, the version of events related by Trivers (read the whole thing here) sounds completely consistent with my own interactions with him. His version of the story suggests motivations and actions on the part of others:

It suggests that Cronk overreacted / acted punitively when Trivers confronted him. Maybe he was embarrassed and angry, and pretended to have felt physically threatened by Trivers in order to get back at him. Maybe he really did feel physically threatened, because, maybe, like most academics, he’s sort of a wuss. Maybe he overreacted in the heat of the moment, later wishing that he had not gotten the police involved. I don’t know. I’ve never met him, but this does pass the sniff test as the type of thing I can imagine a lot of Professor types doing.

It suggests that Rutgers sided against Trivers in the incident, perhaps out of retaliation for his original whistleblowing about the paper and the embarrassment and financial cost his actions imposed on the University. After all, if he had kept his mouth shut, they would not have had to undertake the whole fraud investigation, which I’m sure cost a bundle, and they would not have had the embarrassment of national press coverage. Again, clearly, I have no privileged details here, but this sounds like something you would do if you were a University that cared more about protecting its money and reputation than about getting at the truth.

I don’t want to pick on Rutgers in particular here. I think that this is the course of action that would have been taken by any University. That does not make it right, though.

I also don’t want to pick on Brown and Cronk, both of whom I suspect are perfectly bright and competent researchers. Even if Brown did actively manipulate the data, I see that as a pathology of the system, where getting a Nature paper can mean the difference between getting a job and not getting a job. If we didn’t hand out jobs and funding and promotions based on numbers of publications and citations — but rather on the quality and rigor of people’s research — there would not be incentives to manipulate or fabricate data. Even if Cronk maliciously got the Rutgers police to ban Trivers from campus, he did so in the context of a system that almost never rewards standing up and saying, “You know what, I was wrong.”

I know a lot of great scientists, people who are their own harshest critics, who are reluctant to publish results until they are 100% certain, who view even critical reviews of their manuscripts as good opportunities to make their work better. I also know a lot of scientists who are happy to cut corners, exaggerate the significance and importance of their results, and promote themselves, even, sometimes, at the expense of the truth.

Unfortunately, the way the current incentive system works, the latter group tend to have much better jobs than the former.

Until we can fix that, everything is going to conspire to encourage researchers to exaggerate, manipulate, and even fabricate their data. And everything is going to conspire to discourage Universities and journals from addressing and correcting fraudulent and erroneous results.

We have to reward honest, serious work that is not necessarily flashy. And we have to reward people who are willing to admit and correct mistakes.

Most of all, we have to reassert the principle of “truth” as our highest value in academia, and fight against its erosion by the secondary values of fame, prestige, and, most importantly, funding.

––––––––––––––––––––––––

Disclosure statements.

Since 2011, I have had an unpaid adjunct / visiting scholar position at Rutgers in the Genetics Department. During that time, I have gotten to know Trivers a bit, and we, in fact, have plans to work on a project together. In my experience, he is brilliant, boisterous, foul-mouthed, politically incorrect, and honest to a fault — exactly the sort of person all academics should aspire to be. Unfortunately, in their current incarnation, Universities much favor mediocre minds and personalities who will toe the corporate line, and who will bring funding to the University without inconvenient things like “truth” getting in the way.

In elementary school, I was once on the losing side of a full-length, YMCA-league basketball game that ended 6 to 5. In junior high, I once attended a week-long basketball camp at the end of which I received the Orwellian “Most Improved Player” award.

How does the FBI know it found “Female DNA”?

So, the latest development in the investigation of the Boston Marathon bombing is a report that the FBI has identified “female DNA” on the remains of at least one of the two bombs used by Dzhokhar and Tamerlan Tsarnaev in the attack. According to the report, published first in the Wall Street Journal, some genetic material has been recovered, and the FBI has gone to collect a DNA sample from Katherine Russell, the widow of Tamerlan Tsarnaev, presumably to see if it matches the DNA recovered from the bomb.

Here’s the thing. How does the FBI know, or think it knows, that it has recovered “Female DNA”? Well, there aren’t a lot of details available yet, but there are a couple of possibilities.

First, let’s start with the basic genetics. Humans normally have 46 chromosomes, which come in 23 pairs, as well as some mitochondrial DNA. From the mitochondrial DNA, and 22 of the 23 chromosome pairs, there is nothing to tell you whether the DNA came from a male or a female. The genetic difference between males and females resides in that last chromosome pair, the sex chromosomes. At the sex chromosomes, women have two X chromosomes, while men have one X chromosome and one Y chromosome.

So, if you have a discrete source of your DNA sample, like a hair, you could do a couple of things. You could test it for the presence of Y-chromosome genetic material. If the DNA source was female, you should not find any. Of course, that requires basing your conclusion on a negative result (the absence of a Y chromosome), which is not ideal, since it is possible that you could miss the material for technical reasons (e.g., failure of a particular chemical reaction).

The real thing you would look for to indicate that you had DNA from a female is the presence of two different X chromosomes. That means you need to identify the DNA sequence on part of the X chromosome. You can do this by actually sequencing a region of the chromosome, but this is probably unnecessarily expensive. After all, the vast majority of sites on the chromosome are going to be identical, not just in the X chromosomes in your sample, but in every X chromosome in every human being in the world.

What you can do instead is use tools that focus on specific sites that are already known to be variable in the population. Maybe there’s a particular site where it is known that some people have a C in their DNA sequence, while other people have a G. (This is referred to as a “polymorphic” site.) You simply ask whether that site in your particular sample has a C or a G, while simultaneously asking the analogous question about thousands of other sites.

If your DNA sample came from a male, you might find that the answer is C, or G, or whatever, at a particular site. What the answer is is not as important as the fact that there will be a single answer. If your DNA sample comes from a woman, you should find that sometimes you have a mixture of C and G. Of course, at a given site, you could still get a single answer, say, G, if both of the woman’s X chromosomes had a G at this position. However, if you look at a whole bunch of sites, you should find that a decent number of them indicate a mixture of two sequences — revealing the presence of two distinct X chromosomes, and therefore, a female.
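
Just to make that logic concrete, here is a toy Python sketch of the counting step. The genotype calls and the threshold for a “decent number” of mixed sites are invented for illustration; this is obviously not the FBI’s actual pipeline.

```python
# Toy sketch: decide whether a sample likely contains two distinct X
# chromosomes by counting known polymorphic sites where two different
# bases show up. All calls below are invented for illustration.

sample_calls = {
    "site_001": {"C"},       # single base observed at this site
    "site_002": {"C", "G"},  # mixture of two bases
    "site_003": {"A"},
    "site_004": {"A", "T"},  # another mixed site
}

def count_mixed_sites(calls):
    """Number of polymorphic sites showing more than one base."""
    return sum(1 for bases in calls.values() if len(bases) > 1)

def looks_like_two_x(calls, threshold=0.1):
    """Crude rule of thumb: if a decent fraction of sites are mixed, the
    sample probably carries at least two distinct X chromosomes."""
    return count_mixed_sites(calls) / len(calls) >= threshold

print(count_mixed_sites(sample_calls))  # 2
print(looks_like_two_x(sample_calls))   # True
```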

But what if you don’t have a discrete genetic sample, like a hair, to work from? There’s not a lot of detail in the original article, so we have to speculate a bit here. (I’ve reached out to the reporters from the original piece, to see if there was some genetics-dork-relevant information that did not make it into the article. I’ll post an update if and when I hear back.) It seems likely that the bombs would also have carried DNA from one or both of the Tsarnaev brothers. Thus it is possible that the DNA collected by the FBI could contain a mixture of cells from multiple different individuals — like, say, they swabbed all around the bomb’s remains to collect their samples. What would they need to do then?

Well, first of all, let’s consider the case where you had a mixture of the two brothers’ DNA. The Y chromosomes from Dzhokhar and Tamerlan would be (virtually) identical, having both been copied from the single Y chromosome of their father. The two would have distinct X chromosomes, each of which would be a patchwork of pieces copied from their mother’s two X chromosomes.

So, the X chromosomes present in this sort of sample would look similar in some ways to the X chromosomes you would get from a female DNA sample: there would be some polymorphic sites where you would find a mixture of DNA sequences in your sample. However, we would not expect to find as many of these mixed sites as in a sample from a female. On average, half of the X chromosome sequence inherited by one brother would be (virtually) identical to the sequence inherited by the other brother. Although, depending on how, exactly, recombination plays out, the identical fraction of their X chromosomes could range anywhere from nearly none to nearly all of it. It is possible, just by chance, that the X chromosomes inherited by Dzhokhar and Tamerlan would be as different from each other as the two X chromosomes present in their mother. Of course, at this point, DNA samples have almost certainly been collected from both brothers, so that investigators would know exactly what sequences to expect.

But what if there was an even messier mixture of DNA, say with samples from both brothers as well as one or more additional people? Well, at some point, the procedure of just looking for mixed sites in the DNA sequence is going to run into trouble. At most of these polymorphic sites, there are just two variants circulating at any frequency in the population. So, simply identifying sites that are polymorphic within your sample will let you distinguish between one X chromosome and more than one, but will not necessarily do a good job of telling you exactly how many different X chromosomes are present.

One approach to deal with this situation would be to look at a different type of polymorphism, one where there are more than two sequence variants present in the population. The polymorphisms most commonly used in this sort of context are short tandem repeats (STRs). These are stretches of DNA where a short sequence, maybe four or so nucleotides long, is repeated over and over again. Due to the nature of the process by which DNA is copied, these sequences are prone to a particular type of mutation, where the number of repeats increases or decreases. So, I might have a stretch of 19 copies of the sequence TCTA at a particular site in my genome, while you might have 23 copies of TCTA at the same location in your genome.

By looking at a whole bunch of these STR sites, the FBI could probably tell if the DNA they collected contained two, three, four, or more distinct X chromosomes. And, these are most likely the sorts of sites they will be using to see if the DNA collected from the bomb matches the DNA collected from Katherine Russell.
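
Here is the same idea as a tiny Python sketch (the loci and repeat counts are made up). The logic is just that each X chromosome carries exactly one allele at each STR locus, so the locus showing the most distinct alleles puts a lower bound on the number of distinct X chromosomes in the sample.

```python
# Toy sketch: lower-bound the number of distinct X chromosomes in a
# (possibly mixed) sample using X-linked STR loci. Values are invented.

str_alleles = {
    "locus_A": {19, 23},      # two distinct repeat lengths observed
    "locus_B": {12, 15, 17},  # three distinct repeat lengths observed
    "locus_C": {8},
}

def min_distinct_x(alleles_by_locus):
    """Each X chromosome contributes one allele per locus, so the locus
    with the most distinct alleles gives a lower bound on the number of
    distinct X chromosomes present."""
    return max(len(alleles) for alleles in alleles_by_locus.values())

print(min_distinct_x(str_alleles))  # 3: at least three distinct X chromosomes
```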

Although the focus of this post has been on genetics, and specifically what it means for the FBI to say that they recovered some “female DNA,” I would be remiss if I did not include the caveat (emphasized in the original WSJ article) that there are a lot of different ways that someone’s DNA might have gotten onto one of the bombs without that person having been involved in the bombing — even if that person winds up being Tamerlan Tsarnaev’s widow.

E. O. Wilson is Wrong Again — not About Math, but About Collaboration

So, stop me if you’ve heard this one.

Q: What’s the difference between E. O. Wilson and a stopped clock?

A: A stopped clock does not have unlimited access to a national media platform to push its ridiculous ideas on the public.

Bazinga!

A couple of weeks ago, E. O. Wilson published a piece in the Wall Street Journal, where he argued that you don’t need math to be a great scientist. There are two parts of the argument. First, that great science is mostly about conceptual thinking, which does not require mathematical formalism to get at great ideas. Second, that when it comes time to mathematize, you can always find a mathematician to collaborate with.

He has already been taken to task in places like Slate and Huffington Post. The criticism in these pieces and most of the grumbling I’ve heard around the internet has been something along the lines of, “Nuh uh! Math is too important!” More specifically, that the era of math-free scientific discovery is over. That to operate at the frontier of science in the twenty-first century, you have to be able to grapple with the mathematical and statistical concepts required in the days of big data.

There’s something to that.

On the other hand, I’m sympathetic to what Wilson is trying to do here. I would hate to see anyone drop out of science because they don’t feel that they can keep up with the math. Of course, that’s partly because I think most people can do more math than they think they can, if you know what I mean.

But what I want to focus on here is Wilson’s view of collaboration. This, even more than math, is going to be the must-have talent of the twenty-first-century scientist. The thing about science is, an awful lot of it has been done. To get to the frontiers of human knowledge requires years of study, and, for those of us without super powers, a lot of specialization. At the same time, the most interesting and important problems often lie between areas of specialization, and require contributions from more than one area. Those most interesting and important problems are going to be solved by teams and networks of people who bring different skills to the table, and who are able to integrate their skills in a way that leads to a whole that is greater than the sum of the parts.

It’s that integration bit, I think, that Wilson does not really get. Wilson’s view of collaboration seems to go something like this: you make some observations about some biology, come up with some ideas, then you go find someone who can translate those into the language of mathematics.

Here’s the thing about translation, though. It can’t be unidirectional, or rather, it shouldn’t be unidirectional. At the risk of something or other (obscurity? pretentiousness?), I’m going to dip into poetry here. Robert Hass (Poet, Berkeley Professor, and Occupy hero), in addition to writing a bunch of his own extraordinary verse, has translated seven volumes of poetry by Polish Nobel laureate Czesław Miłosz. Or, more accurately, he collaborated with Miłosz to produce those translations.

After Miłosz’s death, Hass included their translation of Czesław Miłosz’s poem “O!” in his own volume Time and Materials. The poem is prefaced with this note about the translation process:

In his last years, when he had moved back to Kraków, we worked on the translation of his poems by e-mail and phone. Around the time of his ninetieth birthday, he sent me a set of poems entitled “Oh!” I wrote to ask him if he meant “Oh!” or “O!” and he asked me what the difference was and said that perhaps we should talk on the phone. On the phone I explained that “Oh!” was a long breath of wonder, that the equivalent was, possibly, “Wow!” and that “O!” was a caught breath of surprise, more like “Huh!” and he said, after a pause, “O! for sure.”  Here are the translations we made:

Now, if you’re not a writer and/or avid reader of poetry, it may seem strange to fuss over the difference between “Oh!” and “O!” But worrying about the difference between “Oh!” and “O!” is precisely the sort of thing that differentiates poetry from other forms of writing. Robert Frost famously defined poetry as “what gets lost in translation.” One way to unpack that statement is to say that translation can typically capture the basic meaning of words and phrases, but the part of writing that is poetry is the part that goes beyond that basic meaning. Poetry is about subtle differences in meaning. It is about connotation and cultural resonance. It is about the sounds that words make and the emotional responses that they trigger in someone who has encountered that word thousands of times before, in a wide variety of contexts.

These things almost never have simple one-to-one correspondences from one language to another. That means that a good translation of poetry requires a back-and-forth process. If you have a translator who is truly fluent in both languages — linguistically and culturally — this back-and-forth can happen within the brain of the translator. But, if your translation involves two people, who each bring their expertise from one side of the translation, they have to get on the phone every so often to discuss things like the difference between “O!” and “Oh!”

Doing mathematical or theoretical biology is exactly like this.

The theories and observations that build up in the biological domain exist in a language that is profoundly different from the language of mathematics. For theory in biology to be both accurate and relevant, it has to stay true to both of these languages. That means there has to be a vibrant, even obsessive, back-and-forth between the biological observations and concepts and the mathematical representations that attempt to capture and formalize them.

As in the poetry case, if you, as an individual scientist, have a deep understanding of the biology and a fluency in the relevant mathematics, that back-and-forth can happen in your own brain. Where E. O. Wilson is right is in his assertion that, if you don’t have the math, you can still make a contribution, by focusing on building your deep understanding of the biology, and then by finding yourself a mathematician you can collaborate with.

But there’s a trick.

If you’re going to follow this route, you have to sit down with your mathematician, and you have to walk through every single equation. You have to press them on what it means, and you have to follow the thread of what it implies. If you’re the mathematician, you have to sit down with your biologist and say, “If we assume A, B, and C, then mathematically that implies X, Y, and Z.” You have to understand where, in the biology, A, B, and C come from, and you have to work together to discover whether or not X, Y, and Z make any sense.

Basically, each of you has to develop some fluency in the other’s language, at least within the narrow domain covered by the collaboration. If you’re not willing to put in this level of work, then yes, you should probably consider a different career.

Now, maybe you think I’m being unfair to Wilson here. After all, he doesn’t explicitly say that you should hand your ideas over to the mathematicians and walk away. And obviously, I don’t have any privileged access to the inner workings of Wilson’s brain or the nature of his collaborations.

But let’s go back to a couple of years ago, when he collaborated with Martin Nowak and Corina Tarnita to write a controversial paper in which they argued that modeling the evolution of social behaviors based on “kin selection” was fundamentally flawed. That paper elicited a response from the community that is rare: multiple responses criticizing the paper on multiple fronts, including one letter (nominally) co-authored by nearly 150 evolutionary biologists.

I won’t go into the details here, as I have written about the paper and the responses multiple times in the past (here and here, in particular, or you can just watch my video synopsis of the criticism here).

Briefly, the controversial article (published in Nature, arguably the most prestigious journal for evolutionary biologists), completely misinterprets, misrepresents, and/or ignores the work done by other people in the field. It’s a little bit like if you published a physics paper where you said, “But what if the speed of light is constant in different frames of reference? No one has ever thought of that, so all of physics is wrong!” That’s an exaggeration, of course, but the flaws in Wilson’s paper are of this general type.

The weird thing about the paper is that it includes an extensive supplement, which cites much of the literature that is disregarded by the main text of the paper. It is exactly the sort of error that happens when you have something that is written by a disconnected committee, where the right hand does not know what the left hand is doing. Basically, it is hard to imagine a scenario in which someone could actually have understood the papers that are cited and discussed in the supplementary materials, and then turned around and, in good faith, have written that paper.

That leaves us with a few possible explanations. It could be that the authors were just not smart enough to understand what they were talking about. Or it could be that they deliberately misrepresented prior work to make their own work seem more original and important. For the purposes of our discussion here, let’s assume that neither of these explanations is accurate.

Instead, let’s assume that everyone involved is fundamentally competent, and was acting in good faith. In that case, perhaps the problem came from a failure of collaboration. E. O. Wilson probably knows more than just about anyone else in the world about the biology underlying the evolution of social behavior — especially among eusocial insects. Martin Nowak is a prominent and prolific mathematical biologist. Corina Tarnita was a postdoc at the time, with a background primarily in mathematics.

Wilson, as he acknowledges, lacks the mathematical skills required to really understand what the models of kin selection do and do not assume and imply. Tarnita, I imagine, has these skills, but as a young researcher coming out of math, perhaps lacked the biological knowledge and the perspective on the field to understand how the math related to the prior literature and the state of the field. Nowak, in principle, had both the mathematical skills and the biological experience to bridge this gap. He’s a curious case, though, as he, rather famously in the field, is interested in building and solving models, and has little interest in what has been done by other people, or in chasing down the caveats and nuanced implications of his work.

Among the three of them, Wilson, Nowak, and Tarnita have all of the skills and knowledge required to write an accurate analysis of models of kin selection. But if assembling the requisite skills was all that was necessary, that Nature paper would have been very different — in much the same way that you could dump a pile of gears, shafts, and pistons in my driveway, and I could drive away in a Camaro.

The challenge of interdisciplinary collaboration is to combine your various skills in a way that creates something greater than the sum of the parts. If you can master this, you’ll be able to make great contributions to whatever field you apply your skills and interests to.

In the case of Wilson’s disastrous paper, what we got was a situation where the deficits that each of the researchers brought to the table combined to create something greater than the sum of the parts. Sadly, I get the feeling that Wilson does not understand this difference, that he thinks collaborating with mathematicians means explaining your intuition, and then waiting for them to “prove” it.

So, yes, you can be a great scientist in the twenty-first century, even if you don’t have great mathematical skills yourself. But, just as Robert Hass called up Czesław Miłosz on the phone to discuss the difference between “O!” and “Oh!” maybe you’re going to have to call up your mathematician collaborators to talk about the difference between O(x) and o(x). You don’t necessarily have to understand the difference in general, but you do need to understand the difference and its implications in the context of the system you’re studying, otherwise you’re not really doing science at all.

How Many English Tweets are Actually Possible?

So, recently (last week, maybe?), Randall Munroe, of xkcd fame, posted an answer to the question “How many unique English tweets are possible?” as part of his excellent “What If” series. He starts off by noting that there are 27 letters (including spaces), and a tweet length of 140 characters. This gives you 27^140 — or about 10^200 — possible strings.

Of course, most of these are not sensible English statements, and he goes on to estimate how many of these there are. This analysis is based on Shannon’s estimate of the entropy rate for English — about 1.1 bits per letter. This leads to a revised estimate of 2^(140 x 1.1) English tweets, or about 2 x 10^46. The rest of the post explains just what a hugely big number that is — it’s a very, very big number.
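
If you want to check those numbers yourself, the arithmetic takes a few lines of Python (the 1.1 bits per character is just the Shannon-style estimate quoted above):

```python
from math import log10

n_strings = 27 ** 140                  # every 140-character string over 27 symbols
print(round(log10(n_strings)))         # 200, i.e. about 10^200

entropy_rate = 1.1                     # bits per character
n_english = 2 ** (entropy_rate * 140)  # "effective" number of English-like tweets
print(f"{n_english:.1e}")              # about 2e+46
```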

The problem is that this number is also wrong.

It’s not that the calculations are wrong. It’s that the entropy rate is the wrong basis for the calculation.

Let’s start with what the entropy rate is. Basically, given a sequence of characters, how easy is it to predict what the next character will be. Or, how much information (in bits) is given by the next character above and beyond the information you already had.

If the probability of a character being the i-th letter in the alphabet is p_i, the entropy of the next character is given by

– Σ_i p_i log_2 p_i

If all characters (26 letters plus space) were equally likely, the entropy of the character would be log_2 27, or about 4.75 bits. If some letters are more likely than others (as they are), it will be less. According to Shannon’s original paper, the distribution of letter usage in English gives about 4.14 bits per character. (Note: Shannon’s analysis excluded spaces.)
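
Here is that formula as a few lines of Python. The uniform case reproduces the log_2 27 ≈ 4.75 bits figure; the skewed distribution is made up (not real English letter frequencies), just to show that any non-uniform distribution comes in lower.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Uniform case: 27 equally likely characters (26 letters plus space).
uniform = [1 / 27] * 27
print(entropy(uniform))  # log_2 27, about 4.75 bits

# A made-up skewed distribution: one very common character, the rest equally rare.
skewed = [0.2] + [0.8 / 26] * 26
print(entropy(skewed))   # about 4.48 bits, lower than the uniform case
```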

But, if you condition the probabilities on the preceding character, the entropy goes down. For example, if we know that the preceding character is a b, there are many letters that might follow, but the probability that the next character is a c or a z is less than it otherwise might have been, and the probability that the next character is a vowel goes up. If the preceding letter is a q, it is almost certain that the next character will be a u, and the entropy of that character will be low, close to zero, in fact.

When we go to three characters, the marginal entropy of the third character will go down further still. For example, t can be followed by a lot of letters, including another t. But, once you have two ts in a row, the next letter almost certainly won’t be another t.

So, the more characters in the past you condition on, the more constrained the next character is. If I give you the sequence “The quick brown fox jumps over the lazy do_,” it is possible that what follows is “cent at the Natural History Museum,” but it is much more likely that the next letter is actually “g” (even without invoking the additional constraint that the phrase is a pangram). The idea is that, as you condition on longer and longer sequences, the marginal entropy of the next character asymptotically approaches some value, which has been estimated in various ways by various people at various times. Many of those estimates are in the ballpark of the 1.1 bits per character estimate that gives you 10^46 tweets.
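
To see the effect of conditioning in miniature, here is a rough sketch that estimates the per-character entropy of a toy text with and without conditioning on the preceding character. (Real estimates use large corpora and much longer contexts, of course.)

```python
from collections import Counter
from math import log2

def entropy(text):
    """Per-character entropy, ignoring context."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

def conditional_entropy(text):
    """Estimate H(next character | previous character) from bigram counts."""
    bigrams = Counter(zip(text, text[1:]))
    prev = Counter(text[:-1])
    total = sum(bigrams.values())
    h = 0.0
    for (a, b), n in bigrams.items():
        p_pair = n / total    # P(previous = a, next = b)
        p_cond = n / prev[a]  # P(next = b | previous = a)
        h -= p_pair * log2(p_cond)
    return h

text = "the quick brown fox jumps over the lazy dog " * 100
print(entropy(text))              # unconditioned per-character entropy
print(conditional_entropy(text))  # lower: the previous character is informative
```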

So what’s the problem?

The problem is that these entropy-rate measures are based on the relative frequencies of use and co-occurrence in some body of English-language text. The fact that some sequences of words occur more frequently than other, equally grammatical sequences of words, reduces the observed entropy rate. Thus, the entropy rate tells you something about the predictability of tweets drawn from natural English word sequences, but tells you less about the set of possible tweets.

That is, that 10^46 number is actually better understood as something like the reciprocal of the probability that two random tweets are identical, when both are drawn at random from 140-character sequences of natural English language. This will be the same as the number of possible tweets only if all possible tweets are equally likely.

Recall that the character following a q has very low entropy, since it is very likely to be a u. However, a quick check of Wikipedia’s “List of English words containing Q not followed by U” page reveals that the next character could also be space, a, d, e, f, h, i, r, s, or w. This gives you eleven different characters that could follow q. The entropy rate gives you something like the “effective number of characters that can follow q,” which is very close to one.

When we want to answer a question like “How many unique English tweets are possible?” we want to be thinking about the analog of the eleven number, not the analog of the very-close-to-one number.

So, what’s the answer then?

Well, one way to approach this would be to move up to the level of the word. The OED has something like 170,000 entries, not counting archaic forms. The average English word is 4.5 characters long (5.5 including the trailing space). Let’s be conservative, and say that a word takes up seven characters. This gives us up to twenty words to work with. If we assume that any sequence of English words works, we would have 4 x 10^104 possible tweets.

The xkcd calculation, based on an English entropy rate of 1.1 bits per character, predicts only 10^46 distinct tweets. 10^46 is a big number, but 10^104 is a much, much bigger number, bigger than 10^46 squared, in fact.

If we impose some sort of grammatical constraints, we might assume that not every word can follow every other word and still make sense. Now, one can argue that the constraint of “making sense” is a weak one in the specific context of Twitter (see, e.g., Horse ebooks), so this will be quite a conservative correction. Let’s say the first word can be any of the 170,000, and each of the following zero to nineteen words is constrained to 20% of the total (34,000). This gives us 2 x 10^91 possible tweets.
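
Here is the arithmetic for both word-level estimates: the unconstrained one from the previous paragraph and the constrained one above. The 170,000-word vocabulary, the twenty word slots, and the 20% constraint are just the rough figures used in the text.

```python
vocab = 170_000       # rough number of OED entries
slots = 20            # 140 characters / ~7 characters per word

# Any word in any slot:
unconstrained = vocab ** slots
print(f"{unconstrained:.1e}")  # about 4e+104

# Crude grammatical constraint: any first word, then each remaining slot
# restricted to 20% of the vocabulary (34,000 words).
constrained = vocab * (vocab // 5) ** (slots - 1)
print(f"{constrained:.1e}")    # about 2e+91
```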

That’s less than 10^46 squared, but just barely.

10^91 is 100 billion times the estimated number of atoms in the observable universe.

By comparison, 10^46 is teeny tiny. 10^46 is only one ten-thousandth of the number of atoms in the Earth.

In fact, for random sequences of six-letter words (seven characters including the trailing space) to total only 10^46 tweets, we would have to restrict ourselves to a vocabulary of just 200 words.

So, while 10^46 is a big number, large even in comparison to the expected waiting time for a Cubs World Series win, it actually pales in comparison to the combinatorial potential of Twitter.

One final example. Consider the opening of Endymion by John Keats: “A thing of beauty is a joy for ever: / Its loveliness increases; it will never / Pass into nothingness;” 18 words, 103 characters. Preserving this sentence structure, imagine swapping out various words, Mad-Libs style, introducing alternative nouns for thing, beauty, loveliness, and nothingness, alternative verbs for is, increases, and will / pass, alternative prepositions for of and into, and alternative adverbs for for ever and never.

Given 10000 nouns, 100 prepositions, 10000 verbs, and 1000 adverbs, we can construct 10^38 different tweets without even altering the grammatical structure. Tweets like “A jar of butter eats a button quickly: / Its perspicacity eludes; it can easily / swim through Babylon;”

That’s without using any adjectives. Add three adjective slots, with a panel of 1000 adjectives, and you get to 10^47 — just riffing on Endymion.

So tweet on, my friends.

Tweet on.

C. E. Shannon (1951). Prediction and Entropy of Printed English. Bell System Technical Journal, 30, 50–64.

This seems like a weird way to fix peer review

So, it is common to hear scientists complain about peer review, about how it is “broken,” and there is probably something to that. Over at Backreaction, a blog by theoretical physicists at The Economist, Sabine Hossenfelder argues that the future of peer review, one that will fix its problems, is already here, in the form of what she calls “pre-print peer review.”

The idea is to separate the peer review process from the journals, and attach it to the manuscript. So, if I write a manuscript, I would send it out, for a fee, to a peer review service, which might be run by a publishing company, or by some other entity. According to Hossenfelder, once you got back the review,

This report you could then use together with submission of your paper to a journal, but you could also use it with open access databases. You could even use it in company with your grant proposals if that seems suitable.

Okay, so maybe Hossenfelder has a very different perception of what is wrong with peer review than I do. If your ultimate goal is to submit the manuscript for traditional publication, this seems problematic and, ultimately, unsustainable.

Just think for a moment about the dynamics and market pressures. First of all, if authors have control over the reviews that they purchase, one might expect that they will only attach these reviews to their papers when those reviews are positive. Furthermore, if there are multiple peer-review services, the market pressures would presumably drive them all towards more and more positive reviews. Basically, it sets up a system that will be unraveled by “review inflation.” Thinking as a journal editor or grant reviewer, I suspect that I would quickly become very skeptical of these reviews. And I certainly would not be willing to substitute their recommendations for my own judgment and the opinions of referees I selected.

You can imagine ways to address this problem. For instance, certain peer-review services could build reputations as tough reviewers, so that their “seal of approval” meant more. At this point, however, you’ve merely layered on another set of reputations and rankings that must be kept track of. While this approach is billed as a way to simplify the peer review process and make it cheaper and more efficient, I have difficulty imagining that it would not do just the opposite.

Hossenfelder argues that this new model of peer review is not just desirable, but inevitable:

irrespective of what you think about this, it’s going to happen. You just have to extrapolate the present situation: There is a lot of anger among scientists about publishers who charge high subscription fees. And while I know some tenured people who simply don’t bother with journal publication any more and just upload their papers to the arXiv, most scientists need the approval stamp that a journal publication presently provides: it shows that peer review has taken place. The easiest way to break this dependence on journals is to offer peer review by other means. This will make the peer review process more to the point and more effective.

First, in what way does this have anything to do with high subscription fees? Most open access journals have pretty much the same peer-review structure that subscription journals have. There are legitimate problems with the current dominance of scientific publishing by for-profit corporations that use free labor to evaluate publicly funded science, and then turn around and charge people a lot of money to access that science. However, given the expanding number of high-quality open-access journals that use the traditional peer review system, it seems like peer review is orthogonal to this issue.

Second, yes, there are many people who feel that they need the peer-review stamp of approval. The potential benefit here is that an author could pay for peer review and then post their work on the arXiv, thereby circumventing journals altogether, and allowing more junior researchers to pursue this publishing model. It just seems to me that an author-funded system that is so easily gamed is unlikely to provide any real sense of legitimacy to anyone with this specific concern.

Third, when she says that this will make the process “more to the point and more effective,” I honestly can’t imagine what mechanism she has in mind. Given that it is published in The Economist, my suspicion is that this claim is based on some sort of invisible hand argument — that if we just free peer review from its shackles, it will become efficient and beautiful. But maybe that’s unfair on my part.

The post goes on to point to two outfits that are already working to implement this model: Peerage of Science (which is up and running) and Rubriq (which is getting started). Rubriq seems focused on the author-pay model, creating a standard review format that could travel from journal to journal. Peerage provides reviews free to authors, and is paid by journals when they use a review and then publish a paper. I’ve not seen anything that addresses the problem of review inflation.

I don’t know. Maybe there’s something I’m missing here. What do you guys think?

Two more from Fisher and Haldane

So, previously I introduced you to Darwin Eats Cake’s two newest characters, R. A. Fisher’s Pipe and J. B. S. Haldane’s Mustache. Well, the comedy duo have provided two more installments of their series, tentatively entitled, “Stuff Sitting in Jars on a Shelf, Talking.”

I would not necessarily have predicted this, but as it turns out, Fisher’s Pipe has a really juvenile sense of humor.

It’s sort of sad, really.

Best URL for sharing: http://www.darwineatscake.com/?id=151
Permanent image URL for hotlinking or embedding: http://www.darwineatscake.com/img/comic/151.png
Best URL for sharing: http://www.darwineatscake.com/?id=152
Permanent image URL for hotlinking or embedding: http://www.darwineatscake.com/img/comic/152.png

Epigenetics and Homosexuality

So, last week featured a lot of news about a paper that came out in the Quarterly Review of Biology titled “Homosexuality as a Consequence of Epigenetically Canalized Sexual Development.” The authors were Bill Rice (UCSB), Urban Friberg (Uppsala U), and Sergey Gavrilets (U Tennessee). The paper got quite a bit of press. Unfortunately, most of that press was of pretty poor quality, badly misrepresenting the actual contents of the paper. (PDF available here.)

I’m going to walk through the paper’s argument, but if you don’t want to read the whole thing, here’s the tl;dr:

This paper presents a model. It is a theory paper. Any journalist who writes that the paper “shows” that homosexuality is caused by epigenetic inheritance from the opposite sex parent either 1) is invoking a very non-standard usage of the word “shows,” or 2) was too lazy to read the actual paper, and based their report on the press release put out by the National Institute for Mathematical and Biological Synthesis.

That’s not to say that this is a bad paper. In fact, it’s a very good paper. The authors integrate a lot of different information to come up with a plausible biological mechanism for epigenetic modifications to exert influence on sexual preference. They demonstrate that such a mechanism could be favored by natural selection under what seem to be biologically realistic conditions. Most importantly, they formulate their model into clear predictions that can be empirically tested.

But those empirical tests have not been carried out yet. And, in biology, when we say that a paper shows that X causes Y, we generally mean that we have found an empirical correlation between X and Y, and that we have a mechanistic model that is well enough supported that we can infer causation from that correlation. This paper does not even show a correlation. It shows that it would probably be worth someone’s time to look for a particular correlation.

As a friend wrote to me in an e-mail,

I found it a much more interesting read than I thought I would from the press it’s getting, which now rivals the bullshit surrounding the ENCODE project for the most bullshitty bullshit spin of biology for the popular press. A long-winded-but-moderately-well-grounded-in-real-biology mathematical model does not proof make.

Exactly.

Okay, now the long version.

The Problem of Homosexuality

The first thing to remember is that when an evolutionary biologist talks about the “problem of homosexuality,” this does not imply that homosexuality is problematic. All it is saying is that a straightforward, naive application of evolutionary thinking would lead one to predict that homosexuality would not exist, or would be vanishingly rare. The fact that it does exist, and at appreciable frequency, poses a problem for the theory.

In fact, this is a good thing to keep in mind in general. The primary goal of evolutionary biology is to understand how things in the world came to be the way they are. If there is a disconnect between theory and the world, it is ALWAYS the theory that is wrong. (Actually, this is equally true for any science / social science.)

Simply put, heterosexual sex leads to children in a way that homosexual sex does not. So, all else being equal, people who are more attracted to the opposite sex will have more offspring than will people who are less attracted to the opposite sex.

[For rhetorical simplicity, I will refer specifically to “homosexuality” here, although the arguments described in the paper and in this post are intended to apply to the full spectrum of sexual orientation, and assume more of a Kinsey-scale type of continuum.]

The fact that a substantial fraction of people seem not at all to be attracted to the opposite sex suggests that all else is not equal.

Evolutionary explanations for homosexuality are basically efforts to discover what that “all else” is, and why it is not equal.

There are two broad classes of possible explanation.

One possibility is that there is no biological variation in the population for a predisposition towards homosexuality. Then, there would be nothing for selection to act on. Maybe the human brain simply has an inherent and uniform disposition with respect to sexual attraction. Variation in sexual preference would then be the result of environmental (including cultural) factors and/or random developmental variation.

This first class of explanation seems unlikely because there is, in fact, a substantial heritability to sexual orientation. For example, among identical twins who were raised separately, if one twin is gay, there is a 20% chance that the other will be as well.

Evidence suggests that sexual orientation has a substantial heritable component. Image: Comic Blasphemy.

This points us towards the second class of explanation, which assumes that there is some sort of heritable genetic variation that influences sexual orientation. Given the presumably substantial reduction in reproductive output associated with a same-sex preference, these explanations typically aim to identify some direct or indirect benefit somehow associated with homosexuality that compensates for the reduced reproductive output.

One popular variant is the idea that homosexuals somehow increase the reproductive output of their siblings (e.g., by helping to raise their children). Another is that homosexuality represents a deleterious side effect of selection for something else that is beneficial, like how getting one copy of the sickle-cell hemoglobin allele protects you from malaria, but getting two copies gives you sickle-cell anemia.

It was some variant of this sort of idea that drove much of the research searching for “the gay gene” over the past couple of decades. The thing is, though, those searches have failed to come up with any reproducible candidate genes. This suggests that there must be something more complicated going on.

The Testosterone Epigenetic Canalization Theory

Sex-specific development depends on fetal exposure to androgens, like testosterone and related compounds. This is simply illustrated by Figure 1A of the paper:

Figure 1A from the paper: a simplified picture of the “classical” view of sex differentiation. T represents testosterone, and E represents estrogen.

SRY is the critical genetic element on the Y chromosome that triggers the fetus to go down the male developmental pathway, rather than the default female developmental pathway. They note that in the classical model of sex differentiation, androgen levels differ substantially between male and female fetuses.

The problem with the classical view, they rightly argue, is that androgen levels are not sufficient in and of themselves to account for sex differentiation. In fact, there is some overlap in androgen levels between XX and XY fetuses. Yet, in the vast majority of cases, the XX fetuses with the highest androgen levels develop normal female genitalia, while the XY fetuses with the lowest androgen levels develop normal male genitalia. Thus, there must be at least one more piece of the puzzle.

The key, they argue, is that tissues in XX and XY fetuses also show differential response to androgens. So, XX fetuses become female because they have lower androgen levels and they respond only weakly to those androgens. XY fetuses become male because they have higher androgen levels and they respond more strongly to those androgens.

This is illustrated in their Figure 1B:

Sex-specific development is thus canalized by some sort of mechanism that they refer to generically as “epi-marks.” That is, they imagine that there must be some epigenetic differences between XX and XY fetuses that encode differential sensitivity to testosterone.

All of this seems well reasoned, and is supported by their review of a number of studies. It is worth noting, however, that we don’t, at the moment, know exactly which sex-specific epigenetic modifications these would be. One could come up with a reasonable list of candidate genes and look for differential marks (such as DNA methylation or various histone modifications) in the vicinity of those genes. However, this forms part of the not-yet-done empirical work required to test this hypothesis, or, in the journalistic vernacular, to “show” that this happens.

Leaky Epigenetics and Sex-Discordant Traits

Assume for the moment that there exist various epigenetic marks that 1) differ between XX and XY fetuses and 2) modulate androgen sensitivity. These marks would need to be established at some point early in development, perhaps concurrent with the massive, genome-wide epigenetic reprogramming that occurs shortly after fertilization.

The theory formulated in the paper relies on two additional suppositions, both of which can be tested empirically (but, to reiterate, have not yet been).

The first supposition is that there are many of these canalizing epigenetic marks, and that they vary with respect to which sex-typical traits they canalize. So, some epigenetic marks would canalize gonad development. Other marks would canalize sexual orientation. (Others, they note, might canalize other traits, like gender identity, but this is not a critical part of the argument.)

The model presented in this paper suggests that various traits that are associated with sex differences may be controlled by distinct genetic elements, and that sex-typical expression of those traits may rely on epigenetic modifications of those genes. Image: Mikhaela.net.

The second supposition is that the epigenetic reprogramming of these marks that normally happens every generation is somewhat leaky.

There are two large-scale rounds of epigenetic reprogramming that happen every generation. One occurs during gametogenesis (the production of eggs or sperm). The second happens shortly after fertilization. What we would expect is that any sex-specific epigenetic marks would be removed during one of these phases (although it could happen at other times).

For example, a gene in a male might have male-typical epigenetic marks. But what happens if that male has a daughter? Well, normally, those marks would be removed during one of the reprogramming phases, and then female-typical epigenetic marks would be established at the site early in his daughter’s development.

The idea here is that sometimes this reprogramming does not happen. So, maybe the daughter inherits an allele with male-typical epigenetic marks. If the gene influences sexual orientation by modulating androgen sensitivity, then maybe the daughter develops the (male-typical) sexual preference for females. Similarly, a mother might pass on female-typical epigenetic marks to her son, and these might lead to his developing a (female-typical) sexual preference for males.

So, basically, in this model, homosexuality is a side effect of the epigenetic canalization of sex differences. Homosexuality itself is assumed to impose a fitness cost, but this cost is outweighed by the benefit of locking in sexual preference in those cases where reprogramming is successful (or unnecessary).
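Just to make that cost/benefit logic concrete, here is a toy sketch of my own. To be clear, this is not the model from the paper, and every number in it is a made-up placeholder; it just illustrates the idea that an epi-mark allele can spread so long as the canalization benefit to the carrier outweighs the (rare) cost of a leaked mark in an opposite-sex offspring.

```python
# Toy illustration only -- not the model from Rice et al. (2012).
# An epi-mark allele gives its carrier a canalization benefit b.
# With probability `leak`, the mark escapes erasure in an offspring,
# and roughly half of those offspring are of the opposite sex and
# pay a fitness cost c. All parameter values below are made up.

def net_selective_advantage(b, c, leak):
    """Rough per-generation net advantage of the epi-mark allele."""
    return b - 0.5 * leak * c

for leak in (0.01, 0.1, 0.3, 0.5):
    s = net_selective_advantage(b=0.05, c=0.5, leak=leak)
    verdict = "spreads" if s > 0 else "selected against"
    print(f"leak = {leak:.2f}: net advantage = {s:+.3f} ({verdict})")
```

The point is just that, for small leakage rates, the benefit side of the ledger wins easily; how big the leakage rate would actually have to be is taken up below.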

Sociological Concerns

Okay, if you ever took a gender-studies class, or anything like that, this study may be raising a red flag for you. After all, the model here is basically that some men are super manly, and sometimes their manliness leaks over into their daughters. This masculinizes them, which makes them lesbians. Likewise, gay men are gay because they were feminized by their mothers.

That might sound a bit fishy, like it is invoking stereotype-based reasoning, but I don’t think that would be a fair criticism. Nor do I think it raises any substantial concerns about the paper in terms of its methodology or its interpretation. (Of course, I could be wrong. If you have specific concerns, I would love to hear about them in the comments.) The whole idea behind the paper is to treat chromosomal sex, gonadal sex, and sexual orientation as separate traits that are empirically highly (but not perfectly) correlated. The aim is to understand the magnitude and nature of that empirical correlation.

The other issue that this raises is the possibility of determining the sexual orientation of your children, either by selecting gametes based on their epigenetics, or by reprogramming the epigenetic state of gametes or early embryos. This technology does not exist at the moment, but it is not unreasonable to imagine that it might exist within a generation. If this model is correct in its strongest form (in that the proposed mechanism fully accounts for variation in sexual preference), you could effectively choose the sexual orientation of each of your children.

Image: Brainless Tales.

This, of course, is not a criticism of the paper. The biology is what it is. It does raise certain ethical questions that we will have to grapple with at some point. (Programming of sexual orientation will be the subject of the next installment of the Genetical Book Review.)

Plausibility/Testability Check

The question one wants to ask of a paper like this is whether it is 1) biologically plausible, and 2) empirically testable. Basically, my read is yes and yes. The case for the existence of mechanisms of epigenetic canalization of sex differentiation seems quite strong. We know that some epigenetic marks seem to propagate across generations, evading the broad epigenetic reprogramming. We don’t understand this escape very well at the moment, but the assumptions here are certainly consistent with the current state of our knowledge. And, assuming some rate of escape, the model seems to work for plausible-sounding parameter values.

Testing is actually pretty straightforward (conceptually, if not technically). Ideally, empirical studies would look for sex-specific epigenetic modifications, and for variation in these modifications that correlate with variation in sexual preference. The authors note that one test that could be done in the short term would be to do comparative epigenetic profiling of the sperm of men with and without homosexual daughters.
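To give a sense of what that last test might look like in practice, here is a hypothetical sketch (mine, not the authors’). The candidate locus, the sample sizes, and the data are all invented; the idea is simply to compare methylation at a putative androgen-sensitivity locus in sperm from the two groups of men.

```python
# Hypothetical sketch of the kind of comparison such a study might run.
# The candidate locus, the sample sizes, and the data are all invented.

import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Fraction of methylated reads at a candidate androgen-sensitivity locus,
# one value per sperm donor (simulated placeholder data).
fathers_of_homosexual_daughters = rng.beta(6, 4, size=20)
control_fathers = rng.beta(5, 5, size=20)

stat, p = mannwhitneyu(fathers_of_homosexual_daughters, control_fathers,
                       alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3f}")
```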

As Opposed to What?

The conclusions reached by models in evolution are most strongly shaped by the set of alternatives that are considered in the model. That is, a model might find that a particular trait will be selectively favored, but this always needs to be interpreted in the context of that set of alternatives. Most importantly, one needs to ask if there are likely to be other evolutionarily accessible traits that have been excluded from the model, but would have changed the conclusions of the model if they had been included.

The big question here is the inherent leakiness of epigenetic reprogramming. A back-of-the-envelope calculation in the paper suggests that for this model to fully explain the occurrence of homosexuality (with a single gene controlling sexual preference), the rate of leakage would have to be quite high.
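To see why, here is a crude illustration of my own, not the calculation from the paper: under a single-locus, fully penetrant version of the model, the population frequency of homosexuality is roughly the carrier frequency of the relevant epi-mark times the leakage rate, so you can solve for the leakage rate needed to account for an assumed prevalence. The prevalence and carrier frequencies below are placeholders.

```python
# Crude illustration only -- not the back-of-the-envelope calculation
# from the paper. Under a single-locus, fully penetrant version of the
# model, prevalence ~ carrier_freq * leak, so the leakage rate required
# to account for a given prevalence is prevalence / (carrier_freq * penetrance).
# The prevalence and carrier frequencies below are placeholders.

def required_leak(prevalence, carrier_freq, penetrance=1.0):
    """Leakage rate needed for leaked epi-marks to account, on their own,
    for the assumed prevalence of homosexuality."""
    return prevalence / (carrier_freq * penetrance)

for carrier_freq in (1.0, 0.5, 0.25):
    leak = required_leak(prevalence=0.05, carrier_freq=carrier_freq)
    print(f"carrier frequency {carrier_freq:.2f}: required leakage rate = {leak:.2f}")
```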

An apparent implication of the model is that there would then be strong selection to reduce the rate at which these epigenetic marks are passed from one generation to the next. In order for the model to work in its present form, there would need to be something preventing natural selection from finding this solution.

Possibilities for this something include some sort of mechanistic constraint (it’s just hard to build something that reprograms more efficiently than what we have) or some sort of time constraint (evolution has not had enough time to fix this). The authors seem to favor this second possibility, as they argue that the basis of sexual orientation in humans may be quite different from that in our closest relatives.

On the other hand, this mechanism could still form part of the explanation for homosexuality with much lower leakage rates.

What Happened with the Press?

So, how do we go from what was a really good paper to a slew of really bad articles? Well, I suspect that the culprit was this paragraph from the press release from NIMBioS:

The study solves the evolutionary riddle of homosexuality, finding that “sexually antagonistic” epi-marks, which normally protect parents from natural variation in sex hormone levels during fetal development, sometimes carryover across generations and cause homosexuality in opposite-sex offspring. The mathematical modeling demonstrates that genes coding for these epi-marks can easily spread in the population because they always increase the fitness of the parent but only rarely escape erasure and reduce fitness in offspring.

If you know that this is a pure theory paper, this is maybe not misleading. Maybe. But phrases like “solves the evolutionary riddle of homosexuality” and “finding that . . . epi-marks . . . cause homosexuality in opposite-sex offspring,” when interpreted in the standard way that I think an English speaker would interpret them, pretty strongly imply things about the paper that are just not true.

Rice, W., Friberg, U., & Gavrilets, S. (2012). Homosexuality as a consequence of epigenetically canalized sexual development. The Quarterly Review of Biology, 87(4), 343–368. DOI: 10.1086/668167

Update: Also see this excellent post on the subject by Jeremy Yoder over at Nothing in Biology Makes Sense.

2012 Gift Guide for Population Geneticists

So, it’s that time of year again, when you have to come up with gift ideas for the population geneticist in your life. Personally, I like cash, but if you insist on coming up with personalized gifts, here are some ideas for you:

1. Mathematical Population Genetics, by Warren Ewens

This book was originally published in 1979. When I was in grad school, it had been out of print for years. People would pass around xeroxed copies that had been made from other xeroxed copies.

Finally, a couple of years ago, the second edition came out. So now the population geneticist in your life can own their very own book-shaped copy.

Of course, it’s a little bit pricey. Fortunately, there are plenty of other gifts on this list for the folks about whom you don’t care enough to buy this book. 🙁

2. The Gospel of the Flying Spaghetti Monster, by Bobby Henderson

Okay, cheapskate, maybe this is a little bit more your speed. This is the perfect gift for the pastafarian population geneticist.

Or it could be a good evangelical gift for those who have not yet been touched by his noodly appendage.

And look, it comes with one of those little ribbon things that means you don’t have to use your wadded up Starbucks receipt as a bookmark!

3. Gene Pool Shirt

Get it?

It’s a jean shirt!

With a pool ball on it!

Great conversation starter!

Also comes in Flaming 8-Ball!

4. Obnoxious Car Decals

There are a number of different aggressively obnoxious things that you can get for your car, like a T-Rex eating a Jesus fish. But if your goal in life is to get your headlights smashed by some nice religious folk, nothing will beat this “Procreation Car Emblem.”

If you’re in the mood for something a little more subtle, there are some good options in the “Customers who bought this item also bought” section.

5. Remarkable, by Lizzie Foley

Okay, okay, I know what you’re thinking. That this is shameless promotion of my wife’s book, and has nothing to do with population genetics.

Yes, fine, it’s shameless, but it’s a great book, perfect for the population geneticist with one or more F1s at home (ages 8 and up!). And it does feature a cameo appearance by population geneticist and UCLA Professor John Novembre. For reals!

Also, the story features boy and girl identical twins. So, analyze that.

6. DNA Earrings

What’s that?

I can’t hear you.

I’ve got DNA in my ear.

7. DNA Portraits

Okay, check this out. You send in a swab of DNA, and $199, and they’ll send you a giant picture of a gel, which I guess is supposed to be some fraction of your genome? Maybe? It looks like there are supposed to be eight sample lanes, and it’s that old-school sequencing analysis where each dideoxynucleotide terminator gets its own lane. So this might be about forty bases of sequence. Maybe?

To be honest, though, this looks a lot more like a protein gel to me. Maybe they use your DNA, clone a little tiny homunculus of you, grind it up, trypsin digest it, and this is that gel.

If that wasn’t bad enough, you also have the option of getting your DNA made into a giant QR code poster (that no one will ever scan).

For the money, I’d go with two copies of the Ewens book.

8. Personalized Genetic Analysis

The classic here is 23 and Me.

Okay, maybe you’re thinking, no, a real population geneticist would not want one of these goofy personalized genetic analysis things. Those are for amateurs, mere heredity enthusiasts. Will my population geneticist friend be offended by the ridiculous pinpointing of their Y-chromosome and mitochondrial ancestry, or the ridiculous breakdown of racial composition, or the ridiculous risk-factor analysis?

Well, that’s the beauty of this gift. If they are the wild-eyed, naive sort of population geneticist, they’re just going to be so gosh-darned excited to get all that cool information. If they’re the bitter, cynical sort of population geneticist (most of them, in my experience), you’ll be giving them the gift of feeling knowledgeable and superior!

If you want to surprise them, order the kit and swab their cheek while they’re sleeping.

If you really want to surprise them, order a second kit, swab a random guy, get the results, and claim that the results are from their father.

9. Darwin Eats Cake Stuff

Yeah, you thought plugging my wife’s book was shameless? I’ll show you shameless! Check out these new items from the official Darwin Eats Cake store:

Look! It’s a mug illustrating the academic funding cycle: papers->money->caffeine->papers.
Also works for non-population-geneticist academic types.

Look! It’s a trucker hat featuring Guillaume the Adaptationist Goat’s credo!

Look! It’s a t-shirt featuring J. B. S. Haldane’s Mustache in a jar!

Don’t see anything you like? You can check out the comics and contact the “artist” here to submit special requests.

10. Ronald Reagan Riding a Velociraptor with a Machine Gun

Okay, so this one really has nothing to do with population genetics, but it is 100% pure awesome.

Prints available in 11×17 or 24×36 from SharpWriter at deviantART.

Other ideas? Leave them in the comments.