What a Remarkable Paperback!!

So, guess what came in the mail yesterday? That’s right! It’s the paperback edition of Remarkable, by Lizzie K. Foley. The hardback came out last April under the Dial imprint of Penguin. The paperback is through Puffin (also part of Penguin), and has a completely new cover. Here’s a stack of them:

Foreground: The nineteen best books ever written. Background: Our new kitchen wall color.

And here’s a close-up of the cover, so that you can really see the awesome cover art by Fernando Juarez, which has a bit of a Dr. Seuss-ey vibe:

Pictured: Jane, The Pirate Ship Mozart Kugeln, Lucky the Lake Monster, The Mansion at the Top of Remarkable Hill, the Bell Tower (under construction). Not pictured: the nefariously identical Grimlet Twins, Melissa and Eddie, Remarkable’s School for the Remarkably Gifted, Ebb, Jeb, Flotsam, Madame Gladiola, Penelope Hope Adelaide Catalina, Anderson Brigby Bright Doe III, Lucinda Wilhelmina Hinojosa, Mad Captain Penzing the Horrific, and more.

That means that, yes, you can now get this excellent book in paperback form, which is both more affordable and more bendable than the original!

Should you buy it? Yes! Why? Let me tell you!

Here are the pull quotes from just a few of the positive reviews Remarkable has received:

From the New York Times:

A lot of outlandish entertainment.

From Booklist:

A remarkable middle-grade gem.

From Kirkus Reviews:

A rich, unforgettable story that’s quite simply — amazing.

The story centers on the town of Remarkable, where all of the residents are gifted, talented, and extraordinary. Everyone in the town is a world-class musician, or writer, or architect . . .

Except for Jane.

In fact, she is the only student in the entire town who attends the public school, rather than Remarkable’s School for the Remarkably Gifted. But everything changes when the Grimlet Twins join her class and pirates arrive in town. Plus, there’s a weather machine, a psychic pizza lady, a shy lake monster, and dentistry.

The book is both funny and thoughtful. You can enjoy it as a goofy adventure full of wacky characters and wordplay. It’s for ages eight and up, but if you’re a grown-up who likes kids’ books at all, you’ll find that there is a lot here to engage the adult reader.

Speaking of which, you can also read it as a subversive commentary on a culture that pushes children towards excellence rather than kindness and happiness. As Jane’s Grandpa John says near the end of the book:

The world is a wonderfully rich place, especially when you aren’t trapped by thinking that you’re only as worthwhile as your best attribute. . . . It’s the problem with Remarkable, you know. . . . Everyone is so busy being talented, or special, or gifted, or wonderful at something that sometimes they forget to be happy.

Now, I know, you’re thinking to yourself that you should take my endorsement with a grain of salt. After all, Lizzie Foley is my wife, and I can’t be trusted to provide an honest, unbiased assessment of her book . . .

Or can I?

I’m gonna give you some straight talk on correlation versus causation. You might assume that I like this book because I’m married to the person who wrote it. You could not be more wrong. In fact, if I did not know Lizzie Foley, and I read this book, I would track her down and marry her.

So, yes, you should run out right now and get yourself a copy of this book. You should give it to your ten-year-old, or you should read it with your eight-year-old, or you should just curl up with it yourself. Just remember, she’s already married. I’m looking at you, Ryan Gosling!

How Many English Tweets are Actually Possible?

So, recently (last week, maybe?), Randall Munroe, of xkcd fame, posted an answer to the question “How many unique English tweets are possible?” as part of his excellent “What If” series. He starts off by noting that there are 27 characters (26 letters plus the space), and a tweet length of 140 characters. This gives you $27^{140}$, or about $10^{200}$, possible strings.

Of course, most of these are not sensible English statements, and he goes on to estimate how many of these there are. This analysis is based on Shannon’s estimate of the entropy rate for English, about 1.1 bits per letter. This leads to a revised estimate of $2^{140 \times 1.1}$ English tweets, or about $2 \times 10^{46}$. The rest of the post explains just what a hugely big number that is: it’s a very, very big number.
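If you want to check that arithmetic yourself, here is a minimal Python sketch. The constant names are mine, and the only inputs are the three figures quoted above (27 characters, 140 positions, 1.1 bits per character). It works in base-10 logarithms so the results read directly as powers of ten.

```python
import math

ALPHABET_SIZE = 27   # 26 letters plus the space
TWEET_LENGTH = 140   # characters per tweet
ENTROPY_RATE = 1.1   # bits per character, the figure quoted above

# Work in log10 so the results read directly as powers of ten.
log10_all_strings = TWEET_LENGTH * math.log10(ALPHABET_SIZE)
log10_entropy_estimate = TWEET_LENGTH * ENTROPY_RATE * math.log10(2)

print(f"all 140-character strings: about 10^{log10_all_strings:.1f}")
print(f"entropy-rate estimate:     about 10^{log10_entropy_estimate:.1f}")
```

The two exponents come out a little above 200 and a little above 46, which matches the figures in the xkcd piece.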

The problem is that this number is also wrong.

It’s not that the calculations are wrong. It’s that the entropy rate is the wrong basis for the calculation.

Let’s start with what the entropy rate is. Basically, given a sequence of characters, how easy is it to predict what the next character will be? Or, how much information (in bits) is given by the next character above and beyond the information you already had?

If the probability of a character being the $i$th letter in the alphabet is $p_i$, the entropy of the next character is given by

$$H = -\sum_i p_i \log_2 p_i$$

If all characters (26 letters plus the space) were equally likely, the entropy of the character would be $\log_2 27$, or about 4.75 bits. If some letters are more likely than others (as they are), it will be less. According to Shannon’s original paper, the distribution of letter usage in English gives about 4.14 bits per character. (Note: Shannon’s analysis excluded spaces.)
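Here is a small sketch of that entropy calculation in Python. The function name `entropy_bits` is mine, and the skewed distribution is completely made up (it is not English letter frequencies); it is only there to show that unequal probabilities pull the entropy below the uniform case.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 27 equally likely characters: 26 letters plus the space.
uniform = [1 / 27] * 27

# Hypothetical skewed distribution: one character takes half the
# probability, and the other 26 split the rest evenly.
skewed = [0.5] + [0.5 / 26] * 26

print(f"uniform: {entropy_bits(uniform):.2f} bits")
print(f"skewed:  {entropy_bits(skewed):.2f} bits")
```

The uniform case reproduces the $\log_2 27$ figure, and the skewed case comes out lower, as it must.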

But, if you condition the probabilities on the preceding character, the entropy goes down. For example, if we know that the preceding character is a b, there are many letters that might follow, but the probability that the next character is a c or a z is less than it otherwise might have been, and the probability that the next character is a vowel goes up. If the preceding letter is a q, it is almost certain that the next character will be a u, and the entropy of that character will be low, close to zero, in fact.

When we go to three characters, the marginal entropy of the third character will go down further still. For example, t can be followed by a lot of letters, including another t. But, once you have two ts in a row, the next letter almost certainly won’t be another t.

So, the more characters in the past you condition on, the more constrained the next character is. If I give you the sequence “The quick brown fox jumps over the lazy do_,” it is possible that what follows is “cent at the Natural History Museum,” but it is much more likely that the next letter is actually “g” (even without invoking the additional constraint that the phrase is a pangram). The idea is that, as you condition on longer and longer sequences, the marginal entropy of the next character asymptotically approaches some value, which has been estimated in various ways by various people at various times. Many of those estimates are in the ballpark of the 1.1 bits per character estimate that gives you $10^{46}$ tweets.
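To make the conditioning idea concrete, here is a rough Python sketch that compares the entropy of a character on its own with the entropy of a character given the one before it. Everything here is illustrative: the function names are mine, and the sample string is made up and far too short to say anything about real English. Actual estimates condition on much longer contexts and use much larger bodies of text.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Entropy, in bits, of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy_bits(text):
    """Entropy of a character given the one before it, from bigram counts."""
    pair_counts = Counter(zip(text, text[1:]))
    prev_counts = Counter(text[:-1])
    total_pairs = sum(pair_counts.values())
    return -sum(
        (c / total_pairs) * math.log2(c / prev_counts[prev])
        for (prev, _), c in pair_counts.items()
    )

# Made-up sample text; much too short for a real estimate.
sample = "the quick brown fox jumps over the lazy dog while the other dog naps"

marginal = entropy_bits(Counter(sample[1:]))      # next character, no context
conditional = conditional_entropy_bits(sample)    # next character, given the previous one

print(f"no context:         {marginal:.2f} bits")
print(f"given one character: {conditional:.2f} bits")
```

Conditioning can never raise the entropy, so the second number is always at most the first. With a sample this small the drop is exaggerated, since most character pairs appear only once.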

So what’s the problem?

The problem is that these entropy-rate measures are based on the relative frequencies of use and co-occurrence in some body of English-language text. The fact that some sequences of words occur more frequently than other, equally grammatical sequences of words, reduces the observed entropy rate. Thus, the entropy rate tells you something about the predictability of tweets drawn from natural English word sequences, but tells you less about the set of possible tweets.

That is, that $10^{46}$ number is actually better understood in terms of the likelihood that two random tweets are identical (roughly one in $10^{46}$), when both are drawn at random from 140-character sequences of natural English language. This will be the same as the number of possible tweets only if all possible tweets are equally likely.

Recall that the character following a q has very low entropy, since it is very likely to be a u. However, a quick check of Wikipedia’s “List of English words containing Q not followed by U” page reveals that the next character could also be space, a, d, e, f, h, i, r, s, or w. This gives you eleven different characters that could follow q. The entropy rate gives you something like the “effective number of characters that can follow q,” which is very close to one.
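Here is a sketch of that gap between “possible” and “effective” in Python. The probabilities are invented for illustration (I have simply assumed that u follows q 99% of the time and that the ten alternatives split the remaining 1% evenly); only the list of characters comes from the paragraph above.

```python
import math

# Made-up probabilities for the character after "q": "u" almost always,
# with the ten alternatives listed above splitting the remaining 1%.
alternatives = [" ", "a", "d", "e", "f", "h", "i", "r", "s", "w"]
after_q = {"u": 0.99}
after_q.update({ch: 0.01 / len(alternatives) for ch in alternatives})

entropy = -sum(p * math.log2(p) for p in after_q.values())

effective_count = 2 ** entropy   # what the entropy "sees"
possible_count = len(after_q)    # what can actually occur

print(f"possible characters after q:  {possible_count}")
print(f"effective characters after q: {effective_count:.2f}")
```

Under these assumed probabilities the effective count is only a little above one, while the count of possible characters stays at eleven no matter how rare the alternatives are.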

When we want to answer a question like “How many unique English tweets are possible?” we want to be thinking about the analog of the eleven number, not the analog of the very-close-to-one number.

So, what’s the answer then?

Well, one way to approach this would be to move up to the level of the word. The OED has something like 170,000 entries, not counting archaic forms. The average English word is 4.5 characters long (5.5 including the trailing space). Let’s be conservative, and say that a word takes up seven characters. This gives us up to twenty words to work with. If we assume that any sequence of English words works, we would have $4 \times 10^{104}$ possible tweets.

The xkcd calculation, based on an English entropy rate of 1.1 bits per character, predicts only $10^{46}$ distinct tweets. $10^{46}$ is a big number, but $10^{104}$ is a much, much bigger number, bigger than $10^{46}$ squared, in fact.

If we impose some sort of grammatical constraints, we might assume that not every word can follow every other word and still make sense. Now, one can argue that the constraint of “making sense” is a weak one in the specific context of Twitter (see, e.g., Horse ebooks), so this will be quite a conservative correction. Let’s say the first word can be any of the 170,000, and each of the following zero to nineteen words is constrained to 20% of the total (34,000). This gives us $2 \times 10^{91}$ possible tweets.
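For anyone who wants to poke at these word-level numbers, here is a minimal Python sketch. The constant names are mine, and the inputs are just the assumptions stated above: a 170,000-word vocabulary, up to twenty words per tweet, and (for the constrained case) each later word limited to 20% of the vocabulary.

```python
import math

VOCABULARY = 170_000          # rough dictionary size used above
MAX_WORDS = 20                # 140 characters at seven characters per word
FOLLOWERS = VOCABULARY // 5   # 20% of the vocabulary, i.e. 34,000

# Any sequence of one to twenty words, with no constraints at all.
unconstrained = sum(VOCABULARY ** n for n in range(1, MAX_WORDS + 1))

# First word free; each later word limited to 20% of the vocabulary.
constrained = sum(
    VOCABULARY * FOLLOWERS ** (n - 1) for n in range(1, MAX_WORDS + 1)
)

print(f"unconstrained: about 10^{math.log10(unconstrained):.1f}")
print(f"constrained:   about 10^{math.log10(constrained):.1f}")
```

The sums are dominated by the twenty-word tweets, so counting the shorter ones barely changes either exponent.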

That’s less than $10^{46}$ squared, but just barely.

$10^{91}$ is 100 billion times the estimated number of atoms in the observable universe.

By comparison, $10^{46}$ is teeny tiny. $10^{46}$ is only one ten-thousandth of the number of atoms in the Earth.

In fact, for random sequences of six-letter words (seven characters including the space) to total only $10^{46}$ tweets, we would have to restrict ourselves to a vocabulary of just 200 words.
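That 200-word figure is easy to check with a couple of lines of Python. The variable names are mine, and the sketch assumes exactly twenty words per tweet, as in the paragraph above.

```python
TARGET_LOG10 = 46       # we want vocabulary ** 20 to equal 10 ** 46
WORDS_PER_TWEET = 20    # 140 characters at seven characters per word

# Solve vocabulary ** 20 == 10 ** 46 for the vocabulary size.
vocabulary_needed = 10 ** (TARGET_LOG10 / WORDS_PER_TWEET)

print(f"vocabulary needed: about {vocabulary_needed:.0f} words")
```

The answer lands just under 200.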

So, while $10^{46}$ is a big number, large even in comparison to the expected waiting time for a Cubs World Series win, it actually pales in comparison to the combinatorial potential of Twitter.

One final example. Consider the opening of Endymion by John Keats: “A thing of beauty is a joy for ever: / Its loveliness increases; it will never / Pass into nothingness;” 18 words, 103 characters. Preserving this sentence structure, imagine swapping out various words, Mad-Libs style, introducing alternative nouns for thing, beauty, loveliness, nothingness; alternative verbs for is, increases, will / pass; alternative prepositions for of, into; and alternative adverbs for for ever and never.

Given 10,000 nouns, 100 prepositions, 10,000 verbs, and 1,000 adverbs, we can construct $10^{38}$ different tweets without even altering the grammatical structure. Tweets like “A jar of butter eats a button quickly: / Its perspicacity eludes; it can easily / swim through Babylon;”

That’s without using any adjectives. Add three adjective slots, with a panel of 1,000 adjectives, and you get to $10^{47}$, just riffing on Endymion.
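Here is the Mad-Libs count as a short Python sketch. The slot counts are my reading of the passage above (four nouns, two prepositions, three verbs, two adverbs, and three added adjectives); the word-list sizes are the ones given in the text.

```python
import math

# (number of slots, size of word list) for each part of speech.
# Slot counts are my reading of the swapped-out words listed above.
slots = {
    "noun": (4, 10_000),        # thing, beauty, loveliness, nothingness
    "preposition": (2, 100),    # of, into
    "verb": (3, 10_000),        # is, increases, will pass
    "adverb": (2, 1_000),       # for ever, never
}

def count_tweets(slot_table):
    """Number of distinct fillings when each slot is chosen independently."""
    return math.prod(size ** n for n, size in slot_table.values())

without_adjectives = count_tweets(slots)
with_adjectives = count_tweets({**slots, "adjective": (3, 1_000)})

print(f"without adjectives: 10^{len(str(without_adjectives)) - 1}")
print(f"with adjectives:    10^{len(str(with_adjectives)) - 1}")
```

Because every list size is a power of ten, the totals are exact powers of ten, and counting digits is enough to read off the exponent.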

So tweet on, my friends.

Tweet on.

C. E. Shannon (1951). Prediction and Entropy of Written English. Bell System Technical Journal, 30, 50-64.

This seems like a weird way to fix peer review

So, it is common to hear scientists complain about peer review, about how it is “broken,” and there is probably something to that. Over at Backreaction, a blog by theoretical physicists at The Economist, Sabine Hossenfelder argues that the future of peer review, one that will fix its problems, is already here, in the form of what she calls “pre-print peer review.”

The idea is to separate the peer review process from the journals, and attach it to the manuscript. So, if I write a manuscript, I would send it out, for a fee, to a peer review service, which might be run by a publishing company, or by some other entity. According to Hossenfelder, once you got back the review,

This report you could then use together with submission of your paper to a journal, but you could also use it with open access databases. You could even use it in company with your grant proposals if that seems suitable.

Okay, so maybe Hossenfelder has a very different perception of what is wrong with peer review than I do. If your ultimate goal is to submit the manuscript for traditional publication, this seems problematic and, ultimately, unsustainable.

Just think for a moment about the dynamics and market pressures. First of all, if authors have control over the reviews that they purchase, one might expect that they will only attach these reviews to their papers when those reviews are positive. Furthermore, if there are multiple peer-review services, the market pressures would presumably drive them all towards more and more positive reviews. Basically, it sets up a system that will be unraveled by “review inflation.” Thinking as a journal editor or grant reviewer, I suspect that I would quickly become very skeptical of these reviews. And I certainly would not be willing to substitute their recommendations for my own judgment and the opinions of referees I selected.

You can imagine ways to address this problem. For instance, certain peer-review services could build reputations as tough reviewers, so that their “seal of approval” meant more. At this point, however, you’ve merely layered on another set of reputations and rankings that must be kept track of. While this approach is billed as a way to simplify the peer review process and make it cheaper and more efficient, I have difficulty imagining that it would not do just the opposite.

Hossenfelder argues that this new model of peer review is not just desirable, but inevitable:

irrespective of what you think about this, it’s going to happen. You just have to extrapolate the present situation: There is a lot of anger among scientists about publishers who charge high subscription fees. And while I know some tenured people who simply don’t bother with journal publication any more and just upload their papers to the arXiv, most scientists need the approval stamp that a journal publication presently provides: it shows that peer review has taken place. The easiest way to break this dependence on journals is to offer peer review by other means. This will make the peer review process more to the point and more effective.

First, in what way does this have anything to do with high subscription fees? Most open access journals have pretty much the same peer-review structure that subscription journals have. There are legitimate problems with the current dominance of scientific publishing by for-profit corporations that use free labor to evaluate publicly funded science, and then turn around and charge people a lot of money to access that science. However, given the expanding number of high-quality open-access journals that use the traditional peer review system, it seems like peer review is orthogonal to this issue.

Second, yes, there are many people who feel that they need the peer-review stamp of approval. The potential benefit here is that an author could pay for peer review and then post their work on the arXiv, thereby circumventing journals altogether, and allowing more junior researchers to pursue this publishing model. It just seems to me that an author-funded system that is so easily gamed is unlikely to provide any real sense of legitimacy to anyone with this specific concern.

Third, when she says that this will make the process “more to the point and more effective,” I honestly can’t imagine what mechanism she has in mind. Given that it is published in The Economist, my suspicion is that this claim is based on some sort of invisible hand argument — that if we just free peer review from its shackles, it will become efficient and beautiful. But maybe that’s unfair on my part.

The post goes on to point to two outfits that are already working to implement this model: Peerage of Science (which is up and running) and Rubriq (which is getting started). Rubriq seems focused on the author-pay model, creating a standard review format that could travel from journal to journal. Peerage provides reviews free to authors, and is paid by journals when they use a review and then publish a paper. I’ve not seen anything that addresses the problem of review inflation.

I don’t know. Maybe there’s something I’m missing here. What do you guys think?

Free Tips for ex-Westboro Baptists Apologizing

So, nobody asked me for this advice, but if I only gave out advice when people asked for it, I would probably burst from all the advice building up inside me.

Today, Anderson Cooper apparently interviewed Libby Phelps Alvarez, granddaughter of Westboro Baptist founder Fred Phelps (via Gawker — I did not watch this). She was raised in the church, but fled / escaped / defected in 2009, and has recently started speaking publicly about her experience. Let me just say that she deserves a lot of respect for that. I mean, she had to reject her whole upbringing and family, which must be hard, even if your family is full of Phelpses.

Here’s the thing that pissed me off though. Her interview included the following statement of regret:

I do regret if I hurt people, because that was never my intention.

This is such a standard, cliché pseudo-apology that it is easy at first glance to overlook what an offensive pile of garbage this is. First of all, “if”? Really? Again, this is super common in these circumstances, but if you’ve spent most of your life holding up “God Hates Fags” signs at the funerals of soldiers and children, you know damn well that you hurt people.

Even worse, though, is the second bit. When some politician or celebrity pseudo-apologizes, saying it was never their intention to hurt anyone, it is often at least plausible that they were being careless, and not intentionally hurtful.

In this case though, hurting people is precisely the intention of every public appearance the Westboro Baptist Church makes. Now, maybe you could make the case that you thought you were practicing tough love, hurting people in a way that would lead them back to the path of righteousness, or some such nonsense. This would be bullshit, of course, but it would at least be plausible according to some sort of twisted logic.

The fact is, you did intend to hurt people. I believe that you wish now that you had not hurt people in the past, and that’s great. I believe that you were a kid, did not know better, and are not fully responsible for your actions, at least up to a point. I believe that you think of yourself as a good person, and I am eager to believe that you have become one. But when I see this sort of pseudo-apology, it makes me a little bit skeptical.

Maybe try something like this: “I know that I hurt a lot of people, and I am sorry. I understand now how hurtful my words and actions were in a way that I did not understand then.”

I feel bad about this. I mean, given where she started from, she has progressed further in the past few years than most people do in their lifetimes. But if you’re going to make amends publicly, a good way to start is by being honest.

Aaron Bady on MOOCs

So, since starting the Ronin Institute, I’ve been giving some thought to how one, as an independent scholar, can participate in teaching. After all, while some independent scholars are happy to be relieved of onerous teaching duties that keep them from their research, most actually like students, and would prefer to be involved in teaching to some extent.

One way to do this is through adjunct teaching at a local college or university. This is not necessarily appealing, though, since it typically pays terribly (for the number of hours you have to put in to do a good job), and it requires you to participate, at least passively, in undermining the traditional employment structure of the university. That is, as an adjunct, you’re basically a scab. (This may or may not be a negative, depending on your position on various things, but it’s not something that really appeals to me personally.)

The other way is through online courses. These are appealing to me in some ways. They are potentially more open, accessible, and democratic. They also feel as if they are more in keeping with the underlying mission of the Ronin Institute. After all, a part of the mission is to build a model of scholarship that is consistent with, well, life. We believe that it should be possible to function as a scholar while at the same time having family or other priorities that control where you live and when you work. Doesn’t that mean that we should be working to extend education to people for whom the constraints of the traditional college system do not work? At least part of me feels like maybe it does.

That leads us to the Next Big Thing™: the Massive Open Online Course (MOOC). This seems like an obvious path for the independent scholar. However, I’ve been hesitant about that path because I’m not convinced that anyone has yet figured out how to really make this work. I mean sure, you can record lectures, and you can assign problem sets, and you can even organize online video-chat discussions. But based on my personal experiences with online communications of various sorts, I have this suspicion that these courses, as they currently exist, are missing some critical element. Something that is hard to articulate, but is actually central to a genuine educational experience.

Anyway, that’s the context in which I read Aaron Bady’s new piece in The New Inquiry, where he articulates a number of things that I think are absolutely true, but which had previously existed in my own consciousness in a nebulous, impressionistic form. Go read the whole piece, but among the points he argues are:

  1. MOOCs are being offered as a solution to high student-teacher ratios. This is ironic, since they lead to massive increases in the student-teacher ratio (and a decrease in teacher accessibility).
  2. In California, at least, MOOCs are being used to privatize education, under the veneer of making education “more accessible.” He points out that for-pay MOOCs are not really “Open” in the way that is implied by the appropriation of the term.
  3. Good teaching involves attention and response to various paralinguistic cues from the students. It is not inconceivable that there could be online tools to facilitate this, but they certainly do not exist today. And certainly not when the primary product is a pre-recorded lecture.
  4. MOOCs will work best (perhaps only) for self-directed learners, who do not require the pressure and feedback provided by the in-person classroom setting. However, for those people, it is not clear that your typical MOOC provides added value over, say, access to Wikipedia.
  5. Even for the self-directed, a part of the college experience is learning how to interact and exchange ideas with others — debating and disagreeing in a respectful way: “If we take a process of socialization and make it a process of anti-socialization—if to be “at” college, you must be alone in front of a computer—we take the dynamic that creates the legendary poisonous atmosphere of “the comment thread” and use it to create adults.”

Anyway, I’d love to know what others think, especially if you’ve ever taught online classes.

Here’s your White History Month(s), Asshole

So, it’s Black History Month again, which means that it’s time for whiny racists to renew their annual cry of, “Why isn’t there a White History Month? Isn’t that reverse racism, which is really just racism? You know, whites are actually this country’s second class citizens.” And so on.

There are two responses that you normally hear, both of which I am sympathetic to. The snarky one is that every other month is basically white history month. The earnest one is that we need a black history month because the history and contributions of African Americans are still underrepresented in the public consciousness when compared with the canonical history of the Washingtons and Roosevelts.

But there is another, less snarky version of the first answer, which is that there are, in fact, numerous recognized history and heritage months celebrating the history and contributions of people who are by and large subsets of “white.”

So, here, for future reference, are your White History Months (as per this Awareness Month Calendar from Nellis Air Force Base):

  • March: Irish-American Heritage Month
  • March: Greek-American Heritage Month
  • April: Arab-American Heritage Month
  • April: Tartan (Scottish-American) Heritage Month
  • May: Jewish-American Heritage Month
  • July: French-American Heritage Month
  • September 15 – October 15: German-American Heritage Month
  • October: Italian-American Heritage Month

Other, non-Black Heritage Months:

  • May: Asian-American and Pacific Islander Heritage Month
  • June: Caribbean-American Heritage Month
  • November: Native-American Heritage Month

Hispanic Heritage Month is also September 15 – October 15. From a legal perspective, “Hispanic” is an ethnic identity that is orthogonal to race, so that you can be “White Hispanic” or “Black Hispanic” when you’re filling out your equal opportunity questionnaire. So, Hispanic Heritage Month might count as a sort of partial White History Month. I’ve left it out of the list, though, since I suspect that most people who are complaining about the lack of a White History Month don’t mean to include Hispanics when they say “White.” Similarly, Women’s History Month (March).

In addition, you can find, at the state and local level, History Months and Weeks for Russians, Swedes, Dutch, Czechs, and on and on.

For the White Survivalists out there, there’s even a National Preparedness Month (September).

Also, Movember.

The Week in Star Wars: Silent Film and Traceroute Scroll

So, here are a couple of items for you Star Wars fans out there.

First, here’s the “I am your father” scene from Empire, rendered as a silent film:

(via @brainpicker)

Second, you should open a terminal window on your computer right now and type in the following command:

traceroute 216.81.59.173

It might take some time, since a lot of people are doing this right now, but the wait is worth it. (Assuming you’re a huge dork.)

If you don’t know how to open a terminal window, consult your friend with the thickest neck beard. They should be able to help you.

Today’s awesome masculism hashtag

So, as soon as you are back home from stocking up on marshmallow fluff and shuriken for the latest snowpocalypse, you need to go to twitter and check out this hashtag: #INeedMasculismBecause

Here’s an uncurated screengrab from right now. The awesomeness in just this snippet gives you some indication of the high-quality snark being generated right now:

There’s also a fair bit of amusement to be had in the “Men’s Rights” folks who stumble on the hashtag non-ironically.

Update: It sort of looks to me like this was maybe started non-ironically by the men’s rights folks, and has been co-opted by snark. It’s sort of like meta-trolling.
