The Rise of the Giants

Whales are all large by any measure, but one group in particular, the baleen whales (Mysticeti), is especially large, and, interestingly, this group became truly gigantic only relatively recently. Why did they get so big? Ed Yong (on Twitter) writes about the rise of these majestic giants in a pair of great articles, here and here, based on two separate yet related studies by Slater et al. and Gearty et al.

Estimate Time for Job Completion (With Progress Updates) When Tar'ing Huge Directories

For the sake of future me, I am recording here the coolest shell trick I’ve learned this year:

(Linux):
tar cf - /folder-with-big-files -P | pv -s $(du -sb /folder-with-big-files | awk '{print $1}') | gzip > big-files.tar.gz

(OSX):
tar cf - /folder-with-big-files -P | pv -s $(($(du -sk /folder-with-big-files | awk '{print $1}') * 1024)) | gzip > big-files.tar.gz

with output looking like:
4.69GB 0:04:50 [16.3MB/s] [==========================> ] 78% ETA 0:01:21

Requires ‘pv’: https://github.
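The trick works because `pv -s` needs the expected total size in bytes to compute the percentage and ETA, and the two one-liners differ only in how they get that number from `du`. Here is a minimal sketch that wraps both variants in one script, assuming that switching on `uname -s` is a good enough way to tell GNU `du` (which has `-b`) from BSD `du` on macOS (which does not). The function names `folder_size_bytes` and `tar_with_progress` are hypothetical, not part of any tool.

```sh
#!/bin/sh
# Sketch only: folder_size_bytes and tar_with_progress are hypothetical helper
# names, and choosing the du flags via `uname -s` is an assumption.

# Print the size of a directory in bytes.
# GNU du (Linux) supports -b (apparent size in bytes); BSD du (macOS) does not,
# so we take the size in 1024-byte blocks from -k and multiply by 1024,
# mirroring the OSX one-liner.
folder_size_bytes() {
    case "$(uname -s)" in
        Linux) du -sb "$1" | awk '{print $1}' ;;
        *)     echo $(( $(du -sk "$1" | awk '{print $1}') * 1024 )) ;;
    esac
}

# Archive directory $1 into the gzip-compressed tarball $2, letting pv show
# progress and an ETA measured against the total from folder_size_bytes.
tar_with_progress() {
    twp_src=$1
    twp_out=$2
    twp_size=$(folder_size_bytes "$twp_src")
    # -P keeps absolute path names, as in the one-liners above.
    tar -cf - -P "$twp_src" | pv -s "$twp_size" | gzip > "$twp_out"
}
```

Usage: `tar_with_progress /folder-with-big-files big-files.tar.gz`. The total is only an estimate: tar adds headers and padding to the stream, and BSD `du -k` counts disk blocks rather than bytes, so the bar may not land exactly on 100%.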

The Traveler's Restaurant Process --- A Better Description of the Dirichlet Process for Partitioning Sets

I. "Have Any of These People Ever Been to a Chinese Restaurant?"

The Dirichlet process is a stochastic process that can be used to partition a set of elements into subsets. In biological modeling, it is commonly used to assign elements to groups, such as molecular sequence sites to distinct rate categories. Very often, an intuitive explanation of how it works invokes the "Chinese Restaurant Process"
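For readers who want to see the partitioning in action, here is a minimal sketch of the standard Chinese Restaurant Process, not of the "Traveler's Restaurant" retelling this post proposes (its details are not in this excerpt). Elements arrive one at a time; when element i arrives, with i-1 elements already placed, it joins an existing subset j with probability n_j / (i-1+α), where n_j is that subset's current size, or starts a new subset with probability α / (i-1+α), where α is the concentration parameter. The `crp_partition` function and its argument order are hypothetical; the sampling loop is written in awk, the same tool the tar one-liner uses.

```sh
#!/bin/sh
# Sketch: draw one random partition of n elements from the standard Chinese
# Restaurant Process with concentration parameter alpha (> 0).
# Prints the subset label of each element, one per line; labels are 1, 2, ...
# in order of first appearance.
# Hypothetical interface: $1 = n, $2 = alpha, $3 = random seed.
crp_partition() {
    awk -v n="$1" -v alpha="$2" -v seed="$3" 'BEGIN {
        srand(seed)
        k = 0                          # number of subsets ("tables") so far
        for (i = 1; i <= n; i++) {
            # i-1 elements are already placed; total weight is (i-1) + alpha
            u = rand() * (i - 1 + alpha)
            label = 0
            acc = 0
            for (j = 1; j <= k; j++) {
                acc += count[j]        # existing subset j has weight count[j]
                if (u < acc) { label = j; break }
            }
            if (label == 0) {          # u fell in the alpha slice: new subset
                k++
                count[k] = 0
                label = k
            }
            count[label]++
            print label
        }
    }'
}
```

For example, `crp_partition 20 1.0 7` prints 20 labels. Small values of α tend to lump most elements into a few subsets, while large values tend to give many small subsets.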

'Joy Plots' -- Great Plot Style for Visualizing Distributions on Discrete/Categorical or Multiple Continuous Variables

R doing what R does really, really, really, really, really, really, *R*eally well: visualization. Folks, this might be THE plot to use to visualize distributions of discrete/categorical variables or simultaneous distributions of multiple continuous variables, replacing or at least taking a seat alongside violin plots as the current best approach IMHO.

Source code repository: ggjoy

Example of use (EDIT: This plot style is named after Joy Division, due to a similar graphic on one of their album covers.

'Pre-Columbian Mycobacterial Genomes Reveal Seals As A Source Of New World Human Tuberculosis'

When, in 1994, definitive evidence of tuberculosis in humans was reported from pre-Columbian America, it was startling. Conventional understanding had pegged tuberculosis as part of the new, exotic, and (to immunologically naive populations) deadly menagerie of pathogens brought by Europeans to the Americas. While there had been suggestions of pre-Columbian tuberculosis in the Americas, these were based on bone lesions, which were ambiguous. Unlike those earlier cases, however, the Chiribaya mummy from 1000-1300 CE in Peru was shown beyond doubt to have been exposed to tuberculosis:

Multispecies Coalescent Species Delimitation: Conflating Populations with Species in the Grey Zone (Evolution 2017 Talk)

Folks! The always fantastic Evolution meetings were a blast. So many great talks, and, perhaps more importantly, great catching up with so many friends, collaborators, and colleagues! I presented a talk on our PNAS paper showing how the Multispecies Coalescent model, when used for “species” delimitation, actually delimits Wright-Fisher populations. Titled “Multispecies Coalescent Species Delimitation: Conflating Populations with Species in the Grey Zone”, the entire talk can be viewed here:

'Phylogenomics reveals rapid, simultaneous diversification of three major clades of Gondwanan frogs at the Cretaceous–Paleogene boundary'

Some nice work that ties the timing of the radiation of three independent lineages of frogs, which together make up the majority of living frogs, to about the time the major groups of dinosaurs took a hit (literally and figuratively!). A compelling and interesting story, with lots of intriguing follow-up questions. A more general article covering the findings is available here. Yan-Jie Feng, David C. Blackburn, Dan Liang, David M. Hillis, David B.

Solving the 'Could not find all biber source files' Error

Biblatex is a fantastic bibliography/citation manager for LaTeX. It beats the older BibTeX thanks to its much easier customization and configuration. It does, however, have one bug that can be very perplexing to figure out because of the misleading error message that results: “Could not find all biber source files”. At first glance this message seemed straightforward enough to send me poking about the project file structure and build system, checking paths and names.

Building GCC From Scratch Natively on OSX 10.11 (El Capitan) and Above

With every iteration of their desktop operating system, Apple seems ever more determined to find novel ways to irritate me. Case in point: the rootless security model, which prevents anyone from writing to ‘/usr’ (except for ‘/usr/local’, though there is no way to re-create that directory if you wipe it). The big problem is that the GCC build process requires that ‘/usr/include’ exist, and the OSX 10.11 security model does not allow you to create it.
