The Palaver and Perils of Removing a Git Submodule

In contrast to the simple single-line nicely-behaved “git submodule add” command, removing a submodule requires you to: $ git submodule deinit -- path/to/submodule $ rm -rf .git/modules/path/to/submodule $ git rm -f path/to/submodule $ git commit -a -m "Remove submodule" Apart from the multiple steps with “rm -rf freely being tossed around willy-nilly, note the second command where you manually fiddle with the guts of the repository structure. That does not get propagated up to any remotes in a push or any repos that pull from you or a shared remote.

Read more

How to Install R on an HPC: A Comedy in T̶w̶o̶ -- NO -- THREE Acts (a.k.a. 'The Longest Day')

TL;DR: Just look at the Gist. Summary: Act I, in which I try and fail. Act II, in which I think I succeed but actually failed without knowing it till I tried to use it. Act III, in which I return to my beginning, ponder the universe, dive deep into the depths of the abyss, and come back with the magic bean that makes everything work.

Read more

The Rise of the Giants

Whales are all large by any measure, but one group of them in particular, the baleen whales (Mysticeti), are especially large, and, interestingly, this group only became really big relatively recently. Why did they get so big? Ed Yong (on Twitter) writes about the rise of these majestic giants in a series of great articles here and here, based on two separate yet related studies by Slater et al. and Gearty et al.

Read more

Estimate Time for Job Completion (With Progress Updates) When Tar'ing Huge Directories

For the sake of future me, I am recording this here, the coolest shell trick I’ve learned this year: (Linux): tar cf - /folder-with-big-files -P | pv -s $(du -sb /folder-with-big-files | awk '') | gzip > big-files.tar.gz (OSX): tar cf - /folder-with-big-files -P | pv -s $(($(du -sk /folder-with-big-files | awk '') * 1024)) | gzip > big-files.tar.gz with output looking like: 4.69GB 0:04:50 [16.3MB/s] [==========================> ] 78% ETA 0:01:21 Requires ‘pv’: https://github.

Read more

The Traveler's Restaurant Process --- A Better Description of the Dirichlet Process for Partitioning Sets

I. "Have Any of These People Ever Been to a Chinese Restaurant?" The Dirichlet process is a stochastic process that can be used to partition a set of elements into a set of subsets. In biological modeling, it is commonly used to assign elements into groups, such as molecular sequence sites into distinct rate categories. Very often, an intuitive explanation as to how it works invokes the "Chinese Restaurant Process"

Read more

'Joy Plots' -- Great Plot Style for Visualizing Distributions on Discrete/Categorical or Multiple Continuous Variables

R doing what R does really, really, really, really, really, really, *R*eally well: visualization. Folks, this might be THE plot to use to visualize distributions of discrete/categorical variables or simultaneous distributions of multiple continuous variables, replacing or at least taking up a seat alongside the violin plots as the current best approach IMHO. Source code repository: ggjoy Example of use (EDIT: This plot style is named after the “Joy Division”, due to a similar graphic on one of their album covers.

Read more

'Pre-Columbian Mycobacterial Genomes Reveal Seals As A Source Of New World Human Tuberculosis'

When, in 1994, definitive evidence of tuberculosis in humans was reported from pre-Columbian America, it was a startling. Conventional understanding had pegged tuberculosis as part of the new, exotic, and (to immunologically-naive populaces) deadly menagerie of pathogens brought by Europeans over to the Americas. While there were suggestions of pre-Columbian tuberculosis in the Americans, these were based on lesions on bones, which were ambiguous. Unlike previous cases, however, the Chiribaya mummy from 1000-1300 CE in Peru was shown beyond doubt to have been exposed to tuberculosis:

Read more

Multispecies Coalescent Species Delimitation: Conflating Populations with Species in the Grey Zone (Evolution 2017 Talk)

Folks! The always fantastic Evolution meetings were a blast. So many great talks, and, perhaps more importantly, great catching up with so many friends, collaborators, and colleagues! I presented a talk on our PNAS paper showing how the Multispecies Coalescent model, when used for “species” delimitation, actually delimits Wright-Fisher populations. Titled “Multispecies Coalescent Species Delimitation: Conflating Populations with Species in the Grey Zone”, the entire talk can be viewed here:

Read more

'Phylogenomics reveals rapid, simultaneous diversification of three major clades of Gondwanan frogs at the Cretaceous–Paleogene boundary'

Some nice work that ties the timing of the radiation of three independent lineages of frogs, constituting the majority of modern living frogs, to about the time the major groups of dinosaurs took a hit (literally and figuratively!). Compelling and interesting story, with lots of intriguing follow-up questions. A more general article covering the findings is available here. Yan-Jie Feng, David C. Blackburn, Dan Liang, David M. Hillis, David B.

Read more