Scheduling Jobs: Cron vs. Anacron vs. Systemd

Today, you generally have three choices for a time-based system scheduler: “cron”, “anacron”, and “systemd”. There is plenty of information about these on the web, so here I shall just summarize the basics from a practical standpoint to answer the question: “which scheduler/timing system should I use?” “Cron” is the traditional timer. It is available everywhere, tried, tested, and loved, and it just works most of the time for most people.


Supercharging Your Mouse Ergonomics in Linux, Part 1: Scrolling with Mouse Movements

Sometimes I use a touchpad, and at other times an optical mouse, but my main workstation pointing device is a trackball: specifically, the Logitech M570. This model comes with two little index-finger buttons, labeled “Forward Button” and “Back Button” in the image below, which by default send you forward and backward through your browsing history. I find this mapping a phenomenal waste of two very useful buttons.


Goodbye Walled Cult Garden, Hello (Again) Beloved Muddy Farmyard

End of an Era

The great experiment is over: after almost a decade of OSX, I have now returned to Linux. My foray into the world of OSX/macOS was good for most of its run, and, at its height, it was simply brilliant. When I first adopted OSX (after half a decade of running Linux), it was the ultimate developer experience. Not just the perfect balance between developer power/functionality and user bells-and-whistles, but actually as good as the best at both.


SSH Key-Based Passwordless Access for Private GitHub Repositories

My SSH key-based authorization kept failing on a private GitHub repo on a remote machine, even though my SSH keys for that machine were properly registered on the GitHub account. The solution was to force the SSH (“git@”) form of the remote URL instead of the “https” form. That is, instead of:

    url = https://github.com/accountname/reponame.git

use:

    url = git@github.com:accountname/reponame.git

in your “.git/config”:

    [core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
    [remote "origin"]
        url = git@github.
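The same URL change can also be made from the command line rather than by editing the config file by hand. The following is only a sketch: the helper name `https_to_ssh_url` is made up for illustration, and it assumes the remote is a plain `https://github.com/<account>/<repo>` URL (anything else is passed through unchanged).

```shell
# Hypothetical helper: rewrite an HTTPS GitHub URL into its SSH ("git@") form.
# URLs that do not start with https://github.com/ are printed unchanged.
https_to_ssh_url() {
    printf '%s\n' "$1" |
        sed -E 's#^https://github\.com/([^/]+)/(.+)$#git@github.com:\1/\2#'
}

# Usage, run inside the clone (assumes the remote is named "origin"):
#   git remote set-url origin "$(https_to_ssh_url "$(git remote get-url origin)")"
#   git remote -v    # check the result
```

`git remote set-url` writes the same `url = ...` line under `[remote "origin"]` that the manual edit produces, so either approach should end up with the same config.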


The Palaver and Perils of Removing a Git Submodule

In contrast to the simple, single-line, nicely behaved “git submodule add” command, removing a submodule requires you to:

    $ git submodule deinit -- path/to/submodule
    $ rm -rf .git/modules/path/to/submodule
    $ git rm -f path/to/submodule
    $ git commit -a -m "Remove submodule"

Apart from the multiple steps, with “rm -rf” freely being tossed around willy-nilly, note the second command, where you manually fiddle with the guts of the repository structure. That does not get propagated up to any remotes in a push, or to any repos that pull from you or from a shared remote.


How to Install R on an HPC: A Comedy in T̶w̶o̶ -- NO -- THREE Acts (a.k.a. 'The Longest Day')

TL;DR: Just look at the Gist. Summary: Act I, in which I try and fail. Act II, in which I think I succeed but actually failed without knowing it till I tried to use it. Act III, in which I return to my beginning, ponder the universe, dive deep into the depths of the abyss, and come back with the magic bean that makes everything work.


The Rise of the Giants

Whales are all large by any measure, but one group of them in particular, the baleen whales (Mysticeti), are especially large, and, interestingly, this group only became really big relatively recently. Why did they get so big? Ed Yong (on Twitter) writes about the rise of these majestic giants in a series of great articles here and here, based on two separate yet related studies by Slater et al. and Gearty et al.


Estimate Time for Job Completion (With Progress Updates) When Tar'ing Huge Directories

For the sake of future me, I am recording here the coolest shell trick I’ve learned this year.

(Linux):

    tar cf - /folder-with-big-files -P | pv -s $(du -sb /folder-with-big-files | awk '{print $1}') | gzip > big-files.tar.gz

(OSX):

    tar cf - /folder-with-big-files -P | pv -s $(($(du -sk /folder-with-big-files | awk '{print $1}') * 1024)) | gzip > big-files.tar.gz

with output looking like:

    4.69GB 0:04:50 [16.3MB/s] [==========================> ] 78% ETA 0:01:21

Requires ‘pv’: https://github.
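The two variants above differ only in how the directory size is computed, so the size calculation can be pulled into one helper. This is a rough sketch, not part of the original trick: the function name `dir_size_bytes` is invented here, and it relies on `du -sk` (1024-byte units), which both GNU and BSD `du` accept. Because `du -sk` measures disk usage rather than exact byte counts, the percentage and ETA that `pv` shows are estimates.

```shell
# Hypothetical helper: approximate size of a directory in bytes.
# Uses du -sk (KiB units) so the same line works on Linux and OSX.
dir_size_bytes() {
    size_kib=$(du -sk "$1" | awk '{print $1}')
    echo $((size_kib * 1024))
}

# Usage (requires pv, as in the commands above):
#   tar cf - /folder-with-big-files -P \
#     | pv -s "$(dir_size_bytes /folder-with-big-files)" \
#     | gzip > big-files.tar.gz
```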


The Traveler's Restaurant Process --- A Better Description of the Dirichlet Process for Partitioning Sets

I. "Have Any of These People Ever Been to a Chinese Restaurant?"

The Dirichlet process is a stochastic process that can be used to partition a set of elements into a set of subsets. In biological modeling, it is commonly used to assign elements into groups, such as molecular sequence sites into distinct rate categories. Very often, an intuitive explanation as to how it works invokes the "Chinese Restaurant Process"
