Visualising Operating System Derivation
After I ordered a new laptop yesterday, a Huawei MateBook X Pro (2018), I started pondering what OS I might run on it. I started looking through the DistroWatch.com Top 100 OSes (by Page Hit Ranking). This table (shown in the right sidebar on the homepage) ranks OSes by the number of hits to their DistroWatch.com page in the last 6 months. It's a decent proxy for what's out there and what people are interested in using at the moment.
As I went through the list it struck me how many appeared to be based on one of three operating systems: Arch Linux, Debian, and Ubuntu (which is itself based on Debian). I thought it might be interesting to visualise this in order to see the clustering and also get a feel for how many of the top 100 are actually independent.
I whipped up a quick Ruby script to pull the data from DistroWatch.com and then output a dot file for GraphViz to render. The result is below (click to view full-size SVG). The script uses Nokogiri to parse the HTML documents, and a combination of XPath and CSS selectors to extract the information I need.
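To give a feel for the output half of that, here is a minimal sketch of turning "based on" data into a dot file. It is not the actual script: the `BASED_ON` hash is a tiny hand-written stand-in for the scraped data, and the helper names (`quote`, `to_dot`, `independent`) are made up for illustration. It only uses the Ruby standard library.

```ruby
# Hypothetical sample data: OS name => list of OSes it is based on.
# An empty list marks an independent OS. The real data comes from
# DistroWatch.com; these few entries are only for illustration.
BASED_ON = {
  "Debian"     => [],
  "Ubuntu"     => ["Debian"],
  "Arch Linux" => [],
  "Manjaro"    => ["Arch Linux"],
  "Alpine"     => [],
  "Void"       => [],
}.freeze

# Wrap a name in double quotes so names containing spaces are valid dot IDs.
def quote(name)
  '"' + name.gsub('"') { '\"' } + '"'
end

# Build a dot digraph with one node per OS and one edge per
# "parent -> derivative" relationship.
def to_dot(based_on)
  lines = ["digraph derivation {", "  rankdir=LR;"]
  based_on.each do |os, parents|
    lines << "  #{quote(os)};"
    parents.each { |parent| lines << "  #{quote(parent)} -> #{quote(os)};" }
  end
  lines << "}"
  lines.join("\n")
end

# Names of the OSes that are not based on anything else.
def independent(based_on)
  based_on.select { |_os, parents| parents.empty? }.keys
end

puts to_dot(BASED_ON)
```

Saving that output to a file (say `graph.dot`) and running `dot -Tsvg graph.dot -o graph.svg` should produce an SVG. Declaring every OS as a node, even when it has no edges, is what keeps the independent ones visible in the graph.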
Update: Some people have commented to say that some historical derivation is missing, like OpenBSD being originally forked from NetBSD. That's not what this graph is showing. It's mostly showing how an OS is constructed today, i.e. whether each new release of the project is rebased on another project or not. In this context, OpenBSD is very much independent. Ultimately though it visualises DistroWatch.com's "Based On" field, whatever that happens to be.
The graph confirms my suspicion. There are significant clusters around Arch, Debian, and Ubuntu. This helps with the original goal of picking an OS to try on the laptop. I already run Arch, which in my mind rules out all its derivatives -- I'd rather just run the real thing. I have no desire to run Debian or Ubuntu as I much prefer a rolling release model. Of the remaining independent options, Alpine and Void are interesting. I run Alpine on my server but I'm not sure it would be what I'm after in a desktop (I did play with a desktop install in a virtual machine though). I'm interested in giving Void more of a go -- especially the musl libc variant -- so that will be what I try first.
The source of the script is on GitHub. If there are any GraphViz wizards out there who can improve the graph, PRs are most welcome.