GitHub language statistics

I enjoyed Aldo Cortesi’s rather interesting post about language statistics on GitHub. He’s done some good analysis, and there are some interesting nuggets of information to be had about Perl, Haskell (though fewer, as only 18 projects met his criteria), and other languages.

Of course, there is some silliness there too: you can bash Perl for many reasons (and if you’ve read my blog before, you might know that I do too), but there are some gems of forced interpretation:

C and Perl projects show a marked decline in activity over their first year. I suspect that the Perl result is due to the fact that it becomes harder and harder to contribute to a Perl codebase, the bigger it gets. The C result is more of a mystery.

I’m not sure that the premise is true: perhaps Perl projects are more limited in scope, for example. And Modern Perl is quite a different beast from the Matt’s Script Archive code of yesteryear. But the punchline is priceless ;-)

Here’s what I read with my biased interpretation of his results ;-)

  • Far fewer Perl projects than Ruby/Python so far. Github is a Ruby community effort, so it’s unsurprising that it would dominate here.
  • The median number of contributors for Perl is above average. This is substantiated by the total contributors for long-running projects being comparable with Ruby/Python.
  • Perl projects seem to have many, small commits. This would seem to be a good thing, and rather in keeping with the Git Way. (/me shuffles embarrassedly at the sight of his own, rather monolithic git commits…)
  • While Aldo suggests Perl projects are “significantly more ‘top-heavy’ than those in other languages, with a smaller core of contributors doing more of the work,” one could also hypothesize that Perl projects are good at attracting and retaining a strong core team. This certainly seems to be the case with long-running, active projects such as Catalyst and Moose.
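For the curious, here is a minimal sketch of one way “top-heaviness” could be measured: the fraction of commits made by the k most active authors. This is my own assumption about the metric, not necessarily how Aldo computed it, and the function name `top_share` and the commit lists are made up purely for illustration.

```python
from collections import Counter


def top_share(commit_authors, k=1):
    """Return the fraction of commits made by the k most active authors."""
    counts = Counter(commit_authors)
    if not counts:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / sum(counts.values())


# Hypothetical commit logs: one author name per commit.
core_heavy = ["ann"] * 6 + ["bob"] * 3 + ["cat"]
evenly_spread = ["ann"] * 4 + ["bob"] * 3 + ["cat"] * 3

print(top_share(core_heavy, k=2))     # 0.9
print(top_share(evenly_spread, k=2))  # 0.7
```

A high number fits either reading: a small group doing most of the work because nobody else can get in, or a committed core team that sticks around. The metric alone cannot tell the two apart.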

So, thanks Aldo for taking the time to do this fascinating analysis (though I’m sure you won’t mind if I draw some slightly different conclusions than you ;-P)


  1. rgrau says:

    Just yesterday, schwern posted on his blog that the gitPan import had just completed.

    21,766 new perl repositories in github :)

    here’s the link to the post:

  2. There are big lies and there is statistics… “I eliminated projects with less than 3 watchers”: Perl projects are usually smaller, so there are fewer watchers. The “Perl decline” is also suspicious: “This graph shows the number of commits per day, over the first 300 days of a project’s life.” Currently there are many old Perl projects being uploaded to GitHub; they were usually written by only one developer, or committed by only one developer. So this result is not surprising.

  3. Mark Wotton says:

    Could also be that Perl hackers tend to bite off small, discrete chunks that can actually be finished.

  4. dagolden says:

    His criteria of having at least 3 watchers really limits the data set to projects whose contributors primarily use github for tracking and collaboration. (E.g. adding 21,000 Perl projects that no one watches doesn’t really change things.) The statistics have huge selection bias so drawing any sort of conclusion from them is absurd.

  5. Programmer says:

    Adding nearly 22k perl projects (two-thirds of which are probably defunct) to, once again, try to make perl look more alive than it is? Embarrassing. Perl5 is dead, you people should be moving to perl6 before you wreck its chances to be a player.