The peril in counting source lines on an OSS project

Published in Heptio, Mar 22, 2017

Image by Nicholas_T at http://flickr.com/photos/14922165@N00/543334336; licensed under the terms of the cc-by-2.0.

“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” — Bill Gates

There seems to be a phase that OSS projects go through as they mature and gain traction. As they do, it becomes increasingly important for vendors to point to their contributions so they can credibly say they are the ‘xyz’ company. Heptio is one such vendor operating in the OSS space, and this isn’t lost on us. :)

It helps during a sales cycle to be able to say “we are a big contributor to this project, look at the percentage of code and PRs we submitted”. While transparency is important, as is recognizing the contributions that key vendors make, a focus on a single metric in isolation (and LoC in particular) creates a perverse incentive structure. Taken to its extreme, it becomes detrimental to project health.

I want to call attention to this since I hear the drumbeat starting for measuring LoC contributed to the Kubernetes project. I believe that not only are there better ways to optimize for success of the project than counting lines contributed, but also that focusing on LoC leads to bad outcomes.

Source line count just isn’t a good overall metric of project contribution. Any engineering manager worth their salt will tell you this: measuring contribution as ‘volume’ in isolation is a poor idea. It penalizes conciseness and rewards expansiveness. It focuses on quantity over quality and ignores any weighting of ‘impact’. At the end of the day, creating an incentive structure that is not aligned with the health of the project is not a wise idea. Promoting this as a primary metric of success invites all kinds of gaming. The same applies to PR volume: should a PR that fixes a spelling mistake in a doc be weighted at the same level as a PR that addresses a subtle bug in the scheduler?

Library vendoring distorts the LoC metric, even if it were a good one to start with. This one is language specific. For Go, it is common practice to check in a version of your dependencies with your repo. This is called vendoring. It is very easy to inflate line counts by importing or updating vendored libraries. Focusing on source lines contributed creates peculiar incentives around how the vendoring process is managed, since vendored code represents the lion’s share of physical code in the repository.
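To make the distortion concrete, here is a minimal sketch in Python of how a per-change line count splits once vendored paths are separated out. The function name `split_line_counts`, the file paths, and the line counts are all hypothetical and chosen purely for illustration; the only assumption carried over from the text is that dependencies live under a directory named `vendor`.

```python
from pathlib import PurePosixPath


def split_line_counts(added_lines, vendor_dirs=("vendor",)):
    """Split per-file added-line counts into first-party and vendored totals.

    added_lines maps a repo-relative path to the number of lines a change added.
    A file counts as vendored if any of its directory components is listed
    in vendor_dirs (assumption: dependencies are checked in under "vendor").
    """
    first_party = 0
    vendored = 0
    for path, count in added_lines.items():
        # Look only at directory components, not the file name itself.
        directories = PurePosixPath(path).parts[:-1]
        if any(part in vendor_dirs for part in directories):
            vendored += count
        else:
            first_party += count
    return first_party, vendored


# Hypothetical change: a small scheduler fix that also bumps a dependency.
change = {
    "pkg/scheduler/fix.go": 12,
    "vendor/github.com/example/lib/big.go": 4800,
    "docs/README.md": 3,
}
first_party, vendored = split_line_counts(change)
print(first_party, vendored)  # prints "15 4800"
```

In this made-up change, a naive count would credit the author with 4815 lines, of which only 15 were written by hand. Excluding vendored paths removes one distortion, but it does nothing to weight the remaining lines by impact.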

Third-party (vendor-specific) code further distorts things. As of today there is a lot of vendor-specific code mixed into the core repositories. We (the community) aspire to clean this up and define better interfaces along a provider model to ensure sanity as other vendors emerge. All vendor-specific code should likely be factored out of core, but that requires a reduction in vendors’ perceived contributions to the project. Doing the right thing will likely meet resistance if we are focused on vendor-driven source line contributions.

Even inside Kubernetes, not all repos are the same. There is no weighting for impact in these LoC calculations. A big community-driven project like Kubernetes is split over many repos. In fact, there are large efforts underway to break the project up further. But not all of these repos (and not all code in all repos) have the same impact on the wider project. That isn’t to say there isn’t good work going on there; there is! It is just important to recognize that 10 lines in the common service network path might take a lot more work and thought than 1000 lines of code in the periphery.

Bigger, in the case of Kubernetes core, is not better. Far from it. Perhaps the most pernicious issue we face is that by rewarding LoC contributed, we create an incentive to expand the size of any given area of the project. Focusing on core contributions will create an incentive to increase the size of core. The broader project is already 1.4M lines of code. Do we really want to make it significantly larger? Much of the success of Kubernetes is in what it isn’t as much as what it is. Because it is a clean underlying framework for orchestration, a strong and healthy ecosystem has grown around it. We should focus on getting to the point where the core is ‘finished’ and has strong extensibility points. For inspiration, look to the recent push in the Linux kernel world to remove lines of code for each addition.

Not all contributions are in code. This is a stickier one. Getting a Kubernetes release out is a big effort that leans on a lot of people. Very little of that effort results in PRs or code changes. How do we value that? There are countless other efforts that do much to advance the project beyond writing code.

So what is a better approach to measuring contribution? There is no single metric that will capture the contributions of a single vendor, but there is a set of acid-test questions that a community-friendly vendor should be able to answer:

How many people are full time on the project? Google, for example, has a huge team working on K8s. They don’t crow about it, but they are carrying a disproportionate amount of the load for the community. It would be nice to see vendors talk about how many folks they have working on non-vendor-specific contributions and non-core code efforts.
What percentage of your time is spent dealing with toil and general community operating taxes? Can a vendor point to significant investments in community support and infrastructure? Again, look at the work folks like Google are doing to drive things like release processes. Others like CoreOS and Samsung SDS have been doing a lot of quiet but important work to support the health of the community, and even Heptio (despite its newness) has full-time engineers working on community chores.
How many bugs did you fix this year outside of the code you contributed? We have over 5000 active Kubernetes issues and could use some help. I would love to see vendors talking up how many customer-reported issues they have fixed. :)
What key new feature efforts have you led? Are these adding value to the community even if they are outside of the core? Look at folks like CoreOS’s work on Operators or their contributions through etcd. Consider Red Hat’s work in pushing RBAC authorization (and a great many other areas). Look to Google’s work on pretty much everything. These are legitimately interesting accomplishments and are worth celebrating.
What testing contributions have you made? The stability and velocity of the project are deeply contingent on the strength of its test automation and CI investments.
What SIGs are you helping drive forward? SIGs are the lifeblood of the community. Do you have full-time staff participating in and supporting the SIGs? Are you helping to grow and contribute to existing SIGs rather than creating new SIGs, with the potential fragmentation that brings?

So where does this leave us? Hopefully we (as a community) won’t get into the LoC arms race that we have seen in other OSS communities. More significantly, I hope that as product marketing folks look to drive the visibility of vendors, they focus on things that are actually healthy for the community and the customers on the other side of it. As Sarah Novotny loves to say, let’s see folks take more pride in ‘chopping wood and carrying water’.

Written by Craig McLuckie