Tag Archives: Engineering

How to make your build faster

Screen Shot 2012 03 26 at 5 47 40 PM
I did an informal survey a while ago and the results were frankly abysmal. I cannot understand how people work in such environments. I am ADD, if I have a feedback cycle more than a few seconds I have already hit ctrl+right arrow and am reading my email or sitting in twitter. This however has a very serious affect on my work. I am constantly pushing and popping my mental stack of what was going on (more often than not having cache misses as well). The sample size was pretty decent as well with over 700 respondents.

This post is going to look at some ways we can make our build times faster. There are loads of ways aside from buying SSDs. I work in a VM on a non-SSD drive on a laptop and rarely see a build over 5 seconds mostly because I focus very heavily on not having solutions with 20,000 projects in them but there are some other ways as well. Most codebases I see in “enterprise” environments should have more than one solution.

Project Structures

To start with we can use smarter project layouts. Patrick Smacchia has some good posts on it:

Advices on partitioning code through .NET assemblies

Hints on how to componentize existing code

Keeping well factored projects can go a long way and making a larger project will be faster than building many smaller ones (why? as an example they have a tendency of reusing many of the same references etc that need to be loaded repeatedly causing far more i/o).

Parallel Builds

These are good reading to start but there are many more options that are available. Let’s start with one of the comments from Ben “I think unless you go down the parallel build/test route (like NCrunch) then this issue of build times is not going to go away.”

What a great selling point. Luckily for builds everyone can already do this. Did you know that Visual Studio and MSBuild already do parallel builds? All you have to do is drop into your configuration and you will see an options screen like the one here

Screen Shot 2012 03 26 at 4 39 49 PM

Put in your max number of cores that you want. Work with msbuild? just use /maxcpucount:4. Of course this is still rather limiting as if you have project 1 that references project 2 which references project 3 you are in a serial situation anyways. The maxcpucount represents the number of projects that can be built concurrently (e.g.: providing no dependencies). Both MM and nCrunch support this option as well. This can get your build times to be quicker in some scenarios though I tend to see only a +- 20-25% increase on average.

Mighty Moose

We did a quick survey on Mighty Moose users about a week ago and I was amazed to see that very few people were using this feature (about 10% of a survey size +- 100-150 people). Mighty Moose has a smart build system inside of it that is actually much better than what msbuild can do because it has more information. I know dotnetdemon from red gate has something similar to it.

Basically since I know what is changing and I know how that affects the dependency graph I can be much more intelligent with my builds. I also have further knowledge through static analysis about what the affects of that change was (are public contracts changed etc?). Of course its a bit more complicated than this in practice (lots of edge cases) but it can make a huge difference in your build times for your typical builds in your TDD cycle.

You can see the difference in the video above. MSBuild on an incremental build in AutoTest.Net about 4.5 seconds. MM for the same build + locate tests to run + run the tests about 2.5-3 seconds. Much bigger solutions will get far more gain (whats also odd is that the opposite advice applies with MM build system, more projects tend to be faster than few so long as they are intelligent and not a dependency nightmare) though AutoTest.Net itself is not a tiny piece of code either. It has a reasonable amount of code in it, even the incremental build is doing pretty well when you consider my dev machine is a VM on a non-SSD drive. MM can do better because we have more information than msbuild does as we stay running compared to it running then dying. You can enable the setting on the first screen in config (Build Setup) here.

Screen Shot 2012 03 26 at 6 17 23 PM

Some people have asked if its so much faster why do we make you opt in to using it? There are a few reasons for this. The largest is compatibility mode. Most people do not generally build their projects directly and often have reasons why they don’t actually build successfully. Building the solution gives the highest level of compatibility with existing VS projects. As such we make you opt in for it, but its worth opting in for!

Code Coverage [2]

Yesterday I wrote about some of the issues I find with code coverage being shown in a UI. More often than not displaying code coverage leads to a false sense of security. We have made a conscious decision to not show line by line code coverage in Mighty Moose but instead have taken a different path.

Let’s go through a quick example from yesterday in the “blank slate” path.

[Test]
public void can_multiple_by_0() {
Assert.AreEqual(0, Multiple(5,0));
}

int Multiply(int a, int b) {
return 0;
}

Simplest possible thing. Now let’s add another test.

[Test]
public void correctly_multiplies_two_numbers() {
Assert.AreEqual(6, Multiply(2,3));
}

fails. when I change code to

int Multiply(int a, int b) {
return a * b;
}

It says not covered, then after running it says covered. Does this mean my test was good? Would I see a visual difference if my test had been

[Test]
public void correctly_multiplies_two_numbers() {
Assert.AreEqual(1,1);
}

They would both just mark the line as being green.

This situation will work quite differently with the way that mighty moose works. Mighty Moose does not show line by line coverage. Instead it shows you method level coverage in the margin (gutter). When you add the second test you will see the number go up by one in the margin. The number in the margin is the # of tests covering your method at runtime. In other words you can see the coverage occur when you are working. You can see this process in this video. With just line by line coverage as discussed in the last post you would not see that the new test actually covers the method.

Of course this does not allow you to see what lines are covered by those tests. It only tells you that those tests are covering the method in question. You need to understand what the tests actually are covering. This is by design. A common question I get to this is “well how could I know code the tests are covering”. Its this thing we do occasionally as developers called “thinking”. If your code is so complex that looking at the tests you can’t figure this out you probably have bigger problems.

Screen Shot 2012 03 22 at 1 17 15 PM

Going along with the # there is also a colour in a circle around the number. This represents a risk analysis MM is doing on your code (its pretty naive right now but actually works surprisingly, to me anyways, well. We may actually include line by line coverage in this metric shortly but we still won’t show you the line by line coverage. This is something that you can key off of to get a relative idea of safety. It does not preempt your responsibility to actually look at tests before you start say refactoring it is just something to give you an idea of your comfort level.

These “risk margins” are very important because I tend to find two common situations. Either this thing is very poorly tested or it tested pretty well. There are lots of things to improve the situations in the middle (code reviews and pairing are good strategies as is running an occasional code coverage report and going over it with developers on your team during a code review really I don’t hate code coverage just when its used heavily in my IDE :)). The margins however give you a quick indicator whether you are in a good or a bad situation.

The margins are also telling you to go look at graphs when you don’t feel comfortable. This really helps with the other big problem of coverage. What on earth is that thing covering this and how far away is it? Does it make a difference if something is 40 method calls away vs a unit test calling directly?

Screen Shot 2012 03 22 at 1 22 56 PM

You can see the tests (they are yellow, interfaces are blue) and the paths they take to cover this particular method. Graphs are one of the most powerful things in mighty moose, I was surprised to see not a lot of people using them via the analytics. You can also use your arrow keys inside the graph to navigate to any node inside of the graph (maybe you are refactoring and want to look at a test?).

The basic idea here is that simple code coverage is not enough. There is more involved with being comfortable than just coverage. Distance is important as is ensuring that the test actually does something.

As they say to assume makes an ass out of u and me. Line by Line code coverage has a tendency of giving us false security. The goal when putting together this stuff in MM was to assist you in identifying your situation and getting more knowledge as quickly as possible. Not to give people a false sense of security. Even a green circle in the margin is just us saying this “seems” to have reasonable coverage. No tool as of today can tell you this thing actually has reasonable coverage.

Code Coverage

One of our most frequently asked questions about Mighty Moose is why do we not do line by line code coverage. We have the capability of doing it, it would take a few weeks to make sure sequence points are right, we already have an entire profiler implementation. We choose not to do it.

I have a personal issue with code coverage. I don’t believe it offers much value either showing me information as I am typing or looking through reports. I also believe that there is a downside to using code coverage that most people do not consider.

Today I started espousing some of these thoughts on twitter with Phillip Haydon who I had promised a few weeks ago to write this blog post to. He is one of the many people wanting line by line code coverage support built into Mighty Moose.

Screen Shot 2012 03 21 at 3 28 52 PM

This is a very normal discussion that I have with people. Let’s look at some of the scenarios of usage here. There are mainly three. The first is I am writing new code going through normal TDD cycles on a blank slate, the second is that I am coming through and adding to existing code, and the last is I am refactoring code.

Blank Slate

The first use case is the one most people see in demos (look at me wow, I can do SuperMarketPricing with this super awesome tool :P). And of course code coverage is very good looking here. You write your test. You go write some code. You see the code get covered. But was it covered by one test or more than one test? Let’s try a simple example (yes very simplified)

[Test]
public void can_multiple_by_0() {
Assert.AreEqual(0, Multiple(5,0));
}

int Multiply(int a, int b) {
return 0;
}

Simplest possible thing. Now let’s add another test.

[Test]
public void correctly_multiplies_two_numbers() {
Assert.AreEqual(6, Multiply(2,3));
}

fails. when I change code to

int Multiply(int a, int b) {
return a * b;
}

It says not covered, then after running it says covered. Does this mean my test was good? Would I see a visual difference if my test had been

[Test]
public void correctly_multiplies_two_numbers() {
Multiply(2,3);
Assert.AreEqual(1,1);
}

They would both just mark the line as being green. Basically I just got some eye-candy that made me feel good when it wasn’t really doing anything for me. Maybe I can mouse over the eye-candy to then get the count and list of tests but do you actually do that? I am too busy on my next test.

Adding to Existing Code

When I am adding to existing code, it already has some test coverage. This is where test coverage really is supposed to be very good as I can see that the code I am changing has good test coverage.

Of course, do you trust the tests that are covering your code? Do you test that they are good tests and actually test what they are supposed to? Working mostly on teams I find so many bad tests that I almost always look around to see what tests are and what they are doing before I rely upon them as a form of safety net that I am not breaking things. Hell they could all be meaningless. And of course as I said on twitter I find my past-self to be quite guilty of having bogus tests occasionally. He is just like my boss a real !@#hole who makes it hard for me to do things now (yes I work for myself).

Knowing that a test “covers” a line of code can not make me avoid the need to look around. If I can avoid the need to look around I probably also know I am in a high coverage situation and am very familiar with the tests (so telling me this line is covered is not that valuable).

Refactoring

The last one here is refactoring. Here I should get a sense of security by looking at my code coverage that I can safely refactor this piece of code.

This should sound fairly similar to the issue above when talking about adding to existing code that I still need to look around. The tests could be completely bogus. They could be a slew of integration tests coming through. They could be calling into the code yet never actually asserting off anything relevant to the section of code they are covering. There are countless reasons why I need to look around.

To me all of these scenarios add up to code coverage on its own being eye-candy that has a tendency of making me feel more secure than I really am. Bad tests happen. I don’t want to give people a false sense of security. The fact that *something* covers this code is not all that valuable without knowing where that thing is, what its goal is, how that relates to here.

Another issue that I have in general with code coverage is I find (especially amongst relatively inexperienced developers) that they write tests to reach code coverage and not to write expressive tests. Even worse is when you talk about a team that has made the asinine decision to have “100% code coverage for our whole system”. Better make sure those autoproperties have tests boys, those will be high value later! You may laugh but I worked with a team who was up in arms over the fact that the closing brace after a return statement was not considered “covered” and was “messing up their otherwise perfect metric”

In the next post we will look at what was done in Mighty Moose instead of line by line code coverage.

Powerful Questions

Powerful questions are key to good analysis sessions.

From http://www.theworldcafe.com/pdfs/aopq.pdf a great quick read btw.

“Questions can be like a lever you
use to pry open the stuck lid on a
paint can. . . . If we have a short
lever, we can only just crack open
the lid on the can. But if we have a
longer lever, or a more dynamic
question, we can open that can up
much wider and really stir things
up. . . . If the right question is
applied, and it digs deep enough,
then we can stir up all the
creative solutions.”

Powerful questions dig into underlying assumptions, they create interest, and most of all they get people interested and good discussions come forth from them.

Question why

I am looking for what powerful questions you use in your analysis process so we can create a list (grabbing some from an old thread in my email as well). Here are some examples:

What is the smallest possible thing we can do to deliver this business value?

What is the need this system fills, not “what it does”

If I turned off the server tomorrow who would be the first person to notice and why?

How would you verify that this is working correctly?

what is the earliest point you can know whether the system has any value to you? How will we do this?

Why are we starting here?

Please keep them coming, the idea of powerful questions can quickly unlock the door to great discussions involving our software process.

Build Times [ctd]

All I can say is wow! I never had any idea that the problem was so prevalent in the .NET space. Over 50% of people had builds ranging over 1 minute! This is a real pain point for development teams (think about your TDD cycle). I can tell you the first thing I would be doing is ctrl+right arrow to get over to my email and twitter to start checking out what is going on. This while seeming inoculate has a major cost associated with it.

I forget what I was working on. I get a push/pop on my mental stack. I also tend to get easily distracted wait shiny thing. This cost is real and can actually be quantified through the use of monitoring software. This is actually a feature we have been looking at building into a plugin (monitoring of users flow to help measure improvements).

I would love to see some discussion on why build times has been prioritized to such a low level on teams. Is it just death march mentality? Is it a frog in a boiling pot? Is it technologies that you use that have notoriously long build times? Is your software actually decoupled into reusable bits?

Build Times?

Let’s be honest (its anonymous anyways) for .NET DEVELOPERS ONLY! Think about your normal build as you work (not a production build etc)

Build Times

Was answering some questions about Mighty Moose today and figured I would just drop a quick note here about the topic.

“Mighty Moose does not work very well on my project as it takes 3-5 minutes for things to show up as my build takes that long.”

OK STOP.

Go to jail 1

If your build is taking this long you have some very serious problems on your team and with your source base. A build being used by developers to actively develop that takes this long should be your first priority to fix. There is no excuse for having a build that takes this long under any circumstances.

Note I make a difference between a build developers use to actively develop and a build on a CI server etc.

To have a build taking this long makes it impossible for developers to develop. How many times per day do you build? You are losing n MINUTES per build? Extrapolate this to a team of 7.

Oddly I have found many people doing this on purpose! The rational is using refactoring tools like reshaper and coderush. They want refactoring support across the whole codebase. Ask yourself, do you really want refactoring support across layer and tier boundaries? This sounds scary.

Another reason I commonly heart is for debugging support (that way you can step into everything). You can do this without creating one massive solution for everything, a bit of brush up on the debugging tools/support in Visual Studio can show you how to do this.

Do not create solutions that take minutes to build. Solutions are meant to be workspaces to code in, not how you produce your production build etc. Break up into many solutions to keep build times lean. You should already have these kinds of boundaries in your system.

If you corporate anti-virus takes you from 20-30 seconds to 2-3 minutes dear god find some way to get it off those machines even if it means living on a segregated network.

Mighty Moose resolution: Won’t fix. If you are in this scenario though the cascading builds in Mighty Moose though sometimes non-trivial to setup might help alleviate some of your pain as they only build projects that need to be built (and if your build is fast, they can make it even faster!)

Follow

Get every new post delivered to your Inbox.

Join 7,602 other followers