Refactoring

Software Engineering Large Practical

Software Development Employment

Nearly all job descriptions include some mention of experience
But note: few of them specify experience of success
Experience of both success and failure are both important
Hence, do not be afraid to be wrong
The trick is to recognise failure
Unfortunately this is much harder than it may seem

Failure

Failure is a necessary part of learning
- And certainly a necessary part of mastery
If you are not failing regularly you are either:
- Not being ambitious enough and/or
- being very lucky
In either case, you are not learning much

Failure

Source Code Control

This is why SCC is so well loved
It assists in making lots of micro-failures that can be thrown away
Rather than one big failure
Whenever you are facing a tough choice (or even an easy one):
1. Start a new branch and just try something
2. If it does not work, you can just throw the branch away

Chess

How long does it take to learn to play chess?

Chess

How long does it take to learn to play chess?

Typically two or three answers:
1. About 20 minutes
2. You never stop
3. 15 or so years, 10k hours

Expert in 10K Hours

Refers to the idea that it takes approx 10K hours to become an expert in anything
So if you wish to do this in one year, you are looking at: ~27 hours a day
- You might want to budget for more than a year
Even at four years that is ~7 hours a day
- 9-5 with an hour for lunch every day

Note, the idea is debateable, seems to refer to being world-class rather than “merely” an expert. You might also need some innate talent as well.

Programming Java?

Apparently 21 days
Just as for Chess, you can learn the rules of programming, relatively quickly
But it takes much longer to learn to program well
- Relevant comic
So what is it that you are learning in all this time?

Recall: Adaptability

Adaptability

This is a great sentiment, but it is horribly vague
What do we mean by code being adaptable?
And how do we achieve this?
How do we even know if we have achieved it?

Adaptability

That is a large fraction of what you will learn in your 10K hours
That is why failure is so important
- Only when you write non-adaptable code and experience the results will you appreciate how to (attempt to) avoid it
I cannot teach you this in a lecture, otherwise it would not take you 10K hours to learn
However, I can tell you two important strategies:
1. Refactoring
2. Testing

Refactoring

Refactoring is the process of changing code such that it computes exactly the same function (of inputs to outputs), but has a better design.
This is tremendously powerful, because it allows us to try out various designs, rather than guessing which one is the best
It allows us to determine whether something is possible, without necessarily building it in the best way
It allows us to design retrospectively once we know significant details about the problem at hand.
It allows us to avoid the cost of full commitment to a particular solution which, ultimately, fails.

Janitorial Work

Wish to discuss two points:
- Real and Logical Time
- How to break a rule
To do so I'll need the notion of janitorial work
Examples of Janitorial
- Reformating
- Commenting
- Changing Names
- Tightening

Janitorial Work

Reformating


void method_name (int x)
{
  return x + 10;
}

Becomes:


void method_name(int x) {
  return x + 10;
}

There is plenty of software which will do this for you
Reformatting should be largely unnecessary, if you keep your code formatting correctly in the first place
- More commonly required on group projects

Janitorial Work

Reformating

Refomatting is entirely superficial
It is important to consider when you apply this
Reformatting can result in a large ‘diff’
This may well conflict with other work performed concurrently

Janitorial Work

Commenting

Comments are a highly contentious issue
When done as janitorial work this can be particularly useful
- You can comment on the stuff that is not obvious even to yourself as you read it. This is much more difficult when writing new code
The important thing to comment is not what or how but why
Try not to have redundant information in your comments:
```
// the first integer argument
```
- The fancy XML formatting does nothing to save this comment

Janitorial Work

Commenting

Ultra bad:


// increment x
x += 1;

Better:


// Since we now have an extra element to consider
// the count must be incremented
x += 1;

Janitorial Work

Changing Names

The previous example used x as a variable name
Unless it really is the x-axis of a graph, choose a better name
This is of course better to do the first time around
However as with commenting, unclear code can often be more obvious to its author upon later reading it


// Since we now have an extra element to consider
// the count must be incremented
number_of_set_elements += 1;

Janitorial Work

Tightening


void main(...){
  run_simulation();
}

Tightened to become:


void main(...){
  try{
    run_simulation();
  } catch (FileNotFoundException e) {
    // Explain to the user ..
  }
}

Janitorial Work

Tightening


void compute_equation(int z){
  // .. complex calculation
  return result;
}

Tightened to become:


void compute_equation(int z){
  if (z < 0){
      raise // Some horrifying exception
  }
  // .. complex calculation
  return result;
}

Janitorial Work

Tightening

For some this is not janitorial work, since it actually changes in a non-superficial way the function of the code
I place it here, since similar to other forms it is often caused by being unable to think of everything when writing new code
And similarly you often come across such slackness when re-reading your code for some other purpose

Janitorial Work

Most of this work is work that arguably could have been done right the first time around when the code was developed
However, when developing new code, you have limited cognitive capacity
You cannot think of everything when you develop new code, janitorial work is your time to rectify the minor stuff you forgot
Better than trying to get it right first time is making sure you later review your code

Janitorial Work

“Refactoring is the process of changing code such that it computes exactly the same function (of inputs to outputs), but has a better design.”
Strictly speaking janitorial work is not refactoring
- It should not change the function of the code
  - Tightening might, but generally for exceptional input
- But neither does it make the design any better
In common with refactoring:
- You are making existing code easier to understand/use
- You should not perform janitorial work on pre-existing code whilst developing new code

Janitorial Work

Suppose I'm implementing some new feature and I come across this


// prase the 'validate' command

It's tempting to fix it right now and you should


// parse the 'validate' command

So how do I follow these two bits of advice?
How do I “fix small things right now” whilst also avoiding “doing two things at once”

Real and Logical Time

The answer is, I fix it right now in real time, but use SCC to avoid doing two things at once in logical time
You should be on a development branch and do this:


$ git checkout master # go back to the master branch
$ editor mycode.cobol # Fix the typo
$ git commit -a -m "Fixed a prase->parse typo"
$ git checkout dev # go back to your development branch
$ git rebase master # pretend you fixed the typo before

Do not use cobol, that's just a joke

Real and Logical Time

If you just fix it right now in the midst of current development
- You may throw away the current development
- So you would also be throwing away the small fixes you have done

How to Break a Rule

“you should not perform janitorial work on pre-existing code whilst developing new code”
If this is “too much work right now” then just fix the typo rather than leave it as is
In other words, if you must break the rule, break it such that the code is still fixed
This is especially true if the fix is some form of tightening
Of course if the fix itself is too much work for right now, then it should go in a bug tracker

Finally

Janitorial Work

It will not do you any harm to use the phrase “janitorial work” in your commit messages

More About Refactoring

Refactoring is a term which encompasses both factoring and defactoring
Generally the principle is to make sure that code is written exactly once
We hope for zero duplication
However, we would also like for our code to be as simple and comprehensible as possible

Factoring and Defactoring

We avoid duplication by writing re-usable code
Re-usuable code is generalised
Unfortunately, this often means it is more complicated
Factoring: is the process of removing common or replaceable units of code, usually in an attempt to make the code more general
Defactoring: is the opposite process: specialising a unit of code usually in an attempt to make it more comprehensible

Factoring


void primes(int limit){
    integer x = 2;
    while (x <= limit){
        boolean prime = true;
        for (i = 2; i < x; i++){
            if (x % i == 0){ prime = false; break; }
        }
        if (prime){ System.out.println(x + " is prime"); }
    }
}

A very naive but perfectly reasonable bit of code to print out a set of prime numbers up to a particular limit

Factoring


void print_prime(int x){
    System.out.println(x + " is prime");
}
void primes(int limit){
    x = 2;
    while (x <= limit){
        ... // as before
        if (prime){ print_prime(x); }
    }
}

Here we have “factored out” the code to print the prime number to the screen. This may make it more readable, but I have not made the code more general.

Factoring

To make it more general we have to actually parameterise what we do with the primes once we have found them.


interface PrimeProcessor{
    void process_prime(int x);
}
class PrimePrinter implements PrimeProcessor{
    public void process_prime(int x){
        System.out.println(x + " is prime");
    }
}
void primes(int limit, PrimeProcessor p){
    x = 2;
    while (x <= limit){
        ... // as before
        if (prime){ p.process_prime(x); }
    }
}

Factoring

If I wish to store the primes instead:


class PrimeRecorder implements PrimeProcessor{
    public LinkedList primes;
    public PrimeRecorder(){
       self.primes = new LinkedList();
    }
    public void process_prime(int x){
        self.primes.append(x);
    }
}

Factoring

I can go further and factor out the testing as well:


interface PrimeTester{
    boolean is_primes(int x);
}
class NaivePrimeTester implements PrimeTester{
    public boolean is_prime(int x){
        for (i = 2; i < x; i++){
            if (x % i == 0){ return false; }
        }
        return true;
    }
}
void primes(int limit, PrimeTester t, PrimeProcessor p){
    x = 2;
    while (x <= limit){
        if (t.is_prime(p)){ p.process_prime(x); }
    }
}

Factoring

Now that I've factored out the test, it does not have to be used solely for primes


interface IntTester{
    boolean property_holds(int x);
}
class NaivePrimeTester implements IntTester{
    public boolean property_holds(int x){
        for (i = 2; i < x; i++){
            if (x % 2 == 0){ return false; }
        }
        return true;
    }
} // Similarly for PrimeProcessor to IntProcessor
void number_seive(int limit, IntTester t, IntProcessor p){
    x = 0;
    while (x <= limit){
        if (t.property_holds(p)){ p.process_integer(x); }
    }
}

Factoring

Print the perfect numbers:


interface IntTester{
    boolean property_holds(int x);
}
class PerfectTester implements IntTester{
    public boolean property_holds(int x){
        return (sum(factors(x)) == x);
    }
} // Similarly for PerfectProcessor
void number_seive(int limit, IntTester t, IntProcessor p){
    x = 0;
    while (x <= limit){
        if (t.property_holds(p)){ p.process_integer(x); }
    }
}

Factoring

We might find the two extra parameters a bit ugly, no problem:


public abstract class NumberSeive{
    abstract boolean property_holds(int x);
    abstract void process_integer(int x);
    abstract int start_number;
    void number_seive(int limit){
        x = self.start_number;
        while (x <= limit){
            if (self.property_holds(p)){ self.process_integer(x); }
        }
    }
}

Factoring

Here is the code for printing the primes:


public class PrimeSeive inherits NumberSeive{
    public boolean property_holds(int x){
        for (i = 2; i < x; i++){
            if (x % 2 == 0){ return false; }
        }        return true;  }
    void process_integer(int x) { System.out.println (x + " is prime!"); }
    int start_number = 2;
}

Factoring

Print the perfect numbers:


public class PerfectSeive inherits NumberSeive{
    public boolean property_holds(int x){
        return (sum(factors(x)) == x); }
    void process_integer(int x) { System.out.println (x + " is perfect!"); }
    int start_number = 2;
}

Factoring

So which version do we prefer? This one:


public abstract class NumberSeive{
    abstract boolean property_holds(int x);
    abstract void process_integer(int x);
    abstract int start_number;
    void number_seive(int limit){
        x = self.start_number;
        while (x <= limit){
            if (self.property_holds(p)){ self.process_integer(x); }
        }}} // Close all the scopes
public class PrimeSeive inherits NumberSeive{
    public boolean property_holds(int x){
        for (i = 2; i < x; i++){
            if (x % 2 == 0){ return false; }
        }        return true;  }
    void process_integer(int x) { System.out.println (x + " is prime!"); }
    int start_number = 2;
}

Factoring

Or the original version?


void primes(int limit){
    integer x = 2;
    while (x <= limit){
        boolean prime = true;
        for (i = 2; i < x; i++){
            if (x % 2 == 0){ prime = false; break; }
        }
        if (prime){ System.out.println(x + " is prime"); }
    }
}

Factoring

Something in between?


LinkedList get_primes(int limit){
    int x = 2; LinkedList results = new LinkedList();
    while (x <= limit){
        boolean prime = true;
        for (i = 2; i < x; i++){
            if (x % 2 == 0){ prime = false; break; }
        }
        if (prime){ results.append(x); }
    }
}
void primes(int limit){ 
    for x in get_primes(limit){
        System.out.println(x + " is prime"); 
    }
}

Factoring

The answer of course depends on the context
- How likely am I to need more number seives?
- How likely am I to do something other than print the primes?
The compromise is very adaptable and also rather comprehensible:
- It is likely slower for printing the primes out
- Unless your application is likely to process a large number of primes or other numbers, does it matter how efficient it is?

Defactoring

Numbers such as the number 20 can be factored in different ways
- 2,10
- 4,5
- 2,2,5
If we have the factors 2 and 10, and realise that we want the number 4 included in the factorisation we can either:
- Try to go directly by multiplying one factor and dividing the other
- Defactor 2 and 10 back into 20 and then divide 20 by 4

Defactoring

Similarly your code is factored in some way
In order to obtain the factorisation that you desire, you may have to first defactor some of your code
This allows you to factor down into the desired components
This is often easier than trying to short-cut across factorisations

Sieve of Eratosthenes

Create a list of consecutive integers from 2 to n: (2, 3, 4, ..., n)
Initially, let p equal 2, the first prime number
Starting from p, count up in increments of p and mark each of these numbers greater than p itself in the list
- These will be multiples of p: 2p, 3p, 4p, etc.; note that some of them may have already been marked.
Find the first number greater than p in the list that is not marked
- If there was no such number, stop
- Otherwise, let p now equal this number (which is the next prime), and repeat from step 3

Sieve of Eratosthenes


void primes(int limit){
    LinkedList prime_numbers = new LinkedList();
    boolean[] is_prime = new Array(limit, true);
    for (int i = 2; i ‹ Math.sqrt(limit); i++){
        if (is_prime[i]){
            prime_numbers.append(i);
            for (j = i * i; j ‹ limit; j += i){
                is_prime[j] = false;
    }

I can probably do this via our abstract number sieve class, but I doubt I want to. The alternative is to defactor back to close to our original version and then factor the way we want it.

Defactoring

Defactoring then can be used as the first step of refactoring
It might also simply be that you feel the current factored version is over-engineered
Flexibility is great, but it is generally not without cost
- The cognitive cost associated with understanding the more abstract code
If the flexibility is not now or unlikely to become required then it might be worthwhile defactoring
Don't be shy in explaining your reasoning in comments and commit messages

Refactoring Can Better Document

What does this code do?


r = g.nextDouble();
d = 1.0 - (1.0/x * Math.log(r));
System.out.println (d);

Refactoring Can Better Document

Better?


dice = generator.nextDouble();
delay = 1.0 - (1.0/rate * Math.log(dice));
System.out.println (delay);

Refactoring Can Better Document

How about now?


// Choose a delay from the exponential distribution given the rate
dice = generator.nextDouble();
delay = 1.0 - (1.0/rate * Math.log(dice));
System.out.println (delay);

Refactoring Can Better Document

Do I need a comment?


double calculate_exponential_delay(double rate, Random generator){
    dice = generator.nextDouble();
    delay = 1.0 - (1.0/rate * Math.log(dice));
    return delay;
}
System.out.println (calculate_exponential_delay(rate));

Even if the method is defined elsewhere and you only see the print line


System.out.println (calculate_exponential_delay(rate));

Refactoring Can Better Document


System.out.println (calculate_exponential_delay(rate));

If your code is highly coupled it will be difficult to extract such self-documenting fragments
In this case, you have code you should try to re-arrange first before factoring out
If your factored out method has a ridiculously long name, or many parameters it is a good sign that it is not worth factoring out:


xs = calculate_exponential_delays_from_global_events(rate_function, 
                                                     generator , 
                                                     ...);

Have a Clear Goal

Refactoring

Always have a clear goal for your refactoring
Why is the current code inappropriately factored?
How will your improvements help this situation?
As always, you can record your reasons in your SCC commit messages

Breaking (Bad) Methods

Here is a question posted to stack overflow:

When is a function too long? [closed]

“35 lines, 55 lines, 100 lines, 300 lines? When you should[sic] start to break it apart? I'm asking because I have a function with 60 lines (including comments) and was thinking about breaking it apart”

Breaking (Bad) Methods

Here is a question posted to stack overflow:

When is a function too long? [closed]

long_function(){ ... }

into:

small_function_1(){...}
small_function_2(){...}
small_function_3(){...}

“The functions are not going to be used outside of the long_function, making smaller functions means more function calls, etc. When would you break apart a function into smaller ones? Why?”

A Blog Post

Long methods and classes are evil

See the original here.

Any method should fit into your IDE window ... I strive for an average of not more than 5 lines of code per method
Too many private members .. say greater than 10
The size of the code file. Offhand, I’d say any class over 10k in size ...
Exception handling code and instrumentation tend to push methods to be much larger. Invest some thought in how to segregate this type of code away from the main functionality
Reduce the number of public methods in a class. Just picking a number, I would say less than 10 in most cases.

Common Rules

These are some common rules:

“Methods should have no more than X lines of code”
“Classes should have no more than X methods/private variables”
“Files should have no more than X lines of code”

Ignore Such Rules

Most such rules are well intentioned
They are supposed to be easy to adhere to and check
But unless you understand the motivation behind such a rule, following it will do you no good
These rules tell you what not to write, but they do not explain what you should write instead
Not to mention the fact that most good rules have some exceptions

Example (long methods)


integer my_long_method(int input){
   int x = 0;
   ...
   // Do some stuff
   ...
   // Do some other stuff
   ...
   // Finally return
   return x;
}

Oh oh, this method is apparently 900 lines of code long.

Example (long methods)


void do_some_stuff(int x, int y){
    // Do some stuff
}
integer do_some_other_stuff(int x, String star){
    // Do some other stuff
    return x
}
integer my_long_method(int input){
   int x = 0;
   ...
   do_some_stuff(x, 0);
   x = do_some_other_stuff(x, input_string);
   // Finally return
   return x;
}

Ah good, this method is now only a few lines long

Example (long methods)

This is not any better, unless:
- It aids the understanding of the method, it could obscure it
- Any of the new smaller methods are plausibly called from elsewhere
  - Generally, if they are not currently called from elsewhere, then you have no evidence that they ever will be
  - You can wait until there is a need
  - Libraries are an obvious exception to this

Repeating Advice

Do not refactor and update functionality at the same time
If you are updating functionality and realise that a refactor would help, either:
- Back of the current changes, refactor and merge in your new functionality or
- Back of the current changes, throw them away, refactor and restart the new functionalityor
- Finish the new functionality and then refactor
Obviously which of these you do will depend on how you have implemented and how the refactor interacts with that work

Refactoring Summary

Code should be factored into multiple components
Refactoring is the process of changing the division of components
Defactoring can help the process of changing the way the code is factored
Well factored code will be easier to understand
Do not update functionality at the same time

Summary

Refactoring is very important:
- Refactoring allows us to avoid doing a large amount of upfront design and also avoid producing a a big hairy mess
- Do not change functionality whilst refactoring
- Your code should be adaptable
Do not optimise blindly, benchmark and profile
There is not a thing on this page that your source code control will not make easier

Your Proposals

Do not forget that your proposal can include questions to elicit feedback
The more specific your question is, the better the feedback can be directed
At the very least, give your own answer to the question

Examples

“Should I use language X for this project?”
- I already have your proposal so I can answer this question
- I might assume that your answer is yes
- However, if you tell me why you think that, I can better explain why I agree/disagree with your line of reasoning

Examples

“Should I use language X for this project?
I know X a lot so I'm pretty confident in it

There is a well matured library which does ...

I do not think I require static typing here because ...

”