Refactoring

Software Engineering Large Practical

Software Development Employment

  • Nearly all job descriptions include some mention of experience
  • But note: few of them specify experience of success
  • Experience of both success and failure are both important
  • Hence, do not be afraid to be wrong
  • The trick is to recognise failure
  • Unfortunately this is much harder than it may seem

Failure

  • Failure is a necessary part of learning
    • And certainly a necessary part of mastery
  • If you are not failing regularly you are either:
    • Not being ambitious enough and/or
    • being very lucky
  • In either case, you are not learning much

Failure

Source Code Control

  • This is why SCC is so well loved
  • It assists in making lots of micro-failures that can be thrown away
  • Rather than one big failure
  • Whenever you are facing a tough choice (or even an easy one):
    1. Start a new branch and just try something
    2. If it does not work, you can just throw the branch away

Chess

How long does it take to learn to play chess?

Chess

How long does it take to learn to play chess?
  • Typically two or three answers:
    1. About 20 minutes
    2. You never stop
    3. 15 or so years, 10k hours

Expert in 10K Hours

  • Refers to the idea that it takes approx 10K hours to become an expert in anything
  • So if you wish to do this in one year, you are looking at: ~27 hours a day
    • You might want to budget for more than a year
  • Even at four years that is ~7 hours a day
    • 9-5 with an hour for lunch every day
Note, the idea is debateable, seems to refer to being world-class rather than “merely” an expert. You might also need some innate talent as well.

Programming Java?

  • Apparently 21 days
  • Just as for Chess, you can learn the rules of programming, relatively quickly
  • But it takes much longer to learn to program well
  • So what is it that you are learning in all this time?

Recall: Adaptability

Adaptability

  • This is a great sentiment, but it is horribly vague
  • What do we mean by code being adaptable?
  • And how do we achieve this?
  • How do we even know if we have achieved it?

Adaptability

  • That is a large fraction of what you will learn in your 10K hours
  • That is why failure is so important
    • Only when you write non-adaptable code and experience the results will you appreciate how to (attempt to) avoid it
  • I cannot teach you this in a lecture, otherwise it would not take you 10K hours to learn
  • However, I can tell you two important strategies:
    1. Refactoring
    2. Testing

Refactoring

  • Refactoring is the process of changing code such that it computes exactly the same function (of inputs to outputs), but has a better design.
  • This is tremendously powerful, because it allows us to try out various designs, rather than guessing which one is the best
  • It allows us to determine whether something is possible, without necessarily building it in the best way
  • It allows us to design retrospectively once we know significant details about the problem at hand.
  • It allows us to avoid the cost of full commitment to a particular solution which, ultimately, fails.

Janitorial Work

  • Wish to discuss two points:
    • Real and Logical Time
    • How to break a rule
  • To do so I'll need the notion of janitorial work
  • Examples of Janitorial
    • Reformating
    • Commenting
    • Changing Names
    • Tightening

Janitorial Work

Reformating


void method_name (int x)
{
  return x + 10;
}
Becomes:

void method_name(int x) {
  return x + 10;
}
  • There is plenty of software which will do this for you
  • Reformatting should be largely unnecessary, if you keep your code formatting correctly in the first place
    • More commonly required on group projects

Janitorial Work

Reformating

  • Refomatting is entirely superficial
  • It is important to consider when you apply this
  • Reformatting can result in a large ‘diff’
  • This may well conflict with other work performed concurrently

Janitorial Work

Commenting

  • Comments are a highly contentious issue
  • When done as janitorial work this can be particularly useful
    • You can comment on the stuff that is not obvious even to yourself as you read it. This is much more difficult when writing new code
  • The important thing to comment is not what or how but why
  • Try not to have redundant information in your comments:
    
    // the first integer argument
    
    • The fancy XML formatting does nothing to save this comment

Janitorial Work

Commenting

Ultra bad:

// increment x
x += 1;
Better:

// Since we now have an extra element to consider
// the count must be incremented
x += 1;

Janitorial Work

Changing Names

  • The previous example used x as a variable name
  • Unless it really is the x-axis of a graph, choose a better name
  • This is of course better to do the first time around
  • However as with commenting, unclear code can often be more obvious to its author upon later reading it

// Since we now have an extra element to consider
// the count must be incremented
number_of_set_elements += 1;

Janitorial Work

Tightening


void main(...){
  run_simulation();
}
Tightened to become:

void main(...){
  try{
    run_simulation();
  } catch (FileNotFoundException e) {
    // Explain to the user ..
  }
}

Janitorial Work

Tightening


void compute_equation(int z){
  // .. complex calculation
  return result;
}
Tightened to become:

void compute_equation(int z){
  if (z < 0){
      raise // Some horrifying exception
  }
  // .. complex calculation
  return result;
}

Janitorial Work

Tightening

  • For some this is not janitorial work, since it actually changes in a non-superficial way the function of the code
  • I place it here, since similar to other forms it is often caused by being unable to think of everything when writing new code
  • And similarly you often come across such slackness when re-reading your code for some other purpose

Janitorial Work

  • Most of this work is work that arguably could have been done right the first time around when the code was developed
  • However, when developing new code, you have limited cognitive capacity
  • You cannot think of everything when you develop new code, janitorial work is your time to rectify the minor stuff you forgot
  • Better than trying to get it right first time is making sure you later review your code

Janitorial Work

  • “Refactoring is the process of changing code such that it computes exactly the same function (of inputs to outputs), but has a better design.”
  • Strictly speaking janitorial work is not refactoring
    • It should not change the function of the code
      • Tightening might, but generally for exceptional input
    • But neither does it make the design any better
  • In common with refactoring:
    • You are making existing code easier to understand/use
    • You should not perform janitorial work on pre-existing code whilst developing new code

Janitorial Work

  • Suppose I'm implementing some new feature and I come across this
  • 
    // prase the 'validate' command
    
  • It's tempting to fix it right now and you should
  • 
    // parse the 'validate' command
    
  • So how do I follow these two bits of advice?
  • How do I “fix small things right now” whilst also avoiding “doing two things at once”

Real and Logical Time

  • The answer is, I fix it right now in real time, but use SCC to avoid doing two things at once in logical time
  • You should be on a development branch and do this:

$ git checkout master # go back to the master branch
$ editor mycode.cobol # Fix the typo
$ git commit -a -m "Fixed a prase->parse typo"
$ git checkout dev # go back to your development branch
$ git rebase master # pretend you fixed the typo before
Do not use cobol, that's just a joke

Real and Logical Time

  • If you just fix it right now in the midst of current development
    • You may throw away the current development
    • So you would also be throwing away the small fixes you have done

How to Break a Rule

  • “you should not perform janitorial work on pre-existing code whilst developing new code”
  • If this is “too much work right now” then just fix the typo rather than leave it as is
  • In other words, if you must break the rule, break it such that the code is still fixed
  • This is especially true if the fix is some form of tightening
  • Of course if the fix itself is too much work for right now, then it should go in a bug tracker

Finally

Janitorial Work

  • It will not do you any harm to use the phrase “janitorial work” in your commit messages

More About Refactoring

  • Refactoring is a term which encompasses both factoring and defactoring
  • Generally the principle is to make sure that code is written exactly once
  • We hope for zero duplication
  • However, we would also like for our code to be as simple and comprehensible as possible

Factoring and Defactoring

  • We avoid duplication by writing re-usable code
  • Re-usuable code is generalised
  • Unfortunately, this often means it is more complicated
  • Factoring: is the process of removing common or replaceable units of code, usually in an attempt to make the code more general
  • Defactoring: is the opposite process: specialising a unit of code usually in an attempt to make it more comprehensible

Factoring


void primes(int limit){
    integer x = 2;
    while (x <= limit){
        boolean prime = true;
        for (i = 2; i < x; i++){
            if (x % i == 0){ prime = false; break; }
        }
        if (prime){ System.out.println(x + " is prime"); }
    }
}
A very naive but perfectly reasonable bit of code to print out a set of prime numbers up to a particular limit

Factoring


void print_prime(int x){
    System.out.println(x + " is prime");
}
void primes(int limit){
    x = 2;
    while (x <= limit){
        ... // as before
        if (prime){ print_prime(x); }
    }
}
Here we have “factored out” the code to print the prime number to the screen. This may make it more readable, but I have not made the code more general.

Factoring

To make it more general we have to actually parameterise what we do with the primes once we have found them.

interface PrimeProcessor{
    void process_prime(int x);
}
class PrimePrinter implements PrimeProcessor{
    public void process_prime(int x){
        System.out.println(x + " is prime");
    }
}
void primes(int limit, PrimeProcessor p){
    x = 2;
    while (x <= limit){
        ... // as before
        if (prime){ p.process_prime(x); }
    }
}

Factoring

If I wish to store the primes instead:

class PrimeRecorder implements PrimeProcessor{
    public LinkedList primes;
    public PrimeRecorder(){
       self.primes = new LinkedList();
    }
    public void process_prime(int x){
        self.primes.append(x);
    }
}

Factoring

I can go further and factor out the testing as well:

interface PrimeTester{
    boolean is_primes(int x);
}
class NaivePrimeTester implements PrimeTester{
    public boolean is_prime(int x){
        for (i = 2; i < x; i++){
            if (x % i == 0){ return false; }
        }
        return true;
    }
}
void primes(int limit, PrimeTester t, PrimeProcessor p){
    x = 2;
    while (x <= limit){
        if (t.is_prime(p)){ p.process_prime(x); }
    }
}

Factoring

Now that I've factored out the test, it does not have to be used solely for primes

interface IntTester{
    boolean property_holds(int x);
}
class NaivePrimeTester implements IntTester{
    public boolean property_holds(int x){
        for (i = 2; i < x; i++){
            if (x % 2 == 0){ return false; }
        }
        return true;
    }
} // Similarly for PrimeProcessor to IntProcessor
void number_seive(int limit, IntTester t, IntProcessor p){
    x = 0;
    while (x <= limit){
        if (t.property_holds(p)){ p.process_integer(x); }
    }
}

Factoring

Print the perfect numbers:

interface IntTester{
    boolean property_holds(int x);
}
class PerfectTester implements IntTester{
    public boolean property_holds(int x){
        return (sum(factors(x)) == x);
    }
} // Similarly for PerfectProcessor
void number_seive(int limit, IntTester t, IntProcessor p){
    x = 0;
    while (x <= limit){
        if (t.property_holds(p)){ p.process_integer(x); }
    }
}

Factoring

We might find the two extra parameters a bit ugly, no problem:

public abstract class NumberSeive{
    abstract boolean property_holds(int x);
    abstract void process_integer(int x);
    abstract int start_number;
    void number_seive(int limit){
        x = self.start_number;
        while (x <= limit){
            if (self.property_holds(p)){ self.process_integer(x); }
        }
    }
}

Factoring

Here is the code for printing the primes:

public class PrimeSeive inherits NumberSeive{
    public boolean property_holds(int x){
        for (i = 2; i < x; i++){
            if (x % 2 == 0){ return false; }
        }        return true;  }
    void process_integer(int x) { System.out.println (x + " is prime!"); }
    int start_number = 2;
}

Factoring

Print the perfect numbers:

public class PerfectSeive inherits NumberSeive{
    public boolean property_holds(int x){
        return (sum(factors(x)) == x); }
    void process_integer(int x) { System.out.println (x + " is perfect!"); }
    int start_number = 2;
}

Factoring

So which version do we prefer? This one:

public abstract class NumberSeive{
    abstract boolean property_holds(int x);
    abstract void process_integer(int x);
    abstract int start_number;
    void number_seive(int limit){
        x = self.start_number;
        while (x <= limit){
            if (self.property_holds(p)){ self.process_integer(x); }
        }}} // Close all the scopes
public class PrimeSeive inherits NumberSeive{
    public boolean property_holds(int x){
        for (i = 2; i < x; i++){
            if (x % 2 == 0){ return false; }
        }        return true;  }
    void process_integer(int x) { System.out.println (x + " is prime!"); }
    int start_number = 2;
}

Factoring

Or the original version?

void primes(int limit){
    integer x = 2;
    while (x <= limit){
        boolean prime = true;
        for (i = 2; i < x; i++){
            if (x % 2 == 0){ prime = false; break; }
        }
        if (prime){ System.out.println(x + " is prime"); }
    }
}

Factoring

Something in between?

LinkedList get_primes(int limit){
    int x = 2; LinkedList results = new LinkedList();
    while (x <= limit){
        boolean prime = true;
        for (i = 2; i < x; i++){
            if (x % 2 == 0){ prime = false; break; }
        }
        if (prime){ results.append(x); }
    }
}
void primes(int limit){ 
    for x in get_primes(limit){
        System.out.println(x + " is prime"); 
    }
}

Factoring

  • The answer of course depends on the context
    • How likely am I to need more number seives?
    • How likely am I to do something other than print the primes?
  • The compromise is very adaptable and also rather comprehensible:
    • It is likely slower for printing the primes out
    • Unless your application is likely to process a large number of primes or other numbers, does it matter how efficient it is?

Defactoring

  • Numbers such as the number 20 can be factored in different ways
    • 2,10
    • 4,5
    • 2,2,5
  • If we have the factors 2 and 10, and realise that we want the number 4 included in the factorisation we can either:
    • Try to go directly by multiplying one factor and dividing the other
    • Defactor 2 and 10 back into 20 and then divide 20 by 4

Defactoring

  • Similarly your code is factored in some way
  • In order to obtain the factorisation that you desire, you may have to first defactor some of your code
  • This allows you to factor down into the desired components
  • This is often easier than trying to short-cut across factorisations

Sieve of Eratosthenes

  1. Create a list of consecutive integers from 2 to n: (2, 3, 4, ..., n)
  2. Initially, let p equal 2, the first prime number
  3. Starting from p, count up in increments of p and mark each of these numbers greater than p itself in the list
    • These will be multiples of p: 2p, 3p, 4p, etc.; note that some of them may have already been marked.
  4. Find the first number greater than p in the list that is not marked
    • If there was no such number, stop
    • Otherwise, let p now equal this number (which is the next prime), and repeat from step 3

Sieve of Eratosthenes


void primes(int limit){
    LinkedList prime_numbers = new LinkedList();
    boolean[] is_prime = new Array(limit, true);
    for (int i = 2; i ‹ Math.sqrt(limit); i++){
        if (is_prime[i]){
            prime_numbers.append(i);
            for (j = i * i; j ‹ limit; j += i){
                is_prime[j] = false;
    }
I can probably do this via our abstract number sieve class, but I doubt I want to. The alternative is to defactor back to close to our original version and then factor the way we want it.

Defactoring

  • Defactoring then can be used as the first step of refactoring
  • It might also simply be that you feel the current factored version is over-engineered
  • Flexibility is great, but it is generally not without cost
    • The cognitive cost associated with understanding the more abstract code
  • If the flexibility is not now or unlikely to become required then it might be worthwhile defactoring
  • Don't be shy in explaining your reasoning in comments and commit messages

Refactoring Can Better Document

What does this code do?

r = g.nextDouble();
d = 1.0 - (1.0/x * Math.log(r));
System.out.println (d);

Refactoring Can Better Document

Better?

dice = generator.nextDouble();
delay = 1.0 - (1.0/rate * Math.log(dice));
System.out.println (delay);

Refactoring Can Better Document

How about now?

// Choose a delay from the exponential distribution given the rate
dice = generator.nextDouble();
delay = 1.0 - (1.0/rate * Math.log(dice));
System.out.println (delay);

Refactoring Can Better Document

Do I need a comment?

double calculate_exponential_delay(double rate, Random generator){
    dice = generator.nextDouble();
    delay = 1.0 - (1.0/rate * Math.log(dice));
    return delay;
}
System.out.println (calculate_exponential_delay(rate));
Even if the method is defined elsewhere and you only see the print line

System.out.println (calculate_exponential_delay(rate));

Refactoring Can Better Document


System.out.println (calculate_exponential_delay(rate));
  • If your code is highly coupled it will be difficult to extract such self-documenting fragments
  • In this case, you have code you should try to re-arrange first before factoring out
  • If your factored out method has a ridiculously long name, or many parameters it is a good sign that it is not worth factoring out:

xs = calculate_exponential_delays_from_global_events(rate_function, 
                                                     generator , 
                                                     ...);

Have a Clear Goal

Refactoring

  • Always have a clear goal for your refactoring
  • Why is the current code inappropriately factored?
  • How will your improvements help this situation?
  • As always, you can record your reasons in your SCC commit messages

Breaking (Bad) Methods

Here is a question posted to stack overflow:

When is a function too long? [closed]

35 lines, 55 lines, 100 lines, 300 lines? When you should[sic] start to break it apart? I'm asking because I have a function with 60 lines (including comments) and was thinking about breaking it apart”

Breaking (Bad) Methods

Here is a question posted to stack overflow:

When is a function too long? [closed]

long_function(){ ... }

into:

small_function_1(){...}
small_function_2(){...}
small_function_3(){...}
“The functions are not going to be used outside of the long_function, making smaller functions means more function calls, etc. When would you break apart a function into smaller ones? Why?”

A Blog Post

Long methods and classes are evil

See the original here.
  1. Any method should fit into your IDE window ... I strive for an average of not more than 5 lines of code per method
  2. Too many private members .. say greater than 10
  3. The size of the code file. Offhand, I’d say any class over 10k in size ...
  4. Exception handling code and instrumentation tend to push methods to be much larger. Invest some thought in how to segregate this type of code away from the main functionality
  5. Reduce the number of public methods in a class. Just picking a number, I would say less than 10 in most cases.

Common Rules

These are some common rules:
  • “Methods should have no more than X lines of code”
  • “Classes should have no more than X methods/private variables”
  • “Files should have no more than X lines of code”

Ignore Such Rules

  • Most such rules are well intentioned
  • They are supposed to be easy to adhere to and check
  • But unless you understand the motivation behind such a rule, following it will do you no good
  • These rules tell you what not to write, but they do not explain what you should write instead
  • Not to mention the fact that most good rules have some exceptions

Example (long methods)


integer my_long_method(int input){
   int x = 0;
   ...
   // Do some stuff
   ...
   // Do some other stuff
   ...
   // Finally return
   return x;
}
  
Oh oh, this method is apparently 900 lines of code long.

Example (long methods)


void do_some_stuff(int x, int y){
    // Do some stuff
}
integer do_some_other_stuff(int x, String star){
    // Do some other stuff
    return x
}
integer my_long_method(int input){
   int x = 0;
   ...
   do_some_stuff(x, 0);
   x = do_some_other_stuff(x, input_string);
   // Finally return
   return x;
}
  
Ah good, this method is now only a few lines long

Example (long methods)

  • This is not any better, unless:
    • It aids the understanding of the method, it could obscure it
    • Any of the new smaller methods are plausibly called from elsewhere
      • Generally, if they are not currently called from elsewhere, then you have no evidence that they ever will be
      • You can wait until there is a need
      • Libraries are an obvious exception to this

Repeating Advice

  • Do not refactor and update functionality at the same time
  • If you are updating functionality and realise that a refactor would help, either:
    • Back of the current changes, refactor and merge in your new functionality or
    • Back of the current changes, throw them away, refactor and restart the new functionalityor
    • Finish the new functionality and then refactor
  • Obviously which of these you do will depend on how you have implemented and how the refactor interacts with that work

Refactoring Summary

  • Code should be factored into multiple components
  • Refactoring is the process of changing the division of components
  • Defactoring can help the process of changing the way the code is factored
  • Well factored code will be easier to understand
  • Do not update functionality at the same time

Summary

  • Refactoring is very important:
    • Refactoring allows us to avoid doing a large amount of upfront design and also avoid producing a a big hairy mess
    • Do not change functionality whilst refactoring
    • Your code should be adaptable
  • Do not optimise blindly, benchmark and profile
  • There is not a thing on this page that your source code control will not make easier

Your Proposals

  • Do not forget that your proposal can include questions to elicit feedback
  • The more specific your question is, the better the feedback can be directed
  • At the very least, give your own answer to the question

Examples

  • “Should I use language X for this project?”
    • I already have your proposal so I can answer this question
    • I might assume that your answer is yes
    • However, if you tell me why you think that, I can better explain why I agree/disagree with your line of reasoning

Examples

  • “Should I use language X for this project?
    • I know X a lot so I'm pretty confident in it
    • There is a well matured library which does ...
    • I do not think I require static typing here because ...

Any Questions