Refactoring
Software Engineering Large Practical
Software Development Employment
- Nearly all job descriptions include some mention of experience
- But note: few of them specify experience of success
- Experience of both success and failure are both important
- Hence, do not be afraid to be wrong
- The trick is to recognise failure
- Unfortunately this is much harder than it may seem
Failure
- Failure is a necessary part of learning
- And certainly a necessary part of mastery
- If you are not failing regularly you are either:
- Not being ambitious enough and/or
- being very lucky
- In either case, you are not learning much
Failure
Source Code Control
- This is why SCC is so well loved
- It assists in making lots of micro-failures that can be thrown away
- Rather than one big failure
- Whenever you are facing a tough choice (or even an easy one):
- Start a new branch and just try something
- If it does not work, you can just throw the branch away
Chess
How long does it take to learn to play chess?
Chess
How long does it take to learn to play chess?
- Typically two or three answers:
- About 20 minutes
- You never stop
- 15 or so years, 10k hours
Expert in 10K Hours
- Refers to the idea
that it takes approx 10K hours to become
an expert in anything
- So if you wish to do this in one year, you are looking at: ~27 hours a day
- You might want to budget for more than a year
- Even at four years that is ~7 hours a day
- 9-5 with an hour for lunch every day
Note, the idea is debateable, seems to refer to being world-class rather
than “merely” an expert. You might also need some innate talent as well.
Programming Java?
- Apparently 21 days
- Just as for Chess, you can learn the rules of programming, relatively quickly
- But it takes much longer to learn to program well
- So what is it that you are learning in all this time?
Recall: Adaptability
Adaptability
- This is a great sentiment, but it is horribly vague
- What do we mean by code being adaptable?
- And how do we achieve this?
- How do we even know if we have achieved it?
Adaptability
- That is a large fraction of what you will learn in your 10K hours
- That is why failure is so important
- Only when you write non-adaptable code and experience the results will you appreciate how to (attempt to) avoid it
- I cannot teach you this in a lecture, otherwise it would not take you 10K hours to learn
- However, I can tell you two important strategies:
- Refactoring
- Testing
Refactoring
- Refactoring is the process of changing code such that it computes
exactly the same function (of inputs to outputs), but has a better
design.
- This is tremendously powerful, because it allows us to try out
various designs, rather than guessing which one is the best
- It allows us to determine whether something is possible, without
necessarily building it in the best way
- It allows us to design retrospectively once we know significant
details about the problem at hand.
- It allows us to avoid the cost of full commitment to a particular
solution which, ultimately, fails.
Janitorial Work
- Wish to discuss two points:
- Real and Logical Time
- How to break a rule
- To do so I'll need the notion of janitorial work
- Examples of Janitorial
- Reformating
- Commenting
- Changing Names
- Tightening
Janitorial Work
Reformating
void method_name (int x)
{
return x + 10;
}
Becomes:
void method_name(int x) {
return x + 10;
}
- There is plenty of software which will do this for you
- Reformatting should be largely unnecessary, if you keep your code
formatting correctly in the first place
- More commonly required on group projects
Janitorial Work
Reformating
- Refomatting is entirely superficial
- It is important to consider when you apply this
- Reformatting can result in a large ‘diff’
- This may well conflict with other work performed concurrently
Janitorial Work
Commenting
Janitorial Work
Commenting
Ultra bad:
// increment x
x += 1;
Better:
// Since we now have an extra element to consider
// the count must be incremented
x += 1;
Janitorial Work
Changing Names
- The previous example used
x
as a variable name
- Unless it really is the x-axis of a graph, choose a better name
- This is of course better to do the first time around
- However as with commenting, unclear code can often be more obvious to
its author upon later reading it
// Since we now have an extra element to consider
// the count must be incremented
number_of_set_elements += 1;
Janitorial Work
Tightening
void main(...){
run_simulation();
}
Tightened to become:
void main(...){
try{
run_simulation();
} catch (FileNotFoundException e) {
// Explain to the user ..
}
}
Janitorial Work
Tightening
void compute_equation(int z){
// .. complex calculation
return result;
}
Tightened to become:
void compute_equation(int z){
if (z < 0){
raise // Some horrifying exception
}
// .. complex calculation
return result;
}
Janitorial Work
Tightening
- For some this is not janitorial work, since it actually changes in
a non-superficial way the function of the code
- I place it here, since similar to other forms it is often
caused by being unable to think of everything when writing new code
- And similarly you often come across such slackness when re-reading your code for some other purpose
Janitorial Work
- Most of this work is work that arguably could have been done right
the first time around when the code was developed
- However, when developing new code, you have limited cognitive capacity
- You cannot think of everything when you develop new code, janitorial work
is your time to rectify the minor stuff you forgot
- Better than trying to get it right first time is making sure you
later review your code
Janitorial Work
- “Refactoring is the process of changing code such that it computes
exactly the same function (of inputs to outputs), but has a better
design.”
- Strictly speaking janitorial work is not refactoring
- It should not change the function of the code
- Tightening might, but generally for exceptional input
- But neither does it make the design any better
- In common with refactoring:
- You are making existing code easier to understand/use
- You should not perform janitorial work on
pre-existing code whilst developing new code
Janitorial Work
- Suppose I'm implementing some new feature and I come across this
// prase the 'validate' command
It's tempting to fix it right now and you should
// parse the 'validate' command
So how do I follow these two bits of advice?
How do I “fix small things right now” whilst
also avoiding “doing two things at once”
Real and Logical Time
- The answer is, I fix it right now in real time, but
use SCC to avoid doing two things at once in logical time
- You should be on a development branch and do this:
$ git checkout master # go back to the master branch
$ editor mycode.cobol # Fix the typo
$ git commit -a -m "Fixed a prase->parse typo"
$ git checkout dev # go back to your development branch
$ git rebase master # pretend you fixed the typo before
Do not use cobol, that's just a joke
Real and Logical Time
- If you just fix it right now in the midst of current development
- You may throw away the current development
- So you would also be throwing away the small fixes you have done
How to Break a Rule
- “you should not perform janitorial work on
pre-existing code whilst developing new code”
- If this is “too much work right now” then just fix the
typo rather than leave it as is
- In other words, if you must break the rule, break it such that the
code is still fixed
- This is especially true if the fix is some form of tightening
- Of course if the fix itself is too much work for right now, then
it should go in a bug tracker
Finally
Janitorial Work
- It will not do you any harm to use the phrase
“janitorial work” in your commit messages
More About Refactoring
- Refactoring is a term which encompasses both factoring and defactoring
- Generally the principle is to make sure that code is written exactly once
- We hope for zero duplication
- However, we would also like for our code to be as simple and
comprehensible as possible
Factoring and Defactoring
- We avoid duplication by writing re-usable code
- Re-usuable code is generalised
- Unfortunately, this often means it is more complicated
- Factoring: is the process of removing common or replaceable
units of code, usually in an attempt to make the code more general
- Defactoring: is the opposite process: specialising a unit of code
usually in an attempt to make it more comprehensible
Factoring
void primes(int limit){
integer x = 2;
while (x <= limit){
boolean prime = true;
for (i = 2; i < x; i++){
if (x % i == 0){ prime = false; break; }
}
if (prime){ System.out.println(x + " is prime"); }
}
}
A very naive but perfectly reasonable bit of code to print out a set of
prime numbers up to a particular limit
Factoring
void print_prime(int x){
System.out.println(x + " is prime");
}
void primes(int limit){
x = 2;
while (x <= limit){
... // as before
if (prime){ print_prime(x); }
}
}
Here we have “factored out” the code to print the prime number
to the screen. This may make it more readable,
but I have not made the code more general.
Factoring
To make it more general we have to actually parameterise what we do with
the primes once we have found them.
interface PrimeProcessor{
void process_prime(int x);
}
class PrimePrinter implements PrimeProcessor{
public void process_prime(int x){
System.out.println(x + " is prime");
}
}
void primes(int limit, PrimeProcessor p){
x = 2;
while (x <= limit){
... // as before
if (prime){ p.process_prime(x); }
}
}
Factoring
If I wish to store the primes instead:
class PrimeRecorder implements PrimeProcessor{
public LinkedList primes;
public PrimeRecorder(){
self.primes = new LinkedList();
}
public void process_prime(int x){
self.primes.append(x);
}
}
Factoring
I can go further and factor out the testing as well:
interface PrimeTester{
boolean is_primes(int x);
}
class NaivePrimeTester implements PrimeTester{
public boolean is_prime(int x){
for (i = 2; i < x; i++){
if (x % i == 0){ return false; }
}
return true;
}
}
void primes(int limit, PrimeTester t, PrimeProcessor p){
x = 2;
while (x <= limit){
if (t.is_prime(p)){ p.process_prime(x); }
}
}
Factoring
Now that I've factored out the test, it does not have to be used solely for
primes
interface IntTester{
boolean property_holds(int x);
}
class NaivePrimeTester implements IntTester{
public boolean property_holds(int x){
for (i = 2; i < x; i++){
if (x % 2 == 0){ return false; }
}
return true;
}
} // Similarly for PrimeProcessor to IntProcessor
void number_seive(int limit, IntTester t, IntProcessor p){
x = 0;
while (x <= limit){
if (t.property_holds(p)){ p.process_integer(x); }
}
}
Factoring
Print the perfect numbers:
interface IntTester{
boolean property_holds(int x);
}
class PerfectTester implements IntTester{
public boolean property_holds(int x){
return (sum(factors(x)) == x);
}
} // Similarly for PerfectProcessor
void number_seive(int limit, IntTester t, IntProcessor p){
x = 0;
while (x <= limit){
if (t.property_holds(p)){ p.process_integer(x); }
}
}
Factoring
We might find the two extra parameters a bit ugly, no problem:
public abstract class NumberSeive{
abstract boolean property_holds(int x);
abstract void process_integer(int x);
abstract int start_number;
void number_seive(int limit){
x = self.start_number;
while (x <= limit){
if (self.property_holds(p)){ self.process_integer(x); }
}
}
}
Factoring
Here is the code for printing the primes:
public class PrimeSeive inherits NumberSeive{
public boolean property_holds(int x){
for (i = 2; i < x; i++){
if (x % 2 == 0){ return false; }
} return true; }
void process_integer(int x) { System.out.println (x + " is prime!"); }
int start_number = 2;
}
Factoring
Print the perfect numbers:
public class PerfectSeive inherits NumberSeive{
public boolean property_holds(int x){
return (sum(factors(x)) == x); }
void process_integer(int x) { System.out.println (x + " is perfect!"); }
int start_number = 2;
}
Factoring
So which version do we prefer? This one:
public abstract class NumberSeive{
abstract boolean property_holds(int x);
abstract void process_integer(int x);
abstract int start_number;
void number_seive(int limit){
x = self.start_number;
while (x <= limit){
if (self.property_holds(p)){ self.process_integer(x); }
}}} // Close all the scopes
public class PrimeSeive inherits NumberSeive{
public boolean property_holds(int x){
for (i = 2; i < x; i++){
if (x % 2 == 0){ return false; }
} return true; }
void process_integer(int x) { System.out.println (x + " is prime!"); }
int start_number = 2;
}
Factoring
Or the original version?
void primes(int limit){
integer x = 2;
while (x <= limit){
boolean prime = true;
for (i = 2; i < x; i++){
if (x % 2 == 0){ prime = false; break; }
}
if (prime){ System.out.println(x + " is prime"); }
}
}
Factoring
Something in between?
LinkedList get_primes(int limit){
int x = 2; LinkedList results = new LinkedList();
while (x <= limit){
boolean prime = true;
for (i = 2; i < x; i++){
if (x % 2 == 0){ prime = false; break; }
}
if (prime){ results.append(x); }
}
}
void primes(int limit){
for x in get_primes(limit){
System.out.println(x + " is prime");
}
}
Factoring
- The answer of course depends on the context
- How likely am I to need more number seives?
- How likely am I to do something other than print the primes?
- The compromise is very adaptable and also rather comprehensible:
- It is likely slower for printing the primes out
- Unless your application is likely to process a large number of
primes or other numbers, does it matter how efficient it is?
Defactoring
- Numbers such as the number 20 can be factored in different ways
- If we have the factors 2 and 10, and realise that we want the number
4 included in the factorisation we can either:
- Try to go directly by multiplying one factor and dividing the other
- Defactor 2 and 10 back into 20 and then divide 20 by 4
Defactoring
- Similarly your code is factored in some way
- In order to obtain the factorisation that you desire, you may have
to first defactor some of your code
- This allows you to factor down into the desired components
- This is often easier than trying to short-cut across factorisations
Sieve of Eratosthenes
- Create a list of consecutive integers from 2 to n: (2, 3, 4, ..., n)
- Initially, let p equal 2, the first prime number
- Starting from p, count up in increments of p and mark each of these
numbers greater than p itself in the list
- These will be multiples of p: 2p, 3p, 4p, etc.;
note that some of them may have already been marked.
- Find the first number greater than p in the list that is not marked
- If there was no such number, stop
- Otherwise, let p now equal this number (which is the next prime),
and repeat from step 3
Sieve of Eratosthenes
void primes(int limit){
LinkedList prime_numbers = new LinkedList();
boolean[] is_prime = new Array(limit, true);
for (int i = 2; i ‹ Math.sqrt(limit); i++){
if (is_prime[i]){
prime_numbers.append(i);
for (j = i * i; j ‹ limit; j += i){
is_prime[j] = false;
}
I can probably do this via our abstract number sieve class, but I doubt
I want to. The alternative is to defactor back to close to our original
version and then factor the way we want it.
Defactoring
- Defactoring then can be used as the first step of refactoring
- It might also simply be that you feel the current factored version is
over-engineered
- Flexibility is great, but it is generally not without cost
- The cognitive cost associated with understanding the more
abstract code
- If the flexibility is not now or unlikely to become required
then it might be worthwhile defactoring
- Don't be shy in explaining your reasoning in comments and commit messages
Refactoring Can Better Document
What does this code do?
r = g.nextDouble();
d = 1.0 - (1.0/x * Math.log(r));
System.out.println (d);
Refactoring Can Better Document
Better?
dice = generator.nextDouble();
delay = 1.0 - (1.0/rate * Math.log(dice));
System.out.println (delay);
Refactoring Can Better Document
How about now?
// Choose a delay from the exponential distribution given the rate
dice = generator.nextDouble();
delay = 1.0 - (1.0/rate * Math.log(dice));
System.out.println (delay);
Refactoring Can Better Document
Do I need a comment?
double calculate_exponential_delay(double rate, Random generator){
dice = generator.nextDouble();
delay = 1.0 - (1.0/rate * Math.log(dice));
return delay;
}
System.out.println (calculate_exponential_delay(rate));
Even if the method is defined elsewhere and you only see the print line
System.out.println (calculate_exponential_delay(rate));
Refactoring Can Better Document
System.out.println (calculate_exponential_delay(rate));
- If your code is highly coupled it will be difficult to extract such
self-documenting fragments
- In this case, you have code you should try to re-arrange first before
factoring out
- If your factored out method has a ridiculously long name, or many
parameters it is a good sign that it is not worth factoring out:
xs = calculate_exponential_delays_from_global_events(rate_function,
generator ,
...);
Have a Clear Goal
Refactoring
- Always have a clear goal for your refactoring
- Why is the current code inappropriately factored?
- How will your improvements help this situation?
- As always, you can record your reasons in your SCC commit messages
Breaking (Bad) Methods
Here is a question posted to stack overflow:
When is a function too long? [closed]
“35 lines, 55 lines, 100 lines, 300 lines?
When you should[sic] start to break it apart?
I'm asking because I have a function with 60 lines (including comments)
and was thinking about breaking it apart”
Breaking (Bad) Methods
Here is a question posted to stack overflow:
When is a function too long? [closed]
long_function(){ ... }
into:
small_function_1(){...}
small_function_2(){...}
small_function_3(){...}
“The functions are not going to be used outside of the long_function, making smaller functions means more function calls, etc.
When would you break apart a function into smaller ones? Why?”
A Blog Post
Long methods and classes are evil
See the original here.
- Any method should fit into your IDE window ...
I strive for an average of not more than 5 lines of code per method
- Too many private members .. say greater than 10
- The size of the code file. Offhand, I’d say any class over 10k in size ...
- Exception handling code and instrumentation tend to push methods to be much
larger. Invest some thought in how to segregate this type of code away from
the main functionality
- Reduce the number of public methods in a class. Just picking a number,
I would say less than 10 in most cases.
Common Rules
These are some common rules:
- “Methods should have no more than X lines of code”
- “Classes should have no more than X methods/private variables”
- “Files should have no more than X lines of code”
Ignore Such Rules
- Most such rules are well intentioned
- They are supposed to be easy to adhere to and check
- But unless you understand the motivation behind such a rule,
following it will do you no good
- These rules tell you what not to write, but they do not explain
what you should write instead
- Not to mention the fact that most good rules have some exceptions
Example (long methods)
integer my_long_method(int input){
int x = 0;
...
// Do some stuff
...
// Do some other stuff
...
// Finally return
return x;
}
Oh oh, this method is apparently 900 lines of code long.
Example (long methods)
void do_some_stuff(int x, int y){
// Do some stuff
}
integer do_some_other_stuff(int x, String star){
// Do some other stuff
return x
}
integer my_long_method(int input){
int x = 0;
...
do_some_stuff(x, 0);
x = do_some_other_stuff(x, input_string);
// Finally return
return x;
}
Ah good, this method is now only a few lines long
Example (long methods)
- This is not any better, unless:
- It aids the understanding of the method, it could obscure it
- Any of the new smaller methods are plausibly called from elsewhere
- Generally, if they are not currently called from elsewhere,
then you have no evidence that they ever will be
- You can wait until there is a need
- Libraries are an obvious exception to this
Repeating Advice
- Do not refactor and update functionality at the same time
- If you are updating functionality and realise that a refactor would help, either:
- Back of the current changes, refactor and merge in your new functionality or
- Back of the current changes, throw them away, refactor and restart the new functionalityor
- Finish the new functionality and then refactor
- Obviously which of these you do will depend on how you have implemented and how the refactor interacts with that work
Refactoring Summary
- Code should be factored into multiple components
- Refactoring is the process of changing the division of components
- Defactoring can help the process of changing the way the code is factored
- Well factored code will be easier to understand
- Do not update functionality at the same time
Summary
- Refactoring is very important:
- Refactoring allows us to avoid doing a large amount of upfront
design and also avoid producing a
a big hairy mess
- Do not change functionality whilst refactoring
- Your code should be adaptable
- Do not optimise blindly, benchmark and profile
- There is not a thing on this page that your source code control will
not make easier
Your Proposals
- Do not forget that your proposal can include questions to elicit feedback
- The more specific your question is, the better the feedback can be directed
- At the very least, give your own answer to the question
Examples
- “Should I use language X for this project?”
- I already have your proposal so I can answer this question
- I might assume that your answer is yes
- However, if you tell me why you think that, I can better
explain why I agree/disagree with your line of reasoning
Examples
- “Should I use language X for this project?
- I know X a lot so I'm pretty confident in it
- There is a well matured library which does ...
- I do not think I require static typing here because ...
”