Methodology

Software Engineering Large Practical

Points of Contact

Please email me: a.d.clark@ed.ac.uk
Discussion forum: https://discuss.inf.ed.ac.uk/?q=forum/5
Ask questions in class
I'm available for at least 20 minutes in the downstairs common area after every lecture

Student Reps Meeting

“SELP the students has asked for resources to be made available to them, other than a Google search. It is an individual project that students have to work on their own and the framework is fine but some help would be good from the lecturer.”

Student Reps Meeting

Going through the student representatives is a good thing that I encourage, particularly where you have concerns
However, if you actually want a response, this is a very indirect method
I hope you can agree that the above is very difficult for me to answer effectively
I am positive that the original wording from the student has been somewhat garbled into what I have received
As a result, I'm not really sure how to address this
But I will try nonetheless

Today's Lecture

I will begin to address this concern
But first, I want to speak about your methodology, because it affects how I should address this
I will then lead into a discussion of resource options, which is incomplete
If you feel I have failed in my attempt, get back in touch

Anonymous Feedback

I understand that going through the SSLC meeting has the benefit of anonymising questions/complaints
- That is a large part of the purpose of the SSLC
You can use the discussion forum, but this is only a weak form of anonymity: I can see your matric numbers and could find out who you are if I wished (I won't, but I clearly could)
I'm also happy to accept anonymous email, e.g. via:
- http://notsharingmy.info/
- http://www.sendanonymousemail.net/
- If I can't reply to your anonymous email I'll respond via the Clarifications page

Virtual DiCE

For those of you who wish to work away from the university
You may be interested in Virtual DiCE
This is essentially the School's DiCE setup but run in a virtual machine
It can be run on any machine which can run VirtualBox
More information here: http://computing.help.inf.ed.ac.uk/vdice

Conflicting Constraints

You would like your project to be:
- Easy to maintain/modify
- Readable
- Fast and memory efficient
- Secure
Unfortunately these properties tend to be in conflict:
- Improving one means compromising on at least one other

Traditional Engineering Projects

Such as building a bridge
Begin with a lengthy design process
You cannot begin the building process until you have completed the design process
This is because modifying the design mid-way through the building is expensive (at best)

Waterfall

When people first started to build large software projects they encountered similar problems, and sought to copy the techniques of traditional engineering projects to alleviate them
This meant rigorous requirement specification and analysis
Followed by detailed design
Followed by implementation
Followed by testing

Software Development

However, we soon realised that software development is somehow unique
It has two important distinguishing properties:
1. Implementations can be copied
2. The requirements are often non-obvious

Implementations Can be Copied

Because implementations can be copied anything you do implement is likely to be new
If there were an existing solution to your problem you could just reuse that solution
This is normally not true of traditional engineering projects:
- Just because there exists a bridge somewhere does not mean you do not have to build a new one
Of course you may still build something that has been built before, but not something that you have built before

Bridge Requirements are Obvious

The requirements of a bridge, or a stadium, etc. are mostly obvious
Some requirements may not be obvious, but they are relatively easy to write down
Many engineering projects are of objects that we all use/need
- We are all relative experts in the use of houses
This is often far less true of software projects

Exhaustive Upfront Design

We have learnt that exhaustive upfront design is generally wasteful
Worse, it can be detrimental, locking you into a design that is a bad fit for the problem
We have learnt that more important than accurate design upfront is to remain flexible
Keep your design and implementation such that it can be changed

Backtracking

What all approaches to software development are trying to minimise is backtracking
That is: removing work you have done because you now realise it is wrong
Upfront design attempts to avoid this by foreseeing any problems
Iterative implementations avoid this by being flexible

Writing Software is Searching

You are searching for the correct solution
There are two ways to search:
1. Try to determine where the item is and then look there
2. Just look in many places, it must be in one of them
Obviously there is a continuum of approaches between these extremes
- Search many places, but limit the number of places to those that are plausible, likely, probable

Searching and Backtracking

Bridge Requirements are Stable

When the original requirements are changed, for example the level of traffic exceeds what was originally planned for, you build a new bridge
But software requirements tend to change frequently
- Partly because the users did not know what they wanted to begin with
- Partly because the environment around the software changes:
  - They now want it to work on their smartphone
  - They now want it to work on a Mac
  - They now do not want a Java applet/Flash application etc.

Adaptability and Moving Goal Posts

Difficulty

Knowing whether you are heading in the correct direction
- If you are heading in the wrong direction you either need to backtrack or change direction
Note: this is just as difficult whether or not you attempt to find the correct direction before starting

Refactoring

Refactoring is the process of changing code such that it computes exactly the same function (of inputs to outputs), but has a better design.
This is tremendously powerful, because it allows us to try out various designs, rather than guessing which one is the best
It allows us to determine whether something is possible, without necessarily building it in the best way
It allows us to design retrospectively once we know significant details about the problem at hand.
It allows us to avoid the cost of full commitment to a particular solution which, ultimately, fails.
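
As a minimal sketch of what this looks like in practice (Python is used here only for illustration, and the function names and basket data are hypothetical), the two versions below compute the same function of inputs to outputs, and a small check confirms that the refactor changed nothing:

```python
# Hypothetical example: the names and data are illustrative only.

def total_price_before(items):
    # Original version: works, but mixes looping, filtering and arithmetic.
    total = 0
    for item in items:
        if item["quantity"] > 0:
            total = total + item["price"] * item["quantity"]
    return total


def line_total(item):
    # Cost of a single basket line.
    return item["price"] * item["quantity"]


def total_price_after(items):
    # Refactored version: same inputs give the same outputs, clearer structure.
    return sum(line_total(item) for item in items if item["quantity"] > 0)


basket = [
    {"price": 3, "quantity": 2},
    {"price": 5, "quantity": 0},
    {"price": 7, "quantity": 1},
]

# Regression check: the refactor must not change the result.
assert total_price_before(basket) == total_price_after(basket) == 13
```

Keeping the old version around briefly, as here, is one cheap way to test a refactor; an existing test suite serves the same purpose.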

Suggested Strategy

Note that this is merely a suggested strategy
1. Start with the simplest program possible
2. Incrementally add features based on your proposal's requirements
3. After each feature is added, refactor your code
  - This step is important: it helps to avoid the risk of developing an unmaintainable mess
  - Additionally it should be done with the goal of making future feature implementations easier
  - Test that your refactor has not modified functionality

Suggested Strategy

At each stage, you always have something that works
Although you need not specifically design for later features you do at least know of them, and hence can avoid doing anything which will make those features particularly difficult.

Alternative Inferior Strategy

Design the whole system before you start
Work out all components and sub-components you will need
Start with the sub-components which have no dependencies
Complete each sub-component before moving on to the next
Once you have developed all the dependencies of a component you can now choose that component to develop
Finally, put everything together to obtain the entire system
- Test the entire system

Methodology and Resources

So, do not worry too much about making mistakes early on
Just make sure you get started
If you have made a mistake, that is not a problem; you can correct it
With that in mind, I will try to give some more resources
These necessarily depend on language choice, so I'll start there

Language Choice

The first choice you will have to make is your choice of implementation language
Even in this you are somewhat helped by the fact that web deployments are naturally split into at least client-side code and server-side code
As a result, you can, and likely will, make use of at least two programming languages
That said, implementation language is one of the few choices you have to make early on, and it is nearly irreversible

Language Choice

Languages come in many varieties; here are some of the distinctions made:
1. Compiled vs Interpreted
2. Strongly typed vs Weakly typed
3. Statically typed vs Dynamically typed
4. Functional vs Imperative
5. Object Oriented vs Classless
6. Lazy vs Eager
7. Managed vs Unmanaged
For the most part these are independent of each other giving us 2⁷ (128) possibilities
You may have already made your choice, but the information here may help you to apply your choice of language well

Compiled vs Interpreted

Many languages will claim to be either a “compiled language” or an “interpreted language”
The distinction is intended to be simple:
- Either the source code is translated into machine code and then run or:
- An interpreter reads the source code and executes each line of code dynamically

Compiled/Interpreted Language?

There is not really any such thing as a “compiled language” or an “interpreted language”
There are compiler or interpreter implementations
A language may have one particularly official implementation
Interpreters are nearly always implemented via some kind of bytecode
So we only really have compiler implementations, it is just a question of what that compiler targets, physical machines or virtual bytecode machines
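
A small sketch of this point, assuming the standard CPython implementation of Python: the "interpreter" first compiles source text into bytecode and then runs that bytecode on a virtual machine. The expression and the variable names are only illustrative.

```python
import dis

# CPython first compiles source text to a code object containing bytecode...
code = compile("x * 2 + 1", "<example>", "eval")

# ...which its virtual machine then executes.
result = eval(code, {"x": 20})

# The raw bytecode is just a sequence of bytes; dis can render it readably.
bytecode = code.co_code
instructions = [instr.opname for instr in dis.get_instructions(code)]
```

The exact instruction names differ between Python versions, so they are not listed here; calling dis.dis(code) prints them for whichever version you run.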

Compiled/Interpreted Implementations

OCaml has ocamlc (which targets bytecode) and ocamlopt (which targets native code)
Java is generally compiled to JVM bytecode, but native-code compilers such as GCJ (the GNU Compiler for Java) have also been built
C# and some other languages target the CLR (Common Language Runtime)
Python is generally interpreted but Cython exists (an optimising static compiler)

Compiled vs Interpreted

Conclusion

The distinction between compiled and interpreted is one of implementation, not of language
However, some language features lend themselves to one more easily than the other
But, increased runtime sophistication has meant that the line between compiled and interpreted has become increasingly blurred
Your language choice should probably not focus too heavily on whether the official language implementation is a compiler or an interpreter

Type Systems

Languages involve expressions which evaluate to values
It is possible to give a type to those values
We can then check that operations use values of an appropriate type
For example we may check that we are not trying to add a string to an integer: 3 + "hello"
The types may also determine what the operation is:
- Integer addition: 3 + 2
- Floating point addition: 3.0 + 2.0
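
To illustrate how the types of the operands select the operation, here is a short sketch in Python (chosen only as an example; there the selection happens at run time):

```python
int_sum = 3 + 2          # both operands are integers: integer addition
float_sum = 3.0 + 2.0    # both operands are floats: floating point addition
text = "3" + "2"         # both operands are strings: concatenation, not arithmetic
```

The same + symbol therefore denotes three different operations, and the type of each result follows from the types of the operands.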

Type Systems

Some type systems also give types to statements
For example some type systems determine what exceptions may be raised by a given command (which may be a sequence of commands)
Some such type systems oblige the user to declare these exceptions
For our purposes we will concentrate on the typing of expressions/values

Strongly typed vs Weakly typed

This is often confused with the distinction between statically and dynamically typed languages, but the two distinctions are quite separate
One can have static-strong, static-weak, dynamic-strong, dynamic-weak

Strongly typed vs Weakly typed

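Strongly: Values of the wrong, or incompatible, types cause an error:
- 3 + "5" = error, as seen in Python and OCaml
- Beware that Java and C++, although otherwise fairly strict, do accept this particular expression: in Java 3 + "5" is the string "35", and in C++ it compiles as pointer arithmetic
Weakly: Values of the wrong, or incompatible, types are converted: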
- 3 + "5" = "35" in Javascript
- 3 + "5" = 8 in PHP, Perl5, Tcl
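
A minimal sketch of the strongly typed behaviour, using Python as the example language (the helper name try_add is hypothetical): the mixed addition is rejected, and the programmer must say explicitly which conversion is intended.

```python
def try_add(a, b):
    # Attempt the addition and report a type error instead of crashing.
    try:
        return a + b
    except TypeError as error:
        return "TypeError: " + str(error)


outcome = try_add(3, "5")     # rejected: int and str are incompatible

# The programmer must choose the conversion explicitly.
as_number = 3 + int("5")      # numeric addition
as_text = str(3) + "5"        # string concatenation
```

The two explicit conversions correspond to the two different answers that the weakly typed languages above pick implicitly.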

Advantages of Strong Typing

When something goes wrong, the error is produced as soon as it is discovered
This makes it easier to investigate the source of the error
Additionally, you are less likely to calculate incorrect results
Often, incorrect results are worse than no results

Advantages of Weak Typing

Occasionally completing a computation and obtaining a result is better than obtaining no result
Even if the result you obtain is wrong
Displaying a web page wrongly is generally better than not displaying it at all
You can implement this in either a strongly or weakly typed language but it is easier in a weakly typed one

Strong vs Weak Typing

Conclusion

Backend Server Code

Chances are you should care more about getting the correct result than getting any kind of a result, so you should use a language with a strong type system
But do not confuse weak typing with other type system distinctions, such as nominative, structural, duck typing

Frontend Interface Code

More likely to care about getting any result
In any case you are ultimately limited to Javascript, though you may choose a strongly typed language that compiles to Javascript

Statically typed vs Dynamically typed

A statically typed language specifies that the typing of expressions should be done before the program is run
A dynamically typed language specifies that the typing of values should be done whilst the program is run
Not to be confused with implicit/explicit typing

I have investigated type systems, but:

[Figure not reproduced. Source: TIOBE language index]

Statically typed vs Dynamically typed

One reason to type expressions is to aid compilation
Recall the typing of the operands to an addition operator meant that we could determine what kind of addition is required
We might also need to know the size of the computed value so that we know where it might be stored
Obviously, if the purpose of the types is to aid compilation, the type checking will have to be done statically
More importantly the typing of expressions and values is done to avoid the computation of incorrect results

Advantages of Static Typing

Type errors are caught before you attempt to run the program
- This means for example that type errors should not occur mid-run on a user's machine
- Even during development, perhaps you have a program that:
  - takes seconds to compile,
  - minutes/hours to run
  - and a type error in the final printing of the result
- Using static types you will be alerted to the type error after the compile
- Using dynamic types you will be alerted at the end of a first run
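
The scenario above can be sketched in Python, a dynamically typed language. The function names are hypothetical and the "long" computation is a stand-in for one that would really take minutes or hours:

```python
def long_computation(n):
    # Stand-in for work that might take minutes or hours.
    return sum(i * i for i in range(n))


def report(n):
    result = long_computation(n)
    # Bug: concatenating a str and an int. Python only notices when this
    # line is executed, that is, after the whole computation has finished.
    return "Result: " + result


def report_fixed(n):
    # Corrected version: convert the number explicitly before printing.
    return "Result: " + str(long_computation(n))


try:
    report(1000)
    error_seen = None
except TypeError as error:
    error_seen = error
```

A static type checker would reject report before the program ever ran; here the mistake surfaces only at the very end of the run.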

Advantages of Static Typing

You may be releasing a library, which isn't “run”
- Of course you should have a test suite with 100% code coverage
- That does not always mean the tests are particularly useful
- What you should have and what you do have are not always the same
- Static typing gives you some kind of guarantee for “free”

Advantages of Dynamic Typing

Static type checking is necessarily conservative
This means it will reject some programs that ultimately would not, when run, have resulted in a type error
During development you can avoid type checking code you know will not be run. This is a subtle point (if there is time at the end I will expand on it)
You should be testing comprehensively
- If so, this lessens the gains from static typing

Two Competing Forces

When programmers learn static type systems, it often feels to them as though they are getting more program correctness for free
It seems that it is not quite free, and that a static type system does hamper productivity in the short term
It also seems likely that static type systems can save on some kinds of work in the future
The question is: does the short-term loss in productivity pay for itself with a long-term increase in productivity?

Philosophy of Typing

Just as I suggested that a language is not in itself either compiled or interpreted, it is also something of an implementation issue as to when typing is performed
However, there is generally a type system attached to each language
Some type systems are very difficult or even impossible to fully check statically
Some language designers deliberately ensure that it is possible to statically type check the language

Statically Typed vs Dynamically Typed

Conclusion

The distinction between statically typed and dynamically typed is in theory one of implementation, but in practice one of language
The distinction though is softer than some may suggest
It is more of a gradient than a dichotomy
For this project, either kind of type system will be fine
But, whichever choice you make, I recommend making use of additional static analysers
And, whichever choice you make, you should write some tests
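
As one illustration of adding a static analyser to a dynamically typed language, here is a sketch using Python's optional type annotations. The function and variable names are hypothetical, and the external checker mentioned in the comments (mypy is one such tool) is an assumption about your setup, not a requirement of the practical.

```python
from typing import List


def average(values: List[float]) -> float:
    # The annotations document intent; Python itself does not enforce them.
    return sum(values) / len(values)


mean = average([1.0, 2.0, 3.0])

# Python runs this line without complaint, but an external static checker
# such as mypy would report that an int is being assigned to a str variable.
mislabeled: str = 42
```

Because the checker is a separate tool, you can run it when you want the extra assurance and skip it during a quick development run.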

Functional vs Imperative

This distinction is somewhat disputed
The main idea is that a functional language computes values of expressions, but does not modify state
An imperative language is simply a non-functional language, that is, one which allows/encourages the programmer to directly modify state

Functional vs Imperative

It turns out that a lot of programs involve a lot of functional computation, with a very small amount of state modification
Hence, the term functional is often relaxed to include those languages that discourage state modification
More importantly, such languages encourage declarative code.
- That is, code which does not modify the state

Functional Programming

I tend to describe any language with proper support and syntax for nested, higher-order functions to be functional
A higher-order function is simply one that does at least one of the following:
- Takes one or more functions as parameters
- Returns a function as a result
In general treating functions as any other kind of value is known as providing first class functions
If the language also allows nested functions which can access the scope of containing functions, the implementation requires function closures
The provision of nested, higher-order functions usually encourages declarative programming
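
A short sketch of these ideas in Python (the names apply_twice and make_adder are hypothetical): one function takes a function as a parameter, the other returns a nested function that closes over a variable of its containing function.

```python
def apply_twice(func, value):
    # Higher-order: takes a function as a parameter.
    return func(func(value))


def make_adder(amount):
    # Higher-order: returns a function as its result.
    def add(value):
        # Nested function: 'amount' comes from the enclosing scope,
        # so the returned function is a closure.
        return value + amount
    return add


add_three = make_adder(3)
```

Here add_three is an ordinary value: it can be stored, passed to apply_twice, or handed to map like any other argument.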

Functional Programming

Languages which entirely forbid state updates I describe as strictly functional
Even this is a little confusing because some people describe eager evaluation as strict evaluation
So I might also say a pure functional language or simply a pure language

Functional Advantages

The key advantage of a functional programming language is the hugely pretentious phrase “referential transparency”
I'm not sure, but I suspect this phrase is one reason functional programming languages are not more widely adopted
It means that an expression evaluates to the same result regardless of the time, or state, in which it is evaluated
In particular, invoking a function some_fun(args) with the same arguments args will always produce the same result
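
To make the contrast concrete, here is a minimal Python sketch with hypothetical names: area is referentially transparent, whereas next_id is not, because its result depends on hidden state that it also modifies.

```python
def area(width, height):
    # Referentially transparent: the result depends only on the arguments.
    return width * height


counter = 0


def next_id():
    # Not referentially transparent: reads and modifies hidden state,
    # so two identical calls give different results.
    global counter
    counter += 1
    return counter


first_id, second_id = next_id(), next_id()
```

Any call area(3, 4) can be replaced by its value without changing the program, which is not true of a call to next_id().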

Functional Advantages

This makes testing and/or reasoning about the correctness of code much easier
In theory, it means code is more re-usable
- This is debatable, and not, to my knowledge, demonstrated (either way) satisfactorily
- But it's certainly plausible

Functional Advantages

In theory, this additionally allows for some interesting compiler optimisations, consider the following double transformation over a list of items:


some_list = map f (map g original_list)

This is common in both functional and imperative languages, even if in imperative languages it is an array which is looped over.

The double map can be rewritten as the faster:


fg = f . g
some_list = map fg original_list

Where f . g is the composition of the two functions. This is faster because it only loops over the list once.
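
The same rewrite can be sketched in Python. The helper compose and the functions double and increment are hypothetical stand-ins for f . g, f and g, and the intermediate list is built explicitly so that the two passes are visible:

```python
def compose(f, g):
    # Plays the role of (f . g): apply g first, then f.
    return lambda x: f(g(x))


def double(x):
    return x * 2


def increment(x):
    return x + 1


original_list = [1, 2, 3]

# Two passes: an intermediate list is built and then traversed again.
intermediate = [increment(x) for x in original_list]
two_passes = [double(y) for y in intermediate]

# One pass: the composed function is applied to each item once.
fg = compose(double, increment)
one_pass = [fg(x) for x in original_list]
```

Both versions yield the same list; the second avoids the intermediate list and the second traversal.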

However, this optimisation changes the order of execution, so it is only applicable where f does not modify state which g references, or vice versa. In a functional language this is both more likely and easier to check automatically.
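
The change in execution order can be made visible with a small Python sketch, in which hypothetical versions of f and g record each call in a shared log (itself a piece of state):

```python
log = []


def g(x):
    log.append("g")   # side effect: record the call
    return x + 1


def f(x):
    log.append("f")   # side effect: record the call
    return x * 2


def two_passes(items):
    # All calls to g happen before any call to f.
    intermediate = [g(x) for x in items]
    return [f(y) for y in intermediate]


def one_pass(items):
    # Calls to g and f are interleaved, item by item.
    return [f(g(x)) for x in items]


log.clear()
result_a = two_passes([1, 2])
order_a = list(log)

log.clear()
result_b = one_pass([1, 2])
order_b = list(log)
```

The returned lists agree, but the logs do not, so if f or g depended on the other's side effects the rewrite could change the program's behaviour.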

Similarly if you have multiple processors, you could begin the second map operation in parallel as soon as the first transforms the first item.
Again, only if you can determine that there are no state dependencies.
In general parallel programming can in theory be advanced by limiting state updates

Imperative Advantages

With no state modifications all information required by any function must be passed in as an argument
This can arguably make the code more complicated
Worse, it can require a large refactoring in order to make a relatively simple change

Imperative Advantages

However, recall that my definition of a functional programming language did allow for state modifications.
It only required nested, higher order functions
It's hard to argue that not providing these is an advantage to the programmer
One could argue that the implementation (of the language) is simpler
- It is debatable, but one can certainly argue that the implementation of nested higher-order functions incurs a performance penalty
- Functions are more heavyweight and hence more expensive to invoke

Functional Web Development

Most of the state changes and flow-of-control will be done by the web server not your web application code
Your web application code will be providing answers to specific requests
This makes a functional language somewhat appealing
However, this also means the implementation of web server framework tends to be awkward
- which in turn means there are fewer of them

Functional vs Imperative

Conclusion

You could certainly use either a functional or an imperative language for this practical
You're probably best off with whichever you prefer
There are arguably better resources for imperative languages

Object Oriented vs Classless

Given my glowing recommendation for higher-order functions why are they not more commonly used?
Classes, or objects, allow for a similar abstraction
An object is really a collection of state together with operations over that state
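
To show how similar the two abstractions are, here is a Python sketch (the names Counter and make_counter are hypothetical) in which the same piece of state with one operation is written first as a class and then as a closure:

```python
class Counter:
    # Object: state (count) bundled with an operation over it (increment).
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


def make_counter():
    # Closure: the same state, held in the enclosing function's scope.
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


object_counter = Counter()
closure_counter = make_counter()
```

Calling object_counter.increment() and closure_counter() repeatedly produces the same sequence; the difference is mainly one of syntax and of how many operations share the state.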

Typical Class Definition


class ClassName (ParentClass){
   // State: each object carries these members
   classmember_1 = 0;
   classmember_2 = "hello";

   // Operations over that state
   void class_method_1(int i){
       self.classmember_1 += i;
   }
   void class_method_2(String suffix){
       print_to_screen(self.classmember_2);
       print_to_screen(suffix);
   }
}

Object Oriented Languages Popularity

Category	Ratings Sep 2013	Delta Sep 2012
Object-Oriented Languages	56.0%	-1.1%
Procedural Languages	37.3%	-0.9%
Functional Languages	3.8%	+0.6%
Logical Languages	3.0%	+1.3%

Source: TIOBE language index

Advantages of Object-Oriented

Surprisingly debatable
Most people agree that there is some value in object-oriented programming
But when asked to give concrete advantages, most offer:
- Vague perceived benefits, with no logic connecting to OOP:
  - Advances reuse
  - Better models the real world
- Clear benefits but which are not unique to OOP:
  - Polymorphism (fancy word for a specific kind of generality)
  - Encapsulation (fancy word for hiding/abstraction)

Advantages of Classless

No one really argues that the provision of classes is inherently destructive
In a similar way to higher-order functions, having the ability to utilise classes does not do any harm if you never use them
However, once the temptation is there, it's easy to go class crazy
But such arguments are not arguments against the use of an object-oriented language, so much as an argument for careful use of classes

Object Oriented vs Classless

Conclusion

By all means choose an object-oriented language
There is little reason not to, but pure languages often do not have a notion of an object
- This is for good reason and should not put you off choosing a pure language
If you do choose an object-oriented language, use your classes with care
Classes are just one way of organising source code.
- There are others which are just as effective
- Using an OOP language will not magically organise your source code for you

Lazy vs Eager

Conclusion

I'll not get into this debate, but I'm happy to discuss this after the lecture with anyone interested
Main conclusion: either is fine

Managed vs Unmanaged

Automatic memory management, sometimes called garbage collection
Without this, whenever you need to store a value in memory, you must first ask for the space in memory
When a value in memory is no longer useful, you should give back the space in memory that it used
If you let the last reference to a value go out of scope without freeing the associated memory, you can never free it, and hence you have a space leak
Unfortunately, if you give back the memory too soon, you may subsequently try to reference the value; this may cause a segmentation fault

Advantages of Memory Management

You need not manage the memory yourself; this is hugely liberating
I believe there is much gained productivity associated with:
- Object Oriented Languages
- Dynamically/Statically typed languages
- Lazy languages
- Reflection
which is actually gained productivity from automatic memory management which has been misattributed to the above
I'm not saying these things do not also improve productivity

Advantages of Memory Management

I can say f(g(x)) and not have to think about whether the intermediate result produced by g needs to be cleaned-up
I can return from anywhere I like in the middle of a method, without worrying about all paths re-joining to free-up used memory
- Honestly: “Only One Return” was a common coding rule
- Sometimes called “Single Entry, Single Exit”

Advantages of Manual Memory Management

Nostalgia
In theory you can implement manual memory management more efficiently
- This is a bit debatable
- In any case, the improved productivity gained through the use of an automatic garbage collector can be put to use in optimising the rest of your code
- In particular better algorithms rather than faster implementation of the same one

Advantages of Manual Memory Management

Predictability: it can be difficult to know when the garbage collector might run
- So real-time systems which must respond to incoming external events may suffer
- But there is much research into automatic garbage collection, and real-time garbage collectors do exist

Managed vs Unmanaged

Conclusion

Choose a managed language
If you are only familiar with an unmanaged language either:
- learn a new managed one or
- use a conservative garbage collector

Other Distinctions

Low-level vs High-level
- This is mostly a distinction made from a combination of those above
Significant Whitespace or not:
- Personally I love it, but it is syntax; it does not matter
- If it bothers you that much you can always write a parser for a different syntax
Scripting vs Systems:
- If you must distinguish these you can interoperate between them

What is the Best Language?

Main Conclusion

In general, it is less what the language provides and more what libraries are available in that language
That will of course depend largely on your proposal

Language Choice

Conclusions: Server Code

My hard advice can be summarised as:
- It is likely better for you to choose a strongly typed language
- Choose a language with automatic memory management
A good idea is to use your report to explain your reasons for your language choices
A perfectly valid reason is:
- “Language X is my favourite language which I know better than all others”

Language Choice

Obvious Choices: Promising

Python, Ruby: Dynamically, strongly typed; well supported with many libraries; very widely used for web application development; many frameworks to choose from, both heavyweight and lightweight
Java, C#: Statically, strongly typed; well supported with many libraries; tend to have more heavyweight frameworks

Language Choice

Obvious Choices: Questionable

Haskell, OCaml: Statically, strongly typed; elegant languages which should be well suited to web application development, but have much less support and fewer libraries/frameworks; frameworks tend to be heavyweight and inflexible
PHP: Dynamically, weakly typed; very popular for web site development, but almost universally hated; well supported by web application frameworks; its success remains a mystery

Language Choice

Obvious Choices: Avoid

Perl: Once the web language of choice, now outdated
C, C++: Avoid.

Framework Choice

As I said in the second lecture, generally two categories:
- Heavyweight: Those that include everything
- Lightweight: Those that do just the routing part and let you choose libraries for the rest
Both are reasonable choices, but heavyweight frameworks tend to be less flexible, meaning that switching once started is more difficult or requires more work

Resources

Any Questions

Example of Subtle Point

Suppose you have a method to create some data type:


void create_character(int initial_health){ ... }

You realise some new feature requires a second parameter:


void create_character(int initial_health, Gender gender){ ... }

You have a small test case to test your new feature, which you know will only call this method once, say at the start

Example of Subtle Point

Unfortunately calls to this method are spread throughout your code


void restart_game (...){
  ... create_character(100); ... }
void respawn(...){
  ... create_character(80); ... }
void duplicate_cheat(...){
  ... create_character(100); ... }

But you know none of these will get called in your small test case.

Example of Subtle Point

With a static compiler you will have no choice but to update each call anyway


void restart_game (...){
  ... create_character(100, character.current_gender); ... }
void respawn(...){
  ... create_character(80, character.current_gender); ... }
void duplicate_cheat(...){
  ... create_character(100, character.current_gender); ... }

Furthermore, your new feature might not work so you might revert the change
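
By contrast, in a dynamically typed language the stale call sites only fail if they are actually executed. A minimal Python sketch follows; create_character and restart_game come from the example above, while the dictionary representation, the string "female" and small_test are assumptions made for illustration.

```python
def create_character(initial_health, gender):
    # New two-parameter version required by the new feature.
    return {"health": initial_health, "gender": gender}


def restart_game():
    # Stale call site: still uses the old one-argument form.
    return create_character(100)


def small_test():
    # The only call that the new feature's small test exercises.
    return create_character(100, "female")


# The small test runs fine, because restart_game is never called.
character = small_test()

# The stale call only fails when (and if) it is actually executed.
try:
    restart_game()
    stale_call_error = None
except TypeError as error:
    stale_call_error = error
```

This lets you try out the feature before committing to updating every call, at the price of having to find the stale calls yourself later.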

Example of Subtle Point

Worse, you might not yet have reasonable values so you just do this:


void restart_game (...){
  ... create_character(100, None); ... }
void respawn(...){
  ... create_character(80, None); ... }
void duplicate_cheat(...){
  ... create_character(100, None); ... }

But now, once you have completed your new feature, the static type checker is of no help in finding all the places where you need to update your calls to create_character

Example of Subtle Point

Some languages have optional parameters or default arguments:


void create_character(int initial_health, Gender gender=Female){ ... }

But not all do and the same arguments apply for similar situations with changes to types, classes, interfaces, abstract classes etc.
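
For a language that does have default arguments, the change might be sketched as follows in Python. The Gender enumeration and the dictionary returned are hypothetical; only the function name and the Female default come from the example above.

```python
from enum import Enum


class Gender(Enum):
    # Hypothetical enumeration standing in for the Gender type above.
    FEMALE = "female"
    MALE = "male"


def create_character(initial_health, gender=Gender.FEMALE):
    # Existing one-argument calls keep working via the default value.
    return {"health": initial_health, "gender": gender}


old_style = create_character(100)              # untouched call site
new_style = create_character(80, Gender.MALE)  # call site using the new feature
```

The old call sites continue to run unchanged, although they silently receive the default, which may or may not be what the new feature needs.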

Soft Typing

Soft typing is something of a compromise between static and dynamic typing
The idea of soft typing is to statically type as much of the program as is possible
Where the type system cannot determine that an expression or operation will never cause a type error, it inserts a run-time check
In this sense a dynamic type system is an extreme example of a soft-typing system that is not very good at determining any expressions which will never produce a type error

Soft Typing

In a sense many of our supposedly static type systems are in fact soft type systems which need few checks
Commonly, array indexes are not statically checked to be within the bounds of the size of the array
Instead a dynamic run-time check is inserted for this purpose
Additionally cast operations are generally checked at runtime as they cannot be statically checked to be valid

Static Analysers

When a type is not used by the compiler, then ultimately the static type checker is simply a static analyser
We can deploy many static analysers
We can also choose not to run any or all of them during a development run
Personally, I'm a big fan of static analysers
Static type systems are no exception, but I think they should be optional