Methodology

Software Engineering Large Practical

Points of Contact

  • Please email me: a.d.clark@ed.ac.uk
  • Discussion forum: https://discuss.inf.ed.ac.uk/?q=forum/5
  • Ask questions in class
  • I'm available for at least 20 minutes in the downstairs common area after every lecture

Student Reps Meeting

“SELP the students has asked for resources to be made available to them, other than a Google search. It is an individual project that students have to work on their own and the framework is fine but some help would be good from the lecturer.”

Student Reps Meeting

  • Going through the student representatives is a good thing that I encourage, particularly where you have concerns
  • However, if you actually want a response, this is a very indirect method
  • I hope you can agree that the above is very difficult for me to answer effectively
  • I am positive that the original wording from the student has been somewhat garbled into what I have received
  • As a result, I'm not really sure how to address this
  • But I will try nonetheless

Today's Lecture

  • I will begin to address this concern
  • But first, I want to speak about your methodology, because it affects how I should address this
  • I will then lead into a discussion of resource options, which is incomplete
  • If you feel I have failed in my attempt, get back in touch

Anonymous Feedback

  • I understand that going through the SSLC meeting has the benefit of anonymising questions/complaints
    • That is a large part of the purpose of the SSLC
  • You can use the discussion forum; this is a weak form of anonymity, in that I can see your matric numbers and could find out who you are if I wished (I won't, but I clearly could)
  • I'm happy to accept anonymous email: eg.

Virtual DiCE

  • For those of you who wish to work away from the university
  • You may be interested in Virtual DiCE
  • This is essentially the School's DiCE setup but run in a virtual machine
  • It can be run on any machine which can run VirtualBox
  • More information here: http://computing.help.inf.ed.ac.uk/vdice

Conflicting Constraints

  • You would like your project to be:
    • Easy to maintain/modify
    • Readable
    • Fast and memory efficient
    • Secure
  • Unfortunately these properties tend to be in conflict:
    • Improving one means compromising on at least one other

Traditional Engineering Projects

  • Such as building a bridge
  • Begin with a lengthy design process
  • You cannot begin the building process until you have completed the design process
  • This is because modifying the design mid-way through the building is expensive (at best)

Waterfall

  • When people first started to build large software projects they encountered similar problems, and sought to copy the techniques of traditional engineering projects to alleviate them
  • This meant rigorous requirement specification and analysis
  • Followed by detailed design
  • Followed by implementation
  • Followed by testing

Software Development

  • However, we soon realised that software development is somehow unique
  • It has two important distinguishing properties:
    1. Implementations can be copied
    2. The requirements are often non-obvious

Implementations Can be Copied

  • Because implementations can be copied, anything you do implement is likely to be new
  • If there were an existing solution to your problem you could just reuse that solution
  • This is normally not true of traditional engineering projects:
    • Just because there exists a bridge somewhere does not mean you do not have to build a new one
  • Of course you may still build something that has been built before, but not something that you have built before

Bridge Requirements are Obvious

  • The requirements of a bridge, or a stadium, etc. are mostly obvious
  • Some requirements may not be obvious, but they are relatively easy to write down
  • Many engineering projects are of objects that we all use/need
    • We are all relative experts in the use of houses
  • This is often far less true of software projects

Exhaustive Upfront Design

  • We have learnt that exhaustive upfront design is generally wasteful
  • Worse, it can be detrimental by locking your design into a bad fit for the solution
  • We have learnt that more important than accurate design upfront is to remain flexible
  • Keep your design and implementation such that it can be changed

Backtracking

  • What all approaches to software development are trying to minimise is backtracking
  • That is: removing work you have done because you now realise it is wrong
  • Upfront design attempts to avoid this by foreseeing any problems
  • Iterative implementations avoid this by being flexible

Writing Software is Searching

  • You are searching for the correct solution
  • There are two ways to search:
    1. Try to determine where the item is and then look there
    2. Just look in many places, it must be in one of them
  • Obviously there is a continuum of approaches between these extremes
    • Search many places, but limit the number of places to those that are plausible, likely, probable

Searching and Backtracking

Bridge Requirements are Stable

  • When the original requirements are changed, for example the level of traffic exceeds what was originally planned for, you build a new bridge
  • But software requirements tend to change frequently
    • Partly because the users did not know what they wanted to begin with
    • Partly because the environment around the software changes:
      • They now want it to work on their smartphone
      • They now want it to work on a Mac
      • They now do not want a Java applet/Flash application etc.

Adaptability and Moving Goal Posts

Difficulty

  • Knowing whether you are heading in the correct direction
    • If you are heading in the wrong direction you either need to backtrack or change direction
  • Note: this is just as difficult whether or not you attempt to find the correct direction before starting

Refactoring

  • Refactoring is the process of changing code such that it computes exactly the same function (of inputs to outputs), but has a better design.
  • This is tremendously powerful, because it allows us to try out various designs, rather than guessing which one is the best
  • It allows us to determine whether something is possible, without necessarily building it in the best way
  • It allows us to design retrospectively once we know significant details about the problem at hand.
  • It allows us to avoid the cost of full commitment to a particular solution which, ultimately, fails.
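
As a minimal sketch of the definition above, the two Python functions below compute exactly the same function of inputs to outputs; only the design differs. The names total_price_v1, total_price_v2 and apply_discount, and the pricing rule itself, are invented purely for illustration.

```python
def total_price_v1(prices, discount):
    # First attempt: works, but mixes looping, discounting and rounding.
    total = 0
    for p in prices:
        total = total + p
    total = total - total * discount
    return round(total, 2)


def apply_discount(amount, discount):
    # Extracted helper: one small, separately testable step.
    return amount - amount * discount


def total_price_v2(prices, discount):
    # Refactored: same inputs, same outputs, clearer structure.
    return round(apply_discount(sum(prices), discount), 2)


# A refactor must not change behaviour, so check that both versions agree.
for prices, discount in [([], 0.0), ([10, 20], 0.5), ([3, 4, 5], 0.25)]:
    assert total_price_v1(prices, discount) == total_price_v2(prices, discount)
```

The loop at the end is the important part: it is a (very small) check that the refactor has not modified functionality.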

Suggested Strategy

  • Note that this is merely a suggested strategy
    1. Start with the simplest program possible
    2. Incrementally add features based on your proposal's requirements
    3. After each feature is added, refactor your code
      • This step is important: it helps to avoid the risk of developing an unmaintainable mess
      • Additionally it should be done with the goal of making future feature implementations easier
      • Test that your refactor has not modified functionality

Suggested Strategy

  • At each stage, you always have something that works
  • Although you need not specifically design for later features you do at least know of them, and hence can avoid doing anything which will make those features particularly difficult.

Alternative Inferior Strategy

  1. Design the whole system before you start
  2. Work out all components and sub-components you will need
  3. Start with the sub-components which have no dependencies
  4. Complete each sub-component before moving on to the next
  5. Once you have developed all the dependencies of a component you can now choose that component to develop
  6. Finally, put everything together to obtain the entire system
    • Test the entire system

Methodology and Resources

  • So, do not worry too much about making mistakes early on
  • Just make sure you get started
  • If you have made a mistake, that is not a problem: you can correct it
  • With that in mind, I will try to give some more resources
  • These necessarily depend on language choice, so I'll start there

Language Choice

  • The first choice you will have to make is your choice of implementation language
  • Even in this you are somewhat helped by the fact that web deployments are naturally split into at least client-side code and server-side code
  • As a result, you can, and likely will, make use of at least two programming languages
  • That said, implementation language is one of the few choices that you have to make early on and that is nearly irreversible

Language Choice

  • Languages come in many varieties; here are some of the distinctions made:
    1. Compiled vs Interpreted
    2. Strongly typed vs Weakly typed
    3. Statically typed vs Dynamically typed
    4. Functional vs Imperative
    5. Object Oriented vs Classless
    6. Lazy vs Eager
    7. Managed vs Unmanaged
  • For the most part these are independent of each other, giving us 2^7 (128) possibilities
  • You may have already made your choice, but the information here may help you to apply your choice of language well

Compiled vs Interpreted

  • Many languages will claim to be either a “compiled language” or an “interpreted language”
  • The distinction is intended to be simple:
    • Either the source code is translated into machine code and then run or:
    • An interpreter reads the source code and executes each line of code dynamically

Compiled/Interpreted Language?

  • There is not really any such thing as a “compiled language” or an “interpreted language”
  • There are compiler or interpreter implementations
  • A language may have one particularly official implementation
  • Interpreters are nearly always implemented via some kind of bytecode
  • So we only really have compiler implementations; it is just a question of what that compiler targets: physical machines or virtual bytecode machines
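
As a small illustration of this point, CPython (the reference Python implementation, usually described as an interpreter) first compiles source code to bytecode for its own virtual machine. The sketch below uses the built-in compile function and the standard library's dis module; the expression and the variable names are my own, and the instructions printed vary between CPython versions, so no disassembly output is shown.

```python
import dis

source = "x * 2 + 1"

# Compile the source text to a code object holding bytecode.
code = compile(source, "<example>", "eval")

# The bytecode is just bytes targeting CPython's virtual machine.
print(type(code.co_code))

# Disassemble it to see the virtual machine instructions.
dis.dis(code)

# The virtual machine then executes the compiled code.
result = eval(code, {"x": 20})
print(result)
```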

Compiled/Interpreted Implementations

  • OCaml has ocamlc (bytecode) and ocamlopt (native code)
  • Java is generally compiled to JVM bytecode, but native-code implementations such as gcc-java (GCJ) exist
  • C# and some other languages now target the CLR
  • Python is generally interpreted but Cython exists (an optimising static compiler)

Compiled vs Interpreted

Conclusion

  • The distinction between compiled and interpreted is one of implementation not languages
  • However, some language features lend themselves to one more easily than the other
  • But, increased runtime sophistication has meant that the line between compiled and interpreted has become increasingly blurred
  • Your language choice should probably not focus too heavily on whether the official language implementation is a compiler or an interpreter

Type Systems

  • Languages involve expressions which evaluate to values
  • It is possible to give a type to those values
  • We can then check that operations use values of an appropriate type
  • For example we may check that we are not trying to add a string to an integer: 3 + "hello"
  • The types may also determine what the operation is:
    • Integer addition: 3 + 2
    • Floating point addition: 3.0 + 2.0
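
A short Python sketch of the last point: the same + symbol selects a different operation depending on the types of its operands. The variable names are mine; the behaviour shown is standard Python.

```python
# The same "+" symbol selects a different operation for each operand type.
int_sum = 3 + 2          # integer addition
float_sum = 3.0 + 2.0    # floating point addition
text = "3" + "2"         # string concatenation

print(type(int_sum).__name__, int_sum)
print(type(float_sum).__name__, float_sum)
print(type(text).__name__, text)
```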

Type Systems

  • Some type systems also give types to statements
  • For example some type systems determine what exceptions may be raised by a given command (which may be a sequence of commands)
  • Some such type systems oblige the user to declare these exceptions
  • For our purposes we will concentrate on the typing of expressions/values

Strongly typed vs Weakly typed

  • This is often confused with the distinction between statically and dynamically typed languages, but the two distinctions are quite separate
  • One can have static-strong, static-weak, dynamic-strong, dynamic-weak

Strongly typed vs Weakly typed

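  • Strongly: objects of the wrong or incompatible types cause an error:
    • 3 + "5" = error, as seen in Python and OCaml
    • Even fairly strict languages make exceptions here: Java converts the 3 and gives "35", and C++ treats the expression as pointer arithmetic
  • Weakly: objects of the wrong or incompatible types are converted:
    • 3 + "5" = "35" in JavaScript
    • 3 + "5" = 8 in PHP, Perl5, Tcl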
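
To make the distinction concrete, here is a minimal Python sketch. Python rejects the mixed addition, and the two "weakly typed" answers can only be obtained by converting explicitly. The helper add_strongly is hypothetical and exists only to capture the error for display.

```python
def add_strongly(a, b):
    """Return a + b, or a short error label if the types are incompatible."""
    try:
        return a + b
    except TypeError as error:
        return f"error: {error.__class__.__name__}"


# Strong typing: incompatible operands are rejected, not converted.
print(add_strongly(3, "5"))

# To get what a weakly typed language would compute, convert explicitly.
as_text = str(3) + "5"      # the JavaScript-style answer
as_number = 3 + int("5")    # the PHP/Perl-style answer
print(as_text, as_number)
```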

Advantages of Strong Typing

  • When something goes wrong, the error is produced as soon as it is discovered
  • This makes it easier to investigate the source of the error
  • Additionally, you are less likely to calculate incorrect results
  • Often, incorrect results are worse than no results

Advantages of Weak Typing

  • Occasionally completing a computation and obtaining a result is better than obtaining no result
  • Even if the result you obtain is wrong
  • Displaying a web page wrongly is generally better than not displaying it at all
  • You can implement this in either a strongly or weakly typed language but it is easier in a weakly typed one

Strong vs Weak Typing

Conclusion

Backend Server Code
  • Chances are you should care more about getting the correct result than getting any kind of a result, so you should use a language with a strong type system
  • But do not confuse weak typing with other type system distinctions, such as nominative, structural, duck typing

Frontend Interface Code
  • More likely to care about getting any result
  • In any case you are ultimately limited to Javascript, though you may choose a strongly typed language that compiles to Javascript

Statically typed vs Dynamically typed

  • A statically typed language specifies that the typing of expressions should be done before the program is run
  • A dynamically typed language specifies that the typing of values should be done whilst the program is run
  • Not to be confused with implicit/explicit typing

I have investigated type systems, but:

[Chart omitted; source: TIOBE language index]

Statically typed vs Dynamically typed

  • One reason to type expressions is to aid compilation
  • Recall the typing of the operands to an addition operator meant that we could determine what kind of addition is required
  • We might also need to know the size of the computed value so that we know where it might be stored
  • Obviously, if the purpose of the types is to aid compilation, the type checking will have to be done statically
  • More importantly the typing of expressions and values is done to avoid the computation of incorrect results

Advantages of Static Typing

  • Type errors are caught before you attempt to run the program
    • This means for example that type errors should not occur mid-run on a user's machine
    • Even during development, perhaps you have a program that:
      • takes seconds to compile,
      • minutes/hours to run
      • and a type error in the final printing of the result
    • Using static types you will be alerted to the type error after the compile
    • Using dynamic types you will be alerted at the end of a first run
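
The scenario above can be sketched in Python, which checks types dynamically. The function names and the stand-in computation are invented for illustration. I would expect a static checker such as mypy, run over the annotations, to flag the mistake in report without running anything, but treat that as an assumption about your tooling rather than a guarantee.

```python
def long_computation(n: int) -> int:
    # Stand-in for work that takes minutes or hours.
    return sum(i * i for i in range(n))


def report(total: int) -> str:
    # Bug: concatenating a str and an int is a type error.
    return "Total: " + total


def main() -> str:
    total = long_computation(1000)
    # Python only discovers the bug here, after all the work is done.
    try:
        return report(total)
    except TypeError:
        return "type error found at the very end of the run"


print(main())
```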

Advantages of Static Typing

  • You may be releasing a library, which isn't “run”
    • Of course you should have a test suite with 100% code coverage
    • That does not always mean the tests are particularly useful
    • What you should have and what you do have are not always the same
    • Static typing gives you some kind of guarantee for “free”

Advantages of Dynamic Typing

  • Static type checking is necessarily conservative
  • This means it will reject some programs that ultimately would not, when run, have resulted in a type error
  • During development you can avoid type checking code you know will not be run; this is a subtle point (if there is time at the end I will expand on it)
  • You should be testing comprehensively
    • If so, this lessens the gains from static typing

Two Competing Forces

  • When you first learn a static type system, it often feels like you are getting more program correctness for free
  • It seems as though it is not quite for free, and that the static type system does hamper productivity in the short term
  • It also seems likely that static type systems can save on some kinds of work in the future
  • The question is, does the short-term loss in productivity pay for itself with a long-term increase in productivity?

Philosophy of Typing

  • Just as I suggested that a language is neither inherently compiled nor inherently interpreted, it is also something of an implementation issue as to when typing is performed
  • However, there is generally a type system attached to each language
  • Some type systems are very difficult or even impossible to fully check statically
  • Some language designers deliberately ensure that it is possible to statically type check the language

Statically Typed vs Dynamically Typed

Conclusion

  • The distinction between statically typed and dynamically typed is in theory one of implementation, but in practice one of language
  • The distinction though is softer than some may suggest
  • It is more of a gradient than a dichotomy
  • For this project, either kind of type system will be fine
  • But, whichever choice you make, I recommend making use of additional static analysers
  • And, whichever choice you make, you should write some tests

Functional vs Imperative

  • This distinction is somewhat disputed
  • The main idea is that a functional language computes values of expressions, but does not modify state
  • An imperative language is simply a non-functional language, that is, one which allows/encourages the programmer to directly modify state

Functional vs Imperative

  • It turns out that a lot of programs involve a lot of functional computation, with a very small amount of state modification
  • Hence, the term functional is often relaxed to include those languages that discourage state modification
  • More importantly, such languages encourage declarative code
    • That is, code which does not modify the state

Functional Programming

  • I tend to describe any language with proper support and syntax for nested, higher-order functions to be functional
  • A higher-order function is simply one that does at least one of the following:
    • Takes one or more functions as parameters
    • Returns a function as a result
  • In general treating functions as any other kind of value is known as providing first class functions
  • If the language also allows nested functions which can access the scope of containing functions, the implementation requires function closures
  • The provision of nested, higher-order functions usually encourages declarative programming

Functional Programming

  • Languages which entirely forbid state updates I describe as strictly functional
  • Even this is a little confusing because some people describe eager evaluation as strict evaluation
  • So I might also say a pure functional language or simply a pure language

Functional Advantages

  • The key advantage of a functional programming language is the hugely pretentious phrase “referential transparency”
  • I'm not sure, but I suspect this phrase is one reason functional programming languages are not more widely adopted
  • It means that an expression evaluates to the same result regardless of the time, or state, in which it is evaluated
  • In particular invoking a function: some_fun(args) with the same arguments args will always produce the same result
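
A minimal sketch of the difference, with invented function names: area is referentially transparent, whereas next_ticket is not, because its result depends on state that it also modifies.

```python
def area(width, height):
    # Referentially transparent: the result depends only on the arguments.
    return width * height


state = {"count": 0}


def next_ticket(prefix):
    # Not referentially transparent: reads and modifies external state.
    state["count"] += 1
    return f"{prefix}-{state['count']}"


same_each_time = area(3, 4) == area(3, 4)
first = next_ticket("A")
second = next_ticket("A")
print(same_each_time)
print(first, second)  # same argument, different results
```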

Functional Advantages

  • This makes testing and/or reasoning about the correctness of code much easier
  • In theory, it means code is more re-usable
    • This is debatable, and not, to my knowledge, demonstrated (either way) satisfactorily
    • But it's certainly plausible

Functional Advantages

In theory, this additionally allows for some interesting compiler optimisations. Consider the following double transformation over a list of items:

some_list = map f (map g original_list)
This is common in both functional and imperative languages, even if in imperative languages it is an array which is looped over.

Functional Advantages


some_list = map f (map g original_list)
It can be rewritten to the faster:

fg = f . g
some_list = map fg original_list
Where f . g is the composition of the two functions. This is faster because it loops over the list only once.

Functional Advantages


However, this optimisation changes the order of execution, so it is only applicable where f does not modify state which g references, or vice versa. In a functional language this is both more likely to hold and easier to check automatically.
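
The rewrite, and the reason it needs care, can be sketched in Python. The compose helper stands in for f . g, and the noisy_f and noisy_g functions are invented to make the order of calls visible. Python's own map is lazy, so the "two passes" ordering is written with list comprehensions to mimic what an eager language would do.

```python
def compose(f, g):
    # compose(f, g) plays the role of f . g: apply g first, then f.
    return lambda x: f(g(x))


def double(x):
    return x * 2


def increment(x):
    return x + 1


original_list = [1, 2, 3]

# Two passes over the data, then one fused pass: same result for pure f, g.
two_passes = list(map(increment, map(double, original_list)))
one_pass = list(map(compose(increment, double), original_list))
print(two_passes, one_pass)

# With side effects the order of calls becomes observable.
log = []


def noisy_g(x):
    log.append(f"g{x}")
    return x


def noisy_f(x):
    log.append(f"f{x}")
    return x


# Finish the inner transformation first, as an eager language would.
inner = [noisy_g(x) for x in original_list]
eager_result = [noisy_f(x) for x in inner]
eager_order = list(log)

log.clear()
fused_result = [compose(noisy_f, noisy_g)(x) for x in original_list]
fused_order = list(log)
print(eager_order)
print(fused_order)
```

The two results are equal, but the two logs are not: all calls to g come first in the unfused version, whereas the fused version alternates g and f. If f and g shared state, that difference could change the answer.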

Functional Advantages


  • Similarly if you have multiple processors, you could begin the second map operation in parallel as soon as the first transforms the first item.
  • Again, only if you can determine that there are no state dependencies.
  • In general parallel programming can in theory be advanced by limiting state updates

Imperative Advantages

  • With no state modifications all information required by any function must be passed in as an argument
  • This can arguably make the code more complicated
  • Worse, it can require a large refactoring in order to make a relatively simple change

Imperative Advantages

  • However, recall that my definition of a functional programming language did allow for state modifications.
  • It only required nested, higher order functions
  • It's hard to argue that not providing these is an advantage to the programmer
  • One could argue that the implementation (of the language) is simpler
    • It is debatable, but one can certainly argue that implementing nested higher-order functions incurs a performance cost
    • Functions are more heavyweight and hence more expensive to invoke

Functional Web Development

  • Most of the state changes and flow-of-control will be done by the web server not your web application code
  • Your web application code will be providing answers to specific requests
  • This makes a functional language somewhat appealing
  • However, this also means the implementation of a web server framework tends to be awkward
    • which in turn means there are fewer of them

Functional vs Imperative

Conclusion

  • You could certainly use either a functional or an imperative language for this practical
  • You're probably best off with whichever you prefer
  • There are arguably better resources for imperative languages

Object Oriented vs Classless

  • Given my glowing recommendation for higher-order functions why are they not more commonly used?
  • Classes, or objects, allow for a similar abstraction
  • An object is really a collection of state together with operations over that state

Typical Class Definition


class ClassName (ParentClass){
   // State: the members that every instance carries
   classmember_1 = 0;
   classmember_2 = "hello";

   // Operations over that state
   void class_method_1(int i){
       self.classmember_1 += i;
   }
   void class_method_2(String suffix){
       print_to_screen(self.classmember_2);
       print_to_screen(suffix);
   }
}
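
To show how classes and higher-order functions allow for a similar abstraction, here is a hedged Python sketch of the same idea built both ways. Counter and make_counter are hypothetical names; both bundle a piece of state with an operation over it.

```python
class Counter:
    """State (count) bundled with an operation over that state."""

    def __init__(self):
        self.count = 0

    def add(self, i):
        self.count += i
        return self.count


def make_counter():
    """The same abstraction built from a closure instead of a class."""
    count = 0

    def add(i):
        # The nested function updates the enclosing function's variable.
        nonlocal count
        count += i
        return count

    return add


obj = Counter()
obj.add(2)
fn = make_counter()
fn(2)
print(obj.add(3), fn(3))  # 5 5
```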

Object Oriented Languages Popularity

Category                     Ratings Sep 2013   Delta Sep 2012
Object-Oriented Languages    56.0%              -1.1%
Procedural Languages         37.3%              -0.9%
Functional Languages         3.8%               +0.6%
Logical Languages            3.0%               +1.3%
Source: TIOBE language index

Advantages of Object-Oriented

  • Surprisingly debatable
  • Most people agree that there is some value in object-oriented programming
  • But when asked to give concrete advantages, most offer:
    • Vague perceived benefits, with no logic connecting to OOP:
      • Advances reuse
      • Better models the real world
    • Clear benefits but which are not unique to OOP:
      • Polymorphism (fancy word for a specific kind of generality)
      • Encapsulation (fancy word for hiding/abstraction)

Advantages of Classless

  • No one really argues that the provision of classes is inherently destructive
  • In a similar way to higher-order functions, having the ability to utilise classes does not do any harm if you never use them
  • However, once the temptation is there, it's easy to go class crazy
  • But such arguments are not arguments against the use of an object-oriented language, so much as an argument for careful use of classes

Object Oriented vs Classless

Conclusion

  • By all means choose an object-oriented language
  • There is little reason not to, but pure languages often do not have a notion of an object
    • This is for good reason and should not put you off choosing a pure language
  • If you do choose an object-oriented language, use your classes with care
  • Classes are just one way of organising source code.
    • There are others which are just as effective
    • Using an OOP language will not magically organise your source code for you

Lazy vs Eager

Conclusion

  • I'll not get into this debate, but I'm happy to discuss this after the lecture with anyone interested
  • Main conclusion: either is fine

Managed vs Unmanaged

  • Automatic memory management, sometimes called garbage collection
  • Without this, whenever you need to store a value in memory, you must first ask for the space in memory
  • When a value in memory is no longer useful, you should give back the space in memory that it used
  • If you let the last reference to a value go out of scope without freeing the associated memory, you can never do so, and hence you have a space leak
  • Unfortunately, if you give back the memory too soon, you may subsequently try to reference the value, which may cause a segmentation fault

Advantages of Memory Management

  • You need not manage the memory yourself, this is hugely liberating
  • I believe there is much gained productivity associated with:
    • Object Oriented Languages
    • Dynamically/Statically typed languages
    • Lazy languages
    • Reflection
    which is actually gained productivity from automatic memory management which has been misattributed to the above
  • I'm not saying these things do not also improve productivity

Advantages of Memory Management

  • I can say f(g(x)) and not have to think about whether the intermediate result produced by g needs to be cleaned-up
  • I can return from anywhere I like in the middle of a method, without worrying about all paths re-joining to free-up used memory
    • Honestly: “Only One Return” was a common coding rule
    • Sometimes called “Single Entry, Single Exit”

Advantages of Manual Memory Management

  • Nostalgia
  • In theory you can implement manual memory management more efficiently
    • This is a bit debatable
    • In any case, the productivity gained through the use of an automatic garbage collector can be put to use in optimising the rest of your code
    • In particular, in finding better algorithms rather than a faster implementation of the same one

Advantages of Manual Memory Management

  • Predictability: it can be difficult to know when the garbage collector might run
    • So real-time systems which must respond to incoming external events may suffer
    • But there is much research into automatic garbage collection, and real-time garbage collectors do exist

Managed vs Unmanaged

Conclusion

  • Choose a managed language
  • If you are only familiar with an unmanaged language either:

Other Distinctions

  • Low-level vs High-level
    • This is mostly a distinction made from a combination of those above
  • Significant Whitespace or not:
    • Personally I love it, but it is syntax; it does not matter
    • If it bothers you that much you can always write a parser for a different syntax
  • Scripting vs Systems:
    • If you must distinguish these you can interoperate between them

What is the Best Language?

Main Conclusion

  • In general, it is less what the language provides and more what libraries are available in that language
  • That will of course depend largely on your proposal

Language Choice

Conclusions: Server Code

  • My hard advice can be summarised as:
    • It is likely better for you to choose a strongly typed language
    • Choose a language with automatic memory management
  • A good idea is to use your report to explain your reasons for your language choices
  • A perfectly valid reason is:
    • “Language X is my favourite language which I know better than all others”

Language Choice

Obvious Choices: Promising

  • Python, Ruby: dynamically and strongly typed; well supported with many libraries; widely used for web application development; many frameworks to choose from, both heavyweight and lightweight
  • Java, C#: statically and strongly typed; well supported with many libraries; tend to have more heavyweight frameworks

Language Choice

Obvious Choices: Questionable

  • Haskell, OCaml: statically and strongly typed; elegant languages which should be well suited to web application development, but which have far fewer libraries and frameworks and much less support; the frameworks tend to be heavyweight and inflexible
  • PHP: dynamically and weakly typed; very popular for web site development, but almost universally hated; well supported by web application frameworks; its success remains a mystery

Language Choice

Obvious Choices: Avoid

  • Perl: once the web language of choice, now outdated
  • C, C++: avoid

Framework Choice

  • As I said in the second lecture, generally two categories:
    • Heavyweight: Those that include everything
    • Lightweight: Those that do just the routing part and let you choose libraries for the rest
  • Both are reasonable choices, but heavyweight frameworks tend to be less flexible, meaning that switching once started is more difficult or requires more work

Resources

Any Questions

Example of Subtle Point

Suppose you have a method to create some data type:

void create_character(int initial_health){ ... }

You realise some new feature requires a second parameter:

void create_character(int initial_health, Gender gender){ ... }

You have a small test case to test your new feature, which you know will only call this method once, say at the start

Example of Subtle Point

Unfortunately calls to this method are spread throughout your code

void restart_game (...){
  ... create_character(100); ... }
void respawn(...){
  ... create_character(80); ... }
void duplicate_cheat(...){
  ... create_character(100); ... }

But you know none of these will get called in your small test case.

Example of Subtle Point

With a static compiler you will have no choice but to update each call anyway

void restart_game (...){
  ... create_character(100, character.current_gender); ... }
void respawn(...){
  ... create_character(80, character.current_gender); ... }
void duplicate_cheat(...){
  ... create_character(100, character.current_gender); ... }

Furthermore, your new feature might not work so you might revert the change

Example of Subtle Point

Worse, you might not yet have reasonable values so you just do this:

void restart_game (...){
  ... create_character(100, None); ... }
void respawn(...){
  ... create_character(80, None); ... }
void duplicate_cheat(...){
  ... create_character(100, None); ... }

Example of Subtle Point


But now, once you have completed your new feature, the static type checker is of no help in finding all the places where you need to update your calls to create_character

Example of Subtle Point

Some languages have optional parameters or default arguments:

void create_character(int initial_health, Gender gender=Female){ ... }

But not all do and the same arguments apply for similar situations with changes to types, classes, interfaces, abstract classes etc.
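
For comparison, here is how the default argument approach might look in Python, reusing the names from the example above. The Gender enumeration and the dictionary used to represent a character are assumptions made only for this sketch.

```python
from enum import Enum


class Gender(Enum):
    FEMALE = "female"
    MALE = "male"


def create_character(initial_health, gender=Gender.FEMALE):
    # The new parameter has a default, so existing callers keep working.
    return {"health": initial_health, "gender": gender}


def respawn():
    # Old call site, written before `gender` existed; left untouched.
    return create_character(80)


def new_feature_test():
    # Only the new code passes the extra argument.
    return create_character(100, Gender.MALE)


print(respawn())
print(new_feature_test())
```

Note that this has the same drawback as passing None everywhere: nothing will later point out the old call sites that ought to supply a real value.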

Soft Typing

  • Soft typing is something of a compromise between static and dynamic typing
  • The idea of soft typing is to statically type as much of the program as is possible
  • Where the type system cannot determine that an expression or operation will never cause a type error, it inserts a run-time check
  • In this sense a dynamic type system is an extreme example of a soft-typing system that is not very good at determining any expressions which will never produce a type error

Soft Typing

  • In a sense many of our supposedly static type systems are in fact soft type systems which need few checks
  • Commonly, array indexes are not statically checked to be within the bounds of the size of the array
  • Instead a dynamic run-time check is inserted for this purpose
  • Additionally cast operations are generally checked at runtime as they cannot be statically checked to be valid
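
The two run-time checks mentioned above can be written out by hand in Python. This is only a sketch of the kind of check a compiler inserts when it cannot prove an operation safe; checked_index and checked_cast are hypothetical helpers, not part of any library.

```python
def checked_index(items, index):
    # The kind of bounds check inserted when the index cannot be
    # proven statically to be within range.
    if not 0 <= index < len(items):
        raise IndexError(f"index {index} out of bounds for length {len(items)}")
    return items[index]


def checked_cast(value, expected_type):
    # The kind of run-time check that sits behind a cast operation.
    if not isinstance(value, expected_type):
        raise TypeError(
            f"cannot cast {type(value).__name__} to {expected_type.__name__}"
        )
    return value


print(checked_index([10, 20, 30], 1))
print(checked_cast("hello", str))

# Both failures are only detected when the program is run.
attempts = (
    lambda: checked_index([10, 20, 30], 5),
    lambda: checked_cast(3, str),
)
for attempt in attempts:
    try:
        attempt()
    except (IndexError, TypeError) as error:
        print(type(error).__name__, error)
```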

Static Analysers

  • When a type is not used by the compiler, then ultimately the static type checker is simply a static analyser
  • We can deploy many static analysers
  • We can also choose not to run any or all of them during a development run
  • Personally, I'm a big fan of static analysers
  • Static type systems are no exception, but I think they should be optional