A rough guide to publishing on the Informatics Server
Author Simon Wilkinson (but please also see Acknowledgments, below)
Revision 1.17
This is a work in progress. For further information, or if you
have comments or contributions, please mail webadmin @ inf.ed.ac.uk.
Introduction
The Informatics web server is designed to hold a collection of information
for use by users both within and outside the School. The unique nature
of computing within the School, along with a number of deliberate design
decisions, means that publishing on this web server is undertaken in a
different way to most other servers within the University.
In addition, the server has been designed with a clear style and structure,
which we are keen to keep. Retaining this architecture will make the
system easier to use and to maintain.
This document starts by describing the means of publishing on the server.
Later sections provide background and justification for many of the
decisions originally made, and will hopefully serve as a guide to those
maintaining the system in future years.
Do you need to publish?
Information on the Informatics server comes from a variety of sources, not
all of which require you to have a publishing account in order to be able
to alter it. In particular:
- News and Events are automatically generated from a database.
Contact the web administrator if you wish to add information to this section.
Providing you only wish to add a short piece, you will not need a publishing
account for this operation.
- Personal information is generated from a database. See the
contact details at the bottom of these pages for information on how to
change it.
Terminology
The web can be a confusing mess of conflicting jargon. In order to avoid
confusion as far as possible, we have picked a few terms which we will
use consistently throughout this document.
- Object
- An object is an item which is available on the world wide web, pointed
at by a single URL. Objects are usually files, and are generally written
in HTML.
- URL
- Uniform Resource Locator. An "address" for an object on the World
Wide Web. For example http://www.inf.ed.ac.uk/
- Container
- A folder or directory that will contain other URLs. A container
has a URL of the form http://www.inf.ed.ac.uk/wibble/, and
should contain a file called index.html which will create that
container's index. If the container does not contain an index.html
file, then an error will be returned to anyone browsing to that
container.
Becoming a publisher
The first stage to becoming a publisher of information on the Informatics
site is to apply for a publishing account. You can do this by going to
http://publish.inf.ed.ac.uk/publish/newuser/
and filling out the form on this page. Whilst you are waiting for your
request to be processed, please take the time to read this document.
Creating content
Selecting a location
Before creating new content on the Informatics site, it is important to
think about how that content will fit into the overall structure of the
site. If your content already has an obvious location (for example, a
document relating to the ITO), then this is a straightforward step.
However, if your content is something new that doesn't yet have a location
created for it on the server, please mail webadmin @ inf.ed.ac.uk
to ask where an appropriate location would be. One of the key issues
in maintaining the clean, clear structure of the Informatics site, along with
reducing the maintenance load, is ensuring that the URL space is kept
clean. "Cluttering" areas of the space with documents because you already
have permission to create objects there will only harm the server as a
whole. In addition, the web administrators reserve the right to prevent
the serving of pages placed in inappropriate locations.
Details of the design decisions made when structuring the URL tree for the
site are given later in this document. If you wish to suggest a location
for your content based on this document, that would be greatly appreciated.
If you are adding to a location that you already have publishing permission
for, by creating new objects or containers, please follow the guidelines
given below when you are naming them.
- Container names should be short and meaningful, giving a general
description of the documents in the tree below them.
- Container and object names should avoid using non-alphanumeric
characters.
- Names should be meaningful to the end user, and avoid using
site-specific terms, or information.
- The correct extension for HTML documents is .html (not .htm)
- Names should be all in lower case.
- There should never be two names which are differentiated purely
by capitalisation.
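The mechanical parts of these guidelines can be checked automatically. The following is a minimal sketch in Python, not a tool provided by the server: the function names check_name and case_clashes are hypothetical, and treating the dot before a file extension as permitted is our assumption. Whether a name is "short and meaningful" still needs a human judgement.

```python
import re
from collections import defaultdict


def check_name(name):
    """Return a list of guideline problems for one object or container name."""
    problems = []
    # Split off a file extension, if any, so the dot before it is not flagged.
    stem, dot, extension = name.rpartition(".")
    if not dot:
        stem, extension = name, ""
    if name != name.lower():
        problems.append("not all lower case")
    if not re.fullmatch(r"[A-Za-z0-9]+", stem):
        problems.append("contains non-alphanumeric characters")
    if extension.lower() == "htm":
        problems.append("use .html rather than .htm")
    return problems


def case_clashes(names):
    """Return groups of names that differ only by capitalisation."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(set(group)) > 1]
```

For example, check_name("index.html") reports no problems, while check_name("My_Notes.htm") reports all three.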
Remember that URLs are "for life". Once created and referenced, URLs will
persist for many years. If you are creating content that you intend to be
transient, then place it and label it as such. Remember that changing URLs
for the same content (for instance, if you decide that you don't like the
previous URL) will confuse the end user, and lead to your potential audience
being split between the two locations. One of the reasons for exercising
care in URL placement is so that the site's structure can grow and expand
without any renamings being necessary.
If in any doubt about where to place new content, please contact
the web administrators for guidance before creating the document.
Creating your new location
The method for creating your location depends on the publishing
technique being used. Please see the document detailing your chosen
publishing technique for more details. Users who are using Netscape
Communicator to create content will not currently be able to create
new containers, although they can create new files.
If you have any difficulties in creating new directories or containers
please contact the web server administrators.
Controlling access to your location
Firstly, bear in mind that confidential information should never be
placed on the web. However, if you have information that you would like
to restrict to local users, please contact the web server administrators
for advice.
Document Formats
Documents on the Informatics server should be either PDF, HTML, or plain
ASCII text. The use of any other document format is strongly discouraged,
and such documents may be rejected by either the submission or serving systems.
The supported formats were chosen after a great deal of consideration as
being the most portable formats available. In addition, it is strongly
recommended that all information is provided in HTML, as a baseline, with
PDF being provided as an optional extra.
Many browsers can only display HTML documents, most search engines will
only index HTML-based content, and PDF presents particular challenges to
screen readers used by the visually impaired.
A number of tools are available to produce HTML from other document formats;
these are discussed in more detail in a companion document on the
tools which are available for those publishing on the server.
Using the site style
As you will notice browsing around the site, it has a simple style,
which should be used on all pages within the site. Using this style is very
straightforward, and a number of means are provided to do so.
If you are editing raw HTML, then you can structure your document as follows
(the title text and the "Content goes here" line are the parts that you should edit):
<!--#include virtual="/ssi/doctype.inc"-->
<TITLE>Put your title here</TITLE>
<!--#include virtual="/cgi-bin/metabase"-->
<!--#include virtual="/ssi/header.inc"-->
Content goes here
<!--#include virtual="/cgi-bin/locationbar"-->
<!--#include virtual="/ssi/footer.inc"-->
A discussion of what each of the components above does is contained
later in this document.
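Before submitting a hand-edited page, it can be useful to confirm that none of the template's includes has been lost. The following Python sketch is an illustration only: the include paths are copied from the template above, while the function name missing_includes and the regular expression are our own assumptions, not part of the publishing system.

```python
import re

# Include paths copied from the template above.
REQUIRED_INCLUDES = [
    "/ssi/doctype.inc",
    "/cgi-bin/metabase",
    "/ssi/header.inc",
    "/cgi-bin/locationbar",
    "/ssi/footer.inc",
]

# Matches directives of the form <!--#include virtual="/some/path"-->
INCLUDE_PATTERN = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def missing_includes(html):
    """Return the template includes that do not appear in the page source."""
    found = INCLUDE_PATTERN.findall(html)
    return [path for path in REQUIRED_INCLUDES if path not in found]
```

An empty list means that every include from the template is still present in the page.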
Authors using HTML editors can fetch a template from
http://www.inf.ed.ac.uk/template.html
Once you have a template to start editing, the procedure is the same as
if you were editing existing content.
Editing existing content
Database generated pages
If you are one of the lucky few with permission to edit pages over the
entire site, then please bear in mind that not all of the pages on the
site can be edited by hand. In particular, a number of sections of the
site are automatically generated.
- /map/ is automatically generated nightly from the results of
the indexing run over the server.
- /people/ is automatically generated from the database.
- /events/ is automatically generated from the database.
- /research/ will, eventually, also be automatically generated.
Please consult with the web administrators before creating files within this
tree.
In addition, it is possible that autogenerated pages may appear in other
areas of the tree. Autogenerated pages are clearly marked within their
source. You can view page source by using the "View-->Page Source" option
under Netscape 4 (under IE4 this is done through Page Properties).
When you look at the source of a database generated file, you will see
something like:
<!-- @Conduit DAILY H301 @Creator daidb -->
<!-- WARNING - DO NOT EDIT THIS FILE BY HAND -->
The first line indicates that the page is being automatically updated from
the database by conduit H301 on a daily basis. These lines will only appear
on generated pages.
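If you need to check many pages, the marker can be detected mechanically. The sketch below, in Python, assumes that every generated page carries a comment shaped exactly like the example above (an update frequency, a conduit identifier and a creator). Only that one example is documented here, so the pattern and the name generated_info are assumptions.

```python
import re

# Assumed layout, taken from the single example above:
#   <!-- @Conduit FREQUENCY CONDUIT-ID @Creator CREATOR -->
MARKER = re.compile(r"<!--\s*@Conduit\s+(\S+)\s+(\S+)\s+@Creator\s+(\S+)\s*-->")


def generated_info(source):
    """Return details of the generation marker, or None for a hand-edited page."""
    match = MARKER.search(source)
    if match is None:
        return None
    frequency, conduit, creator = match.groups()
    return {"frequency": frequency, "conduit": conduit, "creator": creator}
```

A page for which this returns anything other than None should not be edited by hand.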
Style
The Informatics site has a simple style which is designed to follow the
reader around the site. Pages which do not follow this style will not only
look odd, they will break a number of the navigation models which we have
adopted to make it easier to find information on the site. Please don't
remove the standard headers and footers from pages which you are editing,
and ensure that they are present on new pages which you create.
In addition to the standard headers and footers, we would also be grateful
if authors would follow a number of other basic points:
- Do not use frames - they break the web navigation and content models.
Try tables instead.
- Try to use sensible line lengths in your HTML source - this helps
our revision management system provide you with
better information.
- Avoid changing the font of your document
- Don't use text colour as a visual key
- Always supply alternate text for images (using the ALT attribute, or an option
in your editor)
- Only use GIF, JPEG or PNG format images. Of these, GIF is the most
widely supported.
- Try to avoid using italics, as they may be hard to read
- Don't underline text
- Make your link anchors meaningful - don't use phrases such as "here" or
"click here"
- Be careful with using tables to manage text layout, as they can confuse
speech reading software
- When linking to containers use container/, not
container or container/index.html.
These points are aimed at producing a site which is accessible to as many
people as possible regardless of computing platform, web browser, or visual
disability.
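The rule about linking to containers is simple enough to apply mechanically. As a minimal sketch in Python (container_link is a hypothetical helper, and it assumes the caller already knows that the link points at a container; it ignores query strings and fragments):

```python
def container_link(href):
    """Rewrite a link known to point at a container into the container/ form."""
    if href == "index.html":
        # A bare link to the index of the current container.
        return "./"
    if href.endswith("/index.html"):
        # Drop the file name but keep the trailing slash.
        return href[: -len("index.html")]
    if not href.endswith("/"):
        return href + "/"
    return href
```

For instance, both container_link("wibble") and container_link("wibble/index.html") give "wibble/".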
Linking
Please be careful about linking to content held on external sites (by this
we include other sites within the School). In particular, official
School information should not be held within users' personal web
space. Please do not place permanent links for committee papers, minutes
etc. into people's home directories.
Consider whether the information that you are linking to actually belongs on
the School server as part of a permanent record. Remember that external
sites may change hands and ownership - what particular URLs point to may
change beyond all recognition over the lifetime of a page.
Titles
A title should uniquely identify the document it refers to. A number of
browsing tools make use of titles (such as "Bookmarks" or "Favourites"), and
our site's search engine and sitemaps also use titles to reference a
document. For this reason, despite the fact that the Title appears outside
the main document window, please try to use meaningful titles. Generally
speaking, the title should be the same as the first heading on the page.
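The advice that the title should match the first heading can be checked before publishing. The following Python sketch uses only the standard library's html.parser module. The class and function names are our own, and treating any of H1 to H6 as "the first heading" is an assumption.

```python
from html.parser import HTMLParser


class TitleAndHeading(HTMLParser):
    """Collect the text of the <title> and of the first heading in a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.first_heading = None
        self._capturing = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._capturing is not None:
            return  # already inside the title or a heading
        if tag == "title" and self.title is None:
            self._capturing = tag
        elif tag in self.HEADINGS and self.first_heading is None:
            self._capturing = tag
        if self._capturing is not None:
            self._buffer = []

    def handle_data(self, data):
        if self._capturing is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capturing:
            return
        # Collapse runs of whitespace so line breaks in the source do not matter.
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        else:
            self.first_heading = text
        self._capturing = None


def title_matches_heading(html):
    """Return True if the page has a title identical to its first heading."""
    parser = TitleAndHeading()
    parser.feed(html)
    parser.close()
    return parser.title is not None and parser.title == parser.first_heading
```

The comparison is exact apart from whitespace, so a page titled "Course list" whose first heading reads "Courses" would be reported as a mismatch.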
Upon submission of your edited page to the server, the HTML within it
will be checked. Whilst it is possible for some browsers to display
faulty HTML, results are variable (as they have to guess the meaning
of the broken instructions). For this reason, and because HTML
mistakes are perhaps the single largest source of publishing problems,
we validate all HTML pages before they are published. The validation
service is based around the W3C Validation service. There is a locally
mirrored version at validator.inf.ed.ac.uk.
If the validation fails you will be given a list of line numbers and error
messages, which should enable you to fix the mistakes.
You may find, if you are using an HTML editor such as Netscape Composer,
that it produces invalid HTML itself. This is an unfortunate problem with
a number of these editors, and is due to faults in their software. A
number of tools exist which can fix this broken HTML; please see the
companion tools document for more details.
Many of these tools are used to automatically correct files submitted with
some publishing techniques (specifically those used by Netscape Composer).
Unfortunately, this automatic correction is not possible with direct
CVS publishing.
In the event of encountering errors, you might like to try the
tidy utility to automatically repair your HTML. However,
please be aware that tidy will _not_ expand any server side
includes used in your document. The standard Informatics SSIs contain
important structural information which tidy requires to be able to
parse the document. In this situation you may find the tools ssiexpand and ssireduce helpful.
Metadata, sitemaps and location bars
Metadata is "information about data"; in this case it is data about
the information stored in a web page. We store a number of pieces of
metadata and use them in a number of different ways. Broadly speaking we
use metadata for the following purposes:
- To improve the search features, and to assist internet search engines
- To generate the sitemap
- To control the text used in the location bar at the bottom of every
page
- To provide information about document authors and publishers, both
for browsers and document maintainers.
We store metadata for all items present on the server, but at present only
make use of it for HTML documents.
Supported metadata fields
We currently use the following fields within our metadata. These are
strongly based around those suggested by the Dublin Core.
- Author's email
- The email address of the primary author(s) of the document.
- Author's name
- The full name or names of the primary author(s).
- Contributor's name
- The name of any contributors to the content. A contributor is someone
who is not primarily responsible for the content, but has made significant
contributions.
- Contributor's email
- The email address(es) of any contributors to the document
- Title
- The title of the document, as contained within the HTML. Title can be
set manually for non-HTML documents.
- Publisher
- The publisher of the document, this will usually be
"School of Informatics, The University of Edinburgh"
- Keywords
- A set of keywords describing the document.
- Description
- A short, textual, description of the contents of the document. An
abstract, if you like.
- Short name
- A short, one or two word, name for the document which is used when
indicating site position on the location bar.
- Sitemap title
- The name by which the document should appear in the sitemap
- Note: The site map is a simple alphabetical list. Be careful what you put
in here, or we will end up with too many documents beginning with, for
example, "Informatics".
Automatic generation of metadata
A number of these fields are automatically created when the document is
published on the server. These, currently, are: Author's email,
Author's name, Contributor's name, Contributor's email, Title and
Publisher.
The Author is the first person to place the document on the server; people
who subsequently edit the document will be listed as Contributors.
Manual editing of metadata
All of the above metadata may be manually edited by browsing to the page
on the Informatics server that you wish to modify, then replacing the
www portion of the URL with publish. Selecting the Edit metadata option
on the bar at the top of the page will produce a screen allowing the
editing of all of the fields detailed above.
Note - pages not located on the main Informatics server, such as the
pages associated with the various Informatics institutes, cannot be
edited in this way.
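The change from www to publish is a simple rewrite of the host name. As an illustration only, here is a Python sketch using the standard urllib.parse module. The function name publish_url is hypothetical, and rejecting any host that does not begin with www. is our assumption about how to handle pages held elsewhere.

```python
from urllib.parse import urlsplit, urlunsplit


def publish_url(page_url):
    """Return the publishing URL for a page on the main www server."""
    parts = urlsplit(page_url)
    host = parts.netloc
    if not host.startswith("www."):
        # Pages held on other servers cannot be edited in this way.
        raise ValueError("not a page on the main www server: " + page_url)
    publish_host = "publish." + host[len("www."):]
    return urlunsplit(parts._replace(netloc=publish_host))
```

For example, publish_url("http://www.inf.ed.ac.uk/wibble/") gives "http://publish.inf.ed.ac.uk/wibble/".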
Metadata for search engines
Both our internal search system and internet-wide crawlers such as
Altavista can use pieces of metadata to improve their search performance.
The fields that they honour are Keywords and Description. If you pick
your keywords well, then your document is far more likely to appear in
relevant searches. The description you provide is used to replace the
automatically generated document summary presented in the list of search
results, and can make users far more likely to access your document.
The description and keywords metadata are only altered by using the manual
option described above. It's worth taking the time to add the information
if you view your document as being interesting to other readers, and it's
worth checking that the information is still relevant if you radically
change the content of a document, or take over a document from another
author.
Sitemaps and location bars
The Informatics site has an automatically generated site map, which
contains "key documents ... as chosen by their authors".
In order to keep the quality of this page as high as possible, you will
need to explicitly add your page to it, if you feel that is appropriate.
You can do so by editing the metadata of your page and inserting an
appropriate title into the Sitemap title field, or by selecting the
Add to sitemap option on the publishing bar at the bottom of the page.
Do give some thought to how you title your document in the site map. It may be
a good idea to start the title with an appropriate keyword.
Location bars are, as discussed earlier, automatically generated for every
page. The text used for the names in the bar is usually a version of the
container name in the URL. If you own a container index page, then you can
change the name used to link to that page. To do so, enter the new name in
the Short name field. Bear in mind that the name should be as
short as possible so that the location bar remains usable.
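To make the relationship between URLs, container names and short names concrete, here is a rough Python sketch of how a location bar trail could be derived. It is not the server's locationbar program, whose behaviour is not described here. The function name, the short_names mapping and the example paths are all hypothetical.

```python
from urllib.parse import urlsplit


def location_bar(url, short_names=None):
    """Return (label, path) pairs for each container leading to the given URL."""
    short_names = short_names or {}
    path = urlsplit(url).path
    segments = [segment for segment in path.split("/") if segment]
    # A path that does not end in "/" names an object; only containers get entries.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    trail = []
    current = "/"
    for segment in segments:
        current += segment + "/"
        # Use the short name if one has been set, else the container name itself.
        trail.append((short_names.get(current, segment), current))
    return trail
```

With a short name of "Courses" set for the hypothetical container /teaching/courses/, a page at /teaching/courses/index.html would get the trail "teaching" then "Courses".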
Further Reading
This section is by no means complete - I am listing the documents I
consulted in detail in building the site. I hope eventually to expand this
into a more complete set of references. It will also, undoubtedly,
suffer from linkrot.
HTML
Accessibility
Usability
Acknowledgments
Text has been contributed by Tim Colles.
Design, style and HTML recommendations have been culled from a number
of sources, many of them listed in the "Further reading" section above.
In particular the W3C and the RNIB provide useful information on designing
a more accessible web. Jakob Nielsen's Alertbox column provides hints
on producing a more manageable and navigable web. A number of sites
that make their authoring hints available, such as Sun and mozilla.org,
also provided tips. The design of the server is based strongly upon one
done by Visual Resources for an EDINFO redesign. Arthur Wilson of EUCS
provided helpful and timely advice regarding this.