General Web Related Frequently Asked Questions

Listed below is an assortment of frequently, and not so frequently, asked web questions.

If the question you wanted to ask isn't covered here, then please check the support FAQ as well.

Answers

How can I control access to my pages?
To control access you need to create a file called .htaccess (note the leading dot) in the directory that you want to restrict. This will impose restrictions on this directory, and all of the levels below it.

To restrict users based on the IP address that they're connecting from, you need to use something like:

deny from all
allow from 129.215.0.0/16

This example would restrict access to machines within the University's network, i.e. within .ed.ac.uk. However, it would exclude the student accommodation network (ResNet). To allow these machines to see the pages too, add the following line:

allow from 10.0.0.0/8

Remember that IP address based restrictions are by no means 100% secure, especially as the number of machines you allow access to increases.
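
To make the effect of the two allow lines concrete, here is a rough sketch in Python of the kind of test they describe. It is only an illustration, not how Apache itself performs the check; the names ALLOWED_NETWORKS and is_allowed are invented for this example.

```python
import ipaddress

# The two networks from the "allow from" lines above.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("129.215.0.0/16"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

For instance, is_allowed("129.215.33.7") is true because the address lies inside 129.215.0.0/16, while an address outside both ranges is refused.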

See:

homepages.inf password restriction example

As a quick example of creating a password restricted area on homepages.inf.ed.ac.uk: if you run the following commands (replacing $USER with your UUN), you'll end up with a password protected area at http://homepages.inf.ed.ac.uk/USERNAME/private/ (where USERNAME is your UUN), and you'll need to enter the username "test" and the password "foobar" to access it.

cd /public/homepages/$USER/web
mkdir private
chmod o-rwx private
cd private
htpasswd -bc .htpasswd test foobar
cat > .htaccess <<EOF
# Require a username and password from the .htpasswd file created above
AuthType         Basic
AuthName         "Test passwd"
AuthUserFile     /public/homepages/$USER/web/private/.htpasswd
Require          valid-user
EOF

Note that this is not secure, so do not use your DICE username and password when creating the password file. It should, however, be sufficient for simple, non-critical uses.

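
One reason the scheme is weak: with AuthType Basic the browser sends the username and password with each request, merely base64 encoded rather than encrypted, so over plain http anyone who can see the traffic can recover them. The following Python sketch illustrates this; the function names are invented for the example.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a browser sends for Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header_value: str) -> tuple:
    """Recover (username, password) from a Basic auth header value."""
    encoded = header_value.split(" ", 1)[1]
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password
```

Passing the header built for "test" and "foobar" back through decode_basic_auth returns the original pair, with no secret needed.
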
How do I stop my web page being indexed by search engines?
You can't guarantee that the techniques below will work for every web robot that trawls the web looking for pages, but the better known, well-behaved ones (e.g. Google) do honour them.

robots.txt

The web master of a web service can create suitable entries in the /robots.txt file. If you are not in a position to maintain this file yourself, then you may be able to ask the web master. However, for services like homepages.inf.ed.ac.uk or groups.inf.ed.ac.uk, this cannot be done, as we don't have the resources to deal with every individual request. We will be looking at some automated process, but not yet.
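
As an illustration of what such entries do, the sketch below uses Python's standard urllib.robotparser module to evaluate a small, hypothetical robots.txt the way a well-behaved robot would. The /private/ path, the www.example.org host and the "ExampleBot" name are all made up for the example.

```python
from urllib import robotparser

# Hypothetical robots.txt content: ask all robots to stay out of /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these entries, a robot calling itself "ExampleBot" would be told not to fetch http://www.example.org/private/index.html, but would be free to fetch http://www.example.org/index.html. Nothing forces a robot to perform this check.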

Meta tags

Newer robots pay attention to META tags in the HTML documents themselves. This is something you can do yourself, and it does not need any action from the web master. However, not all robots support this.
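
The commonly used form of the tag is a robots META element in the page's head, with directives such as noindex and nofollow. The Python sketch below shows such a tag in a made-up page, together with a small parser that reads the directives back, roughly as a cooperating robot might. The class, function and page are invented for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html_text: str) -> set:
    """Return the set of robots directives declared in html_text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# A made-up page asking robots not to index it or follow its links.
EXAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Private notes</title>
</head><body>Not for search engines.</body></html>"""
```

Calling robots_directives(EXAMPLE_PAGE) yields the two directives noindex and nofollow; a page without the tag yields an empty set.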

How do I stop email addresses being harvested?
It's a sad fact of life that certain individuals trawl the web looking for web pages with email addresses on them. These addresses are then harvested and sold on to spammers (people who send unsolicited junk email), who then clog up mailboxes with junk.

This means that if you have a web page containing an email address (useful when telling someone how to get in contact), then it's likely that the address will start receiving junk mail.

When including an email address on a web page, try to disguise the fact that it's an email address. Humans should still be able to recognise it, but programs that go around looking for them will probably not. So rather than putting "send mail to SomeAddress@inf.ed.ac.uk" on your web page, use "send mail to SomeAddress (at) inf . ed . ac . uk". It looks a little clumsy, but people should know what you mean, and it should throw most automated harvesters off the scent.

You can go further by using convoluted HTML to obfuscate things, e.g. "send mail to SomeAddress@inf.ed.ac.uk"

Which actually looks like this in HTML:

send mail to <a
href="mailto:SomeAddress&#64;inf&#46;ed.ac.uk">SomeAddress&#64;inf&#46;ed.ac.uk</a>

This example also uses the mailto: URL in the <A> tag. This usually makes it even easier for harvesting programs to find real email addresses. However, if you must use it, you can still try to confuse things by using numerical entities as in the above example.
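
If you have several addresses to disguise, the entity substitution can be automated. The following is a minimal Python sketch, assuming you only want to encode the "@" and "." characters (the example above encodes the "@" and the first "."); the function names are invented for illustration.

```python
import html


def obfuscate_email(address: str, chars_to_encode: str = "@.") -> str:
    """Replace the chosen characters with numeric character references."""
    return "".join(
        f"&#{ord(char)};" if char in chars_to_encode else char
        for char in address
    )


def mailto_link(address: str) -> str:
    """Build a mailto: link using the obfuscated form in both places."""
    encoded = obfuscate_email(address)
    return f'<a href="mailto:{encoded}">{encoded}</a>'


def renders_as(obfuscated: str) -> str:
    """Show what a browser would display for the obfuscated text."""
    return html.unescape(obfuscated)
```

A browser turns the entities back into the original address, so human readers see no difference. Equally, a harvester that decodes entities will not be fooled.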

None of these tricks will guarantee that the address won't be harvested, but they should help.

How to redirect from legacy personal pages to homepages.inf
If you've moved your old personal pages from the likes of www.dcs.ed.ac.uk/home/legacy_username/ to homepages.inf.ed.ac.uk/dice_username/, then you'll probably want anyone (or any search engine) that has bookmarked the old location of your web pages to be automatically forwarded to the new location. You can do this by creating a .htaccess file in the root of your legacy web space containing a line similar to the following:

# Redirect all requests to my new homepage
Redirect permanent /home/legacy_username http://homepages.inf.ed.ac.uk/dice_username

This assumes that you've moved all the files and kept their relative positions and names. So someone visiting www.dcs.ed.ac.uk/home/neilb/foo/bar.html will automatically be redirected to homepages.inf.ed.ac.uk/neilb/foo/bar.html.

Note that the second argument to Redirect (the location that you want to redirect away from) should match the portion of the URL after the machine address. For example, the redirect-from part of www.dcs.ed.ac.uk/home/legacy_username would be /home/legacy_username, and that of www.cogsci.ed.ac.uk/~legacy_username would be /~legacy_username.
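
The mapping that the Redirect line performs can be sketched in Python as follows. This only approximates the behaviour described above (the old prefix is replaced and the rest of the path is kept); it is not Apache's own code, and redirect_target is a name invented for the example.

```python
from typing import Optional

# The two arguments from the example Redirect line above.
OLD_PREFIX = "/home/legacy_username"
NEW_BASE = "http://homepages.inf.ed.ac.uk/dice_username"


def redirect_target(
    request_path: str,
    old_prefix: str = OLD_PREFIX,
    new_base: str = NEW_BASE,
) -> Optional[str]:
    """Return the URL a request is redirected to, or None if it does not match."""
    if request_path == old_prefix:
        return new_base
    # Only match whole path components, then keep the remainder of the path.
    if request_path.startswith(old_prefix + "/"):
        return new_base + request_path[len(old_prefix):]
    return None
```

So a request for /home/legacy_username/foo/bar.html maps to http://homepages.inf.ed.ac.uk/dice_username/foo/bar.html, which is why the files need to keep their relative positions and names.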

How do I see who's been accessing my pages - logging?
As we can't allow unfettered access to the web logs, you have to log access to your own web pages yourself. The basic idea is to run some code when the page is served by the web server; this code can then log the details of the fetch. For example, you can embed some PHP code in the page to log the access, or include an <img> which is actually a CGI that runs on the server and does the logging.
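
As a rough sketch of the <img> approach, the Python below shows the two pieces such a CGI would need: a function that turns the standard CGI environment variables into a log line, and a function that builds the response containing a tiny 1x1 GIF. This is an assumption-laden illustration, not the code used on any Informatics server; the function names and the log format are invented.

```python
import base64
import datetime

# A tiny 1x1 GIF, so the logging script can be referenced from an <img> tag.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def format_log_entry(environ, when):
    """Build one tab-separated log line from CGI environment variables."""
    fields = [
        when.isoformat(timespec="seconds"),
        environ.get("REMOTE_ADDR", "-"),      # address of the visitor
        environ.get("HTTP_REFERER", "-"),     # page that included the image
        environ.get("HTTP_USER_AGENT", "-"),  # browser identification
    ]
    return "\t".join(fields) + "\n"


def build_response():
    """Return the raw CGI response: headers, a blank line, then the image."""
    headers = (
        "Content-Type: image/gif\r\n"
        f"Content-Length: {len(PIXEL_GIF)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + PIXEL_GIF
```

A CGI wrapper would append format_log_entry(os.environ, datetime.datetime.now()) to a log file that the web server is able to write to, and then write build_response() to sys.stdout.buffer.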

Examples of the two methods described above are detailed in this document.


Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk
Please contact our webadmin with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh