General web questions

General web-related 'Frequently Asked Questions'

Listed below is an assortment of frequently, and not so frequently, asked questions relating to websites and webpages hosted by the School of Informatics.

If the question you wanted to ask isn't covered here, then please check the support FAQ as well.

Questions

How can I control access to my pages?
How do I stop my web page being indexed by search engines?
How do I stop email addresses being harvested?
How do I see who's been accessing my pages - logging?
How does AFS affect web services?
How can I get my link to homepages to appear on my 'people' page?
My Chrome is slow. How can I fix it?

Answers

How can I control access to my pages?

To control access, create a file called .htaccess (note the leading dot) in the directory which you want to restrict. The restrictions placed in this file will apply to this directory and to all levels of subsidiary directories within it.

It's usual to restrict access based either on who someone is or on where they are connecting from (their IP address).

To restrict users based on their IP address use something like:

Require ip 129.215.0.0/16
Require host ed.ac.uk

This example would restrict access to most machines within the University's network, that is within .ed.ac.uk. However, it would exclude the student accommodation network (ResNet). To allow ResNet machines to access the pages too, add the following line:

Require ip 10.0.0.0/8

Remember that IP address-based restrictions are by no means 100% secure, especially as the number of machines you allow access to increases. Note that the move to IPv6 address space will mean that you will also see addresses of the form:

# University IPv6 subnet
Require ip 2001:630:3c1::/48

which can also be used in .htaccess files.
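
Putting the fragments above together, a complete IP-restricted .htaccess might look like the sketch below. It assumes Apache 2.4 behaviour, where several Require lines outside any container are treated as alternatives (matching any one of them grants access). The directory is created with mktemp purely for illustration; substitute the directory you actually want to restrict.

```shell
# Illustration only: a throwaway directory stands in for the one you want to restrict.
demo_dir=$(mktemp -d)

# The quoted 'EOF' stops the shell from expanding anything inside the file body.
cat > "$demo_dir/.htaccess" <<'EOF'
# University IPv4 range and hostnames
Require ip 129.215.0.0/16
Require host ed.ac.uk
# Student accommodation network (ResNet)
Require ip 10.0.0.0/8
# University IPv6 subnet
Require ip 2001:630:3c1::/48
EOF

# Show the resulting file.
cat "$demo_dir/.htaccess"
```

Drop the ResNet or IPv6 lines if you do not want to admit those networks.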

To restrict access to a user (or group of users), they need to prove who they are. This is usually via a username/password mechanism. The Apache links below go into the gruesome detail, but the simplest methods are AuthType Basic (like the example below) or AuthType Cosign (see the Cosign page for more information).

Apache docs on restricting host access.
Apache docs on user authentication.
Apache's general guide on access control.

homepages.inf password restriction example

Here's a quick example of creating a password-restricted area on homepages.inf.ed.ac.uk. Running the commands below (replacing $USER with your UUN) creates a password-protected area at https://homepages.inf.ed.ac.uk/USERNAME/private/, where USERNAME is that same UUN. Users will need to enter the username "test" and the password "foobar" to access the area.

cd /public/homepages/$USER/web
mkdir private
chmod o-rwx private
cd private
htpasswd -bc .htpasswd test foobar
cat > .htaccess <<EOF
# Require a valid login from the .htpasswd file in this directory
AuthType         Basic
AuthName         "Test passwd"
AuthUserFile     /public/homepages/$USER/web/private/.htpasswd
Require          valid-user
EOF

Note that this is not secure, so do not use your DICE username and password when creating the password file. However, it should be sufficient for simple, non-critical uses. A more secure approach is to use Cosign, as mentioned above.
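
One detail of the example worth checking: the heredoc is unquoted, so the shell substitutes $USER at the moment the file is written, and .htaccess ends up containing a literal absolute path for AuthUserFile. A minimal sketch of that behaviour, using a temporary directory and the hypothetical variable area_dir in place of the real homepages path:

```shell
# Hypothetical stand-in for /public/homepages/$USER/web/private
area_dir=$(mktemp -d)

# Unquoted EOF: $area_dir is expanded by the shell while the file is written.
cat > "$area_dir/.htaccess" <<EOF
AuthType         Basic
AuthName         "Test passwd"
AuthUserFile     $area_dir/.htpasswd
Require          valid-user
EOF

# The file should contain the expanded path, not the text "$area_dir".
grep AuthUserFile "$area_dir/.htaccess"
```

If the printed line still shows a dollar sign, the heredoc delimiter was quoted and the path needs fixing before Apache can find the password file.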

How do I stop my web page being indexed by search engines?

The methods below are not guaranteed to work for every web robot that trawls the web looking for pages, but they should work for the better-known and well-behaved ones, such as Google's.

robots.txt

The web-master of a web service can create suitable entries in the /robots.txt file. If you are not in a position to maintain this file yourself, then you may be able to ask the web-master. However, for services like homepages.inf.ed.ac.uk we can't help, as we don't have the staff effort to deal with every individual request. One day we may look at automating the process, but not yet.
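
For reference, robots.txt entries take the shape sketched below. The /private/ path is a hypothetical example, and the file is written to a temporary directory here only so that the snippet can be run safely; the real file has to live at the top level of the web service.

```shell
# Illustration only: stands in for the top-level directory of a web service.
site_root=$(mktemp -d)

cat > "$site_root/robots.txt" <<'EOF'
# Ask all well-behaved robots to stay out of one (hypothetical) directory
User-agent: *
Disallow: /private/
EOF

cat "$site_root/robots.txt"
```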

Meta tags

Newer robots pay attention to META tags in the HTML documents themselves. This is something you can do yourself, and it does not need action from the web-master. However, as the page linked below says, not all robots support this. See:

https://www.robotstxt.org/meta.html
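
A minimal sketch of a page carrying the robots META tag described on that page. The file name and page content are hypothetical; noindex asks robots not to index the page, and nofollow asks them not to follow its links.

```shell
# Illustration only: write a small example page into a throwaway directory.
page_dir=$(mktemp -d)

cat > "$page_dir/example.html" <<'EOF'
<!DOCTYPE html>
<html>
<head>
  <title>Example page</title>
  <!-- Ask compliant robots not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
<body>
  <p>Page content.</p>
</body>
</html>
EOF

# Confirm the tag is present in the page.
grep 'name="robots"' "$page_dir/example.html"
```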

How do I stop email addresses being harvested?

It's a sad fact of life that certain individuals trawl the web looking for web pages with email addresses on them. These addresses are then harvested and sold on to spammers (people who send unsolicited junk email), who then clog up mailboxes with junk.

If a web page contains an email address (useful when telling someone how to get in contact), then it's likely that the address will start receiving junk mail.

When including an email address on a web page, try to disguise the fact that it's an email address. Humans should still be able to recognise it, but programs may not. For instance rather than putting "send mail to SomeAddress@inf.ed.ac.uk" on your web page, write "send mail to SomeAddress (at) inf . ed . ac . uk". It looks a little clumsy, but people should know what you mean, and it should throw most automated harvesters off the scent.

You can go further by using convoluted HTML to obfuscate things. For instance, the text "send mail to SomeAddress@inf.ed.ac.uk" can be written so that it displays normally in a browser

but actually looks like this in the HTML source:

send mail to <a
href="mailto:SomeAddress&#64;inf&#46;ed.ac.uk">SomeAddress&#64;inf&#46;ed.ac.uk</a>

This example also uses a mailto: URL in the <a> tag. A mailto link is usually an even more reliable way for harvesting programs to find real email addresses. If you must use mailto, try to confuse things by using numeric character entities, as in the example above.

None of these tricks will guarantee that the address won't be harvested, but they should help.
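
If you have several addresses to disguise, the entity substitution can be scripted. The helper below is a sketch; the function name obfuscate_email is made up for this example. Unlike the hand-written HTML above, which encodes only the first dot, it replaces every "@" and "." with its numeric entity.

```shell
# Replace "@" and "." in an address with their numeric HTML entities.
# The backslash before "&" keeps sed from treating it as "the matched text".
obfuscate_email() {
    printf '%s\n' "$1" | sed -e 's/@/\&#64;/g' -e 's/\./\&#46;/g'
}

obfuscate_email 'SomeAddress@inf.ed.ac.uk'
# prints: SomeAddress&#64;inf&#46;ed&#46;ac&#46;uk
```

The output can be pasted into both the href attribute and the link text.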

How do I see who's been accessing my pages - logging?

Though technically there are ways you can log access to your web pages yourself, you must not do so without first obtaining the Head of School's permission, and then you must abide by the various legal requirements. See the Access to web logs topic for a bit more detail.

How does AFS affect web services?

To serve pages to the outside world, a web server needs to be able to read them. If the Apache web server daemon can't read a file, it can't serve it. As the bulk of our file system is now based on AFS, and AFS ACLs restrict who (and what) can access files, the web servers need to be given access (via the ACLs) to read the files they then serve on the web.

See the Serving Web pages from AFS page for more information.

How can I get my link to homepages to appear on my 'people' page?

The people pages under https://www.inf.ed.ac.uk/people/ are automatically generated from data held in Theon (the school database).

If you wish to have a 'Personal Page' link to your homepages appear beside your telephone and room number details on your contact page, then you can use the Self Service interface to add this (if you are either a member of staff or a PGR student). Changes made through this interface will take up to 24 hours to actually appear on your contact page.

My Chrome is slow. How can I fix it?

If you're having Chrome problems, try the following:

Close Chrome, then:

killall google-chrome

Wait a few seconds, and run the above command again, until you see "google-chrome: no process found". Then:

mv ~/.cache/google-chrome ~/.cache/google-chrome-OLD

If that fixes the problem you can safely remove the old cache:

rm -rf ~/.cache/google-chrome-OLD
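
The kill-and-move steps above can also be wrapped in two small shell functions. This is a sketch rather than a supported tool: the function names are made up for this example, and it assumes that killall exits with a non-zero status once no matching process is left, which is what ends the loop.

```shell
# Repeat killall until it reports that there is nothing left to kill.
stop_chrome() {
    while killall google-chrome 2>/dev/null; do
        sleep 2
    done
}

# Move a cache directory aside to <dir>-OLD.
# Does nothing if the directory is missing or a -OLD copy already exists.
move_cache_aside() {
    cache_dir=${1:-"$HOME/.cache/google-chrome"}
    if [ -d "$cache_dir" ] && [ ! -e "$cache_dir-OLD" ]; then
        mv "$cache_dir" "$cache_dir-OLD"
    fi
}
```

After closing Chrome, run stop_chrome followed by move_cache_aside. If that fixes the problem, remove the -OLD directory as described above.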

Last reviewed: 18/04/2023
