Final submission deadline: November 15, 4pm. Each student must submit an individual report.
For example projects, see past projects from the Stanford course.
The submission should include:
More details on these elements are given below.
Put all the files as described above in a folder named
<yourID>_coursework
on DICE, where <yourID> is your university student ID.
Then, on DICE, run the command:
submit stn cw1 <yourID>_coursework
It is a good idea to try the submission early to make sure that it is working for you. You can resubmit up to the deadline.
If you are using virtual DICE, be aware that it has had problems executing submissions in the past.
The report should be a 3-page pdf document. The list of references can be outside these 3 pages. Shorter and more concise is better! Make sure to describe everything in your own words. Remember to give your project a suitable title and to mention your team. The report should include:
Problem statement. Clearly define the problem you are solving. This can be an optimization problem, a hypothesis to validate or invalidate etc. It can be different from what you said in the proposal. A (correct) mathematical formulation will be appreciated. Keep in mind that the description should be readable by anyone without specialized knowledge. Explain very briefly the importance of the problem. State why the problem is challenging.
Related work. Mention any relevant papers and state what problem they are solving. Don't describe all the details, just clarify how your work is different.
Your solution/approach. Describe in detail your idea for solving the problem. Where possible, show why it is different from previous work. Use examples. Time may not permit you to do everything, in which case explain what you are presenting here and what can be done later.
Results. Describe your results. Divide them into algorithms, analysis, experiments, etc., as suitable. See how computer science papers are usually divided. Give plots and tables that show your results, and discuss them. You need to explain the results and their importance to us. We were not there during the work, so we cannot understand things that you do not explain. Discuss what else can be done on this topic.
Appendix. You can attach an appendix of up to 3 pages with any additional plots, results and proofs that might be important. This can include results from your teammate that may be relevant to your discussions. You will not be marked on the appendix; it is supporting material to validate what you say in the report. You can cite material in the appendix from your main report, but keep in mind that we may not read it, so the main report has to be understandable on its own.
The code should be readable and well commented. A generally good way to structure it is to write code for the important elements (classes, functions, etc.) and then call them from different scripts, passing suitable arguments such as the data file and parameter values. An IPython notebook may be good for this.
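As a rough illustration of this structure, the sketch below keeps the reusable parts in functions and leaves the script body to parse arguments and call them. Everything in it is hypothetical: the function names, the `--data` and `--top` options, the whitespace-separated edge list format and the degree computation are stand-ins for whatever your project actually does.

```python
"""Hypothetical script layout; every name here is illustrative."""
import argparse

# Tiny built-in edge list so the script also runs with no arguments.
TOY_EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def load_edges(path):
    """Read a whitespace-separated edge list, one 'u v' pair per line."""
    edges = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment lines
            u, v = line.split()[:2]
            edges.append((u, v))
    return edges


def degree_counts(edges):
    """Return a dict mapping each node to its degree (undirected graph)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def top_nodes(degree, k):
    """Return the k highest-degree nodes, ties broken by node name."""
    return sorted(degree, key=lambda node: (-degree[node], node))[:k]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Report the highest-degree nodes.")
    parser.add_argument("--data", help="edge list file (default: built-in toy graph)")
    parser.add_argument("--top", type=int, default=3, help="number of nodes to print")
    args = parser.parse_args(argv)

    edges = load_edges(args.data) if args.data else TOY_EDGES
    degree = degree_counts(edges)
    for node in top_nodes(degree, args.top):
        print(node, degree[node])


if __name__ == "__main__":
    main()
```

With this layout the same functions can be imported from a notebook or from several scripts, and changing the dataset or a parameter only requires changing the command line.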
We would sometimes like to run your algorithm to see that it works. So if possible, provide a guideline pdf. This can simply state which command to run to produce which plot, plus any comments you would like to include. These examples should use small datasets, for example those you use for testing while writing code.
Including an IPython notebook can be the easiest option if you are using IPython.
The guideline should also contain clear instructions on how to run your code on a different dataset if needed.
Ideally, your code should run on DICE. (You don't have to develop on DICE; just try to make sure the code runs there.) If it is not possible to run it on DICE, please provide clear instructions on how we may be able to run it on other computers.
Submitting the dataset may be impractical if it is larger than a few megabytes. In that case, put it in a folder on DICE and make the folder readable by all. This can be done by running the following command from the parent directory (the capital X also lets others enter any subdirectories):
chmod -R a+rX <folder_name>
In your submitted code and the examples above, use the full absolute path to the dataset folder on DICE. You can find the path to any directory by running
pwd
from inside the directory. This way, we will be able to run your code on the data without you having to submit the data. Test that this is working before submitting.
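One way to test the shared folder before submitting is a small script along the following lines. It is only a sketch: the function name is made up, it assumes that access is governed by the ordinary Unix permission bits that chmod changes, and it does not check the parent directories leading to the folder.

```python
import os
import stat


def find_unreadable(dataset_dir):
    """List reasons why other users might fail to read dataset_dir.

    Assumption: access is decided by classic Unix permission bits.
    Parent directories of dataset_dir are not checked.
    """
    if not os.path.isabs(dataset_dir):
        return ["not an absolute path: " + dataset_dir]
    if not os.path.isdir(dataset_dir):
        return ["not a directory: " + dataset_dir]

    problems = []
    for root, _subdirs, files in os.walk(dataset_dir):
        mode = os.stat(root).st_mode
        # Others need read (list) and execute (enter) bits on a directory.
        if not (mode & stat.S_IROTH and mode & stat.S_IXOTH):
            problems.append("directory not open to others: " + root)
        for name in files:
            file_path = os.path.join(root, name)
            if not os.stat(file_path).st_mode & stat.S_IROTH:
                problems.append("file not readable by others: " + file_path)
    return problems
```

An empty list means no problem was found by this check. Asking a classmate to list the folder from their own account is still the more reliable test.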
The dataset folder can also contain the small sample datasets for the examples above, and any cleaned/modified/reformatted versions of data you are using.
If your main dataset is too large to store in your account, please provide the link to the original dataset, or put it on a cloud service and provide that link so that we can download it.
You will be marked on trying new ideas, justifying them and explaining clearly. Ideas may not always work out well, but you should be able to justify your decisions and ideas.
Group work:
Writing:
Project management.
At the beginning, it is often useful to experiment with small datasets (like parts of a big one) or even artificial ones. This is useful for making sure that your code works as expected, and for getting an intuition of where it is working and where it is not. You can also easily plot and visualize small graphs. These are useful examples for developing your ideas as well as for explaining them in the report.
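A minimal sketch of both options, assuming the network is stored as a list of (u, v) edge pairs: one hypothetical helper keeps only the edges among a random subset of nodes, the other builds a small artificial graph. Both take a seed so that a test run can be repeated exactly.

```python
import random


def induced_sample(edges, num_nodes, seed=0):
    """Keep only the edges whose two endpoints fall in a random node subset."""
    rng = random.Random(seed)
    # Sort the nodes so that the same seed always gives the same sample.
    nodes = sorted({node for edge in edges for node in edge})
    keep = set(rng.sample(nodes, min(num_nodes, len(nodes))))
    return [(u, v) for u, v in edges if u in keep and v in keep]


def random_graph(num_nodes, num_edges, seed=0):
    """Artificial undirected graph: num_edges distinct pairs chosen uniformly.

    Raises ValueError if num_edges exceeds the number of possible pairs.
    """
    rng = random.Random(seed)
    all_pairs = [(u, v) for u in range(num_nodes) for v in range(u + 1, num_nodes)]
    return rng.sample(all_pairs, num_edges)
```

Sampling nodes uniformly tends to produce a much sparser graph than the original, so a sample like this is better suited to checking that code runs correctly than to drawing conclusions about the full network.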
Things will not always work out the way you expected.
You may find yourself short of time. Do not panic! A common mistake is to stick too rigidly to the plan. Think about what you can do instead of the original plan.
Maybe you can run your program on a smaller dataset. Maybe you can simplify the algorithm so that it does not do exactly the same thing or does not work on all types of data, but does work on some types of data. Maybe it is solving a slightly different problem from the original one. Maybe the reason it does not work is an interesting observation in itself.
Instead of worrying that you may not be able to do it, think about what you can do. Often, interesting things crop up if you spend a bit of thought. If you keep notes of ideas/possibilities from the start, some of them can be useful in a crunch. Your final goal is to submit some kind of an interesting report.
Comparisons.
The purpose of this plan is to make sure that you are prepared for the project and have some ideas about how to go about it. It is possible to adjust your plan as you do the project. The proposal is not marked.
Submit a half-page (not more) pdf document for your group. The document should contain:
Team. Mention your team in the document: names and student IDs.
Problem statement or hypothesis. What you have now is a general topic area. A "problem statement" is something more precise, for which it can be checked unambiguously whether you have achieved it. For example, "influence maximization" is a topic, but it is vague until we state the exact problem. In class, we defined a particular influence maximization problem by defining the independent activation model, and the "problem statement" was to design an algorithm (for this model) that achieves a property: an approximation of the maximum influence in polynomial time. In your project, a problem statement can be to design an algorithm that satisfies some property X. If you are doing an experiment or data exploration oriented project, then the problem statement should consist of a "hypothesis" to be tested. A hypothesis can be something about the properties of the data (and the network defined from it) in terms of, say, communities, their properties, connected components, shortest paths, etc. In the final report, you should be able to say whether the hypothesis was validated or not. The problem statement should not be something trivial, like what is already generally known. The problem statement can be fine-tuned as you do the project, so it is not that you absolutely must achieve what you set out here. But it is important to have a credible problem statement now so that your project has a clear direction.
Importance. State why you think the problem is interesting and relevant. If you solve the problem, how can someone use that result?
The relevant dataset. State which dataset and attributes of the data you will use to solve the problem. Explain why the dataset is sufficient and has all the information needed for your project. This means that you must have found, opened and started to use the dataset by this time.
Related work. Which other papers have already addressed the same/similar problem? And optionally, how will your work be different? (You do not need to write a complete survey. Check whether there are papers that are close to your problem statement and figure out their topic.)
Preliminary ideas. Highlight a few techniques that you are planning to explore in the project. The more original the better! Don't be shy to try whacky ideas if you can justify them!
Evaluation and Baselines. How would you show that your solution is good and your observations are meaningful? Usually this is done by comparing with "baselines" -- existing simple methods or standard networks. How would you do evaluation? What are your baselines?
Schedule/Timeline. What are the intermediate stages for your project? When do you expect to finish each of those?
Additional consideration: Teamwork. How would you split the work in the team? Who will submit what in the final version? Plan this now. Your marking will be separate. You can include this in the plan if you wish.
Additional consideration: Computation time. How long will it take to run things on the data? Of course, you cannot know this exactly, but try some simple computations on the data so that you get some idea. If the computation is too time consuming for a large dataset you are using, how would you adapt your approach?
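For the computation time question, one rough approach is to time the same computation on two small samples of different sizes, estimate how the running time grows, and extrapolate to the full dataset. The sketch below assumes a power-law model, seconds = c * size**b, which is only an approximation. All function names are hypothetical, and `count_close_pairs` is a stand-in quadratic workload, not part of any real project.

```python
import math
import time


def time_call(func, *args):
    """Return the wall-clock seconds taken by one call func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def growth_exponent(size_a, seconds_a, size_b, seconds_b):
    """Estimate b in the model seconds = c * size**b from two measurements."""
    return math.log(seconds_b / seconds_a) / math.log(size_b / size_a)


def predict_seconds(size_a, seconds_a, exponent, full_size):
    """Extrapolate a measured running time to full_size under the same model."""
    return seconds_a * (full_size / size_a) ** exponent


def count_close_pairs(values):
    """Stand-in quadratic workload: count pairs that differ by less than 1."""
    return sum(1 for i, x in enumerate(values) for y in values[i + 1:] if abs(x - y) < 1)


if __name__ == "__main__":
    small, large = list(range(500)), list(range(1000))
    t_small = time_call(count_close_pairs, small)
    t_large = time_call(count_close_pairs, large)
    b = growth_exponent(len(small), t_small, len(large), t_large)
    print("estimated exponent:", round(b, 2))
    print("predicted seconds for 100000 items:",
          predict_seconds(len(large), t_large, b, 100000))
```

Single timings are noisy, so the estimate is only a rough guide. If the prediction runs to many hours, that is a signal to plan for a smaller sample or a simpler algorithm.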
It may not be possible to answer all the questions above perfectly at this stage. But you should have answers to the problem statement, dataset and ideas questions. Do try to answer the rest to avoid issues later on.
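To make the evaluation and baselines question more concrete, here is one possible way to compare against a "standard network": compute a statistic on your graph and on random graphs with the same number of nodes and edges. This is a sketch under assumptions, not a prescribed method. The choice of triangle count as the statistic, the uniform random graph as the baseline and the function names are all illustrative, and the graph is assumed to be simple (each undirected edge listed once, no self-loops).

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Count triangles in a simple undirected graph given as (u, v) pairs."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    # Each common neighbour of an edge's endpoints closes one triangle.
    total = sum(len(neighbours[u] & neighbours[v]) for u, v in edges)
    return total // 3  # every triangle is found once per each of its 3 edges


def random_baseline(num_nodes, num_edges, statistic, trials=100, seed=0):
    """Average `statistic` over random graphs with the same node and edge counts."""
    rng = random.Random(seed)
    all_pairs = list(combinations(range(num_nodes), 2))
    values = [statistic(rng.sample(all_pairs, num_edges)) for _ in range(trials)]
    return sum(values) / trials
```

For instance, calling `count_triangles` on your edge list and `random_baseline` with the same node and edge counts gives two numbers to compare. A large gap between them suggests structure that the random model does not explain.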
On Piazza, post a note in the project folder, with the proposal as a pdf attachment. You are free to post it publicly, in which case other classmates can see the discussion.
Give the note an appropriate title. In the body of the post, mention your team, and any specific questions or discussion points you have. We will try to answer them.