|
|
Instructor: Mark LeBlanc
Mark's Web Page -- Email
Office: Science Center[B103]
Office Hours: by appt. or
Mon 11:30-12:30, 3:30-4:30
Wed 11:30-12:30
|
Phone: 286-3970
Class Meeting Times:
MWF 9:30-10:20am, A118
Lab: Wed 3:30-5:20, csLab
|
" ... then (the algorithms of today) will become 'simple' problems and a
new generation of challenges, which we can now only barely imagine, will take
their place on the frontier of what it is possible to do with computers."
Aho and Ullman, Foundations of Computer Science, p95.
Texts: Data Structures and Algorithm in C++,
by Michael T. Goodrich, Roberto Tamassia, and David Mount.
John Wiley and Sons, 2004. Required. An excellent introduction to the importance
of and rigor in algorithm analysis.
Applications Programming in ANSI C (3rd Ed.) by Johnsonbaugh
and Kalin.
Prentice Hall, 1996. Required ... because everyone needs a good reference
on C and this is a very good book; you'll pull it down from the shelf
a lot
Selected Theme Readings in Genomics:
Krane, Dan E. and Raymer, Michael L. (2003).
Fundamental
Concepts of Bioinformatics. Benjamin-Cummins, Boston, MA.
Developing Bioinformatics Computing Skills,
Cynthia Gibas & Per Jambeck. O'Reilly, 2001. Ch. 1, 2, and 7 especially
good.
Goodman, Nat. (2002). View from the waterworks: the world of NCBI. Genome Technology, No. 23, July 2002, p 74-79.
Other good books in the Library and/or in the csLab include:
Introduction to Algorithms, (2nd edition) by Cormen, Leiserson, Rivest, and Stein.
McGraw Hill, 2001. (Highly recommended when you need more depth in a particular topic).
The Algorithm Design Manual, by Steven S. Skiena,
Springer-Verlag, 1998. An excellent starting point for a wide
range of problems.
Content of the course: An introduction to the mathematical foundations, design, implementation and computational
analysis of fundamental algorithms. Problems include heuristic searching, sorting, several graph theory problems, string matching,
and the theoretical expression of their orders of growth. Out-of-class assignments and hands-on labs emphasize the balance between
theoretical hypotheses (that is you figure out on a napkin how long an algorithm will take to run ) and experimental verification
(if your napkin
math says the algorithm will finish in a reasonable time, then you implement a program to actually run an experiment).
Theme: The Fall 2003 theme throughout the
course is Genomics. You grapple with what it means computationally
to have (nearly complete) sequenced genomes available on the web, including
human's 3 billion base pairs. The DNA alphabet of base pairs (letters)
is made up of only four choices: A, G, T, or C
(the abbreviations for their chemical equivalents, Adenine, Guanine,
Thymine, and Cytocine). In short, genomics is "computer
science meets biology" or "biology meets Big Oh." For the purposes of
computing, a DNA molecule is a word (a string of characters) where the
characters are taken only from this four-letter alphabet (A,C,G,T), for
example the string: or 5-mer "ACGTC". Strings, strings, strings -- DNA,
DNA, DNA. (Actually, we could also use protein-coding strings of amino
acids, but we will stick with DNA this semester). In particular, we will
focus on (i) string matching algorithms (finding all occurrences of "ACGTC"
in the DNA sequence of C.elegans or some other organism), (ii)
the representation of strings in a suffix tree data structure, and (iii)
state-of-the-art sequence databases and their search algorithms (e.g.,
BLAST) and their potential uses. We are especially interested in addressing
some of the "open search questions" which will require computational methods,
e.g., dynamic programming to find inverted repeats that do not base-pair-match
exactly (irregular palindromes) in sequences of DNA. In particular, you
will do four labs and an individual final project on problems in genomics
and/or evolutionary trees. Final projects include the potential for you
as an undergraduate to ask novel research questions as you search DNA
sequences from the human or other genomes.
The sequencing of genomes (e.g., files filled with megabytes of A,C,G,T's)
provides a new and rich source of data that is screaming for programmers
to help make sense of it all. Said differently, the biologists know that
they can not do the work alone: they need you! And software developers
rely on the expert scientist(s). We will use the expertise of local biology
faculty doing research in this area. They will serve as the "lead scientist"
and help us define problems you might work on. The intent is for some
of your homework to be research.
Designing and implementing algorithms to search genomes is like standing
at the bottom of a mine with a pick axe: "There's a lot of mining to be
done; might as well start here." Some will find gold.
This course does not require previous experience in college-level biology,
although it is clear that a number of students do come with some background.
Our genome expertise and guidance will come from Dr. Betsey Dexter Dyer,
Professor of Biology here at Wheaton College. Some labs and your final
project will be true research. Your individual final project has the potential
to ask an original research question.
Curriculum: This course is one of four core courses that are required for a computer
science major
or minor at Wheaton. In many ways,
Algorithms is the first course
that challenges students with the sometimes subtle rigor of the discipline.
It seems that everyone loves computers today, but the requirements for
reliable and efficient software go far beyond the glitz of the web. The sections
in an Algorithms text, for example: Big Oh notation,
growth rates, combinatorial search, heuristic methods and intractable problems are themes which emerge
throughout most subdisciplines of computer science.
As always ...
In computer science, if you are almost correct you are a liability.
Fred Kollett (1941-1997), Professor of Math and CS, Wheaton College
Topics:
Exact pages to read are assigned in lecture and lab.
Difficulty
Level |
Reading |
Topic |
** |
ANSI C text |
A taste of C...
Porting your code to other platforms: CodeWarrior(MacOS), Visual
C++(Win), gcc (RedHat Linux). Intro to input/output in C.
|
** |
you program |
a0: Quick C review
due Friday, Sept. 05 |
********** |
Ch. 3 |
Big Oh and napkin math ...
because f(n)=O(g(n)) matters |
**** |
you program
Ch. 1 |
a1: FindAllPrimes()
due Mon., Sept. 15
On to C++ ... |
* |
selected reading: "View from the waterworks: the world of
NCBI." Genome Technology, July 2002. |
Mon., Sept 15 Prof. Betsey Dyer (Biology) guest lecture:
Genomics 101 (with a live Julia-Child-like demo of separating
DNA) |
** |
Genomics Lab1 |
BLASTing the 16S RNA
Genes
Wed, Sept. 17 |
***** |
notes on string searching
Ch. 11 |
Search -- the human genome is 3 billion letters ... no time to
waste! Knuth-Morris-Pratt (KMP) algorithm. |
**** |
you program |
a2: eLmer Jr.
Searching for strings of DNA
due Mon. Sept. 22 |
**** |
Genomics Lab2 |
Tweeking eLmer Jr.
Wed, Sept. 24 |
********** |
selected readings in text |
Recursion, recurrence relations, induction (oh yeah, prove it) |
****** |
you program; recurrence relations and closed-form revisited |
a3: isPal() - Finding Inverted Repeats
due Wed., Oct 22 |
****** |
Krane and Raymer (2003), Ch. 2 |
Dynamic Programming; Needleman-Wunsch and Smith-Waterman
algorithms |
** |
Guest Lecture |
Friday, Oct. 17 Prof. Tommy Ratliff (Mathematics) on "Finding Roots of Equations" |
******** |
Ch. 10 |
Sorting (your socks et al.) |
**** |
you program |
a4: Experiments in Sorting
due Fri., Oct. 31 |
**** |
Genomics Lab3
|
Specifications for Research Project on the Genomics of Olfaction
Fri, Oct. 31 |
******* |
Ch. 12 |
Graph Data Structures |
***** |
Ch. 12 |
Breadth-first, depth-first search |
************ |
Notes |
Adversarial search in games -- check mate dude |
**** |
Genomics Lab4
|
Briefing to coordinate your experiment, results, and presentation
Mon., Nov. 24 |
*********** |
Ch. 6 |
Minimum Spanning Tree |
*********** |
Ch. 12 |
Shortest Path -- I got here first |
*********** |
you program |
Final Project in Genomics
oral presentation of results:
Sunday, Dec. 07 in the evening |
**** |
you program |
a5: King Saul Checkers
due Fri., Dec. 12 |
The exact pages for you to read in your text and/or on reserve will be assigned in lecture.
Your Grade:
Things to do |
Grading Percents |
Frequency |
4 Genomics Labs | 5% |
Sept. 17, Sept. 24, Oct. 31, Nov. 24 |
Homeworks | 15% | TBA |
6 Programs | 40% | (see dates above) |
2 Exams | 20% |
Wed., Oct. 08 and Wed., Nov. 07 |
Final project | 10% |
Oral Presentation of results: Sunday evening, Dec. 07 |
Final Exam | 10% |
TBA |
Late Submissions:
Due is due. Always turn in whatever you have on time. Something
turned in on time is much better thanreceiving no credit because it is
late. If the program is "due" by Tuesday, 5am, I reserve the
right to turn OFF the dropbin, that is, you will not be allowed to submit
your source code after that point. If your code does not compile/run/work
right, submit what you have and tell me in your README file. So
let us agree together: all assignments are due as stated. (Exceptions
granted for serious reasons). If your professor is your boss (and he is
of sorts) and your boss wants it when she said she wants it, then you
submit what you have and explain what it does and does not do in your
README file. Late is not an option. (Good, glad we
can all agree with this).
Honor Code Revisited: It goes without saying that all submitted work
will be the student's own, in keeping with the Wheaton Honor Code. Use discretion;
don't ask your colleague for "the" answer or for a section of code. However,
I do encourage you to discuss the problem in general, such as the type of
data structure one might use or to find a syntax error. For homework, your
answers and code must be your own from beginning to end. On the exams, you
may not get help from anyone but the instructor.
HOMEWORK
(0)It is expected that you spend at least 3 hours on reading and practice problems for every 90 minutes of lecture. This computes to at least 6 hours of work in the text per week. This should be done throughout the semester and not just when studying for exams. The material is cumulative in a big way; for example, week 5 depends heavily on weeks 1 through 4.
(1) It is expected that you spend at least 6-10 hours per week on your current programming assignment. WARNING: Programmers typically underestimate the time it takes to complete a software project; 6-10 hours per week on your programming assignment may be one of those "underestimations."
EXAMS
There are two exams during the semester and a final exam during finals week.
There will be no makeups, nor will the lowest exam grade be dropped. If you are an athlete and/or you have a conflict with an exam date, please see me within the first week of classes.
LABS in GENOMICS
There will be four labs in genomics. In Lab1 you will use the most common
and one of the highest powered web-based genomics tools: BLAST. BLAST
helps you "blast" your way through multiple genomes, for example, to see
just how alike the DNA sequences within genes from one genome (e.g., a
bacteria) match the DNA sequences in genes from other organisms (e.g.,
a worm, a human, etc). After completing your third program (eLmer Jr.
-- a program to find DNA words of length L), Lab2 will be a demo of your
program to a biologist and the biologist will suggest some possible "tweaks"
to your program, e.g., how to alter eLmer to look for other things. In
Lab3, we will discuss the specification for your final projects. Lab4
will serve as a briefing where you can report on the status and your plans
for your final project, experiment, and oral presentation.
FINAL PROJECT in GENOMICS
You will work on one "large" problem in genomics. The biological significance
of the problem will be explained by a biology expert; however, the software
will be entirely up to you. More specifically, you will work on the "Genomics
of Olfaction" where you will design and implement an experiment to
find locate groups of DNA sequences within genes which have the
highest summation of hydrophobic values. Many more details to follow.
An oral presentation of your results (as if to a large audience at a conference)
and documented software are included in the requirements.
HELP
I have listed my office hours on the syllabus. But you know I'm always
near a keyboard! Don't let this material bury you. Study, study, study
and talk about it with me and others as often as you can. Homeworks and
programming assignments can be especially challenging. Visit me; ask questions!
Please don't wait too long before you see me;
a quick chat in my office can often clear things up.
I'm here a lot
|