18 Sept 2008--bio projects
Monica Sweat, Aaron, Soojin, Mark Borodovsky, Charles Isbell, Jung, Greg Turk, and Mark met in Klaus.
After a brief discussion where Monica brought everyone up to speed on the Bioinformatics course, the meeting focused on biology projects that could serve as targets in a BioCS1 – activities that we think CS1 students would find motivating and engaging, and through which, we could teach CS1 concepts and skills.
The projects are described here roughly in increasing complexity.
- Given a string of nucleotides, return the "base composition," e.g., what percentage of all nucleotides are A's?
- Given a DNA string of nucleotides, can we find a given subsequence?
- Given a string, can we find a subsequence with constraints, e.g., "starts with GTTT, ends with ATG, and contains an AAA somewhere in the middle"? This can already be pretty computationally intense, giving us the opportunity to talk about optimizations and algorithmic complexity.
- What is the "edit distance" (number of changes or mutations) between two sequences?
- Can we find an RNA palindrome, where the sequence is the same forward or backward, reflected around a mirror point?
String as instructions
We talked about reading a string as a series of instructions:
- Construct an interpreter that allows the user to definea figure (e.g. "l10,10.20,20b15,15" might mean "draw a line from 10,10 to 20,20, then place a black dot at 15,15") or to define a series of instructions for a turtle/robot (e.g., "UUDLR" might mean "Up, Up, Down, Left, Right"). The idea is to talk about viewing DNA as a list of instructions, and to touch on CS ideas of interpretation.
- We also talked about protein folding as a way of defining pictures.
- Monica suggested actually building a program to simulation protein synthesis: Walk a string of nucleotides, match the codons up to amino acids, and then to the proteins that will be created from those amino acids. This activity uses hash-tables and dictionaries, and gets into "interpretation" by interpreting "stop codons," too. Strings and textfiles can be introduced to hold the nucleotide sequences and the intermediate results.
- At this point, it would be natural to talk about cell regulatory processes, how the equivalent of "If-Then"'s appear in the cell.
- Once we have Monica's project, we can go on to talk about how random mutation in the nucleotide sequence can lead to changes in the amino acids and proteins. We can also talk about sex as another way to create a diversity of nucleotide sequences to test.
- Once we're on the path of considering mutation vs. other forms of generating sequences, plus the "edit distance" from earlier, we can talk about constructing a genealogy – which sequences most likely were ancestors (even parents) of other sequences? This may require a good bit of probability, and even some CS2 ideas (like trees, networks, and other dynamic data structures). We decided not to filter for CS1 vs. CS2 ideas at this point – we'll work that out later.
- We could do predator/prey simulations. Aaron was excited about the idea of talking through how some predator-prey simulations were simple, solvable equations, and then with only slight changes, the simulations were no longer easily solvable and a dynamic component would be necessary. Thus, we can talk about analytical-only vs. computational-necessary simulations.
- Greg talked about doing different kinds of Predator/Prey simulations on a grid, which makes them simpler to solve and still quite interesting computationally. (See the article Greg sent and is uploaded on the Resources page.)
- Disease propagation simulations are natural here. We can talk about catastrophe theory and instabilities, and epidemiology and models of disease transmission.
- Cellular automata are natural here and more easily constructed than other simulations. We were less excited about Conway's "Game of Life" and more excited by those cellular automata that actually do explain real world behavior (e.g., the model describing wildfire propagation).
Modeling the Individual
- Abelson and diSessa's Turtle Geometry does some nice simulations of individual animal behavior and how that changes depending on the sense organs available, e.g., how an animal searches if they can only test gradients with their nose, and how an animal searches with only one eye, versus two eyes.
The plan at the next meeting is to start laying these ideas out against what CS1 ideas:
(1) are necessary to solve the problem (and thus have to be learned for this project, or previously),
(2) are covered by the collection of projects, in order to identify CS1 concepts that aren't covered by this set of projects yet.
Whiteboards from meeting (click on images to see full-size):