Introduction to Bioinformatics
Microarrays

Section of a microarray image, courtesy of Eric Jeffery, Corixa Corporation
Microarray technologies show us which genes are turned on in different cell types in different circumstances. In response to infection, for example, certain cell types will express sets of genes and synthesize certain proteins that respond to the stress. Messenger RNA (mRNA) is like a photocopy of a blueprint that is used in the shop to build a specific type of protein. In a microarray, we can attach sequences from a range of genes to a glass slide in a series of dots, and then bind the mRNA extracted from a population of cells and measure how much binds to each dot. That gives us a snapshot of which genes are being expressed at any given time. Compare the patterns for mRNA from, for example, normal breast tissue and from a breast tumor, and you can identify genes that are expressed only in the tumor. The proteins those genes encode are potential targets for cancer treatments, vaccines, and other therapeutics.
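To make the normal-versus-tumor comparison concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the gene names and intensities are made up, and the function name, the log2-ratio cutoff, and the `floor` pseudo-intensity are hypothetical choices, not part of any real microarray package. Real analyses also need background correction, normalization, and replicate statistics, none of which appear here.

```python
import math

def tumor_specific_genes(normal, tumor, min_log2_ratio=2.0, floor=1.0):
    """Return (gene, log2 ratio) pairs for genes much brighter in tumor.

    normal, tumor: dicts mapping gene name -> spot intensity (arbitrary units).
    floor: small pseudo-intensity added to both values so that a spot with
           zero signal does not cause a division by zero.
    """
    hits = []
    # Only compare genes that were measured on both slides.
    for gene in sorted(normal.keys() & tumor.keys()):
        ratio = math.log2((tumor[gene] + floor) / (normal[gene] + floor))
        if ratio >= min_log2_ratio:
            hits.append((gene, round(ratio, 2)))
    return hits

# Made-up spot intensities for three hypothetical genes.
normal = {"GENE_A": 120.0, "GENE_B": 15.0, "GENE_C": 900.0}
tumor = {"GENE_A": 130.0, "GENE_B": 950.0, "GENE_C": 880.0}

for gene, log2_ratio in tumor_specific_genes(normal, tumor):
    print(gene, log2_ratio)
```

With these invented numbers only GENE_B clears the cutoff, since its signal is far stronger in the tumor sample while the other two genes barely change.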
Systems Biology

Edited screenshot taken from the Cytoscape tutorial (www.cytoscape.org)
The genome gives us all of the genes in an organism, and microarrays tell us which subset is expressed in a particular biological process. Now the bottleneck in understanding biology is shifting to the world of proteins and the interactions between them. The traditional approach of dissecting out individual interactions with the help of mutations and inhibitors just doesn't scale. That is where systems biology comes in with a slew of novel technologies aimed at seeing the big picture of everything going on in a cell.
New advances in mass spectrometry have allowed this established chemical analysis technology to identify the components of complex mixtures of proteins. Inventive chemical labeling techniques provide insight into the transient interactions between different proteins in the cell. This bundle of new technologies is called proteomics.
The integration of all of these results with gene expression data and the collective knowledge of cell biology, contained in the scientific literature, becomes another huge challenge. This is leading to exciting work in textual analysis, pathway modeling, and network visualization.
Structural Biology

Nitrogenase structure 1CP2 displayed in MacPyMol (pymol.sourceforge.net)
While our abstraction of the DNA sequence works remarkably well, in the world of proteins the nuances of three-dimensional structure are everything. Structural biologists determine the structure of proteins using X-ray crystallography and nuclear magnetic resonance, a slew of heavy numerical methods, and a lot of computing. This is a huge field in its own right that predates bioinformatics by several decades. It focuses on the details of structure, the dynamics of molecular motion, and the specific interactions with drugs and other proteins. Bioinformatics, with its focus on huge volumes of data, has often had an uneasy interface with structural biology; "quantity versus quality" some might say, but that distinction is becoming ever more blurred as all of these data sources become more integrated.
Software in Bioinformatics
Two main factors have shaped the current landscape of bioinformatics software. As already mentioned, the field has been driven by the massive amount of data and the research projects that generate it. As a result, most people in bioinformatics work on very focused projects and few have the luxury to sit back and write the ideal program for gene prediction, for example.
In addition, the technologies used in the lab, and the data they produce, have evolved very rapidly. That has made it very difficult to commit a lot of resources and time to specific pieces of software. The lifespan of a software project is often quite short and the lead time before deployment is minimal. Being able to understand the essence of a problem and hack up a quick solution that gets the job done are critical skills for a good bioinformatics developer.
A classic example is the genome assembler written by Jim Kent at UC Santa Cruz. Excellent software already existed for assembling the fragments of data produced by sequencing instruments into large blocks, but it could not handle the scale of the task that the Human Genome Project had created. Rather than try to modify existing code, it made sense for Kent to start from scratch and build something, in very short order, that was tailored to the task at hand. More than a quick hack, but a lot less than a complete, polished product, Jim's software assembled the human genome.
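For readers who have not seen what "assembling fragments into large blocks" means, the following Python sketch shows the basic idea using a naive greedy overlap merge. This is emphatically not Kent's method or that of any production assembler; the function names, the `min_len` threshold, and the toy fragments are all hypothetical, and the sketch ignores sequencing errors, repeats, and reverse-complement strands, which are exactly the problems that make real assembly hard at genome scale.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such match is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best = (0, None, None)  # (overlap length, index of a, index of b)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing left overlaps; return the remaining pieces
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags

# Three made-up reads that tile a short sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

The all-against-all comparison inside the loop is the part that does not scale: its cost grows with the square of the number of fragments on every pass, which hints at why a project the size of the human genome needed software built for the job.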
Refined, mature software packages usually emerge from research groups with a direct bioinformatics focus, as opposed to those playing a support role in, say, a genome center. Of all of the software out there, the "killer app" in bioinformatics has to be BLAST, the suite of sequence comparison tools from NCBI, the National Center for Biotechnology Information at the NIH. The BLAST team built a very fast sequence-comparison engine that could search the entire contents of GenBank in seconds. Over the years, they have improved performance and extended their algorithms, but have always retained their focus on what they do well. As a result, just about every molecular biologist who has ever looked at a sequence has used the NCBI BLAST server.
What Role Is There for Mac OS X?
Molecular biologists have had a long history of involvement with the Mac, in part from a natural gravitation to the platform but undoubtedly helped by Applied Biosystems' choice of the Mac as the front end to its DNA-sequencing instruments. That changed in the mid-90s as the Windows interface improved and the price/performance ratio shifted in favor of the PC platform. Computer scientists and developers coming into bioinformatics, on the other hand, were used to using Unix. The rise of Linux has locked that preference firmly in place.
Mac OS X has the potential to be the ideal platform for bioinformatics development, with Unix under the hood, a great desktop, productivity applications, and integration with Windows systems. Porting existing bioinformatics packages from Linux is usually straightforward, and many are already available from the Fink project. Being able to expose command-line tools to the desktop user in a simple way will broaden their user base dramatically, and Mac OS X provides several ways in which to build this kind of interface.
Getting Involved in the Field
Bioinformatics is a very rewarding area for software developers to work in. There is something for everyone, whether you're into the minutiae of database design, complex user interface design, advanced statistical algorithms, or good old Perl script hacking. The technologies that produce the data you work on are amazing. The data and the biology behind them are fascinating. On top of that, biologists tend to be nice people to be around!
One topic that we will cover in the coming months is how you get started in bioinformatics: what programming skills you need, how much biology you should know, and how to build a lifelong career in the field.
Bioinformatics and the Mac DevCenter
Over the next few months we will cover these topics and others in more detail. Where possible, we'll also include short tutorials that introduce you to some of the key software tools used in bioinformatics. These will guide you through analyses of real datasets from the Human Genome Project and elsewhere. They won't make you an expert, but I hope they will spur you on to further explorations of your own. Stay tuned!
Robert Jones runs Craic Computing, a small bioinformatics company in Seattle that provides advanced software and data analysis services to the biotechnology industry. He was a bench molecular biologist for many years before programming got the better of him.
-
Yes, nice summary...
2004-06-13 09:13:42 raydreams
I'm always looking for a nice 1-2 page article giving a quick rundown of bioinformatics to send to people.
Yes, unfortunately, getting a career in bioinformatics seems to happen (like many careers) by accident or when you're not really looking for it. I have an undergrad degree in biology and did wet-lab molecular biology for seven years, where part of my responsibility was sequencing, submitting to, and searching GenBank. That ultimately led me back to school for a CS undergrad degree (and presently a grad degree), which led to a job at a company called Perceptive Scientific Instruments writing cytogenetics software. But PSI got bought out by its competitor, Applied Imaging, resulting in layoffs. Now I'm stuck in generic IT (which was not my motivation for getting a CS degree), but ironically I sometimes work with the Bioinformatics Division at NASA-JSC, though not on bioinformatics as described here.
Unfortunately, wanting to do bioinformatics is not enough. Companies that do real bioinformatics don't hire too often. A person would have a better chance of breaking into the field by finding a position working for a university researcher, as a student or employee.
-
Enjoyed the article
2004-06-12 16:48:16 bj_ray
I think it is interesting that you plan to discuss how an individual might enter this field. Typical articles on specialized topics don't really provide a path for someone who might be interested in it as a career. This field, in particular, doesn't seem to be the type of field that any IT developer could simply apply for. It is especially interesting to me as an IT software developer with a degree in Electrical Engineering who wishes he had pursued Biomedical Engineering back when he was in school. Maybe this will be a guiding light. Looking forward to your next article.
-
Reading list
2004-06-12 13:54:23 mariox19@mac.com
Is there any substantial introduction to the field of bioinformatics for a general audience? If you have no background in biology, how will you be able to investigate whether or not you personally will find the field interesting and whether or not you have an aptitude for it?
Nice article, by the way. I'm looking forward to the follow-ups.

Third is the reality that bioinformatics is not a theoretical science; it is driven by the data, which in turn is driven by the needs of biology. Relatively few researchers have the luxury to develop algorithms and theories in the traditional academic sense. Most people are fully consumed in the day-to-day management and analysis of data.
My perspective might be idealistic, but the reasoning behind it might be worth thinking about.
The view you present is certainly present in many bioinformatics support groups, but I have mixed feelings about it. I worry that a lot of people don't think enough about the fact that the methods are (done properly) encapsulations of models of biological theory. Unless you understand that underlying theory well enough, you're going to use the tools in a sloppy way.
The reason I worry is that much of the bulk processing is vulnerable to a bioinformatic variant of the old garbage-in, garbage-out rule.
Likewise, if you are going to develop new analytical methods, you had better have a deep understanding of the biological systems involved.
There is plenty of need for "straight" IT: sys admins, GUI designers, web services, databasing, etc. There is a wide range of skills used in the bigger teams, ranging from the computational biologist who is thoroughly versed in the subtle aspects of the field through to the local "Unix geek" who can run rings around anyone in the place at the command line but who knows damn-all biology. The high-throughput stuff is needed, and probably the bulk of these teams aren't at the computational biology end of the spectrum, but someone along the way needs to take responsibility for the approach used. People coming to the field should think about where they belong on that spectrum.
Data management does take up a large part of most research projects (of any kind), but at the heart of it will be some analytical processes.
For those wishing to get into the algorithmic side of things, there certainly is scope for new developers from straight IT backgrounds -- but under the guidance of someone experienced to ensure that what they are doing is meaningful. I'd second another poster's advice to work under someone who has experience in the field first. There are plenty of studentship, internship, etc., opportunities out there.
Hmm, this got rather long. Must have lit my wick :-) I hope this isn't wasted bandwidth.