The Site

Search -- the site, mail archives, ACL, PubMed, and more

BioNLP mailing list archives

Joining the BioNLP mailing list.

The ACL Wiki for Computational Linguistics

A brief introduction to NLP

Articles from journals and proceedings

Some preprints and where to find more

Two tutorials, two bibliographies (PSB 2001)

The Literature -- Books and journals

Organizations, meetings and proceedings

On-line resources, including links to collections and on-line documents and presentations

Knowledge issues -- Ontologies, semantics, AI

Software tools

Standard corpora

Biological corpora (Medline, etc.)

BIONLP mailing list

People -- Researchers and practitioners in NLP

Grammars and parsing

Techniques -- Pattern matching, parsing, statistical approaches

Machine-readable dictionaries, lexicons

About this site

Contact Bob Futrelle using email.


Natural language processing of biology text

Founding editor: Bob Futrelle (2001)

The mailing list archives are now the primary source of content for this site.
They are searchable via Google.

Site updated 7/5/2010
News, 7/5/2010: Here is the updated link to the collection BioNLP Resources created by Alex Morgan. Though it is six years old, it can be a useful starting point.

News, 10/31/06: The Association for Computational Linguistics has created a Wiki, so I've added a prominent link to it in the column to the left.

News, 7/20/05: Additional search options have been added to the search page, namely: Google Scholar, Citeseer, and BLIMP. A search on BLIMP for "2005" is impressive, returning 67 hits, as of today.

News, 6/20/05: You might be interested in a paper we prepared for use in our own research. It is an Open Access Biology paper in which we've numbered every sentence and every token within each sentence. This makes it possible for people working at a distance to discuss various items and constructions of interest to parsing, text mining, etc. Access it here.
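As a rough illustration of that kind of numbering, here is a minimal Python sketch. It is not the procedure used to prepare the paper; the function names, the naive sentence splitter and tokenizer, and the output format are all assumptions made for the example. Real biology text (abbreviations such as "e.g.", gene names with embedded punctuation) needs a more careful splitter.

```python
import re


def number_sentences_and_tokens(text):
    """Return [(sentence_number, [(token_number, token), ...]), ...].

    Hypothetical helper: both the sentence splitter and the tokenizer
    are deliberately naive and serve only to show the numbering scheme.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    numbered = []
    for s_num, sentence in enumerate(sentences, start=1):
        # Naive tokenization: runs of word characters, or single punctuation marks.
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        numbered.append((s_num, list(enumerate(tokens, start=1))))
    return numbered


def format_numbered(numbered):
    """Render one line per sentence, e.g. 'S2: Proteins/1 fold/2 ./3'."""
    lines = []
    for s_num, tokens in numbered:
        body = " ".join(f"{tok}/{t_num}" for t_num, tok in tokens)
        lines.append(f"S{s_num}: {body}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "Genes encode proteins. Proteins fold."
    print(format_numbered(number_sentences_and_tokens(sample)))
```

With numbers attached, two people can refer to "sentence 2, token 2" without ambiguity, which is the point of the exercise.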

News, 6/20/05: Google Scholar has grown to the point that it is a useful tool for finding BioNLP-related papers. A mailing list note about Google Scholar is here.

News, 4/13/05: The volume of material in our field continues to increase. So rather than trying to cache a large number of papers on the BioNLP site, the information on new papers, conferences, etc., is being distributed primarily through the BIONLP mailing list and is available in the publicly readable archives. Since you can search the archives using Google, that makes the information reasonably available.

News, 10/7/04: Two papers devoted to biology text analysis and mining have just been published in the IBM Systems Journal. Access them at the top of our Articles page.

News, 9/18/04: Abstracts for six papers on Biomed text mining from PAKDD 2004 are available. Follow the Articles link on the left or go directly here.

News, 8/8/04: I've added Google phrase search of PubMed to the search page. It adds some capability missing in PubMed itself. There is also a link there to some notes about it. Follow the "Search" link at the top left of this page.
The searching question has generated a good bit of email to the list. See the August 2004 email list archives, for example.
(8/9/04): There is an additional note in the mail archive that describes how to search the hundreds of millions of words of full-text articles (not abstracts) in PubMed Central using Google phrase search. See this mail item.

News, 7/16/04: Alexander Morgan has produced a quite useful page of information about and links to a variety of freely available BioNLP resources. It is divided into the following sections:

  • Text Processing Tools
  • Lexical Resources
  • Corpora
  • Annotation Tools

News, 3/19/04: CALL FOR PAPERS: IEEE Transactions on Knowledge and Data Engineering (TKDE)
Special Issue on Mining Biological Data, including: Literature Extraction, Text Mining, and Ontologies. Submission due date 15 July 2004. Details in this PDF extracted from the latest issue of TKDE. This news item was also sent to the BioNLP mailing list, as most of these items are. So joining the mailing list will get such information to you in a timely way.

News, 2/2/04: A lengthy new review on biomedical text mining by Shatkay and Feldman was just published. See the link on the Articles page.

News, 11/30/03: New OUP book on computational linguistics: info, table of contents.

News, 11/17/03: The much-heralded new Open Access journal, PLOS Biology, has a new issue out, Vol 1, No 2, which has a feature article, "Tough Mining -- The challenges of searching the scientific literature", about NLP for biology. It's a news feature rather than a technical article, but it's interesting. Access it here.

News, 11/6/03: I have created a search facility, using standard Google hacks, that allows you to search the site, the mail archives, or the huge ACL Anthology. Use the Search link at the top left, or go right here. (You'll notice that a number of the search results start with "Return to BIONLP.ORG home page" -- that's something I need to fix.)
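The exact queries behind the search page are not given here, but one common "Google hack" is to combine a quoted phrase with the `site:` operator. The Python sketch below builds such a query URL. The function name is hypothetical, and the domain used in the example is an assumption taken from the "BIONLP.ORG" mention above.

```python
from urllib.parse import urlencode


def google_phrase_query(phrase, site=None):
    """Build a Google search URL for an exact phrase.

    Hypothetical helper: quotes the phrase for exact matching and,
    if `site` is given, restricts results with the `site:` operator.
    """
    query = f'"{phrase}"'
    if site:
        query += f" site:{site}"
    # urlencode percent-encodes the quotes and colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Assumed domain, for illustration only.
    print(google_phrase_query("protein folding", site="bionlp.org"))
```

Pointing `site` at a mail archive or a paper collection instead gives the other search targets, assuming Google has indexed those pages.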

News, 8/19/03: The sixteen papers from the 2003 ACL Workshop on Natural Language Processing in Biomedicine are available online. The papers are in PDF and PostScript formats, and each includes a BibTeX entry. A quick list of paper titles and authors is available in a BioNLP mailing list archive posting here.

News, 7/20/03: A 2002 review by Yandell and Majoros on Genomics and NLP (10 pages, PDF). Here is a copy cached on the site. See also the note about it on the Articles page.

News, 7/6/03: BioMed Central research article corpus available for data mining. BioMed Central has published more than 2400 peer-reviewed research articles, all of which are covered by BioMed Central's open access license policy. Unlike a traditional journal's license agreement, BioMed Central's license allows completely free reuse and redistribution of the content by anyone. Note that these are full-text articles, not abstracts. Further details are available here.

News, 6/19/03: The deadline for the SIGIR'03 Workshop on Text Analysis for Bioinformatics has been extended to June 27, 2003. We seek short papers on preliminary and recent work. Authors will retain copyright ownership and are free to submit their papers for publication elsewhere after the workshop. The workshop will be held on August 1, 2003 in Toronto.

See the workshop website for more information.

News, 6/4/03: Five papers on NLP from the ISMB 2003 meeting, June 29 - July 3, are now posted on the BioNLP site in PDF format via this page.

News, 4/17/03: CALL FOR PAPERS - Submit abstracts by May 15, 2003. BioLINK has announced the meeting of the Special Interest Group in Text Mining at this year's ISMB:
BioLINK Text Data Mining SIG: Biology Literature, Information and Knowledge, at ISMB 2003, Brisbane, Australia. Friday, June 27, 2003, 9:00 - 17:30.
For details see this BioNLP list archive item.

News, 4/9/03:

BioLINK: Biological Literature, INformation and Knowledge
Mailing list and website

From their website:
"The Special Interest Group on Text Mining (or BioLINK) was created to address the need of communication and interchange of ideas in the field of text mining and information extraction applied to biology and biomedicine...."
Go there to see more details, list of organizers of the group, online papers, etc. There is also a mailing list. Send inquiries about the mailing list to

News, 4/8/03: The SIGIR'03 Workshop on Text Analysis for Bioinformatics will be held August 1st, in Toronto, Canada. Paper submission by June 16th. Click for details in the BioNLP archive.

News, 3/13/03: The IEEE Computer Society Bioinformatics Conference is looking for papers on NLP in Biology. The paper deadline is coming up soon, April 1, but there is a May 22 deadline for Poster Abstracts that will be published in the Proceedings. The conference will be held at Stanford, August 11-14, 2003. More information is available here, and here is a two-page PDF version of the call for papers.

News, 12/24/2002: TREC 2003, the Text Retrieval Conference, has a Genomics track. Click for details in the BioNLP archive.

News, 12/23/2002: A very useful new review paper on biology text data mining has just been published by Hirschman, et al. The citation, abstract and references are available here.

Motivation for this site

The literature of the field of biology is the largest of all the sciences. The volume of biology literature each year, measured in bytes, is about fifty times the size of the entire human genome, junk and all. But locked in this literature is an enormous amount of information that can tell us much about the structure and function of genes, proteins, cells and organisms -- how they work as well as how they can fail.

The newly emergent interest in natural language processing for biology has been christened "Information Extraction". But work in this area has been going on for many decades under different names and this site includes a good deal of information about past and current work in NLP and in information extraction for biology in particular. The other major descriptor of the general field is "Computational Linguistics".

The goals for this site include providing material and links in the following areas:

  • Introductions to NLP, including texts, papers and FAQs
  • Biological corpora, e.g., Medline, electronic journals
  • NLP databases such as lexicons and grammars
  • NLP tools such as pattern matchers, taggers and parsers
  • Advanced topics such as statistical approaches and machine learning
  • Meeting and workshop information
  • NLP preprints and reprints
  • Biology-specific NLP such as word lists, statistics of bio text
  • Research groups in NLP and biological NLP in particular
  • Development of a mailing list and archive for people interested in BioNLP

Activities in this community could include:

  • Hosting workshops
  • Developing sessions in larger meetings
  • Exchanging researchers between research groups
  • Developing performance measures, test materials and competitions

The site was created by Bob Futrelle, February 27, 2001.

Earlier News (as of 11/28/2002)

News, 12/20/2002: Computational linguist Daniel Jurafsky received a MacArthur "Genius Award" in 2002. Though Dan focuses primarily on speech, it's nice to know that one of our own has been so highly honored. His book with Martin is listed on our Books and Journals page.

A challenge -- BioNLP is not easy (by RPF 11/02)

News, 11/28/2002: PSB 2003 Linking Biomedical Language, Information and Knowledge, January 3-7, 2003. Papers now online.

More news, 11/28/2002: ACL 2002 Workshop on Natural Language Processing in the Biomedical Domain. Papers now online.

11/28/2002: There will be a special session at PSB 2003, "Linking Biomedical Language, Information and Knowledge". The session is part of the Pacific Symposium on Biocomputing 2003 January 3-7, 2003 Kauai Marriott Resort and Beach Club.
Here are online copies of the introductory paper and all six session papers.

There was a Workshop on Natural Language Processing in the Biomedical Domain at ACL 2002 in Philadelphia. I have placed a mirror of the web pages for the workshop here which includes online copies of the twelve papers, in PDF and Postscript formats. Be warned that some of the links there are not operational, since I have not copied the entire ACL CD contents to the site(!).

11/28/2002: There was a text mining workshop at ISMB 2002 in Edmonton, Alberta, Canada on August 2nd, 2002. Here is the initial announcement. When the workshop has its own page, or I can otherwise get copies of or links to the papers, there'll be a link here.

Archives of even earlier News - Archives.

CONTRIBUTIONS: Send me your papers and reports or links to them. This site will improve primarily by the collection of contributions from researchers and practitioners from around the world. I would be happy to add links to any on-line papers and reports you have or are aware of or cache them on this site for easy access. Any links to other resources would also be most welcome.