News

Improved Text Searching

07/11

Simple text searches at rcsb.org are now easier and more accurate. Text searching from the top query bar combines the open source Apache Solr platform with an index of PDBx/mmCIF data.

Access this new functionality by entering one or more search terms in the top bar of any RCSB PDB page and clicking ‘Go’ or pressing Return. Searches for multiple words (for example, insulin receptor) and queries for adjacent words enclosed in double quotation marks (for example, “insulin receptor”) may return different results. The first search finds results where the words appear anywhere in the entry, whereas the second returns only results where the search terms appear together, exactly as ordered.
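
The difference between the two query styles can be sketched in a few lines of Python. This is only an illustration, not the RCSB PDB or Solr implementation: the function names, the simple tokenizer, and the assumption that an unquoted query requires every word to be present are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_words(query, entry_text):
    """Unquoted query: every word occurs somewhere in the entry, in any order (assumed)."""
    entry_words = set(tokenize(entry_text))
    return all(word in entry_words for word in tokenize(query))

def matches_phrase(query, entry_text):
    """Quoted query: the words occur next to one another, in the order given."""
    q = tokenize(query)
    e = tokenize(entry_text)
    return any(e[i:i + len(q)] == q for i in range(len(e) - len(q) + 1))

# Hypothetical entry text, for illustration only.
entry = "Receptor binding studies of human insulin"
print(matches_words("insulin receptor", entry))   # True: both words are present
print(matches_phrase("insulin receptor", entry))  # False: the words are not adjacent
```

Under these assumptions, every entry that matches the quoted phrase also matches the unquoted words, which is consistent with the unquoted search returning more results.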

Search results are assigned “Match Scores” to help indicate the relevance of the result and to sort structures from “Higher to Lower” matches and vice versa. The figure below shows a search for the name Perutz.

RCSB PDB News Image

When a search term appears in one of the following categories, the corresponding PDBx/mmCIF tokens are highlighted to help users gauge their level of interest in the result.

  • Structure author: one of the authors of the structure (_audit_author.name).
  • Citation author: one of the authors of the citation (primary or otherwise) corresponding to the entry. An author may appear as a citation author, structure author or both for a particular entry (_citation_author.name).
  • Citation: The title of the citation (primary or otherwise) corresponding to the entry (_citation.title).
  • Entity name: The name commonly associated with the entity matching the search. An entity is a chemically distinct part of the structure entry (_entity_name_com.name).
  • Entity Description: A description of the macromolecular contents of an entity (_entity.pdbx_description).
  • Keywords: Keywords that describe the structure (_struct_keywords.text and _struct_keywords.pdbx_keywords) defined by the authors of the entry and curated by the annotation staff. The _struct_keywords.pdbx_keywords token is displayed as “Classification” on the corresponding Structure Summary page.
  • Structure Title: The title of the structure entry (_struct.title).
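
The category-to-token mapping in the list above can be written down as a small lookup table. The sketch below is hypothetical: it assumes an entry is available as a dictionary from PDBx/mmCIF token to a list of string values, and it uses plain case-insensitive substring matching rather than the site's actual Solr-based search. The names `HIGHLIGHTED_TOKENS` and `matching_fields` are made up for this example.

```python
# Category labels and PDBx/mmCIF tokens as given in the list above.
HIGHLIGHTED_TOKENS = {
    "Structure author": ["_audit_author.name"],
    "Citation author": ["_citation_author.name"],
    "Citation": ["_citation.title"],
    "Entity name": ["_entity_name_com.name"],
    "Entity Description": ["_entity.pdbx_description"],
    "Keywords": ["_struct_keywords.text", "_struct_keywords.pdbx_keywords"],
    "Structure Title": ["_struct.title"],
}

def matching_fields(term, entry):
    """Return the category labels whose token values contain the term (case-insensitive)."""
    needle = term.lower()
    hits = []
    for label, tokens in HIGHLIGHTED_TOKENS.items():
        # Collect all values stored under any token that belongs to this category.
        values = [value for token in tokens for value in entry.get(token, [])]
        if any(needle in value.lower() for value in values):
            hits.append(label)
    return hits

# Hypothetical entry data, not taken from a real PDB file.
entry = {
    "_audit_author.name": ["Doe, J."],
    "_struct.title": ["Example structure of an insulin receptor fragment"],
    "_struct_keywords.text": ["hormone receptor, insulin receptor"],
}
print(matching_fields("insulin receptor", entry))  # ['Keywords', 'Structure Title']
```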

The figure below shows the results for an entry found with the search query "insulin receptor." Note the highlighting indicating the matching fields:

RCSB PDB News Image

The next figure shows the results for an entry found with the search query insulin receptor (without quotes). More results are returned than in the previous example. Note the highlighted terms insulin, receptor, and insulin receptor:

RCSB PDB News Image

If a query match is found only in other tokens of a data file, results will be returned without highlighting and with the note “matching fields are not prominent.” The figure below shows a search for the term “model peptide”. In entry 3OTP, the term appears only in the _entity.details token in the structure data file.

RCSB PDB News Image
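
The fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the site's code: the entry is modeled as a hypothetical dictionary from token to a list of values, matching is a simple substring test, and the function name `describe_match` is invented for the example.

```python
# Tokens that are highlighted when they contain the search term (from the list earlier in this item).
PROMINENT_TOKENS = {
    "_audit_author.name", "_citation_author.name", "_citation.title",
    "_entity_name_com.name", "_entity.pdbx_description",
    "_struct_keywords.text", "_struct_keywords.pdbx_keywords", "_struct.title",
}

def describe_match(term, entry):
    """Say how a hit would be presented, or return None if the entry does not match."""
    needle = term.lower()
    matched = [token for token, values in entry.items()
               if any(needle in value.lower() for value in values)]
    if not matched:
        return None  # the entry would not appear in the results at all
    prominent = sorted(token for token in matched if token in PROMINENT_TOKENS)
    if prominent:
        return "highlight: " + ", ".join(prominent)
    # The term was found, but only in tokens outside the highlighted set.
    return "matching fields are not prominent"

# Hypothetical entry data, for illustration only (not the contents of a real entry).
entry = {
    "_struct.title": ["Hypothetical title"],
    "_entity.details": ["A model peptide used for illustration"],
}
print(describe_match("model peptide", entry))  # matching fields are not prominent
```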
