THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
iPhyClassifier is an interactive online tool for 16S rRNA gene sequence-based phytoplasma taxonomic assignment and group/subgroup classification. iPhyClassifier performs sequence similarity analysis, simulates laboratory restriction enzyme digestions and subsequent gel electrophoresis, and generates virtual restriction fragment length polymorphism (RFLP) profiles. Based on overall sequence similarity scores and calculated RFLP pattern similarity coefficients, iPhyClassifier makes instant suggestions on tentative ‘Candidatus Phytoplasma species’ assignment and phytoplasma 16Sr group/subgroup classification status (Zhao et al., 2009b).
Suggestions on ‘Candidatus Phytoplasma species’ assignment are guided by the principles and rules established by the IRPCM Phytoplasma Taxonomy Group (2004). Suggestions on 16Sr group/subgroup classification status are based on the classification scheme that was established by Lee et al. (1993; 1998; 2000), which has been expanded recently through computer-simulated RFLP analysis (Wei et al., 2007; 2008; Cai et al., 2008; Quaglino et al., 2009; Zhao et al., 2009a; 2009b).
Since the operation of iPhyClassifier is solely dependent upon sequence information, any error in a query (input) sequence that misrepresents the phytoplasma strain under study could result in erroneous group/subgroup classification and ‘Candidatus species’ assignment. While sequence errors may arise at various stages during PCR amplification, plasmid multiplication, and DNA sequencing, they usually occur randomly and can be rectified by sample replication. To ensure credible operation of iPhyClassifier, we highly recommend that consistent sequence data be obtained from at least two independent samples, i.e. from two or more infected plants or insect individuals. If only one infected plant or insect sample is available for study, consistent sequence data must be obtained from at least two independently cloned DNA segments derived from two separate PCRs. Each clone (plasmid) should be sequenced in both directions, and a minimum of 3X coverage per base position should be achieved.
Program components
The current version of iPhyClassifier contains the following three program modules: a sequence similarity search and pairwise sequence similarity score calculation module (PM1), an intelligent sequence trimming and virtual RFLP analysis module (PM2), and a virtual electrophoresis gel image plotting module (PM3).
PM1 carries out two functions. First, it performs pairwise nucleotide sequence comparisons (query against database entries) using the Basic Local Alignment Search Tool (BLAST; Altschul et al., 1990) to quickly identify a query’s phylogenetically close neighbors and to determine whether or not the query sequence comes from a phytoplasma. Second, PM1 creates a global sequence alignment between the query sequence and sequences from the reference strain of each known ‘Ca. Phytoplasma species’ using the CLUSTALW algorithm (Thompson et al., 1994) and calculates percentage nucleotide sequence similarity scores using the Myers-Miller algorithm (Myers & Miller, 1988).
PM2 consists of two Perl scripts, TrimF2nR2 and RFLP_pattern_comparison. The TrimF2nR2 script prepares input nucleotide sequences for simulated enzymatic digestions (Zhao et al., 2009b). The script parses input sequences for generic annealing sites of the phytoplasmal universal primers R16F2n and R16R2 (Gundersen & Lee, 1996) and trims each input sequence to the full-length F2nR2 region, which includes the primer annealing sites (Wei et al., 2007). On each trimmed F2nR2 sequence, the RFLP_pattern_comparison script conducts simulated enzymatic digestions, records the length of each restriction fragment, and performs pairwise comparisons of the recorded fragment lengths. Based on the summarized numbers of similar and dissimilar fragments, the script calculates a similarity coefficient (F) for each pair of phytoplasma strains as described previously (Wei et al., 2008).
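The digestion and comparison performed by PM2 can be illustrated with a minimal Python sketch. It is not the RFLP_pattern_comparison script itself. Two things are assumptions made for illustration: the enzyme panel is cut down to two enzymes (EcoRI and HaeIII) rather than the full set of 17, and the coefficient is taken to be F = 2Nxy / (Nx + Ny), where Nx and Ny are the fragment counts of the two strains and Nxy is the number of shared fragments. The text above only states that F is calculated "as described previously", so the formula should be checked against Wei et al. (2008). The function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative two-enzyme panel (assumption); the tool uses 17 enzymes.
# Each entry: recognition sequence and cut offset within the site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # cuts G^AATTC
    "HaeIII": ("GGCC", 2),    # cuts GG^CC
}

def digest(seq, site, offset):
    """Return fragment lengths from cutting a linear sequence at every site."""
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def rflp_pattern(seq):
    """Collect (enzyme, fragment length) pairs over the whole enzyme panel."""
    seq = seq.upper()
    return [(name, length)
            for name, (site, offset) in ENZYMES.items()
            for length in digest(seq, site, offset)]

def similarity_coefficient(pattern_x, pattern_y):
    """Assumed formula: F = 2 * Nxy / (Nx + Ny)."""
    cx, cy = Counter(pattern_x), Counter(pattern_y)
    shared = sum((cx & cy).values())           # fragments present in both
    return 2 * shared / (sum(cx.values()) + sum(cy.values()))

# Toy sequences: strain_b has lost the HaeIII site through one substitution.
strain_a = "AAAGAATTCAAAGGCCAAA"
strain_b = "AAAGAATTCAAAGGTCAAA"
f_value = similarity_coefficient(rflp_pattern(strain_a), rflp_pattern(strain_b))
```

In this toy case a single restriction site difference removes two fragments from one pattern and adds one to the other, which is why one site change is enough to lower F noticeably.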
PM3 consists of two Perl scripts, VGelME and VGelMS (Zhao et al., 2009b). While VGelME generates virtual electrophoresis gel images resulting from in silico digestions of a single input sequence (an F2nR2 fragment from a single phytoplasma strain) by 17 individual enzymes, VGelMS produces gel images resulting from in silico digestions of multiple input sequences (F2nR2 fragments from multiple phytoplasma strains) by a single restriction enzyme. The latter helps identify key restriction enzymes that distinguish different group and subgroup patterns.
16S rRNA databases
The current version of iPhyClassifier incorporates three 16S rRNA gene sequence databases: DB1, a set of full- or near-full-length 16S rRNA gene sequences from reference strains of all formally described ‘Candidatus Phytoplasma species’, reference strains of ‘Candidatus Phytoplasma species’ proposed by the IRPCM Phytoplasma Taxonomy Group (2004) but yet to be formally described, reference strains of potentially new ‘Candidatus Phytoplasma species’ identified in our previous study (Wei et al., 2007), and all type strains of other named prokaryotic species; DB2, a set of F2nR2 sequences from representative strains of established phytoplasma 16Sr groups and subgroups; and DB3, a set of F2nR2 sequences compiled from all phytoplasma 16S rRNA sequences currently deposited in GenBank at the National Center for Biotechnology Information (NCBI), NIH, USA, the European Molecular Biology Laboratory (EMBL), and the nucleotide databases of the DNA DataBank of Japan (DDBJ). The names of the non-phytoplasmal prokaryotic species in DB1 are those validly published in the International Journal of Systematic and Evolutionary Microbiology (formerly International Journal of Systematic Bacteriology), and were obtained from the website of the Society for Systematic and Veterinary Bacteriology at http://www.bacterio.cict.fr/validationlists.html (Euzéby, 1997; last updated on October 09, 2008). All 16S rRNA gene sequences were downloaded from NCBI’s nucleotide sequence database at http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi using the Entrez search and retrieval tool (W
The overall operational process of iPhyClassifier is outlined in Fig. 1. The aim of the entire operation is to provide meaningful suggestions on tentative 16Sr group/subgroup classification status and ‘Candidatus Phytoplasma species’ (or related strain) assignment for any phytoplasma strain under study. The operation starts by receiving query sequence(s) from users. The queries, in FASTA format, can either be uploaded as a precompiled file from the user’s computer to the iPhyClassifier web server or directly typed or pasted into the query sequence input window in the iPhyClassifier webpage.
The first step of the iPhyClassifier operation is to perform pairwise comparisons between each query sequence and the sequences in the database DB1 for quick identification of the query’s phylogenetically close neighbors. In this initial stage of sequence comparison, the BLAST algorithm is used. If none of the ‘Ca. Phytoplasma species’ in DB1 appears among the top 50 hits returned from the BLAST search, or one or more ‘Ca. Phytoplasma species’ appear(s) among the top hits but share(s) ≤ 91% sequence similarity with the query, the operation will abort, warning that the query sequence is unlikely to be from a phytoplasma. If at least one ‘Ca. Phytoplasma species’ is among the top hits returned from the BLAST search and shares ≥ 92% sequence similarity with the query, the operation will proceed and the query sequence will be fed into the CLUSTALW program for global alignment with all phytoplasma sequences in the database DB1Phy (a subset of DB1) and for sequence similarity score calculation. Such a combined search strategy aids identification of the query’s phylogenetically closest neighbor at significantly reduced computing time (Chun et al., 2007). In accordance with the convention on 16S rRNA gene sequence-based prokaryotic species delineation (Murray & Schleifer, 1994; Stackebrandt & Goebel, 1994), iPhyClassifier implements the recommendation of the IRPCM Phytoplasma Taxonomy Group (2004) and presets 97.5% 16S rRNA gene sequence similarity as the cut-off value for new ‘Candidatus species’ recognition. Since the generally conserved 16S rRNA gene sequences contain pockets of hypervariable regions, the sequence similarity score calculation should be based upon comparison of full- or near-full-length 16S rRNA genes. iPhyClassifier therefore requires that each query sequence cover at least 1200 positions within a 16S rRNA gene.
The output of this operational step is either the tentative assignment of the query strain to an existing ‘Candidatus Phytoplasma species’ as a related strain or the suggestion that the query represents a potentially new ‘Candidatus Phytoplasma species’, depending on the sequence similarity scores.
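The decision logic of this first step can be summarized in a short Python sketch. It is a hedged illustration rather than the tool's own code: the function name, the input structures, and the returned messages are hypothetical, and the similarity scores are assumed to have been computed beforehand by the BLAST search and the global alignment. Two boundary choices are also assumptions: scores between 91% and 92% are treated as failing the screen, and a score of exactly 97.5% is treated as a related strain.

```python
TOP_HITS = 50                       # number of BLAST hits inspected
MIN_PHYTOPLASMA_SIMILARITY = 92.0   # percent; screen for phytoplasma origin
SPECIES_CUTOFF = 97.5               # percent; new-species recognition cut-off
MIN_QUERY_LENGTH = 1200             # positions within the 16S rRNA gene

def suggest_species(query_length, blast_hits, global_scores):
    """Suggest a tentative species assignment.

    blast_hits: list of (name, is_phytoplasma, percent_similarity), best first.
    global_scores: dict mapping species name to percent similarity obtained
    from the global alignment against phytoplasma reference strains.
    """
    if query_length < MIN_QUERY_LENGTH:
        return "query too short for species assignment"

    top = blast_hits[:TOP_HITS]
    passes_screen = any(is_phyto and sim >= MIN_PHYTOPLASMA_SIMILARITY
                        for _, is_phyto, sim in top)
    if not passes_screen:
        return "query is unlikely to be from a phytoplasma"

    # Closest reference strain according to the global alignment scores.
    species, score = max(global_scores.items(), key=lambda kv: kv[1])
    if score >= SPECIES_CUTOFF:
        return f"{species}-related strain"
    return "potentially new species"

# Hypothetical inputs for illustration only.
hits = [("species X", True, 99.1), ("Acholeplasma sp.", False, 88.0)]
result = suggest_species(1500, hits, {"species X": 99.1, "species Y": 95.0})
```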
The second step of the iPhyClassifier operation is to trim each query sequence to the full-length F2nR2 region using regular expressions that match primer pair R16F2n/R16R2. This step is critical because, in the 16S rRNA gene-based phytoplasma classification scheme, strains are classified into groups and subgroups strictly based on RFLP patterns derived from 16S rRNA gene F2nR2 fragments (Lee et al., 1998; 2000; Wei et al., 2007; 2008).
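A minimal Python sketch of regular-expression-based trimming is given below. It is not the TrimF2nR2 script: the function names are hypothetical, and the primers in the example are short toy sequences, not the actual R16F2n and R16R2 sequences, which should be taken from the primer publication. The sketch assumes that the forward primer matches the query strand directly, that the reverse primer matches as its reverse complement, and that N in a primer stands for any base.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
_IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]

def primer_regex(primer):
    """Translate a primer into a regex fragment (N matches any base)."""
    return "".join(_IUPAC[base] for base in primer.upper())

def trim_between_primers(seq, forward, reverse):
    """Return the region from the forward primer site through the reverse
    primer site (both sites included), or None if either site is missing."""
    pattern = re.compile(
        primer_regex(forward) + ".*" + primer_regex(reverse_complement(reverse))
    )
    match = pattern.search(seq.upper())
    return match.group(0) if match else None

# Toy primers and query (assumptions for illustration only).
query = "tttGAAACACGTACGTCCCCGaaa"
trimmed = trim_between_primers(query, forward="GAAAC", reverse="CGGGG")
```

Returning None when a primer site is absent lets the caller report that the query does not span the full region instead of digesting a partial fragment.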
The third step of the iPhyClassifier operation is to simulate restriction digestions on trimmed F2nR2 fragments, compare the RFLP pattern types derived from each query strain with those derived from database DB2, and calculate pairwise RFLP pattern similarity coefficients. In this step, iPhyClassifier implements the criterion proposed in our previous work (Wei et al., 2008), presetting 0.97 as the threshold similarity coefficient for delineation of a new subgroup RFLP pattern type within a given group. Thus, if the virtual F2nR2 RFLP pattern derived from a 16S rRNA gene of a phytoplasma strain under study has a similarity coefficient of 0.97 or lower with the 16S rRNA genes of all existing representative or reference strains of the given group, a new subgroup pattern type is recognized. Adoption of 0.97 as the threshold similarity coefficient for new subgroup delineation is warranted because it reflects precisely the existing subgroup classification scheme, in which as little as one restriction site difference can distinguish a new subgroup. A similarity coefficient of 0.85 or less with all previously recognized subgroups signals that the strain under study may represent a new 16Sr group; this threshold is in agreement with the delineation of all previously designated groups. RFLP patterns that have a similarity coefficient of 0.99 or 0.98 to the standard pattern type of the designated representative or reference member in a given subgroup are considered variants of the standard pattern type. These variants or minor pattern types are denoted with one or two stars (* or **) following their corresponding subgroup letters, for example 16SrI-A* (F = 0.99) and 16SrI-A** (F = 0.98), as suggested previously (Wei et al., 2008).
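The thresholds described above can be expressed as a small Python sketch. This is an illustration only: the function name, the input dictionary, and the returned wording are hypothetical, and F values are assumed to be reported to two decimal places, so each value is rounded to hundredths before comparison.

```python
NEW_GROUP_MAX = 85       # F <= 0.85 against all subgroups: possible new group
NEW_SUBGROUP_MAX = 97    # F <= 0.97 within a group: new subgroup pattern type

def interpret_rflp(coefficients):
    """Interpret similarity coefficients against reference pattern types.

    coefficients: dict mapping a reference subgroup label such as "16SrI-A"
    to the similarity coefficient F between the query and that reference.
    """
    label, f_value = max(coefficients.items(), key=lambda kv: kv[1])
    hundredths = round(f_value * 100)      # compare as integers, e.g. 0.97 -> 97
    group = label.split("-")[0]

    if hundredths <= NEW_GROUP_MAX:
        return "may represent a new 16Sr group"
    if hundredths <= NEW_SUBGROUP_MAX:
        return f"new subgroup pattern type in group {group}"
    if hundredths == 98:
        return label + "**"                # minor variant, F = 0.98
    if hundredths == 99:
        return label + "*"                 # minor variant, F = 0.99
    return label                           # identical to the standard pattern

# Hypothetical coefficients for illustration only.
suggestion = interpret_rflp({"16SrI-A": 0.99, "16SrI-B": 0.93})
```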
Since similarity coefficient values are influenced by both the number and the particular set of restriction enzymes selected for RFLP analysis, the threshold similarity coefficients for new subgroup and group pattern type delineations are strictly based on the use of a specific set of 17 restriction enzymes originally established for classification of phytoplasmas using actual gel electrophoresis-based RFLP analysis (Lee et al., 1998). The output of this operational step is assignment of the strain under study into an existing subgroup or erection of a new subgroup. Because presence of two heterogeneous rrn operons in individual phytoplasma strains is widespread (see interoperon sequence heterogeneity issue in the Critical Issues section below), final subgroup designation of strains with heterogeneous rrn operons should be based on composite patterns derived from both rrn operons. At the end of this operational step, the query sequence is added to database DB3.
Concomitant with similarity coefficient calculation, which generates numerical output of the RFLP pattern analysis, iPhyClassifier also provides visual output, i.e. virtual gel images resulting from the RFLP pattern analysis. The gel images reveal informative sites or “visible” genetic markers along the 16S rRNA sequences, transforming sequence information into accessible “virtual phenotypic characters” for phytoplasma strain differentiation and classification.
Issue of rrn interoperon sequence heterogeneity: The genomes of all four completely sequenced phytoplasma strains and numerous reference strains of ‘Ca. Phytoplasma species’ harbor two ribosomal RNA operons, rrnA and rrnB. In many strains, the sequences of the two rrn operons differ from each other (Lee et al., 1993; 1998; Firrao et al., 1996; Liefting et al., 1996; Davis & Sinclair, 1998; Jomantiene et al., 2002; Davis et al., 2003). For those phytoplasma strains with two heterogeneous rrn operons, if the sequence variations between the two operons fall into restriction sites within the 16S rRNA gene F2nR2 region, two different virtual 16Sr RFLP pattern types will result from iPhyClassifier operation. It is therefore important to distinguish between subgroup pattern type(s) and final subgroup designation, and to avoid erroneous assignment of the same strain into two different 16Sr subgroups. In this regard, iPhyClassifier adopts the recommendation of Wei et al. (2008) and uses a three-letter subgroup designation, where the first and second letters (in parentheses) denote the RFLP pattern types of rrnA and rrnB, respectively, and the third letter designates the 16Sr subgroup. For example, Dandelion virescence phytoplasma (DanVir), a member of the previously delineated subgroup 16SrIII-P (Jomantiene et al., 2002), possesses two sequence-heterogeneous rrn operons displaying two different 16Sr RFLP patterns, 16SrIII-P (rrnA, AF370119) and 16SrIII-O (rrnB, AF370120); the subgroup status of DanVir is therefore re-designated as 16SrIII-(P/O)P.
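The three-letter designation can be assembled mechanically once the pattern types of both operons and the subgroup letter are known. The Python sketch below is illustrative only: the function name is hypothetical, the subgroup letter is supplied by the caller because the rule for choosing it is not spelled out here, and returning a plain one-letter designation when both operons share the same pattern type is an assumption.

```python
def subgroup_designation(group, rrna_pattern, rrnb_pattern, subgroup):
    """Build a 16Sr subgroup designation from the two operon pattern types.

    group: group label such as "16SrIII".
    rrna_pattern, rrnb_pattern: RFLP pattern type letters of rrnA and rrnB.
    subgroup: letter designating the final 16Sr subgroup.
    """
    if rrna_pattern == rrnb_pattern:
        # Assumption: homogeneous operons keep the simple one-letter form.
        return f"{group}-{subgroup}"
    return f"{group}-({rrna_pattern}/{rrnb_pattern}){subgroup}"

# DanVir example from the text: rrnA pattern P, rrnB pattern O, subgroup P.
danvir = subgroup_designation("16SrIII", "P", "O", "P")
```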
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990). Basic local alignment search tool. J Mol Biol 215, 403-410.
Cai, H., Wei, W., Davis, R.E., Chen, H. & Zhao, Y. (2008). Genetic diversity among phytoplasmas infecting Opuntia species: virtual RFLP analysis identifies new subgroups in the peanut witches'-broom phytoplasma group. Int J Syst Evol Microbiol 58, 1448-1457.
Chun, J., Lee, J.H., Jung, Y., Kim, M., Kim, S., Kim, B.K. & Lim, Y.W. (2007). EzTaxon: a web-based tool for the identification of prokaryotes based on 16S ribosomal RNA gene sequences. Int J Syst Evol Microbiol 57, 2259-2261.
Davis, R.E., Jomantiene, R., Kalvelyte, A. & Dally, E.L. (2003). Differential amplification of sequence heterogeneous ribosomal RNA genes and classification of the ‘Fragaria multicipita’ phytoplasma. Microbiol Res 158, 229-236.
Davis, R.E. & Sinclair, W.A. (1998). Phytoplasma identity and disease etiology. Phytopathology 88, 1372-1376.
Euzéby, J.P. (1997). List of Bacterial Names with Standing in Nomenclature: a folder available on the Internet. Int J Syst Bacteriol 47, 590-592.
Firrao, G., Gobbi, E., Carraro, L. & Locci, R. (1996). Molecular characterization of a phytoplasma causing phyllody in clover and other herbaceous hosts in northern Italy. Eur J Plant Pathol 102, 817-822.
IRPCM Phytoplasma/Spiroplasma Working Team–Phytoplasma taxonomy group (2004). ‘Candidatus Phytoplasma’, a taxon for the wall-less, non-helical prokaryotes that colonize plant phloem and insects. Int J Syst Evol Microbiol 54, 1243-1255.
Jomantiene, R., Davis, R.E., Valiunas, D. & Alminaite, A. (2002). New group 16SrIII phytoplasma lineages in Lithuania exhibit interoperon sequence heterogeneity. Eur J Plant Pathol 108, 507-517.
Lee, I.-M., Davis, R.E. & Gundersen-Rindal, D.E. (2000). Phytoplasma: phytopathogenic mollicutes. Annu Rev Microbiol 54, 221-255.
Lee, I.-M., Gundersen-Rindal, D.E., Davis, R.E. & Bartoszyk, I.-M. (1998). Revised classification scheme of phytoplasmas based on RFLP analysis of 16S rRNA and ribosomal protein gene sequences. Int J Syst Bacteriol 48, 1153-1169.
Lee, I.-M., Hammond, R.W., Davis, R.E. & Gundersen, D.E. (1993). Universal amplification and analysis of pathogen 16S rDNA for classification and identification of mycoplasmalike organisms. Phytopathology 83, 834-842.
Liefting, L.W., Andersen, M.T., Beever, R.E., Gardner, R.C. & Foster, L.S. (1996). Sequence heterogeneity in the two 16S rRNA genes of Phormium yellow leaf phytoplasma. Appl Environ Microbiol 62, 3133-3139.
Murray, R.G.E. & Schleifer, K.H. (1994). Taxonomic notes: a proposal for recording the properties of putative taxa of procaryotes. Int J Syst Bacteriol 44, 174-176.
Myers, E.W. & Miller, W. (1988). Optimal alignments in linear space. Comput Appl Biosci 4, 11-17.
Quaglino, F., Zhao, Y., Bianco, P., Wei, W., Casati, P., Durante, G. & Davis, R.E. (2009). New 16Sr subgroups and distinct SNP lineages among grapevine Bois noir phytoplasma populations. Ann Appl Biol 154, in press.
Stackebrandt, E. & Goebel, B.M. (1994). Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. Int J Syst Bacteriol 44, 846-849.
Thompson, J.D., Higgins, D.G. & Gibson, T.J. (1994). CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673-4680.
Wei, W., Davis, R.E., Lee, I.-M. & Zhao, Y. (2007). Computer-simulated RFLP analysis of 16S rRNA genes: identification of ten new phytoplasma groups. Int J Syst Evol Microbiol 57, 1855-1867.
Wei, W., Lee, I.-M., Davis, R.E., Suo, X. & Zhao, Y. (2008). Automated RFLP pattern comparison and similarity coefficient calculation for rapid delineation of new and distinct phytoplasma 16Sr subgroup lineages. Int J Syst Evol Microbiol 58, 2368-2377.
Wheeler, D.L., Benson, D.A., Bryant, S., Canese, K., Church, D.M., Edgar, R., Federhen, S., Helmberg, W., Kenton, D. & other authors (2005). Database resources of the National Center for Biotechnology Information: Update. Nucleic Acids Res 33 (Database issue), 39-45.
Zhao, Y., Sun, Q., Wei, W., Davis, R.E., Wu, W. & Liu, Q. (2009a). ‘Candidatus Phytoplasma tamaricis’, a novel taxon discovered in witches’-broom diseased salt cedar (Tamarix chinensis Lour.). Int J Syst Evol Microbiol 59, in press.
Zhao, Y., Wei, W., Lee, I.-M., Shao, J., Suo, X. & Davis, R.E. (2009b). Construction of an interactive online phytoplasma classification tool, iPhyClassifier, and its application in analysis of the peach X-disease phytoplasma group (16SrIII). Int J Syst Evol Microbiol 59, in press.