Stargazer is a bioinformatics tool for calling star alleles in various polymorphic pharmacogenes using data from next-generation sequencing (NGS) or high-density single nucleotide polymorphism microarrays (SNP chips). For NGS data, Stargazer supports data from both whole genome sequencing (WGS) and targeted sequencing.

Stargazer identifies star alleles from NGS data by detecting single nucleotide variants (SNVs), insertion-deletion variants (indels), and structural variants (SVs). Stargazer detects SVs including gene deletions, duplications, and hybrids by calculating paralog-specific copy number from read depth.
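
As a rough illustration of the read-depth idea, and not Stargazer's actual implementation, the sketch below converts per-position read depth in a target gene into copy number by normalizing against the mean depth of a control region assumed to carry two copies. The function name, the choice of control region, and the toy depths are all hypothetical, and the sketch does not attempt the paralog-specific assignment of reads.

```python
from statistics import mean

def copy_number_profile(target_depth, control_depth, control_copies=2):
    """Scale read depth so the control region's mean depth maps to control_copies."""
    control_mean = mean(control_depth)
    if control_mean <= 0:
        raise ValueError("control region has no coverage")
    return [control_copies * d / control_mean for d in target_depth]

# Toy data (assumed): control region at about 30x, target whose
# second half drops to about 15x, as a partial deletion might.
control = [30, 31, 29, 30]
target = [30, 30, 15, 15]
profile = copy_number_profile(target, control)
print([round(cn, 2) for cn in profile])
```

In this toy case the first half of the target comes out at two copies and the second half at one.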

When building Stargazer, we used the clinically important CYP2D6 gene as a model for detecting and interpreting SVs in the context of other observed SNVs and indels. We chose CYP2D6 as a starting point because it is one of the most complex loci in the human genome to genotype: over 100 star alleles have been defined for it, some involving a gene hybrid with its nearby, non-functional but highly homologous paralog CYP2D7. Genotyping CYP2D6 is also important for precision drug therapy because the enzyme it encodes metabolizes approximately 25% of drugs, and its activity varies considerably among individuals owing to the gene's highly polymorphic nature.

For more details on how Stargazer works, please see the Documentation page. Thanks for your interest in Stargazer!


The latest version of Stargazer can call star alleles in the following 40 genes:


We are continuously extending Stargazer to include other clinically important genes, so stay tuned!


Stargazer was developed by Seung-been "Steven" Lee in the Nickerson lab at the University of Washington.


If you use Stargazer in a published analysis, please report the program version and cite the appropriate article.

The most recent reference for Stargazer's genotyping method is:

Lee et al., 2019. Calling star alleles with Stargazer in 28 pharmacogenes with whole genome sequences. Clinical Pharmacology & Therapeutics. DOI: https://doi.org/10.1002/cpt.1552.

The Stargazer genotyping pipeline is described in:

Lee et al., 2018. Stargazer: a software tool for calling star alleles from next-generation sequencing data using CYP2D6 as a model. Genetics in Medicine. DOI: https://doi.org/10.1038/s41436-018-0054-0.


Selected articles in which Stargazer was used for genotype analysis are listed here.

Dalton and Lee et al., 2019. Interrogation of CYP2D6 structural variant alleles improves the correlation between CYP2D6 genotype and CYP2D6-mediated metabolic activity. Clinical and Translational Science. DOI: https://doi.org/10.1111/cts.12695.

McInnes et al., 2019. Hubble2D6: A deep learning approach for predicting drug metabolic activity. BioRxiv. DOI: https://doi.org/10.1101/684357.

Taliun et al., 2019. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. BioRxiv. DOI: https://doi.org/10.1101/563866.

Claw et al., 2019. Pharmacogenomics of nicotine metabolism: novel CYP2A6 and CYP2B6 genetic variation patterns in Alaska Native and American Indian populations. Nicotine & Tobacco Research. DOI: https://doi.org/10.1093/ntr/ntz105.


Stargazer_v1.0.6 (September 20, 2019)

  • Stargazer has been extended to call star alleles in 10 additional genes (CYP2A13, CYP2F1, CYP2J2, CYP2R1, CYP2S1, CYP2W1, CYP3A7, CYP3A43, CYP26A1, POR).
  • We're very excited to introduce the Database of Pharmacogenomic Structural Variants (DPSV)! The objective of DPSV is to provide a comprehensive summary of PGx SVs detected by Stargazer from real NGS samples (see examples in the DPSV page).
  • This version produces allele fraction profiles as well as copy number profiles.
  • This version uses phased allele fractions to determine relative gene copy numbers in samples with CN>2 (e.g., CYP2D6*1/*2x2 vs. *1x2/*2).
  • This version uses more advanced SV detection algorithms, including the use of copy number-stable regions (CNSRs).
  • This version uses improved systems for handling input VCF files created from various tools/sources.
  • This version uses expanded haplotype reference panels for increased phasing accuracy (+/- 100kb instead of 3kb).
  • This version uses Beagle v5.1 instead of v5.0 for increased phasing accuracy and speed.
  • This version uses expanded target regions for more accurate SV detection.
  • This version produces logging messages that are more user-friendly.
  • Some of the command line arguments have been changed. See the Documentation page.
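
The use of allele fractions to place a duplication on one haplotype can be sketched as follows. This is a minimal illustration under stated assumptions, not Stargazer's algorithm: it assumes that the expected fraction of reads carrying a haplotype-specific variant equals the number of gene copies carrying it divided by the total copy number, and it picks the closest expectation. The function name and the example fractions are hypothetical.

```python
def copies_carrying_variant(observed_fraction, total_copies=3):
    """Pick the number of variant-carrying copies (1 to total_copies - 1)
    whose expected allele fraction k / total_copies is closest to the observation."""
    candidates = range(1, total_copies)
    return min(candidates, key=lambda k: abs(observed_fraction - k / total_copies))

# Assumed fractions of reads carrying a *2-defining variant in a sample with CN=3.
print(copies_carrying_variant(0.64))  # 2 copies carry the variant, consistent with *1/*2x2
print(copies_carrying_variant(0.35))  # 1 copy carries the variant, consistent with *1x2/*2
```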

Stargazer_v1.0.5 (July 23, 2019)

  • Stargazer has been extended to call star alleles in G6PD and NUDT15.
  • Additional tools have been added to Stargazer; as a result, the command line has changed.
  • This version uses Beagle v5.0 instead of Beagle v4.1 for phasing SNVs/indels.
  • Stargazer now supports "VCF only" mode for both NGS data and SNP chip data.

Stargazer_v1.0.4 (March 3, 2019)

  • Stargazer has been extended to call star alleles in 28 genes.
  • Many useful optional arguments have been added.
  • This version is described in Lee et al., 2019.

Stargazer_v1.0.3 (July 9, 2018)

  • Stargazer has been extended to call star alleles in CYP2A6/CYP2A7.

Stargazer_v1.0.2 (June 14, 2018)

  • To determine which star allele is duplicated in samples with three or more gene copies (e.g., CYP2D6*1x2/*4 vs. *1/*4x2), Stargazer computes allele fractions from the sequence reads that carry the corresponding variant. Previous versions of Stargazer tested whether the allele fraction observed in such a sample was greater than the mean allele fraction across all samples in the same sequencing project that are heterozygous for the variant of interest and carry no structural variation. This version instead uses an optimal decision boundary found with Bayesian updating, for two main reasons. First, the empirical mean is not always obtainable (e.g., when only one sample carries the variant), and it may be inaccurate when few samples carry the variant. Second, the approach allows the use of an informative prior stating that allele fractions should be centered at 0.5 when heterozygous samples without structural variation are used.
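
The note above does not specify the statistical model, so the following is only a sketch of one way such a boundary could be obtained. It assumes a normal prior centered at 0.5 on the mean allele fraction of heterozygous samples without structural variation, applies a standard normal-normal conjugate update, and uses the posterior mean as the decision boundary. The function names, the variance values, and the choice of the posterior mean as the boundary are all assumptions made for illustration.

```python
def posterior_mean(het_fractions, prior_mean=0.5, prior_var=0.01, obs_var=0.0025):
    """Normal-normal conjugate update for the mean allele fraction.

    prior_var and obs_var are illustrative values, not taken from Stargazer.
    """
    n = len(het_fractions)
    if n == 0:
        return prior_mean  # no heterozygous samples: fall back on the prior
    sample_mean = sum(het_fractions) / n
    precision = 1 / prior_var + n / obs_var
    return (prior_mean / prior_var + n * sample_mean / obs_var) / precision

def variant_allele_is_duplicated(sample_fraction, het_fractions):
    """Compare a CN>=3 sample's allele fraction with the posterior boundary."""
    return sample_fraction > posterior_mean(het_fractions)

# A single heterozygous sample at 0.46 pulls the boundary only part of the way from 0.5.
print(round(posterior_mean([0.46]), 3))
print(variant_allele_is_duplicated(0.64, [0.46]))
```

With no heterozygous samples at all, the boundary stays at the prior value of 0.5, which covers the case where an empirical mean cannot be computed.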

Stargazer_v1.0.1 (April 11, 2018)

  • For detection of structural variation, this version no longer filters out loci based on the variance in read depth across samples. Instead, it filters out pre-selected regions that have been empirically shown to produce high noise (e.g., regions in which reads map to multiple locations).
  • To call structural variants, this version fits every pairwise combination of known sequence structures (one for each chromosome) against the sample's observed copy number profile and then selects the combination with the lowest deviance. This combinatorial testing is used in Stargazer_v1.0.0 as well, but only for samples with two structural variants (an abnormal structure on each chromosome). This version generalizes the combinatorial testing so that it also applies to samples without any structural variation (two normal structures) and to samples with only one structural variant (one normal and one abnormal structure). As a result, the copy number plot now displays the two best-fitting sequence structures for the sample in addition to the sample's original copy number.
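
The combinatorial fitting described above can be sketched as follows. This is a simplified illustration, not Stargazer's code: the four structures and their per-region copy numbers are hypothetical, and the sum of squared errors stands in for the deviance, whose exact definition is not given here.

```python
from itertools import combinations_with_replacement

# Hypothetical per-chromosome copy number over four regions of a gene.
STRUCTURES = {
    "normal": [1, 1, 1, 1],
    "deletion": [0, 0, 0, 0],
    "duplication": [2, 2, 2, 2],
    "hybrid": [1, 1, 0, 0],
}

def best_structure_pair(observed):
    """Return the pair of structures (one per chromosome) whose summed
    profile has the smallest sum of squared errors against the observation."""
    def sse(pair):
        first, second = (STRUCTURES[name] for name in pair)
        return sum((o - (a + b)) ** 2 for o, a, b in zip(observed, first, second))
    # Every unordered pair, including two copies of the same structure.
    return min(combinations_with_replacement(sorted(STRUCTURES), 2), key=sse)

observed = [2.1, 1.9, 1.05, 0.95]  # assumed copy number profile of one sample
print(best_structure_pair(observed))
```

Because every pair is tested, a sample with two normal chromosomes goes through the same procedure as one with structural variation. In this toy model, pairs with identical summed profiles (such as a deletion plus a duplication versus two normal copies) cannot be told apart from total copy number alone.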

Stargazer_v1.0.0 (March 11, 2018)

  • This version is described in Lee et al., 2018.