Stargazer is a bioinformatics tool for calling star alleles (haplotypes) in PGx genes using data from NGS or SNP arrays. Stargazer accepts NGS data from both WGS and TS.
Stargazer identifies star alleles by detecting SNVs, indels, and SVs. Stargazer can detect complex SVs including gene deletions, duplications, and hybrids by calculating paralog-specific copy number from read depth.
When building Stargazer, we used the clinically important CYP2D6 gene as a model for detection and interpretation of SVs in the context of other observed SNVs and indels. We purposely chose CYP2D6 as a starting point because 1) the enzyme it encodes metabolizes ~25% of prescription drugs, 2) its activity varies considerably among individuals due to the gene's highly polymorphic nature, and 3) it is one of the most complex genetic loci to genotype in the human genome. To date, over 100 star alleles have been defined for CYP2D6, some involving a gene hybrid with its nearby non-functional but highly homologous paralog CYP2D7.
For more details on how Stargazer works, please see the Documentation page. Thanks for your interest in Stargazer!
The following abbreviations and acronyms are used throughout this website:
1KGP, The 1000 Genomes Project; AF, allele fraction; AS, activity score; CN, copy number; DOI, digital object identifier; DPSV, The Database of Pharmacogenomic Structural Variants; GDF, GATK-DepthOfCoverage format; indels, insertion-deletion variants; NGS, next-generation sequencing; PGx, pharmacogenomics; SGE, Sun Grid Engine; SNP, single nucleotide polymorphism; SNV, single nucleotide variant; SV, structural variant; TS, targeted sequencing; VCF, variant call format; WGS, whole genome sequencing.
The latest version of Stargazer can call star alleles in the following 44 genes:
CYP1A1, CYP1A2, CYP1B1, CYP2A6/CYP2A7, CYP2A13, CYP2B6/CYP2B7, CYP2C8, CYP2C9, CYP2C19, CYP2D6/CYP2D7, CYP2E1, CYP2F1, CYP2J2, CYP2R1, CYP2S1, CYP2W1, CYP3A4, CYP3A5, CYP3A7, CYP3A43, CYP4B1, CYP26A1, CYP4F2, CYP19A1, DPYD, G6PD, GSTM1, GSTP1, GSTT1, NAT1, NAT2, NUDT15, POR, SLC15A2, SLC22A2, SLCO1B1, SLCO2B1, TBXAS1, TPMT, UGT1A1, UGT2B7, UGT2B15, UGT2B17, VKORC1.
We are continuously extending Stargazer to additional genes, so stay tuned!
Stargazer was developed by Seung-been Lee (he goes by "Steven") in the Nickerson lab at the University of Washington.
If you use Stargazer in a published analysis, please report the program version and cite the appropriate article.
The most recent reference for Stargazer's genotyping algorithm is:
Lee et al., 2019. Calling star alleles with Stargazer in 28 pharmacogenes with whole genome sequences. Clinical Pharmacology & Therapeutics. DOI: https://doi.org/10.1002/cpt.1552.
The Stargazer genotyping pipeline is described in:
Lee et al., 2018. Stargazer: a software tool for calling star alleles from next-generation sequencing data using CYP2D6 as a model. Genetics in Medicine. DOI: https://doi.org/10.1038/s41436-018-0054-0.
Below are selected articles in which Stargazer was used for genotype analysis.
Dalton and Lee et al., 2019. Interrogation of CYP2D6 structural variant alleles improves the correlation between CYP2D6 genotype and CYP2D6-mediated metabolic activity. Clinical and Translational Science. DOI: https://doi.org/10.1111/cts.12695.
McInnes et al., 2019. Hubble2D6: A deep learning approach for predicting drug metabolic activity. BioRxiv. DOI: https://doi.org/10.1101/684357.
Taliun et al., 2019. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. BioRxiv. DOI: https://doi.org/10.1101/563866.
Claw et al., 2019. Pharmacogenomics of nicotine metabolism: novel CYP2A6 and CYP2B6 genetic variation patterns in Alaska Native and American Indian populations. Nicotine & Tobacco Research. DOI: https://doi.org/10.1093/ntr/ntz105.
If you are interested in reading more detailed release notes or testing the latest beta version of Stargazer, click here.
Stargazer_v1.0.7 (November 20, 2019)
- Stargazer has been extended to call star alleles in CYP1B1, CYP4B1, CYP19A1, and TBXAS1 (formerly CYP5A1).
- This version uses a new algorithm for phasing rare variants that are absent from the reference VCF and therefore cannot be phased statistically:
  1) The algorithm is tentatively named "phasing by extension."
  2) The algorithm does not replace statistical phasing; it only supplements it.
  3) The algorithm combines the haplotype information obtained by statistical phasing with a scoring system to determine which of the two haplotypes in a given sample is more likely to carry the rare variant. The score is the total number of "tagging" SNPs that are relevant to the rare variant and match the observed haplotype, hence the name "phasing by extension."
  4) Take the non-functional CYP2D6*21 allele as an example. It is defined by 2580_2581insC (core), 2851C>T (tagging), and 4181G>C (tagging). 2851C>T and 4181G>C are present in the 1KGP panel and can therefore be phased statistically, while 2580_2581insC is not. To call a sample with 2580_2581insC as having CYP2D6*21, Stargazer first checks which of the haplotypes contains 2851C>T and 4181G>C and then assigns 2580_2581insC to that haplotype.
- A brand new tool named "view" has been added to Stargazer. Briefly, this tool lets you view, for specific samples, the SNPs that define a given star allele together with their details (e.g. genotype call, allelic depth, functional annotation). See the related example in Stargazer and also the Documentation page for more details.
- A brand new tool named "sges" has been added to Stargazer. The existing "sgep" tool (formerly "sge") was designed for running a per-project genotyping pipeline (i.e. single gene, multiple samples), while "sges" is designed for running a per-sample genotyping pipeline (i.e. multiple genes, single sample). The letter p in "sgep" stands for an SGE pipeline run per [p]roject, and the letter s in "sges" stands for an SGE pipeline run per [s]ample. The "sgep" tool is better suited to projects involving a large number of samples (e.g. a cohort-based research study), while the "sges" tool is better suited to situations in which results are returned for an individual (e.g. an individual PGx report). See the Documentation page for more details.
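The "phasing by extension" idea from the v1.0.7 notes above can be illustrated with a minimal Python sketch. This is not Stargazer's code: the function name `phase_by_extension`, the set-based haplotype representation, and the tie-handling rule are assumptions made for illustration. Only the variant names come from the CYP2D6*21 example.

```python
from typing import List, Optional, Set


def phase_by_extension(haplotypes: List[Set[str]],
                       tagging_variants: Set[str]) -> Optional[int]:
    """Return the index (0 or 1) of the haplotype matching more tagging variants.

    Returns None on a tie (an assumed rule: the rare variant stays unassigned).
    """
    # Score each statistically phased haplotype by the number of tagging
    # variants it carries.
    scores = [len(hap & tagging_variants) for hap in haplotypes]
    if scores[0] == scores[1]:
        return None
    return 0 if scores[0] > scores[1] else 1


# Hypothetical sample: two haplotypes obtained by statistical phasing.
haplotypes = [{"2851C>T", "4181G>C"}, {"4181G>C"}]

# CYP2D6*21: one core variant that cannot be phased statistically,
# plus two tagging variants that can.
rare_variant = "2580_2581insC"
tagging = {"2851C>T", "4181G>C"}

target = phase_by_extension(haplotypes, tagging)
if target is not None:
    # Extend the better-matching haplotype with the rare core variant.
    haplotypes[target].add(rare_variant)
```

In this toy sample the first haplotype carries both tagging variants, so the insertion is assigned to it and that haplotype now contains every variant that defines CYP2D6*21.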
Stargazer_v1.0.6 (September 20, 2019)
- Stargazer has been extended to call star alleles in 10 additional genes (CYP2A13, CYP2F1, CYP2J2, CYP2R1, CYP2S1, CYP2W1, CYP3A7, CYP3A43, CYP26A1, POR).
- We're very excited to introduce the Database of Pharmacogenomic Structural Variants (DPSV)! The objective of DPSV is to provide a comprehensive summary of PGx SVs detected by Stargazer from real NGS samples (see examples in the DPSV page).
- This version produces allele fraction profiles as well as copy number profiles.
- This version uses phased allele fractions to determine relative gene copy numbers in samples with CN>2 (e.g., CYP2D6*1/*2x2 vs. *1x2/*2).
- This version uses more advanced SV detection algorithms, including the use of copy number-stable regions (CNSRs).
- This version uses improved systems for handling input VCF files created from various tools/sources.
- This version uses expanded haplotype reference panels for increased phasing accuracy (+/- 100kb instead of 3kb).
- This version uses Beagle v5.1 instead of v5.0 for increased phasing accuracy and speed.
- This version uses expanded target regions for more accurate SV detection.
- This version produces logging messages that are more user-friendly.
- Some of the command line arguments have been changed. See the Documentation page.
Stargazer_v1.0.5 (July 23, 2019)
- Stargazer has been extended to call star alleles in G6PD and NUDT15.
- Additional tools have been added to Stargazer and, because of this, the command line has been changed.
- This version uses Beagle v5.0 instead of Beagle v4.1 for phasing SNVs/indels.
- Stargazer now supports "VCF only" mode for both NGS data and SNP chip data.
Stargazer_v1.0.4 (March 3, 2019)
- Stargazer has been extended to call star alleles in 28 genes.
- Many useful optional arguments have been added.
- This version is described in Lee et al., 2019.
Stargazer_v1.0.3 (July 9, 2018)
- Stargazer has been extended to call star alleles in CYP2A6/CYP2A7.
Stargazer_v1.0.2 (June 14, 2018)
- To determine the duplicated star allele in samples with three or more gene copies (e.g., CYP2D6*1x2/*4 vs. *1/*4x2), Stargazer computes allele fractions from sequence reads that carry the corresponding variant. Previous versions of Stargazer tested whether the observed allele fraction from a sample with three or more gene copies was greater than the mean allele fraction across all samples in the same sequencing project that are heterozygous for the variant of interest and have no structural variation. This version instead uses an optimal decision boundary found with Bayesian updating, for two main reasons. First, the empirical mean is not always obtainable (e.g., when only one sample carries the variant), and it may be inaccurate when few samples carry the variant. Second, the approach allows the use of an informative prior stating that allele fractions should be centered at 0.5 when heterozygous samples without structural variation are used.
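To make the v1.0.2 change concrete, here is a minimal Python sketch of a decision boundary obtained by Bayesian updating. The notes do not specify the model, so everything below is an assumption: a normal prior centered at 0.5, normally distributed observations, the variance values, and the function names `decision_boundary` and `variant_allele_is_duplicated`.

```python
from typing import Sequence


def decision_boundary(het_afs: Sequence[float],
                      prior_mean: float = 0.5,
                      prior_var: float = 0.01,
                      noise_var: float = 0.01) -> float:
    """Posterior mean of the heterozygous allele fraction (normal-normal update).

    het_afs holds allele fractions from heterozygous samples without
    structural variation. With no such samples, the prior mean is returned.
    """
    n = len(het_afs)
    if n == 0:
        return prior_mean
    sample_mean = sum(het_afs) / n
    prior_precision = 1.0 / prior_var
    data_precision = n / noise_var
    # Precision-weighted average of the prior mean and the sample mean.
    return ((prior_precision * prior_mean + data_precision * sample_mean)
            / (prior_precision + data_precision))


def variant_allele_is_duplicated(observed_af: float, boundary: float) -> bool:
    """True if the allele carrying the variant appears to be the duplicated one."""
    return observed_af > boundary


# Hypothetical project: a single heterozygous control sample with AF = 0.44.
boundary = decision_boundary([0.44])

# Hypothetical three-copy sample in which a *4-defining variant has AF = 0.65.
duplicated = variant_allele_is_duplicated(0.65, boundary)
```

With a single control sample the boundary is pulled from 0.44 toward the prior mean of 0.5, and it falls back to exactly 0.5 when no control sample exists. These are the two situations in which the empirical mean used by earlier versions is unreliable or unavailable.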
Stargazer_v1.0.1 (April 11, 2018)
- For detection of structural variation, this version no longer filters out loci based on the variance in read depth across samples. Instead, it filters out pre-selected regions that have been empirically shown to produce high noise (e.g., regions in which reads map to multiple locations).
- To call structural variants, this version fits every pairwise combination of known sequence structures (one for each chromosome) against the sample's observed copy number profile and then selects the combination that produces the least deviance. Stargazer_v1.0.0 also uses this combinatorial testing, but only for samples with more than one structural variation (an abnormal structure on the first chromosome and an abnormal structure on the second chromosome). This version generalizes the combinatorial testing so that it also applies to samples without any structural variation (normal structure and normal structure) and to samples with only one structural variation (normal structure and abnormal structure). As a result, the copy number plot now displays the two best sequence structures for the sample in addition to the sample's original copy number.
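The pairwise fitting described in the v1.0.1 notes above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Stargazer's implementation: the three-segment copy number profiles, the structure names, the use of the sum of squared differences as the deviance, and the function name `best_structure_pair` are all hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-chromosome copy number of each known sequence structure
# across three segments of the target gene.
STRUCTURES: Dict[str, List[int]] = {
    "normal": [1, 1, 1],
    "deletion": [0, 0, 0],
    "duplication": [2, 2, 2],
}


def best_structure_pair(observed: Sequence[float],
                        structures: Dict[str, List[int]]) -> Tuple[str, str]:
    """Return the pair of structures whose summed profile best fits `observed`."""

    def deviance(pair: Tuple[str, str]) -> float:
        # Expected copy number is the sum of the two chromosomes' structures.
        expected = [a + b for a, b in zip(structures[pair[0]], structures[pair[1]])]
        # Assumed deviance: sum of squared differences from the observation.
        return sum((o - e) ** 2 for o, e in zip(observed, expected))

    # Test every unordered pair, including two copies of the same structure.
    return min(combinations_with_replacement(structures, 2), key=deviance)


# Hypothetical sample whose observed copy number is close to 3 in every segment.
best_pair = best_structure_pair([2.9, 3.1, 3.0], STRUCTURES)
```

Because the normal/normal pair is tested like any other, a sample without structural variation goes through the same procedure as one with a deletion or duplication. Pairs that sum to the same profile (here, normal/normal and deletion/duplication) have equal deviance, and this sketch simply keeps the first one it encounters.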
Stargazer_v1.0.0 (March 11, 2018)
- This version is described in Lee et al., 2018.