Tag SNP selection for association studies

A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium; it represents a group of SNPs called a haplotype. Most existing tagging SNP-set selection methods cannot make full use of the information hidden in common or rare variants associated with diseases. For comparing alternative tag SNP selection algorithms, we use coalescent simulation to. Transferability of tag SNPs in genetic association studies. A general question for linkage-disequilibrium-based association studies is how the power to detect an association is compromised when tag SNPs are chosen from data in one population sample and then deployed in another sample. This reduces the expense and time of mapping genome areas. Selection of genetic markers: this chapter focuses on single nucleotide polymorphisms (SNPs), the most common form of variation in the human genome. The power of association studies based on tag SNPs using genotype data is similar to that using haplotype data. Tagger is a tool for the selection and evaluation of tag SNPs from genotype data such as that from the International HapMap Project. How to select tag SNPs in genetic association studies. The use of high-density tag SNP arrays (mainly Illumina Hap300 and Hap550) for genome-wide association studies has virtually revolutionized the field and led to the identification of strong susceptibility loci for several types of malignancies, including breast cancer (Hunter et al.). Furthermore, we describe an innovative approach to combine both tag SNP. Tag-SNP selection based on pairwise LD criteria and power.

The LD measure r^2 has been used for tag SNP selection [1, 12] because the statistical power of association studies is proportional to the value of r^2. Several methods have been proposed for selecting sets of genetic markers that characterize the polymorphisms in a region of interest [9]. SNPs hold much promise as a basis for genome-wide disease-gene association. Power-optimized tag SNP selection: our power-optimized tag SNP selection method is a stepwise greedy procedure to maximize power. Optimized tag SNP content and dense marker spacing, mean spacing 1. Increasing power of genome-wide association studies by. Selection of SNP subsets for association studies in.
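As a minimal sketch of the r^2 measure mentioned above, the following computes pairwise r^2 between two biallelic SNPs from phased haplotypes coded as 0/1 alleles. The function names and the 0/1 coding are assumptions made for illustration; they are not taken from any of the tools or papers cited here. The second helper illustrates the usual rule of thumb that a study typing a tag SNP needs roughly 1/r^2 times as many samples as one typing the causal variant directly.

```python
import math


def r_squared(hap_a, hap_b):
    """Pairwise LD r^2 between two biallelic SNPs.

    hap_a and hap_b are equal-length sequences of 0/1 alleles, one entry
    per phased haplotype (hypothetical coding chosen for this sketch).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                            # a monomorphic SNP carries no LD
    return d * d / denom


def inflated_sample_size(n_direct, r2):
    """Approximate sample size needed when typing a tag instead of the causal SNP."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_direct / r2)
```

Two perfectly correlated SNPs give r^2 = 1, and two SNPs whose alleles occur independently give r^2 = 0.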

Selection of representative SNP sets for genome-wide association. Efficiency and power in genetic association studies. A new tag SNP-set selection method based on LD information is proposed. Analysis of two different sets of SNP genotype data from the HapMap is used to judge the practical aspects of using. To choose the proper sample size and genotyping platform for such studies, power calculations that take into account the genetic model, tag SNP selection, and the population of interest are required. Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies. Currently, typical genome-wide association studies measure hundreds of thousands, or millions, of genetic variants. Jun 2007. HapMap provides linkage disequilibrium (LD) information on a sample of 3. For example, a SNP may replace the nucleotide cytosine (C) with the nucleotide thymine (T) in a certain stretch of DNA. This power provides the fastest path to discovery and publication. It combines the simplicity of pairwise tagging methods with the efficiency benefits of multimarker haplotype approaches. Increasing the power of association studies by. Tag SNP selection for association studies (Stram, 2004). For both applications, either tag or index SNP selection, the corresponding problem can be formulated as follows.

Tag SNP selection for candidate gene association studies using HapMap and gene resequencing data, European Journal of Human Genetics 15(10). CiteSeerX: Tag SNP selection for association studies. Tag SNP selection for association studies, Genetic. Here, for any given subset of SNPs within a block, all pairwise r^2 values between the SNPs in this subset and the SNPs absent from this subset are calculated.
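The subset evaluation described above can be sketched as follows, assuming a precomputed symmetric matrix of pairwise r^2 values for the SNPs in one block. The function names and the 0.8 cutoff are illustrative assumptions, not values taken from the source.

```python
def best_tag_r2(r2, tags):
    """For each SNP absent from `tags`, return its highest r^2 with any tag.

    r2 is a square list-of-lists of pairwise r^2 values (hypothetical input
    format); tags is an iterable of SNP indices chosen as the candidate subset.
    """
    tag_set = set(tags)
    return {
        j: max((r2[t][j] for t in tag_set), default=0.0)
        for j in range(len(r2))
        if j not in tag_set
    }


def covers_all(r2, tags, threshold=0.8):
    """True if every untyped SNP reaches the r^2 threshold with some tag."""
    return all(v >= threshold for v in best_tag_r2(r2, tags).values())
```

A candidate subset is acceptable under this criterion when `covers_all` returns True; competing subsets of equal size can then be compared by the smallest value in `best_tag_r2`.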

SNP selection for pharmacogenetic association studies is discussed. Tag SNP selection for candidate gene association studies using. Tag SNP selection and association studies: over the past few years, numerous disease association studies, both genome-wide and. Tag SNP selection and its applications in association studies. Genome-wide association studies (GWAS) are meant to find the genetic. Dec 01, 2004. Tag SNP selection for association studies. Stram, Daniel O. Summary: Illumina's tag SNP approach in tandem with the powerful Infinium as. Many methods have been developed, and new methods for tag SNP selection are continuously being developed. In the tag SNP selection problem, our goal is to achieve a feasible solution with the smallest number of htSNPs. Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies. Genevieve L. Linkage disequilibrium (LD), which refers to the nonrandom association of alleles at different loci (Lewontin 1964) in haplotypes, plays a central role in genome-wide association studies for. Despite the advances in genotyping technologies, which have led to a large reduction in genotyping cost, tag SNP selection remains an important problem for computational biologists and geneticists.

In low- and medium-budget association studies, a limited number of tag SNPs are selected out of a large set of available SNPs previously typed in an initial cohort. Tag SNP selection for candidate gene association studies. Single nucleotide polymorphism (SNP)-set analysis in genome-wide association studies (GWAS) has emerged as a research hotspot for identifying genetic variants associated with disease susceptibility. In this paper, we present an OR application for representative SNP selection that implements our novel simulated annealing (SA) based feature selection. The goal is to minimize the number of markers selected for genotyping in a particular. For a candidate gene study, researchers can choose their tag SNPs. Selection of these tag SNPs poses several challenges as rare variants tend to be. Structured genome-wide association studies with Bayesian. Tag SNP selection for association studies.

Two vertices v_i and v_j are connected if and only if the two corresponding SNPs s_i and s_j are correlated. Consequently, a new generation of genotyping arrays is being developed, designed with tag single nucleotide polymorphisms (SNPs) to improve rare variant imputation. Tag SNP selection for prediction of tick resistance in. Tag SNP selection for candidate gene association. Bayesian variable selection regression for genome-wide. Tag SNP selection using particle swarm optimization. These tag SNPs are then typed in a larger set of control and affected individuals.

Sample sizes required at different powers of detecting. Selection and evaluation of tag SNPs. SNP p-value data and finds all SNPs in high LD with GWAS SNPs, so that selection is from a much larger set of SNPs than the GWAS itself. Haplotype block partitioning and tag SNP selection. A comparative study of tag SNP selection using clustering. Sep 15, 2004. (2) The dependence of the performance of tag SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype discovery and tag SNP selection.

For a tag SNP selection problem using pairwise r^2, one can construct a graph G = (V, E) with each vertex v_i representing a SNP s_i. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Research article, open access: An efficient weighted tag SNP-set analytical method in genome-wide association studies. Bin Yan, Shudong Wang, Huaqian Jia, Xing Liu and Xinzeng Wang. Abstract. Background. One application is to select a subset of the single nucleotide polymorphism (SNP) biomarkers from the whole SNP set that is informative and small enough for subsequent association studies. Tagging SNPs for association studies, Hum Hered 2004. Two-stage sampling designs for gene association studies. Software for tag single nucleotide polymorphism selection. Therefore, tag SNP selection is not an issue/option for genome-wide association studies. Although there is a broad literature on Bayesian variable selection under high- or ultrahigh-dimensional. Significant genetic association may be interpreted as either (1) direct association, in which the genotyped SNP is the true causal variant conferring disease susceptibility. Tag-SNP selection is an important step in designing case-control association studies. Our variable selection approach is inherently hierarchical, and involves selection at both the SNP-set level and the individual SNP level. Tag SNP selection for association studies (Stram, 2004), Genetic. Tagging SNP-set selection with maximum information based on.

Using tag SNPs for a genome-wide association study allows the investigator to maximize information content and minimize sample size without losing power. A tag SNP is a representative single nucleotide polymorphism in a region of the genome with high linkage disequilibrium (the nonrandom association of alleles at two or more loci). Selecting tagging SNPs for association studies using power. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case-control individuals. These SNPs are usually chosen from haplotype data and are thus called haplotype tag SNPs (htSNPs).

Once the tag SNP statistics are computed, the genomic regions that are in linkage disequilibrium (LD) with the most. Increasing the power of association studies by imputation. Effective tagging single-nucleotide polymorphism (SNP)-set selection is crucial to SNP-set analysis in genome-wide association studies (GWAS). The differential pattern of MF and TL variation of SNPs was critical to effective tag SNP selection, since the top SNPs were clearly distinct in the histograms of those windows (Fig.). Methods for tag SNP selection based on established multivariate statistical techniques may. Targeting the most informative SNP loci supports the most efficient study designs. The tag-SNP set that covers all SNPs is therefore a dominating set of the graph G. Efficient association study design via power-optimized tag. Taylor, Epidemiology Branch and Laboratory of Molecular Carcinogenesis. Tag SNP selection and its applications in association studies. Each SNP represents a difference in a single DNA building block, called a nucleotide. Among the selection methods that have proliferated, the ones based on pairwise LD measurement are attractive for the purpose of designing association studies. Power analysis for genome-wide association studies, BMC.
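The dominating-set view of tag SNP selection can be sketched with a simple greedy heuristic, assuming a precomputed matrix of pairwise r^2 values and an edge between two SNPs whenever r^2 meets a threshold. Finding a minimum dominating set is NP-hard, so this greedy pass is only an approximation and is not presented as the method of any specific tool; the function name and the 0.8 threshold are assumptions for illustration.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy dominating-set heuristic for pairwise-r^2 tagging.

    r2 is a square list-of-lists of pairwise r^2 values (hypothetical input
    format). SNP i covers SNP j if i == j or r2[i][j] >= threshold.
    Returns a list of tag SNP indices that together cover every SNP.
    """
    n = len(r2)
    # covers[i]: the set of SNPs captured by typing SNP i (always includes i)
    covers = [
        {j for j in range(n) if j == i or r2[i][j] >= threshold}
        for i in range(n)
    ]
    uncovered = set(range(n))
    tags = []
    while uncovered:
        # pick the SNP capturing the most still-uncovered SNPs;
        # ties are broken in favor of the lowest index
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        tags.append(best)
        uncovered -= covers[best]
    return tags
```

Because every SNP covers itself, each iteration removes at least one SNP from the uncovered set, so the loop always terminates with a valid (though not necessarily minimal) tag set.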

Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs). A novel prediction method for tag SNP selection using. SNP tagging and then evaluates freely available software for the selection of tag SNPs for genetic association studies. Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies. Kui Zhang, Zhaohui S. Tag SNP selection for association studies: this report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a.

A tool for selecting SNPs for association studies based on. Millions of polymorphisms are reported and freely available in public databases. Tag SNP selection using particle swarm optimization (Chuang). We incorporate functional predictions of protein structure, gene regulation, splicing and miRNA binding. Analysis of epidemiologic studies of genetic effects and gene. A distinction between haplotype block-based and non-block-based approaches yields two classes of procedures. Informative SNP selection problem (ISSP): given a sample S of a population P of individuals (either haplotypes or genotypes) on m SNPs, select positions of k (k < m) SNPs such that, for any individual, one can predict non. SNP and haplotype associations using a two-stage design. The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. Prioritize and select SNPs for association studies with multistage.

The advantage is particularly striking when the set of tag SNPs is sparse. Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies. Selecting the smallest subset of tag SNPs that can predict the other SNPs would considerably reduce the complexity of genome-wide or block-based SNP-disease association studies. Efficiency and power in genetic association studies, Nature.

Recently, several methods have been published to select subsets of. Criteria for the selection of single nucleotide polymorphisms in pathway pharmacogenetics. At the time of this study, genotypes based on resequencing data were available from the EGP website for 52,387 SNPs in 391 genes. Therefore, it is essential to select only informative SNPs representing the original SNP distributions in the genome: tag SNP selection for genome.

Tag SNPs are useful in whole-genome SNP association studies, in. These SNPs are usually chosen from haplotypes and called haplotype tag SNPs (htSNPs). Therefore, fewer htSNPs and more predictable SNPs yield a better fitness. The recent advances in genotyping and molecular techniques have greatly increased the knowledge of the human genome structure. Haplotype block partitioning and tag SNP selection using. Tag SNP selection for association studies: this report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense network. Functionally informative tag SNPs for disease association studies. Association studies can determine whether a genetic variant is associated with a disease or trait. Tag SNP selection via a genetic algorithm (ScienceDirect). The value of gene-based selection of tag SNPs in genome-wide.

But most existing methods of SNP-set analysis are affected by the quality of the SNP set, and a poor-quality SNP set can lead to low power in GWAS. As a result, there is now a need to identify, among all these data, the relevant markers for genetic association studies. The aim of this chapter is not to enumerate and detail all available methods for haplotype block partitioning and tag SNP selection, but rather to focus on how to use the available methods, tools, and resources to facilitate tag SNP selection in association studies. It is possible to identify genetic variation and association with phenotypes without genotyping every SNP in a chromosomal region. Because the power gauges the chance of success of an association study, selection of tag SNPs that yield higher power will increase the effectiveness of future association studies, at no.

Transferability of tag SNPs in genetic association studies in. Imputation-aware tag SNP selection to improve power for. An efficient weighted tag SNP-set analytical method in. Accordingly, the scale and cost of genotyping are expected to be largely reduced. Laboratory of Molecular Carcinogenesis, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA. Power calculations are important at the study design stage to ensure successful results. Jun 01, 2011. Genome-wide association studies (GWASs) have been effective in identifying the genomic regions associated with a disease trait. This work demonstrates that, while there may be limits given current reference panels, improving GWAS scaffold design is an underused means to increase power in association studies.

Imputation-aware tag SNP selection to improve power for. The value of gene-based selection of tag SNPs in genome. The power of genome-wide association studies can be computed using a set of tag. A key strategy to improve the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis (Johnson et al.). The program can also identify and choose tag SNPs for SNPs not in high LD with any GWAS SNP. In this case, SNP BTA-60194-no-rs (rs41587782) was in high LD with the representative tag SNP and thus was excluded in the final step of the selection strategy. Abstract: selection of genetic variants is a crucial first step in the rational design of studies aimed at explaining individual differences in susceptibility to complex human diseases or health intervention outcomes. Selection of SNP subsets for association studies in candidate. Abstract: this report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense. Imputation-aware tag SNP selection to improve power for multi. Multimarker-LD-based genetic algorithm for tag SNP selection. Linkage disequilibrium (LD) plays a central role in association studies for identifying the genetic variation responsible for complex human. Genome-wide association studies are a promising new tool for deciphering the genetics of complex diseases.
