907 results for computationally efficient algorithm
Abstract:
In many areas of industrial manufacturing, for example in the automotive industry, digital mock-ups are used to support the development of complex machines with computer systems as effectively as possible. Motion planning algorithms play an important role here in guaranteeing that these digital prototypes can be assembled without collisions. Over the last decades, sampling-based methods have proven particularly successful for this task. They generate a large number of random poses for the object to be installed or removed and use a collision detection mechanism to check the validity of each pose. Collision detection therefore plays an essential role in the design of efficient motion planning algorithms. A particular difficulty for this class of planners are so-called "narrow passages", which occur wherever the freedom of movement of the objects being planned is severely restricted. In such regions it can be hard to find a sufficient number of collision-free samples, and more sophisticated techniques may be needed to achieve good performance.
This thesis consists of two parts. In the first part we investigate parallel collision detection algorithms. Since we target an application in sampling-based motion planners, we choose a problem setting in which the same two objects are tested for collision over and over again, in a large number of different poses. We implement and compare several methods that use bounding volume hierarchies (BVHs) and hierarchical grids as acceleration structures. All described methods were parallelized across multiple CPU cores. In addition, we compare several CUDA kernels for performing BVH-based collision tests on the GPU. Besides different ways of distributing the work among the parallel GPU threads, we investigate the effect of different memory access patterns on the performance of the resulting algorithms. Furthermore, we present a set of approximate collision tests based on the described methods; when a lower test accuracy is tolerable, these allow a further performance gain.
In the second part of the thesis we describe a parallel, sampling-based motion planner of our own design for handling highly complex problems with multiple narrow passages. The method works in two phases. The basic idea is to conceptually allow small errors during the first planning phase in order to increase planning efficiency, and then to repair the resulting path in a second phase. The planner employed in phase I is based on so-called Expansive Space Trees. In addition, we equipped the planner with a push-out operation that resolves small collisions and thereby increases efficiency in regions with restricted freedom of movement. Optionally, our implementation allows the use of approximate collision tests; this further lowers the accuracy of the first planning phase, but also yields a further performance gain.
The motion paths resulting from phase I may therefore not be entirely collision-free. To repair these paths we designed a novel planning algorithm that, restricted locally to a small neighborhood around the existing path, plans a new, collision-free motion path.
We tested the described algorithm on a class of new, difficult metal puzzles, some of which exhibit multiple narrow passages. To our knowledge, no collection of comparably complex benchmarks is publicly available, nor did we find comparably complex benchmarks described in the motion planning literature.
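As a minimal illustration of the sample-and-validate loop at the heart of such planners, the following Python sketch draws random poses and keeps the collision-free ones. The spherical geometry and the `in_collision` test are toy stand-ins of ours, not the BVH- or grid-based tests developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for the collision query: the moving object and the obstacle
# are spheres, so the test is a centre-distance check.  In the thesis this
# role is played by BVH- or hierarchical-grid-based tests on real geometry.
OBSTACLE_CENTRE = np.array([0.5, 0.5, 0.5])
OBSTACLE_RADIUS = 0.3
OBJECT_RADIUS = 0.1

def in_collision(pose: np.ndarray) -> bool:
    """pose = centre of the moving object (orientation omitted for brevity)."""
    return np.linalg.norm(pose - OBSTACLE_CENTRE) < OBSTACLE_RADIUS + OBJECT_RADIUS

# Core of every sampling-based planner: draw random poses and keep the
# collision-free ones as milestones for roadmap or tree construction.
samples = rng.uniform(0.0, 1.0, size=(10_000, 3))
free = samples[[not in_collision(p) for p in samples]]
print(f"{len(free)} of {len(samples)} samples are collision-free")
```

In a narrow passage almost every sample fails this test, which is exactly why the thesis both accelerates the individual collision query and adds the push-out repair operation.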
Abstract:
This thesis deals with the development of a novel simulation technique for macromolecules in electrolyte solutions, with the aim of improving performance over current molecular-dynamics based simulation methods. In solutions containing charged macromolecules and salt ions, it is the complex interplay of electrostatic interactions and hydrodynamics that determines the equilibrium and non-equilibrium behavior. However, the treatment of the solvent and dissolved ions makes up the major part of the computational effort, so efficient modeling of both components is essential for the performance of a method. In the novel method we treat the solvent in a coarse-grained fashion and replace the explicit-ion description by a dynamic mean-field treatment. We hence combine particle- and field-based descriptions in a hybrid method and thereby effectively solve the electrokinetic equations. The developed algorithm is tested extensively in terms of accuracy and performance, and suitable parameter sets are determined. As a first application we study charged polymer solutions (polyelectrolytes) in shear flow, with a focus on their viscoelastic properties. Here we also include semidilute solutions, which are computationally demanding. Second, we study electro-osmotic flow on superhydrophobic surfaces, where we perform a detailed comparison to theoretical predictions.
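For reference, the electrokinetic equations that such a hybrid method effectively solves couple ion advection-diffusion, electrostatics and hydrodynamics. One common incompressible form (our notation, not necessarily the thesis') is

$$\partial_t \rho_k + \nabla\cdot\left(\rho_k\,\mathbf{u} - D_k \nabla\rho_k - \mu_k z_k e\,\rho_k \nabla\Phi\right) = 0,$$
$$\nabla^2\Phi = -\frac{e}{\varepsilon}\sum_k z_k \rho_k,$$
$$\rho_m\left(\partial_t\mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u}\right) = \eta\,\nabla^2\mathbf{u} - \nabla p - e\sum_k z_k \rho_k \nabla\Phi, \qquad \nabla\cdot\mathbf{u} = 0,$$

where $\rho_k$, $D_k$, $\mu_k$ and $z_k$ are the concentration, diffusivity, mobility and valency of ion species $k$, $\Phi$ the electrostatic potential, $\mathbf{u}$ the solvent velocity, $\rho_m$ the solvent mass density, $\eta$ its viscosity and $p$ the pressure.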
Abstract:
Time series are ubiquitous. The acquisition and processing of continuously measured data is present in all areas of the natural sciences, medicine, and finance. The enormous growth of recorded data volumes, whether from automated monitoring systems or integrated sensors, calls for exceptionally fast algorithms in theory and practice. This thesis is therefore concerned with the efficient computation of subsequence alignments. Complex algorithms, e.g. anomaly detection, motif queries, or the unsupervised extraction of prototypical building blocks in time series, make extensive use of these alignments, hence the need for fast implementations. The thesis is structured into three approaches that address this challenge: four alignment algorithms and their parallelization on CUDA-capable hardware, an algorithm for the segmentation of data streams, and a unified treatment of Lie-group-valued time series.
The first contribution is a complete CUDA port of the UCR suite, the world-leading implementation of subsequence alignment. This includes a new computation scheme for determining local alignment scores under the z-normalized Euclidean distance, which can be used on any parallel hardware that supports fast Fourier transforms. Furthermore, we present a SIMT-compatible implementation of the UCR suite's lower-bound cascade for the efficient computation of local alignment scores under dynamic time warping. Both CUDA implementations enable computation one to two orders of magnitude faster than established methods.
Second, we investigate two linear-time approximations for the elastic alignment of subsequences. On the one hand, we treat a SIMT-compatible relaxation scheme for greedy DTW and its efficient CUDA parallelization. On the other hand, we introduce a new local distance measure, the Gliding Elastic Match (GEM), which can be computed with the same asymptotic time complexity as greedy DTW yet offers a complete relaxation of the penalty matrix. Further improvements include invariance to trends on the measurement axis and to uniform scaling on the time axis. An extension of GEM to multi-shape segmentation is also discussed and evaluated on motion data. Both CUDA parallelizations achieve runtime improvements of up to two orders of magnitude.
The treatment of time series in the literature is usually restricted to real-valued measurement data. The third contribution is a unified method for the treatment of Lie-group-valued time series. Building on it, distance measures on the rotation group SO(3) and on the Euclidean group SE(3) are treated. Furthermore, memory-efficient representations and group-compatible extensions of elastic measures are discussed.
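To make the FFT-based scheme of the first contribution concrete, here is a generic, MASS-style NumPy sketch (our reconstruction, not the thesis' CUDA code) that scores a query against every subsequence of a series under the z-normalized Euclidean distance, using one FFT cross-correlation plus rolling moments:

```python
import numpy as np

def sliding_dot_product(q, t):
    """Dot product of query q with every length-m window of t via FFT."""
    n, m = len(t), len(q)
    conv = np.fft.irfft(np.fft.rfft(t, 2 * n) * np.fft.rfft(q[::-1], 2 * n), 2 * n)
    return conv[m - 1 : n]

def mass(q, t):
    """z-normalised Euclidean distance of q to every subsequence of t."""
    n, m = len(t), len(q)
    mu_q, sigma_q = q.mean(), q.std()
    # Rolling mean/std of all length-m windows from cumulative sums.
    csum = np.cumsum(np.insert(t, 0, 0.0))
    csum2 = np.cumsum(np.insert(t * t, 0, 0.0))
    mu_t = (csum[m:] - csum[:-m]) / m
    sigma_t = np.sqrt(np.maximum((csum2[m:] - csum2[:-m]) / m - mu_t ** 2, 0.0))
    qt = sliding_dot_product(q, t)
    # dist^2 = 2m(1 - corr); constant windows (sigma_t == 0) would need guarding.
    d2 = 2.0 * m * (1.0 - (qt - m * mu_q * mu_t) / (m * sigma_q * sigma_t))
    return np.sqrt(np.maximum(d2, 0.0))

t = np.sin(np.linspace(0, 20 * np.pi, 4000))
t += 0.1 * np.random.default_rng(0).normal(size=4000)
q = t[1000:1100].copy()
print(np.argmin(mass(q, t)))  # 1000: the query matches itself best
```

The decomposition into one FFT correlation plus rolling moments is precisely what makes the computation portable to "any parallel hardware that supports fast Fourier transforms."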
Abstract:
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal $t$-statistic with a permutation reference distribution to obtain the corresponding $p$-value. The number of computations required for the maximal test statistic is $O(N^2)$, where $N$ is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers, and highlights the need for a faster algorithm. Results: We present a hybrid approach to obtain the $p$-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy, and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data. Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project (Gentleman et al., 2004). The proposed hybrid method for the $p$-value is available in version 1.2.1 or higher, and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
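A compact sketch of the underlying permutation test may help; the variant below is simplified for illustration (a single split point rather than CBS's circular arcs, and a pooled standard deviation):

```python
import numpy as np

def max_t_stat(x):
    """Maximal t-like statistic over all single split points (simplified;
    CBS itself maximizes over arcs of a circularly joined segment)."""
    n = len(x)
    best = 0.0
    for i in range(1, n):
        m1, m2 = x[:i].mean(), x[i:].mean()
        se = x.std(ddof=1) * np.sqrt(1.0 / i + 1.0 / (n - i))
        best = max(best, abs(m1 - m2) / se)
    return best

def permutation_p_value(x, n_perm=1000, seed=0):
    """Reference distribution by permuting marker order; the per-permutation
    cost of the maximal statistic is what the hybrid approach avoids."""
    rng = np.random.default_rng(seed)
    observed = max_t_stat(x)
    hits = sum(max_t_stat(rng.permutation(x)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```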
Abstract:
Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social sciences and biomedical studies. Standard analyses assume the data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this paper, we develop a multilevel latent class model, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the Expectation-Maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent and robust, though less efficient, inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often enjoy precision comparable to that of the ML estimates. We apply our methods to the analysis of comorbid symptoms in the Obsessive Compulsive Disorder study. Our models' random effects structure has a more straightforward interpretation than those of competing methods, and should thus usefully augment the tools available for latent class analysis of multilevel data.
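In generic notation (ours, not necessarily the paper's), the model structure is: for cluster $j$, draw class-mixing probabilities $\boldsymbol{\pi}_j=(\pi_{j1},\dots,\pi_{jC})\sim\mathrm{Dirichlet}(\alpha_1,\dots,\alpha_C)$; conditional on $\boldsymbol{\pi}_j$, the $K$ categorical outcomes of each member follow an ordinary latent class model with these cluster-specific weights,

$$P(\mathbf{Y}_{ij}=\mathbf{y}\mid \boldsymbol{\pi}_j)=\sum_{c=1}^{C}\pi_{jc}\prod_{k=1}^{K}p_{ck}(y_k),$$

where $p_{ck}(\cdot)$ is the response distribution of item $k$ within class $c$. Integrating over the Dirichlet random effect induces the within-cluster dependence that standard LCA ignores.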
Abstract:
Currently, photon Monte Carlo treatment planning (MCTP) for a patient stored in the patient database of a treatment planning system (TPS) can usually only be performed using a cumbersome multi-step procedure in which many user interactions are needed; automation is therefore needed for use in clinical routine. In addition, because of the long computing time in MCTP, optimization of the MC calculations is essential. For these purposes a new graphical user interface (GUI)-based photon MC environment has been developed, resulting in a very flexible framework in which appropriate MC transport methods are assigned to different geometric regions while still benefiting from the features included in the TPS. In order to provide a flexible MC environment, the MC particle transport has been divided into different parts: the source, the beam modifiers and the patient. The source part includes the phase-space source, source models and full MC transport through the treatment head. The beam modifier part consists of one module for each beam modifier. To simulate the radiation transport through each individual beam modifier, one out of three full MC transport codes can be selected independently. Additionally, for each beam modifier a simple or an exact geometry can be chosen. Thereby, different complexity levels of radiation transport are applied during the simulation. For the patient dose calculation, two different MC codes are available. A special plug-in in Eclipse, providing all necessary information by means of DICOM streams, was used to start the developed MC GUI. The implementation of this framework separates the MC transport from the geometry, and the modules pass the particles in memory; hence, no files are used as the interface. The implementation is realized for 6 and 15 MV beams of a Varian Clinac 2300 C/D. Several applications demonstrate the usefulness of the framework. Apart from applications dealing with the beam modifiers, two patient cases are shown, in which comparisons are performed between MC-calculated dose distributions and those calculated by a pencil beam algorithm or the AAA algorithm. Interfacing this flexible and efficient MC environment with Eclipse allows widespread use for all kinds of investigations, from timing and benchmarking studies to clinical patient studies. Additionally, it is possible to add modules, keeping the system highly flexible and efficient.
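The in-memory, module-based interface described above might be sketched as follows; all names here are hypothetical illustrations of the design idea, not the actual framework, which is GUI-driven and talks to Eclipse via DICOM streams:

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Particle:
    energy: float        # MeV
    position: tuple      # (x, y, z) in cm
    direction: tuple     # unit vector

class TransportModule:
    """One stage of the chain: the source, one beam modifier, or the patient."""
    def transport(self, particles: Iterable[Particle]) -> List[Particle]:
        raise NotImplementedError

def run_chain(modules: List[TransportModule],
              particles: List[Particle]) -> List[Particle]:
    # Particles are handed from module to module in memory --
    # no intermediate phase-space files are written.
    for module in modules:
        particles = module.transport(particles)
    return particles
```

The design choice mirrored here is that each beam modifier is one interchangeable module, so a different transport code or geometry level can be swapped in per stage without touching the rest of the chain.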
Abstract:
Users of cochlear implant systems, that is, of auditory aids which stimulate the auditory nerve at the cochlea electrically, often complain about poor speech understanding in noisy environments. Despite the proven advantages of multimicrophone directional noise reduction systems for conventional hearing aids, only one major manufacturer has so far implemented such a system in a product, presumably because of the added power consumption and size. We present a physically small (intermicrophone distance 7 mm) and computationally inexpensive adaptive noise reduction system suitable for behind-the-ear cochlear implant speech processors. Supporting algorithms, which allow the adjustment of the opening angle and the maximum noise suppression, are proposed and evaluated. A portable real-time device for test in real acoustic environments is presented.
Abstract:
The widespread deployment of wireless mobile communications enables an almost permanent usage of portable devices, which imposes high demands on the battery of these devices. Indeed, battery lifetime is becoming one of the most critical factors in end-user satisfaction with wireless communications. In this work, the Optimized Power save Algorithm for continuous Media Applications (OPAMA) is proposed, aiming at enhancing the energy efficiency of end-user devices. By combining application-specific requirements with data aggregation techniques, OPAMA improves the performance of the standard IEEE 802.11 legacy Power Save Mode (PSM). The algorithm uses feedback on the end-user expected quality to establish a proper tradeoff between energy consumption and application performance. OPAMA was assessed in the OMNeT++ simulator, using real traces of variable bitrate video streaming applications, and in a real testbed employing a novel methodology intended to perform an accurate evaluation of the video Quality of Experience (QoE) perceived by end-users. The results revealed OPAMA's capability to enhance energy efficiency without degrading the end-user observed QoE, achieving savings of up to 44% when compared with the IEEE 802.11 legacy PSM.
Abstract:
In clinical practice, traditional X-ray radiography is widely used, and knowledge of landmarks and contours in anteroposterior (AP) pelvis X-rays is invaluable for computer-aided diagnosis, hip surgery planning and image-guided interventions. This paper presents a fully automatic approach for landmark detection and shape segmentation of both pelvis and femur in conventional AP X-ray images. Our approach is based on the framework of landmark detection via Random Forest (RF) regression and shape regularization via hierarchical sparse shape composition. We propose a visual feature, FL-HoG (Flexible-Level Histogram of Oriented Gradients), and a feature selection algorithm based on trace ratio optimization to improve the robustness and the efficacy of RF-based landmark detection. The landmark detection result is then used in a hierarchical sparse shape composition framework for shape regularization. Finally, the extracted shape contour is fine-tuned by a post-processing step based on low-level image features. The experimental results demonstrate that our feature selection algorithm reduces the feature dimension by a factor of 40 and improves both training and test efficiency. Further experiments conducted on 436 clinical AP pelvis X-rays show that our approach achieves an average point-to-curve error of around 1.2 mm for the femur and 1.9 mm for the pelvis.
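The RF-regression step of such a pipeline can be sketched in a few lines with scikit-learn; the synthetic descriptors and the simple mean-vote aggregation below are our illustrative stand-ins for FL-HoG and the paper's actual voting scheme:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Training: one descriptor per image patch, labelled with the 2-D offset
# from the patch centre to the landmark (synthetic stand-in data).
X_train = rng.normal(size=(500, 64))   # HoG-like feature vectors
y_train = rng.normal(size=(500, 2))    # (dx, dy) offsets to the landmark

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Testing: every patch votes "centre + predicted offset"; the landmark
# estimate aggregates the votes (here simply their mean).
centres = rng.uniform(0, 512, size=(200, 2))
votes = centres + forest.predict(rng.normal(size=(200, 64)))
landmark = votes.mean(axis=0)
print("estimated landmark position:", landmark)
```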
Abstract:
This article provides an importance sampling algorithm for computing the probability of ruin with recuperation of a spectrally negative Lévy risk process with light-tailed downwards jumps. Ruin with recuperation corresponds to the following double passage event: for some $t\in(0,\infty)$, the risk process starting at level $x\in[0,\infty)$ falls below the null level during the period $[0,t]$ and returns above the null level at the end of the period $t$. The proposed Monte Carlo estimator is logarithmically efficient, as $t,x\to\infty$, when $y=t/x$ is constant and below a certain bound.
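For reference, logarithmic efficiency is the standard rare-event simulation notion, stated here generically in our notation: writing $p(t,x)$ for the recuperation-ruin probability and $\hat{Z}$ for an unbiased estimator of it, the estimator is logarithmically efficient when

$$\lim_{t,x\to\infty,\; t/x=y} \frac{\ln \mathbb{E}\big[\hat{Z}^{2}\big]}{\ln p(t,x)^{2}} = 1,$$

i.e. the second moment of the estimator decays, on a logarithmic scale, at the best possible rate $p^2$.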
Abstract:
During the last decade wireless mobile communications have progressively become part of people's daily lives, leading users to expect to be "always best-connected" to the Internet, regardless of their location or time of day. This is indeed motivated by the fact that wireless access networks are increasingly ubiquitous, through different types of service providers, together with a proliferation of highly portable devices, namely laptops, tablets and mobile phones, among others. The "anytime and anywhere" connectivity criterion raises new challenges regarding the devices' battery lifetime management, as energy becomes the most noteworthy constraint on end-users' satisfaction. This wireless access context has also stimulated the development of novel multimedia applications with high network demands, although lacking in energy-aware design. Therefore, the relationship between energy consumption and the quality of the multimedia applications perceived by end-users should be carefully investigated. This dissertation addresses energy-efficient multimedia communications in the IEEE 802.11 standard, which is the most widely used wireless access technology. It advances the literature by proposing a unique empirical assessment methodology and new power-saving algorithms, always bearing in mind the end-users' feedback and evaluating quality perception. The new EViTEQ framework proposed in this thesis, for measuring video transmission quality and energy consumption simultaneously, in an integrated way, reveals the importance of having an empirical and high-accuracy methodology to assess the trade-off between quality and energy consumption raised by the new end-users' requirements. Extensive evaluations conducted with the EViTEQ framework revealed its flexibility and capability to accurately report both video transmission quality and energy consumption, as well as to be employed in rigorous investigations of network interface energy consumption patterns, regardless of the wireless access technology. Following the need to enhance the trade-off between energy consumption and application quality, this thesis proposes the Optimized Power save Algorithm for continuous Media Applications (OPAMA). By using the end-users' feedback to establish a proper trade-off between energy consumption and application performance, OPAMA aims at enhancing the energy efficiency of end-users' devices accessing the network through IEEE 802.11. OPAMA's performance has been thoroughly analyzed within different scenarios and application types, including a simulation study and a real deployment in an Android testbed. When compared with the most popular standard power-saving mechanisms defined in the IEEE 802.11 standard, the obtained results revealed OPAMA's capability to enhance energy efficiency while keeping end-users' Quality of Experience within the defined bounds. Furthermore, OPAMA was optimized to enable superior energy savings in multiple-station environments, resulting in a new proposal called Enhanced Power Saving Mechanism for Multiple station Environments (OPAMA-EPS4ME). The results of this thesis highlight the relevance of having a highly accurate methodology to assess energy consumption and application quality when aiming to optimize the trade-off between energy and quality. Additionally, the results obtained from both simulation and testbed evaluations show clear benefits of employing user-driven power-saving techniques, such as OPAMA, instead of the IEEE 802.11 standard power-saving approaches.
Abstract:
Behavior is one of the most important indicators for assessing cattle health and well-being. The objective of this study was to develop and validate a novel algorithm to monitor the locomotor behavior of loose-housed dairy cows based on the output of the RumiWatch pedometer (ITIN+HOCH GmbH, Fütterungstechnik, Liestal, Switzerland). Locomotion data were acquired by simultaneous pedometer measurements at a sampling rate of 10 Hz and video recordings for later manual observation. The study consisted of 3 independent experiments. Experiment 1 was carried out to develop and validate the algorithm for lying behavior, experiment 2 for walking and standing behavior, and experiment 3 for stride duration and stride length. The final version was validated using raw data collected from cows not included in the development of the algorithm. Spearman correlation coefficients were calculated between accelerometer variables and the respective data derived from the video recordings (gold standard). Dichotomous data were expressed as the proportion of correctly detected events, and the overall difference for continuous data was expressed as the relative measurement error. The proportions of correctly detected events or bouts were 1 for stand-ups, lie-downs, standing bouts, and lying bouts, and 0.99 for walking bouts. The relative measurement error and Spearman correlation coefficient for lying time were 0.09% and 1; for standing time, 4.7% and 0.96; for walking time, 17.12% and 0.96; for number of strides, 6.23% and 0.98; for stride duration, 6.65% and 0.75; and for stride length, 11.92% and 0.81, respectively. The strong to very high correlations of the variables between visual observation and converted pedometer data indicate that the novel RumiWatch algorithm may markedly improve automated livestock management systems for efficient health monitoring of dairy cows.
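Here the relative measurement error is presumably the absolute deviation of the pedometer-derived total from the video-derived total, expressed as a percentage of the latter,

$$\mathrm{RME} = \frac{\lvert T_{\text{pedometer}} - T_{\text{video}}\rvert}{T_{\text{video}}}\times 100\%,$$

though this is our reading and the study's exact definition may differ.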
Abstract:
SOMS is a general surrogate-based multistart algorithm, which is used in combination with any local optimizer to find global optima of computationally expensive functions with multiple local minima. SOMS differs from previous multistart methods in that a surrogate approximation is used by the multistart algorithm to help reduce the number of function evaluations necessary to identify the most promising points from which to start each nonlinear programming local search. SOMS's numerical results are compared with four well-known methods, namely Multi-Level Single Linkage (MLSL), MATLAB's MultiStart, MATLAB's GlobalSearch, and GLOBAL. In addition, we propose a class of wavy test functions that mimic the wavy nature of objective functions arising in many black-box simulations. Extensive comparisons of the algorithms on the wavy test functions and on earlier standard global-optimization test functions are done for a total of 19 different test problems. The numerical results indicate that SOMS performs favorably in comparison to the alternative methods and does especially well on wavy functions when the number of function evaluations allowed is limited.
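A minimal sketch of the surrogate-assisted multistart idea, in Python with SciPy; this is not the authors' SOMS, and the RBF surrogate, candidate scoring and budget handling are all simplified assumptions of ours:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

def expensive_f(x):
    # Stand-in for a costly, wavy black-box objective.
    return float(np.sum(x ** 2) + 5.0 * np.sum(np.sin(3.0 * x)))

rng = np.random.default_rng(1)
dim = 2
X = rng.uniform(-5.0, 5.0, size=(15, dim))   # initial space-filling design
y = np.array([expensive_f(x) for x in X])

for _ in range(10):                           # multistart rounds
    surrogate = RBFInterpolator(X, y, smoothing=1e-8)  # cheap model of f
    # Rank many candidate start points on the surrogate, not on f itself.
    cand = rng.uniform(-5.0, 5.0, size=(1000, dim))
    start = cand[int(np.argmin(surrogate(cand)))]
    # Local search from the most promising candidate (budget kept tiny here;
    # real SOMS would also record every evaluation made inside the search).
    res = minimize(expensive_f, start, method="Nelder-Mead",
                   options={"maxfev": 25})
    X = np.vstack([X, res.x])
    y = np.append(y, res.fun)

print("best point:", X[np.argmin(y)], "best value:", y.min())
```

The trait mirrored here is the key one named in the abstract: candidate start points are ranked on the cheap surrogate, so expensive evaluations are spent only inside the local searches.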
Abstract:
Many attempts have already been made to detect exomoons around transiting exoplanets, but the first confirmed discovery is still pending. The experience gathered so far allows us to better optimize future space telescopes for this challenge already during the development phase. In this paper we focus on the forthcoming CHaracterising ExOPlanet Satellite (CHEOPS), describing an optimized decision algorithm with step-by-step evaluation, and calculating the number of transits required for an exomoon detection for various planet-moon configurations observable by CHEOPS. We explore the most efficient way to carry out such observations so as to minimize the cost in observing time. Our study is based on PTV (photocentric transit timing variation) observations in simulated CHEOPS data, but the recipe does not depend on the actual detection method and can be substituted with, e.g., the photodynamical method in later applications. Using the current state-of-the-art simulation of CHEOPS data, we analyzed transit observation sets for different star-planet-moon configurations and performed a bootstrap analysis to determine their detection statistics. We found that the detection limit is around an Earth-sized moon. In the case of favorable spatial configurations, i.e. systems with at least a large moon and a Neptune-sized planet, an 80% detection chance requires at least 5-6 transit observations on average. There is also a nonzero chance for smaller moons, but the detection statistics deteriorate rapidly, while the number of necessary transit measurements increases quickly. After the CoRoT and Kepler spacecraft, CHEOPS will be the next dedicated space telescope to observe exoplanetary transits and characterize systems with known Doppler planets. Although it has a smaller aperture than Kepler (the ratio of the mirror diameters is about 1/3) and is mounted with a CCD similar to Kepler's, it will observe brighter stars and operate with a larger sampling rate; therefore, the detection limit for an exomoon can be the same or even better, which will make CHEOPS a competitive instrument in the quest for exomoons.
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously poses a multiple-testing problem and will give false-positive results. Although this problem can be effectively dealt with through several approaches, such as Bonferroni correction, permutation testing and false discovery rates, patterns of the joint effects of several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations.
In this study, we take two steps to achieve the goal. First we selected 1000 SNPs through an effective filter method, and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed a chi-square test to look at the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that an exhaustive search of a small subset with one SNP, two SNPs or a 3-SNP subset based on the best 100 composite 2-SNPs can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, due to overfitting from observing more complex subset states.
Our results also indicate that HMSS, as a criterion to evaluate the classification ability of a function, can be used on imbalanced data without modifying the original dataset, in contrast to classification accuracy. Our four studies suggest that the Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in the study.
From our results we can see that the best test HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be no less than 0.4.
On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study through the chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham Heart Study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through the chi-square test at the cut-off value 1.11E-07.
Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or an efficient computing system, neither of which can currently be accomplished in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and that SNPs with good discriminant power are not necessarily causal markers for the disease.
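For reference, HMSS as named above is simply the harmonic mean of sensitivity (Se) and specificity (Sp),

$$\mathrm{HMSS} = \frac{2\,\mathrm{Se}\cdot\mathrm{Sp}}{\mathrm{Se} + \mathrm{Sp}},$$

which, unlike plain accuracy, cannot be inflated by always favoring the majority class; this is why it is usable on imbalanced data without rebalancing the original dataset.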