930 resultados para Classification algorithms
Resumo:
En aquest treball, es proposa un nou mètode per estimar en temps real la qualitat del producte final en processos per lot. Aquest mètode permet reduir el temps necessari per obtenir els resultats de qualitat de les anàlisi de laboratori. S'utiliza un model de anàlisi de componentes principals (PCA) construït amb dades històriques en condicions normals de funcionament per discernir si un lot finalizat és normal o no. Es calcula una signatura de falla pels lots anormals i es passa a través d'un model de classificació per la seva estimació. L'estudi proposa un mètode per utilitzar la informació de les gràfiques de contribució basat en les signatures de falla, on els indicadors representen el comportament de les variables al llarg del procés en les diferentes etapes. Un conjunt de dades compost per la signatura de falla dels lots anormals històrics es construeix per cercar els patrons i entrenar els models de classifcació per estimar els resultas dels lots futurs. La metodologia proposada s'ha aplicat a un reactor seqüencial per lots (SBR). Diversos algoritmes de classificació es proven per demostrar les possibilitats de la metodologia proposada.
Resumo:
We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of seminaive Bayesian classifiers. Altogether, 16 model selection and weighing schemes, 58 benchmark data sets, and various statistical tests are employed. This paper's main contributions are threefold. First, it formally presents each scheme's definition, rationale, and time complexity and hence can serve as a comprehensive reference for researchers interested in ensemble learning. Second, it offers bias-variance analysis for each scheme's classification error performance. Third, it identifies effective schemes that meet various needs in practice. This leads to accurate and fast classification algorithms which have an immediate and significant impact on real-world applications. Another important feature of our study is using a variety of statistical tests to evaluate multiple learning methods across multiple data sets.
Resumo:
Phase encoded nano structures such as Quick Response (QR) codes made of metallic nanoparticles are suggested to be used in security and authentication applications. We present a polarimetric optical method able to authenticate random phase encoded QR codes. The system is illuminated using polarized light and the QR code is encoded using a phase-only random mask. Using classification algorithms it is possible to validate the QR code from the examination of the polarimetric signature of the speckle pattern. We used Kolmogorov-Smirnov statistical test and Support Vector Machine algorithms to authenticate the phase encoded QR codes using polarimetric signatures.
Resumo:
Les logiciels sont en constante évolution, nécessitant une maintenance et un développement continus. Ils subissent des changements tout au long de leur vie, que ce soit pendant l'ajout de nouvelles fonctionnalités ou la correction de bogues dans le code. Lorsque ces logiciels évoluent, leurs architectures ont tendance à se dégrader avec le temps et deviennent moins adaptables aux nouvelles spécifications des utilisateurs. Elles deviennent plus complexes et plus difficiles à maintenir. Dans certains cas, les développeurs préfèrent refaire la conception de ces architectures à partir du zéro plutôt que de prolonger la durée de leurs vies, ce qui engendre une augmentation importante des coûts de développement et de maintenance. Par conséquent, les développeurs doivent comprendre les facteurs qui conduisent à la dégradation des architectures, pour prendre des mesures proactives qui facilitent les futurs changements et ralentissent leur dégradation. La dégradation des architectures se produit lorsque des développeurs qui ne comprennent pas la conception originale du logiciel apportent des changements au logiciel. D'une part, faire des changements sans comprendre leurs impacts peut conduire à l'introduction de bogues et à la retraite prématurée du logiciel. D'autre part, les développeurs qui manquent de connaissances et–ou d'expérience dans la résolution d'un problème de conception peuvent introduire des défauts de conception. Ces défauts ont pour conséquence de rendre les logiciels plus difficiles à maintenir et évoluer. Par conséquent, les développeurs ont besoin de mécanismes pour comprendre l'impact d'un changement sur le reste du logiciel et d'outils pour détecter les défauts de conception afin de les corriger. Dans le cadre de cette thèse, nous proposons trois principales contributions. La première contribution concerne l'évaluation de la dégradation des architectures logicielles. Cette évaluation consiste à utiliser une technique d’appariement de diagrammes, tels que les diagrammes de classes, pour identifier les changements structurels entre plusieurs versions d'une architecture logicielle. Cette étape nécessite l'identification des renommages de classes. Par conséquent, la première étape de notre approche consiste à identifier les renommages de classes durant l'évolution de l'architecture logicielle. Ensuite, la deuxième étape consiste à faire l'appariement de plusieurs versions d'une architecture pour identifier ses parties stables et celles qui sont en dégradation. Nous proposons des algorithmes de bit-vecteur et de clustering pour analyser la correspondance entre plusieurs versions d'une architecture. La troisième étape consiste à mesurer la dégradation de l'architecture durant l'évolution du logiciel. Nous proposons un ensemble de m´etriques sur les parties stables du logiciel, pour évaluer cette dégradation. La deuxième contribution est liée à l'analyse de l'impact des changements dans un logiciel. Dans ce contexte, nous présentons une nouvelle métaphore inspirée de la séismologie pour identifier l'impact des changements. Notre approche considère un changement à une classe comme un tremblement de terre qui se propage dans le logiciel à travers une longue chaîne de classes intermédiaires. Notre approche combine l'analyse de dépendances structurelles des classes et l'analyse de leur historique (les relations de co-changement) afin de mesurer l'ampleur de la propagation du changement dans le logiciel, i.e., comment un changement se propage à partir de la classe modifiée è d'autres classes du logiciel. La troisième contribution concerne la détection des défauts de conception. Nous proposons une métaphore inspirée du système immunitaire naturel. Comme toute créature vivante, la conception de systèmes est exposée aux maladies, qui sont des défauts de conception. Les approches de détection sont des mécanismes de défense pour les conception des systèmes. Un système immunitaire naturel peut détecter des pathogènes similaires avec une bonne précision. Cette bonne précision a inspiré une famille d'algorithmes de classification, appelés systèmes immunitaires artificiels (AIS), que nous utilisions pour détecter les défauts de conception. Les différentes contributions ont été évaluées sur des logiciels libres orientés objets et les résultats obtenus nous permettent de formuler les conclusions suivantes: • Les métriques Tunnel Triplets Metric (TTM) et Common Triplets Metric (CTM), fournissent aux développeurs de bons indices sur la dégradation de l'architecture. La d´ecroissance de TTM indique que la conception originale de l'architecture s’est dégradée. La stabilité de TTM indique la stabilité de la conception originale, ce qui signifie que le système est adapté aux nouvelles spécifications des utilisateurs. • La séismologie est une métaphore intéressante pour l'analyse de l'impact des changements. En effet, les changements se propagent dans les systèmes comme les tremblements de terre. L'impact d'un changement est plus important autour de la classe qui change et diminue progressivement avec la distance à cette classe. Notre approche aide les développeurs à identifier l'impact d'un changement. • Le système immunitaire est une métaphore intéressante pour la détection des défauts de conception. Les résultats des expériences ont montré que la précision et le rappel de notre approche sont comparables ou supérieurs à ceux des approches existantes.
Resumo:
This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
In the world we are constantly performing everyday actions. Two of these actions are frequent and of great importance: classify (sort by classes) and take decision. When we encounter problems with a relatively high degree of complexity, we tend to seek other opinions, usually from people who have some knowledge or even to the extent possible, are experts in the problem domain in question in order to help us in the decision-making process. Both the classification process as the process of decision making, we are guided by consideration of the characteristics involved in the specific problem. The characterization of a set of objects is part of the decision making process in general. In Machine Learning this classification happens through a learning algorithm and the characterization is applied to databases. The classification algorithms can be employed individually or by machine committees. The choice of the best methods to be used in the construction of a committee is a very arduous task. In this work, it will be investigated meta-learning techniques in selecting the best configuration parameters of homogeneous committees for applications in various classification problems. These parameters are: the base classifier, the architecture and the size of this architecture. We investigated nine types of inductors candidates for based classifier, two methods of generation of architecture and nine medium-sized groups for architecture. Dimensionality reduction techniques have been applied to metabases looking for improvement. Five classifiers methods are investigated as meta-learners in the process of choosing the best parameters of a homogeneous committee.
Resumo:
As condições meteorológicas são determinantes para a produção agrícola; a precipitação, em particular, pode ser citada como a mais influente por sua relação direta com o balanço hídrico. Neste sentido, modelos agrometeorológicos, os quais se baseiam nas respostas das culturas às condições meteorológicas, vêm sendo cada vez mais utilizados para a estimativa de rendimentos agrícolas. Devido às dificuldades de obtenção de dados para abastecer tais modelos, métodos de estimativa de precipitação utilizando imagens dos canais espectrais dos satélites meteorológicos têm sido empregados para esta finalidade. O presente trabalho tem por objetivo utilizar o classificador de padrões floresta de caminhos ótimos para correlacionar informações disponíveis no canal espectral infravermelho do satélite meteorológico GOES-12 com a refletividade obtida pelo radar do IPMET/UNESP localizado no município de Bauru, visando o desenvolvimento de um modelo para a detecção de ocorrência de precipitação. Nos experimentos foram comparados quatro algoritmos de classificação: redes neurais artificiais (ANN), k-vizinhos mais próximos (k-NN), máquinas de vetores de suporte (SVM) e floresta de caminhos ótimos (OPF). Este último obteve melhor resultado, tanto em eficiência quanto em precisão.
Resumo:
Pós-graduação em Geografia - IGCE
Resumo:
A computational pipeline combining texture analysis and pattern classification algorithms was developed for investigating associations between high-resolution MRI features and histological data. This methodology was tested in the study of dentate gyrus images of sclerotic hippocampi resected from refractory epilepsy patients. Images were acquired using a simple surface coil in a 3.0T MRI scanner. All specimens were subsequently submitted to histological semiquantitative evaluation. The computational pipeline was applied for classifying pixels according to: a) dentate gyrus histological parameters and b) patients' febrile or afebrile initial precipitating insult history. The pipeline results for febrile and afebrile patients achieved 70% classification accuracy, with 78% sensitivity and 80% specificity [area under the reader observer characteristics (ROC) curve: 0.89]. The analysis of the histological data alone was not sufficient to achieve significant power to separate febrile and afebrile groups. Interesting enough, the results from our approach did not show significant correlation with histological parameters (which per se were not enough to classify patient groups). These results showed the potential of adding computational texture analysis together with classification methods for detecting subtle MRI signal differences, a method sufficient to provide good clinical classification. A wide range of applications of this pipeline can also be used in other areas of medical imaging. Magn Reson Med, 2012. (c) 2012 Wiley Periodicals, Inc.
Resumo:
The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While dozens of classification algorithms have been applied to time series, recent empirical evidence strongly suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm is important, and depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping, and cardiology data requires invariance to the baseline (the mean value). Similarly, recent work suggests that for time series clustering, the choice of clustering algorithm is much less important than the choice of distance measure used.In this work we make a somewhat surprising claim. There is an invariance that the community seems to have missed, complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where some complex objects may be incorrectly assigned to a simpler class. Similarly, for clustering this effect can introduce errors by “suggesting” to the clustering algorithm that subjectively similar, but complex objects belong in a sparser and larger diameter cluster than is truly warranted.We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification and clustering accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of triangular inequality, thus making use of most existing indexing and data mining algorithms. We evaluate our ideas with the largest and most comprehensive set of time series mining experiments ever attempted in a single work, and show that complexity-invariant distance measures can produce improvements in classification and clustering in the vast majority of cases.
Resumo:
Although Recovery is often defined as the less studied and documented phase of the Emergency Management Cycle, a wide literature is available for describing characteristics and sub-phases of this process. Previous works do not allow to gain an overall perspective because of a lack of systematic consistent monitoring of recovery utilizing advanced technologies such as remote sensing and GIS technologies. Taking into consideration the key role of Remote Sensing in Response and Damage Assessment, this thesis is aimed to verify the appropriateness of such advanced monitoring techniques to detect recovery advancements over time, with close attention to the main characteristics of the study event: Hurricane Katrina storm surge. Based on multi-source, multi-sensor and multi-temporal data, the post-Katrina recovery was analysed using both a qualitative and a quantitative approach. The first phase was dedicated to the investigation of the relation between urban types, damage and recovery state, referring to geographical and technological parameters. Damage and recovery scales were proposed to review critical observations on remarkable surge- induced effects on various typologies of structures, analyzed at a per-building level. This wide-ranging investigation allowed a new understanding of the distinctive features of the recovery process. A quantitative analysis was employed to develop methodological procedures suited to recognize and monitor distribution, timing and characteristics of recovery activities in the study area. Promising results, gained by applying supervised classification algorithms to detect localization and distribution of blue tarp, have proved that this methodology may help the analyst in the detection and monitoring of recovery activities in areas that have been affected by medium damage. The study found that Mahalanobis Distance was the classifier which provided the most accurate results, in localising blue roofs with 93.7% of blue roof classified correctly and a producer accuracy of 70%. It was seen to be the classifier least sensitive to spectral signature alteration. The application of the dissimilarity textural classification to satellite imagery has demonstrated the suitability of this technique for the detection of debris distribution and for the monitoring of demolition and reconstruction activities in the study area. Linking these geographically extensive techniques with expert per-building interpretation of advanced-technology ground surveys provides a multi-faceted view of the physical recovery process. Remote sensing and GIS technologies combined to advanced ground survey approach provides extremely valuable capability in Recovery activities monitoring and may constitute a technical basis to lead aid organization and local government in the Recovery management.
Resumo:
Statistical modelling and statistical learning theory are two powerful analytical frameworks for analyzing signals and developing efficient processing and classification algorithms. In this thesis, these frameworks are applied for modelling and processing biomedical signals in two different contexts: ultrasound medical imaging systems and primate neural activity analysis and modelling. In the context of ultrasound medical imaging, two main applications are explored: deconvolution of signals measured from a ultrasonic transducer and automatic image segmentation and classification of prostate ultrasound scans. In the former application a stochastic model of the radio frequency signal measured from a ultrasonic transducer is derived. This model is then employed for developing in a statistical framework a regularized deconvolution procedure, for enhancing signal resolution. In the latter application, different statistical models are used to characterize images of prostate tissues, extracting different features. These features are then uses to segment the images in region of interests by means of an automatic procedure based on a statistical model of the extracted features. Finally, machine learning techniques are used for automatic classification of the different region of interests. In the context of neural activity signals, an example of bio-inspired dynamical network was developed to help in studies of motor-related processes in the brain of primate monkeys. The presented model aims to mimic the abstract functionality of a cell population in 7a parietal region of primate monkeys, during the execution of learned behavioural tasks.
Resumo:
In functional magnetic resonance imaging (fMRI) coherent oscillations of the blood oxygen level-dependent (BOLD) signal can be detected. These arise when brain regions respond to external stimuli or are activated by tasks. The same networks have been characterized during wakeful rest when functional connectivity of the human brain is organized in generic resting-state networks (RSN). Alterations of RSN emerge as neurobiological markers of pathological conditions such as altered mental state. In single-subject fMRI data the coherent components can be identified by blind source separation of the pre-processed BOLD data using spatial independent component analysis (ICA) and related approaches. The resulting maps may represent physiological RSNs or may be due to various artifacts. In this methodological study, we propose a conceptually simple and fully automatic time course based filtering procedure to detect obvious artifacts in the ICA output for resting-state fMRI. The filter is trained on six and tested on 29 healthy subjects, yielding mean filter accuracy, sensitivity and specificity of 0.80, 0.82, and 0.75 in out-of-sample tests. To estimate the impact of clearly artifactual single-subject components on group resting-state studies we analyze unfiltered and filtered output with a second level ICA procedure. Although the automated filter does not reach performance values of visual analysis by human raters, we propose that resting-state compatible analysis of ICA time courses could be very useful to complement the existing map or task/event oriented artifact classification algorithms.
Resumo:
Agricultural intensification has caused a decline in structural elements in European farmland, where natural habitats are increasingly fragmented. The loss of habitat structures has a detrimental effect on biodiversity and affects bat species that depend on vegetation structures for foraging and commuting. We investigated the impact of connectivity and configuration of structural landscape elements on flight activity, species richness and diversity of insectivorous bats and distinguished three bat guilds according to species-specific bioacoustic characteristics. We tested whether bats with shorter-range echolocation were more sensitive to habitat fragmentation than bats with longer-range echolocation. We expected to find different connectivity thresholds for the three guilds and hypothesized that bats prefer linear over patchy landscape elements. Bat activity was quantified using repeated acoustic monitoring in 225 locations at 15 study plots distributed across the Swiss Central Plateau, where connectivity and the shape of landscape elements were determined by spatial analysis (GIS). Spectrograms of bat calls were assigned to species with the software batit by means of image recognition and statistical classification algorithms. Bat activity was significantly higher around landscape elements compared to open control areas. Short- and long-range echolocating bats were more active in well-connected landscapes, but optimal connectivity levels differed between the guilds. Species richness increased significantly with connectivity, while species diversity did not (Shannon's diversity index). Total bat activity was unaffected by the shape of landscape elements. Synthesis and applications. This study highlights the importance of connectivity in farmland landscapes for bats, with shorter-range echolocating bats being particularly sensitive to habitat fragmentation. More structurally diverse landscape elements are likely to reduce population declines of bats and could improve conditions for other declining species, including birds. Activity was highest around optimal values of connectivity, which must be evaluated for the different guilds and spatially targeted for a region's habitat configuration. In a multi-species approach, we recommend the reintroduction of structural elements to increase habitat heterogeneity should become part of agri-environment schemes.
Resumo:
Smart homes for the aging population have recently started attracting the attention of the research community. The "health state" of smart homes is comprised of many different levels; starting with the physical health of citizens, it also includes longer-term health norms and outcomes, as well as the arena of positive behavior changes. One of the problems of interest is to monitor the activities of daily living (ADL) of the elderly, aiming at their protection and well-being. For this purpose, we installed passive infrared (PIR) sensors to detect motion in a specific area inside a smart apartment and used them to collect a set of ADL. In a novel approach, we describe a technology that allows the ground truth collected in one smart home to train activity recognition systems for other smart homes. We asked the users to label all instances of all ADL only once and subsequently applied data mining techniques to cluster in-home sensor firings. Each cluster would therefore represent the instances of the same activity. Once the clusters were associated to their corresponding activities, our system was able to recognize future activities. To improve the activity recognition accuracy, our system preprocessed raw sensor data by identifying overlapping activities. To evaluate the recognition performance from a 200-day dataset, we implemented three different active learning classification algorithms and compared their performance: naive Bayesian (NB), support vector machine (SVM) and random forest (RF). Based on our results, the RF classifier recognized activities with an average specificity of 96.53%, a sensitivity of 68.49%, a precision of 74.41% and an F-measure of 71.33%, outperforming both the NB and SVM classifiers. Further clustering markedly improved the results of the RF classifier. An activity recognition system based on PIR sensors in conjunction with a clustering classification approach was able to detect ADL from datasets collected from different homes. Thus, our PIR-based smart home technology could improve care and provide valuable information to better understand the functioning of our societies, as well as to inform both individual and collective action in a smart city scenario.