847 results for classification methods
Abstract:
The Gaussian process latent variable model (GP-LVM) is an effective probabilistic approach for dimensionality reduction, as it can learn a low-dimensional manifold of a data set in an unsupervised fashion. However, the GP-LVM is insufficient for supervised learning tasks (e.g., classification and regression) because it ignores class label information during dimensionality reduction. In this paper, a supervised GP-LVM is developed for supervised learning tasks, and a maximum a posteriori algorithm is introduced to estimate the positions of all samples in the latent space. We present experimental evidence suggesting that the supervised GP-LVM uses class label information effectively and thus consistently outperforms both the GP-LVM and its discriminative extension. A comparison with supervised classification methods, such as Gaussian process classification and support vector machines, is also given to illustrate the advantage of the proposed method.
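As a rough illustration of the baseline comparison mentioned above (Gaussian process classification versus an SVM), the following sketch uses scikit-learn on a toy dataset; it is not the authors' supervised GP-LVM, and the dataset and settings are illustrative only.

    # Minimal sketch, assuming scikit-learn and a toy dataset; not the paper's method.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for name, clf in [("GP classification", GaussianProcessClassifier()),
                      ("SVM", SVC(kernel="rbf"))]:
        clf.fit(X_tr, y_tr)                      # train each baseline classifier
        print(name, "accuracy:", clf.score(X_te, y_te))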
Abstract:
Based on grain-size analyses of 1,140 surface sediment samples from the northern Yellow Sea, this paper discusses the relationships between sediment grain-size composition, distribution patterns, sediment provenance, and the hydrodynamic environment. The results show that the seabed sediments in the study area comprise five main types: mud, silt, sandy silt, silty sand, and sand, with gravel present at a few stations. Gravel occurs mainly in the waters near the Changshan Islands, at the mouth of Dalian Bay, and in the nearshore area to its southeast; sand is mainly distributed east of 123.3°E and near the Changshan Islands; silt is concentrated in the southwestern part of the study area and offshore of Dalian Bay. Clay content is generally low: values above 30% are confined to the southwestern part of the study area, the area with clay content above 16% tends to extend northward and northeastward, and nearshore sediments from Dalian Bay to Guanglu Island also contain more than 16% clay. Based on factor analysis of the sediment grain-size fractions, combined with the Folk and Pejrup classifications, the northern Yellow Sea is divided into five main sedimentary environment zones, controlled respectively by the Shandong Peninsula Coastal Current, the Yellow Sea Warm Current, a strong tidal current field, the Changshan Islands, and the Liaodong Coastal Current. Sediments in the central and western northern Yellow Sea are mainly influenced by the Shandong Peninsula Coastal Current; the Yellow Sea Warm Current passing through the area restricts the eastward and northward dispersal and transport of fine-grained material carried by that coastal current; tidal currents are the main control on sedimentation in the eastern study area; the central waters of the Changshan Islands are influenced by detritus eroded from the nearby islands; and the area offshore of Dalian Bay is controlled by the Liaodong Coastal Current. The grain-size distribution of fine-grained sediments in the southwestern study area is inhomogeneous and asymmetric between north and south; its formation is mainly controlled by the Shandong Peninsula Coastal Current, the Yellow Sea Warm Current, and their interaction, with tidal currents possibly also playing a role.
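The factor analysis of grain-size fractions mentioned above could, in principle, be run as in the following sketch; the data here are synthetic fraction percentages, not the 1,140 northern Yellow Sea samples, and scikit-learn is an assumed tool, not the one used in the study.

    # Illustrative sketch only: factor analysis of grain-size fractions on synthetic data.
    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    # columns: gravel, sand, silt, clay content (%) per station (synthetic)
    fractions = rng.dirichlet(alpha=[0.5, 2.0, 3.0, 1.5], size=200) * 100.0

    fa = FactorAnalysis(n_components=2, random_state=0)
    scores = fa.fit_transform(fractions)   # per-station factor scores
    loadings = fa.components_              # factor loadings per grain-size fraction
    print(loadings.round(2))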
Abstract:
Two problems have long been of concern: the mechanical characteristics of jointed rock masses and the criterion for the stability of engineering rock. To address these two problems, the following work was conducted. (1) First, the mechanical characteristics of the rock mass were studied by means of the Distinct Element Code, and the sensitivity of rock mass strength to joint surface roughness, joint wall strength, and joint stiffness (i.e. tangential and normal stiffness) was examined. (2) Based on the empirical rock mass classification methods RMR and GSI, a program for "Parameters Calculation of the Rock Mass" was developed, enabling rapid selection of rock mass parameters. (3) The concept of the Representative Elementary Volume was introduced on the basis of a study of the size effect of rock masses. The Representative Elementary Volumes of the horizontal and vertical pillars (the Two Pillars) in the 2nd mining zone of the Jinchuan mine were obtained by the geological statistics method and the Distinct Element Code, and the strength and deformation parameters of the rock mass of the Two Pillars were then obtained through numerical experiments. (4) From confining-pressure unloading after triaxial compression tests on rock samples, it was concluded that rock failure is caused mainly by the lateral deformation and energy release that occur during the unloading process. A criterion of plastic energy catastrophe for the failure of rock engineering was proposed and validated, and the stability of the horizontal pillar and of the Qianjiangping landslide in the Three Gorges area was then assessed by this method. (5) Based on the observation that energy becomes increasingly concentrated as a rock mass is compressed, a rock information entropy (i.e. energy distribution entropy) was proposed, and it was shown that this energy distribution entropy changes as the rock mass is compressed to failure.
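One plausible reading of the "energy distribution entropy" idea is the Shannon entropy of the normalized energy stored in each element of a model; the sketch below follows that reading, but the exact formulation in the thesis may differ.

    # Hedged sketch: Shannon entropy of an energy distribution over model elements.
    import numpy as np

    def energy_distribution_entropy(element_energies):
        e = np.asarray(element_energies, dtype=float)
        p = e / e.sum()                  # fraction of total energy per element
        p = p[p > 0]                     # avoid log(0)
        return -(p * np.log(p)).sum()

    # Uniform energy gives maximum entropy; concentrated energy gives lower entropy.
    print(energy_distribution_entropy([1.0, 1.0, 1.0, 1.0]))
    print(energy_distribution_entropy([3.7, 0.1, 0.1, 0.1]))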
Abstract:
Since C.L. Hearn introduced the concept of the flow unit in 1984, its generation mechanisms and controlling factors have been studied from many angles and with different methods, and several concepts and classification standards for flow units now exist. Building on previous achievements and using methodologies from sedimentary geology, geophysics, seismic stratigraphy, and reservoir engineering, the author systematically studies the factors controlling flow units and puts forward a series of methods for their recognition, classification, and evaluation. The results obtained in this paper are important not only for understanding flow units but also for revealing the distribution of remaining oil. As a case study, this paper deals with the reservoir rocks of the Guantao Group in the Gudong Oilfield, Zhanhua Sag, Jiyang Depression, Bohaiwan Basin. Based on the study of stratigraphic, depositional, and structural characteristics, the author establishes reservoir geological models, reveals the geological characteristics of the oil-bearing fluvial-facies reservoir, identifies the factors controlling flow units and the geological parameters for their classification, and summarizes the methods and technologies for flow unit study using geological, well-logging, and mathematical methods. It is the first attempt in the literature to evaluate a reservoir using well-logging data constrained by geological conditions, from which a well-logging evaluation model can be built; this kind of model is more precise than previous ones for calculating physical parameters within a flow unit. In a well bore, six methods are available to recognize a flow unit, among which the activity function and intra-layer difference methods are the most effective. Along a section, the composition type of a flow unit can be located according to amplitude and impedance on the seismic section, and slice methods, among others, are used to distinguish flow units. To reveal the spatial distribution laws of flow units, the author creates a new method, named combination and composition of flow units. Based on microscopic pore structure research, classification methods for flow units are developed (see the sketch after this abstract). There are three types of flow unit in the fluvial-facies reservoir, each with its own lithology, petrophysics, and pore structure characteristics. Using a discriminant (judgement) method, standard functions are built to determine the class of a fluvial-facies flow unit. Combining reservoir engineering methods, the distribution laws of remaining oil in different types, or in different parts, of a flow unit are studied; it is evident that the remaining oil is controlled by the type of flow unit. The author reveals the relationship between flow units and remaining oil distribution, builds the flow models, predicts the spatial variation of reservoir parameters, and puts forward different methods for developing the remaining oil in different flow units. In particular, based on the results obtained in this paper, some suggestions for development adjustment have been applied in Districts No.4 and No.7, with good results. The results of this paper can therefore guide oilfield development and are useful for recovering remaining oil and enhancing oil recovery efficiency.
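The sketch below illustrates, in very generic form, how discriminant functions can assign flow-unit classes from petrophysical parameters; the features, labels, and classifier are illustrative stand-ins, not the author's calibrated standard functions.

    # Illustrative only: linear discriminant classification of synthetic flow-unit data.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(1)
    # columns: porosity (%), log10 permeability (mD), shale content (%) (synthetic)
    X = np.column_stack([rng.uniform(10, 35, 300),
                         rng.uniform(0, 4, 300),
                         rng.uniform(0, 40, 300)])
    y = np.digitize(X[:, 1], bins=[1.5, 3.0])   # three synthetic flow-unit classes

    lda = LinearDiscriminantAnalysis().fit(X, y)
    print(lda.predict([[25.0, 2.0, 10.0]]))     # predicted flow-unit class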
Abstract:
We introduce Active Hidden Models (AHM) that utilize kernel methods traditionally associated with classification. We use AHMs to track deformable objects in video sequences by leveraging kernel projections. We introduce the "subset projection" method which improves the efficiency of our tracking approach by a factor of ten. We successfully tested our method on facial tracking with extreme head movements (including full 180-degree head rotation), facial expressions, and deformable objects. Given a kernel and a set of training observations, we derive unbiased estimates of the accuracy of the AHM tracker. Kernels are generally used in classification methods to make training data linearly separable. We prove that the optimal (minimum variance) tracking kernels are those that make the training observations linearly dependent.
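For readers unfamiliar with kernel projections, the following generic sketch projects observations into a low-dimensional space with kernel PCA; it illustrates the general idea only and is neither the AHM tracker nor the "subset projection" method.

    # Generic kernel projection example (kernel PCA); not the AHM method itself.
    import numpy as np
    from sklearn.decomposition import KernelPCA

    rng = np.random.default_rng(0)
    observations = rng.normal(size=(100, 20))        # e.g. flattened shape features (synthetic)

    kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.1)
    projected = kpca.fit_transform(observations)     # low-dimensional kernel projection
    print(projected.shape)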
Abstract:
Consideration of how people respond to the question "What is this?" has suggested new problem frontiers for pattern recognition and information fusion, as well as neural systems that embody the cognitive transformation of declarative information into relational knowledge. In contrast to traditional classification methods, which aim to find the single correct label for each exemplar (This is a car), the new approach discovers rules that embody coherent relationships among labels which would otherwise appear contradictory to a learning system (This is a car, that is a vehicle, over there is a sedan). This talk will describe how an individual who experiences exemplars in real time, with each exemplar trained on at most one category label, can autonomously discover a hierarchy of cognitive rules, thereby converting local information into global knowledge. Computational examples are based on the observation that sensors working at different times, locations, and spatial scales, and experts with different goals, languages, and situations, may produce apparently inconsistent image labels, which are reconciled by implicit underlying relationships that the network's learning process discovers. The ARTMAP information fusion system can, moreover, integrate multiple separate knowledge hierarchies, by fusing independent domains into a unified structure. In the process, the system discovers cross-domain rules, inferring multilevel relationships among groups of output classes, without any supervised labeling of these relationships. In order to self-organize its expert system, the ARTMAP information fusion network features distributed code representations which exploit the model's intrinsic capacity for one-to-many learning (This is a car and a vehicle and a sedan) as well as many-to-one learning (Each of those vehicles is a car). Fusion system software, testbed datasets, and articles are available from http://cns.bu.edu/techlab.
Abstract:
Objectives: This study examined the validity of a latent class typology of adolescent drinking based on four alcohol dimensions: frequency of drinking, quantity consumed, frequency of binge drinking, and the number of alcohol-related problems encountered. Method: Data were drawn from the sixteen-year-old follow-up of the 1970 British Cohort Study. Partial or complete responses to the selected alcohol measures were provided by 6,516 cohort members; the data were collected via a series of postal questionnaires. Results: A five-class LCA typology was constructed. Around 12% of the sample were classified as "hazardous drinkers", reporting frequent drinking, high levels of alcohol consumed, frequent binge drinking, and multiple alcohol-related problems. Multinomial logistic regression, with multiple imputation for missing data, was used to assess the covariates of adolescent drinking patterns. Hazardous drinking was associated with being white, being male, having heavy-drinking parents (in particular fathers), smoking, illicit drug use, and minor and violent offending behaviour. Non-significant associations were found between drinking patterns and general mental health and attention deficit disorder. Conclusion: The latent class typology exhibited concurrent validity in its ability to distinguish respondents across a number of alcohol and non-alcohol indicators. Notwithstanding a number of limitations, latent class analysis offers an alternative data reduction method for constructing drinking typologies that addresses known weaknesses inherent in more traditional classification methods.
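The sketch below illustrates only the covariate step described above (multinomial logistic regression on latent class membership); the latent class model itself is not reproduced, and the data are synthetic stand-ins for the cohort variables.

    # Sketch of the covariate analysis step only, on synthetic data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    covariates = rng.normal(size=(500, 6))          # e.g. sex, smoking, parental drinking (synthetic)
    latent_class = rng.integers(0, 5, size=500)     # five drinking classes from an LCA (synthetic)

    model = LogisticRegression(max_iter=1000)       # lbfgs handles the multinomial case
    model.fit(covariates, latent_class)
    print(model.predict_proba(covariates[:1]).round(3))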
Abstract:
In semiconductor fabrication processes, effective management of maintenance operations is fundamental to decrease costs associated with failures and downtime. Predictive Maintenance (PdM) approaches, based on statistical methods and historical data, are becoming popular for their predictive capabilities and low (potentially zero) added costs. We present here a PdM module based on Support Vector Machines for prediction of integral type faults, that is, the kind of failures that happen due to machine usage and stress of equipment parts. The proposed module may also be employed as a health factor indicator. The module has been applied to a frequent maintenance problem in semiconductor manufacturing industry, namely the breaking of the filament in the ion-source of ion-implantation tools. The PdM has been tested on a real production dataset. © 2013 IEEE.
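As a hedged sketch of the SVM-based PdM idea described above, the following code flags whether a component (here, an ion-source filament) is likely close to failure; the features, data, and failure rule are invented for illustration and are not the paper's production dataset or model.

    # Minimal PdM sketch with an SVM classifier on synthetic usage features.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    X = np.column_stack([rng.uniform(0, 400, 1000),    # accumulated filament hours (synthetic)
                         rng.normal(150, 20, 1000)])   # mean source current, mA (synthetic)
    y = (X[:, 0] > 300).astype(int)                    # 1 = failure expected soon (synthetic rule)

    pdm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    pdm.fit(X, y)
    print(pdm.predict_proba([[320.0, 155.0]]))         # probability usable as a health factor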
Abstract:
The in-line measurement of COD and NH4-N in the WWTP inflow is crucial for the timely monitoring of biological wastewater treatment processes and for the development of advanced control strategies for optimized WWTP operation. As a direct measurement of COD and NH4-N requires expensive and high-maintenance in-line probes or analyzers, this paper presents an approach that estimates COD and NH4-N from standard and spectroscopic in-line inflow measurement systems using machine learning techniques. The results show that COD can be estimated from standard in-line measurements alone using Random Forest Regression, with a normalized MSE of 0.3, which is sufficiently accurate for practical applications. In the case of NH4-N, a good estimate (Partial Least Squares Regression, normalized MSE of 0.16) is only possible when standard and spectroscopic in-line measurements are combined. Furthermore, a comparison of regression and classification methods shows that both perform equally well in most cases.
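The sketch below runs the two regressors named above and reports a normalized MSE in the spirit of the quoted figures; the inflow measurements and target are synthetic, so the numbers are not comparable to the paper's results.

    # Sketch: Random Forest vs. PLS regression with a normalized MSE, on synthetic data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))                                     # in-line measurements (synthetic)
    y = X[:, 0] * 3 + X[:, 1] ** 2 + rng.normal(scale=0.5, size=500)   # e.g. COD (synthetic)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for name, model in [("Random Forest", RandomForestRegressor(random_state=0)),
                        ("PLS", PLSRegression(n_components=3))]:
        model.fit(X_tr, y_tr)
        nmse = mean_squared_error(y_te, model.predict(X_te).ravel()) / np.var(y_te)
        print(name, "normalized MSE:", round(nmse, 2))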
Abstract:
Many current e-commerce systems personalize the content shown to users. In this sense, recommender systems make personalized suggestions and provide information about the items available in the system. Nowadays there is a vast number of methods, including data mining techniques, that can be employed for personalization in recommender systems. However, these methods are still quite vulnerable to some limitations and shortcomings of the recommender environment. In order to deal with some of them, in this work we implement a recommendation methodology in a recommender system for tourism in which classification based on association is applied. Classification based on association methods, also called associative classification methods, are an alternative data mining technique that combines concepts from classification and association so that association rules can be employed in a prediction context. The proposed methodology was evaluated in several case studies, in which we verified that it is able to mitigate limitations present in recommender systems and to enhance recommendation quality.
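One possible realization of the "classification based on association" idea, not the paper's actual system, is to mine association rules over user-item transactions and use their consequents as predictions; the sketch below does this with mlxtend on invented tourism items.

    # Hedged sketch: association rules over toy visit histories (mlxtend assumed available).
    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = pd.DataFrame([
        {"museum": 1, "beach": 0, "old_town": 1, "hiking": 0},
        {"museum": 1, "beach": 0, "old_town": 1, "hiking": 1},
        {"museum": 0, "beach": 1, "old_town": 0, "hiking": 1},
        {"museum": 1, "beach": 0, "old_town": 1, "hiking": 0},
    ]).astype(bool)                                          # invented item names

    frequent = apriori(transactions, min_support=0.5, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
    print(rules[["antecedents", "consequents", "confidence"]])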
Abstract:
Documents published by companies, such as press releases, contain a wealth of information about their activities and are a valuable source for business intelligence analyses. However, given the large volume of such documents, tools are needed to exploit this source automatically. This thesis describes work within a business intelligence setting, namely the detection of business relationships between the companies described in press releases. We propose a classification-based approach. Existing classification methods do not achieve satisfactory performance, mainly because of two problems: representing the text by all of its words, which does not necessarily help characterize a business relationship, and the imbalance between classes. To address the first problem, we propose a representation based on pivot words, namely the names of the companies involved, in order to better capture the words likely to describe the relationship. To address the second, we propose a two-stage classification, which proves more appropriate than traditional resampling methods. We tested our approaches on a collection of press releases from the automotive domain. Our experiments show that the proposed approaches can improve classification performance: the pivot-word document representation lets the classifier focus on words useful for detecting business relationships, and the two-stage classification provides an effective solution to the class imbalance problem. This work shows that automatic detection of business relationships is a feasible task, and its results could be used in business intelligence analysis.
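The following is a hedged sketch of the two-stage idea: stage 1 decides whether any business relation is present, stage 2 labels its type. The texts, features, classifiers, and relation labels are illustrative inventions, not the thesis setup.

    # Two-stage classification sketch on a tiny invented corpus.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    texts  = ["Acme acquires Widgets Inc", "Acme reports quarterly earnings",
              "Acme partners with Widgets Inc", "Widgets Inc opens new plant"]
    stage1 = np.array([1, 0, 1, 0])                         # is a relation present?
    stage2 = np.array(["acquisition", "-", "partnership", "-"])

    vec = TfidfVectorizer().fit(texts)
    X = vec.transform(texts)

    clf1 = LogisticRegression().fit(X, stage1)               # stage 1: relation vs. none
    mask = stage1 == 1
    clf2 = LogisticRegression().fit(X[mask], stage2[mask])   # stage 2: relation type

    x_new = vec.transform(["Acme acquires Foo Corp"])
    label = clf2.predict(x_new)[0] if clf1.predict(x_new)[0] == 1 else "no relation"
    print(label)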
Abstract:
This paper highlights the prediction of learning disabilities (LD) in school-age children using two classification methods, Support Vector Machine (SVM) and Decision Tree (DT), with an emphasis on applications of data mining. About 10% of children enrolled in school have a learning disability. Predicting learning disabilities in school-age children is a complicated task because LD tends to be identified only in elementary school and there is no single sign by which it can be recognized. Using either of the two classification methods, SVM and DT, LD in a child can be predicted easily and accurately. We can also determine the merits and demerits of the two classifiers so that the better one can be selected for use in this field. In this study, the Sequential Minimal Optimization (SMO) algorithm is used to train the SVM and the J48 algorithm is used to construct the decision trees.
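The study uses Weka's SMO and J48; the sketch below compares the closest scikit-learn analogues (an RBF-kernel SVM and a decision tree) on synthetic data only, so the features and accuracies are not those of the paper.

    # Comparison sketch with scikit-learn analogues of SMO (SVC) and J48 (DecisionTree).
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))                 # e.g. test scores, attention measures (synthetic)
    y = (X[:, 0] + X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

    for name, clf in [("SVM", SVC(kernel="rbf")),
                      ("Decision tree", DecisionTreeClassifier(max_depth=4))]:
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(name, "CV accuracy:", round(acc, 2))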
Abstract:
An analysis of historical Corona images, Landsat images, recent radar images, and Google Earth® images was conducted to determine land use and land cover changes of oasis settlements and surrounding rangelands at the fringe of the Altay Mountains from 1964 to 2008. For the Landsat datasets, supervised classification methods were used to test the suitability of the Maximum Likelihood Classifier with subsequent smoothing and of the Sequential Maximum A Posteriori Classifier (SMAPC). The results show a trend typical for the steppe and desert regions of northern China. From 1964 to 2008 farmland increased strongly (+61%), while the area of grassland and forest in the floodplains decreased (-43%). The urban areas increased threefold and 400 ha of former agricultural land were abandoned. Farmland apparently affected by soil salinity decreased in extent from 1990 (1,180 ha) to 2008 (630 ha). The vegetated areas of the surrounding rangelands decreased, mainly as a result of overgrazing and drought events. The SMAPC with subsequent post-processing achieved the highest classification accuracy. However, the specific landscape characteristics of mountain oasis systems required labour-intensive post-processing. Further research is needed to test the use of ancillary information for an automated classification of the examined landscape features.
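A maximum likelihood classifier for multispectral pixels fits one Gaussian (mean and covariance) per class and assigns each pixel to the most likely class; quadratic discriminant analysis implements the same model. The sketch below illustrates this on synthetic band values and labels, not on the Landsat scenes of the study.

    # Sketch: per-class Gaussian (maximum likelihood) pixel classification via QDA.
    import numpy as np
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

    rng = np.random.default_rng(0)
    n_bands = 6                                      # e.g. Landsat TM bands (assumed)
    train_pixels = rng.normal(size=(300, n_bands))   # synthetic training spectra
    train_labels = rng.integers(0, 4, size=300)      # e.g. farmland, grassland, urban, water

    mlc = QuadraticDiscriminantAnalysis(store_covariance=True)
    mlc.fit(train_pixels, train_labels)

    image = rng.normal(size=(100, 100, n_bands))     # a small synthetic scene
    classified = mlc.predict(image.reshape(-1, n_bands)).reshape(100, 100)
    print(np.bincount(classified.ravel()))           # pixel count per class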
Abstract:
The increasing interconnection of information and communication systems leads to ever greater complexity and thus to a further increase in security vulnerabilities. Classical protection mechanisms such as firewalls and anti-malware solutions have long ceased to provide adequate protection against intrusions into IT infrastructures. Intrusion Detection Systems (IDS) have established themselves as a very effective instrument for protection against cyber attacks. Such systems collect and analyse information from network components and hosts in order to detect unusual behaviour and security violations automatically. While signature-based approaches can only detect already known attack patterns, anomaly-based IDS are also able to detect new, previously unknown attacks (zero-day attacks) at an early stage. The core problem of intrusion detection systems, however, lies in processing the enormous volume of network data efficiently and in developing an adaptive detection model that works in real time. To meet these challenges, this dissertation provides a framework consisting of two main parts. The first part, called OptiFilter, uses a dynamic queuing concept to process the continuously arriving network data, continuously assembles network connections, and exports structured input data for the IDS. The second part is an adaptive classifier comprising a classifier model based on an Enhanced Growing Hierarchical Self-Organizing Map (EGHSOM), a model of normal network behaviour (NNB), and an update model. In OptiFilter, tcpdump and SNMP traps are used to continuously aggregate network packets and host events; these aggregated packets and events are further analysed and converted into connection vectors. To improve the detection rate of the adaptive classifier, the artificial neural network GHSOM is studied intensively and substantially extended. Several approaches are proposed and discussed in this dissertation: a classification-confidence margin threshold is defined to uncover unknown malicious connections; the stability of the growing topology is increased through novel approaches for initializing the weight vectors and for strengthening the winner neurons; and a self-adaptive procedure is introduced to keep the model continuously up to date. In addition, the main task of the NNB model is to further examine the unknown connections detected by the EGHSOM and to verify whether they are in fact normal. Owing to the concept-drift phenomenon, however, network traffic changes constantly, producing non-stationary network data in real time; this phenomenon is handled by the update model. The EGHSOM model detects new anomalies effectively, and the NNB model adapts well to changes in the network data. In the experimental evaluation, the framework showed promising results. In the first experiment, the framework was evaluated in offline mode: OptiFilter was evaluated with offline, synthetic, and realistic data, and the adaptive classifier was evaluated with 10-fold cross-validation to estimate its accuracy.
In the second experiment, the framework was installed on a 1 to 10 GB network link and evaluated online in real time. OptiFilter successfully converted the enormous volume of network data into structured connection vectors, and the adaptive classifier classified them accurately. A comparative study between the developed framework and other well-known IDS approaches shows that the proposed IDS framework outperforms all the others. This can be attributed to the following key points: the processing of the collected network data, the best overall performance (e.g. overall accuracy), the detection of unknown connections, and the development of a real-time intrusion detection model.
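As a deliberately simplified stand-in for the EGHSOM-based classifier, the sketch below illustrates only the confidence-margin idea: connection vectors whose distance to the nearest learned prototype of normal traffic exceeds a threshold are flagged as unknown. KMeans prototypes, the synthetic data, and the 99th-percentile threshold are all assumptions for illustration, not the dissertation's model.

    # Simplified confidence-margin anomaly detection over connection vectors.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    normal_connections = rng.normal(size=(2000, 12))     # structured connection vectors (synthetic)

    prototypes = KMeans(n_clusters=20, n_init=10, random_state=0)
    prototypes.fit(normal_connections)

    # margin threshold: 99th percentile of training distances to the nearest prototype
    train_dist = prototypes.transform(normal_connections).min(axis=1)
    threshold = np.quantile(train_dist, 0.99)

    new_connection = rng.normal(loc=5.0, size=(1, 12))   # deviates from normal traffic
    dist = prototypes.transform(new_connection).min(axis=1)[0]
    print("unknown/anomalous" if dist > threshold else "normal")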
Abstract:
The purpose of this project is to identify which concepts of health, disease, epidemiology, and risk are applicable to companies in the oil and natural gas extraction sector in Colombia. Given the low predictive power of traditional financial analyses and their insufficiency for long-term investment and decision making, as well as their failure to consider variables such as risk and future expectations, there is a need to adopt different perspectives and integrative models. This is particularly relevant in the oil and natural gas extraction sector, given the growing foreign investment it has attracted: US$2,862 million in 2010, more than ten times its value in 2003. Multidimensional models could therefore be developed on the basis of concepts of financial health, epidemiology, and statistics. The term "health" and its adoption in the business sector is useful and conceptually coherent, as it highlights the presence of different interacting and interconnected subsystems or factors. It should also be noted that a multidimensional (multi-stage) model must take risk into account, and epidemiological analysis has proven useful for determining risk and integrating it into the system together with related concepts such as the hazard ratio and relative risk. This will be analysed through a theoretical-conceptual study, complementing a previous study, as a contribution to the corporate finance project of the research line in Management.