840 results for Data Mining, Clustering, PSA, Pavement Deflection
Abstract:
M. Galea and Q. Shen. Simultaneous ant colony optimisation algorithms for learning linguistic fuzzy rules. A. Abraham, C. Grosan and V. Ramos (Eds.), Swarm Intelligence in Data Mining, pages 75-99.
Abstract:
R. Jensen and Q. Shen, 'Fuzzy-Rough Feature Significance for Fuzzy Decision Trees,' in Proceedings of the 2005 UK Workshop on Computational Intelligence, pp. 89-96, 2005.
Abstract:
The problem of discovering frequent poly-regions (i.e. regions of high occurrence of a set of items or patterns of a given alphabet) in a sequence is studied, and three efficient approaches are proposed to solve it. The first one is entropy-based and applies a recursive segmentation technique that produces a set of candidate segments which may potentially lead to a poly-region. The key idea of the second approach is the use of a set of sliding windows over the sequence. Each sliding window covers a sequence segment and keeps a set of statistics that mainly include the number of occurrences of each item or pattern in that segment. Combining these statistics efficiently yields the complete set of poly-regions in the given sequence. The third approach applies a technique based on the majority vote, achieving linear running time with a minimal number of false negatives. After identifying the poly-regions, the sequence is converted to a sequence of labeled intervals (each one corresponding to a poly-region). An efficient algorithm for mining frequent arrangements of intervals is applied to the converted sequence to discover frequently occurring arrangements of poly-regions in different parts of DNA, including coding regions. The proposed algorithms are tested on various DNA sequences producing results of significant biological meaning.
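The sliding-window idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `dense_windows`, the single-item restriction, and the `width` and `min_count` parameters are assumptions made for illustration. It keeps a running count of one item per window and merges overlapping qualifying windows into candidate regions.

```python
def dense_windows(seq, item, width, min_count):
    """Return merged half-open (start, end) intervals covered by windows of
    length `width` that contain at least `min_count` occurrences of `item`.

    Hypothetical helper: a simplified stand-in for a poly-region detector.
    """
    if width <= 0 or width > len(seq):
        return []
    regions = []
    # Count occurrences in the first window once, then update incrementally.
    count = sum(1 for symbol in seq[:width] if symbol == item)
    for start in range(len(seq) - width + 1):
        if start > 0:
            count -= seq[start - 1] == item          # symbol leaving the window
            count += seq[start + width - 1] == item  # symbol entering the window
        if count >= min_count:
            end = start + width
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


regions = dense_windows("AAAATTTTAAAA", "A", width=4, min_count=3)
print(regions)
```

Because each step only adjusts the count for the symbol leaving and the symbol entering the window, the scan is linear in the sequence length for a fixed item.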
Abstract:
Mapping novel terrain from sparse, complex data often requires the resolution of conflicting information from sensors working at different times, locations, and scales, and from experts with different goals and situations. Information fusion methods help resolve inconsistencies in order to distinguish correct from incorrect answers, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods developed here consider a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, or man-made. Underlying relationships among objects are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships. The procedure is illustrated with two image examples.
Abstract:
Classifying novel terrain or objects from sparse, complex data may require the resolution of conflicting information from sensors working at different times, locations, and scales, and from sources with different goals and situations. Information fusion methods can help resolve inconsistencies, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods described here consider a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, and man-made. Underlying relationships among objects are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships.
Abstract:
Classifying novel terrain or objects from sparse, complex data may require the resolution of conflicting information from sensors working at different times, locations, and scales, and from sources with different goals and situations. Information fusion methods can help resolve inconsistencies, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods described here address a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, and man-made. Underlying relationships among classes are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The fusion system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships. The procedure is illustrated with two image examples, but is not limited to the image domain.
Abstract:
The purpose of this study is to explore aspects of social organisation during the Upper Palaeolithic and Mesolithic periods using craniometric data. Different hypotheses were tested using geometric morphometrics, alongside traditional craniometric data. The clustering of individuals from the same site, as well as a correspondence to an isolation-by-distance model, particularly in the Mesolithic samples, points to population structure within these groups. Moreover, discontinuities in cranial traits between the early Upper Palaeolithic and later periods could suggest that the Last Glacial Maximum had a disruptive effect on populations in Europe. Differences in social organisation can often result from cultural norms regarding post-marital residence. Such differences can be tested by comparing cranial data with geographic information. Greater variation in male cranial traits relative to females, after controlling for location, suggests that the overall pattern of residence during the Upper Palaeolithic and Mesolithic was one of matrilocality. It has been suggested that coastal occupation was density dependent and that these populations show a greater degree of sedentism than their inland counterparts. Moreover, it has been proposed that coastal areas were not continuously occupied until the Late Pleistocene due to spatial restrictions that would adversely affect reproductive opportunities. In this study, the pattern seen in cranial traits corresponded with that of a more sedentary population. The results are consistent with the hypothesis that coastal populations were more sedentary than inland populations during these periods. This study adds new information regarding the social dynamics of prehistoric populations in Europe and sheds light on some of the conditions that may have paved the way for the transition to agriculture.
Abstract:
Temporal representation and reasoning play an important role in Data Mining and Knowledge Discovery, particularly in mining and recognizing patterns with rich temporal information. Based on a formal characterization of time-series and state-sequences, this paper presents the computational technique and algorithm for matching state-based temporal patterns. As a case study of real-life applications, zone-defense pattern recognition in basketball games is examined as an illustrative example. Experimental results demonstrate that it provides a formal and comprehensive temporal ontology for research and applications in video events detection.
Abstract:
Data identification is a key task for any Internet Service Provider (ISP) or network administrator. As port fluctuation and encryption become more common in P2P traffic wishing to avoid identification, new strategies must be developed to detect and classify such flows. This paper introduces a new method of separating P2P and standard web traffic that can be applied as part of a data mining process, based on the activity of the hosts on the network. Unlike other research, our method is aimed at classifying individual flows rather than just identifying P2P hosts or ports. Heuristics are analysed and a classification system proposed. The accuracy of the system is then tested using real network traffic from a core internet router, showing over 99% accuracy in some cases. We expand on this proposed strategy to investigate its application to real-time, early classification problems. New proposals are made and the results of real-time experiments compared to those obtained in the data mining research. To the best of our knowledge, this is the first research to use host-based flow identification to determine a flow's application within the early stages of the connection.
Abstract:
Discrete Conditional Phase-type (DC-Ph) models are a family of models which represent skewed survival data conditioned on specific inter-related discrete variables. The survival data is modeled using a Coxian phase-type distribution which is associated with the inter-related variables using a range of possible data mining approaches such as Bayesian networks (BNs), the Naïve Bayes Classification method and classification and regression trees. This paper utilizes the Discrete Conditional Phase-type model (DC-Ph) to explore the modeling of patient waiting times in an Accident and Emergency Department of a UK hospital. The resulting DC-Ph model takes on the form of the Coxian phase-type distribution conditioned on the outcome of a logistic regression model.
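For readers unfamiliar with the Coxian phase-type distribution mentioned above, the following is a minimal sketch of its standard definition, not of the paper's fitted model. The function names and the rate values are hypothetical. A process moves through sequential phases; from phase i it either advances to the next phase at rate `lambdas[i]` or is absorbed (the event occurs) at rate `mus[i]`, and the last phase can only be absorbed.

```python
import random


def coxian_mean(lambdas, mus):
    """Expected time to absorption for a Coxian phase-type distribution.

    lambdas[i]: rate of moving from phase i to phase i + 1 (0 for the last phase).
    mus[i]:     rate of absorption from phase i.
    """
    mean = 0.0
    reach_prob = 1.0  # probability of ever entering the current phase
    for lam, mu in zip(lambdas, mus):
        total = lam + mu
        mean += reach_prob / total   # expected sojourn time, weighted by reach probability
        reach_prob *= lam / total    # probability of advancing instead of being absorbed
    return mean


def sample_coxian(lambdas, mus, rng):
    """Draw one absorption time by simulating the underlying Markov chain."""
    elapsed = 0.0
    for lam, mu in zip(lambdas, mus):
        total = lam + mu
        elapsed += rng.expovariate(total)  # time spent in this phase
        if rng.random() >= lam / total:    # absorbed rather than advancing
            break
    return elapsed


# Illustrative (assumed) rates for a two-phase model.
lambdas, mus = [1.0, 0.0], [1.0, 2.0]
rng = random.Random(0)
draws = [sample_coxian(lambdas, mus, rng) for _ in range(20000)]
print(coxian_mean(lambdas, mus), sum(draws) / len(draws))
```

In a DC-Ph setting, one would fit a separate set of rates for each outcome of the conditioning model (here, the logistic regression); that fitting step is outside the scope of this sketch.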
Abstract:
The skin of fish is the first line of defense against pathogens and parasites. The skin transcriptome of the Atlantic salmon is poorly characterized, and currently only 2,089 expressed sequence tags (ESTs) out of a total of half a million sequences are generated from skin-derived cDNA libraries. The primary aim of this study was to enhance the transcriptomic knowledge of salmon skin by using next-generation sequencing (NGS) technology, namely the Roche-454 platform. An equimolar mixture of high-quality RNA from skin and epidermal samples of salmon reared in either freshwater or seawater was used for 454-sequencing. This technique yielded over 600,000 reads, which were assembled into 34,696 isotigs using Newbler. Of these isotigs, 12% had not been sequenced in Atlantic salmon, hence representing previously unreported salmon mRNAs that can potentially be skin-specific. Many full-length genes have been acquired, representing numerous biological processes. Mucin proteins are the main structural component of mucus, and we examined in greater detail the sequences we obtained for these genes. Several isotigs exhibited homology to mammalian mucins (MUC2, MUC5AC and MUC5B). Mucin mRNAs are generally > 10 kbp and contain large repetitive units, which pose a challenge to full-length sequence discovery. To date, we have not unearthed any full-length salmon mucin genes with this dataset, but have obtained both N- and C-terminal regions of a mucin type 5. This highlights the fact that, while NGS is indeed a formidable tool for sequence data mining of non-model species, it must be complemented with additional experimental and bioinformatic work to characterize some mRNA sequences with complex features.
Abstract:
Secretory factors that drive cancer progression are attractive immunotherapeutic targets. We used a whole-genome data-mining approach on multiple cohorts of breast tumours annotated for clinical outcomes to discover such factors. We identified Serine protease inhibitor Kazal-type 1 (SPINK1) to be associated with poor survival in estrogen receptor-positive (ER+) cases. Immunohistochemistry showed that SPINK1 was absent in normal breast, present in early and advanced tumours, and its expression correlated with poor survival in ER+ tumours. In ER- cases, the prognostic effect did not reach statistical significance. Forced expression and/or exposure to recombinant SPINK1 induced invasiveness without affecting cell proliferation. However, down-regulation of SPINK1 resulted in cell death. Further, SPINK1-overexpressing cells were resistant to drug-induced apoptosis due to reduced caspase-3 levels and high expression of Bcl2 and phospho-Bcl2 proteins. Intriguingly, these anti-apoptotic effects of SPINK1 were abrogated by mutations of its protease inhibition domain. Thus, SPINK1 affects multiple aggressive properties in breast cancer: survival, invasiveness and chemoresistance. Because SPINK1 effects are abrogated by neutralizing antibodies, we suggest that SPINK1 is a viable potential therapeutic target in breast cancer.
Abstract:
Achieving a clearer picture of categorial distinctions in the brain is essential for our understanding of the conceptual lexicon, but much more fine-grained investigations are required in order for this evidence to contribute to lexical research. Here we present a collection of advanced data-mining techniques that allows the category of individual concepts to be decoded from single trials of EEG data. Neural activity was recorded while participants silently named images of mammals and tools, and category could be detected in single trials with an accuracy well above chance, both when considering data from single participants, and when group-training across participants. By aggregating across all trials, single concepts could be correctly assigned to their category with an accuracy of 98%. The pattern of classifications made by the algorithm confirmed that the neural patterns identified are due to conceptual category, and not any of a series of processing-related confounds. The time intervals, frequency bands and scalp locations that proved most informative for prediction permit physiological interpretation: the widespread activation shortly after appearance of the stimulus (from 100 ms) is consistent both with accounts of multi-pass processing, and distributed representations of categories. These methods provide an alternative to fMRI for fine-grained, large-scale investigations of the conceptual lexicon. © 2010 Elsevier Inc.