807 resultados para Machine learning experiments


Relevância:

80.00% 80.00%

Publicador:

Resumo:

In order to address road safety effectively, it is essential to understand all the factors, which
attribute to the occurrence of a road collision. This is achieved through road safety
assessment measures, which are primarily based on historical crash data. Recent advances
in uncertain reasoning technology have led to the development of robust machine learning
techniques, which are suitable for investigating road traffic collision data. These techniques
include supervised learning (e.g. SVM) and unsupervised learning (e.g. Cluster Analysis).
This study extends upon previous research work, carried out in Coll et al. [3], which
proposed a non-linear aggregation framework for identifying temporal and spatial hotspots.
The results from Coll et al. [3] identified Lisburn area as the hotspot, in terms of road safety,
in Northern Ireland. This study aims to use Cluster Analysis, to investigate and highlight any
hidden patterns associated with collisions that occurred in Lisburn area, which in turn, will
provide more clarity in the causation factors so that appropriate countermeasures can be put
in place.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In recent years, wide-field sky surveys providing deep multi-band imaging have presented a new path for indirectly characterizing the progenitor populations of core-collapse supernovae (SN): systematic light curve studies. We assemble a set of 76 grizy-band Type IIP SN light curves from Pan-STARRS1, obtained over a constant survey program of 4 years and classified using both spectroscopy and machine learning-based photometric techniques. We develop and apply a new Bayesian model for the full multi-band evolution of each light curve in the sample. We find no evidence of a sub-population of fast-declining explosions (historically referred to as "Type IIL" SNe). However, we identify a highly significant relation between the plateau phase decay rate and peak luminosity among our SNe IIP. These results argue in favor of a single parameter, likely determined by initial stellar mass, predominantly controlling the explosions of red supergiants. This relation could also be applied for supernova cosmology, offering a standardizable candle good to an intrinsic scatter of 0.2 mag. We compare each light curve to physical models from hydrodynamic simulations to estimate progenitor initial masses and other properties of the Pan-STARRS1 Type IIP SN sample. We show that correction of systematic discrepancies between modeled and observed SN IIP light curve properties and an expanded grid of progenitor properties, are needed to enable robust progenitor inferences from multi-band light curve samples of this kind. This work will serve as a pathfinder for photometric studies of core-collapse SNe to be conducted through future wide field transient searches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Detection of adulteration of non-processed vegetable oil with lesser value seed oils (classic example is hazelnut in virgin olive oil) has been in the centre of scientific attention for many years and several chemical methods were proposed. The recent EC Regulation 1169/2011, however, introduces necessity for different analytical method in a more complicated matrix. From the end of 2014, food businesses required to declare the composition of the refined oil mixture in the food product label. This creates a gap since there is no analytical method currently available to perform such analysis. In the first phase the work focused on 100% oil blends of various oil species of palm oil (and derivatives), sunflower and rapeseed oil before expanding to foodstuffs. Chromatographic methods remain highly relevant although suffer from various limitations which derive from natural compositional variation. Modern multivariate techniques based on machine learning algorithms, however, when applied in FTIR, Raman spectroscopic data have a strong potential in tackling the problem.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This work examines the conformational ensemble involved in β-hairpin folding by means of advanced molecular dynamics simulations and dimensionality reduction. A fully atomistic description of the protein and the surrounding solvent molecules is used, and this complex energy landscape is sampled by means of parallel tempering metadynamics simulations. The ensemble of configurations explored is analyzed using the recently proposed sketch-map algorithm. Further simulations allow us to probe how mutations affect the structures adopted by this protein. We find that many of the configurations adopted by a mutant are the same as those adopted by the wild-type protein. Furthermore, certain mutations destabilize secondary-structure-containing configurations by preventing the formation of hydrogen bonds or by promoting the formation of new intramolecular contacts. Our analysis demonstrates that machine-learning techniques can be used to study the energy landscapes of complex molecules and that the visualizations that are generated in this way provide a natural basis for examining how the stabilities of particular configurations of the molecule are affected by factors such as temperature or structural mutations.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Cloud data centres are critical business infrastructures and the fastest growing service providers. Detecting anomalies in Cloud data centre operation is vital. Given the vast complexity of the data centre system software stack, applications and workloads, anomaly detection is a challenging endeavour. Current tools for detecting anomalies often use machine learning techniques, application instance behaviours or system metrics distribu- tion, which are complex to implement in Cloud computing environments as they require training, access to application-level data and complex processing. This paper presents LADT, a lightweight anomaly detection tool for Cloud data centres that uses rigorous correlation of system metrics, implemented by an efficient corre- lation algorithm without need for training or complex infrastructure set up. LADT is based on the hypothesis that, in an anomaly-free system, metrics from data centre host nodes and virtual machines (VMs) are strongly correlated. An anomaly is detected whenever correlation drops below a threshold value. We demonstrate and evaluate LADT using a Cloud environment, where it shows that the hosting node I/O operations per second (IOPS) are strongly correlated with the aggregated virtual machine IOPS, but this correlation vanishes when an application stresses the disk, indicating a node-level anomaly.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The battle to mitigate Android malware has become more critical with the emergence of new strains incorporating increasingly sophisticated evasion techniques, in turn necessitating more advanced detection capabilities. Hence, in this paper we propose and evaluate a machine learning based approach based on eigenspace analysis for Android malware detection using features derived from static analysis characterization of Android applications. Empirical evaluation with a dataset of real malware and benign samples show that detection rate of over 96% with a very low false positive rate is achievable using the proposed method.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Inherently error-resilient applications in areas such as signal processing, machine learning and data analytics provide opportunities for relaxing reliability requirements, and thereby reducing the overhead incurred by conventional error correction schemes. In this paper, we exploit the tolerable imprecision of such applications by designing an energy-efficient fault-mitigation scheme for unreliable data memories to meet target yield. The proposed approach uses a bit-shuffling mechanism to isolate faults into bit locations with lower significance. This skews the bit-error distribution towards the low order bits, substantially limiting the output error magnitude. By controlling the granularity of the shuffling, the proposed technique enables trading-off quality for power, area, and timing overhead. Compared to error-correction codes, this can reduce the overhead by as much as 83% in read power, 77% in read access time, and 89% in area, when applied to various data mining applications in 28nm process technology.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this study, we introduce an original distance definition for graphs, called the Markov-inverse-F measure (MiF). This measure enables the integration of classical graph theory indices with new knowledge pertaining to structural feature extraction from semantic networks. MiF improves the conventional Jaccard and/or Simpson indices, and reconciles both the geodesic information (random walk) and co-occurrence adjustment (degree balance and distribution). We measure the effectiveness of graph-based coefficients through the application of linguistic graph information for a neural activity recorded during conceptual processing in the human brain. Specifically, the MiF distance is computed between each of the nouns used in a previous neural experiment and each of the in-between words in a subgraph derived from the Edinburgh Word Association Thesaurus of English. From the MiF-based information matrix, a machine learning model can accurately obtain a scalar parameter that specifies the degree to which each voxel in (the MRI image of) the brain is activated by each word or each principal component of the intermediate semantic features. Furthermore, correlating the voxel information with the MiF-based principal components, a new computational neurolinguistics model with a network connectivity paradigm is created. This allows two dimensions of context space to be incorporated with both semantic and neural distributional representations.

Relevância:

80.00% 80.00%

Publicador:

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, an automatic Smart Irrigation Decision Support System, SIDSS, is proposed to manage irrigation in agriculture. Our system estimates the weekly irrigations needs of a plantation, on the basis of both soil measurements and climatic variables gathered by several autonomous nodes deployed in field. This enables a closed loop control scheme to adapt the decision support system to local perturbations and estimation errors. Two machine learning techniques, PLSR and ANFIS, are proposed as reasoning engine of our SIDSS. Our approach is validated on three commercial plantations of citrus trees located in the South-East of Spain. Performance is tested against decisions taken by a human expert.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recently there has been an increasing interest in the development of new methods using Pareto optimality to deal with multi-objective criteria (for example, accuracy and architectural complexity). Once one has learned a model based on their devised method, the problem is then how to compare it with the state of art. In machine learning, algorithms are typically evaluated by comparing their performance on different data sets by means of statistical tests. Unfortunately, the standard tests used for this purpose are not able to jointly consider performance measures. The aim of this paper is to resolve this issue by developing statistical procedures that are able to account for multiple competing measures at the same time. In particular, we develop two tests: a frequentist procedure based on the generalized likelihood-ratio test and a Bayesian procedure based on a multinomial-Dirichlet conjugate model. We further extend them by discovering conditional independences among measures to reduce the number of parameter of such models, as usually the number of studied cases is very reduced in such comparisons. Real data from a comparison among general purpose classifiers is used to show a practical application of our tests.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, we propose a malware categorization method that models malware behavior in terms of instructions using PageRank. PageRank computes ranks of web pages based on structural information and can also compute ranks of instructions that represent the structural information of the instructions in malware analysis methods. Our malware categorization method uses the computed ranks as features in machine learning algorithms. In the evaluation, we compare the effectiveness of different PageRank algorithms and also investigate bagging and boosting algorithms to improve the categorization accuracy.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The rapid evolution and proliferation of a world-wide computerized network, the Internet, resulted in an overwhelming and constantly growing amount of publicly available data and information, a fact that was also verified in biomedicine. However, the lack of structure of textual data inhibits its direct processing by computational solutions. Information extraction is the task of text mining that intends to automatically collect information from unstructured text data sources. The goal of the work described in this thesis was to build innovative solutions for biomedical information extraction from scientific literature, through the development of simple software artifacts for developers and biocurators, delivering more accurate, usable and faster results. We started by tackling named entity recognition - a crucial initial task - with the development of Gimli, a machine-learning-based solution that follows an incremental approach to optimize extracted linguistic characteristics for each concept type. Afterwards, Totum was built to harmonize concept names provided by heterogeneous systems, delivering a robust solution with improved performance results. Such approach takes advantage of heterogenous corpora to deliver cross-corpus harmonization that is not constrained to specific characteristics. Since previous solutions do not provide links to knowledge bases, Neji was built to streamline the development of complex and custom solutions for biomedical concept name recognition and normalization. This was achieved through a modular and flexible framework focused on speed and performance, integrating a large amount of processing modules optimized for the biomedical domain. To offer on-demand heterogenous biomedical concept identification, we developed BeCAS, a web application, service and widget. We also tackled relation mining by developing TrigNER, a machine-learning-based solution for biomedical event trigger recognition, which applies an automatic algorithm to obtain the best linguistic features and model parameters for each event type. Finally, in order to assist biocurators, Egas was developed to support rapid, interactive and real-time collaborative curation of biomedical documents, through manual and automatic in-line annotation of concepts and relations. Overall, the research work presented in this thesis contributed to a more accurate update of current biomedical knowledge bases, towards improved hypothesis generation and knowledge discovery.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Nowadays, communication environments are already characterized by a myriad of competing and complementary technologies that aim to provide an ubiquitous connectivity service. Next Generation Networks need to hide this heterogeneity by providing a new abstraction level, while simultaneously be aware of the underlying technologies to deliver richer service experiences to the end-user. Moreover, the increasing interest for group-based multimedia services followed by their ever growing resource demands and network dynamics, has been boosting the research towards more scalable and exible network control approaches. The work developed in this Thesis enables such abstraction and exploits the prevailing heterogeneity in favor of a context-aware network management and adaptation. In this scope, we introduce a novel hierarchical control framework with self-management capabilities that enables the concept of Abstract Multiparty Trees (AMTs) to ease the control of multiparty content distribution throughout heterogeneous networks. A thorough evaluation of the proposed multiparty transport control framework was performed in the scope of this Thesis, assessing its bene ts in terms of network selection, delivery tree recon guration and resource savings. Moreover, we developed an analytical study to highlight the scalability of the AMT concept as well as its exibility in large scale networks and group sizes. To prove the feasibility and easy deployment characteristic of the proposed control framework, we implemented a proof-of-concept demonstrator that comprehends the main control procedures conceptually introduced. Its outcomes highlight a good performance of the multiparty content distribution tree control, including its local and global recon guration. In order to endow the AMT concept with the ability to guarantee the best service experience by the end-user, we integrate in the control framework two additional QoE enhancement approaches. The rst employs the concept of Network Coding to improve the robustness of the multiparty content delivery, aiming at mitigating the impact of possible packet losses in the end-user service perception. The second approach relies on a machine learning scheme to autonomously determine at each node the expected QoE towards a certain destination. This knowledge is then used by di erent QoE-aware network management schemes that, jointly, maximize the overall users' QoE. The performance and scalability of the control procedures developed, aided by the context and QoE-aware mechanisms, show the advantages of the AMT concept and the proposed hierarchical control strategy for the multiparty content distribution with enhanced service experience. Moreover we also prove the feasibility of the solution in a practical environment, and provide future research directions that bene t the evolved control framework and make it commercially feasible.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI-Records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measure. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high quality information with a broad coverage for classification of German scientific articles.