960 results for least common subgraph algorithm
Abstract:
Feature selection is important in the medical field for many reasons. However, selecting important variables is difficult in the presence of censoring, a feature unique to survival data analysis. This paper proposed an approach that deals with the censoring problem in endovascular aortic repair survival data through Bayesian networks, merged and embedded with a hybrid feature selection process that combines Cox's univariate analysis with machine learning approaches, such as ensembles of artificial neural networks, to select the most relevant predictive variables. The proposed algorithm was compared with common survival variable selection approaches, namely the least absolute shrinkage and selection operator (LASSO) and Akaike information criterion (AIC) methods. The results showed that it was capable of dealing with high censoring in the datasets. Moreover, ensemble classifiers increased the area under the ROC curves for the two datasets, collected separately at two centers in the United Kingdom. Furthermore, ensembles constructed with center 1 enhanced the concordance index of center 2 prediction compared with the model built with a single network. Although the final reduced model obtained with the neural networks and their ensembles is larger than those of the other methods, it outperformed them in both concordance index and sensitivity for center 2 prediction, indicating that the reduced model is more powerful for cross-center prediction.
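A minimal sketch of the kind of two-stage selection described above, assuming a pandas DataFrame with hypothetical 'time' and 'event' columns; the p-value threshold, network size, and ensemble size are illustrative choices, not the paper's actual pipeline.

# Hypothetical two-stage selection: univariate Cox screening with lifelines,
# then a bagged ensemble of small neural networks on the retained variables.
# Column names ('time', 'event') and all thresholds are assumptions.
import numpy as np
from lifelines import CoxPHFitter
from sklearn.neural_network import MLPClassifier

def univariate_cox_screen(df, candidates, p_threshold=0.1):
    """Keep variables whose univariate Cox p-value falls below the threshold."""
    kept = []
    for var in candidates:
        cph = CoxPHFitter()
        cph.fit(df[[var, "time", "event"]], duration_col="time", event_col="event")
        if cph.summary.loc[var, "p"] < p_threshold:
            kept.append(var)
    return kept

def bagged_mlp_ensemble(X, y, n_models=10, seed=0):
    """Train MLPs on bootstrap resamples (X, y are numpy arrays); average probabilities."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))
        m = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000)
        m.fit(X[idx], y[idx])
        models.append(m)
    return lambda X_new: np.mean([m.predict_proba(X_new)[:, 1] for m in models], axis=0)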
Abstract:
Lifelong surveillance is not cost-effective after endovascular aneurysm repair (EVAR), but is required to detect aortic complications which are fatal if untreated (type 1/3 endoleak, sac expansion, device migration). Aneurysm morphology determines the probability of aortic complications and therefore the need for surveillance, but existing analyses have proven incapable of identifying patients at sufficiently low risk to justify abandoning surveillance. This study aimed to improve the prediction of aortic complications, through the application of machine-learning techniques. Patients undergoing EVAR at 2 centres were studied from 2004–2010. Aneurysm morphology had previously been studied to derive the SGVI Score for predicting aortic complications. Bayesian Neural Networks were designed using the same data, to dichotomise patients into groups at low or high risk of aortic complications. Network training was performed only on patients treated at centre 1. External validation was performed by assessing network performance independently of network training, on patients treated at centre 2. Discrimination was assessed by Kaplan-Meier analysis to compare aortic complications in predicted low-risk versus predicted high-risk patients. 761 patients aged 75 ± 7 years underwent EVAR in 2 centres. Mean follow-up was 36 ± 20 months. Neural networks were created incorporating neck angulation/length/diameter/volume; AAA diameter/area/volume/length/tortuosity; and common iliac tortuosity/diameter. A 19-feature network predicted aortic complications with excellent discrimination and external validation (5-year freedom from aortic complications in predicted low-risk vs predicted high-risk patients: 97.9% vs. 63%; p < 0.0001). A Bayesian Neural-Network algorithm can identify patients in whom it may be safe to abandon surveillance after EVAR. This proposal requires prospective study.
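The discrimination assessment described here can be reproduced generically with a Kaplan-Meier comparison of the two predicted risk groups; the sketch below assumes hypothetical arrays of follow-up times (in months), event indicators, and a binary network prediction, and uses a log-rank test rather than the authors' exact analysis.

# Kaplan-Meier comparison of predicted low-risk vs high-risk groups.
# 'times', 'events', and 'predicted_high_risk' are hypothetical inputs.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

def compare_risk_groups(times, events, predicted_high_risk):
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    hi = np.asarray(predicted_high_risk, dtype=bool)

    km_low, km_high = KaplanMeierFitter(), KaplanMeierFitter()
    km_low.fit(times[~hi], event_observed=events[~hi], label="predicted low risk")
    km_high.fit(times[hi], event_observed=events[hi], label="predicted high risk")

    # Log-rank test for a difference in complication-free survival.
    result = logrank_test(times[~hi], times[hi],
                          event_observed_A=events[~hi],
                          event_observed_B=events[hi])
    # 5-year (60-month) freedom from complications in each group, plus the p-value.
    return km_low.predict(60.0), km_high.predict(60.0), result.p_value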
Abstract:
Traffic incidents are non-recurring events that can cause a temporary reduction in roadway capacity. They have been recognized as a major contributor to traffic congestion on our nation’s highway systems. To alleviate their impacts on capacity, automatic incident detection (AID) has been applied as an incident management strategy to reduce the total incident duration. AID relies on an algorithm to identify the occurrence of incidents by analyzing real-time traffic data collected from surveillance detectors. Significant research has been performed to develop AID algorithms for incident detection on freeways; however, similar research on major arterial streets remains largely at the initial stage of development and testing. This dissertation research aims to identify design strategies for the deployment of an Artificial Neural Network (ANN) based AID algorithm for major arterial streets. A section of the US-1 corridor in Miami-Dade County, Florida was coded in the CORSIM microscopic simulation model to generate data for both model calibration and validation. To better capture the relationship between the traffic data and the corresponding incident status, Discrete Wavelet Transform (DWT) and data normalization were applied to the simulated data. Multiple ANN models were then developed for different detector configurations, historical data usage, and the selection of traffic flow parameters. To assess the performance of different design alternatives, the model outputs were compared based on both detection rate (DR) and false alarm rate (FAR). The results show that the best models were able to achieve a high DR of between 90% and 95%, a mean time to detect (MTTD) of 55-85 seconds, and a FAR below 4%. The results also show that a detector configuration including only the mid-block and upstream detectors performs almost as well as one that also includes a downstream detector. In addition, DWT was found to be able to improve model performance, and the use of historical data from previous time cycles improved the detection rate. Speed was found to have the most significant impact on the detection rate, while volume was found to contribute the least. The results from this research provide useful insights on the design of AID for arterial street applications.
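A small sketch of the wavelet preprocessing step and the two evaluation metrics, using PyWavelets; the wavelet family, decomposition level, and interval-based definitions of DR and FAR are assumptions for illustration, not the dissertation's exact specification.

# DWT features for a traffic time series, plus simple interval-based
# detection rate (DR) and false alarm rate (FAR). 'db4' and level=3 are assumed.
import numpy as np
import pywt

def dwt_features(signal, wavelet="db4", level=3):
    """Concatenate approximation and detail coefficients into one feature vector."""
    coeffs = pywt.wavedec(np.asarray(signal, dtype=float), wavelet, level=level)
    return np.concatenate(coeffs)

def detection_metrics(predicted, actual):
    """predicted/actual: boolean arrays of incident status per time interval."""
    predicted = np.asarray(predicted, dtype=bool)
    actual = np.asarray(actual, dtype=bool)
    detected = np.logical_and(predicted, actual).sum()
    false_alarms = np.logical_and(predicted, ~actual).sum()
    dr = detected / max(actual.sum(), 1)          # share of incident intervals flagged
    far = false_alarms / max((~actual).sum(), 1)  # share of normal intervals flagged
    return dr, far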
Abstract:
This thesis describes the development of an adaptive control algorithm for Computerized Numerical Control (CNC) machines implemented in a multi-axis motion control board based on the TMS320C31 DSP chip. The adaptive process involves two stages: Plant Modeling and Inverse Control Application. The first stage builds a non-recursive model of the CNC system (plant) using the Least-Mean-Square (LMS) algorithm. The second stage consists of the definition of a recursive structure (the controller) that implements an inverse model of the plant by using the coefficients of the model in an algorithm called Forward-Time Calculation (FTC). In this way, when the inverse controller is implemented in series with the plant, it will pre-compensate for the modification that the original plant introduces in the input signal. The performance of this solution was verified at three different levels: Software simulation, implementation in a set of isolated motor-encoder pairs and implementation in a real CNC machine. The use of the adaptive inverse controller effectively improved the step response of the system in all three levels. In the simulation, an ideal response was obtained. In the motor-encoder test, the rise time was reduced by as much as 80%, without overshoot, in some cases. Even with the larger mass of the actual CNC machine, decrease of the rise time and elimination of the overshoot were obtained in most cases. These results lead to the conclusion that the adaptive inverse controller is a viable approach to position control in CNC machinery.
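The plant-modeling stage is standard LMS identification of a non-recursive (FIR) model; a minimal numpy sketch is shown below, with the filter length, step size, and signal names chosen for illustration rather than taken from the thesis.

# Least-Mean-Square identification of an FIR plant model: the coefficient
# vector w is adapted so that w . x approximates the plant output d.
import numpy as np

def lms_identify(x, d, n_taps=32, mu=0.01):
    """x: plant input samples, d: measured plant output; returns FIR coefficients."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(n_taps)
    for n in range(n_taps, len(x)):
        x_vec = x[n - n_taps:n][::-1]   # most recent sample first
        y = w @ x_vec                   # model output
        e = d[n] - y                    # modeling error
        w += 2.0 * mu * e * x_vec       # LMS coefficient update
    return w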
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, even though uncertainty quantification remains essential in the sciences, where the number of parameters to estimate often exceeds the sample size despite the huge increases in n typically seen in many fields. The tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n = all" is therefore of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and give a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and in other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel, but comparatively little attention has been paid to convergence and estimation error in the resulting approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
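As a concrete point of reference for the probit case, the truncated-normal (Albert-Chib) data augmentation Gibbs sampler can be sketched in a few lines; the Gaussian prior scale and iteration count below are illustrative assumptions, not the settings analyzed in Chapter 7.

# Truncated-normal data augmentation Gibbs sampler for Bayesian probit
# regression (Albert & Chib). Prior: beta ~ N(0, tau^2 I); tau, n_iter assumed.
import numpy as np
from scipy.stats import truncnorm

def probit_da_gibbs(X, y, n_iter=2000, tau=10.0, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    V = np.linalg.inv(X.T @ X + np.eye(p) / tau**2)   # covariance of beta | z
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        mu = X @ beta
        # z_i ~ N(mu_i, 1) truncated to (0, inf) if y_i = 1, else (-inf, 0)
        lower = np.where(y == 1, -mu, -np.inf)
        upper = np.where(y == 1, np.inf, -mu)
        z = truncnorm.rvs(lower, upper, loc=mu, scale=1.0, random_state=rng)
        # beta | z ~ N(V X'z, V)
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws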
Abstract:
The convex hull describes the extent or shape of a set of data and is used ubiquitously in computational geometry. Common algorithms to construct the convex hull of a finite set of n points (x, y) range from O(n log n) time to O(n) time. However, a heuristic procedure is often applied first to reduce the original set of n points to a set of s < n points that still contains the hull, accelerating the final hull-finding procedure. We present an algorithm to precondition data before building a 2D convex hull with integer coordinates, with three distinct advantages. First, for all practical purposes, it is linear; second, no explicit sorting of data is required; and third, the reduced set of s points forms an ordered set that can be pipelined directly into an O(n) time convex hull algorithm. Under these criteria a fast (or O(n)) preconditioner in principle creates a fast convex hull (approximately O(n)) for an arbitrary set of points. The paper empirically evaluates and quantifies the acceleration generated by the method against the most common convex hull algorithms. Experiments on a dataset show an additional speed-up of at least four times compared with existing preconditioning methods.
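One widely used preconditioning idea, shown here only as a point of comparison and not as the paper's algorithm, is the Akl-Toussaint throw-away heuristic: discard in linear time every point strictly inside the quadrilateral formed by the extreme points in x and y, then run any standard hull algorithm (here Andrew's monotone chain, O(s log s)) on the survivors.

# Akl-Toussaint style preconditioning followed by Andrew's monotone chain.
# Illustrates the general "reduce, then hull" idea, not the paper's method.
from math import atan2

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def akl_toussaint_filter(points):
    """Drop points strictly inside the convex quadrilateral of x/y extremes."""
    quad = {min(points), max(points),
            min(points, key=lambda p: p[1]), max(points, key=lambda p: p[1])}
    cx = sum(p[0] for p in quad) / len(quad)
    cy = sum(p[1] for p in quad) / len(quad)
    quad = sorted(quad, key=lambda p: atan2(p[1] - cy, p[0] - cx))  # counter-clockwise
    def inside(p):
        return all(cross(quad[i], quad[(i + 1) % len(quad)], p) > 0
                   for i in range(len(quad)))
    return [p for p in points if not inside(p)]

def convex_hull(points):
    """Monotone chain on the reduced point set; returns hull vertices counter-clockwise."""
    pts = sorted(set(akl_toussaint_filter(points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]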
Abstract:
This theoretical paper attempts to define some of the key components and challenges involved in creating embodied conversational agents that can be genuinely interesting conversational partners. Wittgenstein's argument concerning talking lions emphasizes the importance of a shared common ground as a basis for conversational interactions. Virtual bats suggest that, for some people at least, it is important that there be a feeling of authenticity concerning a subjectively experiencing entity that can convey what it is like to be that entity. Electric sheep remind us of the importance of empathy in human conversational interaction, and that we should provide a full communicative repertoire of both verbal and non-verbal components if we are to create genuinely engaging interactions; we may be making the task harder rather than easier if we leave out non-verbal aspects of communication. Finally, analogical peacocks highlight the importance of between-minds alignment and establish the longer-term goal of being interesting, creative, and humorous if an embodied conversational agent is to be a truly engaging conversational partner. Some potential directions and solutions for addressing these issues are suggested.
Abstract:
This paper formulates a linear kernel support vector machine (SVM) as a regularized least-squares (RLS) problem. By defining a set of indicator variables for the errors, the solution to the RLS problem is represented as an equation that relates the error vector to the indicator variables. Through partitioning the training set, the SVM weights and bias are expressed analytically using the support vectors. It is also shown how this approach naturally extends to SVMs with nonlinear kernels while avoiding the need for Lagrange multipliers and duality theory. A fast iterative solution algorithm based on Cholesky decomposition with permutation of the support vectors is suggested as a solution method. The properties of our SVM formulation are analyzed and compared with standard SVMs using a simple example that can be illustrated graphically. The correctness and behavior of our solution (derived purely in the primal RLS context) are demonstrated using a set of public benchmarking problems for both linear and nonlinear SVMs.
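For comparison with the formulation above, the closely related least-squares SVM of Suykens and Vandewalle reduces training to a single linear system; the numpy sketch below (with an RBF kernel and illustrative hyperparameters) shows that variant, not the paper's Cholesky-with-permutation solver.

# Least-squares SVM trained by solving one linear system:
#   [ 0   y^T         ] [b]       [0]
#   [ y   Omega + I/C ] [alpha] = [1]
# with Omega_ij = y_i * y_j * K(x_i, x_j). Kernel, gamma, and C are assumed.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lssvm_train(X, y, C=10.0, gamma=1.0):
    """X: (n, d) array, y: labels in {-1, +1}; returns a predict function."""
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(n))))
    b, alpha = sol[0], sol[1:]
    def predict(X_new):
        return np.sign(rbf_kernel(X_new, X, gamma) @ (alpha * y) + b)
    return predict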
Abstract:
The incidence of melanoma has increased rapidly over the past 30 years, and the disease is now the sixth most common cancer among men and women in the U.K. Many patients are diagnosed with or develop metastatic disease, and survival is substantially reduced in these patients. Mutations in the BRAF gene have been identified as key drivers of melanoma cells and are found in around 50% of cutaneous melanomas. Vemurafenib (Zelboraf®; Roche Molecular Systems Inc., Pleasanton, CA, U.S.A.) is the first licensed inhibitor of mutated BRAF, and offers a new first-line option for patients with unresectable or metastatic melanoma who harbour BRAF mutations. Vemurafenib was developed in conjunction with a companion diagnostic, the cobas® 4800 BRAF V600 Mutation Test. The purpose of this paper is to make evidence-based recommendations to facilitate the implementation of BRAF mutation testing and targeted therapy in patients with metastatic melanoma in the U.K. The recommendations are the result of a meeting of an expert panel and have been reviewed by melanoma specialists and representatives of the National Cancer Research Network Clinical Study Group on behalf of the wider melanoma community. This article is intended to be a starting point for practical advice and recommendations, which will no doubt be updated as we gain further experience in personalizing therapy for patients with melanoma.
Abstract:
The evolution of wireless communication systems leads to Dynamic Spectrum Allocation for Cognitive Radio, which requires reliable spectrum sensing techniques. Among the spectrum sensing methods proposed in the literature, those that exploit cyclostationary characteristics of radio signals are particularly suitable for communication environments with low signal-to-noise ratios, or with non-stationary noise. However, such methods have high computational complexity that directly raises the power consumption of devices which often have very stringent low-power requirements. We propose a strategy for cyclostationary spectrum sensing with reduced energy consumption. This strategy is based on the principle that p processors working at slower frequencies consume less power than a single processor for the same execution time. We devise a strict relation between the energy savings and common parallel system metrics. The results of simulations show that our strategy promises very significant savings in actual devices.
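As a rough numerical illustration of the underlying principle (not the relation derived in the paper), assume dynamic power scales roughly as f^3 when supply voltage is scaled with frequency and that p processors running at f/p finish the sensing task in the same wall-clock time; the energy then falls as 1/p^2.

# Toy model of the parallel low-frequency argument. The cubic power law and
# perfect parallel speedup are stated assumptions, not the paper's model.
def energy_ratio(p, alpha=3.0):
    """Energy of p cores at f/p relative to one core at f, same execution time.
    E_single ~ f**alpha * T ; E_parallel ~ p * (f/p)**alpha * T."""
    return p * (1.0 / p) ** alpha   # equals p**(1 - alpha); 1/p**2 when alpha = 3

if __name__ == "__main__":
    for p in (1, 2, 4, 8):
        print(p, energy_ratio(p))   # 1.0, 0.25, 0.0625, 0.015625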
Abstract:
We surveyed macroinvertebrate communities in 31 hill streams in the Vouga River and Mondego River catchments in central Portugal. Despite applying a "least-impacted" criterion, channel and bank management was common, with 38% of streams showing channel modification (damming) and 80% showing evidence of bank modification. Principal component analysis (PCA) at the family and species level related the macroinvertebrates to habitat variables derived at three spatial scales: site (20 m), reach (200 m), and catchment. Variation in community structure between sites was similar at the species and family level and was statistically related to pH, conductivity, temperature, flow, shade, and substrate size at the site scale; channel and bank habitat, riparian vegetation, and land use at the reach scale; and altitude and slope at the catchment scale. While the effects of river management were apparent in various ecologically important habitat features at the site and reach scale, a direct relationship with macroinvertebrate assemblages was apparent only between the extent of walled banks and the secondary PCA axis described by the species data. The strong relationship between catchment-scale variables and descriptors of physical structure at the reach and site scale suggests that catchment-scale parameters are valuable predictors of macroinvertebrate community structure in these streams despite the anthropogenic modifications of the natural habitat.
Abstract:
Close similarities have been found between the otoliths of sea-caught and laboratory-reared larvae of the common sole Solea solea (L.), given appropriate temperatures and nourishment of the latter. However, from hatching to mouth formation, and during metamorphosis, sole otoliths have proven difficult to read because the increments may be less regular and of low contrast. In this study, the growth increments in otoliths of larvae reared at 12 °C were counted by light microscopy to test the hypothesis of daily deposition, with some results verified using scanning electron microscopy (SEM), and by image analysis, in order to compare the reliability of the 2 methods in age estimation. Age was first estimated (in days posthatch) from light micrographs of whole mounted otoliths. Counts were initiated from the increment formed at the time of mouth opening (Day 4). The average incremental deposition rate was consistent with the daily hypothesis. However, the light-micrograph readings tended to underestimate the mean ages of the larvae. Errors were probably associated with the low-contrast increments: those deposited after mouth formation during the transition to first feeding, and those deposited from the onset of eye migration (about 20 d posthatch) during metamorphosis. SEM failed to resolve these low-contrast areas accurately because of poor etching. A method using image analysis was applied to a subsample of micrograph-counted otoliths. The image analysis was supported by a pattern-recognition algorithm, the Growth Demodulation Algorithm (GDA). On each otolith, the GDA integrated the growth pattern of these larval otoliths with averaged data from different radial profiles, in order to demodulate the exponential trend of the signal before spectral analysis (Fast Fourier Transform, FFT). This second method allowed both more precise designation of increments, particularly in low-contrast areas, and more accurate readings, but increased the error in mean age estimation. This variability is probably due to a still rough perception of otolith increments by the GDA method, counting being achieved through a theoretical exponential pattern and mean estimates being given by the FFT. Although this error variability was greater than expected, the method offers improvements in both speed and accuracy of otolith readings.
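A minimal illustration of the demodulate-then-FFT idea behind the GDA, assuming a single radial intensity profile sampled at equal steps; the log-linear trend fit and peak picking below are a simplification of the published algorithm, which averages several radial profiles.

# Remove an exponential growth trend from a radial otolith intensity profile,
# then estimate the dominant increment frequency by FFT. Illustrative only.
import numpy as np

def estimate_increment_count(profile):
    """profile: intensity values sampled at equal radial steps from core to edge."""
    intensity = np.asarray(profile, dtype=float)
    r = np.arange(len(intensity), dtype=float)

    # Fit and divide out an exponential trend: log(intensity) ~ a + b * r.
    b, a = np.polyfit(r, np.log(np.clip(intensity, 1e-9, None)), 1)
    demodulated = intensity / np.exp(a + b * r)

    # Dominant periodicity of the demodulated signal (skipping the DC term).
    spectrum = np.abs(np.fft.rfft(demodulated - demodulated.mean()))
    freqs = np.fft.rfftfreq(len(demodulated), d=1.0)
    k = 1 + int(np.argmax(spectrum[1:]))
    return freqs[k] * len(demodulated)   # approximate number of increments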