807 results for Machine Learning, hepatocellular malignancies, HCC, MVI
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Background and aim: E-cadherin binds to beta-catenin to form the cadherin/catenin complex required for strong cell adhesion. Inactivation of this complex in tumors facilitates invasion into surrounding tissues. Alterations of both proteins have been reported in hepatocellular carcinomas (HCC). However, the interactions between E-cadherin and beta-catenin in HCC from different geographical groups have not been explored. The aim of the present study was to assess the role of E-cadherin and beta-catenin in Australian and South African patients with HCC. Methods: DNA was extracted from malignant and non-malignant liver tissue from 37 Australian and 24 South African patients, and from histologically normal liver from 20 transplant donors. Chromosomal instability at 16q22, E-cadherin promoter methylation, beta-catenin mutations, and E-cadherin and beta-catenin protein expression were assessed using loss of heterozygosity analysis, methylation-specific polymerase chain reaction, denaturing high-performance liquid chromatography, and immunohistochemistry, respectively. Results: Loss of heterozygosity at 16q22 was prevalent in South African HCC patients (50% vs 11%; P < 0.05, chi(2)). In contrast, E-cadherin promoter hypermethylation was common in Australian cases in both malignant (30% vs 13%; P = not significant, chi(2)) and non-malignant liver (57% vs 8%, respectively; P < 0.001, chi(2)). Methylation of non-malignant liver was more likely to be detected in patients over the age of 50 years (P < 0.001, chi(2)), the overall mean age of our cohort. Only one beta-catenin mutation was identified. E-cadherin protein expression was reduced in one HCC, while no abnormalities in beta-catenin protein expression were observed. Conclusion: Contrary to previous observations in HCC from other countries, neither E-cadherin nor beta-catenin appears to play a role in hepatocarcinogenesis in Australian and South African patients with HCC. (C) 2004 Blackwell Publishing Asia Pty Ltd.
Abstract:
Foreign exchange trading has emerged recently as a significant activity in many countries. As with most forms of trading, the activity is influenced by many random parameters, so a system that effectively emulates the trading process would be very helpful. A major issue for traders in the deregulated Foreign Exchange Market is when to sell and when to buy a particular currency in order to maximize profit. This paper presents novel trading strategies based on the machine learning methods of genetic algorithms and reinforcement learning.
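The abstract does not spell out the method, but a reinforcement-learning buy/hold decision can be sketched with a toy tabular Q-learner; the state encoding (direction of the last price move), the reward, and every name below are illustrative assumptions, not details from the paper:

```python
import random

def q_learning_trader(prices, episodes=200, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Toy tabular Q-learning: state = direction of the last price move
    (0 = down, 1 = up); actions: 0 = hold, 1 = go long for one step."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(episodes):
        for t in range(1, len(prices) - 1):
            state = int(prices[t] > prices[t - 1])
            # epsilon-greedy action selection
            if rng.random() < eps:
                action = rng.choice((0, 1))
            else:
                action = max((0, 1), key=lambda a: Q[(state, a)])
            reward = prices[t + 1] - prices[t] if action == 1 else 0.0
            nxt = int(prices[t + 1] > prices[t])
            # standard Q-learning temporal-difference update
            Q[(state, action)] += alpha * (
                reward + gamma * max(Q[(nxt, a)] for a in (0, 1)) - Q[(state, action)])
    return Q

# On a steadily rising toy series, buying in an up-trend should come to
# look better than holding.
Q = q_learning_trader([1.0, 1.1, 1.2, 1.3, 1.4, 1.5])
```

A genetic-algorithm variant would instead evolve a population of candidate trading rules and select on realized profit.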
Abstract:
Objective: Inpatient length of stay (LOS) is an important measure of hospital activity, health care resource consumption, and patient acuity. This research aims at developing an incremental expectation maximization (EM) based learning approach on a mixture of experts (ME) system for on-line prediction of LOS. The use of a batch-mode learning process in most existing artificial neural networks to predict LOS is unrealistic, as the data become available over time and their patterns change dynamically. In contrast, an on-line process is capable of providing an output whenever a new datum becomes available. This on-the-spot information is therefore more useful and practical for making decisions, especially when one deals with a tremendous amount of data. Methods and material: The proposed approach is illustrated using a real example of gastroenteritis LOS data. The data set was extracted from a retrospective cohort study of all infants born in 1995-1997 and their subsequent admissions for gastroenteritis. The total number of admissions in this data set was n = 692. Linked hospitalization records of the cohort were retrieved retrospectively to derive the outcome measure, patient demographics, and associated co-morbidity information. A comparative study of the incremental learning and the batch-mode learning algorithms is considered. The performances of the learning algorithms are compared based on the mean absolute difference (MAD) between the predictions and the actual LOS, and the proportion of predictions with MAD < 1 day (Prop(MAD < 1)). The significance of the comparison is assessed through a regression analysis. Results: The incremental learning algorithm provides better on-line prediction of LOS once the system has gained sufficient training from more examples (MAD = 1.77 days and Prop(MAD < 1) = 54.3%), compared with batch-mode learning.
The regression analysis indicates a significant decrease of MAD (p-value = 0.063) and a significant (p-value = 0.044) increase of Prop(MAD < 1).
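The mixture-of-experts EM machinery is involved, but the incremental-versus-batch distinction the authors exploit can be shown with the simplest possible on-line estimator: a running mean that absorbs each new LOS value in O(1) without revisiting the batch (a toy sketch of the idea, not the authors' system; the data are invented):

```python
def online_mean(stream):
    """Incrementally update a mean as data arrive, without storing the batch."""
    n, mean = 0, 0.0
    estimates = []
    for x in stream:
        n += 1
        mean += (x - mean) / n   # incremental update: no pass over old data
        estimates.append(mean)   # a prediction is available after every datum
    return estimates

# toy LOS values in days; each arrival immediately refines the estimate
los_days = [2, 3, 1, 4, 2, 3]
running = online_mean(los_days)
```

A batch-mode estimator would instead recompute over all n records at every arrival, which is what the abstract argues is impractical as data accumulate.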
Abstract:
Objectives: To systematically review radiofrequency ablation (RFA) for treating liver tumors. Data Sources: Databases were searched in July 2003. Study Selection: Studies comparing RFA with other therapies for hepatocellular carcinoma (HCC) and colorectal liver metastases (CLM), plus selected case series for CLM. Data Extraction: One researcher used standardized data extraction tables developed before the study, and these were checked by a second researcher. Data Synthesis: For HCC, 13 comparative studies were included, 4 of which were randomized, controlled trials. For CLM, 13 studies were included: 2 nonrandomized comparative studies and 11 case series. There did not seem to be any distinct differences in the complication rates between RFA and any of the other procedures for treatment of HCC. The local recurrence rate at 2 years showed a statistically significant benefit for RFA over percutaneous ethanol injection for treatment of HCC (6% vs 26%, 1 randomized, controlled trial). Local recurrence was reported to be more common after RFA than after laser-induced thermotherapy, and a higher recurrence rate and a shorter time to recurrence were associated with RFA compared with surgical resection (1 nonrandomized study each). For CLM, the postoperative complication rate ranged from 0% to 33% (3 case series). Survival after diagnosis was shorter in the CLM group treated with RFA than in the surgical resection group (1 nonrandomized study). The CLM local recurrence rate after RFA ranged from 4% to 55% (6 case series). Conclusions: Radiofrequency ablation may be more effective than other treatments in terms of less recurrence of HCC and may be as safe, although the evidence is scant. There was not enough evidence to determine the safety or efficacy of RFA for treatment of CLM.
Abstract:
Learning from mistakes has proven to be an effective way of learning in interactive document classification. In this paper we propose an approach to learning effectively from mistakes in the email filtering process. Our system employs both the SVM and Winnow machine learning algorithms to learn from misclassified email documents and refine the email filtering process accordingly. Our experiments have shown that the training of an email filter becomes much more effective and faster.
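Of the two algorithms, Winnow makes the "learning from mistakes" idea explicit: weights are multiplicatively updated only on misclassified examples. A minimal sketch with invented toy features (presence of two keywords), not the paper's actual feature set:

```python
def winnow_train(examples, alpha=2.0, epochs=10):
    """Mistake-driven Winnow: weights change only on misclassified examples.
    Each example is (binary feature vector, label) with label 1 = spam, 0 = ham."""
    n = len(examples[0][0])
    theta = n / 2            # classification threshold
    w = [1.0] * n
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            pred = int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)
            if pred != y:    # learn from the mistake
                mistakes += 1
                # promote active features on false negatives, demote on false positives
                factor = alpha if y == 1 else 1.0 / alpha
                w = [wi * (factor if xi else 1.0) for wi, xi in zip(w, x)]
        if mistakes == 0:    # converged: a whole pass with no mistakes
            break
    return w, theta

# toy "emails": feature 0 = contains "viagra", feature 1 = contains "meeting"
data = [([1, 0], 1), ([0, 1], 0), ([1, 1], 1), ([0, 0], 0)]
w, theta = winnow_train(data)
```

The multiplicative updates give Winnow mistake bounds that scale well when only a few of many features are relevant, which suits sparse keyword features in email.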
Abstract:
Automatic ontology building is a vital issue in many fields where ontologies are currently built manually. This paper presents a user-centred methodology for ontology construction based on the use of Machine Learning and Natural Language Processing. In our approach, the user selects a corpus of texts and sketches a preliminary ontology (or selects an existing one) for a domain, with a preliminary vocabulary associated to the elements in the ontology (lexicalisations). Examples of sentences involving such lexicalisations (e.g. of the ISA relation) in the corpus are automatically retrieved by the system. Retrieved examples are validated by the user and used by an adaptive Information Extraction system to generate patterns that discover other lexicalisations of the same objects in the ontology, possibly identifying new concepts or relations. New instances are added to the existing ontology or used to tune it. This process is repeated until a satisfactory ontology is obtained. The methodology largely automates the ontology construction process, and the output is an ontology with an associated trained learner to be used for further ontology modifications.
Abstract:
The Vapnik-Chervonenkis (VC) dimension is a combinatorial measure of a certain class of machine learning problems, which may be used to obtain upper and lower bounds on the number of training examples needed to learn to prescribed levels of accuracy. Most of the known bounds apply to the Probably Approximately Correct (PAC) framework, which is the framework within which we work in this paper. For a learning problem with some known VC dimension, much is known about the order of growth of the sample-size requirement of the problem, as a function of the PAC parameters. The exact sample-size requirement is, however, less well known, and depends heavily on the particular learning algorithm being used. This is a major obstacle to the practical application of the VC dimension. Hence it is important to know exactly how the sample-size requirement depends on the VC dimension, and with that in mind, we describe a general algorithm for learning problems having VC dimension 1. Its sample-size requirement is minimal (as a function of the PAC parameters), and turns out to be the same for all non-trivial learning problems having VC dimension 1. While the method used cannot be naively generalised to higher VC dimension, it suggests that optimal algorithm-dependent bounds may improve substantially on current upper bounds.
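Such bounds can be evaluated directly. As an illustration, here is one classical sufficient PAC sample size in terms of VC dimension (a Blumer-et-al.-style bound; the exact constants vary between texts, so treat the numbers as indicative only, not as the tight algorithm-dependent requirement the abstract is concerned with):

```python
from math import log2, ceil

def pac_sample_bound(d, eps, delta):
    """A classical *sufficient* sample size for PAC learning a class of
    VC dimension d to error eps with confidence 1 - delta.
    Constants follow one common textbook statement and vary between sources."""
    return ceil((4 * log2(2 / delta) + 8 * d * log2(13 / eps)) / eps)

# e.g. VC dimension 1, 10% error, 95% confidence
m = pac_sample_bound(d=1, eps=0.1, delta=0.05)
```

The bound grows linearly in d and roughly as (1/eps) log(1/eps), which is the "order of growth" the abstract refers to; the paper's point is that the exact constant for a given algorithm can be far smaller.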
Abstract:
A theoretical model is presented which describes selection in a genetic algorithm (GA) under a stochastic fitness measure and correctly accounts for finite population effects. Although this model describes a number of selection schemes, we only consider Boltzmann selection in detail here as results for this form of selection are particularly transparent when fitness is corrupted by additive Gaussian noise. Finite population effects are shown to be of fundamental importance in this case, as the noise has no effect in the infinite population limit. In the limit of weak selection we show how the effects of any Gaussian noise can be removed by increasing the population size appropriately. The theory is tested on two closely related problems: the one-max problem corrupted by Gaussian noise and generalization in a perceptron with binary weights. The averaged dynamics can be accurately modelled for both problems using a formalism which describes the dynamics of the GA using methods from statistical mechanics. The second problem is a simple example of a learning problem and by considering this problem we show how the accurate characterization of noise in the fitness evaluation may be relevant in machine learning. The training error (negative fitness) is the number of misclassified training examples in a batch and can be considered as a noisy version of the generalization error if an independent batch is used for each evaluation. The noise is due to the finite batch size and in the limit of large problem size and weak selection we show how the effect of this noise can be removed by increasing the population size. This allows the optimal batch size to be determined, which minimizes computation time as well as the total number of training examples required.
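For concreteness, Boltzmann selection draws individual i with probability proportional to exp(beta * f_i); combined with the noise-corrupted one-max fitness from the abstract, a sketch looks like this (the population, noise level sigma, and beta are arbitrary illustrative choices):

```python
import math
import random

def boltzmann_select(population, fitnesses, beta, rng):
    """Boltzmann selection: pick individual i with probability
    proportional to exp(beta * f_i)."""
    weights = [math.exp(beta * f) for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]

def noisy_onemax(bits, sigma, rng):
    """One-max fitness (count of 1-bits) corrupted by additive Gaussian noise."""
    return sum(bits) + rng.gauss(0.0, sigma)

rng = random.Random(0)
pop = [(0, 0, 0), (1, 0, 1), (1, 1, 1)]
picks = []
for _ in range(500):
    # fitness is re-evaluated (freshly corrupted by noise) at every selection
    fits = [noisy_onemax(b, sigma=0.5, rng=rng) for b in pop]
    picks.append(boltzmann_select(pop, fits, beta=2.0, rng=rng))
```

Raising beta sharpens selection pressure; the abstract's finite-population effects appear because noise in f_i perturbs these selection probabilities differently in each finite draw.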
Abstract:
In this paper we introduce and illustrate non-trivial upper and lower bounds on the learning curves for one-dimensional Gaussian Processes. The analysis is carried out emphasising the effects induced on the bounds by the smoothness of the random process described by the Modified Bessel and the Squared Exponential covariance functions. We present an explanation of the early, linearly-decreasing behavior of the learning curves and the bounds as well as a study of the asymptotic behavior of the curves. The effects of the noise level and the lengthscale on the tightness of the bounds are also discussed.
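For reference, the Squared Exponential covariance has the standard form k(x, x') = v * exp(-(x - x')^2 / (2 * l^2)); a short sketch of evaluating it into a Gram matrix over one-dimensional inputs (the Modified Bessel case is analogous but requires Bessel functions, so it is omitted here):

```python
import math

def squared_exponential(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared Exponential covariance: k(x, x') = v * exp(-(x - x')^2 / (2 l^2)).
    Infinitely differentiable, hence the smoothest process in the comparison."""
    return variance * math.exp(-((x1 - x2) ** 2) / (2 * lengthscale ** 2))

def gram_matrix(xs, k):
    """Covariance (Gram) matrix over a set of one-dimensional inputs."""
    return [[k(a, b) for b in xs] for a in xs]

K = gram_matrix([0.0, 0.5, 1.0], squared_exponential)
```

The lengthscale l controls how quickly correlation decays with distance, which is one of the two factors (with noise level) the abstract identifies as governing the tightness of the learning-curve bounds.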
Abstract:
Formal grammars can be used for describing complex repeatable structures such as DNA sequences. In this paper, we describe the structural composition of DNA sequences using a context-free stochastic L-grammar. L-grammars are a special class of parallel grammars that can model the growth of living organisms, e.g. plant development, and model the morphology of a variety of organisms. We believe that parallel grammars can also be used for modeling genetic mechanisms and sequences such as promoters. Promoters are short regulatory DNA sequences located upstream of a gene. Detection of promoters in DNA sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but there are many exceptions, which makes promoter recognition a complex problem. We replace the problem of promoter recognition by the induction of context-free stochastic L-grammar rules, which are later used for the structural analysis of promoter sequences. L-grammar rules are derived automatically from the Drosophila and vertebrate promoter datasets using a genetic programming technique, and their fitness is evaluated using a Support Vector Machine (SVM) classifier. The artificial promoter sequences generated using the derived L-grammar rules are analyzed and compared with natural promoter sequences.
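As a hedged illustration of the parallel-rewriting idea (the rules and probabilities below are invented for the sketch, not the rules the paper induces by genetic programming), a stochastic L-grammar rewrites every symbol of the string simultaneously at each derivation step:

```python
import random

# Toy stochastic L-grammar. Uppercase symbols are nonterminals, lowercase
# are terminal "bases". Each nonterminal has (probability, right-hand side) rules.
RULES = {
    "P": [(0.5, ["T", "A"]), (0.5, ["T", "A", "P"])],  # invented "motif" rule
    "T": [(1.0, ["t"])],
    "A": [(1.0, ["a"])],
}

def rewrite(symbols, rules, rng):
    """One parallel derivation step: every nonterminal is rewritten at once,
    choosing among its rules according to their probabilities."""
    out = []
    for s in symbols:
        if s in rules:
            r, acc = rng.random(), 0.0
            for p, rhs in rules[s]:
                acc += p
                if r <= acc:
                    out.extend(rhs)
                    break
        else:
            out.append(s)  # terminals are copied unchanged
    return out

def derive(start, rules, steps, seed=0):
    rng = random.Random(seed)
    symbols = list(start)
    for _ in range(steps):
        symbols = rewrite(symbols, rules, rng)
    return "".join(symbols)

seq = derive("P", RULES, steps=5)
```

In the paper's setting, the rule set itself is the genome evolved by genetic programming, and an SVM scoring generated sequences against real promoters supplies the fitness.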
Abstract:
This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ = n/p is less than one. We explore the statistical and computational challenges inherent in these high dimensional low sample size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored in this research using seven datasets where κ < 1. These techniques require special attention to tuning, so several extensions of cross-validation were investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
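As an illustration of why regularization helps when p > n, here is a minimal ridge regression fitted by gradient descent on a toy data set with 2 observations and 5 features; the data, penalty lam, and learning rate are invented for the sketch and are not from the study:

```python
def ridge_gd(X, y, lam=1.0, lr=0.01, steps=2000):
    """Ridge regression by gradient descent: minimizes ||Xw - y||^2 + lam * ||w||^2.
    The lam * ||w||^2 penalty keeps the underdetermined p > n problem well-posed."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(steps):
        resid = [sum(w[j] * X[i][j] for j in range(p)) - y[i] for i in range(n)]
        # gradient: 2 X^T (Xw - y) + 2 lam w
        grad = [2 * sum(resid[i] * X[i][j] for i in range(n)) + 2 * lam * w[j]
                for j in range(p)]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w

# toy HDLSS setup: n = 2 "patients", p = 5 "genes" (p > n)
X = [[1, 0, 0, 1, 0],
     [0, 1, 0, 0, 1]]
y = [1.0, -1.0]
w = ridge_gd(X, y, lam=0.5)
```

Without the penalty, infinitely many w fit these 2 equations in 5 unknowns exactly; ridge picks the minimum-norm compromise, and in practice lam is tuned by the cross-validation variants the abstract describes.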