915 resultados para Model Classification
Resumo:
OBJECTIVE: This study proposes a new approach that considers uncertainty in predicting and quantifying the presence and severity of diabetic peripheral neuropathy. METHODS: A rule-based fuzzy expert system was designed by four experts in diabetic neuropathy. The model variables were used to classify neuropathy in diabetic patients, defining it as mild, moderate, or severe. System performance was evaluated by means of the Kappa agreement measure, comparing the results of the model with those generated by the experts in an assessment of 50 patients. Accuracy was evaluated by an ROC curve analysis obtained based on 50 other cases; the results of those clinical assessments were considered to be the gold standard. RESULTS: According to the Kappa analysis, the model was in moderate agreement with expert opinions. The ROC analysis (evaluation of accuracy) determined an area under the curve equal to 0.91, demonstrating very good consistency in classifying patients with diabetic neuropathy. CONCLUSION: The model efficiently classified diabetic patients with different degrees of neuropathy severity. In addition, the model provides a way to quantify diabetic neuropathy severity and allows a more accurate patient condition assessment.
Resumo:
The work presented in this thesis is focused on the open-ended coaxial-probe frequency-domain reflectometry technique for complex permittivity measurement at microwave frequencies of dispersive dielectric multilayer materials. An effective dielectric model is introduced and validated to extend the applicability of this technique to multilayer materials in on-line system context. In addition, the thesis presents: 1) a numerical study regarding the imperfectness of the contact at the probe-material interface, 2) a review of the available models and techniques, 3) a new classification of the extraction schemes with guidelines on how they can be used to improve the overall performance of the probe according to the problem requirements.
Resumo:
The diagnosis, grading and classification of tumours has benefited considerably from the development of DCE-MRI which is now essential to the adequate clinical management of many tumour types due to its capability in detecting active angiogenesis. Several strategies have been proposed for DCE-MRI evaluation. Visual inspection of contrast agent concentration curves vs time is a very simple yet operator dependent procedure, therefore more objective approaches have been developed in order to facilitate comparison between studies. In so called model free approaches, descriptive or heuristic information extracted from time series raw data have been used for tissue classification. The main issue concerning these schemes is that they have not a direct interpretation in terms of physiological properties of the tissues. On the other hand, model based investigations typically involve compartmental tracer kinetic modelling and pixel-by-pixel estimation of kinetic parameters via non-linear regression applied on region of interests opportunely selected by the physician. This approach has the advantage to provide parameters directly related to the pathophysiological properties of the tissue such as vessel permeability, local regional blood flow, extraction fraction, concentration gradient between plasma and extravascular-extracellular space. Anyway, nonlinear modelling is computational demanding and the accuracy of the estimates can be affected by the signal-to-noise ratio and by the initial solutions. The principal aim of this thesis is investigate the use of semi-quantitative and quantitative parameters for segmentation and classification of breast lesion. The objectives can be subdivided as follow: describe the principal techniques to evaluate time intensity curve in DCE-MRI with focus on kinetic model proposed in literature; to evaluate the influence in parametrization choice for a classic bi-compartmental kinetic models; to evaluate the performance of a method for simultaneous tracer kinetic modelling and pixel classification; to evaluate performance of machine learning techniques training for segmentation and classification of breast lesion.
Resumo:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents in priorly defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a consistent training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing to leverage a set of labeled documents of one domain to classify those of another one. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy in most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively able to be reused in a different one.
Resumo:
Nowadays communication is switching from a centralized scenario, where communication media like newspapers, radio, TV programs produce information and people are just consumers, to a completely different decentralized scenario, where everyone is potentially an information producer through the use of social networks, blogs, forums that allow a real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques like Sentiment Analysis, which aims to extract opinions from huge amount of unstructured texts. This could lead to determine, for instance, the user satisfaction degree about products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches bank on a Markov Chain based model, which is language independent and whose killing features are simplicity and generality, which make it interesting with respect to previous sophisticated techniques. Every discussed technique has been tested in both Single-Domain and Cross-Domain Sentiment Classification areas, comparing performance with those of other two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in literature, with reference to both single-domain and cross-domain tasks, in $2$-classes (i.e. positive and negative) Document Sentiment Classification. However, there is still room for improvement, because this work also shows the way to walk in order to enhance performance, that is, a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in $2$-classes Single-Domain Sentiment Classification, another future work will regard validating these results also in tasks with more than $2$ classes.
Resumo:
Seventeen bones (sixteen cadaveric bones and one plastic bone) were used to validate a method for reconstructing a surface model of the proximal femur from 2D X-ray radiographs and a statistical shape model that was constructed from thirty training surface models. Unlike previously introduced validation studies, where surface-based distance errors were used to evaluate the reconstruction accuracy, here we propose to use errors measured based on clinically relevant morphometric parameters. For this purpose, a program was developed to robustly extract those morphometric parameters from the thirty training surface models (training population), from the seventeen surface models reconstructed from X-ray radiographs, and from the seventeen ground truth surface models obtained either by a CT-scan reconstruction method or by a laser-scan reconstruction method. A statistical analysis was then performed to classify the seventeen test bones into two categories: normal cases and outliers. This classification step depends on the measured parameters of the particular test bone. In case all parameters of a test bone were covered by the training population's parameter ranges, this bone is classified as normal bone, otherwise as outlier bone. Our experimental results showed that statistically there was no significant difference between the morphometric parameters extracted from the reconstructed surface models of the normal cases and those extracted from the reconstructed surface models of the outliers. Therefore, our statistical shape model based reconstruction technique can be used to reconstruct not only the surface model of a normal bone but also that of an outlier bone.
Resumo:
We propose a novel methodology to generate realistic network flow traces to enable systematic evaluation of network monitoring systems in various traffic conditions. Our technique uses a graph-based approach to model the communication structure observed in real-world traces and to extract traffic templates. By combining extracted and user-defined traffic templates, realistic network flow traces that comprise normal traffic and customized conditions are generated in a scalable manner. A proof-of-concept implementation demonstrates the utility and simplicity of our method to produce a variety of evaluation scenarios. We show that the extraction of templates from real-world traffic leads to a manageable number of templates that still enable accurate re-creation of the original communication properties on the network flow level.
Resumo:
Background: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function, initially, harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weightage to the n-grams that appear in fewer classes and vice-versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. Our method fared well compared to an existing motif finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic; thus can be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion: The proposed scoring function and methodology is able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The implementation of amino acid substitution scores for similarity detection, and the dampening factor to normalize the unbalanced datasets have significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes with a potential application to detecting proteome-specific motifs of different organisms.
Resumo:
Three fundamental types of suppressor additives for copper electroplating could be identified by means of potential Transient measurements. These suppressor additives differ in their synergistic and antagonistic interplay with anions that are chemisorbed on the metallic copper surface during electrodeposition. In addition these suppressor chemistries reveal different barrier properties with respect to cupric ions and plating additives (Cl, SPS). While the type-I suppressor selectively forms efficient barriers for copper inter-diffusion on chloride-terminated electrode surfaces we identified a type-II suppressor that interacts non-selectively with any kind of anions chemisorbed on copper (chloride, sulfate, sulfonate). Type-I suppressors are vital for the superconformal copper growth mode in Damascene processing and show an antagonistic interaction with SPS (Bis-Sodium-Sulfopropyl-Disulfide) which involves the deactivation of this suppressor chemistry. This suppressor deactivation is rationalized in terms of compositional changes in the layer of the chemisorbed anions due to the competition of chloride and MPS (Mercaptopropane Sulfonic Acid) for adsorption sites on the metallic copper surface. MPS is the product of the dissociative SPS adsorption within the preexisting chloride matrix on the copper surface. The non-selectivity in the adsorption behavior of the type-II suppressor is rationalized in terms of anion/cation pairing effects of the poly-cationic suppressor and the anion-modified copper substrate. Atomic-scale insights into the competitive Cl/MPS adsorption are gained from in situ STM (Scanning Tunneling Microscopy) using single crystalline copper surfaces as model substrates. Type-III suppressors are a third class of suppressors. In case of type-land type-II suppressor chemistries the resulting steady-state deposition conditions are completely independent on the particular succession of additive adsorption. In contrast to that a strong dependence of the suppressing capabilities on the sequence of additive adsorption ("first comes, first serves" principle) is observed for the type-IIIsuppressor. This behavior:is explained by a suppressor barrier that impedes not only the copper inter-diffusion but also the transport of other additives (e.g. SPS) to the copper surface. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
Background: The current proposed model of colorectal tumorigenesis is based primarily on CpG island methylator phenotype (CIMP), microsatellite instability (MSI), KRAS, BRAF, and methylation status of 0-6-Methylguanine DNA Methyltransferase (MGMT) and classifies tumors into five subgroups. The aim of this study is to validate this molecular classification and test its prognostic relevance. Methods: Three hundred two patients were included in this study. Molecular analysis was performed for five CIMP-related promoters (CRABP1, MLH1, p16INK4a, CACNA1G, NEUROG1), MGMT, MSI, KRAS, and BRAF. Methylation in at least 4 promoters or in one to three promoters was considered CIMP-high and CIMP-low (CIMP-H/L), respectively. Results: CIMP-H, CIMP-L, and CIMP-negative were found in 7.1, 43, and 49.9% cases, respectively. One hundred twenty-three tumors (41%) could not be classified into any one of the proposed molecular subgroups, including 107 CIMP-L, 14 CIMP-H, and two CIMP-negative cases. The 10 year survival rate for CIMP-high patients [22.6% (95%CI: 7-43)] was significantly lower than for CIMP-L or CIMP-negative (p = 0.0295). Only the combined analysis of BRAF and CIMP (negative versus L/H) led to distinct prognostic subgroups. Conclusion: Although CIMP status has an effect on outcome, our results underline the need for standardized definitions of low- and high-level CIMP, which clearly hinders an effective prognostic and molecular classification of colorectal cancer.
Resumo:
The group analysed some syntactic and phonological phenomena that presuppose the existence of interrelated components within the lexicon, which motivate the assumption that there are some sublexicons within the global lexicon of a speaker. This result is confirmed by experimental findings in neurolinguistics. Hungarian speaking agrammatic aphasics were tested in several ways, the results showing that the sublexicon of closed-class lexical items provides a highly automated complex device for processing surface sentence structure. Analysing Hungarian ellipsis data from a semantic-syntactic aspect, the group established that the lexicon is best conceived of being as split into at least two main sublexicons: the store of semantic-syntactic feature bundles and a separate store of sound forms. On this basis they proposed a format for representing open-class lexical items whose meanings are connected via certain semantic relations. They also proposed a new classification of verbs to account for the contribution of the aspectual reading of the sentence depending on the referential type of the argument, and a new account of the syntactic and semantic behaviour of aspectual prefixes. The partitioned sets of lexical items are sublexicons on phonological grounds. These sublexicons differ in terms of phonotactic grammaticality. The degrees of phonotactic grammaticality are tied up with the problem of psychological reality, of how many degrees of this native speakers are sensitive to. The group developed a hierarchical construction network as an extension of the original General Inheritance Network formalism and this framework was then used as a platform for the implementation of the grammar fragments.
Resumo:
Every inclined land surface has a potential for soil and water degradation, the seriousness depends on a multitude of parameters such as slope, soil type, geomorphology, rainfall, land use and natural vegetation cover. In Laos this intensified land use leads to reduced vegetation cover, to increased soil erosion, decreasing yield, and finally is likely to influence the hydrological regime. Against this background the Mekong River Commission (MRC) elaborated a spatial explicit Watershed Classification (WSC) for the Lower Mekong Basin. Based on topographic factors derived from a high-resolution Digital Terrain Model, five watershed classes are calculated, giving indication about the sensitivity to resource degradation by soil erosion. The WSC allows spatial priority setting for watershed management and generally supports informed decision making on reconnaissance level. In the conclusions focus is laid on general considerations when GIS techniques are used for spatial decision support in a development context.
Resumo:
High-throughput gene expression technologies such as microarrays have been utilized in a variety of scientific applications. Most of the work has been on assessing univariate associations between gene expression with clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach that is based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic curve. Under a specific probability model, this leads to consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation with support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as data from a recently published prostate cancer study.
Resumo:
The advances in computational biology have made simultaneous monitoring of thousands of features possible. The high throughput technologies not only bring about a much richer information context in which to study various aspects of gene functions but they also present challenge of analyzing data with large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in the context of generalized linear regression based on a previous approach, Iteratively ReWeighted Partial Least Squares, i.e. IRWPLS (Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often get lower classification error rates.
Resumo:
The construction of a reliable, practically useful prediction rule for future response is heavily dependent on the "adequacy" of the fitted regression model. In this article, we consider the absolute prediction error, the expected value of the absolute difference between the future and predicted responses, as the model evaluation criterion. This prediction error is easier to interpret than the average squared error and is equivalent to the mis-classification error for the binary outcome. We show that the distributions of the apparent error and its cross-validation counterparts are approximately normal even under a misspecified fitted model. When the prediction rule is "unsmooth", the variance of the above normal distribution can be estimated well via a perturbation-resampling method. We also show how to approximate the distribution of the difference of the estimated prediction errors from two competing models. With two real examples, we demonstrate that the resulting interval estimates for prediction errors provide much more information about model adequacy than the point estimates alone.