909 results for classification and regression trees


Relevance: 100.00%

Abstract:

Traditionally, many small-sized copepod species are considered to be widespread, bipolar or cosmopolitan. However, these large-scale distribution patterns need to be re-examined in view of increasing evidence of cryptic and pseudo-cryptic speciation in pelagic copepods. Here, we present a phylogeographic study of Oithona similis s.l. populations from the Arctic Ocean, the Southern Ocean and its northern boundaries, the North Atlantic and the Mediterranean Sea. O. similis s.l. is considered one of the most abundant species in temperate to polar oceans and acts as an important link in the trophic network between the microbial loop and higher trophic levels such as fish larvae. Two gene fragments were analysed: the mitochondrial cytochrome c oxidase subunit I (COI) and the nuclear ribosomal 28S genetic marker. Seven distinct, geographically delimited mitochondrial lineages could be identified, with divergences among the lineages ranging from 8 to 24%, most likely representing cryptic or pseudo-cryptic species within O. similis s.l. Four lineages were identified within or close to the borders of the Southern Ocean, one lineage in the Arctic Ocean and two lineages in the temperate Northern hemisphere. Surprisingly, the Arctic lineage was more closely related to lineages from the Southern hemisphere than to the other lineages from the Northern hemisphere, suggesting that geographic proximity is a rather poor predictor of how closely related the clades are at the genetic level. Application of a molecular clock revealed that the evolutionary history of O. similis s.l. is possibly closely associated with the reorganization of ocean circulation in the mid-Miocene and may be an example of allopatric speciation in the pelagic zone.

Relevance: 100.00%

Abstract:

Research across several countries has shown that degree classification (i.e. the final grade awarded to students successfully completing university) is an important determinant of graduates’ first destination outcome. Graduates leaving university with higher degree classifications have better employment opportunities and a higher likelihood of continuing education relative to those with lower degree classifications. This article investigates whether one of the reasons for this result is that employers and higher education institutions use degree classification as a signalling device for the ability that recent graduates may possess. Given the large number of applicants and the amount of time and resources typically required to assess their skills, employers and higher education institutions may decide to rely on this measure when forming beliefs about recent graduates’ abilities. Using data on two cohorts of recent graduates from a UK university, results suggest that an Upper Second degree classification may have a signalling role.

Relevance: 100.00%

Abstract:

Malware detection is a growing problem, particularly on the Android mobile platform, owing to its increasing popularity and the accessibility of numerous third-party app markets. The problem is made worse by the increasingly sophisticated detection-avoidance techniques employed by emerging malware families, which calls for more effective techniques for the detection and classification of Android malware. Hence, in this paper we present an n-opcode analysis-based approach that utilizes machine learning to classify and categorize Android malware. This approach enables automated feature discovery, eliminating the need for expert or domain knowledge to define the needed features. Our experiments on 2,520 samples, performed using up to 10-gram opcode features, showed that an F-measure of 98% is achievable using this approach.
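The core of such an n-opcode feature extraction can be sketched in a few lines (a hypothetical illustration with made-up opcode traces; the paper's actual disassembly pipeline and classifier are not specified here):

```python
from collections import Counter

def opcode_ngrams(opcodes, n):
    """Count contiguous n-grams over a sequence of opcode mnemonics."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))

# Hypothetical opcode traces from two disassembled methods.
benign = ["move", "invoke-virtual", "move-result", "return"]
suspect = ["const-string", "invoke-static", "invoke-static", "goto"]

print(opcode_ngrams(benign, 2))
print(opcode_ngrams(suspect, 2))
```

Counting n-grams for n up to 10 and feeding the counts to any standard classifier reproduces the general shape of the approach, without any hand-crafted features.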

Relevance: 100.00%

Abstract:

In order to optimize frontal detection in sea surface temperature fields at 4 km resolution, a combined statistical and expert-based approach is applied to test different spatial smoothings of the data prior to the detection process. Fronts are usually detected at 1 km resolution using the histogram-based, single-image edge detection (SIED) algorithm developed by Cayula and Cornillon in 1992, with a standard preliminary smoothing using a median filter with a 3 × 3 pixel kernel. Here, detections are performed in three study regions (off Morocco, the Mozambique Channel, and north-western Australia) and across the Indian Ocean basin using the combination of multiple windows (CMW) method developed by Nieto, Demarcq and McClatchie in 2012, which improves on the original Cayula and Cornillon algorithm. Detections at 4 km and 1 km resolution are compared. Fronts are divided into two intensity classes (“weak” and “strong”) according to their thermal gradient. A preliminary smoothing is applied prior to detection using different convolutions: three types of filter (median, average and Gaussian) combined with four kernel sizes (3 × 3, 5 × 5, 7 × 7, and 9 × 9 pixels) and three detection window sizes (16 × 16, 24 × 24 and 32 × 32 pixels), to test the effect of these smoothing combinations on reducing the background noise of the data and hence on improving frontal detection. The performance of the combinations on 4 km data is evaluated using two criteria: detection efficiency and front length. We find that the optimal combination of preliminary smoothing parameters for enhancing detection efficiency while preserving front length comprises a median filter, a 16 × 16 pixel window size, and a 5 × 5 pixel kernel for strong fronts or a 7 × 7 pixel kernel for weak fronts. Results show an improvement in detection performance (from the largest to the smallest window size) of 71% for strong fronts and 120% for weak fronts.
Despite the small window used (16 × 16 pixels), the length of the fronts is preserved relative to that found with 1 km data. This optimal preliminary smoothing and the CMW detection algorithm on 4 km sea surface temperature data are then used to describe the spatial distribution of the monthly frequencies of occurrence of both strong and weak fronts across the Indian Ocean basin. In general, strong fronts are observed in coastal areas whereas weak fronts, with some seasonal exceptions, are mainly located in the open ocean. This study shows that adequate noise reduction by a preliminary smoothing of the data considerably improves frontal detection efficiency as well as the overall quality of the results. Consequently, the use of 4 km data enables frontal detections similar to those from 1 km data (using a standard 3 × 3 median convolution) in terms of detectability, length and location. This method, using 4 km data, is easily applicable to large regions or at the global scale, with far fewer constraints on data manipulation and processing time relative to 1 km data.
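The preliminary median smoothing tested in this study can be illustrated with a minimal pure-Python sketch (toy SST values; the actual SIED/CMW detection algorithms are histogram-based and considerably more involved):

```python
from statistics import median

def median_smooth(grid, k=3):
    """Median filter with a k x k kernel (edges handled by shrinking the window)."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [grid[a][b]
                      for a in range(max(0, i - r), min(h, i + r + 1))
                      for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = median(window)
    return out

# Toy 4 km SST field (degrees C) with a noisy spike at (1, 1),
# sitting on a west-east thermal front between ~20 and ~25 degrees.
sst = [[20.0, 20.1, 24.9, 25.0],
       [20.0, 28.0, 25.0, 25.1],
       [20.1, 20.0, 25.0, 25.0]]
smoothed = median_smooth(sst, k=3)
print(smoothed[1][1])  # the 28.0 spike is replaced by a neighbourhood median
```

Larger kernels (5 × 5, 7 × 7, …) suppress more noise at the cost of blurring weak gradients, which is exactly the trade-off the study quantifies for weak versus strong fronts.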

Relevance: 100.00%

Abstract:

Blastic plasmacytoid dendritic cell neoplasm (BPDCN) is a rare subtype of leukemia/lymphoma whose diagnosis can be difficult due to its clinical and biological heterogeneity, as well as its overlapping features with other hematologic malignancies. In this study we investigated the association between the maturational stage of tumor cells and the clinico-biological and prognostic features of the disease, based on the analysis of 46 BPDCN cases classified into three maturation-associated subgroups on immunophenotypic grounds. Our results show that blasts from cases with an immature plasmacytoid dendritic cell (pDC) phenotype exhibit an uncommon CD56- phenotype, coexisting with CD34+ non-pDC tumor cells, typically in the absence of extramedullary (e.g. skin) disease at presentation. Conversely, patients with a more mature blast cell phenotype more frequently displayed skin/extramedullary involvement and spread into secondary lymphoid tissues. Despite the dismal outcome, acute lymphoblastic leukemia-type therapy (with central nervous system prophylaxis) and/or allogeneic stem cell transplantation appeared to be the only effective therapies. Overall, our findings indicate that the maturational profile of pDC blasts in BPDCN is highly heterogeneous and translates into a wide clinical spectrum (from acute leukemia to mature lymphoma-like behavior), which may also lead to variable diagnosis and treatment.

Relevance: 100.00%

Abstract:

In knowledge technology work, as expressed by the scope of this conference, there are a number of communities, each uncovering new methods, theories, and practices. The Library and Information Science (LIS) community is one such community. This community, through tradition and innovation, theory and practice, organizes knowledge and develops knowledge technologies formed by iterative research hewn to the values of equal access and discovery for all. The Information Modeling community is another contributor to knowledge technologies. It concerns itself with the construction of symbolic models that capture the meaning of information and organize it in ways that are computer-based but human-understandable. A recent paper that examines certain assumptions in information modeling builds a bridge between these two communities, offering a forum for a discussion of common aims from a common perspective. In a June 2000 article, Parsons and Wand separate classes from instances in information modeling in order to free instances from what they call the “tyranny” of classes. They attribute a number of problems in information modeling to inherent classification, that is, to the disregard for the fact that instances can be conceptualized independently of any class assignment. By faceting instances from classes, Parsons and Wand strike a sonorous chord with classification theory as understood in LIS. In the practice community and in the publications of LIS, faceted classification has shifted the paradigm of knowledge organization theory in the twentieth century. Here, with the proposal of inherent classification and the resulting layered information modeling, a clear line joins the LIS classification theory community and the information modeling community. Both communities have their eyes turned toward networked resource discovery, and with this conceptual conjunction a new paradigmatic conversation can take place.
Parsons and Wand propose that the layered information model can facilitate schema integration, schema evolution, and interoperability. These three spheres of information modeling have their own connotations, but they are not distant from the aims of classification research in LIS. In this new conceptual conjunction, established by Parsons and Wand, information modeling, through the layered information model, can expand the horizons of classification theory beyond LIS, promoting a cross-fertilization of ideas on the interoperability of subject access tools like classification schemes, thesauri, taxonomies, and ontologies. This paper examines the common ground between the layered information model and faceted classification, establishing a vocabulary and outlining some common principles. It then turns to the issue of schema, the horizons of conventional classification, and the differences between Information Modeling and Library and Information Science. Finally, a framework is proposed that deploys an interpretation of the layered information modeling approach in a knowledge technologies context. In order to design subject access systems that will integrate, evolve and interoperate in a networked environment, knowledge organization specialists must consider the kind of semantic class independence that Parsons and Wand propose for information modeling.

Relevance: 100.00%

Abstract:

This paper outlines three information organization frameworks: library classification, social tagging, and boundary infrastructures. It then outlines the functionality of these frameworks. The paper takes a neo-pragmatic approach. It finds that these frameworks are complementary, and that by understanding the differences and similarities between them, researchers and developers can begin to craft a vocabulary of evaluation.

Relevance: 100.00%

Abstract:

The semiarid region of northeastern Brazil, the Caatinga, is extremely important for its biodiversity and endemism. Measurements of plant physiology are crucial to the calibration of Dynamic Global Vegetation Models (DGVMs), which are currently used to simulate the responses of vegetation in the face of global change. In field work carried out in an area of preserved Caatinga forest in Petrolina, Pernambuco, measurements of carbon assimilation (in response to light and CO2) were performed on 11 individuals of Poincianella microphylla, a native species that is abundant in this region. These data were used to calibrate the maximum carboxylation velocity (Vcmax) used in the INLAND model. The calibration techniques used were Multiple Linear Regression (MLR) and data mining techniques such as Classification and Regression Trees (CART) and k-means. The results were compared to the uncalibrated model. Simulated Gross Primary Productivity (GPP) reached 72% of observed GPP when using the calibrated Vcmax values, whereas the uncalibrated approach accounted for only 42% of observed GPP. This work thus shows the benefits of calibrating DGVMs with field ecophysiological measurements, especially in areas where field data are scarce or non-existent, such as the Caatinga.
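A regression tree such as CART calibrates a parameter by recursively splitting the samples into locally homogeneous groups, each predicted by its mean. The core step, finding the split that minimizes the summed squared error, can be sketched as follows (hypothetical light levels and Vcmax-like responses, not the actual INLAND calibration data):

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing summed squared error."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t, sum(left) / len(left), sum(right) / len(right))
    return best[1:]

# Hypothetical light levels (umol m-2 s-1) and measured responses.
light = [100, 200, 300, 1000, 1200, 1500]
vcmax = [20.0, 21.0, 22.0, 48.0, 50.0, 52.0]
print(best_split(light, vcmax))
```

Applying this step recursively to each side yields the full tree; each leaf then supplies a calibrated parameter value for the conditions it covers.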

Relevance: 100.00%

Abstract:

Deep learning methods are extremely promising machine learning tools for analyzing neuroimaging data. However, their potential use in clinical settings is limited by the challenges of applying these methods to neuroimaging data. In this study, a type of data leakage caused by slice-level data splitting during the training and validation of a 2D CNN is first surveyed, and a quantitative assessment of the resulting overestimation of model performance is presented. Second, an interpretable, leakage-free deep learning software package, written in Python with a wide range of options, has been developed to conduct both classification and regression analysis. The software was applied to the study of mild cognitive impairment (MCI) in patients with small vessel disease (SVD) using multi-parametric MRI data, where the cognitive performance of 58 patients, measured by five neuropsychological tests, is predicted using a multi-input CNN model that takes brain images and demographic data. Each of the cognitive test scores was predicted using different MRI-derived features. As MCI due to SVD has been hypothesized to be the effect of white matter damage, the DTI-derived features MD and FA produced the best prediction of the TMT-A score, which is consistent with the existing literature. In a second study, an interpretable deep learning system is developed, aimed at 1) classifying Alzheimer's disease patients and healthy subjects, 2) examining the neural correlates of the cognitive decline in AD patients using CNN visualization tools, and 3) highlighting the potential of interpretability techniques to expose a biased deep learning model. Structural magnetic resonance imaging (MRI) data from 200 subjects were used by the proposed CNN model, which was trained using a transfer learning-based approach and produced a balanced accuracy of 71.6%. Brain regions in the frontal and parietal lobes showing cortical atrophy were highlighted by the visualization tools.
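The slice-level leakage described here arises when 2D slices from the same subject's scan land in both the training and validation sets, so the model is partly validated on subjects it has already seen. Splitting at the subject level avoids this. A minimal stdlib sketch (hypothetical subject IDs and slice counts):

```python
import random

def subject_level_split(slices, test_fraction=0.5, seed=0):
    """Split (subject_id, slice_index) pairs so that no subject spans both sets."""
    subjects = sorted({sid for sid, _ in slices})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    held_out = set(subjects[:n_test])
    train = [s for s in slices if s[0] not in held_out]
    test = [s for s in slices if s[0] in held_out]
    return train, test

# Hypothetical dataset: 4 subjects, 3 slices each.
data = [(sid, k) for sid in ["s1", "s2", "s3", "s4"] for k in range(3)]
train, test = subject_level_split(data)
train_ids = {sid for sid, _ in train}
test_ids = {sid for sid, _ in test}
print(train_ids & test_ids)  # empty set: no subject leaks across the split
```

A naive shuffle of the slices themselves would almost certainly place slices of the same subject on both sides, producing exactly the performance overestimation the study quantifies.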

Relevance: 100.00%

Abstract:

The dissertation addresses the still unsolved challenges of source-based digital 3D reconstruction, visualisation and documentation in the domains of archaeology and art and architecture history. The emerging BIM methodology and the IFC exchange data format are changing the way of collaborating, visualising and documenting in the planning, construction and facility management process. The introduction and development of the Semantic Web (Web 3.0), spreading the idea of structured, formalised and linked data, offers semantically enriched human- and machine-readable data. In contrast to civil engineering and cultural heritage, object-oriented academic disciplines such as archaeology and art and architecture history are acting as outside spectators. Since the 1990s it has been argued that a 3D model is not likely to be considered a scientific reconstruction unless it is grounded in accurate documentation and visualisation; however, such standards are still missing and validation of the outcomes is not ensured. Meanwhile, the digital research data remain ephemeral and continue to fill growing digital cemeteries. This study therefore focuses on the evaluation of source-based digital 3D reconstructions and, especially, on uncertainty assessment in the case of hypothetical reconstructions of destroyed or never-built artefacts according to scientific principles, making the models shareable and reusable by a potentially wide audience. The work initially focuses on terminology and on the definition of a workflow especially related to the classification and visualisation of uncertainty. The workflow is then applied to specific cases of 3D models uploaded to the DFG repository of the AI Mainz, and in this way the available methods of documenting, visualising and communicating uncertainty are analysed.
In the end, this process leads to a validation or correction of the workflow and the initial assumptions, and also, when dealing with different hypotheses, to a better definition of the levels of uncertainty.

Relevance: 100.00%

Abstract:

Spectral sensors are a wide class of devices that are extremely useful for detecting essential information about the environment and materials with a high degree of selectivity. Recently, they have achieved high levels of integration and low implementation costs, making them suitable for fast, small, and non-invasive monitoring systems. However, the useful information is hidden in the spectra and is difficult to decode, so mathematical algorithms are needed to infer the values of the variables of interest from the acquired data. Among the different families of predictive modeling, Principal Component Analysis and the techniques stemming from it provide very good performance with small computational and memory requirements, which allows the prediction to be implemented even in embedded and autonomous devices. In this thesis, I present four practical applications of these algorithms to the prediction of different variables: soil moisture, concrete moisture, freshness of anchovies/sardines, and gas concentration. In all of these cases the workflow is the same. Initially, an acquisition campaign is performed to acquire both spectra and the variables of interest from samples. These data are then used as input for the creation of the prediction models, to solve both classification and regression problems. From these models, an array of calibration coefficients is derived and used for the implementation of the prediction in an embedded system. The presented results show that this workflow was successfully applied in very different scientific fields, yielding autonomous and non-invasive devices able to predict the value of physical parameters of choice from new spectral acquisitions.
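Once the model is trained offline, the on-device prediction described above reduces to dot products with the stored arrays of PCA loadings and calibration coefficients, which is what makes an embedded implementation feasible. A sketch with made-up numbers (the 4-band spectrum, loadings and coefficients are illustrative assumptions, not values from the thesis):

```python
def predict(spectrum, loadings, coefs, intercept):
    """Project a spectrum onto PCA loadings, then apply calibration coefficients."""
    # Scores: one dot product per principal component.
    scores = [sum(l * x for l, x in zip(row, spectrum)) for row in loadings]
    # Regression on the scores gives the predicted variable of interest.
    return intercept + sum(c * s for c, s in zip(coefs, scores))

# Hypothetical 4-band spectrum and a 2-component calibration model.
spectrum = [0.2, 0.4, 0.6, 0.8]
loadings = [[0.5, 0.5, 0.5, 0.5],    # PC1 loading vector
            [0.5, 0.5, -0.5, -0.5]]  # PC2 loading vector
coefs = [10.0, -2.0]                 # regression coefficients on the scores
intercept = 1.0
print(predict(spectrum, loadings, coefs, intercept))  # ≈ 11.8
```

The entire deployed model is just the three small arrays (loadings, coefs, intercept), so it fits comfortably in the flash memory of a microcontroller.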

Relevance: 100.00%

Abstract:

The workflow was the following. Preliminary phase: identification of 18 formalin-fixed, paraffin-embedded (FFPE) samples from 9 patients (9 matched AK lesions and 9 SCC lesions). Working on the biopsy samples, we performed RNA extraction and analysis with droplet digital PCR (ddPCR) and carried out the data analysis. Second and final phase: evaluation of an additional 39 subjects (36 men and 3 women). Results: We evaluated and compared the following miRNAs: miR-320 (involved in apoptosis and cell proliferation control), miR-204 (involved in cell proliferation), and miR-16-5p (involved in apoptosis). Conclusion: Our data suggest that there is no significant variation in the expression of the three tested microRNAs between adjacent AK lesions and squamous-cell carcinoma, although a relevant trend was observed. Furthermore, evaluating the miRNA expression trend between the keratosis and the carcinoma of the same patient shows no uniform pattern: in some samples expression rises in the transition from AK to SCC, and in others it falls.

Relevance: 100.00%

Abstract:

Objective: To identify and analyze the incidence rates of pressure ulcers (PU) and the risk factors for their development in critical patients with cardiopulmonary diseases. Method: A prospective cohort study conducted in the cardiopulmonary Intensive Care Unit (ICU) of a large hospital in the city of São Paulo from November 2013 to February 2014. The study included 370 patients over 18 years of age who had no PU on admission and had been in the ICU for less than 24 hours. The data were analyzed using univariate and multivariate analyses (Classification and Regression Tree, CART). Results: The overall PU incidence rate was 11.0%, distributed as 8.0% among men and 3.0% among women (p=0.018); 10.0% among white patients and 6.5% among patients aged 60 years or older. The main risk factors found were an ICU stay of 9.5 days or longer, age of 42.5 years or older, and white race. Conclusion: The study contributes to knowledge of the epidemiology of PU in critical patients with cardiopulmonary diseases, supporting the planning of specific preventive care for this population.

Relevance: 100.00%

Abstract:

BACKGROUND: Diagnosing pediatric pneumonia is challenging in low-resource settings. The World Health Organization (WHO) has defined primary end-point radiological pneumonia for use in epidemiological and vaccine studies. However, radiography requires expertise and is often inaccessible. We hypothesized that plasma biomarkers of inflammation and endothelial activation may be useful surrogates for end-point pneumonia, and may provide insight into its biological significance. METHODS: We studied children with WHO-defined clinical pneumonia (n = 155) within a prospective cohort of 1,005 consecutive febrile children presenting to Tanzanian outpatient clinics. Based on x-ray findings, participants were categorized as primary end-point pneumonia (n = 30), other infiltrates (n = 31), or normal chest x-ray (n = 94). Plasma levels of 7 host response biomarkers at presentation were measured by ELISA. Associations between biomarker levels and radiological findings were assessed by Kruskal-Wallis test and multivariable logistic regression. The ability of the biomarkers to predict radiological findings was evaluated using receiver operating characteristic curve analysis and Classification and Regression Tree analysis. RESULTS: Compared to children with a normal x-ray, children with end-point pneumonia had significantly higher C-reactive protein, procalcitonin and Chitinase 3-like-1, while those with other infiltrates had elevated procalcitonin and von Willebrand Factor and decreased soluble Tie-2 and endoglin. Clinical variables were not predictive of radiological findings. Classification and Regression Tree analysis generated multi-marker models with improved performance over single markers for discriminating between groups.
A model based on C-reactive protein and Chitinase 3-like-1 discriminated between end-point pneumonia and non-end-point pneumonia with 93.3% sensitivity (95% confidence interval 76.5-98.8), 80.8% specificity (72.6-87.1), positive likelihood ratio 4.9 (3.4-7.1), negative likelihood ratio 0.083 (0.022-0.32), and misclassification rate 0.20 (standard error 0.038). CONCLUSIONS: In Tanzanian children with WHO-defined clinical pneumonia, combinations of host biomarkers distinguished between end-point pneumonia, other infiltrates, and normal chest x-ray, whereas clinical variables did not. These findings generate pathophysiological hypotheses and may have potential research and clinical utility.
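The reported likelihood ratios follow directly from the sensitivity and specificity, so the figures can be checked by hand:

```python
# Reported operating characteristics of the CRP + Chitinase 3-like-1 model.
sensitivity = 0.933  # 93.3%
specificity = 0.808  # 80.8%

# LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity.
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

print(round(lr_positive, 1))   # 4.9, matching the reported value
print(round(lr_negative, 3))   # 0.083, matching the reported value
```

A positive result thus makes end-point pneumonia roughly five times more likely, while a negative result makes it about twelve times less likely.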

Relevance: 100.00%

Abstract:

Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks; however, preference learning involves the prediction of an ordering of the data points rather than of a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates a better theoretical understanding of the problem but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines for ordering the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data, which take advantage of various non-vectorial data representations, and preference learning algorithms suitable for different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics.
Training kernel-based ranking algorithms can be infeasible when the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be efficiently trained with large amounts of data. For situations where only a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
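The reduction of preference learning to regression mentioned in the abstract can be sketched with the pairwise-difference transform: each preference a ≻ b becomes one training example on the feature difference a − b with target 1, and a linear scorer is fit to those differences by regularized least squares (a minimal gradient-descent sketch with made-up feature vectors, not the thesis's kernel RLS algorithms):

```python
def fit_preferences(pairs, dim, lam=0.1, lr=0.05, epochs=200):
    """Fit w so that w.(a - b) ~ 1 for every preference a > b (L2-regularized least squares)."""
    diffs = [[pa - pb for pa, pb in zip(a, b)] for a, b in pairs]
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [2 * lam * wi for wi in w]          # gradient of the L2 penalty
        for d in diffs:
            err = sum(wi * di for wi, di in zip(w, d)) - 1.0
            for i in range(dim):
                grad[i] += 2 * err * d[i]          # gradient of the squared loss
        w = [wi - lr * gi / len(diffs) for wi, gi in zip(w, grad)]
    return w

def score(w, x):
    """Rank items by this score: higher means preferred."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical documents with 2 features; the first item of each pair is preferred.
pairs = [([3.0, 1.0], [1.0, 1.0]),
         ([2.0, 0.5], [0.5, 0.5]),
         ([4.0, 2.0], [2.0, 2.5])]
w = fit_preferences(pairs, dim=2)
print(score(w, [3.0, 1.0]) > score(w, [1.0, 1.0]))  # preferred item scores higher
```

Replacing the explicit dot products with kernel evaluations over the pairwise differences gives the general flavor of the kernelized ranking methods the thesis develops.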