923 resultados para classification aided by clustering


Relevância:

40.00% 40.00%

Publicador:

Resumo:

ACM Computing Classification System (1998): I.4.9, I.4.10.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning sentence-level embeddings from WordNet glosses using a convolutional neural networks. The initialized word sense embeddings are used by a context clustering based model to generate the distributed representations of word senses. Our learned representations outperform the publicly available embeddings on half of the metrics in the word similarity task, 6 out of 13 sub tasks in the analogical reasoning task, and gives the best overall accuracy in the word sense effect classification task, which shows the effectiveness of our proposed distributed distribution learning model.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The rapid growth of the Internet and the advancements of the Web technologies have made it possible for users to have access to large amounts of on-line music data, including music acoustic signals, lyrics, style/mood labels, and user-assigned tags. The progress has made music listening more fun, but has raised an issue of how to organize this data, and more generally, how computer programs can assist users in their music experience. An important subject in computer-aided music listening is music retrieval, i.e., the issue of efficiently helping users in locating the music they are looking for. Traditionally, songs were organized in a hierarchical structure such as genre->artist->album->track, to facilitate the users’ navigation. However, the intentions of the users are often hard to be captured in such a simply organized structure. The users may want to listen to music of a particular mood, style or topic; and/or any songs similar to some given music samples. This motivated us to work on user-centric music retrieval system to improve users’ satisfaction with the system. The traditional music information retrieval research was mainly concerned with classification, clustering, identification, and similarity search of acoustic data of music by way of feature extraction algorithms and machine learning techniques. More recently the music information retrieval research has focused on utilizing other types of data, such as lyrics, user-access patterns, and user-defined tags, and on targeting non-genre categories for classification, such as mood labels and styles. This dissertation focused on investigating and developing effective data mining techniques for (1) organizing and annotating music data with styles, moods and user-assigned tags; (2) performing effective analysis of music data with features from diverse information sources; and (3) recommending music songs to the users utilizing both content features and user access patterns.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

South Florida’s watersheds have endured a century of urban and agricultural development and disruption of their hydrology. Spatial characterization of South Florida’s estuarine and coastal waters is important to Everglades’ restoration programs. We applied Factor Analysis and Hierarchical Clustering of water quality data in tandem to characterize and spatially subdivide South Florida’s coastal and estuarine waters. Segmentation rendered forty-four biogeochemically distinct water bodies whose spatial distribution is closely linked to geomorphology, circulation, benthic community pattern, and to water management. This segmentation has been adopted with minor changes by federal and state environmental agencies to derive numeric nutrient criteria.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This study subdivides the Weddell Sea, Antarctica, into seafloor regions using multivariate statistical methods. These regions are categories used for comparing, contrasting and quantifying biogeochemical processes and biodiversity between ocean regions geographically but also regions under development within the scope of global change. The division obtained is characterized by the dominating components and interpreted in terms of ruling environmental conditions. The analysis uses 28 environmental variables for the sea surface, 25 variables for the seabed and 9 variables for the analysis between surface and bottom variables. The data were taken during the years 1983-2013. Some data were interpolated. The statistical errors of several interpolation methods (e.g. IDW, Indicator, Ordinary and Co-Kriging) with changing settings have been compared for the identification of the most reasonable method. The multivariate mathematical procedures used are regionalized classification via k means cluster analysis, canonical-correlation analysis and multidimensional scaling. Canonical-correlation analysis identifies the influencing factors in the different parts of the cove. Several methods for the identification of the optimum number of clusters have been tested. For the seabed 8 and 12 clusters were identified as reasonable numbers for clustering the Weddell Sea. For the sea surface the numbers 8 and 13 and for the top/bottom analysis 8 and 3 were identified, respectively. Additionally, the results of 20 clusters are presented for the three alternatives offering the first small scale environmental regionalization of the Weddell Sea. Especially the results of 12 clusters identify marine-influenced regions which can be clearly separated from those determined by the geological catchment area and the ones dominated by river discharge.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Human activities represent a significant burden on the global water cycle, with large and increasing demands placed on limited water resources by manufacturing, energy production and domestic water use. In addition to changing the quantity of available water resources, human activities lead to changes in water quality by introducing a large and often poorly-characterized array of chemical pollutants, which may negatively impact biodiversity in aquatic ecosystems, leading to impairment of valuable ecosystem functions and services. Domestic and industrial wastewaters represent a significant source of pollution to the aquatic environment due to inadequate or incomplete removal of chemicals introduced into waters by human activities. Currently, incomplete chemical characterization of treated wastewaters limits comprehensive risk assessment of this ubiquitous impact to water. In particular, a significant fraction of the organic chemical composition of treated industrial and domestic wastewaters remains uncharacterized at the molecular level. Efforts aimed at reducing the impacts of water pollution on aquatic ecosystems critically require knowledge of the composition of wastewaters to develop interventions capable of protecting our precious natural water resources.

The goal of this dissertation was to develop a robust, extensible and high-throughput framework for the comprehensive characterization of organic micropollutants in wastewaters by high-resolution accurate-mass mass spectrometry. High-resolution mass spectrometry provides the most powerful analytical technique available for assessing the occurrence and fate of organic pollutants in the water cycle. However, significant limitations in data processing, analysis and interpretation have limited this technique in achieving comprehensive characterization of organic pollutants occurring in natural and built environments. My work aimed to address these challenges by development of automated workflows for the structural characterization of organic pollutants in wastewater and wastewater impacted environments by high-resolution mass spectrometry, and to apply these methods in combination with novel data handling routines to conduct detailed fate studies of wastewater-derived organic micropollutants in the aquatic environment.

In Chapter 2, chemoinformatic tools were implemented along with novel non-targeted mass spectrometric analytical methods to characterize, map, and explore an environmentally-relevant “chemical space” in municipal wastewater. This was accomplished by characterizing the molecular composition of known wastewater-derived organic pollutants and substances that are prioritized as potential wastewater contaminants, using these databases to evaluate the pollutant-likeness of structures postulated for unknown organic compounds that I detected in wastewater extracts using high-resolution mass spectrometry approaches. Results showed that application of multiple computational mass spectrometric tools to structural elucidation of unknown organic pollutants arising in wastewaters improved the efficiency and veracity of screening approaches based on high-resolution mass spectrometry. Furthermore, structural similarity searching was essential for prioritizing substances sharing structural features with known organic pollutants or industrial and consumer chemicals that could enter the environment through use or disposal.

I then applied this comprehensive methodological and computational non-targeted analysis workflow to micropollutant fate analysis in domestic wastewaters (Chapter 3), surface waters impacted by water reuse activities (Chapter 4) and effluents of wastewater treatment facilities receiving wastewater from oil and gas extraction activities (Chapter 5). In Chapter 3, I showed that application of chemometric tools aided in the prioritization of non-targeted compounds arising at various stages of conventional wastewater treatment by partitioning high dimensional data into rational chemical categories based on knowledge of organic chemical fate processes, resulting in the classification of organic micropollutants based on their occurrence and/or removal during treatment. Similarly, in Chapter 4, high-resolution sampling and broad-spectrum targeted and non-targeted chemical analysis were applied to assess the occurrence and fate of organic micropollutants in a water reuse application, wherein reclaimed wastewater was applied for irrigation of turf grass. Results showed that organic micropollutant composition of surface waters receiving runoff from wastewater irrigated areas appeared to be minimally impacted by wastewater-derived organic micropollutants. Finally, Chapter 5 presents results of the comprehensive organic chemical composition of oil and gas wastewaters treated for surface water discharge. Concurrent analysis of effluent samples by complementary, broad-spectrum analytical techniques, revealed that low-levels of hydrophobic organic contaminants, but elevated concentrations of polymeric surfactants, which may effect the fate and analysis of contaminants of concern in oil and gas wastewaters.

Taken together, my work represents significant progress in the characterization of polar organic chemical pollutants associated with wastewater-impacted environments by high-resolution mass spectrometry. Application of these comprehensive methods to examine micropollutant fate processes in wastewater treatment systems, water reuse environments, and water applications in oil/gas exploration yielded new insights into the factors that influence transport, transformation, and persistence of organic micropollutants in these systems across an unprecedented breadth of chemical space.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This thesis explores whether a specific group of large EU law firms exhibited multiple common behaviours regarding their EU geographies between 1998 and 2009. These potentially common behaviours included their preferences for trading in certain EU locations, their usage of law firm alliances, and the specific reasons why they opened or closed EU branch offices. If my hypothesis is confirmed, this may indicate that certain aspects of large law firm geography are predictable – a finding potentially of interest to various stakeholders globally, including legal regulators, academics and law firms. In testing my hypothesis, I have drawn on research conducted by the Globalization and World Cities (GaWC) Research Network to assist me. Between 1999 and 2010, the GaWC published seven research papers exploring the geographies of large US and UK law firms. Several of the GaWC’s observations arising from these studies were evidence-based; others were speculative – including a novel approach for explaining legal practice branch office change, not adopted in research conducted previously or subsequently. By distilling the GaWC’s key observations these papers into a series of “sub-hypotheses”, I been able to test whether the geographical behaviours of my novel cohort of large EU law firms reflect those suggested by the GaWC. The more the GaWC’s suggested behaviours are observed among my cohort, the more my hypothesis will be supported. In conducting this exercise, I will additionally evaluate the extent to which the GaWC’s research has aided our understanding of large EU law firm geography. Ultimately, my findings broadly support most of the GaWC’s observations, notwithstanding our cohort differences and the speculative nature of several of the GaWC’s propositions. My investigation has also allowed me to refine several of the GaWC’s observations regarding commonly-observable large law firm geographical behaviours, while also addressing a key omission from the group’s research output.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The paper catalogues the procedures and steps involved in agroclimatic classification. These vary from conventional descriptive methods to modern computer-based numerical techniques. There are three mutually independent numerical classification techniques, namely Ordination, Cluster analysis, and Minimum spanning tree; and under each technique there are several forms of grouping techniques existing. The vhoice of numerical classification procedure differs with the type of data set. In the case of numerical continuous data sets with booth positive and negative values, the simple and least controversial procedures are unweighted pair group method (UPGMA) and weighted pair group method (WPGMA) under clustering techniques with similarity measure obtained either from Gower metric or standardized Euclidean metric. Where the number of attributes are large, these could be reduced to fewer new attributes defined by the principal components or coordinates by ordination technique. The first few components or coodinates explain the maximum variance in the data matrix. These revided attributes are less affected by noise in the data set. It is possible to check misclassifications using minimum spanning tree.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

© 2014 Cises This work is distributed with License Creative Commons Attribution-Non commercial-No derivatives 4.0 International (CC BY-BC-ND 4.0)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Diabetic Retinopathy (DR) is a complication of diabetes that can lead to blindness if not readily discovered. Automated screening algorithms have the potential to improve identification of patients who need further medical attention. However, the identification of lesions must be accurate to be useful for clinical application. The bag-of-visual-words (BoVW) algorithm employs a maximum-margin classifier in a flexible framework that is able to detect the most common DR-related lesions such as microaneurysms, cotton-wool spots and hard exudates. BoVW allows to bypass the need for pre- and post-processing of the retinographic images, as well as the need of specific ad hoc techniques for identification of each type of lesion. An extensive evaluation of the BoVW model, using three large retinograph datasets (DR1, DR2 and Messidor) with different resolution and collected by different healthcare personnel, was performed. The results demonstrate that the BoVW classification approach can identify different lesions within an image without having to utilize different algorithms for each lesion reducing processing time and providing a more flexible diagnostic system. Our BoVW scheme is based on sparse low-level feature detection with a Speeded-Up Robust Features (SURF) local descriptor, and mid-level features based on semi-soft coding with max pooling. The best BoVW representation for retinal image classification was an area under the receiver operating characteristic curve (AUC-ROC) of 97.8% (exudates) and 93.5% (red lesions), applying a cross-dataset validation protocol. To assess the accuracy for detecting cases that require referral within one year, the sparse extraction technique associated with semi-soft coding and max pooling obtained an AUC of 94.2 ± 2.0%, outperforming current methods. Those results indicate that, for retinal image classification tasks in clinical practice, BoVW is equal and, in some instances, surpasses results obtained using dense detection (widely believed to be the best choice in many vision problems) for the low-level descriptors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Garlic is a spice and a medicinal plant; hence, there is an increasing interest in 'developing' new varieties with different culinary properties or with high content of nutraceutical compounds. Phenotypic traits and dominant molecular markers are predominantly used to evaluate the genetic diversity of garlic clones. However, 24 SSR markers (codominant) specific for garlic are available in the literature, fostering germplasm researches. In this study, we genotyped 130 garlic accessions from Brazil and abroad using 17 polymorphic SSR markers to assess the genetic diversity and structure. This is the first attempt to evaluate a large set of accessions maintained by Brazilian institutions. A high level of redundancy was detected in the collection (50 % of the accessions represented eight haplotypes). However, non-redundant accessions presented high genetic diversity. We detected on average five alleles per locus, Shannon index of 1.2, HO of 0.5, and HE of 0.6. A core collection was set with 17 accessions, covering 100 % of the alleles with minimum redundancy. Overall FST and D values indicate a strong genetic structure within accessions. Two major groups identified by both model-based (Bayesian approach) and hierarchical clustering (UPGMA dendrogram) techniques were coherent with the classification of accessions according to maturity time (growth cycle): early-late and midseason accessions. Assessing genetic diversity and structure of garlic collections is the first step towards an efficient management and conservation of accessions in genebanks, as well as to advance future genetic studies and improvement of garlic worldwide.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Medullary thyroid carcinoma (MTC) originates in the thyroid parafollicular cells and represents 3-4% of the malignant neoplasms that affect this gland. Approximately 25% of these cases are hereditary due to activating mutations in the REarranged during Transfection (RET) proto-oncogene. The course of MTC is indolent, and survival rates depend on the tumor stage at diagnosis. The present article describes clinical evidence-based guidelines for the diagnosis, treatment, and follow-up of MTC. The aim of the consensus described herein, which was elaborated by Brazilian experts and sponsored by the Thyroid Department of the Brazilian Society of Endocrinology and Metabolism, was to discuss the diagnosis, treatment, and follow-up of individuals with MTC in accordance with the latest evidence reported in the literature. After clinical questions were elaborated, the available literature was initially surveyed for evidence in the MedLine-PubMed database, followed by the Embase and Scientific Electronic Library Online/Latin American and Caribbean Health Science Literature (SciELO/Lilacs) databases. The strength of evidence was assessed according to the Oxford classification of evidence levels, which is based on study design, and the best evidence available for each question was selected. Eleven questions corresponded to MTC diagnosis, 8 corresponded to its surgical treatment, and 13 corresponded to follow-up, for a total of 32 recommendations. The present article discusses the clinical and molecular diagnosis, initial surgical treatment, and postoperative management of MTC, as well as the therapeutic options for metastatic disease. MTC should be suspected in individuals who present with thyroid nodules and family histories of MTC, associations with pheochromocytoma and hyperparathyroidism, and/or typical phenotypic characteristics such as ganglioneuromatosis and Marfanoid habitus. Fine-needle nodule aspiration, serum calcitonin measurements, and anatomical-pathological examinations are useful for diagnostic confirmation. Surgery represents the only curative therapeutic strategy. The therapeutic options for metastatic disease remain limited and are restricted to disease control. Judicious postoperative assessments that focus on the identification of residual or recurrent disease are of paramount importance when defining the follow-up and later therapeutic management strategies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

To compare the distributions of patients with clinical-pathological subtypes of luminal B-like breast cancer according to the 2011 and 2013 St. Gallen International Breast Cancer Conference Expert Panel. We studied 142 women with breast cancer who were positive to estrogen receptor and had been treated in São Paulo state, southeast Brazil. The expression of the following receptors was assessed by immunohistochemistry: estrogen, progesterone (PR) and Ki-67. The expression of HER-2 was measured by fluorescent in situ hybridization analysis in tissue microarray. There were 29 cases of luminal A breast cancers according to the 2011 St. Gallen International Breast Cancer Conference Expert Panel that were classified as luminal B-like in the 2013 version. Among the 65 luminal B-like breast cancer cases, 29 (45%) were previous luminal A tumors, 15 cases (20%) had a Ki-67 >14% and were at least 20% PR positive and 21 cases (35%) had Ki-67 >14% and more than 20% were PR positive. The 2013 St. Gallen consensus updated the definition of intrinsic molecular subtypes and increased the number of patients classified as having luminal B-like breast cancer in our series, for whom the use of cytotoxic drugs will probably be proposed with additional treatment cost.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Monte Carlo track structures (MCTS) simulations have been recognized as useful tools for radiobiological modeling. However, the authors noticed several issues regarding the consistency of reported data. Therefore, in this work, they analyze the impact of various user defined parameters on simulated direct DNA damage yields. In addition, they draw attention to discrepancies in published literature in DNA strand break (SB) yields and selected methodologies. The MCTS code Geant4-DNA was used to compare radial dose profiles in a nanometer-scale region of interest (ROI) for photon sources of varying sizes and energies. Then, electron tracks of 0.28 keV-220 keV were superimposed on a geometric DNA model composed of 2.7 × 10(6) nucleosomes, and SBs were simulated according to four definitions based on energy deposits or energy transfers in DNA strand targets compared to a threshold energy ETH. The SB frequencies and complexities in nucleosomes as a function of incident electron energies were obtained. SBs were classified into higher order clusters such as single and double strand breaks (SSBs and DSBs) based on inter-SB distances and on the number of affected strands. Comparisons of different nonuniform dose distributions lacking charged particle equilibrium may lead to erroneous conclusions regarding the effect of energy on relative biological effectiveness. The energy transfer-based SB definitions give similar SB yields as the one based on energy deposit when ETH ≈ 10.79 eV, but deviate significantly for higher ETH values. Between 30 and 40 nucleosomes/Gy show at least one SB in the ROI. The number of nucleosomes that present a complex damage pattern of more than 2 SBs and the degree of complexity of the damage in these nucleosomes diminish as the incident electron energy increases. DNA damage classification into SSB and DSB is highly dependent on the definitions of these higher order structures and their implementations. The authors' show that, for the four studied models, different yields are expected by up to 54% for SSBs and by up to 32% for DSBs, as a function of the incident electrons energy and of the models being compared. MCTS simulations allow to compare direct DNA damage types and complexities induced by ionizing radiation. However, simulation results depend to a large degree on user-defined parameters, definitions, and algorithms such as: DNA model, dose distribution, SB definition, and the DNA damage clustering algorithm. These interdependencies should be well controlled during the simulations and explicitly reported when comparing results to experiments or calculations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Matrix-assisted laser desorption/ionization time-of flight mass spectrometry (MALDI-TOF MS) has been widely used for the identification and classification of microorganisms based on their proteomic fingerprints. However, the use of MALDI-TOF MS in plant research has been very limited. In the present study, a first protocol is proposed for metabolic fingerprinting by MALDI-TOF MS using three different MALDI matrices with subsequent multivariate data analysis by in-house algorithms implemented in the R environment for the taxonomic classification of plants from different genera, families and orders. By merging the data acquired with different matrices, different ionization modes and using careful algorithms and parameter selection, we demonstrate that a close taxonomic classification can be achieved based on plant metabolic fingerprints, with 92% similarity to the taxonomic classifications found in literature. The present work therefore highlights the great potential of applying MALDI-TOF MS for the taxonomic classification of plants and, furthermore, provides a preliminary foundation for future research.