979 resultados para clustering techniques
Resumo:
Recent advances in non-destructive imaging techniques, such as X-ray computed tomography (CT), make it possible to analyse pore space features from the direct visualisation from soil structures. A quantitative characterisation of the three-dimensional solid-pore architecture is important to understand soil mechanics, as they relate to the control of biological, chemical, and physical processes across scales. This analysis technique therefore offers an opportunity to better interpret soil strata, as new and relevant information can be obtained. In this work, we propose an approach to automatically identify the pore structure of a set of 200-2D images that represent slices of an original 3D CT image of a soil sample, which can be accomplished through non-linear enhancement of the pixel grey levels and an image segmentation based on a PFCM (Possibilistic Fuzzy C-Means) algorithm. Once the solids and pore spaces have been identified, the set of 200-2D images is then used to reconstruct an approximation of the soil sample by projecting only the pore spaces. This reconstruction shows the structure of the soil and its pores, which become more bounded, less bounded, or unbounded with changes in depth. If the soil sample image quality is sufficiently favourable in terms of contrast, noise and sharpness, the pore identification is less complicated, and the PFCM clustering algorithm can be used without additional processing; otherwise, images require pre-processing before using this algorithm. Promising results were obtained with four soil samples, the first of which was used to show the algorithm validity and the additional three were used to demonstrate the robustness of our proposal. The methodology we present here can better detect the solid soil and pore spaces on CT images, enabling the generation of better 2D?3D representations of pore structures from segmented 2D images.
Resumo:
Non-technical losses (NTL) identification and prediction are important tasks for many utilities. Data from customer information system (CIS) can be used for NTL analysis. However, in order to accurately and efficiently perform NTL analysis, the original data from CIS need to be pre-processed before any detailed NTL analysis can be carried out. In this paper, we propose a feature selection based method for CIS data pre-processing in order to extract the most relevant information for further analysis such as clustering and classifications. By removing irrelevant and redundant features, feature selection is an essential step in data mining process in finding optimal subset of features to improve the quality of result by giving faster time processing, higher accuracy and simpler results with fewer features. Detailed feature selection analysis is presented in the paper. Both time-domain and load shape data are compared based on the accuracy, consistency and statistical dependencies between features.
Resumo:
This thesis seeks to describe the development of an inexpensive and efficient clustering technique for multivariate data analysis. The technique starts from a multivariate data matrix and ends with graphical representation of the data and pattern recognition discriminant function. The technique also results in distances frequency distribution that might be useful in detecting clustering in the data or for the estimation of parameters useful in the discrimination between the different populations in the data. The technique can also be used in feature selection. The technique is essentially for the discovery of data structure by revealing the component parts of the data. lhe thesis offers three distinct contributions for cluster analysis and pattern recognition techniques. The first contribution is the introduction of transformation function in the technique of nonlinear mapping. The second contribution is the us~ of distances frequency distribution instead of distances time-sequence in nonlinear mapping, The third contribution is the formulation of a new generalised and normalised error function together with its optimal step size formula for gradient method minimisation. The thesis consists of five chapters. The first chapter is the introduction. The second chapter describes multidimensional scaling as an origin of nonlinear mapping technique. The third chapter describes the first developing step in the technique of nonlinear mapping that is the introduction of "transformation function". The fourth chapter describes the second developing step of the nonlinear mapping technique. This is the use of distances frequency distribution instead of distances time-sequence. The chapter also includes the new generalised and normalised error function formulation. Finally, the fifth chapter, the conclusion, evaluates all developments and proposes a new program. for cluster analysis and pattern recognition by integrating all the new features.
Resumo:
With the explosive growth of the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. Areas dealing with these problems are crossing data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One unrevealed area in document clustering is that how to generate meaningful interpretation for the each document cluster resulted from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve the automatic summarization performance and apply it to newly emerging problems are two valuable research directions. To assist people to capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
Resumo:
Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine optimum in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.
Resumo:
Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine optimum in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.
Resumo:
Clustering algorithms, pattern mining techniques and associated quality metrics emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specificity of available data such as missing values, extreme values or outliers, creates a challenge to extract significant user models from an educational perspective. In this paper we introduce a pattern detection mechanism with-in our data analytics tool based on k-means clustering and on SSE, silhouette, Dunn index and Xi-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the performed monitoring activities created a strong basis for generating automatic feedback to learners in terms of their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely actions.
A new age of fuel performance code criteria studied through advanced atomistic simulation techniques
Resumo:
A fundamental step in understanding the effects of irradiation on metallic uranium and uranium dioxide ceramic fuels, or any material, must start with the nature of radiation damage on the atomic level. The atomic damage displacement results in a multitude of defects that influence the fuel performance. Nuclear reactions are coupled, in that changing one variable will alter others through feedback. In the field of fuel performance modeling, these difficulties are addressed through the use of empirical models rather than models based on first principles. Empirical models can be used as a predictive code through the careful manipulation of input variables for the limited circumstances that are closely tied to the data used to create the model. While empirical models are efficient and give acceptable results, these results are only applicable within the range of the existing data. This narrow window prevents modeling changes in operating conditions that would invalidate the model as the new operating conditions would not be within the calibration data set. This work is part of a larger effort to correct for this modeling deficiency. Uranium dioxide and metallic uranium fuels are analyzed through a kinetic Monte Carlo code (kMC) as part of an overall effort to generate a stochastic and predictive fuel code. The kMC investigations include sensitivity analysis of point defect concentrations, thermal gradients implemented through a temperature variation mesh-grid, and migration energy values. In this work, fission damage is primarily represented through defects on the oxygen anion sublattice. Results were also compared between the various models. Past studies of kMC point defect migration have not adequately addressed non-standard migration events such as clustering and dissociation of vacancies. As such, the General Utility Lattice Program (GULP) code was utilized to generate new migration energies so that additional non-migration events could be included into kMC code in the future for more comprehensive studies. Defect energies were calculated to generate barrier heights for single vacancy migration, clustering and dissociation of two vacancies, and vacancy migration while under the influence of both an additional oxygen and uranium vacancy.
Resumo:
Nigerian scam, also known as advance fee fraud or 419 scam, is a prevalent form of online fraudulent activity that causes financial loss to individuals and businesses. Nigerian scam has evolved from simple non-targeted email messages to more sophisticated scams targeted at users of classifieds, dating and other websites. Even though such scams are observed and reported by users frequently, the community’s understanding of Nigerian scams is limited since the scammers operate “underground”. To better understand the underground Nigerian scam ecosystem and seek effective methods to deter Nigerian scam and cybercrime in general, we conduct a series of active and passive measurement studies. Relying upon the analysis and insight gained from the measurement studies, we make four contributions: (1) we analyze the taxonomy of Nigerian scam and derive long-term trends in scams; (2) we provide an insight on Nigerian scam and cybercrime ecosystems and their underground operation; (3) we propose a payment intervention as a potential deterrent to cybercrime operation in general and evaluate its effectiveness; and (4) we offer active and passive measurement tools and techniques that enable in-depth analysis of cybercrime ecosystems and deterrence on them. We first created and analyze a repository of more than two hundred thousand user-reported scam emails, stretching from 2006 to 2014, from four major scam reporting websites. We select ten most commonly observed scam categories and tag 2,000 scam emails randomly selected from our repository. Based upon the manually tagged dataset, we train a machine learning classifier and cluster all scam emails in the repository. From the clustering result, we find a strong and sustained upward trend for targeted scams and downward trend for non-targeted scams. We then focus on two types of targeted scams: sales scams and rental scams targeted users on Craigslist. We built an automated scam data collection system and gathered large-scale sales scam emails. Using the system we posted honeypot ads on Craigslist and conversed automatically with the scammers. Through the email conversation, the system obtained additional confirmation of likely scam activities and collected additional information such as IP addresses and shipping addresses. Our analysis revealed that around 10 groups were responsible for nearly half of the over 13,000 total scam attempts we received. These groups used IP addresses and shipping addresses in both Nigeria and the U.S. We also crawled rental ads on Craigslist, identified rental scam ads amongst the large number of benign ads and conversed with the potential scammers. Through in-depth analysis of the rental scams, we found seven major scam campaigns employing various operations and monetization methods. We also found that unlike sales scammers, most rental scammers were in the U.S. The large-scale scam data and in-depth analysis provide useful insights on how to design effective deterrence techniques against cybercrime in general. We study underground DDoS-for-hire services, also known as booters, and measure the effectiveness of undermining a payment system of DDoS Services. Our analysis shows that the payment intervention can have the desired effect of limiting cybercriminals’ ability and increasing the risk of accepting payments.
Resumo:
© 2014 Cises This work is distributed with License Creative Commons Attribution-Non commercial-No derivatives 4.0 International (CC BY-BC-ND 4.0)
Resumo:
The aim of this investigation was to compare the skeletal stability of three different rigid fixation methods after mandibular advancement. Fifty-five class II malocclusion patients treated with the use of bilateral sagittal split ramus osteotomy and mandibular advancement were selected for this retrospective study. Group 1 (n = 17) had miniplates with monocortical screws, Group 2 (n = 16) had bicortical screws and Group 3 (n = 22) had the osteotomy fixed by means of the hybrid technique. Cephalograms were taken preoperatively, 1 week within the postoperative care period, and 6 months after the orthognathic surgery. Linear and angular changes of the cephalometric landmarks of the chin region were measured at each period, and the changes at each cephalometric landmark were determined for the time gaps. Postoperative changes in the mandibular shape were analyzed to determine the stability of fixation methods. There was minimum difference in the relapse of the mandibular advancement among the three groups. Statistical analysis showed no significant difference in postoperative stability. However, a positive correlation between the amount of advancement and the amount of postoperative relapse was demonstrated by the linear multiple regression test (p < 0.05). It can be concluded that all techniques can be used to obtain stable postoperative results in mandibular advancement after 6 months.
Resumo:
Quantification of dermal exposure to pesticides in rural workers, used in risk assessment, can be performed with different techniques such as patches or whole body evaluation. However, the wide variety of methods can jeopardize the process by producing disparate results, depending on the principles in sample collection. A critical review was thus performed on the main techniques for quantifying dermal exposure, calling attention to this issue and the need to establish a single methodology for quantification of dermal exposure in rural workers. Such harmonization of different techniques should help achieve safer and healthier working conditions. Techniques that can provide reliable exposure data are an essential first step towards avoiding harm to workers' health.
Resumo:
The Centers for High Cost Medication (Centros de Medicação de Alto Custo, CEDMAC), Health Department, São Paulo were instituted by project in partnership with the Clinical Hospital of the Faculty of Medicine, USP, sponsored by the Foundation for Research Support of the State of São Paulo (Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP) aimed at the formation of a statewide network for comprehensive care of patients referred for use of immunobiological agents in rheumatological diseases. The CEDMAC of Hospital de Clínicas, Universidade Estadual de Campinas (HC-Unicamp), implemented by the Division of Rheumatology, Faculty of Medical Sciences, identified the need for standardization of the multidisciplinary team conducts, in face of the specificity of care conducts, verifying the importance of describing, in manual format, their operational and technical processes. The aim of this study is to present the methodology applied to the elaboration of the CEDMAC/HC-Unicamp Manual as an institutional tool, with the aim of offering the best assistance and administrative quality. In the methodology for preparing the manuals at HC-Unicamp since 2008, the premise was to obtain a document that is participatory, multidisciplinary, focused on work processes integrated with institutional rules, with objective and didactic descriptions, in a standardized format and with electronic dissemination. The CEDMAC/HC-Unicamp Manual was elaborated in 10 months, with involvement of the entire multidisciplinary team, with 19 chapters on work processes and techniques, in addition to those concerning the organizational structure and its annexes. Published in the electronic portal of HC Manuals in July 2012 as an e-Book (ISBN 978-85-63274-17-5), the manual has been a valuable instrument in guiding professionals in healthcare, teaching and research activities.
Resumo:
Garlic is a spice and a medicinal plant; hence, there is an increasing interest in 'developing' new varieties with different culinary properties or with high content of nutraceutical compounds. Phenotypic traits and dominant molecular markers are predominantly used to evaluate the genetic diversity of garlic clones. However, 24 SSR markers (codominant) specific for garlic are available in the literature, fostering germplasm researches. In this study, we genotyped 130 garlic accessions from Brazil and abroad using 17 polymorphic SSR markers to assess the genetic diversity and structure. This is the first attempt to evaluate a large set of accessions maintained by Brazilian institutions. A high level of redundancy was detected in the collection (50 % of the accessions represented eight haplotypes). However, non-redundant accessions presented high genetic diversity. We detected on average five alleles per locus, Shannon index of 1.2, HO of 0.5, and HE of 0.6. A core collection was set with 17 accessions, covering 100 % of the alleles with minimum redundancy. Overall FST and D values indicate a strong genetic structure within accessions. Two major groups identified by both model-based (Bayesian approach) and hierarchical clustering (UPGMA dendrogram) techniques were coherent with the classification of accessions according to maturity time (growth cycle): early-late and midseason accessions. Assessing genetic diversity and structure of garlic collections is the first step towards an efficient management and conservation of accessions in genebanks, as well as to advance future genetic studies and improvement of garlic worldwide.
Resumo:
Abstract The aim of this study was to evaluate three transfer techniques used to obtain working casts of implant-supported prostheses through the marginal misfit and strain induced to metallic framework. Thirty working casts were obtained from a metallic master cast, each one containing two implant analogues simulating a clinical situation of three-unit implant-supported fixed prostheses, according to the following transfer impression techniques: Group A, squared transfers splinted with dental floss and acrylic resin, sectioned and re-splinted; Group B, squared transfers splinted with dental floss and bis-acrylic resin; and Group N, squared transfers not splinted. A metallic framework was made for marginal misfit and strain measurements from the metallic master cast. The misfit between metallic framework and the working casts was evaluated with an optical microscope following the single-screw test protocol. In the same conditions, the strain was evaluated using strain gauges placed on the metallic framework. The data was submitted to one-way ANOVA followed by the Tukey's test (α=5%). For both marginal misfit and strain, there were statistically significant differences between Groups A and N (p<0.01) and Groups B and N (p<0.01), with greater values for the Group N. According to the Pearson's test, there was a positive correlation between the variables misfit and strain (r=0.5642). The results of this study showed that the impression techniques with splinted transfers promoted better accuracy than non-splinted one, regardless of the splinting material utilized.