40 results for: Data clustering. Fuzzy C-Means. Cluster centers initialization. Validation indices
Abstract:
Growing pot poinsettia and similar crops involves careful crop monitoring and management to ensure that height specifications are met. Graphical tracking represents a target-driven approach to decision support with simple interpretation. The HDC (Horticultural Development Council) Poinsettia Tracker implements a graphical track based on the Generalised Logistic Curve, similar to that of other tracking packages. Any set of curve parameters can be used to track crop progress. However, graphical tracks must be expected to be site- and cultivar-specific. By providing a simple curve-fitting function, the tool lets growers easily develop their own site- and variety-specific ideal tracks based on past records, with increasing quality as more seasons' data are added. (C) 2009 Elsevier B.V. All rights reserved.
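The target-track idea above can be sketched with the Generalised Logistic (Richards) Curve. The parameter values and the 15-week schedule below are illustrative assumptions, not HDC Tracker defaults.

```python
import math

def generalised_logistic(t, A, K, B, nu, M):
    """Generalised Logistic (Richards) curve: target height at time t.

    A: lower asymptote, K: upper asymptote, B: growth rate,
    nu: shape parameter, M: time of maximum growth.
    """
    return A + (K - A) / (1.0 + math.exp(-B * (t - M))) ** (1.0 / nu)

# Hypothetical ideal track for a 15-week crop (heights in cm).
A, K, B, nu, M = 5.0, 30.0, 0.6, 1.0, 8.0
track = [generalised_logistic(week, A, K, B, nu, M) for week in range(16)]

def deviation(observed_height, week):
    """Signed deviation of an observed height from the target track (cm)."""
    return observed_height - generalised_logistic(week, A, K, B, nu, M)
```

A grower-specific track would be obtained by fitting A, K, B, nu and M to past seasons' height records rather than fixing them as above.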
Abstract:
Pulsed Phase Thermography (PPT) has proven effective for depth retrieval of flat-bottomed holes in different materials such as plastics and aluminum. In PPT, amplitude and phase delay signatures are available following data acquisition (carried out in a similar way to classical Pulsed Thermography), by applying a transformation algorithm such as the Fourier Transform (FT) to thermal profiles. The authors have recently presented an extended review of PPT theory, including a new inversion technique for depth retrieval that correlates the depth with the blind frequency fb (the frequency at which a defect produces enough phase contrast to be detected). An automatic defect depth retrieval algorithm has also been proposed, evidencing PPT's capabilities as a practical inversion technique. In addition, the use of normalized parameters to account for defect size variation, as well as depth retrieval from complex-shaped composites (GFRP and CFRP), is currently under investigation. In this paper, steel plates containing flat-bottomed holes at different depths (from 1 to 4.5 mm) are tested by quantitative PPT. Least squares regression results show excellent agreement between depth and the inverse square root of the blind frequency, which can be used for depth inversion. Experimental results on steel plates with simulated corrosion are presented as well. It is worth noting that results are improved by performing PPT on reconstructed (synthetic) rather than on raw thermal data.
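The depth-inversion regression described above can be sketched as an ordinary least squares fit of depth against the inverse square root of the blind frequency. The (depth, fb) pairs below are synthetic values constructed for illustration, not measurements from the paper.

```python
import math

# Synthetic calibration pairs: (defect depth in mm, blind frequency fb in Hz),
# constructed so that depth is exactly proportional to fb**(-1/2).
samples = [(1.0, 1.0), (2.0, 0.25), (3.0, 1.0 / 9.0), (4.0, 0.0625)]

x = [1.0 / math.sqrt(fb) for _, fb in samples]   # regressor: fb**(-1/2)
y = [depth for depth, _ in samples]

# Closed-form simple least squares fit of y = intercept + slope * x.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

def invert_depth(fb):
    """Estimate defect depth (mm) from an observed blind frequency (Hz)."""
    return intercept + slope / math.sqrt(fb)
```

In practice the slope and intercept would be calibrated per material from phase-contrast measurements, and the fitted line then used for depth inversion on unknown defects.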
Abstract:
Inspired by a type of synesthesia in which colour typically induces musical notes, the MusiCam project investigates this unusual condition, particularly the transition from colour to sound. MusiCam explores the potential benefits of this idiosyncrasy as a mode of human-computer interaction (1-10), providing a host of meaningful applications spanning control, communication and composition. Colour data are interpreted by means of an off-the-shelf webcam, and music is generated in real time through regular speakers. By making colour-based gestures, users can actively control the parameters of sounds, compose melodies and motifs, or mix multiple tracks on the fly. The system shows great potential as an interactive medium and as a musical controller. The trials conducted to date have produced encouraging results, and only hint at the new possibilities achievable by such a device.
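A minimal sketch of the colour-to-sound idea: map the hue of an RGB pixel onto notes of a pentatonic scale. The scale choice and the MIDI mapping are assumptions for illustration; MusiCam's actual mapping is not described in the abstract.

```python
import colorsys

# C major pentatonic scale as MIDI note numbers (an assumed scale,
# chosen because any hue then maps to a consonant note).
PENTATONIC = [60, 62, 64, 67, 69]

def colour_to_midi(r, g, b):
    """Map an RGB colour (0-255 per channel) to a MIDI note number.

    Hue is taken from the HSV representation and divided into as many
    bands as there are notes in the scale.
    """
    hue, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return PENTATONIC[int(hue * len(PENTATONIC)) % len(PENTATONIC)]
```

A webcam loop would repeatedly sample a region of the frame, average its colour, and send the resulting note to a synthesizer; saturation or value could plausibly drive velocity or volume.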
Abstract:
Satellite measurements of the radiation budget and data from the U.S. National Centers for Environmental Prediction–National Center for Atmospheric Research reanalysis are used to investigate the links between anomalous cloud radiative forcing over the tropical west Pacific warm pool and the tropical dynamics and sea surface temperature (SST) distribution during 1998. The ratio, N, of the shortwave cloud forcing (SWCF) to longwave cloud forcing (LWCF) (N = −SWCF/LWCF) is used to infer information on cloud altitude. A higher than average N during 1998 appears to be related to two separate phenomena. First, dynamic regime-dependent changes explain high values of N (associated with low cloud altitude) for small magnitudes of SWCF and LWCF (low cloud fraction), which reflect the unusual occurrence of mean subsiding motion over the tropical west Pacific during 1998, associated with the anomalous SST distribution. Second, Tropics-wide long-term changes in the spatial-mean cloud forcing, independent of dynamic regime, explain the higher values of N during both 1998 and in 1994/95. The changes in dynamic regime and their anomalous structure in 1998 are well simulated by version HadAM3 of the Hadley Centre climate model, forced by the observed SSTs. However, the LWCF and SWCF are poorly simulated, as are the interannual changes in N. It is argued that improved representation of LWCF and SWCF and their dependence on dynamical forcing are required before the cloud feedbacks simulated by climate models can be trusted.
Abstract:
Background: The auditory brainstem response (ABR) is of fundamental importance to the investigation of auditory system behavior, though its interpretation has a subjective nature because of the manual process employed in its study and the clinical experience required for its analysis. When analyzing the ABR, clinicians are often interested in the identification of ABR signal components referred to as Jewett waves. In particular, the detection and study of the time at which these waves occur (i.e., the wave latency) is a practical tool for the diagnosis of disorders affecting the auditory system. In this context, the aim of this research is to compare the manual/visual ABR analyses provided by different examiners. Methods: The ABR data were collected from 10 normal-hearing subjects (5 men and 5 women, aged 20 to 52 years). A total of 160 data samples were analyzed and a pairwise comparison between four distinct examiners was executed. We carried out a statistical study aiming to identify significant differences between the assessments provided by the examiners, using linear regression in conjunction with the bootstrap as a method for evaluating the relation between the examiners' responses. Results: The analysis suggests agreement among examiners but reveals differences between assessments of the variability of the waves. We quantified the magnitude of the obtained wave latency differences: 18% of the investigated waves presented substantial (large or moderate) differences, and of these 3.79% were considered not acceptable for clinical practice. Conclusions: Our results characterize the variability of the manual analysis of ABR data and the necessity of establishing unified standards and protocols for the analysis of these data. These results may also contribute to the validation and development of automatic systems employed in the early diagnosis of hearing loss.
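The agreement analysis described above can be sketched as ordinary least squares between two examiners' wave latencies, with a bootstrap percentile interval for the slope. The latency values below are synthetic, not the study data.

```python
import random

random.seed(42)
# Synthetic wave-V latencies (ms): examiner B agrees with examiner A
# up to small reading noise (assumed 0.05 ms standard deviation).
examiner_a = [5.0 + 0.1 * i for i in range(40)]
examiner_b = [a + random.gauss(0.0, 0.05) for a in examiner_a]

def ols_slope(x, y):
    """Slope of the simple least squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)

# Bootstrap: resample (a, b) pairs with replacement and refit the slope.
slopes = []
for _ in range(1000):
    idx = [random.randrange(len(examiner_a)) for _ in range(len(examiner_a))]
    slopes.append(ols_slope([examiner_a[i] for i in idx],
                            [examiner_b[i] for i in idx]))
slopes.sort()
ci_low, ci_high = slopes[24], slopes[974]  # approximate 95% interval
```

A slope interval tightly enclosing 1 is consistent with agreement between examiners; systematic deviation from 1 would indicate a proportional bias between their readings.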
Abstract:
Precipitation indices are commonly used as climate change indicators. Considering four Climate Variability and Predictability-recommended indices, this study assesses possible changes in their spatial patterns over Portugal under future climatic conditions. Precipitation data from the regional climate model Consortium for Small-Scale Modelling–Climate version of the Local Model (CCLM) ensemble simulations with ECHAM5/MPI-OM1 boundary conditions are used for this purpose. For recent–past, medians and probability density functions of the CCLM-based indices are validated against station-based and gridded observational dataset from ENSEMBLES-based (gridded daily precipitation data provided by the European Climate Assessment & Dataset project) indices. It is demonstrated that the model is able to realistically reproduce not only precipitation but also the corresponding extreme indices. Climate change projections for 2071–2100 (A1B and B1 SRES scenarios) reveal significant decreases in total precipitation, particularly in autumn over northwestern and southern Portugal, though changes exhibit distinct local and seasonal patterns and are typically stronger for A1B than for B1. The increase in winter precipitation over northeastern Portugal in A1B is the most important exception to the overall drying trend. Contributions of extreme precipitation events to total precipitation are also expected to increase, mainly in winter and spring over northeastern Portugal. Strong projected increases in the dry spell lengths in autumn and spring are also noteworthy, giving evidence for an extension of the dry season from summer to spring and autumn. Although no coupling analysis is undertaken, these changes are qualitatively related to modifications in the large-scale circulation over the Euro-Atlantic area, more specifically to shifts in the position of the Azores High and associated changes in the large-scale pressure gradient over the area.
Abstract:
Income growth in highly industrialised countries has resulted in consumer choice of foodstuffs no longer being primarily influenced by basic factors such as price and organoleptic features. From this perspective, the present study sets out to evaluate how and to what extent consumer choice is influenced by the possible negative effects on health and environment caused by the consumption of fruit containing deposits of pesticides and chemical products. The study describes the results of a survey which explores and estimates consumer willingness to pay in two forms: a yearly contribution for the abolition of the use of pesticides on fruit, and a premium price for organically grown apples guaranteed by a certified label. The same questionnaire was administered to two samples. The first was a conventional face-to-face survey of customers of large retail outlets located around Bologna (Italy); the second was an Internet sample. The discrete choice data were analysed by means of probit and tobit models to estimate the utility consumers attribute to organically grown fruit and to a pesticide ban. The research also addresses questions of validity and representativeness as a fundamental problem in web-based surveys.
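A sketch of the discrete-choice idea behind the probit analysis: with a single binary regressor (a certified organic label shown or not), the probit coefficients have a closed form via the inverse normal CDF of the two groups' purchase rates. The respondent counts below are invented for illustration, not survey results.

```python
from statistics import NormalDist

# Hypothetical split-sample purchase counts.
shown, bought_shown = 200, 130      # respondents shown the certified label
control, bought_control = 200, 90   # respondents not shown the label

z = NormalDist()  # standard normal
# Probit with one binary regressor: P(buy) = Phi(beta0 + beta1 * label).
# The saturated model fits the two observed rates exactly, so the
# coefficients follow directly from the inverse CDF.
beta0 = z.inv_cdf(bought_control / control)        # intercept
beta1 = z.inv_cdf(bought_shown / shown) - beta0    # label effect

def purchase_prob(label_shown):
    """Predicted purchase probability under the fitted probit model."""
    return z.cdf(beta0 + beta1 * (1 if label_shown else 0))
```

With continuous regressors such as a premium price, the coefficients would instead be estimated by maximum likelihood, as in the probit and tobit models the study applies.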
Abstract:
1. Comparative analyses are used to address the key question of what makes a species more prone to extinction by exploring the links between vulnerability and intrinsic species traits and/or extrinsic factors. This approach requires comprehensive species data, but information is rarely available for all species of interest. As a result, comparative analyses often rely on subsets of relatively few species that are assumed to be representative samples of the overall studied group. 2. Our study challenges this assumption and quantifies the taxonomic, spatial, and data type biases associated with the quantity of data available for 5415 mammalian species using the freely available life-history database PanTHERIA. 3. Moreover, we explore how existing biases influence results of comparative analyses of extinction risk by using subsets of data that attempt to correct for detected biases. In particular, we focus on links between four species traits commonly linked to vulnerability (distribution range area, adult body mass, population density and gestation length) and conduct univariate and multivariate analyses to understand how biases affect model predictions. 4. Our results show important biases in data availability, with c. 22% of mammals completely lacking data. Missing data, which appear not to be missing at random, occur frequently in all traits (14–99% of cases missing). Data availability is explained by intrinsic traits, with larger mammals occupying bigger range areas being the best studied. Importantly, we find that existing biases affect the results of comparative analyses by overestimating the risk of extinction and changing which traits are identified as important predictors. 5. Our results raise concerns over our ability to draw general conclusions regarding what makes a species more prone to extinction.
Missing data represent a prevalent problem in comparative analyses and, unfortunately, because data are not missing at random, conventional approaches to filling data gaps are either not valid or present important challenges. These results show the importance of making appropriate inferences from comparative analyses by focusing on the subset of species for which data are available. Ultimately, addressing the data bias problem requires greater investment in data collection and dissemination, as well as the development of methodological approaches to effectively correct existing biases.
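The bias mechanism described above can be sketched with a simulation: when data availability depends on a trait (larger-bodied species are better studied), a complete-case mean overestimates the true trait mean. All values are simulated for illustration, not PanTHERIA data.

```python
import random

random.seed(1)
# Simulated trait: log10 adult body mass for 5000 hypothetical species.
log_mass = [random.gauss(2.0, 1.0) for _ in range(5000)]

def observed(m):
    """Data are NOT missing at random: the probability that a species
    has been studied grows with body mass (assumed selection model)."""
    return random.random() < min(1.0, max(0.05, 0.3 + 0.2 * (m - 2.0)))

complete_case = [m for m in log_mass if observed(m)]

true_mean = sum(log_mass) / len(log_mass)
cc_mean = sum(complete_case) / len(complete_case)
# cc_mean exceeds true_mean: the complete-case subset is biased upward.
```

This is the same reason conventional gap-filling fails here: any method trained only on the observed subset inherits the selection bias.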
Abstract:
The Mersey Basin has been significantly polluted for over 200 years. However, there is a lack of quantitative historical water quality data as effective water quality monitoring and data recording only began 30-40 years ago. This paper assesses water pollution in the Mersey Basin using a Water Pollution Index constructed from social and economic data. Methodology, output and the difficulties involved with validation are discussed. With the limited data input available the index approximately reproduces historical water quality. The paper illustrates how historical studies of environmental water quality may provide valuable identification of factors responsible for pollution and a marker set for contemporary and future water quality issues in the context of the past. This is an issue of growing research interest.
Abstract:
This study details validation of two separate multiplex STR systems for use in paternity investigations. These are the Second Generation Multiplex (SGM) developed by the UK Forensic Science Service and the PowerPlex 1 multiplex commercially available from Promega Inc. (Madison, WI, USA). These multiplexes contain 12 different STR systems (two are duplicated in the two systems). Population databases from Caucasian, Asian and Afro-Caribbean populations have been compiled for all loci. In all but two of the 36 STR/ethnic group combinations, no evidence was obtained to indicate inconsistency with Hardy-Weinberg (HW) proportions. Empirical and theoretical approaches have been taken to validate these systems for paternity testing. Samples from 121 cases of disputed paternity were analysed using established Single Locus Probe (SLP) tests currently in use, and also using the two multiplex STR systems. Results of all three test systems were compared and no non-conformities in the conclusions were observed, although four examples of apparent germ line mutations in the STR systems were identified. The data were analysed to give information on expected paternity indices and exclusion rates for these STR systems. The 12 systems combined comprise a highly discriminating test suitable for paternity testing. 99.96% of non-fathers are excluded from paternity on two or more STR systems. Where no exclusion is found, Paternity Index (PI) values of > 10,000 are expected in > 96% of cases.
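A sketch of how per-locus results combine in paternity testing: paternity indices multiply across independent loci, and exclusion probabilities combine as complements. The per-locus values below are illustrative placeholders, not figures from the validation study.

```python
# Hypothetical per-locus Paternity Index (PI) values for 12 STR loci:
# each PI is the likelihood ratio of the genetic evidence under
# paternity versus non-paternity at that locus.
locus_pi = [2.5, 1.8, 3.1, 2.2, 4.0, 1.6, 2.9, 3.3, 2.0, 2.4, 1.9, 3.6]
# Hypothetical per-locus probabilities of excluding a random non-father.
locus_pe = [0.45, 0.38, 0.52, 0.41, 0.55, 0.33, 0.48, 0.50, 0.40, 0.44, 0.37, 0.53]

# Assuming independent loci, the combined PI is the product of the
# per-locus likelihood ratios.
combined_pi = 1.0
for pi in locus_pi:
    combined_pi *= pi

# A non-father escapes exclusion only if he escapes at every locus,
# so the combined exclusion probability is one minus that product.
combined_pe = 1.0
for pe in locus_pe:
    combined_pe *= (1.0 - pe)
combined_pe = 1.0 - combined_pe
```

Even with individually modest per-locus values, twelve loci combine into a PI in the tens of thousands and an exclusion probability above 99.9%, which is the effect the abstract's figures reflect.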
Abstract:
Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottleneck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described. We have compared a number of methods for characterising bacterial genomic diversity and for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established methods and a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
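The kernel-density cut-off idea can be sketched as follows: CGH log-ratios tend to form two modes (present versus absent/divergent genes); a Gaussian KDE is evaluated on a grid and the density minimum between the modes is taken as the presence/absence cut-off. The data, mode locations, and bandwidth below are simulated assumptions, not values from the paper.

```python
import math
import random

random.seed(7)
# Simulated log-ratios: "present" genes near 0, "absent/divergent" near -2.
present = [random.gauss(0.0, 0.2) for _ in range(300)]
absent = [random.gauss(-2.0, 0.3) for _ in range(100)]
log_ratios = present + absent

def kde(x, data, bandwidth=0.15):
    """Gaussian kernel density estimate at point x."""
    return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / \
           (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

grid = [-3.0 + 0.02 * i for i in range(201)]   # evaluation grid over [-3, 1]
density = [kde(g, log_ratios) for g in grid]

# Dynamic cut-off: the grid point of minimum density between the two modes.
between = [(d, g) for g, d in zip(grid, density) if -2.0 < g < 0.0]
cutoff = min(between)[1]
```

Because the cut-off is re-estimated from each array's own density, it adapts to strain-to-strain shifts in the log-ratio distribution, which fixed thresholds borrowed from tumour CGH analysis do not.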
Abstract:
This paper deals with the selection of centres for radial basis function (RBF) networks. A novel mean-tracking clustering algorithm is described as a way in which centres can be chosen based on a batch of collected data. A direct comparison is made between the mean-tracking algorithm and k-means clustering, and it is shown that mean-tracking clustering is significantly better in terms of achieving an RBF network which performs accurate function modelling.
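The mean-tracking algorithm itself is not specified in the abstract, so as a sketch here is the k-means baseline it is compared against: cluster the training inputs and use the cluster means as RBF centres. One-dimensional data and a deterministic farthest-point initialisation are assumptions for brevity.

```python
import random

random.seed(3)
# Synthetic 1-D training inputs drawn around three well-separated modes.
data = [random.gauss(c, 0.3) for c in (0.0, 5.0, 10.0) for _ in range(50)]

def kmeans_centres(data, k, iters=20):
    """Choose k RBF centres as the k-means cluster means of the inputs."""
    # Farthest-point initialisation keeps this sketch deterministic.
    centres = [min(data)]
    while len(centres) < k:
        centres.append(max(data, key=lambda x: min(abs(x - c) for c in centres)))
    for _ in range(iters):  # Lloyd iterations: assign, then re-average
        clusters = [[] for _ in range(k)]
        for x in data:
            clusters[min(range(k), key=lambda j: abs(x - centres[j]))].append(x)
        centres = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
    return sorted(centres)

centres = kmeans_centres(data, 3)
```

Each centre would then anchor one Gaussian basis function, with widths set from the cluster spreads; the paper's claim is that mean-tracking places such centres better than this baseline.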
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
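A minimal sketch of the gossip-averaging building block behind an epidemic-style K-Means: each node holds local (sum, count) statistics for a centroid, and repeated pairwise averaging drives every node's sum/count ratio toward the global mean without any global reduction step. The network size and data are illustrative; the actual Epidemic K-Means protocol is more elaborate than this.

```python
import random

random.seed(11)
n_nodes = 32
# Each node holds a few local data points (one centroid's worth, in 1-D).
node_points = [[random.uniform(0.0, 10.0) for _ in range(random.randint(1, 5))]
               for _ in range(n_nodes)]
local_sum = [sum(p) for p in node_points]
local_count = [float(len(p)) for p in node_points]
global_mean = sum(local_sum) / sum(local_count)  # the centralised answer

# Gossip rounds: a random pair of nodes averages both statistics.
# Averaging preserves the network-wide totals, so every node's
# sum/count ratio converges to the global mean.
for _ in range(1000):
    i, j = random.sample(range(n_nodes), 2)
    avg_s = (local_sum[i] + local_sum[j]) / 2.0
    avg_c = (local_count[i] + local_count[j]) / 2.0
    local_sum[i] = local_sum[j] = avg_s
    local_count[i] = local_count[j] = avg_c

estimates = [s / c for s, c in zip(local_sum, local_count)]
```

Fault tolerance follows from the same structure: losing a message or a node degrades the estimate gracefully instead of stalling a global synchronisation barrier.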
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
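The straightforward parallel formulation discussed above can be sketched as follows: each node computes partial sums and counts for its local block of the data, and a per-iteration global reduction combines them. It is exactly this global step that the proposed formulation removes. One-dimensional data and a simulated reduction are assumptions for brevity.

```python
def assign(x, centroids):
    """Index of the nearest centroid to point x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))

def local_step(block, centroids):
    """Per-node work: partial sums and counts for each centroid."""
    k = len(centroids)
    sums, counts = [0.0] * k, [0] * k
    for x in block:
        j = assign(x, centroids)
        sums[j] += x
        counts[j] += 1
    return sums, counts

def global_reduction(partials, centroids):
    """The all-reduce step: combine all partials, update centroids."""
    k = len(centroids)
    sums, counts = [0.0] * k, [0] * k
    for s, c in partials:
        for j in range(k):
            sums[j] += s[j]
            counts[j] += c[j]
    return [sums[j] / counts[j] if counts[j] else centroids[j]
            for j in range(k)]

# Four "nodes", each holding one block of the data; two clusters.
blocks = [[0.1, 0.2], [0.3, 9.8], [9.9, 10.1], [10.2, 0.0]]
centroids = [0.0, 10.0]
for _ in range(5):  # every iteration pays one global reduction
    partials = [local_step(b, centroids) for b in blocks]
    centroids = global_reduction(partials, centroids)
```

The proposed formulation replaces `global_reduction` with communication confined to nodes owning adjacent regions of a non-uniform (e.g. kd-tree-induced) partition, while keeping the update deterministic.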