974 results for cluster analysis


Relevance:

70.00%

Publisher:

Abstract:

The echolocation calls of long-tailed bats (Chalinolobus tuberculatus) were recorded in the Eglinton Valley, Fiordland, New Zealand, and digitized for analysis with signal-processing software. Univariate and multivariate analyses of measured features facilitated a quantitative classification of the calls. Cluster analysis was used to categorize calls into two groups equating to the search and terminal buzz calls described qualitatively for other species. When moving from search to terminal phases, the calls decrease in bandwidth, maximum and minimum frequency, and duration. Search calls begin with a steep downward FM sweep followed by a short, less-modulated component. Buzz calls are FM sweeps. Although not found quantitatively, a broad pre-buzz group of calls was also identified. Ambiguity analysis of calls from the three groups shows that search-phase calls are well suited to resolving the velocity of targets and, hence, to identifying moving targets in stationary clutter. Pre-buzz and buzz calls are better suited to resolving range, a feature that may aid the bats in capturing evasive prey after it has been identified.

Relevance:

70.00%

Publisher:

Abstract:

Human expert analyses are commonly used in bioacoustic studies and can potentially limit the reproducibility of these results. In this paper, a machine learning method is presented to statistically classify avian vocalizations. Automated approaches were applied to isolate bird songs from long field recordings, assess song similarities, and classify songs into distinct variants. Because no positive controls were available to assess the true classification of variants, multiple replicates of automatic classification of song variants were analyzed to investigate clustering uncertainty. The automatic classifications were more similar to the expert classifications than expected by chance. Application of these methods demonstrated the presence of discrete song variants in an island population of the New Zealand hihi (Notiomystis cincta). The geographic patterns of song variation were then revealed by integrating over classification replicates. Because this automated approach considers variation in song variant classification, it reduces potential human bias and facilitates the reproducibility of the results.

Relevance:

70.00%

Publisher:

Abstract:

A novel near-infrared spectroscopy (NIRS) method has been researched and developed for the simultaneous analysis of the chemical components and associated properties of mint (Mentha haplocalyx Briq.) tea samples. The common analytes were total polysaccharide content, total flavonoid content, total phenolic content, and total antioxidant activity. To resolve the NIRS data matrix for such analyses, least squares support vector machines was found to be the best chemometrics method for prediction, although it was closely followed by the radial basis function/partial least squares model. Interestingly, the commonly used partial least squares method was unsatisfactory in this case. Additionally, principal component analysis and hierarchical cluster analysis were able to distinguish the mint samples according to their four geographical provinces of origin, and this was further facilitated by the use of the chemometrics classification methods: K-nearest neighbors, linear discriminant analysis, and partial least squares discriminant analysis. In general, given the potential savings in sampling and analysis time, as well as in the cost of the special analytical reagents required for the standard individual methods, NIRS offered a very attractive alternative for the simultaneous analysis of mint samples.

Relevance:

70.00%

Publisher:

Abstract:

Quality of fresh-cut carambola (Averrhoa carambola L.) is related to many chemical and biochemical variables, especially those involved in softening and browning, both of which are influenced by storage temperature. To study these effects, a multivariate analysis was used to evaluate slices packaged in vacuum-sealed polyolefin bags and stored at 2.5 °C, 5 °C and 10 °C for up to 16 d. The quality of slices at each temperature was correlated with the duration of storage, O2 and CO2 concentration in the package, physical-chemical constituents, and the activity of enzymes involved in softening (PG) and browning (PPO) metabolism. Three quality groups were identified by hierarchical cluster analysis, and the classification of the components within each of these groups was obtained from a principal component analysis (PCA). The characterization of samples by PCA clearly distinguished acceptable and non-acceptable slices. According to PCA, acceptable slices presented higher ascorbic acid content, greater hue angles (°h) and final lightness (L-5) in the first principal component (PC1). On the other hand, non-acceptable slices presented higher total pectin content and PPO activity in PC1. Non-acceptable slices also presented higher soluble pectin content, increased pectin solubilisation and higher CO2 concentration in the second principal component (PC2), whereas acceptable slices showed lower total sugar content. The hierarchical cluster and PCA analyses were useful for discriminating the quality of slices stored at different temperatures.
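
To make the grouping step concrete, here is a minimal sketch of agglomerative hierarchical clustering with average linkage. It is not the authors' procedure: the abstract does not state the linkage or distance used, the function name `average_linkage` is an illustrative choice, and the six two-variable samples are made-up numbers. In practice the variables would usually be standardized first.

```python
import math
from itertools import combinations

def average_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (average linkage)."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per sample

    def dist(c1, c2):
        # Average Euclidean distance over all cross-cluster pairs of samples.
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the pair of clusters (a < b) with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical measurements for six slices (two quality variables each).
samples = [(1.0, 1.1), (1.2, 0.9), (5.0, 5.2), (5.1, 4.9), (9.0, 9.1), (8.8, 9.3)]
groups = average_linkage(samples, n_clusters=3)
print(groups)
```

Each inner list holds the indices of the samples assigned to one group; a PCA on the same variables could then be used to see which variables separate the groups.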

Relevance:

70.00%

Publisher:

Abstract:

Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities, and the vastness of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and also within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method involves the inference of genetic population structures via an unsupervised clustering framework where the dependence between loci is represented using graphical models. The population dynamics that generate such a population stratification were investigated using a stochastic model, in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis, and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step for understanding the overall influences of environmental and ecological perturbations on bacterial diversity.
A common feature of T-RFLP and FAME data is zero-inflation, which means that zero values are observed much more frequently than would be expected, for example, from a Poisson distribution in the discrete case, or a Gaussian distribution in the continuous case. We provided two strategies for modeling zero-inflation in the clustering framework, which were validated on both synthetic and empirical complex data sets. We show in the thesis that our model, which takes into account dependencies between loci in MLST data, can produce better clustering results than methods that assume independent loci. Furthermore, computer algorithms that are efficient in analyzing large-scale data were adopted to meet the increasing computational need. Our method for detecting homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data, including T-RFLP and FAME data, provides an initial effort toward discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
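
The notion of zero-inflation described above can be illustrated with a short check that compares the observed share of zeros with the share a Poisson model with the same mean would predict, P(X = 0) = exp(-mean). This is only a diagnostic sketch, not one of the thesis's modeling strategies; the function name `zero_excess` and the abundance values are made up for illustration.

```python
import math

def zero_excess(counts):
    """Return (observed share of zeros, share of zeros expected under a Poisson fit)."""
    n = len(counts)
    mean = sum(counts) / n                       # Poisson rate estimated by the sample mean
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)                   # P(X = 0) for Poisson(mean)
    return observed, expected

# Hypothetical fragment abundances for one community profile (made-up numbers).
profile = [0, 0, 0, 0, 0, 0, 12, 0, 7, 0, 0, 9]
observed, expected = zero_excess(profile)
print(f"observed zeros: {observed:.2f}, Poisson expectation: {expected:.2f}")
```

When the observed share is far above the Poisson expectation, as in this toy profile, a plain Poisson likelihood is a poor fit and a zero-inflated model is worth considering.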

Relevance:

70.00%

Publisher:

Abstract:

In meteorology, observations and forecasts of a wide range of phenomena (for example, snow, clouds, hail, fog, and tornados) can be categorical, that is, they can only have discrete values (e.g., "snow" and "no snow"). Concentrating on satellite-based snow and cloud analyses, this thesis explores methods that have been developed for the evaluation of categorical products and analyses. Different algorithms for satellite products generate different results; sometimes the differences are subtle, sometimes all too visible. In addition to differences between algorithms, the satellite products are influenced by physical processes and conditions, such as diurnal and seasonal variation in solar radiation, topography, and land use. The analysis of satellite-based snow cover analyses from NOAA, NASA, and EUMETSAT, and snow analyses for numerical weather prediction models from FMI and ECMWF, was complicated by the fact that we did not know the true snow extent, and we were forced simply to measure the agreement between different products. Sammon mapping, a multidimensional scaling method, was then used to visualize the differences between products. The trustworthiness of the results for cloud analyses [EUMETSAT Meteorological Products Extraction Facility cloud mask (MPEF), together with the Nowcasting Satellite Application Facility (SAFNWC) cloud masks provided by Météo-France (SAFNWC/MSG) and the Swedish Meteorological and Hydrological Institute (SAFNWC/PPS)] compared with ceilometers of the Helsinki Testbed was estimated by constructing confidence intervals (CIs). Bootstrapping, a statistical resampling method, was used to construct CIs, especially in the presence of spatial and temporal correlation. Reference data for validation are constantly in short supply. In general, the needs of a particular project drive the requirements for evaluation, for example, for the accuracy and the timeliness of the particular data and methods.
In this vein, we discuss tentatively how data provided by the general public, e.g., photos shared on the Internet photo-sharing service Flickr, can be used as a new source for validation. Results show that they are of reasonable quality and their use for case studies can be warmly recommended. Last, the use of cluster analysis on meteorological in-situ measurements was explored. The Autoclass algorithm was used to construct compact representations of synoptic conditions of fog at Finnish airports.
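
The bootstrap confidence intervals mentioned in this abstract can be sketched with a plain percentile bootstrap. This is a simplified illustration under the assumption of independent observations; the thesis deals with spatially and temporally correlated data, which would call for a block-type resampling scheme not shown here. The function name `bootstrap_ci` and the agreement flags are hypothetical.

```python
import random

def bootstrap_ci(values, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(values), resampling with replacement."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical agreement flags: 1 = cloud mask agrees with ceilometer, 0 = it does not.
agreement = [1] * 42 + [0] * 8
lo, hi = bootstrap_ci(agreement, mean)
print(f"agreement rate: {mean(agreement):.2f}, 95% CI: [{lo:.2f}, {hi:.2f}]")
```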

Relevance:

70.00%

Publisher:

Abstract:

Early glasses (about 1066 BC-220 AD) unearthed in Xinjiang, China, were chemically characterized using PIXE and ICP-AES. It was found that these glasses basically belonged to the PbO-BaO-SiO2, K2O-SiO2, Na2O-CaO-SiO2 and Na2O-CaO-PbO-SiO2 systems. The results of the cluster analysis showed that some glasses had basically similar recipes and technology. The PbO-BaO-SiO2 glass and the K2O-SiO2 glass were thought to come from the central area and the south of ancient China, respectively. Part of the Na2O-CaO-SiO2 glass (including the Na2O-CaO-PbO-SiO2 glass) might have been imported from Mesopotamia, while the rest might have been produced locally. (c) 2005 Elsevier B.V. All rights reserved.

Relevance:

70.00%

Publisher:

Abstract:

Sargassum muticum is important in maintaining the structure and function of littoral ecosystems and is used in aquaculture and alginate production; however, little is known about its population genetic attributes. In this study, random amplified polymorphic DNA (RAPD) and inter-simple sequence repeat (ISSR) markers were used to investigate the genetic structure of four populations of S. muticum and one outgroup of S. fusiforme (Harv.) Setchell from the Shandong Peninsula of China. The selected 24 RAPD primers and 19 ISSR primers amplified 164 loci and 122 loci, respectively. Estimates of genetic diversity with different indicators (P%, percentage of polymorphic loci; H, the expected heterozygosity; I, Shannon's information index) revealed a low or moderate level of genetic variation within each S. muticum population, and a high level of genetic differentiation among the populations was determined with pairwise unbiased genetic distance (D) and fixation index (F_ST). The Mantel test showed that the two types of matrices, D and F_ST, were highly correlated for both RAPD (r = 0.9706, P = 0.009) and ISSR data (r = 0.9161, P = 0.009). Analysis of molecular variance (AMOVA) was conducted to apportion the variation among and within the S. muticum populations. It indicated that variation among populations was higher than that within populations: 55.82% versus 44.18% by RAPD and 55.21% versus 44.79% by ISSR. Furthermore, the Mantel test suggested that genetic differentiation among populations was related to geographical distance (r > 0.6), that is, it conformed to the IBD (isolation by distance) model, as expected from UPGMA (unweighted pair group method with arithmetic averages) cluster analysis. On the whole, the high genetic structuring among the four S. muticum populations at distant locations was clearly indicated by the RAPD and ISSR analyses (r > 0.9, P < 0.05) in our study.
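
The Mantel test used in this abstract correlates two distance matrices and assesses significance by permuting the rows and columns of one of them. The sketch below shows the general idea with a one-sided permutation test; it is not the authors' software or settings, the function names are illustrative, and both matrices (five hypothetical sites) are made-up numbers.

```python
import math
import random

def upper_triangle(m):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mantel(a, b, n_perm=999, seed=0):
    """Return (r, one-sided permutation p-value) for distance matrices a and b."""
    rng = random.Random(seed)
    n = len(a)
    flat_b = upper_triangle(b)
    r_obs = pearson(upper_triangle(a), flat_b)
    hits = 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        # Permute rows and columns of `a` together so it remains a distance matrix.
        permuted = [[a[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical site positions along a coastline (km) and pairwise genetic distances.
positions = [0, 40, 90, 200, 350]
geo = [[abs(p - q) for q in positions] for p in positions]
gen = [
    [0.00, 0.10, 0.18, 0.30, 0.45],
    [0.10, 0.00, 0.12, 0.26, 0.40],
    [0.18, 0.12, 0.00, 0.20, 0.33],
    [0.30, 0.26, 0.20, 0.00, 0.22],
    [0.45, 0.40, 0.33, 0.22, 0.00],
]
r, p_value = mantel(gen, geo)
print(f"Mantel r = {r:.3f}, p = {p_value:.3f}")
```

With only a handful of populations the number of distinct permutations is small, which bounds how small the permutation p-value can be.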

Relevance:

70.00%

Publisher:

Abstract:

Timmis J and Neal M J. An artificial immune system for data analysis. In Proceedings of 3rd international workshop on information processing in cells and tissues (IPCAT), Indianapolis, U.S.A., 1999.

Relevance:

70.00%

Publisher:

Abstract:

BACKGROUND: Many patients with diabetes have poor blood pressure (BP) control. Pharmacological therapy is the cornerstone of effective BP treatment, yet there are high rates both of poor medication adherence and failure to intensify medications. Successful medication management requires an effective partnership between providers who initiate and increase doses of effective medications and patients who adhere to the regimen. METHODS: In this cluster-randomized controlled effectiveness study, primary care teams within sites were randomized to a program led by a clinical pharmacist trained in motivational interviewing-based behavioral counseling approaches and authorized to make BP medication changes or to usual care. This study involved the collection of data during a 14-month intervention period in three Department of Veterans Affairs facilities and two Kaiser Permanente Northern California facilities. The clinical pharmacist was supported by clinical information systems that enabled proactive identification of, and outreach to, eligible patients identified on the basis of poor BP control and either medication refill gaps or lack of recent medication intensification. The primary outcome is the relative change in systolic blood pressure (SBP) measurements over time. Secondary outcomes are changes in Hemoglobin A1c, low-density lipoprotein cholesterol (LDL), medication adherence determined from pharmacy refill data, and medication intensification rates. DISCUSSION: Integration of the three intervention elements--proactive identification, adherence counseling and medication intensification--is essential to achieve optimal levels of control for high-risk patients. Testing the effectiveness of this intervention at the team level allows us to study the program as it would typically be implemented within a clinic setting, including how it integrates with other elements of care. TRIAL REGISTRATION: The ClinicalTrials.gov registration number is NCT00495794.

Relevance:

70.00%

Publisher:

Abstract:

Tumor microenvironmental stresses, such as hypoxia and lactic acidosis, play important roles in tumor progression. Although gene signatures reflecting the influence of these stresses are powerful approaches to link expression with phenotypes, they do not fully reflect the complexity of human cancers. Here, we describe the use of latent factor models to further dissect the stress gene signatures in a breast cancer expression dataset. The genes in these latent factors are coordinately expressed in tumors and depict distinct, interacting components of the biological processes. The genes in several latent factors are highly enriched in chromosomal locations. When these factors are analyzed in independent datasets with gene expression and array CGH data, the expression values of these factors are highly correlated with copy number alterations (CNAs) of the corresponding BAC clones in both the cell lines and tumors. Therefore, variation in the expression of these pathway-associated factors is at least partially caused by variation in gene dosage and CNAs among breast cancers. We have also found the expression of two latent factors without any chromosomal enrichment is highly associated with 12q CNA, likely an instance of "trans"-variations in which CNA leads to the variations in gene expression outside of the CNA region. In addition, we have found that factor 26 (1q CNA) is negatively correlated with HIF-1alpha protein and hypoxia pathways in breast tumors and cell lines. This agrees with, and for the first time links, known good prognosis associated with both a low hypoxia signature and the presence of CNA in this region. Taken together, these results suggest the possibility that tumor segmental aneuploidy makes significant contributions to variation in the lactic acidosis/hypoxia gene signatures in human cancers and demonstrate that latent factor analysis is a powerful means to uncover such a linkage.

Relevance:

70.00%

Publisher:

Abstract:

In order to address road safety effectively, it is essential to understand all the factors that contribute to the occurrence of a road collision. This is achieved through road safety assessment measures, which are primarily based on historical crash data. Recent advances in uncertain reasoning technology have led to the development of robust machine learning techniques that are suitable for investigating road traffic collision data. These techniques include supervised learning (e.g. SVM) and unsupervised learning (e.g. Cluster Analysis). This study extends previous research carried out in Coll et al. [3], which proposed a non-linear aggregation framework for identifying temporal and spatial hotspots. The results from Coll et al. [3] identified the Lisburn area as the hotspot, in terms of road safety, in Northern Ireland. This study aims to use Cluster Analysis to investigate and highlight any hidden patterns associated with collisions that occurred in the Lisburn area, which in turn will provide more clarity on the causation factors so that appropriate countermeasures can be put in place.

Relevance:

70.00%

Publisher:

Abstract:

Statistics are regularly used to make some form of comparison between trace evidence or to deploy the exclusionary principle (Morgan and Bull, 2007) in forensic investigations. Trace evidence is routinely the result of particle size, chemical or modal analyses and as such constitutes compositional data. The issue is that compositional data, including percentages, parts per million, etc., carry only relative information. This may be problematic where a comparison of percentages and other constrained/closed data is deemed a statistically valid and appropriate way to present trace evidence in a court of law. Notwithstanding an awareness of the constant sum problem since the seminal works of Pearson (1896) and Chayes (1960), and the introduction of log-ratio techniques (Aitchison, 1986; Pawlowsky-Glahn and Egozcue, 2001; Pawlowsky-Glahn and Buccianti, 2011; Tolosana-Delgado and van den Boogaart, 2013), the problem that a constant sum destroys the potential independence of variances and covariances required for correlation and regression analysis and empirical multivariate methods (principal component analysis, cluster analysis, discriminant analysis, canonical correlation) is all too often not acknowledged in the statistical treatment of trace evidence. Yet the need for a robust treatment of forensic trace evidence analyses is obvious. This research examines the issues and potential pitfalls for forensic investigators if the constant sum constraint is ignored in the analysis and presentation of forensic trace evidence. Forensic case studies involving particle size and mineral analyses as trace evidence are used to demonstrate a compositional data approach using a centred log-ratio (clr) transformation and multivariate statistical analyses.
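
The centred log-ratio transformation named in this abstract maps each part of a composition to the logarithm of its ratio to the geometric mean of all parts, clr(x)_i = ln(x_i / g(x)). A minimal sketch follows; the function name `clr` and the example composition are illustrative, and the sketch assumes all parts are strictly positive (zeros need separate treatment that is not shown).

```python
import math

def clr(parts):
    """Centred log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)     # log of the geometric mean
    return [v - mean_log for v in logs]

# Hypothetical particle-size composition in percent (sand, silt, clay).
sample_percent = [60.0, 30.0, 10.0]
sample_fraction = [0.6, 0.3, 0.1]        # same composition expressed as fractions

print(clr(sample_percent))
print(clr(sample_fraction))
```

The two calls give the same coordinates up to floating-point rounding, because only the ratios between parts matter, and the clr coordinates always sum to zero.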

Relevance:

70.00%

Publisher:

Abstract:

In this study we analyse the emerging patterns of regional collaboration for innovation projects in China, using official government statistics for 30 Chinese regions. We propose the use of Ordinal Multidimensional Scaling and Cluster analysis as a robust method to study regional innovation systems. Our results show that regional collaborations amongst organisations can be categorised by means of eight dimensions: public versus private organisational mindset; public versus private resources; innovation capacity versus available infrastructures; innovation input (allocated resources) versus innovation output; knowledge production versus knowledge dissemination; and collaborative capacity versus collaboration output. Collaborations aimed at generating innovation fell into four categories: those related to highly specialised public research institutions, public universities, private firms and governmental intervention. By comparing the representative cases of regions in terms of these four innovation actors, we propose policy measures for improving regional innovation collaboration within China.

Relevance:

70.00%

Publisher:

Abstract:

Beyond the classical statistical approaches (determination of basic statistics, regression analysis, ANOVA, etc.), a new set of applications of different statistical techniques has increasingly gained relevance in the analysis, processing and interpretation of data concerning the characteristics of forest soils. This can be seen in some recent publications in the context of Multivariate Statistics. These new methods require additional care that is not always included or referred to in some approaches. In the particular case of geostatistical data applications it is necessary, besides geo-referencing all data acquisition, to collect the samples in regular grids and in sufficient quantity so that the variograms can reflect the spatial distribution of soil properties in a representative manner. Although the great majority of Multivariate Statistics techniques (Principal Component Analysis, Correspondence Analysis, Cluster Analysis, etc.) do not in most cases require the assumption of a normal distribution, they nevertheless need a proper and rigorous strategy for their use. In this work, some reflections on these methodologies will be presented, in particular on the main constraints that often occur during the information-collecting process and on the various possibilities for linking these different techniques. Finally, illustrations of some particular applications of these statistical methods will also be presented.