14 results for Elements, High Throughput Data, electrophysiology, data processing, Real Time analysis
at University of Queensland eSpace - Australia
Abstract:
Normal mixture models are often used to cluster continuous data. However, conventional approaches to fitting these models can fail to produce nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to their number. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
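A minimal sketch of the general workflow described here, dimensionality reduction followed by normal mixture clustering, is given below. It uses scikit-learn's FactorAnalysis and GaussianMixture as stand-ins, and the data shape, number of clusters and number of factors are illustrative assumptions rather than the settings used in the paper.

```python
# Hypothetical sketch: cluster high-dimensional observations by first fitting a
# low-dimensional factor model, then a normal mixture on the factor scores.
# This approximates, but is not identical to, a mixture of factor analyzers.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))           # placeholder data, e.g. wine-by-judge scores

fa = FactorAnalysis(n_components=2, random_state=0)   # q = 2 latent factors (assumed)
Z = fa.fit_transform(X)                 # factor scores, shape (n, q)

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(Z)             # cluster assignments for the n observations
print(labels)
```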
Abstract:
Quantile computation has many applications including data mining and financial data analysis. It has been shown that an ε-approximate summary can be maintained so that, given a quantile query (φ, ε), the data item at rank ⌈φN⌉ can be obtained approximately, within a rank error of εN, over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different φ and ε poses a new challenge because the summary is continuously updated as new data items arrive. In this paper, we first aim to dramatically reduce the number of distinct query results by grouping different queries into clusters so that each cluster can be processed virtually as a single query while the precision requirements of its users are retained. Second, we aim to minimize the total query processing cost. Efficient algorithms are developed to minimize the total number of times clusters must be reprocessed and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
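One plausible way to picture the query-clustering idea (an illustration only, not the paper's algorithm): each query (φ, ε) tolerates any answer whose relative rank lies in [φ − ε, φ + ε], so grouping queries reduces to covering these intervals with as few representative ranks as possible, which a greedy sweep by right endpoint solves.

```python
# Hedged illustration (not the paper's actual algorithm): each quantile query
# (phi, eps) tolerates any answer whose relative rank lies in [phi - eps, phi + eps].
# Greedily choosing the fewest representative ranks that stab all such intervals
# groups the queries into clusters that can each be answered as one query.
def cluster_quantile_queries(queries):
    """queries: list of (phi, eps) pairs; returns list of (rep_rank, members)."""
    intervals = sorted(
        ((phi - eps, phi + eps, (phi, eps)) for phi, eps in queries),
        key=lambda iv: iv[1],            # sort by right endpoint
    )
    clusters = []
    for lo, hi, q in intervals:
        if clusters and lo <= clusters[-1][0]:
            clusters[-1][1].append(q)    # current representative still satisfies q
        else:
            clusters.append([hi, [q]])   # open a new cluster at this right endpoint
    return [(rep, members) for rep, members in clusters]

# Two nearby median queries share one cluster; the 0.9-quantile query gets its own.
print(cluster_quantile_queries([(0.5, 0.01), (0.51, 0.02), (0.9, 0.005)]))
```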
Abstract:
Endochondral bone is formed during an avascular period in an environment of low oxygen. Under these conditions, pluripotential mesenchymal stromal cells preferentially differentiate into chondrocytes and form cartilage. In this study, we investigated the hypothesis that oxygen tension modulates bone mesenchymal cell fate by altering the expression of genes that function to promote chondrogenesis. Microarray analysis of RNA samples from ST2 cells revealed significant changes in 728 array elements (P < 0.01) in response to hypoxia. Real-time PCR on these RNA samples, and on separate samples from C3H10T1/2 cells, revealed hypoxia-induced changes in the expression of additional genes known to be expressed by chondrocytes, including Sox9 and its downstream targets aggrecan and Col2a. These changes were accompanied by the accumulation of mucopolysaccharide as detected by alcian blue staining. To investigate the mechanisms responsible for the upregulation of Sox9 by hypoxia, we determined the effect of hypoxia on HIF-1 alpha levels and Sox9 promoter activity in ST2 cells. Hypoxia increased nuclear accumulation of HIF-1 alpha and activated the Sox9 promoter. The ability of hypoxia to transactivate the Sox9 promoter was virtually abolished by deletion of HIF-1 alpha consensus sites within the proximal promoter. These findings suggest that hypoxia promotes the differentiation of mesenchymal cells along a chondrocyte pathway, in part by activating Sox9 via a HIF-1 alpha-dependent mechanism. (c) 2005 Elsevier Inc. All rights reserved.
Abstract:
Complete rare earth element (except Eu) and Y concentrations from the estuarine mixing zone (salinity = 0.2 to 33) of Elimbah Creek, Queensland, Australia, were measured by quadrupole ICP-MS without preconcentration. High sampling density in the low salinity regime along with high quality data allow accurate tracing of the development of the typical marine rare earth element anomalies as well as Y/Ho fractionation. Over the entire estuary, the rare earth elements are strongly removed relative to a freshwater endmember (60-80% removal). This large overall removal occurs despite a strong remineralisation peak (190% for La, 130% for Y relative to the freshwater endmember) in the mid-salinity zone. Removal and remineralisation are accompanied by fractionation of the original (freshwater) rare earth element pattern, resulting in light rare earth element depletion. Estuarine fractionation generates a large positive La anomaly and a superchondritic Y/Ho ratio. Conversely, we observe no evidence to support the generation of the negative Ce anomaly in the estuary. With the exception of Ce, the typical marine rare earth element features can thus be attributed to estuarine mixing processes. The persistence of these features in hydrogenous sediments for at least 3.71 Ga highlights the importance of estuarine processes for marine chemistry on geological timescales. (c) 2005 Elsevier B.V. All rights reserved.
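As a rough illustration of how the quoted percentages can be computed (an assumption about the bookkeeping, not taken from the paper), concentrations are expressed relative to the freshwater endmember, with values below 100% indicating removal and values above 100% indicating addition or remineralisation:

```python
# Hypothetical illustration of the percentages quoted above (not the paper's code):
# a measured concentration expressed as a percentage of the freshwater endmember,
# and the corresponding percent removal.
def percent_of_freshwater(c_measured, c_freshwater):
    return 100.0 * c_measured / c_freshwater

def percent_removal(c_measured, c_freshwater):
    return 100.0 - percent_of_freshwater(c_measured, c_freshwater)

# Made-up concentrations in arbitrary units:
print(percent_removal(30.0, 100.0))          # 70.0  -> strong removal
print(percent_of_freshwater(190.0, 100.0))   # 190.0 -> mid-salinity remineralisation peak
```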
Abstract:
Teledermatology can provide both accurate and reliable specialist care at a distance. This article reviews current data on the quality of care that teledermatology provides, as well as the societal cost benefits involved in the implementation of the technique. Teledermatology is most suited to patients unable to access specialist services for geographical or social reasons. Patients are generally satisfied with the overall care that teledermatology provides. Real-time teledermatology is more expensive than conventional care for health services. However, significant savings can be expected from the patient's perspective due to reduced travel. Appropriate patient selection, improved technology and adequate clinical workloads may improve both the quality and cost effectiveness of this service.
Abstract:
Recently, we identified a large number of ultraconserved (uc) sequences in noncoding regions of human, mouse, and rat genomes that appear to be essential for vertebrate and amniote ontogeny. Here, we used similar methods to identify ultraconserved genomic regions between the insect species Drosophila melanogaster and Drosophila pseudoobscura, as well as the more distantly related Anopheles gambiae. As with vertebrates, ultraconserved sequences in insects appear to occur primarily in intergenic and intronic sequences, and at intron-exon junctions. The sequences are significantly associated with genes encoding developmental regulators and transcription factors, but are less frequent and are smaller in size than in vertebrates. The longest identical, nongapped orthologous match between the three genomes was found within the homothorax (hth) gene. This sequence spans an internal exon-intron junction, with the majority located within the intron, and is predicted to form a highly stable stem-loop RNA structure. Real-time quantitative PCR analysis of different hth splice isoforms and Northern blotting showed that the conserved element is associated with a high incidence of intron retention in hth pre-mRNA, suggesting that the conserved intronic element is critically important in the post-transcriptional regulation of hth expression in Diptera.
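A toy sketch of the kind of comparison involved (not the authors' genome-scale pipeline): finding the longest identical, ungapped substring shared by two sequences with a simple dynamic-programming scan; the short strings are made up for illustration.

```python
# Toy illustration (not the authors' pipeline): longest identical, ungapped
# substring shared by two sequences, via simple dynamic programming.
def longest_common_ungapped(a, b):
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1      # extend the run of identical bases
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

print(longest_common_ungapped("ACGTACGTTAGC", "TTACGTTAGGA"))  # -> "TACGTTAG"
```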
Abstract:
In many online applications, we need to maintain quantile statistics for a sliding window on a data stream. Sliding windows in their natural form are defined over the most recent N data items. In this paper, we study the problem of estimating quantiles over other types of sliding windows. We present a uniform framework to process quantile queries for time-constrained and filter-based sliding windows. Our algorithm makes one pass over the data stream and maintains an ε-approximate summary. It uses O((1/ε²) log²(εN)) space, where N is the number of data items in the window. We extend this framework to further process generalized constrained sliding-window queries and prove that our technique is applicable to flexible window settings. Our performance study indicates that the space required in practice is much less than the theoretical bound and that the algorithm supports high-speed data streams.
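For intuition about the query semantics only, the sketch below keeps an exact time-constrained window in memory; it is deliberately simple and memory-hungry, unlike the paper's space-efficient ε-approximate summary, and the class and parameter names are assumptions.

```python
# Exact (and memory-hungry) time-constrained sliding window, for illustrating the
# query semantics only; the paper's summaries answer the same queries approximately
# in much less space.
import bisect
from collections import deque

class TimeWindowQuantile:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.items = deque()        # (timestamp, value) in arrival order
        self.sorted_vals = []       # values kept sorted for rank queries

    def insert(self, timestamp, value):
        self.items.append((timestamp, value))
        bisect.insort(self.sorted_vals, value)
        self._expire(timestamp)

    def _expire(self, now):
        while self.items and self.items[0][0] <= now - self.window:
            _, old = self.items.popleft()
            self.sorted_vals.pop(bisect.bisect_left(self.sorted_vals, old))

    def quantile(self, phi):
        n = len(self.sorted_vals)
        return self.sorted_vals[min(int(phi * n), n - 1)] if n else None

w = TimeWindowQuantile(window_seconds=60)
for t, v in [(0, 5.0), (10, 1.0), (30, 9.0), (70, 4.0)]:
    w.insert(t, v)
print(w.quantile(0.5))   # -> 9.0, the upper median of the two unexpired items 4.0 and 9.0
```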
Abstract:
Indexing high-dimensional datasets has attracted extensive attention from many researchers in the last decade. Since R-tree type index structures are known to suffer from the curse of dimensionality, Pyramid-tree type index structures, which are based on the B-tree, have been proposed to break the curse of dimensionality. However, for high-dimensional data the number of pyramids is often insufficient to discriminate data points, and their effectiveness degrades dramatically as dimensionality increases. In this paper, we focus on one particular aspect of the curse of dimensionality: the surface of a hypercube in a high-dimensional space approaches 100% of the total hypercube volume as the number of dimensions approaches infinity. We propose a new indexing method based on this surface property. We prove that the Pyramid-tree technique is a special case of our method. The results of our experiments demonstrate the clear superiority of our novel method.
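For background, the sketch below shows the classic Pyramid-technique mapping that the abstract builds on, in which each d-dimensional point is reduced to a single pyramid value that a B+-tree can index; the paper's own surface-based method is not reproduced here, and the function name is illustrative.

```python
# Background sketch of the classic Pyramid-technique mapping (not the paper's new
# surface-based method): each point in the unit hypercube [0,1]^d is mapped to a
# single one-dimensional pyramid value, which a B+-tree can then index.
def pyramid_value(point):
    d = len(point)
    # pyramid number: the dimension with the largest deviation from the centre
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    i = j_max if point[j_max] < 0.5 else j_max + d
    height = abs(0.5 - point[j_max])      # distance from the centre along that axis
    return i + height                     # integer part = pyramid, fraction = height

print(pyramid_value([0.1, 0.6, 0.5]))     # falls in pyramid 0 at height 0.4 -> 0.4
```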
Abstract:
In recent years, many real-time applications have needed to handle data streams. We consider distributed environments in which remote data sources keep collecting data from the real world or from other data sources, and continuously push the data to a central stream processor. In such environments, significant communication overhead is induced by transmitting rapid, high-volume and time-varying data streams, and computation overhead is also incurred at the central processor. In this paper, we develop a novel filter approach, called the DTFilter approach, for evaluating windowed distinct queries in such a distributed system. The DTFilter approach is based on a search algorithm over a data structure of two height-balanced trees, and it avoids transmitting duplicate items in data streams, thus saving substantial network resources. In addition, theoretical analysis of the time spent performing the search and of the amount of memory needed is provided. Extensive experiments also show that the DTFilter approach achieves high performance.
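A minimal sketch of the duplicate-suppression idea (not the paper's DTFilter, which uses two height-balanced trees and must also keep the central processor consistent as items expire): a source-side filter that transmits an item only if it is not already present in the current count-based window. All names and the window semantics are assumptions.

```python
# Minimal sketch of source-side duplicate suppression over a count-based window.
# This is an illustration only; it ignores how the central processor is told about
# expirations, which a real windowed-distinct protocol must handle.
from collections import deque

class WindowDistinctFilter:
    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()   # last window_size items, in arrival order
        self.counts = {}        # item -> number of occurrences in the window

    def push(self, item):
        """Return True if the item should be transmitted to the central processor."""
        if len(self.window) == self.window_size:       # evict the oldest item
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        transmit = item not in self.counts              # new distinct value in the window
        self.window.append(item)
        self.counts[item] = self.counts.get(item, 0) + 1
        return transmit

f = WindowDistinctFilter(window_size=4)
print([f.push(x) for x in [1, 2, 2, 3, 1, 2]])   # [True, True, False, True, True, False]
```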
Abstract:
Although managers consider accurate, timely, and relevant information as critical to the quality of their decisions, evidence of large variations in data quality abounds. Over a period of twelve months, the action research project reported herein attempted to investigate and track data quality initiatives undertaken by the participating organisation. The investigation focused on two types of errors: transaction input errors and processing errors. Whenever the action research initiative identified non-trivial errors, the participating organisation introduced actions to correct the errors and prevent similar errors in the future. Data quality metrics were taken quarterly to measure improvements resulting from the activities undertaken during the action research project. The action research project results indicated that for a mission-critical database to ensure and maintain data quality, commitment to continuous data quality improvement is necessary. Also, communication among all stakeholders is required to ensure common understanding of data quality improvement goals. The action research project found that to further substantially improve data quality, structural changes within the organisation and to the information systems are sometimes necessary. The major goal of the action research study is to increase the level of data quality awareness within all organisations and to motivate them to examine the importance of achieving and maintaining high-quality data.
Abstract:
Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
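The appeal of the factor-analytic form comes from constraining each component covariance to Σ = ΛΛᵀ + Ψ, with Λ a p × q loading matrix (q ≪ p) and Ψ diagonal. A quick, illustrative parameter-count comparison (ignoring rotational identifiability corrections; p and q are made-up values):

```python
# Illustrative covariance parameter counts per mixture component; identifiability
# corrections for factor rotation are ignored to keep the comparison simple.
def full_covariance_params(p):
    return p * (p + 1) // 2                 # unrestricted symmetric p x p matrix

def factor_analytic_params(p, q):
    return p * q + p                        # p x q loadings + p diagonal variances

p, q = 50, 3                                # e.g. 50 variables, 3 latent factors (assumed)
print(full_covariance_params(p))            # 1275 free covariance parameters
print(factor_analytic_params(p, q))         # 200 free covariance parameters
```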
Abstract:
Large amounts of information can be overwhelming and costly to process, especially when transmitting data over a network. A typical modern Geographical Information System (GIS) brings all types of data together based on the geographic component of the data and provides simple point-and-click query capabilities as well as complex analysis tools. Querying a Geographical Information System, however, can be prohibitively expensive due to the large amounts of data which may need to be processed. Since the use of GIS technology has grown dramatically in the past few years, there is now more need than ever to provide users with the fastest and least expensive query capabilities, especially since an estimated 80% of data stored in corporate databases has a geographical component. However, not every application requires the same high-quality data for its processing. In this paper we address the issues of reducing the cost and response time of GIS queries by pre-aggregating data, trading off data accuracy and precision. We discuss computational issues in the generation of multi-level resolutions of spatial data and show that the problem of finding the best approximation for a given region and a real-valued function on this region, under a predictable error, is in general NP-complete.
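A minimal sketch of the pre-aggregation idea (an illustration only; the grid scheme, cell size and aggregate are assumptions rather than the paper's construction): point values are rolled up into coarse grid cells, so a rectangular region query can be answered approximately from the small aggregate table instead of scanning every point.

```python
# Minimal illustration of spatial pre-aggregation: sum point values into coarse
# grid cells so a rectangular query can be answered approximately (and cheaply)
# from the aggregate table instead of scanning all raw points.
from collections import defaultdict

def build_grid(points, cell):
    """points: iterable of (x, y, value); returns {(ix, iy): summed value}."""
    grid = defaultdict(float)
    for x, y, v in points:
        grid[(int(x // cell), int(y // cell))] += v
    return grid

def approx_region_sum(grid, cell, xmin, ymin, xmax, ymax):
    """Sum of all cells whose index range overlaps the query rectangle."""
    total = 0.0
    for ix in range(int(xmin // cell), int(xmax // cell) + 1):
        for iy in range(int(ymin // cell), int(ymax // cell) + 1):
            total += grid.get((ix, iy), 0.0)
    return total

pts = [(1.0, 1.0, 5.0), (2.5, 1.5, 3.0), (9.0, 9.0, 7.0)]
g = build_grid(pts, cell=2.0)
print(approx_region_sum(g, 2.0, 0.0, 0.0, 3.0, 3.0))   # 8.0 (whole boundary cells included)
```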