899 resultados para Numeric sets
Resumo:
Background: Large gene expression studies, such as those conducted using DNA arrays, often provide millions of different pieces of data. To address the problem of analyzing such data, we describe a statistical method, which we have called ‘gene shaving’. The method identifies subsets of genes with coherent expression patterns and large variation across conditions. Gene shaving differs from hierarchical clustering and other widely used methods for analyzing gene expression studies in that genes may belong to more than one cluster, and the clustering may be supervised by an outcome measure. The technique can be ‘unsupervised’, that is, the genes and samples are treated as unlabeled, or partially or fully supervised by using known properties of the genes or samples to assist in finding meaningful groupings. Results: We illustrate the use of the gene shaving method to analyze gene expression measurements made on samples from patients with diffuse large B-cell lymphoma. The method identifies a small cluster of genes whose expression is highly predictive of survival. Conclusions: The gene shaving method is a potentially useful tool for exploration of gene expression data and identification of interesting clusters of genes worth further investigation.
Resumo:
Hundreds of Terabytes of CMS (Compact Muon Solenoid) data are being accumulated for storage day by day at the University of Nebraska-Lincoln, which is one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This is an important task that is currently being done manually and it requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when there is a requirement for storage space. CMS data is stored using HDFS (Hadoop Distributed File System). HDFS logs give information regarding file access operations. Hadoop MapReduce was used to feed information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression which is used in this Thesis to develop a classifier. Time elapsed in data set classification by this method is dependent on the size of the input HDFS log file since the algorithmic complexities of Hadoop MapReduce algorithms here are O(n). The SVM methodology produces a list of data sets for deletion along with their respective sizes. This methodology was also compared with a heuristic called Retention Cost which was calculated using size of the data set and the time since its last access to help decide how useful a data set is. Accuracies of both were compared by calculating the percentage of data sets predicted for deletion which were accessed at a later instance of time. Our methodology using SVMs proved to be more accurate than using the Retention Cost heuristic. This methodology could be used to solve similar problems involving other large data sets.
Resumo:
Semi-supervised learning is one of the important topics in machine learning, concerning with pattern classification where only a small subset of data is labeled. In this paper, a new network-based (or graph-based) semi-supervised classification model is proposed. It employs a combined random-greedy walk of particles, with competition and cooperation mechanisms, to propagate class labels to the whole network. Due to the competition mechanism, the proposed model has a local label spreading fashion, i.e., each particle only visits a portion of nodes potentially belonging to it, while it is not allowed to visit those nodes definitely occupied by particles of other classes. In this way, a "divide-and-conquer" effect is naturally embedded in the model. As a result, the proposed model can achieve a good classification rate while exhibiting low computational complexity order in comparison to other network-based semi-supervised algorithms. Computer simulations carried out for synthetic and real-world data sets provide a numeric quantification of the performance of the method.
Resumo:
The present study compared the changes in markers of muscle damage after bouts of resistance exercise employing the Multiple-sets (MS) and Half-pyramid (HP) training systems. Ten healthy men (26.1 +/- 6.3 years), who had been involved in regular resistance training, performed MS and HP bouts, 14 days apart, in a randomised, counter-balanced manner. For the MS bout, participants performed three sets of maximum repetitions at 75%-1RM (i.e. 75% of a One Repetition Maximum) for the three exercises, starting with the bench press, followed by pec deck and decline bench press. For the HP bout, the participants performed three sets of maximum repetitions with 67%-1RM, 74%-1RM and 80%-1RM for the first, second and third sets, respectively, for the same three exercise sequences as the MS bout. The total volume of load lifted was equated between both bouts. Muscle soreness, plasma creatine kinase (CK) activity, myoglobin (Mb) and C-reactive protein (CRP) concentrations were assessed before and for three days after each exercise bout, and the changes over time were compared between MS and HP using two-way repeated measures ANOVA. Muscle soreness developed significantly (P<0.01) after both bouts, but no significant difference was observed between MS and HP. Plasma CK activity and Mb concentration increased significantly (P<0.01) without significant differences between bouts, and CRP concentration did not change significantly after either bout. These results suggest that the muscle damage profile is similar for MS and HP, probably due to the similar total volume of load lifted.
Resumo:
We show that if f is a homeomorphism of the 2-torus isotopic to the identity and its lift (f) over tilde is transitive, or even if it is transitive outside the lift of the elliptic islands, then (0,0) is in the interior of the rotation set of (f) over tilde. This proves a particular case of Boyland's conjecture.
Resumo:
Facial reconstruction is a method that seeks to recreate a person's facial appearance from his/her skull. This technique can be the last resource used in a forensic investigation, when identification techniques such as DNA analysis, dental records, fingerprints and radiographic comparison cannot be used to identify a body or skeletal remains. To perform facial reconstruction, the data of facial soft tissue thickness are necessary. Scientific literature has described differences in the thickness of facial soft tissue between ethnic groups. There are different databases of soft tissue thickness published in the scientific literature. There are no literature records of facial reconstruction works carried out with data of soft tissues obtained from samples of Brazilian subjects. There are also no reports of digital forensic facial reconstruction performed in Brazil. There are two databases of soft tissue thickness published for the Brazilian population: one obtained from measurements performed in fresh cadavers (fresh cadavers' pattern), and another from measurements using magnetic resonance imaging (Magnetic Resonance pattern). This study aims to perform three different characterized digital forensic facial reconstructions (with hair, eyelashes and eyebrows) of a Brazilian subject (based on an international pattern and two Brazilian patterns for soft facial tissue thickness), and evaluate the digital forensic facial reconstructions comparing them to photos of the individual and other nine subjects. The DICOM data of the Computed Tomography (CT) donated by a volunteer were converted into stereolitography (STL) files and used for the creation of the digital facial reconstructions. Once the three reconstructions were performed, they were compared to photographs of the subject who had the face reconstructed and nine other subjects. Thirty examiners participated in this recognition process. The target subject was recognized by 26.67% of the examiners in the reconstruction performed with the Brazilian Magnetic Resonance Pattern, 23.33% in the reconstruction performed with the Brazilian Fresh Cadavers Pattern and 20.00% in the reconstruction performed with the International Pattern, in which the target-subject was the most recognized subject in the first two patterns. The rate of correct recognitions of the target subject indicate that the digital forensic facial reconstruction, conducted with parameters used in this study, may be a useful tool. (C) 2011 Elsevier Ireland Ltd. All rights reserved.
Resumo:
This paper analyzes concepts of independence and assumptions of convexity in the theory of sets of probability distributions. The starting point is Kyburg and Pittarelli's discussion of "convex Bayesianism" (in particular their proposals concerning E-admissibility, independence, and convexity). The paper offers an organized review of the literature on independence for sets of probability distributions; new results on graphoid properties and on the justification of "strong independence" (using exchangeability) are presented. Finally, the connection between Kyburg and Pittarelli's results and recent developments on the axiomatization of non-binary preferences, and its impact on "complete" independence, are described.
Resumo:
The analysis of spatial relations among objects in an image is an important vision problem that involves both shape analysis and structural pattern recognition. In this paper, we propose a new approach to characterize the spatial relation along, an important feature of spatial configurations in space that has been overlooked in the literature up to now. We propose a mathematical definition of the degree to which an object A is along an object B, based on the region between A and B and a degree of elongatedness of this region. In order to better fit the perceptual meaning of the relation, distance information is included as well. In order to cover a more wide range of potential applications, both the crisp and fuzzy cases are considered. In the crisp case, the objects are represented in terms of 2D regions or ID contours, and the definition of the alongness between them is derived from a visibility notion and from the region between the objects. However, the computational complexity of this approach leads us to the proposition of a new model to calculate the between region using the convex hull of the contours. On the fuzzy side, the region-based approach is extended. Experimental results obtained using synthetic shapes and brain structures in medical imaging corroborate the proposed model and the derived measures of alongness, thus showing that they agree with the common sense. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
The importance of mechanical aspects related to cell activity and its environment is becoming more evident due to their influence in stem cell differentiation and in the development of diseases such as atherosclerosis. The mechanical tension homeostasis is related to normal tissue behavior and its lack may be related to the formation of cancer, which shows a higher mechanical tension. Due to the complexity of cellular activity, the application of simplified models may elucidate which factors are really essential and which have a marginal effect. The development of a systematic method to reconstruct the elements involved in the perception of mechanical aspects by the cell may accelerate substantially the validation of these models. This work proposes the development of a routine capable of reconstructing the topology of focal adhesions and the actomyosin portion of the cytoskeleton from the displacement field generated by the cell on a flexible substrate. Another way to think of this problem is to develop an algorithm to reconstruct the forces applied by the cell from the measurements of the substrate displacement, which would be characterized as an inverse problem. For these kind of problems, the Topology Optimization Method (TOM) is suitable to find a solution. TOM is consisted of an iterative application of an optimization method and an analysis method to obtain an optimal distribution of material in a fixed domain. One way to experimentally obtain the substrate displacement is through Traction Force Microscopy (TFM), which also provides the forces applied by the cell. Along with systematically generating the distributions of focal adhesion and actin-myosin for the validation of simplified models, the algorithm also represents a complementary and more phenomenological approach to TFM. As a first approximation, actin fibers and flexible substrate are represented through two-dimensional linear Finite Element Method. Actin contraction is modeled as an initial stress of the FEM elements. Focal adhesions connecting actin and substrate are represented by springs. The algorithm was applied to data obtained from experiments regarding cytoskeletal prestress and micropatterning, comparing the numerical results to the experimental ones
Resumo:
As distributed collaborative applications and architectures are adopting policy based management for tasks such as access control, network security and data privacy, the management and consolidation of a large number of policies is becoming a crucial component of such policy based systems. In large-scale distributed collaborative applications like web services, there is the need of analyzing policy interactions and integrating policies. In this thesis, we propose and implement EXAM-S, a comprehensive environment for policy analysis and management, which can be used to perform a variety of functions such as policy property analyses, policy similarity analysis, policy integration etc. As part of this environment, we have proposed and implemented new techniques for the analysis of policies that rely on a deep study of state of the art techniques. Moreover, we propose an approach for solving heterogeneity problems that usually arise when considering the analysis of policies belonging to different domains. Our work focuses on analysis of access control policies written in the dialect of XACML (Extensible Access Control Markup Language). We consider XACML policies because XACML is a rich language which can represent many policies of interest to real world applications and is gaining widespread adoption in the industry.
Resumo:
Data sets describing the state of the earth's atmosphere are of great importance in the atmospheric sciences. Over the last decades, the quality and sheer amount of the available data increased significantly, resulting in a rising demand for new tools capable of handling and analysing these large, multidimensional sets of atmospheric data. The interdisciplinary work presented in this thesis covers the development and the application of practical software tools and efficient algorithms from the field of computer science, aiming at the goal of enabling atmospheric scientists to analyse and to gain new insights from these large data sets. For this purpose, our tools combine novel techniques with well-established methods from different areas such as scientific visualization and data segmentation. In this thesis, three practical tools are presented. Two of these tools are software systems (Insight and IWAL) for different types of processing and interactive visualization of data, the third tool is an efficient algorithm for data segmentation implemented as part of Insight.Insight is a toolkit for the interactive, three-dimensional visualization and processing of large sets of atmospheric data, originally developed as a testing environment for the novel segmentation algorithm. It provides a dynamic system for combining at runtime data from different sources, a variety of different data processing algorithms, and several visualization techniques. Its modular architecture and flexible scripting support led to additional applications of the software, from which two examples are presented: the usage of Insight as a WMS (web map service) server, and the automatic production of a sequence of images for the visualization of cyclone simulations. The core application of Insight is the provision of the novel segmentation algorithm for the efficient detection and tracking of 3D features in large sets of atmospheric data, as well as for the precise localization of the occurring genesis, lysis, merging and splitting events. Data segmentation usually leads to a significant reduction of the size of the considered data. This enables a practical visualization of the data, statistical analyses of the features and their events, and the manual or automatic detection of interesting situations for subsequent detailed investigation. The concepts of the novel algorithm, its technical realization, and several extensions for avoiding under- and over-segmentation are discussed. As example applications, this thesis covers the setup and the results of the segmentation of upper-tropospheric jet streams and cyclones as full 3D objects. Finally, IWAL is presented, which is a web application for providing an easy interactive access to meteorological data visualizations, primarily aimed at students. As a web application, the needs to retrieve all input data sets and to install and handle complex visualization tools on a local machine are avoided. The main challenge in the provision of customizable visualizations to large numbers of simultaneous users was to find an acceptable trade-off between the available visualization options and the performance of the application. Besides the implementational details, benchmarks and the results of a user survey are presented.
Resumo:
La geometria euclidea risulta spesso inadeguata a descrivere le forme della natura. I Frattali, oggetti interrotti e irregolari, come indica il nome stesso, sono più adatti a rappresentare la forma frastagliata delle linee costiere o altri elementi naturali. Lo strumento necessario per studiare rigorosamente i frattali sono i teoremi riguardanti la misura di Hausdorff, con i quali possono definirsi gli s-sets, dove s è la dimensione di Hausdorff. Se s non è intero, l'insieme in gioco può riconoscersi come frattale e non presenta tangenti e densità in quasi nessun punto. I frattali più classici, come gli insiemi di Cantor, Koch e Sierpinski, presentano anche la proprietà di auto-similarità e la dimensione di similitudine viene a coincidere con quella di Hausdorff. Una tecnica basata sulla dimensione frattale, detta box-counting, interviene in applicazioni bio-mediche e risulta utile per studiare le placche senili di varie specie di mammiferi tra cui l'uomo o anche per distinguere un melanoma maligno da una diversa lesione della cute.