914 resultados para statistical data analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Understanding transcriptional regulation by genome-wide microarray studies can contribute to unravel complex relationships between genes. Attempts to standardize the annotation of microarray data include the Minimum Information About a Microarray Experiment (MIAME) recommendations, the MAGE-ML format for data interchange, and the use of controlled vocabularies or ontologies. The existing software systems for microarray data analysis implement the mentioned standards only partially and are often hard to use and extend. Integration of genomic annotation data and other sources of external knowledge using open standards is therefore a key requirement for future integrated analysis systems. Results: The EMMA 2 software has been designed to resolve shortcomings with respect to full MAGE-ML and ontology support and makes use of modern data integration techniques. We present a software system that features comprehensive data analysis functions for spotted arrays, and for the most common synthesized oligo arrays such as Agilent, Affymetrix and NimbleGen. The system is based on the full MAGE object model. Analysis functionality is based on R and Bioconductor packages and can make use of a compute cluster for distributed services. Conclusion: Our model-driven approach for automatically implementing a full MAGE object model provides high flexibility and compatibility. Data integration via SOAP-based web-services is advantageous in a distributed client-server environment as the collaborative analysis of microarray data is gaining more and more relevance in international research consortia. The adequacy of the EMMA 2 software design and implementation has been proven by its application in many distributed functional genomics projects. Its scalability makes the current architecture suited for extensions towards future transcriptomics methods based on high-throughput sequencing approaches which have much higher computational requirements than microarrays.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Animal welfare issues have received much attention not only to supply farmed animal requirements, but also to ethical and cultural public concerns. Daily collected information, as well as the systematic follow-up of production stages, produces important statistical data for production assessment and control, as well as for improvement possibilities. In this scenario, this research study analyzed behavioral, production, and environmental data using Main Component Multivariable Analysis, which correlated observed behaviors, recorded using video cameras and electronic identification, with performance parameters of female broiler breeders. The aim was to start building a system to support decision-making in broiler breeder housing, based on bird behavioral parameters. Birds were housed in an environmental chamber, with three pens with different controlled environments. Bird sensitivity to environmental conditions were indicated by their behaviors, stressing the importance of behavioral observations for modern poultry management. A strong association between performance parameters and the behavior at the nest, suggesting that this behavior may be used to predict productivity. The behaviors of ruffling feathers, opening wings, preening, and at the drinker were negatively correlated with environmental temperature, suggesting that the increase of in the frequency of these behaviors indicate improvement of thermal welfare.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação de Mestrado, Gestão de Unidades de Saúde, Faculdade de Economia, Universidade do Algarve, 2016

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the exponential growth of the usage of web-based map services, the web GIS application has become more and more popular. Spatial data index, search, analysis, visualization and the resource management of such services are becoming increasingly important to deliver user-desired Quality of Service. First, spatial indexing is typically time-consuming and is not available to end-users. To address this, we introduce TerraFly sksOpen, an open-sourced an Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user’s data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for the end users to quickly understand and analyze the spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geo spatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map and can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU and I/O intensive tiers, which make it challenging to meet the response time targets of map requests while using the resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand. v-TerraFly are techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can accurately predict the workload demands: 18.91% more accurate; and efficiently allocate resources to meet the QoS target: improves the QoS by 26.19% and saves resource usages by 20.83% compared to traditional peak load-based resource allocation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

MEGAGEO - Moving megaliths in the Neolithic is a project that intends to find the provenience of lithic materials in the construction of tombs. A multidisciplinary approach is carried out, with researchers from several of the knowledge fields involved. This work presents a spatial data warehouse specially developed for this project that comprises information from national archaeological databases, geographic and geological information and new geochemical and petrographic data obtained during the project. The use of the spatial data warehouse proved to be essential in the data analysis phase of the project. The Redondo Area is presented as a case study for the application of the spatial data warehouse to analyze the relations between geochemistry, geology and the tombs in this area.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Collecting and analysing data is an important element in any field of human activity and research. Even in sports, collecting and analyzing statistical data is attracting a growing interest. Some exemplar use cases are: improvement of technical/tactical aspects for team coaches, definition of game strategies based on the opposite team play or evaluation of the performance of players. Other advantages are related to taking more precise and impartial judgment in referee decisions: a wrong decision can change the outcomes of important matches. Finally, it can be useful to provide better representations and graphic effects that make the game more engaging for the audience during the match. Nowadays it is possible to delegate this type of task to automatic software systems that can use cameras or even hardware sensors to collect images or data and process them. One of the most efficient methods to collect data is to process the video images of the sporting event through mixed techniques concerning machine learning applied to computer vision. As in other domains in which computer vision can be applied, the main tasks in sports are related to object detection, player tracking, and to the pose estimation of athletes. The goal of the present thesis is to apply different models of CNNs to analyze volleyball matches. Starting from video frames of a volleyball match, we reproduce a bird's eye view of the playing court where all the players are projected, reporting also for each player the type of action she/he is performing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given avsample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need of split-merge reversible jump moves typically associated with poor performance, we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes, and study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, extracting the characteristic traits common to all the data-groups, and propose a novel vector autoregressive model to study of growth curves of Singaporean kids. Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit-circle.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis, we investigate the role of applied physics in epidemiological surveillance through the application of mathematical models, network science and machine learning. The spread of a communicable disease depends on many biological, social, and health factors. The large masses of data available make it possible, on the one hand, to monitor the evolution and spread of pathogenic organisms; on the other hand, to study the behavior of people, their opinions and habits. Presented here are three lines of research in which an attempt was made to solve real epidemiological problems through data analysis and the use of statistical and mathematical models. In Chapter 1, we applied language-inspired Deep Learning models to transform influenza protein sequences into vectors encoding their information content. We then attempted to reconstruct the antigenic properties of different viral strains using regression models and to identify the mutations responsible for vaccine escape. In Chapter 2, we constructed a compartmental model to describe the spread of a bacterium within a hospital ward. The model was informed and validated on time series of clinical measurements, and a sensitivity analysis was used to assess the impact of different control measures. Finally (Chapter 3) we reconstructed the network of retweets among COVID-19 themed Twitter users in the early months of the SARS-CoV-2 pandemic. By means of community detection algorithms and centrality measures, we characterized users’ attention shifts in the network, showing that scientific communities, initially the most retweeted, lost influence over time to national political communities. In the Conclusion, we highlighted the importance of the work done in light of the main contemporary challenges for epidemiological surveillance. In particular, we present reflections on the importance of nowcasting and forecasting, the relationship between data and scientific research, and the need to unite the different scales of epidemiological surveillance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, product development in all its phases plays a fundamental role in the industrial chain. The need for a company to compete at high levels, the need to be quick in responding to market demands and therefore to be able to engineer the product quickly and with a high level of quality, has led to the need to get involved in new more advanced methods/ processes. In recent years, we are moving away from the concept of 2D-based design and production and approaching the concept of Model Based Definition. By using this approach, increasingly complex systems turn out to be easier to deal with but above all cheaper in obtaining them. Thanks to the Model Based Definition it is possible to share data in a lean and simple way to the entire engineering and production chain of the product. The great advantage of this approach is precisely the uniqueness of the information. In this specific thesis work, this approach has been exploited in the context of tolerances with the aid of CAD / CAT software. Tolerance analysis or dimensional variation analysis is a way to understand how sources of variation in part size and assembly constraints propagate between parts and assemblies and how that range affects the ability of a project to meet its requirements. It is critically important to note how tolerance directly affects the cost and performance of products. Worst Case Analysis (WCA) and Statistical analysis (RSS) are the two principal methods in DVA. The thesis aims to show the advantages of using statistical dimensional analysis by creating and examining various case studies, using PTC CREO software for CAD modeling and CETOL 6σ for tolerance analysis. Moreover, it will be provided a comparison between manual and 3D analysis, focusing the attention to the information lost in the 1D case. The results obtained allow us to highlight the need to use this approach from the early stages of the product design cycle.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

To assess the completeness and reliability of the Information System on Live Births (Sinasc) data. A cross-sectional analysis of the reliability and completeness of Sinasc's data was performed using a sample of Live Birth Certificate (LBC) from 2009, related to births from Campinas, Southeast Brazil. For data analysis, hospitals were grouped according to category of service (Unified National Health System, private or both), 600 LBCs were randomly selected and the data were collected in LBC-copies through mothers and newborns' hospital records and by telephone interviews. The completeness of LBCs was evaluated, calculating the percentage of blank fields, and the LBCs agreement comparing the originals with the copies was evaluated by Kappa and intraclass correlation coefficients. The percentage of completeness of LBCs ranged from 99.8%-100%. For the most items, the agreement was excellent. However, the agreement was acceptable for marital status, maternal education and newborn infants' race/color, low for prenatal visits and presence of birth defects, and very low for the number of deceased children. The results showed that the municipality Sinasc is reliable for most of the studied variables. Investments in training of the professionals are suggested in an attempt to improve system capacity to support planning and implementation of health activities for the benefit of maternal and child population.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Universidade Estadual de Campinas . Faculdade de Educação Física

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Universidade Estadual de Campinas . Faculdade de Educação Física

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Universidade Estadual de Campinas. Faculdade de Educação Física