970 results for Datasets


Relevance:

10.00%

Publisher:

Abstract:

The problem of discovering frequent arrangements of temporal intervals is studied. It is assumed that the database consists of sequences of events, where each event occurs over a time interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation for this work is the observation that, in practice, most events are not instantaneous but occur over a period of time, and different events may occur concurrently. There are thus many practical applications that require mining such temporal correlations between intervals, including the linguistic analysis of annotated data from American Sign Language as well as network and biological data. Two efficient methods for finding frequent arrangements of temporal intervals are described; the first is tree-based and uses depth-first search to mine the set of frequent arrangements, whereas the second is prefix-based. Both methods apply efficient pruning techniques, including a set of constraints consisting of regular expressions and gap constraints that add user-controlled focus to the mining process. Moreover, based on the extracted patterns, a standard method for mining association rules is employed, applying different interestingness measures to evaluate the significance of the discovered patterns and rules. The performance of the proposed algorithms is evaluated and compared with other approaches on real (American Sign Language annotations and network data) and large synthetic datasets.
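To make the notion of an arrangement concrete, the sketch below represents each sequence as a list of (label, start, end) events, reduces it to pairwise interval relations (a coarse subset of Allen's relations), and counts a pattern's support across the database by brute force. It illustrates what the tree- and prefix-based miners compute efficiently; it is not the paper's algorithm, and the toy database is invented.

```python
from itertools import combinations

def relation(a, b):
    """Classify the temporal relation between two (start, end) intervals.
    A coarse subset of Allen's interval relations, enough to define an
    'arrangement' for illustration."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 <= s2 and e2 <= e1:
        return "contains"
    return "overlaps"

def arrangement(sequence):
    """Map a sequence of (label, start, end) events to the set of
    pairwise (label_i, label_j, relation) triples describing it."""
    ordered = sorted(sequence, key=lambda ev: ev[1])
    return frozenset(
        (x[0], y[0], relation(x[1:], y[1:]))
        for x, y in combinations(ordered, 2)
    )

def support(pattern, database):
    """Fraction of sequences whose arrangement includes the pattern."""
    return sum(pattern <= arrangement(seq) for seq in database) / len(database)

# Toy database: each sequence is a list of (event_label, start, end).
db = [
    [("A", 0, 4), ("B", 2, 6)],   # A overlaps B
    [("A", 0, 3), ("B", 3, 7)],   # A meets B
    [("A", 0, 4), ("B", 1, 6)],   # A overlaps B
]
print(support(frozenset({("A", "B", "overlaps")}), db))  # 0.666...
```

The described miners avoid this exhaustive enumeration by growing candidate arrangements depth-first (tree-based) or by prefix extension, pruning with the support threshold and the user-supplied regular-expression and gap constraints.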

Relevance:

10.00%

Publisher:

Abstract:

Consideration of how people respond to the question "What is this?" has suggested new problem frontiers for pattern recognition and information fusion, as well as neural systems that embody the cognitive transformation of declarative information into relational knowledge. In contrast to traditional classification methods, which aim to find the single correct label for each exemplar ("This is a car"), the new approach discovers rules that embody coherent relationships among labels which would otherwise appear contradictory to a learning system ("This is a car, that is a vehicle, over there is a sedan"). This talk will describe how an individual who experiences exemplars in real time, with each exemplar trained on at most one category label, can autonomously discover a hierarchy of cognitive rules, thereby converting local information into global knowledge. Computational examples are based on the observation that sensors working at different times, locations, and spatial scales, and experts with different goals, languages, and situations, may produce apparently inconsistent image labels, which are reconciled by implicit underlying relationships that the network’s learning process discovers. The ARTMAP information fusion system can, moreover, integrate multiple separate knowledge hierarchies by fusing independent domains into a unified structure. In the process, the system discovers cross-domain rules, inferring multilevel relationships among groups of output classes without any supervised labeling of these relationships. In order to self-organize its expert system, the ARTMAP information fusion network features distributed code representations that exploit the model’s intrinsic capacity for one-to-many learning ("This is a car and a vehicle and a sedan") as well as many-to-one learning ("Each of those vehicles is a car"). Fusion system software, testbed datasets, and articles are available from http://cns.bu.edu/techlab.
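The set-inclusion heuristic below gives a minimal, non-neural illustration of how seemingly contradictory labels ("car" here, "vehicle" there) can be reconciled into a hierarchy: a label whose exemplar set is strictly contained in another's is treated as subordinate. It stands in for, and is far weaker than, the ARTMAP fusion network's learned rules; the data and names are invented.

```python
from collections import defaultdict

def infer_hierarchy(exemplar_labels):
    """Infer candidate 'every A is a B' rules from label co-occurrence:
    if the exemplars labelled A are a strict subset of those labelled B,
    treat B as superordinate to A. A co-occurrence heuristic only, not
    the ARTMAP fusion network itself."""
    seen = defaultdict(set)                 # label -> set of exemplar ids
    for i, labels in enumerate(exemplar_labels):
        for lab in labels:
            seen[lab].add(i)
    rules = []
    for a in seen:
        for b in seen:
            if a != b and seen[a] < seen[b]:   # strict subset
                rules.append((a, b))           # "every A is a B"
    return rules

# Labels produced by different 'sensors/experts' for the same scenes.
data = [{"sedan", "car", "vehicle"}, {"car", "vehicle"}, {"truck", "vehicle"}]
print(infer_hierarchy(data))
# e.g. [('sedan', 'car'), ('sedan', 'vehicle'), ('car', 'vehicle'), ('truck', 'vehicle')]
```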

Relevance:

10.00%

Publisher:

Abstract:

Memories in Adaptive Resonance Theory (ART) networks are based on matched patterns that focus attention on those portions of bottom-up inputs that match active top-down expectations. While this learning strategy has proved successful for both brain models and applications, computational examples show that attention to early critical features may later distort memory representations during online fast learning. For supervised learning, biased ARTMAP (bARTMAP) solves the problem of over-emphasis on early critical features by directing attention away from previously attended features after the system makes a predictive error. Small-scale, hand-computed analog and binary examples illustrate key model dynamics. Two-dimensional simulation examples demonstrate the evolution of bARTMAP memories as they are learned online. Benchmark simulations show that featural biasing also improves performance on large-scale examples. One example, which predicts movie genres and is based, in part, on the Netflix Prize database, was developed for this project. Both first principles and consistent performance improvements on all simulation studies suggest that featural biasing should be incorporated by default in all ARTMAP systems. Benchmark datasets and bARTMAP code are available from the CNS Technology Lab Website: http://techlab.bu.edu/bART/.
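A schematic sketch of the biasing idea, assuming a fuzzy-ART-style match function: after a predictive error, the features that drove the wrong match accumulate bias and are attenuated in subsequent match computations. The update rule and constants below are invented for illustration and are not the published bARTMAP equations.

```python
import numpy as np

def biased_match(x, w, bias, lam=0.5):
    """Fuzzy-ART-style match with a featural bias term. Features that
    contributed to earlier predictive errors are suppressed before the
    match is computed. Schematic only."""
    a = np.minimum(x, w) * (1.0 - lam * bias)   # biased fuzzy AND
    return a.sum() / x.sum()

def update_bias(bias, x, w, eta=0.3):
    """After a predictive error, shift attention away from the features
    that drove the (wrong) match."""
    contribution = np.minimum(x, w)
    return np.clip(bias + eta * contribution / (contribution.sum() + 1e-9), 0, 1)

x = np.array([0.9, 0.1, 0.8])      # input features
w = np.array([0.8, 0.2, 0.1])      # category weights
bias = np.zeros(3)
print(biased_match(x, w, bias))     # unbiased match
bias = update_bias(bias, x, w)      # suppose this category mispredicted
print(biased_match(x, w, bias))     # same features now count for less
```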

Relevance:

10.00%

Publisher:

Abstract:

European badgers (Meles meles) are an important part of the Irish ecosystem; they are a component of Ireland’s native fauna and are afforded protection by national and international laws. The species is also a reservoir host for bovine tuberculosis (bTB) and is implicated in the epidemiology of bTB in cattle. For this reason, badgers have been culled in the Republic of Ireland (ROI) in areas with persistent cattle bTB outbreaks. The population dynamics of badgers are therefore of great pure and applied interest. The studies within this thesis used large datasets and a number of analytical approaches to uncover essential elements of badger populations in the ROI. Furthermore, a review and meta-analysis of all available data on Irish badgers was completed to give a framework from which key knowledge gaps and future directions could be identified (Chapter 1). One main finding suggested that badger densities are significantly reduced in areas of repeated culling, as revealed through declining trends in signs of activity (Chapter 2) and capture numbers (Chapters 2 and 3). Despite this, the trappability of badgers was shown to be lower than previously thought, indicating that management programmes would require repeated long-term efforts to be effective (Chapter 4). Mark-recapture modelling of a population (sample area: 755 km²) suggested that mean badger density was typical of continental European populations, but substantially lower than British populations (Chapter 4). Badger movement patterns indicated that most of the population exhibited site fidelity. Long-distance movements were also recorded, the longest of which (20.1 km) was the greatest displacement of an Irish badger currently known (Chapter 5). The studies presented in this thesis allow for the development of more robust models of the badger population at national scales (see Future Directions). Through the use of large-scale datasets, future models will facilitate informed, sustainable planning for disease control.
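For readers unfamiliar with mark-recapture estimation, the Chapman-corrected Lincoln-Petersen estimator below shows the simplest version of the calculation behind such density figures. The thesis fits far richer mark-recapture models; the numbers here are hypothetical.

```python
def chapman_estimate(marked_first, caught_second, recaptured):
    """Chapman's bias-corrected Lincoln-Petersen abundance estimate:
    N ≈ (M+1)(C+1)/(R+1) - 1. The simplest two-sample mark-recapture
    calculation, shown only to illustrate the idea."""
    return ((marked_first + 1) * (caught_second + 1)) / (recaptured + 1) - 1

# Hypothetical: 120 badgers marked, 95 caught later, 18 of them marked.
n_hat = chapman_estimate(120, 95, 18)
print(f"estimated population: {n_hat:.0f}")
print(f"density over 755 km^2: {n_hat / 755:.2f} badgers per km^2")
```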

Relevance:

10.00%

Publisher:

Abstract:

The wave energy industry is progressing towards an advanced stage of development, with consideration being given to the selection of suitable sites for the first commercial installations. An informed and accurate characterisation of the wave energy resource is an essential aspect of this process. Ireland is exposed to an energetic wave climate; however, many features of this resource are not well understood. This thesis assesses and characterises the wave energy resource that has been measured and modelled at the Atlantic Marine Energy Test Site, a facility for conducting sea trials of floating wave energy converters that is being developed near Belmullet, on the west coast of Ireland. This characterisation process is undertaken through the analysis of metocean datasets that have previously been unavailable for exposed Irish sites. A number of commonly made assumptions in the calculation of wave power are contested, and the uncertainties resulting from their application are demonstrated. The relationship between commonly used wave period parameters is studied, and its importance in the calculation of wave power quantified, while it is also shown that a disconnect exists between the sea states which occur most frequently at the site and those that contribute most to the incident wave energy. Additionally, observations of the extreme wave conditions that have occurred at the site and estimates of future storms that devices will need to withstand are presented. The implications of these results for the design and operation of wave energy converters are discussed. The foremost contribution of this thesis is the development of an enhanced understanding of the fundamental nature of the wave energy resource at the Atlantic Marine Energy Test Site. The results presented here also have a wider relevance, and can be considered typical of other, similarly exposed, locations on Ireland’s west coast.
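One of the contested assumptions can be made concrete: omnidirectional wave power is commonly computed from the deep-water relation P = (ρg²/64π)·Hs²·Te, and when the energy period Te is unavailable it is often derived from another period parameter through an assumed spectral-shape ratio. The sketch below illustrates both steps; the Te/Tp ratio is exactly the kind of site-dependent assumption whose uncertainty the thesis quantifies.

```python
import math

RHO = 1025.0   # seawater density, kg m^-3
G = 9.81       # gravitational acceleration, m s^-2

def wave_power_deep(hs, te):
    """Omnidirectional wave power per metre of wave crest (kW/m) under
    the common deep-water assumption: P = (rho * g^2 / 64 pi) * Hs^2 * Te,
    with Hs in metres and Te in seconds."""
    return RHO * G**2 / (64 * math.pi) * hs**2 * te / 1000.0

def te_from_tp(tp, ratio=0.9):
    """Energy period approximated from the peak period with an assumed
    spectral-shape ratio. The appropriate ratio is site- and
    sea-state-dependent, which is part of the thesis's point."""
    return ratio * tp

print(wave_power_deep(hs=3.0, te=te_from_tp(11.0)))  # ~ 43.7 kW/m
```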

Relevance:

10.00%

Publisher:

Abstract:

Case-Based Reasoning (CBR) uses past experiences to solve new problems. The quality of the past experiences, which are stored as cases in a case base, is a big factor in the performance of a CBR system. The system's competence may be improved by adding problems to the case base after they have been solved and their solutions verified to be correct. However, from time to time, the case base may have to be refined to reduce redundancy and to get rid of any noisy cases that may have been introduced. Many case base maintenance algorithms have been developed to delete noisy and redundant cases. However, different algorithms work well in different situations and it may be difficult for a knowledge engineer to know which one is the best to use for a particular case base. In this thesis, we investigate ways to combine algorithms to produce better deletion decisions than the decisions made by individual algorithms, and ways to choose which algorithm is best for a given case base at a given time. We analyse five of the most commonly-used maintenance algorithms in detail and show how the different algorithms perform better on different datasets. This motivates us to develop a new approach: maintenance by a committee of experts (MACE). MACE allows us to combine maintenance algorithms to produce a composite algorithm which exploits the merits of each of the algorithms that it contains. By combining different algorithms in different ways we can also define algorithms that have different trade-offs between accuracy and deletion. While MACE allows us to define an infinite number of new composite algorithms, we still face the problem of choosing which algorithm to use. To make this choice, we need to be able to identify properties of a case base that are predictive of which maintenance algorithm is best. We examine a number of measures of dataset complexity for this purpose. These provide a numerical way to describe a case base at a given time. We use the numerical description to develop a meta-case-based classification system. This system uses previous experience about which maintenance algorithm was best to use for other case bases to predict which algorithm to use for a new case base. Finally, we give the knowledge engineer more control over the deletion process by creating incremental versions of the maintenance algorithms. These incremental algorithms suggest one case at a time for deletion rather than a group of cases, which allows the knowledge engineer to decide whether or not each case in turn should be deleted or kept. We also develop incremental versions of the complexity measures, allowing us to create an incremental version of our meta-case-based classification system. Since the case base changes after each deletion, the best algorithm to use may also change. The incremental system allows us to choose which algorithm is the best to use at each point in the deletion process.
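A minimal sketch of the committee idea behind MACE: each maintenance algorithm acts as an expert voting on every case, and a case is deleted only when enough experts agree. The quorum parameter is one way to trade accuracy against deletion; the expert stand-ins below are invented.

```python
def mace_vote(case_ids, experts, quorum=0.5):
    """Committee-of-experts deletion in the spirit of MACE: each
    maintenance algorithm votes on every case, and a case is deleted
    only if more than `quorum` of the experts agree. A schematic
    sketch; the thesis explores richer combination schemes."""
    to_delete = []
    for cid in case_ids:
        votes = sum(expert(cid) for expert in experts)  # True = delete
        if votes / len(experts) > quorum:
            to_delete.append(cid)
    return to_delete

# Hypothetical experts: stand-ins for e.g. noise- and redundancy-based
# maintenance algorithms, each mapping a case id to a delete decision.
noisy = {3, 7}
redundant = {7, 9}
experts = [lambda c: c in noisy, lambda c: c in redundant, lambda c: c == 7]
print(mace_vote(range(10), experts))   # [7] -- only the unanimous case
```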

Relevance:

10.00%

Publisher:

Abstract:

Background: Inclusive education is central to contemporary discourse internationally, reflecting societies’ wider commitment to social inclusion. Education has witnessed transforming approaches that have created differing distributions of power, resource allocation and accountability. Multiple actors are being forced to consider changes to how key services and supports are organised. This research constitutes a case study situated within this broader social service dilemma of how to distribute finite resources equitably to meet individual need, while advancing inclusion. It focuses on the national directive with regard to inclusive educational practice for primary schools, Department of Education and Science Special Education Circular 02/05, which introduced the General Allocation Model (GAM) within the legislative context of the Education of Persons with Special Educational Needs (EPSEN) Act (Government of Ireland, 2004). This research could help to inform policy with ‘facts about what is happening on the ground’ (Quinn, 2013). Research Aims: The research set out to unearth the assumptions and definitions embedded within the policy document, to analyse how those who are at the coalface of policy, and who interface with multiple interests in primary schools, understand the GAM and respond to it, and to investigate its effects on students and their education. It examines student outcomes in the primary schools where the GAM was investigated. Methods and Sample: The post-structural study acknowledges the importance of policy analysis that explicitly links the ‘bigger worlds’ of global and national policy contexts to the ‘smaller worlds’ of policies and practices within schools and classrooms. This study insists upon taking the detail seriously (Ozga, 1990). A mixed methods approach to data collection and analysis is applied. In order to secure the perspectives of key stakeholders, semi-structured interviews were conducted with primary school principals, class teachers and learning support/resource teachers (n=14) in three distinct mainstream, non-DEIS schools. Data from the schools and their environs provided a profile of students. The researcher then used the Pobal Maps Facility (available at www.pobal.ie) to identify the Small Area (SA) in which each student resides, and to assign values to each address based on the Pobal HP Deprivation Index (Haase and Pratschke, 2012). Analysis of the datasets, guided by the conceptual framework of the policy cycle (Ball, 1994), revealed a number of significant themes. Results: Data illustrate that the main model to support student need is withdrawal from the classroom, under a policy that espouses inclusion. Quantitative data, in particular, highlighted an association between segregated practice and lower socio-economic status (LSES) backgrounds of students. Up to 83% of the students in special education programmes are from LSES backgrounds. In some schools, 94% of students from LSES backgrounds are withdrawn from classrooms daily for special education. While the internal processes of schooling are not solely to blame for class inequalities, this study reveals the power of professionals to order children in school, which has implications for segregated special education practice. Such agency on the part of key actors in the context of practice relates to ‘local constructions of dis/ability’, which are influenced by teacher habitus (Bourdieu, 1984).
The researcher contends that inclusive education has not resulted in positive outcomes for students from LSES backgrounds because it is built on faulty assumptions that focus on a psycho-medical perspective of dis/ability; that is, placement decisions do not consider the intersectionality of dis/ability with class or culture. This study argues that the student need for support is better understood as ‘home/school discontinuity’, not ‘disability’. Moreover, the study unearths the power of some parents to use social and cultural capital to ensure eligibility for enhanced resources. A hierarchical system has therefore developed in mainstream schools as a result of funding models to support need in inclusive settings. Furthermore, all schools in the study are ‘ordinary’ schools, yet participants acknowledged that some schools are more ‘advantaged’, which may suggest that ‘ordinary’ schools serve to ‘bury class’ (Reay, 2010) as a key marker in allocating resources. The research suggests that general allocation models of funding to meet the needs of students demand a systematic approach grounded in reallocating funds from where they have less benefit to where they have more. The calculation of the composite Haase Value in respect of the student cohort in receipt of special education support adopted for this study could be usefully applied at a national level to ensure that the greatest level of support is targeted at the greatest need. Conclusion: In summary, the study reveals that existing structures constrain and enable agents, whose interactions produce intended and unintended consequences. The study suggests that policy should be viewed as a continuous and evolving cycle (Ball, 1994) in which actors in each of the social contexts have a shared responsibility for the evolution of education that is equitable, excellent and inclusive.

Relevance:

10.00%

Publisher:

Abstract:

Ribosome profiling (ribo-seq) is a recently developed technique that provides genome-wide information on protein synthesis (GWIPS) in vivo. The high resolution of ribo-seq is one of the exciting properties of this technique. In Chapter 2, I present a computational method that utilises the sub-codon precision and triplet periodicity of ribosome profiling data to detect transitions in the translated reading frame. Application of this method to ribosome profiling data generated for human HeLa cells allowed us to detect several human genes where the same genomic segment is translated in more than one reading frame. Since the initial publication of the ribosome profiling technique in 2009, there has been a proliferation of studies that have used the technique to explore various questions with respect to translation. A review of the many uses and adaptations of the technique is provided in Chapter 1. Indeed, owing to the increasing popularity of the technique and the growing number of published ribosome profiling datasets, we have developed GWIPS-viz (http://gwips.ucc.ie), a ribo-seq-dedicated genome browser. Details on the development of the browser and its usage are provided in Chapter 3. One of the surprising findings of ribosome profiling of initiating ribosomes, carried out in three independent studies, was the widespread use of non-AUG codons as translation initiation start sites in mammals. Although initiation at non-AUG codons in mammals has been documented for some time, the extent of non-AUG initiation reported by these ribo-seq studies was unexpected. In Chapter 4, I present an approach for estimating the strength of initiating codons based on the leaky scanning model of translation initiation. Application of this approach to ribo-seq data illustrates that initiation at non-AUG codons is inefficient compared to initiation at AUG codons. In addition, our approach provides a probability-of-initiation score for each start site that allows its strength of initiation to be evaluated.
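The reading-frame detection idea can be illustrated in a few lines: ribo-seq read 5' ends are counted by position modulo 3 relative to the annotated start, and a change in the dominant sub-codon phase along the ORF signals a frameshift. This toy version ignores the offset calibration, filtering and statistics of the actual method, and the read positions are invented.

```python
from collections import Counter

def subcodon_profile(read_positions, cds_start):
    """Count ribo-seq read 5' ends by position modulo 3 relative to the
    annotated CDS start. Strong triplet periodicity puts most reads in
    one sub-codon phase; a shift of the dominant phase along the ORF is
    the signature of a reading-frame transition."""
    return Counter((p - cds_start) % 3 for p in read_positions)

def dominant_frame(profile):
    return profile.most_common(1)[0][0]

# Hypothetical reads: phase 0 dominates in the 5' half, phase 2
# downstream -- consistent with a -1 frameshift between the halves.
upstream = [30, 33, 36, 40, 39, 42, 45, 48]
downstream = [80, 83, 86, 89, 92, 95, 98, 82]
print(dominant_frame(subcodon_profile(upstream, cds_start=30)))    # 0
print(dominant_frame(subcodon_profile(downstream, cds_start=30)))  # 2
```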

Relevance:

10.00%

Publisher:

Abstract:

Existing work in Computer Science and Electronic Engineering demonstrates that Digital Signal Processing techniques can effectively identify the presence of stress in the speech signal. These techniques use datasets containing real or actual stress samples, i.e. real-life stress such as 911 calls. Studies that use simulated or laboratory-induced stress have been less successful and inconsistent. Pervasive, ubiquitous computing is increasingly moving towards voice-activated and voice-controlled systems and devices. Speech recognition and speaker identification algorithms will have to improve and take emotional speech into account. Modelling the influence of stress on speech and voice is of interest to researchers from many different disciplines, including security, telecommunications, psychology, speech science, forensics and Human Computer Interaction (HCI). The aim of this work is to assess the impact of moderate stress on the speech signal. In order to do this, a dataset of laboratory-induced stress is required. While attempting to build this dataset it became apparent that reliably inducing measurable stress in a controlled environment, when speech is a requirement, is a challenging task. This work focuses on the use of a variety of stressors to elicit a stress response during tasks that involve speech content. Biosignal analysis (commercial Brain Computer Interfaces, eye tracking and skin resistance) is used to verify and quantify the stress response, if any. This thesis explains the basis of the author’s hypotheses on the elicitation of affectively-toned speech and presents the results of several studies carried out throughout the PhD research period. These results show that the elicitation of stress, particularly the induction of affectively-toned speech, is not a simple matter and that many modulating factors influence the stress response process. A model is proposed to reflect the author’s hypothesis on the emotional response pathways relating to the elicitation of stress with a required speech content. Finally, the author provides guidelines and recommendations for future research on speech under stress. Further research paths are identified and a roadmap for future research in this area is defined.
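As a crude stand-in for the biosignal verification step, the sketch below flags a stress response when mean skin conductance during a task rises well above the participant's own baseline. The criterion, threshold and values are invented; the thesis combines several signals (EEG-based interfaces, eye tracking, skin resistance) rather than any single rule like this.

```python
import statistics

def stress_response(baseline, task, k=2.0):
    """Flag a stress response when mean skin conductance during the
    task exceeds the baseline mean by more than k baseline standard
    deviations. A deliberately crude, illustrative criterion."""
    mu, sd = statistics.mean(baseline), statistics.stdev(baseline)
    return statistics.mean(task) > mu + k * sd

baseline = [2.1, 2.0, 2.2, 2.1, 2.0]   # microsiemens, invented values
task = [2.9, 3.1, 3.0, 3.2, 2.8]
print(stress_response(baseline, task))  # True
```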

Relevance:

10.00%

Publisher:

Abstract:

Background: Childhood obesity is a global epidemic posing a significant threat to the health and wellbeing of children. To reverse this epidemic, it is essential that we gain a deeper understanding of the complex array of driving factors at an individual, family and wider ecological level. Using a social-ecological framework, this thesis investigates the direction, magnitude and contribution of risk factors for childhood overweight and obesity at multiple levels of influence, with a particular focus on diet and physical activity. Methods: A systematic review was conducted to describe recent trends (from 2002 to 2012) in childhood overweight and obesity prevalence in Irish school children from the Republic of Ireland. Two datasets (the Cork Children’s Lifestyle [CCLaS] Study and the Growing Up in Ireland [GUI] Study) were used to explore determinants of childhood overweight and obesity. Individual lifestyle factors examined were diet, physical activity and sedentary behaviour. The determinants of physical activity were also explored. Family factors examined were parental weight status and household socio-economic status. The impact of food access in the local area on diet quality and body mass index (BMI) was investigated as an environmental-level risk factor. Results: Between 2002 and 2012, the prevalence of childhood overweight and obesity in Ireland remained stable. There was some evidence to suggest that childhood obesity rates may have decreased slightly, though one in four Irish children remained either overweight or obese. In the CCLaS study, overweight and obese children consumed more unhealthy foods than normal-weight children. A diet quality score was constructed based on a previously validated adult diet score. Each one-unit increase in diet quality was significantly associated with a decreased risk of childhood overweight and obesity. Individual-level factors (including gender, being a member of a sports team, and weight status) were more strongly associated with physical activity levels than family or environmental factors. Overweight and obese children were more sedentary and less active than normal-weight children. There was a dose-response relationship between time spent at moderate to vigorous physical activity (MVPA) and the risk of childhood obesity, independent of sedentary time. In contrast, total sedentary time was not associated with the risk of childhood obesity independent of MVPA, though screen time was associated with childhood overweight and obesity. In the GUI Study, only one in five children had two normal-weight parents (or one normal-weight parent in the case of single-parent families). Having overweight and obese parents was a significant risk factor for overweight and obesity regardless of the socio-economic characteristics of the household. Family income was not associated with the odds of childhood obesity, but social class and parental education were important risk factors for childhood obesity. Access to food stores in the local environment did not impact the dietary quality or BMI of Irish children. However, there was some evidence to suggest that the economic resources of the family influenced diet and BMI. Discussion: Though childhood overweight and obesity rates appear to have stabilised over the previous decade, prevalence rates are unacceptably high. As expected, overweight and obesity were associated with a high energy intake and poor dietary quality.
The findings also highlight strong associations between physical inactivity and the risk of overweight and obesity, with effect sizes greater than those typically found in adults. Important family-level determinants of childhood overweight and obesity were also identified. The findings highlight the need for a multifaceted approach, targeting a range of modifiable determinants, to tackle the problem. In particular, policies and interventions at the shared family environment or community level may be an effective means of tackling this current epidemic.

Relevance:

10.00%

Publisher:

Abstract:

INTRODUCTION: The characterization of urinary calculi using noninvasive methods has the potential to affect clinical management. CT remains the gold standard for diagnosis of urinary calculi, but has not reliably differentiated varying stone compositions. Dual-energy CT (DECT) has emerged as a technology to improve CT characterization of anatomic structures. This study aims to assess the ability of DECT to accurately discriminate between different types of urinary calculi in an in vitro model using novel post-image-acquisition data processing techniques. METHODS: Fifty urinary calculi were assessed, of which 44 had ≥60% composition of one component. DECT was performed utilizing 64-slice multidetector CT. The attenuation profiles of the lower-energy (DECT-Low) and higher-energy (DECT-High) datasets were used to investigate whether differences could be seen between different stone compositions. RESULTS: Post-image-acquisition processing allowed for identification of the main chemical compositions of urinary calculi: brushite, calcium oxalate-calcium phosphate, struvite, cystine, and uric acid. Statistical analysis demonstrated that this processing identified all stone compositions without obvious graphical overlap. CONCLUSION: Dual-energy multidetector CT with postprocessing techniques allows for accurate discrimination among the main subtypes of urinary calculi in an in vitro model. The ability to better detect stone composition may have implications in determining the optimum clinical treatment modality for urinary calculi from noninvasive, preprocedure radiological assessment.
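The discrimination rests on the fact that different stone chemistries attenuate low- and high-energy beams in characteristically different proportions. The sketch below classifies by the low-to-high attenuation ratio; the thresholds are illustrative placeholders, not the study's calibrated values.

```python
def dual_energy_ratio(hu_low, hu_high):
    """Ratio of low- to high-energy attenuation (Hounsfield units).
    Different stone chemistries shift this ratio in characteristic
    ways, which is what lets dual-energy CT separate them."""
    return hu_low / hu_high

def classify(hu_low, hu_high):
    # Threshold values are invented for illustration only.
    r = dual_energy_ratio(hu_low, hu_high)
    if r < 1.1:
        return "uric acid (low ratio)"
    if r < 1.35:
        return "cystine / struvite (intermediate)"
    return "calcium-based (high ratio)"

print(classify(hu_low=550, hu_high=520))    # uric acid (low ratio)
print(classify(hu_low=1400, hu_high=950))   # calcium-based (high ratio)
```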

Relevance:

10.00%

Publisher:

Abstract:

The objective of spatial downscaling strategies is to increase the information content of coarse datasets at smaller scales. In the case of quantitative precipitation estimation (QPE) for hydrological applications, the goal is to close the scale gap between the spatial resolution of coarse datasets (e.g., gridded satellite precipitation products at resolution L × L) and the high resolution (l × l; L ≫ l) necessary to capture the spatial features that determine the spatial variability of water flows and water stores in the landscape. In essence, the downscaling process consists of weaving subgrid-scale heterogeneity over a desired range of wavelengths into the original field. The defining question is: which properties, statistical and otherwise, of the target field (the known observable at the desired spatial resolution) should be matched, with the caveat that downscaling methods be as general as possible and therefore ideally free of case-specific constraints and/or calibration requirements? Here, the attention is focused on two simple fractal downscaling methods, using iterated function systems (IFS) and fractal Brownian surfaces (FBS), that meet this requirement. The two methods were applied to spatially disaggregate 27 summertime convective storms in the central United States during 2007 at three consecutive times (1800, 2100, and 0000 UTC, thus 81 fields overall) from the Tropical Rainfall Measuring Mission (TRMM) version 6 (V6) 3B42 precipitation product (~25-km grid spacing) to the same resolution as the NCEP stage IV products (~4-km grid spacing). Results from bilinear interpolation are used as the control. A fundamental distinction between IFS and FBS is that the latter implies a distribution of downscaled fields and thus an ensemble solution, whereas the former provides a single solution. The downscaling effectiveness is assessed using fractal measures (the spectral exponent β, fractal dimension D, Hurst coefficient H, and roughness amplitude R) and traditional operational skill scores [false alarm rate (FR), probability of detection (PD), threat score (TS), and Heidke skill score (HSS)], as well as bias and the root-mean-square error (RMSE). The results show that both IFS and FBS fractal interpolation perform well with regard to operational skill scores, and they meet the additional requirement of generating structurally consistent fields. Furthermore, confidence intervals can be directly generated from the FBS ensemble. The results were used to diagnose errors relevant for hydrometeorological applications, in particular a spatial displacement with a characteristic length of at least 50 km (2500 km²) in the location of peak rainfall intensities for the cases studied. © 2010 American Meteorological Society.
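A minimal sketch of the FBS branch of the approach: a fractional Brownian surface is synthesized spectrally (amplitude ∝ k^-(H+1) in 2D), used to weave subgrid variability into a replicated coarse field, and renormalized so each coarse cell conserves its mean. This is a textbook construction under assumed parameters, not the paper's implementation; different seeds give different ensemble members.

```python
import numpy as np

def fbm_surface(n, hurst, seed=0):
    """Fractional Brownian surface by spectral synthesis: white noise
    filtered with a power-law amplitude spectrum ~ k^-(H+1). One draw
    from the FBS ensemble."""
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n)
    kx, ky = np.meshgrid(k, k)
    kk = np.hypot(kx, ky)
    kk[0, 0] = 1.0                                # avoid division by zero
    spectrum = kk ** (-(hurst + 1.0))
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    field = np.fft.ifft2(noise * spectrum).real
    return (field - field.mean()) / field.std()

def downscale(coarse, factor, hurst=0.7, seed=0):
    """Disaggregate a coarse rain field: replicate each coarse cell,
    weave in subgrid variability from an fBm surface, then renormalize
    so each coarse box keeps its original mean (mass conservation)."""
    fine = np.kron(coarse, np.ones((factor, factor)))
    rough = fbm_surface(fine.shape[0], hurst, seed)
    fine = np.clip(fine * (1.0 + 0.3 * rough), 0, None)
    for i in range(coarse.shape[0]):
        for j in range(coarse.shape[1]):
            box = fine[i*factor:(i+1)*factor, j*factor:(j+1)*factor]
            if box.mean() > 0:
                box *= coarse[i, j] / box.mean()
    return fine

coarse = np.array([[2.0, 8.0], [0.0, 4.0]])   # mm/h on a coarse grid
print(downscale(coarse, factor=4).shape)       # (8, 8): 4x finer grid
```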

Relevance:

10.00%

Publisher:

Abstract:

BACKGROUND: Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not previously been mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets. RESULTS: Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using a program described here, SplicerAV. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV was able to perform both gene- and isoform-specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 Plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which revealed a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low- and high-grade breast tumors, with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets. CONCLUSIONS: Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform-specific mRNA changes directly associated with cancer survival.
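The underlying intuition can be sketched simply: probes covering a single exon whose expression change departs from the gene's overall change point to an isoform shift. The toy score below is a stand-in for SplicerAV's actual model, and the simulated data are invented.

```python
import numpy as np
from scipy import stats

def isoform_shift(exon_probes, gene_probes):
    """Test whether probes covering one exon deviate from the gene's
    overall expression change between two conditions -- the intuition
    behind mining expression arrays for isoform changes. Rows are
    probes, columns are samples, values are log2 expression ratios.
    A toy stand-in for SplicerAV, not its published model."""
    splice_index = exon_probes.mean(axis=0) - gene_probes.mean(axis=0)
    t, p = stats.ttest_1samp(splice_index, 0.0)
    return t, p

rng = np.random.default_rng(1)
gene = rng.normal(0.0, 0.2, size=(8, 20))   # 8 probes, 20 samples
exon = rng.normal(0.8, 0.2, size=(3, 20))   # exon included more in tumors
print(isoform_shift(exon, gene))             # large t, tiny p
```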

Relevance:

10.00%

Publisher:

Abstract:

BACKGROUND: Biological processes occur on a vast range of time scales, and many of them occur concurrently. As a result, system-wide measurements of gene expression have the potential to capture many of these processes simultaneously. The challenge, however, is to separate these processes and time scales in the data. In many cases the number of processes and their time scales is unknown. This issue is particularly relevant to developmental biologists, who are interested in processes such as growth, segmentation and differentiation, which can all take place simultaneously, but on different time scales. RESULTS: We introduce a flexible and statistically rigorous method for detecting different time scales in time-series gene expression data, by identifying expression patterns that are temporally shifted between replicate datasets. We apply our approach to a Saccharomyces cerevisiae cell-cycle dataset and an Arabidopsis thaliana root developmental dataset. In both datasets our method successfully detects processes operating on several different time scales. Furthermore, we show that many of these time scales can be associated with particular biological functions. CONCLUSIONS: The spatiotemporal modules identified by our method suggest the presence of multiple biological processes, acting at distinct time scales, in both the Arabidopsis root and yeast. Using similar large-scale expression datasets, the identification of biological processes acting at multiple time scales in many organisms is now possible.
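The core of the idea, identifying expression patterns that are temporally shifted between replicate datasets, can be illustrated with a lagged-correlation search: the shift that best aligns a gene's replicate profiles hints at the clock its process runs on. This toy version omits the paper's statistical model, and the signals are synthetic.

```python
import numpy as np

def best_shift(a, b, max_lag=10):
    """Find the shift (in samples) that best aligns two replicate
    expression profiles of one gene, by maximizing the Pearson
    correlation over candidate lags. A toy version of the idea."""
    def corr_at(lag):
        if lag >= 0:
            x, y = a[lag:], b[:len(b) - lag]
        else:
            x, y = a[:len(a) + lag], b[-lag:]
        return np.corrcoef(x, y)[0, 1]
    return max(range(-max_lag, max_lag + 1), key=corr_at)

t = np.linspace(0, 4 * np.pi, 100)
rep1 = np.sin(t) + 0.1 * np.random.default_rng(0).standard_normal(100)
rep2 = np.sin(t - 0.5) + 0.1 * np.random.default_rng(1).standard_normal(100)
print(best_shift(rep1, rep2))   # -4: rep2 trails rep1 by ~4 samples (~0.5 rad)
```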

Relevance:

10.00%

Publisher:

Abstract:

This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies, generating many very large datasets and requiring increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in the ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-ups and, critically, enable statistical analyses that presently would not be performed due to compute time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
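The kind of computation that benefits is easy to exhibit: the E-step of a mixture model evaluates every observation against every component independently, which maps directly onto GPU parallelism. The sketch below writes it as dense array code (NumPy here; the same expressions run on a GPU via a drop-in array library such as CuPy) and is a schematic illustration, not the authors' implementation.

```python
import numpy as np   # swap in cupy for a GPU: the array code is identical

def responsibilities(x, means, sds, weights):
    """E-step of a one-dimensional Gaussian mixture, written as dense
    per-observation, per-component array operations -- exactly the
    embarrassingly parallel workload the article offloads to a GPU."""
    x = x[:, None]                                  # (n, 1) against (k,) components
    log_pdf = (-0.5 * ((x - means) / sds) ** 2 - np.log(sds)
               - 0.5 * np.log(2 * np.pi) + np.log(weights))
    log_pdf -= log_pdf.max(axis=1, keepdims=True)   # stabilize the softmax
    r = np.exp(log_pdf)
    return r / r.sum(axis=1, keepdims=True)         # (n, k) responsibilities

x = np.random.default_rng(0).normal(size=1_000_000)
r = responsibilities(x,
                     means=np.array([-2.0, 0.0, 2.0]),
                     sds=np.array([1.0, 0.5, 1.0]),
                     weights=np.array([0.2, 0.5, 0.3]))
print(r.shape, r.sum(axis=1)[:3])   # (1000000, 3) [1. 1. 1.]
```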