Digital Library

861 results for Curricular Support Data Analysis

Scaling up data mining techniques to large datasets using parallel and distributed processing

Relevance:

100.00%

Publisher:

Abstract:

Advances in hardware and software technology enable us to collect, store and distribute data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.

Current problems in four-dimensional data assimilation

Relevance:

100.00%

Publisher:

Abstract:

The purpose of this lecture is to review recent developments in data analysis, initialization and data assimilation. The development of 3-dimensional multivariate schemes has been very timely because of their suitability for handling the many different types of observations made during FGGE. Great progress has been made in the initialization of global models with the aid of the non-linear normal mode technique. However, in spite of this progress, several fundamental problems remain unsatisfactorily solved. Of particular importance are the initialization of the divergent wind fields in the Tropics and the search for proper ways to initialize weather systems driven by non-adiabatic processes. The unsatisfactory ways in which such processes are currently initialized lead to excessively long spin-up times.

The world at one's fingertips: interactive interpretation of environmental data

Relevance:

100.00%

Publisher:

Abstract:

This chapter introduces the latest practices and technologies in the interactive interpretation of environmental data. With environmental data becoming ever larger, more diverse and more complex, there is a need for a new generation of tools that provides new capabilities over and above those of the standard workhorses of science. These new tools aid the scientist in discovering interesting new features (and also problems) in large datasets by allowing the data to be explored interactively using simple, intuitive graphical tools. In this way, new discoveries are made that are commonly missed by automated batch data processing. This chapter discusses the characteristics of environmental science data, common current practice in data analysis and the supporting tools and infrastructure. New approaches are introduced and illustrated from the points of view of both the end user and the underlying technology. We conclude by speculating as to future developments in the field and what must be achieved to fulfil this vision.

The neural bases of the short-term storage of verbal information are anatomically variable across individuals

Relevance:

100.00%

Publisher:

Abstract:

What are the precise brain regions supporting the short-term retention of verbal information? A previous functional magnetic resonance imaging (fMRI) study suggested that they may be topographically variable across individuals, occurring, in most, in regions posterior to prefrontal cortex (PFC), and that detection of these regions may be best suited to a single-subject (SS) approach to fMRI analysis (Feredoes and Postle, 2007). In contrast, other studies using spatially normalized group-averaged (SNGA) analyses have localized storage-related activity to PFC. To evaluate the necessity of the regions identified by these two methods, we applied repetitive transcranial magnetic stimulation (rTMS) to SS- and SNGA-identified regions throughout the retention period of a delayed letter-recognition task. Results indicated that rTMS targeting SS analysis-identified regions of left perisylvian and sensorimotor cortex impaired performance, whereas rTMS targeting the SNGA-identified region of left caudal PFC had no effect on performance. Our results support the view that the short-term retention of verbal information can be supported by regions associated with acoustic, lexical, phonological, and speech-based representation of information. They also suggest that the brain bases of some cognitive functions may be better detected by SS than by SNGA approaches to fMRI data analysis.

TRY: a global database of plant traits

Relevance:

100.00%

Publisher:

Abstract:

Plant traits – the morphological, anatomical, physiological, biochemical and phenological characteristics of plants and their organs – determine how primary producers respond to environmental factors, affect other trophic levels, influence ecosystem processes and services and provide a link from species richness to ecosystem functional diversity. Trait data thus represent the raw material for a wide range of research from evolutionary biology, community and functional ecology to biogeography. Here we present the global database initiative named TRY, which has united a wide range of the plant trait research community worldwide and gained an unprecedented buy-in of trait data: so far 93 trait databases have been contributed. The data repository currently contains almost three million trait entries for 69 000 out of the world's 300 000 plant species, with a focus on 52 groups of traits characterizing the vegetative and regeneration stages of the plant life cycle, including growth, dispersal, establishment and persistence. A first data analysis shows that most plant traits are approximately log-normally distributed, with widely differing ranges of variation across traits. Most trait variation is between species (interspecific), but significant intraspecific variation is also documented, up to 40% of the overall variation. Plant functional types (PFTs), as commonly used in vegetation models, capture a substantial fraction of the observed variation – but for several traits most variation occurs within PFTs, up to 75% of the overall variation. In the context of vegetation models these traits would better be represented by state variables rather than fixed parameter values. 
The improved availability of plant trait data in the unified global database is expected to support a paradigm shift from species to trait-based ecology, offer new opportunities for synthetic plant trait research and enable a more realistic and empirically grounded representation of terrestrial vegetation in Earth system models.

Error, reproducibility and sensitivity: a pipeline for data processing of Agilent oligonucleotide expression arrays

Relevance:

100.00%

Publisher:

Abstract:

Background: Expression microarrays are increasingly used to obtain large-scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intraarray variability is small (only around 2% of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log(2) units (6% of mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform.
The analysis is used to obtain quantitative estimates of the SD arising from experimental (non biological) intra- and interarray variability, and for a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further more extensive studies of the systems biology of eukaryotic cells.

Are power calculations useful? A multicentre neuroimaging study

Relevance:

100.00%

Publisher:

Abstract:

There are now many reports of imaging experiments with small cohorts of typical participants that precede large-scale, often multicentre studies of psychiatric and neurological disorders. Data from these calibration experiments are sufficient to make estimates of statistical power and predictions of sample size and minimum observable effect sizes. In this technical note, we suggest how previously reported voxel-based power calculations can support decision making in the design, execution and analysis of cross-sectional multicentre imaging studies. The choice of MRI acquisition sequence, distribution of recruitment across acquisition centres, and changes to the registration method applied during data analysis are considered as examples. The consequences of modification are explored in quantitative terms by assessing the impact on sample size for a fixed effect size and detectable effect size for a fixed sample size. The calibration experiment dataset used for illustration was a precursor to the now complete Medical Research Council Autism Imaging Multicentre Study (MRC-AIMS). Validation of the voxel-based power calculations is made by comparing the predicted values from the calibration experiment with those observed in MRC-AIMS. The effect of non-linear mappings during image registration to a standard stereotactic space on the prediction is explored with reference to the amount of local deformation. In summary, power calculations offer a validated, quantitative means of making informed choices on important factors that influence the outcome of studies that consume significant resources.
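The trade-off described above, sample size for a fixed effect size versus detectable effect size for a fixed sample size, can be illustrated with a minimal sketch. This is not the voxel-based method of the study: it assumes a simple two-group comparison with a two-sided test, uses a normal approximation, and ignores multiple comparisons across voxels. The function names and default values (`alpha=0.05`, `power=0.8`) are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Participants needed per group to detect a standardised effect size.

    Normal approximation for a two-sided, two-sample comparison
    (an assumption of this sketch, not the study's voxel-based method).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def detectable_effect_size(n_per_group, alpha=0.05, power=0.8):
    """Smallest standardised effect detectable with a fixed group size."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 / n_per_group)
```

Under these assumptions, halving the effect size roughly quadruples the required sample, which is why design choices that change the effective effect size (acquisition sequence, registration method) matter so much for study cost.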

Pocket data mining - big data on small devices

Relevance:

100.00%

Publisher:

Abstract:

Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations, including modern data mining processes, onboard these small devices. A decade of research has proved the feasibility of what has been termed Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it was not until 2010 that the authors of this book initiated the Pocket Data Mining (PDM) project, exploiting seamless communication among handheld devices to perform data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment of this emerging area of research. Details of the techniques used and thorough experimental studies are given. More importantly, and exclusive to this book, the authors provide a detailed practical guide to the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM, dealing with concept drift, is also reported. In the era of Big Data, potential applications of paramount importance offered by PDM in a variety of domains, including security, business and telemedicine, are discussed.

Pragmatic oriented data interoperability for smart healthcare information systems

Relevance:

100.00%

Publisher:

Abstract:

Smart healthcare is a complex domain for systems integration because of the human and technical factors and the heterogeneous data sources involved. As part of the smart city, it is an area where clinical functions require smart collaboration among multiple systems for effective communication between departments, and radiology is one of the areas that relies heavily on intelligent information integration and communication. It therefore faces many integration and interoperability challenges, such as information collision, heterogeneous data sources, policy obstacles, and procedure mismanagement. The purpose of this study is to analyse the data, semantic, and pragmatic interoperability of systems integration in a radiology department, and to develop a pragmatic interoperability framework for guiding the integration. We selected an ongoing project at a local hospital for our case study. The project aims to achieve data sharing and interoperability among Radiology Information Systems (RIS), Electronic Patient Record (EPR), and Picture Archiving and Communication Systems (PACS). Qualitative data collection and analysis methods are used. The data sources consisted of documentation, including publications and internal working papers, one year of non-participant observations, and 37 interviews with radiologists, clinicians, directors of IT services, referring clinicians, radiographers, receptionists and secretaries. We identified four primary phases of the data analysis process for the case study: requirements and barriers identification, integration approach, interoperability measurements, and knowledge foundations. Each phase is discussed and supported by qualitative data. Through the analysis we also develop a pragmatic interoperability framework that summarises the empirical findings and proposes recommendations for guiding the integration in the radiology context.

Blending systems thinking approaches for organisational analysis: reviewing child protection in England

Relevance:

100.00%

Publisher:

Abstract:

This paper concerns the innovative use of a blend of systems thinking ideas in the ‘Munro Review of Child Protection’, a high-profile examination of child protection activities in England, conducted for the Department for Education. We go ‘behind the scenes’ to describe the OR methodologies and processes employed. The circumstances that led to the Review are outlined. Three specific contributions that systems thinking made to the Review are then described. First, the systems-based analysis and visualisation of how a ‘compliance culture’ had grown up. Second, the creation of a large, complex systems map of current operations and the effects of past policies on them. Third, how the map gave shape to the range of issues the Review addressed and acted as an organising framework for the systemically coherent set of recommendations made. The paper closes with an outline of the main implementation steps taken so far to create a child protection system with the critically reflective properties of a learning organisation, and with methodological reflections on the benefits of systems thinking in supporting organisational analysis.

Mandatory and voluntary information disclosure and the effects on financial analysts: evidence from China

Relevance:

100.00%

Publisher:

Abstract:

Purpose – Unlike the existing literature, which focuses solely on voluntary earnings forecasts and ex post earnings surprise, this paper compares the effects on financial analysts of mandatory earnings surprise warnings and voluntary information disclosure issued by management teams, in terms of the number of followings and the accuracy of earnings forecasts. Design/methodology/approach – This paper uses panel data analysis with fixed effects on data collected from Chinese public firms between 2006 and 2010. It uses exogenous regulation enforcement to minimise the endogeneity problem. Findings – This paper finds that financial analysts are less likely to follow firms which mandatorily issue earnings surprise warnings ex ante than those which voluntarily issue earnings forecasts. Moreover, ex post, they issue less accurate and more dispersed forecasts on the former firms. The results support Brown et al.’s (2009) finding in the USA and suggest that the earnings surprise warnings affect information asymmetries. Practical implications – This paper justifies the mandatory earnings surprise warnings policy issued by the Chinese Securities Regulatory Commission in 2006. Originality/value – The mandatory earnings surprise warning is a regulation unique to publicly listed firms in China. This paper, for the first time, provides an empirical evaluation of the effectiveness of a mandatory information disclosure policy in China. Consistent with existing literature on information disclosure by public firms in other countries, this paper finds that, in China, voluntary information disclosure captures more private information on corporate earnings ability than mandatory information disclosure.

Inequalities in dental services utilization among Brazilian low-income children: the role of individual determinants

Relevance:

100.00%

Publisher:

Abstract:

Objectives: To assess the role of individual determinants in the inequalities of dental services utilization among low-income children living in the working area of Brazil's federal Primary Health Care program, the Family Health Program (FHP), in a large city in Southern Brazil. Methods: A cross-sectional population-based study was performed. The sample included 350 children, aged 0 to 14 years, whose parents answered a questionnaire about their socioeconomic conditions, perceived needs, oral hygiene habits, and access to dental services. The data analysis was performed according to a conceptual framework based on Andersen's behavioral model of health services use. Multivariate logistic regression models tested the hypotheses on covariates for never having had a dental visit. Results: Thirty-one percent of the surveyed children had never had a dental visit. In the bivariate analysis, a higher proportion of children who had never had a dental visit was found among the very young, those with inadequate oral hygiene habits, those without perceived need of dental care, and those whose family homes were under absent ownership. The mechanisms of social support proved to be important enabling factors: children attending schools/kindergartens and being regularly monitored by the FHP teams had higher odds of having gone to the dentist, even after adjusting for socioeconomic, demographic, and need variables. Conclusions: The conceptual framework has confirmed the presence of social and psychosocial inequalities in the utilization pattern of dental services for low-income children. The individual determinants seem to be important predictors of access.

PCA Tomography: how to extract information from data cubes

Relevance:

100.00%

Publisher:

Abstract:

Astronomy has evolved almost exclusively by the use of spectroscopic and imaging techniques, operated separately. With the development of modern technologies, it is possible to obtain data cubes in which one combines both techniques simultaneously, producing images with spectral resolution. Extracting information from them can be quite complex, and hence the development of new methods of data analysis is desirable. We present a method of analysis of data cubes (data from single field observations, containing two spatial and one spectral dimension) that uses Principal Component Analysis (PCA) to express the data in a form of reduced dimensionality, facilitating efficient information extraction from very large data sets. PCA transforms the system of correlated coordinates into a system of uncorrelated coordinates ordered by principal components of decreasing variance. The new coordinates are referred to as eigenvectors, and the projections of the data onto these coordinates produce images we will call tomograms. The association of the tomograms (images) with eigenvectors (spectra) is important for the interpretation of both. The eigenvectors are mutually orthogonal, and this information is fundamental for their handling and interpretation. When the data cube shows objects that present uncorrelated physical phenomena, the eigenvectors' orthogonality may be instrumental in separating and identifying them. By handling eigenvectors and tomograms, one can enhance features, extract noise, compress data, extract spectra, etc. We applied the method, for illustration purposes only, to the central region of the low ionization nuclear emission region (LINER) galaxy NGC 4736, and demonstrate that it has a type 1 active nucleus, not known before. Furthermore, we show that it is displaced from the centre of its stellar bulge.
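The relationship between eigenvectors (spectra) and tomograms (images) described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes the cube is stored as nested lists indexed `cube[y][x][wavelength]`, recovers only the first principal component, and uses plain power iteration in place of a full eigendecomposition. The function name `leading_component` is hypothetical.

```python
import math


def leading_component(cube, iterations=200):
    """Return (eigenvector, tomogram) for the first principal component.

    cube[y][x] is the spectrum of one spatial pixel (assumed layout).
    The sign of the returned eigenvector is arbitrary.
    """
    ny, nx = len(cube), len(cube[0])
    n_lambda = len(cube[0][0])
    # Flatten the two spatial dimensions: one row (spectrum) per pixel.
    rows = [cube[y][x] for y in range(ny) for x in range(nx)]
    n = len(rows)
    # Subtract the mean spectrum so PCA describes variance about the mean.
    mean = [sum(r[k] for r in rows) / n for k in range(n_lambda)]
    centred = [[r[k] - mean[k] for k in range(n_lambda)] for r in rows]
    # Covariance matrix between spectral channels.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1)
            for j in range(n_lambda)] for i in range(n_lambda)]
    # Power iteration converges to the eigenvector of largest variance.
    v = [1.0 + 0.1 * k for k in range(n_lambda)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n_lambda))
             for i in range(n_lambda)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # constant cube: no variance to decompose
            break
        v = [c / norm for c in w]
    # Tomogram: projection of every pixel's spectrum onto the eigenvector,
    # reshaped back into an image.
    scores = [sum(r[k] * v[k] for k in range(n_lambda)) for r in centred]
    tomogram = [scores[y * nx:(y + 1) * nx] for y in range(ny)]
    return v, tomogram
```

For real data cubes a numerical library with a full eigendecomposition or SVD would be the practical choice; the sketch only shows how each eigenvector pairs with one image.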

Quality indices for (practical) clustering evaluation

Relevance:

100.00%

Publisher:

Abstract:

Clustering quality or validation indices allow the evaluation of the quality of clustering in order to support the selection of a specific partition or clustering structure in its natural unsupervised environment, where the real solution is unknown or not available. In this paper, we investigate the use of quality indices, mostly based on the concepts of clusters' compactness and separation, for the evaluation of clustering results (partitions in particular). This work intends to offer a general perspective regarding the appropriate use of quality indices for the purpose of clustering evaluation. After presenting some commonly used indices, as well as indices recently proposed in the literature, key issues regarding the practical use of quality indices are addressed. A general methodological approach is presented which considers the identification of appropriate index thresholds. This general approach is compared with the simple use of quality indices for evaluating a clustering solution.
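As an illustration of an index built on compactness and separation, the sketch below computes the mean silhouette width of a partition. The abstract does not name the indices it studies, so the choice of the silhouette is an assumption made here for illustration, as are the function name and the use of Euclidean distance.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette width of a partition (Euclidean distance assumed).

    For each point, a = mean distance to its own cluster (compactness)
    and b = smallest mean distance to another cluster (separation).
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(point, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters and negative values indicate points that sit closer to another cluster. Any threshold for accepting a partition would have to be chosen for the data at hand, which is the kind of question the paper's methodological approach addresses.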

The log-exponentiated Weibull regression model for interval-censored data

Relevance:

100.00%

Publisher:

Abstract:

In interval-censored survival data, the event of interest is not observed exactly but is only known to occur within some time interval. Such data appear very frequently. In this paper, we are concerned only with parametric forms, and so a location-scale regression model based on the exponentiated Weibull distribution is proposed for modeling interval-censored data. We show that the proposed log-exponentiated Weibull regression model for interval-censored data represents a parametric family of models that includes other regression models that are broadly used in lifetime data analysis. Assuming the use of interval-censored data, we employ a frequentist analysis, a jackknife estimator, a parametric bootstrap and a Bayesian analysis for the parameters of the proposed model. We derive the appropriate matrices for assessing local influences on the parameter estimates under different perturbation schemes and present some ways to assess global influences. Furthermore, for different parameter settings, sample sizes and censoring percentages, various simulations are performed; in addition, the empirical distribution of some modified residuals is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to a modified deviance residual in log-exponentiated Weibull regression models for interval-censored data. (C) 2009 Elsevier B.V. All rights reserved.
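For orientation, the general shape of an interval-censored likelihood can be written down under simple assumptions. The notation below is illustrative and not taken from the paper: it uses the standard exponentiated Weibull distribution function with shape parameters $k$ and $\alpha$ and scale $\lambda$, observation intervals $(l_i, u_i]$, and no covariates, whereas the paper works with a location-scale regression on the log-time scale.

```latex
% Exponentiated Weibull distribution function (standard form)
F(t;\theta) = \left[ 1 - \exp\left\{ -\left( t/\lambda \right)^{k} \right\} \right]^{\alpha},
\qquad t > 0, \quad \theta = (\alpha, k, \lambda).

% Each subject contributes the probability that the event falls in (l_i, u_i]
\ell(\theta) = \sum_{i=1}^{n} \log\left[ F(u_i;\theta) - F(l_i;\theta) \right].

% A right-censored observation corresponds to u_i = \infty,
% so its contribution reduces to \log\left[ 1 - F(l_i;\theta) \right].
```

Setting $\alpha = 1$ recovers the ordinary Weibull distribution, which is one way to see that the family contains models widely used in lifetime data analysis.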
