Digital Library

919 results for data-driven Stochastic Subspace Identification (SSI-data)

Bulk geochemical and lipid biomarker data for sediment core W8402A-14

Abstract:

Eleven sediment samples taken downcore and representing the past 26 kyr of deposition at MANOP site C (0°57.2′N, 138°57.3′W) were analyzed for lipid biomarker composition. Biomarkers of both terrestrial and marine sources of organic carbon were identified. In general, concentration profiles for these biomarkers and for total organic carbon (TOC) displayed three common stratigraphic features in the time series: (1) a maximum within the surface sediment mixed layer (≤4 ka); (2) a broad minimum extending throughout the interglacial deposit; and (3) a deep, pronounced maximum within the glacial deposit. Using the biomarker records, a simple binary mixing model is described that assesses the proportion of terrestrial to marine TOC in these sediments. Best estimates from this model suggest that ~20% of the TOC is land-derived, introduced by long-range eolian transport, and the remainder is derived from marine productivity. The direct correlation between the records for terrestrial and marine TOC with depth in this core fits an interpretation that primary productivity at site C has been controlled by wind-driven upwelling at least over the last glacial/interglacial cycle. The biomarker records place the greatest wind strength and highest primary productivity within the time frame of 18 to 22 kyr B.P. Diagenetic effects limit our ability to ascertain directly from the biomarker records the absolute magnitude by which different types of primary productivity have changed at this ocean location over the past 26 kyr.
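
As a rough illustration of what a binary (two end-member) mixing model computes, the sketch below solves the standard linear mixing equation for the terrestrial fraction. The abstract does not say which tracer or end-member values the authors used, so the function name, its parameters, and the numbers are hypothetical and chosen only to show the arithmetic.

```python
def terrestrial_fraction(sample, marine_end, terrestrial_end):
    """Terrestrial share f in a two-source mixture.

    Solves  sample = f * terrestrial_end + (1 - f) * marine_end  for f.
    All three arguments are values of the same tracer (assumed, not
    taken from the abstract).
    """
    if terrestrial_end == marine_end:
        raise ValueError("end-member values must differ")
    return (sample - marine_end) / (terrestrial_end - marine_end)


# Hypothetical tracer values, for illustration only.
f = terrestrial_fraction(sample=1.0, marine_end=0.0, terrestrial_end=5.0)
print(f"terrestrial share of TOC: {f:.0%}")
```

A result outside the interval 0 to 1 would indicate that the sample does not lie between the two assumed end-members, which is a quick sanity check on the chosen end-member values.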

Remotely sensed data for ecosystem analyses: Combining hierarchy and scene models

Abstract:

Remotely sensed data have been used extensively for environmental monitoring and modeling at a number of spatial scales; however, a limited range of satellite imaging systems often constrained the scales of these analyses. A wider variety of data sets is now available, allowing image data to be selected to match the scale of the environmental structure(s) or process(es) being examined. A framework is presented for use by environmental scientists and managers, enabling their spatial data collection needs to be linked to a suitable form of remotely sensed data. A six-step approach is used, combining image spatial analysis and scaling tools within the context of hierarchy theory. The main steps involved are: (1) identification of information requirements for the monitoring or management problem; (2) development of ideal image dimensions (scene model); (3) exploratory analysis of existing remotely sensed data using scaling techniques; (4) selection and evaluation of suitable remotely sensed data based on the scene model; (5) selection of suitable spatial analytic techniques to meet information requirements; and (6) cost-benefit analysis. Results from a case study show that the framework provided an objective mechanism to identify relevant aspects of the monitoring problem and environmental characteristics for selecting remotely sensed data and analysis techniques.

Exploration of the cell-cycle genes found within the RIKEN FANTOM2 data set

Abstract:

The cell cycle is one of the most fundamental processes within a cell. Phase-dependent expression and cell-cycle checkpoints require a high level of control. A large number of genes with varying functions and modes of action are responsible for this biology. In a targeted exploration of the FANTOM2-Variable Protein Set, a number of mouse homologs to known cell-cycle regulators as well as novel members of cell-cycle families were identified. Focusing on two prototype cell-cycle families, the cyclins and the NIMA-related kinases (NEKs), we believe we have identified all of the mouse members of these families, 24 cyclins and 10 NEKs, and mapped them to ENSEMBL transcripts. To attempt to globally identify all potential cell cycle-related genes within mouse, the MGI (Mouse Genome Database) assignments for the RIKEN Representative Set (RPS) and the results from two homology-based queries were merged. We identified 1415 genes with possible cell-cycle roles, and 1758 potential paralogs. We comment on the genes identified in this screen and evaluate the merits of each approach.

The implications of simultaneous smoking initiation for inferences about the genetics of smoking behavior from twin data

Abstract:

We examined early social influences across stages of smoking within the context of a twin study using an environmental exposure specific to smoking: whether twins started smoking at the same time (simultaneous smoking initiation: SSI). We expected that SSI would be a good index of shared social influences on smoking initiation. Rates of SSI were indeed significantly higher in MZ twins and in twins who shared peers and classes, as well as in male twins. With the exception of regular smoking in females, we found no significant difference in estimates of genetic and environmental parameters between SSI and non-SSI pairs for any of the smoking measures that we examined (DSM-IV and Fagerstrom HSI measures of nicotine dependence; DSM-IV nicotine withdrawal; heavy smoking; and in males, regular smoking). For regular smoking in females, allowing for additional shared environmental influences associated with SSI only modestly reduced our estimates of additive genetic variance (56% vs. 68%). These results indicate that the important social influences that may occur for smoking initiation do not appear to seriously bias estimates of genetic effects on later stages of smoking.

Reference values for whole body and cerebral multi-frequency bio-impedance data in neonates less than 12 h postpartum

Abstract:

Multiple frequency bio-electrical impedance analysis (MFBIA) may be useful for monitoring fluid balance in newborn infants or to provide early prediction of the outcome following perinatal asphyxia. A reference range of data is needed for identification of babies with abnormal impedance values. This was a cross-sectional observational study in 84 term and near-term healthy neonates less than 12 h postpartum. Whole body and cerebral MFBIA measurements were performed at the bedside in the post-natal ward. Gestational age, post-natal age, gender, birthweight, head circumference and foot length measures were recorded. Reference values for impedance at the characteristic frequency (Z_C) and resistance at zero frequency (R_0) are reported for whole body and cerebral impedance. Significant correlations (p < 0.05) were observed between whole body impedance and birthweight, foot length and head circumference. Females had a significantly higher whole body R_0 than males. Cerebral impedance did not correlate significantly with any of the demographic measures and there were no gender differences observed for cerebral impedance. The reference range for whole body multi-frequency bio-impedance values in term and near-term infants within the first 12 h postpartum can be calculated from the foot length (FL) using the following equations: Z_C = (942.9 − 4.818 × FL) ± 124.6 Ω; R_0 = (1042 − 4.520 × FL) ± 135.5 Ω. For cerebral impedance the reference range is 29.5–48.7 Ω for Z_C and 33.7–58.0 Ω for R_0.
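
The two foot-length equations quoted in the abstract can be turned into a small range calculator, sketched below. The coefficients come from the abstract; the function names and the example foot length are hypothetical, and the unit of foot length is not stated in the abstract, so the input is assumed to be in whatever unit the original study used.

```python
def whole_body_reference(foot_length):
    """Whole-body reference ranges (in ohms) from foot length.

    Uses the equations quoted in the abstract:
      Z_C = (942.9 - 4.818 * FL) +/- 124.6
      R_0 = (1042  - 4.520 * FL) +/- 135.5
    The unit of FL is an assumption; the abstract does not state it.
    """
    z_c = 942.9 - 4.818 * foot_length
    r_0 = 1042.0 - 4.520 * foot_length
    return {
        "Z_C": (z_c - 124.6, z_c + 124.6),
        "R_0": (r_0 - 135.5, r_0 + 135.5),
    }


def within_reference(value, bounds):
    """True if a measured value lies inside a (low, high) range."""
    low, high = bounds
    return low <= value <= high


# Hypothetical foot length, for illustration only.
ranges = whole_body_reference(80.0)
for name, (low, high) in ranges.items():
    print(f"{name}: {low:.1f} to {high:.1f} ohm")
```

Returning both bounds rather than only the central estimate keeps the check for an abnormal value to a single comparison.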

Data flow and validation in workflow modelling

Abstract:

A complete workflow specification requires careful integration of many different process characteristics. Decisions must be made as to the definitions of individual activities, their scope, the order of execution that maintains the overall business process logic, the rules governing the discipline of work list scheduling to performers, identification of time constraints and more. The goal of this paper is to address an important issue in workflow modelling and specification: data flow, and its modelling, specification and validation. Researchers have neglected this dimension of process analysis for some time, mainly focussing on structural considerations with limited verification checks. In this paper, we identify and justify the importance of data modelling in overall workflow specification and verification. We illustrate and define several potential data flow problems that, if not detected prior to workflow deployment, may prevent the process from executing correctly, cause the process to execute on inconsistent data, or even lead to process suspension. A discussion of the essential requirements of the workflow data model to support data validation is also given.

Customer information system data pre-processing with feature selection techniques for non-technical losses prediction in an electricity market

Abstract:

Non-technical losses (NTL) identification and prediction are important tasks for many utilities. Data from a customer information system (CIS) can be used for NTL analysis. However, in order to perform NTL analysis accurately and efficiently, the original data from the CIS need to be pre-processed before any detailed NTL analysis can be carried out. In this paper, we propose a feature selection based method for CIS data pre-processing in order to extract the most relevant information for further analysis such as clustering and classification. By removing irrelevant and redundant features, feature selection is an essential step in the data mining process: it finds an optimal subset of features that improves the quality of results, giving faster processing, higher accuracy and simpler results with fewer features. Detailed feature selection analysis is presented in the paper. Both time-domain and load shape data are compared based on the accuracy, consistency and statistical dependencies between features.

Resource allocation in general queueing networks with applications to data networks

Retrotransposon expression profiling: Unveiling the hidden SINE transcriptome through Next-Generation Sequencing data analysis

Abstract:

Of the ~1.7 million SINE elements in the human genome, only a tiny number are estimated to be active in transcription by RNA polymerase (Pol) III. Tracing the individual loci from which SINE transcripts originate is complicated by their highly repetitive nature. By exploiting RNA-Seq datasets and unique SINE DNA sequences, we devised a bioinformatic pipeline allowing us to identify Pol III-dependent transcripts of individual SINE elements. When applied to ENCODE transcriptomes of seven human cell lines, this search strategy identified ~1300 Alu loci and ~1100 MIR loci corresponding to detectable transcripts, with ~120 Alu and ~60 MIR loci, respectively, expressed in at least three cell lines. In vitro transcription of selected SINEs did not reflect their in vivo expression properties, and required the native 5′-flanking region in addition to the internal promoter. We also identified a cluster of expressed AluYa5-derived transcription units, juxtaposed to snaR genes on chromosome 19, formed by a promoter-containing left monomer fused to an Alu-unrelated downstream moiety. Autonomous Pol III transcription was also revealed for SINEs nested within Pol II-transcribed genes, raising the possibility of an underlying mechanism for Pol II gene regulation by SINE transcriptional units. Moreover, applying our bioinformatic pipeline both to RNA-seq data from cells subjected to an in vitro pro-oncogenic stimulus and to in vivo matched tumor and non-tumor samples allowed us to detect increased Alu RNA expression as well as the source loci of such deregulation. The ability to investigate SINE transcriptomes at single-locus resolution will facilitate both the identification of novel biologically relevant SINE RNAs and the assessment of SINE expression alteration under pathological conditions.

Efficient Data Management with Applications to IoT

Abstract:

The Internet of Things (IoT) consists of a worldwide “network of networks,” composed of billions of interconnected heterogeneous devices denoted as things or “Smart Objects” (SOs). Significant research efforts have been dedicated to porting the experience gained in the design of the Internet to the IoT, with the goal of maximizing interoperability, using the Internet Protocol (IP) and designing specific protocols like the Constrained Application Protocol (CoAP), which have been widely accepted as drivers for the effective evolution of the IoT. This first wave of standardization can be considered successfully concluded and we can assume that communication with and between SOs is no longer an issue. At this time, to favor the widespread adoption of the IoT, it is crucial to provide mechanisms that facilitate IoT data management and the development of services enabling a real interaction with things. Several reference IoT scenarios have real-time or predictable latency requirements, dealing with billions of devices collecting and sending an enormous quantity of data. These features create a new need for architectures specifically designed to handle this scenario, here denoted as “Big Stream”. In this thesis a new Big Stream Listener-based Graph architecture is proposed. Another important step is to build more applications around the Web model, bringing about the Web of Things (WoT). As several IoT testbeds have been focused on evaluating lower-layer communication aspects, this thesis proposes a new WoT Testbed aiming at allowing developers to work with a high level of abstraction, without worrying about low-level details. Finally, an innovative SOs-driven User Interface (UI) generation paradigm for mobile applications in heterogeneous IoT networks is proposed, to simplify interactions between users and things.

Research perspectives and methodologies for Social Communication in the context of the internet with Big Data and the Data Scientist specialization

Abstract:

This work analyzes Social Communication in the context of the internet and outlines new study methodologies for the field, aimed at filtering meaning, within a scientific scope, from the information flows of social networks, news media, or any other device that allows storage of and access to structured and unstructured information. Seeking to reflect on the paths along which these information flows develop, and especially on the volume produced, the project maps the fields of meaning that this relationship takes on in research theory and practice. The general objective of this work is to situate the field of Social Communication within the changeable and dynamic reality of the internet environment and to draw parallels with applications already carried out successfully in other fields. Using the case study method, three cases were analyzed under two conceptual keys, Web Sphere Analysis and Web Science, reflecting on information systems contrasted in discursive and structural terms. The aim is thus to observe what Social Communication gains, from these perspectives, in the way it views its objects of study in the internet environment. The results show that seeking new learning is a challenge for the Social Communication researcher, but that the feedback of information in the collaborative environment offered by the internet is a fertile path for research, since data modeling gains an analytical corpus when the set of tools promoted and driven by technology makes it possible to isolate content and to deepen the understanding of meanings and their relationships.

Semi-supervised learning of hierarchical latent trait models for data visualisation

Abstract:

An interactive hierarchical Generative Topographic Mapping (HGTM) [H_GTM] has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in three directions: (1) We generalize HGTM to noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM) developed in [Kaban_pami]. (2) We give the user a choice of initializing the child plots of the current plot in either interactive or automatic mode. In the interactive mode the user interactively selects "regions of interest" as in [H_GTM], whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of LTMs is employed. (3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.

On the use of Likert-type scales in multilevel data: Influence on aggregate variables

Abstract:

In multilevel analyses, problems may arise when using Likert-type scales at the lowest level of analysis. Specifically, increases in variance should lead to greater censoring for the groups whose true scores fall at either end of the distribution. The current study used simulation methods to examine the influence of single-item Likert-type scale usage on ICC(1), ICC(2), and group-level correlations. Results revealed substantial underestimation of ICC(1) when using Likert-type scales with common response formats (e.g., 5 points). ICC(2) and group-level correlations were also underestimated, but to a lesser extent. Finally, the magnitude of underestimation was driven in large part by an interaction between Likert-type scale usage and the amounts of within- and between-group variance. © Sage Publications.
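
A simulation of this kind can be sketched in a few lines. The code below is not the study's design: the group count, group size, variances, scale midpoint, and rounding rule are all assumptions made for illustration. It generates continuous group data, censors and rounds it onto a 5-point scale, and computes ICC(1) from a one-way ANOVA for both versions so the two can be compared.

```python
import random
from statistics import mean


def icc1(groups):
    """ICC(1) from a one-way ANOVA, assuming equal group sizes."""
    n_groups = len(groups)
    k = len(groups[0])
    grand = mean(x for g in groups for x in g)
    # Between-group and within-group mean squares.
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n_groups - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_groups * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def to_likert(score, points=5):
    """Round a continuous score and censor it to the range 1..points."""
    return min(points, max(1, round(score)))


def simulate(n_groups=200, k=10, between_sd=1.0, within_sd=1.0, seed=0):
    """Return (ICC(1) on continuous scores, ICC(1) on Likert responses).

    All defaults are hypothetical; the scale midpoint of 3.0 is assumed.
    """
    rng = random.Random(seed)
    continuous = []
    for _ in range(n_groups):
        group_mean = rng.gauss(3.0, between_sd)
        continuous.append([rng.gauss(group_mean, within_sd) for _ in range(k)])
    likert = [[to_likert(x) for x in g] for g in continuous]
    return icc1(continuous), icc1(likert)


icc_continuous, icc_likert = simulate()
print(f"ICC(1) continuous: {icc_continuous:.3f}")
print(f"ICC(1) 5-point:    {icc_likert:.3f}")
```

Varying `between_sd` and `within_sd` is the natural way to probe the interaction the abstract describes between scale usage and the two variance components.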

Estimation of functional connectivity from electromagnetic signals and the amount of empirical data required

Abstract:

An increasing number of neuroimaging studies are concerned with the identification of interactions or statistical dependencies between brain areas. Dependencies between the activities of different brain regions can be quantified with functional connectivity measures such as the cross-correlation coefficient. An important factor limiting the accuracy of such measures is the amount of empirical data available. For event-related protocols, the amount of data also affects the temporal resolution of the analysis. We use analytical expressions to calculate the amount of empirical data needed to establish whether a certain level of dependency is significant when the time series are autocorrelated, as is the case for biological signals. These analytical results are then contrasted with estimates from simulations based on real data recorded with magnetoencephalography during a resting-state paradigm and during the presentation of visual stimuli. Results indicate that, for broadband signals, 50-100 s of data is required to detect a true underlying cross-correlation coefficient of 0.05. This corresponds to a resolution of a few hundred milliseconds for typical event-related recordings. The required time window increases for narrowband signals as frequency decreases. For instance, approximately 3 times as much data is necessary for signals in the alpha band. Important implications can be derived for the design and interpretation of experiments to characterize weak interactions, which are potentially important for brain processing.

Data assimilation for precipitation nowcasting using Bayesian inference

Abstract:

This work introduces a new variational Bayes data assimilation method for the stochastic estimation of precipitation dynamics using radar observations for short-term probabilistic forecasting (nowcasting). A previously developed spatial rainfall model based on the decomposition of the observed precipitation field using a basis function expansion captures the precipitation intensity from radar images as a set of ‘rain cells’. The prior distributions for the basis function parameters are carefully chosen to have a conjugate structure for the precipitation field model, which allows a novel variational Bayes method to be applied to estimate the posterior distributions in closed form. The method is based on solving an optimisation problem, in a spirit similar to 3D VAR analysis, but seeks approximations to the posterior distribution rather than simply the most probable state. A hierarchical Kalman filter is used to estimate the advection field based on the assimilated precipitation fields at two times. The model is applied to tracking precipitation dynamics in a realistic setting, using UK Met Office radar data from both a summer convective event and a winter frontal event. The performance of the model is assessed both traditionally and using probabilistic measures of fit based on ROC curves. The model is shown to provide very good assimilation characteristics and promising forecast skill. Improvements to the forecasting scheme are discussed.
