8 results for Data carving tools.

at Duke University


Relevance:

80.00%

Publisher:

Abstract:

HomeBank is introduced here. It is a public, permanent, extensible, online database of daylong audio recorded in naturalistic environments. HomeBank serves two primary purposes. First, it is a repository for raw audio and associated files: one database requires special permissions, and another redacted database allows unrestricted public access. Associated files include metadata such as participant demographics and clinical diagnostics, automated annotations, and human-generated transcriptions and annotations. Many recordings use the child-perspective LENA recorders (LENA Research Foundation, Boulder, Colorado, United States), but various recordings and metadata can be accommodated. The HomeBank database can have both vetted and unvetted recordings, with different levels of accessibility. Second, HomeBank is an open repository for processing and analysis tools for HomeBank or similar data sets. HomeBank is flexible for users and contributors, making primary data available to researchers, especially those in child development, linguistics, and audio engineering. HomeBank facilitates researchers' access to large-scale data and tools, linking the acoustic, auditory, and linguistic characteristics of children's environments with a variety of variables including socioeconomic status, family characteristics, language trajectories, and disorders. Automated processing applied to daylong home audio recordings is now becoming widely used in early intervention initiatives, helping parents to provide richer speech input to at-risk children.

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: HIV-1 clade C (HIV-C) predominates worldwide, and anti-HIV-C vaccines are urgently needed. Neutralizing antibody (nAb) responses are considered important but have proved difficult to elicit. Although some current immunogens elicit antibodies that neutralize highly neutralization-sensitive (tier 1) HIV strains, most circulating HIVs exhibiting a less sensitive (tier 2) phenotype are not neutralized. Thus, both tier 1 and 2 viruses are needed for vaccine discovery in nonhuman primate models. METHODOLOGY/PRINCIPAL FINDINGS: We constructed a tier 1 simian-human immunodeficiency virus, SHIV-1157ipEL, by inserting an "early," recently transmitted HIV-C env into the SHIV-1157ipd3N4 backbone [1] encoding a "late" form of the same env, which had evolved in a SHIV-infected rhesus monkey (RM) with AIDS. SHIV-1157ipEL was rapidly passaged to yield SHIV-1157ipEL-p, which remained exclusively R5-tropic and had a tier 1 phenotype, in contrast to "late" SHIV-1157ipd3N4 (tier 2). After 5 weekly low-dose intrarectal exposures, SHIV-1157ipEL-p systemically infected 16 out of 17 RM with high peak viral RNA loads and depleted gut CD4+ T cells. SHIV-1157ipEL-p and SHIV-1157ipd3N4 env genes diverge mostly in V1/V2. Molecular modeling revealed a possible mechanism for the increased neutralization resistance of SHIV-1157ipd3N4 Env: V2 loops hindering access to the CD4 binding site, shown experimentally with nAb b12. Similar mutations have been linked to decreased neutralization sensitivity in HIV-C strains isolated from humans over time, indicating parallel HIV-C Env evolution in humans and RM. CONCLUSIONS/SIGNIFICANCE: SHIV-1157ipEL-p, the first tier 1 R5 clade C SHIV, and SHIV-1157ipd3N4, its tier 2 counterpart, represent biologically relevant tools for anti-HIV-C vaccine development in primates.

Relevance:

30.00%

Publisher:

Abstract:

An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.

This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.

On-demand digital-print service is a representative enterprise requiring a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital print service provider (PSP), to evaluate our optimization algorithms.

In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.
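To make the scheduling idea concrete, below is a minimal sketch of genetic-algorithm order sequencing. It is not the thesis's IGA: the synthetic order data, the fixed machine pool, and the makespan fitness are simplifying assumptions for illustration only.

```python
import random

# Hypothetical order data: (order_id, processing_time). The thesis's IGA
# handles far richer fulfillment constraints than this toy model.
orders = [(i, random.uniform(1.0, 5.0)) for i in range(20)]
NUM_MACHINES = 3

def makespan(sequence):
    """Dispatch orders in the given sequence to the earliest-free machine
    and return the resulting makespan (a stand-in fitness)."""
    loads = [0.0] * NUM_MACHINES
    for idx in sequence:
        m = loads.index(min(loads))  # earliest available machine
        loads[m] += orders[idx][1]
    return max(loads)

def crossover(a, b):
    """Order crossover (OX): keep a slice of parent a, fill from parent b."""
    i, j = sorted(random.sample(range(len(a)), 2))
    child = [None] * len(a)
    child[i:j] = a[i:j]
    fill = [g for g in b if g not in child]
    for k in range(len(a)):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child

def mutate(seq, rate=0.1):
    """Occasionally swap two positions in the dispatch sequence."""
    if random.random() < rate:
        i, j = random.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]
    return seq

population = [random.sample(range(len(orders)), len(orders)) for _ in range(30)]
for _ in range(100):  # generations
    population.sort(key=makespan)
    elite = population[:10]  # truncation selection
    children = [mutate(crossover(*random.sample(elite, 2)))
                for _ in range(len(population) - len(elite))]
    population = elite + children

best = min(population, key=makespan)
print(f"best makespan: {makespan(best):.2f}")
```

An incremental GA in the sense described above would additionally reuse the evolved population when new orders arrive, rather than restarting from random sequences.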

We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution-time and process-status prediction models integrate statistical methods with machine-learning algorithms. In addition to improved prediction accuracy compared to stand-alone machine-learning algorithms, they also provide a probabilistic estimate of the predicted status. An order generally consists of multiple serial and parallel processes. We next introduce an order-fulfillment prediction model that combines the advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce the enterprise's late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis, and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.
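As a hedged illustration of that component-wise strategy, the sketch below decomposes a synthetic periodic series, forecasts each component with a deliberately simple model, and aggregates the results. The data, the weekly period, and the per-component models are assumptions for illustration, not the thesis's models.

```python
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic daily series with weekly periodicity, standing in for the
# enterprise service-level data described above (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(200)
series = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.3, t.size)

# Decompose into trend, seasonal, and residual components.
parts = seasonal_decompose(series, model="additive", period=7)

horizon = 14
# Forecast each component separately, then aggregate:
trend = parts.trend[~np.isnan(parts.trend)]          # centered MA leaves NaNs
slope, intercept = np.polyfit(np.arange(trend.size), trend, 1)
trend_fc = intercept + slope * np.arange(trend.size, trend.size + horizon)

seasonal_fc = np.tile(parts.seasonal[-7:], 2)[:horizon]  # repeat last cycle
resid_fc = np.zeros(horizon)                             # mean-zero residual

forecast = trend_fc + seasonal_fc + resid_fc
print(forecast.round(2))
```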

In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use it as guidance for production management. It is expected to provide solutions that help enterprises increase reconfigurability, automate more procedures, and obtain data-driven recommendations for effective decisions.

Relevance:

30.00%

Publisher:

Abstract:

To maintain a strict balance between demand and supply in the US power systems, the Independent System Operators (ISOs) schedule power plants and determine electricity prices using a market clearing model. For each time period and power plant, this model determines startup and shutdown times, the amount of power produced, and the provision of spinning and non-spinning generation reserves. Such a deterministic optimization model takes as input the characteristics of all the generating units, such as installed generation capacity, ramp rates, minimum up- and down-time requirements, and marginal production costs, as well as forecasts of intermittent energy such as wind and solar, along with the minimum reserve requirement of the whole system. This reserve requirement is determined based on the likelihood of outages on the supply side and on the level of forecast errors in demand and intermittent generation. With increased installed capacity of intermittent renewable energy, determining the appropriate level of reserve requirements has become harder. Stochastic market clearing models have been proposed as an alternative to deterministic market clearing models. Rather than using fixed reserve targets as an input, stochastic market clearing models take different scenarios of wind power into consideration and determine reserve schedules as output. Using a scaled version of the power generation system of PJM, a regional transmission organization (RTO) that coordinates the movement of wholesale electricity in all or parts of 13 states and the District of Columbia, and wind scenarios generated from BPA (Bonneville Power Administration) data, this paper compares the performance of stochastic and deterministic market clearing models. The two models are compared in their ability to contribute to the affordability, reliability, and sustainability of the electricity system, measured in terms of total operational costs, load shedding, and air emissions. The process of building and testing the models indicates that a fair comparison is difficult to obtain, because of the multi-dimensional performance metrics considered here and the difficulty of setting the models' parameters in a way that does not advantage or disadvantage one modeling framework. Along these lines, this study explores the effect that model assumptions such as reserve requirements, value of lost load (VOLL), and wind spillage costs have on the comparison of the performance of stochastic vs. deterministic market clearing models.
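To make the contrast concrete, a minimal formulation is sketched below; the symbols (c_g marginal cost, s_g startup cost, D_t demand, R_t reserve target, w_t wind, ℓ_t^ω load shed, π_ω scenario probability) are illustrative assumptions, not the paper's notation.

```latex
% Deterministic clearing: a fixed reserve target R_t is an input.
\min_{p,u,r}\; \sum_{t}\sum_{g}\bigl(c_g\,p_{g,t} + s_g\,u_{g,t}\bigr)
\quad\text{s.t.}\quad \sum_{g} p_{g,t} + w_t = D_t,\qquad
\sum_{g} r_{g,t} \ge R_t.

% Stochastic clearing: wind scenarios w_t^{\omega} with probabilities
% \pi_{\omega} replace the fixed target; reserves emerge from recourse.
\min_{p,u}\; \sum_{\omega}\pi_{\omega}\sum_{t}\sum_{g}
  \bigl(c_g\,p_{g,t}^{\omega} + s_g\,u_{g,t}\bigr)
  + \mathrm{VOLL}\sum_{\omega}\pi_{\omega}\sum_{t}\ell_t^{\omega}
\quad\text{s.t.}\quad
\sum_{g} p_{g,t}^{\omega} + w_t^{\omega} + \ell_t^{\omega} = D_t
\quad\forall\,\omega,t.
```

Note how VOLL and the scenario set enter the stochastic objective directly, which is why the paper's comparison is sensitive to those assumptions.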

Relevance:

30.00%

Publisher:

Abstract:

Building on the planning efforts of the RCN4GSC project, a workshop was convened in San Diego to bring together experts from genomics and metagenomics, biodiversity, ecology, and bioinformatics with the charge to identify potential for positive interactions and progress, especially building on successes at establishing data standards by the GSC and by the biodiversity and ecological communities. Until recently, the contribution of microbial life to the biomass and biodiversity of the biosphere was largely overlooked (because it was resistant to systematic study). Now, emerging genomic and metagenomic tools are making investigation possible. Initial research findings suggest that major advances are in the offing. Although different research communities share some overlapping concepts and traditions, they differ significantly in sampling approaches, vocabularies and workflows. Likewise, their definitions of 'fitness for use' for data differ significantly, as this concept stems from the specific research questions of most importance in the different fields. Nevertheless, there is little doubt that there is much to be gained from greater coordination and integration. As a first step toward interoperability of the information systems used by the different communities, participants agreed to conduct a case study on two of the leading data standards from the two formerly disparate fields: (a) GSC's standard checklists for genomics and metagenomics and (b) TDWG's Darwin Core standard, used primarily in taxonomy and systematic biology.
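As a small illustration of the kind of crosswalk such a case study might produce, the sketch below renames a few MIxS-style (GSC) fields to Darwin Core terms. The specific field pairings are assumptions for illustration; the actual case-study mappings were defined by the participants and are far richer.

```python
# Illustrative crosswalk between a few MIxS (GSC) fields and Darwin Core
# terms; pairings here are assumptions, not the agreed case-study output.
MIXS_TO_DWC = {
    "lat_lon":         ("decimalLatitude", "decimalLongitude"),
    "collection_date": ("eventDate",),
    "geo_loc_name":    ("country", "locality"),
}

def translate(record: dict) -> dict:
    """Rename MIxS-style keys to Darwin Core terms where a mapping exists."""
    out = {}
    for key, value in record.items():
        targets = MIXS_TO_DWC.get(key)
        if targets is None:
            out[key] = value                      # pass through unmapped fields
        elif len(targets) == 1:
            out[targets[0]] = value
        else:                                     # split combined fields (crude)
            for term, part in zip(targets, str(value).split()):
                out[term] = part
    return out

print(translate({"lat_lon": "36.0 -78.9", "collection_date": "2011-03-01"}))
```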

Relevance:

30.00%

Publisher:

Abstract:

The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
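As a hedged illustration of the RDF modeling theme, the sketch below builds a toy biological record with rdflib. The example.org namespace and the Gene/organism terms are invented for illustration; the hackathons' actual models used domain ontologies and stable identifiers.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, DCTERMS

# Hypothetical namespace for illustration only.
EX = Namespace("http://example.org/bio/")

g = Graph()
g.bind("ex", EX)
g.bind("dcterms", DCTERMS)

# Describe a gene record as RDF triples.
gene = EX["gene/BRCA1"]
g.add((gene, RDF.type, EX.Gene))
g.add((gene, DCTERMS.identifier, Literal("BRCA1")))
g.add((gene, EX.organism, Literal("Homo sapiens")))

# Serialize to Turtle, the concrete syntax most RDF tooling interoperates on.
print(g.serialize(format="turtle"))
```

Publishing data this way is what makes the cross-database interoperability discussed above possible: any SPARQL-capable tool can query the resulting graph without bespoke parsers.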

Relevance:

30.00%

Publisher:

Abstract:

The hepatitis delta virus (HDV) ribozyme is a self-cleaving RNA enzyme essential for processing viral transcripts during rolling circle viral replication. The first crystal structure of the cleaved ribozyme was solved in 1998, followed by structures of uncleaved, mutant-inhibited and ion-complexed forms. Recently, methods have been developed that make the task of modeling RNA structure and dynamics significantly easier and more reliable. We have used ERRASER and PHENIX to rebuild and re-refine the cleaved and cis-acting C75U-inhibited structures of the HDV ribozyme. The results correct local conformations and identify alternates for RNA residues, many in functionally important regions, leading to improved R values and model validation statistics for both structures. We compare the rebuilt structures to a higher resolution, trans-acting deoxy-inhibited structure of the ribozyme, and conclude that although both inhibited structures are consistent with the currently accepted hammerhead-like mechanism of cleavage, they do not add direct structural evidence to the biochemical and modeling data. However, the rebuilt structures (PDBs: 4PR6, 4PRF) provide a more robust starting point for research on the dynamics and catalytic mechanism of the HDV ribozyme and demonstrate the power of new techniques to make significant improvements in RNA structures that impact biologically relevant conclusions.

Relevance:

30.00%

Publisher:

Abstract:

While substance use problems are considered to be common in medical settings, they are not systematically assessed and diagnosed for treatment management. Research data suggest that the majority of individuals with a substance use disorder either do not use treatment or delay treatment-seeking for over a decade. The separation of substance abuse services from mainstream medical care and a lack of preventive services for substance abuse in primary care can contribute to under-detection of substance use problems. When fully enacted in 2014, the Patient Protection and Affordable Care Act of 2010 will address these barriers by supporting preventive services for substance abuse (screening, counseling) and integration of substance abuse care with primary care. One key factor that can help to achieve this goal is to incorporate standardized screeners or common data elements for substance use and related disorders into the electronic health records (EHR) system in the health care setting. Incentives for care providers to adopt an EHR system for meaningful use are part of the Health Information Technology for Economic and Clinical Health Act of 2009. This commentary focuses on recent evidence about routine screening and intervention for alcohol/drug use and related disorders in primary care. Federal efforts in developing common data elements for use as screeners for substance use and related disorders are described. A pressing need for empirical data on screening, brief intervention, and referral to treatment (SBIRT) for drug-related disorders to inform SBIRT and related EHR efforts is highlighted.