925 resultados para Automated data analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acknowledgements We thank Andrew Spink (Noldus Information Technology) and the Blogging Birds team members Peter Kindness and Abdul Adeniyi for their valuable contributions to this paper. John Fryxell, Chris Thaxter and Arjun Amar provided valuable comments on an earlier version. The study was part of the Digital Conservation project of dot.rural, the University of Aberdeen’s Digital Economy Research Hub, funded by RCUK (grant reference EP/G066051/1).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A combination of deductive reasoning, clustering, and inductive learning is given as an example of a hybrid system for exploratory data analysis. Visualization is replaced by a dialogue with the data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ion Mobility Spectrometry coupled with Multi Capillary Columns (MCC -IMS) is a fast analytical technique working at atmospheric pressure with high sensitivity and selectivity making it suitable for the analysis of complex biological matrices. MCC-IMS analysis generates its information through a 3D spectrum with peaks, corresponding to each of the substances detected, providing quantitative and qualitative information. Sometimes peaks of different substances overlap, making the quantification of substances present in the biological matrices a difficult process. In the present work we use peaks of isoprene and acetone as a model for this problem. These two volatile organic compounds (VOCs) that when detected by MCC-IMS produce two overlapping peaks. In this work it’s proposed an algorithm to identify and quantify these two peaks. This algorithm uses image processing techniques to treat the spectra and to detect the position of the peaks, and then fits the data to a custom model in order to separate the peaks. Once the peaks are separated it calculates the contribution of each peak to the data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

OBJECTIVE: To evaluate an automated seizure detection (ASD) algorithm in EEGs with periodic and other challenging patterns. METHODS: Selected EEGs recorded in patients over 1year old were classified into four groups: A. Periodic lateralized epileptiform discharges (PLEDs) with intermixed electrical seizures. B. PLEDs without seizures. C. Electrical seizures and no PLEDs. D. No PLEDs or seizures. Recordings were analyzed by the Persyst P12 software, and compared to the raw EEG, interpreted by two experienced neurophysiologists; Positive percent agreement (PPA) and false-positive rates/hour (FPR) were calculated. RESULTS: We assessed 98 recordings (Group A=21 patients; B=29, C=17, D=31). Total duration was 82.7h (median: 1h); containing 268 seizures. The software detected 204 (=76.1%) seizures; all ictal events were captured in 29/38 (76.3%) patients; in only in 3 (7.7%) no seizures were detected. Median PPA was 100% (range 0-100; interquartile range 50-100), and the median FPR 0/h (range 0-75.8; interquartile range 0-4.5); however, lower performances were seen in the groups containing periodic discharges. CONCLUSION: This analysis provides data regarding the yield of the ASD in a particularly difficult subset of EEG recordings, showing that periodic discharges may bias the results. SIGNIFICANCE: Ongoing refinements in this technique might enhance its utility and lead to a more extensive application.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The recent rapid development of biotechnological approaches has enabled the production of large whole genome level biological data sets. In order to handle thesedata sets, reliable and efficient automated tools and methods for data processingand result interpretation are required. Bioinformatics, as the field of studying andprocessing biological data, tries to answer this need by combining methods and approaches across computer science, statistics, mathematics and engineering to studyand process biological data. The need is also increasing for tools that can be used by the biological researchers themselves who may not have a strong statistical or computational background, which requires creating tools and pipelines with intuitive user interfaces, robust analysis workflows and strong emphasis on result reportingand visualization. Within this thesis, several data analysis tools and methods have been developed for analyzing high-throughput biological data sets. These approaches, coveringseveral aspects of high-throughput data analysis, are specifically aimed for gene expression and genotyping data although in principle they are suitable for analyzing other data types as well. Coherent handling of the data across the various data analysis steps is highly important in order to ensure robust and reliable results. Thus,robust data analysis workflows are also described, putting the developed tools andmethods into a wider context. The choice of the correct analysis method may also depend on the properties of the specific data setandthereforeguidelinesforchoosing an optimal method are given. The data analysis tools, methods and workflows developed within this thesis have been applied to several research studies, of which two representative examplesare included in the thesis. The first study focuses on spermatogenesis in murinetestis and the second one examines cell lineage specification in mouse embryonicstem cells.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We have combined several key sample preparation steps for the use of a liquid matrix system to provide high analytical sensitivity in automated ultraviolet -- matrix-assisted laser desorption/ionisation -- mass spectrometry (UV-MALDI-MS). This new sample preparation protocol employs a matrix-mixture which is based on the glycerol matrix-mixture described by Sze et al. The low-femtomole sensitivity that is achievable with this new preparation protocol enables proteomic analysis of protein digests comparable to solid-state matrix systems. For automated data acquisition and analysis, the MALDI performance of this liquid matrix surpasses the conventional solid-state MALDI matrices. Besides the inherent general advantages of liquid samples for automated sample preparation and data acquisition the use of the presented liquid matrix significantly reduces the extent of unspecific ion signals in peptide mass fingerprints compared to typically used solid matrices, such as 2,5-dihydroxybenzoic acid (DHB) or alpha-cyano-hydroxycinnamic acid (CHCA). In particular, matrix and low-mass ion signals and ion signals resulting from cation adduct formation are dramatically reduced. Consequently, the confidence level of protein identification by peptide mass mapping of in-solution and in-gel digests is generally higher.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We have combined several key sample preparation steps for the use of a liquid matrix system to provide high analytical sensitivity in automated ultraviolet - matrix-assisted laser desorption/ ionisation - mass spectrometry (UV-MALDI-MS). This new sample preparation protocol employs a matrix-mixture which is based on the glycerol matrix-mixture described by Sze et al. U. Am. Soc. Mass Spectrom. 1998, 9, 166-174). The low-ferntomole sensitivity that is achievable with this new preparation protocol enables proteomic analysis of protein digests comparable to solid-state matrix systems. For automated data acquisition and analysis, the MALDI performance of this liquid matrix surpasses the conventional solid-state MALDI matrices. Besides the inherent general advantages of liquid samples for automated sample preparation and data acquisition the use of the presented liquid matrix significantly reduces the extent of unspecific ion signals in peptide mass fingerprints compared to typically used solid matrices, such as 2,5-dihydrox-ybenzoic acid (DHB) or alpha-cyano-hydroxycinnamic acid (CHCA). In particular, matrix and lowmass ion signals and ion signals resulting from cation adduct formation are dramatically reduced. Consequently, the confidence level of protein identification by peptide mass mapping of in-solution and in-gel digests is generally higher.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Parasite virulence genes are usually associated with telomeres. The clustering of the telomeres, together with their particular spatial distribution in the nucleus of human parasites such as Plasmodium falciparum and Trypanosoma brucei, has been suggested to play a role in facilitating ectopic recombination and in the emergence of new antigenic variants. Leishmania parasites, as well as other trypanosomes, have unusual gene expression characteristics, such as polycistronic and constitutive transcription of protein-coding genes. Leishmania subtelomeric regions are even more unique because unlike these regions in other trypanosomes they are devoid of virulence genes. Given these peculiarities of Leishmania, we sought to investigate how telomeres are organized in the nucleus of Leishmania major parasites at both the human and insect stages of their life cycle. We developed a new automated and precise method for identifying telomere position in the three-dimensional space of the nucleus, and we found that the telomeres are organized in clusters present in similar numbers in both the human and insect stages. While the number of clusters remained the same, their distribution differed between the two stages. The telomeric clusters were found more concentrated near the center of the nucleus in the human stage than in the insect stage suggesting reorganization during the parasite's differentiation process between the two hosts. These data provide the first 3D analysis of Leishmania telomere organization. The possible biological implications of these findings are discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Gaia space mission is a major project for the European astronomical community. As challenging as it is, the processing and analysis of the huge data-flow incoming from Gaia is the subject of thorough study and preparatory work by the DPAC (Data Processing and Analysis Consortium), in charge of all aspects of the Gaia data reduction. This PhD Thesis was carried out in the framework of the DPAC, within the team based in Bologna. The task of the Bologna team is to define the calibration model and to build a grid of spectro-photometric standard stars (SPSS) suitable for the absolute flux calibration of the Gaia G-band photometry and the BP/RP spectrophotometry. Such a flux calibration can be performed by repeatedly observing each SPSS during the life-time of the Gaia mission and by comparing the observed Gaia spectra to the spectra obtained by our ground-based observations. Due to both the different observing sites involved and the huge amount of frames expected (≃100000), it is essential to maintain the maximum homogeneity in data quality, acquisition and treatment, and a particular care has to be used to test the capabilities of each telescope/instrument combination (through the “instrument familiarization plan”), to devise methods to keep under control, and eventually to correct for, the typical instrumental effects that can affect the high precision required for the Gaia SPSS grid (a few % with respect to Vega). I contributed to the ground-based survey of Gaia SPSS in many respects: with the observations, the instrument familiarization plan, the data reduction and analysis activities (both photometry and spectroscopy), and to the maintenance of the data archives. However, the field I was personally responsible for was photometry and in particular relative photometry for the production of short-term light curves. In this context I defined and tested a semi-automated pipeline which allows for the pre-reduction of imaging SPSS data and the production of aperture photometry catalogues ready to be used for further analysis. A series of semi-automated quality control criteria are included in the pipeline at various levels, from pre-reduction, to aperture photometry, to light curves production and analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

National Highway Traffic Safety Administration, Office of Vehicle Safety Research, Washington, D.C.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither of them provide any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established. The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2015

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: The inherent complexity of statistical methods and clinical phenomena compel researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them have a complete knowledge in their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Though communication has a central role in interdisciplinary collaboration and since miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. Methods/Principal Findings: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of ""what if'' situations that helped clarify how the method or information from the other field would behave, if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. Conclusion/Significance: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The performance of three analytical methods for multiple-frequency bioelectrical impedance analysis (MFBIA) data was assessed. The methods were the established method of Cole and Cole, the newly proposed method of Siconolfi and co-workers and a modification of this procedure. Method performance was assessed from the adequacy of the curve fitting techniques, as judged by the correlation coefficient and standard error of the estimate, and the accuracy of the different methods in determining the theoretical values of impedance parameters describing a set of model electrical circuits. The experimental data were well fitted by all curve-fitting procedures (r = 0.9 with SEE 0.3 to 3.5% or better for most circuit-procedure combinations). Cole-Cole modelling provided the most accurate estimates of circuit impedance values, generally within 1-2% of the theoretical values, followed by the Siconolfi procedure using a sixth-order polynomial regression (1-6% variation). None of the methods, however, accurately estimated circuit parameters when the measured impedances were low (<20 Omega) reflecting the electronic limits of the impedance meter used. These data suggest that Cole-Cole modelling remains the preferred method for the analysis of MFBIA data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The identification, modeling, and analysis of interactions between nodes of neural systems in the human brain have become the aim of interest of many studies in neuroscience. The complex neural network structure and its correlations with brain functions have played a role in all areas of neuroscience, including the comprehension of cognitive and emotional processing. Indeed, understanding how information is stored, retrieved, processed, and transmitted is one of the ultimate challenges in brain research. In this context, in functional neuroimaging, connectivity analysis is a major tool for the exploration and characterization of the information flow between specialized brain regions. In most functional magnetic resonance imaging (fMRI) studies, connectivity analysis is carried out by first selecting regions of interest (ROI) and then calculating an average BOLD time series (across the voxels in each cluster). Some studies have shown that the average may not be a good choice and have suggested, as an alternative, the use of principal component analysis (PCA) to extract the principal eigen-time series from the ROI(s). In this paper, we introduce a novel approach called cluster Granger analysis (CGA) to study connectivity between ROIs. The main aim of this method was to employ multiple eigen-time series in each ROI to avoid temporal information loss during identification of Granger causality. Such information loss is inherent in averaging (e.g., to yield a single ""representative"" time series per ROI). This, in turn, may lead to a lack of power in detecting connections. The proposed approach is based on multivariate statistical analysis and integrates PCA and partial canonical correlation in a framework of Granger causality for clusters (sets) of time series. We also describe an algorithm for statistical significance testing based on bootstrapping. By using Monte Carlo simulations, we show that the proposed approach outperforms conventional Granger causality analysis (i.e., using representative time series extracted by signal averaging or first principal components estimation from ROIs). The usefulness of the CGA approach in real fMRI data is illustrated in an experiment using human faces expressing emotions. With this data set, the proposed approach suggested the presence of significantly more connections between the ROIs than were detected using a single representative time series in each ROI. (c) 2010 Elsevier Inc. All rights reserved.