835 results for Data processing and analysis
Analyzing large-scale gene expression data is a labor-intensive and time-consuming process. To make data analysis easier, we developed a set of pipelines for the rapid processing and analysis of poplar gene expression data for knowledge discovery. Among these, the differentially expressed genes (DEGs) pipeline is designed to identify biologically important genes that are differentially expressed at one of multiple time points or conditions. The pathway analysis pipeline was designed to identify differentially expressed metabolic pathways. The protein domain enrichment pipeline identifies the protein domains enriched in the DEGs. Finally, the Gene Ontology (GO) enrichment analysis pipeline was developed to identify the GO terms enriched in the DEGs. Our pipeline tools can analyze both microarray data and high-throughput sequencing data. These two types of data are obtained by different technologies. Microarray technology measures gene expression levels with microarray chips, which are collections of microscopic DNA spots attached to a solid (glass) surface. High-throughput sequencing, also called next-generation sequencing, is a newer technology that measures gene expression levels by directly sequencing mRNAs and obtaining each mRNA's copy number in cells or tissues. We also developed a web portal (http://sys.bio.mtu.edu/) that makes all pipelines publicly available so that users can analyze their own gene expression data. In addition to the analyses mentioned above, the portal can perform GO hierarchy analysis, i.e. construct GO trees from an input list of GO terms.
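The abstract does not state which statistical test the GO and protein-domain enrichment pipelines use. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test, and the sketch below assumes that test. The symbols and the function name are introduced here for illustration only:

- `N` is the number of background genes, for example all genes measured on the array.
- `K` is the number of background genes annotated with a given GO term or domain.
- `n` is the number of DEGs.
- `k` is the number of DEGs that carry that annotation.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes; K: background genes with the annotation;
    n: differentially expressed genes (DEGs); k: annotated DEGs.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k, k+1, ..., min(n, K) annotated DEGs.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Illustrative numbers only: 12 of 200 DEGs carry a term that annotates
    # 150 of 20,000 background genes (about 1.5 expected by chance).
    print(hypergeom_enrichment_p(k=12, n=200, K=150, N=20_000))
```

In practice, the p-values for all tested terms would also be corrected for multiple testing, for example with the Benjamini-Hochberg procedure, before any term is called enriched.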
Healthcare systems have assimilated information and communication technologies in order to improve the quality of healthcare and patients' experience at reduced costs. The increasing digitalization of people's health information, however, raises new threats regarding information security and privacy. Accidental or deliberate breaches of health data may lead to societal pressures, embarrassment and discrimination. Information security and privacy are paramount to achieving high-quality healthcare services and, further, to avoiding harm to individuals when providing care. With that in mind, we give special attention to the category of Mobile Health (mHealth) systems, that is, the use of mobile devices (e.g., mobile phones, sensors, PDAs) to support medical and public health practice. Such systems have been particularly successful in developing countries, taking advantage of the flourishing mobile market and the need to expand the coverage of primary healthcare programs. Many mHealth initiatives, however, fail to address security and privacy issues. This, coupled with the lack of specific privacy and data protection legislation in these countries, increases the risk of harm to individuals. The overall objective of this thesis is to enhance knowledge regarding the design of security and privacy technologies for mHealth systems. In particular, we deal with mHealth Data Collection Systems (MDCSs), which consist of mobile devices for collecting and reporting health-related data, replacing paper-based approaches to health surveys and surveillance. This thesis consists of publications that contribute to mHealth security and privacy in various ways: a comprehensive literature review of mHealth in Brazil; the design of a security framework for MDCSs (SecourHealth); the design of an MDCS (GeoHealth); the design of a Privacy Impact Assessment template for MDCSs; and the study of ontology-based obfuscation and anonymisation functions for health data.
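The abstract mentions ontology-based obfuscation but does not describe the functions themselves. One widely used idea is generalization. In this approach, a sensitive value is replaced by an ancestor term in an ontology, which trades precision for privacy. The sketch below is a minimal illustration under the following assumptions, none of which come from the thesis:

- The ontology is a simple child-to-parent map.
- The terms, the `generalize` helpers and the `levels` parameter are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy ontology for illustration only: each term maps to its
# parent term, and the root maps to None.
TOY_ONTOLOGY: Dict[str, Optional[str]] = {
    "disease": None,
    "infectious disease": "disease",
    "respiratory infection": "infectious disease",
    "influenza": "respiratory infection",
    "tuberculosis": "respiratory infection",
}


def generalize(term: str, parent: Dict[str, Optional[str]], levels: int) -> str:
    """Replace `term` by its ancestor `levels` steps up, stopping at the root."""
    if term not in parent:
        raise KeyError(f"term not in ontology: {term!r}")
    result = term
    for _ in range(levels):
        up = parent.get(result)
        if up is None:  # reached the root; cannot generalize further
            break
        result = up
    return result


def generalize_records(records: List[dict], field: str,
                       parent: Dict[str, Optional[str]], levels: int) -> List[dict]:
    """Return copies of `records` with `field` generalized; inputs stay unchanged."""
    return [{**r, field: generalize(r[field], parent, levels)} for r in records]


if __name__ == "__main__":
    print(generalize("influenza", TOY_ONTOLOGY, 2))
```

A larger `levels` value hides more detail from whoever receives the collected data, at the cost of analytic utility. Choosing that level per data field is the kind of decision a privacy impact assessment would inform.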
This thesis investigates the legal, ethical, technical, and psychological issues of general data processing and artificial intelligence practices and the explainability of AI systems. It consists of two main parts. In the first part, we provide a comprehensive overview of the big data processing ecosystem and the main challenges we face today. We then evaluate the GDPR's data privacy framework in the European Union. The Trustworthy AI Framework proposed by the EU's High-Level Expert Group on AI (AI HLEG) is examined in detail. The ethical principles for the foundation and realization of Trustworthy AI are analyzed along with the assessment list prepared by the AI HLEG. Then, we list the main big data challenges identified by European researchers and institutions and review the literature on the technical and organizational measures to address these challenges. A quantitative analysis of the identified big data challenges and the measures to address them leads to practical recommendations for better data processing and AI practices in the EU. In the second part, we concentrate on the explainability of AI systems. We clarify the terminology and list the goals of AI explainability. We identify the reasons for the explainability-accuracy trade-off and how it can be addressed. We conduct a comparative cognitive analysis of human reasoning and machine-generated explanations, with the aim of understanding how explainable AI (XAI) can contribute to human reasoning. We then focus on the technical and legal responses to the explainability problem. In this part, the GDPR's right-to-explanation framework and safeguards are analyzed in depth, together with their contribution to the realization of Trustworthy AI. Finally, we analyze the explanation techniques applicable at different stages of machine learning and propose several recommendations, in chronological order, for developing GDPR-compliant and Trustworthy XAI systems.
The aim of this research was to analyze temporal auditory processing and phonological awareness in school-age children with benign childhood epilepsy with centrotemporal spikes (BECTS). The patient group (GI) consisted of 13 children diagnosed with BECTS, and the control group (GII) consisted of 17 healthy children. After neurological and peripheral audiological assessment, the children underwent a behavioral auditory evaluation and a phonological awareness assessment. The procedures applied were the Gaps-in-Noise test (GIN), the Duration Pattern test, and the Phonological Awareness test (PCF). Results were compared between the groups, and a correlation analysis was performed between performance on the temporal tasks and performance on phonological awareness. GII performed significantly better than the children with BECTS (GI) on both the GIN and the Duration Pattern test (P < 0.001). GI performed significantly worse in all four categories of phonological awareness assessed: syllabic (P = 0.001), phonemic (P = 0.006), rhyme (P = 0.015) and alliteration (P = 0.010). Statistical analysis showed a significant positive correlation between the phonological awareness assessment and the Duration Pattern test (P < 0.001). From these results, we concluded that children with BECTS may have difficulties in temporal resolution, temporal ordering, and phonological awareness skills. A correlation was observed between auditory temporal processing and phonological awareness in the studied sample.
Thanks to recent advances in molecular biology, allied to an ever-increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously with methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for modeling and identifying gene networks from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for validating gene network modeling and identification, which comprises three main aspects: (1) generation of Artificial Gene Network (AGN) models from theoretical models of complex networks, used to simulate temporal expression data; (2) a computational method for identifying gene networks from the simulated data, founded on a feature selection approach in which a target gene is fixed and the expression profiles of all other genes are observed in order to identify a relevant subset of predictors; and (3) validation of the identified network by comparison with the original AGN. The proposed framework allows several types of AGNs to be generated and used to simulate temporal expression data. The results of the network identification method can then be compared with the original network to estimate the method's properties and accuracy. Some of the most important theoretical models of complex networks were assessed: the uniformly random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG).
The experimental results indicate that the inference method was sensitive to variation in the average degree k, with its network recovery rate decreasing as k increased. Signal size was important for the accuracy of network identification, and very good results were obtained even with small expression profiles. However, the adopted inference method was not sensitive to distinct structures of interaction among genes, and it behaved similarly when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for validating the inferred networks by identifying some properties of the evaluated method, and it can be extended to other inference methods.
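Step (3) of the framework compares an identified network with the original AGN, and it can be illustrated with a short sketch. The code below generates a directed Erdős-Rényi network and scores an inferred edge set by recall (the fraction of original edges recovered) and precision. Several points in this sketch are assumptions rather than details from the source:

- "Average degree" is taken to mean the expected number of regulators (in-degree) per gene.
- Treating the "recovery rate" as recall is an assumption.
- Networks are represented as sets of directed edges.
- The function names are hypothetical.

The feature-selection inference method and the simulation of expression data are not reproduced.

```python
import random
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # (regulator, target)


def erdos_renyi_digraph(n: int, avg_degree: float, seed: int = 0) -> Set[Edge]:
    """Directed G(n, p) network without self-loops.

    p is chosen so that each gene has, on average, `avg_degree` regulators
    (expected in-degree); this reading of "average degree" is an assumption.
    """
    if n < 2:
        raise ValueError("need at least two genes")
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and rng.random() < p}


def compare_networks(original: Set[Edge], inferred: Set[Edge]) -> Dict[str, float]:
    """Score an inferred network against the original (ground-truth) network."""
    true_positives = len(original & inferred)
    recall = true_positives / len(original) if original else 0.0
    precision = true_positives / len(inferred) if inferred else 0.0
    return {"true_positives": true_positives, "recall": recall, "precision": precision}


if __name__ == "__main__":
    truth = erdos_renyi_digraph(n=100, avg_degree=2.0, seed=1)
    # Stand-in for an inference method's output: keep ~70% of true edges, add noise.
    rng = random.Random(2)
    guess = {e for e in truth if rng.random() < 0.7} | {(0, 1), (2, 3)}
    print(compare_networks(truth, guess))
```

Repeating this comparison while sweeping `avg_degree` gives the kind of degree-versus-recovery experiment the abstract reports. Other topologies (WS, BA, GG) would need their own generators.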
The groundwater recharge and water fluxes of the Guarani Aquifer System in the state of São Paulo, Brazil, were assessed with a numerical model. The study area (6,748 km²) comprises the Jacaré-Guaçu and Jacaré-Pepira River watersheds, tributaries of the Tietê River in the central region of the state. GIS-based tools were used to store, process and analyze the data. The main hydrologic phenomena were selected, leading to a groundwater conceptual model that takes into account the significant outcrops in the study area. Six recharge zones were related to the geologic formations and structures of the semi-confined and phreatic aquifer. The model was calibrated against baseflows and the static water levels of wells. The results emphasize the strong interaction of groundwater flows between watersheds and the groundwater inflow into the rivers. It was concluded that lateral groundwater exchanges between basins, deep discharges to the regional system, and well exploitation were not significant aquifer outflows compared with aquifer recharge. The results also showed that inflows from the river into the aquifer are significant and of the utmost importance, since the aquifer is potentially more vulnerable in these places.
Isolation and analysis of bioactive isoflavonoids and chalcone from a new type of Brazilian propolis
Activity-directed fractionation and purification processes were employed to identify isoflavonoids with antioxidant and antimicrobial activities in Brazilian red propolis. Crude propolis was extracted with ethanol (80%, v/v) and fractionated by liquid-liquid extraction using hexane and chloroform. Since the chloroform fraction showed strong antioxidant and antimicrobial activities, its compounds were purified and isolated using various chromatographic techniques. By comparing our spectral data (UV, NMR, and mass spectrometry) with values in the literature, we identified two bioactive isoflavonoids (vestitol and neovestitol), together with one chalcone (isoliquiritigenin). Vestitol showed higher antioxidant activity against beta-carotene consumption than neovestitol. The antimicrobial activity of these three compounds against Staphylococcus aureus, Streptococcus mutans, and Actinomyces naeslundii was evaluated, and isoliquiritigenin was the most active, with the lowest MICs, ranging from 15.6 to 62.5 µg/mL. Our results show that Brazilian red propolis contains biologically active isoflavonoids that may be used as mild antioxidants and antimicrobials for food preservation. (C) 2010 Elsevier B.V. All rights reserved.
The Brazilian Network of Food Data Systems (BRASILFOODS) has maintained the Brazilian Food Composition Database-USP (TBCA-USP) (http://www.fcf.usp.br/tabela) since 1998. Besides the constant compilation, analysis and updating of the database, the network seeks to innovate by introducing food information that may help decrease the risk of non-communicable chronic diseases, such as the carbohydrate and flavonoid profiles of foods. In 2008, data on individually analyzed carbohydrates in 112 foods, and 41 data points on the glycemic response produced by foods widely consumed in the country, were included in the TBCA-USP. A total of 773 data points on the different flavonoid subclasses in 197 Brazilian foods were compiled, and the quality of each data point was evaluated according to the USDA's data quality evaluation system. In 2007, BRASILFOODS/USP and INFOODS/FAO organized the 7th International Food Data Conference, "Food Composition and Biodiversity". This conference was a unique opportunity for interaction between renowned researchers and participants from several countries, and it allowed discussion of aspects that may improve the food composition area. During this period, the LATINFOODS Regional Technical Compilation Committee and BRASILFOODS disseminated the Form and Manual for Data Compilation, version 2009, to Latin America, taught a Food Composition Data Compilation course, and carried out many activities related to data production and compilation. (C) 2010 Elsevier Inc. All rights reserved.
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information in large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured, and can consist of text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications, for example:

- Association rule mining can be useful for market basket problems.
- Clustering algorithms can be used to discover trends in unsupervised learning problems.
- Classification algorithms can be applied to decision-making problems.
- Sequential and time series mining algorithms can be used for predicting events, fault detection, and other supervised learning problems (Vapnik, 1999).

Classification is among the most important tasks in data mining, particularly for data mining applications in engineering. Together with regression, classification is mainly used for predictive modelling. Many classification algorithms are in practical use. According to Sebastiani (2002), the main classification algorithms can be categorized as follows:

- Decision tree and rule-based approaches, such as C4.5 (Quinlan, 1996).
- Probabilistic methods, such as the Bayesian classifier (Lewis, 1998).
- On-line methods, such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001).
- Neural network methods (Rumelhart, Hinton & Williams, 1986).
- Example-based methods, such as k-nearest neighbors (Duda & Hart, 1973).
- SVM (Cortes & Vapnik, 1995).

Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
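Of the categories listed above, the example-based family is the simplest to illustrate. Below is a minimal k-nearest-neighbors classifier in the spirit of Duda & Hart (1973), under these assumptions:

- Features are numeric vectors.
- Distance is Euclidean.
- The class is decided by an unweighted majority vote.

The function name and the toy data are illustrative. Real use would typically rely on a library implementation together with feature scaling.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_predict(train: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Classify `query` by majority vote among its k nearest training examples.

    Uses Euclidean distance. Ties in the vote go to the label encountered
    first among the distance-sorted neighbours.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    neighbours = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy two-class data, e.g. "ok" vs "fault" readings from two sensors.
    train = [((1.0, 1.0), "ok"), ((1.2, 0.9), "ok"), ((0.8, 1.1), "ok"),
             ((4.0, 4.2), "fault"), ((4.1, 3.9), "fault")]
    print(knn_predict(train, (1.1, 1.0)))
```

kNN has no training phase: the whole labelled data set is kept and consulted at prediction time. This is why it is called example-based, and also why it becomes expensive on very large databases without an index structure.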
The writers measured velocity, pressure and energy distributions, wavelengths, and wave amplitudes along undular jumps in a smooth rectangular channel 0.25 m wide. In each case the upstream flow was a fully developed shear flow. Analysis of the data shows that the jump has strong three-dimensional features and that the aspect ratio of the channel is an important parameter. Energy dissipation on the centerline is far from negligible and is largely confined to the reach between the start of the lateral shock waves and the first wave crest of the jump, where the boundary layer develops under a strong adverse pressure gradient. A Boussinesq-type solution for the free-surface profile and the velocity, energy and pressure distributions is developed and compared with the data. Limitations of the two-dimensional analysis are discussed.
We assess the effects of chemical processing, ethylene oxide sterilization, and threading on the surface and mechanical properties of bovine undecalcified bone screws. In addition, we evaluate the possibility of manufacturing bone screws with predefined dimensions. Scanning electron microscopy images show that chemical processing and ethylene oxide treatment cause collagen fiber amalgamation on the bone surface. Processed screws withstand higher ultimate loads in bending and torsion than the in natura bone group, with no difference in pull-out strength between groups. Threading significantly reduces deformation and bone strength under torsion. Metrological data demonstrate that bone screws can be manufactured with standardized dimensions.
Functional brain imaging techniques such as functional MRI (fMRI), which allow in vivo investigation of the human brain, have been increasingly employed to address the neurophysiological substrates of emotional processing. Despite the growing number of fMRI studies in the field, these individual imaging studies, taken separately, show contrasting findings and variable pictures, and they cannot definitively characterize the neural networks underlying each specific emotional condition. Different imaging packages, as well as the statistical approaches used for image processing and analysis, probably contribute to this heterogeneity of findings. In particular, it is unclear to what extent the observed neurofunctional response of the brain cortex during emotional processing depends on the fMRI package used in the analysis. In this pilot study, we performed a double analysis of an fMRI dataset using emotional faces. Two programs were used to assess our results: Statistical Parametric Mapping (SPM) version 2.6 (Wellcome Department of Cognitive Neurology, London, UK), which uses parametric analysis, and XBAM 3.4 (Brain Imaging Analysis Unit, Institute of Psychiatry, King's College London, UK), which uses non-parametric analysis. Both packages revealed that processing of emotional faces was associated with increased activation in several regions:

- The brain's visual areas (occipital, fusiform and lingual gyri).
- The cerebellum.
- The parietal cortex.
- The cingulate cortex (anterior and posterior cingulate).
- The dorsolateral and ventrolateral prefrontal cortex.

However, a blood oxygenation level-dependent (BOLD) response in the temporal regions, insula and putamen was evident in the XBAM analysis but not in the SPM analysis. Overall, the SPM and XBAM analyses revealed comparable whole-group brain responses.
Further studies are needed to explore the between-group compatibility of different imaging packages in other cognitive and emotional processing domains. (C) 2009 Elsevier Ltd. All rights reserved.
Estuaries are perhaps the most threatened environments in the coastal fringe; the coincidence of high natural value and attractiveness for human use has led to conflicts between conservation and development. These conflicts occur in the Sado Estuary because it lies near the industrialised zone of the Setúbal Peninsula while, at the same time, a large part of the estuary is classified as a Natural Reserve due to its high biodiversity. These facts created the need to implement a model of environmental management and quality assessment, based on methodologies that enable assessment of the Sado Estuary's quality and evaluation of the human pressures on it. These methodologies are based on indicators that best depict the state of the environment, not necessarily on everything that could be measured or analysed. Sediments have long been considered an important temporary source of some compounds, a sink for other types of materials, and an interface where a great diversity of biogeochemical transformations occurs. For these reasons, they are of great importance in formulating coastal management systems. Many authors have used sediments to monitor aquatic contamination, demonstrating great advantages over traditional water-column sampling. The main objective of this thesis was to develop an estuary environmental management framework for the Sado Estuary based on the DPSIR model (EMMSado), covering data collection, data processing and data analysis. The supporting infrastructure of EMMSado was a set of spatially contiguous and homogeneous regions of sediment structure (management units). The environmental quality of the estuary was assessed through sediment quality assessment and integrated, at a preliminary stage, with the human pressure for development.
Besides the advantages explained earlier, studying the quality of the estuary mainly through indicators and indices of the sediment compartment also makes this methodology easier, faster, and less demanding of human and financial resources. These are essential factors for efficient environmental management of coastal areas. Data management, visualization, processing and analysis were carried out through the combined use of indicators and indices, sampling optimization techniques, Geographical Information Systems, remote sensing, spatial statistics, Global Positioning Systems and best expert judgment.

As an overall conclusion, of the nineteen management units delineated and analyzed, three showed no ecological risk (18.5 % of the study area). The areas of most concern (5.6 % of the study area) are located in the North Channel and are under strong human pressure, mainly from industrial activities. These areas also have low hydrodynamic energy and are therefore associated with high levels of deposition. In particular, the areas near the Lisnave and Eurominas industries can also accumulate contamination from the Águas de Moura Channel, since particles from that channel can settle there due to residual flow. In these areas, the contaminants of concern among those analyzed are heavy metals and metalloids (Cd, Cu, Zn and As exceeded the PEL guidelines) and the pesticides BHC isomers, heptachlor, isodrin, DDT and its metabolites, endosulfan and endrin. In the remaining management units (76 % of the study area), there is a moderate potential for adverse ecological effects, and in some of these areas no stress agents could be identified. This emphasizes the need for further research, since unmeasured chemicals may be causing or contributing to these adverse effects. Special attention must be paid to the units with moderate potential for adverse ecological effects that are located inside the natural reserve.
Non-point source pollution from agriculture and aquaculture also seems to contribute an important pollution load to the estuary, entering through the Águas de Moura Channel. This pressure is reflected in the moderate potential for ecological risk in the areas near the entrance of this channel. Pressures may also come from the Alcácer Channel, although they were not quantified in this study. The management framework presented here, including all its methodological tools, may be applied and tested in other estuarine ecosystems, which would also allow comparison between estuarine ecosystems in other parts of the globe.
Cooperating objects (COs) is a recently coined term used to signify the convergence of classical embedded computer systems, wireless sensor networks and robotics and control. We present essential elements of a reference architecture for scalable data processing for the CO paradigm.
The latest medical diagnosis devices enable e-diagnosis, making these services easier to access, faster, and available in remote areas. However, this poses new communication and data interchange challenges. In this paper, a new XML-based format for storing cardiac signals and related information is presented. The proposed structure encompasses data acquisition devices, patient information, data description, pathological diagnosis and waveform annotation. Compared with formats of similar purpose, it offers several advantages. Besides the fully integrated data model, these include:

- Geographical references for e-diagnosis.
- Multi-stream data description.
- The ability to handle several simultaneous devices.
- Independent waveform annotation.
- An HL7-compliant structure for common contents.

These features provide better integration with existing systems and greater flexibility in representing cardiac data.
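The paper's schema is not given in the abstract, so the sketch below only illustrates the kind of structure described: patient information, a geographical reference, several acquisition devices, multiple data streams, and annotations kept separate from the waveform data. It uses Python's standard `xml.etree.ElementTree`. Every element and attribute name (`cardiacRecord`, `stream`, `samplingHz`, and so on) is hypothetical, and the sketch makes no attempt at HL7 compliance.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stream:
    device_id: str      # acquisition device that produced this stream
    lead: str           # lead/channel label (illustrative)
    sampling_hz: int
    samples: List[float]


@dataclass
class Annotation:
    lead: str           # stream being annotated; stored apart from the samples
    sample_index: int
    label: str


def build_record(patient_id: str, lat: float, lon: float,
                 devices: List[Tuple[str, str]], streams: List[Stream],
                 annotations: List[Annotation], diagnosis: str) -> ET.Element:
    """Assemble a hypothetical cardiac-signal record as an XML element tree."""
    root = ET.Element("cardiacRecord")
    ET.SubElement(root, "patient", id=patient_id)
    # Geographical reference of the acquisition site, for remote e-diagnosis.
    ET.SubElement(root, "location", lat=f"{lat:.5f}", lon=f"{lon:.5f}")
    devices_el = ET.SubElement(root, "devices")
    for dev_id, dev_type in devices:
        ET.SubElement(devices_el, "device", id=dev_id, type=dev_type)
    data_el = ET.SubElement(root, "data")
    for s in streams:
        stream_el = ET.SubElement(data_el, "stream", device=s.device_id,
                                  lead=s.lead, samplingHz=str(s.sampling_hz))
        stream_el.text = " ".join(f"{v:g}" for v in s.samples)
    annotations_el = ET.SubElement(root, "annotations")
    for a in annotations:
        ET.SubElement(annotations_el, "annotation", lead=a.lead,
                      sample=str(a.sample_index), label=a.label)
    ET.SubElement(root, "diagnosis").text = diagnosis
    return root


if __name__ == "__main__":
    record = build_record(
        "P-001", 40.0, -8.0, [("d1", "ECG"), ("d2", "PCG")],
        [Stream("d1", "II", 250, [0.0, 0.12, 0.85]),
         Stream("d2", "mitral", 2000, [0.01, -0.02])],
        [Annotation("II", 2, "R")], "example only")
    print(ET.tostring(record, encoding="unicode"))
```

Annotations live in their own element, keyed by lead and sample index. This lets them be added or revised without rewriting the waveform data, in line with the independent waveform annotation the abstract lists as an advantage.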