3 resultados para Large datasets
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins which are specifically designed to be inserted into biological membranes and devised to perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried-out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performances of the proposed approach were measured as comparable or superior to the current state-of-the-art of TMBB topology prediction.
Resumo:
The study of supermassive black hole (SMBH) accretion during their phase of activity (hence becoming active galactic nuclei, AGN), and its relation to the host-galaxy growth, requires large datasets of AGN, including a significant fraction of obscured sources. X-ray data are strategic in AGN selection, because at X-ray energies the contamination from non-active galaxies is far less significant than in optical/infrared surveys, and the selection of obscured AGN, including also a fraction of heavily obscured AGN, is much more effective. In this thesis, I present the results of the Chandra COSMOS Legacy survey, a 4.6 Ms X-ray survey covering the equatorial COSMOS area. The COSMOS Legacy depth (flux limit f=2x10^(-16) erg/s/cm^(-2) in the 0.5-2 keV band) is significantly better than that of other X-ray surveys on similar area, and represents the path for surveys with future facilities, like Athena and X-ray Surveyor. The final Chandra COSMOS Legacy catalog contains 4016 point-like sources, 97% of which with redshift. 65% of the sources are optically obscured and potentially caught in the phase of main BH growth. We used the sample of 174 Chandra COSMOS Legacy at z>3 to place constraints on the BH formation scenario. We found a significant disagreement between our space density and the predictions of a physical model of AGN activation through major-merger. This suggests that in our luminosity range the BH triggering through secular accretion is likely preferred to a major-merger triggering scenario. Thanks to its large statistics, the Chandra COSMOS Legacy dataset, combined with the other multiwavelength COSMOS catalogs, will be used to answer questions related to a large number of astrophysical topics, with particular focus on the SMBH accretion in different luminosity and redshift regimes.
Resumo:
Concerns over global change and its effect on coral reef survivorship have highlighted the need for long-term datasets and proxy records, to interpret environmental trends and inform policymakers. Citizen science programs have showed to be a valid method for collecting data, reducing financial and time costs for institutions. This study is based on the elaboration of data collected by recreational divers and its main purpose is to evaluate changes in the state of coral reef biodiversity in the Red Sea over a long term period and validate the volunteer-based monitoring method. Volunteers recreational divers completed a questionnaire after each dive, recording the presence of 72 animal taxa and negative reef conditions. Comparisons were made between records from volunteers and independent records from a marine biologist who performed the same dive at the same time. A total of 500 volunteers were tested in 78 validation trials. Relative values of accuracy, reliability and similarity seem to be comparable to those performed by volunteer divers on precise transects in other projects, or in community-based terrestrial monitoring. 9301 recreational divers participated in the monitoring program, completing 23,059 survey questionnaires in a 5-year period. The volunteer-sightings-based index showed significant differences between the geographical areas. The area of Hurghada is distinguished by a medium-low biodiversity index, heavily damaged by a not controlled anthropic exploitation. Coral reefs along the Ras Mohammed National Park at Sharm el Sheikh, conversely showed high biodiversity index. The detected pattern seems to be correlated with the conservation measures adopted. In our experience and that of other research institutes, citizen science can integrate conventional methods and significantly reduce costs and time. Involving recreational divers we were able to build a large data set, covering a wide geographic area. The main limitation remains the difficulty of obtaining an homogeneous spatial sampling distribution.