855 results for Large-scale databases
Abstract:
This paper addresses biometric identification using large databases, in particular iris databases. In such applications, it is critical to have a low response time while maintaining an acceptable recognition rate. Thus, the trade-off between speed and accuracy must be evaluated for both the processing and recognition parts of an identification system. In this paper, a graph-based framework for pattern recognition, called Optimum-Path Forest (OPF), is used as a classifier in a previously developed iris recognition system. The aim of this paper is to verify the effectiveness of OPF in the field of iris recognition and its performance on iris databases of various scales. The existing Gauss-Laguerre wavelet-based coding scheme is used for iris encoding. The performance of the OPF and of two other classifiers, Hamming and Bayesian, is compared using small-, medium-, and large-scale databases. This comparison shows that the OPF has a faster response for large-scale databases, thus performing better than the more accurate, but slower, classifiers.
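For orientation, the OPF decision rule can be sketched as follows: once training has produced an optimum-path forest with a cost C(s) and a label for every training sample s, a test sample t receives the label of the training sample offering the cheapest path, where the path cost is max{C(s), d(s, t)}. The sketch below is illustrative only; it assumes Euclidean distances and precomputed training costs, and it omits the training (prototype and forest construction) stage entirely.

    import numpy as np

    def opf_classify(test_samples, train_samples, train_costs, train_labels):
        # Classification step of the Optimum-Path Forest (OPF) classifier.
        # Assumes training is already done: train_costs holds the optimum-path
        # cost C(s) of each training sample s in the trained forest, and
        # train_labels the label propagated from its root (not computed here).
        predictions = []
        for t in np.atleast_2d(test_samples):
            d = np.linalg.norm(train_samples - t, axis=1)  # distances d(s, t)
            path_costs = np.maximum(train_costs, d)        # f_max: max{C(s), d(s, t)}
            predictions.append(train_labels[np.argmin(path_costs)])
        return np.array(predictions)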
Abstract:
The majority of biometric researchers focus on the matching accuracy achievable on biometric databases, including iris databases, while scalability and speed issues have been neglected. In applications such as identification at airports and borders, it is critical for the identification system to have a low response time. In this paper, a graph-based framework for pattern recognition, called Optimum-Path Forest (OPF), is used as a classifier in a previously developed iris recognition system. The aim of this paper is to verify the effectiveness of OPF in the field of iris recognition and its performance on iris databases of various scales. The paper investigates several classifiers widely used in the iris recognition literature, examining response time along with accuracy. The existing Gauss-Laguerre wavelet-based iris coding scheme, which shows perfect discrimination with a rotary Hamming distance classifier, is used for iris coding. The performance of the classifiers is compared using small-, medium-, and large-scale databases. This comparison shows that the OPF has a faster response for large-scale databases, thus performing better than the more accurate but slower Bayesian classifier.
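The rotary Hamming distance mentioned above is, in essence, the normalized Hamming distance between two binary iris codes minimized over a small range of circular shifts, which compensates for eye rotation. A minimal sketch, assuming boolean code and occlusion-mask arrays and an assumed shift range:

    import numpy as np

    def rotary_hamming_distance(code_a, code_b, mask_a, mask_b, max_shift=8):
        # Normalized Hamming distance between two binary iris codes, minimized
        # over circular bit shifts to compensate for eye rotation; the masks
        # flag bits unaffected by eyelids/eyelashes. Shift range is an assumption.
        code_a, code_b = np.asarray(code_a, bool), np.asarray(code_b, bool)
        mask_a, mask_b = np.asarray(mask_a, bool), np.asarray(mask_b, bool)
        best = 1.0
        for shift in range(-max_shift, max_shift + 1):
            b = np.roll(code_b, shift, axis=-1)
            m = mask_a & np.roll(mask_b, shift, axis=-1)
            valid = np.count_nonzero(m)
            if valid == 0:
                continue
            best = min(best, np.count_nonzero((code_a ^ b) & m) / valid)
        return best  # small distance suggests the same eye; compare to a threshold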
Abstract:
OBJECTIVES: Barrett's esophagus (BE) is a common premalignant lesion for which surveillance is recommended. This strategy is limited by considerable variations in clinical practice. We conducted an international, multidisciplinary, systematic search and evidence-based review of BE and provided consensus recommendations for clinical use in patients with nondysplastic BE, indefinite dysplasia, and low-grade dysplasia (LGD). METHODS: We defined the scope, proposed statements, and searched electronic databases, yielding 20,558 publications that were screened, selected online, and formed the evidence base. We used a Delphi consensus process, with an 80% agreement threshold, using GRADE (Grading of Recommendations Assessment, Development and Evaluation) to categorize the quality of evidence and the strength of recommendations. RESULTS: In total, 80% of respondents agreed with 55 of 127 statements in the final voting rounds. Population endoscopic screening is not recommended; screening should target only very high-risk cases, namely males aged over 60 years with chronic uncontrolled reflux. A new international definition of BE was agreed upon. For any degree of dysplasia, review by at least two specialist gastrointestinal (GI) pathologists is required. Risk factors for cancer include male gender, length of BE, and central obesity. Endoscopic resection should be used for visible, nodular areas. Surveillance is not recommended for patients with a life expectancy of less than 5 years. Management strategies for indefinite dysplasia (IND) and LGD were identified, including a de-escalation strategy for lower-risk patients and escalation to intervention with follow-up for higher-risk patients. CONCLUSIONS: In this uniquely large consensus process in gastroenterology, we made key clinical recommendations for the escalation/de-escalation of BE management in clinical practice. We made strong recommendations for the prioritization of future research.
Abstract:
Portugal's high dependence on foreign energy sources (mainly fossil fuels), together with the international commitments it has assumed, the national energy policy strategy, and resource sustainability and climate change concerns, inevitably forces the country to invest in its energy self-sufficiency. The 20/20/20 Strategy defined by the European Union establishes that, by 2020, 60% of total electricity consumption must come from renewable energy sources. Wind energy is currently a major source of electricity generation in Portugal, supplying about 23% of total national electricity consumption in 2013. The National Energy Strategy 2020 (ENE2020), which aims to ensure national compliance with the European 20/20/20 Strategy, states that about half of this 60% target will be provided by wind energy. This work aims to implement and optimise a numerical weather prediction model for simulating and modelling the wind energy resource in Portugal, in both offshore and onshore areas. The model optimisation consisted in determining which initial and boundary conditions and which planetary boundary layer physical parameterizations provide wind power flux (energy density), wind speed, and wind direction simulations closest to in situ measured wind data. Specifically for offshore areas, it is also intended to evaluate whether the optimised model produces power flux, wind speed, and direction simulations more consistent with in situ measurements than wind measurements collected by satellites. This work also aims to study and analyse possible impacts that anthropogenic climate change may have on the future wind energy resource in Europe. The results show that the ECMWF ERA-Interim reanalysis is, among all the forcing databases currently available to drive numerical weather prediction models, the one that allows wind power flux, wind speed, and direction simulations most consistent with in situ wind measurements. It was also found that the Pleim-Xiu and ACM2 planetary boundary layer parameterizations showed the best performance in terms of wind power flux, wind speed, and direction simulations. This optimisation allowed a significant reduction of the simulation errors and, specifically for offshore areas, yielded wind power flux, wind speed, and direction simulations more consistent with in situ wind measurements than data obtained from satellites, which is a very valuable result. This work also revealed that future anthropogenic climate change may negatively affect the future European wind energy resource, owing to a tendency towards lower wind speeds, especially by the end of the current century and under stronger radiative forcing conditions.
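As context for the verification quantity used throughout, the wind power flux (energy density) is conventionally computed from wind speed as 0.5 * rho * v^3, in W/m^2. A minimal sketch, assuming a constant standard air density and simple bias/RMSE scores against in situ measurements; the thesis's actual verification setup is not reproduced here.

    import numpy as np

    RHO_AIR = 1.225  # kg/m^3, standard air density, assumed constant here

    def wind_power_flux(wind_speed):
        # Wind power flux (energy density) in W/m^2: 0.5 * rho * v^3
        v = np.asarray(wind_speed, dtype=float)
        return 0.5 * RHO_AIR * v ** 3

    def bias_and_rmse(simulated, observed):
        # Simple verification scores of simulated values against in situ data
        err = np.asarray(simulated, float) - np.asarray(observed, float)
        return err.mean(), np.sqrt((err ** 2).mean())

    # Example: hypothetical hourly wind speeds (m/s) from the model and a mast
    sim = np.array([6.1, 7.4, 8.0, 5.2])
    obs = np.array([5.8, 7.9, 7.6, 5.5])
    print(bias_and_rmse(wind_power_flux(sim), wind_power_flux(obs)))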
Abstract:
This paper considers the potential contribution of secondary quantitative analyses of large-scale surveys to the investigation of 'other' childhoods. Exploring other childhoods involves investigating the experience of young people who are unequally positioned in relation to multiple, embodied identity locations, such as (dis)ability, 'class', gender, sexuality, ethnicity and race. Despite some possible advantages of utilising extensive databases, the paper outlines a number of methodological problems with existing surveys, which tend to reinforce adultist and broader hierarchical social relations. It is contended that scholars of children's geographies could overcome some of these problematic aspects of secondary data sources by endeavouring to transform the research relations of large-scale surveys. Such endeavours would present new theoretical, ethical and methodological complexities, which are briefly considered.
Abstract:
The field site network (FSN) plays a central role in conducting joint research within all Assessing Large-scale Risks for biodiversity with tested Methods (ALARM) modules and provides a mechanism for integrating research on different topics in ALARM at the same sites, so that multiple impacts on biodiversity can be measured. The network covers most European climates and biogeographic regions, from the Mediterranean through central European and boreal to subarctic. The project links databases with the Europe-wide FSN, including geographic information system (GIS)-based information that characterises the test locations for ALARM researchers for joint on-site research. Maps are provided in a standardised way and merged with other site-specific information. The application of GIS to these field sites, and the associated information management, promotes the use of the FSN for research and for disseminating results. We conclude that the ALARM FSN sites, together with other research sites in Europe, could jointly be used as a future backbone for research proposals.
Abstract:
The continuous increase in genome sequencing projects has produced a huge amount of data in the last 10 years: currently, more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, the sequencing process alone determines only raw nucleotide sequences. This is just the first step of the genome annotation process, which deals with assigning biological information to each sequence. Annotation is carried out at every level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished only by in vitro analysis procedures, which are extremely expensive and time-consuming when applied at such a large scale. Thus, in silico methods must be used to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine-learning-based method for the prediction of the subcellular localization of soluble eukaryotic proteins. The method, called BaCelLo, was developed in 2006. Its main peculiarity is that it is independent of biases present in the training dataset, which cause over-prediction of the most represented examples in all other predictors developed so far. This result was achieved by a modification I made to the standard Support Vector Machine (SVM) algorithm, resulting in the so-called Balanced SVM. BaCelLo predicts the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all currently available state-of-the-art methods for this prediction task. BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes, by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genomes, the predicted subcellular localization merged with experimental and similarity-based annotations. In the second part of the work, a new machine-learning-based method was implemented for the prediction of GPI-anchored proteins. The method efficiently predicts, from the raw amino acid sequence, both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model, HMM). The method, called GPIPE, was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while maintaining a false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15,000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted as GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed the definition of specific amino acid abundances in the different regions considered.
Furthermore, the hypothesis, proposed in the literature, that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello ; eSLDB http://gpcr.biocomp.unibo.it/esldb ; GPIPE http://gpcr.biocomp.unibo.it/gpipe
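The Balanced SVM described above is a custom modification of the standard SVM training procedure. As a rough analogue of the underlying idea, counteracting class imbalance so that the most represented localization does not dominate, the standard per-class penalty reweighting in scikit-learn can be sketched as follows; the features, labels, and class proportions below are placeholders, not BaCelLo's actual data or algorithm.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    # Hypothetical, imbalanced toy data: rows stand in for sequence-derived
    # feature vectors, labels for subcellular localizations.
    rng = np.random.default_rng(0)
    X = rng.random((500, 20))
    y = rng.choice(["cytoplasm", "nucleus", "mitochondrion"], size=500,
                   p=[0.7, 0.2, 0.1])

    # class_weight="balanced" reweights the SVM penalty per class so that
    # errors on rare classes count proportionally more during training.
    clf = SVC(kernel="rbf", class_weight="balanced")
    print(cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy").mean())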
Abstract:
Background: Esophageal adenocarcinoma (EA) is one of the fastest rising cancers in Western countries. Barrett's esophagus (BE) is the premalignant precursor of EA. However, only a subset of BE patients develop EA, which complicates clinical management in the absence of valid predictors. Genetic risk factors for BE and EA are incompletely understood. This study aimed to identify novel genetic risk factors for BE and EA. Methods: Within an international consortium of groups involved in the genetics of BE/EA, we performed the first meta-analysis of all available genome-wide association studies (GWAS), involving 6,167 BE patients, 4,112 EA patients, and 17,159 representative controls, all of European ancestry, genotyped on Illumina high-density SNP arrays, and collected from four separate studies within North America, Europe, and Australia. Meta-analysis was conducted using the fixed-effects inverse-variance-weighting approach. We used the standard genome-wide significance threshold of 5×10⁻⁸ for this study. We also conducted an association analysis following re-weighting of loci using an approach that investigates annotation enrichment among the genome-wide significant loci. The entire GWAS data set was also analyzed using bioinformatics approaches, including functional annotation databases as well as gene-based and pathway-based methods, in order to identify pathophysiologically relevant cellular pathways. Findings: We identified eight new risk loci associated with BE and EA, within or near the CFTR (rs17451754, P=4.8×10⁻¹⁰), MSRA (rs17749155, P=5.2×10⁻¹⁰), BLK (rs10108511, P=2.1×10⁻⁹), KHDRBS2 (rs62423175, P=3.0×10⁻⁹), TPPP/CEP72 (rs9918259, P=3.2×10⁻⁹), TMOD1 (rs7852462, P=1.5×10⁻⁸), SATB2 (rs139606545, P=2.0×10⁻⁸), and HTR3C/ABCC5 (rs9823696, P=1.6×10⁻⁸) genes. A further novel risk locus at LPA (rs12207195, posterior probability=0.925) was identified after re-weighting using significantly enriched annotations. This study thereby doubled the number of known risk loci. The strongest disease pathways identified (P<10⁻⁶) belong to muscle cell differentiation and to mesenchyme development/differentiation, which fit with current pathophysiological concepts of BE/EA. To our knowledge, this study identified for the first time an EA-specific association (rs9823696, P=1.6×10⁻⁸) near HTR3C/ABCC5 which is independent of BE development (P=0.45). Interpretation: The identified disease loci and pathways reveal new insights into the etiology of BE and EA. Furthermore, the EA-specific association at HTR3C/ABCC5 may constitute a novel genetic marker for predicting the transition from BE to EA. Mutations in CFTR, one of the new risk loci identified in this study, cause cystic fibrosis (CF), the most common recessive disorder in Europeans. Gastroesophageal reflux (GER) belongs to the phenotypic CF spectrum and represents the main risk factor for BE/EA. Thus, the CFTR locus may trigger a common GER-mediated pathophysiology.
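The fixed-effects inverse-variance-weighting approach named above pools per-study effect estimates by weighting each with the reciprocal of its squared standard error. A minimal sketch for a single variant, with hypothetical per-study log-odds ratios; the consortium's actual analysis pipeline is not reproduced here.

    import numpy as np
    from scipy.stats import norm

    def fixed_effects_meta(betas, ses):
        # Fixed-effects inverse-variance-weighted meta-analysis of per-study
        # effect estimates (betas) and standard errors (ses) for one variant.
        betas, ses = np.asarray(betas, float), np.asarray(ses, float)
        w = 1.0 / ses ** 2                    # inverse-variance weights
        beta = np.sum(w * betas) / np.sum(w)  # pooled effect size
        se = np.sqrt(1.0 / np.sum(w))         # pooled standard error
        z = beta / se
        p = 2.0 * norm.sf(abs(z))             # two-sided p-value
        return beta, se, p

    # Hypothetical log-odds ratios from four studies for a single SNP;
    # genome-wide significance would be declared at p < 5e-8.
    print(fixed_effects_meta([0.12, 0.09, 0.15, 0.11], [0.03, 0.04, 0.05, 0.03]))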
Abstract:
The overwhelming amount and unprecedented speed of publication in the biomedical domain make it difficult for life science researchers to acquire and maintain a broad view of the field and to gather all the information relevant to their research. In response to this problem, the BioNLP (Biomedical Natural Language Processing) community of researchers has emerged and strives to assist life science researchers by developing modern natural language processing (NLP), information extraction (IE), and information retrieval (IR) methods that can be applied at large scale to scan the whole publicly available biomedical literature and to extract and aggregate the information found within it, while automatically normalizing the variability of natural language statements. Among the different tasks, biomedical event extraction has recently received much attention within the BioNLP community. Biomedical event extraction is the identification of biological processes and interactions described in the biomedical literature, and their representation as a set of recursive event structures. The 2009–2013 series of BioNLP Shared Tasks on Event Extraction has given rise to a number of event extraction systems, several of which have been applied at large scale (the full set of PubMed abstracts and PubMed Central Open Access full-text articles), leading to the creation of massive biomedical event databases, each containing millions of events. Since the top-ranking event extraction systems are based on machine learning and are trained on narrow-domain, carefully selected Shared Task training data, their performance drops when faced with the topically highly varied PubMed and PubMed Central documents. Specifically, false-positive predictions by these systems lead to the generation of incorrect biomolecular events, which are spotted by end users. This thesis proposes a novel post-processing approach, utilizing a combination of supervised and unsupervised learning techniques, that can automatically identify and filter out a considerable proportion of incorrect events from large-scale event databases, thus increasing the overall credibility of those databases. The second part of this thesis is dedicated to a system we developed for hypothesis generation from large-scale event databases, which is able to discover novel biomolecular interactions among genes/gene products. We cast the hypothesis generation problem as supervised network topology prediction, i.e., predicting new edges in the network, as well as the types and directions of these edges, utilizing a set of features that can be extracted from large biomedical event networks. Standard machine-learning evaluation results, as well as manual evaluation results, suggest that the problem is indeed learnable. This work won the Best Paper Award at The 5th International Symposium on Languages in Biology and Medicine (LBM 2013).
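Casting hypothesis generation as supervised topology prediction means training a classifier on feature vectors describing candidate gene/gene-product pairs and predicting whether an (unseen) event edge connects them. The sketch below is a generic, hedged illustration on synthetic placeholder features; it is not the thesis's feature set or model.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Hypothetical features for candidate gene/gene-product pairs; the real
    # features are derived from the event network and are not reproduced here.
    rng = np.random.default_rng(0)
    n_pairs = 1000
    X = np.column_stack([
        rng.poisson(2, n_pairs),     # e.g. number of shared event neighbours
        rng.random(n_pairs),         # e.g. literature co-occurrence score
        rng.random(n_pairs),         # e.g. similarity of event-type profiles
    ])
    y = rng.integers(0, 2, n_pairs)  # 1 = pair connected by a held-out edge

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))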
Abstract:
Most departmental computing infrastructure reflects the state of networking technology and the funds available at the time of construction, which converge in a preconceived notion of homogeneity of network architecture and usage patterns. The DMAN (Digital Media Access Network) project, a large-scale server and network foundation for the Hong Kong Polytechnic University's School of Design, was created as a platform that would support a highly complex academic environment while giving maximum freedom to students, faculty, and researchers through simplicity and ease of use. As a centralized multi-user computation backbone, DMAN faces an extremely heterogeneous user and application profile, exceeding the implementation and maintenance challenges of typical enterprise, and even most academic, server set-ups. This paper summarizes the specification, implementation, and application of the system while describing its significance for design education in a computational context.