340 results for DATASETS
Abstract:
BACKGROUND Endometriosis is a heritable common gynaecological condition influenced by multiple genetic and environmental factors. Genome-wide association studies (GWASs) have proved successful in identifying common genetic variants of moderate effect for various complex diseases. To date, eight GWAS and replication studies from multiple populations have been published on endometriosis. In this review, we investigate the consistency and heterogeneity of the results across all the studies and their implications for an improved understanding of the aetiology of the condition. METHODS Meta-analyses were conducted on four GWASs and four replication studies including a total of 11,506 cases and 32,678 controls, and on the subset of studies that investigated associations for revised American Fertility Society (rAFS) Stage III/IV disease, including 2,859 cases. The datasets included 9,039 cases and 27,343 controls of European (Australia, Belgium, Italy, UK, USA) and 2,467 cases and 5,335 controls of Japanese ancestry. Fixed-effects and Han and Eskin random-effects models, and heterogeneity statistics (Cochran's Q test), were used to investigate the evidence for the nine reported genome-wide significant loci across datasets and populations. RESULTS Meta-analysis showed that seven out of nine loci had consistent directions of effect across studies and populations, and six out of nine remained genome-wide significant (P < 5 × 10^-8), including rs12700667 on 7p15.2 (P = 1.6 × 10^-9), rs7521902 near WNT4 (P = 1.8 × 10^-15), rs10859871 near VEZT (P = 4.7 × 10^-15), rs1537377 near CDKN2B-AS1 (P = 1.5 × 10^-8), rs7739264 near ID4 (P = 6.2 × 10^-10) and rs13394619 in GREB1 (P = 4.5 × 10^-8). In addition to these six loci, two showed borderline genome-wide significant associations with Stage III/IV endometriosis: rs1250248 in FN1 (P = 8 × 10^-8) and rs4141819 on 2p14 (P = 9.2 × 10^-8). 
Two independent intergenic loci, rs4141819 and rs6734792 on chromosome 2, showed significant evidence of heterogeneity across datasets (P < 0.005). Eight of the nine loci had stronger effect sizes among Stage III/IV cases, implying that they are likely to be implicated in the development of moderate to severe, or ovarian, disease. While three out of nine loci were intergenic, the remainder were in or near genes with known functions of biological relevance to endometriosis, ranging from roles in developmental pathways to cellular growth and carcinogenesis. CONCLUSIONS Our meta-analysis shows remarkable consistency in endometriosis GWAS results across studies, with little evidence of population-based heterogeneity. It also shows that the phenotypic classifications used in GWAS to date have been limited. The stronger associations with Stage III/IV disease observed for most loci emphasize the importance of including detailed sub-phenotype information in future studies. Functional studies in relevant tissues are needed to understand the effect of the variants on downstream biological pathways.
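The pooled estimates and heterogeneity test above follow standard inverse-variance meta-analysis. A minimal sketch of the fixed-effect pooling and Cochran's Q statistic is given below; the per-study effect sizes and standard errors are illustrative placeholders, not values from the reviewed GWASs.

```python
# Inverse-variance fixed-effect meta-analysis with Cochran's Q.
# Effect sizes and standard errors are hypothetical, for illustration only.
import math

def fixed_effect_meta(effects, std_errs):
    """Combine per-study effects with inverse-variance weights;
    return the pooled effect, its standard error, and Cochran's Q."""
    weights = [1.0 / se**2 for se in std_errs]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (e - pooled)**2 for w, e in zip(weights, effects))
    return pooled, pooled_se, q

effects = [0.15, 0.12, 0.18, 0.10]   # hypothetical per-study log odds ratios
std_errs = [0.03, 0.04, 0.05, 0.06]  # hypothetical standard errors
pooled, se, q = fixed_effect_meta(effects, std_errs)
```

A large Q relative to a chi-squared distribution with (number of studies − 1) degrees of freedom would indicate heterogeneity of the kind reported for rs4141819 and rs6734792.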
Abstract:
Birds represent the most diverse extant tetrapod clade, with ca. 10,000 extant species, and the timing of the crown avian radiation remains hotly debated. The fossil record supports a primarily Cenozoic radiation of crown birds, whereas molecular divergence dating analyses generally imply that this radiation was well underway during the Cretaceous. Furthermore, substantial differences have been noted between published divergence estimates. These have been variously attributed to clock model, calibration regime, and gene type. One underappreciated phenomenon is that disparity between fossil ages and molecular dates tends to be proportionally greater for shallower nodes in the avian Tree of Life. Here, we explore potential drivers of disparity in avian divergence dates through a set of analyses applying various calibration strategies and coding methods to a mitochondrial genome dataset and an 18-gene nuclear dataset, both sampled across 72 taxa. Our analyses support the occurrence of two deep divergences (i.e., the Palaeognathae/Neognathae split and the Galloanserae/Neoaves split) well within the Cretaceous, followed by a rapid radiation of Neoaves near the K-Pg boundary. However, 95% highest posterior density intervals for most basal divergences in Neoaves cross the boundary, and we emphasize that, barring unreasonably strict prior distributions, distinguishing between a rapid Early Paleocene radiation and a Late Cretaceous radiation may be beyond the resolving power of currently favored divergence dating methods. In contrast to recent observations for placental mammals, constraining all divergences within Neoaves to occur in the Cenozoic does not result in unreasonably high inferred substitution rates. 
Comparisons of nuclear DNA (nDNA) versus mitochondrial DNA (mtDNA) datasets and NT- versus RY-coded mitochondrial data reveal patterns of disparity that are consistent with substitution model misspecifications that result in tree compression/tree extension artifacts, which may explain some discordance between previous divergence estimates based on different sequence types. Comparisons of fully calibrated and nominally calibrated trees support a correlation between body mass and apparent dating error. Overall, our results are consistent with (but do not require) a Paleogene radiation for most major clades of crown birds.
Abstract:
The following technical report describes the approach and algorithm used to detect marine mammals in aerial imagery taken from manned and unmanned platforms. The aim is to automate the counting of dugong and other marine mammal populations. We have developed an algorithm that automatically presents a number of possible candidate detections to a user. We tested the algorithm on two distinct datasets captured from different altitudes. Analysis and discussion are presented regarding the complexity of the input datasets and the detection performance.
Abstract:
Principal Topic A small firm is unlikely to possess internally the full range of knowledge and skills that it requires, or could benefit from, for the development of its business. The ability to acquire suitable external expertise - defined as knowledge or competence that is rare in the firm and acquired from the outside - when needed thus becomes a competitive factor in itself. Access to external expertise enables the firm to focus on its core competencies and removes the necessity to internalize every skill and competence. However, research on how small firms access external expertise is still scarce. The present study contributes to this under-developed discussion by analysing the role of trust and strong ties in the small firm's selection and evaluation of sources of external expertise (henceforth referred to as the 'business advisor' or 'advisor'). Granovetter (1973, p. 1361) defines the strength of a network tie as 'a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding) and the reciprocal services which characterize the tie'. Strong ties in the context of the present investigation refer to sources of external expertise who are well known to the owner-manager, and who may be either informal (e.g., family, friends) or professional advisors (e.g., consultants, enterprise support officers, accountants or solicitors). Previous research has suggested that strong and weak ties have different strengths, and the choice of business advisors could thus be critical to business performance. While previous research results suggest that small businesses favour previously well-known business advisors, prior studies have also pointed out that an excessive reliance on a network of well-known actors might hamper business development, as the range of expertise available through strong ties is limited. But are owner-managers of small businesses aware of this limitation, and does it matter to them? 
Or does working with a well-known advisor compensate for it? Hence, our research model first examines the impact of the strength of tie on the business advisor's perceived performance. Next, we ask what encourages a small business owner-manager to seek advice from a strong tie. A recent exploratory study by Welter and Kautonen (2005) drew attention to the central role of trust in this context. However, while their study found support for the general proposition that trust plays an important role in the choice of advisors, how trust and its different dimensions actually affect this choice remained ambiguous. The present paper develops this discussion by considering the impact of the different dimensions of perceived trustworthiness, defined as benevolence, integrity and ability, on the strength of tie. Further, we suggest that the dimensions of perceived trustworthiness relevant to the choice of a strong tie vary between professional and informal advisors. Methodology/Key Propositions Our propositions are examined empirically based on survey data comprising 153 Finnish small businesses. The data are analysed utilizing the partial least squares (PLS) approach to structural equation modelling with SmartPLS 2.0. Being non-parametric, the PLS algorithm is particularly well suited to analysing small datasets with non-normally distributed variables. Results and Implications The path model shows that the stronger the tie, the more positively the advisor's performance is perceived. Hypothesis 1, that strong ties are associated with higher perceptions of performance, is clearly supported. Benevolence is clearly the most significant predictor of the choice of a strong tie for external expertise. While ability also reaches a moderate level of statistical significance, integrity does not have a statistically significant impact on the choice of a strong tie. Hence, we found support for two out of the three independent variables included in Hypothesis 2. 
Path coefficients differed between the professional and informal advisor subsamples. The results of the exploratory group comparison show that Hypothesis 3a, which proposed that ability is more strongly associated with strong ties when choosing a professional advisor, was not supported. Hypothesis 3b, arguing that benevolence is more strongly associated with strong ties in the context of choosing an informal advisor, received some support, because the path coefficient in the informal advisor subsample was much larger than in the professional advisor subsample. Hypothesis 3c, postulating that integrity would be more strongly associated with strong ties in the choice of a professional advisor, was supported. Integrity is the most important dimension of trustworthiness in this context. However, integrity is of no concern, or even a negative factor, when using strong ties to choose an informal advisor. The findings of this study have practical relevance to the enterprise support community. First of all, given that the strength of tie has a significant positive impact on the advisor's perceived performance, small business owners evidently appreciate working with advisors in long-term relationships. Advisors are therefore well advised to invest in relationship building and maintenance in their work with small firms. Secondly, the results show that, especially in the context of professional advisors, the advisor's perceived integrity and benevolence weigh more than ability. This again emphasizes the need to invest time and effort in building a personal relationship with the owner-manager, rather than merely maintaining a professional image and credentials. Finally, this study demonstrates that the dimensions of perceived trustworthiness are orthogonal, with different effects on the strength of tie and ultimately on perceived performance. 
This means that entrepreneurs and advisors should consider the specific dimensions of ability, benevolence and integrity, rather than rely on general perceptions of trustworthiness in their advice relationships.
Abstract:
The challenges of maintaining a building such as the Sydney Opera House are immense and depend upon a vast array of information. The value of information can be enhanced by its currency, accessibility and the ability to correlate datasets (integration of information sources). A building information model correlated with various information sources related to the facility is used as the definition of a digital facility model. Such a digital facility model would give transparent, integrated access to an array of datasets and would support facility management processes. In order to construct such a digital facility model, two state-of-the-art information and communication technologies are considered: an internationally standardized building information model called the Industry Foundation Classes (IFC), and a variety of advanced communication and integration technologies often referred to as the Semantic Web, such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). This paper reports on some technical aspects of developing a digital facility model focusing on the Sydney Opera House. The proposed digital facility model enables IFC data to participate in an ontology-driven, service-oriented software environment. A proof-of-concept prototype has been developed demonstrating the usability of IFC information in collaboration with Sydney Opera House's specific data sources using Semantic Web ontologies.
Abstract:
The Google Online Marketing Challenge is an ongoing collaboration between Google and academics to give students experiential learning. The Challenge gives student teams US$200 in AdWords, Google's flagship advertising product, to develop online marketing campaigns for actual businesses. The end result is an engaging in-class exercise that provides students and professors with an exciting and pedagogically rigorous competition. Results from surveys at the end of the Challenge reveal positive appraisals from the three main constituents (students, businesses, and professors); general agreement between students and instructors regarding learning outcomes; and a few points of difference between students and instructors. In addition to describing the Challenge and its outcomes, this article reviews the post-participation questionnaires and subsequent datasets. The questionnaires and results are publicly available, and this article invites educators to mine the datasets, share their results, and offer suggestions for future iterations of the Challenge.
Abstract:
Automatic detection of suspicious activities in CCTV camera feeds is crucial to the success of video surveillance systems. Such a capability can help transform dumb CCTV cameras into smart surveillance tools for fighting crime and terror. Learning and classification of basic human actions is a precursor to detecting suspicious activities. Most current approaches rely on the unrealistic assumption that a complete dataset of normal human actions is available. This paper presents a different approach to the problem of understanding human actions in video when no prior information is available. This is achieved by working with an incomplete dataset of basic actions which is continuously updated. Initially, all video segments are represented by the Bag-of-Words (BOW) method using only Term Frequency-Inverse Document Frequency (TF-IDF) features. Then, a data-stream clustering algorithm is applied to update the system's knowledge from the incoming video feeds. Finally, all the actions are classified into different sets. Experiments and comparisons are conducted on the well-known Weizmann and KTH datasets to show the efficacy of the proposed approach.
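The representation step described above can be sketched with a minimal TF-IDF weighting over tokenised "documents". Here each document stands in for a video segment's visual-word tokens; the token names are illustrative placeholders, not the paper's actual vocabulary.

```python
# Minimal TF-IDF Bag-of-Words weighting, as used to represent video segments.
# Each "document" is a list of (hypothetical) visual-word tokens.
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (count / len(doc)) * math.log(n / df[t])
                        for t, count in tf.items()})
    return vectors

segments = [["walk", "walk", "wave"], ["run", "run", "walk"], ["wave", "bend"]]
vecs = tf_idf(segments)
```

Terms that occur in every segment receive weight zero, while segment-specific terms such as "bend" are weighted up, which is the property the clustering step exploits.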
Abstract:
Research has noted a ‘pronounced pattern of increase with increasing remoteness' in death rates from road crashes. However, crash characteristics by remoteness are not commonly or consistently reported, with definitions of rural and urban often relying on proxy representations such as the prevailing speed limit. The current paper seeks to evaluate the efficacy of the Accessibility/Remoteness Index of Australia (ARIA+) in identifying trends in road crashes. ARIA+ does not rely on road-specific measures; it uses distances to populated centres to attribute a score to an area, which can in turn be grouped into five classifications of increasing remoteness. The current paper applies these classifications at the broad level of the Australian Bureau of Statistics' Statistical Local Areas, thus avoiding precise crash locating or dedicated mapping software. Analyses used Queensland road crash database details for all 31,346 crashes resulting in a fatality or hospitalisation that occurred between 1 July 2001 and 30 June 2006 inclusive. Results showed that this simplified application of ARIA+ aligned with previous definitions such as speed limit, while also providing further delineation. Differences in crash contributing factors were noted with increasing remoteness, such as a greater representation of alcohol and ‘excessive speed for the circumstances'. Other factors, such as the predominance of younger drivers in crashes, differed little by remoteness classification. The results are discussed in terms of the utility of remoteness as a graduated rather than binary (rural/urban) construct and the potential for combining ARIA+ crash data with census and hospital datasets.
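The grouping of ARIA+ scores into five remoteness classes can be sketched as a simple threshold lookup. The cut-off values below follow commonly cited ARIA+ class boundaries, but they are an assumption here, not taken from the paper itself.

```python
# Map an ARIA+ score to one of the five remoteness classes.
# Cut-offs follow commonly cited ARIA+ boundaries (an assumption, not from
# the reviewed paper); scores range from 0 (most accessible) upward.
def remoteness_class(aria_score):
    """Return the remoteness class label for a given ARIA+ score."""
    bounds = [(0.2, "Major Cities"), (2.4, "Inner Regional"),
              (5.92, "Outer Regional"), (10.53, "Remote")]
    for cutoff, label in bounds:
        if aria_score <= cutoff:
            return label
    return "Very Remote"

label = remoteness_class(3.0)
```

Because the classification depends only on a per-area score, it can be attached to Statistical Local Areas without crash-level coordinates, which is the simplification the paper exploits.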
Abstract:
Background and Objective: As global warming continues, the frequency, intensity and duration of heatwaves are likely to increase. However, a heatwave is unlikely to be defined uniformly because acclimatisation plays a significant role in determining the heat-related impact. This study investigated how to best define a heatwave in Brisbane, Australia. Methods: Computerised datasets on daily weather, air pollution and health outcomes between 1996 and 2005 were obtained from pertinent government agencies. Paired t-tests and case-crossover analyses were performed to assess the relationship between heatwaves and health outcomes using different heatwave definitions. Results: The maximum temperature was as high as 41.5°C with a mean maximum daily temperature of 26.3°C. None of the five commonly-used heatwave definitions suited Brisbane well on the basis of the health effects of heatwaves. Additionally, there were pros and cons when locally-defined definitions were attempted using either a relative or absolute definition for extreme temperatures. Conclusion: The issue of how to best define a heatwave is complex. It is important to identify an appropriate definition of heatwave locally and to understand its health effects.
Abstract:
Objective: To summarise the extent to which narrative text fields in administrative health data are used to gather information about the event resulting in presentation to a health care provider for treatment of an injury, and to highlight best-practice approaches to conducting narrative text interrogation for injury surveillance purposes.----- Design: Systematic review.----- Data sources: Electronic databases searched included CINAHL, Google Scholar, Medline, Proquest, PubMed and PubMed Central. Snowballing strategies were employed by searching the bibliographies of retrieved references to identify relevant associated articles.----- Selection criteria: Papers were selected if the study used a health-related database and if the study objectives were to (a) use text fields to identify injury cases or to extract additional information on injury circumstances not available from coded data, (b) use text fields to assess the accuracy of coded data fields for injury-related cases, or (c) describe methods/approaches for extracting injury information from text fields.----- Methods: The papers identified through the search were independently screened by two authors for inclusion, resulting in 41 papers selected for review. Due to heterogeneity between studies, meta-analysis was not performed.----- Results: The majority of papers reviewed focused on describing injury epidemiology trends using coded data and text fields to supplement coded data (28 papers), with these studies demonstrating the value of text data for providing more specific information beyond what had been coded, to enable case selection or provide circumstantial information. Caveats were expressed in terms of the consistency and completeness of recording of text information, resulting in underestimates when using these data. Four coding validation papers were reviewed, with these studies showing the utility of text data for validating and checking the accuracy of coded data. 
Seven studies (9 papers) described methods for interrogating injury text fields for systematic extraction of information, with a combination of manual and semi-automated methods used to refine and develop algorithms for the extraction and classification of coded data from text. Quality assurance approaches to assessing the robustness of the methods for extracting text data were discussed in only 8 of the epidemiology papers and 1 of the coding validation papers. All of the text interrogation methodology papers described systematic approaches to ensuring the quality of the approach.----- Conclusions: Manual review and coding approaches, text search methods, and statistical tools have been utilised to extract data from narrative text and translate it into usable, detailed injury event information. These techniques can be, and have been, applied to administrative datasets to identify specific injury types and add value to previously coded injury datasets. Only a few studies thoroughly described the methods used for text mining, and fewer than half of the reviewed studies used or described quality assurance methods for ensuring the robustness of the approach. New techniques utilising semi-automated computerised approaches and Bayesian/clustering statistical methods offer the potential to further develop and standardise the analysis of narrative text for injury surveillance.
Abstract:
The problem of impostor dataset selection for GMM-based speaker verification is addressed through the recently proposed data-driven background dataset refinement technique. The SVM-based refinement technique selects from a candidate impostor dataset those examples that are most frequently selected as support vectors when training a set of SVMs on a development corpus. This study demonstrates the versatility of dataset refinement in the task of selecting suitable impostor datasets for use in GMM-based speaker verification. The use of refined Z- and T-norm datasets provided performance gains of 15% in EER in the NIST 2006 SRE over the use of heuristically selected datasets. The refined datasets were shown to generalise well to the unseen data of the NIST 2008 SRE.
Abstract:
A data-driven background dataset refinement technique was recently proposed for SVM-based speaker verification. This method selects a refined SVM background dataset from a set of candidate impostor examples after individually ranking examples by their relevance. This paper extends this technique to the refinement of the T-norm dataset for SVM-based speaker verification. The independent refinement of the background and T-norm datasets provides a means of investigating the sensitivity of SVM-based speaker verification performance to the selection of each of these datasets. Using refined datasets provided improvements of 13% in minimum DCF and 9% in EER over the full set of impostor examples on the 2006 SRE corpus, with the majority of these gains due to refinement of the T-norm dataset. Similar trends were observed for the unseen data of the NIST 2008 SRE.
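The selection step shared by these two refinement abstracts can be sketched as a frequency ranking: candidate impostor examples are ranked by how often they are chosen as support vectors across a set of SVMs trained on a development corpus, and the most frequently selected examples are kept. The support-vector index sets below are illustrative placeholders; in practice they would come from the trained SVMs.

```python
# Sketch of the ranking-and-selection step in data-driven background dataset
# refinement. Each set holds the indices of candidate impostor examples that
# one (hypothetical) trained SVM selected as support vectors.
from collections import Counter

def refine_background(support_sets, keep):
    """Rank candidate indices by support-vector selection frequency and
    return the `keep` most frequently selected candidates (ties by index)."""
    counts = Counter(i for sv_set in support_sets for i in sv_set)
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    return ranked[:keep]

# Support-vector index sets from five hypothetical development SVMs:
svs = [{0, 2, 5}, {2, 5, 7}, {1, 2, 5}, {2, 3}, {5, 7, 8}]
refined = refine_background(svs, keep=3)
```

The same ranking can be applied independently to the background and T-norm candidate pools, which is what makes the two refinements separable in the study.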
Abstract:
XML document clustering is essential for many document handling applications such as information storage, retrieval, integration and transformation. An XML clustering algorithm should process both the structural and the content information of XML documents in order to improve the accuracy and meaning of the clustering solution. However, the inclusion of both kinds of information in the clustering process results in a huge overhead for the underlying clustering algorithm because of the high dimensionality of the data. This paper introduces a novel approach that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The proposed method reduces the high dimensionality of input data by using only the structure-constrained content. The empirical analysis reveals that the proposed method can effectively cluster even very large XML datasets and outperform other existing methods.
Abstract:
Association rule mining is one technique that is widely used when querying databases, especially transactional ones, in order to obtain useful associations or correlations among sets of items. Much work has been done focusing on efficiency, effectiveness and redundancy. There has also been a focus on the quality of rules from single-level datasets, with many interestingness measures proposed. However, with multi-level datasets now being common, there is a lack of interestingness measures developed for multi-level and cross-level rules. Single-level measures do not take into account the hierarchy found in a multi-level dataset. This leaves the Support-Confidence approach, which does not consider the hierarchy anyway and has other drawbacks, as one of the few measures available. In this paper we propose two approaches which measure multi-level association rules to help evaluate their interestingness. These measures of diversity and peculiarity can be used to help identify those rules from multi-level datasets that are potentially useful.
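For reference, the Support-Confidence approach mentioned above can be computed directly from a transaction list. The transactions and items below are illustrative, not from the paper.

```python
# Minimal support and confidence for an association rule A => B.
# Transactions and items are hypothetical examples.
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

txns = [{"milk", "bread"}, {"milk", "bread", "butter"},
        {"bread"}, {"milk", "butter"}]
s = support(txns, {"milk", "bread"})       # 2 of 4 transactions
c = confidence(txns, {"milk"}, {"bread"})  # 2 of the 3 milk transactions
```

Note that neither measure refers to any concept hierarchy over the items, which is exactly the limitation for multi-level rules that the paper's diversity and peculiarity measures are designed to address.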
Abstract:
Objective We aimed to predict sub-national spatial variation in the numbers of people infected with Schistosoma haematobium, and associated uncertainties, in Burkina Faso, Mali and Niger, prior to implementation of national control programmes. Methods We used national field survey datasets covering a contiguous area of 2,750 × 850 km, from 26,790 school-aged children (5–14 years) in 418 schools. Bayesian geostatistical models were used to predict the prevalence of high- and low-intensity infections and associated 95% credible intervals (CrIs). Numbers infected were determined by multiplying predicted prevalence by the numbers of school-aged children in 1 km² pixels covering the study area. Findings The numbers of school-aged children with low-intensity infections were 433,268 in Burkina Faso, 872,328 in Mali and 580,286 in Niger. The numbers with high-intensity infections were 416,009 in Burkina Faso, 511,845 in Mali and 254,150 in Niger. The 95% CrIs (indicative of uncertainty) were wide; e.g. the mean number of boys aged 10–14 years infected in Mali was 140,200 (95% CrI 6,200–512,100). Conclusion National aggregate estimates of the numbers infected mask important local variation; e.g. most S. haematobium infections in Niger occur in the Niger River valley. The prevalence of high-intensity infections was strongly clustered in foci in western and central Mali, north-eastern and north-western Burkina Faso, and the Niger River valley in Niger. Populations in these foci are likely to carry the bulk of the urinary schistosomiasis burden and should receive priority for schistosomiasis control. Uncertainties in predicted prevalence and numbers infected should be acknowledged and taken into consideration by control programme planners.
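The estimation step described in the Methods (numbers infected = predicted prevalence × school-aged population, summed over 1 km² pixels) is a simple aggregation, sketched below. The per-pixel prevalences and populations are illustrative placeholders, not the published model output.

```python
# Sketch of the per-pixel aggregation: total infected is the sum over pixels
# of predicted prevalence times the school-aged population in that pixel.
# All values below are hypothetical, for illustration only.
def total_infected(prevalence, population):
    """Sum of per-pixel (prevalence × school-aged population)."""
    return sum(p * n for p, n in zip(prevalence, population))

pixel_prevalence = [0.12, 0.35, 0.08, 0.20]  # predicted prevalence per pixel
pixel_children = [500, 300, 800, 450]        # school-aged children per pixel
estimate = total_infected(pixel_prevalence, pixel_children)
```

In the study, repeating this aggregation over posterior draws of the prevalence surface is what yields the wide credible intervals on national totals.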