990 results for CENSORED DATA
Abstract:
High-Order Co-Clustering (HOCC) methods have attracted considerable attention in recent years because of their ability to cluster multiple types of objects simultaneously using all available information. During the clustering process, HOCC methods exploit object co-occurrence information, i.e., inter-type relationships amongst different types of objects, as well as object affinity information, i.e., intra-type relationships amongst objects of the same type. However, it is difficult to learn accurate intra-type relationships in the presence of noise and outliers. Existing HOCC methods consider the p nearest neighbours based on Euclidean distance for the intra-type relationships, which leads to incomplete and inaccurate intra-type relationships. In this paper, we propose a novel HOCC method that incorporates multiple subspace learning with a heterogeneous manifold ensemble to learn complete and accurate intra-type relationships. Multiple subspace learning reconstructs the similarity between any pair of objects that belong to the same subspace. The heterogeneous manifold ensemble is created from two types of intra-type relationships, learnt using the p-nearest-neighbour graph and multiple subspace learning. Moreover, to ensure the robustness of the clustering process, we introduce a sparse error matrix into the matrix decomposition and develop a novel iterative algorithm. Empirical experiments show that the proposed method achieves improved results over state-of-the-art HOCC methods in terms of FScore and NMI.
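The Euclidean p-nearest-neighbour graph that existing HOCC methods use for intra-type relationships can be sketched as follows (a minimal NumPy illustration of that baseline construction, not the authors' implementation; the function name and the choice of p are for illustration only):

```python
import numpy as np

def p_nearest_neighbour_graph(X, p=3):
    """Symmetric p-nearest-neighbour affinity graph built from Euclidean
    distances -- the intra-type relationship the paper identifies as
    fragile under noise and outliers."""
    n = X.shape[0]
    # pairwise squared Euclidean distances via the expansion ||a-b||^2
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(d2, np.inf)        # exclude self-loops
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:p]    # indices of the p closest objects
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)           # symmetrise the adjacency
```

Because the neighbourhood is decided purely by Euclidean distance, a single outlier can displace true neighbours, which is the incompleteness the subspace-based reconstruction is meant to repair.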
Abstract:
This thesis proposes three novel models that extend the statistical methodology for motor unit number estimation, a clinical neurology technique. Motor unit number estimation is important in the treatment of degenerative muscular diseases and, potentially, spinal injury. Additionally, a recent and untested statistic for statistical model choice is found to be a practical alternative for larger datasets. The existing methods for dose finding in dual-agent clinical trials are found to be suitable only for designs of modest dimensions. The model-choice case study is the first of its kind and contains interesting results obtained using so-called unit-information prior distributions.
Abstract:
When crystallization screening is conducted, many outcomes are observed, but typically the only trial recorded in the literature is the condition that yielded the crystal(s) used for subsequent diffraction studies. The initial hit that was optimized and the results of all the other trials are lost. These missing results contain information that would be useful for an improved general understanding of crystallization. This paper reports on a crystallization data exchange (XDX) workshop organized by several international large-scale crystallization screening laboratories to discuss how this information may be captured and utilized. A group that administers a significant fraction of the world's crystallization screening results was convened, together with chemical and structural data informaticians and computational scientists who specialize in creating and analysing large disparate data sets. The development of a crystallization ontology for the crystallization community was proposed. This paper (by the attendees of the workshop) provides the thoughts and rationale leading to this conclusion. It is brought to the attention of the wider audience of crystallographers so that they are aware of these early efforts and can contribute to the process going forward. © 2012 International Union of Crystallography. All rights reserved.
Abstract:
Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole: term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge, Compositional Data Analysis (CoDA) has not been considered in IR. In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naïve to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering. Copyright 2014 ACM.
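The abstract does not name its composition-aware measures, but the canonical one in the CoDA literature is the Aitchison distance: Euclidean distance taken after a centred log-ratio (CLR) transform. A minimal sketch, assuming strictly positive components:

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of a composition (all parts > 0):
    log of each part divided by the geometric mean of the parts."""
    x = np.asarray(x, dtype=float)
    g = np.exp(np.mean(np.log(x)))      # geometric mean
    return np.log(x / g)

def aitchison_distance(x, y):
    """Composition-aware distance: Euclidean distance in CLR space."""
    return np.linalg.norm(clr(x) - clr(y))
```

The key property the abstract alludes to is that only relative information matters: rescaling a count vector (e.g., doubling a document's length with the same term proportions) leaves the Aitchison distance unchanged, whereas a naïve Euclidean distance on raw counts would not be.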
Abstract:
Due to the availability of a huge number of Web services, finding an appropriate Web service that meets the requirements of a service consumer is still a challenge. Moreover, sometimes a single Web service is unable to fully satisfy the requirements of the service consumer. In such cases, combinations of multiple inter-related Web services can be utilised. This paper proposes a method that first utilises a semantic kernel model to find related services and then models these related Web services as nodes of a graph. An all-pairs shortest-path algorithm is applied to find the best compositions of Web services that are semantically related to the service consumer's requirement. Finally, individual Web services and composite Web service compositions are recommended for a service request. Empirical evaluation confirms that the proposed method significantly improves the accuracy of service discovery in comparison with traditional keyword-based discovery methods.
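The composition step can be sketched with Floyd-Warshall, a standard all-pairs shortest-path algorithm (the abstract does not say which algorithm the authors use, so this is an illustrative choice): nodes stand for Web services, edge weights for semantic distance between services, and a low-cost path for a candidate composition chain.

```python
import math

def floyd_warshall(weights):
    """All-pairs shortest paths on a dense weight matrix.
    weights[i][j] is the edge cost from service i to service j,
    math.inf where no direct link exists; returns the matrix of
    cheapest composition costs between every pair of services."""
    n = len(weights)
    dist = [row[:] for row in weights]          # copy, keep input intact
    for k in range(n):                          # allow k as intermediate
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

With path reconstruction added (tracking the intermediate node at each improvement), the same loop yields the actual service chain to recommend, not just its cost.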
Abstract:
With a focus on optimising the life cycle performance of Australian railway bridges, new bridge classification and environmental classification systems are proposed. The new bridge classification system mainly facilitates the implementation of a novel Bridge Management System (BMS) that optimises life cycle cost at both the project level and the network level, while the environmental classification system mainly improves the accuracy of the Remaining Service Potential (RSP) module of the proposed BMS. In fact, the limited capacity of existing BMSs to trigger the maintenance intervention point is an indirect result of inadequacies in the existing bridge and environmental classification systems. The proposed bridge classification system permits intervention points to be identified based on the percentage deterioration of individual elements and maintenance cost, while allowing a performance-based rating technique to be implemented for maintenance optimisation and prioritisation. Simultaneously, the proposed environmental classification system will enhance the accuracy of predictions of the deterioration of steel components.
Abstract:
Hydrogeophysics is a growing discipline that holds significant promise for elucidating the details of dynamic processes in the near surface, built on the ability of geophysical methods to measure properties from which hydrological and geochemical variables can be derived. For example, bulk electrical conductivity is governed by, amongst other factors, interstitial water content, fluid salinity, and temperature, and can be measured using a range of geophysical methods. In many cases, electrical resistivity tomography (ERT) is well suited to characterizing these properties in multiple dimensions and to monitoring dynamic processes, such as water infiltration and solute transport. In recent years, ERT has been used increasingly for ecosystem research in a wide range of settings, in particular to characterize vegetation-driven changes in root-zone and near-surface water dynamics. This increased popularity is due to operational factors (e.g., improved equipment, low site impact), data considerations (e.g., excellent repeatability), and the fact that ERT operates at scales significantly larger than traditional point sensors. Current limitations to more widespread use of the approach include high equipment costs and the need for site-specific petrophysical relationships between the properties of interest. In this presentation we will discuss recent equipment advances and the theoretical and methodological aspects involved in the accurate estimation of soil moisture from ERT results. Examples will be presented from two studies in a temperate climate (Michigan, USA) and one from a humid tropical location (Tapajos, Brazil).
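The abstract does not specify which petrophysical relationship is used; a common choice for converting ERT bulk resistivity to volumetric water content is Archie's law, sketched below. The parameter values (porosity phi, cementation exponent m, saturation exponent n) are purely illustrative and, as the abstract stresses, must be calibrated per site.

```python
def water_content_archie(rho_bulk, rho_w, phi, m=1.5, n=2.0):
    """Volumetric water content from ERT bulk resistivity via Archie's
    law, rho_bulk = rho_w * phi**(-m) * Sw**(-n), solved for the water
    saturation Sw.  rho_bulk and rho_w in ohm-m; phi, m, n site-specific."""
    Sw = (rho_w * phi ** (-m) / rho_bulk) ** (1.0 / n)   # water saturation
    return phi * Sw                                      # theta = phi * Sw
```

Archie's law assumes the electrical current is carried only by the pore fluid, so in clay-rich soils a relationship accounting for surface conduction (e.g., Waxman-Smits) is usually preferred; that caveat is one reason the relationship must be site-specific.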
Abstract:
This paper addresses research from a three-year longitudinal study that engaged children in data modeling experiences from the beginning school year through to third year (6-8 years). A data modeling approach to statistical development differs in several ways from what is typically done in early classroom experiences with data. In particular, data modeling immerses children in problems that evolve from their own questions and reasoning, with core statistical foundations established early. These foundations include a focus on posing and refining statistical questions within and across contexts, structuring and representing data, making informal inferences, and developing conceptual, representational, and metarepresentational competence. Examples are presented of how young learners developed and sustained informal inferential reasoning and metarepresentational competence across the study to become “sophisticated statisticians”.
Abstract:
The study of data modelling with elementary students involves the analysis of a developmental process beginning with children’s investigations of meaningful contexts: visualising, structuring, and representing data and displaying data in simple graphs (English, 2012; Lehrer & Schauble, 2005; Makar, Bakker, & Ben-Zvi, 2011). A 3-year longitudinal study investigated young children’s data modelling, integrating mathematical and scientific investigations. One aspect of this study involved a researcher-led teaching experiment with 21 mathematically able Grade 1 students. The study aimed to describe explicit developmental features of students’ representations of continuous data...
Abstract:
Protein adsorption at solid-liquid interfaces is critical to many applications, including biomaterials, protein microarrays and lab-on-a-chip devices. Despite this general interest, and a large amount of research over the last half-century, protein adsorption cannot be predicted with engineering-level, design-oriented accuracy. Here we describe a Biomolecular Adsorption Database (BAD), freely available online, which archives published protein adsorption data. Piecewise linear regression with a breakpoint, applied to the data in the BAD, suggests that the input variables to protein adsorption, i.e., protein concentration in solution; protein descriptors derived from primary structure (number of residues, global protein hydrophobicity, range of amino acid hydrophobicity, and isoelectric point); surface descriptors (contact angle); and fluid environment descriptors (pH, ionic strength), correlate well with the output variable, the protein concentration on the surface. Furthermore, neural network analysis revealed that the size of the BAD makes it sufficiently representative, with a neural network-based predictive error of 5% or less. Interestingly, a consistently better fit is obtained if the BAD is divided into two separate sub-sets representing protein adsorption on hydrophilic and hydrophobic surfaces, respectively. Based on these findings, selected entries from the BAD have been used to construct neural network-based estimation routines that predict the amount of adsorbed protein, the thickness of the adsorbed layer and the surface tension of the protein-covered surface. While the BAD is of general interest, the prediction of the thickness and surface tension of the protein-covered layers is of particular relevance to the design of microfluidic devices.
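Piecewise linear regression with one breakpoint, the analysis style the abstract applies to the BAD, can be sketched by scanning candidate breakpoints and solving a least-squares fit on a hinge basis at each one (a minimal illustration, not the authors' exact routine):

```python
import numpy as np

def fit_breakpoint(x, y):
    """Fit y ~ a + b*x + c*max(x - bp, 0) for each interior candidate
    breakpoint bp, and return the bp with the smallest residual sum of
    squares.  The hinge term lets the slope change at the breakpoint."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_bp, best_rss = None, np.inf
    for bp in np.unique(x)[1:-1]:               # interior candidates only
        A = np.column_stack([np.ones_like(x), x, np.maximum(x - bp, 0.0)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = np.sum((A @ coef - y) ** 2)
        if rss < best_rss:
            best_bp, best_rss = bp, rss
    return best_bp
```

The abstract's finding that hydrophilic and hydrophobic surfaces fit better as separate sub-sets is the same idea one level up: a single linear model across the whole database hides a regime change that a split (or a breakpoint) exposes.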
Abstract:
Critical to the research of urban morphologists is the availability of historical records that document the urban transformation of the study area. However, thus far little work has been done towards an empirical approach to the validation of archival data in this field. Outlined in this paper, therefore, is a new methodology for validating the accuracy of archival records and mapping data, accrued through the process of urban morphological research, so as to establish a reliable platform from which analysis can proceed. The paper particularly addresses the problems of inaccuracies in existing curated historical information, as well as errors in archival research by student assistants, which together give rise to unacceptable levels of uncertainty in the documentation. The paper discusses the problems relating to the reliability of historical information, demonstrates the importance of data verification in urban morphological research, and proposes a rigorous method for objective testing of collected archival data through the use of qualitative data analysis software.