1000 results for "phenotipic data"
Abstract:
The idea of extracting knowledge in process mining is a descendant of data mining. Both mining disciplines emphasise the data flow and the relations among elements in the data. Unfortunately, challenges arise when working with these flows and relations. One challenge is that the representation of the data flow between a pair of elements or tasks is oversimplified, as it considers only one-to-one relations. In this paper, we discuss how the effectiveness of knowledge representation can be extended in both disciplines. To this end, we introduce a new representation of data flow and dependency formulation using a flow graph. The flow graph resolves the inability of existing representations to express other relation types, such as many-to-one and one-to-many relations. As an experiment, a new evaluation framework is applied to the Teleclaim process to show how this method provides more precise results than other representations.
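To make the many-to-one and one-to-many idea concrete, here is a minimal sketch (an assumed illustration, not the paper's formulation) of a flow graph whose edges connect sets of tasks rather than single pairs; the task names are invented:

```python
# Hypothetical sketch: a flow graph recording many-to-one and one-to-many
# data-flow relations between tasks, not just one-to-one pairs.
from collections import defaultdict

class FlowGraph:
    def __init__(self):
        # each entry maps a frozenset of source tasks to a set of target tasks,
        # so a single edge can capture many-to-one or one-to-many flows
        self.edges = defaultdict(set)

    def add_flow(self, sources, targets):
        self.edges[frozenset(sources)].update(targets)

    def targets_of(self, sources):
        return self.edges.get(frozenset(sources), set())

g = FlowGraph()
g.add_flow(["register", "check"], ["decide"])   # many-to-one
g.add_flow(["decide"], ["pay", "reject"])       # one-to-many
```

Using a frozenset as the edge key makes the source side order-independent, which is what distinguishes this from a plain pairwise adjacency list.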
Abstract:
Knowledge of hydrological variables (e.g. soil moisture, evapotranspiration) is of pronounced importance in various applications, including flood control, agricultural production and effective water resources management. These applications require accurate prediction of hydrological variables in space and time across a watershed or basin. Though hydrological models can simulate these variables at the desired spatial and temporal resolution, they are often validated against variables that are either sparse in resolution (e.g. soil moisture) or averaged over large regions (e.g. runoff). A combination of a distributed hydrological model (DHM) and remote sensing (RS) has the potential to improve resolution. Data assimilation schemes can optimally combine DHM and RS. Retrieving hydrological variables (e.g. soil moisture) from remote sensing and assimilating them into a hydrological model requires validation of the algorithms using field studies. Here we present a review of methodologies developed to assimilate RS into DHM and demonstrate the application for soil moisture in a small experimental watershed in south India.
Abstract:
Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record the activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem, which involves finding significant temporal patterns in vast amounts of symbolic time series data. In this paper we show that frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of frequent episodes discovered from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints, whose structure is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
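As a rough illustration of counting a serial episode under a temporal constraint (a simple greedy tracker, assumed here for exposition; the paper's algorithms use per-episode automata and handle parallel episodes too):

```python
# Count non-overlapped occurrences of a serial episode, e.g. A -> B, in an
# event stream, requiring the whole occurrence to fit in an expiry window.
def count_serial_episode(events, episode, window):
    """events: list of (event_type, time); episode: tuple of event types."""
    count, idx, start = 0, 0, None
    for etype, t in events:
        if etype == episode[idx]:
            if idx == 0:
                start, idx = t, 1          # episode begins at this event
            elif t - start <= window:
                idx += 1
                if idx == len(episode):    # full occurrence within the window
                    count += 1
                    idx = 0
            else:
                # occurrence expired; restart only if this event begins anew
                idx = 1 if etype == episode[0] else 0
                start = t if idx == 1 else None
    return count

# neurons firing: (neuron label, spike time) -- synthetic example data
events = [("A", 1), ("B", 2), ("A", 5), ("C", 6), ("B", 7), ("A", 20), ("B", 30)]
n = count_serial_episode(events, ("A", "B"), window=3)
```

In this toy stream the A-then-B episode occurs twice within the 3-unit window; the pair at times 20 and 30 is rejected by the temporal constraint.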
Abstract:
Background: Several prospective studies have suggested that gait and plantar pressure abnormalities secondary to diabetic peripheral neuropathy contribute to foot ulceration. There are many different methods by which gait and plantar pressures are assessed, and currently there is no agreed standardised approach. This study aimed to describe the methods and reproducibility of three-dimensional gait and plantar pressure assessments in a small subset of participants using pre-existing protocols. Methods: Fourteen participants were recruited by convenience sampling prior to a planned longitudinal study: four patients with diabetes and plantar foot ulcers, five patients with diabetes but no foot ulcers, and five healthy controls. The repeatability of measuring key biomechanical data was assessed, including the identification of 16 key anatomical landmarks, the measurement of seven leg dimensions, the processing of 22 three-dimensional gait parameters and the analysis of four different plantar pressure measures at 20 foot regions. Results: The mean inter-observer differences were within the pre-defined acceptable level (<7 mm) for 100% (16 of 16) of key anatomical landmarks measured for gait analysis. The intra-observer concordance correlation coefficients were >0.9 for 100% (7 of 7) of leg dimensions. The coefficients of variation (CVs) were within the pre-defined acceptable level (<10%) for 100% (22 of 22) of gait parameters. The CVs were within the pre-defined acceptable level (<30%) for 95% (19 of 20) of the contact area measures, 85% (17 of 20) of mean plantar pressures, 70% (14 of 20) of pressure time integrals and 55% (11 of 20) of maximum sensor plantar pressure measures. Conclusion: Overall, the findings of this study suggest that important gait and plantar pressure measurements can be reliably acquired. Nearly all measures contributing to three-dimensional gait parameter assessments were within predefined acceptable limits.
Most plantar pressure measurements were also within predefined acceptable limits; however, reproducibility was not as good for assessment of the maximum sensor pressure. To our knowledge, this is the first study to investigate the reproducibility of several biomechanical methods in a heterogeneous cohort.
Abstract:
Dispersing a data object into a set of data shares is a fundamental stage in distributed communication and storage systems. In comparison to data replication, data dispersal with redundancy saves space and bandwidth. Moreover, dispersing a data object across distinct communication links or storage sites limits adversarial access to the whole data and tolerates the loss of some data shares. Existing data dispersal schemes have mostly been based on various mathematical transformations of the data, which incur high computation overhead. This paper presents a novel data dispersal scheme in which each part of a data object is replicated, without encoding, into a subset of data shares according to combinatorial design theory. In particular, data parts are mapped to points and data shares to lines of a projective plane. Data parts are then distributed to data shares using the point-line incidence relations of the plane, so that certain subsets of data shares collectively possess all data parts. The presented scheme combines combinatorial design theory with an inseparability transformation to achieve secure data dispersal at reduced computation, communication and storage costs. Rigorous formal analysis and an experimental study demonstrate significant cost benefits of the presented scheme in comparison to existing methods.
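The point-line mapping can be sketched with the smallest projective plane, the Fano plane of order 2 (7 points, 7 lines, 3 points per line). This is an assumed toy instance, without the paper's inseparability transformation:

```python
# Replication-based dispersal on the Fano plane: 7 data parts -> points,
# 7 shares -> lines; each share stores the 3 parts incident to its line.
FANO_LINES = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5},
]

def disperse(parts):
    """parts: list of 7 byte-strings; returns 7 shares (no encoding)."""
    assert len(parts) == 7
    return [{p: parts[p] for p in line} for line in FANO_LINES]

def reconstruct(shares_subset):
    recovered = {}
    for share in shares_subset:
        recovered.update(share)
    return [recovered[i] for i in sorted(recovered)]

parts = [bytes([i]) for i in range(7)]
shares = disperse(parts)
# the three lines through point 0 jointly cover every point of the plane,
# so these three shares collectively possess all data parts
full = reconstruct([shares[0], shares[1], shares[2]])
```

Each part lands in exactly 3 of the 7 shares, so the scheme tolerates share loss while storing far less than full replication.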
Abstract:
With the development of wearable and mobile computing technology, more and more people have started using sleep-tracking tools to collect personal sleep data on a daily basis, aiming to understand and improve their sleep. While sleep quality is influenced by many factors in a person's lifestyle context, such as exercise, diet and steps walked, existing tools simply visualize sleep data per se on a dashboard rather than analysing those data in combination with contextual factors. Hence many people find it difficult to make sense of their sleep data. In this paper, we present a cloud-based intelligent computing system named SleepExplorer that incorporates sleep domain knowledge and association rule mining for automated analysis of personal sleep data in light of contextual factors. Experiments show that the same contextual factors can play a distinct role in the sleep of different people, and SleepExplorer can help users discover the factors that are most relevant to their personal sleep.
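The core of association rule mining over such data is computing support and confidence for candidate rules. A minimal sketch (field names and thresholds are invented, not SleepExplorer's):

```python
# Support and confidence of one rule, e.g. "many steps -> good sleep",
# computed over daily records of contextual factors and sleep quality.
def rule_stats(records, antecedent, consequent):
    n = len(records)
    a = sum(1 for r in records if antecedent(r))
    both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = both / n
    confidence = both / a if a else 0.0
    return support, confidence

days = [                                  # synthetic example data
    {"steps": 11000, "sleep_quality": 8},
    {"steps": 3000,  "sleep_quality": 5},
    {"steps": 9500,  "sleep_quality": 7},
    {"steps": 2000,  "sleep_quality": 6},
]
sup, conf = rule_stats(days,
                       lambda r: r["steps"] >= 8000,
                       lambda r: r["sleep_quality"] >= 7)
```

A full miner would enumerate many candidate antecedents per user and keep those above support/confidence thresholds, which is how per-person differences in relevant factors emerge.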
Abstract:
Volumetric adsorption measurements of nitrogen on two specimens of activated carbon (Fluka and Sarabhai), reported by us earlier, are refitted to two popular isotherms, namely the Dubinin–Astakhov (D–A) and Toth equations, in light of recently derived improved fitting methods. These isotherms have been used to derive other data of relevance to the design of engineering equipment, such as the concentration dependence of the heat of adsorption and the Henry's law coefficients. The present fits represent the experimental measurements better than before because the temperature dependence of the adsorbed-phase volume and the structural heterogeneity of the micropore distribution have been accounted for in the D–A equation. A new correlation for the Toth equation is a further contribution. The heat of adsorption in the limiting uptake condition is correlated with the Henry's law coefficients at the near-zero uptake condition.
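For reference, the base Dubinin–Astakhov isotherm from which such fits start has the standard form (the paper's temperature-dependent volume and heterogeneity corrections are not reproduced here):

```latex
W = W_0 \exp\!\left[-\left(\frac{A}{E}\right)^{n}\right],
\qquad A = R\,T\,\ln\!\left(\frac{P_s}{P}\right)
```

where $W$ is the adsorbed volume, $W_0$ the limiting micropore volume, $E$ the characteristic energy, $n$ the heterogeneity exponent, and $A$ the adsorption potential at pressure $P$ and saturation pressure $P_s$.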
Abstract:
The problem of identifying the stiffness, mass and damping properties of linear structural systems, based on multiple sets of measurement data originating from static and dynamic tests, is considered. A strategy within the framework of Kalman-filter-based dynamic state estimation is proposed to tackle this problem. The static tests consist of measuring the response of the structure to slowly moving loads and to static loads whose magnitudes are varied incrementally; the dynamic tests involve measuring a few elements of the frequency response function (FRF) matrix. These measurements are taken to be contaminated by additive Gaussian noise. An artificial independent variable τ is introduced that simultaneously parameterizes the point of application of the moving load, the magnitude of the incrementally varied static load and the driving frequency in the FRFs. The state vector is taken to consist of the system parameters to be identified. The fact that these parameters are independent of the variable τ constitutes the set of 'process' equations. The measurement equations are derived from the mechanics of the problem, with quantities such as displacements and/or strains taken to be measured. A recursive algorithm is developed that employs a linearization strategy based on Neumann's expansion of the structural static and dynamic stiffness matrices and provides posterior estimates of the mean and covariance of the unknown system parameters. The satisfactory performance of the proposed approach is illustrated on the identification of the dynamic properties of an inhomogeneous beam and of the axial rigidities of the members of a truss structure.
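The parameters-as-state idea can be shown on a deliberately tiny assumed example: one unknown stiffness, identified from noisy static displacements by an extended Kalman filter (the paper's algorithm handles full matrices and Neumann-expansion linearization; none of the numbers below come from it):

```python
# Scalar EKF sketch: the unknown stiffness k is the filter state; static
# displacement measurements u = F/k arrive as the load F is varied
# incrementally (playing the role of the artificial variable tau).
import random
random.seed(0)

k_true = 2000.0            # synthetic stiffness to be identified
k_est, P = 1000.0, 1.0e6   # prior mean and variance of the parameter
Q, R = 25.0, 1.0e-8        # small fictitious process noise; measurement noise

for step in range(200):
    F = 10.0 + step                        # incrementally varied static load
    u_meas = F / k_true + random.gauss(0.0, R ** 0.5)
    P = P + Q                              # 'process': parameter constant in tau
    h = F / k_est                          # predicted measurement
    H = -F / k_est ** 2                    # linearized sensitivity du/dk
    S = H * P * H + R                      # innovation variance
    K = P * H / S                          # Kalman gain
    k_est = k_est + K * (u_meas - h)       # posterior mean update
    P = (1.0 - K * H) * P                  # posterior variance update
```

The small fictitious process noise Q keeps the covariance from collapsing, so later, more informative load steps can still correct the estimate.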
Abstract:
In this paper we propose and implement a joint Medium Access Control (MAC)-cum-routing scheme for environmental data-gathering sensor networks. The design principle trades node battery-lifetime maximization against a network that can tolerate: (a) a known percentage of combined packet losses due to packet collisions, network synchronization mismatch and channel impairments; and (b) significant end-to-end delays of the order of a few seconds. We achieve this with a loosely synchronized network of sensor nodes that implement a slotted-Aloha MAC state machine together with route information. The scheme gives encouraging results in terms of energy savings compared with other popular implementations. The overall packet loss is about 12%, and the battery lifetime gain over B-MAC varies from a minimum of 30% to about 90% depending on the duty cycle.
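The collision-driven component of such packet loss is easy to estimate by simulation of the underlying slotted-Aloha channel (an assumed back-of-envelope model; the node count and transmit probability below are illustrative, not the paper's):

```python
# Simulate a slotted-Aloha channel: a slot succeeds only when exactly one
# node transmits; any overlap is a collision and every packet in it is lost.
import random
random.seed(1)

def slotted_aloha_loss(n_nodes, p_tx, n_slots):
    sent = delivered = 0
    for _ in range(n_slots):
        transmitters = sum(1 for _ in range(n_nodes) if random.random() < p_tx)
        sent += transmitters
        if transmitters == 1:
            delivered += 1
    return 1.0 - delivered / sent if sent else 0.0

loss = slotted_aloha_loss(n_nodes=20, p_tx=0.01, n_slots=50_000)
```

At these low duty cycles the simulated collision loss sits in the low tens of percent, the same regime as the overall loss the abstract reports (which also includes synchronization and channel effects).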
Abstract:
Data mining involves the nontrivial process of extracting knowledge or patterns from large databases. Genetic algorithms are efficient and robust search and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), in which the population size, the number of crossover points and the mutation rate for each population are fixed adaptively. Further, the migration of individuals between populations is decided dynamically. The paper gives a mathematical schema analysis of the method, showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high-yielding regions while simultaneously performing a highly explorative search of the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions and a set of real classification data mining problems. A Michigan-style classifier was used to build the classifier system, which was tested on machine learning databases including the Pima Indian Diabetes and Wisconsin Breast Cancer databases, among others. On these benchmarks, the proposed algorithm outperforms the compared methods.
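To illustrate parameter self-adaptation in the simplest possible setting (a single-population toy, far simpler than SAMGA's multi-population migration model; the adaptation rule and test function are assumed):

```python
# A GA whose mutation rate self-adapts each generation: low while the
# population is diverse, high once it has converged, to keep exploring.
import random
random.seed(2)

def fitness(x):                       # minimize the sphere function
    return sum(v * v for v in x)

pop = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(30)]
for gen in range(200):
    pop.sort(key=fitness)             # best individuals first
    spread = fitness(pop[-1]) - fitness(pop[0])
    mut_rate = 0.05 if spread > 1.0 else 0.5   # illustrative adaptation rule
    parents = pop[:10]                # truncation selection, elitist
    pop = parents[:]
    while len(pop) < 30:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, 3)
        child = a[:cut] + b[cut:]     # one-point crossover
        if random.random() < mut_rate:
            child[random.randrange(3)] += random.gauss(0.0, 0.3)
        pop.append(child)
best = min(pop, key=fitness)
```

Because elites are carried over unmutated, raising the mutation rate after convergence adds exploration without ever losing the best solution found so far.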
Abstract:
Virtual Machine (VM) management is an obvious need in today's data centers for various management activities and is accomplished in two phases: finding an optimal VM placement plan, and implementing that placement through live VM migrations. These phases give rise to two research problems: the VM placement problem (VMPP) and the VM migration scheduling problem (VMMSP). This research proposes and develops several evolutionary algorithms and heuristic algorithms to address the VMPP and VMMSP. Experimental results show the effectiveness and scalability of the proposed algorithms. Finally, a VM management framework has been proposed and developed to automate the VM management activity in a cost-efficient way.
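A classic heuristic baseline for the VMPP is first-fit decreasing bin packing; the sketch below is an assumed single-resource illustration (the abstract does not say which heuristics the research uses), with invented VM names and capacities:

```python
# First-fit-decreasing VM placement: sort VMs by resource demand, then put
# each on the first host with enough remaining capacity, opening a new host
# only when none fits. Minimizing hosts used is the usual cost objective.
def place_vms(vm_demands, host_capacity):
    hosts = []                            # remaining capacity per open host
    placement = {}
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(hosts):
            if free >= demand:
                hosts[i] -= demand
                placement[vm] = i
                break
        else:                             # no open host fits: open a new one
            hosts.append(host_capacity - demand)
            placement[vm] = len(hosts) - 1
    return placement, len(hosts)

demands = {"vm1": 6, "vm2": 5, "vm3": 4, "vm4": 3, "vm5": 2}
placement, n_hosts = place_vms(demands, host_capacity=10)
```

Here total demand is 20 against capacity-10 hosts, and FFD packs the five VMs into the optimal two hosts.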
Abstract:
Solving large-scale all-to-all comparison problems using distributed computing is increasingly significant for various applications. Previous efforts to implement distributed all-to-all comparison frameworks have treated the two phases of data distribution and comparison task scheduling separately. This leads to high storage demands as well as poor data locality for the comparison tasks, thus creating a need to redistribute the data at runtime. Furthermore, most previous methods have been developed for homogeneous computing environments, so their overall performance is degraded even further when they are used in heterogeneous distributed systems. To tackle these challenges, this paper presents a data-aware task scheduling approach for solving all-to-all comparison problems in heterogeneous distributed systems. The approach formulates the requirements for data distribution and comparison task scheduling simultaneously as a constrained optimization problem. Then, metaheuristic data pre-scheduling and dynamic task scheduling strategies are developed along with an algorithmic implementation to solve the problem. The approach provides perfect data locality for all comparison tasks, avoiding rearrangement of data at runtime. It achieves load balancing among heterogeneous computing nodes, thus reducing the overall computation time. It also reduces data storage requirements across the network. The effectiveness of the approach is demonstrated through experimental studies.
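The data-locality constraint can be made concrete with a small assumed sketch: each comparison (i, j) may only run on a node that already stores both items, and ties are broken toward the least-loaded node relative to its speed (a greedy stand-in for the paper's metaheuristic strategies; all names and numbers are invented):

```python
# Greedy data-aware scheduling of all pairwise comparisons onto nodes with
# perfect data locality, balancing load across heterogeneous node speeds.
from itertools import combinations

def schedule(items, node_of, speed):
    """node_of: item -> set of nodes storing it; speed: node -> relative speed."""
    load = {n: 0.0 for n in speed}
    plan = {}
    for i, j in combinations(items, 2):
        candidates = node_of[i] & node_of[j]       # perfect locality only
        if not candidates:
            raise ValueError(f"no node holds both {i} and {j}")
        n = min(candidates, key=lambda c: load[c] / speed[c])
        plan[(i, j)] = n
        load[n] += 1.0                             # unit-cost comparison task
    return plan, load

node_of = {"a": {0, 1}, "b": {0, 1}, "c": {0, 2}, "d": {1, 2}}
plan, load = schedule("abcd", node_of, speed={0: 2.0, 1: 1.0, 2: 1.0})
```

If some pair had no common node, the greedy step would fail, which is exactly why the paper co-designs the data distribution with the schedule instead of fixing it first.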
Abstract:
The determination of the overconsolidation ratio (OCR) of clay deposits is an important task in geotechnical engineering practice. This paper examines the potential of a support vector machine (SVM) for predicting the OCR of clays from piezocone penetration test data. SVM is a statistical learning method based on the structural risk minimization principle, which minimizes both the error and a weight (regularization) term. The five input variables used for the SVM model for prediction of OCR are the corrected cone resistance (qt), vertical total stress (sigmav), hydrostatic pore pressure (u0), pore pressure at the cone tip (u1), and the pore pressure just above the cone base (u2). A sensitivity analysis has been performed to investigate the relative importance of each input parameter. It shows that qt is the in-situ measurement most strongly influenced by OCR, followed by sigmav, u0, u2, and u1. A comparison between SVM and some traditional interpretation methods is also presented. The results of this study show that the SVM approach has the potential to be a practical tool for the determination of OCR.
Abstract:
We explore the use of information on the co-occurrence of domains in multi-domain proteins for predicting protein-protein interactions. The basic premise of our work is the assumption that domains co-occurring in a polypeptide chain undergo either structural or functional interactions among themselves. In this study we use a template dataset of domains in multi-domain proteins to predict protein-protein interactions in a target organism. We note that the largest number of correct predictions of interacting protein domain families (158) is made in S. cerevisiae when a dataset of closely related organisms is used as the template, followed by a more diverse dataset of bacterial proteins (48) and a dataset of randomly chosen proteins (23). We conclude that the use of multi-domain information from organisms closely related to the target can aid the prediction of interacting protein families.
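The premise reduces to counting domain pairs that co-occur in template proteins and predicting the frequent pairs as interacting. A minimal sketch (the domain names, template proteins and threshold are invented, not the study's data):

```python
# Count co-occurring domain pairs across multi-domain template proteins and
# predict pairs seen often enough as interacting domain families.
from itertools import combinations
from collections import Counter

def cooccurring_domain_pairs(proteins):
    """proteins: iterable of domain lists; returns pair -> co-occurrence count."""
    pairs = Counter()
    for domains in proteins:
        for a, b in combinations(sorted(set(domains)), 2):
            pairs[(a, b)] += 1
    return pairs

template = [                         # hypothetical multi-domain proteins
    ["SH3", "SH2", "Kinase"],
    ["SH2", "Kinase"],
    ["PDZ", "SH3"],
]
pairs = cooccurring_domain_pairs(template)
predicted = {p for p, n in pairs.items() if n >= 2}   # illustrative threshold
```

Sorting each protein's domain set makes the pair key order-independent, so (SH2, Kinase) and (Kinase, SH2) accumulate in one counter entry.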