1000 results for Discrepant data
Abstract:
This thesis introduced two novel reputation models to generate accurate item reputation scores using ratings data and the statistics of the dataset. It also presented an innovative method that incorporates reputation awareness in recommender systems by employing voting system methods to produce more accurate top-N item recommendations. Additionally, this thesis introduced a personalisation method for generating reputation scores based on users' interests, where a single item can have different reputation scores for different users. The personalised reputation scores are then used in the proposed reputation-aware recommender systems to enhance the recommendation quality.
Abstract:
Identifying and quantifying urban growth, along with its rate and trends, helps regional planners provide better infrastructure in an environmentally sound way. This requires analysis of spatial and temporal data, which helps quantify growth trends on a spatial scale. Technologies such as remote sensing, Geographic Information Systems (GIS) and the Global Positioning System (GPS) help in this regard: remote sensing aids in the collection of temporal data, and GIS helps in spatial analysis. This paper analyses the urban growth pattern, in the form of either radial or linear sprawl, along the Bangalore-Mysore highway. GIS base layers such as built-up areas along the highway, the road network and village boundaries were generated using collateral data such as the Survey of India toposheet. The analysis was complemented by computing Shannon's entropy, which helped in identifying the prevalent sprawl zones and the rate of growth, in delineating potential sprawl locations, and in distinguishing regions of dispersed growth from regions of compact growth. The study reveals that the Bangalore North and South taluks contributed most to the sprawl, with a 559% increase in built-up area over a period of 28 years and a high degree of dispersion. The Mysore and Srirangapatna region showed a 128% change in built-up area and a high potential for sprawl, with slightly high dispersion. The degree of sprawl was found to be directly proportional to the distance from the cities.
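The role of Shannon's entropy in the abstract above can be illustrated with a minimal sketch. It assumes the common formulation in which each zone's share of the total built-up area is treated as a probability; the zone values and function names are hypothetical and are not taken from the paper.

```python
import math


def shannon_entropy(built_up_by_zone):
    """Shannon's entropy H = -sum(p_i * ln p_i) over zone shares of built-up area."""
    total = sum(built_up_by_zone)
    if total <= 0:
        raise ValueError("total built-up area must be positive")
    # Zones with no built-up area contribute nothing to the sum.
    shares = [x / total for x in built_up_by_zone if x > 0]
    return -sum(p * math.log(p) for p in shares)


def relative_entropy(built_up_by_zone):
    """Entropy scaled by its maximum ln(n): near 0 is compact, near 1 is dispersed."""
    n_zones = len(built_up_by_zone)
    if n_zones < 2:
        raise ValueError("need at least two zones")
    return shannon_entropy(built_up_by_zone) / math.log(n_zones)


# Hypothetical built-up areas (any consistent unit) in four zones.
compact_growth = [95.0, 2.0, 2.0, 1.0]
dispersed_growth = [25.0, 25.0, 25.0, 25.0]

print(relative_entropy(compact_growth))
print(relative_entropy(dispersed_growth))
```

Under this formulation, an even spread of built-up area across zones gives the maximum value, which is how a high entropy can be read as dispersed growth.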
Abstract:
Poly(styrene peroxide) has been prepared and characterized. Nuclear magnetic resonance (NMR) spectra of the polymer show the shift of aliphatic protons. Differential scanning calorimetric (DSC) and differential thermal analysis (DTA) results show an exothermic peak around 110 °C, which is characteristic of peroxide decomposition.
Abstract:
Historically, school leaders have occupied a somewhat ambiguous position within networks of power. On the one hand, they appear to be celebrated as what Ball (2003) has termed the ‘new hero of educational reform’; on the other, they are often ‘held to account’ through those same performative processes and technologies. These have become compelling in schools and principals are ‘doubly bound’ through this. Adopting a Foucauldian notion of discursive production, this paper addresses the ways that the discursive ‘field’ of ‘principal’ (within larger regimes of truth such as schools, leadership, quality and efficiency) is produced. It explores how individual principals understand their roles and ethics within those practices of audit emerging in school governance, and how their self-regulation is constituted through NAPLAN – the National Assessment Program, Literacy and Numeracy. A key effect of NAPLAN has been the rise of auditing practices that change how education is valued. Open-ended interviews with 13 primary and secondary school principals from Western Australia, South Australia and New South Wales asked how they perceived NAPLAN's impact on their work, their relationships within their school community and their ethical practice.
Abstract:
We present some results on multicarrier analysis of magnetotransport data. Both synthetic data and data from narrow-gap Hg0.8Cd0.2Te samples are used to demonstrate the applicability of various algorithms, viz. nonlinear least-squares fitting, Quantitative Mobility Spectrum Analysis (QMSA) and Maximum Entropy Mobility Spectrum Analysis (MEMSA). Comments are made from our experience on these algorithms and on the inversion procedure from experimental R/sigma-B to S-mu, specifically with least-squares fitting as an example. Amongst the conclusions drawn are: (i) Experimentally measured resistivity (R-xx, R-xy) should also be used, instead of just the inverted conductivity (sigma(xx), sigma(xy)), to fit data to semiclassical expressions for better fits, especially at higher B. (ii) A high magnetic field is necessary to extract low-mobility carrier parameters. (iii) Provided the error in the data is not large, better estimates of the carrier parameters of the remaining carrier species can be obtained at any stage by subtracting the highest-mobility carrier contribution to sigma from the experimental data and fitting with the remaining carriers. (iv) Even in the presence of a high electric field, an approximate multicarrier expression can be used to guess the carrier mobilities and their variations before solving the full Boltzmann equation.
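The semiclassical expressions referred to above can be sketched as follows. This is a minimal illustration that assumes the standard textbook multicarrier (Drude-type) conductivity tensor and one common sign convention; the carrier values are hypothetical, and the code is not the fitting procedure used in the paper.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def conductivity_tensor(carriers, b_field):
    """Sum the contributions of all carrier species to sigma_xx and sigma_xy.

    Each carrier is (density in m^-3, mobility in m^2/Vs, sign), with sign = -1
    for electrons and +1 for holes (an assumed convention).
    """
    sigma_xx = 0.0
    sigma_xy = 0.0
    for density, mobility, sign in carriers:
        denom = 1.0 + (mobility * b_field) ** 2
        sigma_xx += E_CHARGE * density * mobility / denom
        sigma_xy += sign * E_CHARGE * density * mobility ** 2 * b_field / denom
    return sigma_xx, sigma_xy


def resistivity_tensor(carriers, b_field):
    """Invert the 2x2 conductivity tensor to obtain rho_xx and rho_xy."""
    sigma_xx, sigma_xy = conductivity_tensor(carriers, b_field)
    norm = sigma_xx ** 2 + sigma_xy ** 2
    return sigma_xx / norm, sigma_xy / norm


# Hypothetical single electron species: rho_xx should not depend on B.
single_electron = [(1.0e21, 1.0, -1)]
rho_low, _ = resistivity_tensor(single_electron, 0.5)
rho_high, _ = resistivity_tensor(single_electron, 5.0)
print(math.isclose(rho_low, rho_high, rel_tol=1e-9))
```

In these expressions a species only changes the field dependence appreciably once mobility times B approaches one, which is consistent with conclusion (ii) that low-mobility carriers need high fields to be resolved.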
Abstract:
The idea of extracting knowledge in process mining is a descendant of data mining. Both mining disciplines emphasise data flow and relations among elements in the data. Unfortunately, challenges have been encountered when working with the data flow and relations. One of the challenges is that the representation of the data flow between a pair of elements or tasks is insufficiently simplified and formulated, as it considers only a one-to-one data flow relation. In this paper, we discuss how the effectiveness of knowledge representation can be extended in both disciplines. To this end, we introduce a new representation of the data flow and dependency formulation using a flow graph. The flow graph solves the issue of the insufficiency of presenting other relation types, such as many-to-one and one-to-many relations. As an experiment, a new evaluation framework is applied to the Teleclaim process in order to show how this method can provide us with more precise results when compared with other representations.
Abstract:
Knowledge of hydrological variables (e.g. soil moisture, evapotranspiration) is of pronounced importance in various applications, including flood control, agricultural production and effective water resources management. These applications require accurate prediction of hydrological variables, spatially and temporally, in a watershed or basin. Though hydrological models can simulate these variables at the desired resolution (spatial and temporal), they are often validated against variables that are either sparse in resolution (e.g. soil moisture) or averaged over large regions (e.g. runoff). A combination of a distributed hydrological model (DHM) and remote sensing (RS) has the potential to improve resolution. Data assimilation schemes can optimally combine DHM and RS. Retrieving hydrological variables (e.g. soil moisture) from remote sensing and assimilating them into a hydrological model requires validation of the algorithms using field studies. Here we present a review of methodologies developed to assimilate RS in DHM and demonstrate the application for soil moisture in a small experimental watershed in south India.
Abstract:
Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns from vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
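The idea of a serial episode under a temporal constraint can be made concrete with a small sketch. It assumes one simple form of constraint, a single allowed delay window between consecutive events, and uses a greedy count of non-overlapping occurrences; it is an illustration only, not the algorithms presented in the paper, and the greedy count is not guaranteed to be maximal.

```python
def _earliest_end(times_by_type, episode, delay_window, prev_time, position):
    """Return the earliest completion time of the episode from `position`, or None."""
    if position == len(episode):
        return prev_time
    low, high = delay_window
    best = None
    for t in times_by_type.get(episode[position], []):
        if low <= t - prev_time <= high:
            end = _earliest_end(times_by_type, episode, delay_window, t, position + 1)
            if end is not None and (best is None or end < best):
                best = end
    return best


def count_serial_episode(events, episode, delay_window):
    """Greedily count non-overlapping occurrences of a serial episode.

    events: iterable of (time, event_type), e.g. (spike time, neuron label).
    episode: ordered tuple of event types, e.g. ("A", "B", "C").
    delay_window: (low, high) bounds on the delay between consecutive events.
    """
    times_by_type = {}
    for t, label in sorted(events):
        times_by_type.setdefault(label, []).append(t)
    count = 0
    last_end = float("-inf")
    for start in times_by_type.get(episode[0], []):
        if start <= last_end:
            continue  # would overlap the previously counted occurrence
        end = _earliest_end(times_by_type, episode, delay_window, start, 1)
        if end is not None:
            count += 1
            last_end = end
    return count


# Hypothetical spike train: the third A -> B delay (9) violates a (1, 3) window.
spikes = [(0, "A"), (2, "B"), (4, "C"), (10, "A"), (12, "B"), (14, "C"),
          (20, "A"), (29, "B"), (31, "C")]
print(count_serial_episode(spikes, ("A", "B", "C"), (1, 3)))
```

A high count for A -> B -> C under a tight delay window is the kind of evidence that could suggest a chain of connections with matching synaptic delays.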
Abstract:
Background Several prospective studies have suggested that gait and plantar pressure abnormalities secondary to diabetic peripheral neuropathy contribute to foot ulceration. There are many different methods by which gait and plantar pressures are assessed and currently there is no agreed standardised approach. This study aimed to describe the methods and reproducibility of three-dimensional gait and plantar pressure assessments in a small subset of participants using pre-existing protocols. Methods Fourteen participants were conveniently sampled prior to a planned longitudinal study; four patients with diabetes and plantar foot ulcers, five patients with diabetes but no foot ulcers and five healthy controls. The repeatability of measuring key biomechanical data was assessed including the identification of 16 key anatomical landmarks, the measurement of seven leg dimensions, the processing of 22 three-dimensional gait parameters and the analysis of four different plantar pressure measures at 20 foot regions. Results The mean inter-observer differences were within the pre-defined acceptable level (<7 mm) for 100 % (16 of 16) of key anatomical landmarks measured for gait analysis. The intra-observer assessment concordance correlation coefficients were > 0.9 for 100 % (7 of 7) of leg dimensions. The coefficients of variation (CVs) were within the pre-defined acceptable level (<10 %) for 100 % (22 of 22) of gait parameters. The CVs were within the pre-defined acceptable level (<30 %) for 95 % (19 of 20) of the contact area measures, 85 % (17 of 20) of mean plantar pressures, 70 % (14 of 20) of pressure time integrals and 55 % (11 of 20) of maximum sensor plantar pressure measures. Conclusion Overall, the findings of this study suggest that important gait and plantar pressure measurements can be reliably acquired. Nearly all measures contributing to three-dimensional gait parameter assessments were within predefined acceptable limits.
Most plantar pressure measurements were also within predefined acceptable limits; however, reproducibility was not as good for assessment of the maximum sensor pressure. To our knowledge, this is the first study to investigate the reproducibility of several biomechanical methods in a heterogeneous cohort.
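The two reproducibility statistics named in the abstract can be sketched as follows. This assumes the usual definitions, the coefficient of variation as sample standard deviation over mean and Lin's concordance correlation coefficient with 1/n moments; the exact estimators used in the study are not stated here, and the measurement values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    return 100.0 * statistics.stdev(values) / mean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two sets of readings."""
    n = len(x)
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical repeated measurements of one gait parameter.
trials = [9.0, 10.0, 11.0]
print(coefficient_of_variation(trials))

# Hypothetical leg dimensions measured twice by the same observer.
first_pass = [1.0, 2.0, 3.0, 4.0]
second_pass = [2.0, 3.0, 4.0, 5.0]  # constant offset lowers concordance
print(concordance_correlation(first_pass, second_pass))
```

Unlike a plain correlation coefficient, the concordance coefficient penalises a systematic offset between the two sets of readings, which is why it suits intra-observer agreement.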
Abstract:
Dispersing a data object into a set of data shares is an elemental stage in distributed communication and storage systems. In comparison to data replication, data dispersal with redundancy saves space and bandwidth. Moreover, dispersing a data object to distinct communication links or storage sites limits adversarial access to whole data and tolerates loss of a part of data shares. Existing data dispersal schemes have been proposed mostly based on various mathematical transformations on the data which induce high computation overhead. This paper presents a novel data dispersal scheme where each part of a data object is replicated, without encoding, into a subset of data shares according to combinatorial design theory. Particularly, data parts are mapped to points and data shares are mapped to lines of a projective plane. Data parts are then distributed to data shares using the point and line incidence relations in the plane so that certain subsets of data shares collectively possess all data parts. The presented scheme incorporates combinatorial design theory with inseparability transformation to achieve secure data dispersal at reduced computation, communication and storage costs. Rigorous formal analysis and experimental study demonstrate significant cost-benefits of the presented scheme in comparison to existing methods.
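The point-and-line mapping described above can be sketched with the smallest projective plane, the Fano plane (7 points, 7 lines). This is an illustration under that assumption: the function names are hypothetical, the choice of plane is not taken from the paper, and the inseparability transformation that provides the security property is deliberately omitted.

```python
from itertools import combinations

# Lines of the Fano plane: every pair of points lies on exactly one line.
FANO_LINES = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]
N_POINTS = 7


def disperse(data):
    """Split data into 7 parts (points) and copy each part into the shares
    (lines) it is incident with. No encoding is applied to the parts."""
    part_len = -(-len(data) // N_POINTS)  # ceiling division
    parts = [data[i * part_len:(i + 1) * part_len] for i in range(N_POINTS)]
    return [{point: parts[point] for point in line} for line in FANO_LINES]


def recover(shares):
    """Reassemble the data if the given shares jointly hold every part."""
    merged = {}
    for share in shares:
        merged.update(share)
    if len(merged) < N_POINTS:
        return None
    return b"".join(merged[point] for point in range(N_POINTS))


message = b"abcdefghijklmnopqrstuvwxyz"
shares = disperse(message)

# The three lines through point 0 cover all seven points.
print(recover(shares[:3]) == message)
```

In this small plane each share stores 3 of the 7 parts, any five shares are enough to rebuild the data, and no two shares are, which illustrates how incidence relations determine which subsets of shares collectively possess all parts.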
Abstract:
With the development of wearable and mobile computing technology, more and more people start using sleep-tracking tools to collect personal sleep data on a daily basis aiming at understanding and improving their sleep. While sleep quality is influenced by many factors in a person’s lifestyle context, such as exercise, diet and steps walked, existing tools simply visualize sleep data per se on a dashboard rather than analyse those data in combination with contextual factors. Hence many people find it difficult to make sense of their sleep data. In this paper, we present a cloud-based intelligent computing system named SleepExplorer that incorporates sleep domain knowledge and association rule mining for automated analysis on personal sleep data in light of contextual factors. Experiments show that the same contextual factors can play a distinct role in sleep of different people, and SleepExplorer could help users discover factors that are most relevant to their personal sleep.
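The association rule mining step mentioned above rests on a few standard measures, sketched here. The daily records and item names are hypothetical, and the code shows only the textbook support, confidence and lift of one rule, not how SleepExplorer itself is implemented.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    records: list of sets, one set of observed items per day.
    """
    antecedent = set(antecedent)
    consequent = set(consequent)
    n = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_consequent = sum(1 for r in records if consequent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    if n == 0 or n_antecedent == 0 or n_consequent == 0:
        raise ValueError("rule is undefined on these records")
    support = n_both / n
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n)
    return support, confidence, lift


# Hypothetical daily logs for one user.
days = [
    {"exercise", "good_sleep"},
    {"exercise", "good_sleep"},
    {"late_meal", "poor_sleep"},
    {"exercise", "late_meal", "poor_sleep"},
    {"good_sleep"},
]
print(rule_metrics(days, {"exercise"}, {"good_sleep"}))
```

Because the measures are computed per user, the same factor can yield a strong rule for one person and a weak one for another, in line with the finding reported in the abstract.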
Abstract:
Volumetric-method-based adsorption measurements of nitrogen on two specimens of activated carbon (Fluka and Sarabhai) reported by us are refitted to two popular isotherms, namely, Dubinin−Astakhov (D−A) and Toth, in light of improved fitting methods derived recently. Those isotherms have been used to derive other data of relevance in the design of engineering equipment, such as the concentration dependence of the heat of adsorption and Henry’s law coefficients. The present fits provide a better representation of experimental measurements than before because the temperature dependence of the adsorbed phase volume and the structural heterogeneity of the micropore distribution have been accounted for in the D−A equation. A new correlation to the Toth equation is a further contribution. The heat of adsorption in the limiting uptake condition is correlated with the Henry’s law coefficients at the near zero uptake condition.
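For orientation, the two isotherms can be sketched in their common textbook forms. This assumes the plain Toth and D−A equations with constant parameters; the temperature-dependent adsorbed phase volume and the new Toth correlation described in the abstract are not reproduced, and all parameter values are hypothetical.

```python
import math

R_GAS = 8.314462618  # universal gas constant in J/(mol K)


def toth_uptake(pressure, q_max, b, t):
    """Toth isotherm: q = q_max * b*p / (1 + (b*p)**t) ** (1/t)."""
    bp = b * pressure
    return q_max * bp / (1.0 + bp ** t) ** (1.0 / t)


def toth_henry_coefficient(q_max, b):
    """Slope of the Toth isotherm as pressure tends to zero."""
    return q_max * b


def dubinin_astakhov_uptake(pressure, temperature, p_sat, w0, energy, n):
    """D-A isotherm: W = W0 * exp(-(A/E)**n) with A = R*T*ln(p_sat/p).

    p_sat is the saturation (or an assumed pseudo-saturation) pressure.
    """
    if not 0.0 < pressure <= p_sat:
        raise ValueError("pressure must lie in (0, p_sat]")
    potential = R_GAS * temperature * math.log(p_sat / pressure)
    return w0 * math.exp(-((potential / energy) ** n))


# With t = 1 the Toth form reduces to the Langmuir isotherm.
print(toth_uptake(2.0, q_max=5.0, b=0.3, t=1.0))
print(dubinin_astakhov_uptake(1.0e5, 300.0, p_sat=1.0e6, w0=0.4, energy=8000.0, n=2.0))
```

The Toth form has a finite slope at zero pressure, which is what makes a Henry's law coefficient at near zero uptake obtainable from it.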
Abstract:
The problem of identifying the stiffness, mass and damping properties of linear structural systems, based on multiple sets of measurement data originating from static and dynamic tests, is considered. A strategy within the framework of Kalman-filter-based dynamic state estimation is proposed to tackle this problem. The static tests consist of measuring the response of the structure to slowly moving loads and to static loads whose magnitudes are varied incrementally; the dynamic tests involve measuring a few elements of the frequency response function (FRF) matrix. These measurements are taken to be contaminated by additive Gaussian noise. An artificial independent variable τ, which simultaneously parameterizes the point of application of the moving load, the magnitude of the incrementally varied static load and the driving frequency in the FRFs, is introduced. The state vector is taken to consist of the system parameters to be identified. The fact that these parameters are independent of the variable τ is taken to constitute the set of ‘process’ equations. The measurement equations are derived from the mechanics of the problem, and quantities such as displacements and/or strains are taken to be measured. A recursive algorithm is developed that employs a linearization strategy based on Neumann’s expansion of the structural static and dynamic stiffness matrices and provides posterior estimates of the mean and covariance of the unknown system parameters. The satisfactory performance of the proposed approach is illustrated by considering the problem of the identification of the dynamic properties of an inhomogeneous beam and the axial rigidities of members of a truss structure.
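The structure of the estimation problem, a constant parameter as the state and load steps playing the role of τ, can be sketched for a single spring. This is a simplified illustration that assumes a first-order Taylor linearization (an extended Kalman filter) rather than the Neumann-expansion strategy of the paper; the spring, loads and noise level are hypothetical, and the measurements are generated without noise so that the run is reproducible.

```python
def identify_stiffness(loads, displacements, k_initial, p_initial, noise_var):
    """Recursively estimate a spring stiffness k from u = F / k measurements.

    Process equation: k is constant across load steps (no process noise).
    Measurement equation: u = F / k + noise, linearized about the current k.
    Returns the posterior mean and variance of k.
    """
    k_est = k_initial
    p_est = p_initial
    for force, measured_u in zip(loads, displacements):
        predicted_u = force / k_est
        jacobian = -force / k_est ** 2  # d(F/k)/dk at the current estimate
        innovation_var = jacobian * p_est * jacobian + noise_var
        gain = p_est * jacobian / innovation_var
        k_est = k_est + gain * (measured_u - predicted_u)
        p_est = (1.0 - gain * jacobian) * p_est
    return k_est, p_est


# Hypothetical test: incrementally varied static load on a 2000 N/m spring.
TRUE_STIFFNESS = 2000.0
loads = [100.0 * step for step in range(1, 11)]
displacements = [force / TRUE_STIFFNESS for force in loads]

k_est, p_est = identify_stiffness(
    loads, displacements, k_initial=1500.0, p_initial=500.0 ** 2, noise_var=1e-8
)
print(k_est, p_est)
```

Each load increment tightens the posterior variance, so later measurements adjust the estimate less than earlier ones.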
Abstract:
In this paper we propose and implement a joint Medium Access Control (MAC)-cum-routing scheme for environment-data-gathering sensor networks. The design principle trades maximization of node battery lifetime against a network that is capable of tolerating (i) a known percentage of combined packet losses due to packet collisions, network synchronization mismatch and channel impairments, and (ii) a significant end-to-end delay of the order of a few seconds. We have achieved this with a loosely synchronized network of sensor nodes that implement a Slotted-Aloha MAC state machine together with route information. The scheme has given encouraging results in terms of energy savings compared to other popular implementations. The overall packet loss is about 12%. The increase in battery lifetime compared to B-MAC varies from a minimum of 30% to about 90%, depending on the duty cycle.
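The collision behaviour of the underlying Slotted-Aloha MAC can be illustrated with a small simulation. It assumes the textbook model in which each of N nodes transmits independently with a fixed probability in every slot; routing, loose synchronization and duty cycling from the paper's scheme are not modelled, and the node count and probability are hypothetical.

```python
import random


def analytic_success_probability(n_nodes, p_transmit):
    """Probability that exactly one of n_nodes transmits in a slot."""
    return n_nodes * p_transmit * (1.0 - p_transmit) ** (n_nodes - 1)


def simulate_slotted_aloha(n_nodes, p_transmit, n_slots, seed=0):
    """Fraction of slots carrying exactly one transmission (no collision)."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_slots):
        transmitters = sum(rng.random() < p_transmit for _ in range(n_nodes))
        if transmitters == 1:
            successes += 1
    return successes / n_slots


print(analytic_success_probability(10, 0.1))
print(simulate_slotted_aloha(10, 0.1, n_slots=20000, seed=1))
```

Lowering the per-slot transmit probability reduces collisions at the cost of delay, which is the kind of trade-off a loss-tolerant and delay-tolerant design can exploit to save energy.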
Abstract:
Data mining involves the nontrivial process of extracting knowledge or patterns from large databases. Genetic algorithms are efficient and robust search and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), in which the population size, the number of crossover points and the mutation rate of each population are fixed adaptively. Further, the migration of individuals between populations is decided dynamically. The paper gives a mathematical schema analysis of the method, showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high-yielding regions while simultaneously performing a highly explorative search on the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions and a set of actual classification data mining problems. A Michigan-style classifier was used to build the classifier, and the system was tested on machine learning databases including the Pima Indian Diabetes database, the Wisconsin Breast Cancer database and a few others. The performance of our algorithm is better than that of the others.