893 resultados para data driven approach
Resumo:
When an accurate hydraulic network model is available, direct modeling techniques are very straightforward and reliable for on-line leakage detection and localization applied to large class of water distribution networks. In general, this type of techniques based on analytical models can be seen as an application of the well-known fault detection and isolation theory for complex industrial systems. Nonetheless, the assumption of single leak scenarios is usually made considering a certain leak size pattern which may not hold in real applications. Upgrading a leak detection and localization method based on a direct modeling approach to handle multiple-leak scenarios can be, on one hand, quite straightforward but, on the other hand, highly computational demanding for large class of water distribution networks given the huge number of potential water loss hotspots. This paper presents a leakage detection and localization method suitable for multiple-leak scenarios and large class of water distribution networks. This method can be seen as an upgrade of the above mentioned method based on a direct modeling approach in which a global search method based on genetic algorithms has been integrated in order to estimate those network water loss hotspots and the size of the leaks. This is an inverse / direct modeling method which tries to take benefit from both approaches: on one hand, the exploration capability of genetic algorithms to estimate network water loss hotspots and the size of the leaks and on the other hand, the straightforwardness and reliability offered by the availability of an accurate hydraulic model to assess those close network areas around the estimated hotspots. The application of the resulting method in a DMA of the Barcelona water distribution network is provided and discussed. The obtained results show that leakage detection and localization under multiple-leak scenarios may be performed efficiently following an easy procedure.
Resumo:
The largest uncertainties in the Standard Model calculation of the anomalous magnetic moment of the muon (ɡ − 2)μ come from hadronic contributions. In particular, it can be expected that in a few years the subleading hadronic light-by-light (HLbL) contribution will dominate the theory uncertainty. We present a dispersive description of the HLbL tensor. This new, model-independent approach opens up an avenue towards a data-driven determination of the HLbL contribution to the (ɡ − 2)μ.
Resumo:
TSEP-RLI was a technical cooperation project jointly conducted by GOP thru DA-Agricultural Training Institute (ATI) and GOJ thru JICA aimed at institutionalizing the training program for Rural Life Improvement (RLI) at the (ATI). As expected, farmers, fisherfolk, women, youth and extension agents were provided with efficient and effective training services from ATI leading to the improvement of quality of life in the rural areas through efforts of human resource development. The ATI- Bohol was chosen as the model center where participatory trials and various activities of the project were undertaken for five years. These activities were participatory surveys and data collection of on-farm and off-farm productive activities; planning workshop for RLI; feedbacking of survey results and action plans to the community and the Local Government Units (LGUs), and signing of Memorandum of Agreement between the Project and participating LGUs. The above activities were done to facilitate the planning and development of most effective and necessary rural life improvement activities, to confirm the willingness of the people to support and participate and to formalize the partnership between the Project and the LGUs. Since the concept of rural life covers a vast range of activities, a consensus had been reached that the total aspects of rural life be grasped in three spheres, namely, Production & Livelihood (P/L), Rural Living Condition (RLC) and Community Environment (C/E). The RLI for Ubi (Yam) Growers was one of the pilot activities undertaken in two pilot barangays and the target beneficiaries were members of the Rural Improvement Club (RIC- a group of organized women) with the LGU of the Municipality of Corella as the implementing partner. During the planning workshop, the barangay residents articulated their desire to promote production and processing of ubi (sphere on P/L - as the entry point), lack of nutritious food was one of the identified problem (sphere on RLC- expansion point) and environmental degradation such as deforestation, and soil erosion was another problem articulated by the community people (sphere on C/E- expansion point). Major activities that were undertaken namely, Ubi cooking contest, cooking/processing seminar, training courses on entrepreneurial development, ubi production and storage technology, packaging and product design, human resource development and simplified bookkeeping motivated the beneficiaries as well as developed and enhanced their skills & capabilities while strengthening their associations. Their participation to the 5 ubi festivals and other related activities had brought some impacts on their economic and rural life improvement activities. The seven principles of TSEP-RLI include the participatory process, holistic approach, dialogical approach, bottom -up training needs assessment, demand-driven approach, cost sharing approach and collaborative implementation with other agencies including LGUs and the community.
Resumo:
Exploratory analysis of data in all sciences seeks to find common patterns to gain insights into the structure and distribution of the data. Typically visualisation methods like principal components analysis are used but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this technical report we discuss a complementary approach based on a non-linear probabilistic model. The generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate far more structure than a two dimensional principal components plot could, and deal at the same time with missing data. We show that using the generative topographic mapping provides us with an optimal method to explore the data while being able to replace missing values in a dataset, particularly where a large proportion of the data is missing.
Resumo:
Financial prediction has attracted a lot of interest due to the financial implications that the accurate prediction of financial markets can have. A variety of data driven modellingapproaches have been applied but their performance has produced mixed results. In this study we apply both parametric (neural networks with active neurons) and nonparametric (analog complexing) self-organisingmodelling methods for the daily prediction of the exchangerate market. We also propose acombinedapproach where the parametric and nonparametricself-organising methods are combined sequentially, exploiting the advantages of the individual methods with the aim of improving their performance. The combined method is found to produce promising results and to outperform the individual methods when tested with two exchangerates: the American Dollar and the Deutche Mark against the British Pound.
Resumo:
University students encounter difficulties with academic English because of its vocabulary, phraseology, and variability, and also because academic English differs in many respects from general English, the language which they have experienced before starting their university studies. Although students have been provided with many dictionaries that contain some helpful information on words used in academic English, these dictionaries remain focused on the uses of words in general English. There is therefore a gap in the dictionary market for a dictionary for university students, and this thesis provides a proposal for such a dictionary (called the Dictionary of Academic English; DOAE) in the form of a model which depicts how the dictionary should be designed, compiled, and offered to students. The model draws on state-of-the-art techniques in lexicography, dictionary-use research, and corpus linguistics. The model demanded the creation of a completely new corpus of academic language (Corpus of Academic Journal Articles; CAJA). The main advantages of the corpus are its large size (83.5 million words) and balance. Having access to a large corpus of academic language was essential for a corpus-driven approach to data analysis. A good corpus balance in terms of domains enabled a detailed domain-labelling of senses, patterns, collocates, etc. in the dictionary database, which was then used to tailor the output according to the needs of different types of student. The model proposes an online dictionary that is designed as an online dictionary from the outset. The proposed dictionary is revolutionary in the way it addresses the needs of different types of student. It presents students with a dynamic dictionary whose contents can be customised according to the user's native language, subject of study, variant spelling preferences, and/or visual preferences (e.g. black and white).
Resumo:
Exploratory analysis of petroleum geochemical data seeks to find common patterns to help distinguish between different source rocks, oils and gases, and to explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods like principal components analysis are used, but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this paper we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, while also dealing with missing data. We show how using generative topographic mapping also provides an optimal method with which to replace missing values in two geochemical datasets, particularly where a large proportion of the data is missing.
Resumo:
This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yorùbá language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. Our R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy logic based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised by Pitch Synchronous Overlap Technique (PSOLA) using the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.
Resumo:
In this work we present a quality driven approach to DASH (Dynamic Adaptive Streaming over HTTP) for segment selection in varying network conditions. Current adaption algorithms focus largely on regulating data rates using network layer parameters by selecting the level of quality on offer that can eliminate buffer underrun without considering picture fidelity. In reality, viewers may accept a level of buffer underrun in order to achieve an improved level of picture fidelity. In this case, the conventional DASH algorithms can cause extreme degradation of the picture fidelity when attempting to eliminate buffer underrun with scarce bandwidth availability. Our work is concerned with a quality-aware rate adaption scheme that maximizes the client's quality of experience in terms of both continuity and fidelity (picture quality). Results show that the scheme proposed can maintain a high level of quality for streaming services, especially at low packet loss rates. It is also shown that by eliminating buffer underrun completely, the PSNR that reflects the picture quality of the video is greatly reduced. Our scheme offers the offset between continuity-based quality and resolution-based quality, which can be used to set threshold values for the level of quality desired by clients with different quality requirements. © 2013 IEEE.
Resumo:
Computers employing some degree of data flow organisation are now well established as providing a possible vehicle for concurrent computation. Although data-driven computation frees the architecture from the constraints of the single program counter, processor and global memory, inherent in the classic von Neumann computer, there can still be problems with the unconstrained generation of fresh result tokens if a pure data flow approach is adopted. The advantages of allowing serial processing for those parts of a program which are inherently serial, and of permitting a demand-driven, as well as data-driven, mode of operation are identified and described. The MUSE machine described here is a structured architecture supporting both serial and parallel processing which allows the abstract structure of a program to be mapped onto the machine in a logical way.
Resumo:
Background: Understanding transcriptional regulation by genome-wide microarray studies can contribute to unravel complex relationships between genes. Attempts to standardize the annotation of microarray data include the Minimum Information About a Microarray Experiment (MIAME) recommendations, the MAGE-ML format for data interchange, and the use of controlled vocabularies or ontologies. The existing software systems for microarray data analysis implement the mentioned standards only partially and are often hard to use and extend. Integration of genomic annotation data and other sources of external knowledge using open standards is therefore a key requirement for future integrated analysis systems. Results: The EMMA 2 software has been designed to resolve shortcomings with respect to full MAGE-ML and ontology support and makes use of modern data integration techniques. We present a software system that features comprehensive data analysis functions for spotted arrays, and for the most common synthesized oligo arrays such as Agilent, Affymetrix and NimbleGen. The system is based on the full MAGE object model. Analysis functionality is based on R and Bioconductor packages and can make use of a compute cluster for distributed services. Conclusion: Our model-driven approach for automatically implementing a full MAGE object model provides high flexibility and compatibility. Data integration via SOAP-based web-services is advantageous in a distributed client-server environment as the collaborative analysis of microarray data is gaining more and more relevance in international research consortia. The adequacy of the EMMA 2 software design and implementation has been proven by its application in many distributed functional genomics projects. Its scalability makes the current architecture suited for extensions towards future transcriptomics methods based on high-throughput sequencing approaches which have much higher computational requirements than microarrays.
Resumo:
This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline. In this thesis, we consider the evaluation pipeline to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. In this thesis we show that historical user interaction data can aid in improving the accuracy or efficiency of each of the steps of the web search evaluation pipeline. As a result of these improvements, the overall efficiency of the entire evaluation pipeline is increased. Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represents the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained against a set of user interactions recorded in the search engine’s query logs. From our experimental study, we observe that our proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes will pass the offline evaluation step to be rejected after the online evaluation step. As a result, this would allow us to achieve a higher efficiency of the entire evaluation pipeline. Secondly, we state the problem of the optimised scheduling of online experiments. We tackle this problem by considering a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of a particular experiment. This predictor is trained on a set of online experiments, and uses a diverse set of features to represent an online experiment. Our study demonstrates that a higher number of successful experiments per unit of time can be achieved by deploying such a scheduler on the second step of the evaluation pipeline. Consequently, we argue that the efficiency of the evaluation pipeline can be increased. Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft considers both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, i.e. in domains with a grid-based representation, such as image search. Our study using datasets of interleaving experiments performed both in document and image search domains demonstrates that Generalised Team Draft achieves the highest sensitivity. A higher sensitivity indicates that the interleaving experiments can be deployed for a shorter period of time or use a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters w.r.t. historical interaction data recorded in the interleaving experiments. Finally, we propose to apply the sequential testing methods to reduce the mean deployment time for the interleaving experiments. We adapt two sequential tests for the interleaving experimentation. We demonstrate that one can achieve a significant decrease in experiment duration by using such sequential testing methods. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. Our further experimental study demonstrates that cumulative gains in the online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, and the sequential testing approaches. Overall, the central contributions of this thesis are the proposed approaches to improve the accuracy or efficiency of the steps of the evaluation pipeline: the offline evaluation frameworks for the query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for the efficient online interleaving evaluation, and a sequential testing approach for the online search evaluation. The experiments in this thesis are based on massive real-life datasets obtained from Yandex, a leading commercial search engine. These experiments demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.
Resumo:
In recent times, a significant research effort has been focused on how deformable linear objects (DLOs) can be manipulated for real world applications such as assembly of wiring harnesses for the automotive and aerospace sector. This represents an open topic because of the difficulties in modelling accurately the behaviour of these objects and simulate a task involving their manipulation, considering a variety of different scenarios. These problems have led to the development of data-driven techniques in which machine learning techniques are exploited to obtain reliable solutions. However, this approach makes the solution difficult to be extended, since the learning must be replicated almost from scratch as the scenario changes. It follows that some model-based methodology must be introduced to generalize the results and reduce the training effort accordingly. The objective of this thesis is to develop a solution for the DLOs manipulation to assemble a wiring harness for the automotive sector based on adaptation of a base trajectory set by means of reinforcement learning methods. The idea is to create a trajectory planning software capable of solving the proposed task, reducing where possible the learning time, which is done in real time, but at the same time presenting suitable performance and reliability. The solution has been implemented on a collaborative 7-DOFs Panda robot at the Laboratory of Automation and Robotics of the University of Bologna. Experimental results are reported showing how the robot is capable of optimizing the manipulation of the DLOs gaining experience along the task repetition, but showing at the same time a high success rate from the very beginning of the learning phase.
Resumo:
We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with excellent properties. The approach is in- spired by the principles of the generalized cross entropy method. The pro- posed density estimation procedure has numerous advantages over the tra- ditional kernel density estimator methods. Firstly, for the first time in the nonparametric literature, the proposed estimator allows for a genuine incor- poration of prior information in the density estimation procedure. Secondly, the approach provides the first data-driven bandwidth selection method that is guaranteed to provide a unique bandwidth for any data. Lastly, simulation examples suggest the proposed approach outperforms the current state of the art in nonparametric density estimation in terms of accuracy and reliability.
Resumo:
In this and a preceding paper, we provide an introduction to the Fujitsu VPP range of vector-parallel supercomputers and to some of the computational chemistry software available for the VPP. Here, we consider the implementation and performance of seven popular chemistry application packages. The codes discussed range from classical molecular dynamics to semiempirical and ab initio quantum chemistry. All have evolved from sequential codes, and have typically been parallelised using a replicated data approach. As such they are well suited to the large-memory/fast-processor architecture of the VPP. For one code, CASTEP, a distributed-memory data-driven parallelisation scheme is presented. (C) 2000 Published by Elsevier Science B.V. All rights reserved.