976 results for Filtering techniques
Abstract:
Genomic DNA obtained from patient whole blood samples is a key element for genomic research. The advantages and disadvantages, in terms of time efficiency, cost-effectiveness and laboratory requirements, of the procedures available to isolate nucleic acids need to be considered before choosing any particular method. These characteristics have not been fully evaluated for some laboratory techniques, such as the salting out method for DNA extraction, which has been excluded from comparison in the studies published to date. We compared three different protocols (a traditional salting out method, a modified salting out method and a commercially available kit) to determine the most cost-effective and time-efficient method of extracting DNA. We extracted genomic DNA from whole blood samples obtained from breast cancer patient volunteers and compared the yields in terms of quantity (concentration of DNA extracted and DNA obtained per ml of blood used) and quality (260/280 ratio and polymerase chain reaction product amplification). The three methods showed no statistically significant differences in final yield, but differed very significantly once the time and cost of each method were taken into account. The modified salting out method was seven- and twofold cheaper than the commercial kit and the traditional salting out method, respectively, and reduced extraction time from 3 days to 1 hour compared with the traditional salting out method. This highlights the modified salting out method as a suitable choice for laboratories and research centres, particularly those dealing with a large number of samples.
Abstract:
Bluetooth technology is increasingly being used to track vehicles throughout their trips, both within urban networks and across freeway stretches. One important opportunity offered by this type of data is the measurement of Origin-Destination patterns, which emerge from the aggregation and clustering of individual trips. In order to obtain accurate estimations, however, a number of issues need to be addressed through data filtering and correction techniques. These issues stem mainly from the uptake of Bluetooth technology amongst drivers and from the physical properties of the Bluetooth sensors themselves. First, not all cars are equipped with discoverable Bluetooth devices, and Bluetooth-enabled vehicles may belong to a small number of socio-economic groups of users. Second, Bluetooth datasets include data from various transport modes, such as pedestrians, bicycles, cars, taxis, buses and trains. Third, Bluetooth sensors may fail to detect all of the nearby Bluetooth-enabled vehicles. As a consequence, the exact journey of some vehicles may become a latent pattern that needs to be extracted from the data. Finally, sensors that are in close proximity to each other may have overlapping detection areas, making the task of retrieving the correct travelled path even more challenging. The aim of this paper is twofold. We first give a comprehensive overview of the aforementioned issues. We then propose a methodology for cleansing, correcting and aggregating Bluetooth data. We postulate that the methods introduced in this paper are the first crucial steps that need to be followed in order to compute accurate Origin-Destination matrices in urban road networks.
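As a minimal illustration of the kind of cleansing step called for here, the pandas sketch below collapses repeated sightings of a device at one sensor into single detections and flags implausibly slow sensor-to-sensor transitions as likely non-vehicle modes. All field names, the 60-second window, the distance table and the 5 km/h cut-off are invented for the example; this is not the paper's actual methodology.

```python
import pandas as pd

# Hypothetical detection log: one row per (device, sensor) sighting.
df = pd.DataFrame({
    "device_id": ["a1", "a1", "a1", "b2", "b2"],
    "sensor_id": ["S1", "S1", "S2", "S1", "S2"],
    "timestamp": pd.to_datetime([
        "2024-01-01 08:00:00", "2024-01-01 08:00:20",
        "2024-01-01 08:06:00", "2024-01-01 08:00:00",
        "2024-01-01 09:30:00",
    ]),
}).sort_values("timestamp")

# Step 1: collapse repeated sightings of a device at the same sensor
# within 60 s into a single detection (keep the first of each burst).
gap = df.groupby(["device_id", "sensor_id"])["timestamp"].diff()
df = df[gap.isna() | (gap > pd.Timedelta("60s"))].copy()

# Step 2: flag implausibly slow transitions between sensors (likely
# pedestrians/cyclists) using an invented inter-sensor distance table.
dist_km = {("S1", "S2"): 2.0, ("S2", "S1"): 2.0}
df["prev_sensor"] = df.groupby("device_id")["sensor_id"].shift()
df["prev_time"] = df.groupby("device_id")["timestamp"].shift()
hours = (df["timestamp"] - df["prev_time"]).dt.total_seconds() / 3600
km = df.apply(
    lambda r: dist_km.get((r["prev_sensor"], r["sensor_id"]), float("nan")),
    axis=1,
)
# Below ~5 km/h, the device is unlikely to be in a moving car; rows with
# no prior sighting come out False as well and would need separate handling.
df["likely_vehicle"] = (km / hours) > 5.0
print(df)
```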
Abstract:
Results of an interlaboratory comparison on size characterization of SiO2 airborne nanoparticles using on-line and off-line measurement techniques are discussed. This study was performed in the framework of Technical Working Area (TWA) 34—“Properties of Nanoparticle Populations” of the Versailles Project on Advanced Materials and Standards (VAMAS) in project no. 3, “Techniques for characterizing size distribution of airborne nanoparticles”. Two types of nano-aerosols, consisting of (1) one population of nanoparticles with a mean diameter between 30.3 and 39.0 nm and (2) two populations of non-agglomerated nanoparticles with mean diameters in the ranges 36.2–46.6 nm and 80.2–89.8 nm, respectively, were generated for characterization measurements. Scanning mobility particle size spectrometers (SMPS) were used for on-line measurements of the size distributions of the produced nano-aerosols. Transmission electron microscopy, scanning electron microscopy, and atomic force microscopy were used as off-line measurement techniques for nanoparticle characterization. Samples were deposited on appropriate supports such as grids, filters, and mica plates by electrostatic precipitation and a filtration technique, using SMPS-controlled generation upstream. The results for the main size distribution parameters (mean and mode diameters), obtained from several laboratories, were compared based on metrological approaches including metrological traceability, calibration, and evaluation of the measurement uncertainty. Internationally harmonized measurement procedures for the characterization of airborne SiO2 nanoparticles are proposed.
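To make the two headline parameters concrete, the short sketch below computes a count-weighted mean diameter and a mode diameter from a binned size distribution; the bin centres and concentrations are invented, not data from the interlaboratory study.

```python
import numpy as np

# Hypothetical binned SMPS scan: bin-centre mobility diameters (nm) and
# number concentrations (particles/cm^3); all values are invented.
d = np.array([20.0, 30.0, 40.0, 50.0, 60.0])
n = np.array([120.0, 540.0, 860.0, 430.0, 90.0])

mean_d = np.sum(d * n) / np.sum(n)  # count-weighted mean diameter
mode_d = d[np.argmax(n)]            # bin centre at the distribution peak
print(f"mean = {mean_d:.1f} nm, mode = {mode_d:.1f} nm")
```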
Abstract:
A significant amount of speech is typically required for speaker verification system development and evaluation, especially in the presence of large intersession variability. This paper introduces source- and utterance-duration-normalized linear discriminant analysis (SUN-LDA) approaches to compensate for session variability in short-utterance i-vector speaker verification systems. Two variations of SUN-LDA are proposed, in which normalization techniques are used to capture source variation from both short and full-length development i-vectors: one based upon pooling (SUN-LDA-pooled) and the other upon concatenation (SUN-LDA-concat) across the duration- and source-dependent session variation. Both the SUN-LDA-pooled and SUN-LDA-concat techniques are shown to improve upon traditional LDA on NIST 08 truncated 10sec-10sec evaluation conditions, with the best result obtained by the SUN-LDA-concat technique: a relative improvement in EER of 8% for mismatched conditions and over 3% for matched conditions over traditional LDA approaches.
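The sketch below illustrates only the pooled flavour of the idea using scikit-learn's stock LDA: short and full-length development i-vectors are pooled before estimating the projection, so the scatter matrices absorb duration/source variation along with session variation. The data are synthetic and this is not the authors' exact SUN-LDA estimator.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic development i-vectors: 20 speakers, 10 full-length and 10
# short-utterance vectors each (dimension 100); all values are invented.
dim, n_spk = 100, 20
labels = np.repeat(np.arange(n_spk), 10)
means = rng.normal(size=(n_spk, dim))
full = means[labels] + rng.normal(scale=0.3, size=(200, dim))
short = means[labels] + rng.normal(scale=0.8, size=(200, dim))  # noisier

# Pooled variant (sketch): stack both sources so the LDA within-class
# scatter absorbs duration/source variation alongside session variation.
X = np.vstack([full, short])
y = np.concatenate([labels, labels])
lda = LinearDiscriminantAnalysis(n_components=n_spk - 1).fit(X, y)

test_ivec = rng.normal(size=(1, dim))
compensated = lda.transform(test_ivec)  # projected, source-compensated
```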
Abstract:
In Victoria, as in other jurisdictions, there is very little research on the potential risks and benefits of lane filtering by motorcyclists, particularly from a road safety perspective. This on-road proof-of-concept study aimed to investigate whether and how lane filtering influences motorcycle rider situation awareness at intersections, and to address factors that need to be considered in the design of a larger study in this area. Situation awareness refers to road users’ understanding of ‘what is going on’ around them and is a critical commodity for safe performance. Twenty-five experienced motorcyclists rode their own instrumented motorcycle around an urban test route in Melbourne whilst providing verbal protocols. Lane filtering occurred in 27% of the 43 possible instances in which there were one or more vehicles in the traffic queue and the traffic lights were red on approach to the intersection. A network analysis procedure, based on the verbal protocols provided by the motorcyclists, was used to identify differences in motorcyclist situation awareness between filtering and non-filtering events. Although similarities in situation awareness across filtering and non-filtering motorcyclists were found, the analysis revealed some differences. For example, filtering motorcyclists placed more emphasis on the timing of the traffic light sequence and on their own actions when moving to the front of the traffic queue, whilst non-filtering motorcyclists paid greater attention to traffic moving through the intersection and approaching from behind. Based on the results of this study, the paper discusses some methodological and theoretical issues to be addressed in a larger study comparing situation awareness between filtering and non-filtering motorcyclists.
Abstract:
A "self-exciting" market is one in which the probability of observing a crash increases in response to the occurrence of a crash. It essentially describes cases where the initial crash serves to weaken the system to some extent, making subsequent crashes more likely. This thesis investigates if equity markets possess this property. A self-exciting extension of the well-known jump-based Bates (1996) model is used as the workhorse model for this thesis, and a particle-filtering algorithm is used to facilitate estimation by means of maximum likelihood. The estimation method is developed so that option prices are easily included in the dataset, leading to higher quality estimates. Equilibrium arguments are used to price the risks associated with the time-varying crash probability, and in turn to motivate a risk-neutral system for use in option pricing. The option pricing function for the model is obtained via the application of widely-used Fourier techniques. An application to S&P500 index returns and a panel of S&P500 index option prices reveals evidence of self excitation.
Abstract:
Over the last decade, the majority of existing search techniques have been either keyword-based or category-based, resulting in unsatisfactory effectiveness. Meanwhile, studies have illustrated that more than 80% of users prefer personalized search results. As a result, a great deal of research effort (referred to as collaborative filtering) has been invested in personalized notions for enhancing retrieval performance. One of the fundamental yet most challenging steps is to capture precise user information needs. Most Web users are inexperienced or lack the capability to express their needs properly, whereas existing retrieval systems are highly sensitive to vocabulary. Researchers have increasingly proposed the utilization of ontology-based techniques to improve current mining approaches. These techniques are not only able to refine search intentions within specific generic domains, but also to access new knowledge by tracking semantic relations. In recent years, some researchers have attempted to build ontological user profiles according to discovered user background knowledge. This knowledge is drawn from both global and local analyses, which aim to produce tailored ontologies from a group of concepts. However, a key problem that has not been addressed is how to accurately match diverse local information to universal global knowledge. This research conducts a theoretical study on the use of personalized ontologies to enhance text mining performance. The objective is to understand user information needs by a "bag-of-concepts" rather than "words". The concepts are gathered from a general world knowledge base, the Library of Congress Subject Headings. To return desirable search results, a novel ontology-based mining approach is introduced to discover accurate search intentions and to learn personalized ontologies as user profiles. The approach can not only pinpoint users' individual intentions in a rough hierarchical structure, but can also interpret their needs through a set of acknowledged concepts. Along with the global and local analyses, a concept matching approach is developed to address the mismatch between local information and world knowledge. Relevance features produced by the Relevance Feature Discovery model are used as representatives of local information. These features have been proven to be the best alternative to user queries for avoiding ambiguity, and consistently outperform the features extracted by other filtering models. The two proposed approaches are evaluated in a scientific evaluation using the standard Reuters Corpus Volume 1 testing set. A comprehensive comparison is made with a number of state-of-the-art baseline models, including TF-IDF, Rocchio, Okapi BM25, the deployed Pattern Taxonomy Model, and an ontology-based model. The results indicate that top precision can be improved remarkably with the proposed ontology mining approach, and that the matching approach is successful, achieving significant improvements in most information filtering measurements. This research contributes to the fields of ontological filtering, user profiling, and knowledge representation. The related outputs are critical when systems are expected to return proper mining results and provide personalized services. The scientific findings have the potential to facilitate the design of advanced preference mining models that impact people's daily lives.
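As a toy illustration of the local-to-global matching problem (not the thesis' actual algorithm), the sketch below scores a handful of invented Library of Congress Subject Headings against a set of local relevance-feature terms using simple token overlap.

```python
# Toy local-to-global matching: heading strings and feature terms are
# invented, and Jaccard overlap stands in for the thesis' matching method.
lcsh_headings = [
    "Information storage and retrieval systems",
    "Data mining",
    "Ontologies (Information retrieval)",
]

def match_score(feature_terms, heading):
    h = set(heading.lower().replace("(", " ").replace(")", " ").split())
    f = set(feature_terms)
    return len(f & h) / len(f | h)  # Jaccard overlap of term sets

features = ["ontologies", "retrieval", "user"]  # local relevance features
best = max(lcsh_headings, key=lambda h: match_score(features, h))
print(best)  # -> "Ontologies (Information retrieval)"
```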
Abstract:
A people-to-people matching system (or match-making system) refers to a system that users join with the objective of meeting other users with a common need. Some real-world examples of these systems are employer-employee (in job search networks), mentor-student (in university social networks), consumer-to-consumer (in marketplaces) and male-female (in an online dating network). The network underlying these systems consists of two groups of users, and the relationships between users need to be captured in order to develop an efficient match-making system. Most existing studies utilize information either about each user in isolation or about their interactions separately, and develop recommender systems using only one form of information. It is imperative to understand the linkages among the users in the network and to use them in developing a match-making system. This study utilizes several social network analysis methods, such as graph theory, the small world phenomenon, centrality analysis and density analysis, to gain insight into the entities and relationships present in this network. This paper also proposes a new type of graph called the “attributed bipartite graph”. Using these analyses and the proposed type of graph, an efficient hybrid recommender system is developed which generates recommendations for new users and shows improved accuracy over the baseline methods.
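A minimal sketch of what an attributed bipartite graph could look like in networkx, with invented node attributes and interaction edges; the paper's own construction and recommender are not reproduced here.

```python
import networkx as nx

# Two user groups (bipartite=0/1) carrying node attributes, connected by
# weighted interaction edges; all attribute values are invented.
G = nx.Graph()
G.add_node("employer_1", bipartite=0, industry="IT", location="Brisbane")
G.add_node("employer_2", bipartite=0, industry="Finance", location="Sydney")
G.add_node("candidate_1", bipartite=1, skills={"java", "sql"})
G.add_node("candidate_2", bipartite=1, skills={"python"})
G.add_edge("employer_1", "candidate_1", interaction="viewed", weight=1.0)
G.add_edge("employer_1", "candidate_2", interaction="messaged", weight=3.0)

# Density and centrality analyses over the bipartition, as in the study.
employers = {n for n, d in G.nodes(data=True) if d.get("bipartite") == 0}
print(nx.bipartite.density(G, employers))
print(nx.degree_centrality(G))
```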
Abstract:
Using cooperative learning in classrooms promotes academic achievement, communication skills, problem-solving, social skills and student motivation. Yet it has been reported that cooperative learning, as a Western educational concept, may be ineffective in Asian cultural contexts. This study investigates the utilisation of scaffolding techniques for cooperative learning in Thai primary mathematics classes. A teacher training program was designed to foster Thai primary school teachers’ implementation of cooperative learning. Two teachers participated in this experimental program for one and a half weeks and then implemented cooperative learning strategies in their mathematics classes for six weeks. The data collected from teacher interviews and classroom observations indicate that the difficulty or failure of implementing cooperative learning in Thai education may not derive directly from cultural differences. Instead, they indicate that Thai culture can be constructively merged with cooperative learning through a teacher training program and the practice of scaffolding techniques.
Abstract:
Airport efficiency is important because it has a direct impact on customer safety and satisfaction, and therefore on the financial performance and sustainability of airports, airlines, and affiliated service providers. This is especially so in a world characterized by an increasing volume of both domestic and international air travel; price and other forms of competition between rival airports, airport hubs and airlines; and rapid and sometimes unexpected changes in airline routes and carriers. It also reflects expansion in the number of airports handling regional, national, and international traffic and the growth of complementary airport facilities, including industrial, commercial, and retail premises. This has fostered a steadily increasing volume of research aimed at modeling and providing best-practice measures and estimates of airport efficiency using mathematical and econometric frontiers. The purpose of this chapter is to review these various methods as they apply to airports throughout the world. Apart from discussing the strengths and weaknesses of the different approaches and their key findings, the chapter also examines the steps researchers face as they move through the modeling process in defining airport inputs and outputs and the purported efficiency drivers. Accordingly, the chapter provides guidance to those conducting empirical research on airport efficiency and serves as an aid for aviation regulators, airport operators and others in interpreting airport efficiency research outcomes.
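As a toy illustration of the frontier methods the chapter reviews, the sketch below solves an input-oriented CCR data envelopment analysis (DEA) model for one airport with scipy. The input/output data are invented and the formulation is the textbook one, not taken from the chapter.

```python
import numpy as np
from scipy.optimize import linprog

# Invented data: inputs = [runways, staff], outputs = [passengers, millions],
# columns are four hypothetical airports.
X = np.array([[2, 3, 2, 4], [300, 500, 350, 800]], dtype=float)
Y = np.array([[5.0, 8.0, 6.5, 9.0]])
n = X.shape[1]
k = 0  # airport under evaluation

# Decision variables: [theta, lambda_1..lambda_n]; minimise theta subject to
#   sum_j lambda_j * x_ij <= theta * x_ik   (inputs contracted by theta)
#   sum_j lambda_j * y_rj >= y_rk           (outputs at least maintained)
c = np.r_[1.0, np.zeros(n)]
A_in = np.hstack([-X[:, [k]], X])
A_out = np.hstack([np.zeros((Y.shape[0], 1)), -Y])
A_ub = np.vstack([A_in, A_out])
b_ub = np.r_[np.zeros(X.shape[0]), -Y[:, k]]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] + [(0, None)] * n)
print(f"efficiency score for airport {k}: {res.x[0]:.3f}")  # 1.0 = efficient
```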
Abstract:
Bluetooth technology is increasingly being used among Automated Vehicle Identification Systems to retrieve important information about urban networks. Because the movement of Bluetooth-equipped vehicles can be monitored throughout the network of Bluetooth sensors, this technology represents an effective means of acquiring accurate time-dependent Origin-Destination information. In order to obtain reliable estimations, however, a number of issues need to be addressed through data filtering and correction techniques. Some of the main challenges inherent to Bluetooth data are, first, that Bluetooth sensors may fail to detect all of the nearby Bluetooth-enabled vehicles; as a consequence, the exact journey of some vehicles may become a latent pattern that needs to be estimated. Second, sensors that are in close proximity to each other may have overlapping detection areas, making the task of retrieving the correct travelled path even more challenging. The aim of this paper is twofold: to give an overview of the issues inherent to the Bluetooth technology, through analysis of the data available from the Bluetooth sensors in Brisbane; and to propose a method for retrieving the itineraries of individual Bluetooth-equipped vehicles. We argue that estimating these latent itineraries accurately is a crucial step toward the retrieval of accurate dynamic Origin-Destination matrices.
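A bare-bones sketch of one way to recover a latent itinerary (not necessarily the paper's method): route between consecutive observed sensors over an invented sensor adjacency graph weighted by typical travel times, so that sensors missed through detection failures are filled back in.

```python
import networkx as nx

# Invented sensor adjacency graph; edge weights are typical travel
# times in seconds between neighbouring sensors.
road = nx.Graph()
road.add_weighted_edges_from([
    ("S1", "S2", 60), ("S2", "S3", 90), ("S2", "S4", 45),
    ("S4", "S3", 30), ("S1", "S4", 150),
])

observed = ["S1", "S3"]  # intermediate sensors missed by the detectors
itinerary = []
for a, b in zip(observed, observed[1:]):
    leg = nx.shortest_path(road, a, b, weight="weight")
    itinerary.extend(leg[:-1])  # avoid duplicating the junction sensor
itinerary.append(observed[-1])
print(itinerary)  # -> ['S1', 'S2', 'S4', 'S3']
```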
Abstract:
Topic modelling techniques such as Latent Dirichlet Allocation (LDA) were proposed to generate statistical models that represent multiple topics in a collection of documents, and have been widely utilized in fields such as machine learning and information retrieval; their effectiveness in information filtering, however, is largely unexplored. Patterns are generally considered more representative than single terms for representing documents. In this paper, a novel information filtering model, the Pattern-based Topic Model (PBTM), is proposed to represent text documents using both topic distributions at a general level and semantic pattern representations at a detailed, specific level, both of which contribute to accurate document representation and document relevance ranking. Extensive experiments are conducted to evaluate the effectiveness of PBTM using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model achieves outstanding performance.
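The sketch below reproduces only the general-level layer of this idea with scikit-learn's LDA on a toy corpus: document-topic distributions as the coarse representation. PBTM's pattern-based specific-level representation is not shown, and the corpus is invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus: two finance documents and two sport documents.
docs = [
    "stock market trading shares rise",
    "market shares fall amid trading fears",
    "team wins final match in overtime",
    "coach praises team after match win",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# General-level representation: one topic distribution per document.
doc_topics = lda.transform(counts)
print(doc_topics.round(2))
```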
Abstract:
This paper presents a comparative study of the response of a buried tunnel to surface blast using the arbitrary Lagrangian-Eulerian (ALE) and smoothed particle hydrodynamics (SPH) techniques. Since explosive tests with real physical models are extremely risky and expensive, the results of a centrifuge test were used to validate the numerical techniques. The numerical study shows that the ALE predictions were faster and closer to the experimental results than those from the SPH simulations, which over-predicted the strains. The findings of this research demonstrate the superiority of the ALE modelling techniques for the present study. They also provide a comprehensive understanding of the preferred ALE modelling techniques, which can be used to investigate the surface blast response of underground tunnels.
Abstract:
Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a ‘noisy’ environment such as contemporary social media, is to collect the pertinent information: be that information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or information which gives an advantage to those involved in prediction markets. Often, such a process is iterative, with keywords and hashtags changing with the passage of time, and both collection and analytic methodologies need to be continually adapted to respond to this changing information. While many of the datasets collected and analyzed are preformed, that is, built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and report findings to stakeholders.

The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting those which need to be immediately placed in front of responders for manual filtering and possible action. The paper suggests two solutions for this: content analysis and user profiling. In the former case, aspects of the tweet are assigned a score to assess its likely relationship to the topic at hand and the urgency of the information, whilst the latter attempts to identify those users who either serve as amplifiers of information or are known as an authoritative source. Through these techniques, the information contained in a large dataset can be filtered down to match the expected capacity of emergency responders, and knowledge of the core keywords or hashtags relating to the current event is constantly refined for future data collection.

The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to a particular prediction market: tennis betting. As increasing numbers of professional sportsmen and women create Twitter accounts to communicate with their fans, information is being shared regarding injuries, form and emotions which has the potential to impact on future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, as with American Football (NFL) and Baseball (MLB), has paid subscription services which manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, whilst such services are still niche operations, much of the value of the information is lost by the time it reaches one of these services. The paper thus considers how information could be filtered from Twitter user lists and hashtag or keyword monitoring, assessing the value of the source, the information, and the prediction markets to which it may relate.

The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect the tweets and make them available for exploration and analysis.
A strategy to respond to changes in the social movement is also required, or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list was created; in this sense, keyword creation is part strategy and part art. In this paper we describe strategies for the creation of a social media archive, specifically of tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement’s presence on Twitter changes over time. We also discuss the opportunities and methods for extracting smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study. The common theme amongst these papers is that of constructing a data set, filtering it for a specific purpose, and then using the resulting information to aid future data collection. The intention is that, through the papers presented and the subsequent discussion, the panel will inform the wider research community not only about the objectives and limitations of data collection, live analytics, and filtering, but also about current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches could be customized depending on the project stakeholders.
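In the spirit of the first paper's content-analysis proposal, the toy scorer below ranks incoming tweets by topical relevance and urgency and keeps only as many as responders can handle; the keyword lists, weights and capacity value are all invented.

```python
# Toy content-analysis triage: score tweets for relevance and urgency,
# then cap the output at the responders' capacity. All terms are invented.
TOPIC_TERMS = {"flood", "evacuation", "road", "closed"}
URGENT_TERMS = {"trapped", "help", "urgent", "now"}

def score(tweet: str) -> float:
    words = set(tweet.lower().split())
    relevance = len(words & TOPIC_TERMS)
    urgency = 2 * len(words & URGENT_TERMS)  # urgency weighted higher
    return relevance + urgency

stream = [
    "Main road closed by flood water",
    "Family trapped on roof urgent help needed now",
    "Lovely sunny day at the beach",
]
capacity = 2  # how many tweets responders can manually triage
triaged = sorted(stream, key=score, reverse=True)[:capacity]
print(triaged)  # highest-priority tweets first, low scorers dropped
```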
Abstract:
This paper proposes techniques to improve the performance of i-vector based speaker verification systems when only short utterances are available. Short-utterance i-vectors vary with speaker, session variations, and the phonetic content of the utterance. Well-established methods such as linear discriminant analysis (LDA), source-normalized LDA (SN-LDA) and within-class covariance normalisation (WCCN) exist for compensating for session variation, but we have identified the variability introduced by phonetic content, due to utterance variation, as an additional source of degradation when short-duration utterances are used. To compensate for utterance variations in short-utterance i-vector speaker verification systems using cosine similarity scoring (CSS), we introduce a short utterance variance normalization (SUVN) technique and a short utterance variance (SUV) modelling approach at the i-vector feature level. A combination of SUVN with LDA and SN-LDA is proposed to compensate for session and utterance variations, and is shown to improve on the traditional approach of using LDA and/or SN-LDA followed by WCCN. An alternative approach is also introduced, using a probabilistic linear discriminant analysis (PLDA) approach to model the SUV directly. The combination of SUVN, LDA and SN-LDA followed by SUV PLDA modelling provides an improvement over the baseline PLDA approach. We also show that, for this combination of techniques, the utterance variation information needs to be artificially added to full-length i-vectors for PLDA modelling.
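A minimal sketch of the cosine similarity scoring (CSS) step itself, on synthetic vectors: after SUVN/LDA-style compensation, verification reduces to comparing a cosine score against a development-set threshold. The vector dimension and threshold are invented for the example.

```python
import numpy as np

def css(enrol: np.ndarray, test: np.ndarray) -> float:
    """Cosine similarity score between two compensated i-vectors."""
    return float(enrol @ test / (np.linalg.norm(enrol) * np.linalg.norm(test)))

rng = np.random.default_rng(2)
enrol_ivec = rng.normal(size=50)                       # enrolment i-vector
test_ivec = enrol_ivec + rng.normal(scale=0.5, size=50)  # same-speaker trial

score = css(enrol_ivec, test_ivec)
accept = score > 0.6  # threshold would be tuned on development data
print(f"score={score:.3f}, accept={accept}")
```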