893 results for Feature Extraction Algorithms


Relevance:

30.00%

Publisher:

Abstract:

Microposts are small fragments of social media content published using a lightweight paradigm (e.g. Tweets, Facebook likes, Foursquare check-ins). Microposts have been used for a variety of applications (e.g. sentiment analysis, opinion mining, trend analysis) by gleaning useful information, often using third-party concept extraction tools. There has been a very large uptake of such tools in the last few years, along with the creation and adoption of new methods for concept extraction. However, the evaluation of such efforts has been largely confined to document corpora (e.g. news articles), which calls into question the suitability of concept extraction tools and methods for Micropost data. This report describes the Making Sense of Microposts Workshop (#MSM2013) Concept Extraction Challenge, hosted in conjunction with the 2013 World Wide Web Conference (WWW'13). The Challenge dataset comprised a manually annotated training corpus of Microposts and an unlabelled test corpus. Participants were set the task of engineering a concept extraction system for a defined set of concepts. Out of a total of 22 complete submissions, 13 were accepted for presentation at the workshop; the submissions covered methods ranging from sequence mining algorithms for attribute extraction to part-of-speech tagging for Micropost cleaning, and rule-based and discriminative models for token classification. In this report we describe the evaluation process and explain the performance of different approaches in different contexts.
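
A concrete illustration of the kind of lightweight, rule-based concept extraction several submissions built on is sketched below. This is not any participant's system; the gazetteer, the cleaning step, and the capitalisation heuristic are illustrative assumptions.

```python
import re

# Toy gazetteer mapping surface forms to concept types; a real system would
# draw on a knowledge base (e.g. DBpedia) rather than a hand-written dict.
GAZETTEER = {"london": "LOCATION", "obama": "PERSON", "twitter": "ORGANIZATION"}

def extract_concepts(micropost: str):
    """Return (token, concept type) pairs found in a single Micropost."""
    # Cleaning step: drop URLs and user mentions before token classification.
    cleaned = re.sub(r"(https?://\S+|@\w+)", " ", micropost)
    concepts = []
    for token in re.findall(r"#?\w+", cleaned):
        surface = token.lstrip("#").lower()
        if surface in GAZETTEER:                      # rule 1: gazetteer hit
            concepts.append((token, GAZETTEER[surface]))
        elif token[0].isupper() and len(token) > 2:   # rule 2: capitalised candidate
            concepts.append((token, "CANDIDATE"))
    return concepts

print(extract_concepts("Watching #Obama speak in London http://t.co/x"))
```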

Relevance:

30.00%

Publisher:

Abstract:

Differential evolution is an optimisation technique that has been successfully employed in various applications. In this paper, we apply differential evolution to the problem of extracting the optimal colours of a colour map for quantised images. The choice of entries in the colour map is crucial for the resulting image quality, as it forms a look-up table that is used for all pixels in the image. We show that differential evolution can be effectively employed as a method for deriving the entries in the map. In order to optimise the image quality, our differential evolution approach is combined with a local search method that is guaranteed to find the locally optimal colour map. This hybrid approach is shown to outperform various commonly used colour quantisation algorithms on a set of standard images. Copyright © 2010 Inderscience Enterprises Ltd.
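
The core idea can be sketched as a classic DE/rand/1/bin loop over flattened colour maps, with the quantisation error as fitness. This is a minimal illustration rather than the paper's hybrid method: the local search refinement is omitted and all parameter values are assumptions.

```python
import numpy as np

def quantisation_error(palette, pixels):
    """Mean squared error when every pixel is mapped to its nearest palette colour."""
    d = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def de_colour_map(pixels, k=16, pop_size=20, generations=200, F=0.5, CR=0.9, seed=0):
    """Evolve a k-entry colour map for the given (N, 3) pixel array."""
    rng = np.random.default_rng(seed)
    dim = k * 3
    pop = rng.uniform(0, 255, size=(pop_size, dim))
    fit = np.array([quantisation_error(p.reshape(k, 3), pixels) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0, 255)               # differential mutation
            trial = np.where(rng.random(dim) < CR, mutant, pop[i])  # binomial crossover
            f_trial = quantisation_error(trial.reshape(k, 3), pixels)
            if f_trial < fit[i]:                                    # greedy selection
                pop[i], fit[i] = trial, f_trial
    return pop[fit.argmin()].reshape(k, 3)

# Usage: palette = de_colour_map(image.reshape(-1, 3).astype(float), k=8)
```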

Relevance:

30.00%

Publisher:

Abstract:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge lies in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator (message) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or a Bayesian variable selection method, calculates the 'median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to the usual competitors.
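
The workflow described above can be summarised in a short sketch. This is a simplified, single-machine illustration under assumptions of my own (the lasso as the per-subset selector, ordinary least squares for the refit); it is not the thesis implementation.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def message_estimate(X, y, n_subsets=4, seed=0):
    """Sketch of the message idea: per-subset selection, median inclusion,
    per-subset refitting on the selected features, and averaging."""
    rng = np.random.default_rng(seed)
    subsets = np.array_split(rng.permutation(len(y)), n_subsets)

    # Step 1: feature selection on each subset (run in parallel in practice).
    inclusion = np.zeros((n_subsets, X.shape[1]))
    for s, rows in enumerate(subsets):
        inclusion[s] = LassoCV(cv=5).fit(X[rows], y[rows]).coef_ != 0

    # Step 2: median feature-inclusion index across subsets.
    selected = np.median(inclusion, axis=0) >= 0.5

    # Step 3: estimate coefficients for the selected features on each subset,
    # then average the estimates.
    coefs = [LinearRegression().fit(X[rows][:, selected], y[rows]).coef_
             for rows in subsets]
    return selected, np.mean(coefs, axis=0)
```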

While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In this thesis, I propose a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
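
A heavily simplified, single-machine sketch of the decorrelate-then-partition idea follows; the exact form of the decorrelation matrix, the ridge term, and the use of the lasso on each column block are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def deco_fit(X, y, n_workers=4, ridge=1.0, seed=0):
    """Decorrelate the design, split columns across workers, fit each block."""
    n, p = X.shape
    rng = np.random.default_rng(seed)

    # Decorrelation step (assumed form): F = (X X^T / p + r I)^(-1/2),
    # applied to both X and y before the feature-space partition.
    G = X @ X.T / p + ridge * np.eye(n)
    U, s, _ = np.linalg.svd(G)
    F = U @ np.diag(1.0 / np.sqrt(s)) @ U.T
    X_t, y_t = F @ X, F @ y

    # Each worker receives a block of decorrelated columns and fits it independently.
    coef = np.zeros(p)
    for cols in np.array_split(rng.permutation(p), n_workers):
        coef[cols] = LassoCV(cv=5).fit(X_t[:, cols], y_t).coef_
    return coef
```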

For datasets with both large sample sizes and high dimensionality, I propose a new divide-and-conquer framework, DEME (DECO-message), that leverages both the DECO and the message algorithms. The new framework first partitions the dataset in the sample space into row cubes using message and then partitions the feature space of the cubes using DECO. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthesized via the DECO and message algorithms in reverse order to produce the final output. The whole framework is extremely scalable.

Relevance:

30.00%

Publisher:

Abstract:

Monitoring and tracking of IP traffic flows are essential for network services (e.g. packet forwarding). Packet header lookup is the main part of flow identification: it determines the predefined matching action for each incoming flow. In this paper, an improved header lookup and flow rule update solution is investigated. A detailed study of several well-known lookup algorithms reveals that searching individual packet header fields and combining the results achieves high lookup speed and flexibility. The proposed hybrid lookup architecture comprises various lookup algorithms, which are selected based on the user applications and system requirements.
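
A toy illustration of the field-wise search-and-combine idea is given below: each header field is looked up independently, the per-field rule sets are intersected, and the highest-priority survivor wins. The rule table, fields, and priority convention are invented for the example.

```python
from ipaddress import ip_address, ip_network

# Illustrative rule table: (rule id, source prefix, destination port or None for wildcard).
# Lower rule id = higher priority.
RULES = [
    (0, "10.0.0.0/8", 80),
    (1, "10.1.0.0/16", None),
    (2, "0.0.0.0/0", 443),
]

def lookup(src_ip, dst_port):
    src = ip_address(src_ip)
    # Field 1: source-prefix lookup.
    by_src = {r for r, prefix, _ in RULES if src in ip_network(prefix)}
    # Field 2: destination-port lookup (exact match or wildcard).
    by_port = {r for r, _, port in RULES if port is None or port == dst_port}
    # Combine the per-field results and apply the priority rule.
    matches = by_src & by_port
    return min(matches) if matches else None

print(lookup("10.1.2.3", 80))   # -> 0
print(lookup("10.1.2.3", 22))   # -> 1
print(lookup("8.8.8.8", 443))   # -> 2
```

In hardware or high-rate software implementations each per-field lookup would typically be a longest-prefix-match trie or hash table and the combination a bit-vector intersection, but the control flow is the same.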

Relevance:

30.00%

Publisher:

Abstract:

Data mining can be defined as the extraction of implicit, previously unknown, and potentially useful information from data. Numerous researchers have been developing security technology and exploring new methods to detect cyber-attacks with the DARPA 1998 dataset for Intrusion Detection and its modified versions, KDDCup99 and NSL-KDD, but until now no one has examined the performance of the Top 10 data mining algorithms selected by experts in data mining. The classification learning algorithms compared in this thesis are C4.5, CART, k-NN and Naïve Bayes. The performance of these algorithms is compared in terms of accuracy, error rate and average cost on modified versions of the NSL-KDD train and test datasets, where the instances are classified into normal and four cyber-attack categories: DoS, Probing, R2L and U2R. Additionally, the most important features for detecting cyber-attacks across all categories and within each category are evaluated with Weka's Attribute Evaluator and ranked according to Information Gain. The results show that the classification algorithm with the best performance on the dataset is the k-NN algorithm. The most important features for detecting cyber-attacks are basic features such as the duration of a network connection in seconds, the protocol used for the connection, the network service used, the normal or error status of the connection and the number of data bytes sent. The most important features for detecting DoS, Probing and R2L attacks are basic features, and the least important are content features; for U2R attacks, in contrast, the content features are the most important.
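
For reference, the evaluated pipeline (information-gain feature ranking plus a k-NN classifier) can be reproduced in outline with scikit-learn instead of Weka. The file paths, column names, and preprocessing are placeholders, not the thesis setup.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Placeholder paths to an already numeric, labelled NSL-KDD-style table.
train = pd.read_csv("nslkdd_train_preprocessed.csv")
test = pd.read_csv("nslkdd_test_preprocessed.csv")
X_train, y_train = train.drop(columns="label"), train["label"]
X_test, y_test = test.drop(columns="label"), test["label"]

# Rank features by information gain (mutual information with the class label).
gain = pd.Series(mutual_info_classif(X_train, y_train), index=X_train.columns)
print(gain.sort_values(ascending=False).head(10))

# k-NN classifier on standardised features.
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
print("accuracy:", knn.score(scaler.transform(X_test), y_test))
```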

Relevance:

30.00%

Publisher:

Abstract:

In computer vision, training a model that performs classification effectively is highly dependent on the extracted features and the number of training instances. Conventionally, feature detection and extraction are performed by a domain expert who, in many cases, is expensive to employ and hard to find. Therefore, image descriptors have emerged to automate these tasks. However, designing an image descriptor still requires domain-expert intervention. Moreover, the majority of machine learning algorithms require a large number of training examples to perform well, yet labelled data is not always available or easy to acquire, and dealing with a large dataset can dramatically slow down the training process. In this paper, we propose a novel Genetic Programming based method that automatically synthesises a descriptor using only two training instances per class. The proposed method combines arithmetic operators to evolve a model that takes an image and generates a feature vector. The performance of the proposed method is assessed using six datasets for texture classification with different degrees of rotation, and is compared with seven domain-expert designed descriptors. The results show that the proposed method is robust to rotation and has significantly outperformed, or achieved comparable performance to, the baseline methods.
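
The sketch below shows only how one candidate descriptor (an arithmetic combination of window statistics, of the kind Genetic Programming might evolve) turns an image into a fixed-length feature vector; the evolutionary search itself, and the actual operator set of the proposed method, are not reproduced.

```python
import numpy as np

def apply_descriptor(image, expr, window=3, bins=16):
    """Apply a candidate expression to every pixel window and histogram the responses."""
    h, w = image.shape
    r = window // 2
    responses = [expr(image[i - r:i + r + 1, j - r:j + r + 1].astype(float))
                 for i in range(r, h - r) for j in range(r, w - r)]
    hist, _ = np.histogram(responses, bins=bins)
    return hist / max(hist.sum(), 1)          # normalised feature vector

# One hypothetical evolved individual: arithmetic operators over window statistics.
candidate = lambda win: win.mean() - win.min() + 0.5 * win.std()

img = np.random.default_rng(0).integers(0, 256, size=(32, 32))
print(apply_descriptor(img, candidate))
```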

Relevance:

30.00%

Publisher:

Abstract:

Passive infrared (PIR) sensors are widely used in many applications, including motion detectors for alarms, lighting systems and hand dryers. Combinations of multiple PIR sensors have also been used to count the number of people passing through doorways. In this paper, we demonstrate the potential of the PIR sensor as a tool for occupancy estimation inside a monitored environment. Our approach shows how flexible nonparametric machine learning algorithms extract useful information about occupancy from a single PIR sensor, allowing us to understand and make use of the motion patterns generated by people within the monitored environment. The proposed counting system uses information about those patterns to provide an accurate estimate of room occupancy which can be updated every 30 seconds. The system was successfully tested on data from more than 50 real office meetings with at most 14 room occupants.
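
A hedged sketch of the overall pipeline is given below: simple statistics of the PIR activation stream are computed per 30-second window and fed to a nonparametric regressor. The specific features and the random-forest model are stand-ins, not the estimator used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def window_features(pir, window=30):
    """Summarise a binary PIR stream (1 sample/s) over consecutive 30 s windows."""
    blocks = pir[: len(pir) // window * window].reshape(-1, window)
    return np.column_stack([
        blocks.mean(axis=1),                          # fraction of time triggered
        np.abs(np.diff(blocks, axis=1)).sum(axis=1),  # number of on/off transitions
        blocks.max(axis=1),                           # any motion at all
    ])

# Placeholder training data: pir_train is the activation stream, occ_train the
# ground-truth occupancy per window (e.g. from annotated meetings).
rng = np.random.default_rng(0)
pir_train = rng.integers(0, 2, size=30 * 200)
occ_train = rng.integers(0, 15, size=200)

model = RandomForestRegressor(n_estimators=100).fit(window_features(pir_train), occ_train)
print(model.predict(window_features(pir_train[:30 * 5])))   # estimates for 5 windows
```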

Relevance:

30.00%

Publisher:

Abstract:

Crop monitoring and, more generally, land-use change detection are of primary importance for analyzing spatio-temporal dynamics and their impacts on the environment. This is especially true in a region such as the State of Mato Grosso (southern Brazilian Amazon Basin), which hosts an intensive pioneer front. Deforestation in this region has often been attributed to soybean expansion over the last three decades. Remote sensing techniques now offer an efficient and objective means of quantifying, through crop mapping studies, the extent to which crop expansion actually drives deforestation. Given the characteristics of soybean farms in Mato Grosso (farm areas between 1,000 and 40,000 hectares, with individual fields often larger than 100 hectares), Moderate Resolution Imaging Spectroradiometer (MODIS) data, with near-daily temporal resolution and 250 m spatial resolution, are an adequate resource for crop mapping. In particular, multitemporal vegetation index (VI) studies have commonly been used for this task [1] [2]. In this study, 16-day EVI composites (MOD13Q1 product) are used. However, although these data are already processed, multitemporal VI profiles remain noisy due to cloudiness (extremely frequent in a tropical region such as the southern Amazon Basin), sensor problems, errors in atmospheric correction, or BRDF effects. Many works have therefore developed algorithms to smooth multitemporal VI profiles and improve subsequent classification. The goal of this study is to compare and test different smoothing algorithms in order to select the one best suited to crop classification. The classes correspond to six different agricultural managements observed in Mato Grosso through intensive field work that resulted in mapping more than 1,000 individual fields. These managements are based on combinations of soy, cotton, corn, millet and sorghum sown in single- or double-crop systems. Because some classes have very similar agricultural calendars and are difficult to separate, the classification is reduced to three classes: cotton (single crop), soy and cotton (double crop), and soy (single or double crop with corn, millet or sorghum). The classification uses training data from the 2005-2006 harvest and is then tested on the 2006-2007 harvest. In a first step, four smoothing techniques are presented and assessed: Best Index Slope Extraction (BISE) [3], Mean Value Iteration (MVI) [4], Weighted Least Squares (WLS) [5] and the Savitzky-Golay filter (SG) [6] [7]. These techniques are implemented and visually compared on a few individual pixels, allowing a first selection among the studied techniques. The WLS and SG techniques are selected according to the criteria proposed by [8]: the ability to eliminate frequent noise, to preserve the upper values of the VI profiles, and to keep the temporality of the profiles. The selected algorithms are then applied to the MODIS/TERRA EVI data (16-day composite periods). Separability tests based on the Jeffries-Matusita distance are performed to check whether the algorithms improve the potential for differentiating between the classes. These tests are carried out on the overall profile (comprising 23 MODIS images) as well as on each MODIS sub-period of the profile [1].
This last test serves two purposes: it allows the smoothing techniques to be compared, and it helps select a set of images that carries more information on the separability between the classes. The selected dates can then be used for supervised classification. Three different classifiers are tested to evaluate whether the smoothing techniques affect the classification differently depending on the classifier used: the Maximum Likelihood classifier, the Spectral Angle Mapper (SAM) classifier and a CHAID improved decision tree. The separability tests on the overall profile show that the smoothed profiles do not substantially improve the potential for discriminating between classes compared with the original data. However, the same tests on the MODIS sub-periods show better results with the smoothing algorithms. The classification results confirm this first analysis. The Kappa coefficients are always better with the smoothing techniques, and the results obtained with the WLS and SG smoothed profiles are nearly equal. However, the results differ depending on the classifier used. The impact of the smoothing algorithms is greatest with the decision tree model, which gains 0.1 in the Kappa coefficient; with the Maximum Likelihood and SAM models the gain remains positive but much lower (Kappa improved by only 0.02). This work thus demonstrates the value of smoothing VI profiles to improve the final results; the choice of smoothing algorithm, however, must take into account the original data and the classifier used. In this case, the Savitzky-Golay filter gave the best results.
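
As a point of reference, applying the Savitzky-Golay filter to a single pixel's 23-composite EVI profile takes one call in SciPy; the window length and polynomial order below are illustrative choices, not the parameters used in the study.

```python
import numpy as np
from scipy.signal import savgol_filter

# One pixel's annual EVI profile: 23 values from 16-day MODIS composites (synthetic).
evi = np.array([0.21, 0.25, 0.18, 0.35, 0.52, 0.61, 0.58, 0.66, 0.44, 0.39,
                0.28, 0.24, 0.22, 0.30, 0.47, 0.59, 0.63, 0.55, 0.41, 0.33,
                0.27, 0.23, 0.22])

smoothed = savgol_filter(evi, window_length=5, polyorder=2)
```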

Relevance:

30.00%

Publisher:

Abstract:

In the field of vibration qualification testing, with the popular Random Control mode of shakers, the specimen is excited by random vibrations typically set in the form of a Power Spectral Density (PSD). The corresponding signals are stationary and Gaussian, i.e. featuring a normal distribution. Conversely, real-life excitations are frequently non-Gaussian, exhibiting high peaks and/or burst signals and/or deterministic harmonic components. The so-called kurtosis is a parameter often used to statistically describe the occurrence and significance of high peak values in a random process. Since the similarity between test input profiles and real-life excitations is fundamental for qualification test reliability, some methods of kurtosis-control can be implemented to synthesize realistic (non-Gaussian) input signals. Durability tests are performed to check the resistance of a component to vibration-based fatigue damage. A procedure to synthesize test excitations which starts from measured data and preserves both the damage potential and the characteristics of the reference signals is desirable. The Fatigue Damage Spectrum (FDS) is generally used to quantify the fatigue damage potential associated with the excitation. The signal synthesized for accelerated durability tests (i.e. with a limited duration) must feature the same FDS as the reference vibration computed for the component’s expected lifetime. Current standard procedures are efficient in synthesizing signals in the form of a PSD, but prove inaccurate if reference data are non-Gaussian. This work presents novel algorithms for the synthesis of accelerated durability test profiles with prescribed FDS and a non-Gaussian distribution. An experimental campaign is conducted to validate the algorithms, by testing their accuracy, robustness, and practical effectiveness. Moreover, an original procedure is proposed for the estimation of the fatigue damage potential, aiming to minimize the computational time. The research is thus supposed to improve both the effectiveness and the efficiency of excitation profile synthesis for accelerated durability tests.
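
Two of the quantities the abstract relies on, the kurtosis of a vibration record and its PSD, can be computed as follows; the sampling rate, burst injection, and Welch parameters are illustrative, and the synthesis algorithms themselves are not reproduced here.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis

fs = 2048                                  # sampling rate in Hz (illustrative)
rng = np.random.default_rng(0)
accel = rng.standard_normal(60 * fs)       # stand-in for a measured acceleration record
accel[::4096] += 8.0                       # injected bursts make the signal non-Gaussian

# Pearson kurtosis: 3 for a Gaussian process, larger when high peaks occur.
print("kurtosis:", kurtosis(accel, fisher=False))

# Welch estimate of the Power Spectral Density.
freqs, psd = welch(accel, fs=fs, nperseg=4096)
```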

Relevance:

20.00%

Publisher:

Abstract:

To detect the presence of male DNA in vaginal samples collected from survivors of sexual violence and stored on filter paper. A pilot study was conducted to evaluate 10 vaginal samples spotted on sterile filter paper: 6 collected at random in April 2009 and 4 in October 2010. The time between sexual assault and sample collection was 4-48 hours. After drying at room temperature, the samples were placed in a sterile envelope and stored for 2-3 years until processing. DNA extraction was confirmed by polymerase chain reaction for human β-globin, and the presence of prostate-specific antigen (PSA) was quantified. The presence of the Y chromosome was detected using primers for sequences in the TSPY (Y7/Y8 and DYS14) and SRY genes. β-Globin was detected in all 10 samples, while 2 samples were positive for PSA. Half of the samples amplified the Y7/Y8 and DYS14 sequences of the TSPY gene and 30% amplified the SRY gene sequence of the Y chromosome. Four male samples and 1 female sample served as controls. Filter-paper spots stored for periods of up to 3 years proved adequate for preserving genetic material from vaginal samples collected following sexual violence.

Relevance:

20.00%

Publisher:

Abstract:

In the current study, a new approach has been developed for correcting the effect that moisture reduction after virgin olive oil (VOO) filtration exerts on the apparent increase of the secoiridoid content by using an internal standard during extraction. Firstly, two main Spanish varieties (Picual and Hojiblanca) were submitted to industrial filtration of VOOs. Afterwards, the moisture content was determined in unfiltered and filtered VOOs, and liquid-liquid extraction of phenolic compounds was performed using different internal standards. The resulting extracts were analyzed by HPLC-ESI-TOF/MS, in order to gain maximum information concerning the phenolic profiles of the samples under study. The reduction effect of filtration on the moisture content, phenolic alcohols, and flavones was confirmed at the industrial scale. Oleuropein was chosen as internal standard and, for the first time, the apparent increase of secoiridoids in filtered VOO was corrected, using a correction coefficient (Cc) calculated from the variation of internal standard area in filtered and unfiltered VOO during extraction. This approach gave the real concentration of secoiridoids in filtered VOO, and clarified the effect of the filtration step on the phenolic fraction. This finding is of great importance for future studies that seek to quantify phenolic compounds in VOOs.
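
The abstract does not state the correction explicitly; one plausible reading, given that the internal standard (oleuropein) is added in a known amount before extraction, is

Cc = (internal standard area in the unfiltered-VOO extract) / (internal standard area in the filtered-VOO extract)
corrected secoiridoid concentration = Cc × apparent concentration in filtered VOO

which should be treated as an assumption rather than the authors' exact formula.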

Relevance:

20.00%

Publisher:

Abstract:

Originally from Asia, Dovyalis hebecarpa is a dark purple/red exotic berry now also produced in Brazil. However, no reports were found in the literature on phenolic extraction or characterisation of this berry. In this study we evaluate the extraction optimisation of anthocyanins and total phenolics in D. hebecarpa berries, aiming at the development of a simple and mild analytical technique. Multivariate analysis was used to optimise the extraction variables (ethanol:water:acetone solvent proportions, times, and acid concentrations) at different levels. Acetone/water (20/80 v/v) gave the highest anthocyanin extraction yield, but pure water and different proportions of acetone/water or acetone/ethanol/water (with >50% water) were also effective. Neither acid concentration nor time had a significant effect on extraction efficiency, allowing the recommended parameters to be fixed at the lowest values tested (0.35% formic acid v/v and 17.6 min). Under optimised conditions, extraction efficiencies increased by 31.5% and 11% for anthocyanins and total phenolics, respectively, compared with traditional methods that use more solvent and time. Thus, the optimised methodology increased yields while being less hazardous and time-consuming than traditional methods. Finally, freeze-dried D. hebecarpa showed a high content of the target phytochemicals (319 mg/100 g total anthocyanins and 1,421 mg/100 g total phenolics).

Relevance:

20.00%

Publisher:

Abstract:

Extraction processes are widely used in the chemical, biotechnological and pharmaceutical industries for the recovery of bioactive compounds from medicinal plants. To replace conventional extraction techniques, new techniques such as high-pressure extraction processes that use environmentally friendly solvents have been developed. However, these techniques are sometimes associated with low extraction rates. Ultrasound can be used effectively to improve the extraction rate by increasing mass transfer and possibly rupturing cell walls through the formation of microcavities, leading to higher product yields with reduced processing time and solvent consumption. This review presents a brief survey of the mechanism and the factors affecting ultrasound-assisted extraction, focusing on the use of ultrasound irradiation to intensify high-pressure extraction processes.

Relevance:

20.00%

Publisher:

Abstract:

Purified genomic DNA can be difficult to obtain from some plant species because of the presence of impurities such as polysaccharides, which are often co-extracted with DNA. In this study, we developed a fast, simple, and low-cost protocol for extracting DNA from plants containing high levels of secondary metabolites. This protocol does not require the use of volatile toxic reagents such as mercaptoethanol, chloroform, or phenol and allows the extraction of high-quality DNA from wild and cultivated tropical species.

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we address the problem of picking a subset of bids in a general combinatorial auction so as to maximize the overall profit using the first-price model. This winner determination problem assumes that a single bidding round is held to determine both the winners and the prices to be paid. We introduce six variants of biased random-key genetic algorithms for this problem. Three of them use a novel initialization technique that uses solutions of intermediate linear programming relaxations of an exact mixed integer linear programming model as initial chromosomes of the population. An experimental evaluation compares the effectiveness of the proposed algorithms with the standard mixed integer linear programming formulation, a specialized exact algorithm, and the best-performing heuristics proposed for this problem. The proposed algorithms are competitive and offer strong results, mainly for large-scale auctions.
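
The decoder that turns a vector of random keys into a feasible set of winning bids is the problem-specific heart of such algorithms; a minimal sketch on a toy instance follows. The greedy decoding rule and the instance are illustrative assumptions, and the full BRKGA loop (elite-biased crossover, mutant insertion, LP-based warm starts) is not shown.

```python
import numpy as np

# Toy combinatorial auction: each bid is (price, set of item ids).
BIDS = [(10.0, {0, 1}), (14.0, {1, 2}), (6.0, {2}), (8.0, {0, 3})]

def decode(keys):
    """Map random keys in [0, 1] to a feasible bid selection and its profit."""
    order = np.argsort(keys)                 # smaller key = considered earlier
    taken, profit, chosen = set(), 0.0, []
    for b in order:
        price, items = BIDS[b]
        if not (items & taken):              # accept only if all items are free
            chosen.append(int(b))
            taken |= items
            profit += price
    return profit, chosen

# Random-search stand-in for the evolutionary loop, just to exercise the decoder.
rng = np.random.default_rng(0)
best = max(decode(rng.random(len(BIDS))) for _ in range(1000))
print(best)   # (best profit found, winning bids)
```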