996 results for over-fitting


Relevance: 60.00%

Abstract:

Document classification is a supervised machine learning process in which predefined category labels are assigned to documents based on a hypothesis derived from a training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, Resource selection for domain-specific cross-lingual IR, in SIGIR 2004: Proceedings of the 27th annual international conference on Research and Development in Information Retrieval, ACM Press, Sheffield, United Kingdom, pp. 154-161.] pointed out that the effectiveness of a document classification system may vary across domains. This implies that the quality of the document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on model candidates. However, this kind of evaluation may encounter either under-fitting or over-fitting problems, because the effectiveness scores are restricted by the learning capacities of the classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while still being able to provide satisfactory effectiveness with a small feature subset. Our experiments demonstrate how the fitness of models is assessed. The results of our work contribute to research on feature selection, dimensionality reduction and document classification.

Relevance: 60.00%

Abstract:

We propose a Bayesian framework for regression problems, which covers areas which are usually dealt with by function approximation. An online learning algorithm is derived which solves regression problems with a Kalman filter. Its solution always improves with increasing model complexity, without the risk of over-fitting. In the infinite dimension limit it approaches the true Bayesian posterior. The issues of prior selection and over-fitting are also discussed, showing that some of the commonly held beliefs are misleading. The practical implementation is summarised. Simulations using 13 popular publicly available data sets are used to demonstrate the method and highlight important issues concerning the choice of priors.
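To make the idea of solving a regression problem with a Kalman filter concrete, here is a minimal sketch. It is not the algorithm of the abstract above: it assumes a model that is linear in two fixed features (a bias term and x), a known observation noise variance, and an isotropic Gaussian prior on the weights. The function names and default values are hypothetical.

```python
def kalman_regression_update(mean, cov, phi, y, noise_var):
    """One Kalman update of a Gaussian weight posterior for y = w . phi + noise.

    The weights are treated as a static state, so there is no prediction step.
    """
    n = len(mean)
    # Predicted observation under the current posterior mean.
    y_pred = sum(m * p for m, p in zip(mean, phi))
    # cov_phi = P @ phi, reused for the gain and the covariance update.
    cov_phi = [sum(cov[i][j] * phi[j] for j in range(n)) for i in range(n)]
    # Innovation variance: phi^T P phi + observation noise.
    s = sum(phi[i] * cov_phi[i] for i in range(n)) + noise_var
    gain = [c / s for c in cov_phi]
    new_mean = [mean[i] + gain[i] * (y - y_pred) for i in range(n)]
    # P_new = P - K (P phi)^T, valid because P is symmetric.
    new_cov = [[cov[i][j] - gain[i] * cov_phi[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


def fit_online(xs, ys, prior_var=10.0, noise_var=0.01):
    """Process (x, y) pairs one at a time with features [1, x]."""
    mean = [0.0, 0.0]                              # prior mean of the weights
    cov = [[prior_var, 0.0], [0.0, prior_var]]     # assumed isotropic prior
    for x, y in zip(xs, ys):
        mean, cov = kalman_regression_update(mean, cov, [1.0, x], y, noise_var)
    return mean, cov
```

Each observation shrinks the posterior covariance, and the posterior mean after all updates equals the batch Bayesian (ridge-like) solution for the same prior and noise variance.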

Relevance: 60.00%

Abstract:

Bayesian techniques have been developed over many years in a range of different fields, but have only recently been applied to the problem of learning in neural networks. As well as providing a consistent framework for statistical pattern recognition, the Bayesian approach offers a number of practical advantages including a potential solution to the problem of over-fitting. This chapter aims to provide an introductory overview of the application of Bayesian methods to neural networks. It assumes the reader is familiar with standard feed-forward network models and how to train them using conventional techniques.

Relevance: 60.00%

Abstract:

Ensemble stream modeling and data-cleaning are sensor information processing systems that have different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and to choose the most likely model without over-fitting, thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for an event such as a bush or natural forest fire, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing. There are two reasons for this: one, the histogram of fire activity is highly skewed; two, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher order features. A sensitive variance measure such as the F-test is performed during each node's split to select the best attribute. The ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier.
The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors led to the formation of streams for sensor-enabled applications, which further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
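For reference, the F-measure used as the evaluation score is conventionally the weighted harmonic mean of precision and recall. The sketch below computes it from confusion counts; the function name and the `beta` parameter are illustrative and not taken from the abstract.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-beta score from confusion counts; beta=1 gives the usual F1."""
    # Precision: fraction of raised alarms that were real events.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: fraction of real events that raised an alarm.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

A high false alarm count lowers precision, so it pulls the score down even when every real event is detected.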

Relevance: 60.00%

Abstract:

Person re-identification involves recognizing a person across non-overlapping camera views, with different pose, illumination, and camera characteristics. We propose to tackle this problem by training a deep convolutional network to represent a person’s appearance as a low-dimensional feature vector that is invariant to common appearance variations encountered in the re-identification problem. Specifically, a Siamese-network architecture is used to train a feature extraction network using pairs of similar and dissimilar images. We show that the use of a novel multi-task learning objective is crucial for regularizing the network parameters in order to prevent over-fitting due to the small size of the training dataset. We complement the verification task, which is at the heart of re-identification, by training the network to jointly perform verification and identification, and to recognise attributes related to the clothing and pose of the person in each image. Additionally, we show that our proposed approach performs well even in the challenging cross-dataset scenario, which may better reflect real-world expected performance.

Relevance: 60.00%

Abstract:

This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow’s milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage and selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty is estimated by bootstrapping through resampling. Cross-validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010, are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors, with p-value < 0.001 and R² of 0.50 and 0.49. In summer, milk yield with independent variables of THI, ETI, and ESI shows the highest relation (p-value < 0.001) with R² of 0.69. For fat and protein the results are only marginal. This method is suggested for impact studies of climate variability/change in the agriculture and food science fields when only short time series or data with large uncertainty are available.
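The role of cross-validation in guarding against over-fitting can be sketched as follows. This is a generic k-fold illustration with a single predictor, not the study's LASSO/AIC pipeline; the helper names (`fit_mean`, `fit_line`, `k_fold_mse`) are hypothetical.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares with one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def k_fold_mse(fit, xs, ys, k=5):
    """Mean squared error on held-out folds (assumes k <= len(xs))."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        # Every k-th sample, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        predict = fit(train_x, train_y)
        squared_errors.extend((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return sum(squared_errors) / n
```

Candidate models are then compared on their held-out error rather than their training error, so a model that only memorises the training months gains no advantage.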

Relevance: 60.00%

Abstract:

The main purpose of this study is to assess the relationship between six bioclimatic indices for cattle (temperature humidity (THI), environmental stress (ESI), equivalent temperature (ETI), heat load (HLI), modified heat load (HLInew) and respiratory rate predictor (RRP)) and fundamental milk components (fat, protein, and milk yield), taking uncertainty into account. The climate parameters used to calculate the climate indices were taken from the NASA-Modern Era Retrospective-Analysis for Research and Applications (NASA-MERRA) reanalysis from 2002 to 2010. Cow milk data were considered for the same period from April to September, when cows use natural pasture and can choose either to stay in the barn or to graze on the pasture. The study is based on a linear regression analysis using correlations as a summarizing diagnostic. Bootstrapping is used to estimate uncertainty, in the form of confidence intervals, through resampling. To find the relationships between the climate indices (THI, ETI, HLI, HLInew, ESI and RRP) and the main components of cow milk (fat, protein and yield), multiple linear regression is applied. The least absolute shrinkage and selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Cross-validation is used to avoid over-fitting. Based on the results of investigating the effect of heat stress indices on each milk component separately, we suggest the use of ESI and RRP in the summer and ESI in the spring. THI and HLInew are suggested for fat content, and HLInew is also suggested for protein content in the spring season. The best linear models are found in spring between milk yield as the predictand and THI, ESI, HLI, ETI and RRP as predictors, with p-value < 0.001 and R² of 0.50 and 0.49. In summer, milk yield with independent variables of THI, ETI and ESI shows the highest relation (p-value < 0.001) with R² of 0.69.
For fat and protein the results are only marginal. It is strongly suggested that new and significant indices are needed to control critical heat stress conditions, indices that consider more predictors of the effect of climate variability on animal products, such as sunshine duration, quality of pasture, the number of days of stress (NDS), the color of skin with attention to large black spots, and categorical predictors such as breed, welfare facility, and management system. This methodology is suggested for studies investigating the impacts of climate variability/change on food quality/security, animal science and agriculture when only short-term or uncertain data are available, when data collection is expensive or difficult, or when the data contain gaps.
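Bootstrapping a confidence interval for a correlation, as described above, can be sketched with a simple percentile bootstrap. This is an illustration under stated assumptions, not the study's procedure: the resample count, the percentile method and the function names are all hypothetical choices.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for degenerate samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(xs, ys, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the correlation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(xs)
    stats = []
    for _ in range(n_resamples):
        # Resample (x, y) pairs with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    stats.sort()
    lo_i = max(0, round((1.0 - level) / 2.0 * n_resamples))
    hi_i = min(n_resamples - 1, round((1.0 + level) / 2.0 * n_resamples) - 1)
    return stats[lo_i], stats[hi_i]
```

With short series, the width of the resulting interval gives a direct sense of how much a reported correlation can be trusted.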

Relevance: 40.00%

Abstract:

A new method for fitting a series of Zernike polynomials to point clouds defined over connected domains of arbitrary shape within the unit circle is presented in this work. The method is based on the application of machine learning fitting techniques, constructing an extended training set in order to ensure the smooth variation of local curvature over the whole domain. This technique is therefore best suited for fitting points corresponding to ophthalmic lens surfaces, particularly progressive power ones, in non-regular domains. We have tested our method by fitting numerical and real surfaces, reaching an accuracy of 1 micron in elevation and 0.1 D in local curvature, in agreement with the customary tolerances in the ophthalmic manufacturing industry.
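As background for the basis functions being fitted, the sketch below evaluates a single (unnormalised) Zernike polynomial from its standard closed-form radial expression. It only builds the basis; the fitting method of the abstract is not reproduced, and the function names are hypothetical.

```python
from math import cos, factorial, sin


def zernike_radial(n, m, rho):
    """Radial part R_n^m(rho) of a Zernike polynomial, for 0 <= rho <= 1."""
    m = abs(m)
    if m > n or (n - m) % 2:
        return 0.0  # the radial polynomial vanishes when n - m is odd
    total = 0.0
    for k in range((n - m) // 2 + 1):
        numerator = (-1) ** k * factorial(n - k)
        denominator = (factorial(k)
                       * factorial((n + m) // 2 - k)
                       * factorial((n - m) // 2 - k))
        total += numerator / denominator * rho ** (n - 2 * k)
    return total


def zernike(n, m, rho, theta):
    """Unnormalised Zernike polynomial: cosine terms for m >= 0, sine for m < 0."""
    radial = zernike_radial(n, m, rho)
    return radial * (cos(m * theta) if m >= 0 else sin(-m * theta))
```

Evaluating these functions at each point of the cloud gives the columns of a design matrix, to which any linear fitting technique can then be applied.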

Relevance: 30.00%

Abstract:

Cosmetically tinted soft contact lenses are an attractive option for contact lens wearers. Data that we have gathered from annual contact lens fitting surveys demonstrate that those wearing tinted lenses are more likely to be female (4.6% of all soft lenses fitted vs. 1.6% for males; p < 0.0001) and younger (27 ± 11 years vs. 33 ± 13 years for those wearing non-tinted lenses; p < 0.0001). Tinted lenses tend to be worn more on a part-time basis and are replaced less frequently than non-tinted lenses. The decline in fitting tinted lenses over the past 12 years may be due to (a) the current limited availability of tinted lenses in silicone hydrogel materials and daily disposable replacement frequencies, which together represent a significant majority (78%) of new soft lens fits today, (b) growing concerns among lens wearers and practitioners relating to the risks of complications associated with the wearing of tinted lenses, and (c) reduced promotion of such lenses by the contact lens industry.

Relevance: 30.00%

Abstract:

Silicone hydrogel contact lenses were introduced into the market in 1999. To assess prescribing trends of this lens type since then, up to 1000 survey forms were sent to contact lens fitters in Australia, Canada, Japan, the Netherlands, Norway, the UK and the USA each year between 2000 and 2008. Practitioners were asked to record data relating to the first 10 contact lens fits or refits performed after receiving the survey form. Analysis of returned forms revealed a rapid increase in the prescribing of silicone hydrogel lenses over the survey period. In 2008, silicone hydrogel lenses represented 36% of all soft lenses prescribed. The categorization of the majority of lenses prescribed as ‘refits’ is primarily attributed to the mass conversion of lens wearers from hydrogel to silicone hydrogel lenses. Silicone hydrogels may soon represent the majority of soft contact lenses prescribed.

Relevance: 30.00%

Abstract:

Gait recognition approaches continue to struggle with challenges including view-invariance, low-resolution data, robustness to unconstrained environments, and fluctuating gait patterns due to subjects carrying goods or wearing different clothes. Although computationally expensive, model based techniques offer promise over appearance based techniques for these challenges as they gather gait features and interpret gait dynamics in skeleton form. In this paper, we propose a fast 3D ellipsoidal-based gait recognition algorithm using a 3D voxel model derived from multi-view silhouette images. This approach directly solves the limitations of view dependency and self-occlusion in existing ellipse fitting model-based approaches. Voxel models are segmented into four components (left and right legs, above and below the knee), and ellipsoids are fitted to each region using eigenvalue decomposition. Features derived from the ellipsoid parameters are modeled using a Fourier representation to retain the temporal dynamic pattern for classification. We demonstrate the proposed approach using the CMU MoBo database and show that an improvement of 15-20% can be achieved over a 2D ellipse fitting baseline.

Relevance: 30.00%

Abstract:

Purpose: Silicone hydrogel contact lenses (CLs) are becoming increasingly popular for daily wear (DW), extended wear (EW) and continuous wear (CW), due to their higher oxygen transmissibility compared to hydrogel CLs. The aim of this study was to investigate the clinical and subjective performance of asmofilcon A (Menicon Co., Ltd), a new surface treated silicone hydrogel CL, during 6-night EW over 6 months (M). Methods: A prospective, randomised, single-masked, monadic study was conducted. N=60 experienced DW soft CL wearers were randomly assigned to wear either asmofilcon A (test: Dk=129, water content (WC)=40%, Nanogloss surface treatment) or senofilcon A (control: Dk=103, WC=38%, PVP internal wetting agent, Vistakon, Johnson & Johnson Vision Care) CLs bilaterally for 6 M on an EW basis. A PHMB-preserved solution (Menicon Co., Ltd) was dispensed for CL care. Evaluations were conducted at CL delivery and after 1 week (W), 4 W, 3 M and 6 M of EW. At each visit, a range of objective and subjective clinical performance measures were assessed. Results: N=50 subjects (83%) successfully completed the study, with the majority of discontinuations due to loss to follow-up (n=3) or moving away/travel (n=5). N=2 subjects experienced adverse events; n=1 unilateral red eye with asmofilcon A and n=1 asymptomatic infiltrate with senofilcon A. There were no significant differences in high or low contrast distance visual acuity (HCDVA or LCDVA) between asmofilcon A and senofilcon A; however, LCDVA decreased significantly over time with both CL types (p<0.05). The two CL types did not vary significantly with respect to any of the objective and subjective measures assessed (p>0.05); CL fitting characteristics and CL surface measurements were very similar and mean bulbar and limbal redness measures were always less than grade 1.0. Superior palpebral conjunctival injection showed a statistically, but not clinically, significant increase over time with both CL types (p<0.05). 
Corneal staining did not vary significantly between asmofilcon A and senofilcon A (p>0.05), with low median gradings of less than 0.5 observed for all areas assessed. There were no solution-related staining reactions observed with either CL type. The asmofilcon A and senofilcon A CLs were both rated highly with respect to overall comfort, with medians of 14 or 15 hours of comfortable lens wearing time per day reported at each of the study visits (p>0.05). Conclusions: Over 6 months of EW, the asmofilcon A and senofilcon A CLs performed in a similar manner with respect to visual acuity, ocular health and CL performance measures. Some changes over time were observed with both CL types, including reduced LCDVA and increased superior palpebral injection, which warrant further investigation in longer-term EW studies. Asmofilcon A appeared to be equivalent in performance to senofilcon A.

Relevance: 30.00%

Abstract:

A combination of factors has dictated patterns of prescribing to contact lens wearers in different age groups over time, such as the evolution of manufacturing technology in bringing better lens designs and replacement frequency options; the aging population demographic; and the knowledge and attitudes of practitioners. Here we explore evolving lens fitting practices at the opposite poles of the age spectrum—children and presbyopes.