979 results for Text classification


Relevance:

30.00%

Abstract:

The purpose of this paper is to analyze the performance of Histograms of Oriented Gradients (HOG) as descriptors for traffic sign recognition. The test dataset consists of speed limit traffic signs because of their high inter-class similarity. HOG features of speed limit signs extracted from different traffic scenes were computed, and a Gentle AdaBoost classifier was invoked to evaluate the different features. The performance of HOG was tested on a dataset of 1727 Swedish speed sign images. Different numbers of HOG features per descriptor, ranging from 36 up to 396 features, were computed for each traffic sign in the benchmark testing. The results show that HOG features achieve a high classification rate (the Gentle AdaBoost classification rate was 99.42%) and are suitable for real-time traffic sign recognition. However, changing the number of orientation bins was found to have an insignificant effect on the classification rate. In addition, HOG descriptors are not robust with respect to sign orientation.
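
A minimal sketch of this kind of pipeline, using scikit-image's HOG extractor and scikit-learn's AdaBoost implementation (the images, labels, and HOG parameters are illustrative assumptions, and scikit-learn ships SAMME AdaBoost rather than Gentle AdaBoost):

    # Sketch: HOG feature extraction + AdaBoost classification of sign images.
    # Assumes `images` is a list of equally sized grayscale arrays and
    # `labels` the corresponding speed-limit classes (both hypothetical).
    import numpy as np
    from skimage.feature import hog
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    def extract_hog(images, orientations=9, cell=(8, 8), block=(2, 2)):
        # Descriptor length depends on image size, cell/block size, and
        # the number of orientation bins (36-396 features in the paper).
        return np.array([hog(img, orientations=orientations,
                             pixels_per_cell=cell, cells_per_block=block)
                         for img in images])

    X_tr, X_te, y_tr, y_te = train_test_split(extract_hog(images), labels,
                                              test_size=0.3, random_state=0)
    clf = AdaBoostClassifier(n_estimators=200).fit(X_tr, y_tr)
    print("classification rate:", clf.score(X_te, y_te))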

Relevance:

30.00%

Abstract:

Nowadays communication is switching from a centralized scenario, where communication media like newspapers, radio, and TV programs produce information and people are just consumers, to a completely different decentralized scenario, where everyone is potentially an information producer through social networks, blogs, and forums that allow real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information that enterprises, political parties, and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques like Sentiment Analysis, which aims to extract opinions from huge amounts of unstructured text. This could help determine, for instance, the degree of user satisfaction with products, services, politicians, and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches rely on a Markov Chain based model, which is language independent and whose key features are simplicity and generality, making it attractive with respect to previous, more sophisticated techniques. Every discussed technique has been tested in both Single-Domain and Cross-Domain Sentiment Classification settings, comparing performance with that of two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in the literature for both single-domain and cross-domain tasks in 2-class (i.e., positive and negative) Document Sentiment Classification. However, there is still room for improvement: this work also indicates how performance could be enhanced, namely that a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in 2-class Single-Domain Sentiment Classification, future work will also validate these results in tasks with more than 2 classes.
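
A minimal, generic sketch of Markov chain document classification in the spirit described here: one word-transition chain per sentiment class, with documents assigned to the class whose chain gives the highest log-likelihood (the smoothing and model details are illustrative assumptions, not the dissertation's exact formulation):

    # Sketch: per-class first-order Markov chain over word sequences.
    import math
    from collections import defaultdict

    class MarkovSentimentClassifier:
        def fit(self, docs, labels):
            # docs: list of token lists; labels: e.g. "positive"/"negative"
            self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
            self.vocab = set()
            for tokens, y in zip(docs, labels):
                for a, b in zip(tokens, tokens[1:]):
                    self.counts[y][a][b] += 1
                    self.vocab.update((a, b))
            return self

        def _loglik(self, tokens, y):
            # Add-one smoothing over the vocabulary for unseen transitions.
            V = len(self.vocab)
            return sum(math.log((self.counts[y][a][b] + 1) /
                                (sum(self.counts[y][a].values()) + V))
                       for a, b in zip(tokens, tokens[1:]))

        def predict(self, tokens):
            return max(self.counts, key=lambda y: self._loglik(tokens, y))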

Relevance:

30.00%

Abstract:

Composers commonly use major or minor scales to create different moods in music. Nonmusicians show poor discrimination and classification of this musical dimension; however, they can perform these tasks if the decision is phrased as happy vs. sad. We created pairs of melodies identical except for mode; the first major or minor third or sixth was the critical note that distinguished major from minor mode. Musicians and nonmusicians judged each melody as major vs. minor or happy vs. sad. We collected ERP waveforms, triggered to the onset of the critical note. Musicians showed a late positive component (P3) to the critical note only for the minor melodies, in both tasks. Nonmusicians could adequately classify the melodies as happy or sad but showed little evidence of processing the critical information. Major appears to be the default mode in music, and musicians and nonmusicians apparently process mode differently.

Relevance:

30.00%

Abstract:

Background: In protein sequence classification, identifying the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features actually contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function initially harvests the entire set of 4- to 8-grams from the protein sequences of the different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution results in a large increase in the number of discriminative n-grams harvested. Because of the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to n-grams that appear in fewer classes and vice versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtain contiguous n-grams that represent short class-specific motifs. Our method fared well compared to an existing motif-finding method known as Wordspy. We validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite, and ELM databases. We demonstrate that this method is very generic and thus can be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion: The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The use of amino acid substitution scores for similarity detection and of the dampening factor to normalize unbalanced datasets has a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with a potential application to detecting proteome-specific motifs of different organisms.
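
A minimal sketch of the harvesting and class-weighted scoring idea (the IDF-style dampening factor and the selection threshold are illustrative assumptions; the BLOSUM62-based merging of similar n-grams is omitted):

    # Sketch: harvest 4- to 8-grams per class, then score each n-gram by
    # its in-class frequency dampened by the number of classes it occurs in.
    import math
    from collections import Counter, defaultdict

    def harvest(seqs, nmin=4, nmax=8):
        grams = Counter()
        for s in seqs:
            for n in range(nmin, nmax + 1):
                grams.update(s[i:i + n] for i in range(len(s) - n + 1))
        return grams

    def discriminative_ngrams(class_seqs, min_freq=5):
        # class_seqs: dict mapping class name -> list of protein sequences
        per_class = {c: harvest(seqs) for c, seqs in class_seqs.items()}
        n_classes_with = defaultdict(int)
        for grams in per_class.values():
            for g in grams:
                n_classes_with[g] += 1
        k = len(class_seqs)
        result = {}
        for c, grams in per_class.items():
            # Dampening: weight falls to zero for n-grams seen in every class.
            score = {g: f * math.log(k / n_classes_with[g])
                     for g, f in grams.items() if f >= min_freq}
            result[c] = sorted(score, key=score.get, reverse=True)
        return result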

Relevance:

30.00%

Abstract:

High-throughput gene expression technologies such as microarrays have been utilized in a variety of scientific applications. Most of the work has been on assessing univariate associations between gene expression and clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic curve. Under a specific probability model, this leads to consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation and support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as data from a recently published prostate cancer study.
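
The ROC-maximizing estimator itself is not reproduced here, but a minimal sketch of the LASSO-based selection-plus-classification step on simulated expression data might look as follows (the sample sizes, sparsity pattern, and penalty strength are illustrative assumptions):

    # Sketch: L1-penalized (LASSO-style) linear classifier on simulated
    # expression data with few samples and many genes, evaluated by ROC AUC.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, p = 100, 1000                       # 100 samples, 1000 genes
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:10] = 1.5                        # only 10 genes are informative
    y = (X @ beta + rng.normal(size=n) > 0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X_tr, y_tr)
    print("genes selected:", int((clf.coef_ != 0).sum()))
    print("test AUC:", roc_auc_score(y_te, clf.decision_function(X_te)))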

Relevance:

30.00%

Abstract:

The advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only bring about a much richer information context in which to study various aspects of gene function, but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to the context of generalized linear regression, building on a previous approach, Iteratively ReWeighted Partial Least Squares (IRWPLS; Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002a; Nguyen and Rocke, 2002b) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often obtain lower classification error rates.
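
For contrast, a minimal sketch of the two-stage PLS baseline mentioned above: reduce to a few latent components, then fit a logistic model on the scores. The data here are random placeholders, and this is not the IRWPLS algorithm itself:

    # Sketch: two-stage PLS classification with few samples, many covariates.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 2000))            # 60 samples, 2000 covariates
    y = rng.integers(0, 2, size=60)

    pls = PLSRegression(n_components=3)
    scores = pls.fit_transform(X, y)[0]        # stage 1: latent PLS scores
    clf = LogisticRegression().fit(scores, y)  # stage 2: GLM on the scores
    print("training accuracy:", clf.score(scores, y))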

Relevance:

30.00%

Abstract:

Analyzing “nuggety” gold samples commonly produces erratic fire assay results, due to the random inclusion or exclusion of coarse gold in analytical samples. Preconcentrating gold samples might allow the nuggets to be concentrated and fire assayed separately. In this investigation, synthetic gold samples were made using tungsten powder (which has a density similar to gold) and silica, and were preconcentrated using two approaches: an air jig and an air classifier. The current analytical gold sampling method is time- and labor-intensive, and our aim was to design a setup for rapid testing. The preliminary air classifier design showed more promise than the air jig in terms of control over mineral recovery and preconcentration of bulk ore sub-samples. Hence the air classifier was modified with the goal of producing 10-30 gram samples that capture all of the high-density metallic particles, tungsten in this case. The effects of air velocity and feed rate on the recovery of tungsten from synthetic tungsten-silica mixtures were studied. The air classifier achieved an optimal high-density metal recovery of 97.7% at an air velocity of 0.72 m/s and a feed rate of 160 g/min. The effect of density on classification was investigated by using iron as the dense metal instead of tungsten; recovery dropped from 96.13% to 20.82%. These preliminary investigations suggest that preconcentration of gold samples is feasible using the laboratory-designed air classifier.

Relevance:

30.00%

Abstract:

A representative committee of Houston Academy of Medicine-Texas Medical Center Library staff and faculty, under the direction of the library administration, successfully redesigned the job classification system for the library's nonprofessional staff. In the new system, all nonprofessionals are assigned to one of five grade levels, each with a corresponding salary range. To determine its appropriate grade level, each job is analyzed and assigned a numerical value using a point system based on a set of five factors, each of which carries a relative number of points. The factors used to measure jobs are: education and experience, complexity of work, administrative accountability, manual skill, and contact with users. Each factor is described in degrees, so that a job can be given partial credit for a factor. An advisory staff classification committee now participates in the ongoing administration of the classification system.
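
A small sketch of how such a point-factor evaluation can be computed; the factor maximums, degree fractions, and grade cutoffs below are invented for illustration and are not the library's actual values:

    # Sketch: point-factor job evaluation with partial credit per factor.
    FACTOR_MAX = {"education_experience": 100, "complexity": 80,
                  "accountability": 60, "manual_skill": 40, "user_contact": 40}
    GRADE_CUTOFFS = [(80, 1), (140, 2), (200, 3), (260, 4), (320, 5)]

    def job_points(degrees):
        # degrees: factor -> fraction of that factor's points earned
        return sum(FACTOR_MAX[f] * d for f, d in degrees.items())

    def grade(points):
        return next(g for cutoff, g in GRADE_CUTOFFS if points <= cutoff)

    job = {"education_experience": 0.5, "complexity": 0.75,
           "accountability": 0.25, "manual_skill": 1.0, "user_contact": 0.5}
    print(grade(job_points(job)))  # 185 points -> grade 3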

Relevance:

30.00%

Abstract:

In a previous paper, we presented a proposed expansion of the National Guideline Clearinghouse (NGC) classification [1]. We performed a preliminary evaluation of the classification based on 100 guidelines randomly selected from the NGC collection. We found that 89 of the 100 guidelines could be assigned to a single guideline category. To test inter-observer agreement, twenty guidelines were also categorized by a second investigator. Agreement was found to be 40-90% depending on the axis, which compares favorably with agreement among MeSH indexers (30-60%) [2]. We conclude that categorization is feasible. Further research is needed to clarify the axes with poor inter-observer agreement.
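
Inter-observer agreement of this kind is commonly summarized with Cohen's kappa; a minimal sketch (the category labels are made up for illustration):

    # Sketch: Cohen's kappa between two raters categorizing the same guidelines.
    from sklearn.metrics import cohen_kappa_score

    rater_a = ["treatment", "diagnosis", "prevention", "treatment", "diagnosis"]
    rater_b = ["treatment", "diagnosis", "treatment", "treatment", "prevention"]
    print(cohen_kappa_score(rater_a, rater_b))  # 1.0 = perfect, ~0 = chance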

Relevance:

30.00%

Abstract:

BACKGROUND: Gray matter lesions are known to be common in multiple sclerosis (MS) and are suspected to play an important role in disease progression and clinical disability. A combination of two magnetic resonance imaging (MRI) techniques, double-inversion recovery (DIR) and phase-sensitive inversion recovery (PSIR), has been used for the detection and classification of cortical lesions. This study shows that high-resolution three-dimensional (3D) magnetization-prepared rapid acquisition with gradient echo (MPRAGE) improves the classification of cortical lesions by allowing more accurate anatomic localization of lesion morphology. METHODS: 11 MS patients with previously identified cortical lesions were scanned using DIR, PSIR, and 3D MPRAGE. Lesions were identified on DIR and PSIR and classified as purely intracortical or mixed. MPRAGE images were then examined, and lesions were reclassified based on the new information. RESULTS: The high signal-to-noise ratio, fine anatomic detail, and clear gray-white matter tissue contrast of the MPRAGE images provided superior delineation of lesion borders and the surrounding gray-white matter junction, improving classification accuracy. 119 lesions were identified as either intracortical or mixed on DIR/PSIR. In 89 cases, MPRAGE confirmed the DIR/PSIR classification; in 30 cases, it overturned the original classification. CONCLUSION: Improved classification of cortical lesions was realized by the inclusion of high-spatial-resolution 3D MPRAGE. This sequence provides unique detail on lesion morphology that is necessary for accurate classification.

Relevance:

30.00%

Abstract:

PURPOSE: To develop and implement a method for improved cerebellar tissue classification on MRI of the brain by automatically isolating the cerebellum prior to segmentation. MATERIALS AND METHODS: Dual fast spin echo (FSE) and fluid-attenuated inversion recovery (FLAIR) images were acquired on 18 normal volunteers on a 3 T Philips scanner. The cerebellum was isolated from the rest of the brain using a symmetric, inverse-consistent nonlinear registration of the individual brain with a parcellated template. The cerebellum was then separated by masking the anatomical image with the individual FLAIR images. Tissues in the cerebellum and in the rest of the brain were separately classified using a hidden Markov random field (HMRF), a parametric method, and then combined to obtain a tissue classification of the whole brain. The proposed method for tissue classification on real MR brain images was evaluated subjectively by two experts. The segmentation results on Brainweb images with varying noise and intensity nonuniformity levels were quantitatively compared with the ground truth by computing Dice similarity indices. RESULTS: The proposed method significantly improved cerebellar tissue classification for all normal volunteers included in this study without compromising the classification in the remaining part of the brain. The average similarity indices for gray matter (GM) and white matter (WM) in the cerebellum were 89.81 (+/-2.34) and 93.04 (+/-2.41), demonstrating excellent performance of the proposed methodology. CONCLUSION: The proposed method significantly improved tissue classification in the cerebellum. GM was overestimated when segmentation was performed on the whole brain as a single object.
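
The Dice similarity index used for the quantitative comparison is a simple overlap measure between a binary segmentation and the ground truth; a minimal sketch (the example masks are arbitrary):

    # Sketch: Dice similarity = 2|A & B| / (|A| + |B|), ranging 0..1
    # (the paper reports it on a 0-100 scale).
    import numpy as np

    def dice(seg, truth):
        seg, truth = seg.astype(bool), truth.astype(bool)
        return 2.0 * np.logical_and(seg, truth).sum() / (seg.sum() + truth.sum())

    a = np.array([[1, 1, 0], [0, 1, 0]])  # segmentation mask
    b = np.array([[1, 0, 0], [0, 1, 1]])  # ground truth mask
    print(dice(a, b))  # 2*2 / (3+3) = 0.667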

Relevance:

30.00%

Abstract:

A patient classification system was developed integrating a patient acuity instrument with a computerized nursing distribution method based on a linear programming model. The system was designed for real-time measurement of patient acuity (workload) and allocation of nursing personnel to optimize the utilization of resources.

The acuity instrument was a prototype tool with eight categories of patients defined by patient severity and nursing intensity parameters. From this tool, the demand for nursing care was defined in patient points, with one point equal to one hour of RN time. Validity and reliability of the instrument were determined as follows: (1) content validity by a panel of expert nurses; (2) predictive validity through a paired t-test analysis of preshift and postshift categorization of patients; (3) initial reliability by a one-month pilot of the instrument in a practice setting; and (4) interrater reliability by the Kappa statistic.

The nursing distribution system was a linear programming model using a branch and bound technique for obtaining integer solutions. The objective function was to minimize the total number of nursing personnel used by optimally assigning the staff to meet the acuity needs of the units. A penalty weight was used as a coefficient of the objective function variables to define priorities for allocation of staff.

The demand constraints were requirements to meet the total acuity points needed for each unit and to have a minimum number of RNs on each unit. Supply constraints were: (1) the total availability of each type of staff and the value of that staff member, determined relative to that type of staff's ability to perform the job function of an RN (e.g., eight hours of an RN = 8 points, of an LVN = 6 points); and (2) the number of personnel available for floating between units.

The capability of the model to assign staff quantitatively and qualitatively equal to the manual method was established by a thirty-day comparison. Sensitivity testing demonstrated appropriate adjustment of the optimal solution to changes in penalty coefficients in the objective function and to acuity totals in the demand constraints.

Further investigation of the model documented correct adjustment of assignments in response to staff value changes, and cost minimization by the addition of a dollar coefficient to the objective function.
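
A minimal sketch of this kind of integer linear program using PuLP; the staff types, point values, unit demands, and penalty weights below are invented for illustration, not the dissertation's data:

    # Sketch: assign integer staff counts to units so each unit's acuity-point
    # demand and RN minimum are met, minimizing penalty-weighted staff usage.
    import pulp

    units = ["ICU", "MedSurg"]
    points = {"RN": 8, "LVN": 6}           # acuity points per 8-hour shift
    demand = {"ICU": 40, "MedSurg": 52}    # required acuity points per unit
    min_rn = {"ICU": 2, "MedSurg": 1}      # minimum RNs per unit
    available = {"RN": 9, "LVN": 6}
    penalty = {"RN": 1.2, "LVN": 1.0}      # prioritize conserving RN time

    x = pulp.LpVariable.dicts("n", (points, units), lowBound=0, cat="Integer")
    prob = pulp.LpProblem("nursing_distribution", pulp.LpMinimize)
    prob += pulp.lpSum(penalty[s] * x[s][u] for s in points for u in units)
    for u in units:
        prob += pulp.lpSum(points[s] * x[s][u] for s in points) >= demand[u]
        prob += x["RN"][u] >= min_rn[u]
    for s in points:
        prob += pulp.lpSum(x[s][u] for u in units) <= available[s]
    prob.solve()
    for s in points:
        for u in units:
            print(s, u, int(x[s][u].value()))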