944 results for automated text classification
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Recent experimental evidence has suggested a neuromodulatory deficit in Alzheimer's disease (AD). In this paper, we present a new electroencephalogram (EEG) based metric to quantitatively characterize neuromodulatory activity. More specifically, the short-term EEG amplitude modulation rate-of-change (i.e., modulation frequency) is computed for five EEG subband signals. To test the performance of the proposed metric, a classification task was performed on a database of 32 participants partitioned into three groups of approximately equal size: healthy controls, patients diagnosed with mild AD, and those with moderate-to-severe AD. To gauge the benefits of the proposed metric, performance results were compared with those obtained using EEG spectral peak parameters, which were recently shown to outperform other conventional EEG measures. Using a simple feature selection algorithm based on area-under-the-curve maximization and a support vector machine classifier, the proposed parameters yielded accuracy gains, relative to spectral peak parameters, of 21.3% when discriminating between the three groups and of 50% when the mild and moderate-to-severe groups were merged into one. These preliminary findings suggest that automated tools may be developed to assist physicians in very early diagnosis of AD, as well as provide researchers with a tool to automatically characterize cross-frequency interactions and their changes with disease.
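As an illustration of the kind of evaluation pipeline described above, here is a minimal sketch (assuming scikit-learn and NumPy, with placeholder data standing in for real EEG modulation features and group labels) of AUC-ranked feature selection followed by a support vector machine; it is not the authors' implementation.

```python
# Minimal sketch (not the authors' code): AUC-based feature ranking followed
# by an SVM, assuming scikit-learn and NumPy; data and labels are placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 25))            # placeholder: 32 participants x 25 modulation features
y = np.array([0] * 16 + [1] * 16)        # placeholder labels: 0 = control, 1 = AD (merged groups)

# Rank features by how far their individual ROC AUC is from chance (0.5).
aucs = np.array([roc_auc_score(y, X[:, j]) for j in range(X.shape[1])])
ranked = np.argsort(np.abs(aucs - 0.5))[::-1]

# Grow the feature set greedily, keeping the size that maximizes cross-validated SVM accuracy.
best_k, best_acc = 1, 0.0
for k in range(1, len(ranked) + 1):
    acc = cross_val_score(SVC(kernel="linear"), X[:, ranked[:k]], y, cv=4).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc

print(f"selected {best_k} features, cross-validated accuracy {best_acc:.2f}")
```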
Abstract:
Ontology design and population, core aspects of semantic technologies, have recently become fields of great interest due to the increasing need for domain-specific knowledge bases that can boost the use of the Semantic Web. For building such knowledge resources, state-of-the-art tools for ontology design require a great deal of human work. Producing meaningful schemas and populating them with domain-specific data is in fact a very difficult and time-consuming task, even more so if the task consists in modelling knowledge at web scale. The primary aim of this work is to investigate a novel and flexible methodology for automatically learning ontologies from textual data, lightening the human workload required for conceptualizing domain-specific knowledge and populating an extracted schema with real data, thereby speeding up the whole ontology production process. Here computational linguistics plays a fundamental role, from automatically identifying facts in natural language and extracting frames of relations among recognized entities, to producing linked data with which to extend existing knowledge bases or create new ones. In the state of the art, automatic ontology learning systems are mainly based on plain pipelined linguistic classifiers performing tasks such as Named Entity Recognition, Entity Resolution, Taxonomy and Relation Extraction [11]. These approaches present some weaknesses, especially in capturing the structures through which the meaning of complex concepts is expressed [24]. Humans, in fact, tend to organize knowledge in well-defined patterns, which include participant entities and meaningful relations linking entities with each other. In the literature, these structures have been called Semantic Frames by Fillmore [20], or more recently Knowledge Patterns [23]. Some NLP studies have recently shown the possibility of performing more accurate deep parsing with the ability to logically understand the structure of discourse [7]. In this work, some of these technologies have been investigated and employed to produce accurate ontology schemas. The long-term goal is to collect large amounts of semantically structured information from the web of crowds, through an automated process, in order to identify and investigate the cognitive patterns used by humans to organize their knowledge.
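As a rough illustration of the plain pipelined approach mentioned above (not the frame-based methodology proposed in the thesis), the sketch below runs named entity recognition over an invented sentence and emits RDF triples; it assumes spaCy with the en_core_web_sm model and rdflib are installed, and the example namespace is hypothetical.

```python
# Illustrative sketch only: a plain pipelined extractor (NER -> RDF triples),
# assuming spaCy (with en_core_web_sm downloaded) and rdflib are available.
import spacy
from rdflib import Graph, Literal, Namespace, RDF, URIRef

nlp = spacy.load("en_core_web_sm")
EX = Namespace("http://example.org/")          # hypothetical namespace for illustration

text = "Tim Berners-Lee founded the World Wide Web Consortium in 1994."
doc = nlp(text)

g = Graph()
for ent in doc.ents:
    subj = URIRef(EX[ent.text.replace(" ", "_")])
    g.add((subj, RDF.type, EX[ent.label_]))    # entity type, e.g. PERSON, ORG, DATE
    g.add((subj, EX.mentionedIn, Literal(text)))

print(g.serialize(format="turtle"))
```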
Abstract:
Nowadays communication is switching from a centralized scenario, where communication media like newspapers, radio and TV programs produce information and people are just consumers, to a completely different decentralized scenario, where everyone is potentially an information producer through the use of social networks, blogs and forums that allow real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information that enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques like Sentiment Analysis, which aims to extract opinions from huge amounts of unstructured text. This can help determine, for instance, the degree of user satisfaction with products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches rely on a Markov Chain based model, which is language independent and whose key features are simplicity and generality, making it attractive compared with previous, more sophisticated techniques. Every discussed technique has been tested in both Single-Domain and Cross-Domain Sentiment Classification settings, comparing its performance with that of two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in the literature, for both single-domain and cross-domain tasks, in 2-class (i.e., positive and negative) Document Sentiment Classification. However, there is still room for improvement; this work also indicates how performance could be enhanced, namely that a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in 2-class Single-Domain Sentiment Classification, future work will also address validating these results in tasks with more than two classes.
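The following is a minimal, illustrative sketch of a Markov-chain document classifier in the same spirit (not the dissertation's exact model): each class gets a word-transition model with add-one smoothing, and a document is assigned to the class whose chain yields the highest log-likelihood. Class names and training snippets are placeholders.

```python
# Sketch of a simple Markov-chain text classifier (illustrative only): per-class
# word-bigram transition probabilities with add-one smoothing; a document is
# assigned to the class whose chain gives the highest log-likelihood.
import math
from collections import defaultdict

class MarkovClassifier:
    def __init__(self):
        self.trans = {}    # class -> {(w1, w2): count}
        self.totals = {}   # class -> {w1: total outgoing count}
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            trans = self.trans.setdefault(label, defaultdict(int))
            totals = self.totals.setdefault(label, defaultdict(int))
            words = doc.lower().split()
            self.vocab.update(words)
            for w1, w2 in zip(words, words[1:]):
                trans[(w1, w2)] += 1
                totals[w1] += 1

    def predict(self, doc):
        words = doc.lower().split()
        vocab_size = len(self.vocab) + 1
        best, best_ll = None, -math.inf
        for label in self.trans:
            ll = 0.0
            for w1, w2 in zip(words, words[1:]):
                count = self.trans[label][(w1, w2)]
                total = self.totals[label][w1]
                ll += math.log((count + 1) / (total + vocab_size))  # add-one smoothing
            if ll > best_ll:
                best, best_ll = label, ll
        return best

clf = MarkovClassifier()
clf.fit(["great product works well", "awful product broke fast"], ["pos", "neg"])
print(clf.predict("product works well"))   # expected: pos
```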
Abstract:
Composers commonly use major or minor scales to create different moods in music. Nonmusicians show poor discrimination and classification of this musical dimension; however, they can perform these tasks if the decision is phrased as happy vs. sad. We created pairs of melodies identical except for mode; the first major or minor third or sixth was the critical note that distinguished major from minor mode. Musicians and nonmusicians judged each melody as major vs. minor or happy vs. sad. We collected ERP waveforms, triggered to the onset of the critical note. Musicians showed a late positive component (P3) to the critical note only for the minor melodies, and in both tasks. Nonmusicians could adequately classify the melodies as happy or sad but showed little evidence of processing the critical information. Major appears to be the default mode in music, and musicians and nonmusicians apparently process mode differently.
Abstract:
Background: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function initially harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to the n-grams that appear in fewer classes and vice versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. Our method fared well compared with an existing motif-finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and thus can be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion: The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The implementation of amino acid substitution scores for similarity detection, and the dampening factor to normalize the unbalanced datasets, have a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with a potential application to detecting proteome-specific motifs of different organisms.
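A minimal sketch of the harvesting and dampening steps is given below, with toy placeholder sequences; the IDF-like form of the dampening factor is an assumption, and the BLOSUM62-based merging of similar n-grams is omitted.

```python
# Sketch (illustrative only): harvest 4- to 8-grams per class and score them with
# an assumed IDF-like dampening factor so n-grams confined to few classes score
# higher. BLOSUM62-based merging of similar n-grams is not shown here.
import math
from collections import Counter, defaultdict

classes = {  # toy placeholder sequences, not a real dataset
    "nuclear":  ["MKRPAATKKAGQAKKKK", "PKKKRKVEDP"],
    "membrane": ["MLLLLLLLLLPLVVVA", "MALLLVVIPLIVAG"],
}

counts = {c: Counter() for c in classes}
for c, seqs in classes.items():
    for seq in seqs:
        for n in range(4, 9):                           # 4- to 8-grams
            for i in range(len(seq) - n + 1):
                counts[c][seq[i:i + n]] += 1

n_classes = len(classes)
presence = defaultdict(int)                             # number of classes containing each n-gram
for c in classes:
    for gram in counts[c]:
        presence[gram] += 1

scores = {}
for c in classes:
    total = sum(counts[c].values())
    for gram, freq in counts[c].items():
        damp = math.log(1 + n_classes / presence[gram])  # assumed IDF-like dampening factor
        scores[(c, gram)] = (freq / total) * damp

for c, gram in sorted(scores, key=scores.get, reverse=True)[:5]:
    print(c, gram, round(scores[(c, gram)], 5))
```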
Abstract:
Current methods to characterize mesenchymal stem cells (MSCs) are limited to CD marker expression, plastic adherence and their ability to differentiate into adipogenic, osteogenic and chondrogenic precursors. It seems evident that stem cells undergoing differentiation should differ in many aspects, such as morphology and possibly also behaviour; however, such a correlation has not yet been exploited for fate prediction of MSCs. Primary human MSCs from bone marrow were expanded and pelleted to form high-density cultures and were then randomly divided into four groups to differentiate into adipogenic, osteogenic, chondrogenic and myogenic progenitor cells. The cells were expanded as heterogeneous populations and tracked with time-lapse phase-contrast microscopy to record cell shape. The cells were segmented using a custom-made image-processing pipeline, and seven morphological features were extracted for each segmented cell. Statistical analysis was performed on the seven-dimensional feature vectors using a tree-like classification method. Differentiation of cells was monitored with key marker genes and histology. Cells in differentiation media expressed the key genes for each of the three pathways (adipogenic, osteogenic and chondrogenic) after 21 days, which was also confirmed by histological staining. The time-lapse microscopy data contained new evidence that two cell shape features, eccentricity and filopodia ('fingers'), are highly informative for distinguishing myogenic differentiation from the others. However, no robust classifiers could be identified for the other cell differentiation paths. The results suggest that non-invasive automated time-lapse microscopy could potentially be used to predict the stem cell fate of hMSCs for clinical applications, based on morphology at earlier time points. The classification is challenged by cell density, proliferation and possibly unknown donor-specific factors, which affect the performance of morphology-based approaches. Copyright © 2012 John Wiley & Sons, Ltd.
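As a hedged illustration of the feature-extraction and classification steps (assuming scikit-image and scikit-learn, neither of which is named in the original work), the sketch below computes a few region properties per segmented cell and fits a small decision tree on placeholder labels.

```python
# Sketch with assumed tooling (not the original pipeline): extract morphological
# features from a labelled segmentation mask with scikit-image and feed them to
# a decision-tree classifier from scikit-learn.
import numpy as np
from skimage.measure import label, regionprops
from sklearn.tree import DecisionTreeClassifier

def cell_features(mask):
    """Return one feature vector (area, eccentricity, solidity) per segmented cell."""
    feats = []
    for region in regionprops(label(mask)):
        feats.append([region.area, region.eccentricity, region.solidity])
    return np.array(feats)

# Toy binary mask with two blobs standing in for segmented cells.
mask = np.zeros((64, 64), dtype=bool)
mask[5:20, 5:20] = True           # roundish "cell"
mask[30:34, 10:50] = True         # elongated "cell"

X = cell_features(mask)
y = np.array([0, 1])              # placeholder labels: 0 = other, 1 = myogenic-like

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(clf.predict(X))
```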
Abstract:
Placing portal incisions during arthroscopic hip surgery presents challenges for surgeons in terms of anatomic accessibility and patient safety. Based on key anatomic landmarks and portal placement information from recent literature, suggested portal incisions were determined. A computer-aided system for hip arthroscopy provides real-time guidance for placing the three most common portal incisions (anterior, anterolateral, and posterolateral), along with visual feedback on tool trajectory to the hip joint. By simplifying portal placement, one of the most challenging aspects of arthroscopic hip surgery, increased use of this minimally invasive technique could be possible. In addition to the portal information, improvements to an existing computer-aided system for arthroscopic hip surgery were completed, including a new hip model and a redesigned mechanical tracking linkage.
Abstract:
The advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only bring about a much richer information context in which to study various aspects of gene function, but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to the context of generalized linear regression, building on a previous approach, Iteratively ReWeighted Partial Least Squares, i.e. IRWPLS (Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and with other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often obtain lower classification error rates.
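For orientation, here is a minimal sketch of the two-stage comparator approach (PLS dimension reduction followed by a classifier on the latent scores), assuming scikit-learn and synthetic data; the bias-corrected IRWPLS extension itself is not reproduced here.

```python
# Sketch of the two-stage comparator approach: PLS dimension reduction, then a
# classifier on the latent scores. Assumes scikit-learn; data are synthetic.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 40, 500                               # few samples, many covariates
X = rng.normal(size=(n, p))
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # labels driven by a few covariates

pls = PLSRegression(n_components=3).fit(X, y)
scores = pls.transform(X)                    # low-dimensional latent scores

clf = LogisticRegression().fit(scores, y)
print("training accuracy:", clf.score(scores, y))
```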
Abstract:
An extrusion die is used to continuously produce parts with a constant cross section, such as sheets, pipes, tire components and more complex shapes such as window seals. When polymers are used, the die is fed by a screw extruder. The extruder melts, mixes and pressurizes the material through the rotation of either a single or double screw. The polymer can then be continuously forced through the die, producing a long part in the shape of the die outlet. The extruded section is then cut to the desired length. Generally, the primary target of a well-designed die is to produce a uniform outlet velocity without excessively raising the pressure required to extrude the polymer through the die. Other properties such as temperature uniformity and residence time are also important but are not directly considered in this work. Designing dies for optimal outlet velocity variation using simple analytical equations is feasible for basic die geometries or simple channels. Due to the complexity of die geometry and of polymer material properties, designing complex dies by analytical methods is difficult; for complex dies, iterative methods must be used. An automated iterative method is therefore desired for die optimization. To automate the design and optimization of an extrusion die, two issues must be dealt with. The first is how to generate a new mesh for each iteration. In this work, this is approached by modifying a Parasolid file that describes a CAD part; this file is then used in a commercial meshing software. Skewing the initial mesh to produce a new geometry was also employed as a second option. The second issue is an optimization problem in the presence of noise stemming from variations in the mesh and cumulative truncation errors. In this work, a simplex method and a modified trust region method were employed for automated optimization of die geometries. For the trust region, a discrete derivative and a BFGS Hessian approximation were used. To deal with the noise in the function, the trust region method was modified to automatically adjust the discrete derivative step size and the trust region based on changes in noise and function contour. Generally, uniformity of velocity at the exit of the extrusion die can be improved by increasing resistance across the die, but this is limited by the pressure capabilities of the extruder. In optimization, a penalty factor that increases exponentially beyond the pressure limit is applied. This penalty can be applied in two different ways: the first applies it only to designs which exceed the pressure limit, the second to designs both above and below the pressure limit. Both of these methods were tested and compared in this work.
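A toy illustration of this penalized optimization set-up is sketched below: velocity non-uniformity plus an exponential penalty above an assumed pressure limit, minimized with SciPy's Nelder-Mead simplex. The flow model is a stand-in function, not a die simulation.

```python
# Toy illustration of the optimization set-up described above: minimize outlet
# velocity non-uniformity plus an exponential penalty once an assumed pressure
# limit is exceeded, using SciPy's Nelder-Mead simplex. The "flow model" below
# is a stand-in function, not a real die simulation.
import numpy as np
from scipy.optimize import minimize

P_LIMIT = 100.0   # assumed extruder pressure limit (arbitrary units)

def flow_model(gaps):
    """Stand-in for a flow simulation: returns outlet velocities and die pressure."""
    gaps = np.maximum(np.asarray(gaps), 1e-3)    # keep channel gaps physical
    velocities = 1.0 / gaps                      # narrower gap -> faster local flow (toy model)
    pressure = 40.0 * np.sum(1.0 / gaps)         # toy pressure drop across the die
    return velocities, pressure

def objective(gaps):
    v, p = flow_model(gaps)
    nonuniformity = np.std(v) / np.mean(v)                 # target: uniform outlet velocity
    penalty = np.exp(max(0.0, p - P_LIMIT) / 10.0) - 1.0   # applied only above the limit
    return nonuniformity + penalty

x0 = np.array([0.8, 1.0, 1.4])                   # initial channel gap guesses
res = minimize(objective, x0, method="Nelder-Mead")
print("optimized gaps:", res.x, "objective:", res.fun)
```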
Abstract:
Self-stabilization is a property of a distributed system such that, regardless of the legitimacy of its current state, the system behavior shall eventually reach a legitimate state and shall remain legitimate thereafter. The elegance of self-stabilization stems from the fact that it distinguishes distributed systems by a strong fault tolerance property against arbitrary state perturbations. The difficulty of designing and reasoning about self-stabilization has been witnessed by many researchers; most existing techniques for the verification and design of self-stabilization are either brute force or rely on manual approaches that are not amenable to automation. In this dissertation, we first investigate the possibility of automatically designing self-stabilization through global state space exploration. In particular, we develop a set of heuristics for automating the addition of recovery actions to distributed protocols on various network topologies. Our heuristics equally exploit the computational power of a single workstation and the available parallelism on computer clusters. We obtain existing and new stabilizing solutions for classical protocols like maximal matching, ring coloring, mutual exclusion, leader election and agreement. Second, we consider a foundation for local reasoning about self-stabilization; i.e., studying the global behavior of the distributed system by exploring the state space of just one of its components. It turns out that local reasoning about deadlocks and livelocks is possible for an interesting class of protocols whose proof of stabilization is otherwise complex. In particular, we provide necessary and sufficient conditions, verifiable in the local state space of every process, for global deadlock- and livelock-freedom of protocols on ring topologies. Local reasoning potentially circumvents two fundamental problems that complicate the automated design and verification of distributed protocols: (1) state explosion and (2) partial state information. Moreover, local proofs of convergence are independent of the number of processes in the network, thereby enabling our assertions about deadlocks and livelocks to apply to rings of arbitrary size without worrying about state explosion.
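To make the setting concrete, the sketch below simulates one of the classical ring protocols mentioned above, Dijkstra's K-state mutual exclusion, from an arbitrary initial state; it is a standard textbook example of self-stabilization, not output of the synthesis heuristics described in the dissertation.

```python
# Illustration of self-stabilization on a ring: Dijkstra's K-state mutual
# exclusion protocol. From an arbitrary (possibly illegitimate) initial state,
# the system converges to a legitimate state with exactly one privileged process.
import random

N, K = 5, 6                                    # K > N is sufficient for stabilization
x = [random.randrange(K) for _ in range(N)]    # arbitrary initial state

def privileged(i):
    # Process 0 is privileged when it matches its predecessor; others when they differ.
    return (x[i] == x[i - 1]) if i == 0 else (x[i] != x[i - 1])

def step():
    # A scheduler picks one privileged process, which then makes its move.
    i = random.choice([j for j in range(N) if privileged(j)])
    x[i] = (x[0] + 1) % K if i == 0 else x[i - 1]

for _ in range(50):                            # enough steps for a 5-process ring to converge
    step()

print("privileged processes after convergence:",
      [i for i in range(N) if privileged(i)])  # expected: exactly one
```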
Abstract:
Analyzing “nuggety” gold samples commonly produces erratic fire assay results, due to the random inclusion or exclusion of coarse gold in analytical samples. Preconcentrating gold samples might allow the nuggets to be concentrated and fire assayed separately. In this investigation, synthetic gold samples were made using tungsten powder (of similar density) and silica, and were preconcentrated using two approaches: an air jig and an air classifier. The current analytical gold sampling method is time- and labor-intensive, and our aim was to design a set-up for rapid testing. It was observed that the preliminary air classifier design showed more promise than the air jig in terms of control over mineral recovery and preconcentrating bulk ore sub-samples. Hence the air classifier was modified with the goal of producing 10-30 gram samples that capture all of the high-density metallic particles, tungsten in this case. The effects of air velocity and feed rate on the recovery of tungsten from synthetic tungsten-silica mixtures were studied. The air classifier achieved an optimal high-density metal recovery of 97.7% at an air velocity of 0.72 m/s and a feed rate of 160 g/min. The effects of density on classification were investigated by using iron as the dense metal instead of tungsten, and the recovery was seen to drop from 96.13% to 20.82%. Preliminary investigations suggest that preconcentration of gold samples is feasible using the laboratory-designed air classifier.
Abstract:
Quantifying belowground dynamics is critical to our understanding of plant and ecosystem function and belowground carbon cycling, yet currently available tools for complex belowground image analyses are insufficient. We introduce novel techniques combining digital image processing tools and geographic information systems (GIS) analysis to permit semi-automated analysis of complex root and soil dynamics. We illustrate methodologies with imagery from microcosms, minirhizotrons, and a rhizotron, in upland and peatland soils. We provide guidelines for correct image capture, a method that automatically stitches together numerous minirhizotron images into one seamless image, and image analysis using image segmentation and classification in SPRING or change analysis in ArcMap. These methods facilitate spatial and temporal root and soil interaction studies, providing a framework to expand a more comprehensive understanding of belowground dynamics.
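As a rough stand-in for the segmentation step (the described workflow uses SPRING and ArcMap rather than code), the sketch below applies Otsu thresholding to a synthetic grayscale image with scikit-image, assuming roots appear brighter than the soil background.

```python
# Rough stand-in for the segmentation step (the paper's workflow uses SPRING and
# ArcMap): Otsu thresholding of a synthetic grayscale "minirhizotron" image with
# scikit-image, assuming roots appear brighter than the soil background.
import numpy as np
from skimage import filters, morphology

rng = np.random.default_rng(2)
image = rng.normal(0.3, 0.05, size=(100, 100))   # placeholder "soil" background
image[45:55, :] += 0.4                           # placeholder bright "root" band

threshold = filters.threshold_otsu(image)
mask = image > threshold
mask = morphology.remove_small_objects(mask, min_size=20)   # drop speckle noise

print(f"estimated root cover: {mask.mean():.1%}")
```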
Abstract:
A representative committee of Houston Academy of Medicine-Texas Medical Center Library staff and faculty, under the direction of the library administration, successfully redesigned a job classification system for the library's nonprofessional staff. In the new system all nonprofessionals are assigned to one of five grade levels, each with a corresponding salary range. To determine its appropriate grade level, each job is analyzed and assigned a numerical value using a point system based on a set of five factors, each of which is assigned a relative number of points. The factors used to measure jobs are: education and experience, complexity of work, administrative accountability, manual skill, and contact with users. Each factor is described according to degrees, so that a job can be given partial credit for a factor. An advisory staff classification committee now participates in the ongoing administration of the classification system.