944 resultados para automated text classification


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The objective in this work is to build a rapid and automated numerical design method that makes optimal design of robots possible. In this work, two classes of optimal robot design problems were specifically addressed: (1) When the objective is to optimize a pre-designed robot, and (2) when the goal is to design an optimal robot from scratch. In the first case, to reach the optimum design some of the critical dimensions or specific measures to optimize (design parameters) are varied within an established range. Then the stress is calculated as a function of the design parameter(s), the design parameter(s) that optimizes a pre-determined performance index provides the optimum design. In the second case, this work focuses on the development of an automated procedure for the optimal design of robotic systems. For this purpose, Pro/Engineer© and MatLab© software packages are integrated to draw the robot parts, optimize them, and then re-draw the optimal system parts.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Lena River Delta, situated in Northern Siberia (72.0 - 73.8° N, 122.0 - 129.5° E), is the largest Arctic delta and covers 29,000 km**2. Since natural deltas are characterised by complex geomorphological patterns and various types of ecosystems, high spatial resolution information on the distribution and extent of the delta environments is necessary for a spatial assessment and accurate quantification of biogeochemical processes as drivers for the emission of greenhouse gases from tundra soils. In this study, the first land cover classification for the entire Lena Delta based on Landsat 7 Enhanced Thematic Mapper (ETM+) images was conducted and used for the quantification of methane emissions from the delta ecosystems on the regional scale. The applied supervised minimum distance classification was very effective with the few ancillary data that were available for training site selection. Nine land cover classes of aquatic and terrestrial ecosystems in the wetland dominated (72%) Lena Delta could be defined by this classification approach. The mean daily methane emission of the entire Lena Delta was calculated with 10.35 mg CH4/m**2/d. Taking our multi-scale approach into account we find that the methane source strength of certain tundra wetland types is lower than calculated previously on coarser scales.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background.  The optimum approach for infectious complication surveillance for cardiac implantable electronic device (CIED) procedures is unclear. We created an automated surveillance tool for infectious complications after CIED procedures. Methods.  Adults having CIED procedures between January 1, 2005 and December 31, 2011 at Duke University Hospital were identified retrospectively using International Classification of Diseases, 9th revision (ICD-9) procedure codes. Potential infections were identified with combinations of ICD-9 diagnosis codes and microbiology data for 365 days postprocedure. All microbiology-identified and a subset of ICD-9 code-identified possible cases, as well as a subset of procedures without microbiology or ICD-9 codes, were reviewed. Test performance characteristics for specific queries were calculated. Results.  Overall, 6097 patients had 7137 procedures. Of these, 1686 procedures with potential infectious complications were identified: 174 by both ICD-9 code and microbiology, 14 only by microbiology, and 1498 only by ICD-9 criteria. We reviewed 558 potential cases, including all 188 microbiology-identified cases, 250 randomly selected ICD-9 cases, and 120 with neither. Overall, 65 unique infections were identified, including 5 of 250 reviewed cases identified only by ICD-9 codes. Queries that included microbiology data and ICD-9 code 996.61 had good overall test performance, with sensitivities of approximately 90% and specificities of approximately 80%. Queries with ICD-9 codes alone had poor specificity. Extrapolation of reviewed infectious rates to nonreviewed cases yields an estimated rate of infection of 1.3%. Conclusions.  Electronic queries with combinations of ICD-9 codes and microbiologic data can be created and have good test performance characteristics for identifying likely infectious complications of CIED procedures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.

Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k nearest algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured.

Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Malware detection is a growing problem particularly on the Android mobile platform due to its increasing popularity and accessibility to numerous third party app markets. This has also been made worse by the increasingly sophisticated detection avoidance techniques employed by emerging malware families. This calls for more effective techniques for detection and classification of Android malware. Hence, in this paper we present an n-opcode analysis based approach that utilizes machine learning to classify and categorize Android malware. This approach enables automated feature discovery that eliminates the need for applying expert or domain knowledge to define the needed features. Our experiments on 2520 samples that were performed using up to 10-gram opcode features showed that an f-measure of 98% is achievable using this approach.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Automated acceptance testing is the testing of software done in higher level to test whether the system abides by the requirements desired by the business clients by the use of piece of script other than the software itself. This project is a study of the feasibility of acceptance tests written in Behavior Driven Development principle. The project includes an implementation part where automated accep- tance testing is written for Touch-point web application developed by Dewire (a software consultant company) for Telia (a telecom company) from the require- ments received from the customer (Telia). The automated acceptance testing is in Cucumber-Selenium framework which enforces Behavior Driven Development principles. The purpose of the implementation is to verify the practicability of this style of acceptance testing. From the completion of implementation, it was concluded that all the requirements from customer in real world can be converted into executable specifications and the process was not at all time-consuming or difficult for a low-experienced programmer like the author itself. The project also includes survey to measure the learnability and understandability of Gherkin- the language that Cucumber understands. The survey consist of some Gherkin exam- ples followed with questions that include making changes to the Gherkin exam- ples. Survey had 3 parts: first being easy, second medium and third most difficult. Survey also had a linear scale from 1 to 5 to rate the difficulty level for each part of the survey. 1 stood for very easy and 5 for very difficult. Time when the partic- ipants began the survey was also taken in order to calculate the total time taken by the participants to learn and answer the questions. Survey was taken by 18 of the employers of Dewire who had primary working role as one of the programmer, tester and project manager. In the result, tester and project manager were grouped as non-programmer. The survey concluded that it is very easy and quick to learn Gherkin. While the participants rated Gherkin as very easy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recommendation systems aim to help users make decisions more efficiently. The most widely used method in recommendation systems is collaborative filtering, of which, a critical step is to analyze a user's preferences and make recommendations of products or services based on similarity analysis with other users' ratings. However, collaborative filtering is less usable for recommendation facing the "cold start" problem, i.e. few comments being given to products or services. To tackle this problem, we propose an improved method that combines collaborative filtering and data classification. We use hotel recommendation data to test the proposed method. The accuracy of the recommendation is determined by the rankings. Evaluations regarding the accuracies of Top-3 and Top-10 recommendation lists using the 10-fold cross-validation method and ROC curves are conducted. The results show that the Top-3 hotel recommendation list proposed by the combined method has the superiority of the recommendation performance than the Top-10 list under the cold start condition in most of the times.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Diplomacy often finds itself reduced to actions centred on states. However, after the Cold War, international relations and diplomacy have expanded with different actors growing into significant roles, particularly in the increase of diplomatic relations in the context of sport. The classification and significance of other actors remains under-researched in relation to sport, with literature focusing more on the growth of new and varying practices of diplomacy. This analysis contends that there is a need to interrogate fundamental components of modern diplomacy—with the actor being the focus—more specifically the classification of sports organisations in diplomacy. It is relevant as a more accurate understanding of sports organisations will contribute to how diplomatic studies can analyse and evaluate modern diplomacy within the context of sport. The International Olympic Committee is the actor used to illustrate how problematic classifications currently in the academic literature translate into weak and reduced analysis and evaluation of its role and significance in diplomacy. As counterpoint, this analysis proposes an analytical framework of socio-legal theory that harnesses legal regulation as a benchmark to classify an actor’s capacity within a society. In consequence, the IOC is as an active and significant contributor to the ever expanding and complex diplomatic environment and wider society.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Résumé : Une définition opérationnelle de la dyslexie qui est adéquate et pertinente à l'éducation n'a pu être identifiée suite à une recension des écrits. Les études sur la dyslexie se retrouvent principalement dans trois champs: la neurologie, la neurolinguistique et la génétique. Les résultats de ces recherches cependant, se limitent au domaine médical et ont peu d'utilité pour une enseignante ou un enseignant. La classification de la dyslexie de surface et la dyslexie profonde est la plus appropriée lorsque la dyslexie est définie comme trouble de lecture dans le contexte de l'éducation. L'objectif de ce mémoire était de développer un cadre conceptuel théorique dans lequel les troubles de lecture chez les enfants dyslexiques sont dû à une difficulté en résolution de problèmes dans le traitement de l'information. La validation du cadre conceptuel a été exécutée à l'aide d'un expert en psychologie cognitive, un expert en dyslexie et une enseignante. La perspective de la résolution de problèmes provient du traitement de l'information en psychologie cognitive. Le cadre conceptuel s'adresse uniquement aux troubles de lectures qui sont manifestés par les enfants dyslexiques.||Abstract : An extensive literature review failed to uncover an adequate operational definition of dyslexia applicable to education. The predominant fields of research that have produced most of the studies on dyslexia are neurology, neurolinguistics and genetics. Their perspectives were shown to be more pertinent to medical experts than to teachers. The categorization of surface and deep dyslexia was shown to be the best description of dyslexia in an educational context. The purpose of the present thesis was to develop a theoretical conceptual framework which describes a link between dyslexia, a text-processing model and problem solving. This conceptual framework was validated by three experts specializing in a specific field (either cognitive psychology, dyslexia or teaching). The concept of problem solving was based on information-processing theories in cognitive psychology. This framework applies specifically to reading difficulties which are manifested by dyslexic children.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In order to optimize frontal detection in sea surface temperature fields at 4 km resolution, a combined statistical and expert-based approach is applied to test different spatial smoothing of the data prior to the detection process. Fronts are usually detected at 1 km resolution using the histogram-based, single image edge detection (SIED) algorithm developed by Cayula and Cornillon in 1992, with a standard preliminary smoothing using a median filter and a 3 × 3 pixel kernel. Here, detections are performed in three study regions (off Morocco, the Mozambique Channel, and north-western Australia) and across the Indian Ocean basin using the combination of multiple windows (CMW) method developed by Nieto, Demarcq and McClatchie in 2012 which improves on the original Cayula and Cornillon algorithm. Detections at 4 km and 1 km of resolution are compared. Fronts are divided in two intensity classes (“weak” and “strong”) according to their thermal gradient. A preliminary smoothing is applied prior to the detection using different convolutions: three type of filters (median, average and Gaussian) combined with four kernel sizes (3 × 3, 5 × 5, 7 × 7, and 9 × 9 pixels) and three detection window sizes (16 × 16, 24 × 24 and 32 × 32 pixels) to test the effect of these smoothing combinations on reducing the background noise of the data and therefore on improving the frontal detection. The performance of the combinations on 4 km data are evaluated using two criteria: detection efficiency and front length. We find that the optimal combination of preliminary smoothing parameters in enhancing detection efficiency and preserving front length includes a median filter, a 16 × 16 pixel window size, and a 5 × 5 pixel kernel for strong fronts and a 7 × 7 pixel kernel for weak fronts. Results show an improvement in detection performance (from largest to smallest window size) of 71% for strong fronts and 120% for weak fronts. Despite the small window used (16 × 16 pixels), the length of the fronts has been preserved relative to that found with 1 km data. This optimal preliminary smoothing and the CMW detection algorithm on 4 km sea surface temperature data are then used to describe the spatial distribution of the monthly frequencies of occurrence for both strong and weak fronts across the Indian Ocean basin. In general strong fronts are observed in coastal areas whereas weak fronts, with some seasonal exceptions, are mainly located in the open ocean. This study shows that adequate noise reduction done by a preliminary smoothing of the data considerably improves the frontal detection efficiency as well as the global quality of the results. Consequently, the use of 4 km data enables frontal detections similar to 1 km data (using a standard median 3 × 3 convolution) in terms of detectability, length and location. This method, using 4 km data is easily applicable to large regions or at the global scale with far less constraints of data manipulation and processing time relative to 1 km data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Automated fibre placement (AFP) enables the trajectory of unidirectional composite tape to be optimized, but laying down complex shapes with this technology can result in the introduction of defects. The aim of this experimental study is to investigate the influence of gaps and overlaps on the microstructure and tensile properties of carbon-epoxy laminates. First, a comparison between a hand-layup and AFP layup, draped and cured under the same conditions, shows equivalent microstructures and tensile properties. This provides the reference values for the study. Then, gap and overlap embedded defects (more or less severe) are introduced during manufacturing, on two cross-ply layups [(0°/(90°)5/0°] and [(90°/0°)2/90°]. Autoclave cure without a caul plate results in local thickness variation and microstructural changes which depend on the defect type. This has a strong influence on mechanical performance. Use of a caul plate avoids these variations and in this case embedded defects hardly affect tensile properties.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

International audience