964 resultados para machine-tools
Resumo:
The Perspex Machine arose from the unification of computation with geometry. We now report significant redevelopment of both a partial C compiler that generates perspex programs and of a Graphical User Interface (GUI). The compiler is constructed with standard compiler-generator tools and produces both an explicit parse tree for C and an Abstract Syntax Tree (AST) that is better suited to code generation. The GUI uses a hash table and a simpler software architecture to achieve an order of magnitude speed up in processing and, consequently, an order of magnitude increase in the number of perspexes that can be manipulated in real time (now 6,000). Two perspex-machine simulators are provided, one using trans-floating-point arithmetic and the other using transrational arithmetic. All of the software described here is available on the world wide web. The compiler generates code in the neural model of the perspex. At each branch point it uses a jumper to return control to the main fibre. This has the effect of pruning out an exponentially increasing number of branching fibres, thereby greatly increasing the efficiency of perspex programs as measured by the number of neurons required to implement an algorithm. The jumpers are placed at unit distance from the main fibre and form a geometrical structure analogous to a myelin sheath in a biological neuron. Both the perspex jumper-sheath and the biological myelin-sheath share the computational function of preventing cross-over of signals to neurons that lie close to an axon. This is an example of convergence driven by similar geometrical and computational constraints in perspex and biological neurons.
Resumo:
This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples that although have fairly indistinguishable features in the optical spectrum, they also possess a few discernable spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude as well as the phase of the recorded spectra. Classification speed and accuracy are contrasted with that achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object as would be required within a tomographic setting and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinic diagnosis. Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets.
Resumo:
Species` potential distribution modelling consists of building a representation of the fundamental ecological requirements of a species from biotic and abiotic conditions where the species is known to occur. Such models can be valuable tools to understand the biogeography of species and to support the prediction of its presence/absence considering a particular environment scenario. This paper investigates the use of different supervised machine learning techniques to model the potential distribution of 35 plant species from Latin America. Each technique was able to extract a different representation of the relations between the environmental conditions and the distribution profile of the species. The experimental results highlight the good performance of random trees classifiers, indicating this particular technique as a promising candidate for modelling species` potential distribution. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex networks concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that different quality translations generated by NIT tools can be distinguished from their manual counterparts by means of metrics such as in-(ID) and out-degrees (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved for manual translations, but are for good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better NIT tools and automatic evaluation metrics.
Resumo:
This study examines the question of how language teachers in a highly technologyfriendly university environment view machine translation and the implications that this has for the personal learning environments of students. It brings an activity-theory perspective to the question, examining the ways that the introduction of new tools can disrupt the relationship between different elements in an activity system. This perspective opens up for an investigation of the ways that new tools have the potential to fundamentally alter traditional learning activities. In questionnaires and group discussions, respondents showed general agreement that although use of machine translation by students could be considered cheating, students are bound to use it anyway, and suggested that teachers focus on the kinds of skills students would need when using machine translation and design assignments and exams to practice and assess these skills. The results of the empirical study are used to reflect upon questions of what the roles of teachers and students are in a context where many of the skills that a person needs to be able to interact in a foreign language increasingly can be outsourced to laptops and smartphones.
Resumo:
This study examines the question of how language teachers in a highly technology-friendly university environment view machine translation and the implications that this has for the personal learning environments of students. It brings an activity-theory perspective to the question, examining the ways that the introduction of new tools can disrupt the relationship between different elements in an activity system. This perspective opens up for an investigation of the ways that new tools have the potential to fundamentally alter traditional learning activities. In questionnaires and group discussions, respondents showed general agreement that although use of machine translation by students could be considered cheating, students are bound to use it anyway, and suggested that teachers focus on the kinds of skills students would need when using machine translation and design assignments and exams to practice and assess these skills. The results of the empirical study are used to reflect upon questions of what the roles of teachers and students are in a context where many of the skills that a person needs to be able to interact in a foreign language increasingly can be outsourced to laptops and smartphones.
Resumo:
One problem with using component-based software development approach is that once software modules are reused over generations of products, they form legacy structures that can be challenging to understand, making validating these systems difficult. Therefore, tools and methodologies that enable engineers to see interactions of these software modules will enhance their ability to make these software systems more dependable. To address this need, we propose SimSight, a framework to capture dynamic call graphs in Simics, a widely adopted commercial full-system simulator. Simics is a software system that simulates complete computer systems. Thus, it performs nearly identical tasks to a real system but at a much lower speed while providing greater execution observability. We have implemented SimSight to generate dynamic call graphs of statically and dynamically linked functions in x86/Linux environment. A case study illustrates how we can use SimSight to identify sources of software errors. We then evaluate its performance using 12 integer programs from SPEC CPU2006 benchmark suite.
Resumo:
The installation of induction distributed generators should be preceded by a careful study in order to determine if the point of common coupling is suitable for transmission of the generated power, keeping acceptable power quality and system stability. In this sense, this paper presents a simple analytical formulation that allows a fast and comprehensive evaluation of the maximum power delivered by the induction generator, without losing voltage stability. Moreover, this formulation can be used to identify voltage stability issues that limit the generator output power. All the formulation is developed by using the equivalent circuit of squirrel-cage induction machine. Simulation results are used to validate the method, which enables the approach to be used as a guide to reduce the simulation efforts necessary to assess the maximum output power and voltage stability of induction generators. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
Several recent studies in literature have identified brain morphological alterations associated to Borderline Personality Disorder (BPD) patients. These findings are reported by studies based on voxel-based-morphometry analysis of structural MRI data, comparing mean gray-matter concentration between groups of BPD patients and healthy controls. On the other hand, mean differences between groups are not informative about the discriminative value of neuroimaging data to predict the group of individual subjects. In this paper, we go beyond mean differences analyses, and explore to what extent individual BPD patients can be differentiated from controls (25 subjects in each group), using a combination of automated-morphometric tools for regional cortical thickness/volumetric estimation and Support Vector Machine classifier. The approach included a feature selection step in order to identify the regions containing most discriminative information. The accuracy of this classifier was evaluated using the leave-one-subject-out procedure. The brain regions indicated as containing relevant information to discriminate groups were the orbitofrontal, rostral anterior cingulate, posterior cingulate, middle temporal cortices, among others. These areas, which are distinctively involved in emotional and affect regulation of BPD patients, were the most informative regions to achieve both sensitivity and specificity values of 80% in SVM classification. The findings suggest that this new methodology can add clinical and potential diagnostic value to neuroimaging of psychiatric disorders. (C) 2012 Elsevier Ltd. All rights reserved.
Resumo:
The continuous increase of genome sequencing projects produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publically available. However the sole sequencing process of a genome is able to determine just raw nucleotide sequences. This is only the first step of the genome annotation process that will deal with the issue of assigning biological information to each sequence. The annotation process is done at each different level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished only by in vitro analysis procedures resulting extremely expensive and time consuming when applied at a this large scale level. Thus, in silico methods need to be used to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow a fast, reliable, and automated annotation of genomes and proteins starting from aminoacidic sequences. The first part of the work was focused on the implementation of a new machine learning based method for the prediction of the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo, and was developed in 2006. The main peculiarity of the method is to be independent from biases present in the training dataset, which causes the over‐prediction of the most represented examples in all the other available predictors developed so far. This important result was achieved by a modification, made by myself, to the standard Support Vector Machine (SVM) algorithm with the creation of the so called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells and three, kingdom‐specific, predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo reported to outperform all the currently available state‐of‐the‐art methods for this prediction task. BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes, by integrating it in a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each aminoacidic sequence extracted from the genome, the predicted subcellular localization merged with experimental and similarity‐based annotations. In the second part of the work a new, machine learning based, method was implemented for the prediction of GPI‐anchored proteins. Basically the method is able to efficiently predict from the raw aminoacidic sequence both the presence of the GPI‐anchor (by means of an SVM), and the position in the sequence of the post‐translational modification event, the so called ω‐site (by means of an Hidden Markov Model (HMM)). The method is called GPIPE and reported to greatly enhance the prediction performances of GPI‐anchored proteins over all the previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI‐anchored proteins by maintaining a rate of false positive prediction as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI‐anchored proteins were predicted, 561 of which are found in H. sapiens. In average 1% of a proteome is predicted as GPI‐anchored. A statistical analysis was performed onto the composition of the regions surrounding the ω‐site that allowed the definition of specific aminoacidic abundances in the different considered regions. Furthermore the hypothesis that compositional biases are present among the four major eukaryotic kingdoms, proposed in literature, was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe
Resumo:
Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins which are specifically designed to be inserted into biological membranes and devised to perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried-out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performances of the proposed approach were measured as comparable or superior to the current state-of-the-art of TMBB topology prediction.
Resumo:
Data sets describing the state of the earth's atmosphere are of great importance in the atmospheric sciences. Over the last decades, the quality and sheer amount of the available data increased significantly, resulting in a rising demand for new tools capable of handling and analysing these large, multidimensional sets of atmospheric data. The interdisciplinary work presented in this thesis covers the development and the application of practical software tools and efficient algorithms from the field of computer science, aiming at the goal of enabling atmospheric scientists to analyse and to gain new insights from these large data sets. For this purpose, our tools combine novel techniques with well-established methods from different areas such as scientific visualization and data segmentation. In this thesis, three practical tools are presented. Two of these tools are software systems (Insight and IWAL) for different types of processing and interactive visualization of data, the third tool is an efficient algorithm for data segmentation implemented as part of Insight.Insight is a toolkit for the interactive, three-dimensional visualization and processing of large sets of atmospheric data, originally developed as a testing environment for the novel segmentation algorithm. It provides a dynamic system for combining at runtime data from different sources, a variety of different data processing algorithms, and several visualization techniques. Its modular architecture and flexible scripting support led to additional applications of the software, from which two examples are presented: the usage of Insight as a WMS (web map service) server, and the automatic production of a sequence of images for the visualization of cyclone simulations. The core application of Insight is the provision of the novel segmentation algorithm for the efficient detection and tracking of 3D features in large sets of atmospheric data, as well as for the precise localization of the occurring genesis, lysis, merging and splitting events. Data segmentation usually leads to a significant reduction of the size of the considered data. This enables a practical visualization of the data, statistical analyses of the features and their events, and the manual or automatic detection of interesting situations for subsequent detailed investigation. The concepts of the novel algorithm, its technical realization, and several extensions for avoiding under- and over-segmentation are discussed. As example applications, this thesis covers the setup and the results of the segmentation of upper-tropospheric jet streams and cyclones as full 3D objects. Finally, IWAL is presented, which is a web application for providing an easy interactive access to meteorological data visualizations, primarily aimed at students. As a web application, the needs to retrieve all input data sets and to install and handle complex visualization tools on a local machine are avoided. The main challenge in the provision of customizable visualizations to large numbers of simultaneous users was to find an acceptable trade-off between the available visualization options and the performance of the application. Besides the implementational details, benchmarks and the results of a user survey are presented.
Resumo:
OBJECTIVE: The purpose of this study was to adapt and improve a minimally invasive two-step postmortem angiographic technique for use on human cadavers. Detailed mapping of the entire vascular system is almost impossible with conventional autopsy tools. The technique described should be valuable in the diagnosis of vascular abnormalities. MATERIALS AND METHODS: Postmortem perfusion with an oily liquid is established with a circulation machine. An oily contrast agent is introduced as a bolus injection, and radiographic imaging is performed. In this pilot study, the upper or lower extremities of four human cadavers were perfused. In two cases, the vascular system of a lower extremity was visualized with anterograde perfusion of the arteries. In the other two cases, in which the suspected cause of death was drug intoxication, the veins of an upper extremity were visualized with retrograde perfusion of the venous system. RESULTS: In each case, the vascular system was visualized up to the level of the small supplying and draining vessels. In three of the four cases, vascular abnormalities were found. In one instance, a venous injection mark engendered by the self-administration of drugs was rendered visible by exudation of the contrast agent. In the other two cases, occlusion of the arteries and veins was apparent. CONCLUSION: The method described is readily applicable to human cadavers. After establishment of postmortem perfusion with paraffin oil and injection of the oily contrast agent, the vascular system can be investigated in detail and vascular abnormalities rendered visible.
Resumo:
Training and assessment paradigms for laparoscopic surgical skills are evolving from traditional mentor–trainee tutorship towards structured, more objective and safer programs. Accreditation of surgeons requires reaching a consensus on metrics and tasks used to assess surgeons’ psychomotor skills. Ongoing development of tracking systems and software solutions has allowed for the expansion of novel training and assessment means in laparoscopy. The current challenge is to adapt and include these systems within training programs, and to exploit their possibilities for evaluation purposes. This paper describes the state of the art in research on measuring and assessing psychomotor laparoscopic skills. It gives an overview on tracking systems as well as on metrics and advanced statistical and machine learning techniques employed for evaluation purposes. The later ones have a potential to be used as an aid in deciding on the surgical competence level, which is an important aspect when accreditation of the surgeons in particular, and patient safety in general, are considered. The prospective of these methods and tools make them complementary means for surgical assessment of motor skills, especially in the early stages of training. Successful examples such as the Fundamentals of Laparoscopic Surgery should help drive a paradigm change to structured curricula based on objective parameters. These may improve the accreditation of new surgeons, as well as optimize their already overloaded training schedules.
Resumo:
El aprendizaje automático y la cienciometría son las disciplinas científicas que se tratan en esta tesis. El aprendizaje automático trata sobre la construcción y el estudio de algoritmos que puedan aprender a partir de datos, mientras que la cienciometría se ocupa principalmente del análisis de la ciencia desde una perspectiva cuantitativa. Hoy en día, los avances en el aprendizaje automático proporcionan las herramientas matemáticas y estadísticas para trabajar correctamente con la gran cantidad de datos cienciométricos almacenados en bases de datos bibliográficas. En este contexto, el uso de nuevos métodos de aprendizaje automático en aplicaciones de cienciometría es el foco de atención de esta tesis doctoral. Esta tesis propone nuevas contribuciones en el aprendizaje automático que podrían arrojar luz sobre el área de la cienciometría. Estas contribuciones están divididas en tres partes: Varios modelos supervisados (in)sensibles al coste son aprendidos para predecir el éxito científico de los artículos y los investigadores. Los modelos sensibles al coste no están interesados en maximizar la precisión de clasificación, sino en la minimización del coste total esperado derivado de los errores ocasionados. En este contexto, los editores de revistas científicas podrían disponer de una herramienta capaz de predecir el número de citas de un artículo en el fututo antes de ser publicado, mientras que los comités de promoción podrían predecir el incremento anual del índice h de los investigadores en los primeros años. Estos modelos predictivos podrían allanar el camino hacia nuevos sistemas de evaluación. Varios modelos gráficos probabilísticos son aprendidos para explotar y descubrir nuevas relaciones entre el gran número de índices bibliométricos existentes. En este contexto, la comunidad científica podría medir cómo algunos índices influyen en otros en términos probabilísticos y realizar propagación de la evidencia e inferencia abductiva para responder a preguntas bibliométricas. Además, la comunidad científica podría descubrir qué índices bibliométricos tienen mayor poder predictivo. Este es un problema de regresión multi-respuesta en el que el papel de cada variable, predictiva o respuesta, es desconocido de antemano. Los índices resultantes podrían ser muy útiles para la predicción, es decir, cuando se conocen sus valores, el conocimiento de cualquier valor no proporciona información sobre la predicción de otros índices bibliométricos. Un estudio bibliométrico sobre la investigación española en informática ha sido realizado bajo la cultura de publicar o morir. Este estudio se basa en una metodología de análisis de clusters que caracteriza la actividad en la investigación en términos de productividad, visibilidad, calidad, prestigio y colaboración internacional. Este estudio también analiza los efectos de la colaboración en la productividad y la visibilidad bajo diferentes circunstancias. ABSTRACT Machine learning and scientometrics are the scientific disciplines which are covered in this dissertation. Machine learning deals with the construction and study of algorithms that can learn from data, whereas scientometrics is mainly concerned with the analysis of science from a quantitative perspective. Nowadays, advances in machine learning provide the mathematical and statistical tools for properly working with the vast amount of scientometrics data stored in bibliographic databases. In this context, the use of novel machine learning methods in scientometrics applications is the focus of attention of this dissertation. This dissertation proposes new machine learning contributions which would shed light on the scientometrics area. These contributions are divided in three parts: Several supervised cost-(in)sensitive models are learned to predict the scientific success of articles and researchers. Cost-sensitive models are not interested in maximizing classification accuracy, but in minimizing the expected total cost of the error derived from mistakes in the classification process. In this context, publishers of scientific journals could have a tool capable of predicting the citation count of an article in the future before it is published, whereas promotion committees could predict the annual increase of the h-index of researchers within the first few years. These predictive models would pave the way for new assessment systems. Several probabilistic graphical models are learned to exploit and discover new relationships among the vast number of existing bibliometric indices. In this context, scientific community could measure how some indices influence others in probabilistic terms and perform evidence propagation and abduction inference for answering bibliometric questions. Also, scientific community could uncover which bibliometric indices have a higher predictive power. This is a multi-output regression problem where the role of each variable, predictive or response, is unknown beforehand. The resulting indices could be very useful for prediction purposes, that is, when their index values are known, knowledge of any index value provides no information on the prediction of other bibliometric indices. A scientometric study of the Spanish computer science research is performed under the publish-or-perish culture. This study is based on a cluster analysis methodology which characterizes the research activity in terms of productivity, visibility, quality, prestige and international collaboration. This study also analyzes the effects of collaboration on productivity and visibility under different circumstances.