990 resultados para Computer Science engineer
Resumo:
The generation of heterogeneous big data sources with ever increasing volumes, velocities and veracities over the he last few years has inspired the data science and research community to address the challenge of extracting knowledge form big data. Such a wealth of generated data across the board can be intelligently exploited to advance our knowledge about our environment, public health, critical infrastructure and security. In recent years we have developed generic approaches to process such big data at multiple levels for advancing decision-support. It specifically concerns data processing with semantic harmonisation, low level fusion, analytics, knowledge modelling with high level fusion and reasoning. Such approaches will be introduced and presented in context of the TRIDEC project results on critical oil and gas industry drilling operations and also the ongoing large eVacuate project on critical crowd behaviour detection in confined spaces.
Resumo:
Abstract: In the mid-1990s when I worked for a telecommunications giant I struggled to gain access to basic geodemographic data. It cost hundreds of thousands of dollars at the time to simply purchase a tile of satellite imagery from Marconi, and it was often cheaper to create my own maps using a digitizer and A0 paper maps. Everything from granular administrative boundaries to right-of-ways to points of interest and geocoding capabilities were either unavailable for the places I was working in throughout Asia or very limited. The control of this data was either in a government’s census and statistical bureau or was created by a handful of forward thinking corporations. Twenty years on we find ourselves inundated with data (location and other) that we are challenged to amalgamate, and much of it still “dirty” in nature. Open data initiatives such as ODI give us great hope for how we might be able to share information together and capitalize not only in the crowdsourcing behavior but in the implications for positive usage for the environment and for the advancement of humanity. We are already gathering and amassing a great deal of data and insight through excellent citizen science participatory projects across the globe. In early 2015, I delivered a keynote at the Data Made Me Do It conference at UC Berkeley, and in the preceding year an invited talk at the inaugural QSymposium. In gathering research for these presentations, I began to ponder on the effect that social machines (in effect, autonomous data collection subjects and objects) might have on social behaviors. I focused on studying the problem of data from various veillance perspectives, with an emphasis on the shortcomings of uberveillance which included the potential for misinformation, misinterpretation, and information manipulation when context was entirely missing. As we build advanced systems that rely almost entirely on social machines, we need to ponder on the risks associated with following a purely technocratic approach where machines devoid of intelligence may one day dictate what humans do at the fundamental praxis level. What might be the fallout of uberveillance? Bio: Dr Katina Michael is a professor in the School of Computing and Information Technology at the University of Wollongong. She presently holds the position of Associate Dean – International in the Faculty of Engineering and Information Sciences. Katina is the IEEE Technology and Society Magazine editor-in-chief, and IEEE Consumer Electronics Magazine senior editor. Since 2008 she has been a board member of the Australian Privacy Foundation, and until recently was the Vice-Chair. Michael researches on the socio-ethical implications of emerging technologies with an emphasis on an all-hazards approach to national security. She has written and edited six books, guest edited numerous special issue journals on themes related to radio-frequency identification (RFID) tags, supply chain management, location-based services, innovation and surveillance/ uberveillance for Proceedings of the IEEE, Computer and IEEE Potentials. Prior to academia, Katina worked for Nortel Networks as a senior network engineer in Asia, and also in information systems for OTIS and Andersen Consulting. She holds cross-disciplinary qualifications in technology and law.
Resumo:
Abstract Heading into the 2020s, Physics and Astronomy are undergoing experimental revolutions that will reshape our picture of the fabric of the Universe. The Large Hadron Collider (LHC), the largest particle physics project in the world, produces 30 petabytes of data annually that need to be sifted through, analysed, and modelled. In astrophysics, the Large Synoptic Survey Telescope (LSST) will be taking a high-resolution image of the full sky every 3 days, leading to data rates of 30 terabytes per night over ten years. These experiments endeavour to answer the question why 96% of the content of the universe currently elude our physical understanding. Both the LHC and LSST share the 5-dimensional nature of their data, with position, energy and time being the fundamental axes. This talk will present an overview of the experiments and data that is gathered, and outlines the challenges in extracting information. Common strategies employed are very similar to industrial data! Science problems (e.g., data filtering, machine learning, statistical interpretation) and provide a seed for exchange of knowledge between academia and industry. Speaker Biography Professor Mark Sullivan Mark Sullivan is a Professor of Astrophysics in the Department of Physics and Astronomy. Mark completed his PhD at Cambridge, and following postdoctoral study in Durham, Toronto and Oxford, now leads a research group at Southampton studying dark energy using exploding stars called "type Ia supernovae". Mark has many years' experience of research that involves repeatedly imaging the night sky to track the arrival of transient objects, involving significant challenges in data handling, processing, classification and analysis.
Resumo:
We present an Integrated Environment suitable for learning and teaching computer programming which is designed for both students of specialised Computer Science courses, and also non-specialist students such as those following Liberal Arts. The environment is rich enough to allow exploration of concepts from robotics, artificial intelligence, social science, and philosophy as well as the specialist areas of operating systems and the various computer programming paradigms.
Resumo:
Abstract not available
Resumo:
This thesis reports on an investigation of the feasibility and usefulness of incorporating dynamic management facilities for managing sensed context data in a distributed contextaware mobile application. The investigation focuses on reducing the work required to integrate new sensed context streams in an existing context aware architecture. Current architectures require integration work for new streams and new contexts that are encountered. This means of operation is acceptable for current fixed architectures. However, as systems become more mobile the number of discoverable streams increases. Without the ability to discover and use these new streams the functionality of any given device will be limited to the streams that it knows how to decode. The integration of new streams requires that the sensed context data be understood by the current application. If the new source provides data of a type that an application currently requires then the new source should be connected to the application without any prior knowledge of the new source. If the type is similar and can be converted then this stream too should be appropriated by the application. Such applications are based on portable devices (phones, PDAs) for semi-autonomous services that use data from sensors connected to the devices, plus data exchanged with other such devices and remote servers. Such applications must handle input from a variety of sensors, refining the data locally and managing its communication from the device in volatile and unpredictable network conditions. The choice to focus on locally connected sensory input allows for the introduction of privacy and access controls. This local control can determine how the information is communicated to others. This investigation focuses on the evaluation of three approaches to sensor data management. The first system is characterised by its static management based on the pre-pended metadata. This was the reference system. Developed for a mobile system, the data was processed based on the attached metadata. The code that performed the processing was static. The second system was developed to move away from the static processing and introduce a greater freedom of handling for the data stream, this resulted in a heavy weight approach. The approach focused on pushing the processing of the data into a number of networked nodes rather than the monolithic design of the previous system. By creating a separate communication channel for the metadata it is possible to be more flexible with the amount and type of data transmitted. The final system pulled the benefits of the other systems together. By providing a small management class that would load a separate handler based on the incoming data, Dynamism was maximised whilst maintaining ease of code understanding. The three systems were then compared to highlight their ability to dynamically manage new sensed context. The evaluation took two approaches, the first is a quantitative analysis of the code to understand the complexity of the relative three systems. This was done by evaluating what changes to the system were involved for the new context. The second approach takes a qualitative view of the work required by the software engineer to reconfigure the systems to provide support for a new data stream. The evaluation highlights the various scenarios in which the three systems are most suited. There is always a trade-o↵ in the development of a system. The three approaches highlight this fact. The creation of a statically bound system can be quick to develop but may need to be completely re-written if the requirements move too far. Alternatively a highly dynamic system may be able to cope with new requirements but the developer time to create such a system may be greater than the creation of several simpler systems.
Resumo:
One of the most visionary goals of Artificial Intelligence is to create a system able to mimic and eventually surpass the intelligence observed in biological systems including, ambitiously, the one observed in humans. The main distinctive strength of humans is their ability to build a deep understanding of the world by learning continuously and drawing from their experiences. This ability, which is found in various degrees in all intelligent biological beings, allows them to adapt and properly react to changes by incrementally expanding and refining their knowledge. Arguably, achieving this ability is one of the main goals of Artificial Intelligence and a cornerstone towards the creation of intelligent artificial agents. Modern Deep Learning approaches allowed researchers and industries to achieve great advancements towards the resolution of many long-standing problems in areas like Computer Vision and Natural Language Processing. However, while this current age of renewed interest in AI allowed for the creation of extremely useful applications, a concerningly limited effort is being directed towards the design of systems able to learn continuously. The biggest problem that hinders an AI system from learning incrementally is the catastrophic forgetting phenomenon. This phenomenon, which was discovered in the 90s, naturally occurs in Deep Learning architectures where classic learning paradigms are applied when learning incrementally from a stream of experiences. This dissertation revolves around the Continual Learning field, a sub-field of Machine Learning research that has recently made a comeback following the renewed interest in Deep Learning approaches. This work will focus on a comprehensive view of continual learning by considering algorithmic, benchmarking, and applicative aspects of this field. This dissertation will also touch on community aspects such as the design and creation of research tools aimed at supporting Continual Learning research, and the theoretical and practical aspects concerning public competitions in this field.
Resumo:
The discovery of new materials and their functions has always been a fundamental component of technological progress. Nowadays, the quest for new materials is stronger than ever: sustainability, medicine, robotics and electronics are all key assets which depend on the ability to create specifically tailored materials. However, designing materials with desired properties is a difficult task, and the complexity of the discipline makes it difficult to identify general criteria. While scientists developed a set of best practices (often based on experience and expertise), this is still a trial-and-error process. This becomes even more complex when dealing with advanced functional materials. Their properties depend on structural and morphological features, which in turn depend on fabrication procedures and environment, and subtle alterations leads to dramatically different results. Because of this, materials modeling and design is one of the most prolific research fields. Many techniques and instruments are continuously developed to enable new possibilities, both in the experimental and computational realms. Scientists strive to enforce cutting-edge technologies in order to make progress. However, the field is strongly affected by unorganized file management, proliferation of custom data formats and storage procedures, both in experimental and computational research. Results are difficult to find, interpret and re-use, and a huge amount of time is spent interpreting and re-organizing data. This also strongly limit the application of data-driven and machine learning techniques. This work introduces possible solutions to the problems described above. Specifically, it talks about developing features for specific classes of advanced materials and use them to train machine learning models and accelerate computational predictions for molecular compounds; developing method for organizing non homogeneous materials data; automate the process of using devices simulations to train machine learning models; dealing with scattered experimental data and use them to discover new patterns.
Resumo:
Due to both the widespread and multipurpose use of document images and the current availability of a high number of document images repositories, robust information retrieval mechanisms and systems have been increasingly demanded. This paper presents an approach to support the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). We developed the LinkDI (Linking of Document Images) service, which extracts and indexes document images content, computes its latent semantics, and defines relationships among images as hyperlinks. LinkDI was experimented with document images repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents as well as among their respective document images. Considering those same document images, we ran further experiments in order to compare the performance of LinkDI when it exploits or not the LSI technique. Experimental results showed that LSI can mitigate the effects of usual OCR misrecognition, which reinforces the feasibility of LinkDI relating OCR output with high degradation.
Resumo:
In Natural Language Processing (NLP) symbolic systems, several linguistic phenomena, for instance, the thematic role relationships between sentence constituents, such as AGENT, PATIENT, and LOCATION, can be accounted for by the employment of a rule-based grammar. Another approach to NLP concerns the use of the connectionist model, which has the benefits of learning, generalization and fault tolerance, among others. A third option merges the two previous approaches into a hybrid one: a symbolic thematic theory is used to supply the connectionist network with initial knowledge. Inspired on neuroscience, it is proposed a symbolic-connectionist hybrid system called BIO theta PRED (BIOlogically plausible thematic (theta) symbolic-connectionist PREDictor), designed to reveal the thematic grid assigned to a sentence. Its connectionist architecture comprises, as input, a featural representation of the words (based on the verb/noun WordNet classification and on the classical semantic microfeature representation), and, as output, the thematic grid assigned to the sentence. BIO theta PRED is designed to ""predict"" thematic (semantic) roles assigned to words in a sentence context, employing biologically inspired training algorithm and architecture, and adopting a psycholinguistic view of thematic theory.
Resumo:
This paper presents SMarty, a variability management approach for UML-based software product lines (PL). SMarty is supported by a UML profile, the SMartyProfile, and a process for managing variabilities, the SMartyProcess. SMartyProfile aims at representing variabilities, variation points, and variants in UML models by applying a set of stereotypes. SMartyProcess consists of a set of activities that is systematically executed to trace, identify, and control variabilities in a PL based on SMarty. It also identifies variability implementation mechanisms and analyzes specific product configurations. In addition, a more comprehensive application of SMarty is presented using SEI's Arcade Game Maker PL. An evaluation of SMarty and related work are discussed.
Resumo:
Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such a knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drugs design, as well as for planning high-throughput new experiments. Methods have been developed for gene networks modeling and identification from expression profiles. However, an important open problem regards how to validate such approaches and its results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdos-Renyi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabasi-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to average degree k variation, decreasing its network recovery rate with the increase of k. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensible to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.
Resumo:
A planar k-restricted structure is a simple graph whose blocks are planar and each has at most k vertices. Planar k-restricted structures are used by approximation algorithms for Maximum Weight Planar Subgraph, which motivates this work. The planar k-restricted ratio is the infimum, over simple planar graphs H, of the ratio of the number of edges in a maximum k-restricted structure subgraph of H to the number edges of H. We prove that, as k tends to infinity, the planar k-restricted ratio tends to 1/2. The same result holds for the weighted version. Our results are based on analyzing the analogous ratios for outerplanar and weighted outerplanar graphs. Here both ratios tend to 1 as k goes to infinity, and we provide good estimates of the rates of convergence, showing that they differ in the weighted from the unweighted case.
Resumo:
We simplify the known formula for the asymptotic estimate of the number of deterministic and accessible automata with n states over a k-letter alphabet. The proof relies on the theory of Lagrange inversion applied in the context of generalized binomial series.
Resumo:
Objective: We carry out a systematic assessment on a suite of kernel-based learning machines while coping with the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of the criteria of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely. Gaussian and exponential radial basis functions) were considered as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy (i.e., cross-validation error) values delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations whereby one can visually inspect their levels of sensitiveness to the type of feature and to the kernel function/parameter value. Conclusions: Overall, the results evidence that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value as well as the choice of the feature extractor are critical decisions to be taken, albeit the choice of the wavelet family seems not to be so relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile has emerged among all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.