78 resultados para Tutorial on Computing
Resumo:
Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.
Resumo:
Markowitz showed that assets can be combined to produce an 'Efficient' portfolio that will give the highest level of portfolio return for any level of portfolio risk, as measured by the variance or standard deviation. These portfolios can then be connected to generate what is termed an 'Efficient Frontier' (EF). In this paper we discuss the calculation of the Efficient Frontier for combinations of assets, again using the spreadsheet Optimiser. To illustrate the derivation of the Efficient Frontier, we use the data from the Investment Property Databank Long Term Index of Investment Returns for the period 1971 to 1993. Many investors might require a certain specific level of holding or a restriction on holdings in at least some of the assets. Such additional constraints may be readily incorporated into the model to generate a constrained EF with upper and/or lower bounds. This can then be compared with the unconstrained EF to see whether the reduction in return is acceptable. To see the effect that these additional constraints may have, we adopt a fairly typical pension fund profile, with no more than 20% of the total held in Property. The paper shows that it is now relatively easy to use the Optimiser available in at least one spreadsheet (EXCEL) to calculate efficient portfolios for various levels of risk and return, both constrained and unconstrained, so as to be able to generate any number of Efficient Frontiers.
Resumo:
Pervasive computing is a continually, and rapidly, growing field, although still remains in relative infancy. The possible applications for the technology are numerous, and stand to fundamentally change the way users interact with technology. However, alongside these are equally numerous potential undesirable effects and risks. The lack of empirical naturalistic data in the real world makes studying the true impacts of this technology difficult. This paper describes how two independent research projects shared such valuable empirical data on the relationship between pervasive technologies and users. Each project had different aims and adopted different methods, but successfully used the same data and arrived at the same conclusions. This paper demonstrates the benefit of sharing research data in multidisciplinary pervasive computing research where real world implementations are not widely available.
Resumo:
The impending threat of global climate change and its regional manifestations is among the most important and urgent problems facing humanity. Society needs accurate and reliable estimates of changes in the probability of regional weather variations to develop science-based adaptation and mitigation strategies. Recent advances in weather prediction and in our understanding and ability to model the climate system suggest that it is both necessary and possible to revolutionize climate prediction to meet these societal needs. However, the scientific workforce and the computational capability required to bring about such a revolution is not available in any single nation. Motivated by the success of internationally funded infrastructure in other areas of science, this paper argues that, because of the complexity of the climate system, and because the regional manifestations of climate change are mainly through changes in the statistics of regional weather variations, the scientific and computational requirements to predict its behavior reliably are so enormous that the nations of the world should create a small number of multinational high-performance computing facilities dedicated to the grand challenges of developing the capabilities to predict climate variability and change on both global and regional scales over the coming decades. Such facilities will play a key role in the development of next-generation climate models, build global capacity in climate research, nurture a highly trained workforce, and engage the global user community, policy-makers, and stakeholders. We recommend the creation of a small number of multinational facilities with computer capability at each facility of about 20 peta-flops in the near term, about 200 petaflops within five years, and 1 exaflop by the end of the next decade. Each facility should have sufficient scientific workforce to develop and maintain the software and data analysis infrastructure. Such facilities will enable questions of what resolution, both horizontal and vertical, in atmospheric and ocean models, is necessary for more confident predictions at the regional and local level. Current limitations in computing power have placed severe limitations on such an investigation, which is now badly needed. These facilities will also provide the world's scientists with the computational laboratories for fundamental research on weather–climate interactions using 1-km resolution models and on atmospheric, terrestrial, cryospheric, and oceanic processes at even finer scales. Each facility should have enabling infrastructure including hardware, software, and data analysis support, and scientific capacity to interact with the national centers and other visitors. This will accelerate our understanding of how the climate system works and how to model it. It will ultimately enable the climate community to provide society with climate predictions, which are based on our best knowledge of science and the most advanced technology.
Resumo:
Pocket Data Mining (PDM) is our new term describing collaborative mining of streaming data in mobile and distributed computing environments. With sheer amounts of data streams are now available for subscription on our smart mobile phones, the potential of using this data for decision making using data stream mining techniques has now been achievable owing to the increasing power of these handheld devices. Wireless communication among these devices using Bluetooth and WiFi technologies has opened the door wide for collaborative mining among the mobile devices within the same range that are running data mining techniques targeting the same application. This paper proposes a new architecture that we have prototyped for realizing the significant applications in this area. We have proposed using mobile software agents in this application for several reasons. Most importantly the autonomic intelligent behaviour of the agent technology has been the driving force for using it in this application. Other efficiency reasons are discussed in details in this paper. Experimental results showing the feasibility of the proposed architecture are presented and discussed.
Resumo:
High spatial resolution environmental data gives us a better understanding of the environmental factors affecting plant distributions at fine spatial scales. However, large environmental datasets dramatically increase compute times and output species model size stimulating the need for an alternative computing solution. Cluster computing offers such a solution, by allowing both multiple plant species Environmental Niche Models (ENMs) and individual tiles of high spatial resolution models to be computed concurrently on the same compute cluster. We apply our methodology to a case study of 4,209 species of Mediterranean flora (around 17% of species believed present in the biome). We demonstrate a 16 times speed-up of ENM computation time when 16 CPUs were used on the compute cluster. Our custom Java ‘Merge’ and ‘Downsize’ programs reduce ENM output files sizes by 94%. The median 0.98 test AUC score of species ENMs is aided by various species occurrence data filtering techniques. Finally, by calculating the percentage change of individual grid cell values, we map the projected percentages of plant species vulnerable to climate change in the Mediterranean region between 1950–2000 and 2020.
Resumo:
Purpose: This paper aims to design an evaluation method that enables an organization to assess its current IT landscape and provide readiness assessment prior to Software as a Service (SaaS) adoption. Design/methodology/approach: The research employs a mixed of quantitative and qualitative approaches for conducting an IT application assessment. Quantitative data such as end user’s feedback on the IT applications contribute to the technical impact on efficiency and productivity. Qualitative data such as business domain, business services and IT application cost drivers are used to determine the business value of the IT applications in an organization. Findings: The assessment of IT applications leads to decisions on suitability of each IT application that can be migrated to cloud environment. Research limitations/implications: The evaluation of how a particular IT application impacts on a business service is done based on the logical interpretation. Data mining method is suggested in order to derive the patterns of the IT application capabilities. Practical implications: This method has been applied in a local council in UK. This helps the council to decide the future status of the IT applications for cost saving purpose.
Resumo:
Point and click interactions using a mouse are an integral part of computer use for current desktop systems. Compared with younger users though, older adults experience greater difficulties performing cursor positioning tasks, and this can present limitations to using a computer easily and effectively. Target expansion is a technique for improving pointing performance, where the target dynamically grows as the cursor approaches. This has the advantage that targets conserve screen real estate in their unexpanded state, yet can still provide the benefits of a larger area to click on. This paper presents two studies of target expansion with older and younger participants, involving multidirectional point-select tasks with a computer mouse. Study 1 compares static versus expanding targets, and Study 2 compares static targets with three alternative techniques for expansion. Results show that expansion can improve times by up to 14%, and reduce error rates by up to 50%. Additionally, expanding targets are beneficial even when the expansion happens late in the movement, i.e. after the cursor has reached the expanded target area or even after it has reached the original target area. Participants’ subjective feedback on the target expansion are generally favorable, and this lends further support for the technique.
Resumo:
In this paper we propose methods for computing Fresnel integrals based on truncated trapezium rule approximations to integrals on the real line, these trapezium rules modified to take into account poles of the integrand near the real axis. Our starting point is a method for computation of the error function of complex argument due to Matta and Reichel (J Math Phys 34:298–307, 1956) and Hunter and Regan (Math Comp 26:539–541, 1972). We construct approximations which we prove are exponentially convergent as a function of N , the number of quadrature points, obtaining explicit error bounds which show that accuracies of 10−15 uniformly on the real line are achieved with N=12 , this confirmed by computations. The approximations we obtain are attractive, additionally, in that they maintain small relative errors for small and large argument, are analytic on the real axis (echoing the analyticity of the Fresnel integrals), and are straightforward to implement.
Resumo:
Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations including modern data mining processes onboard these small devices. A decade of research has proved the feasibility of what has been termed as Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it is not before 2010 until the authors of this book initiated the Pocket Data Mining (PDM) project exploiting the seamless communication among handheld devices performing data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment on this emerging area of research. Details of techniques used and thorough experimental studies are given. More importantly and exclusive to this book, the authors provide detailed practical guide on the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM dealing with concept drift is also reported. In the era of Big Data, potential applications of paramount importance offered by PDM in a variety of domains including security, business and telemedicine are discussed.
Resumo:
n this study, the authors discuss the effective usage of technology to solve the problem of deciding on journey start times for recurrent traffic conditions. The developed algorithm guides the vehicles to travel on more reliable routes that are not easily prone to congestion or travel delays, ensures that the start time is as late as possible to avoid the traveller waiting too long at their destination and attempts to minimise the travel time. Experiments show that in order to be more certain of reaching their destination on time, a traveller has to leave early and correspondingly arrive early, resulting in a large waiting time. The application developed here asks the user to set this certainty factor as per the task in hand, and computes the best start time and route.
Resumo:
SOA (Service Oriented Architecture), workflow, the Semantic Web, and Grid computing are key enabling information technologies in the development of increasingly sophisticated e-Science infrastructures and application platforms. While the emergence of Cloud computing as a new computing paradigm has provided new directions and opportunities for e-Science infrastructure development, it also presents some challenges. Scientific research is increasingly finding that it is difficult to handle “big data” using traditional data processing techniques. Such challenges demonstrate the need for a comprehensive analysis on using the above mentioned informatics techniques to develop appropriate e-Science infrastructure and platforms in the context of Cloud computing. This survey paper describes recent research advances in applying informatics techniques to facilitate scientific research particularly from the Cloud computing perspective. Our particular contributions include identifying associated research challenges and opportunities, presenting lessons learned, and describing our future vision for applying Cloud computing to e-Science. We believe our research findings can help indicate the future trend of e-Science, and can inform funding and research directions in how to more appropriately employ computing technologies in scientific research. We point out the open research issues hoping to spark new development and innovation in the e-Science field.
Resumo:
Most developers of behavior change support systems (BCSS) employ ad hoc procedures in their designs. This paper presents a novel discussion concerning how analyzing the relationship between attitude toward target behavior, current behavior, and attitude toward change or maintaining behavior can facilitate the design of BCSS. We describe the three-dimensional relationships between attitude and behavior (3D-RAB) model and demonstrate how it can be used to categorize users, based on variations in levels of cognitive dissonance. The proposed model seeks to provide a method for analyzing the user context on the persuasive systems design model, and it is evaluated using existing BCSS. We identified that although designers seem to address the various cognitive states, this is not done purposefully, or in a methodical fashion, which implies that many existing applications are targeting users not considered at the design phase. As a result of this work, it is suggested that designers apply the 3D-RAB model in order to design solutions for targeted users.
Resumo:
We present an intuitive geometric approach for analysing the structure and fragility of T1-weighted structural MRI scans of human brains. Apart from computing characteristics like the surface area and volume of regions of the brain that consist of highly active voxels, we also employ Network Theory in order to test how close these regions are to breaking apart. This analysis is used in an attempt to automatically classify subjects into three categories: Alzheimer’s disease, mild cognitive impairment and healthy controls, for the CADDementia Challenge.