766 resultados para Data Mining, Big Data, Consumi energetici, Weka Data Cleaning
Resumo:
We share our experience in planning, designing and deploying a wireless sensor network of one square kilometre area. Environmental data such as soil moisture, temperature, barometric pressure, and relative humidity are collected in this area situated in the semi-arid region of Karnataka, India. It is a hope that information derived from this data will benefit the marginal farmer towards improving his farming practices. Soon after establishing the need for such a project, we begin by showing the big picture of such a data gathering network, the software architecture we have used, the range measurements needed for determining the sensor density, and the packaging issues that seem to play a crucial role in field deployments. Our field deployment experiences include designing with intermittent grid power, enhancing software tools to aid quicker and effective deployment, and flash memory corruption. The first results on data gathering look encouraging.
Resumo:
The memory subsystem is a major contributor to the performance, power, and area of complex SoCs used in feature rich multimedia products. Hence, memory architecture of the embedded DSP is complex and usually custom designed with multiple banks of single-ported or dual ported on-chip scratch pad memory and multiple banks of off-chip memory. Building software for such large complex memories with many of the software components as individually optimized software IPs is a big challenge. In order to obtain good performance and a reduction in memory stalls, the data buffers of the application need to be placed carefully in different types of memory. In this paper we present a unified framework (MODLEX) that combines different data layout optimizations to address the complex DSP memory architectures. Our method models the data layout problem as multi-objective genetic algorithm (GA) with performance and power being the objectives and presents a set of solution points which is attractive from a platform design viewpoint. While most of the work in the literature assumes that performance and power are non-conflicting objectives, our work demonstrates that there is significant trade-off (up to 70%) that is possible between power and performance.
Resumo:
The high temperature region of the MnO-A1203 phase diagram has been redetermined to resolve some discrepancies reported in the literature regarding the melting behaviour of MnA1,04. This spinel was found to melt congruently at 2108 (+ 15) K. Theactivity of MnOin MnO-Al,03 meltsand in the two phase regions, melt + MnAI,04 and MnAI2O4 + A1203, has been determined by measuring the manganese concentration in platinum foils in equilibrium under controlled oxygen potentials. The activity of MnO obtained in this study for M ~ O - A I ,m~el~ts is in fair agreement with the results of Sharma and Richardson.However. the alumina-rich melt is found to be in equilibrium with MnAl,04 rather than AI2O3. as suggested by ~ha rmaan d Richardson. The value for the acthity of MnO in the M~AI ,O,+ A1,03 two phaseregion permits a rigorous application of the Gibbs-Duhem equation for calculating the activity of A1203 and the integral Gibbs' energy of mixing of MnO-A1203 melts, which are significantly different from those reported in the literature.
Resumo:
Over the past decade, many powerful data mining techniques have been developed to analyze temporal and sequential data. The time is now fertile for addressing problems of larger scope under the purview of temporal data mining. The fourth SIGKDD workshop on temporal data mining focused on the question: What can we infer about the structure of a complex dynamical system from observed temporal data? The goals of the workshop were to critically evaluate the need in this area by bringing together leading researchers from industry and academia, and to identify promising technologies and methodologies for doing the same. We provide a brief summary of the workshop proceedings and ideas arising out of the discussions.
Resumo:
The problem of classification of time series data is an interesting problem in the field of data mining. Even though several algorithms have been proposed for the problem of time series classification we have developed an innovative algorithm which is computationally fast and accurate in several cases when compared with 1NN classifier. In our method we are calculating the fuzzy membership of each test pattern to be classified to each class. We have experimented with 6 benchmark datasets and compared our method with 1NN classifier.
Resumo:
Since streaming data keeps coming continuously as an ordered sequence, massive amounts of data is created. A big challenge in handling data streams is the limitation of time and space. Prototype selection on streaming data requires the prototypes to be updated in an incremental manner as new data comes in. We propose an incremental algorithm for prototype selection. This algorithm can also be used to handle very large datasets. Results have been presented on a number of large datasets and our method is compared to an existing algorithm for streaming data. Our algorithm saves time and the prototypes selected gives good classification accuracy.
Resumo:
With data centers being the supporting infrastructure for a wide range of IT services, their efficiency has become a big concern to operators, as well as to society, for both economic and environmental reasons. The goal of this thesis is to design energy-efficient algorithms that reduce energy cost while minimizing compromise to service. We focus on the algorithmic challenges at different levels of energy optimization across the data center stack. The algorithmic challenge at the device level is to improve the energy efficiency of a single computational device via techniques such as job scheduling and speed scaling. We analyze the common speed scaling algorithms in both the worst-case model and stochastic model to answer some fundamental issues in the design of speed scaling algorithms. The algorithmic challenge at the local data center level is to dynamically allocate resources (e.g., servers) and to dispatch the workload in a data center. We develop an online algorithm to make a data center more power-proportional by dynamically adapting the number of active servers. The algorithmic challenge at the global data center level is to dispatch the workload across multiple data centers, considering the geographical diversity of electricity price, availability of renewable energy, and network propagation delay. We propose algorithms to jointly optimize routing and provisioning in an online manner. Motivated by the above online decision problems, we move on to study a general class of online problem named "smoothed online convex optimization", which seeks to minimize the sum of a sequence of convex functions when "smooth" solutions are preferred. This model allows us to bridge different research communities and help us get a more fundamental understanding of general online decision problems.
Resumo:
Compared with structured data sources that are usually stored and analyzed in spreadsheets, relational databases, and single data tables, unstructured construction data sources such as text documents, site images, web pages, and project schedules have been less intensively studied due to additional challenges in data preparation, representation, and analysis. In this paper, our vision for data management and mining addressing such challenges are presented, together with related research results from previous work, as well as our recent developments of data mining on text-based, web-based, image-based, and network-based construction databases.
Resumo:
The original description of Myxobolus longisporus Nie et Li, 1992, the species infecting gills of Cyprinus carpio haematopterus L., is supplemented with new data on the spore morphology and pathogenicity. Spores are elongate pyriform with pointed anterior end, 15.7 (15.5-16.5) mum long, 6.7 (6-8) mum wide and 5.5 mum thick. Sutural ridge is straight and narrow. Mucus envelope is lacking. Two equal-sized elongate pyriform polar capsules are 8.5 mum long and 2.5 mum wide with convergent long axes. Polar filament coiled perpendicularly to the long axis of the capsule makes 9 (8-10) turns. Posterior end of polar capsules exceeds mid-spore by 15-20%. Cyst-like plasmodia are localised in the gill secondary lamellae. The infection is described in adult big host specimens. Gross lesions manifested as dark red colouration of gill tissues were restricted to the ventral part of the first gill arches. Remarkable site specificity (apical part of secondary lamellae) was observed in the course of development of microscopic lesions. M. longisporus is characterised also on the molecular level using sequences of SSU rRNA gene. Phylogenetic analysis based on these sequences has allowed clearer phylogenetic relationships to be established with other species of the genus Myxobolus sequenced to date.
Resumo:
IEEE
Resumo:
National Key Basic Research and Development Program of China [2006CB701305]; State Key Laboratory of Resource and Environment Information System [088RA400SA]; Chinese Academy of Sciences
Resumo:
R. Jensen, Q. Shen, Data Reduction with Rough Sets, In: Encyclopedia of Data Warehousing and Mining - 2nd Edition, Vol. II, 2008.
Resumo:
The aim of this research, which focused on the Irish adult population, was to generate information for policymakers by applying statistical analyses and current technologies to oral health administrative and survey databases. Objectives included identifying socio-demographic influences on oral health and utilisation of dental services, comparing epidemiologically-estimated dental treatment need with treatment provided, and investigating the potential of a dental administrative database to provide information on utilisation of services and the volume and types of treatment provided over time. Information was extracted from the claims databases for the Dental Treatment Benefit Scheme (DTBS) for employed adults and the Dental Treatment Services Scheme (DTSS) for less-well-off adults, the National Surveys of Adult Oral Health, and the 2007 Survey of Lifestyle Attitudes and Nutrition in Ireland. Factors associated with utilisation and retention of natural teeth were analysed using count data models and logistic regression. The chi-square test and the student’s t-test were used to compare epidemiologically-estimated need in a representative sample of adults with treatment provided. Differences were found in dental care utilisation and tooth retention by Socio-Economic Status. An analysis of the five-year utilisation behaviour of a 2003 cohort of DTBS dental attendees revealed that age and being female were positively associated with visiting annually and number of treatments. Number of adults using the DTBS increased, and mean number of treatments per patient decreased, between 1997 and 2008. As a percentage of overall treatments, restorations, dentures, and extractions decreased, while prophylaxis increased. Differences were found between epidemiologically-estimated treatment need and treatment provided for those using the DTBS and DTSS. This research confirms the utility of survey and administrative data to generate knowledge for policymakers. Public administrative databases have not been designed for research purposes, but they have the potential to provide a wealth of knowledge on treatments provided and utilisation patterns.
Resumo:
It is estimated that the quantity of digital data being transferred, processed or stored at any one time currently stands at 4.4 zettabytes (4.4 × 2 70 bytes) and this figure is expected to have grown by a factor of 10 to 44 zettabytes by 2020. Exploiting this data is, and will remain, a significant challenge. At present there is the capacity to store 33% of digital data in existence at any one time; by 2020 this capacity is expected to fall to 15%. These statistics suggest that, in the era of Big Data, the identification of important, exploitable data will need to be done in a timely manner. Systems for the monitoring and analysis of data, e.g. stock markets, smart grids and sensor networks, can be made up of massive numbers of individual components. These components can be geographically distributed yet may interact with one another via continuous data streams, which in turn may affect the state of the sender or receiver. This introduces a dynamic causality, which further complicates the overall system by introducing a temporal constraint that is difficult to accommodate. Practical approaches to realising the system described above have led to a multiplicity of analysis techniques, each of which concentrates on specific characteristics of the system being analysed and treats these characteristics as the dominant component affecting the results being sought. The multiplicity of analysis techniques introduces another layer of heterogeneity, that is heterogeneity of approach, partitioning the field to the extent that results from one domain are difficult to exploit in another. The question is asked can a generic solution for the monitoring and analysis of data that: accommodates temporal constraints; bridges the gap between expert knowledge and raw data; and enables data to be effectively interpreted and exploited in a transparent manner, be identified? The approach proposed in this dissertation acquires, analyses and processes data in a manner that is free of the constraints of any particular analysis technique, while at the same time facilitating these techniques where appropriate. Constraints are applied by defining a workflow based on the production, interpretation and consumption of data. This supports the application of different analysis techniques on the same raw data without the danger of incorporating hidden bias that may exist. To illustrate and to realise this approach a software platform has been created that allows for the transparent analysis of data, combining analysis techniques with a maintainable record of provenance so that independent third party analysis can be applied to verify any derived conclusions. In order to demonstrate these concepts, a complex real world example involving the near real-time capturing and analysis of neurophysiological data from a neonatal intensive care unit (NICU) was chosen. A system was engineered to gather raw data, analyse that data using different analysis techniques, uncover information, incorporate that information into the system and curate the evolution of the discovered knowledge. The application domain was chosen for three reasons: firstly because it is complex and no comprehensive solution exists; secondly, it requires tight interaction with domain experts, thus requiring the handling of subjective knowledge and inference; and thirdly, given the dearth of neurophysiologists, there is a real world need to provide a solution for this domain
Resumo:
BACKGROUND: The inherent complexity of statistical methods and clinical phenomena compel researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them have a complete knowledge in their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Though communication has a central role in interdisciplinary collaboration and since miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. METHODS/PRINCIPAL FINDINGS: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave, if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. CONCLUSION/SIGNIFICANCE: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.