907 results for Data clustering
Abstract:
Harnessing the idle CPU cycles, storage space and other resources of networked computers for collaborative work is a central focus of all major grid computing research projects. Most university computer labs are nowadays equipped with powerful desktop PCs. It is easy to observe that, most of the time, these machines lie idle or waste their computing power without being used in a productive way. However, complex problems and the analysis of very large amounts of data require substantial computational resources. For such problems, one may run the analysis algorithms on very powerful and expensive computers, which reduces the number of users that can afford such data analysis tasks. Instead of using single expensive machines, distributed computing systems offer the possibility of using a set of much less expensive machines to do the same task. The BOINC and Condor projects have been successfully used for real scientific research work around the world at low cost. The main goal of this work is to explore both distributed computing platforms, Condor and BOINC, and to use their power to harness the resources of idle PCs for academic researchers to use in their research work. In this thesis, data mining tasks were performed by implementing several machine learning algorithms on the distributed computing environment.
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Abstract:
An outbreak of American cutaneous leishmaniasis, with cases clustering during 1993 in the city of Tartagal, Salta, is reported. The outbreak involved 102 individuals, 43.1% of them with multiple ulcers. The age (mean: 33 years) and sex distribution of cases (74.5% males), as well as their working activity (70 forest-related), support the hypothesis of classical forest-transmission leishmaniasis, despite the fact that the place of permanent residence was in periurban Tartagal. Moreover, during July, sandflies were collected from only one of the 'deforestation areas'. Lutzomyia intermedia was the single species among the 491 phlebotomines captured, reinforcing the incrimination of this species as the vector. Most infections must have been acquired during the fall (April to June), a pattern consistent with previous sandfly population dynamics data. Based on the epidemiological and entomological results, it was advised not to apply vector-targeted periurban control measures during July. Further studies should assess whether the high rate of multiple lesions was due to parasite factors or to infective vector density factors.
Abstract:
This work falls within the field of fire safety in buildings and consists of a case study of a fire detection and suppression design for a Data Center. The objectives of this work are: a survey of the state of the art in automatic fire detection and suppression; the development of a software tool to support the design of gaseous-agent suppression systems; and a study and analysis of fire protection in Data Centers. Finally, a case study was carried out. The concepts of fire and building fires are addressed in a theoretical study describing how a fire can start and its consequences. The national regulations on Fire Safety in Buildings (SCIE) are also covered, with a special focus on Automatic Fire Detection Systems (SADI) and Automatic Fire Suppression Systems (SAEI); the national and international standards on the subject are mentioned as well. Because they are highly relevant to this work, fire detection systems are covered in depth, including the characteristics of detection equipment, the most widely used techniques, and the aspects to consider when sizing a SADI. Regarding fire suppression, the most commonly used agents are described, together with their advantages and the classes of fire to which they apply, with particular emphasis on SAEI based on inert gases, for which the sizing procedure is described. Data Centers are also characterized, in order to explain their functions, the importance of their existence, and the general aspects of fire protection in these facilities.
Finally, a case study was developed: a SADI was designed together with a SAEI using nitrogen as the extinguishing gas. The choices and the selected systems were duly justified, taking into account the regulations and standards in force.
Abstract:
Many organizations around the world operate facilities of this kind; in Portugal, one example is Portugal Telecom, which recently inaugurated its Data Center in Covilhã. The development of a Data Center therefore demands a very careful design, which, among other aspects, must guarantee the security of the information and the safety of the facilities themselves, namely with regard to fire safety.
Abstract:
In-network storage of data in wireless sensor networks helps to reduce communications inside the network and to favor data aggregation. In this paper, we consider the use of n out of m codes and data dispersal in combination with in-network storage. In particular, we provide an abstract model of in-network storage to show how n out of m codes can be used, and we discuss how this can be achieved in five case studies. We also define a model aimed at evaluating the probability of correct data encoding and decoding, and we exploit this model and simulations to show how, in the case studies, the parameters of the n out of m codes and of the network should be configured in order to achieve correct data coding and decoding with high probability.
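As a concrete illustration of the "n out of m" property, the sketch below encodes a data word into m shares so that any n of them suffice to decode, using Shamir-style polynomial interpolation over a prime field. The paper's actual codes and parameters are not specified here, so this is only an illustrative stand-in, not the scheme evaluated in the paper.

```python
# Minimal "n out of m" sketch: any n of the m shares reconstruct the word.
import random

P = 2**61 - 1  # a prime larger than any data word we store

def split(word, n, m):
    """Encode `word` into m shares; any n of them suffice to decode."""
    coeffs = [word] + [random.randrange(P) for _ in range(n - 1)]
    shares = []
    for x in range(1, m + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % P
        shares.append((x, y))
    return shares

def combine(shares):
    """Decode via Lagrange interpolation at x = 0 (the constant term)."""
    word = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % P
                den = (den * (xi - xj)) % P
        word = (word + yi * num * pow(den, P - 2, P)) % P
    return word
```

With n = 3 and m = 5, any three shares recover the original word, so up to two sensor nodes can be lost without losing the stored datum.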
Abstract:
Accepted at the 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia (ESTIMedia 2015), Amsterdam, Netherlands.
Abstract:
Nowadays, data centers are large energy consumers, and this trend is expected to increase further in the coming years given the growth of cloud services. A large portion of this power consumption is due to the control of the physical parameters of the data center (such as temperature and humidity). However, these physical parameters are tightly coupled with computations, and even more so in upcoming data centers, where the location of workloads can vary substantially due, for example, to workloads being moved within the cloud infrastructure hosted in the data center. Therefore, managing the physical and compute infrastructure of a large data center is an embodiment of a Cyber-Physical System (CPS). In this paper, we describe a data collection and distribution architecture that enables gathering the physical parameters of a large data center at very high temporal and spatial resolution of the sensor measurements. We believe this is an important characteristic for building more accurate heat-flow models of the data center and, with them, finding opportunities to optimize energy consumption. Having a high-resolution picture of the data center conditions also makes it possible to minimize local hot-spots, to perform more accurate predictive maintenance (failures in all infrastructure equipment can be detected more promptly) and to bill more accurately. We detail this architecture and define the structure of the underlying messaging system used to collect and distribute the data. Finally, we show the results of a preliminary study of a typical data center radio environment.
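The kind of topic-based messaging that could carry such high-resolution sensor readings can be sketched as a tiny in-process publish/subscribe broker. The topic naming scheme, message fields and sensor values below are illustrative assumptions, not the architecture actually defined in the paper.

```python
# Hypothetical sketch: topic-based pub/sub for data center sensor readings.
import json
import time
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # every message carries its topic and a timestamp alongside the payload
        msg = json.dumps({"topic": topic, "ts": time.time(), **payload})
        for cb in self.subscribers[topic]:
            cb(msg)

broker = Broker()
readings = []
# e.g. one topic per rack and physical parameter (assumed naming convention)
broker.subscribe("dc/row3/rack12/temperature",
                 lambda m: readings.append(json.loads(m)))
broker.publish("dc/row3/rack12/temperature",
               {"sensor_id": "t-042", "celsius": 27.4})
```

A per-rack, per-parameter topic hierarchy like this is one simple way to let consumers (heat-flow models, billing, predictive maintenance) subscribe only to the streams they need.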
Abstract:
This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale or the emergence of long-range correlations and persistent memory. This study addresses a public-domain forest fire catalogue containing information on events in Portugal during the period from 1980 up to 2012. The data is analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we consider mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, in order to compare the data and to extract relationships among them. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point, and objects perceived as similar to each other are placed close together, forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns.
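The hierarchical clustering step behind such visualization trees can be sketched with a minimal single-linkage agglomerative procedure over a precomputed dissimilarity matrix. The distances below are illustrative, not the mutual-information values computed in the paper.

```python
# Minimal single-linkage agglomerative clustering over a distance matrix.

def single_linkage(dist):
    """dist: symmetric matrix (list of lists). Returns the merge history."""
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between closest members
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# four "years" with pairwise dissimilarities: years 0 and 1 are most alike
D = [[0, 1, 6, 9],
     [1, 0, 5, 8],
     [6, 5, 0, 2],
     [9, 8, 2, 0]]
merges = single_linkage(D)
```

The merge history (which clusters join, and at what distance) is exactly the information a dendrogram, i.e. a visualization tree, displays.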
Abstract:
In this paper, we apply multidimensional scaling (MDS) and parametric similarity indices (PSI) to the analysis of complex systems (CS). Each CS is viewed as a dynamical system whose output time-series is interpreted as a manifestation of its behavior. We start by adopting a sliding window to sample the original data into several consecutive time periods. Second, we define a given PSI for tracking pieces of data. We then compare the windows for different values of the parameter and generate the corresponding MDS maps of 'points'. Third, we use Procrustes analysis to linearly transform the MDS charts for maximum superposition and to build a global MDS map of 'shapes'. This final plot captures the time evolution of the phenomena and is sensitive to the PSI adopted. The generalized correlation, the Minkowski distance and four entropy-based indices are tested. The proposed approach is applied to the Dow Jones Industrial Average stock market index and the Europe Brent Spot Price FOB time-series.
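The sliding-window step, with the Minkowski distance as the parametric similarity index, can be sketched as follows. The window length, the toy series and the choice of non-overlapping windows are illustrative assumptions, not the configuration used in the paper.

```python
# Sliding-window comparison with the Minkowski distance as the PSI.

def minkowski(u, v, p):
    """Minkowski distance of order p between two equal-length windows."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def window_distances(series, width, p):
    """Distance between each pair of consecutive non-overlapping windows."""
    windows = [series[i:i + width]
               for i in range(0, len(series) - width + 1, width)]
    return [minkowski(w1, w2, p) for w1, w2 in zip(windows, windows[1:])]

series = [1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 9.0, 1.0]
d = window_distances(series, 4, 2)  # p = 2 recovers the Euclidean distance
```

Sweeping the parameter p (p = 1 Manhattan, p = 2 Euclidean, larger p emphasizing the largest deviation) yields the family of window-to-window distances from which one MDS map per parameter value can be built.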
Abstract:
This paper studies forest fires from the perspective of dynamical systems. Burnt area, precipitation and atmospheric temperature are interpreted as state variables of a complex system, and the correlations between them are investigated by means of different mathematical tools. First, we use mutual information to reveal potential relationships in the data. Second, we adopt the state space portrait to characterize the system's behavior. Third, we compare the annual state space curves and apply clustering and visualization tools to unveil long-range patterns. We use forest fire data for Portugal, covering the years 1980–2003. The territory is divided into two regions (North and South), characterized by different climates and vegetation. The adopted methodology represents a new viewpoint in the context of forest fires, shedding light on a complex phenomenon that needs to be better understood in order to mitigate its devastating consequences, at both the economic and environmental levels.
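A histogram-based mutual information estimate between two state variables (say, burnt area versus temperature) can be sketched as below. The binning scheme and the toy data are illustrative assumptions, not the estimator or data of the paper.

```python
# Histogram-based mutual information between two series, in nats.
import math

def mutual_information(xs, ys, bins=4):
    def bin_of(v, lo, hi):
        # map v into one of `bins` equal-width bins over [lo, hi]
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)
    bx = [bin_of(v, min(xs), max(xs)) for v in xs]
    by = [bin_of(v, min(ys), max(ys)) for v in ys]
    n = len(xs)
    joint, px, py = {}, {}, {}
    for a, b in zip(bx, by):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab * log( p_ab / (p_a * p_b) )
        mi += p_ab * math.log(p_ab * n * n / (px[a] * py[b]))
    return mi

xs = [0.1, 0.2, 0.8, 0.9, 0.15, 0.85]
ys = [1.0, 1.1, 2.0, 2.1, 1.05, 1.95]  # strongly coupled with xs
```

A value near zero indicates near-independence, while larger values reveal the kind of statistical dependence between state variables that the paper looks for.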
Abstract:
Complex industrial plants exhibit multiple interactions among their smaller parts and with human operators. A failure in one part can propagate across subsystem boundaries, causing a serious disaster. This paper analyzes industrial accident data series from the perspective of dynamical systems. First, we process real-world data and show that the statistics of the number of fatalities reveal features that are well described by power law (PL) distributions. For the early years, the data reveal double PL behavior, while, for more recent time periods, a single PL fits the experimental data better. Second, we analyze the entropy of the data series statistics over time. Third, we use the Kullback–Leibler divergence to compare the empirical data, and multidimensional scaling (MDS) techniques for data analysis and visualization. Entropy-based analysis is adopted to assess complexity, having the advantage of yielding a single parameter to express relationships between the data. The classical and the generalized (fractional) entropy and Kullback–Leibler divergence are used. The generalized measures allow a clear identification of patterns embedded in the data.
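The classical Kullback–Leibler divergence used for such comparisons is straightforward to compute from two normalized histograms. The histograms below are illustrative, not the accident data analysed in the paper, and the generalized (fractional) variants the paper also uses are not shown.

```python
# Classical Kullback-Leibler divergence D(P||Q) between two histograms.
import math

def kl_divergence(p, q):
    """D(P||Q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# two normalized fatality-count histograms over the same bins (toy values)
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
```

Note that D(P||Q) is not symmetric and vanishes only when the two distributions coincide, which is why it serves as a dissimilarity measure feeding the MDS maps.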
Abstract:
Currently, due to the widespread use of computers and the internet, students are trading libraries for the World Wide Web and laboratories for simulation programs. In most courses, simulators are made available to students and can be used to verify theoretical results or to test hardware/products under development. Although this is an attractive solution (low-cost, easy and fast to carry out some coursework), it also has major disadvantages. As everything is currently done with/in a computer, students are losing the "feel" for the real values of physical magnitudes. For instance, in engineering studies, and mainly in the first years, students need to learn electronics, algorithms, mathematics and physics. All of these areas can use numerical analysis software, simulation software or spreadsheets, and in the majority of cases the data used are either simulated or random numbers, but real data could be used instead. For example, if a course uses numerical analysis software and needs a dataset, the students can learn to manipulate arrays. Also, when using spreadsheets to build graphics, instead of a random table, students could use a real dataset based, for instance, on the room temperature and its variation across the day. In this work we present a framework with a simple interface that allows it to be used in different courses where computers are central to the teaching/learning process, in order to give students a more realistic feeling by using real data. The framework is based on a set of low-cost sensors for different physical magnitudes, e.g. temperature, light and wind speed, which are either connected to a central server that the students access over Ethernet, or connected directly to the student's computer/laptop. These sensors use the communication ports available, such as serial ports, parallel ports, Ethernet or Universal Serial Bus (USB).
Since a central server is used, students are encouraged to use the sensor values in their different courses and, consequently, in different types of software, such as numerical analysis tools, spreadsheets, or simply inside any programming language whenever a dataset is needed. To this end, small pieces of hardware were developed, each containing at least one sensor and using different types of computer communication. As long as the sensors are attached to a server connected to the internet, these tools can also be shared between different schools. This allows sensors that are not available at a given school to be used by obtaining the values from other places that share them. Another remark is that students in the more advanced years, with (theoretically) more know-how, can use the courses that have some affinity with electronics development to build new sensor modules and expand the framework further. The final solution is very interesting: low-cost, simple to develop, and flexible, allowing the same materials to be used in several courses and bringing real-world data into the students' computer work.
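A minimal sketch of the central-server idea, with a server handing out a sensor reading over a plain socket and a student's program fetching it, could look like the following. The port number, the JSON message format and the simulated temperature are illustrative assumptions, not the framework's actual protocol.

```python
# Hypothetical sketch: a sensor server and a client fetching one reading.
import json
import random
import socket
import threading
import time

def serve_readings(port):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    # in the real framework the value would come from the attached sensor;
    # here we simulate a plausible room temperature
    reading = {"sensor": "room-temperature",
               "celsius": round(random.uniform(18, 25), 1)}
    conn.sendall(json.dumps(reading).encode())
    conn.close()
    srv.close()

threading.Thread(target=serve_readings, args=(9099,), daemon=True).start()
time.sleep(0.2)  # give the server a moment to start listening

# the student's side: fetch the value for use in a spreadsheet or script
cli = socket.create_connection(("127.0.0.1", 9099))
reading = json.loads(cli.recv(1024).decode())
cli.close()
```

Because the reading arrives as a small self-describing message, the same fetch can be reused from numerical analysis tools, spreadsheets or any programming language with socket support.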
Abstract:
Demo at the Workshop on ns-3 (WNS3 2015), 13-14 May 2015, Castelldefels, Spain.
Abstract:
Data Mining (DM) methods are increasingly being used for prediction with time series data, in addition to traditional statistical approaches. This paper presents a literature review of the use of DM with time series data, focusing on short-term stock prediction, an area that has been attracting a great deal of attention from researchers in the field. The main contribution of this paper is to provide an outline of the use of DM with time series data, using mainly examples related to short-term stock prediction. This is important for a better understanding of the field. Some of the main trends and open issues are also introduced.
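A common preprocessing step when applying DM methods to time series, including short-term stock prediction, is turning the series into a supervised dataset of lagged inputs and a next-value target. The toy price series and lag count below are illustrative assumptions.

```python
# Turn a time series into a supervised (lag-features, target) dataset.

def lag_features(series, n_lags):
    """Return (X, y): each row of X holds the n_lags previous values,
    and the corresponding entry of y is the value to predict."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
X, y = lag_features(prices, 3)
```

Once in this form, any standard DM learner (trees, neural networks, support vector machines) can be trained on (X, y) exactly as on a conventional tabular dataset.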