845 resultados para mining data streams
Resumo:
Following inspections in 2013 of all police forces, Her Majesty’s Inspectorate of Constabulary found that one-third of forces could not provide data on repeat victims of domestic abuse (DA) and concluded that in general there were ambiguities around the term ‘repeat victim’ and that there was a need for consistent and comparable statistics on DA. Using an analysis of police-recorded DA data from two forces, an argument is made for including both offences and non-crime incidents when identifying repeat victims of DA. Furthermore, for statistical purposes the counting period for repeat victimizations should be taken as a rolling 12 months from first recorded victimization. Examples are given of summary statistics that can be derived from these data down to Community Safety Partnership level. To reinforce the need to include both offences and incidents in analyses, repeat victim chronologies from policerecorded data are also used to briefly examine cases of escalation to homicide as an example of how they can offer new insights and greater scope for evaluating risk and effectiveness of interventions.
Resumo:
Data mining, as a heatedly discussed term, has been studied in various fields. Its possibilities in refining the decision-making process, realizing potential patterns and creating valuable knowledge have won attention of scholars and practitioners. However, there are less studies intending to combine data mining and libraries where data generation occurs all the time. Therefore, this thesis plans to fill such a gap. Meanwhile, potential opportunities created by data mining are explored to enhance one of the most important elements of libraries: reference service. In order to thoroughly demonstrate the feasibility and applicability of data mining, literature is reviewed to establish a critical understanding of data mining in libraries and attain the current status of library reference service. The result of the literature review indicates that free online data resources other than data generated on social media are rarely considered to be applied in current library data mining mandates. Therefore, the result of the literature review motivates the presented study to utilize online free resources. Furthermore, the natural match between data mining and libraries is established. The natural match is explained by emphasizing the data richness reality and considering data mining as one kind of knowledge, an easy choice for libraries, and a wise method to overcome reference service challenges. The natural match, especially the aspect that data mining could be helpful for library reference service, lays the main theoretical foundation for the empirical work in this study. Turku Main Library was selected as the case to answer the research question: whether data mining is feasible and applicable for reference service improvement. In this case, the daily visit from 2009 to 2015 in Turku Main Library is considered as the resource for data mining. In addition, corresponding weather conditions are collected from Weather Underground, which is totally free online. Before officially being analyzed, the collected dataset is cleansed and preprocessed in order to ensure the quality of data mining. Multiple regression analysis is employed to mine the final dataset. Hourly visits are the independent variable and weather conditions, Discomfort Index and seven days in a week are dependent variables. In the end, four models in different seasons are established to predict visiting situations in each season. Patterns are realized in different seasons and implications are created based on the discovered patterns. In addition, library-climate points are generated by a clustering method, which simplifies the process for librarians using weather data to forecast library visiting situation. Then the data mining result is interpreted from the perspective of improving reference service. After this data mining work, the result of the case study is presented to librarians so as to collect professional opinions regarding the possibility of employing data mining to improve reference services. In the end, positive opinions are collected, which implies that it is feasible to utilizing data mining as a tool to enhance library reference service.
Resumo:
Following the workshop on new developments in daily licensing practice in November 2011, we brought together fourteen representatives from national consortia (from Denmark, Germany, Netherlands and the UK) and publishers (Elsevier, SAGE and Springer) met in Copenhagen on 9 March 2012 to discuss provisions in licences to accommodate new developments. The one day workshop aimed to: present background and ideas regarding the provisions KE Licensing Expert Group developed; introduce and explain the provisions the invited publishers currently use;ascertain agreement on the wording for long term preservation, continuous access and course packs; give insight and more clarity about the use of open access provisions in licences; discuss a roadmap for inclusion of the provisions in the publishers’ licences; result in report to disseminate the outcome of the meeting. Participants of the workshop were: United Kingdom: Lorraine Estelle (Jisc Collections) Denmark: Lotte Eivor Jørgensen (DEFF), Lone Madsen (Southern University of Denmark), Anne Sandfær (DEFF/Knowledge Exchange) Germany: Hildegard Schaeffler (Bavarian State Library), Markus Brammer (TIB) The Netherlands: Wilma Mossink (SURF), Nol Verhagen (University of Amsterdam), Marc Dupuis (SURF/Knowledge Exchange) Publishers: Alicia Wise (Elsevier), Yvonne Campfens (Springer), Bettina Goerner (Springer), Leo Walford (Sage) Knowledge Exchange: Keith Russell The main outcome of the workshop was that it would be valuable to have a standard set of clauses which could used in negotiations, this would make concluding licences a lot easier and more efficient. The comments on the model provisions the Licensing Expert group had drafted will be taken into account and the provisions will be reformulated. Data and text mining is a new development and demand for access to allow for this is growing. It would be easier if there was a simpler way to access materials so they could be more easily mined. However there are still outstanding questions on how authors of articles that have been mined can be properly attributed.
Resumo:
The incredible rapid development to huge volumes of air travel, mainly because of jet airliners that appeared to the sky in the 1950s, created the need for systematic research for aviation safety and collecting data about air traffic. The structured data can be analysed easily using queries from databases and running theseresults through graphic tools. However, in analysing narratives that often give more accurate information about the case, mining tools are needed. The analysis of textual data with computers has not been possible until data mining tools have been developed. Their use, at least among aviation, is still at a moderate level. The research aims at discovering lethal trends in the flight safety reports. The narratives of 1,200 flight safety reports from years 1994 – 1996 in Finnish were processed with three text mining tools. One of them was totally language independent, the other had a specific configuration for Finnish and the third originally created for English, but encouraging results had been achieved with Spanish and that is why a Finnish test was undertaken, too. The global rate of accidents is stabilising and the situation can now be regarded as satisfactory, but because of the growth in air traffic, the absolute number of fatal accidents per year might increase, if the flight safety will not be improved. The collection of data and reporting systems have reached their top level. The focal point in increasing the flight safety is analysis. The air traffic has generally been forecasted to grow 5 – 6 per cent annually over the next two decades. During this period, the global air travel will probably double also with relatively conservative expectations of economic growth. This development makes the airline management confront growing pressure due to increasing competition, signify cant rise in fuel prices and the need to reduce the incident rate due to expected growth in air traffic volumes. All this emphasises the urgent need for new tools and methods. All systems provided encouraging results, as well as proved challenges still to be won. Flight safety can be improved through the development and utilisation of sophisticated analysis tools and methods, like data mining, using its results supporting the decision process of the executives.
Resumo:
Double Degree
Resumo:
Este trabalho objetivou realizar a sistematização e análise das informações disponíveis na literatura sobre técnicas de produção de mudas de seis espécies florestais nativas e exóticas no Bioma Amazônia.
Resumo:
Il presente elaborato esplora l’attitudine delle organizzazioni nei confronti dei processi di business che le sostengono: dalla semi-assenza di struttura, all’organizzazione funzionale, fino all’avvento del Business Process Reengineering e del Business Process Management, nato come superamento dei limiti e delle problematiche del modello precedente. All’interno del ciclo di vita del BPM, trova spazio la metodologia del process mining, che permette un livello di analisi dei processi a partire dagli event data log, ossia dai dati di registrazione degli eventi, che fanno riferimento a tutte quelle attività supportate da un sistema informativo aziendale. Il process mining può essere visto come naturale ponte che collega le discipline del management basate sui processi (ma non data-driven) e i nuovi sviluppi della business intelligence, capaci di gestire e manipolare l’enorme mole di dati a disposizione delle aziende (ma che non sono process-driven). Nella tesi, i requisiti e le tecnologie che abilitano l’utilizzo della disciplina sono descritti, cosi come le tre tecniche che questa abilita: process discovery, conformance checking e process enhancement. Il process mining è stato utilizzato come strumento principale in un progetto di consulenza da HSPI S.p.A. per conto di un importante cliente italiano, fornitore di piattaforme e di soluzioni IT. Il progetto a cui ho preso parte, descritto all’interno dell’elaborato, ha come scopo quello di sostenere l’organizzazione nel suo piano di improvement delle prestazioni interne e ha permesso di verificare l’applicabilità e i limiti delle tecniche di process mining. Infine, nell’appendice finale, è presente un paper da me realizzato, che raccoglie tutte le applicazioni della disciplina in un contesto di business reale, traendo dati e informazioni da working papers, casi aziendali e da canali diretti. Per la sua validità e completezza, questo documento è stata pubblicato nel sito dell'IEEE Task Force on Process Mining.
Resumo:
Ensemble Stream Modeling and Data-cleaning are sensor information processing systems have different training and testing methods by which their goals are cross-validated. This research examines a mechanism, which seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events to eliminate the noises that are uncorrelated, and choose the most likely model without over fitting thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble which has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction such as a bush or natural forest-fire event we make an assumption of the burnt area (BA*), sensed ground truth as our target variable obtained from logs. Even though this is an obvious model choice the results are disappointing. The reasons for this are two: One, the histogram of fire activity is highly skewed. Two, the measured sensor parameters are highly correlated. Since using non descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use F-measure score, which combines precision and accuracy to determine the false alarm rate of fire events. The multi-target data-cleaning trees use information purity of the target leaf-nodes to learn higher order features. A sensitive variance measure such as f-test is performed during each node’s split to select the best attribute. Ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier. The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensor led to the formation of streams for sensor-enabled applications. Which further motivates the novelty of stream quality labeling and its importance in solving vast amounts of real-time mobile streams generated today.
Resumo:
The semiarid region of northeastern Brazil, the Caatinga, is extremely important due to its biodiversity and endemism. Measurements of plant physiology are crucial to the calibration of Dynamic Global Vegetation Models (DGVMs) that are currently used to simulate the responses of vegetation in face of global changes. In a field work realized in an area of preserved Caatinga forest located in Petrolina, Pernambuco, measurements of carbon assimilation (in response to light and CO2) were performed on 11 individuals of Poincianella microphylla, a native species that is abundant in this region. These data were used to calibrate the maximum carboxylation velocity (Vcmax) used in the INLAND model. The calibration techniques used were Multiple Linear Regression (MLR), and data mining techniques as the Classification And Regression Tree (CART) and K-MEANS. The results were compared to the UNCALIBRATED model. It was found that simulated Gross Primary Productivity (GPP) reached 72% of observed GPP when using the calibrated Vcmax values, whereas the UNCALIBRATED approach accounted for 42% of observed GPP. Thus, this work shows the benefits of calibrating DGVMs using field ecophysiological measurements, especially in areas where field data is scarce or non-existent, such as in the Caatinga
Resumo:
Background: The inherent complexity of statistical methods and clinical phenomena compel researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them have a complete knowledge in their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Though communication has a central role in interdisciplinary collaboration and since miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. Methods/Principal Findings: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of ""what if'' situations that helped clarify how the method or information from the other field would behave, if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. Conclusion/Significance: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.
Resumo:
Stream discharge-concentration relationships are indicators of terrestrial ecosystem function. Throughout the Amazon and Cerrado regions of Brazil rapid changes in land use and land cover may be altering these hydrochemical relationships. The current analysis focuses on factors controlling the discharge-calcium (Ca) concentration relationship since previous research in these regions has demonstrated both positive and negative slopes in linear log(10)discharge-log(10)Ca concentration regressions. The objective of the current study was to evaluate factors controlling stream discharge-Ca concentration relationships including year, season, stream order, vegetation cover, land use, and soil classification. It was hypothesized that land use and soil class are the most critical attributes controlling discharge-Ca concentration relationships. A multilevel, linear regression approach was utilized with data from 28 streams throughout Brazil. These streams come from three distinct regions and varied broadly in watershed size (< 1 to > 10(6) ha) and discharge (10(-5.7)-10(3.2) m(3) s(-1)). Linear regressions of log(10)Ca versus log(10)discharge in 13 streams have a preponderance of negative slopes with only two streams having significant positive slopes. An ANOVA decomposition suggests the effect of discharge on Ca concentration is large but variable. Vegetation cover, which incorporates aspects of land use, explains the largest proportion of the variance in the effect of discharge on Ca followed by season and year. In contrast, stream order, land use, and soil class explain most of the variation in stream Ca concentration. In the current data set, soil class, which is related to lithology, has an important effect on Ca concentration but land use, likely through its effect on runoff concentration and hydrology, has a greater effect on discharge-concentration relationships.
Resumo:
Hot tensile and creep tests were carried out on Kanthal A1 alloy in the temperature range from 600 to 800 degrees C. Each of these sets of data were analyzed separately according to their own methodologies, but an attempt was made to find a correlation between them. A new criterion proposed for converting hot tensile data to creep data, makes possible the analysis of the two kinds of results according to usual creep relations like: Norton, Monkman-Grant, Larson-Miller and others. The remarkable compatibility verified between both sets of data by this procedure strongly suggests that hot tensile data can be converted to creep data and vice-versa for Kanthal A1 alloy, as verified previously for other metallic materials.
Resumo:
The productivity associated with commonly available disassembly methods today seldomly makes disassembly the preferred end-of-life solution for massive take back product streams. Systematic reuse of parts or components, or recycling of pure material fractions are often not achievable in an economically sustainable way. In this paper a case-based review of current disassembly practices is used to analyse the factors influencing disassembly feasibility. Data mining techniques were used to identify major factors influencing the profitability of disassembly operations. Case characteristics such as involvement of the product manufacturer in the end-of-life treatment and continuous ownership are some of the important dimensions. Economic models demonstrate that the efficiency of disassembly operations should be increased an order of magnitude to assure the competitiveness of ecologically preferred, disassembly oriented end-of-life scenarios for large waste of electric and electronic equipment (WEEE) streams. Technological means available to increase the productivity of the disassembly operations are summarized. Automated disassembly techniques can contribute to the robustness of the process, but do not allow to overcome the efficiency gap if not combined with appropriate product design measures. Innovative, reversible joints, collectively activated by external trigger signals, form a promising approach to low cost, mass disassembly in this context. A short overview of the state-of-the-art in the development of such self-disassembling joints is included. (c) 2008 CIRP.
Resumo:
The central issue for pillar design in underground coal mining is the in situ uniaxial compressive strength (sigma (cm)). The paper proposes a new method for estimating in situ uniaxial compressive strength in coal seams based on laboratory strength and P wave propagation velocity. It describes the collection of samples in the Bonito coal seam, Fontanella Mine, southern Brazil, the techniques used for the structural mapping of the coal seam and determination of seismic wave propagation velocity as well as the laboratory procedures used to determine the strength and ultrasonic wave velocity. The results obtained using the new methodology are compared with those from seven other techniques for estimating in situ rock mass uniaxial compressive strength.