916 resultados para Data Mining and its Application
Resumo:
Data mining is the process to identify valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases, (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, seme-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal data with large volume, distributed, time variant, noisy, and high dimensionality. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in the data mining, particularly for data mining applications into engineering fields. Together with regression, classification is mainly for predictive modelling. So far, there have been a number of classification algorithms in practice. According to (Sebastiani, 2002), the main classification algorithms can be categorized as: decision tree and rule based approach such as C4.5 (Quinlan, 1996); probability methods such as Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001), neural networks methods (Rumelhart, Hinton & Wiliams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).
Resumo:
Recent interest in the validation of general circulation models (GCMs) has been devoted to objective methods. A small number of authors have used the direct synoptic identification of phenomena together with a statistical analysis to perform the objective comparison between various datasets. This paper describes a general method for performing the synoptic identification of phenomena that can be used for an objective analysis of atmospheric, or oceanographic, datasets obtained from numerical models and remote sensing. Methods usually associated with image processing have been used to segment the scene and to identify suitable feature points to represent the phenomena of interest. This is performed for each time level. A technique from dynamic scene analysis is then used to link the feature points to form trajectories. The method is fully automatic and should be applicable to a wide range of geophysical fields. An example will be shown of results obtained from this method using data obtained from a run of the Universities Global Atmospheric Modelling Project GCM.
Resumo:
Current commercially available Doppler lidars provide an economical and robust solution for measuring vertical and horizontal wind velocities, together with the ability to provide co- and cross-polarised backscatter profiles. The high temporal resolution of these instruments allows turbulent properties to be obtained from studying the variation in radial velocities. However, the instrument specifications mean that certain characteristics, especially the background noise behaviour, become a limiting factor for the instrument sensitivity in regions where the aerosol load is low. Turbulent calculations require an accurate estimate of the contribution from velocity uncertainty estimates, which are directly related to the signal-to-noise ratio. Any bias in the signal-to-noise ratio will propagate through as a bias in turbulent properties. In this paper we present a method to correct for artefacts in the background noise behaviour of commercially available Doppler lidars and reduce the signal-to-noise ratio threshold used to discriminate between noise, and cloud or aerosol signals. We show that, for Doppler lidars operating continuously at a number of locations in Finland, the data availability can be increased by as much as 50 % after performing this background correction and subsequent reduction in the threshold. The reduction in bias also greatly improves subsequent calculations of turbulent properties in weak signal regimes.
Resumo:
In Part One, the foundations of Bayesian inference are reviewed, and the technicalities of the Bayesian method are illustrated. Part Two applies the Bayesian meta-analysis program, the Confidence Profile Method (CPM), to clinical trial data and evaluates the merits of using Bayesian meta-analysis for overviews of clinical trials.^ The Bayesian method of meta-analysis produced similar results to the classical results because of the large sample size, along with the input of a non-preferential prior probability distribution. These results were anticipated through explanations in Part One of the mechanics of the Bayesian approach. ^
Resumo:
Comunicación presentada en las IV Jornadas TIMM, Torres (Jaén), 7-8 abril 2011.
Resumo:
Metagenomics is the culture-independent study of genetic material obtained directly from environmental samples. It has become a realistic approach to understanding microbial communities thanks to advances in high-throughput DNA sequencing technologies over the past decade. Current research has shown that different sites of the human body house varied bacterial communities. There is a strong correlation between an individual’s microbial community profile at a given site and disease. Metagenomics is being applied more often as a means of comparing microbial profiles in biomedical studies. The analysis of the data collected using metagenomics can be quite challenging and there exist a plethora of tools for interpreting the results. An automatic analytical workflow for metagenomic analyses has been implemented and tested using synthetic datasets of varying quality. It is able to accurately classify bacteria by taxa and correctly estimate the richness and diversity of each set. The workflow was then applied to the study of the airways microbiome in Chronic Obstructive Pulmonary Disease (COPD). COPD is a progressive lung disease resulting in narrowing of the airways and restricted airflow. Despite being the third leading cause of death in the United States, little is known about the differences in the lung microbial community profiles of healthy individuals and COPD patients. Bronchoalveolar lavage (BAL) samples were collected from COPD patients, active or ex-smokers, and never smokers and sequenced by 454 pyrosequencing. A total of 56 individuals were recruited for the study. Substantial colonization of the lungs was found in all subjects and differentially abundant genera in each group were identified. These discoveries are promising and may further our understanding of how the structure of the lung microbiome is modified as COPD progresses. It is also anticipated that the results will eventually lead to improved treatments for COPD.
Resumo:
The Lena River Delta, situated in Northern Siberia (72.0 - 73.8° N, 122.0 - 129.5° E), is the largest Arctic delta and covers 29,000 km**2. Since natural deltas are characterised by complex geomorphological patterns and various types of ecosystems, high spatial resolution information on the distribution and extent of the delta environments is necessary for a spatial assessment and accurate quantification of biogeochemical processes as drivers for the emission of greenhouse gases from tundra soils. In this study, the first land cover classification for the entire Lena Delta based on Landsat 7 Enhanced Thematic Mapper (ETM+) images was conducted and used for the quantification of methane emissions from the delta ecosystems on the regional scale. The applied supervised minimum distance classification was very effective with the few ancillary data that were available for training site selection. Nine land cover classes of aquatic and terrestrial ecosystems in the wetland dominated (72%) Lena Delta could be defined by this classification approach. The mean daily methane emission of the entire Lena Delta was calculated with 10.35 mg CH4/m**2/d. Taking our multi-scale approach into account we find that the methane source strength of certain tundra wetland types is lower than calculated previously on coarser scales.
A new analysis of hydrographic data in the Atlantic and its application to an inverse modeling study
Resumo:
Forecasting abrupt variations in wind power generation (the so-called ramps) helps achieve large scale wind power integration. One of the main issues to be confronted when addressing wind power ramp forecasting is the way in which relevant information is identified from large datasets to optimally feed forecasting models. To this end, an innovative methodology oriented to systematically relate multivariate datasets to ramp events is presented. The methodology comprises two stages: the identification of relevant features in the data and the assessment of the dependence between these features and ramp occurrence. As a test case, the proposed methodology was employed to explore the relationships between atmospheric dynamics at the global/synoptic scales and ramp events experienced in two wind farms located in Spain. The achieved results suggested different connection degrees between these atmospheric scales and ramp occurrence. For one of the wind farms, it was found that ramp events could be partly explained from regional circulations and zonal pressure gradients. To perform a comprehensive analysis of ramp underlying causes, the proposed methodology could be applied to datasets related to other stages of the wind-topower conversion chain.
Resumo:
A simple and low cost method to determine volatile contaminants in post-consumer recycled PET flakes was developed and validated by Headspace Dynamic Concentration and Gas Chromatography-Flame Ionization Detection (HDC-GC-FID). The analytical parameters evaluated by using surrogates include: correlation coefficient, detection limit, quantification limit, accuracy, intra-assay precision, and inter-assay precision. In order to compare the efficiency of the proposed method to recognized automated techniques, post-consumer PET packaging samples collected in Brazil were used. GC-MS was used to confirm the identity of the substances identified in the PET packaging. Some of the identified contaminants were estimated in the post-consumer material at concentrations higher than 220 ng.g-1. The findings in this work corroborate data available in the scientific literature pointing out the suitability of the proposed analytical method.
Resumo:
The solubilities and dissolution rates of three gypsum sources (analytical grade (AG), phosphogypsum (PG) and mined gypsum (MG)) with six MG size fractions ((mm) > 2.0, 1.0-2.0, 0.5-1.0, 0.25-0.5, 0.125-0.25, and < 0.125) were investigated in triple deionised water (TDI) and seawater to examine their suitability for bauxite residue amelioration. Gypsum solubility was greater in seawater (3.8 g L 1) than TDI (2.9 g L 1) due to the ionic strength effect, with dissolution in both TDI and seawater following first order kinetics. Dissolution rate constants varied with gypsum source (AR > PG > MG) due to reactivity and surface area differences, with 1:20 gypsum:solution suspensions reaching saturation within 15 s (AR) to 30 min (MG > 2.0). The ability of bauxite residue to adsorb Ca from solution was also examined. The quantity of the total solution Ca adsorbed was found to be small (5 %). These low rates of solution Ca adsorption combined with the comparatively rapid dissolution rates preclude the application of gypsum to the residue sand/seawater slurry as a method for residue amelioration. Instead, direct field application to the residue would ensure more efficient gypsum use. In addition, the formation of a sparingly soluble CaCO3 coating around the gypsum particles after mixing in a highly alkaline seawater/supernatant liquor (SNL) solution greatly reduced the rate of gypsum dissolution.
Resumo:
NMR spectroscopy and simulated annealing calculations have been used to determine the three-dimensional structure of NaD1, a novel antifungal and insecticidal protein isolated from the flowers of Nicotiana alata. NaD1 is a basic, cysteine-rich protein of 47 residues and is the first example of a plant defensin from flowers to be characterized structurally. Its three-dimensional structure consists of an a-helix and a triple-stranded anti-parallel beta-sheet that are stabilized by four intramolecular disulfide bonds. NaD1 features all the characteristics of the cysteine-stabilized up motif that has been described for a variety of proteins of differing functions ranging from antibacterial insect defensins and ion channel-perturbing scorpion toxins to an elicitor of the sweet taste response. The protein is biologically active against insect pests, which makes it a potential candidate for use in crop protection. NaD1 shares 31% sequence identity with alfAFP, an antifungal protein from alfalfa that confers resistance to a fungal pathogen in transgenic potatoes. The structure of NaD1 was used to obtain a homology model of alfAFP, since NaD1 has the highest level of sequence identity with alfAFP of any structurally characterized antifungal defensin. The structures of NaD1 and alfAFP were used in conjunction with structure - activity data for the radish defensin Rs-AFP2 to provide an insight into structure-function relationships. In particular, a putative effector site was identified in the structure of NaD1 and in the corresponding homology model of alfAFP. (C) 2002 Elsevier Science Ltd. All rights reserved.
Resumo:
The aim of this work is to introduce a systematic press database on natural hazards and climate change in Catalonia (NE of Spain) and to analyze its potential application to social-impact studies. For this reason, a review of the concepts of risk, hazard, vulnerability and social perception is also included. This database has been built for the period 1982¿2007 and contains all the news related with those issues published by the oldest still-active newspaper in Catalonia. Some parameters are registered for each article and for each event, including criteria that enable us to determine the importance accorded to it by the newspaper, and a compilation of information about it. This ACCESS data base allows each article to be classified on the basis of the seven defined topics and key words, as well as summary information about the format and structuring of the new itself, the social impact of the event and data about the magnitude or intensity of the event. The coverage given to this type of news has been assessed because of its influence on construction of the social perception of natural risk and climate change, and as a potential source of information about them. The treatment accorded by the press to different risks is also considered. More than 14 000 press articles have been classified. Results show that the largest number of news items for the period 1982¿2007 relates to forest fires and droughts, followed by floods and heavy rainfalls, although floods are the major risk in the region of study. Two flood events recorded in 2002 have been analyzed in order to show an example of the role of the press information as indicator of risk perception.