92 results for Data Mining and Machine Learning
Abstract:
Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations, including modern data mining processes, onboard these small devices. A decade of research has proved the feasibility of what has been termed Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it was not until 2010 that the authors of this book initiated the Pocket Data Mining (PDM) project, exploiting the seamless communication among handheld devices performing data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment of this emerging area of research. Details of the techniques used and thorough experimental studies are given. More importantly, and exclusive to this book, the authors provide a detailed practical guide to the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM dealing with concept drift is also reported. Potential applications of paramount importance offered by PDM in the era of Big Data, in a variety of domains including security, business and telemedicine, are discussed.
Abstract:
Background: Major Depressive Disorder (MDD) is among the most prevalent and disabling medical conditions worldwide. Identification of clinical and biological markers (“biomarkers”) of treatment response could personalize clinical decisions and lead to better outcomes. This paper describes the aims, design, and methods of a discovery study of biomarkers in antidepressant treatment response, conducted by the Canadian Biomarker Integration Network in Depression (CAN-BIND). The CAN-BIND research program investigates and identifies biomarkers that help to predict outcomes in patients with MDD treated with antidepressant medication. The primary objective of this initial study (known as CAN-BIND-1) is to identify individual and integrated neuroimaging, electrophysiological, molecular, and clinical predictors of response to sequential antidepressant monotherapy and adjunctive therapy in MDD. Methods: CAN-BIND-1 is a multisite initiative involving 6 academic health centres working collaboratively with other universities and research centres. In the 16-week protocol, patients with MDD are treated with a first-line antidepressant (escitalopram 10–20 mg/d) that, if clinically warranted after eight weeks, is augmented with an evidence-based, add-on medication (aripiprazole 2–10 mg/d). Comprehensive datasets are obtained using clinical rating scales; behavioural, dimensional, and functioning/quality of life measures; neurocognitive testing; genomic, genetic, and proteomic profiling from blood samples; combined structural and functional magnetic resonance imaging; and electroencephalography. De-identified data from all sites are aggregated within a secure neuroinformatics platform for data integration, management, storage, and analyses. Statistical analyses will include multivariate and machine-learning techniques to identify predictors, moderators, and mediators of treatment response.
Discussion: From June 2013 to February 2015, a cohort of 134 participants (85 outpatients with MDD and 49 healthy participants) was evaluated at baseline. The clinical characteristics of this cohort are similar to those reported in other studies of MDD. Recruitment at all sites is ongoing, with a target sample of 290 participants. CAN-BIND will identify biomarkers of treatment response in MDD through extensive clinical, molecular, and imaging assessments, in order to improve treatment practice and clinical outcomes. It will also create an innovative, robust platform and database for future research.
Abstract:
Satellite data, reanalysis products and climate models are combined to monitor changes in water vapour, clear-sky radiative cooling of the atmosphere and precipitation over the period 1979-2006. Climate models are able to simulate observed increases in column integrated water vapour (CWV) with surface temperature (Ts) over the ocean. Changes in the observing system lead to spurious variability in water vapour and clear-sky longwave radiation in reanalysis products. Nevertheless, all products considered exhibit a robust increase in clear-sky longwave radiative cooling from the atmosphere to the surface; clear-sky longwave radiative cooling of the atmosphere is found to increase with Ts at the rate of ~4 W m⁻² K⁻¹ over tropical ocean regions of mean descending vertical motion. Precipitation (P) is tightly coupled to atmospheric radiative cooling rates, and this implies an increase in P with warming at a slower rate than the observed increases in CWV. Since convective precipitation depends on moisture convergence, the above implies enhanced precipitation over convective regions and reduced precipitation over convectively suppressed regimes. To quantify this response, observed and simulated changes in precipitation rate are analysed separately over regions of mean ascending and descending vertical motion over the tropics. The observed response is found to be substantially larger than the model simulations and climate change projections. It is currently not clear whether this is due to deficiencies in model parametrizations or errors in satellite retrievals.
Abstract:
The long-term stability, high accuracy, all-weather capability, high vertical resolution, and global coverage of Global Navigation Satellite System (GNSS) radio occultation (RO) suggest it as a promising tool for global monitoring of atmospheric temperature change. With the aim of investigating and quantifying how well a GNSS RO observing system is able to detect climate trends, we are currently performing a (climate) observing system simulation experiment over the 25-year period 2001 to 2025, which involves quasi-realistic modeling of the neutral atmosphere and the ionosphere. We carried out two climate simulations with the general circulation model MAECHAM5 (Middle Atmosphere European Centre/Hamburg Model Version 5) of the MPI-M Hamburg, covering the period 2001–2025: one control run with natural variability only and one run also including anthropogenic forcings due to greenhouse gases, sulfate aerosols, and tropospheric ozone. On the basis of this, we perform quasi-realistic simulations of RO observables for a small GNSS receiver constellation (six satellites), state-of-the-art data processing for atmospheric profile retrieval, and a statistical analysis of temperature trends in both the “observed” climatology and the “true” climatology. Here we describe the setup of the experiment and results from a test bed study conducted to obtain a basic set of realistic estimates of observational errors (instrument- and retrieval processing-related errors) and sampling errors (due to spatial-temporal undersampling). The test bed results, obtained for a typical summer season and compared to the climatic 2001–2025 trends from the MAECHAM5 simulation including anthropogenic forcing, were found to be encouraging for performing the full 25-year experiment.
They indicated that observational and sampling errors (both contributing about 0.2 K) are consistent with recent estimates of these errors from real RO data and that they should be sufficiently small for monitoring expected temperature trends in the global atmosphere over the next 10 to 20 years in most regions of the upper troposphere and lower stratosphere (UTLS). Inspection of the MAECHAM5 trends in different RO-accessible atmospheric parameters (microwave refractivity and pressure/geopotential height in addition to temperature) indicates complementary climate change sensitivity in different regions of the UTLS, so that optimized climate monitoring should combine information from all climatic key variables retrievable from GNSS RO data.
Abstract:
Trace elements may present an environmental hazard in the vicinity of mining and smelting activities. However, the factors controlling their distribution and transfer within the soil and vegetation systems are not always well defined. Total concentrations of up to 15,195 mg·kg⁻¹ As, 6,690 mg·kg⁻¹ Cu, 24,820 mg·kg⁻¹ Pb and 9,810 mg·kg⁻¹ Zn in soils, and 62 mg·kg⁻¹ As, 1,765 mg·kg⁻¹ Cu, 280 mg·kg⁻¹ Pb and 3,460 mg·kg⁻¹ Zn in vegetation were measured. However, unusually for smelters and mines of a similar size, the elevated trace element concentrations in soils were found to be restricted to the immediate vicinity of the mines and smelters (maximum 2-3 km). Parent material, prevailing wind direction, and soil physical and chemical characteristics were found to correlate poorly with the restricted trace element distributions in soils. Two hypotheses are given for this unusual distribution: (1) the contaminated soils were removed by erosion, or (2) mines and smelters released large heavy particles that could not have been transported long distances. Analyses of the accumulation of trace elements in vegetation (median ratios: As 0.06, Cu 0.19, Pb 0.54 and Zn 1.07) and the percentage of total trace elements being DTPA extractable in soils (median percentages: As 0.06%, Cu 15%, Pb 7% and Zn 4%) indicated higher relative trace element mobility in soils with low total concentrations than in soils with elevated concentrations.
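The two mobility indicators quoted in this abstract can be illustrated with a short sketch. It assumes that the accumulation ratio is the concentration in vegetation divided by the total concentration in the paired soil sample, and that the DTPA percentage is the extractable concentration as a share of the total; the abstract does not spell out either definition. The sample values and function names below are hypothetical and are not data from the study.

```python
from statistics import median

# Hypothetical paired samples for one element (mg/kg); NOT data from the study.
samples = [
    {"soil": 100.0, "vegetation": 50.0},
    {"soil": 200.0, "vegetation": 120.0},
    {"soil": 1000.0, "vegetation": 100.0},
]


def accumulation_ratios(paired_samples):
    """Vegetation-to-soil concentration ratio for each paired sample (assumed definition)."""
    return [s["vegetation"] / s["soil"] for s in paired_samples]


def percent_extractable(extractable, total):
    """Extractable concentration as a percentage of the total concentration."""
    return 100.0 * extractable / total


ratios = accumulation_ratios(samples)
median_ratio = median(ratios)  # summary statistic of the kind quoted in the abstract
```

In this made-up data the sample with the highest soil concentration has the lowest ratio, which is the pattern the abstract describes: relative mobility is higher where total concentrations are low.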
Abstract:
Trace elements may present an environmental hazard in the vicinity of mining and smelting activities. However, the factors controlling trace element distribution in soils around ancient and modern mining and smelting areas are not always clear. Tharsis, Riotinto and Huelva are located in the Iberian Pyrite Belt in SW Spain. Tharsis and Riotinto mines have been exploited since 2500 B.C., with intensive smelting taking place. Huelva, established in 1970 and using the Flash Furnace Outokumpu process, is currently one of the largest smelters in the world. Pyrite and chalcopyrite ore have been intensively smelted for Cu. However, unusually for smelters and mines of a similar size, the elevated trace element concentrations in soils were found to be restricted to the immediate vicinity of the mines and smelters, being found up to a maximum of 2 km from the mines and smelters at Tharsis, Riotinto and Huelva. Trace element partitioning (over 2/3 of trace elements found in the residual immobile fraction of soils at Tharsis) and examination of soil particles by SEM-EDX showed that trace elements were not adsorbed onto soil particles, but were included within the matrix of large trace element-rich Fe silicate slag particles (i.e. 1 mm in diameter, with at least 1 wt.% As, Cu and Zn, and 2 wt.% Pb). The large size of the slag particles (1 mm in diameter) was found to control the geographically restricted trace element distribution in soils at Tharsis, Riotinto and Huelva, since large heavy particles could not have been transported long distances. Distribution and partitioning indicated that impacts to the environment as a result of mining and smelting should remain minimal in the region. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
Toxic trace elements present an environmental hazard in the vicinity of mining and smelting activities. However, the processes of transfer of these elements to groundwater and to plants are not always clear. Tharsis mine, in the Iberian pyrite belt (SW Spain), has been exploited since 2500 BC, with extensive smelting taking place from the 1850s until the 1920s. Sixty-four soil (mainly topsoil) and vegetation samples were collected in February 2001 and analysed by ICP-AES for 23 elements. Concentrations are 6-6300 mg kg⁻¹ As and 14-24800 mg kg⁻¹ Pb in soils, and 0.20-9 mg kg⁻¹ As and 2-195 mg kg⁻¹ Pb in vegetation. Trace element concentrations decrease rapidly away from the mine, with As and Pb concentrations in the range 6-1850 mg kg⁻¹ (median 22 mg kg⁻¹) and 14-31 mg kg⁻¹ (median 43 mg kg⁻¹), respectively, 1 km away from the mine. These concentrations are low when compared to other well-studied mining and smelting areas (e.g. 600 mg kg⁻¹ As at 8 km from Yellowknife smelter, Canada; >100 mg kg⁻¹ Pb over 270 km² around the Pb-Zn Port Pirie smelter, South Australia; mean of 1419 mg kg⁻¹ Pb around Aberystwyth smelter, Wales, UK). The high metal content of the vegetation and the low soil pH (mean pH 4.93) indicate the potential for trace element mobility, which could explain the relatively low concentration of metals in Tharsis topsoils and could threaten plans to redevelop the Tharsis area as an orange plantation.
Abstract:
Two distinctions in the human learning literature are becoming increasingly influential: implicit versus explicit memory, and implicit versus explicit learning. To date, these distinctions have been used to refer to apparently different phenomena. Recent research suggests, however, that the same processes may underlie performance in the two types of task. This paper reviews recent results in the two areas and suggests ways in which the two distinctions may be related.
Abstract:
The “butterfly effect” is a popularly known paradigm: it is commonly said that when a butterfly flaps its wings in Brazil, it may cause a tornado in Texas. This essentially describes how weather forecasts can be extremely sensitive to small changes in the given atmospheric data, or initial conditions, used in computer model simulations. In 1961 Edward Lorenz found, when running a weather model, that small changes in the initial conditions given to the model can, over time, lead to entirely different forecasts (Lorenz, 1963). This discovery highlights one of the major challenges in modern weather forecasting: providing the computer model with the most accurately specified initial conditions possible. A process known as data assimilation seeks to minimize the errors in the given initial conditions and was, in 1911, described by Bjerknes as “the ultimate problem in meteorology” (Bjerknes, 1911).
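The sensitivity to initial conditions described in this abstract can be reproduced with a minimal sketch. It integrates the three-variable system published in Lorenz (1963) with its classic parameters (sigma = 10, rho = 28, beta = 8/3), used here as a stand-in for the weather model mentioned in the abstract. The step size, run length, size of the perturbation and the fourth-order Runge-Kutta integrator are all assumptions made for illustration.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the three-variable Lorenz (1963) system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one step of size dt with classical Runge-Kutta."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(perturbation=1e-8, steps=4000, dt=0.01):
    """Distance between two runs whose initial x values differ by `perturbation`."""
    run_a = (1.0, 1.0, 1.0)
    run_b = (1.0 + perturbation, 1.0, 1.0)
    history = []
    for _ in range(steps):
        run_a = rk4_step(run_a, dt)
        run_b = rk4_step(run_b, dt)
        history.append(sum((p - q) ** 2 for p, q in zip(run_a, run_b)) ** 0.5)
    return history


history = separation_history()
print(f"separation after first step: {history[0]:.2e}")
print(f"largest separation reached:  {max(history):.2e}")
```

The two runs start almost identically, yet their separation grows by many orders of magnitude over the integration. This is why the abstract stresses accurate initial conditions and the role of data assimilation.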
Abstract:
In real-world applications, sequential algorithms for data mining and data exploration are often unsuitable for datasets of enormous size, high dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context, high-performance distributed data mining algorithms need to be developed. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large-scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well-known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.