30 results for mining landscapes
at Université de Lausanne, Switzerland
Abstract:
Combinatorial optimization involves finding an optimal solution in a finite set of options; many everyday problems are of this kind. However, the number of options grows exponentially with the size of the problem, so an exhaustive search for the best solution is practically infeasible beyond a certain problem size. When efficient algorithms are not available, a practical way to obtain an approximate solution is to start with an educated guess and gradually refine it until the solution is good enough. Roughly speaking, this is how local search heuristics work. These stochastic algorithms navigate the problem search space by iteratively turning the current solution into new candidate solutions, guiding the search towards better solutions. The search performance therefore depends on structural aspects of the search space, which in turn depend on the move operator used to modify solutions. A common way to characterize the search space of a problem is through the study of its fitness landscape, a mathematical object comprising the space of all possible solutions, their value with respect to the optimization objective, and a neighborhood relation defined by the move operator. The landscape metaphor is used to explain the search dynamics as a sort of potential function. The concept is indeed similar to that of potential energy surfaces in physical chemistry. Borrowing ideas from that field, we propose to extend to combinatorial landscapes the notion of the inherent network formed by energy minima in energy landscapes. In our case, energy minima are the local optima of the combinatorial problem, and we explore several definitions for the network edges. At first, we perform an exhaustive sampling of the local optima's basins of attraction, and define weighted transitions between basins by accounting for all the possible ways of crossing a basin's frontier via one random move.
Then, we reduce the computational burden by only counting the chances of escaping a given basin via random kick moves that start at the local optimum. Finally, we approximate network edges from the search trajectory of simple search heuristics, mining the frequency and inter-arrival time with which the heuristic visits local optima. Through these methodologies, we build a weighted directed graph that provides a synthetic view of the whole landscape, and that we can characterize using the tools of complex networks science. We argue that the network characterization can advance our understanding of the structural and dynamical properties of hard combinatorial landscapes. We apply our approach to prototypical problems such as the Quadratic Assignment Problem, the NK model of rugged landscapes, and the Permutation Flow-shop Scheduling Problem. We show that some network metrics can differentiate problem classes, correlate with problem non-linearity, and predict problem hardness as measured from the performances of trajectory-based local search heuristics.
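As an illustration of the first edge definition in the abstract above (weighted transitions between basins via a single move), here is a minimal Python sketch. It is not the author's implementation: the random fitness table, the one-bit-flip move operator, best-improvement hill climbing, and the use of raw crossing counts as edge weights are all assumptions chosen to keep the example small enough for exhaustive enumeration.

```python
import itertools
import random
from collections import defaultdict


def neighbors(s):
    """All bit tuples at Hamming distance 1 from s (assumed move operator)."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]


def hill_climb(s, fitness):
    """Best-improvement local search; returns the local optimum reached from s."""
    while True:
        best = max(neighbors(s), key=fitness.__getitem__)
        if fitness[best] <= fitness[s]:
            return s  # no neighbor is strictly better: s is a local optimum
        s = best


def build_lon(n_bits=8, seed=0):
    """Enumerate a toy landscape and count one-move basin crossings.

    Returns (fitness, basin_of, edges), where edges[(a, b)] is the number of
    (solution, neighbor) pairs that start in the basin of local optimum a
    and land in the basin of local optimum b.
    """
    rng = random.Random(seed)
    space = list(itertools.product((0, 1), repeat=n_bits))
    # Hypothetical fitness values: independent uniform random numbers.
    fitness = {s: rng.random() for s in space}
    basin_of = {s: hill_climb(s, fitness) for s in space}
    edges = defaultdict(int)
    for s in space:
        for t in neighbors(s):
            a, b = basin_of[s], basin_of[t]
            if a != b:
                edges[(a, b)] += 1
    return fitness, basin_of, edges


if __name__ == "__main__":
    fitness, basin_of, edges = build_lon(n_bits=8, seed=0)
    optima = set(basin_of.values())
    print(len(optima), "local optima,", len(edges), "directed edges")
```

Dividing each count by the total number of moves leaving the source basin would turn the weights into transition frequencies. Exhaustive enumeration is only feasible for very small instances, which is the computational burden the abstract goes on to address.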
Abstract:
The survival of threatened species such as the European tree frog (Hyla arborea) is strongly dependent on the genetic variability within populations, as well as on gene flow between them. In Switzerland, only two sectors in its western part still harbour metapopulations. The first is characterised by a very heterogeneous and urbanized landscape, while the second is characterised by an uninterrupted array of suitable habitats. In this study, six microsatellite loci were used to establish levels of genetic differentiation among the populations from the two locations. The results show that the metapopulations have: (i) weak levels of genetic differentiation (FST within metapopulation ≈ 0.04), (ii) no difference in levels of genetic structuring between them, and (iii) significant (p = 0.019) differences in genetic diversity (Hs) and observed heterozygosity (Ho), with the metapopulation located in the disturbed landscape showing lower values. Our results suggest that even if the dispersal of H. arborea among contiguous ponds seems to be efficient in areas of heterogeneous landscape, a loss of genetic diversity can occur.
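For readers unfamiliar with the diversity measures named in the abstract above, the following Python sketch shows how observed heterozygosity and gene diversity can be computed at a single locus. The genotype data and function names are hypothetical, and the gene diversity formula is the plain 1 - sum(p_i^2) form without the small-sample correction that population genetics software typically applies; the study's Hs would additionally be averaged over loci and populations.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def gene_diversity(genotypes):
    """Expected heterozygosity 1 - sum(p_i^2) from pooled allele frequencies.

    Assumption: no small-sample correction is applied.
    """
    alleles = [allele for pair in genotypes for allele in pair]
    total = len(alleles)
    return 1.0 - sum((count / total) ** 2 for count in Counter(alleles).values())


if __name__ == "__main__":
    # Hypothetical microsatellite genotypes (allele sizes in base pairs).
    pond = [(100, 100), (100, 104), (104, 104), (100, 104)]
    print(observed_heterozygosity(pond), gene_diversity(pond))
```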
Abstract:
PURPOSE: In Burkina Faso, gold ore is one of the main sources of income for an important part of the active population. Artisanal gold miners use mercury in the extraction, a toxic metal whose human health risks are well known. The aim of the present study was to assess mercury exposure and to understand the exposure determinants of gold miners in Burkinabe small-scale mines. METHODS: The population examined at the selected gold mining sites was composed of persons who were directly or indirectly involved in gold mining activities. However, urinary mercury was measured in the workers most likely to be exposed to mercury. Thus, occupational exposure to mercury was evaluated among ninety-three workers from eight gold mining sites spread over six regions of Burkina Faso. Work-related exposure determinants, such as amalgamating or heating mercury, were among the factors recorded for each person during urine sampling. All participants were medically examined by a local medical team in order to identify possible symptoms related to the toxic effect of mercury. RESULTS: Mercury levels were high: 69% of the measurements exceeded the ACGIH (American Conference of Governmental Industrial Hygienists) biological exposure index (BEI) of 35 µg per g of creatinine (µg/g-Cr) (prior to shift), while 16% even exceeded 350 µg/g-Cr. Both unspecific and specific symptoms related to mercury toxicity were observed among the persons who were directly involved in gold mining activities. Only one-third of the studied subpopulation reported fewer than three symptoms possibly associated with mercury exposure, and nearly half of them suffered from at least five of these symptoms. Ore washers were more involved in the direct handling of mercury, while gold dealers were more involved in the final gold recovery activities.
These differences may explain the overexposure observed in gold dealers and indicate that the refining process is the major source of exposure. CONCLUSIONS: This study attests that mercury exposure is still an issue of concern. North-South collaborations should encourage knowledge exchange between developing and developed countries, for a cleaner artisanal gold mining process and thus for reducing human health and environmental hazards due to mercury use.
Abstract:
Data mining is an attractive set of methods and techniques that has enjoyed a meteoric rise in popularity in recent years, especially in marketing. The recent development of crime analysis, or criminal intelligence, raises problems to which it is tempting to apply these methods and techniques. The potential and place of data mining in the context of crime analysis must be better defined in order to guide its application. This reflection is carried out in the context of the intelligence produced by systems for the systematic detection and monitoring of repetitive crime, called operational monitoring processes. Their operation requires the existence of patterns embedded in the data and justified by situational approaches in criminology. Equipped with this theoretical background, the main challenge is to explore the possibilities of detecting these patterns through data mining methods and techniques. To meet this objective, research is currently being conducted in Switzerland through an interdisciplinary approach combining forensic, criminological and computational knowledge.
Abstract:
The t(8;21) chromosomal translocation activates aberrant expression of the AML1-ETO (AE) fusion protein and is commonly associated with core binding factor acute myeloid leukaemia (CBF AML). Combining a conditional mouse model that closely resembles the slow evolution and the mosaic AE expression pattern of human t(8;21) CBF AML with global transcriptome sequencing, we find that disease progression was characterized by two principal pathogenic mechanisms. Initially, AE expression modified the lineage potential of haematopoietic stem cells (HSCs), resulting in the selective expansion of the myeloid compartment at the expense of normal erythro- and lymphopoiesis. This lineage skewing was followed by a second substantial rewiring of transcriptional networks occurring in the trajectory to manifest leukaemia. We also find that both HSC and lineage-restricted granulocyte macrophage progenitors (GMPs) acquired leukaemic stem cell (LSC) potential being capable of initiating and maintaining the disease. Finally, our data demonstrate that long-term expression of AE induces an indolent myeloproliferative disease (MPD)-like myeloid leukaemia phenotype with complete penetrance and that acute inactivation of AE function is a potential novel therapeutic option.
Abstract:
DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data, are quickly becoming essential knowledge repositories of the research community. The present paper surveys several databases that are considered "pillars" of research and important nodes in the network. It focuses on a generalized workflow scheme typical for microarray experiments, using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.
Abstract:
Research has demonstrated that landscape- or watershed-scale processes can influence instream aquatic ecosystems, in terms of the impacts of delivery of fine sediment, solutes and organic matter. Testing such impacts upon populations of organisms (i.e. at the catchment scale) has not proven straightforward and differences have emerged in the conclusions reached. This is: (1) partly because different studies have focused upon different scales of enquiry; but also (2) because the emphasis upon upstream land cover has rarely addressed the extent to which such land covers are hydrologically connected, and hence able to deliver diffuse pollution, to the drainage network. However, there is a third issue. In order to develop suitable hydrological models, we need to conceptualise the process cascade. To do this, we need to know what matters to the organism being impacted by the hydrological system, such that we can identify which processes need to be modelled. Acquiring such knowledge is not easy, especially for organisms like fish that might occupy very different locations in the river over relatively short periods of time. However, and inevitably, hydrological modellers have started by building up piecemeal the aspects of the problem that we think matter to fish. Herein, we report two developments: (a) for the case of sediment-associated diffuse pollution from agriculture, a risk-based modelling framework, SCIMAP, has been developed, which is distinct because it has an explicit focus upon hydrological connectivity; and (b) we use spatially distributed ecological data to infer the processes and the associated process parameters that matter to salmonid fry. We apply the model to spatially distributed salmon and fry data from the River Eden, Cumbria, England. The analysis shows, quite surprisingly, that arable land covers are relatively unimportant as drivers of fry abundance.
What matters most is intensive pasture, a land cover that could be associated with a number of stressors on salmonid fry (e.g. pesticides, fine sediment) and which allows us to identify a series of risky field locations, where this land cover is readily connected to the river system by overland flow. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. Positive and negative acquisition results were combined during data mining to simplify the process and interrogate a larger lipidome in a single analysis. To reduce the complexity of the IMS data sets, a sub data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
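The data-reduction step mentioned in the abstract above (randomly selecting a fixed number of spectra from a region of interest) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the representation of a spectrum as a list of intensities, and the fixed seed are hypothetical and are not taken from the published workflow.

```python
import random


def subsample_spectra(roi_spectra, n_keep, seed=0):
    """Randomly pick n_keep spectra, without replacement, from one region of interest.

    roi_spectra: list of spectra (here, each spectrum is a list of intensities).
    seed: fixed so that the same sub data set can be regenerated (an assumption).
    """
    rng = random.Random(seed)
    return rng.sample(roi_spectra, n_keep)


if __name__ == "__main__":
    # Hypothetical region of interest with 1000 spectra of 5 intensity values each.
    roi = [[float(i + j) for j in range(5)] for i in range(1000)]
    reduced = subsample_spectra(roi, n_keep=100)  # a 10-fold reduction
    print(len(roi), "->", len(reduced))
```

Keeping the number of spectra fixed per region, rather than a fixed fraction, gives each region equal weight in later multivariate analysis regardless of its area.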
Abstract:
Data mining can be defined as the extraction of previously unknown and potentially useful information from large datasets. The main principle is to devise computer programs that run through databases and automatically seek deterministic patterns. It is applied in various fields, e.g., remote sensing, biometry, and speech recognition, but has seldom been applied to forensic case data. The intrinsic difficulty related to the use of such data lies in its heterogeneity, which comes from the many different sources of information. The aim of this study is to highlight potential uses of pattern recognition that would provide relevant results from a criminal intelligence point of view. The role of data mining within a global crime analysis methodology is to detect all types of structures in a dataset. Once filtered and interpreted, those structures can point to previously unseen criminal activities. The interpretation of patterns for intelligence purposes is the final stage of the process. It allows the researcher to validate the whole methodology and to refine each step if necessary. An application to cutting agents found in illicit drug seizures was performed. A combinatorial approach was taken, using the presence and absence of products. Methods from graph theory were used to extract patterns in data constituted by links between products and the place and date of seizure. A data mining process completed using graphing techniques is called "graph mining". Patterns were detected that had to be interpreted and compared with preliminary knowledge to establish their relevancy. The illicit drug profiling process is actually an intelligence process that uses preliminary illicit drug classes to classify new samples. Methods proposed in this study could be used a priori to compare structures from preliminary and post-detection patterns.
This new knowledge of a repeated structure may provide valuable complementary information to profiling and become a source of intelligence.
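To make the presence/absence idea in the abstract above more concrete, here is a small Python sketch that builds a co-occurrence graph of products across seizures and extracts groups of products that repeatedly appear together. It is a simplified illustration, not the method of the study: the seizure records are invented, links to place and date of seizure are left out, and the weight threshold and connected-component grouping are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(seizures):
    """Count, for each pair of products, the number of seizures containing both."""
    edges = Counter()
    for products in seizures:
        for pair in combinations(sorted(set(products)), 2):
            edges[pair] += 1
    return edges


def recurring_groups(edges, min_weight=2):
    """Connected components of the graph restricted to edges seen >= min_weight times."""
    adjacency = {}
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    # Invented example records: each seizure lists the cutting agents detected.
    seizures = [
        ["caffeine", "paracetamol"],
        ["caffeine", "paracetamol", "lidocaine"],
        ["phenacetin", "levamisole"],
        ["phenacetin", "levamisole"],
    ]
    print(recurring_groups(cooccurrence_edges(seizures)))
```

Groups found this way are only candidate patterns; as the abstract stresses, they still have to be interpreted and compared with prior knowledge before they count as intelligence.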
Abstract:
Alternative land uses make different contributions to the conservation of biodiversity and have different implementation and management costs. Conservation planning analyses to date have generally assumed that land is either protected or unprotected, and that the unprotected portion does not contribute to conservation goals. We develop and apply a new planning approach that explicitly accounts for the contribution of a diverse range of land uses to achieving conservation goals. Using East Kalimantan (Indonesian Borneo) as a case study, we prioritize investments in alternative conservation strategies and account for the relative contribution of land uses ranging from production forest to well-managed protected areas. We employ data on the distribution of mammals and assign species-specific conservation targets to achieve equitable protection by accounting for life history characteristics and home range sizes. The relative sensitivity of each species to forest degradation determines the contribution of each land use to achieving targets. We compare the cost effectiveness of our approach to a plan that considers only the contribution of protected areas to biodiversity conservation, and to a plan that assumes that the cost of conservation is represented by only the opportunity costs of conservation to the timber industry. Our preliminary results will require further development and substantial stakeholder engagement prior to implementation; nonetheless we reveal that, by accounting for the contribution of unprotected land, we can obtain more refined estimates of the costs of conservation. Using traditional planning approaches would overestimate the cost of achieving the conservation targets by an order of magnitude. Our approach reveals not only where to invest, but which strategies to invest in, in order to effectively and efficiently conserve biodiversity.
Abstract:
We present the first approach to the genetic diversity and structure of the Balearic toad (Bufo balearicus Boettger, 1880) for the island of Menorca. Forty-one individuals from 21 localities were analyzed for ten microsatellite loci. We used geo-referenced individual multilocus genotypes and a model-based clustering method for the inference of the number of populations and of the spatial location of genetic discontinuities between those populations. Only six of the microsatellites analyzed were polymorphic. We revealed a northwestern area inhabited by a single population with several well-connected localities and another set of populations in the southeast that includes a few unconnected small units with genetically significant differences among them as well as with the individuals from the northwest of the island. The observed fragmentation may be explained by shifts from agricultural to tourism practices that have been taking place on the island of Menorca since the 1960s. The abandonment of rural activities in favor of urbanization and concomitant service areas has mostly affected the southeast of the island and is currently threatening the overall geographic connectivity between the different farming areas of the island that are inhabited by the Balearic toad.
Abstract:
Ultra-high-throughput sequencing (UHTS) techniques are evolving rapidly and may soon become an affordable and routine tool for sequencing plant DNA, even in smaller plant biology labs. Here we review recent insights into intraspecific genome variation gained from UHTS, which offers a glimpse of the rather unexpected levels of structural variability among Arabidopsis thaliana accessions. The challenges that will need to be addressed to efficiently assemble and exploit this information are also discussed.