984 results for Mining extraction
Abstract:
Product reviews are the foremost source of information for customers and manufacturers to help them make appropriate purchasing and production decisions. Natural language data is typically very sparse; the most common words are those that do not carry a lot of semantic content, and occurrences of any particular content-bearing word are rare, while co-occurrences of these words are rarer. Mining product aspects, along with corresponding opinions, is essential for Aspect-Based Opinion Mining (ABOM) as a result of the e-commerce revolution. Therefore, the need for automatic mining of reviews has reached a peak. In this work, we treat ABOM as a sequence labelling problem and propose a supervised extraction method to identify product aspects and corresponding opinions. We use Conditional Random Fields (CRFs) to solve the extraction problem and propose a feature function to enhance accuracy. The proposed method is evaluated using two different datasets. We also evaluate the effectiveness of the feature function and the optimisation through multiple experiments.
Abstract:
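The sequence-labelling framing described in this abstract can be illustrated with a minimal sketch. The BIO tag names (`B-ASP`, `I-ASP`, `B-OPI`), the feature set and the helper names `token_features` and `spans_from_bio` are all hypothetical choices made for illustration; they are not the feature function or label scheme of the paper. A CRF would be trained on feature dictionaries of this kind, and its predicted labels decoded back into aspect and opinion phrases.

```python
from typing import Dict, List, Tuple


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for token i (hypothetical feature set)."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:].lower(),
        # Neighbouring words give the model local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def spans_from_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Decode BIO labels into (type, phrase) spans.

    An I- tag that does not continue the current span is treated like O.
    """
    spans: List[Tuple[str, str]] = []
    current_type, current_words = None, []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current_type:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = lab[2:], [tok]
        elif lab.startswith("I-") and current_type == lab[2:]:
            current_words.append(tok)
        else:
            if current_type:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_type:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Hypothetical review sentence with hand-assigned labels.
tokens = ["The", "battery", "life", "is", "excellent"]
labels = ["O", "B-ASP", "I-ASP", "O", "B-OPI"]
features = [token_features(tokens, i) for i in range(len(tokens))]
pairs = spans_from_bio(tokens, labels)
```

Here `pairs` holds one aspect span ("battery life") and one opinion span ("excellent"); in a real system the labels would come from the trained model rather than being written by hand.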
This paper proposes the Clinical Pathway Analysis Method (CPAM) approach that enables the extraction of valuable organisational and medical information on past clinical pathway executions from the event logs of healthcare information systems. The method deals with the complexity of real-world clinical pathways by introducing a perspective-based segmentation of the date-stamped event log. CPAM enables the clinical pathway analyst to effectively and efficiently acquire a profound insight into the clinical pathways. By comparing the specific medical conditions of patients with the factors used for characterising the different clinical pathway variants, the medical expert can identify the best therapeutic option. Process mining-based analytics enables the acquisition of valuable insights into clinical pathways, based on the complete audit traces of previous clinical pathway instances. Additionally, the methodology is suited to assess guideline compliance and analyse adverse events. Finally, the methodology provides support for eliciting tacit knowledge and providing treatment selection assistance.
Abstract:
Frog protection has become increasingly essential due to the rapid decline of frog biodiversity. Therefore, it is valuable to develop new methods for studying this biodiversity. In this paper, a novel feature extraction method is proposed based on perceptual wavelet packet decomposition for classifying frog calls in noisy environments. Pre-processing and syllable segmentation are first applied to the frog call. Then, a spectral peak track is extracted from each syllable if possible. Track duration, dominant frequency and oscillation rate are directly extracted from the track. Using the k-means clustering algorithm, the calculated dominant frequencies of all frog species are clustered into k parts, which produce a frequency scale for wavelet packet decomposition. Based on this adaptive frequency scale, wavelet packet decomposition is applied to the frog calls. Using the wavelet packet decomposition coefficients, a new feature set named perceptual wavelet packet decomposition sub-band cepstral coefficients is extracted. Finally, a k-nearest neighbour (k-NN) classifier is used for the classification. The experimental results show that the proposed features can achieve an average classification accuracy of 97.45%, which outperforms syllable features (86.87%) and Mel-frequency cepstral coefficients (MFCCs) feature (90.80%).
Abstract:
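The step in which dominant frequencies are clustered into k parts to form a frequency scale can be sketched as a one-dimensional k-means. This is only an illustration under stated assumptions: the frequency values are made up, the deterministic initialisation is a convenience, and taking midpoints between neighbouring cluster centres as band edges is an assumption, since the abstract does not say how the scale is derived from the clusters.

```python
from typing import List


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> List[float]:
    """Cluster scalar values into k groups and return the sorted centres."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    data = sorted(values)
    # Deterministic initialisation: k evenly spaced points of the sorted data.
    centers = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def band_edges(centers: List[float]) -> List[float]:
    """Assumed rule: band boundaries lie midway between adjacent centres."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


# Hypothetical dominant frequencies (Hz) pooled over several species.
dominant_hz = [500, 520, 540, 2000, 2100, 2200, 4000, 4100, 4200]
centers = kmeans_1d(dominant_hz, k=3)
edges = band_edges(centers)
```

With these made-up values the three centres fall at 520, 2100 and 4100 Hz, so the assumed band edges sit at 1310 and 3100 Hz.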
Precipitation-induced runoff and leaching from milled peat mining mires by peat types: a comparative method for estimating the loading of water bodies during peat production. This research project in environmental geology has arisen out of an observed need to be able to predict more accurately the loading of watercourses with detrimental organic substances and nutrients from already existing and planned peat production areas, since the authorities' capacity for insisting on such predictions covering the whole duration of peat production in connection with evaluations of environmental impact is at present highly limited. National and international decisions regarding monitoring of the condition of watercourses and their improvement and restoration require more sophisticated evaluation methods in order to be able to forecast watercourse loading and its environmental impacts at the stage of land-use planning and preparations for peat production. The present project thus set out from the premise that it would be possible on the basis of existing data on mire and peat properties to construct estimates for the typical loading from production mires over the whole duration of their exploitation. Finland has some 10 million hectares of peatland, accounting for almost a third of its total area. Macroclimatic conditions have varied in the course of the Holocene growth and development of this peatland, and with them the habitats of the peat-forming plants. Temperatures and moisture conditions have played a significant role in determining the dominant species of mire plants growing there at any particular time, the resulting mire types and the accumulation and deposition of plant remains to form the peat.
The above climatic, environmental and mire development factors, together with ditching, have contributed, and continue to contribute, to the existence of peat horizons that differ in their physical and chemical properties, leading to differences in material transport between peatlands in a natural state and mires that have been ditched or prepared for forestry and peat production. Watercourse loading from the ditching of mires or their use for peat production can have detrimental effects on river and lake environments and their recreational use, especially where oxygen-consuming organic solids and soluble organic substances and nutrients are concerned. It has not previously been possible, however, to estimate in advance the watercourse loading likely to arise from ditching and peat production on the basis of the characteristics of the peat in a mire, although earlier observations have indicated that watercourse loading from peat production can vary greatly and it has been suggested that differences in peat properties may be of significance in this. Sprinkling is used here in combination with simulations of conditions in a milled peat production area to determine the influence of the physical and chemical properties of milled peats in production mires on surface runoff into the drainage ditches and the concentrations of material in the runoff water. Sprinkling and extraction experiments were carried out on 25 samples of milled Carex (C) and Sphagnum (S) peat of humification grades H 2.5-8.5 with moisture content in the range 23.4-89% on commencement of the first sprinkling, which was followed by a second sprinkling 24 hours later. The water retention capacity of the peat was best, and surface runoff lowest, with Sphagnum and Carex peat samples of humification grades H 2.5-6 in the moisture content class 56-75%. On account of the hydrophobicity of dry peat, runoff increased in a fairly regular manner with drying of the sample from 55% to 24-30%.
Runoff from the samples with an original moisture content over 55% increased by 63% in the second round of sprinkling relative to the first, as they had practically reached saturation point on the first occasion, while those with an original moisture content below 55% retained their high runoff in the second round, due to continued hydrophobicity. The well-humified samples (H 6.5-8.5) with a moisture content over 80% showed a low water retention capacity and high runoff in both rounds of sprinkling. Loading of the runoff water with suspended solids, total phosphorus and total nitrogen, and also the chemical oxygen demand (CODMn O2), varied greatly in the sprinkling experiment, depending on the peat type and degree of humification, but concentrations of the same substances in the two sprinklings were closely or moderately closely correlated and these correlations were significant. The concentrations of suspended solids in the runoff water observed in the simulations of a peat production area and the direct surface runoff from it into the drainage ditch system in response to rain (sprinkling intensity 1.27 mm/min) varied c. 60-fold between the degrees of humification in the case of the Carex peats and c. 150-fold for the Sphagnum peats, while chemical oxygen demand varied c. 30-fold and c. 50-fold, respectively, total phosphorus c. 60-fold and c. 66-fold, total nitrogen c. 65-fold and c. 195-fold and ammonium nitrogen c. 90-fold and c. 30-fold. The increases in concentrations in the runoff water were very closely correlated with increases in humification of the peat. The correlations of the concentrations measured in extraction experiments (48 h) with peat type and degree of humification corresponded to those observed in the sprinkler experiments.
The resulting figures for the surface runoff from a peat production area into the drainage ditches simulated by means of sprinkling and material concentrations in the runoff water were combined with statistics on the mean extent of daily rainfall (0-67 mm) during the frost-free period of the year (May-October) over an observation period of 30 years to yield typical annual loading figures (kg/ha) for suspended solids (SS), chemical oxygen demand of organic matter (CODMn O2), total phosphorus (tot. P) and total nitrogen (tot. N) entering the ditches with respect to milled Carex (C) and Sphagnum (S) peats of humification grades H 2.5-8.5. In order to calculate the loading of drainage ditches from a milled peat production mire with the aid of these annual comparative values (in kg/ha), information is required on the properties of the intended production mire and its peat. Once data are available on the area of the mire, its peat depth, peat types and their degrees of humification, dry matter content, calorific value and corresponding energy content, it is possible to produce mutually comparable estimates for individual mires with respect to the annual loading of the drainage ditch system and the surrounding watercourse for the whole service life of the production area, the duration of this service life, determinations of energy content and the amount of loading per unit of energy generated (kg/MWh). In the 8 mires in the Köyhäjoki basin, Central Ostrobothnia, taken as an example, the loading of suspended solids (SS) in the drainage ditch networks calculated on the basis of the typical values obtained here and existing mire and peat data and expressed per unit of energy generated varied between the mires and horizons in the range 0.9-16.5 kg/MWh. One of the aims of this work was to develop means of making better use of existing mire and peat data and the results of corings and other field investigations.
In this respect, combining the typical loading values (kg/ha) obtained here for S, SC, CS and C peats and the various degrees of humification (H 2.5-8.5) with the above mire and peat data by means of a computer program for the acquisition and handling of such data would enable all the information currently available and that deposited in the system in the future to be used for defining watercourse loading estimates for mires and comparing them with the corresponding estimates of energy content. The intention behind this work has been to respond to the challenge facing the energy generation industry to find larger peat production areas that exert less loading on the environment and to that facing the environmental authorities to improve the means available for estimating watercourse loading from peat production and its environmental impacts in advance. The results conform well to the initial hypothesis and to the goals laid down for the research and should enable watercourse loading from existing and planned peat production to be evaluated better in the future and the resulting impacts to be taken into account when planning land use and energy generation. The advance loading information available in this way would be of value in the selection of individual peat production areas, the planning of their exploitation, the introduction of water protection measures and the planning of loading inspections, in order to achieve controlled peat production that pays due attention to environmental considerations.
Abstract:
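The loading per unit of energy (kg/MWh) discussed in this abstract can be illustrated with a simple calculation. This is a deliberately simplified sketch, not the thesis's procedure: it assumes a single peat type with one typical annual loading value, a constant production area and a known total energy content, and every number in the example is hypothetical. The function name `loading_per_mwh` is likewise an illustrative choice.

```python
def loading_per_mwh(
    annual_loading_kg_per_ha: float,
    area_ha: float,
    service_life_years: float,
    energy_mwh: float,
) -> float:
    """Return total loading over the service life per unit of energy (kg/MWh).

    Simplifying assumption: the annual loading value applies unchanged to the
    whole area for every year of production.
    """
    if energy_mwh <= 0:
        raise ValueError("energy_mwh must be positive")
    total_loading_kg = annual_loading_kg_per_ha * area_ha * service_life_years
    return total_loading_kg / energy_mwh


# Hypothetical mire: 500 kg/ha/year of suspended solids, 100 ha,
# 20 years of production, 400 000 MWh of energy in the peat.
example = loading_per_mwh(500.0, 100.0, 20.0, 400_000.0)
```

With these made-up inputs the total loading is 1 000 000 kg, giving 2.5 kg/MWh. A fuller treatment would sum over peat horizons with different peat types and humification grades, as the abstract indicates.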
Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetical data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects. In the second part of the thesis we consider a prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary to be able to focus on the most promising candidates. We show how to harness the current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, from this unified schema by graph mining techniques. Finally, in the last part of the thesis, we define the concept of reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, and hence they are especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is in genetics, the concepts and algorithms can be applied to other domains as well. 
We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.
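The idea of a connection between two vertices in a random graph can be made concrete with a small Monte Carlo sketch. It estimates the probability that two vertices are connected when each edge exists independently with a given probability. This illustrates the underlying reliability measure only, under that independence assumption; it is not the subgraph extraction algorithm proposed in the thesis, and the function name, edge format and example graph are hypothetical.

```python
import random
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (u, v, probability that the edge exists)


def connection_reliability(
    edges: List[Edge],
    source: Hashable,
    target: Hashable,
    trials: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate the probability that source and target are connected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sample one realisation of the random graph.
        adj = {}
        for u, v, p in edges:
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        # Depth-first search from source in the sampled graph.
        stack, seen = [source], {source}
        while stack:
            node = stack.pop()
            if node == target:
                hits += 1
                break
            for nb in adj.get(node, []):
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return hits / trials


# Hypothetical gene-disease graph with two independent paths.
graph = [
    ("gene", "protein", 0.9),
    ("protein", "disease", 0.8),
    ("gene", "pathway", 0.5),
    ("pathway", "disease", 0.5),
]
estimate = connection_reliability(graph, "gene", "disease")
```

A reliable subgraph, in the sense of the abstract, would be a small set of edges that preserves most of this connection probability; comparing the estimate on a candidate subgraph with the estimate on the full graph is one way to evaluate such a candidate.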
Abstract:
Most research on technology roadmapping has focused on its practical applications and the development of methods to enhance its operational process. Thus, despite a demand for well-supported, systematic information, little attention has been paid to how/which information can be utilised in technology roadmapping. Therefore, this paper aims at proposing a methodology to structure technological information in order to facilitate the process. To this end, eight methods are suggested to provide useful information for technology roadmapping: summary, information extraction, clustering, mapping, navigation, linking, indicators and comparison. This research identifies the characteristics of significant data that can potentially be used in roadmapping, and presents an approach to extracting important information from such raw data through various data mining techniques including text mining, multi-dimensional scaling and K-means clustering. In addition, this paper explains how this approach can be applied in each step of roadmapping. The proposed approach is applied to develop a roadmap of radio-frequency identification (RFID) technology to illustrate the process practically. © 2013 Taylor & Francis.
Abstract:
Grattan, J.P., Gilbertson, D.D., Hunt, C.O. (2007). The local and global dimensions of metaliferrous air pollution derived from a reconstruction of an 8 thousand year record of copper smelting and mining at a desert-mountain frontier in southern Jordan. Journal of Archaeological Science 34, 83-110
Abstract:
Many factors such as poverty, ineffective institutions and environmental regulations may prevent developing countries from managing how natural resources are extracted to meet a strong market demand. Extraction of some resources has reached such proportions that evidence is measurable from space. We present recent evidence of the global demand for a single commodity and the ecosystem destruction resulting from commodity extraction, recorded by satellites for one of the most biodiverse areas of the world. We find that since 2003, recent mining deforestation in Madre de Dios, Peru is increasing nonlinearly alongside a constant annual rate of increase in international gold price (∼18%/yr). We detect that the new pattern of mining deforestation (1915 ha/year, 2006-2009) is outpacing that of nearby settlement deforestation. We show that gold price is linked with exponential increases in Peruvian national mercury imports over time (R² = 0.93, p = 0.04, 2003-2009). Given the past rates of increase we predict that mercury imports may more than double for 2011 (∼500 t/year). Virtually all of Peru's mercury imports are used in artisanal gold mining. Much of the mining increase is unregulated/artisanal in nature, lacking environmental impact analysis or miner education. As a result, large quantities of mercury are being released into the atmosphere, sediments and waterways. Other developing countries endowed with gold deposits are likely experiencing similar environmental destruction in response to recent record high gold prices. The increasing availability of satellite imagery ought to evoke further studies linking economic variables with land use and cover changes on the ground.
Abstract:
Two approaches were undertaken to characterize the arsenic (As) content of Chinese rice. First, a national market basket survey (n = 240) was conducted in provincial capitals, sourcing grain from China's premier rice production areas. Second, to reflect rural diets, paddy rice (n = 195) was collected directly from farmers' fields in three regions of Hunan, a key rice-producing province located in southern China. Two of the sites were within mining and smeltery districts, and the third was devoid of large-scale metal processing industries. Arsenic levels were determined in all the samples, while a subset (n = 33) was characterized for As species, using a new simple and rapid extraction method suitable for use with Hamilton PRP-X100 anion exchange columns and HPLC-ICP-MS. The vast majority (85%) of the market rice grains possessed total As levels <150 ng g⁻¹. The rice collected from mine-impacted regions, however, was found to be highly enriched in As, reaching concentrations of up to 624 ng g⁻¹. Inorganic As (As(i)) was the predominant species detected in all of the speciated grain, with As(i) levels in some samples exceeding 300 ng g⁻¹. The As(i) concentration in polished and unpolished Chinese rice was successfully predicted from total As levels. The mean baseline concentration for As(i) in Chinese market rice based on this survey was estimated to be 96 ng g⁻¹, while levels in mine-impacted areas were higher, with ca. 50% of the rice in one region predicted to fail the national standard.
Abstract:
Mining seafloor massive sulfides for metals is an emergent industry faced with environmental management challenges. These revolve largely around limits to our current understanding of biological variability in marine systems, a challenge common to all marine environmental management. VentBase was established as a forum where academic, commercial, governmental, and non-governmental stakeholders can develop a consensus regarding the management of exploitative activities in the deep-sea. Participants advocate a precautionary approach with the incorporation of lessons learned from coastal studies. This workshop report from VentBase encourages the standardization of sampling methodologies for deep-sea environmental impact assessment. VentBase stresses the need for the collation of spatial data and importance of datasets amenable to robust statistical analyses. VentBase supports the identification of set-asides to prevent the local extirpation of vent-endemic communities and for the post-extraction recolonization of mine sites. © 2013.
Abstract:
The rapid evolution and proliferation of a world-wide computerized network, the Internet, resulted in an overwhelming and constantly growing amount of publicly available data and information, a fact that was also verified in biomedicine. However, the lack of structure of textual data inhibits its direct processing by computational solutions. Information extraction is the task of text mining that intends to automatically collect information from unstructured text data sources. The goal of the work described in this thesis was to build innovative solutions for biomedical information extraction from scientific literature, through the development of simple software artifacts for developers and biocurators, delivering more accurate, usable and faster results. We started by tackling named entity recognition - a crucial initial task - with the development of Gimli, a machine-learning-based solution that follows an incremental approach to optimize extracted linguistic characteristics for each concept type. Afterwards, Totum was built to harmonize concept names provided by heterogeneous systems, delivering a robust solution with improved performance results. Such an approach takes advantage of heterogeneous corpora to deliver cross-corpus harmonization that is not constrained to specific characteristics. Since previous solutions do not provide links to knowledge bases, Neji was built to streamline the development of complex and custom solutions for biomedical concept name recognition and normalization. This was achieved through a modular and flexible framework focused on speed and performance, integrating a large number of processing modules optimized for the biomedical domain. To offer on-demand heterogeneous biomedical concept identification, we developed BeCAS, a web application, service and widget.
We also tackled relation mining by developing TrigNER, a machine-learning-based solution for biomedical event trigger recognition, which applies an automatic algorithm to obtain the best linguistic features and model parameters for each event type. Finally, in order to assist biocurators, Egas was developed to support rapid, interactive and real-time collaborative curation of biomedical documents, through manual and automatic in-line annotation of concepts and relations. Overall, the research work presented in this thesis contributed to a more accurate update of current biomedical knowledge bases, towards improved hypothesis generation and knowledge discovery.
Abstract:
Regional Innovation Systems describe the relations between actors, structures and infrastructures in a region in order to stimulate innovation and regional development. For these systems the collection and organization of information is crucial. In the present paper we investigate the possibilities to extract information from websites of companies. First we describe regional innovation systems and the information types that are necessary to create them. Then we discuss the possibilities of text mining and keyword extraction techniques to extract this information from company websites. Finally, we describe a small scale experiment in which keywords related to economic sectors and commodities are extracted from the websites of over 200 companies. This experiment shows what the main challenges are for information extraction from websites for regional innovation systems.
Abstract:
Tourism is a clearly growing sector in Portugal, and one that has been developing its promotion and marketing strategy. However, the sector relies only on indicators of performance and installed capacity (number of rooms, hotels, flights, stays), leaving statistical indicators in the background. The World Economic Forum's "Travel & Tourism Competitiveness Report 2013" ranks Portugal 72nd for the quality and coverage of the statistical information available for the tourism sector; Spain, by comparison, ranks 3rd. A market strategy without an analytical basis to support a specific and objective set of guidelines, grounded in relevant knowledge of the target markets, is hard to understand or even to put into practice. Implementing a Business Intelligence structure that allows data to be collected and processed, so that the results obtained in the tourism sector can be related and substantiated, is therefore fundamental and crucial for creating market strategies. These strategies are built from information about the tourists who visit Portugal and about potential tourists, so that the latter can be attracted in the future. Analysing the characteristics and behavioural patterns of tourists makes it possible to define distinct profiles and thus detect market trends, in order to promote the most suitable products and services. The knowledge obtained makes it possible, on the one hand, to create and offer the most attractive products to tourists and, on the other, to inform them in a targeted way that these products exist. Combining this with a personalised recommendation that advises on the best products, based on knowledge of tourist profiles, therefore proves to be an essential tool for capturing and expanding the market.
Abstract:
Thecamoebian (testate amoeba) species diversity and assemblages in reclamation wetlands and lakes in northeastern Alberta respond to chemical and physical parameters associated with oil sands extraction. Ecosystems more impacted by OSPM (oil sands process-affected material) contain sparse, low-diversity populations dominated by centropyxid taxa and Arcella vulgaris. More abundant and diverse thecamoebian populations rich in difflugiid species characterize environments with lower OSPM concentrations. These shelled protists respond quickly to environmental change, allowing year-to-year variations in OSPM impact to be recorded. Their fossil record thus provides corporations with interests in the Athabasca Oil Sands with a potential means of measuring the progression of highly impacted aquatic environments to more natural wetlands. Development of this metric required investigation of controls on their fossil assemblage (e.g. seasonal variability, fossilization potential) and their biogeographic distribution, not only in the constructed lakes and wetlands on the oil sands leases, but also in natural environments across Alberta.