992 results for Extracting information


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Three-dimensional spectroscopy techniques are becoming more and more popular, producing an increasing number of large data cubes. The challenge of extracting information from these cubes requires the development of new techniques for data processing and analysis. We apply the recently developed technique of principal component analysis (PCA) tomography to a data cube from the center of the elliptical galaxy NGC 7097 and show that this technique is effective in decomposing the data into physically interpretable information. We find that the first five principal components of our data are associated with distinct physical characteristics. In particular, we detect a low-ionization nuclear-emitting region (LINER) with a weak broad component in the Balmer lines. Two images of the LINER are present in our data, one seen through a disk of gas and dust, and the other after scattering by free electrons and/or dust particles in the ionization cone. Furthermore, we extract the spectrum of the LINER, decontaminated from stellar and extended nebular emission, using only the technique of PCA tomography. We anticipate that the scattered image has polarized light due to its scattered nature.
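As a rough illustration of the decomposition this abstract describes, the sketch below applies PCA to a data cube in the way PCA tomography is usually presented: each spatial pixel is treated as an object and each wavelength pixel as a property, the eigenvectors of the covariance matrix are read as eigenspectra, and the projections of the data onto them are reshaped into images (tomograms). The function name `pca_tomography` and the `(ny, nx, n_lambda)` axis order are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def pca_tomography(cube):
    """Decompose a data cube of shape (ny, nx, n_lambda) with PCA.

    Returns (eigenspectra, tomograms, variance_fraction), ordered from the
    component with the largest variance to the smallest.
    """
    ny, nx, n_lambda = cube.shape
    # One row per spatial pixel, one column per wavelength pixel.
    data = cube.reshape(ny * nx, n_lambda).astype(float)
    # Subtract the mean spectrum so the covariance is taken about the mean.
    data -= data.mean(axis=0)
    # Covariance between wavelength pixels.
    cov = data.T @ data / (data.shape[0] - 1)
    # eigh is meant for symmetric matrices; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project every spatial pixel onto every eigenvector.
    scores = data @ eigvecs
    tomograms = scores.T.reshape(n_lambda, ny, nx)
    eigenspectra = eigvecs.T
    variance_fraction = eigvals / eigvals.sum()
    return eigenspectra, tomograms, variance_fraction
```

Reading `eigenspectra[k]` together with `tomograms[k]` indicates which spectral feature the k-th component carries and where in the field of view it appears. Any physical interpretation, such as the one reported in the abstract, still has to come from the analyst.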

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In recent years, association rules have taken on an important role in extracting information and knowledge from databases, thereby supporting the decision-making process. Most research on association rules is based on the support and confidence model. This model yields association rules that mainly involve frequent itemsets. In recent years, however, itemsets that occur less frequently have also been explored, giving rise to what are called rare or infrequent association rules. Many of the rules based on these items are of particular interest to the user. Current research on association rules seeks to generate as many interesting rules as possible by combining rare and frequent items. This study therefore begins with a survey of the main data mining algorithms that address association rules. The aim of this work is to examine existing techniques and algorithms for extracting association rules, to identify the main advantages and disadvantages of these algorithms, and, finally, to develop an algorithm whose goal is to generate association rules involving both rare and frequent items.
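The support and confidence model referred to in this abstract can be sketched in a few lines. The code below is a brute-force enumeration written for illustration only; it is not the algorithm developed in the dissertation, nor an optimized miner such as Apriori. The function names `support` and `rules` and the toy transactions in the usage note are assumptions made for the example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules(transactions, min_support, min_confidence):
    """Enumerate rules (antecedent, consequent, support, confidence).

    Brute force: every itemset of size >= 2 is tested, so this is only
    practical for a handful of distinct items.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue
            # Split the itemset into antecedent -> consequent in every way.
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    confidence = s / support(antecedent, transactions)
                    if confidence >= min_confidence:
                        consequent = tuple(
                            i for i in itemset if i not in antecedent
                        )
                        found.append((antecedent, consequent, s, confidence))
    return found
```

With the hypothetical transactions `[{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]`, calling `rules(..., min_support=0.5, min_confidence=0.9)` keeps only the rule butter -> bread. Lowering `min_support` is what lets rules over rare items through, at the price of a much larger search space, which is the trade-off the abstract alludes to.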

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa for the degree of Master in Electrical and Computer Engineering

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In this paper we review the impact that the availability of the Schistosoma mansoni genome sequence and annotation has had on schistosomiasis research. Easy access to the genomic information is important and several types of data are currently being integrated, such as proteomics, microarray and polymorphic loci. Access to the genome annotation and powerful means of extracting information are major resources to the research community.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Consumer reviews, opinions and shared experiences in the use of a product are a powerful source of information about consumer preferences that can be used in recommender systems. Despite the importance and value of such information, there is no comprehensive mechanism that formalizes the opinion selection and retrieval process and the utilization of retrieved opinions, owing to the difficulty of extracting information from text data. In this paper, a new recommender system that is built on consumer product reviews is proposed. A prioritizing mechanism is developed for the system. The proposed approach is illustrated using the case study of a recommender system for digital cameras.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

A project comparing five virtualization solutions, focusing in particular on the performance of the virtualized machines, extracting information from them by means of benchmarks and subsequently analysing the data obtained.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken is one using full parsing (syntactic analysis of the entire structure of sentences) and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time.
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships.
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Understanding and quantifying seismic energy dissipation, which manifests itself in terms of velocity dispersion and attenuation, in fluid-saturated porous rocks is of considerable interest, since it offers the perspective of extracting information with regard to the elastic and hydraulic rock properties. There is increasing evidence to suggest that wave-induced fluid flow, or simply WIFF, is the dominant underlying physical mechanism governing these phenomena throughout the seismic, sonic, and ultrasonic frequency ranges. This mechanism, which can prevail at the microscopic, mesoscopic, and macroscopic scale ranges, operates through viscous energy dissipation in response to fluid pressure gradients and inertial effects induced by the passing wavefield. In the first part of this thesis, we present an analysis of broad-band multi-frequency sonic log data from a borehole penetrating water-saturated unconsolidated glacio-fluvial sediments. An inherent complication arising in the interpretation of the observed P-wave attenuation and velocity dispersion is, however, that the relative importance of WIFF at the various scales is unknown and difficult to unravel. An important generic result of our work is that the levels of attenuation and velocity dispersion due to the presence of mesoscopic heterogeneities in water-saturated unconsolidated clastic sediments are expected to be largely negligible. Conversely, WIFF at the macroscopic scale allows for explaining most of the considered data while refinements provided by including WIFF at the microscopic scale in the analysis are locally meaningful. Using a Monte-Carlo-type inversion approach, we compare the capability of the different models describing WIFF at the macroscopic and microscopic scales with regard to their ability to constrain the dry frame elastic moduli and the permeability as well as their local probability distribution. 
In the second part of this thesis, we explore the issue of determining the size of a representative elementary volume (REV) arising in the numerical upscaling procedures of effective seismic velocity dispersion and attenuation of heterogeneous media. To this end, we focus on a set of idealized synthetic rock samples characterized by the presence of layers, fractures or patchy saturation in the mesoscopic scale range. These scenarios are highly pertinent because they tend to be associated with very high levels of velocity dispersion and attenuation caused by WIFF in the mesoscopic scale range. The problem of determining the REV size for generic heterogeneous rocks is extremely complex and entirely unexplored in the given context. In this pilot study, we have therefore focused on periodic media, which assures the inherent self-similarity of the considered samples regardless of their size and thus simplifies the problem to a systematic analysis of the dependence of the REV size on the applied boundary conditions in the numerical simulations. Our results demonstrate that boundary condition effects are absent for layered media and negligible in the presence of patchy saturation, thus resulting in minimum REV sizes. Conversely, strong boundary condition effects arise in the presence of a periodic distribution of finite-length fractures, thus leading to large REV sizes. In the third part of the thesis, we propose a novel effective poroelastic model for periodic media characterized by mesoscopic layering, which accounts for WIFF at both the macroscopic and mesoscopic scales as well as for the anisotropy associated with the layering. Correspondingly, this model correctly predicts the existence of the fast and slow P-waves as well as quasi and pure S-waves for any direction of wave propagation as long as the corresponding wavelengths are much larger than the layer thicknesses.
The primary motivation for this work is that, for formations of intermediate to high permeability, such as, for example, unconsolidated sediments, clean sandstones, or fractured rocks, these two WIFF mechanisms may prevail at similar frequencies. This scenario, which can be expected to be rather common, cannot be accounted for by existing models for layered porous media. Comparisons of analytical solutions of the P- and S-wave phase velocities and inverse quality factors for wave propagation perpendicular to the layering with those obtained from numerical simulations based on a 1D finite-element solution of the poroelastic equations of motion show very good agreement as long as the assumption of long wavelengths remains valid. A limitation of the proposed model is its inability to account for inertial effects in mesoscopic WIFF when both WIFF mechanisms prevail at similar frequencies. Our results do, however, also indicate that the associated error is likely to be relatively small, as, even at frequencies at which both inertial and scattering effects are expected to be at play, the proposed model provides a solution that is remarkably close to its numerical benchmark.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

EPR users often face the problem of extracting information from frequently low-resolution and complex EPR spectra. Simulation programs that provide a series of parameters, characteristic of the investigated system, have been used to achieve this goal. This work describes the general aspects of one of those programs, the NLSL program, used to fit EPR spectra applying a nonlinear least squares method. Several motion regimes of the probes are included in this computational tool, covering a broad range of spectral changes. The meanings of the different parameters and rotational diffusion models are discussed. The anisotropic case is also treated by including an orienting potential and order parameters. Some examples are presented in order to show its applicability in different systems.
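To make the idea of fitting a spectrum by nonlinear least squares concrete, here is a minimal sketch of a damped Gauss-Newton (Levenberg-Marquardt style) fit. It is not the NLSL program and does not implement any of its motional or rotational diffusion models. A single Lorentzian line stands in for the spectrum, and the names `lorentzian` and `fit_least_squares`, the finite-difference Jacobian and the damping schedule are all assumptions chosen for brevity.

```python
import numpy as np


def lorentzian(params, x):
    """Toy line shape: amplitude, centre and half-width of one Lorentzian."""
    amplitude, center, width = params
    return amplitude * width**2 / ((x - center) ** 2 + width**2)


def fit_least_squares(model, x, y, guess, n_iter=200, damping=1e-3):
    """Minimize sum((y - model(params, x))**2) with damped Gauss-Newton steps."""
    params = np.asarray(guess, dtype=float)
    residual = y - model(params, x)
    cost = residual @ residual
    for _ in range(n_iter):
        # Forward-difference Jacobian of the model with respect to params.
        jac = np.empty((x.size, params.size))
        for j in range(params.size):
            step = np.zeros_like(params)
            step[j] = 1e-6 * max(1.0, abs(params[j]))
            jac[:, j] = (model(params + step, x) - model(params, x)) / step[j]
        normal = jac.T @ jac
        damped = normal + damping * np.diag(np.diag(normal))
        delta = np.linalg.solve(damped, jac.T @ residual)
        if np.linalg.norm(delta) < 1e-12:
            break
        trial = params + delta
        trial_residual = y - model(trial, x)
        trial_cost = trial_residual @ trial_residual
        if trial_cost < cost:
            # Accept the step and trust the quadratic model a little more.
            params, residual, cost = trial, trial_residual, trial_cost
            damping /= 10.0
        else:
            # Reject the step and fall back toward gradient descent.
            damping *= 10.0
    return params
```

In a real EPR fit the model function would be a spectral simulation whose parameters are the physical quantities of interest. The fitting loop stays the same in spirit, but each model evaluation becomes far more expensive.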

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The present dissertation examined reading development during elementary school years by means of eye movement tracking. Three different but related issues in this field were assessed. First of all, the development of parafoveal processing skills in reading was investigated. Second, it was assessed whether and to what extent sublexical units such as syllables and morphemes are used in processing Finnish words and whether the use of these sublexical units changes as a function of reading proficiency. Finally, the developmental trend in the speed of visual information extraction during reading was examined. With regard to parafoveal processing skills, it was shown that 2nd graders extract letter identity information approx. 5 characters to the right of fixation, 4th graders approx. 7 characters to the right of fixation, and 6th graders and adults approx. 9 characters to the right of fixation. Furthermore, it was shown that all age groups extract more parafoveal information within compound words than across adjective-noun pairs of similar length. In compounds, parafoveal word information can be extracted in parallel with foveal word information, if the compound in question is of high frequency. With regard to the use of sublexical units in Finnish word processing, it was shown that less proficient 2nd graders use both syllables and morphemes in the course of lexical access. More proficient 2nd graders as well as older readers seem to process words more holistically. Finally, it was shown that 60 ms is enough for 4th graders and adults to extract visual information from both 4-letter and 8-letter words, whereas 2nd graders clearly needed more than 60 ms to extract all information from 8-letter words for processing to proceed smoothly. The present dissertation demonstrates that Finnish 2nd graders develop their reading skills rapidly and are already at an adult level in some aspects of reading.
This is not to say that there are no differences between less proficient (e.g., 2nd graders) and more proficient readers (e.g., adults) but in some respects it seems that the visual system used in extracting information from the text is matured by the 2nd grade. Furthermore, the present dissertation demonstrates that the allocation of attention in reading depends much on textual properties such as word frequency and whether words are spatially unified (as in compounds) or not. This flexibility of the attentional system naturally needs to be captured in word processing models. Finally, individual differences within age groups are quite substantial but it seems that by the end of the 2nd grade practically all Finnish children have reached a reasonable level of reading proficiency.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Companies require information in order to gain an improved understanding of their customers. Data concerning customers, their interests and behavior are collected through different loyalty programs. The amount of data stored in company databases has increased exponentially over the years and become difficult to handle. This research area is the subject of much current interest, not only in academia but also in practice, as is shown by several magazines and blogs that are covering topics on how to get to know your customers, Big Data, information visualization, and data warehousing. In this Ph.D. thesis, the Self-Organizing Map and two extensions of it, the Weighted Self-Organizing Map (WSOM) and the Self-Organizing Time Map (SOTM), are used as data mining methods for extracting information from large amounts of customer data. The thesis focuses on how data mining methods can be used to model and analyze customer data in order to gain an overview of the customer base, as well as to analyze niche markets. The thesis uses real-world customer data to create models for customer profiling. Evaluation of the built models is performed by CRM experts from the retailing industry. The experts considered the information gained with the help of the models to be valuable and useful for decision making and for strategic planning.
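For readers unfamiliar with the Self-Organizing Map, the following sketch trains a plain SOM on numeric feature vectors. It covers neither the WSOM nor the SOTM extensions named in the abstract. The grid size, the linear decay of the learning rate and neighbourhood radius, and the function names `train_som` and `best_matching_unit` are assumptions chosen for this example.

```python
import numpy as np


def train_som(data, grid_shape=(3, 3), n_epochs=50,
              learning_rate=0.5, radius=1.5, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Grid coordinates of every unit, used for the neighbourhood function.
    coords = np.array(
        [(r, c) for r in range(rows) for c in range(cols)], dtype=float
    )
    # Initialise each unit with a randomly chosen data vector.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for idx in rng.permutation(len(data)):
            frac = 1.0 - step / n_steps          # decays linearly from 1 to 0
            lr = learning_rate * frac
            sigma = max(radius * frac, 0.5)
            x = data[idx]
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            influence = np.exp(-grid_dist2 / (2.0 * sigma**2))
            # Pull the BMU and its grid neighbours toward the sample.
            weights += lr * influence[:, None] * (x - weights)
            step += 1
    return weights.reshape(rows, cols, -1)


def best_matching_unit(weights, x):
    """Grid position (row, col) of the unit closest to vector x."""
    flat = weights.reshape(-1, weights.shape[-1])
    winner = np.argmin(((flat - x) ** 2).sum(axis=1))
    return np.unravel_index(winner, weights.shape[:2])
```

In a customer-profiling setting each row of `data` would be one customer's scaled attributes, and customers mapped to the same or neighbouring units would be read as one segment. The choice and scaling of the attributes matter at least as much as the map itself.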


Relevance:

60.00% 60.00%

Publisher:

Abstract:

Two ongoing projects at ESSC that involve the development of new techniques for extracting information from airborne LiDAR data and combining this information with environmental models will be discussed. The first project in conjunction with Bristol University is aiming to improve 2-D river flood flow models by using remote sensing to provide distributed data for model calibration and validation. Airborne LiDAR can provide such models with a dense and accurate floodplain topography together with vegetation heights for parameterisation of model friction. The vegetation height data can be used to specify a friction factor at each node of a model’s finite element mesh. A LiDAR range image segmenter has been developed which converts a LiDAR image into separate raster maps of surface topography and vegetation height for use in the model. Satellite and airborne SAR data have been used to measure flood extent remotely in order to validate the modelled flood extent. Methods have also been developed for improving the models by decomposing the model’s finite element mesh to reflect floodplain features such as hedges and trees having different frictional properties to their surroundings. Originally developed for rural floodplains, the segmenter is currently being extended to provide DEMs and friction parameter maps for urban floods, by fusing the LiDAR data with digital map data. The second project is concerned with the extraction of tidal channel networks from LiDAR. These networks are important features of the inter-tidal zone, and play a key role in tidal propagation and in the evolution of salt-marshes and tidal flats. The study of their morphology is currently an active area of research, and a number of theories related to networks have been developed which require validation using dense and extensive observations of network forms and cross-sections. 
The conventional method of measuring networks is cumbersome and subjective, involving manual digitisation of aerial photographs in conjunction with field measurement of channel depths and widths for selected parts of the network. A semi-automatic technique has been developed to extract networks from LiDAR data of the inter-tidal zone. A multi-level knowledge-based approach has been implemented, whereby low level algorithms first extract channel fragments based mainly on image properties then a high level processing stage improves the network using domain knowledge. The approach adopted at low level uses multi-scale edge detection to detect channel edges, then associates adjacent anti-parallel edges together to form channels. The higher level processing includes a channel repair mechanism.
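The low-level step described above (detect edges at several scales, then associate adjacent anti-parallel edges to form channels) can be illustrated on a single elevation profile taken across a channel. This is a deliberately reduced 1D sketch, not the authors' 2D implementation: averaging central differences over a few scales stands in for multi-scale edge detection, and a falling edge followed by a rising edge stands in for an anti-parallel edge pair. All function names, the threshold and `max_width` are assumptions.

```python
import numpy as np


def multiscale_gradient(profile, scales=(1, 2)):
    """Average of central-difference gradients taken at several scales."""
    z = np.asarray(profile, dtype=float)
    idx = np.arange(z.size)
    responses = []
    for s in scales:
        hi = np.clip(idx + s, 0, z.size - 1)
        lo = np.clip(idx - s, 0, z.size - 1)
        responses.append((z[hi] - z[lo]) / (2 * s))
    return np.mean(responses, axis=0)


def find_edges(gradient, threshold):
    """Return (position, sign) for each run of strong gradient of one sign."""
    sign = np.where(gradient > threshold, 1,
                    np.where(gradient < -threshold, -1, 0))
    edges = []
    start = 0
    for i in range(1, sign.size + 1):
        if i == sign.size or sign[i] != sign[start]:
            if sign[start] != 0:
                # Place the edge at the centre of the run.
                edges.append(((start + i - 1) / 2, int(sign[start])))
            start = i
    return edges


def pair_channels(edges, max_width):
    """Pair each falling edge with the next rising edge to form a channel."""
    channels = []
    for (pos_a, sign_a), (pos_b, sign_b) in zip(edges, edges[1:]):
        if sign_a == -1 and sign_b == 1 and pos_b - pos_a <= max_width:
            channels.append((pos_a, pos_b))
    return channels
```

For a hypothetical profile `[1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]`, a threshold of 0.2 and `max_width=6`, the sketch reports one channel between positions 3.5 and 7.5. The high-level repair stage mentioned in the abstract, which joins fragments using domain knowledge, is outside the scope of this example.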