904 results for INFORMATION EXTRACTION FROM DOCUMENTS
Abstract:
A valuable alternative to US cardiotocography for fetal surveillance can be offered by phonocardiography, a passive and low-cost acoustic recording of fetal heart sounds. A crucial point is the exact recognition of the fetal heart sounds associated with each fetal heart beat, and then the estimation of the FHR signal. In this work, software for FHR assessment from phonocardiographic signals was developed. To check the reliability of the software, the obtained results were compared with those of simultaneously recorded cardiotocographic signals. Results were satisfactory, as the provided FHR series were almost entirely confined within FHR-CTG +/- 3 bpm, where FHR-CTG denotes the FHR series provided by commercial US cardiotocographic devices currently employed in clinical routine.
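A minimal sketch of the kind of beat-to-beat FHR estimation the abstract above describes: enhance the heart-sound band, pick the sound peaks, and convert the inter-beat intervals to bpm. The sampling rate, filter band, peak-spacing constraint, and threshold below are illustrative assumptions, not the parameters of the cited software.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def fhr_from_pcg(pcg, fs=1000.0):
    """Estimate a beat-to-beat FHR series (bpm) from a fetal phonocardiogram."""
    # Emphasize the band where fetal heart sounds carry most energy (illustrative band).
    b, a = butter(4, [20 / (fs / 2), 110 / (fs / 2)], btype="band")
    envelope = np.abs(filtfilt(b, a, pcg))

    # Locate heart-sound peaks; FHR of roughly 110-160 bpm implies beats >= ~0.35 s apart.
    peaks, _ = find_peaks(envelope, distance=int(0.35 * fs),
                          height=3 * np.median(envelope))
    intervals = np.diff(peaks) / fs          # beat-to-beat intervals in seconds
    return 60.0 / intervals                  # instantaneous FHR in bpm

# Agreement with a simultaneously recorded CTG reference could then be checked as
# the fraction of samples with |FHR_pcg - FHR_ctg| <= 3 bpm, as in the abstract.
```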
Abstract:
Carte du Ciel (French for "map of the sky") is part of an extensive 19th-century international astronomical project whose goal was to map the entire visible sky. The results of this vast effort were collected in the form of astrographic plates and their paper counterparts, called astrographic maps, which are widely distributed among observatories and astronomical institutes around the world. Our goal is to design methods and algorithms to automatically extract data from digitized Carte du Ciel astrographic maps. This paper examines the image processing and pattern recognition techniques that can be adopted for the automatic extraction of astronomical data from stars' triple expositions, which can aid the detection of variable stars in Carte du Ciel maps.
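A rough sketch of the first step such a pipeline might take on a digitized map: threshold the image against the local background and group bright pixels into candidate star images with connected-component labeling. The threshold rule, minimum blob size, and the assumption that stars are brighter than the background in the digitized image (invert otherwise) are illustrative choices, not the paper's method.

```python
import numpy as np
from scipy import ndimage

def detect_star_images(plate, k=5.0, min_pixels=4):
    """Return centroids of candidate star images (e.g. the three exposures of one
    star) found by background thresholding and connected-component labeling."""
    background = np.median(plate)
    noise = np.std(plate)
    mask = plate > background + k * noise              # keep only bright pixels

    labels, n = ndimage.label(mask)                    # group bright pixels into blobs
    centroids = ndimage.center_of_mass(plate, labels, range(1, n + 1))
    sizes = ndimage.sum(mask, labels, range(1, n + 1))

    # Discard blobs too small to be real exposures rather than plate grain or noise.
    return [c for c, s in zip(centroids, sizes) if s >= min_pixels]
```

Grouping the resulting centroids into nearby triples would then give the triple expositions whose relative brightness can be compared for variable-star detection.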
Abstract:
The publisher regrets to inform readers that the image appearing as Fig. 8 is incorrect and that the Supplementary material is missing from the published paper. The correct image for Fig. 8 and the Supplementary files are provided below: Fig. 8. (a) Timber blocks covered by the invented plastic container with an open bottom, (b) timber blocks in the field after the trial, (c) and (d) comparison between resin-coated blocks without termite damage and control blocks which were severely damaged by termites.
Abstract:
Salinity gradient power (SGP) is the energy that can be obtained from the mixing entropy of two solutions with different salt concentrations. A river estuary, as a place where salt water and fresh water mix, has a huge potential for this renewable energy. In this study, this potential in the estuaries of rivers flowing into the Persian Gulf, and the factors affecting it, are analyzed and assessed. Since most of the large rivers are in Asia, this continent, with a potential power of 338 GW, is the second major source of salinity gradient power in the world (Wetsus institute, 2009). The Persian Gulf, with the suitable salinity gradients in its river estuaries, is of particular importance for the extraction of this energy. Considering the total river flow into the Persian Gulf, which is approximately 3486 m3/s, the theoretically extractable power from the salinity gradient in this region is 5.2 GW. Iran, with its numerous rivers along the coast of the Persian Gulf, has a great share of this energy source. For example, according to calculations on data from three hydrometry stations located on the Arvand River, Khorramshahr Station, releasing 1.91 MJ of energy obtained by mixing 1.26 m3 of river water with 0.74 m3 of sea water, extracts the maximum amount of energy. Considering the average annual discharge of the Arvand River at the Khorramshahr hydrometry station, the theoretically extractable power is 955 MW. Other parameters studied in this research are the intrusion length of salt water and its flushing time in the estuary, which have a significant influence on the salinity gradient power. According to calculations under HWS conditions and the average river discharge, the maximum salt water intrusion length into the estuary, 41 km, belongs to the Arvand River, and the lowest, 8 km, to the Helle River. Likewise, the longest salt water flushing time in the estuary, 9.8 days, belongs to the Arvand River, and the shortest, 3.3 days, to the Helle River. The influence of these two parameters in reducing the amount of extractable energy from salinity gradient power can also be seen in the estuaries of the rivers studied. For example, at the estuary of the Arvand River, over an interval of 8.9 days the salinity gradient power decreases by 9.2%. Another part of this research focuses on the design of a suitable system for extracting electrical energy from the salinity gradient. So far, five methods have been proposed to convert this energy into electricity; among them, the reverse electro-dialysis (RED) method and the pressure-retarded osmosis (PRO) method are of special practical importance. In theory, both techniques generate the same amount of energy from given volumes of sea and river water with specified salinity; in practice, the RED technique seems more attractive for power generation using sea water and river water, because it requires a smaller salinity gradient than the PRO method. In addition, the RED method does not need a turbine for energy conversion, and electricity generation starts as soon as the two solutions are mixed. In this research, the power density and the efficiency of the generated energy were assessed by designing a physical model.
The designed physical model is a single-cell reverse electro-dialysis battery with a nano heterogeneous membrane, with dimensions of 20 cm x 20 cm, which produced a power density of 0.58 W/m2 using river water (1 g NaCl/L) and sea water (30 g NaCl/L) under laboratory conditions. This value was obtained thanks to the nano treatment applied to the membrane of this system and the suitable design of the cell, which increased the system efficiency by 11% compared with non-nano membranes.
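A back-of-the-envelope sketch of the ideal (entropy-only) Gibbs energy released by mixing the two streams, using the laboratory concentrations quoted above (1 and 30 g NaCl/L) and the 1.26/0.74 m3 split mentioned for Khorramshahr Station. The estuary salinities behind the 1.91 MJ figure are not given in the abstract, so this is an illustrative calculation, not a reproduction of that number.

```python
import numpy as np

R = 8.314        # universal gas constant, J/(mol K)
T = 298.15       # temperature, K
M_NACL = 58.44   # molar mass of NaCl, g/mol
N_WATER = 55.5e3 # mol of water per m^3 of solution (approx.)

def species_moles(c_salt_g_per_L, volume_m3):
    """Moles of Na+, Cl- and H2O in the given volume of NaCl solution."""
    n_salt = c_salt_g_per_L * 1000.0 * volume_m3 / M_NACL   # (g/L)*(L/m^3)*m^3 -> mol
    return {"Na+": n_salt, "Cl-": n_salt, "H2O": N_WATER * volume_m3}

def entropic_gibbs(n):
    """Ideal-solution (entropy-only) Gibbs contribution: G = R T sum_i n_i ln x_i."""
    total = sum(n.values())
    return R * T * sum(ni * np.log(ni / total) for ni in n.values() if ni > 0)

river = species_moles(1.0, 1.26)    # ~1 g NaCl/L over 1.26 m^3
sea = species_moles(30.0, 0.74)     # ~30 g NaCl/L over 0.74 m^3
brackish = {k: river[k] + sea[k] for k in river}

# Energy released on mixing = (G_river + G_sea) - G_brackish, positive, in joules.
delta_g = entropic_gibbs(river) + entropic_gibbs(sea) - entropic_gibbs(brackish)
print(f"Ideal mixing energy: {delta_g / 1e6:.2f} MJ")
```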
Abstract:
Information entropy measured from acoustic emission (AE) waveforms is shown to be an indicator of fatigue damage in a high-strength aluminum alloy. Several tension-tension fatigue experiments were performed with dogbone samples of aluminum alloy, Al7075-T6, a commonly used material in aerospace structures. Unlike previous studies in which fatigue damage is simply measured based on visible crack growth, this work investigated fatigue damage prior to crack initiation through the use of instantaneous elastic modulus degradation. Three methods of measuring the AE information entropy, regarded as a direct measure of microstructural disorder, are proposed and compared with traditional damage-related AE features. Results show that one of the three entropy measurement methods appears to better assess damage than the traditional AE features, while the other two entropies have unique trends that can differentiate between small and large cracks.
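One common way to turn a recorded AE waveform into an information (Shannon) entropy value is to histogram its amplitudes and apply the usual -sum p log2 p formula; the bin count below is arbitrary, and the paper compares three distinct entropy definitions, so this is only a generic sketch of the idea.

```python
import numpy as np

def ae_information_entropy(waveform, bins=64):
    """Shannon entropy (bits) of an AE hit's amplitude distribution."""
    counts, _ = np.histogram(waveform, bins=bins)
    p = counts / counts.sum()        # empirical amplitude probabilities
    p = p[p > 0]                     # drop empty bins (0 * log 0 := 0)
    return -np.sum(p * np.log2(p))

# Tracking this value hit by hit over a fatigue test yields an entropy trend that can
# be compared against traditional AE features such as counts, energy, or amplitude.
```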
Abstract:
© 2015. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract:
1989
Abstract:
Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have received the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction, and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how these techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results, while Part II consists of the five original research articles that are the main contribution of this thesis.
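As a concrete illustration of the bipartite special case and the pairwise approach mentioned above: AUC equals the fraction of (positive, negative) pairs that a scoring function orders correctly, and pairwise learning penalizes misordered pairs. The squared pairwise loss below mirrors the regularized least-squares idea behind RankRLS only in spirit (no kernels, no regularizer, no training algorithm), so treat it as a sketch; inputs are assumed to be NumPy arrays.

```python
import numpy as np

def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked in the right order."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]          # one entry per positive/negative pair
    return (np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)) / diffs.size

def pairwise_squared_loss(scores, targets):
    """Least-squares loss over all pairs: ((s_i - s_j) - (y_i - y_j))^2, averaged."""
    ds = scores[:, None] - scores[None, :]
    dy = targets[:, None] - targets[None, :]
    return np.mean((ds - dy) ** 2)
```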
Abstract:
We live in a society where the use of the Internet has become very important to everyday life. Relationships nowadays often take place through technological devices instead of face-to-face contact, for instance in Internet forums where people can hold discussions online. However, global analysis of such forums is a big challenge, due to the large amount of data. This work investigates the use of visual representations to support an exploratory analysis of the content of messages from discussion forums, considering both theme and chronology. The target forums belong to the educational area, and their analysis is currently done manually, i.e., by reading message by message. The perceptual and cognitive properties of the human visual system give a person the capacity to carry out high-level information extraction tasks from a graphical or visual representation of data. Therefore, this work was based on Visual Analytics, an area that aims to create techniques that amplify these human abilities. For that reason we used software that creates a visualization of the data from a forum and thereby allows analysis of the forum's content. During the work, however, we identified the need to create a new tool to clean the data, because it contained a lot of unnecessary information. After cleaning the data, we created a new visualization and carried out an analysis seeking new knowledge. Finally, we compared the new visualization with the manual analysis that had previously been made. Analyzing the results, the potential of using visualization became evident: it provides a better correlation between pieces of information, enabling the acquisition of new knowledge that was not identified in the initial analysis and providing a better use of the forum content.
Abstract:
Civil infrastructure provides essential services for the development of both society and the economy. It is very important to manage such systems efficiently to ensure sound performance. However, there are challenges in extracting information from the available data, which also necessitates the establishment of methodologies and frameworks to assist stakeholders in the decision-making process. This research proposes methodologies to evaluate system performance by maximizing the use of available information, in an effort to build and maintain sustainable systems. Under the guidance of the holistic problem formulation proposed by Mukherjee and Muga, this research specifically investigates problem-solving methods that measure and analyze metrics to support decision making. Failures are inevitable in system management. A methodology is developed to describe the arrival pattern of failures in order to assist engineers in failure rescue and budget prioritization, especially when funding is limited. It reveals that blockage arrivals are not totally random; smaller, meaningful subsets show good random behavior. In addition, the failure rate over time is analyzed by applying existing reliability models and non-parametric approaches, and a scheme is further proposed to depict rates over the lifetime of a given facility system. Further analysis of sub-data sets is also performed, with a discussion of context reduction. Infrastructure condition is another important indicator of system performance. The challenges in predicting facility condition are the estimation of transition probabilities and model sensitivity analysis. Methods are proposed to estimate transition probabilities by investigating the long-term behavior of the model and the relationship between transition rates and probabilities. To integrate heterogeneities, a model sensitivity analysis is performed for the application of a non-homogeneous Markov chain model. Scenarios are investigated by assuming that transition probabilities follow a Weibull-regressed function and fall within an interval estimate. For each scenario, multiple cases are simulated using a Monte Carlo simulation. Results show that variations in the outputs are sensitive to the probability regression, while for the interval estimate, the outputs show variations similar to the inputs. Life cycle cost analysis and life cycle assessment of a sewer system are performed comparing three different pipe types: reinforced concrete pipe (RCP), non-reinforced concrete pipe (NRCP), and vitrified clay pipe (VCP). The life cycle cost analysis covers the material extraction, construction, and rehabilitation phases; in the rehabilitation phase, a Markov chain model is applied to support the rehabilitation strategy. In the life cycle assessment, the Economic Input-Output Life Cycle Assessment (EIO-LCA) tools are used to estimate environmental emissions for all three phases. Emissions are then compared quantitatively among the alternatives to support decision making.
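A minimal sketch of the kind of Monte Carlo experiment described above: propagate a facility's condition state through a non-homogeneous Markov chain whose year-by-year deterioration probability follows an assumed Weibull-shaped function. The five-state scale, the Weibull parameters, and the simple "drop one state per step" structure are illustrative assumptions, not the study's calibrated model.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, YEARS, RUNS = 5, 50, 1000          # condition states 0 (best) .. 4 (worst)

def deterioration_prob(t, shape=2.0, scale=30.0):
    """Assumed Weibull-based probability of dropping one condition state in year t."""
    return 1.0 - np.exp(-((t / scale) ** shape - ((t - 1) / scale) ** shape))

def transition_matrix(t):
    p = deterioration_prob(t)
    P = np.eye(N_STATES) * (1.0 - p)
    for s in range(N_STATES - 1):
        P[s, s + 1] = p                      # deteriorate by one state with probability p
    P[-1, -1] = 1.0                          # worst state is absorbing
    return P

states = np.zeros((RUNS, YEARS + 1), dtype=int)
for t in range(1, YEARS + 1):
    P = transition_matrix(t)
    for r in range(RUNS):
        states[r, t] = rng.choice(N_STATES, p=P[states[r, t - 1]])

print("Mean condition state by decade:", states[:, ::10].mean(axis=0))
```

Repeating the simulation with perturbed transition probabilities (e.g. sampled from an interval estimate) gives the kind of output-sensitivity comparison the abstract reports.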
Abstract:
Control design for stochastic uncertain nonlinear systems is traditionally based on minimizing the expected value of a suitably chosen loss function. Moreover, most control methods usually assume the certainty equivalence principle to simplify the problem and make it computationally tractable. We offer an improved probabilistic framework which is not constrained by these previous assumptions and provides a more natural setting for incorporating and dealing with uncertainty. The focus of this paper is on developing this framework to obtain an optimal control law using a fully probabilistic approach for information extraction from process data, which does not require detailed knowledge of the system dynamics. Moreover, the proposed framework allows handling the problem of input-dependent noise. A basic paradigm is proposed and the resulting algorithm is discussed. The proposed probabilistic control method applies to the general class of nonlinear discrete-time systems; it is demonstrated theoretically on the affine class. A nonlinear simulation example is also provided to validate the theoretical development.
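The contrast between the two design philosophies mentioned above can be written compactly. Fully probabilistic design, the literature this line of work typically builds on, replaces the scalar expected loss with a Kullback-Leibler divergence to an ideal closed-loop distribution; the notation below is a generic sketch of that idea and not necessarily the exact objective used in the paper. Here f(X, U) is the closed-loop joint density of state and input trajectories and f^I is its ideal (desired) counterpart.

```latex
% Classical stochastic control: choose the control law c to minimize an expected loss
c^{*} = \arg\min_{c}\; \mathbb{E}\!\left[\sum_{t} L(x_t, u_t)\right], \qquad u_t = c(x_t).
% Fully probabilistic design: choose the randomized controller f(u_t \mid x_t) minimizing
\min_{f(u_t \mid x_t)}\; D_{\mathrm{KL}}\!\left( f(X, U) \,\middle\|\, f^{I}(X, U) \right)
  = \int f(X, U)\, \ln \frac{f(X, U)}{f^{I}(X, U)} \, \mathrm{d}X \, \mathrm{d}U .
```

Because the design works directly on densities, uncertainty and input-dependent noise enter the optimization themselves rather than through a certainty-equivalent point estimate.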
Abstract:
Humans have a high ability to extract information from visual data acquired by sight. Through a learning process that starts at birth and continues throughout life, image interpretation becomes almost instinctive. At a glance, one can easily describe a scene with reasonable precision, naming its main components. Usually, this is done by extracting low-level features such as edges, shapes and textures, and associating them with high-level meanings; in this way, a semantic description of the scene is produced. An example of this is the human capacity to recognize and describe other people's physical and behavioral characteristics, or biometrics. Soft biometrics also represent inherent characteristics of the human body and behavior, but do not allow unique identification of a person. The computer vision area aims to develop methods capable of performing visual interpretation with performance similar to humans. This thesis aims to propose computer vision methods that allow high-level information extraction from images in the form of soft biometrics. The problem is approached in two ways, through unsupervised and supervised learning methods. The first seeks to group images via automatic feature extraction learning, using convolution techniques, evolutionary computing, and clustering; in this approach the employed images contain faces and people. The second approach employs convolutional neural networks, which have the ability to operate on raw images, learning both the feature extraction and classification processes. Here, images are classified according to gender and clothing, the latter divided into the upper and lower parts of the human body. The first approach, when tested with different image datasets, obtained an accuracy of approximately 80% for faces versus non-faces and 70% for people versus non-people. The second, tested on images and videos, obtained an accuracy of about 70% for gender, 80% for upper-body clothes, and 90% for lower-body clothes. The results of these case studies show that the proposed methods are promising, allowing automatic high-level image annotation. This opens possibilities for the development of applications in diverse areas such as content-based image and video search and automatic video surveillance, reducing human effort in the tasks of manual annotation and monitoring.
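A minimal sketch of the second, supervised approach described above: a small convolutional network that maps a raw image crop to a binary soft-biometric label such as gender. The layer sizes, input resolution, and the use of PyTorch are illustrative choices, not the architecture trained in the thesis; separate networks of the same shape could handle the upper- and lower-body clothing labels.

```python
import torch
import torch.nn as nn

class SoftBiometricCNN(nn.Module):
    """Tiny CNN mapping a 3x64x64 crop to one of two classes (e.g. gender)."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, n_classes)

    def forward(self, x):
        x = self.features(x)                 # learned feature extraction from raw pixels
        return self.classifier(x.flatten(1)) # learned classification on top

logits = SoftBiometricCNN()(torch.randn(4, 3, 64, 64))   # 4 crops -> 4 x 2 logits
```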
Abstract:
This paper describes our semi-automatic keyword-based approach for the four topics of the Information Extraction from Microblogs Posted during Disasters task at the Forum for Information Retrieval Evaluation (FIRE) 2016. The approach consists of three phases.
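The abstract stops short of describing the three phases, so the snippet below is only a generic illustration of the keyword-matching core of such an approach: score each microblog post against a manually curated keyword list per topic and return the best-matching posts. The topic names, keyword lists, and threshold are invented for illustration and are not the FIRE 2016 system.

```python
# Generic keyword-based retrieval of disaster-related microblog posts (illustrative).
TOPIC_KEYWORDS = {
    "resources_needed": {"need", "require", "shortage", "urgent"},
    "resources_available": {"available", "providing", "distributing", "sending"},
}

def match_posts(posts, topic, min_hits=1):
    """Rank posts by how many of the topic's keywords they contain."""
    keywords = TOPIC_KEYWORDS[topic]
    hits = []
    for post in posts:
        score = len(set(post.lower().split()) & keywords)
        if score >= min_hits:
            hits.append((score, post))
    return [p for _, p in sorted(hits, reverse=True)]   # best-matching posts first
```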