846 results for Open Information Extraction
Abstract:
The epoch is defined as the instant of significant excitation within a pitch period of voiced speech. Epoch extraction continues to attract the interest of researchers because of its significance in speech analysis. Existing high-performance epoch extraction algorithms require either dynamic programming techniques or a priori information about the average pitch period. An algorithm without such requirements is proposed based on the integrated linear prediction residual (ILPR), which resembles the voice source signal. The half-wave rectified and negated ILPR (or the Hilbert transform of the ILPR) is used as the pre-processed signal. A new non-linear temporal measure named the plosion index (PI) is proposed for detecting 'transients' in the speech signal. An extension of the PI, called the dynamic plosion index (DPI), is applied to the pre-processed signal to estimate the epochs. The proposed DPI algorithm is validated using six large databases which provide simultaneous EGG recordings. Creaky and singing voice samples are also analyzed. The algorithm has been tested for its robustness in the presence of additive white and babble noise and on simulated telephone-quality speech. The performance of the DPI algorithm is found to be comparable to or better than that of five state-of-the-art techniques for the experiments considered.
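To make the pre-processing and the plosion-index idea concrete, here is a minimal Python sketch that half-wave rectifies and negates a synthetic residual and computes a PI-style ratio at each sample. The impulse-train "residual", the window offsets m1 and m2, and the detection threshold are illustrative assumptions, not the actual ILPR or the DPI algorithm's parameters.

```python
import numpy as np

def half_wave_rectify_negated(x):
    """Keep only the negative excursions of x, flipped to positive."""
    return np.maximum(-x, 0.0)

def plosion_index(x, n0, m1=5, m2=40):
    """Ratio of |x[n0]| to the mean magnitude over m2 samples ending m1 samples earlier."""
    window = x[max(n0 - m1 - m2, 0):max(n0 - m1, 0)]
    baseline = np.mean(np.abs(window)) + 1e-12        # avoid division by zero
    return np.abs(x[n0]) / baseline

fs, f0 = 8000, 100                                    # sampling rate and pitch (Hz), assumed
residual = 0.02 * np.random.default_rng(0).standard_normal(fs // 10)
residual[::fs // f0] -= 1.0                           # negative impulses standing in for epochs

pre = half_wave_rectify_negated(residual)             # pre-processed signal
pi = np.zeros(len(pre))
for n in range(45, len(pre)):                         # skip samples lacking a full window
    pi[n] = plosion_index(pre, n)

print("epoch candidates:", np.nonzero(pi > 30)[0][:5])  # indices near multiples of fs // f0
```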
Abstract:
In this work, we describe a system that recognizes open-vocabulary, isolated, online handwritten Tamil words and extend it to recognize a paragraph of writing. We explain in detail each step involved in the process: segmentation, preprocessing, feature extraction, classification, and bigram-based post-processing. On our database of 45,000 handwritten words obtained through a tablet PC, we have obtained symbol-level accuracies of 78.5% and 85.3% without and with post-processing using symbol-level language models, respectively. The corresponding word-level accuracies are 40.1% and 59.6%. A line- and word-level segmentation strategy is proposed, which gives promising results of 100% line segmentation and 98.1% word segmentation accuracy on our initial trials of 40 handwritten paragraphs. The two modules have been combined to obtain a full-fledged page recognition system for online handwritten Tamil data. To the authors' knowledge, this is the first attempt at recognizing open-vocabulary, online handwritten paragraphs in any Indian language.
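As a rough illustration of bigram-based post-processing, the sketch below rescores per-position classifier candidates with a symbol-level bigram language model using a small Viterbi search. The symbols, probabilities, and classifier scores are invented for illustration and do not reflect the system's actual models.

```python
import math

bigram = {            # P(next | prev), a hypothetical symbol-level language model
    ("<s>", "ka"): 0.6, ("<s>", "pa"): 0.4,
    ("ka", "la"): 0.7, ("ka", "va"): 0.3,
    ("pa", "la"): 0.2, ("pa", "va"): 0.8,
}

candidates = [        # per-position candidates: (symbol, classifier probability)
    [("ka", 0.55), ("pa", 0.45)],
    [("la", 0.40), ("va", 0.60)],
]

def decode(candidates, bigram, lm_weight=1.0):
    """Viterbi search combining classifier scores with bigram LM scores (log domain)."""
    beams, paths = {"<s>": 0.0}, {"<s>": []}
    for position in candidates:
        new_beams, new_paths = {}, {}
        for sym, p_clf in position:
            best, best_prev = None, None
            for prev, score in beams.items():
                p_lm = bigram.get((prev, sym), 1e-6)
                cand = score + math.log(p_clf) + lm_weight * math.log(p_lm)
                if best is None or cand > best:
                    best, best_prev = cand, prev
            new_beams[sym] = best
            new_paths[sym] = paths[best_prev] + [sym]
        beams, paths = new_beams, new_paths
    return paths[max(beams, key=beams.get)]

print(decode(candidates, bigram))   # ['ka', 'la']: the LM overrides the raw classifier pick
```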
Abstract:
Executive Summary: A number of studies have shown that mobile, bottom-contact fishing gear (such as otter trawls) can alter seafloor habitats and associated biota. Considerably less is known about the recovery of these resources following such disturbances, though this information is critical for successful management. In part, this paucity of information can be attributed to the lack of access to adequate control sites – areas of the seafloor that are closed to fishing activity. Recent closures along the coast of central California provide an excellent opportunity to track the recovery of historically trawled areas and to compare recovery rates to adjacent areas that continue to be trawled. In June 2006 we initiated a multi-year study of the recovery of seafloor microhabitats and associated benthic fauna inside and outside two new Essential Fish Habitat (EFH) closures within the Cordell Bank and Gulf of the Farallones National Marine Sanctuaries. Study sites inside the EFH closure at Cordell Bank were located in historically active areas of fishing effort, which had not been trawled since 2003. Sites outside the EFH closure in the Gulf of the Farallones were located in an area that continues to be actively trawled. All sites were located in unconsolidated sands at equivalent water depths. Video and still photographic data collected via a remotely operated vehicle (ROV) were used to quantify the abundance, richness, and diversity of microhabitats and epifaunal macro-invertebrates at recovering and actively trawled sites, while bottom grabs and conductivity/temperature/depth (CTD) casts were used to quantify infaunal diversity and to characterize local environmental conditions. Analysis of still photos found differences in common seafloor microhabitats between the recovering and actively trawled areas, while analysis of videographic data indicated that biogenic mound and biogenic depression microhabitats were significantly less abundant at trawled sites. Each of these features provides structure with which demersal fishes, across a wide range of size classes, have been observed to associate. Epifaunal macro-invertebrates were sparsely distributed and occurred in low numbers in both treatments. However, their total abundance was significantly different between treatments, which was attributable to lower densities at trawled sites. In addition, the dominant taxa were different between the two sites. Patchily distributed buried brittle stars dominated the recovering site, and sea whips (Halipteris cf. willemoesi) were most numerous at the trawled site, though they occurred in only five of ten transects. Numerical classification (cluster analysis) of the infaunal samples also revealed a clear difference between benthic assemblages in the recovering vs. trawled areas due to differences in the relative abundances of component species. There were no major differences in infaunal species richness, H′ diversity, or J′ evenness between recovering vs. trawled site groups. However, total infaunal abundance showed a significant difference attributable to much lower densities at trawled sites. This pattern was driven largely by the small oweniid polychaete Myriochele gracilis, which was the most abundant species in the overall study region, though significantly less abundant at trawled sites. Other taxa that were significantly less abundant at trawled sites included the polychaete M. olgae and the polychaete family Terebellidae.
In contrast, the thyasirid bivalve Axinopsida serricata and the polychaetes Spiophanes spp. (mostly S. duplex), Prionospio spp., and Scoloplos armiger all had significantly to near-significantly higher abundances at trawled sites. As a result of such contrasting species patterns, there was also a significant difference in the overall dominance structure of infaunal assemblages between the two treatments. It is suggested that the observed biological patterns were the result of trawling impacts and varying levels of recovery due to the difference in trawling status between the two areas. The EFH closure was established in June 2006, within a month of when sampling was conducted for the present study; however, the stations within this closure area are at sites that have actually experienced little trawling since 2003, based on National Marine Fisheries Service trawl records. Thus, the three-year period would be sufficient time for some post-trawling changes to have occurred. Other results from this study (e.g., similarly moderate numbers of infaunal species in both areas that are lower than values recorded elsewhere in comparable habitats along the California continental shelf) also indicate that recovery within the closure area is not yet complete. Additional sampling is needed to evaluate subsequent recovery trends and persistence of effects. Furthermore, to date, the study has been limited to unconsolidated substrates. Ultimately, the goal of this project is to characterize the recovery trajectories of a wide spectrum of seafloor habitats and communities and to link that recovery to the dynamics of exploited marine fishes.
Abstract:
Authority files serve to uniquely identify real-world ‘things’ or entities like documents, persons, and organisations, and their properties, like relations and features. Already important in the classical library world, authority files are indispensable for adequate information retrieval and analysis in the computer age. This is because, even more than humans, computers are poor at handling ambiguity. Through authority files, people tell computers which terms, names, or numbers refer to the same thing or have the same meaning by giving equivalent notions the same identifier. Thus, authority files signpost the internet, where these identifiers are interlinked on the basis of relevance. When executing a query, computers are able to navigate from identifier to identifier by following these links and collect the queried information on these so-called ‘crosswalks’. In this context, identifiers also go by the name of controlled access points. Identifiers become even more crucial now that massive data collections like library catalogues or research datasets are releasing their hitherto contained data directly to the internet. This development has been coined Linked Open Data. The corresponding name for the internet is the Web of Data, instead of the classical Web of Documents.
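As a toy illustration of the identifier idea, the snippet below maps several name variants to one controlled identifier and links that identifier to other identifier schemes, mimicking a 'crosswalk'. All names, identifiers, and scheme labels are hypothetical.

```python
# Variant access points mapped to one controlled identifier (all values invented).
authority = {
    "Tolstoy, Leo": "AUTH:0001",
    "Tolstoi, Lev Nikolaevich": "AUTH:0001",   # same person, same identifier
    "Лев Толстой": "AUTH:0001",
    "Tolstoy, Alexei": "AUTH:0002",            # a different person
}

# Links between identifier schemes, enabling a query to "walk" across datasets.
crosswalk = {
    "AUTH:0001": {"scheme_a_id": "A-96994048", "scheme_b_id": "B-1234"},
}

def resolve(name):
    """Return the controlled identifier and its linked identifiers for a name variant."""
    auth_id = authority.get(name)
    return auth_id, crosswalk.get(auth_id, {})

print(resolve("Tolstoi, Lev Nikolaevich"))
```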
Abstract:
Dicistroviridae is a new family of small, nonenveloped +ssRNA viruses pathogenic to both beneficial arthropods and insect pests. Triatoma virus (TrV), a dicistrovirus, is a pathogen of Triatoma infestans (Hemiptera: Reduviidae), one of the main vectors of Chagas disease. In this work, we report a single-step method to identify TrV isolated from fecal samples of triatomines. The identification method proved to be quite sensitive, even without extraction and purification of viral RNA.
Abstract:
Hyper-spectral data allows the construction of more robust statistical models of material properties than the standard tri-chromatic color representation. However, because of the large dimensionality and complexity of the hyper-spectral data, the extraction of robust features (image descriptors) is not a trivial issue. Thus, to facilitate efficient feature extraction, decorrelation techniques are commonly applied to reduce the dimensionality of the hyper-spectral data with the aim of generating compact and highly discriminative image descriptors. Current methodologies for data decorrelation, such as principal component analysis (PCA), linear discriminant analysis (LDA), wavelet decomposition (WD), or band selection methods, require complex and subjective training procedures, and in addition the compressed spectral information is not directly related to the physical (spectral) characteristics of the analyzed materials. The major objective of this article is to introduce and evaluate a new data decorrelation methodology using an approach that closely emulates the human vision. The proposed data decorrelation scheme has been employed to optimally minimize the amount of redundant information contained in the highly correlated hyper-spectral bands and has been comprehensively evaluated in the context of non-ferrous material classification.
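For context, the sketch below shows the PCA baseline named above (not the proposed vision-inspired scheme): a synthetic hyper-spectral cube is reshaped to a pixels-by-bands matrix and projected onto its leading principal components to obtain compact, decorrelated per-pixel descriptors. The cube dimensions and the number of retained components are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
cube = rng.random((64, 64, 128))                 # height x width x spectral bands (synthetic)
pixels = cube.reshape(-1, cube.shape[-1])        # (4096, 128): one spectrum per pixel

# PCA via eigendecomposition of the band covariance matrix.
centered = pixels - pixels.mean(axis=0)
cov = np.cov(centered, rowvar=False)             # (128, 128) band covariance
eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
components = eigvecs[:, ::-1][:, :10]            # top-10 principal directions

descriptors = centered @ components              # (4096, 10) decorrelated features
print(descriptors.shape)
```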
Abstract:
Background: Consensus development techniques were used in the late 1980s to create explicit criteria for the appropriateness of cataract extraction. We developed a new appropriateness-of-indications tool for cataract extraction following the RAND method and tested the validity of our panel results. Methods: Criteria were developed using a modified Delphi panel judgment process. A panel of 12 ophthalmologists was assembled. Ratings were analyzed regarding the level of agreement among panelists. We studied the influence of all variables on the final panel score using linear and logistic regression models. The explicit criteria developed were summarized by classification and regression tree analysis. Results: Of the 765 indications evaluated by the main panel in the second round, 32.9% were found appropriate, 30.1% uncertain, and 37% inappropriate. Agreement was found in 53% of the indications and disagreement in 0.9%. Seven variables were considered to create the indications, which were divided into three groups: simple cataract, cataract with diabetic retinopathy, and cataract with other ocular pathologies. The preoperative visual acuity in the cataractous eye and visual function were the variables that best explained the panel scoring. The panel results were synthesized and presented in three decision trees. The misclassification error of the decision trees, as compared with the panel's original criteria, was 5.3%. Conclusion: The parameters tested showed acceptable validity for an evaluation tool. These results support the use of this indication algorithm as a screening tool for assessing the appropriateness of cataract extraction in field studies and for the development of practice guidelines.
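As a rough sketch of how a classification tree can summarize panel ratings, the example below fits a small scikit-learn decision tree to hypothetical indications described by two of the variables mentioned above; the variable coding, the surrogate "panel score" rule, and the data are invented, not the panel's 765 indications.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 300
visual_acuity = rng.integers(1, 6, size=n)       # 1 (worst) .. 5 (best), cataractous eye
visual_function = rng.integers(1, 4, size=n)     # 1 (poor) .. 3 (good)
X = np.column_stack([visual_acuity, visual_function])

# Hypothetical rule standing in for the panel score:
# worse acuity -> appropriate (2); else worse function -> uncertain (1); else inappropriate (0).
score = np.where(visual_acuity <= 2, 2, np.where(visual_function <= 2, 1, 0))

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, score)
print(export_text(tree, feature_names=["visual_acuity", "visual_function"]))
print("misclassification rate:", 1 - tree.score(X, score))
```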
Abstract:
Storage systems are widely used and have played a crucial role in both consumer and industrial products, for example, personal computers, data centers, and embedded systems. However, such systems suffer from issues of cost, restricted lifetime, and reliability with the emergence of new systems and devices, such as distributed storage and flash memory. Information theory, on the other hand, provides fundamental bounds and solutions to fully utilize resources such as data density, information I/O, and network bandwidth. This thesis bridges these two topics and proposes to solve challenges in data storage using a variety of coding techniques, so that storage becomes faster, more affordable, and more reliable.
We consider the system level and study the integration of RAID schemes and distributed storage. Erasure-correcting codes are the basis of the ubiquitous RAID schemes for storage systems, where disks correspond to symbols in the code and are located in a (distributed) network. Specifically, RAID schemes are based on MDS (maximum distance separable) array codes that enable optimal storage and efficient encoding and decoding algorithms. With r redundancy symbols, an MDS code can sustain r erasures. For example, consider an MDS code that can correct two erasures. It is clear that when two symbols are erased, one needs to access and transmit all the remaining information to rebuild the erasures. However, an interesting and practical question is: what is the smallest fraction of information that one needs to access and transmit in order to correct a single erasure? In Part I we show that the lower bound of 1/2 is achievable and that the result can be generalized to codes with an arbitrary number of parities and optimal rebuilding.
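As a baseline illustration of the rebuilding question (not the thesis's constructions), the Python sketch below uses a single XOR parity, where rebuilding one erased disk requires reading every surviving disk, i.e. a rebuilding ratio of 1; the MDS array codes discussed in Part I reduce the accessed fraction to 1/2 for a single erasure.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4                                                      # number of data disks
data = rng.integers(0, 256, size=(k, 8), dtype=np.uint8)   # one 8-byte stripe per disk
parity = np.bitwise_xor.reduce(data, axis=0)               # single XOR parity disk

erased = 2                                                 # disk 2 fails
survivors = [data[i] for i in range(k) if i != erased] + [parity]

rebuilt = survivors[0].copy()
for chunk in survivors[1:]:                                # XOR of every surviving disk
    rebuilt ^= chunk

assert np.array_equal(rebuilt, data[erased])
print("accessed", len(survivors), "of", k, "surviving disks to rebuild 1 disk")
```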
We consider the device level and study coding and modulation techniques for emerging non-volatile memories such as flash memory. In particular, rank modulation is a novel data representation scheme proposed by Jiang et al. for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. It eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. In order to decrease the decoding complexity, we propose two variations of this scheme in Part II: bounded rank modulation, where only small sliding windows of cells are sorted to generate permutations, and partial rank modulation, where only some of the n cells are used to represent data. We study limits on the capacity of bounded rank modulation and propose encoding and decoding algorithms. We show that overlaps between windows increase capacity. We present Gray codes spanning all possible partial-rank states and using only "push-to-the-top" operations. These Gray codes turn out to solve an open combinatorial problem on universal cycles, where a universal cycle is a sequence of integers generating all possible partial permutations.
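The following minimal sketch illustrates the rank-modulation idea, deriving a permutation from analog cell charges, together with a bounded variant that sorts only small sliding windows. The window size, step, and charge values are illustrative assumptions, not the parameters analyzed in Part II.

```python
import numpy as np

def rank_permutation(charges):
    """Return the permutation induced by cell charges (highest charge ranked first)."""
    return tuple(int(i) for i in np.argsort(-np.asarray(charges, dtype=float)))

def bounded_rank_permutations(charges, window=3, step=2):
    """Bounded rank modulation: sort only small (possibly overlapping) windows of cells."""
    charges = np.asarray(charges, dtype=float)
    perms = []
    for start in range(0, len(charges) - window + 1, step):
        perms.append(rank_permutation(charges[start:start + window]))
    return perms

cells = [0.31, 0.84, 0.12, 0.57, 0.93, 0.40]      # analog charge levels of n = 6 cells
print(rank_permutation(cells))                     # full permutation over all 6 cells
print(bounded_rank_permutations(cells))            # permutations of windows [0:3], [2:5]
```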
Abstract:
We have investigated the spectra of electromagnetically induced transparency (EIT) when a cell is filled with a buffer gas. Our theoretical results show that the buffer gas can induce a narrower spectral line and steeper dispersion than those of the usual EIT case in both homogeneously and Doppler-broadened systems. The linewidth decreases with increasing buffer gas pressure. This narrow spectrum may be applied to quantum information processing, nonlinear optics, and atomic frequency standards.
Abstract:
Squids of the family Ommastrephidae are a vital part of marine food webs and support major fisheries around the world. They are widely distributed in the open ocean, where they are among the most abundant nektonic epipelagic organisms in both number and biomass. Seven of the 11 genera of this family (Dosidicus, Illex, Martialia, Nototodarus, Ommastrephes, Sthenoteuthis, and Todarodes) are heavily preyed upon by top marine predators, i.e., birds, mammals, and fish, and currently support fisheries in both neritic and oceanic waters (Roper and Sweeney, 1984; Rodhouse, 1997). Their commercial importance has made the large ommastrephids the target of many scientific investigations, and their biology is consequently reasonably well known (Nigmatullin et al., 2001; Zuyev et al., 2002; Bower and Ichii, 2005). In contrast, much less information is available on the biology and ecological role of the smaller, unexploited species of ommastrephids (e.g., Eucleoteuthis, Hyaloteuthis, Ornithoteuthis, and Todaropsis).
Abstract:
Opengazer is an open-source application that uses an ordinary webcam to estimate head pose, facial gestures, or the direction of your gaze. This information can then be passed to other applications. For example, used in conjunction with Dasher, Opengazer allows you to write with your eyes. Opengazer aims to be a low-cost software alternative to commercial hardware-based eye trackers. The first version of Opengazer was developed by Piotr Zieliński, supported by Samsung and the Gatsby Charitable Foundation. Research and development for Opengazer has been continued by Emli-Mari Nel and was supported until 2012 by the European Commission in the context of the AEGIS project, and also by the Gatsby Charitable Foundation.
Abstract:
Most of the manual labor needed to create the geometric building information model (BIM) of an existing facility is spent converting raw point cloud data (PCD) to a BIM description. Automating this process would drastically reduce the modeling cost. Surface extraction from PCD is a fundamental step in this process. Compact modeling of redundant points in PCD as a set of planes leads to smaller file size and fast interactive visualization on cheap hardware. Traditional approaches for smooth surface reconstruction do not explicitly model the sparse scene structure or significantly exploit the redundancy. This paper proposes a method based on sparsity-inducing optimization to address the planar surface extraction problem. Through sparse optimization, points in PCD are segmented according to their embedded linear subspaces. Within each segmented part, plane models can be estimated. Experimental results on a typical noisy PCD demonstrate the effectiveness of the algorithm.
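As a small illustration of the plane-estimation step (the sparsity-inducing segmentation itself is not reproduced here), the sketch below fits a plane to one synthetic segment of points by least squares via SVD. The segment and its noise level are made up for the example.

```python
import numpy as np

def fit_plane(points):
    """Fit a plane to an (N, 3) array of points; return (unit normal, centroid)."""
    centroid = points.mean(axis=0)
    # The right-singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return vt[-1], centroid

rng = np.random.default_rng(1)
# Synthetic noisy samples from the plane z = 0.2*x - 0.1*y + 3 (one "segment").
xy = rng.uniform(-1, 1, size=(500, 2))
z = 0.2 * xy[:, 0] - 0.1 * xy[:, 1] + 3 + 0.01 * rng.standard_normal(500)
segment = np.column_stack([xy, z])

normal, centroid = fit_plane(segment)
print("estimated plane normal:", np.round(normal, 3))
```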
Abstract:
“Dissolved” (< 0.4 μm filtered) and “total dissolvable” (unfiltered) trace element samples were collected using “clean” sampling techniques from four vertical profiles in the eastern Atlantic Ocean on the first IOC Trace Metals Baseline expedition. The analytical results obtained by 9 participating laboratories for Mn, Fe, Co, Ni, Cu, Zn, Cd, Pb, and Se on samples from station 4 in the northeast Atlantic have been evaluated with respect to accuracy and precision (intercomparability). The data variability among the reporting laboratories was expressed as 2 × SD for a given element and depth, and was comparable to the 95% confidence interval reported for the NASS seawater reference standards (representing analytical variability only). The discrepancies between reporting laboratories appear to be due to inaccuracies in standardization (analytical calibration), blank correction, and/or extraction efficiency corrections. Several of the sampling bottles used at this station were not adequately pre-cleaned (anomalous Pb results). The sample filtration process did not appear to have been a source of contamination for either dissolved or particulate trace elements. The trace metal profiles agree in general with previously reported profiles from the Atlantic Ocean. We conclude that the sampling and analytical methods we have employed for this effort, while still in need of improvement, are sufficient for obtaining accurate concentration data on most trace metals in the major water masses of the oceans and for enabling some evaluation of the biogeochemical cycling of the metals.