893 results for Digital data sets
Abstract:
We have benchmarked the maximum obtainable recognition accuracy on five publicly available standard word image data sets using semi-automated segmentation and a commercial OCR. These images have been cropped from camera-captured scene images, born-digital images (BDI) and street view images. Using the Matlab-based tool developed by us, we have annotated at the pixel level more than 3600 word images from the five data sets. The word images binarized by the tool, as well as by our own midline analysis and propagation of segmentation (MAPS) algorithm, are recognized using the trial version of Nuance Omnipage OCR, and these two results are compared with the best reported in the literature. The benchmark word recognition rates obtained on the ICDAR 2003, Sign evaluation, Street view, Born-digital and ICDAR 2011 data sets are 83.9%, 89.3%, 79.6%, 88.5% and 86.7%, respectively. The results obtained from MAPS-binarized word images without the use of any lexicon are 64.5% and 71.7% for ICDAR 2003 and 2011, respectively; these values are higher than the best values reported in the literature, 61.1% and 41.2%, respectively. The MAPS result of 82.8% on the BDI 2011 data set matches the performance of the state-of-the-art method based on the power law transform.
Abstract:
Nitrogen and water are essential for plant growth and development. In this study, we designed experiments to produce gene expression data from poplar roots under nitrogen starvation and water deprivation conditions. We found that a low concentration of nitrogen led first to increased root elongation, followed by lateral root proliferation and eventually increased root biomass. To identify genes regulating root growth and development under nitrogen starvation and water deprivation, we designed a series of data analysis procedures through which we successfully identified biologically important genes. Differentially Expressed Genes (DEGs) analysis identified the genes that are differentially expressed under nitrogen starvation or drought. Protein domain enrichment analysis identified enriched themes (within the same domains) that are highly interactive during the treatment. Gene Ontology (GO) enrichment analysis allowed us to identify biological processes changed during nitrogen starvation. Based on the above analyses, we examined the local Gene Regulatory Network (GRN) and identified a number of transcription factors. After testing, one of them proved to be a highly ranked transcription factor in the hierarchy that affects root growth under nitrogen starvation. Analyzing gene expression data manually is tedious and time-consuming, so we have automated a computational pipeline that can now be used for identification of DEGs and protein domain analysis in a single run. It is implemented in Perl and R scripts.
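The pipeline itself is written in Perl and R; as a language-neutral illustration of its DEG-identification step, the following minimal Python sketch flags genes as differentially expressed using a fold-change threshold combined with a two-sample t-test. The column layout, thresholds, and choice of test are assumptions made for illustration, not the authors' exact criteria.

```python
import numpy as np
from scipy import stats

def find_degs(expr_control, expr_treated, min_log2_fc=1.0, alpha=0.05):
    """Flag differentially expressed genes (rows = genes, cols = replicates)."""
    log2_fc = np.log2(expr_treated.mean(axis=1) / expr_control.mean(axis=1))
    # Two-sample t-test per gene across replicates.
    _, pvals = stats.ttest_ind(expr_treated, expr_control, axis=1)
    return np.where((np.abs(log2_fc) >= min_log2_fc) & (pvals <= alpha))[0]

# Toy data: 5 genes x 3 replicates; gene 0 is induced under treatment.
rng = np.random.default_rng(7)
control = rng.normal(100, 5, size=(5, 3))
treated = control.copy()
treated[0] *= 4.0                      # strong induction of gene 0
print(find_degs(control, treated))     # -> [0]
```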
Abstract:
This study evaluated the feasibility of documenting patterned injuries in three dimensions and true colour without complex 3D surface documentation methods. The method is based on a 3D surface model generated from radiologic slice images (CT), while the colour information is derived from photographs taken with commercially available cameras. The external patterned injuries were documented in 16 cases using digital photography as well as highly precise photogrammetry-supported 3D structured light scanning. The internal findings of the deceased were recorded using CT and MRI. For registration of the internal with the external data, two different types of radiographic markers were used and compared. The 3D surface model generated from CT slice images was linked with the photographs, and thereby digital true-colour 3D models of the patterned injuries could be created (image projection onto CT/IprojeCT). In addition, these external models were merged with the models of the somatic interior. We demonstrated that 3D documentation and visualization of external injury findings by integrating digital photography into CT/MRI data sets is suitable for the 3D documentation of individual patterned injuries to a body. Nevertheless, this documentation method is not a substitute for photogrammetry and surface scanning, especially when the entire body surface is to be recorded in three dimensions including all external findings, and when precise data are required for comparing highly detailed injury features with the injury-inflicting tool.
Abstract:
Digital terrain models (DTM) typically contain large numbers of postings, from hundreds of thousands to billions. Many algorithms that run on DTMs require topological knowledge of the postings, such as finding nearest neighbors or finding the posting closest to a chosen location. If the postings are arranged irregularly, topological information is costly to compute and to store. This paper offers a practical approach to organizing and searching irregularly-spaced data sets by presenting a collection of efficient algorithms (O(N), O(lg N)) that compute important topological relationships with only a simple supporting data structure. These relationships include finding the postings within a window, locating the posting nearest a point of interest, finding the neighborhood of postings nearest a point of interest, and ordering the neighborhood counter-clockwise. These algorithms depend only on two sorted arrays of two-element tuples, each holding a planimetric coordinate and an integer identification number indicating which posting the coordinate belongs to. There is one array for each planimetric coordinate (eastings and northings). These two arrays cost minimal overhead to create and store but permit the data to remain arranged irregularly.
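To make the supporting data structure concrete, here is a minimal Python sketch of the idea: two arrays of (coordinate, posting-id) tuples, one sorted by easting and one by northing, with a window query answered by an O(lg N) binary search on each axis. The class and method names, and the set-intersection step, are illustrative assumptions rather than the paper's implementation.

```python
import bisect

class PostingIndex:
    """Two sorted (coordinate, id) arrays indexing irregular postings."""

    def __init__(self, eastings, northings):
        # One tuple array per planimetric coordinate, each sorted by value.
        self.by_e = sorted((e, i) for i, e in enumerate(eastings))
        self.by_n = sorted((n, i) for i, n in enumerate(northings))

    def _ids_in_range(self, arr, lo, hi):
        # Two O(lg N) binary searches locate the slice inside [lo, hi].
        start = bisect.bisect_left(arr, (lo, -1))
        stop = bisect.bisect_right(arr, (hi, float("inf")))
        return {pid for _, pid in arr[start:stop]}

    def window(self, e_min, e_max, n_min, n_max):
        """Postings whose easting and northing both fall inside the window."""
        return (self._ids_in_range(self.by_e, e_min, e_max)
                & self._ids_in_range(self.by_n, n_min, n_max))

# Example: five irregularly spaced postings.
idx = PostingIndex(eastings=[2.0, 5.5, 9.1, 4.2, 7.3],
                   northings=[1.0, 3.3, 8.8, 2.9, 6.0])
print(idx.window(3.0, 8.0, 2.0, 7.0))  # -> {1, 3, 4}
```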
Abstract:
Long-term research on freshwater ecosystems provides insights that can be difficult to obtain from other approaches. Widespread monitoring of ecologically relevant water-quality parameters spanning decades can facilitate important tests of ecological principles. Unique long-term data sets and analytical tools are increasingly available, allowing for powerful and synthetic analyses across sites. Long-term measurements or experiments in aquatic systems can catch rare events, changes in highly variable systems, time-lagged responses, cumulative effects of stressors, and biotic responses that encompass multiple generations. Data are available from formal networks, local to international agencies, private organizations, various institutions, and paleontological and historic records; brief literature surveys suggest that much existing data are not synthesized. The ecological sciences will benefit from careful maintenance and analysis of existing long-term programs, and the resulting insights can aid in the design of effective future long-term experimental and observational efforts. Long-term research on freshwaters is particularly important because of their value to humanity.
Abstract:
Big data is big news in almost every sector, including crisis communication. However, not everyone has access to big data, and even those with access often lack the tools needed to analyze and cross-reference such large data sets. This paper therefore looks at patterns in the small data sets that we are able to collect with current tools, to determine whether we can find actionable information in what we already have. We analyzed 164,390 tweets collected during the 2011 earthquake to find out what type of location-specific information people mention in their tweets and when they talk about it. Based on our analysis, we find that even a small data set, containing far less data than a big data set, can be useful for quickly identifying priority disaster-specific areas.
Abstract:
This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing literature-based discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. Literature-based discovery has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986. The successful application of distributional models in automating the identification of the indirect associations underpinning literature-based discovery has been amply demonstrated in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks including predicting future disruptive innovations. In this paper we perform a computational complexity analysis of four successful corpus-based distributional models to evaluate their fitness for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.
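As one illustration of why fixed-dimensional representations scale well, the sketch below implements a tiny random-indexing style model: every term gets a fixed-width sparse ternary index vector, and a term's context vector is the sum of the index vectors of its neighbours, so memory stays constant as the vocabulary of co-occurrences grows. This is a generic example of the fixed-dimension idea, not one of the four models evaluated in the paper.

```python
import numpy as np
from collections import defaultdict

DIM, SEEDS = 512, 8          # fixed dimensionality; nonzeros per index vector
rng = np.random.default_rng(42)

def index_vector():
    # Sparse ternary vector: SEEDS random positions set to +1 or -1.
    v = np.zeros(DIM)
    pos = rng.choice(DIM, size=SEEDS, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=SEEDS)
    return v

index = defaultdict(index_vector)             # term -> fixed index vector
context = defaultdict(lambda: np.zeros(DIM))  # term -> learned context vector

def train(tokens, window=2):
    # Add neighbours' index vectors into each focus term's context vector.
    for i, focus in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                context[focus] += index[tokens[j]]

def similarity(a, b):
    va, vb = context[a], context[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12))

train("fish oil reduces blood viscosity in raynaud patients".split())
train("fish oil supplements lower blood viscosity markedly".split())
print(similarity("oil", "viscosity"))
```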
Abstract:
In this paper we present large, accurately calibrated and time-synchronized data sets, gathered outdoors in controlled and variable environmental conditions, using an unmanned ground vehicle (UGV) equipped with a wide variety of sensors: four 2D laser scanners, a radar scanner, a color camera and an infrared camera. The paper provides a full description of the system used for data collection and of the types of environments and conditions in which these data sets were gathered, which include the presence of airborne dust, smoke and rain.
Abstract:
Analytically or computationally intractable likelihood functions can arise in complex statistical inference problems, making them inaccessible to standard Bayesian inferential methods. Approximate Bayesian computation (ABC) methods address such problems by replacing direct likelihood evaluations with repeated sampling from the model. ABC methods have predominantly been applied to parameter estimation problems and less to model choice problems, owing to the added difficulty of handling multiple model spaces. The ABC algorithm proposed here addresses model choice problems by extending Fearnhead and Prangle (2012, Journal of the Royal Statistical Society, Series B 74, 1–28), in which the posterior means of the model parameters estimated through regression formed the summary statistics used in the discrepancy measure. An additional stepwise multinomial logistic regression is performed on the model indicator variable in the regression step, and the estimated model probabilities are incorporated into the set of summary statistics for model choice purposes. A reversible jump Markov chain Monte Carlo step is also included in the algorithm to increase model diversity for thorough exploration of the model space. The algorithm was applied to a validation example to demonstrate its robustness across a wide range of true model probabilities. Its subsequent use in three pathogen transmission examples of varying complexity illustrates the utility of the algorithm in inferring preference for particular transmission models for the pathogens.
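To make the underlying ABC idea concrete, the sketch below performs plain rejection ABC for model choice between two toy simulators: parameter and model draws are accepted when a simulated summary statistic lands close to the observed one, and posterior model probabilities are estimated from acceptance frequencies. This illustrates the general principle of trading likelihood evaluations for simulation; it is not the authors' extended regression/RJMCMC algorithm, and the models, priors, and tolerance are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observed data summarized by its sample mean (the summary statistic).
observed = rng.normal(loc=1.0, scale=1.0, size=50)
s_obs = observed.mean()

def simulate(model, theta, n=50):
    # Two candidate simulators; likelihoods are never evaluated.
    if model == 0:
        return rng.normal(theta, 1.0, n)     # Gaussian model
    return rng.exponential(theta, n)         # Exponential model

accepted = []          # model indicators that survive rejection
eps = 0.1              # discrepancy tolerance on the summary statistic
for _ in range(20000):
    m = rng.integers(2)                 # uniform prior over the two models
    theta = rng.uniform(0.0, 3.0)       # uniform prior over the parameter
    s_sim = simulate(m, theta).mean()
    if abs(s_sim - s_obs) < eps:        # keep draws with close summaries
        accepted.append(m)

# Posterior model probabilities estimated by acceptance frequencies.
accepted = np.array(accepted)
for m in (0, 1):
    print(f"P(model {m} | data) ~= {(accepted == m).mean():.3f}")
```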
Abstract:
Rapid advances in sequencing technologies (Next Generation Sequencing, or NGS) have led to a vast increase in the quantity of bioinformatics data available, and this increasing scale presents enormous challenges to researchers seeking to identify complex interactions. This paper is concerned with the domain of transcriptional regulation and the use of visualisation to identify relationships between specific regulatory proteins (the transcription factors, or TFs) and their associated target genes (TGs). We present preliminary work from an ongoing study which aims to determine the effectiveness of different visual representations and large-scale displays in supporting discovery. Following an iterative process of implementation and evaluation, representations were tested by potential users in the bioinformatics domain to determine their efficacy and to better understand the range of ad hoc practices among bioinformatics-literate users. Results from two rounds of small-scale user studies are considered, with initial findings suggesting that bioinformaticians require richly detailed views of TF data, features for quickly comparing TF layouts between organisms, and ways to keep track of interesting data points.
Abstract:
Background: Plotless density estimators are those that are based on distance measures rather than counts per unit area (quadrats or plots) to estimate the density of some usually stationary event, e.g. burrow openings or damage to plant stems. These estimators typically use distance measures between events and from random points to events to derive an estimate of density. The error and bias of these estimators for the various spatial patterns found in nature have previously been examined using simulated populations only. In this study we investigated eight plotless density estimators to determine which were robust across a wide range of data sets from fully mapped field sites. The data sets covered a wide range of situations, including animal damage to rice and corn, nest locations, active rodent burrows and distributions of plants. Monte Carlo simulations were applied to sample the data sets, and in all cases the error of the estimate (measured as relative root mean square error) was reduced with increasing sample size. The method of calculation and ease of use in the field were also used to judge the usefulness of each estimator. Estimators were evaluated in their original published forms, although the variable area transect (VAT) and ordered distance methods have been the subjects of optimization studies. Results: An estimator that was a compound of three basic distance estimators was found to be robust across all spatial patterns for sample sizes of 25 or greater. The same field methodology can be used either with the basic distance formula or with the formula used in the Kendall-Moran estimator, in which case a reduction in error may be gained for sample sizes less than 25; however, there is no improvement for larger sample sizes. The variable area transect (VAT) method performed moderately well, is easy to use in the field, and its calculations are easy to undertake. Conclusion: Plotless density estimators can provide an estimate of density in situations where it would not be practical to lay out a plot or quadrat, and they can in many cases reduce the workload in the field.
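As a concrete instance of the family of estimators being compared, the sketch below implements one of the simplest plotless (distance-based) estimators, the point-to-nearest-event estimator D ≈ n / (π Σ r_i²), applied to random sample points over a fully mapped population. This is an illustrative baseline under an assumed random spatial pattern, not the compound estimator the study recommends.

```python
import numpy as np

rng = np.random.default_rng(1)

def nearest_event_density(events, sample_points):
    """Basic point-to-nearest-event estimate: n / (pi * sum(r_i^2))."""
    # r_i: distance from each random sample point to its nearest event.
    d = np.linalg.norm(events[None, :, :] - sample_points[:, None, :], axis=2)
    r = d.min(axis=1)
    n = len(sample_points)
    return n / (np.pi * np.sum(r ** 2))

# Simulated random population: 500 events on a 100 x 100 unit plot.
true_density = 500 / (100.0 * 100.0)               # 0.05 events per unit area
events = rng.uniform(0, 100, size=(500, 2))
sample_points = rng.uniform(0, 100, size=(25, 2))  # sample size 25

print(f"true density      : {true_density:.4f}")
print(f"estimated density : {nearest_event_density(events, sample_points):.4f}")
```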
Abstract:
A Radio Frequency (RF) based digital data transmission scheme using 8-channel encoder/decoder ICs is proposed for surface electrode switching of a 16-electrode wireless Electrical Impedance Tomography (EIT) system. An RF-based wireless digital data transmission module (WDDTM) is developed, and the electrode switching of an EIT system is studied by analyzing the boundary data collected and the resistivity images of practical phantoms. An electrode switching module (ESM) is built from analog multiplexers and switched with parallel digital data transmitted by a wireless transmitter/receiver (Tx/Rx) module working with radio frequency technology. Parallel digital bits are generated using an NI USB 6251 card working on the LabVIEW platform and sent to the transmission module, which transmits the digital data to the receiver end. The transmitter/receiver module developed is properly interfaced with the personal computer (PC) and with practical phantoms through the ESM and a USB-based DAQ system, respectively. It is observed that the digital bits required for multiplexer operation are sequentially generated by the digital output (D/O) ports of the DAQ card. Parallel-to-serial and serial-to-parallel conversion of the digital data is performed by the encoder and decoder ICs. The wireless digital data transmission module successfully transmitted and received the parallel data required for switching the current and voltage electrodes wirelessly. A 1 mA, 50 kHz sinusoidal constant current is injected at the phantom boundary using the common ground current injection protocol, and the boundary potentials developed at the voltage electrodes are measured. Resistivity images of the practical phantoms are reconstructed from the boundary data using EIDORS. The boundary data and the resistivity images reconstructed from the surface potentials are studied to assess the wireless digital data transmission system. Boundary data profiles of the practical phantom with different configurations show that the multiplexers operate in the required sequence for the common ground current injection protocol. The voltage peaks obtained at the proper positions in the boundary data profiles prove the sequential operation of the multiplexers and the successful wireless transmission of the digital bits. The reconstructed images and their image parameters prove that the boundary data are successfully acquired by the DAQ system, which again indicates sequential and proper operation of the multiplexers as well as successful wireless transmission of the digital bits. Hence the developed RF-based wireless digital data transmission module (WDDTM) is found suitable for transmitting the digital bits required for electrode switching in a wireless EIT data acquisition system.
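To illustrate the kind of parallel bit patterns such a module carries, the sketch below generates, purely in software, the 4-bit multiplexer addresses selecting current-injection electrodes for a 16-electrode common ground protocol. The bit layout and the choice of electrode 0 as the grounded return are hypothetical assumptions, since the real mapping depends on the multiplexer wiring and the DAQ card's output ports.

```python
# Hypothetical bit-pattern generator for 16-electrode common ground
# current injection: electrode 0 is the grounded return, and the source
# electrode steps through the remaining 15 positions.
N_ELECTRODES = 16

def mux_address(electrode):
    """4-bit parallel word (MSB first) addressing one of 16 mux channels."""
    return [(electrode >> b) & 1 for b in (3, 2, 1, 0)]

def common_ground_sequence(ground=0):
    # Each projection pairs the source electrode bits with the ground bits.
    return [(mux_address(src), mux_address(ground))
            for src in range(N_ELECTRODES) if src != ground]

for step, (src_bits, gnd_bits) in enumerate(common_ground_sequence()):
    print(f"projection {step:2d}: source bits {src_bits}, ground bits {gnd_bits}")
```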
Abstract:
We consider an inverse elasticity problem in which forces and displacements are known on the boundary and the material property distribution inside the body is to be found. In other words, we need to estimate the distribution of constitutive properties from a finite number of boundary data sets. Uniqueness of the solution to this problem has been proved in the literature only under certain assumptions, for a given complete Dirichlet-to-Neumann map. A further complication in the numerical solution of this problem is that the number of boundary data sets needed to establish uniqueness is not known, even in the restricted cases where uniqueness has been proved theoretically. In this paper, we present a numerical technique that can assess the sufficiency of given boundary data sets by computing the rank of a sensitivity matrix that arises in the Gauss-Newton method used to solve the problem. Numerical experiments are presented to illustrate the method.
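A minimal sketch of the rank test described here: build the sensitivity (Jacobian) matrix of the boundary measurements with respect to the discretized material parameters by finite differences, then check its numerical rank via SVD. The forward model below is a deliberately simple stand-in, not an elasticity solver, and the function names and tolerances are assumptions for illustration.

```python
import numpy as np

def sensitivity_rank(forward, params, n_meas, h=1e-6, tol=1e-8):
    """Numerical rank of d(measurements)/d(parameters) via SVD."""
    p = np.asarray(params, dtype=float)
    J = np.zeros((n_meas, p.size))
    base = forward(p)
    for j in range(p.size):
        dp = p.copy()
        dp[j] += h
        J[:, j] = (forward(dp) - base) / h    # finite-difference column
    s = np.linalg.svd(J, compute_uv=False)
    return int(np.sum(s > tol * s[0]))        # singular values above tolerance

# Placeholder forward model: 4 boundary measurements from 3 parameters.
# Parameters 0 and 1 enter only through their sum, so the data cannot
# distinguish them: the sensitivity matrix has rank 2, not 3.
def forward(p):
    s = p[0] + p[1]
    return np.array([s, s * p[2], p[2] ** 2, 2.0 * s])

rank = sensitivity_rank(forward, params=[1.0, 2.0, 3.0], n_meas=4)
print(f"numerical rank = {rank}  (full rank would be 3)")
```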
Abstract:
Surface electrode switching of a 16-electrode wireless EIT system is studied using a Radio Frequency (RF) based digital data transmission technique operating with 8-channel encoder/decoder ICs. An electrode switching module is developed with analog multiplexers and switched with 8-bit parallel digital data transferred by a transmitter/receiver module built on radio frequency technology. The 8-bit parallel digital data collected from the receiver module are converted to 16-bit digital data using binary adder circuits and then used for switching the electrodes in the opposite current injection protocol. The 8-bit parallel digital data are generated using an NI USB 6251 DAQ card in LabVIEW software and sent to the transmission module, which transmits the digital data bits to the receiver end. The receiver module supplies the parallel digital bits to the binary adder circuits, and the adder circuit outputs are fed to the multiplexers of the electrode switching module for surface electrode switching. A 1 mA, 50 kHz sinusoidal constant current is injected at the phantom boundary using the opposite current injection protocol. The boundary potentials developed at the voltage electrodes are measured and studied to assess the wireless data transmission.