858 results for Data pre-processing


Relevance:

90.00%

Publisher:

Abstract:

Near infrared spectroscopy (NIRS) combined with multivariate analysis techniques was applied to assess the phenol content of European oak. NIRS data were first collected directly from solid heartwood surfaces: spectra were recorded separately from the longitudinal-radial and transverse section surfaces by diffuse reflectance. The spectral data were then pretreated by several pre-processing procedures, such as multiplicative scatter correction, first derivative, second derivative and standard normal variate. The tannin contents of wood samples collected from the longitudinal-radial and transverse section surfaces were determined by quantitative extraction with water/methanol (1:4, by vol). Total phenol contents in the tannin extracts were then measured by the Folin-Ciocalteu method. The NIR data were correlated against the Folin-Ciocalteu results. Calibration models built with partial least squares regression displayed strong correlation between measured and predicted total phenol content, as expressed by a high coefficient of determination (r2) and a high ratio of performance to deviation (RPD), together with low calibration and prediction errors (RMSEC, RMSEP). The best calibration was obtained with second-derivative spectra (r2 of 0.93 for the longitudinal-radial plane and 0.91 for the transverse section plane). This study illustrates that the NIRS technique, used in conjunction with multivariate analysis, can provide reliable, quick and non-destructive assessment of European oak heartwood extractives.
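For illustration, the following is a minimal sketch of this kind of chemometric workflow (SNV and Savitzky-Golay second-derivative pre-processing followed by PLS calibration), assuming a spectra matrix X and reference phenol values y; the synthetic data, window size and number of latent variables are placeholders, not values from the study.

```python
# Hypothetical sketch: NIR spectral pre-processing (SNV, Savitzky-Golay second
# derivative) followed by PLS calibration, in the spirit of the workflow above.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def second_derivative(spectra, window=15, poly=3):
    """Savitzky-Golay second derivative along the wavelength axis."""
    return savgol_filter(spectra, window_length=window, polyorder=poly,
                         deriv=2, axis=1)

# Synthetic placeholders; replace with measured spectra and reference values.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))                    # 60 spectra, 200 wavelengths
y = rng.normal(loc=5.0, scale=1.0, size=60)       # total phenol content

X_pre = second_derivative(snv(X))
X_cal, X_val, y_cal, y_val = train_test_split(X_pre, y, test_size=0.3,
                                              random_state=0)

pls = PLSRegression(n_components=5)
pls.fit(X_cal, y_cal)
y_pred = pls.predict(X_val).ravel()

rmsep = np.sqrt(mean_squared_error(y_val, y_pred))
rpd = y_val.std() / rmsep                         # ratio of performance to deviation
print(f"r2 = {r2_score(y_val, y_pred):.2f}, RMSEP = {rmsep:.2f}, RPD = {rpd:.2f}")
```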

Relevance:

90.00%

Publisher:

Abstract:

Solving large-scale all-to-all comparison problems using distributed computing is increasingly significant for various applications. Previous efforts to implement distributed all-to-all comparison frameworks have treated the two phases of data distribution and comparison task scheduling separately. This leads to high storage demands as well as poor data locality for the comparison tasks, thus creating a need to redistribute the data at runtime. Furthermore, most previous methods have been developed for homogeneous computing environments, so their overall performance is degraded even further when they are used in heterogeneous distributed systems. To tackle these challenges, this paper presents a data-aware task scheduling approach for solving all-to-all comparison problems in heterogeneous distributed systems. The approach formulates the requirements for data distribution and comparison task scheduling simultaneously as a constrained optimization problem. Then, metaheuristic data pre-scheduling and dynamic task scheduling strategies are developed, along with an algorithmic implementation, to solve the problem. The approach provides perfect data locality for all comparison tasks, avoiding rearrangement of data at runtime. It achieves load balancing among heterogeneous computing nodes, thus reducing the overall computation time. It also reduces data storage requirements across the network. The effectiveness of the approach is demonstrated through experimental studies.
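As a loose illustration of the data-locality idea (not the paper's constrained-optimization formulation or its metaheuristic), the toy sketch below enumerates all pairwise comparison tasks, assigns each one greedily to the least-loaded node weighted by node speed, and co-locates both input items of a task on the node that runs it.

```python
# Illustrative toy only (not the paper's algorithm): enumerate all-to-all
# comparison tasks, greedily assign each task to the least-loaded node, and
# derive the data distribution so that both inputs of every task are stored
# on the node that runs it (perfect data locality, no runtime redistribution).
from itertools import combinations

def schedule_all_to_all(items, node_speeds):
    """items: list of data item ids; node_speeds: relative speed per node."""
    tasks = list(combinations(items, 2))            # all pairwise comparisons
    load = {n: 0.0 for n in node_speeds}            # load per node
    assignment = {}                                 # task -> node
    storage = {n: set() for n in node_speeds}       # node -> data items held

    for task in tasks:
        # pick the node with the smallest load normalised by its speed
        node = min(load, key=lambda n: load[n] / node_speeds[n])
        assignment[task] = node
        load[node] += 1.0                           # unit cost per comparison
        storage[node].update(task)                  # co-locate both inputs
    return assignment, storage

if __name__ == "__main__":
    assignment, storage = schedule_all_to_all(
        items=list("ABCDEF"),
        node_speeds={"node1": 1.0, "node2": 2.0, "node3": 1.5})
    for node, held in storage.items():
        print(node, sorted(held))
```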

Relevance:

90.00%

Publisher:

Abstract:

Imaging flow cytometry is an emerging technology that combines the statistical power of flow cytometry with the spatial and quantitative morphology of digital microscopy. It allows high-throughput imaging of cells with good spatial resolution while they are in flow. This paper proposes a general framework for the processing and classification of cells imaged with an imaging flow cytometer. Each cell is localised by finding an accurate cell contour; then, features reflecting cell size, circularity and complexity are extracted for classification using an SVM. Unlike conventional iterative, semi-automatic segmentation algorithms such as active contours, we propose a non-iterative, fully automatic graph-based cell localisation. To evaluate the performance of the proposed framework, we successfully classified unstained, label-free leukaemia cell lines MOLT, K562 and HL60 from video streams captured using a custom-fabricated, cost-effective microfluidics-based imaging flow cytometer. The proposed system is a significant step towards a cost-effective cell analysis platform that would facilitate affordable mass screening camps using cellular morphology for disease diagnosis.

Lay description: In this article, we propose a novel framework for processing the raw data generated by microfluidics-based imaging flow cytometers. Microfluidics microscopy, or microfluidics-based imaging flow cytometry (mIFC), is a recent microscopy paradigm that combines the statistical power of flow cytometry with the spatial and quantitative morphology of digital microscopy, allowing cells to be imaged while they are in flow. Compared with conventional slide-based imaging systems, mIFC is a nascent technology enabling high-throughput imaging of cells and has yet to take the form of a clinical diagnostic tool. The proposed framework processes the raw data generated by mIFC systems and incorporates several steps: pre-processing of the raw video frames to enhance the content of the cell, localising the cell with a novel, fully automatic, non-iterative graph-based algorithm, extracting quantitative morphological parameters, and classifying the cells. To evaluate the performance of the proposed framework, we successfully classified unstained, label-free leukaemia cell lines MOLT, K562 and HL60 from video streams captured using a cost-effective microfluidics-based imaging flow cytometer. The HL60, K562 and MOLT cell lines were obtained from the ATCC (American Type Culture Collection) and were cultured separately in the lab; each culture therefore contains cells from a single category and provides the ground truth. Each cell is localised by finding a closed cell contour: a directed, weighted graph is defined from the Canny edge image of the cell such that the closed contour lies along the shortest weighted path surrounding the centroid of the cell, from a starting point on a good curve segment to an immediate endpoint. Once the cell is localised, morphological features reflecting the size, shape and complexity of the cell are extracted and used to build a support vector machine based classification system. We could classify the cell lines with good accuracy, and the results were consistent across different cross-validation experiments.
We hope that imaging flow cytometers equipped with the proposed image-processing framework would enable cost-effective, automated and reliable disease screening in overloaded facilities that cannot afford to hire skilled personnel in large numbers. Such platforms could facilitate screening camps in low-income countries, thereby transforming current health care paradigms by enabling rapid, automated diagnosis of diseases like cancer.
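A hedged sketch of the downstream feature-extraction and classification stage is given below: morphological features (area, circularity, complexity) are computed from a binary cell mask and fed to an SVM. The paper's graph-based shortest-path contour localisation is not reproduced here; a plain segmented mask is assumed as input, and all names and parameters are illustrative.

```python
# Hedged sketch: morphological features (size, circularity, complexity) from a
# binary cell mask plus an SVM classifier. The graph-based contour localisation
# of the paper is replaced by an assumed pre-segmented mask for brevity.
import numpy as np
from skimage import measure
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cell_features(mask):
    """Return [area, circularity, complexity] for the largest region in mask."""
    regions = measure.regionprops(measure.label(mask))
    region = max(regions, key=lambda r: r.area)
    area, perimeter = region.area, region.perimeter
    circularity = 4.0 * np.pi * area / (perimeter ** 2 + 1e-9)
    complexity = perimeter ** 2 / (area + 1e-9)   # grows with boundary irregularity
    return [area, circularity, complexity]

def train_classifier(masks, labels):
    """masks: list of binary cell masks; labels: cell-line label per mask."""
    X = np.array([cell_features(m) for m in masks])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(X, labels)
    return clf

if __name__ == "__main__":
    yy, xx = np.mgrid[:64, :64]
    disk = (xx - 32) ** 2 + (yy - 32) ** 2 < 15 ** 2   # synthetic round "cell"
    print(cell_features(disk))
```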

Relevance:

90.00%

Publisher:

Abstract:

Background: Malignancies arising in the large bowel cause the second largest number of deaths from cancer in the Western world. Despite the progress made during the last decades, colorectal cancer remains one of the most frequent and deadly neoplasias in Western countries. Methods: A genomic study of human colorectal cancer was carried out on a total of 31 tumoral samples, corresponding to different stages of the disease, and 33 non-tumoral samples. The study was carried out by hybridisation of the tumour samples against a reference pool of non-tumoral samples using Agilent Human 1A 60-mer oligo microarrays. The results obtained were validated by qRT-PCR. In the subsequent bioinformatics analysis, gene networks were built by means of Bayesian classifiers, variable selection and bootstrap resampling. The consensus among all the induced models produced a hierarchy of dependences and, thus, of variables. Results: After an exhaustive pre-processing stage to ensure data quality (missing-value imputation, probe quality control, data smoothing and intra-class variability filtering), the final dataset comprised a total of 8,104 probes. Next, a supervised classification approach and data analysis were carried out to obtain the most relevant genes, two of which are directly involved in cancer progression and, in particular, in colorectal cancer. Finally, a supervised classifier was induced to classify new unseen samples. Conclusions: We have developed a tentative model for the diagnosis of colorectal cancer based on a biomarker panel. Our results indicate that the gene profile described herein can discriminate between non-cancerous and cancerous samples with 94.45% accuracy using different supervised classifiers (AUC values in the range of 0.955 to 0.997).
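The sketch below illustrates, under stated assumptions, the general flavour of such an analysis (bootstrap resampling, per-replicate variable selection, consensus ranking, supervised classification with AUC); it uses univariate F-scores and logistic regression rather than the Bayesian classifiers and gene networks of the study, and synthetic data in place of the microarray set.

```python
# Hedged sketch (not the paper's exact pipeline): bootstrap resampling with
# per-replicate variable selection, consensus ranking of probes by selection
# frequency, then a supervised classifier evaluated by cross-validated AUC.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 500))            # expression matrix (samples x probes)
y = rng.integers(0, 2, size=64)           # 0 = non-tumoral, 1 = tumoral

n_boot, k = 100, 20
counts = np.zeros(X.shape[1])
for _ in range(n_boot):
    idx = rng.integers(0, X.shape[0], size=X.shape[0])   # bootstrap replicate
    selector = SelectKBest(f_classif, k=k).fit(X[idx], y[idx])
    counts[selector.get_support()] += 1

panel = np.argsort(counts)[::-1][:k]      # consensus biomarker panel
auc = cross_val_score(LogisticRegression(max_iter=1000),
                      X[:, panel], y, cv=5, scoring="roc_auc")
print("top consensus probes:", panel[:5], "mean AUC:", auc.mean().round(3))
```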

Relevance:

90.00%

Publisher:

Abstract:

On the basis of analyzing the principle and realization of geo-steering drilling systems, the key technologies and methods involved are systematically studied in this paper. In order to recognize lithology, distinguish strata and track reservoirs, the techniques of MWD (measurement while drilling) and data processing for natural gamma, resistivity, induction, density and porosity logs are investigated. The methods for pre-processing and standardizing MWD data and for converting geological data in directional and horizontal drilling are discussed; accordingly, methods of data conversion between MD (measured depth) and TVD (true vertical depth), of formation description and of adjacent-well correlation are proposed. Research on sub-layer identification yields techniques for single-well interpretation, multi-well evaluation and oil reservoir description. Extremum and variance clustering analysis enables logging-facies analysis as well as stratum subdivision and interpretation, which provides a theoretical method and lays a technical basis for tracking oil reservoirs and achieving geo-steering drilling. The technique for exploring the reservoir top with a hold-up section provides a planning method for the wellpath control scheme to track oil and gas reservoirs dynamically, which solves the problem of controlling the well trajectory when the layer's TVD is uncertain. Control schemes and wellpath planning methods that meet the demands of target hitting, soft landing and continuous steering provide the technological guarantee for landing safely and drilling successfully in horizontal, extended-reach and multi-target wells. Integrated design and control technologies are studied based on geology, reservoir and drilling considerations, taking the reservoir disclosing ratio as a primary index, together with methods for planning and controlling the optimum wellpath under multi-target constraints, so that the target wellpath lies in the most favourable position within the oil reservoir during geo-steering drilling. The BHA (bottomhole assembly) mechanical model is discussed using the finite element method, and BHA design methods are given on the basis of mechanical analyses according to the shape of the well trajectory and the characteristics of the BHA's structure and deformation. Methods for predicting the deflection rate of bent-housing motors and designing their assemblies are proposed based on the principle of minimum potential energy; these clearly show the relation between the BHA's structural parameters and the deflection rate, and especially the effect of the key factors on it. Moreover, the interaction model between bit and formation is discussed through equivalent-formation and equivalent-bit treatments that account for formation anisotropy and bit anisotropy, on the basis of analyzing the factors that influence the well trajectory. Accordingly, the inherent relationship among well trajectory, formation, bit and drilling direction is revealed, which lays the theoretical basis and provides techniques for predicting and controlling the well trajectory.
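Since the abstract mentions conversion between MD and TVD, the following is a minimal sketch using the standard minimum-curvature method on (measured depth, inclination, azimuth) survey stations; the thesis's own conversion procedure is not specified here, and the example survey values are invented.

```python
# Minimal sketch of MD-to-TVD conversion with the standard minimum-curvature
# method. Survey stations are (measured depth [m], inclination [deg], azimuth [deg]).
import math

def md_to_tvd(stations):
    """Return cumulative TVD at each station via minimum curvature."""
    tvd = [0.0]
    for (md1, inc1, azi1), (md2, inc2, azi2) in zip(stations, stations[1:]):
        i1, i2 = math.radians(inc1), math.radians(inc2)
        a1, a2 = math.radians(azi1), math.radians(azi2)
        # dogleg angle between the two survey directions
        cos_dl = (math.cos(i2 - i1)
                  - math.sin(i1) * math.sin(i2) * (1 - math.cos(a2 - a1)))
        dl = math.acos(max(-1.0, min(1.0, cos_dl)))
        rf = 1.0 if dl < 1e-12 else (2.0 / dl) * math.tan(dl / 2.0)  # ratio factor
        dmd = md2 - md1
        tvd.append(tvd[-1] + 0.5 * dmd * (math.cos(i1) + math.cos(i2)) * rf)
    return tvd

if __name__ == "__main__":
    survey = [(0.0, 0.0, 0.0), (500.0, 10.0, 45.0),
              (1000.0, 45.0, 60.0), (1500.0, 88.0, 60.0)]
    print([round(z, 1) for z in md_to_tvd(survey)])
```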

Relevance:

90.00%

Publisher:

Abstract:

Seismic exploration is the main tool of petroleum exploration. As society's demand for petroleum grows and the level of exploration rises, exploration in areas of complex geological structure has become the main task of the oil industry. This motivated seismic prestack depth migration, which has good imaging ability for complex structures but depends strongly on the velocity model; velocity analysis for prestack depth migration has therefore become a major research area. In this thesis the differences in seismic prestack depth migration between our country and abroad are analyzed systematically, and three developments are presented: a tomographic velocity-analysis method that requires no layered velocity model, a residual-curvature velocity-analysis method based on the velocity model, and a method for removing pre-processing errors.

The tomographic method of velocity analysis is analyzed first. It is characterized by theoretical completeness but practical difficulty. The method uses picked first arrivals, compares them with the arrivals calculated in a trial velocity model, and back-projects the differences along the ray paths to obtain a new velocity model. It relies only on the high-frequency assumption, with no other hypotheses, so it is effective and efficient. However, it still has a drawback: picking first arrivals in prestack data is difficult, because the signal-to-noise ratio is low and many events cross each other, especially in areas of complex geological structure. On this basis, a new tomographic velocity-analysis method requiring no layered velocity model is developed, aimed at the picking problem. It does not require continuous picking of event times; picks can be made at arbitrary locations according to their reliability. Besides the travel times used by routine tomographic methods, the method also uses the slopes of events, and a high-resolution slope-analysis method is used to improve the picking precision. In addition, research on residual-curvature velocity analysis shows that its performance and efficiency are poor: its assumptions are rigid and it is a local optimization method, so it struggles in areas with strong lateral velocity variation. A new method is therefore developed to improve the precision of velocity model building.

So far, the workflow of seismic prestack depth migration in our country is the same as abroad: before velocity model building, the original seismic data must be corrected to a datum plane, and then prestack depth migration is performed. The well-known success story is the Gulf of Mexico, which is characterized by a simple near-surface structure, so the pre-processing is simple and its precision is high. In our country, however, most seismic work is on land, where the near surface is very complex; in some areas the pre-processing error is large and affects velocity model building. On this basis, a new method is developed to remove the pre-processing error and improve the precision of velocity model building. Our main contributions are: (1) an effective tomographic velocity-building method that requires no layered velocity model; (2) a new high-resolution slope-analysis method; (3) a globally optimized residual-curvature velocity-building method based on the velocity model; (4) an effective method for removing pre-processing errors.
All the methods listed above have been verified by theoretical calculations and actual seismic data.
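As a didactic stand-in for the back-projection idea described above (comparing observed and calculated travel times and spreading the residuals back along the ray paths), the toy sketch below performs a SIRT-style update on a 2D slowness grid with straight rays along rows and columns; it is not the thesis's layer-free tomographic method.

```python
# Toy illustration of travel-time back-projection: straight rays along grid
# rows and columns, travel times as sums of cell slownesses, and a SIRT-style
# update that spreads each residual evenly over the cells the ray crosses.
import numpy as np

def sirt(true_slowness, n_iter=50, cell=1.0):
    ny, nx = true_slowness.shape
    # "observed" travel times for row rays and column rays
    t_rows = true_slowness.sum(axis=1) * cell
    t_cols = true_slowness.sum(axis=0) * cell

    model = np.full_like(true_slowness, true_slowness.mean())  # starting model
    for _ in range(n_iter):
        res_rows = t_rows - model.sum(axis=1) * cell   # travel-time residuals
        res_cols = t_cols - model.sum(axis=0) * cell
        # back-project each residual uniformly along its ray path (damped)
        model += res_rows[:, None] / (nx * cell) * 0.5
        model += res_cols[None, :] / (ny * cell) * 0.5
    return model

if __name__ == "__main__":
    true = np.full((20, 30), 0.5)
    true[8:12, 10:20] = 0.7                            # slow anomaly
    est = sirt(true)
    misfit = np.abs(true.sum(axis=1) - est.sum(axis=1)).max()
    print("max row travel-time misfit:", round(float(misfit), 8))
```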

Relevance:

90.00%

Publisher:

Abstract:

The dissertation addresses the problems of signal reconstruction and data restoration in seismic data processing, taking signal representation methods as the main thread and seismic information reconstruction (signal separation and trace interpolation) as the core. For representation on natural bases, I present the fundamentals and algorithms of ICA and its original applications to the separation of natural earthquake signals and of survey seismic signals. For representation on deterministic bases, the thesis proposes least-squares inversion regularization methods for seismic data reconstruction, sparseness constraints, preconditioned conjugate gradient (PCG) methods, and their applications to seismic deconvolution, Radon transformation, and so on. The core content is a de-aliasing reconstruction algorithm for unevenly sampled seismic data and its application to seismic interpolation. Although the dissertation discusses two cases of signal representation, they can be integrated into one framework, because both deal with signal or information restoration: the former reconstructs original signals from mixed signals, while the latter reconstructs complete data from sparse or irregular data. Their common goal is to provide pre-processing and post-processing methods for seismic prestack depth migration.

ICA can separate original signals from their mixtures, or extract the basic structure of the analyzed data. I survey the fundamentals, algorithms and applications of ICA and, in comparison with the KL transformation, propose the concept of an independent components transformation (ICT). On the basis of the negentropy measure of independence, I implement FastICA and improve it using the covariance matrix. By analyzing the characteristics of seismic signals, I introduce ICA into seismic signal processing, apparently for the first time in the geophysical community, and implement noise separation from seismic signals. Synthetic and real data examples show that ICA is usable for seismic signal processing and that initial results are promising. ICA is applied to separating earthquake converted waves from multiples in a sedimentary area with good results, leading to a more reasonable interpretation of subsurface discontinuities. The results show the prospects of applying ICA to geophysical signal processing. By virtue of the relationship between ICA and blind deconvolution, I survey seismic blind deconvolution and discuss the prospects of applying ICA to it, with two possible solutions. The relationship among PCA, ICA and the wavelet transform is discussed, and it is proved that the reconstruction of wavelet prototype functions is a Lie group representation. In addition, an over-sampled wavelet transform is proposed to enhance seismic data resolution, which is validated by numerical examples.

The key to prestack depth migration is the regularization of prestack seismic data, for which seismic interpolation and missing-data reconstruction are necessary procedures. I first review seismic imaging methods in order to argue the critical role of regularization. A review of seismic interpolation algorithms shows that de-aliased reconstruction of unevenly sampled data is still a challenge. The fundamentals of seismic reconstruction are discussed first; then the sparseness constraint on least-squares inversion and the preconditioned conjugate gradient solver are studied and implemented.
Choosing a Cauchy-distributed constraint term, I program the PCG algorithm and implement sparse seismic deconvolution and high-resolution Radon transformation by PCG, in preparation for seismic data reconstruction. Regarding seismic interpolation, de-aliased interpolation of evenly sampled data and reconstruction of unevenly sampled data each work well separately, but they could not previously be combined. In this thesis, a novel Fourier-transform-based method and algorithm are proposed that can reconstruct seismic data that is both unevenly sampled and aliased. I formulate band-limited data reconstruction as a minimum-norm least-squares inversion problem in which an adaptive DFT-weighted norm regularization term is used. The inverse problem is solved by the preconditioned conjugate gradient method, which makes the solutions stable and quickly convergent. Based on the assumption that seismic data consist of a finite number of linear events, and following the sampling theorem, aliased events can be attenuated via least-squares weights predicted linearly from the low frequencies. Three application issues are discussed: interpolation across even gaps in traces, filling of uneven gaps, and reconstruction of high-frequency traces from low-frequency data constrained by a few high-frequency traces. Both synthetic and real data examples show that the proposed method is valid, efficient and applicable. The research is valuable for seismic data regularization and cross-well seismics. To meet the data requirements of 3D shot-profile depth migration, schemes must be adopted to make the data regular and consistent with the velocity dataset. The methods of this thesis are used to interpolate and extrapolate the shot gathers instead of simply embedding zero traces, so the migration aperture is enlarged and the migration result is improved. The results show the effectiveness and practicability of the approach.
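For the ICA part, a hedged illustration with scikit-learn's FastICA on synthetic linear mixtures is given below; it demonstrates recovering independent components from mixed records, but not the author's covariance-matrix improvement or the seismic data themselves.

```python
# Hedged illustration of ICA-based signal separation using scikit-learn's
# FastICA on synthetic mixtures: recover independent components from
# linearly mixed "traces".
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)
sources = np.c_[np.sin(2 * np.pi * 25 * t),           # "signal"
                np.sign(np.sin(2 * np.pi * 7 * t)),    # coherent interference
                0.3 * rng.standard_normal(t.size)]     # noise
A = np.array([[1.0, 0.6, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.5, 1.0]])                        # mixing matrix
mixed = sources @ A.T                                  # observed mixtures

ica = FastICA(n_components=3, random_state=0)
recovered = ica.fit_transform(mixed)                   # estimated sources
print("recovered components:", recovered.shape,
      "estimated mixing matrix:", ica.mixing_.shape)
```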

Relevance:

90.00%

Publisher:

Abstract:

In the Circum-Bohai region (112°~124°E, 34°~42°N), oil and gas resources are rich while intraplate seismic activity is intense. Although the tectonic structure of this region is very complicated, plenty of geological, geophysical and geochemical research has been carried out. In this paper, guided by the ideas of "One, Two, Three and Many" and "the deep controls the shallow, the regional constrains the local", I take full advantage of previous results to establish a general image of this region. After collecting the P-wave arrival times of local and teleseismic events recorded by the stations within this region from 1966 to 2004, I process these data and build an initial model; a tomographic image of the crust and upper mantle of the region is then obtained. With reference to previous results, we compare the images at various depths and along five cross-profiles traversing the region in different directions, and finally a discussion and conclusion are given. The principal contents are as follows: 1) the first chapter states the purpose and significance of this thesis, advances in seismic tomography, and the research contents and plan; 2) the second chapter introduces the regional geological setting of the Circum-Bohai region and describes the tectonic and evolutionary characteristics of the principal tectonic units, including the Bohai Bay Basin, the Yanshan Fold Zone, the Taihangshan Uplifted Zone, the Jiao-Liao Uplifted Zone and the Luxi Uplifted Zone, as well as the primary deep faults; 3) the third chapter discusses previous geophysical research, i.e., gravity and geomagnetic characteristics, geothermal flow, seismic activity, physical properties of rocks, deep seismic sounding, and earlier seismic tomography; 4) the fourth chapter introduces the fundamental theory and approaches of seismic tomography; 5) the fifth chapter presents the technology and approaches used in this thesis, including the collection and pre-processing of data, the establishment of the initial velocity model and the relocation of all events; 6) the sixth chapter discusses and analyzes the tomography images at various depths and along the five cross-sections; 7) the seventh chapter summarizes the results and states the existing problems and possible solutions.

Relevance:

90.00%

Publisher:

Abstract:

Liu, Yonghuai. Improving ICP with Easy Implementation for Free Form Surface Matching. Pattern Recognition, vol. 37, no. 2, pp. 211-226, 2004.

Relevance:

90.00%

Publisher:

Abstract:

Liu, Yonghuai. Automatic 3d free form shape matching using the graduated assignment algorithm. Pattern Recognition, vol. 38, no. 10, pp. 1615-1631, 2005.

Relevance:

90.00%

Publisher:

Abstract:

Increasingly, semiconductor manufacturers are exploring opportunities for virtual metrology (VM) enabled process monitoring and control as a means of reducing non-value-added metrology and achieving ever more demanding wafer fabrication tolerances. However, developing robust, reliable and interpretable VM models can be very challenging due to the highly correlated input space often associated with the underpinning data sets. A particularly pertinent example is etch rate prediction of plasma etch processes from multichannel optical emission spectroscopy data. This paper proposes a novel input-clustering based forward stepwise regression methodology for VM model building in such highly correlated input spaces. Max Separation Clustering (MSC) is employed as a pre-processing step to identify a reduced set of well-conditioned, representative variables that can then be used as inputs to state-of-the-art model building techniques such as Forward Selection Regression (FSR), Ridge Regression, LASSO and Forward Selection Ridge Regression (FSRR). The methodology is validated on a benchmark semiconductor plasma etch dataset and the results obtained are compared with those achieved when the state-of-the-art approaches are applied directly to the data without the MSC pre-processing step. Significant performance improvements are observed when MSC is combined with FSR (13%) and FSRR (8.5%), but not with Ridge Regression (-1%) or LASSO (-32%). The optimal VM results are obtained using the MSC-FSR and MSC-FSRR generated models. © 2012 IEEE.
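A hedged sketch of the cluster-then-select idea follows: correlation-based hierarchical clustering stands in for Max Separation Clustering (MSC is not a standard library routine), one representative variable is kept per cluster, and a simple cross-validated forward stepwise regression is run on the reduced input set; the data and parameters are synthetic placeholders.

```python
# Hedged sketch: correlation-based hierarchical clustering as a stand-in for
# Max Separation Clustering, one representative per cluster, then forward
# stepwise selection with cross-validated linear regression on the reduced set.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def cluster_representatives(X, n_clusters):
    """Pick one representative column per correlation cluster."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    dist = squareform(1.0 - corr, checks=False)        # highly correlated -> close
    labels = fcluster(linkage(dist, method="average"),
                      n_clusters, criterion="maxclust")
    reps = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        # keep the member most correlated with the rest of its cluster
        reps.append(members[np.argmax(corr[np.ix_(members, members)].mean(axis=1))])
    return np.array(reps)

def forward_selection(X, y, max_terms=5):
    """Greedy forward stepwise selection scored by 5-fold CV R^2."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_terms:
        score, best = max((cross_val_score(LinearRegression(),
                                           X[:, selected + [j]], y, cv=5).mean(), j)
                          for j in remaining)
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))                         # e.g. summarised OES channels
y = X[:, 3] - 0.5 * X[:, 17] + 0.1 * rng.normal(size=120)   # etch-rate proxy

reps = cluster_representatives(X, n_clusters=10)
chosen = forward_selection(X[:, reps], y)
print("selected representative columns:", reps[chosen])
```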

Relevance:

90.00%

Publisher:

Abstract:

Masked implementations of cryptographic algorithms are often used in commercial embedded cryptographic devices to increase their resistance to side channel attacks. In this work we show how neural networks can be used both to identify the mask value and, subsequently, to identify the secret key value from a single attack trace with high probability. We propose a pre-processing step using principal component analysis (PCA) to significantly increase the success of the attack. We have developed a classifier that can correctly identify the mask for each trace, hence removing the security provided by that mask and reducing the attack to the equivalent of an attack against an unprotected implementation. The attack is performed on the freely available differential power analysis (DPA) contest data set to allow our work to be easily reproducible. We show that neural networks allow for a robust and efficient classification in the context of side-channel attacks.
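A minimal sketch of the PCA-plus-neural-network step is shown below, using synthetic placeholder traces rather than the DPA contest data set; the toy leakage model, number of components and network size are assumptions.

```python
# Hedged sketch of the PCA + neural-network stage: reduce power traces with
# PCA, then train a small MLP to predict the mask value. Synthetic placeholder
# traces stand in for the DPA contest data set.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_traces, n_samples, n_masks = 2000, 500, 16
masks = rng.integers(0, n_masks, size=n_traces)
# toy leakage: the mask value shifts the mean of a few sample points
traces = rng.normal(size=(n_traces, n_samples))
traces[:, 100:110] += masks[:, None] * 0.3

X_train, X_test, y_train, y_test = train_test_split(traces, masks,
                                                    random_state=0)
model = make_pipeline(PCA(n_components=20),
                      MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                    random_state=0))
model.fit(X_train, y_train)
print("mask recovery accuracy:", model.score(X_test, y_test).round(3))
```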

Relevance:

90.00%

Publisher:

Abstract:

Cryptographic algorithms are designed to be computationally secure; however, it has been shown that when they are implemented in hardware, these devices leak side-channel information that can be used to mount an attack that recovers the secret encryption key. In this paper an overlapping-window power spectral density (PSD) side-channel attack, targeting an FPGA device running the Advanced Encryption Standard, is proposed. This improves upon previous research into PSD attacks by reducing the amount of pre-processing (effort) required. It is shown that the proposed overlapping-window method requires less processing effort than a sliding-window approach, whilst overcoming the issues of sampling boundaries. The method is shown to be effective for both aligned and misaligned data sets and is therefore recommended as an improved approach in comparison with existing time-domain correlation attacks.
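The sketch below illustrates overlapping-window PSD feature extraction with Welch's method (50% segment overlap) on a synthetic trace, followed by a plain Pearson-correlation step against hypothetical leakage values; the sampling rate, window length and leakage model are assumptions, not the paper's parameters.

```python
# Hedged illustration of overlapping-window PSD features: Welch's method with
# 50% segment overlap on a synthetic power trace, plus a simple correlation
# step against hypothetical leakage values (all parameters are assumptions).
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
fs = 1e6                                    # assumed sampling rate [Hz]
trace = rng.normal(size=8192)               # stand-in for a captured power trace

# overlapping windows: 256-sample segments with 128-sample (50%) overlap
freqs, psd = welch(trace, fs=fs, window="hann", nperseg=256, noverlap=128)

def correlation_attack(psd_features, hyp_leakage):
    """Pearson correlation of per-trace PSD features with hypothetical leakage."""
    f = psd_features - psd_features.mean(axis=0)
    h = hyp_leakage - hyp_leakage.mean()
    return (f * h[:, None]).sum(axis=0) / (
        np.linalg.norm(f, axis=0) * np.linalg.norm(h) + 1e-12)

# e.g. a PSD matrix of shape (n_traces, n_freq_bins) vs. one key hypothesis
psd_matrix = np.vstack([welch(rng.normal(size=8192), fs=fs, nperseg=256,
                              noverlap=128)[1] for _ in range(50)])
print(correlation_attack(psd_matrix, rng.normal(size=50)).shape)
```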

Relevance:

90.00%

Publisher:

Abstract:

This paper presents five different clustering methods to identify typical load profiles of medium voltage (MV) electricity consumers. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers' behaviour. The obtained knowledge can be used to support a decision tool, not only for utilities but also for consumers. Load profiles can be used by utilities to identify the aspects that cause system load peaks and to enable the development of specific contracts with their customers. The framework presented throughout the paper consists of several steps, namely the data pre-processing phase, the application of clustering algorithms and the evaluation of the quality of the partition, which is supported by cluster validity indices. The process ends with the analysis of the discovered knowledge. To validate the proposed framework, a case study with a real database of 208 MV consumers is used.
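As an illustration of one such clustering step, the sketch below applies k-means to peak-normalised daily load diagrams and scores the partitions with the silhouette validity index; the 208-consumer database is replaced by synthetic profiles, and k-means with silhouette is only one of several possible method/index choices.

```python
# Hedged sketch of one clustering step: k-means on peak-normalised daily load
# curves, with the silhouette score as a cluster validity index. Synthetic
# load diagrams stand in for the real MV consumer database.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
hours = np.arange(24)
# synthetic consumers: a mix of "day-peak" and "evening-peak" profiles
day_peak = 1.0 + np.exp(-((hours - 12) ** 2) / 12.0)
eve_peak = 1.0 + np.exp(-((hours - 20) ** 2) / 8.0)
profiles = np.vstack([p + 0.1 * rng.normal(size=24)
                      for p in ([day_peak] * 100 + [eve_peak] * 108)])

# pre-processing: scale each consumer's diagram to its own peak demand
profiles /= profiles.max(axis=1, keepdims=True)

for k in (2, 3, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(profiles)
    print(k, "clusters, silhouette =",
          round(silhouette_score(profiles, km.labels_), 3))
```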

Relevance:

90.00%

Publisher:

Abstract:

Report of the Final Master's Project submitted for the degree of Master in Electronics and Telecommunications Engineering.