187 results for Preprocessing
Abstract:
This dissertation introduces a new system for handwritten text recognition based on an improved neural network design. Most existing neural networks use the mean square error function as the standard error function. The system proposed in this dissertation instead uses the mean quartic error function, whose third and fourth derivatives are non-zero, enabling several improvements to the training methods. The training results are carefully assessed before and after the update. Three essential factors determine the performance of a training system, listed from highest to lowest priority: (1) the error rate on the testing set, (2) the processing time needed to recognize a segmented character, and (3) the total training time and, subsequently, the total testing time. It is observed that bounded training methods accelerate the training process, while semi-third-order training methods, next-minimal training methods, and preprocessing operations reduce the error rate on the testing set. Empirical observations suggest that two different combinations of training methods are needed for lower-case and upper-case character recognition. Since character segmentation is required for word and sentence recognition, this dissertation also provides an effective rule-based segmentation method, which differs from conventional adaptive segmentation methods. Dictionary-based correction is used to fix mistakes arising in the recognition and segmentation phases. Integrating the segmentation methods with the handwritten character recognition algorithm yielded an accuracy of 92% for lower-case characters and 97% for upper-case characters. The testing database consists of 20,000 handwritten characters, 10,000 for each case; recognizing the 10,000 handwritten characters of one case required 8.5 seconds of processing time.
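The mean quartic error the abstract builds on has a simple closed form. A minimal sketch, assuming vectors of network outputs and targets (the dissertation's actual network and training methods are not reproduced here):

import numpy as np

def mean_quartic_error(y_pred, y_true):
    # Mean quartic error: e^4 keeps the 3rd and 4th derivatives non-zero,
    # unlike the mean square error, whose 3rd derivative vanishes.
    e = y_pred - y_true
    return np.mean(e ** 4)

def mean_quartic_error_grad(y_pred, y_true):
    # Gradient w.r.t. the predictions: d/dy mean(e^4) = 4 e^3 / N.
    e = y_pred - y_true
    return 4.0 * e ** 3 / e.size

Compared with the mean square error (gradient 2e/N), the quartic loss penalizes large residuals more strongly, and its non-vanishing higher-order derivatives are what semi-third-order training methods can exploit.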
Abstract:
Due to rapid advances in computing and sensing technologies, enormous amounts of data are generated every day in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets and discover hidden patterns. For both data mining and visualization to be effective, it is important to include visualization techniques in the mining process and to present the discovered patterns in a more comprehensive visual view. In this dissertation, four related problems are studied to explore the integration of data mining and data visualization: dimensionality reduction for visualizing high-dimensional datasets, visualization-based clustering evaluation, interactive document mining, and exploration of multiple clusterings. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high-dimensional datasets; 2) present DClusterE, which integrates cluster validation with user interaction and provides rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems that involve users' efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework that organizes the different input clusterings into a hierarchical tree structure and allows interactive exploration of multiple clustering solutions.
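As a rough illustration of the two-stage reliefF + mRMR filter named in point 1, here is a hedged sketch assuming a dense feature matrix X and class labels y; the simplified Relief scorer (one nearest hit/miss per sample) and the greedy mRMR loop are illustrative stand-ins, not the dissertation's implementation:

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def relief_scores(X, y):
    # Simplified Relief: reward features that separate each sample from its
    # nearest miss and agree with its nearest hit.
    n, d = X.shape
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)
    w = np.zeros(d)
    for i in range(n):
        same = y == y[i]
        same[i] = False
        hit = int(np.argmin(np.where(same, D[i], np.inf)))
        miss = int(np.argmin(np.where(~same, D[i], np.inf)))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

def mrmr_select(X, y, k):
    # Greedy mRMR: maximize relevance to y minus mean redundancy with the
    # already selected features (both measured by mutual information).
    relevance = mutual_info_classif(X, y)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        scores = [relevance[j]
                  - np.mean([mutual_info_regression(X[:, [s]], X[:, j])[0]
                             for s in selected]) for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected

A plausible pipeline keeps the top-m features by relief_scores, then applies mrmr_select to pick the final k among them.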
Abstract:
The accurate and reliable estimation of travel time based on point detector data is needed to support Intelligent Transportation System (ITS) applications. It has been found that the quality of travel time estimation is a function of the estimation method used and varies across traffic conditions. In this study, two hybrid on-line travel time estimation models, and their corresponding off-line methods, were developed to achieve better estimation performance under various traffic conditions, including recurrent congestion and incidents. The first model combines the Mid-Point method, which is speed-based, with a traffic flow-based method. The second model integrates two speed-based methods: the Mid-Point method and the Minimum Speed method. In both models, the switch between travel time estimation methods is based on the congestion level and queue status automatically identified by clustering analysis. During incident conditions with rapidly changing queue lengths, shock wave analysis-based refinements are applied to the on-line estimation to capture fast queue propagation and recovery. Travel time estimates obtained from existing speed-based methods, traffic flow-based methods, and the developed models were tested using both simulation and real-world data. The results indicate that all tested methods performed at an acceptable level during periods of low congestion; however, their performance varies as congestion increases. Comparisons with other estimation methods also show that the developed hybrid models perform well in all cases. Further comparisons between the on-line and off-line travel time estimation methods reveal that off-line methods perform significantly better only during fast-changing congested conditions, such as incidents. The impacts of the major influential factors on travel time estimation performance, including data preprocessing procedures, detector errors, detector spacing, frequency of travel time updates to traveler information devices, travel time link length, and posted travel time range, were also investigated. The results show that these factors have more significant impacts on estimation accuracy and reliability under congested conditions than under uncongested conditions. For incident conditions, the estimation quality improves with a short rolling period for data smoothing, more accurate detector data, and frequent travel time updates.
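The Mid-Point method used in both hybrid models admits a compact statement. A minimal sketch under the usual interpretation that each detector's spot speed holds over half the distance to its neighbors; the clustering-based switching and shock wave refinements are beyond this snippet:

import numpy as np

def midpoint_travel_time(positions_km, speeds_kmh):
    # Mid-Point method: each detector's spot speed applies from the midpoint
    # with its upstream neighbor to the midpoint with its downstream one;
    # travel time is the sum of zone length / zone speed.
    x = np.asarray(positions_km, float)
    v = np.asarray(speeds_kmh, float)
    mids = (x[:-1] + x[1:]) / 2.0
    bounds = np.concatenate(([x[0]], mids, [x[-1]]))
    return float(np.sum(np.diff(bounds) / v))  # hours

# e.g. detectors at km 0, 1, and 2.5 reporting 80, 60, and 70 km/h:
# midpoint_travel_time([0, 1, 2.5], [80, 60, 70]) ~= 0.038 h (about 2.3 min)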
Abstract:
Modern IT infrastructures are built on large-scale computing systems and administered by IT service providers. Manually maintaining such large computing systems is costly and inefficient, so service providers seek automatic or semi-automatic methodologies for detecting and resolving system issues to improve their service quality and efficiency. This dissertation investigates several data-driven approaches for assisting service providers in achieving this goal. The problems studied fall into three aspects of the service workflow: 1) preprocessing raw textual system logs into structured events; 2) refining monitoring configurations to eliminate false positives and false negatives; and 3) improving the efficiency of system diagnosis on detected alerts. Solving these problems usually requires a large amount of domain knowledge about the particular computing systems. The approaches investigated in this dissertation are built on event mining algorithms, which automatically derive part of that knowledge from historical system logs, events, and tickets. In particular, two textual clustering algorithms are developed for converting raw textual logs into system events. For refining the monitoring configuration, a rule-based alert prediction algorithm is proposed for eliminating false alerts (false positives) without losing any real alert, and a textual classification method is applied to identify missing alerts (false negatives) from manual incident tickets. For system diagnosis, this dissertation presents an efficient algorithm for discovering the temporal dependencies between system events and their corresponding time lags, which can help administrators determine the redundancies of deployed monitoring situations and the dependencies of system components. To improve the efficiency of incident ticket resolution, several KNN-based algorithms that recommend relevant historical tickets, with their resolutions, for incoming tickets are investigated. Finally, this dissertation offers a novel algorithm for searching similar textual event segments over large system logs, which assists administrators in locating similar system behaviors. Extensive empirical evaluation on system logs, events, and tickets from real IT infrastructures demonstrates the effectiveness and efficiency of the proposed approaches.
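The KNN-based ticket recommendation idea can be sketched with off-the-shelf text tools. A minimal sketch, assuming historical tickets and their resolutions are plain strings; the dissertation's actual similarity measures and event mining stack are not reproduced:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

historical = ["filesystem /var full on host01",       # toy example tickets
              "high CPU usage on database server",
              "service httpd not responding"]
resolutions = ["extended /var volume",
               "killed runaway process",
               "restarted httpd"]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(historical)
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)

def recommend(ticket_text):
    # Return (resolution, similarity) pairs for the most similar old tickets.
    dist, idx = knn.kneighbors(vec.transform([ticket_text]))
    return [(resolutions[i], 1.0 - d) for i, d in zip(idx[0], dist[0])]

print(recommend("disk almost full on host02"))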
Abstract:
Valve stiction, or static friction, in control loops is a common problem in modern industrial processes. Many recent studies have sought to understand, reproduce, and detect this problem, but quantification remains a challenge. Since the valve position (mv) is normally unknown in an industrial process, the main challenge is to diagnose stiction knowing only the process output signal (pv) and the control signal (op). This paper presents an artificial neural network approach to detect and quantify the amount of static friction using only the pv and op information. Different methods for preprocessing the neural network's training set are presented, based on centroid calculation and the Fourier transform. The proposal is validated on a simulated process, and the results show a satisfactory measurement of stiction.
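A hedged sketch of the Fourier-based preprocessing idea: turn each (pv, op) record into a phase-invariant spectral feature vector and regress the stiction amount with a small neural network. The function names and the MLP choice are assumptions for illustration, not the paper's exact setup:

import numpy as np
from sklearn.neural_network import MLPRegressor

def fft_features(pv, op, n_harmonics=16):
    # Magnitudes of the first harmonics of the mean-removed signals;
    # stiction-induced limit cycles show up as strong low harmonics.
    mag = lambda s: np.abs(np.fft.rfft(np.asarray(s) - np.mean(s)))[1:n_harmonics + 1]
    f = np.concatenate([mag(pv), mag(op)])
    return f / (np.linalg.norm(f) + 1e-12)

# Training set from simulated loops with known stiction amounts:
# X = np.vstack([fft_features(pv_i, op_i) for each simulated loop]); y = stiction
# model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)
# model.predict([fft_features(pv_new, op_new)]) then quantifies stiction.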
Abstract:
The scatterometer SeaWinds on QuikSCAT provided regular Ku-band measurements from 1999 to 2009. Although it was designed for ocean applications, it has frequently been used to assess seasonal snowmelt patterns, alongside other terrestrial applications such as ice cap monitoring, phenology, and urban mapping. This paper discusses the general data characteristics of SeaWinds and reviews relevant change detection algorithms. Depending on the complexity of the method, parameters such as long-term noise and multiple-event analyses were incorporated. Temporal averaging is a commonly accepted preprocessing step, typically over diurnal, multi-day, or seasonal windows.
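A minimal sketch of the kind of change detection reviewed above: temporally average the backscatter series, then flag melt onset when it drops a fixed margin below a winter reference. The window length, threshold, and winter slice are illustrative assumptions, not values from the paper:

import numpy as np

def melt_onset(sigma0_db, days, window=5, drop_db=3.0, winter=slice(0, 60)):
    # Multi-day averaging (a common preprocessing step), a dry-snow reference
    # from winter, then the first day the smoothed series falls below ref - drop.
    smooth = np.convolve(sigma0_db, np.ones(window) / window, mode="same")
    ref = smooth[winter].mean()
    below = np.flatnonzero(smooth < ref - drop_db)
    return days[below[0]] if below.size else None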
Abstract:
In recent years, depth cameras have been widely used for camera tracking in augmented and mixed reality. Many studies focus on methods that generate the reference model simultaneously with the tracking, allowing operation in unprepared environments. However, methods that rely on predefined CAD models have their advantages: measurement errors do not accumulate in the model, they tolerate inaccurate initialization, and tracking is always performed directly in the reference model's coordinate system. In this paper, we present a method for tracking a depth camera using existing CAD models and the Iterative Closest Point (ICP) algorithm. In our approach, we render the CAD model using the latest pose estimate and construct a point cloud from the corresponding depth map. We construct another point cloud from the currently captured depth frame and find the incremental change in camera pose by aligning the two point clouds. We use a GPGPU-based implementation of ICP that efficiently uses all the depth data in the process. The method runs in real time, is robust to outliers, and requires no preprocessing of the CAD models. We evaluated the approach using the Kinect depth sensor and compared the results to a 2D edge-based method, to a depth-based SLAM method, and to the ground truth. The results show that our approach is more stable than the edge-based method and drifts less than the depth-based SLAM.
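The core of the tracking loop, aligning the rendered-model cloud with the captured cloud, is a standard ICP iteration. A minimal CPU sketch using closed-form (SVD) rigid alignment; the paper's GPGPU implementation and its data association details are not reproduced:

import numpy as np
from scipy.spatial import cKDTree

def icp_step(src, dst):
    # One ICP iteration: match every source point to its nearest model point,
    # then solve the optimal rigid transform for those pairs (Kabsch/SVD).
    idx = cKDTree(dst).query(src)[1]
    p, q = src, dst[idx]
    pc, qc = p - p.mean(0), q - q.mean(0)
    U, _, Vt = np.linalg.svd(pc.T @ qc)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
    R = Vt.T @ D @ U.T
    t = q.mean(0) - R @ p.mean(0)
    return R, t  # apply as src @ R.T + t and iterate until the pose converges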
Abstract:
We evaluate the integration of preoperative 3D computed tomography angiography of the coronary arteries with intraoperative 2D X-ray angiographies using a recently proposed registration-by-regression method. The method relates image features of the 2D projection images to the transformation parameters of the 3D image. We compared different sets of features and studied the influence of preprocessing the training set. For the registration evaluation, a gold standard was developed from eight X-ray angiography sequences from six patients. Alignment quality was measured using the 3D mean target registration error (mTRE). The registration-by-regression method achieved moderate accuracy (median mTRE of 15 mm) on real images. It therefore does not yet provide a complete solution to the 3D–2D registration problem, but it could be used as an initialisation method to eliminate the need for manual initialisation.
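The evaluation metric is compact enough to state exactly. A short sketch of the 3D mean target registration error, assuming 4x4 homogeneous transforms and an array of target points (e.g., on the coronary tree):

import numpy as np

def mtre(points, T_est, T_gold):
    # Mean distance between target points mapped by the estimated and the
    # gold-standard transforms (both 4x4 homogeneous matrices).
    P = np.c_[points, np.ones(len(points))]
    diff = (P @ T_est.T - P @ T_gold.T)[:, :3]
    return float(np.mean(np.linalg.norm(diff, axis=1)))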
Abstract:
This work proposes a study of brain signals applied to BCI (Brain-Computer Interface) systems, using Decision Trees and analyzing those trees in the light of neuroscience. Processing the data requires five phases: data acquisition, preprocessing, feature extraction, classification, and validation. This work covers all five phases but emphasizes classification and validation. Classification uses the artificial intelligence technique known as Decision Trees, recognized in the literature as one of the simplest and most successful families of learning algorithms. Validation is carried out through studies grounded in neuroscience, the set of disciplines that study the nervous system: its structure, development, functioning, evolution, relation to behavior and the mind, as well as its disorders. The results obtained in this work are promising, even though preliminary, since they can better explain, in an automated way, some brain processes.
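Since the emphasis is on classification with decision trees and on validating the learned rules against neuroscience, a brief sketch shows both steps with scikit-learn; the feature matrix here is random placeholder data standing in for EEG features:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))   # e.g. band power per EEG channel (placeholder)
y = rng.integers(0, 2, 200)     # mental-task label (placeholder)

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
print("cv accuracy:", cross_val_score(clf, X, y, cv=5).mean())
# The explicit rules are what can be read against neuroscientific knowledge:
print(export_text(clf.fit(X, y), feature_names=[f"ch{i}" for i in range(8)]))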
Abstract:
Melanoma is a type of skin cancer caused by the uncontrolled growth of atypical melanocytes. In recent decades, computer-aided diagnosis has been used to support medical professionals; however, there is still no globally accepted tool. In this context, and in line with the state of the art, we propose a system that receives a dermatoscopy image and diagnoses the lesion as benign or malignant. The tool comprises the following modules: preprocessing, segmentation, feature extraction, and classification. Preprocessing removes hairs; segmentation isolates the lesion; feature extraction follows the ABCD dermoscopy rule; and classification is performed by a Support Vector Machine. Experimental evidence indicates that the proposal achieves 90.63% accuracy, 95% sensitivity, and 83.33% specificity on a dataset of 104 dermatoscopy images. These results are favorable compared with the performance of traditional diagnosis in dermatology.
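The classification stage can be sketched directly; the ABCD features below are random placeholders, and the RBF kernel is an assumption (the paper specifies only a Support Vector Machine):

import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(104, 4))   # asymmetry, border, color, diameter (placeholders)
y = rng.integers(0, 2, 104)     # 1 = malignant, 0 = benign (placeholder labels)

Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(Xtr, ytr)
yhat = clf.predict(Xte)
print("sensitivity:", recall_score(yte, yhat))                # true-positive rate
print("specificity:", recall_score(yte, yhat, pos_label=0))   # true-negative rate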
Abstract:
This thesis deals with the analysis of the shape of 2D objects. In computer vision there are many visual aspects from which information can be extracted, and one of the most widely used is the shape or contour of objects. With suitable processing, this visual feature allows us to extract information from objects, analyze scenes, etc. However, the contour or silhouette of an object contains redundant information. This excess of data, which contributes no new knowledge, should be eliminated in order to speed up subsequent processing or to minimize the size of the contour representation for storage or transmission. This data reduction must be performed without losing information that is important for representing the original contour. A reduced version of a contour can be obtained by removing intermediate points and joining the remaining points with segments; this reduced representation is known as a polygonal approximation. Polygonal approximations of contours therefore represent a compressed version of the original information, and their main use is to reduce the volume of information needed to represent the contour of an object. In recent years, however, these approximations have also been used for object recognition, with polygonal approximation algorithms applied directly to extract the feature vectors used in the learning phase. The contributions of this thesis therefore center on several aspects of polygonal approximations. The first contribution improves several polygonal approximation algorithms through a preprocessing stage that accelerates them and even improves the quality of the solutions in less time. The second contribution proposes a new polygonal approximation algorithm that obtains optimal solutions in less time than the other methods in the literature. The third contribution proposes an approximation algorithm capable of obtaining the optimal solution in few iterations in most cases. Finally, an improved version of the optimal algorithm for polygonal approximation is proposed that solves an alternative optimization problem.
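The thesis's own algorithms are not reproduced below; as a reference point, here is the classic Ramer-Douglas-Peucker scheme, a textbook way to obtain a polygonal approximation of a contour within a distance tolerance:

import numpy as np

def rdp(points, eps):
    # Keep the endpoints; if the farthest point from the chord exceeds eps,
    # split there and recurse, otherwise replace the run by a single segment.
    a, b = points[0], points[-1]
    chord = b - a
    denom = np.linalg.norm(chord) or 1.0
    diff = points - a
    d = np.abs(chord[0] * diff[:, 1] - chord[1] * diff[:, 0]) / denom
    i = int(np.argmax(d))
    if d[i] <= eps:
        return np.array([a, b])
    return np.vstack([rdp(points[:i + 1], eps)[:-1], rdp(points[i:], eps)])

# e.g. rdp(contour_xy, eps=1.5) returns the retained vertices in order.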
Abstract:
Given a 2-manifold triangular mesh \(M \subset {\mathbb {R}}^3\) with border, a parameterization of \(M\) is a FACE or trimmed surface \(F=\{S,L_0,\ldots,L_m\}\). \(F\) is a connected subset or region of a parametric surface \(S\), bounded by a set of LOOPs \(L_0,\ldots,L_m\) such that each \(L_i \subset S\) is a closed 1-manifold having no intersection with the other \(L_j\) LOOPs. The parametric surface \(S\) is a statistical fit of the mesh \(M\). \(L_0\) is the outermost LOOP bounding \(F\), and \(L_i\) is the LOOP of the i-th hole in \(F\) (if any). The problem of parameterizing triangular meshes is relevant for reverse engineering, tool path planning, feature detection, redesign, etc. State-of-the-art mesh procedures parameterize a rectangular mesh \(M\). To improve on such procedures, we report here the implementation of an algorithm which parameterizes meshes \(M\) presenting holes and concavities. We synthesize a parametric surface \(S \subset {\mathbb {R}}^3\) which approximates a superset of the mesh \(M\). Then we compute a set of LOOPs trimming \(S\), thereby completing the FACE \(F=\{S,L_0,\ldots,L_m\}\). Our algorithm gives satisfactory results for \(M\) having low Gaussian curvature (i.e., \(M\) being quasi-developable or developable). This is a reasonable assumption, since \(M\) is the product of manifold segmentation preprocessing. Our algorithm computes: (1) a manifold learning mapping \(\phi : M \rightarrow U \subset {\mathbb {R}}^2\); (2) an inverse mapping \(S: W \subset {\mathbb {R}}^2 \rightarrow {\mathbb {R}}^3\), with \(W\) being a rectangular grid containing and surpassing \(U\). To compute \(\phi\) we test IsoMap, Laplacian Eigenmaps, and Hessian local linear embedding (with best results from HLLE). For the back mapping (NURBS) \(S\), the crucial step is to find a control polyhedron \(P\), which is an extrapolation of \(M\). We calculate \(P\) by extrapolating radial basis functions that interpolate points inside \(\phi(M)\). We successfully test our implementation with several datasets presenting concavities and holes, including extremely non-developable ones. Ongoing work is devoted to manifold segmentation that facilitates mesh parameterization.
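A small sketch of step (1), the manifold learning mapping \(\phi\), using IsoMap, one of the three embeddings the authors test (they report best results with HLLE). The synthetic sheet below stands in for the mesh vertices:

import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(2)
uv = rng.uniform(0.0, 1.0, (500, 2))
V = np.c_[uv, 0.2 * np.sin(3.0 * uv[:, 0])]   # a gently curved, quasi-developable sheet

phi = Isomap(n_neighbors=8, n_components=2)   # phi : M -> U, U a subset of R^2
U = phi.fit_transform(V)
# The back mapping S would then be a NURBS surface fitted over a rectangular
# grid W containing U, with its control polyhedron P extrapolated via RBFs.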
Abstract:
As one of the newest members in the field of artificial immune systems (AIS), the Dendritic Cell Algorithm (DCA) is based on behavioural models of natural dendritic cells (DCs). Unlike other AIS, the DCA does not rely on training data; instead, domain or expert knowledge is required to predetermine the mapping between the input signals of a particular instance and the three signal categories used by the DCA. This data preprocessing phase has drawn the criticism of manually over-fitting the data to the algorithm, which is undesirable. In this paper we therefore attempt to ascertain whether principal component analysis (PCA) techniques can automatically categorise input data while still generating useful and accurate classification results. The integrated system is tested on a biometrics dataset for stress recognition of automobile drivers. The experimental results show that applying PCA to the DCA for automated data preprocessing is successful.
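A hedged sketch of the automation proposed above: let the three leading principal components stand in for the DCA's three signal categories instead of a hand-crafted expert mapping. The data are placeholders for the driver biometrics:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))                  # raw attributes per instance

Z = StandardScaler().fit_transform(X)           # PCA is scale-sensitive
signals = PCA(n_components=3).fit_transform(Z)  # one column per DCA signal category
# Each row now provides the three input signals the DCA expects, removing
# the manual signal-selection step the algorithm is criticised for.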
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May 2016.
Abstract:
Forage species adapted to semi-arid conditions are an alternative for reducing the negative impacts of seasonal forage availability on the ruminant production chain of the Brazilian Northeast, as well as for reducing the cost of providing concentrated feedstuffs. Among these species, mesquite pods (Prosopis juliflora SW D.C.) and spineless cactus (Opuntia and Nopalea) stand out for tolerating drought and producing in periods when the forage supply is reduced; they also have good nutritional value and are well accepted by the animals. However, because their composition varies, their use in animal feeding requires thorough knowledge of that composition for the formulation of balanced diets. Owing to the cost and time required for analysis, farmers rarely analyze the chemical composition of feeds, so near-infrared reflectance spectroscopy (NIRS) represents an important alternative to the traditional methods. The objective of this study was to develop and validate NIRS-based models for predicting the chemical composition of mesquite pods and spineless cactus, scanned on two instrument models and with different sample processing. Mesquite pod samples were collected in the states of Ceará, Bahia, Paraíba, and Pernambuco, and spineless cactus samples in the states of Ceará, Paraíba, and Pernambuco, either fresh (in natura) or pre-dried and ground. Spectra were obtained on two NIR instruments, a Perten DA 7250 and a FOSS 5000. The samples were first scanned in natura on the Perten instrument, and The Unscrambler 10.2 software was used to select a group of samples for the calibration set, leaving out redundant ones. The selected samples were then dried, ground, and scanned again on both the Perten and FOSS instruments. Reference values were obtained by methods traditionally applied in animal nutrition laboratories for dry matter (DM), mineral matter (MM), organic matter (OM), crude protein (CP), ether extract (EE), neutral detergent fiber (NDF), acid detergent fiber (ADF), hemicellulose (HEM), and in vitro dry matter digestibility (IVDMD). Model performance was evaluated by the root mean square errors of calibration (RMSEC) and cross-validation (RMSECV), the coefficient of determination (R2), and the ratio of performance to deviation (RPD). Exploratory data analysis, using spectral pretreatments and principal component analysis (PCA), showed that the data sets were similar to each other, supporting the development of a single model per feed (mesquite pods and cactus) with all selected samples. The variation in the reference results for each parameter agreed with values reported in the literature. Models developed with sample preprocessing (pre-drying and grinding) proved more robust than those built with in natura samples. The Perten NIRS instrument performed similarly to the FOSS instrument, even though the latter covers a wider spectral range with smaller reading intervals. The NIR technique, combined with multivariate calibration by partial least squares (PLS) regression, proved reliable for predicting the chemical composition of mesquite pods and spineless cactus.
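A brief sketch of the chemometric core: PLS regression from spectra to a reference assay, with the RMSEC, RMSECV, and RPD figures of merit used above. The spectra and reference values are random placeholders, and the number of latent variables is an assumption:

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 700))    # NIR spectra, one row per sample (placeholder)
y = rng.normal(12.0, 2.0, 120)     # reference assay, e.g. crude protein % (placeholder)

pls = PLSRegression(n_components=10).fit(X, y)
rmsec = float(np.sqrt(np.mean((pls.predict(X).ravel() - y) ** 2)))
y_cv = cross_val_predict(pls, X, y, cv=10).ravel()
rmsecv = float(np.sqrt(np.mean((y_cv - y) ** 2)))
rpd = float(np.std(y) / rmsecv)    # ratio of performance to deviation
print(rmsec, rmsecv, rpd)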