858 resultados para data pre-processing


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Il trauma cranico é tra le piú importanti patologie traumatiche. Ogni anno 250 pazienti ogni 100.000 abitanti vengono ricoverati in Italia per un trauma cranico. La mortalitá é di circa 17 casi per 100.000 abitanti per anno. L’Italia si trova in piena “media” Europea considerando l’incidenza media in Europa di 232 casi per 100.000 abitanti ed una mortalitá di 15 casi per 100.000 abitanti. Degli studi hanno indicato come una terapia anticoagulante é uno dei principali fattori di rischio di evolutiviá di una lesione emorragica. Al contrario della terapia anticoagulante, il rischio emorragico correlato ad una terapia antiaggregante é a tutt’oggi ancora in fase di verifica. Il problema risulta rilevante in particolare nella popolazione occidentale in quanto l’impiego degli antiaggreganti é progressivamente sempre piú diffuso. Questo per la politica di prevenzione sostenuta dalle linee guida nazionali e internazionali in termini di prevenzione del rischio cardiovascolare, in particolare nelle fasce di popolazione di etá piú avanzata. Per la prima volta, é stato dimostrato all’ospedale di Forlí[1], su una casistica sufficientemente ampia, che la terapia cronica con antiaggreganti, per la preven- zione del rischio cardiovascolare, puó rivelarsi un significativo fattore di rischio di complicanze emorragiche in un soggetto con trauma cranico, anche di grado lieve. L’ospedale per approfondire e convalidare i risultati della ricerca ha condotto, nell’anno 2009, una nuova indagine. La nuova indagine ha coinvolto oltre l’ospedale di Forlí altri trentuno centri ospedalieri italiani. Questo lavoro di ricerca vuole, insieme ai ricercatori dell’ospedale di Forlí, verificare: “se una terapia con antiaggreganti influenzi l’evolutivitá, in senso peggiorativo, di una lesione emorragica conseguente a trauma cranico lieve - moderato - severo in un soggetto adulto”, grazie ai dati raccolti dai centri ospedalieri nel 2009. Il documento é strutturato in due parti. La prima parte piú teorica, vuole fissare i concetti chiave riguardanti il contesto della ricerca e la metodologia usata per analizzare i dati. Mentre, la seconda parte piú pratica, vuole illustrare il lavoro fatto per rispondere al quesito della ricerca. La prima parte é composta da due capitoli, che sono: • Il capitolo 1: dove sono descritti i seguenti concetti: cos’é un trauma cra- nico, cos’é un farmaco di tipo anticoagulante e cos’é un farmaco di tipo antiaggregante; • Il capitolo 2: dove é descritto cos’é il Data Mining e quali tecniche sono state usate per analizzare i dati. La seconda parte é composta da quattro capitoli, che sono: • Il capitolo 3: dove sono state descritte: la struttura dei dati raccolti dai trentadue centri ospedalieri, la fase di pre-processing e trasformazione dei dati. Inoltre in questo capitolo sono descritti anche gli strumenti utilizzati per analizzare i dati; • Il capitolo 4: dove é stato descritto come é stata eseguita l’analisi esplorativa dei dati. • Il capitolo 5: dove sono descritte le analisi svolte sui dati e soprattutto i risultati che le analisi, grazie alle tecniche di Data Mining, hanno prodotto per rispondere al quesito della ricerca; • Il capitolo 6: dove sono descritte le conclusioni della ricerca. Per una maggiore comprensione del lavoro sono state aggiunte due appendici. La prima tratta del software per data mining Weka, utilizzato per effettuare le analisi. Mentre, la seconda tratta dell’implementazione dei metodi per la creazione degli alberi decisionali.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, the use of Reverse Engineering systems has got a considerable interest for a wide number of applications. Therefore, many research activities are focused on accuracy and precision of the acquired data and post processing phase improvements. In this context, this PhD Thesis deals with the definition of two novel methods for data post processing and data fusion between physical and geometrical information. In particular a technique has been defined for error definition in 3D points’ coordinates acquired by an optical triangulation laser scanner, with the aim to identify adequate correction arrays to apply under different acquisition parameters and operative conditions. Systematic error in data acquired is thus compensated, in order to increase accuracy value. Moreover, the definition of a 3D thermogram is examined. Object geometrical information and its thermal properties, coming from a thermographic inspection, are combined in order to have a temperature value for each recognizable point. Data acquired by an optical triangulation laser scanner are also used to normalize temperature values and make thermal data independent from thermal-camera point of view.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Lake water temperature (LWT) is an important driver of lake ecosystems and it has been identified as an indicator of climate change. Consequently, the Global Climate Observing System (GCOS) lists LWT as an essential climate variable. Although for some European lakes long in situ time series of LWT do exist, many lakes are not observed or only on a non-regular basis making these observations insufficient for climate monitoring. Satellite data can provide the information needed. However, only few satellite sensors offer the possibility to analyse time series which cover 25 years or more. The Advanced Very High Resolution Radiometer (AVHRR) is among these and has been flown as a heritage instrument for almost 35 years. It will be carried on for at least ten more years, offering a unique opportunity for satellite-based climate studies. Herein we present a satellite-based lake surface water temperature (LSWT) data set for European water bodies in or near the Alps based on the extensive AVHRR 1 km data record (1989–2013) of the Remote Sensing Research Group at the University of Bern. It has been compiled out of AVHRR/2 (NOAA-07, -09, -11, -14) and AVHRR/3 (NOAA-16, -17, -18, -19 and MetOp-A) data. The high accuracy needed for climate related studies requires careful pre-processing and consideration of the atmospheric state. The LSWT retrieval is based on a simulation-based scheme making use of the Radiative Transfer for TOVS (RTTOV) Version 10 together with ERA-interim reanalysis data from the European Centre for Medium-range Weather Forecasts. The resulting LSWTs were extensively compared with in situ measurements from lakes with various sizes between 14 and 580 km2 and the resulting biases and RMSEs were found to be within the range of −0.5 to 0.6 K and 1.0 to 1.6 K, respectively. The upper limits of the reported errors could be rather attributed to uncertainties in the data comparison between in situ and satellite observations than inaccuracies of the satellite retrieval. An inter-comparison with the standard Moderate-resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature product exhibits RMSEs and biases in the range of 0.6 to 0.9 and −0.5 to 0.2 K, respectively. The cross-platform consistency of the retrieval was found to be within ~ 0.3 K. For one lake, the satellite-derived trend was compared with the trend of in situ measurements and both were found to be similar. Thus, orbital drift is not causing artificial temperature trends in the data set. A comparison with LSWT derived through global sea surface temperature (SST) algorithms shows lower RMSEs and biases for the simulation-based approach. A running project will apply the developed method to retrieve LSWT for all of Europe to derive the climate signal of the last 30 years. The data are available at doi:10.1594/PANGAEA.831007.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A basic requirement of the data acquisition systems used in long pulse fusion experiments is the real time physical events detection in signals. Developing such applications is usually a complex task, so it is necessary to develop a set of hardware and software tools that simplify their implementation. This type of applications can be implemented in ITER using fast controllers. ITER is standardizing the architectures to be used for fast controller implementation. Until now the standards chosen are PXIe architectures (based on PCIe) for the hardware and EPICS middleware for the software. This work presents the methodology for implementing data acquisition and pre-processing using FPGA-based DAQ cards and how to integrate these in fast controllers using EPICS.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La diabetes mellitus es un trastorno en la metabolización de los carbohidratos, caracterizado por la nula o insuficiente segregación de insulina (hormona producida por el páncreas), como resultado del mal funcionamiento de la parte endocrina del páncreas, o de una creciente resistencia del organismo a esta hormona. Esto implica, que tras el proceso digestivo, los alimentos que ingerimos se transforman en otros compuestos químicos más pequeños mediante los tejidos exocrinos. La ausencia o poca efectividad de esta hormona polipéptida, no permite metabolizar los carbohidratos ingeridos provocando dos consecuencias: Aumento de la concentración de glucosa en sangre, ya que las células no pueden metabolizarla; consumo de ácidos grasos mediante el hígado, liberando cuerpos cetónicos para aportar la energía a las células. Esta situación expone al enfermo crónico, a una concentración de glucosa en sangre muy elevada, denominado hiperglucemia, la cual puede producir a medio o largo múltiples problemas médicos: oftalmológicos, renales, cardiovasculares, cerebrovasculares, neurológicos… La diabetes representa un gran problema de salud pública y es la enfermedad más común en los países desarrollados por varios factores como la obesidad, la vida sedentaria, que facilitan la aparición de esta enfermedad. Mediante el presente proyecto trabajaremos con los datos de experimentación clínica de pacientes con diabetes de tipo 1, enfermedad autoinmune en la que son destruidas las células beta del páncreas (productoras de insulina) resultando necesaria la administración de insulina exógena. Dicho esto, el paciente con diabetes tipo 1 deberá seguir un tratamiento con insulina administrada por la vía subcutánea, adaptado a sus necesidades metabólicas y a sus hábitos de vida. Para abordar esta situación de regulación del control metabólico del enfermo, mediante una terapia de insulina, no serviremos del proyecto “Páncreas Endocrino Artificial” (PEA), el cual consta de una bomba de infusión de insulina, un sensor continuo de glucosa, y un algoritmo de control en lazo cerrado. El objetivo principal del PEA es aportar al paciente precisión, eficacia y seguridad en cuanto a la normalización del control glucémico y reducción del riesgo de hipoglucemias. El PEA se instala mediante vía subcutánea, por lo que, el retardo introducido por la acción de la insulina, el retardo de la medida de glucosa, así como los errores introducidos por los sensores continuos de glucosa cuando, se descalibran dificultando el empleo de un algoritmo de control. Llegados a este punto debemos modelar la glucosa del paciente mediante sistemas predictivos. Un modelo, es todo aquel elemento que nos permita predecir el comportamiento de un sistema mediante la introducción de variables de entrada. De este modo lo que conseguimos, es una predicción de los estados futuros en los que se puede encontrar la glucosa del paciente, sirviéndonos de variables de entrada de insulina, ingesta y glucosa ya conocidas, por ser las sucedidas con anterioridad en el tiempo. Cuando empleamos el predictor de glucosa, utilizando parámetros obtenidos en tiempo real, el controlador es capaz de indicar el nivel futuro de la glucosa para la toma de decisones del controlador CL. Los predictores que se están empleando actualmente en el PEA no están funcionando correctamente por la cantidad de información y variables que debe de manejar. Data Mining, también referenciado como Descubrimiento del Conocimiento en Bases de Datos (Knowledge Discovery in Databases o KDD), ha sido definida como el proceso de extracción no trivial de información implícita, previamente desconocida y potencialmente útil. Todo ello, sirviéndonos las siguientes fases del proceso de extracción del conocimiento: selección de datos, pre-procesado, transformación, minería de datos, interpretación de los resultados, evaluación y obtención del conocimiento. Con todo este proceso buscamos generar un único modelo insulina glucosa que se ajuste de forma individual a cada paciente y sea capaz, al mismo tiempo, de predecir los estados futuros glucosa con cálculos en tiempo real, a través de unos parámetros introducidos. Este trabajo busca extraer la información contenida en una base de datos de pacientes diabéticos tipo 1 obtenidos a partir de la experimentación clínica. Para ello emplearemos técnicas de Data Mining. Para la consecución del objetivo implícito a este proyecto hemos procedido a implementar una interfaz gráfica que nos guía a través del proceso del KDD (con información gráfica y estadística) de cada punto del proceso. En lo que respecta a la parte de la minería de datos, nos hemos servido de la denominada herramienta de WEKA, en la que a través de Java controlamos todas sus funciones, para implementarlas por medio del programa creado. Otorgando finalmente, una mayor potencialidad al proyecto con la posibilidad de implementar el servicio de los dispositivos Android por la potencial capacidad de portar el código. Mediante estos dispositivos y lo expuesto en el proyecto se podrían implementar o incluso crear nuevas aplicaciones novedosas y muy útiles para este campo. Como conclusión del proyecto, y tras un exhaustivo análisis de los resultados obtenidos, podemos apreciar como logramos obtener el modelo insulina-glucosa de cada paciente. ABSTRACT. The diabetes mellitus is a metabolic disorder, characterized by the low or none insulin production (a hormone produced by the pancreas), as a result of the malfunctioning of the endocrine pancreas part or by an increasing resistance of the organism to this hormone. This implies that, after the digestive process, the food we consume is transformed into smaller chemical compounds, through the exocrine tissues. The absence or limited effectiveness of this polypeptide hormone, does not allow to metabolize the ingested carbohydrates provoking two consequences: Increase of the glucose concentration in blood, as the cells are unable to metabolize it; fatty acid intake through the liver, releasing ketone bodies to provide energy to the cells. This situation exposes the chronic patient to high blood glucose levels, named hyperglycemia, which may cause in the medium or long term multiple medical problems: ophthalmological, renal, cardiovascular, cerebrum-vascular, neurological … The diabetes represents a great public health problem and is the most common disease in the developed countries, by several factors such as the obesity or sedentary life, which facilitate the appearance of this disease. Through this project we will work with clinical experimentation data of patients with diabetes of type 1, autoimmune disease in which beta cells of the pancreas (producers of insulin) are destroyed resulting necessary the exogenous insulin administration. That said, the patient with diabetes type 1 will have to follow a treatment with insulin, administered by the subcutaneous route, adapted to his metabolic needs and to his life habits. To deal with this situation of metabolic control regulation of the patient, through an insulin therapy, we shall be using the “Endocrine Artificial Pancreas " (PEA), which consists of a bomb of insulin infusion, a constant glucose sensor, and a control algorithm in closed bow. The principal aim of the PEA is providing the patient precision, efficiency and safety regarding the normalization of the glycemic control and hypoglycemia risk reduction". The PEA establishes through subcutaneous route, consequently, the delay introduced by the insulin action, the delay of the glucose measure, as well as the mistakes introduced by the constant glucose sensors when, decalibrate, impede the employment of an algorithm of control. At this stage we must shape the patient glucose levels through predictive systems. A model is all that element or set of elements which will allow us to predict the behavior of a system by introducing input variables. Thus what we obtain, is a prediction of the future stages in which it is possible to find the patient glucose level, being served of input insulin, ingestion and glucose variables already known, for being the ones happened previously in the time. When we use the glucose predictor, using obtained real time parameters, the controller is capable of indicating the future level of the glucose for the decision capture CL controller. The predictors that are being used nowadays in the PEA are not working correctly for the amount of information and variables that it need to handle. Data Mining, also indexed as Knowledge Discovery in Databases or KDD, has been defined as the not trivial extraction process of implicit information, previously unknown and potentially useful. All this, using the following phases of the knowledge extraction process: selection of information, pre- processing, transformation, data mining, results interpretation, evaluation and knowledge acquisition. With all this process we seek to generate the unique insulin glucose model that adjusts individually and in a personalized way for each patient form and being capable, at the same time, of predicting the future conditions with real time calculations, across few input parameters. This project of end of grade seeks to extract the information contained in a database of type 1 diabetics patients, obtained from clinical experimentation. For it, we will use technologies of Data Mining. For the attainment of the aim implicit to this project we have proceeded to implement a graphical interface that will guide us across the process of the KDD (with graphical and statistical information) of every point of the process. Regarding the data mining part, we have been served by a tool called WEKA's tool called, in which across Java, we control all of its functions to implement them by means of the created program. Finally granting a higher potential to the project with the possibility of implementing the service for Android devices, porting the code. Through these devices and what has been exposed in the project they might help or even create new and very useful applications for this field. As a conclusion of the project, and after an exhaustive analysis of the obtained results, we can show how we achieve to obtain the insulin–glucose model for each patient.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

LIDAR (LIght Detection And Ranging) first return elevation data of the Boston, Massachusetts region from MassGIS at 1-meter resolution. This LIDAR data was captured in Spring 2002. LIDAR first return data (which shows the highest ground features, e.g. tree canopy, buildings etc.) can be used to produce a digital terrain model of the Earth's surface. This dataset consists of 74 First Return DEM tiles. The tiles are 4km by 4km areas corresponding with the MassGIS orthoimage index. This data set was collected using 3Di's Digital Airborne Topographic Imaging System II (DATIS II). The area of coverage corresponds to the following MassGIS orthophoto quads covering the Boston region (MassGIS orthophoto quad ID: 229890, 229894, 229898, 229902, 233886, 233890, 233894, 233898, 233902, 233906, 233910, 237890, 237894, 237898, 237902, 237906, 237910, 241890, 241894, 241898, 241902, 245898, 245902). The geographic extent of this dataset is the same as that of the MassGIS dataset: Boston, Massachusetts Region 1:5,000 Color Ortho Imagery (1/2-meter Resolution), 2001 and was used to produce the MassGIS dataset: Boston, Massachusetts, 2-Dimensional Building Footprints with Roof Height Data (from LIDAR data), 2002 [see cross references].

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This dataset consists of 2D footprints of the buildings in the metropolitan Boston area, based on tiles in the orthoimage index (orthophoto quad ID: 229890, 229894, 229898, 229902, 233886, 233890, 233894, 233898, 233902, 237890, 237894, 237898, 237902, 241890, 241894, 241898, 241902, 245898, 245902). This data set was collected using 3Di's Digital Airborne Topographic Imaging System II (DATIS II). Roof height and footprint elevation attributes (derived from 1-meter resolution LIDAR (LIght Detection And Ranging) data) are included as part of each building feature. This data can be combined with other datasets to create 3D representations of buildings and the surrounding environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Advances in three-dimensional (313) electron microscopy (EM) and image processing are providing considerable improvements in the resolution of subcellular volumes, macromolecular assemblies and individual proteins. However, the recovery of high-frequency information from biological samples is hindered by specimen sensitivity to beam damage. Low dose electron cryo-microscopy conditions afford reduced beam damage but typically yield images with reduced contrast and low signal-to-noise ratios (SNRs). Here, we describe the properties of a new discriminative bilateral (DBL) filter that is based upon the bilateral filter implementation of Jiang et al. (Jiang, W., Baker, M.L., Wu, Q., Bajaj, C., Chin, W., 2003. Applications of a bilateral denoising filter in biological electron microscopy. J. Struc. Biol. 128, 82-97.). In contrast to the latter, the DBL filter can distinguish between object edges and high-frequency noise pixels through the use of an additional photometric exclusion function. As a result, high frequency noise pixels are smoothed, yet object edge detail is preserved. In the present study, we show that the DBL filter effectively reduces noise in low SNR single particle data as well as cellular tomograms of stained plastic sections. The properties of the DBL filter are discussed in terms of its usefulness for single particle analysis and for pre-processing cellular tomograms ahead of image segmentation. (c) 2006 Elsevier Inc. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Uninhabited aerial vehicles (UAVs) are a cutting-edge technology that is at the forefront of aviation/aerospace research and development worldwide. Many consider their current military and defence applications as just a token of their enormous potential. Unlocking and fully exploiting this potential will see UAVs in a multitude of civilian applications and routinely operating alongside piloted aircraft. The key to realising the full potential of UAVs lies in addressing a host of regulatory, public relation, and technological challenges never encountered be- fore. Aircraft collision avoidance is considered to be one of the most important issues to be addressed, given its safety critical nature. The collision avoidance problem can be roughly organised into three areas: 1) Sense; 2) Detect; and 3) Avoid. Sensing is concerned with obtaining accurate and reliable information about other aircraft in the air; detection involves identifying potential collision threats based on available information; avoidance deals with the formulation and execution of appropriate manoeuvres to maintain safe separation. This thesis tackles the detection aspect of collision avoidance, via the development of a target detection algorithm that is capable of real-time operation onboard a UAV platform. One of the key challenges of the detection problem is the need to provide early warning. This translates to detecting potential threats whilst they are still far away, when their presence is likely to be obscured and hidden by noise. Another important consideration is the choice of sensors to capture target information, which has implications for the design and practical implementation of the detection algorithm. The main contributions of the thesis are: 1) the proposal of a dim target detection algorithm combining image morphology and hidden Markov model (HMM) filtering approaches; 2) the novel use of relative entropy rate (RER) concepts for HMM filter design; 3) the characterisation of algorithm detection performance based on simulated data as well as real in-flight target image data; and 4) the demonstration of the proposed algorithm's capacity for real-time target detection. We also consider the extension of HMM filtering techniques and the application of RER concepts for target heading angle estimation. In this thesis we propose a computer-vision based detection solution, due to the commercial-off-the-shelf (COTS) availability of camera hardware and the hardware's relatively low cost, power, and size requirements. The proposed target detection algorithm adopts a two-stage processing paradigm that begins with an image enhancement pre-processing stage followed by a track-before-detect (TBD) temporal processing stage that has been shown to be effective in dim target detection. We compare the performance of two candidate morphological filters for the image pre-processing stage, and propose a multiple hidden Markov model (MHMM) filter for the TBD temporal processing stage. The role of the morphological pre-processing stage is to exploit the spatial features of potential collision threats, while the MHMM filter serves to exploit the temporal characteristics or dynamics. The problem of optimising our proposed MHMM filter has been examined in detail. Our investigation has produced a novel design process for the MHMM filter that exploits information theory and entropy related concepts. The filter design process is posed as a mini-max optimisation problem based on a joint RER cost criterion. We provide proof that this joint RER cost criterion provides a bound on the conditional mean estimate (CME) performance of our MHMM filter, and this in turn establishes a strong theoretical basis connecting our filter design process to filter performance. Through this connection we can intelligently compare and optimise candidate filter models at the design stage, rather than having to resort to time consuming Monte Carlo simulations to gauge the relative performance of candidate designs. Moreover, the underlying entropy concepts are not constrained to any particular model type. This suggests that the RER concepts established here may be generalised to provide a useful design criterion for multiple model filtering approaches outside the class of HMM filters. In this thesis we also evaluate the performance of our proposed target detection algorithm under realistic operation conditions, and give consideration to the practical deployment of the detection algorithm onboard a UAV platform. Two fixed-wing UAVs were engaged to recreate various collision-course scenarios to capture highly realistic vision (from an onboard camera perspective) of the moments leading up to a collision. Based on this collected data, our proposed detection approach was able to detect targets out to distances ranging from about 400m to 900m. These distances, (with some assumptions about closing speeds and aircraft trajectories) translate to an advanced warning ahead of impact that approaches the 12.5 second response time recommended for human pilots. Furthermore, readily available graphic processing unit (GPU) based hardware is exploited for its parallel computing capabilities to demonstrate the practical feasibility of the proposed target detection algorithm. A prototype hardware-in- the-loop system has been found to be capable of achieving data processing rates sufficient for real-time operation. There is also scope for further improvement in performance through code optimisations. Overall, our proposed image-based target detection algorithm offers UAVs a cost-effective real-time target detection capability that is a step forward in ad- dressing the collision avoidance issue that is currently one of the most significant obstacles preventing widespread civilian applications of uninhabited aircraft. We also highlight that the algorithm development process has led to the discovery of a powerful multiple HMM filtering approach and a novel RER-based multiple filter design process. The utility of our multiple HMM filtering approach and RER concepts, however, extend beyond the target detection problem. This is demonstrated by our application of HMM filters and RER concepts to a heading angle estimation problem.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large scale comparative genomics, often involving operations and data relationships which deviate from the classical Map Reduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a wellestablished workflow for identifying promoters - binding sites for regulatory proteins - Across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This study aims to assess the accuracy of Digital Elevation Model (DEM) which is generated by using Toutin’s model. Thus, Toutin’s model was run by using OrthoEngineSE of PCI Geomatics 10.3.Thealong-track stereoimages of Advanced Spaceborne Thermal Emission and Reflection radiometer (ASTER) sensor with 15 m resolution were used to produce DEM on an area with low and near Mean Sea Level (MSL) elevation in Johor Malaysia. Despite the satisfactory pre-processing results the visual assessment of the DEM generated from Toutin’s model showed that the DEM contained many outliers and incorrect values. The failure of Toutin’s model may mostly be due to the inaccuracy and insufficiency of ASTER ephemeris data for low terrains as well as huge water body in the stereo images.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Description of a patient's injuries is recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families such as decision tree, probabilistic, neural networks, and instance-based, ensemble-based and kernel-based linear classifiers. An extensive pre-processing is carried out to ensure the quality of data and, in hence, the quality classification outcome. The records with a null entry in injury description are removed. The misspelling correction process is carried out by finding and replacing the misspelt word with a soundlike word. Meaningful phrases have been identified and kept, instead of removing the part of phrase as a stop word. The abbreviations appearing in many forms of entry are manually identified and only one form of abbreviations is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically from about 28,000 to 5000. The medical narrative text injury dataset, under consideration, is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, Matrix factorization techniques such as Singular Value Decomposition (SVD) and Non Negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space. Classifiers with these reduced feature space have been built. In experiments, a set of tests are conducted to reflect which classification method is best for the medical text classification. The Non Negative Matrix Factorization with Support Vector Machine method can achieve 93% precision which is higher than all the tested traditional classifiers. We also found that TF/IDF weighting which works well for long text classification is inferior to binary weighting in short document classification. Another finding is that the Top-n terms should be removed in consultation with medical experts, as it affects the classification performance.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Narrative text is a useful way of identifying injury circumstances from the routine emergency department data collections. Automatically classifying narratives based on machine learning techniques is a promising technique, which can consequently reduce the tedious manual classification process. Existing works focus on using Naive Bayes which does not always offer the best performance. This paper proposes the Matrix Factorization approaches along with a learning enhancement process for this task. The results are compared with the performance of various other classification approaches. The impact on the classification results from the parameters setting during the classification of a medical text dataset is discussed. With the selection of right dimension k, Non Negative Matrix Factorization-model method achieves 10 CV accuracy of 0.93.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Frog protection has become increasingly essential due to the rapid decline of its biodiversity. Therefore, it is valuable to develop new methods for studying this biodiversity. In this paper, a novel feature extraction method is proposed based on perceptual wavelet packet decomposition for classifying frog calls in noisy environments. Pre-processing and syllable segmentation are first applied to the frog call. Then, a spectral peak track is extracted from each syllable if possible. Track duration, dominant frequency and oscillation rate are directly extracted from the track. With k-means clustering algorithm, the calculated dominant frequency of all frog species is clustered into k parts, which produce a frequency scale for wavelet packet decomposition. Based on the adaptive frequency scale, wavelet packet decomposition is applied to the frog calls. Using the wavelet packet decomposition coefficients, a new feature set named perceptual wavelet packet decomposition sub-band cepstral coefficients is extracted. Finally, a k-nearest neighbour (k-NN) classifier is used for the classification. The experiment results show that the proposed features can achieve an average classification accuracy of 97.45% which outperforms syllable features (86.87%) and Mel-frequency cepstral coefficients (MFCCs) feature (90.80%).