926 results for Data Pre-Processing and Performance Evaluation
Abstract:
Learning Disability (LD) is a general term covering several specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs the ability to carry out one or more specific tasks. Children with learning disabilities are neither slow learners nor intellectually impaired; the disorder simply makes it difficult for a child to learn as quickly, or in the same way, as a child who is not affected. An affected child can have normal or above-average intelligence but may have difficulty paying attention, with reading or letter recognition, or with mathematics. Having a learning disability does not mean a child is less intelligent; many children with learning disabilities are more intelligent than the average child. Learning disabilities vary from child to child: one child with LD may not have the same kind of learning problems as another. There is no cure for learning disabilities and they are life-long, but children with LD can be high achievers and can be taught ways to work around the disability. In this research work, data mining and machine learning techniques are used to analyse the symptoms of LD, establish the interrelationships between them and evaluate their relative importance. To increase the diagnostic accuracy of LD prediction, a knowledge-based tool built on statistical machine learning and data mining techniques, driven by the knowledge obtained from clinical information, is proposed. The basic idea of the tool is to increase the accuracy of the learning disability assessment and to reduce the time it requires. Different statistical machine learning techniques are used in the study. Identifying the important parameters for LD prediction, uncovering the hidden relationships between LD symptoms and estimating the relative significance of each symptom are also objectives of this work. The developed tool has many advantages over the traditional checklist-based methods of determining learning disabilities. To improve the performance of the various classifiers, several pre-processing methods were developed for the LD prediction system. A new system based on fuzzy and rough set models was also developed for LD prediction, and the importance of pre-processing was studied there as well. A Graphical User Interface (GUI) was designed to provide an integrated knowledge-based tool that predicts LD and its degree; the tool stores the details of the children in a student database and retrieves their LD reports as and when required. The study demonstrates the effectiveness of the tool built on various machine learning techniques: it identifies the important parameters of LD and accurately predicts learning disability in school-age children. The thesis makes several contributions in technical, general and social areas. The results are highly beneficial to parents, teachers and institutions, who can diagnose a child's problem at an early stage and arrange appropriate treatment or counselling in time to avoid academic and social losses.
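As a rough, hypothetical illustration of the kind of pipeline described above (the thesis' actual checklist items, classifiers and tool are not reproduced; the file name, column names and the choice of a random forest below are assumptions), symptom checklist data could be classified and the symptoms ranked by importance as follows:

```python
# Minimal sketch, not the thesis tool: train a classifier on a hypothetical
# checklist of LD symptoms and rank the symptoms by importance.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical data: one row per child, binary symptom indicators plus an LD label.
df = pd.read_csv("ld_checklist.csv")          # assumed file layout
X = df.drop(columns=["has_ld"])               # symptom columns (assumed names)
y = df["has_ld"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
# Relative importance of each symptom, analogous to the thesis goal of
# estimating the relative significance of LD symptoms.
for name, imp in sorted(zip(X.columns, clf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```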
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa in fulfilment of the requirements for the degree of Master in Electrical and Computer Engineering.
Abstract:
The region of greatest variability on soil maps lies along the edges of their polygons, which leads to disagreement among pedologists about the appropriate description of soil classes at these locations. The objective of this work was to propose a data pre-processing strategy for digital soil mapping (DSM). Soil polygons on a training map were shrunk by 100 and 160 m, which prevented covariates located near the edges of the soil classes from being used by the Decision Tree (DT) models. Three DT models, derived from eight predictive covariates related to relief and organism factors and sampled on the original polygons of a soil map and on the polygons shrunk by 100 and 160 m, were used to predict soil classes. The DT model derived from observations 160 m away from the polygon edges of the original map is less complex and has better predictive performance.
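A minimal sketch of the shrinking idea, not the authors' implementation: assuming geopandas and scikit-learn, and hypothetical file and field names, a negative buffer removes a strip along the polygon edges before covariates are sampled for the decision tree:

```python
# Sketch only: shrink soil polygons inward before sampling training covariates.
import geopandas as gpd
from sklearn.tree import DecisionTreeClassifier

polys = gpd.read_file("soil_map.shp")                  # assumed training soil map
polys_160 = polys.copy()
polys_160["geometry"] = polys.geometry.buffer(-160)    # shrink each polygon by 160 m
polys_160 = polys_160[~polys_160.geometry.is_empty]    # drop polygons that vanish

samples = gpd.read_file("covariate_samples.shp")       # assumed point covariates
train = gpd.sjoin(samples, polys_160, predicate="within")  # keep interior points only

covariates = ["elevation", "slope", "twi", "ndvi"]     # assumed covariate names
tree = DecisionTreeClassifier().fit(train[covariates], train["soil_class"])
```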
Abstract:
Several different unsupervised classification algorithms are commonly used today to cluster similar patterns in a data set based only on its statistical properties. Especially in image data applications, self-organizing methods for unsupervised classification have been successfully applied to cluster pixels or groups of pixels in order to perform segmentation tasks. The first important contribution of this paper is the development of a self-organizing method for data classification, named the Enhanced Independent Component Analysis Mixture Model (EICAMM), built by introducing several modifications to the Independent Component Analysis Mixture Model (ICAMM). These improvements were proposed after considering some of the model's limitations and analysing how it could be made more efficient. Moreover, a pre-processing methodology is also proposed, based on combining Sparse Code Shrinkage (SCS) for image denoising with the Sobel edge detector. In the experiments, EICAMM and other self-organizing models were applied to segment images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results for the proposals presented herein.
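As a loose illustration of the pre-processing step (not the paper's code: scikit-image is assumed, and a simple Gaussian filter stands in for the SCS denoiser, which is not implemented here):

```python
# Sketch: denoise an image, then apply a Sobel edge detector, in the spirit of
# the SCS + Sobel pre-processing described above.
from skimage import io, filters

img = io.imread("input.png", as_gray=True)    # assumed input image
denoised = filters.gaussian(img, sigma=1.0)   # stand-in for Sparse Code Shrinkage
edges = filters.sobel(denoised)               # Sobel gradient magnitude
# `edges` would then be fed to the segmentation / clustering model.
```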
Abstract:
This is the first part of a study investigating a model-based transient calibration process for diesel engines. The motivation is to populate hundreds of calibratable parameters in a methodical and optimal manner by combining model-based optimization with the manual process, so that, relative to the manual process alone, a significant improvement in transient emissions and fuel consumption and a sizable reduction in calibration time and test-cell requirements are achieved. Empirical transient modelling and optimization are addressed in the second part of this work, while the data required for model training and generalization are the focus of the current part. Transient and steady-state data from a turbocharged multi-cylinder diesel engine have been examined from a model-training perspective. A single-cylinder engine with external air handling has been used to expand the steady-state data to cover the transient parameter space. Based on comparative model performance and differences in the non-parametric space, driven primarily by a large difference between exhaust and intake manifold pressures (engine ΔP) during transients, it is recommended that transient emission models be trained with transient training data. It is shown that electronic control module (ECM) estimates of transient charge flow and the exhaust gas recirculation (EGR) fraction cannot be accurate at the high engine ΔP frequently encountered during transient operation, and that such estimates do not account for cylinder-to-cylinder variation. The effects of high engine ΔP must therefore be incorporated empirically by using transient data generated from a spectrum of transient calibrations. Specific recommendations are made on how to choose such calibrations, how much data to acquire, and how to specify transient segments for data acquisition. Methods to process transient data to account for transport delays and sensor lags have been developed. The processed data have then been visualized using statistical means to understand transient emission formation. Two modes of transient opacity formation have been observed and described: the first is driven by high engine ΔP and low fresh-air flow rates, while the second is driven by high engine ΔP and high EGR flow rates. The EGR fraction is inaccurately estimated in both modes, and uneven EGR distribution is shown to be present but unaccounted for by the ECM. The two modes and the associated phenomena are essential to understanding why transient emission models are calibration dependent and how to choose training data that will yield good model generalization.
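One common way to handle transport delays and sensor lags of this kind, not necessarily the authors' method, is to shift each slow signal by the lag that maximises its correlation with a fast reference signal; a generic sketch with assumed signal names:

```python
# Sketch: estimate and remove a transport delay by maximising the correlation
# between a fast reference signal and a lagged sensor signal (positive delays only).
import numpy as np

def estimate_lag(reference, signal, max_lag):
    """Return the lag (in samples) at which `signal` best matches `reference`."""
    scores = [np.corrcoef(reference[: len(reference) - k], signal[k:])[0, 1]
              for k in range(max_lag + 1)]
    return int(np.argmax(scores))

# e.g. align an opacity trace (slow, delayed) to a fuelling command (fast):
# lag = estimate_lag(fuelling, opacity, max_lag=200)
# opacity_aligned = opacity[lag:]
```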
Abstract:
Non-technical loss (NTL) identification and prediction are important tasks for many utilities. Data from the customer information system (CIS) can be used for NTL analysis; however, to perform NTL analysis accurately and efficiently, the original CIS data need to be pre-processed before any detailed analysis is carried out. In this paper, we propose a feature-selection-based method for CIS data pre-processing that extracts the most relevant information for further analysis such as clustering and classification. By removing irrelevant and redundant features, feature selection is an essential step in the data mining process: it finds an optimal subset of features that improves the quality of the results, yielding faster processing, higher accuracy and simpler models with fewer features. A detailed feature selection analysis is presented in the paper, and both time-domain and load-shape data are compared in terms of accuracy, consistency and the statistical dependencies between features.
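As a rough sketch of filter-based feature selection on such data (the paper's exact method and CIS fields are not reproduced; the file and column names below are assumptions):

```python
# Sketch: rank CIS-derived features by mutual information with the NTL label
# and keep the top k for later clustering/classification.
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif

df = pd.read_csv("cis_features.csv")          # assumed pre-extracted feature table
X = df.drop(columns=["is_ntl"])               # candidate features (assumed names)
y = df["is_ntl"]

selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
selected = X.columns[selector.get_support()]
print("kept features:", list(selected))
```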
Abstract:
The aim of this work was to investigate alternative safe and effective permeation enhancers for buccal peptide delivery. Basic amino acids improved insulin solubility in water, while 200 and 400 µg/mL lysine significantly increased insulin solubility in HBSS. Permeability data showed a significant improvement in insulin permeation, especially for 10 µg/mL lysine (p < 0.05), 10 µg/mL histidine (p < 0.001), 100 µg/mL glutamic acid (p < 0.05) and 200 µg/mL glutamic acid and aspartic acid (p < 0.001), without affecting cell integrity; in contrast, sodium deoxycholate enhanced insulin permeability but was toxic to the cells. It was hypothesized that both the amino acids and insulin were ionised at buccal cavity pH and formed stable ion pairs that penetrated the cells as one entity, while possibly also triggering amino acid nutrient transporters on the cell surfaces. Evidence for these transport mechanisms was seen in the reduction of insulin transport at suboptimal temperatures, in basal-to-apical vectorial transport, and in confocal imaging of transcellular insulin transport. These results are the first indication of a possible amino acid-mediated transport of insulin via the formation of neutral insulin-amino acid complexes through an ion-pairing mechanism.
Abstract:
Porous ceramic samples were prepared from an aqueous-foam-incorporated alumina suspension for application as hot aerosol filtering membranes. The procedure for establishing the membrane features required to maintain a desired flow condition was described theoretically, and experimental work was designed to prepare ceramic membranes that meet the predicted criteria. The two best membranes thus prepared were selected for permeability tests up to 700 °C, and their total and fractional collection efficiencies were evaluated experimentally. Reasonably good performance was achieved at room temperature, while at 700 °C increased permeability was obtained with a significant reduction in collection efficiency, which is explained by a combination of thermal expansion of the structure and changes in the gas properties.
Abstract:
Modern networks, in particular cellular wireless heterogeneous networks, are generally analysed by taking several Key Performance Indicators (KPIs) into account, and a proper balance among them is required to guarantee a desired Quality of Service (QoS). A model that integrates a set of KPIs into a single one is presented, using a cost function over these KPIs that provides a single evaluation parameter for each network node and reflects both the network conditions and the performance of common radio resource management strategies. The proposed model enables the implementation of different network management policies, by weighting the KPIs according to users' or operators' perspectives, allowing for better QoS. Results show that different policies can indeed be established, with different impacts on the network, e.g., with median values differing by a factor of more than two.
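A minimal sketch of how several KPIs can be collapsed into a single per-node value with a weighted cost function (the KPI names, weights and normalisation below are illustrative assumptions, not the paper's model):

```python
# Sketch: collapse several per-node KPIs into one evaluation value via a
# weighted cost function; different policies correspond to different weights.
def node_cost(kpis, weights):
    """kpis and weights are dicts keyed by KPI name; KPIs assumed normalised to [0, 1]."""
    return sum(weights[name] * kpis[name] for name in weights)

# Hypothetical "user-centric" vs "operator-centric" policies:
user_policy     = {"blocking": 0.5, "delay": 0.4, "load": 0.1}
operator_policy = {"blocking": 0.2, "delay": 0.2, "load": 0.6}

kpis = {"blocking": 0.05, "delay": 0.30, "load": 0.80}
print(node_cost(kpis, user_policy), node_cost(kpis, operator_policy))
```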
Abstract:
False identity documents constitute a potentially powerful source of forensic intelligence because they are essential elements of transnational crime and provide cover for organized crime. In previous work, a systematic profiling method using the visual features of false documents was built within a forensic intelligence model. In the current study, the comparison process and the metrics lying at the heart of this profiling method are described and evaluated. The evaluation takes advantage of 347 false identity documents of four different types seized in two countries whose sources were known to be common or different (following police investigations and the dismantling of counterfeit factories). Intra-source and inter-source variation was evaluated through the computation of more than 7500 similarity scores. The profiling method could thus be validated and its performance assessed using two complementary approaches to measuring type I and type II error rates: binary classification and the computation of likelihood ratios. Very low error rates were measured across the four document types, demonstrating the validity and robustness of the method in linking documents to a common source or differentiating them. These results pave the way for an operational implementation of a systematic profiling process integrated into the forensic intelligence model developed.
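As an illustration of the binary-classification view of this evaluation (the study's actual scores are not reproduced; the arrays below are synthetic placeholders and the threshold is assumed):

```python
# Sketch: estimate type I / type II error rates from similarity scores, given a
# decision threshold above which two documents are declared "same source".
import numpy as np

rng = np.random.default_rng(0)
# Synthetic placeholders standing in for the study's >7500 similarity scores:
same_source_scores = rng.normal(0.9, 0.05, 500)   # intra-source pairs (synthetic)
diff_source_scores = rng.normal(0.4, 0.10, 7000)  # inter-source pairs (synthetic)
threshold = 0.7                                   # assumed decision threshold

# Type II error: same-source pairs wrongly declared to come from different sources.
false_negative_rate = np.mean(same_source_scores < threshold)
# Type I error: different-source pairs wrongly declared to share a source.
false_positive_rate = np.mean(diff_source_scores >= threshold)
print(false_negative_rate, false_positive_rate)
```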
Abstract:
The aim of this thesis was to create guidelines for supplier selection and supplier performance evaluation for the case company, Exel Oyj. The guidelines were intended to serve as a starting point for developing the supplier selection and performance evaluation processes. The thesis focuses on presenting criteria for selecting suppliers and for evaluating supplier performance. The criteria were selected and analysed with the help of theory and empirical data, and compiled into clear lists. These lists were used when considering the new selection and performance evaluation criteria that the case company can apply in the future. The thesis also reviews the supplier selection process and the tools and metrics related to supplier evaluation. The empirical material was collected by interviewing the purchasing manager and by gathering information from the annual report and the company's web pages. The outcome of the thesis is a set of criteria lists that the company can exploit in the future, together with the lists of criteria preliminarily selected for the company's use.
Abstract:
The hydrodynamic characterization and performance evaluation of an aerobic three-phase fluidized bed reactor for treating wastewater from fish culture are presented in this report. The objective of this study was to evaluate the organic matter, nitrogen and phosphorus removal efficiency of a physical and biological treatment system for the wastewater of an intensive Nile Tilapia laboratory production unit with recirculation. The treatment system comprised a conventional sedimentation basin operated at a hydraulic detention time (HDT) of 2.94 h and an aerobic three-phase airlift fluidized bed reactor (AAFBR) operated at an HDT of 11.9 min. Granular activated carbon was used as the support medium, with a density of 1.64 g/cm³, an effective size of 0.34 mm and a constant concentration of 80 g/L. Mean removal efficiencies of BOD, COD, phosphorus, total ammonia nitrogen and total nitrogen were 47%, 77%, 38%, 27% and 24%, respectively. The evaluated system proved to be an effective alternative for water reuse in the recirculation system, capable of maintaining water quality within the values recommended for fish farming, and met the Brazilian standards for final effluent discharge with the exception of phosphorus.
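For reference, the removal efficiencies quoted above are simply the percentage drop from influent to effluent concentration; a trivial sketch:

```python
# Sketch: per-parameter removal efficiency from influent/effluent concentrations.
def removal_efficiency(c_in, c_out):
    """Percentage of a constituent removed between reactor inlet and outlet."""
    return 100.0 * (c_in - c_out) / c_in

# e.g. a 77% COD removal corresponds to an effluent at 23% of the influent value:
print(removal_efficiency(100.0, 23.0))   # -> 77.0
```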