964 resultados para Pre-processing
Resumo:
Tämän työn tavoitteena oli selvittää ja toteuttaa esikäsittelypiirin prototyyppi akustisen emission anturin signaalille. Toteutettu esikäsittelypiiri toimii yksipuoleisella käyttöjännitteellä. Työssä käydään läpi esikäsittelypiirin suunnitteluun liittyvät vaiheet laskelmien ja simulaatioiden muodossa. Lisäksi työssä esitetään mittaustulokset esikäsittelypiirin toiminnasta.
Resumo:
Työssä määritettiin luokan 2 eläinperäisistä sivutuotteista liikennekäyttöön tuotettujen biodieselin ja biometaanin elinkaaren aikaiset kasvihuonekaasupäästöt ja tuotantoprosessien energiankulutukset perustuen kirjallisuuslähteistä saatuihin lähtötietoihin. Tätä kautta tutkittiin yhdistelmäprosessia, jossa tuotetaan molempia polttoaineita ja selvitettiin onko tällaisella tuotantotavalla mahdollista vähentää päästöjä ja parantaa polttoaineiden tuotannon energiatehokkuutta. Kasvihuone-kaasupäästöjen laskentamenetelmä pohjautuu direktiivissä 2009/28/EY annettuun ohjeistukseen ja eri kasvihuonekaasupäästöjen karakterisointi IPCC:n sadan vuoden tarkastelumalliin. Käytännön laskenta suoritettiin standardien SFS-EN ISO 14040 ja 14044 määrittelemän elinkaariarviointiselvityksen muodossa. Työssä käytetyn laskentamenetelmän ja tarkasteluun valittujen tuotanto-teknologioiden perusteella lasketut tulokset osoittavat, että yhdistelmäprosessilla ei saavuteta suurempia päästövähenemiä eikä parempaa energiatehokkuutta kuin nykyisin käytössä olevilla tuotantotavoilla. Tulokset ovat kuitenkin hyvin herkkiä laskennassa tehtyjen oletusten ja käytettyjen lähtötietojen vaihtelulle sekä valittujen laskentamenetelmien muutoksille. Suurin päästöjä ja energiankulutusta aiheuttava yksittäinen tekijä on kaikissa tuotejärjestelmissä luokan 2 sivutuotteiden esikäsittelyssä vaadittavaan steri-lointiin tarvittavan lämmön tuotanto. Tutkituissa tuotejärjestelmissä lämpö tuotetaan kokonaan tai osittain fossiilisilla polttoaineilla. Kasvihuone-kaasupäästöjä olisi mahdollista alentaa merkittävästi siirtymällä lämmön tuotannossa kokonaan uusiutuviin polttoaineisiin. Sterilointi on lain edellyttämä käsittelytapa ja siksi energiankulutusta on vallitsevissa olosuhteissa hyvin vaikea pienentää merkittävästi.
Resumo:
Many research works have being carried out on analyzing grain storage facility costs; however a few of them had taken into account the analysis of factors associated to all pre-processing and storage steps. The objective of this work was to develop a decision support system for determining the grain storage facility costs and utilization fees in grain storage facilities. The data of a CONAB storage facility located in Ponta Grossa - PR, Brazil, was used as input of the system developed to analyze its specific characteristics, such as amount of product received and stored throughout the year, hourly capacity of drying, cleaning, and receiving, and dispatch. By applying the decision support system, it was observed that the reception and expedition costs were exponentially reduced as the turnover rate of the storage increased. The cleaning and drying costs increased linearly with grain initial moisture. The storage cost increased exponentially as the occupancy rate of the storage facility decreased.
Resumo:
This study aimed at identifying different conditions of coffee plants after harvesting period, using data mining and spectral behavior profiles from Hyperion/EO1 sensor. The Hyperion image, with spatial resolution of 30 m, was acquired in August 28th, 2008, at the end of the coffee harvest season in the studied area. For pre-processing imaging, atmospheric and signal/noise effect corrections were carried out using Flaash and MNF (Minimum Noise Fraction Transform) algorithms, respectively. Spectral behavior profiles (38) of different coffee varieties were generated from 150 Hyperion bands. The spectral behavior profiles were analyzed by Expectation-Maximization (EM) algorithm considering 2; 3; 4 and 5 clusters. T-test with 5% of significance was used to verify the similarity among the wavelength cluster means. The results demonstrated that it is possible to separate five different clusters, which were comprised by different coffee crop conditions making possible to improve future intervention actions.
Resumo:
In this study, cantilever-enhanced photoacoustic spectroscopy (CEPAS) was applied in different drug detection schemes. The study was divided into two different applications: trace detection of vaporized drugs and drug precursors in the gas-phase, and detection of cocaine abuse in hair. The main focus, however, was the study of hair samples. In the gas-phase, methyl benzoate, a hydrolysis product of cocaine hydrochloride, and benzyl methyl ketone (BMK), a precursor of amphetamine and methamphetamine were investigated. In the solid-phase, hair samples from cocaine overdose patients were measured and compared to a drug-free reference group. As hair consists mostly of long fibrous proteins generally called keratin, proteins from fingernails and saliva were also studied for comparison. Different measurement setups were applied in this study. Gas measurements were carried out using quantum cascade lasers (QLC) as a source in the photoacoustic detection. Also, an external cavity (EC) design was used for a broader tuning range. Detection limits of 3.4 particles per billion (ppb) for methyl benzoate and 26 ppb for BMK in 0.9 s were achieved with the EC-QCL PAS setup. The achieved detection limits are sufficient for realistic drug detection applications. The measurements from drug overdose patients were carried out using Fourier transform infrared (FTIR) PAS. The drug-containing hair samples and drug-free samples were both measured with the FTIR-PAS setup, and the measured spectra were analyzed statistically with principal component analysis (PCA). The two groups were separated by their spectra with PCA and proper spectral pre-processing. To improve the method, ECQCL measurements of the hair samples, and studies using photoacoustic microsampling techniques, were performed. High quality, high-resolution spectra with a broad tuning range were recorded from a single hair fiber. This broad tuning range of an EC-QCL has not previously been used in the photoacoustic spectroscopy of solids. However, no drug detection studies were performed with the EC-QCL solid-phase setup.
Resumo:
The iron and steelmaking industry is among the major contributors to the anthropogenic emissions of carbon dioxide in the world. The rising levels of CO2 in the atmosphere and the global concern about the greenhouse effect and climate change have brought about considerable investigations on how to reduce the energy intensity and CO2 emissions of this industrial sector. In this thesis the problem is tackled by mathematical modeling and optimization using three different approaches. The possibility to use biomass in the integrated steel plant, particularly as an auxiliary reductant in the blast furnace, is investigated. By pre-processing the biomass its heating value and carbon content can be increased at the same time as the oxygen content is decreased. As the compression strength of the preprocessed biomass is lower than that of coke, it is not suitable for replacing a major part of the coke in the blast furnace burden. Therefore the biomass is assumed to be injected at the tuyere level of the blast furnace. Carbon capture and storage is, nowadays, mostly associated with power plants but it can also be used to reduce the CO2 emissions of an integrated steel plant. In the case of a blast furnace, the effect of CCS can be further increased by recycling the carbon dioxide stripped top gas back into the process. However, this affects the economy of the integrated steel plant, as the amount of top gases available, e.g., for power and heat production is decreased. High quality raw materials are a prerequisite for smooth blast furnace operation. High quality coal is especially needed to produce coke with sufficient properties to ensure proper gas permeability and smooth burden descent. Lower quality coals as well as natural gas, which some countries have in great volumes, can be utilized with various direct and smelting reduction processes. The DRI produced with a direct reduction process can be utilized as a feed material for blast furnace, basic oxygen furnace or electric arc furnace. The liquid hot metal from a smelting reduction process can in turn be used in basic oxygen furnace or electric arc furnace. The unit sizes and investment costs of an alternative ironmaking process are also lower than those of a blast furnace. In this study, the economy of an integrated steel plant is investigated by simulation and optimization. The studied system consists of linearly described unit processes from coke plant to steel making units, with a more detailed thermodynamical model of the blast furnace. The results from the blast furnace operation with biomass injection revealed the importance of proper pre-processing of the raw biomass as the composition of the biomass as well as the heating value and the yield are all affected by the pyrolysis temperature. As for recycling of CO2 stripped blast furnace top gas, substantial reductions in the emission rates are achieved if the stripped CO2 can be stored. However, the optimal recycling degree together with other operation conditions is heavily dependent on the cost structure of CO2 emissions and stripping/storage. The economical feasibility related to the use of DRI in the blast furnace depends on the price ratio between the DRI pellets and the BF pellets. The high amount of energy needed in the rotary hearth furnace to reduce the iron ore leads to increased CO2 emissions.
Resumo:
La technologie des microarrays demeure à ce jour un outil important pour la mesure de l'expression génique. Au-delà de la technologie elle-même, l'analyse des données provenant des microarrays constitue un problème statistique complexe, ce qui explique la myriade de méthodes proposées pour le pré-traitement et en particulier, l'analyse de l'expression différentielle. Toutefois, l'absence de données de calibration ou de méthodologie de comparaison appropriée a empêché l'émergence d'un consensus quant aux méthodes d'analyse optimales. En conséquence, la décision de l'analyste de choisir telle méthode plutôt qu'une autre se fera la plupart du temps de façon subjective, en se basant par exemple sur la facilité d'utilisation, l'accès au logiciel ou la popularité. Ce mémoire présente une approche nouvelle au problème de la comparaison des méthodes d'analyse de l'expression différentielle. Plus de 800 pipelines d'analyse sont appliqués à plus d'une centaine d'expériences sur deux plateformes Affymetrix différentes. La performance de chacun des pipelines est évaluée en calculant le niveau moyen de co-régulation par l'entremise de scores d'enrichissements pour différentes collections de signatures moléculaires. L'approche comparative proposée repose donc sur un ensemble varié de données biologiques pertinentes, ne confond pas la reproductibilité avec l'exactitude et peut facilement être appliquée à de nouvelles méthodes. Parmi les méthodes testées, la supériorité de la sommarisation FARMS et de la statistique de l'expression différentielle TREAT est sans équivoque. De plus, les résultats obtenus quant à la statistique d'expression différentielle corroborent les conclusions d'autres études récentes à propos de l'importance de prendre en compte la grandeur du changement en plus de sa significativité statistique.
Resumo:
Le Ministère des Ressources Naturelles et de la Faune (MRNF) a mandaté la compagnie de géomatique SYNETIX inc. de Montréal et le laboratoire de télédétection de l’Université de Montréal dans le but de développer une application dédiée à la détection automatique et la mise à jour du réseau routier des cartes topographiques à l’échelle 1 : 20 000 à partir de l’imagerie optique à haute résolution spatiale. À cette fin, les mandataires ont entrepris l’adaptation du progiciel SIGMA0 qu’ils avaient conjointement développé pour la mise à jour cartographique à partir d’images satellitales de résolution d’environ 5 mètres. Le produit dérivé de SIGMA0 fut un module nommé SIGMA-ROUTES dont le principe de détection des routes repose sur le balayage d’un filtre le long des vecteurs routiers de la cartographie existante. Les réponses du filtre sur des images couleurs à très haute résolution d’une grande complexité radiométrique (photographies aériennes) conduisent à l’assignation d’étiquettes selon l’état intact, suspect, disparu ou nouveau aux segments routiers repérés. L’objectif général de ce projet est d’évaluer la justesse de l’assignation des statuts ou états en quantifiant le rendement sur la base des distances totales détectées en conformité avec la référence ainsi qu’en procédant à une analyse spatiale des incohérences. La séquence des essais cible d’abord l’effet de la résolution sur le taux de conformité et dans un second temps, les gains escomptés par une succession de traitements de rehaussement destinée à rendre ces images plus propices à l’extraction du réseau routier. La démarche globale implique d’abord la caractérisation d’un site d’essai dans la région de Sherbrooke comportant 40 km de routes de diverses catégories allant du sentier boisé au large collecteur sur une superficie de 2,8 km2. Une carte de vérité terrain des voies de communication nous a permis d’établir des données de référence issues d’une détection visuelle à laquelle sont confrontés les résultats de détection de SIGMA-ROUTES. Nos résultats confirment que la complexité radiométrique des images à haute résolution en milieu urbain bénéficie des prétraitements telles que la segmentation et la compensation d’histogramme uniformisant les surfaces routières. On constate aussi que les performances présentent une hypersensibilité aux variations de résolution alors que le passage entre nos trois résolutions (84, 168 et 210 cm) altère le taux de détection de pratiquement 15% sur les distances totales en concordance avec la référence et segmente spatialement de longs vecteurs intacts en plusieurs portions alternant entre les statuts intact, suspect et disparu. La détection des routes existantes en conformité avec la référence a atteint 78% avec notre plus efficace combinaison de résolution et de prétraitements d’images. Des problèmes chroniques de détection ont été repérés dont la présence de plusieurs segments sans assignation et ignorés du processus. Il y a aussi une surestimation de fausses détections assignées suspectes alors qu’elles devraient être identifiées intactes. Nous estimons, sur la base des mesures linéaires et des analyses spatiales des détections que l’assignation du statut intact devrait atteindre 90% de conformité avec la référence après divers ajustements à l’algorithme. La détection des nouvelles routes fut un échec sans égard à la résolution ou au rehaussement d’image. La recherche des nouveaux segments qui s’appuie sur le repérage de points potentiels de début de nouvelles routes en connexion avec les routes existantes génère un emballement de fausses détections navigant entre les entités non-routières. En lien avec ces incohérences, nous avons isolé de nombreuses fausses détections de nouvelles routes générées parallèlement aux routes préalablement assignées intactes. Finalement, nous suggérons une procédure mettant à profit certaines images rehaussées tout en intégrant l’intervention humaine à quelques phases charnières du processus.
Resumo:
Les systèmes statistiques de traduction automatique ont pour tâche la traduction d’une langue source vers une langue cible. Dans la plupart des systèmes de traduction de référence, l'unité de base considérée dans l'analyse textuelle est la forme telle qu’observée dans un texte. Une telle conception permet d’obtenir une bonne performance quand il s'agit de traduire entre deux langues morphologiquement pauvres. Toutefois, ceci n'est plus vrai lorsqu’il s’agit de traduire vers une langue morphologiquement riche (ou complexe). Le but de notre travail est de développer un système statistique de traduction automatique comme solution pour relever les défis soulevés par la complexité morphologique. Dans ce mémoire, nous examinons, dans un premier temps, un certain nombre de méthodes considérées comme des extensions aux systèmes de traduction traditionnels et nous évaluons leurs performances. Cette évaluation est faite par rapport aux systèmes à l’état de l’art (système de référence) et ceci dans des tâches de traduction anglais-inuktitut et anglais-finnois. Nous développons ensuite un nouvel algorithme de segmentation qui prend en compte les informations provenant de la paire de langues objet de la traduction. Cet algorithme de segmentation est ensuite intégré dans le modèle de traduction à base d’unités lexicales « Phrase-Based Models » pour former notre système de traduction à base de séquences de segments. Enfin, nous combinons le système obtenu avec des algorithmes de post-traitement pour obtenir un système de traduction complet. Les résultats des expériences réalisées dans ce mémoire montrent que le système de traduction à base de séquences de segments proposé permet d’obtenir des améliorations significatives au niveau de la qualité de la traduction en terme de le métrique d’évaluation BLEU (Papineni et al., 2002) et qui sert à évaluer. Plus particulièrement, notre approche de segmentation réussie à améliorer légèrement la qualité de la traduction par rapport au système de référence et une amélioration significative de la qualité de la traduction est observée par rapport aux techniques de prétraitement de base (baseline).
Resumo:
Learning Disability (LD) is a general term that describes specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs his ability to carry out one or many specific tasks. The learning disabled children are neither slow nor mentally retarded. This disorder can make it problematic for a child to learn as quickly or in the same way as some child who isn't affected by a learning disability. An affected child can have normal or above average intelligence. They may have difficulty paying attention, with reading or letter recognition, or with mathematics. It does not mean that children who have learning disabilities are less intelligent. In fact, many children who have learning disabilities are more intelligent than an average child. Learning disabilities vary from child to child. One child with LD may not have the same kind of learning problems as another child with LD. There is no cure for learning disabilities and they are life-long. However, children with LD can be high achievers and can be taught ways to get around the learning disability. In this research work, data mining using machine learning techniques are used to analyze the symptoms of LD, establish interrelationships between them and evaluate the relative importance of these symptoms. To increase the diagnostic accuracy of learning disability prediction, a knowledge based tool based on statistical machine learning or data mining techniques, with high accuracy,according to the knowledge obtained from the clinical information, is proposed. The basic idea of the developed knowledge based tool is to increase the accuracy of the learning disability assessment and reduce the time used for the same. Different statistical machine learning techniques in data mining are used in the study. Identifying the important parameters of LD prediction using the data mining techniques, identifying the hidden relationship between the symptoms of LD and estimating the relative significance of each symptoms of LD are also the parts of the objectives of this research work. The developed tool has many advantages compared to the traditional methods of using check lists in determination of learning disabilities. For improving the performance of various classifiers, we developed some preprocessing methods for the LD prediction system. A new system based on fuzzy and rough set models are also developed for LD prediction. Here also the importance of pre-processing is studied. A Graphical User Interface (GUI) is designed for developing an integrated knowledge based tool for prediction of LD as well as its degree. The designed tool stores the details of the children in the student database and retrieves their LD report as and when required. The present study undoubtedly proves the effectiveness of the tool developed based on various machine learning techniques. It also identifies the important parameters of LD and accurately predicts the learning disability in school age children. This thesis makes several major contributions in technical, general and social areas. The results are found very beneficial to the parents, teachers and the institutions. They are able to diagnose the child’s problem at an early stage and can go for the proper treatments/counseling at the correct time so as to avoid the academic and social losses.
Resumo:
Data mining is one of the hottest research areas nowadays as it has got wide variety of applications in common man’s life to make the world a better place to live. It is all about finding interesting hidden patterns in a huge history data base. As an example, from a sales data base, one can find an interesting pattern like “people who buy magazines tend to buy news papers also” using data mining. Now in the sales point of view the advantage is that one can place these things together in the shop to increase sales. In this research work, data mining is effectively applied to a domain called placement chance prediction, since taking wise career decision is so crucial for anybody for sure. In India technical manpower analysis is carried out by an organization named National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises of a lead centre in the IAMR, New Delhi, and 21 nodal centres located at different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. In Nodal Centre, they collect placement information by sending postal questionnaire to passed out students on a regular basis. From this raw data available in the nodal centre, a history data base was prepared. Each record in this data base includes entrance rank ranges, reservation, Sector, Sex, and a particular engineering. From each such combination of attributes from the history data base of student records, corresponding placement chances is computed and stored in the history data base. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a particular new student with one of the above combination of criteria. Also a detailed performance comparison of the various data mining models is done.This research work proposes to use a combination of data mining models namely a hybrid stacking ensemble for better predictions. A strategy to predict the overall absorption rate for various branches as well as the time it takes for all the students of a particular branch to get placed etc are also proposed. Finally, this research work puts forward a new data mining algorithm namely C 4.5 * stat for numeric data sets which has been proved to have competent accuracy over standard benchmarking data sets called UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. As a summary this research work passes through all four dimensions for a typical data mining research work, namely application to a domain, development of classifier models, optimization and ensemble methods.
Resumo:
In the current study, epidemiology study is done by means of literature survey in groups identified to be at higher potential for DDIs as well as in other cases to explore patterns of DDIs and the factors affecting them. The structure of the FDA Adverse Event Reporting System (FAERS) database is studied and analyzed in detail to identify issues and challenges in data mining the drug-drug interactions. The necessary pre-processing algorithms are developed based on the analysis and the Apriori algorithm is modified to suit the process. Finally, the modules are integrated into a tool to identify DDIs. The results are compared using standard drug interaction database for validation. 31% of the associations obtained were identified to be new and the match with existing interactions was 69%. This match clearly indicates the validity of the methodology and its applicability to similar databases. Formulation of the results using the generic names expanded the relevance of the results to a global scale. The global applicability helps the health care professionals worldwide to observe caution during various stages of drug administration thus considerably enhancing pharmacovigilance
Resumo:
This paper underlines a methodology for translating text from English into the Dravidian language, Malayalam using statistical models. By using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. This paper also discusses a technique to improve the alignment model by incorporating the parts of speech information into the bilingual corpus. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques like suffix separation from the Malayalam corpus and stop word elimination from the bilingual corpus also proved to be effective in training. Various handcrafted rules designed for the suffix separation process which can be used as a guideline in implementing suffix separation in Malayalam language are also presented in this paper. The structural difference between the English Malayalam pair is resolved in the decoder by applying the order conversion rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics
Resumo:
A methodology for translating text from English into the Dravidian language, Malayalam using statistical models is discussed in this paper. The translator utilizes a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and generates automatically the Malayalam translation of an unseen English sentence. Various techniques to improve the alignment model by incorporating the morphological inputs into the bilingual corpus are discussed. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques like suffix separation from the Malayalam corpus and stop word elimination from the bilingual corpus also proved to be effective in producing better alignments. Difficulties in translation process that arise due to the structural difference between the English Malayalam pair is resolved in the decoding phase by applying the order conversion rules. The handcrafted rules designed for the suffix separation process which can be used as a guideline in implementing suffix separation in Malayalam language are also presented in this paper. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics
Resumo:
Presently different audio watermarking methods are available; most of them inclined towards copyright protection and copy protection. This is the key motive for the notion to develop a speaker verification scheme that guar- antees non-repudiation services and the thesis is its outcome. The research presented in this thesis scrutinizes the field of audio water- marking and the outcome is a speaker verification scheme that is proficient in addressing issues allied to non-repudiation to a great extent. This work aimed in developing novel audio watermarking schemes utilizing the fun- damental ideas of Fast-Fourier Transform (FFT) or Fast Walsh-Hadamard Transform (FWHT). The Mel-Frequency Cepstral Coefficients (MFCC) the best parametric representation of the acoustic signals along with few other key acoustic characteristics is employed in crafting of new schemes. The au- dio watermark created is entirely dependent to the acoustic features, hence named as FeatureMark and is crucial in this work. In any watermarking scheme, the quality of the extracted watermark de- pends exclusively on the pre-processing action and in this work framing and windowing techniques are involved. The theme non-repudiation provides immense significance in the audio watermarking schemes proposed in this work. Modification of the signal spectrum is achieved in a variety of ways by selecting appropriate FFT/FWHT coefficients and the watermarking schemes were evaluated for imperceptibility, robustness and capacity char- acteristics. The proposed schemes are unequivocally effective in terms of maintaining the sound quality, retrieving the embedded FeatureMark and in terms of the capacity to hold the mark bits. Robust nature of these marking schemes is achieved with the help of syn- chronization codes such as Barker Code with FFT based FeatureMarking scheme and Walsh Code with FWHT based FeatureMarking scheme. An- other important feature associated with this scheme is the employment of an encryption scheme towards the preparation of its FeatureMark that scrambles the signal features that helps to keep the signal features unreve- laed. A comparative study with the existing watermarking schemes and the ex- periments to evaluate imperceptibility, robustness and capacity tests guar- antee that the proposed schemes can be baselined as efficient audio water- marking schemes. The four new digital audio watermarking algorithms in terms of their performance are remarkable thereby opening more opportu- nities for further research.