967 resultados para Classification models


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Original method and technology of systemological «Unit-Function-Object» analysis for solving complete ill-structured problems is proposed. The given visual grapho-analytical UFO technology for the fist time combines capabilities and advantages of the system and object approaches and can be used for business reengineering and for information systems design. UFO- technology procedures are formalized by pattern-theory methods and developed by embedding systemological conceptual classification models into the system-object analysis and software tools. Technology is based on natural classification and helps to investigate deep semantic regularities of subject domain and to take proper account of system-classes essential properties the most objectively. Systemological knowledge models are based on method which for the first time synthesizes system and classification analysis. It allows creating CASE-toolkit of a new generation for organizational modelling for companies’ sustainable development and competitive advantages providing.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

As traffic congestion exuberates and new roadway construction is severely constrained because of limited availability of land, high cost of land acquisition, and communities' opposition to the building of major roads, new solutions have to be sought to either make roadway use more efficient or reduce travel demand. There is a general agreement that travel demand is affected by land use patterns. However, traditional aggregate four-step models, which are the prevailing modeling approach presently, assume that traffic condition will not affect people's decision on whether to make a trip or not when trip generation is estimated. Existing survey data indicate, however, that differences exist in trip rates for different geographic areas. The reasons for such differences have not been carefully studied, and the success of quantifying the influence of land use on travel demand beyond employment, households, and their characteristics has been limited to be useful to the traditional four-step models. There may be a number of reasons, such as that the representation of influence of land use on travel demand is aggregated and is not explicit and that land use variables such as density and mix and accessibility as measured by travel time and congestion have not been adequately considered. This research employs the artificial neural network technique to investigate the potential effects of land use and accessibility on trip productions. Sixty two variables that may potentially influence trip production are studied. These variables include demographic, socioeconomic, land use and accessibility variables. Different architectures of ANN models are tested. Sensitivity analysis of the models shows that land use does have an effect on trip production, so does traffic condition. The ANN models are compared with linear regression models and cross-classification models using the same data. The results show that ANN models are better than the linear regression models and cross-classification models in terms of RMSE. Future work may focus on finding a representation of traffic condition with existing network data and population data which might be available when the variables are needed to in prediction.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Effective conservation and management of top predators requires a comprehensive understanding of their distributions and of the underlying biological and physical processes that affect these distributions. The Mid-Atlantic Bight shelf break system is a dynamic and productive region where at least 32 species of cetaceans have been recorded through various systematic and opportunistic marine mammal surveys from the 1970s through 2012. My dissertation characterizes the spatial distribution and habitat of cetaceans in the Mid-Atlantic Bight shelf break system by utilizing marine mammal line-transect survey data, synoptic multi-frequency active acoustic data, and fine-scale hydrographic data collected during the 2011 summer Atlantic Marine Assessment Program for Protected Species (AMAPPS) survey. Although studies describing cetacean habitat and distributions have been previously conducted in the Mid-Atlantic Bight, my research specifically focuses on the shelf break region to elucidate both the physical and biological processes that influence cetacean distribution patterns within this cetacean hotspot.

In Chapter One I review biologically important areas for cetaceans in the Atlantic waters of the United States. I describe the study area, the shelf break region of the Mid-Atlantic Bight, in terms of the general oceanography, productivity and biodiversity. According to recent habitat-based cetacean density models, the shelf break region is an area of high cetacean abundance and density, yet little research is directed at understanding the mechanisms that establish this region as a cetacean hotspot.

In Chapter Two I present the basic physical principles of sound in water and describe the methodology used to categorize opportunistically collected multi-frequency active acoustic data using frequency responses techniques. Frequency response classification methods are usually employed in conjunction with net-tow data, but the logistics of the 2011 AMAPPS survey did not allow for appropriate net-tow data to be collected. Biologically meaningful information can be extracted from acoustic scattering regions by comparing the frequency response curves of acoustic regions to theoretical curves of known scattering models. Using the five frequencies on the EK60 system (18, 38, 70, 120, and 200 kHz), three categories of scatterers were defined: fish-like (with swim bladder), nekton-like (e.g., euphausiids), and plankton-like (e.g., copepods). I also employed a multi-frequency acoustic categorization method using three frequencies (18, 38, and 120 kHz) that has been used in the Gulf of Maine and Georges Bank which is based the presence or absence of volume backscatter above a threshold. This method is more objective than the comparison of frequency response curves because it uses an established backscatter value for the threshold. By removing all data below the threshold, only strong scattering information is retained.

In Chapter Three I analyze the distribution of the categorized acoustic regions of interest during the daytime cross shelf transects. Over all transects, plankton-like acoustic regions of interest were detected most frequently, followed by fish-like acoustic regions and then nekton-like acoustic regions. Plankton-like detections were the only significantly different acoustic detections per kilometer, although nekton-like detections were only slightly not significant. Using the threshold categorization method by Jech and Michaels (2006) provides a more conservative and discrete detection of acoustic scatterers and allows me to retrieve backscatter values along transects in areas that have been categorized. This provides continuous data values that can be integrated at discrete spatial increments for wavelet analysis. Wavelet analysis indicates significant spatial scales of interest for fish-like and nekton-like acoustic backscatter range from one to four kilometers and vary among transects.

In Chapter Four I analyze the fine scale distribution of cetaceans in the shelf break system of the Mid-Atlantic Bight using corrected sightings per trackline region, classification trees, multidimensional scaling, and random forest analysis. I describe habitat for common dolphins, Risso’s dolphins and sperm whales. From the distribution of cetacean sightings, patterns of habitat start to emerge: within the shelf break region of the Mid-Atlantic Bight, common dolphins were sighted more prevalently over the shelf while sperm whales were more frequently found in the deep waters offshore and Risso’s dolphins were most prevalent at the shelf break. Multidimensional scaling presents clear environmental separation among common dolphins and Risso’s dolphins and sperm whales. The sperm whale random forest habitat model had the lowest misclassification error (0.30) and the Risso’s dolphin random forest habitat model had the greatest misclassification error (0.37). Shallow water depth (less than 148 meters) was the primary variable selected in the classification model for common dolphin habitat. Distance to surface density fronts and surface temperature fronts were the primary variables selected in the classification models to describe Risso’s dolphin habitat and sperm whale habitat respectively. When mapped back into geographic space, these three cetacean species occupy different fine-scale habitats within the dynamic Mid-Atlantic Bight shelf break system.

In Chapter Five I present a summary of the previous chapters and present potential analytical steps to address ecological questions pertaining the dynamic shelf break region. Taken together, the results of my dissertation demonstrate the use of opportunistically collected data in ecosystem studies; emphasize the need to incorporate middle trophic level data and oceanographic features into cetacean habitat models; and emphasize the importance of developing more mechanistic understanding of dynamic ecosystems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The routine analysis for quantization of organic acids and sugars are generally slow methods that involve the use and preparation of several reagents, require trained professional, the availability of special equipment and is expensive. In this context, it has been increasing investment in research whose purpose is the development of substitutive methods to reference, which are faster, cheap and simple, and infrared spectroscopy have been highlighted in this regard. The present study developed multivariate calibration models for the simultaneous and quantitative determination of ascorbic acid, citric, malic and tartaric and sugars sucrose, glucose and fructose, and soluble solids in juices and fruit nectars and classification models for ACP. We used methods of spectroscopy in the near infrared (Near Infrared, NIR) in association with the method regression of partial least squares (PLS). Were used 42 samples between juices and fruit nectars commercially available in local shops. For the construction of the models were performed with reference analysis using high-performance liquid chromatography (HPLC) and refractometry for the analysis of soluble solids. Subsequently, the acquisition of the spectra was done in triplicate, in the spectral range 12500 to 4000 cm-1. The best models were applied to the quantification of analytes in study on natural juices and juice samples produced in the Paraná Southwest Region. The juices used in the application of the models also underwent physical and chemical analysis. Validation of chromatographic methodology has shown satisfactory results, since the external calibration curve obtained R-square value (R2) above 0.98 and coefficient of variation (%CV) for intermediate precision and repeatability below 8.83%. Through the Principal Component Analysis (PCA) was possible to separate samples of juices into two major groups, grape and apple and tangerine and orange, while for nectars groups separated guava and grape, and pineapple and apple. Different validation methods, and pre-processes that were used separately and in combination, were obtained with multivariate calibration models with average forecast square error (RMSEP) and cross validation (RMSECV) errors below 1.33 and 1.53 g.100 mL-1, respectively and R2 above 0.771, except for malic acid. The physicochemical analysis enabled the characterization of drinks, including the pH working range (variation of 2.83 to 5.79) and acidity within the parameters Regulation for each flavor. Regression models have demonstrated the possibility of determining both ascorbic acids, citric, malic and tartaric with successfully, besides sucrose, glucose and fructose by means of only a spectrum, suggesting that the models are economically viable for quality control and product standardization in the fruit juice and nectars processing industry.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Nowadays it is still difficult to perform an early and accurate diagnosis of dementia, therefore many research focus on the finding of new dementia biomarkers that can aid in that purpose. So scientists try to find a noninvasive, rapid, and relatively inexpensive procedures for early diagnosis purpose. Several studies demonstrated that the utilization of spectroscopic techniques, such as Fourier Transform Infrared Spectroscopy (FTIR) and Raman spectroscopy could be an useful and accurate procedure to diagnose dementia. As several biochemical mechanisms related to neurodegeneration and dementia can lead to changes in plasma components and others peripheral body fluids, blood-based samples and spectroscopic analyses can be used as a more simple and less invasive technique. This work is intended to confirm some of the hypotheses of previous studies in which FTIR was used in the study of plasma samples of possible patient with AD and respective controls and verify the reproducibility of this spectroscopic technique in the analysis of such samples. Through the spectroscopic analysis combined with multivariate analysis it is possible to discriminate controls and demented samples and identify key spectroscopic differences between these two groups of samples which allows the identification of metabolites altered in this disease. It can be concluded that there are three spectral regions, 3500-2700 cm -1, 1800-1400 cm-1 and 1200-900 cm-1 where it can be extracted relevant spectroscopic information. In the first region, the main conclusion that is possible to take is that there is an unbalance between the content of saturated and unsaturated lipids. In the 1800-1400 cm-1 region it is possible to see the presence of protein aggregates and the change in protein conformation for highly stable parallel β-sheet. The last region showed the presence of products of lipid peroxidation related to impairment of membranes, and nucleic acids oxidative damage. FTIR technique and the information gathered in this work can be used in the construction of classification models that may be used for the diagnosis of cognitive dysfunction.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Doutoramento em Engenharia Florestal e dos Recursos Naturais - Instituto Superior de Agronomia - UL

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In order to reduce serious health incidents, individuals with high risks need to be identified as early as possible so that effective intervention and preventive care can be provided. This requires regular and efficient assessments of risk within communities that are the first point of contacts for individuals. Clinical Decision Support Systems CDSSs have been developed to help with the task of risk assessment, however such systems and their underpinning classification models are tailored towards those with clinical expertise. Communities where regular risk assessments are required lack such expertise. This paper presents the continuation of GRiST research team efforts to disseminate clinical expertise to communities. Based on our earlier published findings, this paper introduces the framework and skeleton for a data collection and risk classification model that evaluates data redundancy in real-time, detects the risk-informative data and guides the risk assessors towards collecting those data. By doing so, it enables non-experts within the communities to conduct reliable Mental Health risk triage.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The aim of this thesis project is to automatically localize HCC tumors in the human liver and subsequently predict if the tumor will undergo microvascular infiltration (MVI), the initial stage of metastasis development. The input data for the work have been partially supplied by Sant'Orsola Hospital and partially downloaded from online medical databases. Two Unet models have been implemented for the automatic segmentation of the livers and the HCC malignancies within it. The segmentation models have been evaluated with the Intersection-over-Union and the Dice Coefficient metrics. The outcomes obtained for the liver automatic segmentation are quite good (IOU = 0.82; DC = 0.35); the outcomes obtained for the tumor automatic segmentation (IOU = 0.35; DC = 0.46) are, instead, affected by some limitations: it can be state that the algorithm is almost always able to detect the location of the tumor, but it tends to underestimate its dimensions. The purpose is to achieve the CT images of the HCC tumors, necessary for features extraction. The 14 Haralick features calculated from the 3D-GLCM, the 120 Radiomic features and the patients' clinical information are collected to build a dataset of 153 features. Now, the goal is to build a model able to discriminate, based on the features given, the tumors that will undergo MVI and those that will not. This task can be seen as a classification problem: each tumor needs to be classified either as “MVI positive” or “MVI negative”. Techniques for features selection are implemented to identify the most descriptive features for the problem at hand and then, a set of classification models are trained and compared. Among all, the models with the best performances (around 80-84% ± 8-15%) result to be the XGBoost Classifier, the SDG Classifier and the Logist Regression models (without penalization and with Lasso, Ridge or Elastic Net penalization).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Intelligent systems are currently inherent to the society, supporting a synergistic human-machine collaboration. Beyond economical and climate factors, energy consumption is strongly affected by the performance of computing systems. The quality of software functioning may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, being their interpretability one of the most important features of computational systems. Software maintenance is a critical discipline to support automatic and life-long system operation. As most software registers its inner events by means of logs, log analysis is an approach to keep system operation. Logs are characterized as Big data assembled in large-flow streams, being unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods to provide maintenance solutions applied to anomaly detection (AD) and log parsing (LP), dealing with data uncertainty, identifying ideal time periods for detailed software analyses. LP provides deeper semantics interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely, Fuzzy set-Based evolving Model (FBeM), evolving Granular Neural Network (eGNN), and evolving Gaussian Fuzzy Classifier (eGFC), are compared considering the AD problem. The evolving Log Parsing (eLP) method is proposed to approach the automatic parsing applied to system logs. All the methods perform recursive mechanisms to create, update, merge, and delete information granules according with the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences. Essentially, regarding to AD accuracy, FBeM achieved (85.64+-3.69)%; eGNN reached (96.17+-0.78)%; eGFC obtained (92.48+-1.21)%; and eLP reached (96.05+-1.04)%. Besides being competitive, eLP particularly generates a log grammar, and presents a higher level of model interpretability.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Difficult tracheal intubation assessment is an important research topic in anesthesia as failed intubations are important causes of mortality in anesthetic practice. The modified Mallampati score is widely used, alone or in conjunction with other criteria, to predict the difficulty of intubation. This work presents an automatic method to assess the modified Mallampati score from an image of a patient with the mouth wide open. For this purpose we propose an active appearance models (AAM) based method and use linear support vector machines (SVM) to select a subset of relevant features obtained using the AAM. This feature selection step proves to be essential as it improves drastically the performance of classification, which is obtained using SVM with RBF kernel and majority voting. We test our method on images of 100 patients undergoing elective surgery and achieve 97.9% accuracy in the leave-one-out crossvalidation test and provide a key element to an automatic difficult intubation assessment system.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

BACKGROUND: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. METHODS: Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. RESULTS: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60-80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. CONCLUSIONS: There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Co-training is a semi-supervised learning method that is designed to take advantage of the redundancy that is present when the object to be identified has multiple descriptions. Co-training is known to work well when the multiple descriptions are conditional independent given the class of the object. The presence of multiple descriptions of objects in the form of text, images, audio and video in multimedia applications appears to provide redundancy in the form that may be suitable for co-training. In this paper, we investigate the suitability of utilizing text and image data from the Web for co-training. We perform measurements to find indications of conditional independence in the texts and images obtained from the Web. Our measurements suggest that conditional independence is likely to be present in the data. Our experiments, within a relevance feedback framework to test whether a method that exploits the conditional independence outperforms methods that do not, also indicate that better performance can indeed be obtained by designing algorithms that exploit this form of the redundancy when it is present.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Bloom-forming and toxin-producing cyanobacteria remain a persistent nuisance across the world. Modelling of cyanobacteria in freshwaters is an important tool for understanding their population dynamics and predicting bloom occurrence in lakes and rivers. In this paper existing key models of cyanobacteria are reviewed, evaluated and classified. Two major groups emerge: deterministic mathematical and artificial neural network models. Mathematical models can be further subcategorized into those models concerned with impounded water bodies and those concerned with rivers. Most existing models focus on a single aspect such as the growth of transport mechanisms, but there are a few models which couple both.