884 resultados para data structures
Resumo:
This paper describes seagrass species and percentage cover point-based field data sets derived from georeferenced photo transects. Annually or biannually over a ten year period (2004-2015) data sets were collected using 30-50 transects, 500-800 m in length distributed across a 142 km**2 shallow, clear water seagrass habitat, the Eastern Banks, Moreton Bay, Australia. Each of the eight data sets include seagrass property information derived from approximately 3000 georeferenced, downward looking photographs captured at 2-4 m intervals along the transects. Photographs were manually interpreted to estimate seagrass species composition and percentage cover (Coral Point Count excel; CPCe). Understanding seagrass biology, ecology and dynamics for scientific and management purposes requires point-based data on species composition and cover. This data set, and the methods used to derive it are a globally unique example for seagrass ecological applications. It provides the basis for multiple further studies at this site, regional to global comparative studies, and, for the design of similar monitoring programs elsewhere.
(Table 4) Sedimentologic characteristics and a summary of diagenetic structures of ODP Hole 114-699A
Resumo:
This paper proposes the optimization relaxation approach based on the analogue Hopfield Neural Network (HNN) for cluster refinement of pre-classified Polarimetric Synthetic Aperture Radar (PolSAR) image data. We consider the initial classification provided by the maximum-likelihood classifier based on the complex Wishart distribution, which is then supplied to the HNN optimization approach. The goal is to improve the classification results obtained by the Wishart approach. The classification improvement is verified by computing a cluster separability coefficient and a measure of homogeneity within the clusters. During the HNN optimization process, for each iteration and for each pixel, two consistency coefficients are computed, taking into account two types of relations between the pixel under consideration and its corresponding neighbors. Based on these coefficients and on the information coming from the pixel itself, the pixel under study is re-classified. Different experiments are carried out to verify that the proposed approach outperforms other strategies, achieving the best results in terms of separability and a trade-off with the homogeneity preserving relevant structures in the image. The performance is also measured in terms of computational central processing unit (CPU) times.
Resumo:
In the last decade, complex networks have widely been applied to the study of many natural and man-made systems, and to the extraction of meaningful information from the interaction structures created by genes and proteins. Nevertheless, less attention has been devoted to metabonomics, due to the lack of a natural network representation of spectral data. Here we define a technique for reconstructing networks from spectral data sets, where nodes represent spectral bins, and pairs of them are connected when their intensities follow a pattern associated with a disease. The structural analysis of the resulting network can then be used to feed standard data-mining algorithms, for instance for the classification of new (unlabeled) subjects. Furthermore, we show how the structure of the network is resilient to the presence of external additive noise, and how it can be used to extract relevant knowledge about the development of the disease.
Resumo:
In the information society large amounts of information are being generated and transmitted constantly, especially in the most natural way for humans, i.e., natural language. Social networks, blogs, forums, and Q&A sites are a dynamic Large Knowledge Repository. So, Web 2.0 contains structured data but still the largest amount of information is expressed in natural language. Linguistic structures for text recognition enable the extraction of structured information from texts. However, the expressiveness of the current structures is limited as they have been designed with a strict order in their phrases, limiting their applicability to other languages and making them more sensible to grammatical errors. To overcome these limitations, in this paper we present a linguistic structure named ?linguistic schema?, with a richer expressiveness that introduces less implicit constraints over annotations.
Resumo:
System identification deals with the problem of building mathematical models of dynamical systems based on observed data from the system" [1]. In the context of civil engineering, the system refers to a large scale structure such as a building, bridge, or an offshore structure, and identification mostly involves the determination of modal parameters (the natural frequencies, damping ratios, and mode shapes). This paper presents some modal identification results obtained using a state-of-the-art time domain system identification method (data-driven stochastic subspace algorithms [2]) applied to the output-only data measured in a steel arch bridge. First, a three dimensional finite element model was developed for the numerical analysis of the structure using ANSYS. Modal analysis was carried out and modal parameters were extracted in the frequency range of interest, 0-10 Hz. The results obtained from the finite element modal analysis were used to determine the location of the sensors. After that, ambient vibration tests were conducted during April 23-24, 2009. The response of the structure was measured using eight accelerometers. Two stations of three sensors were formed (triaxial stations). These sensors were held stationary for reference during the test. The two remaining sensors were placed at the different measurement points along the bridge deck, in which only vertical and transversal measurements were conducted (biaxial stations). Point estimate and interval estimate have been carried out in the state space model using these ambient vibration measurements. In the case of parametric models (like state space), the dynamic behaviour of a system is described using mathematical models. Then, mathematical relationships can be established between modal parameters and estimated point parameters (thus, it is common to use experimental modal analysis as a synonym for system identification). Stable modal parameters are found using a stabilization diagram. Furthermore, this paper proposes a method for assessing the precision of estimates of the parameters of state-space models (confidence interval). This approach employs the nonparametric bootstrap procedure [3] and is applied to subspace parameter estimation algorithm. Using bootstrap results, a plot similar to a stabilization diagram is developed. These graphics differentiate system modes from spurious noise modes for a given order system. Additionally, using the modal assurance criterion, the experimental modes obtained have been compared with those evaluated from a finite element analysis. A quite good agreement between numerical and experimental results is observed.
Resumo:
For the past 20 years, dynamic analysis of shells has been one of the most fascinating fields for research. Using the new light materials the building engineer soon discovered that the subsequent reduction of gravity forces produced not only the desired shape freedom but the appearance of ecologic loads as the first factor of design; loads which present strong random properties and marked dynamic influence. On the other hand, the technological advance in the aeronautical and astronautical field placed the engineers in front of shell structures of nonconventional shape and able to sustain substantialy dynamic loads. The response to the increasingly challenger problems of the last two decades has been very bright; new forms, new materials and new methods of analysis have arosen in the design of off-shore platforms, nuclear vessels, space crafts, etc. Thanks to the intensity of the lived years we have at our disposition a coherent and homogeneous amount of knowledge which enable us to face problems of inconceivable complexity when IASS was founded. The open minded approach to classical problems and the impact of the computer are, probably, important factors in the Renaissance we have enjoyed these years, and a good proof of this are the papers presented to the previous IASS meetings as well as that we are going to consider in this one. Particularly striking is the great number of papers based on a mathematical modeling in front of the meagerness of those treating laboratory experiments on physical models. The universal entering of the computer into almost every phase of our lifes, and the cost of physical models, are –may be- reasons for this lack of experimental methods. Nevertheless they continue offering useful results as are those obtained with the shaking-table in which the computer plays an essential role in the application of loads as well as in the instantaneous treatment of control data. Plates 1 and 2 record the papers presented under dynamic heading, 40% of them are from Japan in good correlation with the relevance that Japanese research has traditionally showed in this area. Also interesting is to find old friends as profesors Tanaka, Nishimura and Kostem who presented valuable papers in previous IASS conferences. As we see there are papers representative of all tendencies, even purely analytical! Better than discuss them in detail, which can be done after the authors presentation, I think we can comment in the general pattern of the dynamical approach are summarized in plate 3.
Resumo:
In this paper we develop new techniques for revealing geometrical structures in phase space that are valid for aperiodically time dependent dynamical systems, which we refer to as Lagrangian descriptors. These quantities are based on the integration, for a finite time, along trajectories of an intrinsic bounded, positive geometrical and/or physical property of the trajectory itself. We discuss a general methodology for constructing Lagrangian descriptors, and we discuss a “heuristic argument” that explains why this method is successful for revealing geometrical structures in the phase space of a dynamical system. We support this argument by explicit calculations on a benchmark problem having a hyperbolic fixed point with stable and unstable manifolds that are known analytically. Several other benchmark examples are considered that allow us the assess the performance of Lagrangian descriptors in revealing invariant tori and regions of shear. Throughout the paper “side-by-side” comparisons of the performance of Lagrangian descriptors with both finite time Lyapunov exponents (FTLEs) and finite time averages of certain components of the vector field (“time averages”) are carried out and discussed. In all cases Lagrangian descriptors are shown to be both more accurate and computationally efficient than these methods. We also perform computations for an explicitly three dimensional, aperiodically time-dependent vector field and an aperiodically time dependent vector field defined as a data set. Comparisons with FTLEs and time averages for these examples are also carried out, with similar conclusions as for the benchmark examples.
Resumo:
Hoy en día, con la evolución continua y rápida de las tecnologías de la información y los dispositivos de computación, se recogen y almacenan continuamente grandes volúmenes de datos en distintos dominios y a través de diversas aplicaciones del mundo real. La extracción de conocimiento útil de una cantidad tan enorme de datos no se puede realizar habitualmente de forma manual, y requiere el uso de técnicas adecuadas de aprendizaje automático y de minería de datos. La clasificación es una de las técnicas más importantes que ha sido aplicada con éxito a varias áreas. En general, la clasificación se compone de dos pasos principales: en primer lugar, aprender un modelo de clasificación o clasificador a partir de un conjunto de datos de entrenamiento, y en segundo lugar, clasificar las nuevas instancias de datos utilizando el clasificador aprendido. La clasificación es supervisada cuando todas las etiquetas están presentes en los datos de entrenamiento (es decir, datos completamente etiquetados), semi-supervisada cuando sólo algunas etiquetas son conocidas (es decir, datos parcialmente etiquetados), y no supervisada cuando todas las etiquetas están ausentes en los datos de entrenamiento (es decir, datos no etiquetados). Además, aparte de esta taxonomía, el problema de clasificación se puede categorizar en unidimensional o multidimensional en función del número de variables clase, una o más, respectivamente; o también puede ser categorizado en estacionario o cambiante con el tiempo en función de las características de los datos y de la tasa de cambio subyacente. A lo largo de esta tesis, tratamos el problema de clasificación desde tres perspectivas diferentes, a saber, clasificación supervisada multidimensional estacionaria, clasificación semisupervisada unidimensional cambiante con el tiempo, y clasificación supervisada multidimensional cambiante con el tiempo. Para llevar a cabo esta tarea, hemos usado básicamente los clasificadores Bayesianos como modelos. La primera contribución, dirigiéndose al problema de clasificación supervisada multidimensional estacionaria, se compone de dos nuevos métodos de aprendizaje de clasificadores Bayesianos multidimensionales a partir de datos estacionarios. Los métodos se proponen desde dos puntos de vista diferentes. El primer método, denominado CB-MBC, se basa en una estrategia de envoltura de selección de variables que es voraz y hacia delante, mientras que el segundo, denominado MB-MBC, es una estrategia de filtrado de variables con una aproximación basada en restricciones y en el manto de Markov. Ambos métodos han sido aplicados a dos problemas reales importantes, a saber, la predicción de los inhibidores de la transcriptasa inversa y de la proteasa para el problema de infección por el virus de la inmunodeficiencia humana tipo 1 (HIV-1), y la predicción del European Quality of Life-5 Dimensions (EQ-5D) a partir de los cuestionarios de la enfermedad de Parkinson con 39 ítems (PDQ-39). El estudio experimental incluye comparaciones de CB-MBC y MB-MBC con los métodos del estado del arte de la clasificación multidimensional, así como con métodos comúnmente utilizados para resolver el problema de predicción de la enfermedad de Parkinson, a saber, la regresión logística multinomial, mínimos cuadrados ordinarios, y mínimas desviaciones absolutas censuradas. En ambas aplicaciones, los resultados han sido prometedores con respecto a la precisión de la clasificación, así como en relación al análisis de las estructuras gráficas que identifican interacciones conocidas y novedosas entre las variables. La segunda contribución, referida al problema de clasificación semi-supervisada unidimensional cambiante con el tiempo, consiste en un método nuevo (CPL-DS) para clasificar flujos de datos parcialmente etiquetados. Los flujos de datos difieren de los conjuntos de datos estacionarios en su proceso de generación muy rápido y en su aspecto de cambio de concepto. Es decir, los conceptos aprendidos y/o la distribución subyacente están probablemente cambiando y evolucionando en el tiempo, lo que hace que el modelo de clasificación actual sea obsoleto y deba ser actualizado. CPL-DS utiliza la divergencia de Kullback-Leibler y el método de bootstrapping para cuantificar y detectar tres tipos posibles de cambio: en las predictoras, en la a posteriori de la clase o en ambas. Después, si se detecta cualquier cambio, un nuevo modelo de clasificación se aprende usando el algoritmo EM; si no, el modelo de clasificación actual se mantiene sin modificaciones. CPL-DS es general, ya que puede ser aplicado a varios modelos de clasificación. Usando dos modelos diferentes, el clasificador naive Bayes y la regresión logística, CPL-DS se ha probado con flujos de datos sintéticos y también se ha aplicado al problema real de la detección de código malware, en el cual los nuevos ficheros recibidos deben ser continuamente clasificados en malware o goodware. Los resultados experimentales muestran que nuestro método es efectivo para la detección de diferentes tipos de cambio a partir de los flujos de datos parcialmente etiquetados y también tiene una buena precisión de la clasificación. Finalmente, la tercera contribución, sobre el problema de clasificación supervisada multidimensional cambiante con el tiempo, consiste en dos métodos adaptativos, a saber, Locally Adpative-MB-MBC (LA-MB-MBC) y Globally Adpative-MB-MBC (GA-MB-MBC). Ambos métodos monitorizan el cambio de concepto a lo largo del tiempo utilizando la log-verosimilitud media como métrica y el test de Page-Hinkley. Luego, si se detecta un cambio de concepto, LA-MB-MBC adapta el actual clasificador Bayesiano multidimensional localmente alrededor de cada nodo cambiado, mientras que GA-MB-MBC aprende un nuevo clasificador Bayesiano multidimensional. El estudio experimental realizado usando flujos de datos sintéticos multidimensionales indica los méritos de los métodos adaptativos propuestos. ABSTRACT Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques that has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from an available training data, and secondly, classify the new incoming unseen data instances using the learned classifier. Classification is supervised when the whole class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when the whole class values are missing in the training data (i.e., unlabeled data). In addition, besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or can be also categorized into stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Through this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised unidimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both considered case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures identifying known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from the stationary data sets by their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely changing and evolving over time, which makes the current classification model out-of-date requiring to be updated. CPL-DS uses the Kullback-Leibler divergence and bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general as it can be applied to several classification models. Using two different models, namely, naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where the new received files should be continuously classified into malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, as well as having a good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. Experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
Resumo:
1. Canopies are complex multilayered structures comprising individual plant crowns exposing a multifaceted surface area to sunlight. Foliage arrangement and properties are the main mediators of canopy functions. The leaves act as light traps whose exposure to sunlight varies with time of the day, date and latitude in a trade-off between photosynthetic light harvesting and excessive or photoinhibitory light avoidance. To date, ecological research based upon leaf sampling has been limited by the available echnology, with which data acquisition becomes labour intensive and time-consuming, given the verwhelming number of leaves involved. 2. In the present study, our goal involved developing a tool capable of easuring a sufficient number of leaves to enable analysis of leaf populations, tree crowns and canopies.We specifically tested whether a cell phone working as a 3Dpointer could yield reliable, repeatable and valid leaf anglemeasurements with a simple gesture. We evaluated the accuracy of this method under controlled conditions, using a 3D digitizer, and we compared performance in the field with the methods commonly used. We presented an equation to estimate the potential proportion of the leaf exposed to direct sunlight (SAL) at any given time and compared the results with those obtained bymeans of a graphicalmethod. 3. We found a strong and highly significant correlation between the graphical methods and the equation presented. The calibration process showed a strong correlation between the results derived from the two methods with amean relative difference below 10%. Themean relative difference in calculation of instantaneous exposure was below 5%. Our device performed equally well in diverse locations, in which we characterized over 700 leaves in a single day. 4. The newmethod, involving the use of a cell phone, ismuchmore effective than the traditionalmethods or digitizers when the goal is to scale up from leaf position to performance of leaf populations, tree crowns or canopies. Our methodology constitutes an affordable and valuable tool within which to frame a wide range of ecological hypotheses and to support canopy modelling approaches.
Resumo:
The aim of this paper is to explain the chloride concentration profiles obtained experimentally from control samples of an offshore platform after 25 years of service life. The platform is located 12 km off the coast of the Brazilian province Rio Grande do Norte, in the north-east of Brazil. The samples were extracted at different orientations and heights above mean sea level. A simple model based on Fick’s second law is considered and compared with a finite element model which takes into account transport of chloride ions by diffusion and convection. Results show that convective flows significantly affect the studied chloride penetrations. The convection velocity is obtained by fitting the finite element solution to the experimental data and seems to be directly proportional to the height above mean sea level and also seems to depend on the orientation of the face of the platform. This work shows that considering solely diffusion as transport mechanism does not allow a good prediction of the chloride profiles. Accounting for capillary suction due to moisture gradients permits a better interpretation of the material’s behaviour.
Resumo:
Nowadays, in order to take advantage of fiber optic bandwidth, any optical communications system tends to be WDM. The way to extract a channel, characterized by a wavelength, from the optical fiber is to filter the specific wavelength. This gives the systems a low degree of freedom due to the fact of the static character of most of the employed devices. In this paper we will present a different way to extract channels from an optical fiber with WDM transmission. The employed method is based on an Optically Programmable Logic Cells (OPLC) previously published by us, for other applications as a chaotic generator or as basic element for optical computing. In this paper we will describe the configuration of the OPLC to be employed as a dropping device. It acts as a filter because it will extract the data carried by a concrete wavelength. It does depend, internally, on the wavelength. We will show how the intensity of the signal is able to select the chosen information from the line. It will be also demonstrated that a new idea of redundant information it is the way of selecting the concrete wavelength. As a matter of fact this idea is apparently the only way to use the OPLC as a dropping device. Moreover, based on these concepts, a similar way to route signals to different routes is reported. The basis is the use of photonic switching configurations, namely Batcher or Bayan structures, where the unit switching cells are the above indicated OPLCs.