920 resultados para Semi-complete Data Synchronization
Resumo:
This paper describes seagrass species and percentage cover point-based field data sets derived from georeferenced photo transects. Annually or biannually over a ten year period (2004-2015) data sets were collected using 30-50 transects, 500-800 m in length distributed across a 142 km**2 shallow, clear water seagrass habitat, the Eastern Banks, Moreton Bay, Australia. Each of the eight data sets include seagrass property information derived from approximately 3000 georeferenced, downward looking photographs captured at 2-4 m intervals along the transects. Photographs were manually interpreted to estimate seagrass species composition and percentage cover (Coral Point Count excel; CPCe). Understanding seagrass biology, ecology and dynamics for scientific and management purposes requires point-based data on species composition and cover. This data set, and the methods used to derive it are a globally unique example for seagrass ecological applications. It provides the basis for multiple further studies at this site, regional to global comparative studies, and, for the design of similar monitoring programs elsewhere.
Resumo:
Machine learning techniques are used for extracting valuable knowledge from data. Nowa¬days, these techniques are becoming even more important due to the evolution in data ac¬quisition and storage, which is leading to data with different characteristics that must be exploited. Therefore, advances in data collection must be accompanied with advances in machine learning techniques to solve new challenges that might arise, on both academic and real applications. There are several machine learning techniques depending on both data characteristics and purpose. Unsupervised classification or clustering is one of the most known techniques when data lack of supervision (unlabeled data) and the aim is to discover data groups (clusters) according to their similarity. On the other hand, supervised classification needs data with supervision (labeled data) and its aim is to make predictions about labels of new data. The presence of data labels is a very important characteristic that guides not only the learning task but also other related tasks such as validation. When only some of the available data are labeled whereas the others remain unlabeled (partially labeled data), neither clustering nor supervised classification can be used. This scenario, which is becoming common nowadays because of labeling process ignorance or cost, is tackled with semi-supervised learning techniques. This thesis focuses on the branch of semi-supervised learning closest to clustering, i.e., to discover clusters using available labels as support to guide and improve the clustering process. Another important data characteristic, different from the presence of data labels, is the relevance or not of data features. Data are characterized by features, but it is possible that not all of them are relevant, or equally relevant, for the learning process. A recent clustering tendency, related to data relevance and called subspace clustering, claims that different clusters might be described by different feature subsets. This differs from traditional solutions to data relevance problem, where a single feature subset (usually the complete set of original features) is found and used to perform the clustering process. The proximity of this work to clustering leads to the first goal of this thesis. As commented above, clustering validation is a difficult task due to the absence of data labels. Although there are many indices that can be used to assess the quality of clustering solutions, these validations depend on clustering algorithms and data characteristics. Hence, in the first goal three known clustering algorithms are used to cluster data with outliers and noise, to critically study how some of the most known validation indices behave. The main goal of this work is however to combine semi-supervised clustering with subspace clustering to obtain clustering solutions that can be correctly validated by using either known indices or expert opinions. Two different algorithms are proposed from different points of view to discover clusters characterized by different subspaces. For the first algorithm, available data labels are used for searching for subspaces firstly, before searching for clusters. This algorithm assigns each instance to only one cluster (hard clustering) and is based on mapping known labels to subspaces using supervised classification techniques. Subspaces are then used to find clusters using traditional clustering techniques. The second algorithm uses available data labels to search for subspaces and clusters at the same time in an iterative process. This algorithm assigns each instance to each cluster based on a membership probability (soft clustering) and is based on integrating known labels and the search for subspaces into a model-based clustering approach. The different proposals are tested using different real and synthetic databases, and comparisons to other methods are also included when appropriate. Finally, as an example of real and current application, different machine learning tech¬niques, including one of the proposals of this work (the most sophisticated one) are applied to a task of one of the most challenging biological problems nowadays, the human brain model¬ing. Specifically, expert neuroscientists do not agree with a neuron classification for the brain cortex, which makes impossible not only any modeling attempt but also the day-to-day work without a common way to name neurons. Therefore, machine learning techniques may help to get an accepted solution to this problem, which can be an important milestone for future research in neuroscience. Resumen Las técnicas de aprendizaje automático se usan para extraer información valiosa de datos. Hoy en día, la importancia de estas técnicas está siendo incluso mayor, debido a que la evolución en la adquisición y almacenamiento de datos está llevando a datos con diferentes características que deben ser explotadas. Por lo tanto, los avances en la recolección de datos deben ir ligados a avances en las técnicas de aprendizaje automático para resolver nuevos retos que pueden aparecer, tanto en aplicaciones académicas como reales. Existen varias técnicas de aprendizaje automático dependiendo de las características de los datos y del propósito. La clasificación no supervisada o clustering es una de las técnicas más conocidas cuando los datos carecen de supervisión (datos sin etiqueta), siendo el objetivo descubrir nuevos grupos (agrupaciones) dependiendo de la similitud de los datos. Por otra parte, la clasificación supervisada necesita datos con supervisión (datos etiquetados) y su objetivo es realizar predicciones sobre las etiquetas de nuevos datos. La presencia de las etiquetas es una característica muy importante que guía no solo el aprendizaje sino también otras tareas relacionadas como la validación. Cuando solo algunos de los datos disponibles están etiquetados, mientras que el resto permanece sin etiqueta (datos parcialmente etiquetados), ni el clustering ni la clasificación supervisada se pueden utilizar. Este escenario, que está llegando a ser común hoy en día debido a la ignorancia o el coste del proceso de etiquetado, es abordado utilizando técnicas de aprendizaje semi-supervisadas. Esta tesis trata la rama del aprendizaje semi-supervisado más cercana al clustering, es decir, descubrir agrupaciones utilizando las etiquetas disponibles como apoyo para guiar y mejorar el proceso de clustering. Otra característica importante de los datos, distinta de la presencia de etiquetas, es la relevancia o no de los atributos de los datos. Los datos se caracterizan por atributos, pero es posible que no todos ellos sean relevantes, o igualmente relevantes, para el proceso de aprendizaje. Una tendencia reciente en clustering, relacionada con la relevancia de los datos y llamada clustering en subespacios, afirma que agrupaciones diferentes pueden estar descritas por subconjuntos de atributos diferentes. Esto difiere de las soluciones tradicionales para el problema de la relevancia de los datos, en las que se busca un único subconjunto de atributos (normalmente el conjunto original de atributos) y se utiliza para realizar el proceso de clustering. La cercanía de este trabajo con el clustering lleva al primer objetivo de la tesis. Como se ha comentado previamente, la validación en clustering es una tarea difícil debido a la ausencia de etiquetas. Aunque existen muchos índices que pueden usarse para evaluar la calidad de las soluciones de clustering, estas validaciones dependen de los algoritmos de clustering utilizados y de las características de los datos. Por lo tanto, en el primer objetivo tres conocidos algoritmos se usan para agrupar datos con valores atípicos y ruido para estudiar de forma crítica cómo se comportan algunos de los índices de validación más conocidos. El objetivo principal de este trabajo sin embargo es combinar clustering semi-supervisado con clustering en subespacios para obtener soluciones de clustering que puedan ser validadas de forma correcta utilizando índices conocidos u opiniones expertas. Se proponen dos algoritmos desde dos puntos de vista diferentes para descubrir agrupaciones caracterizadas por diferentes subespacios. Para el primer algoritmo, las etiquetas disponibles se usan para bus¬car en primer lugar los subespacios antes de buscar las agrupaciones. Este algoritmo asigna cada instancia a un único cluster (hard clustering) y se basa en mapear las etiquetas cono-cidas a subespacios utilizando técnicas de clasificación supervisada. El segundo algoritmo utiliza las etiquetas disponibles para buscar de forma simultánea los subespacios y las agru¬paciones en un proceso iterativo. Este algoritmo asigna cada instancia a cada cluster con una probabilidad de pertenencia (soft clustering) y se basa en integrar las etiquetas conocidas y la búsqueda en subespacios dentro de clustering basado en modelos. Las propuestas son probadas utilizando diferentes bases de datos reales y sintéticas, incluyendo comparaciones con otros métodos cuando resulten apropiadas. Finalmente, a modo de ejemplo de una aplicación real y actual, se aplican diferentes técnicas de aprendizaje automático, incluyendo una de las propuestas de este trabajo (la más sofisticada) a una tarea de uno de los problemas biológicos más desafiantes hoy en día, el modelado del cerebro humano. Específicamente, expertos neurocientíficos no se ponen de acuerdo en una clasificación de neuronas para la corteza cerebral, lo que imposibilita no sólo cualquier intento de modelado sino también el trabajo del día a día al no tener una forma estándar de llamar a las neuronas. Por lo tanto, las técnicas de aprendizaje automático pueden ayudar a conseguir una solución aceptada para este problema, lo cual puede ser un importante hito para investigaciones futuras en neurociencia.
Resumo:
Propagation of nuclear data uncertainties to calculated values is interesting for design purposes and libraries evaluation. XSUSA, developed at GRS, propagates cross section uncertainties to nuclear calculations. In depletion simulations, fission yields and decay data are also involved and suppose a possible source of uncertainty that must be taken into account. We have developed tools to generate varied fission yields and decay libraries and to propagate uncertainties trough depletion in order to complete the XSUSA uncertainty assessment capabilities. A simple test to probe the methodology is defined and discussed.
Resumo:
Controversy still exists over the adaptive nature of variation of enzyme loci. In conifers, random amplified polymorphic DNAs (RAPDs) represent a class of marker loci that is unlikely to fall within or be strongly linked to coding DNA. We have compared the genetic diversity in natural populations of black spruce [Picea mariana (Mill.) B.S.P.] using genotypic data at allozyme loci and RAPD loci as well as phenotypic data from inferred RAPD fingerprints. The genotypic data for both allozymes and RAPDs were obtained from at least six haploid megagametophytes for each of 75 sexually mature individuals distributed in five populations. Heterozygosities and population fixation indices were in complete agreement between allozyme loci and RAPD loci. In black spruce, it is more likely that the similar levels of variation detected at both enzyme and RAPD loci are due to such evolutionary forces as migration and the mating system, rather than to balancing selection and overdominance. Furthermore, we show that biased estimates of expected heterozygosity and among-population differentiation are obtained when using allele frequencies derived from dominant RAPD phenotypes.
Resumo:
Rock mass characterization requires a deep geometric understanding of the discontinuity sets affecting rock exposures. Recent advances in Light Detection and Ranging (LiDAR) instrumentation currently allow quick and accurate 3D data acquisition, yielding on the development of new methodologies for the automatic characterization of rock mass discontinuities. This paper presents a methodology for the identification and analysis of flat surfaces outcropping in a rocky slope using the 3D data obtained with LiDAR. This method identifies and defines the algebraic equations of the different planes of the rock slope surface by applying an analysis based on a neighbouring points coplanarity test, finding principal orientations by Kernel Density Estimation and identifying clusters by the Density-Based Scan Algorithm with Noise. Different sources of information —synthetic and 3D scanned data— were employed, performing a complete sensitivity analysis of the parameters in order to identify the optimal value of the variables of the proposed method. In addition, raw source files and obtained results are freely provided in order to allow to a more straightforward method comparison aiming to a more reproducible research.
Resumo:
The multiobjective optimization model studied in this paper deals with simultaneous minimization of finitely many linear functions subject to an arbitrary number of uncertain linear constraints. We first provide a radius of robust feasibility guaranteeing the feasibility of the robust counterpart under affine data parametrization. We then establish dual characterizations of robust solutions of our model that are immunized against data uncertainty by way of characterizing corresponding solutions of robust counterpart of the model. Consequently, we present robust duality theorems relating the value of the robust model with the corresponding value of its dual problem.
Resumo:
An interactive hierarchical Generative Topographic Mapping (HGTM) ¸iteHGTM has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in 3 directions: bf (1) We generalize HGTM to noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM) developed in ¸iteKabanpami. bf (2) We give the user a choice of initializing the child plots of the current plot in either em interactive, or em automatic mode. In the interactive mode the user interactively selects ``regions of interest'' as in ¸iteHGTM, whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of LTMs is employed. bf (3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.
Resumo:
Data Envelopment Analysis (DEA) is a nonparametric method for measuring the efficiency of a set of decision making units such as firms or public sector agencies, first introduced into the operational research and management science literature by Charnes, Cooper, and Rhodes (CCR) [Charnes, A., Cooper, W.W., Rhodes, E., 1978. Measuring the efficiency of decision making units. European Journal of Operational Research 2, 429–444]. The original DEA models were applicable only to technologies characterized by positive inputs/outputs. In subsequent literature there have been various approaches to enable DEA to deal with negative data. In this paper, we propose a semi-oriented radial measure, which permits the presence of variables which can take both negative and positive values. The model is applied to data on a notional effluent processing system to compare the results with those yielded by two alternative methods for dealing with negative data in DEA: The modified slacks-based model suggested by Sharp et al. [Sharp, J.A., Liu, W.B., Meng, W., 2006. A modified slacks-based measure model for data envelopment analysis with ‘natural’ negative outputs and inputs. Journal of Operational Research Society 57 (11) 1–6] and the range directional model developed by Portela et al. [Portela, M.C.A.S., Thanassoulis, E., Simpson, G., 2004. A directional distance approach to deal with negative data in DEA: An application to bank branches. Journal of Operational Research Society 55 (10) 1111–1121]. A further example explores the advantages of using the new model.
Resumo:
Over the last few years Data Envelopment Analysis (DEA) has been gaining increasing popularity as a tool for measuring efficiency and productivity of Decision Making Units (DMUs). Conventional DEA models assume non-negative inputs and outputs. However, in many real applications, some inputs and/or outputs can take negative values. Recently, Emrouznejad et al. [6] introduced a Semi-Oriented Radial Measure (SORM) for modelling DEA with negative data. This paper points out some issues in target setting with SORM models and introduces a modified SORM approach. An empirical study in bank sector demonstrates the applicability of the proposed model. © 2014 Elsevier Ltd. All rights reserved.
Resumo:
Николай Кутев, Величка Милушева - Намираме експлицитно всичките би-омбилични фолирани полусиметрични повърхнини в четиримерното евклидово пространство R^4
Resumo:
In 2001, a weather and climate monitoring network was established along the temperature and aridity gradient between the sub-humid Moroccan High Atlas Mountains and the former end lake of the Middle Drâa in a pre-Saharan environment. The highest Automated Weather Stations (AWS) was installed just below the M'Goun summit at 3850 m, the lowest station Lac Iriki was at 450 m. This network of 13 AWS stations was funded and maintained by the German IMPETUS (BMBF Grant 01LW06001A, North Rhine-Westphalia Grant 313-21200200) project and since 2011 five stations were further maintained by the GERMAN DFG Fennec project (FI 786/3-1), this way some stations of the AWS network provided data for almost 12 years from 2001-2012. Standard meteorological variables such as temperature, humidity, and wind were measured at an altitude of 2 m above ground. Other meteorological variables comprise precipitation, station pressure, solar irradiance, soil temperature at different depths and for high mountain station snow water equivalent. The stations produced data summaries for 5-minute-precipitation-data, 10- or 15-minute-data and a daily summary of all other variables. This network is a unique resource of multi-year weather data in the remote semi-arid to arid mountain region of the Saharan flank of the Atlas Mountains. The network is described in Schulz et al. (2010) and its further continuation until 2012 is briefly discussed in Redl et al. (2015, doi:10.1175/MWR-D-15-0223.1) and Redl et al. (2016, doi:10.1002/2015JD024443).
Resumo:
© The Author 2016. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.