974 resultados para Subset Sum Problem


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper considers the problem of language change. Linguists must explain not only how languages are learned but also how and why they have evolved along certain trajectories and not others. While the language learning problem has focused on the behavior of individuals and how they acquire a particular grammar from a class of grammars ${cal G}$, here we consider a population of such learners and investigate the emergent, global population characteristics of linguistic communities over several generations. We argue that language change follows logically from specific assumptions about grammatical theories and learning paradigms. In particular, we are able to transform parameterized theories and memoryless acquisition algorithms into grammatical dynamical systems, whose evolution depicts a population's evolving linguistic composition. We investigate the linguistic and computational consequences of this model, showing that the formalization allows one to ask questions about diachronic that one otherwise could not ask, such as the effect of varying initial conditions on the resulting diachronic trajectories. From a more programmatic perspective, we give an example of how the dynamical system model for language change can serve as a way to distinguish among alternative grammatical theories, introducing a formal diachronic adequacy criterion for linguistic theories.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Support Vector Machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed Support Vectors (SV). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a problem of quadratic programming that depends on a regularization parameter. In this paper we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of feature space of finite dimension m, we also show that there are at most m+1 margin vectors and observe that m+1 SVs are usually sufficient to fully determine the decision surface. For relatively small m this latter result leads to a consistent reduction of the SV number.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Array technologies have made it possible to record simultaneously the expression pattern of thousands of genes. A fundamental problem in the analysis of gene expression data is the identification of highly relevant genes that either discriminate between phenotypic labels or are important with respect to the cellular process studied in the experiment: for example cell cycle or heat shock in yeast experiments, chemical or genetic perturbations of mammalian cell lines, and genes involved in class discovery for human tumors. In this paper we focus on the task of unsupervised gene selection. The problem of selecting a small subset of genes is particularly challenging as the datasets involved are typically characterized by a very small sample size ?? the order of few tens of tissue samples ??d by a very large feature space as the number of genes tend to be in the high thousands. We propose a model independent approach which scores candidate gene selections using spectral properties of the candidate affinity matrix. The algorithm is very straightforward to implement yet contains a number of remarkable properties which guarantee consistent sparse selections. To illustrate the value of our approach we applied our algorithm on five different datasets. The first consists of time course data from four well studied Hematopoietic cell lines (HL-60, Jurkat, NB4, and U937). The other four datasets include three well studied treatment outcomes (large cell lymphoma, childhood medulloblastomas, breast tumors) and one unpublished dataset (lymph status). We compared our approach both with other unsupervised methods (SOM,PCA,GS) and with supervised methods (SNR,RMB,RFE). The results clearly show that our approach considerably outperforms all the other unsupervised approaches in our study, is competitive with supervised methods and in some case even outperforms supervised approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This report outlines the problem of intelligent failure recovery in a problem-solver for electrical design. We want our problem solver to learn as much as it can from its mistakes. Thus we cast the engineering design process on terms of Problem Solving by Debugging Almost-Right Plans, a paradigm for automatic problem solving based on the belief that creation and removal of "bugs" is an unavoidable part of the process of solving a complex problem. The process of localization and removal of bugs called for by the PSBDARP theory requires an approach to engineering analysis in which every result has a justification which describes the exact set of assumptions it depends upon. We have developed a program based on Analysis by Propagation of Constraints which can explain the basis of its deductions. In addition to being useful to a PSBDARP designer, these justifications are used in Dependency-Directed Backtracking to limit the combinatorial search in the analysis routines. Although the research we will describe is explicitly about electrical circuits, we believe that similar principles and methods are employed by other kinds of engineers, including computer programmers.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We shall call an n × p data matrix fully-compositional if the rows sum to a constant, and sub-compositional if the variables are a subset of a fully-compositional data set1. Such data occur widely in archaeometry, where it is common to determine the chemical composition of ceramic, glass, metal or other artefacts using techniques such as neutron activation analysis (NAA), inductively coupled plasma spectroscopy (ICPS), X-ray fluorescence analysis (XRF) etc. Interest often centres on whether there are distinct chemical groups within the data and whether, for example, these can be associated with different origins or manufacturing technologies

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author, with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author’s style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts from the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be “the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writters like Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465, but it was not printed until 1490. There is an intense and long lasting debate around its authorship sprouting from its first edition, where its introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last one fourth of the book is by Galba (?-1490), after the death of Martorell. Some of the authors that support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis can not be used to help classify chapters by author. By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimates that stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and into the use of vowels in each chapter of the book. Given that the features selected are categorical, that leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of the estimation of a suden change-point in those sequences, in the following sections we propose various ways to estimate change-points in multinomial sequences; the method in section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma models onto the sequence of Chi-square distances between each row profiles and the average profile, the one in Section 6 fits models onto the sequence of values taken by the first component of the correspondence analysis as well as onto sequences of other summary measures like the average word length. In Section 7 we fit models onto the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The application of Discriminant function analysis (DFA) is not a new idea in the study of tephrochrology. In this paper, DFA is applied to compositional datasets of two different types of tephras from Mountain Ruapehu in New Zealand and Mountain Rainier in USA. The canonical variables from the analysis are further investigated with a statistical methodology of change-point problems in order to gain a better understanding of the change in compositional pattern over time. Finally, a special case of segmented regression has been proposed to model both the time of change and the change in pattern. This model can be used to estimate the age for the unknown tephras using Bayesian statistical calibration

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Self-organizing maps (Kohonen 1997) is a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm involves the use of Euclidean metric in the process of adaptation of the model vectors, thus rendering in theory a whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem: 1. Whether the conventional approach using Euclidean metric can shed valid results with compositional data. 2. If a modification of the conventional approach replacing vectorial sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering) can converge to an adequate solution. Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs poorer than the conventional version, in particular, when the data is pathological. Moreover, the conventional ap- proach converges faster to a solution, when data is \well-behaved". Key words: Self Organizing Map; Artificial Neural networks; Compositional data

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as “functional data” and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below. First of all we adapt the well-known spline smoothing techniques in order to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory: clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for difference in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed in order to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithm. The quality of the obtained results is assessed by means of several indices

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Determinar el alcance de los objetivos y la naturaleza de las concepciones que tienen los-as estudiantes y los materiales did??cticos, sobre la tem??tica de la energ??a, se formula los siguientes problemas: 1. ??Cu??les son las concepciones de los-las estudiantes de Magisterio sobre los modelos de Educaci??n Ambiental? 2. ??Cu??les son las concepciones de los-las estudiantes sobre los problemas socioambientales que consideran m??s importantes y la idea de riesgo asociada a los mismos? 3. ??Cu??les son las concepciones de los-las estudiantes sobre el papel que juega la participaci??n en el proceso de Educaci??n Ambiental? 4. ??Cu??les son las concepciones de los-las estudiantes sobre la energ??a y el papel que juega la energ??a como problema socioambiental? 5. ??Cu??les son las concepciones did??cticas dominantes de los-las estudiantes sobre el tratamiento did??ctico de la energ??a? 6. ??Cu??les son las concepciones did??cticas dominantes de los materiales seleccionados sobre el tratamiento de la energ??a? 7. ??Existe alguna correspondencia entre las concepciones de los-las estudiantes y las concepciones did??cticas de los materiales seleccionados sobre el tratamiento did??ctico de la energ??a? 8. ??Existen diferencias en las concepciones de los-las estudiantes sobre algunos aspectos de la Educaci??n Ambiental y de la energ??a dependiendo del momento de la investigaci??n? 9. ??Existe coherencia en las concepciones de los-las estudiantes y de los materiales?. El paradigma metodol??gico elegido se ubica en la definici??n del paradigma interpretativo, denominado tambi??n naturalista, de enfoque ecol??gico o etnogr??fico, a trav??s de un estudio de caso. Se selecciona un grupo de estudiantes de la Facultad de Ciencias de la Educaci??n de la Universidad de Sevilla, de las modalidades de Educaci??n Primaria y Especial, que cursan la asignatura optativa de Educaci??n Ambiental (EA). La muestra se compone de 12 hombres y 38 mujeres, sumando un total de 50 personas, solo se tiene materiales escritos de forma individual de 45 personas, se agrupan para trabajar de forma conjunta formando 12 grupos. Dentro del planteamiento metodol??gico adoptado, se utiliza diferentes t??cnicas e instrumentos: la b??squeda y an??lisis de materiales de EA desde un punto de vista did??ctico. Observaci??n externa y recogida de informaci??n en el diario de clase. Cuestionarios y documentos de trabajo. Grabaciones de algunas sesiones de clase. Grupo de discusi??n. Para el tratamiento de los datos se dise??a un sistema de categor??as que sistematizara las respuestas. Las respuestas m??s encontradas, sobre el modelo de EA y el car??cter interdisciplinar, se encuentran en los valores m??s simples respecto a los valores considerados en el sistema de categor??as creados, los cuales mayoritariamente vinculan la EA con un modelo conservacionista y con actividades puntuales, lo cual indica que lo que entiende los-as educadores-as ambientales sobre EA, suele estar 'identificado con el amor a la naturaleza, con las salidas fuera del aula, generalmente 'al campo', con la recogida de muestras o la realizaci??n de an??lisis o reciclado de papel' y 'ven la Educaci??n Ambiental como algo ajeno a las dem??s materias que se lleva a cabo en algunas determinadas fechas y que debe tener un curr??culo establecido y diferente del de otras materias, a excepci??n de las ciencias naturales, con las que de alguna manera se liga la Educaci??n Ambiental y aparece una alta relaci??n entre esta visi??n de la EA y la corriente inductivista del aprendizaje'.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Resumen tomado de la publicaci??n

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Resumen del autor. Res??menes en espa??ol e ingl??s

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Epipolar geometry is a key point in computer vision and the fundamental matrix estimation is the only way to compute it. This article surveys several methods of fundamental matrix estimation which have been classified into linear methods, iterative methods and robust methods. All of these methods have been programmed and their accuracy analysed using real images. A summary, accompanied with experimental results, is given

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We will discuss several examples and research efforts related to the small world problem and set the ground for our discussion of network theory and social network analysis. Readings: An Experimental Study of the Small World Problem, J. Travers and S. Milgram Sociometry 32 425-443 (1969) [Protected Access] Optional: The Strength of Weak Ties, M.S. Granovetter The American Journal of Sociology 78 1360--1380 (1973) [Protected Access] Optional: Worldwide Buzz: Planetary-Scale Views on an Instant-Messaging Network, J. Leskovec and E. Horvitz MSR-TR-2006-186. Microsoft Research, June 2007. [Web Link, the most recent and comprehensive study on the subject!] Originally from: http://kmi.tugraz.at/staff/markus/courses/SS2008/707.000_web-science/