15 results for Data-representation
at Université de Lausanne, Switzerland
Abstract:
The proportion of the population living in or around cities is higher than ever. Urban sprawl and car dependence have taken over the pedestrian-friendly compact city. Environmental problems like air pollution, land waste or noise, and health problems are the result of this still continuing process. Urban planners have to find solutions to these complex problems, and at the same time ensure the economic performance of the city and its surroundings. At the same time, an increasing quantity of socio-economic and environmental data is acquired. In order to get a better understanding of the processes and phenomena taking place in the complex urban environment, these data should be analysed. Numerous methods for modelling and simulating such a system exist and are still under development, and can be exploited by urban geographers for improving our understanding of the urban metabolism. Modern and innovative visualisation techniques help in communicating the results of such models and simulations. This thesis covers several methods for analysis, modelling, simulation and visualisation of problems related to urban geography. The analysis of high dimensional socio-economic data using artificial neural network techniques, especially self-organising maps, is shown using two examples at different scales. The problem of spatiotemporal modelling and data representation is treated and some possible solutions are shown. The simulation of urban dynamics and more specifically the traffic due to commuting to work is illustrated using multi-agent micro-simulation techniques. A section on visualisation methods presents cartograms for transforming the geographic space into a feature space, and the distance circle map, a centre-based map representation particularly useful for urban agglomerations. Some issues on the importance of scale in urban analysis and clustering of urban phenomena are exposed. 
A new approach on how to define urban areas at different scales is developed, and the link with percolation theory established. Fractal statistics, especially the lacunarity measure, and scale laws are used for characterising urban clusters. In a last section, the population evolution is modelled using a model close to the well-established gravity model. The work covers quite a wide range of methods useful in urban geography. Methods should still be developed further and at the same time find their way into the daily work and decision process of urban planners. The share of people living in an urban region is higher than ever and continues to grow. Urban sprawl and car dependence have supplanted the compact, pedestrian-friendly city. Air pollution, land waste, noise and health problems for the inhabitants are the consequence. Urban planners must find, together with society as a whole, solutions to these complex problems. At the same time, the economic performance of the city and its region must be ensured. Currently, a growing quantity of socio-economic and environmental data is being collected. To better understand the processes and phenomena of the complex system "city", these data must be processed and analysed. Numerous methods for modelling and simulating such a system exist and are continually being developed. They can be exploited by the urban geographer to improve his or her knowledge of the urban metabolism. Modern and innovative visualisation techniques help communicate the results of such models and simulations. This thesis describes several methods for analysing, modelling, simulating and visualising urban phenomena. 
The analysis of very high-dimensional socio-economic data using artificial neural networks, in particular self-organising maps, is shown through two examples at different scales. The problem of spatio-temporal modelling and data representation is discussed and some tentative solutions are outlined. The simulation of urban dynamics, and more specifically of the car traffic generated by commuters, is illustrated using a multi-agent simulation. A section on visualisation methods presents cartograms, which transform geographic space into functional space. Another type of map, the circular map, is presented. This type of map is particularly useful for urban agglomerations. Some questions related to the importance of scale in urban analysis are also discussed. A new approach for defining urban clusters at different scales is developed, and the link with percolation theory is established. Fractal statistics, in particular lacunarity, are used to characterise these urban clusters. Population evolution is modelled using a model close to the well-known gravity model. The work covers a wide range of methods useful in urban geography. However, these methods still need to be developed further and, at the same time, they must find their way into the daily work of urban planners.
Abstract:
Sound localization relies on the analysis of interaural time and intensity differences, as well as attenuation patterns by the outer ear. We investigated the relative contributions of interaural time and intensity difference cues to sound localization by testing 60 healthy subjects, 25 patients with focal left and 25 with focal right hemispheric brain damage. Group and single-case behavioural analyses, as well as anatomo-clinical correlations, confirmed that deficits were more frequent and much more severe after right than left hemispheric lesions and for the processing of interaural time than intensity difference cues. For spatial processing based on interaural time difference cues, different error types were evident in the individual data. Deficits in discriminating between neighbouring positions occurred in both hemispaces after focal right hemispheric brain damage, but were restricted to the contralesional hemispace after focal left hemispheric brain damage. Alloacusis (perceptual shifts across the midline) occurred only after focal right hemispheric brain damage and was associated with minor or severe deficits in position discrimination. During spatial processing based on interaural intensity cues, deficits were less severe in the right hemispheric brain damage group than in the left hemispheric brain damage group, and no alloacusis occurred. These results, matched to anatomical data, suggest the existence of a binaural sound localization system predominantly based on interaural time difference cues and primarily supported by the right hemisphere. More generally, our data suggest that two distinct mechanisms contribute to: (i) the precise computation of spatial coordinates allowing spatial comparison within the contralateral hemispace for the left hemisphere and the whole space for the right hemisphere; and (ii) the building up of global auditory spatial representations in right temporo-parietal cortices.
Abstract:
Recent technological advances in remote sensing have enabled investigation of the morphodynamics and hydrodynamics of large rivers. However, measuring topography and flow in these very large rivers is time consuming and thus often constrains the spatial resolution and reach-length scales that can be monitored. Similar constraints exist for computational fluid dynamics (CFD) studies of large rivers, requiring maximization of mesh- or grid-cell dimensions and implying a reduction in the representation of bedform-roughness elements that are of the order of a model grid cell or less, even if they are represented in available topographic data. These "subgrid" elements must be parameterized, and this paper applies and considers the impact of roughness-length treatments that include the effect of bed roughness due to "unmeasured" topography. CFD predictions were found to be sensitive to the roughness-length specification. Model optimization was based on acoustic Doppler current profiler measurements and estimates of the water surface slope for a variety of roughness lengths. This proved difficult as the metrics used to assess optimal model performance diverged due to the effects of large bedforms that are not well parameterized in roughness-length treatments. However, the general spatial flow patterns are effectively predicted by the model. Changes in roughness length were shown to have a major impact upon flow routing at the channel scale. The results also indicate an absence of secondary flow circulation cells in the reach studied, and suggest simpler two-dimensional models may have great utility in the investigation of flow within large rivers. Citation: Sandbach, S. D. et al. (2012), Application of a roughness-length representation to parameterize energy loss in 3-D numerical simulations of large rivers, Water Resour. Res., 48, W12501, doi: 10.1029/2011WR011284.
Abstract:
The HUPO Proteomics Standards Initiative has developed several standardized data formats to facilitate data sharing in mass spectrometry (MS)-based proteomics. These allow researchers to report their complete results in a unified way. However, at present, there is no format to describe the final qualitative and quantitative results for proteomics and metabolomics experiments in a simple tabular format. Many downstream analysis use cases are only concerned with the final results of an experiment and require an easily accessible format, compatible with tools such as Microsoft Excel or R. We developed the mzTab file format for MS-based proteomics and metabolomics results to meet this need. mzTab is intended as a lightweight supplement to the existing standard XML-based file formats (mzML, mzIdentML, mzQuantML), providing a comprehensive summary, similar in concept to the supplemental material of a scientific publication. mzTab files can contain protein, peptide, and small molecule identifications together with experimental metadata and basic quantitative information. The format is not intended to store the complete experimental evidence but provides mechanisms to report results at different levels of detail. These range from a simple summary of the final results to a representation of the results including the experimental design. This format is ideally suited to make MS-based proteomics and metabolomics results available to a wider biological community outside the field of MS. Several software tools for proteomics and metabolomics have already adopted the format as an output format. The comprehensive mzTab specification document and extensive additional documentation can be found online.
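To illustrate why a tab-separated layout is easy to consume downstream, here is a minimal Python sketch that groups the lines of an mzTab-like text by their leading line-type code. The codes used (MTD, COM, PRH/PRT, PEH/PEP, PSH/PSM, SMH/SML) follow my reading of the mzTab specification and should be checked against it; the function name `read_mztab_sections` and the sample content are hypothetical and not taken from the abstract.

```python
import csv
import io
from collections import defaultdict

# Header line codes mapped to the row codes they describe
# (as understood from the mzTab specification; verify before relying on it).
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def read_mztab_sections(text):
    """Split tab-separated mzTab-like text into metadata and result tables."""
    metadata = {}
    tables = defaultdict(lambda: {"header": None, "rows": []})
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0] == "COM":  # skip blank and comment lines
            continue
        code = fields[0]
        if code == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif code in HEADER_TO_ROW:
            tables[HEADER_TO_ROW[code]]["header"] = fields[1:]
        elif code in HEADER_TO_ROW.values():
            header = tables[code]["header"] or []
            tables[code]["rows"].append(dict(zip(header, fields[1:])))
    return metadata, dict(tables)


# Illustrative content only; not a complete or validated mzTab file.
example = (
    "MTD\tmzTab-version\t1.0.0\n"
    "COM\tillustrative content only\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP12345\texample protein\n"
)
metadata, tables = read_mztab_sections(example)
```

The sketch only splits sections; it does not validate mandatory columns or controlled-vocabulary parameters, which a real reader would need to do.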
Abstract:
Analyzing functional data often leads to finding common factors, for which functional principal component analysis proves to be a useful tool to summarize and characterize the random variation in a function space. The representation in terms of eigenfunctions is optimal in the sense of L2 approximation. However, the eigenfunctions are not always directed towards an interesting and interpretable direction in the context of functional data and thus could obscure the underlying structure. To overcome such difficulty, an alternative to functional principal component analysis is proposed that produces directed components which may be more informative and easier to interpret. These structural components are similar to principal components, but are adapted to situations in which the domain of the function may be decomposed into disjoint intervals such that there is effectively independence between intervals and positive correlation within intervals. The approach is demonstrated with synthetic examples as well as real data. Properties for special cases are also studied.
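The L2 optimality of the eigenfunction representation can be illustrated with a small Python sketch of ordinary functional principal component analysis on discretized curves. This is not the structural-component method proposed in the abstract; the synthetic curves, grid size and noise level are all assumptions chosen for illustration. On an equally spaced grid, the Euclidean inner product approximates the L2 inner product up to a constant factor.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)

# Synthetic curves: two smooth modes of variation plus small noise (assumed data).
scores = rng.normal(size=(200, 2)) * np.array([2.0, 0.5])
basis = np.vstack([np.sin(np.pi * grid), np.sin(2 * np.pi * grid)])
curves = scores @ basis + 0.05 * rng.normal(size=(200, grid.size))

# Centre the curves; the rows of vt are the discretized eigenfunctions.
centered = curves - curves.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)


def reconstruct(k):
    """Project the centred curves onto the first k eigenfunctions."""
    coefficients = centered @ vt[:k].T
    return coefficients @ vt[:k]


# Mean squared reconstruction error for 1, 2 and 3 components.
errors = [float(np.mean((centered - reconstruct(k)) ** 2)) for k in (1, 2, 3)]
```

Among all k-dimensional linear representations, the truncated eigenfunction expansion gives the smallest mean squared error, which is why `errors` shrinks as k grows even though the leading components need not be easy to interpret.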
Abstract:
Spatial data on species distributions are available in two main forms, point locations and distribution maps (polygon ranges and grids). The first are often temporally and spatially biased, and too discontinuous, to be useful (untransformed) in spatial analyses. A variety of modelling approaches are used to transform point locations into maps. We discuss the attributes that point location data and distribution maps must satisfy in order to be useful in conservation planning. We recommend that before point location data are used to produce and/or evaluate distribution models, the dataset should be assessed under a set of criteria, including sample size, age of data, environmental/geographical coverage, independence, accuracy, time relevance and (often forgotten) representation of areas of permanent and natural presence of the species. Distribution maps must satisfy additional attributes if used for conservation analyses and strategies, including minimizing commission and omission errors, credibility of the source/assessors and availability for public screening. We review currently available databases for mammals globally and show that they are highly variable in complying with these attributes. The heterogeneity and weakness of spatial data seriously constrain their utility to global and also sub-global scale conservation analyses.
Abstract:
Do economic inequalities translate into political inequalities through the process of electoral representation? This is the central question of this thesis, which also investigates the mechanisms that tend to produce a biased representation of citizens' political preferences according to their economic status. Focusing on the case of Switzerland and using data from the 2007 Selects post-electoral survey, this work demonstrates that on the rare issues that divide citizens along economic cleavages - redistribution of wealth and social security in particular - the members elected to the Federal Assembly hold preferences that better reflect the opinions of the richest citizens. This under-representation of the opinions of citizens of modest means and of those in the middle of the income distribution can partly be attributed to differences in rates of political participation and political knowledge between these groups of citizens. The thesis also highlights the role played by descriptive representation - in other words, the similarity in economic status between representatives and the represented - in the representation of citizens' opinions and interests. Moreover, since the structure of the party system in Switzerland does not reflect the multidimensionality of citizens' political preferences, voters are unable to translate the complexity of their political preferences into a vote choice, which, in the current configuration of political forces, tends to favour the election of representatives whose opinions are close to the right on economic issues. 
Finally, an analysis of political representation at the cantonal level tends to support the thesis that the lack of regulation of party financing in Switzerland could partially explain inequalities in the representation of the political opinions of citizens with different incomes. - Do economic inequalities translate into political inequalities through electoral representation? This is the central research question of this thesis, which also investigates the mechanisms that lead to potential economically based inequalities in the representation of citizens' policy preferences. Focusing on the case of Switzerland and making use of data provided by the post-electoral survey Selects 2007, this research demonstrates that regarding the rare policy domains in which the preferences of citizens are clearly linked to economic cleavages - redistribution and social security in particular - members of the Federal Assembly have policy preferences that best reflect the policy preferences of richer citizens. The under-representation of the opinions of relatively poor citizens and of those in the middle of the income distribution can to some extent be explained by differences in political participation and political information across income groups. The thesis also puts forward the role played by descriptive representation - the similarity between representatives and represented in terms of their socioeconomic status - for the representation of citizens' preferences and interests. In addition, the structure of the party system in Switzerland does not reflect the multidimensionality of policy preferences among citizens who, as a result, have a hard time translating their complex preferences into a vote choice. Given the configuration of political actors, this tends to favour the election of representatives from the right who do not represent the preferences of their voters on economic issues. 
Finally, an analysis of representation at the cantonal level tends to confirm that the lack of party finance regulations in Switzerland may partially explain inequalities in the representation of citizens with different levels of income.
Abstract:
The coverage and volume of geo-referenced datasets are extensive and incessantly growing. The systematic capture of geo-referenced information generates large volumes of spatio-temporal data to be analyzed. Clustering and visualization play a key role in the exploratory data analysis and the extraction of knowledge embedded in these data. However, new challenges in visualization and clustering are posed when dealing with the special characteristics of this data. These include complex structures, a large quantity of samples, variables involved in a temporal context, high dimensionality and large variability in cluster shapes. The central aim of my thesis is to propose new algorithms and methodologies for clustering and visualization, in order to assist the knowledge extraction from spatiotemporal geo-referenced data, thus improving decision-making processes. I present two original algorithms, one for clustering: the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON), and the second for exploratory visual data analysis: the Tree-structured Self-organizing Maps Component Planes. In addition, I present methodologies that combined with FGHSON and the Tree-structured SOM Component Planes allow the integration of space and time seamlessly and simultaneously in order to extract knowledge embedded in a temporal context. The originality of the FGHSON lies in its capability to reflect the underlying structure of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of clusters is crucial when data include complex structures with large variability of cluster shapes, variances, densities and number of clusters. The most important characteristics of the FGHSON include: (1) It does not require an a-priori setup of the number of clusters. (2) The algorithm executes several self-organizing processes in parallel. Hence, when dealing with large datasets the processes can be distributed, reducing the computational cost. 
(3) Only three parameters are necessary to set up the algorithm. In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm lies in its ability to create a structure that allows the visual exploratory data analysis of large high-dimensional datasets. This algorithm creates a hierarchical structure of Self-Organizing Map Component Planes, arranging similar variables' projections in the same branches of the tree. Hence, similarities in variables' behavior can be easily detected (e.g. local correlations, maximal and minimal values and outliers). Both FGHSON and the Tree-structured SOM Component Planes were applied in several agroecological problems, proving to be very efficient in the exploratory analysis and clustering of spatio-temporal datasets. In this thesis I also tested three soft competitive learning algorithms. Two of them are well-known unsupervised soft competitive algorithms, namely the Self-Organizing Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); the third was our original contribution, the FGHSON. Although the algorithms presented here have been used in several areas, to my knowledge there is no work applying and comparing the performance of those techniques when dealing with spatiotemporal geospatial data, as it is presented in this thesis. I propose original methodologies to explore spatio-temporal geo-referenced datasets through time. Our approach uses time windows to capture temporal similarities and variations by using the FGHSON clustering algorithm. The developed methodologies are used in two case studies. 
In the first, the objective was to find similar agroecozones through time and in the second one it was to find similar environmental patterns shifted in time. Several results presented in this thesis have led to new contributions to agroecological knowledge, for instance, in sugar cane and blackberry production. Finally, in the framework of this thesis we developed several software tools: (1) a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called BIS (Bio-inspired Identification of Similar agroecozones), an interactive graphical user interface tool which integrates the FGHSON algorithm with Google Earth in order to show zones with similar agroecological characteristics.
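As background for the self-organizing approaches named above, the following Python sketch trains a plain Self-Organizing Map on synthetic data. It is not the FGHSON, the GHSOM or the Matlab toolbox from the thesis; the function names, grid size, decay schedules and the two-cluster test data are all assumptions made for illustration.

```python
import numpy as np


def train_som(data, rows=5, cols=5, epochs=20, lr0=0.5, sigma0=2.0, sigma_end=0.5, seed=0):
    """Train a basic rectangular SOM with linearly decaying rate and radius."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows * cols, data.shape[1]))
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 + (sigma_end - sigma0) * frac
            # Best matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma**2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, coords


def quantization_error(data, weights):
    """Mean distance from each sample to its closest unit."""
    distances = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return float(distances.min(axis=1).mean())


# Two tight synthetic clusters in three dimensions (assumed test data).
rng = np.random.default_rng(1)
centres = np.array([[0.2, 0.2, 0.2], [0.8, 0.8, 0.8]])
data = np.vstack([c + 0.02 * rng.normal(size=(100, 3)) for c in centres])

initial, _ = train_som(data, epochs=0)   # untrained random weights
trained, _ = train_som(data, epochs=20)
qe_before = quantization_error(data, initial)
qe_after = quantization_error(data, trained)
```

Unlike the FGHSON described above, this basic map has a fixed size chosen in advance and assigns each sample to a single unit, which is the limitation the hierarchical fuzzy variant is said to address.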
Abstract:
BACKGROUND: Qualitative frameworks, especially those based on the logical discrete formalism, are increasingly used to model regulatory and signalling networks. A major advantage of these frameworks is that they do not require precise quantitative data, and that they are well-suited for studies of large networks. While numerous groups have developed specific computational tools that provide original methods to analyse qualitative models, a standard format to exchange qualitative models has been missing. RESULTS: We present the Systems Biology Markup Language (SBML) Qualitative Models Package ("qual"), an extension of the SBML Level 3 standard designed for computer representation of qualitative models of biological networks. We demonstrate the interoperability of models via SBML qual through the analysis of a specific signalling network by three independent software tools. Furthermore, the collective effort to define the SBML qual format paved the way for the development of LogicalModel, an open-source model library, which will facilitate the adoption of the format as well as the collaborative development of algorithms to analyse qualitative models. CONCLUSIONS: SBML qual allows the exchange of qualitative models among a number of complementary software tools. SBML qual has the potential to promote collaborative work on the development of novel computational approaches, as well as on the specification and the analysis of comprehensive qualitative models of regulatory and signalling networks.
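To give a concrete sense of what a qualitative (logical) model encodes, here is a minimal Python sketch of a Boolean network, the simplest case of the logical formalism. The three-component network, its rules and the function names are hypothetical; the sketch does not use SBML qual syntax or any of the software tools mentioned in the abstract.

```python
from itertools import product

# Hypothetical network: A and B inhibit each other, C is activated by A.
rules = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
    "C": lambda s: s["A"],
}


def step(state):
    """Synchronous update: every component is recomputed from the current state."""
    return {name: bool(rule(state)) for name, rule in rules.items()}


def fixed_points():
    """Enumerate all states and keep those left unchanged by one update."""
    names = sorted(rules)
    result = []
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        if step(state) == state:
            result.append(state)
    return result


stable_states = fixed_points()
```

No kinetic parameters appear anywhere in the model, only the logical rules; an exchange format for such models mainly has to carry the components, their allowed levels and these rules.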
Abstract:
For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of the supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the supermatrix approach that is widely used). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test if the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data. 
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with the Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove to be particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.
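The matrix step of matrix representation with parsimony can be sketched in a few lines of Python. The sketch below uses the standard Baum-Ragan coding as I understand it: each clade of each input tree becomes one column, with 1 for taxa inside the clade, 0 for other taxa of that tree and ? for taxa absent from the tree. The nested-tuple tree encoding, the function names and the two example trees are assumptions; the study's own data and the Purvis coding variant are not reproduced, and the subsequent parsimony search is not shown.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every internal node below the root."""
    if isinstance(tree, str):
        return
    for child in tree:
        if not isinstance(child, str):
            yield leaves(child)
            yield from clades(child)


def mrp_matrix(trees):
    """Build one character string per taxon, one column per input clade."""
    taxa = sorted(set().union(*(leaves(tree) for tree in trees)))
    rows = {taxon: [] for taxon in taxa}
    for tree in trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon].append("?")  # taxon missing from this input tree
                else:
                    rows[taxon].append("1" if taxon in clade else "0")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


# Two small hypothetical input trees with partially overlapping taxa.
trees = [((("A", "B"), "C"), "D"), (("B", "C"), "E")]
matrix = mrp_matrix(trees)
```

In practice an all-zero outgroup row is usually added to root the analysis, and the resulting matrix is handed to a parsimony program; both steps are omitted here.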
Abstract:
The broad aim of biomedical science in the postgenomic era is to link genomic and phenotype information to allow deeper understanding of the processes leading from genomic changes to altered phenotype and disease. The EuroPhenome project (http://www.EuroPhenome.org) is a comprehensive resource for raw and annotated high-throughput phenotyping data arising from projects such as EUMODIC. EUMODIC is gathering data from the EMPReSSslim pipeline (http://www.empress.har.mrc.ac.uk/) which is performed on inbred mouse strains and knock-out lines arising from the EUCOMM project. The EuroPhenome interface allows the user to access the data via the phenotype or genotype. It also allows the user to access the data in a variety of ways, including graphical display, statistical analysis and access to the raw data via web services. The raw phenotyping data captured in EuroPhenome is annotated by an annotation pipeline which automatically identifies statistically different mutants from the appropriate baseline and assigns ontology terms for that specific test. Mutant phenotypes can be quickly identified using two EuroPhenome tools: PhenoMap, a graphical representation of statistically relevant phenotypes, and mining for a mutant using ontology terms. To assist with data definition and cross-database comparisons, phenotype data is annotated using combinations of terms from biological ontologies.
Abstract:
While equal political representation of all citizens is a fundamental democratic goal, it is hampered empirically in a multitude of ways. This study examines how the societal level of economic inequality affects the representation of relatively poor citizens by parties and governments. Using CSES survey data for citizens' policy preferences and expert placements of political parties, empirical evidence is found that in economically more unequal societies, the party system represents the preferences of relatively poor citizens worse than in more equal societies. This moderating effect of economic equality is also found for policy congruence between citizens and governments, albeit slightly less clear-cut.
Abstract:
BACKGROUND: The need to contextualise wastewater-based figures about illicit drug consumption by comparing them with other indicators has been stressed by numerous studies. The objective of the present study was to further investigate the possibility of combining wastewater data with conventional statistics to assess the reliability of the former method and obtain a more balanced picture of illicit drug consumption in the investigated area. METHODS: Wastewater samples were collected between October 2013 and July 2014 in the metropolitan area of Lausanne (226,000 inhabitants), Switzerland. Methadone, its metabolite 2-ethylidene-1,5-dimethyl-3,3-diphenylpyrrolidine (EDDP), the exclusive metabolite of heroin, 6-monoacetylmorphine (6-MAM), and morphine loads were used to estimate the amounts of methadone and heroin consumed. RESULTS: Methadone consumption estimated from EDDP was in agreement with the expectations. Heroin estimates based on 6-MAM loads were inconsistent. Estimates obtained from morphine loads, combined with prescription/sales data, were in agreement with figures derived from syringe distribution data and general population surveys. CONCLUSIONS: The results obtained for methadone allowed assessing the reliability of the selected sampling strategy, supporting its ability to capture the consumption of a small cohort (i.e., 743 patients). Using morphine as marker, in combination with prescription/sales data, estimates in accordance with other indicators about heroin use were obtained. Combining different sources of data allowed strengthening the results and suggested that the different indicators (i.e., administration route, average dosage and number of consumers) contribute to depict a realistic representation of the phenomenon in the investigated area. Heroin consumption was estimated at approximately 13 g/day (118 g/day at street level).
Abstract:
Female gender and low income are two markers for groups that have been historically disadvantaged within most societies. The study explores two research questions related to their political representation: 1) Are parties ideologically biased towards the ideological preferences of male and rich citizens? 2) Does the proportionality of the electoral system moderate the degree of underrepresentation of women and poor citizens in the party system? A multilevel analysis of survey data from 24 parliamentary democracies indicates that there is some bias against those with low income and, to a much smaller degree, women. This has systemic consequences for the quality of representation, as the preferences of the complementary groups differ. The proportionality of the electoral system influences the degree of underrepresentation: specifically, larger district magnitudes help close the considerable gap between rich and poor.
Abstract:
The extension of traditional data mining methods to time series has been effectively applied to a wide range of domains such as finance, econometrics, biology, security, and medicine. Many existing mining methods deal with the task of change-point detection, but very few provide a flexible approach. Querying specific change points with linguistic variables is particularly useful in crime analysis, where intuitive, understandable, and appropriate detection of changes can significantly improve the allocation of resources for timely and concise operations. In this paper, we propose an on-line method for detecting and querying change points in crime-related time series with the use of a meaningful representation and a fuzzy inference system. Change-point detection is based on a shape space representation, and linguistic terms describing geometric properties of the change points are used to express queries, offering the advantage of intuitiveness and flexibility. An empirical evaluation is first conducted on a crime data set to confirm the validity of the proposed method and then on a financial data set to test its general applicability. A comparison to a similar change-point detection algorithm and a sensitivity analysis are also conducted. Results show that the method is able to accurately detect change points at very low computational costs. More broadly, the detection of specific change points within time series of virtually any domain is made more intuitive and more understandable, even for experts not related to data mining.