23 resultados para Datasets
Resumo:
We report a search for single top quark production with the CDF II detector using 2.1 fb-1 of integrated luminosity of pbar p collisions at sqrt{s}=1.96 TeV. The data selected consist of events characterized by large energy imbalance in the transverse plane and hadronic jets, and no identified electrons and muons, so the sample is enriched in W -> tau nu decays. In order to suppress backgrounds, additional kinematic and topological requirements are imposed through a neural network, and at least one of the jets must be identified as a b-quark jet. We measure an excess of signal-like events in agreement with the standard model prediction, but inconsistent with a model without single top quark production by 2.1 standard deviations (sigma), with a median expected sensitivity of 1.4 sigma. Assuming a top quark mass of 175 GeV/c2 and ascribing the excess to single top quark production, the cross section is measured to be 4.9+2.5-2.2(stat+syst)pb, consistent with measurements performed in independent datasets and with the standard model prediction.
Resumo:
Market microstructure is “the study of the trading mechanisms used for financial securities” (Hasbrouck (2007)). It seeks to understand the sources of value and reasons for trade, in a setting with different types of traders, and different private and public information sets. The actual mechanisms of trade are a continually changing object of study. These include continuous markets, auctions, limit order books, dealer markets, or combinations of these operating as a hybrid market. Microstructure also has to allow for the possibility of multiple prices. At any given time an investor may be faced with a multitude of different prices, depending on whether he or she is buying or selling, the quantity he or she wishes to trade, and the required speed for the trade. The price may also depend on the relationship that the trader has with potential counterparties. In this research, I touch upon all of the above issues. I do this by studying three specific areas, all of which have both practical and policy implications. First, I study the role of information in trading and pricing securities in markets with a heterogeneous population of traders, some of whom are informed and some not, and who trade for different private or public reasons. Second, I study the price discovery of stocks in a setting where they are simultaneously traded in more than one market. Third, I make a contribution to the ongoing discussion about market design, i.e. the question of which trading systems and ways of organizing trading are most efficient. A common characteristic throughout my thesis is the use of high frequency datasets, i.e. tick data. These datasets include all trades and quotes in a given security, rather than just the daily closing prices, as in traditional asset pricing literature. This thesis consists of four separate essays. In the first essay I study price discovery for European companies cross-listed in the United States. I also study explanatory variables for differences in price discovery. In my second essay I contribute to earlier research on two issues of broad interest in market microstructure: market transparency and informed trading. I examine the effects of a change to an anonymous market at the OMX Helsinki Stock Exchange. I broaden my focus slightly in the third essay, to include releases of macroeconomic data in the United States. I analyze the effect of these releases on European cross-listed stocks. The fourth and last essay examines the uses of standard methodologies of price discovery analysis in a novel way. Specifically, I study price discovery within one market, between local and foreign traders.
Resumo:
Gene expression is one of the most critical factors influencing the phenotype of a cell. As a result of several technological advances, measuring gene expression levels has become one of the most common molecular biological measurements to study the behaviour of cells. The scientific community has produced enormous and constantly increasing collection of gene expression data from various human cells both from healthy and pathological conditions. However, while each of these studies is informative and enlighting in its own context and research setup, diverging methods and terminologies make it very challenging to integrate existing gene expression data to a more comprehensive view of human transcriptome function. On the other hand, bioinformatic science advances only through data integration and synthesis. The aim of this study was to develop biological and mathematical methods to overcome these challenges and to construct an integrated database of human transcriptome as well as to demonstrate its usage. Methods developed in this study can be divided in two distinct parts. First, the biological and medical annotation of the existing gene expression measurements needed to be encoded by systematic vocabularies. There was no single existing biomedical ontology or vocabulary suitable for this purpose. Thus, new annotation terminology was developed as a part of this work. Second part was to develop mathematical methods correcting the noise and systematic differences/errors in the data caused by various array generations. Additionally, there was a need to develop suitable computational methods for sample collection and archiving, unique sample identification, database structures, data retrieval and visualization. Bioinformatic methods were developed to analyze gene expression levels and putative functional associations of human genes by using the integrated gene expression data. Also a method to interpret individual gene expression profiles across all the healthy and pathological tissues of the reference database was developed. As a result of this work 9783 human gene expression samples measured by Affymetrix microarrays were integrated to form a unique human transcriptome resource GeneSapiens. This makes it possible to analyse expression levels of 17330 genes across 175 types of healthy and pathological human tissues. Application of this resource to interpret individual gene expression measurements allowed identification of tissue of origin with 92.0% accuracy among 44 healthy tissue types. Systematic analysis of transcriptional activity levels of 459 kinase genes was performed across 44 healthy and 55 pathological tissue types and a genome wide analysis of kinase gene co-expression networks was done. This analysis revealed biologically and medically interesting data on putative kinase gene functions in health and disease. Finally, we developed a method for alignment of gene expression profiles (AGEP) to perform analysis for individual patient samples to pinpoint gene- and pathway-specific changes in the test sample in relation to the reference transcriptome database. We also showed how large-scale gene expression data resources can be used to quantitatively characterize changes in the transcriptomic program of differentiating stem cells. Taken together, these studies indicate the power of systematic bioinformatic analyses to infer biological and medical insights from existing published datasets as well as to facilitate the interpretation of new molecular profiling data from individual patients.
Resumo:
In recent years, thanks to developments in information technology, large-dimensional datasets have been increasingly available. Researchers now have access to thousands of economic series and the information contained in them can be used to create accurate forecasts and to test economic theories. To exploit this large amount of information, researchers and policymakers need an appropriate econometric model.Usual time series models, vector autoregression for example, cannot incorporate more than a few variables. There are two ways to solve this problem: use variable selection procedures or gather the information contained in the series to create an index model. This thesis focuses on one of the most widespread index model, the dynamic factor model (the theory behind this model, based on previous literature, is the core of the first part of this study), and its use in forecasting Finnish macroeconomic indicators (which is the focus of the second part of the thesis). In particular, I forecast economic activity indicators (e.g. GDP) and price indicators (e.g. consumer price index), from 3 large Finnish datasets. The first dataset contains a large series of aggregated data obtained from the Statistics Finland database. The second dataset is composed by economic indicators from Bank of Finland. The last dataset is formed by disaggregated data from Statistic Finland, which I call micro dataset. The forecasts are computed following a two steps procedure: in the first step I estimate a set of common factors from the original dataset. The second step consists in formulating forecasting equations including the factors extracted previously. The predictions are evaluated using relative mean squared forecast error, where the benchmark model is a univariate autoregressive model. The results are dataset-dependent. The forecasts based on factor models are very accurate for the first dataset (the Statistics Finland one), while they are considerably worse for the Bank of Finland dataset. The forecasts derived from the micro dataset are still good, but less accurate than the ones obtained in the first case. This work leads to multiple research developments. The results here obtained can be replicated for longer datasets. The non-aggregated data can be represented in an even more disaggregated form (firm level). Finally, the use of the micro data, one of the major contributions of this thesis, can be useful in the imputation of missing values and the creation of flash estimates of macroeconomic indicator (nowcasting).
Resumo:
Reorganizing a dataset so that its hidden structure can be observed is useful in any data analysis task. For example, detecting a regularity in a dataset helps us to interpret the data, compress the data, and explain the processes behind the data. We study datasets that come in the form of binary matrices (tables with 0s and 1s). Our goal is to develop automatic methods that bring out certain patterns by permuting the rows and columns. We concentrate on the following patterns in binary matrices: consecutive-ones (C1P), simultaneous consecutive-ones (SC1P), nestedness, k-nestedness, and bandedness. These patterns reflect specific types of interplay and variation between the rows and columns, such as continuity and hierarchies. Furthermore, their combinatorial properties are interlinked, which helps us to develop the theory of binary matrices and efficient algorithms. Indeed, we can detect all these patterns in a binary matrix efficiently, that is, in polynomial time in the size of the matrix. Since real-world datasets often contain noise and errors, we rarely witness perfect patterns. Therefore we also need to assess how far an input matrix is from a pattern: we count the number of flips (from 0s to 1s or vice versa) needed to bring out the perfect pattern in the matrix. Unfortunately, for most patterns it is an NP-complete problem to find the minimum distance to a matrix that has the perfect pattern, which means that the existence of a polynomial-time algorithm is unlikely. To find patterns in datasets with noise, we need methods that are noise-tolerant and work in practical time with large datasets. The theory of binary matrices gives rise to robust heuristics that have good performance with synthetic data and discover easily interpretable structures in real-world datasets: dialectical variation in the spoken Finnish language, division of European locations by the hierarchies found in mammal occurrences, and co-occuring groups in network data. In addition to determining the distance from a dataset to a pattern, we need to determine whether the pattern is significant or a mere occurrence of a random chance. To this end, we use significance testing: we deem a dataset significant if it appears exceptional when compared to datasets generated from a certain null hypothesis. After detecting a significant pattern in a dataset, it is up to domain experts to interpret the results in the terms of the application.
Resumo:
Finland witnessed a surge in crime news reporting during the 1990s. At the same time, there was a significant rise in the levels of fear of crime reported by surveys. This research examines whether and how the two phenomena: news media and fear of violence were associated with each other. The dissertation consists of five sub-studies and a summary article. The first sub-study is a review of crime reporting trends in Finland, in which I have reviewed prior research and used existing Finnish datasets on media contents and crime news media exposure. The second study examines the association between crime media consumption and fear of crime when personal and vicarious victimization experiences have been held constant. Apart from analyzing the impact of crime news consumption on fear, media effects on general social trust are analyzed in the third sub-study. In the fourth sub-study I have analyzed the contents of the Finnish Poliisi-TV programme and compared the consistency of the picture of violent crime between official data sources and the programme. In the fifth and final sub-study, the victim narratives of Poliisi-TV s violence news contents have been analyzed. The research provides a series of results which are unprecedented in Finland. First, it observes that as in many other countries, the quantity of crime news supply has increased quite markedly in Finland. Second, it verifies that exposure to crime news is related to being worried about violent victimization and avoidance behaviour. Third, it documents that exposure to TV crime reality-programming is associated with reduced social trust among Finnish adolescents. Fourth, the analysis of Poliisi-TV shows that it transmits a distorted view of crime when contrasted with primary data sources on crime, but that this distortion is not as big as could be expected from international research findings and epochal theories of sociology. Fifth, the portrayals of violence victims in Poliisi-TV do not fit the traditional ideal types of victims that are usually seen to dominate crime media. The fact that the victims of violence in Poliisi-TV are ordinary people represents a wider development of the changing significance of the crime victim in Finland. The research concludes that although the media most likely did have an effect on the rising public fears in the 1990s, the mechanism was not as straight forward as has often been claimed. It is likely that there are other factors in the fear-media equation that are affecting both fear levels and crime reporting and that these factors are interactive in nature. Finally, the research calls for a re-orientation of media criminology and suggests more emphasis on the positive implications of crime in the media. Keywords: crime, media, fear of crime, violence, victimization, news
Resumo:
The world of mapping has changed. Earlier, only professional experts were responsible for map production, but today ordinary people without any training or experience can become map-makers. The number of online mapping sites, and the number of volunteer mappers has increased significantly. The development of the technology, such as satellite navigation systems, Web 2.0, broadband Internet connections, and smartphones, have had one of the key roles in enabling the rise of volunteered geographic information (VGI). As opening governmental data to public is a current topic in many countries, the opening of high quality geographical data has a central role in this study. The aim of this study is to investigate how is the quality of spatial data produced by volunteers by comparing it with the map data produced by public authorities, to follow what occurs when spatial data are opened for users, and to get acquainted with the user profile of these volunteer mappers. A central part of this study is OpenStreetMap project (OSM), which aim is to create a map of the entire world by volunteers. Anyone can become an OpenStreetMap contributor, and the data created by the volunteers are free to use for anyone without restricting copyrights or license charges. In this study OpenStreetMap is investigated from two viewpoints. In the first part of the study, the aim was to investigate the quality of volunteered geographic information. A pilot project was implemented by following what occurs when a high-resolution aerial imagery is released freely to the OpenStreetMap contributors. The quality of VGI was investigated by comparing the OSM datasets with the map data of The National Land Survey of Finland (NLS). The quality of OpenStreetMap data was investigated by inspecting the positional accuracy and the completeness of the road datasets, as well as the differences in the attribute datasets between the studied datasets. Also the OSM community was under analysis and the development of the map data of OpenStreetMap was investigated by visual analysis. The aim of the second part of the study was to analyse the user profile of OpenStreetMap contributors, and to investigate how the contributors act when collecting data and editing OpenStreetMap. The aim was also to investigate what motivates users to map and how is the quality of volunteered geographic information envisaged. The second part of the study was implemented by conducting a web inquiry to the OpenStreetMap contributors. The results of the study show that the quality of OpenStreetMap data compared with the data of National Land Survey of Finland can be defined as good. OpenStreetMap differs from the map of National Land Survey especially because of the amount of uncertainty, for example because of the completeness and uniformity of the map are not known. The results of the study reveal that opening spatial data increased notably the amount of the data in the study area, and both the positional accuracy and completeness improved significantly. The study confirms the earlier arguments that only few contributors have created the majority of the data in OpenStreetMap. The inquiry made for the OpenStreetMap users revealed that the data are most often collected by foot or by bicycle using GPS device, or by editing the map with the help of aerial imageries. According to the responses, the users take part to the OpenStreetMap project because they want to make maps better, and want to produce maps, which have information that is up-to-date and cannot be found from any other maps. Almost all of the users exploit the maps by themselves, most popular methods being downloading the map into a navigator or into a mobile device. The users regard the quality of OpenStreetMap as good, especially because of the up-to-dateness and the accuracy of the map.
Resumo:
Work has a central role in the lives of big share of adult Finns and meals they eat during the workday comprise an important factor in their nutrition, health, and well-being. On workdays, lunch is mainly eaten at worksite canteens or, especially among women, as a packed meal in the workplace s break room. No national-level data is available on the nutritional quality of the meals served by canteens, although the Finnish Institute of Occupational Health laid out the first nutrition recommendations for worksite canteens in 1971. The aim of this study was to examine the contribution of various socio-demographic, socioeconomic, and work-related factors to the lunch eating patterns of Finnish employees during the working day and how lunch eating patterns influence dietary intake. Four different population-based cross-sectional datasets were used in this thesis. Three of the datasets were collected by the National Institute for Health and Welfare (Health Behaviour and Health among the Finnish Adult Population survey from 1979 to 2001, n=24746, and 2005 to 2007, n=5585, the National Findiet 2002 Study, n=261), and one of them by the Finnish Institute of Occupational Health (Work and Health in Finland survey from 1997, 2000, and 2003, n=6369). The Health Behaviour and Health among the Finnish Adult Population survey and the Work and Health in Finland survey are nationally representative studies that are conducted repeatedly. Survey information was collected by self-administered questionnaires, dietary recalls, and telephone interviews. The frequency of worksite canteen use has been quite stable for over two decades in Finland. A small decreasing trend can be seen in all socioeconomic groups. During the whole period studied, those with more years of education ate at worksite canteens more often than the others. The size of the workplace was the most important work-related determinant associated with the use of a worksite canteen. At small workplaces, other work-related determinants, like occupation, physical strain at work, and job control, were also associated with canteen use, whereas at bigger workplaces the associations were almost nonexistent. The major social determinants of worksite canteen availability were the education and occupational status of employees and the only work-related determinant was the size of the workplace. A worksite canteen was more commonly available to employees at larger workplaces and to those with the higher education and the higher occupational status. Even when the canteen was equally available to all employees, its use was nevertheless determined by occupational class and the place of residence, especially among female employees. Those with higher occupational status and those living in the Helsinki capital area ate in canteens more frequently than the others. Employees who ate at a worksite canteen consumed more vegetables and vegetable and fish dishes at lunch than did those who ate packed lunches. Also, the daily consumption of vegetables and the proportion of the daily users of vegetables were higher among those male employees who ate at a canteen. In conclusion, life possibilities, i.e. the availability of a canteen, education, occupational status, and work-related factors, played an important role in the choice of where to eat lunch among Finnish employees. The most basic prerequisite for eating in a canteen was availability, but there were also a number of underlying social determinants. Occupational status and the place of residence were the major structural factors behind individuals choices in their lunch eating patterns. To ensure the nutrition, health, and well-being of employees, employers should provide them with the option to have good quality meals during working hours. The availability of worksite canteens should be especially supported in lower socioeconomic groups. In addition, employees should be encouraged to have lunch at a worksite canteen when one is available by removing structural barriers to its use.