888 results for Dataset
Abstract:
Immigration has risen substantially in many European economies, with far-reaching if still uncertain implications for labor markets and industrial relations. This paper investigates such implications, focusing on employment flexibility, involving both ‘external flexibility’ (fixed-term or temporary agency and/or involuntary part-time work) and ‘internal flexibility’ (overtime and/or balancing-time accounts). The paper identifies reasons why immigration should generally increase the incidence of such flexibility, and why external should rise more than internal flexibility. The paper supports these claims using a dataset of establishments in sixteen European countries.
Abstract:
To accelerate the computation of the convex hull of a set of n points, a heuristic procedure is often applied first to reduce the n points to a set of s points, s ≤ n, that has the same hull. We present an algorithm that preconditions 2D data with integer coordinates, bounded by a box of size p × q, before a 2D convex hull is built. The algorithm has three distinct advantages. First, we prove that under the condition min(p, q) ≤ n the algorithm runs in O(n) time. Second, no explicit sorting of the data is required. Third, the reduced set of s points forms a simple polygonal chain and can therefore be pipelined directly into an O(n)-time convex hull algorithm. This paper empirically evaluates and quantifies the speedup gained by preconditioning a set of points with a method based on the proposed algorithm before common convex hull algorithms build the final hull. Experiments on various datasets consistently show a speedup factor of at least four when the condition min(p, q) ≤ n holds, and the smaller the ratio min(p, q)/n in the dataset, the greater the speedup.
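The column-bucketing idea behind this kind of preconditioning can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's algorithm. It assumes integer x-coordinates already shifted into 0..p-1 and keeps only the lowest and highest point in each x-column. Because columns are visited in index order, no comparison sort is needed. The sketch then runs one pass of Andrew's monotone chain over each of the two reduced chains. The names `column_extremes`, `chain` and `hull` are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]


def column_extremes(points: List[Point], p: int) -> Tuple[List[Point], List[Point]]:
    """Keep only the lowest and highest point of each integer x-column.

    Assumes 0 <= x < p. Visiting columns by index replaces a comparison sort,
    so this pass costs O(n + p).
    """
    lo: List[Optional[int]] = [None] * p
    hi: List[Optional[int]] = [None] * p
    for x, y in points:
        if not 0 <= x < p:
            raise ValueError(f"x={x} is outside [0, {p})")
        if lo[x] is None or y < lo[x]:
            lo[x] = y
        if hi[x] is None or y > hi[x]:
            hi[x] = y
    lower = [(x, y) for x, y in enumerate(lo) if y is not None]
    upper = [(x, y) for x, y in enumerate(hi) if y is not None]
    return lower, upper


def cross(o: Point, a: Point, b: Point) -> int:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def chain(pts: List[Point]) -> List[Point]:
    """One monotone-chain pass over x-ordered points, keeping strict left turns."""
    out: List[Point] = []
    for pt in pts:
        while len(out) >= 2 and cross(out[-2], out[-1], pt) <= 0:
            out.pop()
        out.append(pt)
    return out


def hull(points: List[Point], p: int) -> List[Point]:
    """Counter-clockwise hull vertices computed from the reduced point set."""
    lower, upper = column_extremes(points, p)
    # Lower chain runs left-to-right, upper chain right-to-left.
    ring = chain(lower) + chain(upper[::-1])
    result: List[Point] = []
    for pt in ring:
        if not result or result[-1] != pt:  # drop duplicates where the chains meet
            result.append(pt)
    if len(result) > 1 and result[0] == result[-1]:
        result.pop()
    return result


pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3), (3, 1), (2, 0)]
print(hull(pts, 5))  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

The reduced set has at most 2p points. To keep it small relative to n, one would bucket along the shorter side of the p × q box. The sketch buckets on x only, for brevity.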
Abstract:
Long-duration observations of Neptune’s brightness at two visible wavelengths provide a disk-averaged estimate of its atmospheric aerosol. Brightness variations were previously associated with the 11-year solar cycle, through solar-modulated mechanisms linked with either ultraviolet (UV) or galactic cosmic ray (GCR) effects on atmospheric particles. Here we use a recently extended brightness dataset (1972-2014), together with physically realistic modelling, to show that UV and GCR are likely to modulate Neptune’s atmosphere in combination rather than as alternatives. The importance of GCR is further supported by the response of Neptune’s atmosphere to an intermittent 1.5 to 1.9 year periodicity, which occurred preferentially in GCR (not UV) during the mid-1980s. This periodicity was detected both at Earth and in GCR measured by Voyager 2, which was then near Neptune. Similar coincident variability in Neptune’s brightness suggests nucleation onto GCR ions. Both the GCR and the UV mechanisms may act more rapidly than the subsequent atmospheric particle transport.
Abstract:
Based on a large dataset from eight Asian economies, we test the impact of post-crisis regulatory reforms on the performance of depository institutions in countries at different levels of financial development. We allow for technological heterogeneity and estimate a set of country-level stochastic cost frontiers. A deterministic bootstrapped meta-frontier then evaluates cost efficiency and cost technology. Our results support the view that liberalization policies have a positive impact on bank performance, while prudential regulation policies have the opposite effect. The removal of activity restrictions, bank privatization and foreign bank entry have a positive and significant impact on technological progress and cost efficiency. In contrast, prudential policies, which aim to protect the banking sector from excessive risk-taking, tend to adversely affect banks’ cost efficiency but not their cost technology.
Abstract:
Observations obtained during an 8-month deployment of AMF2 in a boreal environment in Hyytiälä, Finland, together with 20 years of comprehensive in-situ data from the SMEAR-II station, enable the characterization of biogenic aerosol, clouds and precipitation, and their interactions. The U.S. Department of Energy’s Atmospheric Radiation Measurement (ARM) Program conducted the campaign “Biogenic Aerosols - Effects on Clouds and Climate (BAECC)”. For this campaign, it deployed the ARM 2nd Mobile Facility (AMF2) to Hyytiälä, Finland, for an 8-month intensive measurement period from February to September 2014. The primary research goal is to understand the role of biogenic aerosols in cloud formation. Hyytiälä is host to SMEAR-II (Station for Measuring Forest Ecosystem-Atmosphere Relations), one of the world’s most comprehensive surface in-situ observation sites in a boreal forest environment. Since 1996, the station has continuously measured atmospheric aerosols, biogenic emissions and an extensive suite of parameters relevant to atmosphere-biosphere interactions. Combining vertical profiles from AMF2 with surface-based in-situ SMEAR-II observations allows processes at the surface to be related directly to processes occurring throughout the entire tropospheric column. The campaign also included extensive surface precipitation measurements, along with intensive observation periods involving aircraft flights and novel radiosonde launches. Together, these complementary observations provide a unique opportunity to investigate aerosol-cloud interactions and cloud-to-precipitation processes in a boreal environment. The BAECC dataset provides opportunities for evaluating and improving models of aerosol sources and transport, cloud microphysical processes, and boundary-layer structures. In addition, numerical models are being used to bridge the gap between surface-based and tropospheric observations.
Abstract:
We study the relationship between the sentiment levels of Twitter users and the evolving network structure that the users created by @-mentioning each other. We apply three sentiment scoring algorithms, including the open-source SentiStrength program, to a large dataset of tweets. Specifically, we make three contributions. First, we find that people with potentially the largest communication reach (according to a dynamic centrality measure) use sentiment differently from the average user. For example, they use positive sentiment more often and negative sentiment less often. Second, when we follow structurally stable Twitter communities over a period of months, we find that their sentiment levels are also stable. In most cases, sudden day-to-day changes in community sentiment can be traced to external events affecting the community. Third, based on our findings, we create and calibrate a simple agent-based model that can reproduce measures of emotive response comparable to those obtained from our empirical dataset.
Abstract:
In 2004 the National Household Survey (Pesquisa Nacional por Amostra de Domicílios, PNAD) estimated the prevalence of food and nutrition insecurity in Brazil. However, PNAD data cannot be disaggregated at the municipal level. The objective of this study was to build a statistical model, based on the PNAD dataset, to predict severe food insecurity for Brazilian municipalities. The exclusion criteria were incomplete food security data (19.30%), informants younger than 18 years old (0.07%), collective households (0.05%) and households headed by indigenous persons (0.19%). The modeling was carried out in three stages. First, variables related to food insecurity were selected using univariate logistic regression. These candidate variables were drawn from those included in PNAD and in the 2000 Census. Multivariate logistic regression was then fitted, and non-significant variables were removed on the basis of their adjusted odds ratios. Finally, the Wald test was applied to check the significance of the coefficients in the logistic equation. The final model included the following variables: per capita income; years of schooling; race and gender of the household head; urban or rural residence; access to the public water supply; presence of children; total number of household inhabitants; and state of residence. The adequacy of the model was tested with the Hosmer-Lemeshow test (p = 0.561) and the ROC curve (area = 0.823). These tests indicated that the model has strong predictive power and can be used to estimate household food insecurity in Brazilian municipalities. They also suggest that similar predictive models may be useful tools in other Latin American countries.
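The reported ROC area (0.823) summarizes how well the model's predicted probabilities rank food-insecure households above food-secure ones. Below is a minimal Python sketch of that computation using the Mann-Whitney pair-counting identity. `roc_auc` is a hypothetical helper. The labels and scores in the usage line are invented for illustration and are not PNAD data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pair counting (Mann-Whitney identity).

    AUC is the probability that a randomly chosen positive case scores higher
    than a randomly chosen negative case; ties count as one half.
    O(P * N) comparisons, which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 1 = severe food insecurity (hypothetical labels and predicted probabilities)
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```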
Abstract:
An accurate estimate of the contribution of surface longwave fluxes is important for calculating the surface radiation budget. That budget in turn controls all the components of the surface energy budget, such as evaporation and sensible heat fluxes. This study evaluates the performance of various downward longwave radiation parameterizations for clear-sky and all-sky days in the Sertãozinho region of São Paulo, Brazil. The equations were adjusted to observations of longwave radiation. The adjusted equations were then evaluated for every hour of the day. They fit well for most of the day; the poorest fits occurred near dawn and sunset, followed by nighttime. Seasonal variation was studied by comparing the dry and rainy periods in the dataset. Least-squares linear regressions for the dry period and for the rainy period both yielded the same coefficients as those found for the complete period. The equation that best fits the observed data at this site is expected to be usable for estimates in other regions of the State of São Paulo where such information is not available.
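As an example of how a parameterization can be adjusted to observations, the sketch below fits the two coefficients of Brunt's clear-sky formula, L↓ = σT⁴(a + b√e). Here T is the screen-level air temperature in K and e is the vapour pressure in hPa. The fit uses ordinary least squares on the effective emissivity L↓/(σT⁴). The abstract does not say which parameterizations were evaluated, so the following are assumptions: the choice of Brunt's equation, the function names and the synthetic coefficients. All-sky cases would also need a cloud correction, which is not shown.

```python
import math
from typing import Sequence, Tuple

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def brunt(temp_k: float, vap_hpa: float, a: float, b: float) -> float:
    """Clear-sky downward longwave flux (W m^-2) from Brunt's formula."""
    return SIGMA * temp_k ** 4 * (a + b * math.sqrt(vap_hpa))


def fit_brunt(lw_down: Sequence[float], temp_k: Sequence[float],
              vap_hpa: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares estimates of a and b.

    Dividing the observed flux by sigma*T^4 gives an effective emissivity
    that is linear in sqrt(e), so a and b are the intercept and the slope.
    """
    xs = [math.sqrt(e) for e in vap_hpa]
    ys = [lw / (SIGMA * t ** 4) for lw, t in zip(lw_down, temp_k)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("vapour pressure must vary to estimate the slope")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Synthetic "observations" generated with illustrative coefficients.
temps = [285.0, 290.0, 295.0, 300.0]
vaps = [8.0, 12.0, 18.0, 25.0]
obs = [brunt(t, e, 0.52, 0.065) for t, e in zip(temps, vaps)]
print(fit_brunt(obs, temps, vaps))  # ≈ (0.52, 0.065)
```

With real hourly data, the same fit could be repeated separately for dry-period and rainy-period subsets to compare coefficients.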
Abstract:
According to most studies on seed dispersal in tropical forests, mammals and birds are the main dispersal agents, and the role played by other animal groups remains poorly explored. We investigate qualitative and quantitative components of the role played by the tortoise Chelonoidis denticulata in seed dispersal in the southeastern Amazon. We also examine how seasonal variation in tortoise movement patterns influences the resulting seed shadows. To estimate the seed shadows produced by this tortoise, we combined two sources of information. The first was seed passage times through its digestive tract, which varied from 3 to 17 days. The second was a robust movement dataset from 18 adult C. denticulata monitored with radio transmitters and spool-and-line tracking devices. A total of 4,206 seeds were found in 94 collected feces, belonging to 50 seed morphotypes from at least 25 plant genera. Very low rates of damage to the external structure of the ingested seeds were observed. Additionally, germination trials suggested that passage through the digestive tract of C. denticulata does not appear to negatively affect seed germination. The estimated seed shadows are likely to contribute significantly to the dispersal of seeds away from parent plants. During the dry season, seeds were dispersed on average 174.1 m from the location of fruit ingestion; during the rainy season, this mean dispersal distance increased to 276.7 m. Our results suggest that C. denticulata plays an important role in seed dispersal in Amazonian forests. They also highlight the influence of seasonal changes in movement on the resulting seed shadows.
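Combining passage times with a movement track can be sketched as follows. The sketch assumes one position fix per day in a local metric coordinate system (metres) and treats every tracked day as a possible ingestion day. It is a simplified, hypothetical illustration of the idea, not the estimation procedure used in the study.

```python
import math
from typing import List, Sequence, Tuple

Fix = Tuple[float, float]  # (x, y) in metres; one fix per day is assumed


def dispersal_distances(track: Sequence[Fix], passage_days: Sequence[int]) -> List[float]:
    """Straight-line distances from ingestion site to deposition site.

    Each tracked day t0 is treated as a possible ingestion day; a seed with
    passage time d is deposited where the animal is on day t0 + d.
    Combinations that run past the end of the track are skipped.
    """
    distances: List[float] = []
    for t0, (x0, y0) in enumerate(track):
        for d in passage_days:
            t1 = t0 + d
            if t1 < len(track):
                x1, y1 = track[t1]
                distances.append(math.hypot(x1 - x0, y1 - y0))
    return distances


print(dispersal_distances([(0, 0), (3, 4), (6, 8)], [1, 2]))  # [5.0, 10.0, 5.0]
```

Running this separately on dry-season and rainy-season tracks, with the observed 3 to 17 day passage times, would give the kind of seasonal mean distances compared in the abstract.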
Abstract:
Coleodactylus amazonicus, a small diurnal leaf-litter gecko widely distributed in the Amazon Basin, has been considered a single species with no significant morphological differences between populations across its range. A recent molecular study, however, detected large genetic differences between populations in central Amazonia and those in the easternmost part of the Amazon Basin, suggesting the presence of taxonomically unrecognised diversity. In this study, DNA sequences of three mitochondrial genes (16S, cytb and ND4) and two nuclear genes (RAG-1, c-mos) were used to investigate whether the species currently identified as C. amazonicus contains morphologically cryptic species lineages. The present phylogenetic analysis reveals further genetic subdivision, with at least five potential species lineages. These lineages are restricted to the northeastern (lineage A), southeastern (lineage B), central-northern (lineage E) and central-southern (lineages C and D) parts of the Amazon Basin. All clades are characterized by exclusive groups of alleles for both nuclear genes and by highly divergent mitochondrial haplotype clades. For the entire mtDNA dataset, corrected pairwise net sequence divergence between sister lineages ranges from 9.1% to 20.7%. The results of this study suggest that the real diversity of "C. amazonicus" has been underestimated because of its apparent cryptic diversification. (C) 2009 Elsevier Inc. All rights reserved.
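Net divergence between two lineages is commonly computed with Nei's formula, d_A = d_XY - (d_X + d_Y)/2. Here d_XY is the mean pairwise distance between lineages, and d_X and d_Y are the mean distances within each lineage. For brevity, the sketch below uses uncorrected p-distances on aligned sequences. The "corrected" values reported in the abstract would first apply a substitution model (for example, Kimura two-parameter) to each pairwise distance. The function names are hypothetical.

```python
from itertools import combinations, product
from statistics import mean
from typing import Sequence

BASES = set("ACGT")


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    sites = [(x, y) for x, y in zip(s1.upper(), s2.upper()) if x in BASES and y in BASES]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def mean_within(group: Sequence[str]) -> float:
    """Mean pairwise distance inside one lineage (0 for a single sequence)."""
    if len(group) < 2:
        return 0.0
    return mean(p_distance(a, b) for a, b in combinations(group, 2))


def net_divergence(gx: Sequence[str], gy: Sequence[str]) -> float:
    """Nei's net divergence: d_XY - (d_X + d_Y) / 2."""
    d_xy = mean(p_distance(a, b) for a, b in product(gx, gy))
    return d_xy - (mean_within(gx) + mean_within(gy)) / 2


# Toy aligned sequences for two hypothetical lineages.
X = ["AAAAAAAAAA", "AAAAAAAAAC"]
Y = ["TTAAAAAAAA", "TTAAAAAAAC"]
print(round(net_divergence(X, Y), 3))  # 0.15
```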
Abstract:
Broad-scale phylogenetic analyses of the angiosperms and of the Asteridae have failed to confidently resolve relationships among the major lineages of the campanulid Asteridae (i.e., the euasterid II of APG II, 2003). To address this problem, we assembled the presently available sequences for a core set of 50 taxa. These taxa represent the diversity of the four largest lineages (Apiales, Aquifoliales, Asterales, Dipsacales) as well as the smaller "unplaced" groups (e.g., Bruniaceae, Paracryphiaceae, Columelliaceae). We constructed four data matrices for phylogenetic analysis:

- a chloroplast coding matrix (atpB, matK, ndhF, rbcL);
- a chloroplast non-coding matrix (rps16 intron, trnT-F region, trnV-atpE IGS);
- a combined chloroplast dataset (all seven chloroplast regions);
- a combined genome matrix (seven chloroplast regions plus 18S and 26S rDNA).

Bayesian analyses of these datasets using mixed substitution models often produced well-resolved and well-supported trees. Consistent with more weakly supported results from previous studies, our analyses support the monophyly of the four major clades and the relationships among them. Most importantly, Asterales are inferred to be sister to a clade containing Apiales and Dipsacales. Paracryphiaceae is consistently placed sister to Dipsacales. However, the exact relationships of Bruniaceae, Columelliaceae, and an Escallonia clade depended on the dataset. Areas of poor resolution in the combined analyses may be partly explained by conflict between the coding and non-coding data partitions. We discuss the implications of these results for our understanding of campanulid phylogeny and evolution, paying special attention to how our findings bear on character evolution and biogeography in Dipsacales.
Abstract:
Searching a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and metric access methods (MAMs) have been developed to support it. A growing number of applications require indices that can be built quickly and repeatedly while still responding quickly to similarity queries. Growing main memory capacity and falling memory costs also motivate the use of memory-based MAMs. In this paper, we propose the Onion-tree, a new, robust, dynamic, memory-based MAM that slices the metric space into disjoint subspaces to index complex data quickly. It introduces three major characteristics:

(i) a partitioning method that controls the number of disjoint subspaces generated at each node;
(ii) a replacement technique that can change the leaf node pivots during insertion operations;
(iii) extended range and k-NN query algorithms that support the new partitioning method, including a new visit order of the subspaces in k-NN queries.

Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. We compared the Onion-tree with the MM-tree and with a memory-based version of the Slim-tree, and the Onion-tree was always the fastest to build its index. The experiments also showed that the Onion-tree significantly improved range and k-NN query performance. It was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all tests. (C) 2010 Elsevier B.V. All rights reserved.
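Metric access methods such as the Slim-tree, MM-tree and Onion-tree rely on the triangle inequality to discard objects without computing their distance to the query. The sketch below shows only that generic pruning rule, using a single pivot and a range query. It does not implement the Onion-tree's partitioning, pivot replacement or subspace visit order, and `PivotIndex` is a hypothetical name.

```python
from typing import Callable, Generic, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class PivotIndex(Generic[T]):
    """Stores d(o, pivot) for every object so queries can skip objects cheaply."""

    def __init__(self, objects: Sequence[T], pivot: T, dist: Callable[[T, T], float]):
        self.pivot = pivot
        self.dist = dist
        self.entries: List[Tuple[T, float]] = [(o, dist(o, pivot)) for o in objects]

    def range_query(self, q: T, r: float) -> List[T]:
        """All objects o with d(q, o) <= r."""
        d_qp = self.dist(q, self.pivot)
        hits: List[T] = []
        for o, d_op in self.entries:
            # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)|, so if this
            # lower bound already exceeds r, o cannot be an answer.
            if abs(d_qp - d_op) > r:
                continue
            if self.dist(q, o) <= r:
                hits.append(o)
        return hits


idx = PivotIndex(list(range(21)), 0, lambda a, b: abs(a - b))
print(idx.range_query(10, 2))  # [8, 9, 10, 11, 12]
```

In this example, only 6 distance computations are made at query time (one to the pivot plus five candidates), instead of 21.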
Abstract:
A large amount of biological data has been produced in recent years. Important knowledge can be extracted from these data with data analysis techniques. Clustering plays an important role in data analysis by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its own bias, which makes it more suitable for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. It also presents a clustering algorithm that solves this formulation using GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three other well-known algorithms. The proposed algorithm produced the best clustering results, and statistical tests confirmed this. (C) 2009 Elsevier Ltd. All rights reserved.
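GRASP alternates two phases and keeps the best solution over several iterations. The first phase is a greedy randomized construction that picks each element from a restricted candidate list (RCL) of near-best choices. The second is a local search. The abstract does not give the paper's mathematical formulation, so the sketch below applies GRASP to a stand-in objective: the k-medoids cost, that is, the sum of distances from each object to its nearest medoid. The function names and the `alpha` and `iterations` defaults are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Dist = Callable[[T, T], float]


def cost(points: Sequence[T], medoids: List[int], dist: Dist) -> float:
    """k-medoids objective: each point's distance to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def construct(points: Sequence[T], k: int, dist: Dist, alpha: float,
              rng: random.Random) -> List[int]:
    """Greedy randomized construction using a restricted candidate list (RCL)."""
    medoids: List[int] = []
    while len(medoids) < k:
        scored = [(cost(points, medoids + [c], dist), c)
                  for c in range(len(points)) if c not in medoids]
        best = min(s for s, _ in scored)
        worst = max(s for s, _ in scored)
        limit = best + alpha * (worst - best)  # alpha=0: pure greedy, alpha=1: pure random
        medoids.append(rng.choice([c for s, c in scored if s <= limit]))
    return medoids


def local_search(points: Sequence[T], medoids: List[int], dist: Dist) -> List[int]:
    """Swap one medoid for one non-medoid while the cost strictly decreases."""
    current = cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(len(medoids)):
            for c in range(len(points)):
                if c in medoids:
                    continue
                trial = medoids[:i] + [c] + medoids[i + 1:]
                trial_cost = cost(points, trial, dist)
                if trial_cost < current:
                    medoids, current, improved = trial, trial_cost, True
    return medoids


def grasp_kmedoids(points: Sequence[T], k: int, dist: Dist, iterations: int = 20,
                   alpha: float = 0.3, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Best medoid set over several GRASP iterations, plus a cluster label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    best: List[int] = []
    best_cost = float("inf")
    for _ in range(iterations):
        medoids = local_search(points, construct(points, k, dist, alpha, rng), dist)
        c = cost(points, medoids, dist)
        if c < best_cost:
            best, best_cost = medoids, c
    labels = [min(range(k), key=lambda j: dist(p, points[best[j]])) for p in points]
    return best, labels


pts = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
medoids, labels = grasp_kmedoids(pts, 2, lambda a, b: abs(a - b))
print(sorted(medoids))  # [1, 4]
```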
Abstract:
The substitution of missing values, also called imputation, is an important data preparation task in many domains. Ideally, imputation should not introduce bias into the dataset. This aspect has usually been assessed with measures of the predictive capability of imputation methods. Such measures rely on simulating missing entries for attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it cannot reveal how imputed values influence the ultimate modelling task (e.g., classification). We argue that imputation cannot be properly evaluated apart from the modelling task, so alternative approaches are needed. This article examines the influence of imputed values on classification. In particular, it describes a practical procedure for estimating the bias introduced by imputation. As an additional contribution, we use this procedure to illustrate empirically the performance of three imputation methods (majority, naive Bayes and Bayesian networks) on three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) were used as modelling tools in our experiments. The results illustrate a variety of situations that can arise in data preparation practice.
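The conventional evaluation described above hides known values, imputes them and compares the result with the originals. The article argues that this evaluation is not sufficient on its own. It can be sketched for the simplest of the three methods: majority (mode) imputation of one categorical attribute. The function names are hypothetical, and the article's procedure for estimating bias in the downstream classifier is not shown.

```python
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def majority_impute(column: Sequence[Optional[Hashable]]) -> List[Hashable]:
    """Replace every None with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def simulated_imputation_accuracy(column: Sequence[Hashable], missing_rate: float,
                                  seed: int = 0) -> float:
    """Hide known values at random, impute them, and score exact matches."""
    rng = random.Random(seed)
    hidden = {i for i in range(len(column)) if rng.random() < missing_rate}
    if not hidden:
        raise ValueError("no values were hidden; raise missing_rate")
    masked = [None if i in hidden else v for i, v in enumerate(column)]
    imputed = majority_impute(masked)
    return sum(imputed[i] == column[i] for i in hidden) / len(hidden)


print(majority_impute(["a", None, "a", "b"]))  # ['a', 'a', 'a', 'b']
print(simulated_imputation_accuracy(["red"] * 8 + ["blue"] * 2, 0.3))
```

A high score here says nothing about whether a classifier trained on the imputed data is biased. That gap is the one the article addresses.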
Abstract:
Several popular machine learning techniques were originally designed to solve two-class problems. However, many classification problems involve more than two classes. One way to handle multiclass problems with binary classifiers is to decompose the multiclass problem into multiple binary sub-problems arranged in a binary tree. This approach requires a binary partition of the classes at each node of the tree, and these partitions define the tree structure. This paper presents two algorithms that determine the tree structure from information collected from the dataset at hand. This allows the tree structure to be determined automatically for any multiclass dataset.
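One hypothetical way to derive a tree structure from the data is to compute each class's centroid and recursively split the set of classes in two. Each split is seeded with the two classes whose centroids lie farthest apart, and every other class joins the nearer seed. This heuristic is an illustrative assumption, not either of the paper's two algorithms. At each internal node, a binary classifier would then be trained to separate the left group of classes from the right group.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def class_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Mean feature vector of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def build_tree(classes: List[str], cents: Dict[str, List[float]]) -> Tree:
    """Recursively split the classes in two; leaves are single class labels."""
    if len(classes) == 1:
        return classes[0]
    # Seed the two sides with the pair of classes whose centroids are farthest apart.
    a, b = max(((p, q) for i, p in enumerate(classes) for q in classes[i + 1:]),
               key=lambda pq: math.dist(cents[pq[0]], cents[pq[1]]))
    left, right = [a], [b]
    for c in classes:
        if c in (a, b):
            continue
        # Assign every other class to the nearer seed.
        if math.dist(cents[c], cents[a]) <= math.dist(cents[c], cents[b]):
            left.append(c)
        else:
            right.append(c)
    return (build_tree(left, cents), build_tree(right, cents))


X = [[0, 0], [0, 1], [10, 0], [10, 1], [10, 10], [11, 10]]
y = ["a", "a", "b", "b", "c", "c"]
cents = class_centroids(X, y)
print(build_tree(sorted(cents), cents))  # ('a', ('c', 'b'))
```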