905 results for Web Mining, Data Mining, User Topic Model, Web User Profiles
Abstract:
An efficient two-level model identification method aimed at maximising a model's generalisation capability is proposed for a large class of linear-in-the-parameters models identified from observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. At the upper level, the two regularisation parameters of the elastic net are optimised by a particle swarm optimisation (PSO) algorithm that minimises the leave-one-out (LOO) mean square error (LOOMSE). The paper makes two original contributions. First, an elastic net cost function is defined and applied on the basis of orthogonal decomposition, which makes model structure selection automatic: no predetermined error tolerance is needed to terminate the forward selection process. Second, it is shown that the LOOMSE of the resulting ENOFR models can be computed analytically without actually splitting the data set, and that the associated computational cost is small thanks to the ENOFR procedure. Consequently, a fully automated procedure is achieved that needs no separate validation data set for iterative model evaluation. Illustrative examples are included to demonstrate the effectiveness of the new approaches.
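The analytic LOO shortcut mentioned above rests on a standard identity for linear-in-the-parameters models with a quadratic penalty: the leave-one-out residual equals the ordinary residual divided by 1 − h_ii, where h_ii is the i-th diagonal entry of the regularised hat matrix. The sketch below illustrates that identity for a plain ridge penalty only and checks it against explicit data splitting. It is not the ENOFR algorithm (which adds orthogonal forward selection and an L1 term), and all function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Regularised least squares: theta = (X^T X + lam * I)^{-1} X^T y."""
    n_params = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_params), X.T @ y)

def loo_mse_analytic(X, y, lam):
    """LOO mean square error from ONE fit, using e_i / (1 - h_ii)."""
    n_params = X.shape[1]
    # Diagonal of the regularised hat matrix H = X (X^T X + lam * I)^{-1} X^T
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(n_params))
    h = np.einsum("ij,jk,ik->i", X, A_inv, X)
    residuals = y - X @ ridge_fit(X, y, lam)
    return float(np.mean((residuals / (1.0 - h)) ** 2))

def loo_mse_bruteforce(X, y, lam):
    """Reference value: refit n times, each time holding out one sample."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        theta = ridge_fit(X[keep], y[keep], lam)
        errors.append((y[i] - X[i] @ theta) ** 2)
    return float(np.mean(errors))

# Synthetic example (hypothetical data): 50 samples, 5 candidate regressors
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
for lam in (0.01, 0.1, 1.0):
    print(lam, loo_mse_analytic(X, y, lam), loo_mse_bruteforce(X, y, lam))
```

In a two-level scheme of the kind described, an outer optimiser (PSO in the paper) would evaluate something like `loo_mse_analytic` for each candidate setting of the regularisation parameters; a simple grid over `lam` is the cheapest stand-in.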
Abstract:
Aim: Most vascular plants on Earth form mycorrhizae, a symbiotic relationship between plants and fungi. Despite broad recognition of the importance of mycorrhizae for global carbon and nutrient cycling, we do not know how soil and climate variables relate to the intensity with which mycorrhizal fungi colonize plant roots. Here we quantify the global patterns of these relationships.
Location: Global.
Methods: Data on plant root colonization intensities by the two dominant types of mycorrhizal fungi worldwide, arbuscular (4887 plant species at 233 sites) and ectomycorrhizal (125 plant species at 92 sites), were compiled from published studies. Climatic and soil data were extracted from global datasets. For a given mycorrhizal type, we calculated at each site the mean root colonization intensity across all potentially mycorrhizal plant species found at the site, and analysed these data by generalized additive model regression with environmental factors as predictor variables.
Results: We show for the first time that, at the global scale, the intensity of plant root colonization by arbuscular mycorrhizal fungi is strongly related to warm-season temperature, frost periods and soil carbon-to-nitrogen ratio, and is highest at sites with continental climates featuring mild summers and high soil nitrogen availability. In contrast, the intensity of ectomycorrhizal colonization is related to soil acidity, soil carbon-to-nitrogen ratio and seasonality of precipitation, and is highest at sites with acidic soils and relatively constant precipitation.
Main conclusions: We provide the first quantitative global maps of mycorrhizal colonization intensity based on environmental drivers, and suggest that environmental changes will affect distinct types of mycorrhizae differently. Future analyses of the potential effects of environmental change on global carbon and nutrient cycling via mycorrhizal pathways will need to take into account the relationships discovered in this study.
Abstract:
We present a comprehensive analysis of the spatial, kinematic and chemical properties of stars and globular clusters (GCs) in the 'ordinary' elliptical galaxy NGC 4494, using data from the Keck and Subaru telescopes. We derive galaxy surface brightness and colour profiles out to large galactocentric radii and compare the colour profiles with metallicities derived from the near-infrared calcium triplet. We obtain stellar kinematics out to ~3.5 effective radii. Beyond ~1.8 effective radii the kinematics appear flattened or elongated, in contrast to the relatively round photometric isophotes. In fact, NGC 4494 may be a flattened galaxy, possibly even an S0, seen at an inclination of ~45°. We publish a catalogue of 431 GC candidates brighter than i₀ = 24 based on the photometry, of which 109 are confirmed spectroscopically and 54 have measured spectroscopic metallicities. We also report the discovery of three spectroscopically confirmed ultra-compact dwarfs around NGC 4494 with measured metallicities of −0.4 ≲ [Fe/H] ≲ −0.3; based on their properties, we conclude that they are simply bright GCs. The metal-poor GCs rotate with an amplitude similar to that of the galaxy stars, while the metal-rich GCs show only marginal rotation. We supplement our analysis with available literature data and results. Using model predictions of galaxy formation and a suite of merger simulations, we find that many of the observational properties of NGC 4494 may be explained by formation in a relatively recent gas-rich major merger. Complete studies of individual galaxies that combine a range of observational avenues and methods, such as the one presented here, will be an invaluable tool for constraining the fine details of galaxy formation models, especially at large galactocentric radii.
Abstract:
In arthropods, most cases of morphological dimorphism among males result from a conditional evolutionarily stable strategy (ESS) with status-dependent tactics. In conditionally male-dimorphic species, the status distributions of the male morphs often overlap, and the environmentally cued threshold (ET) model states that the degree of overlap depends on the genetic variation in the distribution of the switchpoints that determine which morph is expressed at each value of status. Here we describe male dimorphism and alternative mating behaviors in the harvestman Serracutisoma proximum. Majors have elongated second legs and use them in territorial fights; minors have short second legs and do not fight, but instead sneak into majors' territories and copulate with egg-guarding females. The static allometry of the second legs reveals that expression of the major phenotype depends on body size (status), and that the switchpoint underlying the dimorphism shows a large amount of genetic variation in the population, which probably results from weak selective pressure on this trait. In a mark-recapture study, we show that expressing the major phenotype carries no survival cost, which is consistent with our hypothesis of weak selection on the switchpoint. Finally, we demonstrate that the switchpoint is independent of the status distribution. In conclusion, our data support the ET model's prediction that the genetic correlation between status and switchpoint is low, allowing the status distribution to evolve or to fluctuate seasonally without any effect on the position of the mean switchpoint.
Abstract:
Clustering quality or validation indices allow the quality of a clustering to be evaluated, supporting the selection of a specific partition or clustering structure in its natural unsupervised setting, where the true solution is unknown or unavailable. In this paper, we investigate the use of quality indices based mostly on the concepts of cluster compactness and separation for evaluating clustering results (partitions in particular). The work aims to offer a general perspective on the appropriate use of quality indices for clustering evaluation. After presenting some commonly used indices, as well as indices recently proposed in the literature, we address key issues in the practical use of quality indices. A general methodological approach is presented that involves identifying appropriate index thresholds, and this approach is compared with the simple use of quality indices to evaluate a clustering solution.
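As one concrete example of an index built from compactness and separation, the silhouette width compares each point's mean distance to the rest of its own cluster (a) with its mean distance to the nearest other cluster (b), giving s = (b − a)/max(a, b). The pure-Python sketch below is illustrative only; the abstract does not list which indices the paper covers, and the function name and data are hypothetical.

```python
import math

def silhouette(points, labels):
    """Mean silhouette width of a partition (Euclidean distance).

    a(i): mean distance from i to the other members of its own cluster (compactness)
    b(i): smallest mean distance from i to the members of another cluster (separation)
    s(i) = (b - a) / max(a, b); points in singleton clusters get s(i) = 0 by convention.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]   # compact, well-separated partition
bad = [0, 1, 0, 1, 0, 1]    # clusters that mix the two groups
print(silhouette(pts, good), silhouette(pts, bad))
```

Values near 1 indicate compact, well-separated clusters; in line with the abstract's argument, a raw value like this is easier to interpret against a suitable threshold than in isolation.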
Abstract:
We investigate the dielectric dispersion of water, especially in the low-frequency range, using impedance spectroscopy. The frequency dependence of the real and imaginary parts of the impedance could not be explained by the usual description of water as an insulating liquid containing ions. This is due to incomplete knowledge of the parameters entering the fundamental equations that describe the evolution of the system, and of the mechanisms that regulate the exchange of charge between the cell and the external circuit. We propose a simple description of our experimental data based on the Debye model, invoking a dc conductivity of the cell related to the non-blocking character of the electrodes. A discussion of electric circuits able to simulate the cell under investigation, based on bulk and surface elements, is also reported. We find that a simple circuit formed by two parallel resistance–capacitance combinations connected in series reproduces the experimental data for the real and imaginary parts of the cell's electrical impedance at frequencies above 1 Hz. In this description, one of the parallel combinations accounts for the electrical properties of the electrode–water interface and the other for those of the bulk. Below 1 Hz, good agreement with the experimental data is obtained by modelling the electrical properties of the interface with a constant phase element.
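The equivalent circuit described above can be written down directly. A minimal sketch, assuming ideal components: each parallel RC element contributes R/(1 + jωRC), the two are added in series, and for the low-frequency regime the interface capacitor is replaced by a constant phase element with Z_CPE = 1/(Q(jω)^n). Placing the CPE in parallel with the interface resistance is an assumption (the abstract does not specify the topology), and all parameter values are hypothetical.

```python
import math

def rc_parallel(R, C, omega):
    """Impedance of a resistor R in parallel with a capacitor C."""
    return R / (1 + 1j * omega * R * C)

def r_cpe_parallel(R, Q, n, omega):
    """Resistor in parallel with a constant phase element, Z_CPE = 1 / (Q (j*omega)^n).

    For n = 1 the CPE behaves as an ideal capacitor of capacitance Q.
    """
    return R / (1 + R * Q * (1j * omega) ** n)

def cell_impedance(f, bulk, interface, use_cpe=False):
    """Series combination of a bulk R||C and an interface R||C (or R||CPE).

    bulk = (R_b, C_b); interface = (R_s, C_s), or (R_s, Q, n) when use_cpe is True.
    """
    omega = 2 * math.pi * f
    z_bulk = rc_parallel(*bulk, omega)
    if use_cpe:
        z_interface = r_cpe_parallel(*interface, omega)
    else:
        z_interface = rc_parallel(*interface, omega)
    return z_bulk + z_interface

# Hypothetical parameter values (ohm, farad), chosen only for illustration
bulk = (1e5, 1e-10)
interface = (1e6, 1e-6)
for f in (0.01, 0.1, 1, 10, 100, 1000):
    z = cell_impedance(f, bulk, interface)
    print(f"{f:>8} Hz  Re Z = {z.real:.3e}  -Im Z = {-z.imag:.3e}")
```

At very low frequency the capacitors stop conducting and the real part tends to R_b + R_s; fitting Q and n below 1 Hz is what lets the CPE variant capture the non-ideal interface response.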
Abstract:
In this paper we present a novel approach to multispectral image contextual classification that combines iterative combinatorial optimization algorithms. The pixel-wise decision rule is defined using a Bayesian approach that combines two MRF models: a Gaussian Markov Random Field (GMRF) for the observations (likelihood) and a Potts model for the a priori knowledge, which regularizes the solution in the presence of noisy data. The classification problem is thus stated within a Maximum a Posteriori (MAP) framework. To approximate the MAP solution, we apply several combinatorial optimization methods with multiple simultaneous initializations, which makes the solution less sensitive to the initial conditions and reduces both computational cost and time compared with Simulated Annealing, which is often unfeasible in real image processing applications. The Markov Random Field model parameters are estimated by the Maximum Pseudo-Likelihood (MPL) approach, avoiding manual adjustment of the regularization parameters. Asymptotic evaluations assess the accuracy of the proposed parameter estimation procedure. To test and evaluate the proposed classification method, we adopt metrics for quantitative performance assessment (Cohen's kappa coefficient), allowing a robust and accurate statistical analysis. The results clearly show that combining sub-optimal contextual algorithms significantly improves classification performance, indicating the effectiveness of the proposed methodology. (C) 2010 Elsevier B.V. All rights reserved.
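Cohen's kappa, the agreement metric named above, corrects the observed agreement p_o between reference and predicted labels for the agreement p_e expected by chance: κ = (p_o − p_e)/(1 − p_e). A minimal sketch computing it from a confusion matrix; the function name and the example matrix are made up for illustration.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, columns: predicted)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal
    p_observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: products of the row and column marginals
    row_sums = [sum(confusion[i]) for i in range(k)]
    col_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical two-class result: 85% raw agreement, 50% expected by chance
print(round(cohens_kappa([[45, 5], [10, 40]]), 3))  # 0.7
```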
Abstract:
5-HT1A receptor antagonists have been employed to treat depression, but the lack of structural information on this receptor hampers the design of specific and selective ligands. In this study, we performed CoMFA studies on a training set of arylpiperazines (high-affinity 5-HT1A receptor ligands); to produce an effective alignment of the data set, a pharmacophore model was generated using Galahad. A statistically significant model was obtained, indicating good internal consistency and predictive ability for untested compounds. The information gathered from our receptor-independent pharmacophore hypothesis is in good agreement with results from independent studies using different approaches. This work therefore provides important insights into the chemical and structural basis of the molecular recognition of these compounds. (C) 2010 Elsevier Masson SAS. All rights reserved.
Abstract:
This work reports the energy transfer mechanism of [Eu(TTA)₂(NO₃)(TPPO)₂] (bis-TTA complex) and [Eu(TTA)₃(TPPO)₂] (tris-TTA complex) based on experimental and theoretical spectroscopic properties, where TTA = 2-thienoyltrifluoroacetone and TPPO = triphenylphosphine oxide. These complexes were synthesized and characterized by elemental analysis, infrared spectroscopy and thermogravimetric analysis. The theoretical geometries of the complexes, calculated with the Sparkle Model for the Calculation of Lanthanide Complexes (SMLC), agree with the crystal structure determined by single-crystal X-ray diffraction. The emission spectra of the [Gd(TTA)₃(TPPO)₂] and [Gd(TTA)₂(NO₃)(TPPO)₂] complexes are associated with T → S₀ transitions centred on the coordinated TTA ligands. The experimental luminescent properties of the bis-TTA complex were quantified through the emission intensity parameters Ωλ (λ = 2 and 4), spontaneous emission rates (A_rad), luminescence lifetime (τ), emission quantum efficiency (η) and emission quantum yield (q), and compared with those of the tris-TTA complex. The experimental data showed that the intensity parameter of the bis-TTA complex is half that of the tris-TTA complex, indicating a less polarizable chemical environment in the system containing the nitrate ion. Good agreement between the theoretical and experimental quantum yields was obtained for both Eu(III) complexes. The triboluminescence (TL) of the [Eu(TTA)₂(NO₃)(TPPO)₂] complex is discussed in terms of ligand-to-metal energy transfer. (c) 2007 Elsevier B.V. All rights reserved.
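The efficiency and lifetime quoted above are linked by a simple relation: since τ = 1/(A_rad + A_nrad), the emission quantum efficiency η = A_rad/(A_rad + A_nrad) equals A_rad·τ, and the non-radiative rate follows as A_nrad = 1/τ − A_rad. A small sketch of that arithmetic, with hypothetical input values rather than the paper's measurements.

```python
def emission_quantum_efficiency(a_rad, tau):
    """eta = A_rad / (A_rad + A_nrad) = A_rad * tau, because tau = 1 / (A_rad + A_nrad).

    a_rad: radiative rate in s^-1; tau: luminescence lifetime in s.
    """
    return a_rad * tau

def nonradiative_rate(a_rad, tau):
    """A_nrad = 1 / tau - A_rad, in s^-1."""
    return 1.0 / tau - a_rad

# Hypothetical values, not taken from the paper
a_rad, tau = 700.0, 0.7e-3
eta = emission_quantum_efficiency(a_rad, tau)
a_nrad = nonradiative_rate(a_rad, tau)
print(f"eta = {eta:.2f}, A_nrad = {a_nrad:.0f} s^-1")
```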
Abstract:
We study constrained efficient aggregate risk sharing and its consequences for the behavior of macro-aggregates in a dynamic Mirrlees (1971) setting. Privately observed idiosyncratic productivity shocks are assumed to be independent of i.i.d. publicly observed aggregate shocks. Yet private allocations display memory with respect to past aggregate shocks when idiosyncratic shocks are also i.i.d. Under a mild restriction on the nature of optimal allocations, the result extends to more persistent idiosyncratic shocks, for all but the limit at which idiosyncratic risk disappears and the model collapses to a pure-heterogeneity repeated Mirrlees economy identical to Werning [2007]. When preferences are iso-elastic, we show that an allocation is memoryless only if it displays a strong form of separability with respect to aggregate shocks. Separability characterizes the pure-heterogeneity limit as well as the general case with log preferences. With less than full persistence and risk aversion different from unity, optimal allocations are characterized by both memory and non-separability. Exploiting the fact that non-separability is associated with state-varying labor wedges, we apply a business cycle accounting procedure (e.g. Chari et al. [2007]) to the aggregate data generated by the model. We show that whenever risk aversion is greater than one, our model produces efficient counter-cyclical labor wedges.
Abstract:
This paper asks to what extent distortions to the adoption of new technology cause income inequality across nations. We work within a framework of embodied technological progress with an individual C.E.S. production function. We estimate the parameters of this production function from international data and calibrate the model using U.S. national income statistics. Our analysis suggests that distortions account for a larger share of income inequality than has hitherto been assessed.
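For readers unfamiliar with the functional form, a generic two-factor CES (constant elasticity of substitution) production function is Y = A[αK^ρ + (1 − α)L^ρ]^(1/ρ), with elasticity of substitution σ = 1/(1 − ρ). The sketch below evaluates that textbook form only; the paper's embodied-technology specification and its estimated parameters are not reproduced, and the default parameter values are hypothetical.

```python
def ces_output(K, L, A=1.0, alpha=0.3, rho=0.5):
    """Two-factor CES production: Y = A * (alpha*K^rho + (1 - alpha)*L^rho)^(1/rho).

    rho -> 0 gives the Cobb-Douglas limit; rho = 1 gives perfect substitutes.
    Default parameter values are hypothetical.
    """
    if rho == 0:
        # Cobb-Douglas limit of the CES form
        return A * K ** alpha * L ** (1 - alpha)
    return A * (alpha * K ** rho + (1 - alpha) * L ** rho) ** (1 / rho)

def elasticity_of_substitution(rho):
    """sigma = 1 / (1 - rho) for the CES form above."""
    return 1 / (1 - rho)

print(ces_output(1.0, 1.0), ces_output(2.0, 3.0), elasticity_of_substitution(0.5))
```

The function has constant returns to scale: doubling both inputs doubles output, whatever the value of ρ.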
Abstract:
This work studies the influence of macroeconomic factors on credit risk in installment auto loan operations. The study is based on 4,887 credit operations recorded in the Credit Risk Information System (SCR) maintained by the Brazilian Central Bank. Using survival analysis applied to interval-censored data, we obtain a model to estimate the hazard function and propose a method for calculating the probability of default over a twelve-month period. Our results indicate a strong time dependence of the hazard function, captured by a polynomial approximation in all estimated models. The model with the best Akaike Information Criterion (AIC) estimates a positive effect of 0.07% for males over the baseline hazard function, and of 0.011% for each ten-basis-point increase in the operation's annual interest rate; in contrast, each R$ 1,000.00 of installment value has a negative effect of 0.28% on the hazard function, while each R$ 1,000.00 added to the contracted value of the operation raises it by an estimated 0.0069%. Among the macroeconomic factors, we find statistically significant effects for the unemployment rate (−0.12%), the first lag of the unemployment rate (0.12%), the first difference of the industrial production index (−0.008%), the first lag of the inflation rate (−0.13%) and the exchange rate (−0.23%). None of the other tested variables had statistically significant effects.
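A common way to turn period-by-period hazards into a default probability over a horizon is PD = 1 − ∏(1 − h_t), i.e. one minus the probability of surviving every period. The sketch below assumes monthly discrete hazards from a polynomial baseline in time, echoing the polynomial approximation mentioned above; the coefficients are invented, covariate effects are omitted, and this is not necessarily the method proposed in the paper.

```python
def baseline_hazard(t, coeffs):
    """Hypothetical polynomial baseline hazard h0(t) = c0 + c1*t + c2*t^2 + ..., clipped to [0, 1]."""
    value = sum(c * t ** k for k, c in enumerate(coeffs))
    return min(max(value, 0.0), 1.0)

def probability_of_default(hazards):
    """Probability of default within len(hazards) periods: 1 - prod_t (1 - h_t)."""
    survival = 1.0
    for h in hazards:
        survival *= 1.0 - h  # survive period t, given survival up to t - 1
    return 1.0 - survival

coeffs = [0.002, 0.0015, -0.00008]  # hypothetical, not the paper's estimates
monthly = [baseline_hazard(t, coeffs) for t in range(1, 13)]
print(f"12-month PD = {probability_of_default(monthly):.4f}")
```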
Abstract:
Modelling the term structure of interest rates is highly relevant to the financial market: it is used to price debt securities and derivatives, it is a fundamental component of economic policy, and it supports the design of trading strategies. The class of models created by Nelson and Siegel (1987) has been extended by several authors and is currently widely used by many central banks around the world. In this work we use the extension proposed by Diebold and Li (2006), applied to the Brazilian market, with the parameters calibrated through the Kalman filter and the extended Kalman filter; the latter allows all four parameters of the model to be estimated dynamically. As noted by Durbin and Koopman (2012), the formulas of the Kalman filter and of its extended version do not require the observation vector to have a constant dimension. Building on this idea, the filters were implemented so that they can be applied regardless of the number of yield-curve observations available at each point in time, removing the need to interpolate the data before calibration. This helps reflect market reality more faithfully and relaxes the assumptions made when interpolating beforehand to obtain fixed vertices. A new adaptation of the Nelson–Siegel model is also tested, in which the level parameter is conditioned on whether the bonds mature before or after the next Copom meeting. The objective is to compare the predictive quality of the methods, pointing out the advantages and disadvantages of each.
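The point about observation vectors of varying size can be made concrete with a linear Kalman filter for the Diebold–Li dynamic Nelson–Siegel model, in which the measurement matrix is rebuilt at every date from whatever maturities happen to be observed. A minimal sketch, assuming a fixed decay parameter λ, a VAR(1) for the level, slope and curvature factors, and i.i.d. measurement errors; it covers only the linear filter, not the extended filter that also treats λ as time-varying, and all parameter values and data are synthetic.

```python
import numpy as np

def ns_loadings(maturities, lam):
    """Diebold-Li loadings (level, slope, curvature) for the given maturities."""
    tau = np.asarray(maturities, dtype=float)
    x = lam * tau
    slope = (1 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    return np.column_stack([np.ones_like(tau), slope, curvature])

def kalman_filter(observations, lam, mu, Phi, Q, obs_var, beta0, P0):
    """Linear Kalman filter for a dynamic Nelson-Siegel model.

    observations: list of (maturities, yields) pairs; the number of maturities may
    differ from date to date, so no interpolation onto fixed vertices is required.
    Returns the filtered factors (one row per date) and the Gaussian log-likelihood.
    """
    beta, P = np.asarray(beta0, dtype=float), np.asarray(P0, dtype=float)
    filtered, loglik = [], 0.0
    for maturities, yields in observations:
        # Prediction: factors follow a VAR(1) around the mean mu
        beta = mu + Phi @ (beta - mu)
        P = Phi @ P @ Phi.T + Q
        # Update: measurement matrix sized to today's observations
        Z = ns_loadings(maturities, lam)
        v = np.asarray(yields, dtype=float) - Z @ beta      # innovation
        F = Z @ P @ Z.T + obs_var * np.eye(len(v))          # innovation covariance
        K = np.linalg.solve(F, Z @ P).T                     # gain P Z' F^{-1}
        beta = beta + K @ v
        P = (np.eye(len(beta)) - K @ Z) @ P
        _, logdet = np.linalg.slogdet(F)
        loglik -= 0.5 * (len(v) * np.log(2 * np.pi) + logdet + v @ np.linalg.solve(F, v))
        filtered.append(beta.copy())
    return np.array(filtered), loglik

# Synthetic example: hypothetical parameters, maturities in months
rng = np.random.default_rng(1)
lam = 0.0609
mu = np.array([0.12, -0.02, 0.01])
Phi = 0.95 * np.eye(3)
Q = 1e-5 * np.eye(3)
beta_true, obs = mu.copy(), []
for t in range(24):
    beta_true = mu + Phi @ (beta_true - mu) + rng.multivariate_normal(np.zeros(3), Q)
    mats = [3, 12, 24, 60] if t % 2 == 0 else [1, 6, 12, 24, 36, 120]
    yields = ns_loadings(mats, lam) @ beta_true + 1e-3 * rng.normal(size=len(mats))
    obs.append((mats, yields))
betas, ll = kalman_filter(obs, lam, mu, Phi, Q, 1e-6, mu, 1e-2 * np.eye(3))
print(betas[-1], ll)
```

Maximising the returned log-likelihood over the model parameters is the usual way to calibrate them; an extended version would add λ to the state vector and linearise the loadings around the current estimate.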
Abstract:
Allergic asthma represents an important public health issue with significant growth over the years, especially in the paediatric population. Exhaled breath is a non-invasive, easily performed and rapid method for obtaining samples from the lower respiratory tract. In the present manuscript, the metabolic volatile profiles of allergic asthma and control children were evaluated by headspace solid-phase microextraction combined with gas chromatography–quadrupole mass spectrometry (HS-SPME/GC–qMS). The lack of studies of the breath of allergic asthmatic children by HS-SPME led to the development of an experimental design to optimize the SPME parameters. To fulfil this objective, three important HS-SPME experimental parameters that influence extraction efficiency were considered, namely fibre coating, extraction temperature and extraction time. The conditions that promoted the highest extraction efficiency, corresponding to the largest GC peak areas and number of compounds, were a DVB/CAR/PDMS coating fibre, an extraction temperature of 22 °C and an extraction time of 60 min. The suitability of two containers, 1 L Tedlar® bags and BIOVOC®, for breath collection, as well as intra-individual variability, was also investigated. The developed methodology was then applied to the analysis of exhaled breath from children with allergic asthma (35), of whom 13 also had allergic rhinitis, and healthy control children (15), allowing the identification of 44 volatiles distributed over the chemical families of alkanes (linear and branched), ketones, aromatic hydrocarbons, aldehydes and acids, among others. Multivariate studies were performed by Partial Least Squares–Discriminant Analysis (PLS–DA) using a set of 28 selected metabolites, and discrimination between allergic asthma and control children was attained with a classification rate of 88%. The allergic asthma paediatric population was characterized mainly by compounds linked to oxidative stress, such as alkanes and aldehydes.
Furthermore, more detailed information was obtained by combining the volatile metabolic data suggested by the PLS–DA model with clinical data.
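PLS–DA, as used above, amounts to a partial least squares regression of a 0/1 class label on the metabolite profile, followed by a threshold on the predicted value. A minimal NIPALS PLS1 sketch on synthetic data (group sizes borrowed from the study, values random); it does not reproduce the study's preprocessing, metabolite selection or validation scheme, and the function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression by NIPALS on mean-centred data; returns a prediction function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)      # weight: direction of maximal covariance with y
        t = Xk @ w                     # scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # deflate X and y before the next component
        yk = yk - c * t
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W, P, q = np.array(weights).T, np.array(loadings).T, np.array(y_loadings)
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients in the original X space
    return lambda X_new: (X_new - x_mean) @ B + y_mean

def pls_da_classify(predict, X_new, threshold=0.5):
    """PLS-DA decision: class 1 when the predicted 0/1 label exceeds the threshold."""
    return (predict(X_new) > threshold).astype(int)

# Hypothetical data: 35 "asthma" (label 1) and 15 "control" (label 0) profiles, 28 features
rng = np.random.default_rng(2)
X_asthma = rng.normal(size=(35, 28))
X_asthma[:, :5] += 2.0                 # group difference confined to 5 features
X_control = rng.normal(size=(15, 28))
X = np.vstack([X_asthma, X_control])
y = np.concatenate([np.ones(35), np.zeros(15)])
predict = pls1_fit(X, y, n_components=2)
labels = pls_da_classify(predict, X)
print("training classification rate:", np.mean(labels == y))
```

A classification rate computed on the training data, as printed here, is optimistic; an honest estimate would come from cross-validation or a held-out set.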