897 results for Transformation-based semi-parametric estimators
Abstract:
Conventional web search engines are centralised: a single entity crawls and indexes the documents selected for future retrieval and controls the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from several technical drawbacks concerning scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. To alleviate the need to rely entirely on a single entity, Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution, as it distributes the functional components of a web search engine, from crawling and indexing documents to query processing, across the network of users (or peers) who use the search engine. This strategy for constructing an IR system poses several efficiency and effectiveness challenges which have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of P2P web search. Federated search systems are a form of distributed information retrieval model that route the user’s information need, formulated as a query, to distributed resources and merge the retrieved result lists into a final list. P2P-IR networks are one form of federated search in that they route queries and merge results among participating peers. The query is propagated through the distributed nodes to reach the peers that are most likely to contain relevant documents; the retrieved result lists are then merged at different points along the path from the relevant peers to the query initiator (namely, the customer).
However, query routing is considered one of the major challenges and a critical part of P2P-IR networks, as relevant peers might be missed through low-quality peer selection during query routing, which inevitably leads to less effective retrieval results. This motivates the thesis to study and propose query routing techniques that improve retrieval quality in such networks. Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise the peers into semantically similar clusters, where each such cluster is managed by super-peers. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also leverage the cluster centroids at the super-peer level, as content representations gathered from cooperative peers, to propose a query routing approach called Inverted PeerCluster Index (IPI), which simulates the conventional inverted index of a centralised corpus to organise the statistics of peers’ terms. The results show competitive retrieval quality in comparison to baseline approaches. Furthermore, I study the applicability of conventional Information Retrieval models as peer selection approaches, where each peer can be considered as one big document made up of its documents. The experimental evaluation shows comparable and significant results and indicates that document retrieval methods are very effective for peer selection, which revives the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are exploited to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results relative to state-of-the-art resource selection methods and results competitive with corresponding classification-based approaches. Finally, I propose reputation-based query routing approaches that exploit the idea of providing feedback on a specific item in social community networks and managing it for future decision-making.
The system monitors users’ behaviour when they click on or download documents from the final ranked list, treats it as implicit feedback, and mines this information to build a reputation-based data structure. The data structure is used to score peers and then rank them for query routing. I conduct a set of experiments covering various scenarios, including noisy feedback (i.e., positive feedback on non-relevant documents), to examine the robustness of reputation-based approaches. The empirical evaluation shows significant results on almost all evaluation metrics, with an approximate improvement of more than 56% compared to baseline approaches. Thus, based on the results, if one were to choose a single technique, reputation-based approaches are the natural choice, and they can also be deployed on any P2P network.
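The idea in the abstract above of treating each peer as one big document and organising term statistics in an inverted index can be illustrated with a small sketch. This is not the thesis's IPI method: the function names, the toy peers and the TF-IDF weighting are all assumptions chosen only to show the general shape of peer selection.

```python
import math
from collections import Counter, defaultdict


def build_peer_index(peers):
    """Map each term to {peer_id: term frequency}, treating a peer as one big document."""
    index = defaultdict(dict)
    for peer_id, documents in peers.items():
        counts = Counter(word for doc in documents for word in doc.lower().split())
        for term, tf in counts.items():
            index[term][peer_id] = tf
    return index


def rank_peers(query, index, n_peers):
    """Rank peers by a simple TF-IDF sum over the query terms (illustrative weighting)."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms held by fewer peers are more discriminative.
        idf = math.log(1 + n_peers / len(postings))
        for peer_id, tf in postings.items():
            scores[peer_id] += (1 + math.log(tf)) * idf
    # Highest score first; ties broken by peer id for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical peers, each holding a couple of short documents.
peers = {
    "peer_a": ["kernel regression for barnacle growth", "survival analysis of peaches"],
    "peer_b": ["query routing in peer to peer networks", "peer selection and query routing"],
    "peer_c": ["hospital cost functions", "query logs"],
}
index = build_peer_index(peers)
ranking = rank_peers("query routing", index, len(peers))
print(ranking)
```

A query would then be forwarded only to the top-ranked peers; peers that hold none of the query terms never appear in the ranking.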
Abstract:
This study provides a preliminary contribution to the development of a bioprocess for the continuous production of xylitol from hemicellulosic hydrolyzate utilizing Candida guilliermondii cells immobilized onto natural sugarcane bagasse fibers. To this purpose, cells of this yeast were submitted to batch tests of "in situ" adsorption onto crushed and powdered sugarcane bagasse after treatment with 0.5 M NaOH. The results obtained on a xylose-based semi-synthetic medium were evaluated in terms of immobilization efficiency, cell retention and specific growth rates of suspended, immobilized and total cells. The first two parameters were shown to increase along the immobilization process, reached maximum values of 50.5% and 0.31 g immobilized cells/g bagasse after 21 h and then sharply decreased. The specific growth rate of suspended cells continuously increased during the immobilization tests, while that of the immobilized ones, after an initial growth, exhibited decreasing values. Under the conditions selected for cell immobilization, fermentation also took place with promising results. The yields of xylitol and biomass on consumed xylose were 0.65 and 0.18 g/g, respectively, xylitol and biomass productivities were 0.66 and 0.13 g L⁻¹ h⁻¹, and the efficiency of xylose-to-xylitol bioconversion was 70.8%. © 2007 Elsevier Ltd. All rights reserved.
Abstract:
Survival analysis is applied when the time until the occurrence of an event is of interest. Such data are routinely collected in plant disease studies, although applications of the method are uncommon. The objective of this study was to use two studies on post-harvest diseases of peaches, considering two harvests together and the existence of a random effect shared by fruits from the same tree, in order to describe the main techniques in survival analysis. The nonparametric Kaplan-Meier method, the log-rank test and the semi-parametric Cox proportional hazards model were used to estimate the effect of cultivars and the number of days after full bloom on survival until the brown rot symptom and on the instantaneous risk of expressing it in two consecutive harvests. The joint analysis with a baseline effect varying between harvests, and the confirmation of the tree effect as a grouping factor with random effect, were appropriate to interpret the phenomenon (disease) evaluated and can be important tools to replace or complement the conventional analysis, respecting the nature of the variable and the phenomenon.
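As a minimal illustration of the Kaplan-Meier method named in the abstract above, the product-limit estimate can be computed as follows. The six observations are hypothetical and are not data from the study; an event flag of 0 marks a censored fruit (one that left observation without showing symptoms).

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  -- observation time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical days until symptoms for six fruits (0 = censored).
times = [2, 3, 3, 5, 6, 7]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored observations do not lower the curve themselves, but they shrink the risk set used at later event times, which is what distinguishes this estimate from a simple proportion of unaffected fruits.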
Abstract:
The implementation of technological evolution in the construction sector has been characterised by the emergence of new technologies that streamline the processes of information exchange among the various stakeholders in the life cycle of a development. The emergence of Building Information Modeling (BIM) technology, based on parametric modelling and on interoperability supported by open-standard files (IFC), implies a new paradigm in the way information exchange processes among the various stakeholders in the life cycle of developments are handled. Building on BIM, the Construction Operations Building Information Exchange (COBie) is another recent technology, which collects non-geometric information associated with the model; together with the geometric information produced by BIM, it forms part of the handover documents for the Facility Management (FM) phase. The objective of this dissertation was to study the evolution of a BIM construction model towards the management of the development. A prototype model was studied, focusing on the use of software for verifying and applying the COBie and BIM technologies, and it was also synchronised with the FM phase. From the application of the COBie requirements and BIM modelling, geometric and non-geometric information was extracted and entered in the COBie worksheets. The main conclusions of the study were that the COBie and BIM technologies have little uptake at the national level and that their integration streamlines processes, reducing costs and increasing the quality of the information provided.
Abstract:
The observational method in tunnel engineering allows the actual ground conditions to be evaluated in real time and measures to be taken if ground behavior deviates considerably from predictions. However, it lacks a consistent and structured methodology for using the monitoring data to adapt the support system in real time. Limit criteria above which adaptation is required are not defined, and complex inverse analysis procedures (Rechea et al. 2008, Levasseur et al. 2010, Zentar et al. 2001, Lecampion et al. 2002, Finno and Calvello 2005, Goh 1999, Cui and Pan 2012, Deng et al. 2010, Mathew and Lehane 2013, Sharifzadeh et al. 2012, 2013) may be needed to analyze the problem consistently. In this paper a methodology for the real-time adaptation of support systems during tunneling is presented. In a first step, limit criteria for displacements and stresses are proposed. The methodology uses graphs constructed during the project stage from parametric calculations to assist in the process; when these graphs are not available, since it is not possible to predict every possible scenario, inverse analysis calculations are carried out. The methodology is applied to the “Bois de Peu” tunnel, which is composed of two tubes over 500 m long. High levels of uncertainty existed concerning the heterogeneity of the soil and, consequently, the geomechanical design parameters. The methodology was applied in four sections, and the results focus on two of them. It is shown that the methodology has the potential to be applied in real cases, contributing to a consistent approach to real-time adaptation of the support system, and the results highlight the importance of good-quality, specific monitoring data for improving the inverse analysis procedure.
Abstract:
In recent years, there has been an increasing number of studies on carrion fly communities due to their medical importance and as a consequence of the large number of studies on forensic entomology. Surprisingly few studies have addressed the asynanthropic flies of the Amazon, and none have been conducted in Colombia. A faunistic study of asynanthropic flies of the families Calliphoridae, Sarcophagidae, Muscidae and Fanniidae in three different landscapes of the Colombian Amazon is presented, trapping effectiveness is assessed, and the first records of Mesembrinella batesi (Aldrich, 1922) and Fannia femoralis (Stein, 1897) from Colombia are reported.
Abstract:
The suitability of a total-length-based minimum capture size and of different protection regimes was investigated for the gooseneck barnacle Pollicipes pollicipes shellfishery in N Spain. For this analysis, individuals collected from 10 sites under different fishery protection regimes (permanently open, seasonally closed, and permanently closed) were used. First, we applied a non-parametric regression model to explore the relationship between the capitulum Rostro-Tergum (RT) size and the Total Length (TL). Important heteroskedastic disturbances were detected for this relationship, demonstrating a high variability of TL with respect to RT. This result substantiates the unsuitability of a TL-based minimum size by means of a mathematical model. Due to these disturbances, an alternative growth-based minimum capture size of 26.3 mm RT (23 mm RC) was estimated using the first derivative of a kernel-based non-parametric regression model for the relationship between RT and dry weight. For this purpose, data from the permanently protected area were used to avoid bias due to the fishery. Second, the size-frequency distribution similarity was computed using an MDS analysis for the studied sites to evaluate the effectiveness of the protection regimes. The results of this analysis indicated a positive effect of the permanent protection, while no effect of the seasonal closure was detected. This result needs to be interpreted with caution because the current harvesting based on a potentially unsuitable minimum capture size may dampen the efficacy of the seasonal protection regime.
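The abstract above relies on the first derivative of a kernel-based non-parametric regression. A minimal sketch of that kind of calculation is given here, assuming a Nadaraya-Watson estimator with a Gaussian kernel and a central finite difference for the derivative. The abstract does not state which kernel estimator, bandwidth or derivative rule the authors used, and the data below are synthetic, so every name and value is illustrative only.

```python
import math


def nw_estimate(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate at x0: a Gaussian-weighted average of the responses."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def nw_derivative(x0, xs, ys, bandwidth, step=1e-3):
    """Approximate the first derivative of the fitted curve by a central difference."""
    upper = nw_estimate(x0 + step, xs, ys, bandwidth)
    lower = nw_estimate(x0 - step, xs, ys, bandwidth)
    return (upper - lower) / (2 * step)


# Synthetic, noise-free data on a regular grid (stand-ins for size and weight).
xs = [float(x) for x in range(11)]
ys = [2.0 * x + 1.0 for x in xs]

fitted = nw_estimate(5.0, xs, ys, bandwidth=1.0)
slope = nw_derivative(5.0, xs, ys, bandwidth=1.0)
print(round(fitted, 6), round(slope, 3))
```

With real, noisy measurements the bandwidth controls how smooth the fitted curve and its derivative are, so any size threshold read off the derivative would depend on that choice.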
Abstract:
There is a vast literature that specifies Bayesian shrinkage priors for vector autoregressions (VARs) of possibly large dimensions. In this paper I argue that many of these priors are not appropriate for multi-country settings, which motivates me to develop priors for panel VARs (PVARs). The parametric and semi-parametric priors I suggest not only perform valuable shrinkage in large dimensions, but also allow for soft clustering of variables or countries which are homogeneous. I discuss the implications of these new priors for modelling interdependencies and heterogeneities among different countries in a panel VAR setting. Monte Carlo evidence and an empirical forecasting exercise show clear and important gains of the new priors compared to existing popular priors for VARs and PVARs.
Abstract:
Drawing on PISA 2006 data, this study examines the impact of socio-economic school composition on science test score achievement for Spanish students in compulsory secondary schools. We define school composition in terms of the average parental human capital of students in the same school. These contextual peer effects are estimated using a semi-parametric methodology, which enables the spillovers to affect all the parameters of the educational production function. We also deal with the potential problem of self-selection of students into schools, using an artificial sorting that we argue to be independent of unobserved student abilities. The results indicate that the association between socio-economic school composition and test score results is clearly positive and significantly higher when computed with the semi-parametric approach. However, we find that the endogenous sorting of students into schools plays a fundamental role, given that the spillovers are significantly reduced when this selection process is ruled out from our measure of school composition effects. Specifically, the estimations suggest that the contextual peer effects are moderately positive only in those schools where the socio-economic composition is considerably elevated. In addition, we find some evidence of asymmetry in how the external effects and the sorting process actually operate, which seem to affect males and females, as well as high- and low-performing students, differently.
Abstract:
This research studies, from an internal view based on the Competency-Based Perspective (CBP), the key organizational competencies developed for small new businesses. CBP is chosen in an attempt to explain the differences between the companies that closed and those that consolidated. The main contribution of this paper is the definition of a set of key organizational competencies for new ventures from service and low-technology sectors. Using the classification proposed by [1] and a review of the entrepreneurship literature, the main competencies were defined and classified as: managerial, input-based, transformation-based, and output-based competencies. The proposed model for evaluating new ventures' organizational competence is tested by means of Structural Equation
Abstract:
In the mid-1980s, many European countries introduced fixed-term contracts. Since then their labor markets have become more dynamic. This paper studies the implications of such reforms for the duration distribution of unemployment, with particular emphasis on the changes in the duration dependence. I estimate a parametric duration model using cross-sectional data drawn from the Spanish Labor Force Survey from 1980 to 1994 to analyze the chances of leaving unemployment before and after the introduction of fixed-term contracts. I find that duration dependence has increased since such reform. Semi-parametric estimation of the model also shows that for long spells, the probability of leaving unemployment has decreased since such reform.
Abstract:
In this paper we analyse the observed systematic differences in costs for teaching hospitals (TH henceforth) in Spain. Concern regarding the existence of a bias in the financing of THs has been raised once prospective budgets are in the arena for hospital finance, and claims for adjusting to take into account the legitimate extra costs of teaching on hospital expenditure are well grounded. We focus on the estimation of the impact of teaching status on average cost. We used a version of a multiproduct hospital cost function taking into account some relevant factors from which to derive the observed differences. We assume that the relationship between the explanatory and the dependent variables follows a flexible form for each of the explanatory variables. We also model the underlying covariance structure of the data. We assumed two qualitatively different sources of variation: random effects and serial correlation. Random variation refers to both general level variation (through the random intercept) and the variation specifically related to teaching status. We postulate that the impact of the random effects is predominant over the impact of the serial correlation effects. The model is estimated by restricted maximum likelihood. Our results show that costs are 9% higher (15% in the case of median costs) in teaching than in non-teaching hospitals. That is, teaching status legitimately explains no more than half of the observed difference in actual costs. The impact on costs of the teaching factor depends on the number of residents, with an increase of 51.11% per resident for hospitals with fewer than 204 residents (third quartile of the number of residents) and 41.84% for hospitals with more than 204 residents. In addition, the estimated dispersion is higher among teaching hospitals. As a result, due to the considerable observed heterogeneity, results should be interpreted with caution.
From a policy making point of view, we conclude that since a higher relative burden for medical training is under public hospital command, an explicit adjustment to the extra costs that the teaching factor imposes on hospital finance is needed, before hospital competition for inpatient services takes place.
Factors affecting hospital admission and recovery stay duration of in-patient motor victims in Spain
Abstract:
Hospital expenses are a major cost driver of healthcare systems in Europe, with motor injuries being the leading mechanism of hospitalizations. This paper investigates the injury characteristics which explain the hospitalization of victims of traffic accidents that took place in Spain. Using a motor insurance database with 16,081 observations, a generalized Tobit regression model is applied to analyse the factors that influence both the likelihood of being admitted to hospital after a motor collision and the length of hospital stay in the event of admission. The consistency of Tobit estimates relies on the normality of perturbation terms. Here a semi-parametric regression model was fitted to test the consistency of estimates, concluding that a normal distribution of errors cannot be rejected. Among other results, it was found that older men with fractures and injuries located in the head and lower torso are more likely to be hospitalized after the collision, and that they also have a longer expected length of hospital recovery stay.
Abstract:
Type 2 diabetes increases the risk of cardiovascular mortality and these patients, even without previous myocardial infarction, run the risk of fatal coronary heart disease similar to non-diabetic patients surviving myocardial infarction. There is evidence showing that particulate matter air pollution is associated with increases in cardiopulmonary morbidity and mortality. The present study was carried out to evaluate the effect of diabetes mellitus on the association of air pollution with cardiovascular emergency room visits in a tertiary referral hospital in the city of São Paulo. Using a time-series approach, and adopting generalized linear Poisson regression models, we assessed the effect of daily variations in PM10, CO, NO2, SO2, and O3 on the daily number of emergency room visits for cardiovascular diseases in diabetic and non-diabetic patients from 2001 to 2003. A semi-parametric smoother (natural spline) was adopted to control long-term trends, linear term seasonal usage and weather variables. In this period, 45,000 cardiovascular emergency room visits were registered. The observed increase in interquartile range within the 2-day moving average of 8.0 µg/m³ SO2 was associated with 7.0% (95%CI: 4.0-11.0) and 20.0% (95%CI: 5.0-44.0) increases in cardiovascular disease emergency room visits by non-diabetic and diabetic groups, respectively. These data indicate that air pollution causes an increase of cardiovascular emergency room visits, and that diabetic patients are extremely susceptible to the adverse effects of air pollution on their health conditions.
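The percentage increases quoted in the abstract above follow the usual reading of a log-linear Poisson model: if log E[visits] contains a term beta × pollutant, then raising the pollutant by delta multiplies the expected count by exp(beta × delta). The sketch below shows that conversion. The coefficient is back-calculated from the reported 7.0% and 8.0 µg/m³ figures purely for illustration; the study does not report it, and the function names are hypothetical.

```python
import math


def percent_increase(beta, delta):
    """Percent change in the expected count for a pollutant increase of delta."""
    return (math.exp(beta * delta) - 1) * 100


def beta_from_percent(percent, delta):
    """Inverse conversion: the log-linear coefficient implied by a percent change."""
    return math.log(1 + percent / 100) / delta


iqr = 8.0  # interquartile range of SO2 in µg/m³, as reported in the abstract
# Illustrative only: coefficient implied by the reported 7.0% increase.
beta = beta_from_percent(7.0, iqr)
print(round(beta, 5), round(percent_increase(beta, iqr), 1))
```

The same conversion applies to the bounds of a confidence interval, which is why intervals reported on the percent scale are not symmetric around the point estimate.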
Abstract:
The aim of the present study was to develop a classifier able to discriminate between healthy controls and dyspeptic patients by analysis of their electrogastrograms. Fifty-six electrogastrograms were analyzed, corresponding to 42 dyspeptic patients and 14 healthy controls. The original signals were subsampled, filtered and divided into the pre-, post-, and prandial stages. A time-frequency transformation based on wavelets was used to extract the signal characteristics, and a special selection procedure based on correlation was used to reduce their number. The analysis was carried out by evaluating different neural network structures to classify the wavelet coefficients into two groups (healthy subjects and dyspeptic patients). The optimization process of the classifier led to a linear model. A dimension reduction that resulted in only 25% of uncorrelated electrogastrogram characteristics gave 24 inputs for the classifier. The prandial stage gave the most significant results. Under these conditions, the classifier achieved 78.6% sensitivity, 92.9% specificity, and an error of 17.9 ± 6% (with a 95% confidence level). These data show that it is possible to establish significant differences between patients and normal controls when time-frequency characteristics are extracted from an electrogastrogram, with an adequate component reduction, outperforming the results obtained with classical Fourier analysis. These findings can contribute to increasing our understanding of the pathophysiological mechanisms involved in functional dyspepsia and perhaps to improving the pharmacological treatment of functional dyspeptic patients.
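The reported sensitivity, specificity and error rate in the abstract above can be related to one another through a confusion matrix. The counts used below (33 of 42 patients and 13 of 14 controls correctly classified) are not stated in the abstract; they are inferred here as the whole-number counts that reproduce the reported percentages, and should be read as an assumption.

```python
def classifier_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, error rate) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # patients correctly detected
    specificity = tn / (tn + fp)                # controls correctly cleared
    error = (fn + fp) / (tp + fn + tn + fp)     # all misclassifications
    return sensitivity, specificity, error


# Inferred counts: 42 dyspeptic patients and 14 healthy controls.
sens, spec, err = classifier_metrics(tp=33, fn=9, tn=13, fp=1)
print(f"{sens:.1%} {spec:.1%} {err:.1%}")  # prints "78.6% 92.9% 17.9%"
```

Under these counts, the 17.9% error corresponds to 10 misclassified recordings out of 56, nine of which are missed patients.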