920 results for Large Data
Abstract:
Context. The ESA Rosetta spacecraft, currently orbiting around comet 67P/Churyumov-Gerasimenko, has already provided in situ measurements of the dust grain properties from several instruments, particularly OSIRIS and GIADA. We propose adding value to those measurements by combining them with ground-based observations of the dust tail to monitor the overall, time-dependent dust-production rate and size distribution.
Aims. To constrain the dust grain properties, we take Rosetta OSIRIS and GIADA results into account, and combine OSIRIS data during the approach phase (from late April to early June 2014) with a large data set of ground-based images that were acquired with the ESO Very Large Telescope (VLT) from February to November 2014.
Methods. A Monte Carlo dust tail code, which has already been used to characterise the dust environments of several comets and active asteroids, has been applied to retrieve the dust parameters. Key properties of the grains (density, velocity, and size distribution) were obtained from Rosetta observations; these parameters were used as inputs to the code to reduce the number of free parameters considerably. In this way, the overall dust mass-loss rate and its dependence on the heliocentric distance could be obtained accurately.
Results. The dust parameters derived from the inner coma measurements by OSIRIS and GIADA and from distant imaging using VLT data are consistent, except for the power index of the size-distribution function, which is α = −3, instead of α = −2, for grains smaller than 1 mm. This is possibly linked to the presence of fluffy aggregates in the coma. The onset of cometary activity occurs at approximately 4.3 AU, with a dust production rate of 0.5 kg/s, increasing up to 15 kg/s at 2.9 AU. This implies a dust-to-gas mass ratio varying between 3.8 and 6.5 for the best-fit model when combined with water-production rates from the MIRO experiment.
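As a rough illustration of why the size-distribution index matters for the mass budget, the short Python sketch below integrates a power-law differential size distribution n(r) ∝ r^α and compares the fraction of total dust mass carried by grains smaller than 1 mm for α = −2 and α = −3. The size limits (1 µm to 1 cm) and the pure power-law form are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def mass_fraction_below(r_split, r_min, r_max, alpha):
    """Fraction of total dust mass in grains smaller than r_split,
    for a differential size distribution n(r) ∝ r**alpha."""
    r = np.logspace(np.log10(r_min), np.log10(r_max), 10000)
    dm = r**(alpha + 3)          # mass-weighted distribution ∝ r^3 * n(r)
    total = np.trapz(dm, r)
    below = np.trapz(dm[r <= r_split], r[r <= r_split])
    return below / total

for alpha in (-2.0, -3.0):
    # radii in metres; 1 µm to 1 cm limits are assumed for the example
    f = mass_fraction_below(1e-3, 1e-6, 1e-2, alpha)
    print(f"alpha = {alpha:+.0f}: {100*f:.0f}% of the dust mass in grains < 1 mm")
```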
Abstract:
Accurate address information from health service providers is fundamental for the effective delivery of health care and for population monitoring and screening. While it is currently used in the production of key statistics such as internal migration estimates, it will become even more important over time as the 2021 Census of the UK constituent countries integrates administrative data to enhance the quality of statistical outputs. It is therefore beneficial to improve understanding of the accuracy of address information held by health service providers and of the factors that influence it. This paper builds upon previous research on the social geography of address mismatch between census and health service records in Northern Ireland. It is based on the Northern Ireland Longitudinal Study, a large data linkage study including about 28 per cent of the Northern Ireland population, which is matched between the census (2001, 2011) and the Health Card Registration System maintained by the Health and Social Care Business Service Organisation (BSO). This research compares address information from the Spring 2011 BSO download (Unique Property Reference Number, Super Output Area) with comparable geographic information from the 2011 Census. Multivariate and multilevel analyses are used to assess the individual and ecological determinants of match/mismatch between the geographical information in the two data sources, to determine whether the characteristics of the associated people and places are the same as those observed in 2001. It is important to understand whether the same people are being inaccurately geographically referenced in both census years or whether the situation is more variable.
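As a minimal sketch of the kind of multivariate analysis described above, the Python snippet below fits a logistic regression of an address match/mismatch indicator on individual characteristics using statsmodels. The data frame, variable names and categories are entirely hypothetical stand-ins for the NILS linkage data, and a full multilevel analysis would additionally include random effects for areas such as Super Output Areas.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the linkage data (all names and categories hypothetical).
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "mismatch": rng.integers(0, 2, n),                       # 1 = census and BSO addresses disagree
    "age_group": rng.choice(["16-24", "25-44", "45-64", "65+"], n),
    "tenure": rng.choice(["owner", "private_rent", "social_rent"], n),
    "urban": rng.integers(0, 2, n),
})

# Multivariate logistic regression of mismatch on individual characteristics;
# a multilevel extension would add area-level random effects.
model = smf.logit("mismatch ~ C(age_group) + C(tenure) + urban", data=df).fit(disp=0)
print(model.summary())
```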
Abstract:
We present a large data set of high-cadence dMe flare light curves obtained with custom continuum filters on the triple-beam, high-speed camera system ULTRACAM. The measurements provide constraints for models of the near-ultraviolet (NUV) and optical continuum spectral evolution on timescales of ≈1 s. We provide a robust interpretation of the flare emission in the ULTRACAM filters using simultaneously obtained low-resolution spectra during two moderate-sized flares in the dM4.5e star YZ CMi. By avoiding the spectral complexity within the broadband Johnson filters, the ULTRACAM filters are shown to characterize bona fide continuum emission in the NUV, blue, and red wavelength regimes. The NUV/blue flux ratio in flares is equivalent to a Balmer jump ratio, and the blue/red flux ratio provides an estimate for the color temperature of the optical continuum emission. We present a new “color-color” relationship for these continuum flux ratios at the peaks of the flares. Using the RADYN and RH codes, we interpret the ULTRACAM filter emission using the dominant emission processes from a radiative-hydrodynamic flare model with a high nonthermal electron beam flux, which explains a hot, T ≈ 10^4 K, color temperature at blue-to-red optical wavelengths and a small Balmer jump ratio, as observed in moderate-sized and large flares alike. We also discuss the high time resolution, high signal-to-noise continuum color variations observed in YZ CMi during a giant flare, which increased the NUV flux from this star by over a factor of 100. Based on observations obtained with the Apache Point Observatory 3.5 m telescope, which is owned and operated by the Astrophysical Research Consortium; on observations made with the William Herschel Telescope, operated on the island of La Palma by the Isaac Newton Group in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofísica de Canarias; and on observations made with the ESO Telescopes at the La Silla Paranal Observatory under programme ID 085.D-0501(A).
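To make the blue/red colour-temperature idea concrete, here is a small, hedged Python sketch that solves for the blackbody temperature whose Planck-function ratio at two continuum wavelengths matches an observed blue/red flux ratio. The filter wavelengths and the example ratio are assumed for illustration and are not the calibrated ULTRACAM values.

```python
import numpy as np
from scipy.optimize import brentq

h, c, k = 6.626e-34, 2.998e8, 1.381e-23   # SI constants

def planck(wl_m, T):
    """Blackbody spectral radiance B_lambda(T)."""
    return 2 * h * c**2 / wl_m**5 / np.expm1(h * c / (wl_m * k * T))

# Assumed central wavelengths (metres) for the blue and red continuum filters.
wl_blue, wl_red = 4170e-10, 6010e-10

def color_temperature(blue_red_ratio):
    """Solve B(blue, T) / B(red, T) = observed ratio for T."""
    f = lambda T: planck(wl_blue, T) / planck(wl_red, T) - blue_red_ratio
    return brentq(f, 3000.0, 50000.0)

print(f"T_color ≈ {color_temperature(2.5):.0f} K")   # illustrative flux ratio
```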
Abstract:
Previous research on the prediction of fiscal aggregates has shown evidence that simple autoregressive models often provide better forecasts of fiscal variables than multivariate specifications. We argue that the multivariate models considered by previous studies are small-scale, probably burdened by overparameterization, and not robust to structural changes. Bayesian Vector Autoregressions (BVARs), on the other hand, allow the information contained in a large data set to be summarized efficiently, and can also allow for time variation in both the coefficients and the volatilities. In this paper we explore the performance of BVARs with constant and drifting coefficients for forecasting key fiscal variables such as government revenues, expenditures, and interest payments on the outstanding debt. We focus on both point and density forecasting, as assessments of a country’s fiscal stability and overall credit risk should typically be based on the specification of a whole probability distribution for the future state of the economy. Using data from the US and the largest European countries, we show that both the adoption of a large system and the introduction of time variation help in forecasting, with the former playing a relatively more important role in point forecasting, and the latter being more important for density forecasting.
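The following Python sketch illustrates, under simplifying assumptions, the shrinkage idea behind a large Bayesian VAR: each equation of a VAR(p) is estimated with a Minnesota-style ridge penalty (here shrinking all lag coefficients toward zero rather than toward a random walk, and with constant volatility), and a one-step-ahead point forecast is produced. The function name, the tightness parameter lam and the synthetic data are illustrative only.

```python
import numpy as np

def bvar_posterior_mean(Y, p=2, lam=0.2):
    """Per-equation ridge (Minnesota-style) shrinkage for a VAR(p).

    Y : (T, n) array of fiscal/macro series; lam : overall tightness.
    Returns the coefficient matrix B of shape (1 + n*p, n), intercept first.
    A full BVAR would also shrink toward a random walk and model volatility.
    """
    T, n = Y.shape
    X = np.hstack([np.ones((T - p, 1))] +
                  [Y[p - k - 1:T - k - 1] for k in range(p)])   # lags 1..p
    Ylag = Y[p:]
    P = (1.0 / lam**2) * np.eye(X.shape[1])
    P[0, 0] = 0.0                         # leave the intercept unshrunk
    return np.linalg.solve(X.T @ X + P, X.T @ Ylag)

# One-step-ahead point forecast from the last p observations (synthetic data).
rng = np.random.default_rng(1)
Y = rng.standard_normal((200, 4)).cumsum(axis=0)
p = 2
B = bvar_posterior_mean(Y, p=p)
x_last = np.hstack([1.0] + [Y[-k - 1] for k in range(p)])
print("forecast:", x_last @ B)
```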
Abstract:
The work presented here focuses on determining the construction costs of small- and medium-diameter High-Density Polyethylene (HDPE) pipelines for basic sanitation, based on the methodology described in the book Custos de Construção e Exploração – Volume 9 of the series Gestão de Sistemas de Saneamento Básico, by Lencastre et al. (1994). This methodology was applied to construction-management procedures, and to that end unit costs were estimated for several groups of works. According to Lencastre et al. (1994), “these groups cover earthworks, piping, fittings and the corresponding operating devices, paving, and the construction site, with the site component including the ancillary works associated with the job.” The costs were obtained by analysing several budgets for sanitation works resulting from recently held public works tenders. To turn this methodology into an effective tool, spreadsheets were organised that provide realistic estimates of the execution costs of a given work at stages prior to design development, namely when preparing a system master plan or when drawing up economic and financial feasibility studies, that is, even before any preliminary sizing of the system elements exists. Another technique implemented to assess the input data was “Robust Data Analysis” (Pestana, 1992). This methodology made it possible to examine the data in more detail before formulating hypotheses on which to build the risk analysis; the main idea is a highly flexible examination of the data, often even before comparing them with a probabilistic model. For a large data set, this technique thus made it possible to analyse the spread of the values found for the various groups of works mentioned above. With the data collected and processed, a Risk Analysis methodology was then applied through Monte Carlo Simulation. This risk analysis was carried out with @Risk, a software tool from Palisade available at the Department of Civil Engineering. This quantitative risk analysis technique expresses the uncertainty of the input data through the probability distributions provided by the software. To put the methodology into practice, the spreadsheets built following the approach proposed in Lencastre et al. (1994) were used. Preparing and analysing these estimates can support decisions on the viability of the works to be carried out, particularly with respect to the economic aspects, allowing a well-founded decision analysis regarding the investments.
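As a hedged illustration of the Monte Carlo risk analysis described above, the Python sketch below replaces @Risk with NumPy: unit costs for a few groups of works are modelled as triangular distributions, the total construction cost of a pipeline is simulated, and percentiles of the resulting distribution are reported. All unit costs, distribution parameters and the pipeline length are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000   # Monte Carlo draws

# Hypothetical unit costs (EUR/m) for an HDPE sewer main, modelled as
# triangular(min, mode, max) distributions in the spirit of the @Risk study.
earthworks = rng.triangular(18, 25, 40, N)
piping     = rng.triangular(30, 38, 55, N)
paving     = rng.triangular(10, 14, 22, N)
site_setup = rng.triangular(4, 6, 10, N)

length_m = 1200                      # assumed pipeline length
total = (earthworks + piping + paving + site_setup) * length_m

p5, p50, p95 = np.percentile(total, [5, 50, 95])
print(f"P5  ≈ {p5:,.0f} EUR")
print(f"P50 ≈ {p50:,.0f} EUR")
print(f"P95 ≈ {p95:,.0f} EUR")
```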
Abstract:
Wireless communications have developed greatly in recent years and are now present everywhere, in both public and private settings, being increasingly used for different applications. Their application in the business of sports events, as a means to improve the experience of fans at the games, is becoming essential, for instance for sharing messages and multimedia material on social networks. In stadiums, given the high density of people, wireless networks require very large data capacity, so radio coverage employing many small-sized sectors is unavoidable. In this paper, an antenna is designed to operate in the 5 GHz Wi-Fi frequency band, with a directive radiation pattern suited to this kind of application. Furthermore, despite its large bandwidth and low losses, this antenna has been developed using low-cost, off-the-shelf materials without sacrificing quality or performance, which is essential for mass production. © 2015 EurAAP.
Abstract:
Rubisco is responsible for the fixation of CO2 into organic compounds through photosynthesis and thus has great agronomic importance. It is well established that this enzyme suffers from slow catalysis, and its low specificity results in photorespiration, which is considered an energy waste for the plant. However, natural variation exists, and some Rubisco lineages, such as those in C4 plants, exhibit higher catalytic efficiencies coupled to lower specificities. These C4 kinetics could have evolved as an adaptation to the higher CO2 concentration present in C4 photosynthetic cells. In this study, using phylogenetic analyses on a large data set of C3 and C4 monocots, we showed that the rbcL gene, which encodes the large subunit of Rubisco, evolved under positive selection in independent C4 lineages. This confirms that the selective pressures on Rubisco have been shifted in C4 plants by the high-CO2 environment prevailing in their photosynthetic cells. Eight rbcL codons evolving under positive selection in C4 clades were involved in parallel changes among the 23 independent monocot C4 lineages included in this study. These amino acids are potentially responsible for the C4 kinetics, and their identification opens new avenues for human-directed Rubisco engineering. The introgression of C4-like high-efficiency Rubisco would strongly enhance C3 crop yields in the future CO2-enriched atmosphere.
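The study's tests of positive selection on rbcL rely on codon models of the kind implemented in PAML; as a much cruder stand-in, the Python sketch below merely counts synonymous versus nonsynonymous codon differences between two aligned coding fragments using Biopython's standard codon table, without correcting for the number of synonymous and nonsynonymous sites. The toy sequences are invented.

```python
from Bio.Data import CodonTable

codon_to_aa = CodonTable.unambiguous_dna_by_name["Standard"].forward_table

def classify_codon_changes(seq1, seq2):
    """Count synonymous vs nonsynonymous codon differences between two
    aligned coding sequences (a crude proxy, not a codon-model test)."""
    syn = nonsyn = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 == c2 or c1 not in codon_to_aa or c2 not in codon_to_aa:
            continue                      # identical codon or stop codon
        if codon_to_aa[c1] == codon_to_aa[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn

# Toy 5-codon fragments standing in for aligned C3/C4 rbcL sequences.
s, n = classify_codon_changes("ATGGCTAAAGTTCTG", "ATGGCGAGGGTACTG")
print(f"synonymous: {s}, nonsynonymous: {n}")
```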
Abstract:
This thesis consists of three articles united under the theme of large-scale music recommendation. We first present a method for making music recommendations by harvesting tags that describe the items and using this textual aura to determine their similarity. In addition to producing recommendations that are transparent and customizable, our content-based method does not suffer from the problems that affect collaborative filtering systems, such as the cold-start problem. We then present a machine learning algorithm that applies tags to songs from features extracted from their audio files. The data set we use is built from a very large amount of social data from the Last.fm website. Finally, we present a customizable automatic playlist-generation algorithm that learns a musical similarity space from audio features extracted from songs played in the playlists of commercial radio stations. Besides using this similarity space, our system also takes into account a tag cloud that the user can manipulate, allowing them to describe in an abstract way the kind of music they want to listen to.
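A minimal sketch of the tag-aura similarity idea, assuming TF-IDF weighting and cosine similarity as the concrete choices (the thesis may use different weighting): each item's harvested tags are treated as a text document, and the most similar item is recommended. The artists and tags below are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tag "auras" for a few artists (in practice harvested from Last.fm).
auras = {
    "Artist A": "indie rock melancholic guitar lo-fi",
    "Artist B": "indie pop upbeat guitar catchy",
    "Artist C": "techno electronic dark club",
}
names = list(auras)
tfidf = TfidfVectorizer().fit_transform(auras.values())
sim = cosine_similarity(tfidf)

# Recommend the item most similar to "Artist A" (excluding itself).
i = names.index("Artist A")
best = max((j for j in range(len(names)) if j != i), key=lambda j: sim[i, j])
print(f"closest to Artist A: {names[best]} (similarity {sim[i, best]:.2f})")
```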
Abstract:
With advances in information technology, economic and financial time series data are increasingly available. However, if standard time series techniques are used, this large amount of information brings with it the problem of dimensionality. Since most series of interest are highly correlated, their dimension can be reduced using factor analysis, a technique that has become increasingly popular in economics since the 1990s. Given the availability of data and the computational advances, several new questions arise. What are the effects and the transmission of structural shocks in a data-rich environment? Can the information contained in a large set of economic indicators help to better identify monetary policy shocks, given the problems encountered in applications using standard models? Can financial shocks be identified and their effects on the real economy measured? Can the existing factor method be improved by incorporating another dimension-reduction technique such as VARMA analysis? Does this produce better forecasts of the main macroeconomic aggregates and help with impulse response analysis? Finally, can factor analysis be applied to random parameters; for example, are there only a small number of sources of the temporal instability of the coefficients in empirical macroeconomic models? Using structural factor analysis and VARMA modelling, my thesis answers these questions in five articles. The first two chapters study the effects of monetary and financial shocks in a data-rich environment. The third article proposes a new method combining factor models and VARMA. This approach is applied in the fourth article to measure the effects of credit shocks in Canada. The contribution of the last chapter is to impose a factor structure on time-varying parameters and to show that there is a small number of sources of this instability. The first article analyses the transmission of monetary policy in Canada using a factor-augmented vector autoregression (FAVAR) model. Previous studies based on VAR models found several empirical anomalies following a monetary policy shock. We estimate the FAVAR model using a large number of monthly and quarterly macroeconomic series. We find that the information contained in the factors is important for properly identifying the transmission of monetary policy and helps correct the standard empirical anomalies. Finally, the FAVAR framework yields impulse response functions for every indicator in the data set, producing the most comprehensive analysis to date of the effects of monetary policy in Canada. Motivated by the recent economic crisis, research on the role of the financial sector has regained importance. In the second article we examine the effects and propagation of credit shocks on the real economy, using a large set of economic and financial indicators within a structural factor model.
We find that a credit shock immediately increases credit spreads, lowers the value of Treasury bills, and causes a recession. These shocks have an important effect on measures of real activity, price indices, leading indicators and financial indicators. Unlike other studies, our identification procedure for the structural shock does not require timing restrictions between financial and macroeconomic factors. Moreover, it provides an interpretation of the factors without constraining their estimation. In the third article we study the relationship between the VARMA and factor representations of vector stochastic processes, and propose a new class of factor-augmented VARMA (FAVARMA) models. Our starting point is the observation that, in general, multivariate series and the associated factors cannot simultaneously follow a finite-order VAR process. We show that the dynamic process of the factors, extracted as linear combinations of the observed variables, is in general a VARMA and not a VAR, as assumed elsewhere in the literature. Second, we show that even if the factors follow a finite-order VAR, this implies a VARMA representation for the observed series. We therefore propose the FAVARMA framework, which combines these two parameter-reduction methods. The model is applied in two forecasting exercises using US and Canadian data from Boivin, Giannoni and Stevanovic (2010, 2009), respectively. The results show that the VARMA component helps to forecast the main macroeconomic aggregates better than standard models. Finally, we estimate the effects of a monetary shock using the data and identification scheme of Bernanke, Boivin and Eliasz (2005). Our FAVARMA(2,1) model with six factors gives consistent and precise results on the effects and transmission of monetary policy in the United States. Unlike the FAVAR model employed in the earlier study, where 510 VAR coefficients had to be estimated, we produce similar results with only 84 parameters for the dynamic process of the factors. The objective of the fourth article is to identify and measure the effects of credit shocks in Canada in a data-rich environment using the structural FAVARMA model. Within the financial accelerator framework developed by Bernanke, Gertler and Gilchrist (1999), we approximate the external finance premium by credit spreads. On the one hand, we find that an unanticipated increase in the US external finance premium generates a significant and persistent recession in Canada, accompanied by an immediate rise in Canadian credit spreads and interest rates. The common component appears to capture the important dimensions of the cyclical fluctuations of the Canadian economy. The variance-decomposition analysis reveals that this credit shock has an important effect on various sectors of real activity, price indices, leading indicators and credit spreads. On the other hand, an unexpected increase in the Canadian external finance premium has no significant effect in Canada. We show that the effects of credit shocks in Canada are essentially driven by global conditions, approximated here by the US market.
Finally, given the identification procedure for the structural shocks, we obtain economically interpretable factors. The behaviour of economic agents and of the economic environment can vary over time (e.g., changes in monetary policy strategy, shock volatility), inducing parameter instability in reduced-form models. Standard time-varying-parameter (TVP) models traditionally assume independent stochastic processes for all TVPs. In this article we show that the number of sources of time variation in the coefficients is probably very small, and we provide the first known empirical evidence of this in empirical macroeconomic models. The Factor-TVP approach, proposed in Stevanovic (2010), is applied within a standard VAR model with random coefficients (TVP-VAR). We find that a single factor explains most of the variability of the VAR coefficients, while the shock-volatility parameters vary independently. The common factor is positively correlated with the unemployment rate. The same analysis is carried out with data including the recent financial crisis; the procedure now suggests two factors, and the behaviour of the coefficients shows an important change since 2007. Finally, the method is applied to a TVP-FAVAR model. We find that only five dynamic factors govern the temporal instability in nearly 700 coefficients.
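As a compact illustration of the factor-augmented approach discussed above, the Python sketch below extracts principal-component factors from a large synthetic panel, then fits a VAR to the factors and computes impulse responses with statsmodels; a FAVARMA, as proposed in the thesis, would instead model the factor dynamics as a VARMA. The panel size, number of factors and lag order are arbitrary choices for the example.

```python
import numpy as np
from statsmodels.tsa.api import VAR

# Synthetic data-rich panel: 120 standardized monthly indicators, 300 months.
rng = np.random.default_rng(0)
T, N, n_factors = 300, 120, 6
common = rng.standard_normal((T, n_factors)).cumsum(axis=0)
X = common @ rng.standard_normal((n_factors, N)) + rng.standard_normal((T, N))
X = (X - X.mean(0)) / X.std(0)

# Step 1: extract principal-component factors from the large panel.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
factors = U[:, :n_factors] * S[:n_factors]

# Step 2: model the factor dynamics with a VAR and compute impulse responses;
# a FAVARMA would replace this VAR with a VARMA for the factors.
res = VAR(factors).fit(2)
irf = res.irf(12)
print(irf.irfs.shape)   # horizon x responding factor x shock
```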
Abstract:
Decision trees are very powerful tools for classification in data mining tasks that involve different types of attributes. When it comes to handling numeric data sets, the attributes are usually first converted to categorical types and then classified using information gain concepts. Information gain is a popular and useful measure that indicates whether any benefit, in terms of information content, is obtained by splitting on a given attribute. This process, however, is computationally intensive for large data sets, and popular decision tree algorithms such as ID3 cannot handle numeric data sets directly. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean as the split point, for attributes in completely numerical data sets. The new algorithm is shown to be competitive with its information-gain counterpart C4.5 and with many existing decision tree algorithms on the standard UCI benchmark data sets, using the ANOVA statistical test. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids the time-consuming conversion of huge numeric data sets to categorical data. In summary, huge numeric data sets can be submitted directly to this algorithm without any attribute mapping or information gain computation. It also blends the two closely related fields of statistics and data mining.
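To make the proposed criterion concrete, here is a small Python sketch of a variance-based split on a purely numeric attribute: it scans candidate thresholds and keeps the one that most reduces the variance of the numerically coded class variable, which is the role information gain plays in C4.5; the paper's simpler mean-split variant is noted in a comment. The data are synthetic.

```python
import numpy as np

def best_variance_split(x, y):
    """Pick the threshold on attribute x that maximizes the reduction in the
    variance of the class codes y (used here in place of information gain);
    the paper's simpler variant splits at the statistical mean of x."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_gain, best_thr = -np.inf, None
    parent_var = y.var()
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue
        left, right = y[:i], y[i:]
        child_var = (len(left) * left.var() + len(right) * right.var()) / len(y)
        gain = parent_var - child_var
        if gain > best_gain:
            best_gain, best_thr = gain, (x[i] + x[i - 1]) / 2
    return best_thr, best_gain

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = (x > 0.4).astype(float)          # numeric class codes
print(best_variance_split(x, y))
```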
Abstract:
Post-transcriptional gene silencing by RNA interference is mediated by small interfering RNA (siRNA). This gene silencing mechanism can be exploited therapeutically against a wide variety of disease-associated targets, especially in AIDS, neurodegenerative diseases, cholesterol and cancer in mice, with the hope of extending these approaches to treat humans. Over the recent past, a significant amount of work has been undertaken to understand the gene silencing mediated by exogenous siRNA. The design of efficient exogenous siRNA sequences is challenging because of many issues related to siRNA. When designing efficient siRNA, target mRNAs must be selected such that their corresponding siRNAs are likely to be efficient against that target and unlikely to accidentally silence other transcripts due to sequence similarity. So before performing gene silencing with siRNAs, it is essential to analyse their off-target effects in addition to their inhibition efficiency against a particular target. Hence designing exogenous siRNA with good knock-down efficiency and target specificity is an area of concern to be addressed. Some methods have already been developed that consider both the inhibition efficiency and the off-target possibility of an siRNA against a gene. Of these methods, only a few have achieved good inhibition efficiency, specificity and sensitivity. The main focus of this thesis is to develop computational methods to optimize the efficacy of siRNA, in terms of inhibition capacity and off-target possibility, against target mRNAs, which may be useful in the areas of gene silencing and drug design for tumor development. This study aims to investigate the currently available siRNA prediction approaches and to devise a better computational approach to tackle the problem of siRNA efficacy with respect to inhibition capacity and off-target possibility. The strengths and limitations of the available approaches are investigated and taken into consideration in building an improved solution. The approaches proposed in this study therefore extend some of the well-performing previous state-of-the-art techniques by incorporating machine learning and statistical approaches and thermodynamic features, such as whole stacking energy, to improve the prediction accuracy, inhibition efficiency, sensitivity and specificity. Here, we propose one Support Vector Machine (SVM) model and two Artificial Neural Network (ANN) models for siRNA efficiency prediction. In the SVM model, the classification property is used to classify whether an siRNA is efficient or inefficient in silencing a target gene. The first ANN model, named siRNA Designer, is used for optimizing the inhibition efficiency of siRNA against target genes. The second ANN model, named Optimized siRNA Designer (OpsiD), produces efficient siRNAs with high inhibition efficiency to degrade target genes with improved sensitivity and specificity, and identifies the off-target knockdown possibility of siRNA against non-target genes. The models are trained and tested against a large data set of siRNA sequences. The validations are conducted using the Pearson Correlation Coefficient, the Matthews Correlation Coefficient, Receiver Operating Characteristic analysis, prediction accuracy, sensitivity and specificity. It is found that the approach, OpsiD, is capable of predicting the inhibition capacity of siRNA against a target mRNA with improved results over the state-of-the-art techniques. We are also able to understand the influence of whole stacking energy on the efficiency of siRNA.
The model is further improved by including the ability to identify the off-target possibility of a predicted siRNA on non-target genes. Thus the proposed model, OpsiD, can predict optimized siRNA by considering both inhibition efficiency on target genes and off-target possibility on non-target genes, with improved inhibition efficiency, specificity and sensitivity. Since we have taken efforts to optimize siRNA efficacy in terms of inhibition efficiency and off-target possibility, we hope that the risk of off-target effects when performing gene silencing in various bioinformatics applications can be overcome to a great extent. These findings may provide new insights into cancer diagnosis, prognosis and therapy by gene silencing. The approach may prove useful for designing exogenous siRNA for therapeutic applications and gene silencing techniques in different areas of bioinformatics.
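As a hedged sketch of the SVM component described above, the Python snippet below encodes 19-nt siRNA guide sequences by their dinucleotide composition (a simple stand-in for the thermodynamic and sequence features used in the thesis, such as whole stacking energy) and cross-validates an RBF-kernel SVM classifier for efficient versus inefficient silencing. The sequences and labels are randomly generated, so the reported accuracy is meaningless except as a usage example.

```python
import numpy as np
from itertools import product
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

BASES = "ACGU"
DIMERS = ["".join(p) for p in product(BASES, repeat=2)]

def dimer_features(sirna):
    """Dinucleotide composition of a guide strand (a simple stand-in
    for the thermodynamic/sequence features used in the thesis models)."""
    v = np.zeros(len(DIMERS))
    for i in range(len(sirna) - 1):
        v[DIMERS.index(sirna[i:i + 2])] += 1
    return v / max(len(sirna) - 1, 1)

# Toy labelled set: 1 = efficient knockdown, 0 = inefficient (made-up data).
rng = np.random.default_rng(7)
seqs = ["".join(rng.choice(list(BASES), 19)) for _ in range(200)]
X = np.array([dimer_features(s) for s in seqs])
y = rng.integers(0, 2, len(seqs))

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```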
Abstract:
Ontic is an interactive system for developing and verifying mathematics. Ontic's verification mechanism is capable of automatically finding and applying information from a library containing hundreds of mathematical facts. Starting with only the axioms of Zermelo-Fraenkel set theory, the Ontic system has been used to build a data base of definitions and lemmas leading to a proof of the Stone representation theorem for Boolean lattices. The Ontic system has been used to explore issues in knowledge representation, automated deduction, and the automatic use of large data bases.
Abstract:
This report examines how to estimate the parameters of a chaotic system given noisy observations of the state behavior of the system. Investigating parameter estimation for chaotic systems is interesting because of possible applications for high-precision measurement and for use in other signal processing, communication, and control applications involving chaotic systems. In this report, we examine theoretical issues regarding parameter estimation in chaotic systems and develop an efficient algorithm to perform parameter estimation. We discover two properties that are helpful for performing parameter estimation on non-structurally stable systems. First, it turns out that most data in a time series of state observations contribute very little information about the underlying parameters of a system, while a few sections of data may be extraordinarily sensitive to parameter changes. Second, for one-parameter families of systems, we demonstrate that there is often a preferred direction in parameter space governing how easily trajectories of one system can "shadow" trajectories of nearby systems. This asymmetry of shadowing behavior in parameter space is proved for certain families of maps of the interval. Numerical evidence indicates that similar results may be true for a wide variety of other systems. Using the two properties cited above, we devise an algorithm for performing parameter estimation. Standard parameter estimation techniques such as the extended Kalman filter perform poorly on chaotic systems because of divergence problems. The proposed algorithm achieves accuracies several orders of magnitude better than the Kalman filter and has good convergence properties for large data sets.
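The following Python sketch conveys the flavour of parameter estimation for a chaotic map under observation noise, though it is not the report's algorithm: it estimates the logistic-map parameter by minimizing one-step prediction error over short segments of a noisy time series, which sidesteps the trajectory divergence that hampers extended-Kalman-filter approaches. The map, noise level and search grid are assumptions for illustration.

```python
import numpy as np

def logistic_orbit(r, x0, n):
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1 - x[i - 1])
    return x

# Noisy observations from a "true" logistic map (r unknown to the estimator).
rng = np.random.default_rng(5)
r_true, x0, n = 3.9, 0.2, 400
obs = logistic_orbit(r_true, x0, n) + 0.01 * rng.standard_normal(n)

def segment_error(r, seg):
    """One-step prediction error over a short segment: short horizons avoid
    the divergence that defeats Kalman-style tracking on chaotic orbits."""
    pred = r * seg[:-1] * (1 - seg[:-1])
    return np.mean((pred - seg[1:]) ** 2)

# Brute-force grid search; the report's algorithm instead exploits the few
# parameter-sensitive sections of the series (not reproduced here).
grid = np.linspace(3.5, 4.0, 2001)
errs = [np.mean([segment_error(r, obs[i:i + 20]) for i in range(0, n - 20, 20)])
        for r in grid]
print(f"estimated r ≈ {grid[int(np.argmin(errs))]:.4f} (true {r_true})")
```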
Abstract:
The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Labs. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and Multi-Layer Perceptron classifiers. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle. The derivation of Support Vector Machines, their relationship with SRM, and their geometrical insight are discussed in this paper. Training an SVM is equivalent to solving a quadratic programming problem with linear and box constraints in a number of variables equal to the number of data points. When the number of data points exceeds a few thousand, the problem is very challenging because the quadratic form is completely dense, so the memory needed to store the problem grows with the square of the number of data points. Therefore, training problems arising in some real applications with large data sets are impossible to load into memory and cannot be solved using standard non-linear constrained optimization algorithms. We present a decomposition algorithm that can be used to train SVMs over large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of stopping criteria for the algorithm. We present previous approaches, as well as results and important details of our implementation of the algorithm using a second-order variant of the Reduced Gradient Method as the solver of the sub-problems. As an application of SVMs, we present preliminary results obtained by applying SVMs to the problem of detecting frontal human faces in real images.
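Here is a simplified, hedged sketch of the decomposition idea in Python: rather than loading the full dense Gram matrix, an SVM is repeatedly refit on a small working set consisting of the current support vectors plus the worst margin violators found on the full data. It is much cruder than the paper's algorithm, which solves the sub-problems with a second-order reduced-gradient method; the chunk sizes and stopping rule below are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Chunking-style decomposition: solve the QP only on a small working set
# (current support vectors + worst margin violators) at each iteration.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
rng = np.random.default_rng(0)
work = rng.choice(len(X), 500, replace=False)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
for it in range(10):
    clf.fit(X[work], y[work])
    # Margin violators over the full set: y_i * f(x_i) < 1.
    margins = clf.decision_function(X) * (2 * y - 1)
    violators = np.argsort(margins)[:200]
    new_work = np.unique(np.concatenate([work[clf.support_], violators]))
    if np.array_equal(np.sort(new_work), np.sort(work)):
        break                              # working set stabilized
    work = new_work

print(f"stopped after {it + 1} iterations, {len(work)} points in working set, "
      f"train accuracy {clf.score(X, y):.3f}")
```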
Abstract:
The work carried out in this thesis has been developed around three guiding objectives. The first objective is to present a compilation of the factors involved in urban acoustics: the noise produced by the different vehicles, noise prediction formulas, the geometry of the buildings..., studying their effects in the city of Girona. Another objective has been to develop our own numerical methods, validated experimentally and extrapolable to any urban environment, for predicting the acoustic disturbances produced by the different vehicles in different situations, among which the following stand out: a formula for predicting noise in an urban environment and its application to Girona; the calculation of the noise increase in a street caused by the reflection of sound waves off building façades; the study of the noise level at the mouth of a tunnel produced by passing trains; the determination of the noise caused by a train crossing a viaduct; and a method for distributing and planning urban traffic so as to reduce its acoustic impact on the area. The last objective is to give an analytical description of the main noise sources affecting the city: road traffic and the railway. To meet these objectives, a database of more than 2,000 sound measurements from Girona (equivalent levels of 10 minutes' duration) was available. The methodology followed and the principles on which it is based are detailed at the beginning of each section. The purpose of all these studies is none other than to improve the acoustic comfort, and the quality of life, of cities. Almost all of the planet's large population centres are affected by a very serious environmental problem: in addition to so-called noise pollution there are high levels of atmospheric pollution (high concentrations of carbon dioxide, the formation of heat islands...). This situation, widespread across the planet, has prompted drastic measures consisting essentially of restricting the access of motorized vehicles to the central areas of cities. Precisely this option has been proposed for the inner zones of Girona, where the high density of buildings leaves little room for planning new routes or alternative roads. It should be noted that all the calculations and theories developed in this thesis reflect the current acoustic reality produced by the different means of transport. Quite possibly, in the not-too-distant future, the noise levels (dB) recorded in similar traffic conditions will be considerably lower. Many factors can contribute to this reduction in the intensity of sound emissions: less mechanical friction, higher aerodynamic coefficients, new materials for tyres and asphalt... Without any doubt, however, a decisive improvement, not only for acoustic comfort but for the ecosystem in general, would be to promote electric or hydrogen engines. The latter, for example, unlike combustion engines, run on fuel cells that convert hydrogen gas into electricity very cleanly, making possible non-polluting vehicles propelled by quieter electric motors. Thus, with less friction between the moving parts of the engine (there are no pistons or cylinders), the noise generated would be considerably reduced.
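As a small worked example of the level arithmetic underlying the prediction formulas mentioned above, the Python sketch below combines incoherent sources energetically, L = 10 log10(Σ 10^(Li/10)), and shows the increase produced when a façade reflection is added as an image source a few dB below the direct sound. All levels and the 3 dB reflection loss are assumed for illustration.

```python
import numpy as np

def combine_levels(levels_db):
    """Energetic sum of incoherent sources: L = 10*log10(sum 10^(Li/10))."""
    return 10 * np.log10(np.sum(10 ** (np.asarray(levels_db) / 10)))

# Illustrative (not measured) street: two traffic lanes plus a façade
# reflection modelled as an image source 3 dB below the direct sound.
direct = [68.0, 65.0]                       # dB(A) per lane, assumed
reflection = [l - 3.0 for l in direct]
print(f"free field : {combine_levels(direct):.1f} dB(A)")
print(f"with façade: {combine_levels(direct + reflection):.1f} dB(A)")
```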