913 results for Nearest Neighbor


Relevance: 60.00%

Abstract:

The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While dozens of classification algorithms have been applied to time series, recent empirical evidence strongly suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm is important, and depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping, and cardiology data requires invariance to the baseline (the mean value). Similarly, recent work suggests that for time series clustering, the choice of clustering algorithm is much less important than the choice of distance measure used. In this work we make a somewhat surprising claim. There is an invariance that the community seems to have missed: complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where some complex objects may be incorrectly assigned to a simpler class. Similarly, for clustering this effect can introduce errors by “suggesting” to the clustering algorithm that subjectively similar, but complex objects belong in a sparser and larger diameter cluster than is truly warranted. We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification and clustering accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of the triangle inequality, thus making use of most existing indexing and data mining algorithms.
We evaluate our ideas with the largest and most comprehensive set of time series mining experiments ever attempted in a single work, and show that complexity-invariant distance measures can produce improvements in classification and clustering in the vast majority of cases.
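The abstract does not state the measure itself, so the following is only a minimal sketch of one way a complexity-invariant distance could look. It assumes that complexity is estimated from the summed squared differences of consecutive points and that the Euclidean distance is scaled by the ratio of the larger to the smaller estimate. The function names and the handling of flat series are illustrative assumptions, not the authors' code.

```python
import math


def euclidean(q, c):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def complexity_estimate(series):
    """Assumed complexity estimate: root of the summed squared
    differences between consecutive points of the series."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(series, series[1:])))


def cid(q, c):
    """Euclidean distance scaled by a complexity correction factor >= 1."""
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    lo, hi = min(ce_q, ce_c), max(ce_q, ce_c)
    if lo == 0.0:
        # Assumption: two flat series are equally complex; a flat series
        # is treated as infinitely far from a non-flat one.
        return euclidean(q, c) if hi == 0.0 else math.inf
    return euclidean(q, c) * (hi / lo)
```

With this form, two series of equal complexity keep their Euclidean distance, while a pair with mismatched complexity is pushed further apart, which is the effect the abstract describes for nearest neighbor classification.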

Relevance: 60.00%

Abstract:

The present work focused on investigations of structural order and disorder phenomena in natural, substitutional solid solutions. Because of the enormous variety of potential substitution partners, members of the biotite solid-solution series "phlogopite-annite" were selected for this purpose. Their modular structure made it possible to describe in a targeted way the distribution patterns of anionic and cationic species within the octahedral sheet of the biotite solid solutions. Based on the postulated bonding affinity between Mg2+ and F- on the one hand and Fe2+ and OH- on the other, the structural separation of a fluor-phlogopite component was derived as the primary manifestation of the Mg/F ordering principle. Within this model, domains of two chemically divergent phases coexist in the macroscopically homogeneous biotite solid solution: a purely Mg2+/F--bearing phlogopite phase and a host-crystal phase that, with progressive separation or exsolution of the former phase, becomes successively richer in a hydroxyl-bearing, iron-rich annite component. To describe the various stages of the exsolution reaction numerically, the terms "relative" and "absolute domain size" were introduced. They provide a quantitative measure for assessing the ordering phenomena discussed. Based on the continuously changing chemistry of the host-crystal phase, every transitional state between statistical distribution and complete order can be characterized and described by the corresponding distribution pattern of the species of interest (= short-range order configuration and occupation probability). Mössbauer spectroscopic investigations made it possible to verify, qualitatively and quantitatively, the manifestations of order/disorder phenomena predicted by the models developed.
Two groups of biotite solid solutions could be distinguished: a first group, whose Mössbauer spectra are shaped by the OH/F chemistry as the dominant differentiating feature, and a second group, whose Mössbauer spectra are shaped by clusters of higher-valent cations and vacancies (= defect chemistry). On the basis of correlation diagrams that establish a numerical and graphical relationship between absolute and relative domain size on the one hand and the experimentally accessible Mössbauer parameter A (= relative area fraction, corresponding to the occupation probability of a particular short-range order configuration) on the other, the volumes of the two coexisting components "hydroxyl-annite-rich host crystal" and "fluor-phlogopite" could be quantified exactly for the first group. The range of samples investigated comprised crystal species characterized by low to medium degrees of Mg2+/F- and Fe2+/OH- order as well as species that reflect nearly complete ordering of the species of interest, Mg2+, Fe2+, OH- and F-. Furthermore, it was demonstrated that a quantitative determination of the defect volumes is possible for selected samples.

Relevance: 60.00%

Abstract:

This dissertation investigates corporate governance and dividend policy in banking. The topic has recently attracted the attention of numerous scholars worldwide and remains one of the most discussed in banking. The core of the dissertation consists of three papers. The first paper synthesizes the main findings of the relevant literature using meta-analysis. The second paper provides an empirical analysis of the effect of bank corporate governance on dividend payout. Finally, the third paper investigates empirically the effect of government bailouts during 2007-2010 on the corporate governance and dividend policy of banks. The dissertation uses a new hand-collected data set with information on corporate governance, ownership structure and compensation structure for a sample of listed banks from 15 European countries for the period 2005-2010. The empirical papers employ econometric approaches such as the within-group model, the difference-in-differences technique, and propensity score matching based on the nearest neighbor matching estimator. The main empirical results may be summarized as follows. First, we provide evidence that CEO power and connection to government are associated with lower dividend payout ratios. This result supports the view that banking regulators are predominantly concerned with the safety of the bank, and that powerful bank CEOs can afford to distribute low payouts at the expense of minority shareholders. Next, we find that government bailouts during 2007-2010 changed the banks' ownership structure and helped keep lending by bailed-out banks at the pre-crisis level. Finally, we provide robust evidence of increased control over the banks that received government money. These findings show the important role of government in overcoming the consequences of the banking crisis, and the high quality of governance of public bailouts in European countries.
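The nearest neighbor matching step mentioned above can be sketched as follows. This is a minimal illustration, assuming propensity scores have already been estimated elsewhere and that matching is one-to-one with replacement. The function names and data layout are hypothetical and are not taken from the dissertation.

```python
def nearest_neighbor_match(treated, controls):
    """Match each treated unit to the control with the closest propensity score.

    Both arguments map a unit id to (propensity_score, outcome).
    Matching is with replacement: a control may be reused.
    """
    matches = {}
    for t_id, (t_score, _) in treated.items():
        matches[t_id] = min(
            controls, key=lambda c_id: abs(controls[c_id][0] - t_score)
        )
    return matches


def att(treated, controls):
    """Average treatment effect on the treated from 1-nearest-neighbor matches."""
    matches = nearest_neighbor_match(treated, controls)
    diffs = [treated[t][1] - controls[c][1] for t, c in matches.items()]
    return sum(diffs) / len(diffs)
```

In the bailout setting, the treated units would be bailed-out banks and the controls comparable banks that received no government money. A real analysis would also check covariate balance and common support, which this sketch omits.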

Relevance: 60.00%

Abstract:

In Sub-Saharan Africa, non-democratic events such as civil wars and coups d'état destroy economic development. This study investigates both domestic and spatial effects on the likelihood of civil wars and coups d'état. For civil wars, a common research conclusion is that higher income growth helps to stop wars. This study adds ethnic fractionalization as a concern. IV-2SLS is applied to address the causality problem. The findings show that income growth significantly reduces both the number and the intensity of conflicts in highly ethnically fractionalized countries; elsewhere there is a trade-off. In countries with a few large ethnic groups, income growth reduces the number of wars but increases their level of violence. Policies that promote growth should therefore take ethnic composition into account. This study also investigates the clustering and contagion of civil wars using spatial panel data models. The onset, incidence and end of civil conflicts spread across the network of neighboring countries, while peace (the end of conflicts) diffuses only to the nearest neighbor. There is evidence that income growth in neighboring countries, without too much inequality, indirectly reduces the likelihood of civil wars. For coups d'état, this study revisits their diffusion, both for all types of coups and for successful ones only. The results indicate that both domestic and spatial determinants exist, in different periods. Domestic income growth plays a major role in reducing the likelihood of a coup before the end of the Cold War, while spatial effects are negative afterward. Results for the probability of a coup succeeding are similar. After the end of the Cold War, international organisations have seriously promoted democracy and applied pressure against coups d'état, and this seems to be effective. In sum, this study indicates the role of domestic ethnic fractionalization and of spillovers from neighboring countries in the likelihood of non-democratic events in a country. Policy implementation should take these factors into account.

Relevance: 60.00%

Abstract:

The thesis work was carried out at the System Ceramics division of System Group S.p.A. in Fiorano Modenese (MO), which develops solutions for the ceramics industry, including tile decoration. In ceramics plants, pieces are typically moved by conveyor belt, and during transport they can shift slightly. If a piece is not aligned with the printer before the decoration stage, the print is misaligned and some areas along the edges of the piece may be left unprinted. It is therefore essential to correct the misalignment before decorating. The most common solution is to install guides at the entrance of the decoration system. Besides not allowing high precision, this solution proves unsuitable when the decoration is applied in successive stages by different printers. The research and development department of System Ceramics therefore devised a different, innovative solution that follows the inverse approach: aligning the graphics to each piece in software according to its placement, rather than physically changing its position. The new printing process based on software alignment of the graphics first determines the placement of each tile using a machine vision system positioned on the belt ahead of the printer. The graphics are then processed according to the placement of the piece and applied once the piece reaches the printing area. The thesis work focused on the graphics rotation stage and consisted of studying and optimizing the existing application prototype in order to reduce its execution time. Although functional, the prototype has an execution time so high that it is incompatible with the production speed adopted by the ceramics industry.

Relevance: 60.00%

Abstract:

A series of oligodeoxyribonucleotides and oligoribonucleotides containing single and multiple tricyclo(tc)-nucleosides in various arrangements were prepared and the thermal and thermodynamic transition profiles of duplexes with complementary DNA and RNA evaluated. Tc-residues aligned in a non-continuous fashion in an RNA strand significantly decrease affinity to complementary RNA and DNA, mostly as a consequence of a loss of pairing enthalpy ΔH. Arranging the tc-residues in a continuous fashion rescues T(m) and leads to higher DNA and RNA affinity. Substitution of oligodeoxyribonucleotides in the same way causes much smaller differences in T(m) when paired to complementary DNA and leads to substantial increases in T(m) when paired to complementary RNA. CD-spectroscopic investigations in combination with molecular dynamics simulations of duplexes with single modifications show that tc-residues in the RNA backbone distinctly influence the conformation of the neighboring nucleotides, forcing them into higher energy conformations, while tc-residues in the DNA backbone seem to have negligible influence on the nearest neighbor conformations. These results rationalize the observed affinity differences and are of relevance for the design of tc-DNA containing oligonucleotides for applications in antisense or RNAi therapy.

Relevance: 60.00%

Abstract:

Dimensional modeling, GT-Power in particular, has been used for two related purposes: to quantify and understand the inaccuracies of transient engine flow estimates that cause transient smoke spikes, and to improve empirical models of opacity or particulate matter used for engine calibration. Dimensional modeling suggested that exhaust gas recirculation flow rate was significantly underestimated and volumetric efficiency was overestimated by the electronic control module during the turbocharger lag period of an electronically controlled heavy duty diesel engine. Factoring in cylinder-to-cylinder variation, it has been shown that the fuel-oxygen ratio estimated by the electronic control module was lower than actual by up to 35% during the turbocharger lag period but within 2% of actual elsewhere, thus hindering fuel-oxygen ratio limit-based smoke control. The dimensional modeling of transient flow was enabled by a new method of simulating transient data in which the manifold pressures and exhaust gas recirculation system flow resistance, characterized as a function of exhaust gas recirculation valve position at each measured transient data point, were replicated by quasi-static or transient simulation to predict engine flows. Dimensional modeling was also used to transform the engine operating parameter model input space to a more fundamental lower dimensional space so that a nearest neighbor approach could be used to predict smoke emissions. This new approach, intended for engine calibration and control modeling, was termed the "nonparametric reduced dimensionality" approach. It was used to predict federal test procedure cumulative particulate matter within 7% of measured value, based solely on steady-state training data. Very little correlation between the model inputs in the transformed space was observed as compared to the engine operating parameter space.
This more uniform, smaller, shrunken model input space might explain how the nonparametric reduced dimensionality approach model could successfully predict federal test procedure emissions when roughly 40% of all transient points were classified as outliers as per the steady-state training data.
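As a rough illustration of the nearest neighbor prediction step, the sketch below averages the responses of the k closest steady-state training points in a low-dimensional input space. It assumes Euclidean distance and unweighted averaging. The function name, the value of k, and the choice of input variables are assumptions for illustration, not details from the work summarized above.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a response as the mean of the k nearest training responses.

    train_x: list of input points (tuples) in the transformed space.
    train_y: measured response (e.g. smoke) for each training point.
    """
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)
```

Because the method stores training data rather than fitting parameters, its accuracy depends heavily on how well the input space is chosen, which is consistent with the emphasis on transforming the inputs first.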

Relevance: 60.00%

Abstract:

Model-based calibration of steady-state engine operation is commonly performed with highly parameterized empirical models that are accurate but not very robust, particularly when predicting highly nonlinear responses such as diesel smoke emissions. To address this problem, and to boost the accuracy of more robust non-parametric methods to the same level, GT-Power was used to transform the empirical model input space into multiple input spaces that simplified the input-output relationship and improved the accuracy and robustness of smoke predictions made by three commonly used empirical modeling methods: Multivariate Regression, Neural Networks and the k-Nearest Neighbor method. The availability of multiple input spaces allowed the development of two committee techniques: a 'Simple Committee' technique that used averaged predictions from a set of 10 pre-selected input spaces chosen by the training data and the "Minimum Variance Committee" technique where the input spaces for each prediction were chosen on the basis of disagreement between the three modeling methods. This latter technique equalized the performance of the three modeling methods. The successively increasing improvements resulting from the use of a single best transformed input space (Best Combination Technique), Simple Committee Technique and Minimum Variance Committee Technique were verified with hypothesis testing. The transformed input spaces were also shown to improve outlier detection and to improve k-Nearest Neighbor performance when predicting dynamic emissions with steady-state training data. An unexpected finding was that the benefits of input space transformation were unaffected by changes in the hardware or the calibration of the underlying GT-Power model.

Relevance: 60.00%

Abstract:

Reconstruction of a patient-specific 3D bone surface from calibrated 2D fluoroscopic images and a point distribution model is discussed. We present a 2D/3D reconstruction scheme that combines statistical extrapolation and regularized shape deformation with an iterative algorithm for establishing image-to-model correspondence, and show its application to reconstructing the surface of the proximal femur. The image-to-model correspondence is established using a non-rigid 2D point matching process, which iteratively uses a symmetric injective nearest-neighbor mapping operator and 2D thin-plate-spline-based deformation to find a fraction of best matched 2D point pairs between features detected in the fluoroscopic images and those extracted from the 3D model. The 2D point pairs obtained are then used to set up a set of 3D point pairs, so that the 2D/3D reconstruction problem is turned into a 3D/3D one. We designed and conducted experiments on 11 cadaveric femurs to validate the present reconstruction scheme. An average mean reconstruction error of 1.2 mm was found when two fluoroscopic images were used for each bone. It decreased to 1.0 mm when three fluoroscopic images were used.
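One plausible reading of a "symmetric injective nearest-neighbor mapping" is mutual nearest neighbor matching: a pair is kept only when each point is the other's nearest neighbor. The sketch below illustrates that reading for 2D point sets. It is an assumption about the operator, not the authors' definition, and the function name is hypothetical.

```python
import math


def mutual_nearest_pairs(points_a, points_b):
    """Return index pairs (i, j) where points_b[j] is the nearest neighbor
    of points_a[i] and points_a[i] is the nearest neighbor of points_b[j].

    Each point appears in at most one pair, so the mapping is one-to-one.
    Both point sets must be non-empty.
    """
    def nearest(p, candidates):
        return min(range(len(candidates)), key=lambda k: math.dist(p, candidates[k]))

    a_to_b = [nearest(p, points_b) for p in points_a]
    b_to_a = [nearest(p, points_a) for p in points_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

Points without a mutual partner are simply left unmatched, which fits the abstract's description of keeping only a fraction of best matched point pairs in each iteration.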

Relevance: 60.00%

Abstract:

Maderas volcano is a small, andesitic stratovolcano located on the island of Ometepe, in Lake Nicaragua, Nicaragua with no record of historic activity. Twenty-one samples were collected from lava flows from Maderas in 2010. Selected samples were analyzed for whole-rock geochemical data using ICP-AES and/or were dated using the 40Ar/39Ar method. The results of these analyses were combined with previously collected data from Maderas as well as field observations to determine the eruptive history of the volcano and create a geologic map. The results of the geochemical analyses indicate that Maderas is a typical Central American andesitic volcano similar to other volcanoes in Nicaragua and Costa Rica and to its nearest neighbor, Concepción volcano. It is different from Concepción in one important way – higher incompatible elements. Determined age dates range from 176.8 ± 6.1 ka to 70.5 ± 6.1 ka. Based on these ages and the geomorphology of the volcano which is characterized by a bisecting graben, it is proposed that Maderas experienced two clear generations of development with three separate phases of volcanism: initial build-up of the older cone, pre-graben lava flows, and post-graben lava flows. The ages also indicate that Maderas is markedly older than Concepción which is historically active. Results were also analyzed regarding geologic hazards. The 40Ar/39Ar ages indicate that Maderas has likely been inactive for tens of thousands of years and the risk of future volcanic eruptions is low. However, earthquake, lahar and landslide hazards exist for the communities around the volcano. The steep slopes of the eroded older cone are the most likely source of landslide and lahar hazards.

Relevance: 60.00%

Abstract:

Obesity is becoming an epidemic in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended. Monitoring everyday food intake is essential for obesity prevention and management. Existing dietary assessment methods usually require manual recording and recall of food types and portions. The accuracy of the results relies largely on uncertain factors such as the user's memory, food knowledge, and portion estimates, and is therefore often compromised. Accurate and convenient dietary assessment methods are still lacking and are needed by both the general population and the research community. In this thesis, an automatic food intake assessment method using the cameras and inertial measurement units (IMUs) of smartphones was developed to help people foster a healthy lifestyle. With this method, users capture images or videos of the meal with their smartphones before and after eating. The smartphone recognizes the food items, calculates the volume of food consumed, and reports the results to the user. The technical objective is to explore the feasibility of image-based food recognition and image-based volume estimation. This thesis comprises five publications that address four specific goals of this work: (1) to develop a prototype system with existing methods in order to review the methods in the literature, find their drawbacks, and explore the feasibility of developing novel methods; (2) based on the prototype system, to investigate new food classification methods that improve recognition accuracy to a level suitable for field application; (3) to design indexing methods for large-scale image databases to facilitate the development of new food image recognition and retrieval algorithms; (4) to develop novel, convenient, and accurate food volume estimation methods using only smartphones with cameras and IMUs. A prototype system was implemented to review existing methods.
An image feature detector and descriptor were developed and a nearest neighbor classifier was implemented to classify food items. A credit card marker method was introduced for metric-scale 3D reconstruction and volume calculation. To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regularly shaped food items. To further increase accuracy and make the algorithm applicable to arbitrary food items, new food features and new classifiers were designed. The efficiency of the algorithm was increased by developing a novel image indexing method for large-scale image databases. Finally, the volume calculation was enhanced by reducing the marker and introducing IMUs. Sensor fusion techniques that combine measurements from cameras and IMUs were explored to infer the metric scale of the 3D model and to reduce noise from these sensors.

Relevance: 60.00%

Abstract:

Greedy routing can be used in mobile ad-hoc networks as a geographic routing protocol. This paper proposes to use greedy routing in overlay networks as well, by positioning overlay nodes in a multi-dimensional Euclidean space. Greedy routing can only be applied when each routing decision makes progress towards the final destination. Our proposed overlay network is built such that there is always progress at each forwarding node. This is achieved by constructing at each node a so-called nearest neighbor convex set (NNCS). NNCSs can be used for various applications such as multicast routing, service discovery and Quality-of-Service routing. NNCS has been compared with Pastry, another topology-aware overlay network. NNCS achieves better relative path stretch, a measure of how close a path is to optimal.
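The greedy forwarding rule described above can be sketched as follows. This minimal example assumes each node knows the coordinates of its overlay neighbors and of the destination. It does not construct an NNCS; it only shows the forwarding step and the failure case that the NNCS construction is meant to rule out. The names and data layout are illustrative assumptions.

```python
import math


def greedy_route(positions, neighbors, source, target):
    """Forward hop by hop to the neighbor closest to the target.

    positions: node -> coordinate tuple; neighbors: node -> list of nodes.
    Returns the path, or None if some node has no neighbor closer to the
    target than itself (greedy routing is stuck in a local minimum).
    """
    path = [source]
    current = source
    while current != target:
        best = min(
            neighbors[current],
            key=lambda n: math.dist(positions[n], positions[target]),
            default=None,
        )
        if best is None or (
            math.dist(positions[best], positions[target])
            >= math.dist(positions[current], positions[target])
        ):
            return None
        path.append(best)
        current = best
    return path
```

Since the distance to the target strictly decreases at every hop, the loop always terminates. An overlay that guarantees progress at each node would never reach the `None` branch.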

Relevance: 60.00%

Abstract:

Background: The RCSB Protein Data Bank (PDB) provides public access to experimentally determined 3D-structures of biological macromolecules (proteins, peptides and nucleic acids). While various tools are available to explore the PDB, options to access the global structural diversity of the entire PDB and to perceive relationships between PDB structures remain very limited. Methods: A 136-dimensional atom pair 3D-fingerprint for proteins (3DP) counting categorized atom pairs at increasing through-space distances was designed to represent the molecular shape of PDB-entries. Examples of nearest neighbor searches are reported, illustrating the ability of 3DP-similarity to identify closely related biomolecules from small peptides to enzymes and large multiprotein complexes such as virus particles. Principal component analysis was used to visualize the PDB in 3DP-space. Results: The 3DP property space groups proteins and protein assemblies according to their 3D-shape similarity, yet shows exquisite ability to distinguish between closely related structures. An interactive website called PDB-Explorer is presented featuring a color-coded interactive map of PDB in 3DP-space. Each pixel of the map contains one or more PDB-entries which are directly visualized as ribbon diagrams when the pixel is selected. The PDB-Explorer website allows performing 3DP-nearest neighbor searches of any PDB-entry or of any structure uploaded as protein-type PDB file. All functionalities on the website are implemented in JavaScript in a platform-independent manner and draw data from a server that is updated daily with the latest PDB additions, ensuring complete and up-to-date coverage. The essentially instantaneous 3DP-similarity search with the PDB-Explorer provides results comparable to those of much slower 3D-alignment algorithms, and automatically clusters proteins from the same superfamilies in tight groups.
Conclusion: A chemical space classification of PDB based on molecular shape was obtained using a new atom-pair 3D-fingerprint for proteins and implemented in a web-based database exploration tool comprising an interactive color-coded map of the PDB chemical space and a nearest neighbor search tool. The PDB-Explorer website is freely available at www.cheminfo.org/pdbexplorer and represents an unprecedented opportunity to interactively visualize and explore the structural diversity of the PDB.
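A fingerprint-based nearest neighbor search of the kind described above can be sketched in a few lines. The abstract does not say which similarity measure 3DP uses, so the city-block distance below is an assumption, as are the function names and the dictionary layout. Real 3DP fingerprints would be 136-dimensional count vectors.

```python
import heapq


def city_block(u, v):
    """Manhattan (city-block) distance between two count fingerprints."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_entries(query, database, k=5):
    """Return the k database ids whose fingerprints are closest to the query.

    database: entry id -> fingerprint (sequence of counts).
    """
    return heapq.nsmallest(
        k, database, key=lambda entry_id: city_block(query, database[entry_id])
    )
```

A linear scan like this is simple but touches every entry. A service that answers queries over the whole PDB almost instantly would more likely rely on a precomputed index, which is outside the scope of this sketch.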

Relevance: 60.00%

Abstract:

An Internet portal accessible at www.gdb.unibe.ch has been set up to automatically generate color-coded similarity maps of the ChEMBL database in relation to up to two sets of active compounds taken from the enhanced Directory of Useful Decoys (eDUD), a random set of molecules, or up to two sets of user-defined reference molecules. These maps visualize the relationships between the selected compounds and ChEMBL in six different high-dimensional chemical spaces, namely MQN (42-D molecular quantum numbers), SMIfp (34-D SMILES fingerprint), APfp (20-D shape fingerprint), Xfp (55-D pharmacophore fingerprint), Sfp (1024-bit substructure fingerprint), and ECfp4 (1024-bit extended connectivity fingerprint). The maps are supplied in the form of Java-based desktop applications called “similarity mapplets”, which allow interactive content browsing and are linked to a “Multifingerprint Browser for ChEMBL” (also accessible directly at www.gdb.unibe.ch) to perform nearest neighbor searches. One can obtain six similarity mapplets of ChEMBL relative to random reference compounds, 606 similarity mapplets relative to single eDUD active sets, 30 300 similarity mapplets relative to pairs of eDUD active sets, and any number of similarity mapplets relative to user-defined reference sets to help visualize the structural diversity of compound series in drug optimization projects and their relationship to other known bioactive compounds.

Relevance: 60.00%

Abstract:

One of the simplest questions that can be asked about molecular diversity is how many organic molecules are possible in total? To answer this question, my research group has computationally enumerated all possible organic molecules up to a certain size to gain an unbiased insight into the entire chemical space. Our latest database, GDB-17, contains 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens, by far the largest small molecule database reported to date. Molecules allowed by valency rules but unstable or nonsynthesizable due to strained topologies or reactive functional groups were not considered, which reduced the enumeration by at least 10 orders of magnitude and was essential to arrive at a manageable database size. Despite these restrictions, GDB-17 is highly relevant with respect to known molecules. Beyond enumeration, understanding and exploiting GDBs (generated databases) led us to develop methods for virtual screening and visualization of very large databases in the form of a “periodic system of molecules” comprising six different fingerprint spaces, with web-browsers for nearest neighbor searches, and the MQN- and SMIfp-Mapplet application for exploring color-coded principal component maps of GDB and other large databases. Proof-of-concept applications of GDB for drug discovery were realized by combining virtual screening with chemical synthesis and activity testing for neurotransmitter receptor and transporter ligands. One surprising lesson from using GDB for drug analog searches is the incredible depth of chemical space, that is, the fact that millions of very close analogs of any molecule can be readily identified by nearest-neighbor searches in the MQN-space of the various GDBs. The chemical space project has opened an unprecedented door on chemical diversity. Ongoing and yet unmet challenges concern enumerating molecules beyond 17 atoms and synthesizing GDB molecules with innovative scaffolds and pharmacophores.