989 results for "data matching"
Abstract:
One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours imputing missing values or discarding records that include missing data, both of which can degrade accuracy when missing values exceed a certain level. In this research we propose a methodology to handle data sets with a large percentage of missing values and with high variability in which particular data are missing. Feature selection is performed by picking variables sequentially in order of maximum correlation with the dependent variable and minimum correlation with the variables already selected. Classification models are generated individually for each test case, based on its particular feature set and the matching data values available in the training population. The method was applied to real, anonymised mental-health data where the task was to predict the suicide risk judgement clinicians would give for each patient's data, with eleven possible outcome classes, zero to ten, ranging from no risk to maximum risk. The results compare favourably with alternative methods, and the approach has the advantage of ensuring that explanations of risk are based only on the data given, not on imputed data. This is important for clinical decision support systems that use human expertise for modelling and explaining predictions.
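The sequential selection rule above can be sketched as a greedy loop. This is a minimal illustration, not the authors' code: the names `pearson` and `select_features` are hypothetical, and combining the two criteria as "relevance minus mean redundancy" is an assumption, since the abstract does not say how the two correlations are weighed against each other.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_features(columns, target, k):
    """Greedily pick up to k column names.

    columns: dict mapping feature name -> list of values (rows with no missing data).
    Score (assumed): |corr(feature, target)| minus the mean |corr| with already-selected features.
    """
    selected = []
    remaining = list(columns)
    while remaining and len(selected) < k:
        def score(name):
            relevance = abs(pearson(columns[name], target))
            if not selected:
                return relevance
            redundancy = sum(abs(pearson(columns[name], columns[s])) for s in selected)
            return relevance - redundancy / len(selected)

        best = max(remaining, key=score)  # ties go to the earliest column
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the per-test-case setting the abstract describes, one would pass only the features present for that test case, restricted to training rows that have values for them; a duplicated column is then penalised as fully redundant once its twin is chosen.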
Abstract:
We propose a novel template-matching approach for discriminating between handwritten and machine-printed text. We first pre-process the scanned document images by denoising, excluding circles and lines, and segmenting at the word-block level. We then align and match characters from a flexibly sized gallery against the segmented regions using parallelised normalised cross-correlation. Experimental results on the Pattern Recognition & Image Analysis Research Lab-Natural History Museum (PRImA-NHM) dataset show that the algorithm is highly robust in classifying cluttered, occluded and noisy samples, as well as samples with a significant amount of missing data. The algorithm, which achieves an 84.0% classification rate with a false positive rate of 0.16 on the dataset, requires no training samples and compares favourably with training-based approaches evaluated on the same benchmark.
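Normalised cross-correlation (NCC) is the matching score named above. The sketch below shows a plain, serial zero-mean NCC over grayscale images stored as nested lists; the function names `ncc` and `best_match` are hypothetical, and the gallery, alignment and parallelisation steps of the paper are not reproduced. Each window position is scored independently, which is what makes the search easy to parallelise.

```python
import math


def ncc(patch, template):
    """Zero-mean normalised cross-correlation of two equal-size 2D lists, in [-1, 1]."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0  # flat patches get a neutral score


def best_match(image, template):
    """Slide the template over the image; return (score, row, col) of the best window."""
    th, tw = len(template), len(template[0])
    best = (-2.0, -1, -1)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best
```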
Abstract:
Increasing use of the term Strategic Human Resource Management (SHRM) reflects the recognition of the interdependencies between corporate strategy, organization and human resource management in the functioning of the firm. Dyer and Holder (1988) proposed a comprehensive Human Resource Strategic Typology consisting of three strategic types: inducement, investment and involvement. This research attempted to empirically validate their typology and to test the performance implications of the match between corporate strategy and HR strategy. Hypotheses were tested to determine the relationships between internal consistency in HRM sub-systems, the match between corporate strategy and HR strategy, and firm performance. Data were collected by a mail survey of 998 senior HR executives, of whom 263 returned the completed questionnaire. Financial information on 909 firms was collected from secondary sources such as 10-K reports and CD-Disclosure. Profitability ratios were indexed to industry averages. Confirmatory Factor Analysis using LISREL supported the six-factor HR measurement model; the six factors were staffing, training, compensation, appraisal, job design and corporate involvement. Support was also found for a second-order factor, labeled "HR Strategic Orientation", explaining the variation among the six factors. LISREL analysis also supported the congruence hypothesis that HR Strategic Orientation significantly affects firm performance. There was a significant associative relationship between HR strategy and corporate strategy. However, the contingency effects of the match between HR and corporate strategies were not supported. Several tests were conducted to show that the survey results are affected neither by non-response bias nor by mono-method bias. Implications of these findings for both researchers and practitioners are discussed.
Abstract:
The Buchans ore bodies of central Newfoundland represent some of the highest-grade VMS deposits ever mined. These Kuroko-type deposits are also known for the well-developed and well-preserved nature of their mechanically transported deposits. The deposits are hosted in Cambro-Ordovician, dominantly calc-alkaline, bimodal volcanic and epiclastic sequences of the Notre Dame Subzone, Newfoundland Appalachians. Stratigraphic relationships in this zone are complicated by extensively developed, brittle-dominated Silurian thrust faulting. Hydrothermal alteration of host rocks is a common feature of nearly all VMS deposits, and the recognition of these zones has been a key exploration tool. Alteration of host rocks has long been described as spatially associated with the Buchans ore bodies, most notably the larger in-situ deposits. This report is a baseline study that presents a complete documentation of the geochemical variance, in terms of both primary (igneous) and alteration effects, of altered volcanic rocks in the vicinity of the Lucky Strike deposit (LSZ), the largest in-situ deposit in the Buchans camp. Packages of altered rocks also occur away from the immediate mining areas and constitute new targets for exploration. These zones, identified mostly by recent and previous drilling, represent untested targets and include the Powerhouse (PHZ), Woodmans Brook (WBZ) and Airport (APZ) alteration zones, as well as the Middle Branch alteration zone (MBZ), which represents a more distal alteration facies related to Buchans ore formation. Data from each of these zones were compared with those from the LSZ in order to evaluate their relative prospectivity. The lithogeochemical data served two functions: (i) to define primary (igneous) trends and (ii) to define secondary alteration trends. Primary trends were established using immobile, or conservative, elements (i.e., HFSE, REE, Th, TiO₂, Al₂O₃, P₂O₅).
From these, the altered volcanic rocks were interpreted in terms of composition (e.g., basalt to rhyodacite) and magmatic affinity (e.g., calc-alkaline vs. tholeiitic). The data suggest that bimodality is a common feature of all zones, with most rocks plotting as either basalt/andesite or dacite (or rhyodacite); andesitic compositions sensu stricto are rare. Magmatic affinities are more varied and complex, but indicate that all units are arc volcanic sequences. Rocks from the LSZ/MBZ represent a transitional to calc-alkalic sequence; however, a slight shift in key geochemical discriminants occurs from the foot-wall to the hanging-wall. Specifically, mafic and felsic lavas of the foot-wall are of transitional (or mildly calc-alkaline) affinity, whereas the hanging-wall rocks are more strongly calc-alkaline, as indicated by their enriched LREE/HREE and higher ZrN, NbN and other ratios. These geochemical variations also provide a means of separating the units (at least the felsic rocks) into hanging-wall and foot-wall sequences, and therefore a valuable exploration tool. Volcanic rocks from the WBZ/PHZ (and probably the APZ) are more typical of tholeiitic to transitional suites, yielding flatter mantle-normalized REE patterns and lower ZrN ratios. Thus, the relationships between the immediate mining area (represented by the LSZ/MBZ), Buchans East (PHZ/WBZ) and the APZ are uncertain. Host rocks in all zones consist of mafic to felsic volcanic rocks, though the proportion of pyroclastic and epiclastic rocks is greatest at the LSZ. Phenocryst assemblages and textures are common to all zones, with minor exceptions, and are not useful for discrimination purposes. Felsic rocks from all zones are dominated by sericite-clay ± silica alteration, whereas mafic rocks are dominated by chlorite-quartz-sericite alteration. Pyrite is ubiquitous in all moderately altered rocks, and minor associated base-metal sulphides occur locally.
The exception is Lucky Strike, where stockwork quartz veining contains abundant base-metal mineralization and barite. Rocks composed entirely of chlorite (chloritite) also occur in the LSZ foot-wall. In addition, K-feldspar alteration occurs in felsic volcanic rocks at the MBZ in association with Zn-Pb-Ba and, notably, without chlorite. This zone represents a peripheral, but proximal, zone of alteration induced by lower-temperature hydrothermal fluids, presumably with little influence from seawater. Alteration geochemistry was interpreted from raw data as well as from mass-balanced (recalculated) data derived from immobile element pairs. The data from the LSZ/MBZ indicate a range in the degree of alteration, from minor to severe modification of precursor compositions. Ba tends to show a strong positive correlation with K₂O, although most Ba occurs as barite. With respect to mass changes, Al₂O₃, TiO₂ and P₂O₅ were shown to be immobile. Nearly all rocks display mass loss of Na₂O, CaO and Sr, reflecting feldspar destruction. These trends are usually mirrored by K₂O-Rb and MgO addition, indicating sericitic and chloritic alteration, respectively. More substantial gains of K₂O often occur in rocks with K-feldspar alteration, whereas a few samples display excessive MgO enrichment and represent chloritites. Fe₂O₃ changes reflect both chlorite and sulphide formation. SiO₂ addition is almost universal in the altered mafic rocks, as silica often infills amygdules and replaces the finer tuffaceous material. The felsic rocks display more variable SiO₂ behaviour. Silicic, sericitic and chloritic alteration trends were observed in the other zones, but not K-feldspar, chloritite or barite alteration.
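The mass-balance step can be illustrated with a single-precursor calculation based on one immobile component, closely related to Grant's isocon approach. This is a sketch under stated assumptions, not the report's procedure: the function name `mass_changes` and the example compositions are hypothetical, and the report's use of immobile element *pairs* (to select a precursor for each sample) is not reproduced. The factor $f = C^{\mathrm{imm}}_{\mathrm{precursor}} / C^{\mathrm{imm}}_{\mathrm{altered}}$ rescales the altered rock so the immobile component is conserved, and each component's mass change (per 100 g of precursor) is $f \cdot C_{\mathrm{altered}} - C_{\mathrm{precursor}}$.

```python
def mass_changes(precursor, altered, immobile="Al2O3"):
    """Mass change of each component, in wt% units per 100 g of precursor.

    precursor, altered: dicts mapping component name -> concentration (wt%).
    immobile: component assumed conserved during alteration (Al2O3 here, as an example).
    """
    factor = precursor[immobile] / altered[immobile]  # enrichment factor
    return {
        comp: altered[comp] * factor - precursor[comp]  # reconstructed minus original
        for comp in precursor
        if comp in altered
    }
```

Negative values indicate loss (for example Na₂O during feldspar destruction) and positive values indicate gain (for example K₂O during sericitic alteration); the immobile component itself comes out as zero by construction.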
Microprobe analysis of chlorites, sericites and carbonates indicates that: (i) sericites from all zones are muscovite and are not phengitic; (ii) at the LSZ, chlorites range from Fe-Mg chlorites (pycnochlorite) to Mg-rich chlorite (penninite), with the latter occurring in the stockwork zone and more proximal alteration facies; (iii) chlorites from the WBZ are typical of those from the more distal alteration facies of the LSZ, plotting as ripidolite to pycnochlorite; (iv) conversely, chlorite from the PHZ plots with Mg-Al-rich compositions (clinochlore to penninite); and (v) carbonate species also vary between zones, with calcite occurring in each zone, in addition to dolomite and ankerite in the PHZ and WBZ, respectively. Lead isotope ratios for galena separates from the various zones, when combined with data from older studies, tend to cluster into four distinct fields. Overall, the data plot on a broad mixing line and indicate evolution in a relatively low-μ environment. Data from sulphide stringers in altered MBZ rocks, as well as from clastic sulphides (Sandfill prospect), plot in the Buchans ore field, as do the data for galena from altered rocks in the APZ. Samples from the Buchans East area are even more primitive than the Buchans ores, with lead from the PHZ plotting with the Connel Option prospect and data from the WBZ matching those of the Skidder prospect. A sample from a newly discovered debris-flow-type sulphide occurrence (Middle Branch East) yields lead isotope ratios that are slightly more radiogenic than Buchans and plot with the Mary March alteration zone. Data within each cluster are interpreted to represent derivation from individual hydrothermal systems in which metals were derived from a common source.
Abstract:
With the popularization of GPS-enabled devices such as mobile phones, location data are becoming available at an unprecedented scale. The locations may be collected from many different sources such as vehicles moving around a city, user check-ins in social networks, and geo-tagged micro-blogging photos or messages. Besides the longitude and latitude, each location record may also have a timestamp and additional information such as the name of the location. Time-ordered sequences of these locations form trajectories, which together contain useful high-level information about people's movement patterns.
The first part of this thesis focuses on a few geometric problems motivated by the matching and clustering of trajectories. We first give a new algorithm for computing a matching between a pair of curves under existing models such as dynamic time warping (DTW). The algorithm is more efficient than standard dynamic programming algorithms both theoretically and practically. We then propose a new matching model for trajectories that avoids the drawbacks of existing models. For trajectory clustering, we present an algorithm that computes clusters of subtrajectories, which correspond to common movement patterns. We also consider trajectories of check-ins, and propose a statistical generative model, which identifies check-in clusters as well as the transition patterns between the clusters.
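For reference, the "standard dynamic programming algorithm" for dynamic time warping that the thesis improves on fills an (n+1)×(m+1) table in O(nm) time. The sketch below is that textbook baseline, not the thesis's faster algorithm; the function name `dtw` is illustrative, and Euclidean point distance (via `math.dist`, Python 3.8+) is an assumed default.

```python
import math


def dtw(p, q, dist=math.dist):
    """Dynamic time warping distance between two point sequences (O(len(p) * len(q)))."""
    n, m = len(p), len(q)
    inf = float("inf")
    # D[i][j] = cost of the best warping of p[:i] onto q[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(p[i - 1], q[j - 1])
            # extend the cheapest of: repeat q's point, repeat p's point, advance both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because DTW may match one point to several, a curve that repeats a vertex is at distance zero from the same curve without the repetition.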
The second part of the thesis considers the problem of covering shortest paths in a road network, motivated by the problem of placing EV charging stations. More specifically, a subset of vertices in the road network is selected for charging stations so that every shortest path contains enough charging stations to be traveled by an EV without draining the battery. We first introduce a general technique for the geometric set cover problem. This technique leads to near-linear-time approximation algorithms, which are the state of the art for this problem in either running time or approximation ratio. We then use this technique to develop a near-linear-time algorithm for this shortest-path cover problem.
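To make the covering formulation concrete, the sketch below uses the classic greedy hitting-set heuristic, which is *not* the thesis's near-linear geometric technique. It also simplifies "enough charging stations" to "at least one station per path" and assumes the paths are given explicitly as vertex lists; the function name `greedy_hitting_set` is hypothetical.

```python
def greedy_hitting_set(paths):
    """Choose vertices so every path contains at least one chosen vertex.

    paths: list of vertex sequences (e.g. precomputed shortest paths).
    Greedy rule: repeatedly take the vertex lying on the most still-uncovered paths.
    """
    uncovered = [set(p) for p in paths]
    if any(not s for s in uncovered):
        raise ValueError("an empty path cannot be covered")
    chosen = []
    while uncovered:
        counts = {}
        for s in uncovered:
            for v in s:
                counts[v] = counts.get(v, 0) + 1
        # sorted() makes tie-breaking deterministic (vertices must be comparable)
        best = max(sorted(counts), key=lambda v: counts[v])
        chosen.append(best)
        uncovered = [s for s in uncovered if best not in s]
    return chosen
```

The greedy rule gives the well-known logarithmic approximation for set cover, but recomputing counts each round is far from near-linear time, which is the gap the thesis's technique addresses.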
Abstract:
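De Groot, D. (2016). Flexibele Leerroutes voor Propedeusestudenten: Grounded Theory Onderzoek naar het Identificeren van Studentkenmerken in de Matching, ten behoeve van een Vraaggerichte, Gepersonaliseerde Leerroute in de Propedeuse Social Work [Flexible learning routes for first-year students: a grounded theory study on identifying student characteristics in the matching process, towards a demand-driven, personalised learning route in the Social Work first year]. 26 July 2016, Heerlen, the Netherlands: Open Universiteit.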
Abstract:
The advancement of GPS technology has made it possible to use GPS devices not only as orientation and navigation tools but also as tools for tracking spatiotemporal information. GPS tracking data can be broadly applied in location-based services, such as analysing the spatial distribution of the economy, transportation routing and planning, traffic management and environmental control. Knowledge of how to process the data from a standard GPS device is therefore crucial for further use. Previous studies have addressed various individual issues in processing such data. This paper, however, aims to outline a general procedure for processing GPS tracking data. The procedure is illustrated step by step by processing real-world GPS data on car movements in Borlänge, in central Sweden.
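One step that a general GPS-processing procedure typically needs is converting consecutive fixes into distances and speeds, and discarding physically implausible jumps. The sketch below shows that step only; it is an assumption about what such a procedure contains, not the paper's published pipeline. The names `haversine_m` and `drop_speed_outliers`, the spherical-Earth radius, and the 50 m/s threshold are all illustrative choices.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def drop_speed_outliers(fixes, max_speed_mps=50.0):
    """Filter time-ordered fixes (t_seconds, lat, lon).

    A fix is kept only if the implied speed from the last kept fix is plausible.
    """
    if not fixes:
        return []
    kept = [fixes[0]]
    for t, lat, lon in fixes[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        if haversine_m(lat0, lon0, lat, lon) / dt <= max_speed_mps:
            kept.append((t, lat, lon))
    return kept
```

Comparing against the last *kept* fix, rather than the previous raw fix, prevents a single spike from also causing the next valid fix to be rejected.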
Abstract:
The Pennsylvania Adoption Exchange (PAE) helps case workers who represent children in state custody by recommending prospective families for adoption. We describe PAE's operational challenges using case worker surveys and analyze child outcomes through a regression analysis of data collected over multiple years. A match recommendation spreadsheet tool implemented by PAE incorporates insights from this analysis and allows PAE managers to better utilize available information. Using a discrete-event simulation of PAE, we justify the value of a statewide adoption network and demonstrate the importance of better information about family preferences for increasing the percentage of children who are successfully adopted. Finally, we detail a series of simple improvements that PAE achieved through collecting more valuable information and aligning incentives for families to provide useful preference information.
Abstract:
This thesis analyses several search methods for 3D data. It gives a general overview of the field of Computer Vision, of the state of the art in acquisition sensors, and of some of the formats used to describe 3D data. It then examines 3D Object Recognition in more depth: besides describing the whole process of matching between Local Features, it focuses on the detection phase for salient points. In particular, a Learned Keypoint detector based on machine-learning techniques is analysed. The detector is illustrated with implementations of two neighbour-search algorithms: an exhaustive (exact) one (K-d tree) and an approximate one (Radial Search). Finally, experimental evaluations are reported on the efficiency and speed of the detector implemented with the different search methods, showing a real performance improvement with the approximate search without a significant loss of accuracy.
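As an illustration of the exact neighbour search mentioned above, here is a minimal K-d tree with branch-and-bound nearest-neighbour lookup for 3D points. It is a generic textbook sketch, not the thesis's implementation; the names `build_kdtree` and `nearest` are hypothetical, and the approximate Radial Search variant is not shown.

```python
import math


def build_kdtree(points, depth=0):
    """Build a K-d tree (nested dicts) by splitting at the median on a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),       # coordinates <= median
        "right": build_kdtree(pts[mid + 1:], depth + 1),  # coordinates >= median
    }


def nearest(node, query, best=None):
    """Exact nearest neighbour of `query` in the tree rooted at `node`."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or math.dist(query, point) < math.dist(query, best):
        best = point
    diff = query[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Only visit the far side if the splitting plane is closer than the best match so far
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best
```

The pruning test is what keeps the search exact while skipping most of the tree; approximate schemes gain speed by relaxing exactly this kind of check.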
Abstract:
Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges and each edge is labeled with a semantic annotation. Hence, a huge single graph can express many different relationships between entities. The Semantic Web represents each single fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with the predicate. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in the context of edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people that have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web, and graph queries of other graph DBMSs, can also be viewed as subgraph matching over large graphs. Although subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a query can return a very large number of answers. These answers can be shown to the user according to an importance ranking. In this thesis proposal, we present four different scoring models, along with scalable algorithms that find the top-k answers via a suite of intelligent pruning techniques. The proposed models cover a practically important subset of the SPARQL query language, augmented with some additional useful features. The first model, called Substitution Importance Query (SIQ), identifies the top-k answers whose scores are calculated from the properties of the matched vertices in each answer, according to a user-specified notion of importance.
The second model, called Vertex Importance Query (VIQ), identifies important vertices according to a user-defined scoring method that builds on various subgraphs specified by the user. Approximate Importance Query (AIQ), our third model, allows partial and inexact matches and returns the top-k of them according to user-specified approximation terms and scoring functions. In the fourth model, called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped and other blocks that can be opportunistically mapped. The probability is calculated from various aspects of an answer, such as the number of mapped blocks and the vertices' properties in each block, and the top-k most probable answers are returned. An important distinguishing feature of our work is that we give the user a great deal of freedom in specifying: (i) which patterns and approximations they consider important, (ii) how to score answers, irrespective of whether they are vertices or substitutions, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in settings where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used to answer SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that they are far more efficient than popular triple stores.
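To make the SIQ-style query concrete: a query is a list of triple patterns whose `?`-prefixed terms are variables, an answer is a substitution (variable binding), and answers are ranked by a user-supplied score. The sketch below is a naive backtracking matcher with a top-k selection, not the thesis's pruning algorithms. The names `is_var`, `match` and `top_k` are hypothetical; predicates must be constants here, and, as in SPARQL, two variables may bind to the same vertex.

```python
import heapq


def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every binding that maps all (subject, predicate, object) patterns onto triples."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    s, p, o = patterns[0]
    for ts, tp, to in triples:
        if tp != p:  # predicate must match exactly
            continue
        new = dict(binding)
        ok = True
        for term, value in ((s, ts), (o, to)):
            if is_var(term):
                if new.setdefault(term, value) != value:  # conflicting earlier binding
                    ok = False
            elif term != value:
                ok = False
        if ok:
            yield from match(patterns[1:], triples, new)


def top_k(patterns, triples, score, k):
    """Return the k highest-scoring substitutions (exhaustive enumeration, no pruning)."""
    return heapq.nlargest(k, match(patterns, triples), key=score)
```

For example, with an importance value per person, the query "people Alice knows, and where they work" returns the substitution whose `?x` is most important.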
Abstract:
The major drawback of the Ka band, the operating frequency of the AltiKa altimeter on board SARAL, is its sensitivity to atmospheric liquid water. Even light rain or heavy clouds can strongly attenuate and distort the signal, leading to erroneous estimates of geophysical parameters. Reliable detection of the samples affected by atmospheric liquid water is therefore crucial. Because AltiKa operates at a single frequency, a new technique was developed and implemented prelaunch in the ground segment; it uses a Matching Pursuit algorithm to detect short-scale variations in the slope of the echo waveform plateau. As the parameters of the detection algorithm were defined using Jason-1 data, they were re-estimated during the cal-val phase, during which the algorithm was also updated. The measured sensor signal-to-noise ratio is significantly better than planned, and the data loss due to attenuation by rain is significantly smaller than expected (<0.1%). For cycles 2 to 9, the flag detects about 9% of the 1 Hz data: 5.5% as rainy and 3.5% as backscatter bloom (or sigma0 bloom). The results of the flagging process are compared with independent rain data from microwave radiometers to evaluate its performance in terms of detection and false alarms.
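Matching Pursuit greedily decomposes a signal over a dictionary of unit-norm atoms: at each step it picks the atom with the largest inner product with the residual, records that coefficient, and subtracts the projection. The generic sketch below shows only this core loop. The dictionary of short-scale slope atoms, the thresholds and the flagging rule used for AltiKa are not given in the abstract and are not reproduced; the names `unit` and `matching_pursuit` are hypothetical.

```python
import math


def unit(v):
    """Scale a vector to unit Euclidean norm (atoms must be unit-norm for MP)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def matching_pursuit(signal, atoms, n_iter=10, tol=1e-9):
    """Greedy sparse decomposition of `signal` over unit-norm `atoms`.

    Returns (coefficients, one per atom; final residual).
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        inner = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(inner[i]))
        if abs(inner[best]) < tol:
            break  # nothing left that the dictionary can explain
        coeffs[best] += inner[best]
        residual = [r - inner[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual
```

In a detection setting of the kind described, one might include localised atoms in the dictionary and treat a large coefficient on such an atom as evidence of a short-scale slope change; that interpretation is an assumption for illustration.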
Abstract:
Master's dissertation, Universidade de Brasília, Faculdade de Tecnologia, 2016.
Abstract:
Depth estimation from images has long been regarded as a preferable alternative to expensive and intrusive active sensors, such as LiDAR and ToF. The topic has attracted an increasingly wide audience thanks to its many application domains, such as autonomous driving, robotic navigation and 3D reconstruction. Among the various techniques employed for depth estimation, stereo matching is one of the most widespread, owing to its robustness, speed and simplicity of setup. Recent developments have been aided by the abundance of annotated stereo images, which has allowed deep learning to thrive in this area, with deep networks reaching state-of-the-art sub-pixel precision in most cases. Despite these advances, stereo matching still poses many open challenges, two of which are finding pixel correspondences on objects that exhibit non-Lambertian behaviour and processing high-resolution images. Recently, a novel dataset named Booster, containing high-resolution stereo pairs featuring a large collection of labeled non-Lambertian objects, has been released. That work showed that training state-of-the-art deep neural networks on such data improves their generalization capabilities, including in the presence of non-Lambertian surfaces. Although Booster is a further step towards tackling this challenge, it includes a rather small number of annotated images and thus cannot satisfy the intensive training requirements of deep learning. This thesis investigates novel view synthesis techniques to augment the Booster dataset, with the ultimate goal of improving the reliability of stereo matching on high-resolution images that display non-Lambertian surfaces.
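For readers new to stereo matching, the classical (pre-deep-learning) formulation searches, for each pixel of the left image, the horizontal shift (disparity) that best matches a window in the right image. The sketch below uses the sum of absolute differences (SAD) on rectified grayscale images stored as nested lists. It is a baseline for illustration only, not the thesis's method; the function name `sad_disparity` and its parameters are hypothetical.

```python
def sad_disparity(left, right, max_disp, half_win=1):
    """Winner-takes-all SAD block matching on rectified grayscale images.

    Returns a disparity map (border pixels are left at 0). A left pixel at column x
    is compared with the right pixel at column x - d, for d in 0..max_disp.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half_win, h - half_win):
        for x in range(half_win, w - half_win):
            best_cost, best_d = None, 0
            # keep the shifted window inside the right image
            for d in range(0, min(max_disp, x - half_win) + 1):
                cost = 0
                for dy in range(-half_win, half_win + 1):
                    for dx in range(-half_win, half_win + 1):
                        cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

Non-Lambertian surfaces break the assumption behind this cost (that a point looks the same from both viewpoints), which is one reason such regions remain hard even for learned matchers.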
Abstract:
High-throughput screening of physical, genetic and chemical-genetic interactions opens important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations, and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources into a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) the Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) the Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) the Annotation module, which assigns annotations from several databases to the contigs/singlets or lists of proteins/genes, generating automatically annotated tables that can be manually curated; and (iv) the Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather newly identified interactions, protein and metabolite expression/concentration levels, subcellular localization, computed topological metrics, and GO biological process and KEGG pathway enrichment. This module generates an XGMML file that can be imported into Cytoscape or visualized directly on the web. We developed IIS by integrating diverse databases, in response to the need for appropriate tools for the systematic analysis of physical, genetic and chemical-genetic interactions.
IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
Abstract:
This article investigates patterns of performance in grip strength, gait speed and self-rated health, and the relationships between them, considering gender, age and family income. The study was conducted in a probabilistic sample of community-dwelling elderly people aged 65 and over, participants in a population study on frailty. A total of 689 elderly people without cognitive deficits suggestive of dementia underwent tests of gait speed and grip strength. Comparisons between groups were based on low, medium and high speed and strength. Self-rated health was assessed using a 5-point scale. Men and the younger elderly scored significantly higher on grip strength and gait speed than women and the oldest; the richest scored higher than the poorest on grip strength and gait speed; women and men aged over 80 had weaker grip strength and lower gait speed; and slow gait speed and low income emerged as risk factors for a worse health evaluation. Lower muscular strength affects self-rated health because it reduces functional capacity, especially in the presence of poverty and a lack of compensatory factors.