992 results for Multiple probability vectors
Abstract:
The aim of this research work was primarily to examine the relevance of patient parameters, ward structures, procedures and practices to the potential hazards of wound cross-infection and nasal colonisation with multiple resistant strains of Staphylococcus aureus, which it is thought might provide a useful indication of a patient's general susceptibility to wound infection. Information from a large cross-sectional survey involving 12,000 patients from some 41 hospitals and 375 wards was collected over a five-year period from 1967-72, and its validity was checked before any subsequent analysis was carried out. Many environmental factors and procedures which had previously been thought (but never conclusively proved) to have an influence on wound infection or nasal colonisation rates were assessed and subsequently dismissed as not significant, provided that the standard of the current range of practices and procedures is maintained and not allowed to deteriorate. Retrospective analysis revealed that the probability of wound infection was influenced by the patient's age, duration of pre-operative hospitalisation, sex, type of wound, presence and type of drain, number of patients in the ward, and other special risk factors, whilst nasal colonisation was found to be influenced by the patient's age, total duration of hospitalisation, sex, antibiotics, proportion of occupied beds in the ward, average distance between bed centres and special risk factors. A multivariate regression analysis technique was used to develop statistical models, consisting of the patient and environmental factors found to have a significant influence on the risks of wound infection and nasal colonisation. A relationship between wound infection and nasal colonisation was then established, and this led to the development of a more advanced model for predicting wound infections, taking advantage of the additional knowledge of the patient's state of nasal colonisation prior to operation.
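A minimal sketch of the kind of multivariate risk model described above, written here as a logistic regression of infection probability on a few patient and ward covariates (assuming NumPy and scikit-learn). The covariates, coefficients and data are synthetic placeholders, not the survey data analysed in the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic placeholder data: one row per patient, columns mimic the kinds
# of covariates listed in the abstract (age, pre-operative stay, sex,
# drain present, ward size). Not the study's data.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(18, 90, n),     # age (years)
    rng.integers(0, 30, n),      # pre-operative stay (days)
    rng.integers(0, 2, n),       # sex (0/1)
    rng.integers(0, 2, n),       # drain present (0/1)
    rng.integers(10, 40, n),     # patients in ward
])
# Simulate an infection outcome with an assumed (illustrative) risk model
logit = -6 + 0.03 * X[:, 0] + 0.05 * X[:, 1] + 0.4 * X[:, 3]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.coef_, model.intercept_)   # fitted risk-factor coefficients
```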
Abstract:
2000 Mathematics Subject Classification: 62P30.
Abstract:
In this study, I determined the identity, taxonomic placement, and distribution of digenetic trematodes parasitizing the snails Pomacea paludosa and Planorbella duryi at Pa-hay-okee, Everglades National Park. I also characterized temporal and geographic variation in the probability of parasite infection for these snails based on two years of sampling. Although studies indicate that digenean parasites may have important effects both on individual species and on the structure of communities, there have been no studies of digenean parasitism on snails within the Everglades ecosystem. For example, the endangered Everglade Snail Kite, a specialist that feeds almost exclusively on Pomacea paludosa and is known to be a definitive host of digenean parasites, may suffer direct and indirect effects from consumption of parasitized apple snails. Therefore, information on the diversity and abundance of parasites harbored in snail populations in the Everglades should be of considerable interest for the management and conservation of wildlife. Juvenile digeneans (cercariae) representing 20 species were isolated from these two snails, a quadrupling of the number of species known. Species were characterized based on morphological, morphometric, and sequence data (18S rDNA, COI, and ITS). Species richness of shed cercariae was greater for P. duryi than for P. paludosa, with 13 and 7 species, respectively. These species represented 14 families. P. paludosa and P. duryi had no digenean species in common. The probability of digenean infection was higher for P. duryi than for P. paludosa, and adults showed a greater risk of infection than juveniles for both snails. Planorbella duryi showed variation in the probability of infection between sampling sites and hydrological seasons. The number of unique combinations of multi-species infections was greatest among P. duryi individuals, while the overall percentage of multi-species infections was greatest in P. paludosa. Analyses of six frequently observed multiple infections from P. duryi suggest the presence of negative interactions, positive interactions, and neutral associations between larval digeneans. These results should contribute to an understanding of the factors controlling the abundance and distribution of key species in the Everglades ecosystem and may in particular help in the management and recovery planning for the Everglade Snail Kite.
Abstract:
Electromagnetic waves in a suburban environment encounter multiple obstructions that shadow the signal. These waves are scattered and random in polarization, and they take multiple paths that add as vectors at the portable device. Buildings have vertical and horizontal edges, and diffraction from edges has polarization-dependent characteristics. In practical cases, a signal transmitted from a vertically polarized high antenna will result in a significant fraction of the total power arriving in the horizontal polarization at street level. Signal reception can be improved whenever there is a probability of receiving the signal in at least two independent ways or branches. The Finite-Difference Time-Domain (FDTD) method was applied to obtain the two- and three-dimensional dyadic diffraction coefficients (soft and hard) of right-angle perfect electric conductor (PEC) wedges illuminated by a plane wave. The FDTD results were in good agreement with the asymptotic solutions obtained using the Uniform Theory of Diffraction (UTD). Further, the PEC wedge was replaced by a material wedge and the corresponding dyadic diffraction coefficient was obtained.
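For context, a minimal one-dimensional FDTD (Yee) update loop, illustrating only the leapfrog time-stepping at the core of the method; the wedge-diffraction problems above are two- and three-dimensional with PEC or material boundaries and require far more machinery. Grid sizes and the source are assumed toy values.

```python
import numpy as np

nz, nt = 400, 1000          # grid cells, time steps (illustrative values)
ez = np.zeros(nz)           # electric field samples
hy = np.zeros(nz - 1)       # magnetic field samples (staggered half a cell)
imp0 = 377.0                # free-space impedance

for t in range(nt):
    # Update magnetic field from the spatial difference of E
    hy += (ez[1:] - ez[:-1]) / imp0
    # Update electric field from the spatial difference of H
    ez[1:-1] += (hy[1:] - hy[:-1]) * imp0
    # Soft Gaussian source injected near the left edge of the grid
    ez[10] += np.exp(-((t - 40) / 12.0) ** 2)
```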
Abstract:
Subspaces and manifolds are two powerful models for high dimensional signals. Subspaces model linear correlation and are a good fit to signals generated by physical systems, such as frontal images of human faces and multiple sources impinging at an antenna array. Manifolds model sources that are not linearly correlated, but where signals are determined by a small number of parameters. Examples are images of human faces under different poses or expressions, and handwritten digits with varying styles. However, there will always be some degree of model mismatch between the subspace or manifold model and the true statistics of the source. This dissertation exploits subspace and manifold models as prior information in various signal processing and machine learning tasks.
A near-low-rank Gaussian mixture model measures proximity to a union of linear or affine subspaces. This simple model can effectively capture the signal distribution when each class is near a subspace. This dissertation studies how the pairwise geometry between these subspaces affects classification performance. When model mismatch is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the model mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. This linear transformation, termed TRAIT, also preserves some specific features in each class, being complementary to a recently developed Low Rank Transform (LRT). Moreover, when the model mismatch is more significant, TRAIT shows superior performance compared to LRT.
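As a small illustration of the geometric quantities involved, the snippet below computes the principal angles between two subspaces with SciPy and forms the product-of-sines and sum-of-squared-sines terms that the analysis above associates with the small- and large-mismatch regimes. The random subspaces are placeholders, not data from the dissertation.

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))   # columns span subspace 1 (placeholder)
B = rng.standard_normal((50, 3))   # columns span subspace 2 (placeholder)

theta = subspace_angles(A, B)              # principal angles (radians)
prod_sines = np.prod(np.sin(theta))        # governs the small-mismatch regime
sum_sq_sines = np.sum(np.sin(theta) ** 2)  # governs the larger-mismatch regime
print(theta, prod_sines, sum_sq_sines)
```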
The manifold model enforces a constraint on the freedom of data variation. Learning features that are robust to data variation is very important, especially when the size of the training set is small. A learning machine with a large number of parameters, e.g., a deep neural network, can describe a very complicated data distribution well. However, it is also more likely to be sensitive to small perturbations of the data, and to suffer from degraded performance when generalizing to unseen (test) data.
From the perspective of the complexity of function classes, such a learning machine has a huge capacity (complexity), which tends to overfit. The manifold model provides us with a way of regularizing the learning machine so as to reduce the generalization error and therefore mitigate overfitting. Two overfitting-prevention approaches are proposed, one from the perspective of data variation, the other from capacity/complexity control. In the first approach, the learning machine is encouraged to make decisions that vary smoothly for data points in local neighborhoods on the manifold. In the second approach, a graph adjacency matrix is derived for the manifold, and the learned features are encouraged to be aligned with the principal components of this adjacency matrix. Experimental results on benchmark datasets are demonstrated, showing an obvious advantage of the proposed approaches when the training set is small.
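A rough sketch of one way to read the second regularizer: penalize the portion of feature energy that does not lie in the span of the top eigenvectors (principal components) of the manifold's graph adjacency matrix. The function and argument names are assumptions made for illustration, not the thesis's exact formulation.

```python
import numpy as np

def alignment_penalty(features, W, num_components=10):
    """Energy of `features` lying outside the top eigenvectors of W.

    features: (n_nodes, feature_dim) array, one row per graph node (assumed).
    W: (n_nodes, n_nodes) graph adjacency matrix, e.g. from a k-NN graph.
    """
    W = 0.5 * (W + W.T)                       # symmetrize the adjacency matrix
    eigvals, eigvecs = np.linalg.eigh(W)      # eigenvalues in ascending order
    U = eigvecs[:, -num_components:]          # top principal components of W
    projected = U @ (U.T @ features)          # projection onto their span
    residual = features - projected
    return np.sum(residual ** 2)              # energy not aligned with U
```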
Stochastic optimization makes it possible to track a slowly varying subspace underlying streaming data. By approximating local neighborhoods using affine subspaces, a slowly varying manifold can be efficiently tracked as well, even with corrupted and noisy data. The more local neighborhoods are used, the better the approximation, but the higher the computational complexity. A multiscale approximation scheme is proposed, where the local approximating subspaces are organized in a tree structure. Splitting and merging of the tree nodes then allows efficient control of the number of neighborhoods. The deviation of each datum from the learned model is estimated, yielding a series of statistics for anomaly detection. This framework extends the classical changepoint detection technique, which only works for one-dimensional signals. Simulations and experiments highlight the robustness and efficacy of the proposed approach in detecting an abrupt change in an otherwise slowly varying low-dimensional manifold.
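A toy sketch of the underlying idea: track a single slowly varying subspace from streaming data and use each datum's projection residual as an anomaly statistic. The multiscale, tree-structured scheme described above is considerably richer; the names and the simple gradient-style update below are illustrative assumptions.

```python
import numpy as np

def track_subspace(stream, dim=5, step=0.1):
    """Return the residual norm of each streamed datum w.r.t. a tracked subspace."""
    U = None
    residuals = []
    for x in stream:                          # x: 1-D numpy array
        if U is None:                         # initialize with a random basis
            U = np.linalg.qr(np.random.default_rng(0)
                             .standard_normal((x.size, dim)))[0]
        coeff = U.T @ x                       # coefficients in the current subspace
        r = x - U @ coeff                     # residual = deviation from the model
        residuals.append(np.linalg.norm(r))   # anomaly statistic for this datum
        U += step * np.outer(r, coeff)        # nudge the basis toward the datum
        U = np.linalg.qr(U)[0]                # re-orthonormalize the basis
    return np.array(residuals)
```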
Abstract:
We review and compare four broad categories of spatially-explicit modelling approaches currently used to understand and project changes in the distribution and productivity of living marine resources including: 1) statistical species distribution models, 2) physiology-based, biophysical models of single life stages or the whole life cycle of species, 3) food web models, and 4) end-to-end models. Single pressures are rare and, in the future, models must be able to examine multiple factors affecting living marine resources such as interactions between: i) climate-driven changes in temperature regimes and acidification, ii) reductions in water quality due to eutrophication, iii) the introduction of alien invasive species, and/or iv) (over-)exploitation by fisheries. Statistical (correlative) approaches can be used to detect historical patterns which may not be relevant in the future. Advancing predictive capacity of changes in distribution and productivity of living marine resources requires explicit modelling of biological and physical mechanisms. New formulations are needed which (depending on the question) will need to strive for more realism in ecophysiology and behaviour of individuals, life history strategies of species, as well as trophodynamic interactions occurring at different spatial scales. Coupling existing models (e.g. physical, biological, economic) is one avenue that has proven successful. However, fundamental advancements are needed to address key issues such as the adaptive capacity of species/groups and ecosystems. The continued development of end-to-end models (e.g., physics to fish to human sectors) will be critical if we hope to assess how multiple pressures may interact to cause changes in living marine resources including the ecological and economic costs and trade-offs of different spatial management strategies. Given the strengths and weaknesses of the various types of models reviewed here, confidence in projections of changes in the distribution and productivity of living marine resources will be increased by assessing model structural uncertainty through biological ensemble modelling.
Abstract:
We investigate the secrecy performance of dual-hop amplify-and-forward (AF) multi-antenna relaying systems over Rayleigh fading channels, taking into account the direct link between the source and destination. In order to exploit the available direct link and the multiple antennas for secrecy improvement, different linear processing schemes at the relay and different diversity combining techniques at the destination are proposed, namely: 1) Zero-forcing/Maximal ratio combining (ZF/MRC), 2) ZF/Selection combining (ZF/SC), 3) Maximal ratio transmission/MRC (MRT/MRC) and 4) MRT/Selection combining (MRT/SC). For all these schemes, we present new closed-form approximations for the secrecy outage probability. Moreover, we investigate a benchmark scheme, i.e., cooperative jamming/ZF (CJ/ZF), where the secrecy outage probability is obtained in exact closed form. In addition, we present asymptotic secrecy outage expressions for all the proposed schemes in the high signal-to-noise ratio (SNR) regime, in order to characterize key design parameters such as the secrecy diversity order and secrecy array gain. The outcomes of this paper can be summarized as follows: a) MRT/MRC and MRT/SC achieve a full diversity order of M + 1, ZF/MRC and ZF/SC achieve a diversity order of M, while CJ/ZF only achieves unit diversity order, where M is the number of antennas at the relay. b) ZF/MRC (ZF/SC) outperforms the corresponding MRT/MRC (MRT/SC) in the low SNR regime, but becomes inferior to it in the high SNR regime. c) All of the proposed schemes tend to outperform CJ/ZF with a moderate number of antennas, and linear processing schemes with MRC attain better performance than those with SC.
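A hedged Monte Carlo sketch of a secrecy outage probability for a bare single-antenna Rayleigh wiretap link, not the dual-hop AF relaying system analysed above; the average SNRs and target secrecy rate are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 1_000_000
snr_main, snr_eve = 10.0, 3.0        # average SNRs, linear scale (assumed)
rate_s = 1.0                         # target secrecy rate in bits/s/Hz (assumed)

gamma_m = snr_main * rng.exponential(size=n_trials)   # legitimate-link SNR (Rayleigh fading)
gamma_e = snr_eve * rng.exponential(size=n_trials)    # eavesdropper-link SNR
secrecy_capacity = np.maximum(
    np.log2(1 + gamma_m) - np.log2(1 + gamma_e), 0.0)
sop = np.mean(secrecy_capacity < rate_s)               # empirical secrecy outage probability
print(f"Estimated SOP: {sop:.4f}")
```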
Abstract:
We investigate the impact of co-channel interference on the security performance of multiple amplify-and-forward (AF) relaying networks, where N intermediate AF relays assist the data transmission from the source to the destination. The relays are corrupted by multiple co-channel interferers, and the information transmitted from the relays to the destination can be overheard by the eavesdropper. In order to deal with the interference and wiretap, the best out of the N relays is selected for security enhancement. To this end, we derive a novel lower bound on the secrecy outage probability (SOP), which is then utilized to present two best relay selection criteria, based on the instantaneous and statistical channel information of the interfering links. For these criteria and the conventional max-min criterion, we quantify the impact of co-channel interference and relay selection by deriving the lower bound on the SOP. Furthermore, we derive the asymptotic SOP for each criterion, to explicitly reveal the impact of transmit power allocation among interferers on the secrecy performance, which offers valuable insights into practical design. We demonstrate that all selection criteria achieve the full secrecy diversity order N, while the two criteria proposed in this paper outperform the conventional max-min scheme.
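For reference, a minimal sketch of the conventional max-min relay selection benchmark mentioned above: pick the relay whose weaker hop (source-relay vs. relay-destination) is strongest. The channel gains are random placeholders, and the paper's interference-aware criteria are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_relays = 8
g_sr = rng.exponential(size=n_relays)   # source -> relay channel gains (placeholder)
g_rd = rng.exponential(size=n_relays)   # relay -> destination channel gains (placeholder)

bottleneck = np.minimum(g_sr, g_rd)     # the weaker hop of each relay
best_relay = int(np.argmax(bottleneck)) # max-min selection rule
print("Selected relay:", best_relay)
```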
Abstract:
We propose three research problems to explore the relations between trust and security in the setting of distributed computation. In the first problem, we study trust-based adversary detection in distributed consensus computation. The adversaries we consider behave arbitrarily, disobeying the consensus protocol. We propose a trust-based consensus algorithm with local and global trust evaluations. The algorithm can be abstracted as a two-layer structure, with the top layer running a trust-based consensus algorithm and the bottom layer acting as a subroutine executing a global trust update scheme. We utilize a set of pre-trusted nodes, headers, to propagate local trust opinions throughout the network. This two-layer framework is flexible in that it can easily be extended to accommodate more complicated decision rules and global trust schemes. The first problem assumes that normal nodes are homogeneous, i.e., it is guaranteed that a normal node always behaves as it is programmed. In the second and third problems, however, we assume that nodes are heterogeneous, i.e., given a task, the probability that a node generates a correct answer varies from node to node. The adversaries considered in these two problems are workers from the open crowd who either invest little effort in the tasks assigned to them or intentionally give wrong answers to questions. In the second part of the thesis, we consider a typical crowdsourcing task that aggregates input from multiple workers as a problem in information fusion. To cope with the issue of noisy and sometimes malicious input from workers, trust is used to model workers' expertise. In a multi-domain knowledge learning task, however, using scalar-valued trust to model a worker's performance is not sufficient to reflect the worker's trustworthiness in each of the domains. To address this issue, we propose a probabilistic model to jointly infer multi-dimensional trust of workers, multi-domain properties of questions, and true labels of questions. Our model is very flexible and can be extended to incorporate metadata associated with questions. To show this, we further propose two extended models, one of which handles input tasks with real-valued features while the other handles tasks with text features by incorporating topic models. Our models can effectively recover the trust vectors of workers, which can be very useful for task assignment adaptive to workers' trust in the future. These results can be applied to the fusion of information from multiple data sources such as sensors, human input, machine learning results, or a hybrid of them. In the second subproblem, we address crowdsourcing with adversaries under logical constraints. We observe that questions are often not independent in real-life applications; instead, there are logical relations between them. Similarly, workers that provide answers are not independent of each other either: answers given by workers with similar attributes tend to be correlated. Therefore, we propose a novel unified graphical model consisting of two layers. The top layer encodes domain knowledge, which allows users to express logical relations using first-order logic rules, and the bottom layer encodes a traditional crowdsourcing graphical model. Our model can be seen as a generalized probabilistic soft logic framework that encodes both logical relations and probabilistic dependencies. To solve the collective inference problem efficiently, we have devised a scalable joint inference algorithm based on the alternating direction method of multipliers.
The third part of the thesis considers the problem of optimal assignment under budget constraints when workers are unreliable and sometimes malicious. In a real crowdsourcing market, each answer obtained from a worker incurs a cost. The cost is associated with both the level of trustworthiness of workers and the difficulty of tasks. Typically, access to expert-level (more trustworthy) workers is more expensive than access to the average crowd, and completion of a challenging task is more costly than a click-away question. Here, we address the optimal assignment of heterogeneous tasks to workers of varying trust levels under budget constraints. Specifically, we design a trust-aware task allocation algorithm that takes as inputs the estimated trust of workers and a pre-set budget, and outputs the optimal assignment of tasks to workers. We derive a bound on the total error probability that naturally relates to the budget, the trustworthiness of the crowd, and the costs of obtaining labels from the crowd. A higher budget, more trustworthy crowds, and less costly jobs result in a lower theoretical bound. Our allocation scheme does not depend on the specific design of the trust evaluation component; therefore, it can be combined with generic trust evaluation algorithms.
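A small illustrative sketch of the fusion idea running through the second part: aggregate binary answers with weights given by the log-odds of each worker's estimated trust. This is a generic weighted-vote rule, not the thesis's probabilistic models or its budgeted allocation algorithm; all numbers are placeholders.

```python
import numpy as np

def aggregate(answers, trust):
    """Trust-weighted vote for binary questions.

    answers: (n_workers, n_questions) array with entries in {+1, -1} (0 = no answer).
    trust: per-worker probability of answering correctly, assumed in (0.5, 1).
    """
    weights = np.log(trust / (1.0 - trust))   # log-odds weight per worker
    scores = weights @ answers                # weighted vote for each question
    return np.sign(scores)                    # aggregated label per question

answers = np.array([[+1, -1, +1],
                    [+1, +1, -1],
                    [-1, -1, +1]])
trust = np.array([0.9, 0.6, 0.55])
print(aggregate(answers, trust))
```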
Abstract:
A decision-maker, when faced with a limited and fixed budget to collect data in support of a multiple attribute selection decision, must decide how many samples to observe from each alternative and attribute. This allocation decision is of particular importance when the information gained leads to uncertain estimates of the attribute values, as with sample data collected from observations such as measurements, experimental evaluations, or simulation runs. For example, when the U.S. Department of Homeland Security must decide upon a radiation detection system to acquire, a number of performance attributes are of interest and must be measured in order to characterize each of the considered systems. We identified and evaluated several approaches to incorporate the uncertainty in the attribute value estimates into a normative model for a multiple attribute selection decision. Assuming an additive multiple attribute value model, we demonstrated the idea of propagating the attribute value uncertainty and describing the decision values for each alternative as probability distributions. These distributions were used to select an alternative. With the goal of maximizing the probability of correct selection, we developed and evaluated, under several different sets of assumptions, procedures to allocate the fixed experimental budget across the multiple attributes and alternatives. Through a series of simulation studies, we compared the performance of these allocation procedures to the simple, but common, allocation procedure that distributes the sample budget equally across the alternatives and attributes. We found that the allocation procedures developed based on the inclusion of decision-maker knowledge, such as knowledge of the decision model, outperformed those that neglected such information. Beginning with general knowledge of the attribute values provided by Bayesian prior distributions, and updating this knowledge with each observed sample, the sequential allocation procedure performed particularly well. These observations demonstrate that managing projects focused on a selection decision so that the decision modeling and the experimental planning are done jointly, rather than in isolation, can improve the overall selection results.
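A hedged sketch of the uncertainty-propagation step: draw attribute values from their sampling distributions, push each draw through an additive multi-attribute value model, and estimate the probability that each alternative comes out best. The weights, means and standard errors below are invented placeholders, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(3)
weights = np.array([0.5, 0.3, 0.2])            # attribute weights (sum to 1, assumed)
means = np.array([[0.7, 0.6, 0.8],             # alternative A attribute estimates
                  [0.8, 0.5, 0.6],             # alternative B
                  [0.6, 0.7, 0.7]])            # alternative C
stderr = np.full_like(means, 0.05)             # sampling uncertainty of each estimate

n_draws = 100_000
draws = rng.normal(means, stderr, size=(n_draws, *means.shape))
values = draws @ weights                       # additive decision value per draw
prob_best = np.bincount(values.argmax(axis=1), minlength=3) / n_draws
print(prob_best)                               # P(each alternative is selected as best)
```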
Abstract:
Doctoral thesis, Universidade de Brasília, Departamento de Economia, Brasília, 2016.
Abstract:
Pitch Estimation, also known as Fundamental Frequency (F0) estimation, has been a popular research topic for many years, and is still investigated nowadays. The goal of Pitch Estimation is to find the pitch or fundamental frequency of a digital recording of speech or musical notes. It plays an important role, because it is the key to identifying which notes are being played and at what time. Pitch Estimation of real instruments is a very hard task to address. Each instrument has its own physical characteristics, which are reflected in different spectral characteristics. Furthermore, the recording conditions can vary from studio to studio and background noises must be considered. This dissertation presents a novel approach to the problem of Pitch Estimation, using Cartesian Genetic Programming (CGP). We take advantage of evolutionary algorithms, in particular CGP, to explore and evolve complex mathematical functions that act as classifiers. These classifiers are used to identify the pitches of piano notes in an audio signal. To help us with the codification of the problem, we built a highly flexible CGP Toolbox, generic enough to encode different kinds of programs. The encoded evolutionary algorithm is the one known as 1 + λ, and we can choose the value for λ. The toolbox is very simple to use. Settings such as the mutation probability, number of runs and generations are configurable. The cartesian representation of CGP can take multiple forms and is able to encode function parameters. It is prepared to handle different types of fitness functions, both minimization and maximization of f(x), and has a useful system of callbacks. We trained 61 classifiers corresponding to 61 piano notes. A training set of audio signals was used for each of the classifiers: half were signals with the same pitch as the classifier (true positive signals) and the other half were signals with different pitches (true negative signals). The F-measure was used as the fitness function. Signals with the same pitch as the classifier that were correctly identified by the classifier count as true positives. Signals with the same pitch as the classifier that were not identified by the classifier count as false negatives. Signals with a different pitch from the classifier that were not identified by the classifier count as true negatives. Signals with a different pitch from the classifier that were identified by the classifier count as false positives. Our first approach was to evolve classifiers for identifying artificial signals created by mathematical functions: sine, sawtooth and square waves. Our function set is basically composed of filtering operations on vectors and arithmetic operations with constants and vectors. All the classifiers correctly identified true positive signals and did not identify true negative signals. We then moved to real audio recordings. For testing the classifiers, we picked audio signals different from the ones used during the training phase. For a first approach, the obtained results were very promising, but could be improved. We made slight changes to our approach and the number of false positives was reduced by 33%, compared to the first approach. We then applied the evolved classifiers to polyphonic audio signals, and the results indicate that our approach is a good starting point for addressing the problem of Pitch Estimation.
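A minimal sketch of the F-measure fitness described above, built from the true-positive/false-negative/true-negative/false-positive counts exactly as defined in the text; the array names are assumptions for illustration.

```python
import numpy as np

def f_measure(predicted, actual):
    """F-measure of one evolved classifier.

    predicted: boolean array, True where the classifier flagged the signal.
    actual: boolean array, True where the signal shares the classifier's pitch.
    """
    tp = np.sum(predicted & actual)       # same pitch, correctly identified
    fp = np.sum(predicted & ~actual)      # different pitch, wrongly identified
    fn = np.sum(~predicted & actual)      # same pitch, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```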
Abstract:
The aim of the present study was to develop a statistical approach to define the best cut-off for copy number alteration (CNA) calling from genomic data provided by high-throughput experiments, able to predict a specific clinical end-point (early relapse, 18 months) in the context of Multiple Myeloma (MM). 743 newly diagnosed MM patients with SNP array-derived genomic and clinical data were included in the study. CNAs were called both by a conventional (classic, CL) and an outcome-oriented (OO) method, and the Progression Free Survival (PFS) hazard ratios of CNAs called by the two approaches were compared. The OO approach successfully identified patients at higher risk of relapse, and the univariate survival analysis showed stronger prognostic effects for OO-defined high-risk alterations, as compared to those defined by the CL approach, statistically significant for 12 CNAs. Overall, 155/743 patients relapsed within 18 months from the start of therapy. A small number of OO-defined CNAs were significantly recurrent in early-relapsed patients (ER-CNAs): amp1q, amp2p, del2p, del12p, del17p, del19p. Two groups of patients were identified, either carrying or not carrying ≥1 ER-CNA (249 vs. 494, respectively), the first one with significantly shorter PFS and overall survival (OS) (PFS HR 2.15, p<0.0001; OS HR 2.37, p<0.0001). The risk of relapse defined by the presence of ≥1 ER-CNA was independent of those conferred both by R-ISS 3 (HR=1.51; p=0.01) and by a low-quality (< stable disease) clinical response (HR=2.59, p=0.004). Notably, the type of induction therapy was not descriptive, suggesting that ER is strongly related to patients' baseline genomic architecture. In conclusion, the OO approach employed allowed the definition of CNA-specific dynamic clonality cut-offs, improving the accuracy of CNA calls to identify MM patients with the highest probability of ER. Being outcome-dependent, the OO approach is dynamic and might be adjusted according to the selected outcome variable of interest.
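One plausible way to implement an outcome-oriented cut-off search, sketched here with the lifelines package: scan candidate clonality thresholds for a single CNA and keep the one giving the strongest progression-free-survival separation by the log-rank test. This is an assumed reconstruction for illustration, not necessarily the study's exact procedure, and the array names are placeholders.

```python
import numpy as np
from lifelines.statistics import logrank_test

def best_cutoff(clonality, pfs_time, relapsed, candidates):
    """Pick the clonality threshold that best separates PFS curves.

    clonality, pfs_time, relapsed: per-patient arrays (assumed inputs).
    candidates: iterable of candidate cut-off values to scan.
    """
    best, best_stat = None, -np.inf
    for c in candidates:
        carrier = clonality >= c                 # patients called as carrying the CNA
        if carrier.sum() in (0, len(carrier)):
            continue                             # skip degenerate splits
        res = logrank_test(pfs_time[carrier], pfs_time[~carrier],
                           event_observed_A=relapsed[carrier],
                           event_observed_B=relapsed[~carrier])
        if res.test_statistic > best_stat:       # keep the strongest separation
            best, best_stat = c, res.test_statistic
    return best, best_stat
```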
Abstract:
The cerebellum is an important site for cortical demyelination in multiple sclerosis, but the functional significance of this finding is not fully understood. The aim of this study was to evaluate the clinical and cognitive impact of cerebellar grey-matter pathology in multiple sclerosis patients. Forty-two relapsing-remitting multiple sclerosis patients and 30 controls underwent clinical assessment, including the Multiple Sclerosis Functional Composite, Expanded Disability Status Scale (EDSS) and cerebellar functional system (FS) score, and cognitive evaluation, including the Paced Auditory Serial Addition Test (PASAT) and the Symbol Digit Modalities Test (SDMT). Magnetic resonance imaging was performed with a 3T scanner, and the variables of interest were: brain white-matter and cortical lesion load, cerebellar intracortical and leukocortical lesion volumes, and brain cortical and cerebellar white-matter and grey-matter volumes. After multivariate analysis, a high burden of cerebellar intracortical lesions was the only predictor of the EDSS (p<0.001), cerebellar FS (p = 0.002), arm function (p = 0.049), and leg function (p<0.001). Patients with a high burden of cerebellar leukocortical lesions had lower PASAT scores (p = 0.013), while patients with greater volumes of cerebellar intracortical lesions had worse SDMT scores (p = 0.015). Cerebellar grey-matter pathology is widely present and contributes to clinical dysfunction in relapsing-remitting multiple sclerosis patients, independently of brain grey-matter damage.