914 results for label regression
Peer-reviewed
Abstract:
Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates a better theoretical understanding of the problem but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines to order the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems, predicting individual customer choices from the vast amount of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. To improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data, which take advantage of various non-vectorial data representations, and preference learning algorithms suited to different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to parse ranking in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics.
Training kernel-based ranking algorithms can be infeasible when the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be trained efficiently with large amounts of data. For situations in which only a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the efficient training of the algorithms but also fast regularization parameter selection, multiple-output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
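The abstract does not give the algorithms themselves. As an illustration only, the sketch below shows one regularized least-squares ranker that penalizes the squared error of every pairwise score difference; writing that loss with the Laplacian L = nI − 11ᵀ of the complete graph and setting the gradient to zero gives the dense system (LK + λI)a = Ly. The function names, the Gaussian kernel, λ and the toy data are hypothetical, and this cubic-cost dense solve is exactly what the thesis's linear-time and sparse variants are said to avoid, so it is not the thesis's own implementation.

```python
import math

def gaussian_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_pairwise_rls(X, y, lam=1e-3, kernel=gaussian_kernel):
    """Dual coefficients a of a regularized least-squares ranker.

    Minimizes (y - K a)^T L (y - K a) + lam * a^T K a with L = n I - 1 1^T.
    The first term equals the sum over all pairs (i, j) of
    ((y_i - y_j) - (f_i - f_j))^2. Setting the gradient to zero (K invertible)
    gives (L K + lam I) a = L y.
    """
    n = len(X)
    K = [[kernel(xi, xj) for xj in X] for xi in X]
    L = [[(n if i == j else 0) - 1 for j in range(n)] for i in range(n)]
    LK = [[sum(L[i][k] * K[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    A = [[LK[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    Ly = [sum(L[i][k] * y[k] for k in range(n)) for i in range(n)]
    return solve(A, Ly)

def score(a, X, x_new, kernel=gaussian_kernel):
    """Ranking score f(x) = sum_i a_i k(x_i, x); a higher score means preferred."""
    return sum(ai * kernel(xi, x_new) for ai, xi in zip(a, X))

# Hypothetical toy data: four items whose preference grows with the feature.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 2.0, 3.0]
a = fit_pairwise_rls(X_train, y_train)
print([round(score(a, X_train, x), 2) for x in X_train])
```

Because L annihilates constant vectors, only score differences are fitted: the learned scores reproduce the ordering of y up to an additive constant, which is all a ranker needs.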
Abstract:
A collection of slides from the author's seminar presentation is given.
Abstract:
Ethernet is becoming the dominant aggregation technology for carrier transport networks; however, because it is a LAN technology, native bridged Ethernet does not fulfill all carrier requirements. One of the schemes proposed by the research community to make Ethernet fulfill carrier requirements is Ethernet VLAN-label switching (ELS). ELS allows the creation of label-switched data paths using a 12-bit label encoded in the VLAN tag control information field. Earlier label switching technologies such as MPLS use more bits to encode the label and hence do not suffer from the label sparsity issues that ELS might. This paper studies the sparsity issues resulting from the reduced ELS VLAN-label space and proposes the use of label merging to improve label space usage. Experimental results show that label merging considerably improves label space usage.
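To make the label-space argument concrete, here is a minimal sketch that counts how many labels each link consumes with and without merging. It assumes destination-based routing, so LSPs heading to the same egress follow the same path once they meet and can share one label per link; the function name, the topology and the LSPs are hypothetical and not taken from the paper's experiments.

```python
from collections import defaultdict

# Assumption: if ELS avoids the VLAN IDs that IEEE 802.1Q reserves (0 and 4095),
# a 12-bit label leaves 4094 usable values per link.
USABLE_VLAN_LABELS = 2 ** 12 - 2

def labels_per_link(lsps, merge):
    """Count labels needed on each directed link.

    lsps: list of paths, each a list of node names from ingress to egress.
    Without merging, every LSP crossing a link consumes its own label there.
    With merging, LSPs toward the same egress share a single label per link.
    """
    usage = defaultdict(set)
    for idx, path in enumerate(lsps):
        egress = path[-1]
        for link in zip(path, path[1:]):
            usage[link].add(egress if merge else idx)
    return {link: len(ids) for link, ids in usage.items()}

# Hypothetical LSPs: two of them converge at C on their way to D.
lsps = [["A", "C", "D"], ["B", "C", "D"], ["A", "C", "E"]]
print(labels_per_link(lsps, merge=False))
print(labels_per_link(lsps, merge=True))
```

On link C→D the two LSPs bound for D need two labels without merging but only one with it; with thousands of LSPs the same effect keeps per-link usage under the 12-bit ceiling for longer.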
Abstract:
It is well known that regression analyses involving compositional data need special attention because the data are not of full rank. For a regression analysis in which both the dependent and the independent variable are components, we propose a transformation of the components that emphasizes their roles as dependent and independent variables. A simple linear regression can then be performed on the transformed components. The regression line can be depicted in a ternary diagram, facilitating the interpretation of the analysis in terms of components. An example with time budgets illustrates the method and its graphical features.
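The abstract does not state the proposed transformation. As a stand-in, the sketch below uses Aitchison's additive log-ratio (alr) transformation, a standard choice for compositional data, to regress one component on another and map the fitted line back to compositions that could be plotted in a ternary diagram. This is not the paper's transformation; the function names and the time-budget shares are hypothetical.

```python
import math

def alr(x1, x2, x3):
    """Additive log-ratio coordinates of a 3-part composition, using x3 as divisor."""
    return math.log(x1 / x3), math.log(x2 / x3)

def inverse_alr(u, v):
    """Map alr coordinates back to a closed composition (parts sum to 1)."""
    parts = [math.exp(u), math.exp(v), 1.0]
    total = sum(parts)
    return [p / total for p in parts]

def ols(xs, ys):
    """Intercept and slope of the simple least-squares line y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical time budgets: (work, leisure, sleep) shares of a day.
budgets = [(0.40, 0.25, 0.35), (0.30, 0.35, 0.35), (0.45, 0.20, 0.35), (0.25, 0.40, 0.35)]
coords = [alr(*c) for c in budgets]
# Regress the leisure coordinate (dependent) on the work coordinate (independent).
a, b = ols([u for u, _ in coords], [v for _, v in coords])
# Points on the fitted line, mapped back to compositions for a ternary plot.
curve = [inverse_alr(u, a + b * u) for u in (-0.5, 0.0, 0.5)]
print(round(b, 3), [[round(p, 3) for p in c] for c in curve])
```

A straight line in log-ratio coordinates generally becomes a curve inside the simplex, which is why back-transforming before plotting matters.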
Abstract:
This paper uses the regression-based inequality decomposition (Fields, 2003) to explore the contribution of different explanatory factors to international inequality in CO₂ emissions per capita. In contrast to previous decompositions of emissions inequality, which were based on identity relationships (Duro and Padilla, 2006), this methodology does not impose any specific a priori relationship and thus allows the contribution of each relevant variable to inequality to be assessed. In short, the paper appraises the relative contributions of affluence, sectoral composition, demographic factors, and climate. The analysis is applied to selected years of the period 1993–2007. The results show an important (though decreasing) contribution of demographic factors, as well as significant contributions of affluence and sectoral composition.
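A minimal sketch of Fields-style factor shares: fit the outcome on all factors by OLS, then credit each factor k with s_k = β_k·cov(x_k, y)/var(y) and the residual with cov(e, y)/var(y), so the shares add up to one. The factor names and the six-country data are hypothetical, and the paper's actual specification (variables, functional form, years) is not reproduced here.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fields_shares(factors, y):
    """Regression-based inequality shares in the style of Fields (2003).

    factors: dict mapping factor name -> values; y: outcome per country.
    Each factor gets beta_k * cov(x_k, y) / var(y); the residual e gets
    cov(e, y) / var(y), which equals 1 - R^2 for an OLS fit.
    """
    names = list(factors)
    S = [[cov(factors[i], factors[j]) for j in names] for i in names]
    c = [cov(factors[k], y) for k in names]
    beta = solve(S, c)  # OLS slopes from the centered normal equations
    var_y = cov(y, y)
    shares = {k: bk * ck / var_y for k, bk, ck in zip(names, beta, c)}
    fitted = [sum(bk * factors[k][i] for k, bk in zip(names, beta)) for i in range(len(y))]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    shares["residual"] = cov(resid, y) / var_y
    return shares

# Hypothetical data for six countries (not from the paper).
factors = {"affluence": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
           "demography": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]}
log_co2 = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]
print(fields_shares(factors, log_co2))
```

Unlike an identity-based decomposition, nothing here fixes how the factors combine; the regression estimates it, which is the flexibility the abstract highlights.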
Abstract:
Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to utilize it creates risks to patient safety and to cost-effective hospital administration. Methods for automated processing of clinical text are emerging. The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for technology development. Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether, five machine learning applications for three practical cases are described. The first two applications are binary classification and regression for the practical case of topic labeling and relevance ranking. The third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding; it is tested with English radiology reports. The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method contributes not only to shorter processing time but also to the diversity and quality of evaluation. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration.
Practical cases are very different, and hence development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.
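The abstract mentions associations between evaluation measures without naming them. As a generic illustration, not the dissertation's own measures or its hold-out method, the sketch below computes micro- and macro-averaged F1 for multi-label output such as diagnosis codes. The two averages can diverge sharply when a rare label is missed; the label names and documents are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 score from counts; defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(gold, pred, labels):
    """gold, pred: lists of label sets, one per document (e.g. diagnosis codes)."""
    counts = {label: [0, 0, 0] for label in labels}  # tp, fp, fn per label
    for g, p in zip(gold, pred):
        for label in labels:
            if label in p and label in g:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
    # Micro: pool counts over labels, then compute one F1.
    micro = f1(*[sum(c[i] for c in counts.values()) for i in range(3)])
    # Macro: compute F1 per label, then average with equal weight.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    return micro, macro

# Hypothetical reports: label "B" is rare and never predicted.
gold = [{"A"}, {"A"}, {"A", "B"}]
pred = [{"A"}, {"A"}, {"A"}]
print(micro_macro_f1(gold, pred, ["A", "B"]))
```

Here micro-F1 is about 0.86 while macro-F1 is 0.5, because macro averaging gives the missed rare label the same weight as the frequent one.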
Abstract:
Breast cancer is the most prevalent neoplasm among women in most countries worldwide. Breast cancer treatment includes mastectomy, which has a strong impact on women. Breast reconstruction is an option for many women to re-establish their body image and to reduce the psychological impact. However, breast reconstruction rates are low, and many factors are involved in not undergoing reconstruction. Patient involvement in the decision-making process increases breast reconstruction rates and is associated with higher satisfaction and fewer anxiety and depression symptoms. A closer physician-patient relationship and more education about breast reconstruction are needed to achieve our objective. A new approach to medical care, called the Patson Approach, has been created to meet this goal through greater patient involvement together with physician and psychological counselling.
Objective: to increase breast reconstruction rates in women who are candidates for breast reconstruction after mastectomy and are included in the Patson Approach, compared with women included in the Standard Approach.
Methods: the study design will be a randomized, controlled, open-label clinical trial. Sixty-two patients will be recruited over two years and randomly divided into two groups: 31 will be included in the Standard Approach and 31 in the Patson Approach. Preoperative and postoperative appointments are scheduled to follow up the patients and collect all the data.
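The protocol summary does not say how the random allocation is generated. One simple way to guarantee the planned 31/31 split is to randomly permute a balanced list of arm labels, as sketched below. The function name, patient IDs and seed are hypothetical, and in a real trial the sequence would be generated and concealed according to the trial protocol.

```python
import random
from collections import Counter

def allocate_one_to_one(patient_ids, seed):
    """Assign patients to two arms by randomly permuting a balanced list of
    arm labels, so each arm receives exactly half of the patients."""
    if len(patient_ids) % 2:
        raise ValueError("an even number of patients is required for 1:1 allocation")
    arms = ["Standard Approach", "Patson Approach"] * (len(patient_ids) // 2)
    random.Random(seed).shuffle(arms)  # seeded so the list can be audited
    return dict(zip(patient_ids, arms))

allocation = allocate_one_to_one([f"P{i:02d}" for i in range(1, 63)], seed=2024)
print(Counter(allocation.values()))
```

Flipping an independent coin for each patient would be simpler, but it could leave the arms unbalanced in a trial this small. Permuting a fixed list trades some unpredictability near the end of recruitment for exact balance.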
Abstract:
Acetylation was performed to reduce the polarity of wood and increase its compatibility with polymer matrices for the production of composites. The reactions were first performed as a function of the acetic acid and anhydride concentrations in a mixture catalyzed by sulfuric acid. A 50%/50% (v/v) mixture of acetic acid and anhydride was found to produce the highest conversion rate between the functional groups. The kinetics were then investigated by varying time and temperature in a 3² factorial design, which showed that time was the most relevant parameter in determining the conversion of hydroxyl into carbonyl groups.
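A 3² factorial design runs every combination of two factors at three levels, giving nine runs. The sketch below generates the runs in coded units and computes per-level marginal means of a response. A wide spread of means across one factor's levels is a first hint that this factor dominates, though a formal conclusion would come from an ANOVA. The function names are hypothetical, and the actual times, temperatures and conversion values are not given in the abstract.

```python
from itertools import product

def three_level_design(levels_a, levels_b):
    """Full 3^2 factorial: every combination of two factors at three levels (9 runs)."""
    if len(levels_a) != 3 or len(levels_b) != 3:
        raise ValueError("each factor needs exactly three levels")
    return list(product(levels_a, levels_b))

def marginal_means(runs, responses, factor_index):
    """Mean response at each level of one factor, averaged over the other factor."""
    groups = {}
    for run, r in zip(runs, responses):
        groups.setdefault(run[factor_index], []).append(r)
    return {level: sum(v) / len(v) for level, v in groups.items()}

# Coded levels (-1, 0, +1) for (time, temperature); real values would replace them.
runs = three_level_design((-1, 0, 1), (-1, 0, 1))
for time_level, temp_level in runs:
    print(f"time={time_level:+d}  temperature={temp_level:+d}")
```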
Abstract:
Analytical curves are normally obtained from discrete data by least-squares regression. When the data involve significant error in both the x and y values, the regression should not be implemented by ordinary least squares (OLS), which treats x as error-free. In this work, orthogonal distance regression (ODR) is discussed as an alternative approach that takes the error in the x variable into account. Four examples are presented to illustrate the deviation between the results of the two regression methods. The examples show that in some situations the ODR coefficients must replace those of OLS, while in others the difference is not significant.
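The difference between the two fits can be seen in a few lines. OLS minimizes vertical residuals only. The straight-line ODR below minimizes perpendicular distances and assumes equal error variances in x and y, which yields a closed-form slope from the principal axis of the data. The function names and data are hypothetical. For unequal or known error variances, a general tool such as SciPy's `scipy.odr` module would be the more appropriate choice.

```python
import math

def ols_line(xs, ys):
    """Ordinary least squares: minimizes vertical (y) residuals only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def odr_line(xs, ys):
    """Orthogonal distance regression for a straight line, assuming equal error
    variances in x and y: the line follows the data's principal axis."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx

# Hypothetical data with scatter in both coordinates.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 1.0, 3.0]
print("OLS:", ols_line(xs, ys))
print("ODR:", odr_line(xs, ys))
```

On these points OLS gives a slope of 0.8 and ODR a slope of 1.0. When errors in x are not negligible, OLS flattens the line, while on error-free collinear data both methods agree exactly. This matches the abstract's finding that the choice sometimes matters and sometimes does not.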
Abstract:
This work focused on the development and validation of an RP-HPLC-UV method for the quantification of beta-lactam antibiotics in three pharmaceutical samples. The active principles analyzed were amoxicillin and ampicillin in three veterinary drugs. The mobile phase comprised a 5 mmol L⁻¹ phosphoric acid solution at pH 2.00 and acetonitrile in gradient elution mode, with detection at 220 nm. The method was validated according to the Brazilian National Health Surveillance regulation, evaluating linear range and linearity, selectivity, precision, accuracy, and ruggedness. Inter-day precision and accuracy for pharmaceutical samples 1, 2, and 3 were 1.43% and 1.43%, 4.71% and 3.74%, and 2.72% and 1.72%, respectively, while the regression coefficients of the analytical curves exceeded 0.99. The method showed acceptable figures of merit, indicating reliable quantification. The active principle concentrations in the analyzed samples deviated by -12% to +21% from the manufacturers' label claims, rendering the medicines unsafe for administration to animals.
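The abstract reports precision, accuracy, curve coefficients and deviations from label claims without giving formulas. The sketch below uses common definitions as assumptions: relative standard deviation for precision, signed relative error for accuracy and label-claim deviation, and the Pearson coefficient for the analytical curve. The exact expressions required by the Brazilian regulation may differ, and the function names and numbers are hypothetical.

```python
import math
import statistics

def rsd_percent(values):
    """Relative standard deviation (%), commonly reported as precision."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

def relative_error_percent(measured, reference):
    """Signed deviation (%) of a measured value from a reference or label claim."""
    return 100 * (measured - reference) / reference

def pearson_r(xs, ys):
    """Correlation coefficient of an analytical curve (concentration vs. response)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A sample assayed at 88 mg against a 100 mg label claim (hypothetical numbers).
print(relative_error_percent(88.0, 100.0))  # -12.0
```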
Abstract:
The increasing demand of consumer markets for the welfare of birds in poultry houses has motivated much scientific research to monitor and classify welfare according to the production environment. Given the complex interactions between the birds and the aviary environment, correct interpretation of bird behavior becomes an important way to estimate their welfare. This study obtained multiple logistic regression models capable of estimating the welfare of broiler breeders in relation to the aviary environment and the behaviors expressed by the birds. In the experiment, several behaviors were observed in breeders housed in a climatic chamber under controlled temperatures and three different air ammonia concentrations, monitored daily. Two logistic regression models were obtained from the data analysis: the first uses a measured value of ammonia concentration, and the second uses a binary classification of ammonia concentration assigned by a person through olfactory perception. The analysis showed that both models classified broiler breeder welfare successfully.
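The abstract reports neither the fitted coefficients nor the full predictor set. The sketch below fits a logistic regression by batch gradient ascent on the log-likelihood. The single predictor is a hypothetical binary "ammonia judged strong by smell" flag, echoing the second model, and the welfare labels are made up. It shows the technique, not the study's model.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, xi):
    """P(label = 1 | xi) with w[0] as the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Fit weights (intercept first) by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)  # gradient of log-likelihood per sample
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Hypothetical records: feature 1 = ammonia judged strong; label 1 = adequate welfare.
ammonia_flag = [[0], [0], [0], [0], [1], [1], [1], [1]]
welfare_ok = [1, 1, 1, 0, 0, 0, 0, 1]
w = fit_logistic(ammonia_flag, welfare_ok)
print(round(predict_proba(w, [0]), 3), round(predict_proba(w, [1]), 3))
```

With a single binary predictor, the fitted probabilities converge to the observed welfare rates in each group (0.75 and 0.25 here). Thresholding them at 0.5 gives the welfare classification.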