948 results for statistical methods
Abstract:
In this work, the reduction reaction of the herbicide paraquat was used to obtain analytical signals with the electrochemical techniques of differential pulse voltammetry, square wave voltammetry and multiple square wave voltammetry. Analytes were prepared with laboratory-purified water and with natural water samples (from the Mogi-Guacu River, SP). The electrochemical techniques were applied to 1.0 mol L⁻¹ Na2SO4 solutions at pH 5.5 containing paraquat at concentrations in the range of 1 to 10 μmol L⁻¹, using a gold ultramicroelectrode. Five replicate experiments were conducted, and in each the mean peak currents obtained at -0.70 V vs. Ag/AgCl yielded excellent linear relationships with pesticide concentration. The slopes of the calibration plots (method sensitivity) were 4.06 × 10⁻³, 1.07 × 10⁻² and 2.95 × 10⁻² A mol⁻¹ L for purified water by differential pulse voltammetry, square wave voltammetry and multiple square wave voltammetry, respectively. For river water samples, the slopes were 2.60 × 10⁻³, 1.06 × 10⁻² and 3.35 × 10⁻² A mol⁻¹ L, respectively, showing only a small interference from the natural matrix components in paraquat determinations. The detection limits for paraquat were calculated by two distinct methodologies, i.e., the one proposed by IUPAC and a statistical method. The values obtained with multiple square wave voltammetry were 0.002 and 0.12 μmol L⁻¹, respectively, for purified-water electrolytes. When the detection limit obtained from the IUPAC recommendation is inserted into the calibration curve equation, the resulting analytical signal (oxidation current) is smaller than the one experimentally observed for the blank solution under the same experimental conditions. This is inconsistent with the definition of the detection limit; thus, the IUPAC methodology requires further discussion. The same conclusion can be drawn from the analysis of the detection limits obtained with the other techniques studied.
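The two detection-limit calculations contrasted above can be made concrete with a short sketch: the IUPAC recommendation takes LOD = 3·s_blank/slope, while a statistical alternative uses the residual standard deviation of the calibration regression, LOD = 3·s_y/x/slope. All concentration and current values below are hypothetical illustrations, not the thesis data.

```python
import numpy as np

# Hypothetical calibration data: paraquat concentration (umol/L) vs peak current (A).
conc = np.array([1, 2, 4, 6, 8, 10], dtype=float)
current = np.array([0.030, 0.061, 0.118, 0.182, 0.240, 0.301]) * 1e-6

# Least-squares calibration line: i = slope * c + intercept
slope, intercept = np.polyfit(conc, current, 1)

# Residual standard deviation of the regression, s_y/x
residuals = current - (slope * conc + intercept)
s_yx = np.sqrt(np.sum(residuals**2) / (len(conc) - 2))

# Hypothetical blank replicate currents (A) measured at -0.70 V vs. Ag/AgCl
blank = np.array([2.1e-9, 1.8e-9, 2.4e-9, 2.0e-9, 1.9e-9])
s_blank = blank.std(ddof=1)

lod_iupac = 3 * s_blank / slope        # IUPAC recommendation: 3*sigma_blank / sensitivity
lod_statistical = 3 * s_yx / slope     # statistical variant: 3*s_y/x / sensitivity

print(f"sensitivity       = {slope:.3e} A L umol^-1")
print(f"LOD (IUPAC)       = {lod_iupac:.3g} umol/L")
print(f"LOD (statistical) = {lod_statistical:.3g} umol/L")
```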
Abstract:
In this paper, we carry out robust modeling and influence diagnostics in Birnbaum-Saunders (BS) regression models. Specifically, we present some aspects related to BS and log-BS distributions and their generalizations from the Student-t distribution, and we develop BS-t regression models, including maximum likelihood estimation based on the EM algorithm and diagnostic tools. In addition, we apply the results to real insurance data, which illustrates the usefulness of the proposed model. Copyright (c) 2011 John Wiley & Sons, Ltd.
Abstract:
Several dosimetric methods have been proposed for estimating the red marrow absorbed dose (RMAD) when radionuclide therapy is planned for differentiated thyroid cancer, although to date there is no consensus as to whether the dose calculation should be based on blood-activity concentration. Our purpose was to compare RMADs derived from methods that require collecting patients' blood samples with those obtained from the OLINDA/EXM software, which precludes this invasive procedure. This retrospective study included 34 patients under treatment for metastatic thyroid disease. A deviation of 10 between RMADs was found when comparing the doses from the most common invasive dosimetric methods with those from OLINDA/EXM. No statistically significant difference between the methods was found, which calls into question the need for invasive procedures when calculating the dose. The use of OLINDA/EXM in clinical routine could reduce data collection, leading to a simultaneous reduction in time and clinical costs, besides avoiding discomfort for the patients involved.
Abstract:
Background: With the development of DNA hybridization microarray technologies, it is now possible to simultaneously assess the expression levels of thousands to tens of thousands of genes. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Due to technical biases, normalization of the intensity levels is a prerequisite to performing further statistical analyses. Therefore, choosing a suitable approach for normalization can be critical and deserves judicious consideration. Results: Here, we considered three commonly used normalization approaches, namely Loess, Splines and Wavelets, and two non-parametric regression methods that have not yet been used for normalization, namely Kernel smoothing and Support Vector Regression. The results were compared using artificial microarray data and benchmark studies. They indicate that Support Vector Regression is the most robust to outliers and that Kernel smoothing is the worst normalization technique, while no practical differences were observed among Loess, Splines and Wavelets. Conclusion: In light of these results, Support Vector Regression is favored for microarray normalization owing to its robustness in estimating the normalization curve.
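As a minimal sketch of what SVR-based normalization looks like in practice, the example below regresses the log-ratio M on the mean log-intensity A and subtracts the fitted curve; the simulated data, the M-A framing and all SVR parameters are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Simulated two-channel microarray intensities with an intensity-dependent dye bias.
A = rng.uniform(6, 14, 2000)                           # mean log2 intensity per spot
M = 0.4 * np.sin(A / 2) + rng.normal(0, 0.3, A.size)   # log2 ratio with a smooth bias

# Fit the normalization curve M = f(A) with epsilon-SVR (RBF kernel);
# the epsilon-insensitive loss makes the fit robust to outlying spots.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma="scale")
svr.fit(A.reshape(-1, 1), M)

# Normalized log-ratios: subtract the estimated intensity-dependent trend.
M_norm = M - svr.predict(A.reshape(-1, 1))
print("spread before:", round(M.std(), 3), " after:", round(M_norm.std(), 3))
```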
Abstract:
Background: Direct smear examination with Ziehl-Neelsen (ZN) staining for the diagnosis of pulmonary tuberculosis (PTB) is cheap and easy to use, but its low sensitivity is a major drawback, particularly in HIV-seropositive patients. As such, new tools for laboratory diagnosis are urgently needed to improve the case detection rate, especially in regions with a high prevalence of TB and HIV. Objective: To evaluate the performance of two in-house PCR (polymerase chain reaction) assays, a PCR dot-blot methodology (PCR dot-blot) and PCR agarose gel electrophoresis (PCR-AG), for the diagnosis of PTB in HIV-seropositive and HIV-seronegative patients. Methods: A prospective study was conducted (from May 2003 to May 2004) in a TB/HIV reference hospital. Sputum specimens from 277 PTB suspects were tested by acid-fast bacilli (AFB) smear, culture and the in-house PCR assays (PCR dot-blot and PCR-AG), and their performances were evaluated. Positive cultures combined with the clinical definition of pulmonary TB were employed as the gold standard. Results: The overall prevalence of PTB was 46% (128/277); in HIV+ patients, the prevalence was 54.0% (40/74). The sensitivity and specificity of PCR dot-blot were 74% (95% CI 66.1%-81.2%) and 85% (95% CI 78.8%-90.3%), and those of PCR-AG were 43% (95% CI 34.5%-51.6%) and 76% (95% CI 69.2%-82.8%), respectively. For HIV-seropositive and HIV-seronegative samples, the sensitivities of PCR dot-blot (72% vs 75%; p = 0.46) and PCR-AG (42% vs 43%; p = 0.54) were similar. Among HIV-seronegative patients and PTB suspects, ROC analysis yielded areas of 0.837 for the AFB smear, 0.926 for culture, 0.801 for PCR dot-blot and 0.599 for PCR-AG. In HIV-seropositive patients, these areas were 0.713, 0.900, 0.789 and 0.595, respectively. Conclusion: The results of this study demonstrate that the in-house PCR dot-blot may be an improvement for ruling out a PTB diagnosis in PTB suspects assisted at hospitals with a high prevalence of TB/HIV.
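For reference, a small sketch of how sensitivity, specificity and their 95% confidence intervals can be derived from a 2x2 table against the gold standard; the counts below and the choice of Wilson score intervals are illustrative assumptions, not the study's actual data or CI method.

```python
from statsmodels.stats.proportion import proportion_confint

def diagnostic_performance(tp, fn, tn, fp):
    """Sensitivity and specificity with 95% Wilson score confidence intervals."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    sens_ci = proportion_confint(tp, tp + fn, alpha=0.05, method="wilson")
    spec_ci = proportion_confint(tn, tn + fp, alpha=0.05, method="wilson")
    return sens, sens_ci, spec, spec_ci

# Illustrative counts for a test evaluated against a culture/clinical gold standard.
sens, sens_ci, spec, spec_ci = diagnostic_performance(tp=95, fn=33, tn=127, fp=22)
print(f"sensitivity = {sens:.1%} (95% CI {sens_ci[0]:.1%}-{sens_ci[1]:.1%})")
print(f"specificity = {spec:.1%} (95% CI {spec_ci[0]:.1%}-{spec_ci[1]:.1%})")
```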
Abstract:
Background: Measurements of hormonal concentrations by immunoassays using a fluorescent tracer substance (Eu3+) are susceptible to the action of chemical agents that may cause alterations in its original structure. Our goal was to verify the effect of two types of anticoagulants on hormone assays performed by fluorometric (FIA) or immunofluorometric (IFMA) methods. Methods: Blood samples were obtained from 30 outpatients and were drawn into EDTA, sodium citrate, and serum separation Vacutainer® blood collection tubes. Samples were analyzed with automated AutoDelfia™ equipment (Perkin Elmer Brazil, Wallac, Finland) for the following hormones: luteinizing hormone (LH), follicle-stimulating hormone (FSH), prolactin (PRL), growth hormone (GH), sex hormone binding globulin (SHBG), thyroid-stimulating hormone (TSH), insulin, C-peptide, total T3, total T4, free T4, estradiol, progesterone, testosterone, and cortisol. Statistical analysis was carried out with the Kruskal-Wallis method and Dunn's test. Results: No significant differences were seen between samples for LH, FSH, PRL and free T4. Results for GH, TSH, insulin, C-peptide, SHBG, total T3, total T4, estradiol, testosterone, cortisol, and progesterone were significantly different between the serum and EDTA-treated sample groups. Differences were also identified between serum and sodium citrate-treated samples in the analyses of TSH, insulin, total T3, estradiol, testosterone and progesterone. Conclusions: We conclude that hormonal analyses carried out by FIA or IFMA are susceptible to the effects of anticoagulants in the collected biological material, and that these effects vary depending on the type of assay.
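A minimal sketch of the statistical comparison named above, a Kruskal-Wallis test followed by Dunn's pairwise test, implemented directly here in a simplified form (no tie correction, no p-value adjustment) on illustrative hormone values; the data and the choice of hormone are invented for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative TSH values (mIU/L) from the same patients in three collection-tube types.
groups = {
    "serum":   np.array([1.8, 2.1, 0.9, 3.2, 1.5, 2.7, 1.1, 2.0]),
    "EDTA":    np.array([1.5, 1.7, 0.7, 2.6, 1.2, 2.2, 0.9, 1.6]),
    "citrate": np.array([1.6, 1.9, 0.8, 2.9, 1.3, 2.4, 1.0, 1.8]),
}

# Global Kruskal-Wallis test across the three anticoagulant conditions.
H, p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis: H = {H:.2f}, p = {p:.3f}")

# Dunn's post hoc test on the pooled ranks (simplified: no tie correction).
values = np.concatenate(list(groups.values()))
labels = np.concatenate([[name] * len(v) for name, v in groups.items()])
ranks = stats.rankdata(values)
N = len(values)
names = list(groups)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        gi, gj = names[i], names[j]
        ni, nj = len(groups[gi]), len(groups[gj])
        diff = ranks[labels == gi].mean() - ranks[labels == gj].mean()
        z = diff / np.sqrt(N * (N + 1) / 12 * (1 / ni + 1 / nj))
        p_ij = 2 * stats.norm.sf(abs(z))
        print(f"Dunn {gi} vs {gj}: z = {z:.2f}, p = {p_ij:.3f}")
```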
Abstract:
While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., one written in an unknown alphabet) is compatible with a natural language and, if so, to which language it could belong. The approach is based on three types of statistical measurements: those obtained from first-order statistics of word properties in a text, those obtained from the topology of complex networks representing texts, and those derived from intermittency concepts in which the text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include the assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort to decipher it. Because we were able to identify statistical measurements that are more dependent on syntax than on semantics, the framework may also serve for text analysis in language-dependent applications.
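A small sketch of the second class of measurements mentioned above, topological metrics of a word co-occurrence network built from a text: assortativity and mean degree are computed with networkx, and "selectivity" is taken here to be a node's strength divided by its degree, which is one common definition and should be read as an assumption rather than the paper's exact formulation.

```python
import re
import networkx as nx

def cooccurrence_network(text, window=2):
    """Build a word co-occurrence network: words within `window` tokens are linked."""
    tokens = re.findall(r"[a-z]+", text.lower())
    g = nx.Graph()
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + window + 1]:
            if u != w:
                weight = g[w][u]["weight"] + 1 if g.has_edge(w, u) else 1
                g.add_edge(w, u, weight=weight)
    return g

text = "in the beginning was the word and the word was with god and the word was god"
g = cooccurrence_network(text)

degree = dict(g.degree())
strength = dict(g.degree(weight="weight"))
selectivity = {w: strength[w] / degree[w] for w in g}   # assumed definition: strength / degree

print("assortativity :", round(nx.degree_assortativity_coefficient(g), 3))
print("mean degree   :", round(sum(degree.values()) / len(degree), 3))
print("top selectivity:", sorted(selectivity, key=selectivity.get, reverse=True)[:3])
```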
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori preprocessing step. Among the learning techniques for dealing with structured data, kernel methods are recognized as having a strong theoretical background and being effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain: the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information to make correct predictions on unseen data; in fact, it tends to produce a discriminating function that behaves like the nearest-neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required both in the learning and in the classification phase. Such complexity can sometimes prevent the application of the kernel in scenarios involving large amounts of data. This thesis proposes three contributions to resolve the above issues of kernels for trees. The first contribution aims at creating kernel functions that adapt to the statistical properties of the dataset, thus reducing sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees with an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matchings between different inputs in the original space. The second contribution is the proposal of a novel kernel function based on the convolution kernel framework. Convolution kernels measure the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. The third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels, and we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
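As a concrete point of reference for the convolution kernel framework discussed above, here is a minimal sketch of the classic subtree-counting recursion in the style of the subset tree kernel (the kernel between two trees is the number of matching fragments, accumulated by a recursive Delta over node pairs). This is the textbook baseline, not the thesis's novel kernel, and the tree representation is a hypothetical label-plus-children structure.

```python
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()

def production(n):
    """A node's production: its label plus the labels of its children."""
    return (n.label, tuple(c.label for c in n.children))

def tree_kernel(t1, t2, lam=0.5):
    """Convolution tree kernel: sum of Delta(n1, n2) over all node pairs.

    Delta counts (downweighted by lam) the tree fragments rooted at n1 and n2
    that match, following a subset-tree-kernel-style recursion."""
    def nodes(t):
        yield t
        for c in t.children:
            yield from nodes(c)

    @lru_cache(maxsize=None)
    def delta(n1, n2):
        if production(n1) != production(n2):
            return 0.0
        if not n1.children:                 # matching leaves
            return lam
        prod = lam
        for c1, c2 in zip(n1.children, n2.children):
            prod *= 1.0 + delta(c1, c2)
        return prod

    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

# Two small hypothetical parse trees sharing the fragment (VP (V saw) (NP her)).
t1 = Node("S", (Node("NP", (Node("he"),)),
                Node("VP", (Node("V", (Node("saw"),)), Node("NP", (Node("her"),))))))
t2 = Node("S", (Node("NP", (Node("she"),)),
                Node("VP", (Node("V", (Node("saw"),)), Node("NP", (Node("her"),))))))
print(tree_kernel(t1, t2))
```

The sparsity issue described in the abstract shows up directly in this recursion: whenever productions differ, Delta is zero, so trees over large label domains rarely share fragments and most kernel values collapse to near zero.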
Abstract:
Hybrid vehicles represent the future for automakers, since they improve fuel economy and reduce pollutant emissions. A key component of the hybrid powertrain is the Energy Storage System, which determines the ability of the vehicle to store and reuse energy. Although electrified Energy Storage Systems (ESS), based on batteries and ultracapacitors, are a proven technology, Alternative Energy Storage Systems (AESS), based on mechanical, hydraulic and pneumatic devices, are gaining interest because they make low-cost mild-hybrid vehicles possible. Currently, most design methodologies in the literature focus on electric ESS and are not suitable for AESS design. In this context, The Ohio State University has developed an Alternative Energy Storage System design methodology. This work focuses on the development of a driving cycle analysis methodology that is a key component of the Alternative Energy Storage System design procedure. The proposed methodology is based on a statistical approach to analyzing driving schedules that represent typical vehicle use. Driving data are broken up into a sequence of power events, namely traction and braking events, and for each of them energy-related and dynamic metrics are calculated. By means of a clustering process and statistical synthesis methods, statistically relevant metrics are determined. These metrics define cycle-representative braking events. By using these events as inputs to the Alternative Energy Storage System design methodology, different system designs are obtained, each characterized by attributes such as system volume and weight. In the last part of the work, the designs are evaluated in simulation by introducing and calculating a metric related to energy conversion efficiency. Finally, the designs are compared in terms of attributes and efficiency values. In order to automate the driving data extraction and synthesis process, a dedicated MATLAB script has been developed. Results show that the driving cycle analysis methodology, based on the statistical approach, makes it possible to extract and synthesize cycle-representative data. The designs based on the statistically relevant cycle metrics are properly sized and have satisfactory efficiency values with respect to expectations. An exception is the design based on the cycle worst-case scenario, corresponding to the approach adopted by conventional electric ESS design methodologies; in this case, a heavy system with poor efficiency is produced. The proposed methodology thus appears to be a valid and consistent support for Alternative Energy Storage System design.
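A minimal sketch of the driving-cycle analysis step described above: a speed trace is split into traction and braking events from the sign of wheel power, per-event energy metrics are computed, and braking events are clustered to obtain representative events. The speed profile, the vehicle parameters and the use of k-means are illustrative assumptions, not the thesis's actual data or clustering procedure (which was implemented in MATLAB).

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative 1 Hz speed trace (m/s) with repeated accelerations and decelerations,
# and a rough longitudinal power model (inertia + aerodynamic + rolling resistance).
dt, mass = 1.0, 1500.0                                   # s, kg (assumed values)
t = np.arange(0, 600, dt)
v = np.clip(15 + 10 * np.sin(2 * np.pi * t / 120) + 3 * np.sin(2 * np.pi * t / 37), 0, None)
a = np.gradient(v, dt)
power = mass * a * v + 0.35 * v**3 + 150 * v             # W

# Split the trace into contiguous power events (traction: power > 0, braking: power < 0).
sign = np.sign(power)
events, start = [], 0
for i in range(1, len(sign)):
    if sign[i] != sign[start]:
        events.append((start, i, sign[start]))
        start = i
events.append((start, len(sign), sign[start]))

# Energy-related and dynamic metrics for each braking event.
braking = []
for s, e, sgn in events:
    if sgn < 0 and e - s >= 3:
        seg = power[s:e]
        braking.append([-seg.sum() * dt,   # recoverable braking energy (J)
                        -seg.min(),        # peak braking power (W)
                        (e - s) * dt])     # event duration (s)
braking = np.array(braking)

# Cluster the braking events; the cluster centres act as cycle-representative events
# that feed the downstream storage-system sizing step.
km = KMeans(n_clusters=min(3, len(braking)), n_init=10, random_state=0).fit(braking)
print(np.round(km.cluster_centers_, 1))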
Abstract:
In this thesis I treat various biophysical questions arising in the context of complexed, "protein-packed" DNA and DNA in confined geometries (as in viruses or toroidal DNA condensates). Using diverse theoretical methods I consider the statistical mechanics as well as the dynamics of DNA under these conditions. In the first part of the thesis (chapter 2) I derive for the first time the single-molecule "equation of state", i.e., the force-extension relation of a looped DNA (Eq. 2.94), using the path integral formalism. Generalizing these results I show that the presence of elastic substructures like loops or deflections caused by anchoring boundary conditions (e.g., at the AFM tip or the mica substrate) gives rise to a significant renormalization of the apparent persistence length extracted from single-molecule experiments (Eqs. 2.39 and 2.98). As I show, the experimentally observed reduction of the apparent persistence length by a factor of 10 or more is naturally explained by this theory. In chapter 3 I theoretically consider the thermal motion of nucleosomes along a DNA template. After an extensive analysis of available experimental data and theoretical modelling of two possible mechanisms, I conclude that the "corkscrew-motion" mechanism most consistently explains this biologically important process. In chapter 4 I demonstrate that DNA spools (architectures in which DNA winds circumferentially on a cylindrical surface, or onto itself) show a remarkable "kinetic inertness" that protects them from tension-induced disruption on experimentally and biologically relevant timescales (cf. Fig. 4.1 and Eq. 4.18). I show that the underlying model establishes a connection between the seemingly unrelated and previously unexplained force peaks in single-molecule nucleosome and DNA-toroid stretching experiments. Finally, in chapter 5 I show that toroidally confined DNA (found in viruses, DNA condensates or sperm chromatin) undergoes a transition to a twisted, highly entangled state provided that the aspect ratio of the underlying torus crosses a certain critical value (cf. Eq. 5.6 and the phase diagram in Fig. 5.4). The presented mechanism could rationalize several experimental mysteries, ranging from entangled and supercoiled toroids released from virus capsids to the unexpectedly short cholesteric pitch in the (toroidally wound) sperm chromatin. I propose that the "topological encapsulation" resulting from our model may have some practical implications for the gene-therapeutic DNA delivery process.
Abstract:
The consumer demand for natural, minimally processed, fresh-like and functional food has led to an increasing interest in emerging technologies. The aim of this PhD project was to study three innovative food processing technologies currently used in the food sector. Ultrasound-assisted freezing, vacuum impregnation and pulsed electric fields were investigated with laboratory-scale systems and semi-industrial pilot plants. Furthermore, analytical and sensory techniques were developed to evaluate the quality of food and vegetable matrices obtained by traditional and emerging processes. Ultrasound was found to be a valuable technique for improving the freezing process of potatoes, anticipating the onset of nucleation, mainly when applied during the supercooling phase. A study of the effects of pulsed electric fields on the phenol and enzymatic profile of melon juice was carried out, and the statistical treatment of the data was performed through a response surface method. Next, flavour enrichment of apple sticks was realized by applying different techniques: atmospheric, vacuum and ultrasound technologies and their combinations. The second section of the thesis deals with the development of analytical methods for the discrimination and quantification of phenol compounds in vegetable matrices, such as chestnut bark extracts and olive mill waste water. The management of waste disposal in the mill sector was approached with the aim of reducing the amount of waste while recovering valuable by-products to be used in different industrial sectors. Finally, the sensory analysis of boiled potatoes was carried out through the development of a quantitative descriptive procedure for the study of Italian and Mexican potato varieties. An update on flavour development in fresh and cooked potatoes was prepared, and a sensory glossary, including general and specific definitions related to organic products, used in the European project Ecropolis, was drafted.
Abstract:
In this work, new tools in atmospheric pollutant sampling and analysis were applied in order to go deeper into source apportionment. The project was developed mainly through the study of atmospheric emission sources in a suburban area influenced by a municipal solid waste incinerator (MSWI), a medium-sized coastal tourist town and a motorway. Two main research lines were followed. Regarding the first line, the potential of PM samplers coupled with a wind-select sensor was assessed. Results showed that they may be a valid support in source apportionment studies; however, meteorological and territorial conditions can strongly affect the results. Moreover, new markers were investigated, with a particular focus on biomass burning processes. OC proved to be a good indicator of the biomass combustion process, as did all of the determined organic compounds. Among the metals, lead and aluminium were well related to biomass combustion. Surprisingly, PM was not enriched in potassium during the bonfire event. The second research line consists of the application of Positive Matrix Factorization (PMF), a new statistical tool in data analysis. This technique was applied to datasets with different time resolutions. PMF applied to atmospheric deposition fluxes identified six main sources affecting the area, and the incinerator's relative contribution appeared to be negligible. PMF analysis was then applied to PM2.5 collected with samplers coupled with a wind-select sensor. The larger number of determined environmental indicators made it possible to obtain more detailed results on the sources affecting the area. Vehicular traffic proved to be the source of greatest concern for the study area; also in this case, the incinerator's relative contribution appeared to be negligible. Finally, the application of PMF analysis to hourly aerosol data demonstrated that the higher the temporal resolution of the data, the closer the source profiles were to the real ones.
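For orientation, a minimal sketch of the factorization underlying PMF: a samples-by-species concentration matrix X is approximated as G·F with non-negative factors, where the rows of F are source profiles and the columns of G are source contributions. True PMF additionally weights each matrix element by its measurement uncertainty; the sketch below uses plain non-negative matrix factorization from scikit-learn as a stand-in, on simulated data, and is not the software or dataset used in the thesis.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Simulate 200 samples of 8 chemical species generated by 3 hidden sources.
n_samples, n_species, n_sources = 200, 8, 3
F_true = rng.dirichlet(np.ones(n_species), size=n_sources)     # source profiles
G_true = rng.gamma(2.0, 1.0, size=(n_samples, n_sources))      # source contributions
X = G_true @ F_true + rng.normal(0, 0.01, (n_samples, n_species)).clip(0)

# Non-negative factorization X ~ G F (PMF would minimize an uncertainty-weighted loss).
model = NMF(n_components=n_sources, init="nndsvda", max_iter=500, random_state=0)
G = model.fit_transform(X)         # estimated contribution of each source per sample
F = model.components_              # estimated source profiles (species signatures)

# Relative contribution of each resolved source to the total measured mass.
rel = (G * F.sum(axis=1)).sum(axis=0)
print(np.round(rel / rel.sum(), 3))
```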
Abstract:
This thesis is concerned with local trigonometric regression methods. The aim was to develop a method for the extraction of cyclical components in time series. The main results are the following. First, a generalization of the filter proposed by Christiano and Fitzgerald is furnished for the smoothing of ARIMA(p,d,q) processes. Second, a local trigonometric filter is built, together with its statistical properties. Third, the convergence properties of the trigonometric estimators are discussed, along with the problem of choosing the order of the model. A large-scale simulation experiment was designed in order to assess the performance of the proposed models and methods. The results show that local trigonometric regression may be a useful tool for periodic time series analysis.
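A minimal sketch of what local trigonometric regression looks like for extracting a cyclical component: at each time point a truncated Fourier basis is fitted by kernel-weighted least squares within a local window. The series, bandwidth, kernel and target periods below are illustrative assumptions, not the thesis's specification.

```python
import numpy as np

def local_trig_regression(y, periods, bandwidth=30):
    """Extract a cyclical component by locally weighted least squares on a
    constant-plus-sine/cosine basis (periods given in number of observations)."""
    n = len(y)
    t = np.arange(n, dtype=float)
    # Design matrix: local level + a sin/cos pair for each target period.
    cols = [np.ones(n)]
    for p in periods:
        om = 2 * np.pi / p
        cols += [np.sin(om * t), np.cos(om * t)]
    X = np.column_stack(cols)

    cycle = np.empty(n)
    for i in range(n):
        w = np.exp(-0.5 * ((t - i) / bandwidth) ** 2)   # Gaussian kernel centred on i
        sw = np.sqrt(w)[:, None]
        beta, *_ = np.linalg.lstsq(sw * X, np.sqrt(w) * y, rcond=None)
        cycle[i] = X[i, 1:] @ beta[1:]                  # keep only the trigonometric part
    return cycle

# Illustrative series: linear trend + 40-period and 12-period cycles + noise.
rng = np.random.default_rng(0)
t = np.arange(300)
y = 0.02 * t + np.sin(2 * np.pi * t / 40) + 0.5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, t.size)
print(np.round(local_trig_regression(y, periods=[40, 12])[:5], 2))
```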
Abstract:
Many developing countries are facing a crisis in water management due to population growth, water scarcity, water contamination and the effects of the world economic crisis. Water distribution systems in developing countries face many challenges in efficient repair and rehabilitation, since information about the water network is very limited, which makes rehabilitation assessment plans very difficult. In developed countries, sufficient information and advanced technology make the assessment for rehabilitation easy. Developing countries have many difficulties in assessing the water network, leading to system failures, deterioration of mains and poor water quality in the network due to pipe corrosion and deterioration. The limited information brings into focus the urgent need to develop economical rehabilitation assessments for water distribution systems adapted to water utilities. The Gaza Strip is the subject of the first case study; it suffers from a severe shortage in water supply, environmental problems and contamination of groundwater resources. This research focuses on improving the water supply network to reduce water losses on the basis of a limited database, using ArcGIS techniques and commercial water network software (WaterCAD). A new approach for the rehabilitation of water pipes is presented in the Gaza city case study. An integrated rehabilitation assessment model has been developed for water pipes, comprising three components: a hydraulic assessment model, a physical assessment model and a structural assessment model. A WaterCAD model integrated with ArcGIS was developed to produce the hydraulic assessment model for the water network. The model was designed around a pipe condition assessment with a maximum of 100 score points per pipe. The results of this model indicate that 40% of the water pipelines score fewer than 50 points and that about 10% of the total pipe length scores fewer than 30 points. Using this model, rehabilitation plans for each region of Gaza city can be drawn up based on the available budget and the condition of the pipes. The second case study, Kuala Lumpur, represents semi-developed countries and was used to develop an approach to improve the water network under critical conditions using advanced statistical and GIS techniques. Kuala Lumpur (KL) has water losses of about 40% and a high failure rate, which constitutes a severe problem, and can represent cases in South Asian countries. Kuala Lumpur has faced big challenges in reducing water losses in its network during the last 5 years. One of these challenges is the high deterioration of asbestos cement (AC) pipes: more than 6500 km of AC pipes need to be replaced, which requires a huge budget. Asbestos cement is subject to deterioration due to various chemical processes that either leach out the cement material or penetrate the concrete to form products that weaken the cement matrix. This case presents a geo-statistical approach for modelling pipe failures in a water distribution network. The database of Syabas Company (the Kuala Lumpur water company) was used in developing the model. The statistical models were calibrated, verified and used to predict failures for both networks and individual pipes. The mathematical formulation developed for failure frequency in Kuala Lumpur was based on different pipeline characteristics, reflecting several factors such as pipe diameter, length, pressure and failure history.
Generalized linear models have been applied to predict pipe failures at the District Meter Zone (DMZ) and individual pipe levels. Based on the Kuala Lumpur case study, several outputs and implications have been achieved. Correlations between spatial and temporal intervals of pipe failures have also been examined using ArcGIS software. A Water Pipe Assessment Model (WPAM) has been developed using the analysis of historical pipe failures in Kuala Lumpur, prioritizing pipe rehabilitation candidates through a ranking system. The Frankfurt water network in Germany is the third main case study. This case gives an overview of survival analysis and neural network methods used for water networks. Rehabilitation strategies for water pipes have been developed for the Frankfurt water network in cooperation with Mainova (the Frankfurt water company). This thesis also presents a methodology for the technical condition assessment of plastic pipes based on simple analysis. The thesis aims to contribute to improving the prediction of pipe failures in water networks using Geographic Information Systems (GIS) and Decision Support Systems (DSS). The output from the technical condition assessment model can be used to estimate future budget needs for rehabilitation and to identify pipes with high priority for replacement based on poor condition.
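A minimal sketch of the kind of generalized linear model described above: a Poisson GLM relating pipe failure counts to diameter, pressure and age, with pipe length entering as an exposure offset. The data frame, coefficients and variable names are simulated for illustration and are not the Syabas/Kuala Lumpur database or the thesis's fitted model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Simulated pipe records (assumed attributes: diameter mm, length m, pressure bar, age years).
pipes = pd.DataFrame({
    "diameter": rng.choice([100, 150, 200, 300], n).astype(float),
    "length":   rng.uniform(50, 500, n),
    "pressure": rng.uniform(2, 8, n),
    "age":      rng.uniform(5, 60, n),
})
rate = np.exp(-6 + 0.03 * pipes.age + 0.12 * pipes.pressure - 0.004 * pipes.diameter)
pipes["failures"] = rng.poisson(rate * pipes.length)

# Poisson GLM: failure count ~ diameter + pressure + age, with log(length) as exposure offset.
model = smf.glm("failures ~ diameter + pressure + age",
                data=pipes,
                family=sm.families.Poisson(),
                offset=np.log(pipes.length)).fit()
print(model.summary().tables[1])

# Predicted failure count for a hypothetical 60-year-old, 150 mm, 200 m pipe at 5 bar.
new = pd.DataFrame({"diameter": [150.0], "pressure": [5.0], "age": [60.0], "length": [200.0]})
print(model.predict(new, offset=np.log(new.length)))
```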
Abstract:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in the form of unstructured text in natural languages, research on text mining is currently very active and addresses practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into previously defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a sizeable training set and notable computational effort. Methods for cross-domain text categorization have been proposed that make it possible to leverage a set of labeled documents from one domain to classify those of another. Most methods use advanced statistical techniques, usually involving the tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between their representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively reusable in a different one.
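A minimal sketch of the nearest centroid adaptation idea described in the first contribution: category profiles (centroids) are built from labeled source-domain documents and then iteratively re-estimated from the target-domain documents they attract. The toy documents, the TF-IDF representation, the cosine similarity and the blending rule are illustrative assumptions, not the thesis's exact algorithm.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy labeled source-domain documents and unlabeled target-domain documents.
source_docs = ["stock markets fell sharply today", "the championship final was thrilling",
               "central bank raises interest rates", "the striker scored two goals"]
source_labels = np.array([0, 1, 0, 1])            # 0 = finance, 1 = sport
target_docs = ["shares and bond yields dropped", "the team won the league title",
               "investors fear higher rates", "a late goal decided the match"]

vec = TfidfVectorizer().fit(source_docs + target_docs)
Xs = vec.transform(source_docs).toarray()
Xt = vec.transform(target_docs).toarray()

# Initial category centroids from the labeled source domain.
centroids = np.vstack([Xs[source_labels == c].mean(axis=0) for c in np.unique(source_labels)])

# Iterative adaptation: assign target documents to the nearest centroid, then rebuild
# each centroid from the target documents it attracted (blended with the previous one).
for _ in range(5):
    pred = cosine_similarity(Xt, centroids).argmax(axis=1)
    for c in range(centroids.shape[0]):
        if np.any(pred == c):
            centroids[c] = 0.5 * centroids[c] + 0.5 * Xt[pred == c].mean(axis=0)

print(pred)   # predicted categories for the target-domain documents
```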