195 results for Naive Bayes classifier
Abstract:
This paper proposes a practical prediction procedure for the vertical displacement of a Rotary-wing Unmanned Aerial Vehicle (RUAV) landing deck in the presence of stochastic sea state disturbances. A time series model that captures the characteristics of the dynamic relationship between an observer and a landing deck is constructed, with model orders determined by a novel principle based on the Bayes Information Criterion (BIC) and coefficients identified using the Forgetting Factor Recursive Least Square (FFRLS) method. In addition, a fast-converging online multi-step predictor is developed, which can be implemented more rapidly than the Auto-Regressive (AR) predictor as it requires fewer memory allocations when updating coefficients. Simulation results demonstrate that the proposed approach exhibits satisfactory prediction performance, making it suitable for integration into ship-helicopter approach and landing guidance systems given the limited computational capacity of the flight computer.
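As a concrete illustration of the identification step, the following is a minimal sketch of a forgetting-factor recursive least squares update for AR coefficients; the variable names, the forgetting factor value, and the AR order are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def ffrls_update(theta, P, phi, y, lam=0.98):
    """One forgetting-factor RLS step for an AR model y_t = phi_t . theta + e_t.

    theta : current coefficient estimate (n,)
    P     : current inverse-correlation matrix (n, n)
    phi   : regressor vector of the n most recent outputs (n,)
    y     : new scalar observation
    lam   : forgetting factor, 0 < lam <= 1 (0.98 is an assumed value)
    """
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)       # gain vector
    err = y - phi @ theta               # a priori prediction error
    theta = theta + k * err             # coefficient update
    P = (P - np.outer(k, Pphi)) / lam   # inverse-correlation update
    return theta, P

def fit_ar_ffrls(z, n=4, lam=0.98):
    """Fit an AR(n) model to a deck-motion series z with FFRLS."""
    z = np.asarray(z, dtype=float)
    theta = np.zeros(n)
    P = 1e4 * np.eye(n)                 # large initial covariance
    for t in range(n, len(z)):
        phi = z[t - n:t][::-1]          # n most recent samples, newest first
        theta, P = ffrls_update(theta, P, phi, z[t], lam)
    return theta
```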
Abstract:
Next Generation Sequencing (NGS) has revolutionised molecular biology, allowing routine clinical sequencing. NGS data consists of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. Here we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from other pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
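The k-mer representation can be reproduced schematically with scikit-learn's character n-gram vectorizer feeding a linear SVM; the value of k, the toy sequences, and the pipeline below are illustrative assumptions, not the paper's settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Character k-mer counts; k = 8 is an assumed choice for illustration.
kmer_counts = CountVectorizer(analyzer="char", ngram_range=(8, 8), lowercase=False)
model = make_pipeline(kmer_counts, LinearSVC(C=1.0))

# Toy data: each "document" is the concatenated reads of one sequencing
# project, labelled 1 for S. aureus and 0 for another pathogen.
X = ["ATGCGTTAGCCGTTAAGCGT" * 5, "TTGACCGGTTACGGATCCAA" * 5]
y = [1, 0]
model.fit(X, y)
print(model.predict(["ATGCGTTAGCCGTTAAGCGT" * 3]))
```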
Abstract:
Travel time prediction has long been a topic of transportation research, but most prediction models in the literature are limited to motorways. Travel time prediction on arterial networks is challenging because of traffic signals and the significant variability of individual vehicle travel times. The limited availability of traffic data from arterial networks makes the problem even more challenging. Recently, there has been significant interest in exploiting Bluetooth data for travel time estimation. This research analysed real travel time data collected by the Brisbane City Council using Bluetooth technology on arterials. Databases, including experienced average daily travel times, were created and classified over approximately eight months. Based on the data characteristics, Seasonal Auto Regressive Integrated Moving Average (SARIMA) modelling was then applied to the database for short-term travel time prediction. The SARIMA model not only takes the previous continuous lags into account, but also uses values from the same time of day on previous days for travel time prediction. This is achieved by defining a seasonality coefficient, which improves the accuracy of travel time prediction in linear models. The accuracy, robustness and transferability of the model are evaluated by comparing real and predicted values at three sites within the Brisbane network. The results contain detailed validation for different prediction horizons (5 to 90 minutes). Model performance is evaluated mainly on congested periods and compared to the naive technique of using the historical average.
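A seasonal ARIMA model with a daily season can be sketched with statsmodels; the (p,d,q)(P,D,Q)s orders, the file name, and the 5-minute resolution below are placeholders rather than the paper's fitted values.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# travel_times: a pandas Series of average travel times at 5-minute resolution
# (the file name and resolution are assumptions for this sketch).
travel_times = pd.read_csv("bluetooth_travel_times.csv",
                           index_col=0, parse_dates=True).squeeze("columns")

# A daily season at 5-minute resolution has 288 periods, so the seasonal terms
# encode "same time on previous days" dependence.
model = SARIMAX(travel_times,
                order=(1, 0, 1),                 # placeholder non-seasonal orders
                seasonal_order=(1, 0, 1, 288))   # placeholder seasonal orders
fit = model.fit(disp=False)

# Multi-horizon forecasts: 5 minutes (1 step) to 90 minutes (18 steps) ahead.
forecast = fit.forecast(steps=18)
```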
Abstract:
Reliability of the performance of biometric identity verification systems remains a significant challenge. Individual biometric samples of the same person (identity class) are not identical at each presentation, and performance degradation arises from intra-class variability and inter-class similarity. These limitations lead to false accepts and false rejects that are interdependent, so it is difficult to reduce the rate of one type of error without increasing the other. The focus of this dissertation is to investigate a method based on classifier fusion techniques to better control the trade-off between the verification errors, using text-dependent speaker verification as the test platform. A sequential classifier fusion architecture that integrates multi-instance and multi-sample fusion schemes is proposed. This fusion method enables a controlled trade-off between false alarms and false rejects. For statistically independent classifier decisions, analytical expressions for each type of verification error are derived from the base classifier performances. As this assumption may not always be valid, these expressions are modified to incorporate the correlation between statistically dependent decisions from clients and impostors. The architecture is empirically evaluated for text-dependent speaker verification using Hidden Markov Model based digit-dependent speaker models in each stage, with multiple attempts for each digit utterance. The trade-off between the verification errors is controlled using two parameters, the number of decision stages (instances) and the number of attempts at each decision stage (samples), fine-tuned on an evaluation/tune set. The statistical validity of the derived expressions for the error estimates is evaluated on test data. The performance of the sequential method is further shown to depend on the order in which digits (instances) are combined and on the nature of the repeated attempts (samples). The false rejection and false acceptance rates of the proposed fusion are estimated using the base classifier performances, the variance in correlation between classifier decisions, and a sequence of classifiers with favourable dependence selected using the 'Sequential Error Ratio' criterion. The error rates are better estimated by incorporating user-dependent information (such as speaker-dependent thresholds and speaker-specific digit combinations) and class-dependent information (such as client-impostor dependent favourable combinations and class-error based threshold estimation). The proposed architecture is desirable in most speaker verification applications, such as remote authentication and telephone and internet shopping. Tuning the parameters (the number of instances and samples) serves both the security and user convenience requirements of speaker-specific verification. The architecture investigated here is applicable to verification using other biometric modalities, such as handwriting, fingerprints and keystrokes.
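For the statistically independent case, the fused error rates of such a sequential architecture follow from elementary probability: an impostor must pass every stage (so false accept rates multiply), while a client is rejected if any stage fails. A minimal sketch under that independence assumption, with an OR rule over repeated attempts at each stage, is given below; the dissertation's corrected expressions for correlated decisions are not reproduced here.

```python
def sequential_error_rates(far, frr, attempts=1):
    """Fused error rates for a sequential (AND-over-stages) verifier with an
    OR rule over repeated attempts at each stage, assuming statistically
    independent decisions.

    far, frr : per-stage base classifier error rates (lists of equal length)
    attempts : number of attempts allowed at each stage
    """
    fused_far, fused_accept = 1.0, 1.0
    for fa, fr in zip(far, frr):
        stage_far = 1 - (1 - fa) ** attempts  # impostor passes if any attempt passes
        stage_tar = 1 - fr ** attempts        # client passes unless all attempts fail
        fused_far *= stage_far                # impostor must pass every stage
        fused_accept *= stage_tar             # client must pass every stage
    return fused_far, 1 - fused_accept

# Example: three digit stages at 5% FAR / 5% FRR each, two attempts per stage.
# More stages lower the fused FAR; more attempts lower the fused FRR.
print(sequential_error_rates([0.05] * 3, [0.05] * 3, attempts=2))
```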
Abstract:
Highly sensitive infrared cameras can produce high-resolution diagnostic images of the temperature and vascular changes of breasts. Wavelet transform based features are suitable for extracting the texture difference information of these images because of their scale-space decomposition. The objective of this study is to investigate the potential of the extracted features in differentiating between breast lesions by comparing the two corresponding pectoral regions of two breast thermograms. The pectoral regions of breasts are important because nearly 50% of all breast cancers are located in this region. In this study, the pectoral region of the left breast is selected, and then the corresponding pectoral region of the right breast is identified. Texture features based on first- and second-order statistics are extracted from wavelet-decomposed images of the pectoral regions of two breast thermograms. Principal component analysis is used to reduce dimension, and an AdaBoost classifier is used to evaluate classification performance. A number of different wavelet features are compared, and it is shown that complex non-separable 2D discrete wavelet transform features perform better than their real separable counterparts.
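A schematic version of the feature pipeline can be assembled with PyWavelets and scikit-learn. Note that PyWavelets provides only real separable transforms, so this sketch uses db2 as a stand-in for the paper's best-performing complex non-separable wavelet, and all data, sizes, and parameter values are placeholders.

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline

def wavelet_texture_features(region, wavelet="db2", level=2):
    """First- and second-order statistics of 2-D DWT detail subbands."""
    feats = []
    coeffs = pywt.wavedec2(region, wavelet, level=level)
    for band in coeffs[1:]:          # detail subbands at each level
        for sub in band:             # horizontal / vertical / diagonal
            c = np.abs(sub).ravel()
            feats += [c.mean(), c.var()]
    return np.array(feats)

# X: one feature vector per pair of pectoral regions; y: lesion labels.
# Random arrays stand in for segmented thermogram regions.
X = np.vstack([wavelet_texture_features(np.random.rand(64, 64)) for _ in range(20)])
y = np.random.randint(0, 2, 20)      # placeholder labels
clf = make_pipeline(PCA(n_components=5), AdaBoostClassifier(n_estimators=50))
clf.fit(X, y)
```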
Abstract:
Despite research conducted elsewhere, little is known to date about land cover dynamics and their impacts on land surface temperature (LST) in the fast-growing mega cities of developing countries. Landsat satellite images of the Dhaka Metropolitan (DMP) area from 1989, 1999, and 2009 were used for analysis. This study first identified patterns of land cover change between these periods and investigated their impacts on LST; second, it applied an artificial neural network to simulate land cover changes for 2019 and 2029; and finally, it estimated the impacts on LST in the respective periods. Simulation results show that if the current trend continues, 56% and 87% of the DMP area will likely experience temperatures of 30°C or higher in 2019 and 2029, respectively. The findings pose a major challenge for urban planners working in similar contexts. However, the technique presented in this paper would help them to quantify the impacts of different scenarios (e.g., vegetation loss to accommodate urban growth) on LST and consequently to devise appropriate policy measures.
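The simulation step can be sketched as a standard neural network that learns transition rules from driver variables at one date to land cover classes at the next; the driver set, network size, and data below are hypothetical stand-ins for the paper's setup.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# One row per pixel: drivers such as distance to roads, distance to existing
# built-up area, elevation, and current land cover class (hypothetical feature
# choices; the paper's exact driver set is not reproduced here).
X = np.random.rand(1000, 4)
y = np.random.randint(0, 3, 1000)   # land cover class at the later date

ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)
ann.fit(X, y)                       # learn, e.g., 1999 -> 2009 transition rules
future = ann.predict(X)             # apply to current drivers to simulate 2019
```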
Abstract:
Purpose: Matrix metalloproteinases (MMPs) degrade extracellular proteins and facilitate tumor growth, invasion, metastasis, and angiogenesis. This trial was undertaken to determine the effect of prinomastat, an inhibitor of selected MMPs, on the survival of patients with advanced non-small-cell lung cancer (NSCLC), when given in combination with gemcitabine-cisplatin chemotherapy. Patients and Methods: Chemotherapy-naive patients were randomly assigned to receive prinomastat 15 mg or placebo twice daily orally continuously, in combination with gemcitabine 1,250 mg/m2 days 1 and 8 plus cisplatin 75 mg/m2 day 1, every 21 days for up to six cycles. The planned sample size was 420 patients. Results: Study results at an interim analysis and lack of efficacy in another phase III trial prompted early closure of this study. There were 362 patients randomized (181 on prinomastat and 181 on placebo). One hundred thirty-four patients had stage IIIB disease with T4 primary tumor, 193 had stage IV disease, and 34 had recurrent disease (one enrolled patient was ineligible with stage IIIA disease). Overall response rates for the two treatment arms were similar (27% for prinomastat versus 26% for placebo; P = .81). There was no difference in overall survival or time to progression; for prinomastat versus placebo patients, the median overall survival times were 11.5 versus 10.8 months (P = .82), 1-year survival rates were 43% versus 38% (P = .45), and progression-free survival times were 6.1 versus 5.5 months (P = .11), respectively. The toxicities of prinomastat were arthralgia, stiffness, and joint swelling. Treatment interruption was required in 38% of prinomastat patients and 12% of placebo patients. Conclusion: Prinomastat does not improve the outcome of chemotherapy in advanced NSCLC. © 2005 by American Society of Clinical Oncology.
Abstract:
INTRODUCTION: Performance status (PS) 2 patients with non-small cell lung cancer (NSCLC) experience more toxicity, lower response rates, and shorter survival times than healthier patients treated with standard chemotherapy. Paclitaxel poliglumex (PPX), a macromolecule drug conjugate of paclitaxel and polyglutamic acid, reduces systemic exposure to peak concentrations of free paclitaxel and may lead to increased concentrations in tumors due to enhanced vascular permeability. METHODS: Chemotherapy-naive PS 2 patients with advanced NSCLC were randomized to receive carboplatin (area under the curve = 6) and either PPX (210 mg/m2 over 10 min without routine steroid premedication) or paclitaxel (225 mg/m2 over 3 h with standard premedication) every 3 weeks. The primary end point was overall survival. RESULTS: A total of 400 patients were enrolled. Alopecia, arthralgias/myalgias, and cardiac events were significantly less frequent with PPX/carboplatin, whereas grade ≥3 neutropenia and grade 3 neuropathy showed a trend toward worsening. There was no significant difference in the incidence of hypersensitivity reactions despite the absence of routine premedication in the PPX arm. Overall survival was similar between treatment arms (hazard ratio, 0.97; log rank p = 0.769). Median survival and 1-year survival rates were 7.9 months and 31% for PPX versus 8 months and 31% for paclitaxel. Disease control rates were 64% and 69% for PPX and paclitaxel, respectively. Time to progression was similar: 3.9 months for PPX/carboplatin versus 4.6 months for paclitaxel/carboplatin (p = 0.210). CONCLUSION: PPX/carboplatin failed to provide superior survival compared with paclitaxel/carboplatin in the first-line treatment of PS 2 patients with NSCLC, but the results with respect to progression-free survival and overall survival were comparable and the PPX regimen was more convenient. © 2008 International Association for the Study of Lung Cancer.
Abstract:
INTRODUCTION In retrospective analyses of patients with nonsquamous non-small-cell lung cancer treated with pemetrexed, low thymidylate synthase (TS) expression is associated with better clinical outcomes. This phase II study explored this association prospectively at the protein and mRNA expression level. METHODS Treatment-naive patients with nonsquamous non-small-cell lung cancer (stage IIIB/IV) received four cycles of first-line chemotherapy with pemetrexed/cisplatin. Nonprogressing patients continued on pemetrexed maintenance until progression or maximum tolerability. TS expression (nucleus/cytoplasm/total) was assessed in diagnostic tissue samples by immunohistochemistry (IHC; H-scores) and quantitative reverse-transcriptase polymerase chain reaction (qRT-PCR). Cox regression was used to assess the association between H-scores and progression-free/overall survival (PFS/OS), with distributions estimated by the Kaplan-Meier method. Maximal χ2 analysis identified optimal cutpoints between low TS- and high TS-expression groups, yielding maximal associations with PFS/OS. RESULTS The study enrolled 70 patients; of these, 43 (61.4%) started maintenance treatment. In the 60 patients with valid H-scores, median (m) PFS was 5.5 (95% confidence interval [CI], 3.9-6.9) months and mOS was 9.6 (95% CI, 7.3-15.7) months. Higher nuclear TS expression was significantly associated with shorter PFS and OS (primary analysis, IHC, PFS: p < 0.0001; hazard ratio per 1-unit increase: 1.015; 95% CI, 1.008-1.021). At the optimal cutpoint of the nuclear H-score (70), mPFS in the low versus high TS-expression groups was 7.1 (5.7-8.3) versus 2.6 (1.3-4.1) months (p = 0.0015; hazard ratio = 0.28; 95% CI, 0.16-0.52; n = 40/20). Trends were similar for cytoplasm H-scores, qRT-PCR, and other clinical endpoints (OS, response, and disease control). CONCLUSIONS The primary endpoint was met; low TS expression was associated with longer PFS. Further randomized studies are needed to explore nuclear TS IHC expression as a potential biomarker of clinical outcomes for pemetrexed treatment in larger patient cohorts. © 2013 by the International Association for the Study of Lung Cancer.
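The cutpoint-selection step can be sketched as a scan over candidate H-score cutpoints that keeps the cutpoint maximising the log-rank chi-square statistic. This is a minimal reading of maximal χ2 analysis under assumed inputs; the trial's candidate grid and any multiplicity handling are not reproduced.

```python
import numpy as np
from lifelines.statistics import logrank_test

def maximal_chi2_cutpoint(h_scores, time, event):
    """Return the H-score cutpoint maximising the log-rank chi-square
    between low- and high-expression survival curves.

    h_scores, time, event : numpy arrays of H-scores, follow-up times,
    and event indicators (1 = progression/death observed).
    """
    best_cut, best_stat = None, -np.inf
    for cut in np.unique(h_scores)[1:-1]:   # interior cutpoints only
        low = h_scores < cut
        res = logrank_test(time[low], time[~low],
                           event_observed_A=event[low],
                           event_observed_B=event[~low])
        if res.test_statistic > best_stat:
            best_cut, best_stat = cut, res.test_statistic
    return best_cut, best_stat
```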
Abstract:
Background: Use of cetuximab, a monoclonal antibody targeting the epidermal growth factor receptor (EGFR), has the potential to increase survival in patients with advanced non-small-cell lung cancer. We therefore compared chemotherapy plus cetuximab with chemotherapy alone in patients with advanced EGFR-positive non-small-cell lung cancer. Methods: In a multinational, multicentre, open-label, phase III trial, chemotherapy-naive patients (≥18 years) with advanced EGFR-expressing, histologically or cytologically proven stage wet IIIB or stage IV non-small-cell lung cancer were randomly assigned in a 1:1 ratio to chemotherapy plus cetuximab or chemotherapy alone. Chemotherapy was cisplatin 80 mg/m2 by intravenous infusion on day 1 and vinorelbine 25 mg/m2 by intravenous infusion on days 1 and 8 of every 3-week cycle, for up to six cycles. Cetuximab, given at a starting dose of 400 mg/m2 by intravenous infusion over 2 h on day 1 and at 250 mg/m2 over 1 h per week from day 8 onwards, was continued after the end of chemotherapy until disease progression or unacceptable toxicity occurred. The primary endpoint was overall survival. Analysis was by intention to treat. This study is registered with ClinicalTrials.gov, number NCT00148798. Findings: Between October 2004 and January 2006, 1125 patients were randomly assigned to chemotherapy plus cetuximab (n=557) or chemotherapy alone (n=568). Patients given chemotherapy plus cetuximab survived longer than those in the chemotherapy-alone group (median 11·3 months vs 10·1 months; hazard ratio for death 0·871 [95% CI 0·762-0·996]; p=0·044). The main cetuximab-related adverse event was acne-like rash (57 [10%] of 548, grade 3). Interpretation: Addition of cetuximab to platinum-based chemotherapy represents a new treatment option for patients with advanced non-small-cell lung cancer. Funding: Merck KGaA. © 2009 Elsevier Ltd. All rights reserved.
Abstract:
The detection and correction of defects remains among the most time-consuming and expensive aspects of software development. Extensive automated testing and code inspections may mitigate their effect, but some code fragments are necessarily more likely to be faulty than others, and automated identification of fault-prone modules helps to focus testing and inspections, thus limiting wasted effort and potentially improving detection rates. However, software metrics data is often extremely noisy, with enormous imbalances in the size of the positive and negative classes. In this work, we present a new approach to predictive modelling of fault proneness in software modules, introducing a new feature representation to overcome some of these issues. This rank sum representation offers improved, or at worst comparable, performance relative to earlier approaches on standard data sets, and readily allows the user to choose an appropriate trade-off between precision and recall to optimise inspection effort for different testing environments. The method is evaluated using the NASA Metrics Data Program (MDP) data sets, and performance is compared with existing studies based on the Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers, and with our own comprehensive evaluation of these methods.
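One plausible reading of the rank sum representation is sketched below: each raw metric is replaced by its rank across all modules, damping the heavy skew typical of metrics data, and the precision/recall trade-off is then exposed through the classifier's decision threshold. The construction, threshold, and data are assumptions for illustration, not the authors' exact method.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.svm import SVC

def rank_sum_features(metrics):
    """Replace each raw software metric by its rank over all modules."""
    return np.column_stack([rankdata(col) for col in metrics.T])

# metrics: rows are modules, columns are code metrics (LOC, complexity, ...).
metrics = np.random.lognormal(size=(100, 5))   # heavily skewed, like MDP data
labels = np.random.randint(0, 2, 100)          # 1 = fault-prone (placeholder)

clf = SVC(probability=True).fit(rank_sum_features(metrics), labels)

# Varying the decision threshold on predicted probabilities trades precision
# against recall to match the available inspection budget.
scores = clf.predict_proba(rank_sum_features(metrics))[:, 1]
flagged = scores > 0.3                          # lower threshold -> higher recall
```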
Abstract:
Hot spot identification (HSID) aims to identify potential sites (roadway segments, intersections, crosswalks, interchanges, ramps, etc.) with disproportionately high crash risk relative to similar sites. An inefficient HSID methodology might result in either identifying a safe site as high risk (a false positive) or a high-risk site as safe (a false negative), and consequently lead to misuse of available public funds, poor investment decisions, and inefficient risk management practice. Current HSID methods suffer from issues such as underreporting of minor injury and property damage only (PDO) crashes, the challenge of accounting for crash severity in the methodology, and the selection of a proper safety performance function to model crash data that is often heavily skewed by a preponderance of zeros. Addressing these challenges, this paper proposes a combination of a PDO equivalency calculation and a quantile regression technique to identify hot spots in a transportation network. In particular, issues related to underreporting and crash severity are tackled by incorporating equivalent PDO crashes, whilst concerns related to the non-count nature of equivalent PDO crashes and the skewness of crash data are addressed by the non-parametric quantile regression technique. The proposed method identifies covariate effects on various quantiles of a population, rather than on the population mean as most methods in practice do, which corresponds more closely with how black spots are identified in practice. The proposed methodology is illustrated using rural road segment data from Korea and compared against the traditional empirical Bayes (EB) method with negative binomial regression. Application of a quantile regression model on equivalent PDO crashes enables identification of a set of high-risk sites that reflect the true safety costs to society, simultaneously reduces the influence of under-reported PDO and minor injury crashes, and overcomes the limitation of the traditional negative binomial model in dealing with a preponderance of zeros or a right-skewed dataset.
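The quantile regression step can be sketched with statsmodels; the file name, column names, covariates, and the 90th percentile below are assumptions for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# df: one row per road segment, with equivalent-PDO crash counts (epdo) and
# covariates such as traffic volume and segment length (assumed column names).
df = pd.read_csv("rural_segments.csv")

# Model a high quantile of equivalent PDO crashes rather than the mean.
model = smf.quantreg("epdo ~ aadt + seg_len", df)
fit90 = model.fit(q=0.90)

# Segments whose observed EPDO exceeds their predicted 90th percentile are
# flagged as candidate hot spots.
hot = df["epdo"] > fit90.predict(df)
```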
Abstract:
Field robots often rely on laser range finders (LRFs) to detect obstacles and navigate autonomously. Despite recent progress in sensing technology and perception algorithms, adverse environmental conditions, such as the presence of smoke, remain a challenging issue for these robots. In this paper, we investigate the possibility of improving laser-based perception applications by anticipating situations in which laser data are affected by smoke, using supervised learning and state-of-the-art visual image quality analysis. We propose to train a k-nearest-neighbour (kNN) classifier to recognise situations where a laser scan is likely to be affected by smoke, based on visual data quality features. This method is evaluated experimentally using a mobile robot equipped with LRFs and a visual camera. The strengths and limitations of the technique are identified and discussed, and we show that the method is beneficial when conservative decisions are the most appropriate.
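A schematic version of the classifier takes a few lines of scikit-learn; the feature count, the value of k, and the data below are placeholders rather than the paper's settings.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Each row holds visual image-quality features (e.g. contrast and sharpness
# measures) computed from the camera frame taken alongside a laser scan;
# the paper's exact feature set is not reproduced here.
X_train = np.random.rand(200, 6)
y_train = np.random.randint(0, 2, 200)   # 1 = scan likely smoke-affected

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# At run time, a conservative robot can discard or down-weight laser scans
# that the classifier flags as smoke-affected.
suspect = knn.predict(np.random.rand(1, 6))[0] == 1
```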
Abstract:
A cell classification algorithm that uses first-, second- and third-order statistics of pixel intensity distributions over pre-defined regions is implemented and evaluated. A cell image is segmented into six regions extending from a boundary layer to an inner circle. First-, second- and third-order statistical features are extracted from histograms of pixel intensities in these regions. The third-order statistical features used are one-dimensional bispectral invariants. A total of 108 features were considered as candidates for AdaBoost-based fusion. The best 10-stage fused classifier was selected for each class, and a decision tree was constructed for the 6-class problem. The classifier is robust, accurate and fast by design.
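The per-region feature extraction can be sketched as follows; note that ordinary skewness is substituted for the paper's one-dimensional bispectral invariants as a simpler third-order stand-in, and the segmentation is a placeholder.

```python
import numpy as np
from scipy.stats import skew

def region_statistics(region):
    """First-, second- and third-order statistics of a region's intensities.

    Skewness stands in here for the paper's bispectral invariants.
    """
    p = region.ravel().astype(float)
    return np.array([p.mean(), p.var(), skew(p)])

# A cell image segmented into 6 concentric regions yields a feature vector
# formed by concatenating per-region statistics.
regions = [np.random.rand(32, 32) for _ in range(6)]   # placeholder segmentation
features = np.concatenate([region_statistics(r) for r in regions])
```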
Abstract:
We present an approach to automatically de-identify health records. In our approach, personal health information is identified using a Conditional Random Fields (CRF) machine learning classifier, a large set of linguistic and lexical features, and pattern matching techniques. The identified personal information is then removed from the reports. De-identification of personal health information is fundamental to the sharing and secondary use of electronic health records, for example for data mining and disease monitoring. The effectiveness of our approach is first evaluated on the 2007 i2b2 Shared Task dataset, a widely adopted dataset for evaluating de-identification techniques. Subsequently, we investigate the robustness of the approach to limited training data, and we study its effectiveness on data of a different type and quality by evaluating the approach on scanned pathology reports from an Australian institution. This data contains optical character recognition errors, as well as linguistic conventions that differ from those in the i2b2 dataset, for example different date formats. The findings suggest that our approach compares well with the best approach from the 2007 i2b2 Shared Task; in addition, the approach is found to be robust to variations in training size, data type and quality, given sufficient training data.
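A minimal sketch of the CRF tagging step using sklearn-crfsuite is shown below; the feature set is deliberately tiny compared with the paper's large linguistic and lexical feature set, and the tokens, labels, and hyperparameters are illustrative assumptions.

```python
import sklearn_crfsuite

def token_features(tokens, i):
    """Minimal per-token features for de-identification tagging."""
    t = tokens[i]
    return {
        "lower": t.lower(),
        "is_title": t.istitle(),
        "is_digit": t.isdigit(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Toy training data: tokens labelled O (keep) or PHI (personal health
# information to be removed).
sents = [["Patient", "John", "Smith", "seen", "on", "12/03/2006"]]
labels = [["O", "PHI", "PHI", "O", "O", "PHI"]]
X = [[token_features(s, i) for i in range(len(s))] for s in sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, labels)
print(crf.predict(X))   # tokens predicted as PHI would then be redacted
```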