776 results for Defect Prediction
Abstract:
Most empirical disciplines promote the reuse and sharing of datasets, as it leads to a greater possibility of replication. While this is increasingly the case in Empirical Software Engineering (ESE), some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of the studies based on biased datasets may be suspect. This issue has raised considerable consternation in the ESE literature in recent years. However, there is a confounding factor of these datasets that has not been examined carefully: size. Biased datasets sample only some of the data that could be sampled, and do so in a biased fashion; but biased samples could be smaller or larger. Smaller datasets in general provide less reliable bases for estimating models, and thus could lead to inferior model performance. In this setting, we ask: what affects performance more, bias or size? We conduct a detailed, large-scale meta-analysis, using simulated datasets sampled with bias from a high-quality dataset which is relatively free of bias. Our results suggest that size always matters just as much as bias direction, and in fact much more than bias direction when considering information-retrieval measures such as AUC and F-score. This indicates that, at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue, and raises further issues to be explored in the future.
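The simulation design described above (biased samples of varying size drawn from one clean dataset) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the record layout, the `flag` field, the weighting scheme, and the function names are all assumptions made for the example.

```python
import random


def biased_sample(records, size, bias_weight, rng):
    """Draw `size` records without replacement.

    Records with a larger `bias_weight(record)` are more likely to be
    picked. Weights must be positive.
    """
    pool = list(records)
    weights = [bias_weight(r) for r in pool]
    chosen = []
    for _ in range(min(size, len(pool))):
        # Pick one index proportionally to the remaining weights.
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(i))
        weights.pop(i)
    return chosen


def sample_grid(records, sizes, bias_strengths, seed=0):
    """Build one biased sample per (size, bias strength) combination.

    `flag` is a hypothetical boolean attribute along which bias is
    introduced; a strength of 1.0 means no bias.
    """
    rng = random.Random(seed)
    grid = {}
    for size in sizes:
        for strength in bias_strengths:
            def weight(r, s=strength):
                return s if r["flag"] else 1.0
            grid[(size, strength)] = biased_sample(records, size, weight, rng)
    return grid
```

A prediction model could then be trained on each cell of the grid and its AUC or F-score compared across sizes and bias strengths, which is the kind of comparison the abstract reports.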
Abstract:
Mining Software Repositories (MSR) is a research area that analyses software repositories in order to derive relevant information for the research and practice of software engineering. The main goal of repository mining is to turn static information from repositories (e.g. a code repository or change request system) into valuable information that supports decision making in software projects. Another research area, Process Mining (PM), aims to find the characteristics of the underlying process of business organizations, supporting process improvement and documentation. Recent works have applied MSR and PM techniques to: (i) investigate the evolution of software projects; (ii) understand the real underlying process of a project; and (iii) create defect prediction models. However, few research works have focused on analyzing the contributions of software developers by means of MSR and PM techniques. In this context, this dissertation proposes two empirical studies that assess the contribution of software developers to an open-source project and a commercial project using those techniques. The contributions of developers are assessed from three different perspectives: (i) buggy commits; (ii) the size of commits; and (iii) the most important bugs. For the open-source project, 12,827 commits and 8,410 bugs were analyzed, while 4,663 commits and 1,898 bugs were analyzed for the commercial project. Our results indicate that, for the open-source project, the developers classified as core developers contributed more buggy commits (although they also contributed the majority of commits), more code to the project (commit size), and more fixes to important bugs, while for the commercial project the results could not indicate statistically significant differences between developer groups.
Abstract:
The availability of a huge amount of source code from code archives and open-source projects opens up the possibility of merging the machine learning, programming languages, and software engineering research fields. This area is often referred to as Big Code, where programming languages take the place of natural languages and different features and patterns of code can be exploited to perform many useful tasks and build supportive tools. Among all the possible applications which can be developed within the area of Big Code, the work presented in this research thesis mainly focuses on two particular tasks: Programming Language Identification (PLI) and Software Defect Prediction (SDP) for source code. Programming language identification is commonly needed in program comprehension and it is usually performed directly by developers. However, at large scale, such as in widely used archives (GitHub, Software Heritage), automation of this task is desirable. To accomplish this aim, the problem is analyzed from different points of view (text and image-based learning approaches) and different models are created, paying particular attention to their scalability. Software defect prediction is a fundamental step in software development for improving quality and assuring the reliability of software products. In the past, defects were searched for by manual inspection or using automatic static and dynamic analyzers. Now, the automation of this task can be tackled using learning approaches that can speed up and improve related procedures. Here, two models have been built and analyzed to detect some of the commonest bugs and errors at different code granularity levels (file and method levels). The data and model architectures used are analyzed and described in detail. Quantitative and qualitative results are reported for both PLI and SDP tasks, and differences and similarities with respect to related works are discussed.
Abstract:
The predictive potential of six selected factors was assessed in 72 patients with primary myelodysplastic syndrome using univariate and multivariate logistic regression analysis of survival at 18 months. The factors were age (above median of 69 years), dysplastic features in the three myeloid bone marrow cell lineages, presence of chromosome defects, all metaphases abnormal, double or complex chromosome defects (C23), and a Bournemouth score of 2, 3, or 4 (B234). In the multivariate approach, B234 and C23 proved to be significantly associated with a reduction in the survival probability. The similarity of the regression coefficients associated with these two factors means that they have about the same weight. Consequently, the model was simplified by counting the number of factors (0, 1, or 2) present in each patient, thus generating a scoring system called the Lausanne-Bournemouth score (LB score). The LB score combines the well-recognized and easy-to-use Bournemouth score (B score) with the chromosome defect complexity, C23 constituting an additional indicator of patient outcome. The predicted risk of death within 18 months calculated from the model is as follows: 7.1% (confidence interval: 1.7-24.8) for patients with an LB score of 0, 60.1% (44.7-73.8) for an LB score of 1, and 96.8% (84.5-99.4) for an LB score of 2. The scoring system presented here has several interesting features. The LB score may improve the predictive value of the B score, as it is able to recognize two prognostic groups in the intermediate risk category of patients with B scores of 2 or 3. It also has the ability to identify two distinct prognostic subclasses among RAEB and possibly CMML patients. In addition to its above-described usefulness in the prognostic evaluation, the LB score may bring new insights into the understanding of evolution patterns in MDS.
We used the combination of the B score and chromosome complexity to define four classes which may be considered four possible states of myelodysplasia and which describe two distinct evolutional pathways.
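The LB score described above is a count of two binary factors, so it can be sketched in a few lines. The function and constant names below are illustrative assumptions; the factor definitions (B234, C23) and the predicted risks are taken from the abstract.

```python
# Predicted risk of death within 18 months (percent), as quoted in the abstract.
PREDICTED_RISK_PERCENT = {0: 7.1, 1: 60.1, 2: 96.8}


def lb_score(bournemouth_score, has_double_or_complex_defects):
    """Count how many of the two factors B234 and C23 are present.

    B234: Bournemouth score of 2, 3, or 4.
    C23:  double or complex chromosome defects.
    """
    b234 = bournemouth_score in (2, 3, 4)
    c23 = bool(has_double_or_complex_defects)
    return int(b234) + int(c23)


# Example: a patient with a B score of 3 and no complex chromosome defects
# has an LB score of 1.
example_risk = PREDICTED_RISK_PERCENT[lb_score(3, False)]
```

Because the two regression coefficients were reported as similar, an unweighted count is enough here; a model with clearly different coefficients would need a weighted sum instead.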
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Objectives: To evaluate the accuracy and probabilities of different fetal ultrasound parameters to predict neonatal outcome in isolated congenital diaphragmatic hernia (CDH). Methods: Between January 2004 and December 2010, we evaluated prospectively 108 fetuses with isolated CDH (82 left-sided and 26 right-sided). The following parameters were evaluated: gestational age at diagnosis, side of the diaphragmatic defect, presence of polyhydramnios, presence of liver herniated into the fetal thorax (liver-up), lung-to-head ratio (LHR) and observed/expected LHR (o/e-LHR), observed/expected contralateral and total fetal lung volume (o/e-ContFLV and o/e-TotFLV) ratios, ultrasonographic fetal lung volume/fetal weight ratio (US-FLW), observed/expected contralateral and main pulmonary artery diameter (o/e-ContPA and o/e-MPA) ratios and the contralateral vascularization index (Cont-VI). The outcomes were neonatal death and severe postnatal pulmonary arterial hypertension (PAH). Results: Neonatal mortality was 64.8% (70/108). Severe PAH was diagnosed in 68 (63.0%) cases, of which 63 died neonatally (92.6%) (P < 0.001). Gestational age at diagnosis, side of the defect and polyhydramnios were not associated with poor outcome (P > 0.05). LHR, o/e-LHR, liver-up, o/e-ContFLV, o/e-TotFLV, US-FLW, o/e-ContPA, o/e-MPA and Cont-VI were associated with both neonatal death and severe postnatal PAH (P < 0.001). Receiver-operating characteristics curves indicated that measuring total lung volumes (o/e-TotFLV and US-FLW) was more accurate than was considering only the contralateral lung sizes (LHR, o/e-LHR and o/e-ContFLV; P < 0.05), and Cont-VI was the most accurate ultrasound parameter to predict neonatal death and severe PAH (P < 0.001). Conclusions: Evaluating total lung volumes is more accurate than is measuring only the contralateral lung size. Evaluating pulmonary vascularization (Cont-VI) is the most accurate predictor of neonatal outcome.
Estimating the probability of survival and severe PAH allows classification of cases according to prognosis. Copyright (C) 2011 ISUOG. Published by John Wiley & Sons, Ltd.
Abstract:
An unusual presentation of a focal osteoporotic bone marrow defect (FOBMD) of the mandible mimicking a cystic lesion is documented. A definitive diagnosis could be established only on the basis of the histopathologic evaluation. A 66-year-old Brazilian woman was referred by her dentist for well-defined radiolucency of the mandibular molar region suggesting a cystic lesion of odontogenic origin. The computed tomography scan confirmed that the lesion did not affect the corticals. The biopsy confirmed the diagnosis of FOBMD. The diagnostic difficulty in the current case is obvious, because FOBMD, usually exhibiting an ill-defined radiolucency, is seldom suspected preoperatively when a differential diagnosis is considered for focal well-defined radiolucent areas in the jaws.
Abstract:
New DNA-based predictive tests for physical characteristics and inference of ancestry are highly informative tools that are being increasingly used in forensic genetic analysis. Two eye colour prediction models, a Bayesian classifier (Snipper) and a multinomial logistic regression (MLR) system for the Irisplex assay, have been described for the analysis of unadmixed European populations. Since multiple SNPs in combination contribute in varying degrees to eye colour predictability in Europeans, it is likely that these predictive tests will perform in different ways amongst admixed populations that have European co-ancestry, compared to unadmixed Europeans. In this study we examined 99 individuals from two admixed South American populations, comparing eye colour versus ancestry in order to reveal a direct correlation of light eye colour phenotypes with European co-ancestry in admixed individuals. Additionally, six eye colour prediction models, using varying numbers of SNPs and based on Snipper and MLR, were applied to the study populations. Furthermore, patterns of eye colour prediction have been inferred for a set of publicly available admixed and globally distributed populations from the HGDP-CEPH panel and 1000 Genomes databases, with a special emphasis on admixed American populations similar to those of the study samples.
Abstract:
Negative-ion mode electrospray ionization, ESI(−), with Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was coupled to Partial Least Squares (PLS) regression and variable selection methods to estimate the total acid number (TAN) of Brazilian crude oil samples. Generally, ESI(−)-FT-ICR mass spectra present a resolving power of ca. 500,000 and a mass accuracy of less than 1 ppm, producing a data matrix containing over 5700 variables per sample. These variables correspond to heteroatom-containing species detected as deprotonated molecules, [M − H]⁻ ions, which are identified primarily as naphthenic acids, phenols and carbazole analog species. The TAN values for all samples ranged from 0.06 to 3.61 mg of KOH g⁻¹. To facilitate the spectral interpretation, three methods of variable selection were studied: variable importance in the projection (VIP), interval partial least squares (iPLS) and elimination of uninformative variables (UVE). The UVE method seems to be more appropriate for selecting important variables, reducing the number of variables to 183 and producing a root mean square error of prediction of 0.32 mg of KOH g⁻¹. By reducing the size of the data, it was possible to relate the selected variables to their corresponding molecular formulas, thus identifying the main chemical species responsible for the TAN values.
Abstract:
To evaluate the correlation between neck circumference and insulin resistance and components of metabolic syndrome in adolescents with different adiposity levels and pubertal stages, as well as to determine the usefulness of neck circumference to predict insulin resistance in adolescents. Cross-sectional study with 388 adolescents of both genders from ten to 19 years old. The adolescents underwent anthropometric and body composition assessment, including neck and waist circumferences, and biochemical evaluation. The pubertal stage was obtained by self-assessment, and the blood pressure by auscultation. Insulin resistance was evaluated by the Homeostasis Model Assessment-Insulin Resistance. The correlation between two variables was evaluated by the partial correlation coefficient adjusted for the percentage of body fat and pubertal stage. The performance of neck circumference in identifying insulin resistance was tested by the Receiver Operating Characteristic curve. After the adjustment for percentage body fat and pubertal stage, neck circumference correlated with waist circumference, blood pressure, triglycerides and markers of insulin resistance in both genders. The results showed that neck circumference is a useful tool for the detection of insulin resistance and changes in the indicators of metabolic syndrome in adolescents. The ease of application and low cost of this measure may allow its use in Public Health services.
Abstract:
PURPOSE: The aim of this study was to reproduce an alveolar bone defect model in Wistar rats to be used for testing the efficacy of stem cell therapies. Additionally, we aimed to characterize the osteogenesis process of this osseous defect in the 1-month period post-surgery. METHODS: The animals were randomly divided into two groups of 7 animals each. A gingivobuccal incision was made, and a bone defect with an area of 28 mm² was created in the alveolar region. Animals were killed at 2 weeks after surgery (n=7) and 4 weeks after surgery (n=7). RESULTS: The average area of the alveolar defect at 2 weeks was 22.27 ± 1.31 mm² and the average area of the alveolar defect at 4 weeks was 9.03 ± 1.17 mm². The average amount of bone formation at 2 weeks was 5.73 ± 1.31 mm² and the average amount of bone formation at 4 weeks was 19 ± 1.17 mm². Statistically significant differences between the amount of bone formation at 2 weeks and 4 weeks after surgery were seen (p=0.003). CONCLUSION: The highest rate of ossification occurred from 2 to 4 weeks after surgery. This observation suggests that 4 weeks after the bone defect creation should be a satisfactory time point to assess the potential of bone inductive stem cells to accelerate bone regeneration in Wistar rats.
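The reported bone formation values are consistent with subtracting the remaining defect area from the initial 28 mm² defect. The short check below assumes that this is how they were obtained, which the abstract does not state explicitly; the function name is illustrative.

```python
INITIAL_DEFECT_AREA_MM2 = 28.0  # defect area created at surgery, from the abstract


def bone_formation_mm2(remaining_defect_area_mm2,
                       initial_area_mm2=INITIAL_DEFECT_AREA_MM2):
    """Assumed relation: new bone area = initial defect area - remaining defect area."""
    return initial_area_mm2 - remaining_defect_area_mm2


formation_2_weeks = bone_formation_mm2(22.27)  # about 5.73 mm², as reported
formation_4_weeks = bone_formation_mm2(9.03)   # about 18.97 mm², reported as 19
```

Under this assumption roughly 13.2 mm² of the new bone appears between weeks 2 and 4, compared with about 5.7 mm² in the first two weeks, which matches the conclusion that ossification is fastest in the second half of the month.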
Abstract:
A series of nine new [3-(disubstituted-phosphate)-4,4,4-trifluoro-butyl]-carbamic acid ethyl esters (phosphate-carbamate compounds) was obtained through the reaction of (4,4,4-trifluoro-3-hydroxybut-1-yl)-carbamic acid ethyl esters with phosphorus oxychloride followed by the addition of alcohols. The products were characterized by ¹H, ¹³C, ³¹P, and ¹⁹F NMR spectroscopy, GC-MS, and elemental analysis. All the synthesized compounds were screened for acetylcholinesterase (AChE) inhibitory activity using the Ellman method. All compounds containing phosphate and carbamate pharmacophores in their structures showed enzyme inhibition, with the compound bearing the diethoxy phosphate group (2b) being the most active. Molecular modeling studies were performed to investigate the detailed interactions between the AChE active site and small-molecule inhibitor candidates, providing valuable structural insights into AChE inhibition.
Abstract:
PURPOSE: The ability to predict and understand which biomechanical properties of the cornea are responsible for the stability or progression of keratoconus may be an important clinical and surgical tool for the eye-care professional. We have developed a finite element model of the cornea that attempts to predict keratoconus-like behavior and its evolution based on material properties of the corneal tissue. METHODS: Corneal material properties were modeled using bibliographic data and corneal topography was based on literature values from a schematic eye model. Commercial software was used to simulate mechanical and surface properties when the cornea was subject to different local parameters, such as elasticity. RESULTS: The simulation has shown that, depending on the initial corneal surface shape, changes in local material properties and different intraocular pressure values induce a localized protuberance and an increase in curvature when compared to the remaining portion of the cornea. CONCLUSIONS: This technique provides a quantitative and accurate approach to the problem of understanding the biomechanical nature of keratoconus. The implemented model has shown that changes in local material properties of the cornea and intraocular pressure are intrinsically related to keratoconus pathology and its shape/curvature.
Abstract:
A new criterion has recently been proposed that combines the topological instability (λ criterion) and the average electronegativity difference (Δe) among the elements of an alloy to predict and select new glass-forming compositions. In the present work, this criterion (λ·Δe) is applied to the Al-Ni-La and Al-Ni-Gd ternary systems and its predictability is validated using literature data for both systems and, additionally, using our own experimental data for the Al-La-Ni system. The compositions with a high λ·Δe value found in each ternary system exhibit a very good correlation with the glass-forming ability of different alloys as indicated by their supercooled liquid regions (ΔTx) and their critical casting thicknesses. In the case of the Al-La-Ni system, the alloy with the largest λ·Δe value, La56Al26.5Ni17.5, exhibits the highest glass-forming ability verified for this system. Therefore, the combined λ·Δe criterion is a simple and efficient tool to select new glass-forming compositions in Al-Ni-RE systems. (C) 2011 American Institute of Physics. [doi: 10.1063/1.3563099]
Abstract:
Identification, prediction, and control of a system are engineering subjects, regardless of the nature of the system. Here, the temporal evolution of the number of individuals with dengue fever recorded weekly in the city of Rio de Janeiro, Brazil, during 2007 is used to identify SIS (susceptible-infective-susceptible) and SIR (susceptible-infective-removed) models formulated in terms of cellular automata (CA). In the identification process, a genetic algorithm (GA) is used to find the probabilities of the state transition S -> I capable of reproducing in the CA lattice the historical series of 2007. These probabilities depend on the number of infective neighbors. Time-varying and non-time-varying probabilities, three different lattice sizes, and two kinds of coupling topology among the cells are taken into consideration. Then, these epidemiological models built by combining CA and GA are employed for predicting the number of sick persons in 2008. Such models can be useful for forecasting and controlling the spread of this infectious disease.
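A single update step of an SIS cellular automaton of the kind described above might look like the sketch below. Several details are assumptions made for illustration and are not taken from the paper: a square lattice with periodic boundaries, a four-cell (von Neumann) neighborhood, a fixed recovery probability, and hand-set infection probabilities in place of the values a genetic algorithm would search for.

```python
import random

S, I = 0, 1  # susceptible, infective
NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed von Neumann neighborhood


def step(lattice, p_infect, p_recover, rng):
    """Advance the lattice by one time step.

    p_infect[k] is the probability of S -> I for a susceptible cell with
    k infective neighbors (k = 0..4); p_recover is the probability of I -> S.
    All cells are updated synchronously from the old lattice.
    """
    n = len(lattice)
    new = [row[:] for row in lattice]
    for r in range(n):
        for c in range(n):
            if lattice[r][c] == S:
                k = sum(
                    lattice[(r + dr) % n][(c + dc) % n] == I
                    for dr, dc in NEIGHBORS
                )
                if rng.random() < p_infect[k]:
                    new[r][c] = I
            elif rng.random() < p_recover:
                new[r][c] = S
    return new


def count_infective(lattice):
    """Number of infective cells, comparable to a weekly case count."""
    return sum(cell == I for row in lattice for cell in row)
```

In an identification setting, a search procedure such as a GA would score candidate `p_infect` vectors by how closely the simulated series of `count_infective` values tracks the recorded weekly cases.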