32 results for random forest regression
at Instituto Politécnico do Porto, Portugal
Abstract:
The number of decision support methods and computer-aided diagnosis systems applied to various areas of medicine is greater than ever. In breast cancer research, many studies have sought to reduce false positives when such systems are used as a double-reading method. In this study, we present a set of data mining techniques applied to build a decision support system for breast cancer diagnosis. The method is intended to assist clinical practice in identifying mammographic findings such as microcalcifications, masses and normal tissue, in order to avoid misdiagnosis. A reliable database of 410 images from about 115 patients was used, containing prior radiologist reviews that labeled the findings as microcalcifications, masses or normal tissue. Two feature extraction techniques were used: the gray level co-occurrence matrix and the gray level run length matrix. For classification, we considered several scenarios, according to different patterns of lesions, and several classifiers, in order to identify the best performance in each case. The classifiers used were Naïve Bayes, Support Vector Machines, k-Nearest Neighbors and Decision Trees (J48 and Random Forests). The results in distinguishing mammographic findings showed high positive predictive values (PPV) and very good accuracy. Results for the classification of breast density and of the BI-RADS® scale are also presented. The best predictive method for all tested groups was the Random Forest classifier, and the best performance was achieved in distinguishing microcalcifications. The conclusions drawn from the tested scenarios represent a new perspective on breast cancer diagnosis using data mining techniques.
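The gray level co-occurrence matrix mentioned above counts how often pairs of gray levels occur at a fixed pixel offset. The sketch below, in plain Python, builds the matrix for a single offset and derives one Haralick-style feature (contrast) from it. The function names, the (dx, dy) offset convention and the choice of contrast are illustrative assumptions. The abstract does not state which offsets, gray-level quantization or texture features were used. In practice a library such as scikit-image (`skimage.feature.graycomatrix` and `graycoprops`) would normally be used instead.

```python
def glcm(image, dx, dy, levels):
    """Count co-occurrences of gray levels (i, j) for pixel pairs offset by (dx, dy).

    image: list of rows of integer gray levels in range(levels).
    dx shifts along columns, dy along rows (an assumed convention).
    Returns a levels x levels matrix of raw counts (not symmetric, not normalized).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Only count pairs whose neighbor lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Haralick contrast: sum of p(i, j) * (i - j)^2 over the normalized matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(
        count * (i - j) ** 2
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    ) / total
```

For example, the 2-by-3 image [[0, 0, 1], [1, 2, 2]] with a one-pixel horizontal offset yields four counted pairs and a contrast of 0.5. In a full pipeline, features like this, computed for several offsets, would form the input vector for a classifier.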
Abstract:
Beyond classical statistical approaches (basic descriptive statistics, regression analysis, ANOVA, etc.), a new set of statistical techniques has become increasingly relevant in the analysis, processing and interpretation of data on the characteristics of forest soils, as can be seen in several recent publications in the field of Multivariate Statistics. These new methods require additional care that is not always included or referred to in some approaches. In geostatistical applications in particular, besides georeferencing all data acquisition, samples must be collected on regular grids and in sufficient quantity for the variograms to represent the spatial distribution of soil properties. Most Multivariate Statistics techniques (Principal Component Analysis, Correspondence Analysis, Cluster Analysis, etc.) do not, in most cases, require the assumption of a normal distribution, but they still need a proper and rigorous strategy for their use. This work presents some reflections on these methodologies, in particular on the main constraints that often arise during data collection and on the various ways in which these techniques can be linked. Finally, some particular cases illustrating the application of these statistical methods are presented.
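The need for regular grids and enough samples shows up in the classical (Matheron) empirical semivariogram, γ(h) = 1/(2N(h)) Σ [z(x_i) − z(x_i + h)]², where N(h) is the number of sample pairs separated by lag h. The minimal sketch below assumes a one-dimensional transect with equal spacing, a hypothetical simplification of a 2-D field grid. It shows that N(h) shrinks as the lag grows, so long lags rest on few pairs.

```python
def empirical_semivariogram(values, spacing, max_lag):
    """Matheron estimator on a regular 1-D transect.

    values: measurements z(x_i) at equally spaced points.
    spacing: distance between consecutive sampling points.
    Returns {lag distance h: gamma(h)} for lags of 1..max_lag steps.
    """
    gamma = {}
    for k in range(1, max_lag + 1):
        # Squared differences for all pairs exactly k steps apart: N(h) = len(sq_diffs).
        sq_diffs = [(values[i + k] - values[i]) ** 2 for i in range(len(values) - k)]
        if sq_diffs:  # lags with no available pairs are left out
            gamma[k * spacing] = sum(sq_diffs) / (2 * len(sq_diffs))
    return gamma
```

Take the linear trend 1, 2, 3, 4, 5 at a hypothetical 10 m spacing. Here γ is 0.5 at 10 m and 2.0 at 20 m. The 40 m lag rests on a single pair, and the 50 m lag has no pairs at all.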
Abstract:
Long-term contractual decisions are the basis of efficient risk management. However, such decisions must be supported by a robust price forecasting methodology. This paper reports a different approach to long-term price forecasting that attempts to address that need. Using regression models, the main objective of the proposed methodology is to find the maximum and minimum Market Clearing Price (MCP) for a specific programming period, with a desired confidence level α. Because of the complexity of the problem, the Particle Swarm Optimization (PSO) metaheuristic was used to find the best regression parameters, and the results were compared with those obtained using a Genetic Algorithm (GA). To validate these models, results from realistic data are presented and discussed in detail.
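The abstract does not give the regression model or the PSO settings. The sketch below therefore only illustrates the general idea of using PSO to search for regression parameters: a global-best swarm minimizes the sum of squared errors of a straight line y = a·x + b. The linear model, the objective, the swarm size and the coefficient values (commonly cited inertia and acceleration constants) are all assumptions. The step that turns the fitted model into maximum and minimum MCP bounds at confidence level α is not reproduced.

```python
import random


def pso_fit_line(xs, ys, n_particles=30, n_iter=300,
                 w=0.7298, c1=1.49618, c2=1.49618, bound=10.0, seed=0):
    """Fit y = a*x + b by minimizing the sum of squared errors with global-best PSO."""
    rng = random.Random(seed)

    def sse(params):
        a, b = params
        return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

    dim = 2
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # best position found by each particle
    pbest_val = [sse(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position found by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = sse(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

On noise-free data generated from y = 2x + 1, the swarm should settle close to a = 2, b = 1. A Genetic Algorithm could be compared on the same objective by swapping only the search routine.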
Abstract:
A multi-residue methodology based on solid phase extraction followed by gas chromatography–tandem mass spectrometry was developed for the trace analysis of 32 compounds in water matrices, including estrogens and several pesticides from different chemical families, some of them with endocrine disrupting properties. Matrix-matched calibration solutions were prepared by adding known amounts of the analytes to a residue-free sample, to compensate for the matrix-induced chromatographic response enhancement observed for certain pesticides. Validation followed mainly the International Conference on Harmonisation recommendations, as well as some European and American validation guidelines with specifications for pesticide analysis and/or GC–MS methodology. Because the assumption of homoscedasticity was not met for the analytical data, weighted least squares linear regression was applied as a simple and effective way to counteract the greater influence of higher concentrations on the fitted regression line, improving accuracy at the lower end of the calibration curve. The method was considered validated for 31 compounds after consistent evaluation of the key analytical parameters: specificity, linearity, limits of detection and quantification, range, precision, accuracy, extraction efficiency, stability and robustness.
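A weighted least squares line minimizes Σ w_i (y_i − slope·x_i − intercept)², so points with small weights pull less on the fit. The sketch below uses weights w_i = 1/x_i², a common choice for heteroscedastic calibration data in which the variance grows with concentration. The abstract does not say which weighting factor was selected, so this choice (and the function name) is an assumption.

```python
def weighted_linear_fit(xs, ys, weights=None):
    """Weighted least squares fit of y = slope * x + intercept.

    If no weights are given, 1/x^2 weights are used (an assumed, commonly used
    choice for calibration curves); this requires every x to be positive.
    """
    if weights is None:
        if any(x <= 0 for x in xs):
            raise ValueError("1/x^2 weights need strictly positive concentrations")
        weights = [1.0 / x ** 2 for x in xs]
    # Weighted sums that appear in the closed-form normal equations.
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return slope, intercept
```

Suppose the highest standard deviates from the line. The 1/x² fit then stays close to the low-concentration points, whereas the unweighted fit (all weights equal to 1) is pulled toward the high-concentration point and misses the lowest standard by more.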
Abstract:
Every year European citizens become victims of devastating fires, which are especially disastrous for Southern European countries. Apart from numerous health and economic consequences, fires generate hazardous pollutants that are released into the environment, posing serious risks to public health. In this regard, particulate matter (PM) is a major concern. The objectives of this work were therefore to characterize the trend in forest fire occurrences and burnt area between 2005 and 2010 and to study the influence of forest fires on levels of particulate matter (PM10 and PM2.5). In 2010, 22,026 forest fires occurred in Portugal. The northern region was the most affected, with 27% of occurrences in the Oporto district. The annual mean concentrations of PM10 and PM2.5 at two urban background sites were 25±14 μg m−3 and 8.2±4.9 μg m−3, and 17±13 μg m−3 and 7.3±5.9 μg m−3, respectively. At both sites, the highest levels of PM fractions were observed in July and August 2010, corresponding to the period when most (66%) forest fires occurred. Furthermore, the PM10 daily limit was exceeded on 20 and 5 days at the two sites, respectively; 56% and 60% of those exceedances, respectively, occurred during the forest fire season. Considering that the risks of forest fire ignition and severity increase with elevated temperatures, climate change might increase the environmental impacts of forest fires.
Abstract:
This paper presents a biased random-key genetic algorithm for the resource constrained project scheduling problem. The chromosome representation of the problem is based on random keys. Active schedules are constructed using a priority-rule heuristic in which the priorities of the activities are defined by the genetic algorithm. A forward-backward improvement procedure is applied to all solutions. The chromosomes supplied by the genetic algorithm are adjusted to reflect the solutions obtained by the improvement procedure. The heuristic is tested on a set of standard problems taken from the literature and compared with other approaches. The computational results validate the effectiveness of the proposed algorithm.
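In a biased random-key genetic algorithm, chromosomes are vectors of keys in [0, 1). Each offspring combines one elite parent with one non-elite parent, inheriting each key from the elite parent with a probability ρ_e greater than 0.5. The sketch below shows one generation for a minimization problem. The population fractions and ρ_e are typical illustrative values, not the paper's settings. The decoder, the forward-backward improvement and the chromosome adjustment step are left out, and a toy fitness stands in for the decoded makespan.

```python
import random


def biased_crossover(elite, non_elite, rho_e, rng):
    """Each gene is inherited from the elite parent with probability rho_e (> 0.5)."""
    return [e if rng.random() < rho_e else n for e, n in zip(elite, non_elite)]


def next_generation(population, fitness, elite_frac=0.2, mutant_frac=0.15,
                    rho_e=0.7, rng=None):
    """One BRKGA generation for a minimization problem.

    population: list of random-key chromosomes (lists of floats in [0, 1)).
    fitness: function mapping a chromosome to a cost (e.g. a decoded makespan).
    """
    rng = rng or random.Random()
    size, n_genes = len(population), len(population[0])
    ranked = sorted(population, key=fitness)
    n_elite = max(1, int(elite_frac * size))
    n_mutant = int(mutant_frac * size)
    elite, non_elite = ranked[:n_elite], ranked[n_elite:]
    # Elite chromosomes are copied unchanged, so the best cost never gets worse.
    new_pop = [c[:] for c in elite]
    # Mutants are fresh random-key vectors that maintain diversity.
    new_pop += [[rng.random() for _ in range(n_genes)] for _ in range(n_mutant)]
    # The remaining slots are filled with elite x non-elite offspring.
    while len(new_pop) < size:
        child = biased_crossover(rng.choice(elite), rng.choice(non_elite), rho_e, rng)
        new_pop.append(child)
    return new_pop
```

Because the elite set is copied forward, repeated calls never lose the best chromosome found so far. The bias toward elite genes is what distinguishes this scheme from an unbiased random-key GA.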
Abstract:
This paper presents a genetic algorithm for the Resource Constrained Project Scheduling Problem (RCPSP). The chromosome representation of the problem is based on random keys. The schedule is constructed using a heuristic priority rule in which the priorities of the activities are defined by the genetic algorithm. The heuristic generates parameterized active schedules. The approach was tested on a set of standard problems taken from the literature and compared with other approaches. The computational results validate the effectiveness of the proposed algorithm.
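In the decoding step, the genetic algorithm's random keys act as activity priorities. This can be illustrated with the standard serial schedule generation scheme, which builds active schedules by repeatedly taking the highest-priority eligible activity and starting it as early as precedence and resource limits allow. This is a simplified sketch that assumes a single renewable resource with constant capacity and integer durations. It is not the paper's parameterized active schedule generator, whose delay parameter is omitted here, and the function name is illustrative.

```python
from collections import defaultdict


def decode_serial_sgs(durations, demands, preds, capacity, keys):
    """Decode random keys into a schedule with the serial schedule generation scheme.

    Single renewable resource with constant capacity; integer durations.
    Among eligible activities (all predecessors scheduled), the one with the
    largest key is scheduled next at its earliest feasible start.
    Returns (start_times, makespan).
    """
    n = len(durations)
    if any(d > capacity for d in demands):
        raise ValueError("an activity demands more than the resource capacity")
    usage = defaultdict(int)  # time period -> resource units in use
    start, finish = {}, {}
    while len(start) < n:
        eligible = [j for j in range(n)
                    if j not in start and all(p in start for p in preds[j])]
        j = max(eligible, key=lambda a: keys[a])
        t = max((finish[p] for p in preds[j]), default=0)
        # Shift right until the resource profile can absorb activity j.
        while any(usage[tau] + demands[j] > capacity
                  for tau in range(t, t + durations[j])):
            t += 1
        start[j], finish[j] = t, t + durations[j]
        for tau in range(t, finish[j]):
            usage[tau] += demands[j]
    return [start[j] for j in range(n)], max(finish.values(), default=0)
```

Reversing the keys reverses the scheduling order. With one unit of capacity and three unit-demand activities of durations 2, 2 and 1, both orders give a makespan of 5 but different start times.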
Abstract:
One of the most important measures to prevent wild forest fires is prescribed and controlled burning, as it reduces the availability of fuel mass. The impact of these management activities on soil physical and chemical properties varies according to the type of both soil and vegetation. Decisions in forest management plans are often based on the results of soil-monitoring campaigns, which are often labor-intensive and expensive. In this paper, we have successfully used the multivariate statistical technique Robust Principal Component Analysis (ROBPCA) to investigate the effectiveness of the sampling procedure for two different methodologies, in order to assess whether the sample collection process and the associated laboratory work can be simplified and reduced, toward a cost-effective and reliable forest soil characterization.
Abstract:
Portugal, like the rest of the Mediterranean basin, is prone to forest fires. In this work, a statistical analysis was carried out on official data on forest fire occurrences and the corresponding burned area for each district of mainland Portugal between 1996 and 2010. Three main regions of mainland Portugal could be identified with respect to forest fire occurrence, while the burned area can be characterized by two main regions. The associations between districts and years differ between the two approaches. The results provide a synthetic analysis of the forest fire phenomenon in continental Portugal, based on all the official information available to date.
Abstract:
The development and implementation of measures that reduce the impacts of forest fires on soils is imperative and should be part of any strategy for forest and soil preservation and recovery, especially given the current scenario of continuous growth in the number of fires and in burnt area. In light of the dendrocaustologic reality that has characterized mainland Portugal in recent decades, a research project promoted by the Center for the Study of Geography and Spatial Planning (CEGOT) was implemented with the objective of applying several erosion mitigation measures in a burned area of the Peneda-Geres National Park in NW Portugal. This paper presents the measures applied in the study area within the Soil Protec project, relating to triggered channel processes, together with the results of preliminary observations on the effectiveness of the erosion mitigation measures implemented and their cost/benefit ratio.
Abstract:
Among the most important measures to prevent wild forest fires is prescribed and controlled burning, used to reduce the availability of fuel mass. However, the impact of these activities on soil physical and chemical properties varies according to the type of both soil and vegetation and is not fully understood. Soil monitoring campaigns are therefore often used to measure these impacts. In this paper, we have successfully used three statistical data treatments (the Kolmogorov-Smirnov test, followed by ANOVA and Kruskal-Wallis tests) to investigate the variability of soil pH, soil moisture, soil organic matter and soil iron for different monitoring times and sampling procedures.
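The two group-comparison tests named above can be written out directly. In the sketch below, the one-way ANOVA F statistic is intended for groups judged approximately normal, for example after a Kolmogorov-Smirnov check. The Kruskal-Wallis H statistic, with the usual tie correction, is the rank-based alternative. Only the statistics are computed. P-values and the normality test itself are omitted, and the function names are illustrative. SciPy provides `scipy.stats.kstest`, `scipy.stats.f_oneway` and `scipy.stats.kruskal` for a full analysis.

```python
from collections import Counter
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    pooled = [x for g in groups for x in g]
    grand = mean(pooled)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, len(pooled) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H on pooled ranks, divided by the standard tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        total += sum(ranks[pos:pos + len(g)]) ** 2 / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))
```

For the groups [1, 2, 3] and [4, 5, 6], F is 13.5 and H is 27/7. With tied values, the correction factor enlarges H slightly relative to the uncorrected statistic.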
Abstract:
Northern Portuguese forests are often severely affected by wildfires during the summer season. These occurrences have a significant negative impact on all ecosystem components, namely soil, fauna and flora. To reduce the occurrence of natural wildfires, measures to control the availability of fuel mass are regularly implemented, mainly prescribed burning and vegetation pruning. This work reports on the impact of a prescribed burning on several forest soil properties, namely pH, soil moisture, organic matter content and iron content, by monitoring the soil's self-recovery over a one-year period. The experiments were carried out on the soil cover of a natural site of andalusite schist in Gramelas, Caminha, Portugal, which had not been subjected to prescribed burning for four years. Soil samples were collected from five plots at three different layers (0–3, 3–6 and 6–18 cm) one day before the prescribed fire and at regular intervals afterwards. This paper presents an approach in which Fuzzy Boolean Nets (FBN) and fuzzy reasoning are used to extract qualitative knowledge about the effect of prescribed burning on soil properties. FBN were chosen because of the scarcity of available quantitative data. The results showed that soil properties were affected by the prescribed burning and did not recover their initial values after one year.
Abstract:
Northern Portuguese forests are often severely affected by wildfires during the summer season. Preventive actions such as prescribed (or controlled) burning and clear-cut logging are often used to reduce wildfire occurrence. In the particular case of the Serra da Cabreira forest, owing to the extreme difficulty of operational field work, prescribed (or controlled) burning is the most common preventive action used to reduce the existing fuel load. This paper focuses on a Fuzzy Boolean Nets analysis of the changes in some forest soil properties, namely pH, moisture and organic matter content, after a controlled fire, and on the difficulties encountered during sampling and how they were overcome. Monitoring was conducted over a three-month period in Anjos, Vieira do Minho, Portugal, an area located in a contact zone between a two-mica coarse-grained porphyritic granite and a biotite granite with plagioclase. The sampling sites were located in a spot dominated by quartzphyllite with quartz veins, whose bedrock is partially altered and covered by a slightly thick humus layer that supports low undergrowth vegetation.
Abstract:
Controlled fires are frequently used in forest areas in most Mediterranean countries as a preventive technique to avoid severe wildfires in the summer season. In Portugal, this method of managing fuel mass availability is also used and has proved beneficial, as annual statistical reports confirm that the decrease in wildfire occurrence is directly related to the practice of controlled fire. However, prescribed fire can have serious side effects on some forest soil properties. This work shows the changes that occurred in some forest soil properties after a prescribed fire. The experiments were carried out on the soil cover of a natural site of andalusite schist in Gramelas, Caminha, Portugal, that had not been burned for four years. Composite soil samples were collected from five plots at three different layers (0-3 cm, 3-6 cm and 6-18 cm) during a three-year monitoring period after the prescribed burning. Principal Component Analysis was used to reach the presented conclusions.
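Principal Component Analysis of several soil variables is commonly carried out on the correlation matrix, so that properties measured in different units (pH, moisture, organic matter) are comparable. The plain-Python sketch below standardizes each variable, forms the correlation matrix and extracts the first principal component by power iteration. The function names, the use of the correlation rather than the covariance matrix and the power-iteration approach are assumptions, since the abstract gives no details. A constant variable (zero standard deviation) would need to be removed first.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of variables given as equal-length columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m = sum(col) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / n)
        z.append([(x - m) / sd for x in col])  # standardized variable
    p = len(z)
    return [[sum(z[a][i] * z[b][i] for i in range(n)) / n for b in range(p)]
            for a in range(p)]


def first_principal_component(columns, n_iter=1000, tol=1e-12):
    """Return (loadings, eigenvalue, explained_variance_ratio) of the first PC."""
    r = correlation_matrix(columns)
    p = len(r)
    # Non-uniform start vector; power iteration converges to the dominant eigenvector.
    v = [1.0 + 0.1 * i for i in range(p)]
    for _ in range(n_iter):
        w = [sum(r[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(w[i] - v[i]) for i in range(p)) < tol
        v = w
        if converged:
            break
    eigenvalue = sum(v[a] * sum(r[a][b] * v[b] for b in range(p)) for a in range(p))
    # The trace of a correlation matrix equals the number of variables p.
    return v, eigenvalue, eigenvalue / p
```

Suppose two of three variables are perfectly correlated and the third is uncorrelated with them. The first component then loads equally on the correlated pair and explains two thirds of the total standardized variance.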
Abstract:
Mathematical models and statistical analysis are key instruments in soil science research, as they can describe and/or predict the current state of a soil system. These tools allow us to explore the behavior of soil-related processes and properties and to generate new hypotheses for future experimentation. A good model and analysis of soil property variations, one that allows suitable conclusions to be drawn and spatially correlated variables to be estimated at unsampled locations, clearly depends on the amount and quality of the data and on the robustness of the techniques and estimators. The quality of the data, in turn, depends on competent data collection and capable laboratory analysis. According to the available standard soil sampling protocols, soil samples should be collected according to key criteria such as a convenient spatial scale, landscape homogeneity (or non-homogeneity), land color, soil texture, land slope and solar exposure. Obtaining good quality data from forest soils is predictably expensive, as it is labor-intensive and demands considerable manpower and equipment, both in field work and in laboratory analysis. Also, the sampling scheme for data collection in a forest field is not simple to design, as the chosen sampling strategies depend strongly on soil taxonomy. In fact, a sampling grid cannot be followed if rocks are found at the planned sampling depth, if no soil is found at all, or if large trees block soil collection. Consequently, designing a soil sampling campaign in a forest field is not always simple and sometimes represents a real challenge. In this work, we present some difficulties that arose during two experiments on forest soil conducted to study the spatial variation of some soil physical-chemical properties.
Two different sampling protocols were considered for monitoring two types of forest soils located in NW Portugal: umbric regosol and lithosol. Two different types of sampling equipment were also used: a manual auger and a shovel. Both scenarios were analyzed, and the results indicate that monitoring forest soil for mathematical and statistical investigation requires a data collection procedure compatible with established protocols, but that the assumption of a predefined grid often fails when the variability of the soil property is not uniform in space. In that case, the sampling grid should be adapted from one part of the landscape to another, and this should be taken into account in the mathematical procedure.