966 resultados para Data mining


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining techniques extract repeated and useful patterns from a large data set that in turn are utilized to predict the outcome of future events. The main purpose of the research presented in this paper is to investigate data mining strategies and develop an efficient framework for multi-attribute project information analysis to predict the performance of construction projects. The research team first reviewed existing data mining algorithms, applied them to systematically analyze a large project data set collected by the survey, and finally proposed a data-mining-based decision support framework for project performance prediction. To evaluate the potential of the framework, a case study was conducted using data collected from 139 capital projects and analyzed the relationship between use of information technology and project cost performance. The study results showed that the proposed framework has potential to promote fast, easy to use, interpretable, and accurate project data analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The research team recognized the value of network-level Falling Weight Deflectometer (FWD) testing to evaluate the structural condition trends of flexible pavements. However, practical limitations due to the cost of testing, traffic control and safety concerns and the ability to test a large network may discourage some agencies from conducting the network-level FWD testing. For this reason, the surrogate measure of the Structural Condition Index (SCI) is suggested for use. The main purpose of the research presented in this paper is to investigate data mining strategies and to develop a prediction method of the structural condition trends for network-level applications which does not require FWD testing. The research team first evaluated the existing and historical pavement condition, distress, ride, traffic and other data attributes in the Texas Department of Transportation (TxDOT) Pavement Maintenance Information System (PMIS), applied data mining strategies to the data, discovered useful patterns and knowledge for SCI value prediction, and finally provided a reasonable measure of pavement structural condition which is correlated to the SCI. To evaluate the performance of the developed prediction approach, a case study was conducted using the SCI data calculated from the FWD data collected on flexible pavements over a 5-year period (2005 – 09) from 354 PMIS sections representing 37 pavement sections on the Texas highway system. The preliminary study results showed that the proposed approach can be used as a supportive pavement structural index in the event when FWD deflection data is not available and help pavement managers identify the timing and appropriate treatment level of preventive maintenance activities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Key decisions at the collection, pre-processing, transformation, mining and interpretation phase of any knowledge discovery from database (KDD) process depend heavily on assumptions and theorectical perspectives relating to the type of task to be performed and characteristics of data sourced. In this article, we compare and contrast theoretical perspectives and assumptions taken in data mining exercises in the legal domain with those adopted in data mining in TCM and allopathic medicine. The juxtaposition results in insights for the application of KDD for Traditional Chinese Medicine.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis improves the process of recommending people to people in social networks using new clustering algorithms and ranking methods. The proposed system and methods are evaluated on the data collected from a real life social network. The empirical analysis of this research confirms that the proposed system and methods achieved improvements in the accuracy and efficiency of matching and recommending people, and overcome some of the problems that social matching systems usually suffer.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis takes a new data mining approach for analyzing road/crash data by developing models for the whole road network and generating a crash risk profile. Roads with an elevated crash risk due to road surface friction deficit are identified. The regression tree model, predicting road segment crash rate, is applied in a novel deployment coined regression tree extrapolation that produces a skid resistance/crash rate curve. Using extrapolation allows the method to be applied across the network and cope with the high proportion of missing road surface friction values. This risk profiling method can be applied in other domains.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Road surface skid resistance has been shown to have a strong relationship to road crash risk, however, applying the current method of using investigatory levels to identify crash prone roads is problematic as they may fail in identifying risky roads outside of the norm. The proposed method analyses a complex and formerly impenetrable volume of data from roads and crashes using data mining. This method rapidly identifies roads with elevated crash-rate, potentially due to skid resistance deficit, for investigation. A hypothetical skid resistance/crash risk curve is developed for each road segment, driven by the model deployed in a novel regression tree extrapolation method. The method potentially solves the problem of missing skid resistance values which occurs during network-wide crash analysis, and allows risk assessment of the major proportion of roads without skid resistance values.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis is a study for automatic discovery of text features for describing user information needs. It presents an innovative data-mining approach that discovers useful knowledge from both relevance and non-relevance feedback information. The proposed approach can largely reduce noises in discovered patterns and significantly improve the performance of text mining systems. This study provides a promising method for the study of Data Mining and Web Intelligence.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis describes the development of a robust and novel prototype to address the data quality problems that relate to the dimension of outlier data. It thoroughly investigates the associated problems with regards to detecting, assessing and determining the severity of the problem of outlier data; and proposes granule-mining based alternative techniques to significantly improve the effectiveness of mining and assessing outlier data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper uses innovative content analysis techniques to map how the death of Oscar Pistorius' girlfriend, Reeva Steenkamp, was framed on Twitter conversations. Around 1.5 million posts from a two-week timeframe are analyzed with a combination of syntactic and semantic methods. This analysis is grounded in the frame analysis perspective and is different than sentiment analysis. Instead of looking for explicit evaluations, such as “he is guilty” or “he is innocent”, we showcase through the results how opinions can be identified by complex articulations of more implicit symbolic devices such as examples and metaphors repeatedly mentioned. Different frames are adopted by users as more information about the case is revealed: from a more episodic one, highly used in the very beginning, to more systemic approaches, highlighting the association of the event with urban violence, gun control issues, and violence against women. A detailed timeline of the discussions is provided.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Road asset managers are seeking analysis of the whole road network to supplement statistical analyses of small subsets of homogeneous roadway. This study outlines the use of data mining capable of analyzing the wide range of situations found on the network, with a focus on the role of skid resistance in the cause of crashes. Results from the analyses show that on non-crash-prone roads with low crash rates, skid resistance contributes only in a minor way, whereas on high-crash roadways, skid resistance often contributes significantly in the calculation of the crash rate. The results provide evidence supporting a causal relationship between skid resistance and crashes and highlight the importance of the role of skid resistance in decision making in road asset management.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This research is a step forward in improving the accuracy of detecting anomaly in a data graph representing connectivity between people in an online social network. The proposed hybrid methods are based on fuzzy machine learning techniques utilising different types of structural input features. The methods are presented within a multi-layered framework which provides the full requirements needed for finding anomalies in data graphs generated from online social networks, including data modelling and analysis, labelling, and evaluation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Identifying product families has been considered as an effective way to accommodate the increasing product varieties across the diverse market niches. In this paper, we propose a novel framework to identifying product families by using a similarity measure for a common product design data BOM (Bill of Materials) based on data mining techniques such as frequent mining and clus-tering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that consists of information not only of the content and topology but also of the fre-quent structural dependency among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and used as a clustering input to group the product fami-lies. When applied on a real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the availability of huge number of web services, finding an appropriate Web service according to the requirements of a service consumer is still a challenge. Moreover, sometimes a single web service is unable to fully satisfy the requirements of the service consumer. In such cases, combinations of multiple inter-related web services can be utilised. This paper proposes a method that first utilises a semantic kernel model to find related services and then models these related Web services as nodes of a graph. An all-pair shortest-path algorithm is applied to find the best compositions of Web services that are semantically related to the service consumer requirement. The recommendation of individual and composite Web services composition for a service request is finally made. Empirical evaluation confirms that the proposed method significantly improves the accuracy of service discovery in comparison to traditional keyword-based discovery methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Protein adsorption at solid-liquid interfaces is critical to many applications, including biomaterials, protein microarrays and lab-on-a-chip devices. Despite this general interest, and a large amount of research in the last half a century, protein adsorption cannot be predicted with an engineering level, design-orientated accuracy. Here we describe a Biomolecular Adsorption Database (BAD), freely available online, which archives the published protein adsorption data. Piecewise linear regression with breakpoint applied to the data in the BAD suggests that the input variables to protein adsorption, i.e., protein concentration in solution; protein descriptors derived from primary structure (number of residues, global protein hydrophobicity and range of amino acid hydrophobicity, isoelectric point); surface descriptors (contact angle); and fluid environment descriptors (pH, ionic strength), correlate well with the output variable-the protein concentration on the surface. Furthermore, neural network analysis revealed that the size of the BAD makes it sufficiently representative, with a neural network-based predictive error of 5% or less. Interestingly, a consistently better fit is obtained if the BAD is divided in two separate sub-sets representing protein adsorption on hydrophilic and hydrophobic surfaces, respectively. Based on these findings, selected entries from the BAD have been used to construct neural network-based estimation routines, which predict the amount of adsorbed protein, the thickness of the adsorbed layer and the surface tension of the protein-covered surface. While the BAD is of general interest, the prediction of the thickness and the surface tension of the protein-covered layers are of particular relevance to the design of microfluidics devices.