137 results for sequential extraction
Abstract:
We identify relation completion (RC) as one recurring problem that is central to the success of novel big data applications such as Entity Reconstruction and Data Enrichment. Given a semantic relation, RC attempts to link entity pairs between two entity lists under the relation. To accomplish the RC goals, we propose to formulate search queries for each query entity α based on some auxiliary information, so as to detect its target entity β from the set of retrieved documents. For instance, a pattern-based method (PaRE) uses extracted patterns as the auxiliary information in formulating search queries. However, high-quality patterns may decrease the probability of finding suitable target entities. As an alternative, we propose the CoRE method, which uses context terms learned surrounding the expression of a relation as the auxiliary information in formulating queries. The experimental results based on several real-world web data collections demonstrate that CoRE reaches a much higher accuracy than PaRE for the purpose of RC.
Abstract:
This chapter examines two core dimensions of women’s gendered experiences of mining in Australia and more specifically in Western Australia (WA). First, the chapter explores what has been and continues to be women’s principal relationship to mining encapsulated in the social and cultural identity of the ‘mining wife’ and, more recently, ‘fly-in/fly-out (FIFO) wife’. Second, the chapter addresses the fraught emergence of women as mineworkers. As the research presented in this chapter makes clear, the human cost of developmentalism was and continues to be deeply gendered.
Abstract:
We investigated memories of room-sized spatial layouts learned by sequentially or simultaneously viewing objects from a stationary position. In three experiments, sequential viewing (one or two objects at a time) yielded subsequent memory performance that was equivalent or superior to simultaneous viewing of all objects, even though sequential viewing lacked direct access to the entire layout. This finding was replicated by replacing sequential viewing with directed viewing in which all objects were presented simultaneously and participants’ attention was externally focused on each object sequentially, indicating that the advantage of sequential viewing over simultaneous viewing may have originated from focal attention to individual object locations. These results suggest that memory representation of object-to-object relations can be constructed efficiently by encoding each object location separately, when those locations are defined within a single spatial reference system. These findings highlight the importance of considering object presentation procedures when studying spatial learning mechanisms.
Abstract:
Objective To evaluate methods for monitoring monthly aggregated hospital adverse event data that display clustering, non-linear trends and possible autocorrelation. Design Retrospective audit. Setting The Northern Hospital, Melbourne, Australia. Participants 171,059 patients admitted between January 2001 and December 2006. Measurements The analysis is illustrated with 72 months of patient fall injury data using a modified Shewhart U control chart, and charts derived from a quasi-Poisson generalised linear model (GLM) and a generalised additive mixed model (GAMM) that included an approximate upper control limit. Results The data were overdispersed and displayed a downward trend and possible autocorrelation. The downward trend was followed by a predictable period after December 2003. The GLM-estimated incidence rate ratio was 0.98 (95% CI 0.98 to 0.99) per month. The GAMM-fitted count fell from 12.67 (95% CI 10.05 to 15.97) in January 2001 to 5.23 (95% CI 3.82 to 7.15) in December 2006 (p<0.001). The corresponding values for the GLM were 11.9 and 3.94. Residual plots suggested that the GLM underestimated the rate at the beginning and end of the series and overestimated it in the middle. The data suggested a more rapid rate fall before 2004 and a steady state thereafter, a pattern reflected in the GAMM chart. The approximate upper two-sigma equivalent control limit in the GLM and GAMM charts identified 2 months that showed possible special-cause variation. Conclusion Charts based on GAMM analysis are a suitable alternative to Shewhart U control charts with these data.
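As a point of reference for the chart types named above, the following is a minimal sketch of a standard (unmodified) Shewhart U chart, not the modified chart, GLM, or GAMM used in the study. The function names, the `sigmas` parameter, and the example counts and exposures are hypothetical and chosen only for illustration.

```python
import math


def u_chart_limits(counts, exposures, sigmas=3.0):
    """Centre line and per-period control limits for a standard U chart.

    counts    -- events per period (e.g. falls per month)
    exposures -- size of the area of opportunity per period
                 (e.g. thousands of patient-days); hypothetical units
    sigmas    -- width of the limits in standard errors
    """
    if len(counts) != len(exposures):
        raise ValueError("counts and exposures must have the same length")
    centre = sum(counts) / sum(exposures)  # pooled event rate
    limits = []
    for n in exposures:
        half_width = sigmas * math.sqrt(centre / n)
        limits.append((max(0.0, centre - half_width), centre + half_width))
    return centre, limits


def flag_signals(counts, exposures, sigmas=3.0):
    """Indices of periods whose rate falls outside the control limits."""
    _, limits = u_chart_limits(counts, exposures, sigmas)
    flagged = []
    for i, (c, n) in enumerate(zip(counts, exposures)):
        lower, upper = limits[i]
        rate = c / n
        if rate < lower or rate > upper:
            flagged.append(i)
    return flagged


# Hypothetical data: four months of fall counts and exposure.
falls = [12, 9, 30, 11]
patient_days_thousands = [10.0, 10.0, 10.0, 10.0]
centre_line, _ = u_chart_limits(falls, patient_days_thousands)
signals = flag_signals(falls, patient_days_thousands)
```

The standard U chart assumes Poisson variation around a constant rate, which is the assumption the abstract reports as violated (overdispersion and a downward trend); the sketch only shows the baseline that the model-based charts are compared against.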
Abstract:
Sparse optical flow algorithms, such as the Lucas-Kanade approach, provide more robustness to noise than dense optical flow algorithms and are the preferred approach in many scenarios. Sparse optical flow algorithms estimate the displacement for a selected number of pixels in the image. These pixels can be chosen randomly. However, pixels in regions with more variance between the neighbours will produce more reliable displacement estimates. The selected pixel locations should therefore be chosen wisely. In this study, the suitability of Harris corners, Shi-Tomasi's “Good features to track”, SIFT and SURF interest point extractors, Canny edges, and random pixel selection for the purpose of frame-by-frame tracking using a pyramidal Lucas-Kanade algorithm is investigated. The evaluation considers the important factors of processing time, feature count, and feature trackability in indoor and outdoor scenarios using ground vehicles and unmanned aerial vehicles, and for the purpose of visual odometry estimation.
Abstract:
In this paper we present a novel scheme for improving speaker diarization by making use of repeating speakers across multiple recordings within a large corpus. We call this technique speaker re-diarization and demonstrate that it is possible to reuse the initial speaker-linked diarization outputs to boost diarization accuracy within individual recordings. We first propose and evaluate two novel re-diarization techniques. We demonstrate their complementary characteristics and fuse the two techniques to successfully conduct speaker re-diarization across the SAIVT-BNEWS corpus of Australian broadcast data. This corpus contains recurring speakers in various independent recordings that need to be linked across the dataset. We show that our speaker re-diarization approach can provide a relative improvement of 23% in diarization error rate (DER), over the original diarization results, as well as improve the estimated number of speakers and the cluster purity and coverage metrics.
Abstract:
Commercially viable carbon-neutral biodiesel production from microalgae has potential for replacing depleting petroleum diesel. The process of biodiesel production from microalgae involves harvesting, drying and extraction of lipids, which are energy- and cost-intensive processes. The development of effective large-scale lipid extraction processes which overcome the complexity of microalgae cell structure is considered one of the most vital requirements for commercial production. Thus the aim of this work was to investigate suitable extraction methods with optimised conditions to progress opportunities for sustainable microalgal biodiesel production. In this study, the green microalgal species consortium, Tarong polyculture, was used to investigate lipid extraction with hexane (solvent) under high pressure and variable temperature and biomass moisture conditions using an Accelerated Solvent Extraction (ASE) method. The performance of high pressure solvent extraction was examined over a range of different process and sample conditions (dry biomass to water ratios (DBWRs) of 100%, 75%, 50% and 25%; temperatures from 70 to 120 °C; process times of 5–15 min). Maximum total lipid yields were achieved at 50% and 75% sample dryness at temperatures of 90–120 °C. We show that individual fatty acids (Palmitic acid C16:0; Stearic acid C18:0; Oleic acid C18:1; Linolenic acid C18:3) extraction optima are influenced by temperature and sample dryness, consequently affecting microalgal biodiesel quality parameters. Higher heating values and kinematic viscosity were compliant with biodiesel quality standards under all extraction conditions used. Our results indicate that biodiesel quality can be positively manipulated by selecting process extraction conditions that favour extraction of saturated and mono-unsaturated fatty acids over optimal extraction conditions for polyunsaturated fatty acids, yielding positive effects on cetane number and iodine values. Exceeding biodiesel standards for these two parameters opens blending opportunities with biodiesels that fall outside the minimal cetane and maximal iodine values.
Abstract:
Organic compounds in Australian coal seam gas produced water (CSG water) are poorly understood despite their environmental contamination potential. In this study, the presence of some organic substances is identified from government-held CSG water-quality data from the Bowen and Surat Basins, Queensland. These records revealed the presence of polycyclic aromatic hydrocarbons (PAHs) in 27% of samples of CSG water from the Walloon Coal Measures at concentrations <1 µg/L, and it is likely these compounds leached from in situ coals. PAHs identified from wells include naphthalene, phenanthrene, chrysene and dibenz[a,h]anthracene. In addition, the likelihood of coal-derived organic compounds leaching to groundwater is assessed by undertaking toxicity leaching experiments using coal rank and water chemistry as variables. These tests suggest higher molecular weight PAHs (including benzo[a]pyrene) leach from higher rank coals, whereas lower molecular weight PAHs leach at greater concentrations from lower rank coal. Some of the identified organic compounds have carcinogenic or health risk potential, but they are unlikely to be acutely toxic at the observed concentrations which are almost negligible (largely due to the hydrophobicity of such compounds). Hence, this study will be useful to practitioners assessing CSG water related environmental and health risk.
Abstract:
A computationally efficient sequential Monte Carlo algorithm is proposed for the sequential design of experiments for the collection of block data described by mixed effects models. The difficulty in applying a sequential Monte Carlo algorithm in such settings is the need to evaluate the observed data likelihood, which is typically intractable for all but linear Gaussian models. To overcome this difficulty, we propose to unbiasedly estimate the likelihood, and perform inference and make decisions based on an exact-approximate algorithm. Two estimates are proposed: using Quasi Monte Carlo methods and using the Laplace approximation with importance sampling. Both of these approaches can be computationally expensive, so we propose exploiting parallel computational architectures to ensure designs can be derived in a timely manner. We also extend our approach to allow for model uncertainty. This research is motivated by important pharmacological studies related to the treatment of critically ill patients.
Abstract:
The strain data acquired from structural health monitoring (SHM) systems play an important role in the state monitoring and damage identification of bridges. Due to the environmental complexity of civil structures, a better understanding of the actual strain data will help fill the gap between theoretical/laboratory results and practical application. In the study, the multi-scale features of strain response are first revealed after extensive investigation of the actual data from two typical long-span bridges. Results show that strain types at the three typical temporal scales of 10^5, 10^2 and 10^0 s are caused by temperature change, trains and heavy trucks, and have their respective cut-off frequencies in the order of 10^-2, 10^-1 and 10^0 Hz. Multi-resolution analysis and wavelet shrinkage are applied for separating and extracting these strain types. During the above process, two methods for determining thresholds are introduced. The excellent ability of the wavelet transform for simultaneous time-frequency analysis leads to an effective information extraction. After extraction, the strain data can be compressed at an attractive ratio. This research may contribute to a further understanding of actual strain data of long-span bridges; also, the proposed extraction methodology is applicable to actual SHM systems.
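To illustrate what multi-resolution analysis with wavelet shrinkage involves, here is a minimal sketch using the Haar wavelet and soft thresholding with a single fixed threshold. The choice of the Haar wavelet, the function names, and the threshold value are assumptions made for illustration; the abstract does not state which wavelet or which two threshold rules the authors used.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient towards zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def wavelet_shrink(signal, levels, threshold):
    """Decompose, soft-threshold the detail coefficients, and reconstruct.

    len(signal) must be divisible by 2 ** levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2 ** levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(soft_threshold(detail, threshold))
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx


# Hypothetical record: a step change with small fluctuations on top.
noisy = [1.05, 0.95, 1.0, 1.0, 5.0, 5.0, 5.05, 4.95]
smoothed = wavelet_shrink(noisy, levels=2, threshold=0.1)
```

In this toy record the small fluctuations sit entirely in the finest detail coefficients, so thresholding removes them while the step is preserved in the coarse approximation. Zeroed coefficients are also what makes the thresholded representation compressible.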
Abstract:
Text is the main method of communicating information in the digital age. Messages, blogs, news articles, reviews, and opinionated information abound on the Internet. People commonly purchase products online and post their opinions about purchased items. This feedback is displayed publicly to assist others with their purchasing decisions, creating the need for a mechanism with which to extract and summarize useful information for enhancing the decision-making process. Our contribution is to improve the accuracy of extraction by combining different techniques from three major areas, namely Data Mining, Natural Language Processing and Ontologies. The proposed framework sequentially mines products’ aspects and users’ opinions, groups representative aspects by similarity, and generates an output summary. This paper focuses on the task of extracting product aspects and users’ opinions by extracting all possible aspects and opinions from reviews using natural language, ontology, and frequent “tag” sets. The proposed framework, when compared with an existing baseline model, yielded promising results.
Abstract:
An Engineering Workshop was held from 21 to 24 November 2006 in Veracruz, Mexico. Forty delegates from 12 countries attended the workshop on theory and practice of milling and diffusion extraction. This report provides a general overview of activities undertaken during that workshop, which consisted of five technical sessions over two days with presentations and discussions plus two days of field and factory visits. Topics covered during the technical sessions included: power transmissions, cane preparation, diffusers, mills, and a comparison of milling and diffusion.
Abstract:
BACKGROUND: The use of salivary diagnostics is increasing because of its noninvasiveness, ease of sampling, and the relatively low risk of contracting infectious organisms. Saliva has been used as a biological fluid to identify and validate RNA targets in head and neck cancer patients. The goal of this study was to develop a robust, easy, and cost-effective method for isolating high yields of total RNA from saliva for downstream expression studies. METHODS: Oral whole saliva (200 µL) was collected from healthy controls (n = 6) and from patients with head and neck cancer (n = 8). The method developed in-house used QIAzol lysis reagent (Qiagen) to extract RNA from saliva (both cell-free supernatants and cell pellets), followed by isopropyl alcohol precipitation, cDNA synthesis, and real-time PCR analyses for the genes encoding beta-actin ("housekeeping" gene) and histatin (a salivary gland-specific gene). RESULTS: The in-house QIAzol lysis reagent produced a high yield of total RNA (0.89 to 7.1 µg) from saliva (cell-free saliva and cell pellet) after DNase treatment. The ratio of the absorbance measured at 260 nm to that at 280 nm ranged from 1.6 to 1.9. The commercial kit produced a 10-fold lower RNA yield. Using our method with the QIAzol lysis reagent, we were also able to isolate RNA from archived saliva samples that had been stored without RNase inhibitors at -80 °C for >2 years. CONCLUSIONS: Our in-house QIAzol method is robust, is simple, provides RNA at high yields, and can be implemented to allow saliva transcriptomic studies to be translated into a clinical setting.
Abstract:
Double-pulse tests are commonly used as a method for assessing the switching performance of power semiconductor switches in a clamped inductive switching application. Data generated from these tests are typically in the form of sampled waveform data captured using an oscilloscope. In cases where it is of interest to explore a multi-dimensional parameter space and corresponding result space it is necessary to reduce the data into key performance metrics via feature extraction. This paper presents techniques for the extraction of switching performance metrics from sampled double-pulse waveform data. The reported techniques are applied to experimental data from characterisation of a cascode gate drive circuit applied to power MOSFETs.
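As an illustration of reducing sampled waveform data to a single performance metric, the sketch below computes a 10% to 90% rise time using linear interpolation between samples. The abstract does not list the metrics or algorithms the paper reports, so the metric choice, function names, and example waveform are hypothetical.

```python
def crossing_time(times, values, level):
    """First time the waveform rises through `level`, by linear interpolation."""
    for i in range(1, len(values)):
        v0, v1 = values[i - 1], values[i]
        if v0 < level <= v1:
            fraction = (level - v0) / (v1 - v0)
            return times[i - 1] + fraction * (times[i] - times[i - 1])
    raise ValueError("waveform never rises through the requested level")


def rise_time(times, values, low=0.1, high=0.9):
    """Time taken to rise from `low` to `high` fractions of the full swing."""
    v_min, v_max = min(values), max(values)
    swing = v_max - v_min
    t_low = crossing_time(times, values, v_min + low * swing)
    t_high = crossing_time(times, values, v_min + high * swing)
    return t_high - t_low


# Hypothetical samples: time in ns, a voltage ramping from 0 V to 10 V.
sample_times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
sample_volts = [0.0, 0.0, 5.0, 10.0, 10.0, 10.0]
t_rise = rise_time(sample_times, sample_volts)
```

Taking the reference levels from the raw minimum and maximum is sensitive to overshoot and ringing; measured waveforms would normally need steady-state levels estimated over a settled window instead.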