962 results for Clustering a large document collection


Relevance:

30.00% 30.00%

Publisher:

Abstract:

Using data from the H I Parkes All Sky Survey (HIPASS), we have searched for neutral hydrogen in galaxies in a region ~25 × 25 deg² centred on NGC 1399, the nominal centre of the Fornax cluster. Within a velocity search range of 300-3700 km s⁻¹ and to a 3σ lower flux limit of ~40 mJy, 110 galaxies with H I emission were detected, one of which is previously uncatalogued. None of the detections has early-type morphology. Previously unknown velocities for 14 galaxies have been determined, with a further four velocity measurements being significantly dissimilar to published values. Identification of an optical counterpart is relatively unambiguous for more than ~90 per cent of our H I galaxies. The galaxies appear to be embedded in a sheet at the cluster velocity which extends for more than 30° across the search area. At the nominal cluster distance of ~20 Mpc, this corresponds to an elongated structure more than 10 Mpc in extent. A velocity gradient across the structure is detected, with radial velocities increasing by ~500 km s⁻¹ from south-east to north-west. The clustering of galaxies evident in optical surveys is only weakly suggested in the spatial distribution of our H I detections. Of 62 H I detections within a 10° projected radius of the cluster centre, only two are within the core region (projected radius

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Aim: To test the efficacy of a comprehensive health assessment using the CHAP tool in adults with an intellectual disability (ID). Method: A cluster randomised controlled design was used. The intervention group received the CHAP, while the control group received usual care. This tool directed carers to gather a health history, which was reviewed by the person’s general practitioner (GP), who completed a medical examination and a healthcare plan. The tool acted as an advocacy tool and a ticket of entry to the GP’s surgery, and it educated the GP and the caregiver about the deficits in the healthcare of adults with ID. The healthcare of the participants was followed for one year after the intervention by collecting data from GP and service providers’ notes. Interviews were also performed with all those involved. Results: We obtained a representative sample of  adults with ID (RR%). We found the intervention group received a significant increase in many health promotion/disease prevention activities, e.g. hearing screening was  times and a Pap smear was  times more likely to have occurred in the intervention group. We also found a trend towards earlier detection of disease. Conclusions: The CHAP process improves the provision of health screening/promotion activities and should be implemented.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Objective: To document outcome and to investigate patterns of physical and psychosocial recovery in the first year following severe traumatic brain injury (TBI) in an Australian patient sample. Design: A longitudinal prospective study of a cohort of patients, with data collection at 3, 6, 9, and 12 months post injury. Setting: A head injury rehabilitation unit in a large metropolitan public hospital. Patients: A sample of 55 patients selected from 120 consecutive admissions with severe TBI. Patients who were more than 3 months post injury on admission, who remained confused, or who had severe communication deficits or a previous neurologic disorder were excluded. Interventions: All subjects participated in a multidisciplinary inpatient rehabilitation program, followed by varied participation in outpatient rehabilitation and community-based services. Main Outcome Measures: The Sickness Impact Profile (SIP) provided physical, psychosocial, and total dysfunction scores at each follow-up. Outcome at 1 year was measured by the Disability Rating Scale. Results: Multivariate analysis of variance indicated that the linear trend of recovery over time was less for psychosocial dysfunction than for physical dysfunction (F(1,51) = 5.87, P < .02). One year post injury, 22% of subjects had returned to their previous level of employability, and 42% were able to live independently. Conclusions: Recovery from TBI in this Australian sample followed a pattern similar to that observed in other countries, with psychosocial dysfunction being more persistent. Self-report measures such as the SIP in TBI research are limited by problems of diminished self-awareness.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Background: Recent studies support an important role for human papillomavirus (HPV) in a subgroup of head and neck squamous cell carcinomas (HNSCC). We have evaluated the HPV deoxyribonucleic acid (DNA) prevalence as well as the association between serological response to HPV infection and HNSCC in two distinct populations from Central Europe (CE) and Latin America (LA). Methods: Cases (n = 2214) and controls (n = 3319) were recruited from 1998 to 2003, using a similar protocol including questionnaire and blood sample collection. Tumour DNA from 196 fresh tissue biopsies was analysed for multiple HPV types followed by an HPV type-specific polymerase chain reaction (PCR) protocol towards the E7 gene from HPV 16. Using multiplex serology, serum samples were analysed for antibodies to 17 HPV types. Statistical analysis included the estimation of adjusted odds ratios (ORs) and the respective 95% confidence intervals (CIs). Results: HPV16 E7 DNA prevalence among cases was 3.1% (6/196), including 4.4% in the oropharynx (3/68), 3.8% in the hypopharynx/larynx (3/78) and 0% among 50 cases of oral cavity carcinomas. Positivity for both HPV16 E6 and E7 antibodies was associated with a very high risk of oropharyngeal cancer (OR = 179, 95% CI 35.8-899) and hypopharyngeal/laryngeal cancer (OR = 14.9, 95% CI 2.92-76.1). Conclusions: A very low prevalence of HPV DNA and serum antibodies was observed among cases in both CE and LA. The proportion of head and neck cancer caused by HPV may vary substantially between different geographical regions, and studies that are designed to evaluate the impact of HPV vaccination on HNSCC need to consider this heterogeneity.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Motivation: This paper introduces the software EMMIX-GENE, which has been developed specifically for a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic, used in conjunction with a threshold on the size of a cluster, allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so mixtures of factor analyzers are used to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues that are consistent either with the external classification of the tissues or with background and biological knowledge of these sets.
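
The gene-screening step described in this abstract can be illustrated with a minimal sketch. It is not the EMMIX-GENE implementation: it assumes univariate normal components in place of the t distributions named in the abstract, omits the cluster-size threshold, and uses hypothetical function names (`lr_statistic`, `rank_genes`) and made-up expression values. For each gene it fits one and two components across the tissues and computes the likelihood ratio statistic 2(log L2 - log L1).

```python
import math


def normal_logpdf(x, mu, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def loglik_one_component(xs, min_var=1e-3):
    """Maximised log-likelihood of a single normal fitted to xs."""
    n = len(xs)
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, min_var)
    return sum(normal_logpdf(x, mu, var) for x in xs)


def loglik_two_components(xs, n_iter=100, min_var=1e-3):
    """Log-likelihood of a two-component normal mixture fitted by EM."""
    n = len(xs)
    ordered = sorted(xs)
    half = n // 2
    # Initialise the means from the lower and upper halves of the data.
    mus = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    grand_mean = sum(xs) / n
    start_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, min_var)
    variances = [start_var, start_var]
    weights = [0.5, 0.5]
    loglik = float("-inf")
    for _ in range(n_iter):
        # E-step: responsibilities via log-sum-exp; loglik is evaluated
        # at the parameters held at the start of this iteration.
        resp = []
        loglik = 0.0
        for x in xs:
            logs = [math.log(weights[k]) + normal_logpdf(x, mus[k], variances[k])
                    for k in range(2)]
            top = max(logs)
            total = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += total
            resp.append([math.exp(v - total) for v in logs])
        # M-step: update weights, means and variances (variances are floored
        # at min_var so that a component cannot collapse onto one point).
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-9)
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, min_var)
    return loglik


def lr_statistic(xs):
    """Likelihood ratio statistic for one versus two components."""
    return 2.0 * (loglik_two_components(xs) - loglik_one_component(xs))


def rank_genes(expression):
    """Return (gene, statistic) pairs, largest statistic first."""
    scored = [(gene, lr_statistic(values)) for gene, values in expression.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical expression values for two genes across ten tissues.
bimodal = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
unimodal = [-0.45, -0.35, -0.25, -0.15, -0.05, 0.05, 0.15, 0.25, 0.35, 0.45]
ranking = rank_genes({"gene_bimodal": bimodal, "gene_unimodal": unimodal})
```

Genes whose statistic exceeds a chosen threshold would be retained for clustering the tissues; the sort direction here (largest first) is a presentational choice of this sketch.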

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The effect of the number of samples and the selection of data for analysis on the calculation of surface motor unit potential (SMUP) size in the statistical method of motor unit number estimates (MUNE) was determined in 10 normal subjects and 10 with amyotrophic lateral sclerosis (ALS). We recorded 500 sequential compound muscle action potentials (CMAPs) at three different stable stimulus intensities (10–50% of maximal CMAP). Estimated mean SMUP sizes were calculated using Poisson statistical assumptions from the variance of 500 sequential CMAPs obtained at each stimulus intensity. The results with the 500 data points were compared with smaller subsets from the same data set. The results using a range of 50–80% of the 500 data points were compared with the full 500. The effect of restricting analysis to data between 5–20% of the CMAP and to standard deviation limits was also assessed. No differences in mean SMUP size were found with stimulus intensity or use of different ranges of data. Consistency was improved with a greater sample number. Data within 5% of CMAP size gave both increased consistency and reduced mean SMUP size in many subjects, but excluded valid responses present at that stimulus intensity. These changes were more prominent in ALS patients, in whom the presence of isolated SMUP responses was a striking difference from normal subjects. Noise, spurious data, and large SMUPs limited the Poisson assumptions. When these factors are considered, consistent statistical MUNE can be calculated from a continuous sequence of data points. A 2 to 2.5 SD window or a 10% window is a reasonable method of limiting data for analysis. Muscle Nerve 27: 320–331, 2003
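
As a rough illustration of the Poisson calculation mentioned in this abstract: if each response is the sum of a Poisson-distributed number of units of equal size s, the mean response is λs and the variance is λs², so s can be estimated as variance divided by mean. The sketch below assumes exactly that simplified model; the function names, the optional `baseline` parameter, and the sample amplitudes are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev, pvariance


def limit_to_sd_window(amplitudes, n_sd=2.0):
    """Keep only responses within n_sd standard deviations of the mean."""
    centre = mean(amplitudes)
    spread = pstdev(amplitudes)
    return [a for a in amplitudes if abs(a - centre) <= n_sd * spread]


def estimate_smup_size(amplitudes, baseline=0.0):
    """Poisson estimate of mean SMUP size: variance / mean of the responses.

    baseline is subtracted first (an assumption of this sketch, standing in
    for any constant part of the response).
    """
    shifted = [a - baseline for a in amplitudes]
    centre = mean(shifted)
    if centre <= 0:
        raise ValueError("mean response must be positive")
    return pvariance(shifted) / centre


def estimate_mune(max_cmap, smup_size):
    """Motor unit number estimate: maximal CMAP divided by mean SMUP size."""
    return max_cmap / smup_size


# Hypothetical CMAP amplitudes (mV) at one stable stimulus intensity.
responses = [0.0, 0.1, 0.1, 0.2]
smup = estimate_smup_size(responses)
mune = estimate_mune(5.0, smup)
```

In practice the window filter would be applied before the estimate, for example `estimate_smup_size(limit_to_sd_window(responses, n_sd=2.5))`, which mirrors the 2 to 2.5 SD limit the abstract describes.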

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper discusses a document discovery tool based on Conceptual Clustering by Formal Concept Analysis. The program allows users to navigate e-mail using a visual lattice metaphor rather than a tree. It implements a virtual file structure over e-mail where files and entire directories can appear in multiple positions. The content and shape of the lattice formed by the conceptual ontology can assist in e-mail discovery. The system described provides more flexibility in retrieving stored e-mails than what is normally available in e-mail clients. The paper discusses how conceptual ontologies can leverage traditional document retrieval systems and aid knowledge discovery in document collections.
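
To make the lattice idea concrete, here is a small sketch of Formal Concept Analysis over a toy e-mail collection. It is not the tool described in the abstract: the message identifiers and attributes are invented, and the function `formal_concepts` is a hypothetical name for a naive enumeration that suits small contexts only. A formal concept is a pair (extent, intent) in which the extent is exactly the set of messages sharing every attribute in the intent.

```python
def formal_concepts(context):
    """Enumerate all formal concepts of a context.

    context maps each object (here an e-mail id) to its set of attributes.
    Returns a list of (extent, intent) pairs of frozensets.
    """
    all_attributes = frozenset().union(*context.values())
    # Every concept intent is an intersection of object attribute sets;
    # the full attribute set plays the role of the empty intersection.
    intents = {all_attributes}
    for attributes in context.values():
        intents |= {frozenset(attributes) & intent for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that carries all attributes of the intent.
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


# Hypothetical e-mails described by sender and topic attributes.
emails = {
    "msg1": {"from:alice", "project-x"},
    "msg2": {"from:alice", "invoice"},
    "msg3": {"from:bob", "project-x"},
}
concepts = formal_concepts(emails)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Note that `msg1` belongs to the extent of several concepts at once (for example both the "from:alice" and the "project-x" concepts), which is the property that lets a message appear in multiple positions of the navigation structure instead of a single folder.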

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A series of large area single layers and heterojunction cells in the assembly glass/ZnO:Al/p (SixC1-x:H)/i (Si:H)/n (SixC1-x:H)/Al (0

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A series of large area single layers and glass/ZnO:Al/p(SiₓC₁₋ₓ:H)/i(Si:H)/n(SiₓC₁₋ₓ:H)/Al (0 < x < 1) heterojunction cells were produced by plasma-enhanced chemical vapour deposition (PE-CVD) at low temperature. Junction properties, carrier transport and photogeneration are investigated from dark and illuminated current-voltage (J-V) and capacitance-voltage (C-V) characteristics. For the heterojunction cells, atypical J-V characteristics under different illumination conditions are observed, leading to poor fill factors. High series resistances around 10⁶ Ω are also measured. These experimental results were used as a basis for the numerical simulation of the energy band diagram and the electrical field distribution of the structures. Further comparison with the sensor performance gave satisfactory agreement. Results show that the conduction band offset is the most limiting parameter for the optimal collection of the photogenerated carriers. As the optical gap increases and the conductivity of the doped layers decreases, the transport mechanism changes from a drift to a diffusion-limited process.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The growing importance and influence of new resources connected to power systems has caused many changes in their operation. Environmental policies and several well-known advantages have led to the wide dissemination of renewable-based energy resources. These resources, including Distributed Generation (DG), are being connected at lower voltage levels, where Demand Response (DR) must be considered too. These changes increase the complexity of system operation due to both new operational constraints and the amount of data to be processed. Virtual Power Players (VPP) are entities able to manage these resources. Addressing these issues, this paper proposes a methodology to support VPP actions when a VPP acts as a Curtailment Service Provider (CSP) that provides DR capacity to a DR program declared by the Independent System Operator (ISO) or by the VPP itself. The amount of DR capacity that the CSP can assure is determined using data mining techniques applied to a database obtained for a large set of operation scenarios. The paper includes a case study based on 27,000 scenarios considering a diversity of distributed resources in a 33-bus distribution network.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

TPM Vol. 21, No. 4, December 2014, 435-447 – Special Issue © 2014 Cises.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of the model estimation and the selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.
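
For orientation, the model named in this abstract can be written out as follows. This is a sketch under stated assumptions, not a transcription of the paper: the notation (J categorical variables, C_j categories for variable j, weights α_k, category probabilities θ_kjc, responsibilities w_ik) is chosen here for illustration, and the weight update is the component-annihilating form commonly associated with the Figueiredo and Jain (2002) approach, which the paper may adapt differently.

```latex
% Finite mixture of multinomials for one observation y = (y_1, ..., y_J)
f(y \mid \Theta) = \sum_{k=1}^{K} \alpha_k
    \prod_{j=1}^{J} \prod_{c=1}^{C_j} \theta_{kjc}^{\,\mathbb{1}[y_j = c]},
\qquad \sum_{k=1}^{K} \alpha_k = 1 .

% E-step: responsibility of component k for observation i
w_{ik} = \frac{\alpha_k \prod_{j}\prod_{c} \theta_{kjc}^{\,\mathbb{1}[y_{ij} = c]}}
              {\sum_{l=1}^{K} \alpha_l \prod_{j}\prod_{c} \theta_{ljc}^{\,\mathbb{1}[y_{ij} = c]}} .

% M-step for the weights, with N = \sum_{j=1}^{J} (C_j - 1) free parameters
% per component; a component whose support falls below N/2 is removed.
\alpha_k = \frac{\max\{0,\ \sum_{i} w_{ik} - N/2\}}
                {\sum_{l=1}^{K} \max\{0,\ \sum_{i} w_{il} - N/2\}} .
```

Because weakly supported components are driven to zero weight during estimation, the number of clusters is selected within a single run rather than by comparing separately fitted candidate models, which is the integration the abstract emphasises.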

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully exploits the nature of the co-association matrix. Our solution determines probabilistic assignments of data points to clusters by minimizing a Bregman divergence between the observed co-association frequencies and the corresponding co-occurrence probabilities expressed as functions of the unknown assignments. We additionally propose an optimization algorithm to find a solution under any double-convex Bregman divergence. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
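
The co-association matrix at the centre of the EAC paradigm is simple to construct, as the following sketch shows. It covers only that construction, not the Bregman-divergence optimisation the paper proposes; the function name `co_association` and the example partitions are hypothetical. Entry (i, j) is the fraction of base partitions that place points i and j in the same cluster.

```python
def co_association(partitions):
    """Build the pairwise co-association matrix of a clustering ensemble.

    partitions is a list of label lists, one per base clustering, all
    describing the same n data points in the same order.
    """
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all partitions must label the same number of points")
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                # Only co-membership matters, not the label values themselves.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[count / m for count in row] for row in counts]


# Three hypothetical base clusterings of four points. The first two describe
# the same grouping with swapped label names.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
matrix = co_association(ensemble)
for row in matrix:
    print([round(value, 2) for value in row])
```

The first two partitions contribute identically even though their labels are permuted, which illustrates why working with pairwise co-occurrence avoids the label correspondence problem mentioned in the abstract.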

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The Evidence Accumulation Clustering (EAC) paradigm is a clustering ensemble method which derives a consensus partition from a collection of base clusterings obtained using different algorithms. It collects from the partitions in the ensemble a set of pairwise observations about the co-occurrence of objects in the same cluster, and it uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix. The Probabilistic Evidence Accumulation for Clustering Ensembles (PEACE) algorithm is a principled approach for extracting a consensus clustering from the observations encoded in the co-association matrix, based on a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters. In this paper we extend the PEACE algorithm by deriving a consensus solution according to a MAP approach with Dirichlet priors defined for the unknown probabilistic cluster assignments. In particular, we study the positive regularization effect of Dirichlet priors on the final consensus solution with both synthetic and real benchmark data.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This article deals with a real-life waste collection routing problem. To efficiently plan waste collection, large municipalities may be partitioned into convenient sectors and only then can routing problems be solved in each sector. Three diverse situations are described, resulting in three different new models. In the first situation, there is a single point of waste disposal from where the vehicles depart and to where they return. The vehicle fleet comprises three types of collection vehicles. In the second, the garage does not match any of the points of disposal. The vehicle is unique and the points of disposal (landfills or transfer stations) may have limitations in terms of the number of visits per day. In the third situation, disposal points are multiple (they do not coincide with the garage), they are limited in the number of visits, and the fleet is composed of two types of vehicles. Computational results based not only on instances adapted from the literature but also on real cases are presented and analyzed. In particular, the results also show the effectiveness of combining sectorization and routing to solve waste collection problems.