962 results for: Clustering a large document collection


Relevance:

40.00%

Publisher:

Abstract:

Globalization involves several facility location problems that need to be handled at large scale. Location Allocation (LA) is a combinatorial problem in which the distances among points in the data space matter. Taking advantage of this distance property of the domain, we exploit the ability of clustering techniques to partition the data space, converting an initial large LA problem into several simpler LA problems. In particular, our motivating problem involves a huge geographical area that can be partitioned under overall conditions. We present different types of clustering techniques and then perform a cluster analysis over our dataset in order to partition it. After that, we solve the LA problem by applying a simulated annealing algorithm to both the clustered and the non-clustered data, in order to work out how profitable the clustering is and which of the presented methods is the most suitable.
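As an illustration of this cluster-then-solve strategy, the hedged sketch below partitions demand points with k-means and then runs a simple simulated annealing search for facility locations inside each partition. The data, the number of clusters, and the annealing schedule are illustrative assumptions, not the setup used in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = rng.uniform(0, 100, size=(2000, 2))   # hypothetical demand points

def total_cost(facilities, pts):
    """Sum of distances from each point to its nearest facility."""
    d = np.linalg.norm(pts[:, None, :] - facilities[None, :, :], axis=2)
    return d.min(axis=1).sum()

def simulated_annealing(pts, n_facilities=3, iters=2000, t0=1.0, alpha=0.999):
    """Tiny SA: perturb facility coordinates, accept worse moves with prob exp(-delta/T)."""
    facilities = pts[rng.choice(len(pts), n_facilities, replace=False)].copy()
    best, best_cost = facilities.copy(), total_cost(facilities, pts)
    cost, temp = best_cost, t0
    for _ in range(iters):
        cand = facilities + rng.normal(0, 1.0, facilities.shape)
        cand_cost = total_cost(cand, pts)
        if cand_cost < cost or rng.random() < np.exp(-(cand_cost - cost) / temp):
            facilities, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = facilities.copy(), cost
        temp *= alpha
    return best, best_cost

# Cluster first, then solve one small LA problem per cluster.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(points)
clustered_cost = sum(simulated_annealing(points[labels == k])[1] for k in range(5))

# Baseline: one large LA problem on the full dataset.
_, global_cost = simulated_annealing(points, n_facilities=15, iters=4000)
print(f"clustered total cost: {clustered_cost:.1f}, non-clustered cost: {global_cost:.1f}")
```

Comparing the two totals gives a rough sense of how profitable the partitioning is for a given instance.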

Relevance:

40.00%

Publisher:

Abstract:

In this study, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) was used as a rapid method to identify yeasts isolated from patients in Tunisian hospitals. When identification could not be established with this procedure, sequencing of the internal transcribed spacer with 5.8S ribosomal DNA (rDNA) (ITS1-5.8S-ITS2) and of the D1/D2 domain of the large-subunit rDNA (LSU rDNA) was employed as a molecular approach for species differentiation. Candida albicans was the dominant species (43.37% of all cases), followed by C. glabrata (16.55%), C. parapsilosis (13.23%), C. tropicalis (11.34%), C. dubliniensis (4.96%), and other species more rarely encountered in human disease such as C. krusei, C. metapsilosis, C. lusitaniae, C. kefyr, C. palmioleophila, C. guilliermondii, C. intermedia, C. orthopsilosis, and C. utilis. In addition, other yeast species were obtained, including Saccharomyces cerevisiae, Debaryomyces hansenii (anamorph known as C. famata), Hanseniaspora opuntiae, Kodamaea ohmeri, Pichia caribbica (anamorph known as C. fermentati), Trichosporon spp., and finally a novel yeast species, C. tunisiensis. The in vitro antifungal activities of fluconazole and voriconazole were determined by the agar disk diffusion test and Etest, while susceptibility to additional antifungal agents was determined with the Sensititre YeastOne system. Our results showed a low incidence of azole resistance in C. albicans (0.54%), C. tropicalis (2.08%), and C. glabrata (4.28%). In addition, caspofungin was active against most isolates of the collection, with the exception of two K. ohmeri isolates. This is the first report to describe caspofungin-resistant isolates of this yeast.

Relevance:

40.00%

Publisher:

Abstract:

The coverage and volume of geo-referenced datasets are extensive and incessantly growing. The systematic capture of geo-referenced information generates large volumes of spatio-temporal data to be analyzed. Clustering and visualization play a key role in the exploratory data analysis and the extraction of knowledge embedded in these data. However, new challenges in visualization and clustering are posed by the special characteristics of these data: for instance, their complex structures, large number of samples, variables involved in a temporal context, high dimensionality, and large variability in cluster shapes. The central aim of my thesis is to propose new algorithms and methodologies for clustering and visualization, in order to assist the extraction of knowledge from spatio-temporal geo-referenced data, thus improving decision-making processes. I present two original algorithms, one for clustering, the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON), and one for exploratory visual data analysis, the Tree-structured Self-Organizing Maps Component Planes. In addition, I present methodologies that, combined with FGHSON and the Tree-structured SOM Component Planes, integrate space and time seamlessly and simultaneously in order to extract knowledge embedded in a temporal context. The originality of the FGHSON lies in its capability to reflect the underlying structure of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of clusters is crucial when data include complex structures with large variability of cluster shapes, variances, densities and number of clusters. The most important characteristics of the FGHSON include: (1) it does not require an a priori setup of the number of clusters; (2) the algorithm executes several self-organizing processes in parallel, hence, when dealing with large datasets, the processes can be distributed, reducing the computational cost; and (3) only three parameters are necessary to set up the algorithm. In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm lies in its ability to create a structure that allows the visual exploratory data analysis of large high-dimensional datasets. This algorithm creates a hierarchical structure of Self-Organizing Map Component Planes, arranging similar variables' projections in the same branches of the tree. Hence, similarities in variables' behavior can be easily detected (e.g., local correlations, maximal and minimal values, and outliers). Both FGHSON and the Tree-structured SOM Component Planes were applied to several agroecological problems, proving to be very efficient in the exploratory analysis and clustering of spatio-temporal datasets. In this thesis I also tested three soft competitive learning algorithms. Two of them are well-known unsupervised soft competitive algorithms, namely the Self-Organizing Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); the third is our original contribution, the FGHSON. Although the algorithms presented here have been used in several areas, to my knowledge no previous work has applied and compared the performance of these techniques on spatio-temporal geospatial data, as is presented in this thesis. I propose original methodologies to explore spatio-temporal geo-referenced datasets through time. Our approach uses time windows to capture temporal similarities and variations by using the FGHSON clustering algorithm. The developed methodologies are used in two case studies. In the first, the objective was to find similar agroecozones through time, and in the second to find similar environmental patterns shifted in time. Several results presented in this thesis have led to new contributions to agroecological knowledge, for instance in sugar cane and blackberry production. Finally, in the framework of this thesis we developed several software tools: (1) a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called BIS (Bio-inspired Identification of Similar agroecozones), an interactive graphical user interface tool that integrates the FGHSON algorithm with Google Earth in order to show zones with similar agroecological characteristics.
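The FGHSON and Tree-structured Component Planes themselves are not specified in this abstract, so the sketch below only illustrates the general time-window idea mentioned above: a plain self-organizing map, written directly in numpy, is fitted to each time window of a spatio-temporal dataset so that windows can be compared through their prototype vectors. The grid size, learning schedule, and synthetic data are assumptions for illustration, not the thesis's algorithms.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5):
    """Minimal online SOM: returns prototype vectors on a small 2-D grid."""
    n_nodes = grid[0] * grid[1]
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], float)
    weights = data[rng.choice(len(data), n_nodes, replace=False)].astype(float).copy()
    n_steps = epochs * len(data)
    for step in range(n_steps):
        x = data[rng.integers(len(data))]
        frac = step / n_steps
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.3
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))          # best-matching unit
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)                  # neighbourhood update
    return weights

# Hypothetical spatio-temporal records: (x, y, variable) sampled over 3 time windows.
windows = [rng.normal(loc=m, scale=1.0, size=(500, 3)) for m in (0.0, 0.5, 2.0)]
prototypes = [train_som(w) for w in windows]

# Compare consecutive windows through the mean shift of their SOM prototypes.
for t in range(1, len(prototypes)):
    shift = np.linalg.norm(prototypes[t].mean(axis=0) - prototypes[t - 1].mean(axis=0))
    print(f"window {t-1} -> {t}: mean prototype shift = {shift:.2f}")
```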

Relevance:

40.00%

Publisher:

Abstract:

We present a new framework for large-scale data clustering. The main idea is to modify functional dimensionality reduction techniques to directly optimize over discrete labels using stochastic gradient descent. Compared to methods like spectral clustering, our approach solves a single optimization problem rather than an ad hoc two-stage optimization, does not require a matrix inversion, can easily encode prior knowledge in the set of implementable functions, and does not have an "out-of-sample" problem. Experimental results on both artificial and real-world datasets show the usefulness of our approach.
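The framework itself is not spelled out in this abstract, so the following is only a loosely related sketch of the general idea of optimizing a clustering objective with stochastic gradient descent: a minibatch-free stochastic-gradient variant of k-means in which centroids are updated one sample at a time. It is not the authors' method; data and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: three Gaussian blobs in 2-D.
X = np.vstack([rng.normal(c, 0.4, size=(400, 2)) for c in ((0, 0), (3, 0), (0, 3))])
rng.shuffle(X)

def sgd_kmeans(X, k=3, epochs=5, lr0=0.5):
    """Stochastic-gradient k-means: each sample pulls its nearest centroid toward it."""
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    counts = np.zeros(k)
    for epoch in range(epochs):
        for x in X[rng.permutation(len(X))]:
            j = np.argmin(((centroids - x) ** 2).sum(axis=1))   # discrete label for this sample
            counts[j] += 1
            lr = lr0 / counts[j]                                 # per-centroid decaying step size
            centroids[j] += lr * (x - centroids[j])              # gradient step on the k-means loss
    return centroids

centroids = sgd_kmeans(X)
labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2), axis=1)
print("cluster sizes:", np.bincount(labels))
```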

Relevance:

40.00%

Publisher:

Abstract:

La biologie de la conservation est communément associée à la protection de petites populations menacées d'extinction. Pourtant, il peut également être nécessaire de soumettre à gestion des populations surabondantes ou susceptibles d'une trop grande expansion, dans le but de prévenir les effets néfastes de la surpopulation. Du fait des différences tant quantitatives que qualitatives entre protection des petites populations et contrôle des grandes, il est nécessaire de disposer de modèles et de méthodes distinctes. L'objectif de ce travail a été de développer des modèles prédictifs de la dynamique des grandes populations, ainsi que des logiciels permettant de calculer les paramètres de ces modèles et de tester des scénarios de gestion. Le cas du Bouquetin des Alpes (Capra ibex ibex) - en forte expansion en Suisse depuis sa réintroduction au début du XXème siècle - servit d'exemple. Cette tâche fut accomplie en trois étapes : En premier lieu, un modèle de dynamique locale, spécifique au Bouquetin, fut développé : le modèle sous-jacent - structuré en classes d'âge et de sexe - est basé sur une matrice de Leslie à laquelle ont été ajoutées la densité-dépendance, la stochasticité environnementale et la chasse de régulation. Ce modèle fut implémenté dans un logiciel d'aide à la gestion - nommé SIM-Ibex - permettant la maintenance de données de recensements, l'estimation automatisée des paramètres, ainsi que l'ajustement et la simulation de stratégies de régulation. Mais la dynamique d'une population est influencée non seulement par des facteurs démographiques, mais aussi par la dispersion et la colonisation de nouveaux espaces. Il est donc nécessaire de pouvoir modéliser tant la qualité de l'habitat que les obstacles à la dispersion. Une collection de logiciels - nommée Biomapper - fut donc développée. Son module central est basé sur l'Analyse Factorielle de la Niche Ecologique (ENFA) dont le principe est de calculer des facteurs de marginalité et de spécialisation de la niche écologique à partir de prédicteurs environnementaux et de données d'observation de l'espèce. Tous les modules de Biomapper sont liés aux Systèmes d'Information Géographiques (SIG) ; ils couvrent toutes les opérations d'importation des données, préparation des prédicteurs, ENFA et calcul de la carte de qualité d'habitat, validation et traitement des résultats ; un module permet également de cartographier les barrières et les corridors de dispersion. Le domaine d'application de l'ENFA fut exploré par le biais d'une distribution d'espèce virtuelle. La comparaison à une méthode couramment utilisée pour construire des cartes de qualité d'habitat, le Modèle Linéaire Généralisé (GLM), montra qu'elle était particulièrement adaptée pour les espèces cryptiques ou en cours d'expansion. Les informations sur la démographie et le paysage furent finalement fusionnées en un modèle global. Une approche basée sur un automate cellulaire fut choisie, tant pour satisfaire aux contraintes du réalisme de la modélisation du paysage qu'à celles imposées par les grandes populations : la zone d'étude est modélisée par un pavage de cellules hexagonales, chacune caractérisée par des propriétés - une capacité de soutien et six taux d'imperméabilité quantifiant les échanges entre cellules adjacentes - et une variable, la densité de la population. Cette dernière varie en fonction de la reproduction et de la survie locale, ainsi que de la dispersion, sous l'influence de la densité-dépendance et de la stochasticité.
Un logiciel - nommé HexaSpace - fut développé pour accomplir deux fonctions : 1° Calibrer l'automate sur la base de modèles de dynamique (par ex. calculés par SIM-Ibex) et d'une carte de qualité d'habitat (par ex. calculée par Biomapper). 2° Faire tourner des simulations. Il permet d'étudier l'expansion d'une espèce envahisseuse dans un paysage complexe composé de zones de qualité diverses et comportant des obstacles à la dispersion. Ce modèle fut appliqué à l'histoire de la réintroduction du Bouquetin dans les Alpes bernoises (Suisse). SIM-Ibex est actuellement utilisé par les gestionnaires de la faune et par les inspecteurs du gouvernement pour préparer et contrôler les plans de tir. Biomapper a été appliqué à plusieurs espèces (tant végétales qu'animales) à travers le monde. De même, même si HexaSpace fut initialement conçu pour des espèces animales terrestres, il pourrait aisément être étendu à la propagation de plantes ou à la dispersion d'animaux volants. Ces logiciels étant conçus pour, à partir de données brutes, construire un modèle réaliste complexe, et du fait qu'ils sont dotés d'une interface d'utilisation intuitive, ils sont susceptibles de nombreuses applications en biologie de la conservation. En outre, ces approches peuvent également s'appliquer à des questions théoriques dans les domaines de l'écologie des populations et du paysage.

Conservation biology is commonly associated with the protection of small and endangered populations. Nevertheless, large or potentially large populations may also need human management to prevent the negative effects of overpopulation. As there are both qualitative and quantitative differences between protecting small populations and controlling large ones, distinct methods and models are needed. The aim of this work was to develop theoretical models to predict large population dynamics, as well as computer tools to assess the parameters of these models and to test management scenarios. The Alpine Ibex (Capra ibex ibex) - which has experienced a spectacular increase since its reintroduction in Switzerland at the beginning of the 20th century - was used as a paradigm species. This task was achieved in three steps: A local population dynamics model was first developed specifically for the Ibex: the underlying age- and sex-structured model is based on a Leslie matrix approach with the addition of density-dependence, environmental stochasticity and culling. This model was implemented in a management-support software package - named SIM-Ibex - allowing census data maintenance, automated parameter estimation, and the tuning and simulation of culling strategies. However, population dynamics is driven not only by demographic factors, but also by dispersal and the colonisation of new areas. Modelling habitat suitability and obstacles to dispersal therefore also had to be addressed. Thus, a software package - named Biomapper - was developed. Its central module is based on the Ecological Niche Factor Analysis (ENFA), whose principle is to compute niche marginality and specialisation factors from a set of environmental predictors and species presence data. All Biomapper modules are linked to Geographic Information Systems (GIS); they cover all operations of data importation, predictor preparation, ENFA and habitat suitability map computation, and results validation and further processing; a module also allows the mapping of dispersal barriers and corridors. The ENFA application domain was then explored by means of a simulated species distribution.
It was compared to a common habitat suitability assessment method, the Generalised Linear Model (GLM), and proved better suited for spreading or cryptic species. Demography and landscape information were finally merged into a global model. To cope with landscape realism and the technical constraints of large population modelling, a cellular automaton approach was chosen: the study area is modelled by a lattice of hexagonal cells, each one characterised by a few fixed properties - a carrying capacity and six impermeability rates quantifying exchanges between adjacent cells - and one variable, population density. The latter varies according to local reproduction/survival and dispersal dynamics, modified by density-dependence and stochasticity. A software tool - named HexaSpace - was developed, which fulfils two functions: 1° calibrating the automaton on the basis of local population dynamics models (e.g., computed by SIM-Ibex) and a habitat suitability map (e.g., computed by Biomapper); 2° running simulations. It allows studying the spread of an invading species across a complex landscape made of variously suitable areas and dispersal barriers. This model was applied to the history of the Ibex reintroduction in the Bernese Alps (Switzerland). SIM-Ibex is now used by governmental wildlife managers to prepare and verify culling plans. Biomapper has been applied to several species (both plants and animals) around the world. In the same way, whilst HexaSpace was originally designed for terrestrial animal species, it could easily be extended to model plant propagation or the dispersal of flying animals. As these software tools were designed to proceed from low-level data to a complex, realistic model, and as they benefit from an intuitive user interface, they may have many applications in conservation biology. Moreover, theoretical questions in the fields of population and landscape ecology might also be addressed by these approaches.
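The Leslie-matrix core described above can be illustrated with a short projection. The sketch below builds a small age-structured, female-only matrix and projects it forward with a simple density-dependent survival scaling and lognormal environmental noise; all vital rates and the carrying capacity are invented for illustration, not the parameters estimated in SIM-Ibex.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical 4-age-class Leslie matrix (females only): top row = fecundities,
# sub-diagonal = survival probabilities to the next age class.
fecundity = np.array([0.0, 0.4, 0.8, 0.6])
survival = np.array([0.6, 0.8, 0.7])
L = np.zeros((4, 4))
L[0, :] = fecundity
L[np.arange(1, 4), np.arange(0, 3)] = survival

K = 500.0                                   # illustrative carrying capacity
n = np.array([200.0, 80.0, 50.0, 30.0])     # initial abundances per age class

for year in range(25):
    density_factor = max(0.0, 1.0 - n.sum() / K)   # simple logistic-style damping
    noise = rng.lognormal(mean=0.0, sigma=0.15)    # environmental stochasticity
    step = L.copy()
    step[1:, :] = np.clip(step[1:, :] * density_factor * noise, 0.0, 1.0)  # scale survival only
    n = step @ n

print("abundance per age class after 25 years:", np.round(n, 1))
print("total:", round(n.sum(), 1))
```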

Relevance:

40.00%

Publisher:

Abstract:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or by high latency in communication paths. The lack of scalable and fault-tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
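EpidemicK-Means itself is not detailed in the abstract, so the sketch below only illustrates the underlying gossip idea: every node keeps per-centroid partial sums and counts for its local data and repeatedly averages them with a random peer, so all nodes converge toward the global centroid update without any global reduction. The topology, data, and number of gossip rounds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
k, dim, n_nodes = 3, 2, 10

# Each node holds a local shard of the data (hypothetical blobs).
shards = [np.vstack([rng.normal(c, 0.5, size=(50, dim))
                     for c in ((0, 0), (4, 0), (0, 4))]) for _ in range(n_nodes)]
centroids = rng.normal(size=(k, dim))          # shared initial centroids

for it in range(5):                            # a few k-means iterations
    # 1) Local statistics: per-centroid sums and counts on each node.
    stats = []
    for X in shards:
        lab = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2), axis=1)
        sums = np.array([X[lab == j].sum(axis=0) for j in range(k)])
        counts = np.bincount(lab, minlength=k).astype(float)
        stats.append([sums, counts])

    # 2) Gossip: repeated pairwise averaging approximates the network-wide average.
    for _ in range(30):
        a, b = rng.choice(n_nodes, size=2, replace=False)
        for part in range(2):
            avg = (stats[a][part] + stats[b][part]) / 2.0
            stats[a][part] = stats[b][part] = avg

    # 3) Every node can now form (approximately) the same global centroids.
    sums, counts = stats[0]
    centroids = sums / np.maximum(counts, 1e-9)[:, None]

print("estimated centroids:\n", np.round(centroids, 2))
```

Because pairwise averaging preserves the network-wide mean, the averaged sums and counts recover (approximately) the same centroid update a centralised algorithm would compute over the aggregated data.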

Relevance:

40.00%

Publisher:

Abstract:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm in which the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, which allows a further reduction of the communication costs.
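For context, the sketch below shows the baseline formulation the abstract refers to: each worker computes per-cluster partial sums and counts over its local data, and a global Allreduce combines them at every iteration. It uses mpi4py as a stand-in parallel runtime (an assumption, not the authors' implementation) and invented data; run it under MPI, e.g. `mpiexec -n 4 python parallel_kmeans.py` (the file name is hypothetical).

```python
# parallel_kmeans.py -- baseline parallel k-means with a per-iteration global reduction.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

k, dim = 3, 2
rng = np.random.default_rng(rank)              # each worker owns a local shard
local_X = np.vstack([rng.normal(c, 0.5, size=(1000, dim))
                     for c in ((0, 0), (5, 0), (0, 5))])

# All workers start from the same centroids (broadcast from rank 0).
centroids = np.empty((k, dim))
if rank == 0:
    centroids[:] = rng.normal(size=(k, dim))
comm.Bcast(centroids, root=0)

for it in range(10):
    # Local assignment step and local partial statistics.
    d = ((local_X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    local_sums = np.array([local_X[labels == j].sum(axis=0) for j in range(k)])
    local_counts = np.bincount(labels, minlength=k).astype(float)

    # Global reduction: this collective is the scalability bottleneck discussed above.
    global_sums = np.empty_like(local_sums)
    global_counts = np.empty_like(local_counts)
    comm.Allreduce(local_sums, global_sums, op=MPI.SUM)
    comm.Allreduce(local_counts, global_counts, op=MPI.SUM)

    centroids = global_sums / np.maximum(global_counts, 1e-9)[:, None]

if rank == 0:
    print("final centroids:\n", np.round(centroids, 2))
```

The formulation studied in the abstract relaxes exactly these two Allreduce calls by exploiting a non-uniform, spatially partitioned data distribution.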

Relevance:

40.00%

Publisher:

Abstract:

Some recent winters in Western Europe have been characterized by the occurrence of multiple extratropical cyclones following a similar path. The occurrence of such cyclone clusters leads to large socio-economic impacts due to damaging winds, storm surges, and floods. Recent studies have statistically characterized the clustering of extratropical cyclones over the North Atlantic and Europe and hypothesized potential physical mechanisms responsible for their formation. Here we analyze four months characterized by multiple cyclones over Western Europe (February 1990, January 1993, December 1999, and January 2007). The evolution of the eddy-driven jet stream, Rossby wave breaking, and upstream/downstream cyclone development is investigated to infer the role of the large-scale flow and to determine whether clustered cyclones are related to each other. Results suggest that optimal conditions for the occurrence of cyclone clusters are provided by a recurrent extension of an intensified eddy-driven jet toward Western Europe lasting at least one week. Multiple Rossby wave-breaking occurrences on both the poleward and equatorward flanks of the jet contribute to the development of these anomalous large-scale conditions. The analysis of the daily weather charts reveals that upstream cyclone development (secondary cyclogenesis, where new cyclones are generated on the trailing fronts of mature cyclones) is strongly related to cyclone clustering, with multiple cyclones developing on a single jet streak. The present analysis permits a deeper understanding of the physical reasons leading to the occurrence of cyclone families over the North Atlantic, enabling a better estimation of the associated cumulative risk over Europe.

Relevance:

40.00%

Publisher:

Abstract:

One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels have to be built using only the terms in the documents of the collection. This paper presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules for the selection of good candidates for labels of hierarchical document clusters. The candidates are processed by a classical method to generate the labels. The idea of the proposed method is to process each parent-child relationship of the nodes as an antecedent-consequent relationship of association rules. The experimental results show that the proposed method can improve the precision and recall of labels obtained by classical methods. © 2010 Springer-Verlag.
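SeCLAR's exact rule-generation procedure is not given in the abstract, so the sketch below only illustrates the flavour of using association-rule measures to pick label candidates: treating each document as a transaction of terms and child-cluster membership as the rule consequent, it ranks the terms of a child cluster by the support and confidence of the rule {term} => {child}. The toy documents and thresholds are assumptions, not the SeCLAR algorithm.

```python
from collections import Counter

# Hypothetical parent cluster: documents as sets of terms; a subset forms the child cluster.
parent_docs = [
    {"neural", "network", "training"}, {"neural", "network", "layers"},
    {"svm", "kernel", "margin"}, {"svm", "kernel", "training"},
    {"neural", "deep", "network"}, {"kernel", "margin", "svm"},
]
child_idx = {0, 1, 4}        # documents that fell into one child node of the hierarchy

term_freq = Counter(t for d in parent_docs for t in d)
child_freq = Counter(t for i in child_idx for t in parent_docs[i])
n = len(parent_docs)

candidates = []
for term, tf in term_freq.items():
    support = child_freq[term] / n              # P(term and child) over the parent's documents
    confidence = child_freq[term] / tf          # P(child | term), rule {term} => {child}
    if support >= 0.2 and confidence >= 0.8:
        candidates.append((term, support, confidence))

# Highest-confidence terms become label candidates for a classical labelling method.
for term, s, c in sorted(candidates, key=lambda x: (-x[2], -x[1])):
    print(f"{term}: support={s:.2f} confidence={c:.2f}")
```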

Relevance:

40.00%

Publisher:

Abstract:

The post-processing of association rules is a difficult task, since a large number of patterns can be obtained. Many approaches have been developed to overcome this problem, such as objective measures and clustering, which are respectively used to: (i) highlight the potentially interesting knowledge in the domain; (ii) structure the domain, organizing the rules in groups that contain, in some sense, similar knowledge. However, objective measures neither reduce nor organize the collection of rules, making the domain difficult to understand. On the other hand, clustering neither reduces the exploration space nor directs the user toward interesting knowledge, so the search for relevant knowledge remains hard. This work proposes the PAR-COM (Post-processing Association Rules with Clustering and Objective Measures) methodology which, by combining clustering and objective measures, reduces the association rule exploration space and directs the user to what is potentially interesting. Thereby, PAR-COM minimizes the user's effort during the post-processing process.
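PAR-COM's concrete steps are not described here, so the following sketch only illustrates the combination of the two ingredients named above: association rules are grouped by hierarchical clustering over the Jaccard distance of their item sets, and an objective measure (lift) then surfaces the most promising rule in each group. The toy rules, distance, measure, and cut threshold are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical mined rules: (antecedent, consequent, support, confidence, lift).
rules = [
    ({"bread"}, {"butter"}, 0.30, 0.80, 1.6),
    ({"bread", "milk"}, {"butter"}, 0.20, 0.85, 1.7),
    ({"beer"}, {"chips"}, 0.15, 0.70, 2.1),
    ({"beer", "chips"}, {"salsa"}, 0.10, 0.60, 2.5),
    ({"milk"}, {"bread"}, 0.25, 0.65, 1.3),
]

def jaccard_dist(r1, r2):
    a, b = r1[0] | r1[1], r2[0] | r2[1]          # item sets of the two rules
    return 1.0 - len(a & b) / len(a | b)

n = len(rules)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = jaccard_dist(rules[i], rules[j])

# Average-linkage clustering on the condensed distance matrix, cut at distance 0.6.
groups = fcluster(linkage(squareform(dist), method="average"), t=0.6, criterion="distance")

# Within each group, the objective measure (lift) points the user to the "best" rule.
for g in sorted(set(groups)):
    idx = [i for i in range(n) if groups[i] == g]
    best = max(idx, key=lambda i: rules[i][4])
    ant, con, sup, conf, lift = rules[best]
    print(f"group {g}: {sorted(ant)} => {sorted(con)} (lift={lift})")
```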

Relevance:

40.00%

Publisher:

Abstract:

One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels must be built using only the terms in the documents of the collection. This paper presents the SeCLAR method, which explores the use of association rules in the selection of good candidates for labels of hierarchical document clusters. The purpose of this method is to select a subset of terms by exploring the relationships among the terms of each document. These candidates can then be processed by a classical method to generate the labels. An experimental study demonstrates the potential of the proposed approach to improve the precision and recall of labels obtained by classical methods, by considering only the terms which are potentially more discriminative. © 2012 - IOS Press and the authors. All rights reserved.

Relevance:

40.00%

Publisher:

Abstract:

Bioinformatics has played a fundamental role in the last few decades in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the major problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As has recently been pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. The present thesis describes a non-hierarchical sequence clustering method for large-scale automatic protein annotation, called "The Bologna Annotation Resource Plus" (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is possible by means of cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications have been developed during my doctorate, including the prediction of magnesium-binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.
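As a loose illustration of annotation transfer within sequence clusters, the sketch below groups sequences with a union-find structure whenever a pairwise alignment exceeds identity and coverage thresholds, and then propagates GO terms that are dominant inside each cluster. The pairwise scores, thresholds, and validation rule are invented stand-ins, not BAR+'s actual metric or statistics.

```python
from collections import defaultdict

# Hypothetical pairwise alignment results: (seq_a, seq_b, %identity, %coverage).
pairs = [
    ("P1", "P2", 62.0, 95.0), ("P2", "P3", 55.0, 92.0),
    ("P4", "P5", 48.0, 97.0), ("P1", "P4", 22.0, 40.0),   # too weak to link clusters
]
annotations = {"P1": {"GO:0016787"}, "P2": {"GO:0016787"}, "P4": {"GO:0005524"}}
sequences = {"P1", "P2", "P3", "P4", "P5"}

parent = {s: s for s in sequences}
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

# Link sequences whose alignment passes illustrative identity/coverage thresholds.
for a, b, ident, cov in pairs:
    if ident >= 40.0 and cov >= 90.0:
        parent[find(a)] = find(b)

clusters = defaultdict(set)
for s in sequences:
    clusters[find(s)].add(s)

# Transfer a GO term to unannotated members if most annotated members carry it.
for members in clusters.values():
    counts = defaultdict(int)
    annotated = [s for s in members if s in annotations]
    for s in annotated:
        for go in annotations[s]:
            counts[go] += 1
    for go, c in counts.items():
        if annotated and c / len(annotated) > 0.5:
            for s in members - set(annotated):
                print(f"transfer {go} -> {s}")
```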

Relevance:

40.00%

Publisher:

Abstract:

BACKGROUND: Individual adaptation of the processed patient's blood volume (PBV) should reduce the number and/or duration of autologous peripheral blood progenitor cell (PBPC) collections. STUDY DESIGN AND METHODS: The duration of leukapheresis procedures was adapted by means of an interim analysis of harvested CD34+ cells to obtain the intended yield of CD34+ cells within as few and/or as short leukapheresis procedures as possible. Absolute efficiency (AE; CD34+ cells/kg body weight) and relative efficiency (RE; total CD34+ yield of a single apheresis/total number of preapheresis CD34+ cells) were calculated, assuming an intraapheresis recruitment if RE was greater than 1, and a yield prediction model for adults was generated. RESULTS: A total of 196 adults required a total of 266 PBPC collections. The median AE was 7.99 x 10(6), and the median RE was 1.76. The prediction model for AE showed a satisfactory predictive value for preapheresis CD34+ only. The prediction model for RE also showed a low predictive value (R2 = 0.36). Twenty-eight children underwent 44 PBPC collections. The median AE was 12.13 x 10(6), and the median RE was 1.62. Major complications comprised bleeding episodes related to central venous catheters (n = 4) and severe thrombocytopenia of less than 10 x 10(9) per L (n = 16). CONCLUSION: A CD34+ interim analysis is a suitable tool for individual adaptation of the duration of leukapheresis. During leukapheresis, a substantial recruitment of CD34+ cells was observed, resulting in an RE of greater than 1 in more than 75 percent of patients. The upper limit of processed PBV showing an intraapheresis CD34+ recruitment is higher than in a standard large-volume leukapheresis. Therefore, a reduction of the number of individually needed PBPC collections by means of a further escalation of the processed PBV seems possible.
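Using only the definitions given in the abstract (AE = CD34+ cells collected per kg body weight; RE = total CD34+ yield of a single apheresis divided by the total number of preapheresis circulating CD34+ cells), a small helper like the one below can flag intraapheresis recruitment when RE > 1. The numbers in the example are invented.

```python
def apheresis_efficiency(cd34_yield, body_weight_kg, preapheresis_cd34_total):
    """Absolute and relative efficiency of one PBPC collection, per the abstract's definitions."""
    ae = cd34_yield / body_weight_kg                 # CD34+ cells per kg body weight
    re = cd34_yield / preapheresis_cd34_total        # yield / circulating CD34+ before apheresis
    return ae, re, re > 1.0                          # recruitment assumed if RE > 1

# Illustrative (invented) single collection: 6.0e8 CD34+ cells harvested, 75 kg patient,
# 3.5e8 circulating CD34+ cells before the procedure.
ae, re, recruited = apheresis_efficiency(6.0e8, 75.0, 3.5e8)
print(f"AE = {ae:.2e} CD34+/kg, RE = {re:.2f}, intraapheresis recruitment: {recruited}")
```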

Relevance:

40.00%

Publisher:

Abstract:

BACKGROUND: Clostridium difficile is an important cause of intestinal infections in some animal species, and animals might be a reservoir for community-associated human infections. Here we describe a collection of animal-associated C. difficile strains from 12 countries based on the inclusion criterion of one strain (PCR ribotype) per animal species per laboratory. RESULTS: Altogether 112 isolates were collected and distributed into 38 PCR ribotypes with the agarose-based approach and 50 PCR ribotypes with the sequencer-based approach. Four PCR ribotypes were most prevalent in terms of number of isolates as well as in terms of number of different host species: 078 (14.3% of isolates; 4 hosts), 014/020 (11.6%; 8 hosts), 002 (5.4%; 4 hosts) and 012 (5.4%; 5 hosts). Two animal hosts were best represented: cattle with 31 isolates (20 PCR ribotypes; 7 countries) and pigs with 31 isolates (16 PCR ribotypes; 10 countries). CONCLUSIONS: These results show that although PCR ribotype 078 is often reported as the major animal C. difficile type, especially in pigs, the variability of strains in pigs and other animal hosts is substantial. The most common human PCR ribotypes (014/020 and 002) are also among the most prevalent animal-associated C. difficile strains worldwide. The widespread dissemination of toxigenic C. difficile and the considerable overlap in strain distribution between species raise further concerns about interspecies, including zoonotic, transmission of this critically important pathogen.

Relevance:

40.00%

Publisher:

Abstract:

The study aim was to determine whether using automated side loader (ASL) trucks in higher proportions compared to other types of trucks for residential waste collection results in lower injury rates (from all causes). The primary hypothesis was that the risk of injury was lower for workers who work with ASL trucks than for workers who work with other types of trucks used in residential waste collection. To test this hypothesis, data were collected from one of the nation's largest companies in the solid waste management industry. Different local operating units (i.e., facilities) in the company used different types of trucks to varying degrees, which created a special opportunity to examine refuse collection injuries and illnesses and the risk reduction potential of ASL trucks. The study design was ecological and analyzed end-of-year data provided by the company for calendar year 2007. During 2007, there were a total of 345 facilities which provided residential services. Each facility represented one observation. The dependent variable, injury and illness rate, was defined as a facility's total case incidence rate (TCIR) recorded in accordance with federal OSHA requirements for the year 2007. The TCIR is the rate of total recordable injury and illness cases per 100 full-time workers. The independent variable, percent of ASL trucks, was calculated by dividing the number of ASL trucks by the total number of residential trucks at each facility. Multiple linear regression models were estimated for the impact of the percent of ASL trucks on TCIR per facility. Adjusted analyses included three covariates: median number of hours worked per week for residential workers; median number of months of work experience for residential workers; and median age of residential workers. All analyses were performed with the statistical software Stata IC (version 11.0). The analyses included three approaches to classifying exposure, the percent of ASL trucks. The first approach included two levels of exposure: (1) 0% and (2) >0 - <100%. The second approach included three levels of exposure: (1) 0%, (2) ≥1 - <100%, and (3) 100%. The third approach included six levels of exposure to improve detection of a dose-response relationship: (1) 0%, (2) 1 to <25%, (3) 25 to <50%, (4) 50 to <75%, (5) 75 to <100%, and (6) 100%. None of the relationships between injury and illness rate and the percent ASL truck exposure levels was statistically significant (i.e., p < 0.05), even after adjustment for all three covariates. In summary, the present study shows some risk reduction impact of ASL trucks, but it is not statistically significant. The covariates demonstrated a varied yet more modest impact on the injury and illness rate, but again, none of the relationships between injury and illness rate and the covariates was statistically significant (i.e., p < 0.05). However, as an ecological study, the present study has the limitations inherent in such designs and warrants replication in an individual-level cohort design. Stronger conclusions are not warranted.
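A hedged sketch of an analysis along these lines, using statsmodels in Python rather than the Stata models described above and an entirely synthetic facility-level dataset (variable names and effect sizes are assumptions): it regresses TCIR on the percent of ASL trucks plus the three covariates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 345                                            # same number of facilities as in the study

# Synthetic facility-level data; coefficients below are invented for illustration only.
df = pd.DataFrame({
    "pct_asl":    rng.uniform(0, 100, n),          # percent of ASL trucks at the facility
    "hours":      rng.normal(45, 5, n),            # median weekly hours, residential workers
    "experience": rng.normal(36, 12, n),           # median months of work experience
    "age":        rng.normal(38, 6, n),            # median worker age
})
df["tcir"] = (6.0 - 0.01 * df["pct_asl"] - 0.02 * df["experience"]
              + rng.normal(0, 2.0, n)).clip(lower=0)   # recordable cases per 100 FTE workers

# Adjusted model: TCIR ~ percent ASL trucks + the three covariates.
model = smf.ols("tcir ~ pct_asl + hours + experience + age", data=df).fit()
print(model.summary())
print("p-value for pct_asl:", round(model.pvalues["pct_asl"], 3))
```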