895 results for error estimate
Abstract:
For the standard kernel density estimate, it is known that one can tune the bandwidth such that the expected L1 error is within a constant factor of the optimal L1 error (obtained when one is allowed to choose the bandwidth with knowledge of the density). In this paper, we pose the same problem for variable bandwidth kernel estimates where the bandwidths are allowed to depend upon the location. We show in particular that for positive kernels on the real line, for any data-based bandwidth, there exists a density for which the ratio of expected L1 error over optimal L1 error tends to infinity. Thus, the problem of tuning the variable bandwidth in an optimal manner is "too hard". Moreover, from the class of counterexamples exhibited in the paper, it appears that placing conditions on the densities (monotonicity, convexity, smoothness) does not help.
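As a rough illustration of the quantity discussed in this abstract, the sketch below computes the L1 error of a standard (fixed-bandwidth) kernel density estimate for a few bandwidths. It is not the paper's construction: the Gaussian kernel, the standard normal target density, the sample size, and the helper names `gaussian_kde`, `l1_error`, and `std_normal_pdf` are all assumptions chosen for the example.

```python
import math
import random


def gaussian_kde(sample, h):
    """Return a fixed-bandwidth kernel estimate with a Gaussian kernel (assumed kernel)."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return f_hat


def std_normal_pdf(x):
    """True density used for the illustration (an assumption, not from the paper)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def l1_error(f_hat, f, lo=-6.0, hi=6.0, steps=600):
    """Approximate the integral of |f_hat - f| on [lo, hi] with the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += abs(f_hat(x) - f(x))
    return total * dx


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Too small, moderate, and too large bandwidths (illustrative values).
errors = {h: l1_error(gaussian_kde(sample, h), std_normal_pdf) for h in (0.05, 0.4, 3.0)}
for h, err in errors.items():
    print(f"h = {h}: approximate L1 error = {err:.3f}")
```

In this setup the moderate bandwidth should give a smaller L1 error than either extreme, which is the sense in which a fixed bandwidth can be tuned. The abstract's negative result concerns location-dependent bandwidths, which this sketch does not implement.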
Abstract:
This paper explores three aspects of strategic uncertainty: its relation to risk, predictability of behavior and subjective beliefs of players. In a laboratory experiment we measure subjects' certainty equivalents for three coordination games and one lottery. Behavior in coordination games is related to risk aversion, experience seeking, and age. From the distribution of certainty equivalents we estimate probabilities for successful coordination in a wide range of games. For many games, success of coordination is predictable with a reasonable error rate. The best response to observed behavior is close to the global-game solution. Comparing choices in coordination games with revealed risk aversion, we estimate subjective probabilities for successful coordination. In games with a low coordination requirement, most subjects underestimate the probability of success. In games with a high coordination requirement, most subjects overestimate this probability. Estimating probabilistic decision models, we show that the quality of predictions can be improved when individual characteristics are taken into account. Subjects' behavior is consistent with probabilistic beliefs about the aggregate outcome, but inconsistent with probabilistic beliefs about individual behavior.
Abstract:
Nonlinear regression problems can often be reduced to linearity by transforming the response variable (e.g., using the Box-Cox family of transformations). The classic estimates of the parameter defining the transformation as well as of the regression coefficients are based on the maximum likelihood criterion, assuming homoscedastic normal errors for the transformed response. These estimates are nonrobust in the presence of outliers and can be inconsistent when the errors are nonnormal or heteroscedastic. This article proposes new robust estimates that are consistent and asymptotically normal for any unimodal and homoscedastic error distribution. For this purpose, a robust version of conditional expectation is introduced for which the prediction mean squared error is replaced with an M scale. This concept is then used to develop a nonparametric criterion to estimate the transformation parameter as well as the regression coefficients. A finite sample estimate of this criterion based on a robust version of smearing is also proposed. Monte Carlo experiments show that the new estimates compare favorably with respect to the available competitors.
Abstract:
There is a controversial debate about the effects of permanent disability benefits on labor market behavior. In this paper we estimate equations for deserving and receiving disability benefits to evaluate the award error as the difference in the probability of receiving and deserving, using survey data from Spain. Our results indicate that individuals aged between 55 and 59, the self-employed, and those working in the agricultural sector have a significantly higher probability than other individuals of receiving a benefit without deserving it. We also find evidence of gender discrimination, since males have a significantly higher probability of receiving a benefit without deserving it. This seems to confirm that disability benefits are being used as an instrument for exiting the labor market by some individuals approaching early retirement or by those who do not have the right to retire early. Given that the awarding process depends on the Social Security Provincial Department, this means that some departments are applying the disability requirements loosely when granting disability benefits.
Abstract:
Despite the advancement of phylogenetic methods to estimate speciation and extinction rates, their power can be limited under variable rates, in particular for clades with high extinction rates and small number of extant species. Fossil data can provide a powerful alternative source of information to investigate diversification processes. Here, we present PyRate, a computer program to estimate speciation and extinction rates and their temporal dynamics from fossil occurrence data. The rates are inferred in a Bayesian framework and are comparable to those estimated from phylogenetic trees. We describe how PyRate can be used to explore different models of diversification. In addition to the diversification rates, it provides estimates of the parameters of the preservation process (fossilization and sampling) and the times of speciation and extinction of each species in the data set. Moreover, we develop a new birth-death model to correlate the variation of speciation/extinction rates with changes of a continuous trait. Finally, we demonstrate the use of Bayes factors for model selection and show how the posterior estimates of a PyRate analysis can be used to generate calibration densities for Bayesian molecular clock analysis. PyRate is an open-source command-line Python program available at http://sourceforge.net/projects/pyrate/.
Abstract:
Consider the problem of testing k hypotheses simultaneously. In this paper, we discuss finite and large sample theory of stepdown methods that provide control of the familywise error rate (FWE). In order to improve upon the Bonferroni method or Holm's (1979) stepdown method, Westfall and Young (1993) make effective use of resampling to construct stepdown methods that implicitly estimate the dependence structure of the test statistics. However, their methods depend on an assumption called subset pivotality. The goal of this paper is to construct general stepdown methods that do not require such an assumption. In order to accomplish this, we take a close look at what makes stepdown procedures work, and a key component is a monotonicity requirement of critical values. By imposing such monotonicity on estimated critical values (which is not an assumption on the model but an assumption on the method), it is demonstrated that the problem of constructing a valid multiple test procedure which controls the FWE can be reduced to the problem of constructing a single test which controls the usual probability of a Type 1 error. This reduction allows us to draw upon an enormous resampling literature as a general means of test construction.
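For orientation, the two baseline procedures named in this abstract can be sketched as follows. This is only the classical Bonferroni rule and Holm's stepdown rule applied to p-values, not the resampling-based construction the paper develops. The function names and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / k (single-step rule)."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]


def holm_stepdown(p_values, alpha=0.05):
    """Holm's stepdown rule: test p-values in ascending order against
    alpha / k, alpha / (k - 1), ..., alpha / 1 and stop at the first failure."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    rejected = [False] * k
    for step, idx in enumerate(order):
        if p_values[idx] <= alpha / (k - step):
            rejected[idx] = True
        else:
            break  # all remaining (larger) p-values are not rejected
    return rejected


# Hypothetical p-values chosen so that the two rules disagree.
p = [0.010, 0.040, 0.020, 0.005]
print("Bonferroni:", bonferroni(p))
print("Holm:      ", holm_stepdown(p))
```

Holm's critical values grow from alpha / k to alpha as hypotheses are rejected, so it rejects everything Bonferroni rejects and possibly more, while still controlling the FWE. This monotone sequence of critical values is a simple instance of the monotonicity requirement the abstract refers to.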
Abstract:
This work is part of a project studying the performance of model-based estimators in a small area context. We have chosen a simple statistical application in which we estimate the growth rate of occupation for several regions of Spain. We compare three estimators: the direct one based on straightforward results from the survey (which is unbiased), and a third one which is based on a statistical model and that minimizes the mean square error.
Abstract:
Given $n$ independent replicates of a jointly distributed pair $(X,Y)\in {\cal R}^d \times {\cal R}$, we wish to select from a fixed sequence of model classes ${\cal F}_1, {\cal F}_2, \ldots$ a deterministic prediction rule $f: {\cal R}^d \to {\cal R}$ whose risk is small. We investigate the possibility of empirically assessing the {\em complexity} of each model class, that is, the actual difficulty of the estimation problem within each class. The estimated complexities are in turn used to define an adaptive model selection procedure, which is based on complexity penalized empirical risk. The available data are divided into two parts. The first is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error. An estimate is chosen from the list of candidates in order to minimize the sum of class complexity and empirical risk. A distinguishing feature of the approach is that the complexity of each model class is assessed empirically, based on the size of its empirical cover. Finite sample performance bounds are established for the estimates, and these bounds are applied to several non-parametric estimation problems. The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand. In addition, it is shown that the estimate can be consistent, and even possess near optimal rates of convergence, when each model class has an infinite VC or pseudo dimension. For regression estimation with squared loss we modify our estimate to achieve a faster rate of convergence.
Abstract:
1. ABSTRACTS
SCIENTIFIC ABSTRACT - ENGLISH VERSION
Geometry, petrology and growth of a shallow crustal laccolith: the Torres del Paine Mafic Complex (Patagonia)
The Torres del Paine intrusive complex (TPIC) is a composite mafic-granitic intrusion, ~70 km², belonging to a chain of isolated Miocene plutons in southern Patagonia. Their position is intermediate between the Mesozoic-Cenozoic calc-alkaline subduction-related Patagonian batholith in the west and the late Cenozoic alkaline basaltic back-arc-related plateau lavas in the east. The Torres del Paine complex formed during an important reconfiguration of the Patagonian geodynamic setting, with a migration of magmatism from the arc to the back-arc, possibly related to the Chile ridge subduction. The complex intruded the flysch of the Cretaceous Cerro Toro and Punta Barrosa Formations during the Miocene, creating a well-defined narrow contact aureole 200-400 m wide. In its eastern part, the Torres del Paine intrusive complex is a laccolith, composed of a succession of hornblende-gabbro to diorite sills at its base, with a total thickness of ~250 m, showing brittle contacts with the overlying granitic sills, which form spectacular cliffs of more than 1000 m. This laccolith is connected, in the western part, to its feeding system, with vertical alternating sheets of layered gabbronorite and Hbl-gabbro, surrounded and percolated by diorites. ID-TIMS U-Pb dating of zircons from feeder zone (FZ) gabbros yields 12.593±0.009 Ma and 12.587±0.009 Ma, which is identical within error to the oldest granite dated so far by Michel et al. (2008).
In contrast, the laccolith mafic complex is younger than the youngest granite (12.50±0.02 Ma), and was emplaced from 12.472±0.009 Ma to 12.431±0.006 Ma, by under-accretion beneath the youngest granite at the interface with previously emplaced mafic sills. The gabbronorite crystallization sequence in the feeder zone is dominated by olivine, plagioclase, clinopyroxene and orthopyroxene, while amphibole forms late interstitial crystals. The crystallization sequence is identical in hornblende-gabbro from the feeder zone, with higher modal hornblende. Gabbronorite and hornblende-gabbro both display distinct positive Eu and Sr anomalies. In the laccolith, a lower hornblende-gabbro crystallized in sills and evolved to a high-alkali shoshonitic series. The Al2O3, TiO2, Na2O, K2O, Ba and Sr composition of these gabbros is highly variable and increases up to ~50 wt% SiO2. The lower hornblende-gabbro is characterized by anhedral kaersutite cores with inclusions of olivine, clino- and orthopyroxene, rare apatite and An70 plagioclase. Trace element modelling indicates that hornblende and clinopyroxene are in equilibrium with a liquid whose composition is similar to late basaltic trachyandesitic dikes that cut the complex. The matrix in the lower hornblende-gabbro is composed of normally zoned oligoclase, magnesio-hornblende, biotite, ilmenite and rare quartz and potassium feldspar. This assemblage crystallized in situ from a Ba- and Sr-depleted melt. In contrast, the upper Hbl-gabbro is high-K calc-alkaline. Poikilitic pargasite cores have inclusions of euhedral An70 plagioclase, and occasionally contain clinopyroxene, olivine and orthopyroxene. The matrix composition is identical to the lower hornblende-gabbro and similar to the diorite.
Diorite bulk rock compositions show the same mineralogy but different modal proportions relative to hornblende-gabbros. The Torres del Paine Intrusive Complex isotopic composition is 87Sr/86Sr=0.704, 143Nd/144Nd=0.5127, 206Pb/204Pb=18.70 and 207Pb/204Pb=15.65. Differentiated dioritic and granitic units may be linked to the gabbroic cumulate series, with 20-50% trapped interstitial melt, through fractionation of olivine-bearing gabbronorite or hornblende-gabbro. The relative homogeneity of the isotopic compositions indicates that only small amounts of assimilation occurred. Two-pyroxene thermometry, clinopyroxene barometry and amphibole-plagioclase thermometry were used to estimate pressure and temperature conditions. The early fractionation of ultramafic cumulates occurred at mid to lower crustal conditions, at temperatures exceeding 900°C. In contrast, the TPIC emplacement conditions have been estimated at ~0.7±0.5 kbar and 790±60°C. Based on field and microtextural observations and geochemical modelling, fractionation of basaltic-trachyandesitic liquids at intermediate to lower crustal levels led to the formation of the Torres del Paine granites. Repetitive replenishment of basaltic trachyandesitic liquid in crustal reservoirs led to mixed magmas that ascended via the feeder zone and crystallized into a laccolith, in the form of successive dioritic and gabbroic sills. Dynamic fractionation during emplacement concentrated hornblende-rich cumulates in the center of individual sills. Variable degrees of post-emplacement compaction led to the expulsion of felsic liquids that preferentially concentrated at the top of the sills.
Incremental sill amalgamation of the entire Torres del Paine Intrusive Complex lasted ~160 ka.
SCIENTIFIC ABSTRACT - FRENCH VERSION
Geometry, petrology and growth of a shallow laccolith: the Torres del Paine Mafic Complex (Patagonia)
The Torres del Paine Intrusive Complex (TPIC) is a bimodal intrusion of about 70 km², belonging to a chain of isolated Miocene plutons in southern Patagonia. Their position is intermediate between the calc-alkaline Patagonian batholith to the west, emplaced during the Mesozoic-Cenozoic in a subduction setting, and the younger alkaline plateau basaltic andesites and trachybasalts to the east, related to back-arc opening. At its eastern end, the TPIC is a succession of Hbl-gabbro and diorite sills, ~250 m thick, with evidence of mixing. The contacts with the overlying granite sills, which form cliffs more than 1000 m high, are brittle. This laccolith is connected, in its western part, to a feeder zone with alternating sub-vertical intrusions of layered gabbronorite and Hbl-gabbro. These are cut and surrounded by diorites. Zircons from the feeder-zone gabbros, dated by ID-TIMS, crystallized at 12.593±0.009 Ma and 12.587±0.009 Ma, which corresponds to the oldest granite dated so far by Michel et al. (2008). In contrast, the mafic rocks of the laccolith were emplaced between 12.472±0.009 Ma and 12.431±0.006 Ma, by successive under-plating at the interface with the youngest granite dated so far (12.50±0.02 Ma). The crystallization sequence of the gabbronorites is dominated by Ol, Plg, Cpx and Opx, whereas Hbl is an interstitial crystal. It is identical in the Hbl-gabbros of the feeder zone, with ~30 vol% Hbl. The feeder-zone gabbros show distinct positive Eu and Sr anomalies.
In the laccolith, the lower Hbl-gabbro evolves along a shoshonitic series rich in incompatible elements. Its Al2O3, TiO2, Na2O, K2O, Ba and Sr concentration is highly variable and increases rapidly up to ~50 wt% SiO2. It is characterized by resorbed kaersutite cores, rimmed by Bt and containing inclusions of Ol, Cpx and Opx, or else of Ap and rare Plg (An70). Hbl and Cpx crystallized from a liquid similar in composition to the basaltic trachyandesite dikes of the TPIC. The matrix, crystallized in situ from a Ba- and Sr-poor liquid, is composed of simply zoned oligoclase, Mg-Hbl, Bt, Ilm and rare Qtz and KF. The upper Hbl-gabbro, in contrast, belongs to a high-K calc-alkaline chemical suite. Poikilitic pargasite cores contain numerous inclusions of euhedral Plg (An70), as well as Ol, Cpx and Opx. The matrix composition is identical to that of the lower Hbl-gabbros, and both are similar to the mineralogy of the diorites. Whole-rock analyses of diorites show the same variability as those of Hbl-gabbros, but with a higher SiO2 content. The isotopic composition of the primitive liquids of the TPIC was measured at 87Sr/86Sr=0.704, 143Nd/144Nd=0.5127, 206Pb/204Pb=18.70 and 207Pb/204Pb=15.65. The differentiated granites and diorites can be linked to gabbronoritic cumulates (F=0.74 for the granites and F=1-0.5 for the diorites) and to Hbl-gabbroic cumulates (additional fractionation for the granites, with F=0.3). Crystallization of 20 to 50 vol% of interstitial liquid trapped in the TPIC gabbros explains their geochemical signature. Only small amounts of continental crust were assimilated. The temperature and pressure of fractionation were estimated, based on the Opx-Cpx, Hbl-Plg and Cpx thermobarometers, at more than 900°C and a depth corresponding to the mid to lower crust.
In contrast, the crystallization conditions of the matrix of the laccolith gabbros and diorites were estimated at 790±60°C and ~0.7±0.5 kbar. I propose that the felsic liquids of the TPIC formed by fractional crystallization at depth of the mineral assemblages observed in the TPIC gabbros, from a basaltic trachyandesite liquid. Percolation of magma through the accumulated crystals allows the mixture to rise through the feeder zone towards the laccolith, where sills are emplaced successively. Sill amalgamation in the TPIC lasted ~160 ka. The TPIC formed during a major reconfiguration of the geodynamic setting in Patagonia, with a shift from arc magmatism to back-arc volcanism. This change is most likely related to the subduction of the Chile ridge.
GENERAL-AUDIENCE SUMMARY - FRENCH VERSION
Geometry, petrology and growth of a shallow magma chamber: the Torres del Paine Mafic Complex (Patagonia)
The rim of the Pacific Ocean is characterized by a zone of tectonic plate convergence, called a subduction zone, with oceanic crust plunging beneath the Andes in the case of Patagonia. Numerous volcanoes are associated with it, forming the Ring of Fire. But only a few percent of all the magma passing through the Earth's crust reaches the surface, and most of it crystallizes at depth in magma chambers. What are their shape, growth, crystallization and lifetime? The Torres del Paine magmatic complex is one of the best places in the world to answer these questions. It is located in southern Patagonia, forming a 70 km² massif.
Answers can be found at different scales, ranging from the mountain to minerals a few thousandths of a millimeter in size. Three rock types can be distinguished: gabbros and diorites over a thickness of 250 m, overlain by granite cliffs more than 1000 m high. The contacts between these rocks are all horizontal. Between granites and gabbro-diorite, the contact is sharp, indicating that the second magma was emplaced against an older, completely solidified magma. Between gabbros and diorites, the contacts are diffuse and often non-linear, indicating instead that the magmas came into contact while still partially liquid. In the western part of this magma chamber, the contacts between rocks are vertical. This is most likely where the magma chamber was filled. As a magma cools, different crystals form. Their stability and composition vary with the pressure, temperature or chemistry of the magma. The crystallization sequence can be defined from microscopic observations and the chemical composition of the minerals. Different gabbros are thus distinguished: the gabbro at the base is rich in hornblende, ~5 mm in size, without plagioclase inclusions but with included crystals of olivine, clinopyroxene and orthopyroxene; the upper gabbro is also rich in hornblende (~5 mm), with the same inclusions plus plagioclase. These crystals formed at a temperature above 900°C and a depth corresponding to the middle or lower crust. The finer minerals, found outside the hornblende crystals of both gabbros, are similar to those of the diorites: plagioclase, biotite, hornblende, apatite, quartz and alkali feldspar. These minerals are characteristic of granites.
They crystallized at ~790°C and ~2 km depth. The crystallization of minerals and their extraction from the magma by gravity cause a progressive change in the composition of the magma. Thus, after extraction of Mg-rich olivine and orthopyroxene, Ca-rich clinopyroxene, Ca- and Al-rich plagioclase, and Ca-, Al- and Mg-rich hornblende, the final liquid will be depleted in these elements. A link can thus be proposed between the diorites, whose composition is close to the starting liquid, the granites, whose composition is similar to the final liquid, and the gabbros, whose mineralogy corresponds to the extracted minerals. The use of zircons, a U-rich mineral whose atoms transform into Pb by radioactive decay over millions of years, makes it possible to date the cooling of the rocks that contain them. It was thus observed that the rocks of the feeder zone, in the west of the magmatic complex, crystallized 12.59±0.01 Ma ago, at the same time as the oldest granites, located at the top of the magma chamber and dated by Michel et al. (2008). The two rocks could therefore have the same origin. In contrast, the gabbros and diorites of the magma chamber crystallized between 12.47±0.01 Ma and 12.43±0.01 Ma, the oldest rocks being at the base. By comparing the composition of the Torres del Paine rocks with those of other geological entities in Patagonia, the causes of the magmatism can be investigated. To the west, there are older granitic intrusions, characteristic of tectonic plate convergence zones, whereas to the east, younger basaltic lavas are characteristic of extensional dynamics. Based on the chemical compositions of the rocks of these different entities, the progressive evolution from one to the other could be demonstrated.
It is most likely due to the arrival of a mid-ocean ridge (a zone of crustal extension and oceanic crust creation by magma upwelling) in the subduction zone along the Andes. I propose that, at first, granitic magmas rose into the magma chamber, leaving large volumes of crystals in the deep crust. In a second episode, the crystals formed at depth were transported through the continental crust, following mixing with newly injected magma. These crystal-laden magmas passed through the feeder zone before being injected into the magma chamber. Different pulses were distinguished, injected into the magma chamber from top to base for the granites, then at the base of the youngest granite for the gabbros and diorites. The Torres del Paine magmatic complex was built over a total period of 160,000±20,000 years.
Abstract:
ABSTRACT. Chrysomya albiceps (Wiedemann) and Hemilucilia segmentaria (Fabricius) (Diptera, Calliphoridae) used to estimate the postmortem interval in a forensic case in Minas Gerais, Brazil. The corpse of a man was found in a Brazilian highland savanna (cerrado) in the state of Minas Gerais. Fly larvae were collected at the crime scene and arrived at the laboratory three days afterwards. From the eight pre-pupae, seven adults of Chrysomya albiceps (Wiedemann, 1819) emerged and, from the two larvae, two adults of Hemilucilia segmentaria (Fabricius, 1805) were obtained. As necrophagous insects use corpses as a feeding resource, their development rate can be used as a tool to estimate the postmortem interval. The post-embryonic development stage of the immatures collected on the body was estimated as the difference between the total development time and the time required for them to become adults in the lab. The estimated age of the maggots from both species and the minimum postmortem interval were four days. This is the first time that H. segmentaria has been used to estimate the postmortem interval in a forensic case.
Abstract:
The classical binary classification problem is investigated when it is known in advance that the posterior probability function (or regression function) belongs to some class of functions. We introduce and analyze a method which effectively exploits this knowledge. The method is based on minimizing the empirical risk over a carefully selected "skeleton" of the class of regression functions. The skeleton is a covering of the class based on a data-dependent metric, especially fitted for classification. A new scale-sensitive dimension is introduced which is more useful for the studied classification problem than other, previously defined, dimension measures. This fact is demonstrated by performance bounds for the skeleton estimate in terms of the new dimension.
Abstract:
We continue the development of a method for the selection of a bandwidth or a number of design parameters in density estimation. We provide explicit non-asymptotic density-free inequalities that relate the $L_1$ error of the selected estimate with that of the best possible estimate, and study in particular the connection between the richness of the class of density estimates and the performance bound. For example, our method allows one to pick the bandwidth and kernel order in the kernel estimate simultaneously and still assure that for {\it all densities}, the $L_1$ error of the corresponding kernel estimate is not larger than about three times the error of the estimate with the optimal smoothing factor and kernel plus a constant times $\sqrt{\log n/n}$, where $n$ is the sample size, and the constant only depends on the complexity of the family of kernels used in the estimate. Further applications include multivariate kernel estimates, transformed kernel estimates, and variable kernel estimates.
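The bound described in words in this abstract can be written schematically as follows. The notation is an assumption introduced here for illustration: $f$ is the unknown density, $f_{n,h,K}$ the kernel estimate with bandwidth $h$ and kernel $K$, $(\hat h, \hat K)$ the data-selected pair, and $C$ a constant depending only on the complexity of the kernel family. The factor 3 stands in for the abstract's "about three times"; the exact constant is not given in the abstract.

```latex
\int \bigl| f_{n,\hat h,\hat K} - f \bigr|
  \;\le\; 3 \, \inf_{h,\,K} \int \bigl| f_{n,h,K} - f \bigr|
  \;+\; C \sqrt{\frac{\log n}{n}}
  \qquad \text{for all densities } f .
```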
Abstract:
Summary points:
- The bias introduced by random measurement error will be different depending on whether the error is in an exposure variable (risk factor) or outcome variable (disease)
- Random measurement error in an exposure variable will bias the estimates of regression slope coefficients towards the null
- Random measurement error in an outcome variable will instead increase the standard error of the estimates and widen the corresponding confidence intervals, making results less likely to be statistically significant
- Increasing sample size will help minimise the impact of measurement error in an outcome variable but will only make estimates more precisely wrong when the error is in an exposure variable
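The summary points above can be checked with a small simulation. The sketch below assumes a classical additive error model with standard normal exposure, noise, and measurement error, and a true slope of 2. These values, the sample size, and the function names are illustrative assumptions, not taken from the source. Under these assumptions the slope fitted on the noisy exposure should shrink by the factor var(x) / (var(x) + var(error)) = 0.5, while error in the outcome should leave the slope roughly unchanged.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, true_slope=2.0, seed=0):
    """Return slopes fitted on clean data, noisy exposure, and noisy outcome."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [true_slope * a + rng.gauss(0.0, 1.0) for a in x]
    x_noisy = [a + rng.gauss(0.0, 1.0) for a in x]  # random error in the exposure
    y_noisy = [b + rng.gauss(0.0, 1.0) for b in y]  # random error in the outcome
    return ols_slope(x, y), ols_slope(x_noisy, y), ols_slope(x, y_noisy)


clean, exposure_error, outcome_error = simulate()
print(f"no measurement error: {clean:.3f}")
print(f"error in exposure:    {exposure_error:.3f}")
print(f"error in outcome:     {outcome_error:.3f}")
```

Raising `n` tightens all three estimates around their limits, so the exposure-error slope becomes more precise but stays near 1 rather than 2, which matches the last summary point.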
Abstract:
ABSTRACT This study reviewed the data on the Brazilian Ephemeroptera, based on the studies published before July, 2013, estimated the number of species still to be described, and identified which regions of the country have been the subject of least research. More than half the species are known from the description of only one developmental stage, with imagoes being described more frequently than nymphs. The Brazilian Northeast is the region with the weakest database. Body size affected description rates, with a strong tendency for the larger species to be described first. The estimated number of unknown Brazilian species was accentuated by the fact that so few species have been described so far. The steep slope of the asymptote and the considerable confidence interval of the estimate reinforce the conclusion that a large number of species are still to be described. This emphasizes the need for investments in the training of specialists in systematics and ecology for all regions of Brazil to correct these deficiencies, given the role of published papers as a primary source of information, and the fundamental importance of taxonomic knowledge for the development of effective measures for the conservation of ephemeropterans and the aquatic ecosystems they depend on.