For a current research project, I am developing an occupancy model for horseshoe crabs in Florida. We are working with data from FWC, which is an awesome long-term stratified random sampling dataset, but since it isn't specifically targeted at horseshoe crabs (it is for fish inventories), we are dealing with highly zero-inflated data: lots of non-detections. Hence, we can't in good conscience treat the data in its current form as a measure of crab abundance. Cue occupancy modeling. This type of model considers two processes: occupancy and detectability. Occupancy refers to the presence or absence of a species at a site during the sampling period (season), and detectability refers to whether or not a species that is present is actually detected. Thus, there is a possibility that a species is present but not detected! This method also allows you to incorporate variables that could affect either occupancy or detectability.
I am really excited about this approach! I first heard about it at a 2016 seminar talk by Dr Bliznyuk (UF, see here for his arXiv paper), and have been itching to apply it ever since! And now I finally have a chance.
However, I am new to this type of modeling, and also have not done much binomial modeling in the past. As I was going through a book, the R code and its manuals, making little examples to clarify formulas, I realized this might also benefit others, which is the reason I am blogging about this. I will post several installments as I progress from simple to more complex models. I will make R code available too.
First and foremost, all credit for theoretical aspects goes to the following publications:
- MacKenzie, D. I., Nichols, J. D., Lachman, G. B., Droege, S., Royle, A., & Langtimm, C. A. (2002). Estimating site occupancy rates when detection probabilities are less than one. Ecology, 83(8), 2248–2255.
- MacKenzie, D. I., Nichols, J. D., Royle, A., Pollock, K. H., Bailey, L. L., & Hines, J. E. (2005). Occupancy Estimation and Modeling: Inferring Patterns and Dynamics of Species Occurrence. Elsevier.

Especially the latter, the book, has been incredibly helpful.
For coding, I am using the package "unmarked" in R: Fiske, I., & Chandler, R. (2011). unmarked: An R Package for Fitting Hierarchical Models of Wildlife Occurrence and Abundance. Journal of Statistical Software, 43(10), 1–23. http://doi.org/10.18637/jss.v043.i10
Let's get started!
First a refresher: a binomial process describes a series of independent trials that each end in either success (1) or failure (0). The binomial distribution formula is:

$$b(x; n, p) = \binom{n}{x} p^{x} (1-p)^{n-x}$$

with $n$ = number of trials, $x$ = number of successes, $p$ = probability of success and $b$ = binomial probability. The first part, the binomial coefficient, expresses how many ways $x$ successes can be arranged among $n$ trials. The expected mean is $np$ and the variance is $np(1-p)$ for a binomial distribution.
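To make the refresher concrete, here is a minimal sketch of the binomial formula in Python (chosen purely for illustration; the values of `n` and `p` are made up and do not come from the horseshoe crab data). It checks numerically that the probabilities sum to one and that the mean and variance match $np$ and $np(1-p)$.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Illustrative values only (assumptions, not from the post)
n, p = 10, 0.3

# Probability of every possible number of successes, 0..n
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)
mean = sum(x * b for x, b in enumerate(probs))
variance = sum((x - mean) ** 2 * b for x, b in enumerate(probs))

print(f"sum of probabilities: {total:.4f}")
print(f"mean: {mean:.4f}  (n*p = {n * p:.4f})")
print(f"variance: {variance:.4f}  (n*p*(1-p) = {n * p * (1 - p):.4f})")
```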
For occupancy modeling, we use the following terminology:
- The 'area' is the larger spatial area for which we want to draw an inference;
- It is divided into sampling units (sites);
- We select $s$ of these sites to survey;
- Surveys are done repeatedly at each site, $K$ times in total.
There are a few assumptions we need to take into account:
1) We assume occupancy for the period of analysis to be 'closed', i.e. it does not change: a species occupies a site or it does not. In practice this often means you have to develop a model for a 'season', which can differ in length for different species.
2) The probability of occupancy is the same for all sites;
3) The probability of detection is the same for all sites;
4) Detections in each survey are independent; and
5) Detection histories are independent.

Note: some of these assumptions are/can be modified with more complex modeling, but for now we are starting with simple situations!
The starting point for our first simple occupancy model is that we assume there is a common probability $\psi$ of sites being occupied by a species, e.g. 0.7. With the formula

$$\Pr(x \mid s, \psi) = \binom{s}{x} \psi^{x} (1-\psi)^{s-x}$$

(with the species detected perfectly), we would expect 70 detections ($x$) at 100 sites ($s$), for instance. Note that this is the same formula as in the refresher earlier, just with different symbols ($\psi$ instead of $p$, $s$ instead of $n$).
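But then we realize, detection is not perfect! If the probability of detecting a species at an occupied site in a single survey is $p$, then $(1-p)^{K}$ is the probability that the species is not detected at all after $K$ surveys, and $p^{*} = 1-(1-p)^{K}$ is the probability that it is detected at least once.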
This illustrates that with low probabilities of detection, you need more trials to detect at least once (obviously).
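A small sketch of this effect, assuming a constant per-survey detection probability `p` and `K` independent surveys at an occupied site. The function names, the 0.95 target and the grid of values are hypothetical choices for illustration only.

```python
def prob_detected_at_least_once(p, K):
    """Probability of one or more detections in K surveys of an occupied site."""
    return 1 - (1 - p) ** K


def surveys_needed(p, target=0.95):
    """Smallest number of surveys K for which the chance of at least one
    detection reaches the target (target is an arbitrary illustrative choice)."""
    K = 1
    while prob_detected_at_least_once(p, K) < target:
        K += 1
    return K


# Table: rows are detection probabilities, columns are numbers of surveys
survey_counts = [1, 2, 4, 8]
print("p     " + "  ".join(f"K={K:<4}" for K in survey_counts) + "  K for 95%")
for p in [0.1, 0.3, 0.5, 0.8]:
    row = "  ".join(f"{prob_detected_at_least_once(p, K):.3f} " for K in survey_counts)
    print(f"{p:<5} {row}  {surveys_needed(p)}")
```

The last column shows how quickly the required number of surveys grows as the per-survey detection probability drops.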
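Thus the probability of the species being present and being detected at least once becomes

$$\psi \, p^{*} = \psi \left[ 1-(1-p)^{K} \right]$$

and thus the estimator for the proportion of sites occupied (if we know $p$) is

$$\hat{\psi} = \frac{s_D}{s \, p^{*}}$$

with $s_D$ the number of sites where the species was detected at least once, $s$ the total number of sites surveyed, and $p^{*} = 1-(1-p)^{K}$ the probability of detecting the species at least once in $K$ surveys of an occupied site.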
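As a rough illustration of why this correction matters, the sketch below compares the naive proportion of sites with detections to the estimate corrected for imperfect detection, $s_D / (s \, [1-(1-p)^{K}])$. It assumes $p$ is known, and all numbers are invented for the example rather than taken from the FWC data.

```python
def occupancy_estimates(s_detected, s_total, p, K):
    """Return (naive, corrected) estimates of the proportion of sites occupied.

    naive     = share of sites with at least one detection
    corrected = naive share divided by the probability of detecting the
                species at least once in K surveys, assuming p is known
    """
    p_star = 1 - (1 - p) ** K
    naive = s_detected / s_total
    corrected = s_detected / (s_total * p_star)
    return naive, corrected


# Hypothetical numbers: 100 sites, 3 surveys each, per-survey detection
# probability 0.3, and detections at 46 of the sites
naive, corrected = occupancy_estimates(s_detected=46, s_total=100, p=0.3, K=3)
print(f"naive estimate:     {naive:.3f}")
print(f"corrected estimate: {corrected:.3f}")
```

With these made-up inputs the naive estimate falls well below the corrected one, because roughly a third of occupied sites would be missed entirely in three surveys.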
Generally though, we do not know what the detection probability is, so occupancy modeling is based on figuring out what the detection probability ($p$) and the occupancy probability ($\psi$) are from the detection histories of the sites. For example, for a site that was surveyed four times with detection history 0101:

$$\Pr(h_i = 0101) = \psi \, (1-p_1) \, p_2 \, (1-p_3) \, p_4$$

So, what are we calculating?

1) The formula expresses the probability of getting this particular detection history (absence-presence-absence-presence), with $\psi$ being the probability that the species occupies the site and $p_j$ the probability of detecting the species in survey $j$: to repeat, $p$ is the probability of success and $1-p$ is the probability of failure.

2) It gets a little bit more complex if the species is not detected at all, since this does not necessarily imply absence. We add the possibility of non-detection to absence:

$$\Pr(h_i = 0000) = \psi \prod_{j=1}^{4} (1-p_j) + (1-\psi)$$

Here we add the probability that the species is present but undetected (probability of occupancy multiplied by the product of all non-detection probabilities) to the probability of non-occupancy.
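The two cases can be sketched in a few lines of Python. This is an illustration under assumed values (occupancy probability 0.7 and a detection probability of 0.4 in each of four surveys), and the function name `history_probability` is hypothetical. As a sanity check, the probabilities of all 16 possible four-survey histories should add up to one.

```python
from itertools import product


def history_probability(history, psi, p):
    """Probability of one site's detection history, e.g. "0101".

    psi : probability that the site is occupied
    p   : list of per-survey detection probabilities (same length as history)
    """
    # Probability of the observed detections/non-detections at an occupied site
    given_occupied = 1.0
    for outcome, p_j in zip(history, p):
        given_occupied *= p_j if outcome == "1" else (1 - p_j)

    if "1" in history:
        # At least one detection: the site must be occupied
        return psi * given_occupied
    # No detections: occupied but missed every time, or not occupied at all
    return psi * given_occupied + (1 - psi)


psi = 0.7                  # assumed occupancy probability
p = [0.4, 0.4, 0.4, 0.4]   # assumed detection probability per survey

print("Pr(0101) =", history_probability("0101", psi, p))
print("Pr(0000) =", history_probability("0000", psi, p))

# All possible histories together should have total probability 1
all_histories = ["".join(bits) for bits in product("01", repeat=4)]
total = sum(history_probability(h, psi, p) for h in all_histories)
print("sum over all histories =", total)
```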
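Then, if these detection histories are constructed for all sites, we assume they're independent and use the model likelihood for the observed data: the likelihood of occupancy and detection given the available data is the product of all site probabilities:

$$L(\psi, p \mid h_1, h_2, \ldots, h_s) = \prod_{i=1}^{s} \Pr(h_i)$$

In words this is 'the likelihood that we observe these detection histories, given occupancy probability ($\psi$) and detection probability ($p$)'.

In a more extensive form, using the formulas defined earlier, this becomes

$$L(\psi, p \mid h_1, \ldots, h_s) = \left[ \psi^{s_D} \prod_{j=1}^{K} p_j^{s_j} (1-p_j)^{s_D - s_j} \right] \times \left[ \psi \prod_{j=1}^{K} (1-p_j) + (1-\psi) \right]^{s - s_D}$$

Here $s_D$ is the number of sites with at least one detection and $s_j$ is the number of sites where the species was detected in survey $j$.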
Essentially, the second part represents the sites where there was no detection, so where we need to take both absence and non-detection into account. This part is raised to the number of sites without detection ($s - s_D$). The first part is for the sites where there was detection, tallied per survey: in the $j$th survey the species was detected at $s_j$ of these sites and missed at the remaining $s_D - s_j$. It again includes a calculation for success (detection) and for failure (non-detection). We first take the occupancy probability and raise it to the power of the number of sites where there was detection. We then multiply this with the product of the detection and non-detection probabilities over the $K$ surveys.

This started to look like alphabet soup to me too, so I made an example with known probabilities.
$s_D$ = 3 (number of sites where the species was detected at least once)
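Since the likelihood is easier to follow with numbers plugged in, here is a hedged sketch in Python with known probabilities. The five detection histories, the occupancy probability and the detection probabilities are all hypothetical; only the count of three sites with at least one detection matches the value above. The likelihood is computed twice: as the product of the individual site probabilities, and with the compact formula that uses the counts $s_D$ and $s_j$. Both should give the same number.

```python
from math import prod, isclose

# Hypothetical detection histories for 5 sites, 4 surveys each
histories = ["0101", "0000", "1100", "0000", "0010"]
psi = 0.7                  # assumed known occupancy probability
p = [0.4, 0.4, 0.4, 0.4]   # assumed known detection probability per survey


def site_probability(history, psi, p):
    """Probability of a single site's detection history."""
    given_occupied = prod(
        p_j if outcome == "1" else (1 - p_j) for outcome, p_j in zip(history, p)
    )
    if "1" in history:
        return psi * given_occupied
    return psi * given_occupied + (1 - psi)


def likelihood_product(histories, psi, p):
    """Likelihood as the product of all site probabilities."""
    return prod(site_probability(h, psi, p) for h in histories)


def likelihood_compact(histories, psi, p):
    """Same likelihood, written with the counts s, s_D and s_j."""
    s = len(histories)                       # number of sites
    K = len(p)                               # number of surveys
    s_D = sum("1" in h for h in histories)   # sites with >= 1 detection
    s_j = [sum(h[j] == "1" for h in histories) for j in range(K)]

    detected_part = psi**s_D * prod(
        p[j] ** s_j[j] * (1 - p[j]) ** (s_D - s_j[j]) for j in range(K)
    )
    undetected_part = (psi * prod(1 - p_j for p_j in p) + (1 - psi)) ** (s - s_D)
    return detected_part * undetected_part


s_D = sum("1" in h for h in histories)
L_product = likelihood_product(histories, psi, p)
L_compact = likelihood_compact(histories, psi, p)

print("sites with at least one detection:", s_D)
print("likelihood (product of site probabilities):", L_product)
print("likelihood (compact formula):", L_compact)
```

In a real analysis the probabilities are unknown and would be varied until this likelihood is as large as possible.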
Since this has become a longer post than expected, I will leave more detail on log-likelihood and Maximum Likelihood Estimation (MLE) for the next post. I hope to also include a bit of R code in the next post, though admittedly I have found implementation a little bit challenging so far.
If you want to do some more reading on occupancy models, the USGS created a nice straightforward (short) document: https://fresc.usgs.gov/products/fs/fs2005-3096.pdf
## About me
I currently work as a Research Scientist at the University of Florida. I try to blog about interesting random science stuff. And because I do a lot of coding (in R), I will share coding tips I found useful.
## Archives
November 2017