In the mean time, I have also been working on some other research such as the bonefish and horseshoe crab modeling. Parts of my dissertation are also 'in the works' as publications. I will share all of this work as soon as it is ready to share, or becomes officially available.

I will spend the next few weeks updating my websites and adding information on the things I'm working on. Stay tuned!

]]>

For a current research project, I am developing an occupancy model for horseshoe crabs in Florida. We are working with data from FWC, which is awesome long-term random stratified sampling data, but since it isn't specifically targeted at horseshoe crabs (it is for fish inventories), we are dealing with highly zero-inflated data ‚Äì lots of non-detection. Hence, we can't in good conscience treat the data in its current form as a measure of abundance of the crabs. Cue occupancy modeling: this type of model considers two processes: occupancy and detectability. Occupancy refers to the presence or absence of a species during the period of sampling / season, and detectability refers to whether or not a species is detected. Thus, there is a possibility that a species is present but not detected! This method also allows you to incorporate variables that could potentially impact either occupancy or detectability.

I am really excited about this approach! I first heard about it at a 2016 seminar talk by Dr Bliznyuk (UF, see here for his arXiv paper), and have been itching to apply it ever since! And now I finally have a chance.

However, I am new to this type of modeling, and also have not done much binomial modeling in the past. As I was going through a book, the R code and its manuals, making little examples to clarify formulas, I realized this might also benefit others ‚Äì which is the reason I am blogging about this. I will post several installments as I progress from simple to more complex models. I will make R code available too.

First and foremost; all credit for theoretical aspects goes to the following publications:

- MacKenzie, D. I., Nichols, J. D., Lachman, G. B., Droege, S., Royle, A., & Langtimm, C. A. (2002). Estimating site occupancy rates when detection probabilities are less than one. Ecology, 83(8), 2248‚Äì2255.

- MacKenzie, D. I., Nichols, J. D., Royle, A., Pollock, K.H., Bailey, L.L. & Hines, J.E. (2005). Occupancy Estimation and Modeling: Inferring Patterns and Dynamics of Species Occurrence. Elsevier.

Especially the latter, the book, has been incredibly helpful.

For coding, I am using the package ‚Äúunmarked‚Äù in R:

Fiske, I., & Chandler, R. (2011). unmarked: An RPackage for Fitting Hierarchical Models of Wildlife Occurrence and Abundance. Journal of Statistical Software, 43(10), 1‚Äì23. http://doi.org/10.18637/jss.v043.i10

Let‚Äôs get started!

First a refresher: the binomial process shows the probability of either success (1) or failure (0).

The normal binomial distribution formula:

with n=number of trials, x=number of successes, p=probability of success and b=binomial probability. The first part, the binomial coefficient expresses how many ways *k* successes can be distributed over *n* trials. We then multiply with the probability of success (*p*), then the probability of failure (*p-1*).

The expected mean is and the variance is for a normal distribution.

* *

*EXAMPLE*

*So* *if for 29 trials (n), you get 13 successes (x), and you know the probability of success is 0.5 this formula tells you that the probability of getting this result (13 out of 29 is a success) is 0.13. There is a 0.14 probability of getting 14 successes. There is only a 0.02 probability of getting 20 successes. These values are the area under the probability distribution plot (pdf). So if you know p, you can calculate b, and vice versa.*

For occupancy modeling, we use the following terminology:

- The 'area' is the larger spatial area for which we want to draw an inference;

- It is divided into sampling units (*S*) ‚Äì which could be grid cells or discrete units (like a pond);

- We select *s* units ‚Äì sites ‚Äì as random samples: the assumption is that S is very large in comparison to the sites s, and we are trying to say something about occupancy of *S* (not *s*);

- Surveys are done repeatedly at each site, *K* times, to record presence (1) or absence (0) of a species, creating a detection history (*h _{i}*).

There are a few assumptions we need to take into account:

1) We assume occupancy for the period of analysis to be 'closed', i.e. it does not change, a species occupies a site or it does not. In practice this often means you have to develop a model for a 'season', which can differ in length for different species.

2) The probability of occupancy is the same for all sites;

3) The probability of detection is the same for all sites;

4) Detections in each survey are independent; and

5) Detection histories are independent.

Note: some of these assumptions are/can be modified with more complex modeling, but for now we are starting with simple situations!

The starting point for our first simple occupancy model is that we assume there is a common probability of sites being occupied by a species, e.g. 0.7. With formula (with species detected perfectly), this would result in 70 detections (x) at 100 sites (s), for instance. Note that this is the same formula as in the refresher earlier, just with different symbols (*p=** and n=s*).

But then we realize, detection is not perfect! If the probability of detecting a species at an occupied site is *p*, after *k* surveys the probability of detecting at least once is *p*=1-(1-p) ^{k}.* The second part

*Examples:*

*High p and high k: p*=1-(1-0.7) ^{30}=1*

*High p and low k: p*=1-(1-0.7) ^{3}=0.973*

*Low p and high k: p*=1-(1-0.3) ^{30}=1*

*Low p and low k: p*=1-(1-0.3) ^{3}=0.66 (i.e. the probability 1-0.66 that species go undetected in all surveys)*

This illustrates that with low probabilities of detection, you need more trials to detect at least once (obviously).

Thus the probability of the species being present and being detected becomes and thus the estimator for the proportion of sites occupied (if we know *p*) is

with s_{D} = sites where species is detected. Similarly as above

*Example*

*With 30 surveys and 10 detections, if p=0.7 and k=30, p*=1 (see above), so* * should be close to the real number, e.g. 10/30. In this case, even if p is low, the number of k makes up for it, so this is also close to the 'real' probability.*

*But let's say k=3 and p=0.3 (so a low probability of detection, which makes p*=0.66, as above), and there is 1 detection, this makes* * (instead of 0.333). *

Generally though, we do not know what the detection probability is - so occupancy modeling is based on figuring out what the detection probability (*p*) and the occupancy probability () is.

So, what are we calculating?

1) expresses the probability of getting this particular detection history (absence-presence-absence-presence) with being the probability that the species occupies the site and *p _{j}* the probability of detecting the species during survey

2) It gets a little bit more complex if the species is not detected at all since this does not necessarily imply absence. We add the possibility of non-detection to absence:

Here we add the probability that the species is present but undetected (probability of occupancy multiplied by the product of all absence probabilities) to the probability of non-occupancy

Then, if these detection histories are constructed for all sites, we assume they‚Äôre independent and use the model likelihood for the observed data: the likelihood of occupancy and detection given the available data, is the product of all site probabilities:

In words this is 'the likelihood that we observe these detection histories, given occupancy probability and detection probability *p*)'.

In a more extensive form, using the formulas defined earlier, this becomes

Essentially, the second part represents the sites where there was no detection, so where we need to take absence and non-detection into account. This part is raised to the number of sites without detection.

The first part is for where there is detection: *s _{j}* is the number of sites where the species was detected at the

This started to look like alphabet soup to me too, so I made an example with known probabilities.

*Example:*

*We make a detection survey with 4 sites (s, rows), 5 surveys (j=1 to K, columns),* *=0.7 and p=0.6. For:*

*Site 1 1 1 1 0 0*

*Site 2 0 0 0 0 0*

*Site 3 1 0 1 0 1*

*Site 4 1 1 0 0 0*

*s _{D}*

*We start with second half of the detection histories (h _{1} through h_{5}), the parts with p's in them:*

*survey j=1:* *=0.216*

*survey j=2:* *=0.144*

*survey j=3:* *=0.144*

*survey j=4:* *=0.7 (this one is easy because there is only one survey with zeros, so no product calculation needed)*

*survey j=5:* *=0.096*

*Then putting it all together gives*

*=0.000147.0.49=0.00007227*

*for which the log-likelihood is -4.141*

Since this has become a longer post than expected, I will leave more detail on log-likelihood and Maximum Likelihood Estimation (MLE) for the next post. I hope to also include a bit of R code in the next post ‚Äì though admittedly I have found implementation a little bit challenging so far.

If you want to do some more reading on occupancy models, the USGS created a nice straightforward (short) document: https://fresc.usgs.gov/products/fs/fs2005-3096.pdf

I didn't come up with this solution myself, but am grateful for finding it and would like to disseminate it wider - I know there are more of you struggling with the same problem! Dr. Hocking (Frostburg State University) published the solution on his website: click here.]]>

I just found out seagrass dying is a problem around the world, but also that seagrass is so much more amazing than I thought. So that's the reason for this post, to share an excellent New York Times article:

Disappearing Seagrass Protects Against Pathogens, Even Climate Change, Scientists Find

Also, if you're in Gainesville (FL) and you want to learn more about seagrass, Savanna Barry (Nature Coast Biological Station) will be doing a talk on 1 March 2017, "Seagrasses of the Florida Big Bend: Ecology, resilience, and threats". This is part of the Water, Wetlands and Watersheds Seminar series by the UF Howard T. Odum Center for Wetlands (click here for details).]]>