---
title: "lab7_DiscussionNotes"
author: "sbsambado"
date: "5/14/2021"
output: html_document
---
1. Final Project

- use office hours as a time to get questions addressed


2. What you need for this lab
```{r setup, include=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE)


# packages needed for lab 7
library(readr)
library(tidyverse)
library(car)
library(ggplot2)

# install.packages("psych")
library(psych) # new package alert! Run install.packages("psych") once before loading it


# datasets needed for lab 7
kelp <- read_csv("kelp_data.csv")
deet <- read_csv("deet.csv")
plant <- read_csv("plant_data.csv") # the plant data will forever haunt us
spiders <- read_csv("spiders.csv")


str(kelp)
summary(kelp)

```



3. *Where are we?*

~ let's get comfortable

- Week 1 -- hopefully everyone has R and RStudio running on their computer
- Week 2 -- R Markdown & data analysis

~ let's compare means - test 1 for final project

- Week 3 -- Data visualization
- Week 4 -- Determine normality & compare means
- Week 5 -- What to do when data is not normal
- Week 6 -- ANOVAs & residuals

~ let's test relationships - test 2 for final project

- *Week 7 -- Regression and correlation*


    
4. Quick Notes about Lab 7

#### CORRELATION
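  - YOU CAN ONLY DO A CORRELATION ON TWO NUMERIC VARIABLES
  - Pearson's correlation coefficient (r) (parametric) ranges from -1 to 1
      + -1 perfect negative correlation
      + 0 no correlation 
      + 1 perfect positive correlation
  - Spearman's rank correlation (nonparametric) also ranges from -1 to 1; it works on the ranks of the data, so use it when the data are not normal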
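
To make the two coefficients above concrete, here is a minimal sketch with base R's `cor()` and `cor.test()`. It uses the built-in `mtcars` data as a stand-in (an assumption, not one of the lab datasets); swap in any two numeric columns of your own.

```{r cor-sketch}
# Sketch on the built-in mtcars data (an assumption; not a lab 7 dataset)
x <- mtcars$wt   # car weight
y <- mtcars$mpg  # miles per gallon

# Pearson's r (parametric): strength of the linear relationship
cor(x, y, method = "pearson")

# Spearman's rank correlation (nonparametric): computed on the ranks of x and y
cor(x, y, method = "spearman")

# cor.test() adds a p-value for H0: correlation = 0
pearson_test <- cor.test(x, y, method = "pearson")
pearson_test$estimate  # the correlation coefficient r
pearson_test$p.value   # p-value for the test
```

Both coefficients are negative here because heavier cars get fewer miles per gallon.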

How to read pairs.panels in the `psych` package
```{r no-subset, message=FALSE}
str(kelp) # 120 rows, 6 variables
#subset the data you want to plot - pairs.panels() only takes data.frames, not formulas!


pairs.panels(kelp) # add arguments: density = T/F, cor = T/F, lm = T/F


#pairs.panels(kelp, density = TRUE) # adds curve to histograms
#pairs.panels(kelp, lm = TRUE) # add linear regression line




```

```{r subset-not-normal, message=FALSE}

plot.kelp <- data.frame(kelp[,2:3], kelp[,5]) # subset your dataset kelp into variables you care about
colnames(plot.kelp)<-c("max_wid", "min_wid", "abund_amphipod") #give columns names

# Always double check your dataframe after subsetting it!
str(plot.kelp) # 120 rows, 3 variables

#install.packages("psych") #a new package for you!
#library(psych)
pairs.panels(plot.kelp, density = TRUE, cor=TRUE, lm=TRUE)  # makes a scatterplot matrix of your dataset. show density plots (density = TRUE), show correlation values (cor = TRUE), use a linear model (lm = TRUE)

```

```{r normal data, message=FALSE}
log.plot.kelp <- log(plot.kelp+1) #a fast and easy way to log the whole dataset!
colnames(log.plot.kelp)<-c("max_wid_log", "min_wid_log", "abund_amphipod_log") #rename since they're logged now
pairs.panels(log.plot.kelp, density = TRUE, cor=TRUE, lm=TRUE)  # makes a scatterplot matrix of your dataset. show density plots (density = TRUE), show correlation values (cor = TRUE), use a linear model (lm = TRUE)

```

#### LINEAR REGRESSION
  - Hypotheses of linear regression
      + H0: **slope of line** = 0 (x has no influence on y)
      + HA: slope of line != 0 (x has an influence on y)

**REMEMBER**
  Y ~ INTERCEPT (B0) + SLOPE (B1) * X1 + SLOPE (B2) * X2

When you call `summary(model)`, look for:

- intercept
- slope of X
- p-value for X
- R^2
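
If you want those four numbers without reading them off the printed table, one way is to pull them out of the summary object. This is a sketch on the built-in `cars` data (an assumption, not a lab dataset); `fit` and `coef_table` are illustrative names.

```{r lm-summary-sketch}
# Sketch on the built-in cars data (an assumption; not a lab 7 dataset)
fit <- lm(dist ~ speed, data = cars)  # stopping distance as a function of speed
fit_summary <- summary(fit)

# Coefficient table: columns are Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(fit_summary)

coef_table["(Intercept)", "Estimate"]  # intercept (B0)
coef_table["speed", "Estimate"]        # slope of X (B1)
coef_table["speed", "Pr(>|t|)"]        # p-value for H0: slope = 0
fit_summary$r.squared                  # R^2: share of the variance in Y explained by X
```

The row names of `coef_table` match the terms in your formula, so for a different model replace `"speed"` with your own predictor name.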

```{r how to understand the output}      
fit3 <- lm(log(number.spiders) ~ height.cm, subset = colony != 5, data = spiders)
par(mfrow = c(2,2))
plot(fit3)

summary(fit3)



```

How to find an outlier on a scatterplot for HW Q2
```{r how to find an outlier on a scatterplot}

plot(spiders$height.cm, spiders$number.spiders) 
text(spiders$height.cm, spiders$number.spiders, labels=rownames(spiders), pos = 1) 
# labels = rownames(DATASET) labels each point with its row number; pos = 1 puts the label below the point


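# `penguin` is not loaded in the setup chunk; read it in before running these lines
dim(penguin) # check dimensions
penguin_subset <- penguin[!is.na(penguin$body), ] # keep only rows where `body` is not NA

dim(penguin_subset)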
```
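
Once the labels point you to a suspicious row, a possible next step is to check how much that one row moves the regression. The sketch below uses the built-in `cars` data (an assumption, not a lab dataset) and flags the row with the largest absolute residual; that is one simple heuristic for a candidate outlier, not a formal test.

```{r outlier-sketch}
# Sketch on the built-in cars data (an assumption; not a lab 7 dataset)
fit_all <- lm(dist ~ speed, data = cars)

# Row with the largest absolute residual: a candidate to inspect, not proof of an outlier
suspect_row <- which.max(abs(resid(fit_all)))

# Label only the suspect point on the scatterplot
plot(cars$speed, cars$dist)
text(cars$speed[suspect_row], cars$dist[suspect_row], labels = suspect_row, pos = 1)

# Refit without that row and compare the slopes
fit_drop <- lm(dist ~ speed, data = cars[-suspect_row, ])
coef(fit_all)["speed"]
coef(fit_drop)["speed"]
```

If the slope barely changes, the point is unusual but not influential; if it changes a lot, report the model both with and without it.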
