Everyone knows that smoking is a bad idea. The question that we will try to address in this article is the following: *Do smoking habits differ between women and men?* We are going to use some data from the Centers for Disease Control.

It defines everyday smokers as current smokers who smoke every day; some-day smokers are current smokers who smoke on some days. Former smokers have smoked at least 100 cigarettes in their lifetime but currently do not smoke at all.

```
>>> sexSmoke<-rbind(c(13771, 4987, 30803), c(11722, 3674, 24184))
# The rbind() function is used to bind or combine these two vectors by rows.
>>> dimnames(sexSmoke)<-list(c("Male", "Female"), c("Every-day smoker", "Some-day smoker", "Former smoker"))
# The dimnames() command sets the row and column names of a matrix
>>> sexSmoke
Every-day smoker Some-day smoker Former smoker
Male 13771 4987 30803
Female 11722 3674 24184
> chisq.test(sexSmoke)
Pearson's Chi-squared test
data: sexSmoke
X-squared = 43.479, df = 2, p-value = 3.62e-10
```

The chi-square test of independence is used to determine whether there is a statistically significant difference between the expected and the observed frequencies in one or more categories of a contingency table.

**Null hypothesis (H _{0}): the row and the column variables of the contingency table are independent.** The test statistic computed from the observations follows a χ

Alternative hypothesis (H_{1}): the row and the column variables of the contingency table are dependent.

In our example, the p-value of the Chi-squared test statistic = 3.62e-10, the row and the column variables are statistically significantly associated. In other words, smoking habits differ between women and men.

Most people will stop here—but the more adventurous try to understand better the Chi-squared test.

```
sexSmoke
Every-day smoker Some-day smoker Former smoker
Male 13771 4987 30803 49561
Female 11722 3674 24184 39580
25493 8661 54987 89141
```

By the assumption of independence under the null hypothesis we should “expect” the number of male every-day smoker to be = ^{25493 * 49561}⁄_{89141} = 14173.7087648.

```
>>> chisq<-chisq.test(sexSmoke)
>>> round(chisq$expected, 2) # We can extract the expected counts from the result of the test
Every-day smoker Some-day smoker Former smoker
Male 14173.71 4815.38 30571.91
Female 11319.29 3845.62 24415.09
```

What is the difference between the expected and observed results? The difference between the expected and observed count in the “male every day smoker” cell is ^{(observed-expected)2}⁄_{expected} = ^{(13771-14173.71)2}⁄_{14173.71} = 11.4419826637.

```
>>> (chisq$observed-chisq$expected)^2/(chisq$expected)
# We can extract the expected and observed counts from the result of the test and calculate the difference
Every-day smoker Some-day smoker Former smoker
Male 11.44191 6.116505 1.746773
Female 14.32725 7.658922 2.187262
```

The sum of these quantities on all cells is the test statistic; in this case, χ^{2} = ∑^{(observed-expected)2}⁄_{expected} = 43.47863.

```
>>> m= (chisq$observed-chisq$expected)^2/(chisq$expected)
>>> sum(m)
[1] 43.47863
```

Under the null hypothesis, this sum has approximately a chi-squared distribution with n = (number of rows-1) (number of columns-1) = (2 - 1) (3 - 1) = 2 degrees of freedom.

The test statistic is improbably large according to that chi-squared distribution, so we should reject the null hypothesis of independence.

The one-way analysis of variance (ANOVA) is used to **determine whether there are any statistically significant differences between the means of three or more independent groups**. The data is organized into several groups based on one single grouping variable (the factor or independent variable), so there is one categorical independent variable and one quantitative dependent variable.

Imagine that you want to test the effect of three different fertilizer mixtures on crop yield or the effect of three drugs on pain. You can use *a one-way ANOVA to find out if there is a difference in crop yields or a difference in pain between the three groups*. The independent variable is the type of fertilizer or drug. A pharmaceutical company conducts an experiment to test the effect of its new drug. The company selects some subjects and each subject is randomly assigned to one of three treatment groups where different doses are applied. The drug is the independent variable.

ANOVA test hypotheses:

H_{0}, null hypothesis: μ_{0} = μ_{1} = μ_{2} = … =μ_{k}, where μ = group mean and k = number of groups. In other words, the means of the different groups are identical. H_{a}, alternative hypothesis: At least, the mean of one group is different.

Let’s evaluate the relationship between self-reported sleeping hours and Coke consumption (too much, normal, nothing at all). The data is organized into several groups based on one single grouping variable “Coke consumption” and one quantitative dependent variable “sleeping hours.”

Coke Sleeping

toomuch 4

toomuch 3

toomuch 3

toomuch 5

toomuch 7

toomuch 5

normal 7

normal 8

normal 6

normal 8

normal 7

normal 7

normal 8

normal 8

nothing 7

nothing 8

nothing 7

nothing 8

nothing 8

nothing 9

*cocaSleep <- read.table("/file_path/CokeSleeping.txt", header = T)*: reads a file in table format and creates a data frame from it. header = T so the file (CokeSleeping.txt) contains the names of the variables as its first line (Coke Sleeping).*attach(cocaSleep)*: The database is attached to the R search path so objects in the database can be accessed by simply giving their names.- cocaSleep allows you to see the data in a nice clean table.

- We use the command aov() to run or compute an ANOVA:
**anova <- aov(Sleeping~Coke)**. - The function summary is used to view the summary of a statistical model in R (the analysis of our one-way anova test):
**summary(anova)**.

```
Df Sum Sq Mean Sq F value Pr(>F)
Coke 2 40.34 20.171 18.83 4.88e-05 ***
Residuals 17 18.21 1.071
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

The output includes the F-value column. This is the test statistic from the F test. The idea behind an ANOVA test relies on estimating the common population variance in two different ways: 1) through the common variance, the mean of the sample variances, which is called the variance within samples or residual variance and denoted S^{2}_{within}, and 2) through the variance of the sample means, which is called the variance between samples means and denoted S^{2}_{between}.

The **F-statistic is the ratio of the variance between samples means to the variance within sample means**: S^{2}_{between}/S^{2}_{within}. When the means are not significantly different, the variance of the sample means will be small, relative to the mean of the sample variances. When at least one mean is significantly different from the others, the variance of the sample means will be larger, relative to the mean of the sample variances. The larger the F value, the more likely it is that the variation caused by the independent variable is real and not due to chance.

As the p-value is less than the significance level 0.05, there are significant differences between the groups (it is highlighted with an asterisk “*" in the model summary). In other words, the Coke consumption has a real impact on sleeping.

```
>>> summary(cocaSleep)
Coke Sleeping
Length:20 Min. :3.00
Class :character 1st Qu.:5.75
Mode :character Median :7.00
Mean :6.65
3rd Qu.:8.00
Max. :9.00
>>> tapply(Sleeping, Coke, mean) # computes the mean for each factor variable (Coke) in a vector (Sleeping)
normal nothing toomuch
7.375000 7.833333 4.500000
>>> tapply(Sleeping, Coke, sd)
normal nothing toomuch
0.7440238 0.7527727 1.5165751
```

- Calculations by hand
SS
_{between}= Sum Sq = “sum of squares between groups” = ((7.375 - 6.65)^{2}* 8 + (7.8333 - 6.65)^{2}* 6 + (4.50 - 6.65)^{2}* 6 = 40.34. Observe that (7.375 - 6.65)^{2}* 8 = (Mean_{normal}- Mean)^{2}* Count_{normal}).**If the sample means are close to each other (and therefore the grand mean) this will be small.**Notice that MS means Mean Square: MS_{between}= Mean Sq = SS_{between}⁄df_{between}=^{40.34}⁄_{2}= 20.17. df_{between}= m -1 = 3 -1 = 2.

Coke Sleeping Mean_{toomuch}

toomuch 4 4.5

toomuch 3 4.5

toomuch 3 4.5

toomuch 5 4.5

toomuch 7 4.5

toomuch 5 4.5

SS_{within} = “Sum of squares within groups” = (4-4.5)^{2} +(3-4.5)^{2} +(3-4.5)^{2} +(5-4.5)^{2} +(7-4.5)^{2} +(5-4.5)^{2} + (7-7.375)^{2} + (8-7.375)^{2} + (6-7.375)^{2} +(8-7.375)^{2} + (7-7.3753)^{2} +(7-7.375)^{2} +(8-7.375)^{2} +(8-7.375)^{2} + (7-7.83)^{2} +(8-7.83)^{2} + (7-7.83)^{2} + (8-7.83)^{2} + (8-7.83)^{2} + (9-7.83)^{2} = 18.21. MS_{within} = SS_{within}⁄df_{within} = ^{18.21}⁄_{17} = 1.07. Finally, F = MS_{between}⁄MS_{within} = ^{20.171}⁄_{1.07} = 18.83.

- Let’s visualize our data:

```
library("ggpubr")
ggboxplot(cocaSleep, x = "Coke", y = "Sleeping",
color = "Coke", palette = c("blue", "red", "yellow"),
ylab = "Sleeping", xlab = "Coke")
```

- In one-way ANOVA test, a significant p-value indicates that some of the group means are statistically significant different,
*but we don’t have any idea which group*(or groups) has a significantly different mean. We can**perform a Tukey test to perform multiple pairwise-comparison between groups**and by doing so, we will be able to determine if the mean differences between specific groups are statistically significant.

```
TukeyHSD(anova)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Sleeping ~ Coke)
$Coke
diff lwr upr p adj
nothing-normal 0.4583333 -0.9755105 1.892177 0.6960429
toomuch-normal -2.8750000 -4.3088438 -1.441156 0.0002279
toomuch-nothing -3.3333333 -4.8661769 -1.800490 0.0000940
```

It can be seen from the output’s results (**diff** is the *difference between means of the two groups*; lwr and upr are the lower and the upper-end point of the confidence interval; and p adj, p-value after adjustment for the multiple comparisons) that there is significant difference between drinking too much Coke and normal, and drinking too much and nothing.

- We can use the residuals versus fits plot to check
**the homogeneity of variances**:*plot(model, 1)*

As we can see in the plot above, it looks quite random, there is no evident relationships between residuals and fitted values (the mean of each group). Therefore, we can assume the homogeneity of variances in the different groups. The Levene’s test is also available to check the homogeneity of variances.

```
library(car)
leveneTest(Sleeping~Coke)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 2.2956 0.1311
17
```

“Levene’s test is an inferential statistic used to *assess the equality of variances for two or more groups*. If the resulting p-value of Levene’s test is less than some significance level (typically 0.05), the obtained differences in sample variances are unlikely to have occurred based on random sampling from a population with equal variances,” Wiki, Levene’s test.

From our result, the p-value 0.1311 > 0.05 so there is no evidence to suggest that the variance across groups is statistically significantly different. In other words, we can assume the homogeneity of variances in the different groups.

A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables. It not only aims at **assessing the main effect of each independent variable but also if there is any interaction between them**.

Hypotheses: H_{0}, null hypothesis: The means of the dependent variable for each group in the first independent variable are equal H_{a}, alternative hypothesis: The means of the dependent variable for each group in the first independent variable are not equal Similar hypotheses are considered for the other independent variable and the interaction effect.

Let’s determine if there is a statistically significant difference in “Sleeping hours” based on Coke consumption (too much, normal, nothing at all) and gender. The data is organized based on two grouping independent variables “Coke consumption” and “Gender”, and one quantitative dependent variable “Sleeping hours.” CokeSleeping.txt

f toomuch 5 m toomuch 5

m toomuch 4 f toomuch 5

f toomuch 4 f toomuch 6

m toomuch 4 f toomuch 5

f toomuch 6 f toomuch 6

m toomuch 6 f toomuch 6

f normal 6 f normal 7

m normal 5 f normal 8

m normal 4 m normal 5

m normal 5 m normal 4

f normal 8 f normal 6

f normal 7 f normal 7

m normal 6 m nothing 6

f nothing 8 m nothing 6

f nothing 8 f nothing 7

m nothing 8 f nothing 6

m nothing 8 \

*cocaSleep <- read.table("/file_path/CokeSleeping.txt”, header = T)*: reads a file in table format and creates a data frame from it. header = T so the file (CokeSleeping.txt) contains the names of the variables as its first line (Gender Coke Sleeping).*attach(cocaSleep)*: The database is attached to the R search path so objects in the database can be accessed by simply giving their names.- cocaSleep allows you to see the data in a nice clean table.

*summary(cocaSleep)*gives you the range, quartiles, median, and mean.

```
Gender Coke Sleeping
Length:33 Length:33 Min. :4.00
Class :character Class :character 1st Qu.:5.00
Mode :character Mode :character Median :6.00
Mean :5.97
3rd Qu.:7.00
Max. :8.00
```

- We use the command aov() to run or compute an ANOVA:
**anova <- aov(Sleeping~Gender*Coke)**. - The function summary is used to view the summary of a statistical model in R (the analysis of our two-way Anova test):
**summary(anova)**.

```
Df Sum Sq Mean Sq F value Pr(>F)
Gender 1 7.120 7.120 9.513 0.00467 **
Coke 2 21.948 10.974 14.662 4.89e-05 ***
Gender:Coke 2 5.693 2.847 3.803 0.03505 *
Residuals 27 20.208 0.748
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

From the ANOVA table above, we can conclude that both independent variables “Coke consumption” and “Gender” are statistically significant, as well as their interaction. Futhermore, Coke consumption is the most significant factor variable.

- Let’s visualize our data:

```
library("ggpubr")
ggboxplot(cocaSleep, x = "Coke", y = "Sleeping", color = "Gender",
palette = c("red", "blue"))
```

Create a boxplot: x = “Coke” is the name of the x variable, y = “Sleeping” is the name of the y variable, and we change outline colors by groups (color = “gender”) and use custom color palette ( palette = c(“red”, “blue”) ).

```
>>> tapply(Sleeping, list(Gender, Coke), mean)
# computes the mean for each factor variable (Gender and Coke) in a vector (Sleeping)
normal nothing toomuch
f 7.000000 7.25 5.375
m 4.833333 7.00 4.750
>>> tapply(Sleeping, list(Gender, Coke), sd)
normal nothing toomuch
f 0.8164966 0.9574271 0.7440238
m 0.7527727 1.1547005 0.9574271
```

Caffeine can have a disruptive effect on your sleep. Sodas are “loaded with caffeine and lots of sugar. The caffeine can make it hard to fall asleep, and the sugar may affect your ability to stay asleep”, sleepeducation.org. This data confirm our assumption, but it also indicates that gender plays a role, it is worse in men.

**Pairwise comparisons of means are conducted only after the ANOVA test detects that some of the group means are significant different**. We will compute Tukey HSD for performing multiple pairwise-comparison between the means of groups. We don’t need to perform the test for the “Gender” variable because it has only two levels, which have been already proven to be significantly different by the two-way ANOVA test. Therefore, the Tukey HSD test will be done only for the factor variable “Coke”.

```
>>> TukeyHSD(anova, which = "Coke")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Sleeping ~ Gender * Coke)
$Coke
diff lwr upr p adj
nothing-normal 1.1611481 0.1972612 2.12503488 0.0158381
toomuch-normal -0.9538269 -1.8125255 -0.09512825 0.0272046
toomuch-nothing -2.1149749 -3.0940420 -1.13590787 0.0000340
```

As we can see from the output below, all pairwise comparisons are significant with an adjusted p-value < 0.05. However, it is also revealed a more significant difference between drinking too much Coke or not at all.

- Check the homogeneity of variance:

```
plot(anova, 1) # A "residuals versus fits plot" is a scatter plot of residuals on the y axis and fitted values (estimated responses) on the x axis. It is used to detect non-linearity, unequal error variances, and outliers.
library(car)
leveneTest(Sleeping~Gender*Coke)
> leveneTest(Sleeping~Gender*Coke)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 5 0.7188 0.6149
27
```

The residuals bounce randomly around the 0 line. Points 5 (f toomuch 5), 11 (f toomuch 6) and 32 (m nothing 8) are detected as outliers, you may need to consider to remove them from your study. Furthermore, we can see that the Levene’s p-value (0.6149) is greater than the significance level of 0.05. In other words, there is no evidence to suggest that the variance across groups is statistically significantly different and **we can assume the homogeneity of variances in the different groups**.

- Assessing Normality: Check that the observations are normally distributed:

```
>>> plot(anova, 2)
```

A “normal probability plot of the residuals” is a way of learning whether it is reasonable to assume that the error terms are normally distributed. If the data follow a normal distribution, then a plot of the theoretical percentiles of the normal distribution versus the observed sample percentiles (the percentiles of the residuals) should be approximately linear.

```
# Extract the residuals
>>> my_residuals <- residuals(object = anova)
# Run Shapiro-Wilk test
>>> shapiro.test(x = my_residuals ) # The Shapiro–Wilk test is a test of normality in statistics. W = 0.93722, p-value = 0.05641 indicates that normality is not violated.
Shapiro-Wilk normality test
data: my_residuals
W = 0.93722, p-value = 0.05641
```

A one-way repeated measures ANOVA or a within-subjects ANOVA is the equivalent of the one-way ANOVA, **but for related, not independent groups**. It requires one independent categorical variable and one dependent variable. *It is used to compare three or more group means where the participants are the same in each group*.

For example, we are going to use a repeated measures ANOVA to understand whether there is a difference in memory retention (subjects are given a list of words that they need to memorize for a test) over time (half an hour, an hour, and next day). In this example, “memory” is the dependent variable (the number of words being remembered by our subjects), whilst the independent variable is “time” (i.e., with three related groups, where each of the three time points is considered a “related group”).

Subject Time Memory

Peter HalfHour 32

Peter Hour 30

Peter NextDay 12

Paul HalfHour 30

Paul Hour 28

Paul NextDay 10

Martha HalfHour 34

Martha Hour 32

Martha NextDay 16

John HalfHour 28

John Hour 24

John NextDay 8

Esther HalfHour 28

Esther Hour 20

Esther NextDay 4

Bew HalfHour 38

Bew Hour 34

Bew NextDay 22

Robert HalfHour 32

Robert Hour 30

Robert NextDay 12 \

*timeMemory <- read.table("/home/nmaximo7/memory.txt”, header = T)*: reads a file in table format and creates a data frame from it. header = T so the file (memory.txt) contains the names of the variables as its first line (Subject Time Memory).*attach(timeMemory)*: The database is attached to the R search path so objects in the database can be accessed by simply giving their names.*timeMemory*allows you to see the data in a nice clean table.- Let’s compute some summary statistics of Memory by time:

```
myData <- timeMemory %>% group_by(Time) # group_by() takes an existing data frame and converts it into a grouped one (time is the grouping variable) where operations are performed "by group"
get_summary_stats(myData, type = "mean_sd") # get_summary_stats computes summary statistics for myData. type is the type of summary statistics, eg., "full", "common", "mean_sd", etc.
```

- Next, we visualize our data:
**ggboxplot(myData, x = “Time”, y = “Memory”, add = “point”, color=“Time”)**

Obviously, the passing of time has a negative effect on memory. The longer the time from learning to testing, the less material (number of words) will be remembered.

- Let’s check some assumption. There should be
**no significant outliers**. Outliers are data values that differ greatly from the majority of a set of data. They can be caused by data entry or measurement errors.

```
library(rstatix)
identify_outliers(myData, Memory) #takes a data frame and extract rows suspected as outliers
```

There is no extreme outliers, but Bew NextDay 22 is detected as an outlier. 22 seems to be a lot of words being remembered after twenty four hours compared with other subjects results.

- Let’s check Normality.

```
shapiro_test(myData, Memory)
```

The Shapiro–Wilk test is a test of normality in statistics. The Memory is normally distributed at each time point (HalfHour, Hour, and NextDay), as it is shown by the Shapiro-Wilk’s test because all probabilities are greater than 0.05 (p > 0.05).

```
ggqqplot(myData, "Memory", facet.by = "Time", color="Time")
```

We draw a “quantile-quantile” plot to visually check the normality assumption using ggqqplot: myData (the data frame), facet.by (it specifies the grouping variable, “Time”), and color (point color). If the data is normally distributed, the points in the QQ-normal plot lie on a straight diagonal line.

- We use the command anova_test() to run the ANOVA:
**anova_test(data = timeMemory, dv = Memory, wid = Subject, within = Time)**where data is our data frame; dv = Memory is the dependent variable, wid = Subject is the column name containing subjects identifier, and within = Time is the within-subjects factor variable.

```
$ANOVA
Effect DFn DFd F p p<.05 ges
1 Time 2 12 302 5.47e-11 * 0.789
$`Mauchly's Test for Sphericity`
Effect W p p<.05
1 Time 0.976 0.942
```

The assumption of sphericity is automatically checked during the ANOVA test. The null hypothesis is that the variances of the differences between groups are equal. Therefore, a significant p-value indicates that the variances of group differences are not equal, and a not significant p-value (p = 0.942 > 0.05) indicates that *we can assume sphericity*.

The Memory is statistically significantly different at the different time points: F(2, 12) = 302, p = 5.47e-11 < .05.

- Finally, we will ask R to compute multiple pairwise paired t-tests between the levels of the within-subjects factor (Time):
**timeMemory %>% pairwise_t_test(Memory ~ Time, paired = TRUE, p.adjust.method = “bonferroni”)**

Considering the Bonferroni adjusted p-value (p.adj), it can be seen that the effect of time in memory is statistically significant at each time point. It is less statistically significant after just half an hour which makes a lot of sense as half an hour is not much time.

It is used when there are two factors in the experiment and the same subjects take part in all of the conditions of the experiment. We evaluate simultaneously the effect of two within-subject factors and their interaction on a dependent variable.

Suppose we have a slightly different experiment in which there are two independent variables: time of day at which subjects are tested (with three levels: half an hour, an hour, and next day) and emotional reaction to words (with two levels: positive and negative). Subjects are given a memory test under all permutations of these two variables. A two-way repeated-measures ANOVA is an appropriate test in these circumstances.

Subject Time Emotion Memory

Peter HalfHour Positive 34 | Peter HalfHour Negative 32

Peter Hour Positive 30 | Peter Hour Negative 27

Peter NextDay Positive 14 | Peter NextDay Negative 11

Paul HalfHour Positive 31 | Paul HalfHour Negative 27

Paul Hour Positive 28 | Paul Hour Negative 25

Paul NextDay Positive 14 | Paul NextDay Negative 9

Martha HalfHour Positive 34 | Martha HalfHour Negative 31

Martha Hour Positive 32 | Martha Hour Negative 31

Martha NextDay Positive 16 | Martha NextDay Negative 9

John HalfHour Positive 29 | John HalfHour Negative 27

John Hour Positive 24 | John Hour Negative 21

John NextDay Positive 12 | John NextDay Negative 7

Esther HalfHour Positive 28 | Esther HalfHour Negative 27

Esther Hour Positive 20 | Esther Hour Negative 19

Esther NextDay Positive 10 | Esther NextDay Negative 7

Bew HalfHour Positive 38 | Bew HalfHour Negative 37

Bew Hour Positive 34 | Bew Hour Negative 32

Bew NextDay Positive 22 | Bew NextDay Negative 12

Robert HalfHour Positive 32 | Robert HalfHour Negative 31

Robert Hour Positive 30 | Robert Hour Negative 28

Robert NextDay Positive 15 | Robert NextDay Negative 9

*timeMemoryEmotion <- read.table("/home/nmaximo7/memoryEmotion.txt", header = T)*: reads a file in table format and creates a data frame from it. header = T so the file (memoryEmotion.txt) contains the names of the variables as its first line (Subject Time Emotion Memory).*attach(timeMemoryEmotion)*: The database is attached to the R search path so objects in the database can be accessed by simply giving their names.*timeMemoryEmotion*allows you to see the data in a nice clean table.- Let’s compute some summary statistics of Memory by time and emotion:

```
myData <- timeMemory %>% group_by(Time, Emotion)
# group_by() takes an existing data frame and converts it into a grouped one (Time and Emotion are the grouping variables) where operations are performed "by group"
get_summary_stats(myData, type = "mean_sd") # get_summary_stats computes summary statistics for myData. type is the type of summary statistics, eg., "full", "common", "mean_sd", etc.
```

- Next, we visualize our data:
**ggboxplot(myData, x = “Time”, y = “Memory”, color=“Emotion”)**

Obviously, the passing of time has a negative effect on memory. The longer the time from learning to testing, the less material (number of words) will be remembered. Besides, positive emotional words are typically processed and remembered more accurately and efficiently than neutral or negative words.

- Let’s check some assumption. There should be
**no significant outliers**. Outliers are data values that differ greatly from the majority of a set of data. They can be caused by data entry or measurement errors.

```
library(rstatix)
identify_outliers(myData, Memory)
#takes a data frame and extract rows suspected as outliers
```

There is no extreme outliers, but Bew Positive NextDay 22 is detected as an outlier. 22 seems to be *a lot of positive words* being remembered after twenty four hours compared with other subjects.

- Let’s check Normality:
**shapiro_test(myData, Memory)**. The Shapiro–Wilk test is a test of normality in statistics. The Memory is normally distributed under both emotional words conditions and at each time point (HalfHour, Hour, and NextDay), as it is shown by the Shapiro-Wilk’s test because all probabilities are greater than 0.05 (p > 0.05).

```
ggqqplot(myData, "Memory", color="Time") + facet_grid("Time ~ Emotion")
```

We draw a “quantile-quantile” plot to visually check the normality assumption using ggqqplot. *With face_grid we can split our plot into many plots*. Its syntax is as follows: facet_grid(row_variable ~ column_variable). If the data is normally distributed, the points in the QQ-normal plot lie on a straight diagonal line.

- We use the command anova_test() to run the ANOVA:
**anova_test(data = timeMemoryEmotion, dv = Memory, wid = Subject, within = c(Time, Emotion))**where data is our data frame; dv = Memory is the dependent variable, wid = Subject is the column name containing subjects identifier, and within = c(Time, Emotion) are the within-subjects factor variables.

```
ANOVA Table (type III tests)
$ANOVA
Effect DFn DFd F p p<.05 ges
1 Time 2 12 324.037 3.61e-11 * 0.844
2 Emotion 1 6 91.263 7.51e-05 * 0.170
3 Time:Emotion 2 12 10.073 3.00e-03 * 0.051
```

The Memory is statistically significantly different at the different time points (F(2, 12) = 324.037, p < 0.05) and under both emotional words conditions (F(1, 6) = 91.263, p < 0.05). There is also a statistically significant two-way interaction between Time and Emotion (F(2, 12) = 10.073, p < 0.05). In other words, the impact that one factor (e.g., time) has on the outcome or dependent variable (Memory) depends on the other factor (Emotion) and vice versa.

- Finally, we will ask R to compute multiple pairwise paired t-tests between the levels of the within-subjects factor Time:
**timeMemoryEmotion %>% pairwise_t_test(Memory ~ Time, paired = TRUE, p.adjust.method = “bonferroni”)**

Considering the Bonferroni adjusted p-value (p.adj), it can be seen that the effect of time in memory is statistically significant at each time point.

“The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes-no question, and each with its own Boolean-valued outcome: success (with probability p) or failure (with probability q = 1 − p),” (Wikipedia, Binomial distribution).

A binomial test compares a sample proportion to a hypothesized proportion. The test has the following null and alternative hypotheses: H_{0}: π = p, the population proportion π is equal to some value p. H_{a}: π ≠ p, the population proportion π is not equal to some value p.

**Binomial tests are used in situations where a researcher wants to know whether or not a subject is just guessing** or truly is able to perform some task. For example, a monkey is shown two pictures of dots simultaneously. Then, it is shown two choices: the previous dots added together and a different number. Let’s suppose that the experiment is repeated 100 times and the monkey ended up selecting the right answer on 68 of the trials.

**binom.test(68, 100, 0.5, alternative=‘greater’)** performs an exact test of a simple null hypothesis about the probability of success in a Bernoulli experiment where: x = 68 is the number of successes; n = 100 is the number of trials; p = 0.5 is the hypothesized probability of success; and alternative=‘greater’ indicates the alternative hypothesis.

The binomial test tells us something about what flipping a coin could have done (p = 0.5). It states that the binomial process, flipping a coin p=1/2, is capable of producing 65/100 or greater with only a very small probability (0.0002044). Therefore, the monkey is not really good at Maths, but it is very unlikely that the 68% correct answers could be explained by pure chance, so it was not just guessing. Hire the fucking monkey!

In other words, p-value is less than 0.05. It indicates strong evidence against the null hypothesis. Therefore, we reject the null hypothesis, and accept the alternative hypothesis: the true probability of success is greater than 0.5.

How can we distinguish the fair from the loaded dice? **binom.test(14, 60, 1/6)**, so x = 14 is the number of successes of an outcome, e.g., the die landing with a six; n = 60 is the number of trials; and p = 1/6 is the hypothesized probability of success (a fair dice).

p-value = 0.1664, it is higher than 0.05 (> 0.05), so it is not statistically significant. It indicates strong evidence for the null hypothesis. This means that we retain the null hypothesis, p = 1/6, the dice is fair.

**binom.test(30, 60, 1/6)**, so x = 30, the die has landed with a six 30 times; n = 60 is the number of trials; and p = 1/6 is the hypothesized probability of success (a fair dice).

p-value = 2.785e-09, it is lower than 0.05 (> 0.05), so it is statistically significant. It indicates strong evidence against the null hypothesis. This means that the dice is loaded. In other words, it is very unlikely that a fair dice will land on six 30 times out of 60.