Final Project for LIS4930

Summary 

For my final project in this class, I will analyze the difference in price between two makes of used car. I chose this topic to show how price varies with the make of the car and its current mileage.

I will use several analysis methods to show these differences.

I will collect the data (random samples) from the website autotrader.com. The random samples will be used cars from two makes, Honda and Nissan.

Null hypothesis

The price in the sample of cars will not be significantly different based on mileage.

Alternative hypothesis

The price in the sample of cars will be significantly different based on mileage.

 

I will run several analyses on the collected data using R.

Code in R

price <- c(4520,17500,22800,25177,15455,21750,11678,19772,18916,17797,8465,8278,5527,18283,17995,25333,18263,17500,16363,16342)
miles <- c(169123,36711,11647,25592,56943,18880,65726,4091,39690,40564,110113,90288,167197,29407,35055,26670,17533,36711,29770,20510)

CombinedCars <- data.frame(price, miles)

   price  miles
1   4520 169123
2  17500  36711
3  22800  11647
4  25177  25592
5  15455  56943
6  21750  18880
7  11678  65726
8  19772   4091
9  18916  39690
10 17797  40564
11  8465 110113
12  8278  90288
13  5527 167197
14 18283  29407
15 17995  35055
16 25333  26670
17 18263  17533
18 17500  36711
19 16363  29770
20 16342  20510

 

Sample T-test in R

Screen Shot 2018-11-23 at 2.07.42 PM

Significant information

  • Sample means

Mean of x = 16385.70                  Mean of y = 51611.05

P-value = 0.003548

Significance level α = 0.05

            P-value ≤ α

This indicates that there is a significant difference between the two means.

Confidence interval for the difference in means: -57519.37 to -12931.33
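The screenshot of the t-test is not reproduced here, so the following is only a sketch of a call that could produce this kind of output. It assumes the test was run with R's default two-sample settings (Welch's t-test, unequal variances), with price as x and miles as y; the variable name `result` is mine.

```r
# Data copied from the "Code in R" section.
price <- c(4520,17500,22800,25177,15455,21750,11678,19772,18916,17797,
           8465,8278,5527,18283,17995,25333,18263,17500,16363,16342)
miles <- c(169123,36711,11647,25592,56943,18880,65726,4091,39690,40564,
           110113,90288,167197,29407,35055,26670,17533,36711,29770,20510)

# Assumption: default call, i.e. Welch two-sample t-test.
result <- t.test(price, miles)

result$estimate  # sample means of x (price) and y (miles)
result$p.value   # compare against alpha = 0.05
result$conf.int  # confidence interval for the difference in means
```

Reading the pieces off `result` this way avoids retyping numbers from the printed output.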

 

 

 

Anova test in R

Screen Shot 2018-11-23 at 2.04.53 PM

This analysis shows that price differs based on mileage.
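The exact ANOVA call is only visible in the screenshot, so this is a hedged sketch of one way to test whether price depends on mileage. It assumes the model was `price ~ miles` fitted with `aov()`; because miles is numeric, this amounts to a one-degree-of-freedom regression test. The names `fit` and `anova_table` are mine.

```r
# Data copied from the "Code in R" section.
CombinedCars <- data.frame(
  price = c(4520,17500,22800,25177,15455,21750,11678,19772,18916,17797,
            8465,8278,5527,18283,17995,25333,18263,17500,16363,16342),
  miles = c(169123,36711,11647,25592,56943,18880,65726,4091,39690,40564,
            110113,90288,167197,29407,35055,26670,17533,36711,29770,20510)
)

# Assumption: price modeled as a function of mileage.
fit <- aov(price ~ miles, data = CombinedCars)

# The first element of summary() is the ANOVA table as a data frame.
anova_table <- summary(fit)[[1]]
anova_table
```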

Screen Shot 2018-11-23 at 2.59.00 PM.png

 

 

 

Scatter plot using R

Screen Shot 2018-11-23 at 2.56.27 PM.png

The scatter plot indicates that price decreases as the mileage of the car increases.
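A plot like this can be drawn with base R graphics. The sketch below is not necessarily the call behind the screenshot; the axis labels, the red trend line, and the correlation check are my additions for illustration.

```r
# Data copied from the "Code in R" section.
price <- c(4520,17500,22800,25177,15455,21750,11678,19772,18916,17797,
           8465,8278,5527,18283,17995,25333,18263,17500,16363,16342)
miles <- c(169123,36711,11647,25592,56943,18880,65726,4091,39690,40564,
           110113,90288,167197,29407,35055,26670,17533,36711,29770,20510)

# Scatter plot of price against mileage.
plot(miles, price, xlab = "Mileage", ylab = "Price (USD)",
     main = "Price vs. mileage")

# Added for illustration: least-squares trend line.
trend <- lm(price ~ miles)
abline(trend, col = "red")

# A negative correlation matches the downward pattern in the plot.
r <- cor(miles, price)
r
```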

 

Conclusion

There is a significant difference in price based on the mileage of the car. The results support the hypothesis that price differs with mileage.

The price of a car decreases as its mileage increases.

 

Module # 11 Chi Square Test

> Beachcomber <- c(163,64,227)
> Windsurfer <- c(154,108,262)
> table(Beachcomber,Windsurfer)
           Windsurfer
Beachcomber 108 154 262
        64    1   0   0
        163   0   1   0
        227   0   0   1
> chisq.test(table(Beachcomber,Windsurfer))

	Pearson’s Chi-squared test

data:  table(Beachcomber, Windsurfer)
X-squared = 6, df = 4, p-value = 0.1991
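One thing to note about the transcript above: `table()` cross-tabulates the values in the two vectors, so each count is treated as a category label, which is why the result is a 3x3 table of zeros and ones. If the numbers are already counts, the usual approach is to pass them to `chisq.test()` as a matrix. The sketch below assumes the first two entries of each vector are category counts and the third (227, 262) is the row total; the labels `Group`, `Response`, `Category1` and `Category2` are hypothetical.

```r
# Assumption: 163/64 and 154/108 are observed counts; 227 and 262 are row
# totals and are therefore left out of the contingency table.
counts <- matrix(c(163,  64,
                   154, 108),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Group = c("Beachcomber", "Windsurfer"),
                                 Response = c("Category1", "Category2")))

# For a 2x2 table, chisq.test() applies Yates' continuity correction by default.
result <- chisq.test(counts)

result$parameter  # degrees of freedom: (2 - 1) * (2 - 1) = 1
result$p.value    # compare against the chosen significance level
```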

 

Screen Shot 2018-11-04 at 8.13.14 PM

Module # 10 Introduction to ANOVA

> HighStress<- c(10,9,8,9,10,8)
> ModerateStress<- c(8,10,6,7,8,8)
> LowStress<- c(4,6,6,4,2,2)
> Combined_Groups<- data.frame(cbind(HighStress, ModerateStress, LowStress))
>
> HighStress
[1] 10 9 8 9 10 8
> ModerateStress
[1] 8 10 6 7 8 8
> LowStress
[1] 4 6 6 4 2 2
> Combined_Groups
  HighStress ModerateStress LowStress
1         10              8         4
2          9             10         6
3          8              6         6
4          9              7         4
5         10              8         2
6          8              8         2
> summary(Combined_Groups)
   HighStress    ModerateStress     LowStress
 Min.   : 8.00   Min.   : 6.000   Min.   :2.0
 1st Qu.: 8.25   1st Qu.: 7.250   1st Qu.:2.5
 Median : 9.00   Median : 8.000   Median :4.0
 Mean   : 9.00   Mean   : 7.833   Mean   :4.0
 3rd Qu.: 9.75   3rd Qu.: 8.000   3rd Qu.:5.5
 Max.   :10.00   Max.   :10.000   Max.   :6.0
>
> Stacked_Groups<- stack(Combined_Groups)
> Stacked_Groups
   values            ind
1      10     HighStress
2       9     HighStress
3       8     HighStress
4       9     HighStress
5      10     HighStress
6       8     HighStress
7       8 ModerateStress
8      10 ModerateStress
9       6 ModerateStress
10      7 ModerateStress
11      8 ModerateStress
12      8 ModerateStress
13      4      LowStress
14      6      LowStress
15      6      LowStress
16      4      LowStress
17      2      LowStress
18      2      LowStress
> AnovaResults <- aov(values ~ ind, data= Stacked_Groups)
> AnovaResults
Call:
   aov(formula = values ~ ind, data = Stacked_Groups)

Terms:
                     ind Residuals
Sum of Squares  82.11111  28.83333
Deg. of Freedom        2        15

Residual standard error: 1.386442
Estimated effects may be unbalanced
> summary(AnovaResults)
            Df Sum Sq Mean Sq F value   Pr(>F)
ind          2  82.11   41.06   21.36 4.08e-05 ***
Residuals   15  28.83    1.92

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
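To see where the numbers in the ANOVA table come from, the sums of squares and the F value can be reproduced by hand. This is a sketch that uses the same three groups; the helper names (`scores`, `ss_between`, `ss_within`, and so on) are mine and do not appear in the transcript.

```r
scores <- list(
  HighStress     = c(10, 9, 8, 9, 10, 8),
  ModerateStress = c(8, 10, 6, 7, 8, 8),
  LowStress      = c(4, 6, 6, 4, 2, 2)
)

grand_mean  <- mean(unlist(scores))
group_means <- sapply(scores, mean)
group_sizes <- sapply(scores, length)

# Between-group variation: group means around the grand mean.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within-group variation: observations around their own group mean.
ss_within  <- sum(sapply(scores, function(g) sum((g - mean(g))^2)))

df_between <- length(scores) - 1
df_within  <- sum(group_sizes) - length(scores)

f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

c(ss_between = ss_between, ss_within = ss_within, F = f_value, p = p_value)
```

These values should line up with the Sum Sq, F value and Pr(>F) columns printed by `summary(AnovaResults)`.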

Module # 9 t-test for independent means

> x <- c(1,2,2,1,2,2,1,2,1,2,1,2)
> y <- c(9,3,1,8,2,6,3,4,8,3,10,6)

> mean(x)
[1] 1.583333
> mean(y)
[1] 5.25

> t.test(x)

	One Sample t-test

data:  x
t = 10.652, df = 11, p-value = 3.921e-07
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 1.256163 1.910503
sample estimates:
mean of x
 1.583333

> t.test(y)

	One Sample t-test

data:  y
t = 6.0853, df = 11, p-value = 7.901e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 3.351125 7.148875
sample estimates:
mean of x
     5.25

> muX <- 1.583333
> muY <- 5.25
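The two calls above are one-sample tests, each comparing a single mean against 0. Since the module is about independent means, a two-sample comparison of x and y is probably closer to what was intended. The sketch below assumes R's default Welch test; the name `result` is mine.

```r
x <- c(1,2,2,1,2,2,1,2,1,2,1,2)
y <- c(9,3,1,8,2,6,3,4,8,3,10,6)

# Two-sample t-test for independent means (Welch, unequal variances, by default).
result <- t.test(x, y)

result$estimate   # mean of x and mean of y
result$statistic  # t statistic for the difference in means
result$p.value    # compare against alpha = 0.05
```

Passing `var.equal = TRUE` would give the pooled-variance version instead, if the assignment calls for that.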

Module # 8 Hypothesis Testing and Correlation Analysis

Question 1

 

 

  1. A) Ho: A new machine is producing a particular type of cookie according to the manufacturer’s specifications.

Ha: The new machine is not meeting the manufacturer’s specifications for average strength.

 

 

B) (70 - 69.1)/(3.5 - 7) = -0.26       not significant at 0.05

C) P-value = 0.397432       not significant at 0.05

D) (70 - 69.1)/(1.75 - 7) = -0.17    not significant at 0.05

E) (69 - 69.1)/(3.5 - 7) = 0.028   not significant at 0.05

P-value = 0.488831
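The expressions above subtract 7 from the standard deviation. The usual one-sample z statistic instead divides the standard deviation by the square root of the sample size, z = (x̄ - μ0)/(σ/√n). The sketch below reads the inputs off part B and rests on assumptions that are not stated in the answers: that 70 is the hypothesized mean, 69.1 the sample mean, 3.5 the standard deviation, and 7 the square root of the sample size (n = 49).

```r
# Assumptions (not stated in the answers above): see the lead-in.
mu0   <- 70    # hypothesized mean
xbar  <- 69.1  # sample mean
sigma <- 3.5   # population standard deviation
n     <- 49    # assumed sample size, so sqrt(n) = 7

# One-sample z statistic.
z <- (xbar - mu0) / (sigma / sqrt(n))

# Two-sided p-value from the standard normal distribution.
p_two_sided <- 2 * pnorm(-abs(z))

c(z = z, p = p_two_sided)
```

Under these assumptions z works out to -1.8, which differs from the values listed above, so the answers are worth rechecking against the original question.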

 

 

Question 2

 

N= 64

Mean Xbar= 85

Std Dev.= 8

 

85 ± 1.960 × 8/sqrt(64)

85 - 1.96 = 83.04            85 + 1.96 = 86.96

 

 

Question 3

 

x <- c("boys")
y <- c("girls")
z <- c(4,5,6, 19,22,28)

df <- data.frame(x,y,z)
cor.test(x,y,z)

Error in match.arg(alternative) :
  ‘arg’ must be NULL or a character vector

I kept getting this error.

Other approach I tried

> cor(x,y,z)
Error in cor(x, y, z) : invalid ‘use’ argument
In addition: Warning message:
In if (is.na(na.method)) stop("invalid ‘use’ argument") :
  the condition has length > 1 and only the first element will be used

I got this error as well.
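Both errors come from the third argument. In `cor.test(x, y, ...)` the third positional argument is `alternative`, and in `cor(x, y, ...)` it is `use`, so passing `z` there fails. Both functions expect two numeric vectors of equal length. The sketch below assumes the first three values of `z` belong to the boys and the last three to the girls; that split is my guess, not something stated in the question.

```r
# Assumed split of z <- c(4,5,6, 19,22,28) into two numeric vectors.
boys  <- c(4, 5, 6)
girls <- c(19, 22, 28)

# Pearson correlation coefficient.
r <- cor(boys, girls)

# Significance test for the correlation.
result <- cor.test(boys, girls)

r
result$p.value
```

With only three pairs the test has very little power, so the coefficient is more informative than the p-value here.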

 

 

Module # 7 Review: Confidence Interval Estimation and Introduction to Fundamentals of Hypothesis Testing

 

  1. x̄ = 85 and σ = 8, and n = 64, set up a 95% confidence interval estimate of the population mean μ.

85 ± 1.96 × 8/√64 = 85 ± 1.96, so the interval runs from 85 - 1.96 = 83.04 to 85 + 1.96 = 86.96

 

 

  2. If x̄ = 125, σ = 24 and n = 36, set up a 99% confidence interval estimate of the population mean μ.

125 ± 2.58 × 24/√36 = 125 ± 10.32, so the interval runs from 125 - 10.32 = 114.68 to 125 + 10.32 = 135.32
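The two intervals above follow the same formula, x̄ ± z × σ/√n, so they can be wrapped in a small helper. This is a sketch; the function name `z_interval` is mine. It takes the critical value from `qnorm()` rather than the rounded table values 1.96 and 2.58, so the 99% limits differ slightly in the second decimal place.

```r
# Confidence interval for a mean when the population sd is known.
z_interval <- function(xbar, sigma, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  margin <- z * sigma / sqrt(n)         # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

q1 <- z_interval(xbar = 85,  sigma = 8,  n = 64, conf_level = 0.95)
q2 <- z_interval(xbar = 125, sigma = 24, n = 36, conf_level = 0.99)

round(q1, 2)
round(q2, 2)
```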

 

  3. The manager of a supply store wants to estimate the actual amount of paint contained in 1-gallon cans purchased from a nationally known manufacturer. It is known from the manufacturer’s specification sheet that the standard deviation of the amount of paint is equal to 0.02 gallon. A random sample of 50 cans is selected and the sample mean amount of paint per 1-gallon can is 0.99 gallon.

 

3a. Set up a 99% confidence interval estimate of the true population mean amount of paint included in a 1-gallon can.

 

0.02/√50 = 0.002828

0.99 ± 2.58 × 0.02/√50 = 0.99 ± 0.0073

0.99 - 0.0073 = 0.9827           0.99 + 0.0073 = 0.9973

 
3b. On the basis of your results, do you think that the manager has a right to complain to the manufacturer? Why?

Yes, because the confidence interval (0.9827 to 0.9973) does not contain 1 gallon.

 

 

  4. A stationery store wants to estimate the mean retail value of the greeting cards that it has in its inventory. A random sample of 20 greeting cards indicates an average value of $1.67 and a standard deviation of $0.32.

4a. Assuming a normal distribution, set up a 95% confidence interval estimate of the mean value of all greeting cards stored in the store’s inventory.

 

x̄ = 1.67      1.96 × 0.32/√20 = 0.1402

 

 

1.67 - 0.1402 = 1.5298 and 1.67 + 0.1402 = 1.8102

 
4b. How might the results obtained in (a) be useful in assisting the store owner to estimate the mean value of all greeting cards in the store’s inventory?

 

It lets the owner know that the mean value per card is likely between $1.53 and $1.81.

 

  5. If you want to be 95% confident of estimating the population mean to within a sampling error of ± 5 and the standard deviation is assumed to be equal to 15, what sample size is required?

 

1.96 × 15/5 = 5.88

5.88² = 34.57, rounded up to the next whole number

n = 35
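The sample-size calculation uses n = (z × σ/E)², rounded up. A short sketch of it in R follows; the function name `required_n` is mine.

```r
# Smallest n that keeps the margin of error within `margin`
# at the given confidence level, for a known sd.
required_n <- function(sigma, margin, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  ceiling((z * sigma / margin)^2)       # always round up
}

n_needed <- required_n(sigma = 15, margin = 5)
n_needed
```

Rounding up rather than to the nearest integer guarantees the margin of error is no larger than the target.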

 

 

 

  6. Generate your own null and alternative hypothesis statements and provide rationale for your selection.

 

Null example: Doing the homework assignments has no effect on the student’s final grade in the class.

 

Alternative example: Doing the homework assignments has an effect on the student’s final grade in the class.

 

 

Module # 6 Sampling & Confidence Interval Estimation

> population
[1]  8 14 16 10 11
> mean(population)
[1] 11.8
> sample(population,2)
[1] 16 14
> mean(16,14)
[1] 16

> sample <- c(16,14)
> mean(sample)
[1] 15
> sd(sample)
[1] 1.414214

> mean(population)
[1] 11.8
> sd(population)
[1] 3.193744

 

The mean of the random sample is 15, while the mean of the population (8, 14, 16, 10, 11) is 11.8.

 

The standard deviation of the random sample is 1.414214, while the standard deviation of the population is 3.193744.
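Two details in the transcript are worth a short sketch. First, `mean(16,14)` returned 16 because `mean()` averages only its first argument; the 14 is matched to the `trim` argument. Second, `sd()` uses the sample formula (dividing by n - 1), so treating the five values as a complete population gives a slightly smaller number. The seed and the names `drawn` and `population_sd` are my own choices.

```r
population <- c(8, 14, 16, 10, 11)

# Only the first argument is averaged; 14 is taken as `trim`.
wrong <- mean(16, 14)
# Wrapping the values in c() averages both of them.
right <- mean(c(16, 14))

# A seed (arbitrary choice) makes the random draw repeatable.
set.seed(42)
drawn <- sample(population, 2)

# sd() divides by n - 1; the population formula divides by n.
sample_sd     <- sd(population)
population_sd <- sqrt(mean((population - mean(population))^2))

c(wrong = wrong, right = right, sample_sd = sample_sd,
  population_sd = population_sd)
```

Storing the draw under a name such as `drawn`, rather than `sample`, also avoids reusing the name of the `sample()` function.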

Homework #5 Random Variables & Probability distributions

> x <- c(0.5, 0.2, 0.15, 010)
> y <- c(0.10, 0.2, 0.6, 0.2)
> x<- c(0,1,2,3)
> x <- c(0.5, 0.2, 0.15, 010)
> x
[1] 0.50 0.20 0.15 10.00
> mean(x)
[1] 2.7125
> mean(y)
[1] 0.275
> var(x)
[1] 23.62729
> var(y)
[1] 0.04916667
> x<-c(7,9)
> sd(x)
[1] 1.414214
> 7-8/1.414214
[1] 1.343148
> -1/1.414214
[1] -0.7071066

> pnorm(-0.707)
[1] 0.2397832
> pnorm(0.707)
[1] 0.7602168
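Two lines in the transcript behave differently from how they read. R parses `010` as the number 10, which is why `x` prints 10.00, and `7-8/1.414214` divides before it subtracts. The sketch below assumes the last entry of `x` was meant to be 0.10 and that the z-score was meant to be (7 - 8) divided by the standard deviation; the names `probs` and `z` are mine.

```r
# A leading zero does not make a decimal: 010 is read as 10.
leading_zero <- 010

# Assumption: the intended fourth value was 0.10.
probs <- c(0.5, 0.2, 0.15, 0.10)
mean(probs)

# Division binds tighter than subtraction, so this is 7 - (8 / 1.414214).
no_parens <- 7 - 8 / 1.414214

# With parentheses, this is the z-score for 7 given mean 8 and sd(c(7, 9)).
z <- (7 - 8) / sd(c(7, 9))

# Probability of falling below that z-score under the standard normal.
pnorm(z)
```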

 

Homework #4 Probability Theory

Homework assignment #4

 

A1. Event A = .33
A2. Event B? = .33
A3. Event A or B = 0.555
A4. P(A or B) = P(A) + P(B) = .66

When doing this part of the assignment, I went back to the video lecture, and the video built the chart by adding the two numbers. I tried adding the two numbers, which gave me 30. This part was a little confusing.

 

  1. Applying Bayes’ Theorem
  • Doing the chart on the homework was really confusing because I was not sure where to start. The videos I watched on Bayes’ rule mostly just showed the steps and the formula.

 

(0.014)(0.9) / [(0.014)(0.9) + (0.986)(0.1)]

 

P (A1 | B) = 0.111

 

While doing the calculations I kept getting 0.113 as the answer rather than 0.111, and I am not sure why.
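Evaluating the expression exactly as written gives 0.0126/0.1112, which is about 0.113, so the calculated value is consistent with these inputs. A short sketch of the same calculation in R follows; the variable names are mine and the three probabilities are taken from the expression above.

```r
prior        <- 0.014  # P(A1)
hit_rate     <- 0.9    # P(B | A1)
false_alarm  <- 0.1    # P(B | not A1)

# Bayes' theorem: P(A1 | B) = P(B | A1) P(A1) / P(B)
numerator   <- prior * hit_rate
denominator <- numerator + (1 - prior) * false_alarm
posterior   <- numerator / denominator

round(posterior, 3)
```

If the expected answer is 0.111, the difference most likely comes from different input probabilities rather than from the arithmetic.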

 

 

 

 

Wagner Perez HW#4