
I found this visual from the presentation interesting. One thing that had never occurred to me is that visualizations can also lie.
I found this visual easy to read and understand.

Summary
For my final project in this class, I will analyze the difference in price between two car models. I chose this topic to show how price differs based on the model of the car and its current mileage.
I will use several analysis methods to show these differences.
I will collect the data (random samples) from the website autotrader.com. The samples will be used Honda and Nissan cars.
Null hypothesis
The price of the cars in the sample will not be significantly different based on mileage.
Alternative hypothesis
The price of the cars in the sample will be significantly different based on mileage.
I will present several analyses of the collected data using R.
Code in R
price <- c(4520,17500,22800,25177,15455,21750,11678,19772,18916,17797,8465,8278,5527,18283,17995,25333,18263,17500,16363,16342)
miles <- c(169123,36711,11647,25592,56943,18880,65726,4091,39690,40564,110113,90288,167197,29407,35055,26670,17533,36711,29770,20510)
CombinedCars <- data.frame(price, miles)
CombinedCars
price miles
1 4520 169123
2 17500 36711
3 22800 11647
4 25177 25592
5 15455 56943
6 21750 18880
7 11678 65726
8 19772 4091
9 18916 39690
10 17797 40564
11 8465 110113
12 8278 90288
13 5527 167197
14 18283 29407
15 17995 35055
16 25333 26670
17 18263 17533
18 17500 36711
19 16363 29770
20 16342 20510
Two-sample t-test in R

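Key results
Mean of x (price) = 16385.70; mean of y (miles) = 51611.05
P-value = 0.003548
Significance level α = 0.05
Because the p-value is below α, there is a significant difference between the two means. Note that price (dollars) and mileage (miles) are measured in different units, so this comparison by itself does not show whether price depends on mileage. The ANOVA and scatter plot below speak to that relationship.
95% confidence interval for the difference in means (price minus miles): -57519.37 to -12931.33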
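The numbers above match R's default two-sample t-test, which is Welch's test (unequal variances, 95% confidence). Here is a minimal sketch that should reproduce them. It assumes the test was called as `t.test(price, miles)` on the vectors from the code section, because the original call is not shown in this post.

```r
# Data from the code section above
price <- c(4520, 17500, 22800, 25177, 15455, 21750, 11678, 19772, 18916, 17797,
           8465, 8278, 5527, 18283, 17995, 25333, 18263, 17500, 16363, 16342)
miles <- c(169123, 36711, 11647, 25592, 56943, 18880, 65726, 4091, 39690, 40564,
           110113, 90288, 167197, 29407, 35055, 26670, 17533, 36711, 29770, 20510)

# Welch two-sample t-test (R defaults: var.equal = FALSE, conf.level = 0.95)
result <- t.test(price, miles)
result$estimate   # mean of x (price) and mean of y (miles)
result$p.value
result$conf.int   # 95% CI for mean(price) - mean(miles)
```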
ANOVA test in R

This analysis shows that price differs based on mileage.
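The ANOVA output itself is not reproduced in this post. One common way to run an ANOVA of price against mileage in R is to fit a linear regression and pass it to `anova()`. The sketch below assumes that model (`price ~ miles`), which may differ from the one actually used.

```r
price <- c(4520, 17500, 22800, 25177, 15455, 21750, 11678, 19772, 18916, 17797,
           8465, 8278, 5527, 18283, 17995, 25333, 18263, 17500, 16363, 16342)
miles <- c(169123, 36711, 11647, 25592, 56943, 18880, 65726, 4091, 39690, 40564,
           110113, 90288, 167197, 29407, 35055, 26670, 17533, 36711, 29770, 20510)

# Simple linear regression of price on mileage (assumed model)
fit <- lm(price ~ miles)

# ANOVA table: F-test of whether mileage explains variation in price
anova_table <- anova(fit)
print(anova_table)

# Fitted slope: estimated change in price per additional mile
coef(fit)[["miles"]]
```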

Scatter plot using R

The scatter plot shows that price decreases as the car's mileage increases.
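The plot image is not included here. The sketch below redraws it from the same data and adds a fitted regression line. It then puts a number on the trend with a Pearson correlation test. The axis labels, title, and line color are illustrative choices rather than the original plot's settings.

```r
price <- c(4520, 17500, 22800, 25177, 15455, 21750, 11678, 19772, 18916, 17797,
           8465, 8278, 5527, 18283, 17995, 25333, 18263, 17500, 16363, 16342)
miles <- c(169123, 36711, 11647, 25592, 56943, 18880, 65726, 4091, 39690, 40564,
           110113, 90288, 167197, 29407, 35055, 26670, 17533, 36711, 29770, 20510)

# Scatter plot of price against mileage with a least-squares line
plot(miles, price, xlab = "Mileage", ylab = "Price ($)",
     main = "Used Honda and Nissan cars")
abline(lm(price ~ miles), col = "red")

# Pearson correlation test: a negative estimate means price falls as mileage rises
ct <- cor.test(miles, price)
ct$estimate
ct$p.value
```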
Conclusion
There is a significant difference in price based on the mileage of the car, so the data support the hypothesis that price depends on mileage.
The price of a car decreases as its mileage increases.
> Beachcomber<- c(163,64,227)
> Windsurfer<- c(154,108,262)
> table(Beachcomber,Windsurfer)
Windsurfer
Beachcomber 108 154 262
64 1 0 0
163 0 1 0
227 0 0 1
> chisq.test(table(Beachcomber,Windsurfer))
Pearson’s Chi-squared test
data: table(Beachcomber, Windsurfer)
X-squared = 6, df = 4, p-value = 0.1991
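`table(Beachcomber, Windsurfer)` treats each count as a category label. As a result, it builds the 3 × 3 table of ones shown above, and that test (X-squared = 6, df = 4) says nothing about the original counts. This sketch assumes the two vectors are the rows of a 2 × 3 table of observed counts. That is an assumption, since the question itself is not included here. Under that assumption, the counts can be passed to `chisq.test()` directly as a matrix.

```r
# Observed counts; each vector is one row of a 2 x 3 contingency table (assumed layout)
Beachcomber <- c(163, 64, 227)
Windsurfer <- c(154, 108, 262)
counts <- rbind(Beachcomber, Windsurfer)

# Pearson's chi-squared test of independence, df = (2 - 1) * (3 - 1) = 2
result <- chisq.test(counts)
result
result$expected   # expected counts if the rows and columns were independent
```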

> HighStress<- c(10,9,8,9,10,8)
> ModerateStress<- c(8,10,6,7,8,8)
> LowStress<- c(4,6,6,4,2,2)
> Combined_Groups<- data.frame(cbind(HighStress, ModerateStress, LowStress))
>
> HighStress
[1] 10 9 8 9 10 8
> ModerateStress
[1] 8 10 6 7 8 8
> LowStress
[1] 4 6 6 4 2 2
> Combined_Groups
HighStress ModerateStress LowStress
1 10 8 4
2 9 10 6
3 8 6 6
4 9 7 4
5 10 8 2
6 8 8 2
> summary(Combined_Groups)
HighStress ModerateStress LowStress
Min. : 8.00 Min. : 6.000 Min. :2.0
1st Qu.: 8.25 1st Qu.: 7.250 1st Qu.:2.5
Median : 9.00 Median : 8.000 Median :4.0
Mean : 9.00 Mean : 7.833 Mean :4.0
3rd Qu.: 9.75 3rd Qu.: 8.000 3rd Qu.:5.5
Max. :10.00 Max. :10.000 Max. :6.0
>
> Stacked_Groups<- stack(Combined_Groups)
> Stacked_Groups
values ind
1 10 HighStress
2 9 HighStress
3 8 HighStress
4 9 HighStress
5 10 HighStress
6 8 HighStress
7 8 ModerateStress
8 10 ModerateStress
9 6 ModerateStress
10 7 ModerateStress
11 8 ModerateStress
12 8 ModerateStress
13 4 LowStress
14 6 LowStress
15 6 LowStress
16 4 LowStress
17 2 LowStress
18 2 LowStress
> AnovaResults <- aov(values ~ ind, data= Stacked_Groups)
> AnovaResults
Call:
aov(formula = values ~ ind, data = Stacked_Groups)
Terms:
ind Residuals
Sum of Squares 82.11111 28.83333
Deg. of Freedom 2 15
Residual standard error: 1.386442
Estimated effects may be unbalanced
> summary(AnovaResults)
Df Sum Sq Mean Sq F value Pr(>F)
ind 2 82.11 41.06 21.36 4.08e-05 ***
Residuals 15 28.83 1.92
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
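The F value in the table is the between-group mean square divided by the within-group mean square: F = (82.11/2)/(28.83/15) ≈ 21.36. This short sketch computes those sums of squares by hand as a check on `aov()`.

```r
HighStress <- c(10, 9, 8, 9, 10, 8)
ModerateStress <- c(8, 10, 6, 7, 8, 8)
LowStress <- c(4, 6, 6, 4, 2, 2)
groups <- list(HighStress, ModerateStress, LowStress)

all_values <- unlist(groups)
grand_mean <- mean(all_values)
k <- length(groups)       # number of groups
n <- length(all_values)   # total number of observations

# Between groups: group size times squared distance of each group mean from the grand mean
ss_between <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))
# Within groups: squared deviations of each value from its own group mean
ss_within <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

# F = MS_between / MS_within, with k - 1 and n - k degrees of freedom
f_value <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_value <- pf(f_value, k - 1, n - k, lower.tail = FALSE)
c(ss_between = ss_between, ss_within = ss_within, F = f_value, p = p_value)
```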
> x<- c(1,2,2,1,2,2,1,2,1,2,1,2)
> y<- c(9,3,1,8,2,6,3,4,8,3,10,6)
> mean(x)
[1] 1.583333
> mean(y)
[1] 5.25
> t.test(x)
One Sample t-test
data: x
t = 10.652, df = 11, p-value = 3.921e-07
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
1.256163 1.910503
sample estimates:
mean of x
1.583333
> t.test(y)
One Sample t-test
data: y
t = 6.0853, df = 11, p-value = 7.901e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
3.351125 7.148875
sample estimates:
mean of x
5.25
> muX<- 1.583333
> muY<- 5.25
Question 1
Ha: The new machine is not meeting the manufacturer’s specifications for average strength.
B) 70-69.1/3.5-7 = -0.26, not significant at 0.05
C) P-value = 0.397432, not significant at 0.05
D) 70-69.1/1.75-7 = -0.17, not significant at 0.05
E) 69-69.1/3.5-7 = 0.028, not significant at 0.05
P-value = 0.488831
Question 2
N = 64
Sample mean x̄ = 85
Standard deviation = 8
85 ± 1.960 * 8/√64 = 85 ± 1.96
85 - 1.96 = 83.04 and 85 + 1.96 = 86.96
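Here is the same interval in R. It uses `qnorm(0.975)` (about 1.96) as the critical value, as the hand calculation does. The 95% level is inferred from that 1.96.

```r
n <- 64
x_bar <- 85
s <- 8

z <- qnorm(0.975)            # two-sided 95% critical value, about 1.96
margin <- z * s / sqrt(n)    # 1.96 * 8 / 8
c(lower = x_bar - margin, upper = x_bar + margin)
```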
Question 3
x <- c("boys")
y <- c("girls")
z <- c(4,5,6, 19,22,28)
df <- data.frame(x,y,z)
cor.test(x,y,z)
Error in match.arg(alternative) :
‘arg’ must be NULL or a character vector
I kept getting this error.
Other approach I tried
> cor(x,y,z)
Error in cor(x, y, z) : invalid ‘use’ argument
In addition: Warning message:
In if (is.na(na.method)) stop("invalid ‘use’ argument") :
the condition has length > 1 and only the first element will be used
I kept getting this error as well.
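Both errors come from argument positions. The third positional argument of `cor.test()` is `alternative`, and the third positional argument of `cor()` is `use`, so `z` lands where a single option string is expected. A correlation also needs two numeric vectors of the same length, while `x` and `y` here are single text labels. The question is not included, so the sketch below rests on a loud assumption: the first three values of `z` belong to boys and the last three to girls. Coding the group as 0/1 and correlating it with the values gives a point-biserial correlation, which is an ordinary Pearson correlation with a binary variable.

```r
# ASSUMPTION: the first three scores are boys, the last three are girls
z <- c(4, 5, 6, 19, 22, 28)
group <- factor(c("boys", "boys", "boys", "girls", "girls", "girls"))
scores <- data.frame(group, z)

# Code the group numerically (boys = 0, girls = 1) so cor.test() gets two numeric vectors
girl <- as.numeric(scores$group == "girls")

# x and y are the first two arguments; options such as method are passed by name
result <- cor.test(girl, scores$z, method = "pearson")
result
```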
85 ± 1.96: 85 - 1.96 = 83.04 and 85 + 1.96 = 86.96
125 ± 10.32: 125 - 10.32 = 114.68 and 125 + 10.32 = 135.32
3a. Set up a 99% confidence interval estimate of the true population mean amount of paint included in a 1-gallon can.
Standard error = 0.02/√50 = 0.00283
0.99 ± 2.58 * 0.02/√50 = 0.99 ± 0.0073
0.99 - 0.0073 = 0.9827 and 0.99 + 0.0073 = 0.9973
3b. On the basis of your results, do you think that the manager has a right to complain to the manufacturer? Why?
Yes, because the entire interval lies below 1 gallon, so the cans appear to contain less than 1 gallon of paint on average.
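Here is a short R sketch of the 99% interval. It uses the sample mean 0.99, σ = 0.02, and n = 50 from the calculation above. The standard error divides σ by the square root of n, not by n itself.

```r
x_bar <- 0.99   # sample mean amount of paint per can (gallons)
sigma <- 0.02   # standard deviation of the fill amount (gallons)
n <- 50

z <- qnorm(0.995)                  # two-sided 99% critical value, about 2.58
margin <- z * sigma / sqrt(n)      # about 0.0073
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
ci
ci[["upper"]] < 1                  # TRUE: the whole interval lies below 1 gallon
```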
4a. Assuming a normal distribution, set up a 95% confidence interval estimate of the mean value of all greeting cards stored in the store’s inventory.
x̄ = 1.67; 1.96 * 0.32/√20 = 0.1402
1.67 - 0.1402 = 1.5298 and 1.67 + 0.1402 = 1.8102
4b. How might the results obtained in (a) be useful in assisting the store owner to estimate the mean value of all greeting cards in the store’s inventory?
It tells the owner, with 95% confidence, that the mean value per greeting card in the inventory is between $1.53 and $1.81. Multiplying these bounds by the number of cards in stock gives an interval estimate of the total inventory value.
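Here is the same interval in R, using the normal critical value 1.96 as the hand calculation does. The sample is small, and 0.32 is presumably a sample standard deviation. The usual textbook alternative is therefore a t interval with `qt(0.975, df = 19)`, which gives a slightly wider range.

```r
x_bar <- 1.67   # sample mean value of a greeting card ($)
s <- 0.32       # standard deviation ($)
n <- 20

z <- qnorm(0.975)            # about 1.96, matching the hand calculation
margin <- z * s / sqrt(n)    # about 0.1402
c(lower = x_bar - margin, upper = x_bar + margin)
```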
1.96 * 15/5 = 5.88
N = 35
Null example: Doing the homework assignments has no effect on a student's final grade in the class.
Alternative example: Not doing the homework assignments will result in the student not passing the class.
> population
[1] 8 14 16 10 11
> mean(population)
[1] 11.8
> sample(population,2)
[1] 16 14
> mean(16,14)   # 14 is taken as the trim argument, so this returns 16, not the mean of 16 and 14
[1] 16
> sample<-c(16,14)
> mean(sample)
[1] 15
> sd(sample)
[1] 1.414214
> mean(population)
[1] 11.8
> sd(population)
[1] 3.193744
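The mean of the random sample (16, 14) is 15, while the mean of the population (8, 14, 16, 10, 11) is 11.8.
The standard deviation of the random sample is 1.414214, while the standard deviation of the population set is 3.193744. Because sd() divides by n - 1, 3.193744 is the sample standard deviation of these five values. Treating them as the whole population (dividing by n) gives about 2.857.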
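Here is a minimal sketch of the two standard deviation formulas side by side for the five population values. R has no built-in population standard deviation function, so the second formula is written out by hand.

```r
population <- c(8, 14, 16, 10, 11)
n <- length(population)

sample_sd <- sd(population)                                   # divides by n - 1
pop_sd <- sqrt(sum((population - mean(population))^2) / n)   # divides by n

c(sample_sd = sample_sd, pop_sd = pop_sd)
```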
> x <- c(0.5, 0.2, 0.15, 010)   # R reads 010 as the number 10, not 0.10
> y <- c(0.10, 0.2, 0.6, 0.2)
> x<- c(0,1,2,3)
> x <- c(0.5, 0.2, 0.15, 010)
> x
[1] 0.50 0.20 0.15 10.00
> mean(x)
[1] 2.7125
> mean(y)
[1] 0.275
> var(x)
[1] 23.62729
> var(y)
[1] 0.04916667
> x<-c(7,9)
> sd(x)
[1] 1.414214
> 7-8/1.414214   # precedence: this computes 7 - (8/1.414214), not (7 - 8)/1.414214
[1] 1.343148
> -1/1.414214
[1] -0.7071066
> pnorm(-0.707)
[1] 0.2397832
> pnorm(0.707)
[1] 0.7602168
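The steps above (standard deviation, z-score, normal probability) can be done in one pass. In this version the mean comes from `mean()` rather than being typed in, and parentheses keep the subtraction ahead of the division.

```r
x <- c(7, 9)

# z-score of the value 7: (value - mean) / standard deviation
z <- (7 - mean(x)) / sd(x)
z

pnorm(z)       # P(Z < z): area to the left of z
1 - pnorm(z)   # P(Z > z): area to the right of z
```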
Homework assignment #4
A1. Event A = .33
A2. Event B? = .33
A3. Event A or B = 0.555
A4. P(A or B) = P(A) + P(B) = .66 (adding the two probabilities is valid only when A and B are mutually exclusive; in general, P(A or B) = P(A) + P(B) - P(A and B))
When doing this part of the assignment, I went back to the video lecture, which showed the chart by adding the two numbers. I tried adding the two numbers, which gave me 30. This part was a little confusing.
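$$P(A_1 \mid B) = \frac{(0.014)(0.9)}{(0.014)(0.9) + (0.986)(0.1)} = \frac{0.0126}{0.1112} \approx 0.113$$
While doing the calculations I kept getting 0.113 rather than 0.111. With these inputs, 0.113 is the correct result. The value 0.111 matches the denominator (0.1112) rounded, not the ratio.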
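Here is the Bayes' theorem calculation in R. The sketch reads the four numbers as P(A1) = 0.014, P(B | A1) = 0.9, P(A2) = 0.986, and P(B | A2) = 0.1. That mapping is an interpretation of the expression, since the original problem statement is not included.

```r
prior_a1 <- 0.014          # P(A1), as read from the expression
prior_a2 <- 1 - prior_a1   # P(A2) = 0.986
p_b_given_a1 <- 0.9        # P(B | A1)
p_b_given_a2 <- 0.1        # P(B | A2)

# Bayes' theorem: P(A1 | B) = P(A1) P(B | A1) / P(B)
numerator <- prior_a1 * p_b_given_a1
denominator <- numerator + prior_a2 * p_b_given_a2   # total probability of B
posterior <- numerator / denominator

round(c(numerator = numerator, denominator = denominator, posterior = posterior), 4)
```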
Wagner Perez HW#4