cross_entropy_loss_2 <- function(p, y) {
    stopifnot(is.numeric(p))
    stopifnot(is.logical(y))
    n <- length(p)
    stopifnot(n == length(y))
    -(1/n) * sum(y * log(p) + (1 - y) * log(1 - p))
}
4 Lists and attributes
4.1 Cross-entropy loss take two
4.1.2. Implicit conversion (coercion)
In one of the previous exercises, we computed the cross-entropy loss between a logical vector \(y \in \{0,1\}^n\) and a numeric vector \(p \in (0,1)^n\). This measure can be equivalently defined as:
\(L(p, y) = - \frac{1}{n} \left( \sum_{i=1}^{n} y_i \log(p_i) + (1 - y_i) \log(1-p_i) \right)\)
Using vectorised operations, but not relying on ifelse this time, implement this formula. Then, compute the cross-entropy loss between, for instance, “y <- sample(c(FALSE, TRUE), n)” and “p <- runif(n)” for some n.
n <- 100
y <- sample(c(FALSE, TRUE), n, replace = TRUE)
p <- runif(n)
cross_entropy_loss_2(p, y)
[1] 0.9256621
Note how seamlessly we translate between FALSE/TRUEs and 0/1s in the above equation (in particular, where \(1 - y_i\) means the logical negation of \(y_i\)).
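To see the coercion at work, here is a minimal sketch on toy inputs (y_demo and p_demo are made-up names and values, not the data used above). It checks that 1 - y is the numeric counterpart of !y and that the arithmetic form of the loss agrees with an ifelse-based one:

```r
# Hypothetical toy inputs, chosen only for illustration
y_demo <- c(TRUE, FALSE, TRUE)
p_demo <- c(0.9, 0.2, 0.6)

# Arithmetic coerces logical values: TRUE becomes 1, FALSE becomes 0,
# so 1 - y_demo matches the negation !y_demo converted to numbers
identical(1 - y_demo, as.numeric(!y_demo))

# The arithmetic form and an ifelse-based form give the same loss
loss_arith  <- -mean(y_demo * log(p_demo) + (1 - y_demo) * log(1 - p_demo))
loss_ifelse <- -mean(ifelse(y_demo, log(p_demo), log(1 - p_demo)))
all.equal(loss_arith, loss_ifelse)
```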
4.2 First attributes
4.4.2. But there are a few use cases
Create a list with EUR/AUD, EUR/GBP, and EUR/USD exchange rates read from the euraud-*.csv, eurgbp-*.csv, and eurusd-*.csv files in our data repository. Each of its three elements should be a numeric vector storing the currency exchange rates. Furthermore, equip them with currency_from, currency_to, date_from, and date_to attributes. For example:
currency_to <- c("AUD", "GBP", "USD")
str(lapply(currency_to,
    function(currency_to) {
        # read one file, skipping the comment lines that start with "#"
        data <- scan(
            paste0(
                "https://github.com/gagolews/teaching-data/raw/",
                "master/marek/eur",
                tolower(currency_to),
                "-20200101-20200630.csv"
            ),
            comment.char = "#"
        )
        # attach the metadata as attributes
        structure(
            data,
            currency_from = "EUR",
            currency_to = currency_to,
            date_from = "2020-01-01",
            date_to = "2020-06-30"
        )
    }))
List of 3
$ : num [1:182] NA 1.6 1.6 NA NA ...
..- attr(*, "currency_from")= chr "EUR"
..- attr(*, "currency_to")= chr "AUD"
..- attr(*, "date_from")= chr "2020-01-01"
..- attr(*, "date_to")= chr "2020-06-30"
$ : num [1:182] NA 0.848 0.851 NA NA ...
..- attr(*, "currency_from")= chr "EUR"
..- attr(*, "currency_to")= chr "GBP"
..- attr(*, "date_from")= chr "2020-01-01"
..- attr(*, "date_to")= chr "2020-06-30"
$ : num [1:182] NA 1.12 1.11 NA NA ...
..- attr(*, "currency_from")= chr "EUR"
..- attr(*, "currency_to")= chr "USD"
..- attr(*, "date_from")= chr "2020-01-01"
..- attr(*, "date_to")= chr "2020-06-30"
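Once the attributes are attached, they can be read back with attr() and attributes(). A small offline sketch, assuming a stand-in vector called rates with made-up values (it is not read from the repository):

```r
# Hypothetical stand-in for one element of the list above
rates <- structure(
    c(NA, 1.6, 1.6),
    currency_from = "EUR",
    currency_to = "AUD"
)

attr(rates, "currency_to")   # read a single attribute
names(attributes(rates))     # names of all attributes that are set
mean(rates, na.rm = TRUE)    # the data still behave like a numeric vector
```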
4.3 Comment
comment is perhaps the most rarely used special attribute. Create an object (whatever) equipped with the comment attribute. Verify that assigning to it anything other than a character vector leads to an error. Read its value by calling the comment function. Display the object equipped with this attribute. Note that the print function ignores its existence whatsoever: this is how special it is.
my_numbers <- c(1, 2, 3)
comment(my_numbers) <- "a curious attribute"
# Can't do this
comment(my_numbers) <- TRUE
Error in `comment<-`(`*tmp*`, value = TRUE): attempt to set invalid 'comment' attribute
comment(my_numbers)
[1] "a curious attribute"
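The exercise also asks to display the object. A short sketch of that step (it re-creates my_numbers so that it runs on its own) suggests that print() shows only the values, while attributes() still reports the comment:

```r
my_numbers <- c(1, 2, 3)
comment(my_numbers) <- "a curious attribute"

print(my_numbers)               # only the values are displayed
names(attributes(my_numbers))   # the comment attribute is stored nonetheless
```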
4.4 Label elements with names()
4.4.4. Labelling vector elements with the names attribute
(x <- structure(c(13, 2, 6), names=c("spam", "sausage", "celery")))
spam sausage celery
13 2 6
Verify that the above x is still an ordinary numeric vector by calling typeof and sum on it.
typeof(x)
[1] "double"
sum(x)
[1] 21
👍
4.5 Functions which return named vectors
A whole lot of functions return named vectors. Evaluate the following expressions and read the corresponding pages in their documentation:
- quantile(runif(100)),
quantile(runif(100))
0% 25% 50% 75% 100%
0.002999586 0.227113775 0.422457957 0.697311777 0.993522633
Named with the proportion at each quantile, neat.
- hist(runif(100), plot=FALSE),
hist(runif(100), plot=FALSE)
$breaks
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
$counts
[1] 13 11 14 6 8 16 9 6 7 10
$density
[1] 1.3 1.1 1.4 0.6 0.8 1.6 0.9 0.6 0.7 1.0
$mids
[1] 0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95
$xname
[1] "runif(100)"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
Yes, a name for the different features of the plot. Breaks for the bins, counts for the frequency, you’ve got density and then one I wasn’t particularly expecting, mids for the middle of the breaks. A little bit of non-standard evaluation to store the x input as a name, and then a setting that records whether the breaks are equally spaced.
- options() (take note of digits, scipen, max.print, and width),
options()[names(options()) %in% c("digits", "scipen", "max.print", "width")]
$digits
[1] 7
$max.print
[1] 99999
$scipen
[1] 0
$width
[1] 80
Yeah I’ve used scipen before to see a full integer instead of the scientific 1.1e6 style version.
- capabilities().
capabilities()
jpeg png tiff tcltk X11 aqua
TRUE TRUE TRUE TRUE FALSE FALSE
http/ftp sockets libxml fifo cledit iconv
TRUE TRUE FALSE TRUE FALSE TRUE
NLS Rprof profmem cairo ICU long.double
TRUE TRUE TRUE TRUE TRUE TRUE
libcurl
TRUE
Ok, info on whether my R build has certain functionality.
4.6 Exercises
Provide an answer to the following questions.
What is the meaning of c(TRUE, FALSE)*1:10?
The same as c(1, 0) * 1:10. The logical vector is coerced to numbers and recycled along 1:10, so the odd numbers are kept (multiplied by 1) and every even number becomes 0 (multiplied by 0).
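A quick check of that reading, with logical indexing shown next to it for contrast (indexing drops the even numbers instead of zeroing them):

```r
# Coercion plus recycling: odd positions times 1, even positions times 0
c(TRUE, FALSE) * 1:10     # 1 0 3 0 5 0 7 0 9 0

# Logical indexing recycles the same pattern but removes the elements
(1:10)[c(TRUE, FALSE)]    # 1 3 5 7 9
```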
What does sum(as.logical(x)) compute when x is a numeric vector?
I think this is the number of non-zero values. Non-zero values are converted to TRUE by as.logical(), and each TRUE is then coerced to 1 by sum.
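A small sketch to back this up, using a made-up vector x_demo. It also shows that a missing value propagates unless na.rm is set:

```r
x_demo <- c(0, 2.5, -1, 0, 7)   # hypothetical input with three non-zero values

sum(as.logical(x_demo))   # 3
sum(x_demo != 0)          # 3, the same count written as a comparison

# A missing value stays missing after as.logical()
sum(as.logical(c(1, NA, 0)))                 # NA
sum(as.logical(c(1, NA, 0)), na.rm = TRUE)   # 1
```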
We said that atomic vectors of the type character are the most general ones. Therefore, is as.numeric(as.character(x)) the same as as.numeric(x), regardless of the type of x?
Definitely not. For instance, a logical value converted to a string first is treated by as.numeric() like any other string that is not a number, so the result is NA.
as.numeric(TRUE)
[1] 1
as.numeric(as.character(TRUE))
Warning: NAs introduced by coercion
[1] NA
What is the meaning of as.logical(x+y) if x and y are logical vectors? What about as.logical(x*y), as.logical(1-x), and as.logical(x!=y)?
Huh, does the x+y one function like an OR statement? First the + operator converts both values to numeric, then any non-zero result gets turned to TRUE by as.logical(), so as long as at least one of x or y is TRUE, the result will be too.
x*y would then be like an AND statement, because if even one is FALSE, you multiply by zero to get zero, which gets converted to FALSE.
We know from the cross-entropy loss example that we can use 1 - x to give us the negation of a logical value, e.g., TRUE becomes FALSE, like !x.
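For the last one I think the as.logical() call is doing nothing because the result of x != y is already logical. The comparison itself is TRUE exactly when one of x and y is TRUE and the other is FALSE, so it acts as an exclusive OR.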
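These readings can be checked against the logical operators on all four combinations of inputs. The names x_demo and y_demo are made up here so as not to overwrite anything defined earlier:

```r
# All four combinations of two logical values
x_demo <- c(FALSE, FALSE, TRUE, TRUE)
y_demo <- c(FALSE, TRUE, FALSE, TRUE)

identical(as.logical(x_demo + y_demo), x_demo | y_demo)        # OR
identical(as.logical(x_demo * y_demo), x_demo & y_demo)        # AND
identical(as.logical(1 - x_demo), !x_demo)                     # NOT
identical(as.logical(x_demo != y_demo), xor(x_demo, y_demo))   # exclusive OR
```

Each call should return TRUE.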
What is the meaning of the following when x is a logical vector?
- cummin(x) and cummin(!x),
Whether all values have been TRUE up to that point (reported as 1 or 0), and the same for FALSE.
- cummax(x) and cummax(!x),
Whether any values have been TRUE up to that point, and then the same for FALSE.
- cumsum(x) and cumsum(!x),
The number of TRUE values and the number of FALSE values up to that point.
- cumprod(x) and cumprod(!x).
Same as cummin: whether all values have been TRUE up to that point, and the same for FALSE, because a single 0 makes every later product 0.
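A short run on a made-up vector x_demo illustrates the four cumulative functions. Note that the results come back as numbers, not as logical values:

```r
x_demo <- c(TRUE, TRUE, FALSE, TRUE)   # hypothetical input

cummin(x_demo)     # 1 1 0 0: all TRUE so far?
cummax(x_demo)     # 1 1 1 1: any TRUE so far?
cumsum(x_demo)     # 1 2 2 3: running count of TRUE values
cumprod(x_demo)    # 1 1 0 0: same pattern as cummin

cummax(!x_demo)    # 0 0 1 1: any FALSE so far?
cumsum(!x_demo)    # 0 0 1 1: running count of FALSE values
```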
Let x be a named numeric vector, e.g., “x <- quantile(runif(100))”. What is the result of 2*x, mean(x), and round(x, 2)?
2*x doubles every element of the vector and keeps the names. mean(x) takes the sum of all values and divides it by the length of the vector, giving a single unnamed number. round(x, 2) rounds every value to 2 decimal places, again keeping the names.
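What happens to the names can be checked directly. A minimal sketch with a hypothetical named vector x_demo:

```r
x_demo <- c(a = 0.123, b = 0.456, c = 0.789)   # made-up values

names(2 * x_demo)         # elementwise arithmetic keeps the names
names(round(x_demo, 2))   # so does rounding
names(mean(x_demo))       # NULL: an aggregate has no names to carry over
```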
What is the meaning of x == NULL?
Uhh, I’m not exactly sure what that means. It looks like it’s trying to test each element of x for containing the value NULL, but I don’t think that works because NULL is handled as a zero-length vector, so the comparison returns a zero-length logical vector whatever x is. I know that to test for NULL you need to use is.null().
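A quick sketch with a made-up vector x_demo shows what the comparison actually returns:

```r
x_demo <- c(1, 2, 3)      # hypothetical input

x_demo == NULL            # logical(0): nothing to compare against
length(x_demo == NULL)    # 0

is.null(NULL)             # TRUE: the proper test
is.null(x_demo)           # FALSE
```

Because the result has length zero, using x == NULL as the condition of an if statement fails rather than quietly returning FALSE.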
Give two ways to create a named character vector.
Name the elements when creating the vector or use the names<- replacement function.
c(this = "that", that = "this")
this that
"that" "this"
my_vec <- c("John", "Smith")
names(my_vec) <- c("first_name", "last_name")
my_vec
first_name last_name
"John" "Smith"
Give two ways (discussed above; there are more) to remove the names attribute from an object.
Use unname() or set the attribute to NULL.
names(my_vec) <- NULL
my_vec
[1] "John" "Smith"
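The unname() route works a little differently: it returns a stripped copy and leaves the original alone. A sketch with a hypothetical vector named_vec:

```r
named_vec <- c(first_name = "John", last_name = "Smith")   # made-up example

unname(named_vec)   # a copy without the names attribute
names(named_vec)    # the original still carries its names
```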