4 Lists and attributes

4.1 Cross-entropy loss take two

In one of the previous exercises, we computed the cross-entropy loss between a logical vector \(y \in \{0,1\}^n\) and a numeric vector \(p \in (0,1)^n\). This measure can be equivalently defined as:

\(L(p, y) = - \frac{1}{n} \sum_{i=1}^{n} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)\)

Using vectorised operations, but not relying on ifelse this time, implement this formula.

cross_entropy_loss_2 <- function(p, y) {
  stopifnot(is.numeric(p))
  stopifnot(is.logical(y))
  
  n <- length(p)
  
  stopifnot(n == length(y))
  
  # y is coerced to 0/1, so each term keeps only the matching logarithm
  -(1 / n) * sum(y * log(p) + (1 - y) * log(1 - p))
}

Then, compute the cross-entropy loss between, for instance, “y <- sample(c(FALSE, TRUE), n, replace = TRUE)” and “p <- runif(n)” for some n. (Without replace = TRUE, sample fails as soon as n exceeds 2.)

n <- 100
y <- sample(c(FALSE, TRUE), n, replace = TRUE)
p <- runif(n)

cross_entropy_loss_2(p, y)

[1] 0.9256621

Note how seamlessly we translate between FALSE/TRUEs and 0/1s in the above equation (in particular, where \(1 - y_i\) means the logical negation of \(y_i\)).
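As a quick sanity check on that coercion, here is a small sketch comparing the arithmetic formula with an ifelse-based version. Both helper names below are made up for this illustration, and the input vectors are arbitrary.

```r
# Hypothetical ifelse-based variant, written only for comparison
cross_entropy_loss_ifelse <- function(p, y) {
  -mean(ifelse(y, log(p), log(1 - p)))
}

# The arithmetic formula from above, without the input checks
cross_entropy_loss_arith <- function(p, y) {
  -(1 / length(p)) * sum(y * log(p) + (1 - y) * log(1 - p))
}

y <- c(TRUE, FALSE, TRUE, TRUE)
p <- c(0.9, 0.2, 0.6, 0.5)

1 - y                              # arithmetic coerces the logical vector to 0/1
identical(1 - y, as.numeric(!y))   # 1 - y is the negation, stored as numbers
all.equal(cross_entropy_loss_arith(p, y), cross_entropy_loss_ifelse(p, y))
```

Both checks should come back TRUE, which is all the "seamless translation" amounts to.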

4.2 First attributes

4.4.2. But there are a few use cases

Create a list with EUR/AUD, EUR/GBP, and EUR/USD exchange rates read from the euraud-*.csv, eurgbp-*.csv, and eurusd-*.csv files in our data repository. Each of its three elements should be a numeric vector storing the currency exchange rates. Furthermore, equip them with currency_from, currency_to, date_from, and date_to attributes. For example:

currency_to <- c("AUD", "GBP", "USD")

str(lapply(currency_to,
  function(currency_to) {
    data <-
      scan(
        paste0(
          "https://github.com/gagolews/teaching-data/raw/",
          "master/marek/eur",
          tolower(currency_to),
          "-20200101-20200630.csv"
        ),
        comment.char = "#"
      )
    structure(
      data,
      currency_from = "EUR",
      currency_to = currency_to,
      date_from = "2020-01-01",
      date_to = "2020-06-30"
    )
  }))

List of 3
 $ : num [1:182] NA 1.6 1.6 NA NA ...
  ..- attr(*, "currency_from")= chr "EUR"
  ..- attr(*, "currency_to")= chr "AUD"
  ..- attr(*, "date_from")= chr "2020-01-01"
  ..- attr(*, "date_to")= chr "2020-06-30"
 $ : num [1:182] NA 0.848 0.851 NA NA ...
  ..- attr(*, "currency_from")= chr "EUR"
  ..- attr(*, "currency_to")= chr "GBP"
  ..- attr(*, "date_from")= chr "2020-01-01"
  ..- attr(*, "date_to")= chr "2020-06-30"
 $ : num [1:182] NA 1.12 1.11 NA NA ...
  ..- attr(*, "currency_from")= chr "EUR"
  ..- attr(*, "currency_to")= chr "USD"
  ..- attr(*, "date_from")= chr "2020-01-01"
  ..- attr(*, "date_to")= chr "2020-06-30"
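
Once attached, the attributes can be read back with attr() or attributes(). A small offline sketch, using made-up rates rather than the downloaded files (the object name rates is only for illustration):

```r
# Made-up values for illustration only; the real data come from the CSV files
rates <- structure(
  c(NA, 1.12, 1.11),
  currency_from = "EUR",
  currency_to = "USD",
  date_from = "2020-01-01",
  date_to = "2020-06-30"
)

attr(rates, "currency_to")   # read a single attribute
names(attributes(rates))     # list all attribute names
mean(rates, na.rm = TRUE)    # the object still behaves like a numeric vector
```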

4.3 Comment

4.4.3. Special attributes

comment is perhaps the most rarely used special attribute. Create an object (whatever) equipped with the comment attribute. Verify that assigning to it anything other than a character vector leads to an error. Read its value by calling the comment function. Display the object equipped with this attribute. Note that the print function ignores its existence whatsoever: this is how special it is.

my_numbers <- c(1, 2, 3)

comment(my_numbers) <- "a curious attribute"

# Can't do this
comment(my_numbers) <- TRUE

Error in `comment<-`(`*tmp*`, value = TRUE): attempt to set invalid 'comment' attribute

comment(my_numbers)

[1] "a curious attribute"
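
The exercise also asks to display the object. A short sketch of that step, which additionally wraps the failing assignment in tryCatch() so the script keeps running:

```r
my_numbers <- c(1, 2, 3)
comment(my_numbers) <- "a curious attribute"

print(my_numbers)        # the comment is not shown here
attributes(my_numbers)   # but the attribute is there

# Assigning a non-character value fails; tryCatch() returns the error message
tryCatch(
  {
    comment(my_numbers) <- TRUE
  },
  error = function(e) conditionMessage(e)
)

comment(my_numbers)      # unchanged after the failed assignment
```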

4.4 Label elements with `names()`

4.4.4. Labelling vector elements with the names attribute

(x <- structure(c(13, 2, 6), names=c("spam", "sausage", "celery")))

   spam sausage  celery 
     13       2       6

Verify that the above x is still an ordinary numeric vector by calling typeof and sum on it.

typeof(x)

[1] "double"

sum(x)

[1] 21

👍

4.5 Functions which return named vectors

A whole lot of functions return named vectors. Evaluate the following expressions and read the corresponding pages in their documentation:

quantile(runif(100)),

quantile(runif(100))

         0%         25%         50%         75%        100% 
0.002999586 0.227113775 0.422457957 0.697311777 0.993522633

Each element is named after the probability of its quantile, written as a percentage. Neat.
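
A short sketch of what those names allow. The seed is arbitrary and only there to make the run reproducible:

```r
set.seed(123)  # arbitrary seed, only for reproducibility
q <- quantile(runif(100))

names(q)           # "0%" "25%" "50%" "75%" "100%"
q["50%"]           # index by name: the median, still carrying its label
unname(q["50%"])   # the bare value

quantile(runif(100), names = FALSE)  # names = FALSE skips the labels
```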

hist(runif(100), plot=FALSE),

hist(runif(100), plot=FALSE)

$breaks
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

$counts
 [1] 13 11 14  6  8 16  9  6  7 10

$density
 [1] 1.3 1.1 1.4 0.6 0.8 1.6 0.9 0.6 0.7 1.0

$mids
 [1] 0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95

$xname
[1] "runif(100)"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

Yes, a name for each feature of the histogram (strictly speaking, this one is a named list rather than a named atomic vector). breaks holds the bin boundaries, counts the frequencies, you’ve got density, and then one I wasn’t particularly expecting: mids, the midpoints of the bins. A little bit of non-standard evaluation stores the x input expression as xname, and then equidist records whether the bins are equally wide.
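
To poke at that structure a little, a minimal sketch (the variable name h is arbitrary):

```r
h <- hist(runif(100), plot = FALSE)

is.list(h)   # TRUE: a list, so each component can have its own type and length
names(h)     # "breaks" "counts" "density" "mids" "xname" "equidist"
class(h)     # "histogram"

# mids should be the midpoints of consecutive breaks
nb <- length(h$breaks)
all.equal(h$mids, (h$breaks[-1] + h$breaks[-nb]) / 2)
```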

options() (take note of digits, scipen, max.print, and width),

options()[names(options()) %in% c("digits", "scipen", "max.print", "width")]

$digits
[1] 7

$max.print
[1] 99999

$scipen
[1] 0

$width
[1] 80

Yeah, I’ve used scipen before to see a number written out in full instead of the scientific 1.1e6-style version.

capabilities().

capabilities()

       jpeg         png        tiff       tcltk         X11        aqua 
       TRUE        TRUE        TRUE        TRUE       FALSE       FALSE 
   http/ftp     sockets      libxml        fifo      cledit       iconv 
       TRUE        TRUE       FALSE        TRUE       FALSE        TRUE 
        NLS       Rprof     profmem       cairo         ICU long.double 
       TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
    libcurl 
       TRUE

Ok, info on whether my R build has certain functionality.

4.6 Exercises

4.5. Exercises

Provide an answer to the following questions.

What is the meaning of c(TRUE, FALSE)*1:10?

The same as c(1, 0) * 1:10. The logical vector is coerced to numbers and recycled along 1:10, so the odd numbers are kept (multiplied by 1) and the even numbers become 0 (multiplied by 0).
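
Running it confirms the pattern:

```r
c(TRUE, FALSE) * 1:10
# [1] 1 0 3 0 5 0 7 0 9 0

# The same thing with the coercion and the recycling spelled out
identical(c(TRUE, FALSE) * 1:10, rep(c(1L, 0L), times = 5) * 1:10)
```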

What does sum(as.logical(x)) compute when x is a numeric vector?

I think this is the number of non-zero values. Non-zero values are converted to TRUE by as.logical(), and sum() then coerces each TRUE to 1.
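
A quick check on an arbitrary vector:

```r
x <- c(0, 1.5, -2, 0, 3)

as.logical(x)        # FALSE TRUE TRUE FALSE TRUE
sum(as.logical(x))   # 3
sum(x != 0)          # 3, the same count written directly
```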

We said that atomic vectors of the type character are the most general ones. Therefore, is as.numeric(as.character(x)) the same as as.numeric(x), regardless of the type of x?

Definitely not. For instance, a logical value converted to a string first is treated by as.numeric() like any other string that doesn’t represent a number, so it becomes NA.

as.numeric(TRUE)

[1] 1

as.numeric(as.character(TRUE))

Warning: NAs introduced by coercion

[1] NA

What is the meaning of as.logical(x+y) if x and y are logical vectors? What about as.logical(x*y), as.logical(1-x), and as.logical(x!=y)?

Huh, does the x+y one function like an OR statement? First the + operator converts both values to numeric, then any non-zero value (here 1 or 2) gets turned to TRUE by as.logical(), so as long as at least one of x or y is TRUE, the result will be too.

x*y would then be like an AND statement, because if even one is FALSE, you multiply by zero to get zero, which gets converted to FALSE.

We know from the cross-entropy loss example that we can use 1 - x to give us the negation of a logical value, e.g., TRUE becomes FALSE, like !x.

For the last one I think the as.logical() call is doing nothing because the result is already logical.
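
These guesses can be checked against the built-in logical operators on all four combinations of inputs. Note that x != y on logical vectors is an exclusive OR, which is why xor() appears in this sketch:

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

identical(as.logical(x + y), x | y)        # OR
identical(as.logical(x * y), x & y)        # AND
identical(as.logical(1 - x), !x)           # NOT
identical(as.logical(x != y), xor(x, y))   # exclusive OR
identical(as.logical(x != y), x != y)      # as.logical() changes nothing here
```

All five comparisons should return TRUE.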

What is the meaning of the following when x is a logical vector?

cummin(x) and cummin(!x),

Whether all values have been TRUE up to that point, and the same for FALSE.

cummax(x) and cummax(!x),

Whether any values have been TRUE up to that point, and then the same for FALSE.

cumsum(x) and cumsum(!x),

The number of TRUE values and the number of FALSE values up to that point.

cumprod(x) and cumprod(!x).

Whether all values have been TRUE up to that point, and the same for FALSE.
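
A small worked example on an arbitrary vector. The results come back as 0/1 numbers rather than FALSE/TRUE:

```r
x <- c(TRUE, TRUE, FALSE, TRUE)

cummin(x)    # 1 1 0 0   all TRUE so far?
cummax(x)    # 1 1 1 1   any TRUE so far?
cumsum(x)    # 1 2 2 3   number of TRUEs so far
cumprod(x)   # 1 1 0 0   same values as cummin(x) for logical input

cummin(!x)   # 0 0 0 0   all FALSE so far?
cummax(!x)   # 0 0 1 1   any FALSE so far?
cumsum(!x)   # 0 0 1 1   number of FALSEs so far
cumprod(!x)  # 0 0 0 0
```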

Let x be a named numeric vector, e.g., “x <- quantile(runif(100))”. What is the result of 2*x, mean(x), and round(x, 2)?

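Double every element of the vector. Take the sum of all values in the vector and divide it by the length of the vector. Round every value in the vector to 2 decimal places. The elementwise operations, 2*x and round(x, 2), keep the names, while mean(x) returns a single unnamed value.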
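
What happens to the names can be checked on a tiny named vector (the values are arbitrary):

```r
x <- c(a = 1.234, b = 5.678)

2 * x         # elementwise, names kept
round(x, 2)   # elementwise, names kept
mean(x)       # an aggregate: a single unnamed value

names(2 * x)              # "a" "b"
is.null(names(mean(x)))   # TRUE
```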

What is the meaning of x == NULL?

Uhh, I’m not exactly sure what that means. It looks like it tests each element of x for containing the value NULL, but that doesn’t work because NULL behaves like a zero-length vector: the comparison returns logical(0) whatever x is. I know that to test for NULL you need to use is.null().
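
Trying it out on an arbitrary vector:

```r
x <- c(1, 2, 3)

x == NULL           # logical(0): comparing with a zero-length vector
length(x == NULL)   # 0
is.null(x)          # FALSE
is.null(NULL)       # TRUE
```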

Give two ways to create a named character vector.

Name the elements when creating the vector or use the names<- replacement function.

c(this = "that", that = "this")

  this   that 
"that" "this"

my_vec <- c("John", "Smith")

names(my_vec) <- c("first_name", "last_name")

my_vec

first_name  last_name 
    "John"    "Smith"

Give two ways (discussed above; there are more) to remove the names attribute from an object.

Use unname() or set the attribute to NULL.

names(my_vec) <- NULL

my_vec

[1] "John"  "Smith"
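
And the other way, with unname(). In this sketch the vector is recreated so the example runs on its own; the name unnamed is arbitrary:

```r
my_vec <- c(first_name = "John", last_name = "Smith")

unnamed <- unname(my_vec)   # returns a copy without names

names(unnamed)   # NULL
names(my_vec)    # the original keeps its names
```

Unlike the names<- replacement, unname() leaves the original object untouched, which is handy inside a longer expression.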