Fast data.table to matrix conversion in C [Depends: #4196] #4144

Draft
wants to merge 207 commits into
base: master
207 commits
a0c09dc
Began implementing a fast as.matrix in C
sritchie73 Dec 28, 2019
a614743
Made loop code more efficient
sritchie73 Dec 28, 2019
a7a22a0
Casmatrix now works on simple types
sritchie73 Dec 29, 2019
27fa71c
Implemented Casmatrix for character vectors
sritchie73 Dec 29, 2019
0d6742a
Implemented Casmatrix for list type columns
sritchie73 Dec 29, 2019
89e4239
Bugfix: no longer crashes on 1-column matrices
sritchie73 Dec 29, 2019
e101b0e
removed spurious line of code
sritchie73 Dec 29, 2019
dff21b5
Style fix: changed <- to =
sritchie73 Dec 29, 2019
a7118b7
as.matrix now handles type conversion
sritchie73 Dec 29, 2019
4456f26
Special case for data.tables with multi-columns
sritchie73 Dec 29, 2019
7b4891a
Bugfix
sritchie73 Dec 29, 2019
d226099
Improved multi-column handling in as.matrix
sritchie73 Dec 30, 2019
cc08c5d
Style fix: Numeric -> Explicit integer
sritchie73 Dec 30, 2019
92c1d54
Removed expensive copy call when given rownames
sritchie73 Dec 30, 2019
002fd76
Refactored Casmatrix
sritchie73 Dec 30, 2019
ee4e6af
Added long vector support
sritchie73 Dec 30, 2019
02310aa
refactored wide-column detection
sritchie73 Dec 30, 2019
a9c9926
Replaced inner loop with memcpy
sritchie73 Dec 30, 2019
5c9f526
Added OpenMP support to Casmatrix
sritchie73 Dec 31, 2019
7cbf38e
Removed extraneous xlength call
sritchie73 Dec 31, 2019
224a8ab
Removed calls to R API from OpenMP threads
sritchie73 Jan 1, 2020
ab3ed57
R_alloc used to allocate pointer array memory
sritchie73 Jan 1, 2020
946f38e
Imposed 32 bit vector length on matrix dim
sritchie73 Jan 3, 2020
297b607
Bugfix type casting
sritchie73 Jan 3, 2020
2dd90d9
Single-line ifs now multi-line for codecov
sritchie73 Jan 3, 2020
504ee7c
Renamed asmatrix.c to matrix.c
sritchie73 Jan 3, 2020
d4c2465
Explicit integer L not needed when using `:`
sritchie73 Jan 3, 2020
4d3669e
Added check for multi-column and list rownames
sritchie73 Jan 3, 2020
b367f23
matrix column and row number check now INT_MAX
sritchie73 Jan 3, 2020
b711fb0
rownames column may now be an ff object
sritchie73 Jan 3, 2020
d41cb0e
Refactored column checks
sritchie73 Jan 3, 2020
1f566f4
homogenous non-atomic types converted correctly
sritchie73 Jan 3, 2020
f58029d
Added bit64 support and fallback coercion
sritchie73 Jan 3, 2020
893cbba
Bugfix rownames check
sritchie73 Jan 3, 2020
e63df58
Simplified rownames handling
sritchie73 Jan 4, 2020
1f7f94e
Replaced int with Rboolean in asmatrix_logical
sritchie73 Jan 4, 2020
1f50c13
Split out handling of class and types
sritchie73 Jan 4, 2020
caeb706
Better handling of additional and multi-classes
sritchie73 Jan 4, 2020
f5fdfb3
Added tests for a debugged new rownames checks
sritchie73 Jan 4, 2020
a0b760e
added tests to cover all C matrix types
sritchie73 Jan 5, 2020
49fc1b3
Added tests for ff coverage
sritchie73 Jan 5, 2020
cc954bc
Added tests for bit64 coverage
sritchie73 Jan 5, 2020
a75705a
bugfix bad test specification
sritchie73 Jan 5, 2020
a1ffe65
Added tests to cover remaining type conversions
sritchie73 Jan 5, 2020
24af9fd
Test 2134.4 now passes
sritchie73 Jan 5, 2020
9ceea1b
Moved dimension check into R code
sritchie73 Jan 5, 2020
64334f8
Catch case when bit64 package not installed
sritchie73 Jan 7, 2020
475c24d
bugfix interger.max checks
sritchie73 Jan 7, 2020
d08e67d
Removed ff tests and added nocov
sritchie73 Jan 8, 2020
7b60f79
Nocov for dimension checks
sritchie73 Jan 8, 2020
815c4b5
Bugfix; error is stop in R
sritchie73 Jan 8, 2020
d2ec0f0
Merge in master to get updated tests.Rraw file
sritchie73 Jan 8, 2020
5a81e37
Updated new test numbers so CI doesnt fail
sritchie73 Jan 8, 2020
9a68b7c
"numeric" is not a "type", see ?typeof
sritchie73 Jan 8, 2020
447d3e7
Unecessary to track dm with n and p
sritchie73 Jan 8, 2020
b4bdf9f
Fixed compiled warnings for asmatrix_logical
sritchie73 Jan 8, 2020
6294407
Added timing function to C code
sritchie73 Jan 8, 2020
589b7c0
nocov for rownames ff check
sritchie73 Jan 8, 2020
8aa261e
More robust handling of malformed data.tables
sritchie73 Jan 9, 2020
f0aafae
spelling error in warning message
sritchie73 Jan 9, 2020
ee625f3
removed rownames.index, no need to track
sritchie73 Jan 10, 2020
e90db9c
Removed unecessary {} for single-line if
sritchie73 Jan 10, 2020
8aa779f
Reverted error message
sritchie73 Jan 10, 2020
685385e
Fixed tests
sritchie73 Jan 10, 2020
dbd9331
Simplified some if statements
sritchie73 Jan 10, 2020
0bad2ca
Added helper function for column properties
sritchie73 Jan 10, 2020
da74463
sapply can return matrix causing errors
sritchie73 Jan 10, 2020
2f853b9
removed brackets from single line if
sritchie73 Jan 12, 2020
3481a93
Bugfix
sritchie73 Jan 12, 2020
5038ebc
Added support for raw matrices
sritchie73 Jan 12, 2020
bdcea33
error when coercion of integer64 but no bit64
sritchie73 Jan 12, 2020
32ba105
column_properties() only required if coercion
sritchie73 Jan 12, 2020
2f12067
integer64 now works with raw
sritchie73 Jan 12, 2020
746d287
Fixed raw to logical and added test
sritchie73 Jan 12, 2020
131e2d2
Nocov for unsupported matrix types
sritchie73 Jan 12, 2020
87ac4c9
asmatrix_logical redundant with integer
sritchie73 Jan 12, 2020
9768918
missing indentation
sritchie73 Jan 12, 2020
4d87e9f
Do not enter typeof coercion if recursive cols
sritchie73 Jan 12, 2020
c8e6aed
Added fallback for recursive non-list columns
sritchie73 Jan 12, 2020
75f6b57
Reduced number of vapply across columns
sritchie73 Jan 12, 2020
defc74e
turns out we do need to separate int and log
sritchie73 Jan 12, 2020
9f64ddb
streamlined comments
sritchie73 Jan 12, 2020
5dbc33c
No need for column_properties after type coerce
sritchie73 Jan 12, 2020
4067cd4
Added warning for type fallback
sritchie73 Jan 12, 2020
20ce5fd
Added non.atomic check to class addition
sritchie73 Jan 12, 2020
a963766
Added helper functions
sritchie73 Jan 12, 2020
b842b65
Bugfix extra class checks
sritchie73 Jan 12, 2020
80199fd
Bugfix test
sritchie73 Jan 12, 2020
8c2cbd3
More helper functions
sritchie73 Jan 12, 2020
f4ecb46
Added test to cover VECSXP
sritchie73 Jan 12, 2020
a948c7f
Added nocovs
sritchie73 Jan 12, 2020
ec698bb
Add to, don't overwrite class(X)
sritchie73 Jan 13, 2020
b7cbaab
allocMatrix instead of allocVector
sritchie73 Jan 13, 2020
47e79be
Simple OMP for loops can be used
sritchie73 Jan 13, 2020
2be89d6
Fixed test 2135.1
sritchie73 Jan 13, 2020
0f14c06
No need to assign dim after Casmatrix
sritchie73 Jan 13, 2020
f1ce5cd
Fixed nocov tags in matrix.c
sritchie73 Jan 13, 2020
ab0453c
recursive non-list types found from typelist
sritchie73 Jan 13, 2020
5d516d7
bugfix
sritchie73 Jan 13, 2020
af02985
Deduplicated helper functions
sritchie73 Jan 13, 2020
d9a5fc8
removed trailing whitespace
sritchie73 Jan 13, 2020
590363d
No need to vectorify when coercing to list
sritchie73 Jan 13, 2020
5381580
Maybe.recursive handling now clearer
sritchie73 Jan 13, 2020
f713224
Fixed type checking for unlist
sritchie73 Jan 13, 2020
e92c6d0
Fixed typo
sritchie73 Jan 13, 2020
9047f64
Improved helper function comments
sritchie73 Jan 14, 2020
c4405dd
Improved memory when list + factor or date
sritchie73 Jan 14, 2020
6cf1553
Fixed comment
sritchie73 Jan 14, 2020
bef9fa1
Merge branch 'master' into asmatrix-c
mattdowle Jan 15, 2020
ab048ad
as.matrix now uses Crindlist
sritchie73 Jan 25, 2020
18c45c4
Bugfix rownames
sritchie73 Jan 25, 2020
1a1e40a
Basic Casmatrix implementation following Crbindlist
sritchie73 Jan 26, 2020
eee7197
Condensed loop, removed debug prints
sritchie73 Jan 26, 2020
37f8c42
ncol = 0 should preserve rownames
sritchie73 Jan 26, 2020
d040581
Reverting to commit bef9fa17
sritchie73 Jan 26, 2020
c351389
Crbindlist used for type coercion
sritchie73 Jan 26, 2020
69cb3b0
Now handles integer64 coercion and factors
sritchie73 Jan 26, 2020
f618c64
Type now preserved when matrix has dim 0
sritchie73 Jan 26, 2020
2da2d76
class mismatch check should not trigger if asmatrix
sritchie73 Jan 26, 2020
109bcf8
Test ncol=0 nrow>0
sritchie73 Jan 26, 2020
44232b1
check int64 and list coercion rules
sritchie73 Jan 26, 2020
114c710
Restoring matrix.c for conversion
sritchie73 Jan 26, 2020
bdf66fb
Removed class logic
sritchie73 Jan 26, 2020
f5dd790
len check in R
sritchie73 Jan 26, 2020
1ef7be9
Comments
sritchie73 Jan 26, 2020
3de02cc
rm empty line
sritchie73 Jan 26, 2020
c43daec
Casmatrix now integer64 aware
sritchie73 Jan 27, 2020
0a136fc
Coercion of integer64 and complex to character
sritchie73 Jan 27, 2020
9f3b8eb
Class not added to matrix
sritchie73 Jan 27, 2020
8164cc1
Fixed tests
sritchie73 Jan 27, 2020
5d59d23
Rownames coerced to charcter if needed
sritchie73 Jan 27, 2020
99dead0
Will use coerceAsList pending #4196
sritchie73 Jan 27, 2020
856a8c1
No need to handle CPLX separately, #4203
sritchie73 Jan 27, 2020
9397963
Missing nprotect counter increment
sritchie73 Jan 27, 2020
c3ba221
asCharacterInteger64 generalised to 64bit vectors
sritchie73 Jan 28, 2020
6a1343d
missing EOF newlines
sritchie73 Jan 28, 2020
70cdd14
Formatting + 64 int
2005m Jan 28, 2020
bd5ca7a
integer64 should be long long int, not long int
sritchie73 Jan 28, 2020
3a384c0
Using PRId64 instead of lld as per #4602
sritchie73 Jan 28, 2020
5d82c5f
memrecycle now theoretically long vector compatible
sritchie73 Jan 28, 2020
07ae873
Added 64bit compatible memcpy
sritchie73 Jan 28, 2020
cc0d412
Bugfix memcpy64
sritchie73 Jan 28, 2020
8198f46
ansloc also needs to be int64_t
sritchie73 Jan 29, 2020
2ab6122
No need for memcpy64, bug was elsewhere
sritchie73 Jan 29, 2020
c3bd7ac
%ld -> %"PRId64"
sritchie73 Jan 29, 2020
a839c40
nrow 0 check must come before i64 class
sritchie73 Jan 29, 2020
06597bf
Optimise for data.tables with all same col type
sritchie73 Jan 29, 2020
56c1ec0
sizeof * nrow for number of bytes in memcpy
sritchie73 Jan 29, 2020
e23ddbe
VECSXP and STRSXP wrong way round
sritchie73 Jan 29, 2020
186d687
coercion rules
sritchie73 Jan 29, 2020
60614d5
bugfix coerce
sritchie73 Jan 29, 2020
a3ba3c6
Added OMP support
sritchie73 Jan 29, 2020
7e361d0
Need to allocate memory for array of ptrs
sritchie73 Jan 29, 2020
e3a9e41
Simplified code to macros
sritchie73 Jan 29, 2020
30017a8
Coerce before recycling other columns
sritchie73 Jan 29, 2020
93144c1
i64 coercion works again
sritchie73 Jan 29, 2020
f0e99c3
Macro comments
sritchie73 Jan 30, 2020
e73f143
Factors can be easily handled in C.
sritchie73 Jan 30, 2020
47a1331
Date-likes should now be formatted in C
sritchie73 Jan 30, 2020
07e6f00
moved as.character.ITime into C function
sritchie73 Jan 31, 2020
51ecbc0
formatting
sritchie73 Jan 31, 2020
a00d0a7
Second attempt at calling format from C
sritchie73 Jan 31, 2020
10f6d03
S3 dispatch doesnt work from C
sritchie73 Jan 31, 2020
f880b27
char_Date more general than char_IDate
sritchie73 Jan 31, 2020
0452272
FF support moved into as.matrix
sritchie73 Jan 31, 2020
4498083
Rownames checks and coercions now in C
sritchie73 Jan 31, 2020
c81b367
bugfix
sritchie73 Jan 31, 2020
2d57084
Need to extract rownames.value
sritchie73 Jan 31, 2020
a39d771
Moved maximum dimension check to C
sritchie73 Jan 31, 2020
005c228
Code to unpack nested data.tables
sritchie73 Jan 31, 2020
9dfd669
missing ;
sritchie73 Jan 31, 2020
ffb7310
fixed names() getters
sritchie73 Jan 31, 2020
4dfe9a5
multidim columns unpacked
sritchie73 Jan 31, 2020
f44ecdb
bugfix callRfun1
sritchie73 Jan 31, 2020
4fa0c53
fixed comment
sritchie73 Jan 31, 2020
7c7758a
Column recycling
sritchie73 Jan 31, 2020
6cffa32
Modify by reference semantics fixed
sritchie73 Jan 31, 2020
ce9e211
bugfixes and test updates
sritchie73 Jan 31, 2020
e98cf28
Names in nested dts no longer modified by ref.
sritchie73 Jan 31, 2020
82de007
Bugfix integer64
sritchie73 Jan 31, 2020
d1fec57
Silence compiler warnings for snprintf
sritchie73 Jan 31, 2020
57ee386
Attempt to silence PRId64 compiler warning
sritchie73 Jan 31, 2020
94d8bf5
linux still reports "%lld" must be used
sritchie73 Jan 31, 2020
fca5ab9
Micromanage PROTECT stack
sritchie73 Jan 31, 2020
eec32e1
rownames check fix
sritchie73 Jan 31, 2020
c501db2
Entry function from .Call controls PROTECT stack
sritchie73 Jan 31, 2020
3e88266
asCharacterFactor is internal R func
sritchie73 Jan 31, 2020
a7fac5a
rownames vs. rncontainer fixed
sritchie73 Jan 31, 2020
37cbea9
Fixed regression in as.character.ITime
sritchie73 Feb 1, 2020
c65e113
Fixed test
sritchie73 Feb 1, 2020
9a9b0df
Unintentional pointer arithmatic
sritchie73 Feb 1, 2020
80f4bd4
missed an int -> int64_t in memrecycle
sritchie73 Feb 1, 2020
cd5bb04
R_xlen_t -> int64_t
sritchie73 Feb 18, 2020
79f8899
Tests to check no modification by reference
sritchie73 Feb 18, 2020
7b20cea
Don't use copy for data.table made from structure
sritchie73 Feb 18, 2020
a284589
Only need 1 test for dropping NULL columns
sritchie73 Feb 18, 2020
93563a3
Fixed detection of coercion required
sritchie73 Feb 19, 2020
7236060
base type is logical not raw
sritchie73 Feb 19, 2020
f5c5b46
Fixed raw type coercion rules
sritchie73 Feb 19, 2020
dd81334
raw now works with list columns
sritchie73 Feb 19, 2020
2d30f8a
Fixed ncol incrementer
sritchie73 Feb 19, 2020
61fcc0a
bugfix raw type detection and coercion
sritchie73 Feb 19, 2020
6a5cb04
Refactored preprocess in asmatrix
sritchie73 Feb 19, 2020
7668701
Updated integer64 tests for raw rules
sritchie73 Feb 19, 2020
8519f31
Fixed initialisation rules
sritchie73 Feb 20, 2020
fdea714
*wd is a pointer to an INTEGER SEXP array not int64_t
sritchie73 Feb 20, 2020
53846b8
missed an int64_t case
sritchie73 Feb 20, 2020
17 changes: 1 addition & 16 deletions R/IDateTime.R
@@ -198,22 +198,7 @@ as.ITime.times = function(x, ms = 'truncate', ...) {
}

as.character.ITime = format.ITime = function(x, ...) {
# adapted from chron's format.times
# Fix for #811. Thanks to @StefanFritsch for the code snippet
neg = x < 0L
x = abs(unclass(x))
hh = x %/% 3600L
mm = (x - hh * 3600L) %/% 60L
# #2171 -- trunc gives numeric but %02d requires integer;
# as.integer is also faster (but doesn't handle integer overflow)
# http://stackoverflow.com/questions/43894077
ss = as.integer(x - hh * 3600L - 60L * mm)
res = sprintf('%02d:%02d:%02d', hh, mm, ss)
# Fix for #1354, so that "NA" input is handled correctly.
if (is.na(any(neg))) res[is.na(x)] = NA
neg = which(neg)
if (length(neg)) res[neg] = paste0("-", res[neg])
res
.Call("CasCharacterITime", x)
}

as.data.frame.ITime = function(x, ...) {
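The R body removed above is replaced by a call into compiled code that this view does not show. As a rough illustration only, the same arithmetic can be sketched in plain C. This is not the PR's `CasCharacterITime`: the names `format_itime` and `NA_INT` are hypothetical, and NA is signalled through the return value instead of an R `NA_character_`.

```c
#include <limits.h>
#include <stdio.h>

/* Assumption: R stores NA_integer_ as INT_MIN; redefined here so the
   sketch compiles without R's headers. */
#define NA_INT INT_MIN

/* Format seconds-since-midnight as [-]HH:MM:SS into buf.
   Returns 0 for NA input (buf is set to ""), 1 otherwise. */
int format_itime(int secs, char *buf, size_t buflen) {
  if (secs == NA_INT) {
    if (buflen > 0) buf[0] = '\0';
    return 0;
  }
  const int neg = secs < 0;
  const int x = neg ? -secs : secs;       /* safe: INT_MIN was handled above */
  const int hh = x / 3600;                /* whole hours */
  const int mm = (x - hh * 3600) / 60;    /* whole minutes left over */
  const int ss = x - hh * 3600 - mm * 60; /* remaining seconds */
  snprintf(buf, buflen, "%s%02d:%02d:%02d", neg ? "-" : "", hh, mm, ss);
  return 1;
}
```

Calling `format_itime(3661, buf, sizeof buf)` with a 16-byte buffer fills it with `01:01:01`; a negative input gets a leading `-`, mirroring the `paste0("-", res[neg])` step in the removed R code.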
84 changes: 16 additions & 68 deletions R/data.table.R
@@ -1870,10 +1870,9 @@ as.matrix.data.table = function(x, rownames=NULL, rownames.value=NULL, ...) {
stop("length(rownames)==0 but should be a single column name or number, or NULL")
} else {
if (isTRUE(rownames)) {
if (length(key(x))>1L) {
if (length(key(x))>1L)
warning("rownames is TRUE but key has multiple columns ",
brackify(key(x)), "; taking first column x[,1] as rownames")
}
rownames = if (length(key(x))==1L) chmatch(key(x),names(x)) else 1L
}
else if (is.logical(rownames) || is.na(rownames)) {
@@ -1883,7 +1882,6 @@ as.matrix.data.table = function(x, rownames=NULL, rownames.value=NULL, ...) {
else if (is.character(rownames)) {
w = chmatch(rownames, names(x))
if (is.na(w)) stop("'", rownames, "' is not a column of x")
rownames = w
}
else { # rownames is a column number already
rownames = as.integer(rownames)
@@ -1892,73 +1890,23 @@ as.matrix.data.table = function(x, rownames=NULL, rownames.value=NULL, ...) {
" which is outside the column number range [1,ncol=", ncol(x), "].")
}
}
} else if (!is.null(rownames.value)) {
if (length(rownames.value)!=nrow(x))
stop("length(rownames.value)==", length(rownames.value),
" but should be nrow(x)==", nrow(x))
}
if (!is.null(rownames)) {
# extract that column and drop it.
rownames.value = x[[rownames]]
dm = dim(x) - 0:1
cn = names(x)[-rownames]
X = x[, .SD, .SDcols = cn]
} else {
dm = dim(x)
cn = names(x)
X = x
}
if (any(dm == 0L))
return(array(NA, dim = dm, dimnames = list(rownames.value, cn)))
p = dm[2L]
n = dm[1L]
collabs = as.list(cn)

# Create shallow copy - where each element of the list X is simply a pointer to
# the same column in the data.table x. This means that we do not modify the
# input data.table when dropping rows or coercing columns. See tests #2139.XXX
X = x
class(X) = NULL
non.numeric = non.atomic = FALSE
all.logical = TRUE
for (j in seq_len(p)) {
if (is.ff(X[[j]])) X[[j]] = X[[j]][] # nocov to bring the ff into memory, since we need to create a matrix in memory
xj = X[[j]]
if (length(dj <- dim(xj)) == 2L && dj[2L] > 1L) {
if (inherits(xj, "data.table"))
xj = X[[j]] = as.matrix(X[[j]])
dnj = dimnames(xj)[[2L]]
collabs[[j]] = paste(collabs[[j]], if (length(dnj) >
0L)
dnj
else seq_len(dj[2L]), sep = ".")
}
if (!is.logical(xj))
all.logical = FALSE
if (length(levels(xj)) > 0L || !(is.numeric(xj) || is.complex(xj) || is.logical(xj)) ||
(!is.null(cl <- attr(xj, "class", exact=TRUE)) && any(cl %chin%
c("Date", "POSIXct", "POSIXlt"))))
non.numeric = TRUE
if (!is.atomic(xj))
non.atomic = TRUE
}
if (non.atomic) {
for (j in seq_len(p)) {
xj = X[[j]]
if (is.recursive(xj)) { }
else X[[j]] = as.list(as.vector(xj))
}
}
else if (all.logical) { }
else if (non.numeric) {
for (j in seq_len(p)) {
if (is.character(X[[j]])) next
xj = X[[j]]
miss = is.na(xj)
xj = if (length(levels(xj))) as.vector(xj) else format(xj)
is.na(xj) = miss
X[[j]] = xj
}
}
X = unlist(X, recursive = FALSE, use.names = FALSE)
dim(X) <- c(n, length(X)/n)
dimnames(X) <- list(rownames.value, unlist(collabs, use.names = FALSE))
X

# Extract and drop the rownames column, if used
if (!is.null(rownames)) {
rownames.value = X[[rownames]]
X[[rownames]] = NULL
}

# Remaining type and class coercion is handled in Casmatrix
ans = .Call(Casmatrix, X, rownames.value)
ans
}
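The body of `Casmatrix` is not part of this view. As a hedged illustration of the idea behind the commits "Replaced inner loop with memcpy" and "sizeof * nrow for number of bytes in memcpy", the sketch below copies equal-length columns into one contiguous column-major buffer, which is the layout R uses for matrices. The function name `columns_to_matrix` and its signature are assumptions made for this example; the real code works on R `SEXP` objects and also handles coercion, recycling and row names.

```c
#include <stddef.h>
#include <string.h>

/* Copy ncol columns of nrow doubles each into out, column by column.
   cols[j] points at the data of column j; out must have room for
   nrow * ncol doubles. Hypothetical helper, not the PR's code. */
void columns_to_matrix(const double *const *cols, size_t nrow, size_t ncol,
                       double *out) {
  for (size_t j = 0; j < ncol; ++j) {
    /* memcpy takes a byte count, so the row count is scaled by sizeof */
    memcpy(out + j * nrow, cols[j], nrow * sizeof(double));
  }
}
```

Because each column lands in its own non-overlapping slice of `out`, the per-column copies are independent of one another, which is the property that makes a loop of this shape a candidate for the OpenMP parallelisation mentioned in the commit list.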

# bug #2375. fixed. same as head.data.frame and tail.data.frame to deal with negative indices
145 changes: 142 additions & 3 deletions inst/tests/tests.Rraw
@@ -12544,7 +12544,7 @@ mat4 <- matrix(c("a", 1, 5), nrow=1, dimnames=list(c("x"), c("id", "X", "Y")))
test(1899.14, as.matrix(DT[1,], 1), mat[1,,drop=FALSE])
test(1899.15, as.matrix(DT[1,], "id"), mat[1,,drop=FALSE])
test(1899.16, as.matrix(DT[1,], rownames.value="x"), mat4)
test(1899.17, as.matrix(DT[1,], rownames.value=c("x", "y")), error="length(rownames.value)==2 but should be nrow(x)==1")
test(1899.17, as.matrix(DT[1,], rownames.value=c("x", "y")), error="Extracted rownames column or provided rownames.values do not match the number of rows in the matrix")
test(1899.18, as.matrix(DT, rownames=TRUE, rownames.value=1:nrow(DT)), error="rownames and rownames.value cannot both be used at the same time")

# index argument for fread, #2633
@@ -13687,7 +13687,7 @@ test(1967.526, x[keyby=a], x, warning=c("Ignoring keyby= because j= i
test(1967.53, as.matrix(x, rownames = 2:3),
error = 'length(rownames)==2 but')
test(1967.54, as.matrix(x[0L]),
structure(logical(0), .Dim = c(0L, 2L), .Dimnames = list(NULL, c("a", "b"))))
structure(integer(0), .Dim = c(0L, 2L), .Dimnames = list(NULL, c("a", "b"))))

test(1967.55, subset(x, 5L), error = "'subset' must evaluate to logical")

@@ -15789,7 +15789,7 @@ DT = data.table(a=1, b=2)
test(2074.06, DT[ , c(.SD[1], .SD[1, .SD[1]]), by=a], data.table(a=1, b=2, b=2))
## as.matrix.data.table when a column has columns (only possible when constructed incorrectly)
DT = structure(list(a=1:5, d=data.table(b=6:10, c=11:15), m=matrix(16:25, ncol=2L)), class = c('data.table', 'data.frame'))
test(2074.07, as.matrix(DT), matrix(1:25, ncol=5L, dimnames=list(NULL, c('a', 'd.b', 'd.c', 'm.1', 'm.2'))))
test(2074.07, as.matrix(DT), matrix(1:25, ncol=5L, dimnames=list(NULL, c('a', 'd.b', 'd.c', 'm.V1', 'm.V2'))))
## can induce !cedta() from base::rownames to get this error
test(2074.08, rownames(structure(list(1:5), class='data.table')), error="Has it been created manually")
## default dimnames.data.table
@@ -16770,6 +16770,145 @@ test(2132.2, fifelse(TRUE, 1, s2), error = "S4 class objects (except nanot
test(2132.3, fcase(TRUE, s1, FALSE, s2), error = "S4 class objects (except nanotime) are not supported. Please see https://github.com/Rdatatable/data.table/issues/4131.")
rm(s1, s2, class2132)

## Check for appropriate errors when giving rownames to as.matrix.data.table that are mutli-column (only possible when constructed incorrectly) or list
DT = structure(list(a=list(1,2:3,4:6,letters[1:5],list(1, "a")), d=data.table(b=6:10, c=11:15), m=matrix(16:25, ncol=2L)), class = c('data.table', 'data.frame'))
test(2133.1, as.matrix(DT, rownames="d"), error="Extracted rownames column or provided rownames.values are multi-column type")
test(2133.2, as.matrix(DT, rownames=2), error='Extracted rownames column or provided rownames.values are multi-column type')
mat = matrix(6:25, nrow=5, dimnames=list(DT[["a"]], c("d.b", "d.c", "m.V1", "m.V2")))
test(2133.3, as.matrix(DT, rownames="a"), mat, warning="Extracted rownames column or provided rownames.values are a list column")
test(2133.4, as.matrix(DT, rownames=1), mat, warning='Extracted rownames column or provided rownames.values are a list column')

## Check C-implementations of basic matrix types not covered by previous tests
lmat = matrix(c(TRUE, FALSE, NA, FALSE), ncol=2, dimnames=list(NULL, c("A", "B")))
ldt = as.data.table(lmat)
test(2134.1, as.matrix(ldt), lmat)
cmat = matrix(complex(real=1:4, imaginary=0:3), ncol=2, dimnames=list(NULL, c("A", "B")))
cdt = as.data.table(cmat)
test(2134.2, as.matrix(cdt), cmat)

## Test bit64 conversion in as.matrix
if (test_bit64) {
# integer64 with logical, raw, or integer should be coerced to integer64
i64_mat <- matrix(as.integer64(c(1,0,NA,1:6)), nrow=3, ncol=3, dimnames=list(NULL, LETTERS[1:3]))
DT = data.table(A=c(TRUE, FALSE, NA), B=1:3, C=as.integer64(4:6))
test(2135.1, as.matrix(DT), i64_mat) # should be integer64 aware, but class not added

# integer64 with raw, numeric and/or complex should be coerced to character -
# large integers (>32 bits) cannot be converted to numeric or complex
mat = matrix(c("0", "1", "0.1", "0.2", "1+0i", "2+1i", "02", "03"), ncol=4, dimnames=list(NULL, LETTERS[1:4]))
DT = data.table(A=as.integer64(0:1), B=c(0.1, 0.2), C=complex(real=1:2, imaginary=0:1), D=as.raw(2:3))
test(2135.2, as.matrix(DT), mat)

# But not if any non-atomic columns
DT = data.table(A=as.integer64(0:1), B=c('asd', 'qwe'), C=list(2:3, "A"))
mat = matrix(list(as.integer64(0), as.integer64(1), 'asd', 'qwe', 2:3, "A"),
ncol=3, dimnames=list(NULL, c("A", "B", "C")))
test(2135.3, as.matrix(DT), mat)
}
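Tests 2135.2 and 2138.3 rely on integer64 values being rendered as character. The commits "integer64 should be long long int, not long int" and "Using PRId64 instead of lld" point at the portability issue involved: the width of `long` differs between platforms, so a fixed-width type and its matching format macro are needed. A minimal sketch follows, assuming (as bit64 does) that the NA value is stored as the minimum 64-bit integer. The names `int64_to_string` and `NA_INT64` are hypothetical and do not come from the PR.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: bit64 represents NA_integer64_ as INT64_MIN. */
#define NA_INT64 INT64_MIN

/* Write the decimal form of x into buf. Returns 0 for NA (buf untouched),
   1 otherwise. 21 bytes are enough for any int64_t plus the terminator. */
int int64_to_string(int64_t x, char *buf, size_t buflen) {
  if (x == NA_INT64) return 0;
  /* PRId64 expands to the correct conversion for int64_t on this platform */
  snprintf(buf, buflen, "%" PRId64, x);
  return 1;
}
```

With `%ld` or `%lld` written out by hand, the same call would be correct on some platforms and mismatched on others, which is consistent with the compiler warnings the later commits in the list try to silence.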

## Test other class and type conversion in as.matrix
mat = matrix(letters[1:4], ncol=2, dimnames=list(NULL, c("A", "B")))
DT = data.table(A=as.factor(c("a", "b")), B=as.factor(c("c", "d")))
test(2136.1, as.matrix(DT), mat)

mat = matrix(as.raw(1:4), ncol=2, dimnames=list(NULL, c("A", "B")))
DT = data.table(A=as.raw(1:2), B=as.raw(3:4))
test(2136.2, as.matrix(DT), mat)

mat = matrix(paste0("2019-01-0", 1:4), ncol=2, dimnames=list(NULL, c("A", "B")))
DT = data.table(A=as.IDate(paste0("2019-01-0", 1:2)), B=as.IDate(paste0("2019-01-0", 3:4)))
test(2136.3, as.matrix(DT), mat)

mat = matrix(c(1, 2, 0.1, 0.2, TRUE, FALSE), ncol=3, dimnames=list(NULL, c("A", "B", "C")))
DT = data.table(A=1:2, B=c(0.1, 0.2), C=c(TRUE, FALSE))
test(2136.4, as.matrix(DT), mat)

mat = matrix(c(1, 2, 0.1, 0.2, TRUE, FALSE, complex(1:2, 1:2)), ncol=4, dimnames=list(NULL, LETTERS[1:4]))
DT = data.table(A=1:2, B=c(0.1, 0.2), C=c(TRUE, FALSE), D=complex(1:2, 1:2))
test(2136.5, as.matrix(DT), mat)

mat = matrix(c("TRUE", "FALSE", "00", "01"), ncol=2, dimnames=list(NULL, c("A", "B")))
DT = data.table(A=c(TRUE,FALSE), B=as.raw(0:1))
test(2136.6, as.matrix(DT), mat)

mat = matrix(logical(0), nrow=26, ncol=0, dimnames=list(LETTERS, NULL))
DT = data.table(A=LETTERS)
test(2136.7, as.matrix(DT, rownames="A"), mat) # nrow and rownames should be preserved

## Tests for complex column types not captured by standard tests. These are things that would usually only
## be possible through incorrect construction of data.tables, e.g. NULL columns, environments, expressions, etc.
DT = structure(list(A=NULL, B=1:2, C=NULL, D=3:4, E=NULL), class=c("data.table", "data.frame"))
mat = matrix(1:4, ncol=2, dimnames=list(NULL, c("B", "D")))
test(2137.1, as.matrix(DT), mat)

DT = structure(list(B=1:2, C=NULL, D=3:4, E=NULL), class=c("data.table", "data.frame"))
test(2137.2, as.matrix(DT), mat)

DT = structure(list(C=NULL, E=NULL), class=c("data.table", "data.frame"))
mat = array(NA, dim = list(0, 0))
test(2137.3, as.matrix(DT), mat)

DT = structure(list(A=NULL, B=logical(0), C=NULL, D=character(0)), class=c("data.table", "data.frame"))
mat = matrix(character(0), nrow=0, ncol=2, dimnames = list(NULL, c("B", "D")))
test(2137.4, as.matrix(DT), mat)

DT = structure(list(A=1:6, B=1:2), class=c("data.table", "data.frame")) # DT[,B] is c(1,2)
DT2 = data.table(A=1:6, B=1:2) # DT2[,B] is c(1,2,1,2,1,2)
test(2137.5, as.matrix(DT), as.matrix(DT2))
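Test 2137.5 expects a short column in a malformed data.table to be recycled to the length of the longest column, so that `B = 1:2` behaves like `c(1,2,1,2,1,2)` next to `A = 1:6`. The sketch below shows that recycling rule in isolation. It is an illustration under assumptions, not the PR's `memrecycle`: the function name `recycle_int` is made up, and it writes into a separate destination buffer, in the spirit of tests 2139.x, which require that the input column is not modified.

```c
#include <stddef.h>

/* Fill dst[0..nrow) by cycling through src[0..srclen).
   Returns 1 on success, 0 if there is nothing to recycle from
   but rows were requested. src is never written to. */
int recycle_int(const int *src, size_t srclen, int *dst, size_t nrow) {
  if (srclen == 0) return nrow == 0;
  for (size_t i = 0; i < nrow; ++i) {
    dst[i] = src[i % srclen]; /* wrap around the shorter source column */
  }
  return 1;
}
```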

DT = structure(list(A=function(){}, B=list(1), C=expression(1+1)), class=c("data.table", "data.frame"))
mat = matrix(c(A=list(function(){}), B=list(1), C=list(expression(1+1))), ncol=3, dimnames=list(NULL, c("A","B","C")))
test(2137.6, as.matrix(DT), mat)

DT = data.table(A=function(){}, B=list(1))
mat = matrix(c(A=function(){}, B=list(1)), ncol=2, dimnames=list(NULL, c("A","B")))
test(2137.7, as.matrix(DT), mat)

DT = data.table(A=expression(as.character(system.time())), B=quote(1+1))
mat = matrix(list(expression(as.character(system.time())), expression(as.character(system.time())),
expression(as.character(system.time())), quote(1+1), quote(1+1), quote(1+1)),
nrow=3, ncol=2, dimnames=list(NULL, c("A","B")))
test(2137.8, as.matrix(DT), mat) # non-atomic wrapped in list #4196

# Test rownames to character conversions
DT = data.table(A=as.IDate(1:3), B=factor(c("A", "B", "A"), levels=c("B", "A")))
mat1 = matrix(c("A", "B", "A"), nrow=3, ncol=1, dimnames=list(as.character(as.IDate(1:3)), "B"))
mat2 = matrix(as.character(as.IDate(1:3)), nrow=3, ncol=1, dimnames=list(c("A", "B", "A"), "A"))
test(2138.1, as.matrix(DT, rownames="A"), mat1)
test(2138.2, as.matrix(DT, rownames="B"), mat2)

if (test_bit64) {
# Integer64 as row names
DT = data.table(A=as.integer64(0:1), B=c('asd', 'qwe'), C=list(2:3, "A"))
mat = matrix(list('asd', 'qwe', 2:3, "A"), ncol=2, dimnames=list(c("0", "1"), c("B", "C")))
test(2138.3, as.matrix(DT, rownames="A"), mat)
}

# Check input data.table is not modified by reference in as.matrix
dt = data.table(rn=letters[1:3], A=1:3, B=4:6)
dt2 = copy(dt)
mat = as.matrix(dt, rownames="rn")
test(2139.1, dt, dt2) # rn column taken as rownames should still be in dt

dt = data.table(rn=letters[1:3], A=1:3, B=4:6)
dt2 = copy(dt)
mat = as.matrix(dt)
test(2139.2, dt, dt2) # column type coercion should not modify input dt

dt = structure(list(A=NULL, B=1:2, C=NULL, D=3:4, E=NULL), class=c("data.table", "data.frame"))
dt2 = structure(list(A=NULL, B=1:2, C=NULL, D=3:4, E=NULL), class=c("data.table", "data.frame"))
mat = as.matrix(dt)
test(2139.3, dt, dt2) # null columns should not be dropped after as.matrix

dt = structure(list(A=1:6, B=1:2), class=c("data.table", "data.frame")) # DT[,B] is c(1,2)
dt2 = structure(list(A=1:6, B=1:2), class=c("data.table", "data.frame")) # DT[,B] is c(1,2)
mat = as.matrix(dt)
test(2139.4, dt, dt2) # recycling of dt$B to length dt$A in as.matrix should not modify input dt$B.

dt = structure(list(a=list(1,2:3,4:6,letters[1:5],list(1, "a")), d=data.table(b=6:10, c=11:15), m=matrix(16:25, ncol=2L)), class = c('data.table', 'data.frame'))
dt2 = structure(list(a=list(1,2:3,4:6,letters[1:5],list(1, "a")), d=data.table(b=6:10, c=11:15), m=matrix(16:25, ncol=2L)), class = c('data.table', 'data.frame'))
mat = as.matrix(dt)
test(2139.5, dt, dt2) # flattening data.table and matrix columns should not modify input dt

########################
# Add new tests here #