data.table Compatibility with the future native pipe

Right now, r-devel is implementing a native pipe that is incompatible with chaining in data.table. Using the proposed d => syntax, I get an error in this R version:

data <- CJ(group1 = letters[1:5], group2 = letters[10:14])

data[, x := rnorm(.N)] |> 
  d => d[, mean(x), by = group1]
#> Error: function '[' not supported in RHS call of a pipe

If "[" won't be supported as the RHS of a pipe, then no syntax transformation can make data.table compatible with the |> pipe, if I understand correctly.

I see two ways of addressing this.

On the one hand, we could raise this issue with r-core and see whether they can change their implementation to make it work.

On the other hand, we could create a functional alias to data.table:::"[.data.table", let's say dt(). This, for example, seems to work:

dt <- data.table:::"[.data.table"

data[, x := rnorm(.N)] |> 
   dt(, mean(x), by = group1)
#>    group1          V1
#> 1:      a  0.58876302
#> 2:      b -0.24705765
#> 3:      c  0.07676786
#> 4:      d -0.33047608
#> 5:      e  0.54832829

(in fact, it also works with dt <- base::"[")

Personally, I don't mind that notation at all. It is true that if such a simple fix resolves the issue, then each user could do it in their own scripts, but I think it would be preferable for data.table to provide a standardised alias so that other people's code stays readable. For the record, I don't like dt very much; it's just the first thing that came to mind.
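For readers on a released R version (4.1.0 or later), one workaround that needs no alias is to pipe into a parenthesised anonymous function. This is a minimal sketch and not something proposed in the thread; the x values are made up so that the result is reproducible.

```r
library(data.table)

# Toy data: 5 x 5 group combinations; x is made up for illustration
data <- CJ(group1 = letters[1:5], group2 = letters[10:14])
data[, x := as.numeric(seq_len(.N))]

# `[` sits inside the function body, so the RHS of the pipe is an ordinary call
res <- data |> (\(d) d[, mean(x), by = group1])()
res
```

The extra parentheses and the trailing () are required: the pipe inserts its left-hand side as the first argument of a call, so the anonymous function has to be called explicitly.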

18 Answers

✔️Accepted Answer

(FWIW, the tidyverse team is planning to work with R core to figure out a placeholder syntax (or equivalent) for the base pipe, so there's some hope that 4.2 might allow you to do dt |> .[i] (or similar).)

Other Answers:

Personally, I'm not so keen on either of .t() or .s()/.S().

For .t(), I think it goes against the mnemonic that data.table uses for its other special operators. .SD, .GRP, etc. are all based on an abbreviation heuristic rather than the "shape" of the operator. I also think it's too close to the base transpose function, which is where my brain automatically goes when I see t... and invites unexpected behaviour in the case of a simple typo. (C.f. DT |> .t() vs DT |> t()).

(dt() would be even worse ofc because it would mask an existing base R function: stats::dt(), the density of the Student t distribution.)

Similarly, .s()/.S() is too close to .SD and .SDcols in my view, without being at all related to the underlying functionality.

Taking a step back, IMO the point of this symbol should be to encapsulate a data.table. I know I've already said this, but I genuinely think that .DT() is the obvious solution here. It's the widely used shorthand for constructed data.tables already, used throughout the docs, and should be intuitive enough for newcomers as well. (Although, I'll readily accept .d() or .D too ;-)

My sense is that if there is some guarantee that 4.2.0 will support dt |> .[], then there's not much point in introducing a wrapper that could easily be created by a user on a per-script basis. In the meantime, data.table's documentation could just suggest that users add the wrapper themselves.
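A per-script wrapper of the kind described above could look like the sketch below. The name .DT is one of the suggestions from this thread and is used here purely for illustration; it is not assumed to be an exported data.table function. The alias relies on the observation in the question that base `[` dispatches to the data.table method.

```r
library(data.table)

# Hypothetical per-script alias (name taken from the discussion above).
# Binding base `[` is enough: it dispatches to the data.table method.
.DT <- `[`

DT <- data.table(g = c("a", "a", "b"), x = c(1, 2, 6))

# Equivalent to DT[, mean(x), by = g], written with the native pipe
piped <- DT |> .DT(, mean(x), by = g)
piped
```

Because the alias is an ordinary assignment, it can be dropped from a script without touching the rest of the code once the pipe itself supports `[` on the right-hand side.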
