New translations for
dbplyr now supplies appropriate translations for the RMariaDB and RPostgres packages (#3154). We generally recommend using these packages in favour of the older RMySQL and RPostgreSQL packages as they are fully DBI compliant and tested with DBItest.
copy_to() can now “copy” tbl_sql in the same src, providing another way to cache a query into a temporary table (#3064). You can also
copy_to tbl_sqls from another source, and
copy_to() will automatically collect then copy.
Initial support for stringr functions:
str_trim(). Regular expression support varies from database to database, but most simple regular expressions should be ok.
db_compute() gains an
analyze argument to match
remote_query_plan() provide a standard API for get metadata about a remote tbl (#3130, #2923, #2824).
sql_expr() is a more convenient building block for low-level SQL translation (#3169).
win_aggregate() for generating SQL and windowed SQL functions for aggregates. These take one argument,
x, and warn if
na.rm is not
win_recycled() is equivalent to
win_aggregate() and has been soft-deprecated.
db_write_table now needs to return the table name
head() calls in a row now collapse to a single call. This avoids a printing problem with MS SQL (#3084).
escape() now works with integer64 values from the bit64 package (#3230)
if_else() now correctly scope the false condition so that it only applies to non-NULL conditions (#3157)
ident_q() handle 0-length inputs better, and should be easier to use with S3 (#3212)
in_schema() should now work in more places, particularly in
copy_to() (#3013, @baileych)
SQL generation for joins no longer gets stuck in a endless loop if you request an empty suffix (#3220).
mutate() has better logic for splitting a single mutate into multiple subqueries (#3095).
paste0() support in MySQL, PostgreSQL (#3168), and RSQLite (#3176). MySQL and PostgreSQL gain support for
str_flatten() which behaves like
paste(x, collapse = "-") (but for technical reasons can’t be implemented as a straightforward translation of
same_src.tbl_sql() now performs correct comparison instead of always returning
TRUE. This means that
copy = TRUE once again allows you to perform cross-database joins (#3002).
select() queries no longer alias column names unnecessarily (#2968, @DavisVaughan).
rename() are now powered by tidyselect, fixing a few renaming bugs (#3132, #2943, #2860).
summarise() once again performs partial evaluation before database submission (#3148).
test_src() makes it easier to access a single test source.
Better support for temporary tables (@Hong-Revo)
Different translations for filter/mutate contexts for:
NULL evaluation (
is.null()), logical operators (
||), and comparison operators (
db_write_table()) correctly translates logical variables to integers (#3151).
n() translation in windowed context.
na_if translation (@cwarden)
PostgreSQL: translation for
grepl() added (@zozlak)
full_join() over non-overlapping columns
by = character() translated to
CROSS JOIN (#2924).
case_when() now translates to SQL “CASE WHEN” (#2894)
x %in% c(1) now generates the same SQL as
x %in% 1 (#2898).
window_frame() give you finer control over the window functions that dplyr creates (#2874, #2593).
Added SQL translations for Oracle (@edgararuiz).
x %in% c(1) now generates the same SQL as
x %in% 1 (#2898).
head(tbl, 0) is now supported (#2863).
select()ing zero columns gives a more information error message (#2863).
Variables created in a join are now disambiguated against other variables in the same table, not just variables in the other table (#2823).
PostgreSQL gains a better translation for
db_analyze_table() for MS SQL, Oracle, Hive and Impala (@edgararuiz)
Added support for
sd() for aggregate and window functions (#2887) (@edgararuiz)
You can now use the magrittr pipe within expressions, e.g.
mutate(mtcars, cyl %>% as.character()).
If a translation was supplied for a summarise function, but not for the equivalent windowed variant, the expression would be translated to
NULL with a warning. Now
sql_variant() checks that all aggregate functions have matching window functions so that correct translations or clean errors will be generated (#2887)
copy_to() now work directly with DBI connections (#2423, #2576), so there is no longer a need to generate a dplyr src.
library(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars) mtcars2 <- tbl(con, "mtcars") mtcars2
glimpse() now works with remote tables (#2665)
dplyr has gained a basic SQL optimiser, which collapses certain nested SELECT queries into a single query (#1979). This will improve query execution performance for databases with less sophisticated query optimisers, and fixes certain problems with ordering and limits in subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this optimisation.
collapse() now preserve the “ordering” of rows. This only affects the computation of window functions, as the rest of SQL does not care about row order (#2281).
copy_to() gains an
overwrite argument which allows you to overwrite an existing table. Use with care! (#2296)
in_schema() function makes it easy to refer to tables in schema:
query()is no longer exported. It hasn’t been useful for a while so this shouldn’t break any code.
Partial evaluation occurs immediately when you execute a verb (like
mutate()) rather than happening when the query is executed (#2370).
mutate.tbl_sql() will now generate as many subqueries as necessary so that you can refer to variables that you just created (like in mutate with regular dataframes) (#2481, #2483).
SQL joins have been improved:
SQL joins always use the
ON ... syntax, avoiding
USING ... even for natural joins. Improved handling of tables with columns of the same name (#1997, @javierluraschi). They now generate SQL more similar to what you’d write by hand, eliminating a layer or two of subqueries (#2333)
[API] They now follow the same rules for including duplicated key variables that the data frame methods do, namely that key variables are only kept from
x, and never from
sql_join() generic now gains a
vars argument which lists the variables taken from the left and right sides of the join. If you have a custom
sql_join() method, you’ll need to update how your code generates joins, following the template in
full_join() throws a clear error when you attempt to use it with a MySQL backend (#2045)
full_join() now return results consistent with local data frame sources when there are records in the right table with no match in the left table.
right_join() returns values of
by columns from the right table.
full_join() returns coalesced values of
by columns from the left and right tables (#2578, @ianmcook)
group_by() can now perform an inline mutate for database backends (#2422).
The SQL generation set operations (
union_all()) have been considerably improved.
By default, the component SELECT are surrounded with parentheses, except on SQLite. The SQLite backend will now throw an error if you attempt a set operation on a query that contains a LIMIT, as that is not supported in SQLite (#2270).
All set operations match column names across inputs, filling in non-matching variables with NULL (#2556).
group_by() now combine correctly (#1962)
lazy_tbl() have been exported. These help you test generated SQL with out an active database connection.
ungroup() correctly resets grouping variables (#2704).
as.sql() safely coerces an input to SQL.
More tranlators for
ident_q() makes it possible to specifier identifiers that do not need to be quoted.
Translation of inline scalars:
Logical values are now translated differently depending on the backend. The default is to use “true” and “false” which is the SQL-99 standard, but not widely support. SQLite translates to “0” and “1” (#2052).
-Inf are correctly escaped
Better test for whether or not a double is similar to an integer and hence needs a trailing 0.0 added (#2004).
Quoting defaults to
::: are handled correctly (#2321)
x %in% 1 is now correctly translated to
x IN (1) (#511).
if_else() use correct argument names in SQL translation (#2225).
ident() now returns an object with class
c("ident", "character"). It no longer contains “sql” to indicate that this is not already escaped.
is.null() gain extra parens in SQL translation to preserve correct precedence (#2302).
log(x, b) is now correctly translated to the SQL
log(b, x) (#2288). SQLite does not support the 2-argument log function so it is translated to
log(x) / log(b).
nth(x, i) is now correctly translated to
n_distinct() now accepts multiple variables (#2148).
substr() is now translated to SQL, correcting for the difference in the third argument. In R, it’s the position of the last character, in SQL it’s the length of the string (#2536).
win_over() escapes expression using current database rules.
copy_to() now uses
db_write_table() instead of
db_collect() allow backends to override the entire database process behind
db_sql_render() allow additional control over the SQL rendering process.
All generics whose behaviour can vary from database to database now provide a DBIConnection method. That means that you can easily scan the NAMESPACE to see the extension points.
sql_escape_logical() allows you to control the translation of literal logicals (#2614).
src_desc() has been replaced by
db_desc() and now dispatches on the connection, eliminating the last method that required dispatch on the class of the src.
win_current_order() are now exported. This should make it easier to provide customised SQL for window functions (#2051, #2126).
SQL translation for Microsoft SQL Server (@edgararuiz)
SQL translation for Apache Hive (@edgararuiz)
SQL translation for Apache Impala (@edgararuiz)
collect() once again defaults to return all rows in the data (#1968). This makes it behave the same as
collect() only regroups by variables present in the data (#2156)
collect() will automatically LIMIT the result to the
n, the number of rows requested. This will provide the query planner with more information that it may be able to use to improve execution time (#2083).
common_by() gets a better error message for unexpected inputs (#2091)
copy_to() no longer checks that the table doesn’t exist before creation, intead preferring to fall back on the database for error messages. This should reduce both false positives and false negative (#1470)
copy_to() now succeeds for MySQL if a character column contains
NA (#1975, #2256, #2263, #2381, @demorenoc, @eduardgrebe).
copy_to() now returns it’s output invisibly (since you’re often just calling for the side-effect).
distinct() reports improved variable information for SQL backends. This means that it is more likely to work in the middle of a pipeline (#2359).
do() on database backends now collects all data locally first (#2392).
dbFetch() instead of the deprecated
fetch() (#2134). Use
DBI::dbExecute() for non-query SQL commands (#1912)
show_query() now invisibly return the first argument, making them easier to use inside a pipeline.
print.tbl_sql() displays ordering (#2287) and prints table name, if known.
print(df, n = Inf) and
head(df, n = Inf) now work with remote tables (#2580).
sql_translate_env() get defaults for DBIConnection.
Formatting now works by overriding the
tbl_sum() generic instead of
print(). This means that the output is more consistent with tibble, and that
format() is now supported also for SQL sources (tidyverse/dbplyr#14).
[API] The signature of
op_base has changed to
op_base(x, vars, class)
partial_eval() have been refined:
translate_sql() no longer takes a vars argument; instead call
Because it no longer needs the environment
translate_sql()_ now works with a list of dots, rather than a
partial_eval() now takes a character vector of variable names rather than a tbl.
This leads to a simplification of the
op data structure: dots is now a list of expressions rather than a
op_vars() now returns a list of quoted expressions. This enables escaping to happen at the correct time (i.e. when the connection is known).