Join intervals on chromosomes in data frames, to the closest partner

genome_join_closest(x, y, by = NULL, mode = "inner",
  distance_column_name = NULL, max_distance = Inf, select = "all")

genome_inner_join_closest(x, y, by = NULL, ...)

genome_left_join_closest(x, y, by = NULL, ...)

genome_right_join_closest(x, y, by = NULL, ...)

genome_full_join_closest(x, y, by = NULL, ...)

genome_semi_join_closest(x, y, by = NULL, ...)

genome_anti_join_closest(x, y, by = NULL, ...)

Arguments

x

A data frame.

y

A data frame.

by

A character vector with three entries naming the columns used to match the chromosome, start and end, in that order. For example: by=c("Chromosome"="chr", "Start"="start", "End"="end")

mode

One of "inner", "full", "left", "right", "semi" or "anti".

distance_column_name

A string used as the name of a new column that holds the distance. If NULL, no new column is added.

max_distance

The maximum distance allowed between two entries for them to be joined.

select

A string that is passed on to IRanges::distanceToNearest. Either "all", which reports every interval when several are tied at the same distance, or "arbitrary", which reports only one of the tied intervals, chosen arbitrarily.

...

Additional arguments passed on to genome_join_closest.
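To make the interplay of max_distance and select concrete, here is a minimal base R sketch of a closest-partner lookup for a single query interval. It is not the package's implementation: the helpers interval_gap and closest_rows are hypothetical, and the distance convention (overlapping or adjacent intervals count as 0, otherwise the number of positions strictly between them) is an assumption of the sketch.

```r
# Gap between closed intervals: 0 if they overlap or touch, otherwise the
# number of positions strictly between them (assumed convention).
interval_gap <- function(start1, end1, start2, end2) {
  pmax(0, pmax(start1, start2) - pmin(end1, end2) - 1)
}

# Hypothetical helper: row indices of the closest intervals in `y` that lie
# on the same chromosome as the query interval.
closest_rows <- function(chrom, start, end, y,
                         max_distance = Inf, select = "all") {
  same_chrom <- which(y$chromosome == chrom)
  if (length(same_chrom) == 0) return(integer(0))
  gaps <- interval_gap(start, end, y$start[same_chrom], y$end[same_chrom])
  best <- min(gaps)
  # Nothing is joined when even the closest partner is too far away.
  if (best > max_distance) return(integer(0))
  hits <- same_chrom[gaps == best]
  # "all" keeps every tied partner; "arbitrary" keeps just one of them
  # (this sketch simply takes the first).
  if (select == "arbitrary") hits[1] else hits
}

y <- data.frame(chromosome = c("chr1", "chr1", "chr2"),
                start = c(100, 300, 100),
                end   = c(150, 350, 150))

closest_rows("chr1", 200, 250, y)                        # rows 1 and 2 tie
closest_rows("chr1", 200, 250, y, select = "arbitrary")  # a single row
closest_rows("chr1", 200, 250, y, max_distance = 10)     # no partner close enough
```

The query interval 200 to 250 is equally far from both chr1 intervals in y, so select decides whether one or both are kept, while max_distance can rule out the match altogether.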

Value

The joined data frame of x and y.

Examples

library(dplyr)

x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))

x2 <- data.frame(id = 1:4, BLA=LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))

j <- genome_intersect(x1, x2, by=c("chromosome", "start", "end"), mode="both")
print(j)
#>   id.x bla chromosome id.y BLA start end
#> 1    1   a       chr1    1   A   140 150
#> 2    4   d       chr2    3   C   400 415
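The example above calls genome_intersect rather than the function documented on this page. As a hedged sketch, the same two data frames could be passed to genome_join_closest as shown below. The package name tidygenomics is an assumption (this page does not name its package), so the call is guarded, and the column name "distance" is just an illustrative choice. No output is shown because it has not been verified here.

```r
x1 <- data.frame(id = 1:4, bla = letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))

x2 <- data.frame(id = 1:4, BLA = LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))

# Assumption: genome_join_closest is provided by the tidygenomics package.
if (requireNamespace("tidygenomics", quietly = TRUE)) {
  library(tidygenomics)
  # Keep every row of x1 and attach its closest partner(s) from x2,
  # recording the distance in a new column.
  j <- genome_join_closest(x1, x2,
                           by = c("chromosome", "start", "end"),
                           mode = "left",
                           distance_column_name = "distance")
  print(j)
}
```

Using mode = "left" keeps the rows of x1 that have no partner in x2; switching to "inner" would drop them.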