Cluster genomic intervals in a data frame based on chromosome, start and end.

genome_cluster(x, by = NULL, max_distance = 0,
  cluster_column_name = "cluster_id")

Arguments

x	A data frame.
by	A character vector with three entries naming the chromosome, start and end columns, for example `by = c("chr", "start", "end")`.
max_distance	The maximum distance between two intervals on the same chromosome at which they are still placed in the same cluster. Default: 0.
cluster_column_name	A string giving the name of the new cluster column. Default: "cluster_id".
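The effect of `max_distance` can be sketched in base R. The helper `cluster_intervals_sketch()` below is hypothetical and is not fuzzyjoin's implementation. It assumes that an interval joins the current cluster on its chromosome when its start lies at most `max_distance` past the running end of that cluster, and it numbers clusters from 0. Boundary handling (for example, whether touching intervals merge) may differ from the package.

```r
# Hypothetical helper, not part of fuzzyjoin: assigns a 0-based cluster ID to
# each interval by sweeping intervals on each chromosome in order of start and
# merging an interval into the current cluster when the gap is <= max_distance.
cluster_intervals_sketch <- function(chrom, start, end, max_distance = 0) {
  chrom <- as.character(chrom)
  cluster <- integer(length(start))
  next_id <- 0L
  for (chr in unique(chrom)) {
    idx <- which(chrom == chr)
    idx <- idx[order(start[idx])]  # process intervals left to right
    current_end <- -Inf            # no open cluster yet on this chromosome
    for (i in idx) {
      if (start[i] - current_end > max_distance) {
        # Gap too large: open a new cluster starting at this interval.
        next_id <- next_id + 1L
        current_end <- end[i]
      } else {
        # Close enough: extend the current cluster.
        current_end <- max(current_end, end[i])
      }
      cluster[i] <- next_id - 1L
    }
  }
  cluster
}
```

On the `x1` data from the Examples section this sketch returns `c(0, 0, 2, 1)` with the default distance and `c(0, 0, 1, 0)` with `max_distance = 10`, the same IDs printed there. Sorting once per chromosome and sweeping keeps the work at O(n log n).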

Value

The data frame with an additional column holding the cluster ID of each row.
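Because the cluster ID is an ordinary column, it can be used for grouping afterwards. As an illustration, the base R snippet below collapses each cluster into one interval spanning its members. The `clustered` data frame is typed in by hand from the first example's output in the Examples section rather than produced by `genome_cluster()`, so the snippet runs without the package.

```r
# Hand-built copy of the first example's output (cluster_id as printed there).
clustered <- data.frame(
  chromosome = c("chr1", "chr1", "chr2", "chr1"),
  start      = c(100, 120, 300, 260),
  end        = c(150, 250, 350, 450),
  cluster_id = c(0, 0, 2, 1)
)

# One row per cluster: smallest start and largest end of its member intervals.
spans <- merge(
  aggregate(start ~ chromosome + cluster_id, data = clustered, FUN = min),
  aggregate(end ~ chromosome + cluster_id, data = clustered, FUN = max)
)
print(spans)
```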

Examples


library(dplyr)

x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 120, 300, 260),
                 end = c(150, 250, 350, 450))
genome_cluster(x1, by=c("chromosome", "start", "end"))
#> # A tibble: 4 x 6
#>      id bla   chromosome start   end cluster_id
#>   <int> <fct> <fct>      <dbl> <dbl>      <dbl>
#> 1     1 a     chr1         100   150          0
#> 2     2 b     chr1         120   250          0
#> 3     3 c     chr2         300   350          2
#> 4     4 d     chr1         260   450          1
genome_cluster(x1, by=c("chromosome", "start", "end"), max_distance=10)
#> # A tibble: 4 x 6
#>      id bla   chromosome start   end cluster_id
#>   <int> <fct> <fct>      <dbl> <dbl>      <dbl>
#> 1     1 a     chr1         100   150          0
#> 2     2 b     chr1         120   250          0
#> 3     3 c     chr2         300   350          1
#> 4     4 d     chr1         260   450          0
