Aggregation is one of the most common uses for R. There are several ways to do so in R, which we will illustrate here.

For this, we will use the function aggregate, which can be used as follows:

```
aggregate(formula,function,data)
```

The following code shows various ways of using the aggregate function.

**CODE:**

```
df = data.frame(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
# sum, grouping by one column
aggregate(value~group, FUN=sum, data=df)
# mean, grouping by one column
aggregate(value~group, FUN=mean, data=df)
# sum, grouping by multiple columns
aggregate(value~group+subgroup,FUN=sum,data=df)
# custom function, grouping by one column
# in this example we want the sum of all values larger than 2 per group.
aggregate(value~group, FUN=function(x) sum(x[x>2]), data=df)
```

**OUTPUT:**

```
> df = data.frame(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
> print(df)
group subgroup value
1 Group 1 A 2.0
2 Group 1 A 2.5
3 Group 2 A 1.0
4 Group 2 A 2.0
5 Group 2 B 1.5
>
> # sum, grouping by one column
> aggregate(value~group, FUN=sum, data=df)
group value
1 Group 1 4.5
2 Group 2 4.5
>
> # mean, grouping by one column
> aggregate(value~group, FUN=mean, data=df)
group value
1 Group 1 2.25
2 Group 2 1.50
>
> # sum, grouping by multiple columns
> aggregate(value~group+subgroup,FUN=sum,data=df)
group subgroup value
1 Group 1 A 4.5
2 Group 2 A 3.0
3 Group 2 B 1.5
>
> # custom function, grouping by one column
> # in this example we want the sum of all values larger than 2 per group.
> aggregate(value~group, FUN=function(x) sum(x[x>2]), data=df)
group value
1 Group 1 2.5
2 Group 2 0.0
```

Aggregating with dplyr is easy! You can use the group_by() and the summarize() functions for this. Some examples are given below.

**CODE:**

```
# Aggregating with dplyr
library(dplyr)
df = data.frame(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
print(df)
# sum, grouping by one column
df %>% group_by(group) %>% summarize(value = sum(value)) %>% as.data.frame()
# mean, grouping by one column
df %>% group_by(group) %>% summarize(value = mean(value)) %>% as.data.frame()
# sum, grouping by multiple columns
df %>% group_by(group,subgroup) %>% summarize(value = sum(value)) %>% as.data.frame()
# custom function, grouping by one column
# in this example we want the sum of all values larger than 2 per group.
df %>% group_by(group) %>% summarize(value = sum(value[value>2])) %>% as.data.frame()
```

**OUTPUT:**

```
> library(dplyr)
>
> df = data.frame(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
> print(df)
group subgroup value
1 Group 1 A 2.0
2 Group 1 A 2.5
3 Group 2 A 1.0
4 Group 2 A 2.0
5 Group 2 B 1.5
>
> # sum, grouping by one column
> df %>% group_by(group) %>% summarize(value = sum(value)) %>% as.data.frame()
group value
1 Group 1 4.5
2 Group 2 4.5
>
> # mean, grouping by one column
> df %>% group_by(group) %>% summarize(value = mean(value)) %>% as.data.frame()
group value
1 Group 1 2.25
2 Group 2 1.50
>
> # sum, grouping by multiple columns
> df %>% group_by(group,subgroup) %>% summarize(value = sum(value)) %>% as.data.frame()
group subgroup value
1 Group 1 A 4.5
2 Group 2 A 3.0
3 Group 2 B 1.5
>
> # custom function, grouping by one column
> # in this example we want the sum of all values larger than 2 per group.
> df %>% group_by(group) %>% summarize(value = sum(value[value>2])) %>% as.data.frame()
group value
1 Group 1 2.5
2 Group 2 0.0
```

Grouping with the data.table package is done using the syntax `dt[i, j, by]`

Which can be read out loud as: "*Take dt, subset rows using i, then calculate j, grouped by by.*" Within the dt statement, multiple calculations or groups should be put in a list. Since an alias for `list()`

is `.()`

, both can be used interchangeably. In the examples below we use `.()`

.

**CODE:**

```
# Aggregating with data.table
library(data.table)
dt = data.table(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
print(dt)
# sum, grouping by one column
dt[,.(value=sum(value)),group]
# mean, grouping by one column
dt[,.(value=mean(value)),group]
# sum, grouping by multiple columns
dt[,.(value=sum(value)),.(group,subgroup)]
# custom function, grouping by one column
# in this example we want the sum of all values larger than 2 per group.
dt[,.(value=sum(value[value>2])),group]
```

**OUTPUT:**

```
> # Aggregating with data.table
> library(data.table)
>
> dt = data.table(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
> print(dt)
group subgroup value
1: Group 1 A 2.0
2: Group 1 A 2.5
3: Group 2 A 1.0
4: Group 2 A 2.0
5: Group 2 B 1.5
>
> # sum, grouping by one column
> dt[,.(value=sum(value)),group]
group value
1: Group 1 4.5
2: Group 2 4.5
>
> # mean, grouping by one column
> dt[,.(value=mean(value)),group]
group value
1: Group 1 2.25
2: Group 2 1.50
>
> # sum, grouping by multiple columns
> dt[,.(value=sum(value)),.(group,subgroup)]
group subgroup value
1: Group 1 A 4.5
2: Group 2 A 3.0
3: Group 2 B 1.5
>
> # custom function, grouping by one column
> # in this example we want the sum of all values larger than 2 per group.
> dt[,.(value=sum(value[value>2])),group]
group value
1: Group 1 2.5
2: Group 2 0.0
```

This modified text is an extract of the original Stack Overflow Documentation created by following contributors and released under CC BY-SA 3.0

This website is not affiliated with Stack Overflow

Email: [email protected]