r - Select a random row for each unique value in one specific column of a data frame
I have quite a simple request that I cannot, however, deal with using one line of code.

All I want is to subset the input data frame in such a way that the output data frame contains one randomly selected row for each unique value (factor level) of one particular column.

E.g. I have (v2 is the particular column):
       v1 v2
    1   a  1
    2   b  1
    3   c  2
    4   a  1
    5   b  2
    6   b  1
    7   b  1
    8   c  2
    9   d  1
    10  e  1
and I want to have an output data frame such as:
      v1 v2
    1  b  1
    2  c  2
Thank you in advance for any suggestions!
This is way more than you asked for, but I wrote a function called `stratified` that lets you take random samples from a `data.frame` by one or more group variables.

You can load it and use it like this:
    library(devtools)
    source_gist("https://gist.github.com/mrdwab/6424112")
    # [1] "https://raw.github.com/gist/6424112"
    # SHA-1 hash of file is 0006d8548785ec8a5651c3dd599648cc88d153a4

    ## 1 row per group
    stratified(mydf, "v2", 1)
    #    v1 v2
    # 10  e  1
    # 8   c  2

    ## 2 rows per group
    stratified(mydf, "v2", 2)
    #   v1 v2
    # 2  b  1
    # 6  b  1
    # 3  c  2
    # 5  b  2
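If you only need the single-row case from the question and would rather not source a gist, something along these lines in base R should also work. This is a minimal sketch, not part of `stratified`: the `mydf` definition is rebuilt from the question (the `v1` values for rows 1 and 4 are assumed to be "a"), and `groups`, `picked` and `result` are just illustrative names.

```r
# Example data rebuilt from the question.
# Assumption: the v1 values for rows 1 and 4 are "a".
mydf <- data.frame(
  v1 = c("a", "b", "c", "a", "b", "b", "b", "c", "d", "e"),
  v2 = c(1, 1, 2, 1, 2, 1, 1, 2, 1, 1)
)

set.seed(1)  # only here so the random draw is repeatable

# Split the row indices by v2, then pick one index from each group.
# Indexing with sample.int() avoids the sample(x, 1) pitfall, where a
# single number x > 1 is treated as the range 1:x.
groups <- split(seq_len(nrow(mydf)), mydf$v2)
picked <- vapply(groups, function(i) i[sample.int(length(i), 1)], integer(1))

# One randomly chosen row per unique value of v2
result <- mydf[picked, ]
result
```

Which rows come back depends on the seed; the only guarantee is one row for each distinct value of `v2`.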
I'll add official documentation to the function at some point, but here's a summary to help you get the best use out of it:
The arguments to `stratified` are:

- `df`: The input `data.frame`.
- `group`: A character vector of the column or columns that make up the "strata".
- `size`: The desired sample size.
  - If `size` is a value less than 1, a proportionate sample is taken from each stratum.
  - If `size` is a single integer of 1 or more, that number of samples is taken from each stratum.
  - If `size` is a vector of integers, the specified number of samples is taken for each stratum. It is recommended that you use a named vector. For example, if you have two strata, "a" and "b", and you wanted 5 samples from "a" and 10 from "b", you would enter `size = c(a = 5, b = 10)`.
- `select`: This allows you to subset the groups in the sampling process. This is a `list`. For instance, if your `group` variable was "group", and it contained three strata, "a", "b", and "c", but you only wanted to sample from "a" and "c", you can use `select = list(group = c("a", "c"))`.
- `replace`: For sampling with replacement.
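To make the three `size` modes a bit more concrete, here is a rough base R sketch of how they could behave for a single grouping column. It is not the code from the gist: `sample_strata` and the `demo` data are hypothetical, and capping the sample at the stratum size when sampling without replacement is my own assumption rather than documented behaviour of `stratified`.

```r
# Hypothetical helper, NOT the gist's implementation: it only mirrors
# the three `size` modes described above for one grouping column.
sample_strata <- function(df, group, size, replace = FALSE) {
  rows <- split(seq_len(nrow(df)), df[[group]], drop = TRUE)
  picked <- lapply(names(rows), function(g) {
    idx <- rows[[g]]
    n <- if (length(size) > 1) {
      size[[g]]                    # named vector: one size per stratum
    } else if (size < 1) {
      round(length(idx) * size)    # proportion of each stratum
    } else {
      size                         # same fixed count for every stratum
    }
    # Assumption: without replacement, never ask for more rows than exist
    if (!replace) n <- min(n, length(idx))
    idx[sample.int(length(idx), n, replace = replace)]
  })
  df[unlist(picked), , drop = FALSE]
}

set.seed(42)
demo <- data.frame(id = 1:30, grp = rep(c("a", "b"), times = c(10, 20)))

nrow(sample_strata(demo, "grp", 0.5))                     # 15: half of each stratum
nrow(sample_strata(demo, "grp", 3))                       # 6: three rows per stratum
table(sample_strata(demo, "grp", c(a = 5, b = 10))$grp)   # 5 from "a", 10 from "b"
```

Something like `select` could be imitated by subsetting `demo` before calling the helper, and `replace = TRUE` simply lets a stratum be sampled beyond its own size.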