regex - Text processing / regular expressions in R

I have a data frame with the following columns:

user_id: g17165fd2e0bba9a449857645bb6g3a9a7ef8e6c
time: 1361553741
url: a URL string
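If `time` is a Unix timestamp in seconds (an assumption; the question only shows the value 1361553741), the window bounds x and y can be built from readable date-times and compared directly against that column. A small sketch, where `to_epoch` is a hypothetical helper and the example dates are made up:

```r
# Assumption: `time` holds Unix timestamps (seconds since 1970-01-01 UTC).
# Hypothetical helper: turn a readable date-time into epoch seconds.
to_epoch <- function(s, tz = "UTC") {
  as.numeric(as.POSIXct(s, tz = tz))
}

# example window bounds (illustrative dates)
x <- to_epoch("2013-02-22 00:00:00")
y <- to_epoch("2013-02-23 00:00:00")

# going the other way shows what a stored value means
as.POSIXct(1361553741, origin = "1970-01-01", tz = "UTC")
# [1] "2013-02-22 17:22:21 UTC"
```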

The url sometimes takes the form https://something.com/name/forum/thread?thread_id=51.
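Because only some URLs carry a thread_id, one way to pull it out is a regular expression with a capturing group, returning NA where there is no match (a plain `sub()`/`gsub()` would otherwise hand back non-matching URLs unchanged). A sketch; `extract_thread_id` is a hypothetical helper, and every example URL except the first is made up:

```r
# Hypothetical helper: return the digits after "thread_id=" in each URL,
# or NA for URLs that do not contain a thread_id parameter.
extract_thread_id <- function(url) {
  # [?&] anchors on the start of a query parameter; group 1 captures the id
  pattern <- "^.*[?&]thread_id=([0-9]+).*$"
  has_id <- grepl(pattern, url)
  out <- rep(NA_character_, length(url))
  out[has_id] <- sub(pattern, "\\1", url[has_id])
  out
}

urls <- c("https://something.com/name/forum/thread?thread_id=51",
          "https://something.com/name/forum/thread?foo=1&thread_id=7&page=2",
          "https://something.com/name/profile")
extract_thread_id(urls)  # returns c("51", "7", NA)
```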

I want to create a data frame that tells me, for each user, how many times he or she visited each thread_id between times x and y. So the number of rows should equal the number of users, and the number of columns should equal the number of thread ids + 1 (the total views).
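The table described here is a cross-tabulation of user by thread_id, which base R's `table()` can build without extra packages. This is a swapped-in base-R sketch, not the approach from the answer below; `thread_counts` is a hypothetical name, and it assumes columns named user, time and url and numeric thread ids:

```r
# Base-R sketch: count visits per user and thread_id inside [x, y].
# Assumes `dat` has columns user, time and url; `thread_counts` is a
# hypothetical helper name.
thread_counts <- function(dat, x, y) {
  pattern <- "^.*[?&]thread_id=([0-9]+).*$"
  # keep visits inside the window whose URL actually carries a thread_id
  keep <- dat$time >= x & dat$time <= y & grepl(pattern, dat$url)
  d <- dat[keep, ]
  # integer ids so the columns sort numerically
  thread_id <- as.integer(sub(pattern, "\\1", d$url))
  # cross-tabulate: one row per user, one column per thread_id
  wide <- as.data.frame.matrix(table(d$user, thread_id))
  wide$totalview <- rowSums(wide)
  out <- data.frame(user = rownames(wide), wide, check.names = FALSE)
  rownames(out) <- NULL
  out
}

# usage with junk data shaped like the question's
set.seed(2)
dat <- data.frame(user = 1:5,
                  time = 1:20,
                  url = paste0("https://domain.com/forum/thread?thread_id=",
                               sample(5, 20, TRUE)))
thread_counts(dat, x = 1, y = 10)
```

Each user appears twice in times 1 to 10 of this junk data, so every `totalview` in that call is 2.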

The data set is big, so doing this in parallel is a must.
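One way to parallelize with only the base `parallel` package is to split the users into chunks, count each chunk on its own worker, and stack the partial tables. This is a sketch under assumptions: `thread_counts_parallel` is a hypothetical name, the columns are user, time and url, and `mclapply()` forks processes, so `mc.cores > 1` is not supported on Windows. Counting is cheap, so forking overhead may outweigh the gain unless the data is very large.

```r
library(parallel)

# Hypothetical sketch: per-user thread counts computed chunk-by-chunk in parallel.
thread_counts_parallel <- function(dat, x, y, cores = 2) {
  pattern <- "^.*[?&]thread_id=([0-9]+).*$"
  keep <- dat$time >= x & dat$time <= y & grepl(pattern, dat$url)
  d <- dat[keep, ]
  tid <- as.integer(sub(pattern, "\\1", d$url))
  # fixed factor levels so every chunk's table has the same columns
  d$thread_id <- factor(tid, levels = sort(unique(tid)))
  # assign users round-robin to `cores` chunks; a user never spans two chunks
  users <- unique(d$user)
  chunks <- split(users, rep_len(seq_len(cores), length(users)))
  parts <- mclapply(chunks, function(u) {
    sub_d <- d[d$user %in% u, ]
    # as.character() so only users present in this chunk become rows
    as.data.frame.matrix(table(as.character(sub_d$user), sub_d$thread_id))
  }, mc.cores = cores)
  wide <- do.call(rbind, unname(parts))
  wide$totalview <- rowSums(wide)
  out <- data.frame(user = rownames(wide), wide, check.names = FALSE)
  out <- out[order(out$user), ]
  rownames(out) <- NULL
  out
}

set.seed(2)
dat <- data.frame(user = 1:5,
                  time = 1:20,
                  url = paste0("https://domain.com/forum/thread?thread_id=",
                               sample(5, 20, TRUE)))
thread_counts_parallel(dat, x = 1, y = 20, cores = 2)
```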

What is the best way of doing this in R?

Thanks a lot!

PS: @David created code that generates a data frame like the one I mentioned, and provided a perfect answer to the question.

```r
set.seed(2)
# make junk data: 20 visits, users 1-5 recycled, thread ids 1-5 drawn with replacement
dat <- data.frame(user = 1:5,
                  time = 1:20,
                  url = paste0("https://domain.com/forum/thread?thread_id=",
                               sample(5, 20, TRUE)))
```

I'm pretty sure this will work for you:

```r
library(plyr)
library(doMC)
library(reshape2)

set.seed(2)
# make junk data
dat <- data.frame(user = 1:5,
                  time = 1:20,
                  url = paste0("https://domain.com/forum/thread?thread_id=",
                               sample(5, 20, TRUE)))
head(dat)
# Printed output from the original post. R 3.6.0 changed how sample() draws,
# so newer versions of R may show different thread ids.
#   user time                                         url
# 1    1    1 https://domain.com/forum/thread?thread_id=1
# 2    2    2 https://domain.com/forum/thread?thread_id=4
# 3    3    3 https://domain.com/forum/thread?thread_id=3
# 4    4    4 https://domain.com/forum/thread?thread_id=1
# 5    5    5 https://domain.com/forum/thread?thread_id=5
# 6    1    6 https://domain.com/forum/thread?thread_id=5

# subset within the time range (here x = 1 and y = 20)
dat <- dat[dat$time >= 1 & dat$time <= 20, ]

# make a threadid variable: drop everything up to and including "thread_id="
dat$threadid <- gsub("^.*thread_id=", "", dat$url)

# register parallel cores
registerDoMC(4)
# count the number of thread occurrences for each user (in parallel)
dat.new <- ddply(dat, .(user, threadid), summarize,
                 threadcount = length(threadid), .parallel = TRUE)
# reshape the data into the format you want (one column per thread id)
dat.new <- dcast(dat.new, user ~ threadid, value.var = "threadcount", fill = 0)
# add total views (sum over every column except user)
dat.new$totalview <- rowSums(dat.new[, -1])
dat.new
#   user 1 2 3 4 5 totalview
# 1    1 1 0 1 0 2         4
# 2    2 1 1 0 1 1         4
# 3    3 0 1 1 1 1         4
# 4    4 2 0 2 0 0         4
# 5    5 1 0 2 0 1         4
```
