This paper presents a method for the detection and
recognition of social interactions in a day-long first-person
video of a social event, such as a trip to an amusement park.
The location and orientation of faces are estimated and
used to compute the line of sight for each face. The context
provided by all the faces in a frame is used to convert the
lines of sight into locations in space to which individuals attend.
Further, individuals are assigned roles based on their
patterns of attention. The roles and locations of individuals
are analyzed over time to detect and recognize the types
of social interactions. Beyond patterns of face locations
and attention, the head movements of the first person (the camera
wearer) provide useful cues as to their attentional focus.
We demonstrate encouraging results on detection and
recognition of social interactions in first-person videos captured
over multiple days at amusement parks.
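
As a rough illustration of the step that converts lines of sight into an attended location, the sketch below finds the point closest, in the least-squares sense, to the lines of sight of all faces in a frame. This is not the method of the paper, which the abstract does not spell out; it is a minimal geometric stand-in. It assumes faces lie on a 2D ground plane and that each face is given as a position and a gaze angle. The function name `attended_point` and the input layout are hypothetical. Lines of sight are treated as infinite lines, so the sketch does not check that the point lies in front of each face.

```python
import math

def attended_point(faces):
    """Least-squares intersection of 2D lines of sight.

    faces: list of ((x, y), theta) pairs, where (x, y) is an assumed
    face position and theta is an assumed gaze angle in radians.
    Returns the (x, y) point minimizing the summed squared perpendicular
    distance to every line, or None if the lines are (near) parallel.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), theta in faces:
        dx, dy = math.cos(theta), math.sin(theta)
        # I - d d^T projects onto the direction normal to the line of sight.
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    # Solve the 2x2 normal equations A x = b by Cramer's rule.
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-9:
        return None  # all lines of sight are parallel: no unique point
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

if __name__ == "__main__":
    # Two faces one unit either side of x = 1, both looking toward it.
    faces = [((0.0, 0.0), math.pi / 4), ((2.0, 0.0), 3 * math.pi / 4)]
    print(attended_point(faces))
```

With more than two faces, the same solve returns a single compromise point, which is one simple way that the context of all faces in a frame could constrain where attention is directed.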