Evaluating the effectiveness of robot behaviors in human-robot interactions - article

Multivariate evaluation of interactive robot systems
Bilge Mutlu, Chien-Ming Huang

Robots that interact with everyday users may need a combination of speech, gaze, and gesture behaviors to convey their message effectively. This is similar to human-human interactions except that every behavior the robot displays must be designed and programmed ahead of time. In other words, designers of robot applications must understand how each of these behaviors contributes to the robot’s effectiveness so that they can determine which behaviors must be included in the application’s design.

To this end, Huang and Mutlu from the University of Wisconsin–Madison present a method that designers can use to determine which behaviors should be used to produce a desired effect. They illustrate the method’s use by designing and evaluating a set of narrative behaviors for a storytelling robot that might be used in educational, informational, and entertainment settings.

As an example, the figure above shows the Wakamaru humanlike robot coordinating speech, gaze, and gesture to tell a story about the process of making paper. The full narration lasted approximately six minutes. One result showed how much the robot’s use of pointing gestures improved its audience’s recall of story information. The impact of different gestures on the robot’s performance is further captured in the diagram shown below. Robot designers can use such a diagram to choose appropriate behaviors from a large set, or to understand the impact each behavior has on the goals of their design.

Who goes where? Robot boats self-organize to defend against attacks - article

Model-predictive asset guarding by team of autonomous surface vehicles in environment with civilian boats
Eric Raboin, Petr Švec, Dana S. Nau, Satyandra K. Gupta

Individual robots have accomplished many impressive feats, but certain tasks are much better suited to a team of robots. Multiple robots can divide the task space into regions and work on each region simultaneously (e.g., consider multiple robotic lawn mowers cutting grass over a large patch of land at the same time). A team also makes it easier to recover from failures: when a highly capable robot doing a task by itself fails, the mission must be aborted, but when one robot in a team fails, the team can often continue the mission.

Because of these benefits, there has been significant interest in developing and deploying robot teams. Broadly speaking, there are two approaches to realizing robot teams. The first uses robots with very limited intelligence that follow simple rules based on the state of their neighbors and the environment (e.g., consider a robot team inspired by a flock of birds). This approach is often called swarming. Despite the apparent simplicity of the interaction rules and the limited intelligence of individual members, a swarm can exhibit complex and interesting collective behaviors. The second approach uses intelligent robots that coordinate and cooperate to tackle challenging missions. In this type of team, each robot is capable of planning and control for its assigned tasks, actively contributes to team task assignment, and shares situational awareness with the other members of the team.

We are interested in deploying robot teams in missions that involve adversaries. In such missions, communication among robots can be limited and intermittent, so using a central controller is not feasible. Sensing involves a high level of uncertainty because adversaries may actively use deception to impede the robot team’s efforts. An adversary may be able to devise methods to confuse or misdirect robots with limited reasoning and planning capabilities, so we decided to deploy a team of robots that have the planning and reasoning capabilities to deal with these situations. Successfully carrying out a challenging mission requires each member of the robot team to perform a particular task at a given time. Depending on the mission context, the role of a robot may change over time, and each robot should be able to take actions to carry out the task assigned to it. In challenging, fast-changing environments, this means that individual robots and the team as a whole need to make good decisions rapidly. Computationally slow techniques for making optimal decisions will not work in these situations. Instead, the team needs to teach itself how to make fast decisions in a given context.

We have developed computational techniques to optimize the behaviors of individual robots and the way they assign tasks among themselves using a decentralized architecture. Our approach optimizes behaviors for a given mission scenario and can incorporate physics-based models of the adversaries and the environment in the behavior optimization process. It enables the robot team to use model-predictive simulation in its decision making, so that the team and individual robots can make good decisions rapidly. Results show that, using our approach, a team of robotic boats can effectively defend against an attack on a valuable target under conditions of limited communication and significant sensing uncertainty.
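The model-predictive idea can be sketched in a few lines. The following is a minimal, illustrative Python example, not the authors' algorithm: a defender boat evaluates candidate maneuvers by forward-simulating a toy point-mass model of itself and a constant-velocity prediction of the intruder, then picks the maneuver that best intercepts the predicted path. All models, names, and parameters here are invented for illustration.

```python
import math

def simulate(pos, vel, action, steps=10, dt=0.5):
    """Forward-simulate a toy point-mass boat model under a constant
    acceleration command `action` = (ax, ay)."""
    x, y = pos
    vx, vy = vel
    traj = []
    for _ in range(steps):
        vx += action[0] * dt
        vy += action[1] * dt
        x += vx * dt
        y += vy * dt
        traj.append((x, y))
    return traj

def closest_approach(defender_traj, intruder_traj):
    """Smallest defender-intruder distance over the rollout horizon."""
    return min(math.dist(d, i) for d, i in zip(defender_traj, intruder_traj))

def choose_action(def_pos, def_vel, intr_pos, intr_vel, actions):
    """Model-predictive selection: roll out each candidate maneuver
    against a constant-velocity prediction of the intruder and pick
    the one that best intercepts the predicted path."""
    intr_traj = simulate(intr_pos, intr_vel, (0.0, 0.0))
    return min(actions,
               key=lambda a: closest_approach(simulate(def_pos, def_vel, a),
                                              intr_traj))
```

A real system would roll out richer physics-based vehicle and adversary models, coordinate the choice across the team, and re-plan continuously as new observations arrive.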

Enabling Robots to Plan Motion that Leads to Better Coordination with Humans - article

Integrating human observer inferences into robot motion planning
Anca Dragan, Siddhartha Srinivasa

Imagine a scenario where a robot and a human are collaborating side by side to perform a tightly coupled physical task together, like clearing a table.

The task amplifies the burden on the robot’s motion. Most motion in robotics is purely functional: industrial robots move to package parts, vacuuming robots move to suck dust, and personal robots move to clean up a dirty table. This type of motion is ideal when the robot is performing a task in isolation.

Collaboration, however, does not happen in isolation. In collaboration, the robot’s motion has a human observer, watching and interpreting the motion.

In this paper, Dragan et al. move beyond functional motion, and introduce the notion of an observer and their inferences into motion planning, so that robots can generate motion that is mindful of how it will be interpreted by a human collaborator.

When we collaborate, we make two inferences about our collaborator, action-to-goal and goal-to-action, leading to two important motion properties: legibility and predictability.

Legibility is about conveying intent — moving in a manner that makes the robot’s goal clear to the observer. We infer the robot’s goal based on its ongoing action (action-to-goal).

Predictability is about matching the observer’s expectation — matching the motion they predict when they know the robot’s goal. If we know the robot’s goal, we infer its future action from it (goal-to-action).

Predictable and legible motion can be correlated. For example, in an unambiguous situation, where an actor’s observed motion matches what is expected for a given intent (i.e. is predictable), then this intent can be used to explain the motion. If this is the only intent which explains the motion, the observer can immediately infer the actor’s intent, meaning that the motion is also legible. This is why we tend to assume that predictability implies legibility — that if the robot moves in an expected way, then its intentions will automatically be clear.

The writing domain, however, clearly distinguishes the two. The word legibility, traditionally an attribute of written text, refers to the quality of being easy to read. When we write legibly, we try consciously, and with some effort, to make our writing clear and readable to someone else. The word predictability, on the other hand, refers to the quality of matching expectation. When we write predictably, we fall back on old habits and write with minimal effort.

As a consequence, our legible and predictable writings are different: our friends do not expect to open our diary and see our legible writing style. They rightfully assume the diary was written for ourselves, and expect our usual, day-to-day style. By formalizing predictability and legibility as directly stemming from the two inferences in opposing directions, goal-to-action and action-to-goal, we show that the two are different in motion as well.

Ambiguous situations, occurring often in daily tasks, make this opposition clear: more than one possible intent can be used to explain the motion observed so far, rendering the predictable motion illegible. The figure above exemplifies the effect of this contradiction. The robot hand’s motion on the left is predictable in that it matches expected behavior. The hand reaches out directly towards the target. But, it is not legible, failing to make the intent of grasping the green object clear. In contrast, the trajectory on the right is more legible, making it clear that the target is the green object by deliberately bending away from the red object. But it is less predictable, as it does not match the expected behavior of reaching directly.

Dragan et al. produce predictable and legible motion by mathematically modeling how humans infer motion from goals and goals from motion, and introducing trajectory optimizers that maximize the probability that the right inferences will be made. The figure below shows the robot starting with a predictable trajectory (gray) and optimizing it to be more and more legible (orange).
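The goal inference underlying legibility can be made concrete. Below is a minimal sketch, not the authors' implementation: an observer model in which trajectories are exponentially less likely the more they cost, with straight-line distance standing in for the trajectory cost. A goal's probability given the motion so far compares the detour taken through the current position against the direct route to that goal.

```python
import math

def goal_probability(start, current, goals):
    """P(goal | motion so far) under an exp(-cost) observer model,
    simplified to point positions and straight-line distance costs."""
    def score(g):
        # cost incurred so far plus optimal cost-to-go, relative to
        # the cost of heading straight from start to the goal
        detour = math.dist(start, current) + math.dist(current, g)
        direct = math.dist(start, g)
        return math.exp(-(detour - direct))
    scores = {g: score(g) for g in goals}
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}
```

Starting at the origin with candidate goals at (1, 1) and (-1, 1), the inference is an ambiguous 50/50 split; once the hand has exaggerated rightward to (0.5, 0.3), the right-hand goal becomes the more probable explanation, which is precisely the effect a legibility optimizer exploits.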

By exaggerating the motion to the right, it becomes more immediately clear that the robot’s goal is the object on the right. Exaggeration is one of the principles of Disney animation, and it naturally emerges out of the mathematics of legible motion.

Coordinated UAV Docking - article

Coordinated landing of a quadrotor on a skid-steered ground vehicle in the presence of time delays
John M. Daly, Yan Ma, Steven L. Waslander

Small Unmanned Aerial Vehicles (UAVs) can be both safe and manoeuvrable, but their small size means they can’t carry much payload and their battery life only allows for short flights. To increase the range of a small UAV, one idea is to pair it with an unmanned ground vehicle (UGV) that can carry it to a site of operation and transport heavier cargo. Having both ground and aerial perspectives can also be useful during a mission. One challenge is to make sure the vehicles have the ability to rendezvous and perform coordinated landings autonomously. To this end, Daly et al. present a coordinated control method and experimental results for landing a quadrotor on a ground rover. The two robots communicate their positions, converge to a common docking location, and dock successfully, both indoors and out.

The video above demonstrates the use of a coordinated control strategy for autonomous docking of an Aeryon Scout UAV onto a skid-steer UGV from Clearpath Robotics. The controller handles the nonlinearities inherent in the motions of the two vehicles, and remains stable in the face of multi-second time delays, allowing unreliable wifi communication to be used during the landing. Both indoor and outdoor experiments demonstrate the validity of the approach, and also reveal the major disturbance caused by the ground effect when hovering over the ground vehicle.

Grasping with robots – which object is in reach? - article

Representing the robot’s workspace through constrained manipulability analysis
Nikolaus Vahrenkamp, Tamim Asfour

Imagine a robot reaching for a mug on the table, only to realize that it is too far, or that it would need to bend its arm joint backwards to get there. Understanding which objects are within reach and how to grasp them is an essential requirement if robots are to operate in our everyday environments. To solve this problem, Vahrenkamp et al. propose a new approach to build a comprehensive representation of the capabilities of a robot related to reaching and grasping.

The “manipulability” representation shown below allows the robot to know where it can reach in 6D with its right arm. That means it knows which x,y,z positions it can reach, as well as the orientation of the robot hand that is best for manipulation. The representation takes into account constraints due to joints in the arm. The manipulability is encoded by color (blue: low, red: high).
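A classic way to quantify this at a single arm configuration is Yoshikawa's manipulability measure, computed from the arm's Jacobian. The sketch below uses a planar 2-link arm as a toy stand-in for the full 6D analysis of the ARMAR arms; link lengths and angles are illustrative.

```python
import numpy as np

def manipulability(J):
    """Yoshikawa's manipulability measure w = sqrt(det(J J^T)): larger
    means the hand can move more freely in all directions, zero means
    a singular configuration."""
    J = np.asarray(J, dtype=float)
    return float(np.sqrt(max(np.linalg.det(J @ J.T), 0.0)))

def planar_2link_jacobian(q1, q2, l1=1.0, l2=1.0):
    """Jacobian of a planar 2-link arm (a toy stand-in for the full
    ARMAR arm model); for this arm, w = l1 * l2 * |sin(q2)|."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])
```

With unit links, the measure peaks at 1.0 when the elbow is bent 90° and vanishes when the arm is fully stretched (q2 = 0), which is the kind of joint-constraint information the 6D manipulability map aggregates over the whole workspace.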

A cut through one of these vector clouds looks like this.

In addition to single handed grasping, the authors discuss how the approach can be extended to grasping with two arms. Experiments were run in simulation on the humanoid robots ARMAR-III and ARMAR-IV.

And in case you want to try this at home, there is an open source version of this work here.

Grasping objects in a way that is suitable for manipulation - article

Semantic grasping: planning task-specific stable robotic grasps
Hao Dang, Peter K. Allen

Robots are expected to manipulate a large variety of objects from our everyday lives. The first step is to establish a physical connection between the robot end-effector and the object to be manipulated. In our context, this physical connection is a robotic grasp. What grasp the robot adopts will depend on how it needs to manipulate the object.

Existing grasp planning algorithms have made impressive progress in generating stable robotic grasps. However, stable grasps are mostly good for transporting objects. When it comes to manipulation, stability alone is no longer sufficient to guarantee success. For example, a mug can be grasped with a top-down grasp or a side grasp. Both are good for transporting the mug from one place to another. However, if the manipulation task is to pour water out of the mug, the top-down grasp is no longer suitable, since the palm and fingers of the hand may block the opening of the mug. We call such task-related constraints “semantic constraints”.

In our work, we take an example-based approach to build a grasp planner that searches for stable grasps satisfying semantic constraints. This approach is inspired by psychological research which showed that human grasping is to a very large extent guided by previous grasping experience. To mimic this process, we propose that semantic constraints be embedded into a database which includes partial object geometry, hand kinematics, and tactile contacts. Task specific knowledge in the database should be transferable between similar objects. We design a semantic affordance map which contains a set of depth images from different views of an object and predefined example grasps that satisfy semantic constraints of different tasks. These depth images help infer the approach direction of a robot hand with respect to an object, guiding the hand along an ideal approach direction. Predefined example grasps provide hand kinematics and tactile information to the planner as references to the ideal hand posture and tactile contact formation. Utilizing this information, our planner searches for stable grasps with an ideal approach direction, hand kinematics, and tactile contact formation.
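The retrieval step of such a planner can be pictured as a nearest-neighbor lookup. The sketch below is purely illustrative (the descriptors, names, and values are invented, and the real system matches actual depth images): given a descriptor computed from the current view of the target object, it returns the stored approach direction and example grasp to use as references.

```python
import numpy as np

# Toy semantic affordance map: each entry pairs a depth-image
# descriptor for one viewpoint of the source object with the approach
# direction and example grasp satisfying the task's semantic
# constraints from that view. All names and values are invented.
affordance_map = [
    {"view_descriptor": np.array([0.9, 0.1, 0.2]),
     "approach_direction": "side", "example_grasp": "wrap-around-handle"},
    {"view_descriptor": np.array([0.1, 0.8, 0.7]),
     "approach_direction": "top", "example_grasp": "precision-pinch"},
]

def retrieve_semantic_grasp(descriptor, affordance_map):
    """Step 1 of the pipeline: find the stored view closest to the
    target object's current view and return its example grasp as the
    reference for the later alignment and local planning steps."""
    best = min(affordance_map,
               key=lambda e: float(np.linalg.norm(e["view_descriptor"] - descriptor)))
    return best["approach_direction"], best["example_grasp"]
```

The retrieved approach direction and grasp then serve as the references for steps 2 and 3, which align the hand with the target object and locally search for a stable grasp with similar kinematics and tactile contacts.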

The figure above illustrates the process of planning a semantic grasp on a target object (i.e., a drill) with a given grasping semantics “to-drill” and a semantic affordance map built on a source object (i.e., another drill shown in Step 1, which is similar to the target drill). Step 1 is to retrieve a semantic grasp that is stored in the semantic affordance map. This semantic grasp is used as a reference in the next two steps. Step 2 is to achieve the ideal approach direction on the target object according to the exemplar semantic grasp. Once the ideal approach direction is achieved, a local grasp planning process starts in Step 3 to obtain stable grasps on the target object which share similar tactile feedback and hand posture as that of the exemplar semantic grasp.

The figure below shows some grasps planned on typical everyday objects using this approach. Shown from left to right are: the experiment ID, the predefined semantic grasps stored in the semantic affordance map, the pair of source and target objects for each experiment, and the top two grasps generated. The top two grasps in the last two columns were each obtained within 180 seconds and are both stable in terms of grasp quality.

Grasping unknown objects - article

Sparse pose manifolds
Rigas Kouskouridas, Kostantinos Charalampous, Antonios Gasteratos

To manipulate objects, robots are often required to estimate their position and orientation in space. The robot will behave differently if it’s grasping a glass that is standing up, or one that has been tipped over. On the other hand, it shouldn’t make a difference if the robot is gripping two different glasses with similar poses. The challenge is to have robots learn how to grasp new objects, based on previous experience.

To this end, Kouskouridas et al. propose the Sparse Pose Manifolds (SPM) method. As shown in the figure above, different objects viewed from the same perspective should share identical poses. All the objects facing right are in the same “pose-bucket”, which is different from the bucket for objects facing left, or forward. For each pose, the robot knows how to behave to guide the gripper to grasp the object. To grip an unknown object, the robot estimates what “bucket” the object falls into.
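The bucket-assignment step can be sketched as a nearest-prototype lookup. The descriptors and values below are invented for illustration; the actual SPM method learns a sparse manifold representation rather than comparing raw vectors.

```python
import numpy as np

# Toy prototypes for three pose buckets; values are illustrative only.
buckets = {
    "facing-right":   np.array([1.0, 0.0]),
    "facing-left":    np.array([-1.0, 0.0]),
    "facing-forward": np.array([0.0, 1.0]),
}

def nearest_pose_bucket(descriptor, buckets):
    """Assign an object's appearance descriptor to the closest pose
    bucket; every bucket maps to a stored gripper approach the robot
    already knows how to execute."""
    return min(buckets,
               key=lambda p: float(np.linalg.norm(buckets[p] - descriptor)))
```

An unknown object whose descriptor lands near the "facing-right" prototype is gripped with the approach stored for that bucket, even though the object itself was never seen during training.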

The videos below shows how this method can efficiently guide a robotic gripper to grasp an unknown object and the performance of the pose estimation module.

Using geometry to help robots map their environment - article

Feature based graph-SLAM in structured environments
P. de la Puente, D. Rodriguez-Losada

To get around unknown environments, most robots will need to build maps. To help them do so, robots can use the fact that human environments are often made of geometric shapes like circles, rectangles and lines. This paper presents a flexible framework for geometrical robotic mapping in structured environments.

Most human-designed environments, such as buildings, present regular geometrical properties that can be preserved in the maps that robots build and use. If some information about the general layout of the environment is available, it can be used to build more meaningful models and significantly improve the accuracy of the resulting maps. Human cognition exploits domain knowledge to a large extent, usually employing prior assumptions for the interpretation of situations and environments. When we see a wall, for example, we assume that it’s straight. We’ll probably also assume that it’s connected to another orthogonal wall.

This research presents a novel framework for the inference and incorporation of knowledge about the structure of the environment into the robotic mapping process. A hierarchical representation of geometrical elements (features) and relations between them (constraints) provides enhanced flexibility, also making it possible to correct wrong hypotheses. Various features and constraints are available, and it is very easy to add even more.
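To see how a geometric relation enters the map optimization, consider a deliberately tiny example: two measured wall orientations refined under a soft prior that the walls are orthogonal. This is a sketch of the idea only; a real graph-SLAM system jointly optimizes robot poses and many features under many such constraints.

```python
import numpy as np

def fuse_wall_angles(a_meas, b_meas, w_prior=10.0):
    """Refine two measured wall orientations (radians) under a soft
    prior that the walls are orthogonal, by minimizing
        (a - a_meas)^2 + (b - b_meas)^2 + w * (b - a - pi/2)^2,
    a linear least-squares problem with a closed-form solution."""
    sw = np.sqrt(w_prior)
    # Stack the residuals as r = A @ [a, b] - c and solve in one shot
    A = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [-sw, sw]])
    c = np.array([a_meas, b_meas, sw * np.pi / 2])
    x, *_ = np.linalg.lstsq(A, c, rcond=None)
    return x[0], x[1]
```

With measurements 0.05 and pi/2 - 0.05 radians, the refined angles come out nearly orthogonal, mirroring how the framework pulls mapped walls toward the hypothesized structure while still honoring the sensor data.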

A variety of experiments with both synthetic and real data were conducted. The map below, generated from laser scanner data recorded by a robot navigating Killian Court at MIT, respects the geometrical properties of the environment well. You can easily tell that features are parallel, orthogonal, and straight where needed.

What do teachers mean when they say ‘do it like me’? - article

Discovering relevant task spaces using inverse feedback control
Nikolay Jetchev, Marc Toussaint

Teaching robots to do tasks is useful, and teaching them in an easy, time-efficient way is even more so. The algorithm TRIC presented in this paper allows robots to observe a few motions from a teacher, understand the essence of the demonstration, and then repeat and adapt it to new situations.

Robots should learn to move and do useful tasks in order to be helpful to humans. However, tasks that are easy for a human, like grasping a glass, are not so obvious for a machine. Programming a robot requires time and work. Instead, what if the robot could watch the human and learn what they did, why, and how?

This is something we humans do all the time. Imagine you are playing tennis and the teacher says ‘do the forehand like me’ and then shows an example. How should the student understand this? Should they move their fingers, or their elbow? Should they watch the ball, the racket, the ground, or the net? All of these possible reference points can be described with numbers. The algorithm presented in this paper, called Task Space Retrieval Using Inverse Feedback Control (TRIC), can help a robot learn the important aspects of a demonstrated motion. Afterwards, the robot should be able to reproduce the moves like an expert, even if the task changes slightly.
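TRIC recovers the relevant task space through inverse feedback control, which is beyond a short snippet. As a much simpler, hypothetical illustration of the underlying intuition: the reference points the teacher actually controls should vary little across demonstrations, while irrelevant ones vary freely. The sketch below, with invented feature names and thresholds, picks out low-variance features.

```python
import numpy as np

def relevant_features(demos, threshold=0.05):
    """`demos` is an (n_demos, n_features) array of candidate task-space
    feature values at the end of each demonstration. Features with low
    variance across demonstrations are treated as the ones the teacher
    deliberately controlled."""
    variances = np.var(np.asarray(demos, dtype=float), axis=0)
    return [i for i, v in enumerate(variances) if v < threshold]
```

Here feature 0 (say, hand-to-lid distance at the end of the motion) is nearly identical in every demo and is flagged as relevant, while feature 1 (say, elbow angle) varies freely and is ignored; TRIC's inverse feedback control formulation makes this selection principled rather than threshold-based.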

The algorithm was successfully tested in simulation on various grasping and manipulation tasks. The figure above shows one of these tasks, in which a robot hand must approach a box and open its cover. The robot was shown 10 sets of trajectories from a simulated teacher. After training, it was asked to open a series of boxes that were moved, rotated, or of a different size. Overall, TRIC performed very well on these scenarios, succeeding in 24 of 25 tries.

ManyEars: open source framework for sound processing - article

The ManyEars open framework
François Grondin, Dominic Létourneau, François Ferland, Vincent Rousseau, François Michaud

Making robots that are able to localize, track, and separate multiple sound sources, even in noisy places, is essential for their deployment in our everyday environments. This could, for example, allow them to process human speech even in crowded places, or to identify noises of interest and where they came from. Unlike vision, however, audition has few software and hardware tools that can easily be integrated into robotic platforms.

The ManyEars open source framework allows users to easily experiment with robot audition. The software, which can be downloaded here, is compatible with ROS (Robot Operating System). Its modular design makes it possible to interface with different microphone configurations and hardware, thereby allowing the same software package to be used for different robots. A Graphical User Interface is provided for tuning parameters and visualizing information about the sound sources in real-time. The ManyEars software library is composed of five modules: Preprocessing, Localization, Tracking, Separation and Postprocessing.

To make use of the ManyEars software, a computer, a sound card, and microphones are required. ManyEars can be used with commercially available sound cards and microphones. However, commercial sound cards present limitations when used for embedded robotic applications: they can be expensive, have functionalities that are not required for robot audition, and have significant power and size requirements. For these reasons, the authors introduce a customized microphone board and sound card, available as an open hardware solution, that can be used on your robot and interfaced with the software package. The board uses an array of microphones, instead of only one or two, allowing a robot to localize, track, and separate multiple sound sources.
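To give a flavor of what a microphone array enables, the sketch below estimates the arrival-time difference of a sound between two microphones by cross-correlation. This is a generic textbook technique, not the ManyEars localization module itself, which builds on richer beamforming across the whole array.

```python
import numpy as np

def estimate_delay(sig_a, sig_b):
    """Estimate, in samples, how much sig_b lags sig_a by locating the
    peak of their cross-correlation."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # In "full" mode, zero lag sits at index len(sig_a) - 1
    return int(np.argmax(corr)) - (len(sig_a) - 1)
```

Given the microphone spacing and the speed of sound, such a delay converts to an angle of arrival; an array of many microphones yields many such cues, which is what lets a localization and tracking pipeline pin down and follow several sources at once.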

The framework is demonstrated using a microphone array on the IRL-1 robot. The placement of the microphones is marked by red circles. Results show that the robot is able to track two human speakers producing uninterrupted speech sequences, even when they are moving, and crossing paths. For videos of the IRL-1, check out the lab’s YouTube Channel.
