Enabling Robots to Plan Motion that Leads to Better Coordination with Humans - article


Integrating human observer inferences into robot motion planning
Anca Dragan, Siddhartha Srinivasa

Imagine a scenario where a robot and a human are collaborating side by side to perform a tightly coupled physical task together, like clearing a table.

A task like this places a greater burden on the robot's motion. Most motion in robotics is purely functional: industrial robots move to package parts, vacuuming robots move to suck up dust, and personal robots move to clean up a dirty table. This type of motion is ideal when the robot is performing a task in isolation.

Collaboration, however, does not happen in isolation. In collaboration, the robot’s motion has a human observer, watching and interpreting the motion.

In this paper, Dragan et al. move beyond functional motion, and introduce the notion of an observer and their inferences into motion planning, so that robots can generate motion that is mindful of how it will be interpreted by a human collaborator.

When we collaborate, we make two inferences about our collaborator, action-to-goal and goal-to-action, leading to two important motion properties: legibility and predictability.

Legibility is about conveying intent — moving in a manner that makes the robot’s goal clear to the observer. We infer the robot’s goal based on its ongoing action (action-to-goal).

Predictability is about matching the observer’s expectation — matching the motion they predict when they know the robot’s goal. If we know the robot’s goal, we infer its future action from it (goal-to-action).

Predictable and legible motion can be correlated. For example, in an unambiguous situation, if an actor’s observed motion matches what is expected for a given intent (i.e. is predictable), then that intent can be used to explain the motion. If it is the only intent that explains the motion, the observer can immediately infer the actor’s intent, meaning that the motion is also legible. This is why we tend to assume that predictability implies legibility — that if the robot moves in an expected way, its intentions will automatically be clear.

The writing domain, however, clearly distinguishes the two. The word legibility, traditionally an attribute of written text, refers to the quality of being easy to read. When we write legibly, we try consciously, and with some effort, to make our writing clear and readable to someone else. The word predictability, on the other hand, refers to the quality of matching expectation. When we write predictably, we fall back on old habits, and write with minimal effort.

As a consequence, our legible and our predictable writing are different: our friends do not expect to open our diary and see our legible writing style. They rightfully assume the diary was written for ourselves, and expect our usual, day-to-day style. By formalizing predictability and legibility as directly stemming from the two inferences in opposing directions, goal-to-action and action-to-goal, the authors show that the two are different in motion as well.

Ambiguous situations, which occur often in daily tasks, make this opposition clear: more than one possible intent can be used to explain the motion observed so far, rendering the predictable motion illegible. The figure above exemplifies the effect of this contradiction. The robot hand’s motion on the left is predictable in that it matches expected behavior: the hand reaches out directly towards the target. But it is not legible, failing to make the intent of grasping the green object clear. In contrast, the trajectory on the right is more legible, making it clear that the target is the green object by deliberately bending away from the red object. But it is less predictable, as it does not match the expected behavior of reaching directly.

Dragan et al. produce predictable and legible motion by mathematically modeling how humans infer motion from goals and goals from motion, and introducing trajectory optimizers that maximize the probability that the right inferences will be made. The figure below shows the robot starting with a predictable trajectory (gray) and optimizing it to be more and more legible (orange).

Exaggerating the motion to the right makes it more immediately clear that the robot’s goal is the object on the right. Exaggeration is one of the principles of Disney animation, and it emerges naturally out of the mathematics of legible motion.
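To give a flavour of the underlying model, the observer’s action-to-goal inference can be written as a cost-based probability over candidate goals: a partial trajectory supports a goal more strongly the less wasted effort it implies. The sketch below is a deliberately simplified, hypothetical version of that idea, using straight-line distances as costs and uniform goal priors; it illustrates the inference, not the authors’ full trajectory optimizer.

```python
import numpy as np

def path_cost(points):
    """Length of a discretized path (sum of segment lengths)."""
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

def goal_probabilities(partial_path, goals, start):
    """P(goal | trajectory so far) under a simple cost-based observer model:
    a goal is likely if the observed motion plus the cheapest way to finish
    at that goal costs little more than heading there directly."""
    scores = []
    for g in goals:
        observed = path_cost(partial_path)
        to_go = np.linalg.norm(g - partial_path[-1])   # cheapest completion
        direct = np.linalg.norm(g - start)             # cheapest path overall
        scores.append(np.exp(-(observed + to_go)) / np.exp(-direct))
    scores = np.array(scores)
    return scores / scores.sum()

# A trajectory is more legible the earlier this distribution concentrates on the
# robot's actual goal; a legibility optimizer pushes the path in whatever
# direction raises that probability early on.
```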


Related posts


Coordinated UAV Docking - article


Coordinated landing of a quadrotor on a skid-steered ground vehicle in the presence of time delays
John M. Daly, Yan Ma, Steven L. Waslander

Small Unmanned Aerial Vehicles (UAVs) can be both safe and manoeuvrable, but their small size means they can’t carry much payload and their battery life only allows for short flights. To increase the range of a small UAV, one idea is to pair it with an unmanned ground vehicle (UGV) that can carry it to a site of operation and transport heavier cargo. Having both ground and aerial perspectives can also be useful during a mission. One challenge is to make sure the vehicles have the ability to rendezvous and perform coordinated landings autonomously. To this end, Daly et al. present a coordinated control method and experimental results for landing a quadrotor on a ground rover. The two robots communicate their positions, converge to a common docking location and then dock successfully, both indoors and out.

The video above demonstrates the use of a coordinated control strategy for autonomous docking of an Aeryon Scout UAV onto a skid-steered UGV from Clearpath Robotics. The controller handles the nonlinearities inherent in the motions of the two vehicles, and remains stable in the face of multi-second time delays, allowing unreliable Wi-Fi communication to be used during the landing. Both indoor and outdoor experiments demonstrate the validity of the approach, and also reveal the major disturbance caused by the ground effect when hovering over the ground vehicle.
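As a rough illustration of the rendezvous idea only (the authors’ controller explicitly handles the vehicles’ nonlinear dynamics and is proven stable under delay), a naive scheme might extrapolate the UGV’s last reported state across the communication delay and steer the UAV towards the predicted docking point. All names and gains below are hypothetical.

```python
import numpy as np

def predicted_docking_point(ugv_pos, ugv_vel, delay_s):
    """Extrapolate the ground vehicle's last reported position across the
    estimated communication delay, so the UAV aims where the UGV will be
    rather than where it was when the message was sent."""
    return ugv_pos + ugv_vel * delay_s

def uav_velocity_command(uav_pos, ugv_pos, ugv_vel, delay_s, gain=0.8):
    """Toy proportional rendezvous law: close the gap to the delay-compensated
    docking point while matching the UGV's velocity."""
    target = predicted_docking_point(ugv_pos, ugv_vel, delay_s)
    return gain * (target - uav_pos) + ugv_vel
```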


Grasping with robots – which object is in reach? - article


Representing the robot’s workspace through constrained manipulability analysis
Nikolaus Vahrenkamp, Tamim Asfour

Imagine a robot reaching for a mug on the table, only to realize that it is too far, or that it would need to bend its arm joint backwards to get there. Understanding which objects are within reach and how to grasp them is an essential requirement if robots are to operate in our everyday environments. To solve this problem, Vahrenkamp et al. propose a new approach to build a comprehensive representation of the capabilities of a robot related to reaching and grasping.

The “manipulability” representation shown below allows the robot to know where it can reach in 6D with its right arm. That means it knows which x,y,z positions it can reach, as well as the orientation of the robot hand that is best for manipulation. The representation takes into account constraints due to joints in the arm. The manipulability is encoded by color (blue: low, red: high).

A cut through one of these vector clouds looks like this.

In addition to single handed grasping, the authors discuss how the approach can be extended to grasping with two arms. Experiments were run in simulation on the humanoid robots ARMAR-III and ARMAR-IV.
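As a very rough sketch of how such a reachability map can be assembled (not the authors’ constrained manipulability analysis, which also stores hand orientations for full 6D coverage), one can sample joint configurations, run them through the arm’s forward kinematics, and record a dexterity score per region of the workspace. The fk and jacobian functions below are placeholders for the robot’s kinematics.

```python
import numpy as np
from collections import defaultdict

def manipulability(J):
    """Yoshikawa's manipulability measure for a 6 x N arm Jacobian."""
    return np.sqrt(max(np.linalg.det(J @ J.T), 0.0))

def build_reachability_map(fk, jacobian, joint_limits, n_samples=100000, voxel=0.05):
    """Sample joint configurations and record, per workspace voxel, the best
    dexterity seen there. fk(q) returns the end-effector pose (x, y, z, ...),
    jacobian(q) the arm Jacobian; both stand in for the real kinematics."""
    grid = defaultdict(float)
    lo, hi = joint_limits                        # arrays of lower/upper joint limits
    for _ in range(n_samples):
        q = np.random.uniform(lo, hi)            # respects the joint limits
        position = fk(q)[:3]
        key = tuple(np.floor(position / voxel).astype(int))
        grid[key] = max(grid[key], manipulability(jacobian(q)))
    return grid                                  # voxel -> manipulability score
```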

And in case you want to try this at home, there is an open source version of this work here.


Related posts


Grasping objects in a way that is suitable for manipulation - article


Semantic grasping: planning task-specific stable robotic grasps
Hao Dang, Peter K. Allen

Robots are expected to manipulate a large variety of objects from our everyday lives. The first step is to establish a physical connection between the robot end-effector and the object to be manipulated. In our context, this physical connection is a robotic grasp. What grasp the robot adopts will depend on how it needs to manipulate the object.

Existing grasp planning algorithms have made impressive progress in generating stable robotic grasps. However, stable grasps are mostly good for transporting objects. If you consider manipulation, the stability of the grasp is no longer sufficient to guarantee success. For example, a mug can be grasped with a top-down grasp or a side grasp. Both grasps are good for transporting the mug from one place to another. However, if the manipulation task is to pour water out of the mug, the top-down grasp is no longer suitable, since the palm and the fingers of the hand may block the opening of the mug. We call such task-related constraints “semantic constraints”.

In our work, we take an example-based approach to build a grasp planner that searches for stable grasps satisfying semantic constraints. This approach is inspired by psychological research which showed that human grasping is to a very large extent guided by previous grasping experience. To mimic this process, we propose that semantic constraints be embedded into a database which includes partial object geometry, hand kinematics, and tactile contacts. Task specific knowledge in the database should be transferable between similar objects. We design a semantic affordance map which contains a set of depth images from different views of an object and predefined example grasps that satisfy semantic constraints of different tasks. These depth images help infer the approach direction of a robot hand with respect to an object, guiding the hand along an ideal approach direction. Predefined example grasps provide hand kinematics and tactile information to the planner as references to the ideal hand posture and tactile contact formation. Utilizing this information, our planner searches for stable grasps with an ideal approach direction, hand kinematics, and tactile contact formation.

The figure above illustrates the process of planning a semantic grasp on a target object (in this case, a drill) with a given grasping semantics “to-drill” and a semantic affordance map built on a source object (another drill, shown in Step 1, that is similar to the target drill). Step 1 is to retrieve a semantic grasp stored in the semantic affordance map; this semantic grasp is used as a reference in the next two steps. Step 2 is to achieve the ideal approach direction on the target object according to the exemplar semantic grasp. Once the ideal approach direction is achieved, a local grasp planning process starts in Step 3 to obtain stable grasps on the target object that share similar tactile feedback and hand posture with the exemplar semantic grasp.
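A toy version of the retrieval part of this pipeline (Steps 1 and 2) might look like the sketch below. The class and its nearest-neighbour matching over depth images are hypothetical simplifications; the actual planner also uses the stored hand kinematics and tactile contacts to drive the local search of Step 3.

```python
import numpy as np

class SemanticAffordanceMap:
    """Toy stand-in for a semantic affordance map: depth views of a source
    object, each tagged with the approach direction and example grasp that
    satisfy a given task's semantic constraints."""

    def __init__(self):
        self.entries = []   # (task, flattened depth view, approach_dir, grasp)

    def add(self, task, depth_image, approach_dir, example_grasp):
        self.entries.append((task, depth_image.ravel(), approach_dir, example_grasp))

    def retrieve(self, task, observed_depth):
        """Match the observed depth view (same resolution as the stored ones)
        against the stored views for this task and return the associated ideal
        approach direction and exemplar grasp."""
        candidates = [e for e in self.entries if e[0] == task]
        flat = observed_depth.ravel()
        best = min(candidates, key=lambda e: np.linalg.norm(e[1] - flat))
        return best[2], best[3]
```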

The figure below shows some grasps planned on typical everyday objects using this approach. From left to right: the experiment ID, the predefined semantic grasps stored in the semantic affordance map, the pair of source and target objects for each experiment, and the top two grasps generated. The top two grasps, shown in the last two columns, were each obtained within 180 seconds and are both stable in terms of grasp quality.


Grasping unknown objects - article


Sparse pose manifolds
Rigas Kouskouridas, Kostantinos Charalampous, Antonios Gasteratos

To manipulate objects, robots often need to estimate the objects’ position and orientation in space. The robot will behave differently if it’s grasping a glass that is standing up, or one that has been tipped over. On the other hand, it shouldn’t make a difference if the robot is gripping two different glasses with similar poses. The challenge is to have robots learn how to grasp new objects, based on previous experience.

To this end, Kouskouridas et al. propose the Sparse Pose Manifolds (SPM) method. As shown in the figure above, different objects viewed from the same perspective should share identical poses. All the objects facing right are in the same “pose-bucket”, which is different from the bucket for objects facing left, or forward. For each pose, the robot knows how to guide its gripper to grasp the object. To grip an unknown object, the robot estimates which “bucket” the object falls into.
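The gist of that assignment can be pictured as a nearest-neighbour lookup over pose classes. The sketch below captures only that gist, with a generic feature descriptor standing in for the sparse manifold representation the authors actually use.

```python
import numpy as np

def assign_pose_bucket(descriptor, bucket_descriptors):
    """Return the index of the pose class whose representative descriptor is
    closest to the descriptor extracted from the unknown object's view."""
    distances = [np.linalg.norm(descriptor - d) for d in bucket_descriptors]
    return int(np.argmin(distances))

# Each bucket would then index a pre-programmed gripper behaviour, e.g.
#   approach = grasp_strategies[assign_pose_bucket(descriptor, buckets)]
```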

The videos below show how this method can efficiently guide a robotic gripper to grasp an unknown object, as well as the performance of the pose estimation module.




Using geometry to help robots map their environment - article


Feature based graph-SLAM in structured environments
P. de la Puente, D. Rodriguez-Losada

To get around unknown environments, most robots will need to build maps. To help them do so, robots can use the fact that human environments are often made of geometric shapes like circles, rectangles and lines. This paper presents a flexible framework for geometrical robotic mapping in structured environments.

Most human designed environments, such as buildings, present regular geometrical properties that can be preserved in the maps that robots build and use. If some information about the general layout of the environment is available, it can be used to build more meaningful models and significantly improve the accuracy of the resulting maps. Human cognition exploits domain knowledge to a large extent, usually employing prior assumptions for the interpretation of situations and environments. When we see a wall, for example, we assume that it’s straight. We’ll probably also assume that it’s connected to another orthogonal wall.

This research presents a novel framework for inferring knowledge about the structure of the environment and incorporating it into the robotic mapping process. A hierarchical representation of geometrical elements (features) and relations between them (constraints) provides enhanced flexibility and makes it possible to correct wrong hypotheses. A variety of features and constraints are available, and new ones can easily be added.
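To make the idea of geometric constraints concrete, here is a hedged sketch of how two such relations could enter a graph-SLAM back end as error terms. The framework’s own feature and constraint types are richer, but each ultimately contributes a residual like these to the optimization.

```python
import numpy as np

def wrap_line_angle(a):
    """Wrap a line-direction difference into [-pi/2, pi/2), since lines have
    no preferred direction."""
    return (a + np.pi / 2) % np.pi - np.pi / 2

def parallel_residual(theta_i, theta_j):
    """Zero when two wall (line) features share the same direction."""
    return wrap_line_angle(theta_i - theta_j)

def orthogonal_residual(theta_i, theta_j):
    """Zero when two wall (line) features meet at a right angle."""
    return wrap_line_angle(theta_i - theta_j - np.pi / 2)

# In the graph these residuals sit alongside the odometry and measurement
# factors; the nonlinear least-squares solver then nudges the estimated walls
# until they are parallel, orthogonal and straight where the hypotheses hold.
```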

A variety of experiments with both synthetic and real data were conducted. The map below, generated from laser scanner data collected by a robot navigating Killian Court at MIT, respects the geometrical properties of the environment well: you can easily tell that features are parallel, orthogonal and straight where needed.


What do teachers mean when they say ‘do it like me’? - article


Discovering relevant task spaces using inverse feedback control
Nikolay Jetchev, Marc Toussaint

Teaching robots to do tasks is useful, and teaching them in an easy, time-efficient way is even more useful. The TRIC algorithm presented in this paper allows robots to observe a few motions from a teacher, understand the essence of the demonstration, and then repeat it and adapt it to new situations.

Robots should learn to move and do useful tasks in order to be helpful to humans. However, tasks that are easy for a human, like grasping a glass, are not so obvious for a machine. Programming a robot requires time and work. Instead, what if the robot could watch the human and learn why the human did what he did, and in what way?

This is something we people do all the time. Imagine you are playing tennis and the teacher says ‘do the forehand like me’ and then shows an example. How should the student understand this? Should he move his fingers, or his elbow? Should he watch the ball, the racket, the ground, or the net? All these possible reference points can be described with numbers. The algorithm presented in this paper, called Task Space Retrieval Using Inverse Feedback Control (TRIC), helps a robot learn the important aspects of a demonstrated motion. Afterwards, the robot should be able to reproduce the moves like an expert, even if the task changes slightly.
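One very simplified way to picture the ‘what mattered?’ question is sketched below: record many candidate task variables during each demonstration and look for the ones that end up at the same value every time. This variance-based ranking is only an intuition pump; TRIC itself recovers the relevant task space through inverse feedback control rather than a variance test.

```python
import numpy as np

def rank_candidate_features(demos):
    """demos: list of arrays, one per demonstration, each of shape
    (timesteps, n_features), holding candidate task variables (joint angles,
    distances to objects, ...) recorded during the motion.

    Features whose final values barely vary across demonstrations are good
    candidates for what the teacher actually cared about."""
    finals = np.stack([d[-1] for d in demos])   # each feature's value at the end
    variances = finals.var(axis=0)
    return np.argsort(variances)                # most consistent features first
```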

The algorithm was successfully tested in simulation on various grasping and manipulation tasks. The figure above shows one of these tasks, in which a robot hand must approach a box and open the cover. The robot was shown 10 sets of trajectories from a simulated teacher. After training, it was asked to open a series of boxes that had been moved, rotated, or were of a different size. Overall, TRIC performed very well on these scenarios, with 24 successes out of 25 tries.


Related posts


ManyEars: open source framework for sound processing - article


The ManyEars open framework
François Grondin, Dominic Létourneau, François Ferland, Vincent Rousseau, François Michaud

Making robots that are able to localize, track and separate multiple sound sources, even in noisy places, is essential for their deployment in our everyday environments. This could, for example, allow them to process human speech even in crowded places, or identify noises of interest and where they came from. Unlike in vision, however, there are few software and hardware tools for audition that can easily be integrated into robotic platforms.

The ManyEars open source framework allows users to easily experiment with robot audition. The software, which can be downloaded here, is compatible with ROS (Robot Operating System). Its modular design makes it possible to interface with different microphone configurations and hardware, thereby allowing the same software package to be used for different robots. A Graphical User Interface is provided for tuning parameters and visualizing information about the sound sources in real-time. The ManyEars software library is composed of five modules: Preprocessing, Localization, Tracking, Separation and Postprocessing.
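To give an idea of the kind of signal processing a localization module builds on (a generic textbook sketch, not the ManyEars API), the time difference of arrival between two microphones can be estimated with GCC-PHAT; combining such delays across the whole array yields a direction to the sound source.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time difference of arrival (in seconds) between two
    microphone signals using the classic GCC-PHAT method."""
    n = len(sig) + len(ref)
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12               # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_tau is None else int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # sign gives the direction
```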

To make use of the ManyEars software, a computer, a sound card and microphones are required. ManyEars can be used with commercially available sound cards and microphones. However, commercial sound cards have limitations when used for embedded robotic applications: they can be expensive and have functionalities that are not required for robot audition, and they demand a significant amount of power and space. For these reasons, the authors introduce a customized microphone board and sound card, available as an open hardware solution, that can be used on your robot and interfaced with the software package. The board uses an array of microphones, instead of only one or two, allowing a robot to localize, track, and separate multiple sound sources.

The framework is demonstrated using a microphone array on the IRL-1 robot. The placement of the microphones is marked by red circles. Results show that the robot is able to track two human speakers producing uninterrupted speech sequences, even when they are moving and crossing paths. For videos of the IRL-1, check out the lab’s YouTube Channel.


Tracking 3D objects in real-time using active stereo vision - article


Real-time visuomotor update of an active binocular head
Michael Sapienza, Miles Hansard, Radu Horaud

Humans track objects by turning their head towards areas of interest and gazing at them. Integrating images from both eyes provides depth information that allows us to represent 3D objects. Similar feats could prove useful in robotic systems with comparable vision capabilities. POPEYE, shown in the video below, is able to independently move its head and the two cameras it uses for stereo vision.

To perform 3D reconstruction of object features, the robot needs to know the spatial relationship between its two cameras. For this purpose, Sapienza et al. calibrate the robot’s vision system before the experiment by placing cards with known patterns in the environment and systematically moving the camera motors to learn how these motor changes affect the captured images. After calibration, and thanks to some math (a homography-based method), the robot can relate how much its motors have moved to changes in the image features. Measuring motor changes is very fast, allowing for real-time 3D tracking.
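Once that relationship is known, recovering depth is the standard stereo triangulation sketched below. It assumes a rectified image pair with known focal length and baseline; on an active head like POPEYE those quantities change as the motors move, which is exactly why the calibration step matters.

```python
import numpy as np

def triangulate(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Recover a 3D point (in the left camera frame) from a rectified stereo
    pair: depth is inversely proportional to the disparity between the two
    image projections."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("the point must project further left in the right image")
    Z = focal_px * baseline_m / disparity
    X = (x_left - cx) * Z / focal_px
    Y = (y - cy) * Z / focal_px
    return np.array([X, Y, Z])
```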

Results show that the robot is able to keep track of a human face while performing 3D reconstruction. In the future, the authors hope to add zooming functionalities to their method.


Related posts


Using 3D snapshots to control a small helicopter - article


Design of a 3D snapshot based visual flight control system using a single camera in hover
Matthew A. Garratt, Andrew J. Lambert, Hamid Teimoori

To control a flying robot, you usually need to know the attitude of the robot (roll, pitch, yaw), where it is in the horizontal plane (x, y), and how high it is above the ground (z). While attitude measurements are provided by inertial sensors on board the robot, most flying robots rely on GPS and additional range sensors such as ultrasound sensors, lasers or radars to determine their position and altitude. The GPS signal, however, is not always available in cluttered environments and can be jammed, and additional sensors increase the weight the robot has to carry. Instead, Garratt et al. propose replacing position sensors with a single small, low-cost camera.

By comparing a snapshot taken from a downward-pointing camera with a reference snapshot taken at an earlier time, the robot is able to calculate its displacement in the horizontal plane. The loom of the image is used to calculate the change in altitude; image loom corresponds to image expansion or contraction, as can be seen in the images below. By reacting to these image displacements, the robot is able to control its position.

Grass as seen from altitudes of 0.25 m, 0.5 m, 1.0 m and 2.0 m (from left to right).
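A rough sketch of the two measurements involved is given below: the horizontal shift between the reference and the current snapshot via phase correlation, and the loom as the zoom factor that best re-aligns them. This is an illustrative reconstruction, not the authors’ implementation, and it assumes same-sized grayscale snapshots comfortably larger than the comparison patch.

```python
import numpy as np
from scipy import ndimage

def image_shift(reference, current):
    """Horizontal-plane displacement as the (dy, dx) pixel shift that best
    aligns the current snapshot with the reference (phase correlation)."""
    cross = np.fft.fft2(current) * np.conj(np.fft.fft2(reference))
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    shape = np.array(reference.shape)
    return (np.array([dy, dx]) + shape // 2) % shape - shape // 2   # wrap to +/-

def _centre_crop(img, size):
    """Take a size x size window from the centre of img."""
    y0 = (img.shape[0] - size) // 2
    x0 = (img.shape[1] - size) // 2
    return img[y0:y0 + size, x0:x0 + size]

def image_loom(reference, current, scales=np.linspace(0.8, 1.25, 19), patch=64):
    """Loom (expansion or contraction) as the zoom of the reference that best
    matches the current snapshot; expansion suggests the camera has descended,
    contraction that it has climbed."""
    target = _centre_crop(current, patch)
    errors = [np.mean((_centre_crop(ndimage.zoom(reference, s, order=1), patch)
                       - target) ** 2)
              for s in scales]
    return scales[int(np.argmin(errors))]
```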

Using this strategy, the researchers were able to show in simulation that a helicopter could perform take-off, hover and the transition from low speed forward flight to hover. The ability to track horizontal and vertical displacements using 3D snapshots from a single camera was then confirmed in reality using a Vario XLC gas-turbine helicopter.

In the future, the authors intend to further test the 3D snapshot control strategy in flight using their Vario XLC helicopter before moving to smaller platforms such as an Asctec Pelican quadrotor. Additional challenges include taking into account the shadow of the robot, which might change position from snapshot to snapshot.

