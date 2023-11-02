Given a few examples of how to accomplish a task with standard objects, humans can extrapolate and learn to solve variations of that manipulation task even when the objects involved have different visual or physical characteristics. Existing robot learning methods, by contrast, still need considerable data augmentation to make learned policies generalize across object scales, orientations, and visual appearances. Even with such augmentation, generalization to unseen variations is not guaranteed.

A new paper from Stanford University investigates zero-shot learning of a visuomotor policy that takes as input a small number of sample trajectories from a single source manipulation scenario and generalizes to scenarios with unseen object visual appearances, sizes, and poses. In particular, the researchers wanted to learn policies that handle deformable and articulated objects, such as clothes or boxes, in addition to rigid-object tasks like pick-and-place. To make the learned policy robust across object placements, orientations, and scales, they propose building equivariance into both the visual object representation and the policy architecture.

They present EquivAct, a new visuomotor policy learning approach that learns closed-loop policies for 3D robot manipulation tasks from demonstrations in a single source manipulation scenario and generalizes zero-shot to unseen scenarios. The learned policy takes the robot’s end-effector pose and a partial point cloud of the environment as inputs and outputs robot actions, such as end-effector velocity and gripper commands. Unlike most previous work, the researchers use a SIM(3)-equivariant network architecture. This means that when the input point cloud and end-effector position are translated, rotated, and uniformly scaled, the output end-effector velocities transform accordingly. Because the policy architecture is equivariant, it can learn from demonstrations of small-scale tabletop tasks and then generalize zero-shot to mobile manipulation tasks involving larger versions of the demonstrated objects with different visual and physical appearances.
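To make the symmetry property concrete, here is a minimal sketch in plain Python. It does not reproduce the EquivAct network; `toy_policy` is a hypothetical stand-in whose output velocity points from the end-effector to the point cloud centroid, and the transform parameters are arbitrary illustrative values. The check confirms that applying a similarity transform (uniform scale, rotation, translation) to the inputs scales and rotates the output velocity, while the translation drops out because a velocity is a direction rather than a position.

```python
import math

def rotate_z(p, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)

def sim3(p, scale, angle, t):
    """Apply a similarity transform: rotate, scale uniformly, then translate."""
    x, y, z = rotate_z(p, angle)
    return (scale * x + t[0], scale * y + t[1], scale * z + t[2])

def toy_policy(points, ee_pos):
    """Hypothetical policy: velocity from the end-effector to the cloud centroid."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    return tuple(centroid[i] - ee_pos[i] for i in range(3))

# Illustrative inputs (assumed values, not from the paper).
points = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (0.0, 0.0, 0.3), (0.1, 0.1, 0.1)]
ee_pos = (0.5, -0.2, 0.4)
scale, angle, t = 3.0, 0.7, (1.0, -2.0, 0.5)

# Velocity on the original scene.
v = toy_policy(points, ee_pos)

# Velocity on the transformed scene.
v_on_transformed = toy_policy(
    [sim3(p, scale, angle, t) for p in points],
    sim3(ee_pos, scale, angle, t),
)

# Equivariance: the output is rotated and scaled, but not translated.
v_expected = tuple(scale * c for c in rotate_z(v, angle))
max_err = max(abs(a - b) for a, b in zip(v_on_transformed, v_expected))
print(max_err < 1e-9)
```

A learned network has no such property by default; an equivariant architecture is one way to guarantee it by construction instead of approximating it through augmentation.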

The approach has two parts: representation learning and policy learning. To train the agent’s representations, the team first provides a set of synthetic point clouds that are captured with the same camera and settings as the target task’s objects but with random non-uniform scaling. Although the proposed architecture is equivariant only to uniform scaling, the training data is augmented this way to accommodate non-uniform scaling as well. The simulated data does not need to show robot actions or even reflect the actual task. The team uses this simulated data to train a SIM(3)-equivariant encoder-decoder architecture that extracts global and local features from the scene point cloud. During training, a contrastive learning loss on paired point cloud inputs pulls together the local features of corresponding parts of objects in similar poses. For the policy-learning phase, the method assumes access to only a small set of previously verified task trajectories.
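The two training ingredients in this paragraph can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the scale range, the function names, and the temperature are hypothetical, and the loss shown is a generic InfoNCE-style objective chosen here as one common way to pull the features of corresponding points together while pushing non-corresponding ones apart.

```python
import math
import random

def random_nonuniform_scale(points, low=0.5, high=2.0, rng=random):
    """Scale each axis by its own random factor (the range is an assumption)."""
    sx, sy, sz = (rng.uniform(low, high) for _ in range(3))
    return [(sx * x, sy * y, sz * z) for x, y, z in points], (sx, sy, sz)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def info_nce(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style loss: feature i in feats_a should match feature i in feats_b."""
    a = [normalize(f) for f in feats_a]
    b = [normalize(f) for f in feats_b]
    total = 0.0
    for i, fa in enumerate(a):
        logits = [dot(fa, fb) / temperature for fb in b]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # negative log-probability of the true match
    return total / len(a)

# Augmentation: one synthetic cloud and a non-uniformly scaled copy of it.
rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(8)]
scaled, factors = random_nonuniform_scale(cloud, rng=rng)

# Loss: toy per-point features, once correctly paired and once mispaired.
aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shuffled = [aligned[1], aligned[2], aligned[0]]
loss_aligned = info_nce(aligned, aligned)
loss_shuffled = info_nce(aligned, shuffled)
print(loss_aligned < loss_shuffled)
```

Because point `i` in the scaled copy corresponds to point `i` in the original cloud, the augmentation yields paired inputs with known correspondences for free, which is what a pairwise feature loss needs.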

The researchers use the demonstration data to train a closed-loop policy that takes a partial point cloud of the scene as input, uses the previously learned encoder to extract global and local features from it, and then feeds those features into a SIM(3)-equivariant action prediction network to predict end-effector movements. Beyond the standard rigid-object manipulation tasks of previous work, the proposed method is evaluated on the more complex tasks of comforter folding, container covering, and box sealing.
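The control flow described here (observe, encode, predict, act, repeat) can be sketched as a short loop. Everything in this block is a hypothetical stand-in: `encode` and `predict_action` are hand-written toy functions rather than learned networks, and the gain, threshold, and scene values are arbitrary. The point is only to show what "closed-loop" means in this pipeline: features are recomputed from the current observation at every step, not planned once up front.

```python
def encode(points):
    """Toy encoder: global feature = centroid, local features = per-point offsets."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    local = [tuple(p[i] - centroid[i] for i in range(3)) for p in points]
    return centroid, local

def predict_action(global_feat, local_feats, ee_pos, gain=0.5, grasp_radius=0.05):
    """Toy action head: move toward the centroid, close the gripper when near.

    `local_feats` is accepted to mirror the described interface but is
    ignored by this toy head.
    """
    offset = tuple(global_feat[i] - ee_pos[i] for i in range(3))
    dist = sum(o * o for o in offset) ** 0.5
    velocity = tuple(gain * o for o in offset)
    return velocity, dist < grasp_radius

def rollout(points, ee_pos, steps=20, dt=1.0):
    """Closed loop: re-encode the observation before every action."""
    closed = False
    for _ in range(steps):
        global_feat, local_feats = encode(points)
        velocity, closed = predict_action(global_feat, local_feats, ee_pos)
        if closed:
            break
        ee_pos = tuple(ee_pos[i] + dt * velocity[i] for i in range(3))
    return ee_pos, closed

# Illustrative scene: four points centered on the origin, gripper far away.
scene = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, -0.1, 0.0)]
final_pos, closed = rollout(scene, ee_pos=(1.0, 1.0, 1.0))
final_dist = sum(c * c for c in final_pos) ** 0.5
print(closed, final_dist < 0.05)
```

In this toy setup the scene is static, so re-encoding returns the same features each step; with a real robot the point cloud changes as the object deforms or moves, which is where the closed loop matters.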

For each task, the team collects several human demonstrations in which a person manipulates a tabletop object. After training on these demonstrations, they evaluate the method on a mobile manipulation platform, where robots must solve the same task at a much larger scale. The findings show that the method can learn a closed-loop robot manipulation policy from the source manipulation demonstrations and execute the target task in a single run, without the need for fine-tuning. The approach is also shown to be more efficient than methods that rely on heavy augmentation to generalize to out-of-distribution object poses and scales, and it outperforms prior work that does not exploit equivariance.

Check out the paper and project page. All credit for this research goes to the researchers of this project.

Dhanashree Shenvai is a Computer Science Engineer with a keen interest in the applications of AI and has a wealth of experience in FinTech companies covering financial, cards & payments and banking domains. He is passionate about exploring new technologies and advancements that make everyone’s lives easier in today’s changing world.

Source: www.marktechpost.com