ROS Exploration Summary (10): Voice Control


        Speech recognition is currently a hot topic on PCs and smartphones, and ROS, sitting at the front edge of robotics software, naturally has not missed out on such an impressive technology. Drawing on code from the open-source CMU Sphinx and Festival projects, ROS ships a standalone speech recognition package that converts recognized speech into text, which the robot can then process and even speak back.


 

I. The speech recognition package


 

1. Installation

        Installation is straightforward with standard Ubuntu commands. First install the dependencies:

  $ sudo apt-get install gstreamer0.10-pocketsphinx
  $ sudo apt-get install ros-fuerte-audio-common
  $ sudo apt-get install libasound2

        Then install the ROS package itself:

  $ svn checkout http://albany-ros-pkg.googlecode.com/svn/trunk/rharmony
  $ rosmake pocketsphinx

 

        The core of the package is recognizer.py in the nodes folder. This node captures speech from the microphone, hands it to the recognition engine to produce text, and publishes the result on the /recognizer/output topic; any other node can subscribe to that topic and process the text as it sees fit.
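        To make that concrete, here is a minimal sketch of a node that consumes the recognizer's output. The node name and callback are my own illustration, not part of the pocketsphinx package; only the topic name and message type come from the package:

#!/usr/bin/env python
# Minimal sketch: print whatever the recognizer publishes (illustrative names).
import rospy
from std_msgs.msg import String

def on_speech(msg):
    # msg.data holds the recognized phrase as plain text
    rospy.loginfo("Heard: %s" % msg.data)

if __name__ == "__main__":
    rospy.init_node('speech_listener')
    rospy.Subscriber('/recognizer/output', String, on_speech)
    rospy.spin()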

2. Testing

        Once installation finishes we can run a test.
        First, plug in your microphone and check in the system sound settings that it is actually picking up audio (a quick command-line check is sketched just below).
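        If you prefer to verify the microphone from the command line, one quick way (assuming the standard alsa-utils tools are installed) is to record a few seconds of audio and play them back:

  $ arecord -d 5 -f cd /tmp/mic_test.wav
  $ aplay /tmp/mic_test.wav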
        Then run the demo launch file that ships with the package:

  $ roslaunch pocketsphinx robocup.launch

        A long stream of status messages will scroll by in the terminal. Try saying a few simple phrases, in English of course, such as "bring me the glass" or "come with me", and see whether they are recognized.
        The book ROS By Example reports this test as quite accurate, but in my own runs the recognition felt rather unreliable; perhaps my English pronunciation is just too poor.
        We can also watch the messages that ROS ultimately publishes:

  $ rostopic echo /recognizer/output
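        Since /recognizer/output carries a plain std_msgs/String, the echoed output for a successfully recognized phrase looks something like this (the phrase itself is just an example):

  data: bring me the glass
  ---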
       


II. The vocabulary


1. Inspecting the vocabulary

        This recognizer works offline: a set of common words and phrases is kept in a corpus file that serves as the recognition vocabulary, the audio signal is decoded segment by segment, and the decoder matches what it hears against entries in that vocabulary. To see which phrases the demo vocabulary contains, run:

  $ roscd pocketsphinx/demo
  $ more robocup.corpus

        The corpus is a plain text file with one candidate phrase per line.

 

2. Adding your own vocabulary

        We can add our own phrases to the vocabulary. The sample code accompanying ROS By Example includes a speech recognition demo, along with an example of adding a custom vocabulary.
        First, look at the phrases the example wants to add:

  $ roscd rbx1_speech/config
  $ more nav_commands.txt

 

       
        These are the phrases to be added, and you can edit them or substitute phrases of your own; a few representative lines are sketched below.
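        The exact contents depend on your version of rbx1, but the phrases line up with the keyword lists used by voice_nav.py later in this article, so a few lines would plausibly read:

  pause speech
  continue speech
  forward
  backward
  rotate left
  rotate right
  half speed
  full speed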
        Next, this file has to be compiled into a language model and pronunciation dictionary. This step is done online: go to http://www.speech.cs.cmu.edu/tools/lmtool-new.html, upload the file as the site instructs, let it build the model files, and download the results.

        Unpack all of the downloaded files into the config folder of the rbx1_speech package. We can then rename them to something more readable (lmtool names its output after a job number, 3026 in this example; substitute the number from your own download):

  $ roscd rbx1_speech/config
  $ rename -f 's/3026/nav_commands/' *

        This leaves you with files such as nav_commands.lm and nav_commands.dic.

 

        Now look at the voice_nav_commands.launch file in the rbx1_speech/launch folder:

<launch>
  <node name="recognizer" pkg="pocketsphinx" type="recognizer.py" output="screen">
    <param name="lm" value="$(find rbx1_speech)/config/nav_commands.lm"/>
    <param name="dict" value="$(find rbx1_speech)/config/nav_commands.dic"/>
  </node>
</launch>

        As you can see, this launch file starts the recognizer.py node with our generated language model and dictionary passed in as parameters, so recognition now runs against our own vocabulary.
        Test the result with the same commands as before:

  $ roslaunch rbx1_speech voice_nav_commands.launch
  $ rostopic echo /recognizer/output

III. Voice control

        With speech recognition working, we can build some rather impressive applications. Let's start by using voice to control a robot's motion.

1. The robot control node

        As mentioned above, recognizer.py publishes the recognized text as a message, so all we need to write is a control node that subscribes to that message and reacts to it.
        The pocketsphinx package itself ships voice_cmd_vel.py, a demo that turns voice commands into Twist messages; the rbx1_speech package provides a slightly simplified version of it. See voice_nav.py in the nodes folder:
#!/usr/bin/env python

"""
  voice_nav.py

  Allows controlling a mobile base using simple speech commands.

  Based on the voice_cmd_vel.py script by Michael Ferguson in
  the pocketsphinx ROS package.

  See http://www.ros.org/wiki/pocketsphinx
"""

import roslib; roslib.load_manifest('rbx1_speech')
import rospy
from geometry_msgs.msg import Twist
from std_msgs.msg import String
from math import copysign

class VoiceNav:
    def __init__(self):
        rospy.init_node('voice_nav')

        rospy.on_shutdown(self.cleanup)

        # Set a number of parameters affecting the robot's speed
        self.max_speed = rospy.get_param("~max_speed", 0.4)
        self.max_angular_speed = rospy.get_param("~max_angular_speed", 1.5)
        self.speed = rospy.get_param("~start_speed", 0.1)
        self.angular_speed = rospy.get_param("~start_angular_speed", 0.5)
        self.linear_increment = rospy.get_param("~linear_increment", 0.05)
        self.angular_increment = rospy.get_param("~angular_increment", 0.4)

        # We don't have to run the script very fast
        self.rate = rospy.get_param("~rate", 5)
        r = rospy.Rate(self.rate)

        # A flag to determine whether or not voice control is paused
        self.paused = False

        # Initialize the Twist message we will publish.
        self.cmd_vel = Twist()

        # Publish the Twist message to the cmd_vel topic
        self.cmd_vel_pub = rospy.Publisher('cmd_vel', Twist)

        # Subscribe to the /recognizer/output topic to receive voice commands.
        rospy.Subscriber('/recognizer/output', String, self.speech_callback)

        # A mapping from keywords or phrases to commands
        self.keywords_to_command = {'stop': ['stop', 'halt', 'abort', 'kill', 'panic', 'off', 'freeze', 'shut down', 'turn off', 'help', 'help me'],
                                    'slower': ['slow down', 'slower'],
                                    'faster': ['speed up', 'faster'],
                                    'forward': ['forward', 'ahead', 'straight'],
                                    'backward': ['back', 'backward', 'back up'],
                                    'rotate left': ['rotate left'],
                                    'rotate right': ['rotate right'],
                                    'turn left': ['turn left'],
                                    'turn right': ['turn right'],
                                    'quarter': ['quarter speed'],
                                    'half': ['half speed'],
                                    'full': ['full speed'],
                                    'pause': ['pause speech'],
                                    'continue': ['continue speech']}

        rospy.loginfo("Ready to receive voice commands")

        # We have to keep publishing the cmd_vel message if we want the robot to keep moving.
        while not rospy.is_shutdown():
            self.cmd_vel_pub.publish(self.cmd_vel)
            r.sleep()

    def get_command(self, data):
        # Attempt to match the recognized word or phrase to the
        # keywords_to_command dictionary and return the appropriate
        # command
        for (command, keywords) in self.keywords_to_command.iteritems():
            for word in keywords:
                if data.find(word) > -1:
                    return command

    def speech_callback(self, msg):
        # Get the motion command from the recognized phrase
        command = self.get_command(msg.data)

        # Log the command to the screen
        rospy.loginfo("Command: " + str(command))

        # If the user has asked to pause/continue voice control,
        # set the flag accordingly
        if command == 'pause':
            self.paused = True
        elif command == 'continue':
            self.paused = False

        # If voice control is paused, simply return without
        # performing any action
        if self.paused:
            return

        # The list of if-then statements should be fairly
        # self-explanatory
        if command == 'forward':
            self.cmd_vel.linear.x = self.speed
            self.cmd_vel.angular.z = 0

        elif command == 'rotate left':
            self.cmd_vel.linear.x = 0
            self.cmd_vel.angular.z = self.angular_speed

        elif command == 'rotate right':
            self.cmd_vel.linear.x = 0
            self.cmd_vel.angular.z = -self.angular_speed

        elif command == 'turn left':
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.angular.z += self.angular_increment
            else:
                self.cmd_vel.angular.z = self.angular_speed

        elif command == 'turn right':
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.angular.z -= self.angular_increment
            else:
                self.cmd_vel.angular.z = -self.angular_speed

        elif command == 'backward':
            self.cmd_vel.linear.x = -self.speed
            self.cmd_vel.angular.z = 0

        elif command == 'stop':
            # Stop the robot!  Publish a Twist message consisting of all zeros.
            self.cmd_vel = Twist()

        elif command == 'faster':
            self.speed += self.linear_increment
            self.angular_speed += self.angular_increment
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x += copysign(self.linear_increment, self.cmd_vel.linear.x)
            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z += copysign(self.angular_increment, self.cmd_vel.angular.z)

        elif command == 'slower':
            self.speed -= self.linear_increment
            self.angular_speed -= self.angular_increment
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x -= copysign(self.linear_increment, self.cmd_vel.linear.x)
            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z -= copysign(self.angular_increment, self.cmd_vel.angular.z)

        elif command in ['quarter', 'half', 'full']:
            if command == 'quarter':
                self.speed = copysign(self.max_speed / 4, self.speed)

            elif command == 'half':
                self.speed = copysign(self.max_speed / 2, self.speed)

            elif command == 'full':
                self.speed = copysign(self.max_speed, self.speed)

            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x = copysign(self.speed, self.cmd_vel.linear.x)

            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z = copysign(self.angular_speed, self.cmd_vel.angular.z)

        else:
            return

        self.cmd_vel.linear.x = min(self.max_speed, max(-self.max_speed, self.cmd_vel.linear.x))
        self.cmd_vel.angular.z = min(self.max_angular_speed, max(-self.max_angular_speed, self.cmd_vel.angular.z))

    def cleanup(self):
        # When shutting down be sure to stop the robot!
        twist = Twist()
        self.cmd_vel_pub.publish(twist)
        rospy.sleep(1)

if __name__=="__main__":
    try:
        VoiceNav()
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("Voice navigation terminated.")

 
        As you can see, the code defines a control response for each recognized command; a quick way to exercise it without a microphone is shown below.
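        Because voice_nav.py only looks at the text arriving on /recognizer/output, you can test the control logic by publishing a command by hand; any phrase from the keyword lists above will do:

  $ rostopic pub -1 /recognizer/output std_msgs/String -- "forward"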

2. Testing in simulation

        As before, we test in simulation with rviz.
        First bring up a robot model:

  $ roslaunch rbx1_bringup fake_turtlebot.launch

        Then open rviz:

  $ rosrun rviz rviz -d `rospack find rbx1_nav`/sim_fuerte.vcg

        If you would rather not watch the output in the terminal, you can open a GUI console instead:

  $ rxconsole

        Next start the speech recognition node:

  $ roslaunch rbx1_speech voice_nav_commands.launch

        And finally the robot control node:

  $ roslaunch rbx1_speech turtlebot_voice_nav.launch

        OK, with all of these nodes up we are ready to go. The available commands are the keywords defined in voice_nav.py above, for example: stop, forward, backward, rotate left, rotate right, turn left, turn right, faster, slower, quarter speed, half speed, full speed, pause speech, and continue speech.

        In my own tests the robot did respond to these commands, though the recognition accuracy still left something to be desired.

IV. Playing back speech

        Now that the robot acts on what we say, it would be even better if it could talk back to us. ROS includes a package for this as well, so let's try it out.
        Run the following commands:

  $ rosrun sound_play soundplay_node.py
  $ rosrun sound_play say.py "Greetings Humans. Take me to your leader."

 

        Did you hear the voice? ROS took the text we typed and read it out loud. The voice doing the talking is called kal_diphone; if you do not like it, you can install and use a different one:

  $ sudo apt-get install festvox-don
  $ rosrun sound_play say.py "Welcome to the future" voice_don_diphone

 

        There, a different speaker this time. But changing voices is not really the point. The rbx1_speech/nodes folder contains talkback.py, a node that makes the robot speak back what it hears:

#!/usr/bin/env python

"""
    talkback.py - Version 0.1 2012-01-10

    Use the sound_play client to say back what is heard by the pocketsphinx recognizer.

    Created for the Pi Robot Project: http://www.pirobot.org
    Copyright (c) 2012 Patrick Goebel.  All rights reserved.

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details at:

    http://www.gnu.org/licenses/gpl.html
"""

import roslib; roslib.load_manifest('rbx1_speech')
import rospy
from std_msgs.msg import String
from sound_play.libsoundplay import SoundClient
import sys

class TalkBack:
    def __init__(self, script_path):
        rospy.init_node('talkback')

        rospy.on_shutdown(self.cleanup)

        # Set the default TTS voice to use
        self.voice = rospy.get_param("~voice", "voice_don_diphone")

        # Set the wave file path if used
        self.wavepath = rospy.get_param("~wavepath", script_path + "/../sounds")

        # Create the sound client object
        self.soundhandle = SoundClient()

        # Wait a moment to let the client connect to the
        # sound_play server
        rospy.sleep(1)

        # Make sure any lingering sound_play processes are stopped.
        self.soundhandle.stopAll()

        # Announce that we are ready for input
        self.soundhandle.playWave(self.wavepath + "/R2D2a.wav")
        rospy.sleep(1)
        self.soundhandle.say("Ready", self.voice)

        rospy.loginfo("Say one of the navigation commands...")

        # Subscribe to the recognizer output and set the callback function
        rospy.Subscriber('/recognizer/output', String, self.talkback)

    def talkback(self, msg):
        # Print the recognized words on the screen
        rospy.loginfo(msg.data)

        # Speak the recognized words in the selected voice
        self.soundhandle.say(msg.data, self.voice)

        # Uncomment to play one of the built-in sounds
        #rospy.sleep(2)
        #self.soundhandle.play(5)

        # Uncomment to play a wave file
        #rospy.sleep(2)
        #self.soundhandle.playWave(self.wavepath + "/R2D2a.wav")

    def cleanup(self):
        self.soundhandle.stopAll()
        rospy.loginfo("Shutting down talkback node...")

if __name__=="__main__":
    try:
        TalkBack(sys.path[0])
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("Talkback node terminated.")

        Let's run it and hear the result:

  $ roslaunch rbx1_speech talkback.launch

        Now when you speak, ROS not only recognizes the text but also reads it back to you. Of course, there is no real intelligence behind it yet, so the robot cannot hold a clever conversation with us; still, this is the foundation, and bolting some AI on top later will make it far more capable.
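        Since talkback.py reads its voice from the private ~voice parameter, switching voices is just a launch-file change. As a sketch only (the actual talkback.launch shipped in rbx1_speech may differ in detail):

<launch>
  <node name="talkback" pkg="rbx1_speech" type="talkback.py" output="screen">
    <param name="voice" value="voice_don_diphone"/>
  </node>
</launch>

        Remember that soundplay_node.py must also be running, since SoundClient only sends requests to it; without the server, no sound comes out.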


Original article; when reposting, please credit: reposted from 古月居

Permalink: ROS Exploration Summary (10): Voice Control

