Monday, April 19, 2010

A VERY SIMPLE CHATBOX IN PYTHON

A naive chatbot program. No parsing, no cleverness, just a training file and output.


It first trains itself on a text and then uses the data from that training to generate responses to the interlocutor's input. The training process creates a dictionary in which each key is a word and the value is a list of all the words that directly follow that word anywhere in the training text. If a word appears more than once in this list, it is proportionally more likely to be chosen by the bot, so there is no need for explicit probabilities: the list does the weighting.
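To make the data structure concrete, here is a minimal sketch of the successor dictionary built from a toy sentence. The function name `build_successors` and the sample text are illustrative assumptions, not part of the scripts in this post, and the sketch is written for Python 3. It builds the dictionary in a single pass over adjacent word pairs, which is a different technique from the nested loops in the trainer further down, though it applies the same rule that a word ending in punctuation gets no successors.

```python
from collections import defaultdict

def build_successors(words, stop_chars='(),.?!'):
    """Map each word to the list of words seen directly after it.

    Hypothetical helper for illustration only. Words that end in one of
    stop_chars close a phrase, so nothing is recorded after them.
    """
    follow = defaultdict(list)
    # Walk over every adjacent pair (word, next word) exactly once.
    for current, nxt in zip(words, words[1:]):
        if current[-1] not in stop_chars:
            follow[current].append(nxt)
    return dict(follow)

sample = 'the cat sat on the mat. the cat ran'.split()
successors = build_successors(sample)
print(successors['the'])  # 'cat' appears twice, so it is twice as likely as 'mat.'
```

Because 'cat' is stored twice under 'the', a plain `random.choice` over that list picks it two times out of three, which is all the weighting the bot needs. Unlike the trainer, this sketch leaves out words that have no successors at all instead of storing an empty list for them.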

The bot chooses a random word from your input and generates a response by choosing a random word that has been seen as a successor to the word it holds. It then repeats the process, finding a successor to that word in turn, and carries on until it thinks it has said enough. It reaches that conclusion when it picks a word that ends in a punctuation mark (a comma, full stop, question mark or exclamation mark), in other words a word that closed a phrase in the training text. If it holds a word it has never seen, it falls back to 'the'. It then returns to input mode to let you respond, and so on.
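The walk described above can be sketched on a tiny hand-made dictionary. This is a Python 3 illustration under stated assumptions: `generate_reply`, the `toy` dictionary and the `max_words` cap are all hypothetical additions (the bot in this post has no length cap), while the 'the' fallback and the punctuation stop rule follow the bot's own logic.

```python
import random

def generate_reply(successors, seed_word, rng=random, max_words=50, fallback='the'):
    """Follow random successors from seed_word until a word ends in punctuation.

    Hypothetical helper for illustration. max_words is an assumed safety
    cap so the walk always terminates, even on a sparse dictionary.
    """
    words = []
    current = seed_word
    for _ in range(max_words):
        choices = successors.get(current)
        # Unknown words, or words with no successors, fall back to 'the'.
        current = rng.choice(choices) if choices else fallback
        words.append(current)
        # A word ending in punctuation closes the reply.
        if current[-1] in ',?!.':
            break
    return ' '.join(words)

toy = {
    'hello': ['there', 'there'],
    'there': ['friend.'],
}
print(generate_reply(toy, 'hello'))
```

With this toy dictionary every step has only one possible outcome, so the reply is always the same. With a real corpus each step is a weighted random pick, which is where the impressionistic quality comes from.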

It isn't very realistic, but I hereby challenge anyone to do better in 71 lines of code! This is a great challenge for any budding Pythonists, and I just wish I could open it to a wider audience than the small number of visitors I get to this blog. To code a bot that is always guaranteed to be grammatical must surely take closer to several hundred lines. I simplified hugely by looking for the simplest rule that would give the computer a mere stab at having something to say.

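Its responses are rather impressionistic, to say the least! Also, you have to put what you say in single quotes, because the bot reads it with Python 2's input(), which evaluates whatever you type as a Python expression.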

I used War and Peace for my "corpus", which took a couple of hours for the training run. Use a shorter file if you are impatient...

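Here is the trainer:

#lukebot-trainer.py
import pickle

# Read the training text as a flat list of whitespace-separated words.
b=open('war&peace.txt')
text=[]
for line in b:
    for word in line.split():
        text.append(word)
b.close()

# For every distinct word, collect each word that follows it in the text.
textset=list(set(text))
follow={}
for l in range(len(textset)):
    working=[]
    check=textset[l]
    for w in range(len(text)-1):
        # Words ending in punctuation close a phrase, so they get no successors.
        if check==text[w] and text[w][-1] not in '(),.?!':
            working.append(str(text[w+1]))
    follow[check]=working

# Save the dictionary for the bot to load.
a=open('lexicon-luke','wb')
pickle.dump(follow,a,2)
a.close()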


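Here is the bot:

#lukebot.py
import pickle,random

# Load the successor dictionary written by lukebot-trainer.py.
a=open('lexicon-luke','rb')
successorlist=pickle.load(a)
a.close()

def nextword(a):
    # Pick a random successor; fall back to 'the' for unknown words
    # and for words with no recorded successors.
    if a in successorlist and successorlist[a]:
        return random.choice(successorlist[a])
    else:
        return 'the'

speech=''
while speech!='quit':
    # Python 2 input() evaluates what is typed, hence the quotes.
    speech=input('>')
    if not speech.split():
        continue
    s=random.choice(speech.split())
    response=''
    while True:
        neword=nextword(s)
        response+=' '+neword
        s=neword
        # Stop once the chosen word ends in punctuation.
        if neword[-1] in ',?!.':
            break
    print response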


You tend to get an uncanny feeling when it says something that seems partially to make sense. I think this is a universal syndrome with chatbots and shows us a lot about our expectations and hopes for machine intelligence. The tendency to attribute some kind of surreal cosmic wisdom or irony to the responses is common.

TALKING TO TOLSTOY ! ...... ;)

Sample output (is this like talking to Tolstoy? Not really, but still...):

>>>
>'hello'
the human affairs he thought.
>'oh those affairs were great'
God,
>'what -surprised?'
do you value but she is what had the new country,
>'i do value but i do not blame the war on her'
so alarming them with those around with particular chair by all those who caressed me.
>'i never get anyone caressing me!'
allowed to talk about in a really so.
>'oh do not be prudish I am allowed to mention it'
should she suddenly heard of a duty for reason was handed him very hot.
>'you like these women a lot to talk about them so much'
advantage if your father?
>'my father approves of my romantic side'
love my son's having obeyed in the causes historic events that circle they didn't you know a force producing two Emperors with a certain prince.
>'oh now it is princes is it'
God,
>'do you believe in Him?'
the insistence of incongruities giving you married at his squadron--that is,
>' i don't understand you sir'
>'i do not understand you sir'
the bridge which,
>'i crossed'
himself,
>'ok ok... well 71 lines still is not bad'
the work at Smolensk?
>'yes'
or that of War.
>'no, war'
was gradually that millions of married in which is Dolokhov?

