The GPT model is pretty impressive, but did you know it’s not the first natural language model out there? In fact, natural language processing (NLP) has been around for decades, and we’ve had varying degrees of success with it. You might remember HAL 9000 from 2001: A Space Odyssey, a computer that could understand and respond to human language. Or maybe you’ve heard of 790, the robotic head character from the LEXX TV series with an insufferable personality.
As we’ve seen from these sci-fi examples, conversational agents that aren’t user-friendly have been a popular topic for years. And in reality, the GPT model also does have its limitations.
But, as we continue exploring the potential of conversational agents like the GPT model, we may be on the edge of a new trend where they can gradually replace traditional UI systems. This shift has already begun in various industries, with chatbots and voice assistants becoming more common in customer service and personal assistant roles.
Recently, I’ve been testing ways of using the GPT model and immediately faced the challenge of output parsing. In the early 2000s, I had an opportunity to work with Natural Language Grammar-based speech recognition engines in the Silicon Valley and wanted to try to apply this same approach. While being quite a bit more powerful than older NL engines, the GPT model is also more stubborn (obviously!) in its output format. And while it does understand the JSON schema format and can produce an output conforming to the input schema, it is not always doing what is expected.
As the initial tests were quite promising, I envisioned a proof-of-concept bot capable of invoking Web API requests to showcase the technology. The easiest and most available test bench was a timesheeting system we use internally. And the initial scope was quite simple – build a chatbot that can submit a timesheet on a person’s behalf. As the project scope escalated beyond one person’s weekend project, I teamed up with Ivan Oh, Tim Golubev, and Ev Uklad.
As a team, we could quickly build a test bench with a stateless chatbot microservice connected to the OpenAI API and the timesheeting system. We then worked on the JSON schema representing the timesheet data, including the fields for employee names, dates, hours worked, and some metadata fields. Then, we trained the GPT model using the schema as a guide to understand and generate responses based on the data model.
The results are encouraging – we can get a simple task like submitting a timesheet or change of approver completed end-to-end in response to a text request.
We’re now taking this approach to the next level by implementing the use of the GPT model’s conversation context to enhance the chatbot’s functionality and enable more natural interactions. For instance, the chatbot will respond to a message such as “please submit my timesheet” by requesting additional details such as the project name and how many hours were worked. Using the GPT model to generate these responses, we can develop a straightforward chatbot that can be used intuitively without user training.
One thing to note: we chose the timesheet management process as a readily available process for the purpose of proof-of-concept. However, the ultimate goal is to develop a more generic solution that can be applied to a wide range of systems that require a UI today.
We plan to explore more advanced techniques, such as sequence-to-sequence modelling and attention mechanisms. These methods have been shown to be effective in improving the quality of conversational systems, and we believe that they will be instrumental in enabling more complex interactions with the GPT model. We will continue to investigate these techniques and provide updates on our progress in future posts.