2023-11-02

NYU’s LLMTime program finds the next likely event in a sequence of events, as represented by a string of numerical digits.

New York University

Today’s generative artificial intelligence programs, tools like ChatGPT, are poised to produce many more kinds of output than just text, as ZDNET has explored in some depth.

One of the most important of those “modalities,” as they are known, is time series data: data that measures the same variable at different points in time to detect trends. Data in time series format can be important for things like tracking a patient’s medical history over time, with entries made by the physician in a chart. Doing what is called time series forecasting means taking historical data and predicting what is going to happen next; for example, “Will this patient get better?”

The traditional approach to time series data involves software designed specifically for that type of data. But now, generative AI is gaining a new ability to handle time series data, in the same way it handles essay questions, image creation, software coding, and the various other tasks at which ChatGPT and similar programs have excelled.

In a new study published this month by Nate Gruver of New York University and colleagues at NYU and Carnegie Mellon, OpenAI’s GPT-3 program was used to predict the next event in a time series, similar to predicting the next word in a sentence.

“Because language models are built to represent complex probability distributions over sequences, they are theoretically suitable for time series modeling,” Gruver and team write in their paper, “Large Language Models Are Zero-Shot Time Series Forecasters,” posted on the arXiv pre-print server. “Time series data usually takes the same form as language modeling data, as a collection of sequences.”

Gruver and team write that the program they created, LLMTime, is “highly simple,” and that it surpasses or matches purpose-built time series methods across a range of different problems in a zero-shot fashion, which means that LLMTime can be used without any fine-tuning on the downstream data used by other models.

Key to the creation of LLMTime was for Gruver and team to rethink something called “tokenization,” the way a large language model represents the data it is working on.

Programs like GPT-3 have a fixed way of taking in words and characters, breaking them into chunks that can be ingested one at a time. Time series data is represented as sequences of numbers, such as “123”; the time series is simply the pattern in which such sequences of numbers occur.

Given this, GPT-3’s tokenization is problematic because it will often break those strings into strange groups. “For example, the number 42235630 is tokenized as [422, 35, 630] by the GPT-3 tokenizer, and changing even one digit can result in completely different tokenization,” Gruver and team say.

To avoid those awkward groupings, Gruver and team wrote code to insert white space around each digit of a digit sequence, so that each digit is encoded separately.
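The digit-spacing idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ code: the function names `encode_series` and `decode_series`, the fixed `precision` parameter, the dropping of the decimal point, and the “ , ” separator between time steps are all assumptions made for the example, and it handles non-negative values only.

```python
def encode_series(values, precision=2):
    """Render each value as space-separated digits; join time steps with ' , '.

    Assumption for this sketch: values are non-negative and are rounded to a
    fixed number of decimals, after which the decimal point is dropped.
    """
    encoded_steps = []
    for value in values:
        if value < 0:
            raise ValueError("this sketch handles non-negative values only")
        # Fix the number of decimals, then drop the decimal point.
        scaled = round(value * 10 ** precision)
        # Put a space between every digit so each one is tokenized on its own.
        encoded_steps.append(" ".join(str(scaled)))
    return " , ".join(encoded_steps)


def decode_series(text, precision=2):
    """Invert encode_series: strip the spaces and restore the decimal point."""
    steps = [step.replace(" ", "") for step in text.split(",")]
    return [int(step) / 10 ** precision for step in steps if step]


prompt = encode_series([13.41, 14.73])
recovered = decode_series(prompt)
```

With this encoding, the number 42235630 from the example above becomes the string `4 2 2 3 5 6 3 0` (using `precision=0`), so changing one digit changes only one symbol.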

They then went to work on getting GPT-3 to predict the next digit sequence in real-world examples of time series.

Any time series is a sequence of things that happen one after the other, like “The dog jumped down from the sofa and ran to the door,” where one event happens, and then another. A real-world example of a data set that people want to make predictions about is ATM withdrawals, forecast on the basis of historical withdrawals. A bank would be very interested in predicting such things.

Forecasting ATM withdrawals is, in fact, one of the challenges in real time series competitions such as the Artificial Neural Network and Computational Intelligence Forecasting Competition, run by the UK’s Lancaster University. That data set is simply strings and strings of numbers, in this form:

T1: 1996-03-18 00-00-00 : 13.4070294784581, 14.7250566893424, etc.

The first part is obviously the date and time stamp for “T1,” representing the first moment in time, and what follows are the amounts (with periods, not the commas of European notation, as the decimal mark). The challenge for a neural net is to predict, given thousands or millions of such items, what will happen in the next moment after the last instance in the chain: how much money will be withdrawn by customers tomorrow.
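To make the record format concrete, here is a small Python sketch that splits one such line into its identifier, time stamp, and amounts. The function name `parse_record` is hypothetical, the layout is inferred only from the single sample line shown above, and the record uses just the two amounts that appear there.

```python
from datetime import datetime


def parse_record(line):
    """Split 'ID: timestamp : v1, v2, ...' into (id, datetime, list of floats).

    Assumption for this sketch: the time stamp uses hyphens, not colons,
    so the first two colons in the line separate the three fields.
    """
    series_id, _, rest = line.partition(":")
    stamp_text, _, values_text = rest.partition(":")
    timestamp = datetime.strptime(stamp_text.strip(), "%Y-%m-%d %H-%M-%S")
    values = [float(item) for item in values_text.split(",") if item.strip()]
    return series_id.strip(), timestamp, values


# Only the two amounts visible in the sample line are used here.
record = "T1: 1996-03-18 00-00-00 : 13.4070294784581, 14.7250566893424"
series_id, start, withdrawals = parse_record(record)
```

The forecasting task is then to guess the value that would come next at the end of `withdrawals`.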

The authors say LLMTime is not only able to generate plausible completions of real and synthetic time series, but also achieves higher likelihoods in zero-shot evaluation than dedicated time series models that have been built up over decades.

The LLMTime program finds where in a distribution a number falls, a distinct pattern of recurring numbers, to conclude whether the sequence represents one of the common patterns such as “exponential” or “Gaussian.”

However, Gruver and team point out that one of the limitations of large language models is that they can only take in so much data at a time, an amount known as the “context window.” To handle longer and longer time series, the programs will need that context window expanded to many more tokens. That is a project being explored by multiple parties, such as the Hyena team at Stanford University and Canada’s MILA institute for AI, as well as Microsoft, among others.
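One simple way to live within a fixed context window is to keep only the most recent part of a series. The sketch below is an illustration of that idea, not something taken from LLMTime: `truncate_to_budget` is a hypothetical helper, the input is assumed to be a string of space-separated digits with “ , ” between time steps, and counting one token per digit and per separator is only an approximation of how a real tokenizer behaves.

```python
def truncate_to_budget(encoded, max_tokens):
    """Keep the most recent time steps whose symbols fit within max_tokens.

    Assumption for this sketch: each digit and each ',' separator costs
    roughly one token, which real tokenizers only approximate.
    """
    steps = encoded.split(" , ")
    kept = []
    used = 0
    # Walk backward from the newest step so the latest history survives.
    for step in reversed(steps):
        cost = len(step.split()) + (1 if kept else 0)  # digits + separator
        if used + cost > max_tokens:
            break
        kept.append(step)
        used += cost
    return " , ".join(reversed(kept))


history = "1 2 , 1 2 3 , 1 2 3 0"
recent = truncate_to_budget(history, max_tokens=8)
```

Dropping the oldest steps is the crudest option; a longer context window would let the model see the whole history instead.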

The obvious question is why a large language model should be good at predicting numbers. As the authors note, for any sequence of numbers such as ATM withdrawals, “there are arbitrarily many generation rules that correspond to the input.” Translation: There are so many reasons why those particular strings of numbers might appear that it would be hard to guess what the underlying rule is for them.

The answer is that GPT-3 and others like it find the rules that are the simplest among all possible rules. Gruver and team write, referring to the principle of parsimony, “LLMs can make predictions effectively because they prefer the completions derived from simple rules, adopting a form of Occam’s razor.”

Sometimes the GPT-4 program goes astray when it tries to explain what a time series pattern is, indicating that it doesn’t actually “understand” time series in the traditional sense.

That doesn’t mean GPT-3 actually understands what’s going on. In a second experiment, Gruver and team presented GPT-4 (the more powerful successor to GPT-3) with a new data set that they had created using a particular mathematical function. To answer the question, “Can GPT-4 explain its understanding of a given time series in text,” Gruver and team wrote, they asked GPT-4 to deduce the mathematical function that produced the time series.

They found that GPT-4 was able to deduce the mathematical function better than random chance, but it gave some explanations that were not correct. “The model sometimes makes wrong conclusions about the behavior of the data it observes, or the expected behavior of candidate functions.” In other words, even if a program like GPT-4 can perform well at predicting the next thing in a time series, its explanations can still amount to “hallucinations,” the tendency to give wrong answers.

Gruver and team are excited about how time series fit into a multi-modal future for generative AI. “Formulating time series forecasting as natural language generation can be seen as another step toward unifying more capabilities within a single large and powerful model, in which understanding can be shared across many tasks and modalities,” they write in their concluding section.

The code for LLMTime is posted on GitHub.

