Insight

In the beginning, we took our inspiration from stock price prediction. Studies have shown that, given enough historical data for a stock, a sufficiently complex model such as a neural network can be trained to predict its price with reasonable accuracy, even though the price fluctuates heavily over time. However, when we looked at the historical price data of Steam games, we found some fundamental differences between stock prices and Steam prices. First, a stock’s price is influenced by the total selling volume in the market. In contrast, a Steam game’s price is set by its publisher, which means the historical data may not tell us whether the price will go up or down. Second, a Steam price history curve is very sharp, quite unlike a stock curve. We therefore realised that predicting the exact price of a Steam game is not practical.

But do we really need to predict the exact price of every game on every day in order to decide whether to buy a game? All we need to know is whether a game is going to be discounted soon. Then we can decide whether to buy the game right away and enjoy it, or wait a few days for a discount. In other words, we should focus on training a model that captures the marketing mechanism behind discounts, so that we can predict when a game will be discounted.

Digging into Steam data

Next, we investigated Steam games. We crawled the historical price data of over 1300 games from SteamDB. Here is a histogram of the mean discount gap of each game. It clearly shows a roughly Gaussian distribution centered around 60 days. This gave us more confidence that discounts can be predicted from historical data. At this point, we turned the difficult regression problem into a relatively easier multi-class classification problem that still yields the information we want most.
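
As a rough sketch of how the discount gap statistics behind this histogram could be computed for a single game: we assume here that a “discount gap” is the number of days between the starts of two consecutive discounts, and that the crawled data has already been reduced to a list of discount start dates. The function name `discount_gap_stats` and the input format are illustrative assumptions, not the SteamDB format or the project’s actual code.

```python
from datetime import date, timedelta
from statistics import mean, pvariance


def discount_gap_stats(discount_starts):
    """Return (mean, variance) of the gaps in days between consecutive discounts.

    discount_starts: iterable of datetime.date, one per discount (assumed input).
    Returns None when there are fewer than two discounts, since no gap exists.
    """
    ordered = sorted(discount_starts)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return None
    # Population variance: the gaps are treated as the game's full history.
    return mean(gaps), pvariance(gaps)


# Hypothetical game with three discounts, 50 and 70 days apart.
first = date(2017, 1, 1)
example_starts = [first, first + timedelta(days=50), first + timedelta(days=120)]
example_stats = discount_gap_stats(example_starts)
```

Collecting the first element of this result over all crawled games would give the values plotted in a histogram like the one above.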

Problem Formulation

  • Input:
    1. Each time slot of a game is a separate instance
    2. Meta Features:
      • Publisher
      • Genres
    3. Static Features:
      • Discount Gap mean
      • Discount Gap variance
    4. Time-related Features:
      • Days since the most recent discount
      • Month
  • Output:
    • 1 : Buy Now
    • 0 : Caution
    • -1 : Don’t Buy
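
To make the formulation above concrete, here is a minimal sketch of how one instance (one game at one time slot) might be represented. The class name `Instance`, the helper `make_instance`, and the choice of a single day as the time slot are assumptions made for illustration. How the publisher and genres are encoded for the model (for example one-hot) is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Instance:
    # Meta features
    publisher: str
    genres: Tuple[str, ...]
    # Static features (computed once per game)
    gap_mean: float
    gap_variance: float
    # Time-related features (change with every time slot)
    days_since_last_discount: int
    month: int


def make_instance(publisher, genres, gap_mean, gap_variance, last_discount, day):
    """Build the instance for one game on one day (assumed slot size: one day)."""
    return Instance(
        publisher=publisher,
        genres=tuple(genres),
        gap_mean=gap_mean,
        gap_variance=gap_variance,
        days_since_last_discount=(day - last_discount).days,
        month=day.month,
    )


# The same hypothetical game on two different days yields two different instances.
slot_a = make_instance("ExamplePublisher", ["Action"], 60.0, 100.0,
                       date(2017, 1, 1), date(2017, 1, 31))
slot_b = make_instance("ExamplePublisher", ["Action"], 60.0, 100.0,
                       date(2017, 1, 1), date(2017, 2, 10))
```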

Ground truth labeling

The primary goal of this project is to help people make the right decision, so labeling our instances correctly is crucial to getting the predictor we want. In short, we need to tell the model which time slots of each game are “Buy Now”, “Caution”, or “Don’t Buy”. Here is how we label our instances.

  1. Label a time slot instance as “Buy Now” when its game was on discount.
  2. Should we label all the others as “Don’t Buy”? That would be too pessimistic. Recall that we don’t want to wait a whole month just for a game to be discounted. So we make an assumption here: most people won’t wait longer than 20 days for a game. As a result, we label as “Don’t Buy” only the time slots that will have a discount within 20 days.
  3. But what about the time slots between “Buy Now” and “Don’t Buy”? We introduce the CCC curve. The CCC curve is something like an exponentially decreasing function, created to fit how most people’s enthusiasm quickly fades over time. We can then set a reasonable threshold on the curve, extend the “Buy Now” labels accordingly, and label the rest “Caution”.
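
The three rules above can be sketched roughly as follows. This is only one possible reading: the exact form of the CCC curve is not given here, so the sketch assumes a plain exponential decay `exp(-decay_rate * t)` over the days `t` since the last discounted day, and extends “Buy Now” to the days where that value stays above a threshold. The names `label_days`, `decay_rate`, and `threshold`, and their default values, are hypothetical.

```python
import math

BUY_NOW, CAUTION, DONT_BUY = 1, 0, -1


def label_days(discounted, wait_limit=20, decay_rate=0.1, threshold=0.5):
    """Label every day of one game's history.

    discounted: list of bools, True when the game was on discount that day.
    """
    n = len(discounted)

    # Days until the next discounted day, or None if no discount follows.
    days_to_next = [None] * n
    next_sale = None
    for i in range(n - 1, -1, -1):
        if discounted[i]:
            next_sale = i
        if next_sale is not None:
            days_to_next[i] = next_sale - i

    labels = [CAUTION] * n
    last_sale = None
    for i in range(n):
        if discounted[i]:
            # Rule 1: a discounted day is "Buy Now".
            labels[i] = BUY_NOW
            last_sale = i
        elif days_to_next[i] is not None and days_to_next[i] <= wait_limit:
            # Rule 2: a discount is coming within the wait limit, so wait.
            labels[i] = DONT_BUY
        elif last_sale is not None:
            # Rule 3 (assumed form): exponential decay since the last discount.
            weight = math.exp(-decay_rate * (i - last_sale))
            if weight >= threshold:
                labels[i] = BUY_NOW
    return labels


# Hypothetical history: a discount on day 0 and the next one on day 41.
example_labels = label_days([True] + [False] * 40 + [True])
```

With these assumed defaults, the days shortly after a discount keep the “Buy Now” label, the middle of the gap becomes “Caution”, and the last 20 days before the next discount become “Don’t Buy”.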

We think this approach fits our goal perfectly.