But I know nothing about what happens in each city (or "square area"); in other words, is the city a famous tourist location, a center of business life, etc.? This data needs to be manually inserted for each country, which is a big task and will eventually require community help.
I don't know how you are mining the population data for the squares (Google?), but I am glad you have that part figured out.
As far as figuring out whether the square contains important business or tourist centers, doing this manually will be very time consuming and subjective. A good proxy for whether the area has any kind of attractions is the hotel capacity. While that (number of hotel rooms per square) may be hard to figure out, is it possible to figure out just the number of hotels, through some Google data mining? That would be a good starting point, and the raw number of hotel establishments can be scaled by the population density of the square. A hotel in Manhattan may have 100+ rooms on average, while a hotel in a rural area may have 20 rooms on average...
This step could be sufficient for 99 out of 100 squares, and manual adjustment would be necessary on just ~1 out of 100 squares...
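To make the hotel proxy concrete, here is a rough sketch of the scaling I have in mind. Only the 20-room and 100-room averages come from my examples above; the density thresholds (10 and 25,000 people per square km), the linear interpolation and the function name are placeholder assumptions to be tuned.

```python
def estimated_rooms(hotel_count, density,
                    low=(10.0, 20.0), high=(25000.0, 100.0)):
    """Estimate hotel rooms in a square from its hotel count.

    `low` and `high` are (population density, average rooms per hotel)
    anchor points. The room averages are the rural / Manhattan guesses
    from the post; the density values are hypothetical placeholders.
    """
    (d0, r0), (d1, r1) = low, high
    # Clamp density so very empty or very dense squares use the anchors.
    d = min(max(density, d0), d1)
    # Linear interpolation between the two anchors (an assumption).
    rooms_per_hotel = r0 + (r1 - r0) * (d - d0) / (d1 - d0)
    return hotel_count * rooms_per_hotel


rural = estimated_rooms(10, density=5)        # 10 hotels at 20 rooms each
urban = estimated_rooms(10, density=30000)    # 10 hotels at 100 rooms each
```

The same 10 hotels count for five times as much capacity in the dense square, which is the whole point of scaling the raw hotel count.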
But for now I will continue to make small-scale experiments on how to proceed further. But the first step towards this feature (= base data) is almost done, the next step (= detailed data) and further steps (= forming the demand & airports accessing them) are much more complicated than this mere data collection.
That is a complex problem, but it is just a question of developing algorithms and fine tuning them. The algorithms may be somewhat computationally intensive...
The way I think it should work is as follows:
For each AB pair of squares:
1 - we know the AB demand from the population / business / tourist data
2 - put all this demand into 4 buckets: AB-Y bucket with certain number of passengers, AB-C, AB-F and AB-cargo buckets.
3 - gather the list of flights (including connecting flights - up to 2 stops) that can take passengers from square A to square B
4 - rate the flights based on all of the current desirability factors. Plus add new factors such as number of stops, distance from square to the airport
5 - allocate the passengers to flights. Do this by splitting AB-Y bucket to AB-Y-flight_sequence_1 bucket, AB-Y-flight_sequence_2 bucket etc. based on overall desirability. This may result in overbooking flights, which we will deal with later.
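Step 5 could look something like the sketch below. It assumes each flight sequence already has a single positive desirability score from step 4 and that passengers are split in proportion to that score; the proportional rule, the rounding method and all names are my assumptions, not how the game currently works.

```python
def split_bucket(pax, scores):
    """Split `pax` passengers across flight sequences by desirability.

    `scores` maps a flight-sequence id to a positive score (hypothetical;
    the real rating would combine price, stops, distance to airport...).
    Largest-remainder rounding keeps the parts summing to `pax`.
    """
    total = sum(scores.values())
    if total <= 0:
        return {}  # no usable flight sequences, nothing is allocated
    exact = {seq: pax * s / total for seq, s in scores.items()}
    alloc = {seq: int(x) for seq, x in exact.items()}  # round down first
    leftover = pax - sum(alloc.values())
    # Give the leftover passengers to the largest fractional parts.
    by_fraction = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for seq in by_fraction[:leftover]:
        alloc[seq] += 1
    return alloc


# The AB-Y bucket with 90 pax and three candidate flight sequences.
ab_y = split_bucket(90, {"direct": 5.0, "one_stop": 3.0, "two_stop": 1.0})
```

Here the 90 pax end up as 50 / 30 / 10 across the three sequences. Nothing checks seat capacity at this point, so overbooking is still possible, as noted in step 5.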
Next step is we look at each flight.
- each flight will have a certain number of buckets associated with it. Add up all the Y seats in those buckets and compare with the plane capacity. If demand is, say, 2x the number of available seats, take the same share from every bucket (here 1/2 of each) as the passengers who get on the flight, and return the rest to their original buckets.
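A minimal sketch of that capacity check for one flight and one class. Rounding down is my assumption (it can leave a seat or two empty), and the bucket names are just illustrative.

```python
def prorate_flight(capacity, bookings):
    """Cut bookings on one flight down to its capacity.

    `bookings` maps a bucket id to the pax that bucket sent to this flight.
    Returns (boarded, returned): pax who get seats, and pax who go back
    to their original bucket. Every bucket is cut by the same ratio.
    """
    demand = sum(bookings.values())
    if demand <= capacity:
        return dict(bookings), {b: 0 for b in bookings}
    # Integer math: each bucket keeps floor(n * capacity / demand) pax.
    boarded = {b: n * capacity // demand for b, n in bookings.items()}
    returned = {b: n - boarded[b] for b, n in bookings.items()}
    return boarded, returned


# 300 pax want 150 Y seats, so half of each bucket boards.
boarded, returned = prorate_flight(150, {"AB-Y": 200, "CB-Y": 100})
```

With demand at 2x capacity, AB-Y boards 100 and CB-Y boards 50, and the same numbers are returned to the AB and CB buckets for the next pass.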
So this would be iterative; it may need 2, 3 or more iterations. The modified algorithm would be:
1 - we know the remaining AB demand (from the population / business / tourist data) that is left to be allocated from the previous iteration
2 - we will already have them in the buckets, some returned to these buckets because of lack of capacity
3 - gather the list of flights (including connecting flights - up to 2 stops) that can take passengers from square A to square B and that still have capacity left
4 - rate them as above
5 - allocate
Then again, we will look at the flights, their capacity, and the passengers that don't fit will be returned to their original AB buckets.
Each iteration will take less time, since
- there will be fewer and fewer AB square pairs with passengers that have not been served in previous iterations
- there will be fewer and fewer flights with capacity left
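Putting the passes together, the loop could be sketched as below. This is a simplified illustration for one class only: each "flight" stands in for a whole flight sequence with one seat count, the scores are given rather than computed, and the data layout and function name are my assumptions.

```python
def allocate(demand, options, capacity, max_rounds=10):
    """Iteratively assign passengers to flights until nothing moves.

    demand:   {pair: pax waiting}, pair = (origin square, destination square)
    options:  {pair: {flight: desirability score}}
    capacity: {flight: seats}
    Returns (carried, unserved).
    """
    remaining, seats, carried = dict(demand), dict(capacity), {}
    for _ in range(max_rounds):
        requests = {}  # flight -> {pair: pax asking for a seat}
        for pair, pax in remaining.items():
            # Only flights that still have capacity left are considered.
            open_flights = {f: s for f, s in options.get(pair, {}).items()
                            if seats.get(f, 0) > 0}
            total = sum(open_flights.values())
            if pax <= 0 or total <= 0:
                continue
            # Split by score, rounding down; the best flight gets the rest.
            split = {f: int(pax * s / total) for f, s in open_flights.items()}
            best = max(open_flights, key=open_flights.get)
            split[best] += pax - sum(split.values())
            for f, n in split.items():
                if n > 0:
                    requests.setdefault(f, {})[pair] = n
        boarded_this_round = 0
        for f, asks in requests.items():
            asked, free = sum(asks.values()), seats[f]
            for pair, n in asks.items():
                # Overbooked flights cut every bucket by the same ratio.
                got = n if asked <= free else n * free // asked
                if got:
                    carried[(pair, f)] = carried.get((pair, f), 0) + got
                    remaining[pair] -= got
                    seats[f] -= got
                    boarded_this_round += got
        if boarded_this_round == 0:
            break  # no seats left or no demand left
    unserved = {p: n for p, n in remaining.items() if n > 0}
    return carried, unserved


AB, CB = ("A", "B"), ("C", "B")
carried, unserved = allocate(
    demand={AB: 100, CB: 60},
    options={AB: {"F1": 3, "F2": 1}, CB: {"F1": 1}},
    capacity={"F1": 90, "F2": 100},
)
```

In this example F1 is overbooked in the first pass (135 pax for 90 seats), so 25 AB pax are sent back and take F2 in the second pass, while 20 CB pax stay unserved because their only flight is full. The second pass only touches one pair and one flight, which is why later iterations get cheaper.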
In case you take up this "bucket" concept, it can be used to refine the price-based demand to make it a lot more elastic. Instead of fixed YCF categories, the demand between any AB squares can be split into more price-based buckets. For example, now, we may have
90 pax Y demand with default price of $100
9 pax C demand with default price of $300
1 pax F demand with default price of $600
Total 100 pax. Suppose you split it into more buckets that are more gradual. We can model a scenario where we have, let's say, 1 bucket at the default price of $100 and another 6 buckets above that price, going up to $600+. The sum of pax would remain 100, but the passengers would just take the seating based on the prices and what they are willing to pay. Let's say:
$100 - 50 pax
$120 - 25 pax
$150 - 13 pax
$200 - 5 pax
$300 - 4 pax
$450 - 2 pax
$600 - 1 pax
Total 100 pax
We can add additional price buckets below the default price, each bringing in extra passengers:
$90 - 10 pax
$80 - 15 pax
$70 - 25 pax
$50 - 50 pax
So if an airline were to price their Y seats at $70, $30 below the default price, the demand would go up by 50% (10 + 15 + 25 extra pax on top of the original 100), and at a price of $50, it would double.
This way, the demand elasticity would be closer to the real world, and cutting prices would not just take passengers from the other guy, but more passengers would become available...
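The numbers above can be checked with a few lines. The sketch assumes each bucket's price is the most those passengers are willing to pay, so a passenger buys whenever the ticket price is at or below their bucket's price; the names are illustrative.

```python
# (maximum price a passenger will pay, number of passengers),
# taken from the two bucket lists in this post.
PRICE_BUCKETS = [
    (600, 1), (450, 2), (300, 4), (200, 5), (150, 13), (120, 25), (100, 50),
    (90, 10), (80, 15), (70, 25), (50, 50),
]


def demand_at(price, buckets=PRICE_BUCKETS):
    """Passengers willing to fly at `price`: every bucket at or above it."""
    return sum(pax for limit, pax in buckets if limit >= price)


at_default = demand_at(100)   # the original 100 pax
at_70 = demand_at(70)         # 50% more
at_50 = demand_at(50)         # double
at_120 = demand_at(120)       # raising the price loses the $100 bucket
```

So the same table also works in the other direction: pricing at $120 keeps only the 50 pax willing to pay that much or more.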