I think there are some fundamental questions that need to be answered before you can even attempt to write the algorithm that distributes pax demand. Take 3 airports: Los Angeles, Chicago, New York. Assume one airline flies NYC-LAX and charges $200. Another airline flies NYC-CHI and CHI-LAX and charges $150 for each flight. If a pax goes NYC-CHI-LAX, how much is the ticket? $200 or $300? If you force the airline to sell it for $200, it is missing out on $100 in revenue that it could have earned by not allowing connecting flights. And if the airline is getting $200, it shouldn't be, because the direct flight is $200; it should be charging something less than $200 for the inconvenience of connecting.
Let's say the default ticket price (to keep things very simple) is $100 to take off + $1 per nautical mile, with distances as follows:
JFK-ORD = 600nm, $700
ORD-LAX = 1600nm, $1700
JFK-LAX = 2000nm, $2100
Then the full JFK-ORD-LAX routing would be 2200 nm and $2300 (one take-off fee applied to the combined distance).
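To make the arithmetic above concrete, here is a minimal sketch of that default fare formula. The function name and parameter names are my own, not anything from the game; the numbers are the ones in this post.

```python
def default_price(distance_nm, takeoff_fee=100.0, per_mile=1.0):
    """Default fare: flat take-off fee plus a per-mile charge (figures from the post)."""
    return takeoff_fee + per_mile * distance_nm

# Distances quoted in the post, in nautical miles.
legs = {"JFK-ORD": 600, "ORD-LAX": 1600, "JFK-LAX": 2000}
prices = {route: default_price(d) for route, d in legs.items()}

# Pricing the connection as one ticket over the combined distance...
through_price = default_price(legs["JFK-ORD"] + legs["ORD-LAX"])
# ...versus simply adding up the two separately priced legs.
sum_of_legs = prices["JFK-ORD"] + prices["ORD-LAX"]
```

Note that the two approaches differ by one take-off fee: the single through ticket comes to $2300, while adding the two legs gives $2400. Which of the two the system should use is exactly the open question.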
I think what can come into play is pricing. On dense, competitive routes, a player can choose to price below the default (as a formula derived from the default) in order to drive more traffic into its hub, making the connecting route more viable pricewise, even accounting for transfer inconvenience.
Say the player sets the price of both ORD-JFK and ORD-LAX at 80% of default. The total comes to $560 + $1360 = $1920, which is less than the $2300 default price for the connection and also less than the $2100 default for the direct flight. Some price-conscious customers could choose to go this route, assuming everybody on the direct flight is charging the default (or there is no capacity left on the direct flights).
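A rough sketch of that comparison, assuming the player's price is expressed as a multiplier on the default fare. The multiplier idea is from this post; the function names are hypothetical.

```python
def leg_price(distance_nm, multiplier=1.0):
    # Default fare from the thread ($100 + $1 per nm), scaled by a player-chosen multiplier.
    return multiplier * (100 + distance_nm)

def connecting_price(leg_distances_nm, multiplier=1.0):
    # A connecting ticket priced as the sum of its individually priced legs.
    return sum(leg_price(d, multiplier) for d in leg_distances_nm)

direct_default = leg_price(2000)                         # JFK-LAX at default pricing
via_ord_discounted = connecting_price([600, 1600], 0.8)  # both legs at 80% of default
savings = direct_default - via_ord_discounted
```

With these numbers the discounted connection undercuts the default direct fare by roughly $180, which is the margin a price-conscious pax would be weighing against the transfer.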
But the majority of people connecting would be doing so because there is no direct flight. No direct flights connect the vast majority of the demand squares in the world. If one lives in, say, Albany, NY, there may only be 10-15 destinations you can fly to (I am guessing), and for everywhere else you have to connect. If a person wants to go from Albany, NY to Split, Croatia, it will take 2 connections (not sure if it is viable to model more than 1 connection in the system).
I don't think you can set up a pax demand algorithm without first setting up business plans, the simplest form being 3 options: traditional hub/spoke with connecting pax (like Delta); focus city airlines that use multiple legs under a single flight number, so they would have 3 routes with 3 prices, being LAX-CHI, CHI-NYC, and LAX-CHI-NYC (like Southwest); and strict point-to-point carriers where there are no connections and you pay per leg (like Allegiant).
If a user has to manually price all the combinations, then the system would be a complete fail, and there would not be any point in even building it, since the tedium of setting up pricing would take 95% of play time and the fun part of play would just not be there. Even the existing manual pricing of all AB flights to an absolute value is a fail as is, since nobody has the time to do it.
The pricing would have to be automatic, based on formulas (if the player chooses to tinker with the formulas), or default as the system currently has.
As far as connections taking place (passengers transferring flights), it would be completely automatic; the player would not have to do anything. So the player would not have to set up his airline any differently than he currently does. Passenger connections would offer new strategies (to take advantage of the feature) if the player chooses to pursue them, but it should pretty much all be invisible to the player, especially a new player.
Once you have these 3 business models in place, you can create a master database for routing with 3 columns: departure-arrival-connection. There should be limits on connections, such as the next flight must depart within 4 hours to be considered a connecting flight. Then it's just a matter of populating the database as flights are created. For the hub/spoke model, every permutation that fits within the 4-hour window would be added, and for simplicity a discount rate could be applied to each leg for connections. So if 2 legs are $100 each and there is a 20% discount (modifiable by the player) for connecting flights, you'd have a $160 ticket instead of a $200 one. For the focus city model, a player would have to explicitly add LAX-CHI-NYC as a separate flight number and/or use ABCBA routing, where it would be implicit and you can assign price values for those 2-leg flights. For the point-to-point carrier, they'd simply have each route added.
I don't think the player strategy should drive the setup of the database. The database should be able to cover all situations regardless of player strategy.
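Here is a minimal sketch of how the hub/spoke rows could be populated, assuming each flight carries scheduled departure and arrival times. The `Flight` fields, the row layout and the flight numbers are hypothetical. The 4-hour window and the 20% discount are the figures from this post. Treating a zero-minute layover as valid is my simplification, since no minimum connect time was specified.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Flight:
    number: str
    origin: str
    dest: str
    departs: datetime
    arrives: datetime
    price: float

def build_connections(flights, max_layover=timedelta(hours=4), discount=0.20):
    """Return rows of (departure, arrival, connection, first_no, second_no, price)."""
    rows = []
    for first in flights:
        for second in flights:
            # Second leg must start where the first ends and must not fly straight back.
            if first.dest != second.origin or second.dest == first.origin:
                continue
            layover = second.departs - first.arrives
            if timedelta(0) <= layover <= max_layover:
                price = (first.price + second.price) * (1 - discount)
                rows.append((first.origin, second.dest, first.dest,
                             first.number, second.number, price))
    return rows

t = datetime(2024, 1, 1, 8, 0)
flights = [
    Flight("XY100", "LAX", "CHI", t, t + timedelta(hours=4), 100.0),
    Flight("XY200", "CHI", "NYC", t + timedelta(hours=6), t + timedelta(hours=8), 100.0),
    Flight("XY201", "CHI", "NYC", t + timedelta(hours=9), t + timedelta(hours=11), 100.0),
]
rows = build_connections(flights)  # only XY100 -> XY200 fits the 4-hour window
```

The nested loop is fine for a sketch; a real table would presumably be indexed by airport so only flights leaving the connection point are scanned.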
All there is to figure out is how demand from airport A to B is allocated. The system (I am assuming) will have a table of all AB airport pairs.
Now, suppose there is a demand of 100 pax from NYC squares to SPU (Split, Croatia). The system will have a table that will have 3 choices:
JFK-SPU
EWR-SPU
LGA-SPU
Based on some magic (examining supply and its attractiveness), it allocates:
JFK-SPU: 70 pax
EWR-SPU: 25 pax
LGA-SPU: 5 pax
(subject to periodic re-allocations)
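The "magic" is left open here, so purely as an illustration: one simple option is to split the demand in proportion to some attractiveness score per airport and round so the total still adds up. The scores below are made up to reproduce the 70/25/5 split, and the largest-remainder rounding is my own choice, not something from the thread.

```python
def allocate(total_pax, scores):
    """Split total_pax across airports in proportion to scores (whole pax only)."""
    total_score = sum(scores.values())
    if total_score == 0:
        return {airport: 0 for airport in scores}
    exact = {a: total_pax * s / total_score for a, s in scores.items()}
    alloc = {a: int(v) for a, v in exact.items()}
    # Hand out pax lost to rounding, largest fractional remainder first.
    leftover = total_pax - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda a: exact[a] - alloc[a], reverse=True)
    for airport in by_remainder[:leftover]:
        alloc[airport] += 1
    return alloc

# Hypothetical attractiveness scores for the NYC airports on the SPU market.
split = allocate(100, {"JFK": 14, "EWR": 5, "LGA": 1})
```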
Let's go ahead with only the JFK-SPU demand of 70 pax, which is now pinned to these 2 airports.
Nobody flies direct (and even if somebody did, alternatives would have to be considered). To allocate this demand, the system will have to look up viable combinations, ranked from best to worst (based on a number of variables). Let's say the following connecting flights exist (are being flown):
JFK-LHR-SPU (10x JFK-LHR, 1x LHR-SPU, resulting in 10 combinations)
JFK-FRA-SPU (4x JFK-FRA, 2x FRA-SPU, resulting in 8 combinations)
JFK-CDG-SPU
JFK-MUC-SPU
JFK-VIE-SPU
and maybe tens more.
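The combination counts in the list are just the product of the flight frequencies on each leg. A small sketch, with made-up flight labels, ignoring for the moment whether the timings actually connect:

```python
from itertools import product

def leg_combinations(first_legs, second_legs):
    """Every pairing of a first-leg flight with a second-leg flight."""
    return list(product(first_legs, second_legs))

jfk_lhr = [f"JFK-LHR #{i}" for i in range(1, 11)]  # 10 daily flights
lhr_spu = ["LHR-SPU #1"]                           # 1 daily flight
jfk_fra = [f"JFK-FRA #{i}" for i in range(1, 5)]   # 4 daily flights
fra_spu = [f"FRA-SPU #{i}" for i in range(1, 3)]   # 2 daily flights

via_lhr = leg_combinations(jfk_lhr, lhr_spu)  # 10 candidate pairs
via_fra = leg_combinations(jfk_fra, fra_spu)  # 8 candidate pairs
```

In practice many of these pairs would drop out once a layover window is applied, so these counts are an upper bound.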
The system can rank them and store them in a table. So for the JFK-SPU pair there would be several flight combination candidates in this table. As for how many of the possibly endless combinations the system stores, it can limit them based on the pax demand between these airports.
So for example, for 70 pax demand between JFK-SPU, the system would store the top 10 combinations.
For 25 pax demand between EWR-SPU, the system would store 4 combinations.
For 5 pax demand between LGA-SPU, the system would store 2 combinations.
Or whatever works. The system would have to re-evaluate the candidates periodically.
Then, back to JFK-SPU, it is just a matter of allocating the pax between the 10 connecting flight possibilities. The same table that keeps the list of candidate connecting flights can also keep a tally of how successful they were on the pax allocation. Let's say there is a top flight combo:
JFK-FRA, Airline XY, 8pm flight, flight #XY001 connecting to
FRA-SPU Airline XY, 12noon flight XY301
This combination was allocated and flew 30 pax. The row for the combination XY001, XY301 (related to airport pair JFK-SPU) would store this figure of 30 pax. Or it could store a running average over the past 7 days, using the formula:
(old value * 6 + 30 pax) / 7
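As a sketch, that update is a one-liner. The function and parameter names are hypothetical; the window of 7 is from the formula above.

```python
def update_running_average(old_value, todays_pax, window_days=7):
    """Blend today's pax into the stored figure: (old * (n - 1) + today) / n."""
    return (old_value * (window_days - 1) + todays_pax) / window_days

# A combo that has been averaging 14 pax and flies 28 today moves up to 16.
new_value = update_running_average(14, 28)
```

Strictly speaking this is an exponentially weighted average rather than a true 7-day mean, since older days never drop out completely. The upside is that only one number per row needs to be stored.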
So we would have 10 records of these with various allocations. In order to keep the flight combinations fresh, the system can keep the top 70% of these 10 (=7) and replace the bottom 30% (=3), and do so periodically.
This can also be used to re-allocate the demand between JFK, EWR and LGA, based on successful completions. From the above table, we would know that the 10 flight combinations connecting JFK-SPU completed a total of 60 pax (of the available 70).
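One possible shape for that periodic refresh, assuming the table is a mapping from flight combination to its running pax figure and that a ranked list of fresh candidates is available from somewhere. Both of those, and starting newcomers at zero, are my assumptions.

```python
def refresh_candidates(current, fresh, keep_percent=70):
    """Keep the best-performing combos and fill the freed slots with new candidates.

    current: dict mapping combo -> running pax figure
    fresh:   list of new candidate combos, best first
    """
    keep_n = (len(current) * keep_percent) // 100
    ranked = sorted(current, key=current.get, reverse=True)
    kept = {combo: current[combo] for combo in ranked[:keep_n]}
    slots = len(current) - keep_n
    for combo in fresh:
        if slots == 0:
            break
        if combo not in kept:
            kept[combo] = 0.0  # newcomers start with no history (assumption)
            slots -= 1
    return kept

# Ten stored combos with running figures 10, 9, ..., 1 and four fresh candidates.
current = {f"combo{i}": float(11 - i) for i in range(1, 11)}
refreshed = refresh_candidates(current, ["new1", "new2", "new3", "new4"])
```

Since newcomers start at zero, they would need some grace period before the next cull, or they would be the first to be replaced again.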
Sum of EWR-SPU was 25 (of available 25)
Sum of LGA-SPU was 0 (of available 5)
Based on this, the system would reallocate NYC pax demand, shifting demand away from LGA-SPU, with the benefit going mostly to EWR-SPU.
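A hedged sketch of one way such a reallocation could work: each airport releases part of its unserved demand into a pool, and the pool is handed back in proportion to how well each airport filled what it was given. The `shift_rate` damping knob and the fill-ratio weighting are my assumptions, not something specified above.

```python
def reallocate(available, completed, shift_rate=0.5):
    """Shift part of each airport's unserved demand toward airports that filled theirs."""
    fill = {a: (completed[a] / available[a] if available[a] else 0.0)
            for a in available}
    released = {a: shift_rate * (available[a] - completed[a]) for a in available}
    pool = sum(released.values())
    total_fill = sum(fill.values())
    new_alloc = {}
    for a in available:
        # If nothing was completed anywhere, give the released demand straight back.
        gained = pool * fill[a] / total_fill if total_fill else released[a]
        new_alloc[a] = available[a] - released[a] + gained
    return new_alloc

new_alloc = reallocate(
    available={"JFK": 70, "EWR": 25, "LGA": 5},
    completed={"JFK": 60, "EWR": 25, "LGA": 0},
)
```

With the figures from this example the total stays at 100, LGA loses demand, EWR gains the most, and JFK ends slightly lower because it left 10 pax unserved.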
Once you have the database set up, it's just a matter of pulling the flights where departure= and arrival= match the specified values. You are already using a haversine equation for distance, so you just need to modify it to get the angle between the two legs at the connection point (between 0 and 180) versus the angle of a direct flight (180). You can then use that value logarithmically to start penalizing connecting flights with extreme connections. For example, flying NYC to LAX is a 180 degree flight. Flying NYC to LAX via CHI has maybe a 150 degree angle. Flying NYC to LAX via Moscow, Russia is going to have an extreme angle of maybe 20 degrees. The traffic would be distributed across these flights, but the Moscow flight would be penalized exponentially compared to the Chicago flight. Additionally, since these numbers would remain static, you could add them to the database so they don't have to be calculated each time. Then it would just be a matter of calculating each flight's pax ratio/value against a sum() of the matching rows.
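A rough sketch of the angle idea, assuming airports are stored as (latitude, longitude) in degrees. The angle at the connection point comes from the spherical law of cosines applied to the three haversine distances. The exponential weight and its `steepness` constant are placeholders of mine, and the coordinates are approximate.

```python
import math

def angular_distance(p, q):
    """Haversine central angle (radians) between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h)))

def connection_angle(origin, hub, dest):
    """Angle in degrees at the hub between the legs back to origin and on to dest."""
    a = angular_distance(hub, dest)
    b = angular_distance(hub, origin)
    c = angular_distance(origin, dest)
    denom = math.sin(a) * math.sin(b)
    if denom == 0:
        return 180.0  # hub coincides with an endpoint; treat as no detour (assumption)
    cos_angle = (math.cos(c) - math.cos(a) * math.cos(b)) / denom
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def detour_weight(angle_deg, steepness=5.0):
    """1.0 for a straight-through connection, falling off exponentially with the detour."""
    return math.exp(-steepness * (180.0 - angle_deg) / 180.0)

# Approximate airport coordinates (lat, lon).
JFK, ORD, LAX, SVO = (40.64, -73.78), (41.97, -87.91), (33.94, -118.41), (55.97, 37.41)
angles = {"ORD": connection_angle(JFK, ORD, LAX), "SVO": connection_angle(JFK, SVO, LAX)}
weights = {hub: detour_weight(angle) for hub, angle in angles.items()}
shares = {hub: w / sum(weights.values()) for hub, w in weights.items()}
```

Because the angle depends only on the three airports, it can be computed once per departure-arrival-connection row and stored, as suggested above. The `shares` line is the "ratio against a sum() of the matching rows" step.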
I see what you are trying to say here, and it seems like a good way to select candidate flight combinations to be considered. And my recent flight (using frequent flyer miles, on the cheap) would never make it into your candidate database, based on the angles. Yet, United Airlines (Star Alliance) offered this to me as the cheapest flight (using fewest miles) during peak travel week:
LGA-TOR-SXM
SXM-PTY-JFK
