Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/25354
Title: | An Activity-Based Scheduling Framework Incorporating Reinforcement Learning
Authors: | VANHULSEL, Marlies
Advisors: | WETS, Geert
Issue Date: | 2010
Abstract: | While making decisions, governments and policy makers wish to be supported by models in order to estimate the impact of their decisions on society as a whole. Transportation models are a major example of such decision-support models, as they are applied to monitor travel behaviour, to evaluate policy decisions, to assess the environmental influence of traffic, and so on.

Traditionally, transportation modelling concentrated on trip-based modelling, but recently activity-based modelling has been gaining importance. This type of modelling assumes that travel patterns are the result of the activity schedules that individuals execute in their attempt to achieve certain goals, taking into account individual needs, preferences, opportunities and constraints. Consequently, activity-based models aim to simulate individual decision-making behaviour, considering the distinct activity-travel dimensions simultaneously. As such, an activity-based model predicts for each individual which activities to perform, at which locations, when to start these activities and for how long, and which transport modes are used to reach the desired locations. The resulting activity-travel sequences constitute the basis for assigning individual routes to the transportation network, and thus for estimating aggregate travel demand (Ettema & Timmermans, 1997a; Timmermans, 2000).

As a result, activity-based transportation models offer the opportunity to predict travel demand more accurately, as they provide a more profound insight into individual activity-travel behaviour. Furthermore, these models are capable of estimating more realistically the impact of a policy on transportation-related issues, for instance traffic safety, environmental pollution and land-use patterns. Yet the majority of such models are still quite static: they disregard the interaction between individual agents, as well as the effect of unforeseen events which occur in the course of the execution of the activity schedule (e.g. unexpected travel times) (Arentze et al., 2005).

To this end, the present research contributes to the state of the art of activity-based travel-demand modelling by presenting a framework to simulate activity-travel sequences that takes these requirements into account. For this purpose, the entire prediction process, from pre-processing the data up to analysing the activity-travel sequences generated by the core scheduling engine, is designed and tested in this dissertation.

To start with, the suitability of reinforcement learning to generate activity-travel patterns based on observed activity-travel diary data is explored. The traditional reinforcement learning technique is not capable of learning efficiently in large state and action spaces with respect to memory and computational requirements, nor of generalizing when not all state-action pairs are visited frequently. The Q-learning technique used in most applications is therefore enhanced by implementing an incremental regression tree function approximator. Furthermore, to incorporate the impact of interactions between the different aspects of activity-travel decisions (Gärling et al., 1997; Joh et al., 2002), multi-actor reinforcement learning is introduced. A minimal sketch of such an approximated Q-learning step is given below.
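As an illustration only (this sketch is not taken from the dissertation), a Q-learning backup can be paired with a pluggable function approximator. The `BucketApproximator` below is a hypothetical stand-in for the incremental regression tree named in the abstract: it merely discretizes the state at a fixed resolution, whereas a regression tree would grow its partitions incrementally as training targets arrive. The point in either case is that Q-values are generalized across similar state-action pairs rather than stored per pair, which is what makes large state spaces tractable.

```python
from collections import defaultdict

class BucketApproximator:
    """Hypothetical stand-in for an incremental regression tree: generalizes
    Q-values by grouping (state, action) pairs into coarse buckets and
    averaging the training targets seen in each bucket."""

    def __init__(self, resolution=10):
        self.resolution = resolution
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def _bucket(self, state, action):
        # A regression tree would grow these partitions incrementally;
        # here the state is simply discretized at a fixed resolution.
        return (round(state * self.resolution), action)

    def predict(self, state, action):
        key = self._bucket(state, action)
        return self.sums[key] / self.counts[key] if self.counts[key] else 0.0

    def update(self, state, action, target):
        key = self._bucket(state, action)
        self.sums[key] += target
        self.counts[key] += 1


def q_learning_step(q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9):
    """One Q-learning backup: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q.predict(next_state, a) for a in actions)
    target = q.predict(state, action) + alpha * (
        reward + gamma * best_next - q.predict(state, action))
    q.update(state, action, target)


# Toy usage (illustrative values): an agent weighing whether to extend an activity.
q = BucketApproximator()
q_learning_step(q, state=0.25, action="extend", reward=1.0,
                next_state=0.30, actions=["extend", "stop"])
print(q.predict(0.25, "extend"))
```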
Next, the data feeding the algorithm are examined. These data consist of observed activity-travel diaries on the one hand, and corresponding socio-demographic data on the other. Because people differ in their needs and preferences, while facing different opportunities and constraints, the observed activity-travel diaries display a large variation. From this perspective, the predictive power of the scheduling algorithm can be increased by splitting the observed activity-travel sequences, which serve as input to the algorithm, into a number of clusters displaying similar activity-travel behaviour. For that reason, the current research presents a technique, founded on the work of Wilson (2008), in which a multidimensional extension of the well-known sequence alignment method (Wilson, 1998a) is introduced. This technique is capable of estimating the dissimilarity between sequences, considering the activity type as well as the relative positions of the locations in a sequence with regard to one another and the distances travelled between these locations.

Based on the dissimilarities between the observed patterns calculated by means of this technique, the observed data are divided into a number of groups. Subsequently, these clusters are linked to the socio-demographic data by means of a classification tree, so as to formulate socio-demographic profiles matching the clusters. These profiles can then be used to partition the synthetic population according to its socio-demographic attributes. The core of such a dissimilarity measure is a sequence alignment distance, sketched below.
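To make the dissimilarity measure concrete, here is a minimal one-dimensional sequence alignment (edit) distance computed by dynamic programming. It covers only the activity-type dimension; the multidimensional method the dissertation builds on (Wilson, 1998a, 2008) additionally accounts for relative locations and travelled distances, and the costs used here are illustrative assumptions.

```python
def alignment_distance(seq_a, seq_b, sub_cost=1.0, indel_cost=1.0):
    """Dynamic-programming sequence alignment distance between two
    activity-type sequences: the minimal total cost of substitutions,
    insertions and deletions needed to turn seq_a into seq_b."""
    n, m = len(seq_a), len(seq_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost                     # delete all of seq_a's prefix
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost                     # insert all of seq_b's prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0.0 if seq_a[i - 1] == seq_b[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j - 1] + match,   # match / substitute
                          d[i - 1][j] + indel_cost,  # delete
                          d[i][j - 1] + indel_cost)  # insert
    return d[n][m]


# Dissimilarity between two one-day activity-type sequences.
print(alignment_distance(["home", "work", "shop", "home"],
                         ["home", "work", "home"]))  # -> 1.0 (one deletion)
```

Pairwise distances of this kind can then feed any standard clustering procedure to form the groups of similar activity-travel behaviour described above.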
Thereafter, the dissertation describes the actual core of the scheduling engine, which is composed of five decision modules, each of which incorporates a reinforcement learning system enhanced with a regression tree function approximator and connected to the subsequent module through multi-actor reinforcement learning. The first module determines the activity duration. In the second module the agent decides whether or not to execute the next fixed activity (in this case sleeping or working), while in the third module the agent chooses which activity to perform. The fourth module is responsible for selecting the distance band of the location where the agent wants to execute the selected activity. Finally, in the fifth module the agent fixes the travel mode to get to this location. The decisions in these modules are all guided by reward functions which are calibrated on the observed activity-travel sequences. The design of the reward functions incorporated in these modules, as well as the functioning of each module, is examined carefully. To speed up the learning process, one prototype agent is first trained for each cluster; the knowledge acquired by these prototype agents is then used to initialize the individual agents corresponding to the members of the synthetic population.

Finally, as the main goal of the current research is to formulate a framework to simulate activity-travel patterns within an activity-based travel-demand model, the performance of the presented scheduling engine is assessed, both with respect to predictive power and computational requirements. To begin with, the execution time of the algorithm is investigated, putting forward some adjustments to the program code to increase its efficiency. In addition, two conceptual improvements are advanced. Firstly, more prototype agents are included in the scheduling algorithm. These prototype agents no longer match the clusters of similar activity-travel patterns, but are linked to the (terminal) nodes of the classification tree attaching socio-demographic profiles to each of these clusters. Consequently, the membership degrees in the nodes, which are determined by the distribution of the observed sequences belonging to that node over these clusters, now serve as weighting factors in the reward functions.

The second suggestion to enable scaling up the algorithm concentrates on converting the ε-greedy action selection strategy into a softmax action selection strategy. Such a strategy assigns a probability of being selected to every action within the set of feasible actions, based on the experienced Q-values. As a result, the agent can recognize a set of near-optimal actions and their corresponding probabilities, rather than either selecting the highest-valued action (i.e. exploiting) or picking an action uniformly at random (i.e. exploring), as is the case for the ε-greedy strategy (a minimal sketch of this selection rule follows the abstract). Because both suggestions introduce more variability into the predicted activity-travel patterns, it is no longer required to attune the reinforcement learning to each individual agent in the synthetic population.

After implementing these modifications, the reinforcement learning framework proves able to train the prototype agents within a time span of one hour (for ten prototype agents) and to generate one-day activity-travel sequences at a time resolution of five minutes for a population of five million agents in approximately ten hours. This execution time can be reduced further by utilizing multiple parallel processors. Concerning the predictive performance of the scheduling algorithm, the resulting activity-travel patterns are compared with a set of observed test sequences by means of a multidimensional sequence alignment method and descriptive statistics on the simulated versus the observed activity types, durations and locations. The outcome of these analyses shows that the algorithm is particularly well suited to predicting activity-travel behaviour based on observed activity-travel diaries.
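The softmax (Boltzmann) selection rule mentioned in the abstract can be sketched as follows; the temperature parameter and the toy Q-values are illustrative assumptions, not values from the dissertation. A lower temperature concentrates probability mass on the highest-valued actions, so exploration can be tuned continuously instead of via the single exploit/explore coin flip of ε-greedy.

```python
import math
import random

def softmax_select(q_values, temperature=1.0):
    """Softmax (Boltzmann) action selection: every feasible action receives
    a selection probability proportional to exp(Q / temperature), instead of
    the all-or-nothing exploit/explore split of epsilon-greedy."""
    # Subtract the maximum Q-value before exponentiating, for numerical stability.
    m = max(q_values)
    prefs = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(prefs)
    probs = [p / total for p in prefs]
    action = random.choices(range(len(q_values)), weights=probs, k=1)[0]
    return action, probs


# Three feasible actions with learned (toy) Q-values.
action, probs = softmax_select([1.0, 0.5, 0.2], temperature=0.5)
print(action, [round(p, 2) for p in probs])
```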
Document URI: | http://hdl.handle.net/1942/25354
Category: | T1
Type: | Theses and Dissertations
Appears in Collections: | PhD theses; Research publications
Files in This Item:
File | Description | Size | Format
---|---|---|---
Vanhulsel Marlies (1).pdf | | 9.81 MB | Adobe PDF