An optimization approach to assessing the self-sustainability potential of food demand in the Midwestern United States

Conventional agriculture faces significant challenges as world population grows, food demand increases, and mobility becomes increasingly constrained. Reducing the distance food needs to travel is an important goal of sustainability and resiliency, particularly in the context of a variety of transportation challenges. In this study, we developed a linear programming optimization method to assess the potential of regions to meet dietary requirements with more localized and diversified agricultural systems. Emphasis is on minimizing the distance between population centers and available cropland, accounting for variations in yield among 40 of the most marketable food crops that can be grown in the Midwestern United States. We also derived two new metrics to guide strategic planning toward more localized systems: the “per capita cropland requirement” and the “regional self-sustainability index.” Overall, we conclude that the eight-state study region would require an average of 0.49 acres (0.2 ha) per consumer with an average absolute deviation of 0.09 acres (0.04 ha). The self-sustainability index is estimated at 9.3, which indicates that the region has 9.3 times the cropland needed to become self-sustaining. Targeted dietary recommendations could potentially be met within a population-weighted average distance of 13.6 miles (21.9 km).


Introduction
Commodity farming evolved out of the mass-production era, when the cost of overcoming distance was small compared to the labor savings generated by highly capitalized, single-purpose equipment. The digital era, however, is shifting economic direction toward technologies that are smaller, more adaptable, and more decentralized in nature. Consider, for example, how wireless devices have replaced phone booths over the last three decades. Farm-to-market systems are likely to follow a similar path, particularly in response to a backlog of transportation-related costs that have accrued over the same timeframe.
The most important of these cost issues are the following: (1) global demand for transportation fuels is accelerating at the same time that the parasitic losses required to extract petroleum (or condition alternatives) are increasing; (2) many segments of the Eisenhower-era highway system are about to reach the end of their 50-year design life, and the plans to rebuild them or shift to other modes are seriously underfunded and backlogged; and (3) public awareness that transportation contributes to climate change makes it a likely target for remedial sanctions at some point in the future. For these reasons, food system stakeholders, both large and small, will need to focus increasingly on minimizing the transportation dependency of food systems.
Attempts have been made to quantify local food consumption and local food production using various methods. Some researchers have determined demand based on current dietary consumption patterns, using food consumption data from the USDA Economic Research Service, while others have approached it from a health standpoint and determined demand based on optimal nutrient consumption. Production, or supply, has generally been determined using USDA Census of Agriculture data, yet the units of analysis have ranged from U.S. dollars to calories to dietary exchanges, to name a few. Desjardins, MacRae, and Schumilas (2010) conducted a regional study in Canada assessing the quantity needed (demand) and local production capacity (supply) to meet Canadian dietary requirements. Timmons, Wang, and Lass (2008) developed a local food measure, termed "maximum local food," using per capita food value produced and usable (in dollars) and production per capita (in dollars) to represent consumption. Each method came with practical and applied limitations. In the Canadian study, the authors noted that supply could meet demand in the study region by the target year 2016; however, about 10% of the cropland would need to be reallocated. For example, some of the current corn and soybean cropland would need to be allocated to rye, oats, and white beans. Likewise, constraints to achieving maximum local food percentages, such as seasonality and lack of processing facilities, were recognized limitations.
The term "foodshed" will be used throughout this paper. Some credit Arthur Getz with introducing the term in 1991 to describe where food comes from and how it gets there, although others might look to the earlier writings of Walter Hedden (1929). The term has since evolved and been used in multiple ways. Kloppenburg, Hendrickson, and Stevenson (1996) presented a foodshed as a unit of analysis, while Peters, Bills, Wilkins, and Fick (2008) defined it as a geographic area from which food is acquired.
In our study, we introduce methods to quantify and optimize the positioning of foodsheds within an eight-state region of the Midwest, taking into account the availability of cropland and the aggregate needs of each competing town or county located within it. For this purpose, we introduce two new terms to compare the food system potential of each location: (1) the "per capita cropland requirement" indicates the total cropland needed to produce a comprehensive mix of food products for one person (including products derived from livestock feed), taking into account loss factors and geographic variability in yields; and (2) the "regional self-sustainability index" indicates the ratio of the cropland available to the cropland needed to supply the same diet to the entire population residing within a targeted area. A value greater than one implies that a region has the potential to become self-sustaining; a value less than one indicates otherwise. Linear optimization techniques are applied to minimize the aggregate distance between competing population centers and the cropland that supplies them.
The remainder of this paper discusses the methods and rationales used in this study. The next section presents the research methods, including background and a description of the problem; the following section describes how the data were formulated; the subsequent section introduces the linear programming model and reports the results; and the final section summarizes the paper with concluding remarks and future research directions.

Research Methods
Our methods are derived from a study by Cornell University researchers that identified foodshed potential in New York State (Peters, Bills, Lembo, Wilkins, & Fick, 2009). The New York study introduced the concept of a Human Nutritional Equivalent (HNE), which the authors defined as "a basket of food that contains representatives from all food groups combined in the proper proportions to constitute a complete diet for one person for one year." The HNE targets a representative diet, which is used to identify how much cropland is needed for each consumer. The authors designed the New York diet to meet USDA Food Pyramid guidelines (Kantor, 1999), using crops that could be grown in New York State. Yields were averaged over three-mile (~5 km) grid increments using geographic information systems (GIS) and high-resolution soil and land use data. The location of existing cropland was precisely identified and applied to the model to minimize overall transportation distance within the state.
Like the New York model, our study bases dietary targets on a representative product mix derived from USDA dietary guidelines (United States Department of Agriculture [USDA], 2008). However, instead of assigning a fixed product mix, we distribute the recommendations for each major food group in proportion to the national average per capita rates of consumption reported by the USDA's Economic Research Service (United States Department of Agriculture Economic Research Service, 2010). For example, we base the recommended amount of "orange vegetables" on carrots (55%), sweet potatoes (23%), and pumpkin (22%), in proportion to national average rates of consumption among the orange vegetables that can be grown inside the study region. To simplify the model, we ignored food products with negligible consumption within each group, generally individual products that contributed less than 1% of the weight of a defined food group.
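The proportional allocation and the 1% cutoff described above can be sketched as follows. This is a minimal illustration, not the authors' worksheet logic: the function names and the 100 lb group target are hypothetical, while the 55/23/22 split for orange vegetables is taken from the text.

```python
def drop_negligible(consumption, threshold=0.01):
    """Drop products contributing less than `threshold` of a food group's total weight."""
    total = sum(consumption.values())
    return {item: w for item, w in consumption.items() if w / total >= threshold}


def allocate_group_target(group_target, consumption):
    """Split a food-group target across products in proportion to their consumption.

    Shares are renormalized over the products passed in, so the allocations
    always sum to `group_target`.
    """
    total = sum(consumption.values())
    return {item: group_target * w / total for item, w in consumption.items()}


# Orange vegetables, using the consumption shares quoted in the text (percent).
orange = {"carrots": 55.0, "sweet potatoes": 23.0, "pumpkin": 22.0}
print(allocate_group_target(100.0, orange))  # hypothetical 100 lb group target
```

Because the shares are renormalized after minor products are dropped, their weight is absorbed by the remaining products, which mirrors how the text bases the entire "red meat" target on beef and pork.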
A second key difference between our study and the New York study is in how we estimate yields. While the New York study uses GIS to estimate yields from soil types, we estimate them from the relative proximity of each location to reported yields for each crop. Data reported in the Agricultural Census (USDA, 2009) and Yearbook Summaries for Vegetables and Fruits and Nuts (USDA, 2010a, 2010b) were used to identify yields for counties located throughout the continental United States. Those yields were then projected to the geographic coordinates of each county in the study area using a north-south, east-west averaging method.
A third difference between our model and the New York study is that, instead of using GIS to identify precisely where cropland is located, we use a linear programming model to optimize its placement relative to each population center. In essence, we estimate the net difference between demand and supply potential for each location in the study area, and then optimize the allocation of deficit locations to surplus locations, with the objective of minimizing the distance between the geographic coordinates of each region in the study area.
While the New York study is based on a three-mile grid resolution, ours averages 27 miles (~44 km), which varies according to the land area of each county-level data record. However, our method is based on rates, not totals, for each region. Instead of estimating totals for a specific three-mile grid location, we apply county-level rates per square mile to the area covered by each county or population center. For example, the availability of cropland is based on its percentage of total land area, instead of its actual location. Rates specific to each county are then linked to the central latitude and longitude of the county, plus or minus half the square root of its land area. Any error is therefore contained within the approximate footprint of each county; the average width of the counties included in the study region is 27 miles.
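The rate-based footprint can be illustrated with a short sketch, assuming each county is treated as a square centered on its reported coordinates (so its half-width is half the square root of its land area) and assuming roughly 69 miles per degree of latitude. Both assumptions, and the function names, are illustrative rather than taken from the authors' model. As a rough consistency check on the 27-mile figure, 549,000 square miles spread over 738 counties gives about 744 square miles per county, whose square root is about 27 miles.

```python
import math

MILES_PER_DEG_LAT = 69.0  # assumption: approximate length of one degree of latitude


def county_footprint(lat, lon, land_area_sq_mi):
    """Return (south, north, west, east) bounds of a square county footprint.

    The footprint is centered on (lat, lon) and extends half the square root of
    the land area in each direction. Degrees of longitude shrink with cos(lat).
    """
    half_width_mi = math.sqrt(land_area_sq_mi) / 2.0
    dlat = half_width_mi / MILES_PER_DEG_LAT
    dlon = half_width_mi / (MILES_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def per_square_mile(total, land_area_sq_mi):
    """Convert a county total (e.g., cropland acres) into a rate per square mile."""
    return total / land_area_sq_mi


# Hypothetical county: 729 sq mi centered at 42.0 N, 93.5 W (27 miles on a side).
print(county_footprint(42.0, -93.5, 729.0))
```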
Basing the analysis on rates substantially reduces the volume of data and processing needed, and makes the model applicable to almost any region of the United States. Data on population (Census) and cropland (Census of Agriculture) can be easily accessed for almost any county in the United States; the method for translating yields from national reference data to locations inside the study area is described below in the section entitled "Yield Estimates."

Formulation of Data
Individual data records were consolidated for all cities and counties located in the eight-state region surrounding Iowa: Iowa, Illinois, Missouri, Kansas, Nebraska, South Dakota, Minnesota, and Wisconsin. This region has 38 million people, distributed over 549,000 square miles (1,422,000 square km) of land area. Approximately half the population lives in cities or towns of 1,000 to 100,000 people, a fourth lives in larger cities, and another fourth lives in smaller towns or rural areas. While the population density for the region as a whole averages 70 persons per square mile (27 persons per square km), over three-fourths of the population lives at densities that average 3,700 persons per square mile (1,423 persons per square km). By comparison, cropland averages 300 acres per square mile (47 ha per square km), or 4.3 acres (1.7 ha) per capita. Overall, cropland accounts for 48% of total land area throughout the region. These values were calculated from data downloaded directly from the Census Bureau and the Census of Agriculture.

Data Summary
In total, the study area was divided into 6,853 data records, consisting of 738 counties and 6,115 cities or towns (i.e., incorporated places). All data were downloaded from various websites and consolidated using Federal Information Processing Standard (FIPS) codes to align each component. Every data record includes population, land area, and the longitude and latitude coordinates of its geographic center. Additionally, each county-level record includes total acres of cropland and several calculated fields that estimate yields for 40 key crops and translate them in relation to each MyPyramid dietary group. For example, the recommended number of servings for each dietary group (USDA, 2008) is translated into the pounds that each crop is expected to contribute to it, based on the relative proportions of actual per capita consumption (Food Availability Data System). The pounds of each contributing crop are translated into acres per capita and summed for each dietary group. Dietary groups are listed in table 1.
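The translation from pounds per capita to acres per capita, summed by dietary group, amounts to dividing each crop's annual farm-weight requirement by its local yield. A minimal sketch follows; the tuple layout, function names, and numbers are hypothetical and are not the values in tables 1 through 3.

```python
from collections import defaultdict


def acres_by_group(crop_rows):
    """Sum acres per capita for each dietary group.

    crop_rows: iterable of (group, crop, lb_per_capita_per_year, yield_lb_per_acre).
    Each crop contributes lb_per_capita / yield acres to its group.
    """
    totals = defaultdict(float)
    for group, crop, lb_per_capita, yield_lb_per_acre in crop_rows:
        if yield_lb_per_acre <= 0:
            # The text instead shifts the group to its other crops when a yield is zero.
            raise ValueError(f"{crop} has no positive yield at this location")
        totals[group] += lb_per_capita / yield_lb_per_acre
    return dict(totals)


def per_capita_cropland_requirement(group_acres):
    """Total acres needed per consumer across all dietary groups."""
    return sum(group_acres.values())


rows = [
    ("orange vegetables", "carrots", 10.0, 50_000.0),  # hypothetical values
    ("orange vegetables", "pumpkin", 4.0, 20_000.0),
    ("grains", "wheat", 150.0, 3_000.0),
]
groups = acres_by_group(rows)
print(groups, per_capita_cropland_requirement(groups))
```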
Six additional worksheets translate MyPyramid recommendations from daily loss-adjusted rates to equivalent farm-weight requirements for each crop. These calculations also translate meat, dairy, poultry, and aquaculture products into annual demand for feed crops, using generalized conversions to primary weight, carcass weight, live weight, feed rates, and ration mixes. Only beef and dairy products required forage crops; all other rations are based solely on corn and soybeans.
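The chain of livestock conversions can be pictured as a sequence of divisions and multiplications. The sketch below assumes a simple order (loss adjustment, then primary weight to carcass weight to live weight, then feed per pound of live weight, then a ration split); the parameter names and all numeric factors are hypothetical placeholders, not the rates in table 4.

```python
def feed_crop_demand(consumed_lb, loss_fraction, primary_per_carcass, carcass_per_live,
                     feed_lb_per_live_lb, ration_mix):
    """Convert annual per capita consumption of an animal product into feed-crop demand.

    Conversion factors are fractions (e.g., 0.6 means primary weight is 60% of
    carcass weight). ration_mix maps feed crop -> share of the ration.
    """
    if abs(sum(ration_mix.values()) - 1.0) > 1e-9:
        raise ValueError("ration shares must sum to 1")
    primary_lb = consumed_lb / (1.0 - loss_fraction)   # undo consumer and retail losses
    carcass_lb = primary_lb / primary_per_carcass       # primary weight -> carcass weight
    live_lb = carcass_lb / carcass_per_live             # carcass weight -> live weight
    feed_lb = live_lb * feed_lb_per_live_lb             # total feed (dry matter)
    return {crop: feed_lb * share for crop, share in ration_mix.items()}


# Hypothetical pork-like example: corn/soybean ration with no forage, as in the text.
print(feed_crop_demand(50.0, 0.5, 0.5, 0.5, 3.0, {"corn": 0.75, "soybeans": 0.25}))
```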
In general, all supply and demand values are translated into "consumer equivalent" rates. By this, we mean any metric that can be linked to an "average" consumer for one year. On the demand side, a consumer equivalent refers to one unit of population, regardless of age or gender. On the supply side, it might refer to MyPyramid recommendations for one or more dietary groups, actual rates of per capita consumption, the equivalent farm weight needed to supply an individual food product, the number of acres required, the total land area required, or several other measures.
The primary result generated for each data record is net production capacity, expressed in consumer equivalents per year, after deducting the needs of the internal population. A positive value indicates that the data region has surplus production capacity; a negative value indicates that the region has a deficit and must import food from other areas to meet the needs of its population. The net values for each record, combined with its geographic coordinates, provide the inputs to the linear optimization model.
Generally, most counties generate surplus production relative to the needs of their rural populations. However, because no cropland is reported for cities and towns, they are assigned "zero" production capacity relative to the needs of their populations and therefore always generate production deficits. In essence, the linear programming model allocates these deficits to the surpluses available in the nearest part of a county, accounting for the competing needs of other cities in the nearby region.

Yield Estimates
Because data availability is extremely limited for many crops, only products with substantive per capita rates of demand were designated to represent each food group. For example, the "red meat" group includes beef, pork, lamb, and veal. However, because beef and pork account for 99% of consumption, lamb and veal were ignored, and the total recommended amount of "red meat" was based entirely on the crops targeted to produce beef and pork. Overall, 164 distinct food products, including processing variants (fresh, canned, frozen, etc.) of the same crop, were narrowed to the 40 representative crops listed in table 2.
All available data for each crop were downloaded for each of 3,040 counties located throughout the United States (including Hawaii and Alaska). Reference data varied from a minimum of 12 data points for celery up to 3,012 for forage crops. County-level yield and acreage data for field crops are widely available and were sourced directly from the 2007 Census of Agriculture. Reference yields for other crops, however, were estimated by projecting statewide averages to the counties inside the state that reported substantive acreage for that crop. Counties with reported acreage data but no yield data were used to identify the maximum north-south and east-west growing ranges for each crop.
If a county in the study area was located outside the growing range for a crop, the yield for that crop was automatically determined to be zero, and the other crops in the dietary group were used to identify production capacity for that food group for that county.
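The growing-range test can be sketched as a bounding box over the counties that report acreage for a crop. This assumes the range is a simple latitude/longitude rectangle, which is one reading of "maximum north-south and east-west growing ranges"; the function names and coordinates are hypothetical.

```python
def growing_range(reporting_counties):
    """Bounding box (south, north, west, east) of counties reporting acreage for a crop.

    reporting_counties: list of (lat, lon) for counties with reported acreage.
    """
    if not reporting_counties:
        raise ValueError("crop has no reporting counties")
    lats = [lat for lat, _ in reporting_counties]
    lons = [lon for _, lon in reporting_counties]
    return min(lats), max(lats), min(lons), max(lons)


def in_growing_range(lat, lon, bounds):
    south, north, west, east = bounds
    return south <= lat <= north and west <= lon <= east


def crop_yield(lat, lon, bounds, estimated_yield):
    """Yield is zero outside the growing range; otherwise use the estimated yield."""
    return estimated_yield if in_growing_range(lat, lon, bounds) else 0.0


# Hypothetical crop reported in three counties.
example_range = growing_range([(34.0, -119.0), (43.0, -85.5), (28.5, -81.0)])
print(example_range, in_growing_range(41.6, -93.6, example_range))
```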
Yields for crops whose growing range included a data record were estimated by averaging all reference points available within a specified north-south and east-west offset distance. This "estimating range" was determined by dividing the distance between the outer limits of each growing range by the square root of the number of data points available for the crop.
A relatively large amount of reference data resulted in a relatively narrow estimating range, and vice versa. If the data record was within the growing range but no reference data were available in either direction, a default yield was used, based on the smallest yield identified nationally for the crop. Default yields were applied only when a county was within the growing range of the crop.
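One way to read the estimating-range rule is as a rectangular window whose north-south and east-west offsets equal the growing-range extents divided by the square root of the number of reference points. The sketch below averages reference yields inside that window and falls back to the smallest reference yield otherwise. Treating the window as a rectangle, and using the minimum over the supplied reference points as the "smallest yield identified nationally," are assumptions; the names and numbers are hypothetical.

```python
import math


def estimate_yield(lat, lon, refs, bounds):
    """Estimate a crop yield at (lat, lon).

    refs: list of (lat, lon, yield) reference points for the crop.
    bounds: (south, north, west, east) growing range of the crop.
    """
    south, north, west, east = bounds
    if not (south <= lat <= north and west <= lon <= east):
        return 0.0  # outside the growing range: the crop is not counted here
    if not refs:
        raise ValueError("no reference yields for this crop")
    k = math.sqrt(len(refs))
    ns_offset = (north - south) / k  # more reference data -> narrower window
    ew_offset = (east - west) / k
    nearby = [y for rlat, rlon, y in refs
              if abs(rlat - lat) <= ns_offset and abs(rlon - lon) <= ew_offset]
    if nearby:
        return sum(nearby) / len(nearby)
    return min(y for _, _, y in refs)  # default: smallest reference yield


refs = [(40.0, -100.0, 100.0), (40.0, -98.0, 120.0),
        (48.0, -90.0, 200.0), (48.0, -92.0, 180.0)]  # hypothetical yields
bounds = (40.0, 48.0, -100.0, -90.0)
print(estimate_yield(40.5, -99.0, refs, bounds))
```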
The yields projected to each data record in the study area were then translated into the cropland needed to produce enough of each food group to meet the targeted dietary needs of an average consumer for one year. Regional averages for each of these values are listed in table 3. The conversion factors used to convert livestock feed to per capita crop requirements are included in table 4.

Net Production Capacity
Across the eight-state study area, the total amount of land needed for all crops averaged 0.49 acres (0.2 ha) per consumer, with an average absolute deviation of 0.09 acres (0.04 ha). This means that, on average, each consumer requires between 0.40 and 0.58 acres (0.16 and 0.23 ha) of local cropland, depending on the county in which they are located. We define this metric as the per capita cropland requirement, a value identified for each county in the study region.
The total amount of cropland available in each county divided by its per capita cropland requirement is used to estimate the maximum production capacity of each county, expressed in consumer equivalent units. Across the eight-state study area, cropland averaged 48% of total land area, with an average absolute deviation of 21%. This means that cropland generally accounts for between 27% and 69% of total land area, depending on the county.
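The mean plus or minus the average absolute deviation used in the two paragraphs above can be computed as below. Whether counties were weighted equally or by population is not stated, so this sketch uses an unweighted mean as an assumption; the sample values are hypothetical.

```python
from statistics import fmean


def mean_and_aad(values):
    """Return (mean, average absolute deviation) of a sequence of numbers."""
    m = fmean(values)
    aad = fmean(abs(v - m) for v in values)
    return m, aad


def typical_range(values):
    """Interval of one average absolute deviation around the mean."""
    m, aad = mean_and_aad(values)
    return m - aad, m + aad


# Hypothetical per capita cropland requirements (acres) for five counties.
reqs = [0.40, 0.45, 0.49, 0.53, 0.58]
print(mean_and_aad(reqs), typical_range(reqs))
```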
The rural population of each county is subtracted from its maximum production capacity to identify the net capacity that each county can supply to other locations. All supply capacity originates in counties; none originates in cities or towns. The net capacities are indexed to a range that extends from the central coordinates of the county, plus or minus half the square root of the county's total land area, and are allocated to nearby locations using the linear programming model. Surplus capacity that is not allocated to a city or town is ignored.
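The net-capacity bookkeeping for counties and cities reduces to a few lines. This minimal sketch assumes the inputs are already expressed in acres and persons, and that net capacity is spread evenly over a record's land area; the function names and sample figures are hypothetical.

```python
def max_production_capacity(cropland_acres, per_capita_req_acres):
    """Consumer equivalents that a county's cropland could feed for one year."""
    return cropland_acres / per_capita_req_acres


def county_net_capacity(cropland_acres, per_capita_req_acres, rural_population):
    """Surplus (positive) or deficit (negative) after feeding the rural population."""
    return max_production_capacity(cropland_acres, per_capita_req_acres) - rural_population


def city_net_capacity(population):
    """Cities and towns have no reported cropland, so their net capacity is a deficit."""
    return -float(population)


def net_rate_per_sq_mi(net_capacity, land_area_sq_mi):
    """Spread a record's net capacity evenly over its land area."""
    return net_capacity / land_area_sq_mi


surplus = county_net_capacity(49_000.0, 0.49, 20_000)  # about 80,000 consumer equivalents
print(surplus, city_net_capacity(5_000), net_rate_per_sq_mi(surplus, 800.0))
```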

Population Distribution
On the demand side, a consumer equivalent is synonymous with the total population of each city or county. Although recommended serving amounts vary by age group, we determined that this did not substantially influence location-specific demands within the study area (see table 5), primarily because the distribution of population among age groups is relatively consistent from place to place.
For example, even though the number of servings for the 19-30 age group is 126% of the recommended average for the population as a whole (weighted to the national population distribution), this age group's share of the population deviates from one location to another by only 1.0% (United States Census Bureau, 2009). As such, deviations in the average servings needed per capita caused by variations in the percentage of 19-30 year old consumers in any particular county are likely to be less than 0.3% (i.e., 26% multiplied by an average deviation of 1% of the population). Note that when a location has a relatively higher percentage of one age group, it will have a relatively lower percentage of another, meaning that part of the absolute deviation in the per capita average will be offset. Note also that even though population deviations are generally higher among the two oldest age groups, the net dietary amounts for these groups do not vary significantly from the per capita average.
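The 0.3% bound can be written out explicitly. Express the per capita requirement as a population-weighted average of age-group rates, $\bar r = \sum_g p_g r_g$, with each rate relative to the overall average. Assuming, for illustration, that a shift $\delta$ in the 19-30 group's population share is offset by groups whose rates are near the average, the per capita requirement changes by approximately

$$
\Delta \bar r \approx \delta \,\bigl(r_{19\text{-}30} - 1\bigr) = 0.01 \times (1.26 - 1) = 0.0026 = 0.26\% < 0.3\%
$$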
Thus, we determined that the benefits of accounting for regional variations in age distribution were not worth the substantial data processing that would be needed to account for all variables among a large number of discrete and competing regions. In essence, it would turn a single per capita rate into several thousand variables, with relatively little effect on accuracy.

Optimization Model and Results
In this study, we used a linear programming model to formulate the foodshed optimization problem. This model has been used in a previous, smaller-scale study reported in the literature (Hu, Wang, Arendt, & Boeckenstedt, in press). We used population, dietary, and geographic information to map potential foodsheds. The emphasis is on minimizing the total geographic distance between supply and demand.
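The model formulation can be expressed as:

$$\min \; \sum_{i=1}^{S} \sum_{j=1}^{D} c_{ij}\, x_{ij}$$

subject to:

$$\sum_{j=1}^{D} x_{ij} \le s_i, \quad i = 1,2,\ldots,S$$

$$\sum_{i=1}^{S} x_{ij} \ge d_j, \quad j = 1,2,\ldots,D$$

$$x_{ij} \ge 0, \quad i = 1,2,\ldots,S, \; j = 1,2,\ldots,D$$

The key components of this linear programming model are:
1. A set of decision variables $x = \{x_{ij}\}$, $i = 1,2,\ldots,S$, $j = 1,2,\ldots,D$, representing the foodshed mapping relationship.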
The variable $x_{ij}$ denotes the amount supplied from supply block $i$ to demand block $j$. We divide the studied region into one-square-mile blocks, each of which has net supply, net demand, or neither. The values assigned to each block are effectively the net rates per square mile described in the "Research Methods" section above.
2. A parameter vector $c = \{c_{ij}\}$, $i = 1,2,\ldots,S$, $j = 1,2,\ldots,D$: The parameter $c_{ij}$ denotes the distance between supply block $i$ and demand block $j$. Longitude and latitude coordinates are used to calculate the distance. We assume that transportation routes generally follow a north-south and east-west road grid, so the total distance is the sum of the north-south (latitude) and east-west (longitude) distances.
3. A parameter vector $d = \{d_j\}$, $j = 1,2,\ldots,D$: The parameter $d_j$ denotes the food demand for demand block $j$, based on population size and per capita consumption requirements. The per capita consumption requirements are based on USDA MyPyramid daily servings. Demand was adjusted to MyPyramid rates as consumer equivalents, representing a total dietary amount for all food groups.
4. A parameter vector $s = \{s_i\}$, $i = 1,2,\ldots,S$: The parameter $s_i$ denotes the supply capacity of supply block $i$, based on land availability and the expected yields of each crop.
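To make the formulation concrete, the sketch below computes the road-grid distance $c_{ij}$ from latitude and longitude and checks a candidate allocation against the objective and constraints. Converting degrees to miles with about 69 miles per degree of latitude, scaled by the cosine of the mean latitude for longitude, is our assumption; the text states only that the distance is the sum of the latitude and longitude components. The function names are hypothetical.

```python
import math

MILES_PER_DEG = 69.0  # assumption: approximate miles per degree of latitude


def grid_distance_miles(a, b):
    """Road-grid (Manhattan) distance between (lat, lon) points a and b, in miles."""
    mean_lat = math.radians((a[0] + b[0]) / 2.0)
    north_south = abs(a[0] - b[0]) * MILES_PER_DEG
    east_west = abs(a[1] - b[1]) * MILES_PER_DEG * math.cos(mean_lat)
    return north_south + east_west


def cost_matrix(supply_pts, demand_pts):
    """c[i][j] = distance from supply block i to demand block j."""
    return [[grid_distance_miles(s, d) for d in demand_pts] for s in supply_pts]


def evaluate(x, c, supply, demand, tol=1e-9):
    """Return (objective, feasible) for an allocation x[i][j]."""
    S, D = len(supply), len(demand)
    objective = sum(c[i][j] * x[i][j] for i in range(S) for j in range(D))
    nonnegative = all(x[i][j] >= -tol for i in range(S) for j in range(D))
    within_supply = all(sum(x[i]) <= supply[i] + tol for i in range(S))
    meets_demand = all(sum(x[i][j] for i in range(S)) >= demand[j] - tol
                       for j in range(D))
    return objective, nonnegative and within_supply and meets_demand


c = [[1.0, 9.0], [9.0, 1.0]]  # hypothetical distances
print(evaluate([[4, 0], [0, 4]], c, [5, 5], [4, 4]))  # (8.0, True)
```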
Although linear programming problems can be solved efficiently by optimization solvers, we used a heuristic algorithm to obtain near-optimal solutions, because the studied region contains 26,175 demand blocks and 481,086 supply blocks (roughly 12.6 billion potential supply-demand pairs), which makes the model too large to be solved by regular solvers. Heuristic algorithms are commonly used for optimization problems whose exact solutions are computationally expensive to obtain. Our heuristic is a greedy algorithm that searches for available supply blocks in the neighborhood of each demand block and matches them; the area of the neighborhood is gradually increased until all demands are satisfied. Because of the simplicity of the heuristic method, we are able to obtain a solution within seconds on a standard personal computer.
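A greedy expanding-neighborhood heuristic of the kind described above might look like the following. This is an illustrative reconstruction, not the authors' code: blocks are placed on an integer grid in miles, distance is the road-grid (Manhattan) distance, demand blocks are visited in sorted order at each radius, and nearer supply blocks are used first. The visiting order and the one-mile radius step are assumptions.

```python
def greedy_foodshed(supply, demand, step=1):
    """Greedily match demand blocks to nearby supply blocks.

    supply, demand: dicts mapping (x, y) grid positions (miles) to amounts.
    Returns a dict mapping (supply_pos, demand_pos) -> amount shipped.
    """
    if sum(demand.values()) > sum(supply.values()):
        raise ValueError("total demand exceeds total supply")
    remaining = dict(supply)
    unmet = dict(demand)
    xs = [p[0] for p in list(supply) + list(demand)]
    ys = [p[1] for p in list(supply) + list(demand)]
    max_span = (max(xs) - min(xs)) + (max(ys) - min(ys))  # largest possible distance
    allocation = {}
    radius = 0
    while any(need > 0 for need in unmet.values()):
        if radius > max_span:
            raise RuntimeError("could not satisfy all demand")
        for dpos in sorted(unmet):
            need = unmet[dpos]
            if need <= 0:
                continue
            # Supply blocks with capacity left inside the current neighborhood, nearest first.
            nearby = sorted(
                (abs(sx - dpos[0]) + abs(sy - dpos[1]), (sx, sy))
                for (sx, sy), cap in remaining.items()
                if cap > 0 and abs(sx - dpos[0]) + abs(sy - dpos[1]) <= radius)
            for _, spos in nearby:
                if need <= 0:
                    break
                take = min(need, remaining[spos])
                remaining[spos] -= take
                need -= take
                allocation[(spos, dpos)] = allocation.get((spos, dpos), 0) + take
            unmet[dpos] = need
        radius += step  # expand the neighborhood and try again
    return allocation


print(greedy_foodshed({(0, 0): 5, (10, 0): 5}, {(1, 0): 4, (9, 0): 4}))
```

Because each demand block takes the nearest remaining capacity in turn, the result depends on the visiting order and is generally near-optimal rather than guaranteed optimal.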
The results of the linear optimization model for the eight-state study region are shown in figure 1. The red areas represent locations with negative net production capacity, which are generally urban centers or counties without sufficient production capacity to support their populations. In essence, these are locations that need to import food from other locations within the region. The blue areas represent locations with positive net capacity that has been allocated to other locations by the linear optimization model. Blue locations are placed as close as possible to red locations and account for competing deficits from multiple locations.
In general, each population center tries to satisfy its demand using the nearest surplus production capacity available. Whenever net capacity is insufficient, the region is expanded until the demand from all population centers located within it is satisfied. Generally, the larger the net deficit of an area, the larger the area needed for its population to become self-sustaining.
Note that larger foodshed areas are generally, but not entirely, associated with larger population centers. Specifically, areas with lower cropland density or lower aggregate yields also require larger areas. For example, even though the population residing in the blue area surrounding Chicago is 25 times larger than that of the blue area covering northeast Minnesota and Wisconsin, the self-sustainability region surrounding Chicago is smaller, because it is situated closer to more productive and densely available cropland. Figure 2 summarizes the minimum distances needed to serve a given percentage of the population in the study region.
The graph starts at (distance = 1 mile, population percentile = 35%), which means that, based on the model results, about 35% of the population can be supplied within 1 mile.
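The population-weighted average distance and the cumulative curve in figure 2 can both be derived from a list of (population, distance) pairs, one per demand location. The sketch below assumes each location is assigned a single foodshed distance (for example, the radius at which its demand was met); the record format and sample values are hypothetical.

```python
def weighted_mean_distance(records):
    """Population-weighted average distance; records are (population, distance)."""
    total = sum(pop for pop, _ in records)
    return sum(pop * dist for pop, dist in records) / total


def share_within(records, max_distance):
    """Fraction of the population whose foodshed distance is at most max_distance."""
    total = sum(pop for pop, _ in records)
    return sum(pop for pop, dist in records if dist <= max_distance) / total


def distance_curve(records):
    """Cumulative (distance, population share) points, sorted by distance."""
    total = sum(pop for pop, _ in records)
    curve, running = [], 0
    for pop, dist in sorted(records, key=lambda r: r[1]):
        running += pop
        curve.append((dist, running / total))
    return curve


records = [(100, 0.5), (300, 4.0), (600, 20.0)]  # hypothetical locations
print(weighted_mean_distance(records), share_within(records, 5.0))
```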

Conclusions
In this study, we introduced two new foodshed performance metrics and demonstrated that linear programming is effective and appropriate for organizing foodsheds to minimize transport requirements between population centers and cropland. While data from eight Midwestern states were used to demonstrate the model, the methods developed can also be applied to almost any other region of the country; all the required data are available online.
The per capita cropland requirement and the regional self-sustainability index were both introduced to provide quick, easy-to-understand references for comparing regions. The per capita cropland requirement identifies how much acreage is needed within a region to meet all targeted dietary requirements for an average person for one year. This metric accounts for all expected production and spoilage losses, and can be multiplied directly by population to identify the total cropland needed to supply a target area.
The regional self-sustainability index more broadly characterizes how effectively an area can become self-sustaining relative to the dietary target. This value is calculated by dividing the cropland available within a region by the total cropland needed to supply its internal population. A value greater than one implies that the region can become self-sustaining; a value less than one implies otherwise.
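The two metrics combine as follows: the per capita requirement times population gives the total cropland needed, and dividing available cropland by that total gives the self-sustainability index. The function names and figures below are hypothetical and chosen only to illustrate the arithmetic.

```python
def total_cropland_needed(per_capita_req_acres, population):
    """Acres required to supply the target diet to the whole population for one year."""
    return per_capita_req_acres * population


def self_sustainability_index(cropland_available_acres, per_capita_req_acres, population):
    """Ratio of available to required cropland (> 1 means self-sustaining potential)."""
    return cropland_available_acres / total_cropland_needed(per_capita_req_acres, population)


# Hypothetical area: 20,000 residents, 0.5 acres per person, 93,000 acres of cropland.
needed = total_cropland_needed(0.5, 20_000)
ssi = self_sustainability_index(93_000.0, 0.5, 20_000)
print(needed, ssi)
```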
When targeting a dietary mix that follows all MyPyramid recommendations, the per capita cropland requirement for the eight-state region averages 0.49 acres (0.2 ha) per consumer, with an average absolute deviation of 0.09 acres (0.04 ha) per consumer within the region. The self-sustainability index is estimated at 9.3, which indicates that the region has 9.3 times the cropland it needs to become self-sustaining relative to the MyPyramid dietary target. The total cropland requirement for the region is estimated at 18 million acres (7.3 million ha).
Findings from the linear programming model are summarized as follows:
• Targeted MyPyramid recommendations can be met within an average distance (weighted by population) of 13.6 miles (21.9 km) throughout the study region.
• Fifty-six percent of the population could be supplied within a five-mile (8 km) production range.
• The Chicago area, which represents the largest concentration of consumers in the study area, could become self-sustaining within a 76-mile (122 km) range.
• Minneapolis (37 miles or 60 km), St. Louis (27 miles or 43 km), Kansas City (24 miles or 39 km), and Des Moines (10 miles or 16 km) could also become self-sustaining within relatively small travel distances.
• The predominantly rural, wooded areas of northern Minnesota and Wisconsin require relatively larger ranges relative to their populations.
Of course, these results do not account for seasonality, storage methods, or quality perceptions, which are beyond the scope of this study. Our premise is that, over time, rising transportation costs will drive investment toward more advanced food production and storage technologies that resolve these issues. Our future research will focus on methods to integrate risk and sensitivity analyses with respect to yield and demand fluctuations.

Figure 1. Foodshed Locations Identified Using Linear Optimization Modeling

Figure 2. Distribution of Foodshed Distances Within the Study Region

Table 1. Targeted Demand Rates Per Capita

Table 2. Representative Crops Used To Determine Production Capacity

Table 3. Estimated Yields by Crop Group

Table 4. Livestock Conversion Rates: Per Capita Feed Requirements (Pounds or Kilograms of Dry Matter)

Table 5. Expected Deviations in Foodshed Demand by Age Group