Coverage for mlos_bench/mlos_bench/storage/__init__.py: 100%

3 statements  

coverage.py v7.6.10, created at 2025-01-21 01:50 +0000

#
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
#
"""
Interfaces to the storage backends for mlos_bench.

Storage backends (for instance :py:mod:`~mlos_bench.storage.sql`) are used to store
and retrieve the results of experiments and implement a persistent queue for
:py:mod:`~mlos_bench.schedulers`.

The :py:class:`~mlos_bench.storage.base_storage.Storage` class is the main interface
and provides the ability to:

- Create a new or reload an existing :py:class:`~.Storage.Experiment` with one or more
  associated :py:class:`~.Storage.Trial` instances which are used by the
  :py:mod:`~mlos_bench.schedulers` during ``mlos_bench`` run time to execute
  `Trials`.

  In MLOS terms, an *Experiment* is a group of *Trials* that share the same scripts
  and target system.

  A *Trial* is a single run of the target system with a specific *Configuration*
  (e.g., set of tunable parameter values).
  (Note: other systems may call this a *sample*.)

- Retrieve the :py:class:`~mlos_bench.storage.base_trial_data.TrialData` results
  with the :py:attr:`~mlos_bench.storage.base_experiment_data.ExperimentData.trials`
  property on a :py:class:`~mlos_bench.storage.base_experiment_data.ExperimentData`
  instance via the :py:class:`~.Storage` instance's
  :py:attr:`~mlos_bench.storage.base_storage.Storage.experiments` property.

  These can be especially useful with :py:mod:`mlos_viz` for interactive exploration
  in a Jupyter Notebook interface, for instance.

The :py:func:`.from_config` :py:mod:`.storage_factory` function can be used to get a
:py:class:`.Storage` instance from a
:py:attr:`~mlos_bench.config.schemas.config_schemas.ConfigSchema.STORAGE` type JSON
config.

Example
-------

Here's a very basic example of the Storage APIs.

>>> # Create a new storage object from a JSON config.
>>> # Normally, we'd load these from a file, but for this example we'll use a string.
>>> global_config = '''
... {
...     // Additional global configuration parameters can be added here.
...     /* For instance:
...     "storage_host": "some-remote-host",
...     "storage_user": "mlos_bench",
...     "storage_pass": "SuperSecretPassword",
...     */
... }
... '''
>>> storage_config = '''
... {
...     "class": "mlos_bench.storage.sql.storage.SqlStorage",
...     "config": {
...         // Don't create the schema until we actually need it.
...         // (helps speed up initial launch and tests)
...         "lazy_schema_create": true,
...         // Parameters below must match kwargs of `sqlalchemy.URL.create()`:
...         // Normally, we'd specify a real database, but for testing examples
...         // we'll use an in-memory one.
...         "drivername": "sqlite",
...         "database": ":memory:"
...         // Otherwise we might use something like the following
...         // to pull the values from the globals:
...         /*
...         "host": "$storage_host",
...         "username": "$storage_user",
...         "password": "$storage_pass",
...         */
...     }
... }
... '''
>>> from mlos_bench.storage import from_config
>>> storage = from_config(storage_config, global_configs=[global_config])
>>> storage
sqlite::memory:
>>> #
>>> # Internally, mlos_bench will use this config and storage backend to track
>>> # Experiments and Trials it creates.
>>> # Most users won't need to do that, but it works something like the following:
>>> # Create a new experiment with a single trial.
>>> # (Normally, we'd use a real environment config, but for this example we'll use a string.)
>>> #
>>> # Create a dummy tunable group.
>>> from mlos_bench.services.config_persistence import ConfigPersistenceService
>>> config_persistence_service = ConfigPersistenceService()
>>> tunables_config = '''
... {
...     "param_group": {
...         "cost": 1,
...         "params": {
...             "param1": {
...                 "type": "int",
...                 "range": [0, 100],
...                 "default": 50
...             }
...         }
...     }
... }
... '''
>>> tunables = config_persistence_service.load_tunables([tunables_config])
>>> from mlos_bench.environments.status import Status
>>> from datetime import datetime
>>> with storage.experiment(
...     experiment_id="my_experiment_id",
...     trial_id=1,
...     root_env_config="root_env_config_info",
...     description="some description",
...     tunables=tunables,
...     opt_targets={"objective_metric": "min"},
... ) as experiment:
...     # Create a dummy trial.
...     trial = experiment.new_trial(tunables=tunables)
...     # Pretend something ran with that trial and we have the results now.
...     # NOTE: Normally this would run through a TrialRunner via a Scheduler.
...     _ = trial.update(Status.SUCCEEDED, datetime.now(), {"objective_metric": 42})
>>> #
>>> # Now, once there's data to look at, in a Jupyter notebook or similar,
>>> # we can also use the storage object to view the results.
>>> #
>>> storage.experiments
{'my_experiment_id': Experiment :: my_experiment_id: 'some description'}
>>> # Access ExperimentData by experiment id.
>>> experiment_data = storage.experiments["my_experiment_id"]
>>> experiment_data.trials
{1: Trial :: my_experiment_id:1 cid:1 rid:None SUCCEEDED}
>>> # Access TrialData for an Experiment by trial id.
>>> trial_data = experiment_data.trials[1]
>>> assert trial_data.status == Status.SUCCEEDED
>>> # Retrieve the tunable configuration from the TrialData as a dictionary.
>>> trial_config_data = trial_data.tunable_config
>>> trial_config_data.config_dict
{'param1': 50}
>>> # Retrieve the results from the TrialData as a dictionary.
>>> trial_data.results_dict
{'objective_metric': 42}
>>> # Retrieve the results of all Trials in the Experiment as a DataFrame.
>>> experiment_data.results_df.columns.tolist()
['trial_id', 'ts_start', 'ts_end', 'tunable_config_id', 'tunable_config_trial_group_id', 'status', 'trial_runner_id', 'config.param1', 'result.objective_metric']
>>> # Drop the timestamp columns to make it a repeatable test.
>>> experiment_data.results_df.drop(columns=["ts_start", "ts_end"])
   trial_id  tunable_config_id  tunable_config_trial_group_id     status trial_runner_id  config.param1  result.objective_metric
0         1                  1                              1  SUCCEEDED            None             50                       42
<BLANKLINE>
[1 rows x 7 columns]

See Also
--------
mlos_bench.storage.base_storage : Base interface for backends.
mlos_bench.storage.base_experiment_data : Base interface for ExperimentData.
mlos_bench.storage.base_trial_data : Base interface for TrialData.

Notes
-----
- See `sqlite-autotuning notebooks
  <https://github.com/Microsoft-CISL/sqlite-autotuning/blob/main/mlos_demo_sqlite_teachers.ipynb>`_
  for additional examples.
"""  # pylint: disable=line-too-long # noqa: E501

from mlos_bench.storage.base_storage import Storage
from mlos_bench.storage.storage_factory import from_config

__all__ = [
    "Storage",
    "from_config",
]

171 "Storage", 

172 "from_config", 

173]