Coverage for mlos_bench/mlos_bench/storage/__init__.py: 100%

3 statements  

coverage.py v7.6.9, created at 2024-12-14 01:58 +0000

#
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
#

"""
Interfaces to the storage backends for mlos_bench.

Storage backends (for instance :py:mod:`~mlos_bench.storage.sql`) are used to store
and retrieve the results of experiments and to implement a persistent queue for
:py:mod:`~mlos_bench.schedulers`.

The :py:class:`~mlos_bench.storage.base_storage.Storage` class is the main interface
and provides the ability to:

- Create a new :py:class:`~.Storage.Experiment`, or reload an existing one, with
  one or more associated :py:class:`~.Storage.Trial` instances, which the
  :py:mod:`~mlos_bench.schedulers` use during ``mlos_bench`` run time to execute
  `Trials`.

  In MLOS terms, an *Experiment* is a group of *Trials* that share the same scripts
  and target system.

  A *Trial* is a single run of the target system with a specific *Configuration*
  (e.g., a set of tunable parameter values).
  (Note: other systems may call this a *sample*.)

- Retrieve the :py:class:`~mlos_bench.storage.base_trial_data.TrialData` results
  with the :py:attr:`~mlos_bench.storage.base_experiment_data.ExperimentData.trials`
  property on a :py:class:`~mlos_bench.storage.base_experiment_data.ExperimentData`
  instance via the :py:class:`~.Storage` instance's
  :py:attr:`~mlos_bench.storage.base_storage.Storage.experiments` property.

  These can be especially useful with :py:mod:`mlos_viz` for interactive exploration
  in a Jupyter Notebook interface, for instance.

The :py:func:`.from_config` :py:mod:`.storage_factory` function can be used to get a
:py:class:`.Storage` instance from a
:py:attr:`~mlos_bench.config.schemas.config_schemas.ConfigSchema.STORAGE` type JSON
config.

Example
-------

Here's a very basic example of the Storage APIs.

>>> # Create a new storage object from a JSON config.
>>> # Normally, we'd load these from a file, but for this example we'll use a string.
>>> global_config = '''
... {
...     // Additional global configuration parameters can be added here.
...     /* For instance:
...     "storage_host": "some-remote-host",
...     "storage_user": "mlos_bench",
...     "storage_pass": "SuperSecretPassword",
...     */
... }
... '''
>>> storage_config = '''
... {
...     "class": "mlos_bench.storage.sql.storage.SqlStorage",
...     "config": {
...         // Don't create the schema until we actually need it.
...         // (helps speed up initial launch and tests)
...         "lazy_schema_create": true,
...         // Parameters below must match kwargs of `sqlalchemy.URL.create()`:
...         // Normally, we'd specify a real database, but for testing examples
...         // we'll use an in-memory one.
...         "drivername": "sqlite",
...         "database": ":memory:"
...         // Otherwise we might use something like the following
...         // to pull the values from the globals:
...         /*
...         "host": "$storage_host",
...         "username": "$storage_user",
...         "password": "$storage_pass",
...         */
...     }
... }
... '''
>>> from mlos_bench.storage import from_config
>>> storage = from_config(storage_config, global_configs=[global_config])
>>> storage
sqlite::memory:

>>> #
>>> # Internally, mlos_bench will use this config and storage backend to track
>>> # Experiments and Trials it creates.
>>> # Most users won't need to do that, but it works something like the following:
>>> # Create a new experiment with a single trial.
>>> # (Normally, we'd use a real environment config, but for this example we'll use a string.)
>>> #
>>> # Create a dummy tunable group.
>>> from mlos_bench.services.config_persistence import ConfigPersistenceService
>>> config_persistence_service = ConfigPersistenceService()
>>> tunables_config = '''
... {
...     "param_group": {
...         "cost": 1,
...         "params": {
...             "param1": {
...                 "type": "int",
...                 "range": [0, 100],
...                 "default": 50
...             }
...         }
...     }
... }
... '''
>>> tunables = config_persistence_service.load_tunables([tunables_config])
>>> from mlos_bench.environments.status import Status
>>> from datetime import datetime
>>> with storage.experiment(
...     experiment_id="my_experiment_id",
...     trial_id=1,
...     root_env_config="root_env_config_info",
...     description="some description",
...     tunables=tunables,
...     opt_targets={"objective_metric": "min"},
... ) as experiment:
...     # Create a dummy trial.
...     trial = experiment.new_trial(tunables=tunables)
...     # Pretend something ran with that trial and we have the results now.
...     _ = trial.update(Status.SUCCEEDED, datetime.now(), {"objective_metric": 42})

>>> #
>>> # Now, once there's data to look at, in a Jupyter notebook or similar,
>>> # we can also use the storage object to view the results.
>>> #
>>> storage.experiments
{'my_experiment_id': Experiment :: my_experiment_id: 'some description'}
>>> # Access ExperimentData by experiment id.
>>> experiment_data = storage.experiments["my_experiment_id"]
>>> experiment_data.trials
{1: Trial :: my_experiment_id:1 cid:1 SUCCEEDED}
>>> # Access TrialData for an Experiment by trial id.
>>> trial_data = experiment_data.trials[1]
>>> assert trial_data.status == Status.SUCCEEDED
>>> # Retrieve the tunable configuration from the TrialData as a dictionary.
>>> trial_config_data = trial_data.tunable_config
>>> trial_config_data.config_dict
{'param1': 50}
>>> # Retrieve the results from the TrialData as a dictionary.
>>> trial_data.results_dict
{'objective_metric': 42}
>>> # Retrieve the results of all Trials in the Experiment as a DataFrame.
>>> experiment_data.results_df.columns.tolist()
['trial_id', 'ts_start', 'ts_end', 'tunable_config_id', 'tunable_config_trial_group_id', 'status', 'config.param1', 'result.objective_metric']
>>> # Drop the timestamp columns to make it a repeatable test.
>>> experiment_data.results_df.drop(columns=["ts_start", "ts_end"])
   trial_id  tunable_config_id  tunable_config_trial_group_id     status  config.param1  result.objective_metric
0         1                  1                              1  SUCCEEDED             50                       42
<BLANKLINE>
[1 rows x 6 columns]

See Also
--------
mlos_bench.storage.base_storage : Base interface for backends.
mlos_bench.storage.base_experiment_data : Base interface for ExperimentData.
mlos_bench.storage.base_trial_data : Base interface for TrialData.

Notes
-----
- See `sqlite-autotuning notebooks
  <https://github.com/Microsoft-CISL/sqlite-autotuning/blob/main/mlos_demo_sqlite_teachers.ipynb>`_
  for additional examples.
"""  # pylint: disable=line-too-long # noqa: E501

from mlos_bench.storage.base_storage import Storage
from mlos_bench.storage.storage_factory import from_config

__all__ = [
    "Storage",
    "from_config",
]

170 "Storage", 

171 "from_config", 

172]