earthdaily.earthdatastore.parallel_search module
- exception earthdaily.earthdatastore.parallel_search.NoItemsFoundError[source]
Bases: Exception
Exception raised when no items are found during search operation.
This exception is raised when a parallel search operation yields no results, indicating that the search criteria did not match any items in the dataset.
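A typical handling pattern is to catch this exception to distinguish an empty result set from other failures. The sketch below uses a local stand-in class mirroring NoItemsFoundError (so it runs without the earthdaily package) and a hypothetical search_or_raise helper for illustration:

```python
# Stand-in mirroring NoItemsFoundError so the handling pattern can be
# shown without the earthdaily package installed.
class NoItemsFoundError(Exception):
    """Raised when a search yields no items (stand-in for the real class)."""


def search_or_raise(items):
    # Hypothetical helper: raise when the search produced no results.
    if not items:
        raise NoItemsFoundError("search criteria matched no items")
    return items


try:
    search_or_raise([])
except NoItemsFoundError as err:
    print(f"no results: {err}")
```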
- earthdaily.earthdatastore.parallel_search.datetime_split(dt_range: Tuple[datetime | Timestamp, datetime | Timestamp], freq: str | int | Timedelta = 'auto', n_jobs: int = 10) Tuple[datetime | Timestamp, datetime | Timestamp] | Tuple[List[Tuple[datetime | Timestamp, datetime | Timestamp]], Timedelta] [source]
Split a datetime range into smaller chunks based on specified frequency.
- Parameters:
dt_range (tuple of (datetime or Timestamp)) – A tuple containing the start and end datetimes to split.
freq (str or int or Timedelta, default="auto") – The frequency to use for splitting the datetime range. If "auto", the frequency is derived from the total date range: it increases by 5 days for every 6 months in the range. If an int, it is interpreted as a number of days. If a Timedelta, it is used directly as the splitting frequency.
n_jobs (int, default=10) – Number of jobs for parallel processing (currently unused in the function but maintained for API compatibility).
- Returns:
- If the date range is smaller than the frequency:
Returns the original datetime range tuple.
- Otherwise:
Returns a tuple containing:
- A list of datetime range tuples split by the frequency
- The Timedelta frequency used for splitting
- Return type:
Union[DatetimeRange, tuple[list[DatetimeRange], Timedelta]]
Notes
The automatic frequency calculation uses the formula: freq = total_days // (5 + 5 * (total_days // 183))
This ensures that the frequency increases by 5 days for every 6-month period in the total date range.
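The formula above can be sketched as a small helper (auto_freq_days is an illustrative name, not part of the library's API):

```python
import pandas as pd


def auto_freq_days(start: pd.Timestamp, end: pd.Timestamp) -> int:
    """Sketch of the documented "auto" chunk-size formula, in days."""
    total_days = (end - start).days
    # Every full 6 months (~183 days) adds 5 days to the divisor,
    # so larger ranges are split into proportionally larger chunks.
    return total_days // (5 + 5 * (total_days // 183))


# A one-year range spans 364 days: 364 // (5 + 5 * 1) = 36-day chunks.
print(auto_freq_days(pd.Timestamp("2023-01-01"), pd.Timestamp("2023-12-31")))
```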
Examples
>>> start = pd.Timestamp('2023-01-01')
>>> end = pd.Timestamp('2023-12-31')
>>> splits, freq = datetime_split((start, end))
>>> len(splits)  # Number of chunks
12
>>> # Using fixed frequency
>>> splits, freq = datetime_split((start, end), freq=30)  # 30 days
>>> freq
Timedelta('30 days')
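The splitting behavior described above can be approximated with pandas alone. This is a sketch of the idea, not the library's implementation; split_range is an illustrative name:

```python
import pandas as pd


def split_range(start, end, freq_days):
    """Split [start, end] into (start, end) chunks of roughly freq_days days."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    freq = pd.Timedelta(days=freq_days)
    edges = list(pd.date_range(start, end, freq=freq))
    if edges[-1] < end:
        edges.append(end)  # keep the final partial chunk
    return list(zip(edges[:-1], edges[1:])), freq


chunks, freq = split_range("2023-01-01", "2023-12-31", 30)
print(len(chunks))  # twelve 30-day chunks plus one short tail chunk
```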
- earthdaily.earthdatastore.parallel_search.datetime_to_str(dt_range: Tuple[datetime | Timestamp, datetime | Timestamp]) Tuple[str, str] [source]
Convert a datetime range to a tuple of formatted strings.
- Parameters:
dt_range (tuple of (datetime or Timestamp)) – A tuple containing start and end datetimes to be converted.
- Returns:
A tuple containing two strings representing the formatted start and end dates.
- Return type:
tuple of str
Notes
This function relies on ItemSearch._format_datetime internally to perform the actual formatting. The returned strings are split from a forward-slash separated string format.
Examples
>>> start = pd.Timestamp('2023-01-01')
>>> end = pd.Timestamp('2023-12-31')
>>> datetime_to_str((start, end))
('2023-01-01', '2023-12-31')
- earthdaily.earthdatastore.parallel_search.parallel_search(func: Callable[[...], T]) Callable[[...], T] [source]
Decorator for parallelizing search operations across datetime ranges.
This decorator enables parallel processing of search operations by splitting the datetime range into batches. It automatically handles parallel execution when conditions are met (multiple batches or large date range) and falls back to sequential processing otherwise.
- Parameters:
func (callable) – The search function to be parallelized. Should accept the following kwargs:
- datetime : tuple of datetime
Range of dates to search
- batch_days : int or "auto", optional
Number of days per batch for splitting
- n_jobs : int, optional
Number of parallel jobs. Values of -1 or greater than 10 are capped at 10 jobs
- raise_no_items : bool, optional
Whether to raise an exception when no items are found
- Returns:
Wrapped function that handles parallel execution of the search operation.
- Return type:
callable
Notes
The wrapped function preserves the same interface as the original function but adds parallel processing capabilities based on the following parameters in kwargs:
- batch_days : Controls the size of datetime batches
- n_jobs : Controls the number of parallel jobs (max 10)
- datetime : Required for parallel execution
The parallel execution uses threading backend from joblib.
See also
joblib.Parallel
Used for parallel execution
datetime_split
Helper function for splitting datetime ranges
Examples
>>> @parallel_search
... def search_items(query, datetime=None, batch_days="auto", n_jobs=1):
...     # Search implementation
...     return items
>>>
>>> # Will execute in parallel if conditions are met
>>> items = search_items("query",
...                      datetime=(start_date, end_date),
...                      batch_days=30,
...                      n_jobs=4)
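For intuition, the decorator's behavior can be sketched end to end: split the datetime range into batches, fan the calls out over joblib's threading backend, and merge the per-batch results. This is an illustrative reimplementation under assumptions stated in the comments, not the library's code; parallel_search_sketch and fake_search are made-up names:

```python
from functools import wraps

import pandas as pd
from joblib import Parallel, delayed


def parallel_search_sketch(func):
    """Illustrative stand-in for the parallel_search decorator."""

    @wraps(func)
    def wrapper(*args, datetime=None, batch_days=None, n_jobs=1, **kwargs):
        if datetime is None or batch_days is None:
            # No range or batch size given: fall back to a single call.
            return func(*args, datetime=datetime, **kwargs)
        start, end = pd.Timestamp(datetime[0]), pd.Timestamp(datetime[1])
        edges = list(pd.date_range(start, end, freq=pd.Timedelta(days=batch_days)))
        if edges[-1] < end:
            edges.append(end)
        batches = list(zip(edges[:-1], edges[1:]))
        # Cap at 10 jobs, as the docstring describes for n_jobs=-1 or >10.
        n_jobs = 10 if n_jobs == -1 else min(n_jobs, 10)
        results = Parallel(n_jobs=n_jobs, backend="threading")(
            delayed(func)(*args, datetime=batch, **kwargs) for batch in batches
        )
        # Flatten the per-batch result lists into a single list.
        return [item for batch in results for item in batch]

    return wrapper


@parallel_search_sketch
def fake_search(query, datetime=None):
    # Toy search: one "item" per call, tagged with its batch start date.
    return [f"{query}:{datetime[0].date()}"]


items = fake_search("q", datetime=("2023-01-01", "2023-03-01"),
                    batch_days=30, n_jobs=4)
print(len(items))  # one item per 30-day batch
```

The threading backend is the natural choice here because search calls are I/O-bound (network requests), so threads overlap waiting time without the serialization cost of process-based parallelism.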