earthdaily.earthdatastore.parallel_search module
- exception earthdaily.earthdatastore.parallel_search.NoItemsFoundError[source]
Bases: Exception
Exception raised when no items are found during search operation.
This exception is raised when a parallel search operation yields no results, indicating that the search criteria did not match any items in the dataset.
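A typical handling pattern is to catch this exception to distinguish an empty result set from other failures. The sketch below uses a local stand-in class mirroring NoItemsFoundError (so it runs without the earthdaily package) and a hypothetical search_or_raise helper for illustration:

```python
# Stand-in mirroring NoItemsFoundError so the handling pattern can be
# shown without the earthdaily package installed.
class NoItemsFoundError(Exception):
    """Raised when a search yields no items (stand-in for the real class)."""


def search_or_raise(items):
    # Hypothetical helper: raise when the search produced no results.
    if not items:
        raise NoItemsFoundError("search criteria matched no items")
    return items


try:
    search_or_raise([])
except NoItemsFoundError as err:
    print(f"no results: {err}")
```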
- earthdaily.earthdatastore.parallel_search.datetime_split(dt_range: Tuple[datetime | Timestamp, datetime | Timestamp], freq: str | int | Timedelta = 'auto', n_jobs: int = 10) Tuple[datetime | Timestamp, datetime | Timestamp] | Tuple[List[Tuple[datetime | Timestamp, datetime | Timestamp]], Timedelta] [source]
Split a datetime range into smaller chunks based on specified frequency.
- Parameters:
dt_range (tuple of (datetime or Timestamp)) – A tuple containing the start and end datetimes to split.
freq (str or int or Timedelta, default="auto") – The frequency to use for splitting the datetime range. If "auto", the frequency is derived from the total date range: it increases by 5 days for every 6 months in the range. If an int, it is interpreted as a number of days. If a Timedelta, it is used directly as the splitting frequency.
n_jobs (int, default=10) – Number of jobs for parallel processing (currently unused in the function but maintained for API compatibility).
- Returns:
- If the date range is smaller than the frequency:
Returns the original datetime range tuple.
- Otherwise:
Returns a tuple containing:
- A list of datetime range tuples split by the frequency
- The Timedelta frequency used for splitting
- Return type:
Union[DatetimeRange, tuple[list[DatetimeRange], Timedelta]]
Notes
The automatic frequency calculation uses the formula: freq = total_days // (5 + 5 * (total_days // 183))
This ensures that the frequency increases by 5 days for every 6-month period in the total date range.
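The formula above can be sketched as a small helper (auto_freq_days is an illustrative name, not part of the library's API):

```python
import pandas as pd


def auto_freq_days(start: pd.Timestamp, end: pd.Timestamp) -> int:
    """Sketch of the documented "auto" chunk-size formula, in days."""
    total_days = (end - start).days
    # Every full 6 months (~183 days) adds 5 days to the divisor,
    # so larger ranges are split into proportionally larger chunks.
    return total_days // (5 + 5 * (total_days // 183))


# A one-year range spans 364 days: 364 // (5 + 5 * 1) = 36-day chunks.
print(auto_freq_days(pd.Timestamp("2023-01-01"), pd.Timestamp("2023-12-31")))
```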
Examples
>>> start = pd.Timestamp('2023-01-01')
>>> end = pd.Timestamp('2023-12-31')
>>> splits, freq = datetime_split((start, end))
>>> len(splits)  # Number of chunks
12
>>> # Using fixed frequency
>>> splits, freq = datetime_split((start, end), freq=30)  # 30 days
>>> freq
Timedelta('30 days')
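The splitting behavior described above can be approximated with pandas alone. This is a sketch of the idea, not the library's implementation; split_range is an illustrative name:

```python
import pandas as pd


def split_range(start, end, freq_days):
    """Split [start, end] into (start, end) chunks of roughly freq_days days."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    freq = pd.Timedelta(days=freq_days)
    edges = list(pd.date_range(start, end, freq=freq))
    if edges[-1] < end:
        edges.append(end)  # keep the final partial chunk
    return list(zip(edges[:-1], edges[1:])), freq


chunks, freq = split_range("2023-01-01", "2023-12-31", 30)
print(len(chunks))  # twelve 30-day chunks plus one short tail chunk
```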
- earthdaily.earthdatastore.parallel_search.datetime_to_str(dt_range: Tuple[datetime | Timestamp, datetime | Timestamp]) Tuple[str, str] [source]
Convert a datetime range to a tuple of formatted strings.
- Parameters:
dt_range (tuple of (datetime or Timestamp)) – A tuple containing start and end datetimes to be converted.
- Returns:
A tuple containing two strings representing the formatted start and end dates.
- Return type:
tuple of str
Notes
This function relies on ItemSearch._format_datetime internally to perform the actual formatting. The returned strings are split from a forward-slash separated string format.
Examples
>>> start = pd.Timestamp('2023-01-01')
>>> end = pd.Timestamp('2023-12-31')
>>> datetime_to_str((start, end))
('2023-01-01', '2023-12-31')
- earthdaily.earthdatastore.parallel_search.parallel_search(func: Callable[[...], T]) Callable[[...], T] [source]
Decorator for parallelizing search operations across datetime ranges.
This decorator enables parallel processing of search operations by splitting the datetime range into batches. It automatically handles parallel execution when conditions are met (multiple batches or large date range) and falls back to sequential processing otherwise.
- Parameters:
func (callable) – The search function to be parallelized. Should accept the following kwargs:
- datetime : tuple of datetime
Range of dates to search
- batch_days : int or "auto", optional
Number of days per batch for splitting
- n_jobs : int, optional
Number of parallel jobs. Values of -1 or greater than 10 are capped at 10 jobs
- raise_no_items : bool, optional
Whether to raise an exception when no items are found
- Returns:
Wrapped function that handles parallel execution of the search operation.
- Return type:
callable
Notes
The wrapped function preserves the same interface as the original function but adds parallel processing capabilities based on the following parameters in kwargs:
- batch_days : Controls the size of datetime batches
- n_jobs : Controls the number of parallel jobs (max 10)
- datetime : Required for parallel execution
The parallel execution uses threading backend from joblib.
See also
joblib.Parallel
Used for parallel execution
datetime_split
Helper function for splitting datetime ranges
Examples
>>> @parallel_search
... def search_items(query, datetime=None, batch_days="auto", n_jobs=1):
...     # Search implementation
...     return items
>>>
>>> # Will execute in parallel if conditions are met
>>> items = search_items("query",
...                      datetime=(start_date, end_date),
...                      batch_days=30,
...                      n_jobs=4)
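For intuition, the decorator's behavior can be sketched end to end: split the datetime range into batches, fan the calls out over joblib's threading backend, and merge the per-batch results. This is an illustrative reimplementation under assumptions stated in the comments, not the library's code; parallel_search_sketch and fake_search are made-up names:

```python
from functools import wraps

import pandas as pd
from joblib import Parallel, delayed


def parallel_search_sketch(func):
    """Illustrative stand-in for the parallel_search decorator."""

    @wraps(func)
    def wrapper(*args, datetime=None, batch_days=None, n_jobs=1, **kwargs):
        if datetime is None or batch_days is None:
            # No range or batch size given: fall back to a single call.
            return func(*args, datetime=datetime, **kwargs)
        start, end = pd.Timestamp(datetime[0]), pd.Timestamp(datetime[1])
        edges = list(pd.date_range(start, end, freq=pd.Timedelta(days=batch_days)))
        if edges[-1] < end:
            edges.append(end)
        batches = list(zip(edges[:-1], edges[1:]))
        # Cap at 10 jobs, as the docstring describes for n_jobs=-1 or >10.
        n_jobs = 10 if n_jobs == -1 else min(n_jobs, 10)
        results = Parallel(n_jobs=n_jobs, backend="threading")(
            delayed(func)(*args, datetime=batch, **kwargs) for batch in batches
        )
        # Flatten the per-batch result lists into a single list.
        return [item for batch in results for item in batch]

    return wrapper


@parallel_search_sketch
def fake_search(query, datetime=None):
    # Toy search: one "item" per call, tagged with its batch start date.
    return [f"{query}:{datetime[0].date()}"]


items = fake_search("q", datetime=("2023-01-01", "2023-03-01"),
                    batch_days=30, n_jobs=4)
print(len(items))  # one item per 30-day batch
```

The threading backend is the natural choice here because search calls are I/O-bound (network requests), so threads overlap waiting time without the serialization cost of process-based parallelism.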