1
0
mirror of https://github.com/yt-dlp/yt-dlp synced 2025-12-18 15:15:42 +07:00

Compare commits

..

3 Commits

Author SHA1 Message Date
pukkandan
d392c66fb4 Release 2021.03.21 2021-03-22 02:56:39 +05:30
pukkandan
a31a3a791c Update to ytdl-commit-7e79ba7
7e79ba7dd6
[vimeo:album] Fix extraction for albums with number of videos multiple to page size
2021-03-22 02:56:38 +05:30
Matthew
89a0d0c071 [youtube] Show if video is private, unlisted etc in new field availability
Authored by: colethedj, pukkandan
2021-03-22 02:53:34 +05:30
26 changed files with 385 additions and 647 deletions

View File

@@ -21,7 +21,7 @@ ## Checklist
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.24. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.15. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
- Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ ## Checklist
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running yt-dlp version **2021.03.24**
- [ ] I've verified that I'm running yt-dlp version **2021.03.15**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@@ -44,7 +44,7 @@ ## Verbose log
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] yt-dlp version 2021.03.24
[debug] yt-dlp version 2021.03.15
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -21,7 +21,7 @@ ## Checklist
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.24. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.15. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://github.com/yt-dlp/yt-dlp. yt-dlp does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ ## Checklist
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running yt-dlp version **2021.03.24**
- [ ] I've verified that I'm running yt-dlp version **2021.03.15**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@@ -21,13 +21,13 @@ ## Checklist
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.24. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.15. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running yt-dlp version **2021.03.24**
- [ ] I've verified that I'm running yt-dlp version **2021.03.15**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@@ -21,7 +21,7 @@ ## Checklist
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.24. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.15. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
- Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -30,7 +30,7 @@ ## Checklist
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running yt-dlp version **2021.03.24**
- [ ] I've verified that I'm running yt-dlp version **2021.03.15**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -46,7 +46,7 @@ ## Verbose log
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] yt-dlp version 2021.03.24
[debug] yt-dlp version 2021.03.15
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -21,13 +21,13 @@ ## Checklist
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.24. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.15. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running yt-dlp version **2021.03.24**
- [ ] I've verified that I'm running yt-dlp version **2021.03.15**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@@ -31,7 +31,3 @@ DennyDai
codeasashu
teesid
kevinoconnor7
damianoamatruda
2ShedsJackson
CXwudi
xtkoba

View File

@@ -17,23 +17,9 @@ # Instuctions for creating release
-->
### 2021.03.24.1
* Revert [commit/8562218](https://github.com/ytdl-org/youtube-dl/commit/8562218350a79d4709da8593bb0c538aa0824acf)
### 2021.03.24
* Merge youtube-dl: Upto [commit/8562218](https://github.com/ytdl-org/youtube-dl/commit/8562218350a79d4709da8593bb0c538aa0824acf)
* Parse metadata from multiple fields using `--parse-metadata`
* Ability to load playlist infojson using `--load-info-json`
* Write current epoch to infojson when using `--no-clean-infojson`
* [youtube_live_chat] fix bug when trying to set cookies
* [niconico] Fix for when logged in by: @CXwudi and @xtkoba
* [linuxacadamy] Fix login
### 2021.03.21
* Merge youtube-dl: Upto [commit/7e79ba7](https://github.com/ytdl-org/youtube-dl/commit/7e79ba7dd6e6649dd2ce3a74004b2044f2182881)
* Option `--no-clean-infojson` to keep private keys in the infojson
* Option `--clean-infojson` to keep private keys in the infojson
* [aria2c] Support retry/abort unavailable fragments by [damianoamatruda](https://github.com/damianoamatruda)
* [aria2c] Better default arguments
* [movefiles] Fix bugs and make more robust

View File

@@ -3,7 +3,7 @@ # YT-DLP
[![Release version](https://img.shields.io/github/v/release/yt-dlp/yt-dlp?color=brightgreen&label=Release)](https://github.com/yt-dlp/yt-dlp/releases/latest)
[![License: Unlicense](https://img.shields.io/badge/License-Unlicense-blue.svg)](LICENSE)
[![CI Status](https://github.com/yt-dlp/yt-dlp/workflows/Core%20Tests/badge.svg?branch=master)](https://github.com/yt-dlp/yt-dlp/actions)
[![Discord](https://img.shields.io/discord/807245652072857610?color=blue&label=discord&logo=discord)](https://discord.gg/H5MNcFW63r)
[![Discord](https://img.shields.io/discord/807245652072857610?color=blue&label=discord&logo=discord)](https://discord.gg/S75JaBna)
[![Commits](https://img.shields.io/github/commit-activity/m/yt-dlp/yt-dlp?label=commits)](https://github.com/yt-dlp/yt-dlp/commits)
[![Last Commit](https://img.shields.io/github/last-commit/yt-dlp/yt-dlp/master)](https://github.com/yt-dlp/yt-dlp/commits)
@@ -58,7 +58,7 @@ # NEW FEATURES
* **[Format Sorting](#sorting-formats)**: The default format sorting options have been changed so that higher resolution and better codecs will be now preferred instead of simply using larger bitrate. Furthermore, you can now specify the sort order using `-S`. This allows for much easier format selection that what is possible by simply using `--format` ([examples](#format-selection-examples))
* **Merged with youtube-dl v2021.03.25**: You get all the latest features and patches of [youtube-dl](https://github.com/ytdl-org/youtube-dl) in addition to all the features of [youtube-dlc](https://github.com/blackjack4494/yt-dlc)
* **Merged with youtube-dl v2021.03.14**: You get all the latest features and patches of [youtube-dl](https://github.com/ytdl-org/youtube-dl) in addition to all the features of [youtube-dlc](https://github.com/blackjack4494/yt-dlc)
* **Merged with animelover1984/youtube-dl**: You get most of the features and improvements from [animelover1984/youtube-dl](https://github.com/animelover1984/youtube-dl) including `--get-comments`, `BiliBiliSearch`, `BilibiliChannel`, Embedding thumbnail in mp4/ogg/opus, Playlist infojson etc. Note that the NicoNico improvements are not available. See [#31](https://github.com/yt-dlp/yt-dlp/pull/31) for details.
@@ -670,24 +670,18 @@ ## Post-Processing Options:
--add-metadata Write metadata to the video file
--no-add-metadata Do not write metadata (default)
--parse-metadata FIELD:FORMAT Parse additional metadata like title/artist
from other fields. Give a template or field
name to extract data from and the format to
interpret it as, seperated by a ":". Either
regular expression with named capture
groups or a similar syntax to the output
template can be used for the FORMAT.
Similarly, the syntax for output template
can be used for FIELD to parse the data
from multiple fields. The parsed parameters
replace any existing values and can be used
in output templates. This option can be
used multiple times. Example: --parse-
metadata "title:%(artist)s - %(title)s"
matches a title like "Coldplay - Paradise".
Example: --parse-metadata "%(series)s
%(episode_number)s:%(title)s" sets the
title using series and episode number.
Example (regex): --parse-metadata
from other fields. Give field name to
extract data from, and format of the field
seperated by a ":". Either regular
expression with named capture groups or a
similar syntax to the output template can
also be used. The parsed parameters replace
any existing values and can be use in
output template. This option can be used
multiple times. Example: --parse-metadata
"title:%(artist)s - %(title)s" matches a
title like "Coldplay - Paradise". Example
(regex): --parse-metadata
"description:Artist - (?P<artist>.+?)"
--xattrs Write metadata to the video file's xattrs
(using dublin core and xdg standards)

View File

@@ -97,8 +97,7 @@ # Supported sites
- **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:episodes**
- **bbc.co.uk:iplayer:group**
- **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist**
- **BBVTV**
- **Beatport**
@@ -1252,6 +1251,5 @@ # Supported sites
- **zee5:series**
- **Zhihu**
- **zingmp3**: mp3.zing.vn
- **zingmp3:album**
- **zoom**
- **Zype**

View File

@@ -60,14 +60,12 @@
encode_compat_str,
encodeFilename,
error_to_compat_str,
EntryNotInPlaylist,
ExistingVideoReached,
expand_path,
ExtractorError,
float_or_none,
format_bytes,
format_field,
FORMAT_RE,
formatSeconds,
GeoRestrictedError,
int_or_none,
@@ -773,93 +771,95 @@ def parse_outtmpl(self):
'Put from __future__ import unicode_literals at the top of your code file or consider switching to Python 3.x.')
return outtmpl_dict
def prepare_outtmpl(self, outtmpl, info_dict, sanitize=None):
""" Make the template and info_dict suitable for substitution (outtmpl % info_dict)"""
template_dict = dict(info_dict)
# duration_string
template_dict['duration_string'] = ( # %(duration>%H-%M-%S)s is wrong if duration > 24hrs
formatSeconds(info_dict['duration'], '-')
if info_dict.get('duration', None) is not None
else None)
# epoch
template_dict['epoch'] = int(time.time())
# autonumber
autonumber_size = self.params.get('autonumber_size')
if autonumber_size is None:
autonumber_size = 5
template_dict['autonumber'] = self.params.get('autonumber_start', 1) - 1 + self._num_downloads
# resolution if not defined
if template_dict.get('resolution') is None:
if template_dict.get('width') and template_dict.get('height'):
template_dict['resolution'] = '%dx%d' % (template_dict['width'], template_dict['height'])
elif template_dict.get('height'):
template_dict['resolution'] = '%sp' % template_dict['height']
elif template_dict.get('width'):
template_dict['resolution'] = '%dx?' % template_dict['width']
if sanitize is None:
sanitize = lambda k, v: v
template_dict = dict((k, v if isinstance(v, compat_numeric_types) else sanitize(k, v))
for k, v in template_dict.items()
if v is not None and not isinstance(v, (list, tuple, dict)))
na = self.params.get('outtmpl_na_placeholder', 'NA')
template_dict = collections.defaultdict(lambda: na, template_dict)
# For fields playlist_index and autonumber convert all occurrences
# of %(field)s to %(field)0Nd for backward compatibility
field_size_compat_map = {
'playlist_index': len(str(template_dict['n_entries'])),
'autonumber': autonumber_size,
}
FIELD_SIZE_COMPAT_RE = r'(?<!%)%\((?P<field>autonumber|playlist_index)\)s'
mobj = re.search(FIELD_SIZE_COMPAT_RE, outtmpl)
if mobj:
outtmpl = re.sub(
FIELD_SIZE_COMPAT_RE,
r'%%(\1)0%dd' % field_size_compat_map[mobj.group('field')],
outtmpl)
numeric_fields = list(self._NUMERIC_FIELDS)
# Format date
FORMAT_DATE_RE = FORMAT_RE.format(r'(?P<key>(?P<field>\w+)>(?P<format>.+?))')
for mobj in re.finditer(FORMAT_DATE_RE, outtmpl):
conv_type, field, frmt, key = mobj.group('type', 'field', 'format', 'key')
if key in template_dict:
continue
value = strftime_or_none(template_dict.get(field), frmt, na)
if conv_type in 'crs': # string
value = sanitize(field, value)
else: # number
numeric_fields.append(key)
value = float_or_none(value, default=None)
if value is not None:
template_dict[key] = value
# Missing numeric fields used together with integer presentation types
# in format specification will break the argument substitution since
# string NA placeholder is returned for missing fields. We will patch
# output template for missing fields to meet string presentation type.
for numeric_field in numeric_fields:
if numeric_field not in template_dict:
outtmpl = re.sub(
FORMAT_RE.format(re.escape(numeric_field)),
r'%({0})s'.format(numeric_field), outtmpl)
return outtmpl, template_dict
def _prepare_filename(self, info_dict, tmpl_type='default'):
try:
template_dict = dict(info_dict)
template_dict['duration_string'] = ( # %(duration>%H-%M-%S)s is wrong if duration > 24hrs
formatSeconds(info_dict['duration'], '-')
if info_dict.get('duration', None) is not None
else None)
template_dict['epoch'] = int(time.time())
autonumber_size = self.params.get('autonumber_size')
if autonumber_size is None:
autonumber_size = 5
template_dict['autonumber'] = self.params.get('autonumber_start', 1) - 1 + self._num_downloads
if template_dict.get('resolution') is None:
if template_dict.get('width') and template_dict.get('height'):
template_dict['resolution'] = '%dx%d' % (template_dict['width'], template_dict['height'])
elif template_dict.get('height'):
template_dict['resolution'] = '%sp' % template_dict['height']
elif template_dict.get('width'):
template_dict['resolution'] = '%dx?' % template_dict['width']
sanitize = lambda k, v: sanitize_filename(
compat_str(v),
restricted=self.params.get('restrictfilenames'),
is_id=(k == 'id' or k.endswith('_id')))
template_dict = dict((k, v if isinstance(v, compat_numeric_types) else sanitize(k, v))
for k, v in template_dict.items()
if v is not None and not isinstance(v, (list, tuple, dict)))
na = self.params.get('outtmpl_na_placeholder', 'NA')
template_dict = collections.defaultdict(lambda: na, template_dict)
outtmpl = self.outtmpl_dict.get(tmpl_type, self.outtmpl_dict['default'])
outtmpl, template_dict = self.prepare_outtmpl(outtmpl, info_dict, sanitize)
force_ext = OUTTMPL_TYPES.get(tmpl_type)
# For fields playlist_index and autonumber convert all occurrences
# of %(field)s to %(field)0Nd for backward compatibility
field_size_compat_map = {
'playlist_index': len(str(template_dict['n_entries'])),
'autonumber': autonumber_size,
}
FIELD_SIZE_COMPAT_RE = r'(?<!%)%\((?P<field>autonumber|playlist_index)\)s'
mobj = re.search(FIELD_SIZE_COMPAT_RE, outtmpl)
if mobj:
outtmpl = re.sub(
FIELD_SIZE_COMPAT_RE,
r'%%(\1)0%dd' % field_size_compat_map[mobj.group('field')],
outtmpl)
# As of [1] format syntax is:
# %[mapping_key][conversion_flags][minimum_width][.precision][length_modifier]type
# 1. https://docs.python.org/2/library/stdtypes.html#string-formatting
FORMAT_RE = r'''(?x)
(?<!%)
%
\({0}\) # mapping key
(?:[#0\-+ ]+)? # conversion flags (optional)
(?:\d+)? # minimum field width (optional)
(?:\.\d+)? # precision (optional)
[hlL]? # length modifier (optional)
(?P<type>[diouxXeEfFgGcrs%]) # conversion type
'''
numeric_fields = list(self._NUMERIC_FIELDS)
# Format date
FORMAT_DATE_RE = FORMAT_RE.format(r'(?P<key>(?P<field>\w+)>(?P<format>.+?))')
for mobj in re.finditer(FORMAT_DATE_RE, outtmpl):
conv_type, field, frmt, key = mobj.group('type', 'field', 'format', 'key')
if key in template_dict:
continue
value = strftime_or_none(template_dict.get(field), frmt, na)
if conv_type in 'crs': # string
value = sanitize(field, value)
else: # number
numeric_fields.append(key)
value = float_or_none(value, default=None)
if value is not None:
template_dict[key] = value
# Missing numeric fields used together with integer presentation types
# in format specification will break the argument substitution since
# string NA placeholder is returned for missing fields. We will patch
# output template for missing fields to meet string presentation type.
for numeric_field in numeric_fields:
if numeric_field not in template_dict:
outtmpl = re.sub(
FORMAT_RE.format(re.escape(numeric_field)),
r'%({0})s'.format(numeric_field), outtmpl)
# expand_path translates '%%' into '%' and '$$' into '$'
# correspondingly that is not what we want since we need to keep
@@ -874,7 +874,6 @@ def _prepare_filename(self, info_dict, tmpl_type='default'):
# title "Hello $PATH", we don't want `$PATH` to be expanded.
filename = expand_path(outtmpl).replace(sep, '') % template_dict
force_ext = OUTTMPL_TYPES.get(tmpl_type)
if force_ext is not None:
filename = replace_extension(filename, force_ext, template_dict.get('ext'))
@@ -1181,16 +1180,48 @@ def __process_playlist(self, ie_result, download):
playlist = ie_result.get('title') or ie_result.get('id')
self.to_screen('[download] Downloading playlist: %s' % playlist)
if 'entries' not in ie_result:
raise EntryNotInPlaylist()
incomplete_entries = bool(ie_result.get('requested_entries'))
if incomplete_entries:
def fill_missing_entries(entries, indexes):
ret = [None] * max(*indexes)
for i, entry in zip(indexes, entries):
ret[i - 1] = entry
return ret
ie_result['entries'] = fill_missing_entries(ie_result['entries'], ie_result['requested_entries'])
if self.params.get('allow_playlist_files', True):
ie_copy = {
'playlist': playlist,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': 0
}
ie_copy.update(dict(ie_result))
if self.params.get('writeinfojson', False):
infofn = self.prepare_filename(ie_copy, 'pl_infojson')
if not self._ensure_dir_exists(encodeFilename(infofn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(infofn)):
self.to_screen('[info] Playlist metadata is already present')
else:
playlist_info = dict(ie_result)
# playlist_info['entries'] = list(playlist_info['entries']) # Entries is a generator which shouldnot be resolved here
self.to_screen('[info] Writing playlist metadata as JSON to: ' + infofn)
try:
write_json_file(self.filter_requested_info(playlist_info, self.params.get('clean_infojson', True)), infofn)
except (OSError, IOError):
self.report_error('Cannot write playlist metadata to JSON file ' + infofn)
if self.params.get('writedescription', False):
descfn = self.prepare_filename(ie_copy, 'pl_description')
if not self._ensure_dir_exists(encodeFilename(descfn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(descfn)):
self.to_screen('[info] Playlist description is already present')
elif ie_result.get('description') is None:
self.report_warning('There\'s no playlist description to write.')
else:
try:
self.to_screen('[info] Writing playlist description to: ' + descfn)
with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
descfile.write(ie_result['description'])
except (OSError, IOError):
self.report_error('Cannot write playlist description file ' + descfn)
return
playlist_results = []
@@ -1217,20 +1248,25 @@ def iter_playlistitems(format):
def make_playlistitems_entries(list_ie_entries):
num_entries = len(list_ie_entries)
for i in playlistitems:
if -num_entries < i <= num_entries:
yield list_ie_entries[i - 1]
elif incomplete_entries:
raise EntryNotInPlaylist()
return [
list_ie_entries[i - 1] for i in playlistitems
if -num_entries <= i - 1 < num_entries]
def report_download(num_entries):
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, num_entries))
if isinstance(ie_entries, list):
n_all_entries = len(ie_entries)
if playlistitems:
entries = list(make_playlistitems_entries(ie_entries))
entries = make_playlistitems_entries(ie_entries)
else:
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
msg = 'Collected %d videos; downloading %d of them' % (n_all_entries, n_entries)
self.to_screen(
'[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
elif isinstance(ie_entries, PagedList):
if playlistitems:
entries = []
@@ -1242,73 +1278,25 @@ def make_playlistitems_entries(list_ie_entries):
entries = ie_entries.getslice(
playliststart, playlistend)
n_entries = len(entries)
msg = 'Downloading %d videos' % n_entries
report_download(n_entries)
else: # iterable
if playlistitems:
entries = list(make_playlistitems_entries(list(itertools.islice(
ie_entries, 0, max(playlistitems)))))
entries = make_playlistitems_entries(list(itertools.islice(
ie_entries, 0, max(playlistitems))))
else:
entries = list(itertools.islice(
ie_entries, playliststart, playlistend))
n_entries = len(entries)
msg = 'Downloading %d videos' % n_entries
if any((entry is None for entry in entries)):
raise EntryNotInPlaylist()
if not playlistitems and (playliststart or playlistend):
playlistitems = list(range(1 + playliststart, 1 + playliststart + len(entries)))
ie_result['entries'] = entries
ie_result['requested_entries'] = playlistitems
if self.params.get('allow_playlist_files', True):
ie_copy = {
'playlist': playlist,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': 0
}
ie_copy.update(dict(ie_result))
if self.params.get('writeinfojson', False):
infofn = self.prepare_filename(ie_copy, 'pl_infojson')
if not self._ensure_dir_exists(encodeFilename(infofn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(infofn)):
self.to_screen('[info] Playlist metadata is already present')
else:
self.to_screen('[info] Writing playlist metadata as JSON to: ' + infofn)
try:
write_json_file(self.filter_requested_info(ie_result, self.params.get('clean_infojson', True)), infofn)
except (OSError, IOError):
self.report_error('Cannot write playlist metadata to JSON file ' + infofn)
if self.params.get('writedescription', False):
descfn = self.prepare_filename(ie_copy, 'pl_description')
if not self._ensure_dir_exists(encodeFilename(descfn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(descfn)):
self.to_screen('[info] Playlist description is already present')
elif ie_result.get('description') is None:
self.report_warning('There\'s no playlist description to write.')
else:
try:
self.to_screen('[info] Writing playlist description to: ' + descfn)
with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
descfile.write(ie_result['description'])
except (OSError, IOError):
self.report_error('Cannot write playlist description file ' + descfn)
return
report_download(n_entries)
if self.params.get('playlistreverse', False):
entries = entries[::-1]
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
self.to_screen('[%s] playlist %s: %s' % (ie_result['extractor'], playlist, msg))
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
@@ -1322,7 +1310,7 @@ def make_playlistitems_entries(list_ie_entries):
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': playlistitems[i - 1] if playlistitems else i,
'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
@@ -2536,10 +2524,10 @@ def download_with_info_file(self, info_filename):
[info_filename], mode='r',
openhook=fileinput.hook_encoded('utf-8'))) as f:
# FileInput doesn't have a read method, we can't call json.load
info = self.filter_requested_info(json.loads('\n'.join(f)), self.params.get('clean_infojson', True))
info = self.filter_requested_info(json.loads('\n'.join(f)))
try:
self.process_ie_result(info, download=True)
except (DownloadError, EntryNotInPlaylist):
except DownloadError:
webpage_url = info.get('webpage_url')
if webpage_url is not None:
self.report_warning('The info failed to download, trying with "%s"' % webpage_url)
@@ -2551,10 +2539,9 @@ def download_with_info_file(self, info_filename):
@staticmethod
def filter_requested_info(info_dict, actually_filter=True):
if not actually_filter:
info_dict['epoch'] = int(time.time())
return info_dict
exceptions = {
'remove': ['requested_formats', 'requested_subtitles', 'requested_entries', 'filepath', 'entries'],
'remove': ['requested_formats', 'requested_subtitles', 'filepath', 'entries'],
'keep': ['_type'],
}
keep_key = lambda k: k in exceptions['keep'] or not (k.startswith('_') or k in exceptions['remove'])

View File

@@ -79,7 +79,8 @@ def download_and_parse_fragment(url, frag_index, request_data):
self._prepare_and_start_frag_download(ctx)
success, raw_fragment = dl_fragment(info_dict['url'])
success, raw_fragment = dl_fragment(
'https://www.youtube.com/watch?v={}'.format(video_id))
if not success:
return False
try:

View File

@@ -1,22 +1,17 @@
# coding: utf-8
from __future__ import unicode_literals
import functools
import itertools
import json
import re
from .common import InfoExtractor
from ..compat import (
compat_etree_Element,
compat_HTTPError,
compat_parse_qs,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
OnDemandPagedList,
clean_html,
dict_get,
float_or_none,
@@ -816,7 +811,7 @@ class BBCIE(BBCCoUkIE):
@classmethod
def suitable(cls, url):
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerEpisodesIE, BBCCoUkIPlayerGroupIE, BBCCoUkPlaylistIE)
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
else super(BBCIE, cls).suitable(url))
@@ -1343,149 +1338,21 @@ def _real_extract(self, url):
playlist_id, title, description)
class BBCCoUkIPlayerPlaylistBaseIE(InfoExtractor):
_VALID_URL_TMPL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/%%s/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
@staticmethod
def _get_default(episode, key, default_key='default'):
return try_get(episode, lambda x: x[key][default_key])
def _get_description(self, data):
synopsis = data.get(self._DESCRIPTION_KEY) or {}
return dict_get(synopsis, ('large', 'medium', 'small'))
def _fetch_page(self, programme_id, per_page, series_id, page):
elements = self._get_elements(self._call_api(
programme_id, per_page, page + 1, series_id))
for element in elements:
episode = self._get_episode(element)
episode_id = episode.get('id')
if not episode_id:
continue
thumbnail = None
image = self._get_episode_image(episode)
if image:
thumbnail = image.replace('{recipe}', 'raw')
category = self._get_default(episode, 'labels', 'category')
yield {
'_type': 'url',
'id': episode_id,
'title': self._get_episode_field(episode, 'subtitle'),
'url': 'https://www.bbc.co.uk/iplayer/episode/' + episode_id,
'thumbnail': thumbnail,
'description': self._get_description(episode),
'categories': [category] if category else None,
'series': self._get_episode_field(episode, 'title'),
'ie_key': BBCCoUkIE.ie_key(),
}
def _real_extract(self, url):
pid = self._match_id(url)
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
series_id = qs.get('seriesId', [None])[0]
page = qs.get('page', [None])[0]
per_page = 36 if page else self._PAGE_SIZE
fetch_page = functools.partial(self._fetch_page, pid, per_page, series_id)
entries = fetch_page(int(page) - 1) if page else OnDemandPagedList(fetch_page, self._PAGE_SIZE)
playlist_data = self._get_playlist_data(self._call_api(pid, 1))
return self.playlist_result(
entries, pid, self._get_playlist_title(playlist_data),
self._get_description(playlist_data))
class BBCCoUkIPlayerEpisodesIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:episodes'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'episodes'
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
_TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance',
'description': 'md5:58eb101aee3116bad4da05f91179c0cb',
'description': 'French thriller serial about a missing teenager.',
},
'playlist_mincount': 8,
'playlist_mincount': 6,
'skip': 'This programme is not currently available on BBC iPlayer',
}, {
# all seasons
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 10,
}, {
# explicit season
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster?seriesId=b094m6nv',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 5,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 37,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove?page=2',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 1,
}]
_PAGE_SIZE = 100
_DESCRIPTION_KEY = 'synopsis'
def _get_episode_image(self, episode):
return self._get_default(episode, 'image')
def _get_episode_field(self, episode, field):
return self._get_default(episode, field)
@staticmethod
def _get_elements(data):
return data['entities']['results']
@staticmethod
def _get_episode(element):
return element.get('episode') or {}
def _call_api(self, pid, per_page, page=1, series_id=None):
variables = {
'id': pid,
'page': page,
'perPage': per_page,
}
if series_id:
variables['sliceId'] = series_id
return self._download_json(
'https://graph.ibl.api.bbc.co.uk/', pid, headers={
'Content-Type': 'application/json'
}, data=json.dumps({
'id': '5692d93d5aac8d796a0305e895e61551',
'variables': variables,
}).encode('utf-8'))['data']['programme']
@staticmethod
def _get_playlist_data(data):
return data
def _get_playlist_title(self, data):
return self._get_default(data, 'title')
class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:group'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'group'
_TESTS = [{
# Available for over a year unlike 30 days for most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
@@ -1494,56 +1361,14 @@ class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 47,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7?page=2',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 11,
}]
_PAGE_SIZE = 200
_DESCRIPTION_KEY = 'synopses'
def _get_episode_image(self, episode):
return self._get_default(episode, 'images', 'standard')
def _get_episode_field(self, episode, field):
return episode.get(field)
@staticmethod
def _get_elements(data):
return data['elements']
@staticmethod
def _get_episode(element):
return element
def _call_api(self, pid, per_page, page=1, series_id=None):
return self._download_json(
'http://ibl.api.bbc.co.uk/ibl/v1/groups/%s/episodes' % pid,
pid, query={
'page': page,
'per_page': per_page,
})['group_episodes']
@staticmethod
def _get_playlist_data(data):
return data['group']
def _get_playlist_title(self, data):
return data.get('title')
def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
description = self._search_regex(
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
webpage, 'description', fatal=False, group='value')
return title, description
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):

View File

@@ -108,8 +108,7 @@
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
BBCCoUkIPlayerEpisodesIE,
BBCCoUkIPlayerGroupIE,
BBCCoUkIPlayerPlaylistIE,
BBCCoUkPlaylistIE,
BBCIE,
)
@@ -1674,14 +1673,9 @@
ZattooLiveIE,
)
from .zdf import ZDFIE, ZDFChannelIE
from .zee5 import (
Zee5IE,
Zee5SeriesIE,
)
from .zhihu import ZhihuIE
from .zingmp3 import (
ZingMp3IE,
ZingMp3AlbumIE,
)
from .zingmp3 import ZingMp3IE
from .zee5 import Zee5IE
from .zee5 import Zee5SeriesIE
from .zoom import ZoomIE
from .zype import ZypeIE

View File

@@ -2965,7 +2965,7 @@ def _real_extract(self, url):
webpage)
if not mobj:
mobj = re.search(
r'data-video-link=["\'](?P<url>http://m\.mlb\.com/video/[^"\']+)',
r'data-video-link=["\'](?P<url>http://m.mlb.com/video/[^"\']+)',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'MLB')

View File

@@ -112,7 +112,7 @@ def random_string():
'client_id': self._CLIENT_ID,
'redirect_uri': self._ORIGIN_URL,
'tenant': 'lacausers',
'connection': 'Username-Password-ACG-Proxy',
'connection': 'Username-Password-Authentication',
'username': username,
'password': password,
'sso': 'true',

View File

@@ -340,7 +340,7 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media\.mtvnservices\.com/embed/.+?)\1', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media.mtvnservices.com/embed/.+?)\1', webpage)
if mobj:
return mobj.group('url')

View File

@@ -492,12 +492,13 @@ def get_video_info_xml(items):
self._sort_formats(formats)
# Start extracting information
title = (
get_video_info_web(['originalTitle', 'title'])
or self._og_search_title(webpage, default=None)
or self._html_search_regex(
title = get_video_info_web('originalTitle')
if not title:
title = self._og_search_title(webpage, default=None)
if not title:
title = self._html_search_regex(
r'<span[^>]+class="videoHeaderTitle"[^>]*>([^<]+)</span>',
webpage, 'video title'))
webpage, 'video title')
watch_api_data_string = self._html_search_regex(
r'<div[^>]+id="watchAPIDataContainer"[^>]+>([^<]+)</div>',

View File

@@ -143,10 +143,7 @@ def _real_extract(self, url):
props_data = try_get(json_data, lambda x: x['props'], expected_type=dict)
# Chech statusCode for success
status = props_data.get('pageProps').get('statusCode')
if status == 0:
if props_data.get('pageProps').get('statusCode') == 0:
return self._extract_aweme(props_data, webpage, url)
elif status == 10216:
raise ExtractorError('This video is private', expected=True)
raise ExtractorError('Video not available', video_id=video_id)

View File

@@ -23,8 +23,6 @@ class VGTVIE(XstreamIE):
'fvn.no/fvntv': 'fvntv',
'aftenposten.no/webtv': 'aptv',
'ap.vgtv.no/webtv': 'aptv',
'tv.aftonbladet.se': 'abtv',
# obsolete URL schemas, kept in order to save one HTTP redirect
'tv.aftonbladet.se/abtv': 'abtv',
'www.aftonbladet.se/tv': 'abtv',
}
@@ -142,10 +140,6 @@ class VGTVIE(XstreamIE):
'url': 'http://www.vgtv.no/#!/video/127205/inside-the-mind-of-favela-funk',
'only_matching': True,
},
{
'url': 'https://tv.aftonbladet.se/video/36015/vulkanutbrott-i-rymden-nu-slapper-nasa-bilderna',
'only_matching': True,
},
{
'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'only_matching': True,

View File

@@ -1947,7 +1947,7 @@ def feed_entry(name):
f['format_id'] = itag
formats.append(f)
if self._downloader.params.get('youtube_include_dash_manifest', True):
if self._downloader.params.get('youtube_include_dash_manifest'):
dash_manifest_url = streaming_data.get('dashManifestUrl')
if dash_manifest_url:
for f in self._extract_mpd_formats(
@@ -2150,7 +2150,6 @@ def process_language(container, base_url, lang_code, query):
# This will error if there is no livechat
initial_data['contents']['twoColumnWatchNextResults']['conversationBar']['liveChatRenderer']['continuations'][0]['reloadContinuationData']['continuation']
info['subtitles']['live_chat'] = [{
'url': 'https://www.youtube.com/watch?v=%s' % video_id, # url is needed to set cookies
'video_id': video_id,
'ext': 'json',
'protocol': 'youtube_live_chat_replay',

View File

@@ -1,94 +1,93 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
update_url_query,
)
class ZingMp3BaseIE(InfoExtractor):
_VALID_URL_TMPL = r'https?://(?:mp3\.zing|zingmp3)\.vn/(?:%s)/[^/]+/(?P<id>\w+)\.html'
_GEO_COUNTRIES = ['VN']
class ZingMp3BaseInfoExtractor(InfoExtractor):
def _extract_item(self, item, fatal):
item_id = item['id']
title = item.get('name') or item['title']
formats = []
for k, v in (item.get('source') or {}).items():
if not v:
continue
if k in ('mp4', 'hls'):
for res, video_url in v.items():
if not video_url:
continue
if k == 'hls':
formats.extend(self._extract_m3u8_formats(
video_url, item_id, 'mp4',
'm3u8_native', m3u8_id=k, fatal=False))
elif k == 'mp4':
formats.append({
'format_id': 'mp4-' + res,
'url': video_url,
'height': int_or_none(self._search_regex(
r'^(\d+)p', res, 'resolution', default=None)),
})
else:
formats.append({
'ext': 'mp3',
'format_id': k,
'tbr': int_or_none(k),
'url': self._proto_relative_url(v),
'vcodec': 'none',
})
if not formats:
def _extract_item(self, item, page_type, fatal=True):
error_message = item.get('msg')
if error_message:
if not fatal:
return
msg = item['msg']
if msg == 'Sorry, this content is not available in your country.':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
raise ExtractorError(msg, expected=True)
self._sort_formats(formats)
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error_message),
expected=True)
subtitles = None
lyric = item.get('lyric')
if lyric:
subtitles = {
'origin': [{
'url': lyric,
}],
formats = []
for quality, source_url in zip(item.get('qualities') or item.get('quality', []), item.get('source_list') or item.get('source', [])):
if not source_url or source_url == 'require vip':
continue
if not re.match(r'https?://', source_url):
source_url = '//' + source_url
source_url = self._proto_relative_url(source_url, 'http:')
quality_num = int_or_none(quality)
f = {
'format_id': quality,
'url': source_url,
}
if page_type == 'video':
f.update({
'height': quality_num,
'ext': 'mp4',
})
else:
f.update({
'abr': quality_num,
'ext': 'mp3',
})
formats.append(f)
album = item.get('album') or {}
cover = item.get('cover')
return {
'id': item_id,
'title': title,
'title': (item.get('name') or item.get('title')).strip(),
'formats': formats,
'thumbnail': item.get('thumbnail'),
'subtitles': subtitles,
'duration': int_or_none(item.get('duration')),
'track': title,
'artist': item.get('artists_names'),
'album': album.get('name') or album.get('title'),
'album_artist': album.get('artists_names'),
'thumbnail': 'http:/' + cover if cover else None,
'artist': item.get('artist'),
}
def _real_extract(self, url):
page_id = self._match_id(url)
webpage = self._download_webpage(
url.replace('://zingmp3.vn/', '://mp3.zing.vn/'),
page_id, query={'play_song': 1})
data_path = self._search_regex(
r'data-xml="([^"]+)', webpage, 'data path')
return self._process_data(self._download_json(
'https://mp3.zing.vn/xhr' + data_path, page_id)['data'])
def _extract_player_json(self, player_json_url, id, page_type, playlist_title=None):
player_json = self._download_json(player_json_url, id, 'Downloading Player JSON')
items = player_json['data']
if 'item' in items:
items = items['item']
if len(items) == 1:
# one single song
data = self._extract_item(items[0], page_type)
data['id'] = id
return data
else:
# playlist of songs
entries = []
for i, item in enumerate(items, 1):
entry = self._extract_item(item, page_type, fatal=False)
if not entry:
continue
entry['id'] = '%s-%d' % (id, i)
entries.append(entry)
return {
'_type': 'playlist',
'id': id,
'title': playlist_title,
'entries': entries,
}
class ZingMp3IE(ZingMp3BaseIE):
_VALID_URL = ZingMp3BaseIE._VALID_URL_TMPL % 'bai-hat|video-clip'
class ZingMp3IE(ZingMp3BaseInfoExtractor):
_VALID_URL = r'https?://mp3\.zing\.vn/(?:bai-hat|album|playlist|video-clip)/[^/]+/(?P<id>\w+)\.html'
_TESTS = [{
'url': 'http://mp3.zing.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
'md5': 'ead7ae13693b3205cbc89536a077daed',
@@ -96,66 +95,49 @@ class ZingMp3IE(ZingMp3BaseIE):
'id': 'ZWZB9WAB',
'title': 'Xa Mãi Xa',
'ext': 'mp3',
'thumbnail': r're:^https?://.+\.jpg',
'subtitles': {
'origin': [{
'ext': 'lrc',
}]
},
'duration': 255,
'track': 'Xa Mãi Xa',
'artist': 'Bảo Thy',
'album': 'Special Album',
'album_artist': 'Bảo Thy',
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'https://mp3.zing.vn/video-clip/Suong-Hoa-Dua-Loi-K-ICM-RYO/ZO8ZF7C7.html',
'md5': 'e9c972b693aa88301ef981c8151c4343',
'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html',
'md5': '870295a9cd8045c0e15663565902618d',
'info_dict': {
'id': 'ZO8ZF7C7',
'title': 'Sương Hoa Đưa Lối',
'id': 'ZW6BAEA0',
'title': 'Let It Go (Frozen OST)',
'ext': 'mp4',
'thumbnail': r're:^https?://.+\.jpg',
'duration': 207,
'track': 'Sương Hoa Đưa Lối',
'artist': 'K-ICM, RYO',
},
}, {
'url': 'https://zingmp3.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
'info_dict': {
'_type': 'playlist',
'id': 'ZWZBWDAF',
'title': 'Lâu Đài Tình Ái - Bằng Kiều,Minh Tuyết | Album 320 lossless',
},
'playlist_count': 10,
'skip': 'removed at the request of the owner',
}, {
'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
'only_matching': True,
}]
IE_NAME = 'zingmp3'
IE_DESC = 'mp3.zing.vn'
def _process_data(self, data):
return self._extract_item(data, True)
def _real_extract(self, url):
page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id)
class ZingMp3AlbumIE(ZingMp3BaseIE):
_VALID_URL = ZingMp3BaseIE._VALID_URL_TMPL % 'album|playlist'
_TESTS = [{
'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
'info_dict': {
'_type': 'playlist',
'id': 'ZWZBWDAF',
'title': 'Lâu Đài Tình Ái',
},
'playlist_count': 10,
}, {
'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
'only_matching': True,
}, {
'url': 'https://zingmp3.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
'only_matching': True,
}]
IE_NAME = 'zingmp3:album'
player_json_url = self._search_regex([
r'data-xml="([^"]+)',
r'&amp;xmlURL=([^&]+)&'
], webpage, 'player xml url')
def _process_data(self, data):
def entries():
for item in (data.get('items') or []):
entry = self._extract_item(item, False)
if entry:
yield entry
info = data.get('info') or {}
return self.playlist_result(
entries(), info.get('id'), info.get('name') or info.get('title'))
playlist_title = None
page_type = self._search_regex(r'/(?:html5)?xml/([^/-]+)', player_json_url, 'page type')
if page_type == 'video':
player_json_url = update_url_query(player_json_url, {'format': 'json'})
else:
player_json_url = player_json_url.replace('/xml/', '/html5xml/')
if page_type == 'album':
playlist_title = self._og_search_title(webpage)
return self._extract_player_json(player_json_url, page_id, page_type, playlist_title)

View File

@@ -1,68 +1,82 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
url_or_none,
parse_filesize,
urlencode_postdata,
urlencode_postdata
)
class ZoomIE(InfoExtractor):
IE_NAME = 'zoom'
_VALID_URL = r'(?P<base_url>https?://(?:[^.]+\.)?zoom.us/)rec(?:ording)?/(?:play|share)/(?P<id>[A-Za-z0-9_.-]+)'
_VALID_URL = r'https://(?:.*).?zoom.us/rec(?:ording)?/(play|share)/(?P<id>[A-Za-z0-9\-_.]+)'
_TEST = {
'url': 'https://economist.zoom.us/rec/play/dUk_CNBETmZ5VA2BwEl-jjakPpJ3M1pcfVYAPRsoIbEByGsLjUZtaa4yCATQuOL3der8BlTwxQePl_j0.EImBkXzTIaPvdZO5',
'md5': 'ab445e8c911fddc4f9adc842c2c5d434',
'url': 'https://zoom.us/recording/play/SILVuCL4bFtRwWTtOCFQQxAsBQsJljFtm9e4Z_bvo-A8B-nzUSYZRNuPl3qW5IGK',
'info_dict': {
'id': 'dUk_CNBETmZ5VA2BwEl-jjakPpJ3M1pcfVYAPRsoIbEByGsLjUZtaa4yCATQuOL3der8BlTwxQePl_j0.EImBkXzTIaPvdZO5',
'ext': 'mp4',
'title': 'China\'s "two sessions" and the new five-year plan',
'md5': '031a5b379f1547a8b29c5c4c837dccf2',
'title': "GAZ Transformational Tuesdays W/ Landon & Stapes",
'id': "SILVuCL4bFtRwWTtOCFQQxAsBQsJljFtm9e4Z_bvo-A8B-nzUSYZRNuPl3qW5IGK",
'ext': "mp4"
}
}
def _real_extract(self, url):
base_url, play_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, play_id)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
try:
form = self._form_hidden_inputs('password_form', webpage)
except ExtractorError:
form = None
if form:
password = self._downloader.params.get('videopassword')
if not password:
raise ExtractorError(
'This video is protected by a passcode, use the --video-password option', expected=True)
is_meeting = form.get('useWhichPasswd') == 'meeting'
validation = self._download_json(
base_url + 'rec/validate%s_passwd' % ('_meet' if is_meeting else ''),
play_id, 'Validating passcode', 'Wrong passcode', data=urlencode_postdata({
'id': form[('meet' if is_meeting else 'file') + 'Id'],
'passwd': password,
'action': form.get('action'),
}))
if not validation.get('status'):
raise ExtractorError(validation['errorMessage'], expected=True)
webpage = self._download_webpage(url, play_id)
password_protected = self._search_regex(r'<form[^>]+?id="(password_form)"', webpage, 'password field', fatal=False, default=None)
if password_protected is not None:
self._verify_video_password(url, display_id, webpage)
webpage = self._download_webpage(url, display_id)
data = self._parse_json(self._search_regex(
r'(?s)window\.__data__\s*=\s*({.+?});',
webpage, 'data'), play_id, js_to_json)
video_url = self._search_regex(r"viewMp4Url: \'(.*)\'", webpage, 'video url')
title = self._html_search_regex([r"topic: \"(.*)\",", r"<title>(.*) - Zoom</title>"], webpage, 'title')
viewResolvtionsWidth = self._search_regex(r"viewResolvtionsWidth: (\d*)", webpage, 'res width', fatal=False)
viewResolvtionsHeight = self._search_regex(r"viewResolvtionsHeight: (\d*)", webpage, 'res height', fatal=False)
fileSize = parse_filesize(self._search_regex(r"fileSize: \'(.+)\'", webpage, 'fileSize', fatal=False))
urlprefix = url.split("zoom.us")[0] + "zoom.us/"
formats = []
formats.append({
'url': url_or_none(video_url),
'width': int_or_none(viewResolvtionsWidth),
'height': int_or_none(viewResolvtionsHeight),
'http_headers': {'Accept': 'video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5',
'Referer': urlprefix},
'ext': "mp4",
'filesize_approx': int_or_none(fileSize)
})
self._sort_formats(formats)
return {
'id': play_id,
'title': data['topic'],
'url': data['viewMp4Url'],
'width': int_or_none(data.get('viewResolvtionsWidth')),
'height': int_or_none(data.get('viewResolvtionsHeight')),
'http_headers': {
'Referer': base_url,
},
'filesize_approx': parse_filesize(data.get('fileSize')),
'id': display_id,
'title': title,
'formats': formats
}
def _verify_video_password(self, url, video_id, webpage):
password = self._downloader.params.get('videopassword')
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
meetId = self._search_regex(r'<input[^>]+?id="meetId" value="([^\"]+)"', webpage, 'meetId')
data = urlencode_postdata({
'id': meetId,
'passwd': password,
'action': "viewdetailedpage",
'recaptcha': ""
})
validation_url = url.split("zoom.us")[0] + "zoom.us/rec/validate_meet_passwd"
validation_response = self._download_json(
validation_url, video_id,
note='Validating Password...',
errnote='Wrong password?',
data=data)
if validation_response['errorCode'] != 0:
raise ExtractorError('Login failed, %s said: %r' % (self.IE_NAME, validation_response['errorMessage']))

View File

@@ -1147,18 +1147,13 @@ def _dict_from_multiple_values_options_callback(
metavar='FIELD:FORMAT', dest='metafromfield', action='append',
help=(
'Parse additional metadata like title/artist from other fields. '
'Give a template or field name to extract data from and the '
'format to interpret it as, seperated by a ":". '
'Give field name to extract data from, and format of the field seperated by a ":". '
'Either regular expression with named capture groups or a '
'similar syntax to the output template can be used for the FORMAT. '
'Similarly, the syntax for output template can be used for FIELD '
'to parse the data from multiple fields. '
'The parsed parameters replace any existing values and can be used in output templates. '
'similar syntax to the output template can also be used. '
'The parsed parameters replace any existing values and can be use in output template. '
'This option can be used multiple times. '
'Example: --parse-metadata "title:%(artist)s - %(title)s" matches a title like '
'"Coldplay - Paradise". '
'Example: --parse-metadata "%(series)s %(episode_number)s:%(title)s" '
'sets the title using series and episode number. '
'Example (regex): --parse-metadata "description:Artist - (?P<artist>.+?)"'))
postproc.add_option(
'--xattrs',

View File

@@ -4,10 +4,11 @@
from .common import PostProcessor
from ..compat import compat_str
from ..utils import str_or_none
class MetadataFromFieldPP(PostProcessor):
regex = r'(?P<in>.+):(?P<out>.+)$'
regex = r'(?P<field>\w+):(?P<format>.+)$'
def __init__(self, downloader, formats):
PostProcessor.__init__(self, downloader)
@@ -18,20 +19,11 @@ def __init__(self, downloader, formats):
match = re.match(self.regex, f)
assert match is not None
self._data.append({
'in': match.group('in'),
'out': match.group('out'),
'tmpl': self.field_to_template(match.group('in')),
'regex': self.format_to_regex(match.group('out')),
})
'field': match.group('field'),
'format': match.group('format'),
'regex': self.format_to_regex(match.group('format'))})
@staticmethod
def field_to_template(tmpl):
if re.match(r'\w+$', tmpl):
return '%%(%s)s' % tmpl
return tmpl
@staticmethod
def format_to_regex(fmt):
def format_to_regex(self, fmt):
r"""
Converts a string like
'%(title)s - %(artist)s'
@@ -45,7 +37,7 @@ def format_to_regex(fmt):
# replace %(..)s with regex group and escape other string parts
for match in re.finditer(r'%\((\w+)\)s', fmt):
regex += re.escape(fmt[lastpos:match.start()])
regex += r'(?P<%s>[^\r\n]+)' % match.group(1)
regex += r'(?P<' + match.group(1) + r'>[^\r\n]+)'
lastpos = match.end()
if lastpos < len(fmt):
regex += re.escape(fmt[lastpos:])
@@ -53,16 +45,22 @@ def format_to_regex(fmt):
def run(self, info):
for dictn in self._data:
tmpl, info_copy = self._downloader.prepare_outtmpl(dictn['tmpl'], info)
data_to_parse = tmpl % info_copy
self.write_debug('Searching for r"%s" in %s' % (dictn['regex'], tmpl))
match = re.search(dictn['regex'], data_to_parse)
field, regex = dictn['field'], dictn['regex']
if field not in info:
self.report_warning('Video doesnot have a %s' % field)
continue
data_to_parse = str_or_none(info[field])
if data_to_parse is None:
self.report_warning('Field %s cannot be parsed' % field)
continue
self.write_debug('Searching for r"%s" in %s' % (regex, field))
match = re.search(regex, data_to_parse)
if match is None:
self.report_warning('Could not interpret video %s as "%s"' % (dictn['in'], dictn['out']))
self.report_warning('Could not interpret video %s as "%s"' % (field, dictn['format']))
continue
for attribute, value in match.groupdict().items():
info[attribute] = value
self.to_screen('parsed %s from "%s": %s' % (attribute, dictn['in'], value if value is not None else 'NA'))
self.to_screen('parsed %s from %s: %s' % (attribute, field, value if value is not None else 'NA'))
return [], info

View File

@@ -2423,15 +2423,6 @@ def __init__(self, msg, exc_info=None):
self.exc_info = exc_info
class EntryNotInPlaylist(YoutubeDLError):
"""Entry not in playlist exception.
This exception will be thrown by YoutubeDL when a requested entry
is not found in the playlist info_dict
"""
pass
class SameFileError(YoutubeDLError):
"""Same File exception.
@@ -4205,20 +4196,6 @@ def q(qid):
'pl_infojson': 'info.json',
}
# As of [1] format syntax is:
# %[mapping_key][conversion_flags][minimum_width][.precision][length_modifier]type
# 1. https://docs.python.org/2/library/stdtypes.html#string-formatting
FORMAT_RE = r'''(?x)
(?<!%)
%
\({0}\) # mapping key
(?:[#0\-+ ]+)? # conversion flags (optional)
(?:\d+)? # minimum field width (optional)
(?:\.\d+)? # precision (optional)
[hlL]? # length modifier (optional)
(?P<type>[diouxXeEfFgGcrs%]) # conversion type
'''
def limit_length(s, length):
""" Add ellipses to overly long strings """

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2021.03.24'
__version__ = '2021.03.15'