PySpark: downloading files directly into HDFS

I am using a pyspark notebook, starting from an RDD that holds (url, names) pairs in this form:

url1 [name1, name2,..., nameN]
url2 [name2, name44,..., nameN]
url3 [name1, name3,..., nameM]
...

For each URL I would like to take all the names and use each of them to download a separate file with an HTTP request. For example, for url1 I would like to do something like this (if I were in a classic iteration):

requests.get('http://some_site.com/'+str(name1))
requests.get('http://some_site.com/'+str(name2))
...
requests.get('http://some_site.com/'+str(nameN))
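
The classic iteration above could be written as a small loop. This is only a sketch: the helper names `name_urls` and `fetch_names` and the injectable `fetch` parameter are hypothetical, added so the loop can be tried without network access. Only `requests.get` and the `http://some_site.com/` prefix come from the question.

```python
def name_urls(names, base="http://some_site.com/"):
    """Build one download URL per name, mirroring the requests.get calls above."""
    return [base + str(name) for name in names]


def fetch_names(names, fetch=None):
    """Fetch every name in turn and return (name, response) pairs.

    `fetch` is a hypothetical hook; by default it is requests.get.
    """
    if fetch is None:
        import requests  # third-party package, imported only when really needed
        fetch = requests.get
    return [(name, fetch(url)) for name, url in zip(names, name_urls(names))]
```

Calling `fetch_names(["name1", "name2"])` would issue the same two requests as the first two lines of the listing above, one after another on a single machine.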

The problem is that I would like to download them directly into HDFS, without copying all the files to every worker, as was suggested here with the addFile(path) command. Can DataFrameReader read from HTTP? Is there any way to do this directly from a Spark application? The files are very large, and I cannot save them on my own machine in order to upload them to HDFS later.
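
One possible direction, sketched under several assumptions rather than offered as a confirmed solution: let each executor stream the HTTP response straight into `hdfs dfs -put - <path>`, which reads the file content from standard input, so nothing is staged on the driver or on local disk. The sketch assumes that the `hdfs` command-line client is on the PATH of every worker, that the workers can reach the site, and that the `requests` package is installed there. The function names, the `/data/downloads` target directory and the 1 MiB chunk size are all hypothetical.

```python
import subprocess


def hdfs_put_command(dest_path):
    """Command that writes its standard input to dest_path in HDFS."""
    return ["hdfs", "dfs", "-put", "-", dest_path]


def stream_to_hdfs(chunks, dest_path, command_for=hdfs_put_command):
    """Pipe an iterable of byte chunks into the command; return bytes written."""
    proc = subprocess.Popen(command_for(dest_path), stdin=subprocess.PIPE)
    written = 0
    try:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                proc.stdin.write(chunk)
                written += len(chunk)
    finally:
        proc.stdin.close()  # signals end of input to the child process
    if proc.wait() != 0:
        raise RuntimeError("upload to %s failed" % dest_path)
    return written


def download_partition(rows, base="http://some_site.com/", hdfs_dir="/data/downloads"):
    """Runs on an executor: rows is an iterator of (url, names) pairs."""
    import requests  # assumed to be installed on every worker

    for url, names in rows:
        for name in names:
            dest = "%s/%s" % (hdfs_dir, name)  # hypothetical target layout
            # stream=True keeps the body out of memory until it is iterated
            with requests.get(base + str(name), stream=True) as resp:
                resp.raise_for_status()
                size = stream_to_hdfs(resp.iter_content(chunk_size=1 << 20), dest)
            yield url, name, dest, size


# Hypothetical usage on the (url, names) RDD from the question:
# results = rdd.mapPartitions(download_partition).collect()
```

Only the small (url, name, path, size) tuples travel back to the driver. Note that `hdfs dfs -put` fails if the destination already exists, so a name that appears under several URLs would need its own target path.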

file, apache-spark, pyspark


Answers: 0
