MLlib is Apache Spark's scalable machine learning library.

    Ease of use

    Usable in Java, Scala, Python, and R.

    MLlib fits into Spark's APIs and interoperates with NumPy in Python (as of Spark 0.9) and R libraries (as of Spark 1.5). You can use any Hadoop data source (e.g. HDFS, HBase, or local files), making it easy to plug into Hadoop workflows.

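    from pyspark.ml.clustering import KMeans

    # Load LIBSVM-formatted data into a DataFrame (`spark` is an existing SparkSession)
    data = spark.read.format("libsvm") \
      .load("hdfs://...")

    # Fit a k-means model with 10 clusters
    model = KMeans(k=10).fit(data)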
    Calling MLlib in Python

    Performance

    High-quality algorithms, up to 100x faster than MapReduce.

    Spark excels at iterative computation, enabling MLlib to run fast. At the same time, we care about algorithmic performance: MLlib contains high-quality algorithms that leverage iteration, and can yield better results than the one-pass approximations sometimes used on MapReduce.

    Logistic regression in Hadoop and Spark
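To make the point about iteration concrete, here is a minimal sketch in plain Python, not MLlib code and not the benchmark behind the comparison above. It fits a one-feature logistic regression by full-batch gradient descent and compares the loss after a single pass over the data with the loss after many passes. The toy data set, the step size, and the function names are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Mean negative log-likelihood of a one-feature logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def fit(xs, ys, iterations, step=0.5):
    """Full-batch gradient descent; each iteration is one pass over the data."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= step * grad_w
        b -= step * grad_b
    return w, b

# Toy data (assumed): not linearly separable, so the weights stay bounded.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

one_pass = log_loss(*fit(xs, ys, iterations=1), xs, ys)
many_passes = log_loss(*fit(xs, ys, iterations=100), xs, ys)
print(f"loss after 1 pass: {one_pass:.4f}, after 100 passes: {many_passes:.4f}")
```

Each extra iteration re-reads the whole data set, which is the access pattern that benefits from keeping data in memory between passes.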

    Runs everywhere

    Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, against diverse data sources.

    You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
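The cluster manager is selected through the master URL passed to Spark. The sketch below uses a hypothetical helper, `master_url`, that maps each deployment mode to the URL scheme Spark expects; the mode names are this example's own labels, and any host or port you pass in is a placeholder for your cluster's values. EC2 is not a separate scheme here, since it only determines where the cluster is hosted.

```python
def master_url(mode, host=None, port=None):
    """Return a Spark master URL for a deployment mode (hypothetical helper)."""
    # Modes that need no address.
    if mode == "local":
        return "local[*]"  # run locally, using all available cores
    if mode == "yarn":
        return "yarn"      # cluster location is read from the Hadoop configuration

    # Modes that address a specific cluster manager endpoint.
    templates = {
        "standalone": "spark://{host}:{port}",
        "mesos": "mesos://{host}:{port}",
        "kubernetes": "k8s://https://{host}:{port}",
    }
    if mode not in templates:
        raise ValueError(f"unknown mode: {mode!r}")
    if host is None or port is None:
        raise ValueError(f"mode {mode!r} needs a host and a port")
    return templates[mode].format(host=host, port=port)

print(master_url("local"))
```

The resulting string can be passed to `SparkSession.builder.master(...)` or to the `--master` option of `spark-submit`.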

    Algorithms

    MLlib contains many algorithms and utilities.

    ML algorithms include:

    • Classification: logistic regression, naive Bayes,...
    • Regression: generalized linear regression, survival regression,...
    • Decision trees, random forests, and gradient-boosted trees
    • Recommendation: alternating least squares (ALS)
    • Clustering: K-means, Gaussian mixtures (GMMs),...
    • Topic modeling: latent Dirichlet allocation (LDA)
    • Frequent itemsets, association rules, and sequential pattern mining

    ML workflow utilities include:

    • Feature transformations: standardization, normalization, hashing,...
    • ML Pipeline construction
    • Model evaluation and hyper-parameter tuning
    • ML persistence: saving and loading models and Pipelines
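The workflow utilities listed above can be combined as in the following sketch, which assumes PySpark is installed and a SparkSession is already running when the function is called. It chains two feature transformers and a classifier into a Pipeline, then wraps the pipeline in cross-validation over a small parameter grid. The function name, the column names (`text`, `words`, `features`), and the grid values are assumptions for illustration.

```python
def build_text_classifier_cv(num_folds=3):
    """Build a cross-validated tokenizer -> hashing TF -> logistic regression pipeline."""
    # Imports are deferred so this module loads without a Spark installation;
    # calling the function requires PySpark and an active SparkSession.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import HashingTF, Tokenizer
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Feature transformations followed by an estimator.
    tokenizer = Tokenizer(inputCol="text", outputCol="words")
    hashing_tf = HashingTF(inputCol="words", outputCol="features")
    lr = LogisticRegression(maxIter=10)
    pipeline = Pipeline(stages=[tokenizer, hashing_tf, lr])

    # Hyper-parameter grid: every combination is evaluated on each fold.
    grid = (ParamGridBuilder()
            .addGrid(hashing_tf.numFeatures, [1000, 10000])
            .addGrid(lr.regParam, [0.1, 0.01])
            .build())

    return CrossValidator(estimator=pipeline,
                          estimatorParamMaps=grid,
                          evaluator=BinaryClassificationEvaluator(),
                          numFolds=num_folds)
```

Given a DataFrame `training_df` with `text` and `label` columns (an assumed name), `build_text_classifier_cv().fit(training_df)` returns a model whose `bestModel` can be saved with `bestModel.write().overwrite().save(path)` and reloaded later with `PipelineModel.load(path)`.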

    Other utilities include:

    • Distributed linear algebra: SVD, PCA,...
    • Statistics: summary statistics, hypothesis testing,...

    Refer to the MLlib guide for usage examples.

    Community

    MLlib is developed as part of the Apache Spark project. It thus gets tested and updated with each Spark release.

    If you have questions about the library, ask on the Spark mailing lists.

    MLlib is still a rapidly growing project and welcomes contributions. If you'd like to submit an algorithm to MLlib, read how to contribute to Spark and send us a patch!

    Getting started

    To get started with MLlib:

    • Download Spark. MLlib is included as a module.
    • Read the MLlib guide, which includes various usage examples.
    • Learn how to deploy Spark on a cluster if you'd like to run in distributed mode. You can also run locally on a multicore machine without any setup.