MLlib is Apache Spark's scalable machine learning library.

    Ease of use

    Usable in Java, Scala, Python, and R.

    MLlib fits into Spark's APIs and interoperates with NumPy in Python (as of Spark 0.9) and with R libraries (as of Spark 1.5). You can use any Hadoop data source (e.g., HDFS, HBase, or local files), which makes it easy to plug into Hadoop workflows.

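    from pyspark.ml.clustering import KMeans

    # Load a LIBSVM-format dataset (assumes an existing SparkSession named `spark`)
    data = spark.read.format("libsvm")\
      .load("hdfs://...")

    # Fit a k-means model with 10 clusters
    model = KMeans(k=10).fit(data)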
    Calling MLlib in Python

    Performance

    High-quality algorithms, up to 100x faster than MapReduce.

    Spark excels at iterative computation, enabling MLlib to run fast. At the same time, we care about algorithmic performance: MLlib contains high-quality algorithms that leverage iteration, and can yield better results than the one-pass approximations sometimes used on MapReduce.
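To make the point about iteration concrete, here is a minimal sketch in plain Python (it does not use MLlib, and the toy data, learning rate, and function names are all assumptions made for illustration). It fits a one-feature logistic regression by gradient descent and compares the loss after a single pass over the data with the loss after many passes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, points):
    """Average logistic loss of the model (w, b) on (x, y) pairs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in points:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(points)

def train(points, passes, lr=0.5):
    """Full-batch gradient descent; each pass reads the whole dataset once."""
    w, b = 0.0, 0.0
    for _ in range(passes):
        grad_w = grad_b = 0.0
        for x, y in points:
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(points)
        b -= lr * grad_b / len(points)
    return w, b

# Hypothetical toy dataset: negative x values are class 0, positive are class 1.
points = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]

one_pass = log_loss(*train(points, passes=1), points)
many_passes = log_loss(*train(points, passes=100), points)
print(f"loss after 1 pass: {one_pass:.4f}, after 100 passes: {many_passes:.4f}")
```

Each extra pass re-reads the same data, which is the access pattern that benefits from keeping the dataset in memory between iterations rather than reloading it for every pass.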

    [Figure: Logistic regression in Hadoop and Spark]

    Runs everywhere

    Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, against diverse data sources.

    You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.

    Algorithms

    MLlib contains many algorithms and utilities.

    ML algorithms include:

    • Classification: logistic regression, naive Bayes,...
    • Regression: generalized linear regression, survival regression,...
    • Decision trees, random forests, and gradient-boosted trees
    • Recommendation: alternating least squares (ALS)
    • Clustering: K-means, Gaussian mixtures (GMMs),...
    • Topic modeling: latent Dirichlet allocation (LDA)
    • Frequent itemsets, association rules, and sequential pattern mining

    ML workflow utilities include:

    • Feature transformations: standardization, normalization, hashing,...
    • ML Pipeline construction
    • Model evaluation and hyper-parameter tuning
    • ML persistence: saving and loading models and Pipelines

    Other utilities include:

    • Distributed linear algebra: SVD, PCA,...
    • Statistics: summary statistics, hypothesis testing,...

    Refer to the MLlib guide for usage examples.

    Community

    MLlib is developed as part of the Apache Spark project. It thus gets tested and updated with each Spark release.

    If you have questions about the library, ask on the Spark mailing lists.

    MLlib is still a rapidly growing project and welcomes contributions. If you'd like to submit an algorithm to MLlib, read how to contribute to Spark and send us a patch!

    Getting started

    To get started with MLlib:

    • Download Spark. MLlib is included as a module.
    • Read the MLlib guide, which includes various usage examples.
    • Learn how to deploy Spark on a cluster if you'd like to run in distributed mode. You can also run locally on a multicore machine without any setup.